Series in Pandas

pandas
Published

October 13, 2023

What is a Pandas Series?

A Pandas Series is essentially a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). Think of it as a highly efficient and versatile column in a spreadsheet or a single column of a SQL table. The labels are collectively called the index, which provides a convenient way to access and manipulate individual elements.

Creating a Pandas Series

There are several ways to create a Pandas Series:

1. From a list:

import pandas as pd

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)

This creates a Series with a default integer index starting from 0.

2. From a NumPy array:

import numpy as np
import pandas as pd

data = np.array([1, 2, 3, 4, 5])
series = pd.Series(data)
print(series)

Similar to using a list, but leverages the efficiency of NumPy arrays.

3. From a dictionary:

import pandas as pd

data = {'a': 100, 'b': 200, 'c': 300}
series = pd.Series(data)
print(series)

Here, the dictionary keys become the index, offering a more descriptive labeling.

4. Specifying an index:

import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['A', 'B', 'C', 'D', 'E']
series = pd.Series(data, index=index)
print(series)

This allows for custom, meaningful labels beyond the default numerical index.

Accessing Series Elements

Accessing elements in a Pandas Series is straightforward:

1. Using index labels:

import pandas as pd

data = {'a': 100, 'b': 200, 'c': 300}
series = pd.Series(data)
print(series['a'])  # Accessing element with label 'a'

2. Using integer positions (loc and iloc):

import pandas as pd

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series.iloc[0]) # Accessing the first element (position 0)
print(series.loc[0]) # Accessing element at index 0

iloc uses integer-based indexing, while loc uses label-based indexing.

Series Operations

Pandas Series support various mathematical and logical operations:

import pandas as pd

series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([10, 20, 30, 40, 50])

print(series1 + series2) # Element-wise addition
print(series1 * 2)      # Scalar multiplication
print(series1 > 2)      # Boolean indexing

These operations are performed element-wise, making vectorized computations highly efficient.

Handling Missing Data

Pandas offers robust tools for managing missing data (represented as NaN):

import pandas as pd
import numpy as np

series = pd.Series([1, 2, np.nan, 4, 5])
print(series.isnull()) # Identify missing values
print(series.dropna()) # Remove rows with missing values
print(series.fillna(0)) # Fill missing values with 0

Data Alignment

When performing operations between Series with different indices, Pandas automatically aligns the data based on the index labels:

import pandas as pd

series1 = pd.Series({'a': 1, 'b': 2, 'c': 3})
series2 = pd.Series({'b': 4, 'c': 5, 'd': 6})

print(series1 + series2)  # Notice how NaN appears where indices don't match

Slicing and Filtering

Slicing and filtering are crucial for extracting specific subsets of data:

import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
print(series['A':'C'])  # Slicing by label
print(series[series > 20]) #Filtering based on a condition

These examples demonstrate the flexibility and power of Pandas Series for data analysis and manipulation. Further exploration into more advanced features will solidify your proficiency in Pandas.