Pandas Data Structures

Published: October 7, 2023

Pandas Series: One-Dimensional Data

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The labels are collectively called the index. Think of it as a cross between a Python list and a dictionary: values keep their order and can be reached by position, and each value also carries a label that can be used for lookup.

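import pandas as pd

# Build a Series from a list; Pandas assigns a default integer index (0, 1, 2, ...)
data = [10, 20, 30, 40, 50]
series_from_list = pd.Series(data)
print("Series from list:\n", series_from_list)

# Build a Series from a dictionary; the keys become the index labels
data = {'a': 100, 'b': 200, 'c': 300}
series_from_dict = pd.Series(data)
print("\nSeries from dictionary:\n", series_from_dict)

# Look up a value by its label
print("\nAccessing element with label 'b':", series_from_dict['b'])

# With the default integer index, the label 1 also happens to be the second position
print("\nAccessing element at index 1 (list based):", series_from_list[1])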
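The difference between labels and positions is easier to see when the index is not the default integers. The following is a minimal sketch, assuming made-up temperature readings and hypothetical city codes as labels; it uses `.loc` for label-based lookup and `.iloc` for position-based lookup.

```python
import pandas as pd

# Hypothetical data: three temperature readings labeled by city code
temps = pd.Series([21.5, 18.0, 25.3], index=["nyc", "lon", "par"])

# .loc looks up by label, .iloc looks up by 0-based position
by_label = temps.loc["lon"]
by_position = temps.iloc[1]

print("Index labels:", temps.index.tolist())
print("By label 'lon':", by_label)
print("By position 1:", by_position)
```

Both lookups reach the same element here. Using `.loc` and `.iloc` explicitly keeps the intent clear when the labels themselves are integers.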

Pandas DataFrame: Two-Dimensional Data

The DataFrame is the workhorse of Pandas. It’s a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a table, similar to a spreadsheet or SQL table. Each column is essentially a Series.

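import pandas as pd

# Build a DataFrame from a dictionary of equal-length lists; each key becomes a column
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

# Select a single column by name; the result is a Series
print("\nAge column:\n", df['Age'])

# Filter rows with a boolean mask; the result is a DataFrame
print("\nRow for Alice:\n", df.loc[df['Name'] == 'Alice'])

# Select the first row by position; the result is a Series
print("\nFirst row:\n", df.iloc[0])

# Add a new column; the list must have one value per row
df['Country'] = ['USA', 'UK', 'France']
print("\nDataFrame with added column:\n", df)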
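To make the point that each column is a Series concrete, here is a small sketch that rebuilds part of the example table and inspects one column. The variable name `age` is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# Pulling out one column yields a Series that shares the DataFrame's row index
age = df["Age"]

print(type(age))                      # the column's type
print(age.name)                       # the Series keeps the column name
print(age.index.equals(df.index))     # same row labels as the parent DataFrame
print(df.dtypes)                      # each column has its own dtype
```

Because every column carries its own dtype, a DataFrame can mix numeric and text columns in the same table.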

Working with DataFrame Indexes

Pandas allows for flexible index manipulation. You can set a specific column as the index, reset the index, or even create a multi-index for more complex data structures.

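# Use the 'Name' column as the index (set_index returns a new DataFrame)
df = df.set_index('Name')
print("\nDataFrame with Name as index:\n", df)

# Move the index back into a regular column and restore the default integer index
df = df.reset_index()
print("\nDataFrame with default numerical index:\n", df)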
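The multi-index mentioned above can be sketched the same way, by passing a list of columns to `set_index`. The sales figures and city names below are made up for illustration, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not taken from the example table above
sales = pd.DataFrame({
    "Country": ["USA", "USA", "UK", "UK"],
    "City": ["New York", "Boston", "London", "Leeds"],
    "Sales": [100, 80, 90, 40],
})

# Passing a list of columns creates a two-level MultiIndex; sorting keeps lookups efficient
indexed = sales.set_index(["Country", "City"]).sort_index()
print(indexed)

# Select every row under one outer label; the inner level remains as the index
uk_rows = indexed.loc["UK"]
print(uk_rows)

# Select a single value with a full (outer, inner) key
london_sales = indexed.loc[("UK", "London"), "Sales"]
print(london_sales)

# reset_index turns both levels back into ordinary columns
flat = indexed.reset_index()
print(flat.columns.tolist())
```

A multi-index is useful when rows are naturally identified by a combination of keys, such as country and city here.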

Series, DataFrames, and their indexes provide a foundation for working with Pandas. Further exploration involves data cleaning, manipulation, analysis, and visualization, all built upon these core data structures.