Pandas Series: One-Dimensional Data
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The labels are collectively called the index. Think of it as a cross between a Python list and a dictionary: values keep their order like a list, and each one can be looked up by its label like a dictionary key.
import pandas as pd
data = [10, 20, 30, 40, 50]
series_from_list = pd.Series(data)
print("Series from list:\n", series_from_list)
data = {'a': 100, 'b': 200, 'c': 300}
series_from_dict = pd.Series(data)
print("\nSeries from dictionary:\n", series_from_dict)
print("\nAccessing element with label 'b':", series_from_dict['b'])
print("\nAccessing element at index 1 (list based):", series_from_list[1])
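The point that the labels make up the index can be seen more directly by supplying the labels yourself. The following is a minimal sketch; the name `temperatures` and its values are invented for illustration and are not part of the examples above.

```python
import pandas as pd

# Hypothetical data: the values and labels are made up for this sketch
temperatures = pd.Series([21.5, 23.0, 19.8], index=['mon', 'tue', 'wed'])

# The labels are stored together as the index
print(list(temperatures.index))

# .loc looks up by label, .iloc by integer position;
# here both reach the same element
print(temperatures.loc['tue'])
print(temperatures.iloc[1])
```

Using `.loc` and `.iloc` explicitly keeps label-based and position-based access apart, which matters once the labels are themselves integers.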
Pandas DataFrame: Two-Dimensional Data
The DataFrame is the workhorse of Pandas. It’s a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a table, similar to a spreadsheet or SQL table. Each column is essentially a Series.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print("DataFrame:\n", df)
print("\nAge column:\n", df['Age'])
print("\nRow for Alice:\n", df.loc[df['Name'] == 'Alice'])
print("\nFirst row:\n", df.iloc[0])
# Adding a new column
df['Country'] = ['USA', 'UK', 'France']
print("\nDataFrame with added column:\n", df)
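The claim that each column is essentially a Series can be checked directly. The sketch below rebuilds a small DataFrame so it runs on its own; the variable name `age` is just an illustrative choice.

```python
import pandas as pd

# A trimmed copy of the sample data so this sketch is self-contained
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 28]})

# Single brackets select one column, which comes back as a Series
age = df['Age']
print(isinstance(age, pd.Series))

# The column carries the same row index as the DataFrame it came from
print(age.index.equals(df.index))

# Double brackets select a list of columns and return a DataFrame instead
print(isinstance(df[['Age']], pd.DataFrame))
```

All three checks print `True`. The difference between `df['Age']` and `df[['Age']]` is a common source of confusion, so it helps to know which one you are holding before calling further methods on it.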
Working with DataFrame Indexes
Pandas allows for flexible index manipulation. You can set a specific column as the index, reset the index, or even create a multi-index for more complex data structures.
# Setting index
df = df.set_index('Name')
print("\nDataFrame with Name as index:\n", df)
# Resetting index
df = df.reset_index()
print("\nDataFrame with default numerical index:\n", df)
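The multi-index mentioned earlier in this section is not shown in the example, so here is a minimal sketch of one way to build it. The `sales` table and its figures are hypothetical, invented only to have two columns worth indexing on.

```python
import pandas as pd

# Hypothetical sales figures, made up for illustration
sales = pd.DataFrame({
    'Country': ['UK', 'UK', 'USA', 'USA'],
    'Year': [2022, 2023, 2022, 2023],
    'Revenue': [80, 95, 100, 120],
})

# Passing a list of columns to set_index builds a two-level MultiIndex
indexed = sales.set_index(['Country', 'Year'])
print(indexed.index.nlevels)

# An outer label selects every row beneath it
print(indexed.loc['UK'])

# A full tuple of labels pins down a single row
print(indexed.loc[('UK', 2023), 'Revenue'])

# reset_index turns both levels back into ordinary columns
print(list(indexed.reset_index().columns))
```

The rows are listed in sorted order by country and year here, which keeps lookups on the multi-index straightforward; with unsorted data, calling `sort_index()` first is a reasonable habit.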
This provides a foundation for working with Pandas. Further exploration involves data cleaning, manipulation, analysis, and visualization – all built upon these core data structures.