Creating DataFrames
There are several ways to create a DataFrame. The two most common starting points are a dictionary and a list of lists.
From a Dictionary:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)
This creates a DataFrame from a dictionary: each key becomes a column name, and each list supplies that column's values.
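To confirm what the dictionary produced, a few DataFrame attributes are handy. The sketch below simply rebuilds the same sample data and inspects its size, column labels and inferred types:

```python
import pandas as pd

# Same sample data as above: each key becomes a column label
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # column labels, in dictionary key order
print(df.dtypes)         # Age is inferred as an integer column
```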
From a List of Lists:
data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'London'],
        ['Charlie', 28, 'Paris']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
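Here each inner list is one row. A list of lists carries no column names, so they are supplied through the columns parameter; if it is omitted, pandas labels the columns 0, 1, 2 instead.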
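A short sketch of what happens without the columns argument, and how names can still be attached after construction (the variable names rows and unnamed are just for illustration):

```python
import pandas as pd

rows = [['Alice', 25, 'New York'],
        ['Bob', 30, 'London'],
        ['Charlie', 28, 'Paris']]

# Without columns=..., the labels default to integers
unnamed = pd.DataFrame(rows)
print(list(unnamed.columns))  # [0, 1, 2]

# Labels can also be assigned after construction
unnamed.columns = ['Name', 'Age', 'City']
print(unnamed)
```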
Accessing Data
Retrieving data from a DataFrame is straightforward. You can access columns by name:
print(df['Name'])           # Accesses the 'Name' column (returns a Series)
print(df[['Name', 'Age']])  # Accesses multiple columns (returns a DataFrame)
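Single brackets and double brackets return different types, which matters when you chain further operations. A quick check on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 28],
                   'City': ['New York', 'London', 'Paris']})

name_col = df['Name']            # one label -> Series
subset = df[['Name', 'Age']]     # list of labels -> DataFrame

print(type(name_col).__name__)   # Series
print(type(subset).__name__)     # DataFrame
print(subset.shape)              # (3, 2)
```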
Individual rows can be accessed using .loc (label-based indexing) or .iloc (integer-based indexing):
print(df.loc[0])   # Row with index label 0 (the first row here, since the default index is 0, 1, 2)
print(df.iloc[1])  # Second row, by integer position
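The difference between .loc and .iloc only becomes visible when the index labels are not the default 0, 1, 2. In this sketch the 'Name' column is promoted to the index with set_index(), so .loc takes a name while .iloc still takes a position:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 28],
                   'City': ['New York', 'London', 'Paris']})

by_name = df.set_index('Name')    # index labels are now 'Alice', 'Bob', 'Charlie'

print(by_name.loc['Bob'])         # label-based: the row labelled 'Bob'
print(by_name.iloc[1])            # position-based: the second row (also Bob)
print(by_name.loc['Bob', 'Age'])  # a single cell: row label, then column label

# by_name.loc[0] would raise a KeyError, since 0 is no longer an index label
```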
Data Manipulation
Pandas DataFrames offer a rich set of tools for data manipulation. Here are a few examples:
Adding a New Column:
df['Country'] = ['USA', 'UK', 'France']
print(df)
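A new column does not have to be a literal list. Assigning a scalar repeats it for every row, and arithmetic on an existing column creates a derived one; a list, by contrast, must have exactly one value per row. A sketch, where the column names 'Active' and 'Age_Months' are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 28]})

df['Active'] = True                 # scalar: broadcast to every row
df['Age_Months'] = df['Age'] * 12   # derived from an existing column

try:
    df['Country'] = ['USA', 'UK']   # 2 values for 3 rows
except ValueError as err:
    print("Length mismatch:", err)  # the column is not created

print(df)
```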
Filtering Data:
filtered_df = df[df['Age'] > 28]
print(filtered_df)
This keeps only the rows where ‘Age’ is greater than 28. The comparison is strict, so Charlie (28) is excluded and only Bob remains.
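Conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than > or !=. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 28],
                   'City': ['New York', 'London', 'Paris']})

# Rows where Age is at least 28 AND the city is not London
mask = (df['Age'] >= 28) & (df['City'] != 'London')
print(df[mask])  # only Charlie

# isin() is a compact alternative to several == comparisons joined with |
print(df[df['City'].isin(['Paris', 'London'])])
```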
Sorting Data:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
This sorts the DataFrame by the ‘Age’ column in descending order.
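sort_values() returns a new DataFrame and leaves the original untouched. It also accepts a list of columns, with a matching list for ascending, to break ties. The sketch below adds a hypothetical fourth person, Dana, so that two rows share the same age:

```python
import pandas as pd

# Hypothetical data: Dana is added only to create a tie on Age
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
                   'Age': [25, 30, 28, 30]})

# Oldest first; people with the same age are ordered by name, A to Z
sorted_df = df.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(sorted_df)

print(df['Name'].tolist())  # the original order is unchanged
```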
Handling Missing Data
Missing data is a common problem. Pandas represents missing values as NaN (Not a Number) and provides methods to detect, drop, or fill them.
df['Salary'] = [50000, 60000, float('NaN')]
print(df)
print(df.dropna()) # Removes rows with missing values
dropna() returns a new DataFrame without the rows that contain missing values; df itself is unchanged. Other methods, such as fillna(), replace missing values with a specific value or a calculated statistic such as the column mean.
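Before choosing between dropping and filling, it helps to count the gaps. This sketch counts missing values per column with isna().sum(), then fills the missing salary with the column mean, one of the calculated statistics fillna() can use:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Salary': [50000, 60000, float('nan')]})

print(df.isna().sum())  # missing values per column: Name 0, Salary 1

# mean() skips NaN by default, so this is (50000 + 60000) / 2
filled = df.fillna({'Salary': df['Salary'].mean()})
print(filled)
```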
Working with CSV Files
Pandas can read DataFrames from, and write them to, many file formats. CSV (Comma-Separated Values) files are particularly common:
Reading from a CSV:
df_csv = pd.read_csv("data.csv")  # Assumes 'data.csv' is in your working directory.
print(df_csv)
Writing to a CSV:
"output.csv", index=False) # index=False prevents writing the index to the file. df.to_csv(