From a Single Dictionary
The simplest way to create a DataFrame is from a single dictionary where keys represent column names and values represent column data (lists or arrays of equal length).
import pandas as pd
= {'Name': ['Alice', 'Bob', 'Charlie'],
data 'Age': [25, 30, 28],
'City': ['New York', 'London', 'Paris']}
= pd.DataFrame(data)
df print(df)
This will output a DataFrame like this:
Name Age City
0 Alice 25 New York
1 Bob 30 London
2 Charlie 28 Paris
From a List of Dictionaries
When your data is structured as a list of dictionaries, each dictionary representing a row, the process is slightly different.
= [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
data 'Name': 'Bob', 'Age': 30, 'City': 'London'},
{'Name': 'Charlie', 'Age': 28, 'City': 'Paris'}]
{
= pd.DataFrame(data)
df print(df)
This produces the same DataFrame as before. Note that in this case, Pandas automatically infers the column order based on the order of keys in the first dictionary. Inconsistent keys across dictionaries will lead to NaN
(Not a Number) values in the resulting DataFrame.
Handling Missing Data
If your dictionaries have missing data, Pandas handles this gracefully using NaN
values.
= [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
data 'Name': 'Bob', 'Age': 30},
{'Name': 'Charlie', 'Age': 28, 'City': 'Paris'}]
{
= pd.DataFrame(data)
df print(df)
This will create a DataFrame with a NaN
value where the ‘City’ is missing for Bob.
Specifying Column Order
You can explicitly define the column order using the columns
parameter.
= [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
data 'Name': 'Bob', 'Age': 30, 'City': 'London'},
{'Name': 'Charlie', 'Age': 28, 'City': 'Paris'}]
{
= pd.DataFrame(data, columns=['City', 'Name', 'Age'])
df print(df)
This allows you to control the order of columns in the resulting DataFrame regardless of the order in the input dictionaries.
DataFrame from Dictionaries with Different Lengths
While ideal for creating DataFrames, dictionaries with lists of unequal lengths will cause errors. Pandas expects consistent lengths across all columns. You may need to handle data inconsistencies before creating the DataFrame.
Using from_dict
The pd.DataFrame.from_dict()
method provides additional flexibility, allowing you to specify the orientation ('columns'
or 'index'
) to control how the dictionary is interpreted. The default is 'columns'
, which is consistent with the examples above.
= {'Name': ['Alice', 'Bob', 'Charlie'],
data 'Age': [25, 30, 28],
'City': ['New York', 'London', 'Paris']}
= pd.DataFrame.from_dict(data)
df print(df)
= pd.DataFrame.from_dict(data, orient='index')
df_index print(df_index)
This shows the difference between the default column-oriented approach and a row-oriented approach using orient='index'
. The orient='index'
method will generate a DataFrame where the dictionary keys are the index, and each inner list becomes a column.
This demonstrates the versatility of creating Pandas DataFrames from dictionaries, enabling efficient data structuring for further analysis. The choice of method depends on the specific structure and characteristics of your input data.