DataFrame from Dictionaries – Mastering Python

From a Single Dictionary

The simplest way to create a DataFrame is from a single dictionary where keys represent column names and values represent column data (lists or arrays of equal length).

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)
print(df)

This will output a DataFrame like this:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   28     Paris

From a List of Dictionaries

When your data is structured as a list of dictionaries, each dictionary representing a row, the process is slightly different.

data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30, 'City': 'London'},
        {'Name': 'Charlie', 'Age': 28, 'City': 'Paris'}]

df = pd.DataFrame(data)
print(df)

This produces the same DataFrame as before. Note that in this case, Pandas automatically infers the column order based on the order of keys in the first dictionary. Inconsistent keys across dictionaries will lead to NaN (Not a Number) values in the resulting DataFrame.

Handling Missing Data

If your dictionaries have missing data, Pandas handles this gracefully using NaN values.

data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30},
        {'Name': 'Charlie', 'Age': 28, 'City': 'Paris'}]

df = pd.DataFrame(data)
print(df)

This will create a DataFrame with a NaN value where the ‘City’ is missing for Bob.

Specifying Column Order

You can explicitly define the column order using the columns parameter.

data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30, 'City': 'London'},
        {'Name': 'Charlie', 'Age': 28, 'City': 'Paris'}]

df = pd.DataFrame(data, columns=['City', 'Name', 'Age'])
print(df)

This allows you to control the order of columns in the resulting DataFrame regardless of the order in the input dictionaries.

DataFrame from Dictionaries with Different Lengths

While ideal for creating DataFrames, dictionaries with lists of unequal lengths will cause errors. Pandas expects consistent lengths across all columns. You may need to handle data inconsistencies before creating the DataFrame.

Using `from_dict`

The pd.DataFrame.from_dict() method provides additional flexibility, allowing you to specify the orientation ('columns' or 'index') to control how the dictionary is interpreted. The default is 'columns', which is consistent with the examples above.

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame.from_dict(data)
print(df)

df_index = pd.DataFrame.from_dict(data, orient='index')
print(df_index)

This shows the difference between the default column-oriented approach and a row-oriented approach using orient='index'. The orient='index' method will generate a DataFrame where the dictionary keys are the index, and each inner list becomes a column.

This demonstrates the versatility of creating Pandas DataFrames from dictionaries, enabling efficient data structuring for further analysis. The choice of method depends on the specific structure and characteristics of your input data.