DataFrame from Lists

pandas
Published

September 27, 2024

From a Single List to a DataFrame

If you pass a single flat list, Pandas treats it as one column, with each element becoming a row. Use the columns parameter to name that column; if you omit it, the column is labeled 0.

import pandas as pd

data = [10, 20, 30, 40, 50]
df = pd.DataFrame(data, columns=['Values'])
print(df)

This will output:

   Values
0      10
1      20
2      30
3      40
4      50

From a List of Lists to a DataFrame

For more complex datasets, you’ll often use a list of lists. Each inner list represents a row in your DataFrame. You can optionally specify column names.

import pandas as pd

data = [[1, 'Alice', 25], [2, 'Bob', 30], [3, 'Charlie', 22]]
df = pd.DataFrame(data, columns=['ID', 'Name', 'Age'])
print(df)

This will produce:

   ID      Name  Age
0   1     Alice   25
1   2       Bob   30
2   3  Charlie   22

If you omit the columns parameter, Pandas will automatically assign numerical column names (0, 1, 2, …).

import pandas as pd

data = [[1, 'Alice', 25], [2, 'Bob', 30], [3, 'Charlie', 22]]
df = pd.DataFrame(data)
print(df)
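
As a small sketch of what those default labels look like, and of one way to name the columns after the fact, the example below reuses the same data. Assigning to df.columns is only one option; the variable name default_labels is introduced here purely for illustration.

```python
import pandas as pd

data = [[1, 'Alice', 25], [2, 'Bob', 30], [3, 'Charlie', 22]]
df = pd.DataFrame(data)

# Without the columns parameter, the labels are the integers 0, 1, 2
default_labels = list(df.columns)
print(default_labels)

# Names can still be assigned after the DataFrame has been built
df.columns = ['ID', 'Name', 'Age']
print(df)
```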

Handling Different Data Types

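Each column in a DataFrame has its own data type (dtype), so different columns can hold integers, strings, floats, and booleans side by side. A column that mixes incompatible types, such as numbers and strings, falls back to the generic object dtype.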

import pandas as pd

data = [[1, 'Alice', 25.5, True], [2, 'Bob', 30, False], [3, 'Charlie', 22, True]]
df = pd.DataFrame(data, columns=['ID', 'Name', 'Age', 'Status'])
print(df)

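This example shows a mix of integers, strings, floats, and booleans. Pandas infers a dtype for each column automatically. Because the Age column mixes 25.5 with whole numbers, the entire column is stored as floats, so 30 is displayed as 30.0.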
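
To see what Pandas inferred, you can inspect the dtypes attribute. The sketch below rebuilds the same DataFrame and checks two of the columns; the helper variables age_dtype and status_dtype exist only for this illustration.

```python
import pandas as pd

data = [[1, 'Alice', 25.5, True], [2, 'Bob', 30, False], [3, 'Charlie', 22, True]]
df = pd.DataFrame(data, columns=['ID', 'Name', 'Age', 'Status'])

# One dtype per column
print(df.dtypes)

# Age mixes a float with integers, so the column is stored as floats
age_dtype = str(df['Age'].dtype)
# Status contains only True/False values
status_dtype = str(df['Status'].dtype)
print(age_dtype, status_dtype)
```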

Using Dictionaries for Column Names and Data

An alternative, and often more readable, approach is to use a dictionary where keys represent column names and values are lists representing the data for each column.

import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)
print(df)

This offers a clearer way to structure your data, especially when dealing with numerous columns.
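
Two behaviors of dictionary input are worth a quick sketch. With a dictionary, the columns parameter selects and orders existing keys rather than renaming them, and column lists of different lengths are rejected. The names subset, bad, and raised below are illustrative only.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}

# With a dict, columns= picks keys and sets their order
subset = pd.DataFrame(data, columns=['Name', 'ID'])
print(subset)

# Column lists of different lengths raise an error instead of being padded
bad = {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob']}
try:
    pd.DataFrame(bad)
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```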

Creating DataFrames from Lists of Dictionaries

You can also create a DataFrame from a list of dictionaries. Each dictionary represents a row, and keys represent column names.

import pandas as pd

data = [{'ID': 1, 'Name': 'Alice', 'Age': 25}, {'ID': 2, 'Name': 'Bob', 'Age': 30}, {'ID': 3, 'Name': 'Charlie', 'Age': 22}]
df = pd.DataFrame(data)
print(df)

This method is useful when your data is naturally structured as a list of individual records. Note that all dictionaries should ideally contain the same keys (columns).
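
As a brief sketch of what happens when the keys do not line up, the records below are the same as above except that the second one has no 'Age' key. The variable names records and missing_age are illustrative.

```python
import pandas as pd

records = [
    {'ID': 1, 'Name': 'Alice', 'Age': 25},
    {'ID': 2, 'Name': 'Bob'},  # no 'Age' key in this record
    {'ID': 3, 'Name': 'Charlie', 'Age': 22},
]
df = pd.DataFrame(records)
print(df)

# The absent key shows up as a missing value in that row
missing_age = df['Age'].isna().tolist()
print(missing_age)
```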

Handling Missing Data

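If the inner lists in a list of lists have unequal lengths, or the dictionaries in a list of dictionaries are missing keys, Pandas fills the gaps with missing values. These usually appear as NaN (Not a Number), though non-numeric columns may show None instead. This applies to row-oriented input only: a dictionary of column lists with unequal lengths raises a ValueError.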

import pandas as pd

data = [[1, 'Alice', 25], [2, 'Bob'], [3, 'Charlie', 22, 'extra']]
df = pd.DataFrame(data)
print(df)

Pandas accepts this ragged input without raising an error: shorter rows are padded to the length of the longest one. You can then deal with the missing values using methods such as fillna() (imputation) or dropna() (removal).
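
A minimal sketch of both follow-up steps, using the same ragged data. Filling with the column mean is just one possible imputation strategy, chosen here for illustration; the variable names are likewise illustrative.

```python
import pandas as pd

data = [[1, 'Alice', 25], [2, 'Bob'], [3, 'Charlie', 22, 'extra']]
df = pd.DataFrame(data)

# Count the missing entries in each column (columns are labeled 0 to 3)
missing_per_column = df.isna().sum().tolist()
print(missing_per_column)

# Removal: keep only the rows that have a value in column 2
complete = df.dropna(subset=[2])
print(complete)

# Imputation: replace missing values in column 2 with that column's mean
filled = df[2].fillna(df[2].mean())
print(filled)
```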