Accessing Data: The Basics
Pandas offers several ways to access data within a DataFrame. The two most common approaches are label-based access (column names and row index labels) and integer position-based access.
1. Using .loc for label-based indexing:
.loc allows you to access data using labels. This is generally preferred when you know the names of the columns and indices you want to access.
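import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
print(df.loc['B', 'col2'])  # Single value at row 'B', column 'col2'. Output: 5
print(df.loc[:, 'col1'])  # All rows of 'col1', returned as a Series:
# A    1
# B    2
# C    3
# Name: col1, dtype: int64
print(df.loc[:, ['col1', 'col3']])  # All rows of two columns, as a DataFrame
print(df.loc['A'])  # Row 'A', returned as a Series
print(df.loc['A':'B', 'col1':'col2'])  # Label slices include both endpoints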
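Two details of .loc are easy to miss: label slices include the end label, and asking for a label that does not exist raises a KeyError. The sketch below illustrates both on the same DataFrame; the names by_label and missing_raised are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]},
    index=['A', 'B', 'C'],
)

# Label slices are inclusive: 'A':'B' returns rows A and B,
# and 'col1':'col2' returns both columns.
by_label = df.loc['A':'B', 'col1':'col2']
print(by_label)

# A label that is not in the index raises KeyError.
missing_raised = False
try:
    df.loc['Z']
except KeyError:
    missing_raised = True
print(missing_raised)
```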
2. Using .iloc for integer-based indexing:
.iloc uses integer positions to access data. This is useful when you need to select data based on its position regardless of labels.
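import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.iloc[1, 1])  # Value at row position 1, column position 1. Output: 5
print(df.iloc[:, 0])  # All rows of the first column, returned as a Series:
# 0    1
# 1    2
# 2    3
# Name: col1, dtype: int64
print(df.iloc[:, [0, 2]])  # All rows of the first and third columns
print(df.iloc[0])  # First row, returned as a Series
print(df.iloc[0:2, 0:2])  # Position slices exclude the stop position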
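To see that .iloc really ignores labels, the sketch below applies it to a DataFrame with string row labels. It also shows that position slices follow Python's half-open convention and that negative positions count from the end. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]},
    index=['A', 'B', 'C'],
)

# Positions 0 and 1 only: the stop position 2 is excluded.
first_two = df.iloc[0:2, 0:2]
print(first_two)

# Negative positions count from the end, as with Python lists.
last_row = df.iloc[-1]
print(last_row)
```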
3. Using [] for flexible indexing:
The square bracket notation [] is more flexible, and its behavior depends on the input. A column label or a list of column labels selects columns by label, similar to .loc, while an integer slice selects rows by position, like .iloc. Because of this, it is generally recommended to use .loc and .iloc explicitly for clarity and to avoid ambiguity.
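import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data)
print(df['col1'])  # A single label selects one column as a Series
print(df[['col1', 'col3']])  # A list of labels selects columns as a DataFrame
print(df[0:2])  # An integer slice selects rows by position (rows 0 and 1), not by label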
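The ambiguity of [] is easier to see on a DataFrame with string row labels. The sketch below pairs each [] call with what appears to be its explicit .loc or .iloc equivalent; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]},
    index=['A', 'B', 'C'],
)

# A single label is looked up among the columns.
cols = df['col1']              # same as df.loc[:, 'col1']

# An integer slice selects rows by position, stop excluded.
rows_by_position = df[0:2]     # same as df.iloc[0:2]

# A label slice selects rows by label, end label included.
rows_by_label = df['A':'B']    # same as df.loc['A':'B']

# A single row label is NOT looked up among the rows.
row_label_failed = False
try:
    df['A']
except KeyError:
    row_label_failed = True
print(row_label_failed)
```

Writing the .loc or .iloc form makes the intent visible to the reader without requiring them to know these dispatch rules.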
Boolean Indexing
Boolean indexing allows you to select rows based on a condition. This is incredibly useful for filtering data.
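import pandas as pd
data = {'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
print(df[df['col1'] > 2])  # Rows where col1 is greater than 2
# Combine conditions with & and wrap each condition in parentheses
print(df[(df['col1'] > 2) & (df['col2'] < 40)])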
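A condition such as df['col1'] > 2 produces a boolean Series (a mask), and that mask can be stored, combined, negated, or passed to .loc. The following sketch reuses the same data; the names mask, either, negated, and picked are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]})

# The mask is a boolean Series aligned with the rows of df.
mask = (df['col1'] > 2) & (df['col2'] < 40)
print(mask.tolist())

# | means "or"; each condition still needs its own parentheses.
either = df[(df['col1'] < 2) | (df['col2'] > 40)]
print(either)

# ~ negates a mask.
negated = df[~mask]
print(negated)

# .loc accepts a mask for rows together with a column label.
picked = df.loc[mask, 'col2']
print(picked)
```

Python's `and`, `or`, and `not` do not work element-wise on a Series, which is why the operators `&`, `|`, and `~` are used here.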
Indexing with .at and .iat
For accessing single elements, .at (label-based) and .iat (integer-based) offer optimized access:
import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
print(df.at['B', 'col2'])  # Scalar lookup by label. Output: 5
print(df.iat[1, 1])  # Scalar lookup by position. Output: 5
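As a rough check of the "optimized access" claim, the sketch below confirms that .at and .iat return the same scalar as .loc and .iloc, then times each accessor with the standard-library timeit module. The repetition count is an arbitrary choice, and the timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=['A', 'B', 'C'])

# All four accessors return the same single value.
via_at = df.at['B', 'col2']
via_loc = df.loc['B', 'col2']
via_iat = df.iat[1, 1]
via_iloc = df.iloc[1, 1]
print(via_at, via_loc, via_iat, via_iloc)

# Time 10,000 scalar lookups with each accessor (results are machine-dependent).
timings = {
    'at': timeit.timeit(lambda: df.at['B', 'col2'], number=10_000),
    'loc': timeit.timeit(lambda: df.loc['B', 'col2'], number=10_000),
    'iat': timeit.timeit(lambda: df.iat[1, 1], number=10_000),
    'iloc': timeit.timeit(lambda: df.iloc[1, 1], number=10_000),
}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s')
```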