Selecting Rows in DataFrame

pandas
Published

January 24, 2024

Using .loc for Label-Based Selection

The .loc accessor is your go-to method when selecting rows based on their labels (index values). It’s intuitive and highly versatile.

import pandas as pd

data = {'col1': [1, 2, 3, 4, 5],
        'col2': [6, 7, 8, 9, 10]}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

print(df.loc['A'])

print(df.loc[['A', 'C', 'E']])

print(df.loc['B':'D']) # Inclusive of 'D'

.loc also allows for boolean indexing, enabling powerful row selection based on conditions.

print(df.loc[df['col1'] > 2])

print(df.loc[(df['col1'] > 2) & (df['col2'] < 9)])

Using .iloc for Integer-Based Selection

.iloc provides integer-based indexing, allowing you to select rows based on their position in the DataFrame. This is particularly useful when you don’t rely on the DataFrame’s index.

print(df.iloc[0])

print(df.iloc[1:4])

print(df.iloc[::2])

Similar to .loc, .iloc can be combined with boolean indexing for more complex selection. However, the boolean array must align with the integer positions.

boolean_array = [False, True, False, True, False]
print(df.iloc[boolean_array])

Using .at and .iat for Single Element Selection

For accessing single elements (a single row and single column), .at (label-based) and .iat (integer-based) offer the most efficient approach. Avoid using .loc or .iloc for single-element access; these are significantly slower.

print(df.at['B', 'col1'])

print(df.iat[2, 0])

Filtering with query() for Readable Code

The query() method offers a highly readable way to filter rows based on conditions, especially with complex queries.

print(df.query('col1 > 2'))

print(df.query('col1 > 2 and col2 < 9'))

This method significantly improves code readability, making it easier to understand and maintain complex filtering operations. Note that column names containing spaces will need to be quoted within the query string.

These techniques empower you to effectively and efficiently select rows in your Pandas DataFrames, significantly enhancing your data analysis workflow. Experiment with these methods to find the approach best suited to your specific needs and data structure.