DataFrame Indexing – Mastering Python

Accessing Data: The Basics

Pandas offers several ways to access data within a DataFrame. The most common methods are using labels (column names and row indices) and integer-based location.

1. Using `.loc` for label-based indexing:

.loc allows you to access data using labels. This is generally preferred when you know the names of the columns and indices you want to access.

import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

print(df.loc['B', 'col2'])  # Output: 5

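print(df.loc[:, 'col1'])  # Column 'col1' as a Series: A -> 1, B -> 2, C -> 3

print(df.loc[:, ['col1', 'col3']])  # Columns 'col1' and 'col3' as a DataFrame

print(df.loc['A'])  # Row 'A' as a Series

print(df.loc['A':'B', 'col1':'col2'])  # Label slices include both endpoints: rows A-B, columns col1-col2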
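Two further .loc behaviors follow from it being label-based. A rough sketch, reusing the same example frame: a list of labels returns the rows and columns in the order requested, and a label that is not in the index raises a KeyError. The label 'Z' is only a made-up missing label for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]},
    index=['A', 'B', 'C'],
)

# Lists of labels: the result follows the order of the lists, not of the frame.
subset = df.loc[['C', 'A'], ['col3', 'col1']]
print(subset)

# 'Z' is a hypothetical label that does not exist in the index.
missing_label_raised = False
try:
    df.loc['Z']
except KeyError:
    missing_label_raised = True
print('KeyError raised for missing label:', missing_label_raised)
```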

2. Using `.iloc` for integer-based indexing:

.iloc uses integer positions to access data. This is useful when you need to select data based on its position regardless of labels.

import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data)

print(df.iloc[1, 1])  # Output: 5

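print(df.iloc[:, 0])  # First column as a Series: 0 -> 1, 1 -> 2, 2 -> 3

print(df.iloc[:, [0, 2]])  # First and third columns as a DataFrame

print(df.iloc[0])  # First row as a Series

print(df.iloc[0:2, 0:2])  # Integer slices exclude the stop: rows 0-1, columns 0-1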
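The difference between labels and positions is easiest to see when the index labels are themselves integers. The sketch below assumes a small frame with the made-up labels 10, 20, and 30, which is not part of the examples above.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
df = pd.DataFrame({'col1': [1, 2, 3]}, index=[10, 20, 30])

by_label = df.loc[10, 'col1']     # label 10 is the first row
by_position = df.iloc[0, 0]       # position 0 is also the first row

label_slice = df.loc[10:20]       # both endpoints included: labels 10 and 20
position_slice = df.iloc[0:1]     # stop excluded: position 0 only

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Here df.loc[0] would raise a KeyError, because no row carries the label 0, while df.iloc[0] always returns the first row.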

3. Using `[]` for flexible indexing:

The square bracket notation [] behaves differently depending on what you pass in. A single column name returns that column as a Series, a list of column names returns a DataFrame, and a slice selects rows rather than columns. Because the result depends on the type of the input, it is generally recommended to use .loc and .iloc explicitly for clarity and to avoid ambiguity.

import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data)

print(df['col1'])  # One column as a Series

print(df[['col1', 'col3']])  # Two columns as a DataFrame

print(df[0:2])  # A slice selects rows by position (the first two rows), not columns
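The ambiguity mentioned above can be made concrete by checking what each form of [] returns. This is a small sketch using the same example frame: a bare integer is treated as a column label, so df[0] fails here even though df[0:2] works.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

one_column = df['col1']              # string -> Series
two_columns = df[['col1', 'col3']]   # list of strings -> DataFrame
first_rows = df[0:2]                 # slice -> DataFrame of rows

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(first_rows.shape)

# A bare integer is looked up as a column label, and no column is named 0.
integer_key_failed = False
try:
    df[0]
except KeyError:
    integer_key_failed = True
print('df[0] raised KeyError:', integer_key_failed)
```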

Boolean Indexing

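Boolean indexing allows you to select rows based on a condition, which makes it the standard way to filter data. A comparison such as df['col1'] > 2 produces a Series of True/False values, and passing that Series to [] keeps only the rows where it is True. Combine conditions with & (and), | (or), and ~ (not), wrapping each condition in parentheses.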

import pandas as pd

data = {'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

print(df[df['col1'] > 2])  # Rows where col1 is greater than 2

print(df[(df['col1'] > 2) & (df['col2'] < 40)])  # Rows where col1 > 2 and col2 < 40
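The same idea extends to "or" and "not" conditions, and a Boolean mask can also be passed to .loc to filter rows and pick columns in one step. A minimal sketch using the same data; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 20, 30, 40, 50]})

# | means "or"; each condition needs its own parentheses.
mask = (df['col1'] < 2) | (df['col2'] >= 40)
either = df[mask]       # rows 0, 3 and 4
neither = df[~mask]     # ~ inverts the mask: rows 1 and 2

# A mask as the row selector of .loc, plus a column label.
selected = df.loc[df['col1'] > 2, 'col2']

print(either)
print(neither)
print(selected.tolist())
```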

Indexing with `.at` and `.iat`

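For accessing single elements, .at (label-based) and .iat (integer-based) offer optimized access. Each takes exactly one row and one column and returns a scalar, which avoids the overhead .loc and .iloc incur to support slices, lists, and masks: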

import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

print(df.at['B', 'col2'])  # Output: 5
print(df.iat[1, 1])       # Output: 5
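Both accessors can also be used on the left-hand side of an assignment to update a single cell. The sketch below reuses the same frame; the replacement values 50 and 100 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=['A', 'B', 'C'])

df.at['B', 'col2'] = 50   # set by row label and column label
df.iat[0, 0] = 100        # set by row position and column position

print(df.at['B', 'col2'])
print(df.iat[0, 0])
```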
