DataFrame loc

pandas
Published

March 25, 2024

Understanding .loc

The .loc accessor in Pandas allows you to select data from a DataFrame using labels (index and column names). This differs from .iloc, which uses integer-based indexing. .loc offers flexibility and readability, especially when working with named indices and columns.

Basic Selection:

Let’s start with a simple example:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}

df = pd.DataFrame(data)
print(df)

print("\nRow labeled 'Alice':\n", df.loc['Alice'])

#Select single column
print("\n'Age' column:\n",df.loc[:,"Age"])


#Select multiple columns
print("\n'Age' and 'City' columns:\n",df.loc[:,['Age','City']])

This demonstrates selecting a single row by its index label (‘Alice’) and selecting specific columns (‘Age’, ‘City’). Note that .loc requires labels, not numerical positions.

Slicing with .loc

.loc enables slicing similar to Python lists, but using labels:

print("\nRows from 'Bob' to 'David':\n", df.loc['Bob':'David'])

print("\nRows 0-2 (inclusive) using labels:\n", df.loc[:2])

#Select specific rows and columns
print("\nRows from 'Bob' to 'Charlie', 'Age' and 'City' columns:\n", df.loc['Bob':'Charlie',['Age','City']])

Boolean Indexing with .loc

A powerful feature of .loc is the ability to select rows based on boolean conditions:

print("\nRows where Age > 25:\n", df.loc[df['Age'] > 25])

print("\nRows where City is 'Paris' or 'Tokyo':\n", df.loc[(df['City'] == 'Paris') | (df['City'] == 'Tokyo')])

This allows for complex filtering of your data based on multiple criteria.

Setting Values with .loc

.loc is not just for selection; it’s also used for assigning new values:

df.loc['Alice', 'Age'] = 26
print("\nDataFrame after changing Alice's age:\n", df)

#Change multiple values
df.loc[df['Age']>25,'Age']=30
print("\nDataFrame after changing ages >25:\n", df)

This provides a concise way to modify specific data points within your DataFrame.

Handling Multiple Indices

.loc seamlessly handles DataFrames with multiple indices:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df_multi = pd.DataFrame(np.random.randn(8, 4), index=index)


print(df_multi.loc[('bar',)])

#Select specific rows
print(df_multi.loc[('bar','one'),:])

#Select rows and columns
print(df_multi.loc[('bar','one'),0:2])