DataFrame loc – Mastering Python

Understanding `.loc`

The .loc accessor in Pandas allows you to select data from a DataFrame using labels (index and column names). This differs from .iloc, which uses integer-based indexing. .loc offers flexibility and readability, especially when working with named indices and columns.

Basic Selection:

Let’s start with a simple example:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}

df = pd.DataFrame(data)
print(df)

print("\nRow labeled 'Alice':\n", df.loc['Alice'])

#Select single column
print("\n'Age' column:\n",df.loc[:,"Age"])


#Select multiple columns
print("\n'Age' and 'City' columns:\n",df.loc[:,['Age','City']])

This demonstrates selecting a single row by its index label (‘Alice’) and selecting specific columns (‘Age’, ‘City’). Note that .loc requires labels, not numerical positions.

Slicing with `.loc`

.loc enables slicing similar to Python lists, but using labels:

print("\nRows from 'Bob' to 'David':\n", df.loc['Bob':'David'])

print("\nRows 0-2 (inclusive) using labels:\n", df.loc[:2])

#Select specific rows and columns
print("\nRows from 'Bob' to 'Charlie', 'Age' and 'City' columns:\n", df.loc['Bob':'Charlie',['Age','City']])

Boolean Indexing with `.loc`

A powerful feature of .loc is the ability to select rows based on boolean conditions:

print("\nRows where Age > 25:\n", df.loc[df['Age'] > 25])

print("\nRows where City is 'Paris' or 'Tokyo':\n", df.loc[(df['City'] == 'Paris') | (df['City'] == 'Tokyo')])

This allows for complex filtering of your data based on multiple criteria.

Setting Values with `.loc`

.loc is not just for selection; it’s also used for assigning new values:

df.loc['Alice', 'Age'] = 26
print("\nDataFrame after changing Alice's age:\n", df)

#Change multiple values
df.loc[df['Age']>25,'Age']=30
print("\nDataFrame after changing ages >25:\n", df)

This provides a concise way to modify specific data points within your DataFrame.

Handling Multiple Indices

.loc seamlessly handles DataFrames with multiple indices:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df_multi = pd.DataFrame(np.random.randn(8, 4), index=index)


print(df_multi.loc[('bar',)])

#Select specific rows
print(df_multi.loc[('bar','one'),:])

#Select rows and columns
print(df_multi.loc[('bar','one'),0:2])

Understanding .loc

Slicing with .loc

Boolean Indexing with .loc

Setting Values with .loc

Handling Multiple Indices

Understanding `.loc`

Slicing with `.loc`

Boolean Indexing with `.loc`

Setting Values with `.loc`