Understanding .loc
The .loc accessor in Pandas allows you to select data from a DataFrame using labels (index and column names). This differs from .iloc, which uses integer-based indexing. .loc offers flexibility and readability, especially when working with named indices and columns.
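The difference between the two accessors is easiest to see when the index labels are integers that do not match the row positions. The following is a minimal sketch; the `orders` frame and its values are hypothetical and not part of the examples in this article.

```python
import pandas as pd

# Hypothetical frame: integer labels 10, 20, 30 sit at positions 0, 1, 2
orders = pd.DataFrame({"item": ["pen", "ink", "pad"]}, index=[10, 20, 30])

by_label = orders.loc[10, "item"]    # row whose label is 10
by_position = orders.iloc[0, 0]      # first row, first column

# .loc looks for a label, so asking for 0 fails: no row is labeled 0
try:
    orders.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_label, by_position, label_zero_exists)
```

Both lookups return the same cell here, but only because label 10 happens to sit at position 0.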
Basic Selection:
Let’s start with a simple example:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
# Use 'Name' as the index so that rows can be selected by name
df = pd.DataFrame(data).set_index('Name')
print(df)

# Select a single row by its index label
print("\nRow labeled 'Alice':\n", df.loc['Alice'])
# Select a single column
print("\n'Age' column:\n", df.loc[:, 'Age'])
# Select multiple columns
print("\n'Age' and 'City' columns:\n", df.loc[:, ['Age', 'City']])
This demonstrates selecting a single row by its index label (‘Alice’) and selecting specific columns (‘Age’, ‘City’). Note that .loc interprets its arguments as labels, not integer positions, so df.loc['Alice'] works only when the names form the DataFrame’s index.
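The shape of the result depends on how the labels are passed. The sketch below assumes a small hypothetical `people` frame indexed by name: a single label returns a Series, a list of labels returns a DataFrame, and a row label plus a column label returns a scalar.

```python
import pandas as pd

# Hypothetical two-row frame indexed by name
people = pd.DataFrame(
    {"Age": [25, 30], "City": ["New York", "London"]},
    index=pd.Index(["Alice", "Bob"], name="Name"),
)

row = people.loc["Alice"]           # one label -> Series
rows = people.loc[["Alice"]]        # list of labels -> DataFrame
cell = people.loc["Alice", "Age"]   # row and column label -> scalar

print(type(row).__name__, type(rows).__name__, cell)
```

Passing a one-element list is a convenient way to keep a DataFrame when later code expects one.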
Slicing with .loc
.loc enables slicing similar to Python lists, but using labels. Unlike list slices, label slices include both the start and the stop label:
# Label slices include both endpoints
print("\nRows from 'Bob' to 'David':\n", df.loc['Bob':'David'])
print("\nRows up to and including 'Charlie':\n", df.loc[:'Charlie'])
# Select specific rows and columns
print("\nRows from 'Bob' to 'Charlie', 'Age' and 'City' columns:\n", df.loc['Bob':'Charlie', ['Age', 'City']])
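One detail that often surprises newcomers is that a label slice keeps its stop label, while a positional slice with .iloc drops its stop position. A minimal sketch, using a hypothetical `people` frame that mirrors the names above:

```python
import pandas as pd

# Hypothetical frame indexed by name
people = pd.DataFrame(
    {"Age": [25, 30, 22, 28]},
    index=["Alice", "Bob", "Charlie", "David"],
)

label_slice = people.loc["Bob":"Charlie"]   # stop label 'Charlie' is included
position_slice = people.iloc[1:2]           # stop position 2 is excluded

print(list(label_slice.index))
print(list(position_slice.index))
```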
Boolean Indexing with .loc
A powerful feature of .loc is the ability to select rows based on boolean conditions:
print("\nRows where Age > 25:\n", df.loc[df['Age'] > 25])
print("\nRows where City is 'Paris' or 'Tokyo':\n", df.loc[(df['City'] == 'Paris') | (df['City'] == 'Tokyo')])
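This allows for complex filtering of your data based on multiple criteria. Conditions are combined with & (and), | (or), and ~ (not), and each comparison needs its own parentheses because these operators bind more tightly than comparison operators.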
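As a further sketch of boolean filtering, the conditions can be stored in named masks, and a column label can be passed alongside the mask to return only part of each matching row. The `people` frame below is a hypothetical stand-in for the example data, and `Series.isin` is used as a shorter alternative to chaining several equality tests.

```python
import pandas as pd

# Hypothetical frame indexed by name
people = pd.DataFrame(
    {"Age": [25, 30, 22, 28],
     "City": ["New York", "London", "Paris", "Tokyo"]},
    index=["Alice", "Bob", "Charlie", "David"],
)

in_cities = people["City"].isin(["Paris", "Tokyo"])
over_25 = people["Age"] > 25

# Rows matching both conditions, returning only the 'Age' column
both = people.loc[in_cities & over_25, "Age"]
# Rows that do not match the city condition
others = people.loc[~in_cities]

print(both)
print(others)
```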
Setting Values with .loc
.loc is not just for selection; it’s also used for assigning new values:
df.loc['Alice', 'Age'] = 26
print("\nDataFrame after changing Alice's age:\n", df)
# Change multiple values
df.loc[df['Age'] > 25, 'Age'] = 30
print("\nDataFrame after changing ages >25:\n", df)
This provides a concise way to modify specific data points within your DataFrame.
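Assignment through .loc is not limited to single cells. The sketch below, on a hypothetical `people` frame, updates two columns of one row, appends a row by assigning to a label that does not exist yet, and fills a new column conditionally. The names 'Erin' and 'Senior' are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame indexed by name
people = pd.DataFrame(
    {"Age": [25, 30], "City": ["New York", "London"]},
    index=["Alice", "Bob"],
)

# Update two columns of one row at once
people.loc["Bob", ["Age", "City"]] = [31, "Paris"]
# Assigning to a label that does not exist yet appends a new row
people.loc["Erin"] = [40, "Oslo"]
# Create a column with a default, then overwrite it where a condition holds
people["Senior"] = False
people.loc[people["Age"] >= 40, "Senior"] = True

print(people)
```

Writing through a single .loc call, rather than chaining two indexing steps, helps ensure the assignment reaches the original DataFrame and not a temporary copy.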
Handling Multiple Indices
.loc also handles DataFrames with a hierarchical index (MultiIndex), where each row label is a tuple:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df_multi = pd.DataFrame(np.random.randn(8, 4), index=index)

# Select all rows whose first-level label is 'bar'
print(df_multi.loc[('bar',)])
# Select a specific row
print(df_multi.loc[('bar', 'one'), :])
# Select a row and a label-based column slice (columns 0 through 2, inclusive)
print(df_multi.loc[('bar', 'one'), 0:2])
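Because the frame above is filled with random numbers, its output changes on every run. The following sketch uses a smaller hypothetical frame, `df_small`, with fixed values and named columns so the results can be checked by eye. It also uses `pd.IndexSlice` to select on the inner level, which assumes the index is sorted (as it is when built with `MultiIndex.from_product` from sorted lists).

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
# Fixed values 0..7 instead of random numbers
df_small = pd.DataFrame(
    np.arange(8).reshape(4, 2), index=index, columns=["a", "b"]
)

# All rows under the outer label 'bar'; the 'first' level is dropped
bar_rows = df_small.loc["bar"]
# A single cell: full row key plus a column label
cell = df_small.loc[("baz", "two"), "b"]
# Every row whose inner label is 'one', across all outer labels
idx = pd.IndexSlice
ones = df_small.loc[idx[:, "one"], :]

print(bar_rows)
print(cell)
print(ones)
```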