Why Use .query()
?
The primary benefit of .query()
is its enhanced readability. Instead of constructing complex boolean expressions directly within bracket notation, .query()
allows you to express your filtering criteria as a string. This leads to code that’s easier to understand, write, and maintain, particularly when dealing with intricate selection logic. Furthermore, for larger datasets, .query()
can offer performance improvements compared to direct boolean indexing, especially when the query involves multiple conditions.
Basic Usage
Let’s start with a simple example. Suppose we have a Pandas DataFrame:
import pandas as pd
= {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
data 'Age': [25, 30, 22, 28],
'City': ['New York', 'London', 'Paris', 'Tokyo']}
= pd.DataFrame(data)
df print(df)
To select individuals older than 25, we can use .query()
like this:
= df.query('Age > 25')
older_than_25 print(older_than_25)
This concisely expresses our filtering condition. The result will be a DataFrame containing only Bob and David’s information.
Multiple Conditions
.query()
elegantly handles multiple conditions using logical operators:
= df.query('Age < 30 and City == "London"')
young_londoners print(young_londoners)
This selects individuals younger than 30 who live in London. Note the use of and
– &
is also acceptable, offering more flexibility for combining complex boolean expressions within the query string. Similarly, or
(or |
) can be used.
Using Variables within Queries
One of .query()
’s strengths is its ability to incorporate variables from the surrounding scope:
= 26
age_threshold = 'Paris'
city_to_find
= df.query('Age > @age_threshold or City == @city_to_find')
filtered_df print(filtered_df)
The @
symbol prefixes variables from the surrounding Python environment, making the queries dynamic and reusable.
Handling Special Characters
If your column names contain spaces or other special characters, you’ll need to use backticks to enclose them in the query string:
= {'Name and Age': ['Alice', 'Bob'],
data 'City of Residence': ['New York', 'London']}
= pd.DataFrame(data)
df2 = df2.query('`Name and Age` == "Alice"')
result print(result)
In-Place Modification
To modify the DataFrame directly without creating a copy, use the inplace=True
argument (use with caution):
'Age < 25', inplace=True)
df.query(print(df)
Beyond Basic Filtering
The .query()
method isn’t limited to simple comparisons. You can use more complex operations within your query strings, including string methods and regular expressions (using the str
accessor), providing substantial flexibility for advanced data filtering. Experimentation is key to unlocking its full potential.