Pandas Query Method

pandas
Published

June 17, 2024

Why Use .query()?

The primary benefit of .query() is its enhanced readability. Instead of constructing complex boolean expressions directly within bracket notation, .query() allows you to express your filtering criteria as a string. This leads to code that’s easier to understand, write, and maintain, particularly when dealing with intricate selection logic. Furthermore, for larger datasets, .query() can offer performance improvements compared to direct boolean indexing, especially when the query involves multiple conditions.

Basic Usage

Let’s start with a simple example. Suppose we have a Pandas DataFrame:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}

df = pd.DataFrame(data)
print(df)

To select individuals older than 25, we can use .query() like this:

older_than_25 = df.query('Age > 25')
print(older_than_25)

This concisely expresses our filtering condition. The result will be a DataFrame containing only Bob and David’s information.

Multiple Conditions

.query() elegantly handles multiple conditions using logical operators:

young_londoners = df.query('Age < 30 and City == "London"')
print(young_londoners)

This selects individuals younger than 30 who live in London. Note the use of and& is also acceptable, offering more flexibility for combining complex boolean expressions within the query string. Similarly, or (or |) can be used.

Using Variables within Queries

One of .query()’s strengths is its ability to incorporate variables from the surrounding scope:

age_threshold = 26
city_to_find = 'Paris'

filtered_df = df.query('Age > @age_threshold or City == @city_to_find')
print(filtered_df)

The @ symbol prefixes variables from the surrounding Python environment, making the queries dynamic and reusable.

Handling Special Characters

If your column names contain spaces or other special characters, you’ll need to use backticks to enclose them in the query string:

data = {'Name and Age': ['Alice', 'Bob'],
        'City of Residence': ['New York', 'London']}

df2 = pd.DataFrame(data)
result = df2.query('`Name and Age` == "Alice"')
print(result)

In-Place Modification

To modify the DataFrame directly without creating a copy, use the inplace=True argument (use with caution):

df.query('Age < 25', inplace=True)
print(df)

Beyond Basic Filtering

The .query() method isn’t limited to simple comparisons. You can use more complex operations within your query strings, including string methods and regular expressions (using the str accessor), providing substantial flexibility for advanced data filtering. Experimentation is key to unlocking its full potential.