Grouping by Multiple Columns: The Basics
The core concept remains the same: you specify the columns to group by as a list within the groupby()
method. Pandas will then create groups based on the unique combinations of values in those columns.
Let’s start with a simple example. Imagine you have a dataset of sales transactions:
import pandas as pd
= {'Region': ['North', 'North', 'South', 'South', 'East', 'East'],
data 'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
'Sales': [100, 150, 200, 250, 120, 180]}
= pd.DataFrame(data)
df print(df)
This will output:
Region Product Sales
0 North A 100
1 North B 150
2 South A 200
3 South B 250
4 East A 120
5 East B 180
To group by both Region
and Product
, we use:
= df.groupby(['Region', 'Product'])
grouped print(grouped)
This doesn’t display the grouped data directly. It creates a GroupBy
object. To see the actual results, we need to apply an aggregation function:
= grouped['Sales'].sum()
grouped_sum print(grouped_sum)
This outputs:
Region Product
East A 120
B 180
North A 100
B 150
South A 200
B 250
Name: Sales, dtype: int64
This shows the total sales for each combination of Region and Product.
Beyond sum()
: Other Aggregation Functions
The groupby()
method works seamlessly with other aggregation functions like mean()
, count()
, max()
, min()
, std()
, etc.
= grouped['Sales'].mean()
grouped_mean print(grouped_mean)
= grouped['Sales'].count()
grouped_count print(grouped_count)
Applying Multiple Aggregations at Once
Instead of calling multiple aggregation functions separately, you can use agg()
to apply several at once:
= grouped['Sales'].agg(['sum', 'mean', 'count'])
aggregated print(aggregated)
This provides a concise summary of sales data for each group.
Handling Missing Values
If your dataset contains missing values (NaN), Pandas handles them intelligently during aggregation. By default, sum()
and mean()
ignore NaN values. You can control this behavior using the dropna
parameter in certain aggregation functions if needed.
Grouping and More Complex Operations
The possibilities extend far beyond simple aggregations. You can use groupby()
as a stepping stone for more complex data manipulations, filtering, transformations, and more. The key is to understand how to use the GroupBy
object to perform these operations efficiently on your grouped data. This will be explored in future posts.