Pandas Max

Published September 12, 2023

Finding the Maximum Value in a Single Pandas Series

Let’s start with the simplest case: finding the maximum value within a single column (Series) of your DataFrame.

import pandas as pd

data = {'col1': [1, 5, 2, 8, 3]}
df = pd.DataFrame(data)

max_value = df['col1'].max()
print(f"The maximum value in 'col1' is: {max_value}")

Calling max() on the ‘col1’ Series returns its largest value as a single scalar, which is 8 for this data.

Finding Maximum Values Across Multiple Columns

What if you need the maximum value across several columns? Pandas makes this easy too.

import pandas as pd

data = {'col1': [1, 5, 2, 8, 3], 'col2': [10, 2, 15, 4, 6], 'col3': [7, 9, 1, 3, 12]}
df = pd.DataFrame(data)

# Method 1: row-wise maximum using axis=1
row_max = df.max(axis=1)
print("Maximum values across each row:\n", row_max)

# Method 2: the same result using apply() with a lambda (slower on large DataFrames)
row_max_method2 = df.apply(lambda row: row.max(), axis=1)
print("\nMaximum values across each row (using apply):\n", row_max_method2)

# Overall maximum across every cell of the DataFrame
overall_max = df.to_numpy().max()
print(f"\nThe overall maximum value in the DataFrame is: {overall_max}")

Here, we explore two approaches to the row-wise maximum: passing axis=1 to max(), and using apply(), which is mainly useful when you need custom row-wise logic. We also show how to get the single largest value in the entire DataFrame.
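The row-wise case above has a column-wise counterpart. The following is a minimal sketch, reusing the same sample data, of how axis=0 (the default) returns one maximum per column, and how chaining max() twice is another way to reach the overall maximum.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 5, 2, 8, 3],
    'col2': [10, 2, 15, 4, 6],
    'col3': [7, 9, 1, 3, 12],
})

# axis=0 (the default) reduces down each column: one maximum per column.
col_max = df.max(axis=0)
print("Maximum value in each column:\n", col_max)

# The maximum of the per-column maxima is the overall maximum.
overall_max = df.max().max()
print(f"Overall maximum: {overall_max}")
```

The result of df.max(axis=0) is a Series indexed by column name, so individual maxima can be read with, for example, col_max['col2'].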

Handling Missing Data (NaN)

By default, max() skips missing values (NaN), but the skipna parameter lets you change that. Let’s see how both behaviors work.

import pandas as pd
import numpy as np

data = {'col1': [1, 5, np.nan, 8, 3]}
df = pd.DataFrame(data)

# NaN is ignored by default (skipna=True)
max_value_with_nan = df['col1'].max()
print(f"Maximum value in 'col1' (ignoring NaN): {max_value_with_nan}")

# Explicitly skip NaN (same result as the default)
max_value_skipping_nan = df['col1'].max(skipna=True)
print(f"Maximum value in 'col1' (explicitly skipping NaN): {max_value_skipping_nan}")

# With skipna=False, any NaN in the column makes the result NaN
max_value_including_nan = df['col1'].max(skipna=False)
print(f"Maximum value in 'col1' (including NaN): {max_value_including_nan}")

This demonstrates how the skipna parameter controls the handling of missing values: the default skips them, while skipna=False propagates NaN so that missing data cannot go unnoticed.
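One edge case is worth a quick check: a column that contains nothing but NaN. The sketch below uses two small illustrative Series (not from the example above) to show that skipping NaN leaves nothing to compare, so the result is still NaN, and that isna() can be used to detect missing values before taking the maximum.

```python
import numpy as np
import pandas as pd

partly_missing = pd.Series([1, 5, np.nan, 8, 3])
all_missing = pd.Series([np.nan, np.nan], dtype='float64')

# Default behaviour: the NaN is skipped and the largest real value is returned.
max_default = partly_missing.max()

# Every value is NaN, so there is nothing left after skipping: the result is NaN.
max_all_missing = all_missing.max()

# Check for missing values explicitly before relying on the maximum.
has_missing = partly_missing.isna().any()

print(max_default, max_all_missing, has_missing)
```

Note that the presence of NaN forces the column to a float dtype, which is why the maximum comes back as 8.0 rather than 8.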

Finding the Maximum Value with a Condition

You can combine max() with boolean indexing for more sophisticated selection.

import pandas as pd

data = {'col1': [1, 5, 2, 8, 3], 'col2': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)

# Select 'col1' for the rows where 'col2' is 'A' in a single .loc call, then take the maximum
max_value_condition = df.loc[df['col2'] == 'A', 'col1'].max()
print(f"Maximum value in 'col1' where 'col2' is 'A': {max_value_condition}")

This example shows how to find the maximum value in ‘col1’ only for rows where ‘col2’ is equal to ‘A’.
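If you need the conditional maximum for every category rather than just one, repeating the filter for each value gets tedious. As a possible alternative, here is a minimal sketch using groupby() on the same sample data to compute the maximum of ‘col1’ for each distinct value of ‘col2’ in one pass.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 5, 2, 8, 3],
    'col2': ['A', 'B', 'A', 'C', 'B'],
})

# Group the rows by 'col2', then take the maximum of 'col1' within each group.
group_max = df.groupby('col2')['col1'].max()
print("Maximum of 'col1' for each value of 'col2':\n", group_max)

# A single group's maximum can be looked up by its label.
max_for_a = group_max['A']
print(f"Maximum value in 'col1' where 'col2' is 'A': {max_for_a}")
```

The lookup group_max['A'] gives the same value as the boolean-indexing example above.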

Beyond the Basics: idxmax()

While max() provides the maximum value, idxmax() gives you the index label of that maximum value. If the maximum occurs more than once, the label of its first occurrence is returned.

import pandas as pd

data = {'col1': [1, 5, 2, 8, 3]}
df = pd.DataFrame(data)

max_index = df['col1'].idxmax()
print(f"The index of the maximum value in 'col1' is: {max_index}")

This is particularly helpful when you need to locate the row containing the maximum value within your DataFrame.
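To make that concrete, here is a short sketch that passes the label returned by idxmax() to .loc to retrieve the whole row. The string row labels ('r0' to 'r4') and the extra ‘col2’ column are illustrative assumptions, added to show that idxmax() returns a label rather than a position.

```python
import pandas as pd

# Hypothetical row labels, chosen to distinguish labels from integer positions.
df = pd.DataFrame(
    {'col1': [1, 5, 2, 8, 3], 'col2': ['A', 'B', 'A', 'C', 'B']},
    index=['r0', 'r1', 'r2', 'r3', 'r4'],
)

# idxmax() returns the index label of the first occurrence of the maximum.
max_label = df['col1'].idxmax()

# .loc looks rows up by label, so it pairs naturally with idxmax().
max_row = df.loc[max_label]

print(f"Label of the row with the largest 'col1': {max_label}")
print("Full row:\n", max_row)
```

Because idxmax() returns a label, use .loc rather than .iloc to fetch the row; the two only coincide when the index is the default 0, 1, 2, ... range.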