Boolean Indexing in NumPy

numpy
Published

February 19, 2024

Understanding Boolean Indexing

Boolean indexing leverages Boolean arrays (arrays containing only True and False values) to filter elements from another array. The index array’s shape must be compatible with the array being indexed. A compatible shape means that either the index array has the same shape as the array being indexed, or it has a shape that’s broadcastable to match.

Let’s illustrate with a simple example:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])
bool_index = np.array([True, False, True, False, True])

result = arr[bool_index]
print(result)  # Output: [10 30 50]

In this example, bool_index selects elements where the corresponding value is True.

Creating Boolean Arrays with Comparison Operators

The power of Boolean indexing truly shines when you generate the Boolean array dynamically using comparison operators. These operators directly compare array elements against a value or another array, resulting in a Boolean array:

arr = np.array([10, 20, 30, 40, 50])

greater_than_25 = arr > 25
print(greater_than_25)  # Output: [False False  True  True  True]

result = arr[greater_than_25]
print(result)  # Output: [30 40 50]

within_range = (arr >= 20) & (arr <= 40) # Combining conditions with & (and) and | (or)
print(within_range) # Output: [False  True  True  True False]
result = arr[within_range]
print(result) # Output: [20 30 40]

This demonstrates how to easily select subsets of your data based on specific criteria. Note the use of & (logical AND) and | (logical OR) to combine multiple conditions.

Multi-Dimensional Boolean Indexing

Boolean indexing isn’t limited to one-dimensional arrays. It extends seamlessly to multi-dimensional arrays:

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

greater_than_5 = arr_2d > 5
print(greater_than_5)
#Output:
#[[False False False]

result = arr_2d[greater_than_5]
print(result) # Output: [6 7 8 9]

#Selecting rows based on a condition on a column
row_condition = arr_2d[:,0] > 3
print(row_condition) # Output: [False  True  True]
result = arr_2d[row_condition,:]
print(result) #Output: [[4 5 6] [7 8 9]]

Here, we demonstrate how to select elements that meet the criteria across rows and columns.

Modifying Arrays with Boolean Indexing

Boolean indexing isn’t just for selection; you can also use it to modify array elements.

arr = np.array([10, 20, 30, 40, 50])
arr[arr > 25] = 100
print(arr) # Output: [ 10  20 100 100 100]

This example shows how to efficiently update elements fulfilling a condition.

Advanced Techniques: np.where()

NumPy’s np.where() function provides a concise way to perform conditional assignments:

arr = np.array([1,2,3,4,5])
arr_new = np.where(arr>2, 100, arr)
print(arr_new) # Output: [  1   2 100 100 100]

np.where(condition, x, y) assigns x where the condition is true and y where it’s false. This adds another layer of flexibility to your Boolean indexing operations.