NumPy Sum Function

numpy
Published

July 16, 2024

Basic Summation with NumPy’s sum()

The simplest use of sum() involves summing all elements within a NumPy array.

import numpy as np

my_array = np.array([1, 2, 3, 4, 5])
total = np.sum(my_array)
print(f"The sum of the array is: {total}")  # Output: The sum of the array is: 15

This is functionally equivalent to Python’s built-in sum(), but NumPy’s version is significantly faster for larger datasets.

Summing Along Specific Axes

NumPy arrays can be multi-dimensional. The power of sum() truly shines when dealing with these structures. The axis parameter allows you to specify which dimension to sum along.

Let’s consider a 2D array:

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_sums = np.sum(matrix, axis=0)
print(f"Column sums: {column_sums}")  # Output: Column sums: [12 15 18]

row_sums = np.sum(matrix, axis=1)
print(f"Row sums: {row_sums}")  # Output: Row sums: [ 6 15 24]

total_sum = np.sum(matrix, axis=(0,1)) #Summing across both axes
print(f"Total sum across both axes: {total_sum}") #Output: Total sum across both axes: 45

This demonstrates the flexibility to calculate sums across rows, columns, or the entire array.

Handling Missing Data (NaN)

NumPy’s sum() cleverly handles NaN (Not a Number) values. By default, if a NaN is present, the sum will also be NaN. However, the nan_to_num() function can be used in conjunction with sum() to replace NaN values with 0 before summation.

array_with_nan = np.array([1, 2, np.nan, 4, 5])

sum_with_nan = np.sum(array_with_nan)
print(f"Sum with NaN: {sum_with_nan}")  # Output: Sum with NaN: nan

sum_without_nan = np.sum(np.nan_to_num(array_with_nan))
print(f"Sum after NaN replacement: {sum_without_nan}")  # Output: Sum after NaN replacement: 12.0

Beyond Basic Sums: keepdims Parameter

The keepdims parameter is crucial when performing summation along axes. Setting it to True ensures that the output array maintains the same number of dimensions as the input, even after reducing a dimension through summation. This is very handy for broadcasting operations later on.

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_sums_keepdims = np.sum(matrix, axis=0, keepdims=True)
print(f"Column sums (keepdims=True):\n{column_sums_keepdims}")
#Output: Column sums (keepdims=True):

Notice the extra set of square brackets indicating the retained dimension.

Working with Other Data Types

NumPy’s sum() seamlessly handles various data types, including integers, floats, and complex numbers. The output data type will generally match the input type.

These examples illustrate the flexibility and efficiency of NumPy’s sum() function, making it an essential tool for any data scientist or programmer working with numerical data in Python.