Understanding NumPy’s mean() Function
The mean() function, part of the NumPy library, computes the arithmetic mean (average) of a given array or data set. It offers flexibility by allowing you to calculate means across different axes or dimensions of multi-dimensional arrays. This makes it incredibly versatile for various data analysis tasks.
Basic Usage: Calculating the Mean of a 1D Array
Let’s start with the simplest scenario: calculating the mean of a one-dimensional array.
import numpy as np
data = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(data)
print(f"The mean is: {mean_value}") # Output: The mean is: 3.0This code snippet demonstrates the basic application of np.mean(). The function takes the array data as input and returns its mean.
Handling Multi-Dimensional Arrays: Specifying the Axis
The true power of np.mean() shines when dealing with multi-dimensional arrays. The axis parameter allows you to specify along which axis (dimension) to calculate the mean.
import numpy as np
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_mean = np.mean(data, axis=0)
print(f"Row-wise mean: {row_mean}") # Output: Row-wise mean: [4. 5. 6.]
column_mean = np.mean(data, axis=1)
print(f"Column-wise mean: {column_mean}") # Output: Column-wise mean: [2. 5. 8.]
overall_mean = np.mean(data)
print(f"Overall mean: {overall_mean}") # Output: Overall mean: 5.0This example showcases how axis=0 computes the mean along each column (resulting in a row vector), axis=1 computes the mean along each row (resulting in a column vector), and omitting axis computes the mean of the entire array.
Handling Missing Data (NaN values)
NumPy’s mean() function intelligently handles NaN (Not a Number) values, which often represent missing data in real-world datasets. By default, np.mean() will return NaN if any NaN values are present. However, you can use the nanmean() function to ignore these NaN values:
import numpy as np
data = np.array([1, 2, np.nan, 4, 5])
mean_with_nan = np.mean(data)
print(f"Mean with NaN: {mean_with_nan}") # Output: Mean with NaN: nan
mean_without_nan = np.nanmean(data)
print(f"Mean without NaN: {mean_without_nan}") # Output: Mean without NaN: 3.0Weighted Means
While np.mean() calculates the arithmetic mean, you can easily calculate weighted means using NumPy’s array operations. This involves creating a weight array and using element-wise multiplication before applying np.sum() and then dividing by the total weight.
import numpy as np
data = np.array([1, 2, 3, 4, 5])
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2]) #Example weights
weighted_mean = np.sum(data * weights) / np.sum(weights)
print(f"Weighted mean: {weighted_mean}") #Output will vary slightly depending on your weightsThis shows how to calculate a weighted mean, a powerful extension of the basic mean calculation. Remember to adjust the weights according to your specific needs.