Understanding NumPy’s unique Function
The numpy.unique function serves a straightforward yet indispensable purpose: it returns a sorted array of the unique elements in an input array. This is useful for data cleaning, analysis, and preprocessing tasks where identifying distinct values is essential. Beyond the unique elements themselves, unique also provides optional outputs, including the index of the first occurrence of each unique element in the input array and the number of times each one appears.
Basic Usage: Extracting Unique Elements
The most basic application of numpy.unique involves obtaining a sorted array of unique values. Consider the following example:
import numpy as np
arr = np.array([1, 2, 2, 3, 4, 4, 4, 5, 1])
unique_elements = np.unique(arr)
print(unique_elements) # Output: [1 2 3 4 5]
This code snippet demonstrates how easily we can extract the unique elements from a NumPy array. The output is a sorted array containing only the distinct values from the input array, eliminating duplicates.
Advanced Usage: Indices and Counts
numpy.unique offers more than just unique elements; it can also report where they first occur and how often they appear. With return_index and return_counts enabled, the function returns three outputs:
unique_elements: The sorted array of unique values.
indices: The indices of the first occurrence of each unique element in the input array.
counts: The number of times each unique element appears in the input array.
Let’s see this in action:
import numpy as np
arr = np.array([1, 2, 2, 3, 4, 4, 4, 5, 1])
unique_elements, indices, counts = np.unique(arr, return_index=True, return_counts=True)
print("Unique elements:", unique_elements) # Output: [1 2 3 4 5]
print("Indices:", indices) # Output: [0 1 3 4 7]
print("Counts:", counts) # Output: [2 2 1 3 1]
Here, return_index=True and return_counts=True enable the function to return the indices and counts, respectively. This allows for more detailed analysis of the unique values within the array.
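The indices and counts are often most useful when combined with the original array. The following is a minimal sketch, using a made-up array (readings is a hypothetical name), of two common follow-ups: picking the most frequent value from the counts, and recovering the unique values in order of first appearance from the indices.

```python
import numpy as np

# Hypothetical sample data, chosen so that sorted order and
# first-seen order differ.
readings = np.array([3, 1, 3, 2, 1, 3])

values, first_idx, counts = np.unique(
    readings, return_index=True, return_counts=True
)

# The most frequent value is the one with the largest count.
most_frequent = values[np.argmax(counts)]

# Sorting the first-occurrence indices restores the order in which
# each value first appeared in the input.
first_seen_order = readings[np.sort(first_idx)]

print(most_frequent)     # 3
print(first_seen_order)  # [3 1 2]
```

Note that np.argmax returns the first maximum, so if several values tie for the highest count, this sketch reports the smallest of them.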
Handling Multi-dimensional Arrays
numpy.unique handles multi-dimensional arrays as well. By default, however, it flattens the array before finding unique elements. To find unique rows or columns instead, specify the axis parameter:
import numpy as np
arr_2d = np.array([[1, 2], [1, 2], [3, 4]])
unique_rows = np.unique(arr_2d, axis=0)
print(unique_rows) # Output: [[1 2] [3 4]]
This example shows how to find unique rows in a 2D array. Specify axis=1 to find unique columns.
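To make the effect of axis concrete, here is a small sketch that compares the default flattened behaviour with axis=0 and axis=1 on the same input. The array grid is a made-up example, not data from elsewhere in this article.

```python
import numpy as np

# Hypothetical 2D array with one repeated row and one repeated column.
grid = np.array([[1, 1, 2],
                 [3, 3, 4],
                 [1, 1, 2]])

flat_unique = np.unique(grid)          # flattens first, then deduplicates
unique_rows = np.unique(grid, axis=0)  # compares whole rows
unique_cols = np.unique(grid, axis=1)  # compares whole columns

print(flat_unique)        # [1 2 3 4]
print(unique_rows.shape)  # (2, 3)
print(unique_cols.shape)  # (3, 2)
```

With axis set, the result keeps the other dimension intact: unique rows still have three columns, and unique columns still have three rows.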
Working with Different Data Types
The unique function works effectively with various data types, including strings and other custom data types supported by NumPy.
import numpy as np
arr_str = np.array(['apple', 'banana', 'apple', 'orange'])
unique_strings = np.unique(arr_str)
print(unique_strings) # Output: ['apple' 'banana' 'orange']
This showcases the versatility of numpy.unique across different data types. Remember that for arrays of custom Python objects, the objects must define appropriate comparison methods (equality and ordering) for the results to be accurate.
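As a rough illustration of the point about custom objects, the sketch below stores instances of a hypothetical Version class in an object-dtype array. It assumes that numpy.unique sorts such an array using Python's own comparison operators, so the class is given ordering and equality methods through a dataclass. Both the class and the sample data are invented for this example.

```python
from dataclasses import dataclass

import numpy as np


# Hypothetical custom type. order=True generates __lt__ and the other
# ordering methods; the dataclass default also generates __eq__.
@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int


# dtype=object keeps the instances as Python objects inside the array.
versions = np.array(
    [Version(1, 2), Version(1, 0), Version(2, 0), Version(1, 2)],
    dtype=object,
)

unique_versions = np.unique(versions)
print(unique_versions.tolist())
```

If the class lacked ordering methods, the sort that numpy.unique performs internally would be expected to fail with a TypeError instead of returning a result.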