Unique Elements: np.unique()
Often, you need to identify the unique elements within a NumPy array. The np.unique() function simplifies this process considerably. It returns a sorted array containing only the unique values.
import numpy as np
arr = np.array([1, 2, 2, 3, 4, 4, 5, 1])
unique_elements = np.unique(arr)
print(unique_elements) # Output: [1 2 3 4 5]np.unique() can also return the indices of the unique elements in the original array using the return_index argument. This is helpful when you need to know the original positions of the unique values.
arr = np.array([1, 2, 2, 3, 4, 4, 5, 1])
unique_elements, indices = np.unique(arr, return_index=True)
print(unique_elements) # Output: [1 2 3 4 5]
print(indices) # Output: [0 1 3 4 6]You can also get the counts of each unique element using the return_counts argument.
arr = np.array([1, 2, 2, 3, 4, 4, 5, 1])
unique_elements, counts = np.unique(arr, return_counts=True)
print(unique_elements) # Output: [1 2 3 4 5]
print(counts) # Output: [2 2 1 2 1]Set Operations: np.intersect1d(), np.union1d(), np.setdiff1d(), np.setxor1d()
NumPy provides functions mirroring standard set operations:
np.intersect1d(arr1, arr2): Returns the common elements between two arrays.
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 5, 6, 7, 8])
intersection = np.intersect1d(arr1, arr2)
print(intersection) # Output: [3 5]np.union1d(arr1, arr2): Returns the union of two arrays (all unique elements from both).
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 5, 6, 7, 8])
union = np.union1d(arr1, arr2)
print(union) # Output: [1 2 3 4 5 6 7 8]np.setdiff1d(arr1, arr2): Returns the elements inarr1that are not inarr2.
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 5, 6, 7, 8])
difference = np.setdiff1d(arr1, arr2)
print(difference) # Output: [1 2 4]np.setxor1d(arr1, arr2): Returns the elements that are in eitherarr1orarr2, but not both (symmetric difference).
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 5, 6, 7, 8])
symmetric_difference = np.setxor1d(arr1, arr2)
print(symmetric_difference) # Output: [1 2 4 6 7 8]These functions offer a concise and efficient way to perform set operations on NumPy arrays, making your code cleaner and faster, especially when dealing with large datasets. They are essential tools for data cleaning, analysis, and manipulation tasks.