NumPy Random Shuffle

Published: November 24, 2023

Understanding numpy.random.shuffle

The numpy.random.shuffle function operates in place: it modifies the array you pass in and returns None, rather than creating a shuffled copy. This avoids allocating a second array, which matters when working with large datasets. The shuffle is applied along the first axis only. For a 1D array, that means the elements are reordered. For a multi-dimensional array, the sub-arrays along the first axis (the rows of a 2D array) are reordered while their contents stay the same.
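The in-place behavior can be checked with a short sketch like the one below. The variable names (values, before, result) are illustrative only and are not part of NumPy.

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5, 9])
before = values.copy()  # snapshot of the original order, for comparison only

# shuffle reorders `values` itself; the call has no return value
result = np.random.shuffle(values)

print(result)                            # None
print(sorted(values) == sorted(before))  # True: same elements as before
```

A common slip is writing values = np.random.shuffle(values), which replaces the array with None.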

Important note: because the shuffle happens in place, the original ordering is lost once the call returns. If you need to preserve it, copy the array first and shuffle the copy:

import numpy as np

original_array = np.array([1, 2, 3, 4, 5])
shuffled_array = np.copy(original_array)  # Create a copy
np.random.shuffle(shuffled_array)

print("Original Array:", original_array)
print("Shuffled Array:", shuffled_array)

Shuffling 1D Arrays

Shuffling a one-dimensional array is straightforward:

import numpy as np

my_array = np.array([10, 20, 30, 40, 50])
np.random.shuffle(my_array)
print(my_array)

Because no seed is set, each run will generally produce a different ordering of the elements (the same ordering can occasionally repeat by chance).

Shuffling Multi-Dimensional Arrays

When working with multi-dimensional arrays, numpy.random.shuffle reorders only along the first axis, so for a 2D array it shuffles whole rows. Consider this example:

import numpy as np

my_matrix = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

np.random.shuffle(my_matrix)
print(my_matrix)

The rows of my_matrix will be randomly permuted. The elements within each row keep their original order, so [1, 2, 3] always stays together as a row.
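The sketch below checks that claim and also shows one possible way to permute columns instead, by indexing with a random column order. The names matrix, column_order, and column_shuffled are illustrative, and the column approach is an assumption about what a reader might want rather than something numpy.random.shuffle does by itself.

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
original_rows = {tuple(row) for row in matrix.tolist()}

np.random.shuffle(matrix)  # reorders rows only (axis 0), in place

# Every shuffled row is still one of the original rows, internally unchanged.
shuffled_rows = {tuple(row) for row in matrix.tolist()}
print(shuffled_rows == original_rows)  # True

# To permute columns instead, index with a random column order.
column_order = np.random.permutation(matrix.shape[1])
column_shuffled = matrix[:, column_order]  # new array; `matrix` is not modified
print(column_shuffled.shape)  # (3, 3)
```

Indexing with a permuted index array returns a copy, so this column variant does not share the in-place behavior of shuffle.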

Setting the Random Seed for Reproducibility

For reproducible results, set a random seed with numpy.random.seed() before shuffling. The seed fixes the state of NumPy’s global random number generator, so the same sequence of calls produces the same shuffle on every run.

import numpy as np

np.random.seed(42)  # Set the seed

my_array = np.array([1, 2, 3, 4, 5])
np.random.shuffle(my_array)
print(my_array)

np.random.seed(42)  # Same seed, same shuffle
my_array = np.array([1, 2, 3, 4, 5])
np.random.shuffle(my_array)
print(my_array)

By using the same seed (42 in this case), you’ll consistently get the same shuffled array.
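As an alternative sketch, recent NumPy versions (1.17 and later) also provide a Generator object via numpy.random.default_rng, which NumPy’s documentation recommends over the global seed for new code. The helper seeded_shuffle below is hypothetical and exists only for illustration.

```python
import numpy as np

def seeded_shuffle(seed):
    """Shuffle a fixed array with a Generator created from `seed` (illustrative helper)."""
    rng = np.random.default_rng(seed)  # independent generator, no global state
    data = np.array([1, 2, 3, 4, 5])
    rng.shuffle(data)                  # in place, like np.random.shuffle
    return data

first = seeded_shuffle(42)
second = seeded_shuffle(42)

print(np.array_equal(first, second))  # True: same seed, same ordering
```

Because each Generator carries its own state, seeding one does not affect other code that draws random numbers. Note that a Generator seeded with 42 is not expected to produce the same ordering as numpy.random.seed(42).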

Alternatives: numpy.random.permutation

While numpy.random.shuffle modifies the array in place, numpy.random.permutation returns a new, shuffled copy and leaves the original array unchanged.

import numpy as np

my_array = np.array([1, 2, 3, 4, 5])
shuffled_array = np.random.permutation(my_array)

print("Original Array:", my_array)
print("Shuffled Array:", shuffled_array)

This is the more convenient option when you need to keep the original data, since no explicit copy is required. Use numpy.random.shuffle when in-place modification is acceptable and you want to avoid allocating a second array, and numpy.random.permutation when you need a new shuffled array.
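One situation where numpy.random.permutation is handy is shuffling two related arrays in the same order. Passing an integer n returns a shuffled copy of the indices 0 to n-1, which can then index both arrays. The sketch below assumes hypothetical features and labels arrays made up for illustration.

```python
import numpy as np

# Hypothetical paired data: row i of `features` belongs to labels[i].
features = np.array([[0.1, 0.2],
                     [0.3, 0.4],
                     [0.5, 0.6],
                     [0.7, 0.8]])
labels = np.array([0, 1, 2, 3])

# A random ordering of the indices 0..3
order = np.random.permutation(len(labels))

# Apply the same ordering to both arrays so the pairs stay aligned.
shuffled_features = features[order]
shuffled_labels = labels[order]

for label, row in zip(shuffled_labels, shuffled_features):
    print(label, row)
```

Both originals are left untouched, and each label still sits next to the feature row it started with.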