NumPy Random Seed – Mastering Python

What is a Random Seed?

In essence, a random seed is an integer value that acts as an initial input to a pseudorandom number generator (PRNG). A PRNG doesn’t generate truly random numbers; instead, it produces a deterministic sequence of numbers based on the initial seed. The same seed will always produce the same sequence of “random” numbers. This deterministic nature is vital for reproducibility. Without a set seed, NumPy’s random functions will use a system-dependent seed, leading to different sequences on different runs, even with identical code.

Setting the NumPy Random Seed

Setting the seed is straightforward using numpy.random.seed(). This function takes an integer as an argument. Let’s illustrate:

import numpy as np

np.random.seed(42)

random_numbers = np.random.rand(5)
print(f"Random numbers with seed 42: {random_numbers}")

np.random.seed(100)

random_numbers_2 = np.random.rand(5)
print(f"Random numbers with seed 100: {random_numbers_2}")

Running this code will produce the same sequence of random numbers each time you execute it, as long as the seed remains the same (42 in the first case, 100 in the second).

Importance of Reproducibility

The ability to reproduce results is essential for debugging, sharing code, and validating research. Consider a machine learning model trained with random weight initialization. If you don’t set the seed, you might obtain vastly different model performance on each run, making it difficult to assess the impact of different hyperparameters or model architectures. Using a seed ensures consistency and allows you to directly compare results across various runs.

Seed Management Across Multiple Processes

If your code involves multiple processes (e.g., multiprocessing), managing seeds becomes more complex. Each process needs its own unique seed to generate independent random number sequences. A simple solution involves setting a base seed and then deriving unique seeds for each process using techniques such as adding the process ID or a counter to the base seed. Libraries such as multiprocessing in Python can help in managing this efficiently.

import numpy as np
import multiprocessing

def worker(seed):
    np.random.seed(seed)
    random_numbers = np.random.rand(5)
    print(f"Process {multiprocessing.current_process().name}: {random_numbers}")

if __name__ == '__main__':
    base_seed = 42
    processes = []
    for i in range(3):
        process = multiprocessing.Process(target=worker, args=(base_seed + i,))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

This example demonstrates setting different seeds for each worker process, leading to different random number sequences generated within each process, while still ensuring reproducibility across multiple runs if the base seed is consistent.

Default Seed Behavior

It’s worth noting that if you don’t explicitly set a seed using np.random.seed(), NumPy will use a default seed, typically derived from the system’s time or other system-specific sources. This explains why you get different random sequences each time you run your code without a defined seed.

Using `default_rng()`

For newer code, consider using the default_rng() function to create a Generator object. This offers enhanced performance and more features.

from numpy.random import default_rng

rng = default_rng(seed=42)
random_numbers = rng.random(5)
print(random_numbers)

This approach provides a more controlled and often more efficient way to manage randomness in your NumPy code.