NumPy random.RandomState() vs random.seed()

NumPy’s random number routines produce pseudorandom numbers by combining a BitGenerator, which creates the sequences, with a Generator, which uses those sequences to sample from different statistical distributions.


Introduction

A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG’s seed (which may include truly random values).
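To make the role of the seed concrete, here is a toy linear congruential generator. It is only an illustrative sketch: it is not the algorithm NumPy uses, and the function name `lcg` and its constants (common textbook values) are our own choices. Every output is computed from the previous state, so the seed alone fixes the whole sequence.

```python
def lcg(seed, n, a=1103515245, c=12345, m=2**31):
    """Toy linear congruential generator (illustrative only, not NumPy's algorithm)."""
    state = seed
    out = []
    for _ in range(n):
        state = (a * state + c) % m  # next state depends only on the previous state
        out.append(state / m)        # scale the integer state to a float in [0, 1)
    return out

print(lcg(42, 3) == lcg(42, 3))  # True: same seed, same sequence
print(lcg(42, 3) == lcg(43, 3))  # False: different seed, different sequence
```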

Although sequences closer to true randomness can be generated with hardware random number generators, pseudorandom number generators are important in practice for their speed and their reproducibility.
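The BitGenerator/Generator split described at the start is visible directly in NumPy’s newer API. The sketch below assumes NumPy 1.17 or later. `PCG64` is one of NumPy’s bit generators, and `default_rng` is the convenience constructor that wraps a `PCG64` in a `Generator`. Note that this newer `Generator` API is separate from the `RandomState` objects discussed in the rest of this article.

```python
import numpy as np

# A BitGenerator (here PCG64) produces the raw stream of random bits ...
bitgen = np.random.PCG64(12345)
# ... and a Generator turns that stream into samples from distributions.
gen = np.random.Generator(bitgen)
print(gen.random(3))       # uniform samples on [0, 1)
print(gen.normal(size=2))  # standard normal samples

# default_rng(seed) builds Generator(PCG64(seed)), so both streams match.
a = np.random.Generator(np.random.PCG64(12345)).random(3)
b = np.random.default_rng(12345).random(3)
print(np.array_equal(a, b))  # True
```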

Using the NumPy random.seed() function

Recall that np.random.rand(n) generates an array of n random samples from a uniform distribution over $[0, 1)$. In the following code, we call np.random.rand(3) twice, each time generating an array of 3 random numbers.

import numpy as np
print(np.random.rand(3))
print(np.random.rand(3))

[0.70814782 0.29090474 0.51082761]
[0.89294695 0.89629309 0.12558531]

We see that the two separate calls to np.random.rand(3) return two completely different arrays. If we need to reproduce the same results every time we call a random function, we can set a seed with np.random.seed().

np.random.seed(3)
print(np.random.rand(3))

np.random.seed(3)
print(np.random.rand(3))

[0.5507979  0.70814782 0.29090474]
[0.5507979  0.70814782 0.29090474]

We see that the two arrays generated are identical. Seeding puts the generator into a known state, so the sequence of numbers produced by the calls that follow is fully determined: re-seeding with the same value reproduces the same sequence.

Note: To get the same random array from each call to a random function, each call must be preceded by re-seeding with the same seed.

Let’s take a look at the following:

np.random.seed(4)
print(np.random.rand(3))
print(np.random.rand(3))
print(np.random.rand(3))

[0.96702984 0.54723225 0.97268436]
[0.71481599 0.69772882 0.2160895 ]
[0.97627445 0.00623026 0.25298236]

np.random.seed(4)
print(np.random.rand(3))
print(np.random.rand(3))
print(np.random.rand(3))

[0.96702984 0.54723225 0.97268436]
[0.71481599 0.69772882 0.2160895 ]
[0.97627445 0.00623026 0.25298236]

Note that re-seeding with the same seed reproduces the whole sequence of arrays, even though successive calls within each sequence return different arrays. Providing a fixed seed ensures that the same series of calls to functions in the numpy.random namespace always produces the same results, which is helpful when testing code.
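As a minimal sketch of how this helps in testing, the check below re-runs the same series of calls after re-seeding and compares the results. The function `noisy_mean` and the test name are hypothetical, invented for illustration.

```python
import numpy as np

def noisy_mean(n):
    # Hypothetical function under test: draws from the global numpy.random state.
    return np.random.rand(n).mean()

def test_noisy_mean_is_reproducible():
    np.random.seed(4)
    first = [noisy_mean(3) for _ in range(3)]
    np.random.seed(4)
    second = [noisy_mean(3) for _ in range(3)]
    # Same seed followed by the same series of calls gives identical results.
    assert first == second

test_noisy_mean_is_reproducible()
print("reproducible")
```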

The problem with NumPy random.seed() function

The np.random.seed() function lets us create reproducible results: all random arrays generated after initialization with the same seed will be the same on any machine. However, there is a potential problem: np.random.seed() reseeds a single global instance of the pseudorandom number generator, shared by every function in the numpy.random namespace.

This is a problem for projects that import other modules or packages which also call np.random.seed(): because there is only one global instance, any such call affects every subsequent call to the NumPy random functions. For instance, an imported module could reset the global seed to a different value, leading to unexpected changes in computed results. NEP 19, NumPy’s policy on random number generation, states the recommended practice as follows:

The preferred best practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around. The implicit global RandomState behind the numpy.random.* convenience functions can cause problems, especially when threads or other forms of concurrency are involved. Global state is always problematic. We categorically recommend avoiding using the convenience functions when reproducibility is involved.

NEP 19 — Random number generator policy by Robert Kern
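The sketch below shows this failure mode with a hypothetical imported helper, `third_party_helper`, which reseeds the global generator as a side effect. The helper is invented for illustration; the point is that our own seeded sequence changes even though our code did not.

```python
import numpy as np

def third_party_helper():
    # Hypothetical imported helper that reseeds the global generator as a side effect.
    np.random.seed(0)
    return np.random.rand()

np.random.seed(4)
expected = np.random.rand(3)  # what our code gets on an undisturbed run

np.random.seed(4)
third_party_helper()          # hidden side effect: resets the global state
actual = np.random.rand(3)

print(np.array_equal(expected, actual))  # False: the helper changed our numbers
```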

Using the NumPy random.RandomState class

To avoid affecting NumPy’s global random state, we can use the np.random.RandomState class in place of np.random.seed(). Calling np.random.RandomState(seed) creates a new, independently seeded generator object, with the advantage that it does not change the global RandomState instance that underlies the functions in the numpy.random namespace.

R = np.random.RandomState(32)
print(R.rand(3))

R = np.random.RandomState(32)
print(R.rand(3))

[0.85888927 0.37271115 0.55512878]
[0.85888927 0.37271115 0.55512878]

We see that the two random arrays generated are identical. Also note that after assigning R = np.random.RandomState(32), we draw numbers by calling methods on R, for example R.rand(3), instead of the module-level np.random.rand(3). We can also combine the two statements into a single statement:

np.random.RandomState(32).rand(3)

array([0.85888927, 0.37271115, 0.55512878])

Calling this three times gives three identical arrays, because each call creates a new generator seeded with 32:

print(np.random.RandomState(32).rand(3))
print(np.random.RandomState(32).rand(3))
print(np.random.RandomState(32).rand(3))

[0.85888927 0.37271115 0.55512878]
[0.85888927 0.37271115 0.55512878]
[0.85888927 0.37271115 0.55512878]

However, drawing repeatedly from the same RandomState instance does not return the same array, because each call advances the instance’s internal state:

R = np.random.RandomState(32)
print(R.rand(3))
print(R.rand(3))
print(R.rand(3))

[0.85888927 0.37271115 0.55512878]
[0.95565655 0.7366696  0.81620514]
[0.10108656 0.92848807 0.60910917]
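One way to see that a single instance produces one continuous stream is to compare two successive draws of 3 numbers with a single draw of 6 from a freshly seeded instance. This sketch relies on uniform draws from RandomState consuming the underlying stream strictly in order.

```python
import numpy as np

R = np.random.RandomState(32)
two_draws = np.concatenate([R.rand(3), R.rand(3)])  # two successive calls

one_draw = np.random.RandomState(32).rand(6)        # one call for all six numbers
print(np.array_equal(two_draws, one_draw))          # True: same stream, consumed in order
```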

If you wish to get the same random array from each call using np.random.RandomState() without writing out the constructor every time, you may wrap it in a simple function as follows:

def rng(n):  # n is the length of the random array
    R = np.random.RandomState(12)  # fresh generator with the same seed on every call
    return R.rand(n)

print(rng(4))
print(rng(4))
print(rng(4))

[0.15416284 0.7400497  0.26331502 0.53373939]
[0.15416284 0.7400497  0.26331502 0.53373939]
[0.15416284 0.7400497  0.26331502 0.53373939]
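To check the claim that a RandomState instance is independent of the global state, the sketch below reseeds and draws from the global generator between two draws from the same instance, and also confirms that using a separate instance leaves the global sequence untouched. The variable names are our own.

```python
import numpy as np

# Reference: two consecutive draws from an undisturbed instance seeded with 32.
ref = np.random.RandomState(32)
ref_a, ref_b = ref.rand(3), ref.rand(3)

# Same seed, but the global generator is reseeded and used between the draws.
R = np.random.RandomState(32)
a = R.rand(3)
np.random.seed(0)   # reseeds only the global RandomState ...
np.random.rand(10)  # ... and consumes numbers from it
b = R.rand(3)
print(np.array_equal(a, ref_a) and np.array_equal(b, ref_b))  # True

# Conversely, drawing from a separate instance does not disturb the global sequence.
np.random.seed(4)
g1 = np.random.rand(3)
np.random.seed(4)
np.random.RandomState(99).rand(100)  # unrelated instance, used heavily
g2 = np.random.rand(3)
print(np.array_equal(g1, g2))  # True
```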

Conclusion

If you create a seeded instance with numpy.random.RandomState(seed) but then call the functions in the numpy.random namespace, you will not get the reproducible numbers you expect, because those functions draw from the global RandomState instance, not from the one you just created. Calling numpy.random.RandomState(seed) returns a new, seeded RandomState instance and changes nothing else, so you have to draw from the returned instance every time to get consistent pseudorandom numbers.
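Using the returned instance consistently extends naturally to functions: create one seeded RandomState and pass it to every function that needs random numbers, in the spirit of the "pass it around" advice quoted from NEP 19. In this sketch, `add_noise` and `simulate` are hypothetical names chosen for illustration.

```python
import numpy as np

def add_noise(x, rs):
    # Hypothetical helper: draws only from the RandomState it is given.
    return x + rs.normal(loc=0.0, scale=0.1, size=x.shape)

def simulate(seed):
    rs = np.random.RandomState(seed)  # one seeded instance ...
    x = rs.rand(3)                    # ... used here ...
    return add_noise(x, rs)           # ... and passed to other functions

print(np.array_equal(simulate(32), simulate(32)))  # True
```

Because no function touches the global state, the result depends only on the seed passed to `simulate`, regardless of what other imported code does with numpy.random.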

On the other hand, numpy.random.seed() resets the state of the existing global RandomState instance that underlies the functions in the numpy.random namespace. This can have undesirable and unexpected consequences for any other code that draws from that global state, so it is best avoided.