NumPy’s random number routines produce pseudorandom numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions.
Table of Contents
- Introduction
- Using the NumPy random.seed() function
- The problem with NumPy random.seed() function
- Using the NumPy random.RandomState() function
- Conclusion
Introduction
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG’s seed (which may include truly random values).
Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed and their reproducibility. In particular, NumPy’s random number routines produce pseudorandom numbers by combining a BitGenerator, which creates the underlying sequences, with a Generator, which uses those sequences to sample from different statistical distributions.
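To make the division of labour between the two objects concrete, here is a minimal sketch. It assumes NumPy 1.17 or later, where Generator and PCG64 are available, and the seed 12345 is an arbitrary value chosen only for illustration.

```python
import numpy as np
from numpy.random import Generator, PCG64

# The BitGenerator (PCG64 here) turns a seed into a stream of raw random bits.
bit_generator = PCG64(12345)  # 12345 is an arbitrary illustrative seed

# The Generator converts those bits into samples from distributions.
rng = Generator(bit_generator)

uniform_samples = rng.random(3)          # uniform over [0, 1)
normal_samples = rng.standard_normal(3)  # standard normal distribution

# A second Generator built from the same seed reproduces the same stream.
rng_again = Generator(PCG64(12345))
assert np.array_equal(uniform_samples, rng_again.random(3))
```

The same bit stream can feed any distribution method, which is why the seed lives in the BitGenerator rather than in the individual sampling calls.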
Using the NumPy random.seed() function
Recall that the random.rand(n) function generates an array of n random samples from a uniform distribution over $[0, 1)$. In the following code, we call random.rand(3) twice, generating an array of 3 random numbers each time.
import numpy as np
print(np.random.rand(3))
print(np.random.rand(3))
[0.70814782 0.29090474 0.51082761]
[0.89294695 0.89629309 0.12558531]
We see that the two separate calls to the random.rand(3) function lead to two completely different random arrays. If you need to reproduce the same results every time you call the random function, you can set a seed with the random.seed() function.
np.random.seed(3)
print(np.random.rand(3))

np.random.seed(3)
print(np.random.rand(3))
[0.5507979 0.70814782 0.29090474]
[0.5507979 0.70814782 0.29090474]
We see that the 2 arrays generated are identical. Setting a seed initializes the pseudorandom number generator to a fixed state, so the random calls that follow draw from the same deterministic sequence and produce the same results.
Let’s take a look at the following:
np.random.seed(4)
print(np.random.rand(3))
print(np.random.rand(3))
print(np.random.rand(3))
[0.96702984 0.54723225 0.97268436]
[0.71481599 0.69772882 0.2160895 ]
[0.97627445 0.00623026 0.25298236]
np.random.seed(4)
print(np.random.rand(3))
print(np.random.rand(3))
print(np.random.rand(3))
[0.96702984 0.54723225 0.97268436]
[0.71481599 0.69772882 0.2160895 ]
[0.97627445 0.00623026 0.25298236]
Note that the sequence of random arrays is the same after initializing with the same seed, even though each call within a run returns a different array. Providing a fixed seed ensures that the same series of calls to functions in the numpy.random namespace will always produce the same results, which can be helpful in code testing.
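As a rough illustration of the testing use case, the sketch below seeds the global generator before each of two runs and checks that a computation built on np.random.rand() gives the same answers. The function noisy_mean is a hypothetical stand-in for code under test, not part of NumPy.

```python
import numpy as np

def noisy_mean(n):
    """Hypothetical function under test: the mean of n uniform samples."""
    return np.random.rand(n).mean()

# First run: seed, then make the same series of calls.
np.random.seed(4)
first_run = [noisy_mean(5) for _ in range(3)]

# Second run: same seed, same series of calls.
np.random.seed(4)
second_run = [noisy_mean(5) for _ in range(3)]

# The runs match call for call, although calls within a run differ.
assert first_run == second_run
assert first_run[0] != first_run[1]
```

The comparison only holds when both runs make the same calls in the same order; adding or removing a single draw shifts everything after it.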
The problem with NumPy random.seed() function
The np.random.seed() function ensures that we can create reproducible results, which means that all random arrays generated (after initialization with the same seed) will be the same on any machine. However, there is a potential problem: np.random.seed() seeds a single global instance of the pseudorandom number generator.
This becomes a problem in projects that import other modules or packages which also call np.random.seed(), because every such call affects all users of the NumPy random functions. For instance, an imported module could reset the global seed to another value, leading to unexpected changes in computed results. Therefore, the preferred practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around, as NEP 19 puts it:
“The preferred best practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around. The implicit global RandomState behind the numpy.random.* convenience functions can cause problems, especially when threads or other forms of concurrency are involved. Global state is always problematic. We categorically recommend avoiding using the convenience functions when reproducibility is involved.”
NEP 19 — Random number generator policy by Robert Kern
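The following sketch shows how the problem can surface. The function third_party_helper is a hypothetical stand-in for imported code that reseeds the global generator; the seed values are arbitrary.

```python
import numpy as np

def third_party_helper():
    """Hypothetical imported code that reseeds the global generator."""
    np.random.seed(0)

# Reference run: two draws after seeding.
np.random.seed(3)
first = np.random.rand(3)
second = np.random.rand(3)

# Same seed, but the helper runs between the two draws.
np.random.seed(3)
first_again = np.random.rand(3)
third_party_helper()
second_again = np.random.rand(3)

assert np.array_equal(first, first_again)        # unaffected so far
assert not np.array_equal(second, second_again)  # changed by the helper
```

Nothing in the calling code changed between the two runs, yet the second draw differs, which is what makes this kind of bug hard to track down.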
Using the NumPy random.RandomState() function
To avoid touching the global NumPy state, we can create our own generator with np.random.RandomState() instead of calling random.seed(). Creating a RandomState instance has the advantage that it does not change the global RandomState instance that underlies the functions in the numpy.random namespace.
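One way to check this claim is to create and use a separate instance in the middle of a seeded run and confirm that the global stream is undisturbed. This is only a sketch; the name local_rng and the seed values are arbitrary.

```python
import numpy as np

# Reference: the first draw from the global generator after seeding.
np.random.seed(3)
expected = np.random.rand(3)

# Same global seed, but a separate instance is created and used first.
np.random.seed(3)
local_rng = np.random.RandomState(32)  # has its own independent state
local_rng.rand(3)                      # advances only local_rng
actual = np.random.rand(3)             # the global stream is untouched

assert np.array_equal(expected, actual)
```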
R = np.random.RandomState(32)
print(R.rand(3))

R = np.random.RandomState(32)
print(R.rand(3))
[0.85888927 0.37271115 0.55512878]
[0.85888927 0.37271115 0.55512878]
We see that the 2 random arrays generated are identical. Also, note that after assigning R = np.random.RandomState(32), we call rand() as a method of R, that is, R.rand() instead of np.random.rand().
We can also combine the 2 statements into a single statement:
np.random.RandomState(32).rand(3)
array([0.85888927, 0.37271115, 0.55512878])
Calling this statement 3 times results in 3 identical random arrays, because each call creates a fresh instance with the same seed.
np.random.RandomState(32).rand(3)
np.random.RandomState(32).rand(3)
np.random.RandomState(32).rand(3)
array([0.85888927, 0.37271115, 0.55512878])
array([0.85888927, 0.37271115, 0.55512878])
array([0.85888927, 0.37271115, 0.55512878])
However, reusing the same instance for several calls will not produce the same array each time, because every call advances the instance’s internal state.
R = np.random.RandomState(32)
print(R.rand(3))
print(R.rand(3))
print(R.rand(3))
[0.85888927 0.37271115 0.55512878]
[0.95565655 0.7366696 0.81620514]
[0.10108656 0.92848807 0.60910917]
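The arrays differ because each call moves the instance forward along its sequence. As a small sketch of that mechanism, the get_state() and set_state() methods of RandomState can snapshot the state and rewind to it; the variable names here are arbitrary.

```python
import numpy as np

R = np.random.RandomState(32)
saved_state = R.get_state()  # snapshot taken before any draws

first = R.rand(3)
second = R.rand(3)           # differs from first: the state has advanced

R.set_state(saved_state)     # rewind to the snapshot
replay = R.rand(3)           # repeats the first draw

assert not np.array_equal(first, second)
assert np.array_equal(first, replay)
```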
If you wish to generate the same random array with each call using np.random.RandomState(), and yet do not wish to create a new instance by hand each time, you may write a simple function as follows:
def rng(n):  # n is the length of the random array
    R = np.random.RandomState(12)
    return R.rand(n)

print(rng(4))
print(rng(4))
print(rng(4))
[0.15416284 0.7400497 0.26331502 0.53373939]
[0.15416284 0.7400497 0.26331502 0.53373939]
[0.15416284 0.7400497 0.26331502 0.53373939]
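A wrapper like this returns the same numbers on every call, which is often not what a larger program wants. A sketch of the “pass it around” approach from NEP 19 is shown below: one seeded instance is created at the top and handed to each function that needs randomness. The function names and value ranges are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_heights(n, rng):
    """Hypothetical helper: draw n values with the generator it is given."""
    return 150 + 50 * rng.rand(n)

def sample_weights(n, rng):
    """Hypothetical helper: draws continue from wherever rng currently is."""
    return 40 + 60 * rng.rand(n)

def simulate(seed):
    # One generator per run, created once and passed to every helper.
    rng = np.random.RandomState(seed)
    return sample_heights(3, rng), sample_weights(3, rng)

heights_a, weights_a = simulate(12)
heights_b, weights_b = simulate(12)

# The whole run is reproducible, without any use of global state.
assert np.array_equal(heights_a, heights_b)
assert np.array_equal(weights_a, weights_b)
```

Within a run the helpers receive different numbers, since they share one advancing stream, while two runs with the same seed agree exactly.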
Conclusion
Calling numpy.random.RandomState(seed) returns a new seeded RandomState instance and does not change anything else. Once you have created such an instance, the functions in the numpy.random namespace will not give you its numbers, because they draw from the global RandomState instance rather than the one you just created. You therefore have to call the methods of the returned RandomState instance every time to get consistent pseudorandom numbers.
On the other hand, numpy.random.seed() resets the state of the existing global RandomState instance that underlies the functions in the numpy.random namespace. This may have undesirable and unexpected consequences on computed outputs and is to be avoided.
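For newer code, the same idea carries over to the Generator interface mentioned in the introduction. The sketch below assumes NumPy 1.17 or later, where np.random.default_rng() is available; the seed is arbitrary. Note that a Generator uses different method names (random() and integers() rather than rand() and randint()) and will not reproduce the numbers of a RandomState created with the same seed.

```python
import numpy as np

# A seeded Generator, independent of the global numpy.random state.
rng = np.random.default_rng(32)
floats = rng.random(3)                # uniform over [0, 1)
ints = rng.integers(0, 10, size=3)    # integers in [0, 10)

# Re-creating the Generator with the same seed replays the same draws.
rng_again = np.random.default_rng(32)
assert np.array_equal(floats, rng_again.random(3))
assert np.array_equal(ints, rng_again.integers(0, 10, size=3))
```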