
Setting Random Seed In Python Disturbs Multiprocessing

I've observed that setting a random seed before using multiprocessing in python causes strange behaviour.

In python 3.5.2, only 2 or 3 cores are used with a low percentage of used CPU. In python 2.7.13, all requested cores are used at 100%, but the code seems to never finish. When I remove the initialization of the random seed, the parallelization works fine.

This happens even though there is no explicit use of random in the parallelized function. My current assumption is that the seed is somehow shared among processes and that this interferes with multiprocessing, but can someone provide the correct explanation?


I've run the code on Linux and here is a minimal code example:

from multiprocessing import Pool
import numpy as np
import random

random.seed = 2018

NB_CPUS = 4

def test(x):
    return x**2

pool = Pool(NB_CPUS)
args = [np.random.rand() for _ in range(100000)]

results = pool.map(test, args)

pool.terminate()
results[-5:]

Answer

Bit late with an answer, but you're breaking things by overwriting the random.seed function with an int. You should instead be doing:

random.seed(2018)

The last few lines of the traceback provide the context that should have made this obvious:

  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 125, in __init__
    random.seed()
TypeError: 'int' object is not callable

This causes Pool to keep trying to create new worker processes, but since each one fails the same way, no forward progress is ever made.

The reason behind this is that multiprocessing knows it should re-seed the random module when forking, so that child processes don't share the same RNG state. To do this it tries to call the random.seed function, but you've rebound that name to an int, which isn't callable, hence the error!
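You can reproduce the failing call without any multiprocessing at all; this is essentially what the fork path in the traceback does:

```python
import random

random.seed = 2018  # rebind the seed function to an int, as in the question

try:
    random.seed()  # what multiprocessing effectively does in the forked child
except TypeError as exc:
    print(exc)  # 'int' object is not callable
```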

Another issue related to this is that multiprocessing doesn't know to re-seed the NumPy RNG, so the following code:

from multiprocessing import Pool
import numpy as np

def test(i):
    print(i, np.random.rand())

with Pool(4) as pool:
    pool.map(test, range(4))

will cause each worker to print the same value. This issue has been known for a while, but is still open. You can work around it by using a worker initializer, e.g.:

def initfn():
    np.random.seed()

with Pool(4, initializer=initfn) as pool:
    pool.map(test, range(4))

will now cause the above test function to print different values. Note that you could even use Pool(4, initializer=np.random.seed) if you're not doing any other worker-level initialization.

source: stackoverflow.com