Setting Random Seed In Python Disturbs Multiprocessing
I've observed that setting a random seed before using multiprocessing in Python causes strange behaviour.
In Python 3.5.2, only 2 or 3 cores are used, and at low CPU utilization. In Python 2.7.13, all requested cores run at 100%, but the code never seems to finish. When I remove the initialization of the random seed, the parallelization works fine.
This happens even though the parallelized function makes no explicit use of random. I now assume the seed is shared among processes and that this prevents multiprocessing from running smoothly, but can someone provide the correct answer?
I've run the code on Linux; here is a minimal code example:
from multiprocessing import Pool
import numpy as np
import random
random.seed = 2018
NB_CPUS = 4
def test(x):
    return x**2
pool = Pool(NB_CPUS)
args = [np.random.rand() for _ in range(100000)]
results = pool.map(test, args)
pool.terminate()
results[-5:]
Answer
Bit late with an answer, but you're breaking things by setting the random.seed function to an int. You should instead be doing:
random.seed(2018)
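With that one-line change, the question's script should run to completion. For reference, here it is in full (the __main__ guard is my addition, standard multiprocessing hygiene rather than part of the fix):

from multiprocessing import Pool
import numpy as np
import random

random.seed(2018)   # call the function; don't rebind the name
NB_CPUS = 4

def test(x):
    return x**2

if __name__ == '__main__':
    pool = Pool(NB_CPUS)
    args = [np.random.rand() for _ in range(100000)]
    results = pool.map(test, args)
    pool.terminate()
    print(results[-5:])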
As for why the original code hangs: the last few lines of the traceback provide the context that should have made this obvious:
File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 125, in __init__
random.seed()
TypeError: 'int' object is not callable
This causes Pool to keep trying to create new worker processes, but because the same error occurs every time, no forward progress can be made.
The reason behind this is that multiprocessing knows it should re-seed the random module when forking so that child processes don't share the same RNG state. To do this it tries to call the random.seed function, but you've set it to an int, which isn't callable; hence the error!
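You can reproduce the failure without multiprocessing at all; this sketch just replays the rebinding from the question followed by the call that the traceback shows multiprocessing making:

import random

random.seed = 2018   # rebinds the name 'seed' from a function to an int
try:
    random.seed()    # the re-seeding call multiprocessing attempts on fork
except TypeError as exc:
    print(exc)       # 'int' object is not callable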
Another issue related to this is that multiprocessing doesn't know to re-seed the NumPy RNG, so the following code:
from multiprocessing import Pool
import numpy as np
def test(i):
    print(i, np.random.rand())

with Pool(4) as pool:
    pool.map(test, range(4))
will cause each worker to print the same value. This issue has been known for a while, but is still open. You can work around it by using a worker initializer, e.g.:
def initfn():
    np.random.seed()

with Pool(4, initializer=initfn) as pool:
    pool.map(test, range(4))
This will now cause the above test function to print different values. Note that you could even use Pool(4, initializer=np.random.seed) if you're not doing any other worker-level initialization.
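If you need to pass arguments to the initializer, for example to mix a base seed of your choosing into each worker's state, initargs works. The pid-mixing convention below is just one illustration of mine (it makes worker states distinct, though not reproducible across runs, since pids vary):

import os
from multiprocessing import Pool
import numpy as np

def initfn(base_seed):
    # Combine the caller's base seed with this worker's pid so each
    # process gets different NumPy RNG state; the modulo keeps the
    # result in np.random.seed's valid range of [0, 2**32).
    np.random.seed((base_seed + os.getpid()) % (2**32))

def test(i):
    print(i, np.random.rand())

if __name__ == '__main__':
    with Pool(4, initializer=initfn, initargs=(2018,)) as pool:
        pool.map(test, range(4))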