Give A Unique Name To Multiple Files With Multiprocessing

- 1 answer

I'm trying to learn the multiprocessing library. This version works:

import multiprocessing
import os
import random

def generate_files(file_number, directory):
    # write one random integer (0-99) to its own file
    var02 = str(int(100*random.random()))
    with open(f"{directory}/sample{file_number}.csv", "w") as f:
        f.write(var02)

if __name__ == "__main__":
    N1 = 100
    # create directory for samples
    directory = "samples"
    if not os.path.exists(directory):
        os.makedirs(directory)
    cpu_count = os.cpu_count()  # portable, unlike os.environ["NUMBER_OF_PROCESSORS"]
    # start one process per file
    for i in range(N1):
        process = multiprocessing.Process(target=generate_files, args=[i, directory])
        process.start()

The downside is that the program creates 100 processes. I'd like to limit them to cpu_count, so it should look something like this:

for i in range(cpu_count):
    process = multiprocessing.Process(target=generate_files, args=[i, directory, cpu_count])

But this way all processes try to write to the same file, because the names are the same. It also doesn't work cleanly if the number of files is not a multiple of the number of cores. Is there any way around this?

Answer

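You could use multiprocessing's Pool with its map function. You can specify how many processes the pool uses (the default is os.cpu_count()), and map then runs a function on each element of an iterable. Because each file number is handed to exactly one call, the file names stay unique, and the number of files does not need to be a multiple of the number of processes. So in your case you could do something like: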

from multiprocessing import Pool
import os
import random

# directory for samples (created in the main block below)
directory = "samples"

def generate_files(file_number):
    var02 = str(int(100*random.random()))
    with open(f"{directory}/sample{file_number}.csv", "w") as f:
        f.write(var02)

if __name__ == "__main__":
    N1 = 100

    if not os.path.exists(directory):
        os.makedirs(directory)

    with Pool() as pool:
        pool.map(generate_files, range(N1))
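
If you would rather keep directory as an argument (as in the question) and cap the number of workers explicitly, a minimal sketch could look like the following. The names generate_file, generate_all and n_workers are made up for illustration; functools.partial fixes the directory argument so that pool.map only has to supply the file number.

```python
import os
import random
from functools import partial
from multiprocessing import Pool


def generate_file(file_number, directory):
    """Write one random integer (0-99) to <directory>/sample<file_number>.csv."""
    value = str(random.randrange(100))
    path = os.path.join(directory, f"sample{file_number}.csv")
    with open(path, "w") as f:
        f.write(value)
    return path


def generate_all(n_files, directory, n_workers=None):
    """Create n_files files with a pool of n_workers processes.

    n_workers=None (hypothetical parameter) leaves the pool size at
    Pool's own default, which is based on the CPU count.
    """
    os.makedirs(directory, exist_ok=True)
    # Bind the directory so each task is just a file number.
    worker = partial(generate_file, directory=directory)
    with Pool(processes=n_workers) as pool:
        # map hands every number in range(n_files) to exactly one call.
        return pool.map(worker, range(n_files))


if __name__ == "__main__":
    paths = generate_all(n_files=10, directory="samples", n_workers=2)
    print(f"wrote {len(paths)} files")
```

Here 10 files are written by 2 processes, so the file count does not have to match or be a multiple of the worker count; the pool hands out the remaining file numbers as workers become free.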
source: stackoverflow.com