
How To Use Multiprocessing Module To Iterate A List And Match It With A Key In Dictionary?

I have a list named master_lst created from a CSV file using the following code:

infile = open(sys.argv[1], 'r')
lines = infile.readlines()[1:]
master_lst = ['read']
for line in line
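(The question is cut off above. For context, a rough, hypothetical reconstruction of the kind of loop being described — assuming, as the answer below does, that the read sequences sit in the fourth column of the CSV — might look like the following; the column index and the loop body are assumptions, not the asker's original code.)

import sys

# Hypothetical reconstruction of the question's setup (not the original code):
# build master_lst from the CSV given on the command line, skipping the header
# and collecting the read sequence from the fourth column of each row.
infile = open(sys.argv[1], 'r')
lines = infile.readlines()[1:]
master_lst = ['read']
for line in lines:
    master_lst.append(line.strip().split(',')[3])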

Solution 1:

Here's a multiprocessing version. It takes a slightly different approach from the one in your code, which does away with the need to create the ind_lst.

The essence of the difference is that it first produces a transpose of the desired data, and then transposes that into the desired result.

In other words, instead of creating this directly:

Read,file1,file2
AAAAAAAAAAAAAAA,7451,4456
AAAAAAAAAAAAAAAA,4133,3624
AAAAAAAAAAAAAAAAA,2783,7012

It first produces:

Read,AAAAAAAAAAAAAAA,AAAAAAAAAAAAAAAA,AAAAAAAAAAAAAAAAA 
file1,7451,4133,2783
file2,4456,3624,7012

...and then transposes that with the built-in zip() function to obtain the desired format.

Besides removing the need to create the ind_lst, this approach produces one row of data per file rather than one column of it, which is simpler and can be done more efficiently with less effort.
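If the transpose step sounds abstract, here is a tiny standalone sketch of how zip() flips the per-file rows into the per-read rows, using the made-up numbers from the example above (rows_by_file is just an illustrative name):

rows_by_file = [
    ['Read', 'AAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAA'],
    ['file1', 7451, 4133, 2783],
    ['file2', 4456, 3624, 7012],
]

for row in zip(*rows_by_file):  # zip(*...) swaps rows and columns
    print(','.join(map(str, row)))
# Read,file1,file2
# AAAAAAAAAAAAAAA,7451,4456
# AAAAAAAAAAAAAAAA,4133,3624
# AAAAAAAAAAAAAAAAA,2783,7012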

Here's the code:

from __future__ import print_function

import csv
from functools import partial
from glob import glob
from itertools import izip  # Python 2
import operator
import os
from multiprocessing import cpu_count, Pool, Queue
import sys

def get_master_list(filename):
    with open(filename, "rb") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # ignore first row
        sequence_getter = operator.itemgetter(3)  # retrieves fourth column of each row
        return map(sequence_getter, reader)

def process_fa_file(master_list, filename):
    fa_dict = {}
    with open(filename) as fa_file:
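        # assumes each '>' header line in the .fa file is followed by a line
        # holding the integer count for that sequence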
        for line in fa_file:
            if line and line[0] != '>':
                fa_dict[sequence] = int(line)
            elif line:
                sequence = line[1:-1]

    get = fa_dict.get  # local var to expedite access
    basename = os.path.basename(os.path.splitext(filename)[0])
    return [basename] + [get(key, 0) for key in master_list]

def process_fa_files(master_list, filenames):
    pool = Pool(processes=4)  # "processes" is the number of worker processes to
                              # use. If processes is None then the number returned
                              # by cpu_count() is used.

    # Only one argument can be passed to the target function using Pool.map(),
    # so create a partial to pass first argument, which doesn't vary.
    results = pool.map(partial(process_fa_file, master_list), filenames)
    header_row = ['Read'] + master_list
    return [header_row] + results

if __name__ == '__main__':
    master_list = get_master_list('master_list.csv')

    fa_files_dir = '.'  # current directory
    filenames = glob(os.path.join(fa_files_dir, '*.fa'))

    data = process_fa_files(master_list, filenames)

    rows = zip(*data)  # transpose
    with open('output.csv', 'wb') as outfile:
        writer = csv.writer(outfile)
        writer.writerows(rows)

    # show data written to file
    for row in rows:
        print(','.join(map(str, row)))
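
A note on Python versions: the code above is written for Python 2 (hence the izip import, the print_function future import, and the 'rb'/'wb' file modes). If you are on Python 3, the sketch below shows roughly what changes; write_transposed is just a hypothetical helper wrapping the last few lines of the main block, and the multiprocessing parts stay the same:

import csv

def get_master_list(filename):
    # on Python 3, csv files are opened in text mode with newline=''
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # ignore the header row
        return [row[3] for row in reader]  # fourth column of each row

def write_transposed(data, out_filename='output.csv'):
    rows = list(zip(*data))  # zip() is lazy in Python 3, so materialize it
    with open(out_filename, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(rows)
    return rows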
