
Fastest Way To Compare A List To A Dict Of Lists

- 1 answer

So I have two lists:

list1 = ['abc', 'efg', 'hijk'] #list of strings

list2 = ['lmno', 'pqrs'] #also a list of strings

Then I have a dict that is usually fairly large: there are only ~100 keys, but a few hundred thousand strings populate the value lists.

d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

So I need to loop through each item of list1 paired with each item of list2:

example:

for i1 in list1:
   for i2 in list2:
      print(i1, i2)

Then I compare each pair against the dict:

for i1 in list1:
    for i2 in list2:
        if i1.lower() in d:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data
Currently my code looks like the above, but it is very slow when the dict is large. Is there a faster way to do this?


Answer

Swap the second and the third lines so you don't iterate over list2 if i1.lower() is not in d.

for i1 in list1:
    if i1.lower() in d:
        for i2 in list2:
            if i2 in d[i1.lower()]:
                continue  # ignore
            else:
                pass  # process data
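
The same loop can also compute i1.lower() and look up the dict once per outer iteration instead of once per pair. A minimal sketch, assuming the sample data from the question; process() and the processed list are hypothetical stand-ins for the "process data" step:

```python
list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

processed = []  # hypothetical: collects the pairs that would be processed


def process(i1, i2):
    # Hypothetical stand-in for the question's "process data" step.
    processed.append((i1, i2))


for i1 in list1:
    seen = d.get(i1.lower())  # one lowercase call and one dict lookup per i1
    if seen is None:
        continue  # key not in d: skip list2 entirely
    for i2 in list2:
        if i2 not in seen:
            process(i1, i2)

print(processed)
```

This only trims constant overhead; the membership test against a list is still linear in the length of that list.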

Also, as @aran-fey mentioned, convert your d to a dict of sets first:

d = {k: set(v) for k, v in d.items()}
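
The reason this helps: `x in some_list` scans the list element by element, while `x in some_set` is a hash lookup that takes roughly constant time on average. A rough way to see the difference, assuming a made-up list of 100,000 strings (the size and the names values, as_list, as_set and needle are illustrative, not from the question):

```python
import timeit

values = [str(n) for n in range(100_000)]  # assumed size, for illustration only
as_list = values
as_set = set(values)
needle = "99999"  # last element: the worst case for the list scan

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The exact numbers depend on the machine, so none are quoted here. The conversion itself costs one pass over the values, so it pays off when the dict is built once and queried many times.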

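Even further (thanks to @AlexHall), use a set difference so that each i1 only iterates over the items that still need processing. Note that this version lowercases the items of list2 and also processes every item when i1.lower() is not a key of d, both of which differ from the loops above:

d = {k: set(v) for k, v in d.items()}
set2 = {i2.lower() for i2 in list2}

for i1 in list1:
    for i2 in set2 - d.get(i1.lower(), set()):
        pass  # process data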
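
If the behavior of the question's loops should be kept exactly (list2 compared as-is, and nothing processed when i1.lower() is not a key of d), the set-difference idea can be adapted. A sketch under those assumptions; pairs_to_process is a hypothetical helper name, not something from the question or the answer:

```python
def pairs_to_process(list1, list2, d):
    """Yield (i1, i2) where i1.lower() is a key of d and i2 is not among its values."""
    sets = {k: set(v) for k, v in d.items()}  # one-time conversion to sets
    set2 = set(list2)  # no lowercasing: the question compares i2 as-is
    for i1 in list1:
        seen = sets.get(i1.lower())
        if seen is None:
            continue  # the question's loops do nothing for missing keys
        for i2 in set2 - seen:
            yield i1, i2


list1 = ['abc', 'efg', 'hijk']
list2 = ['lmno', 'pqrs']
d = {'abc': ['lmno'], 'efg': ['lmno', 'pqrs']}

print(sorted(pairs_to_process(list1, list2, d)))
```

Because list2 becomes a set, duplicate items are processed once and the original order of list2 is not preserved; if either matters, iterate over list2 and test `i2 not in seen` instead.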
source: stackoverflow.com