
Efficient Way To Modify A Dictionary While Comparing Its Items

I have a dictionary with strings as keys and sets as values. These sets contain integers, which may be common for different items of the dictionary. Indeed, my goal is to compare e

Solution 1:

I think your approach is not overly inefficient, but there is at least one slightly faster way to achieve this. I think this is a good job for defaultdict.

from collections import defaultdict

def deduplicate(db: dict) -> dict:
    # Collect all keys in which each number appears by inverting the mapping:
    inverse = defaultdict(set)  # number -> keys
    for key, value in db.items():
        for number in value:
            inverse[number].add(key)

    # Invert the mapping again - this time joining the keys:
    result = defaultdict(set)  # joined key -> numbers
    for number, keys in inverse.items():
        new_key = "".join(sorted(keys))
        result[new_key].add(number)
    return dict(result)

Timed on your example, this is roughly 6 times faster than your original code. I don't doubt there are faster ways still!
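To try the function, here is a minimal self-contained run. The sample db is borrowed from Solution 2 and is only assumed to resemble the data in the question; the function body is the same inverse-mapping idea, repeated so the snippet runs on its own.

```python
from collections import defaultdict

def deduplicate(db: dict) -> dict:
    # number -> set of keys whose value contains that number
    inverse = defaultdict(set)
    for key, value in db.items():
        for number in value:
            inverse[number].add(key)

    # joined (sorted) keys -> set of numbers shared by exactly those keys
    result = defaultdict(set)
    for number, keys in inverse.items():
        result["".join(sorted(keys))].add(number)
    return dict(result)

# Assumed sample input (taken from Solution 2, not from the question itself)
db = {"a": {1}, "b": {1}, "c": {2, 3, 4}, "d": {2, 3}}
result = deduplicate(db)
print(result)  # {'ab': {1}, 'cd': {2, 3}, 'c': {4}}
```

Note that db itself is only read, never modified, so calling the function again on the same input gives the same result.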

Using itertools.combinations would be a reasonable approach if, for example, you knew that each number could show up in at most 2 keys. As the count can presumably be larger, you would need to iterate over combinations of varying sizes, ending up with more iterations overall.
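As a rough illustration of how quickly that grows, the sketch below counts every key group of size 2 or more for a hypothetical set of four keys. The key names are made up; the point is only the count.

```python
from itertools import combinations

keys = ["a", "b", "c", "d"]  # hypothetical keys, for illustration only

# Every group of 2..len(keys) keys that a combinations-based approach
# would have to consider if a number may appear in any number of keys.
groups = [
    group
    for size in range(2, len(keys) + 1)
    for group in combinations(keys, size)
]

print(len(groups))  # 11, i.e. 2**4 - 4 - 1
```

For n keys this is 2**n - n - 1 groups, whereas the inverse mapping only visits each (key, number) pair once.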

Note 1: I have sorted the keys before joining them into a new key. You mention that you don't mind about the order, but in this solution it's important that the joined keys are consistent, to avoid both "abc" and "bac" existing with different numbers. The consistent order will also make it easier for you to test.
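A small sketch of why the sort matters. Set iteration order for strings is not something you can rely on, so two lists stand in here for "the same keys, seen in a different order"; the variable names are hypothetical.

```python
# The same three keys, as they might come out of two different sets
keys_for_one_number = ["b", "a", "c"]
keys_for_another_number = ["c", "a", "b"]

# Without sorting, the joined keys differ and would land in separate entries
unsorted_first = "".join(keys_for_one_number)       # "bac"
unsorted_second = "".join(keys_for_another_number)  # "cab"

# With sorting, both collapse to the same joined key
sorted_first = "".join(sorted(keys_for_one_number))       # "abc"
sorted_second = "".join(sorted(keys_for_another_number))  # "abc"

print(unsorted_first == unsorted_second)  # False
print(sorted_first == sorted_second)      # True
```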

Note 2: The step where you remove values from the blacklist has a side effect on the input db, so running your code twice with the same input db would produce different results. You could avoid that in your original implementation by performing a deepcopy:

from copy import deepcopy
values = deepcopy(list(db.values()))
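To see the difference, here is a minimal sketch with a hypothetical db. A plain list(db.values()) still holds the very same set objects as db, while the deep copy holds independent ones.

```python
from copy import deepcopy

db = {"a": {1}, "b": {1, 2}}  # hypothetical input, for illustration only

shallow = list(db.values())          # new list, but the SAME set objects as in db
deep = deepcopy(list(db.values()))   # new list with independent copies of the sets

shallow[0].discard(1)  # also empties db["a"], since it is the same object
deep[1].discard(2)     # db["b"] is unaffected

print(db)    # {'a': set(), 'b': {1, 2}}
print(deep)  # [{1}, {1}]
```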

Solution 2:

First, create an inverted dictionary where each value maps to the keys that contain it. Then, for each value, use combinations to create the key pairs that contain it and build the resulting dictionary from those:

from itertools import combinations

db = {"a":{1}, "b":{1}, "c":{2,3,4}, "d":{2,3}}

matches = dict() # { value:list of keys }
for k,vSet in db.items():
    for v in vSet: matches.setdefault(v,[]).append(k)

result = dict()  # { keypair:set of values }
for v,keys in matches.items():
    if len(keys)<2:
        result.setdefault(keys[0],set()).add(v)
        continue
    for a,b in combinations(keys,2):
        result.setdefault(a+b,set()).add(v)

print(result) # {'ab': {1}, 'cd': {2, 3}, 'c': {4}}
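One thing worth checking before picking this version: when a value appears in more than two keys, the pair-based approach emits one entry per pair rather than one entry for the whole group. A minimal sketch with a hypothetical db where the number 1 sits in three keys:

```python
from itertools import combinations

db = {"a": {1}, "b": {1}, "c": {1}}  # hypothetical input: 1 appears in three keys

# value -> list of keys containing it
matches = {}
for k, v_set in db.items():
    for v in v_set:
        matches.setdefault(v, []).append(k)

# key or key pair -> set of values
result = {}
for v, keys in matches.items():
    if len(keys) < 2:
        result.setdefault(keys[0], set()).add(v)
        continue
    for a, b in combinations(keys, 2):
        result.setdefault(a + b, set()).add(v)

print(result)  # {'ab': {1}, 'ac': {1}, 'bc': {1}}
```

The defaultdict approach in Solution 1 would instead give a single 'abc' entry for the same input, so which one you want depends on whether you need pairs or whole groups.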
