Efficiently Identifying Whether Part Of String Is In List/dict Keys?

August 20, 2024

I have a lot of (>100,000) lowercase strings in a list, where a subset might look like this:

str_list = ['hello i am from denmark', 'that was in the united states', 'nothing here']

Solution 1:

This seems to be a good way:

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

output_strings = []

for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            # first matching key wins
            output_strings.append(value)
            break
    else:
        # the inner loop finished without break: no key matched
        output_strings.append(string)
print(output_strings)

Solution 2:

Assuming:

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

You can do:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

which returns:

print(res)  # -> ['dk', 'us', 'nothing here']

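The cool thing about this (apart from it being a Python ninja's favorite weapon, the list comprehension) is the combination of get with a default of my_str and next with a default of None. When no key matches, next returns None instead of raising StopIteration, and since None is not a key in dict_x, get falls back to its default, the original string.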
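
To see the two defaults separately, here is a minimal sketch. The helper name first_matching_key and the "fallback" value are made up for illustration and are not part of the answer above.

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

def first_matching_key(text, mapping):
    # Hypothetical helper: next() yields the first key contained in text,
    # or the default None when the generator is exhausted.
    return next((k for k in mapping if k in text), None)

hit = first_matching_key("hello i am from denmark", dict_x)
miss = first_matching_key("nothing here", dict_x)

print(hit)    # denmark
print(miss)   # None

# dict.get returns its second argument when the key is absent,
# so a None key falls through to the fallback.
print(dict_x.get(hit, "fallback"))    # dk
print(dict_x.get(miss, "fallback"))   # fallback
```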

Solution 3:

Something like this would work. Note that this will convert the string to the value of the first encountered key fitting the criteria. If multiple keys can match, you may want to modify the logic based on whatever fits your use case.

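strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
converted = []
for string in strings:
    # default to the original string when no key matches
    updated_string = string
    for key, value in dict_x.items():
        if key in string:
            updated_string = value
            break
    converted.append(updated_string)
print(converted)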
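
If several keys can match the same string, one possible modification is to prefer the longest matching key instead of the first one encountered. This is only a sketch of one such policy. The extra "states" entry and the function name replace_with_longest_match are assumptions added here to create an overlap. They do not appear in the original question.

```python
# "states" is a hypothetical extra key that overlaps with "united states".
dict_x = {"states": "st", "denmark": "dk", "united states": "us"}
strings = ["hello i am from denmark", "that was in the united states", "nothing here"]

def replace_with_longest_match(text, mapping):
    # Collect every key that occurs in the text.
    matches = [key for key in mapping if key in text]
    if not matches:
        return text  # no key matched: keep the original string
    # Prefer the longest key, so "united states" beats "states".
    return mapping[max(matches, key=len)]

converted = [replace_with_longest_match(s, dict_x) for s in strings]
print(converted)  # ['dk', 'us', 'nothing here']
```

With first-match logic and this dictionary order, the second string would become 'st' instead of 'us'.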

Solution 4:

Try

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v

print(str_list)

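This iterates through the key, value pairs in your dictionary and checks whether the key occurs in each string. If it does, the string is replaced in place with the value. Note that str_list itself is modified, and that later keys are tested against the already-replaced value rather than the original string.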

Solution 5:

You can subclass dict and use a list comprehension.

In terms of performance, I advise you try a few different methods and see what works best.

class dict_contains(dict):
    def __getitem__(self, value):
        key = next((k for k in self.keys() if k in value), None)
        return self.get(key)

str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"

lst = [str1, str2, str3]

dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})

res = [dict_x[i] or i for i in lst]

# ['dk', 'us', "nothing here"]
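
To follow the advice about trying a few methods, a rough timing harness could look like the sketch below. It compares a plain key scan with a precompiled regular expression, which is a different technique from the dict subclass above. The function names and the sample size are assumptions. Also note that the regex variant returns the leftmost match in the string rather than the first key in dictionary order, so the two can disagree when several keys occur in one string.

```python
import re
import timeit

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
lst = ["hello i am from denmark", "that was in the united states", "nothing here"]

def replace_scan(strings, mapping):
    # Scan the keys for each string; the first key in dict order wins.
    return [mapping.get(next((k for k in mapping if k in s), None), s) for s in strings]

def replace_regex(strings, mapping):
    # One alternation over all escaped keys, longest first so that a longer
    # key is preferred over a shorter key starting at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    result = []
    for s in strings:
        match = pattern.search(s)
        result.append(mapping[match.group(0)] if match else s)
    return result

sample = lst * 1000  # arbitrary size, chosen only to get measurable timings
for fn in (replace_scan, replace_regex):
    seconds = timeit.timeit(lambda: fn(sample, dict_x), number=10)
    print(f"{fn.__name__}: {seconds:.3f}s")
```

Which variant is faster depends on the number of keys and the length of the strings, so it is worth measuring on data that resembles the real list.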
