
Python: Remove Elements From The List Which Are Prefixes Of Other Elements

Fastest (and most Pythonic) way to get the list of elements which do not contain any other element as their prefix. (Elements can be in any order, for the sake of clarity in explanation el…

Solution 1:

If your list is sorted, every element is either a prefix of the next one or not a prefix of any other element, because in sorted order all strings that start with a given x come immediately after x. Therefore, you can write:

ls.sort()
# keep ls[i] unless it is a prefix of ls[i+1]; the last element is always kept
[ls[i] for i in range(len(ls) - 1) if ls[i] != ls[i+1][:len(ls[i])]] + [ls[-1]]
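A minimal sketch of the same idea wrapped in a function; the name drop_prefixes_sorted is hypothetical. It sorts a copy with sorted() instead of calling ls.sort(), so the caller's list keeps its order, and it returns [] for an empty input, where the one-liner's ls[-1] would raise IndexError.

```python
def drop_prefixes_sorted(ls):
    """Return the elements of ls that are not a prefix of another element."""
    if not ls:  # the one-liner's ls[-1] would raise IndexError here
        return []
    s = sorted(ls)  # sorted copy: the caller's list is not reordered
    # after sorting, anything starting with s[i] comes right after it,
    # so comparing with the next element alone is enough
    return [s[i] for i in range(len(s) - 1) if s[i] != s[i + 1][:len(s[i])]] + [s[-1]]

print(drop_prefixes_sorted(["ab", "a", "b", "abc"]))  # ['abc', 'b']
```

The result comes back in sorted order, not in the original input order.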

This is O(n log n) for the sort plus a single O(n) pass through the list.

For your current list, which is already sorted, it is also marginally quicker, because sorting an already sorted list is linear; timeit gives 2.11 us.

A slightly quicker implementation (though not asymptotically faster), and a more Pythonic one, uses zip:

[x for x, y in zip(ls[:-1], ls[1:]) if x != y[:len(x)]] + [ls[-1]]

timeit gives 1.77 us
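A variant of the zip version, assuming Python 3.10+ for itertools.pairwise, with the pairwise recipe from the itertools documentation as a fallback for older versions. pairwise walks adjacent pairs lazily, so the two slice copies ls[:-1] and ls[1:] are never built. The function name is hypothetical, ls must already be sorted, and this variant has not been timed against the figures above.

```python
try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # recipe from the itertools docs: (s0, s1), (s1, s2), ...
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def drop_prefixes_pairwise(ls):
    """ls must be sorted; keep each element that is not a prefix of the next one."""
    if not ls:
        return []
    return [x for x, y in pairwise(ls) if x != y[:len(x)]] + [ls[-1]]

print(drop_prefixes_pairwise(sorted(["abc", "a", "b", "ab"])))  # ['abc', 'b']
```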

Solution 2:

A list comprehension (ls is your input list):

[x for x in ls if x not in [y[:len(x)] for y in ls if y != x]]

I doubt it is the quickest in terms of performance, but the idea is straightforward: go through the list element by element and check whether the element is a prefix of any of the other elements. Because the inner list is rebuilt for every element, this is O(n²).
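A sketch of the same check written with any() and str.startswith (the function name is hypothetical). any() stops at the first match and no inner list is built for each x, but every element is still compared with every other one, so it remains O(n²). Like the comprehension above, it keeps the input order, and because of the y != x filter an element that appears twice is not treated as a prefix of its own copy.

```python
def drop_prefixes_quadratic(ls):
    """Keep x unless it is a prefix of some element different from x (O(n^2))."""
    return [x for x in ls if not any(y.startswith(x) for y in ls if y != x)]

print(drop_prefixes_quadratic(["ab", "a", "b", "abc"]))  # ['b', 'abc']
```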

timeit result: 11.9 us per loop (though the scaling matters more than this figure if you are going to use it on large lists)
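Solutions 1 and 2 should select the same elements, just in a different order. A quick randomized cross-check, assuming short words over a two-letter alphabet are a reasonable stand-in for real data (the helper names are hypothetical). Duplicates are removed with a set first, because the two versions can disagree on them: for ["a", "a"], Solution 2 keeps both copies (its y != x filter skips the copy), while the sorted version keeps one.

```python
import random

def drop_prefixes_sorted(ls):
    # Solution 1, wrapped so it does not mutate its argument
    if not ls:
        return []
    s = sorted(ls)
    return [s[i] for i in range(len(s) - 1) if s[i] != s[i + 1][:len(s[i])]] + [s[-1]]

def drop_prefixes_quadratic(ls):
    # Solution 2, unchanged apart from being wrapped in a function
    return [x for x in ls if x not in [y[:len(x)] for y in ls if y != x]]

rng = random.Random(42)
for trial in range(200):
    # short words over a tiny alphabet so that prefixes are common;
    # the set drops duplicates, on which the two versions differ
    words = list({"".join(rng.choices("ab", k=rng.randint(1, 4))) for _ in range(10)})
    assert sorted(drop_prefixes_quadratic(words)) == drop_prefixes_sorted(words), words
print("both solutions agree on 200 random lists")
```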

Solution 3:

Call ls.sort() first if your list is not already sorted.

Then use startswith:

In [71]: [i for i, j in zip(ls[:-1], ls[1:]) if not j.startswith(i)]+[ls[-1]]
Out[71]: ['ABCDEFG', 'BCD', 'DEFGHI', 'EF', 'GKL', 'JKLM']

or with enumerate:

[v for i, v in enumerate(ls[:-1]) if not ls[i+1].startswith(v)]+[ls[-1]]
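Both variants index or slice the list. If the sorted data arrives as an iterator (for example, lines read from a pre-sorted file), a generator can apply the same adjacent-pair startswith test while holding only the previous element. A minimal sketch; the function name is hypothetical and the input is assumed to be sorted strings.

```python
from typing import Iterable, Iterator, Optional

def drop_prefixes_stream(sorted_items: Iterable[str]) -> Iterator[str]:
    """Yield each item that is not a prefix of the item that follows it."""
    prev: Optional[str] = None
    for cur in sorted_items:
        # prev is dropped exactly when the next item starts with it
        if prev is not None and not cur.startswith(prev):
            yield prev
        prev = cur
    if prev is not None:  # the last item is always kept
        yield prev

print(list(drop_prefixes_stream(sorted(["ab", "a", "b", "abc"]))))  # ['abc', 'b']
```

An empty input simply yields nothing, so there is no separate ls[-1] case to guard.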

Compared with @sashkello's approach:

In [78]: timeit [v for i, v in enumerate(ls[:-1]) if not ls[i+1].startswith(v)]+[ls[-1]] 
10000 loops, best of 3: 29.6 us per loop

In [79]: timeit [i for i, j in zip(ls[:-1], ls[1:]) if not j.startswith(i)]+[ls[-1]]
10000 loops, best of 3: 28.5 us per loop

In [80]: timeit [x for x in ls if x not in [y[:len(x)] for y in ls if y != x]]
1000 loops, best of 3: 1.77 ms per loop
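The timings above come from IPython's timeit magic. Outside IPython, the standard timeit module can repeat the comparison at several list sizes, which shows the difference in scaling more clearly than a single measurement. A sketch, assuming randomly generated lowercase words as test data since the original ls is not shown here; absolute numbers will depend on the machine and the data.

```python
import random
import timeit

def with_zip(ls):
    # the zip/startswith version from above; ls must be sorted and non-empty
    return [i for i, j in zip(ls[:-1], ls[1:]) if not j.startswith(i)] + [ls[-1]]

def quadratic(ls):
    # Solution 2's list comprehension
    return [x for x in ls if x not in [y[:len(x)] for y in ls if y != x]]

rng = random.Random(0)
for n in (100, 200, 400):
    # hypothetical test data: n random words over "abcd", sorted for with_zip
    ls = sorted("".join(rng.choices("abcd", k=rng.randint(1, 8))) for _ in range(n))
    for func in (with_zip, quadratic):
        loops = 10
        # best of 3 repeats, as in the %timeit output
        best = min(timeit.repeat(lambda: func(ls), number=loops, repeat=3)) / loops
        print(f"n={n:4d}  {func.__name__:9s} {best * 1e6:10.1f} us per loop")
```

Doubling n should roughly double the time for with_zip and roughly quadruple it for quadratic.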
