Skip to content Skip to sidebar Skip to footer

Python Dictionary - Binary Search For A Key?

I want to write a container class that acts like a dictionary (actually derives from a dict), The keys for this structure will be dates. When a key (i.e. date) is used to retrieve

Solution 1:

You really don't want to subclass dict because you can't really reuse any of its functionality. Rather, subclass the abstract base class collections.Mapping (or MutableMapping if you want to also be able to modify an instance after creation), implement the indispensable special methods for the purpose, and you'll get other dict-like methods "for free" from the ABC.

The methods you need to code are __getitem__ (and __setitem__ and __delitem__ if you want mutability), __len__, __iter__, and __contains__.

The bisect module of the standard library gives you all you need to implement these efficiently on top of a sorted list. For example...:

import collections
import bisect

classMyDict(collections.Mapping):
  def__init__(self, contents):
    "contents must be a sequence of key/value pairs"
    self._list = sorted(contents)
  def__iter__(self):
    return (k for (k, _) in self._list)
  def__contains__(self, k):
    i = bisect.bisect_left(self._list, (k, None))
    return i < len(self._list) and self._list[i][0] == k
  def__len__(self):
    returnlen(self._list)
  def__getitem__(self, k):
    i = bisect.bisect_left(self._list, (k, None))
    if i >= len(self._list): raise KeyError(k)
    return self._list[i][1]

You'll probably want to fiddle __getitem__ depending on what you want to return (or whether you want to raise) for various corner cases such as "k greater than all keys in self".

Solution 2:

The sortedcontainers module provides a SortedDict type that maintains the keys in sorted order and supports bisecting on those keys. The module is pure-Python and fast-as-C implementations with 100% test coverage and hours of stress.

For example:

from sortedcontainers import SortedDict

sd = SortedDict((date, value) for date, value in data)

# Bisect for the index of the desired key.
index = sd.bisect('2001/01/05')

# Lookup the real key at that index.
key = sd.iloc[index]

# Retrieve the value associated with that key.
value = sd[key]

Because SortedDict supports fast indexing, it's easy to look ahead or behind your key as well. SortedDict is also a MutableMapping so it should work nicely in your type system.

Solution 3:

I'd extend a dict, and override the __getitem__ and __setitem__ method to store a sorted list of keys.

from bisect import bisect

classNearestNeighborDict(dict):
    def__init__(self):
        dict.__init__(self)
        self._keylist = []

    def__getitem__(self, x):
        if x in self:
            returndict.__getitem__(self, x)

        index = bisect(self._keylist, x)
        if index == len(self._keylist):
            raise KeyError('No next date')

        returndict.__getitem__(self, self._keylist[index])

    def__setitem__(self, x, value):
        if x notin self:
            index = bisect(self._keylist, x)
            self._keylist.insert(index, value)

        dict.__setitem__(self, x, value)

It's true you're better off inheriting from MutableMapping, but the principle is the same, and the above code can be easily adapted.

Solution 4:

Why not just maintain a sorted list from dict.keys() and search that? If you're subclassing dict you may even devise an opportunity to do a binary insert on that list when values are added.

Solution 5:

Use the floor_key method on bintrees.RBTree: https://pypi.python.org/pypi/bintrees/2.0.1

Post a Comment for "Python Dictionary - Binary Search For A Key?"