
How To Separate Time Ranges/intervals Into Bins If Intervals Occur Over Multiple Bins

I have a dataset consisting of pairs of start and end times (say, in seconds) of something happening across a recorded period of time. For example: #each tuple includes (start, stop) o

Solution 1:

If you don't mind using numpy, here is a strategy:

import numpy as np

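def bin_times(data, bin_size, total_length):
    # One boolean per unit of time: is the event happening then?
    times = np.zeros(total_length, dtype=bool)
    for start, stop in data:
        # stop is exclusive, as in Python slicing
        times[start:stop] = True
    # One row per bin; the mean of each row is the fraction of that bin covered
    binned = 100 * np.average(times.reshape(-1, bin_size), axis=1)
    return binned.tolist()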

data = [(0, 1), (5,8), (15,21), (29,30)]
bin_times(data, 5, 40)
# => [20.0, 60.0, 0.0, 100.0, 20.0, 20.0, 0.0, 0.0]
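Because bin_times() uses start and stop as array indices, it only works with whole-number times. If your times can be fractional, one alternative (a different technique, not part of the answer above) is to compute each interval's overlap with each bin directly. The sketch below is a minimal version under stated assumptions: the function name bin_times_overlap is hypothetical, the intervals are assumed not to overlap one another (overlaps would be double-counted, unlike the boolean-array approach), and a final partial bin is allowed.

```python
import math


def bin_times_overlap(data, bin_size, total_length):
    """Hypothetical helper: percentage of each bin covered by (start, stop) intervals.

    Assumes the intervals do not overlap each other; works with float times.
    """
    n_bins = math.ceil(total_length / bin_size)
    result = []
    for i in range(n_bins):
        bin_start = i * bin_size
        # The last bin may be shorter if bin_size does not divide total_length
        bin_end = min(bin_start + bin_size, total_length)
        covered = 0
        for start, stop in data:
            # Length of the intersection of [start, stop) and [bin_start, bin_end)
            covered += max(0, min(stop, bin_end) - max(start, bin_start))
        result.append(100 * covered / (bin_end - bin_start))
    return result


print(bin_times_overlap([(0, 1), (5, 8), (15, 21), (29, 30)], 5, 40))
# => [20.0, 60.0, 0.0, 100.0, 20.0, 20.0, 0.0, 0.0]
print(bin_times_overlap([(0.5, 2.5)], 2, 4))
# => [75.0, 25.0]
```

This loops over every bin and every interval, so it is O(bins × intervals); it trades the vectorized speed of the numpy version for support of fractional times.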

To explain the logic of bin_times(), let me use a smaller example:

data = [(0, 1), (3, 8)]
bin_times(data, 3, 9)
# => approximately [33.33, 100.0, 66.67]
  1. The times array encodes whether your event is happening in each unit time interval. You start by setting every entry to False:

    [False, False, False, False, False, False, False, False, False]
    
  2. For each (start, stop) pair, set the entries from start up to (but not including) stop to True:

    [True, False, False, True, True, True, True, True, False]
    
  3. Reshape it into a two-dimensional matrix in which the length of the rows is bin_size:

    [[True, False, False],
     [True,  True,  True],
     [True,  True, False]]
    
  4. Take the average in each row:

    [0.333, 1.000, 0.667]
    
  5. Multiply by 100 to turn those numbers into percentages:

    [33.3, 100.0, 66.7]
    
  6. To hide the use of numpy from the consumer of the function, use the .tolist() method to turn the resulting numpy array into a plain Python list.
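To see those intermediate arrays for yourself, the steps can be run one at a time on the small example. This is just the body of bin_times() unrolled with print statements; the variable names rows, fractions, and percentages are introduced here for illustration.

```python
import numpy as np

data = [(0, 1), (3, 8)]
bin_size, total_length = 3, 9

# Step 1: one boolean per unit of time, all False
times = np.zeros(total_length, dtype=bool)

# Step 2: mark each [start, stop) interval as active
for start, stop in data:
    times[start:stop] = True
print(times.tolist())
# [True, False, False, True, True, True, True, True, False]

# Step 3: one row per bin
rows = times.reshape(-1, bin_size)
print(rows.tolist())
# [[True, False, False], [True, True, True], [True, True, False]]

# Step 4: fraction of active units in each row
fractions = np.average(rows, axis=1)
print(fractions.round(3).tolist())
# [0.333, 1.0, 0.667]

# Step 5: convert to percentages
percentages = 100 * fractions
print(percentages.round(1).tolist())
# [33.3, 100.0, 66.7]
```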

One caveat: bin_size must evenly divide total_length; otherwise the reshape raises a ValueError. The approach also assumes start and stop are integers, since they are used as array indices.
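If you need bins that don't divide total_length evenly, one way around the reshape is to sum each bin with np.add.reduceat and divide by each bin's actual length, so a shorter final bin is not under-counted. This is a minimal sketch, not part of the original answer; the name bin_times_ragged is hypothetical, and it still assumes integer start/stop times.

```python
import numpy as np


def bin_times_ragged(data, bin_size, total_length):
    """Hypothetical variant of bin_times() that allows a shorter final bin."""
    times = np.zeros(total_length, dtype=bool)
    for start, stop in data:
        times[start:stop] = True
    # Index where each bin starts: 0, bin_size, 2 * bin_size, ...
    edges = np.arange(0, total_length, bin_size)
    # Number of active units in each bin (the last bin runs to the end of the array)
    active = np.add.reduceat(times.astype(int), edges)
    # Actual length of each bin; only the last one can be shorter than bin_size
    lengths = np.diff(np.append(edges, total_length))
    return (100 * active / lengths).tolist()


print(bin_times_ragged([(0, 1), (3, 8)], 4, 9))
# => [50.0, 100.0, 0.0]
```

Here the last bin covers only time unit 8, so its percentage is computed over a length of 1 rather than 4.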
