
Pandas Dataframe - For Each Row, Return Count Of Other Rows With Overlapping Dates

I've got a dataframe with projects, start dates, and end dates. For each row I would like to return the number of other projects in process when the project started. How do you nes

Solution 1:

I suggest you take advantage of numpy broadcasting:

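# ends[i, j] is True when project j started before project i ended
ends = df.pr_start_date.values < df.pr_end_date.values[:, None]
# starts[i, j] is True when project j started after project i started
starts = df.pr_start_date.values > df.pr_start_date.values[:, None]
# column j sums to the number of projects already running when project j started
df['overlap'] = (ends & starts).sum(0)
print(df)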

Output

  project pr_start_date pr_end_date  overlap
0       A    2018-09-01  2019-06-15        0
1       B    2019-04-01  2019-12-01        1
2       C    2019-06-08  2019-08-01        2

Both ends and starts are 3x3 boolean matrices whose entries are True where the condition holds:

# ends
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

# starts
[[False  True  True]
 [False False  True]
 [False False False]]

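Then take the element-wise intersection with the logical & and sum down each column (sum(0)). The result is, for each project, the number of other projects that had started earlier and had not yet ended when it started.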
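
The question's dataframe is not shown, so here is a minimal self-contained sketch of the same broadcasting approach. The sample frame is rebuilt from the output table above (an assumption about the original data), and the local names start and end are introduced only for readability.

```python
import pandas as pd

# Sample data reconstructed from the output table (assumed, not the asker's original frame).
df = pd.DataFrame({
    "project": ["A", "B", "C"],
    "pr_start_date": pd.to_datetime(["2018-09-01", "2019-04-01", "2019-06-08"]),
    "pr_end_date": pd.to_datetime(["2019-06-15", "2019-12-01", "2019-08-01"]),
})

start = df["pr_start_date"].to_numpy()
end = df["pr_end_date"].to_numpy()

# end[:, None] has shape (n, 1), so each comparison broadcasts to an (n, n) matrix.
# ends[i, j]: project j started before project i ended.
ends = start < end[:, None]
# starts[i, j]: project j started after project i started.
starts = start > start[:, None]

# Summing down column j counts the projects i that were running when j started.
df["overlap"] = (ends & starts).sum(axis=0)
print(df["overlap"].tolist())
```

Note that the intermediate matrices have n x n entries, so memory use grows quadratically with the number of rows.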

Solution 2:

Solution 3:

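I assume the rows are sorted by start date and, for each row, count the previously started projects that have not yet completed. df.index.get_loc(r.name) yields the integer position of the row being processed. Because .loc slices include their endpoint, the row also counts itself, hence the -1. This relies on the default integer index, where labels and positions coincide.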

df["overlap"] = df.apply(lambda r: df.loc[:df.index.get_loc(r.name), "pr_end_date"].gt(r["pr_start_date"]).sum() - 1, axis=1)
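
Solution 3 passes an integer position to a .loc slice, which is only safe when the frame has the default integer index. As a hedged alternative, here is a sketch of a purely positional variant built on .iloc. The sample data, the string index labels, and the helper running_at_start are all hypothetical and serve only as illustration.

```python
import pandas as pd

# Assumed sample data with non-integer labels, to show the variant does not depend on them.
df = pd.DataFrame(
    {
        "project": ["A", "B", "C"],
        "pr_start_date": pd.to_datetime(["2018-09-01", "2019-04-01", "2019-06-08"]),
        "pr_end_date": pd.to_datetime(["2019-06-15", "2019-12-01", "2019-08-01"]),
    },
    index=["x", "y", "z"],
)

# The approach still requires rows ordered by start date, so sort explicitly.
df = df.sort_values("pr_start_date")
start = df["pr_start_date"]
end = df["pr_end_date"]


def running_at_start(pos):
    """Count earlier rows whose end date is after the start date of row `pos`."""
    # iloc[:pos] excludes the row itself, so no "- 1" correction is needed.
    return int((end.iloc[:pos] > start.iloc[pos]).sum())


df["overlap"] = [running_at_start(pos) for pos in range(len(df))]
print(df["overlap"].tolist())
```

One difference from Solution 1: when two projects share a start date, this sort-based count includes the one that appears earlier in the frame, while the strict > comparison in the broadcasting version does not.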
