Aggregations Over Specific Columns Of A Large Dataframe, With Named Output
I am looking for a way to aggregate over a large dataframe, possibly using groupby. Each group would be based on either pre-specified columns or regex, and the aggregation should produce named output columns.
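For concreteness, here is a minimal sketch of sample data that matches the column pattern and month-end index seen in the result further down; the specific names, seed, and value range are assumptions, not taken from the original question:
import numpy as np
import pandas as pd

# Hypothetical sample data: columns follow the '<group>.<number>.<field>' pattern,
# the index is month-end dates, matching the result shown below.
rng = np.random.default_rng(42)
idx = pd.date_range('2018-08-31', periods=12, freq='M')
columns = [f'{g}.{n}.{f}' for g in 'ABC' for n in (1, 2, 3) for f in 'EFG']
df = pd.DataFrame(rng.integers(0, 1000, size=(len(idx), len(columns))),
                  index=idx, columns=columns)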
Solution 1:
Not a groupby solution, and it uses a loop, but I think it's nonetheless rather elegant: first get a list of the unique (first, last) name-part combinations using a set, and then do the sums using filter:
# Unique (first, last) parts of the dotted column names, e.g. ('A', 'E').
cols = sorted(set((x.split('.')[0], x.split('.')[-1]) for x in df.columns))
for c0, c1 in cols:
    # Escape the literal dots and match any numeric middle segment.
    df[f'{c0}.SUM.{c1}'] = df.filter(regex=rf'{c0}\.\d+\.{c1}').sum(axis=1)
Result:
            A.1.E  A.1.F  A.1.G  A.2.E  ...  B.SUM.G  C.SUM.E  C.SUM.F  C.SUM.G
2018-08-31    978    746    408    109  ...     4061     5413     4102     4908
2018-09-30    923    649    488    447  ...     5585     3634     3857     4228
2018-10-31    911    359    897    425  ...     5039     2961     5246     4126
2018-11-30     77    479    536    509  ...     4634     4325     2975     4249
2018-12-31    608    995    114    603  ...     5377     5277     4509     3499
2019-01-31    138    612    363    218  ...     4514     5088     4599     4835
2019-02-28    994    148    933    990  ...     3907     4310     3906     3552
2019-03-31    950    931    209    915  ...     4354     5877     4677     5557
2019-04-30    255    168    357    800  ...     5267     5200     3689     5001
2019-05-31    593    594    824    986  ...     4221     2108     4636     3606
2019-06-30    975    396    919    242  ...     3841     4787     4556     3141
2019-07-31    350    312    104    113  ...     4071     5073     4829     3717
If you want to have the result in a new DataFrame, just create an empty one and add the columns to it:
result = pd.DataFrame()
for c0, c1 in cols:
    result[f'{c0}.SUM.{c1}'] = df.filter(regex=rf'{c0}\.\d+\.{c1}').sum(axis=1)
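Equivalently (a sketch, not part of the original answer), the summed columns can be assembled in a single pass with pd.concat instead of growing the DataFrame column by column:
# Build each group sum as a named Series, then concatenate along the columns.
result = pd.concat(
    {f'{c0}.SUM.{c1}': df.filter(regex=rf'{c0}\.\d+\.{c1}').sum(axis=1)
     for c0, c1 in cols},
    axis=1,
)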
Update: using a simple groupby (which is even simpler in this particular case):
def grouper(col):
    # Map e.g. 'A.1.E' -> 'A.SUM.E'
    c = col.split('.')
    return f'{c[0]}.SUM.{c[-1]}'

df.groupby(grouper, axis=1).sum()
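Note that recent pandas releases deprecate groupby(..., axis=1); if that applies to your version (an assumption about your environment, not part of the original answer), the same grouping can be done on the transpose:
# Group the transposed frame's index (the original column names), then transpose back.
df.T.groupby(grouper).sum().T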