
How Do I Sum Time Series Data By Day In Python? Resample.sum() Has No Effect

I am new to Python. How do I sum data based on date and plot the result? I have a Series object with data like: 2017-11-03 07:30:00 NaN 2017-11-03 09:18:00 NaN 2017-11-03

Solution 1:

Use the pandas groupby function: group the rows by calendar date and sum each group.

import io
import pandas as pd

data = io.StringIO('''2017-11-03 07:30:00,NaN
2017-11-03 09:18:00,NaN
2017-11-03 10:00:00,NaN
2017-11-03 11:08:00,NaN
2017-11-03 14:39:00,NaN
2017-11-03 14:53:00,NaN
2017-11-03 15:00:00,NaN
2017-11-03 16:00:00,NaN
2017-11-03 17:03:00,NaN
2017-11-03 17:42:00,800.0
2017-11-04 07:27:00,600.0
2017-11-04 10:10:00,NaN
2017-11-04 11:48:00,NaN
2017-11-04 12:58:00,500.0
2017-11-04 13:40:00,NaN
2017-11-04 15:15:00,NaN
2017-11-04 16:21:00,NaN
2017-11-04 17:37:00,500.0
2017-11-04 21:37:00,NaN
2017-11-05 03:00:00,NaN
2017-11-05 06:30:00,NaN
2017-11-05 07:19:00,NaN
2017-11-05 08:31:00,200.0
2017-11-05 09:31:00,500.0
2017-11-05 12:03:00,NaN
2017-11-05 12:25:00,200.0
2017-11-05 13:11:00,500.0
2017-11-05 16:31:00,NaN
2017-11-05 19:00:00,500.0
2017-11-06 08:08:00,NaN''')

column_names = ['date', 'val']
df = pd.read_csv(data, sep=',', header=None, names=column_names)
# parse the timestamps, then group by calendar date and sum each day
df['date'] = pd.to_datetime(df['date'])
df = df.groupby(df['date'].dt.date)[['val']].sum()
df.plot()
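
One detail of the groupby sum that may matter with data like this: NaN values are skipped, so a day that contains only NaN comes out as 0.0 rather than NaN. The sketch below uses a small made-up sample (not the full data above) and the `min_count` argument of `sum()` to keep such days as NaN; whether that is wanted depends on how the plot should treat days with no recorded value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same shape as the data above.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-11-03 17:03:00",
        "2017-11-03 17:42:00",
        "2017-11-04 07:27:00",
        "2017-11-04 12:58:00",
        "2017-11-06 08:08:00",
    ]),
    "val": [np.nan, 800.0, 600.0, 500.0, np.nan],
})

by_day = df.groupby(df["date"].dt.date)["val"]

# Default behaviour: NaN is skipped, so the all-NaN day sums to 0.0.
totals = by_day.sum()

# min_count=1 requires at least one non-NaN value per day;
# days without any remain NaN instead of becoming 0.0.
totals_keep_nan = by_day.sum(min_count=1)

print(totals)
print(totals_keep_nan)
```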

Solution 2:

This answer helped me see that I needed to assign it to a new object (if that's the right terminology):

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/Users/user/Documents/health/PainOverTime.csv',delimiter=',')
# parse the 'Time' column into datetimes to use as the index
times = pd.to_datetime(df.loc[:,'Time'])

# raw plot of data
ts = pd.Series(df.loc[:,'acetaminophen'].values, index = times,
               name = 'Painkiller over Time')
fig1 = ts.plot()

# combine data by day
test2 = ts.resample('D').sum()
fig2 = test2.plot()

That produces the following plots:

[Image: first plot, the raw data]

[Image: second plot, the data summed by day]

Is this method not better than the 'groupby' function?
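
On the resample versus groupby question, a small sketch may help. It uses made-up values (the index and numbers below are assumptions, not the real file) and suggests two things: `resample('D').sum()` returns a new Series and leaves `ts` untouched, which is why the result has to be assigned, and the two approaches agree on days that have data but differ on days that have none.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two readings on 3 Nov, none on 4 Nov, two on 5 Nov.
idx = pd.to_datetime([
    "2017-11-03 07:30:00",
    "2017-11-03 17:42:00",
    "2017-11-05 08:31:00",
    "2017-11-05 09:31:00",
])
ts = pd.Series([np.nan, 800.0, 200.0, 500.0], index=idx)

# resample() does not modify ts in place; the summed result is a new
# Series and must be assigned to a name to be used or plotted.
daily_resample = ts.resample("D").sum()

# groupby on the calendar date only yields rows for dates present in ts.
daily_groupby = ts.groupby(ts.index.date).sum()

print(len(ts))          # ts still holds the 4 raw rows
print(daily_resample)   # one row per calendar day, including the empty 4 Nov
print(daily_groupby)    # only the days that appear in the data
```

With resample the result keeps a regular DatetimeIndex, which tends to plot more cleanly; with groupby the index holds plain `datetime.date` objects and skips missing days.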

Now how do I make a scatter or bar plot instead of this line plot...?

Solution 3:

Short answer: you need .groupby(), not .resample(), as in this answer

Longer code:

import pandas as pd
from io import StringIO

doc = StringIO("""2017-11-03 07:30:00      NaN
2017-11-03 09:18:00      NaN
2017-11-03 10:00:00      NaN
2017-11-03 11:08:00      NaN
2017-11-03 14:39:00      NaN
2017-11-03 14:53:00      NaN
2017-11-03 15:00:00      NaN
2017-11-03 16:00:00      NaN
2017-11-03 17:03:00      NaN
2017-11-03 17:42:00    800.0
2017-11-04 07:27:00    600.0
2017-11-04 10:10:00      NaN
2017-11-04 11:48:00      NaN
2017-11-04 12:58:00    500.0
2017-11-04 13:40:00      NaN
2017-11-04 15:15:00      NaN
2017-11-04 16:21:00      NaN
2017-11-04 17:37:00    500.0
2017-11-04 21:37:00      NaN
2017-11-05 03:00:00      NaN
2017-11-05 06:30:00      NaN
2017-11-05 07:19:00      NaN
2017-11-05 08:31:00    200.0
2017-11-05 09:31:00    500.0
2017-11-05 12:03:00      NaN
2017-11-05 12:25:00    200.0
2017-11-05 13:11:00    500.0
2017-11-05 16:31:00      NaN
2017-11-05 19:00:00    500.0
2017-11-06 08:08:00      NaN""")

# columns are separated by two or more spaces
df = pd.read_csv(doc, sep='\\s{2,}',
                 header=None,
                 converters={'timestamp': pd.to_datetime},
                 names=['timestamp', 'acetaminophen'],
                 engine='python')
df = df.set_index('timestamp')

# true, but rather ugly x axis line
df.plot.bar()

# sum by calendar date, then plot one bar per day
df1 = df.groupby(by=[df.index.date]).sum()
df1.plot.bar()

If your dates are not continuous, you can create an empty DataFrame with a full time index and merge df1 with it.
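
A minimal sketch of that idea follows. The values in `df1` are made up and stand in for the daily sums above, and filling the missing days with 0 is an assumption about the desired output. Because grouping by `df.index.date` leaves an index of plain `datetime.date` objects, the sketch converts it to a DatetimeIndex before merging.

```python
import datetime

import pandas as pd

# Hypothetical daily sums with a gap on 2017-11-05, standing in for df1.
df1 = pd.DataFrame(
    {"acetaminophen": [800.0, 1600.0, 500.0]},
    index=[
        datetime.date(2017, 11, 3),
        datetime.date(2017, 11, 4),
        datetime.date(2017, 11, 6),
    ],
)

# Convert the date objects to a DatetimeIndex so it aligns with date_range.
df1.index = pd.to_datetime(df1.index)

# Empty frame whose index covers every calendar day in the range.
full_index = pd.date_range(df1.index.min(), df1.index.max(), freq="D")
empty = pd.DataFrame(index=full_index)

# Left-merge on the index: every day is kept, days without data become NaN.
merged = empty.merge(df1, how="left", left_index=True, right_index=True)

# Assumption: a day with no rows should count as 0 rather than NaN.
filled = merged.fillna(0)
print(filled)
```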
