How Do I Sum Time Series Data By Day In Python? Resample.sum() Has No Effect
I am new to Python. How do I sum data based on date and plot the result? I have a Series object with data like: 2017-11-03 07:30:00 NaN 2017-11-03 09:18:00 NaN 2017-11-03
Solution 1:
Use the pandas groupby function, grouping on the date part of each timestamp:
import io
import pandas as pd

data = io.StringIO('''2017-11-03 07:30:00,NaN
2017-11-03 09:18:00,NaN
2017-11-03 10:00:00,NaN
2017-11-03 11:08:00,NaN
2017-11-03 14:39:00,NaN
2017-11-03 14:53:00,NaN
2017-11-03 15:00:00,NaN
2017-11-03 16:00:00,NaN
2017-11-03 17:03:00,NaN
2017-11-03 17:42:00,800.0
2017-11-04 07:27:00,600.0
2017-11-04 10:10:00,NaN
2017-11-04 11:48:00,NaN
2017-11-04 12:58:00,500.0
2017-11-04 13:40:00,NaN
2017-11-04 15:15:00,NaN
2017-11-04 16:21:00,NaN
2017-11-04 17:37:00,500.0
2017-11-04 21:37:00,NaN
2017-11-05 03:00:00,NaN
2017-11-05 06:30:00,NaN
2017-11-05 07:19:00,NaN
2017-11-05 08:31:00,200.0
2017-11-05 09:31:00,500.0
2017-11-05 12:03:00,NaN
2017-11-05 12:25:00,200.0
2017-11-05 13:11:00,500.0
2017-11-05 16:31:00,NaN
2017-11-05 19:00:00,500.0
2017-11-06 08:08:00,NaN''')

column_names = ['date', 'val']
df = pd.read_csv(data, sep=',', header=None, names=column_names)
df['date'] = pd.to_datetime(df['date'])
# group rows by calendar day and sum the values for each day
df = df.groupby(df['date'].dt.date)[['val']].sum()
df.plot()
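One detail worth checking with data this sparse: by default, `sum()` skips NaN, so a day that has only NaN readings (like 2017-11-06 above) comes out as 0.0 rather than NaN. A minimal sketch, using a made-up three-row sample, of how the `min_count` argument changes that:

```python
import pandas as pd

# Made-up sample: two readings on 2017-11-05, only NaN on 2017-11-06
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-11-05 13:11", "2017-11-05 19:00", "2017-11-06 08:08"]
    ),
    "val": [500.0, 500.0, float("nan")],
})

by_day = df.groupby(df["date"].dt.date)["val"]

default_sum = by_day.sum()            # the all-NaN day sums to 0.0
strict_sum = by_day.sum(min_count=1)  # the all-NaN day stays NaN
```

Whether 0 or NaN is the right result depends on whether "no reading" should count as "nothing taken" in your data.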
Solution 2:
This answer helped me see that I needed to assign it to a new object (if that's the right terminology):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/Users/user/Documents/health/PainOverTime.csv',delimiter=',')
# parse the Time column into datetimes to use as the index
times = pd.to_datetime(df.loc[:,'Time'])
# raw plot of data
ts = pd.Series(df.loc[:,'acetaminophen'].values, index = times,
name = 'Painkiller over Time')
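ax1 = ts.plot()  # Series.plot() returns a matplotlib Axes, not a Figure
# combine data by day; resample() returns a new object, so keep the result
test2 = ts.resample('D').sum()
plt.figure()  # start a new figure so the daily totals are not drawn over the raw plot
ax2 = test2.plot()
plt.show()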
That produces two line plots: the raw series and the daily totals.
Is this method not better than the 'groupby' function?
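One way to compare the two is to run both on a series with a gap. The sketch below uses made-up values with no readings on 2017-11-04. It also shows the point from the start of this answer: neither call changes `ts` in place, so the result has to be assigned to a new name.

```python
import pandas as pd

# Made-up series with no readings on 2017-11-04
times = pd.to_datetime(
    ["2017-11-03 17:42", "2017-11-05 08:31", "2017-11-05 09:31"]
)
ts = pd.Series([800.0, 200.0, 500.0], index=times)

# Both calls return a new Series; ts itself is left untouched
by_resample = ts.resample("D").sum()          # one row per calendar day, gaps become 0.0
by_groupby = ts.groupby(ts.index.date).sum()  # only days that actually appear in the data

print(by_resample)
print(by_groupby)
```

Here `resample('D')` keeps a regular `DatetimeIndex` and fills the missing day with 0, while `groupby` on the date returns only the days present, indexed by plain `datetime.date` objects. Which is "better" depends on whether you want the gaps to show up.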
Now how do I make a scatter or bar plot instead of this line plot...?
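For the bar or scatter question, something along these lines may work. The sample values are made up, and the `Agg` backend line is only an assumption so the sketch runs without a display; drop it in a notebook or interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for the painkiller series
times = pd.to_datetime(
    ["2017-11-03 17:42", "2017-11-04 07:27", "2017-11-04 12:58", "2017-11-05 08:31"]
)
ts = pd.Series([800.0, 600.0, 500.0, 200.0], index=times)
daily = ts.resample("D").sum()

# Bar plot: one bar per day
fig_bar, ax_bar = plt.subplots()
daily.plot.bar(ax=ax_bar)
# Replace the full timestamps on the tick labels with plain dates
ax_bar.set_xticklabels([d.strftime("%Y-%m-%d") for d in daily.index])

# Scatter plot: one point per day
fig_sc, ax_sc = plt.subplots()
ax_sc.scatter(daily.index, daily.values)
fig_sc.autofmt_xdate()  # tilt the date labels so they do not overlap
```

`daily.plot(kind='bar')` is equivalent to `daily.plot.bar()`. A Series has no `.plot.scatter()`, which is why the scatter version goes through matplotlib's `Axes.scatter` directly.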
Solution 3:
Short answer: you need .groupby(), not .resample(), as in this answer.
Longer code:
import pandas as pd
from io import StringIO

doc = StringIO("""2017-11-03 07:30:00  NaN
2017-11-03 09:18:00  NaN
2017-11-03 10:00:00  NaN
2017-11-03 11:08:00  NaN
2017-11-03 14:39:00  NaN
2017-11-03 14:53:00  NaN
2017-11-03 15:00:00  NaN
2017-11-03 16:00:00  NaN
2017-11-03 17:03:00  NaN
2017-11-03 17:42:00  800.0
2017-11-04 07:27:00  600.0
2017-11-04 10:10:00  NaN
2017-11-04 11:48:00  NaN
2017-11-04 12:58:00  500.0
2017-11-04 13:40:00  NaN
2017-11-04 15:15:00  NaN
2017-11-04 16:21:00  NaN
2017-11-04 17:37:00  500.0
2017-11-04 21:37:00  NaN
2017-11-05 03:00:00  NaN
2017-11-05 06:30:00  NaN
2017-11-05 07:19:00  NaN
2017-11-05 08:31:00  200.0
2017-11-05 09:31:00  500.0
2017-11-05 12:03:00  NaN
2017-11-05 12:25:00  200.0
2017-11-05 13:11:00  500.0
2017-11-05 16:31:00  NaN
2017-11-05 19:00:00  500.0
2017-11-06 08:08:00  NaN""")

# columns are separated by two or more spaces, so the single space
# inside each timestamp is not treated as a separator
df = pd.read_csv(doc, sep=r'\s{2,}',
                 header=None,
                 converters={'timestamp': pd.to_datetime},
                 names=['timestamp', 'acetaminophen'],
                 engine='python')
df = df.set_index('timestamp')

# true, but rather ugly x axis line
df.plot.bar()

# one bar per day
df1 = df.groupby(by=[df.index.date]).sum()
df1.plot.bar()
If your dates are not continuous, you can create an empty DataFrame with a full time index and merge df1 with it.
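A minimal sketch of that idea, with made-up daily sums and a gap on 2017-11-05. It assumes `df1` has a `DatetimeIndex`; the `df1` built above is indexed by plain `datetime.date` objects, so it would first need something like `df1.index = pd.to_datetime(df1.index)`.

```python
import pandas as pd

# Made-up daily sums with a gap: 2017-11-05 is missing
df1 = pd.DataFrame(
    {"acetaminophen": [800.0, 1100.0, 500.0]},
    index=pd.to_datetime(["2017-11-03", "2017-11-04", "2017-11-06"]),
)

# Empty frame whose index covers every day in the range
full_index = pd.date_range(df1.index.min(), df1.index.max(), freq="D")
empty = pd.DataFrame(index=full_index)

# Left join keeps every day; days without data become NaN, then 0
filled = empty.join(df1).fillna(0)

# Shorter alternative that skips the empty frame
same = df1.reindex(full_index, fill_value=0)
```

Both `filled` and `same` end up with four rows, with 0 for the missing day.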