
Datetime Issues While Time Series Predicting In Pandas

I am trying to implement a time series prediction model in Python, but I am running into issues with datetime data. I have a dataframe 'df' with two columns, one of datetime type and one of float type.

Solution 1:

It's complicated.

First of all, a numpy array holds a single dtype for all of its elements. However, datetime64 is not the same as int, so the two columns cannot share one array without a conversion. We'll have to resolve that, and we will.

Second, you tried to do this with df.values. That makes sense, but because the columns have different dtypes, pandas falls back to dtype=object for the whole df and returns an object array. The problem is that the Timestamps are left as Timestamp objects, which is what is getting in your way.
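To see that fallback in action, here is a minimal sketch. The sample frame is hypothetical: it only mirrors the column names and values implied by the output further down, since the original data is not shown.

```python
import pandas as pd

# Hypothetical sample frame: one datetime column and one float column,
# named as in the answer's code.
df = pd.DataFrame({
    "transaction_date": pd.date_range("2016-02-01", periods=5, freq="D"),
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0],
})

values = df.values

# Mixed datetime/float columns have no common numeric dtype,
# so the result is an object array holding Timestamp objects.
print(values.dtype)                             # object
print(isinstance(values[0, 0], pd.Timestamp))   # True
```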

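So I'd convert each column to integers myself, like this. The dates become nanoseconds since the epoch. Note that casting amount to an integer drops any fractional part, and that asking for 'int64' explicitly avoids a 32-bit default int on some platforms.

import numpy as np

a = np.column_stack([df[c].values.astype('int64') for c in ['transaction_date', 'amount']])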

a

array([[1454284800000000000,                   1],
       [1454371200000000000,                   2],
       [1454457600000000000,                   3],
       [1454544000000000000,                   4],
       [1454630400000000000,                   5]])

We can always convert the first column of a back to datetimes like this:

a[:, 0].astype(df.transaction_date.values.dtype)

array(['2016-02-01T00:00:00.000000000', '2016-02-02T00:00:00.000000000',
       '2016-02-03T00:00:00.000000000', '2016-02-04T00:00:00.000000000',
       '2016-02-05T00:00:00.000000000'], dtype='datetime64[ns]')
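For anyone who wants to run the round trip end to end, the following is a self-contained sketch under stated assumptions: the sample frame is made up to match the output above, and the dates are cast to datetime64[ns] explicitly so the integers are nanoseconds regardless of the resolution pandas chooses by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame matching the output shown above.
df = pd.DataFrame({
    "transaction_date": pd.date_range("2016-02-01", periods=5, freq="D"),
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Force nanosecond resolution, then view the dates as int64 nanoseconds.
dates_ns = df["transaction_date"].values.astype("datetime64[ns]").astype("int64")

# Casting to int64 truncates any fractional part of the amounts.
amounts = df["amount"].values.astype("int64")

a = np.column_stack([dates_ns, amounts])

# Convert the first column back to datetimes.
restored = a[:, 0].astype("datetime64[ns]")

print(a.dtype)       # int64
print(restored[0])   # 2016-02-01T00:00:00.000000000
```

One design point worth keeping in mind: the integers stay exact only while the array is int64. Casting the whole array to float64 would round nanosecond timestamps of this size, so the round trip would no longer be lossless.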

Solution 2:

You can convert your integer into a timedelta and then do the calculations as you did before:

from datetime import timedelta

interval = timedelta(days = 5)

# time_stamp is assumed to be an existing datetime or pandas Timestamp
# move it 5 days later
time_stamp += interval
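As a runnable illustration of the same idea, here is a small sketch. The starting timestamp and the dates series are hypothetical values chosen for the example; the original snippet does not show where time_stamp comes from.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical starting point: the last date in the sample data.
time_stamp = pd.Timestamp("2016-02-05")

# Turn the integer number of days into a timedelta and add it.
n_days = 5
interval = timedelta(days=n_days)
time_stamp += interval
print(time_stamp)  # 2016-02-10 00:00:00

# The same shift applied to a whole datetime column at once.
dates = pd.Series(pd.date_range("2016-02-01", periods=5, freq="D"))
shifted = dates + pd.Timedelta(days=n_days)
```

This keeps everything in datetime types, so no conversion to integers and back is needed when the goal is only to step forward in time.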
