calculate slope in dataframe

This question is just about calculating the slope at each timestep in a dataframe. There's a lot of extra detail here, which you are welcome to peruse or not, but that one step is all I'm looking for.

I have a forecast and an observed dataframe. I am trying to calculate the "interesting" changes in the forecast.

I'd like to try to accomplish that by:

  • calculate the best fit of the observed data (i.e., linear regression).
  • find its slope
  • find the difference between the slope and the slope at each moment of the observed data

To do this, I need to generate the slope at each moment in the time series.

  • calculate the stddev and mean of that difference
  • use that to generate z-scores for the values in the forecast DF.

How do I calculate the slope at each point in the data?


import numpy as np
import pandas as pd
from sklearn import linear_model

original = series.copy() # the observations
f = y.copy() # the forecast

app = ' app_2'

original.reset_index(inplace=True)
original['date'] = pd.to_timedelta(original['date'] ).dt.total_seconds().astype(int)    

# * calculate the best fit of the observed data (ie, linear regression).
reg = linear_model.LinearRegression()

# * find its slope
reg.fit(original['date'].values.reshape(-1, 1), original[app].values)
slope = reg.coef_[0]  # coef_ is a one-element array for a single feature

# * find the difference between the slope and the slope at each moment of the observed data
delta = original[app].apply(lambda x: abs(slope - SLOPE_OF(x)))

# * calculate the stddev and mean of that difference
odm = delta.mean()
ods = delta.std(ddof=0)

# * use that to generate z-scores for the values in the forecast DF. 
# something like
f['test_delta'] = np.cumsum(f[app]).apply(lambda x: abs(slope - x))
f['z'] = (f['test_delta'] - odm) / ods  # parentheses needed: subtract the mean first, then divide by the stddev

# from that I might find interesting segments of the forecast:
sig = f.index[f['z'] > 2].tolist()

1 Answer

To "calculate the slope at each point in the data," the simplest approach is to compute "rise over run" for each pair of adjacent rows using Series.diff(), as follows. The resulting Series gives an estimate of the instantaneous rate of change (IROC) between the previous row and the current row. Its first element is NaN, because the first row has no predecessor.

iroc = original[app].diff() / original['date'].diff()
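To make the behaviour concrete, here is a minimal sketch on made-up data. The frame `obs`, its values, and the column name `app_2` are assumptions for illustration only, not the asker's real data.

```python
import pandas as pd

# Hypothetical observations: elapsed seconds and one metric column.
obs = pd.DataFrame({
    "date": [0, 10, 20, 40],
    "app_2": [1.0, 3.0, 2.0, 6.0],
})

# Rise over run between each row and the row before it.
iroc = obs["app_2"].diff() / obs["date"].diff()

# The first entry is NaN because the first row has no predecessor.
print(iroc)
```

Because the time column is differenced as well, unevenly spaced timestamps (the 20-second gap in the last row here) are handled without any extra work.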

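Also, you don't need apply. Because pandas arithmetic is vectorized, subtracting a Series from a scalar works element-wise (call .abs() on the result if you want the magnitude, as in the question's code):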

delta = slope - iroc
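Putting the pieces together, the whole pipeline from the question might look roughly like the sketch below. Several things here are assumptions: the data is invented, `np.polyfit` stands in for sklearn's LinearRegression to keep the example dependency-free, and the forecast is scored on its own per-step slope, which is one possible reading of the question's last step rather than a confirmed intent.

```python
import numpy as np
import pandas as pd

# Hypothetical observed and forecast data (not from the question).
obs = pd.DataFrame({"date": [0, 1, 2, 3, 4], "app_2": [1.0, 3.0, 5.0, 9.0, 9.0]})
f = pd.DataFrame({"date": [5, 6, 7], "app_2": [11.0, 13.0, 25.0]})

# Slope of the least-squares line through the observations.
# np.polyfit with degree 1 returns [slope, intercept].
slope, intercept = np.polyfit(obs["date"], obs["app_2"], 1)

# Per-row slope of the observations, and its distance from the fitted slope.
iroc = obs["app_2"].diff() / obs["date"].diff()
delta = (slope - iroc).abs()

# Mean and population standard deviation; pandas skips the leading NaN.
odm = delta.mean()
ods = delta.std(ddof=0)

# Same per-row slope for the forecast, converted to z-scores.
f_delta = (slope - f["app_2"].diff() / f["date"].diff()).abs()
f["z"] = (f_delta - odm) / ods

# Rows whose slope deviates by more than two standard deviations.
sig = f.index[f["z"] > 2].tolist()
print(sig)
```

Comparisons against NaN evaluate to False, so the first forecast row (which has no slope) can never be flagged.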

Hope this works. As Wen-Ben commented, it would really help to see actual data and your expected output.

1 week ago