calculate slope in dataframe
This question is just about calculating the slope at each timestep in a dataframe. There's a lot of extra detail here that you are welcome to peruse or not, but that one step is all I'm looking for.
I have a forecast and an observed dataframe. I am trying to calculate the "interesting" changes in the forecast.
I'd like to try to accomplish that by:
- calculate the best fit of the observed data (ie, linear regression).
- find its slope
- find the difference between the slope and the slope at each moment of the observed data
To do this, I need to generate the slope at each moment in the time series.
- calculate the stddev and mean of that difference
- use that to generate z-scores for the values in the forecast DF.
How do I calculate the slope at each point in the data?
import numpy as np
import pandas as pd
from sklearn import linear_model

original = series.copy()  # the observations
f = y.copy()  # the forecast
app = ' app_2'

original.reset_index(inplace=True)
original['date'] = pd.to_timedelta(original['date']).dt.total_seconds().astype(int)

# * calculate the best fit of the observed data (ie, linear regression).
reg = linear_model.LinearRegression()

# * find its slope
reg.fit(original['date'].values.reshape(-1, 1), original[app].values)
slope = reg.coef_

# * find the difference between the slope and the slope at each moment of the observed data
# (SLOPE_OF is a placeholder; it is the piece I don't know how to write)
delta = original[app].apply(lambda x: abs(slope - SLOPE_OF(x)))

# * calculate the stddev and mean of that difference
odm = delta.mean()
ods = delta.std(ddof=0)

# * use that to generate z-scores for the values in the forecast DF.
# something like
f['test_delta'] = np.cumsum(f[app]).apply(lambda x: abs(slope - x))
f['z'] = f['test_delta'].apply(lambda x: (x - odm) / ods)

# from that I might find interesting segments of the forecast:
sig = f.index[f['z'] > 2].tolist()
1 Answer
To "calculate the slope at each point in the data," the simplest approach is to compute "rise over run" for each pair of adjacent rows using Series.diff(), as follows. The resulting Series gives (an estimate of) the instantaneous rate of change (IROC) between the previous and current row.
iroc = original[app].diff() / original['date'].diff()
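As a minimal sketch of what that line produces, here is the same calculation on a small made-up frame. The values are invented for illustration, and `date` is assumed to already be in seconds, as in the question's code.

```python
import pandas as pd

# Hypothetical observations: time in seconds and one metric column.
original = pd.DataFrame({
    "date": [0, 10, 20, 40],
    "app_2": [1.0, 3.0, 4.0, 10.0],
})
app = "app_2"

# Rise over run between each row and the row before it.
iroc = original[app].diff() / original["date"].diff()

print(iroc)
```

The first entry is NaN because the first row has no predecessor. The remaining entries are the per-interval slopes (here roughly 0.2, 0.1 and 0.3), and uneven spacing in `date` is handled because each difference in the metric is divided by its own time difference.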
Also, you don't need apply. Thanks to numpy vectorization, scalar - array behaves as expected:
delta = slope - iroc
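Putting the pieces together, the rest of the steps in the question might look roughly like the sketch below. This is only one reading of the question, under several assumptions: the data is made up, `np.polyfit` stands in for sklearn's `LinearRegression` to keep the example small (with sklearn, `reg.coef_` is a length-1 array, so `reg.coef_[0]` gives the scalar slope), and the forecast deltas are computed with the same `diff()` approach rather than the `cumsum` from the question.

```python
import numpy as np
import pandas as pd

app = "app_2"

# Hypothetical observations and forecast; 'date' is already in seconds.
original = pd.DataFrame({
    "date": [0, 10, 20, 30, 40],
    app: [0.0, 2.0, 3.0, 7.0, 8.0],
})
f = pd.DataFrame({
    "date": [50, 60, 70],
    app: [9.0, 20.0, 21.0],
})

# Overall slope of the least-squares line through the observations.
slope = np.polyfit(original["date"], original[app], 1)[0]

# Per-row slope of the observations and its distance from the overall slope.
iroc = original[app].diff() / original["date"].diff()
delta = (slope - iroc).abs()

# Mean and population standard deviation; pandas skips the leading NaN.
odm = delta.mean()
ods = delta.std(ddof=0)

# Same per-row delta for the forecast, scored against the observed statistics.
f_iroc = f[app].diff() / f["date"].diff()
f["test_delta"] = (slope - f_iroc).abs()
f["z"] = (f["test_delta"] - odm) / ods

# Rows of the forecast whose slope deviates unusually from the trend.
sig = f.index[f["z"] > 2].tolist()
print(sig)
```

Note the parentheses in `(f["test_delta"] - odm) / ods`: without them, the division binds first and the result is not a z-score. The NaN in the first forecast row compares as False against the threshold, so it is never flagged.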
Hope this works. As Wen-Ben commented, it would really help to see actual data and your expected output.