Find non-overlap instances (5-day stock returns) of rolling window product which meet condition

I have a time series of Amazon close price from 31 Dec 2009 to present.

I am trying to find the number of instances where 5-day returns of Amzn falls more than 15%

Date          Open         High        Low        Close      Adj Close  Volume
2009-12-30  138.399994  138.399994  135.279999  136.490005  136.490005  6913200
2009-12-31  137.089996  137.279999  134.520004  134.520004  134.520004  4523000



## to get the 1-day returns 
df['returns'] =  df['Close'] / df['Close'].shift(1) 

## to get the rolling 5-day performance 
df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod()) 


## filter returns -15% or more 
df2 = df[df['roll']<.85] 

While the above outputs a table of rows where 5-day returns is -15% or more, the output of the df2 has overlapping 5-day windows.

1) How can I output in a list the rolling 5-day window as a column in the data frame => for example : ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3']

2) How can I remove overlapping periods in df2. If there is any days in the 5-day window that overlaps with another row, only keep 1 of the row.

728x90

1 Answers Find non-overlap instances (5-day stock returns) of rolling window product which meet condition

I'm not 100% sure what you mean in question 1, but if you want to extract all the dates from the dataframe index and put them into a list you can use list(df.index). If you meant that you would like to have a column which contains the dates used in each 5-day rolling window then this is likely to be non-trivial and a bit hacky. Pandas won't allow you just to calculate this with a roll. However here is a working solution

def find_dates(df, center=False):

    dates = df.index.values
    dates_list = np.zeros((dates.shape[0],5), dtype=object)
    if center:
        for i,r in enumerate([2,1,0,-1,-2]):
            dates_list[:,i]=np.roll(dates,r)
            nul_dates_numbers = [0,1,len(dates)-1,len(dates)-2]

    else:
        for i,r in enumerate([4,3,2,1,0]):
            dates_list[:,i]=np.roll(dates,r)
            nul_dates_numbers = [0,1,2,3]

    dates_list = [list(d) if j not in nul_dates_numbers else None for j,d in enumerate(dates_list)]
    return dates_list


# make a quick dataframe
index_leters = 'a b c d e f g h i j'
indexes = index_leters.split(' ')
df = pd.DataFrame({'B': list(np.arange(len(indexes)))}, index=indexes)

center = False # can set to False
#apply rolling function    
df['roll']= pd.rolling_apply(df.B,5,lambda x : x.prod(), center=center) 

# extract index windows (will work on dates)
df['dates'] = find_dates(df, center=center)

2) Assuming that you have data for each date then a simple way of removing overlapping periods is just to slice the dataframe to only keep every 5 rows

## to get the rolling 5-day performance 
df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())
df_nonoverlapping = df[::5]

Also if you would like to have the date assigned to each window as the middle element in your window. ie in the window ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3'] you want the date to be '2010-1-1' then you should use

df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod(), center=True) 

The way you have it now then date index for each 5 day window will be set as the last one, ie '2010-1-3' in this case.

Finally you should note that you will have a few NaN values in your 'roll' column because you can't calculate the 5 day rolling average at both the start and end of your dataframe. SO expect to have 4 NaN values in your 'roll' column.

4 months ago