Estimating the Hurst Exponent Using R/S Range

In this post we will estimate the Hurst exponent using the R/S method. The Hurst exponent measures the long-range memory of a time series (more below). If a series has no memory, i.e. each point in time is independent of previous points in time, then it is said to be a random process. Examples of random processes are Markov processes, Brownian motion and white noise.

A series which trends exhibits autocorrelation: one point in time is correlated with a lagged version of itself. The autocorrelation of a series at lag k may be calculated with the following formula:

\rho_k = \frac{\sum_{i=1}^{N-k} (Y_i - \bar{Y})(Y_{i+k} - \bar{Y})}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}

In English: the top part of the equation (numerator) is the covariance, where Y_i is the series and Y_{i+k} a lagged version of itself, and the bottom part (denominator) is the variance.

So we simply calculate the covariance of the series with a lagged version of itself and divide by the variance.
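
As a quick illustration, the same calculation can be done by hand in R on a made-up series; the result matches R's built-in acf():

# Lag-k autocorrelation by hand (illustrative example; x is a made-up series)
set.seed(1)
x <- cumsum(rnorm(500))   # random walk for demonstration
k <- 1                    # lag
x_bar <- mean(x)
num <- sum((x[1:(length(x)-k)] - x_bar) * (x[(1+k):length(x)] - x_bar)) # covariance (numerator)
den <- sum((x - x_bar)^2)                                               # variance (denominator)
num / den                 # same as acf(x, plot = FALSE)$acf[k + 1]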

We may plot the autocorrelation of the S&P 500 futures contract close prices.
source: https://www.quandl.com/data/CHRIS/CME_SP1-S-P-500-Futures-Continuous-Contract-1-SP1-Front-Month

I cannot validate the accuracy of this data, but it is good enough for illustration purposes.

[Figure: autocorrelation of S&P 500 futures close prices]

This is far from random: close prices show a high degree of correlation with lagged versions of themselves.

Now we may estimate the Hurst exponent over an S&P futures return series using the rescaled range analysis method (R/S). This is the original method devised by Harold Edwin Hurst, who developed the statistic while studying the long-term flow patterns of the Nile river.

We shall estimate H for a 1 bar return series and a 100 bar return series.

The R/S calculation steps are depicted on Wikipedia.

The Hurst exponent is more of an estimate than an absolute calculation. Corresponding H values signal:

H values less than 0.5 = anti-persistent behavior, mean reverting
H values of 0.5 = random process
H values greater than 0.5 = persistent behavior, trending

The procedure below calculates the R/S range and estimates the Hurst exponent by regressing log(R/S) against log(n), where n is the block size (lag):

1. Calculate a return series. We will estimate the Hurst exponent on a 1 bar return series. The thesis here is that it will be close to a random process if each bar really is independent of the last (no autocorrelation), giving H = 0.5. If we pick a longer return series, such as 100 bars, we would expect to see higher H values and higher autocorrelation coefficients.
2. The Hurst exponent is estimated by regression of a power law, log(R/S) vs log(n). The slope is the Hurst exponent. For that reason logarithmically spaced block sizes were chosen: [8,16,32,64,128,256,512,1024].
3. Calculate the mean for each block size, in this case 1:8, 2:9, etc., or i-n+1:i. Do this for each block size.
4. For said block size, subtract the mean from each value in the return series.
5. Sum the deviations from the mean.
6. Find the maximum and minimum of the summed deviations from the mean.
7. Find R, the range of the summed deviations, by subtracting the minimum from the maximum.
8. Calculate the standard deviation of the deviations from the mean.
9. Calculate R/S, the rescaled range, by dividing R by the standard deviation of the deviations from the mean.
10. After rolling through the series calculating the R/S value at each block size, take the mean R/S value per block size, so that for each lag (block size) we have one mean value.
11. Perform a regression of log2(mean_RS) vs log2(n). The slope is the Hurst exponent. The procedure above is detailed below using the Julia language:


####################################################################################
# Rolling Hurst
####################################################################################

using Statistics # mean, var, cor

# initialize outputs (n x 0 matrices; a column is appended per lag)
m = Array{Float64}(undef, length(d_es_close), 0)
log_n_out = m
out_RS = m

# Set lags (a range, or specific block sizes)
min_lag = 2
max_lag = 100
lags = min_lag:max_lag
# or specific lags # comment out where necessary
lags = [8,16,32,64,128,256,512,1024]
# Specify return series lag
n_lag = 2000

for (c, n) in enumerate(lags)
    # Calculate returns of the series
    #n_lag = lags[c] # set n_lag 1 for 1 bar returns
    rets = zeros(length(d_es_close))
    for i in n_lag+1:length(d_es_close)
        rets[i] = (d_es_close[i] / d_es_close[i-n_lag]) - 1
    end
    #rets = d_es_close
    # Rolling mean over the lookback window of width n
    mean_out = zeros(length(rets))
    for i = n:size(rets,1)
        mean_out[i] = mean(rets[i-n+1:i])
    end
    # Subtract the mean from each value (deviations from the mean)
    dev_mean_out = zeros(length(mean_out))
    for i = n:size(mean_out,1)
        dev_mean_out[i] = rets[i] - mean_out[i]
    end
    # Roll sum the deviations from the mean
    sum_out = zeros(length(dev_mean_out))
    for i = n:size(dev_mean_out,1)
        sum_out[i] = sum(dev_mean_out[i-n+1:i])
    end
    # Find the maximum and minimum of the summed deviations from the mean
    max_out = zeros(length(sum_out))
    for i = n:size(sum_out,1)
        max_out[i] = maximum(sum_out[i-n+1:i])
    end
    min_out = zeros(length(sum_out))
    for i = n:size(sum_out,1)
        min_out[i] = minimum(sum_out[i-n+1:i])
    end
    # R = Range, max - min
    R_out = zeros(length(dev_mean_out))
    for i = n:size(dev_mean_out,1)
        R_out[i] = max_out[i] - min_out[i]
    end
    # Rolling standard deviation of the deviations from the mean
    stdev_out = zeros(length(rets))
    for i = n:size(rets,1)
        stdev_out[i] = sqrt(var(dev_mean_out[i-n+1:i]))
    end
    # Calculate rescaled range (Range R_out / stdev of deviations)
    RS_out = zeros(length(R_out))
    for i = n:size(R_out,1)
        RS_out[i] = R_out[i] / stdev_out[i]
    end
    # Calculate log2(n)
    index = fill(n,length(rets))
    log_n = zeros(length(rets))
    for i = n:size(index,1)
        log_n[i] = log2(index[i])
    end

    # out
    global log_n_out = hcat(log_n_out, log_n)
    #log_rs_out = hcat(log_rs_out, log_RS)
    global out_RS = hcat(out_RS, RS_out) # re-scaled range
end

# access dims of the matrix
# row ,col
#log_rs_out[20,:]

# Calculate expected value for R/S over various n
# Take the rolling mean over each varying n lag at n width
expected_RS = zeros(size(out_RS,1), size(out_RS,2))
n=100
for j = 1:size(out_RS,2)  # loop on column dimension
    for i = n:size(out_RS,1) # loop on row dimension
        expected_RS[i,j] = mean(out_RS[i-n+1:i,j])
    end
end

# log 2
expected_log_RS = zeros(size(out_RS,1), size(out_RS,2))
for j = 1:size(expected_log_RS,2)  # loop on column dimension
    for i = n:size(expected_log_RS,1) # loop on row dimension
        expected_log_RS[i,j] = log2(expected_RS[i,j])
    end
end

# Regression slope of log(n) and expected value of log(R/S)
# x = log(n), y = log(R/S)
b_slope = zeros(size(out_RS,1))
A_intercept = zeros(size(out_RS,1))

for i = n:size(out_RS,1)
    xi = log_n_out[i,:]  # grab the row for varying lags of log_n
    yi = expected_log_RS[i,:] # grab the row for varying lags of the r/s value
    # Mean of X (Mx)
    Mx = mean(xi)
    # Mean of Y (My)
    My = mean(yi)
    # Standard deviation of X (sX)
    sX = sqrt(var(xi))
    # Standard deviation of Y (sY)
    sY = sqrt(var(yi))
    # Correlation of x and y (r)
    r = cor(xi,yi)
    # slope (b) = r * (sY/sX)
    b = r * (sY/sX)
    # intercept A = My - b*Mx
    A = My - (b*Mx)

    # out
    b_slope[i] = b
    A_intercept[i] = A
end
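
For reference, the same calculation over a single window can be written compactly. Below is a minimal sketch in R (hurst_rs is a hypothetical helper name; rets is any numeric return series), mirroring steps 1 to 11 above with non-overlapping blocks rather than the rolling windows used in the Julia code:

# Single-shot R/S Hurst estimate (illustrative sketch, not the rolling version above)
hurst_rs <- function(rets, lags = c(8, 16, 32, 64, 128, 256, 512, 1024)) {
  mean_rs <- sapply(lags, function(n) {
    # split the series into non-overlapping blocks of length n
    blocks <- split(rets, ceiling(seq_along(rets) / n))
    blocks <- blocks[sapply(blocks, length) == n]
    mean(sapply(blocks, function(b) {
      dev <- b - mean(b)            # deviations from the block mean
      z <- cumsum(dev)              # summed deviations
      (max(z) - min(z)) / sd(b)     # R / S for this block
    }))
  })
  # slope of log2(mean R/S) on log2(n) is the Hurst estimate
  unname(coef(lm(log2(mean_rs) ~ log2(lags)))[2])
}
# Sanity check: white noise should give H near 0.5
# hurst_rs(rnorm(5000))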

Using the code above we may plot the Hurst exponent for a 1 bar return series.

[Figure: rolling Hurst exponent, 1 bar return series]

For a random process, successive points in time are independent of each other. We see Hurst values fluctuating around the 0.50 area, reaching roughly H = 0.75 in 2009 during a strong trending period.

We may see the effect of analyzing a longer return stream, let's say a 100 bar return series:

[Figure: rolling Hurst exponent, 100 bar return series]

We see Hurst values closer to 1 for a 100 bar return series.

Furthermore, to see how the Hurst exponent and autocorrelation are related, we may calculate a rolling autocorrelation:

# Rolling Autocorrelation
# Sliding window (k width)
mean_out = zeros(size(d_es_close,1))
var_out = zeros(size(d_es_close,1))
cov_out = zeros(size(d_es_close,1))
autocor_out = zeros(size(d_es_close,1))
devmean_out = zeros(size(d_es_close,1))
n=1000
k = 1 # set lag
n_lag = 100 # return series lag

# Calculate the return series (n_lag bars)
rets = zeros(size(d_es_close,1))
for i in n_lag+1:length(d_es_close)
    rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
end

# lag the series by k
lagged_series = [zeros(k); rets[1:end-k]]

# Calculate the mean of the rolling sample
for i = n:size(rets,1)
    mean_out[i] = mean(rets[i-n+1:i])
end

# Calculate deviations from the mean
for i = n:size(rets,1)
    devmean_out[i] = rets[i] - mean_out[i]
end

# Calculate rolling variance
for i = n:size(rets,1)
    var_out[i] = var(rets[i-n+1:i])
end

# Calculate rolling covariance
for i = n:size(rets,1)
    if i+k >= size(rets,1)
        break
    end
    cov_out[i] = cov(rets[i-n+1:i],lagged_series[i-n+1:i])
end

# Rolling Autocorrelation
for i = n:size(rets,1)
    autocor_out[i] = cov_out[i] / var_out[i]
end

For a 1 bar return series at lag k=1:

[Figure: rolling autocorrelation, 1 bar return series, k=1]

We see very little correlation from one point in time to the next. The respective Hurst exponent is in the 0.40 to 0.55 range.

And the autocorrelation for a 100 bar return series at lag k=1:

[Figure: rolling autocorrelation, 100 bar return series, k=1]

We see strong correlation in the 100 bar return series, with correlation coefficients greater than 0.96 and respective Hurst exponents closer to 1.0.

So what's the point of all this?

As depicted above, information pertaining to the (non) randomness of a series may be of some benefit when designing models to fit to the data. The autocorrelation / Hurst exponent may be used to complement existing strategies, signaling the type of regime the market is currently in so that we may favor strategies which better suit current conditions.

As a note of interest, on a 1 bar daily return series for the SPY ETF at lags 2:20:

[Figure: rolling Hurst exponent, SPY 1 bar returns, lags 2:20]

Values of H are below 0.50, in mean reversion territory. Let's shorten the lags one more time, to 2:10.

[Figure: rolling Hurst exponent, SPY 1 bar returns, lags 2:10]

We see significant mean reversion H values. At holding periods sub 20 bars, a mean reversion model would be best suited.

A way I like to think of this is: how quickly does a series diffuse from its mean? A mean reverting series is almost stuck in the mud, with prices centered around a mean; I like to think of it as a compressed coil. In terms of the fractal dimension, which pertains to the roughness of a surface, mean reversion is most certainly rough. On the other side, a strong trend has a faster rate of diffusion from the mean, or in fractal terms, a smooth surface. The S&P 500 is a blend of this action: on a shorter holding period there is a mean reversion tendency (rough), but on a longer time horizon the market displays trending behavior (smooth).

In closing, and to recap: we see how the Hurst exponent and the autocorrelation of a series are related. Random processes show no dependence between one point in time and a successive point, i.e. no correlation, and a Hurst value of 0.5. On the other side, a series showing persistent behavior, a trending series, displays high correlation coefficients, where successive points in time are correlated with each other, and H values closer to 1.

Code may be found on my github.

R – Quantifying Trend Days

In this post I will be using R and data.table to extract all trend up / down days. The following method was used to quantify a trend day:

1. Trend up = close price closes within 25% of the day's high
2. Trend down = close price closes within 25% of the day's low
3. Gaps excluded: if the open is above yesterday's high or below yesterday's low, exclude the day
4. Daily return must be over +.75% / below -.75%

Other methods come to mind:
1. Select those stocks that close within 25% of the low / high AND whose daily gain / loss is above or below 1%.
2. Linear regression (low r2); see the blog post here: http://quantgorilla.com/blog/quantifying-trend-days/

My shell of a code can be altered to include these methods, but for now let's stick to closing within the top / bottom 25% of the day's range, excluding gap ups. The code to do this (check the github code at the bottom of the page; WP is having a hard time displaying the code snippet correctly):

# Es Up / Down Trend Isolation
# 2.26.2018
# Andrew Bannerman 

# Load Daily ES Data
require(lubridate) # mdy()
require(dplyr)     # lag()
es_d <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.day.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_d$Date <- mdy(es_d$Date)  # Change to date format 

# Find relative position of the closing price  (Close - Low) / (High - Low)
es_d$pos <- (es_d$Close - es_d$Low) / (es_d$High - es_d$Low)

# Calculate returns
es_d$rets <- ((es_d$Close / lag(es_d$Close,1)) - 1) * 100

# Find gap up / down days 
es_d$gap_up <- ifelse(es_d$Open > lag(es_d$High,1),1,0)
es_d$gap_dn <- ifelse(es_d$Open < lag(es_d$Low,1),1,0)

# Find trend up / down days 
es_d$trend_up <- ifelse(es_d$pos >= .75,1,0)
es_d$trend_dn <- ifelse(es_d$pos <= .25,1,0)

# Subset all trend up / down days 
# Trend day definition: close within 25 % of day high / low and close  over .75% to exclude quiet days 
# Exclude gap up / down days where open closes over high , low
trend_up <- es_d[es_d$trend_up == 1 & es_d$rets >= .75 & es_d$gap_up == 0, ]
trend_dn <- es_d[es_d$trend_dn == 1 & es_d$rets <= -.75 & es_d$gap_dn == 0, ]

# Count total trend up / down days
total_up <- nrow(trend_up)
total_dn <- nrow(trend_dn)

# Percentage trend days of sample
total <- nrow(es_d)
perc.up <- total_up / total
perc.dn <- total_dn / total

# Save dates in order to use for subsetting the intraday bars
trend_up_dates <- trend_up$Date
trend_dn_dates <- trend_dn$Date

There are a total of 5167 days in the sample. Of those days, approx 11% are up trend days and 10% are down trend days. Next we may extract each trend day and save the 1 minute intraday plot:

# Load 1 minute bars
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new <- mdy_hm(es_1$Date_new) # Change date to date format
es_1$Date <- mdy(es_1$Date)
# Save up trend plots
# initialize list
t_UP <- list()
i=1
for (i in 1:length(trend_up_dates)) {
tryCatch({
ptm0 <- proc.time()
temp <- subset(es_1 , Date == trend_up_dates[i]) # Conditional subset: grab trend day at the intraday level
temp$range <- temp$High - temp$Low
t_UP[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
temp <- temp[3:10]
head(temp)
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
head(df)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Up Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Up Stats
trend_up_out <- do.call(rbind,t_UP)
head(trend_up_out)
t_up_mean_r <- mean(trend_up_out$range)

# Save intraday down trend plots
# initialize list
t_DN <- list()
i=1
for (i in 1:length(trend_dn_dates)) {
tryCatch({
ptm0 <- proc.time()
Sys.sleep(0.1)
temp <- subset(es_1 , Date == trend_dn_dates[i])
temp$range <- temp$High - temp$Low
t_DN[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1))) # replace "-" with ""
temp$chg <- ifelse(temp$Close > temp$Open, "up", "dn")
temp <- temp[3:10]
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save down trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_down_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Down Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Down Stats
trend_dn_out <- do.call(rbind,t_DN)
t_dn_mean_r <- mean(trend_dn_out$range)

With example output:

[Example intraday plots: trend up days 20180126 and 20020611]

Next we can plot daily bars, 5 days prior to each trend day:

##############################################################################
# Extract 5 days prior to trend up / down day
##############################################################################
dates_df <- data.frame(Date = es_d$Date)
dates_df$wkdays <- weekdays(as.Date(dates_df$Date)) # extract day of week

# Grab trend up start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# up trend subsetting
up_list <- list()
for (i in 1:length(trend_up_dates)) {
  up_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_up_dates[i],]
}

# Plot 5 days prior to uptrend
t_UP_Prior <- list()
i=1
for (i in 1:length(up_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- up_list[[i]] # Conditional subset: grab the 5 days prior window
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Up Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Grab trend down start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# down trend subsetting
dn_list <- list()
for (i in 1:length(trend_dn_dates)) {
  dn_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_dn_dates[i],]
}

# Plot 5 days prior to down trend
i=1
for (i in 1:length(dn_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- dn_list[[i]] # Conditional subset: grab the 5 days prior window
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_dn_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Down Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

With output:

[Example plots: daily bars 5 days prior to trend days 19990204 and 20110914]

There are two ways to attempt to predict a trend day with a certain degree of probability:

1. Use patterns of daily bars
2. Use intraday patterns within the trend day

The daily bars plotted above allow us to view the bars prior to a trend up / down day. This code can be extended to look at specific features, which I will leave the reader to work on (a sketch of one such feature follows the list below). These may include:

1. Prior to the trend day, was the previous day down or up?
2. Where did the price close relative to the day's range ((Close - Low) / (High - Low))?
3. What did the 3 day pattern look like prior to a trend day?
– 3 tight closes?
– middle bar low and close lower than the bars on each side?

etc etc…
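
As a minimal sketch of coding up the first two features (assuming the es_d data frame and trend_up_dates from the code above; prior_up and prior_pos are hypothetical column names):

# Hypothetical feature columns: direction and close location of the prior day
es_d$prior_up <- dplyr::lag(ifelse(es_d$Close > es_d$Open, 1, 0), 1) # 1 = previous day closed up
es_d$prior_pos <- dplyr::lag(es_d$pos, 1)                            # previous day close relative to range
feat <- es_d[es_d$Date %in% trend_up_dates, c("Date", "prior_up", "prior_pos")]
table(feat$prior_up) # how often a trend up day followed an up vs down day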

Now we move on to an attempt to quantify a trend day within the trend day itself. This involves a simple count of the number of new 1 minute highs or lows made by a certain time of day. In this example I chose the number of new 1 minute highs or lows prior to 10am Central time.

The code that achieves this:

##########################################################################################
# Intraday Trend up / down
##########################################################################################
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new  <- mdy_hm(es_1$Date_new)  # Change date to date format
es_1$Date <- mdy(es_1$Date)
# make data.table for speed
require(data.table)
es_1 <- as.data.table(es_1)
es_1[, roll_max := cummax(High), by = Date] # rolling day high
es_1[, close_above_high := ifelse(Close > shift(roll_max), 1, 0), by = Date] # 1 when close makes a new day high
es_1[, roll_min := cummin(Low), by = Date] # rolling day low
es_1[, close_below_low := ifelse(Close < shift(roll_min), 1, 0), by = Date] # 1 when close makes a new day low
# count new highs / lows made between 08:31 and 10:00 Central
es_1[, before_10_UCT_up := cumsum(close_above_high %in% 1 & Time >= "08:31" & Time <= "10:00"), by = Date]
es_1[, before_10_UCT_dn := cumsum(close_below_low %in% 1 & Time >= "08:31" & Time <= "10:00"), by = Date]
# one row per day for the distribution plots
plot.df <- data.frame(es_1[, .(count_max = max(before_10_UCT_up), count_min = max(before_10_UCT_dn)), by = Date])
# plot distributions
ggplot(data=plot.df, aes(plot.df$count_max)) +
  geom_histogram(binwidth = .5,fill="darkgreen",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(min(plot.df$count_max), max(plot.df$count_max)),name="No. New Highs Prior 10am Central")+
  scale_y_continuous(breaks = seq(0, 600000,100000))+
  ggtitle("Number of new 1 minute highs prior to 10am Central")+
  ylab("Total New Highs")

# Plot new lows
ggplot(data=plot.df, aes(plot.df$count_min)) +
  geom_histogram(binwidth = .5,fill="red",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(min(plot.df$count_min), max(plot.df$count_min)),name="No. New Lows Prior 10am Central")+
  #scale_y_continuous(breaks = seq(0, 35500,2000))+
  ggtitle("Number of new 1 minute lows prior to 10am Central")+
  ylab("Total New Low")
graphics.off()

We may plot the distribution of new highs / lows prior to 10am central time:

[Figures: distributions of new 1 minute highs and new 1 minute lows prior to 10am Central]

We will back test the less frequent new high / new low counts prior to 10am Central.

Rules of back test:

1. Go long when the number of new highs prior to 10am Central is >= 8
2. Go short when the number of new lows prior to 10am Central is >= 8
3. Exit at the end of the trading day

No stops; no slippage or commissions.

The code to do this:

# Back test
# Trading Signal long and short
es_1[, sig_buy := ifelse(before_10_UCT_up >= 8, 1,0)] # long signal
es_1[, sig_short := ifelse(before_10_UCT_dn >= 8, -1,0)] # short signal
es_1[, sig_exit := Time < "15:15"] # long exit time
es_1[, sig_s_exit := Time < "15:15"] # short exit time
#es_1[is.na(es_1)] <- 0
es_1[, sig_end_l := ifelse(sig_exit == FALSE,0,NA)] # mark end of trade retain NA
es_1[, sig_end_s := ifelse(sig_s_exit == FALSE,0,NA)] # mark end of trade retain NA
# Combine NA + signal for long trades
es_1$z_l <- ifelse(es_1$sig_buy == 1,
               rowSums(es_1[, c("sig_buy", "sig_end_l")], na.rm=TRUE),
               rowSums(es_1[, c("sig_buy", "sig_end_l")]))
es_1$z_l[1] <- 0
# Combine NA + signal for short trades
es_1$z_s <- ifelse(es_1$sig_short == -1,
                   rowSums(es_1[, c("sig_short", "sig_end_s")], na.rm=TRUE),
                   rowSums(es_1[, c("sig_short", "sig_end_s")]))
es_1$z_s[1] <- 0
# Forward fill 1's to end of the trade using package zoo, function na.locf
es_1[, final.signal_long := na.locf(z_l)] # long trades
es_1[, final.signal_short := na.locf(z_s)] # short trades
# lag signal by one forward day to signal entry next day (Trade at market open)
es_1$final.signal_long <- lag(es_1$final.signal_long,1) # Note k=1 implies a move *forward*
es_1$final.signal_short <- lag(es_1$final.signal_short,1) # Note k=1 implies a move *forward*
# Close to close returns
es_1[, one_min_rets := (Close / shift(Close)) - 1]
# Calculate strategy returns
es_1[, sig_long_rets := final.signal_long * one_min_rets]
es_1[, sig_short_rets := final.signal_short * one_min_rets]
es_1$sig_long_rets[1] <- 0
es_1$sig_short_rets[1] <- 0
# Combine equity curves
es_1[, combined_rets := sig_long_rets + sig_short_rets]
# Cumulative returns
es_1[, cum_rets_l := cumprod(1 + sig_long_rets) - 1]
es_1[, cum_rets_s := cumprod(1 + sig_short_rets) - 1]
es_1[, cum_rets_comb := cumprod(1 + combined_rets) - 1]
plot(es_1$Date_new,es_1$cum_rets_l,type="l")
graphics.off()
# Plot equity curves
line_plot_df <- data.frame(es_1$cum_rets_l,es_1$cum_rets_s,es_1$cum_rets_comb)
line.plot.df <- melt(line_plot_df)
date <- rep(es_1$Date,3)
line.plot.df <- cbind(line.plot.df,date)
head(line.plot.df)
ggplot(data=line.plot.df, aes(x=date, y=value, group = variable, colour = variable)) +
  geom_line()

With the final equity curves:

[Figure: cumulative long, short and combined equity curves]

No optimization of the times, the number of new highs, etc… I chose each arbitrarily.

To conclude, predicting trend days both before they occur and within the trend day itself is an interesting idea. This is a very simple example which may be extended.

Full code can be found on my github:

Thanks for reading!

Model Fitting – Using Autocorrelation to Detect The Nature of a Time Series

I believe a solid approach in developing trading strategies is to fit the correct model to the underlying series. In a previous post we studied the Hurst exponent and its ability to detect whether a series was mean reverting, momentum or a random walk. In this post we will look at a 2 lag autocorrelation on the S&P500. The long and short of it is: if there is a high degree of positive correlation from one point in time to a previous point in time, we have momentum. Conversely, if there is negative correlation between one point in time and a previous point in time, we have mean reversion.

In order to test this we use the acf R function. We run this over daily S&P500 log returns using rollapply on a rolling window of 756 bars (252 trading days * 3 = 3 trading years). Thus each data point is the correlation coefficient over the previous 3 year rolling window.

The code that achieves this:

# Rolling auto correlation 

# Required Packages
require(ggplot2)
require(TTR)
require(zoo)       # rollapply()
require(lubridate) # ymd()

# SP500 
SP500 <- read.csv("C:/Stock Market Analysis/Norgate Market Data/MASTER_DATA_DUMP/$SPX.csv", header=TRUE,stringsAsFactors = FALSE)
SP500$Date <- ymd(SP500$Date)
SP500$log.close <- log(SP500$Close)

# rets 
SP500$rets <- ROC(SP500$log.close, type = c("discrete"))
SP500$rets[1] <-0

# Rolling acf width 
acf.width <- 756 # bars, 756 bars = 3 years (252 * 3 = 756)

# Dates 
SP500dates <- SP500[acf.width:nrow(SP500),]
dates.df <- data.frame(Date=SP500dates$Date)
str(dates.df)
head(dates.df )

# Rolling auto correlation 
result <- rollapply(SP500$rets, width = acf.width, FUN=acf, 
                    lag.max = 30,type = "correlation",plot = FALSE)

# Extract different correlations 
cor2 <- lapply(result[,1], function(x) x[2, drop=FALSE])
cor3 <- lapply(result[,1], function(x) x[3, drop=FALSE])
cor4 <- lapply(result[,1], function(x) x[4, drop=FALSE])
cor5 <- lapply(result[,1], function(x) x[5, drop=FALSE])
cor6 <- lapply(result[,1], function(x) x[6, drop=FALSE])
cor7 <- lapply(result[,1], function(x) x[7, drop=FALSE])
cor8 <- lapply(result[,1], function(x) x[8, drop=FALSE])
cor9 <- lapply(result[,1], function(x) x[9, drop=FALSE])
cor10 <- lapply(result[,1], function(x) x[10, drop=FALSE])
cor11 <- lapply(result[,1], function(x) x[11, drop=FALSE])
cor12 <- lapply(result[,1], function(x) x[12, drop=FALSE])
cor13 <- lapply(result[,1], function(x) x[13, drop=FALSE])
cor14 <- lapply(result[,1], function(x) x[14, drop=FALSE])
cor15 <- lapply(result[,1], function(x) x[15, drop=FALSE])
cor16 <- lapply(result[,1], function(x) x[16, drop=FALSE])
cor17 <- lapply(result[,1], function(x) x[17, drop=FALSE])
cor18 <- lapply(result[,1], function(x) x[18, drop=FALSE])
cor19 <- lapply(result[,1], function(x) x[19, drop=FALSE])
cor20 <- lapply(result[,1], function(x) x[20, drop=FALSE])
cor21 <- lapply(result[,1], function(x) x[21, drop=FALSE])
cor22 <- lapply(result[,1], function(x) x[22, drop=FALSE])
cor23 <- lapply(result[,1], function(x) x[23, drop=FALSE])
cor24 <- lapply(result[,1], function(x) x[24, drop=FALSE])
cor25 <- lapply(result[,1], function(x) x[25, drop=FALSE])
cor26 <- lapply(result[,1], function(x) x[26, drop=FALSE])
cor27 <- lapply(result[,1], function(x) x[27, drop=FALSE])
cor28 <- lapply(result[,1], function(x) x[28, drop=FALSE])
cor29 <- lapply(result[,1], function(x) x[29, drop=FALSE])
cor30 <- lapply(result[,1], function(x) x[30, drop=FALSE])

# cbind outputs
cor2.df <- as.data.frame(do.call(rbind, cor2))
cor2.df <- data.frame(Date=dates.df$Date,cor2.df)
cor3.df <- as.data.frame(do.call(rbind, cor3))
cor3.df <- data.frame(Date=dates.df$Date,cor3.df)
cor4.df <- as.data.frame(do.call(rbind, cor4))
cor4.df <- data.frame(Date=dates.df$Date,cor4.df)
cor5.df <- as.data.frame(do.call(rbind, cor5))
cor5.df <- data.frame(Date=dates.df$Date,cor5.df)
cor6.df <- as.data.frame(do.call(rbind, cor6))
cor6.df <- data.frame(Date=dates.df$Date,cor6.df)
cor7.df <- as.data.frame(do.call(rbind, cor7))
cor7.df <- data.frame(Date=dates.df$Date,cor7.df)
cor8.df <- as.data.frame(do.call(rbind, cor8))
cor8.df <- data.frame(Date=dates.df$Date,cor8.df)
cor9.df <- as.data.frame(do.call(rbind, cor9))
cor9.df <- data.frame(Date=dates.df$Date,cor9.df)
cor10.df <- as.data.frame(do.call(rbind, cor10))
cor10.df <- data.frame(Date=dates.df$Date,cor10.df)
cor11.df <- as.data.frame(do.call(rbind, cor11))
cor11.df <- data.frame(Date=dates.df$Date,cor11.df)
cor12.df <- as.data.frame(do.call(rbind, cor12))
cor12.df <- data.frame(Date=dates.df$Date,cor12.df)
cor13.df <- as.data.frame(do.call(rbind, cor13))
cor13.df <- data.frame(Date=dates.df$Date,cor13.df)
cor14.df <- as.data.frame(do.call(rbind, cor14))
cor14.df <- data.frame(Date=dates.df$Date,cor14.df)
cor15.df <- as.data.frame(do.call(rbind, cor15))
cor15.df <- data.frame(Date=dates.df$Date,cor15.df)
cor16.df <- as.data.frame(do.call(rbind, cor16))
cor16.df <- data.frame(Date=dates.df$Date,cor16.df)
cor17.df <- as.data.frame(do.call(rbind, cor17))
cor17.df <- data.frame(Date=dates.df$Date,cor17.df)
cor18.df <- as.data.frame(do.call(rbind, cor18))
cor18.df <- data.frame(Date=dates.df$Date,cor18.df)
cor19.df <- as.data.frame(do.call(rbind, cor19))
cor19.df <- data.frame(Date=dates.df$Date,cor19.df)
cor20.df <- as.data.frame(do.call(rbind, cor20))
cor20.df <- data.frame(Date=dates.df$Date,cor20.df)
cor21.df <- as.data.frame(do.call(rbind, cor21))
cor21.df <- data.frame(Date=dates.df$Date,cor21.df)
cor22.df <- as.data.frame(do.call(rbind, cor22))
cor22.df <- data.frame(Date=dates.df$Date,cor22.df)
cor23.df <- as.data.frame(do.call(rbind, cor23))
cor23.df <- data.frame(Date=dates.df$Date,cor23.df)
cor24.df <- as.data.frame(do.call(rbind, cor24))
cor24.df <- data.frame(Date=dates.df$Date,cor24.df)
cor25.df <- as.data.frame(do.call(rbind, cor25))
cor25.df <- data.frame(Date=dates.df$Date,cor25.df)
cor26.df <- as.data.frame(do.call(rbind, cor26))
cor26.df <- data.frame(Date=dates.df$Date,cor26.df)
cor27.df <- as.data.frame(do.call(rbind, cor27))
cor27.df <- data.frame(Date=dates.df$Date,cor27.df)
cor28.df <- as.data.frame(do.call(rbind, cor28))
cor28.df <- data.frame(Date=dates.df$Date,cor28.df)
cor29.df <- as.data.frame(do.call(rbind, cor29))
cor29.df <- data.frame(Date=dates.df$Date,cor29.df)
cor30.df <- as.data.frame(do.call(rbind, cor30))
cor30.df <- data.frame(Date=dates.df$Date,cor30.df)

# smooth 
#cor2.df$sma <- SMA(cor2.df$V1,100)

# Plot ACF 
ggplot(cor2.df,aes(Date,V1))+geom_line(aes(color="Lag 2"))+
    geom_line(data=cor3.df,aes(color="Lag 3"))+
  geom_line(data=cor4.df,aes(color="Lag 4"))+
  geom_line(data=cor5.df,aes(color="Lag 5"))+
  geom_line(data=cor6.df,aes(color="Lag 6"))+
  geom_line(data=cor7.df,aes(color="Lag 7"))+
  geom_line(data=cor8.df,aes(color="Lag 8"))+
  geom_line(data=cor9.df,aes(color="Lag 9"))+
  geom_line(data=cor10.df,aes(color="Lag 10"))+
  geom_line(data=cor11.df,aes(color="Lag 11"))+
  geom_line(data=cor12.df,aes(color="Lag 12"))+
  geom_line(data=cor13.df,aes(color="Lag 13"))+
  geom_line(data=cor14.df,aes(color="Lag 14"))+
  geom_line(data=cor15.df,aes(color="Lag 15"))+
  geom_line(data=cor16.df,aes(color="Lag 16"))+
  geom_line(data=cor17.df,aes(color="Lag 17"))+
  geom_line(data=cor18.df,aes(color="Lag 18"))+
  geom_line(data=cor19.df,aes(color="Lag 19"))+
  geom_line(data=cor20.df,aes(color="Lag 20"))+
  geom_line(data=cor21.df,aes(color="Lag 21"))+
  geom_line(data=cor22.df,aes(color="Lag 22"))+
  geom_line(data=cor23.df,aes(color="Lag 23"))+
  geom_line(data=cor24.df,aes(color="Lag 24"))+
  geom_line(data=cor25.df,aes(color="Lag 25"))+
  geom_line(data=cor26.df,aes(color="Lag 26"))+
  geom_line(data=cor27.df,aes(color="Lag 27"))+
  geom_line(data=cor28.df,aes(color="Lag 28"))+
  geom_line(data=cor29.df,aes(color="Lag 29"))+
  geom_line(data=cor30.df,aes(color="Lag 30"))+
labs(color="Legend text")


# Plot with shade
ggplot(cor2.df,aes(Date,V1))+geom_line(aes(colour="V1"))+
  geom_area(aes(y = V1, fill="V1"))+
  ggtitle("Rolling Auto Correlation, 2 lag, rolling window = 3 years", subtitle = "S&P500 Log Returns") +
  labs(x="Date",y="Correlation")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(legend.position="none")+
  scale_y_continuous(breaks = seq(-.5, .50, by = 0.05))

# Write 
write.csv(cor2.df,"C:/R Projects/lag2.autocorr.csv")

Plot the 2 lag autocorrelation:

[Figure: rolling 2 lag autocorrelation of S&P 500 log returns, 3 year window]

A few things are obvious… but first… let's plot a simple strategy to test buying high and selling higher (momentum) and buying low and selling higher (mean reversion). We simply want to test what happens if we execute the above.

The rules:
1. Take a rolling z-score of S&P500 close prices
2. Momentum buy and exit: buy when the rolling z-score is over 0 and sell when it is below 0.
3. Mean reversion buy and exit: buy when the rolling z-score is below 0 and sell when it is above 0.

The code that achieves this:

# Mean Reversion & Momentum S&P500 
# Andrew Bannerman 1.17.2018

######################################
# Required Packages 
#####################################
require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)
require(gridExtra)

## Load Data / Format handling
SP500 <- read.csv("C:/Stock Market Analysis/Norgate Market Data/MASTER_DATA_DUMP/$SPX.csv", header=TRUE,stringsAsFactors = FALSE)
SP500$Date <- ymd(SP500$Date)

# Use TTR package to create rolling SMA n day moving average 
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
  function(SP500) {
    SMA(SP500[,"Close"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(SP500), ncol=0)

# Loop for filling it
for (i in 2:60) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(SP500))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:60, function(n)paste("close.sma.n", n, sep=""))

# Bind to existing dataframe
SP500 <-  cbind(SP500, sma.matrix)

# Use TTR package to create rolling Standard Deviation
# Create function and loop in order to repeat the desired number of Stdev for example 2:30
getSD <- function(numdays) {
  function(SP500) {
    runSD(SP500$Close, numdays, cumulative = FALSE)    # Calls TTR package to create rolling standard deviation
  }
}
# Create a matrix to put the rolling standard deviations in
sd.matrix <- matrix(nrow=nrow(SP500), ncol=0)

# Loop for filling it
for (i in 2:60) {
  sd.matrix <- cbind(sd.matrix, getSD(i)(SP500))
}

# Rename columns
colnames(sd.matrix) <- sapply(2:60, function(n)paste("close.sd.n", n, sep=""))

# Bind to existing dataframe
SP500 <-  cbind(SP500, sd.matrix)


# Use base R to work out the rolling z-score (Close - roll mean) / stdev
SP500$close.zscore.n2 <- apply(SP500[,c('Close','close.sma.n2', 'close.sd.n2')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n3 <- apply(SP500[,c('Close','close.sma.n3', 'close.sd.n3')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n4 <- apply(SP500[,c('Close','close.sma.n4', 'close.sd.n4')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n5 <- apply(SP500[,c('Close','close.sma.n5', 'close.sd.n5')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n6 <- apply(SP500[,c('Close','close.sma.n6', 'close.sd.n6')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n7 <- apply(SP500[,c('Close','close.sma.n7', 'close.sd.n7')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n8 <- apply(SP500[,c('Close','close.sma.n8', 'close.sd.n8')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n9 <- apply(SP500[,c('Close','close.sma.n9', 'close.sd.n9')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n10 <- apply(SP500[,c('Close','close.sma.n10', 'close.sd.n10')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n11 <- apply(SP500[,c('Close','close.sma.n11', 'close.sd.n11')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n12 <- apply(SP500[,c('Close','close.sma.n12', 'close.sd.n12')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n13 <- apply(SP500[,c('Close','close.sma.n13', 'close.sd.n13')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n14 <- apply(SP500[,c('Close','close.sma.n14', 'close.sd.n14')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n15 <- apply(SP500[,c('Close','close.sma.n15', 'close.sd.n15')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n16 <- apply(SP500[,c('Close','close.sma.n16', 'close.sd.n16')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n17 <- apply(SP500[,c('Close','close.sma.n17', 'close.sd.n17')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n18 <- apply(SP500[,c('Close','close.sma.n18', 'close.sd.n18')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n19 <- apply(SP500[,c('Close','close.sma.n19', 'close.sd.n19')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n20 <- apply(SP500[,c('Close','close.sma.n20', 'close.sd.n20')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n21 <- apply(SP500[,c('Close','close.sma.n21', 'close.sd.n21')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n22 <- apply(SP500[,c('Close','close.sma.n22', 'close.sd.n22')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n23 <- apply(SP500[,c('Close','close.sma.n23', 'close.sd.n23')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n24 <- apply(SP500[,c('Close','close.sma.n24', 'close.sd.n24')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n25 <- apply(SP500[,c('Close','close.sma.n25', 'close.sd.n25')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n26 <- apply(SP500[,c('Close','close.sma.n26', 'close.sd.n26')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n27 <- apply(SP500[,c('Close','close.sma.n27', 'close.sd.n27')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n28 <- apply(SP500[,c('Close','close.sma.n28', 'close.sd.n28')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n29 <- apply(SP500[,c('Close','close.sma.n29', 'close.sd.n29')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n30 <- apply(SP500[,c('Close','close.sma.n30', 'close.sd.n30')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n31 <- apply(SP500[,c('Close','close.sma.n31', 'close.sd.n31')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n32 <- apply(SP500[,c('Close','close.sma.n32', 'close.sd.n32')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n33 <- apply(SP500[,c('Close','close.sma.n33', 'close.sd.n33')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n34 <- apply(SP500[,c('Close','close.sma.n34', 'close.sd.n34')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n35 <- apply(SP500[,c('Close','close.sma.n35', 'close.sd.n35')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n36 <- apply(SP500[,c('Close','close.sma.n36', 'close.sd.n36')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n37 <- apply(SP500[,c('Close','close.sma.n37', 'close.sd.n37')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n38 <- apply(SP500[,c('Close','close.sma.n38', 'close.sd.n38')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n39 <- apply(SP500[,c('Close','close.sma.n39', 'close.sd.n39')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n40 <- apply(SP500[,c('Close','close.sma.n40', 'close.sd.n40')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n41 <- apply(SP500[,c('Close','close.sma.n41', 'close.sd.n41')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n42 <- apply(SP500[,c('Close','close.sma.n42', 'close.sd.n42')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n43 <- apply(SP500[,c('Close','close.sma.n43', 'close.sd.n43')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n44 <- apply(SP500[,c('Close','close.sma.n44', 'close.sd.n44')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n45 <- apply(SP500[,c('Close','close.sma.n45', 'close.sd.n45')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n46 <- apply(SP500[,c('Close','close.sma.n46', 'close.sd.n46')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n47 <- apply(SP500[,c('Close','close.sma.n47', 'close.sd.n47')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n48 <- apply(SP500[,c('Close','close.sma.n48', 'close.sd.n48')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n49 <- apply(SP500[,c('Close','close.sma.n49', 'close.sd.n49')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n50 <- apply(SP500[,c('Close','close.sma.n50', 'close.sd.n50')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n51 <- apply(SP500[,c('Close','close.sma.n51', 'close.sd.n51')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n52 <- apply(SP500[,c('Close','close.sma.n52', 'close.sd.n52')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n53 <- apply(SP500[,c('Close','close.sma.n53', 'close.sd.n53')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n54 <- apply(SP500[,c('Close','close.sma.n54', 'close.sd.n54')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n55 <- apply(SP500[,c('Close','close.sma.n55', 'close.sd.n55')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n56 <- apply(SP500[,c('Close','close.sma.n56', 'close.sd.n56')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n57 <- apply(SP500[,c('Close','close.sma.n57', 'close.sd.n57')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n58 <- apply(SP500[,c('Close','close.sma.n58', 'close.sd.n58')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n59 <- apply(SP500[,c('Close','close.sma.n59', 'close.sd.n59')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n60 <- apply(SP500[,c('Close','close.sma.n60', 'close.sd.n60')], 1, function(x) { (x[1]-x[2])/x[3] } )
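
# NOTE: the 59 column-by-column z-score lines above can be written
# equivalently as one loop over the window sizes:
#for (n in 2:60) {
#  SP500[[paste0("close.zscore.n", n)]] <-
#    (SP500$Close - SP500[[paste0("close.sma.n", n)]]) / SP500[[paste0("close.sd.n", n)]]
#}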

# Convert all NA to 0
SP500[is.na(SP500)] <- 0

# Calculate quartiles, where the close is in relation to the range (Close - Low) / (High - Low)
#SP500$quartile <- apply(SP500[,c('Close', 'Low', 'High')], 1, function(x) { (x[1]-x[2])/(x[3]-x[2])} )

# Calculate Returns from open to close 
SP500$ocret <- apply(SP500[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
SP500$clret <- ROC(SP500$Close, type = c("discrete"))
SP500$clret[1] <- 0

#####################################################################
# Split Data To Train and Test Set
#####################################################################
#SP500 <- subset(SP500, Date >= as.Date("1993-01-01") )             #Input for subsetting by date versus splitting, works with train set only
train.index <- 1:(nrow(SP500)*.570) # Add *.666 for 2/3rd split
train.set <- SP500[train.index, ]
test.set <- SP500[-train.index, ]

#####################################################################
# Assign train and test set indicators 
#####################################################################
# Name indicators #
train.indicator <- train.set$close.zscore.n11
test.indicator <- test.set$close.zscore.n11

######################################################################
# Develop Training Set Parameters
# ####################################################################
# Enter buy / sell rules
train.set$enter.mean.rev <- ifelse(train.indicator < 0, 1, 0)
train.set$exit.mean.rev <- ifelse(train.indicator > 0, 1, 0)
train.set$enter.momo <- ifelse(train.indicator > 0, 1, 0)
train.set$exit.momo <- ifelse(train.indicator < 0, 1, 0)

# Mean Rev
train.set <- train.set %>%
  dplyr::mutate(sig.mean.rev = ifelse(enter.mean.rev == 1, 1,
                                      ifelse(exit.mean.rev == 1, 0, 0)))

# Momentum
train.set <- train.set %>%
  dplyr::mutate(sig.momo = ifelse(enter.momo == 1, 1,
                                      ifelse(exit.momo == 1, 0, 0)))


# lag signal by one forward day to signal entry next day 
train.set$sig.mean.rev <- lag(train.set$sig.mean.rev,1) # Note k=1 implies a move *forward*
train.set$sig.momo <- lag(train.set$sig.momo,1) # Note k=1 implies a move *forward*
train.set[is.na(train.set)] <- 0  # Set NA to 0

# Calculate equity curves
# Mean rev
train.set <- train.set %>%
  dplyr::mutate(RunID = rleid(sig.mean.rev)) %>%
  group_by(RunID) %>%
  dplyr::mutate(mean.rev.equity = ifelse(sig.mean.rev == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Momentum
train.set <- train.set %>%
  dplyr::mutate(RunID = rleid(sig.momo)) %>%
  group_by(RunID) %>%
  dplyr::mutate(momo.equity = ifelse(sig.momo == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(train.set$mean.rev.equity, order.by=as.POSIXct(train.set$Date, format="%Y-%m-%d")) 
xts2 = xts(train.set$momo.equity, order.by=as.POSIXct(train.set$Date, format="%Y-%m-%d")) 
xts3 = xts(train.set$clret, order.by=as.POSIXct(train.set$Date, format="%Y-%m-%d")) 

# Join XTS together 
train.compare <- cbind(xts1,xts2,xts3)

# Use the PerformanceAnalytics package for trade statistics
colnames(train.compare) <- c("Mean Reversion","Momentum","Buy And Hold")
charts.PerformanceSummary(train.compare,main="Cumulative Returns", wealth.index=TRUE, colorset=rainbow12equal)
#png(filename="20090606_rsi2_performance_updated.png", 720, 720)
#performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
#drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()

# Train Set Log Returns 
train.logRets <- log(cumprod(1+train.compare))
chart.TimeSeries(train.logRets, legend.loc='topleft', colorset=rainbow12equal)

#print(performance.table) # uncomment the tables above first
#print(drawdown.table)

######################################################
# Develop Test Set Parameters (Unseen Data)
######################################################

# Enter buy / sell rules
test.set$enter.mean.rev <- ifelse(test.indicator < 0, 1, 0)
test.set$exit.mean.rev <- ifelse(test.indicator > 0, 1, 0)
test.set$enter.momo <- ifelse(test.indicator > 0, 1, 0)
test.set$exit.momo <- ifelse(test.indicator < 0, 1, 0)

# Mean Rev
test.set <- test.set %>%
  dplyr::mutate(sig.mean.rev = ifelse(enter.mean.rev == 1, 1,
                                      ifelse(exit.mean.rev == 1, 0, 0)))

# Momentum
test.set <- test.set %>%
  dplyr::mutate(sig.momo = ifelse(enter.momo == 1, 1,
                                  ifelse(exit.momo == 1, 0, 0)))


# lag signal by one forward day to signal entry next day 
test.set$sig.mean.rev <- lag(test.set$sig.mean.rev,1) # Note k=1 implies a move *forward*
test.set$sig.momo <- lag(test.set$sig.momo,1) # Note k=1 implies a move *forward*
test.set[is.na(test.set)] <- 0  # Set NA to 0

# Calculate equity curves
# Mean rev
test.set <- test.set %>%
  dplyr::mutate(RunID = rleid(sig.mean.rev)) %>%
  group_by(RunID) %>%
  dplyr::mutate(mean.rev.equity = ifelse(sig.mean.rev == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Momentum
test.set <- test.set %>%
  dplyr::mutate(RunID = rleid(sig.momo)) %>%
  group_by(RunID) %>%
  dplyr::mutate(momo.equity = ifelse(sig.momo == 0, 0,
                                     ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(test.set$mean.rev.equity, order.by=as.POSIXct(test.set$Date, format="%Y-%m-%d")) 
xts2 = xts(test.set$momo.equity, order.by=as.POSIXct(test.set$Date, format="%Y-%m-%d")) 
xts3 = xts(test.set$clret, order.by=as.POSIXct(test.set$Date, format="%Y-%m-%d")) 

# Join XTS together 
test.compare <- cbind(xts1,xts2,xts3)

# Use the PerformanceAnalytics package for trade statistics
colnames(test.compare) <- c("Mean Reversion","Momentum","Buy And Hold")
charts.PerformanceSummary(test.compare,main="Cumulative Returns", wealth.index=TRUE, colorset=rainbow12equal)
#png(filename="20090606_rsi2_performance_updated.png", 720, 720)
#performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
#drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()

# Test Set Log Returns 
test.logRets <- log(cumprod(1+test.compare))
chart.TimeSeries(test.logRets, legend.loc='topleft', colorset=rainbow12equal)

#print(performance.table) # uncomment the tables above first
#print(drawdown.table)
head(train.logRets)

tail(train.logRets)

# Prepare data for plotting 
train.reps <- rep(1,nrow(train.logRets))
test.reps <- rep(2,nrow(test.logRets))
id <- c(train.reps,test.reps)
final <- rbind(test.compare,train.compare)
final <-log(cumprod(1+final))
final.df <- data.frame(final,id=id)
final.df <- setDT(final.df, keep.rownames = TRUE)[] # Set row names
colnames(final.df)[1] <- "Date"
final.df$Date <- ymd_hms(final.df$Date)

# negative 2 lag auto correlation dates 
start_one <- as.POSIXct("1931-9-25")
end_one <- as.POSIXct("1932-7-21")
start_two <- as.POSIXct("1936-1-13")
end_two <- as.POSIXct("1940-5-13")
start_three <- as.POSIXct("1940-12-26")
end_three <- as.POSIXct("1941-1-11")
start_four <- as.POSIXct("1941-4-4")
end_four <- as.POSIXct("1941-7-29")
start_five <- as.POSIXct("1965-9-23")
end_five <- as.POSIXct("1965-10-22")
start_six <- as.POSIXct("1990-10-12")
end_six <- as.POSIXct("1990-10-19")
start_seven <- as.POSIXct("1994-12-23")
end_seven <- as.POSIXct("1995-1-16")
start_eight <- as.POSIXct("1998-9-8")
end_eight <- as.POSIXct("2017-10-20")

# Plot train and test set equity curves with ggplot2 
p1<- ggplot(data = final.df, aes(x =Date, y = Mean.Reversion,colour=id))+
  geom_line(data = final.df, aes(x =Date, y = Mean.Reversion,colour=id))+
  geom_line(data = final.df, aes(x =Date, y = Momentum,colour=id))+
  geom_line(data = final.df, aes(x =Date, y = Buy.And.Hold,colour=id))+
  theme_bw()+
  geom_vline(xintercept=as.numeric(as.POSIXct("1977-08-18 19:00:00")), linetype="dashed")+
  theme(legend.position = "none")+
  scale_y_continuous(breaks = seq(-7, 7, by = .5))+
  ggtitle("Mean Reversion, Momentum, Buy And Hold - S&P500 Index",subtitle="1928 to Present") +
  labs(x="Date",y="Cumulative Log Return")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  #geom_rect(aes(xmin=start_one,xmax=end_one,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_two,xmax=end_two,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_three,xmax=end_three,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_four,xmax=end_four,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_five,xmax=end_five,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_six,xmax=end_six,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_seven,xmax=end_seven,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_eight,xmax=end_eight,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  annotate("text", label = "Mean Reversion", x = as.POSIXct("2010-01-01"), y = -2.0, color = "#003A90")+
  annotate("text", label = "Momentum", x = as.POSIXct("2010-01-01"), y = 6.5, color = "#003A90")+
  annotate("text", label = "Buy and Hold", x = as.POSIXct("2010-01-01"), y = 3.5, color = "#003A90")+
annotate("text", label = "Out of Sample", x = as.POSIXct("2000-01-01"), y = .5, color = "#56B1F7")+
  annotate("text", label = "In Sample", x = as.POSIXct("1940-01-01"), y = 4.0, color = "#132B43")
  
  

p2 <- ggplot(cor2.df,aes(Date,V1))+geom_line(aes(colour="V1"))+
  geom_area(aes(y = V1, fill="V1"))+
  ggtitle("Rolling Auto Correlation, 2 lag, rolling window = 3 years", subtitle = "S&P500 Log Returns") +
  labs(x="Date",y="Correlation")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(legend.position="none")+
  scale_y_continuous(breaks = seq(-.5, .50, by = 0.05))
gridExtra::grid.arrange(p1, p2, ncol = 1)

Now let's plot the strategies against the lag 2 autocorrelation:

[Figure: strategy equity curves vs rolling 2 lag autocorrelation]

The success of a momentum strategy (at least with this short term momentum rule) relies on a positive 2 lag autocorrelation. We see that the positive 2 lag autocorrelation peaked on 5/23/1973 and declined into negative territory around 1998, where it remains at the present day. The success of a mean reversion strategy is largely dependent on a negative 2 lag autocorrelation, notably beginning post dot com bubble. This marked a regime shift from momentum to mean reversion (on an interday basis).

To conclude, the 2 lag autocorrelation may be a useful tool for detecting which type of model to fit to the underlying series.
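
One way to act on this is a simple regime filter on the sign of the rolling lag 2 autocorrelation; a minimal sketch, assuming cor2.df from the code above and SMA from TTR (the 100 bar smoothing window and the zero threshold are arbitrary choices):

# Hypothetical regime filter: momentum when the smoothed lag 2 acf > 0, mean reversion when < 0
cor2.df$regime <- ifelse(SMA(cor2.df$V1, 100) > 0, "momentum", "mean reversion")
table(cor2.df$regime) # rough count of bars spent in each regime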