SPY Mean Reversion With John Ehlers Adaptive RSI

It has been a busy few months. I have been exploring market indicators created by John Ehlers, which he made publicly available in his book: Cycle Analytics for Traders: Advanced Technical Trading Concepts.

The key theme of the book is applying digital signal processing filters to market data. He uses various high-pass and low-pass filters that only allow certain frequencies to pass over specified look-back periods, which improves the signal-to-noise ratio and can help extract a dominant cycle from the data.

Key ideas from the book:

  1. Traditional indicators have lag and assume a normal probability distribution
  2. His indicators reduce lag
  3. Extracting the dominant cycle over a range of periods can be used to calibrate traditional indicators such as the RSI, stochastics, etc.
  4. He provides methods to transform indicator output to be as close to a normal probability distribution as possible

I'm in the process of making the indicators from his Cycle Analytics for Traders book available on my GitHub in a package called MarketCycles.jl. Please refer to the package README.md for further information.

The package will be used in the next part of this post where we use an Adaptive RSI to time the mean reversion tendency of the daily S&P500.

First install the package:

Pkg.clone("https://github.com/flare9x/MarketCycles.jl")

The Adaptive RSI is adaptive because the indicator extracts the dominant cycle using the autocorrelation periodogram. At any given point in time [i] half the dominant cycle period is used to calibrate the lookback length of the RSI.
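For reference, Ehlers' RSI variant is scaled between 0 and 1 (consistent with the 0.3 / 0.8 entry and exit levels used later). Roughly, and as a paraphrase rather than a quote of the book's exact formula, the calculation looks like:

$$\mathrm{RSI}_t = \frac{CU_t}{CU_t + CD_t}, \qquad N_t = \left\lfloor \frac{DC_t}{2} \right\rfloor$$

where $CU_t$ and $CD_t$ are the sums of up-closes and down-closes over the last $N_t$ bars, and $DC_t$ is the dominant cycle measured by the autocorrelation periodogram.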

I want to point out a few notable features of the strategy back-test code below. First, the code is set up to BUY NEXT BAR AT OPEN: when a signal is triggered at the close of a bar, the trade is entered at the next bar's open price.

Secondly, rather than calculating cumulative % close-to-close returns as so many do, we specify an initial starting capital and add the $ gain of each trade to a running total, adjusting for trading commissions. This provides a result more in line with how the strategy might work in the real world.

Please see the top of the code snippet to adjust the following input variables for the strategy:

# Strategy input variables
starting_capital = 10000
comms = 10
entry_level = .3
exit_level = .8

The rest is best described by following the Julia code below. Note that if you do not have a data source, I have made this post easy to follow by providing code to download data from the Alpha Vantage API. You just need to edit the Alpha Vantage link and add your own API key:

# Ehlers mean reversion

using HTTP
using DataFrames
using CSV
using Gadfly
using MarketCycles

# Multiple symbols
adjusted = 1  # if want adjusted closes set to 1 else non adjusted set it to 0
t = 1
tickers = ["SPY"]
for t in 1:length(tickers)  # Note it might get stuck!! if it fails.. print the t value... then restart the for loop for t in 4:length(tickers)  instead of 1:length(tickers) for example
# Using HTTP package
sleep(10)  # sleep between API calls
if adjusted == 1
    res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
else
    res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
end
mycsv = readcsv(IOBuffer(res.body))
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:]  # subset remove header row
# Rename Columns
if adjusted == 1
    colnames = ["Date","Open","High","Low","Close","Adjusted_Close","Volume","Dividend_Amount","split_coefficient"]
else
colnames = ["Date","Open","High","Low","Close","Volume"]
end
names!(x.colindex, map(parse, colnames))
# Convert String Date to Date format
x[:Date] = Date.(x[:Date],Dates.DateFormat("yyyy-mm-dd"))
# Sort Date Frame By Date
x = sort!(x, [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
if adjusted == 1
for i in 2:length(x)-2
    x[i] = convert(Array{Float64,1},x[i])
end
    x[7] = convert(Array{Int64,1},x[7])
    x[8] = convert(Array{Float64,1},x[8])
else
    for i in 2:length(x)-1
        x[i] = convert(Array{Float64,1},x[i])
    end
        x[6] = convert(Array{Int64,1},x[6])
end

if adjusted == 1
CSV.write(joinpath("CSV_OUT_ADJUSTED_"tickers[t]".csv"), x)
else
CSV.write(joinpath("CSV_OUT_"tickers[t]".csv"), x)
end
end

# Load Data
SPY = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/Indicators/CSV_OUT_ADJUSTED_SPY.csv",types=[String; fill(Float64, 5); Int;Float64],header=true)
SPY[:Date] = DateTime.(SPY[:Date],Dates.DateFormat("mm/dd/yyyy"))

# Pull arrays
Date_index = SPY[:Date]
Spy_close = SPY[:Close]
Spy_open = SPY[:Open]

# Ehlers Adaptive RSI
Adaptive_RSI = AdaptiveRSI(Spy_close)

# Strategy input variables
starting_capital = 10000.00
comms = 10
entry_level = .3
exit_level = .8

# Cross under entry level and sell when cross over exit level
entry = zeros(Adaptive_RSI)
exit = zeros(Adaptive_RSI)
for i = 2:size(Adaptive_RSI,1)
    if (Adaptive_RSI[i] <= entry_level) && (Adaptive_RSI[i-1] > entry_level)
        entry[i] = 1
    else
        entry[i] = 0
    end
    if (Adaptive_RSI[i] >= exit_level) && (Adaptive_RSI[i-1] < exit_level)
        exit[i] = 2
    else
        exit[i] = 0
    end
end

# combine entry and exit
signal = exit .+ entry


# Fill the trade between entry and exit with 1's
all_signal = zeros(signal)
for i = 2:length(signal)
    print(i)
    if signal[i] == 1.0
        all_signal[i] = 1.0
    end
    if signal[i] == 0.0 && all_signal[i-1] == 1.0
        all_signal[i] = 1
    end
    if signal[i-1] == 2.0 && signal[i] == 0.0
        all_signal[i-1] = 1.0
    end
end

# ID the start and end of the trade
trade_start = zeros(all_signal)
trade_end = zeros(all_signal)
for i = 2:size(trade_end,1)
if all_signal[i] == 1.0 && all_signal[i] != all_signal[i-1]
    trade_start[i] = 1
end
if all_signal[i-1] == 1.0 && all_signal[i-1] != all_signal[i]
    trade_end[i-1] = 2
end
end

# Calculate the % return buying at open and selling at the close
open_price = Float64[]
close_price = Float64[]
strategy_returns = zeros(size(all_signal,1))
for i = 1:size(all_signal,1)-1
    if trade_start[i] == 1
        open_price = Spy_open[i+1]
    elseif trade_end[i] == 2
        close_price = Spy_close[i]
        # place the close return on same index position as the end of the trade
    strategy_returns[i] = close_price /  open_price - 1
end
end


# Work out gain on starting investment amount
dollar_gain = zeros(size(strategy_returns,1))
dollar_gain[1] = starting_capital

i=1
for i=2:size(strategy_returns,1)
    # if the returns are 0 then use the previous $ amount
if strategy_returns[i] == 0.0
    dollar_gain[i] = dollar_gain[i-1]
elseif strategy_returns[i] != 0.0
    dollar_gain[i] = dollar_gain[i-1] + (dollar_gain[i-1] * strategy_returns[i]) - comms
end
end

# plot final result
white_panel = Theme(
	panel_fill="white",
    default_color="blue",
    background_color="white"
)
p = plot(x=Date_index,y=dollar_gain,Geom.line,Guide.title("Ehlers Adaptive RSI"),white_panel,Scale.y_continuous(format=:plain))
         draw(PNG("C:/Users/Andrew.Bannerman/Desktop/Julia/ehlers_rsi.png", 1500px, 800px), p);

And we may plot the output, which is the growth of $10,000 inclusive of the $10 round-trip commission set by comms above:

[Figure: ehlers_rsi - equity curve of the Adaptive RSI strategy]

It should be noted that the default settings from Ehlers' book were used for the high-pass and low-pass filters, as well as for the range of lags used in the autocorrelation periodogram to calculate the dominant cycle.

The function I created has many inputs that may be calibrated:

AdaptiveRSI(x::Array{Float64}; min_lag::Int64=1, max_lag::Int64=48,LPLength::Int64=10, HPLength::Int64=48, AvgLength::Int64=3)::Array{Float64}

Thus the default settings might not be well tuned to daily SPY data. One would be advised to iterate through the input variables to learn how each affects the output, as well as to find an appropriate tuning of the indicator.

If anyone wishes to have a more professional back-test script with varying entry/exit methods that accounts for slippage and commissions, do not hesitate to contact me.

Follow me on twitter: @flare9x

Github

R – Quantifying Trend Days

In this post I will be using R and data.table to extract all trend up / down days. The following method was used to quantify a trend day:

1. Trend up = the close is within 25% of the day's high
2. Trend down = the close is within 25% of the day's low
3. Gap days are excluded: if the open is above yesterday's high or below yesterday's low, the day is dropped
4. The daily return must be above .75% (trend up) or below -.75% (trend down)

Other methods come to mind:
1. Select days that close within 25% of the low / high AND where the daily gain / loss is above or below 1%.
2. Linear regression with a low R-squared; see the blog post here: http://quantgorilla.com/blog/quantifying-trend-days/

My shell of a code can be altered to include these methods, but for now let's stick to closing within the top / bottom 25% of the day's range, excluding gap days. The code to do this (check the GitHub code at the bottom of the page, as WordPress is having a hard time displaying the code snippet correctly):

# Es Up / Down Trend Isolation
# 2.26.2018
# Andrew Bannerman 

# Load Daily ES Data
es_d <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.day.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_d$Date <- mdy(es_d$Date)  # Change to date format 

# Find relative position of the closing price  (Close - Low) / (High - Low)
es_d$pos <- (es_d$Close - es_d$Low) / (es_d$High - es_d$Low)

# Calculate returns
es_d$rets <- ((es_d$Close / lag(es_d$Close,1)) - 1) * 100

# Find gap up / down days 
es_d$gap_up <- ifelse(es_d$Open > lag(es_d$High,1),1,0)
es_d$gap_dn <- ifelse(es_d$Open < lag(es_d$Low,1),1,0)

# Find trend up / down days 
es_d$trend_up <- ifelse(es_d$pos >= .75,1,0)
es_d$trend_dn <- ifelse(es_d$pos <= .25,1,0)

# Subset all trend up / down days 
# Trend day definition: close within 25 % of day high / low and close  over .75% to exclude quiet days 
# Exclude gap up / down days where open closes over high , low
trend_up = .75 & es_d$gap_up == 0, ]
trend_dn <- es_d[es_d$trend_dn == 1 & es_d$rets <= -.75 & es_d$gap_dn == 0, ]

# Count total trend up / down days
total_up <- nrow(trend_up)
total_dn <- nrow(trend_dn)

# Percentage trend days of sample
total <- nrow(es_d)
perc.up <- total_up / total
perc.dn <- total_dn / total

# Save Dates in order to use for subsetting the intraday bars
trend_up_dates <- trend_up$Date
trend_dn_dates <- trend_dn$Date

There is a total of 5167 days in the sample. Of those days, approximately 11% are up trend days and 10% are down trend days. Next we may extract each trend day and save the 1-minute intraday plot:

# Load 1 minute bars
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new <- mdy_hm(es_1$Date_new) # Change date to date format
es_1$Date <- mdy(es_1$Date)
# Save up trend plots
# initialize list
t_UP <- list()
i=1
for (i in 1:length(trend_up_dates)) {
tryCatch({
ptm0 <- proc.time()
temp <- subset(es_1 , Date == trend_up_dates[i]) # Conditionl subset == grab trend day on intraday level
temp$range <- temp$High - temp$Low
t_UP[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
temp <- temp[3:10]
head(temp)
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
head(df)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Up Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Up Stats
trend_up_out <- do.call(rbind,t_UP)
head(trend_up_out)
t_up_mean_r <- mean(trend_up_out$range)

# Save intraday down trend plots
# initialize list
t_DN <- list()
i=1
for (i in 1:length(trend_dn_dates)) {
tryCatch({
ptm0 <- proc.time()
Sys.sleep(0.1)
temp <- subset(es_1 , Date == trend_dn_dates[i])
temp$range <- temp$High - temp$Low
t_DN[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1))) # replace "-" with ""
temp$chg <- ifelse(temp$Close > temp$Open, "up", "dn")
temp <- temp[3:10]
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save down trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_down_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini — Trend Down Day"))
dev.off()
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Down Stats
trend_dn_out <- do.call(rbind,t_DN)
t_dn_mean_r <- mean(trend_dn_out$range)

With example output:

[Figures: 20180126, 20020611 - example intraday trend-day plots]

Next we can plot daily bars, 5 days prior to each trend day:

##############################################################################
# Extract 5 days prior to trend up / down day
##############################################################################
dates_df <- data.frame(Date = es_d$Date)
dates_df$wkdays <- weekdays(as.Date(dates_df$Date)) # extract day of week

# Grab trend up start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# up trend subsetting
up_list <- list()
for (i in 1:length(trend_up_dates)) {
  up_list[[i]] = es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_up_dates[i],]
}

# Plot 5 days prior to uptrend
t_UP_Prior <- list()
i=1
for (i in 1:length(up_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- up_list[[i]] # Conditionl subset == grab trend day on intraday level
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Up Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Grab trend down start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# down trend subsetting
dn_list <- list()
for (i in 1:length(trend_dn_dates)) {
  dn_list[[i]] = es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_dn_dates[i],]
}

# Plot 5 days prior to down trend
i=1
for (i in 1:length(dn_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- dn_list[[i]] # Conditionl subset == grab trend day on intraday level
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_dn_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", bar.type = "ohlc",theme = chartTheme("white"),name=paste("es_mini -- Trend Down Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

With output:

[Figures: 19990204, 20110914 - example plots of the 5 daily bars prior to a trend day]

There are two ways to attempt to predict a trend day with a certain degree of probability:

1. Use patterns of daily bars
2. Use intraday patterns within the trend day

The daily bars plotted above allow us to view the price action prior to a trend up / down day. This code can be extended to look at specific features, which I will leave to the reader to work on (a rough sketch follows the list below). These may include:

1. Prior to the trend day, was the previous day down or up?
2. Where did the price close relative to the day's range ((Close - Low) / (High - Low))?
3. What did the 3 day pattern look like prior to a trend day?
– 3 tight closes?
– middle bar low and close lower than the bars each side?

etc etc…
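As a starting point, here is a rough sketch (not part of the original workflow) of how a couple of these daily-bar features might be computed on the es_d data frame built earlier; the feature names and the 0.25-point "tight close" threshold are purely illustrative:

# Sketch: a few daily-bar features computed ahead of each day (illustrative only)
library(dplyr)

es_d <- es_d %>%
  mutate(prev_up  = lag(Close > Open, 1),                 # was the previous day an up day?
         prev_pos = lag(pos, 1),                          # prior day's close location within its range
         tight_3  = abs(Close - lag(Close, 1)) <= 0.25 &  # three "tight" closes (0.25 ES points is arbitrary)
                    abs(lag(Close, 1) - lag(Close, 2)) <= 0.25)

# Example question: how often was the day before an up-trend day itself an up day?
mean(es_d$prev_up[es_d$trend_up == 1], na.rm = TRUE)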

Now we move on to an attempt to quantify a trend day within the trend day itself. This involves a simple count of the number of new 1-minute highs made by a certain time of day. In this example I chose the number of new 1-minute highs or lows made prior to 10am Central time.

The code that achieves this:

##########################################################################################
# Intraday Trend up / down
##########################################################################################
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new  <- mdy_hm(es_1$Date_new)  # Change date to date format
es_1$Date <- mdy(es_1$Date)
library(data.table)  # data.table for speed
es_1 <- as.data.table(es_1)
es_1[, roll_max := cummax(High), by = Date] # rolling day high
es_1[, close_above_high := ifelse(Close > shift(roll_max), 1, 0), by = Date] # 1 when close makes new day high
es_1[, roll_min := cummin(Low), by = Date] # rolling day low
es_1[, close_below_low := ifelse(Close < shift(roll_min), 1, 0), by = Date] # 1 when close makes new day low
# count new highs / lows made between the 08:31 open and 10:00 Central
es_1[, before_10_UCT_up := sum(close_above_high[Time >= "08:31" & Time < "10:00"], na.rm = TRUE), by = Date]
es_1[, before_10_UCT_dn := sum(close_below_low[Time >= "08:31" & Time < "10:00"], na.rm = TRUE), by = Date]
# one row per day for plotting the distributions
plot.df <- es_1[, .(count_max = max(before_10_UCT_up), count_min = max(before_10_UCT_dn)), by = Date]
plot.df <- subset(plot.df, count_max + count_min > 0)
# plot distriutions
ggplot(data=plot.df, aes(plot.df$count_max)) +
  geom_histogram(binwidth = .5,fill="darkgreen",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(0, max(plot.df$count_max), 1), name="No. New Highs Prior 10am Central")+
  scale_y_continuous(breaks = seq(0, 600000,100000))+
  ggtitle("Number of new 1 minute highs prior to 10am Central")+
  ylab("Total New Highs")

# Plot new lows
ggplot(data=plot.df, aes(plot.df$count_min)) +
  geom_histogram(binwidth = .5,fill="red",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(0, max(plot.df$count_min), 1), name="No. New Lows Prior 10am Central")+
  #scale_y_continuous(breaks = seq(0, 35500,2000))+
  ggtitle("Number of new 1 minute lows prior to 10am Central")+
  ylab("Total New Low")
graphics.off()

We may plot the distribution of new highs / lows prior to 10am central time:

[Figures: Rplot305, Rplot306 - distributions of new 1-minute highs and lows prior to 10am Central]

We will back test the less frequent (more extreme) new high / new low counts prior to 10am Central.

Rules of back test:

1. Go long when number of new highs prior to 10am central is >= 8
2. Go short when the number of new lows prior to 10am Central is >= 8
3. Exit at the end of the trading day

No stops, and no slippage or commissions are included.

The code to do this:

# Back test
# Trading Signal long and short
es_1[, sig_buy := ifelse(before_10_UCT_up >= 8, 1,0)]    # long signal
es_1[, sig_short := ifelse(before_10_UCT_dn >= 8, -1,0)] # short signal
es_1[, sig_exit := Time < "15:15"]                       # long exit time
es_1[, sig_s_exit := Time < "15:15"]                     # short exit time
#es_1[is.na(es_1)] <- 0
es_1[, sig_end_l := ifelse(sig_exit == FALSE,0,NA)] # mark end of trade retain NA
es_1[, sig_end_s := ifelse(sig_s_exit == FALSE,0,NA)] # mark end of trade retain NA
# Combine NA + signal for long trades
es_1$z_l <- ifelse(es_1$sig_buy == 1,
               rowSums(es_1[, c("sig_buy", "sig_end_l")], na.rm=TRUE),
               rowSums(es_1[, c("sig_buy", "sig_end_l")]))
es_1$z_l[1] <- 0
# Combine NA + signal for short trades
es_1$z_s <- ifelse(es_1$sig_short == -1,
                   rowSums(es_1[, c("sig_short", "sig_end_s")], na.rm=TRUE),
                   rowSums(es_1[, c("sig_short", "sig_end_s")]))
es_1$z_s[1] <- 0
# Forward fill 1's to end of the trade using package zoo, function na.locf
es_1[, final.signal_long := na.locf(z_l)] # long trades
es_1[, final.signal_short := na.locf(z_s)] # long trades
# lag signal by one forward day to signal entry next day (Trade at market open)
es_1$final.signal_long <- lag(es_1$final.signal_long,1) # Note k=1 implies a move *forward*
es_1$final.signal_short <- lag(es_1$final.signal_short,1) # Note k=1 implies a move *forward*
# Close to close returns
es_1[, one_min_rets := (Close / shift(Close)) - 1]
# Calculate strategy returns
es_1[, sig_long_rets := final.signal_long * one_min_rets]
es_1[, sig_short_rets := final.signal_short * one_min_rets]
es_1$sig_long_rets[1] <- 0
es_1$sig_short_rets[1] <- 0
# Combine equity curves
es_1[, combined_rets := sig_long_rets + sig_short_rets]
# Cumulative returns
es_1[, cum_rets_l := cumprod(1 + sig_long_rets) - 1]
es_1[, cum_rets_s := cumprod(1 + sig_short_rets) - 1]
es_1[, cum_rets_comb := cumprod(1 + combined_rets) - 1]
plot(es_1$Date_new,es_1$cum_rets_l,type="l")
graphics.off()
# Plot equity curves
line_plot_df <- data.frame(es_1$cum_rets_l,es_1$cum_rets_s,es_1$cum_rets_comb)
line.plot.df <- melt(line_plot_df)
date <- rep(es_1$Date,3)
line.plot.df <- cbind(line.plot.df,date)
head(line.plot.df)
ggplot(data=line.plot.df, aes(x=date, y=value, group = variable, colour = variable)) +
  geom_line()

With the final equity curves:

[Figure: Rplot307 - long, short and combined equity curves]

No optimization was performed for the times or the number of new highs; I chose each value arbitrarily.

To conclude, predicting trend days both before and within the trend day itself is an interesting idea. This is a very simple example which may be extended.

Full code can be found on my GitHub.

Thanks for reading!

Tail Risk – Downside Analysis (S&P500)

Tail risk: the enemy of long-term investing with leverage.

Let's express the one-day returns in a box plot.

[Figure: Rplot78 - box plot of S&P 500 one-day returns]
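A minimal sketch of that plot call, assuming the same df data frame with the spx.clret daily return column used in the snippet below:

# Box plot of one-day S&P 500 close-to-close returns (df$spx.clret, as used below)
boxplot(df$spx.clret, main = "S&P 500 one-day returns", ylab = "Daily return")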

# Maximum Loss and Gain 
min(df$spx.clret)
max(df$spx.clret)
# Location of min/max
min.loc <- which.min(df$spx.clret)
max.loc <- which.max(df$spx.clret)
# Date of Min 
min <- df$Date[min.loc]
max <- df$Date[max.loc]
min
max
> min
[1] "1987-10-19"
> max
[1] "1933-03-15"
> min(df$spx.clret)
[1] -0.2289973
> max(df$spx.clret)
[1] 0.1536613

The maximum one-day loss, -0.3558507, occurred during the October 19th, 1987 stock crash. The largest one-day gain, 0.3522206, occurred on March 15th, 1933.

If using leverage, we obtain a one-day loss of -0.7117014 (2x leverage) or -1.0675521 (3x leverage). These one-day 'shocks' are what is known as tail risk: one could be down by the above magnitude in a single trading day.

If we check the frequency of one-day losses beyond various loss thresholds:
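A hedged sketch of one way to tabulate these counts, again assuming df$spx.clret holds the daily returns and with the thresholds chosen purely for illustration:

# Sketch: count one-day losses beyond a few illustrative thresholds (df$spx.clret = daily returns)
loss_levels <- c(-0.02, -0.05, -0.10, -0.20)
setNames(sapply(loss_levels, function(th) sum(df$spx.clret <= th, na.rm = TRUE)),
         paste0("<= ", loss_levels * 100, "%"))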

[Figures: losses table and Rplot88 - frequency of one-day losses beyond various thresholds]

Again, if using leverage, these losses multiply by the leverage factor. Since 1928 there were, for example, 8 occurrences of > 20% one-day losses; with 3x leverage such a one-day loss increases to more than 60%.

Furthermore, we also face bear market declines which again pose another issue when holding long term with leverage.

[Figure: dd - S&P 500 peak-to-trough drawdowns]

Inclusive of the 1929 bear market, peak-to-trough drawdowns can be in excess of 50%; top of 2008 to bottom in 2009 is around 56%. Again, if we buy and hold a leveraged product, we will see drawdowns 2x or 3x this amount.

So these are the ‘issues’ with buy and holding with leverage. Is there a solution?

One method to explore is using a simple moving average as a low-pass filter. Applied to monthly bars, a simple moving average filter tends to reduce overall volatility. A rough sketch of this type of filter is shown below.
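The sketch assumes a hypothetical data frame spx_m of monthly bars with a Close column; this is not the exact code behind the plots that follow:

# Sketch: long above the 12-month SMA, in cash below it (spx_m = hypothetical monthly data frame)
library(TTR)
library(dplyr)

spx_m$sma12  <- SMA(spx_m$Close, n = 12)
spx_m$rets   <- c(0, diff(spx_m$Close) / head(spx_m$Close, -1))          # monthly close-to-close returns
spx_m$signal <- dplyr::lag(ifelse(spx_m$Close > spx_m$sma12, 1, 0), 1)   # act on last month's signal
spx_m$strat  <- ifelse(is.na(spx_m$signal), 0, spx_m$signal) * spx_m$rets
spx_m$equity <- cumprod(1 + spx_m$strat)
max(1 - spx_m$equity / cummax(spx_m$equity))                             # maximum drawdown of the filtered series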

[Figure: Rplot83 - monthly closes with 12-month SMA; cash periods shaded in red]

Areas in red are cash periods, i.e. periods below the 12-month SMA.

The plot below shows the maximum drawdown for any given simple moving average length from 1928 to the present day. The strategy moves the investor to cash when price is below the moving average:

[Figure: Rplot80 - maximum drawdown by SMA length, 1928 to present]

Over the same period, the buy-and-hold drawdown is 0.89783016 (roughly 90%). Depending on the SMA used, we reduce drawdowns by approximately 40 to 50%.

The optimal monthly simple moving average ranked by cumulative returns 1928 to present day.

[Figure: Rplot79 - monthly SMA lengths ranked by cumulative return, 1928 to present]

If we consider a 'modern market', 1950 onwards, the overall drawdown reduces from 0.5687671 (roughly 57%) for buy and hold to the 25-30% area, depending on the simple moving average used.

[Figure: Rplot81 - maximum drawdown by SMA length, 1950 onwards]

And the optimal look back for the monthly simple moving average:

[Figure: Rplot82 - optimal monthly SMA look-back, 1950 onwards]

The message is pretty clear: using a simple moving average as a 'low-pass filter' often reduces the volatility of simple buy and hold. This strategy does well to avoid both of the bear market declines that followed 1999 and 2007.

The overall results of the monthly 12 SMA, 1988 to present:

One-day shocks like the 1987 stock crash can yield a one-day drawdown of up to 70 or 90% when using 2x / 3x leverage.

Post the 1987 stock crash, we see the maximum drawdowns for varying simple moving averages. Note that the short-term simple moving averages remain invested during bear market declines:

[Figure: Rplot86 - maximum drawdown by SMA length, post-1987]

The questions going forward: are one-day shocks like the 1987 stock crash possible again in the future? Is there anything structural that can change this? Does HFT provide liquidity, or will it evaporate during times of distress?

A maximum drawdown of 19% since 1987 using the low-pass filter approach seems pretty easy to bear. One could, from 1987 to the present, use a 2x or 3x leverage factor and increase gains SUBSTANTIALLY whilst matching a typical buy-and-hold drawdown of 56%.

Leverage and risk is the trade-off. If one believes the next 40 years will be as rosy as the previous, one may wish to take this bet. Not to mention: will the S&P 500 even remain the 'king' market and continue to provide the same level of returns as it has over the past 100 years?

No one knows. If the risk profile suits you, then that's up to you. Otherwise, a normal buy-and-hold investing approach combined with a simple moving average filter will avoid large negative compounding periods.

Long when above the 12-month SMA:

[Figure: Rplot87 - equity curve, long above the 12-month SMA]

The frequency of heavy one-day losses is reduced relative to buy and hold. This is partly due to avoiding the majority of bear market declines.

[Figure: Rplot89 - frequency of heavy one-day losses, SMA filter vs buy and hold]

I believe the message is this: regardless of what you read about holding 3x leveraged products, you should study whether the ETF is built soundly, i.e. has minimal tracking error, since there are many ETFs with poor design.

Next: are you comfortable with, and can you stomach, the sickening drawdowns? There will no doubt be spells of volatility, and this is likely to continue.

Perhaps there are LESS VOLATILE instruments with smoother equity curves, or less volatile strategies with smoother equity curves. With little volatility, one may add a leverage multiplier at will.

Pairs Trading – Testing For Cointegration, ADF (Unit Root), Johansen Test & Half Life of Mean Reversion

Note: I back test this 100% incorrectly. I also wish to add scaling of the independent variable per the hedge ratio correctly. I am leaning towards working in share prices and will update accordingly. What is correct is all the statistical testing; however, the back test is 100% incorrect. I also run the Johansen test on the stationary series obtained from the hedge ratio through linear regression, which is an unnecessary step, as we can use the hedge ratio from linear regression to scale the independent variable's position size. I will get to the revisions when I can.

A statistical arbitrage opportunity exists between a cointegrated pair when the price between the pair moves out of equilibrium. This short term disequilibrium creates a trading opportunity where one enters a short position in one series with an equal weighted long position in the other series. The aim is that two series will eventually converge and the trade be closed out with a profit.

Essentially between two cointegrated pairs, the spread is mean reverting. The time of mean reversion is one factor to consider. How long will it take for the period of disequilibrium to revert to the mean and will it ever revert to the mean? Co-integration can and does break down. So one question to answer would be: How do I know if the trade will converge, when is it time to exit with a loss?

The half-life of mean reversion, which is calculated on the spread, can help answer this question. If, historically, a pair of stocks A and B has been shown to mean revert within 'n' days, then if this threshold is exceeded by a wide margin, OR the trade meets a pre-determined stop-loss %, OR a stop loss measured in standard deviations, the trade can be closed out with a loss.

A pair of stocks can each have a linear trend; however, we can form a stationary series by estimating the hedge ratio with ordinary least squares regression. In practice we run two OLS regressions, one with Stock A as the independent variable and one with Stock B as the independent variable, and we then pick the direction with the most significant (most negative) ADF t-statistic. A rough sketch of this selection step is below.
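The sketch assumes the df data frame with EWA_Close and EWC_Close columns built later in this post (the object names here are only illustrative):

# Sketch: fit the hedge ratio in both directions and compare ADF statistics on the residuals
library(urca)

ols_a <- lm(EWC_Close ~ EWA_Close + 0, data = df)   # EWA as the independent variable
ols_b <- lm(EWA_Close ~ EWC_Close + 0, data = df)   # EWC as the independent variable
adf_a <- ur.df(residuals(ols_a), type = "drift", lags = 1)
adf_b <- ur.df(residuals(ols_b), type = "drift", lags = 1)
c(EWA_independent = adf_a@teststat[1], EWC_independent = adf_b@teststat[1])  # keep the more negative one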

First, let's get a feel for the data and plot the closing prices of EWA and EWC. I want to use these two series and this time period to see if we can replicate Ernie Chan's study from his book Algorithmic Trading: Winning Strategies and Their Rationale.

[Figure: Rplot72 - EWA and EWC closing prices]

EWA and EWC visually appear to be highly correlated; the Pearson correlation coefficient is 0.9564724. If two series are correlated, it essentially means they move in the same direction most of the time. However, correlation tells you nothing about the nature of the spread between the two series, or whether they will converge (mean revert) or continue to diverge. With cointegration, we form a linear relationship between Stock A and Stock B where the distance between A and B is somewhat fixed and is expected to mean revert. In essence, we measure the 'distance' between A and B in terms of standard deviations, and a trade is taken when 'n' standard deviations is reached. A stop loss may also be placed outside the normal sigma range, ensuring the trade is exited if the relationship breaks down.

A scatter plot shows the variation around the ols regression fit:

# Scatter plot of price series
# plot x,y
plot(df$EWA_Close, df$EWC_Close, main="Scatterplot EWA and EWC With OLS Regression Fit",
     xlab="EWA Close", ylab="EWC Close ", pch=19)
abline(lm(df$EWC_Close~df$EWA_Close), col="red") # regression line (y~x)

[Figure: Rplot71 - scatter plot of EWA vs EWC with OLS regression fit]

We perform ordinary least squares regression using EWA as the independent variable (x) and EWC as the dependent variable (y). Note we set the intercept to 0 so that the regression fit provides only the hedge ratio between EWA and EWC.


# Ordinary least squares regression
# EWA independant
ols <- lm(df$EWC_Close~df$EWA_Close+0) # regression line (y~x)
summary(ols)
beta <- coef(ols)[1]

resid <- as.data.frame(df$EWC_Close - beta * df$EWA_Close)
resid <- data.frame(resid,"Date" = df$Date)
colnames(resid)[1] <- "residuals"
plot(resid$Date,resid$residual, col = 'navyblue', xlab = 'Date', ylab = 'residuals', type="l", main=" EWA,EWC Residuals - EWA Independant Variable - No Intercept",cex.main=1)

[Figure: Rplot75 - EWA, EWC residuals with EWA as the independent variable, no intercept]

In the above, we used ordinary least squares regression to estimate the hedge ratio (beta), and we form the spread as EWC close minus beta * EWA close. Note we use the closing prices and not the daily returns. By using the closing prices, the hedge ratio tells us the relative number of shares to go long / short in each ETF.

The augmented Dickey-Fuller test is used to determine whether the spread is non-stationary or stationary (trending or non-trending). The purpose of the test is to try to reject the null hypothesis of a unit root: if we have a unit root, we have a non-stationary series. We wish to have a stationary series so that we may trade it efficiently using mean reversion.


# urca ADF test
library(urca)
  summary(ur.df(resid$residual, type = "drift", lags = 1))

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
############################################### 

Test regression drift 

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-1.17542 -0.17346 -0.00447  0.16173  1.27800 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0006657  0.0072252   0.092 0.926599
z.lag.1     -0.0176743  0.0053087  -3.329 0.000892 ***
z.diff.lag  -0.1751998  0.0254683  -6.879 8.82e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2781 on 1495 degrees of freedom
Multiple R-squared:  0.04101,	Adjusted R-squared:  0.03973
F-statistic: 31.97 on 2 and 1495 DF,  p-value: 2.548e-14

Value of test-statistic is: -3.3293 5.5752 

Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1  6.43  4.59  3.78

The ADF test statistic is -3.3293, which is below the 5% critical value of -2.86. We may therefore reject the unit root hypothesis with 95% confidence, meaning the residual spread is stationary. If we flip the independent variable from EWA to EWC and re-run OLS and the augmented Dickey-Fuller test, we obtain a test statistic of -3.3194, which is also below the 5% critical value of -2.86, again indicating stationarity at the 95% level. In this case we choose the direction with the most negative / most statistically significant test statistic, which is EWA as the independent variable.

Next to determine the number of shares we need to purchase / short we may use the Johansen test. The eigenvectors determine the weights for each ETF.

# Johansen test
# Eigen vectors are the number of shares
# Test is cointegration test
coRes=ca.jo(data.frame(df$EWA_Close, df$EWC_Close),type="trace",K=2,ecdet="none", spec="longrun")
summary(coRes)
slot(coRes,"V") # The matrix of eigenvectors, normalised with respect to the first variable.

######################
# Johansen-Procedure #
###################### 

Test type: trace statistic , with linear trend 

Eigenvalues (lambda):
[1] 0.010146673 0.002667727

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  4.00  6.50  8.18 11.65
r = 0  | 19.28 15.66 17.95 23.52

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                df.EWA_Close.l2 df.EWC_Close.l2
df.EWA_Close.l2       1.0000000       1.0000000
df.EWC_Close.l2      -0.7772904       0.6080793

Weights W:
(This is the loading matrix)

               df.EWA_Close.l2 df.EWC_Close.l2
df.EWA_Close.d     0.004309716    -0.003029255
df.EWC_Close.d     0.028971878    -0.002649008

The eigenvectors form the weights for each ETF. Simply put, if we allocate $10,000 to each ETF, we buy 10,000 * 1 / EWA close shares of EWA on the long side, and take 10,000 * -0.7772904 / EWC close shares of EWC (the negative sign indicating a short position). Thus our weighted positions form a stationary portfolio. A small sketch of this sizing arithmetic follows.
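The sketch uses the eigenvector printed above and the last closes purely for illustration:

# Sketch: translate the normalised eigenvector into share counts for a $10,000-per-leg position
weights  <- c(EWA = 1.0, EWC = -0.7772904)                               # first cointegrating vector from ca.jo above
notional <- 10000
last_px  <- c(EWA = tail(df$EWA_Close, 1), EWC = tail(df$EWC_Close, 1))  # last closes, for illustration only
shares   <- notional * weights / last_px                                 # positive = long, negative = short
round(shares)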

In order to determine how long the cointegrated relationship takes to revert to the mean, we can regress the one-day change of the spread on the one-day-lagged, demeaned spread. This is known as the half-life of mean reversion.
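In other words (a sketch of the standard formulation; the code below follows the same idea up to its own sign and indexing conventions):

$$\Delta y_t = \lambda\,(y_{t-1} - \bar{y}) + \varepsilon_t, \qquad \text{half-life} = -\frac{\ln 2}{\lambda}$$

where $y$ is the residual spread and $\lambda$ is the slope estimated by the regression.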

 

# Calculate half life of mean reversion (residuals)
# Calculate yt-1 and (yt-1-yt)
# pull residuals to a vector
residuals <- c(resid$residual)
y.lag <- c(residuals[2:length(residuals)], 0) # Set vector to lag -1 day
y.lag <- y.lag[1:length(y.lag)-1] # As shifted vector by -1, remove anomalous element at end of vector
residuals <- residuals[1:length(residuals)-1] # Make vector same length as vector y.lag
y.diff <- residuals - y.lag # Subtract todays close from yesterdays close
y.diff <- y.diff [1:length(y.diff)-1] # Make vector same length as vector y.lag
prev.y.mean <- y.lag - mean(y.lag) # Subtract yesterdays close from the mean of lagged differences
prev.y.mean <- prev.y.mean [1:length(prev.y.mean )-1] # Make vector same length as vector y.lag
final.df <- data.frame(y.diff,prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result = lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 = -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

The half life of mean reversion is 31.62271 days.

We will now form a trading strategy and calculate a rolling z-score over the residuals, with a look-back set equal to the obtained half-life.

# Make z-score of residuals
# Set look back equal to half life
colnames(resid)
resid$mean <- SMA(resid[,"residuals"], round(half_life))
resid$stdev <- runSD(resid[,"residuals"], n = round(half_life), sample = TRUE, cumulative = FALSE)
resid$zscore <- apply(resid[,c('residuals', 'mean','stdev')], 1, function(x) { (x[1]-x[2]) / x[3] })

# Convert all NA to 0
resid[is.na(resid)] <- 0

# Place residuals data frame to original (df)
df <- data.frame(df,resid)

# Plot z-score of residuals
plot(resid$zscore,type="l", main="Rolling Z-Score - Look Back Set Equal To Half Life")

[Figure: Rplot77 - rolling z-score of the residuals, look-back equal to the half-life]

Note the rolling z-score of the spread ranges between roughly -3 and 3.

We can now perform a back test to see how well it would have done. For the sake of simplicity we will not optimize in-sample and then test out-of-sample. We will also not consider the share amounts implied by the weightings, or commissions.

###################
# Back Test Script
###################

# Create Trading Logic
# When z-score crossed below 0 buy EWA and sell short EWC per eigenvectors
df$enter.long <- ifelse(df$zscore <= 0, 1,0)   # for long EWA below mean
df$exit.long <- ifelse(df$zscore == 0, 1,0)    # for long EWA below mean
df$enter.short <- ifelse(df$zscore <= 0 , -1,0) # for short EWC below mean
df$exit.short <- ifelse(df$zscore == 0 , -1,0)  # for short EWC below mean

df$enter.short.1 <- ifelse(df$zscore >= 0, -1,0)   # for short EWA above mean
df$exit.short.1 <- ifelse(df$zscore == 0, -1,0)    # for short EWA above mean
df$enter.long.1 <- ifelse(df$zscore >= 0 , 1,0) # for long EWC above mean
df$exit.long.1 <- ifelse(df$zscore == 0 , 1,0)  # for long EWC above mean

# Calculate close to close returns
df$ewa.clret <- ROC(df$EWA_Close, type = c("discrete"))
df$ewa.clret[1] <- 0
df$ewc.clret <- ROC(df$EWC_Close, type = c("discrete"))
df$ewc.clret[1] <- 0

# Long EWA below mean
df <- df %>%
  dplyr::mutate(ewa.long = ifelse(enter.long ==  1, 1,
                                  ifelse(exit.long == 1, 0, 0)))

# Short EWC above mean
df <- df %>%
  dplyr::mutate(ewc.short = ifelse(enter.short ==  -1, 1,
                                  ifelse(exit.short == -1, 0, 0)))

# Short EWA above mean
df <- df %>%
  dplyr::mutate(ewa.short = ifelse(enter.short.1 ==  -1, 1,
                                  ifelse(exit.short.1 == 1, 0, 0)))

# Long EWC above mean
df <- df %>%
  dplyr::mutate(ewc.long = ifelse(enter.long.1 ==  1, 1,
                                   ifelse(exit.long.1 == -1, 0, 0)))

# lag signal by one forward day to signal entry next day
df$ewa.long <- lag(df$ewa.long,1) # Note k=1 implies a move *forward*
df$ewc.short <- lag(df$ewc.short,1) # Note k=1 implies a move *forward*
df$ewa.short <- lag(df$ewa.short,1) # Note k=1 implies a move *forward*
df$ewc.long <- lag(df$ewc.long,1) # Note k=1 implies a move *forward*

# Calculate equity curves
df$equity.long <- apply(df[,c('ewa.long', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short <- apply(df[,c('ewc.short', 'ewc.clret')], 1, function(x) { x[1]*x[2]})
df$equity.long.1 <- apply(df[,c('ewa.short', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short.1 <- apply(df[,c('ewc.long', 'ewc.clret')], 1, function(x) { x[1]*x[2]})

# Combine signals
df$combined <- apply(df[,c('equity.long', 'equity.short','equity.long.1','equity.short.1')], 1, function(x) { x[1]+x[2]+x[3]+x[4] })

# Pull select columns from data frame to make XTS whilst retaining format
xts1 = xts(df$equity.long, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts2 = xts(df$equity.short, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts3 = xts(df$equity.long.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts4 = xts(df$equity.short.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts5 = xts(df$combined, order.by=as.POSIXct(df$Date, format="%Y-%m-%d")) 

# Join XTS together
compare <- cbind(xts1,xts2,xts3,xts4,xts5)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("EWA Long","EWC Short","EWA Short","EWC Long","Combined")
charts.PerformanceSummary(compare,main="EWC,EWA Pair", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

[Figure: Rplot76 - PerformanceAnalytics summary of the EWA / EWC pair back test]

One thing to note is that for the whole sample period, from April 26th 2006 to April 9th 2012, we use the exact same hedge ratio to form the stationary series. This is likely not the correct procedure; we also use the full sample to estimate the half-life of mean reversion. As we expect the relationship between series A and B to change over time, I believe it would be better practice to use a fixed rolling window to estimate the hedge ratio and the mean-reversion half-life, so that when a new signal is generated it uses the hedge ratio / half-life derived from the rolling look-back period. A rough sketch of a rolling hedge ratio is below.
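The 250-day window here is arbitrary, and this snippet is not part of the back test above:

# Sketch: rolling hedge ratio over a fixed look-back window (window length is arbitrary)
library(zoo)

window <- 250  # roughly one trading year of daily bars, purely illustrative
px <- as.matrix(df[, c("EWC_Close", "EWA_Close")])
roll_beta <- rollapply(px, width = window, by.column = FALSE, align = "right", fill = NA,
                       FUN = function(w) coef(lm(w[, "EWC_Close"] ~ w[, "EWA_Close"] + 0))[1])
plot(df$Date, roll_beta, type = "l", main = "Rolling hedge ratio (EWC ~ EWA)")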

It should also be noted that we do not quite match Ernie Chan's test from his book: I am using Alpha Vantage as the source of price data, and there is a difference in price between Ernie's EWA data and the Alpha Vantage EWA data. There are also differences between the urca Johansen test and the one Ernie uses from the jplv7 package. I will attempt to address these issues in upcoming posts.

This serves as a good introduction for me and others to the topic of pairs trading.

 

Full R code below. 

# Test for cointegration
# Download daily price data from Alpha Vantage API
# Plot two series
# Perform Linear Regression
# Plot Residuals
# ADF test for unit root
# Half life of mean reversion
# zscore look back equal to half life
# back test long stock A, short Stock B
# Andrew Bannerman 11.1.2017

require(alphavantager)
require(urca)
require(lattice)
require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Download price series from Alpha Vantage
# install.packages('alphavantager')
# Key
#https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=EWA&apikey=6RSYX9BPXKZVXUS9&outputsize=full&datatype=csv
av_api_key("your_api_key")
print(av_api_key())
#EWC <- av_get(symbol = "GLD", av_fun = "TIME_SERIES_DAILY_ADJUSTED", outputsize = "full")
#EWA <- av_get(symbol = "SLV", av_fun = "TIME_SERIES_DAILY_ADJUSTED", outputsize = "full")
#bit coin EWC <- av_get(symbol = "BTC", av_fun = "DIGITAL_CURRENCY_DAILY", outputsize = "full", market="CNY")

EWA <- read.csv("D:/R Projects/Final Scripts/Cointegration Scripts/data/daily_adjusted_EWA.csv", header=TRUE,stringsAsFactors=FALSE)
EWC <- read.csv("D:/R Projects/Final Scripts/Cointegration Scripts/data/daily_adjusted_EWC.csv", header=TRUE,stringsAsFactors=FALSE)
str(EWA)
# Convert timestamp to Date format
EWA$timestamp <- ymd(EWA$timestamp)
EWC$timestamp <- ymd(EWC$timestamp)

# Sort
EWA <- arrange(EWA,timestamp)
EWC <- arrange(EWC,timestamp)

# Merge Data Frames By date
df <- merge(EWA,EWC, by='timestamp')
head(df)
# Rename Columns
colnames(df)[1] <- "Date"
colnames(df)[6] <- "EWA_Close"
colnames(df)[14] <- "EWC_Close"

# Subset by date
df <- subset(df, Date >= as.POSIXct("2006-04-26") & Date <= as.POSIXct("2012-04-09"))  

# Scatter plot of price series
# plot x,y
plot(df$EWA_Close, df$EWC_Close, main="Scatterplot EWA and EWC With OLS Regression Fit",
     xlab="EWA Close", ylab="EWC Close ", pch=19)
abline(lm(df$EWC_Close~df$EWA_Close), col="red") # regression line (y~x)

# Line plot of series
require(ggplot2)
require(scales)

ggplot(df, aes(Date)) +
  theme_classic()+
  geom_line(aes(y=EWA_Close), colour="red") +
  geom_line(aes(y=EWC_Close), colour="blue") +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
  ggtitle("EWC, EWC Close", subtitle = "2006-04-26  to 2012-04-09") +
  labs(x="Year",y="EWA,EWC Close")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  annotate("text", label = "EWA", x = as.Date("2006-04-26"), y = 22, color = "blue")+
  annotate("text", label = "EWC", x = as.Date("2006-04-26"), y = 15, color = "red")

# Find corrlation coefficient
# method = pearson, spearman, kendall
EWA.EWC.cor <- cor(df$EWA_Close, df$EWC_Close, use="complete.obs", method="pearson") 

# Ordinary least squares regression
# EWA independant
ols <- lm(df$EWC_Close~df$EWA_Close+0) # regression line (y~x)
summary(ols)
beta <- coef(ols)[1]

resid <- as.data.frame(df$EWC_Close - beta * df$EWA_Close)
resid <- data.frame(resid,"Date" = df$Date)
colnames(resid)[1] <- "residuals"
plot(resid$Date,resid$residual, col = 'navyblue', xlab = 'Date', ylab = 'residuals', type="l", main=" EWA,EWC Residuals - EWA Independant Variable - No Intercept",cex.main=1)

# urca ADF test
library(urca)
  summary(ur.df(resid$residual, type = "drift", lags = 1))

# Johansen test
# Eigen vectors are the number of shares
# Test is cointegration test
coRes=ca.jo(data.frame(df$EWA_Close, df$EWC_Close),type="trace",K=2,ecdet="none", spec="longrun")
summary(coRes)
slot(coRes,"V") # The matrix of eigenvectors, normalised with respect to the first variable.
slot(coRes,"Vorg") # The matrix of eigenvectors, such that \hat V'S_{kk}\hat V = I.

# Calculate half life of mean reversion (residuals)
# Calculate yt-1 and (yt-1-yt)
# pull residuals to a vector
residuals <- c(resid$residual)
y.lag <- c(residuals[2:length(residuals)], 0) # Set vector to lag -1 day
y.lag <- y.lag[1:length(y.lag)-1] # As shifted vector by -1, remove anomalous element at end of vector
residuals <- residuals[1:length(residuals)-1] # Make vector same length as vector y.lag
y.diff <- residuals - y.lag # Subtract todays close from yesterdays close
y.diff <- y.diff [1:length(y.diff)-1] # Make vector same length as vector y.lag
prev.y.mean <- y.lag - mean(y.lag) # Subtract yesterdays close from the mean of lagged differences
prev.y.mean <- prev.y.mean [1:length(prev.y.mean )-1] # Make vector same length as vector y.lag
final.df <- data.frame(y.diff,prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result = lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 = -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

# Make z-score of residuals
# Set look back equal to half life
colnames(resid)
half_life = 31
resid$mean <- SMA(resid[,"residuals"], round(half_life))
resid$stdev <- runSD(resid[,"residuals"], n = round(half_life), sample = TRUE, cumulative = FALSE)
resid$zscore <- apply(resid[,c('residuals', 'mean','stdev')], 1, function(x) { (x[1]-x[2]) / x[3] })

# Convert all NA to 0
resid[is.na(resid)] <- 0

# Place residuals data frame to original (df)
df <- data.frame(df,resid)

# Plot z-score of residuals
plot(resid$zscore,type="l", main="Rolling Z-Score - Look Back Set Equal To Half Life", col="darkblue")

###################
# Back Test Script
###################

# Create Trading Logic
# When z-score crossed below 0 buy EWA and sell short EWC per eigenvectors
df$enter.long <- ifelse(df$zscore <= -0.7, 1,0)   # for long EWA below mean
df$exit.long <- ifelse(df$zscore == 0, 1,0)    # for long EWA below mean
df$enter.short <- ifelse(df$zscore <= -0.7 , -1,0) # for short EWC below mean
df$exit.short <- ifelse(df$zscore == 0 , -1,0)  # for short EWC below mean

df$enter.short.1 <- ifelse(df$zscore >= 0.7, -1,0)   # for short EWA above mean
df$exit.short.1 <- ifelse(df$zscore == 0, -1,0)    # for short EWA above mean
df$enter.long.1 <- ifelse(df$zscore >= 0.7 , 1,0) # for long EWC above mean
df$exit.long.1 <- ifelse(df$zscore == 0 , 1,0)  # for long EWC above mean

# Calculate close to close returns
df$ewa.clret <- ROC(df$EWA_Close, type = c("discrete"))
df$ewa.clret[1] <- 0
df$ewc.clret <- ROC(df$EWC_Close, type = c("discrete"))
df$ewc.clret[1] <- 0

# Long EWA below mean
df <- df %>%
  dplyr::mutate(ewa.long = ifelse(enter.long ==  1, 1,
                                  ifelse(exit.long == 1, 0, 0)))

# Short EWC above mean
df <- df %>%
  dplyr::mutate(ewc.short = ifelse(enter.short ==  -1, 1,
                                  ifelse(exit.short == -1, 0, 0)))

# Short EWA above mean
df <- df %>%
  dplyr::mutate(ewa.short = ifelse(enter.short.1 ==  -1, 1,
                                  ifelse(exit.short.1 == 1, 0, 0)))

# Long EWC above mean
df <- df %>%
  dplyr::mutate(ewc.long = ifelse(enter.long.1 ==  1, 1,
                                   ifelse(exit.long.1 == -1, 0, 0)))

# lag signal by one forward day to signal entry next day
df$ewa.long <- lag(df$ewa.long,1) # Note k=1 implies a move *forward*
df$ewc.short <- lag(df$ewc.short,1) # Note k=1 implies a move *forward*
df$ewa.short <- lag(df$ewa.short,1) # Note k=1 implies a move *forward*
df$ewc.long <- lag(df$ewc.long,1) # Note k=1 implies a move *forward*

# Calculate equity curves
df$equity.long <- apply(df[,c('ewa.long', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short <- apply(df[,c('ewc.short', 'ewc.clret')], 1, function(x) { x[1]*x[2]})
df$equity.long.1 <- apply(df[,c('ewa.short', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short.1 <- apply(df[,c('ewc.long', 'ewc.clret')], 1, function(x) { x[1]*x[2]})

tail(df,500)

# Combine signals
df$combined <- apply(df[,c('equity.long', 'equity.short','equity.long.1','equity.short.1')], 1, function(x) { x[1]+x[2]+x[3]+x[4] })

# Pull select columns from data frame to make XTS whilst retaining format
xts1 = xts(df$equity.long, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts2 = xts(df$equity.short, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts3 = xts(df$equity.long.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts4 = xts(df$equity.short.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts5 = xts(df$combined, order.by=as.POSIXct(df$Date, format="%Y-%m-%d")) 

# Join XTS together
compare <- cbind(xts1,xts2,xts3,xts4,xts5)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("EWA Long","EWC Short","EWA Short","EWC Long","Combined")
charts.PerformanceSummary(compare,main="EWC,EWA Pair", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

Non Price Data – UltraPRO ETF Assets Under Management

In previous posts we determined that the S&P 500 was mean reverting on an interday basis. What if there are better ways to capture this mean-reverting nature than using price data alone?

In this post we will look at the UltraPRO family of ETFs and use their assets under management (AUM) to gauge overall market sentiment. The procedure is as follows:

  1. Sum the bullish ETF AUMs
  2. Sum the bearish ETF AUMs
  3. Either take the ratio (total bull AUM / total bear AUM) or the difference (total bull AUM - total bear AUM).

For bullish funds I have used:

“DDM”,”MVV”,”QLD”,”SAA”,”SSO”,”TQQQ”,”UDOW”,”UMDD”,”UPRO”,”URTY”,”UWM”, “BIB”, “FINU”,”LTL”,”ROM”, “RXL”, “SVXY”,”UBIO”,”UCC”,”UGE”,”UPW”,”URE”,”USD”,”UXI”,”UYG”,”UYM”

And bearish:

“DOG”,”DXD”,”MYY”,”MZZ”,”PSQ”,”QID”,”RWM”,”SBB”,”SDD”,”SDOW”,”SDS”,”SH”,”SPXU”,”SQQQ”,”SRTY”,”TWM”,”SMDD”,”UVXY”,”VIXM”,”VIXY”

(Note: make sure the order of your bull / bear ETFs is the same as mine if you want to try my code. You can check this by printing file.names (line 37 of the R code) to see the order of the files in your directory; this is important!)

Let's see what this looks like. FYI, you can download all the historical NAVs in .csv format from the UltraPRO website manually, or you can use the R code below:

# Download ETF AUM Files
file.list <- c("DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM", "RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB","SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY")

for (i in 1 : length(file.list)) {
  file.name.variable <-  file.list[i]
  url <- paste0("https://accounts.profunds.com/etfdata/ByFund/",
                file.name.variable, "-historical_nav.csv")
  destfile <- paste0("C:/R Projects/Data/etf_aum/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

We sum the assets under management within each group to obtain our total bullish and bearish AUMs. A rough sketch of this aggregation step follows.
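This is a hedged sketch only: the "Date" / "AUM" column names and the bull.files / bear.files vectors are assumptions, so check the real ProShares .csv headers before using it.

# Sketch: sum assets under management across the bull and bear file groups
sum_aum <- function(files) {
  aum_list <- lapply(seq_along(files), function(i) {
    x <- read.csv(files[i], header = TRUE, stringsAsFactors = FALSE)
    setNames(data.frame(x$Date, x$AUM), c("Date", paste0("AUM_", i)))   # "Date"/"AUM" are assumed column names
  })
  merged <- Reduce(function(a, b) merge(a, b, by = "Date"), aum_list)
  data.frame(Date = merged$Date, total = rowSums(merged[, -1], na.rm = TRUE))
}

bull.aum <- sum_aum(bull.files)                # bull.files / bear.files: paths to the downloaded .csv files
bear.aum <- sum_aum(bear.files)
spread   <- bull.aum$total - bear.aum$total    # assumes both groups cover the same dates; or use the ratio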

[Figures: total.bull.aum, total.bear.aum, spread.bull.bear - total bullish AUM, total bearish AUM, and the bull minus bear AUM spread]

Note the spread is mean reverting on a yearly basis. We could run a statistical test, but we can see that the spread does revert to the mean. However, we can also overlay a 9-period SMA and measure how far the spread moves above and below this rolling 9 SMA. We see what this looks like below:

[Figure: the spread relative to its rolling 9 SMA]

We see that, relative to a rolling 9 SMA window, the spread is strongly mean reverting. This will be the subject of our back test. To keep things simple we will buy the SPY when the rolling z-score falls below 0 and sell when it crosses back over 0.
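Here is a minimal sketch of that signal, using a toy series standing in for the bull / bear AUM ratio (the full script builds zscore.n9 the same way from the TTR package's SMA and runSD):

require(TTR)
set.seed(1)
aum.sum <- cumsum(rnorm(250)) + 3   # stand-in for the bull / bear AUM ratio
zscore.n9 <- (aum.sum - SMA(aum.sum, n = 9)) / runSD(aum.sum, n = 9, cumulative = FALSE)
enter <- ifelse(zscore.n9 < 0, 1, 0)   # buy SPY when the 9 day z-score falls below 0
exit <- ifelse(zscore.n9 > 0, 1, 0)    # sell when it crosses back over 0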

[Figures: strategy equity curve and drawdown versus buy and hold]

We see that the maximum drawdown is 18% and that we avoid most of the 2008 decline. As the drawdown is so small, we could, for example, explore using 2x or 3x leverage to increase profits while matching the risk profile of buy and hold.

Annualized return is 12% since 2006 (AUM data for the UltraPRO ETF family does not exist prior to 2006). The Sharpe ratio is 0.84.

We total 267 trades, 192 of which were profitable and 75 of which were losers, for a win rate of 72%. We are invested 47% of the time from 2006 to the present day, meaning we beat buy and hold while freeing up capital for 'layering' other strategies.
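Those figures are simple ratios of the trade and day counts; a quick sketch using the numbers quoted above (the day counts are illustrative placeholders, not the actual back test output):

winning.trades <- 192
losing.trades <- 75
total.trades <- winning.trades + losing.trades   # 267 trades
win.percent <- winning.trades / total.trades     # ~0.72, a 72% win rate
days.in.market <- 1400   # hypothetical number of days holding a position
total.days <- 2950       # hypothetical number of trading days in the sample
time.invested <- days.in.market / total.days     # ~0.47, invested ~47% of the time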

The fact that this didn't break down during the 2008 crisis is encouraging; I would rather see robustness across all market regimes.

In closing, the collective assets under management are essentially a measure of market sentiment. Asset levels rise and fall in the bull / bear funds according to the views of market participants.

The strategy above seeks to buy when sentiment is on the bearish side and exit when sentiment has reverted back to the mean, in this case when the 9-period rolling z-score crosses back over 0.

Ideas for other tests:

  1. Change the exit criteria, for example sell at a value higher than 0
  2. Specify an 'n' day exit, for example exit the trade after 3, 4 or 5 days
  3. Do not sell on the first holding day, but sell at the first profit after entry

These may be better or worse exits than the one above, but I will leave those for you, good readers, to test! A minimal sketch of idea 1 follows below.
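Here is that sketch of idea 1, using a toy z-score path and a hypothetical exit threshold of 0.5:

exit.level <- 0.5                                # hypothetical exit threshold above 0
zscore.n9 <- c(-0.8, -0.2, 0.1, 0.6, 0.9, -0.1)  # toy z-score path
exit <- ifelse(zscore.n9 > exit.level, 1, 0)     # replaces the 'cross back over 0' exit rule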

Full R Code below which includes:

  1. Loop for downloading specified ETF .csv from UltraPro website
  2. Loop for merging all .csv into one data frame
  3. Back test script
  4. Plots used above
# UltraPRO ETF Assets Under Management
# Scrape ETF data, study AUM's, create indicators, back test strategy
# Andrew Bannerman 10.4.2017
# Combine bull and bear ETF AUM as a spread or ratio, then z-score the result

require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)
require(ggplot2)
require(ggthemes)
require(lubridate)
require(scales)

# Download ETF AUM Files
file.list <- c("DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM", "RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB","SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY")
for (i in 1 : length(file.list)) {
  file.name.variable <-  file.list[i]
  url <- paste0("https://accounts.profunds.com/etfdata/ByFund/",
                file.name.variable, "-historical_nav.csv")
  destfile <- paste0("C:/R Projects/Data/etf_aum/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

# Set data Dir
data.dir <- "C:/R Projects/Data/etf_aum/"
download.dir <- "C:/R Projects/Data/etf_aum/"

#Read list of files
files = list.files(path = "C:/R Projects/Data/etf_aum/", pattern = ".", full.names = TRUE)

# Read file names
file.names = list.files(path = "C:/R Projects/Data/etf_aum", pattern = ".csv")

#Read the first file
data.file <- paste(files[1],sep="")

# Read data first file to data frame
df <- read.csv(data.file,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Convert data formats
cols <-c(4:9)
df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
#Convert Date Column [1]
df$Date <- mdy(df$Date)   #mdy for .txt

# Loop for merging all files to one data frame by common date

for (f in 2:length(files)) {
  data.next <- paste(files[f],sep="")
  # if using names.txt
  #  data.next <- paste(data.dir,fileslist[f],'.csv',sep="")
  next.file <- read.csv(data.next,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE) 

  cols <-c(4:9)
  next.file[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
  #Convert Date Column [1]
  next.file$Date <- mdy(next.file$Date)   #mdy for .txt

  next.df <- full_join(df, next.file, by = c("Date" = "Date"))

 cnames <- rep(c("Proshares.Name","Ticker","NAV","Prior.NAV","NAV.Change","Nav.Change.a","Shares.Outstanding","AUM"),f)
  cnums <- rep(seq(1,f),each=8)
  columnames <- paste(cnames,cnums)
  colnames(next.df) <- c('Date',columnames)

  df <- next.df
  colnames(df)
}

# Sort data.frame from 'newest to oldest' to 'oldest to newest'
df <- arrange(df, Date)

# Subset AUM columns to one data frame
new.df <- df[, seq(from = 1, to = ncol(df), by=8)]

# Convert all NA to 0
new.df[is.na(new.df)] <- 0

colnames(new.df)

# Sum all AUMs
new.df$aum.sum.bull <-  rowSums(new.df[,c("AUM 1", "AUM 2", "AUM 5", "AUM 6", "AUM 7", "AUM 12", "AUM 13", "AUM 15", "AUM 16", "AUM 26", "AUM 27","AUM 28","AUM 30","AUM 31","AUM 32","AUM 33","AUM 34","AUM 35","AUM 36","AUM 37","AUM 38","AUM 39","AUM 41","AUM 42","AUM 43","AUM 44")])   # Bull Aums
new.df$aum.sum.bear <-  rowSums(new.df[,c("AUM 3", "AUM 4", "AUM 8", "AUM 9", "AUM 10", "AUM 11", "AUM 14", "AUM 17", "AUM 18", "AUM 19", "AUM 20","AUM 21","AUM 22","AUM 23","AUM 24","AUM 25","AUM 29","AUM 40","AUM 45","AUM 46")])   # Bear AUMs

# Ratio or spread of Bull to bear
new.df$aum.sum <- new.df$aum.sum.bull / new.df$aum.sum.bear

new.df <- new.df[-c(2855), ]

# Make Plot
ggplot(new.df, aes(Date, aum.sum)) +
  geom_line()+
  theme_economist() +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%y"))+
  scale_y_continuous(breaks = seq(-19000000000, 19000000000, by = 500500000))+
  ggtitle("Spread of Total Bear Bull Assets Under Management", subtitle = "2006 To Present") +
  labs(x="Year",y="Total Bear AUM")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(axis.text.x= element_text(hjust=0.5, angle=0))

# Create rolling n day z-score of total AUM sum
# Use TTR package to create n day SMA
get.SMA <- function(numdays) {
  function(new.df) {
    SMA(new.df$aum.sum, n=numdays)    # Calls TTR package
  }
}
# Create a matrix to put the SMA's in
sma.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:30) {
  sma.matrix <- cbind(sma.matrix, get.SMA(i)(new.df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:30, function(n)paste("sma.n", n, sep=""))

# Bind to existing dataframe
new.df <-  cbind(new.df, sma.matrix)

# Use TTR package to create n day SD
get.SD <- function(numdays) {
  function(new.df) {
    runSD(new.df$aum.sum, numdays, cumulative = FALSE)     # Calls TTR package to create running standard deviation
  }
}
# Create a matrix to put the SDs in
sd.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:30) {
  sd.matrix <- cbind(sd.matrix, get.SD(i)(new.df))
}

# Rename columns
colnames(sd.matrix) <- sapply(2:30, function(n)paste("sd.n", n, sep=""))

# Bind to existing dataframe
new.df <-  cbind(new.df, sd.matrix)

# zscore calculation
new.df$zscore.n2 <- apply(new.df[,c('aum.sum','sma.n2', 'sd.n2')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n3 <- apply(new.df[,c('aum.sum','sma.n3', 'sd.n3')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n4 <- apply(new.df[,c('aum.sum','sma.n4', 'sd.n4')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n5 <- apply(new.df[,c('aum.sum','sma.n5', 'sd.n5')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n6 <- apply(new.df[,c('aum.sum','sma.n6', 'sd.n6')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n7 <- apply(new.df[,c('aum.sum','sma.n7', 'sd.n7')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n8 <- apply(new.df[,c('aum.sum','sma.n8', 'sd.n8')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n9 <- apply(new.df[,c('aum.sum','sma.n9', 'sd.n9')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n10 <- apply(new.df[,c('aum.sum','sma.n10', 'sd.n10')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n11 <- apply(new.df[,c('aum.sum','sma.n11', 'sd.n11')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n12 <- apply(new.df[,c('aum.sum','sma.n12', 'sd.n12')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n13 <- apply(new.df[,c('aum.sum','sma.n13', 'sd.n13')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n14 <- apply(new.df[,c('aum.sum','sma.n14', 'sd.n14')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n15 <- apply(new.df[,c('aum.sum','sma.n15', 'sd.n15')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n16 <- apply(new.df[,c('aum.sum','sma.n16', 'sd.n16')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n17 <- apply(new.df[,c('aum.sum','sma.n17', 'sd.n17')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n18 <- apply(new.df[,c('aum.sum','sma.n18', 'sd.n18')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n19 <- apply(new.df[,c('aum.sum','sma.n19', 'sd.n19')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n20 <- apply(new.df[,c('aum.sum','sma.n20', 'sd.n20')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n21 <- apply(new.df[,c('aum.sum','sma.n21', 'sd.n21')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n22 <- apply(new.df[,c('aum.sum','sma.n22', 'sd.n22')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n23 <- apply(new.df[,c('aum.sum','sma.n23', 'sd.n23')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n24 <- apply(new.df[,c('aum.sum','sma.n24', 'sd.n24')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n25 <- apply(new.df[,c('aum.sum','sma.n25', 'sd.n25')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n26 <- apply(new.df[,c('aum.sum','sma.n26', 'sd.n26')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n27 <- apply(new.df[,c('aum.sum','sma.n27', 'sd.n27')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n28 <- apply(new.df[,c('aum.sum','sma.n28', 'sd.n28')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n29 <- apply(new.df[,c('aum.sum','sma.n29', 'sd.n29')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$zscore.n30 <- apply(new.df[,c('aum.sum','sma.n30', 'sd.n30')], 1, function(x) { (x[1]-x[2])/x[3] } )

# View Plot
ggplot(new.df, aes(Date, zscore.n9)) +
  geom_line()+
  theme_economist() +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%y"))+
  scale_y_continuous()+
  ggtitle("Rolling 9 Day Zscore of Total Bull Bear AUM Spread", subtitle = "2006 To Present") +
  labs(x="Year",y="9 Day Zscore")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(axis.text.x= element_text(hjust=0.5, angle=0))

# Load S&P500 benchmark data
read.spx <- read.csv("C:/R Projects/Data/SPY.csv", header=TRUE, stringsAsFactors = FALSE)

# Convert data formats
cols <-c(2:7)
read.spx[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
#Convert Date Column [1]
read.spx$Date <- ymd(read.spx$Date)

# Join to existing data frame
new.df <- full_join(read.spx, new.df, by = c("Date" = "Date"))

# Convert all NA to 0
new.df[is.na(new.df)] <- 0

# Calculate Returns from open to close
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Subset by date
new.df <- subset(new.df, Date >= as.POSIXct("2006-01-01") ) 

# Name indicators #
#train.indicator <- train.set$close.zscore.n10
#test.indicator <- test.set$close.zscore.n10
signal <- new.df$zscore.n9
trigger <- 0

# Enter buy / sell rules
new.df$enter <- ifelse(signal < trigger, 1,0)
new.df$exit <- ifelse(signal > 0, 1,0)

# Mean Rev
new.df <- new.df %>%
  dplyr::mutate(signal = ifelse(enter == 1, 1,
                                      ifelse(exit == 1, 0, 0)))

# lag signal by one forward day to signal entry next day
new.df$signal <- lag(new.df$signal,1) # Note k=1 implies a move *forward*

new.df[is.na(new.df)] <- 0  # Set NA to 0

# Calculate equity curves

# Signal
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(signal)) %>%
  group_by(RunID) %>%
  dplyr::mutate(mean.rev.equity = ifelse(signal == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(new.df$mean.rev.equity, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d %H:%M"))
xts2 = xts(new.df$clret, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d %H:%M")) 

# Join XTS together
compare <- cbind(xts1,xts2)

require(PerformanceAnalytics)
colnames(compare) <- c("Mean Reversion UltraPRO ETF's","Buy And Hold")
charts.PerformanceSummary(compare,main="Mean Reversion - UltraPRO ETF AUM", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

# Find net trade result of multiple 'n' day trades
# Find start day of trade, find end day, perform (last(Close) - first(Open))/first(Open) % calculation
new.df <- new.df %>%
  dplyr::mutate(RunID = data.table::rleid(signal)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(signal == 0, 0,
                                     ifelse(row_number() == n(),
                                            (last(Close) - first(Open))/first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# Win / Loss %
# All Holding Days
winning.trades <- sum(new.df$mean.rev.equity > 0, na.rm=TRUE)
losing.trades <- sum(new.df$mean.rev.equity < 0, na.rm=TRUE)
even.trades <- sum(new.df$mean.rev.equity == 0, na.rm=TRUE)
total.days <- NROW(new.df$mean.rev.equity)

# Multi Day Trades
multi.winning.trades <- sum(new.df$perc.output > 0, na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < 0, na.rm=TRUE)
multi.total.days <- multi.winning.trades+multi.losing.trades

# % Time Invested
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades

# Calcualte win loss %
# All Days
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Day Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss
# All Days
remove.zero <- new.df[-which(new.df$mean.rev.equity == 0 ), ] # removing rows 0 values
consec.wins <- max(rle(sign(remove.zero$mean.rev.equity))[[1]][rle(sign(remove.zero$mean.rev.equity))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$mean.rev.equity))[[1]][rle(sign(remove.zero$mean.rev.equity))[[2]] == -1])
consec.wins

# Multi Day Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# All Days
average.trade <- mean(new.df$mean.rev.equity)
average.win <- mean(new.df$mean.rev.equity[new.df$mean.rev.equity >0])
average.loss <- mean(new.df$mean.rev.equity[new.df$mean.rev.equity <0])
median.win <- median(new.df$mean.rev.equity[new.df$mean.rev.equity >0])
median.loss <- median(new.df$mean.rev.equity[new.df$mean.rev.equity <0])
max.gain <- max(new.df$mean.rev.equity)
max.loss <- min(new.df$mean.rev.equity)
win.loss.ratio <- winning.trades / abs(losing.trades)
summary <- cbind(winning.trades,losing.trades,even.trades,total.days,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Even Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Day Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output >0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output <0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output >0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output <0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.total.days,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
Return.cumulative(xts1, geometric = TRUE)

# Write output to file
write.csv(new.df,file="C:/R Projects/ultra.pro.etf.mean.rev.csv")

Momentum Strategy – Boosting Returns – VFINX / VFITX

In this post we will look at a strategy that is 100% invested in the S&P500 and, during adverse conditions, switches to being 100% invested in an intermediate-term Treasury bond fund. The two funds used for the strategy are:

Vanguard 500 Index Fund Investor Class (VFINX)
Vanguard Intermediate-Term Treasury Fund Investor Shares (VFITX)

The strategy is as follows, using MONTHLY bars:
1. Long VFINX (S&P500) when the price is over the 189-day moving average or its monthly equivalent (9 SMA)
2. When VFINX (S&P500) is below the monthly 9 SMA, sell 100% of VFINX (S&P500) at next month's open and buy VFITX (bonds)
3. When VFINX (S&P500) is above the monthly 9 SMA, sell 100% of VFITX (bonds) and go 100% long VFINX (S&P500) again

Essentially the strategy is long the S&P500 when it's above the monthly 9 SMA (roughly a 189-day SMA) and 100% in intermediate-term bonds when it's below.
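Here is a minimal sketch of the switching rule on monthly closes, using a toy price series (the full script below applies the same logic to monthly VFINX data read from a CSV):

require(TTR)
set.seed(2)
vfinx.monthly <- cumprod(1 + rnorm(120, 0.007, 0.04)) * 100   # stand-in for monthly VFINX adjusted closes
sma9 <- SMA(vfinx.monthly, n = 9)   # monthly 9 SMA (roughly 9 x 21 = 189 daily bars)
signal.long.stocks <- ifelse(vfinx.monthly > sma9, 1, 0)   # hold VFINX above the 9 SMA
signal.long.bonds <- ifelse(vfinx.monthly < sma9, 1, 0)    # hold VFITX below it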

In the chart below, red denotes times when VFINX (S&P500) is below the monthly 9 SMA and the strategy would be 100% long bonds.

[Figure: VFINX monthly closes, red where below the 9 SMA]

The returns for this strategy since its 1991 inception:

[Figures: strategy equity curve and performance summary]

From the back test we see an annualized return of 12.4% with a Sharpe ratio of 1.21. Maximum drawdown is 15%, which means we may explore using leverage to boost gains whilst maintaining a risk profile similar to buy and hold (VFINX).

We may boost returns by going long a 3x leveraged S&P500 ETF. This can be achieved with the ProShares UltraPro ETF (UPRO, inception 2009). First I want to check whether we can model UPRO data prior to 2009.

[Figure: regression fit of UPRO versus 3x S&P500 daily returns]

Plotting the regression fit of UPRO against 3x S&P500 daily returns shows that the fit is not perfect: there is a high degree of correlation, but the % return tracking is not exact.

[Figure: UPRO versus 3x S&P500 return tracking over time]

We have almost perfect tracking prior to June 2012; after that there is a notable divergence between the 3x S&P500 daily returns and UPRO.

When we model UPRO prior to 2009 by simply multiplying the S&P500 daily returns by 3, we know the result will be largely theoretical.
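A minimal sketch of that modeling step, using a few illustrative daily returns (the simple 3x scaling ignores UPRO's expense ratio and the slippage from daily rebalancing):

spx.ret <- c(0.004, -0.012, 0.007, 0.002)   # illustrative S&P500 daily close to close returns
upro.model <- 3 * spx.ret                   # theoretical 3x leveraged daily returns
upro.equity <- cumprod(1 + upro.model)      # theoretical UPRO growth of $1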

Let's back test, swapping VFINX for UPRO (theoretical prior to 2009).

[Figures: UPRO switching strategy equity curve and performance summary]

Maximum drawdown is 45%, which is significantly lower than buy and hold UPRO (92%). Essentially we enjoy staggering returns with almost half the drawdown. The 189-day SMA, or monthly 9 SMA, acts as a 'low pass filter' and avoids the catastrophic bear markets that followed the 1999 and 2007 peaks.

[Figures: drawdown chart and cumulative return]

We see that cumulative returns are 1276.56%.

These are theoretical but staggering results.

# Get First And Last Date
last.date <- tail(df$Date, n = 1)
first.date <- head(df$Date, n = 1)
## Find Time Difference
time <- last.date - first.date
years.between <- time/352                                  # approximate conversion from days to years
years.between <- as.numeric(years.between, units="days")   # extract the numeric value from the time difference object
years.between

$10,000 grows to $12.7 million over a period of 26.98 years. Not bad for a maximum drawdown of 45% with 3x leverage.
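The dollar figure is simply the starting capital compounded at the strategy's cumulative return; here is a sketch with a toy daily return series (the full script reports the real cumulative return via Return.cumulative on xts3):

require(xts)
require(PerformanceAnalytics)
set.seed(3)
toy.rets <- xts(rnorm(252, 0.0008, 0.01), order.by = Sys.Date() - 252:1)   # stand-in daily strategy returns
starting.capital <- 10000
final.equity <- starting.capital * (1 + Return.cumulative(toy.rets, geometric = TRUE))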

In closing, the 189-day SMA, or its monthly 9 SMA equivalent, acts as a low pass filter. The aim of this filter is to reduce overall volatility, provide a smaller drawdown and avoid negative compounding periods. We see from the results that this has been achieved.

Reader Question:
What happens if we optimize the monthly bar SMA lookback higher or lower than the tested 9 SMA?

Full back test R code:


# Long Term S&P500 - Switch To Bonds
# Andrew Bannerman 10.8.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "C:/R Projects/Final Scripts/Vanguard Long Term Strategies"
data.read.VFINX <- paste(data.dir,"VFINX.csv",sep="/")
data.read.VFITX <- paste(data.dir,"VFITX.csv",sep="/")

# Read data
read.VFINX <- read.csv(data.read.VFINX,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)
read.VFITX <- read.csv(data.read.VFITX,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)
#read.VFITX <- read.VFITX[-nrow(read.VFITX),] 

# Convert Values To Numeric
cols <-c(2:7)
read.VFINX[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
read.VFITX[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

#Convert Date Column [1]
read.VFINX$Date <- ymd(read.VFINX$Date)
read.VFITX$Date <- ymd(read.VFITX$Date)

# Merge two data sets
df <- full_join(read.VFINX, read.VFITX, by = c("Date" = "Date"))

# Rename Columns
colnames(df)[1] <-"Date"
colnames(df)[2] <-"VFINX.Open"
colnames(df)[3] <-"VFINX.High"
colnames(df)[4] <-"VFINX.Low"
colnames(df)[5] <-"VFINX.Close"
colnames(df)[6] <-"VFINX.Adj.Close"
colnames(df)[7] <-"VFINX.Volume"
colnames(df)[8] <-"VFITX.Open"
colnames(df)[9] <-"VFITX.High"
colnames(df)[10] <-"VFITX.Low"
colnames(df)[11] <-"VFITX.Close"
colnames(df)[12] <-"VFITX.Adj.Close"
colnames(df)[13] <-"VFITX.Volume"

# Convert all NA to 0
df[is.na(df)] <- 0

# Use TTR package to create rolling SMA n day moving average
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
  function(df) {
    SMA(df[,"VFINX.Adj.Close"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(df), ncol=0)

# Loop for filling it
for (i in 2:400) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:400, function(n)paste("adj.close.sma.n", n, sep=""))

# Bind to existing dataframe
df <-  cbind(df, sma.matrix)

# Convert all NA to 0
df[is.na(df)] <- 0

# Calculate Returns from open to close
#df$VFINX.ocret <- apply(df[,c('VFINX.Open', 'VFINX.Adj.Close')], 1, function(x) { (x[2]-x[1])/x[1]} )
#df$VFITX.ocret <- apply(df[,c('VFITX.Open', 'VFITX.Adj.Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
df$VFINX.clret <- ROC(df$VFINX.Adj.Close, type = c("discrete"))
df$VFITX.clret <- ROC(df$VFITX.Adj.Close, type = c("discrete"))
df$VFINX.clret[1] <- 0
df$VFITX.clret[1] <- 0

# Add leverage multiplier
#df$VFINX.clret <- df$VFINX.clret * 3

# Subset Date
df <- subset(df, Date >= as.POSIXct("1991-10-01") ) 

# Name indicators #
VFINX.sma <- df$adj.close.sma.n9

# Enter buy / sell rules
df$signal.long.stocks <- ifelse(df$VFINX.Adj.Close > VFINX.sma, 1,0)
df$signal.long.bonds <- ifelse(df$VFINX.Adj.Close < VFINX.sma, 1,0)

# lag signal by one forward day to signal entry next day
df$signal.long.stocks <- lag(df$signal.long.stocks,1) # Note k=1 implies a move *forward*
df$signal.long.bonds <- lag(df$signal.long.bonds,1) # Note k=1 implies a move *forward*

df[is.na(df)] <- 0  # Set NA to 0

#Plot VFINX Monthly with 9sma
plot(df$VFINX.Adj.Close, col = ifelse(df$VFINX.Adj.Close < VFINX.sma,'red','black'), pch = 10, cex=.5,ylab="VFINX Close",main="VFINX Below Monthly 9sma")

# Calculate equity curves
# Long Stocks
df <- df %>%
  dplyr::mutate(RunID = rleid(signal.long.stocks)) %>%
  group_by(RunID) %>%
  dplyr::mutate(long.stocks.equity.curve = ifelse(signal.long.stocks== 0, 0,
                                         ifelse(row_number() == 1, VFINX.clret, VFINX.clret))) %>%
  ungroup() %>%
  select(-RunID)

# Long Bonds
df <- df %>%
  dplyr::mutate(RunID = rleid(signal.long.bonds)) %>%
  group_by(RunID) %>%
  dplyr::mutate(long.bonds.equity.curve = ifelse(signal.long.bonds == 0, 0,
                                                  ifelse(row_number() == 1, VFITX.clret, VFITX.clret))) %>%
  ungroup() %>%
  select(-RunID)

# Combine Signals
df$combined.equity.curve <- df$long.stocks.equity.curve + df$long.bonds.equity.curve

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(df$long.stocks.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts2 = xts(df$long.bonds.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts3 = xts(df$combined.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts4 = xts(df$VFINX.clret, order.by=as.Date(df$Date, format="%m/%d/%Y")) 

# Join XTS together
compare <- cbind(xts3,xts2,xts1,xts4)

# Use the PerformanceAnalytics package for trade statistics

require(PerformanceAnalytics)
colnames(compare) <- c("Long.Stocks.Switch.Bonds","Long.Bonds","Long.Stocks","Buy And Hold")
charts.PerformanceSummary(compare,main="Long Stocks(VFINX) Over 200sma, Long Bonds(VFITX) Under 200sma", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)
Return.cumulative(xts3, geometric = TRUE)

Simple S&P500 Linear Strategy

From our Hurst exponent test in a previous post we determined that the S&P500 is a mean-reverting series on an interday time frame. For this back test we have created a very simple linear model to 'fit' the nature of the series.

We use a rolling 'n' day z-score and test a lookback of 11 days, which happened to provide the best results. For the sake of simplicity we will test over the full sample period and will not include any commissions or slippage.

We will use the SPY for this back test from inception to present.

Here are the parameters:
1. Rolling lookback period of z-score = 11 days
2. Entry level = < -0.2
3. Exit Time = 4 days

As you can see, the parameters are kept to a minimum. The main goal here is to fit a model to the existing anomaly rather than force-fit a model to the data.

[Figures: strategy equity curve and results table]

As we can see, we obtain better performance than buy and hold: roughly a 21% smaller drawdown and a 10% annualized return from 1993 to present. Notice the time invested is 48%, which means we have capital available to execute other strategies. To offset the drawdowns even further we could add another uncorrelated strategy, but as a base this is not a bad attempt to capture the mean-reverting nature of the S&P500 on an interday basis. This might not be the most stellar performance, but it's a good example to illustrate the train of thought.

Some questions to ponder:

1. What happens if we 'fit' a mean reversion model to S&P500 daily data from 1929 to 1980?
2. What happens if we fit a momentum model to S&P500 daily data from 1929 to 1980?
3. What happens if we fit a momentum model to S&P500 daily data from 1993 to present?

Those questions, good readers, I shall leave for you to answer!

The rules of this system are:
1. When the z-score crosses below -0.2, buy the next day's open
2. After 4 days have passed, sell everything; if another signal is generated while in the existing trade we take that signal as well (see the sketch after this list)
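Here is a compact sketch of rule 2's holding logic with a toy signal vector; note that this restarts the 4-day clock on a repeat signal, which is one way to read the rule (the full script below implements the fixed hold with a while loop):

signal <- c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0)   # toy entry signals (1 = z-score crossed below -0.2)
n.day <- 4
hold <- 0
position <- integer(length(signal))
for (i in seq_along(signal)) {
  if (signal[i] == 1) hold <- n.day   # a new signal (re)starts the 4 day hold
  if (hold > 0) {
    position[i] <- 1                  # stay long while the hold is active
    hold <- hold - 1
  }
}
# position marks the bars held; the back test then lags this by one bar to trade the next open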

Notes about the rules of the back test:
1. We calculate both S&P500 close-to-close returns and open-to-close returns. We do this so the first day of a trade uses an open-to-close return (a signal is generated at the end of a day, we buy the next day's open, so day one's return is open to close; from day two onward the returns are close to close).
2. To avoid look-ahead bias, after a signal is generated we lag the signal series forward one day, so that we are simulating buying the NEXT day's open (a sketch follows below).
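A minimal sketch of note 2 (with the first-day return treatment from note 1 noted in the comments), using a toy signal vector:

require(dplyr)
signal <- c(0, 1, 1, 0, 0)               # raw signal evaluated at each day's close
signal.lagged <- dplyr::lag(signal, 1)   # shift forward one day: we trade the NEXT day's open
signal.lagged[is.na(signal.lagged)] <- 0 # the first bar has no prior signal
# On the first day of a trade the return is open to close (ocret); afterwards it is close to close (clret)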

All parameters can be optimized and the entry and exit rules adjusted. Some ideas could be:

1. For the exit, exit at a specific z-score value, i.e. buy below -0.20 and sell when the z-score crosses back over 0
2. For the exit, ignore repeat signals while in an existing trade
3. Exit as soon as the price is higher than our entry

Different rules produce slightly different results. I may go into some of these different selling rules, and the code for them, in future posts! For now this is a good start.

# Calculate rolling z-score of SPY close price
# Sell at fixed n day hold
# 8.13.2017
# Andrew Bannerman
require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "D:/R Projects"
data.read.spx <- paste(data.dir,"SPY.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Make dataframe
new.df <- data.frame(read.spx)
tail(new.df)
# Convert Values To Numeric
cols <-c(3:8)
new.df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

#Convert Date Column [1]
new.df$Date <- ymd(new.df$Date)

# Use TTR package to create rolling SMA n day moving average
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
function(new.df) {
SMA(new.df[,"Close"], numdays) # Calls TTR package to create SMA
}
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:60) {
sma.matrix <- cbind(sma.matrix, getSMA(i)(new.df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:60, function(n)paste("close.sma.n", n, sep=""))

# Bind to existing dataframe
new.df <- cbind(new.df, sma.matrix)

# Use TTR package to create rolling Standard Deviation
# Create function and loop in order to repeat the desired number of Stdev for example 2:30
getSD <- function(numdays) {
function(new.df) {
runSD(new.df$Close, numdays, cumulative = FALSE) # Calls TTR package to create SMA
}
}
# Create a matrix to put the SMAs in
sd.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:60) {
sd.matrix <- cbind(sd.matrix, getSD(i)(new.df))
}

# Rename columns
colnames(sd.matrix) <- sapply(2:60, function(n)paste("close.sd.n", n, sep=""))

# Bind to existing dataframe
new.df <- cbind(new.df, sd.matrix)

# Use base R to work out the rolling z-score (Close – roll mean) / stdev
new.df$close.zscore.n2 <- apply(new.df[,c('Close','close.sma.n2', 'close.sd.n2')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n3 <- apply(new.df[,c('Close','close.sma.n3', 'close.sd.n3')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n4 <- apply(new.df[,c('Close','close.sma.n4', 'close.sd.n4')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n5 <- apply(new.df[,c('Close','close.sma.n5', 'close.sd.n5')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n6 <- apply(new.df[,c('Close','close.sma.n6', 'close.sd.n6')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n7 <- apply(new.df[,c('Close','close.sma.n7', 'close.sd.n7')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n8 <- apply(new.df[,c('Close','close.sma.n8', 'close.sd.n8')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n9 <- apply(new.df[,c('Close','close.sma.n9', 'close.sd.n9')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n10 <- apply(new.df[,c('Close','close.sma.n10', 'close.sd.n10')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n11 <- apply(new.df[,c('Close','close.sma.n11', 'close.sd.n11')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n12 <- apply(new.df[,c('Close','close.sma.n12', 'close.sd.n12')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n13 <- apply(new.df[,c('Close','close.sma.n13', 'close.sd.n13')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n14 <- apply(new.df[,c('Close','close.sma.n14', 'close.sd.n14')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n15 <- apply(new.df[,c('Close','close.sma.n15', 'close.sd.n15')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n16 <- apply(new.df[,c('Close','close.sma.n16', 'close.sd.n16')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n17 <- apply(new.df[,c('Close','close.sma.n17', 'close.sd.n17')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n18 <- apply(new.df[,c('Close','close.sma.n18', 'close.sd.n18')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n19 <- apply(new.df[,c('Close','close.sma.n19', 'close.sd.n19')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n20 <- apply(new.df[,c('Close','close.sma.n20', 'close.sd.n20')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n21 <- apply(new.df[,c('Close','close.sma.n21', 'close.sd.n21')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n22 <- apply(new.df[,c('Close','close.sma.n22', 'close.sd.n22')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n23 <- apply(new.df[,c('Close','close.sma.n23', 'close.sd.n23')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n24 <- apply(new.df[,c('Close','close.sma.n24', 'close.sd.n24')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n25 <- apply(new.df[,c('Close','close.sma.n25', 'close.sd.n25')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n26 <- apply(new.df[,c('Close','close.sma.n26', 'close.sd.n26')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n27 <- apply(new.df[,c('Close','close.sma.n27', 'close.sd.n27')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n28 <- apply(new.df[,c('Close','close.sma.n28', 'close.sd.n28')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n29 <- apply(new.df[,c('Close','close.sma.n29', 'close.sd.n29')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n30 <- apply(new.df[,c('Close','close.sma.n30', 'close.sd.n30')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n31 <- apply(new.df[,c('Close','close.sma.n31', 'close.sd.n31')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n32 <- apply(new.df[,c('Close','close.sma.n32', 'close.sd.n32')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n33 <- apply(new.df[,c('Close','close.sma.n33', 'close.sd.n33')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n34 <- apply(new.df[,c('Close','close.sma.n34', 'close.sd.n34')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n35 <- apply(new.df[,c('Close','close.sma.n35', 'close.sd.n35')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n36 <- apply(new.df[,c('Close','close.sma.n36', 'close.sd.n36')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n37 <- apply(new.df[,c('Close','close.sma.n37', 'close.sd.n37')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n38 <- apply(new.df[,c('Close','close.sma.n38', 'close.sd.n38')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n39 <- apply(new.df[,c('Close','close.sma.n39', 'close.sd.n39')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n40 <- apply(new.df[,c('Close','close.sma.n40', 'close.sd.n40')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n41 <- apply(new.df[,c('Close','close.sma.n41', 'close.sd.n41')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n42 <- apply(new.df[,c('Close','close.sma.n42', 'close.sd.n42')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n43 <- apply(new.df[,c('Close','close.sma.n43', 'close.sd.n43')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n44 <- apply(new.df[,c('Close','close.sma.n44', 'close.sd.n44')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n45 <- apply(new.df[,c('Close','close.sma.n45', 'close.sd.n45')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n46 <- apply(new.df[,c('Close','close.sma.n46', 'close.sd.n46')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n47 <- apply(new.df[,c('Close','close.sma.n47', 'close.sd.n47')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n48 <- apply(new.df[,c('Close','close.sma.n48', 'close.sd.n48')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n49 <- apply(new.df[,c('Close','close.sma.n49', 'close.sd.n49')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n50 <- apply(new.df[,c('Close','close.sma.n50', 'close.sd.n50')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n51 <- apply(new.df[,c('Close','close.sma.n51', 'close.sd.n51')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n52 <- apply(new.df[,c('Close','close.sma.n52', 'close.sd.n52')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n53 <- apply(new.df[,c('Close','close.sma.n53', 'close.sd.n53')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n54 <- apply(new.df[,c('Close','close.sma.n54', 'close.sd.n54')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n55 <- apply(new.df[,c('Close','close.sma.n55', 'close.sd.n55')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n56 <- apply(new.df[,c('Close','close.sma.n56', 'close.sd.n56')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n57 <- apply(new.df[,c('Close','close.sma.n57', 'close.sd.n57')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n58 <- apply(new.df[,c('Close','close.sma.n58', 'close.sd.n58')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n59 <- apply(new.df[,c('Close','close.sma.n59', 'close.sd.n59')], 1, function(x) { (x[1]-x[2])/x[3] } )
new.df$close.zscore.n60 <- apply(new.df[,c('Close','close.sma.n60', 'close.sd.n60')], 1, function(x) { (x[1]-x[2])/x[3] } )

# Convert all NA to 0
new.df[is.na(new.df)] <- 0

# Calculate quartiles, where close is in relation to the range: (Close - Low) / (High - Low)
#new.df$quartile <- apply(new.df[,c('Close', 'Low', 'High')], 1, function(x) { (x[1]-x[2])/(x[3]-x[2])} )

# Calculate Returns from open to close
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Name indicators
indicator <- new.df$close.zscore.n12

# Create Long Signals
new.df$signal <- ifelse(indicator < -.2, 1, 0)

######## Fixed 'n' day hold logic ##########
# Variable for loop
indicator <- new.df$signal

# Create Vector From Signal
signal.1 <- c(indicator)

# Set variable for number of days to hold
n.day <- 4

# Loop for fixed 'n' day hold
res <- NULL
while (length(res) < length(signal.1)) {
if (signal.1[length(res)+1] == 1) {
res <- c(res, rep(1,n.day))
} else {
res <- c(res, 0)
}
}
res <- res[1:length(signal.1)]
new.df <- data.frame(new.df,response = res)

# lag signal by one forward day to signal entry next day
new.df$response <- lag(new.df$response,1) # Note k=1 implies a move *forward*
new.df[is.na(new.df)] <- 0  # Set NA to 0

# Calculate equity curve
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(response)) %>%
  group_by(RunID) %>%
  dplyr::mutate(equity.curve = ifelse(response == 0, 0,
                                      ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(new.df$equity.curve, order.by=as.Date(new.df$Date, format="%m/%d/%Y"))
xts2 = xts(new.df$clret, order.by=as.Date(new.df$Date, format="%m/%d/%Y"))

# Join XTS together
compare <- cbind(xts1,xts2)

# Use the PerformanceAnalytics package for trade statistics
# install.packages("PerformanceAnalytics")
require(PerformanceAnalytics)
colnames(compare) <- c("Mean Reversion","Buy And Hold")
charts.PerformanceSummary(compare,main="Cumulative Returns", wealth.index=TRUE, colorset=rainbow12equal)
#png(filename="20090606_rsi2_performance_updated.png", 720, 720)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

# Find net trade result of multiple 'n' day trades
new.df <- new.df %>%
  dplyr::mutate(RunID = data.table::rleid(response)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(response == 0, 0,
                                     ifelse(row_number() == n(),
                                            (last(Close) - first(Open))/first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# All NA to 0
new.df[is.na(new.df)] <- 0
# Win / Loss %
# 1 Day Hold Trades
winning.trades <- sum(new.df$equity.curve > 0, na.rm=TRUE)
losing.trades <- sum(new.df$equity.curve < 0, na.rm=TRUE)
total.days <- NROW(new.df$equity.curve)
# Multi Day Hold Trades
multi.winning.trades <- sum(new.df$perc.output > 0, na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < 0, na.rm=TRUE)
multi.total.days <- NROW(new.df$perc.output)
multi.losing.trades
# % Time Invested (Same column for 1 day and multi hold trades)
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades
winning.trades
losing.trades

# Calcualte win loss %
# 1 Day Hold Trades
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Day Hold Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss
# 1 Day Hold Trades
remove.zero <- new.df[-which(new.df$equity.curve == 0 ), ] # removing rows 0 values
consec.wins <- max(rle(sign(remove.zero$equity.curve))[[1]][rle(sign(remove.zero$equity.curve))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$equity.curve))[[1]][rle(sign(remove.zero$equity.curve))[[2]] == -1])

# Multi Day Hold Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# 1 Day Hold Trades OR all days if multi holding
average.trade <- mean(new.df$equity.curve)
average.win <- mean(new.df$equity.curve[new.df$equity.curve > 0])
average.loss <- mean(new.df$equity.curve[new.df$equity.curve <0])
median.win <- median(new.df$equity.curve[new.df$equity.curve > 0])
median.loss <- median(new.df$equity.curve[new.df$equity.curve <0])
max.gain <- max(new.df$equity.curve)
max.loss <- min(new.df$equity.curve)
win.loss.ratio <- winning.trades / abs(losing.trades)
summary <- cbind(winning.trades,losing.trades,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Day Hold Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output > 0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output <0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output > 0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output <0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
charts.PerformanceSummary(compare,main="Cumulative Returns",wealth.index=TRUE,colorset=rainbow12equal)

# Write output to file
write.csv(new.df,file="D:/R Projects/spy_mean.csv")