SPY Mean Reversion With John Ehlers Adaptive RSI

It has been a busy few months. I have been exploring the market indicators John Ehlers created and published in his book Cycle Analytics for Traders: Advanced Technical Trading Concepts.

The key theme of his book is applying digital signal processing filters to better process market data. He uses various high pass and low pass filters that only allow certain frequencies through at specified look back periods, which improves the signal-to-noise ratio and can extract a dominant cycle from the data.

Key ideas from the book:

  1. Traditional indicators have lag and assume a normal probability distribution
  2. His indicators reduce lag
  3. Extracting the dominant cycle over a range of periods can be used to calibrate traditional indicators such as the RSI, stochastic, etc.
  4. He provides methods to transform indicator output to something close to a normal probability distribution

I’m in the process of making the indicators from his Cycle Analytics for Traders book available on my GitHub in a package called MarketCycles.jl. Please refer to the package README.md for further information.

The package will be used in the next part of this post where we use an Adaptive RSI to time the mean reversion tendency of the daily S&P500.

First install the package:

Pkg.clone("https://github.com/flare9x/MarketCycles.jl")

The Adaptive RSI is adaptive because the indicator extracts the dominant cycle using the autocorrelation periodogram. At any given point in time [i], half the dominant cycle period is used to calibrate the lookback length of the RSI.

I want to point out a few stark differences with the strategy back test code below. First, the code is set up to BUY NEXT BAR AT OPEN: when a signal is triggered at the close of a bar, the trade is entered at the next bar's open price.

Secondly, rather than calculating cumulative % close-to-close returns as so many do, we specify an initial starting capital and add the $ gain to a running total, adjusting for trading commissions. This provides a result more in line with how the strategy might work in the real world.

Please see the top of the code snippet to adjust the following input variables for the strategy:

# Strategy input variables
starting_capital = 10000
comms = 10
entry_level = .3
exit_level = .8

The rest is best described by following the Julia code below. If you do not have a data source, I have made this post easy to follow by providing code to download data from the Alpha Vantage API. You just need to edit the Alpha Vantage link and add your own API key:

# Ehlers mean reversion

using HTTP
using DataFrames
using CSV
using Gadfly
using MarketCycles

# Multiple symbols
adjusted = 1  # if want adjusted closes set to 1 else non adjusted set it to 0
t = 1
tickers = ["SPY"]
for t in 1:length(tickers)  # Note it might get stuck!! if it fails.. print the t value... then restart the for loop for t in 4:length(tickers)  instead of 1:length(tickers) for example
# Using HTTP package
sleep(10)  # sleep between API calls
if adjusted == 1
res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
else
    res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
end
mycsv = readcsv(res.body)
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:]  # subset remove header row
# Rename Columns
if adjusted == 1
    colnames = ["Date","Open","High","Low","Close","Adjusted_Close","Volume","Dividend_Amount","split_coefficient"]
else
colnames = ["Date","Open","High","Low","Close","Volume"]
end
names!(x.colindex, map(parse, colnames))
# Convert String Date to Date format
x[:Date] = Date.(x[:Date],Dates.DateFormat("yyyy-mm-dd"))
# Sort Date Frame By Date
x = sort!(x, [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
if adjusted == 1
for i in 2:length(x)-2
    x[i] = convert(Array{Float64,1},x[i])
end
    x[7] = convert(Array{Int64,1},x[7])
    x[8] = convert(Array{Float64,1},x[8])
else
    for i in 2:length(x)-1
        x[i] = convert(Array{Float64,1},x[i])
    end
        x[6] = convert(Array{Int64,1},x[6])
end

if adjusted == 1
CSV.write(joinpath("CSV_OUT_ADJUSTED_"tickers[t]".csv"), x)
else
CSV.write(joinpath("CSV_OUT_"tickers[t]".csv"), x)
end
end

# Load Data
SPY = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/Indicators/CSV_OUT_ADJUSTED_SPY.csv",types=[String; fill(Float64, 5); Int;Float64],header=true)
SPY[:Date] = DateTime.(SPY[:Date],Dates.DateFormat("mm/dd/yyyy"))

# Pull arrays
Date_index = SPY[:Date]
Spy_close = SPY[:Close]
Spy_open = SPY[:Open]

# Ehlers Adaptive RSI
Adaptive_RSI = AdaptiveRSI(Spy_close)

# Strategy input variables
starting_capital = 10000.00
comms = 10
entry_level = .3
exit_level = .8

# Cross under entry level and sell when cross over exit level
entry = zeros(Adaptive_RSI)
exit = zeros(Adaptive_RSI)
for i = 2:size(Adaptive_RSI,1)
    if (Adaptive_RSI[i] <= entry_level) && (Adaptive_RSI[i-1] > entry_level)
        entry[i] = 1
    else
        entry[i] = 0
    end
    if (Adaptive_RSI[i] >= exit_level) && (Adaptive_RSI[i-1] < exit_level)
        exit[i] = 2
    else
        exit[i] = 0
    end
end

# combine entry and exit
signal = exit .+ entry


# Fill the trade between entry and exit with 1's
all_signal = zeros(signal)
for i = 2:length(signal)
    print(i)
    if signal[i] == 1.0
        all_signal[i] = 1.0
    end
    if signal[i] == 0.0 && all_signal[i-1] == 1.0
        all_signal[i] = 1
    end
    if signal[i-1] == 2.0 && signal[i] == 0.0
        all_signal[i-1] = 1.0
    end
end

# ID the start and end of the trade
trade_start = zeros(all_signal)
trade_end = zeros(all_signal)
for i = 2:size(trade_end,1)
if all_signal[i] == 1.0 && all_signal[i] != all_signal[i-1]
    trade_start[i] = 1
end
if all_signal[i-1] == 1.0 && all_signal[i-1] != all_signal[i]
    trade_end[i-1] = 2
end
end

# Calculate the % return buying at open and selling at the close
open_price = Float64[]
close_price = Float64[]
strategy_returns = zeros(size(all_signal,1))
for i = 1:size(all_signal,1)-1
    if trade_start[i] == 1
        open_price = Spy_open[i+1]
    elseif trade_end[i] == 2
        close_price = Spy_close[i]
        # place the close return on same index position as the end of the trade
    strategy_returns[i] = close_price /  open_price - 1
end
end


# Work out gain on starting investment amount
dollar_gain = zeros(size(strategy_returns,1))
dollar_gain[1] = starting_capital

i=1
for i=2:size(strategy_returns,1)
    # if the returns are 0 then use the previous $ amount
if strategy_returns[i] == 0.0
    dollar_gain[i] = dollar_gain[i-1]
elseif strategy_returns[i] != 0.0
    dollar_gain[i] = dollar_gain[i-1] + (dollar_gain[i-1] * strategy_returns[i]) - comms
end
end

# plot final result
white_panel = Theme(
	panel_fill="white",
    default_color="blue",
    background_color="white"
)
p = plot(x=Date_index,y=dollar_gain,Geom.line,Guide.title("Ehlers Adaptive RSI"),white_panel,Scale.y_continuous(format=:plain))
         draw(PNG("C:/Users/Andrew.Bannerman/Desktop/Julia/ehlers_rsi.png", 1500px, 800px), p);

We may then plot the output, which is the growth of $10,000 inclusive of the $10 per round trip commission specified above:

[Figure: Ehlers Adaptive RSI – growth of $10,000]

It should be noted that the default settings from Ehlers' book were used for the high and low pass filters as well as for the range of lags used in the autocorrelation periodogram to calculate the dominant cycle.

The function I created has many inputs that may be calibrated:

AdaptiveRSI(x::Array{Float64}; min_lag::Int64=1, max_lag::Int64=48,LPLength::Int64=10, HPLength::Int64=48, AvgLength::Int64=3)::Array{Float64}

Thus the default settings might not be well tuned to daily SPY data. One would be advised to iterate through the input variables to learn how each affects the output and to find a better tuning of the indicator.

If anyone wishes to have a more professional back test script with varying entry / exit methods that accounts for slippage and commissions, do not hesitate to contact me.

Follow me on twitter: @flare9x

Github

Monthly Investing vs Yearly Investing – Effects on Average Share Cost

I invest in a long term S&P500 momentum strategy and have been making monthly contributions. Over the short term, the average share cost seemed to be much higher than my initial cost, so I wanted to see whether buying in bulk once a year was better than investing every month over a long run of history.

The calculation to work out the average share cost is as follows:

Total $ amount invested / Total shares invested.

If we invest at each time increment:

# Invest - Avg Share Cost
# Rolling Calculation
prices <- c(12.56,13.45,11.10,10.09,12.3,16.7)
total.to.invest <- c(10000,7000,1600,6000,3000,12000)
table <- data.frame(prices,total.to.invest)
table$total.shares <- apply(table[,c('prices','total.to.invest')], 1, function(x) { round((x[2]/x[1])) } )
table$total.shares.sum <- cumsum(table$total.shares)
table$total.to.invest.sum <- cumsum(table$total.to.invest)
table$roll.cost.avg <- apply(table[,c('total.shares.sum','total.to.invest.sum')], 1, function(x) { (x[2]/x[1]) } )
head(table)
  prices total.to.invest total.shares total.shares.sum total.to.invest.sum roll.cost.avg
1  12.56           10000          796              796               10000      12.56281
2  13.45            7000          520             1316               17000      12.91793
3  11.10            1600          144             1460               18600      12.73973
4  10.09            6000          595             2055               24600      11.97080
5  12.30            3000          244             2299               27600      12.00522
6  16.70           12000          719             3018               39600      13.12127

This is on a rolling basis; we see that the average cost is higher than our initial entry, which will be the case with an up trending series.

If we invest less at the beginning and more at the end:

# Invest less at beginning, Invest more later
prices <- c(12.56,13.45,11.10,10.09,12.3,16.7)
total.to.invest <- c(100,700,1600,6000,3000,120000)
table <- data.frame(prices,total.to.invest)
table$total.shares <- apply(table[,c('prices','total.to.invest')], 1, function(x) { round((x[2]/x[1])) } )
table$total.shares.sum <- cumsum(table$total.shares)
table$total.to.invest.sum <- cumsum(table$total.to.invest)
table$roll.cost.avg <- apply(table[,c('total.shares.sum','total.to.invest.sum')], 1, function(x) { (x[2]/x[1]) } )

  prices total.to.invest total.shares total.shares.sum total.to.invest.sum roll.cost.avg
1  12.56             100            8                8                 100      12.50000
2  13.45             700           52               60                 800      13.33333
3  11.10            1600          144              204                2400      11.76471
4  10.09            6000          595              799                8400      10.51314
5  12.30            3000          244             1043               11400      10.93001
6  16.70          120000         7186             8229              131400      15.96792

As we invested a higher amount of capital later than earlier, we brought our average cost up and closer to the current price. It's a weighted average after all.

Next we simulate adding a fixed amount of capital every month and a fixed amount of capital every year.

Two investment capitals:

# Enter monthly and yearly capital investments
capital.invest.e.month <- 10000/12 # Invest 10,000 a year, split into months
bulk.invest.1.year <- 10000

We invest $10,000 per year split into twelve monthly purchases of $833.33, and compare that with investing $10,000 in bulk once a year, buying the S&P500 in both cases. Monthly orders are placed at the start of every month and yearly orders at the start of every year.

We then compute a rolling average cost. The code to do this:

# Bulk Buy 1x a year Vs investing every month
# Andrew Bannerman 1.3.2018

require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)

# Download SPX data
require(quantmod)
startdate<- "1930-01-01"
SPX <- getSymbols("^GSPC",from=startdate,auto.assign=FALSE)
SPX <-  data.frame(Date=index(SPX), coredata(SPX)) # Change XTS to data frame and retain Date column

# Add day of month column 
# This is a helper column for creating buy/sell rules
months <- SPX %>% dplyr::mutate(month = lubridate::month(Date)) %>% group_by(month) 
days <- SPX %>% dplyr::mutate(day = lubridate::day(Date)) %>% group_by(day) 
years <- SPX %>% dplyr::mutate(year = lubridate::year(Date)) %>% group_by(year) 
df <- data.frame(SPX,month=months$month,day=days$day,year=years$year)

# Subset df by date 
#df <- subset(df, Date >= as.Date("2009-01-01"))
head(df$Date)

# Enter monthly and yearly capital investments
capital.invest.e.month <- 10000/12 # Invest 10,000 a year, split into months
bulk.invest.1.year <- 10000

# Simulate buying every month
output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(month)) %>%
  group_by(RunID) %>%
  mutate(ID.No = row_number()) %>%
  dplyr::mutate(total.shares.months = ifelse(ID.No == 1,first(capital.invest.e.month) / first(GSPC.Adjusted),0)) %>%  # Divide capital invested by the price for the number of shares
  dplyr::mutate(total.cost.months = ifelse(ID.No == 1,first(total.shares.months) * first(GSPC.Adjusted),0)) %>%  # Multiply shares bought by the price for the total cost
  ungroup() %>%
  select(-RunID)

df <- data.frame(output)
head(df$Date)
# Simulate buying 1x share start of each month
#output <- df %>%
#dplyr::mutate(RunID = data.table::rleid(month)) %>%
#group_by(RunID) %>%
#  mutate(ID.No = row_number()) %>%
#  dplyr::mutate(first.month.total.cost = ifelse(ID.No == 1,first(GSPC.Adjusted) * 1,0)) %>% # Own 1x share at close price change 1 to 2 for more..
#  dplyr::mutate(total.shares = ifelse(ID.No == 1,first(first.month.total.cost) / first(GSPC.Adjusted),0)) %>%  # Divide total purchased by cost price for total share
#    ungroup() %>%
#  select(-RunID)

# Simulate bulk investing 1x a year
output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(year)) %>%
  group_by(RunID) %>%
  mutate(ID.No = row_number()) %>%
  dplyr::mutate(total.shares.years = ifelse(ID.No == 1,round(first(bulk.invest.1.year) / first(GSPC.Adjusted)),0)) %>%  # Divide capital invested by the price for the number of shares
  dplyr::mutate(total.cost.years = ifelse(ID.No == 1,first(total.shares.years) * first(GSPC.Adjusted),0)) %>%  # Multiply shares bought by the price for the total cost
  ungroup() %>%
  select(-RunID)
# output data frame
df <- data.frame(output)

# Calculate average cost per share 
# sum first.month.total cost / sum of total shares bought
month.invest.avg.cost <- sum(df$total.cost.months) / sum(df$total.shares.months)
year.invest.avg.cost <- sum(df$total.cost.years) / sum(df$total.shares.years)
find.first.price <- head(df$GSPC.Adjusted,1)
find.last.price <- tail(df$GSPC.Adjusted,1)

# Subset for month avg cost
# index
df$index <- seq(1:nrow(df))
df.month <- subset(df,total.shares.months >0)
# totals 
df.month$total.shares.months.sum <- cumsum(df.month$total.shares.months)
df.month$total.cost.months.sum <- cumsum(df.month$total.cost.months)
df.month$month.roll.avg.cost <- apply(df.month[,c('total.cost.months.sum','total.shares.months.sum')], 1, function(x) { (x[1]/x[2]) } )
head(df.month$Date)
# Join original df 
df.join.month <- full_join(df, df.month, by = c("Date" = "Date"))
df.join.month$month.roll.avg.cost <- na.locf(df.join.month$month.roll.avg.cost)
head(df.join.month$Date)

# Subset for year avg year cost
df.year <- subset(df,total.shares.years >0)
# totals 
df.year$year.total.shares.years.sum <- cumsum(df.year$total.shares.years)
df.year$year.total.cost.years.sum <- cumsum(df.year$total.cost.years)
df.year$year.roll.avg.cost <- apply(df.year[,c('year.total.cost.years.sum','year.total.shares.years.sum')], 1, function(x) { (x[1]/x[2]) } )

# Join original df 
df.join.year <- full_join(df, df.year, by = c("Date" = "Date"))
df.join.year$year.roll.avg.cost <- na.locf(df.join.year$year.roll.avg.cost)
# Plot 
plot.df  <- data.frame("Date" = df.join.month$Date, "Rolling Average Cost Monthly" = df.join.month$month.roll.avg.cost,"Rolling Average Cost Yearly" = df.join.year$year.roll.avg.cost, "SPX Adjusted Close" = df$GSPC.Adjusted)
# Melt for plotting
plot.df <- melt(plot.df, id.vars="Date")
ggplot(plot.df, aes(x=Date, y=value, fill=variable)) +
  geom_bar(stat='identity', position='dodge')+  
  ggtitle("Average Share Cost - Investing Monthly Vs 1x Bulk Investing Each Year",subtitle="SPX 1950 To Present")+
  labs(x="Date",y="SPX Close Price")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

And the output vs SPX price:

[Figure: Average Share Cost – Invest start of every month and start of every year – 1950 To Present]

From 1950 to present we see we lock in a better average cost before the dot-com uptrend.

[Figure: Average Share Cost – Invest start of every month and start of every year – 2000 To Present]

If we had started investing at the top of the dot-com bubble, we would see the effects of averaging down. Both bear markets are somewhat averaged out, so to speak. Investing at the top of the market, buy and hold gains were stagnant for an entire decade.

[Figure: Average Share Cost – Invest start of every month and start of every year – 2009 To Present]

There is not much variation between electing to invest every month and investing every year; the difference is negligible over the long haul. It is, however, notable that investing incrementally affects the total % return. If we bulk invest 10,000 at one price and it rises 10%, we have +$1,000. If we dollar cost average, we invest 20,000 and dilute the initial cost, so we have a 5% gain instead of a 10% gain. However, a 5% gain on 20,000 is still 1,000.
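A quick sketch of that arithmetic (hypothetical prices; assuming the second purchase happens after the 10% rise):

# Hypothetical example: bulk investing vs adding a second purchase after a 10% rise
p0 <- 100; p1 <- p0 * 1.10              # price rises 10%

# Bulk invest 10,000 at p0
bulk.shares <- 10000 / p0
bulk.gain <- bulk.shares * p1 - 10000   # +1,000, i.e. 10% on 10,000

# Dollar cost average: 10,000 at p0 and another 10,000 at p1
dca.shares <- 10000 / p0 + 10000 / p1
dca.avg.cost <- 20000 / dca.shares      # average cost is pulled up toward p1
dca.gain <- dca.shares * p1 - 20000     # +1,000, i.e. 5% on 20,000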

Raising the cost per share as price trends higher does affect the amount of cushion one has, as in this example, which conditionally invests in the momentum strategy:

[Figure: Average share cost when bulk investing at a new leg plus fixed monthly / yearly contributions]

This is bulk investing $50,000 at a new leg and then adding a fixed amount every month / year. We see our average cost chases the current price. One way to avoid this is to bulk invest at the start of each new leg without adding any new funds. Only large contributions are made when a new momentum trade is entered, which in this case is every other year for my S&P500 strategy. More to come.

Thanks for reading!

Full code can be found on my github.

Pairs Trading – Testing For Cointegration, ADF (Unit Root), Johansen Test & Half Life of Mean Reversion

Note: I back test this 100% incorrectly. I also wish to add scaling of the independent variable per the hedge ratio correctly; I am leaning towards working in share prices and will update accordingly. What is correct is all the statistical testing; the back test itself, however, is 100% incorrect. I also run the Johansen test on the stationary series obtained from the hedge ratio through linear regression, which is an unnecessary step, as we can use the hedge ratio from linear regression to scale the independent variable position sizes. I will get to the revisions when I can.

A statistical arbitrage opportunity exists between a cointegrated pair when the price relationship between the pair moves out of equilibrium. This short term disequilibrium creates a trading opportunity where one enters a short position in one series and an equal weighted long position in the other series. The aim is that the two series will eventually converge and the trade is closed out with a profit.

Essentially, between a cointegrated pair the spread is mean reverting. The time to mean reversion is one factor to consider: how long will it take for the period of disequilibrium to revert to the mean, and will it ever revert to the mean? Cointegration can and does break down. So one question to answer would be: how do I know if the trade will converge, and when is it time to exit with a loss?

The half life of mean reversion, which is calculated on the spread, can help answer this question. If a pair of stocks A and B has historically mean reverted within 'n' days and this threshold is exceeded by a long margin, OR the trade meets a pre determined stop loss %, OR a stop loss measured in standard deviations, the trade can be closed out with a loss.

A pair of stocks can have a linear trend; however, we can form a stationary series by estimating the hedge ratio with ordinary least squares regression. In practice we run two OLS regressions: one with Stock A as the independent variable and one with Stock B as the independent variable. We then pick the regression with the most significant t-statistic.
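As a rough sketch of that selection step (this is only an outline; it assumes a data frame df with EWA_Close and EWC_Close columns, as built in the full script at the end of this post), we can compare the ADF test statistic of the residuals from each regression direction and keep the more negative one:

library(urca)

# Hedge ratio with EWA as the independent variable
beta.a <- coef(lm(EWC_Close ~ EWA_Close + 0, data = df))[1]
resid.a <- df$EWC_Close - beta.a * df$EWA_Close

# Hedge ratio with EWC as the independent variable
beta.b <- coef(lm(EWA_Close ~ EWC_Close + 0, data = df))[1]
resid.b <- df$EWA_Close - beta.b * df$EWC_Close

# ADF test statistic for each spread - keep the more negative (more significant) one
adf.a <- ur.df(resid.a, type = "drift", lags = 1)@teststat[1]  # tau2 statistic
adf.b <- ur.df(resid.b, type = "drift", lags = 1)@teststat[1]  # tau2 statistic
spread <- if (adf.a <= adf.b) resid.a else resid.b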

First let's get a feel for the data and plot the closing prices of EWA and EWC. I want to use these two series and this time period to see if we can replicate Ernie Chan's study in his book Algorithmic Trading: Winning Strategies and Their Rationale.

[Figure: EWA and EWC closing prices, 2006-04-26 to 2012-04-09]
EWA and EWC visually appear to have a high correlation. The Pearson correlation coefficient is 0.9564724. If two series are correlated it essentially means they move in the same direction most of the time. However, it tells you nothing about the nature of the spread between the two series and whether they will converge (mean revert) or continue to diverge. With cointegration, we form a linear relationship between Stock A and Stock B where the distance between A and B is somewhat fixed and would be estimated to mean revert. In essence, we measure the 'distance' between A and B in terms of standard deviations, and a trade would be taken when 'n' standard deviations is met. A stop loss may also be placed outside the normal sigma range, ensuring the trade is exited if the relationship breaks down.

A scatter plot shows the variation around the ols regression fit:

# Scatter plot of price series
# plot x,y
plot(df$EWA_Close, df$EWC_Close, main="Scatterplot EWA and EWC With OLS Regression Fit",
     xlab="EWA Close", ylab="EWC Close ", pch=19)
abline(lm(df$EWC_Close~df$EWA_Close), col="red") # regression line (y~x)

[Figure: Scatterplot of EWA and EWC with OLS regression fit]

We perform ordinary least squares regression using EWA as the independent variable (x) and EWC as the dependent variable (y). Note we set the intercept to 0 so that the regression fit provides only the hedge ratio between EWA and EWC.


# Ordinary least squares regression
# EWA independant
ols <- lm(df$EWC_Close~df$EWA_Close+0) # regression line (y~x)
summary(ols)
beta <- coef(ols)[1]

resid <- as.data.frame(df$EWC_Close - beta * df$EWA_Close)
resid <- data.frame(resid,"Date" = df$Date)
colnames(resid)[1] <- "residuals"
plot(resid$Date,resid$residual, col = 'navyblue', xlab = 'Date', ylab = 'residuals', type="l", main=" EWA,EWC Residuals - EWA Independant Variable - No Intercept",cex.main=1)

[Figure: EWA, EWC residuals – EWA as independent variable – no intercept]

In the above, we used ordinary least squares regression to estimate the hedge ratio (beta), and we form the spread by subtracting beta * EWA close from the EWC close. Note we use the closing prices and not the daily returns. By using the closing prices we form a hedge ratio that provides the number of shares to go long/short for each ETF.

The augmented Dickey-Fuller test is used to determine whether the spread is non-stationary or stationary (trending or mean reverting). The purpose of the test is to reject the null hypothesis of a unit root; a unit root means we have a non-stationary series. We want a stationary series so that we may trade its mean reversion efficiently.


# urca ADF test
library(urca)
  summary(ur.df(resid$residual, type = "drift", lags = 1))

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
############################################### 

Test regression drift 

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-1.17542 -0.17346 -0.00447  0.16173  1.27800 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0006657  0.0072252   0.092 0.926599
z.lag.1     -0.0176743  0.0053087  -3.329 0.000892 ***
z.diff.lag  -0.1751998  0.0254683  -6.879 8.82e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2781 on 1495 degrees of freedom
Multiple R-squared:  0.04101,	Adjusted R-squared:  0.03973
F-statistic: 31.97 on 2 and 1495 DF,  p-value: 2.548e-14

Value of test-statistic is: -3.3293 5.5752 

Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1  6.43  4.59  3.78

The ADF test statistic is -3.3293, which is more negative than the 5% critical value of -2.86. We may therefore reject the unit root (non-stationarity) hypothesis with 95% confidence: the residual series is stationary. If we flip the independent variable from EWA to EWC and re-run OLS and the augmented Dickey-Fuller test, we obtain a test statistic of -3.3194, which is also more negative than the 5% critical value of -2.86, again implying a stationary series at the 95% level. In this case we choose the more negative / more statistically significant test statistic, which is with EWA as the independent variable.

Next, to determine the number of shares we need to purchase / short, we may use the Johansen test. The eigenvectors determine the weights for each ETF.

# Johansen test
# Eigen vectors are the number of shares
# Test is cointegration test
coRes=ca.jo(data.frame(df$EWA_Close, df$EWC_Close),type="trace",K=2,ecdet="none", spec="longrun")
summary(coRes)
slot(coRes,"V") # The matrix of eigenvectors, normalised with respect to the first variable.

######################
# Johansen-Procedure #
###################### 

Test type: trace statistic , with linear trend 

Eigenvalues (lambda):
[1] 0.010146673 0.002667727

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  4.00  6.50  8.18 11.65
r = 0  | 19.28 15.66 17.95 23.52

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                df.EWA_Close.l2 df.EWC_Close.l2
df.EWA_Close.l2       1.0000000       1.0000000
df.EWC_Close.l2      -0.7772904       0.6080793

Weights W:
(This is the loading matrix)

               df.EWA_Close.l2 df.EWC_Close.l2
df.EWA_Close.d     0.004309716    -0.003029255
df.EWC_Close.d     0.028971878    -0.002649008

The eigenvectors form the weights for each ETF. Simply put, if we allocate $10,000 to each leg, we buy 10,000 * 1 / EWA close shares of EWA, and for the short side we size 10,000 * -0.7772904 / EWC close shares of EWC. Weighted this way, the two positions form a stationary portfolio.
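A minimal sketch of that sizing, using the eigenvector weights printed above and assuming $10,000 allocated per leg (the entry prices here are placeholders):

# First cointegrating relation from the Johansen output above
w.ewa <- 1.0000000
w.ewc <- -0.7772904

capital.per.leg <- 10000
ewa.price <- 20    # placeholder entry price for EWA
ewc.price <- 25    # placeholder entry price for EWC

shares.ewa <- capital.per.leg * w.ewa / ewa.price   # positive = long EWA
shares.ewc <- capital.per.leg * w.ewc / ewc.price   # negative = short EWC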

In order to determine how long the cointegrated relationship takes to revert to the mean, we regress the one day changes of the residuals on the previous day's residual minus its mean; the half life of mean reversion is then -log(2) divided by the regression slope.

 

# Calculate half life of mean reversion (residuals)
# Calculate yt-1 and (yt-1-yt)
# pull residuals to a vector
residuals <- c(resid$residual)
y.lag <- c(residuals[2:length(residuals)], 0) # Set vector to lag -1 day
y.lag <- y.lag[1:length(y.lag)-1] # As shifted vector by -1, remove anomalous element at end of vector
residuals <- residuals[1:length(residuals)-1] # Make vector same length as vector y.lag
y.diff <- residuals - y.lag # Subtract todays close from yesterdays close
y.diff <- y.diff [1:length(y.diff)-1] # Make vector same length as vector y.lag
prev.y.mean <- y.lag - mean(y.lag) # Subtract yesterdays close from the mean of lagged differences
prev.y.mean <- prev.y.mean [1:length(prev.y.mean )-1] # Make vector same length as vector y.lag
final.df <- data.frame(y.diff,prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result = lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 = -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

The half life of mean reversion is 31.62271 days.

We will now form a trading strategy and calculate a rolling z-score over the residuals with a look back set equal to the obtained half life.

# Make z-score of residuals
# Set look back equal to half life
colnames(resid)
resid$mean <- SMA(resid[,"residuals"], round(half_life))
resid$stdev <- runSD(resid[,"residuals"], n = round(half_life), sample = TRUE, cumulative = FALSE)
resid$zscore <- apply(resid[,c('residuals', 'mean','stdev')], 1, function(x) { (x[1]-x[2]) / x[3] })

# Convert all NA to 0
resid[is.na(resid)] <- 0

# Place residuals data frame to original (df)
df <- data.frame(df,resid)

# Plot z-score of residuals
plot(resid$zscore,type="l", main="Rolling Z-Score - Look Back Set Equal To Half Life")

[Figure: Rolling z-score of residuals – look back set equal to the half life]

Note the range of standard deviations for the spread is between -3 and 3.

We can now perform a back test to see how well it would have done. For the sake of simplicity we will not optimize on an in-sample period and test out of sample. We will also not consider the share amounts implied by the weightings, or commissions.

###################
# Back Test Script
###################

# Create Trading Logic
# When z-score crossed below 0 buy EWA and sell short EWC per eigenvectors
df$enter.long <- ifelse(df$zscore <= 0, 1,0)   # for long EWA below mean
df$exit.long <- ifelse(df$zscore == 0, 1,0)    # for long EWA below mean
df$enter.short <- ifelse(df$zscore <= 0 , -1,0) # for short EWC below mean
df$exit.short <- ifelse(df$zscore == 0 , -1,0)  # for short EWC below mean

df$enter.short.1 <- ifelse(df$zscore >= 0, -1,0)   # for short EWA above mean
df$exit.short.1 <- ifelse(df$zscore == 0, -1,0)    # for short EWA above mean
df$enter.long.1 <- ifelse(df$zscore >= 0 , 1,0) # for long EWC above mean
df$exit.long.1 <- ifelse(df$zscore == 0 , 1,0)  # for long EWC above mean

# Calculate close to close returns
df$ewa.clret <- ROC(df$EWA_Close, type = c("discrete"))
df$ewa.clret[1] <- 0
df$ewc.clret <- ROC(df$EWC_Close, type = c("discrete"))
df$ewc.clret[1] <- 0

# Long EWA below mean
df <- df %>%
  dplyr::mutate(ewa.long = ifelse(enter.long ==  1, 1,
                                  ifelse(exit.long == 1, 0, 0)))

# Short EWC above mean
df <- df %>%
  dplyr::mutate(ewc.short = ifelse(enter.short ==  -1, 1,
                                  ifelse(exit.short == -1, 0, 0)))

# Short EWA above mean
df <- df %>%
  dplyr::mutate(ewa.short = ifelse(enter.short.1 ==  -1, 1,
                                  ifelse(exit.short.1 == 1, 0, 0)))

# Long EWC above mean
df <- df %>%
  dplyr::mutate(ewc.long = ifelse(enter.long.1 ==  1, 1,
                                   ifelse(exit.long.1 == -1, 0, 0)))

# lag signal by one forward day to signal entry next day
df$ewa.long <- lag(df$ewa.long,1) # Note k=1 implies a move *forward*
df$ewc.short <- lag(df$ewc.short,1) # Note k=1 implies a move *forward*
df$ewa.short <- lag(df$ewa.short,1) # Note k=1 implies a move *forward*
df$ewc.long <- lag(df$ewc.long,1) # Note k=1 implies a move *forward*

# Calculate equity curves
df$equity.long <- apply(df[,c('ewa.long', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short <- apply(df[,c('ewc.short', 'ewc.clret')], 1, function(x) { x[1]*x[2]})
df$equity.long.1 <- apply(df[,c('ewa.short', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short.1 <- apply(df[,c('ewc.long', 'ewc.clret')], 1, function(x) { x[1]*x[2]})

# Combine signals
df$combined <- apply(df[,c('equity.long', 'equity.short','equity.long.1','equity.short.1')], 1, function(x) { x[1]+x[2]+x[3]+x[4] })

# Pull select columns from data frame to make XTS whilst retaining format
xts1 = xts(df$equity.long, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts2 = xts(df$equity.short, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts3 = xts(df$equity.long.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts4 = xts(df$equity.short.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts5 = xts(df$combined, order.by=as.POSIXct(df$Date, format="%Y-%m-%d")) 

# Join XTS together
compare <- cbind(xts1,xts2,xts3,xts4,xts5)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("EWA Long","EWC Short","EWA Short","EWC Long","Combined")
charts.PerformanceSummary(compare,main="EWC,EWA Pair", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

[Figure: EWC, EWA pair – performance summary]

One thing to note is that for the whole sample period, from April 26th 2006 to April 9th 2012, we use the exact same hedge ratio to form the stationary series. This is likely not the correct procedure; we also use the full sample to estimate the half life of mean reversion. As we expect the relationship between series A and B to change over time, I believe it would be better practice to use a fixed rolling window to estimate the hedge ratio and mean reversion half life. When a new signal is generated, it would use the hedge ratio / half life derived from the rolling look back period.
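A rough sketch of a rolling hedge ratio (this is not part of the back test above; it assumes the df data frame with EWA_Close and EWC_Close columns and uses an arbitrary 250 day window that would itself need testing):

library(zoo)

window <- 250  # arbitrary look back period

# Rolling hedge ratio: OLS slope of EWC on EWA over the trailing window
rolling.beta <- rollapplyr(seq_len(nrow(df)), width = window, fill = NA, FUN = function(idx) {
  coef(lm(EWC_Close ~ EWA_Close + 0, data = df[idx, ]))[1]
})

# Rolling spread using only the hedge ratio known at each point in time
rolling.spread <- df$EWC_Close - rolling.beta * df$EWA_Close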

It should also be noted that we do not quite match Ernie Chan's test from his book. I am using Alpha Vantage as the source of price data and there is a difference in price between Ernie's EWA data and the Alpha Vantage EWA data. There are also differences between the urca Johansen test and the one Ernie uses in the jplv7 package. I will attempt to address these issues in upcoming posts.

This serves as a good introduction for me and others to the topic of pairs trading.

 

Full R code below. 

# Test for cointegration
# Download daily price data from Alpha Vantage API
# Plot two series
# Perform Linear Regression
# Plot Residuals
# ADF test for unit root
# Half life of mean reversion
# zscore look back equal to half life
# back test long stock A, short Stock B
# Andrew Bannerman 11.1.2017

require(alphavantager)
require(urca)
require(lattice)
require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Download price series from Alpha Vantage
# install.packages('alphavantager')
# Key
#https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=EWA&apikey=your_api_key&outputsize=full&datatype=csv
av_api_key("your_api_key")
print(av_api_key())
#EWC <- av_get(symbol = "GLD", av_fun = "TIME_SERIES_DAILY_ADJUSTED", outputsize = "full")
#EWA <- av_get(symbol = "SLV", av_fun = "TIME_SERIES_DAILY_ADJUSTED", outputsize = "full")
#bit coin EWC <- av_get(symbol = "BTC", av_fun = "DIGITAL_CURRENCY_DAILY", outputsize = "full", market="CNY")

EWA <- read.csv("D:/R Projects/Final Scripts/Cointegration Scripts/data/daily_adjusted_EWA.csv", header=TRUE,stringsAsFactors=FALSE)
EWC <- read.csv("D:/R Projects/Final Scripts/Cointegration Scripts/data/daily_adjusted_EWC.csv", header=TRUE,stringsAsFactors=FALSE)
str(EWA)
# Convert timestamp to Date format
EWA$timestamp <- ymd(EWA$timestamp)
EWC$timestamp <- ymd(EWC$timestamp)

# Sort
EWA <- arrange(EWA,timestamp)
EWC <- arrange(EWC,timestamp)

# Merge Data Frames By date
df <- merge(EWA,EWC, by='timestamp')
head(df)
# Rename Columns
colnames(df)[1] <- "Date"
colnames(df)[6] <- "EWA_Close"
colnames(df)[14] <- "EWC_Close"

# Subset by date
df <- subset(df, Date >= as.POSIXct("2006-04-26") & Date <= as.POSIXct("2012-04-09"))  

# Scatter plot of price series
# plot x,y
plot(df$EWA_Close, df$EWC_Close, main="Scatterplot EWA and EWC With OLS Regression Fit",
     xlab="EWA Close", ylab="EWC Close ", pch=19)
abline(lm(df$EWC_Close~df$EWA_Close), col="red") # regression line (y~x)

# Line plot of series
require(ggplot2)
require(scales)

ggplot(df, aes(Date)) +
  theme_classic()+
  geom_line(aes(y=EWA_Close), colour="red") +
  geom_line(aes(y=EWC_Close), colour="blue") +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
  ggtitle("EWC, EWC Close", subtitle = "2006-04-26  to 2012-04-09") +
  labs(x="Year",y="EWA,EWC Close")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  annotate("text", label = "EWA", x = as.Date("2006-04-26"), y = 22, color = "blue")+
  annotate("text", label = "EWC", x = as.Date("2006-04-26"), y = 15, color = "red")

# Find corrlation coefficient
# method = pearson, spearman, kendall
EWA.EWC.cor <- cor(df$EWA_Close, df$EWC_Close, use="complete.obs", method="pearson") 

# Ordinary least squares regression
# EWA independant
ols <- lm(df$EWC_Close~df$EWA_Close+0) # regression line (y~x)
summary(ols)
beta <- coef(ols)[1]

resid <- as.data.frame(df$EWC_Close - beta * df$EWA_Close)
resid <- data.frame(resid,"Date" = df$Date)
colnames(resid)[1] <- "residuals"
plot(resid$Date,resid$residual, col = 'navyblue', xlab = 'Date', ylab = 'residuals', type="l", main=" EWA,EWC Residuals - EWA Independant Variable - No Intercept",cex.main=1)

# urca ADF test
library(urca)
  summary(ur.df(resid$residual, type = "drift", lags = 1))

# Johansen test
# Eigen vectors are the number of shares
# Test is cointegration test
coRes=ca.jo(data.frame(df$EWA_Close, df$EWC_Close),type="trace",K=2,ecdet="none", spec="longrun")
summary(coRes)
slot(coRes,"V") # The matrix of eigenvectors, normalised with respect to the first variable.
slot(coRes,"Vorg") # The matrix of eigenvectors, such that \hat V'S_{kk}\hat V = I.

# Calculate half life of mean reversion (residuals)
# Calculate yt-1 and (yt-1-yt)
# pull residuals to a vector
residuals <- c(resid$residual)
y.lag <- c(residuals[2:length(residuals)], 0) # Set vector to lag -1 day
y.lag <- y.lag[1:length(y.lag)-1] # As shifted vector by -1, remove anomalous element at end of vector
residuals <- residuals[1:length(residuals)-1] # Make vector same length as vector y.lag
y.diff <- residuals - y.lag # Subtract todays close from yesterdays close
y.diff <- y.diff [1:length(y.diff)-1] # Make vector same length as vector y.lag
prev.y.mean <- y.lag - mean(y.lag) # Subtract yesterdays close from the mean of lagged differences
prev.y.mean <- prev.y.mean [1:length(prev.y.mean )-1] # Make vector same length as vector y.lag
final.df <- data.frame(y.diff,prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result = lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 = -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

# Make z-score of residuals
# Set look back equal to half life
colnames(resid)
half_life = 31
resid$mean <- SMA(resid[,"residuals"], round(half_life))
resid$stdev <- runSD(resid[,"residuals"], n = round(half_life), sample = TRUE, cumulative = FALSE)
resid$zscore <- apply(resid[,c('residuals', 'mean','stdev')], 1, function(x) { (x[1]-x[2]) / x[3] })

# Convert all NA to 0
resid[is.na(resid)] <- 0

# Place residuals data frame to original (df)
df <- data.frame(df,resid)

# Plot z-score of residuals
plot(resid$zscore,type="l", main="Rolling Z-Score - Look Back Set Equal To Half Life", col="darkblue")

###################
# Back Test Script
###################

# Create Trading Logic
# When z-score crossed below 0 buy EWA and sell short EWC per eigenvectors
df$enter.long <- ifelse(df$zscore <= -0.7, 1,0)   # for long EWA below mean
df$exit.long <- ifelse(df$zscore == 0, 1,0)    # for long EWA below mean
df$enter.short <- ifelse(df$zscore <= -0.7 , -1,0) # for short EWC below mean
df$exit.short <- ifelse(df$zscore == 0 , -1,0)  # for short EWC below mean

df$enter.short.1 <- ifelse(df$zscore >= 0.7, -1,0)   # for short EWA above mean
df$exit.short.1 <- ifelse(df$zscore == 0, -1,0)    # for short EWA above mean
df$enter.long.1 <- ifelse(df$zscore >= 0.7 , 1,0) # for long EWC above mean
df$exit.long.1 <- ifelse(df$zscore == 0 , 1,0)  # for long EWC above mean

# Calculate close to close returns
df$ewa.clret <- ROC(df$EWA_Close, type = c("discrete"))
df$ewa.clret[1] <- 0
df$ewc.clret <- ROC(df$EWC_Close, type = c("discrete"))
df$ewc.clret[1] <- 0

# Long EWA below mean
df <- df %>%
  dplyr::mutate(ewa.long = ifelse(enter.long ==  1, 1,
                                  ifelse(exit.long == 1, 0, 0)))

# Short EWC above mean
df <- df %>%
  dplyr::mutate(ewc.short = ifelse(enter.short ==  -1, 1,
                                  ifelse(exit.short == -1, 0, 0)))

# Short EWA above mean
df <- df %>%
  dplyr::mutate(ewa.short = ifelse(enter.short.1 ==  -1, 1,
                                  ifelse(exit.short.1 == 1, 0, 0)))

# Long EWC above mean
df <- df %>%
  dplyr::mutate(ewc.long = ifelse(enter.long.1 ==  1, 1,
                                   ifelse(exit.long.1 == -1, 0, 0)))

# lag signal by one forward day to signal entry next day
df$ewa.long <- lag(df$ewa.long,1) # Note k=1 implies a move *forward*
df$ewc.short <- lag(df$ewc.short,1) # Note k=1 implies a move *forward*
df$ewa.short <- lag(df$ewa.short,1) # Note k=1 implies a move *forward*
df$ewc.long <- lag(df$ewc.long,1) # Note k=1 implies a move *forward*

# Calculate equity curves
df$equity.long <- apply(df[,c('ewa.long', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short <- apply(df[,c('ewc.short', 'ewc.clret')], 1, function(x) { x[1]*x[2]})
df$equity.long.1 <- apply(df[,c('ewa.short', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short.1 <- apply(df[,c('ewc.long', 'ewc.clret')], 1, function(x) { x[1]*x[2]})

tail(df,500)

# Combine signals
df$combined <- apply(df[,c('equity.long', 'equity.short','equity.long.1','equity.short.1')], 1, function(x) { x[1]+x[2]+x[3]+x[4] })

# Pull select columns from data frame to make XTS whilst retaining format
xts1 = xts(df$equity.long, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts2 = xts(df$equity.short, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts3 = xts(df$equity.long.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts4 = xts(df$equity.short.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts5 = xts(df$combined, order.by=as.POSIXct(df$Date, format="%Y-%m-%d")) 

# Join XTS together
compare <- cbind(xts1,xts2,xts3,xts4,xts5)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("EWA Long","EWC Short","EWA Short","EWC Long","Combined")
charts.PerformanceSummary(compare,main="EWC,EWA Pair", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

Strategy – S&P500 Seasonal

In a previous post, S&P500 Seasonal Study + Other Commodities (https://flare9xblog.wordpress.com/2017/10/12/seasonal-study/), it was determined that the best months to be long the S&P500 are from October through to the end of May.

As a refresher:

[Figure: S&P500 Mean Daily Spread Per Month]

We back test this theme to see how it performs over monthly bars from 1960 to present day (10.12.2017):

[Figure: Seasonal strategy equity curve and results – long October open to May close, 1960 to present]

What we see is that we are invested 67% of the time, being long from the open of the October bar and selling at the close of May. We do beat the S&P500, and I believe this is partly due to drawing down less in the post-1999 bear market. Our maximum draw down, at 43%, is slightly less than that of buy and hold.

What stands out is the average gain during this October to May holding period: the average trade is 14.3%.

As a relative benchmark, let's see what it looks like only buying the May open and selling at the close of September:

[Figure: Equity curve and results – long May open to close of September]

> Return.cumulative(xts1, geometric = TRUE)
                        [,1]
Cumulative Return -0.1973985

We see the cumulative return for buying the May open and selling the September close is -0.1973985; $1 invested shrinks to roughly $0.80. Essentially May to September is negative to flat. The cumulative returns do not even warrant shorting May to September.

In closing, we see that the majority of the S&P500 gains seem to be attributable to seasonal trends, specifically buying in October and selling in May.

Full back test R Code for the above


# S&P500 Seasonal Back Test 
# Month Bars
# Buy Open of October, Sell End of April (Based on seasonal study)
# Andrew Bannerman 10.12.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "C:/R Projects/Final Scripts/Seasonal Strategies/Data"
data.read.spx <- paste(data.dir,"$SPX.Monthly.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Make dataframe 
new.df <- data.frame(read.spx)

# Convert Values To Numeric 
cols <-c(3:8)
new.df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date Column [1]
new.df$Date <- ymd(new.df$Date)

# Add day of month column 
# This is a helper column for creating buy/sell rules
group <- new.df %>% dplyr::mutate(month = lubridate::month(Date)) %>% group_by(month) 
new.df <- data.frame(new.df,group$month)

# Calculate Returns from open to close 
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Subset by Date
new.df <- subset(new.df, Date >= as.POSIXct("1960-01-01") )  

# Enter Long Rules 
# Buy October To End of April
x = new.df$group.month
# October to May Hold Period Rules
#new.df$enter <- ifelse(x == 1,1,0 | ifelse(x == 2,1,0 | ifelse(x == 3,1,0 | ifelse(x == 4,1,0 | ifelse(x==5,1,0 | ifelse(x == 10,1,0 | ifelse(x == 11,1,0 | ifelse(x == 12,1,0))))))))
#May to September Hold Period Rules
new.df$enter <- ifelse(x == 5,1,0 | ifelse(x == 6,1,0 | ifelse(x == 7,1,0 | ifelse(x == 8,1,0 | ifelse(x==9,1,0)))))

# Calculate equity curves
# Long
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(enter)) %>%
  group_by(RunID) %>%
  dplyr::mutate(seasonal.equity = ifelse(enter == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(new.df$seasonal.equity, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d")) 
xts2 = xts(new.df$clret, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d")) 
str(new.df)

# Join XTS together 
compare <- cbind(xts1,xts2)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("Seasonal","Buy And Hold")
charts.PerformanceSummary(compare,main="Long May Open To Close of September", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

# Find net trade result of multiple 'n' day trades
# Find start day of trade, find end day, perform (last(Close) - first(Open))/first(Open) % calculation
new.df <- new.df %>%
  dplyr::mutate(RunID = data.table::rleid(enter)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(enter == 0, 0,
                                     ifelse(row_number() == n(),
                                            (last(Close) - first(Open))/first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# Win / Loss % 
# All Holding Months
winning.trades <- sum(new.df$seasonal.equity > '0', na.rm=TRUE)
losing.trades <- sum(new.df$seasonal.equity < '0', na.rm=TRUE)
even.trades <- sum(new.df$seasonal.equity == '0', na.rm=TRUE)
total.days <- NROW(new.df$seasonal.equity)

# Multi Month Trades
multi.winning.trades <- sum(new.df$perc.output > '0', na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < '0', na.rm=TRUE)
multi.total.days <- multi.winning.trades+multi.losing.trades

# % Time Invested
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades

# Calcualte win loss %
# All months
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Month Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss 
# All Months
remove.zero <- new.df[-which(new.df$seasonal.equity == 0 ), ] # removing rows 0 values 
consec.wins <- max(rle(sign(remove.zero$seasonal.equity))[[1]][rle(sign(remove.zero$seasonal.equity))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$seasonal.equity))[[1]][rle(sign(remove.zero$seasonal.equity))[[2]] == -1])
consec.wins

# Multi Month Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values 
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# All Months
average.trade <- mean(new.df$seasonal.equity)
average.win <- mean(new.df$seasonal.equity[new.df$seasonal.equity >0])
average.loss <- mean(new.df$seasonal.equity[new.df$seasonal.equity <0])
median.win <- median(new.df$seasonal.equity[new.df$seasonal.equity >0])
median.loss <- median(new.df$seasonal.equity[new.df$seasonal.equity <0])
max.gain <- max(new.df$seasonal.equity)
max.loss <- min(new.df$seasonal.equity)
win.loss.ratio <- winning.trades / abs(losing.trades)
summary <- cbind(winning.trades,losing.trades,even.trades,total.days,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Even Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Month Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output >0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output <0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output >0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output <0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.total.days,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
Return.cumulative(xts1, geometric = TRUE)

# Write output to file
write.csv(new.df,file="C:/R Projects/SP500.Seasonal.csv")

Momentum Strategy – Boosting Returns – VFINX / VFITX

In this post we will look at a strategy that is 100% invested in the S&P500 and, during adverse conditions, is 100% invested in an intermediate-term Treasury bond fund. The two funds used for the strategy:

Vanguard 500 Index Fund Investor Class (VFINX)
Vanguard Intermediate-Term Treasury Fund Investor Shares (VFITX)

The strategy is as follows using MONTHLY bars:
1. Long VFINX (S&P500) when the price is over the 189 day moving average or its monthly equivalent (9sma)
2. When VFINX (S&P500) is below the monthly 9sma, sell 100% of VFINX (S&P500) at the next month's open and buy VFITX (bonds)
3. When VFINX (S&P500) is above the monthly 9sma, sell 100% of VFITX (bonds) and go 100% long VFINX (S&P500)

Essentially the strategy is long the S&P500 when it is above the 9sma (daily, 189sma) and 100% in intermediate bonds when it is below the 9sma (daily, 189sma).

The color red denotes times when VFINX (S&P500) is below the monthly 9sma (daily, 189sma), which is when the strategy would be 100% long bonds.

[Figure: VFINX close – periods below the monthly 9sma shown in red]

The returns for this strategy since 1991 inception:

[Figure: VFINX / VFITX switching strategy since 1991 – performance summary and statistics]

From the backtest we see an annualized return of 12.4% with a Sharpe ratio of 1.21. Maximum draw down is 15%, which means we may explore using leverage to boost gains whilst maintaining a risk profile similar to buy and hold (VFINX).

We may boost returns by longing a 3x leveraged S&P500 ETF. This can be achieved with the ProShares UltraPro S&P500 ETF (UPRO, inception 2009). First I want to check whether we can model UPRO data prior to 2009.

[Figure: Regression fit of UPRO vs 3x S&P500 daily returns]

We see from plotting the regression fit of UPRO and 3x SP500 daily returns that the fit is not perfect. There is a high degree of correlation. However, the % return tracking is not exact.

[Figure: UPRO vs 3x S&P500 daily returns – tracking over time]

We have almost perfect tracking prior to June 2012. However, there is a notable divergence between the 3x S&P500 daily returns and UPRO.

When we model UPRO prior to 2009 by simply multiplying the S&P500 returns by 3, we know it will be largely theoretical.
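A minimal sketch of that splice (the column names here are hypothetical; spx.ret holds S&P500 daily returns and upro.ret holds UPRO daily returns, NA before its 2009 inception):

# Use real UPRO returns where available, otherwise 3x the S&P500 daily return
upro.model <- ifelse(is.na(upro.ret), 3 * spx.ret, upro.ret)
upro.curve <- cumprod(1 + upro.model)  # theoretical growth of $1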

Let's back test swapping VFINX for our theoretical UPRO prior to 2009.

[Figure: Strategy with theoretical UPRO – performance summary and statistics]

Maximum draw down is 45%, which is significantly lower than buy and hold UPRO (92%). Essentially we enjoy staggering returns with almost half of the draw down. The 189 day daily sma, or monthly 9sma, acts as a 'low pass filter' and avoids the catastrophic bear markets post 1999 and 2007.

[Figure: Cumulative returns of the theoretical 3x strategy]

We see that the cumulative return is 1,276.56, i.e. approximately 127,656%, consistent with $10,000 growing to roughly $12.7 million.

These are theoretical but staggering results.

# Get First And Last Date
last.date <- tail(df$Date, n = 1)
first.date <- head(df$Date, n = 1)
## Find Time Difference
time <- last.date - first.date
years.between <- time/352
years.between <- as.numeric(years.between, units="days")   # Extract numerical value from time difference 'Time difference of 2837.208 days'
years.between

10 thousand dollars grows to 12.7 million dollars over a time period of 26.98 years. Not bad for a maximum draw down of 45% with 3x leverage.
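As a quick sanity check on those figures:

# Cumulative return of 1,276.56 means $10,000 grows to roughly:
10000 * (1 + 1276.56)                # ~12.8 million
# Implied annualized return over 26.98 years:
(12.7e6 / 10000)^(1 / 26.98) - 1     # roughly 0.30, i.e. about 30% per year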

In closing, the 189 day daily sma, or its monthly 9sma equivalent, acts as a low pass filter. The aim of this filter is to reduce overall volatility, provide less of a draw down and avoid negative compounding periods. We see from the results that this has been achieved.

Reader Question:
What happens if we optimize the monthly bar SMA look back higher / lower than the tested 9sma?

Full back test R code:


# Long Term S&P500 - Switch To Bonds
# Andrew Bannerman 10.8.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "C:/R Projects/Final Scripts/Vanguard Long Term Strategies"
data.read.VFINX <- paste(data.dir,"VFINX.csv",sep="/")
data.read.VFITX <- paste(data.dir,"VFITX.csv",sep="/")

# Read data
read.VFINX <- read.csv(data.read.VFINX,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)
read.VFITX <- read.csv(data.read.VFITX,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)
#read.VFITX <- read.VFITX[-nrow(read.VFITX),] 

# Convert Values To Numeric
cols <-c(2:7)
read.VFINX[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
read.VFITX[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

#Convert Date Column [1]
read.VFINX$Date <- ymd(read.VFINX$Date)
read.VFITX$Date <- ymd(read.VFITX$Date)

# Merge two data sets
df <- full_join(read.VFINX, read.VFITX, by = c("Date" = "Date"))

# Rename Columns
colnames(df)[1] <-"Date"
colnames(df)[2] <-"VFINX.Open"
colnames(df)[3] <-"VFINX.High"
colnames(df)[4] <-"VFINX.Low"
colnames(df)[5] <-"VFINX.Close"
colnames(df)[6] <-"VFINX.Adj.Close"
colnames(df)[7] <-"VFINX.Volume"
colnames(df)[8] <-"VFITX.Open"
colnames(df)[9] <-"VFITX.High"
colnames(df)[10] <-"VFITX.Low"
colnames(df)[11] <-"VFITX.Close"
colnames(df)[12] <-"VFITX.Adj.Close"
colnames(df)[13] <-"VFITX.Volume"

# Convert all NA to 0
df[is.na(df)] <- 0

# Use TTR package to create rolling SMA n day moving average
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
  function(df) {
    SMA(df[,"VFINX.Adj.Close"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(df), ncol=0)

# Loop for filling it
for (i in 2:400) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:400, function(n)paste("adj.close.sma.n", n, sep=""))

# Bind to existing dataframe
df <-  cbind(df, sma.matrix)

# Convert all NA to 0
df[is.na(df)] <- 0

# Calculate Returns from open to close
#df$VFINX.ocret <- apply(df[,c('VFINX.Open', 'VFINX.Adj.Close')], 1, function(x) { (x[2]-x[1])/x[1]} )
#df$VFITX.ocret <- apply(df[,c('VFITX.Open', 'VFITX.Adj.Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
df$VFINX.clret <- ROC(df$VFINX.Adj.Close, type = c("discrete"))
df$VFITX.clret <- ROC(df$VFITX.Adj.Close, type = c("discrete"))
df$VFINX.clret[1] <- 0
df$VFITX.clret[1] <- 0

# Add leverage multiplier
#df$VFINX.clret <- df$VFINX.clret * 3

# Subset Date
df <- subset(df, Date >= as.Date("1991-10-01"))

# Name indicators #
VFINX.sma <- df$adj.close.sma.n9

# Enter buy / sell rules
df$signal.long.stocks <- ifelse(df$VFINX.Adj.Close > VFINX.sma, 1,0)
df$signal.long.bonds <- ifelse(df$VFINX.Adj.Close < VFINX.sma, 1,0)

# lag signal by one forward day to signal entry next day
df$signal.long.stocks <- dplyr::lag(df$signal.long.stocks,1) # shift signal forward one bar so entry occurs on the next bar
df$signal.long.bonds <- dplyr::lag(df$signal.long.bonds,1) # shift signal forward one bar so entry occurs on the next bar

df[is.na(df)] <- 0  # Set NA to 0

#Plot VIFNX Monthly with 9sma
plot(df$VFINX.Adj.Close, col = ifelse(df$VFINX.Adj.Close < VFINX.sma,'red','black'), pch = 10, cex=.5,ylab="VFINX Close",main="VFINX Below Monthly 9sma")

# Calculate equity curves
# Long Stocks
df <- df %>%
  dplyr::mutate(RunID = rleid(signal.long.stocks)) %>%
  group_by(RunID) %>%
  dplyr::mutate(long.stocks.equity.curve = ifelse(signal.long.stocks== 0, 0,
                                         ifelse(row_number() == 1, VFINX.clret, VFINX.clret))) %>%
  ungroup() %>%
  select(-RunID)

# Long Bonds
df <- df %>%
  dplyr::mutate(RunID = rleid(signal.long.bonds)) %>%
  group_by(RunID) %>%
  dplyr::mutate(long.bonds.equity.curve = ifelse(signal.long.bonds == 0, 0,
                                                  ifelse(row_number() == 1, VFITX.clret, VFITX.clret))) %>%
  ungroup() %>%
  select(-RunID)

# Combine Signals
df$combined.equity.curve <- df$long.stocks.equity.curve + df$long.bonds.equity.curve

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(df$long.stocks.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts2 = xts(df$long.bonds.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts3 = xts(df$combined.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts4 = xts(df$VFINX.clret, order.by=as.Date(df$Date, format="%m/%d/%Y")) 

# Join XTS together
compare <- cbind(xts3,xts2,xts1,xts4)

# Use the PerformanceAnalytics package for trade statistics

require(PerformanceAnalytics)
colnames(compare) <- c("Long.Stocks.Switch.Bonds","Long.Bonds","Long.Stocks","Buy And Hold")
charts.PerformanceSummary(compare,main="Long Stocks(VFINX) Over Monthly 9sma, Long Bonds(VFITX) Under Monthly 9sma", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)
Return.cumulative(xts3, geometric = TRUE)

Simple S&P500 Linear Strategy

From our Hurst exponent test in a previous post we determined that the S&P500 is a mean reverting series on an interday time frame. For this back test we have created a very simple linear model to ‘fit’ the nature of the series.
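
For readers who want to reproduce that kind of check, one hedged option is the hurstexp function from the pracma package (assumed here to be installed); values near 0.5 indicate a random walk, below 0.5 mean reversion, above 0.5 trending:

# Sketch of a Hurst exponent estimate (replace the simulated series with SPY daily returns)
library(pracma)
set.seed(7)
sim.ret <- rnorm(2000, 0, 0.01)   # white-noise stand-in; increments of a random walk give H near 0.5
hurstexp(sim.ret)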

We use a rolling ‘n’ day z-score and test a lookback of 11 days; for some reason the 11-day lookback provided the best results. For the sake of simplicity we will test over the full sample period and we will not include any commissions or slippage.
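
The rolling z-score itself is just the close minus a rolling mean, scaled by a rolling standard deviation. A compact sketch with TTR, using the 11-day lookback and -0.2 entry level quoted below (toy prices stand in for SPY closes):

library(TTR)
set.seed(3)
spy.close <- cumprod(1 + rnorm(500, 0.0003, 0.01)) * 100   # toy close series

n <- 11                                                    # rolling lookback
zscore <- (spy.close - SMA(spy.close, n)) / runSD(spy.close, n, cumulative = FALSE)
signal <- ifelse(zscore < -0.2, 1, 0)                      # long signal when the z-score dips below the entry level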

We will use the SPY for this back test from inception to present.

Here are the parameters:
1. Rolling lookback period of z-score = 11 days
2. Entry level = < -0.2
3. Exit Time = 4 days

As you can see the parameters are kept to a minimum. The main goal here is to fit a model to the existing anomaly rather than force fit a model to the data.

[Figure: equity curve and performance summary, mean reversion strategy versus buy and hold]

As we can see we obtain better performance than buy and hold. We have about a 21% smaller drawdown than buy and hold and roughly a 10% annualized return from 1993 to present. Notice the time invested is 48%, which means we have capital available to deploy in other strategies. To offset the drawdowns even further we could add another uncorrelated strategy, but as a base this is not a bad attempt to capture the mean reversion nature of the S&P500 on an interday basis. Note this might not be the most stellar performance, but it is a good example to illustrate the train of thought.

Some questions to ponder:

1. What happens if we ‘fit’ a mean reversion model to S&P500 daily data from 1929 to 1980?
2. What happens if we fit a momentum model to S&P500 daily data from 1929 to 1980?
3. What happens if we fit a momentum model to S&P500 daily data from 1993 to present?

Those questions, good readers, I shall leave for you to answer!

The rules of this system are:
1. When the z-score crosses below -0.2, buy the next day's open
2. After 4 days have passed, sell all; if another signal is generated while in the existing trade we will take that signal also

Notes about the rules of the backtest:
1. We calculate S&P500 close-to-close returns and also open-to-close returns. We do this in order to handle the first day of each trade correctly (at the end of a day a signal is generated; we then buy the next day's open, so the return for the first day in a trade is an open-to-close return, and from day 2 onward the returns are close-to-close returns).
2. To avoid look-ahead bias, after a signal is generated we lag the signal series +1 day forward so that we are simulating buying the NEXT day's open.
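
To make note 2 concrete, here is a tiny toy example of what the +1 lag does to the signal column: dplyr's lag shifts each value one row forward, so a signal generated at today's close is only acted on over the next bar.

library(dplyr)
toy <- data.frame(day = 1:5, signal = c(0, 1, 0, 0, 1))
toy$response <- dplyr::lag(toy$signal, 1)   # shift the signal forward one bar
toy$response[is.na(toy$response)] <- 0
toy
#   day signal response
#     1      0        0
#     2      1        0
#     3      0        1   <- position held the bar after the signal fired
#     4      0        0
#     5      1        0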

All parameters can be optimized and the entry and exit rules adjusted. Some ideas could be:

1. For the exit, exit on a specific z-score value, i.e. buy below -0.20 and sell when the z-score crosses back over 0 (see the sketch after this list)
2. For the exit, ignore repeat trades while in an existing trade
3. Exit as soon as the price is higher than our entry
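
As flagged in idea 1, here is a hedged sketch of an exit-on-z-score-cross rule, reusing the zscore vector from the earlier sketch (this replaces the fixed 4-day hold; the signal would still be lagged one bar before computing returns, as in the main code):

# Sketch: enter when the z-score drops below -0.2, exit when it crosses back above 0
position <- numeric(length(zscore))
for (i in 2:length(zscore)) {
  if (is.na(zscore[i])) {
    position[i] <- position[i - 1]
  } else if (position[i - 1] == 0 && zscore[i] < -0.2) {
    position[i] <- 1                    # flag a long after the entry trigger
  } else if (position[i - 1] == 1 && zscore[i] > 0) {
    position[i] <- 0                    # flat once the z-score has mean reverted past 0
  } else {
    position[i] <- position[i - 1]      # otherwise carry the current state forward
  }
}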

Different rules produce slightly different results. I may go into some of these different selling rules and the code for them in future posts! For now this is a good start 🙂

# Calculate rolling z-score of SPY close price
# Sell at fixed n day hold
# 8.13.2017
# Andrew Bannerman
require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "D:/R Projects"
data.read.spx <- paste(data.dir,"SPY.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Make dataframe
new.df <- data.frame(read.spx)
tail(new.df)
# Convert Values To Numeric
cols <-c(3:8)
new.df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

#Convert Date Column [1]
new.df$Date <- ymd(new.df$Date)

# Use TTR package to create rolling SMA n day moving average
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
function(new.df) {
SMA(new.df[,"Close"], numdays) # Calls TTR package to create SMA
}
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:60) {
sma.matrix <- cbind(sma.matrix, getSMA(i)(new.df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:60, function(n)paste("close.sma.n", n, sep=""))

# Bind to existing dataframe
new.df <- cbind(new.df, sma.matrix)

# Use TTR package to create rolling Standard Deviation
# Create function and loop in order to repeat the desired number of Stdev for example 2:30
getSD <- function(numdays) {
function(new.df) {
runSD(new.df$Close, numdays, cumulative = FALSE) # Calls TTR package to create SMA
}
}
# Create a matrix to put the SMAs in
sd.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:60) {
sd.matrix <- cbind(sd.matrix, getSD(i)(new.df))
}

# Rename columns
colnames(sd.matrix) <- sapply(2:60, function(n)paste("close.sd.n", n, sep=""))

# Bind to existing dataframe
new.df <- cbind(new.df, sd.matrix)

# Use base R to work out the rolling z-score: (Close - rolling mean) / rolling stdev, for each lookback 2:60
for (i in 2:60) {
  new.df[[paste0("close.zscore.n", i)]] <-
    (new.df$Close - new.df[[paste0("close.sma.n", i)]]) / new.df[[paste0("close.sd.n", i)]]
}

# Convert all NA to 0
new.df[is.na(new.df)] <- 0

# Calculate quartiles, where close is in relation to the range: (Close - Low) / (High - Low)
#new.df$quartile <- apply(new.df[,c('Close', 'Low', 'High')], 1, function(x) { (x[1]-x[2])/(x[3]-x[2])} )

# Calculate Returns from open to close
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Name indicators
indicator <- new.df$close.zscore.n12

# Create Long Signals
new.df$signal <- ifelse(indicator < -.2, 1, 0)

######## Fixed 'n' day hold logic ##########
# Variable for loop
indicator <- new.df$signal

# Create Vector From Signal
signal.1 <- c(indicator)

# Set variable for number of days to hold
n.day <- 4

# Loop for fixed 'n' day hold
res <- NULL
while (length(res) < length(signal.1)) {
if (signal.1[length(res)+1] == 1) {
res <- c(res, rep(1,n.day))
} else {
res <- c(res, 0)
}
}
res <- res[1:length(signal.1)]
new.df <- data.frame(new.df,response = res)

# lag signal by one forward day to signal entry next day
new.df$response <- dplyr::lag(new.df$response, 1)
new.df[is.na(new.df)] <- 0

# Build the equity curve: open-to-close return on the entry day, close-to-close thereafter
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(response)) %>%
  group_by(RunID) %>%
  dplyr::mutate(equity.curve = ifelse(response == 0, 0,
                               ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(new.df$equity.curve, order.by=as.Date(new.df$Date, format="%m/%d/%Y"))
xts2 = xts(new.df$clret, order.by=as.Date(new.df$Date, format="%m/%d/%Y"))

# Join XTS together
compare <- cbind(xts1,xts2)

# Use the PerformanceAnalytics package for trade statistics
# install.packages("PerformanceAnalytics")
require(PerformanceAnalytics)
colnames(compare) <- c("Mean Reversion","Buy And Hold")
charts.PerformanceSummary(compare,main="Cumulative Returns", wealth.index=TRUE, colorset=rainbow12equal)
#png(filename="20090606_rsi2_performance_updated.png", 720, 720)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
logRets <- log(cumprod(1 + compare))
chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

# Per-trade % return for multi-day holds: (last close - first open) / first open, booked on the final bar of each trade
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(response)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(response == 0, 0,
                              ifelse(row_number() == n(),
                                     (last(Close) - first(Open)) / first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# All NA to 0
new.df[is.na(new.df)] <- 0
# Win / Loss %
# 1 Day Hold Trades
winning.trades <- sum(new.df$equity.curve > 0, na.rm=TRUE)
losing.trades <- sum(new.df$equity.curve < 0, na.rm=TRUE)
total.days <- NROW(new.df$equity.curve)
# Multi Day Hold Trades
multi.winning.trades <- sum(new.df$perc.output > 0, na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < 0, na.rm=TRUE)
multi.total.days <- NROW(new.df$perc.output)
multi.losing.trades
# % Time Invested (Same column for 1 day and multi hold trades)
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades
winning.trades
losing.trades

# Calculate win loss %
# 1 Day Hold Trades
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Day Hold Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss
# 1 Day Hold Trades
remove.zero <- new.df[-which(new.df$equity.curve == 0 ), ] # removing rows 0 values
consec.wins <- max(rle(sign(remove.zero$equity.curve))[[1]][rle(sign(remove.zero$equity.curve))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$equity.curve))[[1]][rle(sign(remove.zero$equity.curve))[[2]] == -1])

# Multi Day Hold Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# 1 Day Hold Trades OR all days if multi holding
average.trade <- mean(new.df$equity.curve)
average.win <- mean(new.df$equity.curve[new.df$equity.curve > 0])
average.loss <- mean(new.df$equity.curve[new.df$equity.curve < 0])
median.win <- median(new.df$equity.curve[new.df$equity.curve > 0])
median.loss <- median(new.df$equity.curve[new.df$equity.curve < 0])
max.gain <- max(new.df$equity.curve)
max.loss <- min(new.df$equity.curve)
win.loss.ratio <- average.win / abs(average.loss)   # average win size relative to average loss size
summary <- cbind(winning.trades,losing.trades,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Day Hold Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output > 0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output < 0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output > 0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output < 0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
charts.PerformanceSummary(compare,main="Cumulative Returns",wealth.index=TRUE,colorset=rainbow12equal)

# Write output to file
write.csv(new.df,file="D:/R Projects/spy_mean.csv")