SPY Mean Reversion With John Ehlers Adaptive RSI

It has been a busy few months. I have been exploring the market indicators John Ehlers created and published in his book Cycle Analytics for Traders: Advanced Technical Trading Concepts.

The key theme of his book is applying digital signal processing filters to better process market data. He utilizes various high-pass and low-pass filters which only allow certain frequencies to pass at specified look-back periods, improving the signal-to-noise ratio and, in some cases, extracting a dominant cycle from the data.

Key ideas from the book:

  1. Traditional indicators have lag and assume a normal probability distribution
  2. His indicators reduce lag
  3. Extracting the dominant cycle over a range of periods can be used to calibrate traditional indicators such as the RSI, stochastic, etc.
  4. He provides methods to transform indicator output to something closer to a normal probability distribution

I am in the process of making the indicators from his Cycle Analytics for Traders book available on my GitHub in a package called MarketCycles.jl. Please refer to the package README.md for further information.

The package will be used in the next part of this post where we use an Adaptive RSI to time the mean reversion tendency of the daily S&P500.

First install the package:

Pkg.clone("https://github.com/flare9x/MarketCycles.jl")

The Adaptive RSI is adaptive because the indicator extracts the dominant cycle using the autocorrelation periodogram. At any given point in time [i] half the dominant cycle period is used to calibrate the lookback length of the RSI.

I want to point out a few stark differences in the back test code below. First, the code is set up to BUY NEXT BAR AT OPEN: when a signal is triggered at the close of a bar, the trade is entered at the next bar's open price.

Secondly, rather than calculate cumulative close-to-close percentage returns as so many do, we specify an initial starting capital and add the dollar gain of each trade to a running total, adjusting for trading commissions. This provides a result more in line with how the strategy might work in the real world.

Please see top of code snippet in order to adjust the following input variables for the strategy:

# Strategy input variables
starting_capital = 10000
comms = 10
entry_level = .3
exit_level = .8

The rest is best described by following the Julia code below. Note that if you do not have a data source, I have made this post easy to follow by providing code to download data from the Alpha Vantage API. You just need to edit the Alpha Vantage link and add your own API key:

# Ehlers mean reversion

using HTTP
using DataFrames
using CSV
using Gadfly
using MarketCycles

# Multiple symbols
adjusted = 1  # if want adjusted closes set to 1 else non adjusted set it to 0
t = 1
tickers = ["SPY"]
for t in 1:length(tickers)  # Note it might get stuck!! if it fails.. print the t value... then restart the for loop for t in 4:length(tickers)  instead of 1:length(tickers) for example
# Using HTTP package
sleep(10)  # sleep between API calls
if adjusted == 1
res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
else
    res = HTTP.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=$(tickers[t])&outputsize=full&apikey=your_api_key&datatype=csv")
end
mycsv = readcsv(res.body)
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:]  # subset remove header row
# Rename Columns
if adjusted == 1
    colnames = ["Date","Open","High","Low","Close","Adjusted_Close","Volume","Dividend_Amount","split_coefficient"]
else
colnames = ["Date","Open","High","Low","Close","Volume"]
end
names!(x, Symbol.(colnames))
# Convert String Date to Date format
x[:Date] = Date.(x[:Date],Dates.DateFormat("yyy-mm-dd"))
# Sort Date Frame By Date
x = sort!(x, [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
if adjusted == 1
    for i in 2:length(x)-2
        x[i] = convert(Array{Float64,1},x[i])
    end
    x[7] = convert(Array{Int64,1},x[7])
    x[8] = convert(Array{Float64,1},x[8])
else
    for i in 2:length(x)-1
        x[i] = convert(Array{Float64,1},x[i])
    end
    x[6] = convert(Array{Int64,1},x[6])
end

if adjusted == 1
CSV.write("CSV_OUT_ADJUSTED_$(tickers[t]).csv", x)
else
CSV.write("CSV_OUT_$(tickers[t]).csv", x)
end
end

# Load Data
SPY = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/Indicators/CSV_OUT_ADJUSTED_SPY.csv",types=[String; fill(Float64, 5); Int;Float64],header=true)
SPY[:Date] = DateTime.(SPY[:Date],Dates.DateFormat("mm/dd/yyyy"))

# Pull arrays
Date_index = SPY[:Date]
Spy_close = SPY[:Close]
Spy_open = SPY[:Open]

# Ehlers Adaptive RSI
Adaptive_RSI = AdaptiveRSI(Spy_close)

# Strategy input variables
starting_capital = 10000.00
comms = 10
entry_level = .3
exit_level = .8

# Cross under entry level and sell when cross over exit level
entry = zeros(Adaptive_RSI)
exit = zeros(Adaptive_RSI)
for i = 2:size(Adaptive_RSI,1)
    if (Adaptive_RSI[i] <= entry_level) && (Adaptive_RSI[i-1] > entry_level)
        entry[i] = 1
    else
        entry[i] = 0
    end
    if (Adaptive_RSI[i] >= exit_level) && (Adaptive_RSI[i-1] < exit_level)
        exit[i] = 2
    else
        exit[i] = 0
    end
end

# combine entry and exit
signal = exit .+ entry


# Fill the trade between entry and exit with 1's
all_signal = zeros(signal)
for i = 2:length(signal)
    print(i)
    if signal[i] == 1.0
        all_signal[i] = 1.0
    end
    if signal[i] == 0.0 && all_signal[i-1] == 1.0
        all_signal[i] = 1
    end
    if signal[i-1] == 2.0 && signal[i] == 0.0
        all_signal[i-1] = 1.0
    end
end

# ID the start and end of the trade
trade_start = zeros(all_signal)
trade_end = zeros(all_signal)
for i = 2:size(trade_end,1)
    if all_signal[i] == 1.0 && all_signal[i] != all_signal[i-1]
        trade_start[i] = 1
    end
    if all_signal[i-1] == 1.0 && all_signal[i-1] != all_signal[i]
        trade_end[i-1] = 2
    end
end

# Calculate the % return buying at open and selling at the close
open_price = Float64[]
close_price = Float64[]
strategy_returns = zeros(size(all_signal,1))
for i = 1:size(all_signal,1)-1
    if trade_start[i] == 1
        open_price = Spy_open[i+1]
    elseif trade_end[i] == 2
        close_price = Spy_close[i]
        # place the close return on the same index position as the end of the trade
        strategy_returns[i] = close_price / open_price - 1
    end
end


# Work out gain on starting investment amount
dollar_gain = zeros(size(strategy_returns,1))
dollar_gain[1] = starting_capital

i=1
for i=2:size(strategy_returns,1)
    # if the returns are 0 then use the previous $ amount
if strategy_returns[i] == 0.0
    dollar_gain[i] = dollar_gain[i-1]
elseif strategy_returns[i] != 0.0
    dollar_gain[i] = dollar_gain[i-1] + (dollar_gain[i-1] * strategy_returns[i]) - comms
end
end

# plot final result
white_panel = Theme(
	panel_fill="white",
    default_color="blue",
    background_color="white"
)
p = plot(x=Date_index,y=dollar_gain,Geom.line,Guide.title("Ehlers Adaptive RSI"),white_panel,Scale.y_continuous(format=:plain))
         draw(PNG("C:/Users/Andrew.Bannerman/Desktop/Julia/ehlers_rsi.png", 1500px, 800px), p);

And we may plot the output, which is the growth of $10,000 inclusive of a $10 round-trip commission per trade ($5 per side):

ehlers_rsi

It should be noted that the default settings from Ehlers' book were used for the high-pass and low-pass filters, as well as for the range of lags used in the autocorrelation periodogram to calculate the dominant cycle.

The function I created has many inputs that may be calibrated:

AdaptiveRSI(x::Array{Float64}; min_lag::Int64=1, max_lag::Int64=48,LPLength::Int64=10, HPLength::Int64=48, AvgLength::Int64=3)::Array{Float64}

Thus the default settings might not be best tuned to SPY daily data. One would be advised to iterate through the input variables to learn how each variable affects the output, as well as to find a suitable tuning of the indicator.

If anyone wishes to have a more professional back test script with varying entry / exit methods that accounts for slippage and commissions, do not hesitate to contact me.

Follow me on twitter: @flare9x

Github

Testing Cryptocurrency Coin Price Series for Momentum, Mean Reversion, Random Walk

In this post we will use a modified version of the relative strength index to test for the presence of persistent momentum, mean reversion and random walk properties in coin price series.

The Relative Strength Indicator (RSI) was developed by J. Welles Wilder and published in a 1978 book, New Concepts in Technical Trading Systems, and in Commodities magazine (now Futures magazine) in the June 1978 issue (wikipedia).

The RSI can be calculated as follows. On an up day, U equals today's close minus yesterday's close and D is zero; on a down day, D equals yesterday's close minus today's close and U is zero:

U = close_today − close_yesterday, D = 0 (if today's close is higher)
U = 0, D = close_yesterday − close_today (if today's close is lower)

An (n) period exponential moving average is calculated over the U and D values.

The ratio of these averages is the relative strength or relative strength factor:
RS = EMA(U,n) / EMA(D,n)
with n denoting the look-back period.

The final RS ratio is converted into an index which oscillates between 0 and 100:

RSI = 100 − 100 / (1 + RS)

In its classic form, Wilder recommended a smoothing period of 14. Furthermore, a 14-day period has been tested on the S&P500 (Adrian Ţăran-Moroşan, 2011). The author of the paper selects RSI extreme signal levels of 30 and 70: sell short the market over 70 and buy the market below 30. The test yielded poor results; however, there may be room for improvement in the selected look-back period. There is no consideration for the S&P500 half life of mean reversion. For example, if the S&P500 typically mean reverts over a period of less than 14 days, the classic 14-period look back would mask most signals. An optimal look-back period equal to the half life of mean reversion may have yielded better results.

In this post we will examine a short term RSI set at an arbitrarily chosen look back period of 2 days. We select RSI extreme signal levels of 20 and 70 respectively. The strategy will buy the market when RSI2 is over 70 and will exit the trade when RSI2 drops below 85. Conversely, the strategy will buy the market when RSI2 is below 20 and will sell when RSI2 is over 70.

In essence, the two RSI models above test for momentum (buy high and sell higher) and for mean reversion (buy low and sell high). Series where neither the momentum nor the mean reversion model yields good results may be considered random walks, and random walk series tend to be difficult to predict.
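For illustration, here is a minimal sketch of how the RSI2 and the two sets of signal flags could be generated with the TTR package. The data frame and its lowercase column names are assumptions carried over from the download code later in this post; this is not the exact script used to produce the results below.

# Sketch: RSI2 momentum and mean reversion flags using TTR
# Assumes a data frame df with a numeric close column of daily coin closes
library(TTR)
df$rsi2 <- RSI(df$close, n = 2)

# Momentum: buy when RSI2 closes over 70, exit when RSI2 drops below 85
df$momo.enter <- ifelse(df$rsi2 > 70, 1, 0)
df$momo.exit  <- ifelse(df$rsi2 < 85, 1, 0)

# Mean reversion: buy when RSI2 closes below 20, exit when RSI2 crosses over 70
df$mr.enter <- ifelse(df$rsi2 < 20, 1, 0)
df$mr.exit  <- ifelse(df$rsi2 > 70, 1, 0)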

Persistent price anomalies may exist in price series. If this holds true then the models above may show medium / long term stability of predictable returns.

Next, we download 1568 daily coin price series to .csv format using the CryptoCompare API. Using R we achieve this with the following code:

# Crypto Compare Coin Data
# Download using R
# Andrew Bannerman 11.29.2017

library(jsonlite)
library(data.table)
library(ggplot2)

# Obtain coin list
response = fromJSON("https://www.cryptocompare.com/api/data/coinlist")
df = as.data.frame(data.table::rbindlist(response$Data, fill=TRUE))
# Write coin list to .csv
write.csv(df,file="D:/R Projects/Final Scripts/BitCoin Scripts/coin_list.csv")

# Load Tickets From Coin List
read.data <- read.csv("D:/R Projects/Final Scripts/BitCoin Scripts/coin_list.csv", header=TRUE, stringsAsFactors = FALSE)
tickers <- c(read.data$Symbol)
# Remove all * at end of file names
tickers <- gsub("[*]", "", tickers)

# Obtain API (information only)
#cc <- fromJSON("https://min-api.cryptocompare.com/")
#str(cc)

# Download Daily Data
# Loop for downloading all .csvs
# Save each coin close plot
# Add error catching to loop
# Add completion time of each iteration
i=1
for (i in 1:length(tickers)) {
  tryCatch({
  next.coin <- tickers[i] # next coin in ticker vector
coin_daily <- fromJSON(paste0("https://min-api.cryptocompare.com/data/histoday?fsym=",next.coin,"&tsym=USD&allData=true&e=CCCAGG"))
df <- data.frame("time" = coin_daily$Data$time, "close" = coin_daily$Data$close,"open" = coin_daily$Data$open,"high" = coin_daily$Data$high,"low" = coin_daily$Data$low,"volumefrom" = coin_daily$Data$volumefrom,"volumeto" = coin_daily$Data$volumeto)
# Save plot out put to folder
# Set path, set png plot size
# ggplot 2 to create line plot
mytitle = paste0(next.coin)
graphics.off() # Close graphic device before next plot
p <- ggplot(df, aes(time)) +
  theme_classic()+
  geom_line(aes(y=close), colour="red") +
  ggtitle(mytitle, subtitle = "") +
  labs(x="Time",y="Close")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))
ggsave(path="D:/R Projects/Final Scripts/BitCoin Scripts/daily_plot",paste(next.coin,".png"))
# Save each file to .csv
write.csv(df,paste0(file="D:/R Projects/Final Scripts/BitCoin Scripts/data/",next.coin,".csv"))
ptm0 <- proc.time()
Sys.sleep(0.1)
ptm1=proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

This coin universe will be used to test for mean reversion, momentum and random walk using RSI2.

An R script was set up to run the RSI2 momentum and mean reversion model on each coin series. No effort was made to restrict the starting date of each series; rather, the entire price series was tested in each instance.

End-of-day closing prices were used for testing. To avoid look-ahead bias, when an RSI2 signal was triggered the entry was placed at the next day's market open. A sell signal, however, was closed out on the same day of the signal. In real life, the script would run 15 minutes before the close and any conditions met would be closed out prior to the end of the day. In this back test, due to my absence of higher-frequency coin data, this has not been included. The results are for guidance only, not exact specifics. However, despite the 15-minute discrepancy, the test should still serve as a basis for revealing mean reversion, momentum and random walk series.
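As a small illustrative sketch (column names are assumptions carried over from the earlier snippet), the entry flag can be shifted forward one bar so that a signal generated at today's close is only acted on from the next bar, avoiding look-ahead bias:

# Shift the entry flag so a close-based signal is only tradeable on the following bar
library(dplyr)
df$mr.enter.lagged <- dplyr::lag(df$mr.enter, 1)
df$mr.enter.lagged[is.na(df$mr.enter.lagged)] <- 0

# Approximate an entry at the next day's open that is closed out at that day's close
df$entry.ret <- ifelse(df$mr.enter.lagged == 1, (df$close - df$open) / df$open, 0)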

No effort has been made to optimize model parameters on in-sample and out-of-sample data, and no statistical testing for mean reversion or momentum, nor any half life of mean reversion calibration, was performed. These steps may aid in selecting optimal look-back periods after the initial RSI2 mean reversion / momentum screening, and also help avoid the effects of look-ahead bias and over-optimization.

First, the top 50 momentum price series results are provided in table 1.0. To refresh, a momentum buy signal was generated when RSI2 closed over 70 and a sell signal generated when RSI2 crossed below 85. Note these parameters have been arbitrarily chosen. Data is sorted by top momentum cumulative returns.

momo_coin
Table 1.0 – Table sorted by highest momentum cumulative return

Lastly, the top 50 mean reversion price series results are provided in table 1.1. To refresh, a mean reversion buy signal was generated when RSI2 closed below 20 and a sell signal generated when RSI2 crossed above 70. Data is sorted by top mean reversion cumulative returns.

mean_rev_coin
Table 1.1 – Table sorted by highest mean reversion cumulative return

The RSI2 momentum, mean reversion and random walk screening revealed a total of 202 momentum series, where momentum outperformed buy and hold; a total of 1086 mean reversion series, where mean reversion outperformed buy and hold; and a total of 340 random walk series, where buy and hold outperformed both the momentum and mean reversion RSI2 models. Based on this test, it would appear that mean reversion is the dominant anomaly within the coin price series.

Rplot161

The series obtained from the above RSI2 screening should be further selected for in-sample optimization and out-of-sample testing. The below figure shows an example of how this process may be conducted; it is typically referred to as the hold-out method:

optimize
Example of hold out method for conducting in-sample optimization and out-of-sample testing

For each in-sample period (orange), the model parameters are optimized and subsequently selected for forward out of sample testing (green). At the end of the first out of sample test, the in-sample period is then re-set by the fixed in-sample look back period. The model parameters are once again optimized and used for subsequent out of sample testing. This process is repeated for the entire sample period.
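A rough sketch of how the rolling in-sample / out-of-sample indices could be generated in R; the window lengths here are arbitrary assumptions, not values used for the results above:

# Walk-forward (hold-out) indexing: fixed in-sample window, fixed out-of-sample window
n          <- 1000  # total number of observations (assumption)
is.length  <- 250   # in-sample window length (assumption)
oos.length <- 63    # out-of-sample window length (assumption)

start <- 1
while (start + is.length + oos.length - 1 <= n) {
  is.idx  <- start:(start + is.length - 1)                              # optimize parameters on these rows
  oos.idx <- (start + is.length):(start + is.length + oos.length - 1)  # test the chosen parameters on these rows
  # ... fit on data[is.idx, ], evaluate on data[oos.idx, ] ...
  start <- start + oos.length                                          # roll both windows forward
}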

To conclude, the RSI2 may be a simple method to test for momentum (buy high and sell higher) and mean reversion (buy low and sell high). These attributes may point to an underlying anomaly which can be exploited using simple models. The RSI look-back period of 2 was arbitrarily chosen; it serves here as a screen for revealing momentum or mean reversion series.

Furthermore, an RSI period of 2 may not be the correct look back for the specific series under examination. This is because each series displays a different rate of diffusion from the mean. Those with momentum properties move away from the mean at a higher rate than mean reversion series, or at a slower rate than a random walk. Mean reversion series exhibit different variances from the mean and also different speeds of reversion to the mean. The former relates directly to how much profit can be achieved from the mean reverting series; the latter relates to how long it takes for a series to revert to the mean. A mean reversion speed of 9 months may not be suitable for a shorter-term trader and may not be the best deployment of capital.

For further research, the speed of mean reversion may be calculated using the Ornstein-Uhlenbeck formula, which is presented in a previous post. An optimal look-back period for a mean reversion series would be equal to the half life of mean reversion (Ernest P. Chan, 2013).

In closing, it should also be noted that there is a large benefit to using statistical testing to confirm the presence of momentum, mean reversion and random walk. This is because statistical tests cover all data points in a sample, whereas a pre-selected model like the RSI2 above only considers days when the RSI2 trigger was active, thus eliminating many days from the sample (Ernest P. Chan, 2013).

A statistical test for confirming momentum, mean reversion and random walk is the Hurst exponent. This is presented in two previous posts:
Hurst exponent in R
Modelling the Hurst exponent in R

The augmented Dickey-Fuller test may be used to confirm the presence of mean reversion, where one may reject the null hypothesis of a unit root.
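A minimal sketch of the test using the urca package (the same ur.df call is used in the pairs trading post below); the column name is an assumption:

# Augmented Dickey-Fuller test on a price (or spread) series
library(urca)
adf <- ur.df(df$close, type = "drift", lags = 1)
summary(adf)  # reject the unit root null if the tau2 statistic is more negative than the chosen critical value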

Good trading!
Andrew

Pairs Trading – Testing For Cointegration, ADF (Unit Root), Johansen Test & Half Life of Mean Reversion

Note: I back test this 100% incorrectly. I also wish to add scaling of the independent variable per the hedge ratio correctly. I am leaning towards working in share prices and will update accordingly. What is correct is all the statistical testing; however, the back test is 100% incorrect. I also run the Johansen test on the stationary series obtained from the hedge ratio through linear regression, which is an unnecessary step, as we can use the hedge ratio from linear regression to scale the independent variable position sizes. I will get to the revisions when I can.

A statistical arbitrage opportunity exists between a cointegrated pair when the price relationship between the pair moves out of equilibrium. This short-term disequilibrium creates a trading opportunity where one enters a short position in one series with an equally weighted long position in the other series. The aim is that the two series will eventually converge and the trade can be closed out with a profit.

Essentially, between two cointegrated pairs the spread is mean reverting. The time to mean reversion is one factor to consider: how long will it take for the period of disequilibrium to revert to the mean, and will it ever revert to the mean? Cointegration can and does break down. So one question to answer would be: how do I know if the trade will converge, and when is it time to exit with a loss?

The half life of mean reversion, which is calculated on the spread, can help answer this question. If historically a pair of stocks A and B has been shown to mean revert within 'n' days, the trade can be closed out with a loss if this threshold is exceeded by a long margin, OR the trade meets a pre-determined stop loss percentage, OR a stop loss measured in standard deviations.

A pair of stocks can each have a linear trend; however, we can form a stationary series by estimating the hedge ratio with ordinary least squares regression. In practice we run two OLS regressions, one with stock A as the independent variable and one with stock B as the independent variable, and then pick the direction with the most significant test statistic.
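As a sketch of that selection step (using the column names of the data frame built later in this post), we can form the residuals in both directions and keep the hedge ratio from the direction with the more negative ADF test statistic:

# Run OLS both ways (no intercept) and compare ADF test statistics on the residuals
library(urca)
ols.a   <- lm(EWC_Close ~ EWA_Close + 0, data = df)  # EWA as the independent variable
ols.b   <- lm(EWA_Close ~ EWC_Close + 0, data = df)  # EWC as the independent variable
resid.a <- df$EWC_Close - coef(ols.a)[1] * df$EWA_Close
resid.b <- df$EWA_Close - coef(ols.b)[1] * df$EWC_Close

adf.a <- ur.df(resid.a, type = "drift", lags = 1)@teststat[1]
adf.b <- ur.df(resid.b, type = "drift", lags = 1)@teststat[1]

# The more negative statistic is the stronger evidence of a stationary spread
beta <- if (adf.a <= adf.b) coef(ols.a)[1] else coef(ols.b)[1]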

First let's get a feel for the data and plot the closing prices of EWA and EWC. I want to use these two series and this time period to see if we can replicate Ernie Chan's study in his book Algorithmic Trading: Winning Strategies and Their Rationale.

Rplot72
EWA and EWC visually appear to have a high correlation. The Pearson correlation coefficient is 0.9564724. If two series are correlated it essentially means they move in the same direction most of the time. However, it tells you nothing about the nature of the spread between the two series and whether they will converge (mean revert) or continue to diverge. With cointegration, we form a linear relationship between stock A and stock B where the distance between A and B is somewhat fixed and would be estimated to mean revert. In essence, we measure the 'distance' between A and B in terms of standard deviations, and a trade is taken when 'n' standard deviations is met. A stop loss may also be placed outside the normal sigma range, ensuring the trade is exited if the relationship breaks down.

A scatter plot shows the variation around the ols regression fit:

# Scatter plot of price series
# plot x,y
plot(df$EWA_Close, df$EWC_Close, main="Scatterplot EWA and EWC With OLS Regression Fit",
     xlab="EWA Close", ylab="EWC Close ", pch=19)
abline(lm(df$EWC_Close~df$EWA_Close), col="red") # regression line (y~x)

Rplot71

We perform ordinary least squares regression using EWA as the independent variable (x) and EWC as the dependent variable (y). Note we set the intercept to 0 so that the regression fit provides only the hedge ratio between EWA and EWC.


# Ordinary least squares regression
# EWA independant
ols <- lm(df$EWC_Close~df$EWA_Close+0) # regression line (y~x)
summary(ols)
beta <- coef(ols)[1]

resid <- as.data.frame(df$EWC_Close - beta * df$EWA_Close)
resid <- data.frame(resid,"Date" = df$Date)
colnames(resid)[1] <- "residuals"
plot(resid$Date,resid$residual, col = 'navyblue', xlab = 'Date', ylab = 'residuals', type="l", main=" EWA,EWC Residuals - EWA Independant Variable - No Intercept",cex.main=1)

Rplot75

In the above, we used ordinary least squares regression to estimate the hedge ratio (beta) and we form the spread by subtracting beta * EWA close from the EWC close. Note we use the closing prices and not the daily returns. By using the closing prices we form a hedge ratio that provides the number of shares to go long/short in each ETF.

The augmented Dickey-Fuller test is used to determine whether a series is non-stationary or stationary (trending or non-trending). The purpose of the test is to reject the null hypothesis of a unit root. If we have a unit root it means we have a non-stationary series. We wish to have a stationary series so that we may trade it efficiently using mean reversion.


# urca ADF test
library(urca)
  summary(ur.df(resid$residual, type = "drift", lags = 1))

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
############################################### 

Test regression drift 

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-1.17542 -0.17346 -0.00447  0.16173  1.27800 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0006657  0.0072252   0.092 0.926599
z.lag.1     -0.0176743  0.0053087  -3.329 0.000892 ***
z.diff.lag  -0.1751998  0.0254683  -6.879 8.82e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2781 on 1495 degrees of freedom
Multiple R-squared:  0.04101,	Adjusted R-squared:  0.03973
F-statistic: 31.97 on 2 and 1495 DF,  p-value: 2.548e-14

Value of test-statistic is: -3.3293 5.5752 

Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1  6.43  4.59  3.78

The ADF test statistic is -3.3293, which is more negative than the 5% critical value of -2.86. We may therefore reject the unit root hypothesis with 95% confidence, implying the spread is stationary. If we flip the independent variable from EWA to EWC and re-run OLS and the augmented Dickey-Fuller test, we obtain a test statistic of -3.3194, which is also below the 5% critical value of -2.86, again implying stationarity with 95% confidence. In this case we choose the most negative / most statistically significant test statistic, which is with EWA as the independent variable.

Next, to determine the number of shares we need to purchase / short, we may use the Johansen test. The eigenvectors determine the weights for each ETF.

# Johansen test
# Eigen vectors are the number of shares
# Test is cointegration test
coRes=ca.jo(data.frame(df$EWA_Close, df$EWC_Close),type="trace",K=2,ecdet="none", spec="longrun")
summary(coRes)
slot(coRes,"V") # The matrix of eigenvectors, normalised with respect to the first variable.

######################
# Johansen-Procedure #
###################### 

Test type: trace statistic , with linear trend 

Eigenvalues (lambda):
[1] 0.010146673 0.002667727

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  4.00  6.50  8.18 11.65
r = 0  | 19.28 15.66 17.95 23.52

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                df.EWA_Close.l2 df.EWC_Close.l2
df.EWA_Close.l2       1.0000000       1.0000000
df.EWC_Close.l2      -0.7772904       0.6080793

Weights W:
(This is the loading matrix)

               df.EWA_Close.l2 df.EWC_Close.l2
df.EWA_Close.d     0.004309716    -0.003029255
df.EWC_Close.d     0.028971878    -0.002649008

The eigenvectors form the weights for each ETF. Simply put, if we allocate $10,000 to each ETF, we buy 10,000 * 1 / EWA close shares to go long EWA, and when shorting EWC the position is 10,000 * -0.7772904 / EWC close shares (the negative sign indicating a short). Thus our weightings form a stationary portfolio.
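A small worked sketch of turning the first (normalised) eigenvector into share counts, assuming $10,000 of notional per leg:

# Convert the normalised Johansen eigenvector into share counts
weights    <- c(EWA = 1.0, EWC = -0.7772904)  # first eigenvector, normalised to EWA
allocation <- 10000                           # dollars allocated per leg (assumption)

ewa.shares <- allocation * weights["EWA"] / tail(df$EWA_Close, 1)  # positive = long EWA
ewc.shares <- allocation * weights["EWC"] / tail(df$EWC_Close, 1)  # negative = short EWC
ewa.shares
ewc.shares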

In order to determine how long the cointegrated relationship takes to revert to the mean, we can regress the one-day changes of the spread against the previous day's spread minus its mean; the slope of that regression gives the half life of mean reversion via -log(2) divided by the slope.

 

# Calculate half life of mean reversion (residuals)
# Calculate yt-1 and (yt-1-yt)
# pull residuals to a vector
residuals <- c(resid$residual)
y.lag <- c(residuals[2:length(residuals)], 0) # Set vector to lag -1 day
y.lag <- y.lag[1:length(y.lag)-1] # As shifted vector by -1, remove anomalous element at end of vector
residuals <- residuals[1:length(residuals)-1] # Make vector same length as vector y.lag
y.diff <- residuals - y.lag # Subtract todays close from yesterdays close
y.diff <- y.diff [1:length(y.diff)-1] # Make vector same length as vector y.lag
prev.y.mean <- y.lag - mean(y.lag) # Subtract yesterdays close from the mean of lagged differences
prev.y.mean <- prev.y.mean [1:length(prev.y.mean )-1] # Make vector same length as vector y.lag
final.df <- data.frame(y.diff,prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result = lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 = -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

The half life of mean reversion is 31.62271 days.

We will now form a trading strategy and calculate a rolling z-score over the residuals with a look back set equal to the obtained half life.

# Make z-score of residuals
# Set look back equal to half life
colnames(resid)
resid$mean <- SMA(resid[,"residuals"], round(half_life))
resid$stdev <- runSD(resid[,"residuals"], n = round(half_life), sample = TRUE, cumulative = FALSE)
resid$zscore <- apply(resid[,c('residuals', 'mean','stdev')], 1, function(x) { (x[1]-x[2]) / x[3] })

# Convert all NA to 0
resid[is.na(resid)] <- 0

# Place residuals data frame to original (df)
df <- data.frame(df,resid)

# Plot z-score of residuals
plot(resid$zscore,type="l", main="Rolling Z-Score - Look Back Set Equal To Half Life")

Rplot77

Note the z-score of the spread ranges roughly between -3 and 3 standard deviations.

We can now perform a back test to see how well it would have done. For the sake of simplicity we will not optimize on in-sample data and perform out-of-sample testing. We will also not consider the share amounts of the weightings or commissions.

###################
# Back Test Script
###################

# Create Trading Logic
# When z-score crossed below 0 buy EWA and sell short EWC per eigenvectors
df$enter.long <- ifelse(df$zscore <= 0, 1,0)   # for long EWA below mean
df$exit.long <- ifelse(df$zscore == 0, 1,0)    # for long EWA below mean
df$enter.short <- ifelse(df$zscore <= 0 , -1,0) # for short EWC below mean
df$exit.short <- ifelse(df$zscore == 0 , -1,0)  # for short EWC below mean

df$enter.short.1 <- ifelse(df$zscore >= 0, -1,0)   # for short EWA above mean
df$exit.short.1 <- ifelse(df$zscore == 0, -1,0)    # for short EWA above mean
df$enter.long.1 <- ifelse(df$zscore >= 0 , 1,0) # for long EWC above mean
df$exit.long.1 <- ifelse(df$zscore == 0 , 1,0)  # for long EWC above mean

# Calculate close to close returns
df$ewa.clret <- ROC(df$EWA_Close, type = c("discrete"))
df$ewa.clret[1] <- 0
df$ewc.clret <- ROC(df$EWC_Close, type = c("discrete"))
df$ewc.clret[1] <- 0

# Long EWA below mean
df <- df %>%
  dplyr::mutate(ewa.long = ifelse(enter.long ==  1, 1,
                                  ifelse(exit.long == 1, 0, 0)))

# Short EWC above mean
df <- df %>%
  dplyr::mutate(ewc.short = ifelse(enter.short ==  -1, 1,
                                  ifelse(exit.short == -1, 0, 0)))

# Short EWA above mean
df <- df %>%
  dplyr::mutate(ewa.short = ifelse(enter.short.1 ==  -1, 1,
                                  ifelse(exit.short.1 == 1, 0, 0)))

# Long EWC above mean
df <- df %>%
  dplyr::mutate(ewc.long = ifelse(enter.long.1 ==  1, 1,
                                   ifelse(exit.long.1 == -1, 0, 0)))

# lag signal by one forward day to signal entry next day
df$ewa.long <- lag(df$ewa.long,1) # Note k=1 implies a move *forward*
df$ewc.short <- lag(df$ewc.short,1) # Note k=1 implies a move *forward*
df$ewa.short <- lag(df$ewa.short,1) # Note k=1 implies a move *forward*
df$ewc.long <- lag(df$ewc.long,1) # Note k=1 implies a move *forward*

# Calculate equity curves
df$equity.long <- apply(df[,c('ewa.long', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short <- apply(df[,c('ewc.short', 'ewc.clret')], 1, function(x) { x[1]*x[2]})
df$equity.long.1 <- apply(df[,c('ewa.short', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short.1 <- apply(df[,c('ewc.long', 'ewc.clret')], 1, function(x) { x[1]*x[2]})

# Combine signals
df$combined <- apply(df[,c('equity.long', 'equity.short','equity.long.1','equity.short.1')], 1, function(x) { x[1]+x[2]+x[3]+x[4] })

# Pull select columns from data frame to make XTS whilst retaining format
xts1 = xts(df$equity.long, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts2 = xts(df$equity.short, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts3 = xts(df$equity.long.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts4 = xts(df$equity.short.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts5 = xts(df$combined, order.by=as.POSIXct(df$Date, format="%Y-%m-%d")) 

# Join XTS together
compare <- cbind(xts1,xts2,xts3,xts4,xts5)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("EWA Long","EWC Short","EWA Short","EWC Long","Combined")
charts.PerformanceSummary(compare,main="EWC,EWA Pair", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

Rplot76

One thing to note is that for the whole sample period, from April 26th 2006 to April 9th 2012, we use the exact same hedge ratio to form the stationary series. This is likely not the correct procedure; we also use the full sample to compute the half life of mean reversion. As we expect the relationship between series A and B to change over time, I believe it would be better practice to use a fixed rolling window to estimate the hedge ratio and mean reversion half life. When a new signal is generated it would then use the hedge ratio / half life derived from the rolling look-back period.
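A rough sketch of what a rolling hedge ratio estimate could look like (the window length is an arbitrary assumption, not a tested value):

# Re-estimate the hedge ratio over a trailing window instead of the full sample
library(zoo)
window <- 250  # trailing window length in trading days (assumption)
prices <- zoo(df[, c("EWA_Close", "EWC_Close")], order.by = as.Date(df$Date))

rolling.beta <- rollapplyr(prices, width = window, by.column = FALSE, fill = NA,
                           FUN = function(w) coef(lm(w[, "EWC_Close"] ~ w[, "EWA_Close"] + 0))[1])

# The rolling spread then uses each day's most recent beta estimate (NA for the first window-1 days)
rolling.spread <- df$EWC_Close - as.numeric(rolling.beta) * df$EWA_Close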

It should also be noted that we do not quite match Ernie Chan's test from his book. I am using Alpha Vantage as a source of price data and there is a difference in price between Ernie's EWA series and the Alpha Vantage EWA. There are also differences between the urca Johansen test and the one Ernie uses in the jplv7 package. I will attempt to address these issues in upcoming posts.

This serves as a good introduction for me and others to the topic of pairs trading.

 

Full R code below. 

# Test for cointegration
# Download daily price data from Alpha Vantage API
# Plot two series
# Perform Linear Regression
# Plot Residuals
# ADF test for unit root
# Half life of mean reversion
# zscore look back equal to half life
# back test long stock A, short Stock B
# Andrew Bannerman 11.1.2017

require(alphavantager)
require(urca)
require(lattice)
require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Download price series from Alpha Vantage
# install.packages('alphavantager')
# Key
#https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=EWA&apikey=6RSYX9BPXKZVXUS9&outputsize=full&datatype=csv
av_api_key("your_api_key")
print(av_api_key())
#EWC <- av_get(symbol = "GLD", av_fun = "TIME_SERIES_DAILY_ADJUSTED", outputsize = "full")
#EWA <- av_get(symbol = "SLV", av_fun = "TIME_SERIES_DAILY_ADJUSTED", outputsize = "full")
#bit coin EWC <- av_get(symbol = "BTC", av_fun = "DIGITAL_CURRENCY_DAILY", outputsize = "full", market="CNY")

EWA <- read.csv("D:/R Projects/Final Scripts/Cointegration Scripts/data/daily_adjusted_EWA.csv", header=TRUE,stringsAsFactors=FALSE)
EWC <- read.csv("D:/R Projects/Final Scripts/Cointegration Scripts/data/daily_adjusted_EWC.csv", header=TRUE,stringsAsFactors=FALSE)
str(EWA)
# Convert timestamp to Date format
EWA$timestamp <- ymd(EWA$timestamp)
EWC$timestamp <- ymd(EWC$timestamp)

# Sort
EWA <- arrange(EWA,timestamp)
EWC <- arrange(EWC,timestamp)

# Merge Data Frames By date
df <- merge(EWA,EWC, by='timestamp')
head(df)
# Rename Columns
colnames(df)[1] <- "Date"
colnames(df)[6] <- "EWA_Close"
colnames(df)[14] <- "EWC_Close"

# Subset by date
df <- subset(df, Date >= as.POSIXct("2006-04-26") & Date <= as.POSIXct("2012-04-09"))  

# Scatter plot of price series
# plot x,y
plot(df$EWA_Close, df$EWC_Close, main="Scatterplot EWA and EWC With OLS Regression Fit",
     xlab="EWA Close", ylab="EWC Close ", pch=19)
abline(lm(df$EWC_Close~df$EWA_Close), col="red") # regression line (y~x)

# Line plot of series
require(ggplot2)
require(scales)

ggplot(df, aes(Date)) +
  theme_classic()+
  geom_line(aes(y=EWA_Close), colour="red") +
  geom_line(aes(y=EWC_Close), colour="blue") +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
  ggtitle("EWC, EWC Close", subtitle = "2006-04-26  to 2012-04-09") +
  labs(x="Year",y="EWA,EWC Close")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  annotate("text", label = "EWA", x = as.Date("2006-04-26"), y = 22, color = "blue")+
  annotate("text", label = "EWC", x = as.Date("2006-04-26"), y = 15, color = "red")

# Find corrlation coefficient
# method = pearson, spearman, kendall
EWA.EWC.cor <- cor(df$EWA_Close, df$EWC_Close, use="complete.obs", method="pearson") 

# Ordinary least squares regression
# EWA independant
ols <- lm(df$EWC_Close~df$EWA_Close+0) # regression line (y~x)
summary(ols)
beta <- coef(ols)[1]

resid <- as.data.frame(df$EWC_Close - beta * df$EWA_Close)
resid <- data.frame(resid,"Date" = df$Date)
colnames(resid)[1] <- "residuals"
plot(resid$Date,resid$residual, col = 'navyblue', xlab = 'Date', ylab = 'residuals', type="l", main=" EWA,EWC Residuals - EWA Independant Variable - No Intercept",cex.main=1)

# urca ADF test
library(urca)
  summary(ur.df(resid$residual, type = "drift", lags = 1))

# Johansen test
# Eigen vectors are the number of shares
# Test is cointegration test
coRes=ca.jo(data.frame(df$EWA_Close, df$EWC_Close),type="trace",K=2,ecdet="none", spec="longrun")
summary(coRes)
slot(coRes,"V") # The matrix of eigenvectors, normalised with respect to the first variable.
slot(coRes,"Vorg") # The matrix of eigenvectors, such that \hat V'S_{kk}\hat V = I.

# Calculate half life of mean reversion (residuals)
# Calculate yt-1 and (yt-1-yt)
# pull residuals to a vector
residuals <- c(resid$residual)
y.lag <- c(residuals[2:length(residuals)], 0) # Set vector to lag -1 day
y.lag <- y.lag[1:length(y.lag)-1] # As shifted vector by -1, remove anomalous element at end of vector
residuals <- residuals[1:length(residuals)-1] # Make vector same length as vector y.lag
y.diff <- residuals - y.lag # Subtract todays close from yesterdays close
y.diff <- y.diff [1:length(y.diff)-1] # Make vector same length as vector y.lag
prev.y.mean <- y.lag - mean(y.lag) # Subtract yesterdays close from the mean of lagged differences
prev.y.mean <- prev.y.mean [1:length(prev.y.mean )-1] # Make vector same length as vector y.lag
final.df <- data.frame(y.diff,prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result = lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 = -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

# Make z-score of residuals
# Set look back equal to half life
colnames(resid)
half_life = 31
resid$mean <- SMA(resid[,"residuals"], round(half_life))
resid$stdev <- runSD(resid[,"residuals"], n = round(half_life), sample = TRUE, cumulative = FALSE)
resid$zscore <- apply(resid[,c('residuals', 'mean','stdev')], 1, function(x) { (x[1]-x[2]) / x[3] })

# Convert all NA to 0
resid[is.na(resid)] <- 0

# Place residuals data frame to original (df)
df <- data.frame(df,resid)

# Plot z-score of residuals
plot(resid$zscore,type="l", main="Rolling Z-Score - Look Back Set Equal To Half Life", col="darkblue")

###################
# Back Test Script
###################

# Create Trading Logic
# When z-score crossed below 0 buy EWA and sell short EWC per eigenvectors
df$enter.long <- ifelse(df$zscore <= -0.7, 1,0)   # for long EWA below mean
df$exit.long <- ifelse(df$zscore == 0, 1,0)    # for long EWA below mean
df$enter.short <- ifelse(df$zscore <= -0.7 , -1,0) # for short EWC below mean
df$exit.short <- ifelse(df$zscore == 0 , -1,0)  # for short EWC below mean

df$enter.short.1 <- ifelse(df$zscore >= 0.7, -1,0)   # for short EWA above mean
df$exit.short.1 <- ifelse(df$zscore == 0, -1,0)    # for short EWA above mean
df$enter.long.1 <- ifelse(df$zscore >= 0.7 , 1,0) # for long EWC above mean
df$exit.long.1 <- ifelse(df$zscore == 0 , 1,0)  # for long EWC above mean

# Calculate close to close returns
df$ewa.clret <- ROC(df$EWA_Close, type = c("discrete"))
df$ewa.clret[1] <- 0
df$ewc.clret <- ROC(df$EWC_Close, type = c("discrete"))
df$ewc.clret[1] <- 0

# Long EWA below mean
df <- df %>%
  dplyr::mutate(ewa.long = ifelse(enter.long ==  1, 1,
                                  ifelse(exit.long == 1, 0, 0)))

# Short EWC above mean
df <- df %>%
  dplyr::mutate(ewc.short = ifelse(enter.short ==  -1, 1,
                                  ifelse(exit.short == -1, 0, 0)))

# Short EWA above mean
df <- df %>%
  dplyr::mutate(ewa.short = ifelse(enter.short.1 ==  -1, 1,
                                  ifelse(exit.short.1 == 1, 0, 0)))

# Long EWC above mean
df <- df %>%
  dplyr::mutate(ewc.long = ifelse(enter.long.1 ==  1, 1,
                                   ifelse(exit.long.1 == -1, 0, 0)))

# lag signal by one forward day to signal entry next day
df$ewa.long <- lag(df$ewa.long,1) # Note k=1 implies a move *forward*
df$ewc.short <- lag(df$ewc.short,1) # Note k=1 implies a move *forward*
df$ewa.short <- lag(df$ewa.short,1) # Note k=1 implies a move *forward*
df$ewc.long <- lag(df$ewc.long,1) # Note k=1 implies a move *forward*

# Calculate equity curves
df$equity.long <- apply(df[,c('ewa.long', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short <- apply(df[,c('ewc.short', 'ewc.clret')], 1, function(x) { x[1]*x[2]})
df$equity.long.1 <- apply(df[,c('ewa.short', 'ewa.clret')], 1, function(x) { x[1]*x[2] })
df$equity.short.1 <- apply(df[,c('ewc.long', 'ewc.clret')], 1, function(x) { x[1]*x[2]})

tail(df,500)

# Combine signals
df$combined <- apply(df[,c('equity.long', 'equity.short','equity.long.1','equity.short.1')], 1, function(x) { x[1]+x[2]+x[3]+x[4] })

# Pull select columns from data frame to make XTS whilst retaining format
xts1 = xts(df$equity.long, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts2 = xts(df$equity.short, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts3 = xts(df$equity.long.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts4 = xts(df$equity.short.1, order.by=as.POSIXct(df$Date, format="%Y-%m-%d"))
xts5 = xts(df$combined, order.by=as.POSIXct(df$Date, format="%Y-%m-%d")) 

# Join XTS together
compare <- cbind(xts1,xts2,xts3,xts4,xts5)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("EWA Long","EWC Short","EWA Short","EWC Long","Combined")
charts.PerformanceSummary(compare,main="EWC,EWA Pair", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

How To Ignore Errors In a Loop – Continue Processing Each Iteration

In this post we will look at a method to process many files or data frames. In this example we will just make data frames, but in practice these may point to a directory on our hard drive; see ?list.files.

First let's make some dummy data that we can work with.

# Create dummy data frames
    file1 <- as.data.frame(runif(100, 0,100))
    file2 <- as.data.frame(runif(100, 0,100))
    file3 <- as.data.frame(runif(100, 0,100))
    file4 <- as.data.frame(runif(12, 0,100))
    file5 <- as.data.frame(runif(100, 0,100))
    file6 <- as.data.frame(runif(15, 0,100))
    file7 <- as.data.frame(runif(100, 0,100))
    file8 <- as.data.frame(runif(8, 0,100))  # This is the df intended to fail (fewer than 12 rows)
    file9 <- as.data.frame(runif(100, 0,100))
    file10 <- as.data.frame(runif(100, 0,100))
    file11 <- as.data.frame(runif(100, 0,100))

    # Store all data frames in a list
    file.list <- list(file1,file2,file3,file4,file5,file6,file7,file8,file9,file10,file11)

# Rename column names for all 11 data frames
Names <- function(x) {
  names(x) <- c("Close")
  return(x)
}
# Apply name change to all 11 data frames
file.list <- lapply(file.list, Names)

In the above we made 11 random data frames. We stored every data frame in a list, inside the file.list variable. We can print the first data frame with file.list[[1]] or the last with file.list[[11]].

We create a function to rename the column names. We use lapply to run this over every data frame in the list from data frame file1 to file11.

Now that we have our dummy data, we may create a function which will store the commands we wish to run on each and every data frame.

# Create function for performing commands.
    genSMA = function(x){
      nextfile <- data.frame(x, stringsAsFactors=FALSE)  # the data frame passed in by the loop
      new.df <- data.frame(nextfile)
      # Load packages 
      require(TTR)
      # Use TTR package to create rolling SMA n day moving average 
      getSMA <- function(numdays) {
        function(new.df) {
          SMA(new.df[,"Close"], numdays)    # Calls TTR package to create SMA
        }
      }
      # Create a matrix to put the SMAs in
      sma.matrix <- matrix(nrow=nrow(new.df), ncol=0)
      tail(sma.matrix)
      # Loop for filling it
      for (i in 2:12) {
        sma.matrix <- cbind(sma.matrix, getSMA(i)(new.df))
      }

      # Rename columns
      colnames(sma.matrix) <- sapply(2:12, function(n)paste("close.sma.n", n, sep=""))

      # Bind to existing dataframe
      new.df <-  cbind(new.df, sma.matrix)

    }

The above function calculates simple moving averages on the Close column, from a 2-period up to a 12-period SMA. This is a very simplistic example; in reality there may be full scripts and a multitude of calculations or operations within this function.

So we have our function with the calculations we wish to perform over all 11 data frames stored in our list. Next thing to do is write a for loop which will call the function and iterate it over every data frame in our file.list of data frames.

# Loop for running function over all data frames
for (i in 1:length(file.list)){
  tryCatch({
    genSMA(file.list[[i]])   # Start from data frame 1
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

After running the full code we should get the error:
[1] "i = 8 failed:"

This is because we purposely set up data frame file8 with only 8 data points, which is less than the 12 required for the 2:12 simple moving averages. For a 12-period simple moving average we need at least 12 data points, so the code throws an error.

Under normal conditions, without using tryCatch, the loop would break and we would then have to remove the offending file or data frame and run the loop again. When dealing with only a few files this is not an issue, but if you are processing thousands of files it's a real inconvenience!

tryCatch also prints which iteration failed, so we may perform further due diligence.

Note we can also modify the loop and function to run over a directory on our hard drive, if that's where our data is stored.

We can do this with:

# Specify directory where files are stored 
 files = list.files(path = "C:/R Projects/Data/", pattern = ".", full.names = TRUE)

Then inside our function we can:

  genSMA = function(x){
        nextfile <- read.table(x, header=TRUE, sep=",", stringsAsFactors=FALSE)  # reads the file passed in by the loop
        new.df <- data.frame(nextfile)  # put it inside a data frame

        # ... all your commands here ...

  }

And we call the loop in the same way. This time we want to run the loop over every single file in our directory, which we stored in the files variable.

# Loop for running function over all data frames
for (i in 1:length(files)){
  tryCatch({
    genSMA(files[[i]])
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

Here we run the function over every single file in the directory specified in the files variable. Note I do not specify any output of the processing above; we may output a plot or store the final results in a data frame. I will revisit this! However, the main topic is addressed: we successfully ignore a failure in the loop and continue to process the remaining iterations.

Strategy – S&P500 Seasonal

In a previous post, S&P500 Seasonal Study + Other Commodities (https://flare9xblog.wordpress.com/2017/10/12/seasonal-study/), it was determined that the best months to be long the S&P500 were October through the end of May.

As a refresher:

Rplot126
S&P500 Mean Daily Spread Per Month

We back test this theme to see how it performs over monthly bars from 1960 to present day (10.12.2017):

Rplot136

SP500seasoal.results

What we see is that we are invested 67% of the time, being long from the open of the first October bar and selling at the close of May. We do beat the S&P500, and I believe this is partly due to drawing down less during the bear markets after 1999. Our maximum drawdown is slightly less than buy and hold, at 43%.

What stands out is the average gain during this October to May holding period. We see that the average trade is 14.3%.

As a relative benchmark, let's see what it looks like only buying the May open and selling at the close of September:

Rplot137

may to sep

> Return.cumulative(xts1, geometric = TRUE)
                        [,1]
Cumulative Return -0.1973985

The cumulative return for buying the May open and selling the September close is -0.1973985; that is the growth of $1. Essentially May to September is flat to negative. The cumulative returns do not even warrant shorting May to September.

In closing, we see that the majority of the S&P500's gains seem to be attributable to seasonal trends, specifically buying in October and selling in May.

Full back test R Code for the above


# S&P500 Seasonal Back Test 
# Month Bars
# Buy Open of October, Sell End of April (Based on seasonal study)
# Andrew Bannerman 10.12.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "C:/R Projects/Final Scripts/Seasonal Strategies/Data"
data.read.spx <- paste(data.dir,"$SPX.Monthly.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Make dataframe 
new.df <- data.frame(read.spx)

# Convert Values To Numeric 
cols <-c(3:8)
new.df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date Column [1]
new.df$Date <- ymd(new.df$Date)

# Add day of month column 
# This is a helper column for creating buy/sell rules
group <- new.df %>% dplyr::mutate(month = lubridate::month(Date)) %>% group_by(month) 
new.df <- data.frame(new.df,group$month)

# Calculate Returns from open to close 
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Subset by Date
new.df <- subset(new.df, Date >= as.POSIXct("1960-01-01") )  

# Enter Long Rules 
# Buy October To End of April
x = new.df$group.month
# October to May Hold Period Rules
#new.df$enter <- ifelse(x %in% c(1,2,3,4,5,10,11,12), 1, 0)
# May to September Hold Period Rules
new.df$enter <- ifelse(x %in% c(5,6,7,8,9), 1, 0)

# Calculate equity curves
# Long
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(enter)) %>%
  group_by(RunID) %>%
  dplyr::mutate(seasonal.equity = ifelse(enter == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(new.df$seasonal.equity, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d")) 
xts2 = xts(new.df$clret, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d")) 
str(new.df)

# Join XTS together 
compare <- cbind(xts1,xts2)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("Seasonal","Buy And Hold")
charts.PerformanceSummary(compare,main="Long May Open To Close of September", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

# Find net trade result of multiple 'n' day trades
# Find start day of trade, find end day, perform (last(Close) - first(Open))/first(Open) % calculation
new.df <- new.df %>%
  dplyr::mutate(RunID = data.table::rleid(enter)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(enter == 0, 0,
                                     ifelse(row_number() == n(),
                                            (last(Close) - first(Open))/first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# Win / Loss % 
# All Holding Months
winning.trades <- sum(new.df$seasonal.equity > 0, na.rm=TRUE)
losing.trades <- sum(new.df$seasonal.equity < 0, na.rm=TRUE)
even.trades <- sum(new.df$seasonal.equity == 0, na.rm=TRUE)
total.days <- NROW(new.df$seasonal.equity)

# Multi Month Trades
multi.winning.trades <- sum(new.df$perc.output > 0, na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < 0, na.rm=TRUE)
multi.total.days <- multi.winning.trades+multi.losing.trades

# % Time Invested
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades

# Calcualte win loss %
# All months
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Month Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss 
# All Months
remove.zero <- new.df[-which(new.df$seasonal.equity == 0 ), ] # removing rows 0 values 
consec.wins <- max(rle(sign(remove.zero$seasonal.equity))[[1]][rle(sign(remove.zero$seasonal.equity))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$seasonal.equity))[[1]][rle(sign(remove.zero$seasonal.equity))[[2]] == -1])
consec.wins

# Multi Month Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values 
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# All Months
average.trade <- mean(new.df$seasonal.equity)
average.win <- mean(new.df$seasonal.equity[new.df$seasonal.equity >0])
average.loss <- mean(new.df$seasonal.equity[new.df$seasonal.equity <0])
median.win <- median(new.df$seasonal.equity[new.df$seasonal.equity >0])
median.loss <- median(new.df$seasonal.equity[new.df$seasonal.equity <0])
max.gain <- max(new.df$seasonal.equity)
max.loss <- min(new.df$seasonal.equity)
win.loss.ratio <- winning.trades / abs(losing.trades)
summary <- cbind(winning.trades,losing.trades,even.trades,total.days,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Even Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Month Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output >0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output <0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output >0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output <0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.total.days,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
Return.cumulative(xts1, geometric = TRUE)

# Write output to file
write.csv(new.df,file="C:/R Projects/SP500.Seasonal.csv")

R – Scrape List Of ETFs From Nasdaq.Com

Using the data.table package and its fread() function, we can read a .csv file hosted on the internet without first downloading it to our hard drive.

# Obtain List of ETFS From nasdaq.com
# Andrew Bannerman 10.4.2017

library(data.table)

# Read ETF list csv file from nasdaq.com
# Use fread() from data.table package 
# install.packages("data.table")
read.data <- fread("http://www.nasdaq.com/investing/etfs/etf-finder-results.aspx?download=Yes")

# Subset Column 1, Symbol Column
symbol.col <- read.data$Symbol

# Export symbol column as .txt file 
write.table(symbol.col,"C:/R Projects/Data/etf_list_nasdaq_dot_com/etf.list.txt",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)

# Count ETF tickers 
NROW(symbol.col)

As of 10.4.2017 there are a total of 1,726 ETFs listed on nasdaq.com.

Stock GVP – Mean Reverting Series

Let us explore the ticker symbol GVP. We will test for mean reversion with the Hurst exponent and calculate the half-life of mean reversion.

First, let's plot the daily closing prices:

library(ggplot2)
ggplot(new.df, aes(x = Date, y = Close))+
geom_line()+
labs(title = "GVP Close Prices", subtitle = "19950727 to 20170608")+
theme(plot.title = element_text(hjust=0.5),plot.subtitle = element_text(hjust=0.5,size=9), plot.caption = element_text(size=7))

[Plot: GVP daily closing prices, 19950727 to 20170608]

Let's run the Hurst exponent to test for mean reversion; we will do this over the entire history of GVP. For this test we will use short-term lags of 2:20 days (Explanation Here).

# Hurst Exponent
# Andrew Bannerman
# 8.11.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(zoo)
require(lattice)

# Data path
data.dir <- "D:/R Projects"
output.dir <- "D:/R Projects"
data.read.spx <- paste(data.dir,"GVP.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Convert Values To Numeric
cols <-c(3:8)
read.spx[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date Column [1]
read.spx$Date <- ymd(read.spx$Date)

# Make new data frame
new.df <- data.frame(read.spx)

# Subset Date Range
#new.df <- subset(new.df, Date >= "2000-01-06" & Date <= "2017-08-06")
#new.df <- subset(new.df, Date >= as.Date("2017-01-07") ) 

#Create lagged variables
lags <- 2:20

# Function for finding differences in lags. Today's close - close 'n' days ago
getLAG.DIFF <- function(lagdays) {
  function(new.df) {
    c(rep(NA, lagdays), diff(new.df$Close, lag = lagdays, differences = 1))
  }
}
# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(new.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n)paste("lagged.diff.n", n, sep=""))

# Bind to existing dataframe
new.df <-  cbind(new.df, lag.diff.matrix)
head(new.df)

# Calculate Variances of 'n period' differences
variance.vec <- apply(new.df[,9:ncol(new.df)], 2, function(x) var(x, na.rm=TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))
# Print general linear regression statistics
summary(log.linear)
# Plot log of variance 'n' lags vs log time
xyplot(log(variance.vec) ~ log(lags),
       main="GVP log variance of price diff Vs log time lags",
       xlab = "Time",
       ylab = "Logged Variance 'n' lags",
       grid = TRUE,
       type = c("p","r"),col.line = "red",
       abline=(h = 0)) 

hurst.exponent = coef(log.linear)[2]/2
hurst.exponent

[Plot: GVP log variance of price differences vs log time lags]

[Output: linear regression summary]

The variance of n-day price differences scales roughly as n^(2H), so the slope of the log(variance) vs log(lags) regression estimates 2H. Dividing the log(lags) coefficient by 2 we obtain a Hurst exponent of 0.4598435.

Remember: H less than 0.5 = mean reversion.

H equal to 0.5 = random walk.

H greater than 0.5 = momentum.

Great.

Let's apply a simple linear strategy to see how it performs over this series. We will set up a rolling z-score, buy when the z-score crosses below 0, and sell when it crosses back over 0. We use an arbitrarily chosen lookback of 10 days for this.
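Here is a minimal sketch of that rule in R, purely as an illustration and not the exact code behind the plots below. It assumes new.df holds the GVP dates and closes in chronological order; the zscore, signal, position and equity columns are names I have introduced:

# Rolling z-score mean reversion sketch
library(zoo)

lookback <- 10  # arbitrarily chosen lookback

# Rolling mean / sd of the close and the resulting z-score
new.df$roll.mean <- rollapply(new.df$Close, lookback, mean, fill = NA, align = "right")
new.df$roll.sd   <- rollapply(new.df$Close, lookback, sd, fill = NA, align = "right")
new.df$zscore    <- (new.df$Close - new.df$roll.mean) / new.df$roll.sd

# Long while the z-score is below 0 (equivalent to buying the cross below 0
# and selling the cross back above 0)
new.df$signal <- ifelse(new.df$zscore < 0, 1, 0)

# Lag the signal one bar so a signal at today's close is held over the next day
new.df$position <- c(NA, head(new.df$signal, -1))

# Daily close-to-close returns and strategy returns
new.df$ret       <- c(NA, diff(new.df$Close) / head(new.df$Close, -1))
new.df$strat.ret <- new.df$position * new.df$ret

# Compounded growth of $1
new.df$equity <- cumprod(1 + ifelse(is.na(new.df$strat.ret), 0, new.df$strat.ret))
tail(new.df$equity, 1)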

Here are the results:

[Plot: compounded growth of $1, 10-day lookback, 1995 to present]

The above plot shows the compounded growth of $1: since 1995, $1 has grown to over $800, a gain of more than 79,900%.

Next let's calculate the half-life of mean reversion. We do this with linear regression. For the dependent variable we use the price difference between today's close and yesterday's close. For the independent variable we use yesterday's close minus the mean of the lagged closes. The half-life is then -log(2) divided by the regression slope.

Note we use the previous 100 days of data to produce this test:

# Calculate y(t-1) and y(t) - y(t-1)
# 'random.data' holds the close-price vector (here the previous 100 closes)
y.lag <- c(random.data[2:length(random.data)], 0)        # Shift the series by one observation
y.lag <- y.lag[1:(length(y.lag) - 1)]                    # Drop the padded element at the end of the shifted vector
random.data <- random.data[1:(length(random.data) - 1)]  # Trim the original series to the same length
y.diff <- random.data - y.lag                            # Price difference: today's close minus yesterday's close
y.diff <- y.diff[1:(length(y.diff) - 1)]                 # Adjust length of vector
prev.y.mean <- y.lag - mean(y.lag)                       # Yesterday's close minus the mean of the lagged closes
prev.y.mean <- prev.y.mean[1:(length(prev.y.mean) - 1)]  # Adjust length of vector
final <- merge(y.diff, prev.y.mean)                      # Merge the two series
final.df <- as.data.frame(final)                         # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

We obtain a half-life of 4.503093 days.
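Using the half_life value from the regression above, the earlier z-score sketch could be re-run with the lookback set to the rounded half-life; a one-line illustration, again not the exact code used for the plots:

# Set the z-score lookback to the mean-reversion half-life, rounded to whole days
lookback <- as.numeric(round(half_life))   # 4.503093 -> 5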

Next let's see if we can set our linear strategy lookback period equal to the half-life to see if it improves results. The original lookback period was 10 days, chosen arbitrarily. The result of a lookback of 4.5, rounded to 5 days, is below:

[Plot: compounded growth of $1, 5-day lookback, 1995 to present]

From 1995 to roughly the present day the result did not improve significantly, but looking at the plot we see a large uptick in the equity curve from 2013 onwards. Let's subset our data to only include data post-2013 and re-run both the 10-day and the 5-day lookbacks to see whether optimizing with the mean-reversion half-life helps.

First, the result of the arbitrarily chosen 10-day lookback:

[Plot: compounded growth of $1, 10-day lookback, post-2013]

We see that $1 has grown to $8, a 700% increase.

Next, the lookback of 4.5 rounded to 5 days, derived from the mean-reversion half-life calculation:

[Plot: compounded growth of $1, 5-day lookback, post-2013]

Using a lookback equal to the mean-reversion half-life (5 days rounded), $1 has grown to over $15, a roughly 1400% increase.

Let's run the Hurst exponent on both periods: the first from 1995 to 2013, the second from 2013 to roughly the present day:

1st test: We see H = 0.4601632
2nd: We see H = 0.4230494

OK, so we see the Hurst exponent become more mean-reverting post-2013. If we test >= 2016 and >= 2017 we see H = 0.3890816 and 0.2759805 respectively.
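As a rough sketch of how these sub-period values can be reproduced, the variance-of-lagged-differences regression from earlier can be wrapped in a small helper and applied to date subsets. The function name and the example date ranges here are my own, for illustration only:

# Hurst exponent over a date range, using the same 2:20 day lags as above
hurst_subset <- function(df, start, end, lags = 2:20) {
  sub <- subset(df, Date >= as.Date(start) & Date <= as.Date(end))
  # Variance of the n-day price differences for each lag
  variances <- sapply(lags, function(n) var(diff(sub$Close, lag = n), na.rm = TRUE))
  # Slope of log(variance) vs log(lag) estimates 2H
  fit <- lm(log(variances) ~ log(lags))
  unname(coef(fit)[2] / 2)
}

hurst_subset(new.df, "1995-07-27", "2012-12-31")
hurst_subset(new.df, "2013-01-01", "2017-06-08")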

Next let's choose a random time frame between 1995 and 2013.

For the period 2000 to 2003, H = 0.5198083, which is closer to a random walk.

If we look at the period 2003 to 2008 we get an H value of 0.4167166, which is more mean reverting; however, this H value of 0.41 is actually lower than the post-2013 H value of 0.4230494. So in this case the H value alone did not tell us that because H is this, the gains should be that.

This might be caused by other factors: frequency of trades, price range, the size of the fluctuations, etc.

Note this post is largely theoretical; no commissions are included in any of the trades. It demonstrates how statistical tools can be combined with a back test.