Parameter Sensitivity Analysis

In this post we demonstrate ideas to test for parameter sensitivity.

Here we have a strategy with five parameters: three are look back periods for a specific indicator, and the other two are an entry level threshold and an exit level threshold.

I decided to vary each parameter by up to 50% in either direction of its original value. The originals might look something like this:

# Original
param1 = 116
param2 = 10
param3 = 69
entry_level = 6
exit_level = 85

So if you multiply any of the parameters by 0.5 you obtain the half-width of its range. In this example take the first parameter, param1: 116 * 0.5 = 58. Subtracting and adding 58 gives the new lower and upper limits, so the range becomes:

58 to 174.

This creates a range for each parameter covering up to a maximum 50% change in either direction of the original. Do this for all parameters. Note that if your indicator is bounded, at 100 for example, then that bound becomes your upper limit rather than the full 50% range. The same applies in the opposite direction: the lower limit is floored at 0.

Next, run a Monte Carlo simulation drawing randomly from the parameter ranges. I tend to run 5,000 iterations and then look at the distribution. An example output of this process is below:

[Figures: distributions of average trade profit, win % and net ending balance]

Here we show some key metrics. The first is average trade profit. This is an important value because we want to know what we make per trade on average, and it has to be high enough to withstand slippage and commissions.

Next we have the win %. Depending on which money management system you are using, this is often an input variable in money management calculations, the half-Kelly formula being one example.
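To illustrate how the win % feeds a money management rule, a half-Kelly fraction can be sketched as below. The example win rate and payoff ratio are made up for illustration, not taken from the strategy above.

```r
# Half-Kelly position size fraction
# Full Kelly: f = W - (1 - W) / R, where W = win %, R = avg win / avg loss
half_kelly <- function(win.pct, payoff.ratio) {
  f <- win.pct - (1 - win.pct) / payoff.ratio
  f / 2  # trade half the full Kelly fraction to reduce drawdown risk
}
half_kelly(0.55, 1.5)  # hypothetical 55% win rate, 1.5 payoff ratio -> 0.125
```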

And of course we have the net ending balance of the strategy itself less commissions (I didn't add slippage in this example).

The plots above show my original parameter setting results, marked by a straight vertical red line on each plot.

This red line represents an optimal combination, again found through a random search. As we can see it is not THE optimum: to the right side of the red line there are parameter sets which beat the original.

Besides this, what can we learn? By running 5,000 iterations and randomly selecting each parameter from a range covering up to a 50% change in either direction from the original, we still land in positive expectancy across our selected trading metrics. We may observe the mean values below:

[Table: mean values of the selected trading metrics]

Needless to say, the random parameter selection might not choose the most extreme 50% change at any given iteration. Thus across the 5 parameters we would be combining a range of changes in either direction up to a 50% maximum. One iteration might look like:

Param 1 = +/- 10% change from original

Param 2 = No change from original

Param 3 = +/- 40% change from original

Param 4 = +/- 12% change from original

Param 5 = +/- 5% change from original

And another run might look like:

Param 1 = +/- 45% change from original

Param 2 = +/- 8% change from original

Param 3 = +/- 2% change from original

Param 4 = +/- 22% change from original

Param 5 = +/- 34% change from original

So on and so forth for 5000 iterations.
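The sampling described above can be sketched in a few lines of R. The parameter names and bounds follow the example earlier in the post; the back test itself is omitted, and the 0-100 cap on the threshold parameters is an assumption for illustration.

```r
# Monte Carlo parameter sensitivity sketch:
# draw each parameter uniformly from +/- 50% of its original value,
# clamping to natural bounds (floor at 0, cap bounded thresholds at 100)
set.seed(42)
originals <- c(param1 = 116, param2 = 10, param3 = 69, entry_level = 6, exit_level = 85)
lower <- pmax(originals * 0.5, 0)                            # 50% below, floored at 0
upper <- pmin(originals * 1.5, c(Inf, Inf, Inf, 100, 100))   # 50% above, thresholds capped at 100
n.iter <- 5000
param.sets <- sapply(seq_along(originals), function(j) runif(n.iter, lower[j], upper[j]))
colnames(param.sets) <- names(originals)
# Each row of param.sets is one random parameter combination;
# run the back test on each row and collect the metric distribution
head(param.sets, 3)
```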

How do you obtain confidence that you can protect your initial strategy investment when going live?

This type of study essentially says we may place a strategy live and have a degree of confidence in changing parameters.

What would I do in real life?

I would place 3 parameter bands in live trading. Whichever parameter set triggers first is the trade that is taken. Thus I am covering a parameter band, and I am covered if the market drifts away from my original parameters.

Food for thought!

If you have an interest in this type of research feel free to drop me a line.

Follow me: @flare9x / Github

Stress Testing an Intraday Strategy Through Monte Carlo Methods

This is an intraday ES strategy that I was testing for a client. The client was not interested in it due to the low frequency of trades, hence I am posting it for others to view. It shows how a strategy was proved out through stress testing and by looking for optimal conditions to apply the strategy. The steps for strategy development are below:

Strategy Development Example – Andrew Bannerman – 1.28.2018
Bannerman1985@gmail.com

Introduction:
Follow a scientific process in which a strategy is developed and stress tested before live incubation /
trading.

Tools used: R Statistical Programming Language

Strategy Development Procedure:
1. General view, general statistics, tail risk
2. Initial Back Testing
3. Walk Forward Analysis (Cross Validation), Rolling and Fixed
4. Parameter Sensitivity Analysis (Random adjustments 0 to 50% parameter change, n times)
5. Draw down expectation (sample without replacement net trade result, n times)
6. Re-sample original time series (Maximum Entropy Bootstrapping, n times) and run strategy over new
series.
7. Market Random Test (Test if strategy beats buying randomly)
8. Noise Test (Adding random fixed % to open, high, low, close, simulate back test n times)
9. Strategy Seasonality
10. Layering non-correlated strategies
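As a flavour of step 5, the drawdown-expectation resampling can be sketched as below. The trade results here are randomly generated stand-ins, not the client strategy's PnL.

```r
# Drawdown expectation: shuffle net trade results n times (sample without replacement)
# and record the maximum drawdown of each shuffled equity curve
set.seed(1)
trade.pnl <- round(rnorm(100, mean = 50, sd = 400), 2)  # hypothetical net $ per trade
n <- 1000
max.dd <- replicate(n, {
  equity <- cumsum(sample(trade.pnl))   # shuffled $ equity curve
  max(cummax(equity) - equity)          # deepest $ drawdown of this ordering
})
quantile(max.dd, c(0.5, 0.95))          # median and 95th percentile drawdown expectation
```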

Please see attached .pdf.

Download – Intraday Trading Strategy Development Using R – pdf

Let me know if you have any questions!

Optimal f – Monte Carlo R Implementation

As we go through the stages of strategy robustness testing, I thought it would be prudent to visit the subject of position sizing. World Cup Championship trader Kevin Davey (2014) uses Monte Carlo simulation to find the optimal f position size fraction. After finding optimal f, the share/contract number can be found with:

N = f * equity / L

f = optimal f (found through simulation or pre-selected)

L = worst historical loss (absolute dollar value). Note: our worst draw down is often still ahead of us, so one may want to increase this value.

If a trader wanted to risk a fraction of equity on a trade, for example 0.4 or 40% of total equity, where equity is $25,000 and the historical worst loss was $500:

Nshares = 0.4 * 25,000 / 500  = 20 shares
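The worked example above can be wrapped in a small helper function (the function name is mine, not from Davey's text):

```r
# N = f * equity / L
optimal_f_shares <- function(f, equity, worst.loss) {
  floor(f * equity / abs(worst.loss))  # round down to whole shares/contracts
}
optimal_f_shares(0.4, 25000, -500)  # reproduces the example: 20 shares
```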

We implement an optimal f simulation in R and plot various performance metrics against the fraction (f): cumulative return, maximum draw down, risk of ruin and Sharpe ratio. In the code example we run 252 days' worth of random data in dollar gain/loss amounts, then adjust these outputs by varying the fraction (f). It should be noted that a real strategy's dollar PnL may be added to this code to run simulations; there is a section that will sample with or without replacement to obtain many iterations of a back test PnL.

An actual strategy back test has a set sequence of wins and losses, and by using Monte Carlo simulation we may mix up the original trade sequence. We may run thousands or millions of iterations to obtain a distribution of performance metrics, which may provide a more realistic expectancy going forward. The R code to perform optimal f simulations:

# Optimal F Simulation In R
# Make random returns
# Adjust $ return output by varying the fraction (f)
#######################################################################
# Calculations
# Calculation 1 (first trade): equity[1] = 1 + f * trade gain/loss / abs(worst loss)
# Calculation 2 (subsequent): equity[i] = equity[i-1] * (1 + f * trade gain/loss / abs(worst loss))
#####################################################################

# Required Packages
require(reshape2)
require(data.table)           # setDT()
require(xts)                  # xts()
require(TTR)                  # ROC()
require(PerformanceAnalytics)

# Set (f) values
f <- seq(0, 1, by = 0.05)

# Create random $ amount gains and losses
random.rets <- ifelse(runif(252) < 0.40, -1, 1) * round(252 * runif(252), 2)

# Sample the original return stream (note this may be real life gains and losses from a strategy)
# One may wish to enter actual trade results here
# and sample with or without replacement
rets <- sample(random.rets, replace = TRUE) # Remove replace=TRUE for without replacement
# rets <- random.rets # Uncomment to run on the original (unshuffled) sequence instead

# Build the equity curve for each fraction (f)
# equity[1] = 1 + f * rets[1] / abs(worst loss)
# equity[i] = equity[i-1] * (1 + f * rets[i] / abs(worst loss))
# cumprod() performs the same recursive multiplication as a per-f loop
out <- sapply(f, function(fr) cumprod(1 + fr * rets / abs(min(rets))))
colnames(out) <- f

######################################################
# Find cumulative return for any given fractional (f)
#######################################################
cr.list <- list()
i=1
for (i in 1:21) { 
  cr.rets <- ROC(out[,i],type = "discrete",na.pad=TRUE)
  cr.list[[i]] <- rbind(as.numeric(as.data.frame(Return.cumulative(xts(cr.rets, Sys.Date()-length(cr.rets):1)))))
}

# cbind output 
cr_out <- do.call(cbind,cr.list)
cnames <- f
colnames(cr_out) <- cnames
# Plot cumulative return 
cr_plot <- melt(cr_out)
cr_plot <- setDT(cr_plot, keep.rownames=TRUE)
cr_plot <- data.frame(cr_plot)
plot(cr_plot$Var2,cr_plot$value,type="o",col="red",ann=FALSE,xaxt="n")
title("Cumulative Return Vs Fractional (f)",xlab="Fractional (f)", ylab="Cumulative Return (%)")
axis(side = 1, at = f, las=2)

# Plot for final $ equity
#plot.df <- out[nrow(out),] # Extract last row of simulation
#plots <- melt(plot.df)
#plots <- setDT(plots, keep.rownames=TRUE)
#plots <- data.frame(plots)
# Plot
#plot(plots$rn,plots$value,type="o",col="red", ann=FALSE)
#title("Final Equity Vs Fractional (f)",xlab="Fractional (f)", ylab="Final Equity ($)")

################################################################################
# Find maximum drawdown for any given fractional (f)
################################################################################
dd <- data.frame(out)
cnames <- f
colnames(dd) <- cnames

# loop to pull max dd 
dd.list <- list()
i=1
for (i in 1:length(dd)) { 
 dd.rets <- ROC(dd[,i],type = "discrete",na.pad=TRUE)
  dd.list[[i]] <- rbind(as.numeric(as.data.frame(maxDrawdown(xts(dd.rets, Sys.Date()-length(dd.rets):1)))))
}

# Validation
#rets <- ROC(dd[,2],type = "discrete",na.pad=TRUE)
#maxDrawdown(xts(rets, Sys.Date()-length(rets):1))
#chart.Drawdown(xts(rets$rets, Sys.Date()-nrow(rets):1), legend.loc = NULL, colorset = (1:12))

# cbind output 
dd_out <- do.call(cbind,dd.list)
cnames <- f
colnames(dd_out) <- cnames

# Plot DD vs fractional (f)
  dd.plot <- melt(dd_out)
# Plot
plot(dd.plot$Var2,dd.plot$value,type="o",col="red", ann=FALSE,xaxt="n")
title("Maximum Drawdown Vs Fractional (f)",xlab="Fractional (f)", ylab="Maximum Draw Down (%)")
axis(side = 1, at = f, las=2)

# Combine Plots 
par(mfrow=c(2,2))
# Cumulative Return
plot(cr_plot$Var2,cr_plot$value,type="o",col="red", ann=FALSE, xaxt="n")
title("Cumulative Return Vs Fractional (f)",xlab="Fractional (f)", ylab="Cumulative Return (%)")
axis(side = 1, at = f, las=2)
# Maximum Draw down
plot(dd.plot$Var2,dd.plot$value,type="o",col="red", ann=FALSE,xaxt="n")
title("Maximum Drawdown Vs Fractional (f)",xlab="Fractional (f)", ylab="Maximum Draw Down (%)")
axis(side = 1, at = f, las=2)

# Find win / loss % of sample
pw <- sum(rets >0) / length(rets)
pl <- sum(rets <0)/ length(rets)
pw
pl

###############################################################
# Find Risk of ruin for any given fractional (f)
###############################################################
# Calculate daily returns
rets.list <- list()
mean.win.list <- list()
mean.loss.list <- list()
sd.ret.list <- list()
i=1
for (i in 1:21) { 
  rets.rets <- ROC(out[,i],type = "discrete",na.pad=TRUE)
  rets.list[[i]] <- rets.rets
  mean.win.list[[i]] <- mean(rets.rets[rets.rets>0],na.rm=TRUE)
  mean.loss.list[[i]] <- mean(rets.rets[rets.rets<0],na.rm=TRUE)
    sd.ret.list[[i]] <- sd(rets.rets,na.rm=TRUE)
}

# cbind output 
rets_out <- do.call(cbind,rets.list)
cnames <- f
colnames(rets_out) <- cnames

mean_win_out <- do.call(cbind,mean.win.list)
cnames <- f
colnames(mean_win_out) <- cnames

mean_loss_out <- do.call(cbind,mean.loss.list)
cnames <- f
colnames(mean_loss_out) <- cnames

sd_out <- do.call(cbind,sd.ret.list)
cnames <- f
colnames(sd_out) <- cnames

# Fixed fractional risk of ruin
# Ralph Vince 1990 (Portfolio Management Formulas: Mathematical Trading Methods for the Futures, Options, and Stock Markets)
risk_ruin.list <- list()
q = 2
g = 0.3   # Set maximum % loss willing to accept
for (i in 1:21) {
  aw = mean_win_out[,i]
  al = mean_loss_out[,i]
  z = (abs((aw/q)*pw)) - (abs(al/q)*(1-pw))
  a = ((pw * (aw/q)^2) + ((1-pw) * (al/q)^2))^(1/2)
  p = .5 * (1+(z/a))   # note: z and a must be computed before p
  u = g/a
  risk_ruin.list[[i]] = ((1-p)/p)^u * 100
}

# cbind output 
risk_ruin_out <- do.call(cbind,risk_ruin.list)
cnames <- f
colnames(risk_ruin_out) <- cnames

# Plot risk of ruin vs fractional (f)
rr.plot <- melt(risk_ruin_out)
rr.plot <- replace(rr.plot, is.na(rr.plot), 0)
#options(scipen = 0)
# Plot
plot(rr.plot$Var2,rr.plot$value,type="o",col="red", ann=FALSE,xaxt="n")
title("Risk Of Ruin Vs Fractional (f)",xlab="Fractional (f)", ylab="Risk of Ruin (%)")
axis(side = 1, at = f, las=2)

########################################################################################
# Find sharpe ratio for any given fractional (f)
################################################################################
sr.list <- list()
i=1
for (i in 1:21) { 
  sr.rets <- ROC(out[,i],type = "discrete",na.pad=TRUE)
  sr.list[[i]] <- rbind(as.numeric(as.data.frame(SharpeRatio.annualized(xts(sr.rets, Sys.Date()-length(sr.rets):1)))))
}

# cbind output 
sr_out <- do.call(cbind,sr.list)
cnames <- f
colnames(sr_out) <- cnames
# Plot cumulative return 
sr_plot <- melt(sr_out)
sr_plot <- setDT(sr_plot, keep.rownames=TRUE)
sr_plot <- data.frame(sr_plot)
plot(sr_plot$Var2,sr_plot$value,type="o",col="red",ann=FALSE,xaxt="n")
title("Sharpe Ratio Vs Fractional (f)",xlab="Fractional (f)", ylab="Sharpe Ratio")
axis(side = 1, at = f, las=2)

############################################################
# Find optimal F 
###########################################################
find.max <- which(cr_plot$value==max(cr_plot$value))
optimal.f <- cr_plot$Var2[find.max]
cat("Optimal f for this iteration = ", optimal.f)

########################################################################################################
# Highlight code and ctrl+r to run many iterations

And the output:

[Figure: Rplot273 - cumulative return, maximum draw down, risk of ruin and Sharpe ratio vs fractional (f)]

With an optimal f of:

> cat("Optimal f = ", optimal.f)
Optimal f =  0.2

And for reference, the random sample winning / losing %:

> # Find win / loss % of sample
> pw <- sum(rets >0) / length(rets) # win
> pl <- sum(rets <0)/ length(rets)  # loss
> pw
[1] 0.5674603
> pl
[1] 0.4325397

For this particular trade sequence, we see that optimal f is 0.20. If we connect the dots: at f = 0.20 the maximum draw down is 74%, and the risk of ruin for a 30% draw down is 17.4%. One would select a lower f than optimal, or abandon the strategy altogether.

A few notes about the risk of ruin calculation: it was taken from Ralph Vince (1990) and is the R3 risk of ruin calculation as described in his book.

Next steps for the Monte Carlo simulator involve adding in the $ PnL of back tested results and running thousands of simulations. I will be working on this over the coming month.

Full code on my github.

Thanks for reading, any questions drop me a line.

Testing A Strategy For Robustness With Time Series Cross Validation

In two previous posts we attempted to increase our sample size for back testing using time series bootstrapping. One method used the maximum entropy bootstrap for time series via the meboot R package; the post can be found here. The second method involved a block bootstrap, sampling with replacement (one may edit the code for without); that post can be found here.

After questioning the validity of the above procedures, we now focus on applying cross validation techniques to our time series data. Instead of optimizing over the full data set, which can lead to over fitting, we split the data into many train (in-sample) and test (out-of-sample) periods. A good explanation can be found here.

In this example, each train set spans a total of 252 rows, which equates to 1 year of trading data, and each test period spans a total of 126 rows, which equates to 6 months of trading data. For this example we will use the VXV / VXMT sma ratio strategy. As a reminder the rules are as follows:

  1. long VXX when VXV/VXMT ratio is over 1 & the VXV/VXMT ratio is over an ‘n’ period sma of the VXV/VXMT ratio.
  2. long XIV when VXV/VXMT ratio is less than 1 & the VXV/VXMT ratio is below an ‘n’ period sma of the VXV/VXMT ratio.

The free parameters which can be optimized are the VXV/VXMT ratio value and also the VXV/VXMT simple moving average look back period.

We optimize the VXV/VXMT ratio sma look back period over sma lengths of 2 to 170. Each sma length is optimized over each train set and ranked by its Sharpe ratio; the length that produced the highest Sharpe ratio is then used in the following test set period.

An example of the procedure: after optimizing the sma look back on train set 1, rows 1 to 253 [1 + 252 = 253], we use the sma look back which produced the highest Sharpe ratio in the next test set, rows 253 to 378 (+125 rows, inclusive of the start row). After this test period we move the whole train window up by 126 rows, making train set 2 span rows 127 to 379 (+252 row width); the next test set then begins on row 379 and runs to 504 (+126 row width, inclusive of the start row). After this test set we move the train window up another 126 rows, and so on until we reach the end of the series.
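The window arithmetic above can be checked with a few lines of R (the series length here is a made-up stand-in; the 252/126 row widths follow the text):

```r
# Rolling walk-forward window start rows
train.length <- 252
test.length  <- 126
n.rows <- 1000                                   # hypothetical series length
train.starts <- seq(1, n.rows - train.length, by = test.length)
test.starts  <- train.starts + train.length      # each test set begins where its train set ends
head(train.starts)  # 1, 127, 253, ... matching the train set rows in the text
head(test.starts)   # 253, 379, 505, ... matching the test set rows in the text
```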

Before splitting into train and test sets, we first calculate the VXV/VXMT ratio and all of its SMAs continuously over the full series. This gives every window the long run history, so we do not have to calculate anything on each smaller train/test block.

Next we run the back test over all train set periods and record the highest sharpe ratio in each train period, the code that achieves this:


# Train / Test Set VXV / VXMT Strategy 
# Andrew Bannerman 

################################################
# Procedure 
# 1. Split data into train / test sets 
# 2. Select 1 year look back for each train set (252 days)
# 3. Select 6 months test set (126)
# 4. Re-calibrate on a moving window through entire time series 
#################################################

# Required Packages 
library(readxl)
require(xts)
require(data.table)
require(ggplot2)
require(RColorBrewer)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)

#################################################
# 1. Load Data 
#################################################
# Load synthetic data and join to alpha vantage adjusted prices
# Load synthetic VXX and XIV data 
synth <- read_excel("D:/R Projects/Final Scripts/VIX_term_structure/vix-funds-models-no-formulas.xls", col_names = TRUE)
synth1 <- read_excel("D:/R Projects/Final Scripts/VIX_term_structure/vix-mt-funds-models-no-formulas.xls", col_names = TRUE)
synth <- as.data.frame(synth)
synth1 <- as.data.frame(synth1)
# Extract synthetic series 
vxx.synth <- data.frame(synth$Date, synth$'VXX calc')
xiv.synth <- data.frame(synth$Date, synth$'XIV calc')
ziv.synth <- data.frame(synth1$Date, synth1$'ZIV calc')
vxz.synth <- data.frame(synth1$Date, synth1$'VXZ calc')
colnames(vxx.synth)[1] <- "Date"
colnames(vxx.synth)[2] <- "vxx_close"
colnames(xiv.synth)[1] <- "Date"
colnames(xiv.synth)[2] <- "xiv_close"
colnames(ziv.synth)[1] <- "Date"
colnames(ziv.synth)[2] <- "ziv_close"
colnames(vxz.synth)[1] <- "Date"
colnames(vxz.synth)[2] <- "vxz_close"
vxx.synth$Date <- ymd(vxx.synth$Date)
xiv.synth$Date <- ymd(xiv.synth$Date)
ziv.synth$Date <- ymd(ziv.synth$Date)
vxz.synth$Date <- ymd(vxz.synth$Date)

# Download data from alphavantage
VXX <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXX&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
VXX$timestamp <- ymd(VXX$timestamp)   #Lubridate to change character date to date format
VXX <- arrange(VXX,timestamp)   #dplyr to sort data frame by date ascending order
colnames(VXX)[1] <- "Date"
VXX$Date <- ymd(VXX$Date)
VXX <- as.data.frame(VXX)

XIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=XIV&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
XIV$timestamp <- ymd(XIV$timestamp)   #Lubridate to change character date to date format
XIV <- arrange(XIV,timestamp)   #dplyr to sort data frame by date ascending order
colnames(XIV)[1] <- "Date"
XIV$Date <- ymd(XIV$Date)
XIV <- as.data.frame(XIV)

# Join synthetic data to alpha vantage
vxx.synth <- subset(vxx.synth, Date <= as.POSIXct("2009-01-29"))
xiv.synth <- subset(xiv.synth, Date <= as.POSIXct("2010-11-29"))
# Subset only date and close from alpha vantage data 
VXX <- VXX[ -c(2:5, 7:9) ]  # subset adjusted close
XIV <- XIV[ -c(2:5, 7:9) ]  # subset adjusted close
colnames(VXX)[2] <- "vxx_close"
colnames(XIV)[2] <- "xiv_close"

# row bind 
VXX <- rbind(vxx.synth,VXX)
XIV <- rbind(xiv.synth,XIV)
df <- cbind(VXX,XIV)

# Download VXV and VXMT data from CBOE website
# Download VXV Data From CBOE website 
VXV_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vix3mdailyprices.csv")
VXV_cboe <- as.data.frame(VXV_cboe)
VXV_cboe <- VXV_cboe[3:nrow(VXV_cboe), ]
colnames(VXV_cboe)[1] = "Date"
colnames(VXV_cboe)[2] = "vxv_cboe_open"
colnames(VXV_cboe)[3] = "vxv_cboe_high"
colnames(VXV_cboe)[4] = "vxv_cboe_low"
colnames(VXV_cboe)[5] = "vxv_cboe_close"
VXV_cboe$Date <- mdy(VXV_cboe$Date)
cols <-c(2:5)
VXV_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Download VXMT Data from CBOE website
VXMT_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxmtdailyprices.csv")
VXMT_cboe <- as.data.frame(VXMT_cboe)
VXMT_cboe <- VXMT_cboe[3:nrow(VXMT_cboe), ]
colnames(VXMT_cboe)[1] = "Date"
colnames(VXMT_cboe)[2] = "vxmt_cboe_open"
colnames(VXMT_cboe)[3] = "vxmt_cboe_high"
colnames(VXMT_cboe)[4] = "vxmt_cboe_low"
colnames(VXMT_cboe)[5] = "vxmt_cboe_close"
VXMT_cboe$Date <- mdy(VXMT_cboe$Date)
cols <-c(2:5)
VXMT_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Join VIX3M (VXV) and VXMT CBOE data to ETF df
cboe.df <- merge(VXV_cboe,VXMT_cboe, by="Date")
start_date <- cboe.df$Date[1]
# Subset Strategy To Start At VXMT Date 
df <- subset(df, Date >= as.POSIXct(start_date) )
df <- df[,c(-3)] # Drop unused dates
df <- full_join(df, cboe.df, by = c("Date" = "Date"))

####################################################################
# Create VXV VXMT Ratio and ratio SMA 
###################################################################
# VXV / VXMT Ratio 
df$vxv.vxmt.ratio <- df$vxv_cboe_close / df$vxmt_cboe_close
# Calculate Close-to-Close returns
df$vxx.close.ret <- ROC(df$vxx_close, type = c("discrete"))
df$xiv.close.ret <- ROC(df$xiv_close, type = c("discrete"))

# Calculate SMA of VXV / VXMT ratio for each look back
sma.lookback <- 2:170
# Build a matrix of ratio SMAs, one column per look back (TTR::SMA)
sma.matrix <- sapply(sma.lookback, function(n) SMA(df[,"vxv.vxmt.ratio"], n))
# Rename columns
colnames(sma.matrix) <- paste0("ratio.sma.n", sma.lookback)
# Bind to existing dataframe
df <- cbind(df, sma.matrix)

####################################################################
# Split Data Into Train and Test Sets
# 252 days train , 126 days test 
##################################################################
# Set row numbers for splitting to train and test sets
total.row <- nrow(df) # total rows
train.length <- 252
test.length <- 125
train.list <- list()
test.list <- list()
# Set start and ending rows for subsetting by row
train_num_set <- seq(1, nrow(df), by=test.length+1)
test_num_set <- seq(train.length+1, nrow(df), by=test.length+1)
train.list<- lapply(train_num_set, function(i) df[c(i:(i+train.length)),])
test.list<- lapply(test_num_set, function(i) df[c(i:(i+test.length)),])


####################################################################
# Run Optimizations over all training windows
# Select optimal sharpe ratio 
####################################################################
train.set.results <- data.frame()
train.loop.length <- rep(1:length(test.list),each=length(sma.lookback)) # repeat each train set once per sma
sma.loop.length <- rep(sma.lookback,length(train.list)) # run every sma over every train set
train.xts <- list()
# Loop for running all smas through all train sets, note need to wrap train.loop.length and sma.loop.length in [[i]]
for (i in 1:length(train.loop.length)) {
  tryCatch({
    df.num <- train.loop.length[[i]]
  # Enter buy / sell rules
  train.temp.df <- data.frame(Date=train.list[[df.num]]$Date)
  train.temp.df$vxx.signal <- ifelse(train.list[[df.num]]$vxv.vxmt.ratio > 1 & train.list[[df.num]]$vxv.vxmt.ratio > train.list[[df.num]][,paste0("ratio.sma.n", sma.loop.length[[i]])], 1,0)
  train.temp.df$xiv.signal <- ifelse(train.list[[df.num]]$vxv.vxmt.ratio < 1 & train.list[[df.num]]$vxv.vxmt.ratio < train.list[[df.num]][,paste0("ratio.sma.n", sma.loop.length[[i]])], 1,0)
  
  # lag signal forward by two days
  # CBOE data is available next day
  train.temp.df$vxx.signal <- lag(train.temp.df$vxx.signal,2) # k=2 shifts the signal forward two rows
  train.temp.df$xiv.signal <- lag(train.temp.df$xiv.signal,2)
  
  train.temp.df[is.na(train.temp.df)] <- 0  # Set NA signals to 0
  
  # Calculate equity curves
  train.temp.df$vxx.signal.ret <-   train.temp.df$vxx.signal *  train.list[[df.num]]$vxx.close.ret
  train.temp.df$xiv.signal.ret <-   train.temp.df$xiv.signal *  train.list[[df.num]]$xiv.close.ret
  
  # Combine signals 
   train.temp.df$total.signal.ret <-  train.temp.df$vxx.signal.ret +  train.temp.df$xiv.signal.ret
  
  # Pull select columns from data frame to make XTS whilst retaining formats
  xts1 = xts(train.temp.df$vxx.signal.ret, order.by=as.Date(train.temp.df$Date, format="%y-%m-%d"))
  xts2 = xts(train.temp.df$xiv.signal.ret, order.by=as.Date(train.temp.df$Date, format="%y/%m/%d")) 
  xts3 = xts(train.temp.df$total.signal.ret, order.by=as.Date(train.temp.df$Date, format="%y/%m/%d")) 
  
  # Join XTS together
  compare <- cbind(xts1,xts2,xts3)
  
  # Use the PerformanceAnalytics package for trade statistics
  require(PerformanceAnalytics)
  colnames(compare) <- c("vxx","xiv","combined")
  #charts.PerformanceSummary(compare,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
  #performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
  #drawdown.table <- rbind(table.Drawdowns(xts3))
  #dev.off()  
  # logRets <- log(cumprod(1+compare))
  # chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal,main="Log Returns")
  
  #print(performance.table)
  #print(drawdown.table)
  cum.ret <- Return.cumulative(xts3, geometric = TRUE)
  annualized <- Return.annualized(xts3, scale = NA, geometric = TRUE)
  dd <- maxDrawdown(xts3)
  sharpe <- SharpeRatio.annualized(xts3, Rf = 0, scale = NA, geometric = TRUE)
  
  
  # Output data
  temp <- data.frame("Annualized Return" = annualized,"Annualized Sharpe" = sharpe,"Cumulative Return" = cum.ret,"Maximum Draw Down" =dd, ID=as.numeric(sma.loop.length[[i]]))
  rownames(temp) <- paste0("train.set.",train.loop.length[[i]],"_sma_",sma.loop.length[[i]])
  train.set.results <- rbind.data.frame(train.set.results,temp)
 compare <- data.frame(compare)
 compare <-  data.frame(date=index(compare), coredata(compare))
 train.xts[[i]] <- cbind.data.frame(compare)
  
  ptm0 <- proc.time()
  Sys.sleep(0.1)  
  ptm1=proc.time() - ptm0
  time=as.numeric(ptm1[3])
  cat('\n','Iteration Train Set',train.loop.length[i],'sma',sma.loop.length[i],'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}


# Separate train set results from data frame output
train.set.results <- setDT(train.set.results, keep.rownames = TRUE)[] # Set row names
train.set.results$ID <- as.numeric(train.set.results$ID)
# Subset test set results
l <- nrow(train.set.results)
train.result.list <- list()
output_num <- seq(1, l, by=length(sma.lookback))
train.result.list<- lapply(output_num, function(i) train.set.results[c(i:(i+length(sma.lookback)-1)),])

# Extract highest performance metric from each data frame within list
# Sharpe Ratio in this case
highest.sharpe <- list()
i=1
for (i in 1:length(train.result.list)) {  # for every data frame inside results list
  temp <- train.result.list[[i]][order(train.result.list[[i]]$Annualized.Sharpe),]
  highest.sharpe[[i]] <- as.data.frame(tail(temp[,6],1))
}

# Save performance metric
sharpe.df <- do.call(cbind,highest.sharpe)
sharpe.vec <- paste(sharpe.df)
sharpe.vec <- as.numeric(sharpe.vec)
cat("Optimal In-Sample Train Set Parameters:SMA of VXV/VXMT Ratio - Optimal SMAs =",sharpe.vec)

###### Sharpe Optimization Complete ##########

As we can see, there is a large degree of movement in the optimal sma look back period:

cat("Optimal In-Sample Train Set Parameters:SMA of VXV/VXMT Ratio - Optimal SMAs =",sharpe.vec)
Optimal In-Sample Train Set Parameters:SMA of VXV/VXMT Ratio - Optimal SMAs = 35 108 139 138 135 65 63 63 145 2 18 23 21 32 4 13 24 66 136 2

For train set 1 the optimal sma is 35; for train set 2, 108; for train set 3, 139; and so on.

These smas produced the highest Sharpe ratios during in-sample training. We then use these sma parameters on each out-of-sample test set, data the model has never seen before. The code to run the optimized parameters over each test set period:

#################################################################
# Run Optimal Parameters over all test sets 
#################################################################

test.set.results <- data.frame()
test.xts <- list()
test.loop.length <- rep(1:length(test.list),1) # run through all the dfs once 
optimal.sma <- sharpe.vec # run each sma over every df
i=1
################### Manual Intervention ############################ 
# Remove NAs on the last test data frame inside the list (note: this will change; adjust as needed)
#######################################################################
length.test.set <- length(test.list)
tail(test.list[[length.test.set]])
nrow.test.list <- NROW(test.list[[length.test.set]])
test.list[[length.test.set]] <- test.list[[length.test.set]][1:((nrow.test.list)-5),]
# Loop for running the optimal smas through all test sets; note: wrap test.loop.length and optimal.sma in [[i]]
for (i in 1:length(test.loop.length )) {
  tryCatch({
    df.num <- test.loop.length[i]
    # Enter buy / sell rules
    test.temp.df <- data.frame(Date=test.list[[df.num]]$Date)
    test.temp.df$vxx.signal <- ifelse(test.list[[df.num]]$vxv.vxmt.ratio > 1 & test.list[[df.num]]$vxv.vxmt.ratio > test.list[[df.num]][,paste0("ratio.sma.n", optimal.sma[[i]])], 1,0)
    test.temp.df$xiv.signal <- ifelse(test.list[[df.num]]$vxv.vxmt.ratio < 1 & test.list[[df.num]]$vxv.vxmt.ratio < test.list[[df.num]][,paste0("ratio.sma.n", optimal.sma[[i]])], 1,0)
    
    # Lag signals forward by two days
    # CBOE data is available next day
    test.temp.df$vxx.signal <- lag(test.temp.df$vxx.signal,2) # positive k shifts the signal *forward*
    test.temp.df$xiv.signal <- lag(test.temp.df$xiv.signal,2)
    
    test.temp.df[is.na(test.temp.df)] <- 0  # Set NA signals to 0
    
    # Calculate equity curves
    test.temp.df$vxx.signal.ret <-   test.temp.df$vxx.signal *  test.list[[df.num]]$vxx.close.ret
    test.temp.df$xiv.signal.ret <-   test.temp.df$xiv.signal *  test.list[[df.num]]$xiv.close.ret
    
    # Combine signals 
    test.temp.df$total.signal.ret <-  test.temp.df$vxx.signal.ret +  test.temp.df$xiv.signal.ret
    
    # Pull select columns from data frame to make XTS whilst retaining formats
    xts1 = xts(test.temp.df$vxx.signal.ret, order.by=as.Date(test.temp.df$Date, format="%Y-%m-%d"))
    xts2 = xts(test.temp.df$xiv.signal.ret, order.by=as.Date(test.temp.df$Date, format="%Y-%m-%d")) 
    xts3 = xts(test.temp.df$total.signal.ret, order.by=as.Date(test.temp.df$Date, format="%Y-%m-%d")) 
    
    # Join XTS together
    compare <- cbind(xts1,xts2,xts3)
    
    # Use the PerformanceAnalytics package for trade statistics
    require(PerformanceAnalytics)
    colnames(compare) <- c("vxx","xiv","combined")
    charts.PerformanceSummary(compare,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
    #performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
    #drawdown.table <- rbind(table.Drawdowns(xts3))
    #dev.off()  
    # logRets <- log(cumprod(1+compare))
    # chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal,main="Log Returns")
    
    #print(performance.table)
    #print(drawdown.table)
    cum.ret <- Return.cumulative(xts3, geometric = TRUE)
    annualized <- Return.annualized(xts3, scale = NA, geometric = TRUE)
    dd <- maxDrawdown(xts3)
    sharpe <- SharpeRatio.annualized(xts3, Rf = 0, scale = NA, geometric = TRUE)
    
    
    # Save test set results
    temp <- data.frame("Annualized Return" = annualized,"Annualized Sharpe" = sharpe,"Cumulative Return" = cum.ret,"Maximum Draw Down" =dd, ID=as.numeric(optimal.sma[[i]]))
    test.set.results <- rbind.data.frame(test.set.results, temp)
    test.xts[[i]] <- cbind.data.frame(compare)
    
    ptm0 <- proc.time()
    Sys.sleep(0.1)  
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration Test Set',test.loop.length[i],'optimal.sma',optimal.sma[i],'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Extract xts3 cumulative returns 
cum.rets <- do.call(rbind,test.xts)
cum.rets <- setDT(cum.rets, keep.rownames = TRUE)[] # Set row names
colnames(cum.rets)[1] <- "Date"
cum.rets[is.na(cum.rets)] <- 0
cum.rets$Date <- ymd(cum.rets$Date)
cum.rets.xts <- xts(cum.rets$combined, order.by=as.Date(cum.rets$Date, format="%Y-%m-%d")) 

# Summary Statistic on final walk forward curve 
charts.PerformanceSummary(cum.rets.xts,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
table.AnnualizedReturns(cum.rets.xts)
maxDrawdown(cum.rets.xts)
CalmarRatio(cum.rets.xts)
table.DownsideRisk(cum.rets.xts)
drawdown.table <- rbind(table.Drawdowns(cum.rets.xts))
drawdown.table 

Here are some performance metrics for all test set periods:

 > table.AnnualizedReturns(cum.rets.xts)

Annualized Return         0.6797
Annualized Std Dev        0.4312
Annualized Sharpe (Rf=0%) 1.5765
> maxDrawdown(cum.rets.xts)
[1] 0.4312712
> CalmarRatio(cum.rets.xts)
                 [,1]
Calmar Ratio 1.576093
> table.DownsideRisk(cum.rets.xts)

Semi Deviation                 0.0192
Gain Deviation                 0.0235
Loss Deviation                 0.0266
Downside Deviation (MAR=210%)  0.0220
Downside Deviation (Rf=0%)     0.0183
Downside Deviation (0%)        0.0183
Maximum Drawdown               0.4313
Historical VaR (95%)          -0.0409
Historical ES (95%)           -0.0677
Modified VaR (95%)            -0.0385
Modified ES (95%)             -0.0621

The strategy reports an annualized return of 68% over all out-of-sample test periods with an annualized Sharpe ratio of 1.57. Note this uses data from VXMT inception, 2008, to the present day.

Next we can validate our code by plotting the test set equity curve along with the train and test set windows:

# Extract train and test dates for plotting 
# Train Dates
train.dateslist <- list()
train.dates.loop.length <- rep(1:length(train.list),each=1)
temp <- data.frame()
i=1
for (i in 1:length(train.dates.loop.length)) {
  temp <- data.frame("ID"=paste("train.id",train.dates.loop.length[[i]]))
  date <- train.list[[i]]$Date
  train.dateslist[[i]] <- cbind("Date" = date,ID=temp)
}

# Test Dates
test.dateslist <- list()
test.dates.loop.length <- rep(1:length(test.list),each=1)
i=1
for (i in 1:length(test.dates.loop.length)) {
  temp <- data.frame("ID"=paste("test.id",test.dates.loop.length[[i]]))
  date <- test.list[[i]]$Date
  test.dateslist[[i]] <- cbind("Date" = date,ID=temp)
  }

# Loop to extract start and end dates of each train / test segment 
train.dates.start <- list()
train.dates.end <- list()
for (i in 1:length(train.dateslist)) {
train.dates.start[[i]] <- train.dateslist[[i]]$Date[1]
train.dates.end[[i]] <- tail(train.dateslist[[i]]$Date,1)
}
test.dates.start <- list()
test.dates.end <- list()
for (i in 1:length(test.dateslist)) {
  test.dates.start[[i]] <- test.dateslist[[i]]$Date[1]
  test.dates.end[[i]] <- tail(test.dateslist[[i]]$Date,1)
}

# Plot train and test starting / ending periods
  ggplot()+
    theme_bw()+
    scale_y_continuous(expand = c(0, 0),breaks = round(seq(min(1), max(length(test.dateslist)), by = 1)))+ # set y axis to the total number of train / test sample periods
  theme(legend.position = "none")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Train And Test Windows",subtitle=paste("Train Width =",train.length,"Days, Test Width =",test.length,"Days"))+
    geom_rect(aes(xmin=as.Date(train.dates.start[[1]]),xmax=as.Date(train.dates.end[[1]]),ymin=1, ymax=2),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[2]]),xmax=as.Date(train.dates.end[[2]]),ymin=2, ymax=3),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[3]]),xmax=as.Date(train.dates.end[[3]]),ymin=3, ymax=4),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[4]]),xmax=as.Date(train.dates.end[[4]]),ymin=4, ymax=5),fill="#0072B2",alpha=.8)+    
    geom_rect(aes(xmin=as.Date(train.dates.start[[5]]),xmax=as.Date(train.dates.end[[5]]),ymin=5, ymax=6),fill="#0072B2",alpha=.8)+    
    geom_rect(aes(xmin=as.Date(train.dates.start[[6]]),xmax=as.Date(train.dates.end[[6]]),ymin=6, ymax=7),fill="#0072B2",alpha=.8)+    
    geom_rect(aes(xmin=as.Date(train.dates.start[[7]]),xmax=as.Date(train.dates.end[[7]]),ymin=7, ymax=8),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[8]]),xmax=as.Date(train.dates.end[[8]]),ymin=8, ymax=9),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[9]]),xmax=as.Date(train.dates.end[[9]]),ymin=9, ymax=10),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[10]]),xmax=as.Date(train.dates.end[[10]]),ymin=10, ymax=11),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[11]]),xmax=as.Date(train.dates.end[[11]]),ymin=11, ymax=12),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[12]]),xmax=as.Date(train.dates.end[[12]]),ymin=12, ymax=13),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[13]]),xmax=as.Date(train.dates.end[[13]]),ymin=13, ymax=14),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[14]]),xmax=as.Date(train.dates.end[[14]]),ymin=14, ymax=15),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[15]]),xmax=as.Date(train.dates.end[[15]]),ymin=15, ymax=16),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[16]]),xmax=as.Date(train.dates.end[[16]]),ymin=16, ymax=17),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[17]]),xmax=as.Date(train.dates.end[[17]]),ymin=17, ymax=18),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(train.dates.start[[18]]),xmax=as.Date(train.dates.end[[18]]),ymin=18, ymax=19),fill="#0072B2",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[1]]),xmax=as.Date(test.dates.end[[1]]),ymin=1, ymax=2),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[2]]),xmax=as.Date(test.dates.end[[2]]),ymin=2, ymax=3),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[3]]),xmax=as.Date(test.dates.end[[3]]),ymin=3, ymax=4),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[4]]),xmax=as.Date(test.dates.end[[4]]),ymin=4, ymax=5),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[5]]),xmax=as.Date(test.dates.end[[5]]),ymin=5, ymax=6),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[6]]),xmax=as.Date(test.dates.end[[6]]),ymin=6, ymax=7),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[7]]),xmax=as.Date(test.dates.end[[7]]),ymin=7, ymax=8),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[8]]),xmax=as.Date(test.dates.end[[8]]),ymin=8, ymax=9),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[9]]),xmax=as.Date(test.dates.end[[9]]),ymin=9, ymax=10),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[10]]),xmax=as.Date(test.dates.end[[10]]),ymin=10, ymax=11),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[11]]),xmax=as.Date(test.dates.end[[11]]),ymin=11, ymax=12),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[12]]),xmax=as.Date(test.dates.end[[12]]),ymin=12, ymax=13),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[13]]),xmax=as.Date(test.dates.end[[13]]),ymin=13, ymax=14),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[14]]),xmax=as.Date(test.dates.end[[14]]),ymin=14, ymax=15),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[15]]),xmax=as.Date(test.dates.end[[15]]),ymin=15, ymax=16),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[16]]),xmax=as.Date(test.dates.end[[16]]),ymin=16, ymax=17),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[17]]),xmax=as.Date(test.dates.end[[17]]),ymin=17, ymax=18),fill="#D55E00",alpha=.8)+
    geom_rect(aes(xmin=as.Date(test.dates.start[[18]]),xmax=as.Date(test.dates.end[[18]]),ymin=18, ymax=19),fill="#D55E00",alpha=.8)+
    labs(x="Date",y="Train / Test Set No.")
    
    
# Extract final equity curve from test set 
  final.plot <- list()
  i=1
  for (i in 1:length(test.xts)) {
    to.sum <- data.frame("to.sum"=test.xts[[i]]$combined)
    dates <- test.list[[i]]$Date
      final.plot[[i]] <- data.frame("Date"=dates,to.sum)
  }

  # Make df of output 
  final.df <- do.call(rbind,final.plot)
  # Cumsum equity curve
final.df$equity <- cumprod(final.df$to.sum + 1) -1 
  final.df <- final.df[c(1,3) ]
  # Melt df 
  final.plot.df <- melt(data = final.df,id.vars = 'Date')   # melt df for plotting with ggplot2
  # Create ID
  final.plot.df$final_num <- as.numeric(rep(1:length(test.xts), each=test.length+1, length.out=nrow(final.plot.df))) # one colour ID per test set period
  # Subset plots to plot each test period a different colour
  subset.plots <- list()
  row_nums <- seq(1, nrow(final.plot.df), by=test.length+1) # first row of each test set period
  subset.plots<- lapply(row_nums, function(i) final.plot.df[c(i:(i+test.length)),])
 # Create plot 
  # Create colour scheme
  myPalette <- colorRampPalette(rev(brewer.pal(11, "Paired")))
  sc <- scale_colour_gradientn(colours = myPalette(100), limits=c(1, 18))
  # Plot out of sample test results equity curve                           
  ggplot(data = subset.plots[[1]], aes(x =Date, y = value,colour=final_num))+
    geom_line()+
    geom_line(data = subset.plots[[2]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[3]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[4]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[5]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[6]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[7]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[8]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[9]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[10]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[11]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[12]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[13]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[14]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[15]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[16]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[17]], aes(x =Date, y = value,colour=final_num))+
    geom_line(data = subset.plots[[18]], aes(x =Date, y = value,colour=final_num))+
    sc+ 
    theme_bw()+
    theme(legend.position = "none")+
    scale_y_continuous(breaks = seq(1, 110, by = 5))+
    ggtitle("Test Set Results", subtitle =paste("Test Set Lengths =",test.length," rows")) +
    labs(x="Date",y="Cumulative Return")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

  
Rplot204
Train and Test Sets – Train = 252 trading days. Test = 126 days.

The plot above was created from the same data by subsetting the rows into train and test set periods and stitching them back together. The blue bars mark the in-sample train periods and the orange bars the out-of-sample test periods. The final equity curve below is the result of all test set periods.

cumretrs
Cumulative Returns All Test Set Periods

Each color on the equity curve represents each test set period, 126 trading days long.

To conclude, the benefit of this type of cross validation is that it demonstrates robustness. Each in-sample parameter optimization is subsequently used in the out-of-sample period, data the model has never seen before. If an optimized parameter falls apart in out-of-sample testing, we know we have overfit in some way, or the edge itself has disappeared. Failing this test would be cause to abandon the strategy.

Thanks for reading!

Full code can be found on my github.

Bootstrap Analysis – Resample and Replace

As I alluded to in the previous post, I made attempts to block bootstrap time series outside of the packages meboot, tseries and boot. In this post I will share the results of that effort.

First, like the last post, we need data; we need to arrive at this point in the previous post's script:

annotate("text", label = "sma 117 to 161", x = 135, y = .65, color = "red")
This is line 294; from there you may follow along with this post.

The reason for block bootstrapping a time series is to maintain the correlation structure of the series. To do this we first difference the time series data with the following code to create stationary series:

# First order differencing
# Removes auto correlation so each data point is not dependent on the other
# XIV diff
df$xiv.close.diff <- c(rep(NA, 1), diff(df$xiv_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$xiv.close.diff, type="l",main="XIV Close Differencing")
mean(df$xiv.close.diff, na.rm = TRUE)
var(df$xiv.close.diff, na.rm = TRUE)
# VXX diff
df$vxx.close.diff <- c(rep(NA, 1), diff(df$vxx_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxx.close.diff, type="l")
mean(df$vxx.close.diff, na.rm = TRUE)
var(df$vxx.close.diff, na.rm = TRUE)
# VXV Diff
df$vxv.close.diff <- c(rep(NA, 1), diff(df$vxv_cboe_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxv.close.diff, type="l")
mean(df$vxv.close.diff, na.rm = TRUE)
var(df$vxv.close.diff, na.rm = TRUE)
#VXMT Diff
df$vxmt.close.diff <- c(rep(NA, 1), diff(df$vxmt_cboe_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxmt.close.diff, type="l")
mean(df$vxmt.close.diff, na.rm = TRUE)
var(df$vxmt.close.diff, na.rm = TRUE)

Output:
xiv close diff

Next, we set a block size derived from the b.star function in the R package np. This calculates the block size for the stationary and circular bootstrap per Patton, Politis and White (2009).

# Create new data frame with differenced time series (stationary)
require(np)
#block.boot.df <- subset(df, Date >= as.POSIXct("2008-01-07") ) # subset at start of latest data set (VXMT)
block.boot.df <- data.frame(xiv.close.diff = df$xiv.close.diff, vxx.close.diff = df$vxx.close.diff, vxv.close.diff = df$vxv.close.diff, vxmt.close.diff = df$vxmt.close.diff)
# Find block size with package np
np.block.size.xiv <- b.star(block.boot.df$xiv.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
xiv.block.size <- np.block.size.xiv[2]
np.block.size.vxx <- b.star(block.boot.df$vxx.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxx.block.size <- np.block.size.vxx[2]
np.block.size.vxv <- b.star(block.boot.df$vxv.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxv.block.size <- np.block.size.vxv[2]
np.block.size.vxmt <- b.star(block.boot.df$vxmt.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxmt.block.size <- np.block.size.vxmt[2]
block.size <- c(xiv.block.size,vxx.block.size,vxv.block.size,vxmt.block.size)
block <- max(block.size) # choose the largest block size (11 here)

The block sizes for each of the 4 series in our volatility strategy:

> block.size
  xiv.block.size vxx.block.size vxv.block.size vxmt.block.size
1              2              4             11               4

For this we will choose the maximum number obtained, which is 11. The code works by grouping the rows of the data frame into blocks of this size and storing them in a list. A resampling with replacement is then performed on the list of grouped rows, and the reshuffled list is stored in an output data frame. This function is iterated 100 times, storing each block bootstrapped series in an output list. As we are working with 4 time series, all 4 series are reshuffled in the same order and end up in the same locations in the new time series. If this were not the case there would be more randomness between the series, which would not make much sense. Here's the code:

# Set block number to begin subsetting per NROW
reps <- NROW(block.boot.df)/block # Set block number
reps <- (round(reps))
id <- rep(1:reps,each=block) # each = block corresponds to the number of rows per bootstrap block
# Check lengths of data and ID
NROW(block.boot.df)
length(id)
# If ID is longer than df NROW, trim it; otherwise pad it
if(length(id)>NROW(block.boot.df)){
  id.diff <- length(id) - NROW(block.boot.df)
  length.id <- length(id)-id.diff
  id  <- id[1:length.id]
} else {
  nrow.diff <- NROW(block.boot.df) - length(id)
  max.id <- max(id)
  add.value <- max.id+1
  pad.na <- rep(add.value,nrow.diff)  # pad nrow diff
  id <- c(id,pad.na)  # join padding to vector
}

# Join ID and block.df
block.boot.df <- data.frame(block.boot.df,id) # place back in data frame
#block.boot.df$id.final[is.na(block.boot.df$id.final)] <- 693 # all NA to 1
# Id data
IDs <- unique(block.boot.df$id)
temp <- list()
# Function for bootstrapping the data frame 1x
# subsets data by id number and resamples the blocks with replacement
i=1
bootSTRAP = function(x){
  for (i in 1:length(IDs)){
    temp[i] <- list(block.boot.df[block.boot.df$id==IDs[i],])
  }
  out <- sample(temp,replace = TRUE)
  boot.df.output <- do.call(rbind, out)
}

# Loop for running it 100 times
runs <- 1:100
run.output <- list()
i=1
for (i in 1:length(runs)){
  tryCatch({
    temp.1 <- bootSTRAP(runs[i])
    #cum_ret <- rbind.data.frame(cum_ret, temp)
    run.output[[i]] <- cbind(temp.1)
    ptm0 <- proc.time()
    Sys.sleep(0.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# cumsum stationary differenced series
for (i in 1:length(run.output)){
  run.output[[i]]$cumsum.diff.xiv.diff <- cumsum(run.output[[i]]$xiv.close.diff)
  run.output[[i]]$cumsum.diff.vxx.diff <- abs(cumsum(run.output[[i]]$vxx.close.diff))
  run.output[[i]]$cumsum.diff.vxv.diff <- cumsum(run.output[[i]]$vxv.close.diff)
  run.output[[i]]$cumsum.diff.vxmt.diff <- cumsum(run.output[[i]]$vxmt.close.diff)
}

# Reverse VXX down trending series
for (i in 1:length(run.output)){
  run.output[[i]]$cumsum.diff.vxx.diff <- rev(run.output[[i]]$cumsum.diff.vxx.diff )
}
# Add index for merging
for (i in 1:length(run.output)){
  run.output[[i]]$index <- seq(1:NROW(run.output[[i]]))
}

# Merge all dfs
L <- run.output[[1]]
for (i in 2:length(run.output)) L <- merge(L,  run.output[[i]], by=c("index"))

# cbinds
replace.s <- cbind.data.frame(L)
replace.s <- replace.s[,-1]
# Subset data frames for loop
xiv.df <- replace.s[ , grepl("cumsum.diff.xiv.diff*", names(replace.s), perl=TRUE)]
vxx.df <- replace.s[ , grepl("cumsum.diff.vxx.diff*", names(replace.s), perl=TRUE)]
vxv.df <- replace.s[ , grepl("cumsum.diff.vxv.diff*", names(replace.s), perl=TRUE)]
vxmt.df <- replace.s[ , grepl("cumsum.diff.vxmt.diff*", names(replace.s), perl=TRUE)]

# Add Date to dfs
df.date <- data.frame(Date = df$Date)
diff <- nrow(df.date) - nrow(xiv.df)
df.date <- df.date[1:nrow(xiv.df),]
# Add Date to subsetted dfs
xiv.ensemble.df <- data.frame(Date = df.date, xiv.df)
vxx.ensemble.df <- data.frame(Date = df.date, vxx.df)
vxv.ensemble.df <- data.frame(Date = df.date, vxv.df)
vxmt.ensemble.df <- data.frame(Date = df.date, vxmt.df)

# Melt data frames for plotting
xiv.plot.df <- melt(xiv.ensemble.df,id.vars = "Date")
vxx.plot.df <- melt(vxx.ensemble.df,id.vars = "Date")
vxv.plot.df <- melt(vxv.ensemble.df,id.vars = "Date")
vxmt.plot.df <- melt(vxmt.ensemble.df,id.vars = "Date")

# Plot XIV Resampled series
ggplot(data = xiv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=xiv_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - XIV", subtitle="100 Iterations")

# Plot VXX Resampled series
ggplot(data = vxx.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxx_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXX", subtitle="100 Iterations")

# Plot VXV Resampled series
ggplot(data = vxv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxv_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXV", subtitle="100 Iterations")

# Plot VXMT Resampled series
ggplot(data = vxmt.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxmt_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXMT", subtitle="100 Iterations")

The output:

block size 11 xiv
XIV – Resample and replace – block size == 11
vxx size 11
VXX – Resample and replace – block size == 11 – Note the resampled series are not a great representation of the original!

In contrast to maximum entropy bootstrapping, where each series resembled the original to a high degree, the block bootstrapped series differ from the original. The VXX series itself trends straight down. The packages tseries and boot did not manage to bootstrap this successfully either; essentially, after differencing and cumulatively summing, the series needed to be reversed. This is due to the sign of the VXX trend: cumulatively summing turns the negative values positive and we are left with an up-trending VXX.

In any case, this might be effective for back testing strategies whose trading rules are overlaid directly on a price series. The VXV/VXMT strategy is a bet on contango or backwardation, so its trading rules are not, per se, overlaid on the price series itself; however, for strategies whose trading rules are based on a price series, the above bootstrap procedure may serve a useful purpose.

Code for this can be found on my github.

 

Robustness Testing – Volatility Strategy – Time Series Bootstrapping

In this post we will look at a simple volatility strategy. We will optimize the parameters over a back test and select a region of parameters which show the best performance. We will then subject this optimal performance band to further testing by resampling the original time series using the meboot package.

Traditionally, bootstrapping is performed on data that is independent and identically distributed, which is of course not the case with time series data due to the presence of autocorrelation; if autocorrelation is present, the time series is dependent on its past. Lag-1 autocorrelation is the correlation between pairs of observations one step apart, (x2,x1),(x3,x2),(x4,x3),…; lag-2 autocorrelation is the correlation between pairs two steps apart, (x3,x1),(x4,x2),(x5,x3),…; and so on for the entire sample at varying time lags.
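As a quick check of the idea, the lag-1 autocorrelation can be computed by hand and compared with acf(); the AR(1) series here is a toy stand-in for the price data, not the actual XIV series:

```r
# Toy AR(1) series standing in for the price data (illustration only)
set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 2000))
# Lag-1 autocorrelation by hand: correlate (x2,x1),(x3,x2),...
lag1.manual <- cor(x[-1], x[-length(x)])
# acf() reports the same quantity (up to a slightly different normalisation)
lag1.acf <- as.numeric(acf(x, lag.max = 1, plot = FALSE)$acf[2])
c(manual = lag1.manual, acf = lag1.acf) # both close to the true AR coefficient 0.7
```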

acf xiv

This is the autocorrelation of XIV. We may remove the autocorrelation by differencing, Xt−Xt−1:

Rplot153

One may notice there is no trend; the series is stationary. We notice, however, that the variance has increased over time.

> mean(df$xiv.close.diff, na.rm = TRUE)
[1] 0.050235
> var(df$xiv.close.diff, na.rm = TRUE)
[1] 1.299783

We may stabilize the variance by taking the log of the close and differencing.

Rplot209

> mean(df$log.xiv.close.diff, na.rm = TRUE)
[1] 0.001229805
> var(df$log.xiv.close.diff, na.rm = TRUE)
[1] 0.001382537
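The log-difference step can be sketched as below; `df$xiv_close` here is a toy geometric series standing in for the real close, so the exact numbers differ from the output above:

```r
# Toy price series: constant-percentage noise, so absolute moves grow with price
set.seed(1)
df <- data.frame(xiv_close = 10 * cumprod(1 + rnorm(2000, 0.0005, 0.02)))
# Simple difference: variance scales with the price level
df$xiv.close.diff <- c(NA, diff(df$xiv_close))
# Log difference (approximately the daily return): variance roughly constant
df$log.xiv.close.diff <- c(NA, diff(log(df$xiv_close)))
var(df$xiv.close.diff, na.rm = TRUE)
var(df$log.xiv.close.diff, na.rm = TRUE) # close to 0.02^2
```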

 

Rplot154

With the auto correlation removed, one may attempt to perform traditional methods of bootstrapping.

With a stationary series the mean is constant over time and there is no trend. An idea I was experimenting with was to difference as per above and, in an attempt to retain the correlation structure of the original series, block bootstrap by resampling with replacement and stitching the series back together. As the resampled series is still stationary, the output is cumulatively summed in order to create a trending series, which then serves as a new series to run new back tests on. See my attempt here.

I have so far opted to use the meboot (maximum entropy bootstrap) R package, which does not require the series to be stationary, does not require a block length as a parameter, and keeps the correlation structure of the data intact.
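A minimal meboot call looks like this (a toy series is used here; in the post the inputs are the XIV/VXX/VXV/VXMT closes):

```r
require(meboot)
set.seed(1)
x <- 50 + cumsum(rnorm(250))      # toy non-stationary price-like series
ens <- meboot(x, reps = 100)      # maximum entropy bootstrap
dim(ens$ensemble)                 # 250 rows x 100 resampled paths
# Each column of ens$ensemble is a resampled series tracking the shape of x
```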

We will need XIV, VXX, CBOE VXV and CBOE VXMT data. I also use synthetic VXX and XIV data stored on my hard drive; feel free to edit these portions to include the locations of your own synthetic data (if you want more history).

Strategy rules as follows:

  1. long VXX when VXV/VXMT ratio is over 1 & the VXV/VXMT ratio is over an ‘n’ period sma of the VXV/VXMT ratio.
  2. long XIV when VXV/VXMT ratio is less than 1 & the VXV/VXMT ratio is below an ‘n’ period sma of the VXV/VXMT ratio.

We have two free parameters here. The ratio value currently set to 1 and the lookback period for the vxv/vxmt sma ratio.
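The two rules can be sketched as follows; the data frame and column names here are illustrative stand-ins for the CBOE closes used later in the script:

```r
require(TTR)
# Toy stand-ins for the CBOE VXV and VXMT closes (illustration only)
set.seed(7)
df <- data.frame(vxv_close = runif(300, 15, 25), vxmt_close = runif(300, 16, 24))
n <- 60                                    # free parameter 2: sma lookback
df$vxv.vxmt.ratio <- df$vxv_close / df$vxmt_close
df$ratio.sma <- SMA(df$vxv.vxmt.ratio, n = n)
# Free parameter 1: the ratio threshold, set to 1
df$vxx.signal <- ifelse(df$vxv.vxmt.ratio > 1 & df$vxv.vxmt.ratio > df$ratio.sma, 1, 0)
df$xiv.signal <- ifelse(df$vxv.vxmt.ratio < 1 & df$vxv.vxmt.ratio < df$ratio.sma, 1, 0)
```

By construction the two signals are mutually exclusive, since the ratio cannot be both above and below 1 on the same day.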

We optimize the vxv/vxmt sma ratio over sma lookbacks 2:300 and plot the annualized return for each sma period. The code to do this:

Note I'm using the Alpha Vantage API; you will need your own API key.

# Meboot time series resampling
# Andrew Bannerman 12.29.2017

require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)
require(meboot)
require(np)

######### Download ETF Data ###############
# Load synthetic data and join to Alpha Vantage adjusted prices
# Load synthetic VXX and XIV data
library(readxl)
synth <- read_excel("D:/R Projects/Final Scripts/VIX_term_structure/vix-funds-models-no-formulas.xls", col_names = TRUE)
synth1 <- read_excel("D:/R Projects/Final Scripts/VIX_term_structure/vix-mt-funds-models-no-formulas.xls", col_names = TRUE)
synth <- as.data.frame(synth)
synth1 <- as.data.frame(synth1)
# Extract synthetic series
vxx.synth <- data.frame(synth$Date, synth$'VXX calc')
xiv.synth <- data.frame(synth$Date, synth$'XIV calc')
ziv.synth <- data.frame(synth1$Date, synth1$'ZIV calc')
vxz.synth <- data.frame(synth1$Date, synth1$'VXZ calc')
colnames(vxx.synth)[1] <- "Date"
colnames(vxx.synth)[2] <- "vxx_close"
colnames(xiv.synth)[1] <- "Date"
colnames(xiv.synth)[2] <- "xiv_close"
colnames(ziv.synth)[1] <- "Date"
colnames(ziv.synth)[2] <- "ziv_close"
colnames(vxz.synth)[1] <- "Date"
colnames(vxz.synth)[2] <- "vxz_close"
vxx.synth$Date <- ymd(vxx.synth$Date)
xiv.synth$Date <- ymd(xiv.synth$Date)
ziv.synth$Date <- ymd(ziv.synth$Date)
vxz.synth$Date <- ymd(vxz.synth$Date)

# Download SPY data
# Note you need to place your API key...your_key_here
SPY <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SPY&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
SPY$timestamp <- ymd(SPY$timestamp)   #Lubridate to change character date to date format
SPY <- arrange(SPY,timestamp)   #dplyr to sort data frame by date ascending order
colnames(SPY)[1] <- "Date"
SPY$Date <- ymd(SPY$Date)
SPY <- as.data.frame(SPY)
SPY <- subset(SPY, Date >= as.POSIXct("2004-03-26") ) # synthetic data start
head(SPY)

VXX <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXX&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
VXX$timestamp <- ymd(VXX$timestamp)   #Lubridate to change character date to date format
VXX <- arrange(VXX,timestamp)   #dplyr to sort data frame by date ascending order
colnames(VXX)[1] <- "Date"
VXX$Date <- ymd(VXX$Date)
VXX <- as.data.frame(VXX)
head(VXX)

XIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=XIV&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
XIV$timestamp <- ymd(XIV$timestamp)   #Lubridate to change character date to date format
XIV <- arrange(XIV,timestamp)   #dplyr to sort data frame by date ascending order
colnames(XIV)[1] <- "Date"
XIV$Date <- ymd(XIV$Date)
XIV <- as.data.frame(XIV)
head(XIV)
#XIV <- subset(XIV, Date >= as.POSIXct("2012-01-01"))

ZIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=ZIV&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
ZIV$timestamp <- ymd(ZIV$timestamp)   #Lubridate to change character date to date format
ZIV <- arrange(ZIV,timestamp)   #dplyr to sort data frame by date ascending order
colnames(ZIV)[1] <- "Date"
ZIV$Date <- ymd(ZIV$Date)
ZIV <- as.data.frame(ZIV)
head(ZIV)

VXZ <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXZ&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
VXZ$timestamp <- ymd(VXZ$timestamp)   #Lubridate to change character date to date format
VXZ <- arrange(VXZ,timestamp)   #dplyr to sort data frame by date ascending order
colnames(VXZ)[1] <- "Date"
VXZ$Date <- ymd(VXZ$Date)
VXZ <- as.data.frame(VXZ)
tail(VXZ)

# Join synthetic data to alpha vantage
vxx.synth <- subset(vxx.synth, Date <= as.POSIXct("2009-01-29"))
xiv.synth <- subset(xiv.synth, Date <= as.POSIXct("2010-11-29"))
ziv.synth <- subset(ziv.synth, Date <= as.POSIXct("2010-11-29"))
vxz.synth <- subset(vxz.synth, Date <= as.POSIXct("2009-02-19"))
# Subset only date and close from alpha vantage data
VXX <- VXX[ -c(2:5, 7:9) ]  # subset adjusted close
XIV <- XIV[ -c(2:5, 7:9) ]  # subset adjusted close
ZIV <- ZIV[ -c(2:5, 7:9) ]  # subset adjusted close
VXZ <- VXZ[ -c(2:5, 7:9) ]  # subset adjusted close
SPY <- SPY[ -c(3:5, 7:9) ]  # subset adjusted close
colnames(VXX)[2] <- "vxx_close"
colnames(XIV)[2] <- "xiv_close"
colnames(ZIV)[2] <- "ziv_close"
colnames(VXZ)[2] <- "vxz_close"
colnames(SPY)[2] <- "spy_open"
colnames(SPY)[3] <- "spy_close"

# row bind
VXX <- rbind(vxx.synth,VXX)
XIV <- rbind(xiv.synth,XIV)
ZIV <- rbind(ziv.synth,ZIV)
VXZ <- rbind(vxz.synth,VXZ)
df <- cbind(VXX,XIV,ZIV,VXZ,SPY)
tail(df)

# Download Spot VIX Price and VXV Price from CBOE website
VIX_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv")
VIX_cboe <- as.data.frame(VIX_cboe)
VIX_cboe <- VIX_cboe[2:nrow(VIX_cboe), ]
colnames(VIX_cboe)[1] = "Date"
colnames(VIX_cboe)[2] = "vix_open"
colnames(VIX_cboe)[3] = "vix_high"
colnames(VIX_cboe)[4] = "vix_low"
colnames(VIX_cboe)[5] = "vix_close"
VIX_cboe$Date <- mdy(VIX_cboe$Date)
cols <-c(2:5)
VIX_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
VIX_cboe <- subset(VIX_cboe, Date >= as.POSIXct("2007-12-04") )
# Download VXV Data From CBOE website
VXV_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vix3mdailyprices.csv")
VXV_cboe <- as.data.frame(VXV_cboe)
VXV_cboe <- VXV_cboe[3:nrow(VXV_cboe), ]
colnames(VXV_cboe)[1] = "Date"
colnames(VXV_cboe)[2] = "vxv_cboe_open"
colnames(VXV_cboe)[3] = "vxv_cboe_high"
colnames(VXV_cboe)[4] = "vxv_cboe_low"
colnames(VXV_cboe)[5] = "vxv_cboe_close"
VXV_cboe$Date <- mdy(VXV_cboe$Date)
cols <-c(2:5)
VXV_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Download VXMT Data from CBOE website
VXMT_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxmtdailyprices.csv")
VXMT_cboe <- as.data.frame(VXMT_cboe)
VXMT_cboe <- VXMT_cboe[3:nrow(VXMT_cboe), ]
colnames(VXMT_cboe)[1] = "Date"
colnames(VXMT_cboe)[2] = "vxmt_cboe_open"
colnames(VXMT_cboe)[3] = "vxmt_cboe_high"
colnames(VXMT_cboe)[4] = "vxmt_cboe_low"
colnames(VXMT_cboe)[5] = "vxmt_cboe_close"
VXMT_cboe$Date <- mdy(VXMT_cboe$Date)
cols <-c(2:5)
VXMT_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Join VIX, VIX3m (VXV) and VXMT CBOE data to ETF df
cboe.df <- merge(VIX_cboe,VXV_cboe, by="Date")
cboe.df <- merge(cboe.df,VXMT_cboe, by="Date")
tail(cboe.df)
df <- df[,c(-3,-5,-7,-9)] # Drop duplicate Date columns from cbind
df <- full_join(df, cboe.df, by = c("Date" = "Date"))

# Remove last rows
nrow <- NROW(df)-2
df <- head(df,nrow)
tail(df)

############################################
# Back Test VXV / VXMT Ratio
# Find best params without bootstrapping
############################################
# Calculate Close-to-Close returns
df$vxx.close.ret <- ROC(df$vxx_close, type = c("discrete"))
df$xiv.close.ret <- ROC(df$xiv_close, type = c("discrete"))

# VXV / VXMT Ratio
df$vxv.vxmt.ratio <- df$vxv_cboe_close / df$vxmt_cboe_close
df[is.na(df)] <- 0
head(df)

# Calculate SMA of ratio
numdays <- 2:500
getSMA <- function(numdays) {
  function(df) {
    SMA(df[,"vxv.vxmt.ratio"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(df), ncol=0)

# Loop for filling it
for (i in numdays) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(df))
}

# Rename columns
colnames(sma.matrix) <- sapply(numdays, function(n)paste("ratio.sma.n", n, sep=""))

# Bind to existing dataframe
df <-  cbind(df, sma.matrix)
tail(df)

##############################################
# Optimize Strategy Params
##############################################
num.days <- 2:300
# Initialize data frame
data_output_df <- data.frame()
# Optimize #########
optIMIZE = function(n){
  # Enter buy / sell rules using the n-period sma of the ratio
  df$vxx.signal <- ifelse(df$vxv.vxmt.ratio > 1 & df$vxv.vxmt.ratio > df[,paste0("ratio.sma.n", n)], 1,0)
  df$xiv.signal <- ifelse(df$vxv.vxmt.ratio < 1 & df$vxv.vxmt.ratio < df[,paste0("ratio.sma.n", n)], 1,0)

  # Lag signal by two days: CBOE data is available next day,
  # so the signal is traded with a delay (dplyr::lag shifts values forward)
  df$vxx.signal <- lag(df$vxx.signal,2)
  df$xiv.signal <- lag(df$xiv.signal,2)

  df[is.na(df)] <- 0  # Set NA to 0

  # Calculate equity curves
  df$vxx.signal.ret <-  df$vxx.signal * df$vxx.close.ret
  df$xiv.signal.ret <-  df$xiv.signal * df$xiv.close.ret

  # Combine signals
  df$total.signal.ret <- df$vxx.signal.ret + df$xiv.signal.ret

  # Pull select columns from data frame to make XTS whilst retaining formats
  xts1 = xts(df$vxx.signal.ret, order.by=as.Date(df$Date, format="%m/%d/%Y"))
  xts2 = xts(df$xiv.signal.ret, order.by=as.Date(df$Date, format="%m/%d/%Y"))
  xts3 = xts(df$total.signal.ret, order.by=as.Date(df$Date, format="%m/%d/%Y"))
  tail(xts3)

  # Join XTS together
  compare <- cbind(xts1,xts2,xts3)

  # Use the PerformanceAnalytics package for trade statistics

  require(PerformanceAnalytics)
  colnames(compare) <- c("vxx","xiv","combined")
  #charts.PerformanceSummary(compare,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
  # performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
  # drawdown.table <- rbind(table.Drawdowns(xts3))
  #dev.off()
  # logRets <- log(cumprod(1+compare))
  # chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal,main="Log Returns")

  #print(performance.table)
  #print(drawdown.table)
  cum.ret <- Return.cumulative(xts3, geometric = TRUE)
  annualized <- Return.annualized(xts3, scale = NA, geometric = TRUE)
  dd <- maxDrawdown(xts3)
  sharpe <- SharpeRatio.annualized(xts3, Rf = 0, scale = NA, geometric = TRUE)

  # Create data output of rep and close.diff columns rbind
  data_output_df <- data.frame("Annualized Return" = annualized,"Annualized Sharpe" = sharpe,"Cumulative Return" = cum.ret,"Maximum Draw Down" = dd)

}

for (i in 1:length(num.days)){    # Length of optimization
  tryCatch({
    temp <- optIMIZE(num.days[[i]])
    rownames(temp) <- paste0("",num.days[i])
    #cum_ret <- rbind.data.frame(cum_ret, temp)
    data_output_df <- rbind.data.frame(data_output_df,temp)
    ptm0 <- proc.time()
    Sys.sleep(0.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Join SMA number to data frame
data_output_df <- data.frame(data_output_df,num.days)
show <- data_output_df[data_output_df$num.days %in% 51:70, ]  # original results for sma 51 to 70

# Plot
colnames(data_output_df)
library(ggplot2)
ggplot(data=data_output_df, aes(x=num.days,Annualized.Return))+
  geom_bar(stat="identity")+
  theme_classic()+
  scale_x_continuous(breaks = seq(min(data_output_df$num.days), max(data_output_df$num.days)))+
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5,size=15))+
  ggtitle("VXV/VXMT Volatility Strategy - Optimized sma vxv/vxmt ratio",subtitle="2008 to present")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  geom_rect(aes(xmin=51,xmax=70,ymin=0,ymax=Inf),alpha=0.01,fill="#CC6666")+
  geom_rect(aes(xmin=117,xmax=161,ymin=0,ymax=Inf),alpha=0.01,fill="#CC6666")+
  annotate("text", label = "sma 51 to 70", x = 60, y = .65, color = "red")+
  annotate("text", label = "sma 117 to 161", x = 135, y = .65, color = "red")
[Figure: annualized return by sma look back period, 2008 to present]

We see two parameter bands, sma look back periods of 51 to 70 and 117 to 161. These are smas which have provided annualized returns of over 50% since 2008.

This is considered cross-sectional stability (see Tony Cooper's comment here). One caveat is that these adjacent parameter sets produce correlated trades, so one would need to scrutinize the returns on a time series basis to rule out lucky periods or months.

Next, we select the optimized sma band of 51 to 70 and run the strategy over the resampled time series. This is done with this code:

############################
# Meboot
# Generate Maximum Entropy Bootstrapped Time Series Ensemble
############################
# XIV
df[is.na(df)] <- 0
df <- head(df,NROW(df)-1) # Remove missing value...
tail(df$xiv_close,10)
xiv.boot <- meboot(df$xiv_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
               expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE, elaps=TRUE)
# Place meboot results in data frame
xiv.ensemble.df <- data.frame(xiv.boot$ensemble)
xiv.ensemble.df <- data.frame(xiv.ensemble.df,"Date"=df$Date)
# Melt for plotting
xiv.plot.df <- melt(xiv.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = xiv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=xiv_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - XIV",subtitle="100 Iterations")

# VXX
vxx.boot <- meboot(df$vxx_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
                   expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE, elaps=TRUE)
# Place meboot results in data frame
vxx.ensemble.df <- data.frame(vxx.boot$ensemble)
vxx.ensemble.df <- data.frame(vxx.ensemble.df,"Date"=df$Date)
# Melt for plotting
vxx.plot.df <- melt(vxx.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = vxx.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxx_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXX", subtitle="100 Iterations")

# VXV
vxv.boot <- meboot(df$vxv_cboe_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
                   expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE, elaps=TRUE)
# Place meboot results in data frame
vxv.ensemble.df <- data.frame(vxv.boot$ensemble)
vxv.ensemble.df <- data.frame(vxv.ensemble.df,"Date"=df$Date)
# Melt for plotting
vxv.plot.df <- melt(vxv.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = vxv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxv_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXV", subtitle="100 Iterations")

# VXMT
vxmt.boot <- meboot(df$vxmt_cboe_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
                    expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE, elaps=TRUE)
# Place meboot results in data frame
vxmt.ensemble.df <- data.frame(vxmt.boot$ensemble)
vxmt.ensemble.df <- data.frame(vxmt.ensemble.df,"Date"=df$Date)
# Melt for plotting
vxmt.plot.df <- melt(vxmt.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = vxmt.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxmt_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXMT", subtitle="100 Iterations")

An example of a resampled series using meboot:

[Figure: XIV resampled 100 times]

One noticeable thing is that the resampled series are tightly bound to the original series. This example uses 100 iterations for illustration; in practice one would create upwards of 1,000 resampled series.

The next course of action is to back test the original strategy using the optimal parameter band of vxv/vxmt sma ratio 51 to 70. The strategy runs from sma 51 to sma 70 over all 100 series. We record the results in a list and then plot the distribution of results over key metrics: annualized return, maximum draw down and annualized Sharpe ratio. The code that achieves this:

#################################
# Back test optimal band found in initial sample over boot strapped series
##################################
df.boot <- data.frame("Date" = xiv.ensemble.df$Date)
# boot strap sample series results
# Initialize list
boot_output <- list()
#Drop dates (not used for plotting)
xiv.ensemble.df <- xiv.ensemble.df[,-101]
vxx.ensemble.df <- vxx.ensemble.df[,-101]
vxv.ensemble.df <- vxv.ensemble.df[,-101]
vxmt.ensemble.df <- vxmt.ensemble.df[,-101]

sma <- rep(51:70, each=ncol(xiv.ensemble.df))  # each sma run over every resampled series
length.dfs <- rep(1:100, 20)                   # 20 sma values x 100 series = 2000 runs
for (i in 1:length(length.dfs)) {
    tryCatch({
  ############################################
  # Back Test VXV / VXMT Ratio Over Boot Strapped Series
  ############################################
  # Calculate Close-to-Close returns
  df.boot$vxx.close.ret <- ROC(vxx.ensemble.df[,length.dfs[i]], type = c("discrete"))
  df.boot$xiv.close.ret <- ROC(xiv.ensemble.df[,length.dfs[i]], type = c("discrete"))

  # VXV / VXMT Ratio
  df.boot$vxv.vxmt.ratio <- vxv.ensemble.df[,length.dfs[i]]/ vxmt.ensemble.df[,length.dfs[i]]
  df.boot$vxv.vxmt.ratio[is.na(df.boot$vxv.vxmt.ratio)] <- 0
  df.boot$vxv.vxmt.ratio[is.nan(df.boot$vxv.vxmt.ratio)] <- 0
  df.boot$vxv.vxmt.ratio[is.infinite(df.boot$vxv.vxmt.ratio)] <- 0

  # Create sma
  df.boot$sma <- SMA(df.boot[,"vxv.vxmt.ratio"], sma[i])    # Calls TTR package to create SMA

    # Enter buy / sell rules
  sma.n <- sma[i]
    #df.boot$vxx.signal <- ifelse(df.boot$vxv.vxmt.ratio > 1 & df.boot$vxv.vxmt.ratio > df.boot[,paste0("ratio.sma.n", sma.n)], 1,0)
  df.boot$vxx.signal <- ifelse(df.boot$vxv.vxmt.ratio > 1 & df.boot$vxv.vxmt.ratio > df.boot$sma, 1,0)
  df.boot$xiv.signal <- ifelse(df.boot$vxv.vxmt.ratio < 1 & df.boot$vxv.vxmt.ratio < df.boot$sma, 1,0)

    # Lag signal by two days: CBOE data is available next day,
    # so the signal is traded with a delay (dplyr::lag shifts values forward)
    df.boot$vxx.signal <- lag(df.boot$vxx.signal,2)
    df.boot$xiv.signal <- lag(df.boot$xiv.signal,2)

    # Calculate equity curves
    df.boot$vxx.signal.ret <-  df.boot$vxx.signal * df.boot$vxx.close.ret
    df.boot$xiv.signal.ret <-  df.boot$xiv.signal * df.boot$xiv.close.ret

    # Combine signals
    df.boot$total.signal.ret <- df.boot$vxx.signal.ret + df.boot$xiv.signal.ret

    # Pull select columns from data frame to make XTS whilst retaining formats
    xts1 = xts(df.boot$vxx.signal.ret, order.by=as.Date(df.boot$Date, format="%m/%d/%Y"))
    xts2 = xts(df.boot$xiv.signal.ret, order.by=as.Date(df.boot$Date, format="%m/%d/%Y"))
    xts3 = xts(df.boot$total.signal.ret, order.by=as.Date(df.boot$Date, format="%m/%d/%Y"))
    tail(xts3)

    # Join XTS together
    compare <- cbind(xts1,xts2,xts3)

    # Use the PerformanceAnalytics package for trade statistics

    require(PerformanceAnalytics)
    colnames(compare) <- c("vxx","xiv","combined")
    #charts.PerformanceSummary(compare,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
    # performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
    # drawdown.table <- rbind(table.Drawdowns(xts3))
    #dev.off()
    # logRets <- log(cumprod(1+compare))
    # chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal,main="Log Returns")

    #print(performance.table)
    #print(drawdown.table)
    cum.ret <- Return.cumulative(xts3, geometric = TRUE)
    annualized <- Return.annualized(xts3, scale = NA, geometric = TRUE)
    dd <- maxDrawdown(xts3)
    sharpe <- SharpeRatio.annualized(xts3, Rf = 0, scale = NA, geometric = TRUE)
    id <- paste0("col",length.dfs[i],"sma",sma[i])

    # Create data output of rep and close.diff columns rbind
   out <- data.frame("Annualized Return" = annualized,"Annualized Sharpe" = sharpe,"Cumulative Return" = cum.ret,"Maximum Draw Down" = dd, id = id)
   rownames(out) <- paste0("col",length.dfs[i],"sma",sma[i])
    boot_output[[i]] <- rbind(out)

     ptm0 <- proc.time()
    Sys.sleep(0.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

  # Join  boot output
  master <- do.call(rbind, boot_output)
  names(master)

  # Find confidence intervals
  annualized.ret.mean <- mean(master$Annualized.Return)
  annualized.ret.sdev <- sd(master$Annualized.Return)
  annualized.ret.sample.size <- nrow(master)
  annualized.error <- qnorm(0.975)* annualized.ret.sdev/sqrt( annualized.ret.sample.size)
  annualized.left <- annualized.ret.mean - annualized.error
  annualized.right <- annualized.ret.mean + annualized.error

  #annualized.SE <- annualized.ret.sdev / sqrt(annualized.ret.sample.size)
  #t.test(master$Annualized.Sharpe,conf.level = 0.98)

  annualized.sharpe.mean <- mean(master$Annualized.Sharpe)
  annualized.sharpe.sdev <- sd(master$Annualized.Sharpe)
  annualized.sharpe.sample.size <- nrow(master)
  annualized.sharpe.error <- qnorm(0.975)* annualized.sharpe.sdev/sqrt(annualized.sharpe.sample.size)
  annualized.sharpe.left <- annualized.sharpe.mean - annualized.sharpe.error
  annualized.sharpe.right <- annualized.sharpe.mean + annualized.sharpe.error

  Maximum.Draw.Down.mean <- mean(master$Maximum.Draw.Down)
  Maximum.Draw.Down.sdev <- sd(master$Maximum.Draw.Down)
  Maximum.Draw.Down.sample.size <- nrow(master)
  Maximum.Draw.Down.error <- qnorm(0.975)* Maximum.Draw.Down.sdev/sqrt( Maximum.Draw.Down.sample.size)
  Maximum.Draw.Down.left <- Maximum.Draw.Down.mean - Maximum.Draw.Down.error
  Maximum.Draw.Down.right <- Maximum.Draw.Down.mean + Maximum.Draw.Down.error

  #t.test(master$Annualized.Return)
 # jarque.bera.test(master$Maximum.Draw.Down)
  require(tseries)

  # Plot
  # Annualized Return
  library(ggplot2)
  p1 <- ggplot(data=master, aes(Annualized.Return,col=I("red")))+
    geom_histogram(binwidth = 0.02)+
    theme_classic()+
    ggtitle("Resampled Strategy Results - Sma 51 to 70 - Annualized Return",subtitle="Strategy run over 100 resampled time series")+
    labs(x="Annualized Return",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
   # geom_vline(xintercept = left, color="blue", linetype="dashed")+
    #geom_vline(xintercept = right, color="blue", linetype="dashed")+
    geom_vline(xintercept = annualized.ret.mean, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.425727", x = 0.35, y = 117, color = "blue")

 # Sharpe Ratio
 p3 <- ggplot(data=master, aes(Annualized.Sharpe,col=I("red")))+
    geom_histogram(binwidth = 0.001)+
    theme_classic()+
    ggtitle("Resampled Strategy Results - Sma 51 to 70 - Annualized Sharpe Ratio",subtitle="Strategy run over 100 resampled time series")+
    labs(x="Annualized.Sharpe",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = annualized.sharpe.mean, color="blue", linetype="dashed")+
   geom_vline(xintercept = annualized.sharpe.right, color="blue", linetype="dashed")+
   geom_vline(xintercept = annualized.sharpe.left, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 1.02887", x = 1.02887, y = 15, color = "blue")


  # Maximum DD
  p5 <- ggplot(data=master, aes(Maximum.Draw.Down,col=I("red")))+
    geom_histogram(binwidth = 0.01)+
    theme_classic()+
    ggtitle("Resampled Strategy Results - Sma 51 to 70 - Maximum Draw Down",subtitle="Strategy run over 100 resampled time series")+
    labs(x="Maximum.Draw.Down",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = Maximum.Draw.Down.mean, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.5020338", x = 0.6, y = 65, color = "blue")

  ###################
  # Plot Original back test results
  ##################
  # Plot
  # Annualized Return
  library(ggplot2)
  # find original mean annualized
  mean.orig.annualized <- mean(show$Annualized.Return)
  p2 <- ggplot(data=show, aes(Annualized.Return,col=I("red")))+
    geom_histogram(binwidth = 0.01)+
    theme_classic()+
    ggtitle("Original Strategy Results - Sma 51 to 70 - Annualized Return",subtitle="")+
    labs(x="Annualized Return",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = mean.orig.annualized, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.6039258", x = 0.585, y = 15, color = "blue")
  require(gridExtra)
  gridExtra::grid.arrange(p1, p2, nrow = 1)

  # Sharpe Ratio
  # find original mean annualized
  mean.orig.sharpe <- mean(show$Annualized.Sharpe)
 p4 <- ggplot(data=show, aes(Annualized.Sharpe,col=I("red")))+
    geom_histogram(binwidth = 0.02)+
    theme_classic()+
    ggtitle("Original Strategy Results - Sma 51 to 70 - Annualized Sharpe Ratio",subtitle="")+
    labs(x="Annualized.Sharpe",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = mean.orig.sharpe, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 1.565738", x = 1.52, y = 6.5, color = "blue")
 gridExtra::grid.arrange(p3, p4, nrow = 1)

  # Maximum DD
 mean.orig.maxdd <- mean(show$Maximum.Draw.Down)
  p6 <- ggplot(data=show, aes(Maximum.Draw.Down,col=I("red")))+
    geom_histogram(binwidth = 0.0001)+
    theme_classic()+
    ggtitle("Original Strategy Results - Sma 51 to 70 - Maximum Draw Down",subtitle="")+
    labs(x="Maximum.Draw.Down",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = mean.orig.maxdd, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.3420499", x = 0.35, y = 7, color = "blue")
  gridExtra::grid.arrange(p5, p6, nrow = 1)

[Figure: annualized return distributions, resampled vs original]

Mean annualized return over the re-sampled time series is 42.6% vs the original strategy back test of 60.4%.

[Figure: annualized Sharpe ratio distributions, resampled vs original]

Mean annualized Sharpe ratio over the re-sampled time series is 1.02887 vs the original strategy back test of 1.565738.

[Figure: maximum draw down distributions, resampled vs original]

Mean maximum draw down over the re-sampled time series is 50.2% vs the original strategy back test of 34.2%.

Table of the original optimized parameters:

[Table: original optimized parameter results]

Overall, the Monte Carlo distributions show, on average, a reduction in performance across annualized return, Sharpe ratio and maximum draw down.

It should be noted, however, that the expectancy remains positive. The mean and variance of each resampled series differ from those of the original series, though this may still not be the best way to proceed: the meboot method retains the structure of the original series, and the resampled series bear a strong resemblance to it. Still, the results show that while the change in mean / variance caused a drop in performance, the strategy parameters themselves did not fall apart. I deem that useful information moving forward as a type of sensitivity test to changes in mean / variance.

Within the 51 to 70 sma range it's like throwing darts: you're likely to land in a zone that would have made money going forward.

This research on my end is in its infancy and I am looking for other ways to stress test strategies. Some which come to mind include:

1. Run Monte Carlo simulations on the distribution of returns. Reshuffling the order many times can give an expectancy of losing streaks and maximum draw down.
2. Find other ways to bootstrap time series data.
3. Randomly add / subtract an amount to the original price series, changing the data itself, and run multiple iterations.
4. Walk forward analysis with multiple train and test periods.
5. In / out of sample testing with a hold out method.
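As a sketch of idea 1, one can reshuffle a return series many times and look at the distribution of maximum draw downs; the toy returns below are hypothetical stand-ins, not output from the strategy above.

```python
import numpy as np

def max_drawdown(returns):
    """Maximum peak-to-trough draw down of a return series."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))

def shuffled_drawdowns(returns, n_iter=1000, seed=0):
    """Reshuffle the return order n_iter times and collect the
    maximum draw down of each shuffled equity curve."""
    rng = np.random.default_rng(seed)
    return np.array([max_drawdown(rng.permutation(returns))
                     for _ in range(n_iter)])

rng = np.random.default_rng(1)
toy_returns = rng.normal(0.001, 0.02, size=1000)  # hypothetical daily returns
dds = shuffled_drawdowns(toy_returns, n_iter=500)
worst_p95 = np.percentile(dds, 95)  # a stressed draw down estimate
```

The upper tail of this distribution gives a sense of how bad a draw down could have been under a different ordering of the same returns, which is useful when sizing positions.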

More to come

Please feel free to drop me a line on other methods of strategy stress testing or any other general collaboration.

Thanks for reading!

Full code can be found on my github.