Bootstrap Analysis – Resample and Replace

As I alluded to in the previous post, I made attempts to block bootstrap time series outside of the packages meboot, tseries & boot. In this post I will share the results of that effort.

First, as in the last post, we need data. We need to arrive at this point in the previous post's script:

annotate("text", label = "sma 117 to 161", x = 135, y = .65, color = "red")
This is line 294 of that script; from there you may follow along with this post.

The reason for block bootstrapping time series is to maintain the correlation structure of the series. To do this, we first difference the time series data with the following code to create stationary series:

# First order differencing
# Removes autocorrelation so each data point is not dependent on the other
# XIV diff
df$xiv.close.diff <- c(rep(NA, 1), diff(df$xiv_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$xiv.close.diff, type="l",main="XIV Close Differencing")
mean(df$xiv.close.diff, na.rm = TRUE)
var(df$xiv.close.diff, na.rm = TRUE)
# VXX diff
df$vxx.close.diff <- c(rep(NA, 1), diff(df$vxx_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxx.close.diff, type="l")
mean(df$vxx.close.diff, na.rm = TRUE)
var(df$vxx.close.diff, na.rm = TRUE)
# VXV Diff
df$vxv.close.diff <- c(rep(NA, 1), diff(df$vxv_cboe_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxv.close.diff, type="l")
mean(df$vxv.close.diff, na.rm = TRUE)
var(df$vxv.close.diff, na.rm = TRUE)
#VXMT Diff
df$vxmt.close.diff <- c(rep(NA, 1), diff(df$vxmt_cboe_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxmt.close.diff, type="l")
mean(df$vxmt.close.diff, na.rm = TRUE)
var(df$vxmt.close.diff, na.rm = TRUE)

Output:

[Plot: XIV Close Differencing]

Next, we set a block size derived from the b.star function in the R package np. This calculates the optimal block size for the stationary and circular bootstrap per the paper Patton, Politis and White (2009).

# Create new data frame with differenced time series (stationary)
require(np)
#block.boot.df <- subset(df, Date >= as.POSIXct("2008-01-07") ) # subset at start of latest data set (VXMT)
block.boot.df <- data.frame(xiv.close.diff = df$xiv.close.diff, vxx.close.diff = df$vxx.close.diff, vxv.close.diff = df$vxv.close.diff, vxmt.close.diff = df$vxmt.close.diff)
# Find block size with package np
np.block.size.xiv <- b.star(block.boot.df$xiv.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
xiv.block.size <- np.block.size.xiv[2]
np.block.size.vxx <- b.star(block.boot.df$vxx.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxx.block.size <- np.block.size.vxx[2]
np.block.size.vxv <- b.star(block.boot.df$vxv.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxv.block.size <- np.block.size.vxv[2]
np.block.size.vxmt <- b.star(block.boot.df$vxmt.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxmt.block.size <- np.block.size.vxmt[2]
block.size <- c(xiv.block.size,vxx.block.size,vxv.block.size,vxmt.block.size)
block <- max(block.size) # choose the largest block size across the four series (11)

The block sizes for each of the 4 series in our volatility strategy:

 > block.size
  xiv.block.size vxx.block.size vxv.block.size vxmt.block.size
1              2              4             11               4

For this we will choose the maximum number obtained, which is 11. The code works by grouping the data frame rows into blocks of this size and storing the groups in a list. A resampling with replacement is then performed on the list of grouped rows, and the reshuffled list is stored in an output data frame. This function is iterated 100 times, storing each block bootstrapped series in an output list. As we are working with 4 time series, all 4 series are reshuffled in the same order and end up in the same locations in the new time series. If this were not the case there would be more randomness between the series, which might not make much sense. Here's the code:

# Set block number to begin subsetting per NROW
reps <- NROW(block.boot.df)/block # Set block number
reps <-(round(reps))
id <- rep(1:reps,each=block) # each = block corresponds to the block size (11 in this case)
# Check lengths of data and ID
NROW(block.boot.df)
length(id)
# If ID is longer than df NROW
if(length(id)>NROW(block.boot.df)){
  id.diff <- length(id) - NROW(block.boot.df)
  length.id <- length(id)-id.diff
  id  <- id[1:length.id]
} else {
  nrow.diff <- NROW(block.boot.df) - length(id)
  max.id <- max(id)
  add.value <- max.id+1
  pad.na <- rep(add.value,nrow.diff)  # pad nrow diff
  id <- c(id,pad.na)  # join added to vector
}

# Join ID and block.df
block.boot.df <- data.frame(block.boot.df,id) # place back in data frame
#block.boot.df$id.final[is.na(block.boot.df$id.final)] <- 693 # all NA to 1
# Id data
IDs <- unique(block.boot.df$id)
temp <- list()
# Function for bootstrap 1x data frame
# subsets data by id number
i=1
bootSTRAP = function(x){
  for (i in 1:length(IDs)){
    temp[i] <- list(block.boot.df[block.boot.df$id==IDs[i],])
  }
  out <- sample(temp,replace = TRUE)
  boot.df.output <- do.call(rbind, out)
}

# Loop for running it 100 times
runs <- 1:100
run.output <- list()
i=1
for (i in 1:length(runs)){
  tryCatch({
    temp.1 <- bootSTRAP(runs[i])
    #cum_ret <- rbind.data.frame(cum_ret, temp)
    run.output[[i]] <- cbind(temp.1)
    ptm0 <- proc.time()
    Sys.sleep(0.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# cumsum stationary differenced series
for (i in 1:length(run.output)){
  run.output[[i]]$cumsum.diff.xiv.diff <- cumsum(run.output[[i]]$xiv.close.diff)
  run.output[[i]]$cumsum.diff.vxx.diff <- abs(cumsum(run.output[[i]]$vxx.close.diff))
  run.output[[i]]$cumsum.diff.vxv.diff <- cumsum(run.output[[i]]$vxv.close.diff)
  run.output[[i]]$cumsum.diff.vxmt.diff <- cumsum(run.output[[i]]$vxmt.close.diff)
}

# Reverse VXX down trending series
for (i in 1:length(run.output)){
  run.output[[i]]$cumsum.diff.vxx.diff <- rev(run.output[[i]]$cumsum.diff.vxx.diff )
}
# Add index for merging
for (i in 1:length(run.output)){
  run.output[[i]]$index <- seq(1:NROW(run.output[[i]]))
}

# Merge all dfs
L <- run.output[[1]]
for (i in 2:length(run.output)) L <- merge(L,  run.output[[i]], by=c("index"))

# cbinds
replace.s <- cbind.data.frame(L)
replace.s <- replace.s[,-1]
# Subset data frames for loop
xiv.df <- replace.s[ , grepl("cumsum.diff.xiv.diff*", names(replace.s), perl=TRUE)]
vxx.df <- replace.s[ , grepl("cumsum.diff.vxx.diff*", names(replace.s), perl=TRUE)]
vxv.df <- replace.s[ , grepl("cumsum.diff.vxv.diff*", names(replace.s), perl=TRUE)]
vxmt.df <- replace.s[ , grepl("cumsum.diff.vxmt.diff*", names(replace.s), perl=TRUE)]

# Add Date to df's
df.date <- data.frame(Date = df$Date)
diff <- nrow(df.date) - nrow(xiv.df)
df.date <- df.date[1:nrow(xiv.df),]
# Add Date to subsetted df
xiv.ensemble.df <- data.frame(Date = df.date, xiv.df)
vxx.ensemble.df <- data.frame(Date = df.date, vxx.df)
vxv.ensemble.df <- data.frame(Date = df.date, vxv.df)
vxmt.ensemble.df <- data.frame(Date = df.date, vxmt.df)

# Melt data frames for plotting
xiv.plot.df <- melt(xiv.ensemble.df,id.vars = "Date")
vxx.plot.df <- melt(vxx.ensemble.df,id.vars = "Date")
vxv.plot.df <- melt(vxv.ensemble.df,id.vars = "Date")
vxmt.plot.df <- melt(vxmt.ensemble.df,id.vars = "Date")

# Plot XIV Resampled series
ggplot(data = xiv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=xiv_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - XIV", subtitle="100 Iterations")

# Plot VXX Resampled series
ggplot(data = vxx.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxx_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXX", subtitle="100 Iterations")

# Plot VXV Resampled series
ggplot(data = vxv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxv_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXV", subtitle="100 Iterations")

# Plot VXMT Resampled series
ggplot(data = vxmt.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxmt_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXMT", subtitle="100 Iterations")

The output:

[Plot: XIV – resample and replace – block size == 11]
[Plot: VXX – resample and replace – block size == 11. Note the resampled series is not a great representation of the original!]

This is in contrast to maximum entropy bootstrapping, where each series resembled the original to a high degree; the block bootstrapped series differ from the original. For the VXX series itself, the trend is straight down. The packages tseries and boot did not manage to bootstrap this successfully either; essentially, after differencing and cumulatively summing, the series needed to be reversed. This is due to the sign of the VXX trend: taking the absolute value of the cumulative sum of the (mostly negative) differences leaves an up-trending VXX, which then has to be reversed.

In any case, this might be effective for back testing strategies whose trading rules are overlaid on the price series itself. The VXV/VXMT strategy is a bet on contango/backwardation, so its trading rules are not, per se, overlaid on the series itself. However, for those strategies which do use trading rules based on the price series, the above bootstrap procedure may serve a useful purpose.

Code for this can be found on my github.

 

Robustness Testing – Volatility Strategy – Time Series Bootstrapping

In this post we will look at a simple volatility strategy. We will optimize the parameters over a back test and select a region of parameters which shows the best performance. We will then subject this optimal performance band to further testing by resampling the original time series using the meboot package.

Traditionally, bootstrapping is performed on data that is independent and identically distributed, which of course is not the case with time series data due to the presence of autocorrelation: if autocorrelation is present, the time series is dependent on its past. Lag 1 autocorrelation is the correlation between the pairs (x4,x3), (x3,x2), (x2,x1), …; lag 2 is the correlation between (x5,x3), (x4,x2), (x3,x1), …; and so on and so forth for the entire sample at varying time lags.
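As a quick illustration, a minimal sketch for inspecting the autocorrelation (this assumes the df$xiv_close column built in the data preparation code later in this post):

# Sample autocorrelation of XIV closes at each lag
x <- na.omit(df$xiv_close)
acf(x, lag.max = 20, main = "ACF - XIV Close")
# Lag-1 autocorrelation computed by hand
cor(x[-1], x[-length(x)])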

[Plot: ACF of XIV]

This is the autocorrelation of XIV. We may remove the autocorrelation by differencing, Xt − Xt−1:
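A minimal sketch of the differencing, again assuming df$xiv_close:

# First-order difference; the differenced series should show little autocorrelation
df$xiv.close.diff <- c(NA, diff(df$xiv_close))
plot(df$Date, df$xiv.close.diff, type = "l", main = "XIV Close Differencing")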

[Plot: XIV Close Differencing]

One may notice there is no trend; the series is stationary. However, we notice the variance has increased over time.

mean(df$xiv.close.diff, na.rm = TRUE)
[1] 0.050235
> var(df$xiv.close.diff, na.rm = TRUE)
[1] 1.299783

We may stabilize the variance by taking the log of the close and differencing.
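A minimal sketch of how the log.xiv.close.diff column used below can be created, assuming df$xiv_close:

# Log the close, then take the first-order difference, to stabilize the variance
df$log.xiv.close.diff <- c(NA, diff(log(df$xiv_close)))
plot(df$Date, df$log.xiv.close.diff, type = "l", main = "XIV Log Close Differencing")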

[Plot: XIV Log Close Differencing]

mean(df$log.xiv.close.diff, na.rm = TRUE)
[1] 0.001229805
> var(df$log.xiv.close.diff, na.rm = TRUE)
[1] 0.001382537

 

[Plot: autocorrelation after differencing]

With the autocorrelation removed, one may attempt to perform traditional methods of bootstrapping.
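For instance, a minimal sketch of an ordinary i.i.d. bootstrap on the now roughly uncorrelated differences, assuming the log.xiv.close.diff column from above:

# Ordinary bootstrap: resample individual observations with replacement
d <- na.omit(df$log.xiv.close.diff)
boot.means <- replicate(1000, mean(sample(d, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975))  # bootstrap interval for the mean daily change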

With a stationary series, the mean is constant over time and there is no trend. An idea I was experimenting with was differencing as per above and, in an attempt to retain the correlation structure of the original series, block bootstrapping through resampling with replacement and stitching the series back together. As the resampled series is still stationary, we cumulatively sum the output in order to recreate a trending series. This would then serve as a new series to run new back tests on. See my attempt here.
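A compact sketch of that idea (the block size of 11 is taken from the previous post; the helper name is my own):

# Difference, resample fixed-size blocks with replacement, stitch, then cumsum
block.resample <- function(x, block = 11) {
  d <- diff(x)                                               # stationary differences
  ids <- split(seq_along(d), ceiling(seq_along(d) / block))  # consecutive blocks of indices
  new.d <- unlist(sample(ids, replace = TRUE))               # resample whole blocks
  x[1] + cumsum(d[new.d])                                    # rebuild a trending series
}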

So far I have opted to use the meboot (maximum entropy bootstrap) R package, which does not require the series to be stationary, does not require a block length parameter, and keeps the correlation structure of the data intact.

We will need XIV, VXX, CBOE VXV and CBOE VXMT data. I also use synthetic VXX and XIV data stored on my hard drive; feel free to edit these portions to include the locations of your own synthetic data (if you choose to want more history).

Strategy rules as follows:

  1. Long VXX when the VXV/VXMT ratio is over 1 & the VXV/VXMT ratio is over an ‘n’ period SMA of the VXV/VXMT ratio.
  2. Long XIV when the VXV/VXMT ratio is less than 1 & the VXV/VXMT ratio is below an ‘n’ period SMA of the VXV/VXMT ratio.

We have two free parameters here: the ratio value, currently set to 1, and the look-back period for the VXV/VXMT SMA ratio.

We optimize the VXV/VXMT SMA ratio from SMA 2:300 and plot the annualized return for each SMA period. The code to do this:

Note: I'm using the Alpha Vantage API; you will need your own API key…

# Meboot time series resampling
# Andrew Bannerman 12.29.2017

require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)
require(meboot)
require(np)

######### Download ETF Data ###############
# Load Synthetic and join to alpha vantage adjusted prices
# Load synthetic VXX and XIV data
library(readxl)
synth <- read_excel("D:/R Projects/Final Scripts/VIX_term_structure/vix-funds-models-no-formulas.xls", col_names = TRUE)
synth1 <- read_excel("D:/R Projects/Final Scripts/VIX_term_structure/vix-mt-funds-models-no-formulas.xls", col_names = TRUE)
synth <- as.data.frame(synth)
synth1 <- as.data.frame(synth1)
# Extract synthetic series
vxx.synth <- data.frame(synth$Date, synth$'VXX calc')
xiv.synth <- data.frame(synth$Date, synth$'XIV calc')
ziv.synth <- data.frame(synth1$Date, synth1$'ZIV calc')
vxz.synth <- data.frame(synth1$Date, synth1$'VXZ calc')
colnames(vxx.synth)[1] <- "Date"
colnames(vxx.synth)[2] <- "vxx_close"
colnames(xiv.synth)[1] <- "Date"
colnames(xiv.synth)[2] <- "xiv_close"
colnames(ziv.synth)[1] <- "Date"
colnames(ziv.synth)[2] <- "ziv_close"
colnames(vxz.synth)[1] <- "Date"
colnames(vxz.synth)[2] <- "vxz_close"
vxx.synth$Date <- ymd(vxx.synth$Date)
xiv.synth$Date <- ymd(xiv.synth$Date)
ziv.synth$Date <- ymd(ziv.synth$Date)
vxz.synth$Date <- ymd(vxz.synth$Date)

# Download SPY data
# Note you need to place your API key...your_key_here
SPY <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SPY&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
SPY$timestamp <- ymd(SPY$timestamp)   #Lubridate to change character date to date format
SPY <- arrange(SPY,timestamp)   #dplyr to sort data frame by date ascending order
colnames(SPY)[1] <- "Date"
SPY$Date <- ymd(SPY$Date)
SPY <- as.data.frame(SPY)
SPY <- subset(SPY, Date >= as.POSIXct("2004-03-26") ) # synthetic data start
head(SPY)

VXX <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXX&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
VXX$timestamp <- ymd(VXX$timestamp)   #Lubridate to change character date to date format
VXX <- arrange(VXX,timestamp)   #dplyr to sort data frame by date ascending order
colnames(VXX)[1] <- "Date"
VXX$Date <- ymd(VXX$Date)
VXX <- as.data.frame(VXX)
head(VXX)

XIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=XIV&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
XIV$timestamp <- ymd(XIV$timestamp)   #Lubridate to change character date to date format
XIV <- arrange(XIV,timestamp)   #dplyr to sort data frame by date ascending order
colnames(XIV)[1] <- "Date"
XIV$Date <- ymd(XIV$Date)
XIV <- as.data.frame(XIV)
head(XIV)
#XIV <- subset(XIV, Date >= as.POSIXct("2012-01-01"))

ZIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=ZIV&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
ZIV$timestamp <- ymd(ZIV$timestamp)   #Lubridate to change character date to date format
ZIV <- arrange(ZIV,timestamp)   #dplyr to sort data frame by date ascending order
colnames(ZIV)[1] <- "Date"
ZIV$Date <- ymd(ZIV$Date)
ZIV <- as.data.frame(ZIV)
head(ZIV)

VXZ <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXZ&outputsize=full&apikey=your_api_key&datatype=csv") #fread() data.table for downloading directly to a data frame
VXZ$timestamp <- ymd(VXZ$timestamp)   #Lubridate to change character date to date format
VXZ <- arrange(VXZ,timestamp)   #dplyr to sort data frame by date ascending order
colnames(VXZ)[1] <- "Date"
VXZ$Date <- ymd(VXZ$Date)
VXZ <- as.data.frame(VXZ)
tail(VXZ)

# Join synthetic data to alpha vantage
vxx.synth <- subset(vxx.synth, Date <= as.POSIXct("2009-01-29"))
xiv.synth <- subset(xiv.synth, Date <= as.POSIXct("2010-11-29"))
ziv.synth <- subset(ziv.synth, Date <= as.POSIXct("2010-11-29"))
vxz.synth <- subset(vxz.synth, Date <= as.POSIXct("2009-02-19"))
# Subset only date and close from alpha vantage data
VXX <- VXX[ -c(2:5, 7:9) ]  # subset adjusted close
XIV <- XIV[ -c(2:5, 7:9) ]  # subset adjusted close
ZIV <- ZIV[ -c(2:5, 7:9) ]  # subset adjusted close
VXZ <- VXZ[ -c(2:5, 7:9) ]  # subset adjusted close
SPY <- SPY[ -c(3:5, 7:9) ]  # subset adjusted close
colnames(VXX)[2] <- "vxx_close"
colnames(XIV)[2] <- "xiv_close"
colnames(ZIV)[2] <- "ziv_close"
colnames(VXZ)[2] <- "vxz_close"
colnames(SPY)[2] <- "spy_open"
colnames(SPY)[3] <- "spy_close"

# row bind
VXX <- rbind(vxx.synth,VXX)
XIV <- rbind(xiv.synth,XIV)
ZIV <- rbind(ziv.synth,ZIV)
VXZ <- rbind(vxz.synth,VXZ)
df <- cbind(VXX,XIV,ZIV,VXZ,SPY)
tail(df)

# Download Spot VIX Price and VXV Price from CBOE website
VIX_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv")
VIX_cboe <- as.data.frame(VIX_cboe)
VIX_cboe <- VIX_cboe[2:nrow(VIX_cboe), ]
colnames(VIX_cboe)[1] = "Date"
colnames(VIX_cboe)[2] = "vix_open"
colnames(VIX_cboe)[3] = "vix_high"
colnames(VIX_cboe)[4] = "vix_low"
colnames(VIX_cboe)[5] = "vix_close"
VIX_cboe$Date <- mdy(VIX_cboe$Date)
cols <-c(2:5)
VIX_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
VIX_cboe <- subset(VIX_cboe, Date >= as.POSIXct("2007-12-04") )
# Download VXV Data From CBOE website
VXV_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vix3mdailyprices.csv")
VXV_cboe <- as.data.frame(VXV_cboe)
VXV_cboe <- VXV_cboe[3:nrow(VXV_cboe), ]
colnames(VXV_cboe)[1] = "Date"
colnames(VXV_cboe)[2] = "vxv_cboe_open"
colnames(VXV_cboe)[3] = "vxv_cboe_high"
colnames(VXV_cboe)[4] = "vxv_cboe_low"
colnames(VXV_cboe)[5] = "vxv_cboe_close"
VXV_cboe$Date <- mdy(VXV_cboe$Date)
cols <-c(2:5)
VXV_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Download VXMT Data from CBOE website
VXMT_cboe <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxmtdailyprices.csv")
VXMT_cboe <- as.data.frame(VXMT_cboe)
VXMT_cboe <- VXMT_cboe[3:nrow(VXMT_cboe), ]
colnames(VXMT_cboe)[1] = "Date"
colnames(VXMT_cboe)[2] = "vxmt_cboe_open"
colnames(VXMT_cboe)[3] = "vxmt_cboe_high"
colnames(VXMT_cboe)[4] = "vxmt_cboe_low"
colnames(VXMT_cboe)[5] = "vxmt_cboe_close"
VXMT_cboe$Date <- mdy(VXMT_cboe$Date)
cols <-c(2:5)
VXMT_cboe[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Join VIX, VIX3m (VXV) and VXMT CBOE data to ETF df
cboe.df <- merge(VIX_cboe,VXV_cboe, by="Date")
cboe.df <- merge(cboe.df,VXMT_cboe, by="Date")
tail(cboe.df)
df <- df[,c(-3,-5,-7,-9)] # Drop unused dates
df <- full_join(df, cboe.df, by = c("Date" = "Date"))

# Remove last rows
nrow <- NROW(df)-2
df <- head(df,nrow)
tail(df)

############################################
# Back Test VXV / VXMT Ratio
# Find best params without bootstrapping
############################################
# Calculate Close-to-Close returns
df$vxx.close.ret <- ROC(df$vxx_close, type = c("discrete"))
df$xiv.close.ret <- ROC(df$xiv_close, type = c("discrete"))

# VXV / VXMT Ratio
df$vxv.vxmt.ratio <- df$vxv_cboe_close / df$vxmt_cboe_close
df[is.na(df)] <- 0
head(df)

# Calculate SMA of ratio
numdays <- 2:500
getSMA <- function(numdays) {
  function(df) {
    SMA(df[,"vxv.vxmt.ratio"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(df), ncol=0)

# Loop for filling it
for (i in numdays) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(df))
}

# Rename columns
colnames(sma.matrix) <- sapply(numdays, function(n)paste("ratio.sma.n", n, sep=""))

# Bind to existing dataframe
df <-  cbind(df, sma.matrix)
tail(df)

##############################################
# Optimize Strategy Params
##############################################
num.days <- 2:300
i=3
# Initialize data frame
data_output_df <- data.frame()
# Optimize #########
optIMIZE = function(x){
  #spx.sma <- df[,paste0("close.sma.n", sma[i])]
  names(df)
  # Enter buy / sell rules
  #df$vxx.signal <- ifelse(df$vxv.vxmt.ratio > 1 & df$vxv.vxmt.ratio > df$ratio.sma , 1,0)
  #df$xiv.signal <- ifelse(df$vxv.vxmt.ratio < 1 & df$vxv.vxmt.ratio < df$ratio.sma , 1,0)
  df$vxx.signal <- ifelse(df$vxv.vxmt.ratio > 1 & df$vxv.vxmt.ratio > df[,paste0("ratio.sma.n", num.days[i])], 1,0)
  df$xiv.signal <- ifelse(df$vxv.vxmt.ratio < 1 & df$vxv.vxmt.ratio < df[,paste0("ratio.sma.n", num.days[i])], 1,0)

  # lag signal by two forward days
  # CBOE data is available next day
  df$vxx.signal <- lag(df$vxx.signal,2) # Note k=1 implies a move *forward*
  df$xiv.signal <- lag(df$xiv.signal,2) # Note k=1 implies a move *forward*

  df[is.na(df)] <- 0  # Set NA to 0

  # Calculate equity curves
  df$vxx.signal.ret <-  df$vxx.signal * df$vxx.close.ret
  df$xiv.signal.ret <-  df$xiv.signal * df$xiv.close.ret

  # Combine signals
  df$total.signal.ret <- df$vxx.signal.ret + df$xiv.signal.ret

  # Pull select columns from data frame to make XTS whilst retaining formats
  xts1 = xts(df$vxx.signal.ret, order.by=as.Date(df$Date, format="%m/%d/%Y"))
  xts2 = xts(df$xiv.signal.ret, order.by=as.Date(df$Date, format="%m/%d/%Y"))
  xts3 = xts(df$total.signal.ret, order.by=as.Date(df$Date, format="%m/%d/%Y"))
  tail(xts3)

  # Join XTS together
  compare <- cbind(xts1,xts2,xts3)

  # Use the PerformanceAnalytics package for trade statistics

  require(PerformanceAnalytics)
  colnames(compare) <- c("vxx","xiv","combined")
  #charts.PerformanceSummary(compare,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
  # performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
  # drawdown.table <- rbind(table.Drawdowns(xts3))
  #dev.off()
  # logRets <- log(cumprod(1+compare))
  # chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal,main="Log Returns")

  #print(performance.table)
  #print(drawdown.table)
  cum.ret <- Return.cumulative(xts3, geometric = TRUE)
  annualized <- Return.annualized(xts3, scale = NA, geometric = TRUE)
  dd <- maxDrawdown(xts3)
  sharpe <- SharpeRatio.annualized(xts3, Rf = 0, scale = NA, geometric = TRUE)

  # Create data output of rep and close.diff columns rbind
  data_output_df <- data.frame("Annualized Return" = annualized,"Annualized Sharpe" = sharpe,"Cumulative Return" = cum.ret,"Maximum Draw Down" = dd)

}

for (i in 1:length(num.days)){    # Length of optimization
  tryCatch({
    temp <- optIMIZE(num.days[[i]])
    rownames(temp) <- paste0("",num.days[i])
    #cum_ret <- rbind.data.frame(cum_ret, temp)
    data_output_df <- rbind.data.frame(data_output_df,temp)
    ptm0 <- proc.time()
    Sys.sleep(0.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Join SMA number to data frame
data_output_df <- data.frame(data_output_df,num.days)
show <- data_output_df[51:70,]

# Plot
colnames(data_output_df)
library(ggplot2)
ggplot(data=data_output_df, aes(x=num.days,Annualized.Return))+
  geom_bar(stat="identity")+
  theme_classic()+
  scale_x_continuous(breaks = seq(min(data_output_df$num.days), max(data_output_df$num.days)))+
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5,size=15))+
  ggtitle("VXV/VXMT Volatility Strategy - Optimized sma vzx/vxmt ratio",subtitle="2008 to present")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  geom_rect(aes(xmin=51,xmax=70,ymin=0,ymax=Inf),alpha=0.01,fill="#CC6666")+
  geom_rect(aes(xmin=117,xmax=161,ymin=0,ymax=Inf),alpha=0.01,fill="#CC6666")+
  annotate("text", label = "sma 51 to 70", x = 60, y = .65, color = "red")+
  annotate("text", label = "sma 117 to 161", x = 135, y = .65, color = "red")
[Plot: Annualized Returns by SMA period]

We see two parameter bands, so to speak, using SMA look-back periods of 51 to 70 and 117 to 161. Those are SMAs which provided annualized returns, since 2008, of over 50%.

This is considered cross-sectional stability (see Tony Cooper's comment here). However, one issue is that these adjacent areas have correlated trades; one would perhaps need to scrutinize the returns on a time series basis to rule out lucky periods or months.

Next, we select the optimized SMA band of 51 to 70 and run it over the resampled time series. This is done with this code:

############################
# Meboot
# Generate Maximum Entropy Bootstrapped Time Series Ensemble
############################
# XIV
df[is.na(df)] <- 0
df <- head(df,NROW(df)-1) # Remove missing value...
tail(df$xiv_close,10)
xiv.boot <- meboot(df$xiv_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
               expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE,
               elaps=TRUE, colsubj, coldata, coltimes)
# Place meboot results in data frame
xiv.ensemble.df <- data.frame(xiv.boot$ensemble)
xiv.ensemble.df <- data.frame(xiv.ensemble.df,"Date"=df$Date)
# Melt for plotting
xiv.plot.df <- melt(xiv.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = xiv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=xiv_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - XIV",subtitle="100 Iterations")

# VXX
vxx.boot <- meboot(df$vxx_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
                   expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE,
                   elaps=TRUE, colsubj, coldata, coltimes)
# Place meboot results in data frame
vxx.ensemble.df <- data.frame(vxx.boot$ensemble)
vxx.ensemble.df <- data.frame(vxx.ensemble.df,"Date"=df$Date)
# Melt for plotting
vxx.plot.df <- melt(vxx.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = vxx.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxx_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXX", subtitle="100 Iterations")

# VXV
vxv.boot <- meboot(df$vxv_cboe_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
                   expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE,
                   elaps=TRUE, colsubj, coldata, coltimes)
# Place meboot results in data frame
vxv.ensemble.df <- data.frame(vxv.boot$ensemble)
vxv.ensemble.df <- data.frame(vxv.ensemble.df,"Date"=df$Date)
# Melt for plotting
vxv.plot.df <- melt(vxv.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = vxv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxv_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXV", subtitle="100 Iterations")

# VXMT
vxmt.boot <- meboot(df$vxmt_cboe_close, reps=100, trim=list(trim=0.10, xmin=NULL, xmax=NULL), reachbnd=FALSE,
                   expand.sd=TRUE, force.clt=FALSE, scl.adjustment = FALSE, sym = FALSE,
                   elaps=TRUE, colsubj, coldata, coltimes)
# Place meboot results in data frame
vxmt.ensemble.df <- data.frame(vxmt.boot$ensemble)
vxmt.ensemble.df <- data.frame(vxmt.ensemble.df,"Date"=df$Date)
# Melt for plotting
vxmt.plot.df <- melt(vxmt.ensemble.df,id.vars = "Date")
# Plot ggplot2
ggplot(data = vxmt.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxmt_cboe_close,colour="red"))+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXMT")

An example of a resampled series using meboot:

[Plot: XIV – resampled 100 times]

One thing that is noticeable is that the resampled series are tightly bound to the original series. This example uses 100 iterations for illustration purposes; one would create upwards of 1000 resampled series.

The next course of action is to back test the original strategy using the optimal parameter band of VXV/VXMT SMA ratio 51 to 70. The strategy will run from SMA 51 to SMA 70 over all 100 series. We will record the results in a list and then plot the distribution of results over key metrics: annualized return, maximum draw down and annualized Sharpe ratio. The code that achieves this:

#################################
# Back test optimal band found in initial sample over boot strapped series
##################################
df.boot <- data.frame("Date" = xiv.ensemble.df$Date)
# boot strap sample series results
# Initialize list
boot_output <- list()
#Drop dates (not used for plotting)
xiv.ensemble.df <- xiv.ensemble.df[,-101]
vxx.ensemble.df <- vxx.ensemble.df[,-101]
vxv.ensemble.df <- vxv.ensemble.df[,-101]
vxmt.ensemble.df <- vxmt.ensemble.df[,-101]

i=1
sma <- rep(51:70,each=length(xiv.ensemble.df)) # each sma value repeated once per ensemble column (100)
length.dfs <- rep(1:100,20) # 100 ensemble columns x 20 sma values (51:70)
for (i in 1:length(length.dfs)) {
    tryCatch({
  ############################################
  # Back Test VXV / VXMT Ratio Over Boot Strapped Series
  ############################################
  # Calculate Close-to-Close returns
  df.boot$vxx.close.ret <- ROC(vxx.ensemble.df[,length.dfs[i]], type = c("discrete"))
  df.boot$xiv.close.ret <- ROC(xiv.ensemble.df[,length.dfs[i]], type = c("discrete"))

  # VXV / VXMT Ratio
  df.boot$vxv.vxmt.ratio <- vxv.ensemble.df[,length.dfs[i]]/ vxmt.ensemble.df[,length.dfs[i]]
  df.boot$vxv.vxmt.ratio[is.na(df.boot$vxv.vxmt.ratio)] <- 0
  df.boot$vxv.vxmt.ratio[is.nan(df.boot$vxv.vxmt.ratio)] <- 0
  df.boot$vxv.vxmt.ratio[is.infinite(df.boot$vxv.vxmt.ratio)] <- 0

  # Create sma
  df.boot$sma <- SMA(df.boot[,"vxv.vxmt.ratio"], sma[i])    # Calls TTR package to create SMA

    # Enter buy / sell rules
  sma.n <- sma[i]
    #df.boot$vxx.signal <- ifelse(df.boot$vxv.vxmt.ratio > 1 & df.boot$vxv.vxmt.ratio > df.boot[,paste0("ratio.sma.n", sma.n)], 1,0)
  df.boot$vxx.signal <- ifelse(df.boot$vxv.vxmt.ratio > 1 & df.boot$vxv.vxmt.ratio > df.boot$sma, 1,0)
  df.boot$xiv.signal <- ifelse(df.boot$vxv.vxmt.ratio < 1 & df.boot$vxv.vxmt.ratio < df.boot$sma, 1,0)

    # lag signal by two forward days
    # CBOE data is available next day
    df.boot$vxx.signal <- lag(df.boot$vxx.signal,2) # Note k=1 implies a move *forward*
    df.boot$xiv.signal <- lag(df.boot$xiv.signal,2) # Note k=1 implies a move *forward*

    # Calculate equity curves
    df.boot$vxx.signal.ret <-  df.boot$vxx.signal * df.boot$vxx.close.ret
    df.boot$xiv.signal.ret <-  df.boot$xiv.signal * df.boot$xiv.close.ret

    # Combine signals
    df.boot$total.signal.ret <- df.boot$vxx.signal.ret + df.boot$xiv.signal.ret

    # Pull select columns from data frame to make XTS whilst retaining formats
    xts1 = xts(df.boot$vxx.signal.ret, order.by=as.Date(df.boot$Date, format="%m/%d/%Y"))
    xts2 = xts(df.boot$xiv.signal.ret, order.by=as.Date(df.boot$Date, format="%m/%d/%Y"))
    xts3 = xts(df.boot$total.signal.ret, order.by=as.Date(df.boot$Date, format="%m/%d/%Y"))
    tail(xts3)

    # Join XTS together
    compare <- cbind(xts1,xts2,xts3)

    # Use the PerformanceAnalytics package for trade statistics

    require(PerformanceAnalytics)
    colnames(compare) <- c("vxx","xiv","combined")
    #charts.PerformanceSummary(compare,main="Long when current month is higher than previous 12 month", wealth.index=TRUE, colorset=rainbow12equal)
    # performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
    # drawdown.table <- rbind(table.Drawdowns(xts3))
    #dev.off()
    # logRets <- log(cumprod(1+compare))
    # chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal,main="Log Returns")

    #print(performance.table)
    #print(drawdown.table)
    cum.ret <- Return.cumulative(xts3, geometric = TRUE)
    annualized <- Return.annualized(xts3, scale = NA, geometric = TRUE)
    dd <- maxDrawdown(xts3)
    sharpe <- SharpeRatio.annualized(xts3, Rf = 0, scale = NA, geometric = TRUE)
    id <- paste0("col",length.dfs[i],"sma",sma[i])

    # Create data output of rep and close.diff columns rbind
   out <- data.frame("Annualized Return" = annualized,"Annualized Sharpe" = sharpe,"Cumulative Return" = cum.ret,"Maximum Draw Down" = dd, id = id)
   rownames(out) <- paste0("col",length.dfs[i],"sma",sma[i])
    boot_output[[i]] <- rbind(out)

     ptm0 <- proc.time()
    Sys.sleep(0.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

  # Join  boot output
  master <- do.call(rbind, boot_output)
  names(master)

  # Find confidence intervals
  annualized.ret.mean <- mean(master$Annualized.Return)
  annualized.ret.sdev <- sd(master$Annualized.Return)
  annualized.ret.sample.size <- nrow(master)
  annualized.error <- qnorm(0.975)* annualized.ret.sdev/sqrt( annualized.ret.sample.size)
  annualized.left <- annualized.ret.mean - annualized.error
  annualized.right <- annualized.ret.mean + annualized.error

  #annualized.SE <- annualized.ret.sdev / sqrt(annualized.ret.sample.size)
  #t.test(master$Annualized.Sharpe,conf.level = 0.98)

  annualized.sharpe.mean <- mean(master$Annualized.Sharpe)
  annualized.sharpe.sdev <- sd(master$Annualized.Sharpe)
  annualized.sharpe.sample.size <- nrow(master)
  annualized.sharpe.error <- qnorm(0.975)* annualized.sharpe.sdev/sqrt(annualized.sharpe.sample.size)
  annualized.sharpe.left <- annualized.sharpe.mean - annualized.sharpe.error
  annualized.sharpe.right <- annualized.sharpe.mean + annualized.sharpe.error

  Maximum.Draw.Down.mean <- mean(master$Maximum.Draw.Down)
  Maximum.Draw.Down.sdev <- sd(master$Maximum.Draw.Down)
  Maximum.Draw.Down.sample.size <- nrow(master)
  Maximum.Draw.Down.error <- qnorm(0.975)* Maximum.Draw.Down.sdev/sqrt( Maximum.Draw.Down.sample.size)
  Maximum.Draw.Down.left <- Maximum.Draw.Down.mean - Maximum.Draw.Down.error
  Maximum.Draw.Down.right <- Maximum.Draw.Down.mean + Maximum.Draw.Down.error

  #t.test(master$Annualized.Return)
 # jarque.bera.test(master$Maximum.Draw.Down)
  require(tseries)

  # Plot
  # Annualized Return
  library(ggplot2)
  p1 <- ggplot(data=master, aes(Annualized.Return,col=I("red")))+
    geom_histogram(binwidth = 0.02)+
    theme_classic()+
    ggtitle("Resampled Strategy Results - Sma 51 to 70 - Annualized Return",subtitle="Strategy run over 100 resampled time series")+
    labs(x="Annualized Return",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
   # geom_vline(xintercept = left, color="blue", linetype="dashed")+
    #geom_vline(xintercept = right, color="blue", linetype="dashed")+
    geom_vline(xintercept = annualized.ret.mean, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.425727", x = 0.35, y = 117, color = "blue")

 # Sharpe Ratio
 p3 <- ggplot(data=master, aes(Annualized.Sharpe,col=I("red")))+
    geom_histogram(binwidth = 0.001)+
    theme_classic()+
    ggtitle("Resampled Strategy Results - Sma 51 to 70 - Annualized Sharpe Ratio",subtitle="Strategy run over 100 resampled time series")+
    labs(x="Annualized.Sharpe",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = annualized.sharpe.mean, color="blue", linetype="dashed")+
   geom_vline(xintercept = annualized.sharpe.right, color="blue", linetype="dashed")+
   geom_vline(xintercept = annualized.sharpe.left, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 1.02887", x = 1.02887, y = 15, color = "blue")

 annualized.sharpe.right

  # Maximum DD
  p5 <- ggplot(data=master, aes(Maximum.Draw.Down,col=I("red")))+
    geom_histogram(binwidth = 0.01)+
    theme_classic()+
    ggtitle("Resampled Strategy Results - Sma 51 to 70 - Maximum Draw Down",subtitle="Strategy run over 100 resampled time series")+
    labs(x="Maximum.Draw.Down",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = Maximum.Draw.Down.mean, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.5020338", x = 0.6, y = 65, color = "blue")

  ###################
  # Plot Original back test results
  ##################
  # Plot
  # Annualized Return
  library(ggplot2)
  # find original mean annualized
  mean.orig.annualized <- mean(show$Annualized.Return)
  p2 <- ggplot(data=show, aes(Annualized.Return,col=I("red")))+
    geom_histogram(binwidth = 0.01)+
    theme_classic()+
    ggtitle("Original Strategy Results - Sma 51 to 70 - Annualized Return",subtitle="")+
    labs(x="Annualized Return",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    # geom_vline(xintercept = left, color="blue", linetype="dashed")+
    #geom_vline(xintercept = right, color="blue", linetype="dashed")+
    geom_vline(xintercept = mean.orig.annualized, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.6039258", x = 0.585, y = 15, color = "blue")
  require(gridExtra)
  gridExtra::grid.arrange(p1, p2, nrow = 1)

  # Sharpe Ratio
  # find original mean annualized
  mean.orig.sharpe <- mean(show$Annualized.Sharpe)
 p4 <- ggplot(data=show, aes(Annualized.Sharpe,col=I("red")))+
    geom_histogram(binwidth = 0.02)+
    theme_classic()+
    ggtitle("Original Strategy Results - Sma 51 to 70 - Annualized Sharpe Ratio",subtitle="")+
    labs(x="Annualized.Sharpe",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = mean.orig.sharpe, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 1.565738", x = 1.52, y = 6.5, color = "blue")
 gridExtra::grid.arrange(p3, p4, nrow = 1)

  # Maximum DD
 names(show)
 show$Maximum.Draw.Down
 mean.orig.maxdd <- mean(show$Maximum.Draw.Down)
  p6 <- ggplot(data=show, aes(Maximum.Draw.Down,col=I("red")))+
    geom_histogram(binwidth = 0.0001)+
    theme_classic()+
    ggtitle("Original Strategy Results - Sma 51 to 70 - Maximum Draw Down",subtitle="")+
    labs(x="Maximum.Draw.Down",y="Count")+
    theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
    geom_vline(xintercept = mean.orig.maxdd, color="blue", linetype="dashed")+
    annotate("text", label = "Mean = 0.3420499", x = 0.35, y = 7, color = "blue")
  gridExtra::grid.arrange(p5, p6, nrow = 1)

[Plot: annualized return distributions – resampled vs original]

Mean annualized return over the resampled time series is 0.4257 (about 43%) vs the original strategy back test of 0.6039 (about 60%).

[Plot: annualized Sharpe ratio distributions – resampled vs original]

Mean annualized Sharpe ratio over the resampled time series is 1.0289 vs the original strategy back test of 1.5657.

[Plot: maximum draw down distributions – resampled vs original]

Mean maximum draw down over the resampled time series is 0.5020 (about 50%) vs the original strategy back test of 0.3420 (about 34%).

Table of the original optimized parameters:

[Table: back test results for the SMA 51 to 70 band]

Overall, the Monte Carlo distributions, on average, show a reduction in performance on annualized return, Sharpe ratio and maximum draw down.

It should be noted, however, that the expectancy is still somewhat positive. The mean and variance of the resampled series differ from those of the original series, though this may still not be the best way to proceed: the meboot method retains the structure of the original series, and the resampled series bear a strong resemblance to it. Still, the results do show that despite the change in mean/variance, and the accompanying drop in performance, the strategy parameters themselves did not fall apart. I deem that useful information moving forward, as a type of sensitivity test to changes in mean/variance.

Within the 51 to 70 SMA range it's like throwing darts; you're likely to land in a zone that would make money going forward.

This research on my end is very much in its infancy and I am looking for other ways to stress test strategies. Some which come to mind include:

1. Run Monte Carlo simulations on the distribution of returns. Changing the order many times may give an expectancy of losing streaks and maximum draw down (see the sketch after this list).
2. Find other ways to bootstrap time series data.
3. Randomly add / subtract an amount to the original price series, changing the data itself, and run multiple iterations.
4. Walk forward analysis, multiple train and test periods.
5. In-sample / out-of-sample testing, hold-out method.
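A minimal sketch of idea 1, assuming the strategy's daily returns are available as df$total.signal.ret (the column name from the back test above):

# Shuffle daily returns many times; record the maximum draw down of each reordering
set.seed(123)
rets <- as.numeric(na.omit(df$total.signal.ret))
max.dd <- replicate(1000, {
  shuffled <- sample(rets)              # random reordering of daily returns
  equity <- cumprod(1 + shuffled)       # equity curve of the shuffled path
  max(1 - equity / cummax(equity))      # maximum draw down of this path
})
quantile(max.dd, c(0.05, 0.5, 0.95))    # spread of plausible draw downs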

More to come

Please feel free to drop me a line on other methods of strategy stress testing and any other general collaboration.

Thanks for reading!

Full code can be found on my github.

XIV | VXX – Testing Mean Reversion, Momentum, Random Walk – Rolling Hurst Exponent

In addition to the previous posts studying volatility strategies here and here, we aim to study the nature of the XIV and VXX series. We subject the series to the Hurst exponent, following the same procedure as Ernie Chan (2013), where we take the lagged price differences and regress the log variance of the lagged differences against the log time lags; an example of this here.

For this post we will download VIX futures data from the CBOE website and join synthetic XIV and VXX data to make the largest data set we possibly can, per this previous post. After we have our XIV and VXX data we then proceed to compute the rolling Hurst exponent of each series. The procedure for this is as follows:

1. Compute lagged differences at varying time lags. For example, lag 2 = today's close price − the close price 2 days ago; lag 3 = today's close price − the close price 3 days ago.
2. Next, compute the variance of the lagged differences. Ernie Chan recommends at least 100 days of data for this, so we compute the variance over a period of 100 days for each lagged difference.
3. Perform a linear regression of log(variance_lagged_differences) ~ log(time_lags) and divide the slope by 2 to obtain the Hurst exponent (a one-off sketch follows below).
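To make step 3 concrete, here is a one-off (non-rolling) sketch, assuming x is a numeric price series:

# Var(lag) scales as lag^(2H), so the slope of log(variance) on log(lag) is 2H
hurst.once <- function(x, lags = 2:30) {
  v <- sapply(lags, function(k) var(diff(x, lag = k), na.rm = TRUE))
  fit <- lm(log(v) ~ log(lags))
  unname(coef(fit)[2] / 2)   # Hurst = slope / 2
}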

We compute the procedure above on a rolling basis, with a variance look-back of 100 trading days. We also use the R package RcppEigen and its fastLm function to perform the rolling linear regression. The code that achieves this:

###############################
# Hurst Exponent (varying lags)
###############################

require(magrittr)
require(zoo)
require(lattice)

## Set lags
lags <- 2:(252*6)

# Function for finding differences in lags. Todays Close - 'n' lag period
getLAG.DIFF <- function(lagdays) {
  function(term.structure.df) {
    c(rep(NA, lagdays), diff(term.structure.df$vxx_close, lag = lagdays, differences = 1, arithmetic = TRUE, na.pad = TRUE))
  }
}
# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow=nrow(term.structure.df), ncol=0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(term.structure.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n)paste("lagged.diff.n", n, sep=""))

# Bind to existing dataframe
term.structure.df <-  cbind(term.structure.df, lag.diff.matrix)
head(term.structure.df,25)

############################################################
# Calculate rolling variances of 'n period' differences
# Set variance look back to 100 days
############################################################
# Convert NA to 0
term.structure.df[is.na(term.structure.df)] <- 0

get.VAR <- function(varlag) {
  function(term.structure.df) {
    runVar(term.structure.df[,paste0("lagged.diff.n", lags[varlag])], y = NULL, n = 100, sample = TRUE, cumulative = FALSE)
  }
}
# Create a matrix to put the variances in
lag.var.matrix <- matrix(nrow=nrow(term.structure.df), ncol=0)

# Loop for filling it
for (i in 1:length(lags)) {
  lag.var.matrix <- cbind(lag.var.matrix, get.VAR(i)(term.structure.df))
}

# Rename columns
colnames(lag.var.matrix) <- sapply(lags, function(n)paste("roll.var.diff.", n, sep=""))

# Bind to existing dataframe
term.structure.df <-  cbind(term.structure.df, lag.var.matrix)

########################################
# Subset to remove all leading NA
########################################
#NonNAindex <- which(!is.na(term.structure.df))
#set_lag_threshold <- 50 # Set Variance
#na <- which(!is.na(term.structure.df[,paste0("roll.var.diff.", set_lag_threshold)]))
#firstNonNA <- min(na)
#term.structure.df<-term.structure.df[firstNonNA:nrow(term.structure.df),]

########################################
# Rolling linear regression to compute hurst exponent
########################################
variance <- list()
lag.vec <- c(2:30)  # Select short term lags
# Gather the short-term rolling variance columns for each row
roll.var.cols <- paste0("roll.var.diff.", lag.vec)
for (i in 1:nrow(term.structure.df)) {
  variance[[i]] <- unlist(term.structure.df[i, roll.var.cols], use.names = FALSE)
}

#Initialize list, pre allocate memory
results<-vector("list", length(variance))
hurst<-vector("list", length(variance))
library(RcppEigen)
i=1
for(i in 1:length(variance)){
  results[[i]] <- fastLm(log(variance[[i]]) ~ log(lag.vec))  # slope of log(variance) on log(lag) = 2H
  hurst[[i]] <- coef(results[[i]])[2]/2
  ptm0 <- proc.time()
  Sys.sleep(0.1)
  ptm1=proc.time() - ptm0
  time=as.numeric(ptm1[3])
  cat('\n','Iteration',i,'took', time, "seconds to complete")
}

# Join results to data frame
hurst <- do.call(rbind, hurst)
hurst.df <- as.data.frame(hurst)
hurst.df <- data.frame(hurst.df,Date=term.structure.df$Date)
colnames(hurst.df)[1] <- "Hurst"
hurst.df <- subset(hurst.df, Date >= as.POSIXct("2008-04-28") ) # subset data remove leading NA VXX only
# Plot Data
ggplot() +
  geom_line(data=hurst.df ,aes(x=Date,y=Hurst), colour="black") +
  theme_classic()+
  scale_y_continuous(breaks = round(seq(min(hurst.df$Hurst), max(hurst.df$Hurst), by = 0.2),2))+
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
  ggtitle("VXX Hurst Exponent - Daily Bars - Lags 2:30", subtitle = "Regression log(variances) ~ log(time_lags) - Hurst = Coef/2") +
  labs(x="Date",y="Hurst")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  #geom_hline(yintercept = 0.5, color = "red", size=0.5,linetype="dashed")+
  geom_rect(aes(xmin=as.Date(head(hurst.df$Date,1)),xmax=as.Date(Inf),ymin=0.5,ymax=Inf),alpha=0.1,fill="green")+
  geom_rect(aes(xmin=as.Date(head(hurst.df$Date,1)),xmax=as.Date(Inf),ymin=-Inf,ymax=0.5),alpha=0.1,fill="orange")+
  geom_rect(aes(xmin=as.Date(head(hurst.df$Date,1)),xmax=as.Date(Inf),ymin=0.48,ymax=0.52),alpha=.7,fill="red")

The output:

[Plots: rolling Hurst exponent for VXX and XIV. Random walk band = red, between 0.48 and 0.52; momentum = green, above 0.52; mean reversion = orange, below 0.48.]

This is for lagged differences of 2 to 30 days with a rolling variance of 100 days. We chose the smaller lagged differences as the strategies going forward will likely hold no longer than 30 days, so it makes sense to see the nature of the series over this time period.

We can compute how often XIV and VXX are in momentum, mean reversion and random walk regimes. The code for this:

# Count how often in each regime
momo <- sum(hurst.df$Hurst > 0.52, na.rm=TRUE)
mean.rev <- sum(hurst.df$Hurst < 0.48 , na.rm=TRUE)
random <- sum(hurst.df$Hurst >= 0.48 & hurst.df$Hurst <=.52, na.rm=TRUE)
exact.random <- sum(hurst.df$Hurst >= 0.50 & hurst.df$Hurst <.51, na.rm=TRUE)
total.rows <- NROW(hurst.df)

# Percentage of time in momentum, mean reversion, random walk
momo.perc <- momo / total.rows
mean.rev.perc <- mean.rev / total.rows
random.perc <- random / total.rows
exact.random.perc <- exact.random / total.rows

VXX:

vxx.percs.df <- data.frame("Momentum, Over 0.50" = momo.perc,"Mean Reversion, Less than 0.5" = mean.rev.perc, "Random Walk Band, 0.48 to 0.52" = random.perc, "Exact Random Walk, 0.50" = exact.random.perc)
> vxx.percs.df
  Momentum..Over.0.50 Mean.Reversion..Less.than.0.5 Random.Walk.Band..0.48.to.0.52 Exact.Random.Walk..0.50
1           0.7471264                     0.1395731                      0.1133005              0.02791461

XIV:

xiv.percs.df <- data.frame("Momentum, Over 0.50" = momo.perc,"Mean Reversion, Less than 0.5" = mean.rev.perc, "Random Walk Band, 0.48 to 0.52" = random.perc, "Exact Random Walk, 0.50" = exact.random.perc)
> xiv.percs.df
  Momentum..Over.0.50 Mean.Reversion..Less.than.0.5 Random.Walk.Band..0.48.to.0.52 Exact.Random.Walk..0.50
1           0.7081281                     0.2085386                     0.08333333                0.022578

What we see is that VXX is in a momentum phase 74% of the time and XIV in a momentum phase 70% of the time. Momentum is the dominating theme, with mean reversion at 13% (VXX) and 20% (XIV), and random walk at 11% (VXX) and 8% (XIV).

If fitting a model to the series itself, without using the volatility risk premium / roll yield as entry signals, one may try fitting models based on the theme of momentum.

In subsequent posts we will apply models based on extracting profits when the market is in contango/backwardation, as well as applying models to the series itself. We will also look at the autocorrelation of the series; this will serve as a primer for testing a strategy for robustness.

Thanks for reading!

Creating Constant 30 Day Maturity From VIX1 | VIX2 Futures Contracts

In the previous post, Creating synthetic VXX and XIV data and plotting the VIX term structure using R, we created the VIX term structure from CBOE data.

In this post we will create a 30 day constant maturity contract from the VIX1 | VIX2 futures contracts. This code is an extension and total rework of Ilya Kipnis' term structure code.

The methodology for this procedure is to weight VIX1 and VIX2 based on the number of days remaining to expiration. The math for this is taken from this prospectus, PS-32.

CRW1,t = 100 * (dr / dt)

where:
dt = The total number of business days in the current Roll Period beginning with, and including, the starting CBOE VIX Futures Settlement Date and ending with, but excluding, the following CBOE VIX Futures Settlement Date. The number of business days will not change for purposes of this calculation in cases of a new holiday introduced intra-month or an unscheduled market closure.
dr = The total number of business days within a roll period beginning with, and including, the following business day and ending with, but excluding, the following CBOE VIX Futures Settlement Date. The number of business days includes a new holiday introduced intra-month up to the business day preceding such a holiday.

At the close on the Tuesday corresponding to the start of the Roll Period, all of the weight is allocated to the first month contract. On each subsequent business day a fraction of the first month VIX futures contracts is sold and an equal notional amount of the second month VIX futures contracts is bought. The fraction, or quantity, is proportional to the number of first month VIX futures contracts as of the previous index roll day, and inversely proportional to the length of the current Roll Period. In this way the initial position in the first month contract is progressively moved to the second month over the course of the month, until the next following Roll Period starts, when the old second month VIX futures contract becomes the new first month VIX futures contract. In addition to the transactions described above, the weight of each index component is also adjusted every day to ensure that the change in total dollar exposure for the index is only due to the price change of each contract and not due to using a different weight for a contract trading at a higher price.

In short, on the Tuesday before VIX Wednesday expiration, 100% of the weight will be in the VIX1 contract, the former VIX2 contract. As a short example, let's say there are 25 days in the roll period (dt). On day 10 (dr) the weight in the front month would be CRW1,t = 100*dr/dt = 40%. On the Monday before Wednesday VIX expiration, dt = 25, dr = 1, so CRW1,t = 100*dr/dt = 4%. As the prospectus describes, “At the close on the Tuesday corresponding to the start of the Roll Period, all of the weight is allocated to the first month contract.” Thus on the Tuesday before Wednesday expiration, if dt = 20 then dr also = 20, and the weight at the start of the new roll period would be CRW1,t = 100*dr/dt = 100%, as per the prospectus. Remember, on Tuesday VIX2 becomes VIX1, because this is when the roll period begins for the next expiration date.

One note here, in this example we provide no additional dollar weight adjustments.
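To make the weighting concrete, a short worked sketch of the math above (the settlement prices are hypothetical):

# Roll weights: dt = business days in the roll period, dr = business days remaining
dt <- 25
dr <- 10
crw1 <- 100 * dr / dt        # front month weight = 40%
crw2 <- 100 - crw1           # second month weight = 60%
# 30 day constant maturity value as the weighted average of VIX1 and VIX2
vix1 <- 14.5; vix2 <- 15.2   # hypothetical settlement prices
cm30 <- (crw1 * vix1 + crw2 * vix2) / 100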

With that out of the way, the R code to perform the above procedure:

# VIX1|VIX2 30 Day Constant Maturity
# Andrew Bannerman 12.10.2017 

require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(dplyr)
require(PerformanceAnalytics)
require(quantmod)

# 06 through 18
years <- c(paste0("0", c(6:9)), as.character(c(10:18)))

# futures months
futMonths <- c("F", "G", "H", "J", "K", "M",
               "N", "Q", "U", "V", "X", "Z")

# expiries come from http://www.macroption.com/vix-expiration-calendar/
expiries <- read.table("D:/R Projects/Final Scripts/VIX_term_structure/expiries.txt", header=FALSE)

# convert expiries into dates in R
dateString <- paste(expiries$V3, expiries$V2, expiries$V1, sep = "-")
dates <- as.Date(dateString, format = "%Y-%B-%d")

# map futures months to numbers for dates
monthMaps <- cbind(futMonths, c("01", "02", "03", "04", "05", "06",
                                "07", "08", "09", "10", "11", "12"))
monthMaps <- data.frame(monthMaps)
colnames(monthMaps) <- c("futureStem", "monthNum")

settlement.dates <- data.frame(Date=dates, Expiration.Date=2)
dates <- data.frame(dates)
dates$dateMon <- substr(dates$dates, 1, 7) # Extract year month only

contracts <- expand.grid(futMonths, years)
contracts <- paste0(contracts[,1], contracts[,2])
#contracts <- c(contracts, "F18")
stem <- "https://cfe.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_"
#contracts <- paste0(stem, contracts, "_VX.csv")

masterlist <- list()
timesToExpiry <- list()
for(i in 1:length(contracts)) {

  # obtain data
  contract <- contracts[i]
  dataFile <- paste0(stem, contract, "_VX.csv")
  expiryYear <- paste0("20",substr(contract, 2, 3)) # Paste element 2 and 3 from the contract xYY
  expiryMonth <- monthMaps$monthNum[monthMaps$futureStem == substr(contract,1,1)]
  expiryDate <- dates$dates[dates$dateMon == paste(expiryYear, expiryMonth, sep="-")]
  data <- suppressWarnings(fread(dataFile))
  # create dates
  dataDates <- as.Date(data$`Trade Date`, format = '%m/%d/%Y')

  # create time to expiration xts
  toExpiry <- xts(expiryDate - dataDates, order.by=dataDates)
  colnames(toExpiry) <- contract
  timesToExpiry[[i]] <- toExpiry

  # get settlements
  settlement <- xts(data$Settle, order.by=dataDates)
  colnames(settlement) <- contract
  masterlist[[i]] <- settlement
}
# cbind outputs
masterlist <- do.call(cbind, masterlist)
timesToExpiry <- do.call(cbind, timesToExpiry)

# NA out zeroes in settlements
masterlist[masterlist==0] <- NA

sumNonNA <- function(row) {
  return(sum(!is.na(row)))
}

simultaneousContracts <- xts(apply(masterlist, 1, sumNonNA), order.by=index(masterlist))
chart.TimeSeries(simultaneousContracts)

dim(masterlist)
termStructure <- list()
expiryStructure <- list()
masterDates <- unique(c(first(index(masterlist)), dates$dates[dates$dates %in% index(masterlist)], Sys.Date()-1)) # %in% operator matches dates, sys.date-1 to include final range date
for(i in 1:(length(masterDates)-1)) {
  subsetDates <- masterDates[c(i, i+1)]
  dateRange <- paste(subsetDates[1], subsetDates[2], sep="::")
  subset <- masterlist[dateRange,c(i:(i+7))]
  subset <- subset[-1,]
  expirySubset <- timesToExpiry[index(subset), c(i:(i+7))]
  colnames(subset) <- colnames(expirySubset) <- paste0("C", c(1:8))
  termStructure[[i]] <- subset
  expiryStructure[[i]] <- expirySubset
}

termStructure <- do.call(rbind, termStructure)
expiryStructure <- do.call(rbind, expiryStructure)

simultaneousContracts <- xts(apply(termStructure, 1, sumNonNA), order.by=index(termStructure))
chart.TimeSeries(simultaneousContracts)

plot(t(coredata(last(termStructure))), type = 'b')
# Plot specific date term structure
backwardation <- termStructure["2017-11-30"] # Subset specific date
back.df <- as.data.frame(backwardation)
back.df <- setDT(back.df, keep.rownames = TRUE)[] # Set row names
colnames(back.df)[1] <- "Date"
back.df$Date <- ymd(back.df$Date)
back.df <- melt(data = back.df,id.vars = 'Date')   # melt df for plotting with ggplot2
colnames(back.df)[2] <- "Contract"

# plot
ggplot(data=back.df,aes(x=Contract,y=value,group = 1))+
  geom_point()+  geom_line()+
  theme_classic()+
  ggtitle("VIX Term Structure for Date 2017-11-30",subtitle="Example of Contango")+
  labs(x="Contract",y="Settlement")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

# Save Term Structure to data frame
term.structure.df <- as.data.frame(termStructure)
term.structure.df <- setDT(term.structure.df, keep.rownames = TRUE)[] # Set row names
colnames(term.structure.df)[1] <- "Date"
term.structure.df$Date <- ymd(term.structure.df$Date)

# Adjust for 10:1 split pre 03-26-2007
# Split information here: http://cfe.cboe.com/publish/CFEinfocirc/CFEIC07-003%20.pdf
df.10.split <- subset(term.structure.df, Date < as.Date("2007-03-26") ) # subset data prior to split
library(magrittr)
df.10.split[,2:9] %<>% lapply(function(x) x / 10) # apply split to all columns excluding Date (column 1)
df.post.split <- subset(term.structure.df, Date >= as.Date("2007-03-26") )  # subset post split
term.structure.df <- rbind(df.10.split,df.post.split) # rbind pre and post split data frames
tail(term.structure.df,50)
length(term.structure.df)

# Find last dates
last <- tail(term.structure.df$Date,1)
last.settle <- tail(settlement.dates$Date,1)
date.fill <- as.data.frame(seq(as.Date(last), as.Date(last.settle), "days"))
colnames(date.fill)[1] = "Date"
date.excl.wknds <- date.fill[which(weekdays(as.Date(date.fill$Date, format = "%Y/%m/%d"))
                              %in% c('Monday','Tuesday', 'Wednesday', 'Thursday', 'Friday')), ]

# Dummy data in order to fill out to future expiration dates
dates.df <- data.frame(date.excl.wknds,NA,NA,NA,NA,NA,NA,NA,NA)
colnames(dates.df) <- c("Date", paste0("C", 1:8))
head(dates.df)
term.structure.df <- rbind(term.structure.df,dates.df)

# Calculate constant 30 day VIX1|VIX2 maturity
# Check weights http://www.ipathetn.com/US/16/en/details.app?instrumentId=259118
# Use method: http://app.velocitysharesetns.com/files/prospectus/PRICING_SUPPLEMENT_No__VLS_ETN-1_A36_long_form_2.pdf
# Join settlement dates
term.structure.df <- full_join(term.structure.df, settlement.dates, by = c("Date" = "Date"))
term.structure.df$Expiration.Date[is.na(term.structure.df$Expiration.Date)] <- 1  # Turn all NA to 1
# Calculate days in roll period
#term.structure.df <- subset(term.structure.df, Date >= as.POSIXct("2014-09-16") & Date <= as.POSIXct("2014-11-21")) # Subset between matches
term.structure.df$Day.in.roll.period <- NULL
for(i in 1:nrow(term.structure.df)) {
  term.structure.df$Day.in.roll.period[i] <- ifelse(term.structure.df$Expiration.Date[i + 1]==2,1,term.structure.df$Day.in.roll.period[i - 1]+1)
}

# Calculate dt = Total days in roll period
term.structure.df$dt <- NULL  # Initialize
for(i in nrow(term.structure.df):1) {
  term.structure.df$dt[i] <- ifelse(term.structure.df$Expiration.Date[i+2]==2,term.structure.df$Day.in.roll.period[i],NA)
}
library(zoo)
dt.total.days.in.roll.period  <- na.locf(term.structure.df$dt, fromLast = TRUE)
length(dt.total.days.in.roll.period)
diff <- nrow(term.structure.df) - length(dt.total.days.in.roll.period)  # Find difference in lengths
dt.total.days.in.roll.period<- c(dt.total.days.in.roll.period ,rep(NA,diff)) # Pad out with NA to match
term.structure.df <- data.frame(term.structure.df,dt.total.days.in.roll.period)

# Calculate dr = Days remaining in roll period
term.structure.df$dr <- NULL  # Initialize
for(i in nrow(term.structure.df):1) {
  term.structure.df$dr[i] <- ifelse(term.structure.df$Expiration.Date[i+2]==2,1,term.structure.df$dr[i + 1]+1)
}

# Calculate CRW(1,t): front month weight at the close of each day
term.structure.df$CRW <- term.structure.df$dr / term.structure.df$dt.total.days.in.roll.period

# Adjust front and 2nd month contracts at the Tuesday prior to expiration
# Shift VIX2 at Tuesday before wednesday expiration
# Pass 1
term.structure.df$C.1.roll.prices.pass.1 <- NULL  # Initialize
for(i in nrow(term.structure.df):1) {
  term.structure.df$C.1.roll.prices.pass.1[i] <- ifelse(term.structure.df$Expiration.Date[i+1]==2,term.structure.df$C2[i],term.structure.df$C1[i])
}
# Pass 2
term.structure.df$C.1.roll.prices.pass.2 <- NULL  # Initialize
for(i in nrow(term.structure.df):1) {
  term.structure.df$C.1.roll.prices.pass.2[i] <- ifelse(term.structure.df$Expiration.Date[i]==2,term.structure.df$C2[i],term.structure.df$C.1.roll.prices.pass.1[i])
}
colnames(term.structure.df)[17] = "C1.Rolled"

# Shift VIX3 at Tuesday before wednesday expiration
# Pass 1
term.structure.df$C.2.roll.prices.pass.1 <- NULL  # Initialize
for(i in nrow(term.structure.df):1) {
  term.structure.df$C.2.roll.prices.pass.1[i] <- ifelse(term.structure.df$Expiration.Date[i+1]==2,term.structure.df$C3[i],term.structure.df$C2[i])
}
# Pass 2
term.structure.df$C.2.roll.prices.pass.2 <- NULL  # Initialize
for(i in nrow(term.structure.df):1) {
  term.structure.df$C.2.roll.prices.pass.2[i] <- ifelse(term.structure.df$Expiration.Date[i]==2,term.structure.df$C3[i],term.structure.df$C.2.roll.prices.pass.1[i])
}
colnames(term.structure.df)[19] = "C2.Rolled"

# 30 Day Constant Maturity futures price VIX1 | VIX2
# CRW(1,t) * VIX1 + (1-CRW(1,t)) * VIX2
term.structure.df$vix1.2.constant <- (term.structure.df$CRW * term.structure.df$C1.Rolled) + ((1-term.structure.df$CRW) * term.structure.df$C2.Rolled)
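# Spot check (hypothetical values): with CRW = 0.4, VIX1 = 15.00 and VIX2 = 16.00,
# the constant-maturity price is 0.4*15.00 + 0.6*16.00 = 15.60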
# Download spot VIX from CBOE
VIX <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv", skip = 1)
VIX <- as.data.frame(VIX)
VIX$Date <- mdy(VIX$Date)
colnames(VIX)[5] <- "Spot_VIX"
head(VIX)
#VIX <- xts(VIX$vix_close, order.by=as.Date(VIX$Date, format = '%Y/%m/%d'))
#VIX1VIX2 = xts(term.structure.df$vix1.2.constant, order.by=as.Date(term.structure.df$Date, format="%Y-%m-%d"))
tail(term.structure.df$vix1.2.constant)
# Merge VIX1|VIX2 by common date
vix12.df <- data.frame(Date=term.structure.df$Date,VIX1VIX2=term.structure.df$vix1.2.constant)
plot.df <- full_join(vix12.df, VIX, by = c("Date" = "Date"))

# Plot Spot Vix vs VIX1|VIX2 30 day constant maturity
ggplot() +
  geom_line(data=plot.df ,aes(x=Date,y=VIX1VIX2), colour="red3") +
  geom_line(data=plot.df,aes(x=Date,y=Spot_VIX), colour="#0072B2")+
  theme_classic()+
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
  ggtitle("VIX Spot Vs VIX1|VIX2 30 Day Constant Maturity", subtitle = "") +
  labs(x="Date",y="Settlement")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  annotate("text", label = "Spot VIX", x = as.Date("2005-04-26"), y = 30, color = "#0072B2")+
  annotate("text", label = "VIX1|VIX2", x = as.Date("2011-10-26"), y = 60, color = "red3")

And this is the net result, plotting spot VIX against the VIX1|VIX2 30-day constant maturity:

Rplot136_constant

All code can be found on my github.

Thanks for reading, any questions comments or concerns please let me know!

Creating synthetic VXX and XIV data and plotting the VIX term structure using R

This is the first of a collection of volatility posts. Ilya Kipnis did a good job of writing an R script to download VIX settlement data directly from the CBOE website. We can use this data to re-create synthetic VXX and XIV series; kudos to Quant Bear for extending Ilya's code, and the post on how to create synthetic VXX data can be found here.

To follow this post you will need to use Ilya’s script and have the term structure data downloaded within R.

Here we plot the VIX term structure from 2005 to present:

Rplot90

Note the large change in price in 2007. This is due to a 10:1 split; more information here. We adjust all prices prior to the split date 2007-03-26 by dividing by 10, then re-plot the data. The code to do this:

# Prepare Data for plotting varying contract expirations
# Adjust for 10:1 split pre 03-26-2007
# Split information here: http://cfe.cboe.com/publish/CFEinfocirc/CFEIC07-003%20.pdf
# Change termStructure column names to futures month codes
colnames(termStructure) <- c("F","G","H","J","K","M","N","Q")

# Save xts series to data frame in order to melt
df <- data.frame(termStructure)
df <- setDT(df, keep.rownames = TRUE)[] # Set row names
colnames(df)[1] = "Date"
df.10.split <- subset(df, Date < as.Date("2007-03-26") ) # subset data prior to split
library(magrittr)
df.10.split[,2:9] %<>% lapply(function(x) x / 10) # apply split to all columns excluding Date (column 1)
df.post.split <- subset(df, Date >= as.Date("2007-03-26") ) # subset post split
df <- rbind(df.10.split,df.post.split) # rbind pre and post split data frames
df <- melt(data = df,id.vars = 'Date') # melt df for plotting with ggplot2
library(lubridate)
df$Date <- ymd(df$Date) # Convert date column to Date format
colnames(df)[2] = "Contract" # Rename

# Plot term structure
ggplot(data=df,aes(x=Date,y=value,colour=Contract))+
geom_line()+
scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
ggtitle("VIX Term Structure",subtitle="Adjusted for 10:1 split pre 2007-03-26")+
labs(x="Year",y="Settlement")+
theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

Rplot93

If we want to home in on any one time period, we can subset between dates and re-plot:

# Subset by date (for plotting any window for further examination)
df <- subset(df, Date >= as.Date("2007-06-01") & Date <= as.Date("2010-01-01") )

# Plot term structure
ggplot(data=df,aes(x=Date,y=value,colour=Contract))+
geom_line()+
scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
ggtitle("VIX Term Structure",subtitle="Adjusted for 10:1 split pre 2007-03-26")+
labs(x="Year",y="Settlement")+
theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

Rplot94

This is a clear view of a market in backwardation during the financial crisis. More information on backwardation here. And a plot of a backwardation period in 2008:

Rplot108

An example of a market in contango:

Rplot110

Let's plot the XIV and VXX data so we can get a feel for what happens to these products during contango and backwardation. We will use the synthetic data created from Quant Bear's R code, an extension of Ilya's code. Also, please follow Quant Bear's post in order to correctly download the risk-free rate data to .csv format. The code to create synthetic VXX and XIV data based on the CBOE data:

#This script allows you to calculate proxy values for the VXX ETF before its inception on the 29th of January 2009
#Prerequisites:

#obtain historical CBOE VIX Futures data following the instructions outlined in this post by Ilya Kipnis (QuantStratTradeR)
#https://quantstrattrader.wordpress.com/2017/04/27/creating-a-vix-futures-term-structure-in-r-from-official-cboe-settlement-data/

#have the variables expiries and masterlist defined exactly as described in the post above

#have obtained TBill Rates from https://www.treasurydirect.gov/instit/annceresult/annceresult_query.htm and manipulated as described in my post

#Loading required packages
require(xts)
require(bizdays)
require(timeDate)
require(lubridate) # needed for mdy() when preparing the T-bill rates below
load_rmetrics_calendars(2004:2018)

# expiries come from http://www.macroption.com/vix-expiration-calendar/
expiries <- read.table("D:/R Projects/Final Scripts/VIX_term_structure/expiries_1.txt", header=FALSE)

#Transforming the expiries
expiries = as.Date(apply(expiries,1,function(x){paste(x,collapse=" ")}),format="%d %b %Y")

#preparing the tbillrates
tbillrates <- read.csv("D:/R Projects/Final Scripts/VIX_term_structure/tbills.2.csv", header=TRUE, stringsAsFactors = FALSE)
tbillrates[is.na(tbillrates)] <- 0
tbillrates$Date <- mdy(tbillrates$Date)
str(tbillrates)
tbillrates = xts(tbillrates[,"Rate"],as.Date(tbillrates[,"Date"]))

#defining function to calculate the contract roll weights
getCRW <- function(today){
  today = as.Date(today)
  periodstart = expiries[max(which(expiries<=today))]   # most recent expiry on or before today
  periodend = expiries[min(which(expiries>today))]      # next expiry after today
  dt = bizdays(periodstart,periodend,"Rmetrics/NYSE")   # business days in the roll period
  dr = bizdays(today,periodend,"Rmetrics/NYSE")-1       # business days remaining in the roll period
  return(c(dr/dt,(dt-dr)/dt))                           # c(front-month weight, second-month weight)
}

#defining function to calculate TBR
getTBR <- function(today,lastday){
  today = as.Date(today)
  lastday = as.Date(lastday)
  delta = as.numeric(today-lastday)                       # calendar days elapsed
  rate = tbillrates[max(which(index(tbillrates)<today))]  # most recent 91-day T-bill rate
  tbr = (1/(1-91/360*rate))^(delta/91)-1                  # discount rate to period return
  return(tbr)
}
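# Quick sanity check of the two helpers (hypothetical date; assumes 'expiries'
# and 'tbillrates' above are loaded). Uncomment to run:
# getCRW("2008-06-10")                  # c(front weight, second weight), sums to 1
# getTBR("2008-06-10", "2008-06-09")    # one-day Treasury bill return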

#calculating the index values
days = index(masterlist["2005-12-20/2009-01-29"])
indx = 100000
for(i in 2:length(days)){
  crw = getCRW(days[i-1])                  # roll weights as of the previous day
  tbr = getTBR(days[i],days[i-1])          # T-bill return between the two days
  fut1 = masterlist[days[i-1],which(!is.na(masterlist[days[i-1]]))[1:2]]  # front two contracts, previous day
  fut2 = masterlist[days[i],which(!is.na(masterlist[days[i]]))[1:2]]      # front two contracts, current day
  if(!names(fut1)[1]%in%names(fut2)){      # previous front month expired: shift to the next pair
    fut1 = masterlist[days[i-1],which(!is.na(masterlist[days[i-1]]))[2:3]]
  }
  twdi = sum(crw*as.numeric(fut1))         # weighted value, previous day
  twdo = sum(crw*as.numeric(fut2))         # weighted value, current day
  cdr = twdo/twdi-1                        # contract daily return
  #cdr = -cdr
  val = indx[length(indx)]*(1+cdr+tbr)
  indx = c(indx,val)
}
indx = xts(indx,days)

#adjusting for 10:1 split
indx["2007-03-26/"] = 10*indx["2007-03-26/"]
indxret = (indx/lag(indx)-1)[-1]

#calculating VXX values
vxxvals = 107089.921875                           # anchor value used to back-fill the synthetic series
for(i in nrow(indxret):1){
  tmp = vxxvals[length(vxxvals)]/(1+indxret[i,])  # walk the index returns backwards from the anchor
  vxxvals = c(vxxvals,tmp)
}
vxxvals = rev(vxxvals)
vxxvals = xts(vxxvals,index(indx))
plot(vxxvals,type="l")
head(vxxvals)

We plot synthetic VXX data:

Rplot95

Next, using the Alpha Vantage API (see my post here for details on downloading Alpha Vantage data with R), we will join the synthetic VXX data to the Alpha Vantage VXX data. The code to do this; note you will need to insert your own API key:

# Join synthetic VXX data to Alpha Vantage VXX data
# Plot with ggplot2
# Download data
require(lubridate)
require(data.table)
require(dplyr)
# Note you need to place your API key where it says your_key_here
VXX <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXX&outputsize=full&apikey=your_key_here&datatype=csv") #fread() data.table for downloading directly to a data frame
VXX$timestamp <- ymd(VXX$timestamp) #Lubridate to change character date to date format
VXX <- arrange(VXX,timestamp) #dplyr to sort data frame by date ascending order
colnames(VXX)[1] <- "Date"
VXX <- as.data.frame(VXX)
head(VXX)

# store synthetic VXX vxxvals price data to data.frame
df <- data.frame(vxxvals)
df <- setDT(df, keep.rownames = TRUE)[] # Set row names
colnames(df)[1] = "Date"
colnames(df)[2] = "close"
df$Date <- ymd(df$Date) # Convert to date format
df <- as.data.frame(df)

# Retain only Date and close columns for alphavantage VXX data
VXX <- VXX[ -c(2:5, 7:9) ]
colnames(VXX)[2] <- "close"

# ggplot2 to plot synthetic VXX and alphavantage VXX
require(ggplot2)
require(scales)
ggplot() +
geom_line(data=df,aes(x=Date,y=close), colour="red") +
geom_line(data=VXX,aes(x=Date,y=close), colour="black")+
theme_classic()+
scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
ggtitle("VXX", subtitle = "Synthetic prior to 2009-01-29") +
labs(x="Date",y="VXX")+
theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
annotate("text", label = "Synthetic VXX", x = as.Date("2007-04-26"), y = 100000, color = "red")+
annotate("text", label = "Alphavantage VXX", x = as.Date("2015-04-26"), y = 100000, color = "black")

Rplot97

For XIV, per Quant Bear's post, we need to invert the sign of the synthetic VXX returns. We achieve this with the following code, which is the same as the VXX code except that we flip the sign of the returns and anchor the series to the first Alpha Vantage adjusted close:

#adjusting for 10:1 split
indx["2007-03-26/"] = 10*indx["2007-03-26/"]
indxret = (indx/lag(indx)-1)[-1]
indxret = -indxret # Flip the sign of the returns for the inverse product

#calculating XIV values
xivvals = 9.557 # Place the first XIV price here (we use the first Alpha Vantage adjusted close)
for(i in nrow(indxret):1){
  tmp = xivvals[length(xivvals)]/(1+indxret[i,])
  xivvals = c(xivvals,tmp)
}
xivvals = rev(xivvals)
xivvals = xts(xivvals,index(indx))
plot(xivvals,type="l")

# Join synthetic XIV data to Alpha Vantage XIV data
# Plot with ggplot2
# Download data
require(lubridate)
require(data.table)
require(dplyr)
# Note you need to place your API key where it says your_key_here
XIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=XIV&outputsize=full&apikey=your_key_here&datatype=csv") #fread() data.table for downloading directly to a data frame
XIV$timestamp <- ymd(XIV$timestamp) #Lubridate to change character date to date format
XIV <- arrange(XIV,timestamp) #dplyr to sort data frame by date ascending order
colnames(XIV)[1] <- "Date"
XIV <- as.data.frame(XIV)

# store synthetic XIV (xivvals) price data to data.frame
df <- data.frame(xivvals)
df <- setDT(df, keep.rownames = TRUE)[] # Set row names
colnames(df)[1] = "Date"
colnames(df)[2] = "close"
df$Date <- ymd(df$Date) # Convert to date format
df <- as.data.frame(df)

# Retain only Date and close columns for alphavantage XIV data
XIV <- XIV[ -c(2:5, 7:9) ]
colnames(XIV)[2] <- "close"

# ggplot2 to plot synthetic XIV and alphavantage XIV
require(ggplot2)
require(scales)
ggplot() +
geom_line(data=df,aes(x=Date,y=close), colour="red") +
geom_line(data=XIV,aes(x=Date,y=close), colour="black")+
theme_classic()+
scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
ggtitle("XIV", subtitle = "Synthetic prior to 2010-11-30") +
labs(x="Date",y="XIV")+
theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
annotate("text", label = "Synthetic XIV", x = as.Date("2007-04-26"), y = 85, color = "red")+
annotate("text", label = "Alphavantage XIV", x = as.Date("2015-04-26"), y = 85, color = "black")

Rplot98

Let's plot the times when the term structure was in backwardation. This code flags the backwardation dates (a sketch of one way to plot the shading follows the block):

# Change termStructure column names to futures month codes
colnames(termStructure) <- c("F","G","H","J","K","M","N","Q")

# Prepare data for backwardation plot
# Obtain backwardation dates
backwardation <- data.frame(termStructure)
backwardation <- setDT(backwardation, keep.rownames = TRUE)[] # Set row names
colnames(backwardation)[1] <- "Date"
backwardation.G <- ifelse(termStructure$F > termStructure$G,1,0)
backwardation.H <- ifelse(termStructure$G > termStructure$H,1,0)
backwardation.J <- ifelse(termStructure$H > termStructure$J,1,0)
backwardation.K <- ifelse(termStructure$J > termStructure$K,1,0)
backwardation.M <- ifelse(termStructure$K > termStructure$M,1,0)
backwardation.N <- ifelse(termStructure$M > termStructure$N,1,0)
backwardation.vec <- backwardation.G + backwardation.H + backwardation.J + backwardation.K + backwardation.M + backwardation.N
dates <- c(backwardation$Date)
backwardation.df <- data.frame(dates,backwardation.vec)
colnames(backwardation.df)[1] <- "Date"
colnames(backwardation.df)[2] <- "backwardation"
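The original plotting code is not shown here; below is a minimal sketch of one way to shade the backwardation days behind a price series, assuming df still holds one of the synthetic close series built above (Date and close columns):

# Sketch: shade days flagged as backwardation behind the close series
require(ggplot2)
require(lubridate)
backwardation.df$Date <- ymd(backwardation.df$Date)  # rownames-derived dates to Date class
shade.dates <- backwardation.df$Date[which(backwardation.df$backwardation > 0)]  # days with any inversion
ggplot() +
  geom_vline(xintercept = shade.dates, colour = "pink", alpha = 0.4) +  # one pale line per backwardation day
  geom_line(data = df, aes(x = Date, y = close)) +
  theme_classic() +
  labs(title = "Backwardation days shaded in pink", x = "Date", y = "Close")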

Rplot105.png

Times in pink are when the market was in backwardation. Next we can see how VXX and XIV behaved during backwardation; we plot the non-adjusted VXX for clarity.

Rplot103

Rplot106

Times of backwardation essentially cause XIV to go down in price and VXX to go up in price.

Having organized the backwardation dates, let's see how often the VIX term structure is in backwardation versus contango. We can pull the numbers with the code below:

# Count how often VIX in backwardation / contango
total.backwardation.days <- sum(backwardation.df$backwardation > 0, na.rm=TRUE)
total.days <- nrow(backwardation.df)
backwardation.perc <- total.backwardation.days/total.days *100
contango = 100-backwardation.perc
print(contango)
print(backwardation.perc)

The VIX term structure is in backwardation 29.07% of the time and in contango 70.92% of the time.

Let's see what the beta is for XIV and VXX. Code courtesy of Ilya Kipnis (for the blog post see here), building on the original post by Dr. Jonathan Kinlay on beta convexity. The central idea is that an asset can exhibit different betas relative to its benchmark when the benchmark is up versus down.

# Beta Convexity 
# Download SPY data
# Note you need to place your API key where it says your_key_here
SPY <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SPY&outputsize=full&apikey=your_key_here&datatype=csv") #fread() data.table for downloading directly to a data frame
SPY$timestamp <- ymd(SPY$timestamp)   #Lubridate to change character date to date format
SPY <- arrange(SPY,timestamp)   #dplyr to sort data frame by date ascending order
colnames(SPY)[1] <- "Date"
SPY$Date <- ymd(SPY$Date)
SPY <- as.data.frame(SPY)
SPY <- subset(SPY, Date >= as.POSIXct("2009-01-30") ) # VXX start
SPY <- subset(SPY, Date >= as.POSIXct("2000-01-03")) # XIV start

VXX <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=VXX&outputsize=full&apikey=your_key_here&datatype=csv") #fread() data.table for downloading directly to a data frame
VXX$timestamp <- ymd(VXX$timestamp)   #Lubridate to change character date to date format
VXX <- arrange(VXX,timestamp)   #dplyr to sort data frame by date ascending order
colnames(VXX)[1] <- "Date"
VXX$Date <- ymd(VXX$Date)
VXX <- as.data.frame(VXX)

XIV <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=XIV&outputsize=full&apikey=your_key_here&datatype=csv") #fread() data.table for downloading directly to a data frame
XIV$timestamp <- ymd(XIV$timestamp)   #Lubridate to change character date to date format
XIV <- arrange(XIV,timestamp)   #dplyr to sort data frame by date ascending order
colnames(XIV)[1] <- "Date"
XIV$Date <- ymd(XIV$Date)
XIV <- as.data.frame(XIV)
XIV <- subset(XIV, Date >= as.Date("2000-01-03"))

# Calculate returns 
spy.xts = xts(SPY$adjusted_close, order.by=as.Date(SPY$Date, format="%Y/%m/%d"))
vxx.xts = xts(VXX$adjusted_close, order.by=as.Date(VXX$Date, format="%Y/%m/%d"))
xiv.xts = xts(XIV$adjusted_close, order.by=as.Date(XIV$Date, format="%Y/%m/%d"))
spy.ret <- Return.calculate(spy.xts)
vxx.ret <- Return.calculate(vxx.xts)
xiv.ret <- Return.calculate(xiv.xts)

rets <- na.omit(cbind(xiv.ret, vxx.ret, spy.ret)) # include VXX so rets$VXX exists for the beta convexity call below
colnames(rets) <- c("XIV", "VXX", "SPY")

betaConvexity <- function(Ra, Rb) {
  positiveBench <- Rb[Rb > 0]
  assetPositiveBench <- Ra[index(positiveBench)]
  positiveBeta <- CAPM.beta(Ra = assetPositiveBench, Rb = positiveBench)
  
  negativeBench <- Rb[Rb < 0]
  assetNegativeBench <- Ra[index(negativeBench)]
  negativeBeta <- CAPM.beta(Ra = assetNegativeBench, Rb = negativeBench)
  
  out <- (positiveBeta - negativeBeta) ^ 2
  return(out)
}

betaConvexity(rets$XIV, rets$SPY)

> betaConvexity(rets$XIV, rets$SPY)
[1] 2.066219

> betaConvexity(rets$VXX, rets$SPY)
[1] 1.931566

For the entire XIV and VXX samples, the beta convexity measure is approximately 2 for each product. Note that this statistic is the squared difference between the up-market and down-market betas, not the beta itself, so a value near 2 means the two betas differ by roughly 1.4 in absolute terms.
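To see the asymmetry directly, the two betas inside the convexity measure can be inspected on their own. A minimal sketch using the same CAPM.beta call, assuming rets is still the xts of returns built above:

# Up-market and down-market betas for XIV versus SPY
up <- rets$SPY[rets$SPY > 0]                      # days the benchmark closed up
down <- rets$SPY[rets$SPY < 0]                    # days the benchmark closed down
CAPM.beta(Ra = rets$XIV[index(up)], Rb = up)      # beta on up days
CAPM.beta(Ra = rets$XIV[index(down)], Rb = down)  # beta on down days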

Traditionally we might run a single regression between the daily returns of SPY and XIV or VXX. That, however, doesn't capture the different betas when the benchmark is up or down. We can also see how beta has changed over time with an expanding-window linear regression, anchored at the first observation and grown one day at a time. The code to do this:

# Run expanding-window linear regression for beta
# SPY as the independent variable; the window grows from the first 20 observations
rolling_lms <- lapply(seq(20,nrow(rets)), function(x) lm(XIV ~ SPY, data = rets[1:x , ]) )
length(rolling_lms)
nrow(rets)
all_slopes <-unlist(sapply(1:length(rolling_lms),function(j) rolling_lms[[j]]$coefficients[2]))
all_slopes<- unlist(all_slopes)
plot(all_slopes,type="l")
# Join regression output to original data frame 
row.diff <- rep(NA, nrow(rets) - length(rolling_lms)) # make leading NAs to position the data correctly against the original
beta <- c(row.diff,all_slopes)
rets <- data.frame(rets,beta)
rets <- setDT(rets, keep.rownames = TRUE)[] # Set row names
colnames(rets)[1] <- "Date"
rets$Date <- ymd(rets$Date)

# Plot beta 
ggplot() +
  geom_line(data=rets,aes(x=Date,y=beta), colour="red") +
  theme_classic()+
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))+
  ggtitle("Rolling Linear Regression - Plot XIV Beta", subtitle = "XIV Inception") +
  labs(x="Date",y="VXX")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

Rplot122
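As an aside, the expanding window above anchors at the first observation, so early estimates are noisy and later ones change slowly. A fixed-width rolling beta is an alternative; a minimal sketch with zoo's rollapply, assuming rets is still the xts of returns (i.e. run before the data.frame conversion above):

# Hypothetical 252-day rolling beta of XIV on SPY
require(zoo)
roll.beta <- rollapply(rets, width = 252,
                       FUN = function(z) coef(lm(XIV ~ SPY, data = as.data.frame(z)))[2],
                       by.column = FALSE, align = "right")
plot(roll.beta, type = "l")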

And here is a visual of the regression fit over the full XIV sample period. The code to plot this:

# Plot Linear Regression
ggplot(rets, aes(x=SPY, y=XIV))+ 
  geom_point(shape=1)+
  geom_smooth(method=lm)+
  ggtitle("Linear Regression y~x XIV~SPY", subtitle = "Start date = XIV inception") +
  labs(x="SPY",y="XIV")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

Rplot121

We repeat the process for VXX.

Rplot118

Rplot119

Both the XIV and VXX betas have increased over time. Essentially, a 1% move in SPY leads to an approximately -3% move in VXX and a 3.66% move in XIV.

This will be a primer for upcoming posts where we will experiment with different volatility strategies.

All code can be found on my github.

Thanks again to Ilya and Quant Bear for sharing your code on your blogs.