Bootstrap Analysis – Resample and Replace

As I alluded to in the previous post, I attempted to block bootstrap time series without relying on the packages meboot, tseries and boot. In this post I will share the results of that effort.

First, as in the last post, we need data. Run the previous post's script up to this point:

annotate("text", label = "sma 117 to 161", x = 135, y = .65, color = "red")

That is line 294 of the script. From there you may follow along with this post.

The reason for block bootstrapping a time series is to maintain the correlation structure of the series. To do this, we first difference the time series data with the following code to create stationary series:

# First order differencing
# Removes the trend so that each data point is less dependent on the previous one
# XIV diff
df$xiv.close.diff <- c(rep(NA, 1), diff(df$xiv_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$xiv.close.diff, type="l",main="XIV Close Differencing")
mean(df$xiv.close.diff, na.rm = TRUE)
var(df$xiv.close.diff, na.rm = TRUE)
# VXX diff
df$vxx.close.diff <- c(rep(NA, 1), diff(df$vxx_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxx.close.diff, type="l")
mean(df$vxx.close.diff, na.rm = TRUE)
var(df$vxx.close.diff, na.rm = TRUE)
# VXV Diff
df$vxv.close.diff <- c(rep(NA, 1), diff(df$vxv_cboe_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxv.close.diff, type="l")
mean(df$vxv.close.diff, na.rm = TRUE)
var(df$vxv.close.diff, na.rm = TRUE)
#VXMT Diff
df$vxmt.close.diff <- c(rep(NA, 1), diff(df$vxmt_cboe_close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))
plot(df$Date,df$vxmt.close.diff, type="l")
mean(df$vxmt.close.diff, na.rm = TRUE)
var(df$vxmt.close.diff, na.rm = TRUE)

Output:
[Figure: XIV close, first-order differenced]
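
As a quick sanity check on what differencing does, here is a minimal sketch on a short made-up price vector (`px` is hypothetical, not the XIV data). It shows that `diff` returns the change from one observation to the next, and that the starting price plus `cumsum` of those changes recovers the original levels.

```r
# Hypothetical price levels, for illustration only
px <- c(100, 102, 101, 105, 104)

# First-order difference: change from one observation to the next
d <- diff(px, lag = 1, differences = 1)
d

# Pad with a leading NA so the result lines up with the original rows
d.padded <- c(NA, d)

# Undo the differencing: starting level plus the running sum of the changes
rebuilt <- px[1] + c(0, cumsum(d))
all.equal(rebuilt, px)
```

Note that `cumsum` on its own starts from zero, so a rebuilt series only sits at the original price level if the starting price is added back.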

Next, we set a block size derived from the b.star function in the R package np. This calculates the block size for the stationary and circular bootstrap per the paper by Patton, Politis and White (2009).

# Create new data frame with differenced time series (stationary)
require(np)
#block.boot.df <- subset(df, Date >= as.POSIXct("2008-01-07") ) # subset at start of latest data set (VXMT)
block.boot.df <- data.frame(xiv.close.diff = df$xiv.close.diff, vxx.close.diff = df$vxx.close.diff, vxv.close.diff = df$vxv.close.diff, vxmt.close.diff = df$vxmt.close.diff)
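# Find block size with package np
# b.star returns the stationary bootstrap block length first and the circular one second; [2] picks the circular one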
np.block.size.xiv <- b.star(block.boot.df$xiv.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
xiv.block.size <- np.block.size.xiv[2]
np.block.size.vxx <- b.star(block.boot.df$vxx.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxx.block.size <- np.block.size.vxx[2]
np.block.size.vxv <- b.star(block.boot.df$vxv.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxv.block.size <- np.block.size.vxv[2]
np.block.size.vxmt <- b.star(block.boot.df$vxmt.close.diff,Kn = NULL,mmax= NULL,Bmax = NULL,c = NULL,round = TRUE)
vxmt.block.size <- np.block.size.vxmt[2]
block.size <- c(xiv.block.size,vxx.block.size,vxv.block.size,vxmt.block.size)
block <- max(block.size) # use the largest of the four block sizes

The block sizes for each of the 4 series in our volatility strategy:

 > block.size
  xiv.block.size vxx.block.size vxv.block.size vxmt.block.size
1              2              4             11               4
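
The block length is meant to be long enough to span the dependence in each series. The sketch below is only a rough way to eyeball that dependence with the sample autocorrelation function. It is not what b.star computes, and the AR(1) series `x` is simulated as a stand-in for a differenced price series.

```r
set.seed(42)
# Simulated AR(1) series standing in for a differenced price series (assumption)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))

# Sample autocorrelations at lags 0 to 10
rho <- acf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]
round(rho, 2)

# Rough 95% band for "no autocorrelation"
band <- 1.96 / sqrt(length(x))

# Lags (1 to 10) whose autocorrelation falls outside the band
which(abs(rho[-1]) > band)
```

A series whose autocorrelation dies out slowly generally calls for longer blocks, which is consistent with the four series above receiving different block sizes.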

For this we will choose the maximum number obtained, which is 11. The code works by grouping the rows of the data frame into blocks of that size and storing the blocks in a list. The list of grouped rows is then resampled with replacement, and the reshuffled list is stored in an output data frame. This function is iterated 100 times, storing each block bootstrapped series in an output list. As we are working with 4 time series, all 4 series are reshuffled in the same order and end up in the same locations in the new time series. If this were not the case, the series would be randomized independently of one another, which would break the relationship between them. Here's the code:

# Set block number to begin subsetting per NROW
reps <- NROW(block.boot.df)/block # Set block number
reps <-(round(reps))
id <- rep(1:reps,each=block) # each = number of rows in each block
# Check lengths of data and ID
NROW(block.boot.df)
length(id)
# If ID is longer than df NROW
if(length(id)>NROW(block.boot.df)){
  id.diff <- length(id) - NROW(block.boot.df)
  length.id <- length(id)-id.diff
  id  <- id[1:length.id]
} else {
  nrow.diff <- NROW(block.boot.df) - length(id)
  max.id <- max(id)
  add.value <- max.id+1
  pad.na <- rep(add.value,nrow.diff)  # pad nrow diff
  id <- c(id,pad.na)  # join added to vector
}
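
The id assignment above can also be written in one line. This is a minimal sketch, where `make.block.ids` is a hypothetical helper name and not part of the original script: every row gets the number of the block it falls in, and a final short block absorbs any leftover rows.

```r
# Hypothetical helper: assign a block id to each of n rows
make.block.ids <- function(n, block) {
  ceiling(seq_len(n) / block)
}

ids <- make.block.ids(25, 11)
table(ids)   # block sizes: 11, 11 and a final short block of 3
```

With this form there is no need to trim or pad the id vector afterwards, since it always has exactly n entries.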

# Join ID and block.df
block.boot.df <- data.frame(block.boot.df,id) # place back in data frame
#block.boot.df$id.final[is.na(block.boot.df$id.final)] <- 693 # all NA to 1
# Id data
IDs <- unique(block.boot.df$id)
# Function to bootstrap the data frame once
# Splits the data into blocks by id number, then resamples the blocks with replacement
bootSTRAP = function(x){
  temp <- list()
  for (i in 1:length(IDs)){
    temp[i] <- list(block.boot.df[block.boot.df$id==IDs[i],])
  }
  out <- sample(temp,replace = TRUE)
  boot.df.output <- do.call(rbind, out)
  return(boot.df.output) # return the resampled data frame explicitly
}
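
To make the resampling step easier to inspect, here is a self-contained sketch of the same idea on a toy data frame. The names `toy` and `block.boot.once` and the values are made up for illustration. Because whole rows are moved together, the two columns stay aligned, which is the same reason the 4 series above are reshuffled in the same order.

```r
set.seed(1)
# Toy data frame with two series observed on the same rows (made-up values)
toy <- data.frame(a = 1:12, b = (1:12) * 10)
block <- 4

# Split the rows into consecutive blocks of length `block`
ids <- ceiling(seq_len(nrow(toy)) / block)
blocks <- split(toy, ids)

# Draw blocks with replacement and stack them into one resampled data frame
block.boot.once <- function(blocks) {
  picked <- sample(blocks, size = length(blocks), replace = TRUE)
  out <- do.call(rbind, unname(picked))
  rownames(out) <- NULL
  out
}

resampled <- block.boot.once(blocks)
resampled

# Several replicates at once, stored in a list
boot.list <- replicate(5, block.boot.once(blocks), simplify = FALSE)
```

Within each block the rows keep their original order, so short-range dependence is preserved. Only the joins between blocks are new.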

# Loop for running it 100 times
runs <- 1:100
run.output <- list()
for (i in 1:length(runs)){
  tryCatch({
    ptm0 <- proc.time() # start the timer before the bootstrap so the run itself is timed
    temp.1 <- bootSTRAP(runs[i])
    run.output[[i]] <- cbind(temp.1)
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:", conditionMessage(e))) })
}

# cumsum stationary differenced series
for (i in 1:length(run.output)){
  run.output[[i]]$cumsum.diff.xiv.diff <- cumsum(run.output[[i]]$xiv.close.diff)
  run.output[[i]]$cumsum.diff.vxx.diff <- abs(cumsum(run.output[[i]]$vxx.close.diff))
  run.output[[i]]$cumsum.diff.vxv.diff <- cumsum(run.output[[i]]$vxv.close.diff)
  run.output[[i]]$cumsum.diff.vxmt.diff <- cumsum(run.output[[i]]$vxmt.close.diff)
}

# Reverse VXX down trending series
for (i in 1:length(run.output)){
  run.output[[i]]$cumsum.diff.vxx.diff <- rev(run.output[[i]]$cumsum.diff.vxx.diff )
}
# Add index for merging
for (i in 1:length(run.output)){
  run.output[[i]]$index <- seq(1:NROW(run.output[[i]]))
}

# Merge all dfs
L <- run.output[[1]]
for (i in 2:length(run.output)) L <- merge(L,  run.output[[i]], by=c("index"))

# cbinds
replace.s <- cbind.data.frame(L)
replace.s <- replace.s[,-1]
# Subset data frames for loop
xiv.df <- replace.s[ , grepl("cumsum.diff.xiv.diff*", names(replace.s), perl=TRUE)]
vxx.df <- replace.s[ , grepl("cumsum.diff.vxx.diff*", names(replace.s), perl=TRUE)]
vxv.df <- replace.s[ , grepl("cumsum.diff.vxv.diff*", names(replace.s), perl=TRUE)]
vxmt.df <- replace.s[ , grepl("cumsum.diff.vxmt.diff*", names(replace.s), perl=TRUE)]

# Add Date to df's
df.date <- data.frame(Date = df$Date)
diff <- nrow(df.date) - nrow(xiv.df)
df.date <- df.date[1:nrow(xiv.df),]
# Add Date to subsetted df
xiv.ensemble.df <- data.frame(Date = df.date, xiv.df)
vxx.ensemble.df <- data.frame(Date = df.date, vxx.df)
vxv.ensemble.df <- data.frame(Date = df.date, vxv.df)
vxmt.ensemble.df <- data.frame(Date = df.date, vxmt.df)

# Melt data frames for plotting
xiv.plot.df <- melt(xiv.ensemble.df,id.vars = "Date")
vxx.plot.df <- melt(vxx.ensemble.df,id.vars = "Date")
vxv.plot.df <- melt(vxv.ensemble.df,id.vars = "Date")
vxmt.plot.df <- melt(vxmt.ensemble.df,id.vars = "Date")

# Plot XIV Resampled series
ggplot(data = xiv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=xiv_close),colour="red")+ # colour set outside aes() so the line is drawn in red
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - XIV", subtitle="100 Iterations")

# Plot VXX Resampled series
ggplot(data = vxx.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxx_close),colour="red")+ # colour set outside aes() so the line is drawn in red
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXX", subtitle="100 Iterations")

# Plot VXV Resampled series
ggplot(data = vxv.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxv_cboe_close),colour="red")+ # colour set outside aes() so the line is drawn in red
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXV", subtitle="100 Iterations")

# Plot VXMT Resampled series
ggplot(data = vxmt.plot.df, aes(x=Date,y=value))+
  geom_line(aes(group = variable))+
  theme_classic()+
  theme(legend.position = "none")+
  geom_line(data=df,aes(x=Date,y=vxmt_cboe_close),colour="red")+ # colour set outside aes() so the line is drawn in red
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  ggtitle("Resampled Time Series - VXMT", subtitle="100 Iterations")

This produces the following output:

[Figure] XIV: resample and replace, block size == 11
[Figure] VXX: resample and replace, block size == 11. Note that the resampled series is not a great representation of the original!

In contrast to maximum entropy bootstrapping, where each series resembled the original to a high degree, the block bootstrapped series differ from the original. The VXX series itself trends straight down. The packages tseries and boot did not manage to bootstrap this successfully either. Essentially, after differencing and cumsum, the series needed to be reversed. This is due to the sign of the VXX trend: the cumulative sum of its differences is negative, taking the absolute value makes the values positive, and we are left with an up-trending VXX.
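
The sign issue is easy to see on a small made-up down-trending series (the values below are not real VXX prices). The last two lines show one possible alternative to the abs-and-reverse step, which is to anchor the running sum at the first price. That is my assumption of a reasonable variant, not what the script above does.

```r
# Made-up down-trending series standing in for VXX (not real prices)
px <- c(100, 90, 85, 70, 60, 40)
d <- diff(px)                 # all changes are negative

cumsum(d)                     # running sum is negative: -10 -15 -30 -40 -60
abs(cumsum(d))                # absolute value flips it into an up-trend
rev(abs(cumsum(d)))           # reversing turns it back into a down-trend

# Alternative: anchor the running sum at the first price instead
levels.rebuilt <- px[1] + c(0, cumsum(d))
levels.rebuilt                # 100 90 85 70 60 40, same as px
```

One caveat with the anchored version is that a resampled path of price differences can drift below zero, which a real price cannot do.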

In any case, this might be effective for back testing trading rules that are overlaid on a price series. The VXV/VXMT strategy is a bet on contango versus backwardation, so its trading rules are not, per se, overlaid on the series itself. However, for strategies whose trading rules are based on the price series, the above bootstrap procedure may serve a useful purpose.
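
As a rough sketch of what such a back test could look like, the code below runs a simple moving average rule over many block bootstrapped paths and summarises the spread of outcomes. Everything in it is an assumption for illustration: the simulated prices, the block size of 10, the helper names `boot.path` and `rule.pnl`, and the rule itself.

```r
set.seed(123)
# Simulated price series standing in for real data (assumption)
px <- 50 + cumsum(rnorm(300, mean = 0.05, sd = 1))
d <- diff(px)
block <- 10

# One block bootstrapped path: resample blocks of differences, then rebuild levels
boot.path <- function(d, block, start) {
  ids <- ceiling(seq_along(d) / block)
  picked <- sample(split(d, ids), replace = TRUE)
  start + c(0, cumsum(unlist(picked, use.names = FALSE)))
}

# Hypothetical rule: hold one unit while the prior bar closed above its 10-period SMA
rule.pnl <- function(px, n = 10) {
  sma <- stats::filter(px, rep(1 / n, n), sides = 1)  # trailing moving average
  signal <- as.numeric(px > sma)                      # 1 = long, 0 = flat
  signal[is.na(signal)] <- 0                          # flat during the SMA warm-up
  # Trade on the prior bar's signal to avoid look-ahead
  sum(head(signal, -1) * diff(px))
}

pnl <- replicate(200, rule.pnl(boot.path(d, block, px[1])))
quantile(pnl, c(0.05, 0.5, 0.95))
rule.pnl(px)   # the same rule on the original path, for comparison
```

Comparing the result on the original path with the bootstrap quantiles gives a sense of whether the rule's performance is typical of paths with similar short-range structure or an artefact of one particular history.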

Code for this can be found on my github.

 

Author: Andrew Bannerman

Integrity Inspector. Quantitative Analysis is a favorite past time.
