How To Ignore Errors In a Loop – Continue Processing Each Iteration

In this post we will look at a method for processing many files or data frames. In this example we will simply generate the data frames ourselves, but in practice they may come from a directory on our hard drive (see ?list.files).

First let's make some dummy data that we can work with.

# Create dummy data frames
    file1 <- as.data.frame(runif(100, 0,100))
    file2 <- as.data.frame(runif(100, 0,100))
    file3 <- as.data.frame(runif(100, 0,100))
    file4 <- as.data.frame(runif(12, 0,100))
    file5 <- as.data.frame(runif(100, 0,100))
    file6 <- as.data.frame(runif(15, 0,100))
    file7 <- as.data.frame(runif(100, 0,100))
    file8 <- as.data.frame(runif(8, 0,100))  # This is the df that is intended to fail
    file9 <- as.data.frame(runif(100, 0,100))
    file10 <- as.data.frame(runif(100, 0,100))
    file11 <- as.data.frame(runif(100, 0,100))

    # Store all data frames in a list
    file.list <- list(file1,file2,file3,file4,file5,file6,file7,file8,file9,file10,file11)

# Rename column names for all 11 data frames
Names <- function(x) {
  names(x) <- c("Close")
  return(x)
}
# Apply name change to all 11 data frames
file.list <- lapply(file.list, Names)

In the above we made 11 random data frames and stored them all in a list, inside the file.list variable. We can print the first data frame with file.list[[1]] or the last with file.list[[11]].

We then create a function to rename the column names and use lapply to run it over every data frame in the list, from file1 through file11.

Now that we have our dummy data, we can create a function which holds the commands that we wish to run on each and every data frame.

# Create function for performing commands.
    genSMA = function(x){
      new.df <- data.frame(x, stringsAsFactors=FALSE)   # Use the data frame passed in as x
      # Load packages 
      require(TTR)
      # Use TTR package to create rolling SMA n day moving average 
      getSMA <- function(numdays) {
        function(new.df) {
          SMA(new.df[,"Close"], numdays)    # Calls TTR package to create SMA
        }
      }
      # Create a matrix to put the SMAs in
      sma.matrix <- matrix(nrow=nrow(new.df), ncol=0)
      tail(sma.matrix)
      # Loop for filling it
      for (i in 2:12) {
        sma.matrix <- cbind(sma.matrix, getSMA(i)(new.df))
      }

      # Rename columns
      colnames(sma.matrix) <- sapply(2:12, function(n)paste("close.sma.n", n, sep=""))

      # Bind to existing dataframe
      new.df <-  cbind(new.df, sma.matrix)
      return(new.df)
    }

The above function creates simple moving averages of the Close column for lookbacks 2 through 12. This is a very simplistic example; in reality there may be full scripts and a multitude of calculations or operations within this function.

So we have our function with the calculations we wish to perform on all 11 data frames stored in our list. The next thing to do is write a for loop which calls the function on every data frame in our file.list of data frames.

# Loop for running function over all data frames
for (i in 1:length(file.list)){
  tryCatch({
    genSMA(file.list[[i]])   # Start from data frame 1
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

After running the full code we should see the following message printed:
[1] "i = 8 failed:"

This is because we purposely set up data frame file8 with only 8 data points, which is fewer than the 12 required for the longest moving average in our 2:12 range. A 12 period simple moving average needs at least 12 data points, so the code throws an error.
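We can reproduce the failure in isolation, outside the loop:

# Reproduce the failure in isolation: an SMA of length 12 on only 8 observations
require(TTR)
SMA(runif(8, 0, 100), n = 12)   # this should throw an error, as there are fewer than 12 observations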

Under normal conditions, without tryCatch, the loop would break and we would have to remove the offending file or data frame and run the loop again. When dealing with only a few files this is not an issue, but if you are processing thousands of files it's a real inconvenience!

tryCatch also prints which iteration failed, so we may perform further due diligence.
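If we also want to keep whatever each successful iteration returns, and log the exact error message, we can collect the outputs in a list. A minimal sketch, now that genSMA() returns its final data frame (the results and failed names are just illustrative):

# Collect successful outputs; record failed iterations and their error messages
results <- vector("list", length(file.list))
failed <- c()
for (i in 1:length(file.list)) {
  results[[i]] <- tryCatch({
    genSMA(file.list[[i]])
  }, error = function(e) {
    failed <<- c(failed, i)
    message("i = ", i, " failed: ", conditionMessage(e))
    NULL    # failed iterations are stored as NULL
  })
}
failed    # indices of the iterations that errored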

Note we can also modify the loop and function to run over a directory on our hard drive if that's where our data is stored.

We can do this with:

# Specify directory where files are stored 
 files = list.files(path = "C:/R Projects/Data/", pattern = ".", full.names = TRUE)

Then inside our function we can:

genSMA = function(x){
  nextfile <- read.table(x, header=TRUE, sep=",", stringsAsFactors=FALSE)  # reads the file passed in as x
  new.df <- data.frame(nextfile)  # put it inside a data frame

  # ... ALL YOUR COMMANDS HERE ...

}

And we call the loop in the same way, this time iterating over every single file in our directory, which we stored in the files variable.

# Loop for running function over all data frames
for (i in 1:length(files)){
  tryCatch({
    genSMA(files[[i]])
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

Here we run the function over every single file in the directory specified by the files variable. Note I do not specify any output of the processing above; we may output a plot or store the final results in a data frame. I will revisit this! However, the main topic is addressed – successfully ignore a failure in the loop and continue to process the remaining iterations.
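For example, one simple option is to write each processed file back to disk as it completes. A minimal sketch, assuming your version of genSMA() returns the processed data frame (the output folder is just an example path):

# Write each successfully processed file to disk; failures are skipped as before
for (i in 1:length(files)){
  tryCatch({
    out <- genSMA(files[[i]])   # assumes genSMA() returns the processed data frame
    write.csv(out, file = paste0("C:/R Projects/Output/result_", i, ".csv"), row.names = FALSE)
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}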

Free Data: Alternative to Yahoo Finance

Alpha Vantage has an API that offers free data. It includes intraday data spanning the last 10 to 15 days at resolutions of 1min, 5min, 15min, 30min and 60min.

Daily, weekly and monthly data span the past 20 years.

We can use R to interact with the Alpha Vantage API. We use fread() from the data.table package to download data directly into a data frame within R.

Note: insert your own Alpha Vantage API key inside the URL.

require(lubridate)
require(data.table)
require(dplyr)
GE <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=GE&outputsize=full&apikey=YOUR_API_KEY_HERE&datatype=csv") #fread() data.table for downloading directly to a data frame
GE$timestamp <- ymd(GE$timestamp)   #Lubridate to change character date to date format
GE <- arrange(GE,timestamp)   #dplyr to sort data frame by date ascending order 

#Plot GE Data
plot(GE$timestamp,GE$close,type="l",main="Alpha Vantage - GE Close Prices")

[Figure: Alpha Vantage – GE close prices]

There are different variables that go into the Alpha Vantage URL; for full usage you may visit the documentation page.
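As an example, at the time of writing the intraday series is requested with the TIME_SERIES_INTRADAY function and an interval parameter; treat the exact parameter names below as an assumption and verify them against the documentation:

# Example intraday request (parameter names assumed from the Alpha Vantage docs; verify before use)
GE.intraday <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=GE&interval=5min&outputsize=full&apikey=YOUR_API_KEY_HERE&datatype=csv")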

Download Multiple .csv Files To Hard Drive

If we want to download multiple files at one time, we can specify them ourselves in a vector and use a loop to download each file name in that vector.


# Download multiple .csv files to hard drive
# Self specify symbols in a vector
file.list <- c("DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM", "RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB","SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY")

for (i in 1 : length(file.list)) {
  file.name.variable <-  file.list[i]
  url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=",file.name.variable,"&outputsize=full&apikey=YOUR_API_KEY_HERE&datatype=csv")
  destfile <- paste0("D:/R Projects/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

Alternatively, if we can find a list of symbols online, we can import those into R, iterate through each symbol and download each file to our hard drive.

Import 1700 ETF Symbols From Nasdaq.com – Download .csv To Hard Drive

In this case we can read an ETF list of 1700 symbols from nasdaq.com.
Then we can download each symbol to our hard drive:

# Read ETF list csv file from nasdaq.com
# Use fread() from data.table package
# install.packages("data.table")
require(data.table)
read.data <- fread("http://www.nasdaq.com/investing/etfs/etf-finder-results.aspx?download=Yes")

# Make vector of symbol names
symbol.names <- read.data$Symbol

# Count Total ETF tickers
NROW(symbol.names)

for (i in 1 : length(symbol.names)) {
  file.name.variable <-  symbol.names[i]
  url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=",file.name.variable,"&outputsize=full&apikey=YOUR_API_CODE_HERE&datatype=csv")
  destfile <- paste0("D:/R Projects/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

Also to note: you may import a symbol list from your hard drive. The same principle applies.
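A minimal sketch of reading a local symbol list instead (the file name and column name are examples only):

# Read a local symbol list instead of downloading one (example file name and column name)
local.symbols <- read.csv("C:/R Projects/my_symbol_list.csv", stringsAsFactors = FALSE)
symbol.names <- local.symbols$Symbol
NROW(symbol.names)   # then reuse the download loop above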

Non Price Data – UltraPRO ETF Assets Under Management

In previous posts we determined that the S&P500 is mean reverting on an interday basis. What if there are better ways to capture this mean reverting nature than simply using price data alone?

In this post we will look at the UltraPRO family of ETFs and use their assets under management (AUM) to gauge overall market sentiment. The procedure is as follows:

1. Sum the bullish ETF AUMs
2. Sum the bearish ETF AUMs
3. Take either the ratio, total bull AUM / total bear AUM, or the spread, total bull AUM – total bear AUM

For bullish funds I have used:

“DDM”,”MVV”,”QLD”,”SAA”,”SSO”,”TQQQ”,”UDOW”,”UMDD”,”UPRO”,”URTY”,”UWM”, “BIB”, “FINU”,”LTL”,”ROM”, “RXL”, “SVXY”,”UBIO”,”UCC”,”UGE”,”UPW”,”URE”,”USD”,”UXI”,”UYG”,”UYM”

And bearish:

“DOG”,”DXD”,”MYY”,”MZZ”,”PSQ”,”QID”,”RWM”,”SBB”,”SDD”,”SDOW”,”SDS”,”SH”,”SPXU”,”SQQQ”,”SRTY”,”TWM”,”SMDD”,”UVXY”,”VIXM”,”VIXY”

(Note: make sure the order of your bull / bear ETFs is the same as mine if you try my code. You can check this by printing file.names on line 37 of the R code to see the order of the files in your directory – this is important!)
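For example, once the files are downloaded, we can check the read order with the same file.names line used in the full script:

# Print the file names in the order they will be read from the directory
file.names = list.files(path = "C:/R Projects/Data/etf_aum", pattern = ".csv")
print(file.names)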

Let's see what this looks like. FYI, you can download all the historical NAVs in .csv format from the UltraPRO website manually, or you can use the R code below:

# Download ETF AUM Files
file.list <- c("DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM", "RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB","SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY")

for (i in 1 : length(file.list)) {
  file.name.variable <-  file.list[i]
  url <- paste0("https://accounts.profunds.com/etfdata/ByFund/",
                file.name.variable, "-historical_nav.csv")
  destfile <- paste0("C:/R Projects/Data/etf_aum/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

We sum the assets under management for each group to obtain our bullish and bearish total AUMs.

[Figure: Total bull ETF AUM]

[Figure: Total bear ETF AUM]

[Figure: Spread of total bull vs bear AUM]

Note the spread is mean reverting on a yearly basis. We could run a statistical test, but we can see the spread does revert to the mean. However, we can also overlay a 9sma and measure how often the spread moves above and beyond the rolling 9sma. We see what this looks like below:

[Figure: Bull/bear AUM spread with rolling 9sma overlay]

We see that the spread relative to a rolling 9sma window is very mean reverting. This will be the subject of our back test. To keep things simple we will buy the SPY when the rolling z-score falls below 0 and sell when it crosses back over 0.
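Condensed from the full code at the bottom of this post, the rolling 9 day z-score and the entry/exit rule look roughly like this (aum.sum is the bull/bear AUM series built there):

# Rolling 9 day z-score of the bull/bear AUM series and the entry/exit rule (condensed)
require(TTR)
new.df$zscore.n9 <- (new.df$aum.sum - SMA(new.df$aum.sum, n = 9)) /
  runSD(new.df$aum.sum, n = 9, cumulative = FALSE)
new.df$enter <- ifelse(new.df$zscore.n9 < 0, 1, 0)   # long SPY while the z-score is below 0
new.df$exit  <- ifelse(new.df$zscore.n9 > 0, 1, 0)   # exit once it crosses back over 0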

[Figure: UltraPRO AUM mean reversion – equity curve vs buy and hold]

[Figure: UltraPRO AUM mean reversion – drawdown]

We see that the maximum drawdown is 18%, and we avoid most of the 2008 decline. As the drawdown is so small, we could explore using 2x or 3x leverage to increase profits and match the risk profile of buy and hold, for example.

Annualized return is 12% since 2006 (AUM data for the UltraPRO ETF family does not exist prior to 2006). The Sharpe ratio is 0.84.

We total 267 trades, 192 of which were profitable and 75 losing, for a win rate of 72%. We are invested 47% of the time from 2006 to the present day, which means we beat buy and hold while freeing up capital for 'layering' other strategies.

The fact that this didn't break down during the 2008 crisis is encouraging. I would rather see robustness across all market regimes.

In closing, assets under management taken as a collective are essentially a measure of market sentiment. Asset levels rise and fall in the bull / bear funds according to the views of market participants.

The strategy above seeks to be in the market when sentiment is lower, or more on the bearish side, and exit when sentiment has reverted back to the mean – in this case when the 9 period rolling z-score crosses back over 0.

Ideas for other tests:

  1. Change the exit criteria, for example sell at a value higher than 0
  2. Specify an 'n' day exit, for example exit the trade after 3, 4 or 5 days
  3. Do not sell on the first holding day, but sell at the first immediate profit after entry

These may be better or worse exits than the one above, but I will leave you good readers to test those!

Full R Code below which includes:

  1. Loop for downloading specified ETF .csv from UltraPro website
  2. Loop for merging all .csv into one data frame
  3. Back test script
  4. Plots used above
# UltraPRO ETF Assets Under Management
# Scrape ETF data, study AUM's, create indicators, back test strategy
# Andrew Bannerman 10.4.2017
# Need to subtract bull ETF from bear... or ratio.... zscore ratio.......

require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)
require(ggplot2)
require(ggthemes)
require(lubridate)
require(scales)

# Download ETF AUM Files
file.list <- c("DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM", "RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB","SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY")
for (i in 1 : length(file.list)) {
  file.name.variable <-  file.list[i]
  url <- paste0("https://accounts.profunds.com/etfdata/ByFund/",
                file.name.variable, "-historical_nav.csv")
  destfile <- paste0("C:/R Projects/Data/etf_aum/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

# Set data Dir
data.dir <- "C:/R Projects/Data/etf_aum/"
download.dir <- "C:/R Projects/Data/etf_aum/"

#Read list of files
files = list.files(path = "C:/R Projects/Data/etf_aum/", pattern = ".", full.names = TRUE)

# Read file names
file.names = list.files(path = "C:/R Projects/Data/etf_aum", pattern = ".csv")

#Read the first file
data.file <- paste(files[1],sep="")

# Read data first file to data frame
df <- read.csv(data.file,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Convert data formats
cols <-c(4:9)
df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
#Convert Date Column [1]
df$Date <- mdy(df$Date)   #mdy for .txt

# Loop for merging all files to one data frame by common date

for (f in 2:length(files)) {
  data.next <- paste(files[f],sep="")
  # if using names.txt
  #  data.next <- paste(data.dir,fileslist[f],'.csv',sep="")
  next.file <- read.csv(data.next,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE) 

  cols <-c(4:9)
  next.file[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
  #Convert Date Column [1]
  next.file$Date <- mdy(next.file$Date)   #mdy for .txt

  next.df <- full_join(df, next.file, by = c("Date" = "Date"))

 cnames <- rep(c("Proshares.Name","Ticker","NAV","Prior.NAV","NAV.Change","Nav.Change.a","Shares.Outstanding","AUM"),f)
  cnums <- rep(seq(1,f),each=8)
  columnames <- paste(cnames,cnums)
  colnames(next.df) <- c('Date',columnames)

  df <- next.df
  colnames(df)
}

# Sort data.frame from 'newest to oldest' to 'oldest to newest'
df <- arrange(df, Date)

# Subset AUM columns to one data frame
new.df <- df[, seq(from = 1, to = ncol(df), by=8)]

# Convert all NA to 0
new.df[is.na(new.df)] <- 0

colnames(new.df)

# Sum all AUMs
new.df$aum.sum.bull <-  rowSums(new.df[,c("AUM 1", "AUM 2", "AUM 5", "AUM 6", "AUM 7", "AUM 12", "AUM 13", "AUM 15", "AUM 16", "AUM 26", "AUM 27","AUM 28","AUM 30","AUM 31","AUM 32","AUM 33","AUM 34","AUM 35","AUM 36","AUM 37","AUM 38","AUM 39","AUM 41","AUM 42","AUM 43","AUM 44")])   # Bull Aums
new.df$aum.sum.bear <-  rowSums(new.df[,c("AUM 3", "AUM 4", "AUM 8", "AUM 9", "AUM 10", "AUM 11", "AUM 14", "AUM 17", "AUM 18", "AUM 19", "AUM 20","AUM 21","AUM 22","AUM 23","AUM 24","AUM 25","AUM 29","AUM 40","AUM 45","AUM 46")])   # Bear AUMs

# Ratio or spread of Bull to bear
new.df$aum.sum <- new.df$aum.sum.bull / new.df$aum.sum.bear

new.df <- new.df[-c(2855), ]

# Make Plot
ggplot(new.df, aes(Date, aum.sum)) +
  geom_line()+
  theme_economist() +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%y"))+
  scale_y_continuous(breaks = seq(-19000000000, 19000000000, by = 500500000))+
  ggtitle("Spread of Total Bear Bull Assets Under Management", subtitle = "2006 To Present") +
  labs(x="Year",y="Total Bear AUM")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(axis.text.x= element_text(hjust=0.5, angle=0))

# Create rolling n day z-score of total AUM sum
# Use TTR package to create n day SMA
get.SMA <- function(numdays) {
  function(new.df) {
    SMA(new.df$aum.sum, n=numdays)    # Calls TTR package
  }
}
# Create a matrix to put the SMA's in
sma.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:30) {
  sma.matrix <- cbind(sma.matrix, get.SMA(i)(new.df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:30, function(n)paste("sma.n", n, sep=""))

# Bind to existing dataframe
new.df <-  cbind(new.df, sma.matrix)

# Use TTR package to create n day SD
get.SD <- function(numdays) {
  function(new.df) {
    runSD(new.df$aum.sum, numdays, cumulative = FALSE)     # Calls TTR package to create rolling standard deviation
  }
}
# Create a matrix to put the SDs in
sd.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in 2:30) {
  sd.matrix <- cbind(sd.matrix, get.SD(i)(new.df))
}

# Rename columns
colnames(sd.matrix) <- sapply(2:30, function(n)paste("sd.n", n, sep=""))

# Bind to existing dataframe
new.df <-  cbind(new.df, sd.matrix)

# zscore calculation
for (n in 2:30) {
  new.df[[paste0("zscore.n", n)]] <- (new.df$aum.sum - new.df[[paste0("sma.n", n)]]) / new.df[[paste0("sd.n", n)]]
}

# View Plot
ggplot(new.df, aes(Date, zscore.n9)) +
  geom_line()+
  theme_economist() +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%y"))+
  scale_y_continuous()+
  ggtitle("Rolling 9 Day Zscore of Total Bull Bear AUM Spread", subtitle = "2006 To Present") +
  labs(x="Year",y="9 Day Zscore")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(axis.text.x= element_text(hjust=0.5, angle=0))

# Load S&P500 benchmark data
read.spx <- read.csv("C:/R Projects/Data/SPY.csv", header=TRUE, stringsAsFactors = FALSE)

# Convert data formats
cols <-c(2:7)
read.spx[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
#Convert Date Column [1]
read.spx$Date <- ymd(read.spx$Date)

# Join to existing data frame
new.df <- full_join(read.spx, new.df, by = c("Date" = "Date"))

# Convert all NA to 0
new.df[is.na(new.df)] <- 0

# Calculate Returns from open to close
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Subset by date
new.df <- subset(new.df, Date >= as.Date("2006-01-01") )

# Name indicators #
#train.indicator <- train.set$close.zscore.n10
#test.indicator <- test.set$close.zscore.n10
signal <- new.df$zscore.n9
trigger <- 0

# Enter buy / sell rules
new.df$enter <- ifelse(signal < trigger, 1,0)
new.df$exit <- ifelse(signal > 0, 1,0)

# Mean Rev
new.df <- new.df %>%
  dplyr::mutate(signal = ifelse(enter == 1, 1,
                                      ifelse(exit == 1, 0, 0)))

# lag signal by one forward day to signal entry next day
new.df$signal <- lag(new.df$signal,1) # Note k=1 implies a move *forward*

new.df[is.na(new.df)] <- 0  # Set NA to 0

# Calculate equity curves

# Signal
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(signal)) %>%
  group_by(RunID) %>%
  dplyr::mutate(mean.rev.equity = ifelse(signal == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(new.df$mean.rev.equity, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d %H:%M"))
xts2 = xts(new.df$clret, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d %H:%M")) 

# Join XTS together
compare <- cbind(xts1,xts2)

require(PerformanceAnalytics)
colnames(compare) <- c("Mean Reversion UltraPRO ETF's","Buy And Hold")
charts.PerformanceSummary(compare,main="Mean Reversion - UltraPRO ETF AUM", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

# Find net trade result of multiple 'n' day trades
# Find start day of trade, find end day, perform (last(Close) - first(Open))/first(Open) % calculation
new.df <- new.df %>%
  dplyr::mutate(RunID = data.table::rleid(signal)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(signal == 0, 0,
                                     ifelse(row_number() == n(),
                                            (last(Close) - first(Open))/first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# Win / Loss %
# All Holding Days
winning.trades <- sum(new.df$mean.rev.equity > 0, na.rm=TRUE)
losing.trades <- sum(new.df$mean.rev.equity < 0, na.rm=TRUE)
even.trades <- sum(new.df$mean.rev.equity == 0, na.rm=TRUE)
total.days <- NROW(new.df$mean.rev.equity)

# Multi Day Trades
multi.winning.trades <- sum(new.df$perc.output > 0, na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < 0, na.rm=TRUE)
multi.total.days <- multi.winning.trades+multi.losing.trades

# % Time Invested
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades

# Calculate win loss %
# All Days
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Day Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss
# All Days
remove.zero <- new.df[-which(new.df$mean.rev.equity == 0 ), ] # removing rows 0 values
consec.wins <- max(rle(sign(remove.zero$mean.rev.equity))[[1]][rle(sign(remove.zero$mean.rev.equity))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$mean.rev.equity))[[1]][rle(sign(remove.zero$mean.rev.equity))[[2]] == -1])
consec.wins

# Multi Day Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# All Days
average.trade <- mean(new.df$mean.rev.equity)
average.win <- mean(new.df$mean.rev.equity[new.df$mean.rev.equity >0])
average.loss <- mean(new.df$mean.rev.equity[new.df$mean.rev.equity <0])
median.win <- median(new.df$mean.rev.equity[new.df$mean.rev.equity >0])
median.loss <- median(new.df$mean.rev.equity[new.df$mean.rev.equity <0])
max.gain <- max(new.df$mean.rev.equity)
max.loss <- min(new.df$mean.rev.equity)
win.loss.ratio <- winning.trades / abs(losing.trades)
summary <- cbind(winning.trades,losing.trades,even.trades,total.days,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Even Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Day Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output >0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output <0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output >0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output <0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.total.days,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
Return.cumulative(xts1, geometric = TRUE)

# Write output to file
write.csv(new.df,file="C:/R Projects/ultra.pro.etf.mean.rev.csv")

S&P500 – General Intraday Study

I have arbitrarily chosen 1 minute and 30 second bars for studying the range (high – low) and the mean close to close change per time of day.

I follow this procedure (a condensed code sketch follows the list):

1. Calculate High – Low to find the bar range
2. Calculate the Close to Close differences for each bar
3. Group all by time of day
4. Mean for each time of day for the range and close to close differences
5. Plot
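A condensed sketch of the steps above, on an assumed data frame bars with Time, High, Low and Close columns (the data frame and column names are illustrative):

# Condensed sketch of steps 1-5, using an assumed data frame 'bars' with Time, High, Low, Close
require(dplyr)
bars <- bars %>%
  dplyr::mutate(range = High - Low,                     # 1. bar range
                cl.diff = Close - dplyr::lag(Close))    # 2. close to close difference
by.time <- bars %>%
  dplyr::group_by(Time) %>%                             # 3. group by time of day
  dplyr::summarise(mean.range = mean(range, na.rm = TRUE),        # 4. mean range per time of day
                   mean.cl.diff = mean(cl.diff, na.rm = TRUE))    #    and mean close to close change
plot(by.time$mean.range, type = "h", xlab = "Bar of day", ylab = "Mean range")   # 5. plot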

If we look at 1 minute range bars first:

[Figure: Mean High-Low grouped by time of day, 1 minute bars – 1998 to present (10.13.2017)]

What is prominent, if we classify volatility as range extension, is that range expands in the morning, contracts mid day, and increases again post 1300.

We also see a notable uptick in range at the first and last bars (0830 / 1500) and also at the 0900 and 1300 bars, Central Time. I believe the 9am uptick may come from buyers who wait for the first 30 minutes to pass to see in which direction traders are placing their bets.

Post 1300 hours we see the range increase into the market close.

The highest range bars are the first and last bars of the day and the morning bars, specifically the 9am bar.

We also see, quite prominently, an uptick in range at every 30 minute interval, on the hour and at half past the hour. The same is true at 10 minute intervals.

If we take the mean close to close change per time of day:

[Figure: Mean close to close difference, 1 minute bars – April 2017 to present]

We see that the first 4 minutes after the open tend to have a positive mean bias and the closing 15 minutes tend to show a slightly negative bias.

If we dig into the 30 second bars looking first at the range:

[Figure: Mean High-Low grouped by time of day, 30 second bars – April 2017 to present]

We see the same theme. The first and last bars have the highest range. Again, range declines into 9am before an uptick, declines into the lunch hours, and at 1300 we see an uptick before range steadily increases into the market close. We also see the increase in range at every hour and half hour interval (top and bottom of the hour).

If we view the mean close to close difference by time:

[Figure: Mean close to close difference, 30 second bars – April 2017 to present]

We see that the first 30 second bar has a notable positive bias. The other bars do not offer much statistical significance.

[Figure: Mean High-Low grouped by time of day, 15 second bars – April 2017 to present]

This is the range of the 15 second bars. The theme is clear: from 0830 to 1030 we see the highest range, lunchtime range contracts, then range expands again post 1300 into the market close.

Free WordPress doesn't allow the images to be viewed clearly, and right click / save as yields blurred results. If you have an interest in the images above, shoot me an email from the contact form and I will email them.

 

Strategy – S&P500 Seasonal

In a previous post, S&P500 Seasonal Study + Other Commodities (https://flare9xblog.wordpress.com/2017/10/12/seasonal-study/), it was determined that the best months to be long the S&P500 are October through the end of May.

As a refresher:

[Figure: S&P500 mean daily spread per month]

We back test this theme to see how it performs on monthly bars from 1960 to present day (10.12.2017):

[Figure: October to May seasonal back test – equity curve vs buy and hold]

[Figure: October to May seasonal back test – results]

What we see is that we are invested 67% of the time, being long from the open of the October bar and selling at the close of May. We do beat the S&P500, and I believe this is partly due to drawing down less in the post-1999 bear market. Our maximum drawdown, at 43%, is slightly less than buy and hold.

What stands out is the average gain during this October to May holding period: the average trade is 14.3%.

For a relative benchmark, let's see what it looks like only buying the May open and selling at the close of September:

[Figure: May open to September close back test – equity curve vs buy and hold]

[Figure: May open to September close back test – results]

> Return.cumulative(xts1, geometric = TRUE)
                        [,1]
Cumulative Return -0.1973985

We see the cumulative return for buying the May open and selling the close of September is -0.1974, i.e. $1 invested shrinks to roughly $0.80. Essentially May to September is negative to flat. The cumulative return does not even warrant shorting May to September.

In closing, we see that the majority of the S&P500 gains seem to be attributable to seasonal trends, specifically buying in October and selling in May.

Full back test R Code for the above


# S&P500 Seasonal Back Test 
# Month Bars
# Buy Open of October, Sell End of April (Based on seasonal study)
# Andrew Bannerman 10.12.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "C:/R Projects/Final Scripts/Seasonal Strategies/Data"
data.read.spx <- paste(data.dir,"$SPX.Monthly.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Make dataframe 
new.df <- data.frame(read.spx)

# Convert Values To Numeric 
cols <-c(3:8)
new.df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date Column [1]
new.df$Date <- ymd(new.df$Date)

# Add day of month column 
# This is a helper column for creating buy/sell rules
group <- new.df %>% dplyr::mutate(month = lubridate::month(Date)) %>% group_by(month) 
new.df <- data.frame(new.df,group$month)

# Calculate Returns from open to close 
new.df$ocret <- apply(new.df[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
new.df$clret <- ROC(new.df$Close, type = c("discrete"))
new.df$clret[1] <- 0

# Subset by Date
new.df <- subset(new.df, Date >= as.Date("1960-01-01") )

# Enter Long Rules 
# Buy October To End of April
x = new.df$group.month
# October to May Hold Period Rules
#new.df$enter <- ifelse(x %in% c(10,11,12,1,2,3,4,5), 1, 0)
# May to September Hold Period Rules
new.df$enter <- ifelse(x %in% 5:9, 1, 0)

# Calculate equity curves
# Long
new.df <- new.df %>%
  dplyr::mutate(RunID = rleid(enter)) %>%
  group_by(RunID) %>%
  dplyr::mutate(seasonal.equity = ifelse(enter == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(new.df$seasonal.equity, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d")) 
xts2 = xts(new.df$clret, order.by=as.POSIXct(new.df$Date, format="%Y-%m-%d")) 
str(new.df)

# Join XTS together 
compare <- cbind(xts1,xts2)

# Use the PerformanceAnalytics package for trade statistics
colnames(compare) <- c("Seasonal","Buy And Hold")
charts.PerformanceSummary(compare,main="Long May Open To Close of September", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

# Find net trade result of multiple 'n' day trades
# Find start day of trade, find end day, perform (last(Close) - first(Open))/first(Open) % calculation
new.df <- new.df %>%
  dplyr::mutate(RunID = data.table::rleid(enter)) %>%
  group_by(RunID) %>%
  dplyr::mutate(perc.output = ifelse(enter == 0, 0,
                                     ifelse(row_number() == n(),
                                            (last(Close) - first(Open))/first(Open), 0))) %>%
  ungroup() %>%
  select(-RunID)

# Win / Loss % 
# All Holding Months
winning.trades <- sum(new.df$seasonal.equity > 0, na.rm=TRUE)
losing.trades <- sum(new.df$seasonal.equity < 0, na.rm=TRUE)
even.trades <- sum(new.df$seasonal.equity == 0, na.rm=TRUE)
total.days <- NROW(new.df$seasonal.equity)

# Multi Month Trades
multi.winning.trades <- sum(new.df$perc.output > 0, na.rm=TRUE)
multi.losing.trades <- sum(new.df$perc.output < 0, na.rm=TRUE)
multi.total.days <- multi.winning.trades+multi.losing.trades

# % Time Invested
time.invested <- (winning.trades + losing.trades) / total.days
winning.trades + losing.trades

# Calculate win loss %
# All months
total <- winning.trades + losing.trades
win.percent <- winning.trades / total
loss.percent <- losing.trades / total
# Multi Month Trades
multi.total <- multi.winning.trades + multi.losing.trades
multi.win.percent <- multi.winning.trades / multi.total
multi.loss.percent <- multi.losing.trades / multi.total
# Calculate Consecutive Wins Loss 
# All Months
remove.zero <- new.df[-which(new.df$seasonal.equity == 0 ), ] # removing rows 0 values 
consec.wins <- max(rle(sign(remove.zero$seasonal.equity))[[1]][rle(sign(remove.zero$seasonal.equity))[[2]] == 1])
consec.loss <- max(rle(sign(remove.zero$seasonal.equity))[[1]][rle(sign(remove.zero$seasonal.equity))[[2]] == -1])
consec.wins

# Multi Month Trades
multi.remove.zero <- new.df[-which(new.df$perc.output == 0 ), ] # removing rows 0 values 
multi.consec.wins <- max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == 1])
multi.consec.loss <-max(rle(sign(multi.remove.zero$perc.output))[[1]][rle(sign(multi.remove.zero$perc.output))[[2]] == -1])

# Calculate Summary Statistics
# All Months
average.trade <- mean(new.df$seasonal.equity)
average.win <- mean(new.df$seasonal.equity[new.df$seasonal.equity >0])
average.loss <- mean(new.df$seasonal.equity[new.df$seasonal.equity <0])
median.win <- median(new.df$seasonal.equity[new.df$seasonal.equity >0])
median.loss <- median(new.df$seasonal.equity[new.df$seasonal.equity <0])
max.gain <- max(new.df$seasonal.equity)
max.loss <- min(new.df$seasonal.equity)
win.loss.ratio <- winning.trades / abs(losing.trades)
summary <- cbind(winning.trades,losing.trades,even.trades,total.days,win.percent,loss.percent,win.loss.ratio,time.invested,average.trade,average.win,average.loss,median.win,median.loss,consec.wins,consec.loss,max.gain,max.loss)
summary <- as.data.frame(summary)
colnames(summary) <- c("Winning Trades","Losing Trades","Even Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(summary)

# Multi Month Trades
multi.average.trade <- mean(new.df$perc.output)
multi.average.win <- mean(new.df$perc.output[new.df$perc.output >0])
multi.average.loss <- mean(new.df$perc.output[new.df$perc.output <0])
multi.median.win <- median(new.df$perc.output[new.df$perc.output >0])
multi.median.loss <- median(new.df$perc.output[new.df$perc.output <0])
multi.win.loss.ratio <- multi.average.win / abs(multi.average.loss)
multi.max.gain <- max(new.df$perc.output)
multi.max.loss <- min(new.df$perc.output)
multi.summary <- cbind(multi.winning.trades,multi.losing.trades,multi.total.days,multi.win.percent,multi.loss.percent,multi.win.loss.ratio,time.invested,multi.average.trade,multi.average.win,multi.average.loss,multi.median.win,multi.median.loss,multi.consec.wins,multi.consec.loss,multi.max.gain,multi.max.loss)
multi.summary <- as.data.frame(multi.summary)
colnames(multi.summary) <- c("Winning Trades","Losing Trades","Total Trades","Win %","Loss %","Win Loss Ratio","Time Invested","Average Trade","Average Win","Average Loss","Median Gain","Median Loss","Consec Wins","Consec Loss","Maximum Win","Maximum Loss")
print(multi.summary)
print(performance.table)
print(drawdown.table)
table.Drawdowns(xts1, top=10)
Return.cumulative(xts1, geometric = TRUE)

# Write output to file
write.csv(new.df,file="C:/R Projects/SP500.Seasonal.csv")

S&P500 Seasonal Study + Other Commodities

We study whether there is seasonality in the S&P500. We perform the procedure below on data from 1928 to present day (10.11.2017):

1. Calculate daily spread of closing prices
2. Group daily spread by month
3. Calculate mean of each month

We simply compute the spread of the close to close values. We do not use % returns here, simply close – close for every day in the series.

Next we group all days by their month and compute the mean for each monthly group.
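Condensed, the three steps look roughly like this (the full script at the end of the post does the same with a few extra helper columns):

# Condensed version of the three steps above (read.spx holds Date and Close, as in the full script)
require(dplyr)
require(lubridate)
monthly.mean <- read.spx %>%
  dplyr::mutate(close.diff = Close - dplyr::lag(Close),     # 1. daily close to close spread
                mymonth = lubridate::month(Date)) %>%       # 2. group by month
  dplyr::group_by(mymonth) %>%
  dplyr::summarise(mean = mean(close.diff, na.rm = TRUE))   # 3. mean of each month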

The results for the S&P500 are below:

[Figure: S&P500 mean daily spread per month – bar chart]

[Figure: S&P500 mean daily spread per month – line chart]

The old adage… ‘Sell In May And Go Away!’ seems to be true.

Other ETFs:

[Figure: DIA mean daily spread per month]

DIA follows mostly the same seasonal pattern as the S&P500.

[Figure: Crude oil mean daily spread per month]

The best months for Crude Oil seem to be from Feb through June.

[Figure: Natural gas mean daily spread per month]

Natural Gas has its worst months in July and August.

[Figure: Gold mean daily spread per month]

Best months for Gold look to be Jan/Feb and August.

[Figure: Silver mean daily spread per month]

Silver follows a similar seasonal pattern to Gold.

Commodities tend to exhibit seasonal supply and demand fluctuations, which show up consistently in the mean plots above and, with a bit of googling, may be explained.

In another post we will test for seasonal strategies which will attempt to exploit the above seasonal trends.

Full R Code below:

# S&P500 Seasonal Study 
# Calculate daily price spreads
# Group by month 
# Average each monthly group 

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(ggplot2)
require(ggthemes)

# Data path
data.dir <- "C:/Stock Market Analysis/Market Data/MASTER_DATA_DUMP"
data.read.spx <- paste(data.dir,"$SPX.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Convert Values To Numeric 
cols <-c(3:8)
read.spx[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date Column [1] to Date format 
read.spx$Date <- ymd(read.spx$Date)

# Subset Date
#read.spx <- subset(read.spx, Date >= as.Date("1960-01-01") ) 

# Compute daily price differences 
# We replicate NA 1 time in order to maintain correct positioning of differences
# Within the data frame
read.spx$close.diff <- c(rep(NA, 1), diff(read.spx$Close, lag = 1, differences = 1, arithmetic = TRUE, na.pad = TRUE))

# Group each daily difference by month
group <- read.spx %>% dplyr::mutate(mymonth = lubridate::month(Date)) %>% group_by(mymonth) 
read.spx <- data.frame(read.spx,group$mymonth)
read.spx <- arrange(read.spx,group.mymonth)

# Duplicate df
for.mean <- data.frame(read.spx)

# Perform mean
mean <- for.mean %<>%
  group_by(group.mymonth) %>%
  summarise(mean=mean(close.diff,na.rm = TRUE))

# Confidence
jan <- subset(read.spx, group.mymonth  == 1)
feb <- subset(read.spx, group.mymonth  == 2)
mar <- subset(read.spx, group.mymonth  == 3)
apr <- subset(read.spx, group.mymonth  == 4)
may <- subset(read.spx, group.mymonth  == 5)
jun <- subset(read.spx, group.mymonth  == 6)
jul <- subset(read.spx, group.mymonth  == 7)
aug <- subset(read.spx, group.mymonth  == 8)
sep <- subset(read.spx, group.mymonth  == 9)
oct <- subset(read.spx, group.mymonth  == 10)
nov <- subset(read.spx, group.mymonth  == 11)
dec <- subset(read.spx, group.mymonth  == 12)
jan.t.test <- t.test(jan$close.diff, conf.level = 0.95,na.rm = TRUE)
jan.t.test$estimate

# Jan Plot 
hist(jan$close.diff,main="Jan Mean - Normal Distribution",xlab="Mean")

# Plot 
ggplot(mean, aes(group.mymonth, mean)) +
  geom_col()+
  theme_classic()+
  scale_x_continuous(breaks = seq(0, 12, by = 1))+
  ggtitle("UNG - Mean Daily Spead Per Month", subtitle = "2007 To Present") +
  labs(x="Month",y="Mean Daily Spread Per Month")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

ggplot(mean, aes(group.mymonth, mean)) +
  geom_line()+
  theme_bw() +
  scale_x_continuous(breaks = seq(0, 12, by = 1))+
  scale_y_continuous(breaks = seq(-0.15, 0.30, by = 0.02))+
  ggtitle("Mean Daily Spead Per Month", subtitle = "1928 To Present") +
  labs(x="Month",y="Mean Daily Spread Per Month")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  geom_rect(aes(xmin=4.5,xmax=9,ymin=-Inf,ymax=Inf),alpha=0.1,fill="#CC6666")+
  geom_rect(aes(xmin=1,xmax=4.5,ymin=-Inf,ymax=Inf),alpha=0.1,fill="#66CC99")+
  geom_rect(aes(xmin=9,xmax=12,ymin=-Inf,ymax=Inf),alpha=0.1,fill="#66CC99")

# Write output to file
write.csv(read.spx,file="C:/R Projects/seasonal.csv")


Momentum Strategy – Boosting Returns – VFINX / VFITX

In this post we will look at a strategy that is 100% invested in the S&P500 and during adverse conditions, is 100% invested in an intermediate term treasury bond. The two funds used for the strategy:

Vanguard 500 Index Fund Investor Class (VFINX)
Vanguard Intermediate-Term Treasury Fund Investor Shares (VFITX)

The strategy, using MONTHLY bars, is as follows:
1. Long VFINX (S&P500) when the price is over the 189 day moving average, or the monthly equivalent (9sma)
2. When VFINX (S&P500) is below the monthly 9sma, sell 100% of VFINX at next month's open and buy VFITX (bonds)
3. When VFINX (S&P500) is back above the monthly 9sma, sell 100% of VFITX (bonds) and go 100% long VFINX again

Essentially the strategy is long the S&P500 when it is above the 9sma (daily, 189sma) and 100% in intermediate bonds when it is below the 9sma (daily, 189sma).
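Condensed from the full back test code at the end of this post, the core switching rule looks like this:

# Core switching rule, condensed from the full back test code below
require(TTR)
require(dplyr)
df$sma9 <- SMA(df$VFINX.Adj.Close, n = 9)                            # monthly 9 SMA
df$signal.long.stocks <- ifelse(df$VFINX.Adj.Close > df$sma9, 1, 0)  # hold VFINX (stocks)
df$signal.long.bonds  <- ifelse(df$VFINX.Adj.Close < df$sma9, 1, 0)  # hold VFITX (bonds)
# lag by one bar so the switch happens on the next monthly bar
df$signal.long.stocks <- dplyr::lag(df$signal.long.stocks, 1)
df$signal.long.bonds  <- dplyr::lag(df$signal.long.bonds, 1)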

In the plot below, the color red denotes times when VFINX (S&P500) is below the monthly 9sma (daily, 189sma), i.e. when the strategy would be 100% long bonds.

[Figure: VFINX monthly close, red where below the monthly 9sma]

The returns for this strategy since 1991 inception:

[Figure: VFINX / VFITX switching strategy – equity curve vs buy and hold]

[Figure: VFINX / VFITX switching strategy – performance table]

From the back test we see an annualized return of 12.4% with a Sharpe ratio of 1.21. Maximum drawdown is 15%, which means we may explore using leverage to boost gains whilst maintaining the same risk profile as buy and hold (VFINX).

We may boost returns by longing a 3x leveraged S&P500 ETF. This can be achieved with the ProShares UltraPro ETF (UPRO, inception 2009). First I want to check whether we can model UPRO data prior to 2009.

[Figure: Regression fit of UPRO vs 3x S&P500 daily returns]

Plotting the regression fit of UPRO against 3x S&P500 daily returns, we see the fit is not perfect. There is a high degree of correlation; however, the % return tracking is not exact.

[Figure: UPRO vs 3x S&P500 daily returns – tracking over time]

We have almost perfect tracking prior to June 2012; after that, however, there is a notable divergence between the 3x S&P500 daily returns and UPRO.

When we model UPRO prior to 2009 by simply multiplying the S&P500 returns by 3, we know it will be largely theoretical.
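The modelling itself is a one-liner on the return series; it appears (commented out) in the full code at the end of this post:

# Theoretical 3x leverage: multiply the VFINX return series by 3
# (this line is commented out in the full code below; uncomment it to reproduce the UPRO test)
df$VFINX.clret <- df$VFINX.clret * 3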

Let's back test swapping VFINX with our theoretical UPRO prior to 2009.

[Figure: Theoretical UPRO switching strategy – equity curve vs buy and hold]

[Figure: Theoretical UPRO switching strategy – performance table]

Maximum drawdown is 45%, which is significantly lower than buy and hold UPRO (92%). Essentially we enjoy staggering returns with almost half the drawdown. The 189 day sma, or monthly 9sma, acts as a 'low pass filter' and avoids the catastrophic bear markets post 1999 and 2007.

[Figure: Theoretical UPRO switching strategy – drawdown]

[Figure: Theoretical UPRO switching strategy – cumulative return]

We see that the cumulative return is 1276.56, i.e. $1 grows to roughly $1,277.

These are theoretical but staggering results.

# Get First And Last Date
last.date <- tail(df$Date, n = 1)
first.date <- head(df$Date, n = 1)
## Find Time Difference
time <- last.date - first.date
years.between <- time/352
years.between <- as.numeric(years.between, units="days")   # Extract numerical value from time difference 'Time difference of 2837.208 days'
years.between

10 thousand dollars grows to 12.7 million dollars over a period of 26.98 years. Not bad for a maximum drawdown of 45% with 3x leverage.

In closing, the 189 day sma, or its 9sma monthly equivalent, acts as a low pass filter. The aim of this filter is to reduce overall volatility, provide a smaller drawdown and avoid negative compounding periods. We see from the results that this has been achieved.

Reader Question:
What happens if we optimize the monthly bar SMA look back higher / lower than the tested 9sma?

Full back test R code:


# Long Term S&P500 - Switch To Bonds
# Andrew Bannerman 10.8.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(TTR)
require(zoo)
require(data.table)
require(xts)
require(PerformanceAnalytics)

# Data path
data.dir <- "C:/R Projects/Final Scripts/Vanguard Long Term Strategies"
data.read.VFINX <- paste(data.dir,"VFINX.csv",sep="/")
data.read.VFITX <- paste(data.dir,"VFITX.csv",sep="/")

# Read data
read.VFINX <- read.csv(data.read.VFINX,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)
read.VFITX <- read.csv(data.read.VFITX,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)
#read.VFITX <- read.VFITX[-nrow(read.VFITX),] 

# Convert Values To Numeric
cols <-c(2:7)
read.VFINX[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))
read.VFITX[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

#Convert Date Column [1]
read.VFINX$Date <- ymd(read.VFINX$Date)
read.VFITX$Date <- ymd(read.VFITX$Date)

# Merge two data sets
df <- full_join(read.VFINX, read.VFITX, by = c("Date" = "Date"))

# Rename Columns
colnames(df)[1] <-"Date"
colnames(df)[2] <-"VFINX.Open"
colnames(df)[3] <-"VFINX.High"
colnames(df)[4] <-"VFINX.Low"
colnames(df)[5] <-"VFINX.Close"
colnames(df)[6] <-"VFINX.Adj.Close"
colnames(df)[7] <-"VFINX.Volume"
colnames(df)[8] <-"VFITX.Open"
colnames(df)[9] <-"VFITX.High"
colnames(df)[10] <-"VFITX.Low"
colnames(df)[11] <-"VFITX.Close"
colnames(df)[12] <-"VFITX.Adj.Close"
colnames(df)[13] <-"VFITX.Volume"

# Convert all NA to 0
df[is.na(df)] <- 0

# Use TTR package to create rolling SMA n day moving average
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
  function(df) {
    SMA(df[,"VFINX.Adj.Close"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(df), ncol=0)

# Loop for filling it
for (i in 2:400) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(df))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:400, function(n)paste("adj.close.sma.n", n, sep=""))

# Bind to existing dataframe
df <-  cbind(df, sma.matrix)

# Convert all NA to 0
df[is.na(df)] <- 0

# Calculate Returns from open to close
#df$VFINX.ocret <- apply(df[,c('VFINX.Open', 'VFINX.Adj.Close')], 1, function(x) { (x[2]-x[1])/x[1]} )
#df$VFITX.ocret <- apply(df[,c('VFITX.Open', 'VFITX.Adj.Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
df$VFINX.clret <- ROC(df$VFINX.Adj.Close, type = c("discrete"))
df$VFITX.clret <- ROC(df$VFITX.Adj.Close, type = c("discrete"))
df$VFINX.clret[1] <- 0
df$VFITX.clret[1] <- 0

# Add leverage multiplier
#df$VFINX.clret <- df$VFINX.clret * 3

# Subset Date
df <- subset(df, Date >= as.Date("1991-10-01") )

# Name indicators #
VFINX.sma <- df$adj.close.sma.n9

# Enter buy / sell rules
df$signal.long.stocks <- ifelse(df$VFINX.Adj.Close > VFINX.sma, 1,0)
df$signal.long.bonds <- ifelse(df$VFINX.Adj.Close < VFINX.sma, 1,0)

# lag signal by one forward day to signal entry next day
df$signal.long.stocks <- lag(df$signal.long.stocks,1) # Note k=1 implies a move *forward*
df$signal.long.bonds <- lag(df$signal.long.bonds,1) # Note k=1 implies a move *forward*

df[is.na(df)] <- 0  # Set NA to 0

#Plot VIFNX Monthly with 9sma
plot(df$VFINX.Adj.Close, col = ifelse(df$VFINX.Adj.Close < VFINX.sma,'red','black'), pch = 10, cex=.5,ylab="VFINX Close",main="VFINX Below Monthly 9sma")

# Calculate equity curves
# Long Stocks
df <- df %>%
  dplyr::mutate(RunID = rleid(signal.long.stocks)) %>%
  group_by(RunID) %>%
  dplyr::mutate(long.stocks.equity.curve = ifelse(signal.long.stocks== 0, 0,
                                         ifelse(row_number() == 1, VFINX.clret, VFINX.clret))) %>%
  ungroup() %>%
  select(-RunID)

# Long Bonds
df <- df %>%
  dplyr::mutate(RunID = rleid(signal.long.bonds)) %>%
  group_by(RunID) %>%
  dplyr::mutate(long.bonds.equity.curve = ifelse(signal.long.bonds == 0, 0,
                                                  ifelse(row_number() == 1, VFITX.clret, VFITX.clret))) %>%
  ungroup() %>%
  select(-RunID)

# Combine Signals
df$combined.equity.curve <- df$long.stocks.equity.curve + df$long.bonds.equity.curve

# Pull select columns from data frame to make XTS whilst retaining formats
xts1 = xts(df$long.stocks.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts2 = xts(df$long.bonds.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts3 = xts(df$combined.equity.curve, order.by=as.Date(df$Date, format="%m/%d/%Y"))
xts4 = xts(df$VFINX.clret, order.by=as.Date(df$Date, format="%m/%d/%Y")) 

# Join XTS together
compare <- cbind(xts3,xts2,xts1,xts4)

# Use the PerformanceAnalytics package for trade statistics

require(PerformanceAnalytics)
colnames(compare) <- c("Long.Stocks.Switch.Bonds","Long.Bonds","Long.Stocks","Buy And Hold")
charts.PerformanceSummary(compare,main="Long Stocks(VFINX) Over 200sma, Long Bonds(VFITX) Under 200sma", wealth.index=TRUE, colorset=rainbow12equal)
performance.table <- rbind(table.AnnualizedReturns(compare),maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()
#logRets <- log(cumprod(1+compare))
#chart.TimeSeries(logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)
Return.cumulative(xts3, geometric = TRUE)