Julia – Download Free Data Using Alphavantage API

For those of you wanting to get started with, or become familiar with, the Julia language, I wrote a small script to import .csv data using the Alphavantage API.

In this example we demonstrate how to download a single .csv file, as well as how to batch download multiple symbols and export each as a .csv.

First let's download a single .csv: AAPL 1 minute data using the Alphavantage API. You will need to insert your own API key.

[Image: Julia code for the single-file download]

And plot the output:

[Image: plot of the AAPL 1 minute data]

If we wish to download multiple .csv files and export each to .csv, we may do the following:

[Image: Julia code for batch downloading multiple .csv files]

Those of you using R may visit: Free Data: Alternative to Yahoo Finance

Full Julia code on my github

Speed Check! Julia Vs R Back Test Script

In a quest for speed enhancements over R, I opted to look at the Julia language. It is a high level programming language touting speed similar to C. I find the syntax not all that different from Python and R. If you know 'how' to solve many problems using those languages, the same logic applies to Julia; you only have to learn a slightly different but similar syntax.

I created a very simple back test script. The strategy is to stay long the ES mini when the close price is over the 200 period moving average applied to 30 minute bars (200 * 30 minutes is a 6000 minute moving average). Close the position when the ES crosses under the 6000 minute moving average.

I kept the functions and methods of calculation mostly similar between the two languages, following the steps below:

1. Load .csv data, 30 minute ES bars
2. Create a date and time column, convert to Date format
3. Create 200 Bar SMA
4. Create Long Signals
5. Lag the Long Signal forward +1 to avoid look ahead bias
6. For loop to calculate bar to bar returns
7. Subset data to remove the 200 missing values created by the SMA 200 calculation
8. Calculate strategy returns and buy and hold returns

I excluded any plotting processes as right now I am plotting within the IDE. For R I'm using RStudio, and for Julia I am using Atom / Juno.

Let's now get to the code showing the back test script for both R and Julia:

Load Packages R:

require(TTR)
require(lubridate)
require(dplyr)

Load Packages Julia:

using DataFrames
using Indicators

Load .txt Data R

df <- read.csv("C:/Users/Andrew.Bannerman/Desktop/Julia/30.min.es.txt", header=TRUE,stringsAsFactors = FALSE)

Load txt Data Julia:

df = readtable("30.min.es.txt", header=true)

Let's check to see how many rows the ES 30 minute data has:

julia> nrow(df)
223571

Next let's make a date time column and convert it to DateTime format in R:

# Make date time column
df$Date_Time <- paste(df$Date,df$Time)
df$Date_Time <- mdy_hm(df$Date_Time)

Make the date time column in Julia (I couldn't find a clean Julia equivalent of R's paste() function!) and convert to DateTime format:

a = df[:Date]
b = df[:Time]
c = map(join,zip(a,b), " ")
out = String[]
temp = String[]
for i in 1:length(a)
    temp = map(join,zip([a[i]],[b[i]]), " ")
    append!(out,temp)
end
df[:Date_Time] = out
df[:Date_Time] = DateTime.(df[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))

Next we can create the 200 SMA and calculate the long signal, first in R:

# Create Sma
df$sma_200 <- SMA(df$Close,200)
# Create long signal
df$Long_Signal <- ifelse(df$Close > df$sma_200,1,0)
df$Long_Signal <- dplyr::lag(df$Long_Signal,1) # lag forward avoid look ahead bias

And Julia:

# Create simple moving average
# using Indicators
Close = convert(Array, df[:Close])
sma_200 = sma(Close,n=200)
df[:Close_200sma] = sma_200
# Create Signals
# Stay long over 200sma
# Exit positions below 200sma
# use ifelse() function see - https://en.wikibooks.org/wiki/Introducing_Julia/Controlling_the_flow
# remember . in front of the (.>) for vectorization!
df[:Signal_Long] = ifelse(df[:Close] .> df[:Close_200sma],1,0)
# Lag data +1 forward
# Avoid look ahead bias
df[:Signal_Long] = [0; df[1:end-1,:Signal_Long]]

Next we can calculate close to close returns. We then multiply the returns by the strategy signal (1 or 0).

First R:

# For loop for returns
out <- vector()
for (i in 2:nrow(df)){
out[i] = df$Close[i]/df$Close[i-2+1] - 1.0
}
df <- cbind(df,out)
colnames(df)[12] = "Close_Ret"
# Calculate strategy Returns
df$Sig_Rets <- df$Long_Signal * df$Close_Ret
df[is.na(df)] <- 0

And same for Julia:

# Calculate Close to Close Returns
Close = df[:Close]
x = convert(Array, Close)
out = zeros(x)
for i in 2:size(Close,1)
    out[i] = Close[i]/Close[i-2+1] - 1.0
end
df[:Close_Rets] = out
# Calculate signal returns
df[:Signal_Rets] = df[:Signal_Long] .* df[:Close_Rets]

And finally we calculate cumulative returns:

First R:

# Calculate Cumulative Returns
# Buy and hold and Strategy returns
# Subset Data To start after SMA creation
df = df[201:nrow(df),]
df$Signal_cum_ret <- cumprod(1+df$Sig_Rets)-1
df$BH_cum_ret <- cumprod(1+df$Close_Ret)-1

And Julia:

# Calculate Cumulative Returns
df = df[201:end,:]
df[:Cum_Rets] = cumprod(1+df[1:end, :Signal_Rets])-1
df[:BH_Cum_Rets] = cumprod(1+df[1:end, :Close_Rets])-1

Next let's wrap the script in a for loop, run it 100 times and take the mean time (full code on my github).
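For reference, a minimal sketch of the timing harness on the R side might look like the following (illustrative only; run_backtest() is a hypothetical wrapper around the back test steps above, and the actual loop lives in the github code):

# Minimal sketch of the R timing loop (illustrative)
run_backtest <- function() Sys.sleep(0.01)        # placeholder body - substitute the real back test steps
out_results <- data.frame(Time = numeric(100))
for (j in 1:100) {
  t0 <- proc.time()
  run_backtest()
  out_results$Time[j] <- (proc.time() - t0)[3]    # elapsed wall clock seconds
}
mean(out_results$Time)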

The timing results for 100 iterations using R (first 10 shown):

out_results
Time
1 4.881509
2 4.550159
3 4.762161
4 4.847419
5 5.260049
6 4.715544
7 4.617849
8 4.642842
9 4.933652
10 4.660920

mean(out_results$Time)
[1] 4.582826

And the timing results for 100 iterations in Julia (times in milliseconds):

julia> final_out
100-element Array{Int64,1}:
 2321
 1974
 2123
    ⋮
 1943
 1933
 2083

julia> print(mean(final_out))
1957.93
julia> 1957.93/1000  # Convert milliseconds to seconds
1.9579300000000002

We see that on average Julia took 1.95 seconds to complete each back test iteration. The Julia script contained two for loops versus one for loop in R, so I didn't play to R's vectorized strengths in this regard. But on an almost exact code-for-code speed check, Julia comes out on top, beating R on average by 2.624896 seconds per script iteration.

After 100 iterations R total time for completion:

> sum(out_results$Time)
[1] 458.2826

or 7.6380433333 minutes.

And total Time for Julia:

julia> print(sum(final_out))
195793
julia> 195793 / 1000
195.793

or 3.263216667 minutes.

In this example, after running the back test script 100 times and taking the average and total completion times, we see Julia is 2.34 times faster than R.
It should be noted that each function is pretty standard to each language. I used Julia's DataFrames package rather than straight Arrays; using Arrays might be faster than working with data frames. We see no slow down at all using for loops in Julia. My hunch is that removing the for loop in R would get its time closer to Julia's, but I'm too lazy to check this 🙂 (ok, I'm not: if we play to the vectorized theme of R and remove the slow for loop for calculating returns, replacing it with data.table:

require(data.table)
df = data.table(df)
df[, Close_Ret := (Close / shift(Close))-1]

Speed improves, with a single script run taking:

Time difference of 2.614989 secs
)

This is my first Julia script, so if you spot anywhere I can make the code more efficient, drop me a line.

A similar package to TTR for financial indicators is Julia's Indicators package.

I like working with RStudio, and a similar IDE for Julia is Juno / Atom.

[Image: Atom / Juno IDE screenshot]

Finally:

Here are the back test results from R / Julia:

plot(df$Signal_cum_ret,type="l",main="R 200SMA Back Test Result")

[Image: R 200SMA back test result plot]

# Plot
using StatPlots
gr(size=(1500 ,1000))
@df df plot(:Date_Time, [:Cum_Rets :BH_Cum_Rets], title = "SPY Long Over 200sma", xlab = "Date", ylab = "Cumulative Returns",colour = [:lightgreen :pink],legend = :topleft)
savefig("myplot.png")

[Image: Julia back test result plot (myplot.png)]

R Code = https://gist.github.com/flare9x/2d73e73218967699c035d6d70fa4ae8a
Julia Code = https://gist.github.com/flare9x/7d1d41856ffbe3106983d15885d8a0cc

R – Quantifying Trend Days

In this post I will be using R and data.table to extract all trend up / down days. The following method was used to quantify a trend day:

1. Trend up = close price closes within 25% of the day's high
2. Trend down = close price closes within 25% of the day's low
3. Exclude gaps: if the open is above yesterday's high or below yesterday's low, exclude the day
4. Daily return must be above / below +.75% / -.75%

Other methods come to mind:
1. Select those stocks that close within 25% of the low / high AND where the daily gain / loss is above or below 1%.
2. Linear regression, low r2 (a sketch of this idea follows below); see blog post here: http://quantgorilla.com/blog/quantifying-trend-days/
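To give a flavor of the regression idea, here is a minimal hedged sketch (illustrative only; it assumes the 1 minute ES bars, es_1, loaded later in this post, and leaves the r2 cutoff to the reader):

# Illustrative sketch: per-day linear regression of 1 minute closes on bar number.
# The resulting r-squared per day can then be thresholded to flag trend days.
require(data.table)
es_1 <- data.table(es_1)  # 1 minute ES bars with Date and Close columns
r2_by_day <- es_1[, .(r2 = { idx <- seq_len(.N); summary(lm(Close ~ idx))$r.squared }), by = Date]
head(r2_by_day)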

My shell of a code can be altered to include these methods, but for now let's stick to closing within the top / bottom 25% of the day's range, excluding gap ups / downs. The code to do this (check the github code at the bottom of the page; WP is having a hard time displaying the code snippet correctly):

# Es Up / Down Trend Isolation
# 2.26.2018
# Andrew Bannerman 

# Load Daily ES Data
require(lubridate)  # mdy() date parsing
require(dplyr)      # lag()
es_d <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.day.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_d$Date <- mdy(es_d$Date)  # Change to date format 

# Find relative position of the closing price  (Close - Low) / (High - Low)
es_d$pos <- (es_d$Close - es_d$Low) / (es_d$High - es_d$Low)

# Calculate returns
es_d$rets <- ((es_d$Close / lag(es_d$Close,1)) - 1) * 100

# Find gap up / down days 
es_d$gap_up <- ifelse(es_d$Open > lag(es_d$High,1),1,0)
es_d$gap_dn <- ifelse(es_d$Open < lag(es_d$Low,1),1,0)

# Find trend up / down days 
es_d$trend_up <- ifelse(es_d$pos >= .75,1,0)
es_d$trend_dn <- ifelse(es_d$pos <= .25,1,0)

# Subset all trend up / down days 
# Trend day definition: close within 25 % of day high / low and close  over .75% to exclude quiet days 
# Exclude gap up / down days where open closes over high , low
trend_up <- es_d[es_d$trend_up == 1 & es_d$rets >= .75 & es_d$gap_up == 0, ]
trend_dn <- es_d[es_d$trend_dn == 1 & es_d$rets <= -.75 & es_d$gap_dn == 0, ]

# Count total trend up / down days
total_up <- nrow(trend_up)
total_dn <- nrow(trend_dn)

# Percentage trend days of sample
total <- nrow(es_d)
perc.up <- total_up / total
perc.dn <- total_dn / total

# Save Dates in order to use for subsetting the 1 minute bars
trend_up_dates <- trend_up$Date
trend_dn_dates <- trend_dn$Date

There are a total of 5167 days in the sample. Of those days, approx 11% are up trend days and 10% are down trend days. Next we may extract each trend day and save the 1 minute intraday plot:

# Load 1 minute bars
require(xts)       # xts time series objects
require(quantmod)  # chartSeries()
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new <- mdy_hm(es_1$Date_new) # Change date to date format
es_1$Date <- mdy(es_1$Date)
# Save up trend plots
# initialize list
t_UP <- list()
i=1
for (i in 1:length(trend_up_dates)) {
tryCatch({
ptm0 <- proc.time()
temp <- subset(es_1 , Date == trend_up_dates[i]) # Conditionl subset == grab trend day on intraday level
temp$range <- temp$High - temp$Low
t_UP[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
temp <- temp[3:10]
head(temp)
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
head(df)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save up trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", bar.type = "ohlc", theme = chartTheme("white"), name = paste("es_mini -- Trend Up Day"))
dev.off()
ptm1 = proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Up Stats
trend_up_out <- do.call(rbind,t_UP)
head(trend_up_out)
t_up_mean_r <- mean(trend_up_out$range)

# Save intraday down trend plots
# initialize list
t_DN <- list()
i=1
for (i in 1:length(trend_dn_dates)) {
tryCatch({
ptm0 <- proc.time()
Sys.sleep(0.1)
temp <- subset(es_1 , Date == trend_dn_dates[i])
temp$range <- temp$High - temp$Low
t_DN[[i]] <- temp
name_date <- as.numeric(gsub("-", "", head(temp$Date,1))) # replace - with ""
temp$chg <- ifelse(temp$Close > temp$Open, "up", "dn")
temp <- temp[3:10]
colnames(temp)[7] <- "Date"
df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M"))
all_ts <- cbind(open,high,low,close)
names <- c("Open","High","Low","Close")
colnames(all_ts) <- names

# Save down trend plots
dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_down_plots"
mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
png(file=mypath,width= 1400, height=1000)
mytitle = paste(name_date[1])
chartSeries(all_ts, type = "bars", bar.type = "ohlc", theme = chartTheme("white"), name = paste("es_mini -- Trend Down Day"))
dev.off()
ptm1 = proc.time() - ptm0
time=as.numeric(ptm1[3])
cat('\n','Iteration',i,'took', time, "seconds to complete")
}, error = function(e) { print(paste("i =", i, "failed:")) })
}

# General Trend Down Stats
trend_dn_out <- do.call(rbind,t_DN)
t_dn_mean_r <- mean(trend_dn_out$range)

With example output:

[Image: example intraday trend day plot - 20180126]

[Image: example intraday trend day plot - 20020611]

Next we can plot daily bars, 5 days prior to each trend day:

##############################################################################
# Extract 5 days prior to trend up / down day
##############################################################################
dates_df <- data.frame(Date = es_d$Date)
dates_df$wkdays <- weekdays(as.Date(dates_df$Date)) # extract day of week

# Grab trend up start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_up_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# up trend subsetting
up_list <- list()
for (i in 1:length(trend_up_dates)) {
  up_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_up_dates[i],]
}

# Plot 5 days prior to uptrend
t_UP_Prior <- list()
i=1
for (i in 1:length(up_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- up_list[[i]] # grab the daily bars for the 5 days prior to and including the trend day
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save up trend plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_up_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", bar.type = "ohlc", theme = chartTheme("white"), name = paste("es_mini -- Trend Up Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

# Grab trend down start dates
es_d_prior_m <- ifelse(dates_df$wkdays == "Monday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_tue <- ifelse(dates_df$wkdays == "Tuesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_w <- ifelse(dates_df$wkdays == "Wednesday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_th <- ifelse(dates_df$wkdays == "Thursday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 7),NA)
es_d_prior_f <- ifelse(dates_df$wkdays == "Friday" & es_d$Date %in% trend_dn_dates, paste(es_d$Date - 5),NA)
# Remove NA
es_d_prior_m <- es_d_prior_m[!is.na(es_d_prior_m)]
es_d_prior_tue <- es_d_prior_tue[!is.na(es_d_prior_tue)]
es_d_prior_w <- es_d_prior_w[!is.na(es_d_prior_w)]
es_d_prior_th <- es_d_prior_th[!is.na(es_d_prior_th)]
es_d_prior_f <- es_d_prior_f[!is.na(es_d_prior_f)]
t_up_all_prior <- c(es_d_prior_m,es_d_prior_tue,es_d_prior_w,es_d_prior_th,es_d_prior_f)
# sort dates
t_up_all_prior <- sort(t_up_all_prior)
t_up_all_prior <- ymd(t_up_all_prior)

# down trend subsetting
dn_list <- list()
for (i in 1:length(trend_dn_dates)) {
  dn_list[[i]] <- es_d[es_d$Date >= t_up_all_prior[i] & es_d$Date <= trend_dn_dates[i],]
}

# Plot 5 days prior to down trend
i=1
for (i in 1:length(dn_list)) {
  tryCatch({
    ptm0 <- proc.time()
    temp <- dn_list[[i]] # grab the daily bars for the 5 days prior to and including the trend day
    name_date <- as.numeric(gsub("-", "", head(temp$Date,1)))
    df <- data.frame("Open" = temp$Open, "High" = temp$High, "Low" = temp$Low,"Close" = temp$Close,"Date" = temp$Date)
    open = xts(df$Open, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    high = xts(df$High, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    low = xts(df$Low, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    close = xts(df$Close, order.by=as.POSIXct(df$Date, format="%Y-%m-%d %H:%M:%S"))
    all_ts <- cbind(open,high,low,close)
    names <- c("Open","High","Low","Close")
    colnames(all_ts) <- names

    # Save down trend prior plots
    dir <- "C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/Trend Days/trend_dn_prior"
    mypath <- file.path(dir,paste(name_date[1], ".png", sep = ""))
    png(file=mypath,width= 1400, height=1000)
    mytitle = paste(name_date[1])
    chartSeries(all_ts, type = "bars", bar.type = "ohlc", theme = chartTheme("white"), name = paste("es_mini -- Trend Down Day"))
    dev.off()
    ptm1=proc.time() - ptm0
    time=as.numeric(ptm1[3])
    cat('\n','Iteration',i,'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:")) })
}

With output:

[Image: daily bars 5 days prior to a trend day - 19990204]

[Image: daily bars 5 days prior to a trend day - 20110914]

There are two ways to attempt to predict a trend day with a certain degree of probability:

1. Use patterns of daily bars
2. Use intraday patterns within the trend day

The daily bar plots above allow us to view the daily bars prior to a trend up / down day. This code can be extended to look at specific features, which I will leave for the reader to work on; these may include (a rough sketch follows the list below):

1. Prior to the trend day, was the previous day down or up?
2. Where did the price close relative to the day's range ((close - low) / (high - low))?
3. What did the 3 day pattern look like prior to a trend day?
– 3 tight closes?
– middle bar low and close lower than the bars on each side?

etc etc…
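As a rough sketch of how such prior-day features might be tabulated (illustrative only; it reuses the es_d and trend_up_dates objects created above, and the feature definitions here are my own):

# Sketch: simple prior-day features for each up trend day (illustrative)
require(dplyr)
es_d$prev_up  <- ifelse(dplyr::lag(es_d$Close,1) > dplyr::lag(es_d$Open,1), 1, 0)  # was the previous day an up day?
es_d$prev_pos <- dplyr::lag((es_d$Close - es_d$Low) / (es_d$High - es_d$Low), 1)   # previous day's close position in its range
prior_up_features <- es_d[es_d$Date %in% trend_up_dates, c("Date","prev_up","prev_pos")]
head(prior_up_features)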

Now we move on to an attempt to quantify a trend day within the trend day itself. This involves a simple count of the number of new 1 minute highs made before a certain time of day. In this example I count the number of new 1 minute highs or lows made prior to 10am Central time.

The code that achieves this:

##########################################################################################
# Intraday Trend up / down
##########################################################################################
es_1 <- read.csv("C:/R Projects/Final Scripts/2018_new_scripts_saved/Final Scripts/ES/ES Data/1.min.reg.sess.es.txt",header=TRUE,stringsAsFactors = FALSE)
es_1$Date_new <- paste(es_1$Date,es_1$Time)
es_1$Date_new  <- mdy_hm(es_1$Date_new)  # Change date to date format
es_1$Date <- mdy(es_1$Date)
# make data.table for speed
require(data.table)
es_1 <- data.table(es_1)
es_1[, roll_max := cummax(High), by = Date] # rolling day high
es_1[, close_above_high := ifelse(Close > shift(roll_max), 1,0), by = Date] # place 1 when close makes new day high
# rolling day low
es_1[, roll_min := cummin(Low), by = Date] # rolling day low
es_1[, close_below_low := ifelse(Close < shift(roll_min), 1,0), by = Date] # place 1 when close makes new day low
# count the new highs / lows made between 08:31 and 10am Central for each day
es_1[, before_10_UCT_up := sum(close_above_high[Time >= "08:31" & Time <= "09:59"], na.rm = TRUE), by = Date]
es_1[, before_10_UCT_dn := sum(close_below_low[Time >= "08:31" & Time <= "09:59"], na.rm = TRUE), by = Date]
# one row per day for plotting the count distributions
plot.df <- es_1[, .(count_max = max(before_10_UCT_up), count_min = max(before_10_UCT_dn)), by = Date]
plot.df <- subset(plot.df, count_max > 0 | count_min > 0)
# plot distributions
require(ggplot2)
ggplot(data=plot.df, aes(plot.df$count_max)) +
  geom_histogram(binwidth = .5,fill="darkgreen",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(min(plot.df$count_max), max(plot.df$count_max)),name="No. New Highs Prior 10am Central")+
  scale_y_continuous(breaks = seq(0, 600000,100000))+
  ggtitle("Number of new 1 minute highs prior to 10am Central")+
  ylab("Total New Highs")

# Plot new lows
ggplot(data=plot.df, aes(plot.df$count_min)) +
  geom_histogram(binwidth = .5,fill="red",alpha = 0.5,col="grey") +
  scale_x_continuous(breaks = seq(min(plot.df$count_min), max(plot.df$count_min)),name="No. New Lows Prior 10am Central")+
  #scale_y_continuous(breaks = seq(0, 35500,2000))+
  ggtitle("Number of new 1 minute lows prior to 10am Central")+
  ylab("Total New Low")
graphics.off()

We may plot the distribution of new highs / lows prior to 10am central time:

[Image: distribution of new 1 minute highs prior to 10am Central]

[Image: distribution of new 1 minute lows prior to 10am Central]

We will back test the less frequent new high / new low counts prior to 10am Central.

Rules of back test:

1. Go long when the number of new highs prior to 10am Central is >= 8
2. Go short when the number of new lows prior to 10am Central is >= 8
3. Exit at the end of the trading day

no stops, no slippage and commissions.

The code to do this:

# Back test
# Trading Signal long and short
es_1[, sig_buy := ifelse(before_10_UCT_up >= 8, 1,0)] # long signal
es_1[, sig_short := ifelse(before_10_UCT_dn >= 8, -1,0)] # short signal
es_1[, sig_exit := Time < "15:15"] # long exit time
es_1[, sig_s_exit := Time < "15:15"] # short exit time
#es_1[is.na(es_1)] <- 0
es_1[, sig_end_l := ifelse(sig_exit == FALSE,0,NA)] # mark end of trade retain NA
es_1[, sig_end_s := ifelse(sig_s_exit == FALSE,0,NA)] # mark end of trade retain NA
# Combine NA + signal for long trades
es_1$z_l <- ifelse(es_1$sig_buy == 1,
               rowSums(es_1[, c("sig_buy", "sig_end_l")], na.rm=TRUE),
               rowSums(es_1[, c("sig_buy", "sig_end_l")]))
es_1$z_l[1] <- 0
# Combine NA + signal for short trades
es_1$z_s <- ifelse(es_1$sig_short == -1,
                   rowSums(es_1[, c("sig_short", "sig_end_s")], na.rm=TRUE),
                   rowSums(es_1[, c("sig_short", "sig_end_s")]))
es_1$z_s[1] <- 0
# Forward fill 1's to end of the trade using package zoo, function na.locf
es_1[, final.signal_long := na.locf(z_l)] # long trades
es_1[, final.signal_short := na.locf(z_s)] # short trades
# lag signal by one forward day to signal entry next day (Trade at market open)
es_1$final.signal_long <- lag(es_1$final.signal_long,1) # Note k=1 implies a move *forward*
es_1$final.signal_short <- lag(es_1$final.signal_short,1) # Note k=1 implies a move *forward*
# Close to close returns
es_1[, one_min_rets := (Close / shift(Close)) - 1]
# Calculate strategy returns
es_1[, sig_long_rets := final.signal_long * one_min_rets]
es_1[, sig_short_rets := final.signal_short * one_min_rets]
es_1$sig_long_rets[1] <- 0
es_1$sig_short_rets[1] <- 0
# Combine equity curves
es_1[, combined_rets := sig_long_rets + sig_short_rets]
# Cumulative returns
es_1[, cum_rets_l := cumprod(1 + sig_long_rets) - 1]
es_1[, cum_rets_s := cumprod(1 + sig_short_rets) - 1]
es_1[, cum_rets_comb := cumprod(1 + combined_rets) - 1]
plot(es_1$Date_new,es_1$cum_rets_l,type="l")
graphics.off()
# Plot equity curves
line_plot_df <- data.frame(es_1$cum_rets_l,es_1$cum_rets_s,es_1$cum_rets_comb)
line.plot.df <- melt(line_plot_df)
date <- rep(es_1$Date,3)
line.plot.df <- cbind(line.plot.df,date)
head(line.plot.df)
ggplot(data=line.plot.df, aes(x=date, y=value, group = variable, colour = variable)) +
  geom_line()

With the final equity curves:

[Image: final equity curves - long, short and combined]

No optimization for times, number of new highs etc… I arbitrarily chose each.

To conclude, predicting trend days both in advance and within the trend day itself is an interesting idea. This is a very simple example which may be extended.

Full code can be found on my github:

Thanks for reading!

Stress Testing an Intraday Strategy Through Monte Carlo Methods

This is an intraday ES strategy that I was testing for a client. The client was not interested in it due to the low frequency of trades, hence I am able to post it for others to view. It shows how a strategy is proved out through stress testing and through looking for the optimal conditions in which to apply the strategy. The steps for strategy development are below:

Strategy Development Example – Andrew Bannerman – 1.28.2018
Bannerman1985@gmail.com

Introduction:
Follow a scientific process in which a strategy is developed and stress tested before live incubation /
trading.

Tools used: R Statistical Programming Language

Strategy Development Procedure:
1. General view, general statistics, tail risk
2. Initial Back Testing
3. Walk Forward Analysis (Cross validation), Rolling and Fixed.
4. Parameter Sensitivity Analysis (Random adjustments 0 to 50% parameter change, n times)
5. Draw down expectation (sample without replacement net trade result, n times - see the sketch after this list)
6. Re-sample original time series (Maximum Entropy Bootstrapping, n times) and run strategy over new
series.
7. Market Random Test (Test if strategy beats buying randomly)
8. Noise Test (Adding random fixed % to open, high, low, close, simulate back test n times)
9. Strategy Seasonality
10. Layering non-correlated strategies
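To give a flavor of step 5, here is a minimal hedged sketch of a draw down expectation simulation in R (the trade_pnl vector below is random stand-in data, not the strategy's actual trade results):

# Sketch of step 5: reshuffle net trade results n times and look at the distribution of max draw downs
set.seed(1)
trade_pnl <- rnorm(200, mean = 25, sd = 250)        # stand-in for a strategy's net $ result per trade
max_dd <- function(pnl) {
  equity <- cumsum(pnl)
  max(cummax(equity) - equity)                      # largest peak-to-trough $ draw down
}
sims <- replicate(5000, max_dd(sample(trade_pnl)))  # sample without replacement = reshuffle trade order
quantile(sims, c(0.50, 0.95, 0.99))                 # draw down expectation at several confidence levels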

Please see attached .pdf.

Download – Intraday Trading Strategy Development Using R – pdf

Let me know if you have any questions!

Model Fitting – Using Autocorrelation to Detect The Nature of a Time Series

I believe a solid approach in developing trading strategies is to fit the correct model to the underlying. In a previous post we studied the Hurst exponent and its ability to detect whether a series was mean reverting, momentum or a random walk. In this post we will look at a 2 lag autocorrelation on the S&P500. The long and short of it is: if there is a high degree of positive correlation from one point in time to a previous point in time, then we have momentum. Conversely, if there is negative correlation between one point in time and a previous point in time, we can say we have mean reversion.

In order to test this we use the acf R function. We run this over daily S&P500 log returns using rollapply on a 756 bar rolling window (252 trading days * 3 = 3 trading years). Thus each data point is the correlation coefficient of the previous 3 year rolling window.

The code that achieves this:

# Rolling auto correlation 

# Required Packages
require(ggplot2)
require(TTR)
require(lubridate) # ymd()
require(zoo)       # rollapply()

# SP500 
SP500 <- read.csv("C:/Stock Market Analysis/Norgate Market Data/MASTER_DATA_DUMP/$SPX.csv", header=TRUE,stringsAsFactors = FALSE)
SP500$Date <- ymd(SP500$Date)
SP500$log.close <- log(SP500$Close)

# rets 
SP500$rets <- ROC(SP500$log.close, type = c("discrete"))
SP500$rets[1] <-0

# Rolling acf width 
acf.width <- 756 # bars; 756 bars = 3 years (252 * 3), 1008 bars = 4 years (252 * 4)

# Dates 
SP500dates <- SP500[acf.width:nrow(SP500),]
dates.df <- data.frame(Date=SP500dates$Date)
str(dates.df)
head(dates.df )

# Rolling auto correlation 
result <- rollapply(SP500$rets, width = acf.width, FUN=acf, 
                    lag.max = 30,type = "correlation",plot = FALSE)

# Extract different correlations 
cor2 <- lapply(result[,1], function(x) x[2, drop=FALSE])
cor3 <- lapply(result[,1], function(x) x[3, drop=FALSE])
cor4 <- lapply(result[,1], function(x) x[4, drop=FALSE])
cor5 <- lapply(result[,1], function(x) x[5, drop=FALSE])
cor6 <- lapply(result[,1], function(x) x[6, drop=FALSE])
cor7 <- lapply(result[,1], function(x) x[7, drop=FALSE])
cor8 <- lapply(result[,1], function(x) x[8, drop=FALSE])
cor9 <- lapply(result[,1], function(x) x[9, drop=FALSE])
cor10 <- lapply(result[,1], function(x) x[10, drop=FALSE])
cor11 <- lapply(result[,1], function(x) x[11, drop=FALSE])
cor12 <- lapply(result[,1], function(x) x[12, drop=FALSE])
cor13 <- lapply(result[,1], function(x) x[13, drop=FALSE])
cor14 <- lapply(result[,1], function(x) x[14, drop=FALSE])
cor15 <- lapply(result[,1], function(x) x[15, drop=FALSE])
cor16 <- lapply(result[,1], function(x) x[16, drop=FALSE])
cor17 <- lapply(result[,1], function(x) x[17, drop=FALSE])
cor18 <- lapply(result[,1], function(x) x[18, drop=FALSE])
cor19 <- lapply(result[,1], function(x) x[19, drop=FALSE])
cor20 <- lapply(result[,1], function(x) x[20, drop=FALSE])
cor21 <- lapply(result[,1], function(x) x[21, drop=FALSE])
cor22 <- lapply(result[,1], function(x) x[22, drop=FALSE])
cor23 <- lapply(result[,1], function(x) x[23, drop=FALSE])
cor24 <- lapply(result[,1], function(x) x[24, drop=FALSE])
cor25 <- lapply(result[,1], function(x) x[25, drop=FALSE])
cor26 <- lapply(result[,1], function(x) x[26, drop=FALSE])
cor27 <- lapply(result[,1], function(x) x[27, drop=FALSE])
cor28 <- lapply(result[,1], function(x) x[28, drop=FALSE])
cor29 <- lapply(result[,1], function(x) x[29, drop=FALSE])
cor30 <- lapply(result[,1], function(x) x[30, drop=FALSE])

# cbind outputs
cor2.df <- as.data.frame(do.call(rbind, cor2))
cor2.df <- data.frame(Date=dates.df$Date,cor2.df)
cor3.df <- as.data.frame(do.call(rbind, cor3))
cor3.df <- data.frame(Date=dates.df$Date,cor3.df)
cor4.df <- as.data.frame(do.call(rbind, cor4))
cor4.df <- data.frame(Date=dates.df$Date,cor4.df)
cor5.df <- as.data.frame(do.call(rbind, cor5))
cor5.df <- data.frame(Date=dates.df$Date,cor5.df)
cor6.df <- as.data.frame(do.call(rbind, cor6))
cor6.df <- data.frame(Date=dates.df$Date,cor6.df)
cor7.df <- as.data.frame(do.call(rbind, cor7))
cor7.df <- data.frame(Date=dates.df$Date,cor7.df)
cor8.df <- as.data.frame(do.call(rbind, cor8))
cor8.df <- data.frame(Date=dates.df$Date,cor8.df)
cor9.df <- as.data.frame(do.call(rbind, cor9))
cor9.df <- data.frame(Date=dates.df$Date,cor9.df)
cor10.df <- as.data.frame(do.call(rbind, cor10))
cor10.df <- data.frame(Date=dates.df$Date,cor10.df)
cor11.df <- as.data.frame(do.call(rbind, cor11))
cor11.df <- data.frame(Date=dates.df$Date,cor11.df)
cor12.df <- as.data.frame(do.call(rbind, cor12))
cor12.df <- data.frame(Date=dates.df$Date,cor12.df)
cor13.df <- as.data.frame(do.call(rbind, cor13))
cor13.df <- data.frame(Date=dates.df$Date,cor13.df)
cor14.df <- as.data.frame(do.call(rbind, cor14))
cor14.df <- data.frame(Date=dates.df$Date,cor14.df)
cor15.df <- as.data.frame(do.call(rbind, cor15))
cor15.df <- data.frame(Date=dates.df$Date,cor15.df)
cor16.df <- as.data.frame(do.call(rbind, cor16))
cor16.df <- data.frame(Date=dates.df$Date,cor16.df)
cor17.df <- as.data.frame(do.call(rbind, cor17))
cor17.df <- data.frame(Date=dates.df$Date,cor17.df)
cor18.df <- as.data.frame(do.call(rbind, cor18))
cor18.df <- data.frame(Date=dates.df$Date,cor18.df)
cor19.df <- as.data.frame(do.call(rbind, cor19))
cor19.df <- data.frame(Date=dates.df$Date,cor19.df)
cor20.df <- as.data.frame(do.call(rbind, cor20))
cor20.df <- data.frame(Date=dates.df$Date,cor20.df)
cor21.df <- as.data.frame(do.call(rbind, cor21))
cor21.df <- data.frame(Date=dates.df$Date,cor21.df)
cor22.df <- as.data.frame(do.call(rbind, cor22))
cor22.df <- data.frame(Date=dates.df$Date,cor22.df)
cor23.df <- as.data.frame(do.call(rbind, cor23))
cor23.df <- data.frame(Date=dates.df$Date,cor23.df)
cor24.df <- as.data.frame(do.call(rbind, cor24))
cor24.df <- data.frame(Date=dates.df$Date,cor24.df)
cor25.df <- as.data.frame(do.call(rbind, cor25))
cor25.df <- data.frame(Date=dates.df$Date,cor25.df)
cor26.df <- as.data.frame(do.call(rbind, cor26))
cor26.df <- data.frame(Date=dates.df$Date,cor26.df)
cor27.df <- as.data.frame(do.call(rbind, cor27))
cor27.df <- data.frame(Date=dates.df$Date,cor27.df)
cor28.df <- as.data.frame(do.call(rbind, cor28))
cor28.df <- data.frame(Date=dates.df$Date,cor28.df)
cor29.df <- as.data.frame(do.call(rbind, cor29))
cor29.df <- data.frame(Date=dates.df$Date,cor29.df)
cor30.df <- as.data.frame(do.call(rbind, cor30))
cor30.df <- data.frame(Date=dates.df$Date,cor30.df)

# smooth 
#cor2.df$sma <- SMA(cor2.df$V1,100)

# Plot ACF 
ggplot(cor2.df,aes(Date,V1))+geom_line(aes(color="Lag 2"))+
    geom_line(data=cor3.df,aes(color="Lag 3"))+
  geom_line(data=cor4.df,aes(color="Lag 4"))+
  geom_line(data=cor5.df,aes(color="Lag 5"))+
  geom_line(data=cor6.df,aes(color="Lag 6"))+
  geom_line(data=cor7.df,aes(color="Lag 7"))+
  geom_line(data=cor8.df,aes(color="Lag 8"))+
  geom_line(data=cor9.df,aes(color="Lag 9"))+
  geom_line(data=cor10.df,aes(color="Lag 10"))+
  geom_line(data=cor11.df,aes(color="Lag 11"))+
  geom_line(data=cor12.df,aes(color="Lag 12"))+
  geom_line(data=cor13.df,aes(color="Lag 13"))+
  geom_line(data=cor14.df,aes(color="Lag 14"))+
  geom_line(data=cor15.df,aes(color="Lag 15"))+
  geom_line(data=cor16.df,aes(color="Lag 16"))+
  geom_line(data=cor17.df,aes(color="Lag 17"))+
  geom_line(data=cor18.df,aes(color="Lag 18"))+
  geom_line(data=cor19.df,aes(color="Lag 19"))+
  geom_line(data=cor20.df,aes(color="Lag 20"))+
  geom_line(data=cor21.df,aes(color="Lag 21"))+
  geom_line(data=cor22.df,aes(color="Lag 22"))+
  geom_line(data=cor23.df,aes(color="Lag 23"))+
  geom_line(data=cor24.df,aes(color="Lag 24"))+
  geom_line(data=cor25.df,aes(color="Lag 25"))+
  geom_line(data=cor26.df,aes(color="Lag 26"))+
  geom_line(data=cor27.df,aes(color="Lag 27"))+
  geom_line(data=cor28.df,aes(color="Lag 28"))+
  geom_line(data=cor29.df,aes(color="Lag 29"))+
  geom_line(data=cor30.df,aes(color="Lag 30"))+
labs(color="Legend text")


# Plot with shade
ggplot(cor2.df,aes(Date,V1))+geom_line(aes(colour="V1"))+
  geom_area(aes(y = V1, fill="V1"))+
  ggtitle("Rolling Auto Correlation, 2 lag, rolling window = 3 years", subtitle = "S&P500 Log Returns") +
  labs(x="Date",y="Correlation")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(legend.position="none")+
  scale_y_continuous(breaks = seq(-.5, .50, by = 0.05))

# Write 
write.csv(cor2.df,"C:/R Projects/lag2.autocorr.csv")

Plot the 2 lag autocorrelation:

[Image: rolling 2 lag autocorrelation of S&P500 log returns]

A few things are obvious… but first… let's back test a simple strategy for buying high and selling higher (momentum) and buying low and selling higher (mean reversion). We simply want to test what happens if we execute the above.

The rules:
1. Take a rolling z-score of S&P500 close prices
2. Momentum buy and exit: buy when the rolling z-score is over 0 and sell when it is below 0.
3. Mean reversion buy and exit: buy when the rolling z-score is below 0 and sell when it is above 0.

The code that achieves this:

# Mean Reversion & Momentum S&P500 
# Andrew Bannerman 1.17.2018

######################################
# Required Packages 
#####################################
require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)
require(gridExtra)

## Load Data / Format handling
SP500 <- read.csv("C:/Stock Market Analysis/Norgate Market Data/MASTER_DATA_DUMP/$SPX.csv", header=TRUE,stringsAsFactors = FALSE)
SP500$Date <- ymd(SP500$Date)

# Use TTR package to create rolling SMA n day moving average 
# Create function and loop in order to repeat the desired number of SMAs for example 2:30
getSMA <- function(numdays) {
  function(SP500) {
    SMA(SP500[,"Close"], numdays)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sma.matrix <- matrix(nrow=nrow(SP500), ncol=0)

# Loop for filling it
for (i in 2:60) {
  sma.matrix <- cbind(sma.matrix, getSMA(i)(SP500))
}

# Rename columns
colnames(sma.matrix) <- sapply(2:60, function(n)paste("close.sma.n", n, sep=""))

# Bind to existing dataframe
SP500 <-  cbind(SP500, sma.matrix)

# Use TTR package to create rolling Standard Deviation
# Create function and loop in order to repeat the desired number of Stdev for example 2:30
getSD <- function(numdays) {
  function(SP500) {
    runSD(SP500$Close, numdays, cumulative = FALSE)    # Calls TTR package to create SMA
  }
}
# Create a matrix to put the SMAs in
sd.matrix <- matrix(nrow=nrow(SP500), ncol=0)

# Loop for filling it
for (i in 2:60) {
  sd.matrix <- cbind(sd.matrix, getSD(i)(SP500))
}

# Rename columns
colnames(sd.matrix) <- sapply(2:60, function(n)paste("close.sd.n", n, sep=""))

# Bind to existing dataframe
SP500 <-  cbind(SP500, sd.matrix)


# Use base R to work out the rolling z-score (Close - roll mean) / stdev
SP500$close.zscore.n2 <- apply(SP500[,c('Close','close.sma.n2', 'close.sd.n2')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n3 <- apply(SP500[,c('Close','close.sma.n3', 'close.sd.n3')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n4 <- apply(SP500[,c('Close','close.sma.n4', 'close.sd.n4')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n5 <- apply(SP500[,c('Close','close.sma.n5', 'close.sd.n5')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n6 <- apply(SP500[,c('Close','close.sma.n6', 'close.sd.n6')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n7 <- apply(SP500[,c('Close','close.sma.n7', 'close.sd.n7')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n8 <- apply(SP500[,c('Close','close.sma.n8', 'close.sd.n8')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n9 <- apply(SP500[,c('Close','close.sma.n9', 'close.sd.n9')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n10 <- apply(SP500[,c('Close','close.sma.n10', 'close.sd.n10')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n11 <- apply(SP500[,c('Close','close.sma.n11', 'close.sd.n11')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n12 <- apply(SP500[,c('Close','close.sma.n12', 'close.sd.n12')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n13 <- apply(SP500[,c('Close','close.sma.n13', 'close.sd.n13')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n14 <- apply(SP500[,c('Close','close.sma.n14', 'close.sd.n14')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n15 <- apply(SP500[,c('Close','close.sma.n15', 'close.sd.n15')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n16 <- apply(SP500[,c('Close','close.sma.n16', 'close.sd.n16')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n17 <- apply(SP500[,c('Close','close.sma.n17', 'close.sd.n17')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n18 <- apply(SP500[,c('Close','close.sma.n18', 'close.sd.n18')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n19 <- apply(SP500[,c('Close','close.sma.n19', 'close.sd.n19')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n20 <- apply(SP500[,c('Close','close.sma.n20', 'close.sd.n20')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n21 <- apply(SP500[,c('Close','close.sma.n21', 'close.sd.n21')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n22 <- apply(SP500[,c('Close','close.sma.n22', 'close.sd.n22')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n23 <- apply(SP500[,c('Close','close.sma.n23', 'close.sd.n23')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n24 <- apply(SP500[,c('Close','close.sma.n24', 'close.sd.n24')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n25 <- apply(SP500[,c('Close','close.sma.n25', 'close.sd.n25')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n26 <- apply(SP500[,c('Close','close.sma.n26', 'close.sd.n26')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n27 <- apply(SP500[,c('Close','close.sma.n27', 'close.sd.n27')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n28 <- apply(SP500[,c('Close','close.sma.n28', 'close.sd.n28')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n29 <- apply(SP500[,c('Close','close.sma.n29', 'close.sd.n29')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n30 <- apply(SP500[,c('Close','close.sma.n30', 'close.sd.n30')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n31 <- apply(SP500[,c('Close','close.sma.n31', 'close.sd.n31')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n32 <- apply(SP500[,c('Close','close.sma.n32', 'close.sd.n32')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n33 <- apply(SP500[,c('Close','close.sma.n33', 'close.sd.n33')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n34 <- apply(SP500[,c('Close','close.sma.n34', 'close.sd.n34')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n35 <- apply(SP500[,c('Close','close.sma.n35', 'close.sd.n35')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n36 <- apply(SP500[,c('Close','close.sma.n36', 'close.sd.n36')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n37 <- apply(SP500[,c('Close','close.sma.n37', 'close.sd.n37')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n38 <- apply(SP500[,c('Close','close.sma.n38', 'close.sd.n38')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n39 <- apply(SP500[,c('Close','close.sma.n39', 'close.sd.n39')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n40 <- apply(SP500[,c('Close','close.sma.n40', 'close.sd.n40')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n41 <- apply(SP500[,c('Close','close.sma.n41', 'close.sd.n41')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n42 <- apply(SP500[,c('Close','close.sma.n42', 'close.sd.n42')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n43 <- apply(SP500[,c('Close','close.sma.n43', 'close.sd.n43')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n44 <- apply(SP500[,c('Close','close.sma.n44', 'close.sd.n44')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n45 <- apply(SP500[,c('Close','close.sma.n45', 'close.sd.n45')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n46 <- apply(SP500[,c('Close','close.sma.n46', 'close.sd.n46')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n47 <- apply(SP500[,c('Close','close.sma.n47', 'close.sd.n47')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n48 <- apply(SP500[,c('Close','close.sma.n48', 'close.sd.n48')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n49 <- apply(SP500[,c('Close','close.sma.n49', 'close.sd.n49')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n50 <- apply(SP500[,c('Close','close.sma.n50', 'close.sd.n50')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n51 <- apply(SP500[,c('Close','close.sma.n51', 'close.sd.n51')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n52 <- apply(SP500[,c('Close','close.sma.n52', 'close.sd.n52')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n53 <- apply(SP500[,c('Close','close.sma.n53', 'close.sd.n53')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n54 <- apply(SP500[,c('Close','close.sma.n54', 'close.sd.n54')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n55 <- apply(SP500[,c('Close','close.sma.n55', 'close.sd.n55')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n56 <- apply(SP500[,c('Close','close.sma.n56', 'close.sd.n56')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n57 <- apply(SP500[,c('Close','close.sma.n57', 'close.sd.n57')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n58 <- apply(SP500[,c('Close','close.sma.n58', 'close.sd.n58')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n59 <- apply(SP500[,c('Close','close.sma.n59', 'close.sd.n59')], 1, function(x) { (x[1]-x[2])/x[3] } )
SP500$close.zscore.n60 <- apply(SP500[,c('Close','close.sma.n60', 'close.sd.n60')], 1, function(x) { (x[1]-x[2])/x[3] } )

# Convert all NA to 0
SP500[is.na(SP500)] <- 0

# Calculate quartiles, where close is relation to range (Close - High) / (High - Low)
#SP500$quartile <- apply(SP500[,c('Close', 'Low', 'High')], 1, function(x) { (x[1]-x[2])/(x[3]-x[2])} )

# Calculate Returns from open to close 
SP500$ocret <- apply(SP500[,c('Open', 'Close')], 1, function(x) { (x[2]-x[1])/x[1]} )

# Calculate Close-to-Close returns
SP500$clret <- ROC(SP500$Close, type = c("discrete"))
SP500$clret[1] <- 0

#####################################################################
# Split Data To Train and Test Set
#####################################################################
#SP500 <- subset(SP500, Date >= as.Date("1993-01-01") )             #Input for subsetting by date versus splitting, works with train set only
train.index <- 1:(nrow(SP500)*.570) # Add *.666 for 2/3rd split
train.set <- SP500[train.index, ]
test.set <- SP500[-train.index, ]

#####################################################################
# Assign train and test set indicators 
#####################################################################
# Name indicators #
train.indicator <- train.set$close.zscore.n11
test.indicator <- test.set$close.zscore.n11

######################################################################
# Develop Training Set Parameters
# ####################################################################
# Enter buy / sell rules
train.set$enter.mean.rev <- ifelse(train.indicator < 0, 1, 0)
train.set$exit.mean.rev <- ifelse(train.indicator > 0, 1, 0)
train.set$enter.momo <- ifelse(train.indicator > 0, 1, 0)
train.set$exit.momo <- ifelse(train.indicator < 0, 1, 0)

# Mean Rev
train.set <- train.set %>%
  dplyr::mutate(sig.mean.rev = ifelse(enter.mean.rev == 1, 1,
                                      ifelse(exit.mean.rev == 1, 0, 0)))

# Momentum
train.set <- train.set %>%
  dplyr::mutate(sig.momo = ifelse(enter.momo == 1, 1,
                                      ifelse(exit.momo == 1, 0, 0)))


# lag signal by one forward day to signal entry next day 
train.set$sig.mean.rev <- lag(train.set$sig.mean.rev,1) # Note k=1 implies a move *forward*
train.set$sig.momo <- lag(train.set$sig.momo,1) # Note k=1 implies a move *forward*
train.set[is.na(train.set)] <- 0  # Set NA to 0

# Calculate equity curves
# Mean rev
train.set <- train.set %>%
  dplyr::mutate(RunID = rleid(sig.mean.rev)) %>%
  group_by(RunID) %>%
  dplyr::mutate(mean.rev.equity = ifelse(sig.mean.rev == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Momentum
train.set <- train.set %>%
  dplyr::mutate(RunID = rleid(sig.momo)) %>%
  group_by(RunID) %>%
  dplyr::mutate(momo.equity = ifelse(sig.momo == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(train.set$mean.rev.equity, order.by=as.POSIXct(train.set$Date, format="%Y-%m-%d")) 
xts2 = xts(train.set$momo.equity, order.by=as.POSIXct(train.set$Date, format="%Y-%m-%d")) 
xts3 = xts(train.set$clret, order.by=as.POSIXct(train.set$Date, format="%Y-%m-%d")) 

# Join XTS together 
train.compare <- cbind(xts1,xts2,xts3)

# Use the PerformanceAnalytics package for trade statistics
colnames(train.compare) <- c("Mean Reversion","Momentum","Buy And Hold")
charts.PerformanceSummary(train.compare,main="Cumulative Returns", wealth.index=TRUE, colorset=rainbow12equal)
#png(filename="20090606_rsi2_performance_updated.png", 720, 720)
#performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
#drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()

# Train Set Log Returns 
train.logRets <- log(cumprod(1+train.compare))
chart.TimeSeries(train.logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)

######################################################
# Develop Test Set Parameters (Unseen Data)
######################################################

# Enter buy / sell rules
test.set$enter.mean.rev <- ifelse(test.indicator < 0, 1, 0)
test.set$exit.mean.rev <- ifelse(test.indicator > 0, 1, 0)
test.set$enter.momo <- ifelse(test.indicator > 0, 1, 0)
test.set$exit.momo <- ifelse(test.indicator < 0, 1, 0)

# Mean Rev
test.set <- test.set %>%
  dplyr::mutate(sig.mean.rev = ifelse(enter.mean.rev == 1, 1,
                                      ifelse(exit.mean.rev == 1, 0, 0)))

# Momentum
test.set <- test.set %>%
  dplyr::mutate(sig.momo = ifelse(enter.momo == 1, 1,
                                  ifelse(exit.momo == 1, 0, 0)))


# lag signal by one forward day to signal entry next day 
test.set$sig.mean.rev <- lag(test.set$sig.mean.rev,1) # Note k=1 implies a move *forward*
test.set$sig.momo <- lag(test.set$sig.momo,1) # Note k=1 implies a move *forward*
test.set[is.na(test.set)] <- 0  # Set NA to 0

# Calculate equity curves
# Mean rev
test.set <- test.set %>%
  dplyr::mutate(RunID = rleid(sig.mean.rev)) %>%
  group_by(RunID) %>%
  dplyr::mutate(mean.rev.equity = ifelse(sig.mean.rev == 0, 0,
                                         ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Momentum
test.set <- test.set %>%
  dplyr::mutate(RunID = rleid(sig.momo)) %>%
  group_by(RunID) %>%
  dplyr::mutate(momo.equity = ifelse(sig.momo == 0, 0,
                                     ifelse(row_number() == 1, ocret, clret))) %>%
  ungroup() %>%
  select(-RunID)

# Pull select columns from data frame to make XTS whilst retaining formats 
xts1 = xts(test.set$mean.rev.equity, order.by=as.POSIXct(test.set$Date, format="%Y-%m-%d")) 
xts2 = xts(test.set$momo.equity, order.by=as.POSIXct(test.set$Date, format="%Y-%m-%d")) 
xts3 = xts(test.set$clret, order.by=as.POSIXct(test.set$Date, format="%Y-%m-%d")) 

# Join XTS together 
test.compare <- cbind(xts1,xts2,xts3)

# Use the PerformanceAnalytics package for trade statistics
colnames(test.compare) <- c("Mean Reversion","Momentum","Buy And Hold")
charts.PerformanceSummary(test.compare,main="Cumulative Returns", wealth.index=TRUE, colorset=rainbow12equal)
#png(filename="20090606_rsi2_performance_updated.png", 720, 720)
#performance.table <- rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare),table.DownsideRisk(compare))
#drawdown.table <- rbind(table.Drawdowns(compare))
#dev.off()

# Test Set Log Returns 
test.logRets <- log(cumprod(1+test.compare))
chart.TimeSeries(test.logRets, legend.loc='topleft', colorset=rainbow12equal)

print(performance.table)
print(drawdown.table)
head(train.logRets)

tail(train.logRets)

# Prepare data for plotting 
train.reps <- rep(1,nrow(train.logRets))
test.reps <- rep(2,nrow(test.logRets))
id <- c(train.reps,test.reps)
final <- rbind(test.compare,train.compare)
final <-log(cumprod(1+final))
final.df <- data.frame(final,id=id)
final.df <- setDT(final.df, keep.rownames = TRUE)[] # Set row names
colnames(final.df)[1] <- "Date"
final.df$Date <- ymd_hms(final.df$Date)

# negative 2 lag auto correlation dates 
start_one <- as.POSIXct("1931-9-25")
end_one <- as.POSIXct("1932-7-21")
start_two <- as.POSIXct("1936-1-13")
end_two <- as.POSIXct("1940-5-13")
start_three <- as.POSIXct("1940-12-26")
end_three <- as.POSIXct("1941-1-11")
start_four <- as.POSIXct("1941-4-4")
end_four <- as.POSIXct("1941-7-29")
start_five <- as.POSIXct("1965-9-23")
end_five <- as.POSIXct("1965-10-22")
start_six <- as.POSIXct("1990-10-12")
end_six <- as.POSIXct("1990-10-19")
start_seven <- as.POSIXct("1994-12-23")
end_seven <- as.POSIXct("1995-1-16")
start_eight <- as.POSIXct("1998-9-8")
end_eight <- as.POSIXct("2017-10-20")

# Plot train and test set equity curves with ggplot2 
p1<- ggplot(data = final.df, aes(x =Date, y = Mean.Reversion,colour=id))+
  geom_line(data = final.df, aes(x =Date, y = Mean.Reversion,colour=id))+
  geom_line(data = final.df, aes(x =Date, y = Momentum,colour=id))+
  geom_line(data = final.df, aes(x =Date, y = Buy.And.Hold,colour=id))+
  theme_bw()+
  geom_vline(xintercept=as.numeric(as.POSIXct("1977-08-18 19:00:00")),linetype="dashed")+
  theme(legend.position = "none")+
  scale_y_continuous(breaks = seq(-7, 7, by = .5))+
  ggtitle("Mean Reversion, Momentum, Buy And Hold - S&P500 Index",subtitle="1928 to Present") +
  labs(x="Date",y="Cumulative Log Return")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  #geom_rect(aes(xmin=start_one,xmax=end_one,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_two,xmax=end_two,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_three,xmax=end_three,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_four,xmax=end_four,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_five,xmax=end_five,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_six,xmax=end_six,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_seven,xmax=end_seven,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  #geom_rect(aes(xmin=start_eight,xmax=end_eight,ymin=-Inf,ymax=Inf),alpha=0.01,fill="#CC6666")+
  annotate("text", label = "Mean Reversion", x = as.POSIXct("2010-01-01"), y = -2.0, color = "#003A90")+
  annotate("text", label = "Momentum", x = as.POSIXct("2010-01-01"), y = 6.5, color = "#003A90")+
  annotate("text", label = "Buy and Hold", x = as.POSIXct("2010-01-01"), y = 3.5, color = "#003A90")+
annotate("text", label = "Out of Sample", x = as.POSIXct("2000-01-01"), y = .5, color = "#56B1F7")+
  annotate("text", label = "In Sample", x = as.POSIXct("1940-01-01"), y = 4.0, color = "#132B43")
  
  

p2 <- ggplot(cor2.df,aes(Date,V1))+geom_line(aes(colour="V1"))+
  geom_area(aes(y = V1, fill="V1"))+
  ggtitle("Rolling Auto Correlation, 2 lag, rolling window = 3 years", subtitle = "S&P500 Log Returns") +
  labs(x="Date",y="Correlation")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))+
  theme(legend.position="none")+
  scale_y_continuous(breaks = seq(-.5, .50, by = 0.05))
gridExtra::grid.arrange(p1, p2, ncol = 1)

Now let's plot the strategies against the lag 2 autocorrelation:

[Image: mean reversion, momentum and buy-and-hold equity curves plotted above the rolling lag 2 autocorrelation]

The success of a momentum strategy (at least with this short term momentum rule) relies on a positive 2 lag autocorrelation. We see that the positive 2 lag autocorrelation peaked on 5/23/1973 and declined into negative territory around 1998, where it remains negative at the present day. The success of a mean reversion strategy is largely dependent on a negative 2 lag autocorrelation, notably beginning post dot com bubble. This marked a regime shift from momentum to mean reversion (on an interday basis).

To conclude, the 2 lag autocorrelation may be a useful tool for detecting which type of model to fit to the underlying series.

Optimal f – Monte Carlo R Implementation

As we go through the stages of strategy robustness testing, I thought it would be prudent to visit the subject of position sizing. World Cup Championship trader Kevin Davey (2014) uses Monte Carlo simulation to find the optimal f position size fraction. After finding optimal f, the share / contract number can be found with:

N = f * equity / L

f = optimal f (found through simulation or pre-selected)

L = worst historical loss (absolute dollar $ loss). Note: often our worst loss is ahead of us, so one may want to increase this value.

If a trader wanted to risk a fraction of equity on a trade, for example, 0.4 or 40% of total equity, where equity is $25,000 and the historical worst loss was $500:

Nshares = 0.4 * 25,000 / 500  = 20 shares
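This is easy to wrap in a small helper for reuse (a minimal sketch; the function name is my own, not from Davey's book):

# Sketch: position size from fractional f, account equity and worst historical $ loss
optimal_f_shares <- function(f, equity, worst_loss) {
  floor(f * equity / abs(worst_loss))  # round down to whole shares / contracts
}
optimal_f_shares(0.4, 25000, -500)     # 20, matching the worked example above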

We implement an optimal f simulation in R, plotting various performance metrics against fractional (f): cumulative returns, maximum draw down, risk of ruin and Sharpe ratio. In the code example we run 252 days' worth of random data in dollar gain / loss amounts and then adjust the outputs by varying fractional (f). It should be noted that a real dollar strategy PnL may be substituted into this code to run the simulations; there is a section that will sample with or without replacement to obtain many iterations of a back test PnL.

An actual strategy back test has a set sequence of wins and losses; using monte carlo simulation we may mix up the original trade sequence. We may run thousands or millions of iterations to obtain a distribution of performance metrics, which may provide a more realistic expectancy going forward. The R code to perform optimal f simulations:

# Optimal F Simulation In R
# Make random returns
# Adjust $ return output by varying fractional (f)
#######################################################################
# Calculations
# Calculation 1 (first trade): equity[1] = 1 + f * trade gain/loss / abs(worst loss)
# Calculation 2 (subsequent trades): equity[i] = equity[i-1] * (1 + f * trade gain/loss / abs(worst loss))
#####################################################################

# Required Packages 
require(TTR)                   # ROC()
require(xts)                   # xts()
require(data.table)            # setDT()
require(reshape2)              # melt()
require(PerformanceAnalytics)  # Return.cumulative(), maxDrawdown(), SharpeRatio.annualized()

# Set (f) values
f <- c(0,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,1)

# Create empty matrices for each (f) iteration
f.0.00 <- matrix()
f.0.05 <- matrix()
f.0.10 <- matrix()
f.0.15 <- matrix()
f.0.20 <- matrix()
f.0.25 <- matrix()
f.0.30 <- matrix()
f.0.35 <- matrix()
f.0.40 <- matrix()
f.0.45 <- matrix()
f.0.50 <- matrix()
f.0.55 <- matrix()
f.0.60 <- matrix()
f.0.65 <- matrix()
f.0.70 <- matrix()
f.0.75 <- matrix()
f.0.80 <- matrix()
f.0.85 <- matrix()
f.0.90 <- matrix()
f.0.95 <- matrix()
f.1 <- matrix()

# Create random $ amount gains and losses
random.rets <- ifelse(runif(252)<0.40,-1,1)*round(252*runif(252),2)

# Sample and replace original return stream (note this may be real life gains and losses from a strategy)
# One may wish to enter actual trade results here
# and sample with or without replacement
rets <- sample(random.rets,replace=TRUE) # Remove replace=TRUE to sample without replacement
rets <- random.rets # Or keep the original, unshuffled sequence for this run
i=2 # Start on the second index (the first element is set by calculation 1)
for (i in 2:length(rets)) { 
  f.0.00[1] = (1+0.00 * rets[1] / abs(min(rets)))
  f.0.00[i] = f.0.00[i-1] * ((1+0.00* rets[i] / abs(min(rets))))
f.0.05[1] = (1+0.05 * rets[1] / abs(min(rets)))
f.0.05[i] = f.0.05[i-1] * ((1+0.05* rets[i] / abs(min(rets))))
f.0.10[1] = (1+0.10 * rets[1] / abs(min(rets)))
f.0.10[i] = f.0.10[i-1] * ((1+0.10* rets[i] / abs(min(rets))))
f.0.15[1] = (1+0.15 * rets[1] / abs(min(rets)))
f.0.15[i] = f.0.15[i-1] * ((1+0.15* rets[i] / abs(min(rets))))
f.0.20[1] = (1+0.20 * rets[1] / abs(min(rets)))
f.0.20[i] = f.0.20[i-1] * ((1+0.20* rets[i] / abs(min(rets))))
f.0.25[1] = (1+0.25 * rets[1] / abs(min(rets)))
f.0.25[i] = f.0.25[i-1] * ((1+0.25* rets[i] / abs(min(rets))))
f.0.30[1] = (1+0.30 * rets[1] / abs(min(rets)))
f.0.30[i] = f.0.30[i-1] * ((1+0.30* rets[i] / abs(min(rets))))
f.0.35[1] = (1+0.35 * rets[1] / abs(min(rets)))
f.0.35[i] = f.0.35[i-1] * ((1+0.35* rets[i] / abs(min(rets))))
f.0.40[1] = (1+0.40 * rets[1] / abs(min(rets)))
f.0.40[i] = f.0.40[i-1] * ((1+0.40* rets[i] / abs(min(rets))))
f.0.45[1] = (1+0.45 * rets[1] / abs(min(rets)))
f.0.45[i] = f.0.45[i-1] * ((1+0.45* rets[i] / abs(min(rets))))
f.0.50[1] = (1+0.50 * rets[1] / abs(min(rets)))
f.0.50[i] = f.0.50[i-1] * ((1+0.50* rets[i] / abs(min(rets))))
f.0.55[1] = (1+0.55 * rets[1] / abs(min(rets)))
f.0.55[i] = f.0.55[i-1] * ((1+0.55* rets[i] / abs(min(rets))))
f.0.60[1] = (1+0.60 * rets[1] / abs(min(rets)))
f.0.60[i] = f.0.60[i-1] * ((1+0.60* rets[i] / abs(min(rets))))
f.0.65[1] = (1+0.65 * rets[1] / abs(min(rets)))
f.0.65[i] = f.0.65[i-1] * ((1+0.65* rets[i] / abs(min(rets))))
f.0.70[1] = (1+0.70 * rets[1] / abs(min(rets)))
f.0.70[i] = f.0.70[i-1] * ((1+0.70* rets[i] / abs(min(rets))))
f.0.75[1] = (1+0.75 * rets[1] / abs(min(rets)))
f.0.75[i] = f.0.75[i-1] * ((1+0.75* rets[i] / abs(min(rets))))
f.0.80[1] = (1+0.80 * rets[1] / abs(min(rets)))
f.0.80[i] = f.0.80[i-1] * ((1+0.80* rets[i] / abs(min(rets))))
f.0.85[1] = (1+0.85 * rets[1] / abs(min(rets)))
f.0.85[i] = f.0.85[i-1] * ((1+0.85* rets[i] / abs(min(rets))))
f.0.90[1] = (1+0.90 * rets[1] / abs(min(rets)))
f.0.90[i] = f.0.90[i-1] * ((1+0.90* rets[i] / abs(min(rets))))
f.0.95[1] = (1+0.95 * rets[1] / abs(min(rets)))
f.0.95[i] = f.0.95[i-1] * ((1+0.95* rets[i] / abs(min(rets))))
f.1[1] = (1+1 * rets[1] / abs(min(rets)))
f.1[i] = f.1[i-1] * ((1+1* rets[i] / abs(min(rets))))
}

# Output
out <- cbind(f.0.00,f.0.05,f.0.10,f.0.15,f.0.20,f.0.25,f.0.30,f.0.35,f.0.40,f.0.45,f.0.50,f.0.55,f.0.60,f.0.65,f.0.70,f.0.75,f.0.80,f.0.85,f.0.90,f.0.95,f.1)
cnames <- f
colnames(out) <- cnames

######################################################
# Find cumulative return for any given fractional (f)
#######################################################
cr.list <- list()
i=1
for (i in 1:21) { 
  cr.rets <- ROC(out[,i],type = "discrete",na.pad=TRUE)
  cr.list[[i]] <- rbind(as.numeric(as.data.frame(Return.cumulative(xts(cr.rets, Sys.Date()-length(cr.rets):1)))))
}

# cbind output 
cr_out <- do.call(cbind,cr.list)
cnames <- f
colnames(cr_out) <- cnames
# Plot cumulative return 
cr_plot <- melt(cr_out)
cr_plot <- setDT(cr_plot, keep.rownames=TRUE)
cr_plot <- data.frame(cr_plot)
plot(cr_plot$Var2,cr_plot$value,type="o",col="red",ann=FALSE,xaxt="n")
title("Cumulative Return Vs Fractional (f)",xlab="Fractional (f)", ylab="Cumulative Return (%)")
axis(side = 1, at = f, las=2)

# Plot for final $ equity
#plot.df <- out[nrow(out),] # Extract last row of simulation
#plots <- melt(plot.df)
#plots <- setDT(plots, keep.rownames=TRUE)
#plots <- data.frame(plots)
# Plot
#plot(plots$rn,plots$value,type="o",col="red", ann=FALSE)
#title("Final Equity Vs Fractional (f)",xlab="Fractional (f)", ylab="Final Equity ($)")

################################################################################
# Find maximum drawdown for any given fractional (f)
################################################################################
dd <- data.frame(out)
cnames <- f
colnames(dd) <- cnames

# loop to pull max dd 
dd.list <- list()
i=1
for (i in 1:length(dd)) { 
 dd.rets <- ROC(dd[,i],type = "discrete",na.pad=TRUE)
  dd.list[[i]] <- rbind(as.numeric(as.data.frame(maxDrawdown(xts(dd.rets, Sys.Date()-length(dd.rets):1)))))
}

# Validation
#rets <- ROC(dd[,2],type = "discrete",na.pad=TRUE)
#maxDrawdown(xts(rets, Sys.Date()-length(rets):1))
#chart.Drawdown(xts(rets$rets, Sys.Date()-nrow(rets):1), legend.loc = NULL, colorset = (1:12))

# cbind output 
dd_out <- do.call(cbind,dd.list)
cnames <- f
colnames(dd_out) <- cnames

# Plot DD vs fractional (f)
  dd.plot <- melt(dd_out)
# Plot
plot(dd.plot$Var2,dd.plot$value,type="o",col="red", ann=FALSE,xaxt="n")
title("Maximum Drawdown Vs Fractional (f)",xlab="Fractional (f)", ylab="Maximum Draw Down (%)")
axis(side = 1, at = f, las=2)

# Combine Plots 
par(mfrow=c(2,2))
# Cumulative Return
plot(cr_plot$Var2,cr_plot$value,type="o",col="red", ann=FALSE, xaxt="n")
title("Cumulative Return Vs Fractional (f)",xlab="Fractional (f)", ylab="Cumulative Return (%)")
axis(side = 1, at = f, las=2)
# Maximum Draw down
plot(dd.plot$Var2,dd.plot$value,type="o",col="red", ann=FALSE,xaxt="n")
title("Maximum Drawdown Vs Fractional (f)",xlab="Fractional (f)", ylab="Maximum Draw Down (%)")
axis(side = 1, at = f, las=2)

# Find win / loss % of sample
pw <- sum(rets >0) / length(rets)
pl <- sum(rets <0)/ length(rets)
pw
pl

###############################################################
# Find Risk of ruin for any given fractional (f)
###############################################################
# Calculate daily returns
rets.list <- list()
mean.win.list <- list()
mean.loss.list <- list()
sd.ret.list <- list()
i=1
for (i in 1:21) { 
  rets.rets <- ROC(out[,i],type = "discrete",na.pad=TRUE)
  rets.list[[i]] <- rets.rets
  mean.win.list[[i]] <- mean(rets.rets[rets.rets>0],na.rm=TRUE)
  mean.loss.list[[i]] <- mean(rets.rets[rets.rets<0],na.rm=TRUE)
    sd.ret.list[[i]] <- sd(rets.rets,na.rm=TRUE)
}

# cbind output 
rets_out <- do.call(cbind,rets.list)
cnames <- f
colnames(rets_out) <- cnames

mean_win_out <- do.call(cbind,mean.win.list)
cnames <- f
colnames(mean_win_out) <- cnames

mean_loss_out <- do.call(cbind,mean.loss.list)
cnames <- f
colnames(mean_loss_out) <- cnames

sd_out <- do.call(cbind,sd.ret.list)
cnames <- f
colnames(sd_out) <- cnames

# Fixed fractional risk of ruin
# Ralph Vince 1990 (Portfolio Management Formulas : Mathematical Trading Methods for the Futures, Options, and Stock Markets)
risk_ruin.list <- list()
  i=1
q=2
g= 0.3   # Set maximum % loss willing to accept
for (i in 1:21) { 
  aw = mean_win_out[,i]    # average winning return at this (f)
  al = mean_loss_out[,i]   # average losing return at this (f)
  # pw = win probability computed above
  z  = (abs((aw/q)*pw)) - (abs(al/q)*(1-pw))
  a  = ((pw * (aw/q)^2) + ((1-pw) * (al/q)^2))^(1/2)
  p  = .5 * (1+(z/a))
  u  = g/a
  risk_ruin.list[[i]] = ((1-p)/p)^u * 100
}

# cbind output 
risk_ruin_out <- do.call(cbind,risk_ruin.list)
cnames <- f
colnames(risk_ruin_out) <- cnames

# Plot risk of ruin vs fractional (f)
rr.plot <- melt(risk_ruin_out)
rr.plot <- replace(rr.plot, is.na(rr.plot), 0)
#options(scipen = 0)
# Plot
plot(rr.plot$Var2,rr.plot$value,type="o",col="red", ann=FALSE,xaxt="n")
title("Risk Of Ruin Vs Fractional (f)",xlab="Fractional (f)", ylab="Risk of Ruin (%)")
axis(side = 1, at = f, las=2)

########################################################################################
# Find sharpe ratio for any given fractional (f)
################################################################################
sr.list <- list()
i=1
for (i in 1:21) { 
  sr.rets <- ROC(out[,i],type = "discrete",na.pad=TRUE)
  sr.list[[i]] <- rbind(as.numeric(as.data.frame(SharpeRatio.annualized(xts(sr.rets, Sys.Date()-length(sr.rets):1)))))
}

# cbind output 
sr_out <- do.call(cbind,sr.list)
cnames <- f
colnames(sr_out) <- cnames
# Plot Sharpe ratio 
sr_plot <- melt(sr_out)
sr_plot <- setDT(sr_plot, keep.rownames=TRUE)
sr_plot <- data.frame(sr_plot)
plot(sr_plot$Var2,sr_plot$value,type="o",col="red",ann=FALSE,xaxt="n")
title("Sharpe Ratio Vs Fractional (f)",xlab="Fractional (f)", ylab="Sharpe Ratio")
axis(side = 1, at = f, las=2)

############################################################
# Find optimal F 
###########################################################
find.max <- which(cr_plot$value==max(cr_plot$value))
optimal.f <- cr_plot$Var2[find.max]
cat("Optimal f for this iteration = ", optimal.f)

########################################################################################################
# Highlight code and ctrl+r to run many iterations

 And the output:

Rplot273

With an optimal f of:

> cat("Optimal f = ", optimal.f)
Optimal f =  0.2

And for reference, the random sample winning / losing %:

> # Find win / loss % of sample
> pw <- sum(rets >0) / length(rets) # win
> pl <- sum(rets <0)/ length(rets)  # loss
> pw
[1] 0.5674603
> pl
[1] 0.4325397

For this particular trade sequence, we see that optimal f is 0.20. If we connect the dots, an f of 0.20 corresponds to a maximum draw down of 74% and a 17.4% risk of ruin of hitting a 30% draw down. One would select a lower fractional (f), or abandon the strategy altogether.

A few notes about the risk of ruin calculation: it was taken from Ralph Vince (1990) and is the R3 risk of ruin calculation as described in his book.

Next steps for the monte carlo simulator involve adding in the dollar PnL of back tested results and running thousands of simulations. I will be working on this over the coming month.
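
As a rough sketch of that next step (illustrative only, and assuming the rets and f objects defined above), each iteration could resample the trade PnL, rebuild the terminal equity for every fraction and record the best performing (f):

# Illustrative monte carlo wrapper - resample the PnL many times and record optimal f
# Assumes `rets` holds the $ trade PnL and `f` the vector of fractions from above
one.iteration <- function(pnl, f) {
  s <- sample(pnl, replace = TRUE)                              # shuffle the trade sequence
  final.equity <- sapply(f, function(x) prod(1 + x * s / abs(min(s))))  # terminal equity per fraction
  f[which.max(final.equity)]                                    # fraction with highest terminal equity
}
set.seed(1)
optimal.f.dist <- replicate(1000, one.iteration(rets, f))
hist(optimal.f.dist, breaks = length(f), main = "Distribution of optimal f", xlab = "Fractional (f)")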

Full code on my github.

Thanks for reading, any questions drop me a line.

Monthly Investing vs Yearly Investing – Effects on Average Share Cost

I invest in a long term S&P500 momentum strategy and have been making monthly contributions. Over the short term, the average share cost seemed to be much higher than my initial cost, so I wanted to see whether buying in bulk once a year was better than investing every month over a long run of history.

The calculation to work out the average trade cost is as follows:

Total $ amount invested / Total shares purchased.

If we invest at each time increment:

# Invest - Avg Share Cost
# Rolling Calculation
prices <- c(12.56,13.45,11.10,10.09,12.3,16.7)
total.to.invest <- c(10000,7000,1600,6000,3000,12000)
table <- data.frame(prices,total.to.invest)
table$total.shares <- apply(table[,c('prices','total.to.invest')], 1, function(x) { round((x[2]/x[1])) } )
table$total.shares.sum <- cumsum(table$total.shares)
table$total.to.invest.sum <- cumsum(table$total.to.invest)
table$roll.cost.avg <- apply(table[,c('total.shares.sum','total.to.invest.sum')], 1, function(x) { (x[2]/x[1]) } )
> head(table)
  prices total.to.invest total.shares total.shares.sum total.to.invest.sum roll.cost.avg
1  12.56           10000          796              796               10000      12.56281
2  13.45            7000          520             1316               17000      12.91793
3  11.10            1600          144             1460               18600      12.73973
4  10.09            6000          595             2055               24600      11.97080
5  12.30            3000          244             2299               27600      12.00522
6  16.70           12000          719             3018               39600      13.12127

This is on a rolling basis; we see that the average cost is higher than our initial entry, which will be the case with an up trending series.

If we invest less at the beginning and more at the end:

# Invest less at beginning, Invest more later
prices <- c(12.56,13.45,11.10,10.09,12.3,16.7)
total.to.invest <- c(100,700,1600,6000,3000,120000)
table <- data.frame(prices,total.to.invest)
table$total.shares <- apply(table[,c('prices','total.to.invest')], 1, function(x) { round((x[2]/x[1])) } )
table$total.shares.sum <- cumsum(table$total.shares)
table$total.to.invest.sum <- cumsum(table$total.to.invest)
table$roll.cost.avg <- apply(table[,c('total.shares.sum','total.to.invest.sum')], 1, function(x) { (x[2]/x[1]) } )

  prices total.to.invest total.shares total.shares.sum total.to.invest.sum roll.cost.avg
1  12.56             100            8                8                 100      12.50000
2  13.45             700           52               60                 800      13.33333
3  11.10            1600          144              204                2400      11.76471
4  10.09            6000          595              799                8400      10.51314
5  12.30            3000          244             1043               11400      10.93001
6  16.70          120000         7186             8229              131400      15.96792

As we invested a higher amount of capital later than earlier, we brought our average cost up and closer to the current price. It's a weighted average after all.

Next we simulate adding a fixed amount of capital every month and a fixed amount of capital every year.

The two contribution amounts:

# Enter monthly and yearly capital investments
capital.invest.e.month <- 10000/12 # Invest 10,000 a year, split into months
bulk.invest.1.year <- 10000

We invest 10,000 per year split across 12 months (833.33 each month) versus 10,000 in a single purchase each year, in both cases buying the S&P500. Monthly orders are placed at the start of every month and yearly orders at the start of every year.

We then compute a rolling average cost, the code to do this:

# Bulk Buy 1x a year Vs investing every month
# Andrew Bannerman 1.3.2018

require(xts)
require(data.table)
require(ggplot2)
require(lubridate)
require(magrittr)
require(scales)
require(reshape2)
require(PerformanceAnalytics)
require(dplyr)
require(TTR)

# Download SPX data
require(quantmod)
startdate<- "1930-01-01"
SPX <- getSymbols("^GSPC",from=startdate,auto.assign=FALSE)
SPX <-  data.frame(Date=index(SPX), coredata(SPX)) # Change XTS to data frame and retain Date column

# Add day of month column 
# This is a helper column for creating buy/sell rules
months <- SPX %>% dplyr::mutate(month = lubridate::month(Date)) %>% group_by(month) 
days <- SPX %>% dplyr::mutate(day = lubridate::day(Date)) %>% group_by(day) 
years <- SPX %>% dplyr::mutate(year = lubridate::year(Date)) %>% group_by(year) 
df <- data.frame(SPX,month=months$month,day=days$day,year=years$year)

# Subset df by date 
#df <- subset(df, Date >= as.Date("2009-01-01"))
head(df$Date)

# Enter monthly and yearly capital investments
capital.invest.e.month <- 10000/12 # Invest 10,000 a year, split into months
bulk.invest.1.year <- 10000

# Simulate buying every month
output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(month)) %>%
  group_by(RunID) %>%
  mutate(ID.No = row_number()) %>%
  dplyr::mutate(total.shares.months = ifelse(ID.No == 1,first(capital.invest.e.month) / first(GSPC.Adjusted),0)) %>%  # Divide total purchased by cost price for total share
  dplyr::mutate(total.cost.months = ifelse(ID.No == 1,first(total.shares.months) * first(GSPC.Adjusted),0)) %>%  # Divide total purchased by cost price for total share
  ungroup() %>%
  select(-RunID)

df <- data.frame(output)
head(df$Date)
# Simulate buying 1x share start of each month
#output <- df %>%
#dplyr::mutate(RunID = data.table::rleid(month)) %>%
#group_by(RunID) %>%
#  mutate(ID.No = row_number()) %>%
#  dplyr::mutate(first.month.total.cost = ifelse(ID.No == 1,first(GSPC.Adjusted) * 1,0)) %>% # Own 1x share at close price change 1 to 2 for more..
#  dplyr::mutate(total.shares = ifelse(ID.No == 1,first(first.month.total.cost) / first(GSPC.Adjusted),0)) %>%  # Divide total purchased by cost price for total share
#    ungroup() %>%
#  select(-RunID)

# Simulate bulk investing 1x a year
output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(year)) %>%
  group_by(RunID) %>%
  mutate(ID.No = row_number()) %>%
  dplyr::mutate(total.shares.years = ifelse(ID.No == 1,round(first(bulk.invest.1.year) / first(GSPC.Adjusted)),0)) %>%  # Divide total purchased by cost price for total share
  dplyr::mutate(total.cost.years = ifelse(ID.No == 1,first(total.shares.years) * first(GSPC.Adjusted),0)) %>%  # Divide total purchased by cost price for total share
  ungroup() %>%
  select(-RunID)
# output data frame
df <- data.frame(output)

# Calculate average cost per share 
# sum first.month.total cost / sum of total shares bought
month.invest.avg.cost <- sum(df$total.cost.months) / sum(df$total.shares.months)
year.invest.avg.cost <- sum(df$total.cost.years) / sum(df$total.shares.years)
find.first.price <- head(df$GSPC.Adjusted,1)
find.last.price <- tail(df$GSPC.Adjusted,1)

# Subset for month avg cost
# index
df$index <- seq(1:nrow(df))
df.month <- subset(df,total.shares.months >0)
# totals 
df.month$total.shares.months.sum <- cumsum(df.month$total.shares.months)
df.month$total.cost.months.sum <- cumsum(df.month$total.cost.months)
df.month$month.roll.avg.cost <- apply(df.month[,c('total.cost.months.sum','total.shares.months.sum')], 1, function(x) { (x[1]/x[2]) } )
head(df.month$Date)
# Join original df 
df.join.month <- full_join(df, df.month, by = c("Date" = "Date"))
df.join.month$month.roll.avg.cost <- na.locf(df.join.month$month.roll.avg.cost)
head(df.join.month$Date)

# Subset for year avg year cost
df.year <- subset(df,total.shares.years >0)
# totals 
df.year$year.total.shares.years.sum <- cumsum(df.year$total.shares.years)
df.year$year.total.cost.years.sum <- cumsum(df.year$total.cost.years)
df.year$year.roll.avg.cost <- apply(df.year[,c('year.total.cost.years.sum','year.total.shares.years.sum')], 1, function(x) { (x[1]/x[2]) } )

# Join original df 
df.join.year <- full_join(df, df.year, by = c("Date" = "Date"))
df.join.year$year.roll.avg.cost <- na.locf(df.join.year$year.roll.avg.cost)
# Plot 
plot.df  <- data.frame("Date" = df.join.month$Date, "Rolling Average Cost Monthly" = df.join.month$month.roll.avg.cost,"Rolling Average Cost Yearly" = df.join.year$year.roll.avg.cost, "SPX Adjusted Close" = df$GSPC.Adjusted)
# Melt for plotting
plot.df <- melt(plot.df, id.vars="Date")
ggplot(plot.df, aes(x=Date, y=value, fill=variable)) +
  geom_bar(stat='identity', position='dodge')+  
  ggtitle("Average Share Cost - Investing Monthly Vs 1x Bulk Investing Each Year",subtitle="SPX 1950 To Present")+
  labs(x="Date",y="SPX Close Price")+
  theme(plot.title = element_text(hjust=0.5),plot.subtitle =element_text(hjust=0.5))

And the output vs SPX price:

Rplot238
Average Share Cost – Invest start of every month and start of every year – 1950 To Present

From 1950 to present we see that we lock in a better average cost during the pre dot com up trend.

Rplot234
Average Share Cost – Invest start of every month and start of every year – 2000 To Present

If we had started investing at the top of the dot com bubble, we would see the effects of averaging down: both bear markets are somewhat averaged out, so to speak. Having invested at the top of the market, buy and hold gains were stagnant for an entire decade.

Rplot235
Average Share Cost – Invest start of every month and start of every year – 2009 To Present

There is not much variation between electing to invest every month versus every year; the difference is negligible over the long haul. It is, however, notable that investing incrementally affects the total % return. If we bulk invest 10,000 at one price and it rises 10%, we gain $1,000. If we dollar cost average, investing 20,000 in total and diluting the initial cost, we may show a 5% gain instead of a 10% gain. However, a 5% gain on 20,000 is still $1,000.
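
As a quick check of that arithmetic (numbers taken from the paragraph above):

# Illustrative arithmetic: dollar gain is the same even though the % return is diluted
bulk.gain <- 10000 * 0.10   # 10% on a 10,000 bulk investment  = $1,000
dca.gain  <- 20000 * 0.05   # 5% on 20,000 dollar cost averaged = $1,000
bulk.gain; dca.gain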

Raising the average cost per share as price trends higher does, however, affect the amount of cushion one has, as in this example, where funds are added conditionally while the momentum strategy is invested:

Rplot252

This is bulk investing $50,000 at a new leg and adding a fixed amount every month / year. We see that our average cost chases the current price. One way to avoid this is to bulk invest at the start of each new leg without adding any new funds in between. Only large contributions are made when a new momentum trade is entered, which in this case is every other year for my S&P500 strategy. More to come.

Thanks for reading!

Full code can be found on my github.