SPY Mean Reversion With John Ehlers Adaptive RSI

It has been a busy few months. I have been exploring the market indicators that John Ehlers describes in his book Cycle Analytics for Traders: Advanced Technical Trading Concepts.

The key theme of the book is applying digital signal processing filters to better process market data. He uses various high-pass and low-pass filters which only allow certain frequencies to pass at specified lookback periods, improving the signal-to-noise ratio and perhaps extracting a dominant cycle from the data.

Key ideas from the book:

  1. Traditional indicators have lag and assume a normal probability distribution
  2. His indicators reduce lag
  3. Extracting the dominant cycle over a range of periods can be used to calibrate traditional indicators such as the RSI, stochastic, etc.
  4. He provides methods to transform indicator output to something close to a normal probability distribution

I’m in the process of making the indicators from his Cycle Analytics for Traders book available on my github in a package called MarketCycles.jl. Please refer to the package README.md for further information.

The package will be used in the next part of this post where we use an Adaptive RSI to time the mean reversion tendency of the daily S&P500.

First install the package:

Pkg.clone("https://github.com/flare9x/MarketCycles.jl")

The Adaptive RSI is adaptive because the indicator extracts the dominant cycle using the autocorrelation periodogram. At any given point in time i, half the dominant cycle period is used to set the lookback length of the RSI.
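
To make the idea concrete, here is a minimal illustrative sketch of an RSI whose lookback adapts to a supplied dominant-cycle estimate. This is not the MarketCycles.jl implementation; the cycle_period input is assumed to come from a dominant-cycle estimator such as the autocorrelation periodogram, and the output is scaled 0 to 1, consistent with the entry/exit levels used later in this post:

# Illustrative sketch only - not the MarketCycles.jl code.
# cycle_period[i] is an assumed dominant-cycle estimate at bar i.
function adaptive_rsi_sketch(close::Vector{Float64}, cycle_period::Vector{Float64})
    rsi = fill(NaN, length(close))
    for i in 3:length(close)
        lookback = max(2, round(Int, cycle_period[i] / 2))  # half the dominant cycle
        i - lookback < 1 && continue                        # not enough history yet
        moves  = diff(close[i-lookback:i])
        gains  = sum(max.(moves, 0.0))
        losses = sum(max.(-moves, 0.0))
        rsi[i] = (gains + losses) == 0.0 ? 0.5 : gains / (gains + losses)
    end
    return rsi
end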

I want to point out a few stark differences in the strategy back test code below. First, the code is set up to BUY NEXT BAR AT OPEN. This means that when a signal is triggered at the close of a bar, a trade is entered at the next bar's open price.

Secondly, rather than compounding cumulative % close-to-close returns as so many do, we specify an initial starting capital and add the dollar gain of each trade to a running total, adjusting for trading commissions, which gives a result more in line with how it might work in the real world.
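
For example, with the defaults below, a trade that returns +1% on $10,000 of equity adds $100, and after subtracting the $10 commission the running total becomes $10,090.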

Please see top of code snippet in order to adjust the following input variables for the strategy:

# Strategy input variables
starting_capital = 10000
comms = 10
entry_level = .3
exit_level = .8

The rest is best described by following the Julia code below. Note, if you do not have a data source, I have made this post easy to follow by providing code to download data from the Alphavantage API. You just need to edit the Alphavantage link and add in your own API key:

# Ehlers mean reversion

using HTTP
using DataFrames
using CSV
using Gadfly
using MarketCycles

# Multiple symbols
adjusted = 1  # if want adjusted closes set to 1 else non adjusted set it to 0
t = 1
tickers = ["SPY"]
for t in 1:length(tickers)  # Note it might get stuck!! if it fails.. print the t value... then restart the for loop for t in 4:length(tickers)  instead of 1:length(tickers) for example
# Using HTTP package
sleep(10)  # sleep between API calls
if adjusted == 1
res = HTTP.get(string(
"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=",tickers[t],"&outputsize=full&apikey=your_api_key&datatype=csv"))
else
    res = HTTP.get(string(
    "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=",tickers[t],"&outputsize=full&apikey=your_api_key&datatype=csv"))
end
mycsv = readcsv(res.body)
x = convert(DataFrame, mycsv)
x = x[2:nrow(x),:]  # subset remove header row
# Rename Columns
if adjusted == 1
    colnames = ["Date","Open","High","Low","Close","Adjusted_Close","Volume","Dividend_Amount","split_coefficient"]
else
colnames = ["Date","Open","High","Low","Close","Volume"]
end
names!(x.colindex, map(parse, colnames))
# Convert String Date to Date format
x[:Date] = Date.(x[:Date],Dates.DateFormat("yyyy-mm-dd"))
# Sort Date Frame By Date
x = sort!(x, [order(:Date)], rev=(false))
# Convert OHLC to Float64 and Volume to Int64
if adjusted == 1
for i in 2:length(x)-2
    x[i] = convert(Array{Float64,1},x[i])
end
    x[7] = convert(Array{Int64,1},x[7])
    x[8] = convert(Array{Float64,1},x[8])
else
    for i in 2:length(x)-1
        x[i] = convert(Array{Float64,1},x[i])
    end
        x[6] = convert(Array{Int64,1},x[6])
end

if adjusted == 1
CSV.write(joinpath("CSV_OUT_ADJUSTED_"tickers[t]".csv"), x)
else
CSV.write(joinpath("CSV_OUT_"tickers[t]".csv"), x)
end
end

# Load Data
SPY = CSV.read("C:/Users/Andrew.Bannerman/Desktop/Julia/Indicators/CSV_OUT_ADJUSTED_SPY.csv",types=[String; fill(Float64, 5); Int;Float64],header=true)
SPY[:Date] = DateTime.(SPY[:Date],Dates.DateFormat("mm/dd/yyyy"))

# Pull arrays
Date_index = SPY[:Date]
Spy_close = SPY[:Close]
Spy_open = SPY[:Open]

# Ehlers Adaptive RSI
Adaptive_RSI = AdaptiveRSI(Spy_close)

# Strategy input variables
starting_capital = 10000.00
comms = 10
entry_level = .3
exit_level = .8

# Cross under entry level and sell when cross over exit level
entry = zeros(Adaptive_RSI)
exit = zeros(Adaptive_RSI)
for i = 2:size(Adaptive_RSI,1)
    if (Adaptive_RSI[i] <= entry_level) && (Adaptive_RSI[i-1] > entry_level)
        entry[i] = 1
    else
        entry[i] = 0
    end
    if (Adaptive_RSI[i] >= exit_level) && (Adaptive_RSI[i-1] < exit_level)
        exit[i] = 2
    else
        exit[i] = 0
    end
end

# combine entry and exit
signal = exit .+ entry


# Fill the trade between entry and exit with 1's
all_signal = zeros(signal)
for i = 2:length(signal)
    print(i)
    if signal[i] == 1.0
        all_signal[i] = 1.0
    end
    if signal[i] == 0.0 && all_signal[i-1] == 1.0
        all_signal[i] = 1
    end
    if signal[i-1] == 2.0 && signal[i] == 0.0
        all_signal[i-1] = 1.0
    end
end

# ID the start and end of the trade
trade_start = zeros(all_signal)
trade_end = zeros(all_signal)
for i = 2:size(trade_end,1)
    if all_signal[i] == 1.0 && all_signal[i] != all_signal[i-1]
        trade_start[i] = 1
    end
    if all_signal[i-1] == 1.0 && all_signal[i-1] != all_signal[i]
        trade_end[i-1] = 2
    end
end

# Calculate the % return buying at open and selling at the close
open_price = Float64[]
close_price = Float64[]
strategy_returns = zeros(size(all_signal,1))
for i = 1:size(all_signal,1)-1
    if trade_start[i] == 1
        open_price = Spy_open[i+1]
    elseif trade_end[i] == 2
        close_price = Spy_close[i]
        # place the close return on same index position as the end of the trade
        strategy_returns[i] = close_price / open_price - 1
    end
end


# Work out gain on starting investment amount
dollar_gain = zeros(size(strategy_returns,1))
dollar_gain[1] = starting_capital

i=1
for i = 2:size(strategy_returns,1)
    # if the returns are 0 then use the previous $ amount
    if strategy_returns[i] == 0.0
        dollar_gain[i] = dollar_gain[i-1]
    elseif strategy_returns[i] != 0.0
        dollar_gain[i] = dollar_gain[i-1] + (dollar_gain[i-1] * strategy_returns[i]) - comms
    end
end

# plot final result
white_panel = Theme(
	panel_fill="white",
    default_color="blue",
    background_color="white"
)
p = plot(x=Date_index,y=dollar_gain,Geom.line,Guide.title("Ehlers Adaptive RSI"),white_panel,Scale.y_continuous(format=:plain))
         draw(PNG("C:/Users/Andrew.Bannerman/Desktop/Julia/ehlers_rsi.png", 1500px, 800px), p);

And we may plot the output, which is the growth of $10,000 inclusive of the $10 round-trip commission per trade:

(Figure: ehlers_rsi – equity curve of the Adaptive RSI strategy)

It should be noted that the default settings from Ehlers' book were used for the high-pass and low-pass filters, as well as for the range of lags used in the autocorrelation periodogram to calculate the dominant cycle.

The function I created has many inputs that may be calibrated:

AdaptiveRSI(x::Array{Float64}; min_lag::Int64=1, max_lag::Int64=48,LPLength::Int64=10, HPLength::Int64=48, AvgLength::Int64=3)::Array{Float64}

Thus the default settings might not be best tuned to SPY daily data. One would be advised to iterate through the input variables to learn how each variable affects the output, as well as to find a good tuning of the indicator, as sketched below.
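
As a starting point for that kind of exploration, a simple grid search over the keyword arguments shown above might look like the sketch below. Only the AdaptiveRSI signature itself comes from the package; score_strategy is a hypothetical placeholder for whatever backtest metric you prefer (ending equity, Sharpe, etc.):

# Hypothetical parameter sweep over the AdaptiveRSI keyword arguments.
# score_strategy is a placeholder for your own backtest metric.
results = Dict{NTuple{3,Int},Float64}()
for hp in (24, 36, 48), lp in (5, 10, 15), avg in (3, 5)
    rsi = AdaptiveRSI(Spy_close; LPLength = lp, HPLength = hp, AvgLength = avg)
    results[(lp, hp, avg)] = score_strategy(rsi, Spy_open, Spy_close)
end
best = sort(collect(results), by = last, rev = true)[1]  # best-scoring parameter set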

If anyone wishes to have a more professional back test script with varying entry/exit methods that accounts for slippage and commissions, do not hesitate to contact me.

Follow me on twitter: @flare9x

Github

Julia – Build any time resolution using 1 minute data


Reliable data makes for more accurate models. It is not the end of the world if there are minor discrepancies although data does need to be representative to build models and make good assumptions.

Common data errors are known to be found at market closing times. We want the auction price not the last price. Last price might be some fluff trade with 1 lot. We want the real close or the auction close. This increases the accuracy of the daily close data.

To achieve this we can create any resolution using 1 minute bars and sample them at whichever time resolution one wishes. For example, if we want to create more accurate daily close data using the auction close price at 15:15 for ES, we may simply sample every 1 minute close with a time stamp of 15:15.

If we want to build models that avoid a period of volatility, such as the last 15 minutes of trade, we may instead sample at every 15:00 time stamp.

So in order to have more control over the creation of data I created the code attached to this blog post.

If we build 15 minute data, we may sample the 1 minute close price at each :00, :15, :30, :45 time increment.

For 30 minute data, we may sample the 1 minute close price at each :00 and :30 time increment, as in the sketch below.
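
A minimal sketch of that sampling step is shown below. It assumes the 1 minute bars are already in a DataFrame named bars with DateTime and Close columns (hypothetical names); keeping only stamps on a 15 minute boundary gives 15 minute closes, and the same idea with a 15:15 filter gives ES daily closes at the auction close:

using Dates, DataFrames

# Illustrative sketch: sample 1 minute closes on 15 minute boundaries.
on_15min(t::DateTime) = minute(t) % 15 == 0
bars_15min = bars[on_15min.(bars.DateTime), :]

# Daily closes sampled at the 15:15 auction close for ES:
daily_close = bars[(hour.(bars.DateTime) .== 15) .& (minute.(bars.DateTime) .== 15), :]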

# Handling missing data

Where missing data was experienced, it was dealt with by forward filling the closest value. Suppose the time stamps are as follows: 09:01, 09:02, 09:03, 09:05, 09:07, 09:08, where 09:04 and 09:06 are missing. To fill the missing 09:04 we forward fill the 09:03 data points, and to fill the missing 09:06 we forward fill 09:05.
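
A minimal sketch of that forward fill, assuming the series has already been aligned to a complete minute index with the gaps marked as missing values:

# Last observation carried forward: fill each missing minute with the prior value.
function locf!(x::AbstractVector)
    for i in 2:length(x)
        if ismissing(x[i])
            x[i] = x[i-1]   # e.g. the missing 09:04 close takes the 09:03 close
        end
    end
    return x
end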

This methodology seems consistent with how TradeStation builds its larger resolution data from 1 minute data, although if the data point is simply too far away, TS has an ignore feature, so it would miss that point entirely in the new time resolution sample. (After studying further: TS forward or back fills missing data if it is on the same day. I am now looking to make this edit as it makes complete sense!)

Despite this, I feel the procedure deployed and the low frequency of missing data makes the data set good enough to build models on.

# Data Accuracy

A point to note pertaining to data accuracy: when building models on OHLC price data it makes sense to form statistical distributions by altering the original reported OHLC prices by some % of the day's ATR over many iterations. Thus we may observe how sensitive the models are to changes in the reported OHLC prices. In the absence of expensive bid/ask data this is a good alternative.
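
A rough sketch of that perturbation is below, assuming an atr vector has already been computed per bar (hypothetical input); each OHLC field is shifted by a random fraction of the day's ATR and the model is re-run on the perturbed series over many iterations:

# Illustrative sketch: shift each bar's OHLC by a uniform random amount in +/- pct * ATR.
function jitter_ohlc(open, high, low, close, atr; pct = 0.10)
    noise = (rand(length(close)) .* 2 .- 1) .* pct .* atr
    return open .+ noise, high .+ noise, low .+ noise, close .+ noise
end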

# How to have confidence

The data of course has to be representative of and accurate to how it unfolded in real time; artificial signals would ensue if the data were of terrible quality. For this reason, by forming distributions we may land in some place of confidence. If a trading model can take a hammering over many iterations of the random % ATR adjustment, we gain confidence that the model is fit to the signal and that noise/errors in OHLC reporting do not throw off the model and make it redundant.

The code for building any resolution may be found on my github with the steps as:

  1. Load ES daily data to index the days of trading. This removes all market holiday closures.
  2. Create a date and time index and join original 1 minute data to the date/time index.
  3. Forward fill missing data. For loop within acts like na.locf.
  4. Find half day market closures. Here we load 30 minute data and loop through looking for early closures and late market opens (once: Sep 11, 2002)
  5. Save market holiday dates and times and loop through the series filling in the holiday times with “” or 0.0
  6. Reduce to remove all 0 and “”
  7. Build any time frame resolution. 10min, 15 min, 30 min etc…

Hope you enjoy and can save time vs. downloading each and every time resolution 🙂

See the bar plot of the number of errors in the re-sampled data vs TradeStation:

(Figure: Errors – bar plot of discrepancies by OHLC field)

julia> out_all
6×2 DataFrames.DataFrame
│ Row │ x1    │ x2 │
├─────┼───────┼────┤
│ 1   │ Date  │ 0  │
│ 2   │ Time  │ 0  │
│ 3   │ Open  │ 58 │
│ 4   │ High  │ 9  │
│ 5   │ Low   │ 13 │
│ 6   │ Close │ 0  │

There are 58 discrepancies involving the open price. There are a total of 67,387 30 minute bars in the example, so this represents 0.0861% of the total sample.

TradeStation had missing data around the open for the 1 minute prices. I forward filled the previous day's close in this case, where it would be more accurate to backfill the next available quote on the same day. I will likely update this pretty soon. There were other special closes, such as 1 minute silences, which I didn't account for in my data set. A list may be found here:

Nyse-Closings.pdf

Thanks for reading.

Estimating the Hurst Exponent Using R/S Range

In this post we will estimate the Hurst exponent using the R/S method. The Hurst exponent measures the long range memory of a time series (more below). If a series has no memory, i.e. if each point in time is independent from previous points in time, then it is said to be more of a random process. Examples of random processes are Markov processes, Brownian motion and white noise.

A series which trends has autocorrelation, where one point in time is correlated with a lagged version of itself. The autocorrelation of a series may be calculated with the following formula:

ρ(k) = Σ (Yᵢ − Ȳ)(Yᵢ₊ₖ − Ȳ) / Σ (Yᵢ − Ȳ)²

where the numerator sum runs over i = 1, …, N − k and the denominator sum over i = 1, …, N.

In English, the top part of the equation (numerator) is the covariance, where Yᵢ is the series and Yᵢ₊ₖ is a lagged version of itself. The bottom part of the equation (denominator) is the variance.

So we simply calculate the covariance of the series and a lagged version of itself and divide by the variance.
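
In Julia this is a one-liner, using the same covariance-over-variance definition:

using Statistics

# Autocorrelation of a series y at lag k: cov(Y_i, Y_i+k) / var(Y).
autocor_at_lag(y::Vector{Float64}, k::Int) = cov(y[1:end-k], y[1+k:end]) / var(y)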

We may plot the autocorrelation of the SP500 futures contract close prices.
source: https://www.quandl.com/data/CHRIS/CME_SP1-S-P-500-Futures-Continuous-Contract-1-SP1-Front-Month

I cannot validate the accuracy of this data, but it is good enough for illustration purposes.

(Figure: auto_cor.png – autocorrelation of SP500 futures close prices)

This is far from random: close prices have a high degree of correlation with lagged versions of themselves.

Now we may estimate the Hurst exponent of an SP futures return series using the rescaled range (R/S) method. This is the original method from Harold Edwin Hurst, who devised the formula to measure the mean reversion tendency of the Nile river.

We shall estimate H for a 1 bar return series and a 100 bar return series.

The R/S calculation steps are depicted on Wikipedia.

The Hurst exponent is more of an estimate than an absolute calculation. Corresponding H values signal:

H values less than 0.5 = anti persistent behavior, mean reverting
H values of 0.5 = random process
H values greater than 0.5 = persistent behavior, trending

The Hurst exponent procedure is as follows; it calculates the R/S range and estimates the Hurst exponent by regressing log(R/S) against log(n):

1. Calculate a return series. We will estimate the Hurst exponent on a 1 bar return series. The thesis here is that it will be close to a random process if indeed each bar is independent from the last, with no autocorrelation, H = 0.5. If we pick a longer return series, such as 100 bars, we would expect to see higher H values and higher autocorrelation coefficients.
2. The Hurst exponent is estimated by regression of a power law, log(R/S) vs log(n). The slope is the Hurst exponent. For that reason, logarithmically spaced block sizes were chosen at lags: [8,16,32,64,128,256,512,1024].
3. Calculate the mean for each block size, in this case over windows 1:8, 2:9, etc., or i-n+1:i. Do this for each block size.
4. For said block size, subtract the mean from each value in the return series.
5. Sum the deviations from the mean.
6. Find the maximum and minimum of the summed deviations from the mean.
7. Find R, the range of the summed deviations from the mean, by subtracting the minimum from the maximum.
8. Calculate the standard deviation of the deviations from the mean.
9. Calculate R/S, the rescaled range, by dividing R by the standard deviation of the deviations from the mean.
10. After rolling through the series calculating the R/S value for each block size, take the mean R/S value for each block size, so that for each lag (block size) we have one mean value.
11. Perform a regression of log2(mean_RS) vs log2(n_lag). The slope is the Hurst exponent. The procedure above is detailed below using the Julia language:


####################################################################################
# Rolling Hurst
####################################################################################

# initialize outputs
m = Array{Float64}(length(d_es_close),0)
log_n_out = m
out_RS = m

# Set lags (range or specific)
max_lag = 100
n = 200
lags = n:max_lag
# or specific lags #comment out where necessary
lags = [8,16,32,64,128,256,512,1024]
# Specify return series lag
n_lag = 2000

i=1
j=1
c=1
for j = lags
    n = lags[c]
    # Calculate returns of the series
    #n_lag = lags[c] # set n_lag 1 for 1 day returns
    rets = zeros(d_es_close)
    n_lag = n_lag
    for i in n_lag+1:length(d_es_close)
        rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
    end
    #rets = d_es_close
    # Find mean width of lookback
    mean_out = zeros(rets)
    for i = n:size(rets,1)
        mean_out[i] = mean(rets[i-n+1:i])
    end
    # Subtract deviations from the mean
    dev_mean_out = zeros(mean_out)
    for i = n:size(mean_out,1)
        dev_mean_out[i] = (rets[i] - mean_out[i])
    end
    # Roll sum the deviations from the mean
    sum_out = zeros(dev_mean_out)
    for i = n:size(dev_mean_out,1)
        sum_out[i] = sum(dev_mean_out[i-n+1:i])
    end
    # Find the maximum and minimum of sum of the mean deviations
    max_out = zeros(sum_out)
    for i = n:size(sum_out,1)
        max_out[i] = maximum(sum_out[i-n+1:i])
    end
    min_out = zeros(sum_out)
    for i = n:size(sum_out,1)
        min_out[i] = minimum(sum_out[i-n+1:i])
    end
    # R = Range, max - min
    R_out = zeros(dev_mean_out)
    for i = n:size(dev_mean_out,1)
        R_out[i] = max_out[i] - min_out[i]
    end
    # Rolling standard deviation of the deviations from the mean
    stdev_out = zeros(rets)
    for i = n:size(rets,1)
        stdev_out[i] = sqrt(var(dev_mean_out[i-n+1:i]))
    end
    # Calculate rescaled range (R_out / stdev of the deviations from the mean)
    RS_out = zeros(R_out)
    for i = n:size(R_out,1)
        RS_out[i] = R_out[i] / stdev_out[i]
    end
    # Calculate log_n (n)
    index = fill(n,length(rets))
    log_n = zeros(rets)
    for i = n:size(index,1)
        log_n[i] = log2(index[i])
    end

    # out
    log_n_out = hcat(log_n_out, log_n)
    #log_rs_out = hcat(log_rs_out, log_RS)
    out_RS = hcat(out_RS, RS_out) # re-scaled range

    c = c+1
end

# access dims of the matrix
# row ,col
#log_rs_out[20,:]

# Calculate expected value for R/S over various n
# Take the rolling mean over each varying n lag at n width
expected_RS = zeros(size(out_RS,1), size(out_RS,2))
n = 100
for j = 1:size(out_RS,2)  # loop on column dimension
    for i = n:size(out_RS,1) # loop on row dimension
        expected_RS[i,j] = mean(out_RS[i-n+1:i,j])
    end
end

# log 2
expected_log_RS = zeros(size(out_RS,1), size(out_RS,2))
c = 1
for j = 1:size(expected_log_RS,2)  # loop on column dimension
    for i = n:size(expected_log_RS,1) # loop on row dimension
        expected_log_RS[i,j] = log2(expected_RS[i,j])
    end
    c = c+1
end

# Regression slope of log(n) and expected value of log(R/S)
# x = log(n), y = log(R/S)
b_slope = zeros(size(out_RS,1))
A_intercept = zeros(size(out_RS,1))

for i = n:size(out_RS,1)
    xi = log_n_out[i,:]  # grab the row for varying lags of log_n
    yi = expected_log_RS[i,:] # grab the row for varying lags of r/s value
    # Mean of X (Mx)
    Mx = mean(xi)
    # Mean of Y (My)
    My = mean(yi)
    # Standard deviation of X (sX)
    sX = sqrt(var(xi))
    # Standard Deviation of Y (sY)
    sY = sqrt(var(yi))
    # Correlation of x and y (r)
    r = cor(xi,yi)
    # slope (b) = r * (sY/sX)
    b = r * (sY/sX)
    # find intercept A = My - b*Mx
    A = My - (b*Mx)

    # out
    b_slope[i] = b
    A_intercept[i] = A
end

Using the code above we may plot the Hurst exponent for a 1 bar return series.

(Figure: rolling Hurst exponent, 1 bar return series)

If we have a random process, successive points in time are independent from each other. We see Hurst values fluctuating around the 0.50 area, with H near 0.75 in 2009 during a strong trending period.

We may see the effect of analyzing a longer return stream, let's say a 100 bar return series:

(Figure: H_close.png – rolling Hurst exponent, 100 bar return series)

We see Hurst values closer to 1 for a 100 bar return series.

Furthermore to see how the Hurst exponent and auto correlation are related, we may calculate a rolling auto correlation:

# Rolling Autocorrelation
# Sliding window (k width)
mean_out = zeros(size(d_es_close,1))
var_out = zeros(size(d_es_close,1))
cov_out = zeros(size(d_es_close,1))
autocor_out = zeros(size(d_es_close,1))
devmean_out = zeros(size(d_es_close,1))
n=1000
k = 1 # set lag
n_lag = 100 # return series lag

# Calculate the n_lag bar return series
rets = zeros(size(d_es_close,1))
for i in n_lag+1:length(d_es_close)
    rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
end

# lag the series by k
lagged_series = [fill(0,k); rets[1:end-k]]

# Calculate the mean of the rolling  sample
for i = n:size(rets,1)
    mean_out[i] = mean(rets[i-n+1:i])
end

# Calculate deviations from the mean
for i = n:size(rets,1)
devmean_out[i] = rets[i] - mean_out[i]
end

# Calculate rolling variance
for i = n:size(rets,1)
    var_out[i] = var(rets[i-n+1:i])
end

# Calculate rolling covariance
for i = n:size(rets,1)
    if i+k >= size(rets,1)
        break
    end
    cov_out[i] = cov(rets[i-n+1:i],lagged_series[i-n+1:i])
end

# Rolling Autocorrelation
for i =n:size(rets,1)
    autocor_out[i] = cov_out[i] / var_out[i]
end

For a 1 bar return series at lag k=1:

(Figure: auto_cor_1_bar – rolling autocorrelation, 1 bar returns)

We see very little correlation from one point in time to the next. The respective Hurst exponent is in the 0.40 to 0.55 range.

And the autocorrelation for a 100 bar return series at lag k=1:

(Figure: auto_cor_100_bar.png – rolling autocorrelation, 100 bar returns)

We see strong correlation for the 100 bar return series, with correlation coefficients greater than .96 and respective Hurst exponents closer to 1.0.

So what's the point of all this?

As depicted above, information pertaining to the (non) randomness of a series may be of some benefit when designing models to fit to the data. The autocorrelation / Hurst exponent may be used to complement existing strategies, signalling the type of regime the market is currently under, and may add to strategies which better suit current conditions.

As a note of interest, on a 1 bar daily return series for the SPY ETF at lags 2:20:

(Figure: H_two_twenty – rolling Hurst exponent, SPY daily returns, lags 2:20)

Values of H are below .50 and in mean reversion territory. Let's shorten the lags one more time, to 2:10.

(Figure: H_two_ten – rolling Hurst exponent, SPY daily returns, lags 2:10)

We see significant mean reversion H values. At holding periods under 20 bars, a mean reversion model would be best suited.

A way I like to think of this is: how quickly does something diffuse from the mean? A mean reverting series is almost stuck in the mud, with prices centered around a mean. I like to think of it as a compressed coil, or, in terms of the fractal dimension, which pertains to the roughness of a surface, mean reversion is most certainly rough. On the other side, a strong trend has a faster rate of diffusion from the mean, or in the fractal dimension a smooth surface. The S&P500 is a blend of this action, where on a shorter holding period there is a mean reversion tendency (rough) but on a longer time horizon the market displays trending behavior (smooth).

In closing and to recap: we see how the Hurst exponent and the autocorrelation of a series are related to each other. Random processes show no dependence from one point in time to a successive point in time, having no correlation and a Hurst value of 0.5. On the other side of this, a series showing persistent behavior, a trending series, displays high correlation coefficients where successive points in time are correlated with each other, and H values closer to 1.

Code may be found on my github.

Julia – Download Free Data Using Alphavantage API

For those of you wanting to get started with or become familiar with the Julia language, I wrote a small script to import .csv data using the Alphavantage API.

In this example we demonstrate how to download a single .csv file, as well as how to batch download multiple .csv files and export them as .csv.

First let's download a single .csv: AAPL 1 minute data using the Alphavantage API. You will need to insert your API key.

(Code screenshot: single .csv download)
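
Since the screenshot above may not render everywhere, here is a rough sketch of what that single-file download looks like, using the same HTTP/CSV approach as the SPY script earlier in this post (current CSV.jl syntax; insert your own API key):

using HTTP, CSV, DataFrames

# Rough sketch: download AAPL 1 minute bars from the Alphavantage API as .csv.
url = string("https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY",
             "&symbol=AAPL&interval=1min&outputsize=full",
             "&apikey=your_api_key&datatype=csv")
res = HTTP.get(url)
aapl = CSV.read(IOBuffer(res.body), DataFrame)   # parse the response body
CSV.write("AAPL_1min.csv", aapl)                 # export to .csv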

And plot the output:

(Figure: AAPL_1min – plot of AAPL 1 minute close prices)

If we wish to download multiple .csv files and export to .csv we may:

(Code screenshot: batch .csv download)

Those of you using R may visit: Free Data: Alternative to Yahoo Finance

Full Julia code on my github

Speed Check! Julia Vs R Back Test Script

In a quest for speed enhancements over R, I opted to look at the Julia language. It is a high level programming language touting similar speed to C. I find the syntax not all that different from Python and R. If you know how to solve a problem in those languages, the same logic applies in Julia; you only have to learn a slightly new but similar syntax.

I created a very simple back test script. The strategy is to stay long the ES mini when the close price is over the 200 period moving average applied to 30 minute bars (200 * 30 minutes is a 6000 minute moving average). Close the position when the ES crosses under the 6000 minute moving average.

I kept the functions and methods of calculation mostly similar between the two languages as stepped below:

1. Load .csv data, 30 minute ES time
2. Create a date and time column, convert to Date format
3. Create 200 Bar SMA
4. Create Long Signals
5. Lag the Long Signal forward +1 to avoid look ahead bias
6. For loop to calculate bar to bar returns
7. Subset Data to remove 200 missing data on SMA 200 creation
8. Calculate strategy returns and buy and hold returns

I excluded any plotting processes as right now I am plotting within the IDE. For R I'm using RStudio and for Julia I am using Atom / Juno.

Lets now get to the code showing the backtest script for both R and Julia:

Load Packages R:

require(TTR)
require(lubridate)
require(dplyr)

Load Packages Julia:

using DataFrames
using Indicators

Load .txt Data R

df <- read.csv("C:/Users/Andrew.Bannerman/Desktop/Julia/30.min.es.txt", header=TRUE,stringsAsFactors = FALSE)

Load txt Data Julia:

df = readtable("30.min.es.txt", header=true)

Lets check to see how many rows the ES 30 minute data has:

julia> nrow(df)
223571

Next lets make a Date Time Column and convert to Date Time format in R:

# Make date time column
df$Date_Time <- paste(df$Date,df$Time)
df$Date_Time <- mdy_hm(df$Date_Time)

Make the Date Time column in Julia (I couldn't find a clean Julia equivalent of R's paste() function!) and convert to DateTime format:

a = df[:Date]
b = df[:Time]
c = map(join,zip(a,b), " ")
out = String[]
temp = String[]
for i in 1:length(a)
    temp = map(join,zip([a[i]],[b[i]]), " ")
    append!(out,temp)
end
df[:Date_Time] = out
df[:Date_Time] = DateTime.(df[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))

Next we can create the 200SMA and Calculate the Long Signal, first R:

# Create Sma
df$sma_200 <- SMA(df$Close,200)
# Create long signal
df$Long_Signal <- ifelse(df$Close > df$sma_200,1,0)
df$Long_Signal <- dplyr::lag(df$Long_Signal,1) # lag forward avoid look ahead bias

And Julia:

# Create simple moving average
# using Indicators
Close = convert(Array, df[:Close])
sma_200 = sma(Close,n=200)
df[:Close_200sma] = sma_200
# Create Signals
# Stay long over 200sma
# Exit positions below 200sma
# use ifelse() - see https://en.wikibooks.org/wiki/Introducing_Julia/Controlling_the_flow
# remember the . in front of (.>) for vectorization!
df[:Signal_Long] = ifelse.(df[:Close] .> df[:Close_200sma],1,0)
# Lag data +1 forward
# Avoid look ahead bias
df[:Signal_Long] = [0; df[1:end-1,:Signal_Long]]

Next we can calculate Close to Close Returns. From this we multiply the returns by the strategy signal 1 or 0.

First R:

# For loop for returns
out <- vector()
for (i in 2:nrow(df)){
out[i] = df$Close[i]/df$Close[i-2+1] - 1.0
}
df <- cbind(df,out)
colnames(df)[12] = "Close_Ret"
# Calculate strategy Returns
df$Sig_Rets <- df$Long_Signal * df$Close_Ret
df[is.na(df)] <- 0

And same for Julia:

# Calculate Close to Close Returns
Close = df[:Close]
x = convert(Array, Close)
out = zeros(x)
for i in 2:size(Close,1)
    out[i] = Close[i]/Close[i-2+1] - 1.0
end
df[:Close_Rets] = out
# Calculate signal returns
df[:Signal_Rets] = df[:Signal_Long] .* df[:Close_Rets]

And finally we calculate cumulative returns:

First R:

# Calculate Cumulative Returns
# Buy and hold and Strategy returns
# Subset Data To start after SMA creation
df = df[201:nrow(df),]
df$Signal_cum_ret <- cumprod(1+df$Sig_Rets)-1
df$BH_cum_ret <- cumprod(1+df$Close_Ret)-1

And Julia:

# Calculate Cumulative Returns
df = df[201:end,:]
df[:Cum_Rets] = cumprod(1+df[1:end, :Signal_Rets])-1
df[:BH_Cum_Rets] = cumprod(1+df[1:end, :Close_Rets])-1

Next let's wrap the script in a for loop, run it 100 times and take the mean time (full code on my github).

The mean time result for 100 iterations using R:

out_results
Time
1 4.881509
2 4.550159
3 4.762161
4 4.847419
5 5.260049
6 4.715544
7 4.617849
8 4.642842
9 4.933652
10 4.660920

mean(out_results$Time)
[1] 4.582826

And the mean time result for 100 iterations using Julia:

julia> final_out
100-element Array{Int64,1}:
 2321
 1974
 2123
    ⋮
 1943
 1933
 2083

julia> print(mean(final_out))
1957.93
julia> 1957.93/1000  # Convert milliseconds to seconds
1.9579300000000002

We see that on average Julia took 1.95 seconds to complete each back test iteration. The Julia script contained two for loops vs one for loop in R; I didn't play to R's vectorized strengths in this regard. But on an almost exactly code-for-code speed check, Julia comes out on top, beating R on average by 2.624896 seconds per script iteration.

After 100 iterations R total time for completion:

> sum(out_results$Time)
[1] 458.2826

or 7.6380433333 minutes.

And total Time for Julia:

julia> print(sum(final_out))
195793
julia> 195793 / 1000
195.793

or 3.263216667 minutes.

In this example, after running a back test script 100 times and taking the average and total time for completion, we see Julia is 2.34 times faster than R.
It should be noted that each function is pretty standard to each language. I used Julia's DataFrames package versus using straight Arrays; using Arrays might be faster than working with DataFrames. We see no slow down at all using for loops in Julia. My hunch is that removing the for loop in R would get the time closer to Julia, but I'm too lazy to check this 🙂 (OK, I'm not: if we play to the vectorized theme of R and replace the slow for loop for calculating returns with data.table:

require(data.table)
df = data.table(df)
df[, Close_Ret := (Close / shift(Close))-1]

Speed improves, with a single script run taking:

Time difference of 2.614989 secs
)

This is my first Julia script, so if you spot anywhere I can make the code more efficient, drop me a line.

A package similar to TTR for financial indicators is Julia's Indicators package.

I like working with RStudio, and a similar IDE for Julia is Juno / Atom.

(Screenshot: the Juno / Atom IDE)

Finally:

Here is the back test results from R / Julia:

plot(df$Signal_cum_ret,type="l",main="R 200SMA Back Test Result")

(Figure: Rplot431 – R 200SMA back test result)

# Plot
using StatPlots
gr(size=(1500 ,1000))
@df df plot(:Date_Time, [:Cum_Rets :BH_Cum_Rets], title = "SPY Long Over 200sma", xlab = "Date", ylab = "Cumulative Returns",colour = [:lightgreen :pink],legend = :topleft)
savefig("myplot.png")

(Figure: myplot.png – Julia back test result, strategy vs buy and hold)

R Code = https://gist.github.com/flare9x/2d73e73218967699c035d6d70fa4ae8a
Julia Code = https://gist.github.com/flare9x/7d1d41856ffbe3106983d15885d8a0cc