Speed Check! Julia vs R Back Test Script

In a quest for speed enhancements over R, I opted to look at the Julia language. It is a high level programming language touting speed similar to C. I find the syntax not all that different from Python and R. If you already know how to solve problems in those languages, the same logic applies in Julia; you only have to learn a slightly new but similar syntax.

I created a very simple back test script. The strategy is to stay long the ES mini when the close price is over the 200 period moving average applied to 30 minute bars (200 * 30 minutes is a 6000 minute moving average). Close the position when the ES crosses under the 6000 minute moving average.

I kept the functions and methods of calculation mostly the same between the two languages, following the steps below:

1. Load the .csv data (30 minute ES bars)
2. Create a date and time column, convert to Date format
3. Create 200 Bar SMA
4. Create Long Signals
5. Lag the Long Signal forward +1 to avoid look ahead bias
6. For loop to calculate bar to bar returns
7. Subset the data to remove the first 200 rows, where the 200 bar SMA is missing
8. Calculate strategy returns and buy and hold returns
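
The steps above can be sketched end to end on a toy series. This is a minimal base R sketch, not the script from the post: it assumes a made-up price vector and a 3 bar moving average standing in for the 200 bar one, and the helper name `sma_toy` is hypothetical (it is not part of TTR).

```r
# Toy close prices (made up for illustration, not ES data)
close <- c(100, 101, 102, 101, 99, 98, 100, 103, 104, 102)
n <- 3  # short window standing in for the 200 bar SMA

# Hypothetical helper: rolling mean, first n - 1 values are NA
sma_toy <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}
sma <- sma_toy(close, n)

# Long signal: 1 when the close is above the SMA, else 0
long_signal <- ifelse(close > sma, 1, 0)

# Lag the signal forward one bar to avoid look ahead bias
long_signal <- c(NA, long_signal[-length(long_signal)])

# Bar to bar returns
close_ret <- c(NA, close[-1] / close[-length(close)] - 1)

# Strategy returns; missing values are treated as flat (0)
sig_ret <- long_signal * close_ret
sig_ret[is.na(sig_ret)] <- 0
close_ret[is.na(close_ret)] <- 0

# Drop the warm-up rows (n - 1 from the SMA plus 1 from the lag), then compound
keep <- (n + 1):length(close)
strategy_cum <- cumprod(1 + sig_ret[keep]) - 1
buy_hold_cum <- cumprod(1 + close_ret[keep]) - 1
```

On this toy series the strategy ends up about 1% while buy and hold ends flat, which is only a property of the made-up prices and says nothing about the real ES result.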

I excluded any plotting processes as right now I am plotting within the IDE. For R I'm using RStudio and for Julia I am using Juno on Atom.

Let's now get to the code showing the back test script for both R and Julia:

Load Packages R:

require(TTR)
require(lubridate)
require(dplyr)

Load Packages Julia:

using DataFrames
using Indicators

Load .txt Data R

df <- read.csv("C:/Users/Andrew.Bannerman/Desktop/Julia/30.min.es.txt", header=TRUE,stringsAsFactors = FALSE)

Load txt Data Julia:

df = readtable("30.min.es.txt", header=true)

Let's check to see how many rows the ES 30 minute data has:

julia> nrow(df)
223571

Next let's make a Date Time column and convert it to Date Time format in R:

# Make date time column
df$Date_Time <- paste(df$Date,df$Time)
df$Date_Time <- mdy_hm(df$Date_Time)

Make Date Time Column Julia (Couldn't find a clean R paste() like Julia function!) and convert to DateTime format:

a = df[:Date]
b = df[:Time]
c = map(join,zip(a,b), " ")
out = String[]
temp = String[]
for i in 1:length(a)
    temp = map(join,zip([a[i]],[b[i]]), " ")
    append!(out,temp)
end
df[:Date_Time] = out
df[:Date_Time] = DateTime.(df[:Date_Time],Dates.DateFormat("mm/dd/yyyy H:M"))

Next we can create the 200SMA and Calculate the Long Signal, first R:

# Create Sma
df$sma_200 <- SMA(df$Close,200)
# Create long signal
df$Long_Signal <- ifelse(df$Close > df$sma_200,1,0)
df$Long_Signal <- dplyr::lag(df$Long_Signal,1) # lag forward avoid look ahead bias

And Julia:

# Create simple moving average
# using Indicators
Close = convert(Array, df[:Close])
sma_200 = sma(Close,n=200)
df[:Close_200sma] = sma_200
# Create Signals
# Stay long over 200sma
# Exit positions below 200sma
# use ifelse() function see - https://en.wikibooks.org/wiki/Introducing_Julia/Controlling_the_flow
# remember . in front of the (.>) for vectorization!
df[:Signal_Long] = ifelse(df[:Close] .> df[:Close_200sma],1,0)
# Lag data +1 forward
# Avoid look ahead bias
df[:Signal_Long] = [0; df[1:end-1,:Signal_Long]]

Next we can calculate close to close returns. We then multiply these returns by the strategy signal (1 or 0) to get the strategy returns.

First R:

# For loop for returns
out <- vector()
for (i in 2:nrow(df)){
out[i] = df$Close[i]/df$Close[i-2+1] - 1.0
}
df <- cbind(df,out)
colnames(df)[12] = "Close_Ret"
# Calculate strategy Returns
df$Sig_Rets <- df$Long_Signal * df$Close_Ret
df[is.na(df)] <- 0

And same for Julia:

# Calculate Close to Close Returns
Close = df[:Close]
x = convert(Array, Close)
out = zeros(x)
for i in 2:size(Close,1)
    out[i] = Close[i]/Close[i-2+1] - 1.0
end
df[:Close_Rets] = out
# Calculate signal returns
df[:Signal_Rets] = df[:Signal_Long] .* df[:Close_Rets]

And finally we calculate cumulative returns:

First R:

# Calculate Cumulative Returns
# Buy and hold and Strategy returns
# Subset Data To start after SMA creation
df = df[201:nrow(df),]
df$Signal_cum_ret <- cumprod(1+df$Sig_Rets)-1
df$BH_cum_ret <- cumprod(1+df$Close_Ret)-1

And Julia:

# Calculate Cumulative Returns
df = df[201:end,:]
df[:Cum_Rets] = cumprod(1+df[1:end, :Signal_Rets])-1
df[:BH_Cum_Rets] = cumprod(1+df[1:end, :Close_Rets])-1

Next let's wrap the script in a for loop, run it 100 times and take the mean time (full code on my GitHub).
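
A timing loop of this kind could look like the following in R. This is a minimal sketch rather than the code from the gist: it assumes the whole back test has been wrapped in a function, and `run_backtest` is a hypothetical stand-in that only does some placeholder work.

```r
# Hypothetical stand-in for the full back test script
run_backtest <- function() {
  x <- cumsum(rnorm(10000))
  invisible(mean(x))
}

n_runs <- 100
times <- numeric(n_runs)  # pre-allocate the result vector
for (i in seq_len(n_runs)) {
  # system.time() reports user, system and elapsed seconds
  times[i] <- system.time(run_backtest())[["elapsed"]]
}

out_results <- data.frame(Time = times)
mean(out_results$Time)  # mean seconds per run
sum(out_results$Time)   # total seconds over all runs
```

Keeping the timed work inside a single function call makes it easy to swap in a different implementation and rerun the same harness.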

The mean time result for 100 iterations using R:

out_results
Time
1 4.881509
2 4.550159
3 4.762161
4 4.847419
5 5.260049
6 4.715544
7 4.617849
8 4.642842
9 4.933652
10 4.660920

mean(out_results$Time)
[1] 4.582826

And the mean time result for 100 iterations Julia:

julia> final_out
100-element Array{Int64,1}:
 2321
 1974
 2123
    ⋮
 1943
 1933
 2083

julia> print(mean(final_out))
1957.93
julia> 1957.93/1000  # Convert milliseconds to seconds
1.9579300000000002

We see that on average Julia took 1.96 seconds to complete each back test iteration. The Julia script contained two for loops versus one for loop in R. I didn't play to R's vectorized strengths in this regard. But on an almost like for like code speed check, Julia comes out on top, beating R on average by 2.624896 seconds per script iteration.

After 100 iterations R total time for completion:

> sum(out_results$Time)
[1] 458.2826

or 7.6380433333 minutes.

And total Time for Julia:

julia> print(sum(final_out))
195793
julia> 195793 / 1000
195.793

or 3.263216667 minutes.
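
The per run and total figures reported above can be cross-checked with a few lines of arithmetic. The numbers below are copied from the output shown; only the variable names are made up for this sketch.

```r
r_mean     <- 4.582826        # seconds, from mean(out_results$Time)
julia_mean <- 1957.93 / 1000  # milliseconds converted to seconds

diff_secs  <- r_mean - julia_mean  # seconds saved per run
mean_ratio <- r_mean / julia_mean  # how many times faster per run

r_total     <- 458.2826       # seconds, from sum(out_results$Time)
julia_total <- 195793 / 1000  # milliseconds converted to seconds
total_ratio <- r_total / julia_total

round(c(diff_secs, mean_ratio, total_ratio), 2)
```

Both ratios round to 2.34, so the mean and the total tell the same story.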

In this example, after running a back test script 100 times and taking the average and total time for completion, we see Julia is 2.34 times faster than R.
It should be noted that each function is pretty standard to each language. I used Julia's DataFrames package rather than plain Arrays; using Arrays might be faster than working with data frames. We see no slowdown at all using for loops in Julia. My hunch is that removing the for loop in R would get the time closer to Julia, but I'm too lazy to check this 🙂 (OK, I'm not: if we play to the vectorized theme of R and replace the slow for loop for calculating returns with data.table:

require(data.table)
df = data.table(df)
df[, Close_Ret := (Close / shift(Close))-1]

Speed improves, with a single script run taking:

Time difference of 2.614989 secs
)
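
The same vectorized return calculation can also be written in base R without data.table. The sketch below uses made-up prices and checks that the vectorized form gives the same result as a for loop of the kind used earlier in the script.

```r
close <- c(100, 102, 101, 105, 104)  # made-up prices

# Loop version: one division per bar
loop_ret <- rep(NA_real_, length(close))
for (i in 2:length(close)) {
  loop_ret[i] <- close[i] / close[i - 1] - 1
}

# Vectorized version: divide each price by the previous one in a single step
vec_ret <- c(NA, close[-1] / close[-length(close)] - 1)

all.equal(loop_ret, vec_ret)
```

The first element is NA in both versions because the first bar has no previous close.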

This is my first Julia script, so if you spot anywhere I can make the code more efficient, drop me a line.

A similar package to TTR for financial indicators is Julia's Indicators package.

I like working with RStudio, and a similar IDE for Julia is Juno on Atom.


Finally:

Here are the back test results from R and Julia:

plot(df$Signal_cum_ret,type="l",main="R 200SMA Back Test Result")


# Plot
using StatPlots
gr(size=(1500 ,1000))
@df df plot(:Date_Time, [:Cum_Rets :BH_Cum_Rets], title = "SPY Long Over 200sma", xlab = "Date", ylab = "Cumulative Returns",colour = [:lightgreen :pink],legend = :topleft)
savefig("myplot.png")


R Code = https://gist.github.com/flare9x/2d73e73218967699c035d6d70fa4ae8a
Julia Code = https://gist.github.com/flare9x/7d1d41856ffbe3106983d15885d8a0cc

Author: Andrew Bannerman

Integrity Inspector. Quantitative Analysis is a favorite pastime.

5 thoughts on “Speed Check! Julia vs R Back Test Script”

  1. I’m a past Python fan who has converted to Julia. I’m a bit surprised Julia is only about 2x faster than R.

    There could be two reasons:

    (1) Most of your code has global scope. Wrapping most things in a function should give you better performance. See https://docs.julialang.org/en/stable/manual/performance-tips/. For a complex program this would lead to a huge difference if you repeatedly execute the function many times.

    (2) Your code does not involve loops. While a simple strategy like in the example could work just fine with vector operations, a more complicated strategy where future trade decisions depend on past trades would often require the use of loops. Once many loops are involved, the performance gap between Julia and the likes of Python/R would be much more dramatic. You can see some benchmarks here: https://julialang.org/benchmarks/


    1. Agree on all points and thank you for sharing the speed link. The Julia documentation is excellent in my opinion. Vectorizing within R calls upon C functions, so in that department, and in this example, there is nothing computationally intensive about it. For bigger jobs, let's say a co-integration script running regressions on hundreds (or thousands) of pairs, R has a very slow regression function, lm(), so these for sure would run faster in Julia. There are faster alternatives to R's lm(), such as fastLm, which uses Rcpp and C for speed enhancements. I guess Cython is Python's equivalent of using Rcpp / C code. However, I feel there is more ‘freedom’ with Julia. Sometimes it is not possible to avoid using for loops. I found many, many instances of this in R, and as we know R is notorious for having slow for loops.

      I’m excited for Julia's future. I have no C experience and the higher level nature of Julia makes picking it up rather easy. So there is a low time investment in learning for the extra speed. They are close to a 0.7 release and v1.0 is not too far away. It should see more package support in the future.


  2. Well, why are you using for loop in R to calculate returns and not using TTR::ROC? And why are you using data.frame and not xts? And why are you not using data.table::fread to load csv files from disk? And why are you not using microbenchmark to measure execution time of R? For example look at how to do it here https://quantstrattrader.wordpress.com/2018/01/24/which-implied-volatility-ratio-is-best/ (except use data.table::fread to load data from csv files).

    In my opinion only after you do at least all of the above you can start compare “speed” of R to any other language. And result and your conclusion on “speed” would be dramatically different.


    1. The goal of the post was to compare like for like code between the languages. No effort was made to write fast R code. On a sort of raw level Julia wins. I routinely use the packages you mentioned. Generally for most quant jobs R suffices. Moving to high computationally expensive machine learning type scopes we seek alternatives.

