Stock GVP – Mean Reverting Series

Let us explore the ticker symbol GVP. We will test for mean reversion with the Hurst exponent and calculate the half life of mean reversion.

First, let's plot the daily closing prices:

library(ggplot2)
# new.df is the data frame of GVP prices built by the script below
ggplot(new.df, aes(x = Date, y = Close)) +
  geom_line() +
  labs(title = "GVP Close Prices", subtitle = "19950727 to 20170608") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, size = 9),
        plot.caption = element_text(size = 7))

(Figure: GVP close prices, 1995-07-27 to 2017-06-08)

Let's compute the Hurst exponent to test for mean reversion over the entire history of GVP. For this test we use short-term lags of 2 to 20 days.

# Hurst Exponent
# Andrew Bannerman
# 8.11.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(zoo)
require(lattice)

# Data path
data.dir <- "D:/R Projects"
output.dir <- "D:/R Projects"
data.read.spx <- paste(data.dir,"GVP.csv",sep="/")

# Read data
read.spx <- read.csv(data.read.spx,header=TRUE, sep=",",skip=0,stringsAsFactors=FALSE)

# Convert Values To Numeric
cols <-c(3:8)
read.spx[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date Column [1]
read.spx$Date <- ymd(read.spx$Date)

# Make new data frame
new.df <- data.frame(read.spx)

# Subset Date Range
#new.df <- subset(new.df, Date >= "2000-01-06" & Date <= "2017-08-06")
#new.df <- subset(new.df, Date >= as.Date("2017-01-07") ) 

#Create lagged variables
lags <- 2:20

# Function for finding differences in lags: today's close - close 'n' periods ago
getLAG.DIFF <- function(lagdays) {
  function(new.df) {
    # Pad the front with NA so the result has one value per row of new.df
    c(rep(NA, lagdays), diff(new.df$Close, lag = lagdays))
  }
}
# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow=nrow(new.df), ncol=0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(new.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n)paste("lagged.diff.n", n, sep=""))

# Bind to existing dataframe
new.df <-  cbind(new.df, lag.diff.matrix)
head(new.df)

# Calculate variances of 'n period' differences (one column per lag)
variance.vec <- apply(lag.diff.matrix, 2, function(x) var(x, na.rm=TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))
# Print general linear regression statistics
summary(log.linear)
# Plot log of variance of 'n' lag differences vs log of lag
xyplot(log(variance.vec) ~ log(lags),
       main="GVP log variance of price diff Vs log time lags",
       xlab = "log(lag)",
       ylab = "log variance of 'n' lag differences",
       grid = TRUE,
       type = c("p","r"),col.line = "red",
       abline = list(h = 0))

# Slope of log(variance) on log(lag) equals 2H, so divide by 2
hurst.exponent <- coef(log.linear)[2]/2
hurst.exponent

(Figure: GVP log variance of price differences vs. log time lags)

(Figure: output of summary(log.linear))

If we divide the log(lags) coefficient by 2, we obtain a Hurst exponent of 0.4598435.
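Why divide by 2: for a series with Hurst exponent H, the variance of τ-day price differences grows as a power of τ, so the slope of the log-log regression above is 2H:

$$\operatorname{Var}\big(y(t+\tau) - y(t)\big) \propto \tau^{2H} \;\Rightarrow\; \log \operatorname{Var}\big(y(t+\tau) - y(t)\big) = 2H \log \tau + c$$

For a random walk the variance grows linearly in τ (slope 1, H = 0.5); for a mean-reverting series it grows more slowly (slope below 1, H < 0.5).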

Remember:

H < 0.5: mean reversion

H = 0.5: random walk

H > 0.5: momentum
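As a sanity check on these thresholds, here is a minimal sketch that applies the same variance-of-lagged-differences method to two simulated series rather than GVP: a random walk, which should give H near 0.5, and a stationary AR(1), which should give H well below 0.5. The function name `hurst_exponent` and the simulation settings are illustrative assumptions.

```r
# Hurst exponent from the variance of lagged differences (same idea as above)
hurst_exponent <- function(prices, lags = 2:20) {
  # Variance of the tau-period differences for each lag tau
  variances <- sapply(lags, function(tau) var(diff(prices, lag = tau)))
  # Slope of log(variance) on log(lag) equals 2H
  fit <- lm(log(variances) ~ log(lags))
  unname(coef(fit)[2] / 2)
}

set.seed(42)
n <- 5000
random.walk <- cumsum(rnorm(n))                                  # expect H near 0.5
mean.reverting <- as.numeric(arima.sim(list(ar = 0.5), n = n))   # stationary AR(1), expect H < 0.5

h.rw <- hurst_exponent(random.walk)
h.mr <- hurst_exponent(mean.reverting)
cat("Random walk H:", round(h.rw, 3), "\n")
cat("AR(1) H:", round(h.mr, 3), "\n")
```

The stationary series gives a much lower H because the variance of its differences levels off instead of growing with the lag.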

Great.

Let's apply a simple linear strategy to see how it performs on this series. We set up a rolling z-score, buy when the z-score crosses below 0, and sell when it crosses back above 0. We use an arbitrarily chosen look-back of 10 days.
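The strategy code is not shown in the post, so here is a minimal sketch of the rule as described, under stated assumptions: the position is long while the rolling z-score is below 0 and flat otherwise, today's signal trades on tomorrow's return to avoid look-ahead, and there are no commissions. The function `zscore_strategy` and the simulated prices are hypothetical stand-ins.

```r
# Hypothetical z-score rule: long while z < 0, flat otherwise, no commissions
zscore_strategy <- function(close, lookback = 10) {
  n <- length(close)
  z <- rep(NA_real_, n)
  for (i in lookback:n) {
    window <- close[(i - lookback + 1):i]          # rolling window ending today
    z[i] <- (close[i] - mean(window)) / sd(window)
  }
  # 1 = long, 0 = flat; NA z-scores (warm-up period) count as flat
  position <- ifelse(!is.na(z) & z < 0, 1, 0)
  daily.ret <- c(0, diff(close) / head(close, -1))
  # Yesterday's signal earns today's return
  strat.ret <- c(0, head(position, -1)) * daily.ret
  data.frame(z = z, position = position, equity = cumprod(1 + strat.ret))
}

# Usage on simulated mean-reverting prices (not GVP data)
set.seed(1)
close <- 100 + as.numeric(arima.sim(list(ar = 0.8), n = 500))
result <- zscore_strategy(close, lookback = 10)
tail(result$equity, 1)   # growth of $1
```

The `equity` column is the compounded growth of $1, the quantity plotted in the results below.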

Here are the results:

(Figure: compounded growth of $1, 10-day look-back, 1995 onward)

The plot above shows the compounded growth of $1: since 1995, $1 has grown to over $800, an increase of over 79,900%.

Next, let's calculate the half-life of mean reversion with a linear regression. The dependent variable is the price difference between today's close and yesterday's close. The independent variable is yesterday's close minus the mean of yesterday's closes over the sample.

Note that we use the previous 100 days of data for this test:

# Use the previous 100 closes, ordered oldest to newest
random.data <- tail(new.df$Close[order(new.df$Date)], 100)

# Calculate y(t-1) and y(t) - y(t-1)
y.lag <- head(random.data, -1)      # y(t-1): yesterday's close
y.today <- tail(random.data, -1)    # y(t): today's close
y.diff <- y.today - y.lag           # Today's close - yesterday's close
prev.y.mean <- y.lag - mean(y.lag)  # Yesterday's close minus its mean
final.df <- data.frame(y.diff = y.diff, prev.y.mean = prev.y.mean)  # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

We obtain a half life of 4.503093 days.

Next, let's set the strategy's look-back period equal to the half-life and see whether that improves results. The original look-back of 10 days was chosen arbitrarily. The result with a look-back of 4.5 days, rounded to 5, is below:

(Figure: compounded growth of $1, 5-day look-back, 1995 onward)

From 1995 to roughly the present day, the result did not improve significantly, but the plot shows a large uptick in the equity curve from 2013 onward. Let's subset the data to include only the period after 2013 and rerun both the 10-day and the 5-day look-backs to see whether setting the look-back from the mean reversion half-life helps.

First, the result with the arbitrarily chosen 10-day look-back:

(Figure: compounded growth of $1, 10-day look-back, 2013 onward)

$1 has grown to $8, a 700% increase.

Next, the 5-day look-back, derived from the 4.5-day mean reversion half-life rounded to 5 days:

(Figure: compounded growth of $1, 5-day look-back, 2013 onward)

Using a look-back equal to the mean reversion half-life of 5 days (rounded), $1 has grown to over $15, an increase of over 1,400%.

Let's compute the Hurst exponent for both periods: the first from 1995 to 2013, the second from 2013 to roughly the present day:

1995 to 2013: H = 0.4601632
2013 onward: H = 0.4230494

So the Hurst exponent indicates stronger mean reversion after 2013. Testing data from 2016 onward and from 2017 onward gives H = 0.3890816 and H = 0.2759805, respectively.
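The per-period tests above amount to subsetting the data frame by date and rerunning the Hurst calculation on each window. A minimal sketch of that pattern follows, using simulated prices with a regime change (random walk, then mean-reverting AR(1)) in place of GVP; the `Date` and `Close` column names mirror `new.df`, while the function name and the simulation are assumptions.

```r
# Hurst exponent from the variance of lagged differences
hurst_exponent <- function(prices, lags = 2:20) {
  variances <- sapply(lags, function(tau) var(diff(prices, lag = tau)))
  unname(coef(lm(log(variances) ~ log(lags)))[2] / 2)
}

set.seed(7)
dates <- seq(as.Date("2005-01-01"), by = "day", length.out = 4000)
# First half: random walk; second half: AR(1) around the last random-walk level
rw <- 100 + cumsum(rnorm(2000, sd = 0.5))
mr <- tail(rw, 1) + as.numeric(arima.sim(list(ar = 0.7), n = 2000))
prices <- data.frame(Date = dates, Close = c(rw, mr))

# Split the sample at a date and compute H on each window
split.date <- dates[2001]
h.before <- hurst_exponent(subset(prices, Date < split.date)$Close)
h.after <- hurst_exponent(subset(prices, Date >= split.date)$Close)
cat("H before:", round(h.before, 3), " H after:", round(h.after, 3), "\n")
```

Shorter windows give noisier estimates of H, which is worth keeping in mind when comparing values from windows of very different lengths.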

Next, let's pick some other time frames between 1995 and 2013.

From 2000 to 2003, H = 0.5198083, which is closer to a random walk.

From 2003 to 2008, H = 0.4167166, which is more mean reverting. However, this value is actually lower than the post-2013 value of 0.4230494. So in this case, H alone does not tell us how large the gains should be.

This may be due to other factors, such as trade frequency, price range and price fluctuations.

Note that this post is largely theoretical: no commissions are included in any of the trades. It demonstrates how to combine statistical tools with a back test.


Half life of Mean Reversion – Ornstein-Uhlenbeck Formula for Mean-Reverting Process

Ernie Chan proposes a method to calculate the speed of mean reversion. He adapts the Augmented Dickey-Fuller (ADF) test equation from discrete time to differential form, which gives the Ornstein-Uhlenbeck formula for a mean-reverting process (see Ornstein Uhlenbeck Process – Wikipedia):

$$dy(t) = (\lambda y(t-1) + \mu)\,dt + d\varepsilon$$

where dε is Gaussian noise. Chan notes that, with the discrete ADF formula

$$\Delta y(t) = \lambda y(t-1) + \mu + \beta t + \alpha_1 \Delta y(t-1) + \dots + \alpha_k \Delta y(t-k) + \epsilon_t$$

a linear regression of Δy(t) against y(t − 1) provides λ, which is then used in the first equation. The advantage of writing the formula in differential form is that it allows an analytical solution for the expected value of y(t):
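To see how the regression recovers λ, here is a sketch that simulates the discrete equation with β = 0 and no lagged-difference terms, Δy(t) = λy(t − 1) + μ + ε(t), using assumed values λ = −0.1 and μ = 5 (so the mean level −μ/λ is 50), then regresses Δy(t) on y(t − 1). All parameter values are illustrative, not estimates from market data.

```r
set.seed(123)
lambda <- -0.1   # assumed speed of mean reversion
mu <- 5          # assumed drift term; long-run mean is -mu/lambda = 50
n <- 5000
y <- numeric(n)
y[1] <- 50
for (t in 2:n) {
  # Discrete form: y(t) - y(t-1) = lambda * y(t-1) + mu + noise
  y[t] <- y[t - 1] + lambda * y[t - 1] + mu + rnorm(1)
}

y.prev <- head(y, -1)     # y(t-1)
dy <- diff(y)             # Delta y(t) = y(t) - y(t-1)
fit <- lm(dy ~ y.prev)    # intercept estimates mu, slope estimates lambda
lambda.hat <- unname(coef(fit)["y.prev"])
mu.hat <- unname(coef(fit)["(Intercept)"])
c(lambda.hat = lambda.hat, mean.level = -mu.hat / lambda.hat,
  half.life = -log(2) / lambda.hat)
```

The estimated slope should land close to −0.1, giving a half-life near −log(2)/(−0.1), about 6.9 periods.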

$$E[y(t)] = y_0 e^{\lambda t} - \frac{\mu}{\lambda}\left(1 - e^{\lambda t}\right)$$

Mean-reverting series have a negative λ; a positive λ means the series does not revert to the mean.

When λ is negative, the expected price decays exponentially toward the value −μ/λ, with a half-life of decay equal to −log(2)/λ (see References).
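The half-life follows directly from the expected-value formula. Adding μ/λ to both sides shows that the expected distance from the mean level −μ/λ shrinks by a factor of e^{λt}:

$$E[y(t)] + \frac{\mu}{\lambda} = \left(y_0 + \frac{\mu}{\lambda}\right) e^{\lambda t}$$

The distance halves when that factor equals one half:

$$e^{\lambda t_{1/2}} = \tfrac{1}{2} \;\Rightarrow\; t_{1/2} = -\frac{\log 2}{\lambda}$$

which is positive only when λ < 0.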

We can regress y(t) − y(t−1) on y(t−1) for the SPY price series with the R code below. For this test we use a look-back period of 100 days rather than the entire price series (1993 inception to present). If we used all of the data, we would include the long recoveries from bear markets. For trading purposes, a shorter sample produces a more meaningful statistical test.

The procedure:
1. Lag the SPY close by one day to get yesterday's close, y(t−1)
2. Compute today's close minus yesterday's close, Δy(t) = y(t) − y(t−1)
3. Demean yesterday's close: y(t−1) − mean(y(t−1))
4. Regress Δy(t) on y(t−1) − mean(y(t−1)); the slope is λ
5. Compute the half-life from the regression output as −log(2)/λ

# random.data: the last 100 SPY closes, ordered oldest to newest (assumed already loaded)
# Calculate y(t-1) and y(t) - y(t-1)
y.lag <- head(random.data, -1) # y(t-1): yesterday's close
y.today <- tail(random.data, -1) # y(t): today's close
y.diff <- y.today - y.lag # Subtract yesterday's close from today's close
prev.y.mean <- y.lag - mean(y.lag) # Subtract the mean of y(t-1) from yesterday's close
final.df <- data.frame(y.diff = y.diff, prev.y.mean = prev.y.mean) # Create final data frame

# Linear Regression With Intercept
result <- lm(y.diff ~ prev.y.mean, data = final.df)
half_life <- -log(2)/coef(result)[2]
half_life

# Linear Regression With No Intercept
result <- lm(y.diff ~ prev.y.mean + 0, data = final.df)
half_life1 <- -log(2)/coef(result)[1]
half_life1

# Print general linear regression statistics
summary(result)

(Figures: regression output)

The regression output shows that the slope is negative, so the process is mean reverting. From summary(result), λ is −0.06165, and −log(2)/λ gives a mean reversion half-life of 11.24267 days.

11.24267 days is the half-life of mean reversion: we expect a deviation from the mean to shrink by half in about 11 days, and by three quarters after two half-lives, or 22.48534 days. However, to trade mean reversion profitably we need not exit exactly at the mean each time. If a trade lasts longer than about 22 days, we may suspect a short-term or permanent regime shift. One can guard against such losses by setting a 'time stop'.
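The arithmetic behind a time stop can be sketched in a few lines, assuming the exponential decay of the expected deviation derived above: the fraction of an initial deviation expected to remain after t days is e^{λt} = 0.5^(t / half-life). The helper name `remaining` is hypothetical; the half-life is the SPY value reported above.

```r
half.life <- 11.24267                    # SPY half-life from the regression above
remaining <- function(t, h) 0.5^(t / h)  # expected fraction of the deviation left after t days

time.stop <- 2 * half.life               # two half-lives: 22.48534 days
remaining(half.life, half.life)          # 0.5
remaining(time.stop, half.life)          # 0.25
ceiling(time.stop)                       # maximum holding period in whole days: 23
```

A time stop at two half-lives therefore exits a trade when, on average, a quarter of the original deviation would still be outstanding.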

The 11.24267-day half-life is short enough for an interday trading horizon. With a longer half-life, we might wait a long time for the series to revert to the mean. Once we determine that the series is mean reverting, we can trade it profitably with a simple linear model that uses a look-back period equal to the half-life. A previous post explored a simple linear z-score model: https://flare9xblog.wordpress.com/2017/09/24/simple-linear-strategy-for-sp500/

In that post, the 11-day look-back was found by a 'brute force approach' (perhaps partly by luck); it produced the best result for SPY.

During that optimization we also noted that moving the look-back period above or below 11 days decreased performance.

We illustrate the effect of making the look-back period shorter and longer than the half-life. For simplicity, we compare total cumulative returns:

(Figures: cumulative returns with look-back periods of 10, 11 and 12 days)

We see that a look back of 11 days produced the highest cumulative compounded returns.
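A brute-force sweep like the one described can be sketched by running the same long-while-z-below-0 rule for each candidate look-back and comparing the final compounded equity. This is a hypothetical illustration on simulated mean-reverting prices, not the SPY back test; `final_equity` is an assumed helper, and commissions are ignored.

```r
# Growth of $1 for the z-score rule at one look-back (long while z < 0)
final_equity <- function(close, lookback) {
  n <- length(close)
  z <- rep(NA_real_, n)
  for (i in lookback:n) {
    window <- close[(i - lookback + 1):i]
    z[i] <- (close[i] - mean(window)) / sd(window)
  }
  position <- ifelse(!is.na(z) & z < 0, 1, 0)
  daily.ret <- c(0, diff(close) / head(close, -1))
  prod(1 + c(0, head(position, -1)) * daily.ret)   # yesterday's signal, today's return
}

set.seed(11)
close <- 100 + as.numeric(arima.sim(list(ar = 0.9), n = 1000))  # simulated prices
lookbacks <- 5:15
results <- sapply(lookbacks, function(lb) final_equity(close, lb))
names(results) <- lookbacks
best.lookback <- lookbacks[which.max(results)]
round(results, 3)
best.lookback
```

Comparing neighbouring look-backs, as the post does with 10, 11 and 12 days, shows how sensitive the result is to this single parameter.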

Ernie Chan goes on to ask why we should bother with statistical testing at all. The answer is that specific trading rules trigger only when their conditions are met and therefore tend to skip over data. Statistical tests use all of the data, including the data a model may skip over, and so produce results with higher statistical significance.

Furthermore, once we confirm that a series is mean reverting, we can be assured that a profitable trading strategy exists for it, though not necessarily the strategy we just back tested.

References
Algorithmic Trading: Winning Strategies and Their Rationale – May 28, 2013, by Ernie Chan