Modelling The Hurst Exponent

One of the purposes of the Hurst exponent is to determine whether a price series is trending (momentum), a random walk, or mean reverting. If we know this, we can fit a model that captures the nature of the series.

The Hurst exponent is categorized as:
H < 0.5 = mean reverting
H = 0.5 = random walk
H > 0.5 = momentum
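
In practice an estimated H rarely lands on 0.5 exactly, so a small tolerance band is useful when labelling a series. A minimal helper along these lines (classify.hurst and its tol argument are illustrative, not from any package) might look like:

# Label an estimated Hurst exponent, treating values near 0.5 as a random walk
classify.hurst <- function(h, tol = 0.01) {
  if (h < 0.5 - tol) {
    "mean reverting"
  } else if (h > 0.5 + tol) {
    "momentum"
  } else {
    "random walk"
  }
}

classify.hurst(0.1368407)  # "mean reverting"
classify.hurst(0.4999474)  # "random walk"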

Editable parameters:
mu = mean # The level the series reverts to
eta = theta # The speed of mean reversion. Decrease theta for less mean reversion, increase for more mean reversion
sigma = standard deviation # Change the height of the peaks and valleys with the standard deviation

# Create OU simulation
OU.sim <- function(T = 1000, mu = 0.75, eta = 0.3, sigma = 0.05){
  P_0 = mu # Starting price is the mean
  P = rep(P_0, T)
  for(i in 2:T){
    # Pull back toward mu at speed eta, plus noise proportional to the current price
    P[i] = P[i-1] + eta * (mu - P[i-1]) + sigma * rnorm(1) * P[i-1]
  }
  return(P)
}
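
Because OU.sim() draws from rnorm, every call produces a different path, so your output will differ slightly from the plots shown here. To reproduce a particular run, set a seed first, for example:

# Optional: fix the random seed so the simulated path is reproducible
set.seed(42)
sim <- OU.sim()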

# Plot
plot(OU.sim(), type="l", main="Mean Reversion Sim")

# Save plot to data frame
plot.df <- data.frame(OU.sim())
plot(plot.df$OU.sim.., type="l",main="Mean Reversion Sim")

[Figure: mean reversion simulation, eta = 0.3]

Looks pretty mean reverting.

We stored the simulation in a data frame, so let's run the Hurst exponent calculation to see which H value we obtain.

# Hurst Exponent (varying lags)
require(magrittr)
require(zoo)
require(lattice)

#Create lagged variables
lags <- 2:20

# Function for finding differences in lags. Today's close - 'n' lag period
getLAG.DIFF <- function(lagdays) {
  function(plot.df) {
    c(rep(NA, lagdays), diff(plot.df$OU.sim.., lag = lagdays))
  }
}
# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow=nrow(plot.df), ncol=0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(plot.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n)paste("lagged.diff.n", n, sep=""))

# Bind to existing dataframe
plot.df <-  cbind(plot.df, lag.diff.matrix)

# Calculate Variances of 'n period' differences
variance.vec <- apply(plot.df[,2:ncol(plot.df)], 2, function(x) var(x, na.rm=TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))
# Print general linear regression statistics
summary(log.linear)
# Plot log of variance 'n' lags vs log time
xyplot(log(variance.vec) ~ log(lags),
       main = "OU Simulation: Lagged Price Differences Variance vs Time Lags",
       xlab = "Log of Time Lags",
       ylab = "Log of Variance of 'n' Lag Differences",
       grid = TRUE,
       type = c("p","r"), col.line = "red")

hurst.exponent = coef(log.linear)[2]/2
hurst.exponent

We obtain a Hurst exponent of 0.1368407, which is significantly mean reverting. (The slope of the log-log regression is divided by two because the variance of the lagged differences scales with the lag raised to the power 2H.)

Let's change some of the parameters of the simulation to create a moderately mean reverting series. We can alter theta: if we change eta = 0.3 to eta = 0.04, we obtain this output:

[Figure: mean reversion simulation, eta = 0.04]

It looks less mean reverting than the first series, and H = 0.4140561. This is below 0.50 and is still considered mean reverting.

Let us test the SPY from 1993 (inception) to present (9.23.2017) to see what the H value is. The chart below is a linear regression of the log variance of the SPY lagged price differences against the log of the time lags. The Hurst exponent is the slope / 2 (code included).
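
To reproduce this, the SPY daily closes need to be loaded first. One way of doing that, a sketch assuming quantmod and Yahoo Finance as the data source (the original data file may well differ), is:

# Sketch: pull SPY daily closes via quantmod (data source is an assumption)
require(quantmod)
SPY <- getSymbols("SPY", from = "1993-01-01", to = "2017-09-23", auto.assign = FALSE)
spy.close <- as.numeric(Cl(SPY))          # closing prices as a plain numeric vector
spy.df <- data.frame(close = spy.close)   # same shape as plot.df used above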

[Figure: SPY log variance of lagged differences vs log time lags, with regression line]

The Hurst exponent for the SPY daily bars on time lags 2:20 is 0.4378202. We know that price series display different characteristics over varying time frames. If we simply plot the SPY daily closes:
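
Using the spy.close vector from the download sketch above, that is just:

# Plot SPY daily closes
plot(spy.close, type = "l", main = "SPY Daily Closes 1993 to 9.23.2017")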

[Figure: SPY daily closes, 1993 to 9.23.2017]

Observing the long-term trend, the series looks more trending, or momentum driven. We already tested a 2:20 day lag period, which gave an H value of 0.4378202; if we place the lags at 6 months to 1.5 years (126:378 trading days), we see that H = 0.6096454, which is on the momentum side of the scale.
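
Rather than editing the lags vector and re-running the whole script, the regression can be wrapped in a small helper that takes the lag window as an argument. This is a sketch (hurst.fn is not part of the original code, and spy.close comes from the download sketch above):

# Sketch: Hurst exponent from the variance of lagged differences,
# with the lag window as a parameter
hurst.fn <- function(x, lags = 2:20) {
  variance.vec <- sapply(lags, function(lag) var(diff(x, lag = lag), na.rm = TRUE))
  log.linear <- lm(log(variance.vec) ~ log(lags))
  coef(log.linear)[2] / 2   # Hurst exponent = slope / 2
}

hurst.fn(spy.close, lags = 2:20)     # short lags
hurst.fn(spy.close, lags = 126:378)  # roughly 6 months to 1.5 years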

So far, it is as expected.

What does a random series look like?

We can create one using randn from the ramify package. We simply add a small positive drift to each randomly generated data point and cumsum the result to make a trending series.

# Plot Random Walk With A Trend
require(ramify)
random.walk = cumsum(randn(10000)+0.025)
plot(random.walk, type="l", main="Random Walk")

# Random Walk Data Frame
random.df <- data.frame(cumsum(randn(10000)+0.03))
colnames(random.df)[1] <- "random"
plot(random.df$random, type="l", main="Random Walk")
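
If you would rather not add the ramify dependency, base R's rnorm also draws standard normal values, so a sketch of an equivalent series is:

# Equivalent random walk with drift using base R only
random.walk.base <- cumsum(rnorm(10000) + 0.025)
plot(random.walk.base, type = "l", main = "Random Walk (base R)")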

[Figure: random walk with upward drift]

The H for this series (lags 2:20) is 0.4999474, which rounds to 0.50: a random walk.

It would seem, based on these statistical tests, that the Hurst exponent is reasonably accurate in reflecting the nature of a series. It should be noted that different lags produce different regimes: 2:20 lags exhibit stronger mean reversion, while over a 6 month to 1.5 year time period (lags 126:378) the market exhibited stronger momentum with H = 0.6096454. At lags 50:100 it is close to a random walk at H = 0.5093078. What does this mean? When optimizing, we must optimize not only models but also time frames.
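
To see all three regimes side by side, the hurst.fn helper sketched earlier can be looped over several lag windows (again assuming hurst.fn and spy.close from the sketches above; exact values will depend on the data pull):

# Compare H across several lag windows
windows <- list(short = 2:20, medium = 50:100, long = 126:378)
sapply(windows, function(w) hurst.fn(spy.close, lags = w))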

To recap we:

1. Created a mean reverting price series with mu = 0.75, eta = 0.3, sigma = 0.05.
2. Saved the output to a data frame and used the Hurst calculation (linear regression of the log variance of lagged price differences vs log time lags) over a 2:20 lag period to obtain the H value. See this post for more information on the Hurst exponent calculation: https://flare9xblog.wordpress.com/2017/08/11/hurst-exponent-in-r/
3. The result was significantly mean reverting, as we expected.
4. Tested SPY closes from 1993 to 9.23.2017. On a lag period of 2:20 the series was mean reverting, and on a 6 month to 1.5 year time period the series showed more momentum. This was as expected.
5. Created a random set of numbers and added a small drift to each data point to create a trending random walk. We obtained an H value of 0.5 (rounded), which is as expected.

The parameters of the simulated series can be edited to change its characteristics, and the Hurst exponent can be calculated on each output. Try making the series more or less mean reverting, and the H value should adjust accordingly.
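
For example, a few variations to try (the parameter values here are arbitrary starting points):

# A few parameter variations to experiment with
plot(OU.sim(eta = 0.6), type = "l", main = "Strong Mean Reversion (eta = 0.6)")
plot(OU.sim(eta = 0.01), type = "l", main = "Weak Mean Reversion (eta = 0.01)")
plot(OU.sim(sigma = 0.10), type = "l", main = "Higher Volatility (sigma = 0.10)")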

Full R code below:


# Modelling different price series 
# Mean reversion, random walk and momentum 
# Andrew Bannerman 9.24.2017

# Create OU simulation
# mu = mean
# eta = theta # Try decreasing theta for less mean reversion, increase for more mean reversion
# sigma = standard deviation # Change the height of the peaks and valleys with standard deviation
OU.sim <- function(T = 1000, mu = 0.75, eta = 0.04, sigma = 0.05){
  P_0 = mu # Starting price is the mean
  P = rep(P_0,T)
  for(i in 2:T){
    # Pull back toward mu at speed eta, plus noise proportional to the current price
    P[i] = P[i-1] + eta * (mu - P[i-1]) + sigma * rnorm(1) * P[i-1]
  }
  return(P)
}

# Plot
plot(OU.sim(), type="l", main="Mean Reversion Sim")

# Save plot to data frame 
plot.df <- data.frame(OU.sim())
plot(plot.df$OU.sim.., type="l",main="Mean Reversion Sim")

# Hurst Exponent Mean Reversion (varying lags)
require(magrittr)
require(zoo)
require(lattice)

#Create lagged variables
lags <- 2:20

# Function for finding differences in lags. Today's close - 'n' lag period
getLAG.DIFF <- function(lagdays) {
  function(plot.df) {
    c(rep(NA, lagdays), diff(plot.df$OU.sim.., lag = lagdays))
  }
}
# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow=nrow(plot.df), ncol=0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(plot.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n) paste("lagged.diff.n", n, sep=""))

# Bind to existing dataframe
plot.df <-  cbind(plot.df, lag.diff.matrix)

# Calculate Variances of 'n period' differences
variance.vec <- apply(plot.df[,2:ncol(plot.df)], 2, function(x) var(x, na.rm=TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))  
# Print general linear regression statistics  
summary(log.linear) 
# Plot log of variance 'n' lags vs log time  
xyplot(log(variance.vec) ~ log(lags),
       main = "OU Simulation: Lagged Price Differences Variance vs Time Lags",
       xlab = "Log of Time Lags",
       ylab = "Log of Variance of 'n' Lag Differences",
       grid = TRUE,
       type = c("p","r"), col.line = "red")

hurst.exponent = coef(log.linear)[2]/2
hurst.exponent

# Write output to file (optional)
# write.csv(plot.df, file = "G:/R Projects/hurst.csv")

# Plot Random Walk With A Trend
require(ramify)
random.walk = cumsum(randn(10000)+0.025)
plot(random.walk, type="l", main="Random Walk")

# Random Walk Data Frame 
random.df <- data.frame(cumsum(randn(10000)+0.03))
colnames(random.df)[1] <- "random"
plot(random.df$random, type="l", main="Random Walk")

# Hurst Exponent Random Walk (varying lags)
require(magrittr)
require(zoo)
require(lattice)

#Create lagged variables
lags <- 2:20

# Function for finding differences in lags. Today's close - 'n' lag period
getLAG.DIFF <- function(lagdays) {
  function(random.df) {
    c(rep(NA, lagdays), diff(random.df$random, lag = lagdays))
  }
}
# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow=nrow(random.df), ncol=0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(random.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n) paste("lagged.diff.n", n, sep=""))

# Bind to existing dataframe
random.df <-  cbind(random.df, lag.diff.matrix)

# Calculate Variances of 'n period' differences
variance.vec <- apply(random.df[,2:ncol(random.df)], 2, function(x) var(x, na.rm=TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))  
# Print general linear regression statistics  
summary(log.linear) 
# Plot log of variance 'n' lags vs log time  
xyplot(log(variance.vec) ~ log(lags),
       main = "Random Walk: Lagged Price Differences Variance vs Time Lags",
       xlab = "Log of Time Lags",
       ylab = "Log of Variance of 'n' Lag Differences",
       grid = TRUE,
       type = c("p","r"), col.line = "red")

hurst.exponent = coef(log.linear)[2]/2
hurst.exponent

References
Algorithmic Trading: Winning Strategies and Their Rationale – May 28, 2013, by Ernie Chan

