One of the purposes of the Hurst exponent is to identify whether a price series is trending (momentum), a random walk, or mean reverting. If we know this, we can fit a model that captures the nature of the series.

The Hurst exponent is categorized as:

`H < 0.5` = mean reverting

`H = 0.5` = random walk

`H > 0.5` = momentum
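These categories follow from how the variance of 'n'-lag price differences scales with the lag, roughly `Var ∝ n^(2H)`. As a quick sketch of that scaling (the variable names here are our own, not from the post), a pure random walk should show variance growing linearly with the lag, giving `H ≈ 0.5`:

```r
# Variance scaling of a pure random walk: Var(n-lag diffs) grows like n^(2H)
set.seed(42)
rw <- cumsum(rnorm(10000))          # pure random walk, no drift

v1  <- var(diff(rw, lag = 1))       # variance of 1-lag differences
v16 <- var(diff(rw, lag = 16))      # variance of 16-lag differences

v16 / v1                            # roughly 16 for a random walk (linear scaling)
log(v16 / v1) / (2 * log(16))       # implied H, roughly 0.5
```

A mean reverting series would show the 16-lag variance growing by less than a factor of 16 (H below 0.5), a trending series by more (H above 0.5).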

Editable parameters:

`mu` = mean (change the mean value)

`eta` = theta (try decreasing theta for less mean reversion, increasing it for more)

`sigma` = standard deviation (change the height of the peaks and valleys)

```r
# Create OU simulation
OU.sim <- function(T = 1000, mu = 0.75, eta = 0.3, sigma = 0.05) {
  P_0 <- mu # Starting price is the mean
  P <- rep(P_0, T)
  for (i in 2:T) {
    P[i] <- P[i-1] + eta * (mu - P[i-1]) + sigma * rnorm(1) * P[i-1]
  }
  return(P)
}

# Plot
plot(OU.sim(), type = "l", main = "Mean Reversion Sim")

# Save simulation output to a data frame
plot.df <- data.frame(OU.sim())
plot(plot.df$OU.sim.., type = "l", main = "Mean Reversion Sim")
```

Looks pretty mean reverting.

We stored the simulation in a data frame, so let's run the Hurst exponent calculation to see which **H** value we obtain.

```r
# Hurst exponent (varying lags)
require(magrittr)
require(zoo)
require(lattice)

# Lags to test
lags <- 2:20

# Function for finding lagged differences: today's close minus the 'n'-lag close
getLAG.DIFF <- function(lagdays) {
  function(plot.df) {
    c(rep(NA, lagdays), diff(plot.df$OU.sim.., lag = lagdays, differences = 1))
  }
}

# Create a matrix to hold the lagged differences
lag.diff.matrix <- matrix(nrow = nrow(plot.df), ncol = 0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(plot.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n) paste("lagged.diff.n", n, sep = ""))

# Bind to existing data frame
plot.df <- cbind(plot.df, lag.diff.matrix)

# Calculate variances of the 'n'-period differences
variance.vec <- apply(plot.df[, 2:ncol(plot.df)], 2, function(x) var(x, na.rm = TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))

# Print general linear regression statistics
summary(log.linear)

# Plot log variance of 'n'-lag differences vs log lags
xyplot(log(variance.vec) ~ log(lags),
       main = "Simulated Series: Price Differences Variance vs Time Lags",
       xlab = "Log Lags", ylab = "Logged Variance of 'n'-lag Differences",
       grid = TRUE, type = c("p", "r"), col.line = "red")

# Hurst exponent = slope of the regression / 2
hurst.exponent <- coef(log.linear)[2] / 2
hurst.exponent
```

We obtain a Hurst exponent of **0.1368407**, which is significantly mean reverting.

Let's change some of the parameters of the simulation to create a moderately mean reverting series. We can alter theta: if we change `eta = 0.3` to `eta = 0.04` we obtain this output:

This looks less mean reverting than the first series, and **H = 0.4140561**. This is below **H = 0.50** and is considered mean reverting.

Let us test the SPY from 1993 (inception) to present (9.23.2017) to see what the **H** value is. The chart below is a linear regression of the log variance of the SPY lagged price differences against the log time lags. The Hurst exponent is the slope / 2 (code included).
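The "slope / 2" computation described above can be wrapped in a compact helper. This is only a sketch of the same log-variance regression; the function name `hurst` is ours, not from the post, and it takes any numeric price vector:

```r
# Estimate the Hurst exponent of a price series over a window of lags:
# regress log variance of 'n'-lag differences on log lags, then halve the slope.
hurst <- function(price, lags = 2:20) {
  variances <- sapply(lags, function(n) var(diff(price, lag = n), na.rm = TRUE))
  fit <- lm(log(variances) ~ log(lags))
  unname(coef(fit)[2] / 2)
}

# Sanity check on a pure random walk: H should come out near 0.5
set.seed(1)
hurst(cumsum(rnorm(5000)))
```

The same helper could then be pointed at a vector of SPY daily closes to reproduce the result below.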

The Hurst exponent for the SPY daily bars on time lags 2:20 is **0.4378202**. We know that price series display different characteristics over varying time frames. If we simply plot the SPY daily closes:

Observing the long-term trend, the series looks more trending, or momentum. We already tested a 2:20 day lag period, which gives an **H** value of **0.4378202**; if we place the lags at 6 months to 1.5 years (126:378 trading days) we see that **H = 0.6096454**, which is on the momentum side of the scale.

So far – it is as expected.

What does a random series look like?

We can create this using `randn` from the `ramify` package. We simply `cumsum` each randomly generated data point and add a small positive drift to make it a trending series.

```r
# Plot random walk with a trend
require(ramify)
random.walk <- cumsum(randn(10000) + 0.025)
plot(random.walk, type = "l", main = "Random Walk")

# Random walk data frame
random.df <- data.frame(cumsum(randn(10000) + 0.03))
colnames(random.df)[1] <- "random"
plot(random.df$random, type = "l", main = "Random Walk")
```

The **H** for this series (lags 2:20) is **0.4999474**, which rounds to **0.50**: a random walk.

Based on these statistical tests, the Hurst exponent is reasonably accurate in reflecting the nature of the series. It should be noted that different lags produce different regimes: lags 2:20 exhibit stronger mean reversion; on a 6 month to 1.5 year time frame (lags 126:378) the market exhibited stronger momentum, **H = 0.6096454**; at lags 50:100 it is close to a random walk at **H = 0.5093078**. What does this mean? When optimizing models, we must also optimize time frames.
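The lag-window sensitivity can be seen even on the simulated series. The sketch below reuses the OU simulation from earlier in the post and a compact estimator of our own (`est.H` is not from the post); note that for a mean reverting series, long lags push H toward zero rather than toward momentum, the opposite of what SPY showed:

```r
# OU simulation as defined earlier in the post (eta = 0.3, strongly mean reverting)
OU.sim <- function(T = 1000, mu = 0.75, eta = 0.3, sigma = 0.05) {
  P <- rep(mu, T)
  for (i in 2:T) P[i] <- P[i-1] + eta * (mu - P[i-1]) + sigma * rnorm(1) * P[i-1]
  P
}

# Compact Hurst estimate: slope of log variance vs log lags, divided by 2
est.H <- function(x, lags) {
  v <- sapply(lags, function(n) var(diff(x, lag = n)))
  unname(coef(lm(log(v) ~ log(lags)))[2] / 2)
}

set.seed(2)
series <- OU.sim(T = 5000)
est.H(series, 2:20)    # short lags: well below 0.5, strongly mean reverting
est.H(series, 126:378) # long lags: difference variance has flattened, H near zero
```

The point is the same as for SPY: the estimated H is a property of the series *and* the lag window, not of the series alone.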

To recap we:

1. Created a mean reverting price series with `mu = 0.75, eta = 0.3, sigma = 0.05`

2. We saved the output to a data frame and ran the Hurst calculation (a linear regression of the log variance of lagged price differences vs the log lags) over a 2:20 lag period to obtain the **H** value. See this post for more information on the Hurst exponent calculation: https://flare9xblog.wordpress.com/2017/08/11/hurst-exponent-in-r/

3. The result was significantly mean reverting as we expected.

4. We tested SPY closes from 1993 to 9.23.2017. On a 2:20 lag period the series was mean reverting, and on a 6 month to 1.5 year time period the series was more on the momentum side. This was as expected.

5. We created a random set of numbers and added a small drift to each data point to create a random walk with a trend. We obtained an **H** value of **0.5** rounded, which is as expected.

The parameters for the simulated series can be edited to change the characteristics and the Hurst exponent can be calculated on each output. Try making the series more mean reverting or less mean reverting and the H value should adjust accordingly.
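One way to try this is to sweep `eta` and recompute H each time. The sketch below redefines the post's OU simulation so it runs standalone; `est.H` is our own compact wrapper around the log-variance regression, not code from the post:

```r
# OU simulation as in the post
OU.sim <- function(T = 1000, mu = 0.75, eta = 0.3, sigma = 0.05) {
  P <- rep(mu, T)
  for (i in 2:T) P[i] <- P[i-1] + eta * (mu - P[i-1]) + sigma * rnorm(1) * P[i-1]
  P
}

# Compact Hurst estimate: slope of log variance vs log lags, divided by 2
est.H <- function(x, lags = 2:20) {
  v <- sapply(lags, function(n) var(diff(x, lag = n)))
  unname(coef(lm(log(v) ~ log(lags)))[2] / 2)
}

# Weaker mean reversion (smaller eta) should push H up toward 0.5
set.seed(5)
for (eta in c(0.3, 0.04, 0.005)) {
  H <- est.H(OU.sim(T = 5000, eta = eta))
  cat("eta =", eta, " H =", round(H, 3), "\n")
}
```

As `eta` shrinks, the reversion force weakens, the series behaves more like a random walk over the 2:20 lag window, and the estimated H climbs accordingly.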

Full R code below:

```r
# Modelling different price series
# Mean reversion, random walk and momentum
# Andrew Bannerman 9.24.2017

# Create OU simulation
# mu = mean # Change mean value
# eta = theta # Try decreasing theta for less mean reversion, increase for more
# sigma = standard deviation # Change the height of the peaks and valleys
OU.sim <- function(T = 1000, mu = 0.75, eta = 0.04, sigma = 0.05) {
  P_0 <- mu # Starting price is the mean
  P <- rep(P_0, T)
  for (i in 2:T) {
    P[i] <- P[i-1] + eta * (mu - P[i-1]) + sigma * rnorm(1) * P[i-1]
  }
  return(P)
}

# Plot
plot(OU.sim(), type = "l", main = "Mean Reversion Sim")

# Save simulation output to a data frame
plot.df <- data.frame(OU.sim())
plot(plot.df$OU.sim.., type = "l", main = "Mean Reversion Sim")

# Hurst exponent, mean reverting series (varying lags)
require(magrittr)
require(zoo)
require(lattice)

# Lags to test
lags <- 2:20

# Function for finding lagged differences: today's close minus the 'n'-lag close
getLAG.DIFF <- function(lagdays) {
  function(plot.df) {
    c(rep(NA, lagdays), diff(plot.df$OU.sim.., lag = lagdays, differences = 1))
  }
}

# Create a matrix to hold the lagged differences and fill it
lag.diff.matrix <- matrix(nrow = nrow(plot.df), ncol = 0)
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(plot.df))
}
colnames(lag.diff.matrix) <- sapply(lags, function(n) paste("lagged.diff.n", n, sep = ""))

# Bind to existing data frame
plot.df <- cbind(plot.df, lag.diff.matrix)

# Calculate variances of the 'n'-period differences
variance.vec <- apply(plot.df[, 2:ncol(plot.df)], 2, function(x) var(x, na.rm = TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))
summary(log.linear)

# Plot log variance of 'n'-lag differences vs log lags
xyplot(log(variance.vec) ~ log(lags),
       main = "Simulated Series: Price Differences Variance vs Time Lags",
       xlab = "Log Lags", ylab = "Logged Variance of 'n'-lag Differences",
       grid = TRUE, type = c("p", "r"), col.line = "red")

hurst.exponent <- coef(log.linear)[2] / 2
hurst.exponent

# Write output to file (the original used an undefined new.df; plot.df holds the results)
write.csv(plot.df, file = "G:/R Projects/hurst.csv")

# Plot random walk with a trend
require(ramify)
random.walk <- cumsum(randn(10000) + 0.025)
plot(random.walk, type = "l", main = "Random Walk")

# Random walk data frame
random.df <- data.frame(cumsum(randn(10000) + 0.03))
colnames(random.df)[1] <- "random"
plot(random.df$random, type = "l", main = "Random Walk")

# Hurst exponent, random walk (varying lags)
lags <- 2:20
getLAG.DIFF <- function(lagdays) {
  function(random.df) {
    c(rep(NA, lagdays), diff(random.df$random, lag = lagdays, differences = 1))
  }
}
lag.diff.matrix <- matrix(nrow = nrow(random.df), ncol = 0)
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(random.df))
}
colnames(lag.diff.matrix) <- sapply(lags, function(n) paste("lagged.diff.n", n, sep = ""))
random.df <- cbind(random.df, lag.diff.matrix)
variance.vec <- apply(random.df[, 2:ncol(random.df)], 2, function(x) var(x, na.rm = TRUE))
log.linear <- lm(formula = log(variance.vec) ~ log(lags))
summary(log.linear)
xyplot(log(variance.vec) ~ log(lags),
       main = "Random Walk: Price Differences Variance vs Time Lags",
       xlab = "Log Lags", ylab = "Logged Variance of 'n'-lag Differences",
       grid = TRUE, type = c("p", "r"), col.line = "red")
hurst.exponent <- coef(log.linear)[2] / 2
hurst.exponent
```

References

*Ernie Chan, Algorithmic Trading: Winning Strategies and Their Rationale, May 28, 2013*