The Hurst exponent is a statistic used to test whether a series is mean reverting, trending or following geometric Brownian motion. Using the Hurst exponent, a time series can be categorized as follows:

**Hurst Values < 0.5 = mean reverting**

**Hurst Values = 0.5 = geometric brownian motion**

**Hurst Values > 0.5 = trending**

The Hurst exponent falls in the range 0 to 1, where values closer to 0 signal stronger mean reversion and values closer to 1 signal stronger trending behavior.
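The categories above can be written as a small lookup. This is a minimal sketch: `classifyHurst` and its `tol` band around 0.5 are hypothetical names introduced here for illustration, not part of the original script, and the width of the band is an assumption.

```r
# Hypothetical helper (not part of the original script): map a Hurst
# estimate to the three categories listed above.
classifyHurst <- function(h, tol = 0.02) {
  # tol is an assumed tolerance band around 0.5; tune it for your data
  if (is.na(h)) {
    NA_character_
  } else if (h < 0.5 - tol) {
    "mean reverting"
  } else if (h > 0.5 + tol) {
    "trending"
  } else {
    "geometric brownian motion"
  }
}

classifyHurst(0.44)
classifyHurst(0.50)
classifyHurst(0.71)
```

An estimated exponent is never exactly 0.5, so some band of this kind is needed before a series can be assigned to the middle category.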

Using R, we can calculate the Hurst exponent:

```r
# Hurst Exponent
# Andrew Bannerman
# 8.11.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(zoo)
require(lattice)

# Data path
data.dir <- "G:/R Projects"
output.dir <- "G:/R Projects"
data.read.spx <- paste(data.dir, "SPY.csv", sep = "/")

# Read data
read.spx <- read.csv(data.read.spx, header = TRUE, sep = ",", skip = 0, stringsAsFactors = FALSE)

# Convert values to numeric
cols <- c(3:8)
read.spx[, cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date column [1]
read.spx$Date <- ymd(read.spx$Date)

# Make new data frame
new.df <- data.frame(read.spx)

# Subset date range
new.df <- subset(new.df, Date >= "2000-01-06" & Date <= "2017-08-06")

# Create lagged variables
lags <- 2:20

# Function for finding differences in lags: today's Close - Close 'n' days ago.
# new.df$Close is a plain numeric vector, so base diff() is used and the
# result is padded with 'lagdays' leading NAs to keep the row count.
getLAG.DIFF <- function(lagdays) {
  function(new.df) {
    c(rep(NA, lagdays), diff(new.df$Close, lag = lagdays, differences = 1))
  }
}

# Create a matrix to put the lagged differences in
lag.diff.matrix <- matrix(nrow = nrow(new.df), ncol = 0)

# Loop for filling it
for (i in lags) {
  lag.diff.matrix <- cbind(lag.diff.matrix, getLAG.DIFF(i)(new.df))
}

# Rename columns
colnames(lag.diff.matrix) <- sapply(lags, function(n) paste("lagged.diff.n", n, sep = ""))

# Bind to existing dataframe
new.df <- cbind(new.df, lag.diff.matrix)

# Calculate variances of 'n period' differences (columns 9 onward)
variance.vec <- apply(new.df[, 9:ncol(new.df)], 2, function(x) var(x, na.rm = TRUE))

# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))

# Print general linear regression statistics
summary(log.linear)

# Plot log of variance 'n' lags vs log lags
xyplot(log(variance.vec) ~ log(lags),
       main = "SPY Daily Price Differences Variance vs Time Lags",
       xlab = "Log Lags",
       ylab = "Logged Variance 'n' lags",
       grid = TRUE,
       type = c("p", "r"), col.line = "red",
       abline = list(h = 0))

# Hurst exponent is the regression slope divided by 2
hurst.exponent <- coef(log.linear)[2] / 2
hurst.exponent

# Write output to file
write.csv(new.df, file = "G:/R Projects/hurst.csv")
```

For a little explanation of what is actually going on here:

1. First we compute the lagged differences in close prices for the SPY. For lag 2 we take today’s SPY close minus the SPY close 2 days ago. We do this for each lag in 2:20, so lag 3 is today’s SPY close minus the SPY close 3 days ago, and so on through to lag 20. Each difference is computed for every day in the series. This is evident with head(new.df):

```
> head(new.df)
           Date Ticker     Open     High      Low    Close  Volume Open.Interest lagged.diff.n2 lagged.diff.n3 lagged.diff.n4 lagged.diff.n5
1753 2000-01-06    SPY 139.2124 141.0819 137.3430 137.3430 6245656           138             NA             NA             NA             NA
1754 2000-01-07    SPY 139.8979 145.3193 139.6486 145.3193 8090507           146             NA             NA             NA             NA
1755 2000-01-10    SPY 145.8178 146.4410 144.5715 145.8178 5758617           146        8.47488             NA             NA             NA
1756 2000-01-11    SPY 145.3816 145.6932 143.0760 143.8861 7455732           144       -1.43326        6.54310             NA             NA
1757 2000-01-12    SPY 144.1976 144.1976 142.4528 142.6398 6932185           143       -3.17808       -2.67956        5.29680             NA
1758 2000-01-13    SPY 144.0730 145.3193 142.8267 144.5715 5173588           145        0.68547       -1.24631       -0.74779        7.22857
```

Each column starts with as many leading NAs as its lag period (2 for lag 2, 3 for lag 3 and so on). The lagged differences then roll through the rest of the series.

2. After this we have all of our lagged differences for lags 2:20 (or any other range chosen).

3. Then, for each ‘n’ lag period, we compute the variance of that lagged difference over the full length of the series. We can see this by printing the variance vector:

```
> variance.vec
 lagged.diff.n2  lagged.diff.n3  lagged.diff.n4  lagged.diff.n5  lagged.diff.n6  lagged.diff.n7  lagged.diff.n8  lagged.diff.n9 lagged.diff.n10
       4.288337        6.065315        7.823918        9.552756       11.155789       12.702647       14.185067       15.724892       17.180618
lagged.diff.n11 lagged.diff.n12 lagged.diff.n13 lagged.diff.n14 lagged.diff.n15 lagged.diff.n16 lagged.diff.n17 lagged.diff.n18 lagged.diff.n19
      18.651980       20.167477       21.854415       23.647368       25.289570       26.751552       28.403188       30.110954       31.620225
lagged.diff.n20
      33.130844
```

This shows us the variance for each of our lagged differences from 2 to 20.

4. Next we regress the log variances on the log lags and plot them.

```r
# Linear regression of log variances vs log lags
log.linear <- lm(formula = log(variance.vec) ~ log(lags))

# Plot log of variance 'n' lags vs log lags
xyplot(log(variance.vec) ~ log(lags),
       main = "SPY Daily Price Differences Variance vs Time Lags",
       xlab = "Log Lags",
       ylab = "Logged Variance 'n' lags",
       grid = TRUE,
       type = c("p", "r"), col.line = "red",
       abline = list(h = 0))
```

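5. The Hurst exponent is the log(lags) coefficient estimate divided by 2 (the slope / 2). The variance of the lag ‘n’ differences scales as n^(2H), so the slope of log variance against log lag estimates 2H.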
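Steps 1 to 5 above can be traced on a toy series small enough to check by hand. This is a minimal sketch: the prices are made-up numbers rather than SPY data, `lagDiff` is a hypothetical stand-in for getLAG.DIFF, and only lags 2:4 are used so the output stays short. With ten points the resulting estimate is not meaningful; the block only shows the mechanics.

```r
# Toy closes (made-up numbers, not SPY data)
close <- c(100, 102, 101, 105, 107, 106, 110, 108, 111, 115)
lags <- 2:4

# Step 1: lagged differences, padded with leading NAs like getLAG.DIFF
lagDiff <- function(x, lagdays) c(rep(NA, lagdays), diff(x, lag = lagdays))
lag.diff.matrix <- sapply(lags, function(k) lagDiff(close, k))
colnames(lag.diff.matrix) <- paste0("lagged.diff.n", lags)
head(lag.diff.matrix, 5)

# Step 3: variance of each lagged-difference column, ignoring the NAs
variance.vec <- apply(lag.diff.matrix, 2, var, na.rm = TRUE)
variance.vec

# Steps 4 and 5: slope of log variance on log lag, divided by 2
log.linear <- lm(log(variance.vec) ~ log(lags))
hurst.toy <- unname(coef(log.linear)[2] / 2)
hurst.toy
```

Row 3 of the lag 2 column is 101 - 100 = 1, the first value after the two leading NAs, which mirrors the pattern in the head(new.df) output.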

For date range: “2000-01-06” to “2017-08-06” at our chosen lags of 2:20 days:

**SPY Hurst exponent is 0.443483**, which falls in the mean reverting category.
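One way to sanity check the variance method is to run it on simulated series whose behavior is known. The sketch below is an illustration under stated assumptions, not part of the original analysis: `varianceHurst` is a hypothetical wrapper around the steps above, the seed and series length are arbitrary, and the data are simulated rather than SPY prices. A Gaussian random walk should give an estimate near 0.5, and a stationary AR(1) series should give one well below 0.5.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical wrapper: variance of lagged differences, log-log slope / 2
varianceHurst <- function(x, lags = 2:20) {
  variance.vec <- sapply(lags, function(k) var(diff(x, lag = k)))
  fit <- lm(log(variance.vec) ~ log(lags))
  unname(coef(fit)[2] / 2)
}

# Random walk: cumulative sum of independent normal steps
random.walk <- cumsum(rnorm(5000))
h.walk <- varianceHurst(random.walk)
h.walk  # expected to be close to 0.5

# AR(1) with coefficient 0.5: stationary, so it reverts to its mean
mean.rev <- as.numeric(arima.sim(list(ar = 0.5), n = 5000))
h.rev <- varianceHurst(mean.rev)
h.rev   # expected to be well below 0.5
```

Running the same function on an actual price series and comparing against these two baselines gives a feel for how far from 0.5 an estimate needs to be before it is worth acting on.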

Another method is to compute a simple Hurst exponent over a rolling ‘n’ day period.

The calculation for simple Hurst:

```r
# Function For Simple Hurst Exponent
x <- new.df$Close # set x variable

simpleHurst <- function(y) {
  sd.y <- sd(y)
  m <- mean(y)
  y <- y - m
  max.y <- max(cumsum(y))
  min.y <- min(cumsum(y))
  RS <- (max.y - min.y) / sd.y
  H <- log(RS) / log(length(y))
  return(H)
}

simpleHurst(x) # Obtain Hurst exponent for entire series
```
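To make the rescaled range calculation concrete, the sketch below walks through it on a five-point window. The prices are made-up numbers, not SPY data, and the function is a condensed but equivalent copy of simpleHurst so that the block runs on its own.

```r
# Condensed copy of simpleHurst so this block is self-contained
simpleHurst <- function(y) {
  sd.y <- sd(y)
  y <- y - mean(y)
  RS <- (max(cumsum(y)) - min(cumsum(y))) / sd.y
  log(RS) / log(length(y))
}

# Toy window (made-up numbers, not SPY data)
y <- c(100, 102, 101, 105, 107)

dev <- y - mean(y)                     # deviations from the window mean
walk <- cumsum(dev)                    # cumulative deviation path
RS <- (max(walk) - min(walk)) / sd(y)  # range of the path, rescaled by sd
H <- log(RS) / log(length(y))
H                                      # same value as simpleHurst(y)

# With a two-point window the rescaled range is always 1/sqrt(2),
# so the estimate is -0.5 whatever the two (distinct) prices are
simpleHurst(c(10, 13))
```

The two-point case shows that very short windows carry little information: the value is fixed by the window length rather than by the data, and it falls outside the 0 to 1 range.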

We can apply the simple Hurst function with rollapply in R over an ‘n’ day rolling look back period. We do this using our getHURST function:

```r
# Hurst Exponent
# Andrew Bannerman
# 8.11.2017

require(lubridate)
require(dplyr)
require(magrittr)
require(zoo)
require(ggplot2)

# Data path
# Enter the directory of your S&P500 data here. Use / between folder names, not \
data.dir <- "G:/R Projects"
data.read.spx <- paste(data.dir, "SPY.csv", sep = "/")

# Read data to read.spx data frame
read.spx <- read.csv(data.read.spx, header = TRUE, sep = ",", skip = 0, stringsAsFactors = FALSE)

# Make dataframe
new.df <- data.frame(read.spx)

# Convert values to numeric
cols <- c(3:8)
new.df[, cols] %<>% lapply(function(x) as.numeric(as.character(x)))

# Convert Date column [1]
new.df$Date <- ymd(new.df$Date)

# Use for subsetting by date
new.df <- subset(new.df, Date >= "2000-01-06" & Date <= "2017-08-06") # Change date ranges
#new.df <- subset(new.df, Date >= as.Date("1980-01-01")) # Choose start date to present

# Function For Simple Hurst Exponent
x <- new.df$Close # set x variable

simpleHurst <- function(y) {
  sd.y <- sd(y)
  m <- mean(y)
  y <- y - m
  max.y <- max(cumsum(y))
  min.y <- min(cumsum(y))
  RS <- (max.y - min.y) / sd.y
  H <- log(RS) / log(length(y))
  return(H)
}

simpleHurst(x) # Obtain Hurst exponent for entire series

# Calculate rolling Hurst exponent for different 'n' periods
getHURST <- function(rolldays) {
  function(new.df) {
    rollapply(new.df$Close,
              width = rolldays, # width of rolling window
              FUN = simpleHurst,
              fill = NA,
              align = "right")
  }
}

# Create a matrix to put the rolling Hurst values in
roll.hurst.matrix <- matrix(nrow = nrow(new.df), ncol = 0)

# Loop for filling it
for (i in 2:252) {
  roll.hurst.matrix <- cbind(roll.hurst.matrix, getHURST(i)(new.df))
}

# Rename columns
colnames(roll.hurst.matrix) <- sapply(2:252, function(n) paste("roll.hurst.n", n, sep = ""))

# Bind to existing dataframe
new.df <- cbind(new.df, roll.hurst.matrix)

# Line plot of rolling Hurst
ggplot(data = new.df, aes(x = Date)) +
  geom_line(aes(y = roll.hurst.n5), colour = "black") +
  labs(title = "Hurst Exponent - Rolling 5 Days") +
  labs(x = "Date", y = "Hurst Exponent")

# Plot rolling Hurst histogram
qplot(new.df$roll.hurst.n5,
      geom = "histogram",
      binwidth = 0.005,
      main = "Simple Hurst Exponent - Rolling 5 Days",
      fill = I("grey"),
      col = I("black"),
      xlab = "Hurst Exponent")

# Plot S&P500 Close
ggplot(data = new.df, aes(x = Date)) +
  geom_line(aes(y = Close), colour = "darkblue") +
  ylab(label = "S&P500 Close") +
  xlab("Date") +
  labs(title = "S&P500")

# Write output to file
write.csv(new.df, file = "G:/R Projects/hurst.roll.csv")
```

This calculates the simple Hurst exponent over an ‘n’ day look back period. As we can see from the plotted histograms, shorter time frames for the SPY show Hurst exponents < 0.50, and as we extend the look back to longer time frames the SPY moves toward the trending category, closer to a Hurst value of 1.
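The rolling calculation can be tried without the SPY file. The sketch below uses a simulated price series, and `rollHurst` is a hypothetical base R loop written to mimic a right-aligned rollapply with NA fill, so it is an approximation of the approach above rather than the original code. The window widths of 5 and 100 are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Condensed copy of simpleHurst so this block is self-contained
simpleHurst <- function(y) {
  sd.y <- sd(y)
  y <- y - mean(y)
  RS <- (max(cumsum(y)) - min(cumsum(y))) / sd.y
  log(RS) / log(length(y))
}

# Hypothetical base R stand-in for rollapply(..., fill = NA, align = "right"):
# the value at position t uses the 'width' observations ending at t
rollHurst <- function(x, width) {
  out <- rep(NA_real_, length(x))
  for (end in width:length(x)) {
    out[end] <- simpleHurst(x[(end - width + 1):end])
  }
  out
}

# Simulated prices (not SPY data): a random walk started at 100
prices <- 100 + cumsum(rnorm(1000))

roll.hurst.n5 <- rollHurst(prices, 5)
roll.hurst.n100 <- rollHurst(prices, 100)

# Compare the two look back periods
summary(roll.hurst.n5)
summary(roll.hurst.n100)
```

Comparing the two summaries on simulated data shows how much of the short window versus long window difference comes from the window length itself, which is useful context when reading the SPY histograms.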

Using this information we may design (fit) a model which captures the nature of the series under examination. In this case it would make sense to build mean reversion models on short time periods for SPY and develop trending or momentum models for the longer time frames.

References

*Algorithmic Trading: Winning Strategies and Their Rationale – May 28, 2013, by Ernie Chan*