In this post we will estimate the Hurst exponent using the R/S method. The Hurst exponent measures the long range memory of a time series (more below). If a series has no memory, i.e. if each point in time is independent of previous points in time, then it is said to be a random process. Examples of random processes are Markov processes, Brownian motion and white noise.

A series which trends exhibits autocorrelation: one point in time is correlated with a lagged version of itself. The autocorrelation of a series may be calculated with the following formula:
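One standard form of the sample autocorrelation at lag k, consistent with the description below (with Ȳ the series mean and N the series length), is:

```latex
r_k = \frac{\sum_{i=1}^{N-k} (Y_i - \bar{Y})(Y_{i+k} - \bar{Y})}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}
```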

In English: the top part of the equation (numerator) is the covariance, where Yi is the series and Yi+k a lagged version of itself. The bottom part of the equation (denominator) is the variance.

So we simply calculate the covariance of the series with a lagged version of itself and divide by the variance.
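As a minimal sketch of that calculation (the function name and the toy series are my own, not from the original code):

```julia
using Statistics

# Sample autocorrelation at lag k: covariance of the series with a
# lagged copy of itself, divided by the variance of the series.
function autocorr(y::AbstractVector, k::Int)
    ybar = mean(y)
    num = sum((y[1:end-k] .- ybar) .* (y[1+k:end] .- ybar))  # covariance part
    den = sum((y .- ybar) .^ 2)                              # variance part
    return num / den
end

autocorr(cumsum(randn(10_000)), 1)  # a trending (integrated) series: close to 1
```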

We may plot the autocorrelation of S&P 500 futures contract close prices.

source: https://www.quandl.com/data/CHRIS/CME_SP1-S-P-500-Futures-Continuous-Contract-1-SP1-Front-Month

*Cannot validate the accuracy of this data, but it is good enough for illustration purposes.*

This is far from random: close prices show a high degree of correlation with lagged versions of themselves.

Now we may estimate the Hurst exponent over an S&P 500 futures return series using the rescaled range analysis method (R/S). This is the original method devised by Harold Edwin Hurst, who developed the formula while studying the long-term flow behavior of the Nile river.

We shall estimate H for a 1 bar return series and a 100 bar return series.
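For reference, a k-bar simple return series (the same arithmetic-return definition used in the code further down; the helper name is my own) is just:

```julia
# k-bar simple returns: close[i] / close[i-k] - 1
nbar_returns(close::AbstractVector, k::Int) =
    [close[i] / close[i-k] - 1 for i in k+1:length(close)]

nbar_returns([1.0, 2.0, 4.0, 8.0], 1)  # → [1.0, 1.0, 1.0]
```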

The R/S calculation steps are depicted on Wikipedia.

The Hurst exponent is more of an estimate than an absolute calculation. Corresponding H values signal:

H values less than 0.5 = anti-persistent behavior, mean reverting

H values of 0.5 = random process

H values greater than 0.5 = persistent behavior, trending

The procedure below calculates the R/S range and estimates the Hurst exponent by regressing log(R/S) against log(n), where n is the block size (lag):

1. Calculate a return series. We will estimate the Hurst exponent on a 1 bar return series. The thesis here is that if each bar is truly independent of the last (no autocorrelation), it will be close to a random process, H=0.5. If we pick a longer return series, such as 100 bars, we would expect to see higher H values and higher autocorrelation coefficients.

2. The Hurst exponent is estimated by regression of a power law, log(R/S) vs log(n), the slope being the Hurst exponent. For that reason, logarithmically spaced block sizes were chosen at lags [8,16,32,64,128,256,512,1024].

3. Calculate the mean for each block size. In this case 1:8, 2:9, etc., or i-n+1:i. Do this for each block size.

4. For said block size, subtract the mean from each value in the return series.

5. Sum the deviations from the mean.

6. Find the maximum and minimum of the sum of deviations from the mean.

7. Find R, the range of summed deviations from the mean, by subtracting the minimum from the maximum.

8. Calculate the standard deviation of the deviations from the mean.

9. Calculate R/S, the rescaled range, by dividing R by the standard deviation of the deviations from the mean.

10. After rolling through the series calculating the R/S value for each block size, take the mean R/S value per block size, so that for each lag (block size) we have one mean value.

11. Perform a regression of log2(mean_RS) vs log2(n_lag). The slope is the Hurst exponent. The procedure above is detailed below using the Julia language:

```julia
####################################################################################
# Rolling Hurst
####################################################################################
# initialize outputs
m = Array{Float64}(length(d_es_close),0)
log_n_out = m
out_RS = m

# Set lags (range or specific)
max_lag = 100
n = 200
lags = n:max_lag
# or specific lags - comment out where necessary
lags = [8,16,32,64,128,256,512,1024]

# Specify return series lag
n_lag = 2000
i=1
j=1
c=1
for j = lags
    n = lags[c]

    # Calculate returns of the series
    #n_lag = lags[c]  # set n_lag 1 for 1 day returns
    rets = zeros(d_es_close)
    for i in n_lag+1:length(d_es_close)
        rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1)
        # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
    end
    #rets = d_es_close

    # Find mean over width of lookback
    mean_out = zeros(rets)
    for i = n:size(rets,1)
        mean_out[i] = mean(rets[i-n+1:i])
    end

    # Subtract the mean: deviations from the mean
    dev_mean_out = zeros(mean_out)
    for i = n:size(mean_out,1)
        dev_mean_out[i] = (rets[i] - mean_out[i])
    end

    # Roll sum the deviations from the mean
    sum_out = zeros(dev_mean_out)
    for i = n:size(dev_mean_out,1)
        sum_out[i] = sum(dev_mean_out[i-n+1:i])
    end

    # Find the maximum and minimum of the sum of the mean deviations
    max_out = zeros(sum_out)
    for i = n:size(sum_out,1)
        max_out[i] = maximum(sum_out[i-n+1:i])
    end
    min_out = zeros(sum_out)
    for i = n:size(sum_out,1)
        min_out[i] = minimum(sum_out[i-n+1:i])
    end

    # R = Range, max - min
    R_out = zeros(dev_mean_out)
    for i = n:size(dev_mean_out,1)
        R_out[i] = max_out[i] - min_out[i]
    end

    # Rolling standard deviation of the deviations from the mean
    stdev_out = zeros(rets)
    for i = n:size(rets,1)
        stdev_out[i] = sqrt(var(dev_mean_out[i-n+1:i]))
    end

    # Calculate rescaled range (Range (R_out) / stdev of mean deviations)
    RS_out = zeros(R_out)
    for i = n:size(R_out,1)
        RS_out[i] = R_out[i] / stdev_out[i]
    end

    # Calculate log_n(n)
    index = fill(n,length(rets))
    log_n = zeros(rets)
    for i = n:size(index,1)
        log_n[i] = log2(index[i])
    end

    # out
    log_n_out = hcat(log_n_out, log_n)
    #log_rs_out = hcat(log_rs_out, log_RS)
    out_RS = hcat(out_RS, RS_out)  # re-scaled range
    c = c+1
end

# access dims of the matrix: row, col
#log_rs_out[20,:]

# Calculate expected value for R/S over various n
# Take the rolling mean over each varying n lag at n width
expected_RS = zeros(size(out_RS,1), size(out_RS,2))
n = 100
for j = 1:size(out_RS,2)      # loop on column dimension
    for i = n:size(out_RS,1)  # loop on row dimension
        expected_RS[i,j] = mean(out_RS[i-n+1:i,j])
    end
end

# log 2
expected_log_RS = zeros(size(out_RS,1), size(out_RS,2))
c=1
for j = 1:size(expected_log_RS,2)      # loop on column dimension
    for i = n:size(expected_log_RS,1)  # loop on row dimension
        expected_log_RS[i,j] = log2(expected_RS[i,j])
    end
    c=c+1
end

# Regression slope of log(n) and expected value of log(R/S)
# x = log(n), y = log(R/S)
b_slope = zeros(size(out_RS,1))
A_intercept = zeros(size(out_RS,1))
for i = n:size(out_RS,1)
    xi = log_n_out[i,:]        # grab the row for varying lags of log_n
    yi = expected_log_RS[i,:]  # grab the row for varying lags of r/s value
    # Mean of X (Mx)
    Mx = mean(xi)
    # Mean of Y (My)
    My = mean(yi)
    # Standard deviation of X (sX)
    sX = sqrt(var(xi))
    # Standard deviation of Y (sY)
    sY = sqrt(var(yi))
    # Correlation of x and y (r)
    r = cor(xi,yi)
    # slope (b) = r * (sY/sX)
    b = r * (sY/sX)
    # find intercept A = My - b*Mx
    A = My - (b*Mx)
    # out
    b_slope[i] = b
    A_intercept[i] = A
end
```
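The rolling implementation above is written against an older Julia release. As a compact sketch of the same R/S-to-H procedure on a single series (modern Julia; the function names, default block sizes, and the non-overlapping blocking are my own simplifications, not the post's rolling version):

```julia
using Statistics

# R/S statistic for one block of returns: range of the cumulative
# mean-deviations divided by the standard deviation of the block.
function rs(block::AbstractVector)
    dev = block .- mean(block)
    z = cumsum(dev)
    return (maximum(z) - minimum(z)) / std(block)
end

# Estimate H as the slope of log2(mean R/S) vs log2(n) over block sizes,
# using slope = r * (sY / sX) as in the regression above.
function hurst(rets::AbstractVector, ns = [8, 16, 32, 64, 128, 256])
    x = Float64[]
    y = Float64[]
    for n in ns
        vals = [rs(rets[i:i+n-1]) for i in 1:n:length(rets)-n+1]
        push!(x, log2(n))
        push!(y, log2(mean(vals)))
    end
    return cor(x, y) * (std(y) / std(x))
end

hurst(randn(20_000))  # white noise: roughly 0.5 (R/S is biased a bit high at small n)
```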

Using the code above we may plot the Hurst exponent for a 1 bar return series.

If we have a random process, successive points in time are independent from each other. We see Hurst values fluctuating around the 0.50 area, reaching roughly 0.75 in 2009 during a strongly trending period.

We may see the effect of analyzing a longer return stream, let's say a 100 bar return series:

We see Hurst values closer to 1 for a 100 bar return series.

Furthermore, to see how the Hurst exponent and autocorrelation are related, we may calculate a rolling autocorrelation:

```julia
# Rolling Autocorrelation
# Sliding window (k width)
mean_out    = zeros(size(d_es_close,1))
var_out     = zeros(size(d_es_close,1))
cov_out     = zeros(size(d_es_close,1))
autocor_out = zeros(size(d_es_close,1))
devmean_out = zeros(size(d_es_close,1))
n = 1000
k = 1        # set lag
n_lag = 100  # return series lag

# Calculate the return series
rets = zeros(size(d_es_close,1))
for i in n_lag+1:length(d_es_close)
    rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1)
    # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
end

# lag the series by k
lagged_series = [fill(0,k); rets[1:end-k]]

# Calculate the mean of the rolling sample
for i = n:size(rets,1)
    mean_out[i] = mean(rets[i-n+1:i])
end

# Calculate deviations from the mean
for i = n:size(rets,1)
    devmean_out[i] = rets[i] - mean_out[i]
end

# Calculate rolling variance
for i = n:size(rets,1)
    var_out[i] = var(rets[i-n+1:i])
end

# Calculate rolling covariance
for i = n:size(rets,1)
    if i+k >= size(rets,1)
        break
    end
    cov_out[i] = cov(rets[i-n+1:i], lagged_series[i-n+1:i])
end

# Rolling autocorrelation
for i = n:size(rets,1)
    autocor_out[i] = cov_out[i] / var_out[i]
end
```

For a 1 bar return series at lag k=1:

We see very little correlation from one point in time to the next. The respective Hurst exponent is in the 0.40 to 0.55 range.

And the autocorrelation for a 100 bar return series at lag k=1:

We see strong correlation at the 100 bar return series, with correlation coefficients greater than 0.96 and respective Hurst exponents closer to 1.0.

So what's the point of all this?

As depicted above, information pertaining to the (non) randomness of a series can be of some benefit when designing models to fit the data. The autocorrelation / Hurst exponent may be used to complement existing strategies, signalling the type of regime the market is currently in, so that we may favor strategies which better suit current conditions.
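As a toy illustration of regime signalling (the thresholds and the 0.45 to 0.55 dead zone are hypothetical, not from any tested strategy), one might route signals by the current rolling H estimate:

```julia
# Hypothetical regime filter: pick a model family from a rolling Hurst value.
regime(H) = H < 0.45 ? :mean_reversion :
            H > 0.55 ? :trend_following : :neutral

regime(0.40)  # → :mean_reversion
```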

As a note of interest, on a 1 bar daily return series for the SPY ETF at lags 2:20:

Values of H are below 0.50 and in mean reversion territory. Let's shorten the lags one more time, to 2:10.

We see significant mean reversion H values. At holding periods below 20 bars, a mean reversion model would be best suited.

A way I like to think of this is: how quickly does something diffuse from the mean? A mean reverting series is almost stuck in the mud, with prices centered around a mean. I like to think of it as a compressed coil; in terms of the fractal dimension, which pertains to the roughness of a surface, mean reversion is most certainly rough. On the other side, a strong trend has a faster rate of diffusion from the mean, or in the fractal dimension a smooth surface. The S&P 500 is a blend of this action: over a shorter holding period there is a mean reversion tendency (rough), but over a longer time horizon the market displays trending behavior (smooth).

In closing, and to recap: we see how the Hurst exponent and the autocorrelation of a series are related. Random processes show no dependence from one point in time to the next, having no correlation and a corresponding Hurst value of 0.5. On the other side, a series showing persistent behavior, a trending series, displays high correlation coefficients, where successive points in time are correlated with each other, and H values closer to 1.

Code may be found on my GitHub.