Estimating the Hurst Exponent Using R/S Range

In this post we will estimate the Hurst exponent using the R/S method. The Hurst exponent determines the long range memory of a time series (more below). If a series has no memory ie if each point in time is independent from previous points in time then its said to be more of a random process. Examples of random processes are markov processes, Brownian motion and white noise.

A series which trends has autocorrelation where one point in time is correlated with a lagged version of itself. The autocorrelation of a series may be calculated with the following formula:

auto correlation

In English the top part of the equation (numerator) is the covariance where Yi is the series and Yi+k some lagged version of itself. The bottom part of the equation (denominator) is the variance.

So we simply calculate the covariance of the series and a lagged version of itself and divide by the variance.

We may plot the autocorrelation of the SP500 futures contract close prices.
source: https://www.quandl.com/data/CHRIS/CME_SP1-S-P-500-Futures-Continuous-Contract-1-SP1-Front-Month

Can not validate the accuracy of this data but good enough for illustration purposes.

auto_cor.png

This is far from random and close prices have a high degree of correlation between some lag of itself.

Now we may estimate the Hurst exponent over a SP futures return series using the rescaled analysis method (RS). This is the original method calculated by Harold Edwin Hurst who devised the formula to measure the mean reversion tendency of the Nile river.

We shall estimate H for a 1 bar return series and a 100 bar return series.

The R/S calculation steps are depicted on Wikipedia.

The Hurst exponent is more of an estimate than an absolute calculation. Corresponding H values signal:

H values less than 0.5 = anti persistent behavior, mean reverting
H values of 0.5 = random process
H values greater than 0.5 = persistent behavior, trending

The Hurst exponent procedure is as follows which calculates the R/S range and estimates the Hurst exponent by regression log(R/S) and log(n-lag):

1. Calculate a return series, We will estimate the Hurst exponent on a 1 bar return series. The thesis here is that it will be close to a random process if indeed each bar is independent from the last, or no auto correlation, H=0.5. If we pick a longer return series such as 100 we would expect to see higher H values and higher auto correlation coefficients.
2. Hurst exponent is estimated by regression of a power law of log(R/S) vs log(n-lag). The Slope being the Hurst exponent. For that reason logarithmically spaced block sizes were chosen at lags: [8,16,32,64,128,256,512,1024]
2. Calculate the mean for each block size. In this case 1:8, 2:9 etc.. or i-n+1:i. Do this for each block size.
3. For said block size, subtract the mean from each value in the return series
4. Sum the deviations from the mean
5. Find the maximum and minimum of the sum of deviations from the mean
6. Find R, the range of summed deviations from the mean by subtracting maximum – minimum
7. Calculate the standard deviation of the deviations from the mean
8. Calculate R/S, the rescaled range by dividing R by the stdev of deviations from the mean
9. After rolling through the series calculating the RS value along each block size. We take the mean RS value for each block size. So for each lag (block size) we have one mean value.
10. Perform a regression log2(mean_RS) vs log2(n_lag). The slope is the hurst exponent. The procedure above is detailed below using Julia language:


####################################################################################
# Rolling Hurst
####################################################################################

# initialize outputs
m = Array{Float64}(length(d_es_close),0)
log_n_out = m
out_RS = m

# Set lags (range or specific)
max_lag = 100
n = 200
lags = n:max_lag
# or specific lags #comment out where necessary
lags = [8,16,32,64,128,256,512,1024]
# Specify return series lag
n_lag = 2000

i=1
j=1
c=1
    for j = lags
        n=lags[c]
    # Calculate returns of the series
    #n_lag = lags[c] # set n_lag 1 for 1 day returns
    rets = zeros(d_es_close)
    n_lag = n_lag
    for i in n_lag+1:length(d_es_close)
        rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
    end
    #rets = d_es_close
    # Find mean width of lookback
    mean_out = zeros(rets)
    for i = n:size(rets,1)
                mean_out[i] = mean(rets[i-n+1:i])
            end
    # Subtract deviations from the mean
    dev_mean_out = zeros(mean_out)
    for i = n:size(mean_out,1)
                dev_mean_out[i] = (rets[i] - mean_out[i])
            end
    # Roll sum the deviations from the mean
    sum_out = zeros(dev_mean_out)
    for i = n:size(dev_mean_out,1)
            sum_out[i] = sum(dev_mean_out[i-n+1:i])
        end
    # Find the maximum and minimum of sum of the mean deviations
    max_out = zeros(sum_out)
    for i = n:size(sum_out,1)
                max_out[i] = maximum(sum_out[i-n+1:i])
            end
    min_out = zeros(sum_out)
    for i = n:size(sum_out,1)
                min_out[i] = minimum(sum_out[i-n+1:i])
            end
    # R = Range, max - min
    R_out = zeros(dev_mean_out)
    for i= n:size(dev_mean_out,1)
        R_out[i] = max_out[i] - min_out[i]
    end
    # Rolling standard deviation of (returns) the mean
    stdev_out = zeros(rets)
    for i = n:size(rets,1)
            stdev_out[i] = sqrt(var(dev_mean_out[i-n+1:i]))
        end
    # Calculate rescaled range (Range (R_out) / stdev of returns )
    RS_out = zeros(R_out)
    for i = n:size(R_out,1)
            RS_out[i] = R_out[i] / stdev_out[i]
        end
    # Calculate log_n (n)
    index = fill(n,length(rets))
    log_n = zeros(rets)
    for i =n:size(index,1)
        log_n[i] = log2(index[i])
    end

# out
log_n_out = hcat(log_n_out, log_n)
#log_rs_out = hcat(log_rs_out, log_RS)
out_RS = hcat(out_RS, RS_out) # re-scaled range

c = c+1
end

# access dims of the matrix
# row ,col
#log_rs_out[20,:]

# Calculate expected value for R/S over various n
# Take the rolling mean over each varying n lag at n width
expected_RS = zeros(size(out_RS,1), size(out_RS,2))
n=100
for j = 1:size(out_RS,2)  # loop on column dimension
    for i = n:size(out_RS,1) # loop on row dimension
            expected_RS[i,j] =  mean(out_RS[i-n+1:i,j])
            end
        end

# log 2
expected_log_RS = zeros(size(out_RS,1), size(out_RS,2))
c=1
for j = 1:size(expected_log_RS,2)  # loop on column dimension
    for i = n:size(expected_log_RS,1) # loop on row dimension
            expected_log_RS[i,j] =  log2(expected_RS[i,j])
            end
                        c=c+1
        end

    # Regression slope of log(n) and expected value of log(R/S)
    # x = log(n), y = log(R/S)
    b_slope = zeros(size(out_RS,1))
    A_intercept = zeros(size(out_RS,1))

    for i = n:size(out_RS,1)
        xi = log_n_out[i,:]  # grab the row for varying lags of log_n
        yi = expected_log_RS[i,:] # grab the row for varying lags of r/s value
        # Mean of X (Mx)
        Mx = mean(xi)
        # Mean of Y (My)
        My = mean(yi)
        # Standard deviation of X (sX)
        sX = sqrt(var(xi))
        # Standard Deviation of Y (sY)
        sY = sqrt(var(yi))
        # Correlation of x and y  (r)
        r = cor(xi,yi)
        # slope (b) = r * (sY/sX)
        b = r * (sY/sX)
    # find intercept A = MY - bMX
    A = My - (b*Mx)

# out
b_slope[i] = b
A_intercept[i] = A
end

Using the code above we may plot the hurst exponent for a 1 bar return series.

Hurst (2)

If we have a random process, successive points in time are independent from each other. We see Hurst values fluctuated around the .50 area with H .75 in 2009 during a strong trending period.

We may see the effect of analyzing a longer return stream, lets say 100 bar return series:

H_close.png

We see Hurst values closer to 1 for a 100 bar return series.

Furthermore to see how the Hurst exponent and auto correlation are related, we may calculate a rolling auto correlation:

# Rolling Autocorrelation
# Sliding window (k width)
mean_out = zeros(size(d_es_close,1))
var_out = zeros(size(d_es_close,1))
cov_out = zeros(size(d_es_close,1))
autocor_out = zeros(size(d_es_close,1))
devmean_out = zeros(size(d_es_close,1))
n=1000
k = 1 # set lag
n_lag = 100 # return series lag

# Calculate 1 bar return series
rets = zeros(size(d_es_close,1))
for i in n_lag+1:length(d_es_close)
    rets[i] = ((d_es_close[i] / d_es_close[i-n_lag])-1) # rets[i] = ((d_es_close[i] / d_es_close[i-n_lag+1])-1)
end

# lag the series by k
lagged_series = [fill(0,k); rets[1:end-k]]

# Calculate the mean of the rolling  sample
for i = n:size(rets,1)
    mean_out[i] = mean(rets[i-n+1:i])
end

# Calculate deviations from the mean
for i = n:size(rets,1)
devmean_out[i] = rets[i] - mean_out[i]
end

# Calculate rolling variance
for i = n:size(rets,1)
    var_out[i] = var(rets[i-n+1:i])
end

# Calculate rolling covariance
for i = n:size(rets,1)
    if i+k >= size(rets,1)
        break
    end
    cov_out[i] = cov(rets[i-n+1:i],lagged_series[i-n+1:i])
end

# Rolling Autocorrelation
for i =n:size(rets,1)
    autocor_out[i] = cov_out[i] / var_out[i]
end

For a 1 bar return series at lag, k=1

auto_cor_1_bar

We see very little correlation from one point in time to the next. The respective hurst exponent is around 0.40 to 0.55 range.

And the auto correlation for a 100 bar return series at lag , k=1

auto_cor_100_bar.png

We see strong correlation at the 100 bar return series with correlation coefficients greater than .96 with respective hurst exponents closer to 1.0.

So whats the point in all of this?

As depicted above information pertaining to the (non) randomness of series can may be of some benefit when designing models to fit to the data. The autocorrelation / Hurst exponent may be used to compliment existing strategies signalling the type of regime the market currently is under and may add to strategies which better suit current conditions.

As a  note of interest. On a 1 bar daily return series for the SPY ETF at lags 2:20:

H_two_twenty

Values of H are below .50 and in mean reversion territory. Lets shorten the lags one more time 2:10.

H_two_ten

We see significant mean reversion H values. At holding periods sub 20 bars, a mean reversion model would be best suited.

A way I like to think of this is how quickly does something diffuse from the mean? A mean reversion series is almost stuck in the mud, and the prices are centered around a mean. I like to think of it as a compressed coil or in the fractal dimension, pertains to the roughness of a surface where mean reversion is most certainly rough. On the other side a strong trend has a faster rate of diffusion from the mean or in the fractal dimension has a smooth surface. The S&P500 is a blend of this action where on a shorter holding period, there is mean reversion tendency (rough) but on longer time horizon the market displays trending behavior (smooth).

In closing and to recap. We see how the Hurst exponent and the auto correlation of a series are related to each other. Random processes show no dependence from one point in time to a successive point in time, or having no correlation with a subsequent Hurst value of 0.5. On the other side of this, a series showing persistent behavior, a trending series, displayed high correlation coefficients where successive points in time were correlated with each other and subsequent H values closer to 1.

Code may be found on my github.

Author: Andrew Bannerman

Integrity Inspector. Quantitative Analysis is a favorite past time.

One thought on “Estimating the Hurst Exponent Using R/S Range”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s