Basic Data Handling With Python

I have recently been looking for live execution platforms for placing a strategy into production. I have found zipline for python and with the intention of using zipline as a live execution platform I figured it would be prudent to pick up some python.

The learning curve from moving to R to python doesnt look that steep and in this post I will cover some basic data handling using python. We will:

1. Load Data, view data, check formats, convert integer date to Date format
2. Plot price series
3. Calculate cumulative returns / Annualized return, Annualized sharpe ratio
4. Calculate rolling mean, standard deviation and z-score
5. Perform linear regression by preparing two series, joining two data sets by common date and running the ordinary least squares regression

# Python Code For General working of data
# Andrew Bannerman 11.4.2017

# Import library
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.regressionplots import abline_plot # For
plotting regression line

# Read Data
df = pd.read_csv("Your Data Dir/MASTER_DATA_DUMP/SPY.csv")

# Print First 10 Lines Of Data
print(df.head(10))

# Print First 10 Rows Of Specific Data Frame Column
print(df.Close.head(10))

# Print Last 10 Lines of Data
print(df.tail(10))

# Print Last 10 Lines of Specific Data Frame Column
print(df.Close.tail(10))

# Check Data Frame Types
print(type(df))

# Check Type Of 1 Column
print(df['Close'].dtype)
print(df['Date'].dtype)

# Convert yyyymmdd to date format
df['Date'] = pd.to_datetime(df['Date'].astype(str), format ='%Y%m%d')
print(df['Date'].dtype) # Check date is in date format
print(df.Date.head(10)) # Print head of date column

# Create Plot Of SPY Close Prices
df.plot(x='Date',y='Close')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.title('SPY Close 1993 To Present')
plt.legend(loc='upper left')
plt.show()

Figure_1

# Calculate % Differences of Close Column
# Calculate Cumualtive Return
# use Dataframe.shift(1) to move series by -1 (yesterday)
df['ret'] = (df.Close - df.Close.shift(1)) / df.Close.shift(1)
df['cum_ret'] = (1 + df.ret).cumprod # Cumulative return
print(df.cum_ret.iloc[-1]) # Print last row of cumulative returns
(use iloc[-1])

# Plot Cumulative Returns
df.plot(x='Date',y='Close')
plt.xlabel('cum_ret')
plt.ylabel('Cumulative Returns')
plt.title('SPY Cumulative Returns')
plt.legend(loc='upper left')
plt.show()

Figure_1-1

# Calculate Annualized Sharpe Ratio
sharpe_ratio = np.sqrt(252) * (df.ret.mean() / df.ret.std())
print("Annualized Share Ratio is",sharpe_ratio)

# Calculate Annualized Return
time_between = (df.Date.iloc[-1] - df.Date.iloc[0]).days
print(time_between)
cagr = (df.Close.iloc[-1] / df.Close.iloc[1]) ** (365/time_between) -1
print("Annualized Return is",cagr, "Percent")

# Calculate Maximum Draw Down
lookback = 252
rolling_max = df['Close'].rolling(lookback, min_periods=1).max()
daily_drawdown = df['Close']/rolling_max - 1
daily_drawdown.plot()
plt.ylabel('Draw Down %')
plt.title('252 Day Rolling Draw Down')
plt.legend(loc='upper left')
plt.show()

Figure_1-2

# Calculate Rolling Statistics
# Calculate Rolling Z-Score
lookback = 12
df['moving_average'] = df.Close.rolling(lookback).mean() # Rolling
Moving Average
df['rolling_std_dev'] = df.Close.rolling(lookback).std() # Rolling stdev
df['rolling_zscore'] = (df.Close - df.moving_average) /
df.rolling_std_dev  # Zscore

# Plot Rolling Mean And Stdev
df.plot(x='Date', y=['Close','moving_average'])
plt.title('SPY Close With Moving Average')
plt.show()
df.plot(x='Date', y=['rolling_zscore'])
plt.title('N Period Look back Rolling Zscore')
plt.show()

Figure_1-3

Figure_1-4

# Linear Regression Two Series
SPX = pd.read_csv("Your Data Dir/MASTER_DATA_DUMP/$SPX.csv")
VIX = pd.read_csv("Your Data Dir/MASTER_DATA_DUMP/$VIX.csv")

# Prepare For Join By Common Date
# Convert yyyymmdd to date format
SPX['Date'] = pd.to_datetime(SPX['Date'].astype(str), format ='%Y%m%d')
VIX['Date'] = pd.to_datetime(VIX['Date'].astype(str), format ='%Y%m%d')

# Join Two Data Series By Common Date
merge_df = pd.merge(left=SPX, left_on='Date', right=VIX, right_on='Date')
print(merge_df.tail(10))

# Scatter Plot of SPX and VIX
merge_df.plot(x='Close_x',y='Close_y', kind='scatter')
plt.title('Scatter Plot VIX and SPX')
plt.show()

Figure_1_5

# Perform Linear Regression On Close Prices
model = sm.OLS(merge_df.Close_y, merge_df.Close_x)
results = model.fit()
print(results.summary())

ols

Author: Andrew Bannerman

Integrity Inspector. Quantitative Analysis is a favorite past time.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s