Julia – Build any time resolution using 1 minute data

Reliable data makes for more accurate models. It is not the end of the world if there are minor discrepancies, although the data does need to be representative in order to build models and make good assumptions.

Common data errors tend to be found at market closing times. We want the auction price, not the last price: the last price might be some fluff trade of a single lot. We want the real close, the auction close. This increases the accuracy of the daily close data.

To achieve this, we can create any resolution from 1-minute bars and sample them at whichever time resolution we wish. For example, if we want to create more accurate daily close data using the auction close price at 15:15 for ES, we may simply sample the 1-minute close with a time stamp of 15:15.

If we want to build models that avoid a period of volatility, such as the last 15 minutes of trade, we may instead sample every 15:00 time stamp.

So, in order to have more control over the creation of the data, I wrote the code attached to this blog post.

If we build 15-minute data, we may sample the 1-minute close price at each :00, :15, :30 and :45 increment.

For 30-minute data, we may sample the 1-minute close price at each :00 and :30 increment.
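
To make this concrete, here is a minimal Julia sketch of the sampling step. This is illustrative only, not the code from my github: the bars DataFrame, its column names and the toy values are assumptions for the example.

using DataFrames, Dates

# Toy 1-minute close series for the last stretch of the session
# (synthetic values; real use would load the full 1-minute history).
times = collect(Time(15, 0):Minute(1):Time(15, 20))
bars  = DataFrame(Time = times, Close = 100.0 .+ cumsum(randn(length(times))))

# Keep only the 1-minute closes whose minute stamp falls on the chosen marks.
sample_closes(df, marks) = filter(row -> minute(row.Time) in marks, df)

bars15 = sample_closes(bars, (0, 15, 30, 45))  # :00/:15/:30/:45 -> 15-minute closes
bars30 = sample_closes(bars, (0, 30))          # :00/:30         -> 30-minute closes

# Daily close sampled at the ES auction time of 15:15:
auction_close = filter(row -> row.Time == Time(15, 15), bars)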

# Handling missing data

Where missing data was encountered, it was dealt with by forward filling the closest value. Suppose the time stamps ran: 09:01, 09:02, 09:03, 09:05, 09:07, 09:08, where 09:04 and 09:06 are missing. To fill the missing 09:04 we forward fill the 09:03 data point, and to fill the missing 09:06 we forward fill 09:05.
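
Here is a rough Julia sketch of that forward fill, using a toy series with the same 09:04 and 09:06 gaps; the column names and minute grid are assumptions for the example, not the attached code.

using DataFrames, Dates

# Toy series with 09:04 and 09:06 missing, matching the example above.
have = DataFrame(Time  = Time.(9, [1, 2, 3, 5, 7, 8]),
                 Close = [1.0, 2.0, 3.0, 5.0, 7.0, 8.0])

# Left-join onto a complete minute grid so the gaps show up as `missing`.
grid = DataFrame(Time = collect(Time(9, 1):Minute(1):Time(9, 8)))
full = sort!(leftjoin(grid, have, on = :Time), :Time)

# Forward fill: carry the last observed close into each gap (na.locf style).
for i in 2:nrow(full)
    if ismissing(full.Close[i])
        full.Close[i] = full.Close[i-1]
    end
end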

This methodology seems consistent with how TradeStation builds their larger resolution data from 1-minute data, although if a data point is simply too far away, TS has an ignore feature, so they would miss the point entirely in the new time resolution sample. (After studying further, TS forward or back fills missing data if it is on the same day; I am now looking to make this edit as it makes complete sense!)

Despite this, I feel the procedure deployed and the low frequency of missing data make the data set good enough to build models on.

# Data Accuracy

A point to note pertaining to data accuracy: when building models on OHLC price data, it makes sense to form statistical distributions by altering the original reported OHLC prices by some % of the day's ATR over many iterations. Thus we may observe how sensitive the models are to changes in the reported OHLC prices. In the absence of expensive bid/ask data this is a good alternative.
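
As a rough illustration, the Julia sketch below perturbs each OHLC field by a random fraction of the day's ATR. The DataFrame layout, the perturb helper and the +/-20% band are all illustrative choices, not taken from the attached code.

using DataFrames

# Toy daily OHLC bars with a precomputed ATR column (values are made up).
days = DataFrame(Open  = [100.0, 101.5], High  = [102.0, 103.0],
                 Low   = [ 99.0, 100.5], Close = [101.0, 102.5],
                 ATR   = [  2.0,   2.2])

# One perturbed copy: shift each OHLC field by a random fraction of that
# day's ATR. The +/-20% band is an arbitrary illustrative choice.
function perturb(df; band = 0.2)
    out = copy(df)
    for col in (:Open, :High, :Low, :Close)
        out[!, col] = df[!, col] .+ (2 .* rand(nrow(df)) .- 1) .* band .* df.ATR
    end
    return out
end

# Re-run a model across many perturbed copies to gauge its sensitivity.
samples = [perturb(days) for _ in 1:1000]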

# How to have confidence

The data of course has to be representative of, and accurate to, how prices unfolded in real time; artificial signals would ensue if the data were of terrible quality. For this reason, by forming distributions we may land in some place of confidence. If a trading model can take a hammering over the daily random % ATR adjustments, future confidence can be gained: it suggests the model is fit to signal, and that noise or errors in OHLC reporting will not throw it off and make it redundant.

The code for building any resolution may be found on my github, with the steps as follows:

  1. Load ES daily data to index the days of trading. This removes all market holiday closures.
  2. Create a date and time index and join the original 1-minute data to the date/time index (sketched after this list).
  3. Forward fill missing data. A for loop within acts like na.locf.
  4. Find half-day market closures. Here we load 30-minute data and loop through looking for early closures and a late market open (once: Sep 11, 2002).
  5. Save market holiday dates and times and loop through the series, filling the holiday times with "" or 0.0.
  6. Reduce to remove all 0 and "" entries.
  7. Build any time frame resolution: 10 min, 15 min, 30 min, etc.
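
For step 2, the grid-building idea looks roughly like this in Julia; the dates, session times and names are assumptions for illustration, not the github code itself.

using DataFrames, Dates

# Trading dates come from the daily series (so holidays are already gone);
# the session times below are illustrative.
dates = [Date(2017, 10, 2), Date(2017, 10, 3)]
times = collect(Time(8, 30):Minute(1):Time(15, 15))

# Full date/time index: every session minute for every trading day.
grid = crossjoin(DataFrame(Date = dates), DataFrame(Time = times))

# Joining the raw 1-minute bars onto the grid leaves gaps as `missing`,
# ready for the forward fill shown earlier (minute_bars is hypothetical):
# full = leftjoin(grid, minute_bars, on = [:Date, :Time])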

Hope you enjoy and can save time vs. downloading each and every time resolution 🙂

See the bar plot of the number of errors in re-sampled vs. TradeStation data:

[Bar plot: errors in re-sampled vs. TradeStation data]

julia> out_all
6×2 DataFrames.DataFrame
│ Row │ x1    │ x2 │
├─────┼───────┼────┤
│ 1   │ Date  │ 0  │
│ 2   │ Time  │ 0  │
│ 3   │ Open  │ 58 │
│ 4   │ High  │ 9  │
│ 5   │ Low   │ 13 │
│ 6   │ Close │ 0  │

58 discrepancies involve the open price. There are a total of 67,387 30-minute bars in the example, so this represents 0.0861% of the total sample.

TradeStation had missing data around the open in the 1-minute prices. I forward filled the previous day's close in this case, where it would be more accurate to backfill the next available quote from the same day; I will likely update this pretty soon. There were other special closes, such as 1-minute silences, which I didn't account for in my data set. A list may be found here:

Nyse-Closings.pdf

Thanks for reading.

Julia – Download Free Data Using Alphavantage API

For those of you wanting to get started with, or become familiar with, the Julia language, I wrote a small script to import .csv data using the Alphavantage API.

In this example we demonstrate how to download a single .csv file, as well as how to batch download multiple .csv files and export them as .csv.

First, let's download a single .csv: AAPL 1-minute data using the Alphavantage API. You will need to insert your own API key.

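The embedded gist isn't reproduced here, but a minimal sketch of the single download might look like the following, assuming the HTTP.jl and CSV.jl packages; the URL parameters follow the Alphavantage docs and YOUR_API_KEY is a placeholder.

using HTTP, CSV, DataFrames

# Request AAPL 1-minute bars as CSV. Replace YOUR_API_KEY with your own key.
url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY" *
      "&symbol=AAPL&interval=1min&outputsize=full&apikey=YOUR_API_KEY&datatype=csv"

resp = HTTP.get(url)
aapl = CSV.read(IOBuffer(resp.body), DataFrame)   # parse the response body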

And plot the output:

[Plot: AAPL 1-minute close prices]

If we wish to download multiple .csv files and export to .csv we may:

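Again as a sketch rather than the original gist, a batch version loops over a symbol vector; the symbols, output file names and one-second pause are illustrative choices.

using HTTP, CSV, DataFrames

# Loop over a (self-chosen) symbol vector and write each download to .csv.
symbols = ["AAPL", "MSFT", "SPY"]
for sym in symbols
    url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY" *
          "&symbol=$(sym)&interval=1min&outputsize=full&apikey=YOUR_API_KEY&datatype=csv"
    df = CSV.read(IOBuffer(HTTP.get(url).body), DataFrame)
    CSV.write("$(sym)_1min.csv", df)   # one file per symbol
    sleep(1)                           # stay inside the free API's rate limit
end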

Those of you using R may visit: Free Data: Alternative to Yahoo Finance

Full Julia code on my github

Free Coin Data: Batch download coin data using R

We will use the CryptoCompare API. The data is free, and we will download daily data.

Required packages

library(jsonlite)
library(data.table)
library(ggplot2)

Next, download the full coin list from CryptoCompare and save it to a data frame:

# Obtain coin list
response = fromJSON("https://www.cryptocompare.com/api/data/coinlist")
df = as.data.frame(data.table::rbindlist(response$Data, fill=TRUE))

The response comes back as a list; for ease we will save it to a .csv:

# Write coin list to .csv
write.csv(df,file="D:/R Projects/Final Scripts/BitCoin Scripts/coin_list.csv")

Load the list of symbols from the .csv file and save the symbols in a vector

# Load Tickers From Coin List
read.data <- read.csv("D:/R Projects/Final Scripts/BitCoin Scripts/coin_list.csv", header=TRUE, stringsAsFactors = FALSE)
tickers <- c(read.data$Symbol)

Clean the list

# Those symbols with * at the end of their name
# Remove *
tickers <- gsub("[*]", "", tickers)

Next we write a loop to iterate through the ticker symbol vector and download each symbol to .csv format. As an extra bonus, we will also plot the close price and save the plot to our output folder.

# Download Daily Data 
# Loop for downloading all .csvs 
# Save each coin close plot
# Add error catching to loop
# Add completion time of each iteration
for (i in 1:length(tickers)) {
  tryCatch({
    ptm0 <- proc.time()      # Start the iteration timer
    next.coin <- tickers[i]  # Next coin in ticker vector
    coin_daily <- fromJSON(paste0("https://min-api.cryptocompare.com/data/histoday?fsym=", next.coin, "&tsym=USD&allData=true&e=CCCAGG"))
    df <- data.frame("time" = coin_daily$Data$time, "close" = coin_daily$Data$close, "open" = coin_daily$Data$open, "high" = coin_daily$Data$high, "low" = coin_daily$Data$low, "volumefrom" = coin_daily$Data$volumefrom, "volumeto" = coin_daily$Data$volumeto)
    # Save close price plot to the output folder
    mytitle <- next.coin
    graphics.off()           # Close graphic device before next plot
    p <- ggplot(df, aes(time)) +
      theme_classic() +
      geom_line(aes(y = close), colour = "red") +
      ggtitle(mytitle, subtitle = "") +
      labs(x = "Time", y = "Close") +
      theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
    ggsave(paste0(next.coin, ".png"), plot = p, path = "D:/R Projects/Final Scripts/BitCoin Scripts/daily_plot")
    # Save each coin to its own .csv
    write.csv(df, paste0("D:/R Projects/Final Scripts/BitCoin Scripts/data/", next.coin, ".csv"))
    Sys.sleep(0.1)           # Small pause between API calls
    time <- as.numeric((proc.time() - ptm0)[3])
    cat('\n', 'Iteration', i, 'took', time, "seconds to complete")
  }, error = function(e) { print(paste("i =", i, "failed:", conditionMessage(e))) })
}

Free Data: Alternative to Yahoo Finance

Alpha Vantage has an API that offers free data: intraday data spanning the last 10 to 15 days at resolutions of 1 min, 5 min, 15 min, 30 min and 60 min, plus daily, weekly and monthly data spanning the past 20 years.

We can use R to interact with the Alpha Vantage API. We use the fread() command from the data.table package to download data directly into a data frame within R.

Note: insert your own Alpha Vantage API key inside the URL.

require(lubridate)
require(data.table)
require(dplyr)
GE <- fread("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=GE&outputsize=full&apikey=YOUR_API_KEY_HERE&datatype=csv") #fread() data.table for downloading directly to a data frame
GE$timestamp <- ymd(GE$timestamp)   #Lubridate to change character date to date format
GE <- arrange(GE,timestamp)   #dplyr to sort data frame by date ascending order 

#Plot GE Data
plot(GE$timestamp,GE$close,type="l",main="Alpha Vantage - GE Close Prices")

[Plot: Alpha Vantage – GE close prices]

There are different variables that go into the Alpha Vantage URL; for full usage you may visit the documentation page.

Download Multiple .csv Files To Hard Drive

If we want to download multiple files at one time, we can specify them ourselves in a vector and use a loop to download each file name in that vector.


# Download multiple .csv files to hard drive
# Self specify symbols in a vector
file.list <- c("DDM","MVV","QLD","SAA","SSO","TQQQ","UDOW","UMDD","UPRO","URTY","UWM", "BIB", "FINU","LTL","ROM", "RXL", "SVXY","UBIO","UCC","UGE","UPW","URE","USD","UXI","UYG","UYM","DOG","DXD","MYY","MZZ","PSQ","QID","RWM","SBB","SDD","SDOW","SDS","SH","SPXU","SQQQ","SRTY","TWM","SMDD","UVXY","VIXM","VIXY")

for (i in 1:length(file.list)) {
  file.name.variable <- file.list[i]
  url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=", file.name.variable, "&outputsize=full&apikey=YOUR_API_KEY_HERE&datatype=csv")
  destfile <- paste0("D:/R Projects/", file.name.variable, ".csv")
  download.file(url, destfile, mode = "wb")
}

Or, if we can find a list of symbols online, we can import those into R, iterate through each symbol and download each file to our hard drive.

Import 1700 ETF Symbols From Nasdaq.com – Download .csv To Hard Drive

In this case we can read an ETF list of 1700 symbols from nasdaq.com.
Then we can download each symbol to our hard drive:

# Read ETF list csv file from nasdaq.com
# Use fread() from data.table package
# install.packages("data.table")
require(data.table)
read.data <- fread("http://www.nasdaq.com/investing/etfs/etf-finder-results.aspx?download=Yes")

# Make vector of symbol names
symbol.names <- read.data$Symbol

# Count Total ETF tickers
NROW(symbol.names)

for (i in 1 : length(symbol.names)) {
  file.name.variable <-  symbol.names[i]
  url <- paste0("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=",file.name.variable,"&outputsize=full&apikey=YOUR_API_CODE_HERE&datatype=csv")
  destfile <- paste0("D:/R Projects/",
                     file.name.variable, ".csv")
  download.file(url, destfile, mode="wb")
}

Also to note: you may import a symbol list from your hard drive. The same principle applies.

R – Scrape List Of ETF’s From Nasdaq.Com

Using the package data.table and the function fread(), we can read a .csv file hosted on the internet without downloading it to our hard drive.

# Obtain List of ETFS From nasdaq.com
# Andrew Bannerman 10.4.2017

library(data.table)

# Read ETF list csv file from nasdaq.com
# Use fread() from data.table package 
# install.packages("data.table")
read.data <- fread("http://www.nasdaq.com/investing/etfs/etf-finder-results.aspx?download=Yes")

# Subset Column 1, Symbol Column
symbol.col <- read.data$Symbol

# Export symbol column as .txt file 
write.table(symbol.col,"C:/R Projects/Data/etf_list_nasdaq_dot_com/etf.list.txt",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)

# Count ETF tickers 
NROW(symbol.col)

As of 10.4.2017 there are a total of 1,726 ETFs listed on nasdaq.com.