The scripts below automate this process. The first one, downloadVixFutures.py, gets the data from CBOE, saves each file in a data directory, and then combines them into a single CSV file, vix_futures.csv.
The second script, reconstructVXX.py, parses vix_futures.csv, calculates the daily returns of VXX, and saves the results to reconstructedVXX.csv.
To check the calculations, I've compared my simulated results with the SPVXSTR index data. The two agree pretty well; see the charts below.
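A check like this can also be put into numbers. The snippet below is only a minimal sketch: the helper name compare_returns is hypothetical and the return values are made up, not real VXX or SPVXSTR data. It computes the correlation of two daily-return series and their largest absolute daily difference, using plain Python so it runs on both Python 2 and Python 3.

```python
from math import sqrt

def compare_returns(simulated, reference):
    """Return (correlation, max absolute difference) of two equal-length return lists."""
    if len(simulated) != len(reference) or len(simulated) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(simulated)
    mean_s = sum(simulated) / float(n)
    mean_r = sum(reference) / float(n)
    # covariance and variances (unnormalised, the factor 1/n cancels out)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(simulated, reference))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    corr = cov / sqrt(var_s * var_r)
    max_diff = max(abs(s - r) for s, r in zip(simulated, reference))
    return corr, max_diff

# Made-up daily returns, for illustration only
sim = [0.010, -0.020, 0.015, -0.005]
ref = [0.011, -0.019, 0.014, -0.006]
corr, max_diff = compare_returns(sim, ref)
print("correlation: %.4f, max abs diff: %.4f" % (corr, max_diff))
```

A correlation close to 1 together with a small maximum difference would indicate that the two series agree day by day, not just on average.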
Note: For a fee, I can provide support to get the code running or create a stand-alone program. Contact me if you are interested.
--------------------------------source code--------------------------------------------
Code for getting futures data from CBOE and combining it into a single table
downloadVixFutures.py
#!/usr/bin/env python
#-------------------------------------------------------------------------------
# Name: download CBOE futures
# Purpose: get VIX futures data from CBOE, process data to a single file
#
# Created: 15-10-2011
# Copyright: (c) Jev Kuznetsov 2011
# Licence: BSD
#-------------------------------------------------------------------------------
from urllib import urlretrieve
import os
from pandas import *
import datetime
import numpy as np
m_codes = ['F','G','H','J','K','M','N','Q','U','V','X','Z'] #month codes of the futures
codes = dict(zip(m_codes,range(1,len(m_codes)+1)))
dataDir = os.path.join(os.path.dirname(os.path.abspath(__file__)),'data') # data directory next to this script
def saveVixFutureData(year,month, path, forceDownload=False):
    ''' Get future from CBOE and save to file '''
    fName = "CFE_{0}{1}_VX.csv".format(m_codes[month],str(year)[-2:])
    target = os.path.join(path,fName) # portable path, works on Windows and Linux
    # skip files that are already on disk, unless a fresh download is forced
    if os.path.exists(target) and not forceDownload:
        print 'File already downloaded, skipping'
        return
    urlStr = "http://cfe.cboe.com/Publish/ScheduledTask/MktData/datahouse/{0}".format(fName)
    print 'Getting: %s' % urlStr
    try:
        urlretrieve(urlStr,target)
    except Exception as e:
        print e
def buildDataTable(dataDir):
    """ create single data sheet """
    files = os.listdir(dataDir)
    data = {}
    for fName in files:
        print 'Processing: ', fName
        try:
            df = DataFrame.from_csv(dataDir+'/'+fName)
            code = fName.split('.')[0].split('_')[1]
            month = '%02d' % codes[code[0]]
            year = '20'+code[1:]
            newCode = year+'_'+month
            data[newCode] = df
        except Exception as e:
            print 'Could not process:', e
    full = DataFrame()
    for k,df in data.iteritems():
        s = df['Settle']
        s.name = k
        s[s<5] = np.nan
        if len(s.dropna())>0:
            full = full.join(s,how='outer')
        else:
            print s.name, ': Empty dataset.'
    full[full<5]=np.nan
    full = full[sorted(full.columns)]
    # use only data after this date
    startDate = datetime.datetime(2008,1,1)
    idx = full.index >= startDate
    full = full.ix[idx,:]
    #full.plot(ax=gca())
    print 'Saving vix_futures.csv'
    full.to_csv('vix_futures.csv')
if __name__ == '__main__':
    if not os.path.exists(dataDir):
        print 'creating data directory %s' % dataDir
        os.makedirs(dataDir)
    for year in range(2008,2013):
        for month in range(12):
            print 'Getting data for {0}/{1}'.format(year,month+1)
            saveVixFutureData(year,month,dataDir)
    print 'Raw data was saved to {0}'.format(dataDir)
    buildDataTable(dataDir)
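The naming convention the download script relies on can be tried out in isolation. This is a minimal sketch of the two conversions it performs: building the CBOE file name from a year and a 0-based month, and turning that file name into the sortable 'YYYY_MM' column label. The helper names file_name and column_code are hypothetical and not part of the script, and the code is written to run on both Python 2 and Python 3.

```python
# Month codes of the futures, January..December (same list as in the script)
M_CODES = ['F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z']

def file_name(year, month):
    """CBOE file name for a contract; month is 0-based, as in the script."""
    return "CFE_{0}{1}_VX.csv".format(M_CODES[month], str(year)[-2:])

def column_code(fname):
    """Convert a name like 'CFE_F08_VX.csv' to the column label '2008_01'."""
    code = fname.split('.')[0].split('_')[1]  # e.g. 'F08'
    month = M_CODES.index(code[0]) + 1        # 1-based month number
    return "20{0}_{1:02d}".format(code[1:], month)

name = file_name(2008, 0)
print(name)               # CFE_F08_VX.csv
print(column_code(name))  # 2008_01
```

Because the labels are zero-padded, sorting the columns alphabetically also sorts the contracts by expiry, which is what the sorted(full.columns) step depends on.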
Code for reconstructing the VXX
reconstructVXX.py
"""
Reconstructing VXX from futures data
author: Jev Kuznetsov
License : BSD
"""
from __future__ import division
from pandas import *
import numpy as np
class Future(object):
    """ vix future class, used to keep data structures simple """
    def __init__(self,series,code=None):
        """ code is optional, example '2010_01' """
        self.series = series.dropna() # price data
        self.settleDate = self.series.index[-1]
        self.dt = len(self.series) # roll period (this is default, should be recalculated)
        self.code = code # string code 'YYYY_MM'
    def monthNr(self):
        """ get month nr from the future code """
        return int(self.code.split('_')[1])
    def dr(self,date):
        """ days remaining before settlement, on a given date """
        return(sum(self.series.index>date))
    def price(self,date):
        """ price on a date """
        return self.series.get_value(date)
def returns(df):
    """ daily return """
    return (df/df.shift(1)-1)
def reconstructVXX():
    """
    calculate VXX returns
    needs a previously preprocessed file vix_futures.csv
    """
    X = DataFrame.from_csv('vix_futures.csv') # raw data table
    # build end dates list & futures classes
    futures = []
    codes = X.columns
    endDates = []
    for code in codes:
        f = Future(X[code],code=code)
        print code,':', f.settleDate
        endDates.append(f.settleDate)
        futures.append(f)
    endDates = np.array(endDates)
    # set roll period of each future
    for i in range(1,len(futures)):
        futures[i].dt = futures[i].dr(futures[i-1].settleDate)
    # Y is the result table
    idx = X.index
    Y = DataFrame(index=idx, columns=['first','second','days_left','w1','w2','ret'])
    # W is the weight matrix
    W = DataFrame(data = np.zeros(X.values.shape),index=idx,columns = X.columns)
    # for VXX calculation see http://www.ipathetn.com/static/pdf/vix-prospectus.pdf
    # page PS-20
    for date in idx:
        i = np.nonzero(endDates>=date)[0][0] # find first not expired future
        first = futures[i] # first month futures class
        second = futures[i+1] # second month futures class
        dr = first.dr(date) # number of remaining dates in the first futures contract
        dt = first.dt # number of business days in roll period
        W.set_value(date,codes[i],100*dr/dt)
        W.set_value(date,codes[i+1],100*(dt-dr)/dt)
        # this is all just debug info
        Y.set_value(date,'first',first.price(date))
        Y.set_value(date,'second',second.price(date))
        Y.set_value(date,'days_left',first.dr(date))
        Y.set_value(date,'w1',100*dr/dt)
        Y.set_value(date,'w2',100*(dt-dr)/dt)
    valCurr = (X*W.shift(1)).sum(axis=1) # value on day N
    valYest = (X.shift(1)*W.shift(1)).sum(axis=1) # value on day N-1
    Y['ret'] = valCurr/valYest-1 # index return on day N
    return Y
##-------------------Main script---------------------------
Y = reconstructVXX()
print Y.head(30)
Y.to_csv('reconstructedVXX.csv')
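The weighting and return steps inside the loop can be illustrated without pandas. This is a minimal sketch with made-up prices: roll_weights and index_return are hypothetical helper names, not part of the script. The weights follow the same formulas as the W matrix, and the return for day N uses the weights from day N-1, as in the valCurr/valYest lines.

```python
def roll_weights(dr, dt):
    """Percentage weights of the first and second month contracts.

    dr: business days remaining in the first contract
    dt: business days in the roll period
    """
    w1 = 100.0 * dr / dt
    w2 = 100.0 * (dt - dr) / dt
    return w1, w2

def index_return(prices_prev, prices_curr, weights_prev):
    """Return on day N, valuing both days with the weights set on day N-1."""
    val_curr = sum(p * w for p, w in zip(prices_curr, weights_prev))
    val_prev = sum(p * w for p, w in zip(prices_prev, weights_prev))
    return val_curr / val_prev - 1

# Made-up numbers: 5 of 20 roll days left, so most weight is already
# in the second month contract.
w = roll_weights(5, 20)
# (first, second) month prices on day N-1 and on day N
r = index_return((20.0, 22.0), (21.0, 22.5), w)
print("weights: %s, return: %.4f" % (w, r))
```

At the start of a roll period (dr equal to dt) all weight sits in the first month contract, and on the settlement date (dr equal to 0) all of it has moved to the second month.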


Do you have equivalent code to recreate historical VXZ data?
thanks
Just read the VXZ brochure and adjust the calculation accordingly
Hi, how do you run this script? I have a finance background and have never programmed before. I downloaded and installed Python, copied and pasted the script from above into IDLE, and when I run the code, nothing really happens. Is it maybe because the URL in the code is broken?
By the way, I was able to get a simple test script to run, so it's installed properly.
Thanks, this would be very helpful instead of manually downloading this data month by month.
I suppose that at least something should happen when you run the script, an error of some sort.
The process of setting up a scientific Python environment is described here: http://tradingwithpython.blogspot.nl/p/setting-up-development-environment.html
What is the minimum python version required? I'm having issues when running on an older Linux install with python 2.4.
I answered my own question: it requires at least 2.6.
By the way, there is an error in the following line:
i =nonzero(endDates>=date)[0][0] # find first not expired future
nonzero is from numpy, so the correct line is
i =np.nonzero(endDates>=date)[0][0] # find first not expired future
Thanks for your work.
Thanks for finding and sharing the correction Dai Ichi :-)
Hi Jev, thank you for your post! It is very helpful for anyone who is interested in this topic.
I am curious whether your process replicates the actual VXX NAVs in addition to its returns. For example,
vxx1 = 10, vxx2 = 10.1, return = 1%;
vxx1 = 20, vxx2 = 20.2, return = 1%; these two sets of data have the same return but different underlying values. The reason I ask is that I tried using your code to replicate the VXX NAV and there were some big discrepancies.
Thank you!
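One way to read the question above: the 'ret' column of the script's output holds only daily returns, so a price-like series has to be built by compounding those returns from a chosen starting value, and a different starting value scales the whole series. The snippet below is a minimal sketch of that step, using the two starting values from the example. The function name compound is hypothetical, and this is not a statement about how the actual VXX NAV is published.

```python
def compound(returns, start_value):
    """Build a level series from daily returns, starting at start_value."""
    levels = [start_value]
    for r in returns:
        # each level is the previous level grown by that day's return
        levels.append(levels[-1] * (1.0 + r))
    return levels

# The same 1% return applied to two different starting values
series_a = compound([0.01], 10.0)
series_b = compound([0.01], 20.0)
print(series_a)
print(series_b)
```

Both series have an identical return, so matching a published level series requires anchoring the compounded series to a known value on a known date.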