Friday, October 28, 2011

kung fu pandas will solve your data problems

I love researching strategies, but sadly I spend too much time on low-level work like filtering and aligning datasets. In fact, about 80% of my time is spent on this mind numbing work. There had got to be a better way than hacking all the filtering code myself, and there is!
Some time ago I've come across a data analysis toolkit pandas especially suited for working with financial data. After just scratching the surface of its capabilities I'm already blown away by what it delivers. The package is being actively developed by Wes McKinney  and his ambition is to create the most powerful and flexible open source data analysis/manipulation tool available. Well, I think he is well on the way!

Let's take a look at just how easy it is to align two datasets:

from tradingWithPython.lib import yahooFinance

startDate = (2005,1,1)

# create two timeseries. data for SPY goes much further back
# than data of VXX
spy = yahooFinance.getHistoricData('SPY',sDate=startDate)
vxx = yahooFinance.getHistoricData('VXX',sDate=startDate)

# Combine two datasets
X = DataFrame({'VXX':vxx['adj_close'],'SPY':spy['adj_close']})

# remove NaN entries
X= X.dropna()
# make a nice picture

Two lines of code! ( this could be even fit to one line, but I've split it for readability)
Here is the result:

Man, this could have saved me a ton of time! But it still will help me in the future, as I'll be using its DataFrame object as a standard in my further work.

Monday, October 17, 2011

Tools & Cookbook

I've added two pages specifically to help new users to get started.
Tools: here you'll find all the info you need to set up a development environment.
Cookbook: Overview of recipies I've written. The code itself is hosted on Google Code.

Saturday, October 15, 2011

How to download a bunch of csv files

I'll be analyzing VIX futures data, so to start with, I need all the files from CBOE. They use a consistent format for file names, which is easy to generate. The following script creates a data directory and then downloads all csv files for the period 2004-2011. As you can see, not much coding is required, less than 30 lines!

from urllib import urlretrieve
import os

m_codes = ['F','G','H','J','K','M','N','Q','U','V','X','Z'] #month codes of the futures
dataDir = os.getenv("USERPROFILE")+'\\twpData\\vixFutures' # data directory

def saveVixFutureData(year,month, path):
    ''' Get future from CBOE and save to file '''
    fName = "CFE_{0}{1}_VX.csv".format(m_codes[month],str(year)[-2:])
    urlStr = "{0}".format(fName)

    except Exception as e:
        print e

if __name__ == '__main__':
    if not os.path.exists(dataDir):

    for year in range(2004,2012):
        for month in range(12):
            print 'Getting data for {0}/{1}'.format(year,month)

    print 'Data was saved to {0}'.format(dataDir)

Friday, October 14, 2011

Interactive work with IPython

Pretty good screencast on python basics/
There is more of this good stuff which can be found here.