Friday, October 28, 2011

kung fu pandas will solve your data problems

I love researching strategies, but sadly I spend too much time on low-level work like filtering and aligning datasets. In fact, about 80% of my time is spent on this mind-numbing work. There has got to be a better way than hacking all the filtering code myself, and there is!
Some time ago I came across pandas, a data analysis toolkit especially suited for working with financial data. After just scratching the surface of its capabilities, I'm already blown away by what it delivers. The package is being actively developed by Wes McKinney, and his ambition is to create the most powerful and flexible open source data analysis/manipulation tool available. Well, I think he is well on the way!

Let's take a look at just how easy it is to align two datasets:

from pandas import DataFrame
from tradingWithPython.lib import yahooFinance

startDate = (2005,1,1)

# create two time series; data for SPY goes much further back
# than data for VXX
spy = yahooFinance.getHistoricData('SPY', sDate=startDate)
vxx = yahooFinance.getHistoricData('VXX', sDate=startDate)

# combine the two datasets; pandas aligns them on their date index
X = DataFrame({'VXX': vxx['adj_close'], 'SPY': spy['adj_close']})

# remove NaN entries (dates where either series has no data)
X = X.dropna()
# make a nice picture

Two lines of code! (This could even fit on one line, but I've split it for readability.)
Here is the result:

Man, this could have saved me a ton of time! It will still help me in the future, though, as I'll be using the DataFrame object as a standard in my further work.
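
If you don't have the tradingWithPython library at hand, the same alignment behaviour can be tried out with made-up data. The snippet below is only a minimal sketch: the two series are synthetic stand-ins (not real SPY or VXX prices), and the dates and values are assumptions chosen purely for illustration. It shows that building a DataFrame from a dict of Series aligns them on the union of their dates, and that dropna() then keeps only the dates both series share.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two price series (NOT real market data).
# 'spy' covers ten business days, 'vxx' only the last four of them.
spy_index = pd.date_range("2009-01-26", periods=10, freq="B")
vxx_index = spy_index[6:]

spy = pd.Series(np.linspace(80.0, 89.0, 10), index=spy_index)
vxx = pd.Series(np.linspace(100.0, 97.0, 4), index=vxx_index)

# Building a DataFrame from a dict of Series aligns them on the union
# of their indexes; dates missing from a series are filled with NaN.
X = pd.DataFrame({"VXX": vxx, "SPY": spy})
n_union = len(X)

# Dropping NaN rows leaves only the dates present in both series.
X = X.dropna()
n_common = len(X)

print(n_union, n_common)
print(X)
```

The shorter series determines how many rows survive, which is the same effect as in the SPY/VXX example above.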


  1. Could you post your tradingWithPython.lib?
    That looks like fun to play around with, even if it is not ready for prime time or whatever.

  2. @dutchbookmaker: All the code is already public on Google Code. You can get it with Subversion. I'll try to write a post on how to set up a dev environment and get the code.

  3. Ahh nice sjev... that would be a great post. Hand holding for the newbs like me. I am not that familiar with Google Code, but I'm browsing it now.

  4. Sorry, when I go onto the site there are no files for downloading. Am I missing something?

  5. @NOYB: You can browse the code online here:

    To download the code, you'll need to use svn as explained here:

  6. Before this step:

    # Combine two datasets
    X = DataFrame({'VXX':vxx['adj_close'],'SPY':spy['adj_close']})

    you should import the pandas DataFrame first:

    from pandas import DataFrame