Trading with Python: Backtesting dilemmas

Monday, May 26, 2014

Backtesting dilemmas

A quantitative trader faces quite a few challenges on the way to a successful trading strategy. Here I’ll discuss a couple of dilemmas involved in backtesting. A good trading simulation must:

1. Be a good approximation of the real world. This is, of course, the most important requirement.
2. Allow unlimited flexibility: the tooling should not stand in the way of testing out-of-the-box ideas. Everything that can be quantified should be usable.
3. Be easy to implement and maintain. It is all about productivity and being able to test many ideas to find one that works.
4. Allow for parameter scans, walk-forward testing and optimisations. This is needed for investigating how strategy performance and stability depend on the strategy parameters.

The problem with satisfying all of the requirements above is that #2 and #3 conflict. No tool can do everything without the cost of high complexity (= low maintainability). Typically, a third-party point-and-click tool will severely limit the freedom to test custom signals and odd portfolios, while at the other end of the spectrum a custom-coded DIY solution will take tens of hours or more to implement, with a high chance of ending up as cluttered, unreadable code. So, in an attempt to combine the best of both worlds, let’s start somewhere in the middle: use an existing backtesting framework and adapt it to our taste.
In the following posts I’ll be looking at three possible candidates I’ve found:

Zipline is widely known and is the engine behind Quantopian
PyAlgotrade seems to be actively developed and well-documented
pybacktest is a lightweight vector-based framework that might be interesting because of its simplicity and performance.

I’ll be assessing the suitability of these tools by benchmarking them against a hypothetical trading strategy. If none of these options fits my requirements, I will have to decide whether to invest in writing my own framework (at least by looking at the available options I’ll know what does not work) or stick with custom code for each strategy.
First one for the evaluation is Zipline.
My first impression of Zipline and Quantopian is a positive one. Zipline is backed by a team of developers and is tested in production, so I expect the code quality to be high and bugs to be few. There is good documentation on the site and an example notebook on GitHub.
To get the hang of it, I downloaded the example notebook and started playing with it. To my disappointment, I quickly ran into trouble at the first example, Simplest Zipline Algorithm: Buy Apple. The dataset has only 3028 days, but running this example took forever. Here is what I measured:

dma = DualMovingAverage()
%timeit perf = dma.run(data)

1 loops, best of 3: 52.5 s per loop

I did not expect stellar performance, as Zipline is an event-based backtester, but almost a minute for 3000 samples is far too slow. This kind of performance would be prohibitive for any kind of scan or optimization. Another problem would arise when working with larger datasets like intraday data or multiple securities, which can easily contain hundreds of thousands of samples.
Unfortunately, I will have to drop Zipline from the list of usable backtesters, as it fails my requirement #4 by a wide margin.
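For contrast with the event-based run above, here is a minimal sketch of what a vector-based backtest of a dual moving average rule can look like. It is not the Zipline notebook example: the synthetic price series, the 20/100-day windows, and the helper names moving_average and crossover_backtest are all assumptions made for illustration, and transaction costs are ignored.

```python
import numpy as np


def moving_average(prices, window):
    """Trailing simple moving average; the first window-1 entries are NaN."""
    out = np.full(prices.shape, np.nan)
    csum = np.cumsum(np.insert(prices, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def crossover_backtest(prices, fast, slow):
    """Daily returns of a long-only rule: hold while fast MA > slow MA."""
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    # 1.0 when the fast average is above the slow one, else 0.0.
    # Comparisons involving NaN are False, so the warm-up period stays flat.
    signal = (fast_ma > slow_ma).astype(float)
    returns = np.diff(prices) / prices[:-1]
    # The signal known at the close of day t is applied to the return
    # from t to t+1, so no future data leaks into the decision.
    return signal[:-1] * returns


# Synthetic daily prices (assumption), same length as the dataset in the post.
rng = np.random.default_rng(0)
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0003, 0.01, size=3028))

strategy_returns = crossover_backtest(prices, fast=20, slow=100)
total_return = np.prod(1.0 + strategy_returns) - 1.0
print(f"total return: {total_return:.2%}")
```

The whole series is processed in a handful of array operations rather than one event per day, which is what keeps a single run cheap enough to repeat across a parameter grid.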
In the following post I will be looking at PyAlgotrade.
Note: My current system is a couple of years old, running an AMD Athlon II X2 @ 2800 MHz with 3 GB of RAM. With vector-based backtesting I’m used to calculation times of less than a second for a single backtest and a minute or two for a parameter scan. A basic walk-forward test with 10 steps and a parameter scan over a 20x20 grid would take a whopping 66 hours with Zipline. I’m not that patient.
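The estimate above can be reproduced with a few lines of arithmetic. This sketch assumes one full backtest per grid point per walk-forward step; the function name scan_hours is hypothetical. With the measured 52.5 s per run the total comes to about 58 hours, and the 66-hour figure corresponds to rounding up to one minute per run.

```python
def scan_hours(seconds_per_run, grid_shape, walk_forward_steps):
    """Total hours for a walk-forward parameter scan.

    Assumes every grid point is backtested once per walk-forward step.
    """
    runs = walk_forward_steps
    for points_along_axis in grid_shape:
        runs *= points_along_axis
    return runs * seconds_per_run / 3600.0


# 10 steps x 20 x 20 grid = 4000 backtests in total.
measured = scan_hours(52.5, (20, 20), 10)  # using the timing from %timeit
rounded = scan_hours(60.0, (20, 20), 10)   # using "almost a minute" per run

print(f"at 52.5 s per run: {measured:.1f} hours")
print(f"at 60 s per run:   {rounded:.1f} hours")
```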

3 comments:

Unknown, June 11, 2014 at 7:34 PM
While I'm a fan of zipline, I concur with your speed comments and have witnessed as much. PyAlgotrade, also event-driven, I concur is significantly faster, I assume with far fewer bells and whistles than zipline? Not sure; no free lunch, though.

Crunching on the dataset as one pandas df is certainly always the quickest when coded properly, as is the framework used with pybacktest. I'm certain you'll find this the fastest; but, note the current implementation is single security, not portfolio. Ideally, we optimize over a portfolio, not a security.

I have a hunch you'll be home-brewing, but we'll see what you think; that is, unless you want to try out commercial solutions.

A final comment: even though processing a df all at once is most efficient, I find myself leaning towards event-driven solutions, if only to be 100% certain of no look-ahead bias, or at least to minimize the possibility. They also mimic the real world best, can handle rollover situations, etc. Perhaps the parameter scans you mention would be best suited to the df level, backed up by a confirmation with an event-driven tool.

I look forward to your future posts.
High Hand, September 14, 2014 at 9:48 PM
Did you look at bt - Flexible Backtesting for Python ?
http://pmorissette.github.io/bt/

About

My name is Jev Kuznetsov. During the daytime I am a researcher/engineer at a company in the printing business. By night I am a trader.

I studied applied physics with a specialization in pattern recognition and artificial intelligence. My daily work involves anything from quick algorithm prototyping in Matlab and other languages to hardware design and programming.

Since 2009 I have been using my technical skills in financial markets. Before coming to the conclusion that Python is the best tool available, I was working extensively in Matlab, which is covered on my other blog.

You can reach me at