Monday, November 17, 2014

Trading VXX with nearest neighbors prediction

An experienced trader knows what behavior to expect from the market based on a set of indicators and their interpretation. The interpretation often relies on memory or some kind of model. Finding a good set of indicators and processing their information poses a big challenge. First, one needs to understand which factors are correlated with future prices. Data that has no predictive quality only introduces noise and complexity, decreasing strategy performance. Finding good indicators is a science of its own, often requiring a deep understanding of market dynamics. This part of strategy design cannot be easily automated. Luckily, once a good set of indicators has been found, the trader's memory and 'intuition' can be replaced with a statistical model, which will likely perform much better, since a computer has flawless memory and can make precise statistical estimates.

Regarding volatility trading, it took me quite some time to understand what influences its movements. In particular, I'm interested in variables that predict future returns of VXX and XIV. I will not go into a full-length explanation here, but just present the conclusion: my two most valuable indicators for volatility are the term structure slope and the current volatility premium.
My definition of these two is:

  • volatility premium = VIX - realizedVol
  • delta (term structure slope) = VIX - VXV
VIX and VXV are the 1-month and 3-month forward implied volatilities of the S&P 500. realizedVol here is the 10-day realized volatility of SPY, calculated with the Yang-Zhang formula. delta has often been discussed on the VixAndMore blog, while premium is well known from option trading.
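
The two definitions above can be sketched in a few lines. This is only an illustration, not the code behind this post: it uses the textbook form of the Yang-Zhang estimator, and the 252-day annualization, the scaling to percent (to match VIX units) and the function names `yang_zhang_vol` and `indicators` are my assumptions.

```python
import math
from statistics import variance


def yang_zhang_vol(bars, trading_days=252):
    """Annualized Yang-Zhang volatility in percent (assumed convention).

    bars: list of (open, high, low, close) tuples, oldest first.
    The first bar only supplies the previous close, so len(bars) - 1
    days enter the estimate (11 bars for a 10-day window).
    """
    n = len(bars) - 1
    if n < 2:
        raise ValueError("need at least 3 bars")
    overnight, open_to_close, rogers_satchell = [], [], []
    for prev, cur in zip(bars, bars[1:]):
        open_, high, low, close = cur
        overnight.append(math.log(open_ / prev[3]))      # close-to-open return
        open_to_close.append(math.log(close / open_))    # open-to-close return
        rogers_satchell.append(
            math.log(high / close) * math.log(high / open_)
            + math.log(low / close) * math.log(low / open_)
        )
    # Weight from the Yang-Zhang paper
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    daily_var = (
        variance(overnight)
        + k * variance(open_to_close)
        + (1 - k) * sum(rogers_satchell) / n
    )
    return 100.0 * math.sqrt(trading_days * daily_var)


def indicators(vix, vxv, realized_vol):
    """Return (premium, delta) as defined in the text."""
    premium = vix - realized_vol
    delta = vix - vxv
    return premium, delta


# Made-up SPY bars and index levels, for illustration only
example_bars = [(100 + i, 101.5 + i, 99 + i, 100.5 + i) for i in range(11)]
rv = yang_zhang_vol(example_bars)
premium, delta = indicators(vix=14.0, vxv=16.5, realized_vol=rv)
```

With these made-up inputs delta is negative, which corresponds to contango in the sign convention used here.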

It makes sense to go short volatility when premium is high and futures are in contango (delta < 0). This gives a tailwind from both the premium and the daily roll along the term structure in VXX. But this is just a rough estimate. A good trading strategy would combine information from both premium and delta to come up with a prediction of the trading direction in VXX.
I've been struggling for a very long time to come up with a good way to combine the noisy data from both indicators. I've tried most of the 'standard' approaches, like linear regression or writing a bunch of 'if-thens', but all gave only very minor improvements compared to using a single indicator. A good example of such a 'single indicator' strategy with simple rules can be found on the TradingTheOdds blog. It does not look bad, but what can be done with multiple indicators?

I'll start with some out-of-sample VXX data that I got from MarketSci. Note that this is simulated data, covering the period before VXX was created.


The indicators for the same period are plotted below:



If we take one of the indicators (premium in this case) and plot it against future returns of VXX, some correlation can be seen, but the data is extremely noisy:


Still, it is clear that a negative premium is likely to be followed by positive VXX returns on the next day.
Combining both premium and delta into one model has been a challenge for me, but I always wanted to do a statistical approximation. In essence, for a combination of (delta, premium), I'd like to find all historic values that are closest to the current values and estimate the future returns based on them. A couple of times I've started writing my own nearest-neighbor interpolation algorithms, but every time I had to give up... until I came across the scikit-learn nearest neighbors regression. It enabled me to quickly build a predictor based on two inputs, and the results are so good that I'm a bit worried that I've made a mistake somewhere...

Here is what I did:

  1. create a dataset of [delta, premium] -> [VXX next day return] (in-sample)
  2. create a nearest-neighbor predictor based on the dataset above
  3. trade strategy (out-of-sample) with the rules:
    • go long if predicted return > 0
    • go short if predicted return < 0
The strategy could not be simpler...
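
The three steps above can be sketched as follows. The post used scikit-learn's nearest neighbors regression (its `KNeighborsRegressor` class); to keep the mechanics visible and the example dependency-free, this sketch swaps in a small pure-Python k-nearest-neighbor regressor with uniform weights and plain Euclidean distance on unscaled inputs. The helper names and the data are hypothetical: the series is randomly generated with a toy relationship and is not market data.

```python
import math
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def positions(train_x, train_y, test_x, k):
    """+1 (long) if the predicted return is > 0, -1 (short) if < 0, else 0."""
    signals = []
    for query in test_x:
        predicted = knn_predict(train_x, train_y, query, k)
        signals.append((predicted > 0) - (predicted < 0))
    return signals


# Step 1: dataset of (delta, premium) -> next-day return.
# Synthetic toy data, NOT a market model.
random.seed(0)
features, next_day_returns = [], []
for _ in range(600):
    delta = random.gauss(-1.0, 2.0)
    premium = random.gauss(3.0, 4.0)
    ret = 0.002 * delta - 0.002 * premium + random.gauss(0.0, 0.03)
    features.append((delta, premium))
    next_day_returns.append(ret)

# Step 2: the "predictor" is just the in-sample history (first 400 days).
# The split is chronological, so no later data leaks into the predictions.
split = 400

# Step 3: trade the sign of the prediction on the remaining days.
signals = positions(features[:split], next_day_returns[:split],
                    features[split:], k=40)
strategy_returns = [s * r for s, r in zip(signals, next_day_returns[split:])]
```

Each target must be the return of the day after its indicator values. Pairing indicators with same-day returns would let the predictor look ahead.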

The results seem extremely good and get better when more neighbors are used for estimation.
First, with 10 points, the strategy is excellent in-sample, but is flat out-of-sample (the red line in the figure below marks the last in-sample point).

Then, performance gets better with 40 and 80 points:



In the last two plots, the strategy seems to perform the same in- and out-of-sample. Sharpe ratio is around 2.3.
I'm very pleased with the results and have the feeling that I've only been scratching the surface of what is possible with this technique.
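
For reference, a Sharpe ratio like the one quoted above is commonly computed from daily strategy returns as below. The post does not state its exact convention, so the zero risk-free rate, the 252-day annualization and the name `sharpe_ratio` are assumptions of this sketch.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = stdev(daily_returns)  # sample standard deviation
    if sd == 0:
        raise ValueError("returns have zero variance")
    return math.sqrt(periods_per_year) * mean(daily_returns) / sd


# Made-up daily returns, for illustration only
example = [0.01, -0.01, 0.02, 0.0]
ratio = sharpe_ratio(example)
```

Computing this separately for the in-sample and out-of-sample parts of the equity curve is one way to put a number on the comparison of the two periods.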



11 comments:

  1. Well, now that you fit three possible permutations to the "out of sample" data, while it may be out of sample for the model, what about out of sample for the analysis?

    Replies
    1. I'm not sure that I understand your question.
      The predictor is trained on the in-sample data and does not use out-of-sample data for prediction.

    2. I know this post is quite old and stale, but in case anyone stumbles upon this... I agree with quantstrattrader on this one. Your model may only use in-sample data, but if you continue to iterate through your parameter space in the in-sample data until it yields the desired results in the out-of-sample data, you've effectively fit to the out of sample data as well.

  2. Seems fishy that OOS performance holds up so well for a large number of neighbors. I would suggest looking at a plot of out-of-sample performance ~ f(# of neighbors); if there is a clear linear relationship, I'd be worried about some unintended lookahead bias somewhere.

  3. How did you calculate the premium?
    The closing value of VIX on day x minus YZ in x-10?

  4. Hi Jev, I was wondering if you found out if there was some look ahead bias. Using the KNRegressor with uniform weights, I get more the opposite: the more neighbors you take into account, the worse the prediction as delta and premium often have contradictory signals.

  5. Hi, I was wondering if you found out if there was any look-ahead bias in your code? I have tested this strategy (both with a sample over a fixed period and with an expanding period up to t-1) and don't end up with similar results. The delta is a good predictor, but the premium is less good and the combination doesn't make it any better. Increasing the number of neighbours seems to have the opposite effect, i.e. decreasing the performance.

    Replies
    1. I am now in the process of refactoring the code and will make an update post when it's ready. Can take some time though.

  6. Any progress on the code refactor? I'm sure there is a slight bias, but the idea is very sound. Any ideas?
