*future* prices. Data without any predictive quality only introduces noise and complexity, decreasing strategy performance. Finding good indicators is a science of its own, often requiring a deep understanding of market dynamics. This part of strategy design cannot be easily automated. Luckily, once a good set of indicators has been found, the trader's memory and 'intuition' can be easily replaced with a statistical model, which is likely to perform much better, since computers have flawless memory and can make perfect statistical estimates.

Regarding volatility trading, it took me quite some time to understand what influences its movements. In particular, I'm interested in variables that *predict* future returns of VXX and XIV. I will not go into a full-length explanation here, but just present the conclusion: my two most valuable indicators for volatility are the term structure slope and the current volatility premium.

My definition of these two is:

*volatility premium = VIX - realizedVol*

*delta (term structure slope) = VIX - VXV*

*VIX* and *VXV* are the forward 1-month and 3-month implied volatilities of the S&P 500.

*realizedVol* here is the 10-day realized volatility of SPY, calculated with the Yang-Zhang formula.
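As a rough illustration, the two indicators could be computed along the following lines. This is only a sketch under my own assumptions, not the exact code behind the results here: the column names (`open`, `high`, `low`, `close`), the function names, the 252-day annualization and the scaling to percentage points (so that the result is comparable to VIX) are all illustrative choices. The estimator follows the standard Yang-Zhang form: overnight variance plus a weighted mix of open-to-close variance and the Rogers-Satchell term.

```python
import numpy as np
import pandas as pd


def yang_zhang_vol(ohlc: pd.DataFrame, window: int = 10,
                   periods_per_year: int = 252) -> pd.Series:
    """Annualized Yang-Zhang realized volatility in percentage points.

    `ohlc` is assumed to have columns 'open', 'high', 'low', 'close'.
    """
    log_o = np.log(ohlc["open"])
    log_h = np.log(ohlc["high"])
    log_l = np.log(ohlc["low"])
    log_c = np.log(ohlc["close"])

    overnight = log_o - log_c.shift(1)   # previous close -> today's open
    open_close = log_c - log_o           # today's open -> today's close
    # Rogers-Satchell term (always >= 0, independent of drift)
    rs = (log_h - log_c) * (log_h - log_o) + (log_l - log_c) * (log_l - log_o)

    # Weighting constant from the Yang-Zhang estimator
    k = 0.34 / (1.34 + (window + 1) / (window - 1))

    variance = (overnight.rolling(window).var()
                + k * open_close.rolling(window).var()
                + (1 - k) * rs.rolling(window).mean())
    return 100 * np.sqrt(variance * periods_per_year)


def volatility_indicators(vix: pd.Series, vxv: pd.Series,
                          spy_ohlc: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """premium = VIX - realizedVol, delta = VIX - VXV (all date-indexed)."""
    realized = yang_zhang_vol(spy_ohlc, window)
    return pd.DataFrame({"premium": vix - realized, "delta": vix - vxv})
```

Because the overnight return needs the previous close, the first `window` rows of the result are NaN and should be dropped before any modelling.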

*delta* has often been discussed on the VixAndMore blog, while *premium* is well known from option trading.

When *premium* is high and futures are in contango (*delta* < 0), a short position in VXX gets a tailwind from both the premium and the daily roll along the term structure. But this is just a rough estimate. A good trading strategy would combine information from both *premium* and *delta* to come up with a prediction of the trading direction in VXX.

If we take one of the indicators (premium in this case) and plot it against future returns of VXX, some correlation can be seen, but the data is extremely noisy:

Still, it is clear that a negative premium tends to be followed by positive VXX returns on the next day.

Combining both premium and delta into one model has been a challenge for me, but I always wanted to do a statistical approximation. In essence, for a combination of (delta, premium), I'd like to find all historic values that are closest to the current values and estimate the future returns based on them. A couple of times I started writing my own nearest-neighbor interpolation algorithms, but every time I had to give up... until I came across the scikit-learn nearest neighbors regression. It enabled me to quickly build a predictor based on two inputs, and the results are so good that I'm a bit worried that I've made a mistake somewhere...

Here is what I did:

- create a dataset of [*delta, premium*] -> [*VXX next day return*] (in-sample)
- create a nearest-neighbor predictor based on the dataset above
- trade strategy (out-of-sample) with the rules:
  - go long if predicted return > 0
  - go short if predicted return < 0
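The steps above can be sketched roughly as follows with scikit-learn's `KNeighborsRegressor`. This is a minimal sketch, not the exact code used for the results in this post: the function name `knn_signal`, the split date, the number of neighbors (50) and the choice to leave the inputs unscaled (both are in volatility points) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor


def knn_signal(features: pd.DataFrame, vxx_close: pd.Series,
               split_date, n_neighbors: int = 50):
    """Fit on data before `split_date`, trade on data from `split_date` on.

    `features` is assumed to hold 'delta' and 'premium' columns on a
    DatetimeIndex; `vxx_close` holds daily VXX closes on the same dates.
    """
    cols = ["delta", "premium"]

    # Align today's indicators with the NEXT day's return, so the model
    # never sees information from the day it is predicting.
    next_ret = vxx_close.pct_change().shift(-1).rename("next_ret")
    data = features[cols].join(next_ret).dropna()

    split = pd.Timestamp(split_date)
    train = data[data.index < split]     # in-sample
    test = data[data.index >= split]     # out-of-sample

    model = KNeighborsRegressor(n_neighbors=n_neighbors)
    model.fit(train[cols], train["next_ret"])

    predicted = pd.Series(model.predict(test[cols]), index=test.index)
    position = np.sign(predicted)        # +1 long, -1 short, 0 flat
    pnl = position * test["next_ret"]    # daily strategy return, no costs
    return model, position, pnl
```

The `shift(-1)` is the part most worth double-checking in any version of this: if the returns are misaligned by a day, the predictor is fed the answer and the backtest looks far better than it should. Transaction costs and borrowing costs for the short side are ignored here.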

In the last two plots, the strategy seems to perform the same in- and out-of-sample. Sharpe ratio is around 2.3.
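For reference, a Sharpe ratio of this kind can be computed from daily strategy returns along these lines. The convention below (zero risk-free rate, 252 trading days per year, sample standard deviation) is an assumption on my part; a different convention would give a somewhat different number.

```python
import numpy as np
import pandas as pd


def annualized_sharpe(daily_returns, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    r = pd.Series(daily_returns).dropna()
    return float(np.sqrt(periods_per_year) * r.mean() / r.std())
```

Computing it separately on the in-sample and out-of-sample parts of the return series is a quick way to check that the two periods really do behave alike.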

I'm very pleased with the results and have the feeling that I've only been scratching the surface of what is possible with this technique.