This is a question people ask me very often, and it comes from both retail and institutional traders who run systematic strategies. A lot can go wrong between a perfect backtest and poor live trading performance. In this article, I cover the reasons I see most frequently. The list is not exhaustive, so feel free to leave a comment with other cases you have run into in your own trading.
Let’s start with the human factor. If your trading setup is not fully automated, you most likely influence your results yourself. I know quite a lot of traders who don’t follow their systematic process in 100% of cases but amend orders at their discretion, based on experience and market conditions. In many cases that’s a good idea, but from time to time it can hurt your performance. Emotions might interfere with what the strategy tells you to do. There might also be times when you physically can’t trade, like in the middle of the night.
My advice here is to reflect your actual trading in the backtest as closely as possible:
- If there is a time when you can’t trade – implement a trading window in your backtest.
- If you feel that your stop loss (SL) or profit target (PT) is too tight and you override it in live trading, adjust it in the backtest to reflect how you actually treat your positions.
- If you decline signals because of the market conditions – try to formalize it mathematically and add it as an additional filter.
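The trading-window idea from the list above can be sketched in a few lines of plain Python. The window hours and the sample signals are made up for illustration; replace them with the hours you can realistically sit in front of the screen.

```python
from datetime import datetime, time

# Hypothetical window: you only act on signals between 08:00 and 20:00 local time.
WINDOW_START = time(8, 0)
WINDOW_END = time(20, 0)

def in_trading_window(ts):
    """Return True if the signal time falls inside the hours you can trade."""
    return WINDOW_START <= ts.time() < WINDOW_END

# Illustrative signals, not real data.
signals = [
    (datetime(2024, 1, 15, 3, 0), "buy"),    # middle of the night
    (datetime(2024, 1, 15, 10, 0), "sell"),  # inside the window
    (datetime(2024, 1, 15, 21, 30), "buy"),  # after the window closes
]

# Keep only the signals you would actually have executed.
tradable = [(ts, side) for ts, side in signals if in_trading_window(ts)]
print(tradable)
```

Running the backtest only on `tradable` shows what the strategy looks like when it skips the signals you would have slept through.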
Another good idea is to separate your systematic and discretionary trades into different accounts. This way, you can better understand your performance and how much each kind of trading contributes to the overall P&L. If you do this in an organized fashion, you might even see when your discretionary trading works and when it doesn’t, so you can amend your behavior a bit and keep your emotions under control.
Low-quality or corrupted data
In data science and machine learning, there is a saying: “garbage in, garbage out”. It matters even more here, because you are putting real money at risk. Data quality is critical, so invest as much money and time in it as you reasonably can, and try to buy data from sources with decent quality. Fortunately, market data APIs have become more accessible these days, and for a reasonable amount of money you can get clean data pretty easily. Remember that free data sources, even Yahoo Finance, might be wrong. They don’t owe you anything.
Also, build processes to clean and validate your data. In modern programming languages, it’s easy to check, at a minimum, that the data has no NAs, zeroes, 10x daily moves, or other outliers.
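As a minimal sketch of such a check, here is a plain-Python validator for a list of daily closes. The function name and the `max_abs_return` threshold are my assumptions for the example, not a standard; tune the threshold to your instrument.

```python
import math

def validate_closes(closes, max_abs_return=0.5):
    """Return (index, problem) pairs for suspicious daily closes.

    max_abs_return is a hypothetical threshold: a one-day move larger than
    this fraction is flagged for manual review, not silently dropped.
    """
    problems = []
    prev = None  # last valid close seen so far
    for i, price in enumerate(closes):
        if price is None or (isinstance(price, float) and math.isnan(price)):
            problems.append((i, "missing"))
            continue
        if price <= 0:
            problems.append((i, "non-positive"))
            continue
        if prev is not None and abs(price / prev - 1) > max_abs_return:
            problems.append((i, "outlier return"))
        prev = price
    return problems

# Illustrative series with a NaN, a zero, and a 10x spike.
closes = [100.0, 101.5, float("nan"), 0.0, 102.0, 1020.0, 103.0]
issues = validate_closes(closes)
print(issues)
```

Note that a single bad tick is flagged twice, once on the jump up and once on the jump back, which is usually a hint that the spike itself is the bad point.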
One of the mistakes I see people make is backtesting their models on non-adjusted stock data. If the data is not adjusted for splits, you will see a huge artificial gain or loss at a single data point. Not adjusting for dividends can also be an issue. A stock typically drops by about the dividend amount on the ex-dividend date, so with a dividend yield of, say, 4-5%, unadjusted prices carry an extra drop of roughly 4-5% per year that is not a real loss to the holder. In a backtest, this flatters short strategies and punishes long ones.
The same point applies to continuous futures. Ideally, to get realistic results, you should use or construct a backward-adjusted continuous future instead of a simple front-month series without adjustment. The historical price levels might look odd, but the backtest results will be more accurate.
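To make the backward adjustment concrete, here is a small sketch using the difference method: at every roll, the price gap between the new and the old contract is added to all earlier prices. The input layout is an assumption for the example (each contract is a list of closes, and the last bar of one contract falls on the same date as the first bar of the next).

```python
def back_adjust(segments):
    """Stitch per-contract closes into a back-adjusted continuous series.

    segments: list of lists of closes, oldest contract first. The last close
    of a segment and the first close of the next one are assumed to share
    the same roll date (hypothetical input layout).
    """
    adjusted = list(segments[-1])  # the newest contract keeps its real prices
    offset = 0.0
    # Walk backwards through older contracts, accumulating the roll gaps.
    for older, newer in zip(reversed(segments[:-1]), reversed(segments[1:])):
        offset += newer[0] - older[-1]  # gap between contracts on the roll date
        # Shift the older contract and drop its roll-date bar, which is
        # already represented by the newer contract.
        adjusted = [p + offset for p in older[:-1]] + adjusted
    return adjusted

# Illustrative contracts: each one gains 1 point per day, with a 3-point
# gap between consecutive contracts on each roll date.
contracts = [[100, 101, 102], [105, 106, 107], [110, 111]]
series = back_adjust(contracts)
print(series)
```

In this toy example the old prices move from 100 to 106, so they no longer match what was printed on the screen back then, but every day-to-day change in the stitched series is a change you could have earned. Naive stitching of the front contracts would instead show two 4-point jumps on the roll dates that nobody could trade.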
Also, be careful with non-standard chart types like Renko and Heikin-Ashi. Renko bricks should ideally be built from tick data; if your Renko is built from higher-timeframe OHLC values, the analysis won’t make sense. Heikin-Ashi candles can be used only for signals. You can’t use their prices as fill prices in the backtest itself, because they are averaged values and not prices that actually traded. For example, this problem exists in TradingView: people run their strategies on Heikin-Ashi charts and see much better results, but that won’t carry over to real trading.
Be careful about backtesting strategies on instruments that are not tradable. For example, a strategy calculated on the VIX index might look much better than the same strategy running on VIX futures. Backtest on the continuous future directly instead.
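The Heikin-Ashi problem is easy to see once you compute the candles yourself. The sketch below uses the usual Heikin-Ashi formulas on two made-up bars and compares the Heikin-Ashi close with the real close.

```python
def heikin_ashi(bars):
    """Convert (open, high, low, close) bars into Heikin-Ashi bars."""
    ha = []
    for i, (o, h, l, c) in enumerate(bars):
        ha_close = (o + h + l + c) / 4  # average of the real bar
        if i == 0:
            ha_open = (o + c) / 2  # common seed for the first bar
        else:
            prev_open, _, _, prev_close = ha[-1]
            ha_open = (prev_open + prev_close) / 2  # midpoint of previous HA body
        ha_high = max(h, ha_open, ha_close)
        ha_low = min(l, ha_open, ha_close)
        ha.append((ha_open, ha_high, ha_low, ha_close))
    return ha

# Illustrative uptrend, not real data.
bars = [(100.0, 104.0, 99.0, 103.0), (103.0, 108.0, 102.0, 107.0)]
ha_bars = heikin_ashi(bars)

for real, ha in zip(bars, ha_bars):
    print("real close:", real[3], "HA close:", ha[3])
```

On the second bar the real close is 107.0 while the Heikin-Ashi close is 105.0. A backtest that buys at the Heikin-Ashi close enters 2 points cheaper than any price you could have gotten in the market at that moment.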
Your backtest has data leakage
In machine learning, there is a term called data leakage. In short, it means that you train on a feature you shouldn’t have, because it only becomes available together with the outcome, and you wouldn’t know it at the moment you make your decision.
I see the same issue quite often in my clients’ models. In a backtest, it means you are using data from the future to trade, and it’s not always easy to spot.
The most common case is a backtest built on several timeframes. For example, you trade on a 1-hour chart and get confirmation from the 1-day chart. If you don’t code your strategy carefully, you can end up using today’s close to make a decision on the 1-hour chart, and of course it’s not possible to know the close during the day. With this bug, you can build a fantastic backtest that won’t work nearly as well in reality. Be careful with multi-timeframe backtests.
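Here is a minimal sketch of the bug and one way around it. The daily closes and the helper name `last_known_daily_close` are invented for the example; the point is only that an intraday bar may look at daily bars from earlier sessions, never at its own day.

```python
from datetime import date, datetime

# Illustrative daily closes, keyed by session date.
daily_close = {
    date(2024, 1, 10): 100.0,
    date(2024, 1, 11): 104.0,
    date(2024, 1, 12): 101.0,
}

def last_known_daily_close(ts, closes):
    """Latest daily close from a session strictly before the day of ts."""
    earlier = [d for d in closes if d < ts.date()]
    return closes[max(earlier)] if earlier else None

bar_time = datetime(2024, 1, 11, 14, 0)  # an hourly bar during the session

# BUG: joins on the calendar date and reads a close that is not known at 14:00.
leaky = daily_close[bar_time.date()]

# Safe: only uses the previous session's close.
safe = last_known_daily_close(bar_time, daily_close)

print("leaky:", leaky, "safe:", safe)
```

The leaky version reads 104.0, the close of the day that is still in progress, while the safe version reads 100.0 from the previous session.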
Another fairly frequent issue arises when you build a model on several instruments from different time zones and don’t correctly convert them to a single time zone. One of your instruments will then be a few hours ahead of the other, and you’ll use values from the future.
Less obvious cases occur when, for example, you trade futures but base your signals on the spot market. On some markets, the spot doesn’t close at the same time as the future. If it closes, say, 15 minutes after the future, you’ll again be using values from the future to trade futures at their close. 15 minutes is not a lot, but during big rallies it can affect your strategy quite significantly.
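A quick way to see the size of the problem is to convert both closes to UTC before joining them. The close times below are illustrative, and the fixed UTC offsets are a simplifying assumption that ignores daylight saving; real code should use a proper time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the sketch only (assumption: no daylight saving handling).
TOKYO = timezone(timedelta(hours=9))
NEW_YORK_WINTER = timezone(timedelta(hours=-5))

# Two daily bars stamped with the same calendar date.
tokyo_close = datetime(2024, 1, 15, 15, 0, tzinfo=TOKYO)
new_york_close = datetime(2024, 1, 15, 16, 0, tzinfo=NEW_YORK_WINTER)

# Convert both to UTC before comparing or joining.
tokyo_utc = tokyo_close.astimezone(timezone.utc)
new_york_utc = new_york_close.astimezone(timezone.utc)
gap = new_york_utc - tokyo_utc

print(tokyo_utc, new_york_utc, gap)
```

Both bars carry the date 2024-01-15, but the New York close happens 15 hours after the Tokyo close. If you join them on the date alone and trade the Tokyo instrument, you are reading the future.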
Your backtest is not realistic
So you have good data, a correct backtest, and you follow your model one to one. What else can go wrong? Ideally, your backtest should reflect all the costs you incur when you trade. In some markets, these costs can be very high and quickly turn an excellent strategy into a losing one.
The easiest and most important thing to add to your backtest is transaction costs. For some strategies, costs matter more than for others. For example, if you trade crypto intraday, transaction costs can change your results a lot: with a 0.1% transaction cost, strategies that trade frequently can lose a large part of their performance. For other strategies, it’s not so critical. If, for example, you trade stocks on Interactive Brokers on a daily basis, the cost is small enough that you can even omit it.
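A back-of-the-envelope calculation shows why trade frequency matters so much. The sketch below assumes a 0.1% cost on each side of a trade, charged on the full position, and ignores compounding; the gross return and trade counts are made-up numbers.

```python
def net_return(gross_return, round_trips, cost_per_side=0.001):
    """Gross return minus a simple linear cost estimate.

    Assumes every round trip pays cost_per_side twice (entry and exit) on
    the full notional and ignores compounding (hypothetical simplification).
    """
    return gross_return - round_trips * 2 * cost_per_side

# Same 40% gross backtest result, very different trading frequency.
intraday = net_return(0.40, round_trips=250)
swing = net_return(0.40, round_trips=12)

print("intraday:", round(intraday, 3))
print("swing:", round(swing, 3))
```

With 250 round trips a year, costs alone eat 50 percentage points, so a 40% gross backtest becomes a loss of roughly 10%. With 12 round trips, the same cost takes away only 2.4 points.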
Another thing people often forget is financing. When you run a short or leveraged strategy, you pay financing costs such as the margin rate, and with current interest rates they can be quite high. For example, at the moment, for USD at Interactive Brokers, you’ll pay 5-6% annualized. It’s not a lot on its own, but combined with other factors, it can make a difference.
Another essential parameter to account for is slippage. When you execute trades through third-party bots, expect a delay of a few seconds between your signal and the execution. During this time the price will move, so your fill price might differ from the signal price, and quite often it moves against you. You also need to account for the spread. Always calculating trades at the mid price might be too optimistic, and if you trade illiquid instruments like deep out-of-the-money options, spreads can be huge.
So my advice here is to trade live for a while, compare your real fill prices with the prices you get from the exchange data for the same moments, and then adjust your backtest to reflect the discrepancies.
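One simple way to stop filling at mid is to push every simulated fill against you by half the spread plus a slippage allowance. The function and its parameters are hypothetical; the numbers are placeholders you would calibrate from your own executions.

```python
def simulated_fill(mid, side, spread, slippage=0.0):
    """Estimate a fill price from the mid price.

    side is "buy" or "sell". spread is the full bid-ask width and slippage
    is an extra adverse move during the execution delay, both in price
    units (hypothetical parameters, to be calibrated from real fills).
    """
    sign = 1 if side == "buy" else -1
    return mid + sign * (spread / 2 + slippage)

# A long trade: signal to buy at mid 100.0 and to sell at mid 101.0.
entry = simulated_fill(100.0, "buy", spread=0.10, slippage=0.02)
exit_ = simulated_fill(101.0, "sell", spread=0.10, slippage=0.02)

pnl_at_mid = 101.0 - 100.0
pnl_realistic = exit_ - entry
print(round(pnl_at_mid, 2), round(pnl_realistic, 2))
```

In this example the trade earns 1.00 at mid prices but only about 0.86 after paying half the spread and the slippage on both sides.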
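The comparison itself can be a few lines of code. The sketch below measures average adverse slippage in basis points from a handful of made-up trades; the tuple layout and the function name are assumptions for the example.

```python
def average_slippage_bps(trades):
    """Mean adverse slippage in basis points.

    trades: list of (side, backtest_price, live_fill_price) tuples, where
    side is "buy" or "sell". A positive result means the live fills were
    worse than the backtest assumed.
    """
    costs = []
    for side, expected, filled in trades:
        sign = 1 if side == "buy" else -1
        costs.append(sign * (filled - expected) / expected * 10_000)
    return sum(costs) / len(costs)

# Illustrative trades, not real fills.
live_trades = [
    ("buy", 100.00, 100.05),   # paid 5 bps more than expected
    ("sell", 200.00, 199.80),  # received 10 bps less than expected
    ("buy", 50.00, 49.99),     # got a 2 bps better price
]

slippage_bps = average_slippage_bps(live_trades)
print(round(slippage_bps, 2))
```

Once you have a number like this from a meaningful sample of trades, you can feed it back into the backtest as the slippage assumption instead of guessing.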
Your backtest is overfitted
If none of the above applies to you, but your live performance is still much worse than your backtest, there is a big chance that you overfitted your strategy. Overfitting means that you fitted your strategy to work perfectly on existing data, but it most likely won’t work on the new data it sees in live trading.
So here are the most common signs of an overfitted backtest:
- You have a lot of parameters.
- You use static levels for specific parameters.
- You optimized your strategy on a limited amount of data.
- You ran a huge grid optimization to find the perfect parameters.
- You change your strategy too frequently, after just a few new bad trades.
- Your strategy works much worse on similar instruments.
Here is some advice on how to avoid overfitting:
- Try to reduce the number of parameters. The fewer parameters you have, the harder it is to overfit your strategy.
- Try to replace absolute levels in your backtest with computed ones. For example, if you use an absolute level to indicate high volatility, use a percentile instead. Market conditions can change significantly, and your strategy should adapt to them. What was high yesterday can be normal or low today.
- Make sure you have enough data to run your backtest on. It’s hard to say how much is enough, but I try to see at least 100-200 trades in the backtest. But don’t use too much data either. The market 50 years ago was very different, and today’s intraday strategy might not make sense in the past.
- When you run an optimization, try to avoid isolated optima. Change the parameters of your best backtest slightly. If you see a significant drop in performance, you’re in trouble and should look for another set of parameters. It’s better to find a plateau of decent parameters than one perfect set with really bad results around it.
- Try to be patient with your strategy. One bad trade doesn’t mean anything. You have to check it on a significant number of trades to judge whether it is good or not.
- Look at the performance of your strategy on similar products. If it’s perfect on BTC but really bad on ETH, the strategy might be overfitted.
- Try “out of sample testing”. When optimizing your strategy, don’t use the entire history, but leave a significant part of recent data aside. When you feel that you’re ready to trade it, include the missing part of the data and check performance during this period. It shows the performance you would have had if you had gone live with it. If it’s terrible, return to the drawing board.
- Try to find the rationale behind the rules of your strategy. It can be quite a useful exercise. Quite a lot of the market’s behavior is human behavior. Seeing these patterns behind your rules can be a superpower and a much better idea than combining random indicators.
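Two of the points above, the percentile threshold and the plateau check, can be sketched in plain Python. All numbers, names, and the 80% cutoff are illustrative assumptions, and the plateau check only looks at the immediate neighbors of a single integer parameter.

```python
def percentile_rank(history, value):
    """Fraction of past observations that are <= value (0.0 to 1.0)."""
    return sum(1 for x in history if x <= value) / len(history)

def is_high_volatility(history, value, pct=0.8):
    """Relative rule: 'high' means at or above the pct percentile of history."""
    return percentile_rank(history, value) >= pct

# The same volatility reading of 20 in two different regimes.
calm_regime = [10, 11, 12, 13, 14]
stressed_regime = [20, 25, 30, 35, 40]
high_in_calm = is_high_volatility(calm_regime, 20)
high_in_stress = is_high_volatility(stressed_regime, 20)

def plateau_score(results, param):
    """Worst result among a parameter and its immediate neighbors."""
    nearby = [results[p] for p in (param - 1, param, param + 1) if p in results]
    return min(nearby)

# Hypothetical optimization output: lookback parameter -> Sharpe ratio.
results = {
    10: 0.2, 11: 2.5, 12: 0.1,                      # isolated spike
    19: 1.2, 20: 1.3, 21: 1.4, 22: 1.3, 23: 1.2,    # stable plateau
}

peak = max(results, key=results.get)
robust = max(results, key=lambda p: plateau_score(results, p))

print(high_in_calm, high_in_stress)
print("peak:", peak, "robust:", robust)
```

A static rule like "volatility above 15 is high" would fire in both regimes, while the percentile rule fires only in the calm one. In the optimization example, picking the raw best result selects the spike at 11, whereas scoring each parameter by its worst neighbor selects 21 in the middle of the plateau.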
Past performance is not an indication of future returns
I don’t like to say this, but there is no guarantee that even the most correct and best backtest will work in live trading. History tends to repeat itself, but not always. This is why you see huge hedge funds with hundreds of PhDs losing billions. We test these strategies on historical data, and the future might be different. There are no 100% guarantees in trading, so be ready for that. Make your backtests as good as they can be, but be prepared for even those not to work. Good luck trading!