Overfitting Is the Silent Killer of Trading Strategies
Reading Notes · Systematic Trading — Part 2
There's a particular feeling you get when a backtest finally fits. You've been tweaking the lookback window, adjusting the entry threshold, maybe adding a volatility filter — and then suddenly the equity curve smooths out, the Sharpe ratio climbs past 1.5, and the drawdowns look almost civilized. It feels like discovery. It is, in fact, the opposite.
Robert Carver calls this process "fitting," and he dedicates the entire second part of Systematic Trading to explaining why it quietly destroys more trading systems than bad luck or bad markets ever will. This is the second post in my series working through the book. Part One covered the theoretical foundations — why systematic rules beat human judgment, and why realistic performance expectations are so much lower than the industry admits. Part Two is where Carver gets surgical.
Not financial advice. Carver's framework is presented here as a rigorous intellectual structure for thinking about strategy design — not a recommendation to trade anything.
The Taxonomy of Fitting Sins
Carver's Chapter 3 opens with a taxonomy that I found genuinely clarifying, because it separates things that are usually conflated under the umbrella term "overfitting."
The most common form is in-sample optimization: you test a parameter (say, a moving average crossover with a 20-day and 50-day window) against your historical data, find it works, and declare victory. The problem is you've used the answer key to write the test. The parameter values that happen to work in your dataset may have nothing to do with any underlying economic mechanism — they fit the noise.
A subtler variant is data snooping, or what statisticians call "p-hacking" in other contexts. This is what happens when you test 200 parameter combinations, find the 10 that are profitable, and then write a strategy around those 10. Even if every individual test has only a 5% false-positive rate, running 200 tests almost guarantees you'll find several that look good purely by chance. The book is blunt: if you've tested more than a handful of variations, you should trust the result less than if you'd tested one.
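The arithmetic behind that bluntness is easy to check. Here's a back-of-envelope calculation plus a small simulation (my own illustration, not from the book) of how often a sweep over 200 worthless rules produces at least one "discovery":

```python
import random

random.seed(0)
n_tests = 200     # parameter combinations in the sweep
fp_rate = 0.05    # chance a worthless rule passes any single test

# Analytically: it is almost certain that at least one dud rule "works"
p_at_least_one = 1 - (1 - fp_rate) ** n_tests
expected_false_hits = n_tests * fp_rate

# Monte Carlo check: many researchers, each sweeping 200 worthless rules
trials = 10_000
lucky = sum(
    any(random.random() < fp_rate for _ in range(n_tests))
    for _ in range(trials)
)

print(f"P(at least one false discovery): {p_at_least_one:.5f}")        # 0.99996
print(f"expected false discoveries per sweep: {expected_false_hits:.0f}")  # 10
print(f"simulated share of sweeps with a 'find': {lucky / trials:.4f}")
```

Ten of the two hundred tests are expected to pass by luck alone, which is exactly the "10 profitable combinations" scenario above.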
Then there's survivorship bias, which is particularly treacherous for equity traders. If you're backtesting on a current index, you're backtesting on companies that survived. Every company that went bankrupt and fell out of the index is invisible to your model. The historical returns of "the S&P 500 components" look much better than what a real investor actually experienced, because real investors held some of those companies before they disappeared.
Carver's test for whether a strategy is overfit is elegantly simple: look at the out-of-sample period. Take your full dataset, fit your strategy to the first half, and see how it performs on the second half you didn't touch. If the Sharpe ratio in the second half is dramatically lower than in the first, you haven't discovered an edge — you've memorized history.
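Here's a minimal sketch of that failure mode on purely synthetic data: we "optimize" the lookback of a simple moving-average trend rule on the first half of a random walk, where no edge exists by construction, and then score the winner on the second half. The rule, parameters, and data are all invented for illustration:

```python
import random

def ma_signal(prices, lookback):
    """+1 when price is above its trailing moving average, else -1."""
    sig = []
    for t in range(len(prices)):
        if t < lookback:
            sig.append(0)  # warm-up: no position until the MA exists
        else:
            ma = sum(prices[t - lookback:t]) / lookback
            sig.append(1 if prices[t] > ma else -1)
    return sig

def rule_returns(prices, lookback):
    """Daily strategy returns, trading yesterday's signal (no lookahead)."""
    sig = ma_signal(prices, lookback)
    return [sig[t - 1] * (prices[t] / prices[t - 1] - 1)
            for t in range(1, len(prices))]

def sharpe(rets):
    """Annualized Sharpe ratio of daily returns."""
    mu = sum(rets) / len(rets)
    var = sum((r - mu) ** 2 for r in rets) / (len(rets) - 1)
    return mu / var**0.5 * 252**0.5 if var > 0 else 0.0

# A pure random walk: by construction there is no edge to find
random.seed(7)
prices = [100.0]
for _ in range(2519):
    prices.append(prices[-1] * (1 + random.gauss(0, 0.01)))

half = len(prices) // 2
in_p, out_p = prices[:half], prices[half:]

# "Optimize" the lookback on the first half...
best_lb = max(range(5, 105, 5),
              key=lambda lb: sharpe(rule_returns(in_p, lb)))
sr_in = sharpe(rule_returns(in_p, best_lb))
# ...then score the winner on data it never saw
sr_out = sharpe(rule_returns(out_p, best_lb))
print(f"lookback {best_lb}: in-sample SR {sr_in:.2f}, out-of-sample SR {sr_out:.2f}")
```

On noise, the in-sample winner typically posts a respectable Sharpe while the out-of-sample number collapses toward zero; the gap between the two is the fitting.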
But here's the uncomfortable corollary he draws. Even a legitimate out-of-sample test has limits. Because financial data is so noisy, it takes an average of 37 years of data to statistically prove that a typical trading rule with a Sharpe ratio of 0.3 is genuinely profitable — not just lucky. Most backtests cover 10 to 15 years. This isn't a minor caveat. It means we are working in a regime of fundamental statistical uncertainty, and most of what we think we know about strategy performance is noise dressed up as signal.
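Carver's 37-year figure rests on his own assumptions, but a standard back-of-envelope calculation lands in the same multi-decade ballpark. The standard error of an annualized Sharpe ratio estimated from T years of data is roughly 1/sqrt(T), so rejecting "true Sharpe is zero" at the two-sided 95% level requires SR * sqrt(T) >= 1.96:

```python
# Back-of-envelope significance horizon for a Sharpe ratio estimate.
# Standard error of an annualized Sharpe over T years ~ 1/sqrt(T),
# so significance at level z requires T >= (z / SR) ** 2.
# This is not Carver's exact calculation, just the textbook version.

def years_needed(sr, z=1.96):
    """Years of data needed for Sharpe sr to clear a z-score of z."""
    return (z / sr) ** 2

print(f"SR 0.3: {years_needed(0.3):.0f} years")  # 43 years
print(f"SR 1.0: {years_needed(1.0):.0f} years")  # 4 years
```

Note the inverse-square relationship: halving the Sharpe ratio quadruples the data requirement, which is why modest edges are so hard to distinguish from luck.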
Why Fewer Parameters Win
The practical implication Carver draws from this is counterintuitive for anyone trained in quantitative methods: use fewer parameters, not more.
This runs against instinct. More parameters means more flexibility, more expressiveness, more potential to capture complexity. But complexity in a model is only valuable if you have enough data to reliably estimate each parameter. In finance, you almost never do. A strategy with 10 parameters needs vastly more data to be confident it's not overfitting than a strategy with 2 parameters, and that data simply doesn't exist in typical backtesting horizons.
Carver's prescription is what he calls the "ideas first" approach. Start with an economic hypothesis — something you believe is true about how markets work and why it should persist. Then build the simplest possible rule that captures that hypothesis. Test it. If it works, be skeptical anyway. If it doesn't work, don't iterate endlessly until it does.
This is philosophically opposite to what most retail quants actually do, which is: run a parameter sweep, find what worked historically, post-hoc rationalize why it should have worked. Carver's point is that the second approach is fitting, not discovery.
His specific examples of how strategies degrade are instructive. A trend-following rule tested on data from 1990 to 2005 might show a Sharpe ratio of 0.8. Test it on 2006 to 2015 and it might drop to 0.4. That's not necessarily catastrophic — some decay is expected. But the 0.4 is probably closer to the truth about what the strategy actually offers. The 0.8 was partly fitting to the specific character of the 90s trend environment.
Handcrafting: The Antithesis of Optimization
Chapter 3's most provocative claim is about what to do instead of optimizing. Carver's answer is handcrafting — a manual, judgment-based approach to portfolio construction that he explicitly argues is superior to mathematical optimization for retail and semi-professional systematic traders.
The case against Markowitz-style mean-variance optimization is damning. The optimizer requires estimates of expected returns, variances, and correlations for every asset in the portfolio. It then finds the combination of weights that maximizes expected Sharpe ratio. In theory, this is optimal. In practice, it's a disaster.
The reason is that the optimizer is extremely sensitive to its inputs. Expected returns are notoriously difficult to estimate — Carver notes that the uncertainty in any historical return estimate swamps the signal. Feed the optimizer slightly wrong inputs (which you always will, because all your inputs are slightly wrong) and it produces wildly extreme results: 100% in a single asset class, short positions in everything else, complete abandonment of diversification. The math is right; the estimates are wrong; the portfolio is dangerous.
Handcrafting sidesteps this by relying on structural judgments instead of parameter estimates. The process, as Carver describes it, goes roughly like this: group your assets by economic similarity (government bonds in different currencies are highly correlated; equities and bonds are less correlated; equities and crude oil less still). Within each group, assign equal weights. Across groups, assign weights based on a simple lookup table that approximates diversification without requiring precise correlation estimates.
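As a sketch, the mechanics fit in a few lines. The asset groupings and across-group weights below are invented for illustration; Carver's book supplies the actual lookup tables:

```python
# A hypothetical handcrafted allocation. Groupings and across-group
# weights are illustrative inventions, not Carver's published tables.
groups = {
    "bonds":       ["US 10y", "Bund", "Gilt"],
    "equities":    ["S&P 500", "EuroStoxx 50"],
    "commodities": ["Crude oil"],
}
group_weights = {"bonds": 0.4, "equities": 0.4, "commodities": 0.2}

def handcraft(groups, group_weights):
    """Equal weight within each group; fixed weights across groups."""
    weights = {}
    for name, assets in groups.items():
        within = group_weights[name] / len(assets)  # split group evenly
        for asset in assets:
            weights[asset] = within
    return weights

portfolio = handcraft(groups, group_weights)
for asset, w in portfolio.items():
    print(f"{asset:12s} {w:.3f}")
```

The across-group step is where Carver's lookup table would slot in; the fixed weights here simply stand in for it. Note that nothing in this calculation requires estimating a single expected return or correlation.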
The result is a portfolio that looks unsophisticated — a "pencil and paper" calculation, as Carver puts it. But it consistently outperforms mean-variance optimization in out-of-sample tests, precisely because it doesn't try to exploit noisy parameter estimates. It sacrifices mathematical optimality for robustness, and robustness is what actually matters when you're running money in real markets.
Portfolio Allocation: Diversification's Limits and Mechanisms
Chapter 4 extends the handcrafting philosophy to portfolio allocation more broadly. Carver's framing here is worth sitting with: he describes diversification as "the only free lunch in investing," but then immediately qualifies that most investors significantly underuse it — not by holding too few assets, but by misunderstanding how diversification actually works.
The naive view of diversification is: hold more assets, reduce risk. The more rigorous view is: hold assets with low correlations to each other, and size them so they contribute roughly equal amounts of risk. These sound similar but produce very different portfolios.
Consider an investor holding 60% equities and 40% bonds. They have two asset classes, which sounds diversified. But in terms of risk contribution, the portfolio is dominated by equities, because equities are far more volatile. The "40% bonds" position might contribute only 15-20% of total portfolio risk. So the portfolio is effectively 80-85% equity risk, with a thin layer of bonds that barely influences outcomes.
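The arithmetic is worth checking. With stylized numbers of my own (15% equity volatility, 8% bond volatility, 0.3 correlation; these are not figures from the book), the bond leg's share of portfolio variance comes out around 17%:

```python
# Risk contribution of each leg of a 60/40 portfolio.
# Vols and correlation are stylized assumptions, not book figures.
w_eq, w_bd = 0.60, 0.40
vol_eq, vol_bd = 0.15, 0.08   # annualized volatilities
rho = 0.3                     # equity-bond correlation

cov = rho * vol_eq * vol_bd
port_var = (w_eq * vol_eq) ** 2 + (w_bd * vol_bd) ** 2 + 2 * w_eq * w_bd * cov

# Each asset's share of total portfolio variance (weight x marginal risk)
share_eq = (w_eq**2 * vol_eq**2 + w_eq * w_bd * cov) / port_var
share_bd = (w_bd**2 * vol_bd**2 + w_eq * w_bd * cov) / port_var

print(f"equity risk share: {share_eq:.0%}")  # 83%
print(f"bond risk share:   {share_bd:.0%}")  # 17%
```

The two shares sum to 100% by construction, so the "40% bonds" label hides the fact that bonds barely move the needle on portfolio risk.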
Carver's risk-parity-inspired approach to allocation corrects for this. You size each position by its risk contribution rather than its capital weight, so that each position's expected daily volatility contribution to the portfolio comes out roughly equal. This naturally results in larger positions in low-volatility assets (bonds) and smaller positions in high-volatility assets (equities), producing genuine diversification rather than the cosmetic kind.
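The first-order version of this is inverse-volatility weighting, a crude stand-in for Carver's full volatility-targeting machinery (the vol numbers here are stylized guesses of mine):

```python
# Inverse-volatility sizing: each asset's weight is proportional to
# 1/vol, so standalone risk (weight x vol) is equalized across assets.
# A first-order stand-in for full volatility targeting; vols are
# stylized assumptions, not estimates from any dataset.
vols = {"equities": 0.15, "bonds": 0.08}

inv = {k: 1.0 / v for k, v in vols.items()}
total = sum(inv.values())
weights = {k: x / total for k, x in inv.items()}
risks = {k: weights[k] * vols[k] for k in vols}

for k in vols:
    print(f"{k:9s} weight {weights[k]:.1%}, standalone risk {risks[k]:.3f}")
```

With these vols, bonds end up near 65% of capital versus 35% for equities, yet each leg carries the same standalone risk. That inversion of the usual 60/40 capital split is the whole point.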
The bootstrap approach he discusses for estimating portfolio weights is intellectually honest about its own limitations. Rather than trusting a single historical estimate of correlations and Sharpe ratios, bootstrapping involves resampling from historical data many times to get a distribution of possible outcomes. This reveals the uncertainty in your parameter estimates rather than hiding it. The result is often that the "optimal" allocation from a single historical sample falls somewhere in the middle of a wide distribution — meaning you could have allocated quite differently and gotten similar expected outcomes. Equal-weight is frequently within the confidence interval.
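A toy version of the bootstrap idea, applied to a single strategy's Sharpe ratio (synthetic returns, invented parameters), shows how wide the uncertainty really is:

```python
import random

def sharpe(rets):
    """Annualized Sharpe ratio of daily returns."""
    mu = sum(rets) / len(rets)
    var = sum((r - mu) ** 2 for r in rets) / (len(rets) - 1)
    return mu / var**0.5 * 252**0.5

# Ten years of synthetic daily returns with a modest "true" edge
random.seed(1)
history = [random.gauss(0.0002, 0.01) for _ in range(2520)]

# Bootstrap: resample the history with replacement, recompute the
# statistic each time, and look at the spread of the resampled results
boot = sorted(
    sharpe(random.choices(history, k=len(history)))
    for _ in range(500)
)
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot))]

print(f"point estimate: {sharpe(history):.2f}")
print(f"95% bootstrap interval: [{lo:.2f}, {hi:.2f}]")
```

The interval spans roughly a full Sharpe point, which is the uncertainty a single historical estimate quietly hides. The same resampling trick applied to portfolio weights instead of a Sharpe ratio gives the distribution of "optimal" allocations Carver describes.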
This is Carver's empirical argument for why equal-weight often beats "optimal" allocations: not that equal-weight is theoretically best, but that given the amount of noise in parameter estimates, the gap between equal-weight and mean-variance optimal is smaller than the error bands around any estimate you could plausibly make.
The Practical Checklist
Before moving on to Parts Three and Four — which cover the mechanical framework for translating market views into actual trades — it's worth extracting what Part Two actually asks of a practitioner.
On strategy design:
- Build rules on economic hypotheses, not parameter searches. If you can't articulate why a rule should work in plain language, you shouldn't be trading it.
- Minimize parameters. If you have two versions of a rule — one with three parameters, one with one — default to the simpler version unless you have overwhelming evidence the complexity adds value.
- Out-of-sample performance should be your primary metric, not in-sample Sharpe ratio. If you haven't tested out-of-sample, you don't know what you have.
- Be extremely skeptical of any strategy that required significant parameter tuning before it looked good.
On portfolio construction:
- Don't use mean-variance optimization without understanding how sensitive it is to input errors. If you must use it, use bootstrap confidence intervals to understand the range of plausible allocations.
- Handcraft instead: group by correlation structure, equal-weight within groups, risk-weight across groups.
- Target equal risk contribution, not equal capital allocation.
What I Keep Coming Back To
The overfitting chapter resonates because the temptation it describes is so recognizable. Adding parameters until a curve fits perfectly isn't just a technical mistake — it's an emotional response to the frustration of watching a strategy that seemed promising fail to produce clean results. Carver's handcrafting philosophy is a specific, principled antidote to that impulse. Simplicity not as laziness, but as epistemic humility about what the data can actually tell you.
I think the deeper lesson is about the difference between understanding a phenomenon and fitting a model to it. If you understand why a market inefficiency exists, you need very few parameters to capture it. If you're fitting, you need many. The number of parameters is a signal about your actual level of understanding.
Parts Three and Four cover how to translate these principles into a daily operational system — position sizing, execution, and what the trading diary actually looks like in practice. That's where the framework moves from philosophy to plumbing.
This post is part of a series on Robert Carver's Systematic Trading. Part One covered the theoretical foundations. Parts Three and Four are forthcoming. Nothing here constitutes financial advice — I'm a reader trying to understand a framework, not a trading advisor.