Bias-Variance Tradeoff
A model's expected prediction error can be decomposed into three parts: bias, variance, and irreducible noise. Understanding this decomposition tells you not just that a model is performing poorly, but why, and what to do about it.
The Decomposition
Expected Test Error = Bias² + Variance + Irreducible Noise
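For squared-error loss, the word equation above can be written more formally. The notation here is an assumption rather than something the original text defines: the data follow y = f(x) + ε with zero-mean noise of variance σ² that is independent of the training set D, and \hat{f}_D is the model fitted on D.

```latex
% Assumed setup (not from the original text):
%   y = f(x) + \varepsilon,  E[\varepsilon] = 0,  Var(\varepsilon) = \sigma^2,
%   \varepsilon independent of the training set D; \hat{f}_D is the model fit on D.
\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible Noise}}
```

The three terms are described in turn: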
- Bias²: Error from wrong assumptions about the data (underfitting: the model is too simple)
- Variance: Error from sensitivity to fluctuations in the training data (overfitting: the model is too complex)
- Irreducible Noise: Error from noise in the data itself, which no model can reduce
Intuition with a Dartboard
Imagine throwing darts, where bull’s-eye = true function:
- High Bias, Low Variance (consistent but wrong): darts cluster far from the center. This is underfitting.
- Low Bias, High Variance (accurate but inconsistent): darts spread widely around the center. This is overfitting.
- Low Bias, Low Variance (ideal): darts cluster near the center. This is just right.
High Bias vs. High Variance Indicators
High Bias (Underfitting):
- Training error is high
- Validation error is close to training error (both bad)
- Learning curve: both train/val curves plateau at high error
- Fix: more complex model, better features, less regularization
High Variance (Overfitting):
- Training error is low
- Validation error is much higher than training error
- Learning curve: large gap between train and val curves
- Fix: more data, regularization, dropout, simpler model
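The two checklists above can be turned into a rough rule of thumb. The sketch below is only illustrative: the function name `diagnose` and the thresholds `acceptable_error` and `gap_tolerance` are hypothetical, and sensible values depend entirely on the task.

```python
def diagnose(train_error, val_error, acceptable_error=0.10, gap_tolerance=0.05):
    """Classify a train/validation error pair using the indicators above.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    # High bias: the model cannot even fit the training data well.
    high_train_error = train_error > acceptable_error
    # High variance: validation error is much worse than training error.
    large_gap = (val_error - train_error) > gap_tolerance

    if high_train_error and large_gap:
        return "high bias and high variance"
    if high_train_error:
        return "high bias"
    if large_gap:
        return "high variance"
    return "ok"


if __name__ == "__main__":
    print(diagnose(0.30, 0.32))  # both errors high and close together
    print(diagnose(0.02, 0.25))  # low training error, large gap
```

A single pair of numbers is a coarse signal; the learning curves in the next section show how both errors evolve with training set size.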
Diagnosing with Learning Curves
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.model_selection import learning_curve
    from sklearn.svm import SVC

    def plot_learning_curves(estimator, X, y, title):
        # Cross-validated scores at 10 training set sizes (10% to 100% of the data)
        train_sizes, train_scores, val_scores = learning_curve(
            estimator, X, y, cv=5, scoring='accuracy',
            train_sizes=np.linspace(0.1, 1.0, 10), n_jobs=-1
        )

        # Mean and spread across the 5 folds
        train_mean = train_scores.mean(axis=1)
        train_std = train_scores.std(axis=1)
        val_mean = val_scores.mean(axis=1)
        val_std = val_scores.std(axis=1)

        plt.figure(figsize=(8, 5))
        plt.plot(train_sizes, train_mean, 'o-', color='blue', label='Training Score')
        plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color='blue')
        plt.plot(train_sizes, val_mean, 'o-', color='green', label='Validation Score')
        plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1, color='green')

        plt.xlabel('Training Set Size')
        plt.ylabel('Accuracy')
        plt.title(title)
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()

    # X, y: feature matrix and label vector, assumed to be loaded already
    plot_learning_curves(SVC(kernel='rbf', C=100), X, y, 'High Variance (C=100)')
    plot_learning_curves(SVC(kernel='linear', C=0.01), X, y, 'High Bias (C=0.01)')
How Model Complexity Affects Bias and Variance
Model Complexity →   Simple ─────────────────────── Complex
Bias:                High ───────────────────────── Low
Variance:            Low ────────────────────────── High
Total Error:         High ──── optimal ───────────── High
                                  ↑
                     Sweet spot: lowest total error
The “sweet spot” is the model complexity that minimizes validation error: neither too simple nor too complex.
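The trend described above can be checked numerically on a toy problem. The following is a minimal sketch under stated assumptions, not a general recipe: the target `sin(2πx)`, the noise level, the polynomial degrees, and the function name `bias_variance_by_degree` are all illustrative choices. It refits a polynomial on many fresh noisy training sets (same inputs, new noise each time) and estimates bias² and variance from the spread of the resulting predictions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Assumed ground-truth function for the simulation.
    return np.sin(2 * np.pi * x)


def bias_variance_by_degree(degrees=(1, 3, 9), n_train=30, n_repeats=200,
                            noise_sd=0.3, seed=0):
    """Estimate bias^2, variance, and their sum plus noise for each degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed training inputs
    x_test = np.linspace(0.05, 0.95, 50)       # evaluation grid
    f_test = true_fn(x_test)

    results = {}
    for deg in degrees:
        preds = np.empty((n_repeats, x_test.size))
        for r in range(n_repeats):
            # New noise realization = new training set.
            y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
            model = Polynomial.fit(x_train, y_train, deg)
            preds[r] = model(x_test)

        mean_pred = preds.mean(axis=0)
        bias_sq = float(np.mean((mean_pred - f_test) ** 2))  # avg squared bias
        variance = float(np.mean(preds.var(axis=0)))         # avg prediction variance
        total = bias_sq + variance + noise_sd ** 2           # expected test error
        results[deg] = (bias_sq, variance, total)
    return results


if __name__ == "__main__":
    for deg, (b, v, t) in bias_variance_by_degree().items():
        print(f"degree={deg}: bias^2={b:.4f} variance={v:.4f} total={t:.4f}")
```

In this setup the low-degree fit should show the largest bias² and the high-degree fit the largest variance, with the total error lowest somewhere in between.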
The Double Descent Phenomenon
In very large overparameterized models (neural networks with more parameters than training samples), the classic bias-variance tradeoff breaks down:
[Figure: test error vs. model complexity. Error first follows the classic U-shape, peaks near the interpolation threshold, then falls again in a second descent.]
Many modern neural networks operate in this “beyond interpolation” regime, where the classical tradeoff doesn’t directly apply: they can interpolate training data and still generalize well when sufficiently overparameterized.
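A small experiment can probe this regime without training a neural network. The sketch below uses a stand-in that is an assumption of this example, not something from the text above: random ReLU features followed by minimum-norm least squares, with the number of features playing the role of model complexity. The names `double_descent_sweep` and `widths` are hypothetical, and whether a clear error peak appears near the interpolation threshold (features ≈ training samples) depends on the seed and noise level.

```python
import numpy as np


def relu_features(X, W):
    # Random-feature map: fixed random projection followed by ReLU.
    return np.maximum(X @ W, 0.0)


def double_descent_sweep(widths=(5, 10, 20, 40, 80, 400), n_train=40,
                         n_test=500, n_inputs=10, noise_sd=0.5, seed=0):
    """Return (width, train_mse, test_mse) for each number of random features."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=n_inputs)  # assumed linear ground truth
    X_train = rng.normal(size=(n_train, n_inputs))
    X_test = rng.normal(size=(n_test, n_inputs))
    y_train = X_train @ beta + rng.normal(0.0, noise_sd, n_train)
    y_test = X_test @ beta + rng.normal(0.0, noise_sd, n_test)

    results = []
    for width in widths:
        W = rng.normal(size=(n_inputs, width)) / np.sqrt(n_inputs)
        phi_train = relu_features(X_train, W)
        phi_test = relu_features(X_test, W)
        # pinv gives the least-squares fit when width < n_train and the
        # minimum-norm interpolating fit when width >= n_train.
        coef = np.linalg.pinv(phi_train) @ y_train
        train_mse = float(np.mean((phi_train @ coef - y_train) ** 2))
        test_mse = float(np.mean((phi_test @ coef - y_test) ** 2))
        results.append((width, train_mse, test_mse))
    return results


if __name__ == "__main__":
    for width, train_mse, test_mse in double_descent_sweep():
        print(f"features={width:4d} train_mse={train_mse:.4f} test_mse={test_mse:.4f}")
```

Once the feature count exceeds the number of training samples, the training error drops to essentially zero; the interesting quantity is how the test error behaves on either side of that point.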
Strategies by Diagnosis
| Problem | Root Cause | Solutions |
|---|---|---|
| High training + high val error | High Bias | More features, more complex model, less regularization, more epochs |
| Low training + high val error | High Variance | More training data, regularization, dropout, simpler model, early stopping |
| Both errors unstable across runs | High Variance (often with unstable optimization) | Fix random seeds, more data, gradient clipping |
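As an illustration of one high-variance fix from the table, regularization, the sketch below sweeps the penalty strength of a ridge regression on synthetic data and reports training and validation error for each value. Everything here is an assumption made for the example: the data generator, the choice of ridge regression, and the names `ridge_fit` and `sweep_ridge_alpha`.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: solve (X^T X + alpha * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def sweep_ridge_alpha(alphas=(1e-6, 1e-2, 1.0, 100.0, 1e4), n_train=30,
                      n_val=300, n_features=25, noise_sd=1.0, seed=0):
    """Return (alpha, train_mse, val_mse) for each regularization strength."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)
    # Few samples relative to features: a setting prone to high variance.
    X_train = rng.normal(size=(n_train, n_features))
    X_val = rng.normal(size=(n_val, n_features))
    y_train = X_train @ w_true + rng.normal(0.0, noise_sd, n_train)
    y_val = X_val @ w_true + rng.normal(0.0, noise_sd, n_val)

    rows = []
    for alpha in alphas:
        w = ridge_fit(X_train, y_train, alpha)
        train_mse = float(np.mean((X_train @ w - y_train) ** 2))
        val_mse = float(np.mean((X_val @ w - y_val) ** 2))
        rows.append((alpha, train_mse, val_mse))
    return rows


if __name__ == "__main__":
    rows = sweep_ridge_alpha()
    for alpha, train_mse, val_mse in rows:
        print(f"alpha={alpha:g} train_mse={train_mse:.3f} val_mse={val_mse:.3f}")
    best_alpha = min(rows, key=lambda r: r[2])[0]
    print("alpha with lowest validation error:", best_alpha)
```

With almost no penalty the model fits the training set far better than the validation set, which is the high-variance signature. Increasing `alpha` always raises training error, and too large a penalty pushes the model toward the high-bias row of the table instead.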
The diagnosis comes first — every fix is specific to whether you have a bias problem or a variance problem. Misdiagnosis leads to applying the wrong fix and making things worse.