Overfitting and Underfitting: How to Spot Them and Fix Them

A model that performs beautifully on its training data but falls apart on anything new is overfitting. A model that performs poorly everywhere, including on its own training data, is underfitting. These are the two most common failure modes in machine learning. Both can be diagnosed from a single, simple artifact, the training and validation loss curves, and both have well-established, practical fixes.

Underfitting: The Model Hasn’t Learned Enough

Underfitting happens when a model is too simple, or hasn’t trained long enough, to capture the actual pattern in the data — it performs poorly on both training and validation data because it genuinely hasn’t learned the relationship yet.

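import matplotlib.pyplot as plt

# Underfitting signature: both curves plateau at a high loss and stay close together
epochs = range(1, 21)
# Ten epochs of slow improvement, then ten flat epochs (repeating the list with * 2
# would make the loss jump back up at epoch 11 instead of staying flat)
train_loss = [0.9, 0.85, 0.82, 0.80, 0.79, 0.78, 0.78, 0.77, 0.77, 0.77] + [0.77] * 10
val_loss   = [0.92, 0.87, 0.84, 0.82, 0.81, 0.80, 0.80, 0.79, 0.79, 0.79] + [0.79] * 10

plt.plot(epochs, train_loss, label="Train")
plt.plot(epochs, val_loss, label="Validation")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()
# Both lines stay high and close together: the model hasn't learned much at all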

Overfitting: The Model Has Memorized, Not Learned

Overfitting happens when a model learns the training data too precisely, including its noise and idiosyncrasies, at the cost of generalizing to new data — training loss keeps dropping while validation loss stalls or starts rising.

# Overfitting signature: training loss keeps dropping, validation loss diverges upward
epochs = range(1, 11)   # ten epochs, matching the ten values in each list
train_loss = [0.9, 0.6, 0.4, 0.25, 0.15, 0.08, 0.04, 0.02, 0.01, 0.005]
val_loss   = [0.92, 0.65, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.60, 0.65]

plt.plot(epochs, train_loss, label="Train")
plt.plot(epochs, val_loss, label="Validation")
plt.legend()
plt.show()
# The growing gap between the two lines, especially the rise in validation loss,
# is a clear signal of overfitting

This growing gap between training and validation loss is the single most important pattern to watch for during training — it’s usually visible well before final evaluation metrics would reveal the same problem.
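The gap can also be measured directly rather than read off a plot. The following is a minimal sketch using the same illustrative numbers as the overfitting example; the helper names `generalization_gap` and `best_epoch` are made up for this illustration and do not come from any library.

```python
def generalization_gap(train_loss, val_loss):
    """Return the per-epoch gap (validation loss minus training loss)."""
    if len(train_loss) != len(val_loss):
        raise ValueError("curves must cover the same number of epochs")
    return [v - t for t, v in zip(train_loss, val_loss)]


def best_epoch(val_loss):
    """Return the 1-based epoch at which validation loss is lowest."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1


train_loss = [0.9, 0.6, 0.4, 0.25, 0.15, 0.08, 0.04, 0.02, 0.01, 0.005]
val_loss = [0.92, 0.65, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.60, 0.65]

gaps = generalization_gap(train_loss, val_loss)
print(f"validation loss bottoms out at epoch {best_epoch(val_loss)}")
print(f"gap at epoch 1: {gaps[0]:.3f}, gap at final epoch: {gaps[-1]:.3f}")
```

With these numbers the validation loss is lowest at epoch 5, and the gap widens from about 0.02 to about 0.645 by the last epoch.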

The Classic Training Curve Shapes

Underfitting:        Good fit:            Overfitting:

Loss                 Loss                 Loss
 │  train             │  train              │  train
 │  ≈ val             │  ↓                  │  ↓╲
 │─────               │   ↓___val           │    ╲___val (rising back up)
 │                     │                     │
 └────── epochs        └────── epochs        └────── epochs

A well-fit model shows both curves decreasing together and then converging to a small, stable gap — this is the target shape every training run should aim toward, and it’s the visual signal that tells you training has reached a good stopping point.
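As a rough illustration, the three shapes can be told apart with a few comparisons on the recorded losses. This is only a heuristic sketch: the function name `diagnose` and the thresholds `high_loss` and `max_gap` are assumptions chosen for this example, and sensible values depend on the loss function and the task.

```python
def diagnose(train_loss, val_loss, high_loss=0.5, max_gap=0.1):
    """Classify a run from its loss curves using rough, assumed thresholds."""
    final_train, final_val = train_loss[-1], val_loss[-1]
    gap = final_val - final_train
    # Validation loss that has climbed well above its own minimum has "rebounded"
    val_rebounded = final_val > min(val_loss) + max_gap
    if gap > max_gap or val_rebounded:
        return "overfitting"
    if final_train > high_loss:
        return "underfitting"
    return "good fit"


# Illustrative curves only, not measurements from a real model
print(diagnose([0.9, 0.82, 0.78, 0.77], [0.92, 0.84, 0.80, 0.79]))
print(diagnose([0.9, 0.4, 0.08, 0.005], [0.92, 0.5, 0.46, 0.65]))
print(diagnose([0.9, 0.5, 0.3, 0.2, 0.15], [0.92, 0.55, 0.35, 0.25, 0.2]))
```

A check like this is a convenience for automated alerts on long runs. It does not replace looking at the plotted curves.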

Fixing Underfitting

Increase model capacity. Add layers or neurons — a genuinely under-capacity model simply can’t represent the necessary pattern.
Train for more epochs. Sometimes the model hasn’t been given enough time to converge.
Reduce regularization. If dropout or weight decay is too aggressive, it can prevent the model from learning even the patterns it’s capable of representing.
Improve input features or representation. Poor or insufficient input features, covered in Feature Engineering, can make even a capable model underperform.
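The last fix in the list above can be shown on a toy problem. The sketch below fits the same simple linear model twice on synthetic data with a quadratic target: once on the raw input, and once on an engineered feature (the input squared). The data, the seed, and the helper names `fit_line` and `mse` are all invented for this illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)                              # fixed seed so the sketch is repeatable
xs = [i / 50 - 1 for i in range(101)]               # 101 points in [-1, 1]
ys = [x * x + rng.gauss(0, 0.01) for x in xs]       # quadratic target, small noise

raw_mse = mse(xs, ys, *fit_line(xs, ys))            # linear model on raw x: underfits
squared = [x * x for x in xs]                       # engineered feature x^2
eng_mse = mse(squared, ys, *fit_line(squared, ys))  # same model, better feature

print(f"raw feature: {raw_mse:.4f}  engineered feature: {eng_mse:.4f}")
```

The training error on the raw feature stays high no matter how long the line is fitted, which is the underfitting signature. The engineered feature brings it down to roughly the noise level.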

Fixing Overfitting

Add regularization. L1/L2 penalties on weights, or dropout, covered in Regularization and Dropout, directly discourage the model from fitting noise.
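Use early stopping. Stop training once validation loss has failed to improve for a set number of epochs (the patience), and keep the checkpoint from the epoch where validation loss was lowest, rather than continuing to the point where training loss is lowest.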

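# Early stopping sketch. model, train_loader, val_loader, num_epochs, train_one_epoch,
# evaluate, and save_checkpoint are placeholders assumed to be defined elsewhere.
best_val_loss = float('inf')
patience_counter = 0   # epochs since validation loss last improved
patience = 5           # how many non-improving epochs to tolerate

for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
        save_checkpoint(model)   # save the best-performing version
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch}")
            break

# The checkpoint saved above, not the final weights, is the model to keep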

Gather more training data. More diverse examples make it genuinely harder for a model to simply memorize the training set.
Use data augmentation. Especially for images, artificially expanding the effective training set (random crops, flips, rotations) reduces the chance of memorizing specific examples.
Reduce model capacity. If the model is dramatically over-parameterized relative to the problem, a smaller architecture can directly reduce its capacity to overfit.
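To make the regularization item above concrete, here is a minimal sketch of an L2 penalty on the smallest possible model: a single weight w with no intercept, fitted by minimizing sum((w*x - y)^2) + l2 * w^2. Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + l2), so a larger penalty pulls the weight toward zero. The function name `ridge_weight` and the data values are made up for illustration.

```python
def ridge_weight(xs, ys, l2):
    """Closed-form minimizer of sum((w*x - y)^2) + l2 * w^2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x; illustrative values only

for l2 in (0.0, 1.0, 10.0):
    print(f"l2={l2:>4}: w = {ridge_weight(xs, ys, l2):.3f}")
```

The same trade-off applies in larger models: a stronger penalty means smaller weights and less capacity to fit noise, and a penalty that is too strong pushes the model toward underfitting.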

Why Watching Both Curves Matters More Than Watching One

A common mistake is only tracking training loss, since it’s readily available and always looks like it’s improving. Without the validation curve as a reference, overfitting is invisible until you actually deploy the model and discover degraded real-world performance — by which point the fix requires retraining rather than a simple adjustment to an in-progress run. Logging and plotting both curves, ideally in real time during training (via TensorBoard or a similar tool), is one of the highest-value habits in practical deep learning work.
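If no dashboard tool is set up, both curves can still be recorded with the standard library alone. The sketch below is one possible shape for such a logger; the class name `LossHistory` and its methods are assumptions for this example, not part of TensorBoard or any other tool.

```python
import csv
import io


class LossHistory:
    """Collect per-epoch training and validation loss so both can be plotted."""

    def __init__(self):
        self.rows = []

    def log(self, epoch, train_loss, val_loss):
        """Record one epoch; call this right after each validation pass."""
        self.rows.append({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})

    def to_csv(self):
        """Return the history as CSV text (write it to a file to plot later)."""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=["epoch", "train_loss", "val_loss"])
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


history = LossHistory()
# Illustrative values; in a real run these come from the training loop
for epoch, (t, v) in enumerate(zip([0.9, 0.6, 0.4], [0.92, 0.65, 0.5]), start=1):
    history.log(epoch, t, v)
print(history.to_csv())
```

Keeping the two losses in the same row per epoch makes it hard to end up with a training curve and no validation curve to compare it against.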

A Third, Less Discussed Pattern: Double Descent

Research on very large, over-parameterized networks has identified a pattern that doesn’t fit neatly into the classical overfitting/underfitting picture. As model size or training time increases well past the point where classical theory predicts overfitting should worsen, test error can rise and then fall a second time, a phenomenon called “double descent.” This is still an active area of research, and it doesn’t change the practical diagnostic approach covered above for most everyday projects: tracking training and validation curves remains the right first step. It is still worth knowing the pattern exists, so that an unusual, non-textbook training curve on a very large model isn’t automatically assumed to be a bug rather than a genuine, if less common, training dynamic.

Summary

Symptom	Diagnosis	Primary fixes
Both losses high, close together	Underfitting	More capacity, more training, less regularization
Training loss low, validation loss high/rising	Overfitting	Regularization, dropout, more data, early stopping
Both losses low and converging	Good fit	Continue current approach

Overfitting and underfitting aren’t abstract textbook terms — they’re directly visible in a training curve, diagnosable in seconds, and each has a well-established, specific set of fixes rather than requiring guesswork.

Written by the NPBlue Engineering Team: practitioners who write every guide from hands-on production experience, not paraphrased documentation.

Reviewed for technical accuracy.