Online Learning

Most machine learning workflows follow a static pattern: collect data, train a model, deploy it, and retrain periodically when it degrades. Online learning breaks that pattern. Instead of batch training, the model updates incrementally — processing one example (or a small batch) at a time, continuously incorporating new information.

Why Online Learning Exists

The world changes. A fraud detection model trained on last year’s patterns misses new attack vectors. A recommendation model trained on summer behavior is wrong in December. A price prediction model trained before a market shock is immediately stale.

Retraining from scratch on all historical data is:

Computationally expensive (hours or days)
Memory intensive (the full history may not fit in memory)
Slow to respond to rapid changes

Online learning addresses this by updating the model incrementally as each new observation arrives.

The Core Mechanism

Batch learning:
  All data → Train once → Deploy → Wait for degradation → Retrain

Online learning:
  New example arrives →
    1. Make prediction
    2. Observe true label / feedback
    3. Update model weights immediately
    4. Repeat for next example

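The model stays current: it has learned from every labeled example that has arrived so far. In practice, how current it is depends on how quickly true labels or feedback become available.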

Stochastic Gradient Descent (Online SGD)

Stochastic gradient descent is the mathematical backbone of online learning. Instead of computing gradients over the full dataset (batch gradient descent), it updates the weights after each example or small batch:

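# Mini-batch SGD, the de facto standard (PyTorch-style loop).
# Assumes model, criterion, optimizer, dataloader, and n_epochs are defined elsewhere.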
for epoch in range(n_epochs):
    for X_batch, y_batch in dataloader:          # process one batch
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()                          # compute gradients
        optimizer.step()                         # update weights

With batch_size=1 and a single pass over the data in arrival order, this becomes true online learning. Small batches are a practical compromise between the responsiveness of per-example updates and the stability of batch training.
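To make the batch_size=1 case concrete, here is a minimal dependency-free sketch of per-example SGD for linear regression with squared error. The function name `online_sgd_step`, the learning rate, and the simulated ground-truth coefficients are all assumptions chosen for illustration.

```python
import random


def online_sgd_step(w, b, x, y, lr=0.05):
    """One SGD step for the loss 0.5 * (y_hat - y)**2 on a single example."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = y_hat - y                                   # d(loss) / d(y_hat)
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]  # d(loss)/d(w_i) = err * x_i
    b = b - lr * err                                  # d(loss)/d(b) = err
    return w, b


rng = random.Random(0)
w, b = [0.0, 0.0], 0.0
for _ in range(5000):                    # single pass over a simulated stream
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = 2.0 * x[0] - 3.0 * x[1] + 0.5    # hypothetical noise-free ground truth
    w, b = online_sgd_step(w, b, x, y)
```

Each example is used once and then discarded, so memory use stays constant no matter how long the stream runs.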

Concept Drift: The Main Challenge

The biggest enemy of deployed models is concept drift: the relationship between the input features and the target changes over time, so patterns the model learned earlier stop holding.

Types of drift:
  Sudden drift:       Distribution changes abruptly (market crash, new fraud method)
  Gradual drift:      Old and new patterns coexist while the new one takes over (demographic change)
  Recurring drift:    Previously seen patterns return (seasonality, day/night, weekday/weekend)
  Incremental drift:  Continuous small changes that accumulate over time
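For testing drift handling, it helps to have a stream where the drift is known in advance. The generator below is a hypothetical sketch (`drifting_stream` and its parameters are not from any library) that simulates sudden and gradual drift by switching between two made-up labeling rules.

```python
import random


def drifting_stream(n, kind, change_at, width=200, seed=0):
    """Yield (x, y) pairs whose labeling rule changes over time.

    Old concept: y = (x > 0.5). New concept: y = (x <= 0.5).
    kind="sudden":  switch concepts at index `change_at`.
    kind="gradual": the new concept is drawn with a probability that rises
                    linearly from 0 to 1 over `width` steps after `change_at`.
    """
    rng = random.Random(seed)
    for t in range(n):
        if kind == "sudden":
            p_new = 1.0 if t >= change_at else 0.0
        elif kind == "gradual":
            p_new = min(1.0, max(0.0, (t - change_at) / width))
        else:
            raise ValueError(f"unknown drift kind: {kind}")
        x = rng.random()
        use_new = rng.random() < p_new
        y = (x <= 0.5) if use_new else (x > 0.5)
        yield x, y


sudden = list(drifting_stream(1000, "sudden", change_at=500))
gradual = list(drifting_stream(1000, "gradual", change_at=400))
```

Recurring drift could be simulated the same way by alternating between the two rules on a fixed schedule.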

Detection methods:

ADWIN (Adaptive Windowing): Detects distribution changes in a stream
Page-Hinkley test: Statistical test for abrupt changes
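DDM/EDDM (Drift Detection Method and its Early variant): Monitor the model's online error rate (EDDM: the spacing between errors); trigger retraining when it worsens significantly
Population Stability Index (PSI): Compare feature distributions between a reference window and a recent window; this flags shifts in the inputs even before labels arrive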
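As an illustration of how such a detector works internally, here is a simplified sketch of the Page-Hinkley test for an upward shift in the mean of a signal, such as a 0/1 error indicator. The class layout and default values are assumptions for this example, not the API or defaults of any particular library.

```python
class PageHinkley:
    """Simplified Page-Hinkley test for an upward shift in a stream's mean."""

    def __init__(self, delta=0.005, threshold=20.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm level (often written lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                  # cumulative deviation m_t
        self.cum_min = 0.0              # running minimum M_t

    def update(self, value):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.cum_min > self.threshold)


detector = PageHinkley()
errors = [0.0] * 200 + [1.0] * 100   # made-up error signal that jumps at t=200
alarms = [t for t, e in enumerate(errors) if detector.update(e)]
```

This sketch never resets itself, so once the threshold is crossed it keeps signalling; a production detector would typically reset its statistics after an alarm. Smaller `threshold` values react faster at the cost of more false alarms.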

Practical Online Learning with River

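River is a widely used Python library for online/streaming ML. The loop below follows the predict-then-learn pattern and adds drift detection on the error signal: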

from river import datasets, drift, linear_model, metrics, preprocessing

# Example stream of (features, label) pairs; swap in your own source
data_stream = datasets.Phishing()

# Build an incremental pipeline
scaler = preprocessing.StandardScaler()
model = linear_model.LogisticRegression()
metric = metrics.Accuracy()
drift_detector = drift.ADWIN()

for x, y_true in data_stream:
    # Scale features incrementally (online)
    x_scaled = scaler.transform_one(x)
    scaler.learn_one(x)

    # Predict before learning
    y_pred = model.predict_one(x_scaled)
    metric.update(y_true, y_pred)

    # Learn from the new example
    model.learn_one(x_scaled, y_true)

    # Detect concept drift
    drift_detector.update(int(y_pred != y_true))
    if drift_detector.drift_detected:
        print(f"Drift detected! Accuracy: {metric.get():.3f}")
        metric = metrics.Accuracy()  # reset metric
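The "scale features incrementally" step deserves a closer look, because a batch scaler needs the whole dataset to compute a mean and variance. The sketch below shows one way to maintain those statistics one example at a time, using Welford's algorithm. It is a dependency-free illustration under stated assumptions, not River's actual implementation; the class name is hypothetical, and it assumes every example carries the same feature keys.

```python
import math


class OnlineStandardScaler:
    """Per-feature running mean and variance via Welford's algorithm.

    Assumes every example has the same set of feature keys.
    """

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}   # running sum of squared deviations from the mean

    def learn_one(self, x):
        self.n += 1
        for key, value in x.items():
            mean = self.mean.get(key, 0.0)
            delta = value - mean
            mean += delta / self.n
            self.mean[key] = mean
            self.m2[key] = self.m2.get(key, 0.0) + delta * (value - mean)

    def transform_one(self, x):
        out = {}
        for key, value in x.items():
            var = self.m2.get(key, 0.0) / self.n if self.n else 0.0
            std = math.sqrt(var)
            # Features with zero variance so far are mapped to 0.0
            out[key] = (value - self.mean.get(key, 0.0)) / std if std > 0 else 0.0
        return out


scaler = OnlineStandardScaler()
for value in [1.0, 2.0, 3.0, 4.0]:
    scaler.learn_one({"amount": value})
scaled = scaler.transform_one({"amount": 4.0})
```

Only three numbers per feature are stored, so the scaler's memory footprint does not grow with the length of the stream.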

Online Learning in Production Systems

Real-world examples where online learning is essential:

Recommendation systems: User preferences evolve. Large platforms such as Netflix, YouTube, and Spotify are widely reported to refresh recommendations from recent user activity, using forms of online or near-real-time learning.

Fraud detection: Fraudsters adapt. Models that only update weekly fall behind attack patterns that change daily.

Ad click prediction: Billions of ad impressions per day; feedback arrives immediately. Batch retraining can’t keep up with fast-changing click rates.

Financial trading: Market microstructure changes; models must adapt continuously.

Online vs. Batch: When to Use Each

Factor	Online Learning	Batch Learning
Data velocity	High (streaming)	Low-medium
Concept drift	High risk	Manageable
Memory constraint	Tight	Flexible
Training stability	Lower	Higher
Debugging complexity	High	Lower
Industry adoption	Niche	Dominant

Recommendation: Start with batch learning and periodic retraining (e.g., daily). Move to online learning only when retraining latency or concept drift creates a measurable business impact. The operational complexity of online systems is significant — stream processing infrastructure, drift monitoring, rollback mechanisms.
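To give a sense of what a rollback mechanism involves, here is one possible sketch: keep a snapshot of the last model state that performed acceptably, and restore it when the error rate over a recent window exceeds a limit. Everything here is hypothetical (`RollbackGuard`, `LastLabelModel`, the window size, and the threshold), and this is only one of many possible policies.

```python
import copy
from collections import deque


class LastLabelModel:
    """Toy online model: predicts whatever label it saw most recently."""

    def __init__(self):
        self.last = None

    def predict_one(self, x):
        return self.last

    def learn_one(self, x, y):
        self.last = y


class RollbackGuard:
    """Wraps an online model and restores a known-good snapshot on bad windows."""

    def __init__(self, model, window=100, max_error=0.3):
        self.model = model
        self.snapshot = copy.deepcopy(model)
        self.errors = deque(maxlen=window)
        self.max_error = max_error
        self.rollbacks = 0

    def step(self, x, y_true):
        y_pred = self.model.predict_one(x)
        self.errors.append(int(y_pred != y_true))
        self.model.learn_one(x, y_true)
        if len(self.errors) == self.errors.maxlen:   # a full window is available
            rate = sum(self.errors) / len(self.errors)
            if rate > self.max_error:
                self.model = copy.deepcopy(self.snapshot)   # roll back
                self.rollbacks += 1
            else:
                self.snapshot = copy.deepcopy(self.model)   # new known-good state
            self.errors.clear()
        return y_pred


guard = RollbackGuard(LastLabelModel(), window=5, max_error=0.3)
for label in ["a"] * 10 + ["b", "a", "b", "a", "b"]:   # clean data, then noisy feedback
    guard.step({}, label)
```

Note the trade-off: a guard like this protects against corrupted feedback, but under genuine drift it would undo useful adaptation, so it is usually paired with drift monitoring and human review.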

2025–2026: Continual Learning

Academic research increasingly focuses on continual learning: training neural networks on new tasks without erasing what they learned on old ones (the “catastrophic forgetting” problem). Methods such as EWC (Elastic Weight Consolidation), progressive networks, and experience replay aim to make large models adaptable without full retraining. These ideas may find their way into production LLM fine-tuning pipelines in the coming years.
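Replay-based methods rest on a simple idea: keep a small memory of past examples and mix them into each update so that old knowledge keeps getting rehearsed. A minimal sketch of such a memory follows, using reservoir sampling to hold a uniform random sample of the stream in fixed space. The class name and the task labels are hypothetical, and real systems differ in how they choose what to store.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-size memory holding a uniform random sample of the stream so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k stored examples to mix into the next update."""
        return self.rng.sample(self.items, min(k, len(self.items)))


buffer = ReservoirReplayBuffer(capacity=50)
for i in range(1000):
    # Made-up stream: the first half comes from one task, the second from another
    buffer.add(("task_a" if i < 500 else "task_b", i))
replay_batch = buffer.sample(8)
```

During training on the newer task, each update would combine fresh examples with a batch drawn from the buffer, so earlier examples continue to influence the weights.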