Model Deployment and MLOps

Training an accurate model is the beginning, not the end. MLOps is the work of getting that model into production, keeping it accurate over time, and maintaining it as requirements change. The gap between a Jupyter notebook and a production system that processes millions of requests is where many ML projects fail.

The ML Production Gap

Data Science Reality:
  Notebook → Accurate Model → "Done!"

MLOps Reality:
  Data Pipeline → Feature Engineering → Training → Validation →
  Containerization → Serving API → Monitoring → Retraining →
  A/B Testing → Rollback Plan → Repeat

Model Serialization

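import joblib
import torch

# sklearn models: joblib handles large NumPy arrays efficiently
joblib.dump(model, 'model.joblib')
model = joblib.load('model.joblib')

# PyTorch models: save the state_dict (weights only).
# Loading requires an instance of the same model class.
torch.save(model.state_dict(), 'model_weights.pt')
model.load_state_dict(torch.load('model_weights.pt'))

# Saving the entire model pickles the class by reference, so loading
# breaks if the class definition moves or changes.
torch.save(model, 'full_model.pt')

# ONNX: framework-agnostic format for deployment
model.eval()  # disable dropout and batch-norm updates before export
dummy_input = torch.randn(1, input_size)  # input_size: number of input features
torch.onnx.export(model, dummy_input, 'model.onnx',
                  input_names=['input'], output_names=['output'])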
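
Serialized artifacts are opaque binary files, so it can help to record what was saved and to verify it on load. The following is a minimal sketch using only the standard library. The helpers `save_artifact` and `load_artifact` and the `.meta.json` sidecar layout are hypothetical, and `pickle` stands in for `joblib` so the example runs without extra dependencies.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path, version):
    """Pickle obj to path and write a JSON sidecar with version and SHA-256."""
    path = Path(path)
    payload = pickle.dumps(obj)
    path.write_bytes(payload)
    meta = {"version": version, "sha256": hashlib.sha256(payload).hexdigest()}
    Path(str(path) + ".meta.json").write_text(json.dumps(meta))
    return meta


def load_artifact(path):
    """Load a pickled artifact only if its checksum matches the sidecar."""
    path = Path(path)
    meta = json.loads(Path(str(path) + ".meta.json").read_text())
    payload = path.read_bytes()
    if hashlib.sha256(payload).hexdigest() != meta["sha256"]:
        raise ValueError(f"checksum mismatch for {path}")
    return pickle.loads(payload), meta


# Stand-in for a trained model: any picklable object works here.
toy_model = {"weights": [0.1, -0.4, 2.0], "bias": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.pkl"
    save_artifact(toy_model, artifact, version="1.0")
    restored, meta = load_artifact(artifact)
    print(restored == toy_model, meta["version"])
```

The checksum only detects corruption or an accidental file swap. It does not make unpickling untrusted files safe, since anyone who can replace the artifact can usually replace the sidecar too.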

MLflow: Experiment Tracking and Model Registry

import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

mlflow.set_experiment("fraud-detection")

with mlflow.start_run(run_name="gbm-v2"):
    # Log parameters
    params = {'n_estimators': 200, 'learning_rate': 0.05, 'max_depth': 4}
    mlflow.log_params(params)

    # Train
    model = GradientBoostingClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log metrics
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("test_auc", auc)
    mlflow.log_metric("train_auc", roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))

    # Log model (registered to Model Registry)
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name="fraud-detector",
        input_example=X_test[:5]
    )

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Test AUC: {auc:.4f}")

Serving a Model with FastAPI

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np
from typing import List
import uvicorn

app = FastAPI(title="ML Model API", version="1.0")

# Load model at startup
model = joblib.load("model.joblib")
preprocessor = joblib.load("preprocessor.joblib")

class PredictionRequest(BaseModel):
    features: List[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    model_version: str = "1.0"

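@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    # Plain `def`: FastAPI runs it in a thread pool, so the blocking,
    # CPU-bound predict call does not stall the event loop.
    try:
        X = np.array(request.features).reshape(1, -1)
        X_processed = preprocessor.transform(X)
    except ValueError as e:
        # Bad input (e.g. wrong number of features) is a client error
        raise HTTPException(status_code=422, detail=str(e))

    try:
        pred = model.predict(X_processed)[0]
        prob = model.predict_proba(X_processed)[0, 1]
    except Exception as e:
        # Anything else is a server-side failure
        raise HTTPException(status_code=500, detail="Prediction failed") from e

    return PredictionResponse(prediction=int(pred), probability=float(prob))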

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Containerization with Docker

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.joblib preprocessor.joblib ./
COPY app.py .

EXPOSE 8000

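# python:3.11-slim does not ship curl, so the health check uses Python instead
HEALTHCHECK --interval=30s --timeout=10s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1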

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

# Build and run
docker build -t fraud-model:v1 .
docker run -p 8000:8000 fraud-model:v1

# Push to registry
docker tag fraud-model:v1 myregistry.azurecr.io/fraud-model:v1
docker push myregistry.azurecr.io/fraud-model:v1

Monitoring: Data Drift Detection

Models degrade when input distributions change: customers’ behavior shifts, new product categories appear, economic conditions change. Drift detection flags these shifts early, often before labeled outcomes are available to measure the drop in accuracy.

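# Import paths for Evidently's Report API as used here; newer Evidently
# releases may use different module paths, so pin the version you tested.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
from evidently.pipeline.column_mapping import ColumnMapping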

# Data drift report
report = Report(metrics=[DataDriftPreset(), DataQualityPreset()])
report.run(
    reference_data=X_train_df,      # Data used during training
    current_data=X_production_df,   # Recent production data
    column_mapping=ColumnMapping(target='label', prediction='prediction')
)
report.save_html("drift_report.html")

# Programmatic drift check
result = report.as_dict()
drift_detected = result['metrics'][0]['result']['dataset_drift']
if drift_detected:
    print("⚠️ Data drift detected — retrain recommended")
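
To make the idea of a drift score concrete, here is a minimal sketch of the Population Stability Index (PSI) for a single numeric feature, using only the standard library. It is an illustration of one common drift statistic, not the test Evidently runs internally. The function name `psi`, the equal-width binning, and the synthetic samples are all assumptions made for the example.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bin edges are equal-width over the reference range; current values
    outside that range fall into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data for illustration only.
reference = [i / 100 for i in range(1000)]      # spread evenly over [0, 10)
same = [i / 100 + 0.005 for i in range(1000)]   # same distribution
shifted = [i / 100 + 3.0 for i in range(1000)]  # every value shifted by 3

print(f"PSI same:    {psi(reference, same):.4f}")
print(f"PSI shifted: {psi(reference, shifted):.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the right threshold depends on the feature and on how costly a missed drift is.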

CI/CD for ML with GitHub Actions

name: ML CI/CD Pipeline

on:
  push:
    branches: [main]
    paths: ['src/**', 'notebooks/**']

jobs:
  train-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with: {python-version: '3.11'}

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run data validation
        run: python validate_data.py

      - name: Train model
        run: python train.py

      - name: Evaluate model
        run: python evaluate.py --min-auc 0.85

      - name: Run model tests
        run: pytest tests/test_model.py -v

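      # Assumes an earlier step has logged in to the registry (docker login).
      # The image is built with the full registry tag so the push can find it.
      - name: Build Docker image
        run: docker build -t myregistry.azurecr.io/fraud-model:${{ github.sha }} .

      - name: Push to registry
        run: docker push myregistry.azurecr.io/fraud-model:${{ github.sha }}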
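
The "Evaluate model" step only works as a quality gate if the script exits with a non-zero status when the model falls short. The source does not show `evaluate.py`, so the following is a hypothetical sketch of what such a gate might look like. The hard-coded labels and scores are placeholders, and the pure-Python `roc_auc` stands in for `sklearn.metrics.roc_auc_score` so the example has no dependencies.

```python
import argparse


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gate(auc, min_auc):
    """Return a process exit code: 0 if the model clears the bar, 1 otherwise."""
    return 0 if auc >= min_auc else 1


def main(argv=None):
    parser = argparse.ArgumentParser(description="Fail the build on a weak model")
    parser.add_argument("--min-auc", type=float, default=0.85)
    args = parser.parse_args(argv)

    # Placeholder held-out labels and scores; a real script would load the
    # trained model and a test set here.
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    scores = [0.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.6, 0.4]

    auc = roc_auc(labels, scores)
    print(f"test_auc={auc:.4f} (minimum {args.min_auc})")
    return gate(auc, args.min_auc)


# A CI entry point would call sys.exit(main()) so a failing gate fails the job.
exit_code = main(["--min-auc", "0.85"])
print("exit code:", exit_code)
```

Returning the exit code from `main` rather than exiting inside it keeps the gate logic easy to unit test.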

Feature Store

A feature store centralizes feature computation so that training and serving use identical feature transformations:

from feast import FeatureStore

# Retrieve training features
store = FeatureStore(repo_path="feature_repo/")
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:age", "user_features:account_age_days",
               "transaction_features:avg_7d_spend", "transaction_features:max_1h_velocity"]
).to_df()

# Retrieve serving features (real-time, low-latency)
feature_vector = store.get_online_features(
    features=["user_features:age", "transaction_features:avg_7d_spend"],
    entity_rows=[{"user_id": "user_12345"}]
).to_dict()
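
The core idea, one definition of each feature shared by the training and serving paths, can be illustrated without a feature store at all. The toy sketch below is not how Feast works internally. The `FEATURES` registry, the `OnlineStore` class, and the sample transactions are hypothetical and exist only to show why a single shared definition prevents training/serving skew.

```python
from statistics import mean

# Single definition of each feature, shared by both code paths.
FEATURES = {
    "avg_spend": lambda txns: mean(txns) if txns else 0.0,
    "max_spend": lambda txns: max(txns, default=0.0),
    "txn_count": lambda txns: float(len(txns)),
}


def compute_features(txns, names):
    """Apply the registered transformations to one user's transactions."""
    return {name: FEATURES[name](txns) for name in names}


def build_training_rows(history, names):
    """Offline path: compute features for every user in a historical snapshot."""
    return {user: compute_features(txns, names) for user, txns in history.items()}


class OnlineStore:
    """Online path: precomputed features served by key lookup."""

    def __init__(self):
        self._rows = {}

    def materialize(self, history, names):
        # Reuses build_training_rows, so the stored values match training.
        self._rows = build_training_rows(history, names)

    def get(self, user_id):
        return self._rows[user_id]


# Made-up transaction amounts for illustration.
history = {"user_12345": [20.0, 35.0, 5.0], "user_67890": []}
names = ["avg_spend", "txn_count"]

training_rows = build_training_rows(history, names)
store = OnlineStore()
store.materialize(history, names)

print(training_rows["user_12345"])
print(store.get("user_12345") == training_rows["user_12345"])
```

Because both paths go through `compute_features`, changing a transformation in `FEATURES` changes training and serving together.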

MLOps Maturity Model

Level 0 — Manual process:
  Notebooks, no automation, manual deployment, no monitoring

Level 1 — ML Pipeline automation:
  Automated training pipeline, manual model deployment, basic monitoring

Level 2 — CI/CD for ML:
  Automated retraining, automated deployment, drift monitoring,
  model versioning, A/B testing, feature store

Many teams operate at Level 0 or 1. Level 2 requires investment but enables rapid, safe iteration at scale. The right level depends on how frequently you retrain and how costly model failures are.