Model Deployment and MLOps: Taking Machine Learning to Production

Learn MLOps — model serving, containerization, CI/CD for ML, monitoring data drift, feature stores, ML pipelines, and best practices for production ML in 2026.

Model Deployment and MLOps

Training an accurate model is the beginning, not the end. MLOps is the work of getting that model into production, keeping it accurate over time, and maintaining it as requirements change. Many ML projects stall in the gap between a Jupyter notebook and a production system that processes millions of requests.


The ML Production Gap

Data Science Reality:
Notebook → Accurate Model → "Done!"
MLOps Reality:
Data Pipeline → Feature Engineering → Training → Validation →
Containerization → Serving API → Monitoring → Retraining →
A/B Testing → Rollback Plan → Repeat

Model Serialization

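import joblib
import torch
# scikit-learn models: joblib handles large NumPy arrays efficiently
joblib.dump(model, 'model.joblib')
model = joblib.load('model.joblib')
# PyTorch models: save only the weights (state_dict)
torch.save(model.state_dict(), 'model_weights.pt')
# The model object must already be constructed before its weights are loaded
model.load_state_dict(torch.load('model_weights.pt'))
model.eval()  # switch to inference mode before serving or exporting
# Save the entire model; loading it later requires the same class definition
torch.save(model, 'full_model.pt')
# ONNX: framework-agnostic format for deployment
# input_size is the number of input features the model expects
dummy_input = torch.randn(1, input_size)
torch.onnx.export(model, dummy_input, 'model.onnx',
                  input_names=['input'], output_names=['output'])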
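
Files written by joblib and pickle can execute arbitrary code when they are loaded, so it can be worth checking an artifact's checksum before deserializing it. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `load_verified` are hypothetical, and `pickle` stands in for `joblib` so the example runs without extra dependencies.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified(path: Path, expected_sha256: str):
    """Unpickle `path` only if its digest matches the recorded one."""
    if sha256_of(path) != expected_sha256:
        raise ValueError(f"checksum mismatch for {path.name}")
    with path.open("rb") as fh:
        return pickle.load(fh)


# Demo with a stand-in "model" (a plain dict) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.pkl"
    artifact.write_bytes(pickle.dumps({"weights": [0.1, 0.2, 0.3]}))
    recorded = sha256_of(artifact)        # store this alongside the artifact
    restored = load_verified(artifact, recorded)

    artifact.write_bytes(b"tampered")     # simulate a corrupted or replaced file
    try:
        load_verified(artifact, recorded)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
```

A checksum only shows that the file is the one that was recorded. It does not make an untrusted artifact safe to load, so the recorded digest should come from a source you trust, such as the training pipeline's own output.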

MLflow: Experiment Tracking and Model Registry

import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split
mlflow.set_experiment("fraud-detection")
with mlflow.start_run(run_name="gbm-v2"):
    # Log parameters
    params = {'n_estimators': 200, 'learning_rate': 0.05, 'max_depth': 4}
    mlflow.log_params(params)
    # Train
    model = GradientBoostingClassifier(**params, random_state=42)
    model.fit(X_train, y_train)
    # Log metrics
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("test_auc", auc)
    mlflow.log_metric("train_auc", roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))
    # Log model (registered to Model Registry)
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name="fraud-detector",
        input_example=X_test[:5]
    )
    # active_run() is only set inside the with block
    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Test AUC: {auc:.4f}")

Serving a Model with FastAPI

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np
from typing import List
import uvicorn
app = FastAPI(title="ML Model API", version="1.0")
# Load model at startup
model = joblib.load("model.joblib")
preprocessor = joblib.load("preprocessor.joblib")
class PredictionRequest(BaseModel):
    features: List[float]
class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    model_version: str = "1.0"
@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    # A plain def runs in FastAPI's thread pool, so CPU-bound inference
    # does not block the event loop
    try:
        X = np.array(request.features).reshape(1, -1)
        X_processed = preprocessor.transform(X)
        pred = model.predict(X_processed)[0]
        prob = model.predict_proba(X_processed)[0, 1]
        return PredictionResponse(prediction=int(pred), probability=float(prob))
    except ValueError as e:
        # Bad input (e.g. wrong number of features) is a client error;
        # anything else surfaces as a 500
        raise HTTPException(status_code=422, detail=str(e))
@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
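
The request schema above accepts any list of floats, so a vector of the wrong length, or one containing NaN or infinity, may pass schema validation and only fail (or silently mislead) inside the model. One option is an explicit check before prediction. This is a sketch under stated assumptions: `validate_features` is a hypothetical helper, and `EXPECTED_FEATURES = 4` is an illustrative value, not something the API above defines.

```python
import math
from typing import List, Sequence

EXPECTED_FEATURES = 4  # assumption: the model was trained on 4 features


def validate_features(features: Sequence[float],
                      expected: int = EXPECTED_FEATURES) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if len(features) != expected:
        problems.append(f"expected {expected} features, got {len(features)}")
    for i, value in enumerate(features):
        if not math.isfinite(value):  # rejects NaN and +/- infinity
            problems.append(f"feature {i} is not finite: {value!r}")
    return problems


ok = validate_features([0.5, 1.2, 3.4, 0.0])
bad = validate_features([0.5, float("nan")])
```

Inside the `/predict` handler, a non-empty result could be returned to the client as an `HTTPException` with status 422 and the problem list as its detail, which keeps client errors distinct from genuine server failures.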

Containerization with Docker

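# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.joblib preprocessor.joblib ./
COPY app.py .
EXPOSE 8000
# python:3.11-slim does not ship curl, so probe /health with the standard library
HEALTHCHECK --interval=30s --timeout=10s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)" || exit 1
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]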
Build, run, and push the image:
# Build and run
docker build -t fraud-model:v1 .
docker run -p 8000:8000 fraud-model:v1
# Push to registry
docker tag fraud-model:v1 myregistry.azurecr.io/fraud-model:v1
docker push myregistry.azurecr.io/fraud-model:v1

Monitoring: Data Drift Detection

Models degrade when input distributions change: customer behavior shifts, new product categories appear, economic conditions change. Drift detection flags these shifts early, often before ground-truth labels arrive and a drop in accuracy can be measured.

# These imports follow Evidently's older Report API; newer releases
# reorganize these modules, so check the version you have installed
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
from evidently.pipeline.column_mapping import ColumnMapping
# Data drift report
# Both frames are assumed to contain 'label' and 'prediction' columns
report = Report(metrics=[DataDriftPreset(), DataQualityPreset()])
report.run(
    reference_data=X_train_df,        # Data used during training
    current_data=X_production_df,     # Recent production data
    column_mapping=ColumnMapping(target='label', prediction='prediction')
)
report.save_html("drift_report.html")
# Programmatic drift check: the first metric of DataDriftPreset is the dataset-level result
result = report.as_dict()
drift_detected = result['metrics'][0]['result']['dataset_drift']
if drift_detected:
    print("⚠️ Data drift detected: retraining recommended")
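
To make concrete what a drift check computes, here is a small sketch of the Population Stability Index (PSI) for one numeric feature, using only the standard library. PSI is one widely used drift measure and is shown as an illustration; it is not a description of what Evidently runs by default. The function name `psi`, the bin count, and the demo data are assumptions made for the example.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        eps = 1e-6  # avoid log(0) for empty buckets
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bucket_shares(reference), bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(1000)]   # training-time values, 0.00 to 9.99
shifted = [x + 3.0 for x in reference]       # production values moved upward

psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the feature and the cost of a false alarm.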

CI/CD for ML with GitHub Actions

# .github/workflows/ml-pipeline.yml
name: ML CI/CD Pipeline
on:
  push:
    branches: [main]
    paths: ['src/**', 'notebooks/**']
jobs:
  train-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with: {python-version: '3.11'}
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run data validation
        run: python validate_data.py
      - name: Train model
        run: python train.py
      - name: Evaluate model
        run: python evaluate.py --min-auc 0.85
      - name: Run model tests
        run: pytest tests/test_model.py -v
      - name: Build Docker image
        # Tag with the full registry path so the push step can find the image
        run: docker build -t myregistry.azurecr.io/fraud-model:${{ github.sha }} .
      - name: Push to registry
        # Assumes an earlier registry login step, omitted here
        run: docker push myregistry.azurecr.io/fraud-model:${{ github.sha }}
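
The workflow calls `python evaluate.py --min-auc 0.85` without showing the script. The sketch below illustrates one way such a quality gate could work: compute AUC on held-out data and return a non-zero exit code when it falls below the threshold, which fails the CI step. Everything here is an assumption for illustration, including the function names and the way labels and scores are passed in; a real script would load the model and test set from disk and would more likely use `sklearn.metrics.roc_auc_score`.

```python
import argparse
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as half. O(n^2), fine for a small test set."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def main(argv: List[str], labels: Sequence[int], scores: Sequence[float]) -> int:
    """Return 0 if the model clears the threshold, 1 otherwise."""
    parser = argparse.ArgumentParser(description="Fail the build if AUC is too low")
    parser.add_argument("--min-auc", type=float, required=True)
    args = parser.parse_args(argv)
    auc = roc_auc(labels, scores)
    print(f"AUC={auc:.4f} (threshold {args.min_auc:.2f})")
    return 0 if auc >= args.min_auc else 1


# Toy held-out labels and model scores, purely for demonstration.
demo_labels = [0, 0, 1, 1]
demo_scores = [0.1, 0.4, 0.35, 0.8]
exit_code_strict = main(["--min-auc", "0.85"], demo_labels, demo_scores)
exit_code_lenient = main(["--min-auc", "0.70"], demo_labels, demo_scores)
```

In a real script the return value would be passed to `sys.exit`, since GitHub Actions marks a step as failed when its command exits with a non-zero status.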

Feature Store

A feature store centralizes feature computation so that training and serving use identical feature transformations:

from feast import FeatureStore
# Retrieve training features
# entity_df is assumed to hold the entity key (user_id) and an event_timestamp column
store = FeatureStore(repo_path="feature_repo/")
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:age", "user_features:account_age_days",
              "transaction_features:avg_7d_spend", "transaction_features:max_1h_velocity"]
).to_df()
# Retrieve serving features (real-time, low-latency)
feature_vector = store.get_online_features(
    features=["user_features:age", "transaction_features:avg_7d_spend"],
    entity_rows=[{"user_id": "user_12345"}]
).to_dict()
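
Historical retrieval is point-in-time correct: each training row receives the feature value that was known at that row's event time, never a later one, which prevents future data from leaking into training. The sketch below illustrates the idea with plain Python. The `FeatureLog` layout, the `feature_as_of` helper, and the integer timestamps are hypothetical, and a real feature store performs this join at scale with configurable time-to-live windows.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical feature log: user_id -> [(timestamp, avg_7d_spend), ...],
# with each list sorted by timestamp.
FeatureLog = Dict[str, List[Tuple[int, float]]]


def feature_as_of(log: FeatureLog, user_id: str, event_ts: int) -> Optional[float]:
    """Latest feature value recorded at or before event_ts, or None."""
    history = log.get(user_id, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)  # number of entries with ts <= event_ts
    return history[idx - 1][1] if idx else None


log: FeatureLog = {"user_12345": [(100, 20.0), (200, 35.0), (300, 50.0)]}

# (entity, event timestamp) pairs, playing the role of entity_df rows.
rows = [("user_12345", 250), ("user_12345", 50), ("user_99999", 250)]
training_values = [feature_as_of(log, uid, ts) for uid, ts in rows]
```

The first row gets the value written at timestamp 200 rather than the newer one at 300, while the other two rows have no value available at their event time and come back as `None`.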

MLOps Maturity Model

Level 0 — Manual process:
Notebooks, no automation, manual deployment, no monitoring
Level 1 — ML Pipeline automation:
Automated training pipeline, manual model deployment, basic monitoring
Level 2 — CI/CD for ML:
Automated retraining, automated deployment, drift monitoring,
model versioning, A/B testing, feature store

Most teams operate at Level 0–1. Level 2 requires investment but enables rapid, safe iteration at scale. The right level depends on how frequently you retrain and how costly model failures are.