Model Deployment and MLOps
Training an accurate model is the beginning, not the end. MLOps is the work of getting that model into production, keeping it accurate over time, and maintaining it as requirements change. The gap between a Jupyter notebook and a production system that processes millions of requests is where many ML projects fail.
The ML Production Gap
Data Science Reality: Notebook → Accurate Model → "Done!"
MLOps Reality: Data Pipeline → Feature Engineering → Training → Validation → Containerization → Serving API → Monitoring → Retraining → A/B Testing → Rollback Plan → Repeat

Model Serialization
import joblib
import torch

# scikit-learn models
joblib.dump(model, 'model.joblib')
model = joblib.load('model.joblib')

# PyTorch models: save the weights (state_dict).
# The model class must be instantiated before the weights are loaded back.
torch.save(model.state_dict(), 'model_weights.pt')
model.load_state_dict(torch.load('model_weights.pt'))
torch.save(model, 'full_model.pt')  # Save entire model (pickles the class along with the weights)

# ONNX: framework-agnostic format for deployment
# input_size is the number of input features the model expects
model.eval()
dummy_input = torch.randn(1, input_size)
torch.onnx.export(model, dummy_input, 'model.onnx',
                  input_names=['input'], output_names=['output'])

MLflow: Experiment Tracking and Model Registry
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# X_train, y_train, X_test, y_test are assumed to be prepared already
mlflow.set_experiment("fraud-detection")

with mlflow.start_run(run_name="gbm-v2"):
    # Log parameters
    params = {'n_estimators': 200, 'learning_rate': 0.05, 'max_depth': 4}
    mlflow.log_params(params)

    # Train
    model = GradientBoostingClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log metrics
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("test_auc", auc)
    mlflow.log_metric("train_auc", roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))

    # Log model (registered to Model Registry)
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="fraud-detector",
        input_example=X_test[:5],
    )

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Test AUC: {auc:.4f}")

Serving a Model with FastAPI
from typing import List

import joblib
import numpy as np
import uvicorn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="ML Model API", version="1.0")

# Load model at startup
model = joblib.load("model.joblib")
preprocessor = joblib.load("preprocessor.joblib")

class PredictionRequest(BaseModel):
    features: List[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    model_version: str = "1.0"

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    # Plain def: FastAPI runs it in a thread pool, so the blocking
    # model call does not stall the event loop.
    try:
        X = np.array(request.features).reshape(1, -1)
        X_processed = preprocessor.transform(X)

        pred = model.predict(X_processed)[0]
        prob = model.predict_proba(X_processed)[0, 1]
    except ValueError as e:
        # Typically a wrong number of features: a client error
        raise HTTPException(status_code=422, detail=str(e))
    except Exception as e:
        # Anything else is a server-side failure
        raise HTTPException(status_code=500, detail=str(e))

    return PredictionResponse(prediction=int(pred), probability=float(prob))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Containerization with Docker
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.joblib preprocessor.joblib ./
COPY app.py .

EXPOSE 8000

# python:3.11-slim does not ship curl, so probe the endpoint with Python instead
HEALTHCHECK --interval=30s --timeout=10s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

# Build and run
docker build -t fraud-model:v1 .
docker run -p 8000:8000 fraud-model:v1

# Push to registry
docker tag fraud-model:v1 myregistry.azurecr.io/fraud-model:v1
docker push myregistry.azurecr.io/fraud-model:v1

Monitoring: Data Drift Detection
Models degrade when input distributions change: customer behavior shifts, new product categories appear, economic conditions change. Drift detection flags these shifts early, often before labeled outcomes arrive and an accuracy drop can be measured.
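As a rough illustration of what a drift check computes, the sketch below implements the Population Stability Index (PSI) for a single numeric feature using only the standard library. PSI is a swapped-in technique chosen for brevity, not necessarily the test a drift library applies. The function name `population_stability_index`, the ten-bin setting, and the 0.2 alert threshold (a common rule of thumb) are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples of one numeric feature.

    Bin edges are equal-width over the reference range; current values
    outside that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Synthetic data: one sample from the training distribution,
# one from the same distribution, one with a shifted mean.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(train, same)
psi_shifted = population_stability_index(train, shifted)
drift_detected = psi_shifted > 0.2  # assumed rule-of-thumb threshold
```

A library such as Evidently, used in the code that follows, runs statistical tests of this kind per column and combines them into a dataset-level verdict, so a hand-rolled check like this is mainly useful for understanding what the report measures.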
# Uses Evidently's Report API as imported here; newer Evidently releases
# reorganized these modules, so pin the version these imports match.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
from evidently.pipeline.column_mapping import ColumnMapping

# Data drift report
# Both DataFrames are assumed to include the 'label' and 'prediction' columns
report = Report(metrics=[DataDriftPreset(), DataQualityPreset()])
report.run(
    reference_data=X_train_df,      # Data used during training
    current_data=X_production_df,   # Recent production data
    column_mapping=ColumnMapping(target='label', prediction='prediction'),
)
report.save_html("drift_report.html")

# Programmatic drift check
result = report.as_dict()
drift_detected = result['metrics'][0]['result']['dataset_drift']
if drift_detected:
    print("⚠️ Data drift detected: retrain recommended")

CI/CD for ML with GitHub Actions
name: ML CI/CD Pipeline

on:
  push:
    branches: [main]
    paths: ['src/**', 'notebooks/**']

jobs:
  train-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with: {python-version: '3.11'}

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run data validation
        run: python validate_data.py

      - name: Train model
        run: python train.py

      - name: Evaluate model
        run: python evaluate.py --min-auc 0.85

      - name: Run model tests
        run: pytest tests/test_model.py -v

      - name: Build Docker image
        # Tag with the registry prefix so the push step can find the image
        run: docker build -t myregistry.azurecr.io/fraud-model:${{ github.sha }} .

      - name: Push to registry
        # Assumes an earlier step has authenticated to the registry
        run: docker push myregistry.azurecr.io/fraud-model:${{ github.sha }}

Feature Store
A feature store centralizes feature computation so that training and serving use identical feature transformations.
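The failure a feature store guards against is training/serving skew: the same feature computed by two slightly different code paths. The sketch below shows the underlying idea without any feature-store library: one feature function is the single source of truth, and both the offline (training) path and the online (serving) path call it. All names here (`Transaction`, `avg_spend`, `build_training_rows`, `serve_features`) are hypothetical and are not part of Feast.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Transaction:
    user_id: str
    amount: float
    day: int  # days since an arbitrary epoch


def avg_spend(transactions: List[Transaction], as_of_day: int, window_days: int = 7) -> float:
    """Mean amount over the window (as_of_day - window_days, as_of_day]."""
    recent = [
        t.amount
        for t in transactions
        if as_of_day - window_days < t.day <= as_of_day
    ]
    return sum(recent) / len(recent) if recent else 0.0


def build_training_rows(
    history: Dict[str, List[Transaction]], label_days: Dict[str, int]
) -> List[Dict[str, object]]:
    """Offline path: one row per user, computed as of that user's label day."""
    return [
        {"user_id": uid, "avg_7d_spend": avg_spend(history[uid], day)}
        for uid, day in label_days.items()
    ]


def serve_features(
    history: Dict[str, List[Transaction]], user_id: str, today: int
) -> Dict[str, object]:
    """Online path: the same function, evaluated at request time."""
    return {"user_id": user_id, "avg_7d_spend": avg_spend(history.get(user_id, []), today)}


history = {
    "user_12345": [
        Transaction("user_12345", 20.0, day=1),
        Transaction("user_12345", 40.0, day=5),
        Transaction("user_12345", 60.0, day=9),
    ]
}

training_rows = build_training_rows(history, {"user_12345": 9})
online_row = serve_features(history, "user_12345", today=9)
```

Because both paths share `avg_spend`, a change to the window or the aggregation cannot reach training without also reaching serving. Feast, used in the example that follows, aims at the same consistency by serving historical and online lookups from one set of feature definitions.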
from feast import FeatureStore

# Retrieve training features
# entity_df holds the entity keys (here user_id) and an event_timestamp column
store = FeatureStore(repo_path="feature_repo/")
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:age",
        "user_features:account_age_days",
        "transaction_features:avg_7d_spend",
        "transaction_features:max_1h_velocity",
    ],
).to_df()

# Retrieve serving features (real-time, low-latency)
feature_vector = store.get_online_features(
    features=["user_features:age", "transaction_features:avg_7d_spend"],
    entity_rows=[{"user_id": "user_12345"}],
).to_dict()

MLOps Maturity Model
Level 0 (manual process): notebooks, no automation, manual deployment, no monitoring
Level 1 (ML pipeline automation): automated training pipeline, manual model deployment, basic monitoring
Level 2 (CI/CD for ML): automated retraining, automated deployment, drift monitoring, model versioning, A/B testing, feature store
Most teams operate at Level 0–1. Level 2 requires investment but enables rapid, safe iteration at scale. The right level depends on how frequently you retrain and how costly model failures are.
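Automated deployment at Level 2 depends on an explicit promotion gate: a candidate model ships only if it clears an absolute quality bar and does not regress against the model currently in production. The CI pipeline earlier invokes `python evaluate.py --min-auc 0.85` without showing that script, so the sketch below is only one guess at the decision logic such a gate might contain. The names `GateResult` and `promotion_gate` and the `max_regression` tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    promote: bool
    reason: str


def promotion_gate(
    candidate_auc: float,
    production_auc: Optional[float] = None,
    min_auc: float = 0.85,
    max_regression: float = 0.01,
) -> GateResult:
    """Decide whether a candidate model may replace the production model."""
    # Absolute bar: never ship a model below the minimum AUC
    if candidate_auc < min_auc:
        return GateResult(False, f"AUC {candidate_auc:.3f} is below the minimum {min_auc:.3f}")
    # Relative bar: tolerate only a small drop against the current model
    if production_auc is not None and candidate_auc < production_auc - max_regression:
        return GateResult(
            False,
            f"AUC {candidate_auc:.3f} regresses against production {production_auc:.3f}",
        )
    return GateResult(True, "candidate meets the quality bar")


first_deploy = promotion_gate(candidate_auc=0.91)
too_weak = promotion_gate(candidate_auc=0.80)
regressed = promotion_gate(candidate_auc=0.88, production_auc=0.93)
```

In a CI job, a script built around this function would exit with a non-zero status when `promote` is false, which stops the workflow before the image is built and pushed. How strict to make `max_regression` follows from the same trade-off as the maturity level itself: the cost of a bad model in production against the cost of blocking a release.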