Logistic Regression

Despite the name, logistic regression is a classification algorithm. It predicts the probability that an input belongs to a class — and it’s one of the most reliable, interpretable, and underrated tools in the ML practitioner’s kit.

Why Not Just Use Linear Regression for Classification?

Linear regression predicts unbounded values (-∞ to +∞). For classification, you need a probability between 0 and 1. Fitting linear regression to binary targets (0/1) can produce predictions below 0 or above 1, which make no sense as probabilities.

The fix: pass the linear output through a sigmoid function that squashes any value to (0, 1).

The Sigmoid Function

σ(z) = 1 / (1 + e⁻ᶻ)

z = -6 → σ(z) ≈ 0.0025 (class 1 very unlikely)
z = 0  → σ(z) = 0.5    (uncertain)
z = 6  → σ(z) ≈ 0.9975 (class 1 very likely)
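A direct translation of the formula, `1 / (1 + np.exp(-z))`, can overflow for large negative z. The sketch below (NumPy assumed; the function name `sigmoid` is our own) uses the equivalent form e^z / (1 + e^z) for negative inputs and reproduces the values above.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    # exp(-|z|) always lies in (0, 1], so it can never overflow
    e = np.exp(-np.abs(z))
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

for z in (-6, 0, 6):
    print(z, round(float(sigmoid(z)), 4))
```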

[Figure: the sigmoid σ(z) plotted for z from -6 to +6, an S-shaped curve rising from near 0 to near 1 and crossing 0.5 at z = 0]

The model:

Computes a linear combination: z = β₀ + β₁x₁ + … + βₙxₙ
Passes through sigmoid: P(y=1|x) = σ(z)
Predicts class 1 if P > 0.5 (or a custom threshold)
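The three steps above can be sketched in a few lines of NumPy. The coefficients are hypothetical numbers chosen for illustration, not fitted values, and `predict_proba`/`predict` here are stand-alone helpers that only borrow the names of the scikit-learn methods. The plain sigmoid formula is used for brevity.

```python
import numpy as np

def predict_proba(X, beta0, beta):
    """P(y=1 | x) for each row of X."""
    z = beta0 + X @ beta               # step 1: linear combination
    return 1.0 / (1.0 + np.exp(-z))    # step 2: squash to (0, 1)

def predict(X, beta0, beta, threshold=0.5):
    """Step 3: class 1 where P(y=1 | x) exceeds the threshold."""
    return (predict_proba(X, beta0, beta) > threshold).astype(int)

# Hypothetical coefficients for two features (illustration only)
beta0, beta = -1.0, np.array([2.0, -0.5])
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [2.0, 4.0]])
print(predict_proba(X, beta0, beta))
print(predict(X, beta0, beta))
print(predict(X, beta0, beta, threshold=0.8))  # a stricter custom threshold
```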

Training: Maximum Likelihood

Unlike linear regression, which minimizes mean squared error, logistic regression maximizes the log-likelihood: how probable the observed labels are under the model. This is equivalent to minimizing the binary cross-entropy (log) loss, summed or averaged over the training examples:

Loss = -[y × log(p) + (1-y) × log(1-p)]

When y=1: loss = -log(p), which is 0 when p = 1 (log(1) = 0)
When y=0: loss = -log(1-p), which is 0 when p = 0 (log(1) = 0)
The loss grows without bound as a prediction becomes confident AND wrong (e.g. y=1 with p → 0).
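A minimal NumPy sketch of the loss, assuming the common convention of averaging over samples; the clipping constant `eps` is an arbitrary choice that keeps log(0) out of the computation. The loop shows how the penalty for a true label of 1 grows as p drops.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y = np.asarray(y, dtype=float)
    # Clip so log never sees exactly 0 or 1 (log(0) = -inf)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# True label y = 1, with increasingly confident wrong predictions
for p in (0.9, 0.5, 0.1, 0.01):
    print(p, round(binary_cross_entropy([1], [p]), 3))
```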

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score

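# Assumes X_train, X_test, y_train, y_test are already defined
# (e.g. via sklearn.model_selection.train_test_split) and y is binary (0/1).
model = LogisticRegression(C=1.0, max_iter=1000, random_state=42)
model.fit(X_train, y_train)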

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # P(class=1)

print(classification_report(y_test, y_pred))
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")
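To make "maximize the likelihood" concrete, the sketch below minimizes the mean cross-entropy with plain batch gradient descent, using the fact that the gradient of the per-sample loss with respect to z is simply p - y. This is an illustrative way one could train the model, not what `model.fit` does: scikit-learn's default uses the `lbfgs` solver and adds an L2 penalty (C=1.0), so its coefficients will be close to, but not identical to, these. The data are synthetic, and `fit_logistic_gd` is a hypothetical helper.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.5, n_iter=5000):
    """Maximize the log-likelihood (= minimize mean binary cross-entropy)
    with plain batch gradient descent. No regularization, no stopping rule."""
    n, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))  # current P(y=1 | x)
        err = p - y                     # gradient of each sample's loss w.r.t. z
        beta -= lr * (X.T @ err) / n    # average gradient for the feature weights
        beta0 -= lr * err.mean()        # average gradient for the intercept
    return beta0, beta

# Synthetic data (illustrative assumption): true log-odds = 0.5 + 2*x1 - 1*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
true_p = 1.0 / (1.0 + np.exp(-(0.5 + X @ np.array([2.0, -1.0]))))
y = (rng.random(2000) < true_p).astype(float)

beta0, beta = fit_logistic_gd(X, y)
print(f"intercept: {beta0:.2f}, weights: {beta.round(2)}")
```

With enough data, the recovered coefficients should land near the true values (0.5, 2, -1), up to sampling noise.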

Interpreting Coefficients (Odds Ratios)

The key strength is interpretability: each logistic regression coefficient has a direct meaning on the log-odds scale.

log(p / (1-p)) = β₀ + β₁x₁ + β₂x₂ + ...

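Exponentiating a coefficient:
  e^βᵢ = odds ratio for feature i: the factor by which the odds multiply when xᵢ increases by one unit, holding all other features fixed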

If β_age = 0.05:
  e^0.05 ≈ 1.051
  → Each additional year of age multiplies the odds by 1.051 (5.1% increase in odds)
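A quick numerical check of this reading, with a hypothetical intercept of -3.0 alongside β_age = 0.05: the odds ratio for one extra year is the same at age 40 and at age 70, while the change in probability is not.

```python
import numpy as np

beta0, beta_age = -3.0, 0.05  # hypothetical intercept and age coefficient

def prob(age):
    """P(y=1 | age) under the hypothetical model."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta_age * age)))

def odds(age):
    """Odds p / (1 - p) at a given age."""
    p = prob(age)
    return p / (1.0 - p)

# Odds ratio for +1 year is exp(beta_age) regardless of the starting age
print(odds(41) / odds(40), odds(71) / odds(70), np.exp(beta_age))
# ...but the absolute change in probability depends on the starting age
print(prob(41) - prob(40), prob(71) - prob(70))
```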

This is why logistic regression is a common default in medicine, finance, and legal contexts, where “why did the model decide this?” matters as much as what it decided.

Regularization in Logistic Regression

The C parameter is the inverse of regularization strength (C = 1/λ). By default, scikit-learn applies an L2 penalty:

# More regularization (C small): simpler model, less variance
model_reg = LogisticRegression(C=0.01)

# Less regularization (C large): more complex, potentially overfit
model_flex = LogisticRegression(C=100)

# L1 regularization: drives some coefficients exactly to zero (implicit feature selection).
# Requires an L1-capable solver such as 'saga' or 'liblinear'.
model_l1 = LogisticRegression(C=1.0, penalty='l1', solver='saga')
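A sketch of the sparsity effect, assuming synthetic data from `make_classification` (20 features, only 3 informative) and the `liblinear` solver, which also supports the L1 penalty. Features are standardized first because the penalty treats all coefficients on the same scale. The exact counts depend on the data and the scikit-learn version; the trend of fewer non-zero coefficients at smaller C is the point.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, 3 informative (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)  # put features on a comparable scale

nonzero = {}
for C in (0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(C=C, penalty='l1', solver='liblinear', random_state=0)
    clf.fit(X, y)
    nonzero[C] = int((clf.coef_ != 0).sum())  # coefficients the L1 penalty kept
    print(f"C={C:>5}: {nonzero[C]} non-zero coefficients out of {X.shape[1]}")
```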

Multiclass Extension

One-vs-Rest (OvR): Train K binary classifiers, one per class. Predict the class whose classifier is most confident.

Softmax (Multinomial): Extends sigmoid to K classes directly. Probabilities for all classes sum to 1.
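A NumPy sketch of the softmax, with hypothetical linear scores for three classes. Subtracting each row's maximum before exponentiating is a standard overflow guard that does not change the result. The last line checks the sense in which softmax extends the sigmoid: with two classes and one score fixed at 0, the first class's probability is exactly σ(z).

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax: exp(z_k) / sum_j exp(z_j) for each row of scores Z."""
    Z = np.asarray(Z, dtype=float)
    # Subtracting the row max leaves the result unchanged but prevents overflow
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

# Hypothetical scores z_k for 3 classes, 2 samples
Z = np.array([[2.0, 1.0, 0.1],
              [0.0, 0.0, 0.0]])
P = softmax(Z)
print(P, P.sum(axis=1))  # each row sums to 1

# K = 2 with the second score fixed at 0 reduces to the sigmoid
z = 1.5
print(softmax([[z, 0.0]])[0, 0], 1.0 / (1.0 + np.exp(-z)))
```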

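# Multinomial (softmax) logistic regression.
# With the default 'lbfgs' solver, scikit-learn fits a multinomial model when y has 3+ classes;
# the multi_class='multinomial' argument is deprecated since scikit-learn 1.5.
model_multi = LogisticRegression(solver='lbfgs', max_iter=1000)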
model_multi.fit(X_train, y_train)  # y_train has 3+ classes

y_prob_multi = model_multi.predict_proba(X_test)
# Shape (n_samples, n_classes): one probability per class, each row sums to 1
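Because the `multi_class` argument is deprecated in recent scikit-learn releases, one way to get the One-vs-Rest scheme explicitly is to wrap the estimator in `sklearn.multiclass.OneVsRestClassifier`. The sketch uses the bundled iris dataset purely as a small 3-class example and fits on all of it, so it shows the mechanics, not a proper evaluation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes; a small example dataset

# One binary logistic regression per class (class k vs. all other classes)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# A single multinomial (softmax) model fit jointly over all classes
softmax_model = LogisticRegression(max_iter=1000).fit(X, y)

print(len(ovr.estimators_))                          # one fitted binary model per class
print(ovr.predict_proba(X[:2]).round(3))            # OvR scores, normalized per row
print(softmax_model.predict_proba(X[:2]).round(3))  # softmax probabilities
```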

Threshold Tuning

The default threshold is 0.5, but it isn’t always the best choice. If false negatives cost more than false positives (e.g., cancer screening), lower the threshold to trade some precision for higher recall:

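# Scan candidate thresholds using the binary model's P(class=1) scores (y_prob from the first example).
# In practice, tune on a validation split rather than the test set to avoid leaking test data.
from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_test, y_prob)

# precisions/recalls have one more entry than thresholds (the final point has no threshold),
# so drop the last element to keep the arrays aligned before picking the F1-maximizing threshold
f1_scores = 2 * precisions[:-1] * recalls[:-1] / (precisions[:-1] + recalls[:-1] + 1e-8)
best_threshold = thresholds[f1_scores.argmax()]
y_pred_tuned = (y_prob >= best_threshold).astype(int)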
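The F1 criterion weighs precision and recall equally. When a false negative has a known, larger cost than a false positive, one alternative is to choose the threshold that minimizes total misclassification cost on a validation set. The helper `min_cost_threshold`, the cost values, and the toy data below are all hypothetical.

```python
import numpy as np

def min_cost_threshold(y_true, y_prob, cost_fn, cost_fp):
    """Return the candidate threshold with the lowest total misclassification cost.
    Ties go to the lowest threshold (np.argmin picks the first minimum)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    candidates = np.unique(y_prob)  # sorted ascending
    costs = []
    for t in candidates:
        y_hat = (y_prob >= t).astype(int)
        fn = np.sum((y_true == 1) & (y_hat == 0))  # missed positives
        fp = np.sum((y_true == 0) & (y_hat == 1))  # false alarms
        costs.append(cost_fn * fn + cost_fp * fp)
    return float(candidates[int(np.argmin(costs))])

# Toy validation labels and predicted P(class=1) (made-up numbers)
y_val = np.array([0, 1, 0, 0, 0, 1, 1, 1])
p_val = np.array([0.10, 0.30, 0.40, 0.45, 0.55, 0.70, 0.80, 0.90])

print(min_cost_threshold(y_val, p_val, cost_fn=1, cost_fp=1))
print(min_cost_threshold(y_val, p_val, cost_fn=10, cost_fp=1))
```

On this toy data, equal costs select 0.70, while making a false negative 10 times as expensive moves the threshold down to 0.30, which is the "lower the threshold" effect described above.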

When to Choose Logistic Regression

Use logistic regression when:

You need interpretable probability estimates
Dataset is small-to-medium
The log-odds are roughly linear in the features (classes are close to linearly separable)
Regulatory/compliance requires explainability
A fast, reliable baseline is needed

Consider alternatives when:

Decision boundaries are highly non-linear → Tree-based models or kernel SVMs
Many complex feature interactions → Gradient boosting
Image or text input → Neural networks
You have millions of features → Large-scale linear models (e.g., trained with stochastic gradient descent) or deep learning
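The linearity condition can be seen on a synthetic XOR-style dataset (an illustrative assumption): class 1 when x1 and x2 share a sign. No straight line separates the classes, so logistic regression on the raw features scores near chance even on its own training data; adding a hand-crafted interaction feature x1·x2 makes the problem linear in the expanded feature space. Tree-based models can learn such interactions without being told; with logistic regression they have to be added as features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic XOR-style data: class 1 when x1 and x2 have the same sign
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Raw features only: a linear decision boundary cannot fit XOR
acc_raw = LogisticRegression().fit(X, y).score(X, y)

# Add the interaction term x1 * x2 as a third feature
X_int = np.column_stack([X, X[:, 0] * X[:, 1]])
acc_int = LogisticRegression(C=1000, max_iter=1000).fit(X_int, y).score(X_int, y)

print(f"training accuracy, raw features: {acc_raw:.2f}, with x1*x2: {acc_int:.2f}")
```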

Logistic regression is often competitive with more complex models on small, well-preprocessed tabular datasets. Always include it in your model comparison — it frequently punches above its weight.
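A sketch of such a comparison using cross-validation. The dataset (scikit-learn's bundled breast cancer data), the random forest competitor, and AUC-ROC as the metric are assumptions for illustration; no particular winner is implied, and results will vary with data and settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # small tabular binary dataset

models = {
    # Scaling inside the pipeline means each CV fold fits its scaler on its own training split
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: mean AUC-ROC {scores.mean():.3f} (+/- {scores.std():.3f})")
```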