Hyperparameter Tuning

Hyperparameters are model configuration values set before training, such as learning rate, tree depth, and regularization strength. Unlike weights, they are not learned from data. Systematic hyperparameter tuning can improve performance as much as, or more than, switching algorithms, yet it is frequently done ad hoc.

Grid Search

Exhaustively evaluates every combination in a predefined parameter grid:

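from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score  # used for the test-set AUC below

# Assumes X_train, X_test, y_train, y_test are already defined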

param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10],
    'max_features': ['sqrt', 'log2', 0.5]
}
# Total: 3 × 4 × 3 × 3 = 108 combinations × 5 folds = 540 model fits

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1,
    return_train_score=True
)

grid_search.fit(X_train, y_train)

print(f"Best params: {grid_search.best_params_}")
print(f"Best CV AUC: {grid_search.best_score_:.4f}")
print(f"Test AUC:    {roc_auc_score(y_test, grid_search.predict_proba(X_test)[:, 1]):.4f}")

Limitation: The grid grows exponentially with the number of parameters. 10 parameters with 5 values each give 5^10 ≈ 9.8M combinations.
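The counts above can be checked with a few lines of standard-library Python. This is a minimal sketch: the grid is copied from the example above, and `grid_size` is a helper name introduced here purely for illustration.

```python
import math
from itertools import product

# Same grid as the Grid Search example above.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
    "max_features": ["sqrt", "log2", 0.5],
}

def grid_size(grid):
    """Number of combinations: the product of the per-parameter value counts."""
    return math.prod(len(values) for values in grid.values())

n_combinations = grid_size(param_grid)
n_fits = n_combinations * 5  # 5-fold CV, not counting the final refit

# Enumerating the combinations explicitly gives the same count.
assert n_combinations == len(list(product(*param_grid.values())))

# Ten parameters with five candidate values each.
big_grid_size = 5 ** 10

print(n_combinations, n_fits, big_grid_size)
```

Adding one more parameter with k candidate values multiplies the total by k, which is why grids stop being practical beyond a handful of parameters.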

Random Search

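Samples a fixed number of combinations from specified distributions. For the same budget it often finds good settings faster than grid search, especially when only a few of the parameters matter: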

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

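# scipy conventions: randint(low, high) excludes high;
# uniform(loc, scale) covers [loc, loc + scale], not [loc, scale]
param_distributions = {
    'n_estimators': randint(50, 500),           # integers 50..499
    'max_depth': randint(2, 20),                # integers 2..19
    'min_samples_leaf': randint(1, 30),         # integers 1..29
    'max_features': uniform(0.1, 0.9),          # floats in [0.1, 1.0]
    'min_impurity_decrease': uniform(0, 0.01)   # floats in [0, 0.01]
}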

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=100,            # Try 100 random combinations
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42
)

random_search.fit(X_train, y_train)
print(f"Best params: {random_search.best_params_}")
print(f"Best CV AUC: {random_search.best_score_:.4f}")
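One reason random search can do better on a fixed budget is that every trial tests a new value of every parameter, whereas a grid repeats the same few values. The sketch below illustrates this with two hypothetical parameters scaled to [0, 1]; the names `levels`, `budget`, and `distinct_values` are made up for the illustration and no model is trained.

```python
import random

rng = random.Random(42)
budget = 9  # same number of evaluations for both strategies

# Grid search: a 3 x 3 grid over two hypothetical parameters in [0, 1].
levels = [0.0, 0.5, 1.0]
grid_points = [(a, b) for a in levels for b in levels]

# Random search: 9 independent draws from the same ranges.
random_points = [(rng.random(), rng.random()) for _ in range(budget)]

def distinct_values(points, axis):
    """How many different values of one parameter were tried."""
    return len({p[axis] for p in points})

grid_distinct = distinct_values(grid_points, 0)
random_distinct = distinct_values(random_points, 0)
print(grid_distinct, random_distinct)
```

If only the first parameter affects the score, the grid has effectively run 3 experiments while random search has run 9.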

Bayesian Optimization with Optuna

Often the most sample-efficient approach: Optuna's default TPE sampler uses the results of past trials to model which parameter regions are likely to score well, and concentrates the search there:

import optuna
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 2, 8),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 50),
    }

    model = GradientBoostingClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train,
                              cv=StratifiedKFold(3, shuffle=True, random_state=42),
                              scoring='roc_auc', n_jobs=-1)
    return scores.mean()

# Create study and optimize
study = optuna.create_study(direction='maximize',
                             sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=100, n_jobs=1)

print(f"Best AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

# Optuna visualization (requires the plotly package)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
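The idea of using past trials to steer the search can be sketched without any library. The toy below is deliberately not TPE (TPE fits density models to the good and bad trials and picks candidates that look much more like the good ones); it only shows the general pattern of a random startup phase followed by sampling near the best results so far. The objective, `adaptive_search`, and all of its parameters are hypothetical.

```python
import random

def toy_objective(x):
    """Hypothetical score to maximize; the peak is at x = 0.3."""
    return -(x - 0.3) ** 2

def adaptive_search(n_trials=50, n_startup=10, sigma=0.1, seed=42):
    rng = random.Random(seed)
    history = []  # list of (x, score) pairs
    for i in range(n_trials):
        if i < n_startup:
            # Startup phase: plain random search over [0, 1].
            x = rng.uniform(0.0, 1.0)
        else:
            # Pick one of the best 20% of past trials and sample near it.
            top = sorted(history, key=lambda t: t[1], reverse=True)
            top = top[: max(1, len(history) // 5)]
            center = rng.choice(top)[0]
            x = min(1.0, max(0.0, rng.gauss(center, sigma)))
        history.append((x, toy_objective(x)))
    return history

history = adaptive_search()
best_x, best_score = max(history, key=lambda t: t[1])
startup_best = max(score for _, score in history[:10])
print(f"best x = {best_x:.3f}, score = {best_score:.5f}")
```

A real sampler also has to balance exploring untested regions against refining known good ones, which this toy handles only crudely through `sigma`.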

Halving Search (Fast Alternative to Grid Search)

Trains all candidates on a small subset, then progressively allocates more resources (by default, more training samples) to the better candidates:

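# Halving search is experimental in scikit-learn and must be enabled first
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV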

halving_search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    factor=3,          # 1/3 of candidates advance each round
    scoring='roc_auc',
    n_jobs=-1
)
halving_search.fit(X_train, y_train)
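To see why halving is cheap, it helps to write out the round-by-round schedule. The sketch below is a simplified model of successive halving, not scikit-learn's exact logic: `halving_schedule` is a hypothetical helper, and the resource limits (100 and 10,000 samples) are assumed values. scikit-learn chooses its own starting resources from the data, so the real schedule may differ.

```python
import math

def halving_schedule(n_candidates, min_resources, max_resources, factor=3):
    """Return (n_candidates, resources_per_candidate) for each round."""
    rounds = []
    n, r = n_candidates, min_resources
    while r <= max_resources:
        rounds.append((n, r))
        if n == 1:
            break
        n = math.ceil(n / factor)  # keep roughly the top 1/factor
        r *= factor                # give survivors factor times more data
    return rounds

schedule = halving_schedule(n_candidates=108, min_resources=100,
                            max_resources=10_000)
for n, r in schedule:
    print(f"{n:>3} candidates x {r:>5} samples")
```

Under these assumptions all 108 candidates are fitted only on 100 samples each, and just 2 candidates are ever fitted on 8,100 samples.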

Key Hyperparameters by Model

Model	Key Parameters	Typical Starting Range
Random Forest	n_estimators, max_depth, min_samples_leaf	100–500, 3–20, 1–20
XGBoost	learning_rate, max_depth, subsample	0.01–0.3, 3–8, 0.5–1.0
Neural Network	learning_rate, hidden_dims, dropout	1e-4–1e-2, varies, 0.1–0.5
SVM	C, gamma	0.01–100, 1e-4–1
Ridge/Lasso	alpha	1e-4–1e4
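Several of these ranges span multiple orders of magnitude, which is why they are usually searched on a log scale (as `log=True` does for the learning rate in the Optuna example). The sketch below compares log-uniform and plain uniform sampling over the SVM C range from the table; `sample_log_uniform` and `share_below` are helper names introduced here for illustration.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
low, high = 0.01, 100.0  # the SVM C range from the table

log_samples = [sample_log_uniform(rng, low, high) for _ in range(10_000)]
lin_samples = [rng.uniform(low, high) for _ in range(10_000)]

def share_below(samples, threshold):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)

# The geometric midpoint of the range is 1.0.
log_share = share_below(log_samples, 1.0)
lin_share = share_below(lin_samples, 1.0)
print(f"below 1.0: log-uniform {log_share:.2f}, uniform {lin_share:.2f}")
```

With plain uniform sampling, only about 1% of draws land below 1.0, so the two lower decades of the range would go almost unexplored.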

Best Practices

Start coarse, then refine: Wide ranges first, zoom in on promising regions
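Use early stopping (XGBoost/LightGBM): Stops adding boosting rounds once the validation score stops improving; to cut off whole unpromising trials early, use a pruner such as Optuna's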
Fix random seeds: Ensures reproducibility across runs
Log all experiments: MLflow or Optuna’s built-in storage
Never tune on test data: All tuning happens on validation set / CV
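Use parallel search: n_jobs=-1 for sklearn, n_jobs>1 for Optuna (thread-based); parallelize either the search or the model fits rather than both, to avoid oversubscribing CPU cores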
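The early-stopping rule mentioned above can be sketched in plain Python. This is an illustration of the patience-based logic only, not the XGBoost or LightGBM implementation; `early_stopping_round`, the `patience` value, and the validation scores are all hypothetical.

```python
def early_stopping_round(val_scores, patience=3):
    """Return (best_round, rounds_run), stopping once `patience`
    consecutive rounds fail to improve on the best validation score."""
    best_score = float("-inf")
    best_round = -1
    rounds_run = 0
    for i, score in enumerate(val_scores):
        rounds_run = i + 1
        if score > best_score:
            best_score, best_round = score, i
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds
    return best_round, rounds_run

# Hypothetical validation AUC after each boosting round.
scores = [0.70, 0.74, 0.77, 0.78, 0.78, 0.77, 0.78, 0.76, 0.79]
best_round, rounds_run = early_stopping_round(scores, patience=3)
print(best_round, rounds_run)
```

Training stops after 7 of the 9 rounds and keeps round 3 (zero-based) as the best. The later 0.79 is never seen, which is the trade-off a small `patience` makes.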