Hyperparameter Tuning: Systematic Model Optimization

Learn hyperparameter tuning strategies — grid search, random search, Bayesian optimization with Optuna, early stopping, and best practices for efficient hyperparameter search.

Hyperparameter Tuning

Hyperparameters are model configuration values set before training, such as learning rate, tree depth, or regularization strength. Unlike model weights, they are not learned from the data. Systematic hyperparameter tuning can improve performance more than switching algorithms, yet it is often done ad hoc.
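To make the distinction concrete, here is a minimal sketch using scikit-learn's LogisticRegression on synthetic data from make_classification (both chosen only for illustration). The regularization strength C is fixed before fit(), while the weights in coef_ are learned from the data; changing C changes which weights get learned.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

weight_norms = {}
for C in (0.01, 1.0, 100.0):
    # C (inverse regularization strength) is a hyperparameter: chosen before fit()
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X, y)
    # coef_ holds the weights learned from the data during fit()
    weight_norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C}: ||w|| = {weight_norms[C]:.3f}")
```

Smaller C means stronger L2 regularization, so the learned weights shrink; no amount of training on its own will choose C for you.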


Grid Search

Exhaustively evaluates every combination in a predefined parameter grid:

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# X_train, y_train, X_test, y_test are assumed to come from an existing train/test split
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10],
    'max_features': ['sqrt', 'log2', 0.5]
}
# Total: 3 × 4 × 3 × 3 = 108 combinations × 5 folds = 540 model fits (plus one final refit)
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1,
    return_train_score=True  # compare train vs. CV scores to spot overfitting
)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
print(f"Best CV AUC: {grid_search.best_score_:.4f}")
# The best model is refit on all of X_train; evaluate it once on the held-out test set
print(f"Test AUC: {roc_auc_score(y_test, grid_search.predict_proba(X_test)[:, 1]):.4f}")

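Limitation: The number of combinations grows exponentially with the number of parameters. 10 parameters with 5 values each give 5^10 ≈ 9.8 million combinations, before multiplying by the number of CV folds.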
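These counts can be checked with scikit-learn's ParameterGrid, which enumerates the same candidates GridSearchCV fits. The 10-parameter grid below is hypothetical, with made-up parameter names p0 to p9, and exists only to reproduce the 5^10 figure.

```python
from sklearn.model_selection import ParameterGrid

# The grid from the GridSearchCV example above
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10],
    'max_features': ['sqrt', 'log2', 0.5],
}
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 5  # 5-fold CV
print(n_candidates, n_fits)  # 108 540

# Hypothetical grid: 10 parameters with 5 values each
big_grid = {f'p{i}': list(range(5)) for i in range(10)}
n_big = len(ParameterGrid(big_grid))
print(n_big)  # 9765625
```

len() on a ParameterGrid computes the product of the value-list sizes without materializing the candidates, so it is a cheap sanity check before launching a large search.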


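Random Search

Samples a fixed number of random combinations from the given distributions. On the same budget it often finds good solutions faster than grid search: when only a few parameters strongly affect the score, random sampling tries many more distinct values of each of them: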

from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint, uniform

param_distributions = {
    'n_estimators': randint(50, 500),      # integers 50 to 499 (upper bound exclusive)
    'max_depth': randint(2, 20),
    'min_samples_leaf': randint(1, 30),
    'max_features': uniform(0.1, 0.9),     # uniform(loc, scale): floats in [0.1, 1.0]
    'min_impurity_decrease': uniform(0, 0.01)
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=100,  # Try 100 random combinations
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42
)
random_search.fit(X_train, y_train)
print(f"Best params: {random_search.best_params_}")
print(f"Best CV AUC: {random_search.best_score_:.4f}")
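A minimal simulation of the argument for random search, assuming a made-up two-parameter objective in which only one parameter affects the score (no real model is trained). With the same budget of 9 evaluations, a 3 × 3 grid tests only 3 distinct values of the important parameter, while each random search tests 9; the sketch repeats the random search many times and counts how often it beats the grid.

```python
import numpy as np

def toy_score(important, unimportant):
    # Hypothetical objective: only `important` matters (peak at 0.37);
    # `unimportant` is accepted but ignored, mimicking a low-impact hyperparameter
    return -(important - 0.37) ** 2

budget = 9

# Grid search: a 3 x 3 grid over [0, 1]^2 tries only 3 distinct `important` values
values = np.linspace(0, 1, 3)
grid = [(a, b) for a in values for b in values]
grid_best = max(toy_score(a, b) for a, b in grid)

# Random search: repeat many independent 9-point searches and count how often
# the best random point beats the best grid point
rng = np.random.default_rng(0)
n_repeats = 1000
wins = 0
for _ in range(n_repeats):
    points = rng.uniform(0, 1, size=(budget, 2))
    random_best = max(toy_score(a, b) for a, b in points)
    wins += int(random_best > grid_best)

win_rate = wins / n_repeats
print(f"distinct 'important' values: grid={len(set(values))}, random={budget}")
print(f"random search beat the grid in {win_rate:.0%} of {n_repeats} runs")
```

The grid's closest point is 0.5, which is 0.13 from the peak, so a random search wins whenever any of its 9 samples lands in (0.24, 0.50). That happens with probability 1 − 0.74^9 ≈ 93%, which the simulation should roughly reproduce.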

Bayesian Optimization with Optuna

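Usually more sample-efficient than grid or random search: it builds a probabilistic model of which parameter regions are likely to improve the score, based on the trials evaluated so far, and focuses the search there. Optuna's default sampler for this is TPE (Tree-structured Parzen Estimator):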

import optuna
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.ensemble import GradientBoostingClassifier

def objective(trial):
    # Each call samples one candidate configuration from the search space
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 2, 8),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 50),
    }
    model = GradientBoostingClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train,
                             cv=StratifiedKFold(3, shuffle=True, random_state=42),
                             scoring='roc_auc', n_jobs=-1)
    return scores.mean()  # the value Optuna maximizes

# Create study and optimize (TPE sampler, seeded for reproducibility)
study = optuna.create_study(direction='maximize',
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=100, n_jobs=1)
print(f"Best AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

# Optuna visualization (requires plotly)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()

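Successive Halving

Evaluates all candidates on a small amount of resources (by default, a subset of the training samples), then repeatedly keeps only the best-scoring fraction and gives the survivors more resources: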

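# Successive halving is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import RandomForestClassifier

halving_search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,  # same 108-candidate grid as in the grid search example
    cv=5,
    factor=3,  # keep ~1/3 of candidates each round; survivors get 3× the samples
    scoring='roc_auc',
    n_jobs=-1
)
halving_search.fit(X_train, y_train)
print(f"Best params: {halving_search.best_params_}")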
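The elimination schedule itself is simple arithmetic. The helper below is a hypothetical sketch, not scikit-learn code: it assumes 1 + floor(log_factor(n_candidates)) rounds, keeping ceil(n / factor) candidates and multiplying the per-candidate resource by factor each round, and applies that to the 108-candidate grid with an assumed starting budget of 20 samples. The real HalvingGridSearchCV chooses its starting resource itself (and may run fewer rounds if data runs out), so its exact numbers can differ.

```python
import math

def halving_schedule(n_candidates, factor=3, min_resources=20):
    """Hypothetical helper: list of (candidates, samples per candidate) per round."""
    # Integer version of 1 + floor(log_factor(n_candidates)), avoiding float log error
    rounds = 1
    while factor ** rounds <= n_candidates:
        rounds += 1
    schedule = []
    for i in range(rounds):
        schedule.append((n_candidates, min_resources * factor ** i))
        n_candidates = math.ceil(n_candidates / factor)  # keep the top ~1/factor
    return schedule

schedule = halving_schedule(108)
for rnd, (cands, samples) in enumerate(schedule, start=1):
    print(f"round {rnd}: {cands:>3} candidates x {samples:>4} samples")

# Rough cost proxy: total (candidate x sample) units, versus fitting every
# candidate on the final round's sample count
halving_cost = sum(c * s for c, s in schedule)
full_cost = 108 * schedule[-1][1]
print(f"halving: {halving_cost} units vs. full grid: {full_cost} units")
```

Under these assumptions halving spends roughly 7% of the full grid's sample budget. Treat that as an order-of-magnitude proxy only: training time is not linear in sample count, and models that look weak on small subsets can be eliminated too early.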

Key Hyperparameters by Model

Model | Key Parameters | Typical Starting Ranges (in parameter order)
--- | --- | ---
Random Forest | n_estimators, max_depth, min_samples_leaf | 100–500, 3–20, 1–20
XGBoost | learning_rate, max_depth, subsample | 0.01–0.3, 3–8, 0.5–1.0
Neural Network | learning_rate, hidden_dims, dropout | 1e-4 to 1e-2, varies, 0.1–0.5
SVM | C, gamma | 0.01–100, 1e-4 to 1
Ridge/Lasso | alpha | 1e-4 to 1e4
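Ranges such as C from 0.01 to 100 or alpha from 1e-4 to 1e4 span several orders of magnitude, so they are usually sampled on a log scale; the Optuna example above does the same with log=True. A minimal sketch with scipy.stats.loguniform shows why for the SVM C range: plain uniform sampling puts about 1% of draws below C = 1, while log-uniform sampling gives every decade equal weight.

```python
import numpy as np
from scipy.stats import loguniform, uniform

n = 10_000
# Plain uniform over [0.01, 100]: uniform(loc, scale) covers [loc, loc + scale]
c_uniform = uniform(0.01, 100 - 0.01).rvs(size=n, random_state=0)
# Log-uniform over [0.01, 100]: each decade (0.01-0.1, 0.1-1, ...) equally likely
c_log = loguniform(0.01, 100).rvs(size=n, random_state=0)

frac_uniform_below_1 = np.mean(c_uniform < 1)
frac_log_below_1 = np.mean(c_log < 1)
print(f"uniform:     {frac_uniform_below_1:.1%} of samples below C=1")
print(f"log-uniform: {frac_log_below_1:.1%} of samples below C=1")

# The same distributions can be passed to RandomizedSearchCV, e.g. for an SVC
param_distributions = {'C': loguniform(0.01, 100), 'gamma': loguniform(1e-4, 1)}
```

Since 1 sits at the geometric midpoint of [0.01, 100], the log-uniform draws split roughly 50/50 around it.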

Best Practices

  1. Start coarse, then refine: Wide ranges first, zoom in on promising regions
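  2. Use early stopping (XGBoost/LightGBM): Stop adding trees once the validation score stops improving, which cuts the cost of every trial; to abandon unpromising trials entirely, use Optuna's pruners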
  3. Fix random seeds: Ensures reproducibility across runs
  4. Log all experiments: MLflow or Optuna’s built-in storage
  5. Never tune on test data: All tuning happens on validation set / CV
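  6. Use parallel search: n_jobs=-1 for sklearn, n_jobs>1 for Optuna (Optuna's n_jobs uses threads; because of Python's GIL, several worker processes sharing one storage often parallelize better). Avoid nesting parallelism, such as n_jobs=-1 inside an objective that Optuna also runs in parallel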
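A minimal sketch of early stopping, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost/LightGBM (their early-stopping APIs differ and are not shown here) on synthetic data chosen for illustration. With n_iter_no_change set, the model holds out validation_fraction of the training data and stops adding trees once the validation loss fails to improve by at least tol for that many consecutive iterations, so n_estimators becomes an upper bound rather than a value to tune.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = GradientBoostingClassifier(
    n_estimators=1000,        # upper bound; early stopping picks the actual count
    learning_rate=0.1,
    validation_fraction=0.2,  # 20% of the training data held out internally
    n_iter_no_change=10,      # stop after 10 iterations without improvement...
    tol=1e-4,                 # ...of at least this much in validation loss
    random_state=42,          # fixed seed: same internal split and trees every run
)
model.fit(X, y)
print(f"Trees actually fitted: {model.n_estimators_} of {model.n_estimators}")
```

Inside a tuning loop, this means each trial pays only for the trees it needs, and the fitted count (n_estimators_) can be logged alongside the other hyperparameters.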