Hyperparameter Tuning
Hyperparameters are model configuration values set before training, such as learning rate, tree depth, and regularization strength. Unlike weights, they are not learned from data. Systematic hyperparameter tuning often improves performance more than switching algorithms, yet it is frequently done ad hoc.
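To make the distinction concrete, here is a minimal sketch on a toy one-dimensional model. The function `train` and its data are hypothetical, chosen only for illustration: `learning_rate` is fixed before training starts, while the weight `w` is learned from the data.

```python
def train(learning_rate, n_steps=50):
    """Fit y ≈ w * x by gradient descent; return the learned weight and final MSE."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with a true weight of 2.0
    w = 0.0                    # weight: learned from the data
    for _ in range(n_steps):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # learning_rate: hyperparameter, never updated
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return w, mse


# Same data, same number of steps; only the hyperparameter changes.
for lr in (0.001, 0.01, 0.1):
    w, mse = train(lr)
    print(f"learning_rate={lr}: w={w:.4f}, mse={mse:.6f}")
```

With the same data and step budget, the quality of the learned weight depends entirely on the value picked for the hyperparameter, which is why it is worth searching over.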
Grid Search
Exhaustively evaluates every combination in a predefined parameter grid:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Assumes X_train, X_test, y_train, y_test already exist (binary classification).
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10],
    'max_features': ['sqrt', 'log2', 0.5],
}
# Total: 3 × 4 × 3 × 3 = 108 combinations × 5 folds = 540 model fits

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1,
    return_train_score=True,
)
grid_search.fit(X_train, y_train)

print(f"Best params: {grid_search.best_params_}")
print(f"Best CV AUC: {grid_search.best_score_:.4f}")
# Evaluate the refit best model once on the held-out test set
print(f"Test AUC: {roc_auc_score(y_test, grid_search.predict_proba(X_test)[:, 1]):.4f}")
```

Limitation: the number of combinations grows exponentially with the number of parameters. Ten parameters with 5 values each give 5^10 ≈ 9.8 million combinations.
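The arithmetic behind that limitation can be checked with a short standard-library sketch. It reuses the grid from the example above; the helper `grid_size` is a hypothetical name introduced here for illustration and is not part of scikit-learn.

```python
from itertools import product

param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
    "max_features": ["sqrt", "log2", 0.5],
}


def grid_size(grid):
    """Number of candidates = product of the number of values per parameter."""
    n = 1
    for values in grid.values():
        n *= len(values)
    return n


n_folds = 5
n_candidates = grid_size(param_grid)
print(f"{n_candidates} candidates, {n_candidates * n_folds} fits")  # 108 candidates, 540 fits

# Enumerating the grid explicitly yields the same count.
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
print(len(combos), combos[0])

# With 5 values per parameter, the grid grows as 5 ** n_params.
for n_params in (2, 5, 10):
    print(f"{n_params} parameters -> {5 ** n_params:,} combinations")
```

Each added parameter multiplies the total, so the last line reports 9,765,625 combinations for ten parameters before cross-validation folds are even counted.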
Random Search
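Samples a fixed number of random combinations from the given distributions. It often finds good solutions faster than grid search because the budget is set by n_iter rather than by the size of a grid: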
```python
from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

param_distributions = {
    'n_estimators': randint(50, 500),             # integers in [50, 500)
    'max_depth': randint(2, 20),                  # integers in [2, 20)
    'min_samples_leaf': randint(1, 30),           # integers in [1, 30)
    'max_features': uniform(0.1, 0.9),            # uniform(loc, scale): floats in [0.1, 1.0]
    'min_impurity_decrease': uniform(0, 0.01),    # floats in [0, 0.01]
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=100,  # Try 100 random combinations
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42,
)
random_search.fit(X_train, y_train)

print(f"Best params: {random_search.best_params_}")
print(f"Best CV AUC: {random_search.best_score_:.4f}")
```

Bayesian Optimization with Optuna
Usually the most sample-efficient of these approaches: it uses the results of completed trials to model which parameter combinations are most likely to improve the score, and concentrates the search there:
```python
import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def objective(trial):
    # Each call samples one candidate configuration from the search space
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 2, 8),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 50),
    }

    model = GradientBoostingClassifier(**params, random_state=42)
    scores = cross_val_score(
        model,
        X_train,
        y_train,
        cv=StratifiedKFold(3, shuffle=True, random_state=42),
        scoring='roc_auc',
        n_jobs=-1,
    )
    return scores.mean()


# Create study and optimize
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42),
)
study.optimize(objective, n_trials=100, n_jobs=1)

print(f"Best AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

# Optuna visualization (requires plotly)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
```

Halving Search (Fast Alternative to Grid Search)
Trains all candidates on a small subset of the data, then progressively allocates more resources to the better candidates:
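```python
# HalvingGridSearchCV is experimental and must be enabled explicitly before import
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV

# Reuses RandomForestClassifier and param_grid from the grid search example
halving_search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    factor=3,  # 1/3 of candidates advance each round
    scoring='roc_auc',
    n_jobs=-1,
)
halving_search.fit(X_train, y_train)
```

Key Hyperparameters by Model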
| Model | Key Parameters | Typical Starting Range |
|---|---|---|
| Random Forest | n_estimators, max_depth, min_samples_leaf | 100–500, 3–20, 1–20 |
| XGBoost | learning_rate, max_depth, subsample | 0.01–0.3, 3–8, 0.5–1.0 |
| Neural Network | learning_rate, hidden_dims, dropout | 1e-4–1e-2, varies, 0.1–0.5 |
| SVM | C, gamma | 0.01–100, 1e-4–1 |
| Ridge/Lasso | alpha | 1e-4–1e4 |
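Several ranges in the table (C, gamma, alpha, learning rate) span multiple orders of magnitude. Such ranges are usually sampled uniformly in log space, which is what `log=True` does in the Optuna example. The sketch below uses only the standard library to show the difference from linear sampling. The helper `sample_log_uniform` is a hypothetical name, and the SVM ranges are taken from the table.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose log10 is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(42)
svm_ranges = {"C": (0.01, 100.0), "gamma": (1e-4, 1.0)}

# 1000 candidate configurations sampled on a log scale
candidates = [
    {name: sample_log_uniform(rng, lo, hi) for name, (lo, hi) in svm_ranges.items()}
    for _ in range(1000)
]

# Share of C values below 1.0, the geometric midpoint of [0.01, 100]
share_log = sum(c["C"] < 1.0 for c in candidates) / len(candidates)

# The same share when C is sampled on a linear scale instead
linear_draws = [rng.uniform(0.01, 100.0) for _ in range(1000)]
share_linear = sum(c < 1.0 for c in linear_draws) / len(linear_draws)

print(f"C < 1.0 with log-uniform sampling: {share_log:.1%}")
print(f"C < 1.0 with linear sampling:      {share_linear:.1%}")
```

Linear sampling spends almost the whole budget on large values of C and barely explores the lower decades. `scipy.stats.loguniform` provides the same kind of distribution for use with RandomizedSearchCV.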
Best Practices
- Start coarse, then refine: Wide ranges first, zoom in on promising regions
- Use early stopping (XGBoost/LightGBM): Saves time by halting boosting once the validation score stops improving, so weak configurations finish quickly
- Fix random seeds: Ensures reproducibility across runs
- Log all experiments: MLflow or Optuna’s built-in storage
- Never tune on test data: All tuning happens on validation set / CV
- Use parallel search: n_jobs=-1 for sklearn, n_jobs>1 for Optuna (parallelize at one level only to avoid oversubscribing cores)
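The first practice in the list, start coarse and then refine, can be sketched as a two-stage search. This is a minimal illustration under stated assumptions: `cv_score` is a hypothetical stand-in for a real cross-validated score, given an artificial optimum at alpha = 0.3, and `log_grid` is a helper name introduced here. The coarse range is the Ridge/Lasso alpha range from the table.

```python
import math


def cv_score(alpha):
    """Hypothetical stand-in for a cross-validated score; peaks at alpha = 0.3."""
    return -(math.log10(alpha) - math.log10(0.3)) ** 2


def log_grid(low, high, n):
    """n points evenly spaced on a log10 scale between low and high (inclusive)."""
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]


# Stage 1: coarse grid, one point per decade across the full range
coarse = log_grid(1e-4, 1e4, 9)
best_coarse = max(coarse, key=cv_score)

# Stage 2: finer grid spanning one decade on either side of the coarse winner
fine = log_grid(best_coarse / 10, best_coarse * 10, 21)
best_fine = max(fine, key=cv_score)

print(f"coarse best: alpha={best_coarse:.4g}, score={cv_score(best_coarse):.4f}")
print(f"refined best: alpha={best_fine:.4g}, score={cv_score(best_fine):.4f}")
print(f"total evaluations: {len(coarse) + len(fine)}")
```

The two stages use 30 evaluations in total, while a single grid with the fine spacing over the full range would need 81. In practice `cv_score` would be replaced by a real cross-validation run on the training data only.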