Support Vector Machines
Support Vector Machines are a powerful and elegant family of algorithms for classification and regression. The core idea: find the decision boundary that not only separates classes, but does so with the maximum possible margin — the widest gap between the boundary and the nearest data points from each class.
The Maximum Margin Concept
Class A: ○ ○ ○    Class B: × × ×

Poor boundary (close to both):
○ ○ | × ×
    ↑ small margin

Optimal SVM boundary (maximum margin):
○ ○ ←margin→ × ×
○ ○ |      | × ×
    ↑ decision boundary
← support vectors: the points closest to the boundary

Support vectors are the training points that define and constrain the margin. Removing any other training point leaves the boundary unchanged. This keeps the trained model compact: the decision function depends only on the support vectors, not on the rest of the training set.
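The claim that only the support vectors matter can be checked directly. The sketch below is illustrative and assumes synthetic, well-separated data from make_blobs (the data and variable names are not from the original text): it fits a linear SVM, refits on the support vectors alone, and compares the two models.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative data, an assumption)
X, y = make_blobs(n_samples=100, centers=[(-3, -3), (3, 3)],
                  cluster_std=1.0, random_state=0)

# Fit on the full training set and record which points are support vectors
full = SVC(kernel='linear', C=1.0).fit(X, y)
sv_idx = full.support_  # indices of the support vectors within X

# Refit using only the support vectors
reduced = SVC(kernel='linear', C=1.0).fit(X[sv_idx], y[sv_idx])

print(f"Support vectors: {len(sv_idx)} of {len(X)} training points")
print("Full model    w, b:", full.coef_.ravel(), full.intercept_)
print("Reduced model w, b:", reduced.coef_.ravel(), reduced.intercept_)

# Both models should label every training point identically
same_predictions = np.array_equal(full.predict(X), reduced.predict(X))
print("Same predictions on all training points:", same_predictions)
```

The two sets of weights should agree up to the solver's numerical tolerance, even though the second model saw only a handful of points.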
Linear SVM
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# IMPORTANT: Always scale features before SVM
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC(kernel='linear', C=1.0, probability=True))
])

svm_pipeline.fit(X_train, y_train)

# decision_function returns the signed score w·x + b, which is proportional
# to the distance from the hyperplane (divide by ||w|| for the geometric distance)
distances = svm_pipeline.decision_function(X_test)
predictions = svm_pipeline.predict(X_test)

The C parameter controls the soft-margin trade-off:
- Small C: tolerates more margin violations → wider margin → stronger regularization (often generalizes better, but may underfit if C is too small)
- Large C: penalizes margin violations heavily → narrower margin → may overfit
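The trade-off above can be made concrete by measuring the margin for several values of C. This is a minimal sketch on assumed synthetic data (overlapping clusters from make_blobs, chosen so the soft margin matters); it relies on the fact that for a linear SVM the margin width equals 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping synthetic clusters (an assumption made for illustration)
X, y = make_blobs(n_samples=200, centers=[(-1, -1), (1, 1)],
                  cluster_std=1.2, random_state=0)

margin_widths = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # For a linear SVM the margin width is 2 / ||w||
    margin_widths[C] = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C={C:<6} margin width={margin_widths[C]:.3f} "
          f"support vectors={clf.n_support_.sum()}")
```

The margin should shrink as C grows, since a larger C makes each margin violation more expensive.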
The Kernel Trick
Linear SVMs can only draw straight-line boundaries. The kernel trick maps data to a higher-dimensional space where it becomes linearly separable — without explicitly computing the transformation.
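A small experiment can illustrate the idea. The sketch below assumes synthetic concentric-circle data from make_circles and compares three models: a linear SVM on the raw 2D features, a linear SVM after explicitly adding z = x² + y² as a third feature, and an RBF-kernel SVM on the raw features. Note that the RBF kernel corresponds to a different implicit feature space than this particular lift; it appears here only to show that a kernel can reach a comparable result without building extra features by hand.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line separates the classes in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# 1) Linear SVM on the raw 2D features
acc_linear_2d = SVC(kernel='linear').fit(X, y).score(X, y)

# 2) Explicit lift: append z = x^2 + y^2 as a third feature, then fit linearly
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
acc_linear_3d = SVC(kernel='linear').fit(X_lifted, y).score(X_lifted, y)

# 3) RBF kernel on the raw 2D features: no explicit lift is computed
acc_rbf_2d = SVC(kernel='rbf', gamma='scale').fit(X, y).score(X, y)

print(f"Linear SVM, raw 2D features:     {acc_linear_2d:.2f}")
print(f"Linear SVM, lifted 3D features:  {acc_linear_3d:.2f}")
print(f"RBF SVM, raw 2D features:        {acc_rbf_2d:.2f}")
```

These are training accuracies, used only to show separability; they say nothing about generalization.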
2D non-linearly separable → 3D linearly separable:
○ × ○ ○ × ○ ×  →  × × × ○ × ○ ○
(z = x² + y² added)
Common Kernels
# RBF (Radial Basis Function): the most commonly used kernel
# Works well when the decision boundary is complex or its shape is unknown
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')

# Polynomial: useful when interactions between features up to a fixed degree matter
svm_poly = SVC(kernel='poly', degree=3, C=1.0)

# Linear: best when data is high-dimensional (text) or linearly separable
svm_linear = SVC(kernel='linear', C=1.0)

# Sigmoid: rarely used; based on tanh, the activation of early neural networks
svm_sig = SVC(kernel='sigmoid', C=1.0)

The gamma parameter (used by the RBF, polynomial, and sigmoid kernels) sets the scale of the kernel. For RBF, it controls how far the influence of a single training point reaches.
- High gamma: each point influences only its immediate neighborhood → complex, tight boundary → risk of overfitting
- Low gamma: each point has far-reaching influence → smooth boundary → risk of underfitting
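The effect of gamma shows up as a gap between training and test accuracy. The following sketch assumes a synthetic two-moons dataset and three arbitrary gamma values picked to exaggerate the contrast; neither comes from the original text.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-moons data (illustrative assumption)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

scores = {}
for gamma in [0.01, 1.0, 1000.0]:
    clf = SVC(kernel='rbf', C=1.0, gamma=gamma).fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each gamma
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma:<7} train={scores[gamma][0]:.2f} "
          f"test={scores[gamma][1]:.2f}")
```

With a very high gamma the model tends to memorize the training points, so training accuracy rises while test accuracy falls behind.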
Hyperparameter Tuning
from sklearn.model_selection import GridSearchCV
# gamma is ignored by the linear kernel, so search it only for RBF
param_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 1, 10, 100]},
    {'svc__kernel': ['rbf'], 'svc__C': [0.1, 1, 10, 100],
     'svc__gamma': ['scale', 'auto', 0.001, 0.01, 0.1]},
]

grid_search = GridSearchCV(
    svm_pipeline,
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)
grid_search.fit(X_train, y_train)

print(f"Best params: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")

Support Vector Regression (SVR)
SVMs extend naturally to regression. Instead of maximizing the margin around a boundary, SVR fits a tube of half-width ε (total width 2ε) around the regression function and ignores errors that fall within the tube.
from sklearn.svm import SVR
svr = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1))
])

svr.fit(X_train, y_train)

ε (epsilon): the half-width of the tube. Points inside the tube contribute zero loss, which makes SVR insensitive to small noise.
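To see the tube at work, one can vary ε and count how many training points end up as support vectors. The sketch below assumes a synthetic noisy sine curve with a single unscaled feature (both are illustrative choices); points strictly inside the tube incur no loss and therefore do not become support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve (synthetic data, an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

n_support = {}
for epsilon in [0.01, 0.1, 0.5]:
    svr = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=epsilon).fit(X, y)
    # Count training points whose residual lies within the tube
    residuals = np.abs(y - svr.predict(X))
    inside = int((residuals <= epsilon).sum())
    n_support[epsilon] = len(svr.support_)
    print(f"epsilon={epsilon:<5} inside tube={inside:3d} "
          f"support vectors={n_support[epsilon]:3d}")
```

A wider tube covers more points, so fewer of them are needed to define the fitted function.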
When to Use SVMs
SVMs work well when:
- Dataset is small to medium (< 100k samples; training time scales between O(n²) and O(n³) in the number of samples)
- Features are well-engineered and informative
- Features are high-dimensional (text, gene expression)
- The data is somewhat linearly separable after feature engineering
Consider alternatives when:
- Dataset is very large (use LinearSVC or SGDClassifier instead)
- You need fast prediction (kernel SVM prediction cost grows with the number of support vectors; tree-based models are often faster)
- You need native probability estimates (SVM probabilities via Platt scaling are slow and sometimes inaccurate)
- You need interpretability (SVMs are harder to explain than trees)
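For the very-large-dataset case, SGDClassifier with hinge loss trains a linear SVM by stochastic gradient descent. The following is a minimal sketch on assumed synthetic data from make_classification; the dataset size and the hyperparameter values are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a large dataset (an assumption)
X, y = make_classification(n_samples=20000, n_features=20, n_informative=5,
                           n_clusters_per_class=1, class_sep=3.0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# loss='hinge' gives a linear SVM objective; alpha is the regularization
# strength and plays a role roughly inverse to C
sgd_svm = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss='hinge', alpha=1e-4, max_iter=1000, tol=1e-3,
                  random_state=0),
)
sgd_svm.fit(X_train, y_train)

test_accuracy = sgd_svm.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because SGD processes samples incrementally, SGDClassifier also offers partial_fit for data that does not fit in memory.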
LinearSVC for Large Datasets
For large datasets where a full kernel SVM is too slow, LinearSVC uses the liblinear solver, which scales much better with the number of samples than the libsvm solver behind SVC:
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# LinearSVC doesn't output probabilities natively
linear_svc = LinearSVC(C=1.0, max_iter=2000)
# Wrap with Platt scaling (method='sigmoid') for probabilities
calibrated = CalibratedClassifierCV(linear_svc, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)

SVMs remain competitive on small, well-preprocessed datasets, particularly in fields like bioinformatics and NLP, where they were long the state of the art before deep learning.