Machine Learning Fundamentals
Machine learning is the engine that powers generative AI. Before you can understand LLMs, diffusion models, or AI agents, you need a firm grasp of what ML actually is and how its three main paradigms differ. This guide gives you that foundation — without getting lost in the mathematics.
What Machine Learning Actually Is
Traditional programming looks like this:
Rules + Data → Output
Machine learning flips it:
Data + Output (labels) → Rules (the model)
Instead of writing explicit instructions, you feed examples to an algorithm and let it discover the patterns. Once trained, the model can apply those patterns to new, unseen data.
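As a minimal sketch of that flip, the snippet below puts a hand-written rule next to one whose threshold is picked from labeled examples. The feature (a count of exclamation marks), the toy dataset, and every function name are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: (number of exclamation marks, is_spam label).
examples = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]


def hand_written_rule(exclamations: int) -> bool:
    """Traditional programming: a human picks the threshold."""
    return exclamations > 3


def learn_threshold(data):
    """Machine learning in miniature: pick the threshold with the fewest errors."""
    candidates = sorted({x for x, _ in data})
    best_threshold, best_errors = None, len(data) + 1
    for t in candidates:
        # Count examples where the rule "x > t" disagrees with the label.
        errors = sum((x > t) != label for x, label in data)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


threshold = learn_threshold(examples)


def learned_rule(exclamations: int) -> bool:
    """The 'model': a rule whose parameter came from data, not from a human."""
    return exclamations > threshold
```

Both rules have the same shape. The only difference is where the threshold comes from, which is the point of the diagram above.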
That’s the whole idea. Everything else — neural networks, attention, RLHF — is a specific way of implementing that idea at different scales and for different tasks.
Supervised Learning
This is the most common type and the easiest to reason about. You have a dataset where every input comes paired with a correct answer (a label). The model trains by comparing its predictions to those labels and adjusting itself to minimize errors.
Training:
Input: Email text → Model → Prediction: "Spam"
Compare with true label: "Spam" ✓
Update weights if wrong
Classic examples:
- Email spam classification
- House price prediction from features
- Medical image diagnosis (tumor vs. no tumor)
- Sentiment analysis on reviews
How does it learn? Through a process called gradient descent. The model computes a loss (how wrong it was), calculates gradients (how the loss changes as each parameter changes), and takes a small step in the opposite direction, the one that reduces the loss. Repeat this many times, often millions for large models. In neural networks, the gradients themselves are computed with an algorithm called backpropagation.
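The loop described above can be sketched in a few lines of plain Python. This is an illustrative example, not a production trainer: the toy data (generated from y = 2x + 1), the mean-squared-error loss, the learning rate, and the function names are all assumptions made for the sketch.

```python
# Toy data generated from y = 2x + 1 (an assumption chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def loss_and_gradients(w, b):
    """Mean squared error and its partial derivatives with respect to w and b."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    return loss, grad_w, grad_b


def gradient_descent(learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0  # start from an arbitrary guess
    for _ in range(steps):
        _, grad_w, grad_b = loss_and_gradients(w, b)
        # Step against the gradient, because the gradient points uphill.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent()
final_loss, _, _ = loss_and_gradients(w, b)
```

With only two parameters the gradients can be written out by hand. In a deep network the same quantities are obtained layer by layer, which is the job backpropagation does.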
Two Flavours
| Type | Output | Example |
|---|---|---|
| Classification | Discrete category | Is this review positive or negative? |
| Regression | Continuous number | What will this stock be worth tomorrow? |
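To make the difference in output type concrete, here is a small sketch that uses one technique, k-nearest neighbours, for both flavours. The technique is swapped in purely for illustration, and the datasets and function names are hypothetical.

```python
from collections import Counter


def nearest_neighbours(train, query, k):
    """Return the targets of the k training points closest to the query."""
    by_distance = sorted(train, key=lambda pair: abs(pair[0] - query))
    return [target for _, target in by_distance[:k]]


def knn_classify(train, query, k=3):
    """Classification: majority vote over discrete labels."""
    votes = Counter(nearest_neighbours(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average of continuous targets."""
    targets = nearest_neighbours(train, query, k)
    return sum(targets) / len(targets)


# Hypothetical data: review star rating -> sentiment label.
reviews = [(1, "negative"), (2, "negative"), (3, "negative"), (4, "positive"), (5, "positive")]
# Hypothetical data: house size in square metres -> price in thousands.
houses = [(50, 150.0), (60, 180.0), (80, 240.0), (100, 300.0), (120, 360.0)]

label = knn_classify(reviews, 4.6)  # a category
price = knn_regress(houses, 70)     # a number
```

The neighbour search is shared. Only the last step changes: a vote produces a category, an average produces a number.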
Unsupervised Learning
Here, you give the model data but no labels. The model must find structure on its own — discovering clusters, patterns, and representations without being told what to look for.
Data (no labels) → Model → discovers → Groups / Patterns / Reduced dimensions
Why is this powerful? Labeling data is expensive and slow. Unsupervised techniques let you extract insights from raw, unlabeled data at scale.
Key techniques:
Clustering
Groups similar data points together. K-Means, DBSCAN, and hierarchical clustering are the workhorses. Useful for customer segmentation, anomaly detection, document grouping.
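For a feel of how K-Means works, the sketch below implements its usual assign-then-update loop (Lloyd's algorithm) in plain Python. The customer figures, the choice of starting centroids, and the function name are assumptions made for a small, reproducible example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical customers as (monthly spend, visits per month).
customers = [(900, 12), (950, 10), (50, 1), (40, 2), (400, 5), (450, 4)]
# Seed the centroids with the first, third, and fifth customers for a reproducible run.
centroids, assignments = kmeans(customers, [customers[0], customers[2], customers[4]])
```

No labels appear anywhere in the input. The groups come only from distances between points, so in practice features are usually scaled first to stop one large-valued column (here, spend) from dominating.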
Dataset of customers:
High spender, frequent → Cluster A (VIP)
Low spender, rare → Cluster B (Dormant)
Mid spender, occasional → Cluster C (Regular)
Dimensionality Reduction
Compresses high-dimensional data into fewer dimensions while preserving structure. PCA and t-SNE are classical; autoencoders and modern embedding models do this implicitly as part of deep learning.
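As a rough sketch of the idea behind PCA, the example below reduces 2-D points to a single coordinate along their direction of greatest variance. It finds that direction with power iteration, a simplification chosen to avoid external libraries. The data and function names are hypothetical.

```python
import math


def first_principal_component(points, iterations=100):
    """Find the direction of greatest variance in 2-D data via power iteration."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)

    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    vx, vy = 1.0, 0.0
    for _ in range(iterations):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered


def project(centered, direction):
    """Reduce each 2-D point to a single coordinate along the given direction."""
    dx, dy = direction
    return [x * dx + y * dy for x, y in centered]


# Hypothetical data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, centered = first_principal_component(data)
reduced = project(centered, direction)
```

Because the points sit near a line, one number per point keeps almost all of the structure. That is the sense in which the compression preserves structure.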
Generative Modelling
Learning the underlying distribution of data so you can sample from it — this is the bridge to generative AI. VAEs (Variational Autoencoders) and GANs live here.
Semi-Supervised and Self-Supervised Learning
Two variants worth knowing because they’re central to how modern AI works at scale:
Semi-supervised: A small amount of labeled data plus a large amount of unlabeled data. The model uses the unlabeled data to build better representations and the labeled data to steer those representations toward the actual task.
Self-supervised: The model creates its own labels from the data. For language models, this means predicting the next token (GPT) or a masked-out token (BERT). For vision models, it might mean predicting a masked patch. CLIP relies on a related idea: it learns to match images with the captions they already appear with on the web. In all three cases, no hand-annotated labels are needed for the pre-training phase.
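A small sketch of how such labels can be derived from raw text follows. It shows next-word pairs, plus a masked-token variant of the kind used in BERT-style pre-training. The whitespace tokenization, the `[MASK]` placeholder string, and the function names are simplifying assumptions, since real systems use subword tokenizers.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs with no human labels."""
    tokens = text.split()  # whitespace tokenization, a simplification
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]
        target = tokens[i]  # the "label" is simply the next word in the text
        pairs.append((context, target))
    return pairs


def masked_token_pair(text, mask_index, mask="[MASK]"):
    """Hide one token and use the hidden token as the target."""
    tokens = text.split()
    target = tokens[mask_index]
    masked = tokens[:mask_index] + [mask] + tokens[mask_index + 1:]
    return masked, target


sentence = "The transformer architecture was introduced in 2017"
pairs = next_token_pairs(sentence)
masked, hidden = masked_token_pair(sentence, mask_index=1)
```

Every target here is cut out of the input itself, which is why this kind of training can scale to any amount of raw text.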
Self-supervised language example:
Input: "The transformer architecture was introduced in ___"
Target: "2017" ← derived from the text itself, not a human annotator
Deep Learning
Deep learning is a subset of machine learning that uses neural networks with many layers (hence “deep”). It’s what makes modern generative AI possible.
Input → [Layer 1: simple features] → [Layer 2: more complex features] → ... → [Layer N: complex abstractions] → Output
In a deep network, each layer learns increasingly abstract representations of the input. For an image classifier:
- Layer 1 might detect edges
- Layer 3 might detect shapes
- Layer 7 might detect “wheel” or “face”
- Final layer makes the classification
This automatic feature learning is why deep learning beat traditional ML on so many tasks — you no longer need to hand-engineer features.
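To illustrate how stacked layers build one feature on top of another, here is a tiny two-layer network in plain Python. The weights are hand-picked rather than learned, an assumption made to keep the sketch short. The target function (XOR) is chosen because a single linear layer cannot represent it.

```python
def relu(values):
    """Non-linearity applied between layers."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output unit plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x):
    # Layer 1: two simple features of the raw input
    # ("at least one input is on" and "both inputs are on").
    hidden = relu(dense(x, weights=[[1.0, 1.0], [1.0, 1.0]], biases=[0.0, -1.0]))
    # Layer 2: combines those features into a more abstract one (XOR).
    output = dense(hidden, weights=[[1.0, -2.0]], biases=[0.0])
    return output[0]


results = {inputs: forward(list(inputs)) for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In a trained network the weights in both layers would be found by gradient descent. Nobody would write the intermediate features by hand.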
What You Need for Deep Learning
| Ingredient | Why It Matters |
|---|---|
| Large dataset | More data → better generalization |
| GPU compute | Matrix multiplications in parallel |
| Architecture choice | CNN for vision, Transformer for sequences |
| Optimization algorithm | Adam, SGD with momentum |
| Regularization | Dropout, weight decay to prevent overfitting |
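Two rows of the table, the optimizer and the regularizer, come together in the parameter update. The sketch below shows one common formulation of SGD with momentum and L2 weight decay. The function name, default values, and the toy objective f(p) = p² are assumptions for illustration, and deep learning frameworks differ in the exact details.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One SGD-with-momentum update, with L2 weight decay folded into the gradient."""
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p  # weight decay pulls parameters toward zero
        v = momentum * v + g      # momentum accumulates past gradients
        new_velocity.append(v)
        new_params.append(p - lr * v)
    return new_params, new_velocity


# Minimize the toy objective f(p) = p**2, whose gradient is 2 * p.
params, velocity = [1.0], [0.0]
for _ in range(300):
    grads = [2.0 * p for p in params]
    params, velocity = sgd_momentum_step(params, grads, velocity)
```

Adam builds on the same idea and additionally rescales each parameter's step using a running estimate of its squared gradients.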
Reinforcement Learning (Briefly)
Worth knowing because it’s central to aligning LLMs. An agent takes actions in an environment, receives rewards for good actions, and learns a policy that maximizes long-term reward.
Agent → Action → Environment → Reward/State → Update policy → (back to Agent)
In LLM alignment (RLHF), human raters score model outputs. Those scores train a reward model, which then guides further fine-tuning of the LLM via RL. The result: a model that generates responses humans actually prefer.
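The act, observe, update loop can be sketched with the simplest RL setting, a multi-armed bandit, which has actions and rewards but no state. This is not RLHF. It is only an illustration of learning from reward, and the reward means, noise level, epsilon value, and function name are all assumptions.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning which action yields the highest average reward."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # the agent's current value estimates
    counts = [0] * len(true_means)
    for _ in range(steps):
        # Policy: mostly exploit the best-looking action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_means))
        else:
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # Environment: returns a noisy reward for the chosen action.
        reward = rng.gauss(true_means[action], 0.1)
        # Update: move the estimate toward the observed reward (running average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit(true_means=[0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told which action is correct. It only sees rewards, and over time its choices shift toward the action that pays best.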
Where ML Sits in the AI Stack (2026)
┌──────────────────────────────┐
│ Agentic AI / Copilots        │ ← Applications
├──────────────────────────────┤
│ LLMs / Diffusion Models      │ ← Generative AI
├──────────────────────────────┤
│ Deep Learning                │ ← Architecture
├──────────────────────────────┤
│ Machine Learning             │ ← Learning paradigm
├──────────────────────────────┤
│ Statistics & Linear Algebra  │ ← Mathematics
└──────────────────────────────┘
Every layer depends on the one below. You don’t need to master all of them, but understanding at least two or three levels makes you dramatically more effective when working with AI systems.
Practical Takeaway
If you’re building or using AI systems in 2026, here’s what actually matters day-to-day:
- Supervised learning is how most production ML models are built and evaluated
- Self-supervised pre-training is how the large foundation models you interact with were created
- RL/RLHF is how those foundation models are aligned to human preferences
- Unsupervised clustering and embeddings are how you search, organize, and retrieve knowledge in RAG systems
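As a sketch of the retrieval step in the last point, the example below ranks stored documents by cosine similarity to a query embedding. The three-dimensional vectors are made up for illustration. A real RAG system would obtain embeddings from an embedding model and typically query a vector index instead of a dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, index, top_k=2):
    """Rank stored documents by cosine similarity to the query embedding."""
    scored = [(cosine_similarity(query_vector, vec), doc) for doc, vec in index.items()]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Hypothetical 3-dimensional embeddings keyed by document title.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.1, 0.9],
}
query = [0.8, 0.3, 0.0]  # stands in for the embedding of a question about refunds
results = retrieve(query, index)
```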
You don’t need to implement backpropagation from scratch to be effective. But knowing which paradigm applies to which problem will save you weeks of misdirection.