Machine Learning Fundamentals

Machine learning is the engine that powers generative AI. Before you can understand LLMs, diffusion models, or AI agents, you need a firm grasp of what ML actually is and how its three main paradigms differ. This guide gives you that foundation — without getting lost in the mathematics.

What Machine Learning Actually Is

Traditional programming looks like this:

Rules + Data → Output

Machine learning flips it:

Data + Output (labels) → Rules (the model)

Instead of writing explicit instructions, you feed examples to an algorithm and let it discover the patterns. Once trained, the model can apply those patterns to new, unseen data.

That’s the whole idea. Everything else — neural networks, attention, RLHF — is a specific way of implementing that idea at different scales and for different tasks.
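Here is a deliberately tiny Python sketch of that flip. It uses a hypothetical spam feature (the number of links in an email) and made-up labeled data. The first function is a hand-written rule. The second "learns" its threshold by picking whichever value misclassifies the fewest labeled examples. All names and data are illustrative assumptions. Real models learn far more than a single threshold, but the division of labour is the same.

```python
# Traditional programming: a person writes the rule by hand.
def is_spam_by_rule(num_links: int) -> bool:
    return num_links > 5  # threshold picked by a human


# Machine learning (toy version): learn the threshold from labeled examples.
def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Return the threshold t for `num_links > t` that misclassifies the fewest examples."""
    candidates = sorted({num_links for num_links, _ in examples})
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        errors = sum((num_links > t) != is_spam for num_links, is_spam in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labeled data: (number of links in the email, is it spam?)
training_data = [(0, False), (1, False), (2, False), (3, False),
                 (7, True), (9, True), (12, True)]

learned_t = learn_threshold(training_data)


def is_spam_learned(num_links: int) -> bool:
    return num_links > learned_t  # threshold came from the data, not a person
```

On this data the learned threshold is 3, so an email with 5 links is flagged by the learned rule but not by the hand-written one.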

Supervised Learning

This is the most common type and the easiest to reason about. You have a dataset where every input comes paired with a correct answer (a label). The model trains by comparing its predictions to those labels and adjusting itself to minimize errors.

Training:
Input: Email text  →  Model  →  Prediction: "Spam"
                       ↑
                  Compare with true label: "Spam" ✓
                  Update weights to reduce the loss

Classic examples:

Email spam classification
House price prediction from features
Medical image diagnosis (tumor vs. no tumor)
Sentiment analysis on reviews

How does it learn? Through an optimization process called gradient descent. The model computes a loss (a number measuring how wrong its predictions were), calculates gradients (which direction to nudge each parameter to reduce the loss), and takes a small step in that direction. This repeats over many batches of training data, often thousands to millions of times. In neural networks, the gradients themselves are computed efficiently by an algorithm called backpropagation.
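The loop just described can be sketched in plain Python for the simplest possible model, a straight line y ≈ w·x + b, using mean squared error as the loss. This is a minimal illustration on made-up data; the learning rate and step count are assumptions chosen so it converges. The gradients are derived by hand here; in a neural network, backpropagation computes them automatically.

```python
def train_linear_model(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w * x + b with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start from arbitrary parameters
    n = len(xs)
    loss = float("inf")
    for _ in range(steps):
        preds = [w * x + b for x in xs]                           # predict
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n                     # how wrong?
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n   # d(loss)/dw
        grad_b = sum(2 * e for e in errors) / n                   # d(loss)/db
        w -= learning_rate * grad_w                               # small step downhill
        b -= learning_rate * grad_b
    return w, b, loss


# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, final_loss = train_linear_model(xs, ys)
```

After training, w and b end up very close to 2 and 1, the values that generated the data.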

Two Flavours

Type	Output	Example
Classification	Discrete category	Is this review positive or negative?
Regression	Continuous number	What will this stock be worth tomorrow?
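The difference between the two flavours shows up directly in the loss function being minimized. Below is a sketch of two common defaults: mean squared error for regression and binary cross-entropy for two-class classification. They are typical choices, not the only ones, and the example numbers are hypothetical.

```python
import math


def mean_squared_error(predictions, targets):
    """Regression loss: average squared gap between predicted and true numbers."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Classification loss. `probabilities` are the model's P(label == 1);
    `labels` are 0 or 1. Confident wrong answers are penalized heavily."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Regression: predicted vs. actual prices (in $1000s)
mse = mean_squared_error([250.0, 310.0], [240.0, 300.0])
# Classification: predicted P(positive) for a positive and a negative review
bce = binary_cross_entropy([0.9, 0.2], [1, 0])
```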

Unsupervised Learning

Here, you give the model data but no labels. The model must find structure on its own — discovering clusters, patterns, and representations without being told what to look for.

Data → Model → discovers → Groups / Patterns / Reduced dimensions
(no labels)

Why is this powerful? Labeling data is expensive and slow. Unsupervised techniques let you extract insights from raw, unlabeled data at scale.

Key techniques:

Clustering

Groups similar data points together. K-Means, DBSCAN, and hierarchical clustering are the workhorses. Useful for customer segmentation, anomaly detection, and document grouping.

Dataset of customers:
High spender, frequent  → Cluster A (VIP)
Low spender, rare       → Cluster B (Dormant)
Mid spender, occasional → Cluster C (Regular)
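A minimal K-Means in plain Python, run on hypothetical customer data with two features (annual spend in $1000s, visits per month), might look like this. For reproducibility the sketch initializes centroids with the first k points; libraries usually use smarter schemes such as k-means++, and real pipelines normally scale features first.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic init for the sketch

    def nearest(p):
        # Index of the centroid closest to point p (Euclidean distance)
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid
        assignments = [nearest(p) for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical customers: (annual spend in $1000s, visits per month)
customers = [(9.0, 12), (0.5, 1), (4.0, 5),
             (8.5, 10), (0.8, 0), (4.5, 4),
             (9.5, 11), (0.3, 1), (3.8, 6)]
labels, centres = kmeans(customers, k=3)
```

Nobody told the algorithm which customers are "VIP" or "Dormant"; it only groups them, and a human names the clusters afterwards.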

Dimensionality Reduction

Compresses high-dimensional data into fewer dimensions while preserving structure. PCA and t-SNE are classical; autoencoders and modern embedding models do this implicitly as part of deep learning.
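As an illustration, PCA can be sketched with NumPy's singular value decomposition. The data here is synthetic and hypothetical: 3-D points that really vary along one direction plus a little noise, so a single principal component should capture almost all of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto their top principal components."""
    X_centered = X - X.mean(axis=0)            # PCA works on mean-centered data
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]             # rows = directions of largest variance
    variance = S**2 / (len(X) - 1)             # variance captured along each direction
    return X_centered @ components.T, variance[:n_components]


# Hypothetical 3-D data that varies along a single direction, plus small noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

X_reduced, captured = pca(X, n_components=1)   # 3 dimensions -> 1
```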

Generative Modelling

Learning the underlying distribution of data so you can sample from it — this is the bridge to generative AI. VAEs (Variational Autoencoders) and GANs live here.
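The core idea, learning a distribution and then sampling from it, fits in a few lines if you assume the data is normally distributed. This is the smallest possible generative model, not a VAE or GAN: those replace the single Gaussian with neural networks flexible enough to model images or text. The height data below is hypothetical.

```python
import random
import statistics

# Hypothetical training data: heights in centimetres
heights = [162.0, 170.5, 168.2, 175.1, 159.8, 181.3, 166.7, 172.4]

# "Training": estimate the parameters of a normal (Gaussian) distribution
mu = statistics.fmean(heights)
sigma = statistics.stdev(heights)


def generate(n, seed=None):
    """'Generation': draw new values from the learned distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


new_heights = generate(5, seed=42)  # plausible heights that were never in the data
```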

Semi-Supervised and Self-Supervised Learning

Two variants worth knowing because they’re central to how modern AI works at scale:

Semi-supervised: A small amount of labeled data plus a large amount of unlabeled data. The model uses the unlabeled data to learn better representations and the labeled data to steer those representations toward the actual task.

Self-supervised: The model creates its own labels from the data. For language models, this means predicting the next word (as in GPT) or a masked-out word (as in BERT). For vision models, it might mean predicting a masked image patch; CLIP instead learns to match images with the captions they appeared alongside on the web. None of these pre-training objectives requires human-annotated labels.

Self-supervised language example:
Input:  "The transformer architecture was introduced in ___"
Target: "2017"  ← derived from the text itself, not a human annotator
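Turning raw text into training pairs like the one above takes only a few lines. This sketch splits on whitespace and uses a fixed three-word context; both are simplifying assumptions, since real LLMs operate on subword tokens and condition on the entire preceding context.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs with no human labels."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = " ".join(words[i - context_size:i])  # the words the model sees
        target = words[i]                               # the word it must predict
        pairs.append((context, target))
    return pairs


corpus = "The transformer architecture was introduced in 2017"
pairs = next_word_pairs(corpus)
for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```

Every sentence in a corpus yields pairs like these for free, which is why self-supervised pre-training scales to internet-sized datasets.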

Deep Learning

Deep learning is a subset of machine learning that uses neural networks with many layers (hence “deep”). It’s what makes modern generative AI possible.

Input → [Layer 1] → [Layer 2] → ... → [Layer N] → Output
         Simple        More             Complex
         features    complex features   abstractions

In a deep network, each layer learns increasingly abstract representations of the input. For an image classifier:

Layer 1 might detect edges
Layer 3 might detect shapes
Layer 7 might detect “wheel” or “face”
Final layer makes the classification

This automatic feature learning is why deep learning beat traditional ML on so many tasks — you no longer need to hand-engineer features.
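The layer stacking in the diagram above can be sketched as a NumPy forward pass: each layer applies a linear map and a nonlinearity, and its output becomes the next layer's input. The layer sizes, ReLU activation, and softmax output are illustrative assumptions, and the weights are random and untrained; training would adjust them with gradient descent and backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(0.0, x)


# Hypothetical layer sizes: 8 input features -> 16 -> 16 -> 3 output classes
layer_sizes = [8, 16, 16, 3]
weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]


def forward(x):
    """Pass the input through each layer in turn."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)                  # hidden layer: linear map + nonlinearity
    logits = x @ weights[-1] + biases[-1]    # final layer: one score per class
    # Softmax turns the scores into class probabilities
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)


batch = rng.normal(size=(4, 8))  # 4 hypothetical examples
probs = forward(batch)           # shape (4, 3)
```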

What You Need for Deep Learning

Ingredient	Why It Matters
Large dataset	More data → better generalization
GPU compute	Matrix multiplications in parallel
Architecture choice	CNN for vision, Transformer for sequences
Optimization algorithm	Adam, SGD with momentum
Regularization	Dropout, weight decay to prevent overfitting
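The two regularizers in the last row can each be sketched in a few lines of NumPy. This assumes inverted dropout (survivors are scaled by 1/(1−p) during training, the convention used by common frameworks such as PyTorch) and weight decay applied as an L2 term inside a plain SGD update. The function names and hyperparameters are hypothetical.

```python
import numpy as np


def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training and
    scale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return activations  # at inference time dropout does nothing
    keep_mask = rng.random(activations.shape) >= p  # True with probability 1-p
    return activations * keep_mask / (1.0 - p)


def sgd_step_with_weight_decay(w, grad, lr=0.1, weight_decay=0.01):
    """Plain SGD update plus an L2 weight-decay term that pulls weights toward zero."""
    return w - lr * (grad + weight_decay * w)


rng = np.random.default_rng(0)
hidden = np.ones(100_000)
dropped = dropout(hidden, p=0.5, training=True, rng=rng)   # values are 0.0 or 2.0
w_new = sgd_step_with_weight_decay(np.array([1.0]), np.array([0.0]))
```

Even with a zero gradient, weight decay shrinks the weight slightly (1.0 becomes 0.999), which discourages the large weights associated with overfitting.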

Reinforcement Learning (Briefly)

Worth knowing because it’s central to aligning LLMs. An agent takes actions in an environment, receives rewards for good actions, and learns a policy that maximizes long-term reward.

Agent → Action → Environment → Reward/State
  ↑                                   |
  └───────────────────────────────────┘
           Update policy
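A multi-armed bandit is the simplest setting that still has this loop: the agent picks an action, the environment returns a reward, and the agent updates what it believes about each action. It is a simplification (there is no changing state), the payout probabilities are hypothetical, and epsilon-greedy is just one common exploration strategy; it is not the algorithm used in RLHF.

```python
import random


def run_bandit(true_payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning which action pays off most often."""
    rng = random.Random(seed)
    n_actions = len(true_payout_probs)
    value_estimates = [0.0] * n_actions  # the agent's learned value of each action
    counts = [0] * n_actions
    for _ in range(steps):
        # Policy: usually exploit the best-looking action, sometimes explore at random
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: value_estimates[a])
        # Environment: reward of 1 with the action's (hidden) payout probability
        reward = 1.0 if rng.random() < true_payout_probs[action] else 0.0
        # Update: incremental average of the rewards seen for this action
        counts[action] += 1
        value_estimates[action] += (reward - value_estimates[action]) / counts[action]
    return value_estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
```

Over time the agent spends most of its actions on the third option, because that is where its reward estimates are highest.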

In LLM alignment (RLHF), human raters compare or rank model outputs. Those judgments train a reward model, which then guides further fine-tuning of the LLM with an RL algorithm such as PPO. The result is a model whose responses humans tend to prefer.
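The reward model is commonly trained on pairs of responses with a pairwise loss of the form −log σ(r_chosen − r_rejected), as described for OpenAI's InstructGPT. The sketch below computes that loss for hypothetical reward scores; in practice the scores come from a neural network that reads the prompt and response.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(margin)): small when the preferred response scores higher."""
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms, chosen per sign to avoid overflow in exp()
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


loss_agrees = pairwise_preference_loss(2.0, 0.0)     # model agrees with the rater
loss_disagrees = pairwise_preference_loss(0.0, 2.0)  # model prefers the rejected answer
```

Minimizing this loss pushes the reward model to score preferred responses above rejected ones, which is exactly the signal the RL step later optimizes.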

Where ML Sits in the AI Stack (2026)

                    ┌──────────────────────────────┐
                    │   Agentic AI / Copilots       │  ← Applications
                    ├──────────────────────────────┤
                    │   LLMs / Diffusion Models     │  ← Generative AI
                    ├──────────────────────────────┤
                    │   Deep Learning               │  ← Architecture
                    ├──────────────────────────────┤
                    │   Machine Learning            │  ← Learning paradigm
                    ├──────────────────────────────┤
                    │   Statistics & Linear Algebra │  ← Mathematics
                    └──────────────────────────────┘

Every layer depends on the one below. You don’t need to master all of them, but understanding at least two or three levels makes you dramatically more effective when working with AI systems.

Practical Takeaway

If you’re building or using AI systems in 2026, here’s what actually matters day-to-day:

Supervised learning is how most production ML models are built and evaluated
Self-supervised pre-training is how the large foundation models you interact with were created
RL/RLHF is how those foundation models are aligned with human preferences
Unsupervised clustering and embeddings are how you search, organize, and retrieve knowledge in RAG systems
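The retrieval step in a RAG system usually reduces to ranking stored embeddings by similarity to the query's embedding. Here is a NumPy sketch using cosine similarity; the 3-dimensional vectors are made up, whereas real embeddings come from an embedding model and typically have hundreds of dimensions or more. Production systems also use approximate nearest-neighbour indexes instead of scoring every document.

```python
import numpy as np


def top_k_by_cosine(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity between their embeddings and the query's."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                  # one cosine similarity per document
    top = np.argsort(-scores)[:k]   # highest similarity first
    return top, scores[top]


# Hypothetical embeddings for four documents
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],  # doc 0
    [0.1, 0.9, 0.0],  # doc 1
    [0.8, 0.2, 0.1],  # doc 2
    [0.0, 0.1, 0.9],  # doc 3
])
query_embedding = np.array([1.0, 0.0, 0.0])

top_docs, top_scores = top_k_by_cosine(query_embedding, doc_embeddings, k=2)
```

The retrieved documents (here 0 and 2) are what gets pasted into the LLM's prompt as context.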

You don’t need to implement backpropagation from scratch to be effective. But knowing which paradigm applies to which problem will save you weeks of misdirection.

Written by NPBlue AI Team, AI/ML engineers who build and ship production GenAI systems, not just demo notebooks.

Reviewed for technical accuracy. Spot an error? Let us know.