AWS Certified AI Practitioner (AIF-C01) · Step 2 of 5 · updated 2026

Step 2 — Generative AI & Foundation Models

Ask ten people what a “foundation model” actually is and you’ll get ten vague answers involving the word “big.” The exam wants precision, not vibes. This step pulls apart the machinery — what these models are built from, how you steer them, and where they quietly fail you if you’re not careful.


What Makes a Model a “Foundation Model”

A foundation model is a large model trained on a broad, diverse swath of data — text, code, images, or a mix — such that it develops general-purpose capabilities rather than being built for one narrow task. The defining trait isn’t size alone; it’s that one trained model can be adapted to many downstream jobs: summarizing a document today, drafting an email tomorrow, classifying support tickets the day after, all without retraining from scratch.

Compare that to the older, narrow-ML approach:

TRADITIONAL ML                     FOUNDATION MODEL APPROACH
────────────────────────           ────────────────────────────
Train model A → spam filter        Train ONE large model on
Train model B → sentiment score    massive general data
Train model C → topic classifier                  │
Train model D → translation           ┌───────────┼───────────┐
                                      ▼           ▼           ▼
Each model: separate               Prompt     Fine-tune  RAG-augment
training run, separate             for spam   for legal  for company
dataset, narrow skill              detection  docs Q&A   knowledge base

That single foundation model can be steered toward wildly different jobs depending on how you adapt it. That brings us to the levers the exam expects you to tell apart.


Pretraining, Fine-Tuning, and Prompting — Three Different Levers

Pretraining happens once, by the model provider, on enormous datasets, at enormous cost. This is where the model learns grammar, facts, reasoning patterns, and coding syntax. As someone building on AWS, you almost never do this yourself — you consume an already-pretrained model through Bedrock or SageMaker JumpStart.

Fine-tuning takes a pretrained model and continues training it on a smaller, task-specific or domain-specific dataset, adjusting the model’s weights so it performs better on that narrower job — say, a customer service tone specific to your company, or terminology specific to your industry. It costs more than prompting and takes more time, but it can bake behavior in more durably.
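The Python sketch below makes "adjusting the model's weights" concrete, and it is deliberately tiny. A two-parameter linear model stands in for a foundation model's billions of weights. The "pretrained" values and the domain dataset are invented. Plain gradient descent on mean squared error is an illustrative choice, not a description of how any particular provider fine-tunes. The mechanism is what carries over: training resumes from existing weights instead of starting from scratch.

```python
# Toy illustration only: two numbers (w, b) stand in for a foundation model's
# weights. Fine-tuning = keep running gradient descent from the pretrained
# values on a small, task-specific dataset.

def predict(w: float, b: float, x: float) -> float:
    return w * x + b

def fine_tune(w: float, b: float, data: list[tuple[float, float]],
              lr: float = 0.05, epochs: int = 500) -> tuple[float, float]:
    """Continue training (w, b) with gradient descent on mean squared error."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical "pretrained" weights, assumed to come from general-purpose training.
pretrained_w, pretrained_b = 1.0, 0.0

# Small, invented domain dataset whose true relationship is y = 2x + 1.
domain_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, domain_data)
print(f"before: w={pretrained_w:.2f}, b={pretrained_b:.2f}")
print(f"after:  w={tuned_w:.2f}, b={tuned_b:.2f}")
```

The weights move from (1, 0) to approximately (2, 1), which is the pattern in the small domain dataset. Prompting would leave both numbers exactly where they started.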

Prompting is the cheapest and fastest lever: you don’t change the model’s weights at all, you just craft the input text carefully to get the output you want. Techniques like zero-shot (just ask), few-shot (show examples in the prompt), and chain-of-thought (ask the model to reason step by step) all fall under this umbrella.
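The three prompting techniques differ only in how the input text is assembled, so a few string-building helpers are enough to show them. Treat this as a sketch. The helper names and the wording of each template are assumptions, not a required format. Sending the prompts to a model (for example, through Amazon Bedrock) is deliberately left out.

```python
# Prompting never touches model weights; it only shapes the input text.
# Helper names and template wording are hypothetical, for illustration.

def zero_shot(task: str) -> str:
    """Zero-shot: state the task and nothing else."""
    return task

def few_shot(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Few-shot: show worked input/output pairs before the real input."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nOutput:"

def chain_of_thought(task: str) -> str:
    """Chain-of-thought: ask the model to reason step by step before answering."""
    return f"{task}\nWork through the problem step by step, then state the final answer."

examples = [
    ("Win a free cruise now!!!", "spam"),
    ("Agenda for Tuesday's meeting attached", "not spam"),
]

print(zero_shot("Classify this email as spam or not spam: 'Claim your prize today'"))
print(few_shot("Classify each email as spam or not spam.", examples, "Claim your prize today"))
print(chain_of_thought("A store sells pens at 3 for $2. How much do 12 pens cost?"))
```

The few-shot version ends with an open `Output:` line. That nudges the model to continue the pattern the examples established.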

Approach                              Changes Model Weights?  Cost              Speed to Implement  Best For
────────────────────────────────────  ──────────────────────  ────────────────  ──────────────────  ──────────────────────────────────────────────────────────
Pretraining                           Yes, from scratch       Very high         Months              Building a brand-new foundation model (rare for most orgs)
Fine-tuning                           Yes, incremental        Moderate to high  Days to weeks       Domain-specific tone, terminology, or task specialization
Prompt engineering                    No                      Low               Minutes             Quick iteration, general-purpose tasks
RAG (retrieval-augmented generation)  No                      Low to moderate   Hours to days       Grounding answers in current, private, or factual data

Notice RAG sits in that table too — it’s not fine-tuning, and the exam loves to test that distinction. RAG doesn’t change the model at all; it changes what information the model sees at the moment of answering, by retrieving relevant documents and stuffing them into the prompt.
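Here is a minimal Python sketch of that flow, with two loud assumptions. The knowledge base is three invented sentences. Retrieval is naive word overlap, standing in for the embedding-based vector search a real RAG system would use. Notice that nothing about the model changes; only the prompt gets longer.

```python
# Minimal RAG sketch: the model is never modified; only the prompt changes.
# Retrieval is naive word overlap, a stand-in for real vector search.
# All names and documents here are hypothetical.
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

def build_rag_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Stuff the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return ("Answer using only the context below. If the answer is not there, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

knowledge_base = [
    "Refunds are issued within 14 days of receiving a returned item.",
    "Our office is closed on public holidays.",
    "Returned items must be unused and in original packaging.",
]

print(build_rag_prompt("How many days until my refund for a returned item?", knowledge_base))
```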


Tokens and Context Windows, Without the Math

A model doesn’t read text the way you do. It breaks input into tokens: chunks that might be a whole word, part of a word, or punctuation. As a rough rule of thumb for English, a token averages about three-quarters of a word (roughly four characters), though the exact ratio depends on the model’s tokenizer. “Understanding” might split into two tokens; “cat” is probably one.
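The "one word, several tokens" idea can be sketched with a toy greedy longest-match tokenizer. The vocabulary here is tiny and made up. Real tokenizers (byte-pair encoding, SentencePiece, and similar) learn vocabularies of tens of thousands of pieces or more, and will not necessarily split these words the same way.

```python
# Toy greedy longest-match subword tokenizer with a made-up vocabulary.
# Illustrative only: not the tokenizer of any real model.

VOCAB = {"cat", "understand", "ing", "un", "der", "stand", "s", " "}

def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    tokens, i = [], 0
    longest = max(len(v) for v in vocab)
    while i < len(text):
        # Try the longest vocabulary piece that matches at position i.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("understanding"))  # ['understand', 'ing']
print(tokenize("cat"))            # ['cat']
```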

The context window is the maximum number of tokens a model can consider at once. It counts both what you feed in (the prompt, any retrieved documents, conversation history) and what the model generates back. Run out of room and something has to give: depending on the application, older content gets truncated or dropped, or the request is rejected.

CONTEXT WINDOW (fixed capacity, measured in tokens)
┌──────────┬──────────────┬───────────────┬──────────┐
│ System   │ Retrieved    │ Conversation  │ Model    │
│ Prompt   │ Documents    │ History       │ Output   │
│ (rules)  │ (RAG chunks) │ (prior turns) │ (reply)  │
└──────────┴──────────────┴───────────────┴──────────┘
▲                                                    ▲
└─── everything above must fit inside one window ────┘
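Fitting a request into the window is ultimately arithmetic, as this Python sketch shows. Two assumptions to flag. First, token counts come from a rough heuristic of about 0.75 words per token rather than a real tokenizer. Second, dropping the oldest conversation turns first is one common policy among several; summarizing old turns is another. The function names, window size, and messages are all hypothetical.

```python
# Sketch: fit system prompt + retrieved docs + history + reserved reply into
# a fixed context window. Token counts use a rough heuristic (an assumption);
# production code would use the model's own tokenizer.
import math

def estimate_tokens(text: str) -> int:
    """Approximate tokens as words / 0.75 (heuristic, not a real tokenizer)."""
    return math.ceil(len(text.split()) / 0.75)

def fit_to_window(system: str, retrieved: list[str], history: list[str],
                  window: int, reserved_output: int) -> list[str]:
    """Drop the oldest conversation turns until everything fits, leaving room for the reply."""
    fixed = estimate_tokens(system) + sum(estimate_tokens(d) for d in retrieved)
    budget = window - reserved_output - fixed
    if budget < 0:
        raise ValueError("System prompt and retrieved documents alone exceed the window")
    kept = list(history)
    while kept and sum(estimate_tokens(t) for t in kept) > budget:
        kept.pop(0)  # the oldest turn is dropped first
    return kept

system = "You are a helpful support assistant."                  # 6 words
retrieved = ["word " * 15]                                         # 15 words
history = [f"turn{n} " + "word " * 14 for n in range(1, 4)]        # 15 words each
kept = fit_to_window(system, retrieved, history, window=100, reserved_output=30)
print([t.split()[0] for t in kept])  # ['turn2', 'turn3']
```

Here the window is 100 tokens, with 30 reserved for the reply. The system prompt (8 estimated tokens) and the retrieved document (20) leave 42 tokens for history. So the oldest 20-token turn is dropped and the two most recent turns survive.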

By 2026, context windows on frontier models have grown large enough to hold entire books or codebases in a single pass, which changes the design conversation: instead of always chunking documents into tiny fragments, teams can sometimes feed much larger source material directly. But bigger context windows cost more per request and can still suffer from the model paying less attention to content buried in the middle of a very long prompt — so retrieval and summarization remain relevant skills, not obsolete ones.


Embeddings and Vector Search, Conceptually

An embedding is a numeric representation of text (or an image, or audio) — a list of numbers, a vector, positioned in a high-dimensional space such that semantically similar items end up near each other. “Puppy” and “dog” land close together; “puppy” and “spreadsheet” land far apart.
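Nearness between embeddings is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below computes it in plain Python on hand-made three-dimensional vectors. Those numbers are invented for illustration; real embedding models output hundreds or thousands of learned dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means same direction; values near 0 mean unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT output from a real embedding model.
toy_embeddings = {
    "puppy": [0.90, 0.80, 0.10],
    "dog": [0.85, 0.75, 0.20],
    "spreadsheet": [0.05, 0.10, 0.95],
}

print(round(cosine_similarity(toy_embeddings["puppy"], toy_embeddings["dog"]), 3))
print(round(cosine_similarity(toy_embeddings["puppy"], toy_embeddings["spreadsheet"]), 3))
```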

This is the trick behind semantic search: instead of matching exact keywords, you convert a search query into an embedding and find the stored documents whose embeddings are nearest to it — even if they don’t share a single word in common.

           "How do I reset my password?"
                         │
                [ Embed the query ]
                         │
         Vector: [0.12, -0.87, 0.44, ...]
                         │
       ┌─────────────────┼────────────────┐
       ▼                 ▼                ▼
Doc: "Account    Doc: "Login       Doc: "Shipping
recovery steps"  troubleshooting"  policy FAQ"
distance: 0.04   distance: 0.09    distance: 0.91
       │                 │
       └────────┬────────┘
                ▼
Closest matches returned to the model as context
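The retrieval step in that diagram can be sketched as an exact nearest-neighbor search. It uses cosine distance (1 minus cosine similarity), so a smaller number means a closer match. The vectors are invented stand-ins for an embedding model's output, and the function and variable names are hypothetical.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means identical direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact (brute-force) nearest-neighbor search: score every stored vector."""
    return sorted(index, key=lambda doc: cosine_distance(query_vec, index[doc]))[:k]

# Invented vectors standing in for an embedding model's output.
index = {
    "Account recovery steps": [0.80, 0.55, 0.05],
    "Login troubleshooting": [0.70, 0.65, 0.10],
    "Shipping policy FAQ": [0.05, 0.10, 0.99],
}
query = [0.78, 0.60, 0.04]  # stands in for the embedding of "How do I reset my password?"

print(nearest(query, index))
```

This brute-force version scores every stored vector. That works for three documents but not for millions. Production vector indexes typically use approximate nearest-neighbor algorithms such as HNSW, which give up a little accuracy for much faster lookups.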

A vector database (or vector index) stores millions of these embeddings and can retrieve the nearest neighbors fast. On AWS, this capability shows up in several places: inside Amazon Bedrock Knowledge Bases, in OpenSearch Service’s vector engine, and in vector support added to several managed database services, such as the pgvector extension on Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL. The exam wants you to recognize the pattern conceptually more than memorize every product name that supports it.


What Generative AI Is Actually Good At


Where It Falls Down — Know These Cold

The exam will absolutely test your understanding of generative AI’s limitations, because deploying it responsibly means knowing where it can quietly mislead you.

Hallucination — The model generates plausible-sounding but factually incorrect content, stated with the same confidence as correct content. It isn’t “lying” — it’s predicting likely-sounding text, and likely-sounding isn’t the same as true. This is precisely why RAG and grounding in verified data matter so much for factual use cases.

Bias — Models learn from training data, and training data reflects the biases present in the world and in whoever curated it. A model can reproduce and even amplify stereotypes present in its training corpus unless deliberately mitigated.

Non-determinism — Ask the same model the same question twice and you may get two differently worded (sometimes substantively different) answers, especially with non-zero “temperature” settings that inject randomness into generation. This matters for testing, auditing, and any workflow that assumes reproducibility.
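Temperature is easiest to see in a sketch of next-token sampling. The model's raw scores (logits) are divided by the temperature, and a softmax then turns them into probabilities. Low temperatures concentrate probability on the top token, high temperatures spread it out, and temperature 0 is conventionally treated as greedy decoding. The tokens and scores below are invented. Real services also expose other sampling settings, and even at temperature 0 they are not always perfectly reproducible.

```python
import math
import random

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens, higher flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens: list[str], logits: list[float],
                      temperature: float, rng: random.Random) -> str:
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token.
        return tokens[logits.index(max(logits))]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["Paris", "London", "Rome"]
logits = [3.0, 1.0, 0.5]  # made-up scores for the next token

for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])

rng = random.Random(42)
print([sample_next_token(tokens, logits, 0, rng) for _ in range(5)])    # always "Paris"
print([sample_next_token(tokens, logits, 1.5, rng) for _ in range(5)])  # other tokens can appear
```

The printed distributions show the mass on "Paris" shrinking as temperature rises. That growing spread is why repeated calls at non-zero temperature can return different answers.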

Lack of true understanding — These models predict statistically likely next tokens; they don’t “reason” the way a human does, even when their output reads like careful reasoning. That’s worth sitting with — output fluency is not proof of correctness.


Exam Focus: What Questions Test From This Step