Step 1: AI/ML Fundamentals
Most people studying for AIF-C01 have used ChatGPT or Alexa, but few can explain the layers underneath them. That gap is exactly what this exam probes first: not your ability to write code, but whether you understand what these systems are, how they learn, and which AWS service handles which job. Let's build that foundation properly.
Untangling AI, ML, and Deep Learning
These three terms get thrown around interchangeably, and the exam will absolutely test whether you know they're not the same thing. Think of them as nested circles, each one a subset of the one before it.
- Artificial Intelligence: any technique that lets machines mimic behavior we'd call "intelligent" (rule engines, planning, search algorithms, expert systems, ML...)
  - Machine Learning: systems that learn patterns from data instead of being explicitly programmed
    - Deep Learning: ML using multi-layer neural networks; powers modern vision, speech, and language models
      - Generative AI: deep learning models that create new content (text, images, audio, code)

AI is the umbrella. A 1980s chess program using hand-written rules is AI, but it isn't ML. Nobody trained it on data; someone just wrote "if opponent moves here, respond there." ML flips that: instead of writing the rules, you feed the system examples and let it discover the rules itself. Deep learning is a specific ML approach built from layered neural networks, and generative AI is deep learning aimed specifically at producing new content rather than just classifying or predicting a number.
You will see distractor answers on the exam that swap these terms deliberately, with "which of the following is a subset of machine learning" being one of the more common phrasings. Keep the nesting order memorized and you'll clear those questions without even reading the options twice.
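To make the "rules written by hand" versus "rules learned from data" distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the spam feature (number of exclamation marks), the hand-picked threshold, and the six training examples are invented for illustration, and real ML systems learn far richer rules than a single cutoff.

```python
# Hand-written rule (AI, but not ML): a person chose the threshold.
def rule_based_is_spam(exclamation_count: int) -> bool:
    return exclamation_count > 3  # hypothetical threshold picked by a human


# Learned rule (ML): the threshold is discovered from labeled examples.
def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Return the cutoff t for which 'count > t' agrees with the most labels."""
    candidates = sorted({count for count, _ in examples})

    def correct(t: int) -> int:
        return sum((count > t) == label for count, label in examples)

    return max(candidates, key=correct)


# Hypothetical training data: (exclamation marks in the email, is_spam)
EXAMPLES = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
LEARNED_THRESHOLD = learn_threshold(EXAMPLES)


def learned_is_spam(exclamation_count: int) -> bool:
    return exclamation_count > LEARNED_THRESHOLD
```

Both functions have the same shape; the difference is where the threshold came from. Change the examples and the learned rule changes with them, while the hand-written rule stays fixed until someone edits it.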
The Three Ways a Model Learns
Every ML approach fits into one of a small number of learning paradigms. AIF-C01 wants conceptual fluency here: no math, just correct pattern recognition.
Supervised learning: You give the model labeled examples, meaning inputs paired with correct answers. Show it thousands of emails tagged "spam" or "not spam," and it learns the mapping. This covers most classification and regression tasks: predicting house prices, detecting fraud, classifying images.
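As a rough illustration of learning a mapping from labeled examples, the sketch below fits a straight line to a handful of made-up house sizes and prices using ordinary least squares. The data and function names are assumptions for this example only; the exam does not require the math, and a real project would more likely use a library or a SageMaker built-in algorithm.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: size (hundreds of sq ft) -> price (thousands)
SIZES = [10, 15, 20, 25]
PRICES = [200, 300, 400, 500]

MODEL = fit_line(SIZES, PRICES)      # "training" on inputs paired with answers
ESTIMATE = predict(MODEL, 30)        # prediction for an unseen input
```

The labels (prices) are what make this supervised: the model is corrected against known answers during fitting.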
Unsupervised learning: No labels at all. The model looks at raw data and finds structure on its own, such as grouping similar customers together (clustering) or reducing thousands of features down to a handful that matter (dimensionality reduction). You use this when you don't already know what the "right answer" categories are.
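Clustering can be sketched in a few lines. The example below is a simplified two-cluster k-means on a single feature, with invented monthly-spend figures; the starting centers (minimum and maximum) are a convenience chosen to keep the run deterministic, not how production implementations initialize.

```python
def two_means(values, iterations=10):
    """Simplified 1-D k-means with k=2. Returns (centers, groups)."""
    centers = [min(values), max(values)]  # deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group.
        centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical unlabeled data: monthly spend per customer
SPEND = [12, 15, 14, 95, 102, 98]
CENTERS, GROUPS = two_means(SPEND)
```

Nothing told the algorithm which customers are "low spenders" or "high spenders"; the two groups emerge from the data alone.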
Reinforcement learning: An agent takes actions in an environment and gets rewards or penalties, gradually learning a policy that maximizes reward over time. Think of a robot learning to walk, or a game-playing agent. This paradigm also underlies how many modern chat-style models get fine-tuned to be more helpful and less harmful, a technique broadly known as reinforcement learning from human feedback (RLHF).
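The act, observe reward, update loop can be illustrated with a multi-armed bandit, which is about the simplest reinforcement learning setting (there is no environment state, only a choice among actions). This sketch uses an epsilon-greedy strategy; the reward values, noise range, and parameter defaults are all assumptions, and it is not a depiction of RLHF.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent. Returns (best_arm, estimates, pulls)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # agent's running estimate of each arm's reward
    pulls = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # The environment returns a noisy reward the agent cannot see in advance.
        reward = true_means[arm] + rng.uniform(-0.05, 0.05)
        pulls[arm] += 1
        # Incremental mean update of the estimate for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    best_arm = max(range(n_arms), key=lambda a: estimates[a])
    return best_arm, estimates, pulls


# Hypothetical environment: three actions with different average rewards
BEST_ARM, ESTIMATES, PULLS = run_bandit([0.2, 0.5, 0.9])
```

There are no labels here, only a reward signal. The agent has to try actions to find out which one pays off, which is the trade-off between exploration and exploitation.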
| Learning Type | Data Needed | Typical Use Case | Example AWS Service |
|---|---|---|---|
| Supervised | Labeled data | Fraud detection, demand forecasting | SageMaker (built-in algorithms) |
| Unsupervised | Unlabeled data | Customer segmentation, anomaly detection | SageMaker (clustering algorithms) |
| Reinforcement | Reward signal, no fixed labels | Robotics, game AI, model alignment | SageMaker RL, Bedrock model tuning |
A quick gut-check question for yourself: if someone hands you a spreadsheet of 50,000 past loan applications with an "approved / denied" column already filled in, which paradigm applies? Supervised. The labels already exist; you're just learning the mapping.
The ML Lifecycle, Start to Finish
Exam questions frequently describe a scenario and ask "which phase of the ML lifecycle does this represent?" So it pays to know the stages cold, and to know they loop rather than run once.
Business Problem Framing → Data Collection → Data Preparation & Cleaning → Model Training → Evaluation & Tuning → Deployment & Inference → Monitoring & Retraining → (back to Data Collection)

Business problem framing: Before touching data, define success. Are you optimizing for accuracy, latency, or cost? A fraud model that's 99.9% accurate is worthless if it misses the 0.1% of transactions that are actually fraudulent; this is where accuracy alone can lie to you.
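The accuracy trap is easy to reproduce. The sketch below builds a hypothetical set of 1,000 transactions with a single fraudulent one, then scores a "model" that simply predicts "not fraud" every time. The data is invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (fraud = 1) that the model caught."""
    actual_positives = sum(y_true)
    if actual_positives == 0:
        return 0.0
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / actual_positives


# Hypothetical data: 1 fraudulent transaction (1) among 999 legitimate ones (0)
Y_TRUE = [1] + [0] * 999
Y_PRED = [0] * 1000  # a "model" that always answers "not fraud"

ACCURACY = accuracy(Y_TRUE, Y_PRED)  # 0.999
RECALL = recall(Y_TRUE, Y_PRED)      # 0.0
```

The model scores 99.9% accuracy while catching none of the fraud, which is why the success metric has to be chosen during problem framing rather than after training.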
Data collection: Pulling raw data from wherever it lives: databases, logs, S3 buckets, third-party feeds.
Data preparation: Cleaning, deduplicating, handling missing values, and splitting into training, validation, and test sets. This step typically consumes more practitioner time than any other, and the exam knows it. Expect at least one question framed around "the model is underperforming because of data quality."
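The three-way split mentioned above might look like the following sketch. The 70/15/15 proportions and the fixed seed are assumptions chosen for illustration; the right ratio depends on how much data you have.

```python
import random


def split_dataset(rows, train_pct=70, val_pct=15, seed=42):
    """Shuffle rows, then cut them into (train, validation, test) lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, validation, test


# Hypothetical dataset: 100 row IDs standing in for real records
TRAIN, VAL, TEST = split_dataset(list(range(100)))
```

Each row lands in exactly one set. Keeping the test set untouched until the end is what makes the final performance number trustworthy.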
Model training: Feeding the prepared data through an algorithm so it learns parameters that minimize error.
Evaluation and tuning: Measuring performance with metrics appropriate to the task (accuracy, precision, recall, and F1 for classification; RMSE for regression), adjusting hyperparameters against the validation set and repeating, then confirming the final result on the held-out test set.
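For reference, the metrics named above reduce to a few lines of arithmetic. This is a minimal sketch with made-up counts and values; in practice a library or SageMaker's built-in reporting would compute them.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Classification metrics from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of flagged items, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real positives, how many were found
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error, a regression metric."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical classifier result: 8 correct flags, 2 false alarms, 8 misses
PRECISION, RECALL, F1 = precision_recall_f1(tp=8, fp=2, fn=8)

# Hypothetical regression result: every prediction is off by 2
ERROR = rmse([1, 1, 1, 1], [3, 3, 3, 3])
```

Classification metrics count right and wrong categories, while RMSE measures how far numeric predictions land from the truth, so the task type determines which one applies.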
Deployment and inference: Putting the trained model where it can serve real predictions, whether that's real-time (an endpoint responding in milliseconds) or batch (processing a large file overnight).
Monitoring and retraining: Watching for model drift as real-world data shifts away from training data, then retraining before accuracy silently degrades.
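One simple way to picture drift detection is to compare a feature's live average against its training average, measured in training standard deviations. The sketch below does exactly that; the threshold of 2.0 and the sample values are hypothetical, and managed tools such as SageMaker Model Monitor use richer checks than this.

```python
import statistics


def mean_shift_in_stds(training_values, live_values):
    """How far the live mean has moved, in units of the training std dev."""
    train_mean = statistics.mean(training_values)
    train_std = statistics.pstdev(training_values)
    live_mean = statistics.mean(live_values)
    if train_std == 0:
        return 0.0 if live_mean == train_mean else float("inf")
    return abs(live_mean - train_mean) / train_std


def has_drifted(training_values, live_values, threshold=2.0):
    """Flag drift when the shift exceeds a hypothetical threshold."""
    return mean_shift_in_stds(training_values, live_values) > threshold


# Hypothetical feature: average order value seen during training
TRAINING = [48, 50, 52, 49, 51]

STABLE = has_drifted(TRAINING, [49, 51, 50])   # live data resembles training data
SHIFTED = has_drifted(TRAINING, [80, 85, 90])  # live data has moved well away
```

A drift flag does not say the model is wrong, only that it is now seeing data unlike what it learned from, which is the cue to re-evaluate and possibly retrain.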
Notice the loop back from monitoring into data collection. Models are not "finished" the day they deploy. They decay as the world around them changes, and a mature ML practice budgets for that from day one.
Where Real Organizations Use This
The exam likes to test recognition of use cases, matching a business scenario to the right category of AI/ML solution. A few patterns worth internalizing:
- Healthcare: Medical image analysis, patient readmission risk scoring, drug discovery acceleration
- Financial services: Fraud detection, credit risk modeling, algorithmic trading signals
- Retail: Demand forecasting, personalized recommendations, dynamic pricing
- Manufacturing: Predictive maintenance (catching equipment failure before it happens), quality inspection via computer vision
- Media and entertainment: Content recommendation, automated captioning, generative content creation
- Customer service: Chatbots, sentiment analysis on support tickets, call transcription and summarization
Where AWS Fits: The AI/ML Stack
AWS organizes its AI/ML offerings into three broad layers. Moving up the stack trades flexibility for ease of use; moving down trades ease of use for flexibility.
| Layer | Offering | What It Provides | Skill Required |
|---|---|---|---|
| Top | AI Services (pre-built, API-driven) | Rekognition (vision), Comprehend (NLP), Textract (OCR), Transcribe (speech-to-text), Polly (text-to-speech), Translate, Lex (conversational bots) | No ML expertise required; fastest time to value |
| Middle | Amazon Bedrock | Access foundation models from multiple providers; build RAG apps, agents, and custom generative solutions | Some prompt/architecture skill needed; no infra to manage |
| Bottom | Amazon SageMaker | Full ML platform: build, train, tune, deploy custom models | Requires ML/data science skill; maximum control |

If a scenario says "we need to extract text from scanned invoices quickly, no ML team available," the answer is Textract, a pre-built AI service, not a custom SageMaker model. If the scenario says "we need full control over model architecture and training data for a proprietary use case," that points toward SageMaker. If it says "we want to build a chatbot on top of an existing large language model without training anything from scratch," that's Bedrock.
A rough mental shortcut that helps on exam day: the higher you go up that stack, the less you build and the faster you ship; the lower you go, the more control you get and the more expertise it demands.
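That shortcut can be written down as a tiny decision helper. The function below is purely a study aid: its name and its two yes/no inputs are invented here, it calls no AWS API, and real scenarios will involve more factors than these.

```python
def choose_aws_layer(needs_custom_training: bool,
                     builds_on_foundation_model: bool) -> str:
    """Map two scenario clues to a layer of the AWS AI/ML stack (study aid only)."""
    if needs_custom_training:
        # Full control over architecture and training data
        return "Amazon SageMaker"
    if builds_on_foundation_model:
        # Generative app on top of an existing large model, nothing trained from scratch
        return "Amazon Bedrock"
    # Common task, no ML team: reach for a pre-built API such as Textract
    return "AI Services"


# The three scenario types described in this section
INVOICE_OCR = choose_aws_layer(needs_custom_training=False,
                               builds_on_foundation_model=False)
PROPRIETARY_MODEL = choose_aws_layer(needs_custom_training=True,
                                     builds_on_foundation_model=False)
LLM_CHATBOT = choose_aws_layer(needs_custom_training=False,
                               builds_on_foundation_model=True)
```

The order of the checks mirrors the stack: a need for custom training pushes you to the bottom layer regardless of anything else, and only when neither clue is present does a pre-built service fit.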
Exam Focus: What Questions Test From This Step
- Correctly ordering AI → ML → Deep Learning → Generative AI as nested subsets, not synonyms
- Matching a data scenario (labeled vs. unlabeled vs. reward-based) to supervised, unsupervised, or reinforcement learning
- Identifying which phase of the ML lifecycle a described activity belongs to, especially data preparation and monitoring/drift
- Recognizing that models degrade over time and require retraining; treating a model as "set and forget" is an exam trap
- Choosing the correct AWS layer (AI services vs. Bedrock vs. SageMaker) given a scenario's skill level and control requirements
- Matching business use cases (fraud detection, predictive maintenance, personalization) to the right AI/ML category