MLA-C01 Model Training: Algorithms, Tuning & Distributed Training on SageMaker

Step 2 of 5 · updated 2026

Step 2: Model Development & Training

Once the data is clean and sitting in a feature group or an S3 prefix, the real engineering decisions start. This step is where the exam separates people who've only trained models in a notebook from people who've had to make a training job survive contact with a real budget and a real deadline.


Built-In Algorithms vs. Bring-Your-Own

SageMaker gives you three ways to get a model trained, and picking the wrong one for a scenario is a classic wrong-answer trap.

┌───────────────────────┬─────────────────────────┬───────────────────────────┐
│ Built-in Algorithms   │ Script Mode (BYO code)  │ Bring Your Own Container  │
├───────────────────────┼─────────────────────────┼───────────────────────────┤
│ XGBoost, Linear       │ Your training script +  │ Custom Docker image with  │
│ Learner, k-NN, DeepAR │ a prebuilt framework    │ full control over runtime │
│ Fastest to deploy,    │ container (TF, PyTorch, │ Use when you need custom  │
│ least flexible        │ MXNet, Hugging Face)    │ system deps or a stack    │
│                       │                         │ SageMaker doesn't ship    │
└───────────────────────┴─────────────────────────┴───────────────────────────┘

If a question describes a tabular classification or regression problem with no unusual requirements, XGBoost is almost always the "best" built-in answer: it's fast, well-documented, and handles missing values natively. DeepAR is the built-in for time-series forecasting with multiple related series. BlazingText covers word embeddings and text classification at scale. For anything involving a custom PyTorch or TensorFlow architecture, script mode is the answer: you keep your own training code but let SageMaker manage the infrastructure, container, and I/O channels around it.

Bring-your-own-container (BYOC) is reserved for edge cases: an unsupported framework, a specific CUDA/driver version, or a non-Python runtime. If a question is testing "least operational overhead," BYOC is rarely correct, because it's the highest-maintenance option.

SageMaker JumpStart deserves a mention here too: it's the model hub for pretrained foundation models and common architectures, letting you fine-tune instead of training from scratch. On the exam, JumpStart is the answer whenever the scenario is "we want to fine-tune an existing large model quickly" rather than build one from zero.
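As a study aid, the selection rules in this section can be condensed into a small decision helper. This is only a sketch of the reasoning: the function name and its boolean flags are hypothetical scenario descriptors, not part of any SageMaker API, and the order of the checks is one reasonable reading of the rules above.

```python
def choose_training_option(
    fine_tune_pretrained: bool = False,
    needs_custom_runtime: bool = False,
    has_custom_training_code: bool = False,
) -> str:
    """Return the training approach this guide would pick for a scenario.

    All three flags are hypothetical. They are checked from the most
    restrictive requirement to the least restrictive one.
    """
    if needs_custom_runtime:
        # Unsupported framework, pinned CUDA/driver version, non-Python runtime.
        return "bring-your-own-container"
    if fine_tune_pretrained:
        # Start from an existing model instead of training from zero.
        return "jumpstart"
    if has_custom_training_code:
        # Custom PyTorch/TensorFlow code inside a prebuilt framework container.
        return "script-mode"
    # Standard problem (for example tabular data) with no unusual requirements.
    return "built-in-algorithm"


if __name__ == "__main__":
    print(choose_training_option())
    print(choose_training_option(has_custom_training_code=True))
    print(choose_training_option(needs_custom_runtime=True))
```

Note how the "least operational overhead" answer falls out of the ordering: the helper only reaches for a custom container when nothing lighter can satisfy the scenario.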


Hyperparameter Tuning

SageMaker Automatic Model Tuning (AMT) runs multiple training jobs with different hyperparameter combinations and picks the best by an objective metric you define.

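┌───────────────────────┬────────────────────────────┬─────────────────────────────────────┐
│ Strategy              │ How it searches            │ Trade-off                           │
├───────────────────────┼────────────────────────────┼─────────────────────────────────────┤
│ Grid search           │ Exhaustively tries every   │ Guaranteed coverage, expensive at   │
│                       │ combination                │ scale                               │
├───────────────────────┼────────────────────────────┼─────────────────────────────────────┤
│ Random search         │ Samples combinations at    │ Cheaper, surprisingly competitive   │
│                       │ random                     │ with grid                           │
├───────────────────────┼────────────────────────────┼─────────────────────────────────────┤
│ Bayesian optimization │ Uses prior results to pick │ Fewest jobs needed, default AMT     │
│                       │ the next combination       │ strategy                            │
├───────────────────────┼────────────────────────────┼─────────────────────────────────────┤
│ Hyperband             │ Early-stops poorly         │ Best when training is expensive and │
│                       │ performing jobs            │ many configs are clearly bad early  │
└───────────────────────┴────────────────────────────┴─────────────────────────────────────┘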

Bayesian optimization is the default and generally the right answer when a question asks "which strategy finds a good hyperparameter combination with the fewest training jobs." Hyperband is the answer when the emphasis is on cost control for expensive training runs, since it kills bad trials before they finish.
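To make the "expensive at scale" trade-off concrete, here is a minimal sketch that counts how many training jobs grid search needs versus a fixed random-search budget. The search space is hypothetical: the names mirror common XGBoost hyperparameters, but the candidate values are made up for illustration, and nothing here calls SageMaker.

```python
import itertools
import random

# Hypothetical search space; values are illustrative only.
SEARCH_SPACE = {
    "max_depth": [3, 5, 7, 9],
    "eta": [0.01, 0.1, 0.3],
    "subsample": [0.6, 0.8, 1.0],
}


def grid_candidates(space):
    """Every combination: the job count is the product of the list lengths."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*space.values())]


def random_candidates(space, n_jobs, seed=0):
    """A fixed budget of n_jobs combinations, each sampled independently."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_jobs)]


grid = grid_candidates(SEARCH_SPACE)
sampled = random_candidates(SEARCH_SPACE, n_jobs=10)
print(len(grid), "grid jobs vs", len(sampled), "random jobs")
# prints: 36 grid jobs vs 10 random jobs
```

Adding a fourth hyperparameter with five candidate values would multiply the grid to 180 jobs, while the random budget stays wherever you set it.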

A subtlety worth internalizing: AMT parallelizes jobs, but too much parallelism with Bayesian optimization actually hurts it, because the strategy can't learn from jobs that haven't finished yet. If a scenario mentions "we want maximum parallelism," that's a mild signal toward random search instead.
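The parallelism trade-off can be sketched as a tuning configuration plus a rough count of how many times the strategy gets to learn from finished jobs. The dictionary keys below follow the shape of the CreateHyperParameterTuningJob request as I understand it, so verify them against the API reference before use. The function names, the metric name, and the parameter ranges are assumptions for illustration, and `feedback_rounds` is a simplification that pretends jobs finish in neat batches.

```python
import math


def build_tuning_job_config(strategy="Bayesian", max_jobs=20, max_parallel=2):
    """Build a dict shaped like HyperParameterTuningJobConfig (assumed shape)."""
    if strategy not in {"Bayesian", "Random"}:
        # This sketch only covers the two strategies compared in the text.
        raise ValueError(f"unsupported strategy in this sketch: {strategy}")
    return {
        "Strategy": strategy,
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",  # assumed objective metric
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # Illustrative ranges; bounds are passed as strings.
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "9"},
            ],
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"},
            ],
        },
    }


def feedback_rounds(max_jobs, max_parallel):
    """Rough number of batches after which results can inform the next picks."""
    return math.ceil(max_jobs / max_parallel)


config = build_tuning_job_config("Bayesian", max_jobs=20, max_parallel=2)
print(feedback_rounds(20, 2), feedback_rounds(20, 20))
# prints: 10 1
```

With 20 jobs and 2 in parallel, the strategy gets roughly ten chances to use earlier results. With all 20 in parallel it gets one, which is why the Bayesian search then behaves much like random search.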


Distributed Training

Once a single GPU or a single instance can't hold the model or the data, you split the work. There are two strategies, and the exam wants you to know which problem each one solves.

DATA PARALLELISM                    MODEL PARALLELISM
(model fits on one device,          (model does NOT fit on one device)
 dataset is the bottleneck)

GPU 1: full model, batch A          GPU 1: layers 1-10
GPU 2: full model, batch B          GPU 2: layers 11-20
GPU 3: full model, batch C          GPU 3: layers 21-30
            │                                   │
            ▼                                   ▼
Gradients averaged/synced           Activations passed between
across devices each step            devices in a pipeline

SageMaker's Distributed Data Parallel (SMDDP) library optimizes the gradient-sync step specifically for AWS networking, and it's the answer whenever a question is about scaling training throughput across many GPUs when the model already fits in memory. SageMaker Model Parallel is the answer when the model itself (think large language models) is too big for one accelerator's memory.
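The data-parallel side of the diagram can be illustrated with a toy, single-process simulation: each "GPU" computes a gradient on its own shard, the gradients are averaged, and every replica applies the same update. This is a conceptual sketch in plain Python, not the SMDDP library. The one-parameter linear model, the data, and the function names are all made up for illustration.

```python
def mse_gradient(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)


def data_parallel_step(w, shards, lr):
    """One synchronous step: local gradients, then an average (the
    all-reduce), then the same update applied on every replica."""
    local_grads = [mse_gradient(w, shard) for shard in shards]
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


# Toy data for y = 3x, split into three equal shards ("GPU 1" to "GPU 3").
data = [(float(x), 3.0 * x) for x in range(1, 7)]
shards = [data[0:2], data[2:4], data[4:6]]

# With equal-sized shards, the averaged gradient matches the full-batch one.
full_grad = mse_gradient(0.0, data)
synced_grad = sum(mse_gradient(0.0, s) for s in shards) / len(shards)

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(full_grad, synced_grad, round(w, 4))
```

The averaging step is the part that costs network bandwidth on real hardware, which is why a library that speeds up gradient synchronization matters once the GPU count grows.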


Training Infrastructure Choices

Instance selection is one of those areas where the exam expects current, practical judgment rather than memorized specs.

Managed Spot Training is a near-guaranteed exam topic: it runs training jobs on Spot capacity for up to 90% savings versus on-demand. If the instance is reclaimed, SageMaker restarts the job on new capacity and restores the checkpoints that were synced to S3, so you don't lose all your progress, provided the training script (or a built-in algorithm that supports checkpointing) actually writes and reloads them. The trade-off is unpredictable start times and possible interruption: fine for most training jobs, risky for anything with a hard deadline where you can't tolerate delay.
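A minimal sketch of the settings involved, assuming the SageMaker Python SDK Estimator parameter names `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri`. The helper functions and the bucket name are hypothetical, and the savings formula (one minus billable time over training time) is the way I understand the reported savings figure to be derived.

```python
def spot_training_kwargs(max_run_s, max_wait_s, checkpoint_s3_uri):
    """Keyword arguments one might pass to an Estimator to enable
    Managed Spot Training (assumed parameter names, see the note above)."""
    if max_wait_s < max_run_s:
        # The total wait (queueing for Spot plus running) has to cover
        # at least the allowed run time.
        raise ValueError("max_wait must be >= max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_s,
        "max_wait": max_wait_s,
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


def spot_savings_percent(training_seconds, billable_seconds):
    """Savings relative to paying on-demand for the full training time."""
    return (1 - billable_seconds / training_seconds) * 100


kwargs = spot_training_kwargs(
    max_run_s=3600,
    max_wait_s=7200,
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",  # hypothetical bucket
)
print(kwargs["use_spot_instances"], round(spot_savings_percent(3600, 1080), 1))
```

The gap between `max_run` and `max_wait` is the budget for waiting on Spot capacity, which is the "unpredictable start time" trade-off in configuration form.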

Managed Spot Training flow:

Training job starts ──► checkpoint saved every N steps ──► S3
        │
        ▼
Spot interruption (2-min warning)
        │
        ▼
Job automatically resumes from last checkpoint on new Spot capacity
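The flow above depends on the training code being able to pick up where it left off. Here is a local, self-contained sketch of that save-and-resume pattern using a JSON file in a temporary directory. In a real job the script would write to the local checkpoint directory that SageMaker syncs to S3. The `train` function, the file name, and the fake loss update are all hypothetical.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, checkpoint_every, interrupt_at=None):
    """Toy training loop that resumes from a checkpoint file if one exists.

    Returns the last completed step. If interrupt_at is set, the loop stops
    after that step to mimic a Spot reclaim.
    """
    state = {"step": 0, "loss": 100.0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved step instead of starting at step 0.
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] *= 0.99  # stand-in for a real optimizer update
        if state["step"] % checkpoint_every == 0:
            with open(checkpoint_path, "w") as f:
                json.dump(state, f)
        if interrupt_at is not None and state["step"] == interrupt_at:
            return state["step"]  # simulated interruption
    return state["step"]


with tempfile.TemporaryDirectory() as ckpt_dir:
    path = os.path.join(ckpt_dir, "checkpoint.json")
    stopped_at = train(100, path, checkpoint_every=10, interrupt_at=47)
    with open(path) as f:
        resumed_from = json.load(f)["step"]
    finished_at = train(100, path, checkpoint_every=10)

print(stopped_at, resumed_from, finished_at)
# prints: 47 40 100
```

The interrupted run loses only the seven steps since the last checkpoint, which is the whole argument for choosing a sensible N.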

Evaluation Metrics and Experiment Tracking

Choosing the right metric is inseparable from choosing the right training approach, and the exam frequently tests this pairing rather than metrics in isolation.

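┌────────────────────────────┬────────────────────────┬───────────────────────────────┐
│ Problem type               │ Common metrics         │ Notes                         │
├────────────────────────────┼────────────────────────┼───────────────────────────────┤
│ Binary classification      │ Accuracy, AUC-ROC      │ Fine when classes are roughly │
│ (balanced)                 │                        │ balanced                      │
├────────────────────────────┼────────────────────────┼───────────────────────────────┤
│ Binary classification      │ Precision, Recall, F1, │ Accuracy is misleading here   │
│ (imbalanced)               │ PR-AUC                 │                               │
├────────────────────────────┼────────────────────────┼───────────────────────────────┤
│ Multi-class classification │ Macro/micro F1,        │ Macro F1 when classes matter  │
│                            │ confusion matrix       │ equally                       │
├────────────────────────────┼────────────────────────┼───────────────────────────────┤
│ Regression                 │ RMSE, MAE, R²          │ RMSE penalizes large errors   │
│                            │                        │ more than MAE                 │
├────────────────────────────┼────────────────────────┼───────────────────────────────┤
│ Ranking/recommendation     │ NDCG, MAP              │ Order matters, not just       │
│                            │                        │ correctness                   │
└────────────────────────────┴────────────────────────┴───────────────────────────────┘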
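Two of the notes in the table are easy to verify with a few lines of arithmetic: accuracy looks healthy on imbalanced data even when the model finds no positives, and RMSE reacts to one large error where MAE does not. The sketch below uses made-up data and hand-rolled metric functions (names are my own) so it runs without any ML library.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# 95 negatives, 5 positives, and a model that always predicts "negative".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
imbalanced = classification_metrics(y_true, y_pred)
print(imbalanced["accuracy"], imbalanced["recall"], imbalanced["f1"])
# prints: 0.95 0.0 0.0

# Same total absolute error, spread evenly versus concentrated in one point.
targets = [0.0, 0.0, 0.0, 0.0]
even = [1.0, 1.0, 1.0, 1.0]
spiky = [0.0, 0.0, 0.0, 4.0]
print(mae(targets, even), rmse(targets, even))    # prints: 1.0 1.0
print(mae(targets, spiky), rmse(targets, spiky))  # prints: 1.0 2.0
```

A 95% accurate model that catches zero positives is exactly the trap the imbalanced row warns about, and the doubled RMSE on the spiky predictions is what "penalizes large errors" means in practice.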

For tracking, SageMaker Experiments (built into Studio) automatically logs parameters, metrics, and artifacts for every training run tied to a pipeline, so you can compare runs side by side instead of relying on someone's spreadsheet of results. This matters operationally as much as it matters for the exam: reproducibility questions almost always trace back to whether experiment metadata was captured at training time, not reconstructed after the fact.


Exam Focus: What Questions Test From This Step