Hardware & Model
GPU
  RTX 3060 — 12 GB VRAM (safe: 8 GB)
  RTX 4080 — 16 GB VRAM (safe: 12 GB)
  RTX 4090 — 24 GB VRAM (safe: 20 GB)
  A10G — 24 GB VRAM (safe: 20 GB)
  A100 40G — 40 GB VRAM (safe: 36 GB)
  A100 80G — 80 GB VRAM (safe: 72 GB)
  H100 SXM — 80 GB VRAM (safe: 72 GB)
Model Size (parameters)
  125M — BERT-base / GPT-2 small
  350M — GPT-2 medium
  1.3B — OPT-1.3B
  7B — LLaMA-2 7B / Mistral 7B
  13B — LLaMA-2 13B
  30B — LLaMA 30B
  70B — LLaMA-2 70B
Precision
  FP32 — Full precision
  BF16/FP16 — Mixed precision
  INT8 — 8-bit quantization
  INT4 — 4-bit (GPTQ/QLoRA)
Optimizer
  AdamW (2× weight states)
  AdaFactor / 8-bit Adam (1×)
  SGD (no momentum, 0×)
Sequence Length (tokens)
  256 tokens
  512 tokens
  1 024 tokens
  2 048 tokens
  4 096 tokens
Number of GPUs
  1 GPU
  2 GPUs (DDP)
  4 GPUs (DDP)
  8 GPUs (DDP)
Target Effective Batch Size
  32 (small models / fine-tuning)
  64
  128
  256 (recommended for pre-training)
  512
  1 024 (large-scale)
Calculate Optimal Batch Size →
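
The page does not show the formula behind the calculator, so the following is only a rough sketch of how the inputs above typically combine, not the calculator's actual method. It estimates the static memory taken by weights and optimizer states, and the gradient-accumulation steps needed to reach a target effective batch size. The function names and the per-GPU micro-batch parameter are hypothetical. The memory estimate assumes optimizer states use the same bytes per parameter as the weights, and it ignores gradients and activations, so real usage will be higher.

```python
import math

# Bytes per parameter for each Precision option.
BYTES_PER_PARAM = {"FP32": 4.0, "BF16/FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

# Optimizer-state size as a multiple of weight size, as given in the
# Optimizer options (2x, 1x, 0x).
OPTIMIZER_STATE_MULTIPLIER = {
    "AdamW": 2.0,
    "AdaFactor / 8-bit Adam": 1.0,
    "SGD": 0.0,
}


def static_memory_gb(n_params, precision, optimizer):
    """Rough weights + optimizer-state footprint in GB (1 GB = 1e9 bytes).

    Assumption: optimizer states are stored at the same precision as the
    weights. Gradients and activations are not counted.
    """
    weight_bytes = n_params * BYTES_PER_PARAM[precision]
    state_bytes = weight_bytes * OPTIMIZER_STATE_MULTIPLIER[optimizer]
    return (weight_bytes + state_bytes) / 1e9


def grad_accum_steps(target_effective_batch, micro_batch_per_gpu, n_gpus):
    """Accumulation steps so that
    micro_batch_per_gpu * n_gpus * steps >= target_effective_batch.
    """
    samples_per_step = micro_batch_per_gpu * n_gpus
    return math.ceil(target_effective_batch / samples_per_step)


if __name__ == "__main__":
    # 7B parameters, BF16/FP16 weights, AdamW.
    print(static_memory_gb(7e9, "BF16/FP16", "AdamW"))
    # Target effective batch 256, micro-batch 4 per GPU, 8 GPUs (DDP).
    print(grad_accum_steps(256, 4, 8))
```

Under these assumptions, a 7B model in BF16/FP16 with AdamW needs about 42 GB for weights and optimizer states alone. That exceeds the 20 GB safe budget listed for an RTX 4090 but fits within the 72 GB listed for an A100 80G, before gradients and activations are counted. With 8 GPUs and a micro-batch of 4 per GPU, a target effective batch size of 256 works out to 8 accumulation steps.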