Complete LLM Transformer Engineering Notes
Complete LLM Transformer Engineering Mastery: From Scratch to 124M GPT
Welcome to the Complete LLM Transformer Engineering Notes. Select a module from the list below to begin.
About This Course
What You'll Learn
- NumPy → PyTorch: Math, Tensors, Arrays, Matrices & Vector Operations
- "Attention is All You Need" — Build Scaled Dot-Product Attention from Scratch (a minimal sketch follows this list)
- "Attention is All You Need" — Multi-Head & Self-Attention
- "Attention is All You Need" — Positional Encoding
- "Attention is All You Need" — Add Positional Encodings
- "Attention is All You Need" — Feedforward & Residuals
- Encoder-Decoder Transformers
- Decoder-Only Architecture
- Tokenization & Vocabulary
- Byte-Level BPE from Scratch
- Training Loop & Backpropagation
- Scaling Laws & Optimization
- Inference & KV Cache
- Beam Search & Sampling
- FlashAttention from Scratch
- FlashAttention: Optimization Details for Efficient Exact Attention
- Capstone: Build Your GPT from Scratch
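To give a feel for the hands-on style of these modules, here is a minimal sketch of the scaled dot-product attention built early in the course. The function name, toy tensor shapes, and masking convention are illustrative assumptions, not the course's exact code.

```python
import math

import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Core of "Attention is All You Need": softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Query-key similarity, scaled so the softmax does not saturate as d_k grows.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions (e.g., future tokens in a decoder) get score -inf,
        # so they receive zero attention weight after the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Toy usage: batch of 1, sequence of 4 tokens, head dimension 8.
q = k = v = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```

The later modules all build on this kernel: multi-head attention runs several projected copies of it in parallel, the KV cache reuses `k` and `v` across decoding steps, and FlashAttention computes the same result block by block without materializing the full score matrix.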