Complete LLM Transformer Engineering Notes

Complete LLM Transformer Engineering Mastery: From Scratch to 124M GPT

About This Course

A hands-on course that builds the core pieces of a GPT-style transformer from scratch, from tensor math and attention through training, tokenization, and inference, culminating in a 124M-parameter GPT.

What You'll Learn

  • Complete LLM Engineering Mastery: From Scratch to 124M GPT
  • NumPy → PyTorch: Math, Tensors, Arrays, Matrices & Vector Operations
  • Tokenization & Vocabulary
  • Byte-Level BPE from Scratch
  • "Attention is All You Need" — Build Scaled Dot-Product Attention from Scratch (see the sketch after this list)
  • "Attention is All You Need" — Multi-Head & Self-Attention
  • "Attention is All You Need" — Positional Encoding
  • "Attention is All You Need" — Add Positional Encodings
  • "Attention is All You Need" — Feedforward & Residuals
  • Encoder-Decoder Transformers
  • Decoder-Only Architecture
  • Training Loop & Backpropagation
  • Scaling Laws & Optimization
  • Inference & KV Cache
  • Beam Search & Sampling
  • FlashAttention from Scratch
  • FlashAttention: Optimization Details for Efficient Exact Attention
  • Capstone: Build Your GPT from Scratch
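
As a small taste of the from-scratch modules, here is a minimal sketch of scaled dot-product attention in PyTorch. The function name, tensor shapes, and example values are illustrative assumptions, not the course's exact code.

```python
# Minimal sketch of scaled dot-product attention (illustrative, not course code).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity score between every query and every key.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions (e.g. future tokens) get -inf before softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    # Each output is a weighted average of the value vectors.
    return weights @ v

# Example: batch of 2 sequences, 5 tokens, 64-dim heads (shapes assumed).
q = torch.randn(2, 5, 64)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 64])
```

Dividing the scores by sqrt(d_k) keeps the dot products at a scale where the softmax does not saturate, one of the details the attention modules work through.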