RLVR from Scratch: Full LLM Alignment Pipeline
A from-scratch implementation of the full transformer → pretraining → SFT → GRPO → GDPO pipeline. Each layer is built, tested, and documented. The repo is the artifact; the site is the narrative.
Personal project
🔨 In Development — Phase 1/5