Start Here
Find the most relevant content based on what you're working on.
Curated Reading Paths
Reinforcement Learning for LLMs
From PPO fundamentals to GRPO and GDPO — the complete policy optimization series.
View series landing page →
Contextual Bandits
A 5-part series from decision framework to production deployment.
- Part 1: The Decision Framework
- Part 2: Theory & Regret Bounds
- Part 3: Algorithm Guide
- Part 4: Neural Bandits
- Part 5: Production Deployment
Recommendation Systems
From contextual bandits for personalization to retrieval-augmented generation.
Foundations
Reference explainers covering the building blocks of modern ML. Start with the Transformer Internals series.
View all foundations →