Start Here

Find the most relevant content based on what you're working on.

Reinforcement Learning for LLMs

The complete policy optimization series, from RL foundations through PPO, GRPO, and GDPO to the GRPO family map.

Part 1: RL Foundations
Part 2: PPO Deep Dive
Part 3: GRPO
Part 4: GDPO
Part 5: The GRPO Family Map

View series landing page →

Contextual Bandits

A 5-part series from decision framework to production deployment.

Part 1: The Decision Framework
Part 2: Theory & Regret Bounds
Part 3: Algorithm Guide
Part 4: Neural Bandits
Part 5: Production Deployment

View series landing page →

Recommendation Systems

From contextual bandits for personalization to retrieval-augmented generation.

Contextual Bandits in Production
RAG System with LlamaIndex, Elasticsearch & Llama3

LLM Evaluation

Moving beyond vibes to systematic evaluation.

Beyond the Vibe Check

Foundations

Reference explainers covering the building blocks of modern ML. Start with the Transformer Internals series — written alongside RLVR from Scratch, where every component is implemented and tested from raw tensors.