Blog: policy-gradients | Vitor Sousa — AI Engineer & Data Scientist

Diagram showing the reinforcement learning loop applied to language model fine-tuning

Reinforcement Learning Foundations for LLM Alignment

Master the RL fundamentals powering modern LLM training: from MDPs and policy gradients through value functions and actor-critic methods. The mathematical foundations you need before diving into PPO, GRPO, and beyond.

Series

Policy Optimization for LLMs: From Fundamentals to Production, Part 1

Jan 11, 2026 | ~35 min read


Articles tagged policy-gradients

Reinforcement Learning Foundations for LLM Alignment