Living Timeline
This page is a living timeline that brings together my long-form writing, project launches, and a few personal milestones in one chronological stream.
Listing articles and project releases side by side lets you trace how ideas move from early notes into published posts or shipped software over time. Each entry is tagged with the domains involved, from LLM systems and reinforcement learning to the occasional life event that shaped the work.
Everything is ordered from newest to oldest, and upcoming experiments are included and clearly marked so you can see what's currently in the pipeline.
Activity Feed
2026
- PPO for Language Models: The RLHF Workhorse
Deep dive into Proximal Policy Optimization, the algorithm behind most LLM alignment. Understand trust regions, the clipped objective, GAE, and why PPO's four-model architecture creates problems at scale. (A minimal sketch of the clipped objective follows this list.)
- Reinforcement Learning Foundations for LLM Alignment
Master the RL fundamentals powering modern LLM training: from MDPs and policy gradients through value functions and actor-critic methods. The mathematical foundations you need before diving into PPO, GRPO, and beyond.
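The clipped objective mentioned in the PPO entry above is compact enough to show inline. Below is a minimal, hypothetical PyTorch sketch; the function name and arguments are illustrative, not code from the article.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (illustrative sketch).

    logp_new:   log-probs of the taken actions under the current policy
    logp_old:   log-probs under the policy that collected the data
    advantages: advantage estimates, e.g. from GAE
    """
    ratio = torch.exp(logp_new - logp_old)  # importance ratio r_t
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) surrogate per token, then negate so an
    # optimizer that minimizes the loss maximizes the objective.
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps the importance ratio near 1, which is what stands in for TRPO's explicit trust-region constraint.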
2025
- Deploying Contextual Bandits: Production Guide and Offline Evaluation
Systems design, offline evaluation, and monitoring strategies for running contextual bandits safely in production.
- Neural Contextual Bandits for High-Dimensional Data
When linear models fail, neural networks step in. Learn when to use neural bandits, how to quantify uncertainty with bootstrap ensembles, and how to handle high-dimensional action spaces with embeddings and two-stage selection.
- Implementing Contextual Bandits: Complete Algorithm Guide
Working Python implementations of ε-greedy, UCB, LinUCB, and Thompson Sampling, plus guidance on which algorithm fits your problem, with default hyperparameters and practical tuning advice. (A combined implementation-and-evaluation sketch follows the 2025 list.)
- Contextual Bandit Theory: Regret Bounds and Exploration
Understand the theory behind contextual bandits: regret bounds, the exploration-exploitation tradeoff, reward models, and why certain algorithms work. Math that directly informs practice.
- When to Use Contextual Bandits: The Decision Framework
Stop running month-long A/B tests that leave value on the table. Learn when contextual bandits are the right choice for adaptive, personalized optimization, and when to stick with simpler alternatives.
- Beyond the Vibe Check: A Systematic Approach to LLM Evaluation
Stop relying on gut feelings to evaluate LLM outputs. Learn systematic approaches to build trustworthy evaluation pipelines with measurable metrics, proven methods, and production-ready practices. A practical guide covering faithfulness vs helpfulness, LLM-as-judge techniques, bias mitigation, and continuous monitoring.
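Since the bandit entries above walk through one pipeline from algorithm choice to offline evaluation, here is a single compact sketch tying two pieces together: an ε-greedy learner over per-arm ridge-regression reward models, scored with replay-style offline evaluation. All class and function names are hypothetical, and the replay estimate assumes the logged actions were chosen uniformly at random; none of this is code from the articles.

```python
import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyLinearBandit:
    """epsilon-greedy over per-arm ridge-regression reward models (sketch)."""

    def __init__(self, n_arms, dim, eps=0.1, lam=1.0):
        self.eps = eps
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_arms)])  # Gram matrices
        self.b = np.zeros((n_arms, dim))  # reward-weighted context sums

    def select(self, x):
        if rng.random() < self.eps:                      # explore uniformly
            return int(rng.integers(len(self.b)))
        theta = np.array([np.linalg.solve(A_k, b_k)      # per-arm ridge coefficients
                          for A_k, b_k in zip(self.A, self.b)])
        return int(np.argmax(theta @ x))                 # exploit: highest predicted reward

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

def replay_evaluate(policy, logged_rounds):
    """Replay evaluation: score the policy only on logged rounds where it
    picks the same action that was actually taken (uniform logging assumed)."""
    matched, total_reward = 0, 0.0
    for x, logged_arm, reward in logged_rounds:
        if policy.select(x) == logged_arm:
            policy.update(logged_arm, x, reward)
            matched += 1
            total_reward += reward
    return total_reward / max(matched, 1)  # average reward on matched rounds
```

Swapping the greedy arm choice for an upper-confidence or posterior-sampling rule turns the same skeleton into LinUCB or Thompson Sampling.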
2024
- Became a father
Our first child, Benjamim, arrived on September 12, 2024 — and everything changed in the best possible way. I hit pause on the deep-learning roadmap to start collecting training data from the tiniest (and most fascinating) human dataset I’ll ever work with. These days, midnight diaper shifts feel a lot like reinforcement-learning loops — except the reward signal is a sleepy giggle that makes every iteration worth it. I still catch myself jotting notes in our “family lab notebook,” half scientist, half dad, completely in awe.
2023
- Large Language Models with MLX
I explored chat tooling on Apple Silicon using MLX to understand the runtime and packaging story.
- LoRA and DoRA Implementation
I implemented LoRA and DoRA from scratch in PyTorch to understand the methods end to end. (A minimal LoRA sketch follows the 2023 list.)
- OpenELM Notes
I wrote about OpenELM and how Apple approaches efficient language models.
- RAG System with LlamaIndex, Elasticsearch & Llama3
I took a deep dive into building a local-first retrieval-augmented generation system for document Q&A.
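The LoRA idea from the implementation entry above fits in a few lines of PyTorch, so here is a minimal sketch of the core layer. This is my shorthand illustration under standard assumptions, not the article's code, and the initialization constants are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (sketch).

    Effective weight: W + (alpha / r) * B @ A, where only A and B train.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.scale = alpha / r
        # A starts small and random, B starts at zero, so the adapter is
        # a no-op at initialization and training moves it gradually.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

DoRA extends this by decomposing the weight into magnitude and direction and applying the low-rank update to the directional component only.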