I build production ML systems where learning, decision-making, and evaluation matter.
I'm Vitor Sousa, a Senior Data Scientist at Wellhub on the GenAI & Engagement team, where I build contextual bandit systems for personalized nudges, reinforcement learning pipelines, LLM-powered engagement workflows, and ML infrastructure on Kubeflow and Kafka. Previously at Farfetch, I built recommendation and size-prediction systems serving 4M+ customers across 190 countries — deep learning from scratch, learning-to-rank, and a published paper at ACM RecSys. This site goes beyond the day job — it's where I dig into research interests, build things from scratch to understand them deeply, and write about the ideas I'm most curious about.
11 articles · 4 projects
Reinforcement Learning for LLMs
A 4-part deep dive from RL foundations through PPO, GRPO, and GDPO — covering the full policy optimization stack for language model alignment, with math derivations and from-scratch implementations.
Read the series

Selected writing
See also: Foundations →

GDPO: Multi-Reward RL Done Right
When GRPO meets multiple rewards, advantages collapse. GDPO fixes this by normalizing each reward independently before combining. Learn why this matters for tool calling, math reasoning, and any multi-objective LLM alignment.
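The fix in miniature, as an illustrative sketch (not the article's code): standardize each reward stream within the sampling group before summing, so one high-variance reward cannot dominate the combined advantage.

```python
import numpy as np

def decoupled_advantages(reward_streams):
    """GDPO-style combination (sketch): normalize each reward stream
    independently across the group, then sum the normalized advantages.
    Naive GRPO would sum raw rewards first and normalize once, letting
    the highest-variance reward drown out the others."""
    advs = []
    for rewards in reward_streams:
        r = np.asarray(rewards, dtype=float)
        advs.append((r - r.mean()) / (r.std() + 1e-8))  # per-stream z-score
    return np.sum(advs, axis=0)
```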
GRPO: Eliminating the Value Network
Group Relative Policy Optimization replaces PPO's learned value function with a simple insight: sample multiple outputs and use their relative rewards as advantages. 33% memory savings, simpler implementation, and the algorithm powering DeepSeek-R1.
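The core trick fits in a few lines. A sketch (not the article's implementation): standardize rewards within each group of sampled outputs and use the result as advantages, with no learned value network.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages (sketch): sample G outputs for one prompt,
    then use each output's reward relative to the group as its advantage.
    The group mean plays the role of PPO's learned value baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```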
PPO for Language Models: The RLHF Workhorse
Deep dive into Proximal Policy Optimization — the algorithm behind most LLM alignment. Understand trust regions, the clipped objective, GAE, and why PPO's four-model architecture creates problems at scale.
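The clipped objective in miniature, as an illustrative NumPy sketch (not the article's code): bound the policy ratio so a single update cannot move the policy too far from the old one.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO's clipped surrogate (sketch). Training maximizes this
    (the loss is its negative). Clipping the ratio to [1-eps, 1+eps]
    acts as a cheap stand-in for a trust-region constraint."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    return np.minimum(unclipped, clipped).mean()  # pessimistic bound
```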
Selected projects
Tailor: Size Recommendations at Farfetch Scale
Sequence classification models for personalized size prediction in luxury fashion — LSTMs, attention mechanisms, and a published paper at ACM RecSys 2023.
recommendation-systems · deep-learning · pytorch · sequence-models · attention · production-ml · ab-testing
RAG System with LlamaIndex, Elasticsearch & Llama3
Local-first RAG pipeline with hybrid search: BM25 + dense retrieval on Elasticsearch, LlamaIndex orchestration, and Llama3 for generation. Evaluated with RAGAS metrics across chunking strategies and retrieval configurations.
Elasticsearch · LlamaIndex · Llama3 · RAG · Vector Search
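For flavor, one common way to combine a sparse (BM25) ranking with a dense one is reciprocal rank fusion. This is an illustrative sketch, not necessarily how the project's Elasticsearch hybrid search merges results.

```python
def reciprocal_rank_fusion(bm25_ids, dense_ids, k=60):
    """Fuse two ranked lists of document ids (sketch): each document
    scores sum(1 / (k + rank)) over the lists it appears in, so
    documents ranked well by both retrievers rise to the top."""
    scores = {}
    for ranking in (bm25_ids, dense_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```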
LoRA and DoRA Implementation
Parameter-efficient fine-tuning from first principles — every matrix decomposition derived and implemented in PyTorch without libraries. Validated against Hugging Face PEFT outputs for correctness.
llms · peft · pytorch
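The heart of LoRA, sketched in NumPy for brevity (the project itself is PyTorch): the frozen weight W plus a scaled low-rank update B @ A, with B initialized to zero so training starts exactly at the base model.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA forward pass (sketch). W is frozen; only A (r x d_in) and
    B (d_out x r) are trained. B starts at zero, so the initial update
    is zero and the adapted model matches the base model."""
    r = A.shape[0]                                  # low rank
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T  # base + scaled update
```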