About

I'm Vitor Sousa, a Senior Data Scientist at Wellhub, previously at Farfetch. I build production ML systems where learning, decision-making, and evaluation matter — recommendation engines that learn from user behavior, contextual bandits that make decisions under uncertainty, reinforcement learning pipelines for LLM alignment, and evaluation frameworks that keep deployed models honest.

Background

I studied Information Systems Engineering at the University of Minho, where I first got pulled into machine learning. What started as academic curiosity quickly turned into something deeper — the gap between understanding an algorithm on paper and making it work on real data was humbling, and closing that gap became the thing I cared about most.

At Farfetch that gap got real. I worked on recommendation systems serving over four million active customers across 190 countries — the kind of scale where every modeling choice has measurable business impact. I built size-prediction systems using sequence classification with LSTMs and attention mechanisms, work that became a published paper at FashionXRecsys (ACM RecSys 2023). I also worked on real-time personalized recommendations, learning-to-rank pipelines, and collaborative filtering systems. This wasn't fine-tuning pretrained models — it was designing deep learning architectures from the ground up, training on proprietary data, and optimizing for real business metrics. Farfetch taught me what production ML actually demands, and it remains probably the most challenging and formative experience of my career.

At Wellhub I'm a Senior Data Scientist on the GenAI & Engagement team. I build contextual bandit systems for personalized nudges, reinforcement learning pipelines for adaptive optimization, LLM-powered engagement workflows, and the ML infrastructure that supports it all — Kubeflow for training and orchestration, Kafka for real-time event processing, and evaluation frameworks that keep deployed models honest. Where Farfetch was about recommendations at scale, Wellhub deepened my understanding of experimentation, causal reasoning, and the feedback loops between models and user behavior.
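To give a flavor of the contextual-bandit pattern — this is an illustrative NumPy sketch of the classic disjoint LinUCB algorithm, not Wellhub's actual system; the class and function names are mine:

```python
import numpy as np

class LinUCBArm:
    """One arm of a disjoint LinUCB contextual bandit (Li et al., 2010)."""
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha          # exploration strength
        self.A = np.eye(dim)        # ridge-regression Gram matrix
        self.b = np.zeros(dim)      # accumulated reward-weighted contexts

    def ucb(self, x):
        """Upper confidence bound on this arm's reward for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # ridge estimate of the arm's reward weights
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        """Fold one observed (context, reward) pair into the arm's statistics."""
        self.A += np.outer(x, x)
        self.b += reward * x

def choose(arms, x):
    """Pick the arm with the highest UCB for context x."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```

The appeal of this family of methods is that exploration falls out of the confidence term rather than a separate heuristic: arms that have seen few contexts like x get a wide bonus, and the bonus shrinks as evidence accumulates.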

What I write about

I write to close the gap between reading a paper and truly understanding it. Each piece starts from first principles — re-deriving the math, implementing from scratch, and pressure-testing against real problems. The topics follow my research interests: reinforcement learning for LLM alignment, contextual bandits, and systematic LLM evaluation.

Current focus

Right now I'm in a deliberate foundations phase — working through Prince's Understanding Deep Learning, building transformer components from scratch (attention, positional encoding, full forward pass) with tests at every layer, and strengthening the mathematical underpinnings: linear algebra, calculus, and probability at the derivation level.
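As a flavor of that from-scratch work, here is a minimal scaled dot-product attention in NumPy — a sketch of the standard formulation from "Attention Is All You Need", not a claim about any particular implementation of mine:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (n_q, n_k) similarity logits
    weights = softmax(scores, axis=-1)              # each row sums to 1 over keys
    return weights @ V, weights
```

Even a snippet this small rewards testing: checking that attention weights are a valid distribution over keys, and that identical queries produce identical mixtures of V, catches most shape and axis bugs before they compound.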

The next stage shifts to reinforcement learning proper — Sutton & Barto cover to cover, from-scratch implementations of policy gradient methods and PPO. This converges on a flagship project I'm building toward: a research-quality RLVR + GRPO implementation that trains small language models on math reasoning, with proper benchmarks, ablations, and a technical write-up series documenting the full process.
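To give a sense of where that stage starts, here is a toy REINFORCE update for a tabular softmax policy — a sketch of the textbook method from Sutton & Barto, not the RLVR/GRPO project itself, with function names of my own choosing:

```python
import numpy as np

def softmax_policy(theta):
    """Action probabilities for a tabular softmax policy."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reinforce_step(theta, actions, rewards, lr=0.5):
    """One REINFORCE update with a mean-reward baseline.

    For a softmax policy, grad log pi(a) = one_hot(a) - pi, so the update
    raises the logit of actions whose reward beat the batch average.
    """
    pi = softmax_policy(theta)
    baseline = np.mean(rewards)             # crude variance-reduction baseline
    grad = np.zeros_like(theta)
    for a, r in zip(actions, rewards):
        one_hot = np.zeros_like(theta)
        one_hot[a] = 1.0
        grad += (r - baseline) * (one_hot - pi)
    return theta + lr * grad / len(actions)
```

On a two-armed bandit where only action 0 pays, repeated batches of sampled actions steer the policy toward the paying arm — the same score-function machinery that PPO builds on with clipping and a learned value baseline.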

Alongside the research focus, I continue to draw on my experience in recommender systems and classical ML — the fundamentals of feature engineering, offline evaluation, and production trade-offs that apply whether the model is a gradient-boosted tree or a 7B-parameter language model.

Currently reading

Hands-On Large Language Models — working through practical patterns for shipping LLM applications.

Reinforcement Learning (Sutton & Barto) — revisiting the fundamentals of reinforcement learning theory.

Get in touch

I'm always open to discussing recommendation systems, reinforcement learning for LLMs, production ML, or research collaboration.