Large Language Models with MLX
Check the repo: 🔗 Large Language Models with MLX.

A from-scratch implementation of the full transformer → pretraining → SFT → GRPO → GDPO pipeline. Each layer is built, tested, and documented. The repo is the artifact; the site is the narrative.
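The GRPO stage of the pipeline rests on one small idea: score a group of sampled completions for the same prompt, then normalize each reward against its own group. A minimal sketch of that group-relative advantage (an illustration of the technique, not the repo's code; the function name is hypothetical):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: z-score each completion's reward
    against the mean and std of its sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the baseline is the group mean, no separate value network is needed; advantages within a group always sum to (approximately) zero.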
Sequence classification models for personalized size prediction in luxury fashion — LSTMs, attention mechanisms, and a paper published at ACM RecSys 2023.
Local-first RAG pipeline with hybrid search: BM25 + dense retrieval on Elasticsearch, LlamaIndex orchestration, and Llama3 for generation. Evaluated with RAGAS metrics across chunking strategies and retrieval configurations.
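One common way to merge BM25 and dense retrieval results is reciprocal rank fusion, which needs only the two ranked lists, not comparable scores. A minimal sketch (RRF is an assumption here — Elasticsearch also supports other hybrid-scoring schemes, and the project's exact fusion method isn't specified):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists (e.g. BM25 and dense retrieval) by summing
    1 / (k + rank) for each document across all lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The constant `k` damps the influence of top ranks, so a document that appears mid-list in both rankings can outrank one that appears high in only one.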
Parameter-efficient fine-tuning from first principles — every matrix decomposition derived and implemented in plain PyTorch, without adapter libraries. Validated against Hugging Face PEFT outputs for correctness.
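The decomposition at the heart of LoRA is a rank-r update B @ A added to a frozen weight, scaled by alpha/r, with B initialized to zero so training starts from the base model. A sketch in NumPy rather than PyTorch, to keep the linear algebra bare (the function name is illustrative, not from the repo):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """LoRA forward pass: frozen weight W plus a scaled low-rank
    update. A has shape (r, in), B has shape (out, r); B starts
    at zero, so initially the output equals the base model's."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

Only A and B (2 * r * dim parameters per layer instead of dim²) are trained, which is where the parameter efficiency comes from.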
Chat inference on Apple Silicon using MLX — exploring the runtime, quantization options, and packaging story for local LLM deployment with Mistral and Llama2.
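The quantization that makes local inference practical is typically group-wise and affine: each small group of weights gets its own scale and offset, mapping floats into a few bits. A sketch of the general idea in NumPy — this is not MLX's exact scheme (MLX ships its own quantization routines), just an illustration of 4-bit group quantization:

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Group-wise affine 4-bit quantization: each group of weights
    is mapped to integers 0..15 via a per-group scale and minimum."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    scale[scale == 0] = 1.0  # avoid div-by-zero for constant groups
    q = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize_4bit(q, scale, lo):
    """Reconstruct approximate float weights from 4-bit codes."""
    return q * scale + lo
```

Smaller groups mean lower reconstruction error but more per-group metadata — the same trade-off exposed by the bits/group-size options in local runtimes.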