Building production LLM systems, scalable backends, and AI-optimized infrastructure
I'm a senior engineer passionate about building production-grade AI systems at the intersection of LLMs, distributed backends, and cloud-native infrastructure. My work focuses on turning cutting-edge AI research into reliable, scalable systems that solve real-world problems.
I believe in learning by building, sharing knowledge openly, and pushing the boundaries of what's possible with modern AI technology.
Building scalable LLM systems that work in the real world
Implementing cutting-edge research in production
Contributing to and building open-source projects
GPT-4, GPT-4 Turbo, Claude 3, Gemini Pro
Llama 3/3.1, Mistral, Phi-3, Qwen 2, Gemma 2
OpenAI text-embedding-ada-002, Cohere, BGE, E5
vLLM, TGI, ONNX Runtime, Triton
Production-grade Retrieval-Augmented Generation platform with advanced chunking, hybrid search, and multi-LLM support (the retrieval-fusion pattern is sketched below the project list).
Multi-agent system demonstrating agentic AI patterns with tool use, planning, memory, and orchestration (a toy tool-use loop is sketched below the project list).
End-to-end platform for fine-tuning open-source LLMs with experiment tracking, evaluation, and deployment.
High-performance Go backend for LLM service routing, request management, and observability.
Production Kubernetes infrastructure optimized for GPU workloads and model serving with GitOps.
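To make the "hybrid search" item above concrete, here is a minimal, self-contained sketch of one common fusion technique, Reciprocal Rank Fusion (RRF), that merges a keyword ranking with a vector ranking. It is illustrative only and not the project's actual code: the document IDs and the two hard-coded rankings are hypothetical stand-ins for what a BM25 index and a dense retriever would return.

```python
"""Reciprocal Rank Fusion (RRF) over a keyword ranking and a vector ranking.

Illustrative only: a real system would obtain these rankings from a sparse
retriever (e.g. BM25) and a dense retriever (embedding model + ANN index).
"""
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine several ranked lists of doc IDs into one hybrid ranking.

    Each document scores 1 / (k + rank) per list it appears in; k dampens
    the impact of top ranks so no single retriever dominates the fusion.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical output of two retrievers over the same corpus:
bm25_ranking = ["doc_7", "doc_2", "doc_9"]    # keyword / sparse
vector_ranking = ["doc_2", "doc_5", "doc_7"]  # embedding / dense

for doc_id, score in rrf_fuse([bm25_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.4f}")
```

Documents found by both retrievers (doc_2, doc_7) surface first, which is the property hybrid search relies on; in production the fused IDs would then be re-ranked or passed to the LLM as context.

Likewise, the agentic patterns named in the multi-agent project (tool use, planning, memory) can be reduced to a short loop. The sketch below is a deliberately stripped-down illustration, not the project's implementation: `stub_planner`, `run_agent`, and the toy tool registry are hypothetical, and the planner stub stands in for an LLM call that would normally emit structured tool calls.

```python
"""A stripped-down tool-use loop: plan -> call tool -> record observation -> repeat.

The planner below is a stub standing in for an LLM call; a real agent would
parse tool calls out of model output (function calling / JSON) instead.
"""
from typing import Callable, Optional

# Toy tool registry mapping a tool name to a callable (demo-only implementations).
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    "echo": lambda text: text,
}


def stub_planner(task: str, memory: list[str]) -> Optional[tuple[str, str]]:
    """Pretend-LLM: decide the next tool call, or return None when done."""
    if not memory:
        return ("calculator", "6 * 7")
    return None  # one observation is enough for this toy task


def run_agent(task: str, max_steps: int = 5) -> list[str]:
    """Run the plan/act/observe loop, accumulating observations as memory."""
    memory: list[str] = []
    for _ in range(max_steps):
        step = stub_planner(task, memory)
        if step is None:
            break
        tool_name, tool_input = step
        observation = TOOLS[tool_name](tool_input)
        memory.append(f"{tool_name}({tool_input}) -> {observation}")
    return memory


print(run_agent("What is 6 times 7?"))
```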
Building agentic workflows with multi-agent collaboration
Optimizing RAG systems for production (latency, cost, quality)
Exploring RLHF and DPO for LLM alignment
GPU cost optimization strategies for model serving
Responsible AI practices and bias mitigation
I'm always interested in connecting with fellow engineers, researchers, and builders in the AI space.
Open to: