Notes on building real AI systems.
Writing about retrieval quality, model routing, agent evaluation, and the product decisions behind production-grade AI work.
Why RAG Quality Is Mostly Retrieval Design
Most "the model hallucinated" complaints I've debugged turned out to be retrieval bugs three layers down.
I Scored My LLM's Incident Reports Against a Rubric. Here's What Broke.
Prompt iteration without a regression suite is vibes. A 5-dimension rubric, 5 scenarios, and the surprises that fell out.
Agent Systems Need Evals Before They Need More Tools
I've watched two teams add web search, memory, and planning loops to fix what was actually a retrieval-prompt bug. Both took six weeks. The eval would have taken two days.
Model Routing Is a Product Decision, Not Just an Optimization
A real routing layer cut my last project's LLM bill by 62%. The reason it worked was not the cheaper model — it was admitting which calls didn't deserve the expensive one.
Continuous Batching Changes How I Think About LLM Serving
The biggest cost wins I've seen on self-hosted LLM serving came from the scheduler, not the model.
Your Personal GPT Should Be a Knowledge System, Not a Prompt Wrapper
I built two versions of a personal assistant — a clever prompt and a real retrieval system. Only one of them was still useful three months later.
How AI Should Actually Help in Graduate Admissions
I applied to grad school the year ChatGPT shipped. The places AI helped me weren't the places it gets marketed for.
How I Choose AWS Compute for AI and Product Systems
A field guide written after a few too many migrations between Lambda, Fargate, and EKS. The lesson keeps being "match the abstraction to the workload, not your résumé."
Kubernetes, Lambda, and Fargate Are Different Abstractions, Not Just Different Services
They're not three flavors of the same thing. They're different deals about what you own versus what AWS owns. The right pick depends on which parts of that deal you actually want.
AI Code Review in CI Needs Scopes, Rubrics, and Escalation
A bot that comments on everything with equal confidence is worse than no bot at all. I have the noisy-PR receipts to prove it.
Capacity Planning for AI Products Starts with Traffic Shape
A single QPS number lied to me for a year. What hurt at peak wasn't the average load; it was the shape of the traffic underneath it.
Config-Driven Systems Scale Teams Better Than Hard-Coded Flows
I once watched a team ship six weeks of "add a region" tickets that should have been a config change. The fix wasn't more engineers — it was admitting that the platform owned the wrong things.