Blog

Notes on building real AI systems.

Writing about retrieval quality, model routing, agent evaluation, and the product decisions behind production-grade AI work.

2026-03-22 · 7 min read

Why RAG Quality Is Mostly Retrieval Design

Most "the model hallucinated" complaints I've debugged turned out to be retrieval bugs three layers down.

RAG, Retrieval, LLM Systems

2026-03-15 · 9 min read

I Scored My LLM's Incident Reports Against a Rubric. Here's What Broke.

Prompt iteration without a regression suite is vibes. A 5-dimension rubric, 5 scenarios, and the surprises that fell out.

Evals, LLM as Judge, SRE, Production AI

2026-02-11 · 7 min read

Agent Systems Need Evals Before They Need More Tools

I've watched two teams add web search, memory, and planning loops to fix what was actually a retrieval-prompt bug. Both took six weeks. The eval would have taken two days.

Agents, Evals, Production AI

2025-11-16 · 6 min read

Model Routing Is a Product Decision, Not Just an Optimization

A real routing layer cut my last project's LLM bill by 62%. The reason it worked was not the cheaper model — it was admitting which calls didn't deserve the expensive one.

Model Routing, AI Product, Cost Engineering

2025-08-21 · 8 min read

Continuous Batching Changes How I Think About LLM Serving

The biggest cost wins I've seen on self-hosted LLM serving came from the scheduler, not the model.

LLM Serving, Continuous Batching, Systems

2025-05-09 · 8 min read

Your Personal GPT Should Be a Knowledge System, Not a Prompt Wrapper

I built two versions of a personal assistant — a clever prompt and a real retrieval system. Only one of them was still useful three months later.

Personal AI, Retrieval, System Design

2025-01-26 · 7 min read

How AI Should Actually Help in Graduate Admissions

I applied to grad school the year ChatGPT shipped. The places AI helped me weren't the places it is marketed for.

AI Product, Education, Graduate Admissions

2024-11-03 · 9 min read

How I Choose AWS Compute for AI and Product Systems

A field guide written after a few too many migrations between Lambda, Fargate, and EKS. The lesson keeps being "match the abstraction to the workload, not your résumé."

AWS, Architecture, Infrastructure

2024-09-12 · 9 min read

Kubernetes, Lambda, and Fargate Are Different Abstractions, Not Just Different Services

They're not three flavors of the same thing. They're different deals about what you own versus what AWS owns. The right pick depends on which parts of that deal you actually want.

AWS, Kubernetes, Lambda, Fargate

2024-06-18 · 8 min read

AI Code Review in CI Needs Scopes, Rubrics, and Escalation

A bot that comments on everything with equal confidence is worse than no bot at all. I have the noisy-PR receipts to prove it.

CI/CD, Code Review, LLM Systems

2024-02-08 · 8 min read

Capacity Planning for AI Products Starts with Traffic Shape

A single QPS number lied to me for a year. The thing that hurt at peak wasn't average load — it was the shape under it.

Capacity Planning, AI Infrastructure, Kubernetes

2023-11-14 · 7 min read

Config-Driven Systems Scale Teams Better Than Hard-Coded Flows

I once watched a team ship six weeks of "add a region" tickets that should have been a config change. The fix wasn't more engineers — it was admitting that the platform owned the wrong things.

System Design, Config-Driven Design, Platform Engineering