How a YC Founder Cut Their AI Burn by 62% in a Weekend
The founder, call him D., walked into our Zoom looking like he had not slept. His Series A deck had a chart of AI spend curving up and to the right like a hockey stick, and not in the good way. $80K a month. Forty percent of burn. Growing at 18% month-over-month against revenue growing at 11%. Runway, which used to be 14 months, was now 9.
"The board is going to eat me alive on Thursday," he said. "Can you actually fix this or is this another thing I have to build in-house?"
Three days later his daily AI spend had dropped 62%. The story of how, and what it actually teaches you about running a startup on LLM infrastructure, is below. All numbers are stylized for his privacy but proportionally accurate.
Where the $80K was going
D.'s product is an AI-powered research assistant for mid-market finance teams. The average contract is $1,200/mo across 240 paying customers, so about $288K in MRR. Gross AI COGS of $80K works out to a 72% gross margin on paper, which sounds fine until you remember that SaaS benchmarks want 75-85% and his trajectory was taking him to 65% by Q3.
We spent 40 minutes with his Anthropic and OpenAI invoices and the usage export from his internal logger. Here is where the spend actually lived:
| Workload | Model | Monthly spend | Share | Why they picked it |
|---|---|---|---|---|
| Research synthesis (main loop) | claude-opus-4-1 | $41,200 | 51% | Shipped on launch day, never revisited |
| Query expansion / re-ranking | claude-opus-4-1 | $14,800 | 18% | Same SDK, easy to reuse |
| Summarization on fetch | gpt-4o | $11,300 | 14% | Historical; original MVP was OpenAI-only |
| Classification + routing logic | gpt-4o | $6,900 | 9% | Never benchmarked against cheaper |
| Eval + scoring harness | claude-sonnet-4-5 | $3,400 | 4% | Correct choice, left alone |
| Misc (agents, debugging) | mixed | $2,400 | 3% | Orphan spend no one owned |
Two things jumped out. First, Opus was doing 69% of the spend and maybe 30% of the work that genuinely needed it. Query expansion does not need a frontier model; it needs a decent one that is 90% cheaper. Second, nobody had ever looked at this table. Not D., not his two engineers, not his head of product. They had been flying blind for nine months.
Day 1: install, point, measure
The first move was to do nothing clever. We swapped the base URL in his Anthropic and OpenAI clients to point at the KairosRoute gateway and kept the model names identical — same Opus calls, same 4o calls, nothing routed yet. The only thing that changed was that every request started emitting a receipt row with latency, token counts, tool classification, and an estimated what-if cost for three alternative models.
```python
import os
from anthropic import Anthropic

# Before
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# After (one line change; zero behavior change)
client = Anthropic(
    api_key=os.environ["KAIROSROUTE_API_KEY"],
    base_url="https://api.kairosroute.com/v1",
)
```

Twenty-four hours of passthrough traffic gave us a 180K-request dataset. The headline from the dashboard:
- 73% of Opus calls had a shadow-scored cheaper alternative within 2 points on their own evals.
- p95 latency on Opus was 6.8 seconds for query expansion — an internal, never-user-visible step. Pure waste.
- Classification calls on 4o averaged 180 input tokens. A Haiku-class model does this for roughly 1/12th the cost.
D. looked at the dashboard and said the thing every founder says at this point: "oh god."
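The what-if column behind those findings is not magic; it is arithmetic on the token counts in each receipt row. Here is a minimal sketch of the idea, with placeholder per-million-token prices and an assumed output length rather than anyone's live rate card:

```python
# Placeholder per-million-token prices (input, output) in USD, chosen only to
# illustrate the arithmetic; they are not a rate card for any real provider.
PRICES = {
    "premium": (2.50, 10.00),
    "haiku_class": (0.20, 0.80),
}

def estimated_cost(input_tokens: int, output_tokens: int, tier: str) -> float:
    """What a single request would cost on a given price tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A classification call like the ones in D.'s logs: ~180 input tokens,
# short output (the 40-token output length is an assumption).
actual = estimated_cost(180, 40, "premium")
what_if = estimated_cost(180, 40, "haiku_class")
print(f"actual ${actual:.6f} vs what-if ${what_if:.6f} (~{actual / what_if:.0f}x cheaper)")
```

Run per request against a handful of candidate models, this is enough to quote per-route savings before anything is actually routed, which is exactly what the passthrough day produced.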
Day 2: route by stage, not by dogma
The temptation with routing is to flip a switch and let the router decide everything. That is fine eventually, but on day two you want surgical wins you can defend in a board meeting. We carved the workload into three buckets:
- Keep on frontier. The research synthesis step — the part that generates the user-facing answer — stayed on Opus. That is the moat.
- Move to mid-tier. Summarization on fetch moved to Sonnet via kr-auto with a quality threshold. The router could pick Sonnet or a cheaper alternative depending on the prompt.
- Move to fast/cheap. Query expansion, re-ranking, and classification moved to Haiku-class routing. These are internal steps; users never see the output directly, and eval scores were indistinguishable above 89%.
All of that was config. No code changes beyond the model string on the four routes. We shadow-ran for six hours against live traffic with the old path still primary, compared eval scores side by side, and flipped the cutover that evening.
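To make "all of that was config" concrete, the code-side footprint is roughly a lookup table of model strings per route, with the quality thresholds living at the gateway. The route names and the shape below are illustrative, not KairosRoute's actual config schema:

```python
# Illustrative only: route names are hypothetical and the real routing rules
# (quality thresholds, model ceilings) live in the gateway, not in this file.
MODEL_BY_ROUTE = {
    "research_synthesis": "claude-opus-4-1",  # keep on frontier: the user-facing answer
    "summarize_on_fetch": "kr-auto",          # mid-tier routing with a quality threshold
    "query_expansion": "kr-auto",             # fast/cheap: internal, never user-visible
    "rerank": "kr-auto",
    "classification": "kr-auto",
}

def model_for(route: str) -> str:
    """One lookup at each call site instead of a hard-coded model string."""
    return MODEL_BY_ROUTE[route]
```

Keeping the model string as a per-route lookup is also what makes the eventual cutover boring: changing a route's destination is a one-line diff, not a refactor.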
Day 3: the numbers he took to the board
Projected from 48 hours of post-cutover data to a full month:
| Workload | Before | After | Delta |
|---|---|---|---|
| Research synthesis | $41,200 | $39,800 | -3% |
| Query expansion / re-rank | $14,800 | $1,700 | -89% |
| Summarization on fetch | $11,300 | $3,900 | -65% |
| Classification + routing | $6,900 | $520 | -92% |
| Eval + scoring | $3,400 | $3,400 | 0% |
| Misc | $2,400 | $800 | -67% |
| Total | $80,000 | $30,120 | -62% |
$49,880 a month in savings. An extra 5.2 months of runway at the current burn. Gross margin jumped from 72% to 90%. And because the receipts export was landing in his data warehouse every minute, his CFO could finally tie AI COGS to the customer generating it — something he had been faking in his board deck for nine months.
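If you want to sanity-check the board-deck math, it falls straight out of the totals already quoted:

```python
mrr = 240 * 1_200          # $288,000 MRR from the first section
before, after = 80_000, 30_120

savings = before - after                  # $49,880 a month
margin_before = (mrr - before) / mrr      # 0.722 -> ~72%
margin_after = (mrr - after) / mrr        # 0.895 -> ~90%
cut = savings / before                    # 0.624 -> ~62%
```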
What D. wishes he had done differently
I asked him a month later, over coffee, what he wished he had known. Three things:
- Instrument before you optimize. "I assumed I knew where the spend was. I was wrong on three of the six buckets." A week of passive receipts would have told him that in month two, not month nine.
- Quality thresholds are a feature, not a risk. He had been avoiding cheaper models because of a vague fear of regressions. The shadow eval showed that fear was wrong for 73% of his traffic. See Silent Quality Regression for the discipline around this.
- Cost discipline is a culture thing, not a tool thing. His engineers had never seen the Anthropic bill. Once he put a daily Slack digest of spend per route in the #eng channel, orphan workloads dropped to zero within two weeks.
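That digest is a small script, not a platform project. A minimal sketch, assuming the receipts export lands in a table you can query and that you have a standard Slack incoming-webhook URL; the query helper and its columns are stand-ins for whatever your warehouse looks like:

```python
import os
from datetime import date, timedelta

import requests  # pip install requests

def fetch_spend_by_route(day: date) -> dict[str, float]:
    """Placeholder for a warehouse query; returns {route: dollars} for one day."""
    raise NotImplementedError("query your receipts export here")

def post_daily_digest() -> None:
    yesterday = date.today() - timedelta(days=1)
    spend = fetch_spend_by_route(yesterday)
    lines = [f"AI spend by route for {yesterday}:"]
    for route, dollars in sorted(spend.items(), key=lambda kv: -kv[1]):
        lines.append(f"• {route}: ${dollars:,.2f}")
    lines.append(f"Total: ${sum(spend.values()):,.2f}")
    # Standard Slack incoming webhook: POST a JSON payload with a "text" field.
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": "\n".join(lines)})

if __name__ == "__main__":
    post_daily_digest()  # run once a day from cron or your scheduler
```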
The playbook, generalized
If you are a seed or Series A founder staring at an AI bill that is growing faster than your revenue, here is the 72-hour version:
- Hour 1-2: Swap your base URL to the gateway. Do not route anything yet. Just observe.
- Hour 2-24: Let the receipts stream in. Pull the spend-by-route table.
- Hour 24-36: Identify your top 3 waste lines — internal steps on frontier models, classification on premium models, summarization that never gets evaluated.
- Hour 36-48: Move those routes to kr-auto with quality thresholds and shadow-run them against your eval set (see the sketch after this list).
- Hour 48-72: Cut over. Put a daily spend digest in your team Slack. Write the board update.
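Here is the sketch referenced in the Hour 36-48 step: a shadow run that scores the incumbent model and the routed path side by side on your own eval set. The OpenAI-compatible endpoint and the kr-auto model string follow the post; the eval file format and the score() function are assumptions standing in for your existing harness:

```python
import json
import os
from statistics import mean

from openai import OpenAI  # pip install openai

# The gateway is OpenAI-compatible, so the stock client works; everything past
# the base_url (eval format, scorer) is an assumption about your own harness.
client = OpenAI(
    api_key=os.environ["KAIROSROUTE_API_KEY"],
    base_url="https://api.kairosroute.com/v1",
)

def answer(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def score(candidate: str, reference: str) -> float:
    """Stand-in for your existing eval scorer (LLM judge, exact match, rubric)."""
    raise NotImplementedError

def shadow_run(eval_path: str, old_model: str, new_model: str) -> None:
    with open(eval_path) as f:
        cases = [json.loads(line) for line in f]  # one {"prompt", "reference"} per line
    old = [score(answer(old_model, c["prompt"]), c["reference"]) for c in cases]
    new = [score(answer(new_model, c["prompt"]), c["reference"]) for c in cases]
    print(f"{old_model}: {mean(old):.3f}  vs  {new_model}: {mean(new):.3f}")

# e.g. shadow_run("evals/summarize_on_fetch.jsonl", "gpt-4o", "kr-auto")
```

If the routed path scores within your tolerance on the eval set, cut over; if it does not, you have a concrete regression to investigate before any customer sees it.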
The router is the wedge. The analytics — the receipts, the per-route attribution, the shadow evals, the daily digest — are what actually change how you run the company. Once a founder sees their spend disaggregated for the first time, they never go back to flat invoices.
Try the math on your own bill
Drop your last Anthropic or OpenAI invoice into the savings calculator. It will estimate your routable spend against current prices across the 45+ models in our registry. Most seed-stage founders find between 40% and 70% of their bill is addressable in a weekend.
If you want to see it live before you switch anything, the playground lets you send a prompt through kr-auto and see the routing decision, the quality score, and the cost delta in one pane. No credit card, no commitment. That is how most YC founders start.
Ready to route smarter?
KairosRoute gives you a single OpenAI-compatible endpoint that routes every request to the cheapest model meeting your quality bar — plus the observability, A/B testing, and cost analytics that turn cheaper infrastructure into a durable margin.
Related Reading
AI agents scale 10–100x model calls per user action. If you don't have a per-ticket, per-task, or per-conversation cost model, you are running a business on vibes. Here's how to build one — and what it reveals.
Our Business tier is $499/month. Our Scale tier is $1,499/month. Our Enterprise tier starts at $25K ACV. Are those prices fair for what you get? This post is the real accounting — including a fully transparent 4% managed-key gateway fee.
Most RAG pipelines run every stage on the same frontier model. That is the single biggest cost leak in production AI. Here is the stage-by-stage model selection pattern, with a concrete per-query cost breakdown.