Open any AI product discussion in 2026 and someone says "inference economics" within five minutes. Token prices, caching strategies, model routing, gross margins. It gets treated like a new discipline that arrived with the cost crunch. It isn't new. It's just what AI product work looks like when someone in the room has to pay for the feature. For me, that someone showed up in 2023.

Funding came from a spreadsheet, not a demo

In 2023 I was Head of Product at Emma, the Sleep Company, pitching an internal committee on an AI Sleep Coach inside a 1,000-person business that thinks in EBIT. The demo was the easy part. GPT-4 was brand new and we were building on it through early access, while most teams were still on the waitlist. The recommendations looked like magic, and everyone in the room could feel the potential.

The money did not move on the demo. It moved on the unglamorous question underneath: what does this cost per user, per month, at scale, and what does that do to margin? So the business case answered it before anyone asked. Usage assumptions, cost per active user, how the numbers bend as adoption grows, where the levers were if the economics drifted.

That case, together with the roadmap built on it, unlocked €250k in its first month and over €500k of internal funding across two rounds. The lesson has run my product work since: an AI feature is not a product until it has a unit cost. Before that, it's a demo with a burn rate.

What cost discipline actually means

Not procurement. Not finance homework after the build. A small set of product habits:

Pick a unit that means something. Cost per token is a price. Cost per task, or per active user per month, is an economic fact about your product. The first one is what the platform charges you; the second one is what your business model has to survive. Model in the second, always.

Set targets per route. Different jobs deserve different models. The step that drafts a reflection prompt and the step that classifies an input are not the same workload, and paying frontier-model prices for classification work is how margins quietly die. Give each route its own cost and latency target, then hold it there.

Context is the biggest lever. Most teams tune model choice first. In practice the bill is driven by how much you send, not just what you send it to. Context discipline and caching beat model-shopping in almost every case we have touched: switching models saves you percentages, cutting context bloat saves you multiples.

Make cost a ship gate. We run evals as our QA layer, and cost sits inside that same gate. A feature that clears quality but breaks its unit economics does not ship; it is not done, in exactly the way a feature that fails a safety check is not done. "Works" and "works at a price we can charge for" are different claims, and only the second one is a product.

Watch drift. Costs regress the way quality regresses. Prompts grow, context accumulates, someone adds a retry loop, and a route that cleared its target in March is 40 percent over it by June without any single decision causing it. Cost needs regression checks like everything else that matters.

Flowchart of cost as a ship gate: a feature idea is priced first, built on a route, then checked by a gate requiring both eval quality and unit cost at or under target; over-target results loop back to context cuts and rerouting, and production re-measurement feeds cost drift back into the gate.
Price first, gate on quality and cost together, and re-measure in production, because cost drifts.

Built in from day one at Mindlid

At Mindlid this was never a retrofit, because a consumer emotional-wellness companion lives or dies on unit economics at scale. Deep sessions are the product working as intended, and deep sessions are exactly what makes an unpriced AI product bleed. So the operating layer carries cost and latency targets per route, context discipline and caching to keep cost per task low, and gates that check economics alongside reliability and safety before anything deploys.

That is also the through-line of this small series: define done, gate it, price it. Goal-based loops give the work a finish line, evals hold the quality bar, and unit costs decide whether the whole thing is a business or a subsidy.

What I got wrong

Two things, so this doesn't read like hindsight dressed up as strategy.

  • I optimized in the wrong order early on. I chased cheaper models before attacking context size, because model price is visible on a pricing page and context bloat is not visible anywhere until you instrument it. The savings were backwards: the model swap was worth a haircut, the context work was worth multiples.
  • I treated pricing as a launch artifact instead of a living number. The first model was right on day one and quietly wrong within months, as features stacked and prompts grew. Now cost gets re-measured like a regression suite, not re-derived like an annual budget.

"Prices will drop" is not a strategy

The most common objection to all of this is that model prices keep falling, so why bother. Two answers. Prices per token fall, and usage grows faster: cheaper calls get spent on more calls, longer contexts, more ambitious features, and the bill survives the discount. And a price drop rescues your costs without rescuing your discipline; the team that waited for the platform to fix its margins has learned nothing it can reuse, and drifts right back over the line.

Cost was never a platform problem to wait out. It is a product decision you own, in the same list as scope, quality, and safety. In 2023 that was a survival requirement for getting an AI product funded inside a company that counts. In 2026 it has a fashionable name. The practice is the same either way, and it is available to any team willing to put a unit cost next to every feature idea before falling in love with it.

If you want the gate structure we run, including where the cost check sits next to the quality checks, the framework is free to copy: Evals-as-QA Framework.

The cheapest AI feature is the one you priced before you built it.