The Real Cost of One AI Conversation — and the Math Behind It

Your AI assistant has been live for one month. It handled about 10,000 conversations — shoppers asking about sizing, stock, shipping, the assistant answering every one. A good month. Then the invoice lands: $1,150. You read it twice. That is not a server, not an ad budget — that is the price of a chatbot talking.

Nothing went wrong. No bug, no abuse, no traffic spike. The assistant did exactly its job, and doing its job is what cost a four-figure sum. And here is the line that should keep you up at night: the store across the street ran the identical assistant, fielded the same 10,000 conversations, and paid $130. One-ninth the price. Not a discount, not a smarter contract — the same software, the same shoppers. The entire gap is one setting on a screen neither owner was ever shown.

This is the quiet trick in every AI pricing page. They tell you it is "cheap" — a fraction of a cent per chat — and they are technically right. But "a fraction of a cent" is the most expensive phrase in software. It hides a 10× spread, and nobody does the multiplication for you. One fraction of a cent buys the $130 month. Another buys the $1,150 one. Same three words on the page.

So let's do the multiplication. No jargon you have not met before, just the arithmetic behind a single conversation, traced through all three models that quietly bill you, until you can read any pricing page and know exactly what you are about to pay.

[Image: Five stacks of small-denomination coins of varying heights on a white background.]
Five stacks, five different heights — the same small coins. The cost of an AI conversation works exactly this way: the unit is tiny, but how high the stack climbs is decided by choices you control.

There are three different model bills hiding inside one conversation:

  1. The language model (LLM) — the part that reads the shopper and writes the replies.
  2. The embedding model — the part that searches your catalog.
  3. The eval model — the part that quietly grades the assistant's answers.

We will price each one, then add them up.

[Diagram: Where the three bills come from. One conversation quietly calls three separate paid models. The shopper asks a question; the language model writes every reply (Bill #1, ~99% of the cost), sends search queries to the embedding model and gets matches back (Bill #2, ≈ $0), and passes each reply to the eval model for grading (Bill #3, a rounding error).]
The shape of the bill. Every shopper turn runs through the language model; that model reaches out to an embedding model to search your catalog and an eval model to grade itself. Three models, three line items — but as you will see, they are wildly unequal.

First: a "conversation" is not one question

The instinct is to think of a chat as one question and one answer. It almost never is. A real shopping conversation looks like this:

Shopper: Do you have running shoes for flat feet?

Assistant: Yes — a few good options. Are these for road or trail?

Shopper: Road, mostly.

Assistant: Then I'd look at these three… (lists products)

Shopper: Is the second one true to size?

Assistant: It runs about half a size small…

Shopper: Okay, add the size 10 to my cart.

Assistant: Done — anything else?

The excerpt above shows four exchanges; add a couple of follow-up questions and a chat like this easily reaches six back-and-forth turns, the length used in the rest of this article. Each turn is a separate request to the language model. And here is the part that surprises people: every turn re-sends the entire conversation so far. The model has no memory between turns, so to answer turn 6 it must be handed turns 1 through 5 again, in full.

Hold onto that. It is the single biggest driver of the bill.

The unit you actually pay for: tokens

You do not pay per message or per word. You pay per token. A token is a chunk of text — roughly ¾ of a word in English. "Running shoes for flat feet" is about 6 tokens. A typical sentence is 15–25 tokens. A paragraph is 80–120.

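Two rules of thumb, both following from the ¾ ratio:

  1. 1,000 tokens is roughly 750 words.
  2. 1 word is roughly 1.3 tokens, so a 300-word page is about 400 tokens.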

And tokens come in two kinds, priced differently:

| Token type | What it is | Relative price |
| --- | --- | --- |
| Input | Everything you send to the model: the shopper message, the conversation history, your store instructions, product data | Cheaper |
| Output | Everything the model writes back | 3–5× more expensive |

Output is the expensive one. The assistant writing a paragraph costs several times more than it reading a paragraph.

What gets sent on every single turn

When the assistant answers one message, the request is not just that message. It is a stack:

| Piece | What it is | Typical size |
| --- | --- | --- |
| System instructions | "You are a helpful shopping assistant for [Store]. Be concise. Never invent prices…" plus the list of actions it is allowed to take (search, add to cart, check stock) | ~2,000 tokens |
| Conversation history | Every previous message, shopper and assistant, word for word | grows each turn |
| Retrieved product data | The catalog entries pulled in to answer this question | ~800 tokens |
| The new shopper message | What they just typed | ~50 tokens |

The system instructions and retrieved data are sent fresh every turn. The history grows every turn. The shopper typed 50 tokens — but the model receives close to 3,000.
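To make the stack concrete, here is a minimal Python sketch of how one turn's request might be assembled. The role/content message layout follows a common chat-API convention, but `build_request`, the shortened system prompt, and the placeholder catalog text are illustrative assumptions, not any particular vendor's API.

```python
def build_request(system_prompt, history, retrieved_products, new_message):
    """Assemble the full payload for one turn (hypothetical message format)."""
    messages = [{"role": "system", "content": system_prompt}]
    # Every earlier message is re-sent verbatim; the model keeps no memory.
    messages.extend(history)
    # Catalog entries retrieved for this specific question.
    messages.append({"role": "system", "content": "Catalog matches:\n" + retrieved_products})
    messages.append({"role": "user", "content": new_message})
    return messages


system_prompt = "You are a helpful shopping assistant. Be concise. Never invent prices."
turns = [
    ("Do you have running shoes for flat feet?", "Yes. Are these for road or trail?"),
    ("Road, mostly.", "Then I'd look at these three."),
]

history = []
sizes = []
for shopper_text, assistant_text in turns:
    request = build_request(system_prompt, history, "(product data)", shopper_text)
    sizes.append(len(request))
    # After the turn, both sides of the exchange join the history.
    history.append({"role": "user", "content": shopper_text})
    history.append({"role": "assistant", "content": assistant_text})

print(sizes)  # [3, 5]
```

The payload grows by two messages per turn even though the shopper only ever adds one short line.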

The hidden multiplier: re-sending the conversation

Let's price the six-turn chat above. Assume each shopper message is ~50 tokens, each assistant reply ~150 tokens, system instructions ~2,000 tokens, and retrieved product data ~800 tokens per turn.

Because every turn re-sends everything before it, the input grows turn over turn:

| Turn | Input sent to model (tokens) | Output written (tokens) |
| --- | --- | --- |
| 1 | 2,000 + 800 + 0 history + 50 = 2,850 | 150 |
| 2 | 2,000 + 800 + 200 history + 50 = 3,050 | 150 |
| 3 | 2,000 + 800 + 400 history + 50 = 3,250 | 150 |
| 4 | 2,000 + 800 + 600 history + 50 = 3,450 | 150 |
| 5 | 2,000 + 800 + 800 history + 50 = 3,650 | 150 |
| 6 | 2,000 + 800 + 1,000 history + 50 = 3,850 | 150 |
| Total | 20,100 input tokens | 900 output tokens |
[Chart: Every turn re-sends the whole conversation. The shopper typed ~300 tokens; the model received 20,100. Input per turn: 2,850 (turn 1), 3,050 (turn 2), 3,250 (turn 3), 3,450 (turn 4), 3,650 (turn 5), 3,850 (turn 6). Fixed block: system prompt + product data, re-sent every turn. Growing block: conversation history.]
Each bar is one turn. The pale block — the system prompt and product data — is identical every time and never shrinks. The solid block on top is the conversation history, growing turn after turn. The shopper's actual words are a sliver of either.

Here is the headline. The shopper typed about 300 tokens of text. The conversation consumed 20,100 input tokens — roughly 67× more. The system instructions alone (2,000 × 6 turns = 12,000 tokens) account for more than half the bill.

This is not waste — it is how the technology works. But it explains why "the messages were so short, why did it cost that much?" has a real answer.
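The growth in the table can be reproduced in a few lines of Python. This is a sketch using the article's example sizes; the constant and function names are made up for illustration.

```python
# Token sizes taken from the worked example above.
SYSTEM_TOKENS = 2_000     # system instructions, re-sent every turn
RETRIEVED_TOKENS = 800    # product data pulled in on each turn
SHOPPER_TOKENS = 50       # one shopper message
REPLY_TOKENS = 150        # one assistant reply


def input_tokens_per_turn(turns):
    """Return the input size of each turn when the full history is re-sent."""
    sizes = []
    history = 0  # tokens of earlier shopper messages and assistant replies
    for _ in range(turns):
        sizes.append(SYSTEM_TOKENS + RETRIEVED_TOKENS + history + SHOPPER_TOKENS)
        # Once the turn is answered, the message and the reply join the history.
        history += SHOPPER_TOKENS + REPLY_TOKENS
    return sizes


per_turn = input_tokens_per_turn(6)
total_input = sum(per_turn)
total_output = 6 * REPLY_TOKENS

print(per_turn)      # [2850, 3050, 3250, 3450, 3650, 3850]
print(total_input)   # 20100
print(total_output)  # 900
```

Changing the turn count shows how quickly input grows: under the same assumptions, a 12-turn chat sends 47,400 input tokens, well over double the six-turn total.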

Bill #1: the language model

Now apply price. Model pricing varies enormously, so let's use two vendor-neutral tiers that bracket the real market:

| Tier | Input price (per 1M tokens) | Output price (per 1M tokens) |
| --- | --- | --- |
| Flagship (top-end reasoning model) | $5.00 | $15.00 |
| Mid-tier (fast, capable, cheaper) | $0.50 | $1.50 |

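Flagship model:

  Input: 20,100 tokens × $5.00 / 1M = $0.1005
  Output: 900 tokens × $15.00 / 1M = $0.0135
  Total: $0.1140 per conversation, about 11 cents

Mid-tier model:

  Input: 20,100 tokens × $0.50 / 1M ≈ $0.0101
  Output: 900 tokens × $1.50 / 1M ≈ $0.0014
  Total: ≈ $0.0115 per conversation, just over 1 cent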

Same conversation. Same shopper. A 10× difference, decided entirely by which model the assistant runs on. That is the choice nobody puts in front of you — and it is the most important one on the page.
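The two-tier comparison is a single formula applied twice. A minimal sketch, assuming the token totals and tier prices from this article (`llm_cost` is an illustrative name):

```python
def llm_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in dollars of one conversation, with prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


flagship = llm_cost(20_100, 900, 5.00, 15.00)
mid_tier = llm_cost(20_100, 900, 0.50, 1.50)

print(f"{flagship:.4f}")             # 0.1140
print(f"{mid_tier:.4f}")             # 0.0114
print(f"{flagship / mid_tier:.1f}")  # 10.0
```

Computed without intermediate rounding, the mid-tier figure is $0.0114. The $0.0115 quoted in this article appears to come from rounding the input and output lines up before adding them.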

Bill #2: the embedding model (catalog search)

When the shopper asks for "running shoes for flat feet," the assistant cannot read your whole catalog every time — that would be far too many tokens. Instead it uses embeddings.

An embedding turns a piece of text into a list of numbers that captures its meaning. Products with similar meaning end up with similar numbers. To search, you embed the shopper's question and find the catalog entries whose numbers are closest. This is what lets "flat feet" surface a shoe described as "stability / motion control" even though those exact words never matched.

Embeddings have two costs:

a) Indexing your catalog: a one-time cost. Every product description gets embedded once, then re-embedded only when it changes. For a 5,000-product store at ~200 tokens per product, at an embedding price of around $0.02 per 1M tokens:

  5,000 × 200 = 1,000,000 tokens × $0.02 / 1M = $0.02

b) Searching during the conversation: a per-conversation cost. Each search embeds only the shopper's short query. Six searches × ~50 tokens = 300 tokens:

  300 tokens × $0.02 / 1M = $0.000006

That is six millionths of a dollar. For practical purposes, the embedding cost of a conversation is zero. It matters for catalog indexing, not for the per-chat bill. Good to know — mostly so you are not upsold on it.
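Both embedding costs can be checked with the same per-million arithmetic. The $0.02 per 1M tokens rate below is an assumption: it is the price implied by 300 tokens costing six millionths of a dollar, not a quoted vendor rate.

```python
EMBED_PRICE_PER_M = 0.02  # assumed dollars per 1M tokens, inferred from the $0.000006 figure


def embedding_cost(tokens, price_per_m=EMBED_PRICE_PER_M):
    """Dollar cost of embedding the given number of tokens."""
    return tokens * price_per_m / 1_000_000


index_cost = embedding_cost(5_000 * 200)  # one-time: 5,000 products at ~200 tokens each
search_cost = embedding_cost(6 * 50)      # per conversation: six ~50-token queries

print(f"{index_cost:.2f}")   # 0.02
print(f"{search_cost:.6f}")  # 0.000006
```

Even at 10,000 conversations a month, search embeddings come to about six cents under these assumptions.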

Bill #3: the eval model (quality control)

The third model is the one most store owners have never heard of. An eval model is a second, usually smaller, language model whose job is to grade the assistant's answers — automatically checking things like: did it stay on topic, did it invent a price, was it actually helpful?

You do not need this on every conversation. It is a quality-control sample — like a factory checking 1 in 20 units, not all of them. But when it runs, it is another model call, so it has a cost.

Grading one assistant reply means sending the eval model the question, the answer, and a rubric (~600 input tokens) and getting back a short verdict (~100 output tokens). Eval almost always runs on a cheap small model — say $0.15 /1M input, $0.60 /1M output.

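If you graded all six turns of our conversation:

  Input: 6 × 600 = 3,600 tokens × $0.15 / 1M = $0.00054
  Output: 6 × 100 = 600 tokens × $0.60 / 1M = $0.00036
  Total: $0.0009 per conversation, less than a tenth of a cent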

And if you sample 1 conversation in 20 instead of grading every turn, the eval cost effectively disappears. It is a rounding error either way — but it is real, and it is the reason your assistant keeps getting better instead of quietly drifting.
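A short sketch of the eval arithmetic, including the effect of sampling. The prices and token counts are the ones given above; `sample_rate` is an illustrative parameter for the fraction of conversations that get graded.

```python
EVAL_INPUT_PRICE = 0.15   # dollars per 1M input tokens
EVAL_OUTPUT_PRICE = 0.60  # dollars per 1M output tokens


def eval_cost_per_reply(input_tokens=600, output_tokens=100):
    """Cost of grading one assistant reply."""
    return (input_tokens * EVAL_INPUT_PRICE
            + output_tokens * EVAL_OUTPUT_PRICE) / 1_000_000


def eval_cost_per_conversation(turns=6, sample_rate=1.0):
    """Average eval cost per conversation when only a fraction is graded."""
    return turns * eval_cost_per_reply() * sample_rate


print(f"{eval_cost_per_reply():.6f}")                           # 0.000150
print(f"{eval_cost_per_conversation():.6f}")                    # 0.000900
print(f"{eval_cost_per_conversation(sample_rate=1 / 20):.6f}")  # 0.000045
```

At 10,000 conversations a month that works out to about $9 with every turn graded, or about $0.45 when grading 1 conversation in 20.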

Putting the whole bill together

One six-turn conversation, all three models added up:

| Component | Flagship LLM | Mid-tier LLM |
| --- | --- | --- |
| Language model | $0.1140 | $0.0115 |
| Embedding (search) | $0.000006 | $0.000006 |
| Eval (all turns graded) | $0.0009 | $0.0009 |
| Total per conversation | ~$0.115 | ~$0.013 |
[Chart: The same conversation, two model tiers, shown per conversation and per month at 10,000 conversations. Flagship model: $0.115 per chat, $1,150 / month. Mid-tier model: $0.013 per chat, $130 / month.]
The whole article in one picture. Identical conversation, identical shopper — the only variable is which language model answered. That single choice is the difference between a $130 month and a $1,150 one.

The language model is the bill: about 99% of it on the flagship tier, and still over 90% on the mid-tier. Embeddings are free in practice. Eval is a rounding error. So when you compare AI assistants, do not get lost in feature lists about "advanced retrieval" or "evaluation pipelines." Ask which language model answers the shopper, and at what tier. That one answer sets your cost.

To make it concrete, at 10,000 conversations a month:

  Flagship: 10,000 × $0.115 = $1,150 per month
  Mid-tier: 10,000 × $0.013 = $130 per month

Same traffic. Same store. The gap is a model choice.
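For readers who want to rerun the whole bill with their own numbers, here is a sketch that adds the three components and scales to a month. All inputs are this article's example figures, and the names are illustrative.

```python
TIERS = {
    "flagship": {"input": 5.00, "output": 15.00},  # dollars per 1M tokens
    "mid-tier": {"input": 0.50, "output": 1.50},
}
EMBEDDING_COST = 0.000006  # per conversation, six short searches
EVAL_COST = 0.0009         # per conversation, all six turns graded
CONVERSATIONS_PER_MONTH = 10_000


def per_conversation(tier):
    """Return (language-model cost, total cost) for one six-turn conversation."""
    prices = TIERS[tier]
    llm = (20_100 * prices["input"] + 900 * prices["output"]) / 1_000_000
    return llm, llm + EMBEDDING_COST + EVAL_COST


for tier in TIERS:
    llm, total = per_conversation(tier)
    monthly = total * CONVERSATIONS_PER_MONTH
    print(f"{tier}: ${total:.4f} per chat, ${monthly:,.2f} per month, "
          f"LLM share {llm / total:.1%}")
# flagship: $0.1149 per chat, $1,149.06 per month, LLM share 99.2%
# mid-tier: $0.0123 per chat, $123.06 per month, LLM share 92.6%
```

Without intermediate rounding, the mid-tier month comes to roughly $123; the $130 quoted in this article reflects rounding the per-chat cost up to $0.013 first. Either way the gap between tiers is about nine- to ten-fold.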

What actually moves the meter

If you want the bill lower without making the assistant worse, these are the five real levers — in order of impact:

  1. Model tier. The 10× lever. Many stores do not need a flagship model to recommend shoes; a strong mid-tier model handles ordinary shopping questions well. The best setups route — cheap model for simple chats, flagship only for genuinely hard ones.
  2. Prompt caching. Those 2,000-token system instructions are identical on every turn. Most providers let you cache that fixed block so re-sending it costs a fraction of full price. On a long conversation this alone can cut the input bill by half or more. Ask if your provider uses it.
  3. System prompt size. A bloated 5,000-token instruction block is sent on every turn of every conversation forever. Tightening it is a permanent discount.
  4. History trimming. A 30-turn conversation does not need turn 1 verbatim. Summarizing or dropping old turns stops the input from growing without limit.
  5. Eval sampling. Grade a representative sample, not every message. You get the quality signal at a fraction of the cost.

Notice what is not on the list: embeddings and search. They are cheap enough that optimizing them saves you nothing.
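The input-side levers (caching, prompt size, history trimming) can be compared with a small simulation. This is a simplified sketch: the 90% cache discount is a hypothetical figure, real providers price cached tokens differently and may charge for cache writes, and trimming here simply drops the oldest turns rather than summarizing them.

```python
from typing import Optional


def input_cost(
    turns: int,
    system_tokens: int = 2_000,
    retrieved_tokens: int = 800,
    shopper_tokens: int = 50,
    reply_tokens: int = 150,
    price_per_m: float = 5.00,                # flagship input price from this article
    cache_discount: float = 0.0,              # hypothetical discount on the cached system prompt
    max_history_turns: Optional[int] = None,  # keep only the most recent N turns
) -> float:
    """Input-token cost in dollars of one conversation under the given settings."""
    billable_tokens = 0.0
    for turn in range(turns):
        kept = turn if max_history_turns is None else min(turn, max_history_turns)
        history = kept * (shopper_tokens + reply_tokens)
        # The first turn pays full price; later turns reuse the cached prompt.
        system_factor = 1.0 if turn == 0 else 1.0 - cache_discount
        billable_tokens += (system_tokens * system_factor
                            + retrieved_tokens + history + shopper_tokens)
    return billable_tokens * price_per_m / 1_000_000


print(f"{input_cost(6):.4f}")                       # 0.1005  baseline
print(f"{input_cost(6, cache_discount=0.9):.4f}")   # 0.0555  prompt caching
print(f"{input_cost(6, system_tokens=1_000):.4f}")  # 0.0705  leaner system prompt
print(f"{input_cost(6, max_history_turns=2):.4f}")  # 0.0945  history trimming
```

On a six-turn chat the fixed system prompt dominates, so caching and prompt size move the bill most; history trimming matters more as conversations get longer.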

The takeaway for store owners

Cheap is not a number. Now you have the number — and the math to check anyone else's.
