Question 1

How is the monthly cost of an AI chat agent calculated?

Accepted Answer

The monthly cost is (input tokens ÷ 1,000,000 × the model’s input price) + (output tokens ÷ 1,000,000 × its output price), because model prices are quoted per million tokens. WisWes derives the monthly token counts by multiplying your unique monthly users by the engagement rate to get conversations, then multiplying the conversations by the estimated input and output tokens per conversation. Those per-conversation estimates are built from the system prompt, tool definitions, retrieved RAG context, accumulated history and the model’s answers.
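
The formula above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not the calculator's source: the function name is hypothetical, and the token counts and per-million prices in the usage line are placeholders, not values from the WisWes rate card.

```python
def monthly_cost(
    users: int,
    engagement_rate: float,
    input_tokens_per_conversation: int,
    output_tokens_per_conversation: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated monthly model spend, in the same currency as the prices."""
    # Conversations per month = unique monthly users x engagement rate.
    conversations = users * engagement_rate
    # Monthly tokens, kept separate because input and output are priced separately.
    input_tokens = conversations * input_tokens_per_conversation
    output_tokens = conversations * output_tokens_per_conversation
    # Prices are quoted per 1,000,000 tokens.
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


# Placeholder inputs: 5,000 users, 20% engagement, hypothetical tokens and prices.
cost = monthly_cost(5_000, 0.20, 50_000, 3_000, 0.50, 3.00)
print(f"${cost:.2f}")  # prints $34.00
```

Because every term scales with `users`, doubling traffic at the same engagement rate doubles the result.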

Question 2

How do I use the WisWes cost calculator?

Accepted Answer

Start with the E-commerce profile to set a baseline for your industry, then enter your monthly traffic and engagement under Audience. The estimated monthly model spend updates live as you adjust retrieval, tools and model. Use the Token estimator if you’re unsure what a token value should be.

Question 3

Where do the default values come from?

Accepted Answer

The defaults mirror a real WisWes agent: ~20% on-site chat engagement (Tidio benchmark), ~1,200 tokens of retrieved context per turn, ~17 tools at ~190 tokens each (all re-sent on every call), and the per-token rate card used by the WisWes backend.

Question 4

How accurate is this AI agent cost estimate?

Accepted Answer

This is a planning estimate, not a bill. WisWes models token counts from a typical turn structure, so your real spend varies with how chatty shoppers are, how often tools fire and how much context you retrieve. Treat it as a grounded ballpark and refine the inputs with your own analytics.

Question 5

Can I share or link a specific estimate?

Accepted Answer

Yes. The WisWes calculator reads its values from the URL, so a link like /calculator?users=5000&model=claude-sonnet-4-6 opens pre-filled. The WisWes chat assistant uses this to hand you a ready-made estimate from a conversation.
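
As a sketch, such a link can be built and read back with Python's standard `urllib.parse`. Only the `users` and `model` parameters appear in the example link above; the helper names are hypothetical, and the full set of query parameters the calculator accepts is not documented here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def calculator_link(**params: object) -> str:
    """Build a relative calculator URL whose query string pre-fills the form."""
    return "/calculator?" + urlencode(params)


def read_params(url: str) -> dict[str, str]:
    """Parse a calculator URL back into {field: value}, keeping the first value per key."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


link = calculator_link(users=5000, model="claude-sonnet-4-6")
print(link)               # /calculator?users=5000&model=claude-sonnet-4-6
print(read_params(link))  # {'users': '5000', 'model': 'claude-sonnet-4-6'}
```

`urlencode` percent-escapes any characters that are unsafe in a query string, so free-text values survive the round trip.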

Question 6

What does the E-commerce profile do?

Accepted Answer

The E-commerce profile sets a sensible baseline for every cost parameter from three inputs: your industry (one of 10 top e-commerce verticals), B2B or B2C, and your current conversion rate. Choosing a profile retunes retrieval size, tool count, conversation length and engagement, and you can still fine-tune any value afterwards.

Question 7

Which e-commerce industries does the calculator cover?

Accepted Answer

The WisWes calculator covers the 10 largest e-commerce industries: general/multi-category, fashion & apparel, consumer electronics, health & beauty, home & furniture, food/beverage & grocery, sports & outdoors, toys/kids & hobbies, jewelry & accessories, and auto parts & accessories. Each preset tunes retrieval and tool counts to that vertical’s typical data depth.

Question 8

Why does electronics cost more than clothing for an AI agent?

Accepted Answer

Electronics costs more because it carries far more per-product data — specs, compatibility and comparisons — so a WisWes agent retrieves larger context and needs more tools, raising tokens per turn. Clothing leans on a few attributes (size, colour, fit), so its retrieval and tool footprint is lighter and cheaper per conversation.

Question 9

How does B2B vs B2C change the estimate?

Accepted Answer

B2B raises the estimate because B2B conversations run longer and more technical — quotes, accounts and approvals — so the profile adds messages per conversation, more tools, a larger system prompt and an extra reasoning round-trip. B2C stays leaner and higher-volume.

Question 10

How is my conversion rate used in the estimate?

Accepted Answer

Your purchase conversion rate is used as a buying-intent signal: higher-converting traffic tends to engage the assistant more, so the profile nudges the chat engagement rate up with it. It does not change per-token pricing — it only sets a smarter engagement starting point.

Question 11

Can I override the auto-tuned values?

Accepted Answer

Yes. The E-commerce profile only sets starting values — every slider and field below stays editable, so you can match your own measured numbers after picking an industry.

Question 12

How do I find my unique monthly users?

Accepted Answer

Your unique monthly users are the unique visitors to your store in a typical month — not pageviews or sessions. In Google Analytics 4 it is the “Users” (or “Total users”) metric; in Shopify Analytics it is “Online store visitors”; in Plausible it is “Unique visitors”.

Question 13

How do I estimate my engagement rate?

Accepted Answer

Engagement rate is the share of visitors who actually start a chat: conversations started ÷ unique visitors, over the same period. If your widget reports “chats opened”, divide that by visitors. If you don’t track it yet, start with the 20% default — the typical on-site bot engagement rate (Tidio data) — and refine once you have your own numbers.
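
A minimal sketch of that calculation, assuming you have chat starts and unique visitors for the same period. The function name is hypothetical; the 20% fallback is the default described above.

```python
from typing import Optional

DEFAULT_ENGAGEMENT = 0.20  # typical on-site bot engagement (Tidio benchmark)


def engagement_rate(chats_started: Optional[int], unique_visitors: int) -> float:
    """Share of visitors who start a chat, measured over the same period."""
    if unique_visitors <= 0:
        raise ValueError("unique_visitors must be positive")
    if chats_started is None:  # not tracked yet: fall back to the default
        return DEFAULT_ENGAGEMENT
    return chats_started / unique_visitors


print(engagement_rate(1_800, 12_000))  # 0.15
print(engagement_rate(None, 12_000))   # 0.2
```

Zero chats is treated as a real measurement (0%), while `None` means "not tracked", which is why the check uses `is None` rather than a falsy test.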

Question 14

What counts as a “conversation” vs a “message”?

Accepted Answer

A conversation is one shopper’s chat session with the agent; a message is a single turn within it. The calculator counts tokens per message, and each conversation contains several messages, so both the engagement rate (how many conversations) and messages per conversation drive the cost.

Question 15

How many messages should I assume per conversation?

Accepted Answer

Count the user’s turns (messages they send), not the bot’s replies, in a typical engaged chat. A quick product question is 2–3; a guided “help me choose” or support flow is 6–10. The default is 6. More messages mean more LLM calls, and history grows each turn, so this scales cost noticeably.

Question 16

Does more traffic always mean proportionally more cost?

Accepted Answer

Roughly yes — cost scales with conversations, which is users × engagement rate, so doubling traffic at the same engagement roughly doubles the model spend. The bigger non-linear levers are conversation length, retrieval size and model choice.

Question 17

What is RAG and why does it add cost?

Accepted Answer

RAG (retrieval-augmented generation) injects relevant snippets from your catalog, FAQs and policies into the prompt so the agent answers from your data instead of guessing. Those retrieved tokens are added to the input on the turns that search, so larger or more numerous results raise cost.

Question 18

How do I measure my RAG (knowledge-base) context size?

Accepted Answer

RAG context per turn ≈ (number of results you return) × (tokens per result). Take one real retrieved chunk — a product snippet or an FAQ answer — paste it into the Token estimator to get its token count, then multiply by how many you show per answer. WisWes defaults to ~1,200 tokens (≈8 product matches or ≈3 FAQ answers).
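
The same rule of thumb as a short Python sketch. The per-result sizes of 150 and 400 tokens are not measured values; they are simply what the ~1,200-token default implies when divided by 8 product matches or 3 FAQ answers.

```python
def rag_tokens_per_turn(results_returned: int, tokens_per_result: int) -> int:
    """RAG context per turn ~ number of results x tokens per result."""
    return results_returned * tokens_per_result


print(rag_tokens_per_turn(8, 150))  # 1200: eight product snippets of ~150 tokens
print(rag_tokens_per_turn(3, 400))  # 1200: three FAQ answers of ~400 tokens
print(rag_tokens_per_turn(4, 150))  # 600: halving the results halves RAG tokens
```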

Question 19

Does my AI shopping agent need RAG?

Accepted Answer

You need RAG if your agent must answer from your live catalog, specs or policies — that grounding is what keeps answers accurate. A purely scripted or FAQ-light bot can switch RAG off here to see the lower-bound cost, but most commerce agents rely on it.

Question 20

What happens if I return more results per answer?

Accepted Answer

Cost rises roughly linearly with results shown: 8 product matches cost about twice the tokens of 4. Returning just enough to answer well — rather than a long list — is an easy way to trim per-turn tokens.

Question 21

How do I count tools and tokens per tool?

Accepted Answer

Count the distinct actions your agent can take — search catalog, recommend, add to cart, apply discount, track order, hand off to a human, and so on. Each tool ships a name, description and JSON parameter schema; serialized, that is ~150–250 tokens (WisWes averages ~190). To measure your own, paste one tool’s JSON definition into the Token estimator.

Question 22

Why are tool and prompt tokens counted on every message?

Accepted Answer

WisWes re-sends the full tool definitions and system prompt on every LLM call and does not apply a prompt-cache discount today, so those input tokens are billed fresh on each turn. That makes the tool count and prompt size meaningful cost levers — trimming unused tools or a bloated system prompt lowers every single call.

Question 23

Do unused tools still cost me?

Accepted Answer

Yes, on every call where they’re offered. WisWes sends the full tool list as input tokens each turn with no cache discount, so a tool the shopper never triggers still adds its ~190 tokens to every message. Pruning tools the agent doesn’t need lowers every call.

Question 24

How do I measure one tool’s token size?

Accepted Answer

Copy the tool’s serialized definition (its name, description and JSON parameter schema) and paste it into the Token estimator. Then multiply that representative tool’s size by your tool count, or measure your largest few tools, average them and multiply the average by your tool count.
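
A small sketch of that arithmetic, assuming you already have token counts for a few tools from the Token estimator. The function name and the three measured sizes in the second example are hypothetical; the 17 × ~190 case is the WisWes default.

```python
from statistics import mean


def tool_tokens_per_call(measured_tool_sizes: list[int], tool_count: int) -> int:
    """Estimate the tool-definition tokens sent with every LLM call.

    measured_tool_sizes: token counts of a few tools pasted into the Token
    estimator; including your largest tools gives a conservative average.
    """
    if not measured_tool_sizes:
        raise ValueError("measure at least one tool")
    return round(mean(measured_tool_sizes) * tool_count)


print(tool_tokens_per_call([190], 17))            # 3230: the WisWes default
print(tool_tokens_per_call([240, 180, 210], 12))  # 2520: hypothetical measurements
```

Because this total is re-sent on every call, it is multiplied again by calls per message and messages per conversation.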

Question 25

What is a token in LLM pricing?

Accepted Answer

A token is the unit language models read and write — roughly 4 characters, or about 0.75 words, of English. Model pricing is quoted per million tokens, split between input (what you send) and output (what the model generates).

Question 26

What’s the difference between input and output tokens?

Accepted Answer

Input tokens are everything you send the model each call — system prompt, tool definitions, retrieved RAG, conversation history and the user’s message. Output tokens are what the model writes back. They’re priced separately, and output usually costs 4–10× more per token.

Question 27

Why is input usually the bigger cost?

Accepted Answer

Because you send a lot of it every call — system prompt + all tool definitions + RAG + growing history — while the answer is comparatively short. Even though output is priced higher per token, the sheer input volume on each turn usually dominates the bill.
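
One way to see this: if an output token costs r times an input token, input dominates whenever input tokens exceed r × output tokens. The sketch below computes the input share of spend; the 60,000 / 3,000 token split and the 5× price ratio are hypothetical illustration values.

```python
def input_share(input_tokens: int, output_tokens: int, output_price_ratio: float) -> float:
    """Fraction of spend from input tokens, with the input price normalised to 1.

    output_price_ratio: how many times more an output token costs than an input token.
    """
    input_cost = input_tokens * 1.0
    output_cost = output_tokens * output_price_ratio
    return input_cost / (input_cost + output_cost)


print(f"{input_share(60_000, 3_000, 5):.0%}")  # 80%
```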

Question 28

How do I estimate tokens from my own text?

Accepted Answer

Use the Token estimator on this page: paste any text — a prompt, a product description, a tool schema — and it returns characters, words and an estimated token count using the rule of thumb of roughly 4 characters per token. Then click “apply” to drop that number into the system-prompt, RAG or tokens-per-tool field.
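
The rule of thumb behind the estimator fits in a few lines. This is a sketch, not the page's implementation: rounding up is an assumption, and real tokenizers vary by model and language, so treat the result as an approximation.

```python
def estimate_tokens(text: str) -> dict[str, int]:
    """Rough counts using the ~4 characters per token rule of thumb for English."""
    chars = len(text)
    words = len(text.split())
    # Ceiling division: any non-empty text counts as at least one token.
    tokens = -(-chars // 4)
    return {"characters": chars, "words": words, "tokens": tokens}


print(estimate_tokens("Free returns within 30 days."))
# {'characters': 28, 'words': 5, 'tokens': 7}
```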

Question 29

Does conversation history get re-sent every turn?

Accepted Answer

Yes. Each LLM call includes the recent conversation history so the model has context, and WisWes re-sends it every turn (windowed to the last ~20 messages). That’s why longer conversations cost more than the message count alone suggests.
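
A sketch of how windowed history grows turn by turn. It assumes each completed turn adds two messages (the user's message and the agent's answer) and that the 20-message window counts both; the 50- and 500-token sizes are placeholders, and tool-call messages within a turn are ignored.

```python
def history_tokens_per_turn(
    turns: int, user_tokens: int, answer_tokens: int, window: int = 20
) -> list[int]:
    """History tokens re-sent with each user turn, windowed to the last `window` messages."""
    history: list[int] = []   # token size of each earlier message, oldest first
    per_turn: list[int] = []
    for _ in range(turns):
        # Only the most recent `window` messages accompany this turn.
        per_turn.append(sum(history[-window:]))
        history.extend([user_tokens, answer_tokens])
    return per_turn


print(history_tokens_per_turn(6, 50, 500))  # [0, 550, 1100, 1650, 2200, 2750]
```

History grows linearly per turn, so its total over a conversation grows roughly with the square of the message count until the window caps it (here at 10 × 550 = 5,500 tokens from turn 11 onward).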

Question 30

What is prompt caching, and why isn’t it applied here?

Accepted Answer

Prompt caching lets providers charge less for repeated, unchanged input (like a fixed system prompt or tool list). WisWes doesn’t apply a cache discount today, so this calculator bills that input fresh on every call, matching current behaviour. If caching is enabled later, real costs would be lower than shown.

Question 31

Which AI model should I choose for a chat agent?

Accepted Answer

Choose a fast, cheap “flash/mini” model (Gemini Flash, GPT-4o mini, Claude Haiku) for high-volume FAQ, search and recommendation work, and reserve premium models (Claude Sonnet/Opus, GPT-4.1) for complex reasoning or sensitive flows. Switching model is usually the single biggest cost lever.

Question 32

Which model does WisWes use by default?

Accepted Answer

Free and Standard plans run on Gemini 3 Flash; Professional and Enterprise use smart routing that escalates harder turns to stronger models. You can pick any supported model here to see its cost.

Question 33

Why is the Gemini 3 Flash price marked “estimated”?

Accepted Answer

Gemini 3 Flash is the production default but isn’t yet in the WisWes backend rate card, so its per-token price here is our best estimate. Every other model uses the exact rates from the rate card.

Question 34

Can I bring my own API key or model?

Accepted Answer

Yes — WisWes lets you bring your own provider key to pay the model provider directly if you prefer. In that case this estimate is roughly what the provider would bill you for tokens.

Question 35

What is “LLM calls per message”?

Accepted Answer

It’s how many model round-trips one user turn takes. A turn that calls a tool then answers is 2 calls; complex turns can chain more (WisWes allows up to 5). Each call re-sends the prompt + tools, so this multiplies the input cost per message.
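
A simplified sketch of that multiplier. It assumes, for illustration, that all per-call context (RAG, history and the user's message, bundled as `other_context`) is re-sent with each call in the turn and ignores tool results added between calls. The 800-token system prompt and 1,500-token context are hypothetical; 3,230 is the 17 × ~190 tool default.

```python
def input_tokens_per_message(
    calls_per_message: int,
    system_prompt: int,
    tool_definitions: int,
    other_context: int,
) -> int:
    """Input tokens for one user turn: every call re-sends prompt, tools and context."""
    if not 1 <= calls_per_message <= 5:
        raise ValueError("WisWes chains at most 5 calls per message")
    return calls_per_message * (system_prompt + tool_definitions + other_context)


print(input_tokens_per_message(1, 800, 3_230, 1_500))  # 5530: answer directly
print(input_tokens_per_message(2, 800, 3_230, 1_500))  # 11060: call a tool, then answer
```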

Question 36

What are the “Advanced assumptions”?

Accepted Answer

They are the per-turn details most people can leave at the defaults: LLM calls per message, system-prompt size, accumulated conversation history re-sent each call (windowed to the last 20 messages), the user’s message size, and output tokens per answer. Open the panel only if you want to fine-tune.

Question 37

Why does the system prompt size matter?

Accepted Answer

The system prompt — persona, policies, guidance — is sent on every call as input tokens. A larger prompt (more rules, more injected context) raises the cost of every single turn, so it’s worth keeping tight.

Question 38

How does conversation history affect cost?

Accepted Answer

History is re-sent each call and grows as the chat continues (until it’s windowed or summarised), so later turns in a long conversation carry more input tokens than earlier ones. Shorter conversations and summarisation reduce this.

Question 39

How can I lower the monthly cost of an AI chat agent?

Accepted Answer

Lower the monthly cost by, in order of impact: choosing a cheaper model for routine turns, trimming unused tools, returning fewer and leaner RAG results, keeping the system prompt tight, and shortening conversations. Model choice alone often moves the total several-fold.

Question 40

Does this include the WisWes subscription price?

Accepted Answer

No. The calculator shows the raw LLM provider spend (model tokens) only. It excludes the WisWes plan fee, infrastructure, embeddings and any margin — it models the underlying model cost, not your invoice.

Question 41

What costs are NOT included?

Accepted Answer

Everything except model tokens: the WisWes subscription, hosting and infrastructure, embedding generation for indexing your catalog, and any margin. This is purely the LLM provider’s token spend for the configured usage.

Question 42

Does WisWes charge me per token?

Accepted Answer

No. WisWes plans are flat monthly fees with included usage and pay-per-result overages — not per-token billing. This calculator models the underlying model cost (useful for understanding the economics), not your WisWes invoice.

Question 43

What is a “win-back” and how does WisWes actually bill?

Accepted Answer

A win-back is a shopper the agent brings back from the edge of leaving and converts. WisWes charges a flat plan fee with a monthly allowance of conversations and win-backs, then small per-result overages beyond it ($0.05 per extra conversation, $0.65 per extra win-back) — outcome-based, not token-based.
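
A sketch of that billing model. Only the $0.05 and $0.65 overage rates come from the answer above; the $99 plan fee and the allowances of 1,000 conversations and 20 win-backs are hypothetical, not actual WisWes plan terms.

```python
EXTRA_CONVERSATION_FEE = 0.05  # USD per conversation beyond the allowance
EXTRA_WIN_BACK_FEE = 0.65      # USD per win-back beyond the allowance


def monthly_invoice(
    plan_fee: float,
    conversations: int,
    included_conversations: int,
    win_backs: int,
    included_win_backs: int,
) -> float:
    """Flat plan fee plus per-result overages; token usage plays no part."""
    extra_conversations = max(0, conversations - included_conversations)
    extra_win_backs = max(0, win_backs - included_win_backs)
    return (plan_fee
            + extra_conversations * EXTRA_CONVERSATION_FEE
            + extra_win_backs * EXTRA_WIN_BACK_FEE)


# Hypothetical plan: $99/month including 1,000 conversations and 20 win-backs.
print(f"${monthly_invoice(99.0, 1_200, 1_000, 30, 20):.2f}")  # $115.50
```

Floats keep the sketch short; real invoicing code would normally use `decimal.Decimal` to avoid rounding drift.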

