Cost calculator
Estimate the raw LLM spend of running a conversational shopping agent. Adjust your traffic, engagement, retrieval, tools and model; the defaults are pre-filled to match how a real WisWes agent runs.
Gemini 3 Flash price is estimated — not yet in the rate card.
Estimated monthly model spend
$12.17/mo
| Metric | Value |
| --- | --- |
| Input tokens / call | 5,770 |
| Input tokens / conversation | 76,440 |
| Output tokens / conversation | 3,000 |
| Model | Gemini 3 Flash |
Raw LLM provider spend only — excludes the WisWes subscription, infrastructure and embedding costs. Assumes no prompt-cache discount, matching current WisWes behaviour.
Everything you need to fill in each field — and how WisWes turns it into a monthly number. Each settings panel above links here.
The monthly cost is (input tokens ÷ 1,000,000 × the model’s input price) + (output tokens ÷ 1,000,000 × its output price). WisWes derives the token counts by multiplying your unique monthly users by the engagement rate to get conversations, then estimating input and output tokens per conversation from the system prompt, tool definitions, retrieved RAG context, accumulated history and the model’s answers.
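As a rough illustration, the formula above can be written as a small Python function. This is a sketch, not the WisWes implementation: the function name, the 2,000-user figure and the $0.30 / $2.50 per-million prices are placeholder assumptions chosen for the example, not rate-card values. The token counts per conversation are the ones shown in the summary above.

```python
def monthly_model_spend(
    monthly_users: int,
    engagement_rate: float,
    input_tokens_per_conversation: int,
    output_tokens_per_conversation: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated monthly LLM provider spend in dollars."""
    # Conversations = unique monthly users x engagement rate.
    conversations = monthly_users * engagement_rate
    input_tokens = conversations * input_tokens_per_conversation
    output_tokens = conversations * output_tokens_per_conversation
    # Prices are quoted per million tokens, input and output separately.
    return (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )


# Placeholder inputs (assumptions): 2,000 users, 20% engagement,
# $0.30 input and $2.50 output per million tokens.
estimate = monthly_model_spend(2_000, 0.20, 76_440, 3_000, 0.30, 2.50)
print(f"${estimate:.2f}/mo")  # prints $12.17/mo
```

Because conversations multiply both token totals, changing users or engagement scales the result proportionally, while the two prices weight input and output independently.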
Start with the E-commerce profile to set a baseline for your industry, then enter your monthly traffic and engagement under Audience. The estimated monthly model spend updates live as you adjust retrieval, tools and model. Use the Token estimator if you’re unsure what a token value should be.
The defaults mirror a real WisWes agent: ~20% on-site chat engagement (Tidio benchmark), ~1,200 tokens of retrieved context per turn, ~17 tools at ~190 tokens each re-sent every call, and the per-token rate card used by the WisWes backend.
This is a planning estimate, not a bill. WisWes models token counts from a typical turn structure, so your real spend varies with how chatty shoppers are, how often tools fire and how much context you retrieve. Treat it as a grounded ballpark and refine the inputs with your own analytics.
Yes. The WisWes calculator reads its values from the URL, so a link like /calculator?users=5000&model=claude-sonnet-4-6 opens pre-filled. The WisWes chat assistant uses this to hand you a ready-made estimate from a conversation.
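For example, a pre-filled link could be assembled like this. It is a minimal sketch that uses only the two query parameters shown above (`users` and `model`); the helper name `calculator_link` is hypothetical, and any other parameter names would be guesses.

```python
from urllib.parse import urlencode


def calculator_link(users: int, model: str, base_path: str = "/calculator") -> str:
    """Build a calculator URL with the values encoded in the query string."""
    query = urlencode({"users": users, "model": model})
    return f"{base_path}?{query}"


link = calculator_link(5000, "claude-sonnet-4-6")
print(link)  # prints /calculator?users=5000&model=claude-sonnet-4-6
```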
The E-commerce profile sets a sensible baseline for every cost parameter from three inputs: your industry (one of 10 top e-commerce verticals), B2B or B2C, and your current conversion rate. Choosing a profile retunes retrieval size, tool count, conversation length and engagement, and you can still fine-tune any value afterwards.
The WisWes calculator covers the 10 largest e-commerce industries: general/multi-category, fashion & apparel, consumer electronics, health & beauty, home & furniture, food/beverage & grocery, sports & outdoors, toys/kids & hobbies, jewelry & accessories, and auto parts & accessories. Each preset tunes retrieval and tool counts to that vertical’s typical data depth.
Electronics costs more because it carries far more per-product data — specs, compatibility and comparisons — so a WisWes agent retrieves larger context and needs more tools, raising tokens per turn. Clothing leans on a few attributes (size, colour, fit), so its retrieval and tool footprint is lighter and cheaper per conversation.
B2B raises the estimate because B2B conversations run longer and more technical — quotes, accounts and approvals — so the profile adds messages per conversation, more tools, a larger system prompt and an extra reasoning round-trip. B2C stays leaner and higher-volume.
Your purchase conversion rate is used as a buying-intent signal: higher-converting traffic tends to engage the assistant more, so the profile nudges the chat engagement rate up with it. It does not change per-token pricing — it only sets a smarter engagement starting point.
Yes. The E-commerce profile only sets starting values — every slider and field below stays editable, so you can match your own measured numbers after picking an industry.
Your unique monthly users are the unique visitors to your store in a typical month — not pageviews or sessions. In Google Analytics 4 it is the “Users” (or “Total users”) metric; in Shopify Analytics it is “Online store visitors”; in Plausible it is “Unique visitors”.
Engagement rate is the share of visitors who actually start a chat: conversations started ÷ unique visitors, over the same period. If your widget reports “chats opened”, divide that by visitors. If you don’t track it yet, start with the 20% default — the typical on-site bot engagement rate (Tidio data) — and refine once you have your own numbers.
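The ratio above is simple enough to compute by hand, but a small sketch makes the inputs explicit. The function name and the sample figures (1,000 chats opened, 5,000 unique visitors) are illustrative assumptions, not WisWes data.

```python
def engagement_rate(conversations_started: int, unique_visitors: int) -> float:
    """Share of visitors who start a chat, measured over the same period."""
    if unique_visitors <= 0:
        raise ValueError("unique_visitors must be positive")
    return conversations_started / unique_visitors


# Hypothetical month: 1,000 chats opened by 5,000 unique visitors.
rate = engagement_rate(1_000, 5_000)
print(f"{rate:.0%}")  # prints 20%
```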
A conversation is one shopper’s chat session with the agent; a message is a single turn within it. The calculator counts tokens per message, and there are several messages per conversation, so both the engagement rate (how many conversations) and messages per conversation drive the cost.
Count the user’s turns (messages they send), not the bot’s replies, in a typical engaged chat. A quick product question is 2–3; a guided “help me choose” or support flow is 6–10. The default is 6. More messages mean more LLM calls, and history grows each turn, so this scales cost noticeably.
Roughly yes. Cost scales with conversations, which is users × engagement rate, so doubling traffic at the same engagement roughly doubles the model spend. The bigger levers are conversation length (which grows cost faster than linearly, because history accumulates each turn), retrieval size and model choice.
RAG (retrieval-augmented generation) injects relevant snippets from your catalog, FAQs and policies into the prompt so the agent answers from your data instead of guessing. Those retrieved tokens are added to the input on the turns that search, so larger or more numerous results raise cost.
RAG context per turn ≈ (number of results you return) × (tokens per result). Take one real retrieved chunk — a product snippet or an FAQ answer — paste it into the Token estimator to get its token count, then multiply by how many you show per answer. WisWes defaults to ~1,200 tokens (≈8 product matches or ≈3 FAQ answers).
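The multiplication above can be sketched as follows. The per-result sizes (150 tokens per product match, 400 per FAQ answer) are back-derived from the ~1,200-token default and the "≈8 product matches or ≈3 FAQ answers" figures, so treat them as approximations rather than measured values.

```python
def rag_tokens_per_turn(results_returned: int, tokens_per_result: int) -> int:
    """RAG context per turn = number of results x tokens per result."""
    return results_returned * tokens_per_result


# Approximate per-result sizes inferred from the ~1,200-token default.
product_context = rag_tokens_per_turn(8, 150)  # 8 product matches
faq_context = rag_tokens_per_turn(3, 400)      # 3 FAQ answers
print(product_context, faq_context)  # prints 1200 1200
```

To use your own numbers, replace `tokens_per_result` with the count the Token estimator gives for one real retrieved chunk.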
You need RAG if your agent must answer from your live catalog, specs or policies — that grounding is what keeps answers accurate. A purely scripted or FAQ-light bot can switch RAG off here to see the lower-bound cost, but most commerce agents rely on it.
Cost rises roughly linearly with results shown: 8 product matches cost about twice the tokens of 4. Returning just enough to answer well — rather than a long list — is an easy way to trim per-turn tokens.
Count the distinct actions your agent can take — search catalog, recommend, add to cart, apply discount, track order, hand off to a human, and so on. Each tool ships a name, description and JSON parameter schema; serialized, that is ~150–250 tokens (WisWes averages ~190). To measure your own, paste one tool’s JSON definition into the Token estimator.
WisWes re-sends the full tool definitions and system prompt on every LLM call and does not apply a prompt-cache discount today, so those input tokens are billed fresh on each turn. That makes the tool count and prompt size meaningful cost levers — trimming unused tools or a bloated system prompt lowers every single call.
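To see how tool count translates into per-call input tokens, here is a short sketch using the defaults quoted on this page (~17 tools at ~190 tokens each). The pruned count of 12 tools is a hypothetical example.

```python
def tool_tokens_per_call(tool_count: int, tokens_per_tool: int = 190) -> int:
    """Input tokens spent on tool definitions in every LLM call."""
    return tool_count * tokens_per_tool


default_overhead = tool_tokens_per_call(17)  # page defaults
pruned_overhead = tool_tokens_per_call(12)   # hypothetical: 5 unused tools removed
saved_per_call = default_overhead - pruned_overhead
print(default_overhead, pruned_overhead, saved_per_call)  # prints 3230 2280 950
```

Because the definitions are re-sent on every call with no cache discount, the per-call saving repeats on each round-trip of each message.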
Yes, on every call where they’re offered. WisWes sends the full tool list as input tokens each turn with no cache discount, so a tool the shopper never triggers still adds its ~190 tokens to every message. Pruning tools the agent doesn’t need lowers every call.
Copy the tool’s serialized definition — its name, description and JSON parameter schema — and paste it into the Token estimator. Multiply a representative tool’s size by your tool count, or measure your largest few and average.
A token is the unit language models read and write — roughly 4 characters, or about 0.75 words, of English. Model pricing is quoted per million tokens, split between input (what you send) and output (what the model generates).
Input tokens are everything you send the model each call — system prompt, tool definitions, retrieved RAG, conversation history and the user’s message. Output tokens are what the model writes back. They’re priced separately, and output usually costs 4–10× more per token.
Input usually makes up most of the bill because you send a lot of it every call (system prompt + all tool definitions + RAG + growing history), while the answer is comparatively short. Even though output is priced higher per token, the sheer input volume on each turn usually outweighs it.
Use the Token estimator on this page: paste any text — a prompt, a product description, a tool schema — and it returns characters, words and an estimated token count using the rule of thumb of roughly 4 characters per token. Then click “apply” to drop that number into the system-prompt, RAG or tokens-per-tool field.
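The rule of thumb behind the estimator can be approximated in a few lines. This is a sketch of the 4-characters-per-token heuristic only, not the page's actual code: rounding up and splitting words on whitespace are assumptions, and a real tokenizer will give different counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> dict[str, int]:
    """Return characters, words and a heuristic token estimate for text."""
    chars = len(text)
    words = len(text.split())  # whitespace-separated words (assumption)
    tokens = math.ceil(chars / chars_per_token)  # rounding up is an assumption
    return {"characters": chars, "words": words, "tokens": tokens}


result = estimate_tokens("Track my order status")
print(result)  # prints {'characters': 21, 'words': 4, 'tokens': 6}
```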
Yes. Each LLM call includes the recent conversation history so the model has context, and WisWes re-sends it every turn (windowed to the last ~20 messages). That’s why longer conversations cost more than the message count alone suggests.
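A simplified model of the windowing shows why later turns carry more input. It assumes one user message plus one reply per earlier turn and a flat average message size of 100 tokens; both are illustrative assumptions, and only the ~20-message window comes from the text above.

```python
HISTORY_WINDOW = 20  # from the text: windowed to the last ~20 messages


def history_messages(user_turn: int, window: int = HISTORY_WINDOW) -> int:
    """Prior messages re-sent on the given 1-indexed user turn."""
    # Assumes each earlier turn left one user message and one reply.
    prior = 2 * (user_turn - 1)
    return min(prior, window)


def history_tokens(user_turn: int, avg_tokens_per_message: int) -> int:
    """History tokens re-sent on that turn, at a flat average message size."""
    return history_messages(user_turn) * avg_tokens_per_message


for turn in (1, 3, 6, 12):
    print(turn, history_tokens(turn, avg_tokens_per_message=100))
# prints:
# 1 0
# 3 400
# 6 1000
# 12 2000
```

Under these assumptions the history grows by two messages per turn until the window caps it at turn 11.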
Prompt caching lets providers charge less for repeated, unchanged input (like a fixed system prompt or tool list). WisWes doesn’t apply a cache discount today, so this calculator counts that input at full price on every call, matching current behaviour. If caching is enabled later, real costs would be lower than shown.
Choose a fast, cheap “flash/mini” model (Gemini Flash, GPT-4o mini, Claude Haiku) for high-volume FAQ, search and recommendation work, and reserve premium models (Claude Sonnet/Opus, GPT-4.1) for complex reasoning or sensitive flows. Switching model is usually the single biggest cost lever.
Free and Standard plans run on Gemini 3 Flash; Professional and Enterprise use smart routing that escalates harder turns to stronger models. You can pick any supported model here to see its cost.
Gemini 3 Flash is the production default but isn’t yet in the WisWes backend rate card, so its per-token price here is our best estimate. Every other model uses the exact rates from the rate card.
Yes — WisWes lets you bring your own provider key to pay the model provider directly if you prefer. In that case this estimate is roughly what the provider would bill you for tokens.
It’s how many model round-trips one user turn takes. A turn that calls a tool then answers is 2 calls; complex turns can chain more (WisWes allows up to 5). Each call re-sends the prompt + tools, so this multiplies the input cost per message.
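The multiplier effect can be sketched as below. The decomposition is an assumption: it treats the 5,770 input tokens per call as excluding RAG and adds ~1,200 RAG tokens once per message. With 6 messages and 2 calls per message, that reading reproduces the 76,440 input tokens per conversation shown in the summary, but the calculator's internal formula may differ.

```python
MAX_CALLS_PER_MESSAGE = 5  # from the text: WisWes allows up to 5


def input_tokens_per_conversation(
    messages: int,
    calls_per_message: int,
    input_tokens_per_call: int,
    rag_tokens_per_message: int = 0,
) -> int:
    """Input tokens for one conversation under the assumed decomposition."""
    if not 1 <= calls_per_message <= MAX_CALLS_PER_MESSAGE:
        raise ValueError("calls_per_message must be between 1 and 5")
    # Prompt, tools and history are re-sent on every call.
    per_call_total = messages * calls_per_message * input_tokens_per_call
    # Assumption: retrieved context is added once per message.
    rag_total = messages * rag_tokens_per_message
    return per_call_total + rag_total


two_calls = input_tokens_per_conversation(6, 2, 5_770, 1_200)
one_call = input_tokens_per_conversation(6, 1, 5_770, 1_200)
print(two_calls, one_call)  # prints 76440 41820
```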
They are the per-turn details most people can leave at the defaults: LLM calls per message, system-prompt size, accumulated conversation history re-sent each call (windowed to the last ~20 messages), the user’s message size, and output tokens per answer. Open the panel only if you want to fine-tune.
The system prompt — persona, policies, guidance — is sent on every call as input tokens. A larger prompt (more rules, more injected context) raises the cost of every single turn, so it’s worth keeping tight.
History is re-sent each call and grows as the chat continues (until it’s windowed or summarised), so later turns in a long conversation carry more input tokens than earlier ones. Shorter conversations and summarisation reduce this.
Lower the monthly cost by, in order of impact: choosing a cheaper model for routine turns, trimming unused tools, returning fewer and leaner RAG results, keeping the system prompt tight, and shortening conversations. Model choice alone often moves the total several-fold.
No. The calculator shows the raw LLM provider spend (model tokens) only. It excludes the WisWes plan fee, infrastructure, embeddings and any margin — it models the underlying model cost, not your invoice.
Everything except model tokens: the WisWes subscription, hosting and infrastructure, embedding generation for indexing your catalog, and any margin. This is purely the LLM provider’s token spend for the configured usage.
No. WisWes plans are flat monthly fees with included usage and pay-per-result overages — not per-token billing. This calculator models the underlying model cost (useful for understanding the economics), not your WisWes invoice.
A win-back is a shopper the agent brings back from the edge of leaving and converts. WisWes charges a flat plan fee with a monthly allowance of conversations and win-backs, then small per-result overages beyond it ($0.05 per extra conversation, $0.65 per extra win-back) — outcome-based, not token-based.
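As an illustration of the overage arithmetic, the sketch below applies the two per-result rates quoted above. The included allowances (1,000 conversations, 20 win-backs) and the usage figures are hypothetical, since actual allowances depend on the plan; amounts are kept in cents to avoid floating-point rounding.

```python
EXTRA_CONVERSATION_CENTS = 5  # $0.05 per extra conversation
EXTRA_WIN_BACK_CENTS = 65     # $0.65 per extra win-back


def overage_charge(
    conversations: int,
    win_backs: int,
    included_conversations: int,
    included_win_backs: int,
) -> float:
    """Overage in dollars for usage beyond the plan's monthly allowance."""
    extra_conversations = max(0, conversations - included_conversations)
    extra_win_backs = max(0, win_backs - included_win_backs)
    cents = (
        extra_conversations * EXTRA_CONVERSATION_CENTS
        + extra_win_backs * EXTRA_WIN_BACK_CENTS
    )
    return cents / 100


# Hypothetical plan allowance: 1,000 conversations and 20 win-backs.
charge = overage_charge(1_200, 30, included_conversations=1_000, included_win_backs=20)
print(f"${charge:.2f}")  # prints $16.50
```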
WisWes runs frontier models with usage included and pay-per-result overages. Start a 14-day free trial — no credit card.