
Catching AI-Generated Fake Reviews — With a Model Small Enough to Ship Inside a Plugin

[Image: a robotic arm in a vast warehouse of phones, posting a glowing five-star product review at scale: machine-generated review spam.]

Fake reviews are old. What changed is who writes them. For years a fake review was a human paid a dollar to type two sloppy sentences — bad grammar, obvious tells, easy enough to spot. Now a language model writes them: fluent, specific, on-brand, and generated by the thousand. The new review-spam problem is not that the spam reads badly. It is that the spam reads perfectly.

Why this is a business problem, not a vanity metric

Reviews are the closest thing an online store has to a salesperson on the floor. Most shoppers read them before they buy, and the star rating is often the single biggest nudge between "add to cart" and "close the tab." That is exactly why fake reviews are not a cosmetic nuisance — they corrupt the one signal customers trust most. And AI changed the scale of the attack: writing a thousand believable reviews used to take a thousand people; now it takes one prompt and an afternoon.

It cuts in two directions. Fake five-stars inflate your own products — and fake one-stars, just as cheap to generate, can be aimed at your bestsellers or sprayed across a competitor's catalog. Either way, the cost lands on the business:

Where it hurts | What it costs you
Shopper trust | A wall of suspiciously perfect reviews reads as fake, and once buyers distrust the reviews, they distrust the store.
Returns & refunds | Inflated ratings pull in the wrong buyers; the product underdelivers, and you eat the returns, refunds, chargebacks, and support tickets.
Wasted ad spend | You pay to send traffic to a listing whose social proof is fabricated; the extra bounces quietly burn the budget.
Marketplace penalties | Google, Amazon, and app stores down-rank or delist sellers caught hosting fake reviews.
Legal exposure | The FTC's 2024 rule bans fake and AI-generated reviews outright, with civil penalties that can reach tens of thousands of dollars per violation.
Lost compounding | Honest reviews are an asset that compounds over years; fake ones poison the well and make every genuine review look doubtful.

So the goal is not "delete the bad reviews." It is to restore the signal — to tell a real customer's voice from a machine-written one, the moment a review lands, so a human can decide what to do about it. That is the problem the rest of this series sets out to solve.

So we set out to catch it. The first instinct, in 2026, is to reach for a big model: feed every review to an LLM and ask "is this fake?" That works, and it is also the wrong tool. It is slow, it runs up a token bill on every review, it needs an API key and a network round-trip, and it cannot explain its verdict in a way you can audit. You cannot ship that inside a store plugin that has to score a whole review grid in a single page load.

This is Part 1 of a build-in-public series. The goal for the whole series is a fake-review detector that runs inside a Magento or Shopify store — per review, instantly, offline. That constraint decides everything about the model, so we started there: a tiny, explainable classifier. It is open, it is ~1.3 MB, and it hits 94.6% accuracy on reviews it has never seen.

Pipeline: a review goes into TF-IDF (word 1–2 grams), then logistic regression, which outputs a fake-probability and the signals explaining why it flagged.
The entire model is two well-understood pieces: TF-IDF turns a review into weighted word and word-pair counts, and logistic regression scores it. Small, fast, and — crucially — explainable.
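To make "word and word-pair counts" concrete, here is a minimal sketch of the counting step alone, before any TF-IDF weighting. The `word_ngrams` helper and its tokenizer regex are assumptions for illustration; the project's own tokenization may differ.

```python
import re
from collections import Counter

def word_ngrams(text, max_n=2):
    """Return lowercase word n-grams for n = 1..max_n, in reading order."""
    # Assumption: a simple tokenizer that keeps letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

review = "Love this! Well made and very comfortable. I love it!"
counts = Counter(word_ngrams(review))

# Single words and word pairs are counted side by side.
print(counts["love"], counts["love it"], counts["i love"])
```

Note that word pairs are formed across sentence boundaries here, so "comfortable i" is counted as well; a real vectorizer then decides which of these n-grams are frequent enough to keep.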

What "fake" means here (precisely)

We trained on the Salminen et al. Fake Reviews Dataset — about 40,000 Amazon-style reviews, each labelled one of two ways:

Label | Meaning | We call it
CG | Computer-generated: a language model wrote the review | fake
OR | Original: a real customer wrote it | real

So this detects machine-written reviews — exactly the modern bot-farm problem. Be clear about what it is not: it is not a sentiment detector, and it is not an "honest vs. dishonest human" judge. A real, glowing, human-written five-star review is not what it flags. It flags text that a machine produced.

Why a tiny model beats reaching for an LLM

For this job, the small classic model is not a compromise — it is the better engineering choice on every axis that matters to a plugin:

Axis | LLM per review | This classifier
Cost per review | A token bill, every time | $0 after training
Latency | A network round-trip | Sub-millisecond, local
Runs offline | No, needs an API | Yes, ~1.3 MB on disk
Explainable | A paragraph you must trust | The exact tokens, scored
Deterministic | Varies run to run | Same input, same output

The explainability row is the one we care about most. When you flag a merchant's review, you owe them a reason — and "a large model said so" is not one. This model can hand you the precise words that moved the verdict.

The recipe

Nothing exotic. The whole pipeline is two scikit-learn components, and it trains on ~40k reviews in a few seconds on a laptop:

Stage | What it does | Settings that matter
TF-IDF | Turns text into weighted word + word-pair counts | word 1–2 grams · sublinear_tf · min_df 2 · 30,000 features
Logistic regression | Scores those features into a fake-probability | C = 4.0 · class_weight balanced

Two details do the heavy lifting. Bigrams (word pairs like highly recommend or love it) catch the canned, scaffolded phrasing that generated reviews lean on; single words miss it. And sublinear_tf dampens repetition by replacing a raw count tf with 1 + ln(tf), so a review that says "love" five times gets about 2.6 times the weight of a single mention, not five times.
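The recipe above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under the stated settings, not the project's actual training code: `build_pipeline`, the toy texts, `max_iter`, and the `token_pattern` override are assumptions for illustration. The override is there because scikit-learn's default pattern drops one-letter tokens, while the example output later in this post includes "i love".

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_pipeline():
    """TF-IDF over word 1-2 grams feeding a logistic regression."""
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 2),            # single words and word pairs
        sublinear_tf=True,             # a count tf becomes 1 + ln(tf)
        min_df=2,                      # drop n-grams seen in fewer than 2 reviews
        max_features=30_000,           # cap the vocabulary at 30,000 n-grams
        token_pattern=r"(?u)\b\w+\b",  # assumption: keep one-letter words like "i"
    )
    classifier = LogisticRegression(
        C=4.0,
        class_weight="balanced",
        max_iter=1000,                 # assumption: not stated in the article
    )
    return make_pipeline(vectorizer, classifier)

# Toy data for illustration only; the real model trains on ~40k labelled reviews.
texts = [
    "love it highly recommend",
    "i love it highly recommend this",
    "zipper broke after two weeks of use",
    "the zipper broke after a month of use",
]
labels = [1, 1, 0, 0]  # 1 = fake (CG), 0 = real (OR)

model = build_pipeline().fit(texts, labels)
fake_probability = model.predict_proba(["love it, highly recommend"])[0, 1]
print(round(float(fake_probability), 3))
```

With only four toy reviews the probability itself means little; the point is the shape of the pipeline and where each setting from the table plugs in.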

The results

Trained on 32,345 reviews, tested on a held-out 8,087 it never saw:

[Figure: Held-out scores on 8,087 reviews never seen in training (80 / 20 split; higher is better; perfect = 1.00). Accuracy 0.946, F1 0.945, ROC-AUC 0.988.]
Accuracy and F1 both land at ~0.95, and ROC-AUC reaches 0.988, meaning the model ranks a randomly chosen fake above a randomly chosen real review about 99% of the time. For a 1.3 MB model with no GPU and no pretraining, that is a lot of signal for very little machinery.
Metric | Score | Plain meaning
Accuracy | 0.946 | Of all reviews, the share it labelled correctly
F1 | 0.945 | Balance of catching fakes vs. false alarms
ROC-AUC | 0.988 | How well it ranks fake above real
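For readers who want the three metrics spelled out, here is a small from-scratch sketch. The function names and the six toy scores are made up for illustration and are not the article's data; in practice a library such as scikit-learn would compute these.

```python
def accuracy(y_true, y_pred):
    """Share of reviews labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the 'fake' class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive=1):
    """Chance that a random fake outscores a random real review (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up toy scores: 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.92, 0.81, 0.35, 0.40, 0.10, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # label at a 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on where the threshold sits, while ROC-AUC looks only at the ranking of scores, which is why it can be noticeably higher than the other two.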

The part that makes it shippable: it shows its work

A probability alone is a black box. The reason this model earns a place in a merchant-facing tool is that it can point at the evidence. For each review it returns signals — the individual tokens whose weight (TF-IDF value × model coefficient) pushed the score toward fake.

Feed it a textbook generated review and you get back:

Input: "Love this! Well made and very comfortable. I love it!"

{
  "label": "fake",
  "fake_probability": 0.94,
  "signals": [
    { "token": "love it", "weight": 0.71 },
    { "token": "i love",  "weight": 0.40 }
  ]
}
[Figure: "Not just a verdict, the receipts." The review "Love this! Well made and very comfortable. I love it!" scores P(fake) 0.94, with signals "love it" +0.71 and "i love" +0.40. These are the tokens that moved the score toward fake: the "why-flagged" column for the grids later.]
The same prediction as a picture. The model didn't just say fake — it named the exact phrases that gave it away. This is the column that will sit next to every flagged review in the Magento and Shopify grids.
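The signal weights described above (TF-IDF value × model coefficient) can be sketched in a few lines. The `top_signals` helper is hypothetical, and the TF-IDF values and coefficients are made-up inputs picked so that the products reproduce the weights in the example; they are not read from the trained model.

```python
def top_signals(tfidf_values, coefficients, k=5):
    """Rank tokens by contribution = TF-IDF value * coefficient, fake-leaning first."""
    contributions = {
        token: value * coefficients.get(token, 0.0)
        for token, value in tfidf_values.items()
    }
    # Keep only tokens that pushed the score toward "fake" (positive contribution).
    positive = [(tok, w) for tok, w in contributions.items() if w > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [{"token": tok, "weight": round(w, 2)} for tok, w in positive[:k]]

# Hypothetical numbers for illustration only.
tfidf_values = {"love it": 0.50, "i love": 0.40, "well made": 0.30, "comfortable": 0.25}
coefficients = {"love it": 1.42, "i love": 1.00, "well made": -0.20, "comfortable": 0.10}

signals = top_signals(tfidf_values, coefficients, k=2)
print(signals)
```

Because logistic regression is linear, these per-token contributions add up (together with the intercept) to the score that becomes the fake-probability, which is what makes the explanation exact rather than approximate.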

Try it in two commands

The whole project is open source (MIT) on GitHub — github.com/wiswes/fakereviews. It is genuinely clone-and-run: the trained model is committed, so you can predict immediately, or retrain from scratch in seconds:

pip install -r requirements.txt

# Score a review straight away (model ships in the repo)
python -m fakereviews.cli predict "Best product ever!!! Buy it now!!!"

# …or retrain from scratch — fetches the dataset, trains in seconds
python -m fakereviews.train

Or use it as a library — one import, one call, with a threshold you raise in production to flag only high-confidence fakes:

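from fakereviews import FakeReviewClassifier

# Any review string; this sample is the one used in the CLI example above
review_text = "Best product ever!!! Buy it now!!!"

clf = FakeReviewClassifier()
# Raise the threshold in production to flag only high-confidence fakes
result = clf.predict(review_text, threshold=0.5)
print(result.label, result.fake_probability)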
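The effect of raising the threshold can be sketched without the library. The `flag_reviews` helper is hypothetical, the scores are made up, and the sketch assumes a review is flagged when its fake-probability is at or above the threshold; the library's exact comparison may differ.

```python
def flag_reviews(scored_reviews, threshold=0.5):
    """Return the reviews whose fake-probability is at or above the threshold."""
    return [r for r in scored_reviews if r["fake_probability"] >= threshold]

# Made-up scores for illustration.
scored = [
    {"id": 1, "fake_probability": 0.94},
    {"id": 2, "fake_probability": 0.62},
    {"id": 3, "fake_probability": 0.51},
    {"id": 4, "fake_probability": 0.08},
]

default_flags = flag_reviews(scored)                # threshold 0.5
strict_flags = flag_reviews(scored, threshold=0.9)  # high-confidence fakes only

print([r["id"] for r in default_flags], [r["id"] for r in strict_flags])
```

A stricter threshold trades recall for precision: fewer genuine reviews get flagged by mistake, at the cost of letting borderline fakes through to a human reviewer unflagged.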

Honest limits

What's next

[Figure: One model, three places to run it. Part 1, Classifier (Python · done): shipped. Part 2, Magento (flag fakes in the grid): building now. Part 3, Shopify (planned): next.]
The classifier was deliberately built small so it could live inside a store. Next we put it there.

The model was the easy, contained part. The series gets interesting when it leaves the notebook:

The whole point of a 1.3 MB, no-dependency model is that it can run anywhere a store runs. Parts 2 and 3 are where we prove it.

WisWes builds AI that lives inside your store — answering shoppers, recommending products, and, soon, keeping your reviews honest. This series is us building one piece of that in the open.
