Day 4: Moving the search into a real Shopify app — and giving it a brain
Three days in, Margeen could already find products. Day 2 taught it to read AliExpress search pages; Day 3 taught it to fan one prompt into ~25 searches with Gemini. But all of that ran as a Python script on a schedule, and when it finished it wrote a candidates.json file into the repo. Useful to me. Useless to a business.
Because here is the thing I keep reminding myself: Margeen is supposed to be an LLM that runs its own resale business — it finds products, prices them, lists them, sells them. A business does not live in a JSON file on a timer. It lives in a store. So this entry is two small moves that belong together: first I moved the search into a real Shopify app, and then — once the app turned out to be dumber than the script it replaced — I wired the brain back in.
A script and an app are not the same animal
I’d been treating “the search works” as the finish line. It isn’t. A script that runs on a timer and a tool a person opens and uses are different things, and the gap between them is most of what makes software feel real.
| | The script (Day 2–3) | The app |
|---|---|---|
| Where it runs | GitHub Actions, on a schedule | Inside the Shopify admin |
| How you start it | Edit a workflow, wait | Type a niche, click search |
| What you get | A JSON file in the repo | Candidates on screen, in the store |
| Who could use it | Me, reading raw JSON | Anyone with the store open |
Same engine, completely different object. The search didn’t need to get smarter to make this move. It needed to get somewhere a merchant actually is.
How I built it (and what I leaned on)
I didn’t hand-write a Shopify app from scratch — almost nobody should. I generated the boilerplate with Shopify’s CLI and then dropped in the one thing that’s actually Margeen: the search. If you want the generic, step-by-step version of those first two moves, I wrote them up separately, for non-coders:
- Setting up Claude Code to write Shopify code — the tooling.
- Creating your first Shopify app — the empty app shell.
With the shell running, the real work was porting the AliExpress search from Python to Node and adding one admin page that calls it.
The port had one catch worth flagging, because it’s the same wall I hit on Day 2. Node’s built-in fetch gets bot-blocked by AliExpress exactly like Python’s requests did — wrong TLS fingerprint, so it’s served a captcha page instead of results. The fix is the same trick: shell out to curl, whose fingerprint passes for a browser. So the Node version still calls curl under the hood and parses the HTML it returns. Not elegant. It works, and “works” is the bar.
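For readers who have not shelled out from Node before, here is a minimal sketch of the mechanics only. The helper name fetchHtmlViaCurl, the injectable run option, and the particular curl flags are assumptions for illustration, not the repo's actual code, and nothing here guarantees that a given request gets past a bot check.

```javascript
// Default runner: promisified execFile, loaded lazily so a fake can be injected.
async function runCurl(args, options) {
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  return promisify(execFile)("curl", args, options);
}

// Hypothetical helper: fetch a page's HTML through the curl binary.
async function fetchHtmlViaCurl(url, { timeoutSec = 20, run = runCurl } = {}) {
  const args = [
    "--silent",     // no progress meter in the output
    "--location",   // follow redirects
    "--compressed", // request and decode a compressed response
    "--max-time", String(timeoutSec),
    url,
  ];
  // execFile takes the arguments as an array, so the URL never passes through a shell.
  const { stdout } = await run(args, { maxBuffer: 10 * 1024 * 1024 });
  return stdout;
}
```

Passing arguments as an array (execFile rather than exec) matters here because the query comes from whatever a user types into the finder.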
What it did at first
The app got a Product finder page. You type what you want to sell — “wireless earbuds with case” — hit search, and it runs the AliExpress search server-side and lists the candidates right there: thumbnail, title, listed price, and a link out to the source. The Day 2–3 capability, now with a face.
The app was dumber than the script
And that is exactly when the second problem jumped out at me. The app worked — and it was worse than the script. You typed wireless earbuds, it searched AliExpress for exactly wireless earbuds, and showed you page one. That’s the Day 2 capability — the one Day 3 already improved on by fanning each prompt into ~25 variants and roughly doubling the candidate pool.
So Margeen had a brain and a body, built on different days, in different languages, that had never been in the same room. The fix was the wire between them — and, almost for free, it handed me a first crude relevance signal I wasn't expecting.
Two halves that never met
| | The app | The expansion script (Day 3) |
|---|---|---|
| Searches run | One — the literal query | ~25 Gemini variants |
| Language | Node (in the Remix app) | Python (GitHub Actions) |
| Lives where the merchant is | Yes — in the admin | No — in a JSON file |
| Smart about vocabulary | No | Yes |
Neither column is the goal. The goal is the diagonal: the brain's ~25 searches, running in the body, where the merchant actually is. To get there I had to do two things — port the expander to Node, and decide what happens when 25 searches all come back at once.
Porting the brain to Node
The expander itself is a near-line-for-line port of Day 3's expand_keywords() — same prompt, same responseMimeType: "application/json" JSON mode, same temperature 0.7, same defensive dedupe. One detail flips, though, and it's worth saying out loud because it contradicts the app's other network call:
AliExpress needs the curl trick. Gemini does not. The AliExpress search shells out to curl because Node's fetch gets bot-blocked by its TLS fingerprint. Gemini is a normal JSON API with no such gatekeeping, so the expander uses the built-in fetch. The curl thing is an AliExpress workaround, not a Margeen house style.
And the discipline that made the Day 3 script fork-friendly survives the port intact: the function always returns a non-empty variants list. No key, an API error, a timeout, malformed JSON: every failure resolves to the single-element fallback [query]. A fork with no GEMINI_API_KEY behaves exactly like the single-query finder, with zero extra branches downstream.
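To make the "every failure resolves to [query]" contract concrete, here is a minimal sketch of how the Gemini half could look. It assumes Gemini's public generateContent REST endpoint and the gemini-2.5-flash model named later in this post. The function name expandWithGemini, the prompt wording, the fetchImpl option, and the case-insensitive dedupe are illustrative assumptions, not the repo's code.

```javascript
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";

// Hypothetical sketch: any failure path collapses to the single-element fallback.
async function expandWithGemini(query, apiKey, { timeoutMs = 20000, fetchImpl = fetch } = {}) {
  const fallback = { model: null, expanded: false, variants: [query] };
  try {
    const res = await fetchImpl(GEMINI_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      signal: AbortSignal.timeout(timeoutMs), // a timeout rejects and lands in catch
      body: JSON.stringify({
        // Illustrative prompt, not the Day 3 prompt.
        contents: [{ parts: [{ text: `Return a JSON array of search phrases for: ${query}` }] }],
        generationConfig: { responseMimeType: "application/json", temperature: 0.7 },
      }),
    });
    if (!res.ok) return fallback; // API error
    const data = await res.json();
    const parsed = JSON.parse(data.candidates[0].content.parts[0].text); // may throw
    if (!Array.isArray(parsed)) return fallback;

    // Defensive dedupe: keep the original query first, drop blanks and non-strings.
    const seen = new Set();
    const variants = [];
    for (const raw of [query, ...parsed]) {
      const v = typeof raw === "string" ? raw.trim() : "";
      const key = v.toLowerCase();
      if (v && !seen.has(key)) {
        seen.add(key);
        variants.push(v);
      }
    }
    return variants.length > 1
      ? { model: "gemini-2.5-flash", expanded: true, variants }
      : fallback;
  } catch {
    return fallback; // network failure, timeout, malformed JSON, unexpected shape
  }
}
```

Because the fallback has the same shape as a successful result, callers can iterate over variants without checking which path produced it.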
export async function expandKeywords(query, { timeoutMs = 20000 } = {}) {
  const base = [query];
  const apiKey = (process.env.GEMINI_API_KEY || "").trim();
  if (!apiKey) {
    console.log("[margeen] GEMINI_API_KEY not set - skipping expansion");
    return { model: null, expanded: false, variants: base }; // <- single-query behaviour
  }
  // ...POST to Gemini with responseMimeType: "application/json"...
  // ...on any error: return { ..., expanded: false, variants: base }...
}
The throttle I owed AliExpress
Day 3 ended on a bruise. It fired 25 fetches in about five seconds from GitHub's data-centre IPs, and 23 of 25 came back bot-blocked — a textbook crawler signature. The fix I promised back then ("look less like a crawler") is the natural place to pay that debt, so I folded it straight into the wire: cap how many searches run at once, and jitter between launches.
const MAX_PARALLEL = 3; // not 25-at-once like Day 3
const JITTER_MS = 400; // small random gap before each launch
// a bounded worker pool: at most MAX_PARALLEL searches in flight
const perVariant = await runPool(variants, (v) => searchAliExpress(v), MAX_PARALLEL);
Honest caveat, because that is the whole point of doing this in public: I have not hammered this at scale yet. The finder runs from wherever you run shopify app dev (your own machine, a residential IP), which already looks far less suspicious than a GitHub runner. Throttling on top should push the block rate down further. But AliExpress gets the final vote, and the real numbers will come from real runs on the dev store, not from me asserting them here.
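The snippet above only shows the call to runPool, so here is one way a bounded pool with jitter could be written. This is a sketch under assumptions: the fourth jitterMs parameter and the { ok, value } / { ok, error } result wrapper are illustrative choices, not necessarily what the repo does.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical runPool: runs `worker` over `items` with at most `limit` calls
// in flight, pausing a random 0..jitterMs before each launch.
// Results come back in input order.
async function runPool(items, worker, limit, jitterMs = 0) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function lane() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so two lanes never take the same item
      if (jitterMs > 0) await sleep(Math.random() * jitterMs);
      try {
        results[i] = { ok: true, value: await worker(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error }; // one failed search does not sink the batch
      }
    }
  }

  // Each lane handles one item at a time, so the lane count is the concurrency cap.
  const laneCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Capturing errors per item rather than letting Promise.all reject means a handful of bot-blocked variants still leaves the successful searches available to merge.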
Merging — and an accidental relevance signal
Twenty-five searches return overlapping products. The merge dedupes by productId — but instead of throwing the duplicates away, it remembers which variants surfaced each product. Day 3 already designed the JSON for this (seen_in on every candidate); this is where it finally gets used.
for (const c of result.candidates) {
  const hit = byId.get(c.productId);
  if (hit) hit.seenIn.push(variant); // corroborated by another search
  else byId.set(c.productId, { ...c, seenIn: [variant] });
}
// rank: products found by the most variants float to the top
candidates.sort((a, b) => b.seenIn.length - a.seenIn.length || ...);
The logic is plain: if "blue boy pants", "kids blue jeans" and "navy kids pants" all surface the same product, that product is probably genuinely on-topic, because three independent phrasings agreed on it. A listing that shows up under exactly one weird variant is more likely noise. So the finder now sorts by agreement and tags the strong ones "found by N searches".
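A self-contained version of that merge makes the ranking easy to check by hand. This is a sketch: the function name mergeAndRank, the { variant, candidates } input shape, the guard against counting one variant twice, and the made-up product IDs are all assumptions. The original's tie-break is elided, so this version simply relies on the stable sort to keep first-seen order.

```javascript
// Hypothetical mergeAndRank. `perVariant` is assumed to be a list of
// { variant, candidates } objects, one per search that returned results.
function mergeAndRank(perVariant) {
  const byId = new Map(); // productId -> merged candidate

  for (const { variant, candidates } of perVariant) {
    for (const c of candidates) {
      const hit = byId.get(c.productId);
      if (!hit) {
        byId.set(c.productId, { ...c, seenIn: [variant] });
      } else if (!hit.seenIn.includes(variant)) {
        hit.seenIn.push(variant); // count each variant once, even if a page repeats an item
      }
    }
  }

  // Array.prototype.sort is stable, so ties keep first-seen order.
  return [...byId.values()].sort((a, b) => b.seenIn.length - a.seenIn.length);
}

// Usage with illustrative product IDs.
const ranked = mergeAndRank([
  { variant: "blue boy pants", candidates: [{ productId: "A" }, { productId: "B" }] },
  { variant: "kids blue jeans", candidates: [{ productId: "B" }, { productId: "C" }] },
  { variant: "navy kids pants", candidates: [{ productId: "B" }, { productId: "A" }] },
]);
console.log(ranked.map((c) => `${c.productId}:${c.seenIn.length}`).join(" "));
// prints "B:3 A:2 C:1"
```

Product B wins because all three phrasings surfaced it, while C, seen once, sinks to the bottom without being discarded.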
I want to be precise about what this is and isn't:
| What it is | What it is not |
|---|---|
| A cheap corroboration heuristic, free from the merge | The LLM relevance filter (still a later chunk) |
| Good at floating broadly-matched products up | Able to catch a confident, wrong single match |
| Zero extra API calls | A judgement about whether the product is any good |
It will not, on its own, save us from the SERVO A50 PRO foldable flip phone that Day 2 returned for iphone 18 replica. That still needs a model reading {niche, title} and saying "no." But as a free first pass, ranking by agreement beats "whatever order they parsed in."
What the page shows now
Same finder, three new honest signals, all surfaced in the admin rather than hidden in a log:
- Expanded to N searches · gemini-2.5-flash — or, with no key, a plain single query — expansion skipped badge. No silent magic.
- X of N bot-checked — the block count, shown, not swallowed. If everything blocks, you get a warning instead of an empty page pretending all went well.
- found by N searches — on each corroborated candidate, so the relevance signal is visible, not just a sort order.
What’s honestly still rough
Build-in-public means showing the seams, so:
| Rough edge | Status |
|---|---|
| Real relevance filter | Agreement-ranking is a proxy; the LLM judge isn’t built yet |
| Bot-block rate from the app | Throttled (3 at a time + jitter), but unmeasured at scale |
| No margin math | Candidates show listed price only — real margin is a later chunk |
| Nothing is created in the store yet | Finding ≠ listing. The store is still empty. The milestone |
That last row is the one that matters, and it hasn’t moved. Margeen can now find well from inside its own store. It still can’t put a single product on the shelf. Turning a candidate into a real Shopify product — images, description, price, live SKU — is where the empty store finally starts to fill, and it’s the next real milestone.
The code
It's all in the shopify-app folder of the repo: app/lib/aliexpress.server.js (the curl + regex search), app/lib/keywords.server.js (the Node expander), app/lib/finder.server.js (expand → throttle → merge → rank), and app/routes/app.finder.jsx (the admin page). The SETUP.md explains how they sit on top of the CLI-generated shell and the optional GEMINI_API_KEY step. MIT, public, fork it.