Analysis · 3 June 2026
The first question in AI visibility is whether the model can see the web
Whether an engine retrieves live or answers from training data moves the recommendation more than which model it is or how you phrase the question. If you work on getting cited by AI, this is the variable to check first. Most people skip it.
People optimising for AI answers argue about models and prompts. Which assistant matters most, how to word the question, whether to add the year. Those matter at the margin. The thing that moves the answer most is more basic, and easy to miss because it is a setting rather than a sentence: does the engine search the live web before it answers, or does it reply from what it learned in training? I can isolate it cleanly, because I ran one model both ways.
The same model, one toggle
I run Mistral alongside three US engines. The first time, I ran it without web search while the others had it on. Then I turned its search on and ran the identical questions again. Same model, same prompts, same week. Only retrieval changed. Three things moved at once.
It stopped over-listing. Without search, Mistral named an average of 4.4 apps per answer in fitness and 2.8 in the AI category. With search on, those fell to 2.3 and 2.0, in line with the other engines. A model with no live source to check against hedges by naming everything it can remember.
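The per-answer figures are plain averages: count the distinct apps named in each answer, then take the mean for each condition. A minimal sketch, assuming each answer has already been reduced to a list of app names; the record shape and the toy data are hypothetical, not the index's actual schema or results:

```python
def apps_per_answer(answers: list[dict]) -> dict[bool, float]:
    """Mean number of distinct apps named per answer, keyed by grounding.

    Each record is assumed (hypothetically) to look like
    {"grounded": bool, "apps": ["AppA", "AppB", ...]}.
    """
    counts: dict[bool, list[int]] = {}
    for answer in answers:
        # set() so an app named twice in one answer counts once.
        counts.setdefault(answer["grounded"], []).append(len(set(answer["apps"])))
    return {cond: sum(c) / len(c) for cond, c in counts.items()}


# Toy data for illustration only, not measured runs.
sample = [
    {"grounded": False, "apps": ["AppA", "AppB", "AppC", "AppD", "AppE"]},
    {"grounded": False, "apps": ["AppA", "AppC", "AppF"]},
    {"grounded": True, "apps": ["AppA", "AppB"]},
    {"grounded": True, "apps": ["AppA", "AppC", "AppA"]},
]
print(apps_per_answer(sample))  # {False: 4.0, True: 2.0}
```

Counting distinct names rather than raw mentions keeps a model that repeats itself from looking like one that over-lists.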
It stopped recommending the dead. I seed each category with a product that no longer exists: Mint, the budgeting app Intuit shut down in 2024, and Bard, the assistant Google renamed to Gemini. Ungrounded, Mistral recommended Mint in 13 of 15 answers and named Bard six times. Grounded, it named Mint zero times and Bard once, correctly, as "Gemini, formerly Bard." The staleness was not a flaw in the model. It was the missing web.
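Counting the dead products means separating a stale recommendation from a correct historical reference: "Gemini, formerly Bard" names the dead product but does not recommend it. A minimal sketch of one way to draw that line, assuming plain-text answers and a hand-written list of retirement phrasings; the patterns and the function name are hypothetical and much cruder than a production classifier:

```python
import re

# Hypothetical phrasings that mention a retired name only as history.
HISTORICAL_PATTERNS = [
    r"formerly\s+{name}",
    r"previously\s+(?:called|known as)\s+{name}",
    r"{name}\s+(?:was|has been)\s+(?:shut down|discontinued|renamed|retired)",
]


def classify_mention(answer: str, dead_name: str) -> str:
    """Return 'absent', 'historical', or 'stale' for one answer."""
    name = re.escape(dead_name)
    if not re.search(rf"\b{name}\b", answer, flags=re.IGNORECASE):
        return "absent"
    for pattern in HISTORICAL_PATTERNS:
        if re.search(pattern.format(name=name), answer, flags=re.IGNORECASE):
            return "historical"
    # Named with no retirement context: treat it as a recommendation.
    return "stale"


answers = [
    "Try Gemini, formerly Bard, or ChatGPT.",
    "Bard and ChatGPT are both solid picks.",
    "ChatGPT and Claude are the usual choices.",
]
print([classify_mention(a, "Bard") for a in answers])
# ['historical', 'stale', 'absent']
```

Matching is case-insensitive and on whole words, so a common word such as "mint" used in another sense would still be counted; real tallies would need a better classifier or a human check.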
And it stopped looking insular. Ungrounded, Mistral was the only engine that ever recommended its own assistant, Le Chat: six times, against zero from every US engine. That looked like home-team bias. Switch search on and it names Le Chat zero times too, because Le Chat has a thin web footprint and a grounded model retrieves what the web actually says. The "bias" was a model reciting its training data before the web could correct it.
An ungrounded model recommends the past. It favours what was big when its training stopped, keeps dead products alive, over-lists, and leans on whatever its makers fed it. Give it live retrieval and it converges on what the web says now.
I then folded the grounded Mistral into the leaderboard score and the ranking barely moved: the AI leaders shifted by a point or two, no more. Engines from different labs and countries mostly agree once they are all grounded. The disagreement I had measured before was not French versus American. It was offline versus online.
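The scoring formula is not given here, so this sketch assumes a simple hypothetical one: average each brand's mention rate across engines, then compare scores before and after adding an engine. It only illustrates why folding in a fourth engine that broadly agrees moves scores by about a point rather than reshuffling the order; the engine names, brands and numbers are invented:

```python
def leaderboard(rates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each brand's mention rate (percent of answers) across engines.

    rates maps engine -> brand -> percent; a brand an engine never names
    counts as 0 for that engine.
    """
    brands = {b for per_engine in rates.values() for b in per_engine}
    return {
        b: sum(per_engine.get(b, 0.0) for per_engine in rates.values()) / len(rates)
        for b in brands
    }


def ranking(scores: dict[str, float]) -> list[str]:
    """Brands ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical mention rates, not measured data.
three_engines = {
    "engine_a": {"BrandX": 80.0, "BrandY": 60.0},
    "engine_b": {"BrandX": 70.0, "BrandY": 50.0},
    "engine_c": {"BrandX": 75.0, "BrandY": 55.0},
}
four_engines = {**three_engines, "engine_d": {"BrandX": 78.0, "BrandY": 52.0}}

before, after = leaderboard(three_engines), leaderboard(four_engines)
for brand in ranking(after):
    print(brand, before[brand], "->", after[brand])
print("order unchanged:", ranking(before) == ranking(after))
```

Averaging across engines means one additional engine can only pull a score part of the way toward its own view, which is why an engine that mostly agrees barely moves the board.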
Grounded is not a switch, and not a cure
It helps to think of grounding as a dial, not a button. Even with search on, engines vary in how tightly they are pinned to what they retrieve. Asked for budgeting apps, grounded ChatGPT still named the dead Mint 8 times and grounded Gemini 9 times, on questions that did not mention it. Google's AI Overviews, the surface tied most tightly to live search, named it zero times. Same dead app, same week, three different amounts of staleness among engines that were all "grounded."
The lesson is not that grounding fixes everything. It is that the floor without it is much lower, and that the surface your buyers actually use matters. Google's chat app and Google's search overviews are the same company and behave differently. Knowing which one your customers query is part of the job.
The flip side: retrieval makes your web presence the currency
If a grounded engine recommends what the web says, then having nothing on the web means being invisible, regardless of how good the product is. I tracked a brand-new indie fitness app with no marketing behind it. It scored zero across every engine, on every question. Not penalised, absent. In the AI category, Qwen, a capable model with little Western coverage, was just as absent. Grounding rewards the thing AEO is supposed to be about: a real, current, retrievable footprint. It punishes products that exist but are not written about.
What to do with this if you work on AEO
Check grounding before you read a single ranking. A visibility report that does not say whether the engine had search on is telling you about a model's memory, not about what your customers see. Test the grounded consumer surface people actually use, not a bare API with its tools switched off, and not a model running from memory. Treat any single ungrounded snapshot as stale by default: it reflects the training cutoff, inflates long lists, and over-rewards whatever was prominent a year or two ago.
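That first check can be made mechanical: before aggregating anything, sort runs by whether retrieval was on, and keep "not stated" as its own bucket rather than assuming either way. A minimal sketch; the `Run` record, its fields and `partition_runs` are hypothetical, not the schema of any particular visibility tool:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    engine: str
    grounded: Optional[bool]  # None when the report does not say


def partition_runs(runs: list[Run]) -> dict[str, list[Run]]:
    """Split runs by retrieval status, never guessing a missing flag."""
    buckets: dict[str, list[Run]] = {"grounded": [], "ungrounded": [], "unknown": []}
    for run in runs:
        if run.grounded is None:
            buckets["unknown"].append(run)
        elif run.grounded:
            buckets["grounded"].append(run)
        else:
            buckets["ungrounded"].append(run)
    return buckets


runs = [Run("engine_a", True), Run("engine_b", False), Run("engine_c", None)]
buckets = partition_runs(runs)
print({k: [r.engine for r in v] for k, v in buckets.items()})
```

Only the grounded bucket describes what customers see. The ungrounded bucket is a memory snapshot, and the unknown bucket cannot be interpreted until someone finds out how it was run.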
Then spend your effort where retrieval can reach it. Keep current, accurate, quotable information about your product where the engines crawl, because that is what a grounded answer is built from. A model with web search is not judging you on what it learned in training. It is judging you on what it can find today.
What this is
Drawn from three preview runs on 3 June 2026, across budgeting, fitness and AI assistant apps, in US English. The clean before-and-after is Mistral, run once without web search and once with it via its web_search connector; the US engines (ChatGPT, Gemini, Claude) ran grounded throughout, and Google's AI Overviews are captured from the live search page. The full Mistral comparison is its own write-up; the dead-app detail is in the Mint piece. See the method for why this index grounds every engine.