Analysis · 3 June 2026

The first question in AI visibility is whether the model can see the web

Whether an engine retrieves from the live web or answers from its training data moves the recommendation more than which model it is or how you phrase the question. If you work on getting cited by AI, this is the variable to check first. Most people skip it.

People optimising for AI answers argue about models and prompts: which assistant matters most, how to word the question, whether to add the year. Those matter at the margin. The thing that moves the answer most is more basic, and easy to miss because it is a setting rather than a sentence: does the engine search the live web before it answers, or does it reply from what it learned in training? I can isolate that variable cleanly, because I ran one model both ways.

The same model, one toggle

I run Mistral alongside three US engines. The first time, I ran it without web search while the others had it on. Then I turned its search on and ran the identical questions again. Same model, same prompts, same week. Only retrieval changed. Three things moved at once.

It stopped over-listing. Without search, Mistral named 4.4 apps per answer in fitness and 2.8 in the AI category. With search on, those fell to 2.1 and 1.9, close to the other engines. A model with no live source to check against hedges by naming everything it can remember.
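To make "apps per answer" concrete: it is a mean, over answers, of how many distinct apps each answer names. Here is a minimal Python sketch, assuming each answer has already been reduced to a list of recommended app names. The record fields (`engine`, `grounded`, `category`, `apps`) and the placeholder apps are hypothetical, not the data behind the figures above.

```python
# Hypothetical answer records (placeholders, not the study's data): one entry
# per answer, listing the apps it recommended.
ANSWERS = [
    {"engine": "mistral", "grounded": False, "category": "fitness", "apps": ["A", "B", "C", "D"]},
    {"engine": "mistral", "grounded": False, "category": "fitness", "apps": ["A", "B", "C", "D", "E"]},
    {"engine": "mistral", "grounded": True, "category": "fitness", "apps": ["A", "B"]},
    {"engine": "mistral", "grounded": True, "category": "fitness", "apps": ["A", "C"]},
]


def apps_per_answer(records, engine, grounded, category):
    """Mean number of distinct apps named per answer for one engine, setting and category."""
    counts = [
        len(set(r["apps"]))  # dedupe in case an answer names the same app twice
        for r in records
        if r["engine"] == engine and r["grounded"] == grounded and r["category"] == category
    ]
    return sum(counts) / len(counts) if counts else 0.0


print(apps_per_answer(ANSWERS, "mistral", False, "fitness"))  # 4.5
print(apps_per_answer(ANSWERS, "mistral", True, "fitness"))   # 2.0
```

Comparing the two calls for the same engine, category and prompts is the whole experiment in miniature: only the `grounded` flag differs.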

It stopped recommending the dead. I seed the AI category with a product that no longer exists under its old name: Bard, the assistant Google renamed to Gemini in early 2024. Ungrounded, Mistral named Bard six times, treating a retired brand as a live recommendation. Grounded, Bard appears only in the correct form, "Gemini, formerly Bard," never as a recommendation of the dead product. The staleness was not a flaw in the model. It was the missing web.
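Separating a correct rename note from a stale recommendation can be roughly automated. The sketch below assumes a short, hypothetical phrase list ("formerly Bard", "previously known as Bard"). Any Bard mention not covered by one of those phrases counts as recommending the retired product. This is a heuristic, not necessarily how these runs were scored, and a hand check is still wise.

```python
import re

# Phrases treated as a correct rename note (hypothetical list; extend as needed).
RENAME_NOTE = re.compile(r"\b(?:formerly|previously)(?:\s+known\s+as)?\s+Bard\b", re.IGNORECASE)
# Any whole-word mention of Bard (so "Bardeen" does not match).
ANY_BARD = re.compile(r"\bBard\b", re.IGNORECASE)


def classify_bard(answer: str) -> str:
    """Label an answer by how it mentions Bard."""
    total = len(ANY_BARD.findall(answer))
    notes = len(RENAME_NOTE.findall(answer))
    if total == 0:
        return "absent"
    if total > notes:
        # At least one mention is not part of a rename note.
        return "stale_recommendation"
    return "rename_note"


print(classify_bard("Try Gemini, formerly Bard, for research."))  # rename_note
print(classify_bard("Google Bard is great for drafting."))        # stale_recommendation
```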

And it stopped looking insular. Ungrounded, Mistral was the only engine that ever recommended its own assistant, Le Chat: six times, against zero for every US engine. That looked like home-team bias. With search on, it stopped naming Le Chat entirely (zero times across the set), because Le Chat has a thin web footprint and a grounded model retrieves what the web actually says. The "bias" was a model reciting its training data before the web could correct it.

An ungrounded model recommends the past. It favours what was big when its training stopped, keeps dead products alive, over-lists, and leans on whatever its makers fed it. Give it live retrieval and it converges on what the web says now.

I then folded the grounded Mistral into the leaderboard score and the ranking barely moved: the AI leaders shifted by a point or two, no more. Engines from different labs and countries, once all are grounded, mostly agree. The disagreement I had measured before was not French versus American. It was offline versus online.
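A toy score shows why one more grounded engine barely moves a pooled ranking. The sketch assumes a hypothetical scoring rule that may differ from the index's own: an app's score is 100 times the share of pooled answers that name it. The engines and answers are placeholders.

```python
# Hypothetical scoring rule, not necessarily the index's own: an app's score is
# 100 x the share of pooled answers (across all engines) that name it.
def scores(answers_by_engine):
    pooled = [set(apps) for answers in answers_by_engine.values() for apps in answers]
    names = set().union(*pooled)
    return {name: 100 * sum(name in a for a in pooled) / len(pooled) for name in names}


def max_shift(before, after):
    """Largest absolute change in any app's score between two leaderboards."""
    names = set(before) | set(after)
    return max((abs(after.get(n, 0.0) - before.get(n, 0.0)) for n in names), default=0.0)


# Placeholder engines and answers; each inner list is one answer's recommendations.
us_engines = {
    "engine-x": [["A", "B"], ["A", "C"]],
    "engine-y": [["A"], ["B"]],
}
with_mistral = {**us_engines, "mistral-grounded": [["A", "B"], ["A"]]}

before, after = scores(us_engines), scores(with_mistral)
print(round(max_shift(before, after), 2))  # 8.33
print(sorted(before, key=before.get, reverse=True) == sorted(after, key=after.get, reverse=True))  # True
```

With only six toy answers, the scores move more than the real ones did. What the sketch shows is the mechanism: an engine that broadly agrees with the others nudges scores but leaves the order intact.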

Grounded is not a switch, and not a cure

It helps to think of grounding as a dial, not a button. Even with search on, engines vary in how tightly they are pinned to what they retrieve, and the same company can run two surfaces that disagree. Ask Google's Gemini API for the best AI app and it behaves like the other majors. Ask Google's AI Overview the same question and it leads with Gemini and leaves ChatGPT out entirely. Same web, same week, different answer.

The lesson is not that grounding fixes everything. It is that the floor without it is much lower, and that the surface your buyers actually use matters. Google's chat app and Google's search overviews are the same company and behave differently. Knowing which one your customers query is part of the job.

The flip side: retrieval makes your web presence the currency

If a grounded engine recommends what the web says, then having nothing on the web means being invisible, regardless of how good the product is. I tracked a brand-new indie fitness app with no marketing behind it. It scored zero across every engine, on every question. Not penalised, absent. In the AI category, Qwen, a capable model with little Western coverage, did the same. Grounding rewards the thing AEO is supposed to be about: a real, current, retrievable footprint. It punishes products that exist but are not written about.

What to do with this if you work on AEO

Check grounding before you read a single ranking. A visibility report that does not say whether the engine had search on is telling you about a model's memory, not about what your customers see. Test the grounded consumer surface people actually use, not a bare API with its tools switched off, and not a model running from memory. Treat any single ungrounded snapshot as stale by default: it reflects the training cutoff, inflates long lists, and over-rewards whatever was prominent a year or two ago.
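The first check can be made mechanical: refuse to rank any run whose grounding status is unknown, and set ungrounded runs aside as stale. Here is a minimal sketch. The `Run` record and its fields are hypothetical, and the `surface` field only records which product a run came from so a report can show it alongside the ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    engine: str
    surface: str               # hypothetical label, e.g. "consumer app" or "api"
    grounded: Optional[bool]   # None: the report never said whether search was on


def triage(runs):
    """Split runs into those fit to rank and those set aside, with a reason."""
    usable, set_aside = [], []
    for run in runs:
        if run.grounded is None:
            set_aside.append((run, "grounding not reported"))
        elif not run.grounded:
            set_aside.append((run, "ungrounded: reflects training cutoff, treat as stale"))
        else:
            usable.append(run)
    return usable, set_aside


runs = [
    Run("engine-a", "consumer app", True),
    Run("engine-b", "api", False),
    Run("engine-c", "api", None),
]
usable, set_aside = triage(runs)
print([r.engine for r in usable])  # ['engine-a']
print([(r.engine, why) for r, why in set_aside])
```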

Then spend your effort where retrieval can reach it. Keep current, accurate, quotable information about your product where the engines crawl, because that is what a grounded answer is built from. A model with web search is not judging you on what it learned in training. It is judging you on what it can find today.

What this is

Drawn from preview runs on 3 June 2026, across fitness and AI assistant apps, US English. The clean before-and-after is Mistral, run once without web search and once with it via its web_search connector; the US engines (ChatGPT, Gemini, Claude) ran grounded throughout, and Google's AI Overviews are captured from the live search page. See the method for why this index grounds every engine.