Analysis · 3 June 2026

Give Mistral web search and it stops looking like the odd one out

Three of my four answer engines are American. The fourth is French. The first time I ran the index, the French one behaved very differently. The difference turned out to be a setting, not a passport.

ChatGPT, Gemini and Claude come from US labs. Mistral is the European outlier, the model European and public-sector buyers reach for on sovereignty grounds. So the comparison is tempting: ask the same buyer questions and see whether the French model keeps up. On the first run it plainly did not. It recommended more apps than anyone, it kept dead products on the list, and it was the only engine that ever named its own assistant. The easy headline wrote itself.

I did not publish it, because the comparison was rigged. The three US models ran with live web search switched on. Mistral ran without it, answering from training data alone. I was comparing a model that could read today's web against one working from memory. So I wired search into Mistral and ran it again.

What the handicapped run looked like

The handicapped run is worth seeing first, because it shows what any ungrounded model does, not what a French one does. Without search, Mistral said yes to everything. In fitness it named 4.4 apps an answer against ChatGPT's 1.4. It named the strength tracker Strong in 70% of answers, Nike Training Club in 65% and Freeletics in 58%, where the grounded engines put all three far lower. A model with no live source to check against hedges by listing.
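For readers who want to see how figures like these can be derived, here is a minimal sketch of a mention rate and an apps-per-answer average. It is not the index's actual code: the function name mention_stats, the naive case-insensitive substring matching and the four sample answers are all assumptions made up for illustration.

```python
from collections import Counter


def mention_stats(answers, tracked_apps):
    """Return (rate per app, average tracked apps named per answer).

    Assumption: an app counts as "named" if its name appears anywhere in
    the answer, case-insensitively. Real matching would need to be stricter
    (for example, "Strong" would also match the adjective "strong").
    """
    counts = Counter()
    total_named = 0
    for text in answers:
        lowered = text.lower()
        named = {app for app in tracked_apps if app.lower() in lowered}
        counts.update(named)          # each app counted once per answer
        total_named += len(named)
    n = len(answers)
    rates = {app: counts[app] / n for app in tracked_apps}
    return rates, total_named / n


# Hypothetical answers, invented for this example only.
answers = [
    "Try Strong, Nike Training Club and Freeletics.",
    "Strong is a solid pick for lifting.",
    "Nike Training Club covers most needs.",
    "Consider Freeletics or Strong.",
]
apps = ["Strong", "Nike Training Club", "Freeletics"]
rates, apps_per_answer = mention_stats(answers, apps)
print(rates, apps_per_answer)
```

Counting each app once per answer keeps the rate readable as "share of answers that named it", which is how the percentages in this piece are meant to be read.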

It also went stale. In the AI category I track Bard, the assistant Google renamed to Gemini in early 2024, as a deliberate dead-entity test. Ungrounded Mistral named Bard as a live product six times. And it was the lone engine to recommend its own Le Chat, six times, when every US model named it zero. Read at face value, that last one looks like a French model favouring a French product.

What changed when I turned search on

It converged. Asked for the best AI app, grounded Mistral now leads with "ChatGPT, Claude and Gemini," the same top tier the US models name. Its top three by rate are ChatGPT at 41%, Gemini at 38% and Claude at 32%, in line with the others. The over-listing collapsed: 2.0 apps an answer in AI, 2.3 in fitness, which puts it in the middle of the pack rather than off the end of it. Strong fell from 70% to 26%, Nike Training Club from 65% to 35%. In fitness the grounded French model now names fewer apps than Gemini does.

The dead products went with it. Bard dropped from six mentions to one, and that one is correct: "Gemini, formerly Bard." Budgeting is starker. Ungrounded Mistral recommended Mint, the app Intuit shut down in 2024, in thirteen of fifteen answers. Grounded, it names Mint zero times, which makes it cleaner on the dead app than grounded ChatGPT or Gemini, both of which still slip. The staleness came from the missing web, nothing else.
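The distinction between a stale mention and a correct historical one ("Gemini, formerly Bard") can be sketched in a few lines. This is a rough heuristic under stated assumptions, not the check the index uses: the cue words in RENAME_CUES and the function name classify_dead_mention are hypothetical, and a human read would still be needed for edge cases.

```python
import re

# Assumed cue phrases that signal the old name is being cited as history.
RENAME_CUES = r"(?:formerly|previously|once called|renamed from)"


def classify_dead_mention(answer, dead_name):
    """Label one answer as 'absent', 'historical' or 'stale'.

    'historical' means the dead name appears right after a rename cue.
    Limitation: an answer containing both a historical and a stale use
    of the name is labelled 'historical'.
    """
    name = re.escape(dead_name)
    if not re.search(rf"\b{name}\b", answer, re.IGNORECASE):
        return "absent"
    cue = rf"\b{RENAME_CUES}\s+(?:known\s+as\s+)?{name}\b"
    if re.search(cue, answer, re.IGNORECASE):
        return "historical"
    return "stale"


print(classify_dead_mention("Gemini, formerly Bard.", "Bard"))
print(classify_dead_mention("Try Bard for quick drafts.", "Bard"))
print(classify_dead_mention("ChatGPT, Claude and Gemini.", "Bard"))
```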

I folded grounded Mistral into the score and the leaderboard barely moved. ChatGPT went from 47 to 46, Claude from 47 to 44, Gemini stayed at 40. A new engine that moves the scores by three points at most is one that already agrees with the others.

The home bias was a memory

The cleanest result is Le Chat. The single strongest "Mistral favours its own" data point, the one an easy story would lead with, vanished the moment Mistral could see the web. Grounded, it names Le Chat zero times, exactly like every US engine. The reason is plain once you look: Le Chat has a thin web footprint, so any model leaning on live search struggles to surface it, including the model its own maker built. The bias was not loyalty. It was a model reciting what it learned in training, before the web could correct it.

What this actually measures

The useful lesson is bigger than one vendor. When you compare AI engines for whether they recommend your product, you are mostly comparing whether they have live search, not which lab trained them or which country they come from. Grounding dominates. An ungrounded model, French or American, gives you an inflated, two-year-old picture. A grounded one gives you something close to the live consensus, whoever built it. That is why this index runs every engine with search on, and why grounded Mistral now counts in the score next to the rest.

One thing the convergence does not erase: a lot of Mistral deployments, especially the sovereignty-driven ones, run the plain chat model with no search. If that is the engine your customers query, you are facing the handicapped version described above, the one that surfaces stale and obscure names and answers from an older map. The fix on the vendor side is the same one I applied. Turn search on.

What this is

Preview runs on 3 June 2026 across AI assistants, fitness and budgeting, US English. AI and fitness use 34 frozen prompts each on the explore tier (mistral-small-latest); budgeting is the five-prompt preview on the fidelity tier (mistral-medium-latest), so its Mistral numbers sit beside the other engines' chat models on the same tier. The US engines use their built-in web search; grounded Mistral uses its web_search connector through the Agents API, a different mechanism for the same thing: live retrieval before answering. I re-ran only the Mistral column and patched it into the existing snapshots, so the other engines' numbers are unchanged. See the AI apps index, the fitness index and the Mint write-up for the full picture, or the self-preference piece for the one home-team bias that grounding did not remove.
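Patching one engine's column into an existing snapshot can be sketched as below. The snapshot layout (app name mapped to per-engine rates) and the function name patch_engine_column are assumptions for illustration, not the index's real data format. The Mistral figures for Strong and Nike Training Club are the ones reported above; the ChatGPT values are placeholders, not reported numbers.

```python
import copy


def patch_engine_column(snapshot, engine, new_column):
    """Return a copy of snapshot with one engine's rates replaced.

    snapshot:   {app: {engine: rate}}  (assumed layout)
    new_column: {app: rate} from the re-run of a single engine
    Other engines' values are left exactly as they were.
    """
    patched = copy.deepcopy(snapshot)   # never mutate the frozen snapshot
    for app, rate in new_column.items():
        patched.setdefault(app, {})[engine] = rate
    return patched


snapshot = {
    # ChatGPT rates here are placeholders, not figures from the index.
    "Strong": {"ChatGPT": 0.20, "Mistral": 0.70},
    "Nike Training Club": {"ChatGPT": 0.30, "Mistral": 0.65},
}
grounded_mistral = {"Strong": 0.26, "Nike Training Club": 0.35}

patched = patch_engine_column(snapshot, "Mistral", grounded_mistral)
print(patched)
```

Working on a deep copy keeps the original snapshot available for a before-and-after comparison like the one in this piece.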