Analysis · 3 June 2026
I asked the assistants for the best AI app. They mostly didn't pick themselves.
The obvious worry about an AI ranking AI is self-dealing. I tested it directly. Four of the assistants I query are also apps on the board. The bias is real, but it isn't where you'd look.
This is the most recursive category I measure. When I ask ChatGPT, Claude, Gemini and Mistral's Le Chat to name the best AI app, they are ranking a shelf they sit on. The headline fear writes itself: each model crowns itself. So I checked, on the open "best AI app" questions where no brand is named in the prompt.
It mostly didn't happen. The three big chat models cross-recommend each other as a settled top tier. ChatGPT, asked for the best AI app, names ChatGPT less often than Gemini or Claude name it, so of the three, its own answers rate it lowest. Claude and Gemini land within noise of how their rivals rate them. There is no clear act of self-coronation among the majors. If anything, they agree about each other.
The big models don't favor themselves. They favor the same three names, including their competitors. The bias is consensus, not vanity.
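The self-preference check amounts to one comparison per engine: how often it names its own app, minus how often the other engines name that same app. Below is a minimal sketch of that arithmetic, assuming mention counts per engine and app over the same set of questions. The `OWN_APP` mapping and `self_preference_gap` function are hypothetical names for illustration, not the tooling behind this piece, and the sketch does no significance testing.

```python
from statistics import mean

# Hypothetical mapping from each answering engine to the app it ships.
OWN_APP = {"chatgpt": "ChatGPT", "claude": "Claude", "gemini": "Gemini"}


def self_preference_gap(mentions, n_questions, own_app=OWN_APP):
    """Per engine, return (own mention rate) minus (rivals' mean rate for that app).

    mentions[engine][app] is the number of questions whose answer from
    `engine` named `app`. A positive gap means the engine names its own
    app more often than its rivals do; a negative gap means less often.
    """
    gaps = {}
    for engine, app in own_app.items():
        # Share of questions where this engine named its own app.
        self_rate = mentions[engine].get(app, 0) / n_questions
        # Share of questions where each *other* engine named the same app.
        rival_rates = [
            mentions[other].get(app, 0) / n_questions
            for other in own_app
            if other != engine
        ]
        gaps[engine] = self_rate - mean(rival_rates)
    return gaps
```

Read this way, "rates itself lowest of the three" is a negative gap, and "within noise" is a gap close to zero. With one answer per question, a gap of a few questions either way says little on its own.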
Where I thought it hid, and where it really does
I expected to find it in the smaller player. In my first run, Mistral was the only engine that ever named its own Le Chat, six times, while every other assistant never named it. That looked like a maker propping up its own product. It was not. That run had Mistral answering from memory, with no web search. When I gave it the same live search the others use, it named Le Chat zero times, the same as everyone else. The home bias was a property of the ungrounded model, not of Mistral. The full before-and-after is its own write-up.
The self-preference that survives is Google's. Not the Gemini API, which behaved like the other majors, but Google's AI Overviews, the box hundreds of millions of people actually see. Asked "best AI app 2026," that box leads with Gemini and does not mention ChatGPT at all. Aggregate it over more questions and it evens out, but the most-seen surface, on the headline query, quietly puts its own product first. I saw the same split in budgeting: the Gemini app kept recommending a dead app while Google's search surface got it right. The model and the surface are not the same animal.
A ghost named Bard
I seeded the roster with Bard, the assistant Google renamed Gemini in early 2024. It is an AI version of the dead-app test that caught Mint still being recommended in budgeting. Grounding kills the ghost cleanly. The only engine that named Bard as a live product was Mistral on its ungrounded run, six times. Switch its web search on and that drops to a single mention, the correct one: "Gemini, formerly Bard." Every search-grounded engine, Google's own included, passes. Without grounding, a model will happily recommend a product that no longer exists under that name.
The shape of the category
Three names own the top: ChatGPT, Claude and Gemini, clustered at 40 to 47, with Perplexity and Microsoft Copilot trailing and everything else in single digits. It's a consolidated market, but consolidated around three brands rather than one, unlike budgeting's single leader or fitness's open scrum. And "AI app" fractures the moment you add a job: ask for the best app for coding and the assistants name Cursor and GitHub Copilot, none of the consumer chatbots; for writing, Grammarly and Notion AI; for research, Perplexity and a wall of academic tools. The famous names win the generic question and lose the specific one.
What this is
Thirty-four frozen questions across four API assistants on the fast model tier, all with web search on, plus Google's AI Overviews captured from the web. US English, k=1 (one answer per question per engine). Every engine is grounded and counts in the score. A preview, like the others. See the AI apps index, or the budgeting ghost and fitness write-ups, for how the same method reads three very different markets.