Analysis · 3 June 2026 · updated 1 July 2026

I asked the assistants for the best AI app. They mostly didn't pick themselves.

The obvious worry about an AI ranking AI is self-dealing, so I tested it directly. All four assistants I query are also apps on the board. The one bias I found in the first snapshot didn't survive a second month, which is its own finding.

This is the most recursive category I measure. When I ask ChatGPT, Claude, Gemini and Mistral's Le Chat to name the best AI app, they are ranking a shelf they sit on. The headline fear writes itself: each model crowns itself. So I checked, on the open "best AI app" questions where no brand is named in the prompt.

It mostly didn't happen. The three big chat models cross-recommend each other as a settled top tier, and no one of them runs away with the category the way Duolingo does in language learning. In the first snapshot, ChatGPT rated itself lowest of the three. A month later the self/rival gap had moved for all three models: ChatGPT and Gemini now name themselves a bit more often than their rivals name them, Claude a bit less. At 35 questions and one sample each, that is close to the size of swing I'd expect from re-running the same prompts on a stochastic model, so I'm not calling two data points a trend. What held across both months: none of the three ran away with the category, and none of the gaps was the kind of one-sided coronation the headline fear predicts.
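
The "size of swing I'd expect" can be put on a rough scale. What follows is a minimal sketch, not the method behind the index: it assumes each of the 35 questions is an independent yes/no trial (the model either names the brand or it doesn't) with a fixed underlying rate, and the 40% rate in the usage note is a hypothetical placeholder, not a measured figure. The function names are invented for illustration.

```python
import math


def mention_rate_se(p: float, n: int) -> float:
    """Standard error of a mention rate estimated from n yes/no questions."""
    return math.sqrt(p * (1.0 - p) / n)


def month_to_month_noise(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% band for the change between two independent runs.

    Assumes both runs share the same true rate p, so any observed
    change is sampling noise. The difference of two independent
    estimates has sqrt(2) times the standard error of one.
    """
    return z * math.sqrt(2.0) * mention_rate_se(p, n)


if __name__ == "__main__":
    # Hypothetical rate of 0.40 on 35 questions, one sample each.
    band = month_to_month_noise(0.40, 35)
    print(f"noise band: +/- {band:.2f}")
```

Under those assumptions the band comes out at roughly 23 percentage points either way, which is why a gap that shifts by a few mentions between two runs is not worth calling a trend.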

The big models don't favor themselves in any lasting way. They favor the same three names, including their competitors. The bias is consensus, not vanity.

Where I thought it hid, and where it really does

I expected to find it in the smaller player. In my first run, Mistral was the only engine that ever named its own Le Chat, six times, while no other assistant named it at all. That looked like a maker propping up its own product. It was not. That run had Mistral answering from memory, with no web search. When I gave it the same live search the others use, it named Le Chat zero times, down from six, the same as every other assistant. The home bias disappeared with grounding, which makes it a property of the ungrounded model, not of Mistral.

The one bias I found didn't survive a second month

In June, the self-preference I could document wasn't in the Gemini API, which behaved like the other majors. It was in Google's AI Overviews, the box hundreds of millions of people actually see. Asked "best AI app 2026," that box led with Gemini and did not mention ChatGPT at all. I wrote it up as the one place the bias survived: the model plays fair, the surface doesn't.

A month later, it doesn't hold. The same query now reads: "the top-tier, all-around leaders are ChatGPT for general reasoning and memory, and Claude for in-depth analysis and writing." Across the six open best-AI-app queries I capture from Google each month, the Overview now names Claude in five of six, ChatGPT and Perplexity in four each, and Gemini in only three, the same rough order the API models reach. The one finding that looked like real self-dealing a month ago is gone in the next capture.

I'm not going to claim Google fixed anything. Six queries is a small sample for a surface that regenerates its answer per search, and the lead on a single query can flip from one capture to the next without any change in policy. What I can say is that the June finding, read on its own, would have been wrong to treat as settled. That's the argument for running this every month instead of once: a single snapshot can't tell you whether what it caught is a pattern or a passing answer.
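
To see how little six captures pin down, here is a small sketch of the exact binomial arithmetic. It assumes each of the six queries is an independent draw in which a brand is named with a fixed probability p; that is an idealization of a surface that regenerates per search, and the 0.6 rate is a hypothetical value chosen for illustration, not something I measured.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k mentions in n independent captures."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def count_distribution(n: int, p: float) -> list[float]:
    """Probabilities of seeing 0..n mentions at a true rate p."""
    return [binom_pmf(k, n, p) for k in range(n + 1)]


if __name__ == "__main__":
    # Hypothetical: a brand truly named in 60% of answers, six captures.
    for k, prob in enumerate(count_distribution(6, 0.6)):
        print(f"{k} of 6: {prob:.3f}")
```

With a single underlying rate, counts of three, four and five out of six all carry substantial probability, so two brands with identical true rates can easily land on different counts in the same month.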

A ghost named Bard

I seeded the roster with Bard, the assistant Google renamed Gemini in early 2024, as a dead-app test: does a model still recommend a product that no longer exists under that name? Grounding kills the ghost cleanly. The one engine that named Bard as a live product was Mistral on its ungrounded run, six times. Switch its web search on and the recommendation drops to zero: the only times Bard surfaces are in the correct "Gemini, formerly Bard," which the index doesn't count as naming the dead product. Every search-grounded engine, Google's own included, passes. Without grounding, a model will happily recommend the ghost.
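
The counting rule for the ghost test can be sketched in a few lines. This is an illustration under my own assumptions, not the index's actual matcher: the function name and the list of rename phrasings ("formerly", "previously", and so on) are hypothetical, and a production rule would need more patterns than these.

```python
import re

# A bare mention of the brand. Case-sensitive so the common noun
# "bard" is not counted as the product.
BARD = re.compile(r"\bBard\b")

# Hypothetical list of phrasings that mark Bard as a former name.
RENAME_CONTEXT = re.compile(
    r"\b(?:formerly|previously|formerly known as|previously known as|once called)"
    r"\s+(?:Google\s+)?Bard\b",
    re.IGNORECASE,
)


def names_dead_bard(answer: str) -> bool:
    """True if the answer names Bard outside a 'formerly Bard' context."""
    # Remove the correct historical references first, then look for
    # any mention of Bard that remains.
    remaining = RENAME_CONTEXT.sub("", answer)
    return bool(BARD.search(remaining))


if __name__ == "__main__":
    print(names_dead_bard("Gemini, formerly Bard, is a strong all-rounder."))
    print(names_dead_bard("For brainstorming, try Google Bard."))
```

Stripping the rename phrases before searching means an answer that both explains the rename and separately recommends Bard still counts against the engine.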

The shape of the category

Three names own the top: ChatGPT and Claude near the mid-forties, Gemini a step behind in the mid-thirties, with Perplexity and Microsoft Copilot trailing and everything else in single digits. It's a consolidated market, but consolidated around three brands rather than one, unlike fitness's open scrum. And "AI app" fractures the moment you add a job: ask for the best app for coding and the assistants lead with Cursor and GitHub Copilot over the consumer chatbots; for writing, Grammarly and Notion AI; for research, Perplexity and a wall of academic tools. The famous names win the generic question and lose the specific one.

What this is

Thirty-five questions put to four AI assistants, all with web search on, plus Google's AI Overviews captured from the web, run in June and again in July on the same frozen prompts. US English. Every engine is grounded and counts in the score. See the AI apps index for the current board, or the fitness write-up for how the same method reads a very different market.