- Gemini 3 Flash often invents answers rather than admitting when it doesn’t know something
- The problem arises with obscure or difficult factual questions
- But it still tests as one of the most accurate and capable AI models overall
Gemini 3 Flash is fast and smart. But if you ask it something it doesn’t actually know—something obscure or tricky or just beyond its training—it will almost always try to bluff its way through, according to a recent evaluation by the independent testing group Artificial Analysis.
Gemini 3 Flash hit 91% on the hallucination rate portion of the AA-Omniscience benchmark. That means that when it didn't have the answer, it still gave one almost all of the time, and the answer it gave was entirely fictitious.
AI chatbots making things up has been a problem since they first debuted, and knowing when to stop and say "I don't know" is just as important as knowing how to answer in the first place. That is exactly what this part of the benchmark measures: whether a model can distinguish actual knowledge from a guess. Right now, Gemini 3 Flash doesn't do it very well.
Lest the number distract from reality, it should be noted that Gemini's high hallucination rate does not mean that 91% of its total responses are false. Instead, it means that in situations where the correct answer would be "I don't know", it produced an answer 91% of the time. It's a subtle but important distinction, and one with real-world implications, especially as Gemini is integrated into products like Google Search.
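If it helps to see that distinction as plain arithmetic, here is a minimal illustrative sketch. The counts below are invented for the example and are not Artificial Analysis's actual benchmark data; only the 91% figure comes from the reported result.

```python
# Illustrative sketch of how a hallucination rate like AA-Omniscience's is defined.
# All counts below are made up for illustration; they are not benchmark data.

total_questions = 1000      # every prompt in a hypothetical eval
unanswerable = 200          # prompts where "I don't know" is the correct response
answered_anyway = 182       # of those, how many times the model guessed instead

# Hallucination rate: guesses made when the model should have abstained,
# measured only over the unanswerable prompts.
hallucination_rate = answered_anyway / unanswerable          # 0.91 -> "91%"

# That is very different from the share of ALL responses that are fabricated,
# which also depends on how the model does on the other 800 questions.
share_of_all_responses = answered_anyway / total_questions   # 0.182 -> "18.2%"

print(f"Hallucination rate: {hallucination_rate:.0%}")
print(f"Fabricated answers as a share of all questions: {share_of_all_responses:.1%}")
```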
Ok, it’s not just me. Gemini 3 Flash has a 91% hallucination rate on the Artificial Analysis Omniscience Hallucination Rate benchmark!? Can you actually use this for anything serious? I wonder if the reason anthropic models are so good at coding is because they hallucinate a lot… pic.twitter.com/uZnF8KKZD4 (18 December 2025)
This result does not diminish the power and usability of Gemini 3. The model remains the highest performer in general tests, ranking alongside or even ahead of the latest versions of ChatGPT and Claude. It just errs on the side of confidence when it should be modest.
This overconfident urge to respond also appears in Gemini's rivals. What makes Gemini's number stand out is how often it answers anyway in these uncertain scenarios, where the correct answer simply isn't in its training data and there is no definitive public source to point to.
Hallucination Honesty
Part of the problem is simply that generative AI models are, at their core, next-word prediction tools, and predicting the next word is not the same as evaluating the truth. That means the default behavior is to keep generating, even when it would be more honest to say "I don't know".
OpenAI is starting to address this, trying to make its models recognize what they don't know and say so clearly. It's a hard behavior to train, because reward models typically don't value an honest "I don't know" over a confident (but wrong) answer. Still, OpenAI has made it a target for the development of future models.
And Gemini usually cites sources when it can. But even then, it doesn't always pause when it should. That wouldn't matter much if Gemini were just a research model, but since it will be the voice behind many Google features, a confidently wrong answer could have a wide reach.
There is also a design choice here. Many users expect their AI assistant to respond quickly and seamlessly, and saying "I'm not sure" or "Let me check" can feel clumsy in a chatbot context. But it's probably better than being misled. Generative AI is still not always reliable, so double-checking any AI response is always a good idea.