This is a wake-up call for anyone who blindly trusts the answers AI chatbots give. Google has published an assessment of how accurate AI chatbots really are. Using its recently launched FACTS benchmark suite, Google found that even the most powerful AI models fail to reach 70 percent factual accuracy. In simple terms, even the best-performing chatbot gets roughly one out of every three answers wrong, and the others do worse.
Gemini 3 Pro was the most accurate
In Google’s benchmark tests, the company’s own Gemini 3 Pro model performed best, with 69 percent accuracy. Models from OpenAI, Anthropic and Elon Musk’s xAI could not match this level. Gemini 2.5 Pro and OpenAI’s GPT-5 each scored about 62 percent, Grok 4 scored about 54 percent, and Claude 4.5 Opus scored 51 percent. Most AI models struggled with multimodal tasks, where their accuracy dropped below 50 percent.
How does Google’s benchmark test work?
Google’s benchmark assesses AI models in a different way. While most tests focus on tasks such as summarizing text or writing code, the FACTS benchmark measures how factually accurate the information a model provides actually is. The benchmark covers four practical use cases. The first test checks whether the model can give factual answers using only the knowledge it acquired during training. The second evaluates how well the model uses search to find accurate information. The third examines whether the model bases its answers on a provided document without adding unsupported details. The fourth tests its multimodal understanding, such as its ability to interpret charts, diagrams, and images.