Study finds Google's AI Overviews wrong in 10% of cases

A New York Times analysis found that Google's AI Overviews, powered by Gemini, answered only 90% to 91% of questions correctly on a standard benchmark. Extrapolated across Google's search volume, that error rate translates to tens of millions of incorrect responses per day. Google disputes the test's relevance.

The New York Times, working with the startup Oumi, tested AI Overviews using SimpleQA, a benchmark of more than 4,000 questions that OpenAI released in 2024. Initial tests with Gemini 2.5 showed 85% accuracy, which improved to 91% after the Gemini 3 update. Extrapolated to Google's search volume, this implies tens of millions of wrong answers generated each day, or millions per hour, as reports on the findings highlighted.
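The extrapolation is simple arithmetic: the error rate (one minus accuracy) multiplied by the number of answers served. The sketch below illustrates it in Python using the two accuracy figures reported above. The daily volume is a purely hypothetical placeholder, since the article does not state how many AI Overviews Google serves, and the function and constant names are invented for illustration.

```python
def wrong_answers_per_day(accuracy: float, answers_per_day: float) -> float:
    """Expected number of incorrect answers per day at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * answers_per_day


# HYPOTHETICAL volume, chosen only to illustrate the arithmetic.
# It is not a figure from the article or from Google.
ASSUMED_ANSWERS_PER_DAY = 500_000_000

# Accuracy figures as reported in the analysis described above.
for label, accuracy in [("Gemini 2.5", 0.85), ("Gemini 3", 0.91)]:
    per_day = wrong_answers_per_day(accuracy, ASSUMED_ANSWERS_PER_DAY)
    per_hour = per_day / 24
    print(f"{label}: about {per_day:,.0f} wrong per day, {per_hour:,.0f} per hour")
```

Because the result scales linearly with volume, even a high accuracy rate yields a large absolute number of errors once the daily volume reaches the hundreds of millions.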

Related articles

[Image: AI-generated illustration of a smartphone screen featuring Google's AI Overviews upgraded to Gemini 3 with a conversational chat interface.]

Google upgrades AI Overviews to Gemini 3 model

Reported by AI. Image generated by AI.

Google has announced upgrades to AI Overviews in Search, which now use the Gemini 3 model by default. The update lets users ask follow-up questions through a chat interface that leads into AI Mode conversations. The rollout, which is global on mobile devices, aims to make searches more conversational and accurate.

In a comparative evaluation of leading AI models, Google's Gemini 3.2 Fast demonstrated strengths in factual accuracy over OpenAI's ChatGPT 5.2, particularly in informational tasks. The tests, prompted by Apple's partnership with Google to enhance Siri, highlight evolving capabilities in generative AI since 2023. While results were close, Gemini avoided significant errors that undermined ChatGPT's reliability.

Google has released Gemini 3.1 Pro, an updated version of its flagship AI model, emphasizing improvements in problem-solving and reasoning. The model is available in preview for developers and consumers starting today. It builds on the Gemini 3 release from November.

Google has launched an experimental 'Personal Intelligence' feature for its AI Mode in Search, allowing users to connect Gmail and Google Photos for more tailored responses. The opt-in tool, powered by Gemini 3, aims to make search results more relevant by drawing on personal data without training models on full inboxes. It rolls out first to paid subscribers in the US.

Google is overhauling its Workspace apps by integrating deeper Gemini AI capabilities to assist in document creation and editing. The updates allow Gemini to pull context from emails, files, and other sources to generate drafts and refine content. These features aim to streamline workflows for users across Docs, Sheets, Slides, and Drive.

Apple has selected Google's Gemini AI models to enhance its Siri virtual assistant in a forthcoming update. The decision, announced in a joint statement, marks a shift from previous integrations with OpenAI's ChatGPT. This multi-year partnership aims to deliver more capable AI experiences while upholding Apple's privacy standards.

Google has announced that its experimental AI prototype, Genie 3, is now available to subscribers of its highest-tier AI plan. The tool allows users to generate and navigate interactive 3D worlds using simple text prompts. Previously limited to trusted testers, the tool is now open to subscribers aged 18 and older, a step toward broader access.

 

 

 
