Commonly used AI models, including ChatGPT and Gemini, often fail to provide adequate advice on urgent women's health issues, according to a new benchmark test. Researchers found that the models gave insufficient responses to 60 percent of expert-designed queries, pointing to gender biases in AI training data. The study calls for online medical content to be improved to address these gaps.
A team of 17 women's health researchers, pharmacists, and clinicians from the US and Europe created 345 medical queries across specialties including emergency medicine, gynecology, and neurology, then tested them on 13 large language models from companies such as OpenAI, Google, Anthropic, Mistral AI, and xAI. The experts reviewed the AI responses, identified where the models gave inadequate advice, and compiled those cases into a benchmark of 96 queries.
Overall, the models failed to deliver sufficient medical advice on 60 percent of these questions. GPT-5 performed best, with a 47 percent failure rate, while Ministral 8B had the highest, at 73 percent. Victoria-Elisabeth Gruber, a team member at Lumos AI, described the motivation for the study: “I saw more and more women in my own circle turning to AI tools for health questions and decision support.” She warned that AI tools risk inheriting the gender gaps already present in medical knowledge, and said she was surprised by how much performance varied between models.
Cara Tannenbaum at the University of Montreal explained that AI models are trained on historical data that carries built-in biases, and urged that online health sources be updated with explicit sex- and gender-related information. However, Jonathan H. Chen at Stanford University cautioned that the 60 percent figure could be misleading: the sample was small and expert-designed, not representative of the queries typical users ask. He also noted that some scenarios set a conservative bar, such as expecting a model to immediately raise suspicion of pre-eclampsia when a woman reports postpartum headaches.
Gruber acknowledged these points, emphasizing that the benchmark sets a strict, clinically grounded standard: “Our goal was not to claim that models are broadly unsafe, but to define a clear, clinically grounded standard for evaluation.” An OpenAI spokesperson responded that ChatGPT is meant to support, not replace, medical care, and that its latest GPT-5.2 model better takes account of context such as gender. The other companies did not comment. The findings, published as a preprint on arXiv (arXiv:2512.17028), underscore the need for cautious use of AI in healthcare.