AI chatbots fail on 60 percent of urgent women's health queries

Commonly used AI models, including ChatGPT and Gemini, often fail to provide adequate advice for urgent women's health issues, according to a new benchmark test. Researchers found that 60 percent of responses to expert-designed benchmark queries were insufficient, pointing to biases in AI training data. The study calls for improved medical content to address these gaps.

A team of 17 women's health researchers, pharmacists, and clinicians from the US and Europe created 345 medical queries across specialties like emergency medicine, gynecology, and neurology. These were tested on 13 large language models from companies such as OpenAI, Google, Anthropic, Mistral AI, and xAI. The experts reviewed the AI responses, identifying failures and compiling a benchmark of 96 queries.

Overall, the models failed to deliver sufficient medical advice for 60 percent of these questions. GPT-5 performed best, with a 47 percent failure rate, while Ministral 8B had the highest at 73 percent. Victoria-Elisabeth Gruber, a team member at Lumos AI, noted the motivation behind the study: “I saw more and more women in my own circle turning to AI tools for health questions and decision support.” She highlighted risks from AI inheriting gender gaps in medical knowledge, and was surprised by the variation in model performance.
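The per-model failure rates reported above amount to a simple aggregation of expert pass/fail judgments over the benchmark queries. A minimal sketch of that tally is below; the model names and judgment data are illustrative placeholders, not the study's actual data or code.

```python
from collections import defaultdict

# Hypothetical expert judgments: (model, query_id, sufficient?) tuples.
# In the study, each of the 96 benchmark queries was judged per model.
judgments = [
    ("gpt-5", "q1", True), ("gpt-5", "q2", False),
    ("ministral-8b", "q1", False), ("ministral-8b", "q2", False),
]

def failure_rates(judgments):
    """Per-model share of responses judged medically insufficient."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for model, _query, sufficient in judgments:
        totals[model] += 1
        if not sufficient:
            failures[model] += 1
    return {model: failures[model] / totals[model] for model in totals}

print(failure_rates(judgments))
# → {'gpt-5': 0.5, 'ministral-8b': 1.0}
```

With real judgment data, the same aggregation would reproduce the headline figures (e.g. a 47 percent failure rate for GPT-5, 73 percent for Ministral 8B).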

Cara Tannenbaum from the University of Montreal explained that AI models are trained on historical data with built-in biases, and urged that online health sources be updated with explicit sex- and gender-related information. However, Jonathan H. Chen from Stanford University cautioned that the 60 percent figure might be misleading: the sample was small and expert-designed, not representative of typical queries. He noted that some benchmark scenarios set a deliberately conservative bar, such as expecting a model to immediately suspect pre-eclampsia when presented with a postpartum headache.

Gruber acknowledged these points, emphasizing that the benchmark sets a strict, clinically grounded standard: “Our goal was not to claim that models are broadly unsafe, but to define a clear, clinically grounded standard for evaluation.” An OpenAI spokesperson responded that ChatGPT is meant to support, not replace, medical care, and that their latest GPT-5.2 model better considers context like gender. Other companies did not comment. The findings, published on arXiv (DOI: arXiv:2512.17028), underscore the need for cautious use of AI in healthcare.

Related articles


Increased use of AI chatbots among Swedes, but also growing concerns


According to the latest SOM survey from the University of Gothenburg, the share of Swedes who chat with an AI bot weekly rose from 12 to 36 percent between 2024 and 2025. At the same time, skepticism toward AI has grown, with 62 percent seeing it as a greater risk than opportunity for society.

A new study from Brown University identifies significant ethical concerns with using AI chatbots like ChatGPT for mental health advice. Researchers found that these systems often violate professional standards even when prompted to act as therapists. The work calls for better safeguards before deploying such tools in sensitive areas.


Researchers at UC San Francisco and Wayne State University found that generative AI can process complex medical datasets faster than traditional human teams, sometimes yielding stronger results. The study focused on predicting preterm birth using data from over 1,000 pregnant women. This approach reduced analysis time from months to minutes in some cases.

OpenAI intends to launch a text-only adult mode for ChatGPT, enabling adult-themed conversations but not erotic media, despite unanimous opposition from its wellbeing advisers. The company describes the content as 'smut rather than pornography,' according to a spokesperson cited by The Wall Street Journal. Launch has been delayed from early 2026 amid concerns over minors' access and emotional dependence.


OpenAI has launched GPT-5.5, its latest AI model integrated into ChatGPT, seven weeks after GPT-5.4. The update focuses on coding, computer use, and research, with enhanced agentic capabilities for independent task completion. Paying ChatGPT and Codex users can access it now, with API rollout planned soon.

OpenAI has launched GPT-5.4, including variants Thinking and Pro, aimed at improving agentic tasks and knowledge work. The update features enhanced computer-use capabilities and reduced factual errors, amid competition from Anthropic following a US defense deal controversy. The models are available immediately to paid users and developers.


A study published March 24, 2026 in *Radiology* reports that AI-generated “deepfake” X-rays can be convincing enough to mislead radiologists and several multimodal AI systems. In testing, radiologists’ average accuracy rose from 41% when they were not told fakes were included to 75% when they were warned, highlighting potential risks for medical imaging security and clinical decision-making.

