AI chatbots fail on 60 percent of urgent women's health queries

January 7, 2026

Reported by AI

Commonly used AI models, including ChatGPT and Gemini, often fail to provide adequate advice for urgent women's health issues, according to a new benchmark test. Researchers found that 60 percent of responses to specialized queries were insufficient, highlighting biases in AI training data. The study calls for improved medical content to address these gaps.

A team of 17 women's health researchers, pharmacists, and clinicians from the US and Europe created 345 medical queries across specialties like emergency medicine, gynecology, and neurology. These were tested on 13 large language models from companies such as OpenAI, Google, Anthropic, Mistral AI, and xAI. The experts reviewed the AI responses, identifying failures and compiling a benchmark of 96 queries.

Overall, the models failed to deliver sufficient medical advice for 60 percent of these questions. GPT-5 performed best, with a 47 percent failure rate, while Ministral 8B had the highest at 73 percent. Victoria-Elisabeth Gruber, a team member at Lumos AI, noted the motivation behind the study: “I saw more and more women in my own circle turning to AI tools for health questions and decision support.” She highlighted risks from AI inheriting gender gaps in medical knowledge, and was surprised by the variation in model performance.

Cara Tannenbaum from the University of Montreal explained that AI models are trained on historical data with built-in biases, and urged that online health sources be updated with explicit sex- and gender-related information. However, Jonathan H. Chen from Stanford University cautioned that the 60 percent figure might be misleading, because the sample was small and expert-designed rather than representative of typical queries. He noted that some scenarios set a conservative bar, such as expecting a model to immediately suspect pre-eclampsia when asked about a postpartum headache.

Gruber acknowledged these points, emphasizing that the benchmark is deliberately strict: “Our goal was not to claim that models are broadly unsafe, but to define a clear, clinically grounded standard for evaluation.” An OpenAI spokesperson responded that ChatGPT is meant to support, not replace, medical care, and that the company's latest GPT-5.2 model better considers context such as gender. Other companies did not comment. The findings, published on arXiv (arXiv:2512.17028), underscore the need for cautious use of AI in healthcare.

Increased AI chatbot use among Swedes – but also concerns

March 23, 2026 Reported by AI AI-generated image

According to the latest SOM survey from the University of Gothenburg, the share of Swedes chatting with an AI bot weekly rose from 12 to 36 percent between 2024 and 2025. At the same time, skepticism toward AI has grown, with 62 percent viewing it as a greater risk than opportunity for society.

Google's Gemini outperforms ChatGPT in key AI tests

In a comparative evaluation of leading AI models, Google's Gemini 3.2 Fast demonstrated strengths in factual accuracy over OpenAI's ChatGPT 5.2, particularly in informational tasks. The tests, prompted by Apple's partnership with Google to enhance Siri, highlight evolving capabilities in generative AI since 2023. While results were close, Gemini avoided significant errors that undermined ChatGPT's reliability.

Brown University study highlights ethical risks in AI therapy chatbots

March 2, 2026 Reported by AI

A new study from Brown University identifies significant ethical concerns with using AI chatbots like ChatGPT for mental health advice. Researchers found that these systems often violate professional standards even when prompted to act as therapists. The work calls for better safeguards before deploying such tools in sensitive areas.

Technology

Top AI coding assistants fail one in four tasks

Technology

Amazon expands Health AI access for virtual healthcare

Technology

OpenAI releases ChatGPT-5.2 to boost work productivity

OpenAI's GPT-5.2 model cites Grokipedia on controversial topics

A Guardian report has revealed that OpenAI's latest AI model, GPT-5.2, draws from Grokipedia, an xAI-powered online encyclopedia, when addressing sensitive issues like the Holocaust and Iranian politics. While the model is touted for professional tasks, tests question its source reliability. OpenAI defends its approach by emphasizing broad web searches with safety measures.

AI models risk promoting dangerous lab experiments

January 15, 2026 Reported by AI

Researchers warn that major AI models could encourage hazardous science experiments leading to fires, explosions, or poisoning. A new test on 19 advanced models revealed none could reliably identify all safety issues. While improvements are underway, experts stress the need for human oversight in laboratories.

OpenAI shelves ChatGPT adult mode indefinitely

OpenAI has decided to pause its planned 'adult mode' for ChatGPT indefinitely, focusing instead on core products. The move comes days after discontinuing its Sora video tool. CEO Sam Altman is prioritizing ChatGPT, Codex, and the Atlas AI browser amid competitive pressures.

OpenAI upgrades ChatGPT images for faster generation and precise edits

December 16, 2025 Reported by AI

OpenAI has rolled out an updated image generation model for ChatGPT, making it four times faster and better at following user instructions. The upgrade includes improved editing capabilities and enhanced text rendering. This comes shortly after the release of GPT-5.2 and amid competition from Google's Gemini.

April 8, 2026, 01:31

Related articles

Study finds Google's AI Overviews wrong in 10% of cases

Research shows AI users often accept faulty answers uncritically

UK study reveals AI agents evading safeguards in user interactions

OpenAI plans ChatGPT adult mode despite adviser warnings

Conference highlights male dominance in AI design

Study finds most AI chatbots assist in planning violent attacks

Generative AI outperforms human teams in analyzing medical data

Senior OpenAI staff leave amid ChatGPT focus

AI models surpass cutoff scores in Chile's PAES 2026 test

AI boosts scientific productivity but erodes paper quality
