AI chatbots fail on 60 percent of urgent women's health queries

Commonly used AI models, including ChatGPT and Gemini, often fail to provide adequate advice for urgent women's health issues, according to a new benchmark test. Researchers found that 60 percent of responses to specialized queries were insufficient, highlighting biases in AI training data. The study calls for improved medical content to address these gaps.

A team of 17 women's health researchers, pharmacists, and clinicians from the US and Europe created 345 medical queries across specialties like emergency medicine, gynecology, and neurology. These were tested on 13 large language models from companies such as OpenAI, Google, Anthropic, Mistral AI, and xAI. The experts reviewed the AI responses, identifying failures and compiling a benchmark of 96 queries.

Overall, the models failed to deliver sufficient medical advice for 60 percent of these questions. GPT-5 performed best, with a 47 percent failure rate, while Ministral 8B had the highest failure rate, at 73 percent. Victoria-Elisabeth Gruber, a member of the research team at Lumos AI, described the motivation behind the study: “I saw more and more women in my own circle turning to AI tools for health questions and decision support.” She highlighted the risk of AI inheriting gender gaps in medical knowledge, and said she was surprised by the variation in performance across models.

Cara Tannenbaum from the University of Montreal explained that AI models are trained on historical data with built-in biases, and urged updating online health sources with explicit sex- and gender-related information. However, Jonathan H. Chen from Stanford University cautioned that the 60 percent figure might be misleading, as the sample was small and expert-designed rather than representative of typical user queries. He also described some grading criteria as conservative, such as expecting a model to immediately suspect pre-eclampsia when asked about postpartum headaches.

Gruber acknowledged these points, emphasizing that the benchmark sets a strict, clinically grounded standard: “Our goal was not to claim that models are broadly unsafe, but to define a clear, clinically grounded standard for evaluation.” An OpenAI spokesperson responded that ChatGPT is meant to support, not replace, medical care, and that the company's latest GPT-5.2 model better accounts for context such as gender. The other companies did not comment. The findings, published on arXiv (arXiv:2512.17028), underscore the need for cautious use of AI in healthcare.
