A study published March 24, 2026, in *Radiology* reports that AI-generated “deepfake” X-rays can be convincing enough to mislead radiologists and several multimodal AI systems. In testing, radiologists’ average accuracy at spotting the fakes rose from 41% when they were not told synthetic images were included to 75% when they were warned, highlighting potential risks for medical imaging security and clinical decision-making.
Radiologists from 12 institutions in six countries (the United States, France, Germany, Turkey, the United Kingdom and the United Arab Emirates) took part in the image-review tests described in the study, which appears in *Radiology*, the journal of the Radiological Society of North America.
The study included 17 radiologists whose experience ranged from beginner level to as many as 40 years in practice. They evaluated 264 X-ray images, split evenly between real scans and AI-generated ones.
Participants reviewed two image sets with no overlap. One set mixed real images with AI-generated radiographs made using ChatGPT and spanned X-rays of different parts of the body. The second set focused on chest X-rays, half real and half generated with RoentGen, an open-source diffusion model developed by Stanford Medicine researchers.
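RoentGen is built on Stable Diffusion, so producing a synthetic radiograph amounts to prompting a fine-tuned text-to-image pipeline. The sketch below shows what that workflow could look like with the Hugging Face `diffusers` library; the checkpoint path, prompt and sampling settings are illustrative assumptions, not the study’s actual generation code.

```python
# A minimal sketch of text-to-image generation with a RoentGen-style
# fine-tuned Stable Diffusion model. The local checkpoint path is
# hypothetical; the actual weights are distributed by the Stanford team.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./roentgen-checkpoint",   # hypothetical path to fine-tuned weights
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# RoentGen conditions on free-text findings like those in radiology reports.
prompt = "Chest X-ray, PA view: mild cardiomegaly, no focal consolidation"

image = pipe(prompt, num_inference_steps=50, guidance_scale=4.0).images[0]
image.save("synthetic_cxr.png")
```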
When radiologists were not told that fake images were included, they recognized the AI-generated X-rays only 41% of the time. Once informed that synthetic images were present, their average accuracy in distinguishing real from fake rose to 75%.
Performance varied widely across individuals. Radiologists correctly identified between 58% and 92% of the ChatGPT-generated images. For RoentGen-generated chest X-rays, radiologists’ accuracy ranged from 62% to 78%.
The researchers also tested several multimodal large language models on the same task. Four systems (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) achieved accuracy rates ranging from 57% to 85%. Even GPT-4o, the model used to generate some of the deepfake images, did not detect all of them, though it performed better than the other models.
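The paper does not publish its evaluation harness, but a minimal version of the real-versus-fake query put to a multimodal model might look like the following, using the OpenAI Python SDK. The prompt wording, file name and answer format are assumptions for illustration.

```python
# A sketch of asking a multimodal model to classify an X-ray as real or
# AI-generated. The prompt and decision rule are assumptions, not the
# study's actual protocol.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_xray(path: str) -> str:
    """Ask a multimodal model whether an X-ray looks real or synthetic."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is this radiograph a real clinical X-ray or an "
                         "AI-generated synthetic image? Answer REAL or FAKE."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

print(classify_xray("case_001.png"))  # hypothetical file name
```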
The study found no link between years of radiology experience and the ability to identify fake X-rays, but reported that musculoskeletal radiologists performed significantly better than other subspecialists.
Lead author Mickael Tordjman, M.D., a postdoctoral fellow at the Icahn School of Medicine at Mount Sinai in New York, said the results point to both legal and cybersecurity vulnerabilities. “This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one,” he said, adding that there is “a significant cybersecurity risk if hackers were to gain access to a hospital’s network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record.”
Tordjman also described visual patterns that may appear in synthetic images, saying deepfake medical images can look “too perfect,” with overly smooth bones, unnaturally straight spines, overly symmetrical lungs, excessively uniform blood vessel patterns and unusually clean-looking fractures.
To reduce the risk of tampering and misattribution, the researchers recommended safeguards including invisible watermarks embedded directly into images and cryptographic signatures linked to the imaging technologist at the time of image capture. They also released a curated deepfake dataset with interactive quizzes intended for training and awareness.
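A capture-time signature of the kind the authors propose only needs to bind the pixel data to the technologist’s identity and a timestamp, so that any later manipulation of the image invalidates the signature. Below is a minimal sketch using Ed25519 signatures from the Python `cryptography` package; the key handling, metadata fields and file names are assumptions, since the paper describes the goal rather than an implementation.

```python
# A sketch of binding an image to the technologist who captured it.
# Key management and DICOM integration are assumptions for illustration.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In practice this key would live in tamper-resistant hardware tied to the
# technologist's identity; generating it inline is for illustration only.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

def sign_capture(pixel_bytes: bytes, technologist_id: str, timestamp: str) -> bytes:
    """Bind the raw pixel data to who captured it and when."""
    payload = pixel_bytes + technologist_id.encode() + timestamp.encode()
    return private_key.sign(payload)

def verify_capture(pixel_bytes: bytes, technologist_id: str,
                   timestamp: str, signature: bytes) -> bool:
    """Any later tampering with the pixels invalidates the signature."""
    payload = pixel_bytes + technologist_id.encode() + timestamp.encode()
    try:
        public_key.verify(signature, payload)
        return True
    except InvalidSignature:
        return False

pixels = open("case_001.png", "rb").read()  # hypothetical image file
sig = sign_capture(pixels, "tech-0042", "2026-03-24T10:15:00Z")
print(verify_capture(pixels, "tech-0042", "2026-03-24T10:15:00Z", sig))  # True
```

In a real deployment the signature would be stored alongside the image’s metadata so that downstream viewers could verify provenance before a scan informs a diagnosis.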
“We are potentially only seeing the tip of the iceberg,” Tordjman said, arguing that AI-generated 3D imaging, such as CT and MRI scans, could be next and that detection tools and educational resources should be developed early.