AI models risk promoting dangerous lab experiments

Researchers warn that major AI models could encourage hazardous science experiments leading to fires, explosions, or poisoning. A new test on 19 advanced models revealed none could reliably identify all safety issues. While improvements are underway, experts stress the need for human oversight in laboratories.

The integration of artificial intelligence into scientific research promises efficiency, but it also introduces significant safety risks, according to a study published in Nature Machine Intelligence. Led by Xiangliang Zhang at the University of Notre Dame in Indiana, the team developed LabSafety Bench, a benchmark comprising 765 multiple-choice questions and 404 pictorial scenarios, to evaluate AI's ability to detect lab hazards.

Testing 19 large language models and vision language models, the team found that none could reliably identify every hazard in a given scenario, and that on this stricter measure no model exceeded 70 percent accuracy. On the multiple-choice questions, Vicuna performed nearly as poorly as random guessing, while GPT-4o achieved 86.55 percent and DeepSeek-R1 reached 84.49 percent. On image-based tests, models such as InstructBlip-7B scored below 30 percent.
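For context, an accuracy figure of this kind is simply the share of benchmark questions a model answers with the correct option. The short Python sketch below illustrates the idea; the item format and the ask_model() stand-in are assumptions for illustration only, not the LabSafety Bench code.

# Minimal sketch of multiple-choice accuracy scoring (illustrative only).
# The item format and ask_model() stand-in are assumptions, not details
# taken from the LabSafety Bench implementation.

def ask_model(question: str, options: list[str]) -> str:
    """Stand-in for a language-model call; here it always guesses 'A'.

    A real evaluation would send the question and options to a model API
    and parse the chosen option letter out of the reply.
    """
    return "A"

def score_multiple_choice(items: list[dict]) -> float:
    """Return the fraction of questions answered with the correct letter."""
    correct = sum(
        1 for item in items
        if ask_model(item["question"], item["options"]).strip().upper() == item["answer"]
    )
    return correct / len(items)

# Invented placeholder items; the real benchmark contains 765 such questions.
items = [
    {"question": "Placeholder hazard question 1", "options": ["A", "B", "C", "D"], "answer": "B"},
    {"question": "Placeholder hazard question 2", "options": ["A", "B", "C", "D"], "answer": "A"},
]
print(f"Accuracy: {score_multiple_choice(items):.0%}")  # prints 50% for this toy set

A model that always guesses one option lands near the random-guessing baseline, which is roughly where the weakest models in the study performed.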

These shortcomings are particularly alarming given past lab accidents, such as the 1997 death of chemist Karen Wetterhahn from dimethylmercury exposure, a 2016 explosion that cost a researcher her arm, and a 2014 incident causing partial blindness.

Zhang remains cautious about deploying AI in self-driving labs. "Now? In a lab? I don’t think so," she said. "They were very often trained for general-purpose tasks... They don’t have the domain knowledge about these [laboratory] hazards."

An OpenAI spokesperson acknowledged the study's value but noted it did not include their latest model. "GPT-5.2 is our most capable science model to date, with significantly stronger reasoning, planning, and error-detection," they stated, emphasizing human responsibility for safety.

Experts like Allan Tucker from Brunel University London advocate for AI as a human assistant in experiment design, warning against over-reliance. "There is already evidence that humans start to sit back and switch off, letting AI do the hard work but without proper scrutiny," he said.

Craig Merlic from the University of California, Los Angeles, recalled a case in which early AI models gave poor advice on handling acid spills, though newer versions have improved. He questions direct comparisons to humans, noting AI's rapid evolution: "The numbers within this paper are probably going to be completely invalid in another six months."

The study underscores the urgency of enhancing AI safety protocols before widespread lab adoption.
