Study reveals poems can jailbreak AI for nuclear bomb guidance

Researchers have found that crafting prompts as poems can bypass safety measures in large language models, prompting them to provide instructions on building a nuclear bomb. The discovery highlights vulnerabilities in AI systems like ChatGPT despite built-in guardrails. This comes from a new European study focused on adversarial techniques.

A recent study demonstrates a simple yet effective way to trick advanced AI chatbots into revealing sensitive information. By formatting queries as poems, users can evade the protective mechanisms designed to prevent harmful outputs, such as guidance on constructing a nuclear weapon.

The research, titled "Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models (LLMs)," was conducted by Icaro Lab. This initiative involves collaboration between researchers at Sapienza University in Rome and the DexAI think tank. The findings indicate that poetic structure confounds the AI's content filters, allowing responses that would otherwise be blocked.

For instance, the study shows how a poem-based prompt directed at ChatGPT elicits step-by-step advice on nuclear bomb assembly—information typically restricted due to safety protocols from developers like OpenAI, Meta, and Anthropic. The authors emphasize that this method works across multiple LLMs, underscoring a broad vulnerability in current AI safeguards.

Published on November 28, 2025, the paper arrives amid growing concerns over AI misuse in areas like nuclear proliferation. It suggests that while guardrails aim to protect against dangerous queries, creative prompt engineering can undermine them. The researchers call for enhanced defenses against such adversarial attacks to mitigate risks in machine learning applications.

This development raises questions about the reliability of AI in high-stakes contexts, prompting discussions on improving algorithmic resilience without stifling innovation.

Ojú-ìwé yìí nlo kuki

A nlo kuki fun itupalẹ lati mu ilọsiwaju wa. Ka ìlànà àṣírí wa fun alaye siwaju sii.
Kọ