OpenAI has released GPT-5.4-Cyber, a new AI model available exclusively to verified cybersecurity professionals. The fine-tuned variant of GPT-5.4 is designed to let experts probe defenses against jailbreaks and adversarial attacks. The move follows Anthropic's announcement of its own cybersecurity-focused model last week.
OpenAI announced GPT-5.4-Cyber on Tuesday in a blog post, limiting access to participants in its expanded Trusted Access for Cyber program. The company said testers will help identify gaps, potential jailbreaks, and risks while improving the model's resilience to adversarial attacks and its defensive capabilities. OpenAI emphasized that tester feedback will help it understand the model's benefits and mitigate harms in an AI-versus-AI cybersecurity landscape.

The model is a fine-tuned variant of GPT-5.4 adjusted for cybersecurity tasks, with lowered guardrails that make it less likely to refuse risky security-related requests. That latitude lets experts assess how the model might be weaponized by malicious actors.

The release appears to be a response to Anthropic's Project Glasswing, unveiled last week, which introduced the Claude Mythos Preview. Anthropic reported that the model found security vulnerabilities in every major operating system and web browser. OpenAI, which is competing with Anthropic for government and enterprise contracts, described its own safeguards as sufficient to reduce cyber risk for now. Both companies are stepping up AI security efforts as models grow more powerful, giving cybersecurity professionals early access to bolster defenses.