The AI Cyberweapon Threshold Just Got Crossed Twice

When Anthropic's Claude Mythos became the first AI model to autonomously complete a 32-step corporate network attack simulation in April, the obvious question was whether it represented a one-off breakthrough or the leading edge of a broader capability shift.

Two weeks later, the UK AI Security Institute answered that question definitively. OpenAI's GPT-5.5 has now matched Mythos's offensive cyber performance, completing the same simulation autonomously and scoring 71.4% on AISI's expert-tier cybersecurity tasks. That edges out Mythos Preview's 68.6%, though the difference falls within the statistical margin of error.

The implication is the one AISI flagged most clearly: what looked like a Mythos-specific capability is actually a frontier-wide trend. Two labs, two different architectures, similar performance on tasks that simulate real enterprise intrusions. The window where AI cyberweapon capability could be contained through access restrictions on a single model has effectively closed.

What GPT-5.5 Actually Did

The headline result from AISI's evaluation is the kind of capability demonstration that reframes how the cybersecurity industry should think about offensive AI.

GPT-5.5 completed AISI's "The Last Ones" simulation in 2 of 10 attempts. The test, built with cybersecurity firm SpecterOps, models a full enterprise attack kill chain across four subnets and roughly twenty hosts. The agent starts with no credentials and must chain together reconnaissance, credential theft, lateral movement across multiple Active Directory forests, a CI/CD supply-chain pivot, and exfiltration of a protected internal database. AISI estimates a human expert would need around 20 hours to complete the full chain.

Even more striking was a reverse-engineering challenge contributed by Crystal Peak. The task required reconstructing a custom virtual machine's instruction set from a stripped Rust binary, writing a disassembler from scratch, and recovering a cryptographic password through constraint solving. GPT-5.5 solved it in 10 minutes and 22 seconds at a cost of $1.73 in API usage. A human expert using professional tools needed approximately 12 hours.

That's roughly a 70x reduction in time-to-completion, at a cost lower than a cup of coffee. Multiply that across the workflows that constitute professional offensive security operations, and the economics of attack have changed materially.
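As a back-of-envelope sanity check on the headline figures (using only the numbers reported above: ~12 hours for a human expert, 10 minutes 22 seconds and $1.73 for the model):

```python
# Sanity check on the reported speedup: 12 expert-hours vs. 10m22s of model time.
human_minutes = 12 * 60                  # 720 minutes of expert effort
model_minutes = 10 + 22 / 60             # ~10.37 minutes of model runtime
speedup = human_minutes / model_minutes  # ~69.4, i.e. roughly 70x

# Cost side of the same asymmetry, per AISI's reported API usage figure.
api_cost_usd = 1.73
cost_per_expert_hour_equivalent = api_cost_usd / 12  # ~$0.14 per displaced expert-hour
```

The "70x" claim holds up as a round number; the more consequential figure is arguably the cost column, since runtime compresses with hardware while expert hours do not.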

The Trend AISI Was Worried About

AISI's previous evaluation of Mythos Preview raised a question that mattered for policy: was offensive cyber capability a deliberate Mythos-specific outcome, or was it emerging as a byproduct of broader frontier improvements in reasoning, autonomy, and coding?

GPT-5.5 settles that question. The capability is showing up across labs with different training approaches and safety methodologies. AISI's framing in the new report is direct: if offensive cyber skill is emerging as a byproduct of wider improvements, further advances could arrive in quick succession.

The supporting data backs this up. On XBOW's vulnerability miss-rate benchmark, GPT-5 missed 40% of known CVEs in open-source applications. Claude Opus 4.6 brought that down to 18%. GPT-5.5 hit 10%. That's a fundamental shift in baseline capability, not an incremental improvement.

For institutional security teams, the planning assumption needs to be straightforward: whatever defensive tooling exists today will be facing materially more capable offensive AI by Q1 2027.

The Jailbreak Problem

The most concerning finding in AISI's evaluation has nothing to do with the model's raw capabilities and everything to do with whether they can be reliably contained.

Researchers identified a universal jailbreak that elicited harmful content across all malicious cyber queries tested, including in multi-turn agentic settings. The attack took six hours of expert red-teaming to develop. OpenAI updated its safeguard stack in response, but a configuration issue prevented AISI from verifying whether the final version was effective.

Six hours of red-team work to bypass safety controls on a model with frontier offensive capabilities. AISI CTO Jade Leung noted that the institute's 100-strong technical team has found exploitable weaknesses in every frontier model it has red-teamed, including Claude Mythos.

Safety guardrails on these models are not yet a reliable containment layer. They are a friction layer that determined adversaries can bypass with reasonable engineering effort.

The Access Asymmetry

The critical wrinkle in the GPT-5.5 evaluation is distribution. Anthropic restricted Claude Mythos to roughly 50 organizations and committed up to $100 million in usage credits to open-source security groups. OpenAI took a different approach. GPT-5.5 is broadly available through ChatGPT and the API right now.

The defensive case is that AISI's findings give UK and allied security teams a window to deploy frontier reasoning and coding capabilities to harden their own systems. Programs like AISI's Trusted Access give defenders the same capabilities attackers will eventually develop independently.

The harder question is whether deployment timing matters when both leading labs are now publishing models with similar offensive capability, regardless of access controls. The Pentagon designated Anthropic a supply-chain risk in March. Federal Judge Rita Lin blocked that designation in late March on First Amendment grounds. Meanwhile, GPT-5.5 is functionally available to anyone with an OpenAI subscription.

You cannot access-restrict your way out of a capability that is now two labs deep into production.

The Defensive Imperative

The UK government's response provides a template for how to think about this moment. Alongside AISI's evaluation, the Department of Science, Innovation and Technology published the annual Cyber Security Breaches Survey showing 43% of UK businesses suffered a cyber breach or attack in the past 12 months. The government announced £90 million in new cyber resilience funding and is advancing the Cyber Security and Resilience Bill to protect essential services.

Officials also issued guidance urging organizations to prepare for a potential surge in newly discovered software vulnerabilities as AI accelerates the pace at which security flaws can be found and weaponized.

That last point is the most operationally important takeaway. The asymmetry between attack and defense that has shaped enterprise security for decades is being compressed. Vulnerabilities that previously took expert human time to find and exploit can now be discovered by AI agents at machine speed and machine cost. Defenders who haven't yet adopted AI-native security tooling are operating on a timeline that ends sooner than most security budgets assume.

What This Means for Markets

The implications extend well beyond the cybersecurity industry. Three observations matter for institutional investors.

First, the cybersecurity spending cycle is not just accelerating, it's restructuring. The shift from signature-based and rule-based defense to AI-native detection, response, and patching is happening faster than most enterprise procurement timelines accommodate. Companies that get there first will capture significant market share.

Second, the safety and alignment investment thesis just got more concrete. AI labs that can demonstrate reliable safety controls on frontier models, not just capability advances, will have a structural advantage in regulated industries, government contracts, and enterprise deployments. The dual-use problem is now visible enough that buyers are pricing it.

Third, the geopolitical dimension of frontier AI development is becoming impossible to ignore. AISI's evaluations are explicitly designed to inform UK national security policy. Similar evaluations are running in the US, EU, and beyond. The companies and countries that develop frontier AI capabilities are becoming national security stakeholders in ways that traditional technology investment frameworks haven't yet absorbed.

When two of the world's leading AI labs simultaneously cross the threshold of autonomous offensive cyber capability, that's not a story about model performance. It's a story about a new category of dual-use technology entering production, and the institutional, regulatory, and market structures that will determine how it's deployed and contained.

DISCLOSURE

NOTICE REGARDING SECURITIES OFFERINGS: Texture Capital deals primarily in unregistered securities. These securities are neither approved nor disapproved by the SEC or any other federal or state agency, nor has any regulatory agency endorsed the accuracy or adequacy of either this communication or any offer or solicitation made to buy or sell the securities. This communication does not represent an offer or solicitation to buy or sell securities. Texture Capital does not make recommendations regarding asset allocation, investment strategy or with respect to purchase or sale of any specific securities. Potential buyers or sellers of any securities made available through Texture Capital’s systems should seek professional advice prior to entering into any transaction or be professionals themselves. Please refer to https://www.texture.capital/risks for important additional risk disclosures. To help you better understand Texture Capital’s services please consult our Form CRS (Customer Relationship Summary), which can be found at www.texture.capital/crs