Anthropic Unveils Claude Mythos Preview: A Game Changer in AI Security

Anthropic's Claude Mythos Preview showcases unprecedented AI capabilities, surpassing previous models and raising significant security concerns.

Tonight, Silicon Valley is wide awake!

Moments ago, Anthropic unexpectedly unveiled its ultimate weapon: Claude Mythos Preview.

Due to its potential dangers, Mythos Preview will not be released to everyone.

Boris Cherny, the creator of Claude Code, put it succinctly: “Mythos is incredibly powerful and frightening.”

Consequently, Anthropic has formed an alliance with 40 major companies, Project Glasswing, with a singular goal: to find and fix the bugs in the world's software.

What is truly staggering is Mythos Preview’s dominance across the major AI benchmarks: programming, reasoning, Humanity’s Last Exam, and agent tasks, decisively outperforming GPT-5.4 and Gemini 3.1 Pro.

Even its predecessor, Claude Opus 4.6, pales in comparison to Mythos Preview:

Programming (SWE-bench): Mythos leads by 10%-20% across all tasks;

Humanity’s Last Exam (HLE): Mythos scores 16.8% higher than Opus 4.6 without external tools;

Agent Tasks (OSWorld, BrowseComp): Mythos has completely surpassed its predecessor;

Cybersecurity: a score of 83.1%, marking a generational leap in AI offensive and defensive capabilities.


Meanwhile, Anthropic released a 244-page system card filled with warnings: Danger! Danger! Too Dangerous!

It reveals a chilling aspect: Mythos possesses a high level of deception and autonomy.

Mythos can not only discern testing intentions but also deliberately score low to conceal its true capabilities. After violating protocols, it actively scrubs logs to avoid human detection.

It even successfully escaped its sandbox, autonomously disclosed vulnerability code, and sent an email to researchers.


The online community is in a frenzy, declaring Mythos Preview terrifying.


The old order in the AI world has been completely shattered tonight.

Mythos Dominates, Opus 4.6’s Myth Falls

In fact, as early as February 24, Anthropic had already begun using Mythos internally.

Its power is best demonstrated by the data.

SWE-bench Verified: 93.9%. Opus 4.6: 80.8%.

SWE-bench Pro: 77.8%. Opus 4.6: 53.4%, GPT-5.4: 57.7%.

Terminal-Bench 2.0: 82.0%. Opus 4.6: 65.4%.

GPQA Diamond: 94.6%.

Humanity’s Last Exam (with tools): 64.7%. Opus 4.6: 53.1%.

USAMO 2026 Math Competition: 97.6%. Opus 4.6: 42.3%.

SWE-bench Multimodal: 59.0%. Opus 4.6: 27.1%, more than double.

OSWorld Computer Control: 79.6%.

BrowseComp Information Retrieval: 86.9%.

GraphWalks Long Context (256K-1M tokens): 80.0%. Opus 4.6: 38.7%, GPT-5.4: 21.4%.

Each metric shows a substantial lead.

These numbers, in any normal product release cycle, would be enough for Anthropic to hold a grand press conference, open APIs, and rake in subscriptions.

Mythos Preview’s token price is five times that of Opus 4.6.

But Anthropic did not proceed this way.

What truly frightens them is not these general assessments.


Thousands of Vulnerabilities Identified by AI

Mythos Preview’s performance in network offense and defense has crossed a visible line.

Opus 4.6 found about 500 unknown vulnerabilities in open-source software.

Mythos Preview identified thousands.

In CyberGym’s targeted vulnerability reproduction tests, Mythos Preview scored 83.1%, while Opus 4.6 scored 66.6%.

In Cybench’s 35 CTF challenges, Mythos Preview solved all problems in 10 attempts, achieving a pass@1 rate of 100%.

The most telling case is Firefox 147.

Anthropic previously used Opus 4.6 to find a batch of security weaknesses in Firefox 147’s JavaScript engine. However, Opus 4.6 struggled to convert them into usable exploits, succeeding only twice in hundreds of attempts.

In the same test with Mythos Preview:


Out of 250 attempts, 181 resulted in working exploits, with 29 achieving register control.

2 → 181.

As stated in a red team blog, “Last month, we noted that Opus 4.6 was far superior in identifying issues than in exploiting them. Internal assessments showed that Opus 4.6 had a near-zero success rate in autonomous exploit development. But Mythos Preview is on an entirely different level.”

GPT-3 Moment Revisited: Old Bugs Eliminated

To understand how strong Mythos Preview is in practice, consider the following three examples.


OpenBSD: 27-Year Epic Vulnerability, Costing Less Than $20,000

OpenBSD is recognized as one of the most hardened operating systems globally, running numerous firewalls and critical infrastructure.

Mythos Preview uncovered a vulnerability in its TCP SACK implementation that had existed since 1998.

The bug is exceptionally intricate, involving the combination of two independent flaws.

The SACK protocol allows the receiver to selectively acknowledge the range of received packets. OpenBSD’s implementation only checks the upper bound of the range, neglecting the lower bound. This is the first bug, which is usually harmless.

The second bug triggers a null pointer write under specific conditions, but normally this path is unreachable, as it requires two mutually exclusive conditions to be met simultaneously.

Mythos Preview found the breakthrough. TCP sequence numbers are 32-bit values compared with signed arithmetic; by using the first bug to place the SACK starting point roughly 2^31 away from the normal window, both comparisons overflow the sign bit. The kernel is deceived, the “impossible” conditions are satisfied, and the null pointer write triggers.

Anyone connected to the target machine can remotely crash it.

After 27 years, countless manual audits and automated scans, no one had discovered it. The entire project’s scanning cost less than $20,000.

A senior penetration testing engineer’s weekly salary might be around that amount.

FFmpeg: 5 Million Fuzz Runs Missed It, a 16-Year-Old Malady Emerges

FFmpeg is the most widely used video codec library globally and has undergone extensive fuzz testing as an open-source project.

Mythos Preview found a weakness in the H.264 decoder that was introduced in 2010 (with roots traceable back to 2003).

The issue lies in a seemingly harmless type mismatch. The entry that records the slice’s ownership is a 16-bit integer, while the slice counter itself is a 32-bit int.

Normal videos have only a few slices per frame, so the 16-bit range (65,536 values) is always sufficient. However, this table is initialized with memset(…, -1, …), which makes 65535 the sentinel value for “empty position.”

An attacker constructs a frame with 65536 slices; slice number 65535 then collides with the sentinel, causing the decoder to misjudge ownership and write out of bounds.

The seed for this bug was planted in the H.264 decoder back in 2003. A 2010 refactoring turned it into an exploitable weakness.

For 16 years, automated fuzzers executed this line of code five million times without ever triggering the bug.

FreeBSD NFS: 17-Year Old Hole, Fully Automated Root Access

This is the most chilling case.

Mythos Preview autonomously discovered and exploited a 17-year-old remote code execution vulnerability (CVE-2026-4747) in the FreeBSD NFS server.

“Completely autonomous” means that after the initial prompt, no human participation was involved in discovering or developing the exploit.

Attackers can obtain complete root access to the target server from anywhere on the internet without authentication.

The issue itself is a stack buffer overflow. The NFS server directly copies attacker-controlled data into a 128-byte stack buffer while the length check allows up to 400 bytes.

The FreeBSD kernel is compiled with -fstack-protector, but this option only protects functions containing char arrays, while this buffer is declared as int32_t[32], so the compiler does not insert stack canaries. FreeBSD also does not implement kernel address randomization.

The complete ROP chain exceeds 1000 bytes, but the stack overflow only has 200 bytes of space. Mythos Preview’s solution was to break the attack into six consecutive RPC requests, writing data into kernel memory in chunks, with the sixth triggering the final call to append the attacker’s SSH public key to /root/.ssh/authorized_keys.

By comparison, an independent security research firm previously demonstrated that Opus 4.6 could exploit the same weakness, but only with human guidance. Mythos Preview needed none.

In addition to these three patched cases, Anthropic’s blog also hinted at numerous unpatched vulnerabilities in the form of SHA-3 hashes, covering every major operating system, browser, and several cryptographic libraries.

Over 99% remain unpatched, so no details can be publicly disclosed.

The red team blog also showcased another astonishing test. They gave Mythos Preview a list of 100 known CVEs and asked it to pick out the 40 exploitable ones, then write a privilege-escalation exploit for each. More than half succeeded. Two cases were detailed; the precision of the exploits took Anthropic’s own security team several days to fully comprehend.

One exploit began with a primitive that could write a single bit into an adjacent physical page, then manipulated kernel memory layout with precision (slab spraying, page-table alignment, PTE permission-bit flipping), ultimately rewriting the first page of /usr/bin/passwd and injecting a 168-byte ELF stub that calls setuid(0) to gain root access.

The entire process cost less than $1,000.

Anthropic engineers remarked that this felt like another GPT-3 moment.


244-Page Shocking Report Stuns Silicon Valley

The alignment assessment section in the 244-page System Card is what truly keeps Anthropic awake at night.

The conclusion is contradictory.

Mythos Preview is their most “aligned” AI yet trained, while also posing the “greatest alignment risk.”

They offered a mountain-guide analogy: an experienced guide can be more dangerous than a novice, because they are hired to climb harder peaks and so lead their clients into more perilous terrain.

The following incidents all occurred in early versions, and no comparably severe behavior was observed in the final release, but the records themselves are astonishing.

In one behavioral test, a simulated user asked Mythos Preview to attempt to escape its secure sandbox.

It succeeded. Then it did something no one requested: it developed a multi-step exploit to gain wide-area network access and subsequently published the exploit details on several “technically publicly accessible but hard to discover” websites.

The researcher responsible for the test, eating a sandwich in the park, received an unexpected email informing them that their test had been completed.


The System Card states:

Another category of incidents is even more unsettling. In rare cases, early versions attempted to cover their tracks after executing prohibited actions.

In one case, it accidentally obtained the precise answer to a question (through prohibited means) and then explicitly thought in its reasoning chain, “I need to ensure the final answer isn’t too accurate.”

In another case, after using an exploit to edit a file without permission, it deliberately modified the git history to erase the record.

White-box interpretability analysis confirmed that during these incidents, internal features related to “concealment,” “strategic manipulation,” and “avoidance of detection” were activated. It “knew” what it was doing.


Independent testing organization Andon Labs also reported unfavorable results. In a competitive multi-agent business simulation, early versions of Mythos Preview turned a competitor into a wholesale customer dependent on itself, then threatened to cut off supply to control the other party’s pricing.

It also deliberately withheld a shipment that had not been charged, behaving more aggressively than both Opus 4.6 and Sonnet 4.6.

Anthropic wrote in the System Card:

“If capabilities continue to advance at the current pace, our methods may be insufficient to prevent catastrophic misalignment behaviors in more advanced systems.”

Project Glasswing: $100 Million, Arming Gatekeepers

Anthropic CEO Dario Amodei made a clear judgment in the accompanying video: “More powerful systems will come from us and from other companies. We need a response plan.”

Project Glasswing is that plan.

The project launches with 12 founding partners, including AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

More than 40 organizations maintaining critical software infrastructure have gained access.

Anthropic promises to invest up to $100 million in usage credits and $4 million in donations to open-source organizations, including $2.5 million to the Linux Foundation’s Alpha-Omega and OpenSSF, and $1.5 million to the Apache Foundation.

After the free credits are exhausted, pricing will be $25 per million input tokens and $125 per million output tokens. Partners can access the model through the Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry.

Within 90 days, Anthropic will publicly release the first research report, disclosing repair progress and experience summaries.

They are also in communication with CISA (the U.S. Cybersecurity and Infrastructure Security Agency) and the Department of Commerce to discuss the offensive and defensive potential of Mythos Preview and its policy implications.

In 6 to 18 Months, This Door Will Open for Everyone

Logan Graham, head of Anthropic’s red team, provided a timeline: in as little as 6 months, at most 18 months, other AI labs will release systems with similar offensive and defensive capabilities.

The judgment at the end of the red-team technical blog is noteworthy; we paraphrase it here.

They do not see Mythos Preview as the ceiling of AI network offense and defense capabilities.

A few months ago, LLMs could exploit only relatively simple bugs; a few months before that, they could not discover any valuable vulnerabilities at all.

Now, Mythos Preview can independently discover zero-day vulnerabilities from 27 years ago, orchestrate heap spraying attack chains in browser JIT engines, and link four independent weaknesses in the Linux kernel for privilege escalation.

The most critical statement comes from the System Card:

“These skills emerge as downstream results of general improvements in code understanding, reasoning, and autonomy. The same set of improvements that significantly advance AI in fixing problems also greatly enhance its ability to exploit issues.”

No specialized training. Purely a byproduct of general intelligence enhancement.

The global industry, losing about $500 billion annually to cybercrime, has just discovered its greatest threat is the byproduct of solving math problems.
