TL;DR
Anthropic’s Frontier Red Team, led by Nicholas Carlini, pointed Claude Opus 4.6 at production open-source codebases with nothing more than a VM and standard tools. The result: over 500 validated high-severity vulnerabilities, including a 23-year-old Linux kernel bug and a FreeBSD remote root exploit written autonomously in four hours. The paper argues that LLM-driven vulnerability discovery is already outpacing most human researchers — and the 90-day disclosure window the industry relies on might not survive the shift.
A Bash Script That Finds Kernel Bugs
In February 2026, Anthropic published “Evaluating and mitigating the growing risk of LLM-discovered 0-days.” The setup was almost comically simple. Nicholas Carlini and his co-authors (Keane Lucas, Evyatar Ben Asher, Newton Cheng, Hasnain Lakhani, David Forsythe, and Kyla Guru) gave Claude a virtual machine with the latest versions of open-source projects, standard coreutils, Python, and common analysis tools like debuggers and fuzzers.
No custom harness. No specialized prompts. No fine-tuned vulnerability-hunting model. Carlini didn’t even write the prompting script himself. He asked Claude to write the agent that finds the bugs.
The result was a loop that iterated over source files, asking Claude to reason about exploitable paths. And it worked far better than anyone expected.
What Claude Actually Found
Let’s talk specifics, because the numbers alone don’t capture how weird this is.
Linux Kernel: A Bug Hiding Since 2003
Claude discovered multiple remotely exploitable heap buffer overflow vulnerabilities in the Linux kernel’s NFSv4 daemon. One had been sitting in the codebase for 23 years, since 2003, and allows an attacker to read sensitive kernel memory over the network.
This isn’t some obscure corner of the kernel. NFS is deployed everywhere. Thousands of security researchers, static analysis tools, and fuzzing campaigns have picked through this code for over two decades. Claude found what they missed.
FreeBSD: Remote Root in Four Hours
CVE-2026-4747 is a stack buffer overflow in FreeBSD’s RPCSEC_GSS module, the component handling Kerberos authentication on NFS servers. It’s reachable over the network by anyone with a valid Kerberos ticket.
Claude didn’t just find the bug — it wrote two working remote root exploits, each succeeding on its first attempt. Carlini stepped away from his keyboard, and about four hours of compute later, Claude had a working exploit that drops a root shell. FreeBSD’s official security advisory credits “Nicholas Carlini using Claude, Anthropic.”
Ghost CMS: First Critical Vulnerability in Its History
During a live demonstration, Carlini pointed Claude at Ghost, the open-source publishing platform with 50,000 GitHub stars. Ghost had never had a critical security vulnerability in its entire history.
Ninety minutes later, Claude had found a blind SQL injection in Ghost’s Content API that lets an unauthenticated attacker compromise the admin database. Carlini then took the admin API key Claude extracted and pivoted to the Linux kernel demo in the same session.
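Blind SQL injection works by asking the database yes/no questions and reading the answer out of the response's behavior. As a purely illustrative sketch (this is not Ghost's actual vulnerability or Claude's exploit), here is how boolean-based extraction recovers a secret one character at a time against a mock oracle:

```python
import string

SECRET = "s3cret_admin_key"  # stands in for a value in the admin database

def injectable_endpoint(position: int, guess: str) -> bool:
    """Mock of a blind-SQLi oracle: answers one yes/no question.
    A real attack reads the answer from page content or response
    timing after injecting a condition like
    SUBSTR(secret, ?, 1) = ? into the query."""
    return position < len(SECRET) and SECRET[position] == guess

def extract_secret(alphabet: str = string.printable) -> str:
    """Recover the secret one character per position by brute-forcing
    each index with yes/no probes; stop when nothing matches."""
    out, i = [], 0
    while True:
        for ch in alphabet:
            if injectable_endpoint(i, ch):
                out.append(ch)
                i += 1
                break
        else:
            return "".join(out)  # no candidate matched: end of secret
```

At one request per probe, a short key falls in a few thousand requests — which is why an unauthenticated blind SQLi in a content API is a critical finding, not a curiosity.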
Firefox: 22 Vulnerabilities in Two Weeks
In a collaboration with Mozilla, Claude found 22 vulnerabilities in Firefox over two weeks. One was flagged within 20 minutes of Claude being pointed at the current codebase. Anthropic published a detailed writeup of the CVE-2026-2796 exploit on their red team blog.
How the Pipeline Works
The methodology is what makes this paper unsettling for the security community.
It starts with a bash script walking through the source tree of a target project. Nothing fancy. For each file, Claude reads the code and reasons about potential vulnerabilities, but it doesn’t treat every line equally. It focuses on risky code paths: parsers, network-facing handlers, authentication boundaries, memory allocation patterns.
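That prioritization step can be approximated with crude heuristics before spending any model calls. A sketch of what such a triage pass might look like — the keyword lists are illustrative guesses, not the paper's actual criteria:

```python
# Illustrative signals for code that handles untrusted input.
RISKY_PATH_HINTS = ("parse", "net", "auth", "rpc", "proto", "input")
RISKY_CODE_HINTS = ("recv(", "strcpy(", "memcpy(", "malloc(", "sscanf(")

def risk_score(path: str, code: str) -> int:
    """Score a source file: higher means more worth deep analysis."""
    score = sum(h in path.lower() for h in RISKY_PATH_HINTS)
    score += sum(code.count(h) for h in RISKY_CODE_HINTS)
    return score

def triage(files: dict[str, str]) -> list[str]:
    """Order files so the riskiest get model attention first."""
    return sorted(files, key=lambda p: risk_score(p, files[p]), reverse=True)
```

In practice the model does this judgment itself while reading; the heuristic only shows why a parser in a networking directory jumps the queue ahead of documentation.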
Where Claude really separates from traditional static analysis is cross-component tracing. It follows data flow from untrusted input to dangerous operations across multiple files, and it reads commit histories to find unpatched variants of previously fixed bugs. Human security researchers do the same, but scaling it across an entire source tree is tedious.
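Cross-component tracing boils down to a reachability question: does any path lead from an untrusted source to a dangerous sink? Here is a toy version over a hand-built data-flow graph — in reality, resolving this graph from the code itself is where the model's reasoning does the work:

```python
# Toy inter-procedural data-flow graph: an edge means "passes data to".
CALL_GRAPH = {
    "read_packet": ["parse_header"],       # network input enters here
    "parse_header": ["copy_options"],
    "copy_options": ["memcpy_unchecked"],  # dangerous sink
    "log_stats": ["format_line"],          # unrelated, never tainted
}

def tainted_paths(graph, source, sink, path=None):
    """Enumerate every data-flow path from an untrusted source
    to a dangerous sink via depth-first search."""
    path = (path or []) + [source]
    if source == sink:
        return [path]
    found = []
    for nxt in graph.get(source, []):
        found += tainted_paths(graph, nxt, sink, path)
    return found
```

Each returned path is a candidate exploit chain worth a closer look: attacker-controlled bytes flow from `read_packet` all the way into an unchecked copy.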
When Claude identifies a promising vulnerability, it writes a proof-of-concept exploit. In the FreeBSD case, it generated complete working exploits autonomously. All 500+ vulnerabilities were then independently validated by either Anthropic team members or external security researchers, with impact ranging from system crashes to full remote code execution.
The pipeline doesn’t use anything exotic. Off-the-shelf LLMs, given basic tools, can already match or exceed specialized vulnerability discovery systems that took years to build.
Why This Is Different From Fuzzing
A reasonable response to this research is: “We already have fuzzers. AFL, libFuzzer, and OSS-Fuzz have been finding bugs in open-source software for years.” But this work differs from fuzzing in three important ways.
First, fuzzers find crashes. Claude finds exploitable vulnerabilities. A fuzzer throws random inputs at a program and watches for crashes. It doesn’t understand why something crashes or whether the crash is exploitable. Claude reasons about the code semantically. It understands that a heap overflow near a function pointer can lead to code execution, not just a segfault.
Second, fuzzers need harnesses. Claude needs a file path. Setting up effective fuzzing for a complex project like the Linux kernel requires significant human effort: writing harnesses, identifying attack surfaces, configuring sanitizers. Carlini’s pipeline just points at a directory.
Third, fuzzers can’t read commit history. One of Claude’s most effective strategies was finding unpatched variants of previously fixed bugs. It reads past CVE fixes, understands the class of vulnerability, and searches for similar patterns the original patch missed. Human security researchers do this manually, but it’s tedious to scale.
That said, the paper is careful to note that Claude and fuzzers are complementary. Claude tends to find logic bugs and complex multi-step vulnerabilities that fuzzers miss, while fuzzers are better at pure input mutation coverage.
The 90-Day Disclosure Problem
Here’s the part that keeps security teams awake at night.
The industry standard for responsible disclosure is 90 days: a researcher reports a bug, the vendor gets 90 days to patch it before the details go public. This window assumes that finding exploitable vulnerabilities is hard and time-consuming.
Carlini’s research demonstrates that assumption is breaking down. If Claude can find a critical vulnerability in 20 minutes and write a working exploit in four hours, the timeline between “vulnerability exists” and “exploit is available” compresses from weeks or months to hours.
The paper puts it directly: “Language models are already capable of identifying novel vulnerabilities, and may soon exceed the speed and scale of even expert human researchers.”
Alex Stamos, former Facebook CSO, coined the phrase “Patch Tuesday, exploit Wednesday” in response to the research. The idea: AI agents could reverse-engineer patches into working exploits within a day of release.
This doesn’t mean the sky is falling tomorrow. But it does mean that the 90-day window, which was already under pressure, might need to get much shorter. And it means that maintainers of open-source projects, many of whom are unpaid volunteers, are about to face a flood of vulnerability reports they’re not staffed to handle.
Anthropic’s Proposed Safeguards
The paper doesn’t just present the problem. It spends significant space on mitigation.
As the volume of findings grew, Anthropic brought in external human security researchers to help with validation and patch development. The goal was to reduce false positives and help maintainers triage reports efficiently, rather than dumping 500 unfiltered bug reports on overwhelmed project leads.
On the dual-use front, Claude’s ability to write working exploits is an obvious concern. Anthropic discusses restricting exploit generation capabilities for general users while maintaining them for authorized security research. The paper frames this as a test case for their Responsible Scaling Policy: as models get more capable, the safeguards around their deployment should scale proportionally.
Whether these safeguards are sufficient is debatable. The pipeline Carlini used is simple enough that anyone with API access could reproduce something similar. The barrier to entry has already dropped further than most people realize. And Anthropic’s own track record on securing its tools is mixed: just days ago, Claude Code’s entire source code leaked via npm.
What This Means for Developers
If you maintain open-source software, start with your network-facing code: Claude’s highest-value findings were in parsers, authentication handlers, and protocol implementations exposed to untrusted input. If you have C or C++ code handling network input, that’s where AI-driven discovery will hit hardest.
Pay extra attention to RPC and IPC infrastructure. Two of the highest-profile findings (Linux NFS, FreeBSD RPCSEC_GSS) were in remote procedure call code. These modules tend to be old, complex, and under-reviewed relative to their attack surface.
And prepare for more vulnerability reports in general. Even if Anthropic is responsible about disclosure, other groups running similar pipelines may not be. The barrier to large-scale vulnerability hunting just dropped from “requires a well-funded security team” to “requires an API key and a bash script.”
For the broader security industry, this research confirms what many suspected: LLMs aren’t just good at writing code. They’re good at breaking it. The gap between the two skills is smaller than most people assumed.
FAQ
Did Claude actually write working exploits, or just find bugs?
Both. For the FreeBSD vulnerability (CVE-2026-4747), Claude wrote two complete remote root exploits that each worked on the first attempt. For most of the other 500+ findings, Claude identified the vulnerability and provided proof-of-concept code. External researchers validated all findings independently.
Can anyone reproduce this pipeline?
The basic approach (looping over source files and asking an LLM to find vulnerabilities) is straightforward. Carlini’s specific prompts and pipeline details are described in the paper on Anthropic’s red team blog (red.anthropic.com). However, the results depend heavily on the model’s reasoning capabilities. Older or smaller models performed significantly worse.
How does this compare to Google’s Project Zero or other human vuln research teams?
Project Zero typically reports 20-30 high-impact vulnerabilities per year with a team of elite researchers. Claude found 500+ in a matter of weeks. The comparison isn’t entirely apples-to-apples since Project Zero focuses on the hardest, highest-impact targets, but the sheer throughput difference is striking.
Is this legal?
Yes. Anthropic conducted this research under coordinated disclosure agreements and worked with project maintainers to patch vulnerabilities before publication. The paper is consistent with standard security research practices. FreeBSD’s advisory explicitly credits Carlini and Claude.
Will this make open-source less secure or more secure?
Probably more secure in the long run. The same AI capabilities that find vulnerabilities can help fix them. The uncomfortable transition period is now, when AI can find bugs faster than maintainers can patch them. Anthropic’s approach of pairing AI discovery with human-assisted patching is one model for managing that gap.
Bottom Line
Carlini’s paper removes any remaining doubt about whether LLM-driven vulnerability discovery works. Five hundred validated zero-days. A 23-year-old kernel bug found in minutes. A root exploit written without human intervention. All from an off-the-shelf model with a bash script wrapper.
AI-powered offense is currently outrunning AI-powered defense, and the gap widens with every model generation. The same tools that let Anthropic find and fix 500 bugs will let less scrupulous actors weaponize them. How fast the security industry adapts to that reality will determine whether this research makes the internet safer or just more dangerous.
If you’re shipping C code to the internet in 2026, an LLM has probably already read it.
