TL;DR
Claude Fable 5 is the first public Mythos-class model. It scores 80.3% on SWE-bench Pro (11 points above Opus 4.8, 22 above GPT-5.5) and dominates FrontierCode Diamond at 29.3%, where GPT-5.5 manages 5.7-6.3% depending on effort level. For autonomous coding work, multi-file refactors, and codebase exploration, it’s the best model available right now. It also costs $10 per million input tokens and $50 per million output tokens (double Opus 4.8), silently falls back to Opus 4.8 on security-adjacent prompts, and requires mandatory 30-day data retention that overrides existing zero-retention enterprise agreements.
I Switched My Claude Code Config on Day One
Fable 5 dropped on June 9. I had it running in Claude Code within the hour, pointed at a Go service I’d been refactoring all week. The first thing I noticed: it read the entire module structure before touching a single file. Opus 4.8 typically asks clarifying questions or starts editing immediately. Fable 5 spent about 20 seconds scanning imports, test files, and the go.mod before producing a three-file refactor that compiled on the first try.
That initial session set the tone. Over the past 48 hours I’ve pushed Fable 5 through Python CLI tools, React component rewrites, and database migration scripts. The short version: when it works, it works better than anything else I’ve used. When it hits the safety classifiers, it silently degrades in ways you might not notice until the diff looks suspiciously conservative.
One Model, Two Products
Fable 5 and Mythos 5 share identical weights. The split is a safety classifier layer, not a capability difference.
Mythos 5 ships with the guardrails lifted. It’s restricted to Project Glasswing partners (the government vulnerability-hunting program). Anthropic has announced a separate modified Fable 5 variant for vetted biology researchers, but that’s distinct from Mythos 5 itself. You can’t buy access to either through the API.
Fable 5 adds three classifier gates:
- Cybersecurity: offensive exploit code, vulnerability analysis, binary reverse engineering
- Biology/Chemistry: gene therapy design, viral research, dual-use chemical synthesis
- Model distillation: large-scale capability extraction from the model itself
When a classifier fires, Fable 5 doesn’t raise an error or show a refusal message. It hands the request to Opus 4.8 and returns that response instead, which is easy to miss unless you inspect the response metadata. Anthropic says this happens in fewer than 5% of sessions. I’ll get into what that looks like in practice further down.
If you’re using the API directly, switching is a one-line change:
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-fable-5",  # was "claude-opus-4-8"
    max_tokens=8192,
    messages=[{"role": "user", "content": "Refactor this module to use dependency injection"}],
)
print(message.content[0].text)
The model ID is claude-fable-5. Context window is 1 million tokens, max output is 128K, and adaptive thinking is always on (controlled via the effort parameter, not the old budget_tokens extended thinking API).
The Benchmarks, Compared
SWE-bench Verified is approaching saturation (Fable 5 hits 95.0%), so SWE-bench Pro is where the real signal lives. Here’s how the frontier models stack up:
| Benchmark | Fable 5 | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 95.0% | 88.6% | — | — |
| SWE-bench Pro | 80.3% | 69.2% | 58.6% | 54.2% |
| FrontierCode Main | 46.3% | — | — | — |
| FrontierCode Diamond | 29.3% | 13.4% | 5.7–6.3%* | — |
| Every Senior Engineer | 91/100 | 63/100 | 62/100 | — |
| GDPval-AA Elo | 1932 | — | — | — |
*GPT-5.5 FrontierCode Diamond score varies by effort level: 5.7% at standard effort, 6.3% at its best-performing configuration.
The 11-point gap over Opus 4.8 on SWE-bench Pro is the headline. But FrontierCode Diamond tells a sharper story: Fable 5 solves nearly 30% of the Diamond tasks, more than double Opus 4.8’s 13.4% and roughly five times GPT-5.5’s 5.7-6.3%.
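The gaps quoted above are plain arithmetic on the table. A minimal sketch that reproduces them, with scores copied from the table and helper names (`point_gap`, `ratio`) made up for illustration:

```python
# Scores copied from the benchmark table (percent of tasks solved).
SWE_BENCH_PRO = {
    "Fable 5": 80.3,
    "Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}
FRONTIERCODE_DIAMOND = {
    "Fable 5": 29.3,
    "Opus 4.8": 13.4,
    "GPT-5.5 (standard)": 5.7,
    "GPT-5.5 (best)": 6.3,
}


def point_gap(scores: dict, leader: str) -> dict:
    """Percentage-point lead of `leader` over every other model."""
    return {m: round(scores[leader] - s, 1) for m, s in scores.items() if m != leader}


def ratio(scores: dict, leader: str) -> dict:
    """How many times higher the leader's score is than each other model's."""
    return {m: round(scores[leader] / s, 2) for m, s in scores.items() if m != leader}


pro_gaps = point_gap(SWE_BENCH_PRO, "Fable 5")
diamond_ratios = ratio(FRONTIERCODE_DIAMOND, "Fable 5")

print(pro_gaps)
print(diamond_ratios)
```

On these numbers the Diamond lead over GPT-5.5 lands between about 4.7x and 5.1x depending on which GPT-5.5 configuration you compare against, and the lead over Opus 4.8 is a little over 2x.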
One caveat: Cognition builds FrontierCode and Fable 5 tops their leaderboard, so take the ranking with a grain of salt. Independent evaluations from CodeRabbit paint a more mixed picture for code review tasks specifically (more on that below).
Hands-On: What Changed for Coding
Four things stood out during my 48 hours with the model.
It explores before it edits. I pointed Fable 5 at a 12-file Python package and asked it to add retry logic to the HTTP client. Instead of opening the client module and inserting a decorator, it read the test suite first, found the existing mock setup, then wrote the retry implementation in a way that the existing tests could verify without modification. Opus 4.8 on the same prompt edited the client file and broke two tests.
Multi-file refactors land clean. I asked it to extract a shared config module from three Go services that each had their own config parsing. Fable 5 produced a pkg/config package, updated all three services’ imports, adjusted the test fixtures, and the whole thing compiled on the first go build. I ran the same extraction with Opus 4.8 for comparison: it got the package structure right but left a stale import in one service’s main.go that broke the build. That matches what I’ve seen from previous models, which tend to get the extraction right but miss one or two import paths.
The tradeoff is speed. CodeRabbit’s evaluation found that 19 of 33 coding tasks hit timeout limits. In my own usage, complex refactors that Opus 4.8 finishes in 30-40 seconds take Fable 5 about 60-90 seconds. The model thinks longer and generates more tokens per task, and at $50 per million output tokens, those extra tokens add up fast.
Code review actually got worse. CodeRabbit’s independent evaluation measured 32.8% actionable precision for Fable 5 versus 35.5% for Opus 4.8. On difficulty-4 cases, Fable 5 scored 8/16 while Opus 4.8 hit 9/16. It also generated 253 review comments, more than Opus 4.8 did, which means more triage work with no gain in detection. If your workflow is “run AI code review on every PR,” Opus 4.8 is still the better choice.
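To put the CodeRabbit figures quoted above on one scale, here is a small sketch that converts them to percentages. The counts come from the paragraphs above; the helper name `pct` is made up for this example.

```python
def pct(hits: int, total: int) -> float:
    """Share of `total` as a percentage, rounded to two decimals."""
    return round(100 * hits / total, 2)


timeout_rate = pct(19, 33)             # coding tasks that hit timeout limits
fable_hard = pct(8, 16)                # Fable 5 on difficulty-4 review cases
opus_hard = pct(9, 16)                 # Opus 4.8 on the same cases
precision_gap = round(35.5 - 32.8, 1)  # actionable precision, in points

print(f"timeouts: {timeout_rate}%")
print(f"difficulty-4: Fable 5 {fable_hard}% vs Opus 4.8 {opus_hard}%")
print(f"precision gap: {precision_gap} points in favor of Opus 4.8")
```

The difficulty-4 difference is a single case out of 16, so I’d read it as directional rather than conclusive. The timeout share is the larger signal.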
Fable 5 excels at autonomous, exploratory coding tasks where it can read a codebase and build from context. It’s weaker at narrowly scoped review tasks where precision and speed beat depth.
The Silent Fallback to Opus 4.8
When Fable 5’s safety classifiers trigger, the request gets routed to Opus 4.8 instead. The API does expose this: the response returns stop_reason: "refusal" with a stop_details object naming the classifier category, and the model field changes to show the actual serving model. Anthropic also provides a server-side fallback configuration so you can handle this programmatically.
The catch is that higher-level tools don’t always surface these API details. In Claude Code, a session can drift from a general refactor into security-adjacent territory mid-conversation, and the quality shift from Fable 5 to Opus 4.8 shows up as a subtler, more conservative response rather than a clear notification.
In my testing, the classifier triggered on:
- A prompt asking to analyze a SQL injection vulnerability in a test suite
- A request to write a fuzzer for a parsing library
- A question about implementing rate-limiting to prevent credential stuffing
Standard defensive coding tasks, all of them. The classifier is conservative. Anthropic says fewer than 5% of sessions see a fallback, and that tracks with my experience on general coding work. But if your domain touches security, you should check stop_reason in the API response to know which model actually answered. The pricing doesn’t change either way — you’re paying Fable 5 rates for Opus 4.8 output when the fallback fires.
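Here is a rough sketch of that check. It works on a plain dict so it runs without network access. The `model`, `stop_reason`, and `stop_details` fields are the ones described above, but the `category` key inside `stop_details`, the prefix match on the model ID, and the `check_fallback` helper itself are my assumptions, not documented API shape.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

REQUESTED_MODEL = "claude-fable-5"


@dataclass(frozen=True)
class FallbackCheck:
    fell_back: bool
    serving_model: str
    category: Optional[str]


def check_fallback(
    response: Mapping[str, Any], requested_model: str = REQUESTED_MODEL
) -> FallbackCheck:
    """Report whether a response looks like it came from a fallback model."""
    serving_model = str(response.get("model") or "")
    details = response.get("stop_details") or {}
    # ASSUMPTION: the classifier name lives under a "category" key.
    category = details.get("category") if isinstance(details, Mapping) else None
    refused = response.get("stop_reason") == "refusal"
    # ASSUMPTION: prefix match, in case the API returns a dated variant of the ID.
    model_changed = bool(serving_model) and not serving_model.startswith(requested_model)
    return FallbackCheck(refused or model_changed, serving_model, category)


# Hand-written sample response, not captured from the real API.
sample = {
    "model": "claude-opus-4-8",
    "stop_reason": "refusal",
    "stop_details": {"category": "cybersecurity"},
}
print(check_fallback(sample))
```

With the Python SDK you would need to convert the response object to a dict first (the SDK’s response types are pydantic models, so `message.model_dump()` should work). Logging `fell_back` per request is a cheap way to measure how often your own workload hits the classifiers.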
The 30-Day Data Retention Problem
Every prompt and response sent to Fable 5 is retained by Anthropic for 30 days. This applies to API calls, Claude Code sessions, and every platform where the model is available. Anthropic says the data won’t be used for training, but it’s stored for trust and safety review.
Worse, this overrides existing enterprise zero-data-retention (ZDR) agreements. If your company had a ZDR contract with Anthropic for Opus 4.8 traffic, that contract doesn’t cover Fable 5. The model class carries its own data policy.
Microsoft blocked internal employee access to Fable 5 even as they made it available to paying GitHub Copilot customers. Their employees using Fable 5 for work-related tasks would send proprietary code to Anthropic’s servers, where it sits for up to 30 days. If trust and safety classifiers flag any of that content, retention extends to two years.
For individual developers, this probably doesn’t change anything. For teams at companies with data residency requirements, it’s a blocker until Anthropic extends ZDR to Mythos-class models.
What It Actually Costs
Fable 5’s token pricing is straightforward but expensive (for context on how AI coding tools bill tokens in practice, see the GitHub Copilot AI credits breakdown):
| | Fable 5 | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Input (per 1M) | $10.00 | $5.00 | $5.00 | $2.00 |
| Output (per 1M) | $50.00 | $25.00 | $30.00 | $12.00 |
| Batch Input | $5.00 | $2.50 | — | — |
| Batch Output | $25.00 | $12.50 | — | — |
A typical Claude Code coding session on a medium-sized refactor uses roughly 80K-120K input tokens (codebase context, tool calls, file reads) and 15K-30K output tokens. At Fable 5 rates, that’s $0.80-$1.20 input + $0.75-$1.50 output = $1.55-$2.70 per session. The same session on Opus 4.8 costs $0.78-$1.35.
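The per-session figures above follow directly from the rate card. A minimal sketch that reproduces them, using rates from the pricing table; the dictionary keys and the `session_cost` helper are labels I made up for this example:

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
RATES = {
    "fable-5": (10.00, 50.00),
    "opus-4.8": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
    "gemini-3.1-pro": (2.00, 12.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at standard (non-batch) rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Low and high ends of the token ranges quoted above.
LOW = (80_000, 15_000)
HIGH = (120_000, 30_000)

for model in ("fable-5", "opus-4.8"):
    low = session_cost(model, *LOW)
    high = session_cost(model, *HIGH)
    print(f"{model}: ${low:.3f} to ${high:.3f} per session")
```

Swap in your own token counts from a real session to get a number that reflects your workload. This assumes both models use the same number of tokens, which understates Fable 5’s cost if it generates more output on the same task.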
The gap widens on longer tasks. Fable 5 generates more tokens per task (it thinks longer, explores more thoroughly). A session that costs $1.50 on Opus 4.8 might cost $3-4 on Fable 5, mostly because the model does more work per request on top of the higher rate card.
Batch pricing at $5/$25 helps if your workflow supports it. Evaluation pipelines, test generation, and bulk refactors that can wait for delayed results can all route through the batch API. I ran a 200-file test generation job through it, and the 50% discount made the total comparable to what the same job would cost on Opus 4.8 at standard rates.
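To see why the batch discount lands at Opus 4.8’s standard price, here is a sketch with the rates from the pricing table. The per-file token counts are invented for illustration (I’m not reproducing my actual job), and `job_cost` is a hypothetical helper.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
STANDARD = {"fable-5": (10.00, 50.00), "opus-4.8": (5.00, 25.00)}
BATCH = {"fable-5": (5.00, 25.00), "opus-4.8": (2.50, 12.50)}

# ASSUMPTION: made-up token counts for a 200-file test generation job.
FILES = 200
INPUT_PER_FILE = 6_000
OUTPUT_PER_FILE = 2_000


def job_cost(rates: tuple, files: int = FILES) -> float:
    """Total USD for the job at the given (input, output) rates."""
    in_rate, out_rate = rates
    input_tokens = files * INPUT_PER_FILE
    output_tokens = files * OUTPUT_PER_FILE
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


fable_batch = job_cost(BATCH["fable-5"])
fable_standard = job_cost(STANDARD["fable-5"])
opus_standard = job_cost(STANDARD["opus-4.8"])
opus_batch = job_cost(BATCH["opus-4.8"])

print(f"Fable 5 standard: ${fable_standard:.2f}")
print(f"Fable 5 batch:    ${fable_batch:.2f}")
print(f"Opus 4.8 standard: ${opus_standard:.2f}")
print(f"Opus 4.8 batch:    ${opus_batch:.2f}")
```

At identical token counts, Fable 5 batch and Opus 4.8 standard come out equal because the rates are the same. In practice Fable 5 tends to generate more tokens per task, which is why “comparable” is the right word rather than “identical.”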
Free access runs through June 22 on Pro, Max, Team, and Enterprise subscription plans. After that, usage credits apply. If you’re on the fence, use the free window to run your most expensive recurring task through Fable 5 and compare the output quality against Opus 4.8. That gives you a concrete cost-benefit number before the meter starts running.
Who Should Switch
Switch to Fable 5 if you:
- Work on autonomous coding tasks (agent loops, multi-file refactors, codebase migrations)
- Need the model to understand a project before editing it
- Can tolerate 60-90 second response times for complex tasks
- Don’t work in security, biotech, or other domains that trigger the safety classifiers
- Have a budget that accommodates 2x Opus 4.8 pricing
Stay on Opus 4.8 if you:
- Run automated code review on every PR (Opus has better precision)
- Need fast turnaround (30-40 seconds vs 60-90)
- Work on security-adjacent code where the fallback behavior would silently degrade quality
- Operate under enterprise ZDR agreements that don’t yet cover Mythos-class models
- Optimize for cost per task rather than raw capability
Consider GPT-5.5 or Gemini 3.1 Pro if you:
- Need the lowest token cost (Gemini at $2/$12 is 5x cheaper than Fable 5 on input and about 4x cheaper on output)
- Want zero data retention guarantees from day one
- Prioritize multimodal workflows (Gemini handles text, images, video, and audio natively)
FAQ
What is the difference between Claude Fable 5 and Claude Mythos 5?
Same model weights, different safety layers. Fable 5 adds classifiers that redirect cybersecurity, biology/chemistry, and distillation prompts to Opus 4.8. Mythos 5 has those guardrails removed and is restricted to vetted partners through Project Glasswing.
How much does Claude Fable 5 cost?
$10 per million input tokens, $50 per million output tokens. Batch pricing halves that to $5/$25. It’s 2x Opus 4.8 across the board. Free access on subscription plans ends June 22, 2026.
Is Claude Fable 5 better than GPT-5.5 for coding?
On benchmarks, unambiguously: 80.3% vs 58.6% on SWE-bench Pro, 29.3% vs 5.7-6.3% on FrontierCode Diamond. In practice, Fable 5 handles autonomous multi-file tasks better than any competitor. GPT-5.5 costs half as much for input tokens ($5 vs $10) and 40% less for output tokens ($30 vs $50), and has deeper integration with tools like Cursor and Codex CLI.
When does Fable 5 fall back to Opus 4.8?
When the safety classifiers detect cybersecurity exploitation, biological research, or model distillation patterns. The API exposes the fallback: the model field changes to the actual serving model, stop_reason returns "refusal", and a stop_details object names the classifier category. Higher-level tools like Claude Code may not surface these fields prominently, so the shift can be easy to miss in practice. Anthropic reports this affects fewer than 5% of sessions.
Is Claude Fable 5 worth the extra cost?
For autonomous coding work where you need the model to explore a codebase and build from context, yes. The quality gap over Opus 4.8 on those tasks justifies the 2x cost. For code review, quick edits, and high-throughput workloads, Opus 4.8 gives you better cost efficiency with comparable or better precision.
Sources
- Claude Fable 5 and Claude Mythos 5 announcement — Anthropic’s official release post with benchmark data and safety architecture details
- Fable 5 Model Review — CodeRabbit — independent code review evaluation with precision metrics and timeout data
- Claude Fable 5 Benchmarks: 80.3% on SWE-Bench Pro — benchmark comparison analysis
- Claude Fable 5 and Mythos 5 Benchmarks Explained — Vellum — detailed benchmark breakdown with cross-model comparisons
- Microsoft restricts employee access to Claude Fable 5 — reporting on Microsoft’s internal Fable 5 block over data retention
- Claude Fable 5 data retention collection — CyberNews — data retention policy analysis and enterprise ZDR implications
Bottom Line
Fable 5 is the strongest coding model available today, and the 48 hours I’ve spent with it track with the benchmark numbers for autonomous, exploratory work. But the 2x price, the silent security fallback, and the mandatory data retention mean it’s not a blanket upgrade over Opus 4.8.
My setup after two days: Fable 5 for complex refactors and greenfield features where I want the model to understand the project before it writes code. Opus 4.8 for everything else, including code review, quick fixes, and security-related work where I need to know which model is actually responding.
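If you want to encode that split in a script or wrapper, it could look roughly like this. The model IDs are the ones used earlier in this post; the task labels and the `pick_model` helper are hypothetical, so adapt them to however you categorize your own work.

```python
FABLE = "claude-fable-5"
OPUS = "claude-opus-4-8"

# Hypothetical task labels for the work I send to Fable 5.
FABLE_TASKS = {"complex_refactor", "greenfield_feature"}


def pick_model(task_type: str, security_adjacent: bool = False) -> str:
    """Choose a model ID for a task, mirroring the setup described above."""
    # Security-related work goes straight to Opus 4.8 so there is no
    # ambiguity about which model produced the answer.
    if security_adjacent:
        return OPUS
    return FABLE if task_type in FABLE_TASKS else OPUS


print(pick_model("complex_refactor"))
print(pick_model("code_review"))
print(pick_model("complex_refactor", security_adjacent=True))
```

Defaulting to Opus 4.8 keeps the expensive model opt-in, which matches the cost argument: you only pay Fable 5 rates where the exploration is worth it.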
The free access window closes June 22. Try it on a real project before then and see whether the quality gap justifies the cost for your workflow.