Claude Fable 5 Review: 80% SWE-Bench Pro, but Read the Fine Print

TL;DR

Claude Fable 5 is the first public Mythos-class model. It scores 80.3% on SWE-bench Pro (11 points above Opus 4.8, 22 above GPT-5.5) and dominates FrontierCode Diamond at 29.3% where GPT-5.5 manages 5.7-6.3% depending on effort level. For autonomous coding work, multi-file refactors, and codebase exploration, it’s the best model available right now. It also costs $10/$50 per million tokens (double Opus 4.8), silently falls back to Opus 4.8 on security-adjacent prompts, and requires mandatory 30-day data retention that overrides existing zero-retention enterprise agreements.

I Switched My Claude Code Config on Day One

Fable 5 dropped on June 9. I had it running in Claude Code within the hour, pointed at a Go service I’d been refactoring all week. The first thing I noticed: it read the entire module structure before touching a single file. Opus 4.8 typically asks clarifying questions or starts editing immediately. Fable 5 spent about 20 seconds scanning imports, test files, and the go.mod before producing a three-file refactor that compiled on the first try.

That initial session set the tone. Over the past 48 hours I’ve pushed Fable 5 through Python CLI tools, React component rewrites, and database migration scripts. The short version: when it works, it works better than anything else I’ve used. When it hits the safety classifiers, it silently degrades in ways you might not notice until the diff looks suspiciously conservative.

One Model, Two Products

Fable 5 and Mythos 5 share identical weights. The split is a safety classifier layer, not a capability difference.

Mythos 5 ships with the guardrails lifted. It’s restricted to Project Glasswing partners (the government vulnerability-hunting program). Anthropic has announced a separate modified Fable 5 variant for vetted biology researchers, but that’s distinct from Mythos 5 itself. You can’t buy access to either through the API.

Fable 5 adds three classifier gates:

Cybersecurity: offensive exploit code, vulnerability analysis, binary reverse engineering
Biology/Chemistry: gene therapy design, viral research, dual-use chemical synthesis
Model distillation: large-scale capability extraction from the model itself

When a classifier fires, Fable 5 doesn’t return an error or refuse in the reply text. It quietly hands the request to Opus 4.8 and returns that response instead. Anthropic says this happens in fewer than 5% of sessions. I’ll get into what that looks like in practice further down.

If you’re using the API directly, switching is a one-line change:

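import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-fable-5",  # was "claude-opus-4-8"
    max_tokens=8192,
    messages=[{"role": "user", "content": "Refactor this module to use dependency injection"}],
)
# With thinking enabled, a thinking block may come before the answer,
# so print the text blocks instead of assuming content[0] is text.
for block in message.content:
    if block.type == "text":
        print(block.text)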

The model ID is claude-fable-5. Context window is 1 million tokens, max output is 128K, and adaptive thinking is always on (controlled via the effort parameter, not the old budget_tokens extended thinking API).

At a glance: 80.3% on SWE-bench Pro; 29.3% on FrontierCode Diamond; $10/$50 per 1M tokens (in/out); 30 days of mandatory data retention.

The Benchmarks, Compared

SWE-bench Verified is approaching saturation (Fable 5 hits 95.0%), so SWE-bench Pro is where the real signal lives. Here’s how the frontier models stack up:

Benchmark	Fable 5	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-bench Verified	95.0%	88.6%	—	—
SWE-bench Pro	80.3%	69.2%	58.6%	54.2%
FrontierCode Main	46.3%	—	—	—
FrontierCode Diamond	29.3%	13.4%	5.7–6.3%*	—
Every Senior Engineer	91/100	63/100	62/100	—
GDPval-AA Elo	1932	—	—	—

* GPT-5.5’s FrontierCode Diamond score varies by effort level: 5.7% at standard effort, 6.3% at its best-performing configuration.

The 11-point gap over Opus 4.8 on SWE-bench Pro is the headline. Open-weight models are closing in too: GLM-5.2 hit 62.1% on SWE-bench Pro under MIT license, passing GPT-5.5’s 58.6%. But FrontierCode Diamond tells a sharper story: Fable 5 solves nearly 30% of the Diamond-tier tasks, more than double Opus 4.8’s 13.4%, while GPT-5.5 manages 5.7-6.3% depending on effort level. That’s roughly a 5x gap over GPT-5.5.
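To make the gaps concrete, here is a small sketch that recomputes them from the table. The scores are copied from the table above; the helper names and the "(standard)" and "(best)" labels for the two GPT-5.5 effort levels are my own, not anything the benchmark publishes.

```python
# Scores copied from the comparison table (percent of tasks solved).
SWE_BENCH_PRO = {
    "Fable 5": 80.3,
    "Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}
DIAMOND = {
    "Fable 5": 29.3,
    "Opus 4.8": 13.4,
    "GPT-5.5 (standard)": 5.7,  # label is illustrative
    "GPT-5.5 (best)": 6.3,      # label is illustrative
}


def point_gap(scores: dict[str, float], leader: str, other: str) -> float:
    """Absolute gap in percentage points, rounded to one decimal."""
    return round(scores[leader] - scores[other], 1)


def ratio(scores: dict[str, float], leader: str, other: str) -> float:
    """How many times higher the leader's score is, rounded to two decimals."""
    return round(scores[leader] / scores[other], 2)


if __name__ == "__main__":
    for other in ("Opus 4.8", "GPT-5.5", "Gemini 3.1 Pro"):
        gap = point_gap(SWE_BENCH_PRO, "Fable 5", other)
        print(f"SWE-bench Pro, Fable 5 vs {other}: +{gap} points")
    for other in ("Opus 4.8", "GPT-5.5 (standard)", "GPT-5.5 (best)"):
        print(f"Diamond, Fable 5 vs {other}: {ratio(DIAMOND, 'Fable 5', other)}x")
```

Point gaps and ratios tell different stories: the SWE-bench Pro lead is large in absolute terms, while the Diamond lead is large in relative terms on a much lower base.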

One caveat: Cognition builds FrontierCode and Fable 5 tops their leaderboard, so take the ranking with a grain of salt. Independent evaluations from CodeRabbit paint a more mixed picture for code review tasks specifically (more on that below).

Hands-On: What Changed for Coding

Four things stood out during my 48 hours with the model.

It explores before it edits. I pointed Fable 5 at a 12-file Python package and asked it to add retry logic to the HTTP client. Instead of opening the client module and inserting a decorator, it read the test suite first, found the existing mock setup, then wrote the retry implementation in a way that the existing tests could verify without modification. Opus 4.8 on the same prompt edited the client file and broke two tests.

Multi-file refactors land clean. I asked it to extract a shared config module from three Go services that each had their own config parsing. Fable 5 produced a pkg/config package, updated all three services’ imports, adjusted the test fixtures, and the whole thing compiled on the first go build. I ran the same extraction with Opus 4.8 for comparison: it got the package structure right but left a stale import in one service’s main.go that broke the build. That matches what I’ve seen from previous models, which tend to get the extraction right but miss one or two import paths.

The tradeoff is speed. CodeRabbit’s evaluation found 19 out of 33 coding tasks hit timeout limits. In my own usage, complex refactors that Opus 4.8 finishes in 30-40 seconds take Fable 5 about 60-90 seconds. The model thinks longer and generates more tokens per task, and at $50 per million output tokens, those extra seconds add up fast.

Code review actually got worse. CodeRabbit’s independent evaluation measured 32.8% actionable precision for Fable 5 versus 35.5% for Opus 4.8. On difficulty-4 cases, Fable 5 scored 8/16 while Opus 4.8 hit 9/16. Fable 5 also generated 253 review comments, more than Opus 4.8, which means more triage work without any gain in detection. If your workflow is “run AI code review on every PR,” Opus 4.8 is still the better choice.

Fable 5 excels at autonomous, exploratory coding tasks where it can read a codebase and build from context. It’s weaker at narrowly scoped review tasks where precision and speed beat depth.

The Silent Fallback to Opus 4.8

When Fable 5’s safety classifiers trigger, the request gets routed to Opus 4.8 instead. The API does expose this: the response returns stop_reason: "refusal" with a stop_details object naming the classifier category, and the model field changes to show the actual serving model. Anthropic also provides a server-side fallback configuration so you can handle this programmatically.

The catch is that higher-level tools don’t always surface these API details. In Claude Code, a session can drift from a general refactor into security-adjacent territory mid-conversation, and the quality shift from Fable 5 to Opus 4.8 shows up as a subtler, more conservative response rather than a clear notification.

In my testing, the classifier triggered on:

A prompt asking to analyze a SQL injection vulnerability in a test suite
A request to write a fuzzer for a parsing library
A question about implementing rate-limiting to prevent credential stuffing

Standard defensive coding tasks, all of them. The classifier is conservative. Anthropic says fewer than 5% of sessions see a fallback, and that tracks with my experience on general coding work. But if your domain touches security, you should check stop_reason in the API response to know which model actually answered. The pricing doesn’t change either way — you’re paying Fable 5 rates for Opus 4.8 output when the fallback fires.
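Here is a minimal sketch of that check, assuming the response carries the three signals described above: model, stop_reason, and stop_details. The category attribute on stop_details is a guessed name (I only know that the object names the classifier category), and FallbackReport and detect_fallback are hypothetical helpers, not part of the SDK. The demo uses stand-in objects so it runs offline.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional

REQUESTED_MODEL = "claude-fable-5"


@dataclass
class FallbackReport:
    fell_back: bool
    serving_model: str
    category: Optional[str]


def detect_fallback(response: Any, requested_model: str = REQUESTED_MODEL) -> FallbackReport:
    """Inspect a response object for the fallback signals."""
    serving_model = getattr(response, "model", requested_model)
    stop_reason = getattr(response, "stop_reason", None)
    details = getattr(response, "stop_details", None)
    # "category" is an assumed field name; adjust it to the real schema.
    category = getattr(details, "category", None)
    # Either signal counts: an explicit refusal stop reason, or a serving
    # model whose ID does not start with the model we asked for.
    fell_back = stop_reason == "refusal" or not serving_model.startswith(requested_model)
    return FallbackReport(fell_back, serving_model, category)


if __name__ == "__main__":
    # Stand-ins shaped like API responses; no network call is made.
    normal = SimpleNamespace(model="claude-fable-5", stop_reason="end_turn", stop_details=None)
    rerouted = SimpleNamespace(
        model="claude-opus-4-8",
        stop_reason="refusal",
        stop_details=SimpleNamespace(category="cybersecurity"),
    )
    for resp in (normal, rerouted):
        print(detect_fallback(resp))
```

Logging the report per request is enough to see how often your own workload gets rerouted, which is more useful than the aggregate 5% figure.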

The 30-Day Data Retention Problem

Every prompt and response sent to Fable 5 is retained by Anthropic for 30 days. This applies to API calls, Claude Code sessions, and every platform where the model is available. Anthropic says the data won’t be used for training, but it’s stored for trust and safety review.

Worse, this overrides existing enterprise zero-data-retention (ZDR) agreements. If your company had a ZDR contract with Anthropic for Opus 4.8 traffic, that contract doesn’t cover Fable 5. The model class carries its own data policy.

Microsoft blocked internal employee access to Fable 5 even as it made the model available to paying GitHub Copilot customers. The concern is that employees using Fable 5 for work would send proprietary code to Anthropic’s servers, where it sits for up to 30 days. If trust and safety classifiers flag any of that content, retention extends to two years.

For individual developers, this probably doesn’t change anything. For teams at companies with data residency requirements, it’s a blocker until Anthropic extends ZDR to Mythos-class models.

What It Actually Costs

Fable 5’s token pricing is straightforward but expensive (for context on how AI coding tools bill tokens in practice, see the GitHub Copilot AI credits breakdown):

	Fable 5	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
Input (per 1M)	$10.00	$5.00	$5.00	$2.00
Output (per 1M)	$50.00	$25.00	$30.00	$12.00
Batch Input	$5.00	$2.50	—	—
Batch Output	$25.00	$12.50	—	—

A typical Claude Code coding session on a medium-sized refactor uses roughly 80K-120K input tokens (codebase context, tool calls, file reads) and 15K-30K output tokens. At Fable 5 rates, that’s $0.80-$1.20 input + $0.75-$1.50 output = $1.55-$2.70 per session. The same session on Opus 4.8 costs $0.78-$1.35.
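The arithmetic is easy to rerun with your own token counts. This is a rough sketch: the rates come from the pricing table above, the function name session_cost is mine, and it assumes every model consumes the same number of tokens, which understates Fable 5’s real cost since it tends to generate more.

```python
# USD per 1M tokens as (input, output), from the pricing table.
RATES = {
    "Fable 5": (10.00, 50.00),
    "Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at standard (non-batch) rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    light = (80_000, 15_000)   # low end of a typical session
    heavy = (120_000, 30_000)  # high end of a typical session
    for model in RATES:
        low = session_cost(model, *light)
        high = session_cost(model, *heavy)
        print(f"{model}: ${low:.2f} to ${high:.2f} per session")
```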

The gap widens on longer tasks. Fable 5 generates more tokens per task (it thinks longer, explores more thoroughly). A session that costs $1.50 on Opus 4.8 might cost $3-4 on Fable 5, mostly because the model does more work per request on top of the higher rate card.

Batch pricing at $5/$25 helps if your workflow supports it. Evaluation pipelines, test generation, and bulk refactors that can wait for delayed results can all route through the batch API. I ran a 200-file test generation job through the batch API, and the 50% discount made the total comparable to what the same job would cost on Opus 4.8 at standard rates.
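Since Fable 5’s batch rates equal Opus 4.8’s standard rates, any difference in the bill comes down to token volume. The sketch below shows that under stated assumptions: the per-file token counts and the output_overhead factor are hypothetical placeholders, not measurements from my run.

```python
# USD per 1M tokens as (input, output), from the pricing table.
FABLE_BATCH = (5.00, 25.00)
OPUS_STANDARD = (5.00, 25.00)


def job_cost(rates: tuple[float, float], files: int,
             in_per_file: int, out_per_file: int) -> float:
    """Total USD for a job that makes one request per file."""
    in_rate, out_rate = rates
    return files * (in_per_file * in_rate + out_per_file * out_rate) / 1_000_000


def fable_batch_vs_opus(files: int, in_per_file: int, out_per_file: int,
                        output_overhead: float) -> tuple[float, float]:
    """Return (Fable 5 batch cost, Opus 4.8 standard cost) in USD.

    output_overhead is how many times more output tokens Fable 5 emits
    than Opus 4.8 on the same task (1.0 means identical output size).
    """
    fable_out = round(out_per_file * output_overhead)
    return (
        job_cost(FABLE_BATCH, files, in_per_file, fable_out),
        job_cost(OPUS_STANDARD, files, in_per_file, out_per_file),
    )


if __name__ == "__main__":
    # Hypothetical job: 200 files, 6K input and 2K output tokens per file.
    for overhead in (1.0, 1.5):
        fable, opus = fable_batch_vs_opus(200, 6_000, 2_000, overhead)
        print(f"overhead {overhead}: Fable 5 batch ${fable:.2f}, Opus 4.8 standard ${opus:.2f}")
```

With an overhead of 1.0 the two totals are identical, so "comparable" holds only as long as Fable 5’s extra output stays modest.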

Free access runs through June 22 on Pro, Max, Team, and Enterprise subscription plans. After that, usage credits apply (see the full Pro vs Max cost breakdown for what each tier’s credit pool actually buys). If you’re on the fence, use the free window to run your most expensive recurring task through Fable 5 and compare the output quality against Opus 4.8. That gives you a concrete cost-benefit number before the meter starts running.

Who Should Switch

Switch to Fable 5 if you:

Work on autonomous coding tasks (agent loops, multi-file refactors, codebase migrations)
Need the model to understand a project before editing it
Can tolerate 60-90 second response times for complex tasks
Don’t work in security, biotech, or other domains that trigger the safety classifiers
Have a budget that accommodates 2x Opus 4.8 pricing

Stay on Opus 4.8 if you:

Run automated code review on every PR (Opus has better precision)
Need fast turnaround (30-40 seconds vs 60-90)
Work on security-adjacent code where the fallback behavior would silently degrade quality
Operate under enterprise ZDR agreements that don’t yet cover Mythos-class models
Optimize for cost per task rather than raw capability

Consider GPT-5.5 or Gemini 3.1 Pro if you:

Need the lowest token cost (Gemini at $2/$12 is roughly 4-5x cheaper than Fable 5)
Want zero data retention guarantees from day one
Prioritize multimodal workflows (Gemini handles text, images, video, and audio natively)

FAQ

What is the difference between Claude Fable 5 and Claude Mythos 5?

Same model weights, different safety layers. Fable 5 adds classifiers that redirect cybersecurity, biology, and distillation prompts to Opus 4.8. Mythos 5 has those guardrails removed and is restricted to vetted partners through Project Glasswing.

How much does Claude Fable 5 cost?

$10 per million input tokens, $50 per million output tokens. Batch pricing halves that to $5/$25. It’s 2x Opus 4.8 across the board. Free access on subscription plans ends June 22, 2026.

Is Claude Fable 5 better than GPT-5.5 for coding?

On benchmarks, unambiguously: 80.3% vs 58.6% on SWE-bench Pro, 29.3% vs 5.7% on FrontierCode Diamond. In practice, Fable 5 handles autonomous multi-file tasks better than any competitor. GPT-5.5 is cheaper ($5/$30 per million tokens versus $10/$50) and has deeper integration with tools like Cursor and Codex CLI.

When does Fable 5 fall back to Opus 4.8?

When the safety classifiers detect cybersecurity exploitation, biological research, or model distillation patterns. The API exposes the fallback: the model field changes to the actual serving model, stop_reason returns "refusal", and a stop_details object names the classifier category. Higher-level tools like Claude Code may not surface these fields prominently, so the shift can be easy to miss in practice. Anthropic reports this affects fewer than 5% of sessions.

Is Claude Fable 5 worth the extra cost?

For autonomous coding work where you need the model to explore a codebase and build from context, yes. The quality gap over Opus 4.8 on those tasks justifies the 2x cost. For code review, quick edits, and high-throughput workloads, Opus 4.8 gives you better cost efficiency with comparable or better precision.

Sources

Claude Fable 5 and Claude Mythos 5 announcement — Anthropic’s official release post with benchmark data and safety architecture details
Fable 5 Model Review — CodeRabbit — independent code review evaluation with precision metrics and timeout data
Claude Fable 5 Benchmarks: 80.3% on SWE-Bench Pro — benchmark comparison analysis
Claude Fable 5 and Mythos 5 Benchmarks Explained — Vellum — detailed benchmark breakdown with cross-model comparisons
Microsoft restricts employee access to Claude Fable 5 — reporting on Microsoft’s internal Fable 5 block over data retention
Claude Fable 5 data retention collection — CyberNews — data retention policy analysis and enterprise ZDR implications

Bottom Line

Fable 5 is the strongest coding model available today, and the 48 hours I’ve spent with it track with the benchmark numbers for autonomous, exploratory work. But the 2x price, the silent security fallback, and the mandatory data retention mean it’s not a blanket upgrade over Opus 4.8.

My setup after two days: Fable 5 for complex refactors and greenfield features where I want the model to understand the project before it writes code. Opus 4.8 for everything else, including code review, quick fixes, and security-related work where I need to know which model is actually responding.

The free access window closes June 22. Try it on a real project before then and see whether the quality gap justifies the cost for your workflow.
