TL;DR

Anthropic shipped Claude Opus 4.7 on April 16. Same sticker price as Opus 4.6 ($5/M input, $25/M output). SWE-Bench Pro up to 64.3% from 53.4%. Images now accepted at 3.75 megapixels, more than triple the old ceiling. A new xhigh effort level sits between high and max. It edges out GPT-5.4 and Gemini 3.1 Pro on agentic coding. The catch buried in the release notes: the tokenizer changed, and the same prompt now bills at 1.0 to 1.35x the tokens depending on content, so the headline per-token price overstates how much you actually save.

What Anthropic Shipped

Opus 4.7 went generally available on Thursday across Claude products, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Every channel live on day one, which is now the default cadence for a top-tier model drop.

The headline claim from Anthropic: Opus 4.7 takes back the top spot on agentic coding, scaled tool-use, agentic computer use, and financial analysis, beating GPT-5.4 and Gemini 3.1 Pro. Read the small print and the phrase that does the work is “generally available”. Anthropic’s own internal Mythos Preview still scores higher on their evals. It’s the model they’ve kept behind the glass since the Mythos leak in March, and Mythos is still off-limits to paying customers. Opus 4.7 is what you can actually call from your code.

For economic context: Anthropic crossed $30B ARR earlier this month while reportedly spending roughly 3x less on training compute than OpenAI. Opus 4.7 is what that cost structure produced. Holding pricing flat while bumping benchmarks looks like a deliberate play on unit economics.

The Benchmark Numbers

Anthropic posted an unusually long eval table for a point release. The shape of it:

| Benchmark | Opus 4.7 | Opus 4.6 | Delta |
|---|---|---|---|
| SWE-Bench Pro | 64.3% | 53.4% | +10.9 pts |
| SWE-Bench Verified | 87.6% | n/a | n/a |
| Terminal-Bench 2.0 | 69.4% | n/a | n/a |
| CursorBench | 70% | 58% | +12 pts |
| XBOW visual acuity | 98.5% | 54.5% | +44 pts |
| Databricks OfficeQA Pro | n/a | n/a | 21% fewer errors |
| Rakuten SWE-Bench (prod) | n/a | n/a | 3x more tasks resolved |

Two things jump out.

First, the XBOW visual acuity lift is enormous: 44 percentage points. That lines up with the vision change discussed below, and it means screenshots, dense diagrams, and PDFs that Opus 4.6 would misread are now in range.

Second, Rakuten’s 3x “production tasks resolved” is the one I’d actually trust. SWE-Bench variants are well-scraped by now; Rakuten’s internal test rig is not, and a 3x lift on real engineering work is a much harder number to game. On a separate 93-task coding benchmark Anthropic cited, Opus 4.7 resolved 13% more tasks than Opus 4.6, and solved four that neither 4.6 nor Sonnet 4.6 could finish.

What’s New for Developers

Four changes in the tooling affect daily use more than any benchmark number.

xhigh effort level. Opus 4.6 gave you low/medium/high/max. Opus 4.7 adds xhigh between high and max, filling a gap that teams running hard refactors had been working around. max was often overkill or unnecessarily slow, while high sometimes stopped short of the depth the job needed.
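One way to think about the new five-level ladder is as a routing decision you make per task. Here is a minimal sketch of that decision; the heuristic, the `effort` field name, and the model id are my assumptions for illustration, not the documented request shape:

```python
# Hypothetical sketch: choosing an effort level per task. The five-level
# ladder (low/medium/high/xhigh/max) matches the release notes; the
# request shape and the "effort" field name are assumptions.

EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]

def pick_effort(files_touched: int, needs_cross_file_reasoning: bool) -> str:
    """Crude routing heuristic: escalate effort with task scope.

    'max' stays reserved for jobs where latency doesn't matter;
    'xhigh' covers the hard-refactor gap between high and max.
    """
    if files_touched <= 1 and not needs_cross_file_reasoning:
        return "low"
    if files_touched <= 3:
        return "high" if needs_cross_file_reasoning else "medium"
    if needs_cross_file_reasoning:
        return "xhigh"
    return "high"

request = {
    "model": "claude-opus-4-7",  # assumed model id
    "effort": pick_effort(files_touched=12, needs_cross_file_reasoning=True),
    "max_tokens": 4096,
}
```

The point of the sketch: with five levels instead of four, a big cross-file refactor no longer has to round up to max.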

Task budgets (public beta). You can now set a hard token ceiling on a long-running agent job. Every team running autonomous Claude agents has been asking for this since at least last summer. A runaway agent on a 1M context window was an expensive way to find out your prompt had a loop bug. Beta means rough edges, but it’s live.
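If you have been approximating this client-side while waiting for the beta, the pattern looks something like the following. Everything here is illustrative: the step/usage shape is an assumption, and the server-side feature presumably handles accounting for you.

```python
# Client-side token ceiling for an agent loop -- the stopgap version of
# what task budgets do server-side. Assumes each step reports its own
# token usage; field shapes are illustrative.

def run_with_budget(step, budget_tokens: int, max_steps: int = 100):
    """Run `step()` repeatedly until done or the token budget is spent.

    `step` returns (done, tokens_used) per iteration.
    """
    spent = 0
    for _ in range(max_steps):
        done, used = step()
        spent += used
        if done:
            return ("done", spent)
        if spent >= budget_tokens:
            return ("budget_exhausted", spent)
    return ("step_limit", spent)

# Toy agent that "finishes" after 5 steps of 10k tokens each
calls = {"n": 0}
def fake_step():
    calls["n"] += 1
    return (calls["n"] >= 5, 10_000)

print(run_with_budget(fake_step, budget_tokens=30_000))
# → ('budget_exhausted', 30000)
```

The agent gets cut off after three steps because the ceiling bites before the task completes, which is exactly the failure mode you want to surface cheaply instead of discovering it on the invoice.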

/ultrareview in Claude Code. A dedicated deep-review command, separate from the normal edit loop. From the docs, it runs a more exhaustive pass: cross-file diff review, spec re-reading, second-order consequence checks. The kind of thing you’d do manually before merging a gnarly PR.

Auto mode extended to Max users. Auto mode (Claude Code’s “pick the right model for each step” routing) was previously gated above the Max tier. It’s now open to Max subscribers, which is most serious users of Claude Code.

```mermaid
flowchart LR
    A[Claude Code Request] --> B{Auto Mode}
    B -->|Simple edit| C[Haiku 4.5]
    B -->|Standard task| D[Sonnet 4.6]
    B -->|Deep reasoning| E[Opus 4.7]
    B -->|xhigh effort| F[Opus 4.7 xhigh]
```

Taken together, these four changes tighten the loop between intent and correct output, and they come at no extra cost on the rate card.

Vision: 3.75 MP Changes the Math for Diagrams

Opus 4.6 maxed out at about 1.15 megapixels per image, roughly 1,568 pixels on the long edge. Opus 4.7 accepts up to 2,576 pixels on the long edge, or around 3.75 megapixels. That’s over 3x the pixel count.

In practical terms: a 4K screenshot of a dashboard used to get downscaled before the model ever saw it. Tables at the edges blurred. Axis labels on dense charts smeared into illegible fuzz. With Opus 4.7 you can pass a real full-res screenshot and the model reads the pixels you actually sent. Chemical structures, circuit diagrams, architecture docs, mockups with small text: all of these become usable inputs instead of “I need to crop this first” inputs.
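The downscaling math is easy to check yourself. This sketch just scales an image uniformly to fit a megapixel cap; the model's exact resize behavior is an assumption, but the ratios are what matter:

```python
# How much does a 3840x2160 screenshot shrink under the old ~1.15 MP
# ceiling vs the new ~3.75 MP ceiling? Uniform scale-to-fit is assumed.
import math

def scale_to_cap(w: int, h: int, cap_mp: float) -> tuple[int, int]:
    """Scale (w, h) down uniformly so total pixels fit under cap_mp megapixels."""
    pixels = w * h
    cap = cap_mp * 1_000_000
    if pixels <= cap:
        return (w, h)
    s = math.sqrt(cap / pixels)
    return (int(w * s), int(h * s))

print(scale_to_cap(3840, 2160, 1.15))  # old ceiling: ~1430px on the long edge
print(scale_to_cap(3840, 2160, 3.75))  # new ceiling: ~2580px on the long edge
```

A 4K frame used to lose roughly 63% of its width before the model saw it; now it keeps about two-thirds of its native resolution, which is the difference between legible and smeared axis labels.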

The 98.5% on XBOW’s visual-acuity benchmark, up from 54.5%, is the numerical version of that experience.

The Deliberate Cyber Pullback

Anthropic differentially reduced Opus 4.7’s offensive cyber capabilities during training, which is the opposite of what OpenAI shipped two days earlier.

In Anthropic’s framing: if you’re a security researcher who wants the unthrottled version, you apply to the new Cyber Verification Program, Anthropic confirms you are who you say you are, and you get access to a model with fewer refusals on dual-use work.

Two days before Opus 4.7 shipped, OpenAI went the other direction: a fine-tuned GPT-5.4-Cyber variant with reduced safeguards, gated behind an expanded Trusted Access for Cyber program. The mechanism is similar — verify, then open access — but the defaults are inverted: OpenAI ships the stronger cyber variant and lets you opt in, Anthropic ships the weaker one and lets you opt in.

Neither approach is obviously correct. Default-on expands the attack surface; default-off pushes legitimate researchers into paperwork to do what they could do on GPT-4 three years ago. The two labs genuinely disagree on the tradeoff here, and that disagreement will probably show up in how their customer bases skew over the next year.

Pricing: Same Sticker, More Tokens

$5 per million input tokens. $25 per million output tokens. Unchanged from Opus 4.6, and unchanged from Opus 4.5 before it, which is when Anthropic brought Opus pricing down from the $15/$75 tier. On a dollars-per-token basis this release is a free upgrade, though the tokenizer change below takes some of that savings back.


One caveat: Anthropic changed the tokenizer. The same prompt text that cost you N tokens on Opus 4.6 will cost you between 1.0 and 1.35 times N tokens on Opus 4.7, depending on content. Code-heavy inputs hit near the top of that range; plain English hits the bottom.

So your effective price per character rose somewhere between 0% and 35%. For a shop running a 6-figure Claude bill, that is not a rounding error, and it is the kind of thing you should measure on your own traffic mix before extrapolating budget numbers from the headline rate card.

Anthropic will argue, probably correctly, that the new tokenizer packs semantic meaning more densely, so at equal quality you need fewer completion tokens even if you use more prompt tokens. My advice: run it on a week of your actual workload before trusting the rate card.
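Measuring this is a one-afternoon job. Sketch below: collect per-prompt token counts under both tokenizers however your stack allows (a count-tokens endpoint run over a week of logged prompts, say), then compute the token-weighted blend. All numbers here are made up for illustration.

```python
# Blended tokenizer-inflation factor over a sample of real prompts.
# counts_old / counts_new are per-prompt token counts under the 4.6 and
# 4.7 tokenizers; collection mechanism is up to you.

def blended_inflation(counts_old: list[int], counts_new: list[int]) -> float:
    """Token-weighted ratio of new to old counts across a sample set."""
    assert len(counts_old) == len(counts_new) and sum(counts_old) > 0
    return sum(counts_new) / sum(counts_old)

# Illustrative mix: two code-heavy prompts (near 1.35x) and two
# plain-English prompts (near 1.0x)
old = [10_000, 8_000, 2_000, 1_500]
new = [13_200, 10_600, 2_050, 1_520]
factor = blended_inflation(old, new)
print(f"effective input cost multiplier: {factor:.2f}")
# → effective input cost multiplier: 1.27
```

Note the weighting matters: a shop whose volume is dominated by code-heavy prompts lands near the top of the range even if most prompts by count are plain English.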

Where Opus 4.7 Sits in the Lineup

Three public flagships compete for general-purpose work right now: GPT-5.4 Thinking, Gemini 3.1 Pro, and Claude Opus 4.7. Anthropic’s release notes claim Opus 4.7 wins on agentic coding, tool-use at scale, agentic computer use, and financial analysis. VentureBeat described the release as “narrowly retaking the lead” among generally available models. Note the “narrowly”. The margins are small and they flip benchmark-by-benchmark.

What doesn’t change with this release: the economics of using these models in production. Sonnet 4.6 remains the workhorse for most teams, and Opus is a premium you pay on hard tasks. The JetBrains 2026 developer survey had Claude Code surging on exactly that split: devs pay for Opus when they need to, coast on cheaper models when they don’t. Nothing in Opus 4.7’s release changes that math. If anything, the new task-budget feature makes it easier to be disciplined about when you reach for Opus.

FAQ

Is Claude Opus 4.7 available in Claude Code?

Yes. Opus 4.7 is selectable in Claude Code from day one, including on the Auto mode routing, and the new /ultrareview command is part of the update.

How much does Claude Opus 4.7 cost?

$5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.6. Note the tokenizer change: the same text now uses 1.0 to 1.35x as many tokens, so your effective cost per character went up slightly even though the sticker price didn't.

What is the context window size for Claude Opus 4.7?

Opus 4.7 ships with a 1M-token context window at standard API pricing, with no long-context premium. Earlier Opus models had a 200K standard tier with 1M in beta and a surcharge above 200K; that split is gone on 4.7. Since the tokenizer also changed, developers who were running long-context workloads on 4.6 should retest: the same text now uses 1.0 to 1.35x as many tokens, which affects how much actually fits in the window.

Is Claude Opus 4.7 better than GPT-5.4?

On agentic coding, scaled tool-use, agentic computer use, and financial analysis: yes, by Anthropic’s numbers, though narrowly. On other benchmarks the picture is mixed, and any “which model is best” answer has a two-week expiration date right now. Pick based on the workload you actually run.

What is Mythos, and why isn’t it released?

Mythos is Anthropic’s internal next-tier model, referenced as “Mythos Preview” in official posts. It scores higher than Opus 4.7 on Anthropic’s evals but has not been cleared for public release. The company cites alignment concerns: Mythos showed capability gains that Anthropic wants more testing on before shipping. No public release date.

Bottom Line

Opus 4.7 is a tidy point release: better SWE-Bench Pro, more than triple the image-resolution ceiling, a cleaner effort-level knob, and a 1M context window at standard pricing. All of that at the same sticker price makes it worth moving production workloads onto. Measure the tokenizer change on your own traffic mix before you adjust budget forecasts.

The cyber-capability divergence with OpenAI is the bigger story here. Two labs, same mechanism (verify, then grant access), inverted defaults. How that plays out with security researchers and enterprise buyers over the next quarter will tell us more about the labs’ strategic positioning than another round of SWE-Bench numbers.

If you’re on Opus 4.6, the upgrade is straightforward and I’d recommend moving production traffic over once your token-cost retest comes back. If you’re running Sonnet 4.6 for cost reasons, hold: Sonnet 4.7 is the release that will actually change that calculation, and Anthropic hasn’t announced it yet.