What Happened
Anthropic released Claude Opus 4.7 on April 16, targeting two specific capability gaps: code reliability and visual resolution. The release maintains existing pricing, according to the source publication on Juejin.
On coding benchmarks, Opus 4.7 reached a 64.3% solve rate on complex real-world GitHub programming problems — a 10 percentage point improvement over the previous generation, per the source article. The gain is attributed to a new self-verification mechanism in which the model audits its own output before returning results, catching logical errors internally rather than surfacing them to the user.
Early access developers at Replit and Warp reported the model now proactively requests clarification on ambiguous requirements rather than generating speculative code, according to the source.
Why It Matters
A 10-point jump on real-world GitHub tasks is material. SWE-bench-style evaluations are widely regarded as harder to game than synthetic benchmarks because they require end-to-end patch generation on actual repositories. If the 64.3% figure holds under independent evaluation, it closes the gap with competing coding-focused models.
The no-price-increase decision is strategically significant. Anthropic is absorbing compute cost increases at a moment when several competitors have raised API prices. For enterprises running high-volume coding agents, stable pricing reduces total cost of ownership uncertainty — a direct procurement argument against alternatives.
The deliberate suppression of offensive cybersecurity capabilities is also worth noting. Anthropic explicitly weakened the model's attack-generation performance via training constraints, accepting lower scores on certain security benchmarks as a tradeoff. This positions the company ahead of anticipated regulatory requirements around dual-use AI capabilities in the EU AI Act and U.S. executive order frameworks.
The Technical Detail
Vision: 2,576px Long-Edge Support
Maximum image resolution increases to 2,576 pixels on the long edge — described by the source as approximately a 3x improvement in effective visual acuity over the prior version. This enables the model to process dense financial charts, chemical structure diagrams, and high-fidelity design mockups without significant detail loss from downsampling.
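As a quick sanity check on the "~3x" figure: if the prior long-edge cap was around 1,500px (an assumption for illustration; the source does not state the old limit), raising it to 2,576px roughly triples the usable pixel area:

```python
prior_cap = 1500   # hypothetical prior long-edge limit (not stated in the source)
new_cap = 2576     # new long-edge limit per the release

# Effective visual detail scales with pixel area, i.e. the square of the
# linear resolution ratio.
area_ratio = (new_cap / prior_cap) ** 2
print(round(area_ratio, 2))  # 2.95 -- consistent with the "~3x" acuity claim
```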
Self-Verification Architecture
The self-verification feature adds an internal review pass before code is returned to the caller. The model identifies logical gaps in its own generated solutions and revises them within the same inference cycle. No additional API calls are required from the developer side, though this likely increases per-request latency and token consumption for complex tasks.
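The pattern can be illustrated with an external analogue: a generate–verify–revise loop. Everything below is a toy stand-in (the function bodies are assumptions, not Anthropic's mechanism, which runs inside a single inference cycle with no extra API calls), but it shows the control flow the source describes:

```python
# Toy generate -> verify -> revise loop. All bodies are illustrative stubs.

def generate(task: str) -> str:
    # Stand-in for the model's first-pass solution (deliberately buggy draft).
    return "def add(a, b): return a - b"

def verify(code: str) -> list[str]:
    # Stand-in for the internal audit pass: flag logical gaps in the draft.
    return ["uses '-' where the task requires '+'"] if "-" in code else []

def revise(code: str, issues: list[str]) -> str:
    # Stand-in for the revision step applied before output is returned.
    return code.replace("-", "+")

def solve(task: str, max_passes: int = 2) -> str:
    draft = generate(task)
    for _ in range(max_passes):
        issues = verify(draft)
        if not issues:
            break  # audit passed: return without surfacing errors to the user
        draft = revise(draft, issues)
    return draft

print(solve("add two numbers"))  # prints the corrected draft
```

The latency and token costs noted above correspond to the extra `verify`/`revise` passes happening inside the cycle.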
New Developer Controls
- xhigh effort level: A new tier above existing effort settings, giving developers finer control over the tradeoff between reasoning depth and response speed.
- Task budget / token cap: Developers can set a hard upper bound on token consumption per task, preventing runaway costs in agentic workflows where the model might otherwise iterate indefinitely.
- /ultrareview command in Claude Code: A dedicated code review command that runs automated bug detection on submitted code blocks.
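The task-budget control maps naturally onto a hard stop in an agentic loop. The sketch below is a client-side approximation under stated assumptions (`fake_model_step` and its token accounting are invented stand-ins, not the real API surface), showing why a hard cap prevents runaway iteration:

```python
# Client-side sketch of the task-budget idea: cap cumulative token spend.
# `fake_model_step` is a stub for one agent turn (assumption, not a real API).

def fake_model_step(state: str) -> tuple[str, int]:
    # Returns (new state, tokens consumed by this turn).
    return state + ".", 120

def run_agent(task: str, token_budget: int, max_steps: int = 50) -> tuple[str, int]:
    state, spent = task, 0
    for _ in range(max_steps):
        new_state, cost = fake_model_step(state)
        if spent + cost > token_budget:
            break  # hard stop: never exceed the per-task budget
        state, spent = new_state, spent + cost
    return state, spent

result, spent = run_agent("fix flaky test", token_budget=500)
print(spent)  # 480: four 120-token turns fit under a 500-token budget
```

A server-enforced budget, as described in the release, removes even the partial-turn overshoot this client-side version allows before breaking.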
Security Constraints
Anthropic confirmed intentional degradation of offensive cybersecurity capabilities via training-time constraints. An automatic interception mechanism blocks attack-vector generation. The source notes this results in marginally lower scores on certain security-category benchmarks — a deliberate tradeoff the company is transparent about.
What To Watch
- Independent benchmark replication (next 14 days): The 64.3% SWE-bench figure needs third-party confirmation. Expect academic and community re-runs within two weeks of release.
- Replit and Warp integrations (next 30 days): Both companies had early access. Watch for announced GA integrations or quantified productivity metrics from their engineering teams.
- Competitor responses: OpenAI's GPT-4.1 coding tier and Google's Gemini 2.5 Pro are the direct comparables. A 64.3% solve rate on real GitHub tasks will pressure both companies to publish updated benchmark numbers or ship capability responses.
- Token budget API adoption: The task budget feature is a direct enabler of production agentic deployments. Adoption rate will signal how many enterprise teams are running Opus in autonomous rather than interactive mode.
- Regulatory alignment signaling: Anthropic's proactive cybersecurity capability suppression may be referenced in upcoming EU AI Act compliance discussions. Watch for official policy statements in Q2.