Anthropic released Claude Sonnet 5, the latest Sonnet-class model. Although it’s not a frontier-model breakthrough, Sonnet 5 meaningfully upgrades performance over previous models to deliver stronger coding capabilities, better agentic performance, and more efficient token usage.

Anthropic’s announcement emphasized agentic performance, specifically the model’s ability to carry out multi-step work with less direct human guidance. Anthropic says Sonnet 5 can make plans, use tools such as browsers and terminals, and operate autonomously at a level that recently required larger, more expensive models.

Sonnet 5 Is More Economical With Tokens

Anthropic’s figures show Sonnet 5 improving on Sonnet 4.6 with both lower-priced options and higher quality. Opus 4.8 still beats Sonnet 5 on accuracy, but Anthropic says the effort level can be adjusted to balance cost against performance. Sonnet 5 also has an introductory price of $2/MTok (per million tokens) for input and $10/MTok for output through August 31.
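To make the introductory pricing concrete, here is a minimal sketch of the arithmetic in Python. The two prices come from the figures quoted above; the function name and the example workload are hypothetical and chosen only for illustration.

```python
# Introductory prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 2.00
OUTPUT_PRICE_PER_MTOK = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a given token count."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Hypothetical workload: 3 million input tokens and 500,000 output tokens.
print(f"${estimate_cost(3_000_000, 500_000):.2f}")  # prints $11.00
```

Because output tokens cost five times as much as input tokens at these rates, a model that reaches the same result with fewer output tokens lowers the bill more than the headline input price suggests.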

Sonnet 5 Performance Benchmarks

Sonnet 5 beats Sonnet 4.6, GPT-5.5 and Gemini 3.5 Flash on most of the benchmarks listed below, although GPT-5.5 scores higher on Terminal-Bench 2.1.

BrowseComp tests how well an AI agent can locate difficult-to-find information on the web.

BrowseComp scores:

Claude Sonnet 5: 84.7 (single agent)
Claude Sonnet 4.6: 76.2
GPT-5.5: 84.4

Terminal-Bench 2.1 tests an AI model’s ability to complete coding tasks in terminal and command-line (CLI) environments.

Terminal-Bench 2.1 scores:

Claude Sonnet 5: 80.4
Claude Sonnet 4.6: 67.0
GPT-5.5: 83.4 (Codex CLI)
Gemini 3.5 Flash: 76.2

SWE-bench Pro is a software engineering benchmark in which Sonnet 5 outperformed other similar LLMs.

SWE-bench Pro scores:

Claude Sonnet 5: 63.2
Claude Sonnet 4.6: 58.1
GPT-5.5: 58.6
Gemini 3.5 Flash: 55.1

FrontierCode is a benchmark for agentic coding across 150 tasks, on which Sonnet 5 significantly outperformed GPT-5.5.

The Claude Sonnet 5 System Card explains:

“Each task gives the agent a checked-out repository and a single issue description; the agent then works autonomously in a containerized environment to produce a final patch, with no human intervention and no timeout information.

Patches are graded against blocking functional criteria (primarily held-out unit tests) plus weighted rubric criteria, including model-graded checks for required test coverage and prohibited implementation patterns. Tasks were authored by maintainers of the underlying repositories and individually reviewed by Cognition researchers, with a random subset manually solved to verify fairness.”

The FrontierCode scores:

Claude Sonnet 5: 38.8
Claude Sonnet 4.6: 15.1
GPT-5.5: 25.5
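For readers who want the gaps at a glance, the following Python sketch tabulates the scores quoted in this article and computes Sonnet 5’s margin, in points, over each other model. The dictionary layout and the `margins` helper are illustrative assumptions, not part of Anthropic’s materials.

```python
# Scores as quoted in this article; a missing model was not listed for that benchmark.
SCORES = {
    "BrowseComp": {
        "Claude Sonnet 5": 84.7, "Claude Sonnet 4.6": 76.2, "GPT-5.5": 84.4,
    },
    "Terminal-Bench 2.1": {
        "Claude Sonnet 5": 80.4, "Claude Sonnet 4.6": 67.0,
        "GPT-5.5": 83.4, "Gemini 3.5 Flash": 76.2,
    },
    "SWE-bench Pro": {
        "Claude Sonnet 5": 63.2, "Claude Sonnet 4.6": 58.1,
        "GPT-5.5": 58.6, "Gemini 3.5 Flash": 55.1,
    },
    "FrontierCode": {
        "Claude Sonnet 5": 38.8, "Claude Sonnet 4.6": 15.1, "GPT-5.5": 25.5,
    },
}


def margins(benchmark: str, baseline: str = "Claude Sonnet 5") -> dict[str, float]:
    """Return baseline score minus each other model's score, in points."""
    results = SCORES[benchmark]
    base = results[baseline]
    return {
        model: round(base - score, 1)
        for model, score in results.items()
        if model != baseline
    }


for name in SCORES:
    print(name, margins(name))
```

A negative margin means the other model scored higher, as with GPT-5.5 on Terminal-Bench 2.1. The margins are only indicative, since some scores carry configuration notes such as “single agent” or “Codex CLI”.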

Sonnet 5 Is “Near-Opus Intelligence”

Anthropic does not claim that Sonnet 5 is a frontier-model breakthrough, although it does say that it is the company’s most capable Sonnet-class model. The system card explains that it is less capable than Anthropic’s Opus and Mythos models. Yet Anthropic does claim that it offers “near-Opus intelligence at Sonnet pricing for coding, agents, and everyday professional work.”

Read the full announcement at Anthropic.
