Anthropic’s Claude 3.7 Sonnet takes aim at OpenAI and DeepSeek in AI’s next big battle

Anthropic just fired a warning shot at OpenAI, DeepSeek and the entire AI industry with the launch of Claude 3.7 Sonnet, a model that gives users unprecedented control over how much time an AI spends “thinking” before generating a response. The release, alongside the debut of Claude Code, a command-line AI coding agent, signals Anthropic’s aggressive push into the enterprise AI market — a push that could reshape how businesses build software and automate work.

The stakes couldn’t be higher. Last month, DeepSeek stunned the tech world with an AI model that matched the capabilities of U.S. systems at a fraction of the cost, sending Nvidia’s stock down 17% and raising alarms about America’s AI leadership. Now Anthropic is betting that precise control over AI reasoning — not just raw speed or cost savings — will give it an edge.

Claude 3.7 Sonnet introduces a ‘thinking mode’ toggle, allowing users to optimize the AI’s response time based on task complexity. (Credit: Anthropic)

“We just believe that reasoning is a core part and core component of an AI, rather than a separate thing that you have to pay separately to access,” said Dianne Penn, who leads product management for research at Anthropic, in an interview with VentureBeat. “Just like humans, the AI should handle both quick responses and complex thinking. For a simple question like ‘what time is it?’, it should answer instantly. But for complex tasks — like planning a two-week Italy trip while accommodating gluten-free dietary needs — it needs more extensive processing time.”

“We don’t see reasoning, planning and self-correction as separate capabilities,” she added. “So this is essentially our way of expressing that philosophical difference…Ideally, the model itself should recognize when a problem requires more intensive thinking and adjust, rather than requiring users to explicitly select different reasoning modes.”

A comparison of AI models shows Claude 3.7 Sonnet’s performance across various tasks, with notable gains in extended thinking capabilities compared to its predecessor. (Credit: Anthropic)

The benchmark data backs up Anthropic’s ambitious vision. In extended thinking mode, Claude 3.7 Sonnet achieves 78.2% accuracy on graduate-level reasoning tasks, challenging OpenAI’s latest models and outperforming DeepSeek-R1.

But the more revealing metrics come from real-world applications. The model scores 81.2% on retail-focused tool use and shows marked improvements in instruction-following (93.2%) — areas where competitors have either struggled or haven’t published results.

While DeepSeek and OpenAI lead in traditional math benchmarks, Claude 3.7’s unified approach demonstrates that a single model can effectively switch between quick responses and deep analysis, potentially eliminating the need for businesses to maintain separate AI systems for different types of tasks.

How Anthropic’s hybrid AI could reshape enterprise computing

The timing of the release is crucial. DeepSeek’s emergence last month sent shockwaves through Silicon Valley, demonstrating that sophisticated AI reasoning could be achieved with far less computing power than previously thought. This challenged fundamental assumptions about AI development costs and infrastructure requirements. When DeepSeek published its results, Nvidia’s stock dropped 17% in a single day as investors suddenly questioned whether expensive chips were truly essential for advanced AI.

For businesses, the stakes couldn’t be higher. Companies are spending millions integrating AI into their operations, betting on which approach will dominate. Anthropic’s hybrid model offers a compelling middle path: the ability to fine-tune AI performance based on the task at hand, from instant customer service responses to complex financial analysis. The system maintains Anthropic’s previous pricing of $3 per million input tokens and $15 per million output tokens, even with added reasoning features.
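To make that pricing concrete, here is a minimal Python sketch of the arithmetic. The two rates come from the figures quoted above; the function name, the sample token counts and the assumption that tokens spent on extended thinking are billed at the output rate are illustrative only and should be checked against Anthropic’s current pricing documentation.

```python
from decimal import Decimal

# Rates quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = Decimal("3")
OUTPUT_PRICE_PER_MTOK = Decimal("15")
TOKENS_PER_MILLION = Decimal("1000000")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated dollar cost of one request (hypothetical helper)."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / TOKENS_PER_MILLION
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and a 500-token answer.
    quick = estimate_cost(2_000, 500)

    # Same request with 4,000 extra tokens of extended thinking,
    # ASSUMED here to be billed at the output rate.
    extended = estimate_cost(2_000, 500 + 4_000)

    print(f"quick answer:      ${quick}")
    print(f"extended thinking: ${extended}")
```

Under these assumptions, the thinking budget rather than the prompt size dominates the bill for reasoning-heavy requests, which is why control over how long the model thinks matters to cost-conscious buyers.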

“Our customers are trying to achieve outcomes for their customers,” explained Michael Gerstenhaber, Anthropic’s head of platform. “Using the same model and prompting the same model in different ways allows somebody like Thomson Reuters to do legal research, allows our coding partners like Cursor or GitHub to be able to develop applications and meet those goals.”

Anthropic’s hybrid approach represents both a technical evolution and a strategic gambit. While OpenAI maintains separate models for different capabilities and DeepSeek focuses on cost efficiency, Anthropic is pursuing unified systems that can handle both routine tasks and complex reasoning. It’s a philosophy that could reshape how businesses deploy AI and eliminate the need to juggle multiple specialized models.

Meet Claude Code: AI’s new developer assistant

Anthropic also unveiled Claude Code today, a command-line tool that allows developers to delegate complex engineering tasks directly to AI. The system requires human approval before committing code changes, reflecting a growing industry focus on responsible AI development.

Claude Code’s terminal interface, part of Anthropic’s new developer tools suite, emphasizes simplicity and direct interaction. (Credit: Anthropic)

“You actually still have to accept the changes Claude makes. You are a reviewer with hands on [the] wheel,” Penn noted. “There is essentially a sort of checklist that you have to essentially accept for the model to take certain actions.”

The announcements come amid intense competition in AI development. Stanford researchers recently created an open-source reasoning model for under $50, while Microsoft just integrated OpenAI’s o3-mini model into Azure. DeepSeek’s success has also spurred new approaches to AI development, with some companies exploring model distillation techniques that could further reduce costs.

The command-line interface of Claude Code allows developers to delegate complex engineering tasks while maintaining human oversight. (Credit: Anthropic)

From Pokémon to enterprise: Testing AI’s new intelligence

Penn illustrated the dramatic progress in AI capabilities with an unexpected example: “We’ve been asking different versions of Claude to play Pokémon…This version has made it all the way to Vermilion City, captured multiple Pokémon, and even grinds to level-up. It has the right Pokémon to battle against rivals.”

“I think you’ll see us continue to innovate and push on the quality of reasoning, push towards things like dynamic reasoning,” Penn explained. “We have always thought of it as a core part of the intelligence, rather than something separate.”

The real test of Anthropic’s approach will come from enterprise adoption. While playing Pokémon might seem trivial, it demonstrates the kind of adaptive intelligence businesses need: AI that can handle both routine operations and complex strategic decisions without switching between specialized models. Earlier versions of Claude couldn’t navigate beyond a game’s starting town. The latest version builds strategies, manages resources and makes tactical decisions — capabilities that mirror the complexity of real-world business challenges.

For enterprise customers, this could mean the difference between maintaining multiple AI systems for different tasks and deploying a single, more capable solution. The next few months will reveal whether Anthropic’s bet on unified AI reasoning will reshape the enterprise market or become just another experiment in the industry’s rapid evolution.
