Less is more: How ‘Chain of Draft’ could cut AI costs by 90% while improving performance

A team of researchers at Zoom Communications has developed a breakthrough technique that could dramatically reduce the cost and computational resources needed for artificial intelligence systems to tackle complex reasoning problems, potentially transforming how enterprises deploy AI at scale.
The method, called Chain of Draft (CoD), enables large language models to solve problems with minimal words — using as little as 7.6% of the text required by current methods while maintaining or even improving accuracy. The findings were published in a paper last week on the research repository arXiv.
“By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks,” write the authors, led by Silei Xu, a researcher at Zoom.
How ‘less is more’ transforms AI reasoning without sacrificing accuracy
Chain of Draft draws inspiration from how humans solve complex problems. Rather than articulating every detail when working through a math problem or logical puzzle, people typically jot down only essential information in abbreviated form.
“When solving complex tasks — whether solving mathematical problems, drafting essays, or coding — we often jot down only the critical pieces of information that help us progress,” the researchers explain in their paper. “By emulating this behavior, LLMs can focus on advancing toward solutions without the overhead of verbose reasoning.”
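To make the contrast concrete, consider an invented example (illustrative only, not drawn from the paper). Asked "A store had 23 apples and sold 8; how many remain?", a typical step-by-step model response spells out every step: "The store started with 23 apples. It sold 8 of them. To find how many remain, subtract 8 from 23, which gives 15. Therefore, 15 apples remain." A draft-style response keeps only the essential calculation, something like "23 - 8 = 15", followed by the final answer: a handful of tokens instead of several sentences.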
The team tested their approach on numerous benchmarks, including arithmetic reasoning (GSM8K), commonsense reasoning (date understanding and sports understanding), and symbolic reasoning (coin flip tasks).
In one striking example involving Claude 3.5 Sonnet processing sports-related questions, the Chain of Draft approach reduced the average output from 189.4 tokens to just 14.3 tokens — a 92.4% reduction — while simultaneously improving accuracy from 93.2% to 97.3%.
Slashing enterprise AI costs: The business case for concise machine reasoning
“For an enterprise processing 1 million reasoning queries monthly, CoD could cut costs from $3,800 (CoT) to $760, saving over $3,000 per month,” notes AI researcher Ajith Vallath Prabhakar in an analysis of the paper.
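A rough back-of-the-envelope version of that calculation is sketched below. The per-token price and average response lengths are illustrative assumptions chosen to land near the figures Prabhakar cites, not numbers taken from his analysis; actual savings depend on the model's pricing and on how many tokens the CoT and CoD responses actually consume.

# Hedged back-of-the-envelope sketch of the monthly cost comparison.
# Price and token counts below are illustrative assumptions, not figures from the analysis.
QUERIES_PER_MONTH = 1_000_000
PRICE_PER_OUTPUT_TOKEN = 10 / 1_000_000   # assume $10 per million output tokens
AVG_OUTPUT_TOKENS_COT = 380                # assumed length of a verbose CoT response
AVG_OUTPUT_TOKENS_COD = 76                 # assumed length of a draft-style response (~80% fewer tokens)

cost_cot = QUERIES_PER_MONTH * AVG_OUTPUT_TOKENS_COT * PRICE_PER_OUTPUT_TOKEN
cost_cod = QUERIES_PER_MONTH * AVG_OUTPUT_TOKENS_COD * PRICE_PER_OUTPUT_TOKEN

print(f"CoT: ${cost_cot:,.0f}/month")              # -> $3,800/month
print(f"CoD: ${cost_cod:,.0f}/month")              # -> $760/month
print(f"Savings: ${cost_cot - cost_cod:,.0f}/month")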
The research comes at a critical time for enterprise AI deployment. As companies increasingly integrate sophisticated AI systems into their operations, the computational costs and response times have emerged as significant barriers to widespread adoption.
Current state-of-the-art reasoning techniques like Chain-of-Thought (CoT), introduced in 2022, have dramatically improved AI’s ability to solve complex problems by breaking them down into step-by-step reasoning. But this approach generates lengthy explanations that consume substantial computational resources and increase response latency.
“The verbose nature of CoT prompting results in substantial computational overhead, increased latency, and higher operational expenses,” says Prabhakar.
What makes Chain of Draft particularly noteworthy for enterprises is its simplicity of implementation. Unlike many AI advancements that require expensive model retraining or architectural changes, CoD can be deployed immediately with existing models through a simple prompt modification.
“Organizations already using CoT can switch to CoD with a simple prompt modification,” Prabhakar explains.
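As an illustration of what such a prompt modification might look like in practice, the sketch below contrasts a standard step-by-step (CoT-style) instruction with a draft-style (CoD-style) instruction that caps each reasoning step at a few words. The exact wording, the five-word cap, the "####" separator, the "gpt-4o" model name, and the OpenAI-compatible client are all assumptions made for illustration; they are not the paper's verbatim prompts or setup.

# Hedged sketch: swapping a Chain-of-Thought prompt for a Chain-of-Draft-style prompt.
# The instruction wording, five-word cap, and "####" separator are illustrative assumptions,
# not necessarily the exact prompts used in the Zoom paper.
from openai import OpenAI  # assumes an OpenAI-compatible client is available

COT_PROMPT = (
    "Think step by step to answer the question. "
    "Explain your reasoning in full, then give the final answer."
)

COD_PROMPT = (
    "Think step by step, but keep only a minimal draft for each step, "
    "at most five words per step. "
    "Return the final answer after the separator ####."
)

def ask(client: OpenAI, question: str, system_prompt: str, model: str = "gpt-4o") -> str:
    """Send the same question with either the verbose or the draft-style instruction."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Example usage: the only change between the two calls is the system prompt.
# client = OpenAI()
# verbose = ask(client, "A train travels 120 km in 2 hours. What is its speed?", COT_PROMPT)
# draft   = ask(client, "A train travels 120 km in 2 hours. What is its speed?", COD_PROMPT)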
The technique could prove especially valuable for latency-sensitive applications like real-time customer support, mobile AI, educational tools, and financial services, where even small delays can significantly impact user experience.
Industry experts suggest that the implications extend beyond cost savings. By making advanced AI reasoning more accessible and affordable, Chain of Draft could democratize access to sophisticated AI capabilities for smaller organizations and resource-constrained environments.
As AI systems continue to evolve, techniques like Chain of Draft highlight a growing emphasis on efficiency alongside raw capability. For enterprises navigating the rapidly changing AI landscape, such optimizations could prove as valuable as improvements in the underlying models themselves.
“As AI models continue to evolve, optimizing reasoning efficiency will be as critical as improving their raw capabilities,” concludes Prabhakar.
The research code and data have been made publicly available on GitHub, allowing organizations to implement and test the approach with their own AI systems.