Beyond RAG: SEARCH-R1 integrates search engines directly into reasoning models

Large language models (LLMs) have made remarkable advances in reasoning. However, their ability to correctly reference and use external data (information they weren't trained on) in conjunction with that reasoning has largely lagged behind.
This is especially an issue when LLMs are used in dynamic, information-intensive scenarios that demand up-to-date data from search engines.
But an improvement has arrived: SEARCH-R1, a technique introduced in a paper by researchers at the University of Illinois at Urbana-Champaign and the University of Massachusetts Amherst, trains LLMs to generate search queries and seamlessly integrate search engine retrieval into their reasoning.
With enterprises seeking ways to integrate these new models into their applications, techniques such as SEARCH-R1 promise to unlock new reasoning capabilities that rely on external data sources.
The challenge of integrating search with LLMs
Search engines are crucial for providing LLM applications with up-to-date, external knowledge. The two main methods for integrating search engines with LLMs are Retrieval-Augmented Generation (RAG) and tool use, implemented through prompt engineering or model fine-tuning.
However, both methods have limitations that make them unsuitable for reasoning models. RAG often struggles with retrieval inaccuracies and lacks the ability to perform multi-turn, multi-query retrieval, which is essential for reasoning tasks.
Prompting-based tool use often struggles with generalization, while training-based approaches require extensive, annotated datasets of search-and-reasoning interactions, which are difficult to produce at scale.
(In our own experiments with reasoning models, we found that information retrieval remains one of the key challenges.)
SEARCH-R1
SEARCH-R1 enables LLMs to interact with search engines during their reasoning process as opposed to having a separate retrieval stage.
SEARCH-R1 defines the search engine as part of the LLM’s environment, enabling the model to integrate its token generation with search engine results seamlessly.
The researchers designed SEARCH-R1 to support iterative reasoning and search. The model is trained to generate separate sets of tokens for thinking, search, information, and answer segments. This means that during its reasoning process (marked by <think></think> tags), if the model determines that it needs external information, it generates a <search></search> sequence that contains the search query. The query is then passed on to a search engine and the results are inserted into the context window in an <information></information> segment. The model then continues to reason with the added context and, when ready, generates the final answer in an <answer></answer> segment.
This structure allows the model to invoke the search engine multiple times as it reasons about the problem and obtains new information (see example below).
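To make the flow concrete, here is a minimal, hypothetical sketch of how such an interleaved generate-search-generate loop could be orchestrated around any LLM and search API. The generate and web_search callables are placeholders supplied by the caller, and the prompt wording is illustrative; none of this is taken from the SEARCH-R1 codebase.

```python
# Hypothetical sketch of SEARCH-R1-style interleaved reasoning and retrieval.
# `generate` (an LLM completion function) and `web_search` (a retriever) are
# caller-supplied placeholders, not part of the SEARCH-R1 codebase.
import re
from typing import Callable

def answer_with_search(
    generate: Callable[[str, list[str]], str],   # (prompt, stop_sequences) -> completion
    web_search: Callable[[str], str],            # query -> concatenated search results
    question: str,
    max_searches: int = 5,
) -> str:
    prompt = (
        "Answer the question. Reason inside <think></think>, issue queries as "
        "<search>query</search>, and give the final answer inside <answer></answer>.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_searches):
        # Generate until the model either asks for a search or commits to an answer.
        chunk = generate(prompt, ["</search>", "</answer>"])
        prompt += chunk
        answer = re.search(r"<answer>(.*)", chunk, re.S)
        if answer:
            return answer.group(1).strip()
        query = re.search(r"<search>(.*)", chunk, re.S)
        if query:
            # Close the search tag, run the query, and splice the results back into
            # the context as an <information> segment so the model keeps reasoning.
            results = web_search(query.group(1).strip())
            prompt += f"</search>\n<information>{results}</information>\n"
    # Search budget exhausted: force a final answer.
    return generate(prompt + "<answer>", ["</answer>"]).strip()
```

Because the search engine is treated as part of the environment, the same loop works whether the model calls it once or several times before answering.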
Reinforcement learning
Training LLMs to interleave search queries with their reasoning chain is challenging. To simplify the process, the researchers designed SEARCH-R1 to train the model through pure reinforcement learning (RL), where the model is left to explore the use of reasoning and search tools without guidance from human-generated data.
SEARCH-R1 uses an “outcome-based reward model,” in which the model is only evaluated based on the correctness of the final response. This eliminates the need for creating complex reward models that verify the model’s reasoning process.
This is the same approach used in DeepSeek-R1-Zero, where the model was given a task and only judged based on the outcome. The use of pure RL obviates the need to create large datasets of manually annotated examples (supervised fine-tuning).
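As an illustration, an outcome-based reward of this kind can be as simple as comparing the extracted final answer against a gold label. The sketch below assumes an exact-match criterion and is not the paper's exact implementation; intermediate reasoning and search steps are deliberately ignored.

```python
# Hypothetical sketch of an outcome-based reward: only the final answer is scored,
# never the intermediate <think> or <search> segments. Exact match is assumed here;
# the actual SEARCH-R1 reward may differ in its details.
import re

def normalize(text: str) -> str:
    # Lowercase and strip punctuation so trivial formatting differences
    # don't change the reward.
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def outcome_reward(rollout: str, gold_answer: str) -> float:
    match = re.search(r"<answer>(.*?)</answer>", rollout, re.S)
    if not match:
        return 0.0  # No parseable answer segment: no reward.
    predicted = normalize(match.group(1))
    return 1.0 if predicted == normalize(gold_answer) else 0.0
```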
“SEARCH-R1 can be viewed as an extension of DeepSeek-R1, which primarily focuses on parametric reasoning by introducing search-augmented RL training for enhanced retrieval-driven decision-making,” the researchers write in their paper.
SEARCH-R1 in action
The researchers tested SEARCH-R1 by fine-tuning the base and instruct versions of Qwen-2.5 and Llama-3.2 and evaluating them on seven benchmarks encompassing a diverse range of reasoning tasks requiring single-turn and multi-hop search. They compared SEARCH-R1 against different baselines: direct inference with Chain-of-Thought (CoT) reasoning, inference with RAG, and supervised fine-tuning for tool use.
SEARCH-R1 consistently outperforms baseline methods by a fair margin. It also outperforms reasoning models trained on RL but without search retrieval. “This aligns with expectations, as incorporating search into LLM reasoning provides access to relevant external knowledge, improving overall performance,” the researchers write.

SEARCH-R1 is also effective for different model families and both base and instruction-tuned variants, suggesting that RL with outcome-based rewards can be useful beyond pure reasoning scenarios. The researchers have released the code for SEARCH-R1 on GitHub.
SEARCH-R1’s ability to autonomously generate search queries and integrate real-time information into reasoning can have significant implications for enterprise applications. It can enhance the accuracy and reliability of LLM-driven systems in areas such as customer support, knowledge management, and data analysis. By enabling LLMs to dynamically adapt to changing information, SEARCH-R1 can help enterprises build more intelligent and responsive AI solutions. This capability can be especially helpful for applications that depend on constantly changing data and require multiple steps to find an answer.
It also suggests that we have yet to explore the full potential of the new reinforcement learning paradigm that has emerged since the release of DeepSeek-R1.