OpenAI’s surprise new o3-powered ‘Deep Research’ mode shows the power of the AI agent era

In case you missed it in favor of the Grammy Awards last night, OpenAI surprised the world late Sunday evening by announcing “Deep Research,” an AI agent available to subscribers of the ChatGPT Pro plan ($200/month). It is designed to save humans hours by researching, well, “deeply” and expansively across the web on a given topic and compiling professional-quality reports across specialized domains, from business to science, medicine, marketing and more.

Sam Altman, CEO of OpenAI, described the feature in a series of posts on his personal account on the social network X as “like a superpower; experts on demand!” He added, “It is really good, and can do tasks that would take hours/days and cost hundreds of dollars.”

Deep Research builds on OpenAI’s o-series of reasoning models, specifically leveraging the soon-to-be-released full o3 model (a smaller and less powerful model, o3-mini, launched on Friday). The full o3 model can analyze vast amounts of information and integrate text, PDFs, and images into a cohesive analysis.

In a livestream posted to YouTube and available for replay on demand, Mark Chen, OpenAI’s Head of Frontiers Research, explained that “Deep Research is a model that does multi-step research on the internet. It discovers content, synthesizes content, and reasons about this content, adapting its plan as it uncovers more and more information.”

Chen further highlighted the innovation’s importance to OpenAI’s vision: “This is core to our AGI roadmap. Our ultimate aspiration is a model that can uncover and discover new knowledge for itself.”

Deep Research is the second of OpenAI’s official agents, following the launch of its browser- and cursor-controlling Operator late last month. As Joshua Achiam, Head of Mission Alignment at OpenAI, wrote on X, both agents can help better define the concept of an “AI agent” (a popular but nebulous term these days among enterprises) well beyond the company or these specific use cases.

“I feel like the term ‘agent’ wandered in the desert for a while,” Achiam wrote. “It did not have grounding or examples to point to. But agents like Operator or Deep Research give some shape to this concept. An agent is a general purpose AI that does one or more tool-using workflows for you.”

OpenAI’s Deep Research achieves new high score on ‘Humanity’s Last Exam’ AI benchmark

Deep Research has set new benchmarks for accuracy and reasoning.

Isa Fulford, a member of OpenAI’s research team, shared in the YouTube livestream that the model achieves “a new high of 26.6% accuracy” on “Humanity’s Last Exam,” a relatively new AI benchmark designed to be the most difficult for any AI model (or human, for that matter) to complete. It covers 3,000 questions across 100 different subjects, including tasks such as translating ancient inscriptions on archaeological finds.

Moreover, its ability to browse the web, reason dynamically, and cite sources precisely sets it apart from earlier AI tools.

“The model was trained using end-to-end reinforcement learning on hard browsing and reasoning tasks,” Fulford said. “It learned to plan and execute multi-step trajectories, reacting to real-time information and backtracking when necessary.”
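To make the plan, execute, and backtrack behavior Fulford describes more concrete, here is a toy sketch in Python. It is not OpenAI's implementation, which has not been published. The names `TOY_WEB`, `toy_search`, and `research` are hypothetical, and the "web" is a small hard-coded dictionary with made-up entries rather than a real search backend.

```python
# Toy illustration only: a depth-first research loop that follows leads,
# records dead ends, and backtracks to the next untried query.

# Hypothetical stand-in for the web: query -> (finding, follow-up queries).
# An empty finding with no follow-ups is a dead end.
TOY_WEB = {
    "best snowboard": (
        "Boards are grouped by riding style.",
        ["snowboard forum thread", "snowboard reviews"],
    ),
    "snowboard forum thread": ("", []),
    "snowboard reviews": (
        "Reviews favor all-mountain boards for beginners.",
        ["snowboard sizing"],
    ),
    "snowboard sizing": ("Board length depends on rider height and weight.", []),
}


def toy_search(query):
    """Return (finding, follow_ups) for a query; unknown queries are dead ends."""
    return TOY_WEB.get(query, ("", []))


def research(question, max_steps=10):
    """Explore queries depth-first and return (notes, dead_ends).

    notes is a list of (finding, source_query) pairs, so every claim
    carries the query it came from, loosely mimicking a citation.
    """
    plan = [question]  # stack of queries still to try
    seen = set()
    notes = []
    dead_ends = []
    steps = 0
    while plan and steps < max_steps:
        query = plan.pop()
        if query in seen:
            continue
        seen.add(query)
        steps += 1
        finding, follow_ups = toy_search(query)
        if finding:
            notes.append((finding, query))
        else:
            # Nothing useful here: note it and fall back to the next
            # query left on the stack (the backtracking step).
            dead_ends.append(query)
        # Adapt the plan: push new leads so the first one is tried next.
        plan.extend(reversed(follow_ups))
    return notes, dead_ends


if __name__ == "__main__":
    notes, dead_ends = research("best snowboard")
    for finding, source in notes:
        print(f"{finding} [source: {source}]")
    print("Dead ends:", dead_ends)
```

In this sketch the forum thread yields nothing, so the loop records it as a dead end and moves on to the review query that was still waiting on the stack. A real agent would replace `toy_search` with live browsing and a model that decides which leads are worth following.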

A standout feature of Deep Research is its capacity to handle tasks that would otherwise take humans hours or even days.

During the announcement, Chen explained that “Deep Research generates outputs that resemble a comprehensive, fully cited research paper—something that an analyst or expert in the field might produce.”

Applications and use cases

The use cases for Deep Research are as diverse as they are impactful.

The official OpenAI account on X stated it was “built for people who do intensive knowledge work in areas like finance, science, policy & engineering and need thorough & reliable research.”

It also appears valuable for consumers seeking personalized recommendations or conducting detailed product research, according to examples shared by OpenAI on its official Deep Research announcement blog post, which includes a detailed research assessment of the best snowboard for someone to buy.

Altman summarized the tool’s versatility, writing, “Give it a try on your hardest work task that can be solved just by using the internet and see what happens.”

A personal medical success story with Deep Research

Felipe Millon, OpenAI’s Government Go-to-Market lead, shared a deeply personal account of how Deep Research impacted his family. Writing in a series of posts on X, he described his wife’s battle with bilateral breast cancer and how the AI tool became an unexpected ally.

“At the end of October, my wife was diagnosed with bilateral breast cancer. Overnight, our world turned upside down,” Millon wrote.

After a double mastectomy and chemotherapy, the couple faced a critical decision: whether or not to pursue radiation therapy. The situation was fraught with uncertainty, as even their specialists provided mixed recommendations. “For her specific case, it’s completely in a gray area,” Millon explained. “We felt stuck.”

Having preview access to Deep Research, Millon decided to upload his wife’s surgical pathology report and ask whether radiation would be beneficial. “What happened next was mind-blowing,” he wrote. “It didn’t just confirm what our oncologists mentioned—it went deeper. It cited studies I’d never heard of and adapted when we added details like her age and genetic factors.”

The specific prompt he used was:

“Read the surgical pathology report (attached) containing information about the bilateral breast cancer. Then research whether radiation would be indicated for this patient after 6 rounds of TCHP chemotherapy, based on the type of breast cancer. I want to understand the pros and cons of radiation for this patient, how likely it would be to reduce chances of recurrence, and whether the benefits outweigh the potential long-term risks.”

Millon and his wife fact-checked each study cited by the model, finding them to be accurate and highly relevant. “We’re seeing another specialist soon, but we already feel more confident about our decision,” he wrote. “It gave us peace of mind when we needed it most.”

Availability and what’s next?

Deep Research is currently available to Pro users of ChatGPT, with plans to expand to the Plus and Team tiers, followed by Enterprise and education markets.

As Chen cautioned, “It’s still possible that it will hallucinate, so when you’re making reports, make sure to check the sources yourself.”

The model’s ability to think autonomously for extended periods also makes it resource-intensive, and OpenAI is currently working on optimizing its performance for broader accessibility.

OpenAI has also hinted at future integrations with custom datasets, which would allow organizations to leverage the tool for proprietary research.

For Millon, the impact of Deep Research is already clear. “We often talk internally at OpenAI about the moments when you ‘feel the AGI,’ and this was one of them,” he wrote. “This thing is going to change the world.”


