‘When intelligence and trust move together, AI stops being an experiment and starts becoming how work gets done’: Microsoft and OpenAI make AI research tools smarter to help answer even your toughest questions

Multi-model agents will check each other before sharing research with you to ensure maximum quality
Researcher mode with Critique enabled scores high on the DRACO benchmark
Copilot Cowork is here for Frontier program customers

Microsoft has announced plans to upgrade its M365 Copilot Researcher agent with a clear focus on using multiple models across AI workflows to combine the power of different systems.

During this shift away from single model systems, multiple AI agents will collaborate and delegate different parts of the task to each other.

Initially, the Researcher agent will use GPT models to generate the initial response, with Claude stepping in to review it for accuracy, completeness, and quality.

The article continues below

M365 Copilot’s Researcher agent sends responses through other agents

Microsoft AI at Work Chief Marketing Officer Jared Spataro explained The update follows the success of Anthropic’s Claude Cowork, which has since been integrated into M365 Copilot. The aptly named Copilot Cowork, which has now been made available in the Frontier program ahead of a wider rollout, allows humans to delegate work to AI.

Spataro explained how Copilot Cowork moves the utility of AI from simple, basic prompts to end-to-end task execution, ideal for long-running and multi-step workflows.

As for Researcher mode with the new Claude-based Critique feature, it has already outperformed single-model systems in early tests, with the latter ensuring the best quality output. It scores 13.8% higher on the DRACO benchmark (Deep Research Accuracy, Completeness and Objectivity), which is considered the industry standard.

At 57.4% thanks to the multi-model setup, it is more than twice as reliable as Deep Research with OpenAI’s o4-mini model. It’s also better than o3-based Deep Research, Gemini Deep Research, Claude Opus 4.6, and Perplexity’s Deep Research when using Opus 4.5 and 4.6. Microsoft didn’t compare it to newer flagship models like GPT-5.4, which performed exceptionally well.

“When intelligence and trust move together, AI stops being an experiment and starts becoming how work gets done,” Spataro wrote, talking about Microsoft’s progress toward Wave 3 of M365 Copilot — an intelligence it defines as “understanding[ing] the context of the work.”

Follow TechRadar on Google News and add us as a preferred source to get our expert news, reviews and opinions in your feeds. Be sure to click the Follow button!

And of course you can too follow TechRadar on TikTok for news, reviews, video unboxings, and get regular updates from us on WhatsApp also.

Must Read

Leave a Comment Cancel Reply