Surprisingly enough, it seems that some AI agents are not entirely up to scratch on some basic business tests


  • Salesforce AI Research finds that single-turn tasks see only 58% success, while performance on multi-turn tasks drops to 35%
  • Reasoning models such as Gemini-2.5-Pro tend to outperform lighter models
  • CRMArena-Pro has proven to be a challenging benchmark

Researchers from Salesforce AI Research have introduced a new benchmark, CRMArena-Pro, which uses synthetic company data to assess how LLM agents perform across different CRM scenarios.

The researchers found that LLM agents achieved about 58% success on tasks that can be completed in a single turn, but on tasks requiring multiple interactions, success fell to only 35%, barely more than one in three.
