- Many AI pilots fail in the real world, and 95% of generative AI pilots never reach production, Salesforce claims
- CRMArena-Pro lets companies stress test their AI agents with digital twins
- Two new benchmarks are also available for stress testing AI agents
Salesforce says companies are struggling with AI pilots that fail in real-world operations, and it is launching CRMArena-Pro, a new service that lets companies create a digital twin of their operations to stress test AI agents before they are deployed.
The company cited recent MIT research, which found that 95% of generative AI pilots never reach the production stage.
CRMArena-Pro evaluates AI agents on realistic tasks, such as customer service, sales forecasting and supply chain disruptions, using synthetic data validated by experts.
Salesforce lets you stress test AI agents using digital twins
In a post, the company wrote that CRMArena-Pro creates a rigorous, context-rich simulated enterprise environment built on synthetic data, in which API calls to relevant systems can be evaluated safely and PII can be protected.
By adding real-world noise to the test environment, CRMArena-Pro can better evaluate performance, strengthen resilience and bridge the gap between how agents behave before and after deployment.
“The result is AI agents who are skilled, consistent, reliable and agent-ready.”
Businesses can also see how AI agents handle real-world challenges such as messy data, legacy systems and complex workflows.
Salesforce noted that part of the complexity comes from the wide range of models available today: knowing which specific model, or combination of models, to use is not simple.
To this end, the company has published two new benchmarks for measuring agent performance: MCP-Eval, which evaluates agents through synthetic tasks, and MCP-Universe, which adds real-world tasks and execution-based evaluators to stress test agents in complex scenarios.
In a previous post, Salesforce noted that CRMArena-Pro lays the groundwork for what it calls the next frontier, "Enterprise General Intelligence". For now, users can expect “secure, skilled and effective” AI for all organizations.



