Claude just beat GPT-5, Gemini and Grok in real-world job tasks, according to OpenAI's own study


  • OpenAI has released GDPval, a new evaluation that tests how well AI performs on work-related tasks
  • Claude Opus 4.1 comes out in the lead, with ‘ChatGPT-5 High’ in second place
  • Tasks include things like emailing a reply to a dissatisfied customer

We are all familiar with AI benchmarks that measure performance on specific tasks, but these tasks often do not reflect the real world and how people actually use AI, especially at work.

To combat this problem, OpenAI, the maker of ChatGPT, has introduced GDPval, a new way of measuring AI model performance on real-world tasks, compared against real human professionals across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.
