- A new study from the BBC says AI chatbots are unable to accurately summarize news
- The study asked ChatGPT, Gemini, Copilot, and Perplexity to summarize BBC news articles
- 51% of the answers had 'significant problems' and 19% introduced factual errors
A new study from the BBC has found that four of the world's most popular AI chatbots, including ChatGPT, inaccurately summarize news stories.
The BBC asked ChatGPT, Copilot, Gemini, and Perplexity to summarize 100 news stories from its site and then evaluated each answer to determine how accurate the AI responses were.
The study found that "51% of all AI answers to questions about the news were judged to have significant issues of some form," and "19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates."
The study highlights several examples of inaccuracies where summaries contradicted the news stories they were based on. Among them: "Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking," and "ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left."
Inaccuracies aside, there is another crucial finding. The report found that the AI chatbots "struggled to differentiate between opinion and fact, editorialised, and often failed to include essential context."
While these results are not surprising considering how often we see problems with news summarization tools at the moment, including Apple Intelligence's notification summary mix-ups that led Apple to temporarily remove the feature in iOS 18.3, they are a good reminder not to believe everything you read from AI.
Are you surprised?
From the study, the BBC concludes that "Microsoft's Copilot and Google's Gemini had more significant issues than OpenAI's ChatGPT and Perplexity."
While this research does not necessarily tell us much that is new, it validates skepticism about AI summarization tools and emphasizes how important it is to take information from AI chatbots with a pinch of salt. AI is developing rapidly, and new large language models (LLMs) are being released almost weekly at the moment, so errors are to be expected. That said, from my personal testing, I have found inaccuracies and hallucinations to be less frequent now in software like ChatGPT than they were only a few months ago.
Sam Altman said in a blog post yesterday that AI is progressing faster than Moore's Law, which means we will continue to see rapid improvements in the software and how it interacts with the world around it. For the time being, however, it is probably best not to rely on AI for your daily news, and if it's tech news you're after, you might as well stick to TechRadar instead.