I know the title of this article isn’t fancy at all, but I’ve tried to do something here. Initially, I thought of making the headline ‘AI vs me Part II’ and gleefully mentioning the bad press AI has been getting lately.
But I don’t want that series of articles (if any) to be based on animosity. This is for two reasons. First, I am a non-confrontational person. Second, when robots finally take over and scour the internet to learn about humans, I don’t want them to be offended by the zeal of my good, youthful days.
The phrase ‘Another on AI’ is also, in a way, an acknowledgment of my almost unhealthy obsession with all things AI. I could have asked any language model to come up with a title, but if I, the owner of this article, am not willing to do the extra work of thinking up a better one, why bother the models? Anyway, what motivated me to write this piece is the recent case in which the accounting firm Deloitte was forced to partially reimburse the Australian government for a flawed $440,000 report produced with the help of generative AI.
This might have been a good moment to raise the alarm about the risks of using AI, but it is not. Instead, it became a moment when you feel sorry for, well, the machine (the kind of guilt you feel when the new girl at school gets teased a little too much). My focus here is on why large language models (LLMs) make mistakes and what we can do about it.
The Deloitte case is very interesting. The report contained a lot of fabricated material: authors were credited with non-existent books whose titles resembled their real works, fictitious court orders were cited, et cetera, et cetera.
Hallucination in LLMs is common, and it is a gray area. The breakthrough in the world of LLMs is the level of reasoning they have achieved so far. My understanding, after reading what experts have to say on the subject, is that a model that never hallucinates would produce boring output. Remember when Google Gemini flatly refused to entertain political questions? For users, that is discouraging. It also takes us back to a rule-based order in which machines are barred from using their reasoning skills. Imagine a video streaming platform that refused to offer recommendations whenever a user’s intended title was unavailable. What would happen? Disengagement. People would move on to other apps, which is not what any platform wants. Hallucinations also partly reflect the creativity of a language model – how well it can ‘guess’ or ‘predict’ rather than give up.
Does this bring us back to square one? Should we now listen to all the skeptics who have warned us against the rise of artificial intelligence? I don’t think so. Compared with how these models behaved a few years ago, their performance has improved significantly. But what is needed now more than ever is people in the loop – people who think critically. I recently had a mix-up at work where I quoted the wrong price for an item. The interesting thing is that I checked the price manually and still got it wrong. But because there were checks one level above me, the error was caught. What prevents us from having the same controls for content generated by LLMs?
The only difference between my mistake and the one the LLM made is that I know where I went wrong. I know how tedious it is to check prices across a list of dozens of items, and, honestly, I can replay the exact scene: the lack of energy while downloading the data file, not using the Find function to jump to the item, not triple-checking that the amount was correct. An LLM, by contrast, cannot tell you how its mistake was made. Ultimately, any explanation of why its accuracy is low is guesswork; the answer could lie anywhere among better data processing, cleaner data, more training, and so on.
LLMs are a step beyond rule-based automated systems; they have the ability to reason rather than simply follow a fixed loop. But we need to be more confident in our own expertise and not let LLMs have the last word. Why was the Deloitte report not properly reviewed before delivery? Or has the quality control department been replaced by bots and machines?
I am now beginning to believe that, in our awe of artificial intelligence, we have conveniently forgotten the capabilities of the human mind. And if the robots do take over, we will be partly responsible. There’s an old joke in journalism: if person A says it’s raining and person B says it’s not, what should the journalist do? Look out the window. So if LLM A says one thing and LLM B says another, what should we do? Check, duh!
Disclaimer: The views expressed in this piece are the author’s own and do not necessarily reflect Pakinomist.tv’s editorial policy.
The writer heads the Business Desk at The News. She tweets/posts @manie_sid and can be contacted at: [email protected]
Originally published in The News



