Study Reveals AI Models Still Hallucinate Frequently

All generative AI models, from Google’s Gemini to Anthropic’s Claude to OpenAI’s GPT-4o, have been found to hallucinate to varying degrees. These models can be unreliable narrators, sometimes with hilarious results and other times with more serious consequences.

### Benchmarking Hallucinations
A recent study conducted by researchers from Cornell, the universities of Washington and Waterloo, and the nonprofit research institute AI2 aimed to benchmark the hallucinations of AI models like GPT-4o against authoritative sources on a range of topics. The results showed that no model performed exceptionally well across all subjects, and that the models that hallucinated least achieved this partly by declining to answer questions they would otherwise have gotten wrong.
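The scoring idea described above can be sketched in a few lines. This is a minimal illustration, not the study's actual protocol: the questions, the `score_answers` helper, and the exact-match comparison rule are all assumptions made for the example. Note the separate "abstained" bucket, since declining to answer is not the same as hallucinating.

```python
# Hedged sketch: classify a model's answers against authoritative references.
# The data and the exact-match rule below are illustrative assumptions,
# not the benchmark's real methodology.

def score_answers(answers, references):
    """Classify each answer as correct, hallucinated, or abstained,
    and return the fraction of each."""
    correct = hallucinated = abstained = 0
    for question, answer in answers.items():
        if answer is None:  # the model declined to answer
            abstained += 1
        elif answer.strip().lower() == references[question].strip().lower():
            correct += 1
        else:  # a confident but wrong answer counts as a hallucination
            hallucinated += 1
    total = len(answers)
    return {
        "correct": correct / total,
        "hallucinated": hallucinated / total,
        "abstained": abstained / total,
    }

# Toy example: one correct answer, one hallucination, one abstention.
answers = {
    "capital of France": "Paris",
    "capital of Australia": "Sydney",  # wrong
    "capital of Atlantis": None,       # model abstains
}
references = {
    "capital of France": "Paris",
    "capital of Australia": "Canberra",
    "capital of Atlantis": "N/A",
}
print(score_answers(answers, references))
```

A real benchmark would need fuzzier answer matching than string equality (paraphrases, units, partial credit), which is part of why building such evaluations is hard.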

### Trusting AI Outputs
According to Wenting Zhao, a doctoral student at Cornell and co-author of the research, even the best AI models can only generate text without hallucinations about 35% of the time. This raises concerns about the reliability of AI-generated content.

### Testing Different Models
The study evaluated over a dozen popular models, including GPT-4o, Meta’s Llama 3 70B, Mistral’s Mixtral 8x22B, and Cohere’s Command R+. The results indicated that models remain prone to hallucinations, despite claims of progress from major AI players.

### Future Improvements
Zhao suggests that improvements in reducing hallucinations may be limited. She believes that human involvement is crucial in fact-checking and validating information generated by generative AI models. Developing advanced fact-checking tools and implementing human-in-the-loop processes could help mitigate hallucinations in AI-generated content.

In conclusion, the study highlights the ongoing challenges in ensuring the accuracy and reliability of AI models. While progress has been made, there are still significant opportunities for improvement in reducing hallucinations and enhancing the trustworthiness of AI-generated content.
