OpenAI’s brand new o3 and o4-mini AI models are cutting-edge in many ways. But here’s the catch: these models hallucinate, and they hallucinate even more than some of OpenAI’s older models.
Hallucinations in AI have always been a tough nut to crack, even for the most advanced systems out there today. Traditionally, each new model has hallucinated a bit less than its predecessor. But with o3 and o4-mini, that trend appears to be going in reverse.
### More Hallucinations, More Problems
According to OpenAI’s tests, these new reasoning models hallucinate more frequently than the company’s earlier reasoning models, such as o1, o1-mini, and o3-mini, as well as traditional non-reasoning models like GPT-4o.
### Why is This Happening?
The ChatGPT maker is scratching its head, trying to figure out why these models hallucinate more often. In its technical report, OpenAI admits that more research is needed to understand why hallucinations are on the rise as its reasoning models scale up.
### The Numbers Don’t Lie
Testing revealed that o3 hallucinated in response to 33% of questions on PersonQA, OpenAI’s internal benchmark for measuring how accurately a model answers questions about people, roughly double the rate of its predecessors. And o4-mini did even worse, hallucinating 48% of the time.
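To make those percentages concrete, here’s a minimal, purely illustrative sketch of how a hallucination rate on a QA benchmark is typically computed: ask the model each question, grade each answer against a reference, and report the fraction flagged. The dataset items, the string-matching grader, and the stub model below are hypothetical placeholders, not OpenAI’s actual PersonQA tooling, which relies on more sophisticated grading.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    question: str
    reference_answer: str


def is_hallucination(model_answer: str, reference_answer: str) -> bool:
    """Toy grader: flag an answer if it doesn't contain the reference fact.
    Real evaluations use model- or human-based grading, not substring checks."""
    return reference_answer.lower() not in model_answer.lower()


def hallucination_rate(items: list[QAItem], answer_fn) -> float:
    """Fraction of questions whose answers are judged hallucinated."""
    flagged = sum(
        is_hallucination(answer_fn(item.question), item.reference_answer)
        for item in items
    )
    return flagged / len(items)


if __name__ == "__main__":
    # Tiny illustrative dataset with a stub "model" that guesses the same
    # wrong answer every time, so the rate comes out to 100%.
    dataset = [
        QAItem("Where was Ada Lovelace born?", "London"),
        QAItem("What did Grace Hopper help create?", "compiler"),
    ]
    rate = hallucination_rate(dataset, lambda q: "Probably Paris.")
    print(f"Hallucination rate: {rate:.0%}")
```

A figure like “33% on PersonQA” is just this kind of ratio taken over a much larger question set, with a far more careful grader deciding what counts as a fabricated answer.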
It’s a bit concerning that these models are making up events, actions, and information that never actually happened. But hey, maybe a little imagination isn’t always a bad thing, right?
Now, the race is on to find a solution before these hallucinations get even more out of control. Stay tuned for more updates on this trippy AI journey!
