Apple recently announced updates to the AI models that power its Apple Intelligence features across its platforms. However, the company’s own benchmarks show these new models underperforming older models from competitors such as OpenAI.
According to Apple, human testers rated text generated by the newest “Apple On-Device” model as comparable to, but not better than, text from similar Google and Alibaba models. Apple’s more advanced model, “Apple Server,” fared no better: it was rated behind OpenAI’s GPT-4o in a separate test of image-analysis capabilities.
These benchmark results suggest that Apple’s AI research division is struggling to keep pace with competitors in the AI race. Despite promises of a Siri upgrade, Apple’s AI capabilities have fallen short in recent years, and some customers have taken legal action against the company over allegedly misleading marketing of AI features that have yet to ship.
Both Apple On-Device and Apple Server improve on their predecessors in tool use and efficiency. They understand around 15 languages, thanks to an expanded training dataset that includes images, PDFs, documents, and other data types. Developers can now access these models through Apple’s Foundation Models framework.
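As a rough illustration of what that developer access looks like, the sketch below assumes the Swift API shape Apple has described for the Foundation Models framework (a `LanguageModelSession` that responds to text prompts); exact type names, method signatures, and OS availability may differ from what ships:

```swift
import FoundationModels

// Hypothetical sketch: prompt the on-device model through the
// Foundation Models framework. API names (LanguageModelSession,
// respond(to:)) are assumed from Apple's developer materials,
// and availability depends on the OS version and device.
func summarize(_ text: String) async throws -> String {
    // A session defaults to the on-device Apple Intelligence model,
    // so no network call or API key is involved.
    let session = LanguageModelSession()
    let response = try await session.respond(to: "Summarize briefly: \(text)")
    return response.content
}
```

Because the model runs on-device, a call like this is expected to work offline, which is the main design distinction Apple draws against server-hosted competitors such as GPT-4o.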
