People are now using Super Mario to test AI performance

Thought Pokémon was a tough benchmark for AI? Well, some researchers argue that Super Mario Bros. is even tougher. Hao AI Lab at the University of California San Diego recently put AI to the test in live Super Mario Bros. games. Anthropic’s Claude 3.7 came out on top, with Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggling to keep up.

The game wasn’t quite the original 1985 release: it ran in an emulator integrated with a framework called GamingAgent. The framework fed each AI basic instructions and in-game screenshots, and let the model control Mario in response.
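
To make that loop concrete, here is a minimal sketch of a screenshot-in, action-out cycle. It is not GamingAgent's actual code: every name in it (`Observation`, `run_agent_loop`, `fake_capture`, `fake_model`), the instruction text, and the action strings are hypothetical, and the emulator and model are replaced by stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """One captured frame handed to the model (hypothetical structure)."""
    step: int
    screenshot: bytes  # image bytes grabbed from the emulator window


def run_agent_loop(
    capture: Callable[[int], Observation],
    choose_action: Callable[[str, Observation], str],
    apply_action: Callable[[str], None],
    instructions: str,
    steps: int,
) -> List[str]:
    """Repeat: grab a screenshot, ask the model for a move, send it to the game."""
    actions: List[str] = []
    for step in range(steps):
        obs = capture(step)
        action = choose_action(instructions, obs)
        apply_action(action)
        actions.append(action)
    return actions


# Stand-ins so the sketch runs without an emulator or a model API.
def fake_capture(step: int) -> Observation:
    return Observation(step=step, screenshot=b"")


def fake_model(instructions: str, obs: Observation) -> str:
    # A real model would inspect the screenshot; this stub just alternates moves.
    return "jump" if obs.step % 2 else "right"


pressed: List[str] = []  # records what would be sent to the game
result = run_agent_loop(
    fake_capture,
    fake_model,
    pressed.append,
    "Reach the flag without touching enemies.",  # hypothetical instruction text
    steps=4,
)
print(result)
```

In a real setup, `fake_capture` would grab the emulator's screen and `fake_model` would be a call to the AI model being tested.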

Interestingly, the lab found that reasoning models, which “think” through problems step by step, performed worse than “non-reasoning” models in the game. This is because reasoning models take longer to decide on actions, which can be detrimental in real-time games like Super Mario Bros.
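
A bit of arithmetic shows why decision time matters so much. The sketch below assumes the game runs at roughly 60 frames per second (about the NES rate; the lab's emulator may differ) and that the game keeps running while the model decides. The latency figures are illustrative placeholders, not measurements from the lab.

```python
import math

NES_FPS = 60  # assumption: approximate frame rate, not a figure from the lab


def frames_elapsed(decision_seconds: float, fps: int = NES_FPS) -> int:
    """Whole frames the game advances while the model is still deciding."""
    if decision_seconds < 0:
        raise ValueError("decision_seconds must be non-negative")
    return math.floor(decision_seconds * fps)


# Hypothetical latencies, chosen only to show the scale of the gap.
for label, latency in [("quick model", 0.25), ("slow model", 3.0)]:
    print(f"{label}: {latency}s per decision = {frames_elapsed(latency)} frames")
```

Under these assumptions, a model that takes several seconds per move is reacting to a screenshot that is already well over a hundred frames old, which in a platformer can be the difference between clearing a gap and falling into it.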

Some experts question how much AI’s gaming skills really say about technological progress, and flashy gaming benchmarks like these point to what Andrej Karpathy, a founding member of OpenAI, has called an “evaluation crisis”: it’s unclear which metrics to focus on when evaluating AI models. But hey, at least we can enjoy watching AI play Mario.