Crowdsourced AI benchmarks have serious flaws, some experts say


AI labs are turning to crowdsourced platforms like Chatbot Arena to test their latest models, but some experts question the ethics and validity of this approach.

Emily Bender, a linguistics professor at the University of Washington, criticizes Chatbot Arena's benchmarking process, which relies on users voting between model outputs, arguing that such votes have not been shown to measure any well-defined notion of model quality.
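For readers unfamiliar with how voting-based leaderboards work: arenas of this kind typically show a user two anonymous model responses, record which one the user prefers, and aggregate those head-to-head votes into Elo-style ratings. The sketch below is a minimal illustration of that aggregation under assumed parameters; the K-factor, starting rating, and function names are hypothetical, not LMArena's published implementation.

```python
# Illustrative Elo-style aggregation of pairwise votes into model ratings.
# A sketch under assumed parameters, not the actual Chatbot Arena code.
from collections import defaultdict

K = 32            # assumed update step (K-factor)
INITIAL = 1000.0  # assumed starting rating for every model

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def rate(votes: list[tuple[str, str]]) -> dict[str, float]:
    """Turn a list of (winner, loser) user votes into ratings."""
    ratings: dict[str, float] = defaultdict(lambda: INITIAL)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        ratings[winner] += K * (1 - e_w)  # winner gains
        ratings[loser] -= K * (1 - e_w)   # loser loses symmetrically
    return dict(ratings)

# Example: three user votes produce a tiny leaderboard.
print(rate([("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]))
```

Because each vote shifts ratings by a fixed step, a ranking built this way is sensitive to who votes and how often, which is part of what the critics quoted here are pointing at.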

Asmelash Teka Hadgu, co-founder of the AI firm Lesan, believes that benchmarks like Chatbot Arena are being co-opted by AI labs to make exaggerated claims, citing Meta's handling of Llama 4 Maverick, where a variant tuned for the benchmark scored well before a different version of the model was released to the public.

Hadgu and Kristine Gloria argue that model evaluators should be compensated for their work, drawing parallels to exploitative practices in the data labeling industry.

Matt Fredrikson of Gray Swan AI emphasizes the importance of internal benchmarks and paid private evaluations in addition to public crowdsourced benchmarks.

Alex Atallah of OpenRouter and Wei-Lin Chiang of LMArena stress the need for diverse testing methods beyond open benchmarking to ensure accurate model evaluations.

Chiang says Chatbot Arena has updated its policies to prevent discrepancies like the Maverick incident and to reinforce fair, reproducible evaluations, with the aim of keeping the leaderboard an open, transparent space for the community to engage with AI.
