Two students develop AI speech model to compete with NotebookLM

A pair of undergraduate students, neither with extensive AI expertise, claim to have developed an openly available AI model capable of generating podcast-style clips similar to Google’s NotebookLM.

The Market for Synthetic Speech Tools and Nari Labs’ Model

The market for synthetic speech tools is vast and growing, with ElevenLabs being one of the largest players alongside challengers like PlayAI and Sesame. Investors see immense potential in these tools, as evidenced by startups in voice AI technology raising over $398 million in VC funding last year, according to PitchBook.

Inspired by NotebookLM, Toby Kim and his co-founder at Korea-based Nari Labs started learning about speech AI three months ago. They used Google’s TPU Research Cloud program to train their model, Dia, which has 1.6 billion parameters. Dia lets users customize speakers’ tones and insert disfluencies, coughs, laughs, and other nonverbal cues.

Testing and Features of Dia

Dia, available on the AI development platform Hugging Face and GitHub, can run on most modern PCs with at least 10GB of VRAM. It can generate a random voice or clone a person’s voice based on user prompts. TechCrunch’s testing of Dia through Nari’s web demo showed promising results, with competitive voice quality and an easy-to-use voice cloning function.

Concerns and Future Plans of Nari Labs

While Dia offers powerful voice generation capabilities, it lacks safeguards against misuse. Nari warns against using the model for impersonation, deception, or illicit activities but states that it is not responsible for misuse. Nari has also not disclosed where Dia's training data came from, raising concerns about potential copyright violations.

Despite these issues, Kim envisions a synthetic voice platform with a “social aspect” built on top of Dia and future models. Nari plans to release a technical report for Dia and to expand language support beyond English.
