AI: Voice, Robots, Depth Perception & Democracy (1.22.24)

Friends, here are the stories that were hot across the AI community yesterday and over the weekend. Inclement weather kept me from posting them last night, so I'll bring you some more later today! Together they offer a really interesting snapshot of the evolving capabilities of AI, including in robotics and voice.

Thanks for reading and sharing these stories.

-Marshall Kirkpatrick, Editor

First impacted: Voice users
Time to impact: Short

ElevenLabs, a voice AI research firm, has secured $80 million in Series B funding from investors including Andreessen Horowitz, Nat Friedman, and Daniel Gross. The company has launched several products, including Dubbing Studio, a Voice Library marketplace, and an early version of its Mobile Reader App, and says the funding will support further research, product development, infrastructure expansion, and ethical AI development. ElevenLabs' technology has already been used to generate over 100 years of audio.

  • The company's Voice Library marketplace reportedly lets users create an AI representation of their own voice, authenticate it, and make it available for others to use, earning compensation whenever their voice is used.

  • The company's newly introduced Dubbing Studio workflow lets users dub entire films and produce and edit transcripts, translations, and timecodes. This gives creators more control over content and enhances the existing AI dubbing feature, which the company says supports video localization in 29 languages.

First impacted: Robotics developers and users
Time to impact: Medium to long

Jim Fan, a robotics-focused AI research scientist at NVIDIA, discussed in his new TED Talk the concept of a "foundation agent," an AI model that could operate in both virtual and physical environments. Fan suggested that such a model could learn to handle up to 10,000 varied scenarios, affecting areas such as video games, metaverses, drones, and humanoid robots; at that point, a novel real-world scenario is just scenario #10,001. Whether or not you're excited about robots, this talk offers a good picture of the state of work in robotics. [Jim Fan: The next grand challenge for AI] Explore more of our coverage of: AI Research, Foundation Agent, NVIDIA.

First impacted: Developers using images
Time to impact: Short

TikTok has launched a new tool, Depth Anything, trained on over 62 million unlabeled images and 1.5 million labeled ones. The tool estimates the relative depth of objects in an image; in the demo on Hugging Face, the output looks like a heat map of the photo you upload. The company says the tool can support product visualization, help interpret environments for the visually impaired, and holds potential for self-driving technology; it can also estimate metric depth for detailed maps of urban areas and forests. [Depth Anything - a Hugging Face Space by LiheYoung] Explore more of our coverage of: TikTok AI, Depth Estimation, Self-Driving Technology.

First impacted: AI developers, AI researchers
Time to impact: Medium to long

Percy Liang, director of the Stanford Center for Research on Foundation Models, shared his new TED Talk about his vision of a return to open and transparent AI. From TED.com: "Today's AI is trained on the work of artists and writers without attribution, its core values decided by a privileged few. What if the future of AI was more open and democratic?" [Percy Liang: A new way to build AI, openly] Explore more of our coverage of: AI Transparency, Foundation Models, AI Conferences.

First impacted: LLM Engineers, LLM Scientists
Time to impact: Medium

Maxime Labonne, a machine learning scientist at JP Morgan and creator of a Large Language Model course on GitHub, has completed the LLM Engineer Roadmap, rounding out the course alongside the LLM Fundamentals and LLM Scientist sections. Labonne says the newly added section focuses on developing and deploying LLM-based applications, complementing the earlier sections on foundational knowledge in mathematics, Python, and neural networks, as well as advanced techniques for building LLMs. See also Jeremy Howard's new free introductory tutorial on CUDA for Python programmers. https://www.twitter.com/jeremyphoward/status/1749153507096322239 [GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.] Explore more of our coverage of: Large Language Model, AI Education, Neural Networks.

That’s it! More AI news in a few hours!