
AI: New Hardware, Software and Concepts Pushing the Frontier of AI (2.21.24)

Groq's LPU, Deepmind's RTC, Karpathy, Compounding, ElevenLabs

Friends, today's edition showcases a broad range of breakthroughs and expert insights: new chips designed to achieve record inference speeds, innovative concepts from renowned experts aimed at scaling efficiency, and awesome sound effects courtesy of ElevenLabs.

If you find this newsletter valuable, I hope you’ll think of a few friends you can share it with. I really appreciate it when you do.


And now here’s today’s news.

-Marshall Kirkpatrick, Editor

First impacted: AI developers, Tech enthusiasts
Time to impact: Medium

Groq Inc. launched a new AI chip called a Language Processing Unit (LPU), which they say serves LLMs with lightning-fast responses. Their website explains that "An LPU system has as much or more compute as a GPU and reduces the amount of time per word calculated, allowing faster generation of text sequences. With no external memory bandwidth bottlenecks an LPU Inference Engine delivers orders of magnitude better performance than GPUs." Groq has raised over $300M from Tiger Global Management and others. Check out the link for a demo; it is very fast! [groq.com] Explore more of our coverage of: AI Chip Development, Groq Inc., Machine Learning. Share this story by email

First impacted: Software Developers, Data Scientists
Time to impact: Short

Google's DeepMind has introduced a new assessment method for LLMs, dubbed "round-trip correctness" (RTC). RTC allows Code LLM evaluation on a broader spectrum of real-world software domains and reduces the need for human curation. The RTC system generates a description for a piece of code, then generates new code based on that description, and then evaluates the quality of the output code. If the result is semantically equivalent to the original code, that's a sign the LLM is coding well, and the evaluation runs without human curation as a bottleneck. [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness] Explore more of our coverage of: DeepMind Technologies, Language Model Evaluation, Round-Trip Correctness. Share this story by email
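The round-trip loop described above can be sketched in a few lines. This is a minimal illustration, not DeepMind's implementation: the two model calls are hypothetical stand-ins replaced with canned outputs so the control flow is runnable, and semantic equivalence is judged by running both versions on shared test inputs.

```python
# Sketch of the round-trip correctness (RTC) idea: code -> description ->
# regenerated code, then check the round trip preserved behavior.

def generate_description(code: str) -> str:
    """Stand-in for an LLM call: 'describe this code'."""
    return "return the square of x"

def generate_code(description: str) -> str:
    """Stand-in for an LLM call: 'implement this description'."""
    return "def f(x):\n    return x * x"

def semantically_equivalent(code_a: str, code_b: str, inputs) -> bool:
    """Judge equivalence by executing both versions on shared inputs."""
    ns_a, ns_b = {}, {}
    exec(code_a, ns_a)
    exec(code_b, ns_b)
    return all(ns_a["f"](x) == ns_b["f"](x) for x in inputs)

def round_trip_correct(original: str, inputs) -> bool:
    description = generate_description(original)
    regenerated = generate_code(description)
    return semantically_equivalent(original, regenerated, inputs)

original = "def f(x):\n    return x ** 2"
print(round_trip_correct(original, inputs=range(10)))
```

In the real setting the equivalence check is the hard part; the paper's insight is that behavioral tests can stand in for a human judge.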

First impacted: AI Researchers, Data Scientists
Time to impact: Medium

Andrej Karpathy, the recently departed co-founder of OpenAI, unveiled a presentation titled "Let's Build the GPT Tokenizer," in which he delves into the creation and impact of tokenizers on LLMs. Karpathy outlines the challenges of tokenization and proposes the idea of eliminating the tokenization step entirely. He also introduces Minbpe, a tool he says is designed for LLM tokenization and is capable of training tokenizers on large datasets. [Let's build the GPT Tokenizer] Explore more of our coverage of: GPT Tokenizer, OpenAI, Large Language Models. Share this story by email

First impacted: System Architects, AI researchers, Data scientists
Time to impact: Medium

Databricks' CTO, Matei Zaharia, shared a research paper outlining how an AI system's performance can be enhanced by using inference algorithms strategically. The research paper highlighted that "state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models". A great read if you're architecting in the space; check out more in the link. [via @matei_zaharia] Explore more of our coverage of: Databricks, AI Performance, Inference Algorithms. Share this story by email
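To make "compound system" concrete, here is one of the simplest inference-time compositions: sample several candidates from a generator and keep the one a verifier scores highest (best-of-n). The generator and verifier below are toy stand-ins, not any real model or API, chosen only to show the shape of the pattern.

```python
# Toy compound system: generator + verifier beats a single greedy call.
import random

def generator(prompt: str, seed: int) -> str:
    """Stand-in for a sampled model completion (deterministic per seed)."""
    rng = random.Random(seed)
    return f"{prompt}-candidate-{rng.randint(0, 99)}"

def verifier(answer: str) -> float:
    """Stand-in for a scoring model; here, higher trailing number wins."""
    return int(answer.rsplit("-", 1)[1])

def best_of_n(prompt: str, n: int = 5) -> str:
    """Sample n candidates, return the one the verifier prefers."""
    candidates = [generator(prompt, seed=i) for i in range(n)]
    return max(candidates, key=verifier)
```

Swapping either component (a stronger verifier, a retrieval step, a tool call) changes system quality without retraining the model, which is the paper's broader point.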

First impacted: Sound designers, Video game developers
Time to impact: Short

ElevenLabs has launched a project to produce AI-generated sound effects, taking inspiration from OpenAI's Sora model's capability to craft amazing (but silent) videos. The company says this project will give users the ability to create sounds from their own descriptions, moving beyond its existing text-to-speech models. [ElevenLabs Sound Effects Waitlist] Explore more of our coverage of: AI Sound Generation, OpenAI Sora Model, Text-to-Speech Models. Share this story by email

First impacted: AI Developers, Data Scientists
Time to impact:

Sasha Rush has detailed in a blog post how the neural architecture Mamba can be efficiently computed using the S6 algorithm on current hardware. He says Mamba has been implemented in Triton, OpenAI's GPU programming language, and that the implementation handles complex tasks on the GPU, offering a performance boost of around 50 times compared to torch.cumsum.

  • Sasha Rush's blog post discusses the S6 algorithm, saying it enables efficient computation of Mamba, a modern recurrent neural network architecture, on current hardware.

  • The post also highlights Triton, OpenAI's GPU programming language, which was used to implement Mamba. According to Rush, this provides a performance boost and allows complex tasks to be managed on the GPU, showcasing the potential of GPU programming in advancing neural network architectures.

[Mamba: The Hard Way] Explore more of our coverage of: OpenAI, Mamba Architecture, S6 Algorithm. Share this story by email
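For readers curious what is actually being computed: the heart of Mamba-style state space layers is the linear recurrence h[t] = a[t]·h[t-1] + b[t]·x[t]. Below is a plain-Python sequential reference of that recurrence, plus the associative combine rule that makes a parallel scan (and hence a fast fused GPU kernel) possible. This is a sketch of the math only, not Rush's Triton code, and the scalar inputs stand in for per-timestep tensors.

```python
# Sequential reference for the selective-state-space recurrence:
#   h[t] = a[t] * h[t-1] + b[t] * x[t]

def linear_scan(a, b, x):
    """Run the recurrence step by step; returns all hidden states."""
    h, out = 0.0, []
    for a_t, b_t, x_t in zip(a, b, x):
        h = a_t * h + b_t * x_t
        out.append(h)
    return out

def combine(left, right):
    """Associative combine for the same recurrence, written as pairs
    (decay, accumulated input). Associativity is what lets a GPU
    evaluate the whole sequence as a parallel prefix scan."""
    a1, s1 = left
    a2, s2 = right
    return (a1 * a2, a2 * s1 + s2)
```

A parallel scan over `combine` produces the same hidden states as `linear_scan` but in O(log n) parallel steps, which is where the large speedup over a naive elementwise approach comes from.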

First impacted: AI developers, Machine learning engineers
Time to impact:

Andrej Karpathy has launched Minbpe, a tool he says implements the BPE algorithm used in LLM tokenization. Karpathy says the tool can recreate the GPT-4 tokenizer, is easy to understand, and is capable of training tokenizers on large datasets with a vocabulary size of 100K in about 25 seconds on an M1 MacBook.

  • Karpathy claims Minbpe can recreate the GPT-4 tokenizer, demonstrating the effectiveness of the BPE algorithm and the tool's potential for training tokenizers.

  • When tested on an M1 MacBook, the Minbpe tool reportedly completed its training script in a mere 25 seconds, demonstrating its impressive speed and efficiency.
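The core of byte-pair encoding is short enough to show inline. The sketch below is in the spirit of minbpe but is a simplified illustration, not Karpathy's actual code: start from raw UTF-8 bytes, repeatedly find the most frequent adjacent pair, and replace it with a new token id.

```python
# Minimal BPE training loop: merge the most frequent adjacent pair,
# assigning new token ids after the 256 raw byte values.
from collections import Counter

def most_common_pair(ids):
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text: str, num_merges: int):
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_common_pair(ids)
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return ids, merges

ids, merges = train_bpe("aaabdaaabac", 2)
```

Each merge shortens the sequence while growing the vocabulary; a production tokenizer just runs this loop until the target vocabulary size (e.g. 100K) is reached.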

First impacted: Data Managers, AI Application Developers
Time to impact:

LlamaIndex Inc., in collaboration with MongoDB, has launched LlamaCloud, a service they claim enhances data management in MongoDB Atlas Vector DB and simplifies retrieval-augmented generation (RAG) pipeline setups. They also mention the introduction of LlamaParse, a document parser that works alongside their other technologies, and express that the service will expedite the deployment of generative AI apps.

  • Jerry Liu tweeted about LlamaCloud's LlamaParse feature, a sophisticated document parser said to handle embedded tables and OCR, which he says integrates directly with LlamaIndex.

  • A tweet from MongoDB indicates a partnership with LlamaIndex, which they suggest will provide seamless data management and potentially lead to cost efficiency for large language models.
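For context on what a RAG pipeline like the ones LlamaCloud manages actually does, here is a bare-bones retrieval step. This deliberately avoids the LlamaIndex/MongoDB APIs: the bag-of-words "embedding" is a toy stand-in for a real embedding model, used only to show the retrieve-then-prompt shape.

```python
# Toy retrieval-augmented generation: embed docs, rank by similarity
# to the query, and prepend the best match to the prompt.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system uses a neural model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs, k: int = 1):
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Groq builds a language processing unit for fast inference",
    "ElevenLabs is working on AI sound effects",
]
context = retrieve("which chip speeds up inference", docs)[0]
prompt = f"Answer using this context:\n{context}\n\nQ: which chip speeds up inference?"
```

Managed services like LlamaCloud take over the unglamorous parts around this loop: parsing source documents, keeping the vector store in sync, and tuning chunking and retrieval.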

First impacted: AI developers, AI researchers
Time to impact:

Maxime Labonne has launched AlphaMonarch-7B, a series of 7B models he says perform strongly on multiple benchmarks, including Nous, EQ-Bench, MT-Bench (a complex multi-turn question set testing models' conversation and instruction-following skills), and the Open LLM Leaderboard. Labonne reports that AlphaMonarch, fine-tuned with the OpenHermes2.5-dpo-binarized-alpha dataset, has outperformed the NeuralMonarch model, which was built with distilabel-intel-orca-dpo-pairs and truthy-dpo-pairs, and showed improved results on Nous' benchmark suite and MT-Bench while also fixing previous tokenizer issues.

  • Labonne has shared that the creation of AlphaMonarch-7B involved around 50 merges, shaping an intricate model lineage that has amplified its performance on numerous benchmarks.

  • He further highlights a unique edge of AlphaMonarch-7B: it may not surpass OmniBeagle, but it offers a more practical trade-off, demonstrating particular proficiency in multi-turn queries on MT-Bench, a test designed to gauge models' conversational and instruction-following capabilities.

Read more of today's top AI news stories like this at [https://aitimetoimpact.com/].