AI: Mistral and the Mixture of Experts Model (12.8.23)

Friends, the Mixture of Experts paradigm dominated the AI news today, thanks to a huge new Mistral release. It's exciting to think about the potential impact of training systems optimized to let multiple specialized AI models work together.

As always, these are the stories most talked about by the AI specialist community. My hope is that readers who have not been following this news all day will find it useful to have a round-up of summaries at the end of the day.

The AI topics you’re already thinking about, the innovations you’re working on, may be affirmed, extended, or critiqued in these stories.

May we all learn from and make good use of this information.

First impacted: AI researchers, AI developers
Time to impact: Short

Mistral AI has introduced a new model, a Mixture of Experts that apparently combines eight Mistral 7B models, each specializing in different tasks. It was released via a BitTorrent magnet link on Twitter with no comment, context, press release, or demo, and it was the most talked-about thing in the AI world today. [via @MistralAI]
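For readers new to the idea, here is a minimal, illustrative sketch of how a Mixture-of-Experts layer routes each token to a small subset of expert networks and blends their outputs. The dimensions, expert count, and top-2 routing below are assumptions chosen for clarity, not details of Mistral's actual release.

```python
# Minimal Mixture-of-Experts layer with top-2 gating (illustrative sketch only;
# sizes and routing choices are assumptions, not Mistral's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network scores each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        gate_logits = self.router(x)           # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts; the outputs are
        # combined using the gating weights.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out
```

The appeal of this design is that only a fraction of the parameters are active for any given token, so total model capacity can grow much faster than per-token compute.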

First impacted: AI researchers, AI developers
Time to impact: Short

MegaBlocks, a mixture-of-experts (MoE) training library built by a team from Stanford, Microsoft, and Google, claims to outperform the state-of-the-art Tutel platform by up to 40% in MoE training. The MegaBlocks team says its method prevents tokens from being dropped as they are routed among the experts being trained. MegaBlocks support was part of what made today's big Mistral story possible. [GitHub - stanford-futuredata/megablocks]
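To illustrate the token-dropping problem MegaBlocks says it avoids: in conventional capacity-limited MoE training, each expert accepts only a fixed number of tokens per batch, and tokens routed beyond that capacity are simply skipped. The sketch below shows that baseline behavior; the function name, capacity, and routing values are assumptions for illustration, not MegaBlocks' actual API.

```python
# Sketch of capacity-limited MoE routing, where overflow tokens are dropped.
# This illustrates the problem dropless approaches like MegaBlocks target;
# it is not MegaBlocks code.
import torch

def route_with_capacity(token_ids, expert_ids, n_experts, capacity):
    """Return (kept, dropped) token ids after capacity-limited routing."""
    kept, dropped = [], []
    load = [0] * n_experts                  # tokens assigned to each expert so far
    for tok, exp in zip(token_ids, expert_ids.tolist()):
        if load[exp] < capacity:
            load[exp] += 1
            kept.append(tok)
        else:
            dropped.append(tok)             # overflow token gets no expert update
    return kept, dropped

tokens = list(range(8))
assignments = torch.tensor([0, 0, 0, 1, 1, 2, 2, 3])   # hypothetical router output
kept, dropped = route_with_capacity(tokens, assignments, n_experts=4, capacity=2)
print(kept, dropped)   # the third token routed to expert 0 is dropped
```

Dropped tokens simply skip the expert computation for that layer, which wastes data and can hurt quality, which is why a dropless formulation is attractive for MoE training.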

First impacted: AI researchers, AI developers
Time to impact: Medium

Together Research has introduced StripedHyena models, a new architecture it says competes with top open-source Transformers in both short- and long-context evaluations. Transformers are foundational in open-source AI, so architectures that purport to challenge them are a big deal. [Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers]

First impacted: AI researchers, OpenAI model users
Time to impact: Medium

There are emerging claims that GPT-4 is "becoming lazier," despite the model having had no changes since November 11th. (For example, it sometimes refuses to do things until you say "just do it!", at which point it complies.) OpenAI has responded to these discussions within the AI community, stating that it is looking into the issue. Users have jokingly proposed rebooting the model as a possible fix. [via @ChatGPTapp]

That’s it! More AI stories on Monday.