AI: Open source AI, questioning risk claims, cool hardware prototype (11.3.23)

Friends, this Friday was a pretty thoughtful day in the world of AI. I've put today's stories in a specific order that makes a story arc, concluding with a fun vision of the future in a hardware prototype.

Regular readers may note some new branding around here. I've posted a very nicely designed LinkedIn carousel version of today's news here. I'll say more about the new design next week; for now, I'm just going to practice using it on a Friday afternoon.

New readers (thank you!) may not know: this is a selection of the top stories in AI each day, as determined by a weighted analysis of AI community engagement, analyzed by an ensemble of LLMs, and edited by me. Thanks for joining; we do this every weekday. I hope you'll share this with others.

And now, here's the news.

Marshall Kirkpatrick, Editor

The story of Zephyr: Global collaboration on open source AI

First impacted: Open source AI developers

Time to impact: Short

Hugging Face's Thomas Wolf posted a story on X today recounting how AI developers across the US, Europe, and China were able to combine their work on top of the Mistral AI model to create a new model called Zephyr. Zephyr's initial commit was just 26 days ago, and it has now been downloaded more than 77,000 times. Hugging Face CEO Clement Delangue shared today that "the current 7 best trending models on HuggingFace are NOT from BIG TECH!" Notably, 4 of the top 7 are built on Mistral. [via @Thom_Wolf]
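
If you want to kick the tires on Zephyr yourself, here's a minimal sketch using the Hugging Face transformers library. The model id HuggingFaceH4/zephyr-7b-beta, the chat-template step, and the generation settings are my assumptions about the Hub release rather than details from Wolf's post, and running it locally needs a reasonably large GPU.

    # Minimal sketch: generate text with a Zephyr checkpoint from the Hugging Face Hub.
    # The model id below is an assumption; check the Hub for the exact release you want.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="HuggingFaceH4/zephyr-7b-beta",  # assumed model id
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Zephyr is a chat-tuned model, so we format the prompt with its chat template
    # before generating instead of passing raw text.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why does open source collaboration speed up AI model development?"},
    ]
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    output = pipe(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)
    print(output[0]["generated_text"])

The chat-template step matters because Zephyr is an instruction-tuned chat model built on top of Mistral, not a raw completion model.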

"Propaganda or Science: Open Source AI and Bioterrorism Risk"

First impacted: Smart policymakers

Time to impact: Short to medium

A widely shared, nearly 7,000-word blog post published yesterday begins like this: "I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs. None of them provide good evidence for the paper's conclusion. The best of the set is evidence from statements from Anthropic -- which rest upon data that no one outside of Anthropic can even see, and on Anthropic's interpretation of that data. The rest of the evidence cited in this paper ultimately rests on a single extremely questionable 'experiment' without a control group." [Propaganda or Science: Open Source AI and Bioterrorism Risk]

AllenNLP Launches Multilingual MADLAD-400 Dataset

First impacted: AI researchers, Open-source community developers

Time to impact: Short to medium

AllenNLP has published a dataset called Multilingual Audited Dataset: Low-resource And Document-level (MADLAD-400). It's a multilingual compilation featuring 7.2 trillion tokens of document-level web data across 419 languages, based on Common Crawl and incorporating all snapshots up to August 1, 2022. It is released under the CC-BY-4.0 license (https://creativecommons.org/licenses/by/4.0/). [MADLAD-400]
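
If you want to poke at MADLAD-400 without pulling down trillions of tokens, here's a minimal sketch using the Hugging Face datasets library in streaming mode. The dataset id allenai/MADLAD-400, the per-language configuration names, the "clean" split, and the "text" field are assumptions about how the release is laid out on the Hub, so check the dataset card before relying on them.

    # Minimal sketch: stream a slice of MADLAD-400 instead of downloading the full corpus.
    # Dataset id, config name, split name, and field name below are assumptions.
    from datasets import load_dataset

    dataset = load_dataset(
        "allenai/MADLAD-400",
        "en",                 # one of the 419 per-language configurations (assumed)
        split="clean",        # the audited/cleaned portion of the corpus (assumed)
        streaming=True,       # iterate lazily rather than downloading everything up front
    )

    # Peek at the first few documents.
    for i, example in enumerate(dataset):
        print(example["text"][:200])  # "text" field name is an assumption
        if i >= 2:
            break

Streaming mode is the practical way to work with a corpus this size: you get an iterator over documents and only fetch what you actually read.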

1st AI Machine is a Hardware Prototype for Video Editing

First impacted: VJs, video remixers

Time to impact: Medium to long

RunwayML CEO Cristóbal Valenzuela shared a video of a prototype hardware video mixing board powered by AI. It's simple, but it's pretty cool, and it's fun to imagine machines like this in all kinds of settings. [1st AI Machine]

That's it! More AI news tomorrow.