AI: Just a day of 40-95% improvements (1.18.24)

Friends, it's easy for our eyes to glaze over at the huge leaps and bounds seen most days in the world of AI, but I want to suggest pausing to take note of today. These are stories of big breakthroughs and massive investments. Several could make a very big impact on the industry.

As always, these are the stories the AI community is talking about most, according to our weighted analysis of community engagement.

I also want to highlight, though, Washington State's announcement today of statewide guidance for the use of AI in schools: "with a priority for human inquiry that uses AI for production, but never as the final thought, product, or paper." That's pretty interesting. (Thanks Deane!)

Now on to today's top stories:

Marshall Kirkpatrick, Editor

First impacted: Data Scientists, Software Developers
Time to impact: Short

Microsoft has launched LLMLingua and LongLLMLingua, tools integrated into the LlamaIndex pipeline and aimed at improving the efficiency of LLMs. According to Microsoft, these tools can compress prompts up to 20 times, reducing costs and extending effective context, with the potential to increase performance by up to 21.4% using only a quarter of the tokens. Who needs their prompts compressed 20X? People putting whole meeting transcripts, codebases, or documents into LLMs! If you can make something 95% smaller, you can fit 20X as much of it in the same space, of course. Check out the Examples page here. [GitHub - microsoft/LLMLingua: To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.] Explore more of our coverage of: Microsoft, LLM Efficiency. Share this story by email
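To make the "20X equals 95% smaller" math above concrete, here's a back-of-the-envelope sketch. The numbers are illustrative, not benchmarks from Microsoft.

```python
# 20x compression means the compressed prompt is 1/20 the original size,
# i.e. a 95% reduction, so a fixed context window fits 20x as much text.
def compressed_tokens(original_tokens, ratio=20):
    return original_tokens / ratio

original = 100_000  # e.g., a long meeting transcript, in tokens
compressed = compressed_tokens(original)
reduction = 1 - compressed / original

print(compressed)          # 5000.0
print(f"{reduction:.0%}")  # 95%
```

The same arithmetic applies to API cost: at a fixed per-token price, a 20x-compressed prompt costs 5% of the original.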

First impacted: AI researchers at Meta, Users of Meta's AI-focused devices
Time to impact: Medium to Long

Mark Zuckerberg has announced that Meta is bringing "closer together" its two AI research branches, FAIR and GenAI, to focus on the development of general intelligence and the training of its upcoming model, Llama 3. He detailed Meta's ambitious plan to build a massive computing infrastructure with nearly 600k H100-equivalent GPUs, including a goal to acquire 350k H100s by the end of the year. "Our long term vision is to build general intelligence, open source it responsibly, and make it widely available so everyone can benefit." [via @soumithchintala] Explore more of our coverage of: Meta, General Intelligence Development, GPU Infrastructure Expansion. Share this story by email

First impacted: Software Developers, AI Researchers
Time to impact: Short

Startup Codium has launched AlphaCodium, a new approach to enhancing the performance of LLMs in code generation. They report that AlphaCodium increased GPT-4's accuracy from 19% to 44%, outperforming previous methods while requiring fewer computational resources. They say their method creates extra data during a pre-processing phase, such as self-reflection and reasoning about the public tests, which is then used in an iterative process of running and correcting the generated code against input-output tests. Their blog post is really interesting; I tried adding their reflection prompt to a few AI tasks, asking GPT-4 to revise its answer as appropriate based on the reflection it just did on the problem. In my limited tests, that sometimes produced much better answers! [State-of-the-art Code Generation with AlphaCodium - From Prompt Engineering to Flow Engineering | CodiumAI] Explore more of our coverage of: AI Code Generation, Large Language Models, Computational Efficiency. Share this story by email
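The iterate-and-fix loop they describe can be sketched in a few lines. This is my reading of the general flow, not Codium's code: `generate_candidate` is a stub standing in for a real LLM call, and the reflection feedback here is just the list of failed test cases.

```python
def run_io_tests(func, io_tests):
    """Return the list of (input, expected, actual) failures."""
    failures = []
    for inp, expected in io_tests:
        actual = func(inp)
        if actual != expected:
            failures.append((inp, expected, actual))
    return failures

def generate_candidate(attempt, feedback):
    # Stub: a real system would prompt an LLM with the task description
    # plus the reflection/feedback text. Here attempt 0 is deliberately
    # buggy and attempt 1 is the "corrected" version.
    if attempt == 0:
        return lambda x: x * 2 + 1  # off-by-one bug
    return lambda x: x * 2

def solve(io_tests, max_attempts=3):
    feedback = ""
    for attempt in range(max_attempts):
        candidate = generate_candidate(attempt, feedback)
        failures = run_io_tests(candidate, io_tests)
        if not failures:
            return candidate
        feedback = f"Failed cases: {failures}"  # fed into the next attempt
    return None

solution = solve([(1, 2), (3, 6)])
print(solution(5))  # 10
```

The first candidate fails the public tests, the failures become feedback, and the second attempt passes. The real system wraps this loop around generated code and much richer pre-processing.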

First impacted: Video Editors, Content Creators
Time to impact: Short

Runway, an AI creative suite, introduced Multi Motion Brush, a tool that reportedly lets users control motion in different areas of their video creations independently by applying distinct brushes to the various parts. Check out the demo video here: [via @runwayml] Explore more of our coverage of: Runway, video. Share this story by email

First impacted: AI developers, AI researchers
Time to impact: Short

Researchers at the University of Washington and the Allen Institute for AI write: "Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors. However, tuning these models has become increasingly resource-intensive, or impossible when model weights are private. We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the result of directly tuning the model, but by accessing only its prediction over the output vocabulary. Our method instead tunes a smaller LM, then applies the difference between the predictions of the small tuned and untuned LMs to shift the original predictions of the base model in the direction of tuning, while retaining the benefits of larger-scale pretraining." [Tuning Language Models by Proxy] Explore more of our coverage of: Tuning, Allen Institute. Share this story by email
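The decoding-time arithmetic they describe is simple enough to sketch on a toy vocabulary. This is my reading of the paper's description, not the authors' code: shift the base model's next-token logits by the difference between a small tuned and small untuned model's logits, then take the softmax.

```python
import numpy as np

def proxy_tuned_logits(base_logits, small_tuned_logits, small_untuned_logits):
    # base + (tuned - untuned): nudges the big model's predictions
    # in the direction the small model moved during tuning.
    return base_logits + (small_tuned_logits - small_untuned_logits)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 4-token vocabulary
base          = np.array([2.0, 1.0, 0.5, 0.1])
small_tuned   = np.array([0.5, 2.5, 0.2, 0.1])
small_untuned = np.array([0.5, 0.5, 0.2, 0.1])

shifted = proxy_tuned_logits(base, small_tuned, small_untuned)
probs = softmax(shifted)
print(shifted)          # token 1 gains the small model's tuning delta of +2.0
print(probs.argmax())   # 1
```

Because only output logits are needed, this works on a black-box base model, which is the point: you never touch its weights.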

First impacted: Data scientists, Embeddings users
Time to impact: Short

Sebastian Bruch has published a detailed 200-page document titled "Foundations of Vector Retrieval" on arXiv, covering fundamental concepts and advanced data structures in vector retrieval. Bruch says that understanding these principles could ease the learning process for those new to the field, given that vector retrieval is crucial to handling data represented as vectors, such as text, images, or speech. [Foundations of Vector Retrieval] Explore more of our coverage of: Vector Retrieval, Data Representation, AI Research. Share this story by email
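For readers new to the topic, here is the core operation the monograph builds on, in its simplest form: given a query vector, find the most similar stored vector. This brute-force cosine-similarity sketch is mine, for illustration; real systems use the index structures the text surveys.

```python
import numpy as np

def cosine_top1(query, corpus):
    # Normalize so the dot product equals cosine similarity,
    # then return the index of the best-matching corpus vector.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    return int(scores.argmax())

# Three toy 2-d "embeddings" (rows); real embeddings have hundreds of dims.
corpus = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])

print(cosine_top1(np.array([0.9, 0.1]), corpus))  # 0
```

A query pointing mostly along the first axis retrieves the first vector. Everything else in the field, from k-d trees to graph indexes to quantization, is about doing this fast at scale.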

That’s it! I wonder what will happen tomorrow?