. .AI: Using synthetic data to challenge GPT-4 (1.19.24)

Friends, could Llama 2 challenge GPT-4 if it's fine-tuned on synthetic data? That's the top story today, along with the rest of the top 5 stories the AI community is discussing most.

-Marshall Kirkpatrick, Editor

First impacted: AI researchers, AI model trainers
Time to impact: Short to medium

"We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal." Meta and NYU researchers have published a paper on Self-Rewarding Language Models, which are designed to generate their own rewards during training. Using the LLM-as-a-Judge system in Iterative Direct Preference Optimization training, it is said that the Llama 2 70B model, after three rounds of fine-tuning, surpassed Claude 2, Gemini Pro, and GPT-4 0613 on the AlpacaEval 2.0 leaderboard. That's a real big deal. [Self-Rewarding Language Models] Explore more of our coverage of: Synthetic data, Self-Rewarding, DPO, AlpacaEval . Share this story by email

First impacted: LLM (Language Model) users, AI developers
Time to impact: Short

Andrej Karpathy expanded on yesterday's news of startup Codium's release of AlphaCodium, which increased GPT-4's accuracy in code generation from 19% to 44% thanks to a paradigm called "flow engineering." This technique, which involves well-designed pipelines, reusable modules, optimizers, and adjustments in architecture, represents a shift from traditional question-and-answer prompting to a more iterative problem-solving approach. I've been using methods inspired by that story in pretty much every prompt I've run since writing it up yesterday. This is just the tip of a very big iceberg, though. [via @karpathy] Explore more of our coverage of: Flow Engineering, LLM Performance, Iterative Problem-Solving, Reflection.
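
To make "iterative" concrete, here is a small sketch of a generate, test, and repair loop. It is a generic illustration rather than AlphaCodium's actual pipeline: `propose` is a hypothetical stand-in for an LLM call, and `toy_propose` fakes a model that fixes its bug once it sees test failures.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (arguments, expected result)


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Load the candidate code and return a list of failure messages."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # only acceptable for trusted or toy code
        func = namespace[func_name]
    except Exception as exc:
        return [f"could not load {func_name}: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def solve_iteratively(
    task: str,
    func_name: str,
    tests: List[TestCase],
    propose: Callable[[str, List[str]], str],  # hypothetical LLM call
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for code, run the tests, and feed failures back until they pass."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        source = propose(task, feedback)
        feedback = run_tests(source, func_name, tests)
        if not feedback:
            return source, round_number
    return None, max_rounds


# Toy stand-in: a buggy first attempt, then a fix once failures are reported.
def toy_propose(task: str, feedback: List[str]) -> str:
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


source, rounds = solve_iteratively("add two numbers", "add", [((1, 2), 3)], toy_propose)
```

The point of the design is that the test results, not a single prompt, drive the next attempt. That is the shift from question-and-answer prompting to a flow.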

First impacted: R1 product users, Perplexity Pro potential customers
Time to impact: Short

Perplexity AI and Rabbit Inc. have announced a partnership, with Rabbit's R1 mobile AI device aiming to deliver real-time, accurate responses via Perplexity's PPLX online LLM API. [via @perplexity_ai] Explore more of our coverage of: Rabbit, Perplexity, Partnerships.

First impacted: Data analysts, AI developers
Time to impact: Short

Fireworks.ai has launched FireLLaVA, an open-source model that the company says can process and analyze data from various sources, including images. The company says FireLLaVA is the first commercially permissive LLaVA multi-modality model. It was developed using the CodeLlama 34B Instruct model and trained with 588K lines of visual question-answering data, and the company says it matches the performance of the original LLaVA model and even surpasses it on four out of seven benchmarks. There's a cool demo online where you can upload an image and ask questions about it. [FireLLaVA: the first commercially permissive OSS LLaVA model] Explore more of our coverage of: Open-Source AI, Data Analysis, FireLLaVA Model.

First impacted: Policy makers, Technology innovators
Time to impact: Medium to long

A new brief titled "Considerations for Governing Open Foundation Models", by researchers affiliated with the Stanford Center for Research on Foundation Models (CRFM), discusses the benefits and potential risks of open foundation models. The authors emphasize the value of these models in combating market concentration and catalyzing innovation, but they say the ease of access and minimal usage restrictions could make open models more vulnerable to misuse by malicious actors, for example to produce disinformation, cyberweapons, bioweapons, and spear-phishing emails. But: "Some interventions are better targeted at choke points downstream of the foundation model layer." And "Several current policy proposals (e.g., liability for downstream harm, licensing) are likely to disproportionately damage open foundation model developers." [Issue Brief: Considerations for Governing Open Foundation Models] Explore more of our coverage of: Open Foundation Models, AI Regulation, Risk.

Ok, that’s it for this week! More AI stories on Monday!