AI: New Evaluation Benchmarks Set by Reka and Latest Advances in Training Models by Meta (5.1.24)

Reka, Meta, KAN, Abacus AI

In today's edition, researchers at Meta present a new training method that they report lets their models solve up to 17% more coding problems on key benchmarks. We also cover a new benchmark from Reka for evaluating multi-modal models, along with other headlines focused mostly on model enhancements.

Enjoy!

First impacted: AI developers, NLP researchers

Researchers from Meta have proposed a new method of training large language models (LLMs) such as GPT and Llama. According to their study, predicting multiple future tokens at once during training improves the downstream capabilities and sample efficiency of these models, with benefits especially noticeable on generative benchmarks such as coding; in testing, their 13B-parameter models solved 12-17% more problems on benchmarks like HumanEval and MBPP than comparable models trained with standard next-token prediction. [Better & Faster Large Language Models via Multi-token Prediction]
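To make the idea concrete, here is a minimal PyTorch sketch of multi-token prediction: a shared transformer trunk feeds several independent output heads, where head i predicts the token i+1 steps ahead, and the per-head cross-entropy losses are summed. This is an illustration of the technique as described above, not the paper's released code; all module and function names are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of multi-token prediction: a shared trunk produces one hidden
# state per position; n_future independent heads each predict the token
# a fixed number of steps ahead, and their losses are summed.

class MultiTokenPredictor(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        # One unembedding head per future offset (1 .. n_future).
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def forward(self, tokens):
        x = self.embed(tokens)
        # Causal mask: position t may only attend to positions <= t.
        t = tokens.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        h = self.trunk(x, mask=mask)
        return [head(h) for head in self.heads]  # one logit set per head

def multi_token_loss(logits_per_head, tokens):
    """Sum cross-entropy over heads; head i targets tokens i+1 steps ahead."""
    total = 0.0
    for i, logits in enumerate(logits_per_head):
        offset = i + 1
        pred = logits[:, :-offset, :]   # positions with a target in range
        target = tokens[:, offset:]
        total = total + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), target.reshape(-1)
        )
    return total

if __name__ == "__main__":
    model = MultiTokenPredictor()
    batch = torch.randint(0, 1000, (2, 16))  # toy token ids
    loss = multi_token_loss(model(batch), batch)
    loss.backward()
    print(f"combined loss over 4 heads: {loss.item():.3f}")
```

At inference time the extra heads can simply be dropped (keeping only the next-token head), which is why the method improves training without changing the deployed model's interface.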

First impacted: AI developers, AI model testers

Reka, an AI development company, has launched Vibe-Eval, an evaluation suite that the company says includes 269 high-quality image-text prompts with reference responses, designed to test and separate frontier-class multi-modal models. According to the company's blog post, the prompts challenge even cutting-edge models, which fail to give a fully correct response on about half of them, leaving headroom to measure progress as models improve. [Vibe-Eval: A new open and hard evaluation suite for measuring progress of multimodal language models — Reka AI]
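As a rough illustration of how a suite like this is consumed, the sketch below runs a model-graded evaluation loop over image-text examples: the candidate model answers each prompt, and a judge scores the response against the reference. Everything here (the field names, `generate`, `judge_score`) is a hypothetical stand-in, not Reka's actual harness or API.

```python
import json
from dataclasses import dataclass

# Hypothetical sketch of a model-graded evaluation loop in the style of
# an image-text suite like Vibe-Eval. Field names and callables are
# assumptions for illustration only.

@dataclass
class Example:
    image_path: str
    prompt: str
    reference: str

def load_examples(path: str) -> list[Example]:
    """Load one JSON object per line (assumed JSONL layout)."""
    with open(path) as f:
        return [Example(**json.loads(line)) for line in f]

def evaluate(examples, generate, judge_score) -> float:
    """Average judge score over the suite.

    generate(image_path, prompt) -> model response text
    judge_score(prompt, reference, response) -> float score
    """
    scores = []
    for ex in examples:
        response = generate(ex.image_path, ex.prompt)
        scores.append(judge_score(ex.prompt, ex.reference, response))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Stub model and judge so the sketch runs end to end.
    examples = [
        Example("cat.jpg", "What is unusual here?", "The cat wears glasses.")
    ]

    def gen(image_path, prompt):
        return "A cat wearing glasses."

    def judge(prompt, reference, response):
        return 5.0 if "glasses" in response else 1.0

    print(f"mean score: {evaluate(examples, gen, judge):.2f}")
```

In practice the judge is typically another strong model prompted with a rating rubric rather than a string match, but the loop structure is the same.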

[via @aidan_mclau]

[via @bindureddy]