AI: Updates from OpenAI, Google, and Apple (4.1.24)

Plus OpenUI

Friends, in today's edition we delve into the latest AI stories from industry giants OpenAI, Google, and Apple. Each is releasing a major product or research update that will move the frontier of AI in a meaningful way. We also feature two posts from people in the industry: one on algorithmic efficiency trends and the other on LLM evaluation. Finally, we conclude with a story on text-to-HTML that comes with a cool demo video and a live demo. Enjoy!

-Marshall Kirkpatrick, Editor

OpenAI Shares Early Insights From Its Voice Engine Model

First impacted: Content creators, AI safety researchers, AI policymakers
Time to impact: Short

OpenAI has shared preliminary insights from early use of its Voice Engine model, which it says can generate natural-sounding speech that closely resembles a speaker's voice from text and a short audio clip. The model, which powers the preset voices in the text-to-speech API, ChatGPT Voice, and Read Aloud, has been tested with a select group of partners on tasks such as reading assistance and content translation, with OpenAI implementing safety measures like watermarking and usage tracking due to concerns about misuse of synthetic voices. Given the risks to biometric security, OpenAI also recommends that companies phase out voice-based authentication for accessing bank accounts, and that the development and adoption of techniques for tracing the origin of audiovisual content be accelerated, so it is always clear whether you are interacting with a real person or an AI. [Navigating the Challenges and Opportunities of Synthetic Voices] Explore more of our coverage of: OpenAI, Synthetic Voices, Text-to-Speech API. Share this story by email

Research Paper: Google's Gecko, a Compact Text Embedding Model

First impacted: Data scientists, Machine learning engineers
Time to impact: Medium

Researchers from Google have published a paper on Gecko, a compact text embedding model that they report outperforms other models using 768 embedding dimensions while Gecko uses only 256. Configured with 768 dimensions itself, Gecko competes with models seven times its size that use embeddings of five times higher dimensionality, achieving an average score of 66.31 on the Massive Text Embedding Benchmark (MTEB). The model is trained by generating and refining synthetic data to create diverse, high-quality training examples, then improving data quality by using an LLM to retrieve, rank, and relabel passages, identifying the most relevant positive and hard negative passages for each query. [Gecko: Versatile Text Embeddings Distilled from Large Language Models] Explore more of our coverage of: Text Embedding, AI Models, Benchmark Performance. Share this story by email
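
The retrieve-rank-relabel step described above can be sketched roughly as follows. This is a toy illustration, not the paper's method: the `llm_generate_query` and `llm_relevance` functions are hypothetical stand-ins (first-words extraction and word overlap) for the LLM calls Gecko actually uses.

```python
# Toy sketch of Gecko-style data refinement: generate a synthetic query for a
# seed passage, then re-rank candidate passages so the best match becomes the
# positive and a close-but-lower-ranked one becomes the hard negative.

def llm_generate_query(passage: str) -> str:
    # Stand-in for LLM query generation: take the passage's first four words.
    return " ".join(passage.split()[:4])

def llm_relevance(query: str, passage: str) -> float:
    # Stand-in for LLM relevance scoring: fraction of query words in passage.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def refine_example(seed_passage: str, corpus: list[str]):
    query = llm_generate_query(seed_passage)
    ranked = sorted(corpus, key=lambda p: llm_relevance(query, p), reverse=True)
    positive = ranked[0]       # most relevant passage (may differ from the seed)
    hard_negative = ranked[1]  # relevant-looking but lower-ranked passage
    return query, positive, hard_negative
```

The key idea preserved here is that the positive passage is chosen by re-ranking the corpus against the generated query, so it need not be the seed passage the query came from.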

Research Paper: Apple's ReALM Resolves On-Screen References With LLMs

First impacted: UI researchers, AI researchers
Time to impact: Medium

Apple researchers have published a paper on improving LLMs' ability to understand and interact with screens by representing on-screen elements as text. By converting screen content into a textual format, the system lets models accurately identify and act on on-screen references, significantly enhancing user experiences with virtual assistants. The researchers report more than a 5% improvement in resolving on-screen references compared to previous methods, and performance on par with GPT-4 using substantially smaller models, marking a significant step forward in efficient and effective AI design. [ReALM: Reference Resolution As Language Modeling] Explore more of our coverage of: Apple Research, Large Language Models, AI Advancements. Share this story by email
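
The core idea, flattening a screen into text an LLM can read, can be sketched as below. The layout scheme (sort elements by position, number them) is illustrative and is not Apple's exact encoding.

```python
# Sketch of screen-to-text conversion: order UI elements top-to-bottom,
# left-to-right, and emit one tagged line per element, so a reference like
# "the number at the bottom" can be resolved as ordinary language modeling.

def screen_to_text(elements):
    """elements: list of dicts with 'text' plus 'x', 'y' screen coordinates."""
    ordered = sorted(elements, key=lambda e: (e["y"], e["x"]))
    return "\n".join(f"[{i}] {e['text']}" for i, e in enumerate(ordered))

screen = [
    {"text": "Call 555-0123", "x": 10, "y": 200},
    {"text": "Contact: Alice", "x": 10, "y": 40},
]
prompt = screen_to_text(screen)
```

Here `prompt` lists "Contact: Alice" before "Call 555-0123" because it sits higher on the screen; the resulting text would be prepended to the user's query before it is passed to the model.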

Blog Post: AI Accuracy Improves While Costs Fall

First impacted: AI researchers, AI developers
Time to impact: Medium

Karina Nguyen, a member of the technical staff at Anthropic, shared a blog post showing that AI models are achieving higher reasoning accuracy on the Massive Multitask Language Understanding (MMLU) benchmark while their associated costs are decreasing, with current models reaching around 80% accuracy as costs fall year over year. Nguyen predicts that within the next 2-5 years these models could reach 95-100% MMLU accuracy while also becoming more affordable. [The cost of AI reasoning over time.] Explore more of our coverage of: AI Reasoning, MMLU Benchmark, Cost Efficiency. Share this story by email

Guide: Building Reliable Evaluation Systems for LLMs

First impacted: AI developers, AI quality assurance testers
Time to impact: Short

Hamel Husain, an independent consultant, has shared a guide on how to build reliable evaluation systems for LLMs, using Rechat's AI assistant, Lucy, as a case study. Husain recommends a three-level evaluation process: unit tests, model and human evaluation, and A/B testing, and suggests using an LLM to generate synthetic inputs for testing scenarios. [Your AI Product Needs Evals] Explore more of our coverage of: AI Evaluation, Large Language Models, Rechat's Lucy. Share this story by email
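
The first level, unit tests, amounts to cheap deterministic assertions run against model outputs for scripted scenarios. A minimal sketch, in which `fake_assistant` is a hypothetical stand-in for the real LLM call:

```python
# Level-1 "unit test" evals: deterministic checks on LLM outputs that can run
# on every code change, before any model-graded or human evaluation.

def fake_assistant(prompt: str) -> str:
    # Stand-in for the real assistant; in practice this calls your LLM.
    return "Here are 3 listings in Austin under $500k."

def eval_no_placeholders(output: str) -> bool:
    # The model should never leak unrendered template placeholders.
    return "{{" not in output and "}}" not in output

def eval_mentions_location(output: str, location: str) -> bool:
    # A listings answer should reference the location the user asked about.
    return location.lower() in output.lower()

def run_unit_tests(prompt: str, location: str) -> dict:
    output = fake_assistant(prompt)
    return {
        "no_placeholders": eval_no_placeholders(output),
        "mentions_location": eval_mentions_location(output, location),
    }
```

Because these checks are pure functions of the output string, they can run in CI on synthetic inputs, which is where an LLM-generated set of test scenarios fits in.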

OpenUI: Describe a UI in Text and See It Rendered Live

First impacted: Web Developers, UI/UX Designers
Time to impact: Short

AI firm Weights & Biases has launched OpenUI, an open-source tool that it says streamlines building UI components by letting users describe a design in text and see it rendered live. According to the company's co-founder, the tool can convert HTML into various formats, including React, Svelte, and Web Components. Check out the video and demo in the link! [GitHub - wandb/openui: OpenUI let's you describe UI using your imagination, then see it rendered live.] Explore more of our coverage of: OpenUI, UI Design, Open-Source Tools. Share this story by email

That’s it! More AI news tomorrow!