AI: New research expanding AI capabilities (1.3.24)

Team, today's top 6 stories in AI are dominated by new research, each one representing expanded capabilities.

For some of us, these new capabilities are directly relevant to our work; the rest of us can still find valuable connections to it: inspiration for innovation, help with cross-disciplinary problem-solving, and a broader perspective.

Let me know how you put any of this to work so we can share it. Speaking of cool things readers of this newsletter are sharing, check out Justin Kistner’s smart new 5 Whys GPT (“ChatGPT is trained on data about the method. However, it's not trained on how to facilitate the method.”) and Dr. Phil Hendrix’s 21-day deep dive into generative AI, which he’s presenting to an enterprise CMO and making publicly available on LinkedIn. It’s good.

And now today’s top news in AI, according to our weighted analysis of AI community engagement.

Marshall Kirkpatrick, Editor

P.S. Want to see and hear me walk through these stories in a short video? I’ve posted a 6-minute video here on Twitter.

First impacted: Financial Analysts, Data Scientists
Time to impact: Medium

The AI team at JPMorgan has launched DocLLM, a model they say outperforms state-of-the-art LLMs "on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets." DocLLM is a lightweight extension of traditional LLMs that incorporates a document's physical layout (using bounding-box information rather than a heavy image encoder) and is fine-tuned on a large instruction dataset. "In addition to its immediate utility in visually rich document understanding tasks, we posit that DocLLM offers an opportunity to change the landscape of generative pre-training by enabling language models to go beyond next token prediction in plain text settings." Cool! [Paper page - DocLLM: A layout-aware generative language model for multimodal document understanding] Explore more of our coverage of: JPMorgan, multimodal. Share this story by email
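For the technically curious, here's a toy sketch of what that layout-aware ("disentangled spatial") attention looks like as I read the paper: the text tokens and their bounding boxes get separate projections, and the cross terms are mixed with weighting factors. The random weights and lambda values below are illustrative stand-ins, not JPMorgan's code.

```python
import torch

def disentangled_attention_scores(text_emb, box_emb, d_head=64,
                                  lam_ts=1.0, lam_st=1.0, lam_ss=1.0):
    """Toy attention logits mixing a text stream with a bounding-box stream."""
    d_model = text_emb.shape[-1]
    # separate query/key projections for the text and spatial (layout) streams;
    # random matrices stand in for the model's learned parameters
    Wq_t, Wk_t = torch.randn(d_model, d_head), torch.randn(d_model, d_head)
    Wq_s, Wk_s = torch.randn(d_model, d_head), torch.randn(d_model, d_head)
    Qt, Kt = text_emb @ Wq_t, text_emb @ Wk_t
    Qs, Ks = box_emb @ Wq_s, box_emb @ Wk_s
    # scores decompose into a text-text term plus three layout cross terms,
    # each weighted by a lambda hyperparameter
    scores = (Qt @ Kt.T
              + lam_ts * (Qt @ Ks.T)
              + lam_st * (Qs @ Kt.T)
              + lam_ss * (Qs @ Ks.T))
    return scores / d_head ** 0.5

tokens, d_model = 6, 32
print(disentangled_attention_scores(torch.randn(tokens, d_model),
                                    torch.randn(tokens, d_model)).shape)
```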

First impacted: AI researchers, AI developers
Time to impact: Medium

A research team from UCLA has published a new fine-tuning method, Self-Play fIne-tuNing (SPIN), which they say enhances the performance of LLMs without requiring additional human-annotated data. The method aims to grow a strong LLM out of a weak one by having two players, both instances of the LLM, play a game against each other: one tries to determine whether the answer to a question came from the LLM or from a human, and the other tries to fool it. [Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models] Explore more of our coverage of: Self-Play, Fine-Tuning, Human-Level Performance. Share this story by email
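If you're curious what that game looks like as a training objective, here's a minimal sketch of my reading of the SPIN loss: structurally similar to DPO, except the "rejected" answer is generated by the previous iteration of the model itself. The function name and toy log-probabilities are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def spin_loss(logp_human_cur, logp_human_prev,
              logp_synth_cur, logp_synth_prev, beta=0.1):
    """Logistic loss nudging the current model to prefer the human-written
    response over the one its previous iteration generated for the same prompt."""
    human_ratio = logp_human_cur - logp_human_prev   # current vs. previous model
    synth_ratio = logp_synth_cur - logp_synth_prev
    margin = beta * (human_ratio - synth_ratio)      # the human answer should "win"
    return -F.logsigmoid(margin).mean()

# toy usage with fake sequence log-probabilities for a batch of 4 prompts
torch.manual_seed(0)
print(spin_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)))
```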

First impacted: AI researchers, AI developers
Time to impact: Medium

A new collaborative research paper introduces a technique called Self-Extend, which the researchers say can expand the context window of LLMs without additional fine-tuning. The method, requiring only a four-line code modification, builds bi-level attention information: exact relative positions for nearby (neighbor) tokens, and coarser grouped positions, obtained with a floor operation on the position indices, for tokens that are far apart. According to the paper, experiments show it can effectively extend the context window of existing LLMs without extra training. [LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning] Explore more of our coverage of: Self-Extend Technique, Context window. Share this story by email
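As a rough illustration of those bi-level positions (not the authors' actual four-line patch), here's one way the relative position indices could be remapped; the `window`, `group`, and boundary-shift values are my assumptions based on the paper's description.

```python
import torch

def self_extend_rel_pos(seq_len, window=4, group=2):
    """Relative positions: exact inside `window`, floor-grouped beyond it."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions i
    k = torch.arange(seq_len).unsqueeze(0)   # key positions j
    exact = q - k                            # normal relative distance
    grouped = q // group - k // group        # coarse distance for far-apart tokens
    # shift the grouped distances so they roughly line up with the exact ones
    # at the window boundary, keeping positions from jumping backwards
    grouped = grouped + (window - window // group)
    return torch.where(exact <= window, exact, grouped)

print(self_extend_rel_pos(8))
```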

First impacted: Early-stage startup owners, Startup community members
Time to impact: Short

Hugging Face's CEO, Clement Delangue, expressed concern in a tweet about the potential difficulties early-stage startups may face in 2024, particularly those nearing the end of their financial resources without having established product-market fit. He encouraged these startups to consider joining Hugging Face, citing the platform's past successful collaborations with projects such as Gradio and Timm. [via @ClementDelangue] Explore more of our coverage of: Startup Challenges, Hugging Face, Product-Market Fit. Share this story by email

First impacted: Robotics engineers, AI researchers
Time to impact: Medium

A Stanford and DeepMind team has developed a robot, Mobile ALOHA, which they say can autonomously perform complex tasks such as cooking and serving shrimp, operating an elevator, and storing a 3 lb pot in a cabinet after learning from just 50 demonstrations. The robot is said to perform tasks reliably, succeeding in 9 consecutive attempts at wiping up a ring left by a wine glass on a table and 5 attempts at calling an elevator, even when the researchers try to distract it by throwing things at it! If you've ever dreamed of having a robot that can high-five people, or secure them with zip ties, today could be your lucky day. [via @zipengfu] Explore more of our coverage of: Autonomous Robots, Imitation Learning. Share this story by email

First impacted: Graphic Designers, AI Researchers
Time to impact: Short

Alibaba has launched AnyText, a diffusion-based multilingual visual text generation and editing model, which the company says focuses on rendering accurate and coherent text within images. In addition, Alibaba has released a large dataset of multilingual text images, AnyWord-3M, and proposed AnyText-benchmark, a tool for evaluating visual text generation accuracy and quality, with plans to make the project open-source. [AnyText - a Hugging Face Space by modelscope] Explore more of our coverage of: Alibaba AI, Visual Text Generation, Open-Source Projects. Share this story by email

That’s it! More AI news tomorrow!