New Technique Slashes LLM Memory Use

AI optimization firm Nota AI announced a new quantization technology that cut the memory usage of Upstage's Solar LLM by 72%. The company claims its "MoE Quantization" approach achieves this significant reduction while preserving the model's performance, a key step for efficient AI deployment.

The core challenge in deploying large-scale AI is managing the immense memory and computational power required, which often restricts models to data centers. Optimization techniques that shrink this footprint are critical for running AI on edge devices such as phones and vehicles, and for reducing latency and operational costs in enterprise applications.

The model at the center of this breakthrough, Upstage's Solar Open 100B, is a 102.6-billion-parameter large language model. It uses a Mixture-of-Experts (MoE) architecture, in which different sub-networks specialize in different tasks, activating only 12 billion parameters for any given token to improve efficiency.

Conventional quantization methods that compress a model uniformly can degrade the performance of MoE architectures. Nota AI's specialized algorithm, "MoE Quantization," selectively preserves precision in crucial areas of the model's expert sub-networks, cutting the Solar model's memory footprint from 191.2GB to 51.9GB. This approach keeps the model's perplexity score, a measure of how well it predicts text, nearly identical to the original.

The optimization directly reduces hardware requirements, halving the number of high-end GPUs needed for inference. Specifically, the quantized Solar model can run on two NVIDIA A100 (80GB) GPUs instead of the four required by the original, significantly lowering deployment costs.

Nota AI is a South Korean startup focused on AI optimization, backed by $42.6 million in funding from investors such as Samsung, LG, and Naver. The company is preparing for an IPO on the KOSDAQ market and has established partnerships with major hardware firms such as NVIDIA, Arm, and Intel through its NetsPresso optimization platform.

Upstage, the creator of the Solar LLM, is another major South Korean AI firm, having raised over $157 million from backers including Amazon, AMD, and Korea Development Bank. The company is actively expanding into the U.S. market, specifically targeting the insurance industry with its document AI and automation tools.

This collaboration was part of the "Sovereign AI Foundation Model Project," an initiative led by South Korea's Ministry of Science and ICT to foster the development of globally competitive, homegrown foundation models.
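
For readers who want to check the memory figures reported above, the arithmetic can be sketched in a few lines of Python. This is a rough back-of-the-envelope calculation, not Nota AI's method. It assumes the reported footprints count model weights only and that "GB" means binary gigabytes (2^30 bytes); the article states neither. The function names are illustrative.

```python
# Figures as reported in the article.
TOTAL_PARAMS = 102.6e9   # Solar Open 100B parameter count
ORIGINAL_GB = 191.2      # reported footprint before quantization
QUANTIZED_GB = 51.9      # reported footprint after quantization
GPU_MEMORY_GB = 80       # NVIDIA A100 (80GB)

# Assumption: the reported "GB" are binary gigabytes (2**30 bytes).
BYTES_PER_GB = 2**30


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


def average_bits_per_param(footprint_gb: float, n_params: float) -> float:
    """Average storage per parameter, assuming the footprint is weights only."""
    return footprint_gb * BYTES_PER_GB * 8 / n_params


def weights_fit(footprint_gb: float, n_gpus: int,
                gpu_memory_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit in the GPUs' combined memory.

    Ignores activations, KV cache, and framework overhead, so this is
    only a necessary condition, not a deployment guarantee.
    """
    return footprint_gb <= n_gpus * gpu_memory_gb


reduction = reduction_percent(ORIGINAL_GB, QUANTIZED_GB)
original_bits = average_bits_per_param(ORIGINAL_GB, TOTAL_PARAMS)
quantized_bits = average_bits_per_param(QUANTIZED_GB, TOTAL_PARAMS)

print(f"Memory reduction: {reduction:.1f}%")
print(f"Average bits per parameter, original:  {original_bits:.1f}")
print(f"Average bits per parameter, quantized: {quantized_bits:.1f}")
print(f"Original weights fit on 2 GPUs:  {weights_fit(ORIGINAL_GB, 2)}")
print(f"Original weights fit on 4 GPUs:  {weights_fit(ORIGINAL_GB, 4)}")
print(f"Quantized weights fit on 2 GPUs: {weights_fit(QUANTIZED_GB, 2)}")
```

Under these assumptions, the original footprint works out to roughly 16 bits per parameter and the quantized one to a little over 4 bits on average. That would be consistent with most weights being stored at low precision while some are kept at higher precision, though the article does not state the bit widths used.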
