LLM memory breakthrough for living docs

EverMind unveiled an MSA (Memory Sparse Attention) architecture that promises efficient long‑term memory for LLMs, potentially letting AI-powered knowledge bases retain institutional context across millions of tokens. That kind of persistent memory could make automated documentation assistants that keep context for years plausible, rather than tools whose memory lasts only a single session. (prnewswire.com)

EverMind released its Memory Sparse Attention (MSA) research paper on March 18, 2026, and followed with a PR Newswire announcement on March 19, 2026. The announcement points to a Zenodo record for the paper and an open GitHub repository for MSA. (prnewswire.com)

The MSA README reports less than 9% performance degradation when a model's context is scaled from 16K to 100M tokens, and claims linear, O(L), computational complexity via document-wise RoPE. (github.com) EverMind describes KV-cache compression plus a "Memory Parallel" inference engine built on tiered storage: routing keys stay resident on the GPU while content K/V lives in CPU memory, with distributed scoring and on-demand transfers between the tiers. According to EverMind, this design supports 100-million-token inference on 2× A800 GPUs. (github.com)

The team reports that MSA outperforms both same-backbone RAG systems and best-of-breed RAG pipelines on long-context QA and Needle-In-A-Haystack (NIAH) benchmarks, remaining stable and accurate at extreme context lengths. (github.com)

EverMind positions MSA within its broader EverMemOS stack and publishes LoCoMo (93.05%) and LongMemEval-S (83.00%) scores on its site as evidence of sustained long-term memory performance. (evermind.ai) The public GitHub repository shows 52 commits but lists "Code" and "Models" as coming soon, even though the press release links to both the paper and the code. This signals a research-first release, with artifacts for adopters arriving in stages. (github.com)
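The announcement does not describe MSA's internals in enough detail to reproduce them, and the repository still lists code as coming soon. As a rough illustration of the general idea behind a tiered sparse memory (not EverMind's implementation), the sketch below keeps one small routing key per document in a "fast" tier, scores only those keys for each query, and pulls the full per-document K/V from a "slow" tier on demand for the top-k documents before running ordinary softmax attention over them. The class name `TieredMemory`, the mean-pooled routing keys, and the `top_k` parameter are hypothetical choices made for this sketch; plain Python lists stand in for GPU and CPU memory.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TieredMemory:
    """Hypothetical two-tier memory: compact routing keys in a fast tier,
    full per-document keys/values in a slow tier fetched only when selected."""

    def __init__(self):
        self.routing_keys = []  # fast tier (stand-in for GPU memory): one key per document
        self.content_kv = {}    # slow tier (stand-in for CPU memory): doc_id -> (keys, values)
        self.fetches = 0        # number of slow-tier transfers performed

    def add_document(self, keys, values):
        doc_id = len(self.routing_keys)
        dim = len(keys[0])
        # Assumption for this sketch: the routing key is the mean of the document's token keys.
        routing = [sum(k[i] for k in keys) / len(keys) for i in range(dim)]
        self.routing_keys.append(routing)
        self.content_kv[doc_id] = (keys, values)
        return doc_id

    def route(self, query, top_k):
        # Score every document using only the small fast-tier routing keys.
        ranked = sorted(
            range(len(self.routing_keys)),
            key=lambda d: dot(query, self.routing_keys[d]),
            reverse=True,
        )
        return ranked[:top_k]

    def attend(self, query, top_k=2):
        selected = self.route(query, top_k)
        keys, values = [], []
        for doc_id in selected:
            k, v = self.content_kv[doc_id]  # on-demand transfer from the slow tier
            self.fetches += 1
            keys.extend(k)
            values.extend(v)
        # Standard scaled dot-product attention, restricted to the selected documents.
        scale = 1.0 / math.sqrt(len(query))
        weights = softmax([dot(query, k) * scale for k in keys])
        dim_v = len(values[0])
        out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
        return out, selected
```

For example, with three stored documents and `top_k=1`, a query of `[1.0, 0.0]` routes only to the document whose routing key points the same way, so just one document's K/V is transferred and attended over. The design choice being illustrated is that per-query cost scales with the routing keys plus the few selected documents rather than with every stored token.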
