Supermemory tops LongMemEval benchmark

- Hylexa highlighted Supermemory on June 2, saying the open-source agent memory system extracts conversational facts into a knowledge graph and supports retrieval.
- Supermemory’s most-cited figure is 81.6% on LongMemEval-S, with its repository also advertising roughly 50-millisecond retrieval in one-call queries.
- The code is available on GitHub now, and LongMemEval’s maintainers provide the benchmark dataset and evaluation scripts publicly.

Hylexa on June 2 pointed to Supermemory, an open-source memory system for AI agents that says it can turn conversations into structured, retrievable facts. The project’s GitHub repository describes it as a “memory and context layer for AI” that learns from conversations, extracts facts, builds user profiles and handles contradictions and expired information. Supermemory’s research page says the system scored 81.6% on LongMemEval-S, a public benchmark for long-term chat memory, and the repository says retrieval takes about 50 milliseconds in a single call.

### What is Supermemory actually storing?

Supermemory’s repository says the system stores more than raw chat logs. It says the engine extracts facts from conversations, tracks temporal changes, resolves contradictions, forgets expired information and combines memory with retrieval-augmented generation in a single query. The same page says it can also build user profiles and connect to sources including Google Drive, Gmail, Notion, OneDrive and GitHub. (github.com)

The project’s research page says its design is aimed at “reliable recall, temporal reasoning, and knowledge updates at scale.” That matters because long-running agents often need to remember not just a sentence that appeared earlier, but whether a newer fact replaced an older one.

### What does the 81.6% score measure?

LongMemEval’s maintainers say the benchmark was built to test the long-term memory of chat assistants. (github.com) The GitHub page for the benchmark says it includes 500 questions and measures five core abilities: information extraction, multi-session reasoning, knowledge updates, temporal reasoning and abstention. It requires systems to process timestamped chat histories and answer questions after all interaction sessions are complete. (supermemory.ai)

Supermemory’s research page says its 81.6% result was on LongMemEval-S, the smaller cleaned version of the benchmark.
The page breaks that score into categories including 97.14% for single-session-user, 96.43% for single-session-assistant, 88.46% for knowledge-update, 76.69% for temporal-reasoning and 71.43% for multi-session, while listing Zep at 71.2% overall and full-context prompting at 60.2%. (github.com)

### Why are agent builders paying attention to memory systems?

The Supermemory repository says the system is designed for agents that need persistent context across conversations rather than one-off prompts. It says the engine can “deliver the right context at the right time” and build a persistent memory graph across discussions so an assistant can retain preferences, projects and past interactions. (supermemory.ai)

LongMemEval’s authors say the benchmark requires systems to parse dynamic interactions online and answer questions only after all sessions have passed. That setup mirrors a common production problem for agents: remembering what changed, what still applies and what should be ignored as stale.

### Is this just a benchmark story, or a product story too?

GitHub shows Supermemory as an active open-source project with more than 24,000 stars and recent commits within the last day. (github.com) The repository presents the system as both a memory engine for developers and a broader “context layer” with connectors, file processing and multimodal extraction.

The research page says the benchmark was chosen because it approximates “real-world chat history” and tests retrieval, reasoning over time and filtering noise. (github.com) That is the company’s own argument for why benchmark performance should translate into practical agent memory.

### Where can developers check the claims?

The LongMemEval benchmark repository says the dataset is publicly available through Hugging Face and includes scripts for evaluation.
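
Anyone comparing their own evaluation runs with a per-category breakdown like the one above should keep one detail in mind: an overall accuracy over a fixed question set is usually total correct answers divided by total questions, so categories with more questions count for more. The Python sketch below contrasts that question-weighted figure with a simple mean of category percentages. It assumes each category is summarized as (correct, total) counts. The category names and counts are hypothetical, not LongMemEval's actual category sizes, and whether Supermemory's 81.6% was computed exactly this way is an assumption.

```python
"""Question-weighted vs. unweighted accuracy (illustrative sketch).

Assumption: each category is summarized as (correct, total) question counts.
Category names and counts below are hypothetical, not LongMemEval data.
"""

from typing import Dict, Tuple

CategoryResults = Dict[str, Tuple[int, int]]  # category -> (correct, total)


def overall_accuracy(results: CategoryResults) -> float:
    """Total correct divided by total questions, so larger categories weigh more."""
    correct = sum(c for c, _ in results.values())
    total = sum(n for _, n in results.values())
    return correct / total


def mean_of_category_accuracies(results: CategoryResults) -> float:
    """Simple (unweighted) mean of per-category accuracies, for comparison."""
    return sum(c / n for c, n in results.values()) / len(results)


if __name__ == "__main__":
    # Hypothetical: a small category scored well, a large one scored worse.
    example: CategoryResults = {
        "small_category": (9, 10),   # 90% accuracy on 10 questions
        "large_category": (30, 50),  # 60% accuracy on 50 questions
    }
    print(f"question-weighted: {overall_accuracy(example):.1%}")              # 65.0%
    print(f"mean of categories: {mean_of_category_accuracies(example):.1%}")  # 75.0%
```

When the categories with more questions score lower, the question-weighted overall falls below the simple mean, so a headline number and a list of category scores should not be compared by eye.
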
LongMemEval’s maintainers also note that the benchmark was accepted at ICLR 2025, and in May 2026 they posted an update pointing users to LongMemEval-V2 for agentic contexts. (github.com)

Supermemory’s code, docs and research pages are already public, and the repository says developers can use the system through its open-source codebase and integrations now. (supermemory.ai) (github.com 1) (github.com 2)
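
The storage behavior Supermemory's repository describes (keeping extracted facts, letting a newer fact replace an older one, forgetting expired information) and LongMemEval's ingest-first, ask-later setup can be illustrated together with a toy Python sketch. This is not Supermemory's API or algorithm: `Fact`, `ToyMemory` and `run_episode` are hypothetical names, the sketch assumes facts have already been extracted from chat turns as timestamped (subject, attribute, value) triples, and it returns `None` as a stand-in for abstention.

```python
"""Toy temporal fact store with an ingest-then-answer loop (hypothetical sketch).

Not Supermemory's implementation. Assumes facts were already extracted from
chat turns as (subject, attribute, value) triples with timestamps.
"""

from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: datetime                   # when the fact appeared in the chat
    expires_at: Optional[datetime] = None   # None means "no known expiry"


class ToyMemory:
    """Keeps only the latest value per (subject, attribute) key."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Fact] = {}

    def ingest(self, fact: Fact) -> None:
        key = (fact.subject, fact.attribute)
        current = self._latest.get(key)
        # Knowledge update: a newer observation replaces the stored one;
        # an older observation never overwrites newer information.
        if current is None or fact.observed_at >= current.observed_at:
            self._latest[key] = fact

    def recall(self, subject: str, attribute: str, now: datetime) -> Optional[str]:
        fact = self._latest.get((subject, attribute))
        if fact is None:
            return None  # nothing stored: abstain rather than guess
        if fact.expires_at is not None and now >= fact.expires_at:
            return None  # expired information is treated as forgotten
        return fact.value


def run_episode(
    facts: List[Fact], questions: List[Tuple[str, str, datetime]]
) -> List[Optional[str]]:
    """Ingest every fact in time order first, then answer all questions."""
    memory = ToyMemory()
    for fact in sorted(facts, key=lambda f: f.observed_at):
        memory.ingest(fact)
    return [memory.recall(subject, attr, now) for subject, attr, now in questions]


if __name__ == "__main__":
    facts = [
        Fact("user", "city", "Boston", datetime(2024, 1, 5)),
        Fact("user", "city", "Denver", datetime(2024, 6, 1)),  # supersedes Boston
        Fact("user", "gym_pass", "active", datetime(2024, 2, 1),
             expires_at=datetime(2024, 3, 1)),
    ]
    asked_at = datetime(2024, 7, 1)
    questions = [("user", "city", asked_at), ("user", "gym_pass", asked_at),
                 ("user", "pet", asked_at)]
    print(run_episode(facts, questions))  # ['Denver', None, None]
```

The sketch skips the hard part: deciding from free-form chat which statements are facts and which older fact a new one contradicts. Matching on exact (subject, attribute) keys is a simplifying assumption.
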
