NumKong for Lightweight RAG

NumKong (formerly SimSIMD) is positioning itself as an efficient choice for small-scale, local RAG across six languages and 20 formats, with around 1 million weekly PyPI downloads and support for hardware from Arm to mainframes. The project is aimed at production prototypes that need low-resource retrieval without heavy infrastructure. (x.com)

NumKong is an open-source math library that packages thousands of highly optimized routines for vector and matrix math on ordinary processors and in browsers, so applications can run similarity searches and other retrieval work without relying on specialized accelerators. (ashvardanian.com) The project is a relaunch of the earlier SimSIMD work, published by Ash Vardanian as a compact, multi-language toolkit with roughly 2,000 low-level kernels and about 200,000 lines of code, aimed at production prototypes and edge use. (ashvardanian.com)

NumKong focuses on mixed-precision arithmetic: it deliberately combines number formats of different sizes and accuracies, so you can trade a small amount of numerical precision for much lower memory use and faster computation. It also uses wider accumulators (summing small values into a larger number type) to prevent overflow and reduce precision loss during long dot products. (github.com)

Because NumKong was developed alongside the vector-search engine USearch and includes a WebAssembly backend for browsers and other sandboxed environments, the library is explicitly designed to perform the inner products and distance calculations that power retrieval-augmented generation (the step where documents are scored and returned as context) on ordinary servers or in the client rather than on a central GPU fleet. (ashvardanian.com) The recent v7.x releases add native support for 8-bit floating-point formats (FP8), very small numeric representations that reduce memory and bandwidth during model inference, and provide efficient software emulation where hardware lacks FP8 instructions, so older servers and browsers can still benefit. (github.com) (developer.nvidia.com)

NumKong ships a Python SDK that sits between NumPy and the native kernels, keeping familiar array interoperability while exposing low-precision dtypes and packed kernels. The project has also been packaged for common distribution systems (it is on PyPI, and a vcpkg port was added in late March 2026), so teams can evaluate it in existing build and deployment flows. (pypi.org) (vcpkg.roundtrip.dev)

In practical terms for production prototypes, NumKong's benchmarks report throughput of multiple giga-operations per second on batched dot products, alongside measured relative-error figures. That translates into higher throughput for CPU-based similarity scoring and lower memory bandwidth for local vector indexes, which is useful when you want a lightweight retrieval layer that runs on commodity servers or in-browser sandboxes instead of large GPU-backed infrastructure. (ashvardanian.com)
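
The wider-accumulator idea mentioned above can be illustrated with a minimal sketch in plain Python. This is not NumKong's API or implementation: it only emulates half-precision (float16) rounding with the standard library's `struct` module, and the helper names `to_f16`, `dot_narrow`, and `dot_wide` are hypothetical, chosen for this example. It compares a dot product whose running sum is kept in float16 with one whose sum is kept in a wider type.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_narrow(a, b):
    """Dot product whose running sum is rounded to float16 after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f16(acc + to_f16(x * y))
    return acc


def dot_wide(a, b):
    """Same float16 products, but accumulated in a float64 running sum."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_f16(x * y)
    return acc


# Illustrative 4096-dimensional vectors; every entry and every product
# (0.75 * 0.5 = 0.375) is exactly representable in float16.
a = [to_f16(0.75)] * 4096
b = [to_f16(0.5)] * 4096

exact = 4096 * 0.375
narrow = dot_narrow(a, b)
wide = dot_wide(a, b)

print(f"exact={exact} narrow={narrow} wide={wide}")
```

In this sketch the narrow accumulator stops growing once the running sum is large enough that each new product is smaller than half the gap between neighbouring float16 values, while the wide accumulator still matches the exact result. The inputs stay small in both cases; only the type of the running sum changes.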
