ModelScope’s dots.mocr OCR

ModelScope released dots.mocr, a 3B multimodal OCR model that topped benchmarks and is now integrated into vLLM v0.11.0—potentially improving extraction quality for document‑heavy RAG systems. The model’s multimodal gains could reduce noisy retrieval signals from scanned documents. (x.com/ModelScope2022/status/2034826884018500081)

The dots.mocr paper "Multimodal OCR: Parse Anything from Documents" was posted to arXiv on March 13, 2026 and lists authors including Handong Zheng, Yumeng Li, Kaile Zhang and Xiang Bai among the contributing team. (arxiv.org) The project publishes both a general dots.mocr model and a dots.mocr‑svg variant that explicitly targets image→SVG conversion for charts, UI layouts and scientific figures. (huggingface.co) On the olmOCR benchmark table reproduced in the repo, dots.mocr posts per‑category scores such as 85.9, 85.5 and 90.7 on several splits and an overall reported score of 83.9 ± 0.9 in the authors' evaluation table. (github.com) The repository and Hugging Face card include an example serve command for production use: "vllm serve rednote‑hilab/dots.ocr --trust‑remote‑code" and note a vLLM model executor class for dots_ocr in the vLLM API docs. (stable-learn.com) A vLLM Docker image tag frequently referenced for deployment is vllm/vllm-openai:v0.11.0 (multi‑platform image, compressed layers ~11.6 GB in the registry metadata). (hub.docker.com) The project documentation and third‑party deployment guides show a Docker + vLLM compose path and state that performance was validated against the original out‑of‑tree registration during their vLLM server tests. (deepwiki.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.