M1 Max runs LocateAnything model

- A researcher posted on X on June 1 that Nvidia's LocateAnything model ran on an M1 Max MacBook Pro through Apple's MPS backend.
- Nvidia's project page says LocateAnything is a 3B vision-language model for GUI grounding, OCR localization and dense object detection using parallel box decoding.
- Nvidia hosts the model on Hugging Face and published the paper on arXiv on May 26.

A researcher said on X on June 1 that Nvidia's LocateAnything vision-language model ran on an M1 Max MacBook Pro using Apple's Metal Performance Shaders, or MPS, backend rather than CUDA. The post said the setup produced real-time bounding-box detection on images and user-interface elements on Apple silicon. Nvidia published LocateAnything last week as a 3B model built for visual grounding and detection across general objects, documents, text and GUI targets.

That matters because LocateAnything was introduced by Nvidia as a unified model for "document understanding, GUI grounding, dense object detection, and OCR localization," tasks that are usually associated with GPU-heavy inference pipelines. Nvidia's project page says the model's core method, Parallel Box Decoding, predicts each bounding box in a single forward pass instead of generating coordinate tokens sequentially. (x.com)

### What exactly is LocateAnything doing in this demo?

Nvidia's paper describes LocateAnything as a vision-language grounding model that takes natural-language prompts and returns geometric outputs such as boxes or points. The project page says it can localize objects, interface controls, text regions and document elements under one model. (research.nvidia.com)

In practice, that means a user can ask for a target in plain language, such as a button in a screenshot or an object in a photo, and the model returns a bounding box rather than only a text answer. The X post described that behavior running locally on an M1 Max MacBook Pro through MPS. (research.nvidia.com)

### Why is the M1 Max detail getting attention?

Apple's MPS backend is the path PyTorch uses to run supported models on Apple silicon without Nvidia CUDA hardware. The significance of the June 1 post is not that Nvidia released a Mac-specific version, but that a newly released Nvidia model appears to have been made to run locally on an M1 Max laptop. That is an inference from the post and Nvidia's release materials. (x.com)

The X post's claim of real-time behavior also lines up with Nvidia's own emphasis on speed. Nvidia's project page says Parallel Box Decoding achieves "significantly faster decoding throughput," and the paper says the method improves both throughput and localization accuracy.

### What did Nvidia release last week?

Nvidia researchers posted the arXiv paper on May 26 and revised it on May 27. (x.com)

The paper says the team trained on more than 138 million samples and curated a dataset with 138 million language queries and 785 million bounding boxes covering object detection, GUI grounding, referring comprehension and text localization. (research.nvidia.com)

Hugging Face lists the checkpoint as `nvidia/LocateAnything-3B` and shows usage through Transformers and server frameworks including vLLM and SGLang. Nvidia also hosts a public demo Space that says larger inputs in the demo are auto-resized to 1K, while full-resolution inference is available when the weights are run locally. (arxiv.org)

### Does this mean Macs are now a standard target for the model?

Nvidia's published materials do not present LocateAnything as a Mac launch. The official pages describe the model, paper, demo and downloadable weights, while the M1 Max run appears in a third-party X post rather than Nvidia documentation. What the post does show, if replicated, is that the model can be adapted beyond a CUDA-only workflow for at least one Apple-silicon setup. (huggingface.co)

Nvidia's next public milestones are likely to be updates to the Hugging Face model page, demo and the dataset page marked "incoming" on the project site. (research.nvidia.com)
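
For readers who want to see what a non-CUDA setup looks like in practice, a first step is checking which PyTorch backend a given machine exposes. The sketch below is an illustration only, not the researcher's code and not anything from Nvidia's materials. The helper names `pick_device` and `detect_device` are hypothetical, and the CUDA-then-MPS-then-CPU ordering is an assumption about a sensible default. It does not load LocateAnything itself.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: choose a PyTorch device string.

    Assumed preference order: CUDA first, then Apple's MPS, then CPU.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Hypothetical helper: probe the local PyTorch install.

    Falls back to "cpu" when PyTorch is not installed at all.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    # Prints one of: cuda, mps, cpu
    print(detect_device())
```

On a machine set up like the one in the X post, a check of this kind would be expected to report `mps`. Whether every operation the model needs is supported on that backend is a separate question the post does not settle.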
