Open-source single-camera player tracking demo

- Developer Piotr Skalski published an open-source basketball vision pipeline on September 30, 2025, showing single-camera player detection, tracking and jersey recognition.
- The released workflow combines RF-DETR, SAM2, SmolVLM2, SigLIP and ResNet, and Roboflow said it fine-tuned RF-DETR on 10 basketball classes.
- The code is available through Roboflow’s published notebook and model repositories, with sample videos and implementation details online.

Piotr Skalski published an open-source basketball computer-vision pipeline that shows how a single camera feed can be used to detect players, track them across frames and read jersey numbers, according to a Roboflow post published on September 30, 2025. The workflow uses RF-DETR for object detection, SAM2 for segmentation and tracking, and SmolVLM2 for optical character recognition on cropped jersey numbers, the post said. Roboflow said the code was released publicly through a notebook linked from the article, alongside example videos of the system running on game footage.

### Which parts of the demo are actually open source?

Roboflow’s September 30, 2025 post said the “basketball player recognition code” was open-sourced and linked to a Colab notebook for the workflow. The article described a pipeline that detects players, jersey numbers, the ball and the rim, then tracks players frame to frame and assigns identities from jersey reads. GitHub repositories for the underlying models are also public. (blog.roboflow.com)

Roboflow’s RF-DETR repository says the model is an open-source real-time object detection and segmentation architecture, while Meta’s SAM2 repository says it provides code, checkpoints and notebooks for inference with Segment Anything Model 2. Hugging Face has published SmolVLM2 model documentation and model cards for the vision-language model family used in the workflow.

### How does a single-camera system identify players?

The Roboflow post said the first step is RF-DETR-based detection of players, jersey numbers and basketball objects in each frame. SAM2 then tracks selected objects across video frames, producing segmentation masks rather than only bounding boxes, which the post said improves matching between players and number regions. SmolVLM2 is used to read cropped jersey numbers, and a fine-tuned ResNet classifier is used on number crops as part of the recognition stack, according to the Roboflow article.
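
The idea of matching a detected number region to a player's segmentation mask can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the code from the Roboflow notebook: the function names, the nested-list mask format and the 0.5 coverage threshold are all hypothetical choices made for this example.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[bool]]          # hypothetical format: rows of per-pixel booleans
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def box_coverage(mask: Mask, box: Box) -> float:
    """Return the fraction of the box's pixels that lie inside the mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    x0, y0, x1, y1 = box
    if x1 <= x0 or y1 <= y0:
        return 0.0
    area = (x1 - x0) * (y1 - y0)
    # Count only in-frame pixels; anything outside the frame counts as uncovered.
    inside = sum(
        1
        for y in range(max(y0, 0), min(y1, height))
        for x in range(max(x0, 0), min(x1, width))
        if mask[y][x]
    )
    return inside / area


def match_number_to_player(
    masks: Dict[int, Mask],
    number_box: Box,
    min_coverage: float = 0.5,  # assumed threshold, not taken from the post
) -> Optional[int]:
    """Pick the track whose mask covers most of the number box, if any."""
    best_id: Optional[int] = None
    best_cov = 0.0
    for track_id, mask in masks.items():
        cov = box_coverage(mask, number_box)
        if cov > best_cov:
            best_id, best_cov = track_id, cov
    return best_id if best_cov >= min_coverage else None


if __name__ == "__main__":
    # Toy 8x4 frame: player 1 occupies the left columns, player 2 the right.
    masks = {
        1: [[x < 3 for x in range(8)] for _ in range(4)],
        2: [[x >= 5 for x in range(8)] for _ in range(4)],
    }
    print(match_number_to_player(masks, (0, 0, 2, 2)))  # box inside player 1
    print(match_number_to_player(masks, (3, 0, 5, 2)))  # box on empty court
```

A mask follows the player's outline, so a number box that sits between two overlapping players is less likely to be credited to the wrong one than it would be with rectangular player boxes alone.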

(github.com) The same post said SigLIP embeddings and K-means clustering were used to separate players into teams based on visual similarity.

### What basketball events does the pipeline say it can tag?

Roboflow said it fine-tuned RF-DETR on a custom dataset with 10 classes: ball, ball-in-basket, number, player, player-in-possession, player-jump-shot, player-layup-dunk, player-shot-block, referee and rim. (blog.roboflow.com) Those labels indicate the system is built to do more than tracking alone, because it can attach possession and shot-type information to detected players and objects.

A course listing and discussion pages describing the same workflow said the pipeline also includes homography for mapping player positions to court coordinates and a shot-detection step for classifying outcomes. Those secondary sources align with the event-tagging claims in the demo materials, though the most specific class list comes from Roboflow’s own post. (blog.roboflow.com)

### Why do the model choices matter?

RF-DETR’s GitHub repository describes the model as a real-time transformer architecture for object detection and instance segmentation built on a DINOv2 backbone. Roboflow’s post said it used RF-DETR-S because it offered what the company called the best balance of speed and accuracy for basketball footage with motion blur, overlaps and small jersey numbers. Meta’s SAM2 repository says the model is designed for promptable segmentation in images and videos, with streaming memory for real-time video processing. (github.com)

Hugging Face’s SmolVLM2 materials describe the family as compact vision-language models built for image and video understanding, which fits the OCR-style jersey-reading step shown in the basketball pipeline. (github.com)

### How does this fit with earlier sports-vision work?

A CVPR 2024 workshop paper by Maria Koshkina and James H. Elder described jersey-number recognition as an important task in sports video analysis and released a public framework and dataset for that problem. That paper focused on hockey and soccer settings, but it provides prior academic context for the number-recognition problem the newer basketball demo is tackling with newer open models. (github.com)

The next concrete step for readers is the published notebook and model repositories: Roboflow’s post links the basketball workflow, RF-DETR’s GitHub repository shows the current package and release history, and Meta’s SAM2 repository provides checkpoints and inference examples for replication. (blog.roboflow.com) (arxiv.org)
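
For readers replicating the team-separation step, clustering per-player embeddings into two groups can be sketched as below. This is a minimal illustration, not the notebook's implementation: the short toy vectors stand in for SigLIP embeddings, and `kmeans_two_teams`, its deterministic farthest-point initialisation and the iteration cap are assumptions made for this example. A real replication would more likely use a library K-means on the actual embeddings.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans_two_teams(embeddings: List[Vector], max_iters: int = 50) -> List[int]:
    """Assign each embedding to cluster 0 or 1 with plain two-centroid K-means."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to form two teams")
    # Assumed deterministic init: the first point and the point farthest from it.
    first = list(embeddings[0])
    farthest = list(max(embeddings, key=lambda e: sq_dist(e, first)))
    centroids = [first, farthest]
    labels: List[int] = []
    for _ in range(max_iters):
        new_labels = [
            0 if sq_dist(e, centroids[0]) <= sq_dist(e, centroids[1]) else 1
            for e in embeddings
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        for k in (0, 1):
            members = [e for e, label in zip(embeddings, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = mean(members)
    return labels


if __name__ == "__main__":
    # Toy stand-ins for crop embeddings: three "light kit" and two "dark kit" players.
    toy = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0], [0.8, 0.2]]
    print(kmeans_two_teams(toy))
```

The cluster indices are arbitrary, so mapping cluster 0 or 1 to a named team still needs a separate rule, such as a known jersey colour.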
