Sonar-visual underwater dataset accepted to ICRA 2026 workshop
- Weitung Chen and co-authors said on June 3 that their SOVIS sonar-visual underwater perception dataset had been accepted to an IEEE ICRA 2026 workshop.
- The paper reports more than 76,000 paired frames from 17 dives at six Trondheimfjord sites, plus a 7x fish-detection gain over a camera baseline.
- The authors said dataset assets, annotations and code will be released through a project page tied to the SOVIS paper.

Weitung Chen and four co-authors said on June 3 that their sonar-visual underwater robotics dataset, called SOVIS, had been accepted to the IEEE ICRA 2026 S2S workshop in Vienna. The paper describes a paired dataset built for cross-modal perception, where a robot learns from both acoustic sonar returns and camera imagery rather than from either sensor alone.

The work arrives as underwater robotics researchers push for more public data that can support detection, mapping and navigation in low-visibility environments. The workshop program lists the paper among presentations in “From Sea to Space: Advancing Perception in Harsh Domains.”

### Why are researchers pairing sonar with cameras underwater?

Underwater robots typically use cameras for semantic detail and sonar for range information that remains usable when water is dark, turbid or prone to backscatter, the authors wrote in the paper. Chen and his co-authors said the problem is that cross-modal learning between those sensors has remained limited because paired sonar-visual datasets are scarce. (arxiv.org)

The paper frames SOVIS as a dataset for that gap. Instead of treating sonar and vision as separate pipelines, the dataset is designed to align them in time so models can be trained to predict across modalities, including tasks such as deriving sonar-like structure from monocular images, according to the authors.

### What exactly is in SOVIS?

SOVIS comprises more than 76,000 paired frames collected across 17 dives at six sites in the Trondheimfjord, the paper says. (arxiv.org) The authors said the data was processed through an end-to-end pipeline that cleans and synchronizes the sonar and camera streams before training or annotation.

The paper also says the team built an interactive annotation tool to speed up labeling of the paired data. (arxiv.org) That matters because underwater datasets often require manual review across noisy acoustic imagery and visually degraded camera footage, which can make labeling slow and inconsistent. The authors did not give a public release date in the abstract, but said the dataset is intended to support broader research in cross-modal underwater perception.

### What result did the team show first?

The authors used a small labeled subset of SOVIS for a proof-of-concept fish-detection task. In that experiment, the paper reports a sevenfold improvement in mAP@0.10 over a monocular camera baseline.

That result is narrow and early by the authors’ own framing. The abstract presents it as a demonstration that paired sonar-camera data can improve perception performance, not as a comprehensive benchmark across all underwater tasks or environments. (arxiv.org)

### Where does the workshop acceptance fit?

The arXiv record says the paper was submitted on May 31 and accepted to the IEEE ICRA 2026 S2S Workshop, short for “From Sea to Space: Advancing Perception in Harsh Domains.” The workshop site lists the paper in its presented papers lineup for ICRA 2026 in Vienna on June 1. (arxiv.org)

The acceptance matters mainly as a venue signal. Workshop papers are typically used to surface early datasets, methods and problem statements to a specialist audience before wider benchmarking or follow-on releases. (arxiv.org) That characterization is based on the workshop format and the paper’s own presentation as a first step, not on any claim by the organizers about future publication. (arxiv.org)

### What happens next?

The June 3 announcement pointed readers to a forthcoming project page for the dataset and code, and the paper names Weitung Chen, Phil Tinn, Per Gunnar Auran, Martin Ludvigsen and Peter Halland Haro as authors. The public paper is already available on arXiv as 2606.01398, while the workshop listing places the presentation at the ICRA 2026 S2S program in Vienna. (arxiv.org)
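
For readers who want a concrete picture of what aligning sonar and camera streams in time can involve, the sketch below pairs each camera frame with the nearest sonar ping by timestamp. It is an illustration only and is not taken from the SOVIS pipeline, whose code has not been released. The function name `pair_by_timestamp`, the 50 ms tolerance and the sample timestamps are all assumptions made for this example.

```python
from bisect import bisect_left


def pair_by_timestamp(camera_ts, sonar_ts, max_gap_s=0.05):
    """Pair each camera timestamp with the nearest sonar timestamp.

    Hypothetical helper for illustration; not the SOVIS authors' method.
    Both lists hold seconds and must be sorted in ascending order.
    A pair is kept only if the two timestamps differ by at most max_gap_s.
    Returns a list of (camera_index, sonar_index) tuples.
    """
    pairs = []
    for cam_idx, t in enumerate(camera_ts):
        pos = bisect_left(sonar_ts, t)
        # Only the ping just before and the ping just after t can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sonar_ts)]
        if not candidates:
            continue  # no sonar data at all
        best = min(candidates, key=lambda i: abs(sonar_ts[i] - t))
        if abs(sonar_ts[best] - t) <= max_gap_s:
            pairs.append((cam_idx, best))
    return pairs


# Made-up timestamps: two camera frames have a sonar ping within 50 ms.
camera = [0.00, 0.10, 0.20, 0.30]
sonar = [0.02, 0.13, 0.50]
print(pair_by_timestamp(camera, sonar))  # [(0, 0), (1, 1)]
```

Frames with no ping inside the tolerance are dropped rather than matched to a distant one. This simple version also lets one sonar ping serve more than one camera frame, which a real pipeline might handle differently.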