Skin‑cancer DL set hits 120K
A deep-learning model posted state-of-the-art performance on skin-cancer diagnosis after training on a 120,000-sample dataset, a scale that helps raise the bar for diagnostic-accuracy benchmarks. (x.com) That sample size and the reported performance make the result notable for clinical translation: bigger, more diverse datasets are what keep moving ML lab gains toward practical diagnostic tools. (x.com)
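Sample size also affects how much a single benchmark number can be trusted. The sketch below illustrates that general statistical point; it is not the study's method. It computes a 95% Wilson score interval for a hypothetical 90% accuracy measured on test sets of increasing size. The accuracy value and test-set sizes are assumptions for illustration; the 120,000 figure in the reporting refers to training samples, not test cases.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion such as accuracy or sensitivity.

    z = 1.96 gives an approximately 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z**2 / n
    # Wilson shifts the center slightly toward 0.5 and widens small-n intervals.
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: the same 90% accuracy observed on test sets of different sizes.
for n in (1_000, 10_000, 100_000):
    lo, hi = wilson_interval(round(0.9 * n), n)
    print(f"n={n:>7,}: 95% CI {lo:.4f} to {hi:.4f} (width {hi - lo:.4f})")
```

The interval shrinks roughly with the square root of the number of scored cases. That is why a result measured on a large, varied test set is harder to dismiss as luck than the same percentage reported on a few hundred images.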
A skin-cancer image model only gets good by seeing thousands of moles, rashes, and melanomas first. A computer learns this the way a resident doctor does: by comparing one labeled case after another until the visual patterns start to stick. (nature.com) The catch is that skin lesions can look wildly different across cameras, clinics, and skin tones. A model trained on a narrow image set can ace a benchmark and still stumble when it meets a real patient from a different population. (academic.oup.com)
That is why dataset size keeps coming up in dermatology artificial intelligence. The International Skin Imaging Collaboration’s 2020 challenge used 33,126 dermoscopic training images, and newer benchmarks have pushed far beyond that to test whether performance survives at larger scale. (isic-archive.com, isic-archive.com) One of them, the SLICE-3D dataset used in the 2024 International Skin Imaging Collaboration challenge, contains 400,000 skin-lesion image crops taken from 3D total-body photographs. That jump from tens of thousands to hundreds of thousands of images shows where the field is heading: bigger image banks, more varied lesions, and harder tests. (isic-archive.com)
Image type matters too. Many older systems were built on dermoscopic photos, which are magnified close-ups taken through a dermatoscope, but clinics also rely on ordinary clinical photos taken from farther away. (nejm.org) The MIDAS benchmark from Stanford enrolled 796 patients with 1,290 unique lesions and 3,830 total images, pairing dermoscopic shots with clinical photos and pathology labels. That kind of paired dataset lets researchers test whether a model can recognize the same lesion across different types of images instead of memorizing one camera style. (nejm.org)
Skin tone is another stress test.
Stanford’s Diverse Dermatology Images dataset was built specifically because many algorithms had not been rigorously assessed on darker skin tones or on uncommon diseases, gaps that can hide bias until a model leaves the lab. (aimi.stanford.edu)
That is the backdrop for a skin-cancer model posting state-of-the-art results after training on 120,000 samples. The number matters because it is large enough to shift the conversation from “can a neural network recognize melanoma in principle?” to “does it still work when the cases get messy and varied?” (nature.com, academic.oup.com)
Researchers have been chasing this for nearly a decade. A Stanford study published in Nature in 2017 reported dermatologist-level classification of skin cancer with deep neural networks, and the field has spent the years since trying to show that strong headline accuracy holds up on external datasets and in real clinics. (nature.com, nejm.org)
Regulators have started to engage as well. The Food and Drug Administration maintains a list of AI-enabled medical devices, and DermaSensor received a De Novo classification as a software-aided adjunctive diagnostic device for physicians evaluating lesions suspicious for skin cancer. Skin-cancer artificial intelligence is no longer just a conference demo; parts of the category are already entering the medical-device rulebook. (fda.gov, accessdata.fda.gov)
The hard part now is not squeezing out one more decimal point on a leaderboard. It is proving that a model trained on 120,000 examples keeps its edge across clinics, cameras, and patients who do not look like the training set. (academic.oup.com, aimi.stanford.edu)
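Checking whether a model "keeps its edge" usually means breaking evaluation results out by subgroup (skin tone, clinic, or camera type) instead of reporting one pooled number. Below is a minimal sketch of that idea, assuming hypothetical records of the form (group, true label, predicted label), where 1 means malignant and 0 means benign. The function names, group labels, and toy data are illustrative only and do not come from any of the cited datasets.

```python
import math
from collections import defaultdict


def per_group_metrics(records):
    """Sensitivity and specificity per subgroup.

    records: iterable of (group, y_true, y_pred) with 1 = malignant, 0 = benign.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1:
            c["tp" if y_pred == 1 else "fn"] += 1
        else:
            c["fp" if y_pred == 1 else "tn"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # Share of malignant lesions the model caught (NaN if none in group).
            "sensitivity": c["tp"] / positives if positives else float("nan"),
            # Share of benign lesions the model correctly cleared (NaN if none).
            "specificity": c["tn"] / negatives if negatives else float("nan"),
        }
    return results


def worst_group_gap(results, metric="sensitivity"):
    """Spread between the best and worst subgroup on one metric, ignoring NaNs."""
    values = [r[metric] for r in results.values() if not math.isnan(r[metric])]
    return max(values) - min(values)


# Hypothetical toy data: (skin-tone group, true label, model prediction).
records = [
    ("lighter", 1, 1), ("lighter", 1, 1), ("lighter", 1, 0),
    ("lighter", 0, 0), ("lighter", 0, 0),
    ("darker", 1, 1), ("darker", 1, 0), ("darker", 1, 0),
    ("darker", 0, 0), ("darker", 0, 1),
]
results = per_group_metrics(records)
for group, r in results.items():
    print(group, r)
print("sensitivity gap:", worst_group_gap(results))
```

Pooled over all ten toy cases, this model would look middling. Split by group, it catches twice the share of malignant lesions in one group as in the other, which is the kind of gap a single headline accuracy figure can hide.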