TheLens_oph finds limits in glaucoma AI

- A 2025 meta-analysis in *Biomedicines* says deep-learning systems can detect glaucoma well on fundus photos and OCT scans, but deployment still looks premature.
- The paper pooled 48 studies, not 72, and found external validation lagged internal testing while progression-prediction models remained noticeably weaker than diagnosis models.
- That matters because glaucoma screening needs reliability across clinics, devices, and populations, not just strong scores on curated research datasets.

Glaucoma AI keeps producing impressive numbers. That part is real. But the harder question is whether those numbers survive contact with actual clinics: different cameras, different patient populations, different definitions of disease, and follow-up over time. A 2025 systematic review and meta-analysis sharpens that gap: deep-learning models look strong for diagnosing glaucoma from eye images, but the evidence base is still uneven and much less convincing for predicting progression.

### What kind of AI are we talking about?

This is mostly image-reading AI. The models take fundus photographs (pictures of the back of the eye) or OCT scans, which are cross-sectional images of retinal layers, and try to classify whether glaucoma is present. That is a good fit for deep learning because glaucoma leaves structural clues in the optic nerve head and retinal nerve fiber layer that can show up in those images.

### What does the review actually cover?

The paper people are pointing to is *Deep Learning in Glaucoma Detection and Progression Prediction*, published February 10, 2025 in *Biomedicines*. It searched the literature through October 30, 2024 and included 48 studies in the quantitative meta-analysis. That matters because the “72 studies” figure floating around online does not match the meta-analysis count in the paper itself.

### How good are the diagnosis models?

Pretty good on paper. For fundus photography, the pooled sensitivity was 0.92 and specificity was 0.93, with an AUROC of 0.90. For OCT, pooled sensitivity was 0.90 and specificity was 0.87, with an AUROC of 0.86. In plain English, these systems were usually strong at separating glaucoma from non-glaucoma cases in the datasets they were tested on.

### Then where’s the catch?

Internal test sets usually flatter a model because they look like the data it already learned from. The review says internal validation outperformed external validation, which is exactly the warning sign clinicians worry about.
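
One way to see why strong pooled scores do not settle the screening question is to run the arithmetic on prevalence. The sketch below is illustrative only: it takes the review's pooled fundus figures (sensitivity 0.92, specificity 0.93), assumes they hold unchanged in every setting, and applies Bayes' rule at three made-up prevalence levels. The function name and the prevalence values are assumptions for illustration, not numbers from the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # share of positive calls that are real
    npv = true_neg / (true_neg + false_neg)  # share of negative calls that are real
    return ppv, npv


# Pooled fundus-photo figures reported in the review.
SENSITIVITY, SPECIFICITY = 0.92, 0.93

# Hypothetical prevalences (assumed, not from the paper): a low-prevalence
# screening setting, a mixed clinic, and an enriched referral population.
for prevalence in (0.02, 0.10, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

Under these assumptions, the same model goes from about 9 in 10 positive calls being correct at 40% prevalence to roughly 1 in 5 at 2% prevalence. That is before any drop in sensitivity or specificity from new cameras or labeling rules, which this sketch does not model.
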
A separate 2023 *npj Digital Medicine* paper makes the same point bluntly: many glaucoma models look excellent inside one clinic and then lose accuracy when prevalence, camera hardware, or disease labels shift.

### Why is external validation such a big deal?

Because medicine is not Kaggle. A glaucoma model might learn quirks of one imaging device, one referral center, or one labeling style instead of learning glaucoma itself. Think of it like a student who memorized one practice exam: great score, wrong skill. If the model is going to screen people across community clinics, academic hospitals, and different countries, it has to stay reliable when the background conditions change.

### What about predicting progression?

That is the weaker part of the story. Diagnosing existing glaucoma from images is one task. Predicting which patients will worsen over time is harder because glaucoma progression is longitudinal and often depends on more than a single image. The 2025 review says progression models were less robust and likely need multimodal inputs like visual field testing, not just imaging alone.

### Overhyped?

Not exactly. The evidence points to a near-term role of assistance, not autonomy. There are already examples of more generalizable systems (one 2023 model was tested on 149,455 fundus images from 13 data sources and still performed strongly), but even that paper called for prospective validation. The field is moving, just not to “replace the clinician” territory yet.

Glaucoma AI is good at the easy part of innovation theater: posting high retrospective accuracy. The hard part is proving the model works across messy real-world care and over time. That proof is still being built.
