Study: AI misses primary diagnosis

A new study found AI language models failed to produce an appropriate early diagnosis more than 80% of the time in tested cases. (euronews.com) Observers note regulatory enforcement around AI in healthcare remains nascent even as patient use rises, raising questions about supervised clinical deployment. (medicalbuyer.co.in)

Artificial intelligence chatbots can sound like a doctor, but a new JAMA Network Open study found they still missed an appropriate early diagnosis in more than 80 percent of tested cases. (jamanetwork.com)

The researchers tested 21 off-the-shelf large language models on 29 standardized clinical vignettes from the January 2025 update of the MSD Manual and generated 16,254 responses in total. Analyses ran from January through December 2025. (jamanetwork.com)

The weakest step was differential diagnosis, the first ranked list of likely causes a clinician uses to decide what to test next. The study reported that final diagnosis and management scored better than that early-reasoning stage, but the authors wrote the models still could not be relied on for unsupervised patient-facing decisions. (jamanetwork.com)

The study matters because large language models are already being pitched for clinical use, while patients are using general-purpose chatbots for symptom advice outside hospitals and clinics. The paper said many past evaluations leaned on multiple-choice tests that do not capture the messier sequence of real patient care. (jamanetwork.com)

Regulators have approved many other kinds of medical artificial intelligence, especially devices, but that does not mean a general chatbot is cleared to diagnose patients on its own. The United States Food and Drug Administration says it maintains a public list of artificial intelligence-enabled medical devices authorized for marketing in the United States. (fda.gov)

Hospitals are also getting governance guidance before they get a single national rulebook for chatbot medicine. The Joint Commission and the Coalition for Health AI released initial guidance on September 17, 2025, covering local validation, monitoring, and policies for responsible use. (jointcommission.org)

States have started writing narrower laws aimed at transparency and oversight. California’s AB 3030, which took effect on January 1, 2025, requires notice when generative artificial intelligence is used in patient clinical communications unless a licensed or certified human provider reviews the message. (mbc.ca.gov) California also passed AB 489, effective October 1, 2025, barring artificial intelligence systems from suggesting a patient is under licensed human medical oversight when no such oversight exists. Texas laws that took effect in late 2025 and early 2026 require disclosure when artificial intelligence is used for diagnostic or treatment-related services and require review of artificial intelligence-created records. (fenwick.com) (agg.com) (texmed.org)

Not every study points in the same direction on every task. A February 2026 meta-analysis in npj Digital Medicine found large language model-assisted clinicians outperformed clinicians alone on several diagnostic measures, but the authors still called for rigorous real-world evaluation before clinical implementation. (nature.com)

The line emerging from the new evidence is narrower than the hype: these systems may help with parts of care, but the first diagnostic guess still breaks too often to hand over to a chatbot alone. (jamanetwork.com)
