Raw LLMs can shift medical advice
Researchers and practitioners are warning that, without safety layers, large language models can alter medical guidance or drift away from it, a reminder that relying on a base model alone for health advice is risky. (Virendra Singh Bhalothia highlighted an arXiv paper showing that LLMs change medical recommendations when unguarded.) (x.com)
A large language model is the autocomplete behind a chatbot: it predicts the next word from patterns in huge text datasets, not by checking a medical rulebook line by line. In medicine, that means a model can sound like a careful clinician while still stitching together advice from old papers, mixed guidelines, and misleading prompts. (arxiv.org) (who.int)
A safety layer is the extra system wrapped around that base model, like guardrails on a mountain road. It can add refusal rules, retrieval from current guidelines, escalation to a human, and warnings when a question looks urgent or high risk. (who.int) (jmir.org)
Take those guardrails away, and the same underlying model can drift. A May 2025 arXiv paper from researchers at Dartmouth, Massachusetts General Hospital, and Northwestern built a benchmark called DriftMedQA and found seven state-of-the-art models struggled to reject outdated recommendations across 4,290 scenarios based on changing clinical guidance. (arxiv.org) That kind of drift is not just “the model is a little behind.” The paper says models often endorsed conflicting guidance when recommendations changed over time, which is the medical equivalent of a map that still sends drivers onto a bridge that has already been closed. (arxiv.org)
Researchers have been measuring the same problem from another angle: basic safety. MedSafetyBench, an arXiv benchmark first posted in March 2024 and revised in October 2024, found publicly available medical language models did not meet standards of medical safety until they were specifically tuned for it. (arxiv.org)
Even when a model starts with decent behavior, outside text can push it off course. A December 19, 2025 study in JAMA Network Open tested prompt injection, which is hidden or malicious text designed to steer a model, and found attacks succeeded in 94.4% of 216 simulated patient dialogues and 91.7% of extremely high-harm scenarios. (jamanetwork.com) The examples were not minor. The JAMA paper says manipulated systems recommended unsafe or contraindicated treatments, including pregnancy Category X drugs such as thalidomide in a high-risk scenario. (jamanetwork.com)
This is why “raw model” versus “medical assistant” is not a branding detail. The World Health Organization warned in January 2024 that large multimodal models in health care need governance, human oversight, and risk management because these systems can generate plausible but incorrect outputs in sensitive settings. (who.int)
You can see the value of those protections in newer model behavior. A March 16, 2026 Journal of Medical Internet Research study looked at 908 responses from four popular models to 227 real patient questions and found the systems increased disclaimers and referrals as case urgency rose, which is exactly the kind of wrapper behavior a bare base model does not reliably provide. (jmir.org)
The practical rule is simple: if a system is giving health advice, the base model is only one part of the machine. The part that checks current guidance, flags emergencies, resists manipulation, and hands off to humans is the part keeping “sounds right” from turning into “dangerously wrong.” (arxiv.org) (jamanetwork.com) (who.int)
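For readers who build these systems, the wrapper described above (resist manipulation, flag emergencies, check current guidance, hand off to humans) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how any of the cited studies or products work: `safe_answer`, `Reply`, the keyword lists, and the `base_model` and `lookup_guideline` callables are all hypothetical names invented here, and simple keyword matching is a stand-in for the much stronger classifiers a real deployment would need.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword lists for illustration only; not clinically validated.
URGENT_TERMS = ("chest pain", "can't breathe", "overdose", "suicidal")
INJECTION_MARKERS = ("ignore previous instructions", "disregard the above")


@dataclass
class Reply:
    text: str
    escalated: bool = False  # True when the question is handed to a human
    refused: bool = False    # True when the wrapper declines to answer


def safe_answer(
    question: str,
    base_model: Callable[[str], str],                   # assumed: prompt -> raw text
    lookup_guideline: Callable[[str], Optional[str]],   # assumed: question -> current guidance
) -> Reply:
    q = question.lower()

    # 1. Resist manipulation: refuse input that looks like a prompt injection.
    if any(marker in q for marker in INJECTION_MARKERS):
        return Reply("I can't act on those instructions.", refused=True)

    # 2. Flag emergencies: skip the model entirely and hand off.
    if any(term in q for term in URGENT_TERMS):
        return Reply(
            "This may be an emergency. Please contact emergency services "
            "or a clinician now.",
            escalated=True,
        )

    # 3. Check current guidance: without it, escalate instead of guessing.
    guideline = lookup_guideline(question)
    if guideline is None:
        return Reply(
            "I don't have current guidance on this. Please ask a clinician.",
            escalated=True,
        )

    # 4. Only now call the base model, grounded in the retrieved text.
    prompt = f"Answer using only this guidance:\n{guideline}\n\nQuestion: {question}"
    answer = base_model(prompt)
    return Reply(answer + "\n\nThis is general information, not a diagnosis.")
```

The design choice worth noticing is the ordering: the base model is called last, and only after the cheaper checks have had a chance to refuse or escalate, so a missing guideline or an urgent question never reaches the raw model at all.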