Chatbots still trip on medical Qs

New coverage reports that chatbots can still give incorrect answers to medical questions, and it lays out four prompt tips aimed at improving accuracy when using AI for health queries. The write‑up stresses that prompt quality and manual validation remain important when relying on conversational AI for sensitive topics. (mashable.com)

People are asking chatbots medical questions every day, but new reporting says the answers can still be wrong in ways that miss emergencies. (mashable.com) Mashable reported on April 12 that three recent studies found large language models were less reliable on health questions than many users assume, even when the bots sounded confident. The article cites one study that found ChatGPT Health, launched in January 2026, “under-triaged” slightly more than half of the cases it was given, including emergencies that needed immediate care. (mashable.com; openai.com)

A February 2026 Oxford-led study in *Nature Medicine* put nearly 1,300 online participants through medical scenarios and found that people using large language models did no better than people relying on traditional methods such as search or their own judgment. The researchers said users often left out details the models needed, while the models mixed good and bad advice in the same answer. (ox.ac.uk)

A separate physician-led study published February 13, 2026, tested Claude, Gemini, GPT-4o, and Llama on 222 patient-style primary care questions. Problematic response rates ranged from 21.6 percent for Claude to 43.2 percent for Llama, and unsafe response rates ranged from 5 percent for Claude to 13 percent for GPT-4o and Llama. (nature.com)

The problem starts with how these systems work: they generate plausible next words based on patterns in huge training datasets, rather than delivering a checked medical verdict the way a doctor would. The World Health Organization said in January 2024 that health-focused generative artificial intelligence can produce false, inaccurate, biased, or incomplete statements that can affect decisions. (who.int)

The audience is already large. OpenAI said in January 2026 that more than 230 million people globally ask health and wellness questions on ChatGPT every week, and Pew Research Center said on April 7 that 22 percent of United States adults get health information from artificial intelligence chatbots at least sometimes. (openai.com; pewresearch.org)

Mashable’s four prompt tips are practical, not technical. The article says users should, first, test the chatbot with a known falsehood; second, give it exact details about age, sex, symptoms, medications, and timing; third, ask it to explain what information it is missing; and fourth, press it to name red-flag symptoms that require urgent care. (mashable.com)

Those steps do not turn a chatbot into a clinician. OpenAI’s help page for ChatGPT Health says the product is designed to support, not replace, medical care, and Anthropic says Claude is not a substitute for professional advice or medical care. (help.openai.com; anthropic.com)

Users seem to understand part of that tradeoff. Pew found that most people who use social media or artificial intelligence chatbots for health information do not rate that information as highly accurate, though many rate it as convenient and easy to understand. (pewresearch.org)

That leaves the rule the studies and companies keep returning to: use the bot to organize questions, not to settle diagnosis or urgency on its own. On medical questions, the safest answer still comes from a licensed human who can examine the patient, check the record, and be accountable for the call. (ox.ac.uk; help.openai.com)
