Independent Study Finds ChatGPT Health Frequently Mis-Triages Emergencies

A Mount Sinai evaluation reports major safety gaps in the health chatbot, prompting calls for stronger guardrails and ongoing independent auditing.

Overview

  • Researchers tested 60 clinician-authored vignettes across 21 specialties in roughly 960 interactions and found the chatbot under-triaged more than half of true emergencies.
  • In lower-risk cases, the system over-triaged many mild conditions that guidelines say can be managed at home, highlighting risks at both ends of severity.
  • Simulated social input from family or friends made the model nearly 12 times more likely to downplay symptoms, underscoring high sensitivity to context.
  • Suicide-risk safeguards proved inconsistent, with crisis banners appearing in some scenarios but disappearing in nearly identical cases after adding normal lab results.
  • OpenAI says the study misinterprets real-world use and that it is refining the model.
  • Clinicians urge people not to rely on chatbots for acute symptoms and note that data shared with such tools is not protected by HIPAA, even as companies say health data is segregated and not used to train models.