First Independent Study Finds Safety Gaps in ChatGPT Health Triage and Suicide Safeguards

Researchers urge ongoing independent testing with clinician guidance, given the tool's rapid public adoption.

Overview

  • Published February 23 in Nature Medicine, the Mount Sinai-led evaluation is the first independent safety assessment of ChatGPT Health since its January 2026 launch.
  • Across 960 interactions using 60 clinician-authored scenarios in 21 specialties, the system under-triaged more than half of physician-defined emergencies and misclassified 35% of non-urgent cases.
  • In suicide-crisis scenarios, the system's prompts to contact the 988 Lifeline triggered inconsistently, failing more often when users described specific self-harm plans than in lower-risk exchanges.
  • Performance was stronger in clear emergencies such as stroke or anaphylaxis but faltered in nuanced high-risk cases like diabetic ketoacidosis or impending respiratory failure, including an asthma case where it advised waiting despite danger signs.
  • OpenAI has reported roughly 40 million daily health users. The study's authors call for continuous independent evaluation with clinician oversight, and they plan to test future updates and additional use cases as the model evolves.