Overview
- An independent Nature Medicine study reported a 52% under‑triage rate for gold‑standard emergencies, including advice to wait 24–48 hours in scenarios like diabetic ketoacidosis and impending respiratory failure.
- Performance followed an inverted U pattern, with stronger results on textbook cases but notable failures at high‑risk extremes and a 64.8% over‑triage rate in lower‑risk scenarios.
- Suicide‑crisis messaging triggered inconsistently, appearing more reliably in lower‑risk descriptions while sometimes failing when users described specific plans for self‑harm.
- Context strongly influenced outputs, as mentions of family minimizing symptoms shifted recommendations toward less urgent care with an odds ratio of about 11.7.
- OpenAI welcomed the research yet said it may not reflect typical real‑life use, emphasized ongoing model updates and that ChatGPT Health is not a substitute for medical care, and cited roughly 40 million U.S. adult users daily.