
Study Warns ChatGPT Health Under-Triages Emergencies, Raising Safety Concerns

  • Writer: Stephania Chopra
  • 13 hours ago
  • 2 min read

A major independent study published in Nature Medicine has raised significant safety concerns about ChatGPT Health, the consumer-facing medical guidance tool developed by OpenAI. The research found that the system under-triaged more than half of simulated emergency medical cases, suggesting that the AI could give inappropriate advice in potentially life-threatening situations.

[Image: medical emergency evaluation with an AI interface]
A new study finds that ChatGPT Health under-assesses serious medical emergencies, highlighting safety concerns about AI triage.

This finding has prompted experts to warn that reliance on AI tools for health-related emergency assessment could lead to delayed care and serious harm if users trust incorrect recommendations.


Study Design and Key Findings

Researchers at the Icahn School of Medicine at Mount Sinai conducted a structured evaluation of ChatGPT Health using 60 clinician-authored medical scenarios spanning 21 clinical specialties. Each scenario was tested under varied contextual conditions (16 variants per scenario), yielding 960 interactions with the AI tool.


The outcomes included:

  • ChatGPT Health under-triaged 52 percent of the cases that clinicians had classified as emergencies (the study's gold standard), directing users toward non-urgent care rather than recommending immediate hospital or emergency department evaluation.

  • The tool typically performed adequately for textbook emergencies such as classic stroke or severe allergic reactions but struggled with more nuanced or borderline cases.

  • It showed inconsistencies in crisis intervention guidance for scenarios involving suicidal ideation, where safety messages did not always trigger appropriately.

These results indicate that while the AI demonstrates some ability to identify serious conditions, its overall triage performance may be unreliable for urgent health decisions.
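To make the evaluation design concrete, here is a minimal sketch of how a triage benchmark like this could be scored. The data structures, the urgency scale, and the toy data are illustrative assumptions for this sketch, not the study's actual materials or code.

```python
# Hypothetical sketch of the evaluation grid described above:
# 60 scenarios x 16 contextual variants = 960 interactions.
# All names and data here are illustrative, not from the study.

from dataclasses import dataclass

# Assumed ordered urgency levels, lowest to highest.
URGENCY = ["self-care", "primary-care", "urgent-care", "emergency"]

@dataclass
class Interaction:
    scenario_id: int      # which clinician-authored vignette (0..59)
    variant_id: int       # which contextual framing of it (0..15)
    gold_urgency: str     # clinician gold-standard triage level
    model_urgency: str    # triage level recommended by the model

def is_under_triage(inter: Interaction) -> bool:
    """True when the model recommends a lower urgency than the gold standard."""
    return URGENCY.index(inter.model_urgency) < URGENCY.index(inter.gold_urgency)

def under_triage_rate(interactions: list[Interaction]) -> float:
    """Share of gold-standard emergency cases that the model under-triaged."""
    emergencies = [i for i in interactions if i.gold_urgency == "emergency"]
    misses = sum(is_under_triage(i) for i in emergencies)
    return misses / len(emergencies)

if __name__ == "__main__":
    # Toy example: one emergency vignette tested under two framings,
    # where the model routed one framing to urgent care instead.
    sample = [
        Interaction(0, 0, "emergency", "urgent-care"),
        Interaction(0, 1, "emergency", "emergency"),
    ]
    print(f"Under-triage rate: {under_triage_rate(sample):.0%}")  # 50%
```

Under this framing, the reported 52 percent figure would correspond to the under-triage rate computed across all interactions whose gold-standard label was an emergency.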


Safety Concerns and Expert Reactions

Experts reviewing the study results have expressed alarm over the potential risks of using AI systems for health triage without sufficient oversight or clinical safeguards. Some have characterized the situation as “unbelievably dangerous,” since misclassification or delayed emergency recognition could lead to preventable harm.


Critics also note that the system’s responses can be influenced by contextual cues such as comments from family or friends, which may shift recommendations toward less urgent care even when serious symptoms are present. This susceptibility to bias highlights limitations in the AI’s decision logic when applied to real-world health assessments.


Researchers emphasized that the study used simulated cases and that real-world validation is needed before drawing definitive conclusions about everyday use of ChatGPT Health. However, the findings underscore the importance of transparent evaluation and caution against over-reliance on AI tools for critical health decisions.


OpenAI Response and Ongoing Development

OpenAI acknowledged the study and welcomed independent research, noting that the results may not reflect typical real-life usage patterns and that the model undergoes continuous improvements. The company maintains that updates and refinements are ongoing to enhance performance and safety.


With millions of people reportedly using ChatGPT for health-related queries daily, the discussion around AI safety standards and regulatory oversight continues to grow. Experts call for clearer guidelines and more robust testing frameworks to ensure that AI health tools do not inadvertently endanger users by providing misleading or unsafe guidance.
