Millions of Americans are now asking chatbots about rashes, coughs, and other health problems instead of starting with a doctor, but the machines they’re turning to are still lousy at medicine. A new study in JAMA Network Open found that 21 frontier large language models failed most often when symptoms were ambiguous and still missed a disturbing share of straightforward cases, while a separate survey suggests that a huge slice of the public is already using AI for health guidance anyway.
That combination is the real story: demand for AI medical advice is racing ahead of its reliability. The models may be getting better at sounding helpful, but good medicine depends less on sounding confident than on resisting the urge to guess. Chatbots keep doing the opposite.
What the JAMA study found
Researchers tested the models with realistic patient scenarios and asked them to act like doctors. When the symptoms could point to more than one condition, the failure rate topped 80 percent. Even with clearer cases that included physical exam findings and lab results, the models still got it wrong 40 percent of the time.
The pattern matters more than the raw miss rate: the models collapsed too quickly onto a single answer instead of working through a differential diagnosis the way clinicians do. That is the sort of error that can turn a harmless symptom check into a confident dead end.
- 21 frontier large language models were evaluated
- Failure rate was over 80 percent for ambiguous symptoms
- Failure rate was 40 percent even with exam findings and lab results
- The models tended to jump to one answer too early
Why people are still using ChatGPT and other chatbots for health
The survey side is just as sobering. One in four American adults, or 66 million people, said they have asked ChatGPT or a similar chatbot for medical advice. Many said they used AI before or after seeing a professional, but a meaningful share said the bot replaced a real visit entirely.
Among people who used AI for health questions, 14 percent said they skipped a provider they otherwise would have seen. Cost is clearly part of the appeal: 27 percent said they did not want to pay for a visit, and 14 percent said they could not afford one. Factor in time pressure, access barriers, and plain convenience, and the chatbot starts looking less like a novelty and more like a triage shortcut.
The confidence problem
The nastiest part is that wrong answers can still feel useful. Almost half of respondents said AI made them feel more confident speaking with a provider, 22 percent said it helped them spot issues earlier, and 19 percent said it helped them avoid unnecessary tests or procedures. That can be a feature when the advice is solid. When the advice is hallucinated nonsense, the confidence it inspires is just branding.
There is also healthy skepticism in the mix. About a third of respondents who used AI for health issues said they distrusted it, and one in ten said the advice was potentially unsafe. So the public appears to know these tools are shaky while using them anyway, which is a very internet-shaped way to approach medicine.
AI health advice is outrunning oversight
None of this is happening in a vacuum. Google has already had to clean up absurd AI-generated answers in its search results, and doctors have reported transcription tools inventing medications that never existed. The problem is not just factual error; it is false certainty delivered at scale, wrapped in a chat interface that makes bad advice feel personal.
That is why this looks less like a quirky tech hiccup and more like an overdue regulatory fight. Hospitals, insurers, and platform makers are all moving into AI-assisted care, but the public is already living with the consequences before any serious guardrails have caught up. The next question is whether regulators move first, or whether the next wave of patients learns the hard way that a fluent answer is not the same thing as a correct one.