Chatbot medical responses get good reviews in UCSD study

John Ayers, a UC San Diego computational epidemiologist, led a team that explored using a chatbot to answer routine medical questions.
(John Gibbins)

ChatGPT often answered routine questions more completely and with more empathy than busy human doctors, according to evaluators.


What are my odds of dying after swallowing a toothpick?

Do I need to see a doctor after hitting my head on a metal bar while running?

Am I likely to go blind after getting bleach splashed in my eye?

A new study led by researchers at UC San Diego in La Jolla explores how artificial intelligence compares with human expertise in the workaday task of dashing off quick responses to routine medical questions.

The paper, published April 28 in the medical journal JAMA Internal Medicine, indicates that ChatGPT, the chatbot with a seemingly infinite breadth of training, more than held its own when its responses were judged by a panel of experts against those made by flesh-and-blood physicians.

Evaluators “preferred the chatbot responses to the physician responses” in 78 percent of evaluations, according to the study. What’s more, chatbot responses were found to be of “significantly higher quality” than those from humans.

And in terms of empathy, an area where people intuitively would seem to have an edge, silicon again excelled.

“Chatbot responses were rated significantly more empathetic than physician responses,” the paper states.

Rather than feeling threatened by the results, the paper’s authors say, doctors should be excited by them.

John Ayers, a UCSD computational epidemiologist who led the data collection and analysis process, said he believes artificial intelligence will be a game-changer for medicine in its ability to lighten workloads while improving quality for patients.

“So many more patients who are now getting no response or a bad response will be able to get answers from an AI-equipped physician who will be able to serve far more patients,” Ayers said.

The study, however, examined a specific set of circumstances, text exchanges between doctors and patients, and its results do not generalize to clinical settings.

Researchers pulled 195 randomly selected questions from the Ask a Doctor subsection of Reddit, the popular news aggregation and discussion site. The group, which has nearly 500,000 members, allows anyone to publicly ask any question they want of doctors whose qualifications are verified by Reddit.

Since questions and answers are all made in public for anyone on the internet to read, feeding them to ChatGPT required no particular data wizardry.

“Honestly, it’s just plug-and-play,” Ayers said. “All we did was cut and paste the questions into ChatGPT and save the response.”

No additional refinement was made, he said, after the chatbot delivered an answer.

Dr. Davey Smith, chief of infectious-disease research at UCSD, says a bot “can help at the beginning, but it’s on me to sign off.”
(K.C. Alfred / The San Diego Union-Tribune)

Chatbot answers tended to be much more verbose and friendly-sounding, while those from doctors were clearly dashed off by a chronically busy person relying on shorthand to be as efficient as possible.

In answering the swallowed-toothpick question, for example, the doctor’s response starts, “If you’ve surpassed 2-6 h, chances are they’ve passed into your intestines. Which means it can’t be retrieved easily.”

ChatGPT starts with, “It’s natural to be concerned if you have ingested a foreign object, but in this case, it’s highly unlikely that the toothpick you swallowed will cause you any serious harm.”

The head-injury question about hitting a metal bar on a run illustrates that chatbots simply have time to be more complete.

The physician response bangs out eight symptoms that should cause the person to see a doctor, including nausea or vomiting, dizziness, severe or worsening headache, loss of consciousness, confusion, neck stiffness, problems with vision and limb weakness, concluding, “If you develop any of these in the next 24 h, rush to the emergency room.”

The chatbot provides a more complete list. It tells the patient to watch for loss of consciousness “even if it’s just for a few seconds,” and it adds slurred speech, difficulty with balance or coordination, seizures, changes in behavior or personality and clear fluid draining from the nose or ears.

“While it’s possible that you may be fine, it’s important to be evaluated by a medical professional to rule out any serious injuries,” the chatbot response adds. “It is possible that you may have suffered a concussion or other head injury, even if you didn’t lose consciousness.”

Dr. Davey Smith, chief of infectious-disease research at UCSD and one of the doctors tasked with evaluating the responses, said he found the chatbot’s facility at answering medical questions to be shocking, even knowing that ChatGPT has already successfully passed medical licensing exams.

“It seemed like it could read in the message from the patient that they were anxious or sad or, you know, had emotions attached to these questions,” Smith said. “Not only was it more accurate, because it has all the information at its fingertips, right, but it was also empathetic, which was pretty cool.”

But does Smith, who sees patients every day, fear eventual replacement?

Not at all. AI, he said, is looking like a salve rather than an irritant.

“I get patient emails every day and they’re asking questions almost exactly like this,” Smith said. “And I spend about an hour a day — others spend more — going through emails and answering them as quickly as possible, you know, making an appointment, ‘Here’s your prescription,’ ‘That’s just a hangnail’ or ‘You need to go to the emergency room.’

“I don’t have time for empathy either. I’m just trying to get through it, but what if we had a way where this program could make it easier for us? What if it would draft something out ahead of time and I just review it?”

If the computer has the time to refer to the actual literature and churn out more complete answers — and the time to show a little more evidence of concern for a patient’s anxiety — that could be revolutionary, he said.

But, he added, no AI is giving his patients advice on its own. At the end of the day, it’s his medical license on the line if the AI gets something wrong.

“The bot can help at the beginning, but it’s on me to sign off,” Smith said.