Orthopedics This Week
Spine

ChatGPT Beats Dr. Google for … Misinformation?

November 14, 2024

Source: Wikimedia Commons and Azeliad

How do ChatGPT 3.5 and ChatGPT 4.0 stack up against North American Spine Society (NASS) clinical guidelines when it comes to lumbar disc herniation with radiculopathy? That’s what a multicenter team set out to learn. Their work, “Lumbar disc herniation with radiculopathy: a comparison of NASS guidelines and ChatGPT,” appears in the September 2024 edition of the North American Spine Society Journal.

Co-author Ankur Kayastha, B.S., a medical student at Kansas City University, told OTW, “Artificial intelligence (AI) is becoming mainstream in society and will be implemented in a variety of fields. This particular study was performed to obtain an early look at performance within a clinical medicine context. Utilizing two versions of ChatGPT for this study may provide insight as to the rate of progression of AI technology between updates.”

The researchers prompted ChatGPT 3.5 and ChatGPT 4.0 with 15 questions from the 2012 NASS Clinical Guidelines for the diagnosis and treatment of lumbar disc herniation with radiculopathy. Two independent authors assessed each response on four measures: accuracy, over-conclusiveness, supplementary information, and incompleteness.

OTW asked Kayastha about the challenges and milestones he encountered while conducting this study. “ChatGPT answers questions differently even when asking the same question at different time points, which makes it difficult to assess reliability,” he said. “It also has a tendency to fabricate answers even in cases where it has access to correct information.”

Progress ≠ Perfect

Among the 15 responses produced by ChatGPT 3.5,

  • 7 (47%) were accurate,
  • 7 (47%) were over-conclusive,
  • 15 (100%) were supplementary, and
  • 6 (40%) were incomplete.

For ChatGPT 4.0,

  • 10 (67%) were accurate,
  • 5 (33%) were over-conclusive,
  • 10 (67%) were supplementary, and
  • 6 (40%) were incomplete.

While there was a statistically significant difference in supplementary information between ChatGPT 3.5 and ChatGPT 4.0 (100% vs. 67%), there was no statistically significant difference between the two versions for accuracy (47% vs. 67%), over-conclusiveness (47% vs. 33%), or incompleteness (40% vs. 40%).
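For readers who want to check the arithmetic, here is a minimal sketch that reruns the comparison from the counts reported above. The article does not say which statistical test the authors used, so the two-sided Fisher's exact test here is an assumption, chosen because it suits small samples of 15 responses per model. The function and variable names are illustrative only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "yes" outcomes in row 1,
        # holding the row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

N = 15  # responses per model, as reported in the article
# (ChatGPT 3.5 count, ChatGPT 4.0 count) for each outcome measure
counts = {
    "accurate": (7, 10),
    "over-conclusive": (7, 5),
    "supplementary": (15, 10),
    "incomplete": (6, 6),
}

results = {}
for measure, (v35, v40) in counts.items():
    # Assumed test: the source does not name the one the authors applied.
    p = fisher_exact_two_sided(v35, N - v35, v40, N - v40)
    results[measure] = p
    print(f"{measure}: {v35 / N:.0%} vs. {v40 / N:.0%}, p = {p:.3f}")
```

Under this assumed test, only the supplementary-information comparison falls below the conventional 0.05 threshold (p is roughly 0.042), which matches the pattern the authors report. A different test could give different p-values.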

Both versions reached 100% accuracy for definition and history and physical exam categories.

However, ChatGPT 3.5 completely failed (0% accuracy) to answer questions regarding diagnostic testing and surgical intervention.

By contrast, ChatGPT 4.0 hit 100% accuracy for diagnostic testing and a 33% accuracy rate for surgical intervention information.

For questions regarding nonsurgical interventions, ChatGPT 3.5 gave accurate information 50% of the time. ChatGPT 4.0 hit a 63% accuracy rate.

“ChatGPT performed best regarding definition-based questions and worst with more complex surgical or prognosis-type questions,” explained Kayastha. “AI will be integrated into healthcare in the future—the question is when. With proper regulatory oversight and access to accurate and up-to-date information, AI can be a useful clinical adjunct for clinicians. Ethical safeguards and questions of liability need to be reconciled before implementation, but AI can be incredibly helpful when used in the right context.”

Buyer Beware

The authors found that despite the advances in ChatGPT 4.0, a third of its responses in this comparative analysis still contained inaccurate information. ChatGPT 4.0 outperformed ChatGPT 3.5 to a statistically significant degree on only one of the four outcome measures: supplementary information. ChatGPT 3.5 added supplementary information to its responses for all of the tested guideline questions, whereas ChatGPT 4.0’s responses were more conservative.

The investigators determined that both models were “vulnerable to producing unsupported or irrelevant details” and that “ChatGPT sometimes generates fabricated data to provide the user with an immediate response, regardless of the content’s factual integrity. Despite these limitations, ChatGPT may still have potential to be a supplemental source for medical professionals pending future updates and ethical considerations.”

A publication of RRY Publications, LLC

© 2026 Orthopedics This Week · RRY Publications, LLC