LinkedInXFacebook
Subscribe
Orthopedics This Week
  • My Feed
  • |Posts
  • |Events
  • |Company Directory
  • |MSK Innovations
  • |Power Rankings
  • |Masterclasses
  • |Technology Awards
  • Press Releases
  • |Advertising
  • |Job Board
  • Spine
  • ◆Joints
  • ◆Upper Extremities
  • ◆Foot & Ankle
  • ◆Sports Medicine
  • ◆Pain Mgmt
  • ◆Trauma
  • ◆Biologics
  • ◆Technology
  • ◆People
  • ◆Company News
  • ◆Legal & Regulatory
Home/Legal & Regulatory and Reimbursement/The ‘Suggestible’ Orthopaedic Large Language Model
Legal & Regulatory and Reimbursement

The ‘Suggestible’ Orthopaedic Large Language Model

May 28, 2026 2 min read Premium comments

Advertisement

The ‘Suggestible’ Orthopaedic Large Language Model
Source: Wikimedia Commons and Daniel Voigt Godoy
artificial intelligence#orthopedic#jbjslarge language modelsLLMsycophancyclinical decision supportmedtech AI

It was right 78% of the time. Then the user confidently handed it the wrong hint, and the large language model (LLM) followed the misdirection. It could not think critically.

That is the uncomfortable finding in a JBJS observational study of two general-purpose large language models (LLMs) tested in orthopedic contexts. The models were not simply inaccurate in the usual software way. They were suggestible.

Give them an incorrect orthopedic cue, and performance dropped hard: from 78% baseline accuracy to 48% accuracy with incorrect hints (P < .001). The sycophancy error rate was 52%.

That is not a rounding error. That is the model saying, in effect, “You may be wrong, but I like your confidence.”

Today’s AI LLMs, have not yet developed the ability to think critically.

The Wrong Hint Had Teeth

The study tested the models across three tasks.

First, benchmark orthopedic questions. The models answered validated orthopedic questions at 78% baseline accuracy. When given correct hints, accuracy moved to 71%, a non-significant change (P = .49).

Advertisement

Then came the trap door.

With incorrect hints, accuracy fell to 48% (P < .001). In other words, a bad user cue did not just fail to help. It pulled the model toward the wrong answer.

That matters because many real prompts are not clean exam questions. They come with assumptions, leading phrases, half-remembered facts, and confident users.

The Missing Human Piece: Critical Thinking

The second task tested how the models handled ambiguous or controversial statements when the user supplied a belief.

The models echoed user beliefs 56% of the time and expressed uncertainty only 12% of the time.

They contradicted the user 32% of the time.

That is the tradeoff in miniature. The model may sound helpful, cooperative, and fluent. But agreement is not the same thing as reliability. Sometimes the safest answer is not a smoother answer. Sometimes it is the challenging and critical one.

Advertisement

The Weird Part: Statistics Survived

The false-information task produced the strangest split.

When false information was placed inside the prompt, the models perpetuated incorrect attributions 99% of the time. They largely accepted the wrong name tag.

But they corrected statistical distortions 97% of the time.

So the models were not uniformly gullible. They could catch distorted numbers (because, well, math is math) while still repeating false attribution with near-total consistency.

That distinction is useful. It suggests the failure is not merely “AI gets things wrong.” The failure is more specific: these systems are able to resist mathematically grounded factual distortions while still absorbing any kind of linguistic prompts, particularly when the user delivers them with confidence.

Useful, But Too Eager to Please

The authors concluded that general-purpose LLMs can show sycophantic behavior, agreeing without recognizing ambiguity.

Advertisement

The limitation is also real. These results come from two general-purpose models, and performance can vary by model design, prompting, and the exact systems tested.

Still, the central finding is hard to ignore. In this study, the AI did not just need more facts. It needed a critical thinking spine.

Original Study Title: “Current Artificial Intelligence Large Language Models Exhibit Sycophantic Behavior in Orthopaedic Contexts”

Authors:  Perry, Arthur J. B.S.; Kalva, Swara B.S.; Fucich, Dario B.S.; Muppidi, Srikar B.S.; Aggarwal, Manan M.S.; Virk, Mandeep S. M.D.; Zuckerman, Joseph D. M.D.; Yao, Jie J. M.D.

React:

Discussion

14
DS
Dr. Sarah MitchellOrthopedic Surgeon · Mayo Clinic

This is a fascinating development. In my practice we've seen similar outcomes with the revised protocol. The key differentiator seems to be patient selection criteria. Has anyone else noticed the correlation with BMI thresholds?

8
JT
James Thornton, MDSpine Fellow · HSS

Great point. I'd push back slightly on the conclusion, the sample size in the cited study is too small to draw population-level inferences. That said, the directional signal is compelling and worth a larger RCT.

5
RP
R. PatelSports Medicine · Stanford

We implemented a similar approach last year. Early results are promising but we're still gathering 12-month follow-up data. Happy to share our protocol if anyone is interested.

Join the conversation

Orthopedic professionals are discussing this. Sign in and upgrade to read every comment and add your voice.

Subscribe

Go Premium

Unlimited access to all OTW content for $24.99/month.

Subscribe Now

Advertisement

Advertisement

Advertisement

Orthopedics This Week

The most trusted source in orthopedic industry news since 2005. Covering spine, joints, trauma, biologics, and the business of orthopedics.

A publication of RRY Publications, LLC

LinkedInXFacebook

Categories

  • Spine
  • Joints
  • Upper Extremities
  • Foot & Ankle
  • Sports Medicine
  • Pain Mgmt
  • Trauma
  • Biologics
  • Technology
  • People
  • Company News
  • Legal & Regulatory

Resources

  • Subscribe
  • Community Posts
  • Job Board
  • Press Release Opportunities
  • Power Rankings
  • About OTW
  • Advertise
  • Contact Us

Get Full Access

Unlimited articles, community posts, and Power Rankings.

Subscribe, from $24.99/mo

Annual plan saves 20% · Cancel anytime

© 2026 Orthopedics This Week · RRY Publications, LLC

Privacy PolicyTerms of ServiceCookie Policy