ChatGPT Goes to Clinic: Can It Make Patient Handouts Readable?

By traceyromero1



Inside the Epic Systems electronic health record (EHR), the Elsevier patient education library is packed with useful content.

The problem? Much of it hovers well above the recommended sixth-grade reading level. Prior analyses clocked English materials at a mean grade level of 8.6. Spanish? Better — but still not ideal for universal comprehension.
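The "grade level" figures cited throughout this piece typically come from standard readability formulas. As an illustration, here is a minimal sketch of the Flesch-Kincaid grade-level formula in Python, using a naive vowel-group syllable counter; the study's actual readability tooling and metrics are not specified here, so treat this as a generic example, not the investigators' method.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of vowels; every word gets at least 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Short, plain sentences score far lower than dense clinical prose.
simple = "Your knee has a torn part. We will fix it. You will heal."
dense = ("Anterior cruciate ligament reconstruction necessitates "
         "meticulous postoperative rehabilitation adherence.")
print(round(fk_grade(simple), 1), round(fk_grade(dense), 1))
```

Production readability checkers use more careful syllable counting and often several formulas at once, but the core idea is the same: longer sentences and longer words push the grade level up.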

So, the obvious question: Can artificial intelligence make your patient handouts clearer — without making them wrong?

A recent blinded study set out to answer exactly that.

The Orthopedic Readability Showdown

The investigators took on a formidable task: 806 orthopedic patient education materials (PEMs) were run through standardized simplification prompts using ChatGPT (March 2025 vintage).

They weren’t just checking whether the prose sounded friendlier. They asked three critical questions: Does AI actually lower the reading level? Does it preserve clinical accuracy? And how does it compare to human-written “easy-to-read” versions?

To answer the fidelity question, 86 PEMs that already had human-simplified versions were analyzed. Two blinded clinicians reviewed the outputs — human and AI — against the original documents. They scrutinized:

  • Hallucinations (the dreaded “creative” additions)
  • Omissions (important missing content)
  • Inconsistencies (subtle meaning shifts)

All graded by severity.

Then, after the release of ChatGPT-5, the team performed a post hoc analysis using identical criteria. Consider it a generational implant survivorship study — but for language models.

The Results: Surprisingly Competitive

ChatGPT-4 delivered an average grade level of 6.1 in English and 3.5 in Spanish.

Yes — 3.5 in Spanish. That’s impressively accessible without turning “anterior cruciate ligament reconstruction” into “knee string surgery.”

Fidelity Face-Off: Human vs. AI

Compared with human simplifications, ChatGPT delivered fewer English omissions, similar Spanish omissions, fewer inconsistencies in both languages, comparable English hallucinations, and higher Spanish hallucinations (early model growing pains).

Then came ChatGPT-5.

It maintained English performance and significantly improved Spanish fidelity — reducing hallucinations to rates comparable with human authors.

The ChatGPT Good News

You often focus on surgical precision, implant alignment, and complication avoidance. But if your patients leave clinic with a less-than-optimal understanding of their diagnosis, their consent decisions, their postoperative compliance, or their expectations — then you’re prescribing future confusion.

This study suggests that AI — when standardized and clinician-supervised — can reliably produce more readable orthopedic materials without sacrificing acceptable clinical accuracy.

And importantly, it can do so at scale.

The Takeaway for Orthopedic Surgeons

  • AI simplification meaningfully lowers reading levels.
  • Fidelity is comparable to human simplification in most domains.
  • Newer iterations (ChatGPT-5) improve multilingual reliability.
  • Clinician oversight remains essential — this is a power tool, not autopilot.

Sometimes the most impactful intervention isn’t a new implant. It’s making sure your patient actually understands the one you just put in.

Origin Study Title: A Blinded Analysis of Quality and Fidelity in Orthopaedic Patient Education Materials Simplified by ChatGPT and Humans

Authors: Joseph E. Nassar, B.S.; Maxwell Sahhar, B.S.; Lama A. Ammar, M.D., M.S.; Joseph Carroll, B.S.; Anne-Emilie Rouffiac, B.S.; Marco Kaper, B.S.; Alex Hernandez-Manriquez, B.S.; Manjot Singh, M.D.; Edward Akelman, M.D.; Alan H. Daniels, M.D.
