
Only 27% of physicians feel their current electronic health record actually supports good clinical care. The rest quietly describe it as “a billing system we happen to practice inside.”
Large language models (LLMs) are blowing that up. Not overnight. Not perfectly. But fast enough that vendors, compliance officers, and clinicians are all scrambling at the same time.
Let me break down exactly how these models are changing clinical documentation workflows today, what is hype, what is already happening on the ground, and where this is going in the next 3–5 years.
1. From “Click Boxes” To “Explain What Happened”
The core shift: documentation is moving from structured data entry to natural language explanation, with the machine doing the translation in the background.
Right now, a typical workflow looks like this:
- You interview and examine the patient.
- You type (or dictate) a note into the EHR.
- You click your way through problem lists, orders, ICD-10, CPT, quality measures, prior auth forms, etc.
LLM-based systems are attacking all three steps, but the first and third are where the disruption is most obvious.
Ambient clinical documentation: what is actually happening in real clinics
Here is the current, real-world pattern I see:
A physician walks into the exam room, starts talking to the patient as usual. On the wall there is a microphone or a tablet, or the physician has a phone in “ambient capture” mode. The system records the visit, sends the audio to a cloud service, and within minutes produces:
- A structured SOAP note (or equivalent format)
- Problem list changes
- Medication changes
- Orders to review/sign
- Sometimes even suggested ICD-10 and CPT codes
The key difference from old-school speech recognition (Dragon, etc.):
You are not dictating. You are having a real conversation. The LLM is inferring clinical structure from messy, overlapping dialogue.
| Year | Visits using ambient AI documentation (%) |
|---|---|
| 2019 | 2 |
| 2021 | 6 |
| 2023 | 18 |
| 2025 (proj) | 40 |
Those numbers are roughly what major health systems are reporting: low single-digit percent a few years ago, rapidly approaching mainstream in primary care and some specialties.
How the note is actually assembled under the hood
Vendors pretend this is magical. It is not. It is engineering.
The pipeline usually looks like this:
- Speech-to-text with speaker diarization: who said what, when.
- NER (named entity recognition): finding problems, meds, allergies, dates, dosages.
- Dialogue understanding: what is history, what is assessment, what is plan.
- LLM generation: turn all that into coherent narrative, in your preferred style.
- Post-processing: templates, smart phrases, institution-specific sections, legal language.
The LLM is mostly step 4 (and sometimes 3). And it is very good at that part.
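A minimal sketch of that five-step pipeline as one orchestration function. Everything below is hypothetical scaffolding: the stub bodies stand in for real speech-to-text, NER, and LLM services, and the function names are mine, not any vendor's.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str  # "clinician" or "patient", from diarization
    text: str

def transcribe_with_diarization(audio: bytes) -> list:
    # Step 1 stub: a real STT service returns speaker-tagged segments.
    return [Utterance("patient", "My blood pressure pills make me dizzy."),
            Utterance("clinician", "Let's lower the lisinopril to 10 mg.")]

def extract_entities(utterances) -> dict:
    # Step 2 stub: a trivial lookup in place of clinical NER.
    known_meds = {"lisinopril", "metformin"}
    meds = [w.strip(".,").lower()
            for u in utterances for w in u.text.split()
            if w.strip(".,").lower() in known_meds]
    return {"medications": meds}

def draft_note(utterances, entities) -> str:
    # Steps 3-4 stub: this is where the LLM turns dialogue into narrative.
    dialogue = "\n".join(f"{u.speaker}: {u.text}" for u in utterances)
    meds = ", ".join(entities["medications"])
    return f"Draft note from visit dialogue:\n{dialogue}\nMedications discussed: {meds}"

def generate_visit_note(audio: bytes) -> str:
    utterances = transcribe_with_diarization(audio)  # 1. STT + diarization
    entities = extract_entities(utterances)          # 2. NER
    note = draft_note(utterances, entities)          # 3-4. understanding + generation
    return note.strip()                              # 5. post-processing hook
```

The shape is the point, not the stubs: each stage is independently swappable, which matters later when a health system wants to change models without rebuilding the whole pipeline.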
You end up with a draft note that feels oddly like something you would have written on a good day with too much time. It is not always correct. But it is often 70–90% of the way there.
And that changes your job from “author the note” to “edit and sign the note.”
2. Concrete Workflow Changes: Before vs After LLMs
Let me get specific. Conceptual talk is useless if it does not translate into time, clicks, and cognitive load.
Outpatient primary care visit
Old workflow for a 15-minute visit:
- 3–5 minutes pre-charting
- 12–15 minutes with the patient
- 7–10 minutes post-visit documenting, ordering, coding
- 30–60 minutes after clinic, finishing notes and inbox
LLM-augmented workflow in clinics that have actually integrated this well:
- 1–2 minutes glance at a summarization of prior notes (LLM-generated)
- 12–15 minutes with the patient, ambient capture running in background
- 3–5 minutes reviewing AI-drafted note, signing orders, editing wording
- Less (sometimes much less) after-hours work
The time savings are not uniform. Some physicians shave off 3–4 minutes per visit; others save nearly all of their after-hours documentation time. It depends on:
- How chaotic the visit is
- How specific your institution’s documentation requirements are
- How much you trust the AI to draft in your voice
| Step | Traditional Workflow (min) | LLM-Augmented Workflow (min) |
|---|---|---|
| Pre-charting | 3–5 | 1–2 |
| In-room charting | 3–6 | 0–1 (just quick notes) |
| Post-visit note | 7–10 | 3–5 |
| End-of-day catchup | 30–60 | 5–20 |
These are not vendor marketing numbers. This is what large systems report internally once the novelty wears off.
Inpatient rounding workflow
Inpatient is messier, but you still see three specific LLM-enabled changes:
Pre-round summaries
The system generates concise overnight summaries pulling from nursing notes, vitals, labs, imaging, and consults. Instead of reading through 10+ notes, you skim one or two paragraphs plus a trends panel.

Assessment and plan scaffolding
You speak your reasoning once (often into a mobile app after you walk out of the room). The LLM expands it into a structured A/P. You correct nuance, adjust phrasing, then move on.

Consult communication
Some systems now let you say: “Summarize for cardiology why we are consulting them today, including key labs and imaging.” The LLM drafts the text pager or secure chat message.
This is not sci-fi. It is rolling out right now in academic centers and some IDNs.
3. What LLMs Are Actually Good At In Documentation (And What They Are Not)
People either wildly overestimate or underestimate LLM capability. Both are dangerous in clinical workflows.
Strengths: where LLMs are already outperforming humans
Turning long, messy speech into coherent notes
Humans hate transcribing. Models love it. They have endless patience for rambling stories and repeated details.

Regenerating content in different formats
You can say: “Convert this clinic note into a discharge summary,” or “Write a patient-friendly after-visit summary from this H&P.” The LLM handles register, tone, and structure easily.

Context-sensitive summarization
Ask for: “Summarize the last 6 months of cardiology-related events for this patient in 5 bullet points.” Or “Give me the key elements relevant to pre-op clearance.” This is where attention-based models shine.

Filling in standard phrasing, boilerplate, and checklists
“Normal physical exam except as in HPI” becomes a fully formatted exam section that matches your institution’s style. No more hunting for smart phrases.

Suggesting codes from text
Given: “New onset atrial fibrillation with rapid ventricular response in a patient with long-standing hypertension,” it will suggest the right ICD-10 and often CPT bundles. Coders still check. But the draft is fast.
| Task | Relative LLM capability (0–100) |
|---|---|
| Transcription/Summarization | 90 |
| Template/Boilerplate Generation | 85 |
| ICD/CPT Suggestion | 75 |
| Nuanced Clinical Reasoning | 55 |
| Edge-Case Interpretation | 40 |
The point: anywhere the task is pattern-based and repetitive, the LLM is already competitive or superior.
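As a toy illustration of the code-suggestion pattern, here is the kind of deterministic pre-filter that often runs alongside the model. The two codes are real ICD-10-CM codes; the regex table itself is a stand-in for the actual LLM-plus-coder-review loop, not how production coding works.

```python
import re

# Real ICD-10-CM codes; the trigger table is illustrative only.
ICD10_RULES = [
    (re.compile(r"atrial fibrillation", re.I),
     ("I48.91", "Unspecified atrial fibrillation")),
    (re.compile(r"\bhypertension\b", re.I),
     ("I10", "Essential (primary) hypertension")),
]

def suggest_codes(note_text):
    """Return (code, description) pairs whose trigger phrase appears in the note."""
    return [code for pattern, code in ICD10_RULES if pattern.search(note_text)]

note = ("New onset atrial fibrillation with rapid ventricular response "
        "in a patient with long-standing hypertension.")
# suggest_codes(note) surfaces both I48.91 and I10 as drafts for coder review
```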
Weaknesses and risk zones
Here is where people get burned:
Nuanced clinical reasoning
The model can sound confident while being clinically wrong. It might overstate a diagnosis, infer causality that you never intended, or “fill in” exam findings that were never explicitly stated in the audio.

Rare conditions and edge cases
Documentation around zebras or complex multisystem cases can be subtly distorted. The model is biased toward the common.

Temporal and causal relationships
It may mis-sequence events: implying that the CT was ordered because of symptom X when, in fact, it came from unrelated screening. That can create medico-legal vulnerability.

Subtle tone and blame
A note that accidentally sounds like you are criticizing the patient (“noncompliant,” “failed to follow up”) when you never said that aloud. Or vice versa: missing clinician concerns around safety.

Over-normalization
If you say “lungs are clear” once, it may propagate that in multiple sections. If your exam is incomplete, it may default to “normal” rather than “not examined.” That is a big problem.
The solution is very simple and very non-negotiable: the clinician is the author. The LLM is a drafting assistant. If you outsource judgment to it, you are doing it wrong.
4. Integration With EHRs: Where the Real Battle Is
The bottleneck is not the model. It is the EHR and hospital IT.
Here is the reality:
If LLMs live in a separate app that forces you to copy-paste into Epic or Cerner, adoption is mediocre. You add friction even as you promise to remove it.
If LLM capability is embedded directly in the EHR—and your clicks actually go down—then it becomes viable.
The three main integration patterns
I see three archetypes across systems:
Sidecar integration
A separate web or mobile app listens, generates the note, then writes back to the EHR via FHIR or HL7. This is how early ambient vendors started.

Pros:
- Faster to deploy
- Vendor neutral

Cons:
- Context gaps (limited access to structured data)
- Feels “separate,” more clicks for the clinician

Deep native integration
The EHR vendor builds or buys the LLM layer. Examples: Epic’s partnership with Nuance/Microsoft, Oracle Cerner’s collaborations, etc.

Pros:
- Access to full chart context
- Fewer logins, better UX

Cons:
- You are completely at the mercy of the EHR roadmap
- Less flexibility in choosing models/approaches

Hybrid orchestration
Health systems use their own orchestration platform: choose foundation models (OpenAI, Anthropic, open-source), add prompt engineering, guardrails, and connect directly into their EHR via APIs.

This is where advanced academic centers are heading. You maintain control and can swap models as they evolve.
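For the sidecar pattern, the write-back step usually means constructing a FHIR resource. A sketch under FHIR R4, assuming the note posts as a `DocumentReference` typed with the LOINC progress-note code; a real integration adds encounter context, authorship, and authentication, and the exact shape varies by site.

```python
import base64

def build_document_reference(patient_id, note_text):
    """Package a signed note as a FHIR R4 DocumentReference for EHR write-back."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {"coding": [{
            "system": "http://loinc.org",
            "code": "11506-3",          # LOINC: Progress note
            "display": "Progress note"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [{"attachment": {
            "contentType": "text/plain",
            # FHIR base64Binary: the note body, base64-encoded
            "data": base64.b64encode(note_text.encode("utf-8")).decode("ascii")}}],
    }

resource = build_document_reference("123", "Clinic note, reviewed and signed.")
# A sidecar would then POST it, roughly:
#   requests.post(f"{fhir_base}/DocumentReference", json=resource,
#                 headers={"Authorization": f"Bearer {token}"})
```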

Latency, click burden, and the “1-minute rule”
One detail that non-clinical people underestimate: latency kills adoption.
If I finish a visit and your AI takes 3–5 minutes to produce the note, I will not wait. I will write it myself. My workflow cannot stall for your inference server.
The systems that work in the wild target:
- Partial draft within 30–60 seconds
- Fully refined note within 2–3 minutes, but I can already start editing
Anything slower is dead on arrival in high-volume clinics.
Same for clicks. If I have to:
- Open a separate window
- Choose from 5 templates
- Acknowledge 3 pop-ups
…you lost me. LLMs succeed in documentation when they remove friction, not repackage it.
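Those latency targets translate into a streaming UI requirement: render partial text immediately instead of blocking on the full note. A simulated sketch; a real system would consume tokens from the inference server rather than a local list, and the budget constants are illustrative.

```python
import time

PARTIAL_DRAFT_BUDGET_S = 60   # first usable draft
FULL_DRAFT_BUDGET_S = 180     # fully refined note

def stream_note(chunks):
    """Yield (elapsed_seconds, text_so_far) so the UI can render a growing draft."""
    start = time.monotonic()
    text = ""
    for chunk in chunks:
        text += chunk
        yield time.monotonic() - start, text

# Simulated model output; each yield is a UI repaint, not a blocking wait.
chunks = ["Subjective: dizziness on lisinopril. ",
          "Plan: reduce lisinopril to 10 mg daily."]
for elapsed, partial in stream_note(chunks):
    assert elapsed < PARTIAL_DRAFT_BUDGET_S  # the clinician can start editing now
```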
5. Coding, Compliance, and the Billing Reality
Let me be blunt: a large portion of documentation exists for billing and compliance, not clinical reasoning. LLMs are being deployed aggressively here because the ROI is easy to quantify.
From “note-first” to “intent-first” coding
Traditional:
You document, then coders extract codes and challenge you via queries.
LLM-enabled pattern:
The model identifies potential codes in real-time as you speak or as the draft is created. It highlights missing documentation that would justify a higher-complexity visit or specific quality measures.
For example:
- It notices chronic kidney disease mentioned but no stage documented.
- It flags that time spent in counseling could support a time-based E/M code if properly documented.
- It identifies that sepsis criteria are met but not explicitly stated.
The system then prompts: “If clinically appropriate, you may want to document CKD stage to support accurate coding.”
You decide. Not the model.
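The CKD example is simple enough to sketch as a deterministic check that fires a non-directive prompt when the condition appears without a stage. The patterns and wording are illustrative; production systems pair rules like this with model-based detection.

```python
import re

def flag_missing_ckd_stage(note):
    """Return a documentation prompt if CKD is mentioned without a stage, else None."""
    mentions_ckd = re.search(r"\b(chronic kidney disease|CKD)\b", note, re.I)
    has_stage = re.search(r"\bstage\s*[1-5]\b|\bN18\.\d", note, re.I)
    if mentions_ckd and not has_stage:
        return ("If clinically appropriate, you may want to document the CKD stage "
                "to support accurate coding.")
    return None

# Fires on: "Long-standing chronic kidney disease, on lisinopril."
# Quiet on: "CKD stage 3, stable creatinine."
```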
| E/M Visit Level | Share of visits (%) |
|---|---|
| Level 2 | 10 |
| Level 3 | 45 |
| Level 4 | 30 |
| Level 5 | 15 |
What actually happens in practice: level 2 visits decrease, level 4 and 5 increase modestly—but more importantly, documentation better reflects true complexity. Audit risk does not automatically go up if you document accurately.
Risk: over-documentation and copy-forward at scale
We already have a problem with copied forward notes. LLMs can make that worse if poorly controlled.
Patterns to watch for:
- Massive notes full of irrelevant historical detail regurgitated by the model.
- Every problem listed as “high complexity” because the training examples looked like that.
- Template bloat—where the assistant always “helps” by inserting long paragraphs of boilerplate.
If your notes get 30% longer after introducing LLMs, you did it wrong. The goal is sharper, more focused documentation, not a wall of text that no one reads.
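That 30% figure is trivially monitorable. A sketch of the rollout guardrail metric a health system might track; the threshold is an assumption of this example, not a standard.

```python
def note_bloat_ratio(pre_llm_avg_chars, post_llm_avg_chars):
    """Average note length after rollout relative to before (1.0 = unchanged)."""
    return post_llm_avg_chars / pre_llm_avg_chars

def bloat_alert(pre_llm_avg_chars, post_llm_avg_chars, threshold=1.3):
    """True if average notes grew more than 30% after the LLM rollout."""
    return note_bloat_ratio(pre_llm_avg_chars, post_llm_avg_chars) > threshold

# bloat_alert(4000, 5600) -> True  (40% longer: something is wrong)
# bloat_alert(4000, 4100) -> False (2.5% longer: fine)
```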
6. Patient-Facing Documentation: The Silent Revolution
The second big front: what the patient sees.
You have two emerging workflows:
Plain language after-visit summaries
“Rewrite the assessment and plan for an eighth-grade reading level, removing jargon, keeping medication names accurate.”

Real-time visit explanations
Systems that listen to the visit and produce on-the-fly educational snippets: “What is atrial fibrillation?” “Why this blood test?” They are then added to the portal note.
Patients are already reading visit notes through OpenNotes-style policies. LLMs are making those notes understandable.

This has direct workflow implications:
- Less time answering basic “what does this mean?” portal messages.
- More time for actual clinical decision-making in follow-up visits.
- New sources of error if the simplified explanation is wrong or overly reassuring.
The guardrail here is straightforward: clinicians must be able to preview or at least review the style and content, with defaults tuned conservatively. Do not let the model give treatment advice beyond what you documented.
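One concrete conservative default: check the simplified text against the stated reading-level target before it reaches the portal. A sketch using the standard Flesch-Kincaid grade formula with a crude syllable heuristic; real deployments would use proper readability tooling.

```python
import re

def _syllables(word):
    # Crude heuristic: count vowel groups, minimum one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

simple = "Your heart beat is not steady. This pill helps slow it down."
jargon = ("Paroxysmal atrial fibrillation with rapid ventricular response "
          "necessitates rate control pharmacotherapy.")
# fk_grade(simple) scores far below fk_grade(jargon)
```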
7. Governance, Bias, and Safety: The Less Fun but Critical Part
Every CIO and CMIO I know is wrestling with the same three questions:
- Where can we safely use general-purpose models (GPT-4, Claude, etc.)?
- Where do we need domain-specific, healthcare-tuned models?
- How do we monitor for silent failure?
Prompting and guardrails
For clinical documentation, your prompts matter more than people think. A well-designed prompt for a note generator will:
- Explicitly tell the model not to invent physical exam findings that were not mentioned.
- Instruct it to mark uncertain information clearly (“Patient unsure of exact date”).
- Constrain it to a specific note structure and style.
- Warn against inserting clinical recommendations beyond what is in the source text.
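Those four instructions can be written straight into the system prompt. An illustrative wording, not any vendor's actual prompt:

```python
# Hedged example of guardrail instructions for a note-drafting prompt.
NOTE_DRAFTING_GUARDRAILS = """\
You are drafting a clinical note from a visit transcript.
Rules:
1. Do not invent physical exam findings, vitals, or test results that are not
   in the transcript or supplied chart data.
2. Mark uncertain information explicitly, e.g. "Patient unsure of exact date."
3. If a body system was not examined, write "not examined", never "normal".
4. Follow the required note structure and institutional style exactly.
5. Do not add clinical recommendations beyond what the clinician stated.
"""
```

The “not examined, never normal” rule directly targets the over-normalization failure mode described earlier.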
You should also be doing automatic post-processing checks:
- Regex or NER passes to identify dangerous phrases (“no history of X” contradicting past notes).
- Length and structure constraints (e.g., physical exam cannot appear without any vital signs context if your institution requires them).
- Flags for repeated content across multiple days without changes.
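The third check, repeated content across days, is cheap to implement by hashing note sections. A minimal sketch, assuming notes are already split into named sections:

```python
import hashlib
from collections import defaultdict

def unchanged_sections(notes_by_day, min_days=3):
    """Flag sections whose text is byte-identical across >= min_days daily notes."""
    seen = defaultdict(list)  # (section name, content hash) -> days it appeared
    for day, sections in sorted(notes_by_day.items()):
        for name, text in sections.items():
            key = (name, hashlib.sha256(text.strip().encode()).hexdigest())
            seen[key].append(day)
    return [(name, days) for (name, _), days in seen.items() if len(days) >= min_days]

notes = {
    "2024-03-01": {"exam": "Lungs clear bilaterally.", "plan": "Continue IV abx."},
    "2024-03-02": {"exam": "Lungs clear bilaterally.", "plan": "Narrow abx per cultures."},
    "2024-03-03": {"exam": "Lungs clear bilaterally.", "plan": "Plan discharge tomorrow."},
}
# unchanged_sections(notes) flags the exam section, copied 3 days running
```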
A sane end-to-end review pipeline:

1. Audio and chart data
2. LLM draft note
3. Automated safety checks
4. Clinician review
5. Compliance or QA review
6. Final note signed
If you are not doing something like this, you are trusting a probabilistic model with medico-legal documents. That is irresponsible.
Data privacy and PHI
Another non-negotiable: where does the data live?
- Is PHI being sent to a third-party model provider?
- Is it stored or used for training?
- Is there a BAA (business associate agreement) in place?
- Can you audit prompts and outputs?
Serious systems either:
- Use vendor models with strict healthcare-grade contracts and PHI isolation, or
- Host models in their own VPC / data centers (especially for large IDNs).
If a vendor cannot answer basic data lineage questions, move on.
8. Training Clinicians To Work With LLMs (Not Against Them)
You cannot just “turn on” an AI scribe and expect productivity to jump. There is a learning curve.
The clinicians who get the most out of these tools:
- Speak in a slightly more structured way when summarizing assessment and plan out loud.
- Learn a small “vocabulary” of prompts: “Summarize this for patient,” “Refine this plan wording,” “Generate a problem-focused note from this conversation.”
- Give explicit feedback to the system early (accept/reject suggestions) so the local tuning gets better.
The clinicians who get the least out of them either:
- Assume the AI is always right and sign without reading.
- Distrust it completely and keep doing everything manually, carrying the cognitive overhead of a new system without the benefits.
There is a middle path: treat the LLM like a very fast, very literal intern. It drafts. You own.

9. What Changes Next: 3–5 Year Horizon
Speculation, but informed speculation.
Here is where I expect clinical documentation workflows to land:
Documentation becomes a byproduct of care, not a separate task
You talk, examine, decide. The system builds not only the note, but the billing artifacts, quality measure checkboxes, and care coordination summaries behind the scenes.

Notes get shorter for clinicians and richer for machines
Humans will see condensed, focused narratives. Under the hood, the system will maintain a highly structured knowledge graph of problems, findings, relationships, and timelines.

Multi-modal documentation
Photos of rashes, point-of-care ultrasound clips, ECG strips: these will feed directly into the note. LLMs will summarize what they show and link them to clinical reasoning, with computer vision models assisting.

Institution-level customization
Each health system’s documentation style, policies, and risk appetite will be baked into its own “house style model.” New hires will adapt to the model, not the other way around.

Regulatory expectations swing
At some point, payers and regulators will start expecting AI-assisted documentation. Why? Because once outliers stand out clearly, pure copy-paste and boilerplate fraud gets easier to spot.
| Horizon | Clinical time spent on documentation (%) |
|---|---|
| Now | 55 |
| 2 Years | 40 |
| 5 Years | 30 |
Those values are the percent of total clinical time spent on documentation. Will we actually get from ~55% to ~30%? If systems are implemented intelligently, yes. If they are layered on top of broken workflows, no.
10. How To Evaluate Vendors and Internal Projects, Practically
If you are a clinician leader or informatics person being pitched AI documentation tools every week, here is the short, ruthless checklist I use:
- Latency under real network conditions?
- Demonstrated reduction in clicks and keystrokes in your actual EHR?
- Robust PHI handling with contractual guarantees?
- Clear options to control style, length, and risk level of generated notes?
- Audit trails of what the model produced vs what the clinician edited?
- Exit strategy if the vendor disappears or gets acquired?
If a product demo spends 90% of time on “magic” and 10% on these details, they are not serious about clinical reality.
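For the audit-trail item, one useful derived metric is how much of the model's draft survives into the signed note. A sketch using `difflib` from the standard library; the thresholds you would alert on are a local policy choice, not something asserted here.

```python
import difflib

def edit_retention(model_draft, signed_note):
    """Similarity between draft and signed note (1.0 = signed completely unchanged)."""
    return difflib.SequenceMatcher(None, model_draft, signed_note).ratio()

draft = "Assessment: stable angina. Plan: start metoprolol 25 mg daily."
signed = ("Assessment: stable angina. Plan: start metoprolol 25 mg twice daily "
          "after meals.")
# edit_retention(draft, signed) is high but below 1.0, reflecting a real edit
```

A clinic-wide average near 1.0 suggests rubber-stamping; an average near 0 suggests the tool is adding work rather than removing it.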
Key Takeaways
- LLMs are already changing documentation from active authoring to review-and-sign, especially via ambient scribing and summarization.
- The main value is not prettier notes; it is reclaimed clinician time and cognitive bandwidth, if—and only if—workflow friction and safety are handled well.
- Treat the model as an assistant, not an author. The moment you forget that distinction, you move from “future of healthcare” to “future malpractice exhibit A.”