
You’re post-residency, it’s 4:30 pm, and you’re already behind on notes. The EHR pops up an AI “care suggestion” for your complex CHF/COPD frequent flyer. It recommends a med change and flags a “potential alternative diagnosis.”
You hover over “Accept.”
Here’s the real question: should you trust it?
Let’s walk through a ruthless, practical checklist you can use in real time. Not theory. Not hype. A filter you can run every AI recommendation through before it touches a patient.
1. First Principle: AI Is a Consultant, Not an Attending
Start with the right mental model, or everything else breaks.
AI in the clinical setting is a consultant. A smart one sometimes. A dangerous one occasionally. But never the attending.
You own the decision. Your license. Your name on the note. Your malpractice risk.
So the default stance is:
- AI may assist.
- AI may suggest.
- AI may occasionally surprise.
- It should never override your clinical judgment.
Any system that’s trying to act like the attending – auto-ordering, auto-diagnosing, auto-disposition without human review – is already a red flag.
Your internal rule of thumb:
AI can shift your probabilities, but it shouldn’t replace your reasoning.
2. The Checklist: 10 Questions Before You Trust an AI Recommendation
Use this like a pre-flight checklist. You won’t ask all 10 every time, but you should mentally hit most of them, especially early on.
| # | Question | If the answer is... |
|---|---|---|
| 1 | Is this within AI’s demonstrated strengths? | Yes → Proceed with caution |
| 2 | Do I understand what the model is actually doing? | No → High skepticism |
| 3 | Does this fit the clinical context I’m seeing? | No → Treat as noise |
| 4 | Would a reasonable colleague agree this is plausible? | No → Stop and re-evaluate |
| 5 | Is the recommendation guideline-concordant? | Yes → Mild trust boost |
| 6 | Is the AI transparent about uncertainty? | No → Don’t over-trust |
| 7 | Is there obvious bias or missing data? | Yes → Discount heavily |
| 8 | Can I independently verify this quickly? | Yes → Do it before accepting |
| 9 | What’s the downside if this is wrong? | High → Assume it’s wrong until proven right |
| 10 | Who is legally on the hook? | You → Treat as just another input |
Let’s break these down in real-world language.
3. Question 1: Is This Something AI Is Actually Good At?
AI is not magic. It’s pattern recognition on steroids. It’s strong in some lanes and absolutely mediocre in others.
Generally stronger at:
- Imaging classification (x-ray pneumothorax detection, CT PE flags, diabetic retinopathy screens)
- Narrow, structured risk scores (sepsis prediction, readmission risk, AKI prediction)
- Pattern-heavy tasks with lots of training data (ECG interpretation, basic dermatology triage photos)
- Administrative suggestions (drafting discharge summaries, visit notes, prior auth letters)
Generally weaker at:
- Nuanced, multi-problem decision making (the 10-comorbidity 79-year-old on 18 meds)
- Value-laden decisions (goals of care, palliative vs aggressive)
- Rare diseases, zebras, and atypical presentations
- Anything requiring real-world context (social factors, family dynamics, unreliable historians)
If your AI is suggesting:
- “This is likely sepsis” in a borderline case → maybe helpful.
- “Cancel surgery tomorrow, patient is too high-risk” → absolutely not without your full review.
- “Primary diagnosis is somatic symptom disorder” on a complex patient → be very suspicious.
Gut check: If a PGY‑2 shouldn’t make that call alone, your AI shouldn’t either.
4. Question 2: Do You Understand What the Model Is Actually Doing?
You don’t need to code. You do need to know the class of tool you’re dealing with.
Ask your IT/admin (and keep asking until you get a straight answer):
- Is this a prediction model? (e.g., “30-day readmission risk 27%”)
- A rules engine? (if X labs + Y vitals → suggest Z)
- A generative model? (large language model summarizing, suggesting diagnoses, rewriting plans)
- A hybrid?
And equally important:
- What data does it see? EHR only? Imaging only? Notes? Labs?
- What population was it trained on? Same hospital system? US data? International?
If no one can answer this clearly, you treat it like a black box consultant with unknown training. That means:
- You can listen.
- You absolutely don’t trust it blindly.
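To make that distinction concrete, here's a minimal Python sketch of the two most common classes of tool. Everything in it (thresholds, coefficients, variable names) is invented for illustration, not taken from any real product.

```python
import math

# Hypothetical sketch: the same "sepsis flag" produced two different ways.
# All thresholds and coefficients below are invented, not clinically validated.

def rules_engine_flag(hr: float, sbp: float, lactate: float) -> bool:
    """Rules engine: fixed if/then logic you can read and audit line by line."""
    return hr > 110 and sbp < 95 and lactate > 2.0

def prediction_model_risk(hr: float, sbp: float, lactate: float) -> float:
    """Prediction model: a weighted score learned from historical data.
    You can't audit it by reading it; you have to ask what data it sees
    and what population it was trained on."""
    z = -6.0 + 0.03 * hr - 0.02 * sbp + 0.9 * lactate  # stand-in "learned" weights
    return 1 / (1 + math.exp(-z))  # logistic output between 0 and 1

vitals = dict(hr=118, sbp=92, lactate=2.4)
print("Rules engine flag:", rules_engine_flag(**vitals))
print("Prediction model risk: %.0f%%" % (100 * prediction_model_risk(**vitals)))
```

A generative model is a third animal entirely: it produces free text, and the same two questions (what does it see, what was it trained on) matter even more.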
5. Question 3: Does This Fit the Clinical Context in Front of You?
This is where AI quietly fails.
An AI flag might say: “Consider PE – intermediate risk.”
But you’re looking at the patient who’s clearly septic, tachycardic from fever, satting fine, with a crystal-clear pneumonia on imaging.
Does the suggestion fit the context? If not, you label it correctly: background noise.
Quick context test:
- Does the recommendation match the story? (HPI + exam + time course)
- Does it respect what you know that isn’t in the EHR? (spoke to family, saw the living situation, observed mental status shifts)
- Does it ignore something obvious? (patient is pregnant, has ESRD, is on DNR/DNI)
If context and AI diverge, context wins. Every time.
6. Question 4: Would a Reasonable Colleague Call This Plausible?
Easy check: imagine presenting the AI suggestion on rounds.
“Hey, the AI suggested we start X, change Y, consider Z.”
If you can picture your co-attending or a solid senior resident saying:
- “Yeah, that’s reasonable,” → you can give it more weight.
- “That makes zero sense for this patient,” → you treat it as an error signal, not help.
- “Maybe, but I’d want labs / imaging / consult first,” → that tells you the AI jumped a step.
The AI doesn’t get a “higher bar” or a “lower bar” than a human consultant. Same bar: would a reasonable clinician with some experience find this within the bounds of sanity?
If the answer is no, you’re done. Ignore it.
7. Question 5: Is It Guideline-Concordant or At Least Not Guideline-Hostile?
AI that’s aligned with major guidelines (ACC/AHA, IDSA, ADA, etc.) tends to be more trustworthy in that narrow lane.
Examples where it can help:
- Anticoagulation decisions and dosing in afib with CKD (CHA₂DS₂‑VASc and HAS‑BLED for the decision, eGFR for dose adjustment)
- Heart failure med optimization (flagging when you can up-titrate ACEi/ARNI, beta-blockers, MRA)
- Diabetes regimens aligned with ADA/EASD consensus
But watch for:
- Off-label suggestions without clear justification
- Recommendations that contradict local pathways (antibiotic choices, VTE prophylaxis in your institution)
- “Creative” regimens in high-risk patients
If it’s clearly in line with guidelines and your own usual practice, AI earns a small trust bonus. But not a blank check.
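Scores like CHA₂DS₂‑VASc are exactly the kind of structured arithmetic an AI tool should get right, and exactly the kind you can verify yourself in seconds. Here's the published scoring as a minimal Python sketch; treat it as a study aid and use a validated calculator plus your own judgment for actual care.

```python
def cha2ds2_vasc(age: int, female: bool, chf: bool, htn: bool, diabetes: bool,
                 stroke_tia: bool, vascular_disease: bool) -> int:
    """CHA2DS2-VASc stroke-risk score in atrial fibrillation (published criteria)."""
    score = 0
    score += 1 if chf else 0                               # C: CHF / LV dysfunction
    score += 1 if htn else 0                               # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)   # A2 / A: age bands
    score += 1 if diabetes else 0                          # D: diabetes
    score += 2 if stroke_tia else 0                        # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0                  # V: MI, PAD, aortic plaque
    score += 1 if female else 0                            # Sc: sex category (female)
    return score

# Example: a 79-year-old woman with CHF and hypertension -> 2 + 1 + 1 + 1 = 5
print(cha2ds2_vasc(age=79, female=True, chf=True, htn=True,
                   diabetes=False, stroke_tia=False, vascular_disease=False))
```

If an AI anticoagulation suggestion implies a score that doesn't match arithmetic this simple, that's your cue to dig before accepting.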
8. Question 6: Does It Show Its Work or Any Uncertainty?
Good AI tools:
- Show contributing factors (e.g., “High AKI risk due to rising creatinine, hypotension, nephrotoxic med list”)
- Display some measure of confidence or risk range
- Let you peek into the logic (at least at a high level)
Bad tools:
- Give a binary “Do this / Don’t do this” with no rationale
- Give no sense of how close the patient is to the alert threshold
- Make no distinction between “maybe helpful” and “absolutely critical”
If the AI:
- Can’t explain why it’s making the recommendation, and
- You can’t reverse-engineer the logic clinically
…you treat it like a suggestion from a medical student you’ve never met, on their first day.
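If it helps to picture the difference, here's a hypothetical sketch of the kind of output a transparent tool exposes. The field names are mine, not any vendor's; the point is what you'd want to see before trusting a flag.

```python
from dataclasses import dataclass, field

@dataclass
class RiskOutput:
    """Hypothetical shape of a transparent decision-support output (invented fields)."""
    label: str                   # what is being predicted, e.g. "AKI risk"
    risk: float                  # point estimate, 0-1
    risk_range: tuple            # some expression of uncertainty
    threshold: float             # where the alert fires, so you can see the margin
    contributors: list = field(default_factory=list)  # human-readable drivers

good = RiskOutput(
    label="AKI risk",
    risk=0.62,
    risk_range=(0.48, 0.74),
    threshold=0.50,
    contributors=["creatinine up 0.4 mg/dL in 24h", "MAP < 65 twice", "vanc + pip-tazo on board"],
)

# The "bad tool" equivalent is a bare {"action": "start bundle"}: nothing to reason with.
print(f"{good.label}: {good.risk:.0%} (range {good.risk_range[0]:.0%}-{good.risk_range[1]:.0%}), "
      f"fires at {good.threshold:.0%}; drivers: {', '.join(good.contributors)}")
```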
9. Question 7: Is Bias or Missing Data Poisoning This Output?
AI is often trained on:
- Sicker patients
- Over-represented demographics
- Under-documented social context
So ask:
- Does this systematically under-recognize disease in certain groups? (women, Black patients, non-English speakers)
- Is it leaning heavily on vitals/labs and ignoring social drivers or barriers?
- Is the model using race in any way? If yes, that’s already problematic in 2026.
Example: An AI predicting low readmission risk because the patient “never returns”… when in reality the patient has zero access to transportation, phones, or follow-up. That’s not low risk. That’s invisible risk.
If your patient is nothing like the “average” of your health system data, be more suspicious.
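One quick, informal way to think about that mismatch is an out-of-range check against the training cohort. The reference ranges below are invented placeholders; a real check would use the model's published training-population summary, if the vendor will give it to you.

```python
# Invented 5th-95th percentile ranges standing in for a model's training cohort.
TRAINING_COHORT = {
    "age": (45, 78),
    "egfr": (35, 95),
    "active_meds": (2, 14),
}

def out_of_distribution_flags(patient: dict) -> list:
    """List the features where this patient falls outside the training cohort's range."""
    flags = []
    for feature, (lo, hi) in TRAINING_COHORT.items():
        value = patient.get(feature)
        if value is None:
            flags.append(f"{feature}: missing (the model may silently impute it)")
        elif not lo <= value <= hi:
            flags.append(f"{feature}={value} outside training range {lo}-{hi}")
    return flags

# The 79-year-old on 18 meds with an eGFR of 18: nothing like the "average" patient.
print(out_of_distribution_flags({"age": 79, "egfr": 18, "active_meds": 18}))
```

The more of those flags you'd see, the more heavily you discount the output.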
10. Question 8: Can You Independently Verify This Quickly?
If you can’t verify it in under 1–2 minutes, treat the suggestion as low-priority unless the stakes are very high.
Quick verification channels:
- Skim guideline summary (UpToDate, institutional pathways, pocket guides)
- Ask a near colleague (“Hey, this AI is saying X—would you ever do that here?”)
- Look at recent labs/imaging yourself rather than trusting the AI interpretation fully
The goal isn’t to double-check every spell-check-level suggestion. It’s to always verify anything with meaningful downside if wrong.
11. Question 9: What’s the Downside If This Is Wrong?
This one’s non-negotiable. The higher the downside, the higher the bar.
High downside (treat as guilty until proven innocent):
- Starting or stopping anticoagulation
- Changing chemo regimens
- Major antibiotic escalations/de-escalations in sepsis
- Any intervention that could cause permanent harm if wrong
Low downside (more leeway):
- Resequencing routine lab orders
- Drafting a note that you edit
- Suggesting follow-up intervals you can adjust
Ask yourself:
“If this goes bad and I’m explaining it to a plaintiff’s attorney, am I comfortable saying, ‘I just accepted the AI’s recommendation’?”
If the answer is no, you don’t act on it without robust independent reasoning.
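If you want the “higher downside, higher bar” rule as something more mechanical, here's a toy encoding. The tiers and the steps attached to them are illustrative, not policy.

```python
# Toy encoding of "the higher the downside, the higher the bar."
VERIFICATION_BAR = {
    "low": ["quick sanity check, then move on"],
    "moderate": ["check the guideline or local pathway", "glance at the primary data yourself"],
    "high": ["review the primary data yourself", "check the guideline",
             "run it by a colleague or consultant",
             "document independent reasoning before acting"],
}

def required_steps(downside: str) -> list:
    """Map the potential harm of a wrong suggestion to the verification you owe it."""
    return VERIFICATION_BAR[downside]

print(required_steps("high"))  # anticoagulation changes, chemo, sepsis antibiotics
print(required_steps("low"))   # lab sequencing, draft notes you will edit anyway
```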
12. Question 10: Who Is Actually on the Hook?
Short answer: you are. Always.
Hospitals and vendors will talk about “decision support,” “assistive tools,” and “augmenting clinicians.” Notice none of that language accepts legal responsibility.
So operationally:
- Document your reasoning, not “AI said so.”
- If you follow AI, own it: “Given X, Y, Z, I agree with [specific action].”
- If you override AI, document why: “AI sepsis flag triggered, but exam and course inconsistent; alternative diagnosis more likely.”
If everyone knows you’re responsible, then AI becomes what it should be: another data point. Not the captain.
13. How to Use AI Safely: A Simple Practical Framework
Here’s a bare-bones way to integrate AI without letting it drive the bus:
Use AI for:
- Drafting: discharge summaries, patient instructions, prior auth letters, templated notes
- Reminders: “Hey, patient is eligible for SGLT2i,” “Hey, you can up-titrate HF meds”
- Triage support: flagging charts worth a second look (possible sepsis, AKI risk, etc.)
- Pattern checks: double-checking imaging findings or ECG flags that you then review yourself
Avoid using AI as:
- The primary decision-maker for diagnosis
- The final word on high-risk orders
- The source of truth for prognosis or goals-of-care discussions
- A substitute for actually looking at the patient
You’re allowed to let AI nudge you. You’re not allowed (ethically or legally) to let it replace you.
A rough, illustrative sense of how much weight AI output deserves by task (0 = no reliance, 100 = heavy reliance, always with your own review):

| Task | Relative reliance (illustrative, 0–100) |
|---|---|
| Documentation | 90 |
| Risk flags | 70 |
| Order suggestions | 50 |
| Diagnosis | 30 |
| Prognosis | 25 |
14. Red Flags: When You Should Ignore or Turn Off AI Suggestions
If you see these, don’t be heroic. Mute it, escalate it, or both.
- Repeatedly wrong suggestions for a certain population (e.g., chronic pain, SUD, pregnant patients)
- Harsh conflicts with local practice standards without explanation
- “Hallucinated” facts (it references diagnoses, meds, or labs that don’t exist)
- Performance that changes day to day with no notice (silent model updates)
- Leadership that can’t answer basic questions about validation or oversight
This isn’t about being anti-tech. It’s about refusing to be the beta tester on live patients without guardrails.
Put together, the whole checklist collapses into a simple flow:

| Step | Decision point / action | Where it goes |
|---|---|---|
| 1 | AI suggestion appears | Step 2 |
| 2 | Within AI strength? | No → ignore or treat as noise; Yes → Step 3 |
| 3 | Clinically plausible? | No → ignore or treat as noise; Yes → Step 4 |
| 4 | Guideline or local pathway aligned? | No → verify with a guideline or colleague first, then Step 5; Yes → Step 5 |
| 5 | High downside if wrong? | Yes → independent verification required, then Step 6; No → Step 6 |
| 6 | Consider using, with judgment | Step 7 |
| 7 | Document your reasoning | Done |
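For the programmatically inclined, the same flow fits in a few lines of Python. The function and argument names are mine, invented for illustration; nothing here comes from a real EHR.

```python
def triage_ai_suggestion(within_ai_strength: bool,
                         clinically_plausible: bool,
                         guideline_aligned: bool,
                         high_downside: bool) -> list:
    """Walk an AI suggestion through the flow above; return the actions you owe it."""
    if not within_ai_strength or not clinically_plausible:
        return ["Ignore or treat as noise"]
    actions = []
    if not guideline_aligned:
        actions.append("Verify against a guideline or with a colleague")
    if high_downside:
        actions.append("Independent verification required before acting")
    actions += ["Consider using, with judgment", "Document your reasoning"]
    return actions

# The CHF/COPD med change from the intro, assuming it survives your clinical read:
print(triage_ai_suggestion(within_ai_strength=True, clinically_plausible=True,
                           guideline_aligned=True, high_downside=True))
```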
FAQ: Should You Trust AI Clinical Recommendations?
Can I ever document “per AI recommendation” in my note?
You can reference AI as part of your information gathering, but you should never make it the main justification. “AI sepsis alert fired; independently assessed patient and found criteria A, B, C. Given this, initiated sepsis bundle.” Your reasoning, not the algorithm, has to be front and center.

Is it safe to use AI to help with differential diagnoses?
As a brainstorming tool, yes. As an authority, no. Let it suggest possibilities you might’ve missed, then you apply your own filtering: prevalence, fit with the story, exam findings, test results. Treat it like a junior resident’s differential: potentially helpful, rarely definitive.

What about AI that reads imaging or ECGs? Is that more trustworthy?
Generally those models are better validated and constrained. Still, you should treat them like a second reader, not the primary one. If radiology disagrees with the AI, you go with radiology. If the AI flags something radiology missed, that’s when you pick up the phone and discuss, not blindly accept.

How do I push back if my hospital wants us to “trust the AI more”?
Ask specific questions: What’s the model’s AUROC? On which population? What’s the false-positive and false-negative rate in our institution? Who reviews errors? How often is performance rechecked? If they can’t answer, your skepticism is justified and should be documented in any committee or email exchanges.

Is using AI clinically going to increase my malpractice risk?
If you treat it as a decision-support tool and document your own reasoning, it’s unlikely to increase your risk and may sometimes reduce it (e.g., by reminding you of guidelines). If you rubber-stamp its suggestions without independent thought, you’re absolutely increasing your risk. Judges and juries still expect a human clinician to be in charge.

Can I rely on generative AI (like chatbots) integrated into my EHR for patient education?
You can use it to draft content, but you need to scan it for accuracy, readability, and appropriateness for that specific patient. Check drug names, dosing, and any conditional language. Avoid giving it free rein on complex topics like prognosis, end-of-life planning, or controversial treatments; those need your voice and nuance.

What’s one simple habit that makes AI use much safer in my practice?
Any time you act on an AI suggestion that meaningfully changes care, pause for 10 seconds and ask yourself: “If this goes badly, can I defend this decision without saying ‘the computer told me so’?” If the answer is no, stop and re-check. That tiny pause will prevent a lot of dumb, AI-driven mistakes.
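When you do ask the “what's the AUROC, what are the error rates here” questions, it helps to know what those numbers mean. Here's a minimal sketch with made-up counts for a hypothetical sepsis alert.

```python
# Made-up confusion-matrix counts for one month of a hypothetical sepsis alert.
tp, fp, fn, tn = 40, 160, 10, 790   # alert vs. no alert, against true sepsis vs. not

sensitivity = tp / (tp + fn)            # true cases the alert caught: 80%
specificity = tn / (tn + fp)            # non-cases it correctly left alone: ~83%
false_positive_rate = fp / (fp + tn)    # alarms on patients without sepsis: ~17%
false_negative_rate = fn / (fn + tp)    # cases it missed: 20%
ppv = tp / (tp + fp)                    # when it fires, how often it's right: 20%

print(f"Sens {sensitivity:.0%}, spec {specificity:.0%}, FPR {false_positive_rate:.0%}, "
      f"FNR {false_negative_rate:.0%}, PPV {ppv:.0%}")

# AUROC summarizes sensitivity vs. false-positive rate across every possible threshold
# (0.5 is a coin flip, 1.0 is perfect). A respectable AUROC can still coexist with a
# PPV of 20% when the condition is uncommon, which is why "what's the AUROC" is the
# start of the conversation, not the end of it.
```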
Open your EHR the next time you see an AI suggestion and run it through just three questions: “Is this in AI’s lane? Does it fit this patient’s story? What’s the downside if it’s wrong?” Start there. Then build out the rest of the checklist as muscle memory.