
You are in your office at 10 p.m. The clerkship exam is next week. You have a pile of old questions, a syllabus, and an uneasy feeling that half of your test is still just “name that lab value.” You know your learners can memorize troponin cutoffs. What you are not seeing is whether they can actually manage a crashing NSTEMI at 3 a.m.
This is where most medical exams fail. Not because they are badly written. Because they are aimed at the wrong target.
If you want to test clinical reasoning, you must design for it from the ground up. Different question architecture. Different workflow. Different standards.
Let me walk you through a concrete, step‑by‑step way to build questions that actually assess how people think at the bedside, not how well they crammed a review book.
1. Get Clear On What “Clinical Reasoning” You’re Testing
Most exams go wrong here: the blueprint says “pneumonia,” and everyone writes “Which of the following is first‑line treatment for community‑acquired pneumonia?” That’s not reasoning. That’s recall.
You need to specify what piece of the diagnostic / management process your question is targeting.
Think in terms of micro‑skills of clinical reasoning:
- Problem representation (summarizing the case into a key clinical question)
- Generating a prioritized differential
- Identifying key discriminating features
- Choosing initial diagnostic tests
- Interpreting data in context
- Selecting the next best management step
- Re‑prioritizing when new data arrives
- Recognizing dangerous diagnoses you cannot miss (“can’t‑misses”)
- Identifying and correcting a cognitive error
For each exam, pick 3‑5 reasoning targets per topic, not just “knows pneumonia guidelines.”
Example for “Shortness of breath in clinic”:
- Distinguish CHF vs COPD vs pneumonia vs PE from history/physical.
- Choose the single most useful diagnostic test to narrow the differential.
- Interpret a CXR or BNP in the context of the case, not in isolation.
- Decide disposition: home with follow‑up vs ED vs admit vs ICU.
Write these down. They are your blueprint. If a draft question does not clearly hit one of these, that question is dead weight.
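If it helps to keep that blueprint honest, here is a minimal sketch of the same idea as data: each draft question gets tagged with one of the named reasoning targets, and anything untagged surfaces immediately. The structure and field names are my own invention, not a standard.

```python
# A minimal sketch of the blueprint as data, so untargeted questions surface
# automatically. Structure and field names here are illustrative only.
blueprint = {
    "Shortness of breath in clinic": [
        "Distinguish CHF vs COPD vs pneumonia vs PE from history/physical",
        "Choose the single most useful diagnostic test",
        "Interpret CXR or BNP in the context of the case",
        "Decide disposition: home vs ED vs admit vs ICU",
    ],
}

draft_questions = [
    {"id": "Q12", "topic": "Shortness of breath in clinic",
     "target": "Decide disposition: home vs ED vs admit vs ICU"},
    {"id": "Q13", "topic": "Shortness of breath in clinic", "target": None},
]

# Flag anything that is not aimed at a named reasoning target.
dead_weight = [
    q["id"] for q in draft_questions
    if q["target"] not in blueprint.get(q["topic"], [])
]
print(dead_weight)  # ['Q13']
```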
2. Use the Right Question Formats (and Stop Using the Wrong Ones)
Most of you are stuck with MCQs. Fine. Multiple‑choice can absolutely test reasoning—if you stop writing trivia.
There are three main MCQ structures that actually work for clinical reasoning.
2.1 Single‑Best‑Answer Vignettes (Properly Written)
This is your workhorse.
Structure:
- Clinical vignette with:
  - Age, sex
  - Presenting complaint
  - Key positives and negatives
  - Relevant past history, meds, social
  - Focused exam
  - Select labs/imaging only if they change the answer
- Clear, focused lead‑in question that demands a decision.
- 4–5 options that are all plausible but differ in quality of reasoning.
What this targets:
- Prioritizing differential
- Choosing next step in management
- Interpreting data in context
- Disposition decisions
Bad example (knowledge‑based):
A 65‑year‑old man has community‑acquired pneumonia. Which of the following is the mechanism of action of azithromycin?
This is pure pharmacology recall.
Good reasoning‑based example:
A 65‑year‑old man presents with 2 days of productive cough, fever, and pleuritic chest pain. He is febrile at 38.6°C, RR 26, BP 110/70, HR 105, O2 sat 91% on room air. Crackles at right base. No confusion. BUN 14 mg/dL.
Chest X‑ray: right lower lobe consolidation.
Which of the following is the most appropriate next step in management?
A. Discharge home with oral azithromycin
B. Discharge home with oral levofloxacin
C. Admit to general ward with IV ceftriaxone and azithromycin
D. Admit to ICU for IV piperacillin‑tazobactam and vancomycin
E. Observe in ED for 24 hours without antibiotics
Here you are testing:
- Use of a severity score (CURB‑65 implicitly)
- Risk stratification
- Appropriate setting and antibiotic spectrum
Knowledge is embedded, but the task is a reasoning decision.
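Because the keyed disposition leans on a severity score, it is worth sanity-checking the score yourself while drafting. A minimal sketch, assuming the standard CURB‑65 thresholds; the function name and inputs are my own.

```python
# Minimal CURB-65 helper for sanity-checking a keyed disposition while drafting.
# Thresholds are the standard ones; function name and inputs are illustrative.
def curb65(confusion: bool, bun_mg_dl: float, rr: int,
           sbp: int, dbp: int, age: int) -> int:
    score = 0
    score += confusion              # C: new confusion
    score += bun_mg_dl > 19         # U: BUN > 19 mg/dL (urea > 7 mmol/L)
    score += rr >= 30               # R: respiratory rate >= 30
    score += sbp < 90 or dbp <= 60  # B: systolic < 90 or diastolic <= 60
    score += age >= 65              # 65: age >= 65
    return score

# The vignette patient above:
print(curb65(confusion=False, bun_mg_dl=14, rr=26, sbp=110, dbp=70, age=65))  # 1
```

Run on the vignette above, it returns 1. Whether a score of 1 settles the disposition, given the saturation of 91%, is exactly the contextual judgment the options are meant to probe.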
2.2 “Next‑Best‑Step” Sequencing Questions
Same basic format as above, but the emphasis is on sequencing decisions.
These questions are powerful because clinical reasoning is often about “what do I do now, before I do anything else?”
Example:
A 24‑year‑old woman presents to the ED with acute shortness of breath and pleuritic chest pain. She is 3 weeks postop from ACL reconstruction. RR 30, HR 118, BP 122/80, O2 sat 89% on room air. Lungs are clear to auscultation.
Which of the following is the most appropriate next step in management?
A. Order a CT pulmonary angiogram
B. Start empiric therapeutic low‑molecular‑weight heparin
C. Obtain a D‑dimer level
D. Order a V/Q scan
E. Discharge with outpatient follow‑up
Here, the learner needs to:
- Recognize high pre‑test probability for PE.
- Know that D‑dimer is useless in high‑probability situations.
- Decide whether imaging or immediate anticoagulation comes first, considering stability.
You are testing their process, not just “what test diagnoses PE?”
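If you want to make "high pre‑test probability" explicit while checking your own key, one option is to encode a Wells‑type calculation. The vignette never names a score, so treat this as my own illustrative sketch (two‑tier Wells, standard weights):

```python
# Illustrative two-tier Wells score helper; criteria weights are the standard
# ones, but this encoding is my own and not something the vignette specifies.
WELLS_CRITERIA = {
    "clinical_signs_of_dvt": 3.0,
    "pe_most_or_equally_likely_dx": 3.0,
    "heart_rate_over_100": 1.5,
    "recent_surgery_or_immobilization": 1.5,  # surgery within 4 weeks or immobilized >= 3 days
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "active_malignancy": 1.0,
}

def wells_score(findings: set[str]) -> float:
    return sum(points for name, points in WELLS_CRITERIA.items() if name in findings)

# The ACL-reconstruction patient above, if you judge PE the leading diagnosis:
score = wells_score({"pe_most_or_equally_likely_dx",
                     "heart_rate_over_100",
                     "recent_surgery_or_immobilization"})
print(score, "-> PE likely" if score > 4 else "-> PE unlikely")  # 6.0 -> PE likely
```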
2.3 Multi‑Step or “Short Case Cluster” Questions
This is underused in written exams but excellent for reasoning.
One patient. Two or three sequential questions:
- Initial presentation → what is the best next step?
- You give them the result of that step → what is your diagnosis?
- Now that you have a diagnosis → what is the best management?
Each question builds on the prior one. You are approximating real‑time clinical reasoning.
| Step | What the learner sees |
|---|---|
| 1 | Initial vignette |
| 2 | Q1: next best step |
| 3 | New data given |
| 4 | Q2: likely diagnosis |
| 5 | Additional data or context |
| 6 | Q3: management or disposition |
If you control scoring software, you can make later questions visible only after answers are locked, but even if you cannot, the structure alone forces you to design questions that mimic clinical workflow.
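If you do control the platform, the mechanics are simple enough to prototype. A hypothetical sketch of the data structure, not tied to any real exam system:

```python
# Hypothetical sketch of a short case cluster with locked, sequential answers.
# Names are illustrative and not tied to any particular exam platform.
from dataclasses import dataclass, field

@dataclass
class Step:
    new_info: str            # vignette text or new data revealed at this step
    question: str            # e.g. "Which of the following is the next best step?"
    options: dict[str, str]  # option letter -> option text
    key: str                 # keyed answer

@dataclass
class CaseCluster:
    steps: list[Step]
    locked_answers: list[str] = field(default_factory=list)

    def current_step(self) -> Step:
        # Only the step after the last locked answer is visible.
        return self.steps[len(self.locked_answers)]

    def lock_answer(self, choice: str) -> None:
        # Once locked, an answer cannot be revised; the next step unlocks.
        self.locked_answers.append(choice)
```

The point of `lock_answer` is that learners cannot go back and revise Q1 after seeing the data revealed for Q2, which is what keeps the cluster honest as a reasoning assessment.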
3. Build Vignettes That Contain Reasoning Clues, Not Noise
Most cases are either bloated with irrelevant data or stripped to Step 1‑era minimalism. Neither tests real reasoning.
When you write a vignette, every sentence should be doing one of three things:
- Pushing the reader toward or away from a key diagnosis.
- Establishing severity/risk.
- Setting up a cognitive trap you want to see if they avoid.
3.1 A Practical Vignette Template
Use something like this:
- Opening frame:
  - Age, sex
  - Setting (ED, clinic, wards, ICU, postpartum unit)
  - Chief complaint in the patient’s own words if possible
- History:
  - Time course and key symptom qualities
  - 2–3 targeted associated symptoms (yes and no)
  - Past medical history that actually matters
  - Meds, allergies, habits only if relevant
- Exam:
  - A vital sign pattern that helps rank severity
  - Focused exam findings that nudge the differential
  - 1–2 negatives that rule out key alternatives
- Data:
  - Only tests whose presence meaningfully affects the reasoning task
  - Do not dump entire panels if only 1 value matters
- Lead‑in question:
  - Single decision. No “which of the following is true/false” nonsense.
If a detail does not serve one of these jobs, cut it.
3.2 Encoding Diagnostic Discriminators
You want the student to be forced to use discriminating features. So you have to include them.
Example: Chest pain differential – ACS vs pericarditis vs PE vs reflux.
Weak vignette:
54‑year‑old man with chest pain.
Strong vignette:
54‑year‑old man with 4 hours of substernal chest pressure radiating to the left arm, brought on while shoveling snow, associated with nausea and diaphoresis, partially relieved with rest, not reproducible with palpation.
That tells the reader: exertional, pressure, radiation, autonomic symptoms. Very different reasoning from “chest pain, could be anything.”
You should deliberately include conflicting or ambiguous features in some cases, because real patients are messy. But the correct answer must still be clearly best, based on the overall pattern.
4. Write Lead‑In Questions That Force Reasoning
Most otherwise good vignettes get ruined in the last line.
You have two main jobs with the lead‑in:
- Force a specific cognitive task (diagnose, choose test, manage, disposition).
- Eliminate the option for “gotcha” knowledge checks.
Here are lead‑in structures that do test reasoning:
- “Which of the following is the most likely diagnosis?”
- “Which of the following is the most appropriate next step in management?”
- “Which of the following is the most appropriate next diagnostic test?”
- “Which of the following is the best initial treatment?”
- “Which of the following is the most appropriate disposition for this patient?”
- “Which of the following findings would most strongly support your suspected diagnosis?”
- “Which of the following, if present, would most strongly argue against your leading diagnosis?”
Avoid these:
- “Which of the following is true/false?” (always devolves into trivia)
- “All of the following are correct EXCEPT…” (tests test‑taking tricks, not reasoning)
- “Which of the following statements about [disease] is correct?” (pure recall)
If your lead‑in can be answered without reading the stem, you failed.
Quick check: Cover the answer options and read only the stem and lead‑in. If an expert could commit to an answer from that alone, before seeing the options, you are on the right track.
5. Craft Options That Reveal How People Think (Not Just What They Know)
The options are where you see the reasoning errors. Or fail to.
Every distractor should map to a recognizable mistake or alternative pathway. If you cannot say which cognitive error or guideline mistake an option represents, throw it out.
5.1 Map Options to Common Cognitive Errors
Typical patterns in clinical reasoning:
- Premature closure – stopping after the first plausible diagnosis
- Anchoring – sticking with an initial impression despite conflicting data
- Availability bias – over‑valuing recent or vivid diagnoses
- Confirmation bias – selecting data that fit your hypothesis
- Over‑reliance on a single test – ignoring pre‑test probability
Design options to reflect these.
Example:
A 45‑year‑old woman with known GERD presents with chest burning. Risk factors for CAD present. EKG shows subtle ST depressions in V4‑V6.
Lead‑in: “Which of the following is the most appropriate next step?”
Options could be:
A. Increase her proton pump inhibitor dose and discharge
B. Order an outpatient stress test
C. Admit for serial EKGs and troponins with cardiology consultation
D. Discharge with reassurance and follow‑up in 1 week
E. Give a GI cocktail in the ED and reassess in 2 hours
- A/D reflect anchoring on GERD history.
- B reflects underestimation of risk (chronic outpatient thinking).
- E is “test of treatment” reasoning, common but unsafe here.
- C is the correct risk‑based reasoning.
You are not just checking if they know “ST changes = admit.” You are seeing whether they override the tempting GI diagnosis.
5.2 Make Plausible but Inferior Answers
The wrong options should be:
- Clinically plausible
- Logically defensible at first glance
- Inferior once the entire vignette is considered
This means you must sometimes include partially correct answers—things you could do—so that the learner must decide what they should do first.
Do not create absurd distractors (“Prescribe vitamin C”) unless your goal is to waste space.
5.3 Option‑Writing Checklist
Before you finalize:
- Is there one clearly best option and 3–4 clearly inferior but plausible options?
- Does each distractor correspond to a known clinical or cognitive error?
- Can a testwise but ignorant student guess the answer from grammar/length/clues? If yes, fix.
- Are options homogeneous (all are tests, or all are treatments, etc.)?
If you cannot answer “yes” to all of those, keep editing.
6. Use Data and Scoring Feedback To Sharpen Reasoning Focus
Once you have an exam, you are not done. You need to see how your questions actually behave.
If you have item statistics, use them. If you do not, you can still analyze patterns from answer distributions.
| Question | Percent correct |
|---|---|
| Q1 - Dyspnea | 82 |
| Q2 - Chest Pain | 48 |
| Q3 - Abdominal Pain | 65 |
| Q4 - Syncope | 55 |
| Q5 - Sepsis | 38 |
Look at three things per question:
Difficulty index (percent correct)
- &gt;90%: Probably too easy unless testing a must‑know safety item.
- 30–80%: Often the sweet spot for reasoning questions.
- <30%: Either too hard, poorly written, or not taught.
Discrimination index (how well the item separates strong from weak students)
- High discrimination: Keep, maybe model on it.
- Low/negative: Red flag. The item may be misleading or keyed incorrectly.
Distractor analysis (who chose what)
- If a distractor is chosen by almost nobody, it is useless.
- If one wrong option is hugely popular with mid‑performing students, you have identified a common reasoning error—this is gold for feedback sessions.
When you see a question where top performers and bottom performers choose the correct option at similar rates, the item is not assessing higher‑order thinking. It might be “too easy recall” or just noise.
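If your platform does not hand you these statistics, the arithmetic is easy to run yourself. A minimal sketch assuming responses stored per student and per item; the upper/lower 27% split is the classic hand-calculated discrimination index, and all names here are illustrative:

```python
# Minimal item-analysis sketch: difficulty, discrimination (upper/lower 27%
# split), and distractor counts. Data layout and names are illustrative.
from collections import Counter

def item_analysis(responses, key, group_frac=0.27):
    # responses: {student_id: {item_id: chosen_option}}, key: {item_id: correct_option}
    totals = {sid: sum(ans.get(item) == correct for item, correct in key.items())
              for sid, ans in responses.items()}
    ranked = sorted(responses, key=totals.get)          # weakest to strongest
    n = max(1, int(len(ranked) * group_frac))
    low, high = ranked[:n], ranked[-n:]

    results = {}
    for item, correct in key.items():
        chosen = [responses[sid].get(item) for sid in responses]
        results[item] = {
            "difficulty": sum(c == correct for c in chosen) / len(chosen),
            "discrimination": (sum(responses[s].get(item) == correct for s in high)
                               - sum(responses[s].get(item) == correct for s in low)) / n,
            "distractors": Counter(chosen),             # who chose what
        }
    return results

key = {"Q2": "C"}
responses = {"s1": {"Q2": "C"}, "s2": {"Q2": "A"}, "s3": {"Q2": "C"}, "s4": {"Q2": "A"}}
print(item_analysis(responses, key)["Q2"])
# {'difficulty': 0.5, 'discrimination': 1.0, 'distractors': Counter({'C': 2, 'A': 2})}
```

A point-biserial correlation would do the same job as the upper/lower split if you prefer; the output you care about is the same trio per item: difficulty, discrimination, and the distractor counts.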
7. A Concrete Workflow for Rewriting Your Existing Exam
You probably already have 40–60 questions. Most are salvageable with structure and clarity. Here is how to triage and fix them.
| Step | Description |
|---|---|
| 1 | Collect old questions |
| 2 | Classify each item's intent: reasoning or recall? |
| 3 | Discard or convert recall-only items |
| 4 | Rewrite the stem and lead‑in |
| 5 | Add a clinical vignette |
| 6 | Revise options to map to cognitive errors |
| 7 | Peer review |
| 8 | Pilot on the next exam |
Step 1: Sort questions into three piles
- Pile A – Clearly recall only (no vignette, or trivial).
- Pile B – Mixed (some clinical scenario, but weak task).
- Pile C – Already decent reasoning questions.
Be ruthless with Pile A. Most need major surgery.
Step 2: For Pile B and C, rewrite using this sequence
For each question:
- State the reasoning skill you want to test in one sentence. Example: “Differentiate between cardiogenic and septic shock based on exam and hemodynamics.”
- Rebuild the vignette to surface discriminating features for that skill.
- Rewrite the lead‑in to force a decision (diagnosis, next step, test, disposition).
- Rewrite options:
- 1 best answer representing sound reasoning.
- 3–4 distractors mapping to specific reasoning or guideline errors.
Step 3: For Pile A, decide: convert or kill
If you have a pure recall question:
“Which of the following bacteria is most commonly associated with community‑acquired pneumonia?”
You can either:
- Convert it into a case (fever, cough, CXR pattern, risk factors) and ask for empirical treatment choice or next step.
- Or accept that the core is trivia and drop it in favor of something else.
Not every fact deserves exam real estate. If it does not reflect what you care about in your learners’ clinical practice, stop testing it.
8. Add a Few Non‑MCQ Elements (If Your System Allows It)
Some of you have the luxury of OSCEs, oral exams, or SAQs (short answer questions). Those can push reasoning assessment even further.
8.1 Very Short Answer (VSA) Questions
These are typically 1–3 words or a short sentence, auto‑scored against an accepted answer list. Because there are no options to recognize, they remove the cueing effect built into MCQs.
Example:
A 68‑year‑old man with known COPD presents with increasing dyspnea and productive cough. He is febrile and tachypneic. CXR shows right lower lobe consolidation.
Question: “What is the single most likely causative organism?”
Or:
“What is the most appropriate initial antibiotic regimen?”
You still need a well‑built vignette and specific task, but you eliminate recognition cues.
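Auto-scoring these comes down to normalization plus matching against the key. A hedged sketch; the accepted answers below are purely illustrative, and in practice anything that fails to match would usually be routed to manual review rather than marked wrong outright:

```python
# Sketch of auto-scoring a very short answer against an accepted-answer list.
# Accepted answers are illustrative; unmatched responses would typically go
# to a human reviewer rather than be marked wrong automatically.
import re

def normalize(text: str) -> str:
    text = re.sub(r"[^a-z0-9 ]", "", text.lower().strip())  # drop punctuation
    return re.sub(r"\s+", " ", text)                         # collapse whitespace

def score_vsa(answer: str, accepted: set[str]) -> bool:
    return normalize(answer) in {normalize(a) for a in accepted}

accepted = {"Streptococcus pneumoniae", "S. pneumoniae", "Pneumococcus"}
print(score_vsa("s pneumoniae", accepted))       # True
print(score_vsa("strep pneumoniae", accepted))   # False -> route to manual review
```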
| Format | Strength for Clinical Reasoning | Weakness |
|---|---|---|
| Single-best-answer MCQ | Excellent, if vignette and options are well-designed | Cueing allows partial guessing |
| Short answer / VSA | Strong for diagnostic/treatment decisions | Harder to score, requires clear key |
| OSCE station | Excellent for real-time reasoning and communication | Resource-intensive to implement |
| Oral exam | Captures thinking process in depth | Vulnerable to examiner bias |
8.2 OSCE‑Style Reasoning Prompts
If you run OSCEs, you can bake reasoning into the scoring checklist:
- “Explains likely diagnosis and at least two alternatives.”
- “Explains why a test is or is not indicated.”
- “Adjusts differential after new data (e.g., lab result) is given.”
But keep OSCEs focused on what they are good at—process and communication. Use written exams to stress‑test the internal logic of your learners’ thinking.

9. Teach to the Test, On Purpose (When the Test Is Good)
Everyone says “do not teach to the test.” That is only true when the test is bad.
If your exam is full of solid clinical reasoning questions, then you should teach to it. Explicitly.
Here is how:
- Use your exam questions as teaching tools in conferences after the exam. Walk through why each distractor is wrong, naming the cognitive error.
- Show learners your structure: How a good vignette is built, what “next best step” means in practice.
- Align cases in teaching with your blueprint: If your blueprint emphasizes risk stratification, your cases in morning report should force that.
| Question type | Share of exam (%) |
|---|---|
| Recall | 20 |
| Mixed | 30 |
| Reasoning | 50 |
If residents and students know you are grading how they think, they start paying attention to that in clinics and on rounds. They start saying things like “Given his risk factors, my pre‑test probability for PE is high, so the next test should be…” That is the behavior you want.
10. A Realistic Upgrade Plan for Your Next Exam Cycle
You are busy. You do not have time to rebuild everything at once. Fine. Here is a practical timeline that works over 6–12 months.
| Timeframe | Task |
|---|---|
| Months 1–2 | Review existing bank and classify items |
| Months 2–3 | Rewrite the highest-yield 20 questions |
| Months 3–4 | Implement on the next exam |
| Months 4–5 | Analyze item performance |
| Months 5–6 | Expand the approach to the rest of the bank |
Concrete actions:
- This month
- Identify 10–20 questions that cover your most important clinical topics.
- Apply the full rewrite process to those only.
- Next exam
- Swap in the redesigned items.
- Take basic stats or at least note which questions generated the most discussion.
- Post‑exam
- Review item performance, tweak bad ones, keep strong performers.
- Use 3–5 questions as case‑based teaching examples.
- Over the year
- Each exam cycle, convert another 10–20 questions.
- Within a year, the bulk of your core bank will be reasoning‑oriented.

11. Common Mistakes To Stop Making Tomorrow
If you remember nothing else, at least avoid these traps:
- Vignettes that are just decorations for a recall question.
- Lead‑ins that ask about mechanisms, definitions, or random facts instead of decisions.
- Options that are either obviously wrong or indistinguishable.
- “Except” and “all of the following” formulations.
- Testing what is easy to grade rather than what matters in practice.
You are training clinicians, not Jeopardy contestants. Your exam should reflect that.

Key Takeaways
- Start every question by naming the clinical reasoning skill you are testing, then build the vignette, lead‑in, and options around that.
- Use single‑best‑answer vignettes and next‑best‑step questions that mimic real workflow, with distractors mapped to recognizable cognitive errors.
- Continuously refine your bank using performance data and learner feedback, treating good questions as both assessment tools and teaching cases.