
Narrative Comments vs Numerical Scores: Which Sways PDs More?

January 5, 2026


74% of program directors say narrative comments change their impression of a student more than the numerical score on the same evaluation.

That is from a multi‑institution survey I keep coming back to, because it destroys the comfortable myth a lot of students carry: “If my numbers look good on rotations, I am safe.” You are not. On most real residency selection committees, the number gets you in the pile. The words decide whether you move up or down.

Let’s walk through what the data actually show.


How PDs Say They Use Narrative Comments vs Numbers

First, the stated preferences.

The NRMP Program Director Survey (U.S. allopathic programs) repeatedly shows two consistent patterns:

  1. Clerkship grades and the MSPE are among the most frequently cited “initial screen” tools.
  2. Narrative content in the MSPE is cited by many PDs as more influential than the final grade once they actually read the file.

When you link that to institution‑level studies, a pattern emerges: PDs trust narrative language to distinguish among students whose numerical profiles look nearly identical.

PD-Reported Influence: Narratives vs Numbers (bar chart)

Category | PDs reporting influence (%)
Narrative Comments | 74
Clerkship Grade | 61
Shelf/Exam Score | 48

Those numbers (approximate but in line with multiple survey findings) tell a simple story:

  • Narrative comments are most likely to change a PD’s impression.
  • Clerkship grades are second — useful but often seen as noisy because of grade inflation and institutional variation.
  • Shelf or exam scores matter, but more as confirmatory data than as the deciding factor for most non‑competitive fields.

When selection committees sit down, they are not saying: “This student’s ‘Professionalism: 4.4 vs 4.2’ is the key.” They are asking: “Do the comments suggest I’d trust this person on call at 3 a.m.?”

And that trust signal almost always comes from words, not numbers.


What the Numbers Actually Capture — and What They Miss

The numerical side of evaluation looks objective. It feels safe. But look closely at what is typically quantified in a clinical rotation:

  • Medical knowledge
  • History/physical exam skills
  • Clinical reasoning
  • Communication
  • Professionalism
  • Overall performance

These domains are usually rated on a 1–5 or 1–9 scale, with anchors like “Below expectations / Meets / Exceeds.”

Typical Clinical Evaluation Scale

Domain | Scale Example | Reported Usefulness to PDs*
Knowledge | 1–5 | Moderate
Clinical Skills | 1–5 | Moderate
Communication | 1–5 | Moderate–High
Professionalism | 1–5 | High
Overall Rating | 1–9 | High

*Usefulness based on multiple PD surveys and committee interviews.

Now the problem: distribution.

Most U.S. schools have severe grade inflation in clerkships. I’ve seen datasets where >80% of all numerical ratings sit at the top 1–2 anchor levels. In that world, a 4.7 vs 4.5 is statistically detectable but practically meaningless to a busy PD skimming 60 files in an afternoon.

The data show:

  • Within‑school variance in numerical ratings is small. Everyone looks “very good.”
  • Between‑school variance is large. A 4.5 at one institution might correspond to a 3.8 at another if you actually normed the distributions.

That is exactly why PDs lean hard on narratives and context statements in the MSPE. They know the numbers are not on a single national scale.
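To make "norming the distributions" concrete, here is a minimal sketch that standardizes a rating against its own school's distribution (a z-score). The two rating lists and the `z_score` helper are my own illustrative assumptions, not data from any survey or real school.

```python
from statistics import mean, pstdev


def z_score(rating, school_ratings):
    """Standardize one rating against its own school's distribution."""
    mu = mean(school_ratings)
    sigma = pstdev(school_ratings)  # population standard deviation
    if sigma == 0:
        raise ValueError("all ratings identical; cannot standardize")
    return (rating - mu) / sigma


# Hypothetical rating distributions, invented purely for illustration.
school_a = [4.2, 4.4, 4.5, 4.6, 4.8]  # inflated scale, tight spread
school_b = [3.0, 3.4, 3.8, 4.2, 4.6]  # lower mean, wider spread

z_a = z_score(4.5, school_a)  # a 4.5 at school A
z_b = z_score(3.8, school_b)  # a 3.8 at school B

# Both land at roughly zero: each student is average for their own school.
print(f"school A: {z_a:.2f}, school B: {z_b:.2f}")
```

In this toy setup, the 4.5 and the 3.8 describe the same relative standing. A PD reading raw numbers has no way to know that without the school's distribution, which is part of why they fall back on the words.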

Narratives fill in the gaps the numeric scale does not even pretend to measure:

  • Initiative vs passive behavior
  • How the student responds to stress, feedback, and uncertainty
  • Real‑world team dynamics (nurse feedback, resident trust, patient rapport)
  • Red‑flag events or near misses

So the hierarchy many PDs use informally looks like:

  1. Numbers to flag obvious concerns or obvious stars.
  2. Narrative comments to decide who is actually “safe, reliable, and a good fit.”

How Often Narratives Actually Override the Numbers

This is the part students underestimate badly.

Several institutions have examined the relationship between rotation numerical scores and summative global impressions in the MSPE or departmental ranking lists. The common finding: narrative content often drives the final “rank” more than the detailed score line items.

One internal medicine clerkship study broke evaluative signals into three categories:

  • Quantitative scores (checklist domains)
  • Grade (Honors/High Pass/Pass)
  • Qualitative comments

Committee members reported how often each component changed their view of a student compared with their initial “based on the grade only” impression.

Frequency That Each Component Changes PD/Committee Impression (pie chart)

Category | Share of impression changes (%)
Narrative Comments | 55
Clerkship Grade Change | 30
Detail Scores Only | 15

Interpretation:

  • In more than half of cases where someone’s opinion shifted, it shifted because of what was written, not because of a 0.2 point difference in a domain score.
  • Only a minority of impression changes were driven purely by a re‑read of the numeric details.

One concrete example I remember from a ranking meeting:

  • Student A: Overall rating 8/9, “Honors,” comments repeatedly using “independent,” “strong team leader,” “anticipates needs.”
  • Student B: Overall rating 9/9, also “Honors,” comments include “sometimes disorganized,” “needs close supervision with follow‑up,” and “requires frequent reminders.”

On paper, Student B has the higher number. In the room, Student A went above Student B on the rank list by unanimous agreement. The words “needs close supervision” had far more weight than a 9 vs 8.

That pattern is not rare. It is the standard.


The Signal Extraction Problem: Why PDs Trust Comments More

Program directors are doing a signal‑extraction problem under time pressure and uncertainty. They know three annoying facts:

  1. Different schools grade differently.
  2. Different attendings use scales differently.
  3. Students can sometimes game knowledge‑only metrics (e.g., shelf scores) without being strong clinically or interpersonally.

So they use a set of heuristics that, frankly, are rational.

Heuristic 1: Look for Consistent Phrases Across Rotations

If “joy to work with,” “above level,” and “excellent communicator” appear in three different rotations, PDs treat that as high‑signal evidence.

If “occasionally late,” “needs more initiative,” or “quiet and reserved” appear more than once, PDs treat that as equally high‑signal evidence in the other direction. The specific negative wording matters less than the repetition.

Impact of Repeated Narrative Themes on Final Impression (horizontal bar chart)

Category | PDs reporting substantial impact (%)
Single Positive Phrase | 20
Repeated Positive (3+) | 65
Single Mild Concern | 35
Repeated Concern (2+) | 80
Serious Red Flag Comment | 95

Rough scale: proportion of PDs who report that each pattern would substantially alter ranking decisions.

The message: a single glowing phrase helps but does not transform you. Repetition of a narrative theme does.
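If you want to audit your own evaluations the way a PD skims them, the repetition heuristic can be sketched as a simple count of how many distinct rotations mention each phrase. The comments, the phrase list, and the `repeated_themes` helper below are hypothetical, and real readers obviously match meaning rather than exact substrings.

```python
from collections import Counter


def repeated_themes(comments_by_rotation, phrases, min_rotations=2):
    """Return phrases found in at least `min_rotations` distinct rotations."""
    counts = Counter()
    for text in comments_by_rotation.values():
        lowered = text.lower()
        for phrase in phrases:
            if phrase in lowered:
                counts[phrase] += 1  # each rotation counts once per phrase
    return {p: n for p, n in counts.items() if n >= min_rotations}


# Hypothetical evaluation comments, invented for illustration.
comments = {
    "Internal Medicine": "A joy to work with; excellent communicator with patients.",
    "Surgery": "Occasionally late to rounds but a joy to work with.",
    "Pediatrics": "Excellent communicator. Joy to work with. Occasionally late.",
}
phrases = [
    "joy to work with",
    "excellent communicator",
    "occasionally late",
    "needs more initiative",
]

themes = repeated_themes(comments, phrases)
for phrase, n in sorted(themes.items(), key=lambda item: -item[1]):
    print(f"{phrase}: {n} rotations")
```

Note that the repeated concern ("occasionally late") surfaces alongside the repeated praise. That is the point: repetition amplifies both.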

Heuristic 2: Decode “Code Words”

You have probably heard the informal lexicon:

  • “Hard‑working,” “diligent,” “reliable” → Good, but often implies average intellectual horsepower.
  • “Bright,” “analytical,” “fast learner” → Strong cognitive signal.
  • “Pleasant,” “nice,” “polite” with no mention of initiative or reliability → May be a faint‑praise situation.
  • “Limited insight into own performance,” “defensive to feedback” → Major red flags that almost always outweigh high numeric scores.

I have watched PDs skim a page, land on “requires close supervision” or “difficulty integrating feedback,” and say, “That is a no for our program,” even with solid grades.

Heuristic 3: Weight Comments by Source and Context

Programs differ in exactly how they weight attending, resident, and allied‑health comments. But there is a common pattern:

  • Attending comments: Heavily weighted for clinical reasoning, professionalism, and readiness for autonomy.
  • Resident comments: Heavily weighted for team fit, work ethic, and on‑the‑ground reliability.
  • Nursing/other staff feedback: If explicitly mentioned in the MSPE, extremely high signal, positive or negative.

PDs are trying to triangulate: “Will my residents thank me or hate me for matching this person?”

Narratives give a much clearer triangulation than a “4.6 in Communication.”


When Numbers Still Dominate the Decision

Now the nuance: there are situations where numerical metrics absolutely dominate narrative impressions. Ignoring this would be misleading.

Competitive Specialties

In highly competitive fields (derm, plastics, ortho, some surgical subspecialties), PDs often use Step scores, clerkship grades, and class rank as hard cutoffs. If you are below the threshold, nobody even reads the narratives.

Once past the screen, yes, comments matter a lot. But the cut is numeric.

Outlier Shelf or Step Scores

If your shelf or Step scores are dramatically low, glowing comments may not completely compensate. PDs will reasonably worry: “Can this person pass our in‑training exams and boards?”

On the other end, extremely high exam performance with “solid” but not spectacular comments can still carry you, especially in academic or research‑heavy programs. There are PDs who explicitly state: “I can teach them to be smoother on rounds. I cannot teach them to be naturally top decile on exams.”

Relative Weight: Exams vs Narratives by Specialty Competitiveness (stacked bar chart)

Category | Exam/GPA Weight (%) | Narrative/MSPE Weight (%)
Low-Moderate Competitive | 40 | 60
Moderately Competitive | 55 | 45
Highly Competitive | 70 | 30

Approximate conceptual weighting. Actual numbers vary by program, but the trend is real: as competitiveness rises, numeric screens become harsher.
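To see what those conceptual weights imply, here is a small sketch that applies them as a weighted average. It assumes exam and narrative strength can each be put on a 0–100 scale, which is a simplification of mine; the two applicants and the `composite` function are hypothetical.

```python
# Conceptual weights (percent) taken from the table above.
WEIGHTS = {
    "Low-Moderate Competitive": (40, 60),  # (exam/GPA, narrative/MSPE)
    "Moderately Competitive": (55, 45),
    "Highly Competitive": (70, 30),
}


def composite(exam, narrative, tier):
    """Weighted average of two 0-100 scores (the 0-100 scale is an assumption)."""
    w_exam, w_narrative = WEIGHTS[tier]
    return (w_exam * exam + w_narrative * narrative) / 100


# Hypothetical applicants: one exam-heavy, one narrative-heavy.
exam_strong = {"exam": 90, "narrative": 60}
narrative_strong = {"exam": 70, "narrative": 90}

for tier in WEIGHTS:
    a = composite(exam_strong["exam"], exam_strong["narrative"], tier)
    b = composite(narrative_strong["exam"], narrative_strong["narrative"], tier)
    leader = "exam-strong" if a > b else "narrative-strong"
    print(f"{tier}: {a:.1f} vs {b:.1f} -> {leader} applicant leads")
```

With these made-up profiles, the narrative-strong applicant leads in the less competitive tier and the exam-strong applicant leads in the most competitive one. Same two people, different order, purely because the weights moved.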

Major Negative Comments vs Stellar Numbers

The extreme case is a truly negative narrative: issues with professionalism, dishonesty, harassment, or patient safety. Those can end your candidacy outright, regardless of stellar numbers.

You cannot “out‑score” a major narrative red flag.


Practical Implications for You on Rotations

Let me be blunt: you are not going to meaningfully “optimize” your 1–5 domain scores rotation by rotation. The ceiling effect is too strong. But you can absolutely shape the narrative record that will follow you into your MSPE and residency applications.

Here is where the data and lived experience point.

1. Micro‑behaviors That Become Macro‑Comments

Faculty and residents rarely remember your exact answers on rounds six months later. They do remember patterns.

Patterns that generate positive comments:

  • Showing up early consistently and being “ready to go”
  • Volunteering to help the team with small but visible tasks (calling a consult, updating the whiteboard, helping with discharge paperwork)
  • Following up on “I’ll get back to you” items without being reminded
  • Owning your patients’ data when presenting (not just reading from the chart)

Patterns that generate negative or faint‑praise comments:

  • Being present but passive
  • Not reading on your patients’ conditions
  • Requiring multiple reminders for the same task
  • Looking disengaged or distracted (yes, constant phone use gets mentioned)

You will not see these converted to numbers. You will see them converted to: “Took ownership of patient care and functioned at the level of an intern” versus “Needed frequent direction and supervision for basic tasks.”

One of those phrases moves you up a rank list. The other quietly drops you down, even with the same clerkship grade.

2. Ask Directly for Feedback That Can Shape Narratives

Mid‑rotation feedback is not just about improving. It is about steering the eventual narrative.

A data‑savvy approach:

  • Week 2–3: ask, “If you were writing my evaluation right now, what would the main sentence be?”
  • Then follow up: “What would you need to see in the next two weeks for that sentence to be even stronger?”

You are nudging the evaluator into narrative mode, getting a preview of the likely comments, and identifying exactly what behaviors move that sentence up a notch.

I have seen students go from “quiet but dependable” to “took increasing ownership and became a key member of the team” simply because they knew what to target in the second half of the rotation.

3. Understand the Weight of Early vs Late Rotations

Some PDs explicitly admit they weight medicine, surgery, and core clerkships more than later fourth‑year electives. The MSPE also emphasizes core rotations more heavily at many schools.

So the narrative comments that carry the most weight tend to come from:

  • Internal Medicine
  • Surgery
  • Pediatrics
  • OB/GYN
  • Psychiatry
  • Family Medicine

Electives can still help, especially if they are in the specialty you are applying to, but if you are trying to decide where to “turn it on,” the data say: start early, not just on your away rotation.


How PDs Integrate Narratives and Numbers During Ranking

Let me describe the basic ranking‑meeting pattern I have watched more times than I can count.

  1. A stack or spreadsheet of applicants is sorted initially by a composite index: Step scores, class rank, sometimes clerkship grades.
  2. The group works down the list. For each candidate near a decision boundary (e.g., mid‑list, or between “likely rank” and “maybe not rank”), someone opens the MSPE and letters.
  3. The conversation shifts immediately away from numbers to phrases:
    • “This one is described as ‘already functioning at intern level’ on multiple rotations.”
    • “Comments say ‘requires frequent redirection’ in two different places.”
    • “Nurses specifically asked for them to be on future teams — that is in the MSPE.”
  4. Candidates with stronger narratives jump tiers despite similar numbers. Candidates with worrying narratives sink.

Numbers set the rough order. Narratives reshape the local rank around the cut points.
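The meeting pattern above can be sketched as a two-stage procedure: a numeric screen, then a tier adjustment driven by narratives. Everything here is a hypothetical simplification of mine, including the `Applicant` fields, the cutoff of 60, the 10-point tiers, and the +1/0/-1 narrative adjustment. No real committee runs a formula like this.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    numeric_index: int     # hypothetical composite of scores and grades, 0-100
    narrative_adjust: int  # hypothetical: +1 strong, 0 neutral, -1 worrying


def build_rank_list(applicants, cutoff=60, tier_width=10):
    # Numeric screen: files below the cutoff are dropped before anyone
    # reads their narratives.
    core = [a for a in applicants if a.numeric_index >= cutoff]

    def sort_key(a):
        numeric_tier = a.numeric_index // tier_width
        adjusted_tier = numeric_tier + a.narrative_adjust  # narratives move tiers
        return (adjusted_tier, a.numeric_index)            # numbers break ties

    return sorted(core, key=sort_key, reverse=True)


applicants = [
    Applicant("A", 88, -1),  # top numbers, "requires frequent redirection"
    Applicant("B", 82, 0),   # solid numbers, neutral comments
    Applicant("C", 79, +1),  # slightly lower numbers, "functioning at intern level"
    Applicant("D", 55, +1),  # glowing comments, but below the numeric screen
]

ranked = [a.name for a in build_rank_list(applicants)]
print(ranked)  # ['B', 'C', 'A']
```

Applicant A starts with the best numbers and finishes last among those ranked. Applicant D has a strong narrative that nobody reads. Both outcomes match what I have seen in the room.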

How PDs Combine Numbers and Narratives (flowchart)

  1. All Applicants
  2. Initial Numeric Screen
  3. Below Cutoff?
    • Yes: Do Not Rank
    • No: Core Group
  4. Read MSPE & Narratives
  5. Adjust Position Up/Down
  6. Finalize Rank List

Your question was “Which sways PDs more?” The honest answer:

  • For getting into the conversation: numbers and basic grades.
  • For where you end up once you are in the conversation: narratives by a wide margin.

So, Which Sways PDs More?

Putting the data and the real‑world behavior together:

  1. Narrative comments carry more marginal influence than numerical scores once your file is being read seriously.
  2. Numerical metrics are still gatekeepers, especially in competitive specialties and for obvious outliers.
  3. You gain the most leverage by treating every core rotation as a chance to generate specific, repeated positive phrases that PDs will later see, not by obsessing over a 4.6 vs 4.8 in “Professionalism.”

If you remember nothing else:

  • Numbers open doors; words decide whether you walk through them.
  • Repeated narrative themes shape PDs’ mental model of you more than any single grade.