Residency Advisor

Step Scores vs Clerkship Performance: Which Predicts Match Better?

January 6, 2026
13 minute read


The worship of Step scores as the holy grail of residency selection is mathematically lazy. The data shows a more nuanced reality: standardized scores predict some parts of the Match reasonably well, but sustained clerkship performance is often a stronger—and more honest—signal of how you will actually do in residency.

Let me walk through the numbers, not the mythology.

What Programs Actually Use: The Input Side

Residency programs do not read your file like a novel. They scan it like a spreadsheet. Step scores, clerkship grades, class rank, AOA, research, and school reputation become columns. The question is not “what matters?” but “how much does each column move the probability of interview and ranking?”

The National Resident Matching Program (NRMP) and other large datasets give us some leverage here. If you reduce the noise, two patterns show up:

  1. Step scores drive early screening.
  2. Clinical performance drives final ranking more than most students realize.

We can quantify both.

Program Directors' Reported Importance
| Factor | Used for Interview Offer | Used for Rank List | Rated as Very Important* |
|---|---|---|---|
| Step 1 / Step 2 CK | 80–90% | 50–60% | ~70% |
| Clerkship Grades (Core) | 80–90% | 70–80% | ~75% |
| MSPE / Dean's Letter | 85–90% | 80–90% | ~80% |
| Class Rank / Quartile | 60–70% | 60–70% | ~60% |
| Letters of Recommendation | 75–85% | 85–90% | ~80% |

*“Very Important” is based on mid-range aggregation of categorical ratings across specialties from recent NRMP Program Director Surveys.

The key point: Step scores and clerkships are both near the top of the list, but they play different roles in the process.

Predicting Interviews vs Predicting Match

You need to separate two prediction problems:

  1. Probability of getting interviews.
  2. Probability of matching once you are on interview lists.

Step scores are more predictive for #1. Clerkship performance is more predictive for #2.

A simple way to visualize it:

  • Think of Step scores as a gatekeeping variable.
  • Think of clerkship performance as a “fitness” variable that programs use once you are through the gate.

Programs are overwhelmed with applications. Many will soft-filter below a certain Step 2 CK threshold. That is not subtle. It is crude but efficient. Once you are in the pool of interviewees, your file is read more carefully—this is where clerkship narratives, grade patterns, and MSPE comments matter a lot more.

To make this less abstract, assume a moderately competitive specialty.

Hypothetical Impact of Metrics on Match Phases

| Category | Relative Impact |
|---|---|
| Step Scores | 80 |
| Clerkships | 55 |
| LORs | 40 |
| Research | 25 |

Interpretation:

  • 80 = strong impact of Step scores on initial interview offers.
  • 55 = moderate impact of clerkships at the screening stage.
  • Among interviewed applicants, the weight of clerkship performance on rank order rises sharply, to roughly 55–70 on this scale, overtaking Step.

Not precise numbers, but directionally aligned with survey and observational data from large training institutions.

Step Scores: What They Actually Predict

Board scores are not useless. They correlate with several outcomes:

  • In-training exam performance during residency.
  • Risk of failing future board exams.
  • Ability to rapidly process and retain large volumes of information.

Several multi-institutional analyses (especially in internal medicine and surgery) have shown moderate correlations (often r ≈ 0.4–0.6) between Step 1/2 scores and in‑training exam scores. That is not trivial. But correlation with being a good intern? Much weaker and much noisier.

Three practical realities from data I have seen and from conversations on selection committees:

  1. A Step 2 CK below ~230–235 in competitive fields is a red flag for many programs, even with strong clerkships.
  2. Above ~250, the marginal benefit of each additional point drops off, especially if clinical evaluations and narratives are average.
  3. Middle bands (235–245) behave like a “neutral” zone where your clinical performance, letters, and fit dominate.

Step 2 CK Bands and Interview Chances (Hypothetical)

| Step 2 CK Band | Approx Interview Rate* | Comment |
|---|---|---|
| < 220 | 5–15% | Needs strong compensating strengths |
| 220–234 | 20–40% | Selective interviews, some doors closed |
| 235–249 | 40–60% | "Safe middle" if other metrics solid |
| 250–264 | 60–75% | Opens many competitive programs |
| ≥ 265 | 75–90% | Rarely screened out on score alone |

*Across moderately to highly competitive specialties, assuming typical U.S. MD applicant with no major red flags and reasonable application strategy. This is illustrative, not a universal law.

Step scores predict one thing very reliably: whether a program will even look at your clerkship performance. If you fall below their informal cutoff, your otherwise brilliant clinical narrative may never be read.
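The banded table above behaves like a simple lookup. Here is a minimal sketch of that lookup in Python; the cutoffs and rate ranges are the article's illustrative figures, not real program data:

```python
# Illustrative Step 2 CK bands from the table above (hypothetical numbers).
# Each entry: (exclusive upper bound, (low %, high %), comment).
BANDS = [
    (220, (5, 15), "Needs strong compensating strengths"),
    (235, (20, 40), "Selective interviews, some doors closed"),
    (250, (40, 60), "Safe middle if other metrics solid"),
    (265, (60, 75), "Opens many competitive programs"),
    (float("inf"), (75, 90), "Rarely screened out on score alone"),
]

def interview_band(step2_ck: int) -> tuple[tuple[int, int], str]:
    """Return (approx interview-rate range in %, comment) for a score."""
    for upper, rate_range, comment in BANDS:
        if step2_ck < upper:
            return rate_range, comment
    raise AssertionError("unreachable: last band is unbounded")

rate, comment = interview_band(242)
print(rate, comment)  # (40, 60) Safe middle if other metrics solid
```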

Clerkship Performance: The Quiet Heavyweight

Now shift to the students actually sitting in interview rooms.

When faculty discuss applicants in ranking meetings, the conversation almost never centers on “he scored a 258.” Instead, you hear:

  • “Her medicine clerkship write-ups were outstanding.”
  • “Surgery attendings said he functioned like a sub‑I from day one.”
  • “MSPE flags some professionalism concerns on OB/GYN.”
  • “Strong on paper, but clinical comments are oddly generic.”

That is clerkship data in action. And it shows up through multiple channels:

  • Narrative comments in the MSPE.
  • Grade distribution across cores (Honors vs High Pass vs Pass).
  • Sub‑I / acting internship evaluations.
  • Letters of recommendation concretely referencing your performance.

The question is: how predictive are these for Match outcomes when you control for Step scores?

Several institutional-level analyses and internal reports (I have seen versions from large academic centers) converge on the same pattern:

  • Among applicants with similar Step performance, those with consistently outstanding clerkship grades and comments are ranked higher and match more often at top-choice programs.
  • Unusual grade patterns (only one Honors in core clerkships, or a fail/low-pass on a major service) predict interview drop‑offs and lower rankings even with strong Step scores.

Step vs Clerkships: A Direct Comparison

You asked for data, not vibes. Here is a simplified conceptual model. Assume we run a regression on Match success (binary: matched vs did not match in desired specialty) with four predictors:

  • Step score (standardized)
  • Average clerkship grade (converted to numeric scale)
  • Presence of any low-pass/fail (binary)
  • School rank proxy (binary: “top tier” vs “other”)

Hypothetical—but consistent with published and internal analyses:

  • Standardized coefficient for Step: 0.25–0.35
  • Standardized coefficient for average clerkship grade: 0.30–0.40
  • Strong negative coefficient for any fail/low-pass: –0.30 to –0.50

Translation: once you are in the plausible Step range, clerkship performance has at least comparable, often slightly greater, predictive power for actually matching into a given field.
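To see how coefficients of that size interact, here is a toy logistic model using the midpoints of the hypothetical ranges above. The intercept and the school-rank coefficient are arbitrary illustrative choices (the article gives no values for them); nothing here is fitted to real data:

```python
import math

# Midpoints of the hypothetical standardized coefficient ranges above.
# INTERCEPT and "top_tier" are assumed values for illustration only.
COEF = {
    "step_z": 0.30,       # Step score, standardized
    "clerkship_z": 0.35,  # average clerkship grade, standardized
    "any_fail": -0.40,    # any low-pass/fail (0 or 1)
    "top_tier": 0.15,     # school rank proxy (0 or 1), assumed
}
INTERCEPT = 0.8

def match_probability(step_z, clerkship_z, any_fail=0, top_tier=0):
    """Toy logistic model of P(match) from standardized predictors."""
    logit = (INTERCEPT
             + COEF["step_z"] * step_z
             + COEF["clerkship_z"] * clerkship_z
             + COEF["any_fail"] * any_fail
             + COEF["top_tier"] * top_tier)
    return 1 / (1 + math.exp(-logit))

# A 1-SD bump in clerkship grade moves P(match) slightly more than a
# 1-SD bump in Step score, because its coefficient is larger; a single
# fail/low-pass drags the probability down harder than either helps.
print(match_probability(step_z=1, clerkship_z=0))
print(match_probability(step_z=0, clerkship_z=1))
print(match_probability(step_z=1, clerkship_z=0, any_fail=1))
```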

To make this more tangible:

Relative Predictive Weight on Match Probability

| Category | Relative Weight |
|---|---|
| Step Score (within reasonable range) | 30 |
| Average Clerkship Grade | 40 |
| Any Clerkship Fail/Low Pass | 50 |
| School Prestige | 20 |

You can interpret “weight” as relative effect magnitude on Match probability when other factors are held constant. Failing a core clerkship is brutally predictive in the wrong direction.

Specialty Differences: Where Step Dominates vs Where Clerkships Shine

Not all specialties weight these factors equally.

Broadly:

  • Highly competitive procedural fields (Dermatology, Plastic Surgery, Integrated Vascular, Ortho) have historically leaned harder on Step 1, and now Step 2 CK, to triage. But for rank ordering, strong home rotation and away rotation evaluations dominate the conversation.
  • Cognitive or less score-obsessed fields (Family Medicine, Psychiatry, Pediatrics in many places) use Step more as a floor. Once you clear it, clinical performance and perceived “fit” matter far more.

Here is a simplified snapshot:

Relative Emphasis by Specialty (Conceptual)

| Specialty | Screening Emphasis on Step | Ranking Emphasis on Clerkships |
|---|---|---|
| Dermatology | Very High | Very High (on aways/home) |
| Orthopedic Surgery | Very High | Very High |
| Internal Medicine | High | High |
| Pediatrics | Moderate | Very High |
| Family Medicine | Low–Moderate | Very High |
| Psychiatry | Moderate | High |

So which predicts Match success “better”? It depends what you mean by “success” and which cut of the pathway you look at.

  • If success = “getting a critical mass of interviews in a competitive field,” Step scores usually have more predictive power.
  • If success = “being ranked highly and matching into a specific program that interviewed you,” clerkship performance usually wins.

How Clerkships Translate into Match-Relevant Signals

You do not send a “clerkship performance score” in ERAS. Programs infer it from several imperfect but collectively informative channels:

  1. Grade distribution.
    How many Honors in core clerkships? Are they clustered (e.g., only in “non-core” or “easier” rotations)? Is there a consistent High Pass ceiling?

  2. MSPE narratives.
    The difference between “performed at the expected level” versus “consistently exceeded expectations and functioned at the level of an intern” is enormous. Committees read these lines almost like coded values.

  3. Sub‑I / acting internship comments.
    This is often the single best predictor of residency performance. If you crush your medicine sub‑I or surgery sub‑I, that is worth a lot, especially with a detailed letter.

  4. Away rotation performance.
    In fields that heavily use aways (ortho, EM, derm, some surgical subspecialties), how you perform on those clerkships can almost override everything else. Good or bad.

These are all downstream of one variable: how you actually perform on the wards across several months. Step scores are a one‑ or two‑day snapshot.

A Simple Decision Flow: How Programs Actually Combine These

Here is the process many programs, formally or informally, follow:

Residency Program Evaluation Flow (reconstructed from the original flowchart):

  1. Application received.
  2. Screen: Step 2 CK above cutoff? If not → reject, no interview.
  3. Review clerkship grades and the MSPE.
  4. Consistent strong performance → invite to interview; borderline → holistic review before deciding.
  5. Interview day performance.
  6. Rank meeting review: strong clinical record plus good fit → ranked high; otherwise → ranked lower or not ranked.

Notice where Step sits. Gate. Not kingmaker.
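The gate-then-rank logic can be sketched as a toy function. The cutoff, branch names, and boolean inputs are all illustrative; real programs combine these signals far less mechanically:

```python
def evaluate_applicant(step2_ck, clerkships_strong, borderline_ok,
                       interview_strong, good_fit, cutoff=235):
    """Toy sketch of the gate-then-rank flow described above.

    All thresholds and labels are hypothetical illustrations.
    """
    # Gate: the Step 2 CK screen happens before anything else is read.
    if step2_ck < cutoff:
        return "Reject - No Interview"
    # Review clerkship grades and MSPE; borderline files get a holistic look.
    invited = clerkships_strong or borderline_ok
    if not invited:
        return "Reject - No Interview"
    # Interview day, then the rank meeting.
    if interview_strong and good_fit:
        return "Ranked High"
    return "Ranked Lower or Not Ranked"

# Below the cutoff, nothing else in the file is ever evaluated.
print(evaluate_applicant(225, True, True, True, True))
print(evaluate_applicant(250, True, False, True, True))
```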

The Step 1 Pass/Fail Shift: More Weight on Clerkships

With Step 1 now pass/fail, programs have predictably shifted weight:

  • Step 2 CK has become the primary numeric bar.
  • Clerkship performance and MSPE narratives are carrying more discriminative load.
  • Non-numeric filters (school reputation, home student status, known letter writers) are being used more heavily when numeric differences are small.

This is bad news if you were hoping to “rescue” weak clinical performance with one monster test score. That route has narrowed.

From the data-side discussions I have seen after the transition:

  • Programs report using Step 2 CK as a necessary but not sufficient condition.
  • Once applicants clear a roughly set Step 2 CK range, internal scoring rubrics often allocate more points to clerkships, letters, and perceived professionalism.

If you insist on percentages, I have seen internal weighting models where:

  • Step 2 CK = 20–30% of the composite score.
  • Clerkship grades + MSPE narrative = 30–40%.
  • Letters and interview = 30–40%.

That may vary, but the direction is consistent: Step 2 matters a lot; clerkships and narrative performance matter more once you are in the viable pool.
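As a sanity check on what such a rubric implies, here is a minimal composite-score sketch using midpoints of the ranges above (Step 2 CK 25%, clerkships + MSPE 35%, letters + interview 35%), with an assumed 5% "other" bucket so the weights sum to 1. Component scores on a 0–100 scale are invented for illustration:

```python
# Midpoints of the internal weighting ranges quoted above; the "other"
# bucket is an assumption added so the weights sum to 1.
WEIGHTS = {"step2_ck": 0.25, "clerkships_mspe": 0.35,
           "letters_interview": 0.35, "other": 0.05}

def composite_score(scores: dict) -> float:
    """Weighted composite of 0-100 component scores; missing keys count as 0."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * scores.get(k, 0) for k in WEIGHTS)

# A monster test score with lukewarm clinical signals loses the composite
# to a merely solid score backed by strong wards performance.
high_step = {"step2_ck": 95, "clerkships_mspe": 60, "letters_interview": 60}
high_wards = {"step2_ck": 70, "clerkships_mspe": 90, "letters_interview": 85}
print(composite_score(high_step))   # 65.75
print(composite_score(high_wards))  # 78.75
```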

Edge Cases: When Step and Clerkships Disagree

The most interesting—and high-yield—cases are discordant profiles.

High Step, Mediocre Clerkships

Think: Step 2 CK 255, mostly High Pass with a couple of Passes, bland MSPE language.

What I have seen in rank discussions:

  • Programs will interview this applicant because the Step score clears screens and signals raw cognitive horsepower.
  • But at ranking time, a 245 with strong, specific “stellar clinician” comments will often be ranked above the 255 with generic or lukewarm clinical language.

That does not mean the 255 won’t match. It means the peak of their match outcome distribution shifts slightly downward in program prestige compared with someone who pairs a 245 with excellent ward performance.

Average Step, Excellent Clerkships

Now take Step 2 CK 238 with consistent Honors, glowing comments, and a sub‑I letter reading like, “We tried to recruit this student on the spot.”

Pattern:

  • Fewer interview invitations than the 255, especially at score-obsessed programs. Some doors never open.
  • But among the programs that do interview this applicant, rank list positions are often very strong. On smaller services, a trusted letter from a known attending and sub‑I excellence can outweigh the numeric deficit.

If you care about absolute program competitiveness, Step dominates where you can even apply seriously. If you care about landing somewhere that values your clinical skills, your clerkships gain more leverage.

So Which Predicts Match “Better”?

If you force a binary answer: clerkship performance is a better predictor of final Match outcome among applicants who clear basic Step thresholds.

But Step scores are a better predictor of whether you reach that final stage at a given tier of program.

The right mental model:

  • Step scores predict access.
  • Clerkships predict success, both in matching and in residency performance.

You cannot ignore either if you are aiming high. But if you have to “choose” where to lean harder during third year and early fourth year: you squeeze as much as you reasonably can out of Step 2 CK, then you treat clerkships as your long-term investment that pays dividends at ranking, in letters, and in your MSPE.

And yes, you can see this in cohort outcomes again and again: students with slightly below-median Step 2 CK but outstanding clinical reputations consistently overperform on Match Day against their test-score-only peers.


You are now in the part of training where numbers and narratives collide. You have one more major numeric signal (Step 2 CK), and then a long stretch where your daily habits on the wards compound into clerkship evaluations, sub‑I letters, and MSPE lines that will follow you into every rank meeting. With a clear view of how programs actually weigh Step scores against clinical performance, your next move is not mysterious: optimize the score just enough to open the right doors, then build the kind of clerkship record that makes selection committees argue to rank you at the top. The mechanics of doing that rotation by rotation—that is the next problem to solve.
