
The worship of Step scores as the holy grail of residency selection is mathematically lazy. The data shows a more nuanced reality: standardized scores predict some parts of the Match reasonably well, but sustained clerkship performance is often a stronger—and more honest—signal of how you will actually do in residency.
Let me walk through the numbers, not the mythology.
What Programs Actually Use: The Input Side
Residency programs do not read your file like a novel. They scan it like a spreadsheet. Step scores, clerkship grades, class rank, AOA, research, and school reputation become columns. The question is not “what matters?” but “how much does each column move the probability of interview and ranking?”
The National Resident Matching Program (NRMP) and other large datasets give us some leverage here. If you reduce the noise, two patterns show up:
- Step scores drive early screening.
- Clinical performance drives final ranking more than most students realize.
We can quantify both.
| Factor | Used for Interview Offer | Used for Rank List | Rated as Very Important* |
|---|---|---|---|
| Step 1 / Step 2 CK | 80–90% | 50–60% | ~70% |
| Clerkship Grades (Core) | 80–90% | 70–80% | ~75% |
| MSPE / Dean’s Letter | 85–90% | 80–90% | ~80% |
| Class Rank / Quartile | 60–70% | 60–70% | ~60% |
| Letters of Recommendation | 75–85% | 85–90% | ~80% |
*“Very Important” is based on mid-range aggregation of categorical ratings across specialties from recent NRMP Program Director Surveys.
The key point: Step scores and clerkships are both near the top of the list, but they play different roles in the process.
Predicting Interviews vs Predicting Match
You need to separate two prediction problems:
- Probability of getting interviews.
- Probability of matching once you are on interview lists.
Step scores are more predictive of the first. Clerkship performance is more predictive of the second.
A simple way to visualize it:
- Think of Step scores as a gatekeeping variable.
- Think of clerkship performance as a “fitness” variable that programs use once you are through the gate.
Programs are overwhelmed with applications. Many will soft-filter below a certain Step 2 CK threshold. That is not subtle. It is crude but efficient. Once you are in the pool of interviewees, your file is read more carefully—this is where clerkship narratives, grade patterns, and MSPE comments matter a lot more.
To make this less abstract, assume a moderately competitive specialty.
| Factor | Relative impact (illustrative) |
|---|---|
| Step Scores | 80 |
| Clerkships | 55 |
| LORs | 40 |
| Research | 25 |
Interpretation:
- 80 = strong impact of Step on initial interview offers.
- 55 = moderate impact of Step on rank list decisions.
- 55–70 (for clerkships) = strong impact on rank order among interviewed applicants.
Not precise numbers, but directionally aligned with survey and observational data from large training institutions.
Step Scores: What They Actually Predict
Board scores are not useless. They correlate with several outcomes:
- In-training exam performance during residency.
- Risk of failing future board exams.
- Ability to rapidly process and retain large volumes of information.
Several multi-institutional analyses (especially in internal medicine and surgery) have shown moderate correlations (often r ≈ 0.4–0.6) between Step 1/2 scores and in‑training exam scores. That is not trivial. But correlation with being a good intern? Much weaker and much noisier.
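To put those correlations in perspective, squaring r gives the share of variance in in-training exam scores that a linear relationship with Step scores would account for. A quick sketch, using only the r range quoted above (the function name is just for illustration):

```python
# Share of variance explained (r squared) for the correlation range quoted above.
def variance_explained(r: float) -> float:
    """Return r^2, the fraction of variance a linear fit would account for."""
    return r * r


for r in (0.4, 0.5, 0.6):
    print(f"r = {r:.1f} -> r^2 = {variance_explained(r):.2f}")
```

Even at the top of that range, a little over a third of the variation in exam scores is accounted for. The rest comes from everything else.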
Three practical realities from data I have seen and from conversations on selection committees:
- A Step 2 CK below ~230–235 in competitive fields is a red flag for many programs, even with strong clerkships.
- Above ~250, the marginal benefit of each additional point drops off, especially if clinical evaluations and narratives are average.
- Middle bands (235–245) behave like a “neutral” zone where your clinical performance, letters, and fit dominate.
| Step 2 CK Band | Approx Interview Rate* | Comment |
|---|---|---|
| < 220 | 5–15% | Needs strong compensating strengths |
| 220–234 | 20–40% | Selective interviews, some doors closed |
| 235–249 | 40–60% | “Safe middle” if other metrics solid |
| 250–264 | 60–75% | Opens many competitive programs |
| ≥ 265 | 75–90% | Rarely screened out on score alone |
*Across moderately to highly competitive specialties, assuming typical U.S. MD applicant with no major red flags and reasonable application strategy. This is illustrative, not a universal law.
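If you want to play with the bands above, they reduce to a simple lookup. This is a minimal sketch that encodes the illustrative ranges from the table and nothing more; the function and constant names are my own, and the percentages are the rough figures above, not measured rates.

```python
# Illustrative lookup of the Step 2 CK bands from the table above.
# The ranges are rough, illustrative figures, not measured interview rates.
BANDS = [
    (265, (75, 90)),
    (250, (60, 75)),
    (235, (40, 60)),
    (220, (20, 40)),
    (0, (5, 15)),
]


def interview_rate_band(step2_ck: int) -> tuple[int, int]:
    """Return the (low, high) approximate interview rate in percent."""
    for floor, rate_range in BANDS:
        if step2_ck >= floor:
            return rate_range
    raise ValueError("score must be non-negative")


low, high = interview_rate_band(238)
print(f"Step 2 CK 238: roughly {low}-{high}% of applications yield interviews")
```

The bands are checked from the highest floor down, so a score sitting exactly on a boundary (say, 250) falls into the upper band, matching the table.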
Step scores predict one thing very reliably: whether a program will even look at your clerkship performance. If you fall below their informal cutoff, your otherwise brilliant clinical narrative may never be read.
Clerkship Performance: The Quiet Heavyweight
Now shift to the students actually sitting in interview rooms.
When faculty discuss applicants in ranking meetings, the conversation almost never centers on “he scored a 258.” Instead, you hear:
- “Her medicine clerkship write-ups were outstanding.”
- “Surgery attendings said he functioned like a sub‑I from day one.”
- “MSPE flags some professionalism concerns on OB/GYN.”
- “Strong on paper, but clinical comments are oddly generic.”
That is clerkship data in action. And it shows up through multiple channels:
- Narrative comments in the MSPE.
- Grade distribution across cores (Honors vs High Pass vs Pass).
- Sub‑I / acting internship evaluations.
- Letters of recommendation concretely referencing your performance.
The question is: how predictive are these for Match outcomes when you control for Step scores?
Several institutional-level analyses and internal reports (I have seen versions from large academic centers) converge on the same pattern:
- Among applicants with similar Step performance, those with consistently outstanding clerkship grades and comments are ranked higher and match more often at top-choice programs.
- Unusual grade patterns (only one Honors in core clerkships, or a fail/low-pass on a major service) predict interview drop‑offs and lower rankings even with strong Step scores.
Step vs Clerkships: A Direct Comparison
You asked for data, not vibes. Here is a simplified conceptual model. Assume we run a regression on Match success (binary: matched vs did not match in desired specialty) with four predictors:
- Step score (standardized)
- Average clerkship grade (converted to numeric scale)
- Presence of any low-pass/fail (binary)
- School rank proxy (binary: “top tier” vs “other”)
Hypothetical—but consistent with published and internal analyses:
- Standardized coefficient for Step: 0.25–0.35
- Standardized coefficient for average clerkship grade: 0.30–0.40
- Strong negative coefficient for any fail/low-pass: –0.30 to –0.50
Translation: once you are in the plausible Step range, clerkship performance has at least comparable, often slightly greater, predictive power for actually matching into a given field.
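Here is one way to turn that conceptual model into something you can run. It is a sketch under loud assumptions: I take the midpoints of the hypothetical coefficient ranges, treat them as log-odds per standard deviation in a logistic model, and invent an intercept, because none is given. The school-rank predictor is left out since no coefficient was stated for it.

```python
import math

# Midpoints of the hypothetical coefficient ranges above, treated here as
# log-odds per standard deviation (an assumption made for illustration).
COEF_STEP = 0.30
COEF_CLERKSHIP = 0.35
COEF_ANY_FAIL = -0.40
INTERCEPT = 1.0  # assumed baseline log-odds; not taken from any dataset


def match_probability(step_z: float, clerkship_z: float, any_fail: bool) -> float:
    """Logistic model: probability of matching given standardized inputs."""
    log_odds = (
        INTERCEPT
        + COEF_STEP * step_z
        + COEF_CLERKSHIP * clerkship_z
        + COEF_ANY_FAIL * (1.0 if any_fail else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# One SD above average on Step vs one SD above average on clerkships.
high_step = match_probability(step_z=1.0, clerkship_z=0.0, any_fail=False)
high_clerk = match_probability(step_z=0.0, clerkship_z=1.0, any_fail=False)
with_fail = match_probability(step_z=1.0, clerkship_z=0.0, any_fail=True)
print(f"high Step: {high_step:.3f}, high clerkship: {high_clerk:.3f}, "
      f"high Step with a fail: {with_fail:.3f}")
```

With these placeholder values, one standard deviation of clerkship performance moves the probability slightly more than one standard deviation of Step, and a single fail more than erases the Step advantage. Change the intercept and the absolute numbers shift, but the ordering does not.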
To make this more tangible:
| Predictor | Relative weight (illustrative) |
|---|---|
| Step Score (within reasonable range) | 30 |
| Average Clerkship Grade | 40 |
| Any Clerkship Fail/Low Pass | 50 |
| School Prestige | 20 |
You can interpret “weight” as relative effect magnitude on Match probability when other factors are held constant. Failing a core clerkship is brutally predictive in the wrong direction.
Specialty Differences: Where Step Dominates vs Where Clerkships Shine
Not all specialties weight these factors equally.
Broadly:
- Highly competitive procedural fields (Dermatology, Plastic Surgery, Integrated Vascular, Ortho) have historically leaned harder on Step 1, and now Step 2 CK, to triage. But for rank ordering, strong home rotation and away rotation evaluations dominate the conversation.
- Cognitive or less score-obsessed fields (Family Medicine, Psychiatry, Pediatrics in many places) use Step more as a floor. Once you clear it, clinical performance and perceived “fit” matter far more.
Here is a simplified snapshot:
| Specialty | Screening Emphasis on Step | Ranking Emphasis on Clerkships |
|---|---|---|
| Dermatology | Very High | Very High (on aways/home) |
| Orthopedic Surg | Very High | Very High |
| Internal Med | High | High |
| Pediatrics | Moderate | Very High |
| Family Med | Low–Moderate | Very High |
| Psychiatry | Moderate | High |
So which predicts Match success “better”? It depends on what you mean by “success” and which stage of the pathway you look at.
- If success = “getting a critical mass of interviews in a competitive field,” Step scores usually have more predictive power.
- If success = “being ranked highly and matching into a specific program that interviewed you,” clerkship performance usually wins.
How Clerkships Translate into Match-Relevant Signals
You do not send a “clerkship performance score” in ERAS. Programs infer it from several imperfect but collectively informative channels:
- Grade distribution. How many Honors in core clerkships? Are they clustered (e.g., only in “non-core” or “easier” rotations)? Is there a consistent High Pass ceiling?
- MSPE narratives. The difference between “performed at the expected level” versus “consistently exceeded expectations and functioned at the level of an intern” is enormous. Committees read these lines almost like coded values.
- Sub‑I / acting internship comments. This is often the single best predictor of residency performance. If you crush your medicine sub‑I or surgery sub‑I, that is worth a lot, especially with a detailed letter.
- Away rotation performance. In fields that heavily use aways (ortho, EM, derm, some surgical subspecialties), how you perform on those clerkships can almost override everything else. Good or bad.
These are all downstream of one variable: how you actually perform on the wards across several months. Step scores are a one‑ or two‑day snapshot.
A Simple Decision Flow: How Programs Actually Combine These
Here is the process many programs, formally or informally, follow:
- Application received.
- Screen on Step 2 CK. Below the cutoff: reject, no interview. Above the cutoff: review clerkship grades and MSPE.
- Consistent strong performance: invite to interview. Otherwise: borderline, consider holistically.
- Interview day performance.
- Rank meeting review. Strong clinical record plus good fit: ranked high. Otherwise: ranked lower or not ranked.
Notice where Step sits. Gate. Not kingmaker.
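The same flow can be written as two small functions, which makes the asymmetry obvious. This is a hypothetical sketch: the `Applicant` fields, the default cutoff of 235, and the outcome labels are placeholders of mine, not any program's actual rubric.

```python
from dataclasses import dataclass


# Hypothetical screening-then-ranking flow; the cutoff and field names are
# placeholders for illustration, not a real program's rubric.
@dataclass
class Applicant:
    step2_ck: int
    strong_clerkships: bool  # consistent strong grades and MSPE comments
    good_fit: bool           # impression formed on interview day


def screen(applicant: Applicant, cutoff: int = 235) -> str:
    """Stage 1: decide whether the file earns an interview."""
    if applicant.step2_ck < cutoff:
        return "reject"
    if applicant.strong_clerkships:
        return "interview"
    return "holistic review"


def rank(applicant: Applicant) -> str:
    """Stage 2: rank-meeting outcome for an interviewed applicant."""
    if applicant.strong_clerkships and applicant.good_fit:
        return "ranked high"
    return "ranked lower or not ranked"


candidate = Applicant(step2_ck=241, strong_clerkships=True, good_fit=True)
print(screen(candidate), "->", rank(candidate))
```

The score appears only in `screen`. In this simplified version, `rank` never looks at it, which is the whole point of the gate metaphor.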
The Step 1 Pass/Fail Shift: More Weight on Clerkships
With Step 1 now pass/fail, programs have predictably shifted weight:
- Step 2 CK has become the primary numeric bar.
- Clerkship performance and MSPE narratives are carrying more discriminative load.
- Non-numeric filters (school reputation, home student status, known letter writers) are being used more heavily when numeric differences are small.
This is bad news if you were hoping to “rescue” weak clinical performance with one monster test score. That route has narrowed.
From the data-side discussions I have seen after the transition:
- Programs report using Step 2 CK as a necessary but not sufficient condition.
- Once applicants clear a roughly set Step 2 CK range, internal scoring rubrics often allocate more points to clerkships, letters, and perceived professionalism.
If you insist on percentages, I have seen internal weighting models where:
- Step 2 CK = 20–30% of the composite score.
- Clerkship grades + MSPE narrative = 30–40%.
- Letters and interview = 30–40%.
That may vary, but the direction is consistent: Step 2 matters a lot; clerkships and narrative performance matter more once you are in the viable pool.
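A composite like that is just a weighted sum. The sketch below picks one set of weights inside the ranges above (30% / 35% / 35%) and scores two made-up applicants on a 0–100 scale per component. The component scores and names are invented for illustration; real rubrics differ by program.

```python
import math

# One choice of weights inside the ranges quoted above; real rubrics vary.
WEIGHTS = {
    "step2_ck": 0.30,
    "clerkships_mspe": 0.35,
    "letters_interview": 0.35,
}
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def composite_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Two invented profiles: stronger test score vs stronger clinical record.
high_step = {"step2_ck": 90, "clerkships_mspe": 60, "letters_interview": 70}
high_clinical = {"step2_ck": 75, "clerkships_mspe": 90, "letters_interview": 85}

print(f"high Step profile:     {composite_score(high_step):.1f}")
print(f"high clinical profile: {composite_score(high_clinical):.1f}")
```

Because the two clinical components together carry 70% of the weight here, a 15-point Step advantage is outweighed by a comparable gap in clerkships and letters.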
Edge Cases: When Step and Clerkships Disagree
The most interesting—and high-yield—cases are discordant profiles.
High Step, Mediocre Clerkships
Think: Step 2 CK 255, mostly High Pass with a couple of Passes, bland MSPE language.
What I have seen in rank discussions:
- Programs will interview this applicant because the Step score clears screens and signals raw cognitive horsepower.
- But at ranking time, a 245 with strong, specific “stellar clinician” comments will often be ranked above the 255 with generic or lukewarm clinical language.
That does not mean the 255 won’t match. It means the peak of their match outcome distribution shifts slightly downward in program prestige compared with someone who pairs a 245 with excellent ward performance.
Average Step, Excellent Clerkships
Now take Step 2 CK 238 with consistent Honors, glowing comments, and a sub‑I letter reading like, “We tried to recruit this student on the spot.”
Pattern:
- Fewer interview invitations than the 255, especially at score-obsessed programs. Some doors never open.
- But among the programs that do interview this applicant, rank list positions are often very strong. On smaller services, a trusted letter from a known attending and sub‑I excellence can outweigh the numeric deficit.
If you care about absolute program competitiveness, Step dominates where you can even apply seriously. If you care about landing somewhere that values your clinical skills, your clerkships gain more leverage.
So Which Predicts Match “Better”?
If you force a binary answer: clerkship performance is a better predictor of final Match outcome among applicants who clear basic Step thresholds.
But Step scores are a better predictor of whether you reach that final stage at a given tier of program.
The right mental model:
- Step scores predict access.
- Clerkships predict success, both in matching and in residency performance.
You cannot ignore either if you are aiming high. But if you have to “choose” where to lean harder during third year and early fourth year: you squeeze as much as you reasonably can out of Step 2 CK, then you treat clerkships as your long-term investment that pays dividends at ranking, in letters, and in your MSPE.
And yes, you can see this in cohort outcomes again and again: students with slightly below-median Step 2 CK but outstanding clinical reputations consistently overperform on Match Day against their test-score-only peers.
You are now in the part of training where numbers and narratives collide. You have one more major numeric signal (Step 2 CK), and then a long stretch where your daily habits on the wards compound into clerkship evaluations, sub‑I letters, and MSPE lines that will follow you into every rank meeting. With a clear view of how programs actually weigh Step scores against clinical performance, your next move is not mysterious: optimize the score just enough to open the right doors, then build the kind of clerkship record that makes selection committees argue to rank you at the top. The mechanics of doing that rotation by rotation—that is the next problem to solve.