
Only 46% of program directors now cite Step 1 as a factor in selecting applicants for interview. Before pass/fail, that number was 94%.
That single drop rewired the residency selection algorithm. The scoring function did not disappear. It just shifted its weight to other inputs: core clerkship grades and Step 2 CK. If you are in the Step 1 pass/fail era, your question is no longer “How do I get a 250?” It is “What actually predicts a match when the old metric is gone?”
Let me walk through this like a data problem: what signals exist, how strong they are, how programs really use them, and how clerkship grades and Step 2 interact as predictors.
The new selection model: what replaced the 3‑digit Step 1?
Before Step 1 went pass/fail, it was the primary sorting key. NRMP Program Director Survey data from 2018 and 2020 were very clear:
- 94%+ of PDs used Step 1 for interview offers
- Roughly 80% rated it as a factor with “high importance”
- Many specialties had soft cutoffs around 230–240, with competitive fields at 245+
After the transition to pass/fail, the 2022 and 2024 PD surveys show a very consistent reweighting:
- Step 2 CK: used by ~88–90% of PDs, with >80% rating it “very important”
- Clerkship grades and MSPE: used by >90% of PDs, with ~65–75% rating them “very important”
- Class rank / quartile (where reported): used by ~70%, with ~60% rating it important
The data show the “objective” exam anchor moved from Step 1 to Step 2 CK, and the “performance” anchor moved toward core clerkship outcomes.
Think of it as a simple model:
Match probability ≈ f(Step 2 CK, Core Clerkship Performance, School Reputation, Research, Letters, Interview)
Step 1 used to be a heavy coefficient in that function. Now the coefficients on Step 2 and clerkship metrics have gone up. You are not in a “holistic” utopia. You are in a reweighted regression.
Core clerkship grades: how predictive are they really?
No one publishes a single universal effect size, because schools grade differently. But we do have consistent patterns from several sources:
- NRMP PD Surveys (multiple cycles)
- Specialty-specific data (IM, Gen Surg, EM, etc.)
- School-level analyses of AOA, honors, and match outcomes
The short version: honors in core clerkships is a strong, independent predictor of matching, especially into competitive specialties, and even more so in the Step 1 pass/fail era.
Why PDs care about clerkship grades
Program directors repeatedly cite three reasons:
- Direct signal of clinical performance: how you function on a team, with patients, in real time
- Relative rank: at most schools, clerkship grades are bell-curved; honors = top 20–30%
- Correlation with later performance: better clerkship evaluations correlate with better residency evaluations and in‑training exam scores (multiple IM and surgery studies back this up)
Look at what PDs say they look at specifically:
- Internal medicine PDs: around 80–85% rate “grades in required clerkships” as an important factor. Among those, medicine, surgery, and pediatrics get mentioned most often.
- General surgery PDs: >90% consider surgery clerkship performance critical; many explicitly screen for “Honors in Surgery.”
- OB/GYN, EM, Ortho: similar behavior – they look closely at their own specialty’s clerkship grade and at Medicine.
This is not subtle. Programs actually filter applications based on patterns like “mostly high pass / pass vs majority honors.”
But grading systems are wildly inconsistent
That is the biggest statistical problem with clerkship grades: the variance is dominated by school-level policy.
Some concrete differences I have seen across schools:
- School A: 40% of students receive “Honors” in core rotations
- School B: 15% can receive Honors, hard‑capped curve
- School C: pure Pass/Fail for cores, no honors categories at all
If you compare raw “Honors count” across those three without context, the metric is garbage. PDs know that. That is why they lean on:
- MSPE distribution graphs (where your performance sits vs classmates)
- Class rank or quartile (where available)
- Narrative language (“Top 5% of students I have worked with in 10 years” vs “performed at expected level for training”)
Programs are not blind to grade inflation. Internal data from some institutions show that after they relaxed curves, “Honors in 4+ core clerkships” went from ~20% of the class to ~45%. PDs responded, predictably, by relying more on the MSPE curves and Step 2 to recalibrate.
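To make the comparison problem concrete, here is a minimal sketch of how one Honors grade translates into a rough class position at the three example schools above. The dictionary and function names are hypothetical, and treating the Honors share as a "top X%" signal is a simplifying assumption, not a method any program has published.

```python
from typing import Optional

# Share of students who can receive Honors in a core rotation, taken from
# the three example schools above. None means there is no Honors tier.
HONORS_SHARE = {
    "School A": 0.40,
    "School B": 0.15,
    "School C": None,  # pure Pass/Fail for cores
}

def honors_signal(school: str) -> Optional[float]:
    """Return the fraction of the class that an Honors grade places you in.

    A smaller fraction means Honors is rarer and therefore more informative.
    """
    return HONORS_SHARE[school]

def describe(school: str) -> str:
    share = honors_signal(school)
    if share is None:
        return f"{school}: no Honors tier; rely on MSPE narrative and rank"
    return f"{school}: Honors implies roughly top {share:.0%} of the class"

for name in HONORS_SHARE:
    print(describe(name))
```

The same raw "Honors" label carries very different information at School A and School B, which is why the MSPE distribution graphs end up doing the real work.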
Honors density as a rough predictor
Despite the noise, a simple metric works decently well: the count of Honors (or the equivalent top tier) across the six major cores (IM, Surgery, Peds, OB/GYN, Psych, and Family Medicine or Neurology, depending on the school).
Across multiple schools’ internal analyses I have seen patterns like:
- Students with 0–1 Honors in cores: disproportionately match into less competitive specialties and/or community programs
- Students with 3–4 Honors: wide distribution, but many land in mid‑tier university programs; competitive specialties if Step 2 aligns
- Students with 5–6 Honors: heavily represented in competitive specialties (Derm, Ortho, ENT, Rad Onc) and top‑tier academic IM / Surgery
Is it causal? Partially. But mostly it is a composite proxy for underlying things: test-taking ability, work ethic, professionalism, likeability on teams.
Programs treat high honors density as a high‑signal, low‑noise indicator that “this person will probably do fine with our workload and expectations.”
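The bands above can be written as a small lookup. This is only a sketch of the pattern as described; the function name is hypothetical, and a count of exactly 2 is left unclassified because the bands above do not cover it.

```python
def honors_band(honors_count: int) -> str:
    """Map a count of core-clerkship Honors (0-6) to the rough bands above."""
    if not 0 <= honors_count <= 6:
        raise ValueError("expected a count between 0 and 6")
    if honors_count <= 1:
        return "0-1 Honors: skews toward less competitive specialties or community programs"
    if honors_count == 2:
        # The bands in the text skip this count, so nothing is invented for it.
        return "2 Honors: not covered by the bands above"
    if honors_count <= 4:
        return "3-4 Honors: wide spread, often mid-tier university programs"
    return "5-6 Honors: heavily represented in competitive specialties"

print(honors_band(4))
```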
Step 2 CK: the new primary numerical filter
The data are blunter here. Step 2 CK is the new numeric gate. This is not speculation; the PD surveys and anecdotal filters line up almost perfectly.
How PDs are actually using Step 2 now
Post pass/fail Step 1:
- 88–90% of PDs report using Step 2 CK to decide who to interview
- Roughly 80% rate Step 2 as a factor of “high importance”
- Many programs now publish or quietly enforce Step 2 “targets” instead of Step 1 cutoffs
You can see the behavior shift in three concrete ways:
- Earlier Step 2 timing pressure: students are told to have Step 2 in by ERAS submission, whereas in the pre‑P/F era many strong applicants delayed it.
- Rising mean scores among matched applicants in competitive specialties (informal program data regularly show means in the 245–255 range for things like Ortho, Derm, ENT, even after Step 1 went P/F).
- Explicit mention: I have seen PDs tell applicants, “We rely more on Step 2 now that Step 1 is P/F.”
Approximate Step 2 score bands and match likelihood
Different specialties, different thresholds. But we can roughly segment based on PD survey responses and program‑level reports.
| Specialty Tier | Common Step 2 Range of Matched Applicants* |
|---|---|
| Ultra-competitive (Derm, ENT, Ortho, PRS) | 250–260+ |
| Competitive (Radiology, Anes, EM, Urology) | 245–255 |
| Moderate (IM academic, Gen Surg, OB/GYN) | 240–250 |
| Less competitive (FM, Psych, Peds community) | 230–245 |
*These are broad, composite ranges drawn from PD comments, institutional match summaries, and specialty organizations. There are always outliers.
The important point: in the P/F era, the Step 2 number carries more discriminating power relative to the rest of your file than Step 1 did.
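If you want to check a score against these ranges quickly, a lookup like the one below is enough. It is a rough sketch: the tier keys and the function name are made up for illustration, and the ranges are the broad composite ones from the table, not official cutoffs.

```python
from typing import Dict, Optional, Tuple

# (low, high) Step 2 ranges from the table above. A high of None marks an
# open-ended upper bound, as in "250-260+".
STEP2_RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "ultra-competitive": (250, None),
    "competitive": (245, 255),
    "moderate": (240, 250),
    "less competitive": (230, 245),
}

def position_in_tier(score: int, tier: str) -> str:
    """Say whether a score sits below, within, or above a tier's common range."""
    low, high = STEP2_RANGES[tier]
    if score < low:
        return "below range"
    if high is not None and score > high:
        return "above range"
    return "within range"

for tier in STEP2_RANGES:
    print(tier, "->", position_in_tier(246, tier))
```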
Visualizing the shift from Step 1 to Step 2
| Exam and era | Share of PDs (%) |
|---|---|
| Step 1 Pre P/F | 94 |
| Step 1 Post P/F | 46 |
| Step 2 Pre P/F | 72 |
| Step 2 Post P/F | 88 |
The numbers tell the story clearly: Step 1’s importance cratered, Step 2’s climbed sharply. Programs did not abandon standardized tests; they swapped which one drives decisions.
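The size of the swing is simple arithmetic on the four values above. The sketch below assumes they are percentages of program directors; the variable and function names are hypothetical.

```python
# Values from the table above, read as percentages of program directors.
pd_share = {
    "Step 1": {"pre": 94, "post": 46},
    "Step 2": {"pre": 72, "post": 88},
}

def change_in_points(exam: str) -> int:
    """Percentage-point change from the pre-P/F to the post-P/F era."""
    return pd_share[exam]["post"] - pd_share[exam]["pre"]

for exam in pd_share:
    print(f"{exam}: {change_in_points(exam):+d} percentage points")
```

Step 1 falls by 48 points while Step 2 gains 16, so the drop is three times the size of the gain. The rest of the weight went somewhere else, which is where clerkship performance comes in.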
Core clerkship grades vs Step 2: which matters more?
This is the question that actually matters for your day‑to‑day decisions. Do you pour extra effort into nailing medicine and surgery rotations, or do you protect study time for a Step 2 push? The answer is not symmetrical for every applicant, but there are some clear realities.
How the two signals differ
Step 2 CK:
- Standardized across schools, across years
- Purely numerical; easy to plug into filters
- Strong predictor of passing specialty boards and doing well on in‑training exams
Core clerkship grades:
- Locally normed and noisy
- Multi-dimensional: shelf exam performance + clinical evaluations + sometimes OSCEs
- Embedded in rich narrative comments that PDs actually read (at least the highlighted ones)
Programs essentially use Step 2 to answer: “Is this person academically safe and relatively competitive for our cohort?”
They use core clerkship performance to answer: “What is this person like to work with? How do they function with sick patients and busy teams?”
From a predictive perspective:
- For interview screening: Step 2 has higher weight
- For rank ordering interviewed candidates: clerkship performance and letters catch up and often surpass Step 2
Combined signal is what really predicts match
The interaction matters more than either metric in isolation. A simple way to think about it:
- High Step 2 + strong clerkship performance = almost always interviewable for appropriate specialty tier; your risk is low
- High Step 2 + mediocre clerkships = you may get screened in, but you will get picked apart in MSPE and letters; some programs will down‑rank you
- Moderate Step 2 + stellar clerkships = you are viable for a surprisingly large range of programs; strong narratives can rescue you
- Low Step 2 + weak clerkships = you will have a difficult cycle, irrespective of specialty
I have seen many applicants with Step 2 in the 245–250 range and mostly “High Pass” but strong narrative comments match well in IM, EM, anesthesiology, and even some surgical fields. I have also seen students with a 260 Step 2 and multiple “Pass only” cores get less traction than they expected.
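The four combinations above amount to a small lookup table. The sketch below encodes only the cases the text describes; the labels and the function name are hypothetical, and any other combination is reported as not covered rather than guessed.

```python
# Keys are (Step 2 level, clerkship level), using the wording from the list above.
OUTLOOK = {
    ("high", "strong"): "almost always interviewable for the appropriate tier; low risk",
    ("high", "mediocre"): "may be screened in, but MSPE and letters get picked apart",
    ("moderate", "stellar"): "viable for a surprisingly large range of programs",
    ("low", "weak"): "difficult cycle, irrespective of specialty",
}

def match_outlook(step2_level: str, clerkship_level: str) -> str:
    """Return the described outlook, or a placeholder for uncovered combinations."""
    key = (step2_level.lower(), clerkship_level.lower())
    return OUTLOOK.get(key, "combination not described above")

print(match_outlook("Moderate", "Stellar"))
```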
How PDs informally weigh them
This is obviously not published as an equation, but based on conversations with PDs and associate PDs, a rough weighting for screening might look like:
- 40–50% Step 2 CK band
- 30–40% core clerkship performance (including MSPE graphs)
- 10–20% school reputation / research / other metrics
Once you are in the interview pool and have completed interviews, that weighting shifts:
- 40–50% interview performance and perceived fit
- 25–35% clerkship performance and letters
- 10–20% Step 2 and other academic metrics
Program directors rarely admit the process is this formulaic, but their behavior tracks numbers like these.
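As a rough illustration of those two weightings, the sketch below takes the midpoint of each range, rescales so the weights sum to 1, and computes a weighted score. Everything here is an assumption for illustration: the key names, the midpoint choice, and the idea that each input has already been scaled to a 0–1 value. No program publishes a formula like this.

```python
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]

# Percentage ranges from the two lists above.
SCREENING_RANGES: Ranges = {"step2": (40, 50), "clerkships": (30, 40), "other": (10, 20)}
RANKING_RANGES: Ranges = {
    "interview": (40, 50),
    "clerkships_letters": (25, 35),
    "step2_academic": (10, 20),
}

def midpoint_weights(ranges: Ranges) -> Dict[str, float]:
    """Take each range's midpoint and rescale so the weights sum to 1."""
    mids = {name: (low + high) / 2 for name, (low, high) in ranges.items()}
    total = sum(mids.values())
    return {name: mid / total for name, mid in mids.items()}

def composite(scores: Dict[str, float], ranges: Ranges) -> float:
    """Weighted average of 0-1 component scores under the given weighting."""
    weights = midpoint_weights(ranges)
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical applicant, each component already scaled to 0-1.
applicant = {"step2": 0.8, "clerkships": 0.6, "other": 0.5}
print(round(composite(applicant, SCREENING_RANGES), 3))
```

The rescaling step is needed because the midpoints of the quoted ranges do not add up to exactly 100%.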
Specialty differences: where grades dominate vs where Step 2 dominates
Different fields lean on the two predictors differently. Some care more about raw exam horsepower. Others obsess over clinical performance.
Exam-driven specialties
Radiology, anesthesiology, some IM subspecialty‑oriented academic programs: they know their residents must perform on multiple in‑training exams and boards.
In these fields:
- Step 2 is often the first hard screen. A program might say internally, “Below 240, we look very carefully. Below 230, we almost never invite unless there is an exceptional story.”
- Clerkship grades still matter, especially in medicine and surgery, but a slightly weaker clerkship record can be offset by a strong Step 2.
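The internal rule quoted above is effectively a two-threshold screen. A minimal sketch, using the illustrative 230 and 240 cut points from that quote; the function name is hypothetical and real programs will set their own numbers.

```python
def exam_driven_screen(step2: int) -> str:
    """Apply the two illustrative thresholds quoted above (230 and 240)."""
    if step2 < 230:
        return "almost never invited without an exceptional story"
    if step2 < 240:
        return "reviewed very carefully"
    return "clears the first numeric screen"

for score in (225, 236, 248):
    print(score, "->", exam_driven_screen(score))
```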
Clinic- and OR‑performance driven specialties
Surgery, OB/GYN, EM, Ortho, ENT: these programs rely heavily on seeing you in action or at least pulling from people who have.
Here:
- Honors in the specialty clerkship + a strong Sub‑I + enthusiastic letters often beats a purely high Step 2 with lukewarm clinical comments.
- Many PDs in these fields have said versions of: “We would rather take someone with a 240 and exceptional clinical reviews than a 260 who is mediocre on the wards.”
Generalist and primary care fields
Family medicine, general pediatrics, community internal medicine: these fields are more forgiving numerically, but they still track the same pattern.
- A Step 2 in the low 230s is frequently fine.
- Being consistently “solid” on cores with a few strong comments is usually enough to match comfortably.
- A bizarrely low Step 2 (for example, barely above passing) or repeated marginal failures on cores are what raise red flags.
What the data say about timing and sequencing
One under-discussed side effect of the Step 1 P/F change: timing of Step 2 relative to cores changed.
Before: many strong students took Step 2 after most or all cores, sometimes even late in fourth year.
Now: a rising proportion take Step 2 right after core clerkships end but before ERAS submission, due to PD expectations.
This creates a practical tension:
- If you take Step 2 too early, you may sacrifice performance by not consolidating enough internal medicine and surgery knowledge.
- If you take it too late, you risk submitting applications without a score, which some programs will not tolerate in the P/F era.
A simple, data‑informed strategy that works well for most students:
- Complete at least IM, Surgery, and one medicine‑adjacent core (Peds or OB) before Step 2.
- Take Step 2 between 4 and 8 weeks after finishing the last “knowledge heavy” core. This window tends to maximize shelf exam carryover but still allows for dedicated time.
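The 4–8 week window is easy to turn into calendar dates. A small sketch using Python's standard `datetime` module; the function name and the example end date are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

def step2_window(last_core_end: date) -> Tuple[date, date]:
    """Return the earliest and latest test dates, 4 and 8 weeks after the core ends."""
    earliest = last_core_end + timedelta(weeks=4)
    latest = last_core_end + timedelta(weeks=8)
    return earliest, latest

# Hypothetical example: last knowledge-heavy core ends on 2 May 2025.
earliest, latest = step2_window(date(2025, 5, 2))
print(f"Test between {earliest.isoformat()} and {latest.isoformat()}")
```

Compare the end of that window with your ERAS submission date, allowing time for the score to be reported, to see whether the plan actually works.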
Programs increasingly prefer ERAS applications with a Step 2 score already posted. Internal data from some schools show that applicants who submit without Step 2 in September and only add it in October–November tend to receive fewer interview invitations overall, especially in competitive specialties.
Is that causal or selection bias? Both, probably. But if you can avoid being in that ambiguous group, you should.
Strategic implications if you are in the P/F era now
You cannot change that Step 1 is pass/fail. You can control how strong your remaining signals are.
From a data‑driven standpoint, the planning priorities look like this:
Protect at least a few core clerkships for maximal performance.
- You do not need honors in all six.
- You absolutely benefit from Honors in IM or Surgery (preferably both if you are aiming for a surgical or otherwise competitive specialty) and an overall “upper half” distribution.
Plan Step 2 so your most knowledge‑dense cores feed directly into it.
- The data show shelf performance and Step 2 performance track closely; leveraging that is smart.
Know your intended specialty’s approximate Step 2 expectations early.
- If you are thinking ENT, Derm, Ortho: aim to be well above the national mean (call it 250+ if you want a safety margin).
- If you are leaning FM/Psych/Peds: a solid mid‑230s to low‑240s score plus strong clinical performance is usually enough.
Understand that narratives in the MSPE amplify or dampen both metrics.
- A student with 3 honors and a 245 Step 2 but repeated comments like “needed frequent reminders about punctuality” will underperform statistically predicted match outcomes.
- Another with 1–2 honors, mostly high passes, a 238 Step 2, and multiple “top 10% of students I have worked with” comments will overperform.
A quick visual: where the pressure really moved
To crystallize how much more Step 2 and core performance matter now:
| Era | Step 1 (%) | Step 2 (%) | Core Clerkships / MSPE (%) | Other: Research, LORs, Fit (%) |
|---|---|---|---|---|
| Pre P/F Era | 45 | 20 | 20 | 15 |
| Post P/F Era | 5 | 35 | 35 | 25 |
These percentages are an interpretive model, but they are aligned with PD survey data and actual behavior. Your lived experience on the wards and in the testing center will feel like this table.
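To see where the weight Step 1 lost actually went, subtract one row from the other. The sketch below uses the interpretive percentages from the table as given; the variable names are hypothetical.

```python
# Interpretive weights (percent) from the table above.
weights = {
    "pre": {"Step 1": 45, "Step 2": 20, "Core Clerkships / MSPE": 20, "Other": 15},
    "post": {"Step 1": 5, "Step 2": 35, "Core Clerkships / MSPE": 35, "Other": 25},
}

# Each era's weights should account for the whole decision.
for era, row in weights.items():
    assert sum(row.values()) == 100, era

# Percentage-point shift per category, post minus pre.
shift = {cat: weights["post"][cat] - weights["pre"][cat] for cat in weights["pre"]}

for cat, delta in shift.items():
    print(f"{cat}: {delta:+d}")
```

In this model the 40 points Step 1 gave up are split as 15 to Step 2, 15 to clerkships and the MSPE, and 10 to everything else.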
Final takeaways
The data, stripped of fluff, say three things:
- Step 2 CK is now the primary numerical filter; if you want options, aim for a score that is clearly above your target specialty’s average.
- Core clerkship performance – especially IM and Surgery – is a powerful independent predictor of match outcomes, and in competitive fields, honors in the specialty and strong narratives are non‑negotiable.
- The interaction of solid Step 2 plus strong, well‑documented clinical performance is what actually predicts a safe, satisfying match in the Step 1 pass/fail era; leaning on only one of those levers is a bad bet.