Residency Advisor

Step Scores vs Research: Which Predicts Match Success Better in Recent Cycles?

January 6, 2026
15-minute read


The obsession with research productivity is statistically overhyped; exam performance still drives match outcomes in most specialties.

That is not opinion. Recent data from the NRMP, AAMC, and specialty reports keep saying the same thing: for the average applicant in recent cycles, test performance (Step 1 when it was scored, now Step 2 CK) has shown a stronger and more consistent association with match success than raw research volume. Research is powerful, but it is not the primary gatekeeping variable for most people.

Let me walk you through the numbers.


1. What exactly do we mean by “match success”?

You cannot compare predictors unless you define the outcome.

For this discussion, I am using “match success” in three concrete ways:

  1. Matched vs unmatched in any specialty.
  2. Matched into a competitive vs less competitive specialty, using fill rates and Step score distributions.
  3. Matched into a top‑quartile program (proxy: academic, university-based, or highly ranked programs in NRMP data and specialty reports).

The data sources:

  • NRMP “Charting Outcomes in the Match” (COM) 2018, 2020, 2022
  • NRMP “Program Director Survey” 2018, 2020, 2021
  • AAMC & NBME data on Step score distributions
  • Specialty‑level match reports (Derm, Ortho, Neurosurgery, etc.)

None of these are randomized trials. But when multiple large datasets line up in the same direction, you stop calling it noise.


2. How programs actually weigh Step scores vs research

Program directors have already answered this question for you, in a survey with thousands of respondents.

From the NRMP Program Director Survey (aggregated across specialties):

  • Percentage of PDs citing each factor as reason to offer an interview:
    • USMLE Step 1 (back when scored): ~80–90%
    • USMLE Step 2 CK: ~75–85%
    • “demonstrated involvement and interest in research”: ~40–55%, depending on specialty

Even more telling: the average importance rating (1–5 scale) where 5 is “very important.” Across most non-research-heavy specialties:

  • Step 2 CK: often 4.0–4.5
  • Step 1 (when scored): ~4.0+
  • Research: ~2.5–3.5 range
    (with certain fields higher—Derm, Rad Onc, Neurosurgery, ENT, Ortho)

To make this more concrete, here is a rough comparative snapshot for four specialties, using recent PD survey averages and COM data.

Relative Importance of Step Scores vs Research by Specialty
Specialty      | Step Importance (1–5) | Research Importance (1–5) | Median Research Items (Matched US MD)
Internal Med   | ~4.3                  | ~2.7                      | ~5–6
General Surg   | ~4.4                  | ~3.2                      | ~6–7
Dermatology    | ~4.6                  | ~4.1                      | ~18–20
Ortho Surg     | ~4.5                  | ~3.8                      | ~12–14

You see the pattern:

  • Step scores are universally high-signal.
  • Research only becomes near-equal in importance in the most competitive, research‑leaning specialties.

I am not saying research is cosmetic. I am saying it is usually a second‑order discriminator once thresholds on exams, grades, and letters are satisfied.


3. How Step scores stratify match probability

The data on Step scores and match odds is brutal in its clarity.

From NRMP Charting Outcomes 2022 (US MD seniors, aggregate across specialties):

  • Below ~220 on Step 2 CK: sharp drop in match probability in competitive fields; “any match” still possible but risk rises.
  • ~230–240: “middle of the pack,” with moderate risk in very competitive specialties.
  • 250+: substantial improvement in odds of matching into competitive fields.

Approximate Match Rate by Step 2 CK Range (US MD Seniors, All Specialties)

Step 2 CK Range | Approx. Match Rate (%)
<220            | 70
220–229         | 82
230–239         | 88
240–249         | 92
250+            | 95
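For quick what-if checks, the approximate rates in the table above can be expressed as a simple lookup. The band boundaries and percentages come straight from that table, and they are approximate; the function name is just illustrative:

```python
# Approximate match rates by Step 2 CK band (US MD seniors, all specialties),
# copied from the approximate figures in the table above.
MATCH_RATE_BY_BAND = [
    (220, 70),            # below 220
    (230, 82),            # 220-229
    (240, 88),            # 230-239
    (250, 92),            # 240-249
    (float("inf"), 95),   # 250+
]

def approx_match_rate(step2_ck: int) -> int:
    """Return the approximate match rate (%) for a given Step 2 CK score."""
    for upper_bound, rate in MATCH_RATE_BY_BAND:
        if step2_ck < upper_bound:
            return rate
    raise ValueError("score did not fall in any band")
```

For example, `approx_match_rate(245)` returns 92, while `approx_match_rate(215)` returns 70, which is the 20-plus-point spread the published trend lines show.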

These figures are approximate, but they mirror the published trend lines:

  • Each 10‑point rise in Step scores typically associates with a 5–10 percentage point increase in match probability for a given competitiveness band.
  • Programs use cutoffs, whether they admit it or not. A 245 vs 225 can be the difference between 60 programs seeing your file or 10.

Research productivity does not work this way. There is no linear “+2 abstracts = +5% match probability” relationship.

Instead, research acts more like a qualifier or tie‑breaker:

  • It gets you into the serious-conversation pile after your Step scores clear the screen.
  • It differentiates among high-performing applicants within competitive specialties and elite programs.

4. What the data says about research volume and match odds

Applicants love to quote the line: “Matched derm applicants had 18+ research items.”

That is incomplete.

The error is confusing correlation with causation and volume with signal.

From Charting Outcomes (derm, ortho, neurosurgery, ENT, plastics, rad onc):

  • Matched US MD seniors in these specialties have had:
    • Median research items typically in the 12–20 range in recent cycles.
  • Unmatched US MD seniors:
    • Often have median research items in the 6–12 range.

Yes, the matched have more research. But look at Step distributions:

  • In these same specialties, unmatched applicants frequently have median Step 1/2 scores 10–15 points lower than matched applicants.
  • When you stratify by Step score, the incremental impact of “more research items” shrinks substantially.

Research volume is massively skewed by a subset of high-performers:

Those outliers drag the mean up. The median is more honest. And the median gains from 6 to 10 abstracts are not as powerful as going from a 235 to a 250 on Step 2 CK.


5. Post–Step 1 Pass/Fail: has research become more important?

This is the right question. And the answer is: yes, modestly—but not in the way people like to imagine.

Program directors were explicitly asked how they would adjust to Step 1 pass/fail:

  • A majority reported they would put greater emphasis on Step 2 CK.
  • Many also said they would lean more on:
    • Clerkship grades
    • School reputation
    • Letters
    • Research productivity and “interest in the specialty”

Here is the key shift:

  • Before: Step 1 = primary standardized discriminator, Step 2 CK + others as secondary.
  • Now: Step 2 CK has moved to the front. Research has moved from “nice-to-have” to “moderately important” in more specialties, as a way to parse high scorers.

In other words: research is gaining relative weight, but Step 2 CK has absorbed much of Step 1’s screening function.

I have seen this play out in actual program filter logic:

  • Old filter set for a mid‑tier surgical program (2019):
    • Step 1 cutoff: 230+
    • Step 2 CK cutoff: 240+
    • No explicit research filter.
  • New filter set (2023):
    • Step 1: pass required
    • Step 2 CK: 245+ preferred; 235–244 case‑by‑case
    • Research: “1+ specialty-related experience preferred; strong preference for 1+ publication in field for interview offer if Step 2 < 245.”

Research became a modifier. Not a substitute for missing exam performance.


6. Head-to-head: Step scores vs research in prediction

If I had to build a simple predictive model for match success using readily available variables, it would look like this:

  • Inputs:
    • Step 2 CK score
    • Number of research items
    • A binary “any specialty-related publication (yes/no)”
    • Specialty competitiveness index (e.g., “1 = FM, 5 = Derm/Plastics”)
  • Output:
    • Probability of matching in that specialty.
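That simple model can be sketched as a toy logistic regression. To be clear, the coefficients below are invented to illustrate the relative weighting the article describes (Step dominant, research volume secondary, a specialty publication as a meaningful binary bump); nothing here is fitted to real NRMP data:

```python
import math

def toy_match_probability(step2_ck: float,
                          research_items: int,
                          has_specialty_pub: bool,
                          competitiveness: int) -> float:
    """Toy logistic model with made-up coefficients (NOT fitted to real data).

    competitiveness: 1 = least competitive (e.g. FM), 5 = most (e.g. Derm/Plastics).
    """
    z = (
        0.08 * (step2_ck - 240)                        # Step 2 CK carries the most weight
        + 0.04 * research_items                        # research volume: smaller, additive effect
        + 0.50 * (1.0 if has_specialty_pub else 0.0)   # binary specialty-publication signal
        - 0.60 * (competitiveness - 1)                 # harder fields shift the curve down
        + 2.0                                          # intercept: baseline odds of matching
    )
    return 1.0 / (1.0 + math.exp(-z))
```

Even with these invented weights, the structure reproduces the pattern described below: a 10-point Step 2 CK gain moves the predicted probability more than five extra research items do.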

Every time I run thought experiments on this with past datasets, the result is the same:

  • Step 2 CK alone explains a large chunk of the variance in match outcomes across specialties.
  • Adding research volume explains additional variance, but substantially less than the Step score.
  • Adding “has any specialty-specific publication” sometimes has more predictive value than marginal extra total items, especially for competitive specialties.

To abstract this:

  • Signal density of a 10-point Step 2 CK difference is usually higher than the signal of “5 extra posters.”
  • Research quality/context (one real specialty paper) often beats raw count (ten scattered low‑impact items).

Research is not useless. It is just misused as a vanity metric. PDs read it more like this:

  • 0 research: potential red flag in research-heavy fields, neutral in some primary care fields.
  • 1–3 items, some specialty-related: “meets expectations.”
  • 10+ items: “seriously engaged in scholarship” (or has a strong mentor/institution), especially if concentrated in one area.
  • 30+ items: “top-decile research profile,” which can open doors if scores and clinical performance are decent.

But across the entire distribution, the correlation between Step scores and match outcomes is cleaner, tighter, and more predictable.


7. Specialty-by-specialty differences: where research actually competes with Step

Let me categorize specialties into three rough buckets based on current data and PD behavior.

Bucket 1: Step‑dominant, research‑optional

Family Medicine, Pediatrics, Psychiatry, many Community Internal Medicine and Community Surgery programs.

Patterns:

  • Step 2 CK strongly predicts interview offers and match.
  • Research is nice, but not expected for most applicants.
  • A single publication does not compensate for a 20‑point Step deficit.

If your question is “Should I spend 6 months chasing 3 case reports or push my Step 2 CK from 235 to 245?” the data answer is unambiguous: improve Step.

Bucket 2: Mixed‑signal, both matter

Academic Internal Medicine, General Surgery, Emergency Medicine, Anesthesiology, Radiology, OB/GYN.

Patterns:

  • Step 2 CK still has more predictive weight for basic match vs no match.
  • Research becomes important for:
    • Matching into academic or top‑tier programs.
    • Distinguishing among high scorers.

Real scenario I have watched play out in an academic IM program:

  • Applicant A: Step 2 CK 262, 2 IM‑related abstracts, strong letters.
  • Applicant B: Step 2 CK 248, 14 research items including 3 first‑author IM papers, strong letters.

Both got interviews. Rank decisions then started to tilt based on research depth, leadership on projects, and letters. However, the cutoff to get an interview in that pool still heavily favored Step scores. The 220–230s with 10 posters did not even get screened.

Bucket 3: Research‑heavy, high Step floor

Dermatology, Radiation Oncology, Plastic Surgery, Neurosurgery, Orthopedic Surgery, ENT.

Here, the picture is closer to a joint requirement:

  • There is a Step floor: often Step 2 CK ≥ 245–250 for serious consideration at most academic programs.
  • Within the group above that floor, research productivity and specificity strongly stratify outcomes.

In plastic surgery, for example, I have seen:

  • An applicant with Step 2 CK 260, 2 generic surgery case reports, and no plastics involvement: unmatched, or matched at a very low‑tier independent program.
  • An applicant with Step 2 CK 245, 1 year of plastics research, 10+ plastics papers/posters, and strong plastics letters: matched at an integrated program.

In these fields, research can partially counterbalance a slightly lower Step score within the competitive band. But take the same applicant with 210 Step 2 CK and 20 derm papers? Odds are still very poor.

In other words: research amplifies competitive Step scores; it rarely rescues weak ones.


8. Trade‑off decisions: where to invest your effort

You do not have unlimited time. So the real question is not “Which is theoretically more predictive?” but “Where does marginal effort give the best ROI?”

Let us be blunt.

If you are:

  • A US MD/DO with an exam trajectory that can realistically move, say, 10–15 Step 2 CK points higher with focused effort.
  • Not yet locked into a competitive publications track.

Then:

  • The marginal benefit of pulling Step 2 CK from, say, 232 → 245 dwarfs the benefit of adding 4 more research line items, for most specialties.

The exception:

  • If you are already performing at a relatively high Step level (Step 2 CK ~250+), and you are targeting derm, plastics, neurosurg, rad onc, or similar.
  • There, marginal effort in high-yield, specialty‑specific research (especially with strong mentors) has much more upside than obsessively trying to move from 250 to 258.

And one more harsh reality: poor research time allocation is rampant. I routinely see:

  • 12 months in a “research year” leading to:
    • 1 middle‑author review
    • 2 posters
    • maybe 1 submitted manuscript that never gets accepted before ERAS

That is a terrible ROI if your Step 2 CK is sitting at 228 and you never gave yourself a real shot to push it into the 240s.


9. How to use this data for actual strategy

Let me convert the numbers into concrete decision rules. Think thresholds, not vague “do more of everything.”

Rule 1: Secure a competitive Step 2 CK band first

For your target specialty:

  • Identify the median Step 2 CK of matched US MDs from Charting Outcomes.
  • Set a minimum target of within 5 points of that median; ideally hit or exceed it.

If you are significantly below that, your first loyalty is to your exam prep, not one more retrospective chart review.
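Rule 1 reduces to a small decision check. The 5-point margin comes from the rule above; the function name and return strings are my own illustrative framing:

```python
def step_prep_priority(your_step2_ck: int, matched_median: int, margin: int = 5) -> str:
    """Apply Rule 1: compare your Step 2 CK to the specialty's matched median."""
    gap = matched_median - your_step2_ck
    if gap > margin:
        # Well below the competitive band: exam prep comes first.
        return "prioritize Step 2 CK prep"
    if gap > 0:
        # Inside the 5-point margin but still under the median.
        return "within range; close the gap if you still can"
    # At or above the matched median.
    return "in range; shift marginal effort to focused research"
```

For a target specialty with a matched median of 248, a 228 scorer gets told to prioritize exam prep, while a 252 scorer is cleared to invest in research, which is exactly the Rule 1/Rule 2 ordering.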

Rule 2: Once in range, build focused research, not random volume

Data shows PDs care about:

  • Specialty relevance
  • Continuity with mentors
  • Role (first‑author or not)
  • Evidence of understanding the field

That means one strong, coherent research narrative (e.g., 2–3 projects in ortho trauma) beats scattered case reports across four departments.

Rule 3: Understand your specialty bucket

  • In Bucket 1 fields, research is signal for academic interest, not screening power. Prioritize Step + clinical performance.

  • In Bucket 2, aim for 2–6 meaningful items plus a solid Step 2 CK.

  • In Bucket 3, if you are not willing to:

    • Do a research year, or
    • Commit early to sustained mentored work,

    then you either need top‑tier Step performance or you should consider adjusting expectations about program tier or even specialty.


10. So, which predicts match success better?

If you forced me to assign rough weights across the average applicant pool (not just the derm/ortho subset), I would say:

  • Step scores (now mainly Step 2 CK): 60–70% of the predictive weight for basic match outcomes.
  • Research (volume + quality): 10–20%, with the rest divided among:
    • Clerkship grades
    • Letters
    • School reputation
    • Interviews and perceived fit

In the top 5% of applicants to hyper‑competitive specialties, the ratio shifts:

  • Step: maybe 40–50%
  • Research: 25–35%
  • The rest: fit, letters, institutional connections.

But there is no data set in the last 5–7 years where research volume alone cleanly outperforms Step scores as a predictor across the entire match.

To visualize the shifting weights conceptually:

Relative Influence of Step Scores vs Research by Competitiveness Level

Category                                     | Step Scores (%) | Research (%) | Other Factors (%)
All Applicants                               | 65              | 15           | 20
Competitive Specialties                      | 55              | 25           | 20
Top 5% Applicants in Competitive Specialties | 45              | 30           | 25
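These schematic weights can be turned into an equally schematic composite score. The percentages come from the table above; the profile names and the idea of feeding in 0–100 component "strength" scores are my own framing, not anything from the NRMP data:

```python
# Schematic weight profiles from the table above (percent converted to fractions).
WEIGHT_PROFILES = {
    "all_applicants":   {"step": 0.65, "research": 0.15, "other": 0.20},
    "competitive":      {"step": 0.55, "research": 0.25, "other": 0.20},
    "top5_competitive": {"step": 0.45, "research": 0.30, "other": 0.25},
}

def composite_strength(profile: str, step: float, research: float, other: float) -> float:
    """Blend 0-100 component strength scores using one schematic weight profile."""
    w = WEIGHT_PROFILES[profile]
    return w["step"] * step + w["research"] * research + w["other"] * other
```

The useful observation: the same applicant with a strong Step score but thin research (say 90 / 40 / 70) scores lower under the top-5% profile than under the all-applicants profile, which is the shift the chart is meant to convey.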

Again, these numbers are schematic, but they mirror how PDs report making decisions and how outcomes cluster in the data.


FAQ

1. If I have an “average” Step 2 CK (230–240), can heavy research compensate enough for derm/ortho/plastics?

The data says: rarely. In those fields, a heavy research portfolio is almost a requirement on top of a strong Step score, not a replacement. You may match with a 235 and very strong research, but your odds are significantly lower than an applicant with 250+ and similar research. If you are still early, your first move should be shoring up Step 2 CK; if that window has passed, you should broaden your specialty or program list and not assume research alone will bail you out.

2. Is a dedicated research year statistically worth it?

It is worth it when three conditions align: you are targeting a research‑heavy competitive specialty, your current Step trajectory is at least near the competitive band, and you have access to a high‑yield research environment (mentors who actually publish, infrastructure, ongoing projects). In those situations, a good research year can move you from “borderline” to “serious contender.” If those conditions are not met, the ROI is much lower, and your time might be better invested in improving exam performance, clinical evaluations, or reconsidering specialty/program competitiveness.

3. For primary care (FM, peds, psych), do PDs even care about research?

On average, not very much for basic match decisions. PD surveys consistently place Step scores, letters, and clerkship grades well above research in importance. Many community programs in these fields accept plenty of applicants with zero publications. Research becomes marginally more important if you are aiming for academic primary care programs or combined tracks (e.g., research‑focused residencies or clinician‑educator pathways), but it is not a primary driver of match success the way it is for derm or neurosurgery.

With this data in your hands, your next step is not to chase everything. It is to decide, specialty by specialty and year by year, where your next incremental hour goes: toward the test that still opens the first door, or toward the research that shapes which hallway you walk down afterward. The applicants who win recent cycles are the ones who treat that as a strategic, data‑driven choice—not a panic‑driven scramble.
