Residency Advisor

Research Output vs Step Scores: What Matters More Post–Step 1 P/F?

January 8, 2026
15 minute read


The belief that “Step scores decide everything” is now statistically outdated. In the post–Step 1 pass/fail era, the data show a power shift: Step 2 CK and research output have moved from “nice to have” to hard selection filters. But they do not matter equally, and they do not matter the same way for every specialty.

Let me be blunt: if you are chasing dermatology, plastics, ortho, ENT, neurosurgery, or radiation oncology, you are no longer competing on Step 1. You are competing on Step 2 CK plus a research CV that looks like a junior faculty member’s. For internal medicine or pediatrics, the curve is flatter—but the pattern is the same. Programs are replacing the lost Step 1 signal with a combination of Step 2 and “scholarly productivity.”

The key question is not “Which matters more, research or Step scores?” The key question is: “For my target specialty, with my current stats, where does one additional unit of effort produce the highest marginal gain in match probability?”

That is a data question. So let’s treat it like one.


1. What Actually Changed When Step 1 Went Pass/Fail?

We are not guessing here; we have trend lines.

Before Step 1 went P/F:

  • Program directors ranked Step 1 as the single most important factor for interviews in nearly every competitive specialty.
  • Research productivity mattered, but it was often a secondary differentiator after scores and class rank.

After Step 1 went P/F, survey and match data show three clear shifts:

  1. Step 2 CK jumped into the vacuum.
    In multiple NRMP Program Director Surveys since the change, Step 2 CK climbed to the top 2–3 factors for interview offers across most specialties. For competitive fields, it is now the primary standardized metric.

  2. Research output became more stratified by specialty.
    Already research-heavy fields (derm, radiation oncology, neurosurgery) increased expectations. Historically lower-research fields (FM, psych) did not suddenly become publication-driven. The slope increased where the culture already valued research.

  3. Screening is more multi-factorial.
    Without a single 3-digit Step 1 gate, programs lean harder on:

    • Step 2 CK
    • Clerkship grades / honors
    • Home / away rotations
    • Research, especially in-field
    • Letters from known faculty

But they are not weighted equally, and there is real quantitative variation by specialty.


2. The Baseline Numbers: How Much Research and How High Scores?

First, you need to know the landscape you are walking into. The most useful numbers are from the last few NRMP Charting Outcomes in the Match reports and PD surveys. These are rounded and simplified for clarity, but the relative differences are the point.

Typical Profiles of Matched US MD Seniors
Specialty (matched MD) | Mean Step 2 CK | Mean abstracts/posters/pubs | Mean programs applied
Dermatology | 257–260 | 18–22 | 70–80
Orthopedic Surgery | 252–255 | 9–12 | 70–80
Neurosurgery | 255–258 | 20–25 | 70–80
Internal Medicine | 245–248 | 4–6 | 40–50
Family Medicine | 238–242 | 2–3 | 25–35

Data interpretation:

  • Competitive surgical / lifestyle specialties show roughly 2–4x the research output of IM, and a larger multiple relative to FM.
  • Their mean Step 2 CK scores run roughly 7–12 points above IM and 13–19 points above FM.
  • The “research arms race” is real in a subset of fields. But not everywhere.
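The interpretation bullets can be checked against the table by comparing range midpoints. The sketch below hard-codes the article's rounded ranges; `PROFILES`, `midpoint`, and `compare` are hypothetical names for illustration, not an official dataset or API, and midpoints of rounded ranges are only a crude summary.

```python
from __future__ import annotations

# Illustrative sketch: compares range midpoints from the article's rounded table.
# PROFILES, midpoint() and compare() are hypothetical names for this example.
PROFILES = {
    # specialty: {"step2": (low, high), "research": (low, high)}
    "Dermatology": {"step2": (257, 260), "research": (18, 22)},
    "Orthopedic Surgery": {"step2": (252, 255), "research": (9, 12)},
    "Neurosurgery": {"step2": (255, 258), "research": (20, 25)},
    "Internal Medicine": {"step2": (245, 248), "research": (4, 6)},
    "Family Medicine": {"step2": (238, 242), "research": (2, 3)},
}


def midpoint(rng: tuple[int, int]) -> float:
    """Midpoint of a (low, high) range."""
    low, high = rng
    return (low + high) / 2


def compare(specialty: str, baseline: str) -> tuple[float, float]:
    """Return (Step 2 CK gap in points, research-output ratio) versus a baseline."""
    s, b = PROFILES[specialty], PROFILES[baseline]
    gap = midpoint(s["step2"]) - midpoint(b["step2"])
    ratio = midpoint(s["research"]) / midpoint(b["research"])
    return gap, round(ratio, 1)


if __name__ == "__main__":
    for competitive in ("Dermatology", "Orthopedic Surgery", "Neurosurgery"):
        for baseline in ("Internal Medicine", "Family Medicine"):
            gap, ratio = compare(competitive, baseline)
            print(f"{competitive} vs {baseline}: +{gap:.1f} points, {ratio}x research")
```

Against IM, the midpoints give about 2.1–4.5x the research output and 7–12 extra Step 2 CK points; against FM the gaps widen to about 4.2–9x and 13.5–18.5 points. Treat these as orders of magnitude, not precise statistics.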

Now let’s visualize how research volume escalates with competitiveness.

Average Research Output by Specialty Competitiveness Tier
Tier | Average research items
Primary Care | 3
Mid-Competitive | 8
Highly Competitive | 20

This is the playing field. The question is where your marginal effort moves you meaningfully within it.


3. How Programs Actually Use Step 2 CK vs Research

Think of Step 2 CK and research as different kinds of signals:

  • Step 2 CK = a hard numeric filter and comparative benchmark.
  • Research output = a fit and interest signal, especially for academic and niche programs.

They enter the decision process at different points.

Step 2 CK: The Gatekeeper

Patterns I have seen repeatedly in raw applicant lists and program filters:

  • Many competitive programs set Step 2 CK cutoffs:
    • ~250+ for derm / plastics / ENT “top tier”
    • ~240–245+ for ortho, neurosurgery, radiology, EM at competitive sites
    • ~230–235+ for IM at large academic centers
  • Filters are often binary. A 249 may pass while a 239 never gets seen, regardless of research.

That means Step 2 CK affects:

  • Whether your application is opened at all.
  • The probability of receiving an interview if your overall file is average.

Research cannot help you if you never make it through the numeric filter—especially in ERAS software where PDs/PCs literally sort by score.
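A toy model makes this order of operations concrete. The sketch assumes a single hard Step 2 CK cutoff (245 here, picked from the ~240–245+ range above) followed by a score sort. The `Applicant` class, the `screen` function, and the applicant records are hypothetical and do not describe how ERAS or any program's screening software is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Applicant:
    """Hypothetical applicant record for this toy example."""
    name: str
    step2_ck: int
    research_items: int


def screen(applicants: list[Applicant], cutoff: int) -> list[Applicant]:
    """Apply a binary Step 2 CK cutoff, then sort survivors by score (highest first).

    Research is never consulted: anyone below the cutoff is dropped,
    however many research items they list.
    """
    passed = [a for a in applicants if a.step2_ck >= cutoff]
    return sorted(passed, key=lambda a: a.step2_ck, reverse=True)


if __name__ == "__main__":
    pool = [
        Applicant("A", step2_ck=249, research_items=2),
        Applicant("B", step2_ck=239, research_items=20),
        Applicant("C", step2_ck=246, research_items=8),
    ]
    # Assumed cutoff of 245, within the ~240-245+ range quoted above.
    for a in screen(pool, cutoff=245):
        print(a.name, a.step2_ck, a.research_items)
```

Applicant B, with 20 research items, never appears in the output; that is the sense in which research cannot help below a hard filter.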

Research: The Multiplier and Tiebreaker

Research behaves differently:

  • It rarely functions as a strict cut-off (“no interview below 10 pubs” is uncommon outside a few ultra-academic programs).
  • Instead, it:
    • Boosts your perceived commitment to a field.
    • Strengthens letters and connections.
    • Helps you stand out once you survive initial score filtering.

Where it becomes pseudo-mandatory:

  • Derm, neurosurgery, radiation oncology, plastics, ENT: many matched applicants have double-digit outputs.
  • Top-tier academic IM and subspecialty-track programs: serious applicants often show 5–10+ items and at least some first-author work.

But again, the order of operations matters: Step 2 gets you in the door. Research often decides how far you go once you are inside.


4. Marginal Value: One More Publication vs +5 Step 2 Points

You should not think in absolutes. You should think in marginal returns.

If you have 6 months of bandwidth, what is more impactful?

  • Turning a projected 245 Step 2 into a 252
    vs
  • Converting 0–2 low-impact abstracts into 6–8 items?

Let’s sketch a rough, data-informed model for a US MD targeting a competitive specialty (ortho/derm-type field), using simplified probabilities to illustrate the trade-off.

Assume:

  • Current Step 2 CK practice trajectory: ~245
  • Current research: 2 low-impact items (posters, middle author)
  • Target: match at any accredited program in that specialty

Based on historical match curves and program director commentary, something like this is reasonable:

  • At Step 2 = 245, 2 research items → baseline match probability ~55–60%
  • At Step 2 = 252, 2 research items → bump to maybe ~70–75%
  • At Step 2 = 245, 8 research items (including 1–2 first-author, in-field) → also ~70–75%

In other words, in the mid-range:

  • +7 Step 2 points ≈ +6 substantial research items in impact on overall match odds.

Now, how much effort is that?

  • Going from a projected 245 to 252 may require:
    • +3000–5000 high-quality questions
    • 3–6 more weeks of focused study
    • A disciplined schedule but a single-exam target
  • Going from 2 to 8 research outputs may require:
    • 1–2 years of longitudinal involvement
    • Substantial writing, data analysis, IRB, and revision cycles
    • Buy-in from faculty and some luck on timelines

The data story: at the margin, boosting your Step 2 CK into a higher bracket is often a more time-efficient way to increase your match probability than chasing multiple extra low-impact abstracts, unless you are already numerically secure or targeting a research-hungry niche.
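The arithmetic behind that conclusion can be made explicit. This minimal sketch assumes the midpoints of the illustrative model above (57.5% baseline, 72.5% on either path) and of the effort estimates (4.5 weeks for Step 2, 78 weeks for 1–2 years of research). `gain_per_week` is a hypothetical helper, and calendar weeks are a crude proxy for effort, since research usually runs part-time alongside other work.

```python
# Illustrative sketch using the midpoints of the article's rough ranges.
# gain_per_week() is a hypothetical helper, not an established metric.


def midpoint(low: float, high: float) -> float:
    return (low + high) / 2


# Match probabilities (percent) from the illustrative model above.
BASELINE = midpoint(55, 60)       # Step 2 = 245, 2 research items
STEP2_PATH = midpoint(70, 75)     # Step 2 = 252, 2 research items
RESEARCH_PATH = midpoint(70, 75)  # Step 2 = 245, 8 research items

# Calendar time for each path, in weeks.
STEP2_WEEKS = midpoint(3, 6)        # 3-6 more weeks of focused study
RESEARCH_WEEKS = midpoint(52, 104)  # 1-2 years of longitudinal involvement


def gain_per_week(target: float, baseline: float, weeks: float) -> float:
    """Percentage-point gain in match probability per calendar week."""
    return (target - baseline) / weeks


if __name__ == "__main__":
    s = gain_per_week(STEP2_PATH, BASELINE, STEP2_WEEKS)
    r = gain_per_week(RESEARCH_PATH, BASELINE, RESEARCH_WEEKS)
    print(f"Step 2 path:   {s:.2f} pp/week")          # 3.33
    print(f"Research path: {r:.2f} pp/week")          # 0.19
    print(f"Ratio: ~{s / r:.0f}x per calendar week")  # ~17x
```

The exact ratio matters less than the order of magnitude: under these assumptions the same +15 percentage points arrives far sooner on the Step 2 path.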


5. Specialty-Specific Weighting: Where Research Really Competes with Scores

Let’s break it down by broad category, because the relative weight of research vs Step 2 changes.

A. Hyper-competitive, research-driven fields

Dermatology, neurosurgery, plastics, radiation oncology.

What the data show:

  • Mean Step 2 CK in the high 250s; significant fraction ≥260.
  • Median research output in double digits, often 15–25+.
  • Many matched applicants complete dedicated research years.

For these:

  • A subpar Step 2 (say, <245) is hard to compensate for, even with strong research.
  • But among applicants above a rough Step 2 floor (≈245–250), research heavily stratifies competitiveness.

If I reduce it to a simplistic rule:

  • Below the Step 2 threshold → Step 2 matters more (because without it, you are filtered out).
  • Above threshold and already decently productive (8–10 items) → research growth and networking can matter more than squeezing another 3–4 CK points.

So for a derm hopeful with:

  • Step 2 252 vs 255: small delta in probability.
  • 5 research items vs 20, with multiple in-field first-authors: massive delta in probability.

Once you are safely within the “acceptable” score band, the inflection point shifts: research, not a few more score points, drives most of the remaining difference.

B. Competitive but not research-obsessed: Ortho, ENT, Urology, EM, Anesthesia

These fields care about scores and research, but the distributions are different:

  • Average Step 2 CK: high 240s to low 250s.
  • Research: commonly 6–12 items, not 20+.
  • Strong home/away rotations and letters can partially compensate for thinner research.

In these specialties:

  • Getting Step 2 from ~238 → 248 can change your chances dramatically and often more efficiently than chasing two extra posters.
  • However, going from zero research to a focused 3–5 projects in the field still yields a meaningful bump, because programs like demonstrated interest and academic curiosity.

Rule of thumb here:

  • If you are below the median (50th percentile) Step 2 CK for matched applicants, prioritize Step 2 until you at least reach that band.
  • When you are at/above median Step 2, growing targeted research (even 3–7 good in-field items) starts to compete with another few score points in impact.

C. Academic Internal Medicine and subspecialty-track aspirations

Internal medicine is the largest specialty in the Match by number of positions, which makes it the statistical engine of residency. The match is less cutthroat than derm, but distinctions still matter, especially if you want GI, cards, or heme/onc later.

Typical matched MD stats:

  • Step 2 CK around 245–248, with many academic programs skewed higher (250+).
  • Research: 4–6 items on average, but premier programs often see 8–10+.

For IM:

  • Step 2 is critical to avoid being screened out among thousands of applications.
  • Once you are at or above ~245–250, incremental gains in research clearly help you climb program tiers and open doors for fellowship.

In other words:

  • To simply match IM at a decent program: Step 2 matters more up to a respectable level.
  • To match at MGH, UCSF, Penn, or Hopkins-type places: the research profile becomes almost as important as the score, sometimes more important if you already have a 250+.

D. Primary care and less research-heavy fields: FM, Psych, Peds, Neurology (most programs)

Here the data are different:

  • Mean Step 2 CK: high 230s to low 240s.
  • Research: 2–4 outputs, often with modest impact.

Realistically:

  • An extra 5 Step 2 points often does more for match safety and geographic choice than 3 extra posters.
  • Research can still be a differentiator for the most academic programs in these fields, but the baseline expectations are lower.

In most of these specialties, unless you are specifically targeting the top 5–10 academic programs nationally, I would rank Step 2 above “piling on more research” in marginal value.


6. Modeling the Trade-Off: A Simple, Practical Framework

You do not need machine learning to make a good decision. You need a structured comparison.

Use this 4-step process.

Flowchart: Prioritizing Step 2 CK vs Research Effort
Define target specialty tier → Check score position vs matched mean → Below mean by 5+?
  • Yes → Prioritize Step 2 CK preparation
  • No → Assess research output vs specialty norm → Research below norm?
    • Yes → Invest heavily in targeted research
    • No → Balance: maintain scores and deepen research quality

Step 1: Define your target specialty and tier
Are you aiming for:

  • Any program in that specialty?
  • Only large academic centers?
  • Specific geographic or prestige targets?

Step 2: Locate yourself vs the Step 2 CK distribution

  • If your predicted or actual Step 2 CK is >5 points below the matched mean for that specialty:
    • Your first-order goal is to close that gap.
    • Research will not save you from widespread screening at that point in competitive fields.

Step 3: Compare your research output to norms

Use a rough mapping based on what we know from match data:

  • Primary care:

    • 0–1 items → weak
    • 2–4 → typical
    • 5+ → strong / academic-leaning
  • Competitive surgery / derm / rad onc / neurosurg:

    • 0–5 items → weak
    • 6–12 → solid
    • 13–25+ → strong / research-year level
  • Academic IM:

    • 0–3 → weak
    • 4–7 → typical
    • 8+ → strong
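The Step 3 mapping is a simple threshold lookup. The sketch below encodes it; the tier keys (`primary_care`, `competitive_surgical`, `academic_im`) and the `research_band` function are hypothetical names, and the bands are the rough cut points listed above, not official thresholds.

```python
# Hypothetical encoding of the Step 3 research-output bands.
# Each tier lists (inclusive upper bound, label) pairs checked in order;
# counts above the last bound get the tier's top label.
RESEARCH_BANDS = {
    "primary_care": [(1, "weak"), (4, "typical")],
    "competitive_surgical": [(5, "weak"), (12, "solid")],  # surgery/derm/rad onc/neurosurg
    "academic_im": [(3, "weak"), (7, "typical")],
}
TOP_LABEL = {
    "primary_care": "strong / academic-leaning",
    "competitive_surgical": "strong / research-year level",
    "academic_im": "strong",
}


def research_band(tier: str, items: int) -> str:
    """Classify a research item count against rough specialty norms."""
    for upper, label in RESEARCH_BANDS[tier]:
        if items <= upper:
            return label
    return TOP_LABEL[tier]


if __name__ == "__main__":
    print(research_band("academic_im", 5))            # typical
    print(research_band("competitive_surgical", 14))  # strong / research-year level
```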

Step 4: Evaluate marginal effort vs marginal impact

Ask yourself:

  • Can I reasonably move my Step 2 CK band upward with 4–8 weeks of focused study?
  • Or is my Step 2 already “good enough” for my target range, and I have 12+ months where a deep research plunge could credibly yield multiple first/second-author works?

Then you can choose:

  • Step 2–heavy strategy if:

    • You are below or near the bottom of the competitive Step 2 range.
    • You have limited time (≤6 months) before application season.
    • You lack the infrastructure or time to generate serious research before ERAS.
  • Research-heavy strategy if:

    • You already sit at or above the median Step 2 for your target specialty.
    • You have roughly 9–18 months or more before applying.
    • You can plug into a high-yield research group (dedicated year, robust mentorship, realistic pipeline).
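Steps 2 and 4 can be combined into one small decision function. This is a sketch under stated assumptions: `choose_strategy` and its parameters are hypothetical; the score is a projected or actual Step 2 CK compared with the matched mean for the target specialty; the research-heavy branch requires all three of its listed conditions; 9 months is used as the lower end of the 9–18 month window; and the Step 3 research-norm check is left out for brevity.

```python
def choose_strategy(
    step2: int,
    specialty_mean: int,
    months_to_apply: int,
    has_research_group: bool,
) -> str:
    """Hypothetical sketch of framework Steps 2 and 4.

    step2 is a projected or actual Step 2 CK score; specialty_mean is the
    matched mean for the target specialty. The Step 3 research-norm check
    is deliberately left out.
    """
    gap = specialty_mean - step2  # positive means below the matched mean

    # Step 2: more than 5 points below the mean -> close that gap first.
    if gap > 5:
        return "step2-heavy"

    # Step 4, research-heavy: at/above the mean, 9+ months (lower end of
    # the 9-18 month window), and access to a productive research group.
    if gap <= 0 and months_to_apply >= 9 and has_research_group:
        return "research-heavy"

    # Step 4, step2-heavy: still below the mean with limited time
    # (<= 6 months) or no research infrastructure.
    if gap > 0 and (months_to_apply <= 6 or not has_research_group):
        return "step2-heavy"

    # Everything else: maintain scores and deepen research quality.
    return "balanced"


if __name__ == "__main__":
    print(choose_strategy(238, 248, 12, True))  # step2-heavy
    print(choose_strategy(252, 248, 12, True))  # research-heavy
    print(choose_strategy(246, 248, 12, True))  # balanced
```

Requiring all research-heavy conditions together is a conservative reading of the bullets; treating any single condition as sufficient would push more applicants toward research.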

7. Quality vs Quantity: The Research Trap

One more uncomfortable data point: not all “20 pubs” are equal.

When I look at CVs, I see:

  • Posters from local student symposia counted as “publications.”
  • Middle-author case reports with minimal involvement.
  • Reviews written largely by residents or attendings where the student did little more than formatting.

Programs see this pattern too.

From discussions with PDs and faculty, three things reliably matter more than raw count:

  1. Field alignment
    Derm programs care more about derm projects than about a random cardiology case report.

  2. Role clarity
    First- or second-author on a substantial manuscript beats “8th author” on five low-impact pieces.

  3. Narrative coherence
    A progression from simple chart reviews → clinical studies → perhaps some basic/translational work, with clear mentorship, tells a stronger story.

So, if you are going to sacrifice Step 2 study time for research, the projects must realistically convert into:

  • In-field outputs
  • With higher authorship position
  • On a timeline that hits before ERAS submission

If you cannot secure that, then the data argue for protecting your Step 2 preparation instead.


8. A Data-Driven Summary: What Matters More, and When?

The question “Research output vs Step scores: what matters more post–Step 1 P/F?” has a conditional answer:

  • For getting past initial filters in most specialties, Step 2 CK matters more.
    A weak Step 2 will quietly kill your application long before anyone appreciates your “20 posters.”

  • For moving up the program quality ladder after you have an acceptable Step 2, research starts to rival or exceed the marginal impact of a few extra CK points—especially in research-centric fields.

Here is the short version, stripped to the essentials:

Relative Priority: Step 2 CK vs Research by Scenario
Scenario | Higher-yield focus
Below mean Step 2 for target specialty | Step 2 CK
Near mean, minimal research in research-heavy field | Research (targeted)
Above mean Step 2, average research, academic goals | Research + strong letters
Primary care, broad geographic goals | Step 2 CK (moderate)
Derm/neurosurg/plastics, score acceptable | High-yield research
Derm/neurosurg/plastics, score acceptableHigh-yield research

And one last visual to drive home how Step 2 and research generally trade off across competitiveness tiers:

Relative Importance of Step 2 CK vs Research by Specialty Tier (relative weights; each row sums to 100)
Specialty tier | Step 2 CK weight | Research weight
Primary Care | 70 | 30
Academic IM | 55 | 45
Competitive Surgery | 50 | 50
Ultra-Competitive (Derm/Neurosurg) | 45 | 55



If you remember nothing else, remember this:

  1. Step 2 CK is the new Step 1 for screening. If you are below your specialty’s typical range, correcting that is the single most effective move you can make.
  2. Once your Step 2 is in a competitive band, research output—especially targeted, high-quality, in-field projects—becomes the key lever for moving from “any match” to “the match you actually want.”