
The belief that “Step scores decide everything” is now statistically outdated. In the post–Step 1 pass/fail era, the data show a power shift: Step 2 CK and research output have moved from “nice to have” to hard selection filters. But they do not matter equally, and they do not matter the same way for every specialty.
Let me be blunt: if you are chasing dermatology, plastics, ortho, ENT, neurosurgery, or radiation oncology, you are no longer competing on Step 1. You are competing on Step 2 CK plus a research CV that looks like a junior faculty member’s. For internal medicine or pediatrics, the curve is flatter—but the pattern is the same. Programs are replacing the lost Step 1 signal with a combination of Step 2 and “scholarly productivity.”
The key question is not “Which matters more, research or Step scores?” The key question is: “For my target specialty, with my current stats, where does one additional unit of effort produce the highest marginal gain in match probability?”
That is a data question. So let’s treat it like one.
1. What Actually Changed When Step 1 Went Pass/Fail?
We are not guessing here; we have trend lines.
Before Step 1 went P/F:
- Program directors ranked Step 1 as the single most important factor for interviews in nearly every competitive specialty.
- Research productivity mattered, but it was often a secondary differentiator after scores and class rank.
After Step 1 went P/F, survey and match data show three clear shifts:
Step 2 CK jumped into the vacuum.
In multiple NRMP Program Director Surveys since the change, Step 2 CK climbed into the top 2–3 factors for interview offers across most specialties. For competitive fields, it is now the primary standardized metric.
Research output became more stratified by specialty.
Already research-heavy fields (derm, radiation oncology, neurosurgery) increased expectations. Historically lower-research fields (FM, psych) did not suddenly become publication-driven. The slope increased where the culture already valued research.
Screening is more multi-factorial.
Without a single 3-digit Step 1 gate, programs lean harder on:
- Step 2 CK
- Clerkship grades / honors
- Home / away rotations
- Research, especially in-field
- Letters from known faculty
But they are not weighted equally, and there is real quantitative variation by specialty.
2. The Baseline Numbers: How Much Research and How High Scores?
First, you need to know the landscape you are walking into. The most useful numbers are from the last few NRMP Charting Outcomes in the Match reports and PD surveys. These are rounded and simplified for clarity, but the relative differences are the point.
| Specialty (Matched MD) | Mean Step 2 CK | Mean Abstracts/Posters/Pubs | Mean Programs Applied |
|---|---|---|---|
| Dermatology | 257–260 | 18–22 | 70–80 |
| Orthopedic Surgery | 252–255 | 9–12 | 70–80 |
| Neurosurgery | 255–258 | 20–25 | 70–80 |
| Internal Medicine | 245–248 | 4–6 | 40–50 |
| Family Medicine | 238–242 | 2–3 | 25–35 |
Data interpretation:
- Competitive surgical / lifestyle specialties show roughly 2–4x the research output of IM/FM.
- Their mean Step 2 CK scores are 7–15 points higher than primary care specialties.
- The “research arms race” is real in a subset of fields. But not everywhere.
Now let’s look at how research volume escalates with competitiveness.
| Competitiveness Tier | Approx. Mean Research Items |
|---|---|
| Primary Care | 3 |
| Mid-Competitive | 8 |
| Highly Competitive | 20 |
This is the playing field. The question is where your marginal effort moves you meaningfully within it.
3. How Programs Actually Use Step 2 CK vs Research
Think of Step 2 CK and research as different kinds of signals:
- Step 2 CK = a hard numeric filter and comparative benchmark.
- Research output = a fit and interest signal, especially for academic and niche programs.
They enter the decision process at different points.
Step 2 CK: The Gatekeeper
Patterns I have seen repeatedly in raw applicant lists and program filters:
- Many competitive programs set Step 2 CK cutoffs:
- ~250+ for derm / plastics / ENT “top tier”
- ~240–245+ for ortho, neurosurgery, radiology, EM at competitive sites
- ~230–235+ for IM at large academic centers
- Filters are often binary. A 249 may pass while a 239 never gets seen, regardless of research.
That means Step 2 CK affects:
- Whether your application is opened at all.
- The probability of receiving an interview if your overall file is average.
Research cannot help you if you never make it through the numeric filter—especially in ERAS software where PDs/PCs literally sort by score.
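A minimal sketch of that binary behavior, assuming the lower ends of the approximate cutoffs listed above (real program filters are unpublished and vary by program). The names `ILLUSTRATIVE_CUTOFFS` and `passes_screen` are hypothetical and exist only for illustration.

```python
# Illustrative cutoffs: the lower bounds of the rough ranges quoted above.
# These are assumptions for the sketch, not official program values.
ILLUSTRATIVE_CUTOFFS = {
    "derm_top_tier": 250,
    "ortho_competitive": 240,
    "im_academic": 230,
}


def passes_screen(step2_ck: int, tier: str) -> bool:
    """Return True if the score clears the assumed cutoff for the tier.

    Research output is deliberately not an input: a binary score filter
    ignores everything except the number.
    """
    return step2_ck >= ILLUSTRATIVE_CUTOFFS[tier]


# The 249 vs 239 example from the text, against an assumed 240 cutoff.
for score in (249, 239):
    print(score, passes_screen(score, "ortho_competitive"))
```

The point of the sketch is the function signature: no amount of research changes the return value, which is why the score has to be fixed first.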
Research: The Multiplier and Tiebreaker
Research behaves differently:
- It rarely functions as a strict cut-off (“no interview below 10 pubs” is uncommon outside a few ultra-academic programs).
- Instead, it:
- Boosts your perceived commitment to a field.
- Strengthens letters and connections.
- Helps you stand out once you survive initial score filtering.
Where it becomes pseudo-mandatory:
- Derm, neurosurgery, radiation oncology, plastics, ENT: many matched applicants have double-digit outputs.
- Top-tier academic IM and subspecialty-track programs: serious applicants often show 5–10+ items and at least some first-author work.
But again, the order of operations matters: Step 2 gets you in the door. Research often decides how far you go once you are inside.
4. Marginal Value: One More Publication vs +5 Step 2 Points
You should not think in absolutes. You should think in marginal returns.
If you have 6 months of bandwidth, what is more impactful?
- Turning a projected 245 Step 2 into a 252, or
- Converting 0–2 low-impact abstracts into 6–8 items?
Let’s sketch a rough, data-informed model for a US MD targeting a competitive specialty (ortho/derm-type field), using simplified probabilities to illustrate the trade-off.
Assume:
- Current Step 2 CK practice trajectory: ~245
- Current research: 2 low-impact items (posters, middle author)
- Target: match at any accredited program in that specialty
Based on historical match curves and program director commentary, something like this is reasonable:
- At Step 2 = 245, 2 research items → baseline match probability ~55–60%
- At Step 2 = 252, 2 research items → bump to maybe ~70–75%
- At Step 2 = 245, 8 research items (including 1–2 first-author, in-field) → also ~70–75%
In other words, in the mid-range:
- +7 Step 2 points ≈ +6 substantial research items in impact on overall match odds.
Now, how much effort is that?
- Going from a projected 245 to 252 may require:
- +3000–5000 high-quality questions
- 3–6 more weeks of focused study
- A disciplined schedule but a single-exam target
- Going from 2 to 8 research outputs may require:
- 1–2 years of longitudinal involvement
- Substantial writing, data analysis, IRB, and revision cycles
- Buy-in from faculty and some luck on timelines
The data story: at the margin, boosting your Step 2 CK into a higher bracket is often a more time-efficient way to increase your match probability than chasing multiple extra low-impact abstracts, unless you are already numerically secure or targeting a research-hungry niche.
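The time-efficiency argument can be made concrete with a rough calculation. This is a sketch under stated assumptions: it uses the midpoints of the illustrative probabilities above (about 57.5% rising to about 72.5% on either path), the 3–6 week and 1–2 year effort ranges from the lists above, and it treats calendar weeks on both paths as comparable, which is a simplification. The function name `gain_per_week` is hypothetical.

```python
def gain_per_week(prob_before: float, prob_after: float, weeks: float) -> float:
    """Percentage points of match probability gained per week of effort."""
    return (prob_after - prob_before) * 100 / weeks


BASELINE = 0.575  # midpoint of the ~55–60% baseline (illustrative)
IMPROVED = 0.725  # midpoint of the ~70–75% outcome on either path (illustrative)

# Step 2 path: 3–6 additional weeks of focused study.
step2_slow = gain_per_week(BASELINE, IMPROVED, 6)
step2_fast = gain_per_week(BASELINE, IMPROVED, 3)

# Research path: 1–2 years, taken here as 52–104 calendar weeks.
research_slow = gain_per_week(BASELINE, IMPROVED, 104)
research_fast = gain_per_week(BASELINE, IMPROVED, 52)

print(f"Step 2:   {step2_slow:.2f} to {step2_fast:.2f} points/week")
print(f"Research: {research_slow:.2f} to {research_fast:.2f} points/week")
```

Under these assumptions the Step 2 path returns roughly 2.5 to 5 percentage points per week, against roughly 0.14 to 0.29 for the research path. The exact figures matter less than the order-of-magnitude gap.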
5. Specialty-Specific Weighting: Where Research Really Competes with Scores
Let’s break it down by broad category, because the relative weight of research vs Step 2 changes.
A. Hyper-competitive, research-driven fields
Dermatology, neurosurgery, plastics, radiation oncology.
What the data show:
- Mean Step 2 CK in the high 250s; significant fraction ≥260.
- Median research output in double digits, often 15–25+.
- Many matched applicants complete dedicated research years.
For these:
- A subpar Step 2 (say, <245) is hard to compensate for, even with strong research.
- But among applicants above a rough Step 2 floor (≈245–250), research heavily stratifies competitiveness.
If I reduce it to a simplistic rule:
- Below the Step 2 threshold → Step 2 matters more (because without it, you are filtered out).
- Above threshold and already decently productive (8–10 items) → research growth and networking can matter more than squeezing another 3–4 CK points.
So for a derm hopeful with:
- Step 2 252 vs 255: small delta in probability.
- 5 research items vs 20, with multiple in-field first-authors: massive delta in probability.
The inflection point shifts once you are safely within the “acceptable” score band.
B. Competitive but not research-obsessed: Ortho, ENT, Urology, EM, Anesthesia
These fields care about scores and research, but the distributions are different:
- Average Step 2 CK: high 240s to low 250s.
- Research: commonly 6–12 items, not 20+.
- Strong home/away rotations and letters can partially compensate for thinner research.
In these specialties:
- Getting Step 2 from ~238 → 248 can change your chances dramatically and often more efficiently than chasing two extra posters.
- However, going from zero research to a focused 3–5 projects in the field still yields a meaningful bump, because programs like demonstrated interest and academic curiosity.
Rule of thumb here:
- If you are below the median Step 2 CK for matched applicants, prioritize Step 2 until you at least reach that band.
- When you are at/above median Step 2, growing targeted research (even 3–7 good in-field items) starts to compete with another few score points in impact.
C. Academic Internal Medicine and subspecialty-track aspirations
Internal medicine is the statistical engine of residency. The match is less cutthroat than derm, but distinctions still matter, especially if you want GI, cards, heme/onc later.
Typical matched MD stats:
- Step 2 CK around 245–248, with many academic programs skewed higher (250+).
- Research: 4–6 items on average, but premier programs often see 8–10+.
For IM:
- Step 2 is critical to avoid being screened out among thousands of applications.
- Once you are at or above ~245–250, incremental gains in research clearly help you climb program tiers and open doors for fellowship.
In other words:
- To simply match IM at a decent program: Step 2 matters more up to a respectable level.
- To match at MGH, UCSF, Penn, Hopkins–type places: the research profile becomes almost as important as the score, sometimes more if you already have a 250+.
D. Primary care and less research-heavy fields: FM, Psych, Peds, Neurology (most programs)
Here the data are different:
- Mean Step 2 CK: high 230s to low 240s.
- Research: 2–4 outputs, often with modest impact.
Realistically:
- An extra 5 Step 2 points often does more for match safety and geographic choice than 3 extra posters.
- Research can still be a differentiator for the most academic programs in these fields, but the baseline expectations are lower.
In most of these specialties, unless you are specifically targeting the top 5–10 academic programs nationally, I would rank Step 2 above “piling on more research” in marginal value.
6. Modeling the Trade-Off: A Simple, Practical Framework
You do not need machine learning to make a good decision. You need a structured comparison.
Use this 4-step process.
Decision flow at a glance:
| Stage | Question or Action | Outcome |
|---|---|---|
| 1 | Define target specialty tier | Continue to stage 2 |
| 2 | Check score position vs matched mean: below mean by 5+? | Yes: prioritize Step 2 CK preparation. No: continue to stage 3 |
| 3 | Assess research output vs specialty norm: research below norm? | Yes: invest heavily in targeted research. No: balance (maintain scores and deepen research quality) |
Step 1: Define your target specialty and tier
Are you aiming for:
- Any program in that specialty?
- Only large academic centers?
- Specific geographic or prestige targets?
Step 2: Locate yourself vs the Step 2 CK distribution
- If your predicted or actual Step 2 CK is >5 points below the matched mean for that specialty:
- Your first-order goal is to close that gap.
- Research will not save you from widespread screening at that point in competitive fields.
Step 3: Compare your research output to norms
Use a rough mapping based on what we know from match data:
Primary care:
- 0–1 items → weak
- 2–4 → typical
- 5+ → strong / academic-leaning
Competitive surgery / derm / rad onc / neurosurg:
- 0–5 items → weak
- 6–12 → solid
- 13–25+ → strong / research year–level
Academic IM:
- 0–3 → weak
- 4–7 → typical
- 8+ → strong
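For readers who prefer it spelled out, the mapping above can be written as a small lookup. This is only a sketch: the band boundaries are copied from the rough lists above, and the names `RESEARCH_BANDS` and `classify_research` are hypothetical.

```python
# Per category: (max items still "weak", max items in the middle band,
# label the text uses for the middle band). Values come from the rough
# mapping above and are approximations, not official thresholds.
RESEARCH_BANDS = {
    "primary_care": (1, 4, "typical"),
    "competitive": (5, 12, "solid"),   # surgery / derm / rad onc / neurosurg
    "academic_im": (3, 7, "typical"),
}


def classify_research(category: str, items: int) -> str:
    """Map a count of research items to the band used in the text."""
    weak_max, middle_max, middle_label = RESEARCH_BANDS[category]
    if items <= weak_max:
        return "weak"
    if items <= middle_max:
        return middle_label
    return "strong"


print(classify_research("competitive", 8))
print(classify_research("academic_im", 8))
```

The same eight items land in different bands depending on the field, which is the whole reason to compare against specialty norms rather than a single number.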
Step 4: Evaluate marginal effort vs marginal impact
Ask yourself:
- Can I reasonably move my Step 2 CK band upward with 4–8 weeks of focused study?
- Or is my Step 2 already “good enough” for my target range, and I have 12+ months where a deep research plunge could credibly yield multiple first/second-author works?
Then you can choose:
Step 2–heavy strategy if:
- You are below or near the bottom of the competitive Step 2 range.
- You have limited time (≤6 months) before application season.
- You lack the infrastructure or time to generate serious research before ERAS.
Research-heavy strategy if:
- You already sit at or above the median Step 2 for your target specialty.
- You have at least 9–18 months before applying.
- You can plug into a high-yield research group (dedicated year, robust mentorship, realistic pipeline).
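The two checklists above can be sketched as a small decision function. Several things here are assumptions rather than part of the framework as stated: the research-heavy path is treated as requiring all three of its conditions, the Step 2–heavy path as triggered by any one of its conditions, and a "balanced" fallback is returned when neither applies. `Applicant` and `suggest_strategy` are hypothetical names, and `specialty_median` is a number you would look up yourself.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    step2_ck: int                # actual or predicted Step 2 CK score
    specialty_median: int        # matched-applicant median for the target field
    months_to_apply: int         # months remaining before ERAS submission
    has_research_pipeline: bool  # access to a productive, mentored group


def suggest_strategy(a: Applicant) -> str:
    """Return 'research-heavy', 'step2-heavy', or 'balanced'."""
    # Assumption: research-heavy needs ALL three conditions from the text.
    research_ready = (
        a.step2_ck >= a.specialty_median
        and a.months_to_apply >= 9
        and a.has_research_pipeline
    )
    if research_ready:
        return "research-heavy"

    # Assumption: ANY one of these pushes toward Step 2 first.
    step2_first = (
        a.step2_ck < a.specialty_median
        or a.months_to_apply <= 6
        or not a.has_research_pipeline
    )
    if step2_first:
        return "step2-heavy"

    # Neither checklist applies cleanly (for example, 7–8 months left).
    return "balanced"


print(suggest_strategy(Applicant(240, 252, 12, True)))
print(suggest_strategy(Applicant(255, 252, 12, True)))
```

Treat the output as a prompt for the questions in Step 4, not as a verdict; the inputs are coarse and the thresholds are the rough ones used throughout this article.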
7. Quality vs Quantity: The Research Trap
One more uncomfortable data point: not all “20 pubs” are equal.
When I look at CVs, I see:
- Posters from local student symposia counted as “publications.”
- Middle-author case reports with minimal involvement.
- Reviews written largely by residents or attendings where the student did little more than formatting.
Programs see this pattern too.
From discussions with PDs and faculty, three things reliably matter more than raw count:
Field alignment
Derm programs care more about derm projects than about a random cardiology case report.
Role clarity
First- or second-author on a substantial manuscript beats “8th author” on five low-impact pieces.
Narrative coherence
A progression from simple chart reviews → clinical studies → perhaps some basic/translational work, with clear mentorship, tells a stronger story.
So, if you are going to sacrifice Step 2 study time for research, the projects must realistically convert into:
- In-field outputs
- With higher authorship position
- On a timeline that hits before ERAS submission
If you cannot secure that, then the data argue for protecting your Step 2 preparation instead.
8. A Data-Driven Summary: What Matters More, and When?
The question “Research output vs Step scores: what matters more post–Step 1 P/F?” has a conditional answer:
For getting past initial filters in most specialties, Step 2 CK matters more.
A weak Step 2 will quietly kill your application long before anyone appreciates your “20 posters.”
For moving up the program quality ladder after you have an acceptable Step 2, research starts to rival or exceed the marginal impact of a few extra CK points, especially in research-centric fields.
Here is the short version, stripped to the essentials:
| Scenario | Higher Yield Focus |
|---|---|
| Below mean Step 2 for target specialty | Step 2 CK |
| Near mean, minimal research in research-heavy field | Research (targeted) |
| Above mean Step 2, average research, academic goals | Research + strong letters |
| Primary care, broad geographic goals | Step 2 CK (moderate) |
| Derm/neurosurg/plastics, score acceptable | High-yield research |
And one last summary of how Step 2 and research generally trade off across competitiveness tiers (approximate relative weights out of 100):
| Category | Step 2 CK Weight | Research Weight |
|---|---|---|
| Primary Care | 70 | 30 |
| Academic IM | 55 | 45 |
| Competitive Surgery | 50 | 50 |
| Ultra-Competitive (Derm/Neurosurg) | 45 | 55 |

If you remember nothing else, remember this:
- Step 2 CK is the new Step 1 for screening. If you are below your specialty’s typical range, correcting that is the single most effective move you can make.
- Once your Step 2 is in a competitive band, research output—especially targeted, high-quality, in-field projects—becomes the key lever for moving from “any match” to “the match you actually want.”