
The comforting myth that Step 1 pass/fail would “level the playing field” between schools is not holding up under the early data. If anything, the advantage has just moved to different metrics.
Let me walk through what the numbers actually suggest so far.
What Changed With Step 1 – And What That Means Statistically
Step 1 went pass/fail for administrations beginning January 26, 2022. Before that, it was a 3-digit score with enormous spread in outcomes across schools, specialties, and demographics.
Under numeric scoring, you could quantify gaps very cleanly:
- School A average: 240
- School B average: 225
- Gap: 15 points, roughly 0.7–0.8 SD
Now, those same two schools both have “97% pass rate” on their websites. On the surface, that looks equal. It is not.
Here is the core problem: when you collapse a continuous variable (3-digit score) into a binary outcome (pass/fail), you lose resolution at the top and exaggerate or hide gaps at the bottom.
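To see the mechanics, here is a minimal sketch using assumed numbers (school means of 240 and 225, an SD of 19, and a passing threshold of 196, none of which come from real school reports):

```python
# Toy illustration: two schools with different mean Step 1 scores can show
# nearly identical-looking pass rates once the score collapses to pass/fail.
# Every number here is an assumption for illustration, not real school data.
from scipy.stats import norm

PASS_THRESHOLD = 196   # assumed passing standard
SD = 19                # assumed within-school standard deviation

school_means = {"School A": 240, "School B": 225}

for name, mean in school_means.items():
    pass_rate = 1 - norm.cdf(PASS_THRESHOLD, loc=mean, scale=SD)
    print(f"{name}: mean {mean}, pass rate {pass_rate:.1%}")

# Mean gap: 15 points (~0.8 SD). Pass-rate gap: a few percentage points.
# The continuous gap is large; the binary version of it is nearly invisible.
```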
So the real question is not “did Step 1 P/F fix inequities?” The real question is: where did the selection power move?
From the early data, the answer is very clear. The selection power moved from Step 1 to:
- Step 2 CK
- School reputation, known letter writers, and home or away rotations
- Research output and other countable proxies
And that shift has uneven effects across different types of schools.
Early Quantitative Signals: What Programs Are Actually Doing
We do not have a nationally linked dataset that says “before and after” for every program. But we do have:
- NRMP Program Director Survey patterns
- Publicly released match data from some specialties
- De-identified data sets and institutional reports I have seen from several schools
- Application behavior changes (number of applications per student, interview filters, etc.)
Taken together, some early signals are loud.
1. Step 2 CK Is Rapidly Becoming the New Step 1
Programs are not shy about this. The 2022 and 2023 NRMP Program Director Surveys show a clear pattern: Step 2 CK moved up on the list of “factors for interview selection” after Step 1 went pass/fail.
A typical pattern in competitive specialties:
- Pre P/F: Step 1 score = #1 or #2 factor. Step 2 = secondary.
- Post P/F: Step 2 CK = #1 numeric filter. Step 1 = “pass required,” nothing more.
The practical effect: schools that historically produced higher Step 1 scores tend to also produce higher Step 2 CK scores. So the academic performance gap did not vanish. It just shifted one exam later.
| Relative weight as an interview screen (stylized) | Pre P/F | Post P/F |
|---|---|---|
| Step 1 | 90 | 20 |
| Step 2 CK | 60 | 95 |
You can debate exact numbers, but the direction is obvious: programs want a continuous performance variable. If you take one away, they will push weight onto another.
2. Pass Rates Compress; Score Distributions Do Not
Here is a stylized—but realistic—comparison based on patterns I have seen in internal school reports:
| Metric | School X (Top Tier) | School Y (Mid Tier) |
|---|---|---|
| Pre P/F Step 1 Mean | 243 | 230 |
| Pre P/F Step 1 Fail Rate | 2% | 6% |
| Post P/F Step 1 Fail Rate | 1.5% | 5% |
| Post P/F Step 2 CK Mean | 252 | 240 |
| Match into Competitive Fields* | 32% | 14% |
*Competitive fields: derm, ortho, plastics, ENT, neurosurgery, IR, rad onc, etc., approximated.
Pass/fail compresses the Step 1 story: a 2% vs 6% fail-rate difference pre-P/F becomes a negligible-looking 98.5% vs 95% pass rate post-P/F. But the Step 2 CK mean gap (252 vs 240) remains large and actionable for programs.
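To put that CK gap in effect-size terms, here is a quick back-of-the-envelope sketch, assuming a Step 2 CK standard deviation of roughly 15 points (an assumption, not a figure from the reports above):

```python
# Back-of-the-envelope effect size for the stylized Step 2 CK gap (252 vs 240).
# The SD of 15 is an assumption for illustration, not a published figure.
from math import sqrt
from scipy.stats import norm

mean_x, mean_y, sd = 252, 240, 15

cohens_d = (mean_x - mean_y) / sd
# Probability that a randomly chosen School X graduate outscores a randomly
# chosen School Y graduate, assuming independent normals with equal SDs.
p_superiority = norm.cdf((mean_x - mean_y) / (sd * sqrt(2)))

print(f"Cohen's d ~ {cohens_d:.2f}")        # ~0.80, conventionally "large"
print(f"P(X > Y)  ~ {p_superiority:.0%}")   # ~71%
```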
So did Step 1 P/F “close the gap” in outcomes between these schools? No. It concealed it on one metric while it reappeared on another.
How Different School Types Are Affected
The data suggest three different stories depending on where your institution sits: top-ranked MD, average MD, or DO/IMG.
Top 30 MD Schools: Signal Loss Helps Them
Students at highly ranked MD schools used to compete in a system where a mid-tier student with a 255 Step 1 could push into their territory. That is much harder now.
Why? Because the strongest external equalizer—an anonymous 3-digit score—is gone. Programs revert to proxies they already trusted: school reputation, letter writers, known rotations.
Imagine a dermatology program comparing two applicants:
- Applicant A: T30 med school, Honors in clinicals, no Step 1 score, Step 2 CK pending, 2 pubs in mid-tier derm journals.
- Applicant B: Mid-tier med school, Honors, strong letters, similar research profile.
Under numeric Step 1:
- If B scored a 255 and A scored a 238, B had real leverage. Programs routinely said out loud: “We will absolutely interview a 255 from a mid-tier school.”
Under P/F Step 1:
- That leverage disappears. Programs default to “better-known” schools and letter writers, and then use Step 2 CK to filter. Even if B scores a 255 on CK, the comparison does not hit as hard psychologically as the old Step 1 score did. Programs rarely talk about a single CK number the way they obsessed over Step 1.
So at the high end, gaps are likely widening in terms of match outcomes into the most competitive specialties, not shrinking.
Mid-Tier and Lower-Tier MD Schools: More Noise, Not More Equity
For middle-of-the-distribution schools, Step 1 P/F has mixed effects.
Upsides:
- Slightly fewer catastrophic “one bad test day ruins your career” scenarios.
- Students who would have scored 220–230 and been borderline for certain specialties can now lean harder on Step 2 CK, good clinical performance, and networking.
Downsides (and they are real):
- The strongest students at these schools lose their biggest objective differentiator.
- The school’s average “academic reputation” now matters more because there is less hard data to override it.
If you look at internal match lists pre- and post-P/F at several mid-tier schools, you see patterns like:
- Total match rate: almost unchanged.
- Percentage matching into ultra-competitive fields: flat or slightly down.
- Percentage staying in-region or at home program: up slightly (because the home program knows them and is less fixated on missing Step 1 scores).
From a numbers perspective, the “gap” between mid-tier and top schools in high-end outcomes probably widened marginally, even if pass rates converged.
DO and IMG: Step 1 P/F Mostly Hurt Their Ability to Stand Out
This is the group that was sold a false hope.
Under numeric Step 1, a DO or IMG with a 245+ had a non-trivial shot at interviews in traditionally MD-dominated specialties and programs. Not a guaranteed shot, but a real one. I have seen DO students with 250+ crack into competitive university programs precisely because their score was too strong to ignore.
Under P/F:
- Many programs shifted to “Step 2 CK required before interview” and then used CK cutoffs or de facto tiers.
- At the same time, with Step 1 gone, school pedigree and bias reasserted themselves. You see this in anecdotal patterns: fewer IMGs invited without a known connection, more emphasis on doing away rotations to be “seen.”
| Applicant type | Typical Step 2 CK screening threshold (stylized) |
|---|---|
| US MD | 235 |
| US DO | 245 |
| IMG | 250 |
These numbers are stylized, but they match the thresholds I hear in program meetings: “We can be flexible with our own MD students down to 230–235, but we want 245+ from DOs and IMGs.”
Translation: with Step 1 P/F, the bar to overcome school type through testing got higher, not lower. Which is the exact opposite of “closing gaps.”
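As a concrete illustration of what a de facto tier looks like, here is a hypothetical screening rule built from the stylized thresholds above. No real program publishes logic like this, but many behave as if they run it:

```python
# Hypothetical screening rule reflecting the stylized thresholds above.
# Real programs encode this informally or in ERAS filters, not in Python;
# the point is only that the bar differs by applicant type.
CK_CUTOFFS = {"US MD": 235, "US DO": 245, "IMG": 250}  # stylized, not real data

def clears_screen(applicant_type: str, step2_ck: int) -> bool:
    """True if the applicant clears the (hypothetical) Step 2 CK screen."""
    return step2_ck >= CK_CUTOFFS[applicant_type]

print(clears_screen("US MD", 238))  # True  -- clears the lower MD bar
print(clears_screen("US DO", 238))  # False -- same score, higher bar
print(clears_screen("IMG", 248))    # False -- needs 250+ under this rule
```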
What the Match Data Whisper (Even If They Do Not Shout Yet)
Publicly, aggregate NRMP data are not yet cleanly stratified by pre- vs post-P/F Step 1 cohorts for all specialties in a way that lets you compute precise effect sizes. But some early patterns are visible when you compare match outcomes for competitive specialties across recent years.
Common patterns I have seen when slicing data by year, school type, and specialty:
- Total number of applicants to competitive specialties went up. P/F gave people false confidence.
- Interview offers per applicant became more skewed. A subset hoards more invites, others get fewer.
- The share of interview slots going to home and regional schools ticked up by a few percentage points in some specialties.
That last one matters. In less data-driven environments, relationships fill the vacuum. If you remove a standard, external, decontextualized number (Step 1), people lean harder on who they know and trust.
So did Step 1 P/F narrow gaps between, say, a state MD and a top-10 MD? There is no convincing numerical evidence that it did. But there is growing circumstantial evidence that gatekeeping became slightly more relationship- and prestige-driven.
Did It Reduce Bias or Just Hide It?
One of the arguments for P/F was that numeric Step 1 scores showed demographic and structural inequities. That is true. But turning the score into P/F does not fix upstream inequities in:
- K–12 education
- College preparation
- Access to test prep resources
- Time off for dedicated study
- Financial stress
The best you can say is that P/F:
- Reduced the emotional and psychological fixation on a single number.
- Lowered the stakes of any single bad test day.
- Reduced the perceived “ceiling” pressure for students at already advantaged schools.
What the data suggest it did not do:
- It did not eliminate achievement gaps in basic science knowledge. These reappear on Step 2 CK and clinical performance statistics.
- It did not guarantee more equitable access to competitive specialties. Match outcomes have not moved in a way that supports that narrative so far.
- It did not force programs to adopt rigorous holistic review. Many simply moved filters to other numeric or pseudo-numeric variables (Step 2 CK, AOA, class rank, research output counts).
Think of it as an information-theory problem. You removed a high-signal, high-variance metric. Unless you fix root causes, the variance will express itself somewhere else.
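To make that concrete, here is a small sketch comparing the Shannon entropy of a pass/fail outcome with that of a banded score. The pass rate, mean, SD, and band width are all assumptions chosen for illustration:

```python
# How many bits of signal does each reporting format carry?
# All distributions below are assumptions chosen for illustration.
import numpy as np
from scipy.stats import norm

def entropy_bits(probs):
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return float(-(probs * np.log2(probs)).sum())

# Pass/fail with an assumed 96% pass rate: the outcome is almost always "pass".
print(f"pass/fail:      {entropy_bits([0.96, 0.04]):.2f} bits")  # ~0.24 bits

# A 3-digit score reported in 10-point bands, assuming scores are roughly
# normal with mean 232 and SD 19.
edges = np.arange(140, 301, 10)
band_probs = np.diff(norm.cdf(edges, loc=232, scale=19))
print(f"10-point bands: {entropy_bits(band_probs):.2f} bits")    # ~3 bits
```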
Where the “Gaps” Actually Moved
To answer the original question precisely—did Step 1 P/F widen or close gaps between schools?—you need to specify which gap you mean. Because the answer differs by outcome variable.
Let us break it into four:
Step 1 pass rates by school
- Gaps narrowed in superficial appearance. A 3–5% pass rate difference is far less visible than a 15-point score difference.
- But absolute gaps in underlying knowledge probably changed little.
Access to “top” programs and competitive specialties
- Early trend: gap between top-tier MD and everyone else likely widened or at least solidified.
- Star students at mid-tier/DO/IMG schools lost their cleanest “I am objectively as strong as your top-school students” line item.
Reliance on school brand and networks
- Gaps widened. Reputational advantage became more important.
- Home program bias likely increased slightly because local faculty know their own students better than anonymous applicants without scores.
Reliance on another standardized exam (Step 2 CK)
- Numerical gatekeeping did not disappear; it shifted.
- Schools that historically performed better on standardized tests still do. You can see it in CK means and distributions.
So the honest synthesis is this:
- The Step 1 P/F change redistributed inequality rather than eliminating it.
- It reduced the visibility and psychological weight of one metric while amplifying others.
That is not “closing gaps.” It is moving the walls around.
What To Watch Over The Next 3–5 Years
The early trends are just that—early. But there are specific data points you should watch, because they will tell you whether this shift becomes entrenched or moderated.
Key dimensions to track through the Step 1 P/F era:
- Step 2 CK means by school
- Match rates by specialty
- The interview filters programs actually use
- The gap between top-tier and mid-tier schools
- DO and IMG access to academic programs
- The balance of holistic versus numeric filters
Of these, the critical metrics (a rough analysis sketch follows this list):
Step 2 CK distribution by school decile
- If gaps shrink, then maybe education quality or support is improving across the board.
- If they hold steady, then P/F was mostly cosmetic.
Match outcomes for high-achieving students from non-elite schools
- Specifically: how often a 245–255+ Step 2 CK at a mid-tier or DO school translates into interviews and matches at highly academic programs.
- If this rate falls versus pre-P/F Step 1 data, then the “mobility ladder” is weaker.
Proportion of interviews going to home and regional schools
- Increases here mean prestige and proximity matter even more.
- A flat or declining share would suggest real adoption of broader, holistic review.
Changes in demographic distributions in competitive specialties
- If P/F genuinely improves equity, you should see a measurable uptick in representation from historically excluded groups in derm, ortho, plastics, ENT, etc.
- If those numbers barely move, the reform did not hit its stated equity target.
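If you have access to de-identified institutional data, here is a rough sketch of how two of these metrics could be tracked across cycles. The column names (match_year, school_decile, step2_ck, program_relationship) are invented for illustration, not a real export schema:

```python
# Rough tracking sketches against hypothetical de-identified exports.
# Column names (match_year, school_decile, step2_ck, program_relationship)
# are invented for illustration; they are not a real dataset schema.
import pandas as pd

def ck_mean_by_decile(applicants: pd.DataFrame) -> pd.DataFrame:
    """Step 2 CK mean by school decile and cycle year (one row per applicant).

    Watch whether the decile-to-decile spread shrinks (real convergence) or
    holds steady (P/F was mostly cosmetic).
    """
    return (applicants
            .groupby(["match_year", "school_decile"])["step2_ck"]
            .mean()
            .unstack("school_decile"))

def home_regional_share(interviews: pd.DataFrame) -> pd.Series:
    """Share of interview offers going to home or regional programs, by year
    (one row per interview offer). A rising share suggests prestige and
    proximity are filling the vacuum left by the missing Step 1 score."""
    local = interviews["program_relationship"].isin(["home", "regional"])
    return local.groupby(interviews["match_year"]).mean()
```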
Practical Implications For Current Students
If you are a student trying to make decisions in this environment, you do not have the luxury of waiting for the perfect longitudinal dataset. You live inside the transition.
From a data standpoint, here is what actually moves the needle now that Step 1 is P/F:
- Step 2 CK is your main standardized lever. The distribution of CK scores by specialty matches the old Step 1 story: the competitive fields still cluster high.
- For students outside the top-tier MD bubble, above-median to high Step 2 CK scores (think 240+) are often necessary just to get in the door for many competitive specialties.
- Research productivity and strong, specific letters are more valuable than ever because they inject signal where Step 1’s numeric spread used to live.
| Specialty tier (approximate Step 2 CK score) | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| Primary Care | 225 | 235 | 240 | 245 | 250 |
| Moderately Competitive | 230 | 240 | 245 | 250 | 255 |
| Most Competitive | 238 | 246 | 252 | 258 | 265 |
Again, the exact values vary by year and dataset, but the relative ordering does not.
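If you want to sanity-check a score against that table, here is a trivial helper using the stylized quartiles above rather than any official figures:

```python
# Place a Step 2 CK score against the stylized quartiles in the table above.
# These are the table's illustrative values, not official NBME or NRMP figures.
TIERS = {
    "Primary Care":           (225, 235, 240, 245, 250),
    "Moderately Competitive": (230, 240, 245, 250, 255),
    "Most Competitive":       (238, 246, 252, 258, 265),
}

def quartile_position(score: int, tier: str) -> str:
    lo, q1, med, q3, hi = TIERS[tier]
    if score < q1:
        return "bottom quartile (or below the listed range)"
    if score < med:
        return "second quartile"
    if score < q3:
        return "third quartile"
    return "top quartile (or above the listed range)"

print(quartile_position(248, "Most Competitive"))  # second quartile: between Q1 and median
```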
So no, you cannot rely on the idea that “P/F leveled the field.” The data do not support that comfort. You still need a hard quantitative anchor. It just changed exams.
The Bottom Line
Did Step 1 pass/fail widen or close gaps between schools?
Summarizing the early data as bluntly as I can:
- It narrowed the visible gap in Step 1 performance by compressing everything into pass/fail.
- It did not meaningfully reduce underlying performance differences across schools. Those reappear on Step 2 CK and match outcomes.
- It likely increased the relative advantage of top-tier MD schools and of applicants with strong institutional networks.
- It weakened the ability of high-performing students at less-prestigious schools (especially DO and IMGs) to use a single standout exam score as a ticket into elite programs.
The reform solved an optics problem and a wellness problem around Step 1. It did not solve the structural inequities that produced score gaps in the first place. Those are upstream of any test format.
You can expect the system to keep rebalancing for several more cycles as programs refine their filters. And once Step 2 CK becomes the new obsession in a fully stable way, I would not be surprised if the conversation about its scoring format starts to sound very familiar.
For now, though, you know where the data point. Use that. Anchor yourself in the metrics that still move decisions, rather than the myths about what pass/fail was supposed to do.
You have the early numbers. Your next step is figuring out how to bend them in your favor. But that strategy conversation is a different story.
FAQ
1. Has Step 1 pass/fail made it easier to match into competitive specialties from a mid-tier school?
The available data do not show a clear improvement. Overall match rates are similar, but the top-tier schools still dominate matches in the most competitive specialties. What I have seen is that strong mid-tier candidates now rely more heavily on Step 2 CK scores, research, and away rotations rather than a standout Step 1 score. The “ceiling” for mid-tier students did not disappear; it shifted metrics.
2. Did Step 1 P/F help DO and IMG applicants?
Overall, no. Under numeric scoring, a DO or IMG with a strong Step 1 (say, 245+) could break into university programs that might otherwise ignore them. With P/F, that differentiator is gone, and programs lean more heavily on school type, Step 2 CK, and connections. Many programs have effectively raised the Step 2 CK bar for DOs and IMGs relative to US MDs, which reduces their ability to compete on equal footing.
3. Are residency programs actually using holistic review more after Step 1 went P/F?
There is some movement, but it is not uniform. Many programs still start with numeric filters, now based on Step 2 CK, class rank, or AOA status. Holistic review tends to happen after those hard screens. A minority of programs have invested in structured holistic review processes, but for most, the pattern is the same: a quick quantitative cut, and then a deeper look at the survivors.
4. Does Step 1 P/F reduce test-related stress for students at all schools equally?
No. Students at schools with strong institutional reputations and strong Step 2 CK performance can afford to relax more about Step 1. Their environment and downstream metrics protect them. Students at mid-tier, DO, and IMG schools often feel they must “make up for” the lack of Step 1 score by over-performing on CK, research, and everything else. For them, stress has shifted rather than decreased.
5. Could Step 2 CK eventually become pass/fail as well, and would that finally level the field?
If Step 2 CK ever becomes pass/fail, the system will not magically become fair. Programs would shift even more weight to school reputation, clinical grades, sub-internship performance, and research metrics. You might see some reductions in test-focused anxiety, but unless upstream educational and resource inequities change, performance gaps will resurface in whatever metrics remain. Pass/fail can change where inequality shows up; it does not delete it.