
Pass/Fail Era Step 1: How Resource Choices Shifted Score Variability

January 5, 2026


The narrative that “Step 1 going pass/fail made it more relaxed and equal for everyone” is wrong. The data—what we can see already and what past behavior predicts—shows something very different: resource choices now drive more of the Step 2 score spread, and Step 1 study patterns are fragmenting in a way that increases variability in downstream performance.

USMLE changed the scoring scale. It did not change human behavior under competition.

Let me walk through what the numbers and patterns actually suggest.


What Changed When Step 1 Went Pass/Fail

On paper, the shift was simple: the score report lost the three‑digit number and kept a binary outcome. But behavior is a function of incentives, and the incentive structure changed across three dimensions:

  1. Risk perception
  2. Signaling value
  3. Resource allocation

Before pass/fail, Step 1 was a high‑stakes rank discriminator. After pass/fail, Step 1 became a high‑stakes gate (you must pass) but a low‑value signal (a pass means very little compared with a 250+).

This matters for resources because students optimize around signals.

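Before pass/fail:

  • Students across tiers converged on one consensus stack (UWorld, First Aid, Pathoma, Sketchy), because the three‑digit score was the signal to optimize.
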
After pass/fail:

  • Top‑tier applicants shifted effort and money earlier (foundation building) and later (Step 2).
  • Mid‑tier students, watching anxiety drop a bit, started experimenting more with cheaper or shorter resources.
  • Lower-resourced students leaned harder into whatever was bundled or discounted.

Result: more heterogeneity in resource stacks, less standardization in study sequences, and more outcome spread that is now visible on Step 2 instead of Step 1.


The Old Regime: Standardized Resource Stacks, Compressed Variability

Under the numeric Step 1 era, prep was remarkably standardized. Not because schools mandated it, but because match data and online forums converged on a consensus.

If you talked to a third‑year around 2018–2019, you heard the same phrases repeatedly:

  • “I did 2.5 passes of UWorld, 80–100 questions a day.”
  • “I annotated First Aid with UWorld explanations.”
  • “Pathoma every day during dedicated. Sketchy for micro.”

When a market converges on a dominant strategy, variance in input decreases. Same inputs → more compressed output distribution, especially in the middle of the pack.

Let me quantify the pattern conceptually.

Programs and advisors reported Step 1 distributions roughly like this in the pre‑P/F era at solid mid‑tier schools:

  • Mean Step 1: ~230–235
  • Standard deviation: ~15–18 points
  • Resource usage (UWorld penetration): 90%+

So most students:

  • Used UWorld heavily
  • Used 1–2 major content resources
  • Entered dedicated with similar baseline structures

Compressed inputs produced a “fat middle” of Step 1 scores. Yes, there were outliers at 260+ and some failures, but the bulk of students clustered in a relatively predictable band.

If you want a mental model: think of strong standardization like nationalizing the curriculum. More similar prep → slightly narrower performance distribution, especially once everyone is working off the same question bank.


The New Regime: Divergent Resource Strategies, Same Underlying Pressure

When Step 1 went pass/fail, many students did exactly what commentators predicted: they de‑intensified Step 1 and re‑intensified Step 2.

But how they shifted mattered.

Here is what I have seen across several schools (and what fits the early survey data):

  1. A subset of students (call them the “signal maximizers,” roughly top 20–25%)

    • Still use UWorld early
    • Build deep basic science foundations with Boards & Beyond or similar
    • Practice NBME forms seriously
    • Then aggressively pivot to Step 2:
      • Full UWorld Step 2 CK
      • Consistent NBME practice
      • Dedicated schedule similar in intensity to old Step 1 days
  2. The middle group (maybe 50–60%)

    • Use UWorld for school exams and some Step 1 practice, but with less structure
    • Skim or partially watch content resources
    • Treat dedicated Step 1 time as “lighter” because the endpoint is pass/fail
    • For Step 2, they come in with patchy foundations, then try to “catch up” with heavy Step 2 UWorld only
  3. The under-resourced or under‑guided group (15–25%)

    • Rely primarily on school‑provided question banks or local materials
    • Delay UWorld because of cost or confusion
    • Use ad‑hoc YouTube or random Anki decks
    • Pivot to Step 2 late, often when they realize programs still care a lot

The raw hours may be similar across groups. The efficiency and sequencing are not.
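
To see why a class split into divergent strategies shows a wider overall spread than a class on one stack, here is a minimal Python sketch using the law of total variance. The group shares loosely follow the three groups above; the means and standard deviations are made-up illustrations, not measured data, and `mixture_sd` is a hypothetical helper name.

```python
from math import sqrt

def mixture_sd(groups):
    """Overall SD of a class made of subgroups (law of total variance).

    groups: list of (share, mean, sd) tuples; shares should sum to 1.
    """
    overall_mean = sum(w * m for w, m, _ in groups)
    within = sum(w * s ** 2 for w, _, s in groups)                    # spread inside each group
    between = sum(w * (m - overall_mean) ** 2 for w, m, _ in groups)  # spread between group means
    return sqrt(within + between)

# Hypothetical numbers, chosen only to illustrate the mechanism.
standardized = [(1.00, 232, 16)]       # everyone on the same stack
fragmented = [
    (0.25, 245, 14),                   # "signal maximizers"
    (0.55, 232, 16),                   # middle group
    (0.20, 218, 18),                   # under-resourced group
]

print("standardized SD:", round(mixture_sd(standardized), 1))
print("fragmented SD:  ", round(mixture_sd(fragmented), 1))
```

The within-group spread barely moves between the two scenarios. The extra width comes from the gap between group means, which is exactly what divergent sequencing produces.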


Where Variability Shows Up Now: Step 2 as the New Sorting Mechanism

The core claim: going pass/fail on Step 1 did not meaningfully change how competitive residencies sort applicants. It just moved the sorting load to Step 2.

And that changes how resource choices affect score variability.

Instead of Step 1 showing a somewhat compressed distribution (because almost everyone used the same high‑yield tools in a similar way), we now see:

  • Wider spread in Step 2 outcomes, driven heavily by the quality and timing of Step 1 resource use.
  • More differentiation between students who used Step 1‑era resources to truly learn pathophysiology and those who just “did enough to pass.”

Think in terms of basic statistical structure:

  • Old world:

    • Step 1: high‑stakes, input standardized, moderate SD
    • Step 2: lower marginal sorting value; students relatively fatigued; somewhat similar prep, SD moderate
  • New world:

    • Step 1: still high‑stakes (must pass) but locally variable prep; pass/fail hides the distribution
    • Step 2: very high sorting value, but prep quality is strongly conditioned on earlier resource choices

If you had continuous access to internal school data, you would likely see:

  • Students who built strong Step 1 foundations with robust QBank + structured resources:
    • Step 2 outcomes pulled rightward (higher scores, less study time)
  • Students who under‑invested in Step 1 or used scattered resources:
    • Step 2 outcomes more variable (a mix of late bloomers and real struggles)

Resource Mix Before vs After: A Data-Style Snapshot

Let’s make this concrete with a hypothetical but realistic comparison of resource penetration rates before and after Step 1 went pass/fail.

Estimated Resource Adoption Before vs After Step 1 Pass/Fail
Resource Type | Pre P/F Adoption | Post P/F Adoption | Pattern Shift
UWorld Step 1 QBank | 90–95% | 70–80% | Still dominant but less universal
Comprehensive “FA-style” book | 80–85% | 50–60% | More students skip full text review
Major video course (B&B/Pathoma) | 60–70% | 50–65% | Slightly more fragmented choices
Sketchy/visual mnemonics | 50–60% | 45–55% | Stable, but more topic-selective use
School-only or bundled QBank | 20–30% | 40–50% | More reliance where bundled for free

The critical line is UWorld adoption. When near-universal, it acted as a “floor raiser” for many borderline students. When only ~70–80% use it consistently, the lower tail of performance spreads out.
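
A rough way to picture the “floor raiser” effect is to treat the class as a mix of two normal subgroups and ask what share falls below some cutoff. In the sketch below, the adoption shares come from the estimated ranges in the table, but the subgroup means, standard deviations, and the cutoff are assumptions I picked for illustration, and `share_below` is a hypothetical helper.

```python
from statistics import NormalDist

def share_below(cutoff, groups):
    """Fraction of a class below `cutoff` when the class is a mix of normal subgroups.

    groups: list of (share, mean, sd) tuples; shares should sum to 1.
    """
    return sum(w * NormalDist(mu, sd).cdf(cutoff) for w, mu, sd in groups)

# Assumed subgroup parameters (illustrative only): consistent QBank users vs non-users.
USERS = (233, 16)
NON_USERS = (222, 18)

near_universal = [(0.93, *USERS), (0.07, *NON_USERS)]  # roughly the pre-P/F adoption level
partial = [(0.75, *USERS), (0.25, *NON_USERS)]         # roughly the post-P/F adoption level

CUTOFF = 210  # hypothetical "struggling" threshold, not an official number
print("near-universal adoption:", round(share_below(CUTOFF, near_universal), 3))
print("partial adoption:       ", round(share_below(CUTOFF, partial), 3))
```

Under these assumptions the lower tail thickens purely because more of the class sits in the subgroup with the lower mean. No individual student has to get worse for the tail to spread.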

Now add timing:

  • Pre P/F: many students did a full or near‑full pass of UWorld before dedicated or early in dedicated.
  • Post P/F: some delay serious QBank use until Step 2, using fewer Step 1 questions and weaker integration.

Outcome: foundational weakness for a non‑trivial fraction of the class that will be exposed by Step 2, which remains fully numerical.


How Resource Choices Mechanically Increase Score Variability

Let me strip the emotion out and model the process.

Think of exam performance as:

Performance ≈ f(Knowledge base, Question practice, Test strategy, Stress)

Resource choices primarily affect the first two variables. Pass/fail Step 1 changed the interplay:

  1. Knowledge base:

    • Students who still treat Step 1 as a content‑mastery exam leverage a robust QBank plus structured content resources.
    • Students who treat Step 1 as a “just pass” hurdle:
      • Do less systematic coverage
      • Rely on lecture notes and spot learning
      • Postpone rigorous systems review to Step 2 period
  2. Question practice:

    • Heavy Step 1 QBank usage used to be near-universal and front‑loaded.
    • Now, there is a split:
      • Some students use Step 1 QBank mainly for pass assurance (a partial pass through the bank).
      • Others reserve “serious” question habits for Step 2.

Here is the catch: the function is not linear. Knowledge base and question practice during Step 1 amplify each other for Step 2. Delayed investment does not simply “shift right” by one exam; it loses compounding.

Students who:

  • Built robust mental models in M2
  • Reinforced them with UWorld Step 1
  • Entered Step 2 studying without needing to “relearn” whole systems

…will need far fewer hours per point gained on Step 2. Their marginal returns per UWorld block are higher.

Students who:

  • Barely scraped a Step 1 pass with patchwork resources
  • Did minimal NBME practice
  • Enter Step 2 with shallow memory and fragmented understanding

…will blow a lot of Step 2 QBank time just reconstructing basics instead of polishing.

That differential in marginal returns expands the score distribution.
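
Here is a toy version of that argument, as a sketch. The functional form (a base rate plus a knowledge × practice interaction term), the parameter values, and the name `step2_gain` are all my assumptions for illustration. Nothing here is fitted to real scores.

```python
def step2_gain(hours, knowledge, practice, base_rate=0.02, boost=0.04):
    """Toy points gained from `hours` of Step 2 study.

    knowledge, practice: Step 1-era foundation levels on a 0..1 scale.
    The knowledge * practice product is the interaction ("compounding") term:
    each input is worth more when the other is also high.
    """
    points_per_hour = base_rate + boost * knowledge * practice
    return hours * points_per_hour

HOURS = 400  # same Step 2 study time for both hypothetical students

strong = step2_gain(HOURS, knowledge=0.9, practice=0.9)  # robust M2 models + early QBank
patchy = step2_gain(HOURS, knowledge=0.4, practice=0.3)  # barely passed, patchwork stack

print("strong foundation:", round(strong, 1), "points")
print("patchy foundation:", round(patchy, 1), "points")
print("hours per point:", round(HOURS / strong, 1), "vs", round(HOURS / patchy, 1))
```

With identical hours, the gap comes entirely from the interaction term. That is the sense in which delayed investment loses compounding rather than simply shifting one exam later.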


Visualizing the Shift: From Compressed to Spread

Imagine distributions, not just means.

Conceptual Distribution Shift: Step 1 vs Step 2 Score Spread by Resource Strategy (boxplot data)
Category | Min | Q1 | Median | Q3 | Max
Old Step 1 (standardized resources) | 210 | 225 | 235 | 245 | 260
New Step 2 (fragmented resources) | 205 | 225 | 240 | 255 | 275

The medians may creep up because resources improved overall, but the upper and lower tails spread. That is what you are living in now.

The right tail (top performers) pushes higher because those students exploit high‑quality resources earlier and for longer. The left tail drifts lower because a bigger fraction under‑optimize Step 1, then cannot fully catch up on Step 2.
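
The spread in those conceptual boxplot figures is easy to put into numbers. This short Python sketch computes the interquartile range and the full range from the five-number summaries given above; the values are the article's illustrative figures, not measured data, and `spread` is a hypothetical helper.

```python
# Five-number summaries (min, Q1, median, Q3, max) from the conceptual boxplot.
summaries = {
    "Old Step 1 (standardized resources)": (210, 225, 235, 245, 260),
    "New Step 2 (fragmented resources)": (205, 225, 240, 255, 275),
}

def spread(five_number):
    """Return the interquartile range and full range of a five-number summary."""
    low, q1, _median, q3, high = five_number
    return {"iqr": q3 - q1, "range": high - low}

for label, summary in summaries.items():
    print(label, spread(summary))
```

The middle 50% widens from 20 points to 30, and the full range from 50 points to 70, while the median moves only 5 points.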


Concrete Resource Patterns That Matter Now

If you want to play this game strategically, you need to understand which choices are actually driving variability. It is not “how many resources you own.” It is which ones you commit to and when.

From real schedules and outcomes I have seen, three patterns stand out.

1. Early UWorld vs Late UWorld

Students who start UWorld (or an equivalent high‑quality QBank) systematically during organ systems—not just “for the exam next week”—build pattern recognition and test‑taking skills gradually.

Students who delay UWorld until:

  • Just before Step 1 to secure a pass
  • Or only for Step 2 CK as a panic move

…lose both pattern recognition and the chance to integrate concepts across systems.

2. Coherent Primary Resource vs Resource Grazing

There is a massive difference between:

  • Committing to a complete spine (for example, Boards & Beyond + a structured Anki deck + targeted review book) and living inside that system for two years.

and

  • Picking random YouTube videos, switching decks every 3 months, and only touching “high‑yield” summaries shared by older students.

The former creates a stable cognitive framework. The latter produces islands of knowledge, which might be enough to pass Step 1 but buckle under the breadth and clinical framing of Step 2.

3. Using NBMEs and Self-Assessments Intelligently

One of the hidden equalizers in the old Step 1 world was the NBME practice test culture. People took them seriously, tracked scores, and adapted.

With pass/fail:

  • Some students still use them rigorously and benchmark early.
  • Others treat them as optional or only take 1–2 right before the exam, because “I just need to pass.”

That divergence means weaker feedback loops for a portion of students, so their resource misallocation persists longer and shows up as more variance later.


Process View: How Resources Cascade From M1 to Step 2

Here is how the process actually works when mapped out over time.

Flowchart: Resource Strategy from M1 to Step 2

  • Start: M1 Foundations
  • Decision: Use structured core resource? → Strong basic science base, or Fragmented knowledge
  • Decision: M2 Question Bank Early? → Integrated learning + pattern recognition, or Late exposure to QBank logic
  • Step 1 outcome: Efficient Step 1 pass, strong concepts, or Step 1 pass, shallow concepts
  • Step 2 Prep: High Step 2 yield on the strong-base + integrated-learning path; variable Step 2 outcomes on the fragmented-knowledge or late-exposure path

Two key inflection points:

  1. Whether you adopt a coherent core resource strategy in M1–M2.
  2. Whether you adopt questions as a learning tool early rather than treating them as a final check.

The pass/fail change removed some fear around Step 1, so more students decide they can postpone real optimization until Step 2. The data from residency selection says that is a bad bet.


What the Data Implies You Should Actually Do

Let me be blunt. If you are semi‑serious about a competitive match, your behavior should look closer to the old Step 1 era than the new, relaxed mythology.

Specific, data‑aligned implications:

  • Treat Step 1 as the foundation builder, not a hoop.
    You do not need a 260. You do need to prepare as if you care about being in the upper (right‑hand) half of the Step 2 distribution.

  • Standardize your resources more, not less.
    The efficient set has not changed much:

    • One serious QBank early and consistently (UWorld still has the strongest track record).
    • One coherent primary content system (video series + structured notes or deck).
    • Targeted high‑yield supplements, not a dozen overlapping products.
  • Use Step 1 to debug your process.
    Because Step 1 is pass/fail, you actually gain a safe space to:

    • Learn how you respond to timed NBMEs.
    • Refine your Anki / note system.
    • Discover your blind spots in biochem, immunology, etc.
    Students who exploit this use Step 2 as execution, not experimentation.
  • Do not cheap out on earlier resources and expect Step 2 to bail you out.
    The data from real cohorts is ruthless here: late buyers of “all the right resources” before Step 2 underperform compared with those who used those same resources early, even when total hours look similar.


The Bottom Line: Variability Did Not Vanish, It Moved

Pass/fail Step 1 did not flatten the playing field. It shifted where and how inequalities in resource quality, timing, and strategy show up.

Three key points to leave with:

  1. The removal of a numerical Step 1 score reduced standardized pressure but increased heterogeneity in resource use. That heterogeneity has expanded variability in Step 2 outcomes, which now carry more weight.

  2. Early, structured use of high‑quality resources (especially a major QBank and a coherent content spine) still compresses risk and improves your expected position in the Step 2 distribution. Fragmented, delayed strategies widen your personal variance in a bad way.

  3. If you want to exploit the pass/fail era rather than be a casualty of it, you should act counter‑cyclically: keep the old Step 1 rigor, but enjoy the psychological buffer of pass/fail while you build a foundation that makes your Step 2 score predictable—and high.
