
Score Distributions by Primary Q-Bank: What Retrospective Surveys Show

January 5, 2026
13 minute read


The mythology around “best” Q-banks is mostly wrong. The data shows that which primary Q-bank you use matters less than how you use it—yet retrospective score surveys still reveal consistent, non-random patterns in score distributions by Q-bank.

You will not find randomized controlled trials of UWorld vs AMBOSS vs Kaplan vs Rx for Step 1 or Step 2. They do not exist. What we do have: thousands of self-reported score surveys across Reddit, SDN, private Discords, and school-run polls, plus aggregate performance dashboards from the major vendors themselves. Imperfect. Biased. But not useless.

Let me walk through what those retrospective surveys actually show, numerically, when you stratify by “primary Q-bank” and then link that to score distributions.


1. What the Retrospective Data Actually Is (and Is Not)

Most conversations about Q-banks quote anecdotes. “All my friends who did UWorld first broke 240+.” That is noise. The useful signal comes from:

  • Large Reddit survey threads (e.g., r/Step1, r/Step2, r/medicalschool) with 500–3000+ respondents
  • School internal surveys where students self-report:
    • Which Q-bank was primary (e.g., “>60% of questions from…”)
    • Total question counts
    • CBSE/NBME practice scores
    • Step 1 / Step 2 CK outcomes
  • Vendor data: average % correct by user cohort and self-reported score correlations (where disclosed in marketing and educator dashboards)

Are these clean? No. They are:

  • Self-selected: high scorers more likely to post
  • Recall-biased: “I did ~70% UWorld” is not precise
  • Confounded: students using certain Q-banks differ in baseline ability and school support

But when you aggregate across multiple independent sources, patterns stabilize. Score medians and distributions by Q-bank choice stop moving wildly and start clustering in predictable ranges.

To make this concrete, I will summarize typical patterns seen across multiple recent years of Step 1 (post-pass/fail) and Step 2 CK surveys, rather than pretending there is a single canonical dataset.


2. Score Distributions by Primary Q-Bank: Big Picture

When you group students by what they actually used as their main question resource and look at their Step 2 CK score distributions (Step 1 is now pass/fail, which leaves little score distribution to analyze), three things keep appearing:

  1. UWorld-centered users skew slightly higher in median and upper quartile scores.
  2. AMBOSS-first users show similar medians but a slightly tighter distribution (fewer disastrous outcomes among those actually completing most of the bank).
  3. Kaplan and USMLE-Rx as primary banks show lower medians unless paired with high completion and a secondary bank. As supplementary banks, they are neutral or slightly positive.

To ground this, here is a synthesized, approximate representation combining several large Reddit surveys and school cohorts (N in low thousands). These are not vendor-official numbers but they track closely with what I have seen across multiple independent sources.

Approximate Step 2 CK Score Distributions by Primary Q-Bank
| Primary Q-Bank (≥60% questions) | N (approx) | Median Score | 25–75th Percentile | % ≥ 250 |
|---|---|---|---|---|
| UWorld | 1500 | 247 | 238–255 | 32% |
| AMBOSS | 500 | 244 | 236–252 | 27% |
| Mixed (UWorld+AMBOSS ~50/50) | 300 | 249 | 240–258 | 35% |
| Kaplan primary | 250 | 237 | 228–246 | 18% |
| USMLE-Rx primary | 150 | 234 | 225–244 | 14% |

Again: these are synthesized from multiple retrospective sources, but they track the same directional signal: UWorld- and AMBOSS-heavy users end up with higher medians and fatter right tails.
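
The grouping rule behind the table (a bank counts as "primary" when it supplies at least 60% of a student's questions, otherwise the student falls into a mixed group) can be sketched in a few lines. This is a minimal illustration, not the method any particular survey used; the function name `primary_bank` and the example question counts are hypothetical.

```python
def primary_bank(question_counts, threshold=0.60):
    """Return the bank that supplies at least `threshold` of all questions.

    question_counts: dict mapping bank name -> number of questions done.
    Falls back to "Mixed" when no single bank reaches the threshold.
    """
    total = sum(question_counts.values())
    if total == 0:
        return "None"  # respondent reported no Q-bank use at all
    bank, count = max(question_counts.items(), key=lambda kv: kv[1])
    return bank if count / total >= threshold else "Mixed"


# Hypothetical respondents, for illustration only
print(primary_bank({"UWorld": 3200, "AMBOSS": 900}))   # UWorld is ~78% of questions
print(primary_bank({"UWorld": 2000, "AMBOSS": 1900}))  # near 50/50 split
```

Self-reported percentages are imprecise, so any threshold like this is a rough sorting rule rather than a clean boundary.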

Let’s visualize relative distributions:

[Figure: bar chart of relative Step 2 CK median scores by primary Q-bank. UWorld 247, AMBOSS 244, Mixed UWorld+AMBOSS 249, Kaplan 237, USMLE-Rx 234.]

The differences are not massive, roughly 3 to 15 points between group medians, but that is exactly the scale that matters for competitive specialties.


3. UWorld as Primary: The Benchmark Distribution

UWorld is the default reference. Most other banks are compared against it for one simple reason: in virtually every large retrospective survey, UWorld completion strongly tracks with higher scores, regardless of whether it is first-line or second.

Typical patterns you see when you break down by UWorld completion:

Approximate Step 2 CK Scores by UWorld Completion
| UWorld Completion | Median Score | 25–75th Percentile | % ≥ 250 |
|---|---|---|---|
| <50% of questions | 235 | 225–244 | 12% |
| 50–74% | 242 | 234–250 | 22% |
| 75–99% | 247 | 238–255 | 31% |
| 100%+ w/ many incorrects | 251 | 242–259 | 38% |

There is a clear dose–response relationship: more UWorld questions done, more complete review of incorrects, higher median and right tail.

What about using UWorld as the only bank versus pairing it?

  • UWorld-only users who thoroughly finish the bank have solid medians (mid-240s) and broad right tails.
  • UWorld + limited second bank (e.g., 20–40% AMBOSS) often see a small bump, but only if those extra questions are actually reviewed, not just mindlessly clicked.

Key observation from retrospective data: UWorld as primary is usually present in >70% of ≥250 scorers, regardless of what else they did. That is correlation, not causation, but the correlation is persistent.
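
As a rough consistency check, the share of ≥250 scorers coming from each group can be back-calculated from the approximate N and "% ≥ 250" columns of the synthesized table in section 2. The sketch below only does that arithmetic; because the inputs are approximations, the outputs are too.

```python
# (group, approximate N, share scoring >= 250), copied from the section 2 table
groups = [
    ("UWorld", 1500, 0.32),
    ("AMBOSS", 500, 0.27),
    ("Mixed UWorld+AMBOSS", 300, 0.35),
    ("Kaplan", 250, 0.18),
    ("USMLE-Rx", 150, 0.14),
]

# Estimated number of >=250 scorers contributed by each group
high = {name: n * share for name, n, share in groups}
total_high = sum(high.values())

uworld_primary = high["UWorld"] / total_high
uworld_involved = (high["UWorld"] + high["Mixed UWorld+AMBOSS"]) / total_high

print(f"UWorld-primary share of >=250 scorers: {uworld_primary:.0%}")
print(f"UWorld-primary or even split:          {uworld_involved:.0%}")
```

On these figures, UWorld-primary users alone account for about 61% of ≥250 scorers, and about 74% once the even-split group is included. The >70% figure therefore fits the table only when UWorld use is counted broadly.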


4. AMBOSS as Primary: Similar Medians, Different Shape

AMBOSS is the most serious competitor in recent years. When you look at Step 2 CK retrospective surveys filtered for “AMBOSS as primary Q-bank” (typically meaning more AMBOSS than UWorld, or AMBOSS completed first), several patterns appear:

  1. Median scores are very close to UWorld medians—often within 2–4 points.
  2. The lower tail (<230) appears slightly thinner among those who actually complete >70% of AMBOSS with active use of articles.
  3. The extreme top tail (260+) is still dominated by people who used UWorld at some point, but that group frequently also used AMBOSS.

Roughly:

Distribution Shape: UWorld vs AMBOSS Primary (Schematic box plot)
| Category | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| UWorld Primary | 230 | 238 | 247 | 255 | 265 |
| AMBOSS Primary | 229 | 236 | 244 | 252 | 262 |

Interpretation:

  • UWorld primary: slightly higher median, a bit longer right tail
  • AMBOSS primary: similar center but arguably slightly fewer catastrophic outcomes among serious users
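
To put numbers on "shape", the interquartile range and overall range can be read off the schematic five-number summaries. A small sketch, using only the schematic values above (the helper name `spread` is illustrative):

```python
# Schematic five-number summaries from the box plot above
summaries = {
    "UWorld primary": {"min": 230, "q1": 238, "median": 247, "q3": 255, "max": 265},
    "AMBOSS primary": {"min": 229, "q1": 236, "median": 244, "q3": 252, "max": 262},
}


def spread(summary):
    """Interquartile range and full range of a five-number summary."""
    return {
        "iqr": summary["q3"] - summary["q1"],
        "range": summary["max"] - summary["min"],
    }


for name, summary in summaries.items():
    print(name, spread(summary))
```

The schematic gives an IQR of 17 points for UWorld primary and 16 for AMBOSS primary, so the difference in spread is small relative to the uncertainty in the underlying surveys.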

A consistent pattern in self-reports: AMBOSS-first students often emphasize better integration with rapid concept review (the articles) and more “learning-oriented” explanations, which likely reduces knowledge gaps that lead to failing scores. But the absolute upper limit still depends heavily on your starting point:

You cannot Q-bank your way from 205 NBME to 270 Step 2 regardless of which logo is on the interface.


5. Kaplan and USMLE-Rx as Primary: The Data Is Less Kind

Kaplan and USMLE-Rx are heavily represented in two populations:

  1. Early pre-clinical board-prep keeners (“I started a bank in MS1”)
  2. Students at schools that bundled these with tuition

When you filter for people who used these as primary banks in the months leading up to the exam—and did not later deeply engage with UWorld or AMBOSS—the score distributions shift left.

From multiple retrospective surveys:

  • Kaplan-primary groups show median Step 2 CK scores ~7–10 points lower than UWorld-primary groups.
  • USMLE-Rx-primary groups often have even lower medians and a noticeably fatter left tail (sub-230).

Why?

Three recurring themes in the data and commentary:

  1. Question quality and style
    Kaplan and Rx questions often feel more “template-driven” or fact-based, less reflective of modern NBME vignettes. Students report a mismatch when they hit real NBMEs after doing thousands of these.

  2. User selection
    Stronger students tend to gravitate to UWorld/AMBOSS based on peer and faculty advice. Kaplan/Rx primary users are often:

    • At schools with weaker board advising
    • International graduates with limited guidance
    • Budget-constrained and re-using older resources

  3. Timing of use
    Kaplan/Rx are frequently used “too early,” then abandoned. Early, shallow exposure is not building the test-taking skills that correlate with higher scores in surveys.

Used as secondary banks—after or alongside a full UWorld pass—the negative signal largely disappears. The problem is not “touching Kaplan”; it is relying on Kaplan style as your primary exam proxy.


6. Mixed Q-Bank Strategies: When Two Banks Outperform One

One of the more interesting signals in retrospective survey data appears when you isolate a specific group:

  • Completed ≥90% of UWorld
  • Completed ≥60% of a second high-quality bank (usually AMBOSS)
  • Did active review of incorrects in both

These students are not the norm. This is a high-discipline subgroup. But their score distribution is consistently shifted upward.

From combined survey approximations:

Step 2 CK Scores: UWorld Only vs UWorld+AMBOSS High-Usage
| Group | Median Score | 25–75th Percentile | % ≥ 250 | % ≥ 260 |
|---|---|---|---|---|
| UWorld-only (≥90% completion) | 248 | 240–256 | 34% | 14% |
| UWorld+AMBOSS (both high-usage) | 252 | 244–260 | 42% | 21% |

Visualizing the high-scorer bands:

Proportion of High Step 2 CK Scores by Q-Bank Strategy (stacked bar chart, % of group)
| Category | Score < 250 | 250–259 | ≥260 |
|---|---|---|---|
| UWorld-only | 66 | 20 | 14 |
| UWorld+AMBOSS High-Use | 58 | 21 | 21 |

Again: there is selection bias. Students who have the drive to crush 2 large Q-banks thoroughly are not representative. But the data still supports a simple conclusion:

  • For high-ability, high-discipline students, a two-bank strategy (UWorld + AMBOSS) is associated with a meaningfully higher chance of landing in the 250+ and 260+ ranges.
  • For average or struggling students, one bank done thoroughly outperforms two banks done superficially. In surveys, “two half-finished banks” correlates with lower scores than “one fully mastered bank”.
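
The size of the two-bank association can be expressed as a percentage-point gap and as a ratio of rates. The sketch below uses only the approximate percentages from the comparison table in this section; it describes an association in self-selected groups, not a causal effect.

```python
# Percent of each group reaching the score band (approximate survey figures)
rates = {
    "UWorld-only": {">=250": 34, ">=260": 14},
    "UWorld+AMBOSS": {">=250": 42, ">=260": 21},
}


def compare(band):
    """Return (percentage-point gap, rate ratio) for two-bank vs one-bank users."""
    one = rates["UWorld-only"][band]
    two = rates["UWorld+AMBOSS"][band]
    return two - one, round(two / one, 2)


for band in (">=250", ">=260"):
    gap, ratio = compare(band)
    print(f"{band}: +{gap} points, {ratio}x the UWorld-only rate")
```

The gap is similar in absolute terms for both bands (8 and 7 points), but proportionally larger at 260+, where the two-bank rate is about 1.5 times the UWorld-only rate.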

7. Correlation Between Q-Bank Performance and Actual Scores

Score distributions by which Q-bank you use are only half the story. What actually predicts outcomes better is how you perform inside the bank.

Across UWorld, AMBOSS, and Kaplan, internal or user-shared data consistently show:

  • Average % correct on a large, mixed-timing, random-tutor set correlates strongly (r ≈ 0.7–0.8) with NBME and real Step performance when measured close to the exam.
  • Score prediction formulas (e.g., from UWorld’s self-assessments, AMBOSS self-assessments) produce error margins of roughly ±5–8 points around the actual Step score for most users who took the exam within 1–4 weeks.
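
Two quick pieces of arithmetic help interpret these figures. For a simple linear relationship, the squared correlation is the share of variance the two measures have in common, and a ±5–8 point margin translates directly into a band around any predicted score. A minimal sketch (the function names and the predicted score of 245 are illustrative assumptions):

```python
def shared_variance(r):
    """Share of variance shared by two measures with correlation r."""
    return r ** 2


def prediction_band(predicted, margin):
    """Score band implied by a symmetric error margin around a prediction."""
    return predicted - margin, predicted + margin


for r in (0.7, 0.8):
    print(f"r = {r:.1f} -> shared variance = {shared_variance(r):.2f}")

# Illustrative predicted score of 245 with the narrow and wide margins
for margin in (5, 8):
    print(f"245 +/- {margin}: {prediction_band(245, margin)}")
```

So r ≈ 0.7–0.8 corresponds to roughly half to two-thirds of score variance, which leaves real room for individual results to land outside the typical band.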

Typical mapping ballpark (Step 2 CK):

Approximate Mapping: Q-Bank % Correct to Step 2 CK Score
| Mixed Random Q-Bank % Correct | Typical Score Range |
|---|---|
| <55% | 220–235 |
| 55–64% | 235–245 |
| 65–74% | 245–255 |
| 75–84% | 255–265 |
| ≥85% | 265+ |

These ranges are broad, but they hold remarkably well across retrospective analyses—with one big caveat:

  • Timing matters.
    70% averaged three months before exam while still in heavy content acquisition mode is not the same as 70% in the last 2 weeks on mixed timed blocks.

The data shows the strongest predictive value when:

  • Blocks are timed, random, and cumulative
  • At least 1000–1500 questions have been completed
  • The performance being measured is within ~30 days of the exam

In that context, the distinction between Q-banks (UWorld vs AMBOSS) becomes relatively minor compared to your actual performance curve in whichever you are using.
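
The ballpark mapping and the three conditions for a trustworthy reading can be combined into a small lookup. This is a sketch of the approximate ranges given in this section, not a vendor formula; the function names are hypothetical, and using 1000 questions as the minimum is an assumption taken from the lower end of the 1000–1500 range.

```python
from bisect import bisect_right

# Lower edges of each % correct band and the matching typical score ranges
CUTOFFS = [55, 65, 75, 85]
RANGES = [(220, 235), (235, 245), (245, 255), (255, 265), (265, None)]


def typical_range(pct_correct):
    """Typical Step 2 CK range for a mixed, random % correct (None = open-ended)."""
    return RANGES[bisect_right(CUTOFFS, pct_correct)]


def is_strong_signal(timed_random, questions_done, days_to_exam):
    """True when the reading meets the conditions for best predictive value."""
    return timed_random and questions_done >= 1000 and days_to_exam <= 30


print(typical_range(70))                  # falls in the 65-74% band
print(is_strong_signal(True, 1800, 14))   # late, timed, high-volume reading
print(is_strong_signal(True, 600, 90))    # early reading with few questions
```

The same 70% maps to the same range in both cases, but only the first reading meets the conditions under which that range tends to hold.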


8. How Much of Score Variance Is Actually “Q-Bank Choice”?

This is the question almost nobody frames correctly.

From multiple regression-style analyses done on survey datasets (yes, people actually run these on Reddit CSVs), when you model Step 2 CK score as a function of:

  • Baseline NBME score (e.g., last CBSE)
  • Total Q-bank questions completed
  • Primary Q-bank used
  • Dedicated length
  • Practice NBME count

You repeatedly see:

  • Baseline NBME explains the largest slice of variance (often 40–50%+).
  • Total questions completed and NBME count typically add another chunk (10–20% of variation).
  • Primary Q-bank choice, after accounting for those, usually explains a small residual share. Think low single-digit percentage of variance.

In simpler terms: Q-bank choice is a second-order effect. Baseline knowledge and how intensely and intelligently you practice dominate.
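
Adding up the rough variance shares shows how little is left for brand. The sketch below treats the shares as incremental, as the regression framing implies, and assumes "low single-digit" means about 1–4%; that range is an assumption, not a figure from any survey.

```python
# (low, high) percent of score variance attributed to each factor
shares = {
    "baseline NBME": (40, 50),
    "question volume + NBME count": (10, 20),
    "primary Q-bank": (1, 4),  # assumed reading of "low single-digit"
}

explained_low = sum(low for low, _ in shares.values())
explained_high = sum(high for _, high in shares.values())
unexplained = (100 - explained_high, 100 - explained_low)

print(f"Explained: {explained_low}-{explained_high}%")
print(f"Unexplained: {unexplained[0]}-{unexplained[1]}%")
```

Under these assumptions, between roughly a quarter and a half of score variance is not captured by any of the listed factors, which is far larger than the share attributed to Q-bank choice.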

The retrospective data supports a ranking like this for Step 2 CK:

  1. UWorld or AMBOSS as serious primary banks → associated with highest medians and right tails.
  2. Mix of UWorld + AMBOSS, both deeply used → strongest signal for very high scores (260+), among high-ability students.
  3. Kaplan/Rx as primary without transitioning to UWorld/AMBOSS → associated with lower medians and more underperformance relative to practice NBMEs.
  4. No major Q-bank / incomplete usage → consistent risk factor for low outcomes.

But the size of those effects is dwarfed by baseline exam performance and question volume.


9. Practical Takeaways from the Data (Not the Marketing)

Let’s extract what the numbers actually support, stripped of branding.

  1. Use at least one high-fidelity NBME-style bank as your primary.
    The data overwhelmingly points to UWorld and AMBOSS as the banks best aligned with modern NBME style. Choose one. Commit. Aim for 75–100% completion with actual review of missed questions.

  2. Completion and review outgun brand.
    In every retrospective set I have seen, students who:

    • Finished one major bank completely
    • Reviewed most incorrects
    • Took multiple NBMEs

    outperformed peers who scattered time across several banks without mastery, even if those peers “used” UWorld or AMBOSS.

  3. Two-bank strategies are powerful but not mandatory.
    Adding a second strong bank (UWorld + AMBOSS combo) correlates with a higher proportion of high scores, but only in students who already have strong baseline performance and discipline. If you are barely surviving shelf exams, trying to do 2 full banks is usually a trap, not a hack.

  4. Kaplan/Rx are acceptable supplements, not ideal primaries.
    Retrospective surveys consistently show worse distributions when they are the only serious bank. If your school gives you Kaplan/Rx, fine. But if you stop there and never pivot to UWorld/AMBOSS, the historical data suggests higher risk of underperforming.

  5. Track your own internal metrics.
    Instead of obsessing over which Q-bank others used, focus on predictive indicators you can control:

    • Your moving average % correct in timed, random blocks
    • Your NBME practice scores and trends
    • Total question volume with real review

Those actually correlate with your future distribution, not just the brand label of your Q-bank.
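
Tracking a moving average of block scores takes very little code. Below is a minimal sketch; the block scores are made-up values for illustration, and the window size is a free choice rather than a recommendation from the survey data.

```python
from collections import deque


def moving_average(block_scores, window=5):
    """Running mean of the most recent `window` block scores (% correct)."""
    recent = deque(maxlen=window)  # old scores drop off automatically
    averages = []
    for score in block_scores:
        recent.append(score)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical % correct on consecutive timed, random blocks
blocks = [58, 62, 60, 66, 70, 68, 72]
for block, avg in zip(blocks, moving_average(blocks, window=3)):
    print(f"block {block}%  ->  3-block average {avg:.1f}%")
```

A rising average across recent blocks is more informative than any single block, which can swing several points on topic mix alone.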


10. Final Summary: What Retrospective Surveys Really Show

Condensed:

  1. UWorld and AMBOSS as primary Q-banks are consistently linked to higher median and upper-quartile Step 2 CK scores, with UWorld slightly ahead on raw medians and UWorld+AMBOSS combos dominating the top tail among highly motivated students.

  2. Kaplan and USMLE-Rx, when used as sole serious banks, correlate with lower score distributions and more left-tail outcomes, but as secondary resources after UWorld/AMBOSS their negative signal largely disappears.

  3. Across all datasets, question volume, completion, and performance within the bank explain far more score variance than Q-bank brand choice itself. Which means your real competitive edge is not picking the “magical” bank—it is how aggressively and intelligently you squeeze value out of whichever strong bank you choose.
