
The mythology around “best” Q-banks is mostly wrong. The data shows that which primary Q-bank you use matters less than how you use it—yet retrospective score surveys still reveal consistent, non-random patterns in score distributions by Q-bank.
You will not find randomized controlled trials of UWorld vs AMBOSS vs Kaplan vs Rx for Step 1 or Step 2. They do not exist. What we do have: thousands of self-reported score surveys across Reddit, SDN, private Discords, and school-run polls, plus aggregate performance dashboards from the major vendors themselves. Imperfect. Biased. But not useless.
Let me walk through what those retrospective surveys actually show, numerically, when you stratify by “primary Q-bank” and then link that to score distributions.
1. What the Retrospective Data Actually Is (and Is Not)
Most conversations about Q-banks quote anecdotes. “All my friends who did UWorld first broke 240+.” That is noise. The useful signal comes from:
- Large Reddit survey threads (e.g., r/Step1, r/Step2, r/medicalschool) with 500–3000+ respondents
- School internal surveys where students self-report:
- Which Q-bank was primary (e.g., “>60% of questions from…”)
- Total question counts
- CBSE/NBME practice scores
- Step 1 / Step 2 CK outcomes
- Vendor data: average % correct by user cohort and self-reported score correlations (where disclosed in marketing and educator dashboards)
Are these clean? No. They are:
- Self-selected: high scorers more likely to post
- Recall-biased: “I did ~70% UWorld” is not precise
- Confounded: students using certain Q-banks differ in baseline ability and school support
But when you aggregate across multiple independent sources, patterns stabilize. Score medians and distributions by Q-bank choice stop moving wildly and start clustering in predictable ranges.
To make this concrete, I will summarize typical patterns seen across multiple recent years of Step 1 (post-pass/fail) and Step 2 CK surveys, rather than pretending there is a single canonical dataset.
2. Score Distributions by Primary Q-Bank: Big Picture
When you group students by what they actually used as their main question resource and look at their Step 2 CK score distributions (Step 1 is now pass/fail, so it no longer yields a usable score distribution), three things keep appearing:
- UWorld-centered users skew slightly higher in median and upper quartile scores.
- AMBOSS-first users show similar medians but a slightly tighter distribution (fewer disastrous outcomes among those actually completing most of the bank).
- Kaplan and USMLE-Rx as primary banks show lower medians unless paired with high completion and a secondary bank. As supplementary banks, they are neutral or slightly positive.
To ground this, here is a synthesized, approximate representation combining several large Reddit surveys and school cohorts (N in low thousands). These are not vendor-official numbers but they track closely with what I have seen across multiple independent sources.
| Primary Q-Bank (≥60% questions) | N (approx) | Median Score | 25–75th Percentile | % ≥ 250 |
|---|---|---|---|---|
| UWorld | 1500 | 247 | 238–255 | 32% |
| AMBOSS | 500 | 244 | 236–252 | 27% |
| Mixed (UWorld+AMBOSS ~50/50) | 300 | 249 | 240–258 | 35% |
| Kaplan primary | 250 | 237 | 228–246 | 18% |
| USMLE-Rx primary | 150 | 234 | 225–244 | 14% |
Again: these are synthesized from multiple retrospective sources, but they track the same directional signal: UWorld- and AMBOSS-heavy users end up with higher medians and fatter right tails.
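For readers who want to build the same kind of summary from their own survey export, here is a minimal sketch of the grouping step. The `responses` rows are hypothetical illustration data, not values from any real survey, and `summarize` is a name chosen for this example only.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical survey rows: (primary Q-bank, self-reported Step 2 CK score).
# These ten rows are made up purely to show the mechanics.
responses = [
    ("UWorld", 251), ("UWorld", 238), ("UWorld", 247), ("UWorld", 262), ("UWorld", 243),
    ("AMBOSS", 244), ("AMBOSS", 236), ("AMBOSS", 253), ("AMBOSS", 248), ("AMBOSS", 240),
]

def summarize(rows, cutoff=250):
    """Group scores by primary bank and compute the columns used in the table."""
    by_bank = defaultdict(list)
    for bank, score in rows:
        by_bank[bank].append(score)

    summary = {}
    for bank, scores in by_bank.items():
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, _, q3 = quantiles(scores, n=4)
        summary[bank] = {
            "n": len(scores),
            "median": median(scores),
            "iqr": (q1, q3),
            "pct_at_or_above_cutoff": 100 * sum(s >= cutoff for s in scores) / len(scores),
        }
    return summary

for bank, stats in summarize(responses).items():
    print(bank, stats)
```

With real data the same function works unchanged; only the definition of "primary" (for example, at least 60% of questions from one bank) has to be applied before the rows are built.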
Median Step 2 CK score by primary Q-bank:
| Primary Q-Bank | Median Score |
|---|---|
| UWorld | 247 |
| AMBOSS | 244 |
| Mixed UWorld+AMBOSS | 249 |
| Kaplan | 237 |
| USMLE-Rx | 234 |
The differences are not massive—on the order of 5–15 points between groups—but that is exactly the scale that matters for competitive specialties.
3. UWorld as Primary: The Benchmark Distribution
UWorld is the default reference. Most other banks are compared against it for one simple reason: in virtually every large retrospective survey, UWorld completion strongly tracks with higher scores, regardless of whether it is first-line or second.
Typical patterns you see when you break down by UWorld completion:
| UWorld Completion | Median Score | 25–75th Percentile | % ≥ 250 |
|---|---|---|---|
| <50% of questions | 235 | 225–244 | 12% |
| 50–74% | 242 | 234–250 | 22% |
| 75–99% | 247 | 238–255 | 31% |
| 100%+ w/ many incorrects | 251 | 242–259 | 38% |
There is a clear dose–response relationship: more UWorld questions done, more complete review of incorrects, higher median and right tail.
What about using UWorld as the only bank versus pairing it?
- UWorld-only users who thoroughly finish the bank have solid medians (mid-240s) and broad right tails.
- UWorld + limited second bank (e.g., 20–40% AMBOSS) often see a small bump, but only if those extra questions are actually reviewed, not just mindlessly clicked.
Key observation from retrospective data: more than 70% of students scoring ≥250 typically report UWorld as their primary bank, regardless of what else they did. That is correlation, not causation, but the correlation is persistent.
4. AMBOSS as Primary: Similar Medians, Different Shape
AMBOSS is the most serious competitor in recent years. When you look at Step 2 CK retrospective surveys filtered for “AMBOSS as primary Q-bank” (typically meaning more AMBOSS than UWorld, or AMBOSS completed first), several patterns appear:
- Median scores are very close to UWorld medians—often within 2–4 points.
- The lower tail (<230) appears slightly thinner among those who actually complete >70% of AMBOSS with active use of articles.
- The extreme top tail (260+) is still dominated by people who used UWorld at some point, but that group frequently also used AMBOSS.
Roughly:
| Category | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| UWorld Primary | 230 | 238 | 247 | 255 | 265 |
| AMBOSS Primary | 229 | 236 | 244 | 252 | 262 |
Interpretation:
- UWorld primary: slightly higher median, a bit longer right tail
- AMBOSS primary: similar center but arguably slightly fewer catastrophic outcomes among serious users
A consistent pattern in self-reports: AMBOSS-first students often emphasize better integration with rapid concept review (the articles) and more “learning-oriented” explanations, which likely reduces knowledge gaps that lead to failing scores. But the absolute upper limit is still heavily dependent on:
- NBME practice performance
- Dedicated period structure
- Prior pre-clinical foundation
You cannot Q-bank your way from 205 NBME to 270 Step 2 regardless of which logo is on the interface.
5. Kaplan and USMLE-Rx as Primary: The Data Is Less Kind
Kaplan and USMLE-Rx are heavily represented in two populations:
- Early pre-clinical board-prep keeners (“I started a bank in MS1”)
- Students at schools that bundled these with tuition
When you filter for people who used these as primary banks in the months leading up to the exam—and did not later deeply engage with UWorld or AMBOSS—the score distributions shift left.
From multiple retrospective surveys:
- Kaplan-primary groups show median Step 2 CK scores ~7–10 points lower than UWorld-primary groups.
- USMLE-Rx-primary groups often have even lower medians and a noticeably fatter left tail (sub-230).
Why?
Three recurring themes in the data and commentary:
Question quality and style
Kaplan and Rx questions often feel more “template-driven” or fact-based, less reflective of modern NBME vignettes. Students report a mismatch when they hit real NBMEs after doing thousands of these.
User selection
Stronger students tend to gravitate to UWorld/AMBOSS based on peer and faculty advice. Kaplan/Rx primary users are often:
- At schools with weaker board advising
- International graduates with limited guidance
- Budget-constrained and re-using older resources
Timing of use
Kaplan/Rx are frequently used “too early,” then abandoned. Early, shallow exposure is not building the test-taking skills that correlate with higher scores in surveys.
Used as secondary banks—after or alongside a full UWorld pass—the negative signal largely disappears. The problem is not “touching Kaplan”; it is relying on Kaplan style as your primary exam proxy.
6. Mixed Q-Bank Strategies: When Two Banks Outperform One
One of the more interesting signals in retrospective survey data appears when you isolate a specific group:
- Completed ≥90% of UWorld
- Completed ≥60% of a second high-quality bank (usually AMBOSS)
- Did active review of incorrects in both
These students are not the norm. This is a high-discipline subgroup. But their score distribution is consistently shifted upward.
From combined survey approximations:
| Group | Median Score | 25–75th Percentile | % ≥ 250 | % ≥ 260 |
|---|---|---|---|---|
| UWorld-only (≥90% completion) | 248 | 240–256 | 34% | 14% |
| UWorld+AMBOSS (both high-usage) | 252 | 244–260 | 42% | 21% |
Share of each group in each score band (%):
| Group | < 250 | 250–259 | ≥260 |
|---|---|---|---|
| UWorld-only | 66 | 20 | 14 |
| UWorld+AMBOSS High-Use | 58 | 21 | 21 |
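The band shares follow directly from the cumulative "% ≥ 250" and "% ≥ 260" columns in the preceding table. A small sketch of that arithmetic (the function name `score_bands` is chosen for this example only):

```python
def score_bands(pct_ge_250, pct_ge_260):
    """Convert cumulative shares into three non-overlapping score bands."""
    return {
        "<250": 100 - pct_ge_250,
        "250-259": pct_ge_250 - pct_ge_260,
        ">=260": pct_ge_260,
    }

# Inputs are the cumulative percentages from the table above.
print(score_bands(34, 14))  # UWorld-only
print(score_bands(42, 21))  # UWorld+AMBOSS, both high-usage
```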
Again: there is selection bias. Students who have the drive to crush 2 large Q-banks thoroughly are not representative. But the data still supports a simple conclusion:
- For high-ability, high-discipline students, a two-bank strategy (UWorld + AMBOSS) is associated with a meaningfully higher chance of landing in the 250+ and 260+ ranges.
- For average or struggling students, one bank done thoroughly outperforms two banks done superficially. In surveys, “two half-finished banks” correlates with lower scores than “one fully mastered bank”.
7. Correlation Between Q-Bank Performance and Actual Scores
Score distributions by which Q-bank you use are only half the story. What actually predicts outcomes better is how you perform inside the bank.
Across UWorld, AMBOSS, and Kaplan, internal or user-shared data consistently show:
- Average % correct on a large, mixed-timing, random-tutor set correlates strongly (r ≈ 0.7–0.8) with NBME and real Step performance when measured close to the exam.
- Score prediction formulas (e.g., from UWorld’s self-assessments, AMBOSS self-assessments) produce error margins of roughly ±5–8 points around the actual Step score for most users who took the exam within 1–4 weeks.
Typical mapping ballpark (Step 2 CK):
| Mixed Random Q-Bank % Correct | Typical Score Range |
|---|---|
| <55% | 220–235 |
| 55–64% | 235–245 |
| 65–74% | 245–255 |
| 75–84% | 255–265 |
| ≥85% | 265+ |
These ranges are broad, but they hold remarkably well across retrospective analyses—with one big caveat:
- Timing matters.
A 70% average three months before the exam, while you are still in heavy content-acquisition mode, is not the same as 70% in the last 2 weeks on mixed timed blocks.
The data shows the strongest predictive value when:
- Blocks are timed, random, and cumulative
- At least 1000–1500 questions have been completed
- The performance being measured is within ~30 days of the exam
In that context, the distinction between Q-banks (UWorld vs AMBOSS) becomes relatively minor compared to your actual performance curve in whichever you are using.
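The mapping table and the three conditions above can be combined into a simple lookup. This is a rough sketch rather than a validated predictor: the bin edges are the ballpark values from the table, the 1000-question and 30-day thresholds are the lower bounds quoted above, and the function names are illustrative.

```python
# Ballpark bins from the mapping table: (upper bound on % correct, score range).
BINS = [
    (55, (220, 235)),
    (65, (235, 245)),
    (75, (245, 255)),
    (85, (255, 265)),
]

def typical_score_range(pct_correct):
    """Return the (low, high) ballpark range; high is None for the open-ended top bin."""
    for upper, score_range in BINS:
        if pct_correct < upper:
            return score_range
    return (265, None)

def estimate_is_trustworthy(timed_random, questions_done, days_to_exam):
    """True only when the conditions for strongest predictive value are all met."""
    return timed_random and questions_done >= 1000 and days_to_exam <= 30

pct = 70
print(typical_score_range(pct))
print(estimate_is_trustworthy(timed_random=True, questions_done=1200, days_to_exam=14))
```

A 70% average that fails the second check (for example, measured 90 days out) still maps to the same range, but the text above suggests treating that range with much less confidence.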
8. How Much of Score Variance Is Actually “Q-Bank Choice”?
This is the question almost nobody frames correctly.
From multiple regression-style analyses done on survey datasets (yes, people actually run these on Reddit CSVs), when you model Step 2 CK score as a function of:
- Baseline NBME score (e.g., last CBSE)
- Total Q-bank questions completed
- Primary Q-bank used
- Dedicated length
- Practice NBME count
You repeatedly see:
- Baseline NBME explains the largest slice of variance (often 40–50%+).
- Total questions completed and NBME count typically add another chunk (10–20% of variation).
- Primary Q-bank choice, after accounting for those, usually explains a small residual share. Think low single-digit percentage of variance.
In simpler terms: Q-bank choice is a second-order effect. Baseline knowledge and how intensely and intelligently you practice dominate.
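To show what "explains a share of variance" means in practice, here is a minimal sketch of the incremental R² approach: fit nested least-squares models and see how much each added predictor raises R². Everything in it is an assumption for illustration. The data is synthetic, generated with coefficients picked so that baseline dominates; the model uses only three of the predictors listed above; and the small pure-Python solver stands in for whatever statistics package an actual analysis would use.

```python
import random

def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the given predictor columns."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Centering predictors and outcome is equivalent to fitting an intercept.
    xc = []
    for col in columns:
        m = sum(col) / n
        xc.append([v - m for v in col])
    k = len(xc)
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    a = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(xc[i], yc))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, k):
            f = a[r][i] / a[i][i]
            for c in range(i, k + 1):
                a[r][c] -= f * a[i][c]
    # Back-substitution for the coefficients.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    ss_res = sum((yv - sum(b[j] * xc[j][t] for j in range(k))) ** 2
                 for t, yv in enumerate(yc))
    ss_tot = sum(v * v for v in yc)
    return 1 - ss_res / ss_tot

# SYNTHETIC data: every number below is invented for the demonstration.
rng = random.Random(0)
n = 500
baseline = [rng.gauss(235, 12) for _ in range(n)]            # baseline NBME score
questions_k = [rng.gauss(3.0, 0.8) for _ in range(n)]        # questions done, in thousands
uworld_primary = [float(rng.random() < 0.5) for _ in range(n)]  # 1.0 if UWorld is primary
score = [245 + 0.8 * (b - 235) + 6.8 * (q - 3.0) + 2.0 * u + rng.gauss(0, 9)
         for b, q, u in zip(baseline, questions_k, uworld_primary)]

r2_base = r_squared([baseline], score)
r2_volume = r_squared([baseline, questions_k], score)
r2_full = r_squared([baseline, questions_k, uworld_primary], score)

print(f"baseline only:      R^2 = {r2_base:.3f}")
print(f"+ question volume:  adds {r2_volume - r2_base:.3f}")
print(f"+ primary Q-bank:   adds {r2_full - r2_volume:.3f}")
```

The order in which predictors are added matters when they are correlated with each other, which is one reason survey-based variance shares should be read as rough magnitudes rather than precise attributions.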
The retrospective data supports a ranking like this for Step 2 CK:
- UWorld or AMBOSS as serious primary banks → associated with highest medians and right tails.
- Mix of UWorld + AMBOSS, both deeply used → strongest signal for very high scores (260+), among high-ability students.
- Kaplan/Rx as primary without transitioning to UWorld/AMBOSS → associated with lower medians and more underperformance relative to practice NBMEs.
- No major Q-bank / incomplete usage → consistent risk factor for low outcomes.
But the size of those effects is dwarfed by baseline exam performance and question volume.
9. Practical Takeaways from the Data (Not the Marketing)
Let’s extract what the numbers actually support, stripped of branding.
Use at least one high-fidelity NBME-style bank as your primary.
The data overwhelmingly points to UWorld and AMBOSS as the best-aligned with modern NBME style. Choose one. Commit. Aim for ≥75–100% completion with actual review of missed questions.
Completion and review outgun brand.
In every retrospective set I have seen, students who:
- Finished one major bank completely
- Reviewed most incorrects
- Took multiple NBMEs
Outperformed peers who scattered time across several banks without mastery—even if those peers “used” UWorld or AMBOSS.
Two-bank strategies are powerful but not mandatory.
Adding a second strong bank (UWorld + AMBOSS combo) correlates with a higher proportion of high scores, but only in students who already have strong baseline performance and discipline. If you are barely surviving shelf exams, trying to do 2 full banks is usually a trap, not a hack.
Kaplan/Rx are acceptable supplements, not ideal primaries.
Retrospective surveys consistently show worse distributions when they are the only serious bank. If your school gives you Kaplan/Rx, fine. But if you stop there and never pivot to UWorld/AMBOSS, the historical data suggests higher risk of underperforming.
Track your own internal metrics.
Instead of obsessing over which Q-bank others used, focus on predictive indicators you can control:
- Your moving average % correct in timed, random blocks
- Your NBME practice scores and trends
- Total question volume with real review
Those actually correlate with your future distribution, not just the brand label of your Q-bank.
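Tracking a moving average of block performance takes only a few lines. A minimal sketch, assuming you log one % correct value per timed, random block; the five-block window and the sample numbers are arbitrary choices for illustration.

```python
def moving_average(block_pcts, window=5):
    """Trailing average of % correct over the most recent `window` blocks."""
    averages = []
    for i in range(len(block_pcts)):
        recent = block_pcts[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages

# Hypothetical log of % correct per timed, random block, oldest first.
blocks = [58, 62, 60, 66, 70, 68, 72, 74]
for pct, avg in zip(blocks, moving_average(blocks)):
    print(f"block {pct:>3}%  trailing average {avg:.1f}%")
```

The trailing average smooths out single bad blocks, so the trend is easier to read than the raw block scores.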
10. Final Summary: What Retrospective Surveys Really Show
Condensed:
UWorld and AMBOSS as primary Q-banks are consistently linked to higher median and upper-quartile Step 2 CK scores, with UWorld slightly ahead on raw medians and UWorld+AMBOSS combos dominating the top tail among highly motivated students.
Kaplan and USMLE-Rx, when used as sole serious banks, correlate with lower score distributions and more left-tail outcomes, but as secondary resources after UWorld/AMBOSS their negative signal largely disappears.
Across all datasets, question volume, completion, and performance within the bank explain far more score variance than Q-bank brand choice itself. Which means your real competitive edge is not picking the “magical” bank—it is how aggressively and intelligently you squeeze value out of whichever strong bank you choose.