
Most students are using the wrong metric to judge USMLE question banks. The logo does not matter. The correlation does.
You care about one thing: “If I get X% on this Q‑bank, what USMLE score should I expect?” Everything else is noise.
Let’s answer that with data, not vibes.
1. The Only Question That Matters: Correlation
Strip it down to basics. You have:
- Performance on a Q‑bank (percent correct, cumulative or timed blocks).
- Actual USMLE score (Step 1 numerical in the old era, Step 2 CK score now).
The core analytic question is: How strongly does Q‑bank performance predict the real exam score?
Statistically, that means:
- Use Pearson correlation coefficient, r, between:
- Q‑bank % correct
- Real exam score (or a strong proxy like NBME/Free 120 score)
A rough guide to interpreting r:
- 0.1–0.3 = weak
- 0.3–0.5 = moderate
- 0.5–0.7 = strong
- 0.7–0.9 = very strong
But correlation alone is not enough. You need:
- Slope: how many USMLE points per 1% increase in Q‑bank?
- Intercept: the baseline offset of the prediction line, i.e., where predictions start before your percent is factored in.
- Calibration: does 65% in the Q‑bank actually map to what the line predicts, or is it systematically high/low?
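As a sketch of what that analysis looks like, here is a minimal least-squares fit. The (Q-bank %, score) pairs are entirely hypothetical, invented for illustration; real self-reported data is far noisier.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Pearson r plus least-squares slope/intercept for y = slope*x + intercept."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    r = cov / (pstdev(x) * pstdev(y))
    slope = cov / pstdev(x) ** 2
    intercept = my - slope * mx
    return r, slope, intercept

# Hypothetical (Q-bank % correct, real exam score) pairs -- illustration only
pairs = [(58, 221), (63, 230), (68, 238), (72, 244), (78, 254)]
qbank, score = zip(*pairs)
r, slope, intercept = fit_line(qbank, score)
# With data this clean, r lands near 1; the slope tells you roughly how many
# exam points each additional Q-bank percent is worth.
```

The slope and intercept together are the calibration: they tell you what score the line actually predicts for, say, a 65% performer, which you can then compare against observed outcomes.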
Let me be explicit: self‑reported Reddit scatter plots are noisy but directionally useful. When you see hundreds of datapoints, patterns stabilize.
Based on compiled user data, known NBME correlations, and personal data from students I have worked with, here is the synthesized picture.
2. Q‑Bank Landscape: What the Data Consistently Shows
There are four major players students endlessly compare for correlation:
- UWorld
- NBME (forms, not exactly a Q‑bank but functionally a predictive bank)
- AMBOSS
- Kaplan
We will treat NBME forms as the reference standard, then compare the banks to that.
| Resource | Typical r with Real Score | Comment |
|---|---|---|
| NBME Forms | 0.85–0.90 | Gold standard predictive tool |
| UWorld | 0.70–0.80 | Strong, especially near exam |
| AMBOSS | 0.55–0.70 | Moderate–strong, more variable |
| Kaplan | 0.45–0.60 | Moderate at best |
Are these perfect? No. But the ordering is remarkably stable across cohorts:
- NBME
- UWorld
- AMBOSS
- Kaplan
If you want a one‑line summary: NBME predicts best, UWorld comes second, and everything else is a support act.
Now we break this down.
3. UWorld: The Workhorse with a Strong but Imperfect Correlation
Most people treat UWorld percentage as a pseudo‑score. That’s dangerous if you do not understand the pattern behind it.
What the data shows
Across multiple unofficial datasets (student spreadsheets, survey collections, Reddit mega‑threads):
- Cumulative UWorld percent correct correlates with Step 1 / Step 2 CK around r = 0.70–0.80.
- The correlation tightens as:
- You complete more of the bank (ideally >60–70%).
- Your blocks are timed and random rather than untimed / subject‑only.
Rough rule of thumb that has held surprisingly well for Step 2 CK:
- Step 2 CK score ≈ (UWorld % correct × 1.1–1.3) + 150–160
For example:
- 60% UWorld → prediction band roughly 220–235
- 70% UWorld → prediction band roughly 235–250
- 80% UWorld → prediction band roughly 250–265+
Does everyone fit? Of course not. But you see dense clustering around these bands.
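The rule of thumb above can be written as a small band calculator. The coefficients come straight from that formula; note the resulting envelope is slightly wider than the quoted bands, which are tightened by judgment.

```python
def ck_band(uworld_pct):
    """Rough Step 2 CK prediction band from cumulative UWorld % correct.

    Uses the rule of thumb: score ~ (UWorld % x 1.1-1.3) + 150-160.
    """
    low = round(uworld_pct * 1.1 + 150)
    high = round(uworld_pct * 1.3 + 160)
    return low, high

# The quoted 60%/70%/80% bands sit inside these envelopes.
band_60 = ck_band(60)
band_70 = ck_band(70)
```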
Why UWorld correlates reasonably well
Three data‑driven reasons:
- Content alignment: The blueprint and difficulty distribution are deliberately tuned to USMLE style. That reduces construct mismatch.
- Statistical scaling: UWorld tracks huge volumes of user performance, then continuously adjusts question difficulty and explanations. The “percent correct” is not raw chance; it is anchored against a large user base.
- Test‑taking behavior similarity: Students tend to treat UWorld like a serious tool (timed, random) more than, say, random free banks. This makes performance more “exam‑like.”
Where it goes wrong:
- People using UWorld as a learning tool early (untimed, subject‑only, peeking at explanations), then treating that % like a prediction. That tanks correlation in personal datasets.
If your goal is prediction, not just learning, your UWorld data must be:
- Timed
- Random
- Near the exam (last 25–40% of questions especially)
Otherwise, you are just generating noise.
4. NBME: The Predictive Gold Standard (Even Though It Is Not a Q‑Bank)
NBME is not a classic Q‑bank, but in practice:
- Repeated NBME forms act like a high‑signal question bank with embedded scoring.
- Performance on NBME forms is the single best statistical predictor of your actual USMLE score.
| Resource | Approx. r with Step Score |
|---|---|
| NBME | 0.88 |
| UWorld | 0.75 |
| AMBOSS | 0.63 |
| Kaplan | 0.52 |
Typical numbers from compiled student data:
- Single NBME form score vs Step score: r ≈ 0.85–0.90
- Average of the last 2–3 NBMEs vs Step score: r ≈ 0.90+
The score difference between your last NBME and real exam is commonly:
- Within ±5 points in many cases
- Within ±10 points for the vast majority
- Outliers exist, but they are rare and usually due to:
- Severe test anxiety
- Illness / sleep issues
- Major content gaps unmasked on test day
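A minimal way to operationalize those typical deviations: center on the mean of your last forms and report a tight and a wide band. The ±5/±10 widths are the rough figures above, not guarantees.

```python
def nbme_prediction(last_forms, tight=5, wide=10):
    """Center a prediction on the mean of the last 2-3 NBME forms.

    tight/wide reflect the typical +/-5 and +/-10 point deviations.
    """
    center = sum(last_forms) / len(last_forms)
    return {
        "center": center,
        "likely": (center - tight, center + tight),
        "very_likely": (center - wide, center + wide),
    }

pred = nbme_prediction([232, 238])  # centers at 235
```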
So if you are being strictly data‑driven and the question is “Which Q‑bank correlates best,” the pedantic but correct answer is:
The “Q‑bank” that correlates best is NBME forms, not any commercial bank.
NBME is built from the same test writers and blueprint as the real exam. That is why the correlation is so tight. No commercial Q‑bank can fully replicate that.
5. AMBOSS: Dense, Good for Learning, Slightly Weaker as a Predictor
AMBOSS fans love the explanations, tables, and integration with the library. Fair. Pedagogically, it is strong.
Predictively? Middle of the pack.
From multiple user‑reported scatter plots and cohort analyses I have seen:
- AMBOSS % correct vs Step score: r ≈ 0.55–0.70
- Tends to over‑predict for weaker students and slightly under‑predict or match for strong students.
Why does the correlation come out weaker than UWorld?
Several reasons show up repeatedly:
- Usage pattern: Many students use AMBOSS earlier in prep, during content building. Untimed, tutor mode, topic‑targeted. That destroys predictive value.
- Question style: AMBOSS sometimes leans denser and more reading‑heavy than the real exam. Performance on dense questions does not always scale linearly to NBME style.
- User base composition: UWorld has near‑universal penetration among test‑takers; AMBOSS's user base is more mixed. Correlation coefficients are sensitive to the population you sample.
I tell students this explicitly:
- Use AMBOSS for learning and remediation.
- Do not obsess over your AMBOSS % as an exam score proxy, especially early.
Once again: the more your usage mimics the actual exam (timed, random, near test date), the more predictive the bank becomes. AMBOSS is no exception.
6. Kaplan and Others: Decent Practice, Modest Predictive Power
Kaplan has been around forever. Some schools bundle it. That alone does not make it a strong predictor.
Aggregate patterns:
- Kaplan % correct vs Step score: r ≈ 0.45–0.60
- Tends to have a slightly different question flavor than NBME.
- Students often use it very early, sometimes even M1/M2, before serious dedicated prep.
All of that erodes correlation.
Kaplan is useful for:
- Early exposure to question‑based learning
- Filling in some basic science gaps
- Rotating topics during preclinical years
But Kaplan performance is not a high‑confidence signal for your real exam score. If you try to reverse‑engineer “Kaplan % to Step score” with one simple formula, you will create more anxiety than insight.
The same applies, often more so, to the long tail of smaller banks and “free question sites.” The data on them is thin and inconsistent, and user behavior is highly variable. Translation: their correlation estimates are garbage.
7. How Score Prediction Actually Works in Practice
Let us talk mechanics. Suppose you want to use your data like an adult:
- You have:
- UWorld % correct (say, 68% cumulatively)
- AMBOSS % (say, 74%)
- Last two NBME forms (say, 232 and 238 for Step 2 CK)
How should you weigh these?
Think of each resource as a noisy estimator with known approximate precision. NBME is your high‑precision instrument. UWorld is medium precision. AMBOSS is lower precision. Kaplan barely counts for prediction.
A simple weighted approach that I have used with students:
- Take the average of your last 2–3 NBME scores.
- Adjust slightly using UWorld, only if:
- You have done >60–70% of UWorld timed + random, and
- Your cumulative % is clearly above or below what NBME suggests.
Concrete example:
- Last NBMEs: 232, 238 → average 235
- UWorld cumulative: 78% (solid)
- The typical 78% UWorld band for CK is around 245–255.
You are underperforming slightly on NBMEs relative to UWorld expectation. I would predict:
- Most likely test‑day range: 238–248
- Centered maybe at 243–245
- With a tail risk if test anxiety or fatigue hits.
Flip it around:
- Last NBMEs: 248, 252 → average 250
- UWorld cumulative: 64%
- That 64% UWorld maps more like a 225–240 band.
- Here NBMEs say you perform better under “true NBME style” conditions than your longer‑term UWorld record. That happens if you learned a lot late or did UWorld sloppily early.
In that scenario, I trust NBME much more and keep 250 as the anchor, with maybe a slightly wider band like 242–255.
| Signal | Weight (%) |
|---|---|
| NBME Contribution | 70 |
| UWorld Contribution | 25 |
| Other Banks | 5 |
Weights are not universal, but this 70/25/5 breakdown roughly reflects how much trust each signal deserves.
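As a sketch, the 70/25/5 weighting can be applied like this. The UWorld-to-score conversion uses the midpoint of the earlier rule of thumb, and both the weights and that conversion are judgment calls, not published formulas.

```python
def weighted_prediction(nbme_scores, uworld_pct, other_estimate=None):
    """Combine noisy estimators with rough 70/25/5 weights (assumed, not official)."""
    nbme_avg = sum(nbme_scores) / len(nbme_scores)
    uworld_est = uworld_pct * 1.2 + 155  # midpoint of the rough CK formula
    if other_estimate is None:
        # No third signal: renormalize the remaining 70/25 weights
        return (0.70 * nbme_avg + 0.25 * uworld_est) / 0.95
    return 0.70 * nbme_avg + 0.25 * uworld_est + 0.05 * other_estimate

# First worked example from the text: NBMEs 232/238, UWorld 78%
est = weighted_prediction([232, 238], 78)
```

Because NBME dominates the weighting, a strong or weak UWorld record nudges the estimate rather than overriding the NBME anchor.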
8. Timing and Mode: The Hidden Variables Students Ignore
Correlation is not just “which brand of questions.” It is when and how you use them. I have seen students with the same raw UWorld % have wildly different outcomes purely because of test‑taking discipline.
Key variables that matter more than you think:
Completion
- Less than 40–50% completed = very weak predictor
- 60–80% completed = moderate predictor
- 100% completed = best signal, assuming consistent mode
Mode
- Timed + random blocks correlate best with real exam.
- Untimed + tutor mode destroys predictive value.
- System‑based blocks are fine for learning, weak for prediction.
Recency
- Early‑phase Q‑bank performance predicts almost nothing about scores months later.
- Data from the last 4–6 weeks (especially NBME and Free 120) dominates prediction.
If you want your Q‑bank to function as a predictive tool, you must treat at least a portion of it like the exam:
- 40‑question blocks
- Timed, no pausing
- Random systems
- Honest review of mistakes without changing answers retroactively
Without this, you are not building a dataset. You are just doing homework.
9. Putting It All Together: Which Q‑Bank “Correlates Best”?
Let us answer your original question bluntly.
If we restrict the definition to “commercial Q‑banks” (ignoring NBME):
- UWorld has the strongest and most reliable correlation with real USMLE scores.
- AMBOSS is second, useful but more variable.
- Kaplan and others lag as predictors, though they can be fine learning tools.
If we expand the scope honestly to include everything you use for questions:
- NBME forms correlate best. Period.
- UWorld is your best continuous practice + secondary predictor.
- AMBOSS and Kaplan are supplemental.
Here is a simplified mapping of bank usage vs predictive value:
| Scenario | Predictive Quality | Comment |
|---|---|---|
| NBME + Free 120 near exam | Excellent | Primary anchor |
| UWorld timed/random, >70% complete | Strong | Best commercial predictor |
| AMBOSS timed/random, near exam | Moderate–Strong | Helpful but noisier |
| Kaplan early, subject-based, untimed | Weak | Learning tool, not a predictor |
| Any bank in tutor mode, early in prep | Very weak | Do not use % as score proxy |
And to visualize relative predictive strength:
| Resource / Usage | Predictive Strength Index |
|---|---|
| NBME + Free 120 | 95 |
| UWorld (timed/random) | 80 |
| AMBOSS (timed/random) | 65 |
| Kaplan (mixed use) | 50 |
| Other small banks | 40 |
(The numbers are a rough “predictive strength index,” not literal r values, but the ranking reflects real patterns.)
10. How to Use This Data Strategically
Let me translate all this into a plan, because raw stats without decisions are useless.
Anchor your expectations on NBME + Free 120.
- Treat these as your real exam dress rehearsals.
- Last 2–3 NBME scores drive your final prediction.
Use UWorld as both learning tool and secondary predictor.
- Early: learn from explanations, tag weaknesses.
- Late: switch to timed, random blocks to generate high‑quality predictive data.
- Interpret your final cumulative % in the context of your NBME trend.
Use AMBOSS for depth and remediation, not primary prediction.
- Great when UWorld explanations feel thin.
- Good for targeted drilling during clerkships.
- Do not obsess if your AMBOSS % looks “low” while NBME and UWorld are strong.
Ignore Kaplan % as a score proxy unless you have no other data.
- If you must, treat Kaplan as a very rough lower‑precision signal.
- But once you have UWorld + NBME, Kaplan is basically demoted.
Stop comparing raw percentages across banks.
- 70% on AMBOSS does not equal 70% on UWorld.
- 70% on NBME is itself a different beast entirely because it maps directly to a scored scale.
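To see why raw percentages do not transfer across banks, imagine each bank having its own calibration line. The coefficients below are invented purely for illustration; only the structure (a different slope and intercept per bank) is the point.

```python
# Hypothetical per-bank calibration lines: (slope, intercept).
# These numbers are illustrative assumptions, not published values.
CALIBRATION = {
    "uworld": (1.2, 155),
    "amboss": (1.0, 165),
}

def bank_to_score(bank, pct):
    """Map a bank's raw % correct to a predicted score via its own line."""
    slope, intercept = CALIBRATION[bank]
    return round(slope * pct + intercept)

# The same raw 70% maps to different predicted scores on different banks.
uworld_70 = bank_to_score("uworld", 70)
amboss_70 = bank_to_score("amboss", 70)
```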
1. Start from your question performance data.
2. NBME available? If yes, base your prediction on the NBME average; if not, collect NBME data as soon as possible.
3. UWorld timed/random done? If yes, refine the range with UWorld data; otherwise use your UWorld % cautiously.
4. Where NBME and UWorld disagree, trust NBME more.
5. Adjust your study plan based on the gaps the data exposes.
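The steps above can be sketched as a tiny decision function; the branch logic mirrors the hierarchy, and the returned strings are just labels.

```python
def prediction_strategy(nbme_scores, uworld_timed_random_done):
    """Pick the prediction anchor per the hierarchy: NBME first, then UWorld."""
    if nbme_scores:
        if uworld_timed_random_done:
            return "base on NBME average, refine range with UWorld data"
        return "base on NBME average alone"
    if uworld_timed_random_done:
        return "use UWorld % cautiously; collect NBME as soon as possible"
    return "no reliable signal yet; collect NBME as soon as possible"
```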
That is the hierarchy you should internalize.
11. Final Takeaways
Three points, stripped of nonsense:
- NBME forms correlate best with real USMLE scores. They are your predictive anchor. Everything else is secondary.
- UWorld is the strongest commercial Q‑bank predictor, with solid correlation when used in timed, random mode and completed to a substantial degree.
- AMBOSS, Kaplan, and others are primarily learning tools, not scoring oracles. Use their explanations aggressively, but let NBME + UWorld numbers drive your expectations and decisions.