
The myth that “practice tests always underpredict your real Step 2 CK score” is statistically wrong. Some do. Some do not. The data show clear patterns—by resource—that you can and should exploit.
You are not guessing here. You are running a forecasting problem on yourself with small but usable datasets. The question is: which practice exams give the highest predictive accuracy, and how should you interpret them?
Let’s walk through it like a data set, not a superstition contest.
The Core Question: How Predictive Is Each Resource?
When I say “predictive,” I mean two things:
- Correlation with real Step 2 CK score (how well high vs low practice scores track real outcomes).
- Calibration (how close the predicted score is, on average, to your real Step 2 CK score).
High correlation but bad calibration = it tracks rank-order (who’s stronger vs weaker) but systematically overshoots or undershoots. Most Step 2 resources fall into one of three buckets:
- Strong predictor and well calibrated
- Strong predictor but slightly biased (consistently under- or overpredicts)
- Useful for practice, poor for prediction
Here is a synthesized summary from large self-reports (r/Step2, Reddit spreadsheets, med student forums, tutoring databases) combined with tutor pools I have seen:
| Resource | Correlation (approx) | Typical Bias vs Real CK | Best Use Timing |
|---|---|---|---|
| NBME Forms 9–13 | 0.75–0.85 | -0 to -5 pts | Final 4–6 weeks |
| UWSA 2 | 0.80–0.90 | +0 to +5 pts | 1–3 weeks before |
| UWSA 1 | 0.70–0.80 | +3 to +8 pts | 2–5 weeks before |
| Free 120 (Newer) | 0.60–0.75 | -5 to +5 pts (wide) | 1–2 weeks before |
| Old NBME forms | 0.60–0.70 | -5 to -10 pts | Early diagnostic |
These numbers are approximate but directionally consistent across thousands of datapoints: NBME + UWSA 2 are your primary forecasting tools. The rest are supporting evidence.
NBME Step 2 CK Forms: The Gold Standard (With Caveats)
NBME forms 9–13 (and newer forms as they appear) are the closest thing to a calibrated prediction engine you have.
Patterns from aggregated score spreadsheets and tutoring cohorts:
- Correlation with real Step 2 CK: ~0.8 (strong)
- Mean error (absolute difference): typically ~5–8 points
- Bias: usually a small underprediction (0–5 points below final score) when taken in the last 2–3 weeks
How each NBME typically behaves
Below is a simplified average, assuming the exam is taken within 3–4 weeks of the real test and you are actively studying:
| NBME Form | Average Bias vs Real CK | Comment |
|---|---|---|
| NBME 9 | -3 to -7 points | Slightly harder feel, underpredicts |
| NBME 10 | -2 to -6 points | Often closest for mid-240s–260s |
| NBME 11 | -3 to -8 points | Many report biggest underprediction |
| NBME 12 | -0 to -5 points | Better calibration at higher scores |
| NBME 13 | -2 to -6 points | Newer; similar to 10/12 pattern |
Key pattern: Almost none of these significantly overpredict if taken late. The risk is more often being scared by a low NBME that ends up 5–10 points under your real score.
How to interpret an NBME numerically
If an NBME is within 2 weeks of your exam:
- Single NBME score → your most likely window is about ±7–8 points.
- Multiple NBME scores over a 4–6 week window → average your most recent forms, weighting the most recent most heavily.
Example:
- 5 weeks out: NBME 10 = 243
- 3 weeks out: NBME 11 = 247
- 1 week out: NBME 12 = 250
Weighted estimate: recent scores matter more, but you also see a trend. You could model it as:
- Last NBME (50% weight) = 250
- Previous (30% weight) = 247
- Oldest (20% weight) = 243
Predicted center: 0.5×250 + 0.3×247 + 0.2×243 = 125 + 74.1 + 48.6 = 247.7 ≈ 248
Then adjust by the typical NBME underprediction of ~3–5 points → expected Step 2 CK ≈ 251–253. That matches what I have seen for many students.
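The recency-weighted estimate above can be sketched in a few lines. This is a minimal sketch, assuming the article's heuristics: the 0.2/0.3/0.5 weights and the +3 to +5 late-NBME underprediction correction are rough rules of thumb from this piece, not official NBME conversions.

```python
# Recency-weighted NBME estimate, using this article's rough heuristics.
# Weights and the underprediction correction are assumptions, not official.

def weighted_nbme_estimate(scores_old_to_new, weights=(0.2, 0.3, 0.5)):
    """Return a (low, high) band: weighted center of the last three NBMEs,
    plus the typical late-NBME underprediction correction of ~3-5 points."""
    center = round(sum(s * w for s, w in zip(scores_old_to_new, weights)))
    return center + 3, center + 5

# The example from the text: NBME 10 = 243, NBME 11 = 247, NBME 12 = 250.
print(weighted_nbme_estimate([243, 247, 250]))  # (251, 253)
```

The point of coding it is discipline: you commit to the weighting scheme before you see the number, instead of picking whichever mental model flatters you.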
UWorld Self-Assessments (UWSA1 & UWSA2): Strong but Slightly Optimistic
UWorld SA 1 and 2 are heavily used and heavily mythologized. The data show they are powerful predictors, but you must understand the bias.
UWSA 2: High correlation, mild overprediction
From self-reported datasets:
- Correlation with real Step 2 CK: ~0.85–0.9
- Mean absolute error: 5–7 points
- Bias: +0 to +5 points relative to the real test, for most students who take it in the last 7–10 days
When people say “UWSA2 predicted my score almost perfectly,” they are usually in this scenario:
- Took UWSA2 within 1–7 days of Step 2 CK
- No major off-day on exam day
- Score band: ~230–265
In that band, UWSA2 is often your single most powerful data point. But you need to correct mentally:
UWSA2 = 252 one week out
You should not expect 260. A realistic expectation is roughly 247–252, with a central guess around 250.
UWSA 1: Slightly noisier, often more optimistic
UWSA 1 patterns:
- Correlation: ~0.7–0.8
- Bias: about +3 to +8 points above real Step 2 CK
- More variable if taken early (>4 weeks out)
UWSA 1 tends to make people feel better than their NBME does. That is not automatically bad; it might reflect stronger question style fit. But as a prediction, I mentally “discount” UWSA1 by 5 or so points.
Example scenario I have seen multiple times:
- NBME 10: 242 (3.5 weeks out)
- UWSA 1: 255 (2.5 weeks out)
- NBME 12: 246 (1.5 weeks out)
- UWSA 2: 250 (5 days out)
- Real Step 2 CK: 248–252 range
Notice the pattern: UWSAs a bit higher than NBME, real score in between, usually closer to last NBME / UWSA2.
Free 120: Signal, but Noisy and Often Misused
The Free 120 is abused as a prediction tool. It was not built for that. But people will force a number out of anything.
Historically, older Step 2 CK Free 120 versions had a slightly better reputation for calibration when a percentage→score mapping was applied. For the current style:
- Correlation with Step 2 CK: moderate (~0.6–0.75)
- Error: wide; being off by ±10–12 points is common
- Bias: depends on your baseline. High scorers often find it underpredicts, mid scorers see closer alignment.
Here is a very rough conversion that tends to match aggregated experiences, assuming you take it in the final 1–2 weeks and under realistic test conditions:
| Free 120 % | Approx. Step 2 CK Score |
|---|---|
| 65% | 225 |
| 70% | 235 |
| 75% | 242 |
| 80% | 250 |
| 85% | 258 |
| 90% | 265 |
This is an approximation, not a guarantee. I have seen:
- 75% Free 120 → real 245
- 82% Free 120 → real 252
- 78% Free 120 → real 241
Same percentage, very different final scores. Good for sanity-checking that you are not wildly off. Not good for arguing whether you will get a 250 vs 253.
Use case: if your NBMEs are clustered around 245–250, and you pull 80–82% on a recent Free 120 under strict conditions, the combined data say you are likely in the ~245–255 real score range.
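If you want a number out of the Free 120 anyway, linear interpolation over the rough table above is about as precise as the data deserve. A sketch, assuming the table's anchor points (which are approximations from aggregated self-reports, not an official conversion):

```python
# Loose percent-to-score mapping for the Free 120, interpolating between
# the approximate anchor points from the table above. Not an official curve.

ANCHOR_POINTS = [(65, 225), (70, 235), (75, 242), (80, 250), (85, 258), (90, 265)]

def free120_rough_score(pct):
    """Linearly interpolate between anchors; clamp outside the table range."""
    if pct <= ANCHOR_POINTS[0][0]:
        return ANCHOR_POINTS[0][1]
    if pct >= ANCHOR_POINTS[-1][0]:
        return ANCHOR_POINTS[-1][1]
    for (p0, s0), (p1, s1) in zip(ANCHOR_POINTS, ANCHOR_POINTS[1:]):
        if p0 <= pct <= p1:
            return round(s0 + (pct - p0) / (p1 - p0) * (s1 - s0))

print(free120_rough_score(78))  # 247
```

Treat the output as the center of a wide band (±10 points or more), which is exactly why the Free 120 stays a sanity check rather than an anchor.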
Combining Data: How to Build a Personal Forecast
You should treat your exam prep like a mini time-series forecasting problem, not a single datapoint guess.
Here is a simple, pragmatic model that aligns well with what I have seen across many students.
Step 1: Weight resources by predictive power
Assign only rough weights:
- Recent NBME (taken ≤3 weeks out): weight 1.0
- UWSA2 (≤2 weeks out): weight 0.9
- UWSA1 (≤4 weeks): weight 0.7
- Free 120 (≤2 weeks): weight 0.5
- Older tests or >5 weeks out: weight 0.3 or ignore unless trend is clear
Step 2: Adjust for known bias
Use typical biases:
- NBME: add 2–4 points (they often underpredict slightly if taken late)
- UWSA1: subtract 4–6 points
- UWSA2: subtract 2–4 points
- Free 120: do not hard-adjust; keep as percentage and map loosely
Example student:
- 5 weeks out: UWSA1 = 245 → bias-adjusted ≈ 239–241 (use 240)
- 4 weeks out: NBME 9 = 238 → bias-adjusted ≈ 241 (add 3)
- 2 weeks out: NBME 12 = 244 → bias-adjusted ≈ 247 (add 3)
- 1 week out: UWSA2 = 249 → bias-adjusted ≈ 245–247 (use 246)
- 5 days out: Free 120 = 78% → rough ≈ 242–248 (central ≈ 245)
Now create a weighted average:
Let us pick central estimates:
- UWSA1 adj: 240, weight 0.5 (older)
- NBME 9 adj: 241, weight 0.7
- NBME 12 adj: 247, weight 1.0
- UWSA2 adj: 246, weight 0.9
- Free 120 central: 245, weight 0.5
Weighted forecast:
Numerator = (240×0.5) + (241×0.7) + (247×1.0) + (246×0.9) + (245×0.5)
= 120 + 168.7 + 247 + 221.4 + 122.5 = 879.6
Denominator = 0.5 + 0.7 + 1.0 + 0.9 + 0.5 = 3.6
Predicted score ≈ 879.6 / 3.6 ≈ 244.3
Then remember there is residual variance. Realistic outcome band: ~240–250. That is how a data-driven tutor would set expectations.
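The whole three-step model fits in a short script. A sketch, assuming the weights and bias adjustments from Steps 1 and 2 (which are this article's heuristics, not validated constants):

```python
# Weighted forecast across resources, as in the worked example above.
# Bias adjustments have already been applied to each score; weights follow
# the article's rough heuristics and are assumptions, not validated values.

def combined_forecast(points):
    """points: (bias_adjusted_score, weight) pairs; returns weighted mean."""
    return sum(s * w for s, w in points) / sum(w for _, w in points)

points = [
    (240, 0.5),  # UWSA1, bias-adjusted, taken early so down-weighted
    (241, 0.7),  # NBME 9, +3 underprediction adjustment applied
    (247, 1.0),  # NBME 12, most recent NBME, full weight
    (246, 0.9),  # UWSA2, -3 overprediction adjustment applied
    (245, 0.5),  # Free 120, central estimate from the rough mapping
]
print(round(combined_forecast(points), 1))  # 244.3
```

Writing it down like this also makes the residual-variance point concrete: the output is a band center, and the honest statement is "244 ± 5," not "244."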
Step 3: Track trajectory, not just absolutes
Trend matters. Someone moving 230 → 238 → 244 → 248 in 4–5 weeks has momentum. Someone bouncing 243 → 246 → 244 → 245 is probably plateaued.
Use a very simple mental model:
- If last 3 exams show consistent +3 to +5 steps, you can reasonably add 2–5 points to your prediction if there is still 1–2 weeks left of focused studying.
- If last 3 scores are flat within a 3-point band, assume minimal further gain unless you dramatically change your approach (rare this late).
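The trend rule above can be made mechanical. A sketch, assuming the article's thresholds (+3 to +5 per step for a real climb, a 3-point band for a plateau); the category names are illustrative, not standard terminology:

```python
# Trend-vs-plateau classifier using this article's thresholds, which are
# rough heuristics rather than validated cutoffs.

def classify_trend(last_three_scores):
    """Classify the last three exam scores as climbing, plateau, or mixed."""
    steps = [b - a for a, b in zip(last_three_scores, last_three_scores[1:])]
    if all(3 <= s <= 5 for s in steps):
        return "climbing"   # may reasonably add 2-5 points with time left
    if max(last_three_scores) - min(last_three_scores) <= 3:
        return "plateau"    # assume minimal further gain
    return "mixed"          # lean on the most recent high-quality exams

print(classify_trend([238, 242, 246]))  # climbing
print(classify_trend([243, 246, 244]))  # plateau
```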
Timing: When Each Resource Is Most Valuable
Timing interacts with predictive accuracy. A good predictor used too early becomes noisy.
Here is a simple timeline that aligns with what tends to work:
| Phase | Activity | Purpose |
|---|---|---|
| Early (6–8 weeks out) | Baseline NBME (older form or 9) | Score check |
| Early (6–8 weeks out) | Begin UWorld blocks | Content + style |
| Mid (4–6 weeks out) | NBME 9/10 | Calibration |
| Mid (4–6 weeks out) | UWSA 1 | Confidence + range check |
| Late (2–3 weeks out) | NBME 11/12/13 | Primary predictor |
| Late (2–3 weeks out) | Targeted review | Fix weak systems |
| Final (0–2 weeks out) | UWSA 2 | Final range estimate |
| Final (0–2 weeks out) | Free 120 | Style + sanity check |
| Final (0–2 weeks out) | Light review | Avoid burnout |
Using a high-predictive test (NBME, UWSA2) 6–8 weeks out is fine for diagnosis, but do not use that score as a strict forecast. It does not account for your learning curve.
Wait until at least 2–3 weeks out before you start taking the numbers seriously as “what will I get.”
Common Misinterpretations and Bad Data Habits
I see the same analytical errors over and over:
- Overweighting a single outlier test. You cannot build a forecast on one data point. A bad test day, wrong timing, or fatigue can easily swing you ±10 points.
- Ignoring form-to-form difficulty variance. Not all NBMEs or UWSAs feel equally hard. You see this when you drop 2 points on a form even though your percent correct is similar or slightly up. Look at the scaled score in context, not just the raw three-digit number.
- Mixing very old datasets with current scoring. Step 2 CK's format and scoring distributions have changed over the years. A 2017 Free 120-to-score curve does not cleanly apply in 2025.
- Assuming practice-question percent correct equals a score. UWorld QBank percentages are a noisy, selection-biased metric (order of blocks, reuse of old knowledge, mixing timed vs untimed). I have seen 60% UWorld → 260 and 70% → 240 depending on how people used the bank.
- Emotion-driven interpretation. A UWSA1 that is 12 points higher than your NBME is emotionally comforting. That does not make it statistically more valid. You have to be willing to believe the "worse" number when the evidence says it is the better predictor.
Resource-by-Resource: Clear Takeaways
To make this concrete, here is the “data analyst verdict” on each major exam type for Step 2 CK prediction.
| Resource | Predictive Value (1–10) |
|---|---|
| NBME (recent) | 9 |
| UWSA 2 | 9 |
| UWSA 1 | 7 |
| Free 120 | 5 |
| Old NBME | 5 |
(Scale 1–10, relative within this ecosystem.)
NBME (Recent Forms 9–13)
- Use as the backbone of your prediction.
- Expect slight underprediction if taken late and you are still studying.
- Two recent NBMEs averaged are usually more credible than any single non-NBME exam.
UWSA 2
- Treat as “NBME-level” predictive power with a small positive bias.
- Best value is 5–10 days before the exam under strict test conditions.
- Do not panic if it is a few points off prior NBME; use it as a band, not a single point.
UWSA 1
- Good secondary data point, not the primary anchor.
- Most students should mentally subtract ~5 points from the raw score.
- Use for confidence and question exposure > strict prediction.
Free 120
- Use for style, pacing, and broad sanity check, not fine-grained prediction.
- Convert percent to very rough ranges, not exact scores.
- If your Free 120 is wildly inconsistent with your recent NBMEs (e.g., 60% but 250 NBMEs), trust the NBMEs more.
How to Decide if You Are “Ready” Using the Numbers
You want a threshold. Some cutpoint where the data say “risk is acceptable.”
Here is a pragmatic rule set:
If your last two NBMEs (within 3 weeks) are:
- Both above your personal target score or at least above the pass–fail comfort zone you want, and
- Not wildly divergent (≤8-point spread),
then you are statistically ready. You may still gain a few points, but the risk of failing or collapsing is low if you are not burned out.
If your last NBME and UWSA2 disagree by >10 points:
- Look at timing (was one much earlier?).
- Look at conditions (fatigue, breaks, distractions).
- Consider a tiebreaker NBME if time allows. Err on the more conservative score.
If your metrics are climbing and you have 2+ weeks left:
- A consistent +3–5 per week trend suggests you can improve another 3–6 points before plateauing.
- But do not anchor on a fantasy ceiling; let data from the last two high-quality exams drive your expectation.
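The readiness rule set above reduces to two checks on your last two recent NBMEs. A sketch, assuming the article's thresholds (both at or above target, spread ≤8 points); the function name is illustrative:

```python
# Readiness check from the rule set above: both recent NBMEs at or above
# target, and not wildly divergent. Thresholds are this article's heuristics.

def statistically_ready(last_two_nbmes, target, max_spread=8):
    """True if both recent NBME scores clear the target and agree closely."""
    a, b = last_two_nbmes
    above_target = min(a, b) >= target
    consistent = abs(a - b) <= max_spread
    return above_target and consistent

print(statistically_ready([247, 250], target=245))  # True
print(statistically_ready([236, 250], target=235))  # False: 14-point spread
```

If the check fails only on the spread condition, that is the scenario where a tiebreaker NBME (time permitting) earns its cost.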
One More Point: Variance Will Always Exist
Even with perfect modeling, human performance has variance. Sleep, anxiety, odd question mix, experimental items, interface issues. All of it adds about ±5 points of irreducible noise.
So you do not use these predictions to chase an exact score like 252 vs 253. You use them to answer practical questions:
- Am I more likely than not to be above 240? 250? 260?
- Is there meaningful risk I will fail? (With multiple NBMEs >220, the failure risk is tiny unless you completely break.)
- Does postponing the exam by 2–4 weeks statistically move my score band upward, or am I already at a plateau?
Think like that and your decisions stop being fear-driven and start being rational.
FAQ
1. My UWSA2 is 10+ points higher than my latest NBME. Which should I believe?
Weight the NBME more heavily, especially if the NBME was closer to test day and taken under good conditions. Adjust UWSA2 down by 3–5 points and consider the true “band” to be roughly between the adjusted UWSA2 and the NBME. If they are still far apart, a follow-up NBME (if time permits) is the best tiebreaker.
2. Can I use just UWorld QBank percentage to predict my Step 2 CK score?
Not reliably. QBank percentages are heavily biased by when you did blocks, how many questions you reset, whether you did random/timed vs untimed/tutor, and whether you improved over time. Two students can both be at 65% and end up 230 vs 260. Use QBank performance qualitatively, not as a score converter.
3. How many practice tests do I actually need for a solid prediction?
For most students, 3–5 high-quality exams are enough: 2–3 recent NBMEs, 1 UWSA2, optionally 1 UWSA1 and the Free 120. More than 6–7 practice tests tends to add noise, fatigue, and opportunity cost rather than real predictive value unless you manage recovery extremely well.
Key points to keep in your head:
- Recent NBMEs + UWSA2, bias-adjusted, are your most accurate Step 2 CK predictors.
- Free 120 and UWSA1 are supporting signals, not primary anchors.
- Use multiple data points, adjust for known biases, and think in score bands—not single magic numbers.