Residency Advisor

Practice Test Correlation: Which Step 2 CK Self-Assessment Best Predicts Score?

January 5, 2026
14 minute read


Step 2 CK self-assessments are not equal, and the data makes that brutally clear.

If you treat every practice test as equally predictive, you will overestimate your score, underestimate your risk, and walk into exam day with false confidence. I have seen it over and over: a student hanging their hopes on one inflated form while ignoring the one assessment that was screaming a very different story.

Let me be direct: correlation with the real Step 2 CK is a measurable statistic. You can rank the major self-assessments by how closely they track real scores. You can quantify which ones overpredict, which ones underpredict, and by how much. That is what we will do here.

We will focus on the big four categories of predictors:

  • NBME self-assessment forms (the newer Step 2 CK forms)
  • UWSA2
  • UWSA1
  • The Free 120

And the core question: Which Step 2 CK self-assessment best predicts your real score, and how should you interpret each number?


The Core Metric: Correlation, Bias, and Error

Before arguing about which exam is “best”, we need three numbers:

  1. Correlation coefficient (r) – how tightly do practice scores and real scores move together?
  2. Bias (mean prediction error) – on average, how many points do they over- or underpredict?
  3. Spread (standard deviation of error) – how wide is the typical miss?
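These three numbers are straightforward to compute from any shared spreadsheet of paired practice/real scores. A minimal sketch in Python; every score pair below is a made-up illustrative value, not real cohort data:

```python
# Sketch: computing correlation, bias, and spread from paired scores.
# The score pairs are made-up illustrative numbers, not real student data.
from math import sqrt
from statistics import mean, stdev

practice = [238, 245, 251, 260, 232, 248]  # hypothetical self-assessment scores
real     = [240, 243, 249, 255, 236, 247]  # hypothetical real Step 2 CK scores

def pearson_r(xs, ys):
    """Correlation coefficient: how tightly the two score lists move together."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

errors  = [p - r for p, r in zip(practice, real)]  # Predicted − Actual
r_value = pearson_r(practice, real)
bias    = mean(errors)    # average over/under-prediction
spread  = stdev(errors)   # typical size of a miss (1 SD)

print(f"r = {r_value:.2f}, bias = {bias:+.1f}, spread = ±{spread:.1f}")
```

The same arithmetic works at any scale; the aggregated spreadsheets just have hundreds of rows instead of six.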

You will not get perfect peer-reviewed meta-analyses for every form. But you do not need perfection. You need approximate, stable patterns pulled from:

  • School-level shared spreadsheets (classes of 150–200 students)
  • Big Reddit / SDN Google sheets (hundreds to low thousands of entries)
  • Tutoring group internal trackers (dozens to hundreds of students, but with clean, verified data)

When you aggregate multiple such sources, the signal is surprisingly consistent.

Here is an evidence-synthesized summary. Numbers are approximate, but directionally reliable.

Step 2 CK Self-Assessment Predictive Performance
Exam Type         | Correlation with Real CK (r) | Average Bias vs Real (Predicted − Actual) | Typical Error Band (±SD)
NBME newer forms  | 0.80–0.87                    | −2 to +3 points                           | ~8–10 points
UWSA2             | 0.80–0.85                    | +2 to +5 points                           | ~8–11 points
UWSA1             | 0.70–0.78                    | +5 to +10 points                          | ~10–13 points
Free 120 (scaled) | 0.65–0.75                    | −5 to +5 points (centered, very noisy)    | ~12–15 points

The exact decimal place does not matter. The hierarchy does.

  • Top tier correlation: NBME newest forms, UWSA2
  • Second tier: UWSA1
  • Utility tool, not a score predictor: Free 120

And yes, this means what you probably suspect: NBME CBSSAs and UWSA2 are your best predictors. But they behave differently, and their errors are not symmetric. That matters for planning.


NBME Step 2 CK Forms: The Gold(ish) Standard

NBME forms are the least glamorous but most useful. No fancy interface. No “wow, I jumped 30 points” stories. Just steady, boring predictive power.

Across multiple class spreadsheets I have seen, NBME forms usually show:

  • Correlation with real Step 2 CK: r ≈ 0.80–0.87
  • Average bias: extremely low, often within −2 to +3 points
  • Error band (1 SD): roughly ±8–10 points

So if your NBME 10 score converts to a 245, a realistic outcome range (based on actual data, not wishful thinking) is approximately 235–255 for most students, assuming no major change in prep between that exam and test day.

Why NBME forms track so well

The data shows three reasons:

  1. Item style and calibration. NBME writes the real exam. Their question construction, distractor style, and difficulty distribution mirror the actual test more closely than any third-party resource.
  2. Score scaling tuned to population data. These forms are built off real-use performance data from large cohorts, not a guess from a private company.
  3. Less content gimmickry. They measure how you handle the actual Step 2 CK style, not how cute you are with UWorld patterns.

Are NBME forms “harder” or “easier”?

Students say both, often in the same week.

Quantitatively, what I see:

  • Lower raw percent correct than UWorld blocks at equivalent ability levels.
  • But scaled scores that are quite close to the real exam, especially if taken 1–4 weeks before test day.

The story that matches the data:
NBME forms feel harder because they remove some of the pattern-recognition crutches UWorld gives you. They also punish conceptual gaps more harshly. But the scaled score you get from a reasonably recent NBME form is usually your best single-number predictor.

So if you want one anchor number to plan around, use your latest NBME.


UWSA2: The High-Confidence “Ceiling” Predictor

Now we get to the one that generates the most emotional reactions: UWSA2.

Students love UWSA2 because:

  • Scores are often higher than their earliest NBME scores.
  • Correlation with the real exam is strong.
  • It “feels” like Step 2 CK in length and style.

The numbers from large online datasets and internal logs:

  • Correlation with real Step 2 CK: r ≈ 0.80–0.85
  • Average bias: +2 to +5 points (mildly optimistic)
  • Error band: about ±8–11 points

So yes, UWSA2 is very predictive. But systematically a bit high.

Average Prediction Bias by Step 2 CK Self-Assessment (bar chart)

Exam     | Average Bias (points)
NBME     | −1
UWSA2    | +4
UWSA1    | +8
Free 120 | 0

Interpreted:

  • NBME: clusters around real score, tiny under/over on average
  • UWSA2: roughly +4 over real is typical
  • UWSA1: can sit +8 (or more) over real, especially for mid-range scorers
  • Free 120: centered but noisy; some +15, some −15

How to read a UWSA2 score

If you take UWSA2 within 7–14 days of your test date, under exam-like conditions (single day, timed, no pausing, minimal distractions), a practical interpretation:

  • Score − 5 points ≈ conservative floor you can expect if test day is “average”
  • Score ≈ realistic midpoint, if stress and fatigue are normal
  • Score + 5 points ≈ best-case scenario

So a 250 on UWSA2 often means:

  • You are not “actually a 260” yet.
  • You are most likely in the 245–255 band, assuming consistency.
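The floor/midpoint/ceiling reading is simple enough to write down as a tiny helper; the ±5-point band is this article's approximation, not an official rule:

```python
# Sketch of the floor/midpoint/ceiling reading of a recent UWSA2 score.
# The ±5-point band is the article's rough approximation, not an official rule.
def interpret_uwsa2(score: int) -> dict:
    return {
        "floor": score - 5,    # conservative estimate for an "average" test day
        "midpoint": score,     # realistic central estimate
        "ceiling": score + 5,  # best-case scenario
    }

print(interpret_uwsa2(250))  # → {'floor': 245, 'midpoint': 250, 'ceiling': 255}
```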

I have seen multiple cases where:

  • UWSA2: 261
  • NBME form (within a week): 248
  • Real Step 2 CK: 251

Pattern: UWSA2 inflates slightly, NBME pulls you back to reality, actual exam lands in between.

Use UWSA2 as your optimistic but still data-grounded view of your score ceiling.


UWSA1: Useful, but Noisy and Often Inflated

UWSA1 generates the worst false confidence.

Statistically:

  • Correlation: r ≈ 0.70–0.78 with real Step 2 CK
  • Average bias: +5 to +10 points (frequently overpredicts)
  • Error band: ±10–13 points or worse for some cohorts

You can feel this in anecdotes:

  • Student has UWSA1 = 252, gets excited, slows intensity.
  • NBME 2 weeks later = 238. Panic.
  • Real exam = ~240–245.

UWSA1 correlates with the real exam, yes, but it runs systematically too high and too unreliable to serve as your primary target.

Where UWSA1 is still valuable

It is not useless. Far from it. The data and experience say:

  • Good for early-to-mid prep benchmarking.
  • Useful for relative progress tracking (e.g., from 220 → 240).
  • Helpful for question-style practice and timing early in dedicated.

But if you must choose which score to trust between UWSA1 and NBME/UWSA2, the hierarchy is simple:

NBME ≈ UWSA2 > UWSA1.

If only UWSA1 is high while NBME and UWSA2 are subdued, believe NBME/UWSA2. Every time.


The Free 120: Great for Style, Mediocre for Prediction

The Free 120 is a trap if you treat it like a full predictor.

Quantitatively:

  • Correlation: r ≈ 0.65–0.75 with real Step 2 CK (varies a lot by dataset)
  • Bias: near zero on average, but with large variance
  • Error band: ±12–15 points or more

In other words, Free 120 is centered but noisy. Across hundreds of students, it “evens out” to a solid regressed mean. For an individual student, it can be off by a mile.

The main reasons:

  • Only 120 questions; fewer data points mean wider confidence intervals.
  • Scoring is rough, and conversion tables students use online are not consistently calibrated.
  • The content mix sometimes lags behind exam trends.

I treat Free 120 as:

  • A style and confidence check, not a score predictor.
  • A good tool to ensure you are comfortable with the Prometric interface, exhibit format, and item length.
  • A quick way to see if your percent correct is at least in the neighborhood of what your NBME/UWSA2 scores imply.

Rule of thumb I have seen hold decently:

  • If your Free 120 percentage, converted with a reasonable rule of thumb (often roughly percent correct × 3 + 10), sits more than 10 points below your recent NBME/UWSA2 scores, that is a yellow flag.
  • If it is within ±5 points of those, you are probably fine.
  • If it is higher, ignore the optimism and default back to NBME/UWSA2.
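That concordance check can be encoded directly. A hedged sketch: the percent-to-score conversion (% × 3 + 10) is a rough community heuristic, and the thresholds are this article's rules of thumb, not validated cutoffs:

```python
# Sketch of the Free 120 concordance check. The ×3 + 10 conversion is a rough
# community heuristic, and the cutoffs are the article's rules of thumb.
def free120_flag(free120_pct: float, recent_anchor: float) -> str:
    est = free120_pct * 3 + 10     # crude 3-digit estimate from % correct
    diff = est - recent_anchor     # vs. latest NBME/UWSA2 score
    if diff < -10:
        return "yellow flag"           # well below your anchor: investigate
    if diff > 5:
        return "ignore the optimism"   # default back to NBME/UWSA2
    return "concordant"                # within the expected band

print(free120_flag(78, 245))   # 78% → est. 244, close to the anchor
```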

Q‑Bank Percentages: Background Signal, Not a Forecast

Students badly overvalue their UWorld percentage. The correlation is real but weaker than they think, and the bias is heavy.

Across multiple groups:

  • Cumulative UWorld percentage tends to show r ≈ 0.60–0.70 with real CK.
  • Early blocks drag the average down, so late-improving students often “look worse” on the cumulative.
  • Stronger correlation if you filter to last 600–800 questions or self-assessment–adjacent weeks.

The approximate mapping (from classes I have seen, assuming random/timed, mostly first pass):

Approximate UWorld Percent to Step 2 CK Range
UWorld % (1st pass) | Approx Typical CK Range
50–54%              | 220–230
55–59%              | 225–240
60–64%              | 235–250
65–69%              | 245–260
70%+                | 255+ (wide spread)

But this mapping has huge variance. Two students at 62% can end up at 240 vs 255, depending on how their curves evolved.

So use Q‑bank data as:

  • A background indicator of readiness,
  • A sanity check when your self-assessment scores jump or drop dramatically,
  • Not as the primary predictor.

If your Q‑bank % and your self-assessments fundamentally disagree, trust the self-assessments—especially NBME and UWSA2.


How to Combine Multiple Practice Tests into a Single Prediction

Individual self-assessments have noise. The smartest way to use them is to treat each score as one sample from your underlying “true ability” distribution.

Multiple samples → better estimate.

In practice, you can do a quick-and-dirty weighted average that reflects predictive power:

  • Assign highest weight to your most recent NBME and UWSA2.
  • Moderate weight to other recent NBMEs.
  • Light weight to UWSA1.
  • Free 120: use mainly as a tiebreaker.

Example. Let us say your recent scores (within 3 weeks of exam) are:

  • NBME 11: 242
  • NBME 12: 246
  • UWSA2: 251
  • UWSA1: 256
  • Free 120: ~78% (roughly maps to low–mid 240s typically)

You could approximate like this:

  1. Ignore UWSA1 for primary prediction; use it as “max upside”.
  2. Average NBME 11 and 12: (242 + 246) / 2 = 244
  3. Compare to UWSA2 (251). Difference ≈ 7 points.
  4. Given known UWSA2 optimism (+4 on average), shift UWSA2 down slightly → adjusted UWSA2 ≈ 247.
  5. Take a weighted mean:
    • 244 (NBMEs) with 60% weight,
    • 247 (adj. UWSA2) with 40% weight.
      Predicted ≈ 245–246.

Now look at Free 120 percent; if conversion roughly matches mid‑240s, you have concordant evidence.

Your realistic target: mid‑240s, with a likely range of about 238–253. That is how the data stacks when you treat each test as noisy evidence, not gospel.
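The worked example above can be sketched in a few lines; the 60/40 weights and the +4 UWSA2 bias correction are this article's approximations, not validated constants:

```python
# Sketch of the weighted-average prediction walked through above.
# The 60/40 weights and the +4 UWSA2 bias correction are the article's
# approximations, not validated constants.
nbme_scores = [242, 246]      # recent NBME forms
uwsa2 = 251
uwsa2_bias = 4                # typical UWSA2 optimism, per the bias table

nbme_avg  = sum(nbme_scores) / len(nbme_scores)   # 244.0
uwsa2_adj = uwsa2 - uwsa2_bias                    # 247

predicted = 0.60 * nbme_avg + 0.40 * uwsa2_adj    # ≈ 245.2
print(round(predicted, 1))
```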


Trend vs Snapshot: The Time Dimension Matters

Correlation metrics ignore a crucial factor: time between self-assessment and real exam.

Here is what internal tracking usually shows:

  • Self-assessments taken >6 weeks before the real exam have weaker predictive value. Many students are still on the rising part of their curve.
  • Predictive strength increases sharply for tests taken in the final 2–4 weeks before the exam.
  • Scores taken in the final 7–10 days, especially NBME/UWSA2, are often within ±5–7 points for 60–70% of people.

So do not anchor your entire identity to a UWSA1 from 8 weeks ago. The score might be obsolete—up or down.

Predictive Strength vs Time Before Step 2 CK (line chart)

Time Before Exam | Correlation (r)
8+ weeks         | 0.55
6 weeks          | 0.65
4 weeks          | 0.75
2 weeks          | 0.82
1 week           | 0.85

The line is conceptual, but it matches what I have observed: correlation tightens as you approach game day.

If your last 7–10 day scores show a clear upward trend, weight those more heavily than older, lower scores. If they show a downward trend, that is a separate red flag—often reflecting burnout, rushing, or poor sleep.
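One simple way to weight recent scores more heavily is an exponential recency decay. A sketch, where the 14-day half-life is my illustrative assumption, not a value drawn from any dataset:

```python
# Sketch: recency-weighting self-assessments so tests taken closer to exam day
# count more. The 14-day half-life is an illustrative assumption only.
def recency_weighted(scores_with_days, half_life=14.0):
    """scores_with_days: list of (score, days_before_exam) tuples."""
    pairs = [(s, 0.5 ** (d / half_life)) for s, d in scores_with_days]
    total = sum(w for _, w in pairs)
    return sum(s * w for s, w in pairs) / total

# Older, lower scores count less than the recent upward trend:
print(round(recency_weighted([(235, 42), (242, 21), (248, 7)]), 1))
```

Note how the estimate lands near the most recent score rather than the plain average (about 241.7), which matches the "trust the trend" rule above.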


Practical Rules: Which Self-Assessment “Wins” in a Conflict?

Let me give you the actual decision rules I recommend to students when data points disagree.

  1. NBME vs UWSA1 conflict (big gap):

    • NBME 238, UWSA1 255
    • Trust: NBME. UWSA1 is optimistic. Your realistic range is probably low‑240s, not mid‑250s.
  2. NBME vs UWSA2 conflict (moderate gap):

    • NBME 245, UWSA2 255
    • Interpretation:
      • UWSA2 likely inflated by ~5–7 points.
      • NBME might be slightly conservative if older by >7–10 days.
      • Expect real score in the middle, maybe 247–250, assuming stable prep.
  3. Multiple NBME forms trending up, UWSA1 flat:

    • NBME 235 → 242 → 248
    • UWSA1 sitting at 244 somewhere in the middle
    • Trust the trend and the most recent NBME. You improved after that UWSA1.
  4. Free 120 much lower than others in final week:

    • NBME 252, UWSA2 255, but Free 120 ~72% (translating to low‑240s)
    • Check conditions: were you distracted, tired, guessing on many?
    • If exam-like conditions were good, consider that your test-day performance is sensitive to fatigue or integration. You probably still land closer to 245–250 than to 260.
  5. Only UWSA1 is great; everything else mediocre:

    • UWSA1 260, NBME 238–242, UWSA2 245
    • Treat UWSA1 as an outlier. Reality is likely mid‑240s to low‑250s barring a last-minute jump in performance and content mastery.

Notice the pattern: NBME and UWSA2 almost always control the final estimate.


Final Takeaways: What the Data Actually Says

Strip away the anecdotes and wishful thinking, and the data reduces to three blunt points:

  1. NBME and UWSA2 are your best predictors.
    NBME forms slightly edge out in calibration; UWSA2 runs very close but tends to overpredict by a few points. If you need one “anchor” number, use your latest NBME. Use UWSA2 as the optimistic boundary of your likely range.

  2. UWSA1 and Free 120 are supporting actors, not the lead.
    UWSA1 is useful for run-up practice and early benchmarking but frequently inflated. Free 120 is excellent for interface familiarity and style, poor as a standalone score predictor due to wide variance.

  3. Trends and timing matter as much as individual scores.
    Self-assessments taken in the final 2–4 weeks under exam-like conditions carry far more predictive weight. Combine them using simple, rational weighting rather than obsessing over any single score, and assume an error band of at least ±7–10 points no matter how good the predictor looks.

If you use the exams this way—NBME and UWSA2 as calibrated instruments, everything else as context—you are making decisions based on actual signal, not noise. That is how you walk into Step 2 CK with confidence grounded in data, not in luck.
