| Category | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| COMLEX L1 | 400 | 450 | 500 | 550 | 650 |
| USMLE S1 | 205 | 220 | 230 | 240 | 265 |
The myth that COMLEX and USMLE scores live on “separate planets” is mathematically wrong. The data show clear, quantifiable relationships between the exams—just not the simple 1:1 conversions people want.
You are trying to answer a very specific question: how do performance curves on COMLEX Levels compare with USMLE Step scores, and what does that mean if you are choosing DO vs MD or planning exams as a DO student?
Let’s treat this like what it is: a distribution and signal‑quality problem, not a vibes question.
1. The Scoring Systems Are Built Differently on Purpose
First, structure. If you do not internalize how the score scales are built, every “conversion chart” you see will mislead you.
COMLEX (Level 1, 2-CE, 3)
Key numbers (pre‑pass/fail era; Level 1 switched to pass/fail reporting in 2022, but the scale logic is similar):
- Mean: typically ≈ 500
- Standard deviation: ≈ 85–100 (varies slightly by cohort)
- Passing: ~400 (not fixed, but historically close)
- Range of most scores: roughly 350–800, with 800+ as very high outliers
So if a class has Level 1 mean 500 with SD 90, then:
- 1 SD below mean ≈ 410
- 1 SD above mean ≈ 590
- 2 SD above mean ≈ 680
The NBOME explicitly uses a standard‑score model anchored to a reference cohort. That means 500 is not a “percentage”; it is a location on a normalized curve.
USMLE (Step 1, 2 CK, 3)
Older Step 1 data (before Step 1 went pass/fail):
- Mean: ≈ 230
- Standard deviation: ≈ 19
- Passing: ≈ 194
Step 2 CK (still scored numerically):
- Mean: ≈ 245
- Standard deviation: ≈ 15
- Passing: ≈ 214
Same idea: standard‑score model on a different numerical scale.
So for Step 1:
- 1 SD below mean ≈ 211
- 1 SD above mean ≈ 249
- 2 SD above mean ≈ 268
If you compare shapes, both exams are designed to produce something close to a normal distribution around a fixed point (500 vs ~230). Different numbers, same underlying statistics game.
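The SD arithmetic above is easy to reproduce. The following is a minimal sketch that assumes the approximate means and SDs quoted in this section (500/90 for COMLEX Level 1, 230/19 for old Step 1); the function name `sd_bands` is purely illustrative.

```python
# Approximate parameters quoted in the text; real cohorts vary by year.
SCALES = {
    "COMLEX Level 1": (500, 90),
    "USMLE Step 1": (230, 19),
}


def sd_bands(mean, sd, offsets=(-1, 1, 2)):
    """Return the score sitting at each SD offset from the mean."""
    return {k: mean + k * sd for k in offsets}


for name, (mean, sd) in SCALES.items():
    print(name, sd_bands(mean, sd))
```

Swapping in a different cohort's mean and SD shifts every band, which is why the numbers in this section are approximations rather than fixed cutoffs.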
2. What the Published Correlations Actually Show
The critical mistake I see: people confuse regression relationships with equivalence.
There have been multiple analyses—by programs, by researchers, and by test‑prep companies—looking at students who took both COMLEX and USMLE. When you plot COMLEX scores against USMLE scores, you see a reasonably tight but not perfect linear relationship.
Think: scatterplot with real spread, not a single diagonal line.
Typical correlation magnitudes
Various studies and internal program data sets tend to show:
- Correlation between COMLEX Level 1 and USMLE Step 1: r ≈ 0.70–0.80
- Correlation between COMLEX Level 2-CE and USMLE Step 2 CK: similar ballpark, r ≈ 0.70–0.80
What this actually means:
- r ≈ 0.75 → about 56% of the variance in USMLE score can be statistically explained by COMLEX score (because r² ≈ 0.56).
- The remaining ≈ 44% is variance that COMLEX does not explain: test style, prep resources, random exam effects, individual strengths, timing, etc.
So yes, higher COMLEX scores are strongly associated with higher USMLE scores, but no, you cannot “convert” with surgical precision for a single person.
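To see how much the explained share moves across the quoted correlation range, here is a small sketch. It only squares r; the helper name `variance_explained` is an illustrative assumption, not something taken from a published study.

```python
def variance_explained(r):
    """Share of variance in one score linearly accounted for by the other (r squared)."""
    return r ** 2


# The correlation range quoted above for COMLEX vs USMLE.
for r in (0.70, 0.75, 0.80):
    explained = variance_explained(r)
    print(f"r = {r:.2f}: explained {explained:.0%}, unexplained {1 - explained:.0%}")
```

Even at the top of the range, roughly a third of the variance is left unexplained, which is the statistical reason single-person conversions are shaky.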
3. The Temptation of Conversion Formulas (and Their Limits)
Every few months, a new “COMLEX to USMLE conversion chart” circulates on Reddit or in GroupMe chats. Most of them rely on some linear regression of the form:
USMLE Step ≈ a + b × (COMLEX)
Numbers from various published or semi‑published models have looked something like:
- Step 1 ≈ 67 + 0.28 × (Level 1)
- Step 1 ≈ 86 + 0.24 × (Level 1)
- Step 2 CK ≈ 80 + 0.30 × (Level 2-CE)
Exact coefficients vary by sample. That is the point.
Let’s take one plausible example line and look at the spread, not just the point estimate.
Assume a model (illustrative, not canonical):
Step 1 = 70 + 0.32 × Level 1, with standard error of estimate ≈ 9–10 points
Plug in:
- Level 1 = 400 → predicted Step 1 ≈ 70 + 0.32×400 = 198
- Level 1 = 500 → predicted Step 1 ≈ 230
- Level 1 = 600 → predicted Step 1 ≈ 262
Now fold in error:
If the standard error is 9 points, then for a given COMLEX, the 68% prediction interval is ±9; 95% interval is roughly ±18.
So for a Level 1 = 500 student:
- Predicted Step 1: 230
- 68% interval: 221–239
- 95% interval: 212–248
For programs that screen at Step 1 ≥ 240 (back when numeric scores existed), that spread matters. For students trying to guess “Am I competitive for derm?” off a single COMLEX score, it is reckless.
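The point estimate and its spread can be sketched in a few lines. This uses the illustrative model above (intercept 70, slope 0.32) with an assumed standard error of 9 points, and it follows the text's rough ±1 SE ≈ 68% and ±2 SE ≈ 95% convention. It ignores uncertainty in the coefficients themselves, so real intervals would be somewhat wider. Function names are hypothetical.

```python
def predict_step1(level1, intercept=70.0, slope=0.32):
    """Point estimate from the illustrative (not canonical) linear model."""
    return intercept + slope * level1


def prediction_band(level1, see=9.0, n_se=1.0):
    """Rough band of +/- n_se standard errors around the point estimate."""
    center = predict_step1(level1)
    return center - n_se * see, center + n_se * see


for level1 in (400, 500, 600):
    low68, high68 = prediction_band(level1, n_se=1)
    low95, high95 = prediction_band(level1, n_se=2)
    print(
        f"Level 1 = {level1}: predicted {predict_step1(level1):.0f}, "
        f"~68% {low68:.0f}-{high68:.0f}, ~95% {low95:.0f}-{high95:.0f}"
    )
```

Reporting the band alongside the point estimate keeps the uncertainty visible instead of hiding it behind a single "converted" number.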
The same logic applies to Level 2-CE vs Step 2 CK:
- A Level 2-CE of 550 may predict something like Step 2 ≈ 250 ± 8–10.
- At scale, that is good enough for trend lines.
- For an individual student, you could easily land at 240 or 260 with that same COMLEX.
So as a data analyst: I treat conversion charts as rough percentile translators, not score guarantees. Programs that use them as hard filters are over‑interpreting noisy data.
4. Comparing Performance Curves: Percentiles, Not Raw Scores
If you want a sane way to mentally map COMLEX and USMLE, stop chasing one magic formula and start thinking in percentiles and SDs.
Standardizing both exams
Take z‑scores:
- For COMLEX Level 1: z_COMLEX = (Score – 500) / 90
- For Step 1: z_USMLE = (Score – 230) / 19
If an exam is roughly normal:
- z = 0 → 50th percentile
- z = +1 → ~84th percentile
- z = +2 → ~97.5th percentile
So if you score:
- Level 1 = 680
- z = (680 – 500)/90 ≈ +2.0 → ~97.5th percentile
- Step 1 = 268
- z = (268 – 230)/19 ≈ +2.0 → same percentile
That is a much more honest equivalence: your location in the distribution is analogous, even if the raw numbers differ.
Let’s map a few useful anchor points.
| Percentile | COMLEX Level 1 (μ=500, σ=90) | USMLE Step 1 (μ=230, σ=19) |
|---|---|---|
| ~5th | ~350 | ~200 |
| ~16th | ~410 | ~211 |
| 50th | 500 | 230 |
| ~84th | ~590 | ~249 |
| ~95th | ~650 | ~261 |
| ~97.5th | ~680 | ~268 |
Are these exact for every cohort? No. But they are directionally correct.
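Anchor points like these can be regenerated for any assumed mean and SD. The sketch below uses Python's standard-library `statistics.NormalDist` and treats both exams as exactly normal, which is a simplifying assumption; the parameters are the approximate ones used in the table, and `anchor_row` is an illustrative name.

```python
from statistics import NormalDist

# Assumed parameters, matching the table headers above.
COMLEX_L1 = NormalDist(mu=500, sigma=90)
STEP1 = NormalDist(mu=230, sigma=19)


def anchor_row(percentile):
    """Return (COMLEX, Step 1) scores at the given percentile under a normal model."""
    p = percentile / 100
    return round(COMLEX_L1.inv_cdf(p)), round(STEP1.inv_cdf(p))


for pct in (5, 16, 50, 84, 95, 97.5):
    comlex, step1 = anchor_row(pct)
    print(f"{pct}th percentile: COMLEX ~{comlex}, Step 1 ~{step1}")
```

The computed values land within a few points of the rounded table entries; the small gaps come from rounding z-values to tidy numbers like 1 and 2.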
So when someone asks, “Is a 600 on COMLEX good?” the statistically honest reply is:
- Mean ≈ 500, SD ≈ 90 → z ≈ (600–500)/90 ≈ +1.1 → roughly 86th–88th percentile.
- That is roughly like scoring around 250 on old Step 1 in terms of relative standing.
Programs that understand distributions think roughly this way, whether explicitly or not.
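Going the other direction, from a single score to its relative standing and a same-percentile score on the other exam, is a matter of matching z-scores. This is a sketch under the same normality assumption and approximate parameters; it is a percentile translation, not a regression prediction, and the function names are hypothetical.

```python
from statistics import NormalDist

# Assumed parameters from the text; not official score-report values.
COMLEX_L1 = NormalDist(mu=500, sigma=90)
STEP1 = NormalDist(mu=230, sigma=19)


def percentile(score, dist):
    """Percentile rank of a score under a normal model."""
    return 100 * dist.cdf(score)


def same_percentile_score(score, src, dst):
    """Score on dst with the same z-score (and so the same percentile) as on src."""
    z = (score - src.mean) / src.stdev
    return dst.mean + z * dst.stdev


score = 600
print(f"Level 1 {score}: ~{percentile(score, COMLEX_L1):.0f}th percentile")
print(f"Same-percentile Step 1: ~{same_percentile_score(score, COMLEX_L1, STEP1):.0f}")
```

Because this matches locations in two distributions, it says nothing about how one individual would actually score on the other exam; that is what the prediction intervals in section 3 are for.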
5. How DO vs MD Training Pipelines Interact with These Curves
Now, the pragmatic piece you care about: what all this means for DO vs MD choices and exam strategy.
5.1 DO applicants and dual‑exam behavior
Historically, a substantial subset of DO students, especially those gunning for competitive or previously MD‑dominant specialties, sat for both COMLEX and USMLE.
You see selection bias here:
- Higher‑achieving DO students are more likely to take USMLE at all.
- So the observed correlations between exams are often based on the upper half of the COMLEX distribution, not a full cross‑section.
That inflates optimism. People see a bunch of Level 1 > 600 / Step 1 > 250 pairings and assume the relationship holds identically at 450 or 400. It does not. The lower the score, the more noise and confounders creep in.
5.2 Program behavior: filters and pseudo‑conversions
Three common patterns from residency data and PD surveys:
A subset of historically MD‑heavy programs still want a USMLE number because:
- Their historical filters, risk models, and internal benchmarks all live on the USMLE scale.
- They trust their intuition on what “240 vs 250 vs 260” means in their applicant pool.
- Many faculty have zero intuitive feel for “Level 1 550 vs 650” without extra translation work.
Some programs will approximate:
- If they see only COMLEX, they might use a rough conversion or percentile table to mentally map.
- Think in patterns: “Level 1 around 500 → average Step‑equivalent; 600 → strong; 650+ → top decile.”
- But many PDs openly admit they do this inconsistently and with discomfort.
A minority of programs still explicitly or implicitly penalize DO applicants without USMLE:
- Not always out of malice. Often just inertia: their filters are built around USMLE data.
- From an analytics standpoint, they have more historical predictive validity linked to USMLE than COMLEX.
So: if you are a competitive DO student aiming for historically competitive specialties at legacy MD institutions, the data strongly favor taking USMLE Step 2 CK at minimum, even in the post–Step 1 pass/fail world.
6. Step 1 Pass/Fail Changed the Game—but Not the Curve Logic
USMLE Step 1’s move to pass/fail removed a major quantitative signal. Programs lost a high‑variance metric that used to do a lot of their early applicant triage.
Statistically, what happened:
- One strong continuous variable (Step 1 score) was replaced by a binary (Pass/Fail) with limited information value.
- Programs shifted weight toward:
  - Step 2 CK score (now the main numeric signal)
  - Class rank / quartiles
  - School reputation
  - Research output
  - Narrative components (with all the inherent subjectivity)
For DO students, this cut both ways:
- Less penalty for slightly lower Step 1 numerics. A pass is a pass.
- More pressure on Step 2 CK as the main cross‑platform performance metric.
- COMLEX Level 1 remains required for DOs. Step 2‑equivalent comparisons (Level 2-CE vs Step 2) now matter more.
So instead of obsessing over old Step 1 conversion curves, your attention should be on Level 2-CE ↔ Step 2 CK relationships.
| Category | Value |
|---|---|
| Level 2-CE Mean | 500 |
| Level 2-CE SD | 90 |
| Step 2 CK Mean | 245 |
| Step 2 CK SD | 15 |
Same normalization story:
- Level 2-CE: μ ≈ 500, σ ≈ 90
- Step 2 CK: μ ≈ 245, σ ≈ 15
So a 600 on Level 2-CE (z ≈ +1.1) still approximates a Step 2 CK of about 260–262 in percentile space.
Residency programs that understand statistics will increasingly benchmark DOs using:
- Step 2 CK if present (preferably)
- Otherwise, Level 2-CE plus some internal adjustment / percentile mapping
But there is no sign that programs have collectively mastered this yet. Human heuristics lag behind policy changes.
7. Practical Strategy: How to Use the Curves for Decision‑Making
Let me strip this down to actionable, data‑driven rules of thumb.
7.1 If you are premed deciding DO vs MD
You care about three questions:
Can DO students reach the same performance percentiles as MD students on USMLE?
- Data from cohorts that take both show: yes, many DOs match or outperform MD peers at equivalent COMLEX percentiles.
- The limiting factor is usually resources, advising, and self‑selection, not raw exam potential.
Will COMLEX‑only performance handicap you for some residencies?
- For the most competitive fields and big‑name academic centers, yes, if you never take USMLE.
- For primary care, many IM, peds, FM, psych programs: a strong COMLEX alone is often acceptable.
Is DO inherently “less”?
- Statistically: no, not if you normalize by exam percentiles and control for selection.
- But structurally: you are entering a system where some gates are still fitted to USMLE keys.
7.2 If you are a DO student planning exams
Use the distributions intelligently.
Interpret your COMLEX in SD and percentile space:
- <400: at or below passing; you are in trouble for any test‑heavy path and probably should not add USMLE until you fix fundamentals.
- 400–500: below or around mean. Only worth adding USMLE if you have strong evidence your knowledge and test‑taking are improving and you need that signal for a particular specialty.
- 500–550: roughly 50–70th percentile. USMLE may help, but do not expect a miracle; predicted Step will cluster near the mean.
- 550–600: roughly 70th–87th percentile (with μ ≈ 500, σ ≈ 90). This is where a USMLE attempt starts to make strategic sense for competitive programs.
- 600+: roughly 87th percentile and up. You are exactly the kind of candidate who can post a strong Step 2 CK and materially change how MD‑heavy programs perceive you.
Understand prediction intervals:
If your Level 2-CE suggests a 250 Step 2 CK equivalent, treat your realistic range as about 240–260. Plan assuming mid‑range, not the upper bound.
Be honest about your trajectory:
- If your question bank performance is climbing (e.g., NBME or UWorld self‑assessments trending up by 10–15 points over a month), your probability of overshooting a regression‑based prediction increases.
- If your practice scores are stable or falling, expect to underperform any optimistic COMLEX→USMLE equation.
8. Why Programs Struggle with COMLEX Interpretation
From a data systems perspective, residency programs face three problems with COMLEX:
Less historical data: An internal data set of 1,000 USMLE scores vs Match outcomes is more robust than 80 COMLEX scores.
Non‑intuitive scale: Faculty who trained pre‑COMLEX normalization have no mental map for “680 vs 720” but have 20 years of gut feel for “240 vs 260”.
Noise in conversions: When programs try to use a linear regression to project USMLE equivalents, they import error. A COMLEX 500 could map to anything from about 215 to 245 in real‑world performance.
Some PDs solve this by:
- Only using COMLEX pass/fail as a screen and then relying heavily on other metrics.
- Or informally “chunking” COMLEX into bins:
  - <450: concern
  - 450–520: average
  - 520–600: solid
  - >600: strong
Which, again, is basically percentile thinking.
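The informal binning can be written down explicitly. This is a minimal sketch of the bins listed above; which bin a boundary score such as 520 or 600 falls into is an assumption here, since the list does not specify it, and `comlex_bin` is a hypothetical helper rather than anything a program is known to use.

```python
def comlex_bin(score):
    """Map a COMLEX score to the informal bins described above.

    Boundary handling (450 -> average, 520 and 600 -> solid) is an assumption.
    """
    if score < 450:
        return "concern"
    if score < 520:
        return "average"
    if score <= 600:
        return "solid"
    return "strong"


for score in (430, 500, 560, 640):
    print(score, comlex_bin(score))
```

Coarse bins like these discard information inside each band, which is part of why they tolerate noisy conversions better than a point-by-point mapping does.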
If you are a DO with a strong COMLEX score, part of your job on the application side is to translate yourself. Not with fake conversion charts in your personal statement, but with concrete evidence of performance: research productivity, class rank, AOA/Gold Humanism, strong narrative letters, and if you took USMLE, a coherent story tying everything together.
9. A Clear‑Eyed Summary
Strip away the noise and you get three blunt facts.
COMLEX and USMLE are both normalized, curved exams with roughly normal distributions; their numeric scales are different but their percentile structures are comparable. A COMLEX score 1 SD above the mean (590) lives in about the same performance band as a Step 1 around 249 or Step 2 CK around 260.
Correlations between COMLEX and USMLE scores are strong (r around 0.7–0.8) but not perfect. That means you can use COMLEX to estimate a likely range for USMLE, yet individual outcomes swing ±10–15 points. Any rigid conversion chart that pretends otherwise is statistically dishonest.
For DO students, the data support a simple rule: if your COMLEX scores place you well above average (≥75th–80th percentile), sitting for USMLE—especially Step 2 CK—materially improves your signaling power to competitive, historically MD‑dominant programs. Below that band, the benefit is smaller and the risk‑reward equation becomes far more individualized.
If you treat these exams as distributions, not magic numbers, your decisions about DO vs MD, COMLEX vs USMLE, and specialty targeting become a lot cleaner.