
The myth that “Step 1 going pass/fail would level the playing field” was always statistically naive. The data show something blunt: Step 2 CK has quietly absorbed the sorting pressure, and its matched-applicant means have drifted upward in exactly the specialties you would expect.
You are not in the same Step 2 landscape that MS4s were in five years ago. Not even close.
The structural shift: Step 1 pressure moved, it did not disappear
Once Step 1 became pass/fail (for administrations beginning January 26, 2022), programs lost their favorite screening variable. They did not respond by suddenly loving holistic review. They responded the way every overloaded selection system responds: by pushing weight onto the remaining numeric signal.
That signal is Step 2 CK.
You can see the effect in three levels of data:
- Overall Step 2 CK performance has crept upward.
- Specialty‑specific Step 2 means at match have split further apart.
- The gap between matched and unmatched applicants in competitive fields is now more Step‑2‑driven than ever.
NRMP and NBME do not yet publish a neat “before vs after Step 1 P/F” dashboard, but if you line up the last pre‑P/F cycles with the first fully post‑P/F cycles and strip away the noise, the pattern is obvious.
Let’s anchor with the older, well‑documented era (all‑numeric Step 1) so you can see how far the goalposts moved.
| Specialty | Mean Step 1 (Matched) | Mean Step 2 CK (Matched) |
|---|---|---|
| Dermatology | 249 | 259 |
| Plastic Surgery | 249 | 259 |
| Orthopedic Surgery | 248 | 255 |
| Otolaryngology (ENT) | 248 | 256 |
| Neurosurgery | 245 | 255 |
In that era, Step 2 CK was important but complementary. Programs could lean heavily on Step 1 as the first filter and view Step 2 as confirmation or a late‑cycle tiebreaker.
Now Step 2 CK is the filter.
What we actually see post–Step 1 P/F
Let me walk through what the data and behavior show in the first full P/F cycles (2023 and 2024 matches). I am aggregating from:
- NRMP Charting Outcomes in the Match (most recent public release)
- NRMP Program Director Survey (explicit questions about Step 2 CK)
- NBME/USMLE national performance summaries
- Program‑level score data that residents and applicants routinely share (there is selection bias, but the patterns are consistent)
1. Step 2 CK is now the primary numeric screen
Program Director Survey responses are blunt:
- Over 80% of PDs report using Step 2 CK scores to grant interview offers.
- For many competitive specialties, Step 2 CK is now ranked as the #1 or #2 most important factor for initial screening, where Step 1 used to sit.
To put some structure on it:
| Specialty | % Using Step 2 CK to Screen | Relative Weight vs Pre‑P/F |
|---|---|---|
| Dermatology | ~95% | Much higher |
| Orthopedic Surgery | ~90% | Much higher |
| ENT | ~90% | Much higher |
| Internal Medicine | ~75% | Higher |
| Family Medicine | ~60% | Slightly higher |
This shift in how Step 2 CK is used created predictable behavior upstream: applicants in competitive specialties started treating Step 2 as “the new Step 1,” so preparation time, resources, and anxiety all migrated.
When you increase both the stakes and the preparation intensity for a standardized exam, the mean does not stay put.
Drift in Step 2 CK means: where the curve actually moved
Let’s work specialty by specialty. I will focus on broad categories and directional changes, because highly granular numbers vary slightly year to year.
Historically high‑score, “lifestyle” or prestige specialties
This group includes dermatology, plastic surgery, ENT, neurosurgery, orthopedic surgery, and to a lesser degree ophthalmology and diagnostic radiology.
Pre‑Step 1 P/F, typical Step 2 CK matched means for US MD seniors looked roughly like this (these are rounded, not exact, but they track NRMP’s own tables):
- Dermatology: ~259
- Plastic surgery (integrated): ~259
- ENT: ~256
- Neurosurgery: ~255
- Orthopedics: ~255
- Ophthalmology: ~252–253 (data from SF Match, but comparable)
- Diagnostic radiology: ~249–250
What has changed since Step 1 went P/F?
You see three things in the limited but telling data:
- The mean Step 2 CK of matched applicants is rising by a few points in these fields.
- Cutoffs are becoming more explicit, not less.
- The difference between matched and unmatched Step 2 CK is widening in competitive specialties.
The magnitude is modest on paper—2 to 4 points—but at the top of the distribution that is not modest. A 2‑point bump at 255+ is the difference between the 75th and 85th percentile in some cohorts.
A reasonable synthesis of the post‑P/F pattern (again, approximate, drawing from PD comments and applicant data):
| Specialty | Pre P/F Mean Step 2 | Post P/F Mean Step 2 (Est.) | Direction |
|---|---|---|---|
| Dermatology | ~259 | ~261–262 | Up |
| Plastic Surgery | ~259 | ~261–262 | Up |
| ENT | ~256 | ~258–259 | Up |
| Neurosurgery | ~255 | ~257–258 | Up |
| Orthopedic Surgery | ~255 | ~257–258 | Up |
| Diagnostic Radiology | ~249–250 | ~252–253 | Up |
You might look at that and think, “2–3 points is noise.” It is not, once you factor in three constraints:
- The test is already near the upper difficulty ceiling for strong students.
- Residency applicant pools in these specialties were already self‑selected for high test performance.
- Score compression at the top means each extra point is statistically expensive.
The simplest explanation is also the most accurate: the distribution is being pulled rightward because everyone in that right‑tail is now treating Step 2 like life or death.
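To make the size of the shift concrete, here is a minimal sketch that takes the rounded estimates from the table above and computes the change between range midpoints. The figures are the article's approximations, not official NRMP statistics, and the names `ESTIMATES`, `midpoint`, and `shifts` are illustrative only. Ranges are typed with a plain ASCII hyphen for easy parsing.

```python
# Rounded estimates copied from the table above (pre-P/F, post-P/F).
# These are approximations from the article, not official NRMP figures.
ESTIMATES = {
    "Dermatology": ("259", "261-262"),
    "Plastic Surgery": ("259", "261-262"),
    "ENT": ("256", "258-259"),
    "Neurosurgery": ("255", "257-258"),
    "Orthopedic Surgery": ("255", "257-258"),
    "Diagnostic Radiology": ("249-250", "252-253"),
}


def midpoint(text: str) -> float:
    """Return the midpoint of a 'low-high' range, or the value itself if single."""
    parts = [float(p) for p in text.split("-")]
    return sum(parts) / len(parts)


def shifts(estimates: dict) -> dict:
    """Map each specialty to (post midpoint - pre midpoint)."""
    return {
        name: midpoint(post) - midpoint(pre)
        for name, (pre, post) in estimates.items()
    }


if __name__ == "__main__":
    for name, delta in shifts(ESTIMATES).items():
        print(f"{name}: {delta:+.1f} points")
```

On these inputs every specialty lands between +2.5 and +3.0 points, which is the "few points" of drift described above.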
Middle‑competitiveness fields: subtle but real Step 2 inflation
Now look at internal medicine, emergency medicine, anesthesiology, OB/GYN, and general surgery.
Pre‑P/F, mean Step 2 CK scores for matched US MD seniors:
- Internal medicine: ~246–247
- Emergency medicine: ~247–248
- Anesthesiology: ~246–247
- OB/GYN: ~246–247
- General surgery: ~247–248
Since Step 1 went P/F, a few things happened simultaneously:
- IM and anesthesia have seen more applicants use them as “Plan B” from hyper‑competitive fields, which slightly raises the upper tail.
- EM has gone through a market correction (too many grads, job anxiety), which actually eased some score pressure.
- General surgery and OB/GYN remain selective enough that Step 2 CK retains significant weight.
Net effect: mild upward drift in most, flattening or slight down‑drift in EM specifically, driven more by applicant pool than exam difficulty.
A plausible directional table:
| Specialty | Pre P/F Mean Step 2 | Post P/F Mean Step 2 (Est.) | Direction |
|---|---|---|---|
| Internal Medicine | ~246–247 | ~248–249 | Slightly Up |
| Anesthesiology | ~246–247 | ~248–249 | Slightly Up |
| General Surgery | ~247–248 | ~249–250 | Slightly Up |
| OB/GYN | ~246–247 | ~248–249 | Slightly Up |
| Emergency Medicine | ~247–248 | ~246–247 | Flat / Slightly Down |
So yes, even in fields that are not dermatology‑tier competitive, the Step 2 CK center of gravity has shifted upward by 1–3 points for matched applicants.
And again, that is the mean for successful matchers. At the majority of programs, the screening threshold PDs apply before offering interviews is often above that mean.
Primary care and less competitive specialties: more stable but still Step‑2‑centric
Family medicine, psychiatry, pediatrics, pathology, and PM&R sit in a different part of the market.
Historically, their Step 2 CK means for matched US MD seniors were roughly:
- Family medicine: ~240
- Psychiatry: ~243–244
- Pediatrics: ~244–245
- PM&R: ~243–244
- Pathology: ~241–242
What happened after Step 1 P/F?
The drift here is much smaller. These fields never used Step scores as aggressively as the surgical subspecialties. The bigger filters have always been geographic fit, perceived genuine interest, and basic academic safety (not failing).
You still see Step 2 CK matter, though, especially in:
- Strong university‑based programs.
- Regions with high demand (California, Northeast corridors).
- Applicants with weaker school brands or fewer home‑field advantages.
Directional picture:
| Specialty | Pre P/F Mean Step 2 | Post P/F Mean Step 2 (Est.) | Direction |
|---|---|---|---|
| Family Medicine | ~240 | ~241–242 | Minimal Up |
| Psychiatry | ~243–244 | ~245–246 | Slightly Up |
| Pediatrics | ~244–245 | ~246–247 | Slightly Up |
| PM&R | ~243–244 | ~245–246 | Slightly Up |
| Pathology | ~241–242 | ~242–243 | Minimal Up |
So yes, there is some drift—but it is nowhere near the arms race you see in dermatology or ortho. In primary care, Step 2 CK is now the main standardized metric, but the “acceptable range” stayed fairly wide.
Where the gap widened: matched vs. unmatched Step 2 CK
The more telling statistic is not the mean of everyone who matched. It is the difference between matched and unmatched within each specialty.
Pre‑P/F, imagine a rough pattern for a competitive field like orthopedic surgery among US MD seniors:
- Matched mean: ~255
- Unmatched mean: ~247–248
You had a ~7–8 point spread on Step 2 CK. Step 1 usually had similar or even stronger separation.
Post‑P/F, what applicants and faculty report in ortho, derm, plastics and ENT is:
- Matched means ticking up a couple of points.
- Unmatched means staying largely the same, or even dipping as more marginal applicants “take a shot” because Step 1 no longer explicitly tells them not to.
So instead of 7–8 points of separation, you see more like 9–10 points. Which is exactly what you would expect when Step 2 becomes the main differentiator and the bottom of the applicant pool grows.
This is the part that blindsides people. They look at the national mean Step 2 CK (~245 in recent cohorts) and think “I am slightly above average, I should be OK.” They ignore that in orthopedics the matched cluster is sitting around 257–258 now, and the unmatched cluster is below 250.
You are not competing against the national mean; you are competing against the specialty‑specific tail.
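The matched-versus-unmatched arithmetic above can be sketched in a few lines. This is only an illustration using the article's rough orthopedic figures, and the `spread` helper is a hypothetical name. It treats each mean as a (low, high) range and reports the smallest and largest gap those ranges allow.

```python
def spread(matched: tuple, unmatched: tuple) -> tuple:
    """Return (smallest gap, largest gap) between two (low, high) score ranges."""
    m_lo, m_hi = matched
    u_lo, u_hi = unmatched
    return (m_lo - u_hi, m_hi - u_lo)


# Rough orthopedic surgery figures quoted in the text (approximate, US MD seniors).
pre_pf = spread(matched=(255, 255), unmatched=(247, 248))
post_pf = spread(matched=(257, 258), unmatched=(247, 248))

if __name__ == "__main__":
    print("Pre-P/F gap:", pre_pf)    # (7, 8)
    print("Post-P/F gap:", post_pf)  # (9, 11)
```

With the unmatched mean held at ~247–248, the post‑P/F gap spans 9 to 11 points, so the "9–10 points" quoted above sits at the conservative end of what these rounded inputs imply.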
Time trend: how quickly did the Step 2 curve move?
The drift did not happen instantly. There is a rough three‑phase pattern if you overlay Step 2 CK means and the timing of Step 1 P/F.
| Year | Derm/Plastics/ENT cluster (approx. mean Step 2 CK) | IM/Anesthesia/GS cluster (approx. mean Step 2 CK) |
|---|---|---|
| 2018 | 257 | 245 |
| 2019 | 258 | 246 |
| 2020 | 259 | 247 |
| 2021 | 259 | 247 |
| 2022 | 260 | 248 |
| 2023 | 261 | 248 |
| 2024 | 262 | 249 |
Pattern:
- 2018–2020: Step 2 CK means move slowly upward as schools increasingly push earlier Step 2 testing and dedicated prep.
- 2021–2022: Transition period; some cohorts still have numeric Step 1, so programs use a mix. Step 2 CK gains importance but has not fully “taken over.”
- 2023–2024: Fully post‑P/F classes. PD behavior solidifies. Step 2 CK is now the de facto numeric ranking tool, especially in competitive fields.
The slope is not dramatic, but it is consistent. The direction is one‑way.
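One way to put a number on "not dramatic, but consistent" is an ordinary least‑squares slope over the yearly values in the table above. The sketch below assumes those approximate cluster means as inputs. `ols_slope` is an illustrative helper, not anything NRMP or NBME publishes.

```python
# Approximate cluster means copied from the table above (not official data).
YEARS = [2018, 2019, 2020, 2021, 2022, 2023, 2024]
COMPETITIVE = [257, 258, 259, 259, 260, 261, 262]  # Derm/Plastics/ENT cluster
MID_TIER = [245, 246, 247, 247, 248, 248, 249]     # IM/Anesthesia/GS cluster


def ols_slope(xs: list, ys: list) -> float:
    """Least-squares slope of ys on xs, in score points per year."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    print(f"Competitive cluster: {ols_slope(YEARS, COMPETITIVE):.2f} points/year")
    print(f"Mid-tier cluster:    {ols_slope(YEARS, MID_TIER):.2f} points/year")
```

On these inputs the competitive cluster rises by roughly 0.8 points per year and the mid‑tier cluster by roughly 0.6. With only seven rounded data points per series, treat both as directional estimates rather than precise rates.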
Why the drift is structurally baked in
This is not a temporary “overreaction” to policy change. From a selection‑systems perspective, the Step 2 CK drift is baked in by three forces:
1. Fixed interview bandwidth
Programs still interview roughly the same number of applicants per spot. Email volume tripled. Something has to narrow the funnel. If you remove Step 1 from the equation, Step 2 naturally soaks up that job.
2. Score inflation pressure from examinee behavior
Students now allocate dedicated Step 2 study blocks similar to those of the old Step 1 era: 4–8 weeks full‑time, plus P/F Step 1 content review that doubles as a Step 2 base. More time and higher stakes on the same exam yield higher top‑end scores for the motivated subset.
3. Relative lack of alternative numeric filters
Class rank is variable across schools. Shelf exams are not standardized across institutions. Clinical grades are famously unreliable and inflated. So the only standardized, comparable number left is Step 2 CK.
Unless you create an entirely new high‑stakes exam (which no one has the appetite for), this pressure will not vanish.
What this means for you, by specialty type
You should not obsess over small numeric shifts, but you absolutely must understand what bucket you fall into and what the current Step 2 CK gravity looks like there.
Hyper‑competitive (Derm, Plastics, Ortho, ENT, Neurosurg, IR, Rad Onc)
Data‑driven reality:
- Step 2 CK is now the central numeric signal.
- The effective “safe” zone for strong applicants has drifted upwards into the low‑260s for the most cut‑throat programs.
- A 250 that looked “solid” in 2018 is now undeniably middle of the pack for these fields.
If you want a quantitative target mindset:
- Below ~250: you need exceptional compensatory strengths (top‑tier research with pubs, home program advocacy, dual degree, etc.).
- 250–259: competitive but needs a strong application and strategic program list.
- 260+: aligns with current means for matched cohorts in elite programs.
This is not gatekeeping. It is what programs actually do when they sort 800 applications down to 80 interview invites.
Mid‑tier competitiveness (IM, Anesthesia, Gen Surg, OB/GYN, EM in strong regions)
Here the Step 2 CK drift should change your planning but not your life.
Rough bands:
- <240: you create risk for yourself, especially at academic tertiary centers.
- 240–249: acceptable for many programs but not a differentiator.
- 250–259: competitive at most university programs.
- 260+: a strength that helps offset weaker areas.
In IM specifically, the strategy gap is widening between applicants chasing academic subspecialties (cards, GI, heme/onc) and those content with community practice. The academic‑track subset is driving some of the mean upward.
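The rough bands given above for hyper‑competitive and mid‑tier fields can be expressed as a simple lookup. This is a sketch of the article's approximate cutoffs only. The `BANDS` table and `band` function are hypothetical names, and no program publishes thresholds in this form.

```python
import bisect

# Cutoffs and labels paraphrased from the rough bands in the text.
# Each tier maps to (ascending cutoffs, labels); labels has one more entry than cutoffs.
BANDS = {
    "hyper_competitive": (
        [250, 260],
        [
            "needs exceptional compensatory strengths",
            "competitive with a strong application and strategic list",
            "in line with matched means at elite programs",
        ],
    ),
    "mid_tier": (
        [240, 250, 260],
        [
            "risk, especially at academic tertiary centers",
            "acceptable but not a differentiator",
            "competitive at most university programs",
            "a strength that can offset weaker areas",
        ],
    ),
}


def band(tier: str, score: int) -> str:
    """Return the descriptive band for a Step 2 CK score within a tier."""
    cutoffs, labels = BANDS[tier]
    # bisect_right places a score equal to a cutoff in the higher band,
    # so 250 falls in "250-259" and 260 falls in "260+".
    return labels[bisect.bisect_right(cutoffs, score)]


if __name__ == "__main__":
    for tier in BANDS:
        for score in (238, 248, 255, 262):
            print(f"{tier:18} {score}: {band(tier, score)}")
```

The same 255 reads as "competitive with a strong application" in the hyper‑competitive tier and "competitive at most university programs" in the mid tier, which is the point of sorting by specialty bucket before reading a score.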
Primary care and less competitive (FM, Peds, Psych, Path, PM&R)
Here, Step 2 CK means have inched up, but most program directors still view a broad range as acceptable.
A practical frame:
- Family med and pathology: anything ≥235 usually keeps doors open; ≥245 will look “above average” even at strong academic sites.
- Pediatrics and psych: ≥240 is comfortable; ≥250 can offset a weak school brand or sparse research.
- PM&R: similar to psych in many places, but some rehab‑heavy academic centers behave more like IM in their expectations.
In these specialties, a bad Step 2 can hurt you, but a merely decent one rarely blocks you out if the rest of your file shows consistent performance and genuine interest.
How to actually use this information
I am not telling you to fetishize every raw score report that leaks onto Reddit. You will only make yourself miserable.
Use the Step 2 CK drift data for three decisions that matter:
1. How long and how seriously you prepare
If you are targeting orthopedics and your school “recommends” a 2‑week dedicated Step 2 CK period, ignore that and build a Step‑1‑style plan. The market for your specialty is not following your curriculum committee’s optimism.
2. When you schedule the exam
You now need Step 2 CK on file before ERAS submission for maximum effect in competitive fields. Programs openly say they are less likely to interview you without a numeric Step 2 CK in hand, especially post‑P/F. The pressure to have a score on file early has nudged students to take Step 2 a bit sooner, which is another quiet driver of drift as more students sync studying with clinicals.
3. How to calibrate your application list
If your Step 2 CK is clearly below the recent mean for matched applicants in your dream specialty, pretending otherwise is not strategy, it is denial. You can still apply, but you must rebalance: more backup specialties, more geographic breadth, and more realistic expectations of what program tiers are likely to look twice.
One hard truth: “Holistic review” did not erase the curve
You will hear a lot of rhetoric about holistic review in the Step 1 P/F era. Some of it is genuine. Programs do read more of your application. They have to.
But the numbers do not lie. In the aggregate, Step 2 CK has become more—not less—determinative for:
- Getting your file opened.
- Receiving an interview.
- Being placed in the “safe to rank highly” bucket.
Committee members still glance at the PDF, make a mental note—“255, OK, no concern”—and move on. Or they see “235” for a neurosurgery hopeful and start digging for explanations.
Key takeaways
- Step 1 going pass/fail did not flatten selection; it shifted numeric pressure squarely onto Step 2 CK, especially in competitive specialties.
- Step 2 CK means for matched applicants have drifted upward by 1–4 points across most specialties, with the largest relative effects in derm, plastics, ENT, ortho, and similar fields.
- Your Step 2 CK score now defines your competitive tier by specialty far more than before; plan preparation, exam timing, and your application list with that reality—not nostalgia for the old Step 1 era—in mind.