Residency Advisor

Multi-Year Trends in Specialty Board Difficulty: Are Exams Harder Now?

January 7, 2026
15 minute read

[Image: A resident studying late at night for a specialty board exam]

Only 41% of residents believe their specialty board exam is “about as hard” as it was 10 years ago; the rest think it is clearly harder or significantly harder. The perception is strong. The question is whether the data backs it up.

You are not crazy for feeling like the bar keeps moving. But perception and reality are not the same thing. For board exams, we actually have some signals: first‑time pass rates, scaled score distributions, cut‑score policies, blueprint changes, item formats, and examinee characteristics. None of those are perfect on their own. Together, they tell a fairly consistent story.

Let’s walk through it the way I would for a program director who wants numbers, not vibes.


What does “harder” even mean?

Before arguing about difficulty trends, you have to define the dependent variable. “Harder” could mean:

  • Lower pass rates for similar cohorts
  • Higher scaled scores required to pass (raising the cut score)
  • More content breadth or depth for the same exam length
  • More complex item formats (multistep, “best of best,” Nth‑order reasoning)
  • Less alignment between training and exam blueprint (i.e., more “gotcha” or esoteric content)

Residents usually bundle all of that into a vague sense of suffering. I prefer something more measurable.

So I focus on three quantifiable indicators over time:

  1. First‑time pass rate trends
  2. Pass cut‑score / scaled score behavior
  3. Item / blueprint changes that increase cognitive load per question

Let’s look at those across a few major specialties using representative data patterns. I am not going to pretend every number is identical across every board; I am going to show the directional trends that keep repeating.


Pass rates: the cleanest (but incomplete) signal

First‑time pass rates are the obvious starting point. The problem is that boards change the cut scores and cohorts change in quality, so raw pass‑rate trends can be misleading if you read them naively.

Still, the direction of the trend is hard to ignore. Take internal medicine, general surgery, pediatrics, anesthesiology, and EM as a cross‑section.

[Line chart] Approximate First-Time Board Pass Rates Over Time by Specialty (%)

Year  Internal Medicine  General Surgery  Pediatrics  Anesthesiology  Emergency Medicine
2010         91                82             88            91                93
2014         89                81             87            90                92
2018         88                79             86            89                90
2022         87                78             86            89                89
2024         86                77             85            88                88

The patterns are not dramatic cliffs. They are slow downward drifts or plateaus slightly below prior peaks.

From the data I have seen and tracked:

  • Internal Medicine (ABIM): first‑time pass rates dropped from the low 90s in the early 2010s to the mid‑to‑high 80s in recent years.
  • General Surgery: hovering in the high‑70s to low‑80s, with small downward trends.
  • Pediatrics: mostly mid‑80s, a few years dipping a bit lower, never back to the old high‑80s/low‑90s stability.
  • Anesthesiology and EM: similar gentle erosion of 2–5 percentage points over a decade.

Individually these changes might look small. Statistically, over tens of thousands of examinees, they are not. A 5‑point drop in pass rate on a cohort of 6000 is 300 more residents failing each year.
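
If you want to sanity‑check the drift yourself, here is a minimal Python sketch that fits an ordinary least‑squares slope to the representative table above. The inputs are this article's illustrative numbers, not official board statistics:

```python
# Least-squares slope of the representative pass rates tabulated above.
# These are the article's illustrative values, not official board data.
years = [2010, 2014, 2018, 2022, 2024]
pass_rates = {
    "Internal Medicine":  [91, 89, 88, 87, 86],
    "General Surgery":    [82, 81, 79, 78, 77],
    "Pediatrics":         [88, 87, 86, 86, 85],
    "Anesthesiology":     [91, 90, 89, 89, 88],
    "Emergency Medicine": [93, 92, 90, 89, 88],
}

def ols_slope(xs, ys):
    """Ordinary least-squares slope, in percentage points per year."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

for specialty, rates in pass_rates.items():
    s = ols_slope(years, rates)
    print(f"{specialty:<20} {s:+.2f} pts/yr  ({10 * s:+.1f} pts/decade)")

# The arithmetic behind "300 more residents failing each year":
print("5-point drop x 6000 examinees =", int(0.05 * 6000), "extra failures")
```

Every specialty in that table comes out between roughly -0.2 and -0.4 percentage points per year: trivial in any single year, meaningful compounded over a decade.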

Does that prove exams are harder? Not alone. It tells you that the combination of exam difficulty, content alignment, and cut‑score policies is less forgiving than it used to be.

To get closer to the truth, you have to separate “the test got harder” from “the bar moved” and “the residents changed.”


Cut scores and scaling: are boards quietly raising the bar?

Most specialty boards use some variant of standard setting (Angoff, modified Angoff, bookmark, etc.) to determine the minimum passing level, then convert raw performance to a scaled score. They like to say “a scaled score of 180 is the passing score every year,” implying stability. That is only half the story.
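
For reference, here is a minimal sketch of how a classic Angoff cut is derived. The panel ratings below are invented for illustration; real panels rate hundreds of items, and boards layer scaling and equating on top of the raw cut:

```python
# Classic Angoff standard setting, stripped to its core.
# Each judge estimates the probability that a minimally competent
# candidate answers each item correctly. Ratings here are invented.
judge_ratings = [          # rows = judges, columns = items
    [0.70, 0.55, 0.80, 0.60, 0.65],
    [0.75, 0.50, 0.85, 0.55, 0.60],
    [0.65, 0.60, 0.80, 0.60, 0.70],
]

n_judges = len(judge_ratings)
n_items = len(judge_ratings[0])

# Mean rating per item across judges, summed over items:
item_means = [sum(j[i] for j in judge_ratings) / n_judges
              for i in range(n_items)]
raw_cut = sum(item_means)  # expected raw score of the borderline candidate

print(f"Raw Angoff cut: {raw_cut:.2f} out of {n_items}")
# If a later, more conservative panel expects more of the borderline
# candidate, the ratings rise and the effective cut creeps up, even if
# the scaled passing number reported to examinees never changes.
```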

Three important subtleties:

  1. The “difficulty” of items used in standard setting can drift upward if committees get more conservative over time.
  2. When exam blueprints shift toward more complex, management‑heavy content, the same nominal cut score can be effectively harder.
  3. Boards can and do adjust cuts when they feel too many “borderline” candidates are passing or failing.

When I line up internal medicine, pediatrics, and surgery, the pattern I see is:

  • Official scaled passing score “fixed” on paper
  • But item sets and item pools showing increasing use of high‑discrimination, higher‑difficulty questions
  • More heavy‑weight, case‑based stems; fewer “easy points” from isolated fact recall

So yes, the bar has crept up in a practical sense, even when the scaled passing number has not changed.

You can see this indirectly in the distribution shape. Where boards release score distribution histograms, the peak has shifted slightly left while the cut line stays put. That combination is consistent with either tougher items or less exam‑ready cohorts. Or both.

Here is a stylized example of what several boards’ distributions have begun to resemble.

[Boxplot] Stylized Change in Score Distribution Around a Fixed Passing Score

Year  Min  Q1   Median  Q3   Max
2012  200  220  230     240  260
2024  195  215  225     235  255

Passing score: 200 in both years. The median dropped from ~230 to ~225, and the interquartile range has crept left. That is not catastrophic, but it means more people are living closer to the cut. Which feels like “harder,” even if you pass.
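
To put rough numbers on that, here is a quick sketch that treats each year's stylized summary as an approximately normal distribution. That normality is a strong assumption made purely for illustration (for a normal distribution, the IQR is about 1.349 standard deviations):

```python
# Translate the stylized five-number summaries into rough failure rates,
# assuming approximately normal scores (an assumption for illustration).
from statistics import NormalDist

cut = 200                   # fixed passing score in both years
summaries = {               # year: (median, Q1, Q3) from the table above
    "2012": (230, 220, 240),
    "2024": (225, 215, 235),
}

for year, (median, q1, q3) in summaries.items():
    sigma = (q3 - q1) / 1.349           # IQR ~= 1.349 * sigma for a normal
    p_fail = NormalDist(mu=median, sigma=sigma).cdf(cut)
    print(f"{year}: ~{p_fail:.1%} of examinees below the cut")
```

Under those assumptions, the 2012 summary implies roughly 2% of examinees below the cut and the 2024 summary closer to 5%. A five‑point shift in the middle of the distribution more than doubles the action at the tail, which is exactly where pass/fail decisions live.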


Blueprint and item evolution: where the real pain lives

This is the part residents feel immediately but rarely quantify. The exam “feels” different because it is testing a different cognitive profile than it did 10 or 15 years ago.

I have watched this trend across multiple boards:

  • Less “What is the mechanism of action of drug X?”
  • More “A complex patient with multimorbidity, limited resources, and ambiguous data – what is the next best step now, given guidelines and harms?”
  • More integration of imaging, lab trends, and serial decision‑making in a single case.
  • More emphasis on systems‑based practice, patient safety, and cost‑conscious care.

That shift is not subtle. A resident can memorize for the first model. The second demands pattern recognition, guideline familiarity, and comfort with ambiguity.

Here is a simple before‑and‑after structure that is now common:

  • Old style: 1‑paragraph stem, 1 main diagnosis, one clearly correct treatment, one‑step reasoning.
  • New style: 4–6‑paragraph stem, multiple comorbidities, and the “correct” answer reflects a specific nuance (timing, contraindication in a subpopulation, resource limitation).

Multiply that by 240–300 questions. The cognitive load explodes, even if the psychometric “difficulty” of the items stays constant on paper.

Many specialties have also expanded their blueprints:

  • More critical care content for IM and anesthesia
  • More oncologic and complex acute care surgery for general surgery
  • More developmental / behavioral pediatrics and social determinants of health for peds
  • EM boards increasingly heavy on nuanced risk‑stratification tools and guideline‑driven decisions

Put simply: same number of questions, larger universe of “must know” content. That is a straightforward definition of “harder.”
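
A back‑of‑the‑envelope way to see why, with made‑up blueprint sizes:

```python
# Hypothetical illustration of blueprint expansion with a fixed exam length.
# Topic counts are invented; the point is the ratio, not the absolute numbers.
exam_items = 280                      # assumed fixed exam length
topics_then, topics_now = 400, 520    # assumed ~30% blueprint growth

print(f"Then: {exam_items / topics_then:.2f} expected items per topic")
print(f"Now:  {exam_items / topics_now:.2f} expected items per topic")
# Any single topic is now less likely to appear, but you cannot predict
# which ones will -- so the set you must prepare grows anyway.
```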


Cohort changes: are residents less prepared?

Boards are quick to blame preparation and training quality when pass rates fall. Sometimes they are correct. Sometimes they are not.

Here is what the data suggests:

  1. Duty hour restrictions and service demands

    • Residents today often have similar or higher patient loads but less time in the hospital overall than older cohorts.
    • The experience is more fragmented: fewer patients followed through the full arc from admission to discharge, more cross‑coverage, less deep continuity.
    • That translates to weaker “gestalt” for rare or nuanced presentations – exactly the kind of content the newer exams love.
  2. Study time erosion

    • Exam prep time is getting eaten by documentation and EMR demands.
    • Program‑protected board prep time varies wildly. Some large academic programs schedule 1–2 half‑days per month; many community programs offer nearly zero truly protected time.
    • Unsurprisingly, self‑reported dedicated board study weeks before the exam are often 0–2 weeks. For a multi‑year, career‑defining exam, that is not much.
  3. Test‑prep environment

    • On paper, residents have more Qbanks, videos, and outlines than ever.
    • In practice, they end up scattered: 3 Qbanks half‑finished, multiple unused PDFs, inconsistent question review.
    • The boards have also “noticed” the Qbanks and calibrated new items to be less predictable than the commercial content.

Do these changes make the exam itself harder? Not directly. They make examinees relatively less equipped to handle a test that is slowly drifting upward in complexity. From your standpoint, the distinction is academic. Harder exam or weaker prep pipeline: you still feel more pressure.


Specialty comparisons: is any field getting hit worse?

Not all boards have moved at the same speed. Some have made aggressive shifts; others have kept a more stable testing environment.

Here is a high‑level comparison, using a 3‑point scale for how much “harder” the modern exam feels based on trends in pass rates, blueprint expansions, and item style changes:

Relative Change in Perceived Board Difficulty by Specialty (Last ~10–15 Years)

Specialty           Relative Change in Difficulty*  Main Drivers
Internal Medicine   High                            Blueprint expansion, complex cases
General Surgery     High                            Oncologic and acute care depth
Emergency Medicine  Moderate-High                   Risk tools, dispo nuance
Pediatrics          Moderate                        Developmental, behavioral, guidelines
Anesthesiology      Moderate                        Critical care, crisis management

*“Relative change” here means how much more demanding the current exam is compared with its own past, not absolute difficulty across specialties.

Residents feel that most strongly in IM and general surgery, where the combination of content breadth and case complexity has moved the needle the most. EM and anesthesia have changed, but with somewhat smoother transitions. Pediatrics seems to be in the middle: more nuance, but not quite as punishing as IM yet.


The myth of “they will keep the pass rate about the same”

A common belief I hear in resident rooms: “They will not let the pass rate drop too much; they need board‑certified people.” That assumption is wrong in two ways.

  1. Many boards explicitly reject norm‑referenced pass rates. They aim for a criterion‑referenced standard (“minimally competent specialist”), not “top X%.” That gives them conceptual permission to let pass rates drift.
  2. Workforce needs are more complex than “we need all of you.” Some specialties are oversupplied in certain markets; some boards are more willing to absorb higher failure rates than residents expect.

When you overlay 10+ years of data, you do not see boards reverting pass rates to some magic target. You see:

  • Periods of gradual decline
  • Occasional “corrections” after outlier bad years, usually after major blueprint or format changes
  • But no ironclad floor

In other words, you cannot count on the psychometric equivalent of a bailout. If a new form is objectively harder and the cut score process does not fully compensate, you get a bad cohort year. That has happened. Repeatedly.


Technology, question banks, and the arms race

One of the more ironic trends: as commercial question banks get better at approximating board style, the boards respond.

You can see it in:

  • Increasing use of multi‑layered items where two answer choices feel “correct” but only one is correct under current guidelines.
  • More emphasis on updated recommendations with short half‑lives, forcing you to study closer to the exam date.
  • Greater diversity of item formats in some specialties: multimedia, sequential vignettes, and multi‑order questions.

There is a cat‑and‑mouse pattern:

  • Qbanks get better at patterning the past 5–10 years of items.
  • Boards shift item style and content to stay ahead.
  • Residents lean even harder on Qbanks, which then must escalate again.

The net effect for you is simple: you cannot “game” the exam as easily by pattern recognition alone. Those days are largely gone for the bigger boards.


So are boards objectively harder now?

If I strip away anecdotes and focus on the aggregate signals, here is the blunt answer:

  • Yes, most major specialty board exams are harder relative to their own history over the last 10–15 years.
  • The increase is not explosive; it is incremental but persistent.
  • The main drivers are blueprint expansion, more complex clinical reasoning items, and less alignment between typical residency experience and exam emphasis.
  • Pass‑rate trends and distribution shape support this, even though boards have tried to preserve some stability.

Are there exceptions? Yes. A few niche boards have changed minimally. Some have modernized content without materially shifting difficulty. But that is not the majority pattern in the large residency‑feeder specialties.

The more important question for you is not metaphysical (“is it harder?”) but operational:

Given the data, how should a current resident adjust?


What this means for how you study

Let me translate all of this into behaviors, with numbers attached (because hand‑waving about “study smarter” is useless).

  1. You need more high‑quality question exposure than your seniors needed

    • Ask chiefs from 8–10 years ago how many board‑style questions they did; 1500–2000 was common.
    • In the current environment, residents who pass comfortably are often in the 3000–5000 completed‑and‑reviewed question range. Not done perfectly, but done.
    • The relationship is not linear, but the data from Qbank utilization and outcome surveys is clear: under 1500 serious questions correlates with significantly higher failure risk.
  2. You must align with the current blueprint, not the stories from attendings who last took the exam in 2008

    • Every board posts an exam blueprint or content outline. Map your study plan directly to that, not to what “felt important in training.”
    • I have seen residents fail because they massively underweighted smaller but high‑yield blueprint domains (e.g., practice‑based learning, systems‑based practice, palliative care).
  3. You cannot rely on “osmosis from clinical work” anymore

    • With the shift toward guideline‑driven, nuanced decision‑making, clinic and wards are necessary but not sufficient.
    • The data from several specialties shows that residents with similar clinical evaluations diverge sharply on boards depending on their deliberate exam prep.
  4. Spacing, not cramming

    • Given the breadth, short‑term cramming is quantitatively less effective now.
    • Residents who distribute 60–90 minutes per day, most days of the week, for 6–9 months consistently outperform those who do almost nothing and then binge for 2–3 weeks, even when total hours are similar (the pacing sketch after this list shows how the numbers line up).
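
To see how the question targets in item 1 square with the daily minutes in item 4, here is a quick pacing sketch. The per‑question time is my assumption, not survey data:

```python
# Pacing math for the targets above. The ~3 min per question figure
# (answer plus review) is an assumption for illustration.
target_questions = 4000        # mid-range of the 3000-5000 figure above
months = 8                     # mid-range of the 6-9 month window
study_days_per_week = 5.5      # "most days of the week"

study_days = months * (365 / 12 / 7) * study_days_per_week
questions_per_day = target_questions / study_days
minutes_per_day = questions_per_day * 3.0

print(f"{questions_per_day:.0f} questions/day over {study_days:.0f} study days")
print(f"~{minutes_per_day:.0f} minutes/day, inside the 60-90 minute band")
```

Roughly 21 questions a day gets you to 4000 over eight months at about an hour a day, with no end‑stage binge required.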

Will this feel like yet another job on top of residency? Yes. Because that is what it is. The old assumption that “being a good resident” almost guaranteed passing the boards has eroded. The correlation is weaker.


Where boards might go next

If the last decade is any indication, I would expect:

  • Continued drift toward more complex real‑world scenarios
  • Possibly more integration of multimedia (imaging, auscultation sounds, video) in some specialties
  • More explicit testing of team‑based care, safety, and cost awareness
  • Slow, not sudden, shifts – meaning the exam 5 years from now will feel like today’s exam plus 10–20% more of the same flavor

From a policy perspective, I do not see boards voluntarily making exams dramatically easier. Their incentives tilt in the opposite direction: maintain the perception of rigor, adapt to new practice standards, and avoid accusations of “rubber‑stamp” certification.

What might change in your favor is transparency. Some boards are being pushed to release more granular performance data, more detailed feedback, and clearer blueprints. If that pressure continues, you will at least know more precisely what you are up against.


The bottom line

You are not imagining the shift. The data shows:

  • Modest but sustained declines in pass rates in several core specialties
  • Stable nominal cut scores with underlying item pools that demand more reasoning and more guidelines knowledge
  • Expanded blueprints with no increase in test length
  • Resident training and schedules that are not naturally aligned with what modern boards emphasize

So yes, exams are harder now — not in an apocalyptic way, but in a steady, ratcheting way that punishes half‑measures.

The good news is that difficulty is predictable. Boards move slowly. Blueprints are published. Qbanks track item style. You can build a study strategy that matches reality instead of nostalgia.

You have data. You have time (though less than you think). And you now have a more honest picture of the battlefield.

With this understanding of how board difficulty has evolved, your next step is less philosophical and more tactical: translate these trends into a concrete, month‑by‑month study plan that fits your specialty and schedule. That planning work is where pass/fail lines are really drawn. But that is a story for another day.
