
Comparing ACGME Surveys: Benchmarks for Red Flag Satisfaction Scores

January 8, 2026
13 minute read

[Image: Residency program leadership reviewing satisfaction survey dashboards]

The most dangerous ACGME survey result is not a single complaint. It is a pattern of low satisfaction that leadership dismisses as “just feelings.” The data says otherwise.

Programs get shut down, placed on warning, or quietly blacklisted by applicants because they ignore their own numbers. If you want to know when residency satisfaction becomes a red flag, you start with the ACGME surveys. And you treat them like what they are: a large, standardized, national dataset with brutally clear benchmarks.

Let me walk through how to compare ACGME surveys in a way that actually surfaces red flags instead of smoothing them over with anecdotes and excuses.


What data we actually have (and what we do not)

The ACGME Resident/Fellow Survey and Faculty Survey are standardized instruments administered annually. They are not perfect, but they are consistent. That consistency is gold.

What you really get from these surveys:

  • Program-level compliance indicators (those bar graphs in ADS)
  • Domain scores: duty hours, supervision, evaluation, educational content, resources, professionalism, patient safety, teamwork, and overall satisfaction
  • Comparative benchmarks: your program vs specialty vs national

You do not get:

  • Raw item-level data with exact distributions (that stays with ACGME)
  • Free-text comments (the official reports do not include them; programs only see free text if they field their own internal surveys)
  • Named respondents or class-level splits (for confidentiality)

So you are working with aggregates. But they are aggregates on a massive scale. Thousands of programs, tens of thousands of residents. That is enough to define meaningful benchmarks.

To make this concrete, consider three core satisfaction anchors most people care about:

  1. Overall satisfaction with program
  2. Would recommend program to others
  3. Adequacy of educational experience / preparation for independent practice

Across many specialties, internal data I have seen and ACGME presentations suggest that “agree or strongly agree” on these global items typically clusters in the 80–95% range for stable, non-toxic programs.

Once you drop far below that band relative to peers, you are not “just going through a tough year.” You are a risk outlier.


How to think about “red flag” satisfaction: relative, not absolute

The worst analytical mistake I see is treating a single percentage in isolation. “Eighty percent of residents would recommend us—seems good.” That sentence is meaningless without context.

You need three comparison layers:

  1. Historical (your program over time)
  2. Cross-sectional (your program vs same specialty nationally)
  3. Lateral (your program vs other programs in your institution)

Only when you combine all three do red flags become obvious.

Basic comparative benchmark table

Let us sketch a simplified benchmark grid. Assume these numbers are representative of what I commonly see in ACGME-like survey dashboards.

Benchmark Ranges for Key ACGME Satisfaction Indicators
| Indicator | Healthy Range | Warning Zone | Red Flag Zone |
| --- | --- | --- | --- |
| Overall satisfaction with program | ≥ 90% | 80–89% | < 80% |
| Would recommend program | ≥ 88% | 78–87% | < 78% |
| Educational quality satisfaction | ≥ 90% | 82–89% | < 82% |
| Feel prepared for next level | ≥ 88% | 80–87% | < 80% |
| Psychological safety / speaking up | ≥ 85% | 75–84% | < 75% |

Those cut points are not official ACGME thresholds. They are empiric breakpoints where, in real programs I have worked with, the combination of survey data and outcomes (attrition, unmatched positions, citations) starts to look unstable.

The key concept: being 5–10 percentage points below your specialty’s median in satisfaction items is not noise. It is signal. And if that persists for 2+ consecutive survey cycles, it is a structural problem.
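If you want to operationalize these cut points in your own tracking script, a minimal sketch might look like the following. The thresholds are copied from the table above and are empiric, not official ACGME values; the indicator keys and function name are hypothetical.

```python
# A minimal sketch: classify a survey percentage into the empiric zones from
# the table above. These cut points are NOT official ACGME thresholds.

BENCHMARKS = {
    # indicator: (healthy_floor, red_flag_cutoff)
    "overall_satisfaction": (90, 80),
    "would_recommend": (88, 78),
    "educational_quality": (90, 82),
    "prepared_for_next_level": (88, 80),
    "psychological_safety": (85, 75),
}

def classify(indicator: str, pct_agree: float) -> str:
    """Return 'healthy', 'warning', or 'red flag' for a % agree/strongly agree."""
    healthy_floor, red_flag_cutoff = BENCHMARKS[indicator]
    if pct_agree >= healthy_floor:
        return "healthy"
    if pct_agree >= red_flag_cutoff:
        return "warning"
    return "red flag"

print(classify("would_recommend", 79))  # -> 'warning'
```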


Reading ACGME survey dashboards like a data analyst

When I sit with program leadership and pull up their ADS survey report, I focus on six moves.

1. Index your specialty first

You are not compared to “all residency programs.” You are compared to your specialty and sometimes to institutional peers. OB/GYN will have a different satisfaction climate than Dermatology. That matters.

If the national “would recommend” for your specialty is 92% and you are at 79%, you are in trouble even if another field at your institution is lower. Applicants do not compare Medicine to Pathology. They compare Medicine to Medicine.

2. Look at z-scores, not just raw percentages

Even when ACGME does not give you explicit z-scores, you should conceptually think in those terms:

  • 0 to –0.5 SD below mean → normal variation
  • –0.5 to –1 SD → warning
  • Below –1 SD → red flag territory

Translated into simple percentage gaps, anything consistently 7–10 percentage points or more below your specialty’s national mean on global satisfaction is almost certainly more than one standard deviation down.

If your specialty national “overall satisfaction” is ~92% and you are at 82%, you are sitting about a full SD lower. That is not a “bad cohort.” That is a program-level problem.
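Here is a minimal sketch of that mental math. ACGME dashboards do not publish a specialty standard deviation, so the `sd_estimate` default below is an assumed illustrative value in line with the 7–10 point framing above; swap in whatever spread you believe is realistic for your specialty.

```python
# Conceptual z-score framing for a satisfaction item. The standard deviation
# is an assumption (ACGME reports do not provide one), used here only to
# translate a percentage gap into rough SD units.

def satisfaction_z(program_pct: float, specialty_mean: float, sd_estimate: float = 9.0) -> float:
    return (program_pct - specialty_mean) / sd_estimate

def zone_from_z(z: float) -> str:
    if z >= -0.5:
        return "normal variation"
    if z >= -1.0:
        return "warning"
    return "red flag territory"

z = satisfaction_z(82, 92)          # roughly one full SD below the mean
print(round(z, 2), zone_from_z(z))  # -> -1.11 red flag territory
```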

[Bar chart: Program Satisfaction vs National Specialty Average]

| Category | % Satisfied |
| --- | --- |
| National Avg | 92 |
| Program A | 83 |
| Program B | 89 |
| Program C | 94 |

In the bar chart above, Program A is the one that should keep people up at night, not because 83% sounds terrible in isolation, but because it is meaningfully lower than the 92% national reference.

3. Watch the trend line, not one bad year

Single-year dips happen. New PD. EMR meltdown. Pandemic surge. You want to see whether satisfaction recovers or stays suppressed.

A 3-year view tells the real story:

  • Year 1: 91% recommend program
  • Year 2: 86%
  • Year 3: 79%

That is not random. That is a downward drift of 12 percentage points. If your specialty remained stable over those years, you have a true decline. Odds are high you will also see parallel trends in duty hours, supervision, and burnout indicators.
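If you track these numbers yourself, a tiny helper like the one below can flag that kind of persistent slide. The 10-point total-drop threshold is an illustrative assumption, not an ACGME rule.

```python
# A minimal sketch: flag a persistent multi-year decline in a survey item.
# The example values are the article's 3-year scenario; the total-drop
# threshold is an assumed cutoff for illustration only.

def persistent_decline(scores: list[float], min_total_drop: float = 10.0) -> bool:
    """True if every year is at or below the prior year AND the total drop is large."""
    monotonic_down = all(later <= earlier for earlier, later in zip(scores, scores[1:]))
    return monotonic_down and (scores[0] - scores[-1]) >= min_total_drop

would_recommend = [91, 86, 79]              # Year 1 -> Year 3
print(persistent_decline(would_recommend))  # -> True (a 12-point monotonic slide)
```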

[Line chart: Three-Year Trend in 'Would Recommend Program']

| Year | % Would Recommend |
| --- | --- |
| Year 1 | 91 |
| Year 2 | 86 |
| Year 3 | 79 |

By the time you are 10+ points below where you started, residents are already warning MS4s informally. Your “brand” is eroding even if no formal citation has landed yet.

4. Connect satisfaction to hard outcomes

The data is clear in most institutions I have worked with: low survey satisfaction correlates with harder outcomes, including:

  • Resident attrition and mid-training transfers
  • Unfilled or unmatched positions
  • ACGME citations and accreditation actions

If your satisfaction is tanking but your attrition is still low, you are probably burning through people who feel trapped (visa restrictions, family constraints) and cannot leave easily. That is not success. It is a delayed failure.

5. Separate “training quality” from “toxic culture”

Not all negative responses are equal. One program can have residents who think the educational structure is disorganized but still feel respected, safe, and supported. Another can have excellent board pass rates and still be psychologically abusive.

From the ACGME survey, the red-flag culture cluster usually includes:

  • “Faculty create an environment of respect”
  • “I feel comfortable reporting unprofessional behavior”
  • “Mistakes are handled in a way that promotes learning, not punishment”
  • “I can report patient safety concerns without fear”

When those drop into the red flag zone (< 75–80% agreement) while your board pass rate is still high, you do not have a “high expectations” culture. You have a high-risk environment that will eventually explode into formal complaints.

6. Check alignment between residents and faculty

The ACGME Faculty Survey provides a parallel measure of how faculty perceive the program. Misalignment is a dataset in itself.

Type 1: Both faculty and residents high satisfaction → stable program
Type 2: Faculty high, residents low → classic blind spot / denial
Type 3: Faculty low, residents “fine” → often early warning of coming resource or leadership failures
Type 4: Both low → systemic institutional issue

When I see residents at 78% “overall satisfaction” and faculty at 94%, I expect defensive leadership. “We think they are getting excellent training, they just do not understand how good they have it.” That attitude almost always worsens the numbers the following year.


Quantifying “red flag” satisfaction: an operational definition

If you want a clear, operational threshold for when satisfaction crosses from “not ideal” into “red flag,” combine three criteria:

  1. Absolute level: key satisfaction indicator below a fixed floor
  2. Relative gap: more than X percentage points below specialty mean
  3. Duration: present for at least 2 consecutive survey cycles

A reasonable cut that matches what I see in real programs:

  • Absolute: Overall satisfaction < 80% OR would recommend < 78%
  • Relative: ≥ 7 percentage points below national specialty mean
  • Duration: At least 2 years in a row

If a program meets all three for resident survey data, I would consider it “red flag” on satisfaction. That is the kind of program you see residents warn away from in applicant group chats.

Operational Definition of Red Flag Satisfaction
| Dimension | Threshold |
| --- | --- |
| Absolute level | Overall satisfaction < 80% |
| Relative gap | ≥ 7 points below specialty mean |
| Duration | ≥ 2 consecutive years |

You can apply the same structure to subdomains like psychological safety or supervision quality, just with slightly different cutoffs (those tend to run lower overall, so your floors may be 3–5 points lower).
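As a sketch, the three criteria translate almost directly into code. The thresholds below are the empiric ones from the table above; the function name and structure are illustrative, not an ACGME specification.

```python
# Operational red-flag check for a global item like "overall satisfaction".
# Thresholds mirror the article's table: absolute floor 80%, relative gap
# >= 7 points below the specialty mean, duration >= 2 consecutive cycles.

def is_red_flag(program_scores: list[float],
                specialty_means: list[float],
                absolute_floor: float = 80.0,
                min_gap: float = 7.0,
                min_years: int = 2) -> bool:
    """program_scores / specialty_means are per-cycle values, most recent last."""
    recent = list(zip(program_scores, specialty_means))[-min_years:]
    if len(recent) < min_years:
        return False  # not enough survey cycles to judge duration
    return all(score < absolute_floor and (mean - score) >= min_gap
               for score, mean in recent)

# Two consecutive cycles below both the absolute floor and the relative gap:
print(is_red_flag([84, 79, 77], [92, 92, 91]))  # -> True
```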


Using ACGME surveys as an applicant: what you can and cannot see

Residents and applicants do not have direct access to program-level ACGME survey results. You see hints:

  • Citations in program letters
  • Rumors about “recent ACGME issues”
  • Sudden, aggressive recruitment efforts for a historically competitive specialty
  • Extra language in program websites about “recent program improvements” or “new leadership focusing on culture”

So you have to infer satisfaction from second-order signals.

Here is how I would advise a data-minded applicant to think about it.

1. Match unfilled positions and attrition to likely survey issues

Programs that suddenly have multiple unfilled slots after years of filling easily often have underlying survey problems. Residents do not leave programs in large numbers for no reason.

If you see:

  • Multiple PGY-2 or PGY-3 positions posted mid-year
  • A pattern of open spots in NRMP for that program
  • Alumni or residents talking about “leadership transition” in vague terms

I would assume, probabilistically, that recent survey cycles flagged significant dissatisfaction or environment concerns.

2. Correlate “vibes” with known benchmarks

On interview day, ask specific, quantitatively framed questions:

  • “Out of your residents, roughly what proportion stay for fellowship or jobs here versus leaving the institution?”
  • “How many residents have transferred out in the last 3 years?”
  • “How would you say resident satisfaction here compares to national benchmarks? Above average, average, below?”

Most programs will not quote exact percentages, but their comfort level answering versus deflecting tells you more than they think.

If a PD says, “We’re a bit below average on some culture items and have been working on X, Y, Z,” that is a lot more trustworthy than the “our residents are like family” speech with zero data.

3. Watch for defensive minimization

I have occasionally heard leadership say variants of: “The ACGME survey is just a venting tool; the residents were upset about one schedule issue that year.” Data does not support that narrative when the gaps are large and persistent.

The ACGME survey has noise. But at scale, it is remarkably stable. The more a program tries to discredit the instrument itself instead of engaging the numbers, the more likely they are running real deficits in satisfaction.


How leadership should be using the data (and what residents should look for)

From a program-improvement standpoint, the most effective programs I have worked with treat the ACGME survey as an early-warning system, not a report card to hide in a drawer.

There is a predictable pattern among programs that recover from bad satisfaction scores:

  1. They openly share aggregate results with residents and faculty (no cherry picking).
  2. They benchmark explicitly against specialty and institutional data.
  3. They create 2–3 measurable targets for the next survey cycle.
  4. They tie at least one target to culture/psychological safety, not just logistics.
  5. They report progress mid-year, not just after the next survey.

The presence or absence of that behavior is itself a “red flag satisfaction” indicator. If residents tell you, “We never see the ACGME results, they just tell us we are ‘meeting expectations,’” I would assume there is something they do not want to expose to scrutiny.


Comparing resident vs faculty surveys: a quick matrix

To simplify, you can think of resident/faculty satisfaction data on a 2×2 grid. Here is a conceptual version with made-up numbers that mirror patterns I see.

Resident vs Faculty Satisfaction Matrix
| Scenario | Resident Satisfaction | Faculty Satisfaction | Interpretation |
| --- | --- | --- | --- |
| Both high | ≥ 88% | ≥ 88% | Stable, healthy culture |
| Residents low, faculty high | < 80% | ≥ 88% | Resident experience dismissed |
| Residents high, faculty low | ≥ 88% | < 80% | Hidden faculty burnout / strain |
| Both low | < 80% | < 80% | Systemic institutional problems |

The most dangerous quadrant for resident well-being is “residents low, faculty high.” That is where you see serious power imbalances, lack of psychological safety, and a leadership narrative that the trainees are “overreacting.” The ACGME survey will expose that split even if the program tries to hand-wave the results.
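For completeness, here is a minimal sketch that places a program on that grid using the table's illustrative cutoffs; scores between 80% and 88% fall into an indeterminate middle band that the 2×2 framing does not cover.

```python
# Place a program on the resident/faculty 2x2 grid using the table's
# illustrative cutoffs (>= 88% = "high", < 80% = "low"). In-between scores
# return an indeterminate label rather than forcing a quadrant.

def quadrant(resident_pct: float, faculty_pct: float) -> str:
    def band(pct: float) -> str:
        if pct >= 88:
            return "high"
        if pct < 80:
            return "low"
        return "middle"

    labels = {
        ("high", "high"): "Stable, healthy culture",
        ("low", "high"): "Resident experience dismissed",
        ("high", "low"): "Hidden faculty burnout / strain",
        ("low", "low"): "Systemic institutional problems",
    }
    return labels.get((band(resident_pct), band(faculty_pct)), "Indeterminate (middle band)")

print(quadrant(78, 94))  # -> 'Resident experience dismissed'
```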


Future directions: ACGME surveys are getting more powerful, not less

The trajectory is clear: aggregate survey data is increasingly being used for:

  • Program accreditation decisions and citations
  • Institutional GME oversight
  • Applicant decision-making via word-of-mouth and online communities
  • Internal quality dashboards at the health system level

[Area chart: Growing Use of Survey Data in GME Oversight]

| Year | Value (rough proxy) |
| --- | --- |
| 2015 | 20 |
| 2018 | 45 |
| 2021 | 70 |
| 2024 | 90 |

What does that chart represent? A rough proxy for how integrated survey metrics are in high-stakes GME decision-making. Institutions that once treated the ACGME survey as a compliance checkbox now feed it directly into executive scorecards.

Looking ahead, I expect:

  • More granular benchmarks by program size, region, and institutional type
  • Stronger emphasis on psychological safety and burnout indicators
  • Integration with other data streams: duty-hour violations, safety events, attrition statistics
  • Increased transparency, driven by pressure from applicants and national specialty organizations

Red flag satisfaction scores will become harder to hide. Programs that do not adapt will be left behind by both the ACGME and the applicant market.


Key takeaways

  1. ACGME satisfaction data is not “soft.” When you are 7–10 points below your specialty on overall satisfaction or “would recommend” for 2+ years, that is a true red flag, not a bad cohort.
  2. The most useful lens is comparative: your program vs its own history, vs specialty benchmarks, and vs faculty perceptions. Misalignment, especially residents low / faculty high, is itself a warning sign.
  3. Programs that treat survey data as an early-warning system and share it transparently usually recover. Programs that minimize or hide low satisfaction scores almost never do.
Related Articles