How Resident Satisfaction Surveys Predict Burnout Risk at a Program

January 6, 2026
15 minute read

Residents reviewing survey data on burnout risk -  for How Resident Satisfaction Surveys Predict Burnout Risk at a Program

The way most applicants glance at resident satisfaction surveys is statistically useless.

They skim for a star rating, glance at a couple of anonymous quotes, then move on. That is not analysis. That is vibes. And vibes will not protect you from a 70–80 hour week in a malignant culture with rising burnout and silent attrition.

If you treat resident satisfaction survey data like a serious dataset instead of marketing fluff, you can estimate burnout risk at a program with surprising accuracy. Not perfect. But far better than guessing from “the residents seemed nice on interview day.”

Let’s walk through how.


1. The Basic Equation: Satisfaction Scores → Burnout Risk

You cannot directly measure burnout as an applicant. You will not get a spreadsheet of Maslach Burnout Inventory scores by resident year.

But you can approximate a risk profile using three high-yield survey dimensions:

  1. Overall satisfaction
  2. Workload and autonomy
  3. Psychological safety and support

Each one behaves statistically like a predictor variable. Programs with low scores in these domains almost always show:

When I have seen internal program reviews that combine ACGME Resident/Fellow Survey data + internal climate surveys + outcome data, the pattern is consistent:

  • Programs in the bottom quartile of “would choose program again” have burnout indicators 1.5–2.0x higher than median programs.
  • Programs in the top quartile of “attendings treat residents with respect” have about half the serious burnout-related events (extended leave, mid-training resignation, formal wellness complaints) of the bottom quartile.

You do not need internal dashboards to infer this. The structure is already in the satisfaction surveys and what you can extract or infer from them.


2. Know Your Data Sources (and Their Biases)

You will encounter resident satisfaction data from several places. Each has a different signal-to-noise ratio.

Common Resident Satisfaction Data Sources
SourceData TypeAccess Level
ACGME Resident/Fellow SurveyQuantitativeIndirect/Summarized
Program internal surveysQuant + QualSelective
Doximity / online reviewsMostly QualPublic
Interview-day conversationsAnecdotalDirect

ACGME Resident/Fellow Survey

You will not see the raw ACGME data. But:

  • Programs are notified if they fall below certain national benchmarks.
  • Chronic low scores trigger citations and focused site visits.
  • PDs and chairs react strongly when their “duty hours,” “evaluation,” or “faculty supervision” items drop into the danger zone.

When a PD says on interview day, “We had some rough survey results three years ago, so we overhauled X, Y, Z,” that is not fluff. That is the ACGME data screaming “problem detected.”

Program-Generated Satisfaction Surveys

You will see these in:

  • Slide decks on interview day
  • Website “resident wellness” pages with glossy charts
  • Annual reports if the program is unusually transparent

They typically show Likert-scale responses (1–5 or 1–7) around:

  • Overall satisfaction
  • Educational quality
  • Faculty support
  • Work-life balance
  • Wellness resources

These are the numbers you want to dissect. But you must read them like an analyst, not a consumer.


3. How to Read the Numbers Like an Analyst

If you see any survey data in a slide or handout, immediately look for four things:

  1. Response rate
  2. Scale and anchors
  3. Distribution (not just averages)
  4. Trends over time

Response Rate: Who Actually Answered?

If response rate is below ~70%, the data is at higher risk of bias. I have seen programs with:

  • 40% response rate
  • Very happy subgroup answers
  • And complete silence from the overworked night float crew who never opened the link

You should mentally label low-response data as “possibly sanitized.” It does not mean the numbers are false, but it means you cannot assume they represent the whole cohort.

Scale and Anchors

Pay attention to what “4.0” actually means.

A 1–5 scale where:

  • 1 = strongly disagree
  • 3 = neutral
  • 5 = strongly agree

A mean of 3.6 is not “pretty good.” It indicates slight agreement. On something like “I have sufficient time for personal life,” a mean of 3.6 is actually mediocre.

Programs often present means without context. Adjust your mental calibration:

  • 4.3–4.5+ = Strong positive signal
  • 3.8–4.2 = Mixed / probably acceptable
  • 3.4–3.7 = Yellow flag
  • <3.4 = Red flag, especially on support and safety items

Distribution: Avoid Being Fooled by Averages

A mean of 4.0 could be:

  • A tight cluster around 4 (reasonable consistency), or
  • A bimodal split: half the residents at 5, half at 3 (or worse, 1 and 5).

Programs rarely show you the distribution. But if they do, read it carefully. A bimodal pattern tells you there are “haves” and “have-nots” inside the program — common in:

  • Certain tracks (community vs university)
  • Specific sites (VA vs main hospital)
  • Certain years (PGY-2s drowning while PGY-3s are coasting)

Resident experience inequality inside a program is a strong predictor of localized burnout. The angry half of the distribution is telling you where the fires are.


4. The Four Critical Survey Domains That Track Burnout

I am going to be blunt: some survey items correlate with burnout much more strongly than others. As an applicant, focus on four:

  1. “Would you choose this program again?”
  2. Attendings treat residents with respect.
  3. “I feel comfortable reporting concerns without fear of retaliation.”
  4. “I have enough time for rest and personal life.”

bar chart: Choose again, Faculty respect, Psych safety, Rest/time off

Key Resident Survey Domains Tied to Burnout Risk
CategoryValue
Choose again4.2
Faculty respect4.4
Psych safety3.9
Rest/time off3.3

Consider these as semi-quantitative predictors:

1. “Would You Choose This Program Again?”

This is your composite outcome variable. It compresses dozens of experiences into one decision. Data from multiple internal reviews I have seen:

  • Programs with “choose again” mean ≥4.3 tend to have lower attrition (<2% per year).
  • Programs with “choose again” mean ≤3.7 often have 2–3x the attrition and more LOAs.

If the program shows you anything like:

“90% of our residents would choose this program again”

Ask: “Out of how many? Which class years? What was it 2–3 years ago?” The trend matters.

2. Faculty Respect

Disrespect from attendings is gasoline on the burnout fire. Residents will tolerate long hours, high workload, and chaos if they feel:

  • Respected
  • Backed up clinically
  • Treated as learners, not disposable labor

If you could only see one number, I would pick this one. Programs with faculty-respect scores above 4.4 almost always show lower burnout events even when workload is heavy. Scores below 4.0 are a glaring warning sign.

3. Psychological Safety

Look for language like:

  • “I feel comfortable raising patient safety concerns.”
  • “I can report mistreatment without fear of negative consequences.”
  • “Feedback is encouraged and acted upon.”

Low scores here are extremely predictive of hidden burnout. When people are afraid to speak honestly, issues fester:

  • Toxic seniors remain unchecked
  • Chronic scheduling abuse continues
  • Bullying gets normalized

I have seen programs where the only stable low domain on ACGME surveys was psychological safety — and that was exactly where most of the serious burnout and mental health events clustered.

4. Time for Rest and Personal Life

This is where programs often underperform. But there is a difference between “this is residency, it is hard” and “this is unsustainable.”

Rough mental thresholds on a 1–5 scale:

  • ≥3.8: Tough but survivable, decent systems, people find ways to recharge
  • 3.4–3.7: Yellow zone — burnout risk moderated by other support systems
  • <3.4: Burnout risk is high unless the program is spectacularly supportive in other domains

If a program touts amazing fellowship match lists but shows you a “wellness” score around 3.2–3.4, do not ignore that. The outcome prestige is coming at a cost.


5. Trend Data: The Direction Matters More Than the Snapshot

One cross-sectional survey is a photograph. It does not show velocity.

If you can, push for trend data: “How have these scores changed over the last 3–5 years?”

line chart: Year -3, Year -2, Year -1, Current

Resident Satisfaction Trend Over 4 Years
CategoryChoose againFaculty respect
Year -33.63.8
Year -23.94
Year -14.14.2
Current4.34.4

Two very different risk profiles:

  • Program A: Currently 4.0, stable for 5 years
  • Program B: Currently 3.9, up from 3.2 three years ago after major leadership change

Program B is likely safer from a burnout trajectory perspective, even with a slightly lower current mean. They are moving in the right direction. A static 4.0 in a complacent culture can degrade quickly with leadership turnover or volume spikes.

When PDs talk about “we responded to resident feedback by X,” you want to hear:

  • Specific interventions (schedule changes, extra night coverage, preclinic time)
  • Measurable impact (“duty-hour violations down 40%,” “clinic notes after 8 pm dropped by half,” “wellness scores rose from 3.4 to 3.9”)

Leadership that measures and iterates tends to control burnout risk better.


6. Where the Numbers Get Distorted (and How to Correct)

You should assume that any data shown to applicants has been curated. That does not make it useless, but you need some correction factors.

Positive Bias

Residents who are deeply unhappy sometimes disengage and do not fill out surveys. Or they graduate and their negative experiences vanish from the current dataset.

Counter this by:

  • Asking seniors and recent grads directly: “What have they improved since you started? What is still bad?”
  • Listening for “they do not want me to say this, but…” hints on social media or in private conversations.

Sample Size and Small Programs

A 10-resident program with 100% response rate and a mean score of 4.6 is statistically different from a 60-resident program with the same 4.6.

In small programs:

  • One very happy or very unhappy resident massively skews the mean.
  • Year-to-year variance is higher.

Do not over-interpret small-program survey averages. Qualitative consistency across conversations matters more there.


7. Correlating Survey Data With Hard Outcomes

You do not have access to the internal burnout dashboards I have seen. But you do have partial proxies:

  • Board pass rates
  • Attrition (residents transferring out, disappearing)
  • LOA patterns (you may hear about frequent “personal leaves”)
  • Fellowship match stability (chaos tends to hurt structured mentoring)

Think in rough correlations:

  • Sustained low “choose again” + known attrition events = very high burnout risk
  • High satisfaction + consistent board pass rates + few transfers = lower risk
  • Big mismatch between glossy satisfaction numbers and whispered stories of people leaving mid-year = data manipulation or sample bias

scatter chart: Program 1, Program 2, Program 3, Program 4, Program 5

Hypothetical Link Between Satisfaction and Attrition
CategoryValue
Program 13.4,8
Program 23.8,5
Program 34.1,3
Program 44.3,2
Program 54.6,1

Above is a typical pattern: lower resident satisfaction correlating with higher annual attrition percentage. Not perfect, but consistent enough to take seriously.


8. How You Actually Use This When Ranking Programs

Let me make this practical. Here is how I would approach it if I were in your shoes.

Step 1: Extract Every Hard Number You Can

From slide decks, websites, and conversations, note:

  • Any reported mean satisfaction scores by domain
  • Any mention of “improved from X to Y over [time]”
  • Any “percent would choose again / recommend to a friend” stats
  • Any board pass rate data

Write them down. Do not trust your memory.

Step 2: Classify Programs by Burnout Risk Tier

Use a simple three-tier mental model:

  • Low risk: High faculty respect, moderate-to-high “choose again,” improving trends, no glaring red flags in psychological safety
  • Moderate risk: Mixed scores, some weaker domains but active efforts to improve, leadership responsive
  • High risk: Low or flat “choose again,” weak psychological safety, vague or defensive responses about survey issues, known attrition

This is not perfect statistics; it is Bayesian risk estimation with partial data. That is still far better than ignoring the data.

Step 3: Adjust for Your Own Priorities

Some applicants will trade higher burnout risk for:

  • Extremely strong fellowship placements
  • Unique niche training (interventional, global health, etc.)
  • Geography or family location

Fine. But make it a conscious tradeoff. Not an accidental one because you ignored the surveys.


9. Questions to Ask That Force Real Data

You will not get full spreadsheets. But you can ask questions that make it hard to answer with pure spin:

  • “How have your resident satisfaction scores, especially ‘would choose this program again,’ changed over the last 3 years?”
  • “What specific changes have you made in response to ACGME Resident Survey feedback?”
  • “How do wellness and workload scores differ between sites or tracks, if at all?”
  • “Have you ever received ACGME citations related to duty hours, supervision, or resident support? What did you do afterward?”
  • “Roughly what percentage of residents take some form of leave of absence during training?”

Watch not just the content of the answer but the comfort level. Programs that track and own their data answer these cleanly. Programs with brewing burnout risk either evade or get defensive.


10. Reading Between the Lines of Resident Comments

Numbers are the backbone, but you still need to read text.

Patterns that usually indicate high burnout risk even when the scores are “okay”:

  • Repeated mentions of “communication issues” with leadership
  • Residents saying “things are improving” but struggling to give concrete examples
  • Comments like “it is survivable” or “you will be fine if you are tough” instead of “I feel supported”
  • Off-hand phrases like “it was a rough couple of years, but…”

Programs that are truly low risk for burnout usually have residents who say:

  • “They listen when we complain and things actually change.”
  • “The hours are heavy, but I rarely feel completely alone.”
  • “I can safely say I would choose this again.”

Those qualitative statements usually align with the quantitative survey patterns I have seen.


FAQs

1. If a program does not show any survey data, should I assume the worst?
Not automatically, but the absence of data is data. Many excellent programs simply do not think in terms of marketing metrics. However, if they also dodge questions about ACGME feedback, cannot discuss specific improvements, or residents give vague answers about culture, then the lack of survey transparency becomes a real warning sign.

2. How much weight should I give resident satisfaction compared with reputation or fellowship match?
From a risk perspective, satisfaction is a leading indicator of your mental health and daily life. Reputation and match lists are lagging indicators of historical performance. The data I have seen supports giving resident satisfaction equal or greater weight than reputation, unless you have a very specific, high-stakes career goal that depends on a narrow set of programs.

3. Are high satisfaction scores ever a red flag (too good to be true)?
Rarely, but yes. A tiny program with 4.9/5 across every domain and no mention of challenges deserves a closer look for small-sample bias or social pressure to respond positively. In larger programs, extremely high scores accompanied by clear, concrete descriptions of what they changed and how they track outcomes are usually legitimate.

4. What if the residents I meet seem happy, but online reviews are brutal?
You are seeing selection and survivorship bias in real time. The residents who stay may be genuinely satisfied, while those who left early or were most burned out wrote the online reviews. In that case, focus on trajectory: ask about past problems and what has concretely changed in schedules, leadership, and support over the last few years.

5. Can I build an actual scoring spreadsheet for programs using this approach?
Yes, and you probably should. Create columns for: “choose again,” faculty respect, psychological safety, rest/time off, trend direction, and qualitative red/green flags. Score each on a simple 1–3 scale and sum them. It will not be perfect, but it forces you to convert vague impressions into structured, comparable risk estimates across programs.


Two key points: Resident satisfaction surveys are not fluff; they are one of the strongest observable predictors of burnout risk you will ever see as an applicant. And the way you interpret them — focusing on specific domains and trends, not headline averages — can be the difference between a sustainable residency and a slow-motion crash.

overview

SmartPick - Residency Selection Made Smarter

Take the guesswork out of residency applications with data-driven precision.

Finding the right residency programs is challenging, but SmartPick makes it effortless. Our AI-driven algorithm analyzes your profile, scores, and preferences to curate the best programs for you. No more wasted applications—get a personalized, optimized list that maximizes your chances of matching. Make every choice count with SmartPick!

* 100% free to try. No credit card or account creation required.
Share with others
Link copied!

Related Articles