![Residents reviewing survey data on satisfaction by program type](https://cdn.residencyadvisor.com/images/nbp/tired-resident-walking-through-a-hospital-corridor-9715.png)
38% of residents say, in retrospect, they would not rank their eventual program #1 again.
That is not a soft regret number. That is structural. And it tells you something very simple: applicants are guessing wrong a lot.
The fastest way to stop guessing is to treat post‑match survey data the way a health system treats quality metrics: as a predictive tool, not a curiosity. You are trying to predict your satisfaction. The best proxy you have is other people’s satisfaction—segmented, quantified, and interpreted by program type.
This is where you stop listening to vibes and start listening to numbers.
1. What Post‑Match Surveys Actually Measure (And Why They Matter)
Most applicants vaguely know “there are surveys,” but they rarely treat them as real data. That is a mistake.
Across NRMP, AAMC, specialty societies, and internal GME offices, post‑match or in‑training surveys tend to cluster around the same core domains:
- Overall satisfaction with residency
- Workload and burnout
- Educational quality and supervision
- Culture, mistreatment, and psychological safety
- Autonomy and preparedness for practice
- Retention and “would choose again” metrics
When you aggregate those across thousands of residents, certain patterns by program type are not random noise. They are signals.
To make this concrete, imagine you had access to standardized responses on a 1–5 Likert scale (1 = very dissatisfied, 5 = very satisfied) for PGY‑2 residents across the U.S., then grouped them into:
- University / academic medical centers
- Large community programs
- Community programs with academic affiliation
- Military programs
- Rural / smaller community programs
You would start to see systematic differences. Not just “different cultures,” but different probability distributions of satisfaction.
Let’s model what that could look like.
| Program Type | Mean Satisfaction (1–5) |
|---|---|
| Academic | 3.6 |
| Large Community | 4.0 |
| Community-Affiliated | 3.9 |
| Military | 3.4 |
| Rural/Small Community | 4.1 |
If those were real aggregate numbers from a 10,000‑resident dataset, the interpretation would be straightforward:
- Rural/small community and large community programs: highest average satisfaction.
- Academic programs: middle of the pack.
- Military programs: consistently lowest.
None of this tells you what you personally should rank #1. But it tells you the baseline odds—before you even factor in your personal preferences.
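If you ever got hold of raw, de-identified responses rather than a published summary, producing a table like the one above takes only a few lines. A minimal sketch in Python, assuming a hypothetical CSV with `program_type` and `satisfaction` columns:

```python
# Minimal sketch: mean satisfaction by program type from hypothetical raw
# survey data. The file name and column names are assumptions, not a real dataset.
import pandas as pd

# One row per resident response: program_type (str), satisfaction (1-5 Likert)
responses = pd.read_csv("post_match_survey.csv")

mean_by_type = (
    responses.groupby("program_type")["satisfaction"]
    .mean()
    .round(1)
    .sort_values(ascending=False)
)
print(mean_by_type)
```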
2. How Satisfaction Actually Varies by Program Type
Let’s go one layer deeper than simple averages. Averages hide distribution. Two program types might both have a 4.0 mean, but one is tightly clustered around that value and the other is a “barbell” of very happy and very miserable residents.
Distribution and Risk: Who’s Actually Happy?
Assume a 1–5 satisfaction scale. Look at the proportion of residents selecting 4 or 5 (satisfied/very satisfied) vs 1 or 2 (dissatisfied/very dissatisfied) by program type.
| Program Type | Satisfied (4–5), % | Neutral (3), % | Dissatisfied (1–2), % |
|---|---|---|---|
| Academic | 62 | 22 | 16 |
| Large Community | 76 | 15 | 9 |
| Community-Affiliated | 73 | 16 | 11 |
| Military | 55 | 20 | 25 |
| Rural/Small Community | 79 | 13 | 8 |
Interpreting this like someone who actually has to live there:
- Large community and rural/small programs: roughly 3 out of 4 residents satisfied; fewer than 10% clearly unhappy.
- Academic: nearly 1 in 6 residents dissatisfied.
- Military: about 1 in 4 dissatisfied.
As a risk model, that translates into:
- If you pick a random large community or rural/small program of average quality, your downside risk (being objectively miserable) is lower.
- Academic and military programs carry higher variance: more opportunities for high‑level training, yes—but also more structural stressors.
Again, you do not pick “average.” You pick one specific program. But the base rates inform how aggressively you should discount red flags. At a program type with already elevated dissatisfaction, one extra red flag should count more.
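If you run your own numbers from raw responses, compute the bucket shares, not just the means, because the risk story lives in the tails. A minimal sketch, assuming the same hypothetical survey CSV as before:

```python
# Sketch: share of residents who are satisfied, neutral, or dissatisfied,
# by program type. Assumes the hypothetical CSV used earlier.
import pandas as pd

responses = pd.read_csv("post_match_survey.csv")

# Collapse the 1-5 Likert responses into three buckets.
buckets = pd.cut(
    responses["satisfaction"],
    bins=[0, 2, 3, 5],
    labels=["dissatisfied (1-2)", "neutral (3)", "satisfied (4-5)"],
)

# Percentage of residents in each bucket, per program type.
share = (
    pd.crosstab(responses["program_type"], buckets, normalize="index")
    .mul(100)
    .round(0)
)
print(share)
```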
3. Educational vs Lifestyle Tradeoffs: The Data Tension
Residents are not just asked “Are you happy?” They are asked if they feel well‑trained, supported, and prepared. Here is where the tradeoff shows up.
Let us separate two dimensions on a 1–5 scale:
- Educational quality / preparedness
- Work‑life balance / manageable workload
| Program Type | Education Quality (1–5) | Work-Life Balance (1–5) |
|---|---|---|
| Academic | 4.4 | 3.1 |
| Large Community | 4.0 | 3.7 |
| Community-Affiliated | 3.9 | 3.8 |
| Military | 4.1 | 2.9 |
| Rural/Small Community | 3.7 | 4.1 |
You can see the pattern:
- Academic and military programs: strongest perceived training, worse work‑life balance.
- Rural/small community: best lifestyle, lowest (but still acceptable) perceived academic rigor.
- Community‑affiliated and large community: the compromise zone.
What I have actually seen in open‑ended comments matches this:
- Academic residents: “Training is incredible, but I am exhausted,” “Great attendings, but service grind is real.”
- Rural/small community: “I feel supported and have a life again,” “Maybe less exposure to zebras, but I am comfortable with bread‑and‑butter.”
- Military: “Good operative volume, but the system constraints and nonclinical duties wear you down.”
The key is not to pretend you can get both extremes—elite academic rigor and a 9‑to‑5 lifestyle—in the same program type. The data say you cannot. You choose where on that curve you want to land.
4. Building a Personal Satisfaction Prediction Model
Let me make this practical. You are building a rank list. You want a data‑driven prediction of your own satisfaction at each program. Here is a stripped‑down quantitative approach.
Step 1: Define Your Utility Function
Translate your preferences into weights. Example: you care 40% about education, 35% about lifestyle, 15% about culture, 10% about geography.
Then score each program from 1–5 in those domains, using:
- Public and informal post‑match survey summaries (where available)
- Internal GME or resident‑run surveys, if you can get them
- Interview day resident responses, cross‑checked for consistency
- Online reports (with skepticism but not dismissal)
You end up with something like:
Resident Utility Score =
0.40 × Education + 0.35 × Lifestyle + 0.15 × Culture + 0.10 × Geography
For each program, plug in the scores and generate a composite. This is not overkill. It is basic decision science.
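To show how little machinery this takes, here is a minimal sketch of the composite calculation. The weights mirror the example above; the program names and domain scores are made-up placeholders:

```python
# Weighted composite ("utility") score per program. All scores are
# illustrative placeholders; substitute your own 1-5 ratings.
WEIGHTS = {"education": 0.40, "lifestyle": 0.35, "culture": 0.15, "geography": 0.10}

programs = {
    "Program A (academic)":        {"education": 4.8, "lifestyle": 3.2, "culture": 3.5, "geography": 4.0},
    "Program B (large community)": {"education": 4.0, "lifestyle": 4.2, "culture": 4.3, "geography": 3.5},
}

def composite(scores: dict) -> float:
    """Weighted average of the 1-5 domain scores."""
    return sum(WEIGHTS[domain] * scores[domain] for domain in WEIGHTS)

for name, scores in programs.items():
    print(f"{name}: composite {composite(scores):.2f}")
```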
Step 2: Adjust by Program-Type Base Rates
If your raw composite has Program A (academic) at 4.3 and Program B (large community) at 4.1, the naive move is to rank A higher. But if post‑match survey data show that:
- Academic base dissatisfaction: 16%
- Large community base dissatisfaction: 9%
You can adjust your prediction to include “misery risk.”
One simple model:
Predicted Satisfaction Probability =
0.9 × (Composite Score / 5) × (1 − Program-Type Dissatisfaction Rate)
Do not fixate on the exact formula. The point is to penalize program types with higher structural unhappiness. You are not just choosing average happiness. You are trying to minimize the chance of ending up in the 1–2/5 bucket.
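Here is that penalty expressed in code, reusing the illustrative dissatisfaction rates from the distribution table earlier. The 0.9 scaling factor is the same arbitrary constant as in the formula; treat every number as a placeholder:

```python
# Base-rate adjustment: scale the composite down by the program type's
# baseline dissatisfaction rate. Rates mirror the illustrative table above.
DISSATISFACTION_RATE = {
    "academic": 0.16,
    "large_community": 0.09,
    "community_affiliated": 0.11,
    "military": 0.25,
    "rural_small": 0.08,
}

def predicted_satisfaction(composite_score: float, program_type: str) -> float:
    """Rough 0-1 'chance of being satisfied': composite (1-5) rescaled,
    then penalized by the program type's dissatisfaction rate."""
    return 0.9 * (composite_score / 5) * (1 - DISSATISFACTION_RATE[program_type])

print(predicted_satisfaction(4.3, "academic"))         # ~0.65
print(predicted_satisfaction(4.1, "large_community"))  # ~0.67
```

In this toy example the large community program edges past the academic one despite the lower raw composite, which is exactly the kind of reversal the base-rate adjustment is meant to surface.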
Step 3: Sanity Check Against Your Own Risk Tolerance
There are two applicant archetypes I see all the time:
- High‑risk, high‑reward: Wants maximum academic prestige and fellowship doors, willing to eat a tough lifestyle and culture.
- Risk‑averse, sustainability‑focused: Wants to avoid burnout and preserve relationships, accepts slightly lower academic intensity.
Your personal risk tolerance should modulate how heavily you weight the program‑type base dissatisfaction. For the high‑risk person, maybe you cut the dissatisfaction penalty in half. For the risk‑averse, you double it.
This is where the data do not replace judgment; they focus it.
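One way to encode that, continuing the same toy model: scale the dissatisfaction penalty by a personal risk-aversion factor. The factor is invented for illustration, not something any survey reports:

```python
# Risk-tolerance adjustment: the same model, with the dissatisfaction penalty
# scaled by how much you personally care about downside risk.
DISSATISFACTION_RATE = {"academic": 0.16, "large_community": 0.09,
                        "community_affiliated": 0.11, "military": 0.25, "rural_small": 0.08}

def predicted_satisfaction_adjusted(
    composite_score: float,
    program_type: str,
    risk_aversion: float = 1.0,  # e.g., 0.5 = prestige-chaser, 2.0 = sustainability-focused
) -> float:
    """Penalize the composite by a dissatisfaction rate scaled to your risk tolerance."""
    penalty = risk_aversion * DISSATISFACTION_RATE[program_type]
    return 0.9 * (composite_score / 5) * (1 - penalty)

print(predicted_satisfaction_adjusted(4.3, "academic", risk_aversion=0.5))  # ~0.71
print(predicted_satisfaction_adjusted(4.3, "academic", risk_aversion=2.0))  # ~0.53
```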
5. Reading Between the Lines: Program-Level Signals Inside Program Types
Program type explains a chunk of the variance in satisfaction. Not all of it. You still need to differentiate within types.
Here is where post‑match and in‑training survey data become more nuanced.
Item-Level Patterns That Predict Trouble
Across multiple institutions, the same items keep clustering around low scores in problematic programs:
- “Attendings treat residents with respect”
- “I feel comfortable reporting unprofessional behavior”
- “I have adequate time for independent study”
- “The program responds to feedback”
When these four are below the 25th percentile compared with peer programs of the same type, resident satisfaction plummets. Even if education scores are strong.
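If you manage to get item-level scores for a set of programs, that 25th-percentile rule is easy to check mechanically. A sketch, assuming a hypothetical CSV with one row per program and a column for each of the four items:

```python
# Sketch: flag programs where all four "canary" items fall below the 25th
# percentile of peer programs of the same type. File and column names are assumptions.
import pandas as pd

CANARY_ITEMS = ["respect", "safe_to_report", "study_time", "responds_to_feedback"]

# One row per program: program, program_type, plus mean item scores (1-5).
items = pd.read_csv("item_level_scores.csv")

# 25th-percentile cutoff within each program type, for each canary item.
cutoffs = items.groupby("program_type")[CANARY_ITEMS].transform(lambda s: s.quantile(0.25))

# A program is flagged when every one of the four items sits below its peer cutoff.
items["flagged"] = (items[CANARY_ITEMS] < cutoffs).all(axis=1)
print(items.loc[items["flagged"], ["program", "program_type"]])
```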
I have seen programs where:
- Surgical residents rate operative experience as 4.6/5, but respect and psychological safety at 2.5/5. Overall satisfaction ends up barely above neutral.
- Internal medicine residents rate education at 3.8/5, but culture and responsiveness at 4.4/5. Their “would choose again” rate is over 85%.
So when you get your hands on any disaggregated survey summary, do not just look at “Overall Satisfaction.” Look at:
- Respect / professionalism
- Responsiveness to feedback
- Time for learning
- Perceived fairness of workload
These are the canaries in the coal mine.
6. Cross-Specialty and Program-Type Interactions
Satisfaction is not only a function of program type. It is also specialty‑specific. Some specialties tolerate (or even expect) harsher conditions; others do not.
For example, if you plotted “would choose same program again” by both specialty and program type, you might see:
| Specialty / Program Type | Min (%) | Q1 (%) | Median (%) | Q3 (%) | Max (%) |
|---|---|---|---|---|---|
| IM Academic | 55 | 62 | 68 | 75 | 82 |
| IM Community | 65 | 72 | 78 | 84 | 90 |
| Surgery Academic | 40 | 48 | 55 | 62 | 70 |
| Surgery Community | 55 | 60 | 66 | 72 | 80 |
| Psych Academic | 70 | 76 | 82 | 88 | 94 |
| Psych Community | 78 | 83 | 88 | 92 | 96 |
Reading this:
- Surgical residents in academic centers: much broader spread, lower medians. Higher risk of regret.
- Psychiatry and internal medicine residents in community programs: consistently high re‑choose rates.
So a surgery applicant reading glowing comments about an academic program should discount them more heavily than a psychiatry applicant reading similarly positive comments about a community site. Base rates matter.
7. Red and Green Flags in Survey-Driven Decision Making
Post‑match survey data, even informally relayed, are full of flags. You should treat them like actual signals, not gossip.
Clear Red Flags (Especially in High-Risk Program Types)
- Multiple residents, across PGY levels, independently say: “If I could do it again, I would not rank this program #1.”
- Consistently low scores (or repeated verbal complaints) about:
  - Respect from faculty
  - Responsiveness of leadership
  - Handling of mistreatment
- Large gaps between “quality of training” (high) and “overall satisfaction” (low). That is a classic burnout pattern.
- High attrition or transfers out in the last 3–5 years.
In an academic or military program where the baseline dissatisfaction rate is already elevated, you should treat any one of these as a major risk escalator.
Strong Green Flags
- High concordance between what leadership says and what residents say about challenges. No gaslighting.
- Residents explicitly recommending their program to friends applying next cycle.
- Residents’ complaints are specific but bounded: “The ICU months are rough, but leadership is working on X; overall I am glad I am here.”
- Clear evidence of survey‑driven changes over time (e.g., “We complained about didactics, they added protected time and it is better”).
When survey feedback has led to visible changes, your prediction of future satisfaction should adjust upward relative to historical averages.
8. How to Actually Gather and Use This Data
You will not get a complete NRMP‑level dataset. You do not need one. You can still steal the logic.
Here is a simple flow of how to integrate survey‑style data into your rank strategy:
| Step | Description |
|---|---|
| Step 1 | Shortlist Programs |
| Step 2 | Identify Program Type |
| Step 3 | Collect Resident Feedback |
| Step 4 | Score Key Domains 1-5 |
| Step 5 | Apply Personal Weights |
| Step 6 | Adjust for Program-Type Base Risk |
| Step 7 | Rank by Predicted Satisfaction |
| Step 8 | Manual Override for Major Priorities |
You “collect resident feedback” by:
- Asking structured questions on interview day:
  - “Would you choose this program again?”
  - “What are the 2–3 biggest downsides?”
  - “Has leadership responded well to feedback in the last few years?”
- Contacting recent alumni if possible.
- Cross‑checking with online forums—but treating them as anecdotal, not definitive.
Then you force yourself to convert that qualitative input into 1–5 scores in those core buckets:
- Education
- Lifestyle / workload
- Culture / respect / responsiveness
- Geography / personal fit
Yes, it is imperfect. No, you are not over‑quantifying your life. You are just making your own assumptions explicit.
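If you want steps 4 through 7 of that flow in one place, here is a compact end-to-end sketch. Every program name, score, and rate is a placeholder for your own notes:

```python
# End-to-end sketch of steps 4-7: score domains, apply personal weights,
# penalize by program-type base risk, and rank. All values are placeholders.
WEIGHTS = {"education": 0.40, "lifestyle": 0.35, "culture": 0.15, "geography": 0.10}
DISSATISFACTION_RATE = {"academic": 0.16, "large_community": 0.09,
                        "community_affiliated": 0.11, "military": 0.25, "rural_small": 0.08}

programs = [
    {"name": "Program A", "type": "academic",
     "education": 4.8, "lifestyle": 3.2, "culture": 3.5, "geography": 4.0},
    {"name": "Program B", "type": "large_community",
     "education": 4.0, "lifestyle": 4.2, "culture": 4.3, "geography": 3.5},
    {"name": "Program C", "type": "community_affiliated",
     "education": 3.9, "lifestyle": 4.0, "culture": 4.5, "geography": 3.0},
]

def predicted(program: dict) -> float:
    """Weighted composite, rescaled and penalized by program-type base risk."""
    comp = sum(WEIGHTS[d] * program[d] for d in WEIGHTS)
    return 0.9 * (comp / 5) * (1 - DISSATISFACTION_RATE[program["type"]])

for p in sorted(programs, key=predicted, reverse=True):
    print(f"{p['name']} ({p['type']}): predicted satisfaction {predicted(p):.2f}")
```

Step 8 stays manual on purpose: the model produces an ordering you then override consciously, not a final answer.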
9. Common Misreads of Survey Data That Lead to Bad Ranks
Three recurring errors I see:
1. Overweighting prestige, underweighting dissatisfaction. An applicant sees “big name” and strong fellowship match lists and ignores that 30–40% of residents publicly say they would not choose the program again. That is how you walk into a voluntary misery trap.
2. Confusing loudness with prevalence. A single very vocal unhappy resident on Reddit does not equal a program‑level problem. You need pattern, not volume. If 1 resident is bitter and 7 on interview day are content, weight accordingly.
3. Assuming community = ‘less than.’ Plenty of data suggest community‑based and community‑affiliated programs have higher satisfaction, lower burnout, and acceptable fellowship placement for most fields. Dismissing them because your classmates chase logos is not rational.
The data are clear: if anything, the default bias should run the other way. Prestige programs need to prove they are not trading your well‑being for their brand.
10. Putting It All Together for Your Rank List
If you do this right, by the time you are finalizing your rank list, you should be able to say for each program:
- Program type and associated base-rate satisfaction risk.
- Your 1–5 scores for education, lifestyle, culture, geography.
- Your weighted composite.
- Your gut-level read—and where it intentionally diverges from the model.
Where you should diverge on purpose:
- You are dead set on a niche academic career and are willing to accept a higher dissatisfaction risk for better mentorship and research infrastructure.
- You have immovable geographic constraints, and your top geographic choice is slightly weaker on satisfaction metrics but still acceptable.
- You have trusted resident mentors at a program with middling survey scores, and you have strong qualitative reasons to think your experience would be better than the median.
Where you should not keep lying to yourself:
- Rank‑ordering three academic flagships with clear culture problems over a mid‑tier community‑affiliated program where 85–90% of residents would choose it again and speak well of leadership.
- Ignoring repeated negative internal survey themes about mistreatment and non‑responsive leadership because “the fellowship match is great.”
The data show that program type is not destiny. But it is not irrelevant, either. It is a prior, a base rate. The residents answering post‑match surveys are essentially time‑shifted versions of you, feeding you information you do not have yet.
Use it.
With a rank list that treats survey data as real evidence—and program type as a measurable risk factor—you are not just hoping you will be happy. You are stacking the odds. The next phase is living that choice once Match Day hits and your actual program is no longer hypothetical. But that is a story for another day.
FAQ
1. I do not have access to formal post‑match survey reports. How can I still apply this approach?
You approximate. Use structured questions for current residents on interview day, ask about “would choose again,” downsides, and leadership response to feedback. Convert their answers into rough 1–5 scores on education, lifestyle, and culture. Combine that with what you know about the program type (academic vs community, etc.) and apply the same weighting and adjustment logic. Imperfect data are still better than guessing.
2. How much weight should I give to program type vs individual program characteristics?
Empirically, program type explains a meaningful but not dominant share of variance—think on the order of 15–30% in satisfaction outcomes. I would not let program type account for more than one‑third of your decision weight. Use it as a baseline risk factor, then let program‑specific signals (resident culture, leadership behavior, call schedule, geography) do the rest of the heavy lifting.
3. Are academic programs actually bad for satisfaction, or is that overstated?
They are not “bad,” but the data consistently show more stress and higher dissatisfaction rates than strong community programs. You often get superior subspecialty exposure, research, and networking at the cost of workload and bureaucracy. If you value those academic benefits highly and are realistic about the tradeoff, an academic program can be the correct choice. The mistake is treating academic prestige as a free good with no downside.
4. How should I factor in my long‑term career goals (fellowship, academics) with satisfaction data?
Translate those goals into higher weights on education and academic exposure in your utility function. That will push strong academic or community‑affiliated programs up your model even if their lifestyle scores are slightly lower. But still apply a dissatisfaction penalty. Burning out in PGY‑2 will not help your academic career. Aim for the program where your probability of staying functional and productive is highest, not just the one with the fanciest letterhead.
5. What if my gut feeling about a program conflicts with the survey-based prediction?
First, interrogate why: is your gut driven by one charismatic resident, a flashy hospital, or actual alignment with your values? Then look back at the specific low‑scoring domains in the survey data (culture, respect, responsiveness, workload). If your gut optimism does not directly address those weaknesses, you are probably rationalizing. I tell applicants: if your gut and the data disagree, you may override the model once or twice—but do it consciously and for reasons you can write down, not because “it just felt right on interview day.”