
The way most applicants compare residency programs is broken. You do not need more vibes. You need a red‑flag scorecard that forces reality onto the table.
You are not choosing a vacation. You are choosing the environment that will shape your skills, mental health, and career trajectory for years. “The residents seemed happy” is not a data point. It is background noise.
You fix that by building a structured, ruthless scorecard.
Below is a step‑by‑step system I have used with applicants to turn hand‑wavy impressions into something you can actually compare and defend. You will walk away with a living document that:
- Survives interview day hype
- Filters out marketing and “we’re a family” nonsense
- Highlights genuine danger zones before you rank
Let’s build it.
Step 1: Decide What Your Scorecard Is For (and What It Is Not)
Your red‑flag scorecard has one job: identify programs that can damage your training or wellbeing, no matter how shiny they look on paper.
It is not about:
- Prestige signaling
- Whether your medical school classmates will be impressed
- The fanciest hospital lobby or best “free food” situation
The scorecard is your safety system, not your dreamboard.
Think of it like this:
- Green flags = nice to have
- Yellow flags = be cautious, ask more
- Red flags = rank lower or cut entirely
You will still have a separate list for positive factors (case volume, fellowship match, location appeal), but this article is about a red‑flag scorecard only—the “do I trust this place with three to seven years of my life?” filter.
Step 2: Define Your Red‑Flag Categories
Do not start with 30 categories; you will never use them all. You want 6–9 domains that actually predict suffering and poor training.
Here is a core set that works across most specialties:
- Workload and Coverage Safety
- Culture and Psychological Safety
- Education and Supervision
- Program Stability and Leadership
- Resident Outcomes and Retention
- Compliance and Duty Hours
- Support Services and Resources
- Equity, DEI, and Professionalism Handling
- Geography / Life Logistics (as a risk factor)
You can merge or split a couple, but if you drop any of the top 6, you are making a mistake.
Let me spell out what belongs in each, because this is where people usually under‑specify.
1. Workload and Coverage Safety
This is “Will they work me into the ground and then blame me when things break?”
Look at:
- Number of residents per service vs average census
- Frequency of unfilled slots / chronic understaffing
- How the “jeopardy” backup system works and how often it is actually activated
- Presence of non‑physician support (APPs, scribes, nocturnists, phlebotomy, transport)
- Realistic call schedule vs what is advertised
2. Culture and Psychological Safety
You are trying to detect: fear, blame, bullying, and fake “family.”
Signals:
- Residents afraid to speak candidly (glances at chief before answering)
- Comments like “you just have to keep your head down intern year”
- Normalization of yelling, public shaming, or “pimping” that feels like hazing
- Leadership boasting about “grit” and “thick skin” more than mentorship
- Any hint that therapy, pregnancy, illness, or disability are viewed as weakness
3. Education and Supervision
Red flags here are subtle because every program claims to care about education.
You want to know:
- Who actually teaches on busy rotations (attendings vs residents teaching each other)
- Structured teaching: daily didactics, case conferences, simulation
- How often residents feel unsafe due to lack of supervision
- Clear expectations for graduated responsibility vs sink‑or‑swim chaos
- Whether procedures are supervised properly or signed off just to meet numbers
4. Program Stability and Leadership
If the leadership is unstable, everything else fluctuates with it.
Look for:
- Recent turnover of PD, APDs, or core faculty
- History of probation, ACGME citations, or closure of sister programs
- Vague answers about “changes happening next year” with no specifics
- Residents who do not know what the 3–5 year plan is
5. Resident Outcomes and Retention
You want to know what actually happens to people during and after training.
Data points:
- Board pass rates (not just last year, but trend)
- Fellowship match or job placement that fits their stated goals
- Residents leaving the program early, switching specialties, or going “non‑categorical”
- PGY‑2 and PGY‑3 classes that suddenly have gaps or lots of “new” faces
6. Compliance and Duty Hours
This is usually where programs either lie or dodge.
Watch for:
- Residents openly saying “we do not log honestly”
- Pressure to falsify duty hours (“put that call as home” / “delete that” / “you can fix this later”)
- Chronic post‑call work past noon being brushed off as “expected”
- No mechanism to report violations anonymously
7. Support Services and Resources
You are looking for whether the system is set up for humans or for disposable cogs.
Things to ask about:
- Access to mental health care that is confidential and actually available
- Availability of backup for personal emergencies
- Reasonable parental leave that people actually use
- Housing support, parking logistics, food access on nights/weekends
- IT and EMR support, availability of workstations, call rooms, lockers
8. Equity, DEI, and Professionalism Handling
This is not fluff. Poor handling of bias and harassment is a direct red flag.
Signals:
- Vague DEI talk with no examples of concrete policies or actions
- Residents from minoritized groups looking visibly hesitant answering DEI questions
- Any “we have no issues here” response (every program has something; denial is alarming)
- Formal process for reporting harassment or discrimination—and whether anyone trusts it
9. Geography / Life Logistics (as Risk)
Location is not just “fun city vs boring town.” It can be a safety or burnout risk.
Examples:
- Extreme cost of living with inadequate salary
- Dangerous commute at night, poor public transport
- Distance from any personal support system
- Limited childcare options if that matters for you
Step 3: Turn Categories into a Scoring Scale
You now need a consistent scale that you can use fast, on the fly.
I prefer a 0–3 red‑flag scale per category:
- 0 = Clean: No meaningful red flags identified. Minor annoyances only.
- 1 = Mild Concern: Some yellow flags; watch but not disqualifying alone.
- 2 = Significant Concern: Clear red flags or multiple yellows; the program drops on your rank list unless there are very strong counter‑reasons.
- 3 = Dealbreaker / Hard No: Serious safety, ethics, or culture problems. Do not rank, or rank at the very bottom.
You will tally these across categories, but do not treat them like cute test scores. A single 3 in the wrong place (e.g., duty hour fraud, serious abuse) can overshadow a bunch of zeros.
To visualize this quickly, use a simple bar chart mindset: higher total red‑flag score = more dangerous program.
| Program | Total Red‑Flag Score |
|---|---|
| Program A | 3 |
| Program B | 9 |
| Program C | 1 |
| Program D | 7 |
| Program E | 5 |
Quick rule of thumb:
- 0–3 total: Likely safe on red‑flag front
- 4–7: Proceed with caution; scrutinize closely
- 8+: Something is structurally wrong; needs a strong reason to stay on your list
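If you want this rule enforced mechanically rather than by willpower, here is a minimal Python sketch. The category names, cutoffs, and the `risk_tier` function are illustrative, not a standard; it simply encodes the 0–3 scale above, including the rule that a single 3 overrides the total.

```python
# Illustrative sketch of the 0-3 scale and the rule-of-thumb cutoffs.
# Any single 3 is treated as a dealbreaker, regardless of the total.

def risk_tier(scores: dict[str, int]) -> str:
    """Map per-category red-flag scores (0-3) to a rough tier."""
    if 3 in scores.values():
        return "dealbreaker: do not rank, or rank at the very bottom"
    total = sum(scores.values())
    if total <= 3:
        return "likely safe on the red-flag front"
    if total <= 7:
        return "proceed with caution; scrutinize closely"
    return "structurally wrong; needs a strong reason to stay on the list"

print(risk_tier({"workload": 1, "culture": 0, "duty_hours": 1}))
# -> likely safe on the red-flag front
```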
Step 4: Build the Actual Scorecard Layout
Keep this simple enough that you can fill it in the same day as the interview.
Use a one‑page grid per program:
| Category | Score (0–3) | Notes / Examples (specific, not vague) |
|---|---|---|
| Workload & Coverage Safety | | |
| Culture & Psychological Safety | | |
| Education & Supervision | | |
| Program Stability & Leadership | | |
| Resident Outcomes & Retention | | |
| Compliance & Duty Hours | | |
| Support Services & Resources | | |
| Equity, DEI, Professionalism | | |
| Geography / Life Risk | | |
| **Total Red‑Flag Score** | | |
Basic rules:
- Fill in scores the day of the interview while memory is fresh.
- Under “Notes,” write direct quotes and concrete details, not feelings.
- Bad: “Vibes were off.”
- Good: “Resident: ‘Everyone lies on duty hours or you get in trouble.’”
- If you are unsure, write a question mark and mark it 1 for now, then adjust after follow‑up.
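If you prefer a script to paper, the one‑page grid translates directly into a small data structure. A sketch only; the `Scorecard` class and field names are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Scorecard:
    """One program's red-flag grid: scores and concrete notes per category."""
    program: str
    scores: dict[str, int] = field(default_factory=dict)  # category -> 0-3
    notes: dict[str, str] = field(default_factory=dict)   # category -> quotes, specifics

    def total(self) -> int:
        return sum(self.scores.values())

card = Scorecard("Program A")
card.scores["Compliance & Duty Hours"] = 2
card.notes["Compliance & Duty Hours"] = (
    'Resident: "Everyone lies on duty hours or you get in trouble."'
)
print(card.total())  # 2
```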
Step 5: Decide Your Non‑Negotiables in Advance
If you do not define dealbreakers before interview season, you will rationalize anything when a big‑name program flatters you.
Sit down once and write:
- Automatic 3s (dealbreakers)
- Automatic 2s (strong concerns)
Example automatic 3s:
- Evidence or credible reports of retaliation for reporting concerns
- Systematic duty hour falsification or threats tied to reporting
- Normalized physical or verbal abuse (“Our attending throws instruments sometimes, but that is just how they are”)
- Program on ACGME probation for resident wellbeing or education issues
- Residents clearly fearing leadership (checking over shoulder, dodging questions)
Example automatic 2s:
- Chronic understaffing with no plan to fix it
- PD recently resigned and no clarity on new leadership
- More than one resident leaving or switching out in last 2–3 years
- Repeated comments about “we are working on culture” without specifics
Write these in the margin of your template. You want as little “maybe it’s fine” wiggle room as possible.
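If you keep your scorecards in a script, you can hard‑code the non‑negotiables so interview‑day charm cannot soften them. A hedged sketch with example triggers drawn from this article; the tag strings and `forced_minimum` function are arbitrary.

```python
# Pre-committed overrides: observing any trigger forces a minimum score.
AUTOMATIC_3 = {
    "retaliation for reporting",
    "duty hour falsification",
    "normalized abuse",
    "probation for wellbeing/education",
}
AUTOMATIC_2 = {
    "chronic understaffing, no plan",
    "PD resigned, no successor clarity",
    "multiple residents left recently",
}

def forced_minimum(observations: set[str]) -> int:
    """Return the score floor your dealbreaker list imposes (0 if none)."""
    if observations & AUTOMATIC_3:
        return 3
    if observations & AUTOMATIC_2:
        return 2
    return 0

print(forced_minimum({"duty hour falsification"}))  # 3, no matter the vibes
```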
Step 6: What Data to Collect and How to Ask
A scorecard is only as good as the information feeding it. You will get some from websites, some from interview day, and some from back‑channel conversations.
Before Interview Day: Pre‑Fill What You Can
From program websites, FREIDA, ACGME, and forums, pull:
- Program size and call structure (night float vs 24‑hour call)
- Known probation history or big restructuring
- Location cost‑of‑living vs salary
- Board pass rate if available
- Any published duty hour violations or news articles
Pre‑flag anything concerning with a provisional 1 so you remember to ask.
On Interview Day: Ask Targeted, Uncomfortable Questions
You do not need to be rude. You need to be precise.
To residents (in a no‑faculty room):
- “How often are people staying more than 2 hours past shift end?”
- “Do you feel comfortable calling your attending at 2 a.m. if you are worried?”
- “Have any residents left the program in the last few years? Why?”
- “If you report a problem—duty hours, mistreatment—does anything actually change?”
- “How often do you get formal teaching on your busiest rotations?”
To PD/leadership:
- “What are you most actively trying to improve about this program in the next 2 years?”
- “Have you had any ACGME citations or probation in the last 5 years? What changed because of it?”
- “How do you handle underperforming or struggling residents?”
- “How do residents give you feedback, and can you give an example of something that actually changed?”
Watch not just what they say but how they say it. Evasive, defensive, or over‑polished answers are data.
Step 7: Fill the Scorecard the Same Day
Do not trust your future self. After a long interview trail, programs blur together.
Your post‑interview protocol:
- Within 2 hours of leaving (or logging off Zoom), jot bullet points on your phone or notebook. Raw, unfiltered.
- That evening, transfer to your scorecard sheet for that program. Assign 0–3 scores per category.
- Highlight any category with score ≥2 in red or bold.
If you are interviewing at many programs, this is where a simple spreadsheet helps. Do not get fancy. Program name in rows, categories in columns, automatic sum at the end.
| Order | Action |
|---|---|
| 1 | Interview day |
| 2 | Immediate notes |
| 3 | Same‑day scorecard |
| 4 | Assign scores per category |
| 5 | Highlight 2s and 3s |
| 6 | Update master comparison sheet |
| 7 | Adjust rank list later |
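For the master comparison sheet, a spreadsheet is fine; if you would rather generate one, here is a sketch using only Python's standard library. Program names, scores, and the file name are placeholders.

```python
import csv

CATEGORIES = ["Workload", "Culture", "Education", "Leadership",
              "Outcomes", "Duty Hours", "Support", "DEI", "Geography"]

# Illustrative scores: one row per program, one column per category.
programs = {
    "Program A": [1, 0, 1, 0, 0, 1, 1, 0, 0],
    "Program B": [2, 3, 2, 2, 2, 3, 1, 2, 2],
}

with open("red_flag_master.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Program", *CATEGORIES, "Total"])
    for name, scores in programs.items():
        writer.writerow([name, *scores, sum(scores)])  # automatic sum at the end
```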
Step 8: Compare Programs Using the Scorecard (Not Your Ego)
Once you have 5–10 completed scorecards, the patterns will start to jump out.
Now you:
- Sort programs by total red‑flag score
- Then look at where those points sit
Example:
- Program X: Total 6, but mostly 1s spread across categories
- Program Y: Total 5, but a single 3 in “Culture & Psychological Safety” and a 2 in “Duty Hours”
On paper, Y has fewer total points. In real life, Y is riskier. Your system should respect critical categories.
One way to structure this mentally is to treat a few domains as “weighted heavier.” For instance:
- Any 3 in Culture, Duty Hours, or Supervision should push a program to the bottom tier, regardless of total
- Clusters of 2s in Workload + Outcomes probably mean chronic burnout or poor training
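Here is one way to encode that weighting in Python. The `CRITICAL` set and the two‑part sort key are my illustration of the idea, not a standard formula: a 3 in a critical domain sinks a program before totals are even compared.

```python
CRITICAL = {"Culture", "Duty Hours", "Supervision"}

def sort_key(scores: dict[str, int]) -> tuple[int, int]:
    """Sort programs: critical 3s go to the bottom tier, then by total."""
    bottom_tier = any(scores.get(c, 0) == 3 for c in CRITICAL)
    return (1 if bottom_tier else 0, sum(scores.values()))

# Program X: total 6, spread as 1s. Program Y: total 5, with a critical 3.
programs = {
    "X": {"Workload": 1, "Culture": 1, "Education": 1,
          "Leadership": 1, "Outcomes": 1, "Duty Hours": 1},
    "Y": {"Culture": 3, "Duty Hours": 2},
}
print(sorted(programs, key=lambda name: sort_key(programs[name])))
# -> ['X', 'Y']: Y ranks worse despite the lower total
```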
You can visualize this with a quick boxplot of each category's scores across all your programs, which shows where any single program sits relative to the pack.
| Category | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| Workload | 0 | 1 | 1 | 2 | 3 |
| Culture | 0 | 1 | 2 | 2 | 3 |
| Education | 0 | 0 | 1 | 2 | 3 |
| Leadership | 0 | 0 | 1 | 1 | 2 |
| Outcomes | 0 | 0 | 1 | 1 | 2 |
| Duty Hours | 0 | 1 | 1 | 2 | 3 |
You do not need software to do this, but seeing that one program consistently hits 2–3 in multiple domains should make you pause.
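You do not need plotting software for the five‑number summaries either; Python's `statistics` module produces them from raw scores. The data here are made up.

```python
import statistics

# Culture scores across all programs you have scored so far (illustrative).
culture = [0, 1, 1, 2, 2, 2, 3]
q1, median, q3 = statistics.quantiles(culture, n=4)
print(min(culture), q1, median, q3, max(culture))  # 0 1.0 2.0 2.0 3
```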
Step 9: Integrate Red‑Flag Score with “Positive” Ranking Factors
You still care about:
- Case volume and procedural exposure
- Academic reputation and fellowship opportunities
- Location personal preferences
- Niche interests (global health, QI, research)
Fine. Build a second scorecard for positives if you want, but keep the red‑flag one independent.
Practical way to merge:
- First, exclude or bottom‑rank programs with catastrophic red‑flag scores or any hard 3s.
- Among the remaining, use your positive factors to stratify.
Think of it as a two‑step filter:
- Safety / non‑toxic baseline (red‑flag scorecard)
- Fit and ambition (positive features)
Do not reverse that order. If you start with “dream programs” and then try to fit in red‑flags, you will rationalize away real problems.
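A minimal sketch of the two‑step filter, assuming each program carries a red‑flag dict and some positive "fit" score of your own design; the field names and the 8+ cutoff are placeholders matching the rule of thumb earlier.

```python
def rank_list(programs: list[dict]) -> list[dict]:
    """Step 1: drop hard 3s and catastrophic totals. Step 2: sort by fit."""
    safe = [p for p in programs
            if 3 not in p["red_flags"].values()
            and sum(p["red_flags"].values()) < 8]
    return sorted(safe, key=lambda p: p["positives"], reverse=True)

programs = [
    {"name": "A", "red_flags": {"culture": 1, "duty_hours": 1}, "positives": 7},
    {"name": "B", "red_flags": {"culture": 3}, "positives": 10},  # hard 3: cut
]
print([p["name"] for p in rank_list(programs)])  # -> ['A']
```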
Step 10: Use the Scorecard to Structure Back‑Channel Conversations
Your classmates, alumni, and “friend of a friend who did residency there” are not automatically reliable. They are biased by their personality, year, and tolerance for garbage.
You improve the signal by anchoring them to your categories:
- “How was the culture? Any bullying or fear about speaking up?”
- “How honest were people about duty hours and documentation?”
- “Did many residents leave or fail boards while you were there?”
- “If you could change one thing about the workload or staffing, what would it be?”
Then you translate their answers into your scoring system. Not “They said it was tough but fine.” Instead: “Workload probably a 2; culture maybe a 1; duty hours unclear.”
This way, back‑channel intel becomes structured data, not gossip.
Step 11: Update Your Scorecards Over the Interview Season
Programs change. PDs leave. Residents graduate. You should treat your scorecard as a snapshot, not scripture.
Practical maintenance routine:
- After each interview block, re‑scan all scorecards briefly.
- If you learn new information (probation, leadership change) from emails or forums, update that program’s scores.
- Before certifying your rank list, do one last pass:
- Any program with new red flags moves down.
- If something has improved (new PD you respect, better staffing), you can cautiously lower a 2 to 1—with notes.
| Month | Program A | Program B | Program C |
|---|---|---|---|
| October | 4 | 2 | 6 |
| November | 4 | 5 | 6 |
| December | 3 | 6 | 6 |
| January | 3 | 7 | 6 |
Programs that accumulate red‑flag points over the season rarely surprise you in a good way later.
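A small sketch of that maintenance habit in code, mirroring the month‑by‑month table above; the programs, months, and totals are all illustrative.

```python
# Dated total snapshots per program; flag anything drifting upward.
history = {
    "Program B": [("October", 2), ("November", 5), ("December", 6), ("January", 7)],
    "Program C": [("October", 6), ("January", 6)],
}

for name, snapshots in history.items():
    totals = [total for _, total in snapshots]
    if totals[-1] > totals[0]:
        print(f"{name}: total rose {totals[0]} -> {totals[-1]}; re-check before ranking")
```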
Step 12: Common Mistakes to Avoid with Your Scorecard
I have watched applicants build beautiful systems and then ignore them when prestige shows up. Do not do this.
Here are the usual failure modes:
Over‑weighting brand name
- “Yes, everyone seems miserable, but it is [insert big‑name hospital].”
- Translation: You are volunteering for suffering to impress people who will forget in a year.
Ignoring your own data
- You literally wrote: “Residents terrified of PD. Multiple left. Duty hour lying.”
- Then you still rank them top 3 because “amazing research.” That is self‑betrayal.
Letting one friendly resident override systemic issues
- There is always one PGY‑3 who “loves it here.” They might be unusually resilient. Or delusional. Or leaving next year. You are not them.
Not distinguishing between “hard” and “toxic”
- Intense but fair vs abusive is a crucial line. Hard but supported can be excellent training. Hard plus gaslighting and retaliation is a red flag factory. Your scorecard should reflect that difference.
Forgetting your life outside the hospital
- Three years of impossible commutes, unsafe neighborhoods, or isolation from any support can break you even if the program is “good on paper.” Respect your geography / life‑risk category.
Step 13: A Concrete Example Walk‑Through
Let me show you how this works with two hypothetical internal medicine programs.
Program Alpha
- Residents candid about being busy but say “We feel supported and can always call attendings.”
- PD has been in role 8 years, clear long‑term plan.
- One ACGME citation 3 years ago for inadequate ambulatory experiences, since corrected with visible changes.
- Board pass rate >95% for last 5 years.
- Cost of living moderate, salary reasonable, commute 20–30 minutes.
You score:
- Workload & Coverage: 1 (busy but staffed)
- Culture & Psych Safety: 0
- Education & Supervision: 0–1 (strong overall)
- Leadership & Stability: 0
- Outcomes & Retention: 0
- Duty Hours: 1 (occasional long days but honest logging)
- Support Resources: 1
- DEI & Professionalism: 0–1 (clear policies, used at least once)
- Geography / Life Risk: 0
Total: 3–5. This is a green program with realistic, non‑toxic challenges.
Program Beta
- Multiple residents mention “just getting through intern year” and “you learn not to complain.”
- PD left 6 months ago; interim PD “still figuring things out.”
- Two residents left in last 3 years; rumor on forum about bullying by a specific attending.
- Residents all laugh vaguely when asked about duty hours: “We log what we have to.”
- Great research and fellowship match; big‑name institution in high cost‑of‑living city; salary low.
You score:
- Workload & Coverage: 2 (chronic short‑staffing on nights)
- Culture & Psych Safety: 3 (fear and normalized suffering)
- Education & Supervision: 2 (learn a lot but trial‑by‑fire)
- Leadership & Stability: 2 (interim PD, unclear direction)
- Outcomes & Retention: 2 (residents leaving)
- Duty Hours: 3 (dishonest logging strongly implied)
- Support Resources: 1
- DEI & Professionalism: 1–2 (no clear examples; rumors of issues)
- Geography / Life Risk: 2 (expensive city, low salary)
Total: 18–19. This should fall hard on your rank list, no matter its national reputation.
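To sanity‑check those totals, here is the same walk‑through as data; ranges like 0–1 are encoded as (low, high) tuples. Everything here restates the illustrative scores above.

```python
def total_range(scores: dict) -> tuple[int, int]:
    """Sum fixed scores and (low, high) ranges into a total range."""
    low = sum(s[0] if isinstance(s, tuple) else s for s in scores.values())
    high = sum(s[1] if isinstance(s, tuple) else s for s in scores.values())
    return low, high

alpha = {"workload": 1, "culture": 0, "education": (0, 1), "leadership": 0,
         "outcomes": 0, "duty_hours": 1, "support": 1, "dei": (0, 1), "geo": 0}
beta = {"workload": 2, "culture": 3, "education": 2, "leadership": 2,
        "outcomes": 2, "duty_hours": 3, "support": 1, "dei": (1, 2), "geo": 2}

print(total_range(alpha))  # (3, 5): green, realistic challenges
print(total_range(beta))   # (18, 19): falls hard on the rank list
```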
Step 14: How This Ties Into the “Future of Medicine”
This is not just about your personal comfort. A generation of residents who normalize toxic training environments becomes a generation of attendings who perpetuate them.
When you:
- Systematically identify dangerous programs
- Refuse to rank programs that retaliate, lie, or abuse
- Feed honest feedback to your med school and peers
You change where applicants apply and match. That pressure is one of the few levers that actually moves program behavior.
Hospitals care about recruitment optics. If the programs with chronic red flags start struggling to fill, or only fill with desperate applicants, leadership eventually responds.
Is it fast? No. Is it perfect? No. But your individual choices, backed by a clear red‑flag scorecard, are a small, concrete way to stop feeding the worst parts of this system.



Key takeaways:
- Build a structured red‑flag scorecard with 6–9 critical categories and a 0–3 scale. Use it the same day as each interview.
- Define non‑negotiable dealbreakers in advance and let them override prestige, hype, and ego.
- Use the scorecard to filter out unsafe or toxic programs first, then rank the remaining ones by fit, training quality, and your long‑term goals.