
The way most students pick residency programs is lazy and expensive. You need a scorecard, not vibes.
Most applicants do one of three things:
- Sort by “name brand” and location
- Copy their classmates’ lists
- Panic-apply to 80+ programs with no clear plan
All three are bad strategies. You are making a multi‑year decision that will shape your training, your sanity, and your career options. You cannot treat that like picking an Airbnb.
A residency program scorecard fixes this. It turns “this feels good” into “this is an 86/100 and clearly better than that 72/100 program for my priorities.”
Here is how to build a real, usable scorecard that actually helps you choose and rank residency programs.
Step 1: Define Your Core Domains (What Actually Matters)
If you skip this part, the rest is noise. You must decide what buckets you are judging programs on.
For most people, the right domains are a mix of:
- Training quality
- Career impact
- Lifestyle and wellness
- Environment and culture
- Personal logistics
Let’s break those into specific, scorable domains. You will not use all of these; you will pick 6–10.
Common Residency Program Domains
Clinical Volume and Breadth
- Patient volume (are you seeing enough, but not drowning?)
- Case mix (bread‑and‑butter vs complex tertiary/quaternary care)
- Procedural opportunities (for surgical fields, EM, IM with procedures)
Teaching and Education
- Quality of didactics (protected time, structure, attendance)
- Faculty teaching culture (do they actually teach on rounds or just sign notes?)
- Feedback quality and frequency
- Board prep support (in‑house review, qbanks, pass rates)
Autonomy and Supervision
- Balance of responsibility vs backup
- Senior vs attending presence on calls, codes, procedures
- Decision‑making opportunities for residents
Fellowship and Job Placement
- Fellowship match data (not vague “we place well,” but concrete numbers)
- Job placement (for fields where you work right after residency)
- Program reputation in your intended subspecialty, if you have one
Program Culture
- How residents talk when faculty are not in the room
- Psychological safety (can you say “I do not know” without punishment?)
- Team dynamics (nurses, APPs, consults)
- Diversity, inclusion, and how they show up beyond the brochure
Wellness and Workload
- Call schedule and nights
- Real vs reported duty hours
- Administrative burden (scut, notes, chasing consults)
- Burnout vibe (you can see this on interview day if you look)
Location and Life Outside Work
- Cost of living
- Proximity to support system
- Commute
- Safety, schools (if relevant), partner job market
Compensation and Benefits
- Base salary + differential (night, weekend)
- Housing support, meal stipends
- Parking, transportation, childcare support
- GME support for conferences, exams, books
Program Stability and Leadership
- Program director tenure
- Recent major changes (mergers, EMR switch, losing key rotations)
- How they handle resident feedback and complaints
Research and Academic Opportunities
- Mentorship availability
- Protected research time (if promised, is it real?)
- Publication and presentation track record for residents
You do not need 20 domains; that many only dilutes the signal. For most applicants, 7–10 is ideal.
Step 2: Assign Weights (Because Everything Is Not Equal)
If you treat all domains as equal, you will get a polite but useless spreadsheet.
You must decide: what counts double? What barely matters?
Think in percent weights that add to 100. Here is a reasonable baseline for a typical internal medicine applicant aiming for fellowship:
| Domain | Weight (%) |
|---|---|
| Clinical Volume/Breadth | 20 |
| Teaching/Education | 15 |
| Fellowship Placement | 20 |
| Program Culture | 15 |
| Wellness/Workload | 10 |
| Location/Life Outside Work | 10 |
| Research Opportunities | 10 |
Different specialty, different weights. A community‑focused FM applicant might flip research to 0–5% and increase location and culture. A surgical applicant might heavily weight operative volume and autonomy.
How to Set Your Own Weights (Practical Method)
Do this on paper or in a notes app:
List your top 8–10 domains.
Start by assigning “default 10%” to each. You now have ~80–100%.
Now ask:
- “Which of these domains, if bad, is a dealbreaker?” Give those +5–10% each.
- “Which of these domains would be nice, but I can live without?” Drop those to 5% or even 0%.
Adjust until you total 100%.
If you are stuck, force yourself to make one domain 25% and one domain 5%. You will discover what you actually care about.
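If nudging percentages up and down until they total 100 gets tedious, you can rough in relative weights and rescale them. A minimal sketch in Python (the domain names and numbers are placeholders, not a recommendation):

```python
def normalize_weights(raw):
    """Rescale rough domain weights to sum to ~100 (small rounding drift aside)."""
    total = sum(raw.values())
    return {domain: round(100 * w / total, 1) for domain, w in raw.items()}

# Rough relative importance; the exact numbers do not matter yet.
rough = {"Volume": 25, "Teaching": 15, "Culture": 15, "Wellness": 10, "Location": 5}
print(normalize_weights(rough))
```

This keeps the ratios you chose ("volume matters five times as much as location") while handling the arithmetic for you.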
For example, an EM applicant who cares a lot about autonomy and lifestyle may end up with:
| Domain | Weight (%) |
|---|---|
| Clinical Volume | 20 |
| Autonomy/Supervision | 20 |
| Program Culture | 15 |
| Wellness/Workload | 20 |
| Location | 15 |
| Comp/Benefits | 10 |
Do not overthink the exact numbers. You are trying to create a reasonable reflection of your priorities, not a perfect economic model.
Step 3: Build a Simple, Scorable Rubric
The scorecard lives or dies on how you translate “vibes” into numbers. You need:
- A consistent scoring scale (e.g., 1–5 or 1–10)
- Clear anchor descriptions so you are not guessing what “4” means every time
- A way to handle missing or fuzzy data
Recommended Scale: 1–5 With Clear Anchors
Use 1–5. Ten‑point scales look precise but are mostly noise.
Example for Program Culture:
5 – Excellent
- Multiple residents on interview day independently describe strong support and psychological safety.
- Minimal trash‑talk, but realistic about challenges.
- Residents interact positively with faculty and each other. You see laughter, not forced smiles.
- You would be genuinely excited to work with these people.
4 – Good
- Overall positive, some minor concerns but nothing alarming.
- A few residents seem tired but not broken.
- No clear red flags, but not “this is my tribe” either.
3 – Mixed/Unknown
- Signals conflict: some residents positive, others guarded.
- Hard to get straight answers.
- Culture may vary heavily by rotation or site.
- You have unanswered questions after the interview day.
2 – Concerning
- Multiple comments about toxicity, lack of support, or conflict with leadership.
- You hear things like “we are like a family” but residents look miserable.
- Visible tension between staff groups.
1 – Red Flag
- Reports of bullying, retaliation, or unsafe practices.
- High recent attrition, residents quitting or transferring.
- You leave thinking “absolutely not.”
You will build similar 1–5 anchor descriptions for your top domains. They do not need to be perfect, but they should be specific enough that “3 vs 4” feels like a real difference.
Step 4: Create the Actual Scorecard Template
Use a spreadsheet. Google Sheets, Excel, Notion, whatever you like. Just not your memory.
Basic Structure
Columns:
- Program name
- Each domain score (1–5)
- Weighted score per domain
- Total composite score
- Notes / red flags
Rows:
- One row per program
Example layout:
| Program | Volume (20%) | Teaching (15%) | Placement (20%) | Culture (15%) | Wellness (10%) | Location (10%) | Research (10%) | Total Score |
|---|---|---|---|---|---|---|---|---|
| Prog A | | | | | | | | |
| Prog B | | | | | | | | |
| Prog C | | | | | | | | |
Formula logic (example with 1–5 scores):
- For each domain:
  Domain Weighted Score = (Score / 5) × Weight
- Total Score = sum of all domain weighted scores (0–100)
You do not round until the end. Let the spreadsheet handle the math.
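If you want to sanity-check what your spreadsheet is doing, the same formula fits in a few lines of Python. The domain names, weights, and scores below are illustrative only:

```python
# Illustrative weights (percent, must total 100) and 1-5 raw scores.
WEIGHTS = {
    "Volume": 20, "Teaching": 15, "Placement": 20, "Culture": 15,
    "Wellness": 10, "Location": 10, "Research": 10,
}

def total_score(scores, weights=WEIGHTS):
    """Convert 1-5 domain scores into a 0-100 weighted composite."""
    assert sum(weights.values()) == 100, "weights must total 100"
    # Per domain: (score / 5) * weight. Round only at the very end.
    return round(sum(scores[d] / 5 * w for d, w in weights.items()), 1)

prog_a = {"Volume": 4, "Teaching": 5, "Placement": 4, "Culture": 4,
          "Wellness": 3, "Location": 3, "Research": 4}
print(total_score(prog_a))  # 79.0
```

A program scoring 5 in every domain lands at exactly 100, so the composite reads naturally as "percent of your ideal program."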
Step 5: Collect Real Data (Not Just Brochure Talk)
Most people fill scorecards with whatever the program says on its website. That is marketing, not data.
You need three sources:
- Official / published data
- What you observe
- What residents tell you when faculty are not around
1. Official / Published Data
Pull from:
- Program websites
- ACGME and FREIDA
- Resident handbooks (if available)
- NRMP data books and Charting Outcomes
- Program presentations on interview day
For things like:
- Case numbers (operative logs, procedure counts)
- Board pass rates
- Fellowship match lists (by year)
- Salary and benefits
- Rotation structure
Log this in a separate tab or section so you do not keep re‑googling.
2. What You Observe
During interviews, treat yourself like an investigator:
- How do residents talk to each other in the pre‑interview social?
- How do they talk about nurses and consults?
- Do they complain only about normal residency pain, or do they hint at something systemic and toxic?
- On hospital tours, do people look surprised to see an interview group or is this routine and organized?
Write notes the same day. Your brain will blur programs together after the 5th Zoom.
3. What Residents Tell You When They Relax
This is where the truth lives.
Ask sharp, specific questions:
- “What is the worst part of this program that people do not talk about on interview day?”
- “If you could change one thing about this program tomorrow, what would it be?”
- “How has the program handled a serious resident issue or mistake?”
- “Do you feel safe admitting when you do not know something?”
You are not trying to catch them. You are giving them permission to be honest.
Update your scorecard immediately after each interview. Not the next day. Memory decay is brutal during interview season.
Step 6: Score Programs Immediately, Then Revisit Later
The first score is based on your immediate impression + notes. It captures gut reaction, which has value.
Process:
Same day as interview:
- Fill in every domain with a 1–5 score
- Jot 3–5 bullet notes for context
- Let the sheet calculate your total score
End of interview season:
- Revisit each program with some distance
- Adjust scores if:
- New information emerges
- Your sense of your priorities changes
- Do not change scores based purely on prestige anxiety
You will notice interesting patterns. For example, you might see:
| Program | Total Score |
|---|---|
| Program A | 88 |
| Program B | 82 |
| Program C | 77 |
| Program D | 74 |
| Program E | 69 |
Then, when you look back, Program B (82) might “feel” better than Program A (88) because A is across the country and your partner cannot move. That is fine. The scorecard is a tool, not a dictator. But at least now your choice is conscious.
Step 7: Use the Scorecard to Build Your Rank List
Here is how you convert this into a rational rank list without letting fear or prestige run the show.
1. Group Programs by Score Tiers
Once everything is scored:
- Tier 1: 85–100 (excellent fits)
- Tier 2: 75–84 (good fits)
- Tier 3: 65–74 (acceptable, with compromises)
- Tier 4: <65 (only rank if you must)
This prevents you from agonizing over whether an 87 vs 85 program should be #3 or #4. They are in the same quality band.
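If you keep your scores in a script rather than a sheet, the tier bands are one small function (cutoffs copied from the list above; program names are hypothetical):

```python
def tier(total):
    """Map a 0-100 composite score onto the four tier bands."""
    if total >= 85:
        return "Tier 1"  # excellent fit
    if total >= 75:
        return "Tier 2"  # good fit
    if total >= 65:
        return "Tier 3"  # acceptable, with compromises
    return "Tier 4"      # rank only if you must

# Hypothetical composite scores.
for name, score in {"Prog A": 88, "Prog B": 82, "Prog C": 69}.items():
    print(f"{name}: {score} -> {tier(score)}")
```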
2. Within Tiers, Break Ties Using 1–2 “Trump” Domains
Pick 1–2 domains that break ties when total scores are close:
- Future fellowship applicant: Fellowship placement > everything
- Long‑term local: Location > everything
- Burned‑out MS4: Wellness/culture > everything
If Program A and B are both Tier 1, but A destroys B on your trump domain, A goes higher.
3. Explicitly Mark “Rank Only If Needed” Programs
Some programs will end up on your list only because you need enough options to match safely.
Mark them as such in your sheet. They may be:
- Score <65 overall
- Or single‑domain disaster (e.g., culture 1/5)
Rank them at the bottom, after every reasonable option. They are for safety, not preference.
Step 8: Real Examples (How This Plays Out)
Let’s run through a simple, condensed example for three hypothetical internal medicine programs.
Applicant Priorities
- Wants cardiology fellowship
- Values teaching and fellowship placement most
- Will tolerate moderate workload
- Location matters but is not dominant
Weights:
- Clinical Volume/Breadth – 20%
- Teaching/Education – 20%
- Fellowship Placement – 25%
- Program Culture – 15%
- Wellness/Workload – 10%
- Location – 10%
Programs (numbers are 1–5 raw scores):
| Domain / Weight | Prog X | Prog Y | Prog Z |
|---|---|---|---|
| Volume (20%) | 5 | 4 | 3 |
| Teaching (20%) | 3 | 5 | 4 |
| Fellowship (25%) | 4 | 5 | 3 |
| Culture (15%) | 3 | 4 | 5 |
| Wellness (10%) | 2 | 3 | 4 |
| Location (10%) | 4 | 3 | 5 |
Compute weighted scores (out of 100):
Prog X
- Volume: (5/5)*20 = 20
- Teaching: (3/5)*20 = 12
- Fellowship: (4/5)*25 = 20
- Culture: (3/5)*15 = 9
- Wellness: (2/5)*10 = 4
- Location: (4/5)*10 = 8
- Total: 73
Prog Y
- Volume: (4/5)*20 = 16
- Teaching: (5/5)*20 = 20
- Fellowship: (5/5)*25 = 25
- Culture: (4/5)*15 = 12
- Wellness: (3/5)*10 = 6
- Location: (3/5)*10 = 6
- Total: 85
Prog Z
- Volume: (3/5)*20 = 12
- Teaching: (4/5)*20 = 16
- Fellowship: (3/5)*25 = 15
- Culture: (5/5)*15 = 15
- Wellness: (4/5)*10 = 8
- Location: (5/5)*10 = 10
- Total: 76
Result:
- Program Y: 85 (Tier 1)
- Program Z: 76 (Tier 2)
- Program X: 73 (edge Tier 2/3)
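This arithmetic is exactly what your spreadsheet does under the hood; a short script reproduces it, with weights and raw scores copied from the tables above:

```python
WEIGHTS = {"Volume": 20, "Teaching": 20, "Fellowship": 25,
           "Culture": 15, "Wellness": 10, "Location": 10}

PROGRAMS = {
    "Prog X": {"Volume": 5, "Teaching": 3, "Fellowship": 4,
               "Culture": 3, "Wellness": 2, "Location": 4},
    "Prog Y": {"Volume": 4, "Teaching": 5, "Fellowship": 5,
               "Culture": 4, "Wellness": 3, "Location": 3},
    "Prog Z": {"Volume": 3, "Teaching": 4, "Fellowship": 3,
               "Culture": 5, "Wellness": 4, "Location": 5},
}

# (raw score / 5) * weight, summed per program.
totals = {name: sum(raw[d] / 5 * w for d, w in WEIGHTS.items())
          for name, raw in PROGRAMS.items()}
print(totals)  # {'Prog X': 73.0, 'Prog Y': 85.0, 'Prog Z': 76.0}
```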
Maybe you liked X's city more than Y's. But if you claim cardiology fellowship is your #1 goal, Y is your rational #1.
This is what the scorecard does: it exposes the difference between your stated priorities and your emotional pulls. Then you decide which one wins, consciously.
Step 9: Common Mistakes and How to Avoid Them
I have seen people blow up good options because they made predictable mistakes. Do not repeat them.
Mistake 1: Overweighting Prestige
US News rankings and “my attending said this is top‑tier” often correlate poorly with:
- Teaching quality
- Resident happiness
- Your actual fellowship chances (especially in non‑academic careers)
Fix:
- Cap “Overall Reputation” at 10–15% weight
- Force it to compete with your other domains
- Do not use prestige as a tiebreaker if it contradicts your trump domain
Mistake 2: Ignoring Red Flags Because “Name Brand”
If a place has:
- Multiple residents quietly warning you
- Recent mass resignations
- Serious concerns about safety or retaliation
I do not care what the logo is. That program should plummet on your list.
Fix:
- Build an automatic "red flag penalty": any program with a culture score of 1 takes a −15 (or larger) hit to its total score, or drops straight to the bottom tier.
- Mark red‑flag programs in bold red in your sheet.
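In script form, the penalty is a single conditional. A sketch, where the −15 figure is the suggested minimum and is yours to tune:

```python
def adjusted_total(total, culture, penalty=15):
    """Subtract an automatic red-flag penalty when culture scores 1/5."""
    return total - penalty if culture == 1 else total

print(adjusted_total(84, culture=1))  # 69: a near-Tier-1 program drops a full tier
print(adjusted_total(84, culture=4))  # 84: unchanged
```

The point of automating it is that you decide the rule once, calmly, before prestige anxiety has a chance to argue you out of it.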
Mistake 3: Overcomplicating the Scorecard
I have seen 25‑column sheets that no one uses after week two.
Fix:
- Start with 6–8 domains
- Only add more if you genuinely use them in decisions
- If you find a domain never influencing your thinking, delete it
Mistake 4: Letting Other People’s Priorities Run Your List
Your classmate wants academics and big‑name research. You want to be near family and do solid community practice. Your lists should not look the same.
Fix:
- Do your weights alone first.
- Compare with trusted friends after, but do not clone theirs.
- If you adjust your weights, write down why. If the “why” is “I got scared,” reconsider.
Step 10: Make the Scorecard Easy to Use During Interview Season
You will be tired. You will be on the road or on Zoom. If your system is clunky, you will not use it.
Practical setup:
Create one main sheet with:
- Program list
- Domain weights locked
- Columns ready for 1–5 scores
- Automatic total score calculation
Create a second tab for:
- Raw notes per program
- Links to program websites
- Specific red/green flags
Keep a quick‑entry version on your phone:
- Even a simple note template:
- Score (1–5): Volume / Teaching / Culture / Wellness / Location
- 3 pros / 3 cons
Later that night, transfer to the main sheet. Ten minutes per program beats confusion in February.
To visualize your final shortlist and how they stack up on key domains, a simple comparison chart helps:
| Program | Volume | Teaching | Culture | Wellness |
|---|---|---|---|---|
| Program A | 18 | 15 | 10 | 6 |
| Program B | 16 | 20 | 12 | 8 |
| Program C | 14 | 18 | 14 | 10 |
You do not need fancy visuals, but sometimes seeing the stack makes the tradeoffs painfully clear. Which is the point.
FAQ
1. Should I change my scorecard weights after interview season starts?
Yes, but rarely and for clear reasons. If you realize, after 5 interviews, that culture and wellness matter more than you admitted, you can adjust weights once and re‑run scores. What you should not do is tweak weights after every interview to justify liking or disliking a specific program. Set an explicit “recalibration day” midway through the season, adjust once, and then lock it.
2. How many programs should I even include in my scorecard?
Ideally, every program you apply to, though that is not always realistic. At minimum, include all programs that offer you an interview. If your spreadsheet feels unmanageable, you applied to too many. For most competitive but not extreme specialties, 15–25 scored programs is common. Extremely competitive fields may require more, but the scorecard helps prevent “spray and pray” from turning into chaos.
3. What if I fall in love with a program that scores lower than another?
Then you look at why. Pull up the breakdown. If the only reason it scores lower is, for example, slightly weaker research but you no longer care about research, explicitly change your weights and recompute. If the lower score is due to serious red flags in culture or workload, think very hard before ignoring that. Emotions are data, but so are residents quietly warning you.
4. How do I get honest information without making residents uncomfortable?
Ask concrete, non‑accusatory questions. Use phrases like “How does the program handle…” instead of “Is your program toxic?” Example: “How does the program respond when residents raise concerns?” or “Can you tell me about a time a resident struggled and what support looked like?” Also, pay attention to what they avoid answering. Silence, long pauses, or “every program has issues” with no specifics—those are data points too.
Key points, no fluff:
- Build a weighted scorecard around 6–10 domains that actually matter to you, not your classmates or your attendings.
- Use consistent 1–5 scoring with clear anchors, log data immediately after each interview, and let the math expose tradeoffs and red flags.
- Use the composite scores to create tiers and a rational rank list, then consciously decide when to override the numbers—rather than letting prestige or panic quietly rewrite your priorities.