
The most common way applicants misjudge programs is by ignoring the numbers behind program expansion and match fill. That mistake is expensive.
If you want to identify genuinely stable training sites, you need to treat residency programs like datasets, not glossy brochures. The story is in the trend lines: how fast they are growing, how reliably they fill, and how often residents leave or are replaced.
Let’s walk through how the data actually behaves and how to use it to your advantage.
1. The Core Metrics That Expose Stability (Or the Lack of It)
You cannot assess program stability from a single-year snapshot. Stability is a time-series problem. Think 5–10 years, not “last cycle.”
Here are the primary quantitative signals I look at when I evaluate a program for stability:
- Match fill rate trends
- Program size and expansion rate
- PGY1–PGY3 (or PGY4) continuity
- Use of SOAP and post-Match scrambling
- ACGME citations and accreditation actions
- Resident complement vs. case volume
You will not always have perfect data for each, but you can get surprisingly close with publicly available information, NRMP data, ACGME reports, and some basic pattern recognition.
Match fill rate trends
The question is not “Did they fill this year?”
The question is “How predictably have they been filling over time?”
For any program, you want to know:
- What percent of positions filled in the Main Residency Match?
- How many were filled by U.S. MD, U.S. DO, and IMGs?
- Did they rely on SOAP?
- What is the volatility year-to-year?
Here is a hypothetical five-year fill-rate history (% of positions filled):
| Year | Program A (Stable) | Program B (Expanding) | Program C (Unstable) |
|---|---|---|---|
| 2019 | 100 | 90 | 70 |
| 2020 | 100 | 100 | 80 |
| 2021 | 100 | 95 | 60 |
| 2022 | 100 | 100 | 75 |
| 2023 | 100 | 100 | 65 |
Interpretation:
- Program A: Rock-solid. Consistently 100% filled. Might be competitive or simply well-run.
- Program B: Some noise around expansion. Variability in early growth is expected, but it stabilized.
- Program C: Chronic underfill. This is a red flag. Either the reputation is poor, support is weak, or leadership is unstable.
If a program has underfilled in 3 out of the last 5 years, you are not looking at a healthy ecosystem. You are looking at a program fighting for survival or quality perception.
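To make “volatility year-to-year” concrete, here is a minimal Python sketch. The numbers are the hypothetical fill percentages from the table above; in practice you would pull them from NRMP Results and Data reports or program websites.

```python
from statistics import mean, pstdev

# Hypothetical fill percentages (% of positions filled) by year,
# taken from the illustrative table above.
fill_history = {
    "Program A": [100, 100, 100, 100, 100],
    "Program B": [90, 100, 95, 100, 100],
    "Program C": [70, 80, 60, 75, 65],
}

for program, fills in fill_history.items():
    underfill_years = sum(1 for f in fills if f < 100)
    print(
        f"{program}: mean fill {mean(fills):.0f}%, "
        f"volatility (std dev) {pstdev(fills):.1f}, "
        f"underfilled {underfill_years} of {len(fills)} years"
    )
```

Program C underfills every single year and carries the highest volatility, which trips the 3-in-5 rule above immediately.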
2. Program Expansion: Growth Pattern vs. Controlled Scaling
“New spots” sound good on paper. The data says: proceed cautiously.
Programs expand for three main reasons:
- Hospital system growth (new service lines, more beds, more volume)
- Institutional push to increase GME footprint and funding
- Desperation to cover service with cheaper labor (you)
Those are very different motives. The numbers will tell you which one you are dealing with.
Measuring expansion rate
Think in terms of percentage change in approved complement over time, not just raw positions added.
If a program went from 6 to 9 categorical spots in 2 years, that is a 50% expansion in intake. If they went from 9 to 20 in 3 years, that is >120% expansion. Completely different stability profile.
| Program | 2019 Categorical PGY1 | 2023 Categorical PGY1 | 4-Year Change | Interpretation |
|---|---|---|---|---|
| IM-A | 12 | 14 | +17% | Slow, controlled growth |
| IM-B | 8 | 16 | +100% | Aggressive expansion |
| IM-C | 10 | 10 | 0% | Stable size |
IM-A: Probably responding to increased volume or system-level growth. Reasonable.
IM-B: You want evidence that infrastructure and faculty scaled with it. Otherwise you are signing up for chaos.
IM-C: Stability in complement is not bad. Could signal mature program with steady demand.
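A quick sketch of the arithmetic, using the hypothetical IM-A/B/C numbers from the table. Computing an annualized growth rate alongside the total change makes multi-year comparisons fairer.

```python
def expansion_rate(start: int, end: int, years: int) -> tuple[float, float]:
    """Return (total % change, annualized % growth) in approved complement."""
    total = (end - start) / start * 100
    annualized = ((end / start) ** (1 / years) - 1) * 100
    return total, annualized

# Hypothetical PGY1 complements from the table above (2019 -> 2023).
for name, start, end in [("IM-A", 12, 14), ("IM-B", 8, 16), ("IM-C", 10, 10)]:
    total, annual = expansion_rate(start, end, years=4)
    print(f"{name}: {total:+.0f}% over 4 years ({annual:+.1f}% per year)")
```

IM-B’s +100% total works out to roughly +19% per year, sustained, which puts it squarely in the red-flag zone discussed next.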
Red flags in expansion patterns
Patterns I have seen before trouble hits:
- Large jump in spots (≥50% growth in 1–2 years) without new affiliated hospitals or major service line expansion.
- Growth concentrated in prelim or “transitional” spots with weak long-term planning.
- PGY1 intake rising quickly while PGY2/PGY3 numbers lag or plateau (implies leakage, attrition, or transfers).
The data view: uncontrolled expansion looks like a startup tripling its engineering headcount without adding managers or product direction. The result is burnout, disorganization, and unhappy people.
You want modest, explainable growth that tracks with hospital capacity and faculty bandwidth.
3. Match Fill Patterns: Who Fills the Spots and How?
Not all “filled” spots are equal.
You care about:
- Fill in the Main Match vs. SOAP
- Proportion of categorical vs. prelim
- U.S. MD / DO vs. IMG mix over time
- Sudden changes in recruitment pattern
Consider this hypothetical four-year recruitment pattern (positions filled by route):
| Year | US MD/DO Main Match | IMG Main Match | SOAP |
|---|---|---|---|
| Year 1 | 8 | 2 | 0 |
| Year 2 | 7 | 3 | 0 |
| Year 3 | 6 | 4 | 1 |
| Year 4 | 4 | 5 | 3 |
Trends that should catch your eye:
- Increasing reliance on SOAP over several years → suggests declining desirability or negative word-of-mouth.
- Dramatic change in composition (e.g., from mostly U.S. MD/DO to mostly SOAP/IMG in 2–3 years) → often tracks with leadership turnover, workload spike, or major institutional problems.
This is not about IMG vs. non-IMG quality. It is about consistency and control. Stable programs usually have a predictable mix with minor year-to-year fluctuations. Unstable sites show sharp swings and scramble behavior.
SOAP as a stress indicator
One SOAP year is not a crisis. Patterns matter.
If a 10-position program had this trajectory:
- Year 1: 10/10 filled in main Match, no SOAP
- Year 2: 8/10 main + 2 SOAP
- Year 3: 6/10 main + 4 SOAP
- Year 4: 5/10 main + 3 SOAP + 2 unfilled
That is not random noise. That is a distress trend.
A truly stable site either:
- Fills consistently in the main Match; or
- Has one aberrant year, then recovers.
You are looking for resilience, not perfection.
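If you want to formalize “distress trend vs. one bad year,” a crude check like this works. The inputs are the hypothetical 10-position trajectory above.

```python
# Hypothetical per-year (main_match, soap, unfilled) counts for a
# 10-position program, mirroring the trajectory above.
history = [(10, 0, 0), (8, 2, 0), (6, 4, 0), (5, 3, 2)]

# Positions NOT filled in the main Match, year by year.
non_main = [soap + unfilled for _, soap, unfilled in history]

# Crude distress test: reliance on SOAP/unfilled rises in most
# year-over-year steps AND the latest year is worse than the first.
rises = sum(1 for a, b in zip(non_main, non_main[1:]) if b > a)
distress = rises >= len(non_main) // 2 and non_main[-1] > non_main[0]
print("Distress trend" if distress else "Noise or recovery")
```

A single aberrant year followed by recovery fails this test, which is exactly the resilience pattern you want to see.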
4. Continuity Across PGY Levels: Tracking “Silent Attrition”
One of the most underused stability signals is internal cohort continuity.
The logic is simple: if a program takes 10 categorical interns yearly, you should see something very close to 10 PGY2s and 10 PGY3s (barring planned research years).
When you see large discrepancies, you should start asking why.
| Year | PGY1 Count | PGY2 Count | PGY3 Count | Quick Read |
|---|---|---|---|---|
| 2021 | 10 | 9 | 10 | Normal variation |
| 2022 | 10 | 7 | 8 | Noticeable loss |
| 2023 | 10 | 6 | 7 | Sustained attrition |
If you repeatedly see 10 PGY1 → 7 PGY2 → 6 PGY3, the data is saying residents are leaving, being non-renewed, or quietly transferring. None of that screams “stable, supportive environment.”
Practical move:
On interview day, literally count the residents listed on the website and on the schedule boards, by PGY level. Ask yourself whether the numbers line up with reported intake. They often do not, and the mismatch is data.
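Here is a sketch of the continuity check, with one subtlety: the honest comparison is diagonal (this year’s PGY1 cohort against next year’s PGY2 count), not within a single column of the table. The counts are the hypothetical numbers from the table above.

```python
# Hypothetical counts: year -> [PGY1, PGY2, PGY3], from the table above.
counts = {
    2021: [10, 9, 10],
    2022: [10, 7, 8],
    2023: [10, 6, 7],
}

# Follow each intern cohort diagonally: PGY1s entering in year Y should
# reappear as PGY2s in Y+1 and PGY3s in Y+2.
for y in sorted(counts):
    for offset, level in ((1, "PGY2"), (2, "PGY3")):
        later = y + offset
        if later in counts:
            started, remaining = counts[y][0], counts[later][offset]
            lost = started - remaining
            if lost > 1:  # tolerate ~1 for planned research years, etc.
                print(f"Cohort entering {y}: down {lost} by {level} ({later})")
```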
5. Using ACGME Data: Citations and Accreditation Actions
You will not get raw resident complaint files, but the ACGME does not hide everything.
What you can extract:
- Frequency and type of citations
- Shortened accreditation cycles
- Warnings, probation, or adverse actions
- Sudden change from “continued accreditation” to “with warning”
A hypothetical citation-count trajectory for one program:
| Year | Citations |
|---|---|
| 2018 | 1 |
| 2019 | 2 |
| 2020 | 4 |
| 2021 | 5 |
| 2022 | 3 |
A single citation about documentation or evaluation forms is not catastrophic. A string of citations about:
- Supervision
- Duty hours
- Educational content
- Patient safety or professionalism climate
…combined with expansion and chronic underfill, paints a very clear picture.
The strongest programs have:
- Long, boring patterns of “continued accreditation”
- Limited serious citations
- No recent warnings or probation
ACGME data is lagged, but trend direction still matters. If accreditation is secure and quiet year after year, that is what stability looks like on paper.
6. Case Volume vs. Resident Complement: The Hidden Ratio
Another analytical lens: does service volume support the number of residents they are training?
You will not get a perfect numerator and denominator, but you can approximate:
- Annual ED visits, discharges, or surgical case volume (often in hospital annual reports)
- Number of residents per year per service line
Hypothetical annual cases per resident, summarized as a five-number spread:
| Program | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| Program X | 180 | 220 | 250 | 280 | 320 |
| Program Y | 120 | 150 | 170 | 190 | 210 |
| Program Z | 240 | 260 | 280 | 310 | 350 |
Interpretation:
- Very low volume per resident → risk of inadequate clinical exposure, especially for procedural specialties.
- Extremely high volume per resident → service-heavy, burnout-prone environment.
- You want a middle band with enough volume to train you without grinding you into dust.
Program expansion without matching volume growth usually results in one of two things:
- Thinner case exposure per trainee; or
- Faculty stretching supervision across more learners than they can reasonably support.
Neither is a stability win.
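A back-of-the-envelope version of this ratio, with hypothetical volumes and an illustrative “middle band.” Sensible thresholds vary a lot by specialty, so treat the cutoffs as placeholders.

```python
# Hypothetical inputs: annual case volume (from a hospital annual
# report) divided by residents on the relevant service.
programs = {
    "Program X": {"annual_cases": 7500, "residents": 30},
    "Program Y": {"annual_cases": 5100, "residents": 30},
    "Program Z": {"annual_cases": 8400, "residents": 30},
}

LOW, HIGH = 200, 300  # illustrative band only; calibrate per specialty

for name, p in programs.items():
    per_resident = p["annual_cases"] / p["residents"]
    if per_resident < LOW:
        flag = "thin exposure risk"
    elif per_resident > HIGH:
        flag = "service-heavy risk"
    else:
        flag = "reasonable band"
    print(f"{name}: {per_resident:.0f} cases/resident/year -> {flag}")
```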
7. How to Combine These Signals Into a Stability Score
You are not going to build a perfect regression model, but you can absolutely build a mental scoring system.
Take each program you are considering and rate:
- Match Fill Stability (0–3)
- Expansion Discipline (0–3)
- SOAP Reliance (0–3, reversed)
- Cohort Continuity (0–3)
- ACGME Stability (0–3)
- Volume-to-Resident Balance (0–3)
Total possible: 18 points. Very rough, but it structures your thinking.
| Metric (0–3 each) | Program A | Program B | Program C |
|---|---|---|---|
| Fill Stability | 3 | 2 | 1 |
| Expansion Discipline | 3 | 1 | 2 |
| SOAP Reliance (reversed) | 3 | 2 | 1 |
| Cohort Continuity | 3 | 1 | 1 |
| ACGME Stability | 3 | 2 | 1 |
| Volume/Resident Balance | 2 | 1 | 2 |
| **Total (out of 18)** | **17** | **9** | **8** |
Program A: Extremely likely to be a stable training site.
Program B: Fast growth, instability indicators, possible growing pains or deeper issues.
Program C: Not expanding recklessly, but struggling with fill, continuity, and accreditation.
No scoring system is perfect. But if you find yourself justifying a program with a “stability score” under, say, 9–10, you should be honest about the risk you are accepting.
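If you want it in code rather than in your head, the whole scoring system is a few lines. The ratings below are the hypothetical ones from the table above.

```python
# Names for the six 0-3 ratings described above.
METRICS = [
    "fill_stability", "expansion_discipline", "soap_reliance",
    "cohort_continuity", "acgme_stability", "volume_balance",
]

# Hypothetical ratings from the table above.
programs = {
    "Program A": dict(zip(METRICS, [3, 3, 3, 3, 3, 2])),
    "Program B": dict(zip(METRICS, [2, 1, 2, 1, 2, 1])),
    "Program C": dict(zip(METRICS, [1, 2, 1, 1, 1, 2])),
}

RISK_THRESHOLD = 10  # below this, be honest about the risk you accept

for name, scores in programs.items():
    total = sum(scores.values())
    verdict = "likely stable" if total >= RISK_THRESHOLD else "high risk"
    print(f"{name}: {total}/18 -> {verdict}")
```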
8. How to Interrogate Programs Without Sounding Paranoid
You do not need to show up with spreadsheets. You just need to ask targeted, numerically focused questions that reveal how they think about their own data.
Some examples that work:
- “How has your intake size changed over the past 5–10 years, and what drove those changes?”
- “Over the last several classes, what proportion of positions have filled in the main Match versus SOAP?”
- “How often do residents leave the program before completing training, and why does that usually happen?”
- “Has the program received any major ACGME citations in the last cycle, and what did you change in response?”
- “How do you monitor whether case volume is sufficient for the current resident complement?”
Listen less to the words, more to the structure of their answer.
Good sign: precise numbers or ranges, trend descriptions, and concrete interventions.
Bad sign: vague reassurances, defensiveness, or “we do not really track that.”

9. Interpreting New vs. Established Programs
New or rapidly expanding programs are not automatically bad. They are just higher variance.
Data approach:
- Age of program (<5 years vs. 5–10 vs. >10)
- Expansion trajectory (single-step vs. continuous ramp)
- Early fill patterns (did they fill quickly to 100%, or are they struggling?)
Typical pattern for a well-designed new program backed by a serious system:
- Year 1–2: Slight underfill or mixed composition
- Year 3–4: Strong word-of-mouth, full Match fill, very limited SOAP use
- Year 5+: Expansion only if service and faculty also expanded
You should be more cautious with a program that is new, aggressively expanding, and already showing SOAP reliance or attrition. That is not “new program noise.” That is a structural signal.
10. A Practical Shortlist Filter: Who Gets on Your Rank List?
You cannot optimize everything. But you can avoid obvious data hazards.
I would personally be very skeptical of ranking a program highly if:
- It has ≥3 years in the last 5 with underfilled positions, and
- It increased complement by ≥50% in the last 3 years, and
- It shows clear PGY1–PGY2 attrition, and
- There are nontrivial ACGME concerns
One of those alone is manageable. All of them together is a pattern. And patterns are what matter.
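The filter is literally a conjunction, which you can encode directly. The thresholds are the ones stated above; the inputs are whatever you managed to collect.

```python
def obvious_data_hazard(underfill_years_of_last_5: int,
                        complement_growth_3yr_pct: float,
                        pgy1_pgy2_attrition: bool,
                        acgme_concerns: bool) -> bool:
    """True only when all four red flags co-occur; one alone is manageable."""
    return (underfill_years_of_last_5 >= 3
            and complement_growth_3yr_pct >= 50
            and pgy1_pgy2_attrition
            and acgme_concerns)

print(obvious_data_hazard(3, 60, True, True))    # True: a pattern, be skeptical
print(obvious_data_hazard(1, 60, False, True))   # False: isolated issues
```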
On the other hand, a program that:
- Fills consistently in the main Match
- Has slow, explainable expansion (or stable size)
- Shows near 1:1 continuity between PGY levels
- Has quiet ACGME history
- Has residents who, unprompted, say “People stay here”
…is almost certainly a stable training site, even if it lacks shiny branding or big-name prestige.

11. Bringing It All Together During Interview Season
Here is how I would operationalize this as an applicant with limited time:
Pre-interview desktop analysis
- Pull 5–10 years of match fill and spot count from NRMP data (where available) or program websites.
- Note obvious expansion jumps, recurrent underfill, and visible accreditation issues.
On-site reality check
- Count residents by PGY level. Compare to reported intake.
- Ask 2–3 data-focused questions about expansion, fill, and attrition.
- Observe resident demeanor: are they openly critical, quietly cautious, or genuinely neutral/positive?
Post-interview ranking
- Score each program on the 6 metrics above.
- Combine that with your subjective fit (location, culture, career goals).
- Use the numbers not to override your gut completely, but to stop you from rationalizing red flags.

Key Takeaways
- Stability is a trend, not a snapshot. You care about 5–10 year patterns in fill, expansion, and accreditation, not a single “good year”.
- Controlled growth with consistent Match fill and minimal attrition is the strongest quantitative signature of a healthy training site.
- Aggressive expansion plus chronic underfill, SOAP reliance, and PGY attrition is not “growing pains”. The data says it is a structural risk, and you should think very hard before accepting it.
FAQ
1. How many years of match data should I look at to judge stability?
Ideally 5–10 years. At minimum, you want three consecutive cycles. Anything less and you are reacting to noise. Multi-year patterns smooth out random fluctuations and reveal whether a program consistently fills, underfills, or oscillates in a concerning way.
2. Is a single year of SOAP use a reason to downgrade a program?
Not by itself. One SOAP-heavy year amid several fully filled cycles can reflect temporary factors like leadership transition or a bad communication year. The concern rises when SOAP reliance repeats or worsens over 3–4 years, especially if coupled with expansion and resident turnover.
3. Are rapidly expanding programs always a bad choice?
No. Some are strategically growing with solid backing, new facilities, and added faculty. The risk increases when expansion is fast (>50% growth over a few years) and there is no clear increase in case volume, no new sites, and visible strain in fill patterns or attrition. You want evidence the infrastructure grew with the complement.
4. How can I estimate cohort continuity if the website looks incomplete?
Use multiple sources. Compare resident rosters on the website across archived versions (Wayback Machine), ask residents how many started in their class, and look at call schedules posted on walls during interview day. Discrepancies between “expected” and actual headcounts are in themselves informative.
5. What if my top-fit program looks somewhat unstable on paper?
Then you make a conscious risk–reward decision. Some applicants will accept higher instability for location, family reasons, or unique career opportunities. The point of this analysis is not to create a rigid rulebook. It is to make sure you are not surprised later by problems the data already hinted at.