
The belief that “more call equals better training” is mathematically wrong once you cross a surprisingly low threshold. The data on fatigue, error rates, and learning curves all point in the same direction: call burden has a clear point of diminishing returns for surgical case volume and competency.
Let me walk through it like I would with a department chair staring at a grim call schedule and an unhappy resident cohort.
The Core Tradeoff: Hours, Cases, and Cognitive Yield
We can quantify the call–case relationship as three separate but linked curves:
- Call hours
- Case volume
- Effective learning per case (cognitive yield)
Call hours and raw case volume correlate up to a point. But effective learning does not scale linearly. It flattens, then falls.
Imagine a PGY-2 on general surgery:
- Regular days: 60–65 hours/week on service
- Call: q4 in-house, 24-hour shifts, most of them busy nights
- Real life: 80–85 hours/week total, which is common even if technically non-compliant some weeks
On paper, that resident is “getting more cases” than a colleague on q7 call. In practice, only some of those extra cases are high-yield, and a lot of them occur during periods of degraded performance due to sleep loss.
To see what is actually happening, you have to separate three layers:
- Marginal cases per extra call shift
- Marginal useful cases per extra call shift
- Marginal retained competency from those cases
That last layer is where diminishing returns hit hard.
What the Numbers Say About Fatigue and Performance
The fatigue literature is brutal if you actually run the math.
Sleep restriction to 4–5 hours/night for several nights in a row creates cognitive deficits equivalent to a blood alcohol level of approximately 0.06–0.08%. Multiple lab and simulation studies show:
- Working 24+ hours continuously:
  - Increases serious medical errors by roughly 20–30%.
  - Doubles or triples some types of technical and judgment errors in simulated procedures.
- Residents working >80 hours/week:
  - Have higher rates of needle sticks, lacerations, and near-miss events.
  - Report significantly worse focus and information retention.
None of that is “soft” data. It is replicated, predictable, and quantifiable.
So even if case volume rises with call frequency, the quality of those repetitions (from a learning and safety standpoint) falls as fatigue accumulates.
We can visualize the clash like this:
| Call Frequency | Relative Call Hours | Relative Case Volume | Effective Learning Yield |
|---|---|---|---|
| q8 | 60 | 60 | 55 |
| q7 | 70 | 75 | 65 |
| q6 | 80 | 85 | 70 |
| q5 | 90 | 95 | 68 |
| q4 | 100 | 100 | 60 |
| q3 | 115 | 105 | 45 |
The shape here is not exact empirical data; it is a stylized representation of what multiple datasets suggest:
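- Call hours grow steeply as you move from q8 to q3.
- Case volume keeps pace with call hours up to about q4, then flattens: the step from q4 to q3 adds 15 units of call hours for only 5 units of case volume.
- Learning yield peaks around q6, holds roughly flat at q5, and then drops as cognitive impairment wins.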
So “more call” beyond a certain point buys you mainly fatigue and low-yield, repetitive, poorly retained experiences.
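To make the shape concrete, here is a minimal sketch that computes the marginal learning yield per extra unit of call hours, step by step, from the stylized table above. The numbers are the same illustrative values, not empirical data, and the function and variable names are mine.

```python
# Stylized values copied from the table above (not empirical data).
# Each row: (call frequency, relative call hours, relative case volume, learning yield)
ROWS = [
    ("q8", 60, 60, 55),
    ("q7", 70, 75, 65),
    ("q6", 80, 85, 70),
    ("q5", 90, 95, 68),
    ("q4", 100, 100, 60),
    ("q3", 115, 105, 45),
]


def marginal_yield_per_hour(rows):
    """Change in learning yield per extra unit of call hours, for each step up."""
    steps = []
    for (prev_name, prev_h, _, prev_y), (name, h, _, y) in zip(rows, rows[1:]):
        steps.append((f"{prev_name}->{name}", (y - prev_y) / (h - prev_h)))
    return steps


MARGINALS = marginal_yield_per_hour(ROWS)

for label, slope in MARGINALS:
    print(f"{label}: {slope:+.2f} yield units per extra unit of call hours")
```

In these stylized numbers the marginal yield is positive through q6 and negative for every step after it, which is the diminishing-then-negative return the table is meant to convey.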
Case Volume: When More is Actually More
Before we talk diminishing returns, we need to admit the obvious: insufficient volume is a problem.
For many basic surgical skills, the learning curve looks something like this:
- First 5–10 cases: very steep improvement in basic familiarity and setup.
- 10–30 cases: strong improvement in efficiency and error reduction.
- 30–50+ cases: smaller refinements; plateau effects begin.
- Beyond that: incremental gains, but much smaller per additional case.
This is visible in everything from lap chole times to endoscopy completion rates.
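One way to picture that curve is a saturating exponential. The sketch below is only an illustration: the functional form and the time constant `TAU` are my assumptions, chosen so the curve roughly matches the stages listed above, not values fitted to any dataset.

```python
import math

TAU = 15.0  # hypothetical time constant in cases; an assumption, not a measured value


def proficiency(n_cases, tau=TAU):
    """Fraction of plateau-level proficiency after n_cases (saturating exponential)."""
    return 1.0 - math.exp(-n_cases / tau)


def marginal_gain(n_cases, tau=TAU):
    """Proficiency added by the next single case after n_cases."""
    return proficiency(n_cases + 1, tau) - proficiency(n_cases, tau)


for n in (0, 10, 30, 50):
    print(
        f"after {n:2d} cases: proficiency {proficiency(n):.2f}, "
        f"next case adds {marginal_gain(n):.3f}"
    )
```

Whatever the exact shape, the point is the same: each additional case of a given type adds less than the one before it.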
Where does call fit in?
Call disproportionately contributes certain case types:
- Appendectomies
- Cholecystitis cases
- Obstructions
- Trauma laparotomies
- Emergency C-sections (for OB/Gyn)
- I&Ds, washouts, reoperations
In many programs, 30–60% of these “index emergencies” for junior residents come from nights/weekends.
So yes, some call is non-negotiable if you want residents who can handle emergencies.
But then you look at actual logs.
| Scenario | Scheduled Cases/Month | Call Cases/Month | Total Cases/Month |
|---|---|---|---|
| Lower call (q7) | 55 | 15 | 70 |
| Moderate call (q5) | 55 | 25 | 80 |
| Heavy call (q3–4) | 55 | 35 | 90 |
Here is the uncomfortable truth:
- Moving from q7 to q5: +10 extra cases/month, most of them semi-urgent/emergent. That often is meaningful, especially early.
- Moving from q5 to q3–4: another +10 cases/month, but a significant portion are the same patterns repeated in degraded conditions: second-look washouts at 3 a.m., minor I&Ds, or “just hold the camera for this damage control.”
The first step up in call may be valuable. The second step up is mostly marginal and disproportionately low-yield.
Diminishing Returns: A Simple Quantitative Model
Let me lay out a simplified framework I use when I work with program directors who want something more concrete than “residents are tired.”
Define:
- C = total cases per month
- Ce = emergent/urgent cases per month
- L = learning effectiveness (0 to 1 scale), combining attention, memory, and opportunity to take an active role
- H = hours worked per week
We can define effective learning volume (ELV) as:
ELV = Σ (Lᵢ × case_complexityᵢ × role_factorᵢ)
We know from fatigue research:
- For H ≤ 65–70 hours/week: L is close to 1 for most daytime and early evening cases.
- For H between 70–80: L begins to drop, especially for late-night cases.
- For H ≥ 80–85 with frequent 24-hour in-house calls: L during post-midnight windows often falls into the 0.4–0.6 range or worse.
So more call adds cases, but the average L applied to those cases falls.
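A minimal sketch of the ELV sum, assuming a simple step function for L. The hour breakpoints follow the ranges above; the specific L values inside each band, the `Case` fields, and the 0 to 1 weights for complexity and role are illustrative assumptions on my part.

```python
from dataclasses import dataclass


@dataclass
class Case:
    complexity: float    # hypothetical 0-1 weight for case complexity
    role_factor: float   # hypothetical 0-1 weight: 1.0 primary operator, lower for assisting
    post_midnight: bool  # True if the case ran in the post-midnight window


def learning_effectiveness(weekly_hours, post_midnight):
    """Illustrative L: breakpoints follow the text, values within each band are assumed."""
    if weekly_hours <= 70:
        return 0.9 if post_midnight else 1.0
    if weekly_hours < 80:
        return 0.75 if post_midnight else 0.9
    return 0.5 if post_midnight else 0.8


def effective_learning_volume(cases, weekly_hours):
    """ELV = sum over cases of L_i * case_complexity_i * role_factor_i."""
    return sum(
        learning_effectiveness(weekly_hours, c.post_midnight) * c.complexity * c.role_factor
        for c in cases
    )


# Hypothetical pair of cases: one daytime case as primary operator, one 3 a.m. case assisting.
cases = [Case(1.0, 1.0, False), Case(1.0, 0.5, True)]
print(effective_learning_volume(cases, weekly_hours=65))
print(effective_learning_volume(cases, weekly_hours=85))
```

The same two cases are worth less at 85 hours/week than at 65, because the L multiplier shrinks even though the case list is identical.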
Let us plug rough, illustrative numbers:
Scenario A – Moderate call (q5):
- C = 80/month, of which Ce = 25
- Average L (all cases) ≈ 0.85
  - Daytime/elective: 0.9–1.0
  - Night call: 0.7–0.8
ELV_A ≈ 80 × 0.85 = 68 “effective case units”
Scenario B – Heavy call (q3–4):
- C = 90/month, of which Ce = 35
- Average L ≈ 0.7
  - More cases between 1–5 a.m. with markedly decreased performance and recall
  - Post-call fatigue affecting the next day’s learning as well
ELV_B ≈ 90 × 0.7 = 63 “effective case units”
So you have more cases, but less effective learning. Net productivity drops.
That is the core of diminishing returns: not just flattening, but actual reversal of net gain.
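The arithmetic for the two scenarios can be checked in a few lines. This sketch uses the simplified form (total cases times average L) with the same illustrative numbers, and adds one derived quantity: the average L that heavy call would need just to break even with moderate call.

```python
def elv(total_cases, avg_l):
    """Simplified ELV: total cases times average learning effectiveness."""
    return total_cases * avg_l


elv_a = elv(80, 0.85)  # Scenario A, moderate call (q5)
elv_b = elv(90, 0.70)  # Scenario B, heavy call (q3-4)

# Average L that Scenario B would need just to match Scenario A.
break_even_l = elv_a / 90

print(f"ELV_A = {elv_a:.0f}, ELV_B = {elv_b:.0f}")
print(f"Scenario B breaks even only if average L >= {break_even_l:.3f}")
```

With these numbers, heavy call has to hold average L above roughly 0.76 to match moderate call, and the illustrative 0.7 falls short of that.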
The Hidden Cost: Post-Call Decay
People focus too much on the overnight shift itself and not enough on what it destroys: the next 24–48 hours.
Residents are not robots. A brutal night call does not just impair the 2 a.m. chole — it also degrades:
- The next morning’s OR performance
- That afternoon’s clinic or cases if they are not truly post-call released
- Study time that evening
- Long-term memory consolidation from the previous few days
You can think of it as “post-call tax.”
Here is a simplified cost estimate:
- One heavy 24-hour in-house call:
  - Night performance: -30% learning efficiency on those cases.
  - Next 24 hours: -20–30% on whatever they do (if they remain in the hospital).
  - Lost or fragmented study time: effectively 0–50% of usual learning that day.
Multiply by 7–8 calls/month, and you are not just losing a few points off each call case. You are degrading a significant fraction of the entire training week.
| Day Type | Moderate Call (q5) | Heavy Call (q3–4) |
|---|---|---|
| Pre-call Day | 0.95 | 0.9 |
| Call Day | 0.8 | 0.65 |
| Post-call Day | 0.85 | 0.7 |
Again, these are stylized estimates, but they mirror what people actually report. You feel sharp pre-call. Slowed and a bit sloppy overnight. Blunted and irritable post-call.
The net effect: beyond a certain call frequency, you are burning tomorrow’s learning to barely squeeze extra low-yield cases into tonight.
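To put a rough number on the post-call tax, the sketch below averages the stylized per-day efficiencies from the table above over one full call cycle. It assumes that every day in the cycle that is neither the call day nor the post-call day runs at the pre-call value; that assumption is mine, not something the table states.

```python
# Stylized per-day learning efficiency from the table above.
EFFICIENCY = {
    "moderate": {"pre": 0.95, "call": 0.80, "post": 0.85},
    "heavy": {"pre": 0.90, "call": 0.65, "post": 0.70},
}


def cycle_average(level, cycle_days):
    """Mean daily learning efficiency over one call cycle of cycle_days days.

    Assumption: every day that is neither the call day nor the post-call day
    runs at the pre-call efficiency.
    """
    e = EFFICIENCY[level]
    other_days = cycle_days - 2
    return (e["call"] + e["post"] + other_days * e["pre"]) / cycle_days


moderate = cycle_average("moderate", cycle_days=5)  # q5
heavy = cycle_average("heavy", cycle_days=3)        # q3
print(f"q5 cycle average: {moderate:.2f}")
print(f"q3 cycle average: {heavy:.2f}")
```

Under that assumption the whole-cycle average sits near 0.90 for q5 and 0.75 for q3, so the penalty applies to every day of the heavy-call cycle, not only the call night.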
Case Mix Matters More Than Raw Count
Residents love to compare numbers: “I logged 1200 cases last year.” It sounds impressive. But raw case log totals are a crude metric.
From a training perspective, the distribution across procedure types and levels of responsibility is much more important.
You can divide call-derived cases into three broad buckets:
1. High-yield, index emergencies
   - First several appendectomies, cholecystitis cases, trauma ex-laps where the resident is primary operator for key steps.
2. Moderate-yield repetitions
   - Case number 15–40 of similar pathology, where refinements in efficiency and judgment accumulate.
3. Low-yield or pseudo-participation
   - Standing in the room for the third re-look washout of the week at 4 a.m.
   - Being essentially an observer while the attending “just gets it done” on a crashing trauma.
Early in training, call shifts are rich in category 1. By mid-residency, many of the additional call cases slide into categories 2 and 3, particularly at higher volume centers.
This is where schedulers often fool themselves. They assume:
More call → more cases → more independence → more competence.
The actual pattern is usually:
More call → more similar cases at worse hours → less independence on the hardest ones (attendings take over when things get dicey at 3 a.m.) → declining marginal learning per case.
A smarter model maximizes high-yield experiences without drowning residents in low-yield repetition under fatigue.
Comparing Call Strategies: A Quantitative Snapshot
Programs often ask, “What if we spread the call wider but keep case volume acceptable?” That is the right question.
Let us compare three hypothetical but realistic call structures over a year for a PGY-2 on a high-volume general surgery service.
| Metric | q7 In-House | q5 In-House | q3–4 In-House |
|---|---|---|---|
| Annual Calls | ~52 | ~73 | ~104 |
| Total Cases Logged | 800 | 900 | 1000 |
| Estimated Effective Learning Volume (ELV units) | 720 | 765 | 700 |
| Average Weekly Hours | 70–72 | 75–78 | 80–85 |
| Self-reported Burnout Risk | Moderate | High | Very High |
Interpretation:
- Moving from q7 to q5:
- ~100 extra cases/year.
- ELV rises modestly (~6% in this example).
- Hours increase, but often still manageable.
- Moving from q5 to q3–4:
- Another ~100 extra cases/year.
- ELV actually falls as fatigue undercuts learning efficiency.
- Burnout, attrition, and safety concerns spike.
In other words: moderate call can be beneficial, but very heavy call is a negative return on investment from a training perspective.
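The same comparison can be expressed as two derived ratios: ELV per logged case, and marginal ELV per additional call when stepping up from one structure to the next. This sketch uses only the hypothetical figures from the table above.

```python
# Hypothetical figures from the comparison table above.
STRUCTURES = [
    {"name": "q7", "calls": 52, "cases": 800, "elv": 720},
    {"name": "q5", "calls": 73, "cases": 900, "elv": 765},
    {"name": "q3-4", "calls": 104, "cases": 1000, "elv": 700},
]

# Effective learning per logged case.
elv_per_case = {s["name"]: s["elv"] / s["cases"] for s in STRUCTURES}

# Marginal ELV per additional call when stepping up one structure.
marginal_elv_per_call = {
    f'{a["name"]}->{b["name"]}': (b["elv"] - a["elv"]) / (b["calls"] - a["calls"])
    for a, b in zip(STRUCTURES, STRUCTURES[1:])
}

for name, value in elv_per_case.items():
    print(f"{name}: {value:.2f} ELV units per case")
for step, value in marginal_elv_per_call.items():
    print(f"{step}: {value:+.2f} ELV units per extra call")
```

In this example each extra call between q7 and q5 adds about 2 ELV units, while each extra call between q5 and q3–4 subtracts about 2.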
Future Direction: Data-Driven Call Design
The future of surgical training will not be about “gut feel” on call schedules. It will be about telemetry and outcome data.
We already have the ingredients:
- Electronic OR logs linked to residents by name and role.
- Time stamps for start/end times, including night vs day distribution.
- Simulation performance metrics (time to completion, errors).
- Patient outcome data linked to operator level and time of day.
- Resident-reported fatigue and burnout scores.
Programs that are serious can do what some top centers have started:
Construct a per-resident longitudinal dataset:
- Cases per week (with case type and role).
- Hours worked (EHR log-ins, badge taps, OR timestamps as proxy).
- Sleep data (increasingly, from opt-in wearables).
- Simulation performance exams every 6–12 months.
- Standardized in-training exam scores.
Model the relationships:
- Use regression or mixed models to estimate:
  - How weekly hours (H) relate to case volume (C).
  - How hours and sleep relate to simulation performance (S).
  - How case volume and sleep relate to exam performance and independent performance milestones.
Identify the knee of the curve:
- There is usually a clear region where per unit of additional call:
  - Case count increases mildly.
  - Performance and well-being metrics worsen significantly.
Rebuild the schedule around that threshold:
- Set a hard ceiling on average hours at the level where marginal ELV still rises or at least plateaus.
- Shift low-complexity cases or tasks away from residents above that threshold (e.g., night-shift APP coverage for simple consults, tele-triage, or cross-coverage tasks).
| Step | Description |
|---|---|
| Step 1 | Collect Resident Data |
| Step 2 | Model Hours vs Cases |
| Step 3 | Model Hours vs Performance |
| Step 4 | Identify Optimal Hour Range |
| Step 5 | Redesign Call Schedule |
| Step 6 | Monitor Outcomes |
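As a rough sketch of steps 3 and 4, the code below bins weekly records by hours worked and looks for the first bin after which the mean performance score declines. It is a deliberately simple stand-in for the regression or mixed models mentioned above, not a replacement for them, and the records are invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weekly records: (hours worked, simulation score 0-100).
# Values are invented for illustration only.
RECORDS = [
    (62, 78), (66, 80), (68, 82),
    (72, 83), (74, 84), (77, 82),
    (81, 76), (84, 72), (88, 66),
]


def mean_score_by_hour_bin(records, bin_width=10):
    """Group weekly records into hour bins and average the score in each bin."""
    bins = defaultdict(list)
    for hours, score in records:
        bins[(hours // bin_width) * bin_width].append(score)
    return {start: mean(scores) for start, scores in sorted(bins.items())}


def find_knee(binned):
    """Return the start of the first hour bin after which the mean score declines."""
    starts = list(binned)
    for current, nxt in zip(starts, starts[1:]):
        if binned[nxt] < binned[current]:
            return current
    return None  # no decline observed in the data


BINNED = mean_score_by_hour_bin(RECORDS)
KNEE = find_knee(BINNED)
print(BINNED)
print("knee at hour bin starting:", KNEE)
```

A real analysis would need repeated measures per resident and adjustment for rotation and PGY level, which is where mixed models earn their keep. The binned version is only useful as a first look at where the curve bends.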
This is not theoretical. I have seen departments do small-scale versions of this with shocking results: minor reductions in call frequency produced stable or increased case volume (by better case distribution) and meaningfully better exam performance and resident satisfaction.
Technology, Simulation, and the “Replace Call” Question
One objection always comes up: “If we cut call, where do residents get their emergency experience?”
The answer is not “from a textbook.” But it is also not “from indefinite q3 nights.”
Three concrete tools can offset some of the lost low-yield call time:
Simulation
- Not as a toy, but as a structured, assessed curriculum.
- 20–30 well-designed crisis simulations per year can teach decision frameworks for hemorrhage, bowel injury, anastomotic leak, etc., more reliably than random chaotic cases at 4 a.m.
Targeted case assignment
- Ensure true index emergencies (first 10–15 of each type) are allocated to residents who need them — not just whoever happens to be post-call and awake.
- This may mean more nuanced daily assignment boards instead of “service owns all cases.”
Scheduled “emergency blocks”
- Some programs are experimenting with residents on dedicated acute care/emergency surgery blocks with more predictable hours but dense emergency exposure, versus random scattering of emergencies on top of elective work plus heavy call.
| Training Model | Value (conceptual) |
|---|---|
| Heavy Call | 70 |
| Moderate Call + Simulation | 85 |
| Emergency Block Model | 90 |
Again, those values are conceptual, but they reflect a key point: thoughtful design plus simulation and targeted exposure can beat brute-force high call frequency.
The Hard Reality: Culture vs Data
The biggest barrier is not technical. It is cultural.
Surgeons who trained doing q2 home call with 110-hour weeks will swear “that is how you learn.” But memory is biased. Most of them underestimate how many nights they were spectators, not primary surgeons, and how many “cases” blurred into each other.
The data shows:
- True catastrophic under-exposure is bad. Residents need meaningful volume.
- But beyond a moderate call level, extra nights:
- Add relatively few unique, high-yield cases.
- Substantially worsen fatigue, error risk, and long-term retention.
- Potentially reduce the total effective learning over the year.
If a program is still designing call with the guiding principle “we suffered, you should too,” it is ignoring a decade of human performance literature.
Residents do not need martyrdom. They need a well-shaped learning curve.
Three Takeaways
Call burden and true learning gains are not linearly related. After a moderate level of call, extra nights add low-yield, fatigue-blunted repetitions and can reduce net effective learning.
The key variable is not raw case count, but effective learning volume: case mix, independence, timing, and fatigue-adjusted cognitive yield. That is where heavy call loses the plot.
The future is data-driven call design: linking hours, sleep, case mix, and performance metrics to find the real knee of the curve—and building schedules that maximize emergency competence without wasting human capital on diminishing returns.