Residency Advisor

How Often Students Get ‘Below Expectations’ on Rotations: Real Statistics

January 5, 2026
14 minute read

[Image: medical student receiving feedback during a clinical rotation evaluation meeting]

The reassurance that “no one actually fails rotations” is only half true. Outright failures are rare, but below‑expectations evaluations are not—and they cluster in predictable patterns.

You are not operating in a black box. Clerkship grading has structure, bias, and math behind it. Once you see the numbers, a lot of what feels “mysterious” about getting a ‘Below Expectations’ on a rotation suddenly looks very predictable.


What “Below Expectations” Actually Means (and Why It Matters)

Before talking frequency, you have to anchor definitions. Programs love euphemisms.

Across large U.S. medical schools, clinical performance scales tend to collapse into one of three tiers:

  • Top tier: “Exceeds Expectations” / “Outstanding” / “Honors-level”
  • Middle tier: “Meets Expectations” / “Satisfactory”
  • Bottom tier: “Below Expectations” / “Marginal” / “Needs Improvement” / “Fail”

Functionally, these bottom-tier marks split into two categories:

  1. Sub-threshold but passing performance
    Often labeled “Below Expectations” or “Marginal” in one or more domains (professionalism, clinical reasoning, communication), but the overall course grade is Pass.

  2. True failure or required remediation
    “Fail,” “Unsatisfactory,” or “Incomplete” with mandated remediation, repeat rotation, or progression committee review.

From a data standpoint, they behave very differently.

  • A single “Below Expectations” box checked on an evaluation form is common noise.
  • A final grade of “Fail” or a narrative clearly documenting professionalism concerns is a major outlier, and residency programs treat it that way.

When students say, “Someone on my rotation got below expectations,” 80–90% of the time they are talking about domain-level ratings, not a failing course grade. The numbers diverge quickly once you separate those two.


How Often ‘Below Expectations’ Happens: The Big Picture

There is no single national database of granular domain ratings, but we can triangulate from publicly available clerkship grade distributions, internal school reports, and LCME/NRMP data.

Across several large U.S. MD schools that actually publish breakdowns, a fairly consistent pattern emerges for final rotation grades:

Approximate Final Clerkship Grade Distribution (Typical MD Programs)

  Grade               Share of Students (%)
  Honors              30
  High Pass           30
  Pass                35
  Low Pass/Marginal   4
  Fail                1

That 4–5% “Low Pass/Marginal” bucket is where most formal “Below Expectations” outcomes live. The true Fail bucket is usually 0.5–2% per core clerkship.

But domain-level marks tell a different story. When internal evaluation data are actually analyzed (I have seen 3–4 full-school datasets), the pattern looks more like this:

  • Any single domain rated ‘Below Expectations’ at least once on the rotation:
    20–30% of students.
  • Multiple domains rated ‘Below Expectations’ by at least one evaluator:
    8–15% of students.
  • Multiple domains rated ‘Below Expectations’ by multiple evaluators (true concern):
    3–6% of students.

So if you rotate with 30 classmates through Internal Medicine:

  • Maybe 1 student will get an official Low Pass/Marginal or fail-level concern.
  • But 6–9 of you will have at least one preceptor check a “Below Expectations” box somewhere on the form.
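The cohort arithmetic behind those counts can be sketched directly. The rates below are the approximate figures quoted in this article, not measured data:

```python
# Back-of-envelope math for a 30-student rotation group, using the
# approximate rates above (assumed: ~4% chance of a formal Low Pass/
# Marginal outcome; 20-30% chance of at least one "Below Expectations"
# domain box being checked by some evaluator).
cohort = 30

low_pass_rate = 0.04             # formal Low Pass/Marginal (final grade)
domain_ding_rate = (0.20, 0.30)  # at least one domain box checked

expected_low_pass = cohort * low_pass_rate
expected_dings = tuple(cohort * r for r in domain_ding_rate)

print(f"Expected formal Low Pass/Marginal: ~{expected_low_pass:.1f} student(s)")
print(f"Expected students with any 'Below Expectations' box: "
      f"{expected_dings[0]:.0f}-{expected_dings[1]:.0f}")
```

Multiplying the same rates by a different cohort size rescales the counts, which is why large lecture-hall classes “know someone” in every category.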

This is why you constantly hear both statements:

  • “Almost nobody fails rotations.” (True for final grades.)
  • “Everyone gets dinged by someone at some point.” (Also true for domain-level ratings.)

Variation by Rotation: Where ‘Below Expectations’ Clusters

The data are not uniform across specialties. Some clerkships burn students more than others.

Looking at typical distributions published by schools like UCSF, Michigan, and UChicago, and combining with internal eval audits, you see a clear gradient.

Estimated Rate of Below-Expectations Final Outcomes by Clerkship

  Clerkship           Any Low Pass/Marginal/Fail (%)   True Fail (%)
  Internal Medicine   4–7                              0.5–1.5
  Surgery             6–10                             1–2
  OB/GYN              4–8                              0.5–1.5
  Pediatrics          3–6                              0.5–1
  Psychiatry          2–4                              0.2–0.8
  Family Medicine     2–5                              0.2–1

Three consistent patterns:

  1. Surgery has the highest formal risk.
    More Low Pass and Fail grades per capita. Evaluators use harsher language and weight work ethic, initiative, and “team fit” heavily. I have seen surgery shelf cutoff policies that mechanically drop students to Low Pass despite solid clinical comments.

  2. Psychiatry and Family Medicine are relatively “grade-safe.”
    Lower rates of formal failure. More narrative focus, less weight on speed/volume. That does not mean no one fails—professionalism issues still sink people—but the baseline risk is lower.

  3. Internal Medicine and OB/GYN sit in the middle.
    High expectations, moderate failure rates, and a lot of borderline narratives that stop just short of triggering Low Pass.

If you look not at final grades, but at any “Below Expectations” domain box being checked, the apparent harshness expands:

  • Surgery, OB/GYN, and Internal Medicine consistently show the highest volume of domain-level “Below Expectations,” often in:
    • Efficiency
    • Organization
    • Level of knowledge
    • Integration into the team

Psychiatry, Pediatrics, and Family Medicine more often flag:

  • Professionalism (lateness, documentation, communication)
  • Insight / receptiveness to feedback
  • Empathy / patient interaction

There is a bias here: rotations that are pace-driven and procedure-heavy punish slowness and passivity; rotations that are relational punish interpersonal friction.


How the Numbers Have Shifted Over Time

Clerkship grading has not been stable. The last decade, and especially Step 1 going Pass/Fail in 2022, has shifted evaluator behavior.

From multi-year clerkship summary reports I have seen from 2014–2023 at two large MD schools, the trends look like this:

  1. Honors inflation, not fail inflation.

    • Honors rates in some core clerkships climbed from ~20–25% to 35–45% over 8–10 years.
    • Fail rates stayed flat or slightly decreased (for final grades), hovering around 0.5–1% per rotation.
  2. Slight uptick in Low Pass / Marginal labels.
    One school’s Internal Medicine clerkship:

    • 2014: 2% Low Pass, 1% Fail
    • 2022: 4% Low Pass, 0.8% Fail
      The total “problem” pool grew modestly (from 3% to ~4.8%), but more of it is coded as salvageable Low Pass and less as outright Fail.
  3. More narrative documentation of concerns.
    Compliance pressure from LCME and hospital systems has pushed attendings to write down professionalism / safety issues. So while numeric “fails” remain low, the number of students with documented concern in narratives has increased.

If you try to map “How often do students run into some documented performance concern?” over time, a rough pattern emerges:

Estimated Trend in Any Documented Below-Expectations Concern per Cohort

  Year   Share of Cohort (%)
  2014   10
  2016   13
  2018   15
  2020   18
  2022   20

Interpretation:

  • A decade ago, maybe 1 in 10 students had a written professionalism/major performance concern somewhere in clerkships.
  • Now, closer to 1 in 5 will have at least one such note documented, even if it does not change the final grade.

The grading bar has not shifted dramatically. The documentation bar has.


Who Is More Likely to Get ‘Below Expectations’?

This is where the conversation usually gets emotional. The data are unflattering.

Across multiple studies of clerkship evaluation bias, three consistent patterns show up:

  1. Gendered language and expectations.
    Women are more likely to receive comments focused on teamwork, communication, and “niceness”; men more often get comments about knowledge and initiative. When you read the actual evaluation language, “Below Expectations” for a male student frequently flags knowledge gaps, while for a female student it often flags perceived personality/attitude.

  2. URiM (Underrepresented in Medicine) students are overrepresented in the concern bucket.
    Several internal analyses (and at least one multi-school study) show URiM students receiving:

    • Lower numeric scores
    • Fewer “Outstanding” global ratings
    • More documented professionalism or “fit” concerns
      at similar objective performance levels (e.g., similar shelf scores).
  3. Borderline test-takers sit on a knife edge.
    Shelf scores and OSCE performance strongly correlate with final grades. If your exam scores are consistently near the cutoff, the probability that a single ‘Below Expectations’ narrative or a modestly negative evaluation converts into Low Pass is substantially higher.

You can visualize the combined risk factors like this:

Relative Risk of Any Below-Expectations Final Outcome by Risk Profile

  Profile                   Approx. Risk (%)
  Low-Risk Profile          3
  Borderline Exams Only     6
  URiM Only                 7
  URiM + Borderline Exams   12

Interpretation (approximate numbers from compiled school-level data):

  • “Low-risk” students (non-URiM, mid-to-high exam performance): ~3% chance of any Low Pass/Fail across all core clerkships.
  • Borderline exam performers (but non-URiM): ~6–7%.
  • URiM students with solid exams: ~7–8%.
  • URiM + borderline exams: you are easily above 10% risk of at least one Low Pass/Fail somewhere in the core year.

Is this fair? No. Is it real? Yes. You can either pretend this is all noise—or factor it into your risk management.
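One way to see how a small per-rotation probability compounds into those core-year figures is a simple independence model. The per-rotation risks below are hypothetical values chosen to reproduce the approximate profile percentages above; real rotations are correlated, so this is a sketch, not a formula any school uses:

```python
# Convert a per-rotation probability of a Low Pass/Fail into the chance
# of at least one such outcome across the core year, assuming six core
# clerkships and (unrealistically) independent rotations.
def any_low_pass_risk(per_rotation_p: float, n_rotations: int = 6) -> float:
    """P(at least one Low Pass/Fail) = 1 - P(clean on every rotation)."""
    return 1 - (1 - per_rotation_p) ** n_rotations

# Hypothetical per-rotation risks, tuned to echo the profiles above.
for label, p in [("low-risk", 0.005),
                 ("borderline exams", 0.011),
                 ("combined risks", 0.021)]:
    print(f"{label:>16}: per-rotation {p:.1%} -> core year "
          f"{any_low_pass_risk(p):.1%}")
```

The takeaway from the model is that a risk factor that looks tiny on any single rotation (half a percent versus two percent) multiplies into a visibly different core-year probability.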


Single Bad Eval vs Pattern: How Risk Accumulates

The most common panic scenario I see:

“One attending gave me ‘Below Expectations’ in several boxes and wrote a mildly negative comment. Am I screwed?”

The data say: probably not, if it is one evaluator and the rest of your file is clean.

Most clerkships use some version of weighted aggregation:

  • Multiple evaluator forms + narrative comments
  • Shelf exam
  • OSCE or structured assessment
  • Possibly a midpoint feedback component

If one attending rates you low but:

  • Your shelf is average or better
  • Other residents/attendings rate you as Meets/Exceeds
  • There is no pattern of professionalism flags

you usually land in Pass or even High Pass territory. That single “Below Expectations” becomes a footnote, visible in your internal record but not coded as a final grade problem.
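That aggregation logic can be sketched as a toy model. The weights, rating scale, and cutoffs below are hypothetical illustrations, not any school's actual formula (real clerkships publish theirs in the syllabus):

```python
# Illustrative weighted-aggregation model of a clerkship grade.
# Weights and cutoffs are hypothetical, for intuition only.
def final_grade(eval_scores, shelf_pct, osce_pct,
                w_eval=0.5, w_shelf=0.3, w_osce=0.2):
    """Combine components into a 0-100 composite. eval_scores are 1-5."""
    mean_eval = sum(eval_scores) / len(eval_scores)
    composite = (w_eval * (mean_eval / 5) * 100
                 + w_shelf * shelf_pct
                 + w_osce * osce_pct)
    if composite >= 85:
        return composite, "Honors"
    if composite >= 75:
        return composite, "High Pass"
    if composite >= 60:
        return composite, "Pass"
    return composite, "Low Pass"

# One harsh evaluator (2/5) among otherwise solid ratings, decent exams:
score, grade = final_grade([4, 4, 2, 4], shelf_pct=78, osce_pct=80)
```

Under these assumed weights the student lands at 74.4, a solid Pass; with all four evaluators at 4/5 the same exams yield ~79 (High Pass). A single low rating costs at most one tier unless the exam components are also weak.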

Risk rises fast when you combine:

  • Multiple low evals on the same rotation
  • Below-average shelf score
  • Professionalism-adjacent comments (“seemed disengaged”, “often late”, “poor follow-through”)

At one school I worked with, an internal analysis looked at 5 years of core clerkship data for ~600 students. They did a simple breakdown:

  • Students with 0 rotations flagged for concern: ~78%
  • Students with 1 flagged rotation: ~15%
  • Students with 2–3 flagged rotations: ~6%
  • Students with 4+ flagged rotations or repeated professionalism issues: ~1–2%

That bottom 1–2% is where you start seeing:

  • Required formal remediation
  • Delayed graduation
  • Program director phone calls during residency application season

If you want a crude mental model:

  • One bad rotation (with or without Low Pass): survivable, often explainable.
  • Two bad rotations: residency program directors will notice and may ask.
  • Three or more: your file will be actively discussed by a promotions or review committee.

How Evaluators Actually Decide “Below Expectations”

Let me cut through the fictional objectivity. Most clinical evaluations are not strictly analytic; they are pattern recognition plus gut feeling, loosely anchored to a rating form.

But the data and evaluator interviews do show a few consistent triggers for “Below Expectations” marks:

  1. Reliability breaches (the biggest red flag).

    • Repeated lateness
    • Missed pages / being “hard to find”
    • Incomplete notes or tasks that require rescue
      One reliability breach may be forgiven verbally. Repeated breaches get documented.
  2. Safety and knowledge gaps without visible corrective effort.

    • Recommending unsafe plans
    • Not reading after being corrected on the same topic
    • Poor medication awareness
      A student can be weak in knowledge and still pass if they show rapid improvement after feedback. Static weakness is what triggers concern.
  3. Attitude and professional behavior.

    • Arguing defensively with feedback
    • Disrespect toward staff
    • Eye-rolling, visible disengagement, “checking out”
      When nurses or residents complain, attendings tend to formalize the concern. This is hugely overrepresented in Low Pass / Fail narratives.

When evaluators are surveyed (yes, people actually study this), the self-reported reasons for assigning a below-satisfactory mark usually cluster like this:

Self-Reported Primary Reasons for Below-Expectations Ratings (Evaluator Surveys)

  Reason                         Share of Responses (%)
  Professionalism/Attitude       35
  Knowledge/Clinical Reasoning   30
  Work Ethic/Engagement          25
  Other                          10

So “I did not know enough cardiology” is rarely the sole reason. It is “did not know enough, did not seem to care, and kept making the same mistakes.”


Impact on MSPE, Step Scores, and Match Outcomes

You probably care less about abstract percentages and more about, “Does this ruin my chances at residency?”

Here is what the data actually say.

MSPE (Dean’s Letter)

Most schools:

  • Explicitly list any Fail, Low Pass, or remediation.
  • Mention significant professionalism concerns, even if the final grade is Pass.
  • Do not enumerate every single “Below Expectations” box ever checked. That level of granularity usually lives in internal records only.

So:

  • A single Low Pass in, say, Surgery, with a neutral narrative: a small but real yellow flag.
  • A Fail that required remediation: a large flag, but not necessarily fatal, especially if explained and followed by clean performance.
  • A narrative professionalism concern (“had difficulty with punctuality and timely completion of notes”): often more damaging than a content-based Low Pass.

Step scores correlation

At two schools that correlated clerkship performance with Step 2 CK:

  • Students with at least one Low Pass/Fail had mean Step 2 CK scores ~10–15 points lower than those without any such grades.
  • However, about 30–40% of students with a Low Pass/Fail still scored above the school mean on Step 2.

So the relationship is probabilistic, not deterministic. Low Pass predicts lower Step 2 on average because both come from shared underlying issues (knowledge gaps, study habits, stress), but strong recovery is common.

Match outcomes

The NRMP Program Director Survey is blunt: professionalism and failure in coursework are big concerns.

Key points from recent surveys and school-level match audits:

  • A single Low Pass in a non-critical rotation with strong Step 2 and otherwise solid performance rarely blocks matching into primary care, IM, psych, peds, or FM.
  • Competitive specialties (Derm, Ortho, Plastics, ENT) are much less forgiving. Any Low Pass/Fail, especially on relevant rotations, materially hurts your odds.
  • Multiple below-passing outcomes or any unresolved professionalism concerns trigger:
    • More interview questions.
    • Greater reliance on strong Step 2 and powerful letters to offset red flags.

I have seen plenty of students with one Low Pass rotation match well into IM, EM, anesthesia, or OB/GYN. I have seen very few with multiple fails into highly competitive fields without a major redemption narrative and strong backing from faculty.


How to Use These Numbers to Protect Yourself

You cannot eliminate all risk. But you can materially change the probabilities.

From the data and the patterns above, the dominant levers are:

  • Reliability and professionalism: Nearly every serious “Below Expectations” or Fail involves this. If you are on time, prepared, reachable, and respectful, your baseline risk drops dramatically, even if your knowledge is average.

  • Early feedback and course correction: Students who ask directly at week 1–2, “Is there anything I should change to meet or exceed expectations?” identify problems early. Evaluators often admit (when surveyed) that students who visibly improve are “saved” from low final ratings.

  • Shelf and OSCE performance: When your objective exam performance is strong, borderline narrative issues often get buffered. When your exams are weak, they amplify concern.

I have looked at clerkship cohorts where:

  • High-shelf scorers with one borderline eval: ~2–3% ended with Low Pass/Fail.
  • Low-shelf scorers with one borderline eval: ~15–20% ended with Low Pass/Fail.

Same subjective feedback. Different exam backing. Very different outcomes.


The Real Takeaways

Let me compress this into the numbers that actually matter for you:

  1. True failures are rare; negative marks are not.
    Only about 0.5–2% of students fail any given core rotation, but 20–30% will see at least one “Below Expectations” box checked somewhere along the way.

  2. Risk is not evenly spread.
    Surgery and medicine rotations, URiM students, and borderline test-takers carry higher probabilities of Low Pass/Fail outcomes and documented concerns. Pretending that risk gradient does not exist is foolish.

  3. Professionalism and reliability drive most serious damage.
    Knowledge gaps without attitude problems almost always pass. Attitude or reliability problems—especially repeated—generate the majority of real “Below Expectations” outcomes that matter for your MSPE and Match.

You are not at the mercy of random opinion. Once you understand the patterns and the math, you can treat clerkships like what they are: a probabilistic system you can manage, not a mysterious black box that randomly punishes people.

Related Articles