
Behind Closed Doors: How Teaching Evaluations Are Read and Scored

January 8, 2026
17-minute read

[Image: Medical faculty member reading teaching evaluations alone in an office]

Last spring, a junior faculty member forwarded me her teaching evals with a single line: “Do I need to start looking for another job?” One student had written “worst lecturer in the curriculum” and she was convinced her academic career was over. What she did not know—and what nobody had ever told her—is how little that single comment actually mattered in the rooms where decisions are really made.

Let me walk you through those rooms.

What Actually Happens The Day Your Evals Come In

At most med schools and teaching hospitals, there is a predictable, almost boring workflow behind the supposedly terrifying “teaching evaluations.”

It goes something like this, though nobody ever explains it to you.

First, the raw survey data land in the educational office. Not on the department chair’s desk. Not in the Dean’s inbox. In the hands of an overworked education coordinator or data analyst who’s juggling clerkship schedules, room bookings, and accreditation reports.

They run the standard report:

  • Mean scores for each item (1–5 or 1–7 scale)
  • Standard deviations
  • Response rate
  • Comment dump at the end

Then there’s the first, unspoken filter: response rate. If you got 5 responses out of a rotation of 40 students, every serious educator silently downgrades the “precision” of your feedback. We do not treat 3 angry comments from 10% of your learners the same way we treat a consistent pattern from 70–80%.
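
That report is mundane enough to sketch. Here is a minimal version of what it amounts to for a single item, assuming a plain list of 1–5 ratings; the function and the 50% "reliable" cutoff are my illustration, not any institution's actual rule:

```python
from statistics import mean, stdev

def standard_report(ratings: list[int], enrolled: int) -> dict:
    """Summarize one eval item: mean, SD, and response rate.

    ratings: the 1-5 scores actually submitted for this item.
    enrolled: how many learners could have responded.
    """
    response_rate = len(ratings) / enrolled
    return {
        "mean": round(mean(ratings), 2),
        "sd": round(stdev(ratings), 2) if len(ratings) > 1 else 0.0,
        "n": len(ratings),
        "response_rate": round(response_rate, 2),
        # The unspoken filter: low response rates get mentally downgraded.
        # The 0.5 cutoff here is illustrative, not an official standard.
        "reliable": response_rate >= 0.5,
    }

# 5 responses out of a rotation of 40: flagged as noise, not signal.
print(standard_report([2, 3, 2, 1, 3], enrolled=40))
```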

[Bar chart] Typical Teaching Evaluation Response Rates by Setting:

  • Preclinical Course: 78%
  • Core Clerkship: 65%
  • Elective: 42%
  • Conference Series: 30%

Those reports get rolled up. Depending on the institution, they’re:

  • Emailed to you and your division chief
  • Stored in a central faculty performance system
  • Summarized annually in a teaching dossier

The real reading and scoring happens later—during promotion cycles, contract renewals, teaching awards, or when there’s a complaint.

That’s the part you never see. Because you’re not in the room when your name comes up.

How Program Leadership Actually Reads Your Numbers

Let me be blunt: nobody is sitting in a dark room, obsessing over whether your “organized presentation” item is 4.3 or 4.4.

Senior people look for three things:

  1. Are you consistently below the group?
  2. Are you consistently above the group?
  3. Is there a pattern—over time or across different settings?

Notice what’s missing: perfection.

No serious educator expects straight 5.0s. When those do appear, seasoned program directors get suspicious: either the sample size is tiny, the evals are inflated, or the faculty member is aggressively pressuring learners for good scores.

Here’s how those numbers are really interpreted behind closed doors:

How Leadership Interprets Teaching Eval Numbers:

  • 4.6–5.0 with a high response rate: strong teacher, likely doing something special
  • 4.2–4.5, around the departmental mean: solid, reliable, not a concern
  • 3.8–4.1, slightly below the mean: watch list; may need support or context
  • Below 3.8 consistently, over multiple years: a problem we cannot ignore
  • Big year-to-year swings: context change, need to investigate
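
If you forced that mental model into code, it would look something like the sketch below. The thresholds come from the list above, but the function itself, including the 0.5-point cutoff for a "big swing," is purely illustrative; no committee actually runs software like this:

```python
def interpret_pattern(yearly_means: list[float], high_response: bool) -> str:
    """Rough translation of the list above. Illustrative only:
    this is how heads work in the room, not an actual system."""
    latest = yearly_means[-1]
    if len(yearly_means) > 1 and all(m < 3.8 for m in yearly_means):
        return "Problem we cannot ignore"
    if max(yearly_means) - min(yearly_means) > 0.5:  # illustrative swing cutoff
        return "Context change, need to investigate"
    if latest >= 4.6 and high_response:
        return "Strong teacher, likely doing something special"
    if latest >= 4.2:
        return "Solid, reliable, not a concern"
    if latest >= 3.8:
        return "Watch list, may need support or context"
    return "One low year: look for context before judging"

print(interpret_pattern([4.3, 4.4, 4.2], high_response=True))
# -> Solid, reliable, not a concern
```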

You obsess over “I got a 4.1 and my colleague got a 4.3.” Leadership doesn’t. We look at:

  • Where are you relative to the median faculty member?
  • Is your trend stable, improving, or sliding?
  • Does this fit what we hear anecdotally from residents and students?
  • Does this fit your teaching context? (more on that later)

When I sit in promotions meetings, the conversation about teaching evals usually lasts maybe 2–5 minutes per candidate. That’s it. No one is printing your entire comment history and reading it aloud.

Unless you are truly an outlier. Then the tone shifts.

The Dark Secret: Comments vs. Scores

Let me tell you what everyone pretends isn’t true: the comments matter more than the numbers—if there are enough of them saying the same thing.

But a single spectacularly nasty comment? We’ve all been subject to those. Experienced committee members discount them almost immediately.

In a typical meeting, here’s what you’ll hear:

  • “She’s at or slightly above the departmental mean. Nothing concerning here.”
  • “There are a few negative comments, but they’re not consistent year to year.”
  • “Students love his bedside teaching but hate his PowerPoints—that’s fixable.”

Now, when do comments start to count against you?

When we see repetition:

  • “Disorganized” every year, across courses
  • “Humiliates students” from UME and GME learners, multiple cohorts
  • “Makes sexist jokes” echoing across time
  • “Never lets us do procedures, just pushes us aside”

One offhand complaint about being “mean” because you failed a student who deserved to fail? That gets thrown in the mental trash. Three independent cohorts calling you belittling? That changes how your file gets read.

[Image: Printed teaching evaluation comments with highlighted repeated themes]

And there’s another uncomfortable truth: glowing comments rarely save bad numbers. A pattern of low scores with a handful of gushing “best teacher ever” comments looks like this to us: you connect well with a subset of learners and fail the rest.

That’s not excellence. That’s variability.

Context: The Part Nobody Tells You Gets Adjusted

You think all teaching evals are judged the same. They aren’t. Not even close.

When I evaluate a faculty member’s teaching, I’m adjusting in my head for:

  • Content difficulty. The person teaching renal physiology or biostatistics gets graded more harshly by students than the person teaching dermatology pictures. Not fair, but very real.
  • Learner seniority. Preclinical small groups vs. intern boot camp are different worlds. Interns will slam you for being demanding; students might adore the structure.
  • Required vs. elective. Evaluations from electives are almost always inflated. The learners chose to be there. We all know this.
  • Time of year. Pathology in April vs. September? Two different audiences. Burned-out learners are not generous graders.

Good education leaders know this and mentally curve things. Here’s roughly how people weight different teaching contexts behind closed doors:

[Doughnut chart] Relative Weight of Teaching Contexts in Promotion Decisions:

  • Core Clerkship Teaching: 40%
  • Preclinical Lectures: 30%
  • Elective Teaching: 20%
  • One-off Noon Conferences: 10%
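
To make the arithmetic concrete, here is a hedged sketch of that mental curve as a weighted average. The weights mirror the chart above; the dictionary keys and the renormalization detail are my own illustration, not a formula anyone actually applies:

```python
# Illustrative weights taken from the chart above. Real committees do
# this in their heads, not in spreadsheets.
CONTEXT_WEIGHTS = {
    "core_clerkship": 0.40,
    "preclinical_lectures": 0.30,
    "elective": 0.20,
    "noon_conference": 0.10,
}

def weighted_teaching_score(scores: dict[str, float]) -> float:
    """Weighted mean across contexts, renormalized over only the
    contexts this faculty member actually teaches in."""
    total_weight = sum(CONTEXT_WEIGHTS[c] for c in scores)
    return sum(CONTEXT_WEIGHTS[c] * s for c, s in scores.items()) / total_weight

# Outstanding in the clerkship (4.7), mediocre at noon conference (3.9):
# the blend comes out to 4.54, i.e., the weak venue barely moves the needle.
print(round(weighted_teaching_score(
    {"core_clerkship": 4.7, "noon_conference": 3.9}), 2))
```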

If you’re outstanding in core clerkships and mediocre at a once-per-year noon conference, nobody cares about the noon conference.

Flip it—if your only strong evals are from a handpicked elective and your core teaching is weak—that’s a problem. And yes, we notice that pattern.

What Committees Actually Do With Your Evals

Let’s talk about where the evals really matter: promotions, reappointments, and teaching awards.

For Promotion and Reappointment

In a typical Assistant Professor → Associate Professor discussion, your teaching section gets handled like this:

  1. A committee member has your dossier open. They’ve read the summary prepared by the education office.
  2. They see:
    • Aggregate scores over ~5 years
    • Comparison to departmental averages
    • Selected representative comments
    • List of teaching roles (lectures, small groups, clerkships, simulation, etc.)
  3. They talk for 60–120 seconds:
    • “Teaching evaluations are consistently at or above departmental mean.”
    • “He’s taken on increasing teaching responsibility.”
    • “Comments highlight approachability and bedside teaching.”
    • “There were concerns about organization early on, which seem to have improved.”

That’s the entire teaching segment.

If you’re below the mean, the conversation shifts:

  • “She’s consistently 0.3–0.5 below the departmental mean in the clerkship.”
  • “Comments repeatedly mention disorganization.”
  • “She has not sought faculty development despite previous feedback.”

Then someone will ask:

“Is this a support issue, or a patient safety / professionalism issue?”

If it’s the former, they may still recommend promotion, but with a strong note that you need teaching development. If it’s the latter, things get sticky.

For Teaching Awards

Here’s the part nobody will tell you: teaching awards are political.

Yes, evals matter. But so do:

  • Who is on the selection committee
  • Whether your department advocates for you
  • Whether learners bother to nominate you with specific stories
  • Whether you’re “visible” in big lectures vs. buried in night-float teaching

We absolutely look at your evaluations. But we are looking for sustained excellence and narrative evidence, not perfection.

I’ve sat in award meetings where someone with 4.9 averages lost to someone with 4.6 but legendary comments and clear impact on struggling learners. The committee valued the story over a tenth of a point.

The Bias Problem: What We Say Quietly After The Meeting

There is a conversation that happens after the official meeting ends. You won’t see it in any policy document.

We all know the literature: teaching evaluations are biased. Against women. Against underrepresented faculty. Against anyone with an accent. Against those who enforce standards.

So after reading a dossier from, say, a Black woman surgeon with “tough but fair” comments and slightly lower means, experienced chairs will literally say in the room:

“Adjust for bias. She’s holding residents to standards and they’re reacting.”

I’ve heard this word-for-word:

  • “Her scores are a bit lower, but she’s in trauma surgery nights with angry PGY-1s. This is not a red flag.”
  • “Students call him ‘intimidating’ but he’s the only one giving them real feedback. I’m not docking him for that.”

[Image: Faculty promotions committee in serious discussion]

Are all committees this self-aware? No. Some still treat evals as objective truth. But in most serious academic centers, the bias problem is known and at least partially corrected for—informally, in people’s heads.

That said, there’s another ugly bias: charisma.

If you’re funny, extroverted, and good on stage, your evals are inflated. If you’re quiet, methodical, and introverted, students often underrate you even if they learn more from you.

We know this too. Some of the best clinical teachers I’ve seen get “good but not great” evals because they’re not performers. A thoughtful promotions committee will read that correctly.

A lazy one won’t.

How Harsh Comments Are Actually Perceived

Let’s go back to that “worst lecturer in the curriculum” line that crushed my junior colleague.

Here’s the mental filter experienced faculty use when they read hostile comments:

  • Singular extreme comment in a sea of decent scores
    Translation: one unhappy learner, probably personality clash, grading backlash, or someone having a bad day.

  • Hostile comment + low response rate
    Translation: noise. Statistically meaningless.

  • Hostile comment, but the only specific complaint is “too hard,” “expects too much,” or “tests on things we didn’t see”
    Translation: might actually be doing their job.

  • Hostile comment + specifics that match multiple other comments
    Now we pay attention.

There’s also the clinical reality check. If residents say, “She’s mean, she insists I present cases clearly and read overnight,” there’s usually a chuckle and someone says, “Sounds like my best attending in residency.”

But if students say, “He made racist jokes about patients,” the room goes quiet. That’s not teaching style; that’s professionalism.

We’re not idiots. We do know the difference.

How Smart Faculty Use Evals Instead of Fearing Them

The best teachers I know do not have perfect evals. They have clear stories in their evals.

I’ll tell you what they do differently:

They read for themes, not for ego. They keep a running document (see the sketch after this list) of:

  • Phrases that repeat over time (“organized,” “approachable,” “tough but fair”)
  • Specific criticisms that show up more than once (“too fast,” “slides crowded”)
  • Comments that align with what they themselves feel is weak
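
A crude but honest way to keep that running document is a theme tally across years. Everything in this sketch, the comment strings included, is hypothetical; the point is that repetition across cohorts, not raw mention counts, is what deserves your attention:

```python
from collections import Counter

# Hypothetical multi-year comment themes; a real version would be
# distilled from your actual eval reports.
comments_by_year = {
    2023: ["disorganized slides", "approachable", "too fast"],
    2024: ["approachable", "tough but fair", "too fast"],
    2025: ["approachable", "slides crowded", "tough but fair"],
}

# Count how many YEARS a theme appears in, not raw mentions;
# repetition across cohorts is what committees actually weigh.
years_per_theme = Counter(
    theme for themes in comments_by_year.values() for theme in set(themes)
)

for theme, n_years in years_per_theme.most_common():
    if n_years > 1:
        print(f"{theme}: appears in {n_years} of {len(comments_by_year)} years")
```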

Then they make small, visible changes and—this part matters—they signal those changes to learners.

Examples I’ve seen work:

  • “Last year I got feedback that I move too fast through the imaging. So today I’m going to pause after each case and give you a minute to process before we discuss.”
  • “Residents have told me the feedback I give is too vague. I’m going to be more explicit today—expect some very direct comments.”

You know what that does? It:

  1. Shows you take feedback seriously
  2. Lowers the temperature when you inevitably give someone tough feedback
  3. Makes it harder for learners to write lazy, generic complaints

[Flowchart] Faculty Response to Teaching Evaluations:

  1. Receive the eval report
  2. Note it, but do not overreact
  3. Prioritize 1–2 changes
  4. Implement the change next cycle
  5. Tell learners about the changes
  6. Review the next round of evals
  7. Look for patterns

Here’s the real secret: committees love seeing trajectory. If your early years show mixed evals and your later ones show thoughtful improvement, that impresses people far more than a flat line of “fine, I guess.”

What Actually Gets You In Trouble

Let me be crystal clear about the scenarios that really raise red flags in closed-door meetings:

  1. Persistent, multi-year, below-average scores across multiple settings
    Not just one tough rotation. Across lectures, wards, small groups, everything. That suggests a global teaching problem, not a bad fit.

  2. Recurrent concerns about humiliation, disrespect, or safety
    “Belittles students,” “throws instruments,” “yells in the OR,” “punishes honest mistakes.” Even if the numbers are middling, this kind of pattern forces the committee’s hand. It becomes a professionalism issue.

  3. Discrepancy between evals and what you claim
    If your personal statement screams “passionate educator” but your evals are bottom quartile and you do zero faculty development, the mismatch bothers people. It looks like self-delusion or spin.

  4. Ignoring clear, repeated feedback
    If every year you hear “too disorganized” and the evals 5 years later say the same thing, promotion committees lose patience. The problem isn’t the evals; it’s your refusal to adapt.

Notice what’s not on that list:

  • One bad year during COVID chaos
  • A rough first year on a new clerkship
  • Lower scores in a notoriously tough course
  • A handful of angry comments after you failed someone or reported misconduct

We remember that we were junior attendings and residents once too. Most of us have our own horror stories of “that one eval.”

How To Read Your Own Evals Like a Program Director

When you open your next eval report, stop reading it like a wounded human for five minutes and read it like a division chief.

Ask yourself:

  • Where am I relative to my peers? (If you don’t see comparison data, ask for it.)
  • What are the 2–3 adjectives that keep repeating across years?
  • Are my worst comments about style or about safety/respect?
  • Is there a specific context where my evals are consistently lower? (e.g., lectures vs. bedside)
  • Can I name one concrete change I’ll make next cycle?

Then, later, once the sting is gone, read them again as a human. Let yourself be proud of the quiet, specific compliments: “Took extra time to explain,” “Made me feel comfortable admitting what I didn’t know,” “Pushed me to be better.”

Those matter more than the random “best lecturer ever!!!” from the student who already loved the topic.

[Image: Physician educator annotating evaluation report with notes for improvement]

If You’re On The Receiving End Of A “Concern”

If your chair calls you in “to talk about your teaching evaluations,” here’s what is usually happening:

  • Someone (clerkship director, course director, program director) flagged a pattern.
  • The chair wants to see if this is:
    • A documentation problem
    • A context problem
    • A real performance problem

You are not on trial yet. You are under assessment.

The worst move you can make is immediate defensiveness or blaming “these new learners today.” That tells the chair you’re going to be hard to coach.

A better approach:

  • Acknowledge you’ve seen the pattern.
  • Share at least one concrete change you’re willing to try.
  • Ask if there’s someone in the department with strong evals you could observe or get mentored by.

When we see a faculty member meet this moment thoughtfully, we’re actually relieved. We don’t want to fire you. We want to be able to tell the Dean, “We addressed it, they’re improving.”

FAQ

1. Can bad teaching evaluations actually get me fired or prevent promotion?
Yes—but only when there’s a sustained, multi-year pattern of poor evaluations across multiple teaching settings, usually combined with serious comments about disrespect, humiliation, or unsafe supervision. One bad year, or one hostile cohort, almost never derails a career by itself. It’s the combination of consistency, severity, and refusal to change that becomes lethal in promotion discussions.

2. Do committees really adjust for biased evaluations (gender, race, accent)?
At reputable academic centers, yes, at least informally. Seasoned leaders are very aware that women, underrepresented faculty, and foreign-trained physicians get harsher evals. In meetings, you’ll actually hear things like “remember the bias data” or “she’s in a male-dominated field, curve this in your head.” The problem is that this “correction” is not systematic—it depends on who’s in the room. That’s why you should not count on evals alone to prove your teaching value; build other evidence like peer observations and teaching portfolios.

3. How much do student vs. resident evaluations matter?
For medical school promotions, student evaluations carry more formal weight because they’re standardized and heavily audited for accreditation. For residency-focused faculty, resident evals are scrutinized more closely, especially in ACGME core faculty roles. In truth, committees like seeing strength in both groups: strong UME and GME evals tell us you can teach across levels, which looks very good for promotion. Persistently poor resident evals worry people more, because they imply issues with supervision and patient care.

4. What if my scores are average but I put in a huge amount of teaching effort?
Behind closed doors, effort alone doesn’t move promotion decisions—but documented impact does. If your numbers are average, you can still build a strong teaching case by showing: you developed curricula, created new rotations, mentored successful trainees, led simulations, or improved exam performance or milestone outcomes. Smart faculty don’t rely solely on numeric evals; they collect letters from course directors, document teaching roles, track outcomes, and get peer evaluations. Committees love a coherent narrative of “this person steadily improved and built something that lasts,” even if the raw scores are merely solid rather than spectacular.


When the door closes and your file is on the table, nobody is counting decimal points. They’re asking three questions: Are you safe? Are you teachable? And are you contributing more than you’re costing? If your evaluations tell a story of basic competence, gradual growth, and responsiveness to feedback, you’re going to be fine—no matter what that one vicious comment said.
