Does Applicant Gender Affect How LORs Are Written? Emerging Evidence

January 5, 2026
13-minute read


Only 23–30% of residency recommendation letters written for women contain standout “superlative” phrases, compared with roughly 40–50% for men in some specialties. Same stage of training. Same role. Very different language on paper.

That is the core of the emerging evidence: applicant gender does affect how letters of recommendation (LORs) are written in residency applications. Not in a cartoonishly obvious way. The differences are subtle, statistical, and cumulative. But they are real, and they show up consistently when you stop reading letters like a human and start parsing them like data.

Let me walk you through what the numbers actually show, how large the effects really are, and what you can do about it as an applicant or letter writer.


What the Data Actually Shows About Gendered LORs

Most of the better studies use one of three tools:

  1. Manual content coding with predefined word lists (e.g., “communal” vs “agentic” words).
  2. Linguistic Inquiry and Word Count (LIWC)–style lexical analysis.
  3. More recent NLP / machine learning text analyses on large ERAS datasets.

Across these, the pattern repeats.

Communal vs. Agentic Language

Historically, letters for women in medicine (and science in general) have contained more “communal” terms:

  • Caring, kind, helpful, nurturing, supportive, team player

Letters for men have skewed toward “agentic” or “ability-based” language:

  • Brilliant, outstanding, leader, decisive, confident, gifted

One early, often-cited body of work comes from medical school and faculty evaluations, but residency-focused studies have confirmed similar trends.

In radiology residency applications, for example:

  • Women were more likely to be described with communal adjectives (e.g., “kind,” “pleasant,” “hardworking”).
  • Men were more likely to be labeled with stand-out ability descriptors (“brilliant,” “exceptional,” “star”).

These are not random anecdotes. They are frequency differences across thousands of letters.
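To see how this kind of frequency comparison is typically done, here is a minimal word-list coding sketch in Python. The word lists below are illustrative stand-ins chosen from the examples above, not a validated instrument like LIWC.

```python
import re
from collections import Counter

# Illustrative (not validated) word lists in the spirit of communal/agentic coding
COMMUNAL = {"caring", "kind", "helpful", "nurturing", "supportive", "pleasant", "hardworking"}
AGENTIC = {"brilliant", "outstanding", "leader", "decisive", "confident", "gifted", "exceptional"}

def category_counts(letter_text: str) -> dict:
    """Count communal vs agentic terms in one letter (crude token match)."""
    tokens = re.findall(r"[a-z]+", letter_text.lower())
    counts = Counter(tokens)
    return {
        "communal": sum(counts[w] for w in COMMUNAL),
        "agentic": sum(counts[w] for w in AGENTIC),
    }

sample = "She is a kind, supportive team player and a hardworking student."
print(category_counts(sample))  # {'communal': 3, 'agentic': 0}
```

Real studies run this over thousands of letters and compare category rates by applicant gender; the principle is exactly this simple.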

Length, Specificity, and “Standout” Language

Length is one of the easier metrics to quantify.

Multiple specialty-specific analyses (e.g., EM, radiology, surgery) have found:

  • Average word count is often slightly shorter for women’s letters than men’s, though not always dramatically.
  • What is more consistent: density of standout phrases.

Think phrases like:

  • “One of the best students I have worked with in the last 5–10 years”
  • “Ranks in the top 5% of residents / students I have trained”
  • “Absolutely outstanding” / “truly exceptional”

These appear more frequently for male applicants, even after adjusting for some performance indicators.

To make this more concrete:

Example Differences in Standout Language Frequency
  Specialty (Study)           Men's Letters with Superlatives   Women's Letters with Superlatives
  Radiology (sample study)    ~45%                              ~28%
  EM SLOEs (older data)       ~38%                              ~26%
  IM Sub-I letters            ~42%                              ~30%

Do not over-interpret exact percentages across different studies; the methodology and samples differ. The consistent direction of the effect is the point: male applicants more often get “this is a star” language.

The “Doubt-Raising” Phrases Problem

Several analyses of academic recommendation letters, including in medicine, have highlighted “doubt-raising” phrases for women:

  • “Although she lacks experience in X, she works very hard…”
  • “With continued mentorship and support, she will succeed…”
  • “She will do well in the right environment…”

These hedging or conditional statements signal uncertainty, even if the letter is officially “positive.”

Quantitatively, doubt-raising language shows up more often in letters for women. Not overwhelmingly, but enough to matter statistically.

A typical finding: something like 10–15% of women’s letters include at least one doubt-raising phrase, compared with perhaps 5–8% of men’s, in similar applicant pools.


Specialty-Specific Patterns: Where Gender Gaps Show Up the Most

The bias is not uniform across specialties. Competitive, male-dominated fields tend to show more pronounced linguistic differences. That fits what you would expect if stereotype-based expectations are leaking into language use.

Relative Frequency of Agentic vs Communal Terms by Gender (Illustrative Across Specialties)

  Specialty       Agentic:Communal Ratio (Men vs Women)
  Surgery         1.6
  Radiology       1.4
  Internal Med    1.2
  Pediatrics      1.1

Interpretation: values greater than 1 indicate that the agentic-to-communal word ratio is higher in men's letters than in women's letters for that specialty. Again, these figures are illustrative but directionally aligned with published work.

Surgery and Other Procedural Fields

In surgery and some procedurally heavy specialties, letters for male applicants:

  • More frequently highlight technical skill, decisiveness, leadership in acute settings.
  • Use competitive framing (“top 1–5%,” “rises above peers,” “natural leader in the OR”).

Women are:

  • More often praised for work ethic, reliability, being “pleasant to work with,” and patient-centered care.
  • Less often explicitly compared to the very “top” of previous trainees.

Functionally, the data show that women get framed as “solid, dependable, hardworking,” while men are more often framed as “elite, gifted, commanding.” Residency selection committees read both, usually side by side.

Internal Medicine, Pediatrics, and “Caring” Specialties

In internal medicine and pediatrics, the communal bias is more “aligned” with the specialty identity. Everyone is supposed to be caring and team-oriented. That blurs some contrasts, but you still see that:

  • Men’s letters more frequently include language about intellectual ability, research potential, or “future leader in the field.”
  • Women’s letters tilt more toward bedside manner, empathy, team functioning.

One pediatrics-oriented analysis showed that both genders were described with communal terms frequently, but when you isolate high-intensity praise about clinical reasoning or “leadership potential,” men were again overrepresented.

EM and SLOEs

Emergency medicine has a somewhat standardized letter of evaluation (SLOE), which allows for more structured comparisons. Studies of SLOEs have found:

  • Numeric rankings (e.g., top 1/3, top 10%) are sometimes similar by gender within a given med school, but free-text comments diverge.
  • Women more often receive comments emphasizing reliability, teamwork, and communication.
  • Men more often get comments highlighting speed, decisiveness, or “taking charge.”

Structured data make it easier to see when narrative language diverges from numeric ratings. That divergence is part of the emerging story.


Quantifying Gender Effects: How Big Are the Differences?

The honest answer: moderate in language. Subtle in any single letter. Potentially large when aggregated across many small decisions.

Effect Sizes and Odds Ratios

Where studies report effect sizes, they tend to show:

  • Small to medium standardized differences (Cohen’s d around 0.2–0.5) in frequencies of certain word categories.
  • Odds ratios of roughly 1.3–1.8 for men receiving standout ability descriptors versus women with similar scores.

Those are not trivial. But they also are not so massive that you can point to one letter and say “this is clearly biased” without doing a comparative analysis.
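For readers who want to sanity-check what numbers like these mean, here is a short sketch of both statistics, with hypothetical inputs chosen only to land inside the ranges quoted above.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = men with/without a trait, c/d = women."""
    return (a / b) / (c / d)

# Hypothetical: men's letters average 520 words (SD 100, n=200) vs
# women's 480 words (SD 100, n=200) -> d = 0.4, a small-to-medium effect.
print(cohens_d(520, 480, 100, 100, 200, 200))  # 0.4

# Hypothetical: 38% of 1000 men's letters contain superlatives vs 29% of
# 1000 women's letters -> OR of about 1.5.
print(round(odds_ratio(380, 620, 290, 710), 2))  # 1.5
```

The point of the arithmetic: a gap of a few percentage points per letter compounds into odds ratios well above 1 once you look across a whole applicant pool.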

From a selection-committee perspective, what matters is how these linguistic differences correlate with interview offers and rank positions. That is harder to study directly because program ranking data are not easily accessible and are confounded by test scores, school prestige, and research.

Correlation with Downstream Outcomes

Where there is partial data:

  • Stronger superlative language and explicit comparisons to top-tier past trainees correlate with higher interview and match success. That is intuitive and supported by survey data: PDs explicitly state they weigh these phrases.
  • Letters with more vague praise and more communal-only language correlate with lower perceived competitiveness when PDs are asked to rate applicant strength in experimental reading tasks.

A few simulation-style experiments gave identical CVs to faculty but manipulated the letter tone. Letters with agentic, superlative language led to higher “would invite for interview” ratings, independent of gender labels on the CV.

Now combine that with the observed gender language skew. The math is obvious.


Intersectionality: Gender Does Not Act Alone

If you stop at “men vs women,” you miss a key layer. Some of the newer large-text analyses show that gendered language interacts with race/ethnicity.

There is evidence that:

  • Women of color may receive even less superlative language and more doubt-raising qualifiers than white women.
  • Men of color sometimes get less of the “brilliant / standout” language than white men, even when controlling for USMLE scores and school tier.

The problem: the sample sizes for intersectional groups get smaller fast, so the estimates are noisier. But directionally, the data suggest that being a woman plus a racial/ethnic minority may compound disadvantage in how letters are written.

So the story is not “gender only.” It is “gender plus the usual structural inequities” showing up in yet another place.


Standardized Letters: Do They Reduce Gender Bias?

Some specialties have tried to tame the wild west of narrative letters with standardized forms (e.g., EM SLOE, some IM structured letters).

The theory: constrain what can be said and how performance is rated, and bias will shrink.

To some extent, it works.

  • Numeric domains (clinical reasoning, professionalism, etc.) show smaller gender gaps when raters are forced to anchor against specific benchmarks.
  • Distribution of rating categories sometimes becomes more equitable.

However:

  • Free-text comment sections still display gendered word choice even inside standardized formats.
  • Even when numeric ratings are comparable, the language used to describe “top” male vs female candidates still diverges.

This tells you something important: standardization helps, but does not eliminate the deeper cognitive biases that shape narratives.


What This Means Practically for Applicants

You cannot single-handedly fix structural bias as an MS4. But you can manage your own risk.

Be Strategic About Whom You Ask

The data are clear on one point: the writer matters as much as your gender.

  • Some attendings consistently write long, enthusiastic, comparative letters full of clear rankings and superlatives.
  • Others churn out three bland paragraphs that could describe anyone, and they write that way for everyone, not just you.

You want the first group.

Look for:

  • Faculty who have written successful letters for prior applicants you know personally.
  • People who actually worked closely with you in settings where you led, presented, or made complex decisions.
  • Those who already talk about you in clear comparative language: “You’re one of the strongest students we have had on this service.”

If someone speaks about you only in vague, communal terms (“you’re so nice, patients love you, you’re very diligent”) and never once comments on your reasoning or leadership, that tone is likely to show up in the letter.

Explicitly Prime Your Letter Writers

You can partially counteract gendered tendencies by giving your letter writers structured input. Not a generic CV. A targeted, data-rich memo.

Include:

  • A bullet list of 3–5 specific clinical scenarios where you:
    • Took initiative.
    • Made high-quality decisions.
    • Led a team or managed a critical situation.
  • Concrete metrics: “Highest exam score in the clerkship,” “top 10% on OSCE metrics,” “poster at national conference,” etc.
  • A brief note about your career goals and strengths you hope the letter can highlight (clinical reasoning, leadership, procedural skills).

This is not manipulative. It is data provisioning. You are nudging the letter writer away from default, gendered stereotypes (“team player,” “kind”) and toward concrete, agentic achievements.

Faculty who are unaware of their unconscious bias often appreciate the specificity. They just need the raw material.


For Letter Writers: How to De-Bias Your Own Letters

Since many readers of this type of article eventually sit on the other side, let me be blunt: you are almost certainly not immune to this, regardless of your beliefs.

Audit Your Last 10 Letters

Pick 5 letters you wrote for men and 5 for women at the same level (e.g., MS4s applying to the same specialty). Run a quick, crude analysis:

  • Count explicit comparative phrases (“top X%,” “one of the best…”).
  • Note whether you included standalone superlatives (“outstanding,” “exceptional,” “brilliant”).
  • Identify doubt-raising hedges (“although,” “however,” “with support,” “needs to continue to develop”).

You will likely see the pattern. Even if you are a woman writing letters for women.

Once you see it, it is hard to unsee.
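The audit above can be made less impressionistic with a crude phrase counter. The regex patterns below are illustrative examples drawn from this article, not an exhaustive or validated list; extend them with your own habitual phrasings.

```python
import re

# Illustrative phrase patterns; tune these to your own writing habits
COMPARATIVE = [r"top \d+%", r"one of the (best|strongest)", r"in the top"]
SUPERLATIVES = [r"outstanding", r"exceptional", r"brilliant"]
HEDGES = [r"\balthough\b", r"\bhowever\b",
          r"with (continued )?(mentorship|support)", r"needs to continue"]

def audit(letter: str) -> dict:
    """Count comparative, superlative, and doubt-raising phrases in one letter."""
    text = letter.lower()
    def count(patterns):
        return sum(len(re.findall(p, text)) for p in patterns)
    return {"comparative": count(COMPARATIVE),
            "superlative": count(SUPERLATIVES),
            "hedges": count(HEDGES)}

letter = ("Although she lacks experience in research, she works very hard. "
          "She is one of the best students I have worked with.")
print(audit(letter))  # {'comparative': 1, 'superlative': 0, 'hedges': 1}
```

Run it over your last ten letters, split the totals by applicant gender, and compare.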

Use Structured Language Templates

You do not need a fancy algorithm. You can build your own minimal checklist:

  • Opening: clear statement of relationship and duration.
  • Performance summary with explicit comparative statement:
    • “Among the ~X students at this level I work with annually, I would place [Name] in the top [Y%].”
  • Two to three concrete examples of:
    • Complex clinical reasoning.
    • Leadership and initiative.
    • Resilience under pressure.
  • Closing: unambiguous recommendation strength:
    • “I give [Name] my highest recommendation without reservation for your residency program.”

Then, importantly: apply this template equally across genders. Do not save superlatives only for men you “feel” are exceptional; interrogate whether a woman with similar metrics and performance is being undersold in your language.


How Program Directors Are Responding

Program directors (PDs) are not oblivious to this. Many have seen the research and have thousands of letters in their own mental training set.

Three PD behaviors are increasingly common:

  1. De-weighting narrative fluff: Some PDs basically skim letters for red flags or extreme outliers and focus far more on scores, school, and experiences.
  2. Looking for specific signals: When PDs do read closely, they look for:
    • Explicit comparative statements.
    • Clear enthusiasm vs lukewarm tone.
    • Any evidence of professionalism concerns hidden in polite language.
  3. Adjusting for bias: A subset of PDs report mentally adjusting expectations, especially for women and underrepresented applicants, when letters appear overly communal or restrained.

But there is a limit to how much mental calibration can do when you are reading hundreds of files in a compressed timeline.

A small number of programs are experimenting with more formalized scoring rubrics for letters, or even automated text analysis to flag standout vs undermining language. That trend will probably grow.


Where the Evidence Is Still Thin

You should also know where the data are not as strong:

  • Direct causal links from letter language → match outcome in large, multi-specialty datasets.
  • High-quality, intersectional analyses broken down by gender, race, and institution type with sufficient power.
  • Longitudinal changes over time as awareness of bias increases.

We have solid evidence that gendered language exists and is measurable. We have plausible pathways linking that language to selection decisions. We have smaller studies suggesting real impact.

What we lack is a massive, ERAS-wide, multi-year dataset with applicant demographics, full letters, and match outcomes, analyzed with modern NLP. That would settle the magnitude question.

If that sounds like overkill, it is not. That is the level of rigor we apply to much smaller clinical questions. Educational and career selection systems deserve the same.


Key Takeaways

  1. Data across multiple specialties show consistent gender differences in residency LOR language: men receive more agentic, superlative, and comparative praise; women receive more communal, sometimes hedged descriptions.
  2. These differences are modest letter by letter but meaningful in aggregate, especially in competitive fields where marginal advantages matter.
  3. Applicants can partially counteract bias by choosing strong writers and providing concrete, achievement-focused input; writers can de-bias by auditing their own patterns and standardizing comparative, explicit praise across genders.
