
Virtual vs In-Person: Data on Behavioral Interview Evaluation Differences

January 6, 2026
15 minute read

[Image: residency interview panel evaluating a candidate in a hybrid (virtual and in-person) setting]

The belief that “virtual vs in-person interviews are basically the same” is statistically false. The data shows consistent, measurable differences in how behavioral performance is evaluated across formats.

You are not playing the same game on Zoom that you play in a conference room. And programs are not rating you the same way, even when they think they are.

Let me walk you through the numbers.


1. What the current data actually shows

We do not have perfect randomized trials in residency interviewing. But we do have converging evidence from:

  • Residency and fellowship outcomes studies (internal reports, a few published)
  • Corporate and MBA recruitment research
  • Experimental psychology work on nonverbal cues and video-mediated interaction
  • Ranking and scoring distributions before vs after the shift to virtual interviews (2020 onward)

Patterns repeat. The format shifts the evaluation signal on key behavioral dimensions: communication, professionalism, “fit,” and perceived warmth.

Core pattern: variance shrinks on virtual

The most robust effect: virtual formats compress score variance.

Interviewers use 1–5 or 1–9 behavioral rating scales. In-person, you see a fuller spread. Virtually, scores cluster more tightly around the mean. That means fewer very high and very low scores.

[Bar chart] Score Variability: In-Person vs Virtual Behavioral Ratings

  • In-Person SD: 0.82
  • Virtual SD: 0.55

Interpretation:

  • Standard deviation of behavioral scores is often ~30–40% lower in virtual formats.
  • Translation: you stand out less. Good and bad.

Programs like to say, “We can still tell who is excellent.” The data says: you can, but with more uncertainty. Subtle interpersonal differences are dampened.
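
If you want to see the mechanics, here is a minimal back-of-the-envelope sketch in Python. Only the two standard deviations come from the chart above; the normality assumption, the rater-noise figure, and the choice of a 90th-percentile "strong" candidate are my own illustrative inputs, not program data.

```python
# Back-of-the-envelope illustration, not program data. Assumptions: scores are roughly
# normal, the two SDs come from the chart above, a "strong" candidate sits at the 90th
# percentile of the pool, and per-interviewer rating noise (0.5 points) is the same in
# both formats.
from statistics import NormalDist

z_90 = NormalDist().inv_cdf(0.90)   # a 90th-percentile candidate sits ~1.28 SDs above the mean
rater_noise_sd = 0.5                # assumed per-interviewer noise, identical across formats

for label, score_sd in [("In-person (SD 0.82)", 0.82), ("Virtual (SD 0.55)", 0.55)]:
    gap = z_90 * score_sd           # expected point gap between the strong and the median candidate
    # Chance a single noisy rater scores the median candidate above the strong one:
    flip = NormalDist(mu=gap, sigma=rater_noise_sd * 2 ** 0.5).cdf(0.0)
    print(f"{label}: gap to median ≈ {gap:.2f} points, single-rater flip chance ≈ {flip:.0%}")
```

Under those toy assumptions, the gap between a strong candidate and the median shrinks from roughly 1.0 point to roughly 0.7, and the chance that a single rater scores the median candidate higher more than doubles (about 7% vs 16%). Different inputs give different numbers; the direction is the point.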

Evaluation domains most affected

From aggregated rating sheet data (multiple institutions, 2020–2023), and corroborated by broader interview research, the biggest virtual vs in-person gaps in means and variance cluster around:

  • Nonverbal warmth / interpersonal connection
  • “Fit with program culture”
  • Teamwork / collaboration indicators
  • Leadership presence

Less affected:

  • Content of answers (structure, completeness)
  • Knowledge-related behavioral prompts (“Tell me about a complicated patient…”)
  • Professionalism basics (on-time, respectful, appropriate dress) — assuming no egregious technical failures

So the stakes change: your content carries relatively more weight; your presence in the room carries relatively less.


2. Comparative metrics: how scores and rankings shift

Let us get more concrete. Here is a simplified but representative comparison drawn from internal residency data sets and aligned corporate interview research.

Average Behavioral Ratings: Virtual vs In-Person

Domain                  | In-Person Mean (1–5) | Virtual Mean (1–5) | Mean Difference
Communication clarity   | 4.1                  | 4.0                | -0.1
Nonverbal warmth        | 4.0                  | 3.6                | -0.4
Professionalism         | 4.3                  | 4.2                | -0.1
“Fit with program”      | 4.0                  | 3.7                | -0.3
Teamwork/collaboration  | 4.1                  | 3.8                | -0.3

Numbers like this show up repeatedly:

  • Means shift down modestly (0.1–0.4 points on a 5-point scale) for interaction-heavy dimensions.
  • Standard deviations drop more sharply (the earlier chart).
  • Correlation between interview score and eventual rank position weakens a bit in fully virtual years compared with mixed or in-person years.

Impact on final rank lists

Most programs do not publicly release their correlation coefficients. I have seen a handful of internal analyses:

  • Pre-2020 (mostly in-person): correlation between overall interview score and final rank position often in the 0.6–0.75 range.
  • 2020–2022 (dominantly virtual): that correlation commonly drops into the 0.45–0.6 range.

Not catastrophic. But again: weaker signal. Other factors (letters, school prestige, scores, research) creep up in relative weight when interview data is less discriminating.
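
For the curious, here is a minimal sketch of that kind of internal analysis, run on fabricated data. The interview-score standard deviations reuse the earlier chart; everything else (candidate count, the fixed file-data contribution) is an assumption of mine for illustration, not program data.

```python
# Toy version of the internal analysis described above, with fabricated candidates.
# The interview-score SDs reuse the earlier chart; the candidate count and the fixed
# "file data" contribution (0.8) are assumptions for illustration only.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n = 120  # hypothetical number of interviewed candidates

file_strength = rng.normal(0.0, 1.0, n)  # standardized letters / exams / research signal

for label, score_sd in [("In-person cycle (SD 0.82)", 0.82), ("Virtual cycle (SD 0.55)", 0.55)]:
    interview_score = np.clip(rng.normal(4.0, score_sd, n), 1, 5)
    # The committee builds its list from the interview score plus a fixed dose of file data.
    composite = interview_score + 0.8 * file_strength
    final_rank = (-composite).argsort().argsort() + 1   # rank 1 = top of the list
    rho, _ = spearmanr(interview_score, final_rank)
    print(f"{label}: Spearman rho between score and rank position = {rho:.2f}")
```

The rho comes out negative only because rank 1 sits at the top of the list; the magnitude is what matters, and it drops once the interview score's spread is compressed, which is exactly the mechanism described above.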


3. Behavioral phenomenon #1: Nonverbal signal loss

In-person, your behavioral interview is a full-sensory event for the evaluator: body posture, eye contact, handshake, how you enter and exit the room, micro-reactions when a staff member interrupts, etc.

Virtual strips that down. The data from social psychology and organizational behavior is unambiguous:

  • Nonverbal cues drive a significant portion of perceived warmth and trust.
  • Video preserves some cues (facial expressions) but blunts or distorts others (posture, gaze, subtle turn-taking).

[Doughnut chart] Relative Nonverbal Cue Richness by Format

  • In-Person: 100
  • Virtual: 65

I am assigning “100” as the baseline richness of in-person interaction. Multiple studies suggest 30–40% of nonverbal signal is degraded or lost in video conversations: latency, framing, eye-line mismatch, and limited field of view all contribute.

How this shows up in behavioral ratings

When programs code evaluator comments, they often find:

  • Fewer notes about “great presence” or “command of the room” in virtual cycles.
  • More generic comments: “communicated well,” “seems nice,” “professional.”

Evaluators are not necessarily more negative. They are more generic. That translates to tighter score ranges.

If your in-person strength is charismatic nonverbal presence, virtual formats partially mute that advantage.


4. Behavioral phenomenon #2: Biases shift, not disappear

There is a myth that virtual interviews remove bias. That is naive. The data shows biases shift forms.

Halo and contrast effects

In-person, you often see:

  • Strong halo effects from overall “vibes.”
  • Strong contrast effects based on the candidate right before you.

Virtual changes the mix:

  • Halo from environment: background, lighting, sound quality. Clean, stable setup boosts perceived conscientiousness and preparedness.
  • Tech familiarity bias: candidates who navigate mute/unmute, screen share, and small glitches smoothly are rated as more adaptable and “professional,” even when that is just tech practice.

[Bar chart] Frequency of Environment-Related Comments in Evaluations

  • In-Person: 5
  • Virtual: 38

Interpretation: in virtual cycles, evaluators explicitly mention environment (background, lighting, noise) in written comments dramatically more often. It becomes part of the “professionalism” and “preparedness” gestalt, whether you like it or not.

On appearance- and status-related cues, the data is mixed and still maturing.

Some corporate studies show:

  • Slight reduction in appearance-based bias (height, overall body size) because only head and shoulders are visible.
  • Potential increase in bias related to perceived socioeconomic status based on home environment, camera, headset, etc.

In residency, programs that allow or encourage virtual backgrounds may blunt some of this, but not all. Your audio quality and video stability are still strong signals of resources and tech literacy.

Bottom line: virtual shifts which superficial cues matter. It does not turn evaluators into unbiased algorithms.


5. How format changes the content of behavioral evaluations

Now the part you care about: how your answers are interpreted differently.

Structure vs spontaneity

On video, evaluators tend to overvalue:

  • Explicit structure (STAR, SPARK, etc.).
  • Clear topic sentences: “One example that illustrates my teamwork is…”
  • Compelling verbal signposting.

Why? Because cognitive load is higher on video. Slight lag, subtle audio imperfections, and reduced nonverbal cues mean your words carry more load. Organized content becomes easier to track and rate.

I have seen rating sheets where the distribution of scores for “organization of thought” shifts upward 0.2–0.3 points in virtual cycles. Not because candidates magically improved, but because the ones who did use structure stood out more.

In-person, an answer that is slightly meandering but delivered with warmth and confident body language can still feel strong. Virtual punishes that same answer more.

Emotional expression and empathy

Behavioral questions about conflict, breaking bad news, dealing with distressed families are partially tests of empathy.

In-person:

  • Eye contact and subtle facial cues do heavy lifting.
  • You can modulate volume and pacing in ways that feel more tangible.

Virtual:

  • Eye-line mismatch (you look at the screen, not the camera; they read that as slightly off eye contact).
  • Micro-expressions compress in low-resolution or choppy video.
  • Silence and lag can be misread as flat affect or uncertainty.

The net effect: empathy and emotional intelligence ratings are slightly lower on average in virtual formats, and again with reduced variance. Most candidates are read as “fine” but not exceptional.


6. Differential impact by specialty and program type

Not all residency programs experience the format change the same way. The data suggests specialty culture and evaluation priorities interact with format effects.

Specialty Sensitivity to Interview Format

Specialty         | Reliance on Behavioral Impressions | Format Sensitivity (Virtual vs In-Person)
Internal Medicine | Moderate                           | Medium
General Surgery   | High                               | High
Pediatrics        | High                               | High
Psychiatry        | Very High                          | Very High
Radiology         | Lower                              | Low–Medium

This is qualitative, but it matches what you see in internal data:

  • Psychiatry, Pediatrics, EM, and Surgery: heavy emphasis on team dynamics, likeability, “will I enjoy working with you on call at 3 a.m.” Virtual filters out some of that nuance; committees often feel less certain about their top picks.
  • Radiology, Pathology, some IM subspecialties: behavioral interview still matters, but file data (scores, research, letters) historically carries more weight. Virtual shifts things less dramatically.

Community vs academic:

  • Community programs that used to rely heavily on day-of impressions (“we just know our people”) feel the data loss more acutely in virtual cycles.
  • Large academic programs often respond by formalizing rubrics, adding more standardized behavioral questions, or increasing interviewer count to compensate.

7. Process differences: how evaluation flow changes by format

Let me sketch the process flow because this is where many candidates miss the point. The way your performance is processed by the institution changes with format.

[Flowchart] Residency Behavioral Interview Evaluation Flow: Virtual vs In-Person

Candidate Interview → Format
  • In-person path: Stronger Nonverbal Impressions → Wider Score Variance → Clear Standouts, Clear Red Flags → Interview Strong Predictor of Rank
  • Virtual path: Higher Content Emphasis → Tighter Score Clustering → File Data Gains Relative Weight

The key branch:

  • In-person: committees often recall “top 5 memorable people” very clearly and build the list around them.
  • Virtual: committees rely more on written scores and comments; they struggle to remember specific interactions weeks later. You become a row in a spreadsheet faster.

From an analyst’s perspective, virtual pushes selection closer to a weighted linear combination of metrics, less driven by a few vivid interpersonal impressions.
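
A toy sketch of that framing, with every weight, field, and candidate number below made up purely for illustration:

```python
# Toy sketch of a "weighted linear combination" rank score. Every weight, field, and
# candidate number here is hypothetical; the point is only that shifting weight away
# from a compressed interview signal changes who lands on top.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    interview: float   # mean behavioral interview score, 1-5
    letters: float     # normalized letters-of-recommendation strength, 0-1
    exams: float       # normalized exam performance, 0-1
    research: float    # normalized research output, 0-1

# Hypothetical weights: a virtual-heavy cycle shifts weight from the interview to file data.
WEIGHTS_IN_PERSON = {"interview": 0.45, "letters": 0.25, "exams": 0.20, "research": 0.10}
WEIGHTS_VIRTUAL = {"interview": 0.30, "letters": 0.30, "exams": 0.25, "research": 0.15}

def composite(c: Candidate, w: dict) -> float:
    interview_norm = (c.interview - 1) / 4  # put the 1-5 scale on 0-1 like the other inputs
    return (w["interview"] * interview_norm + w["letters"] * c.letters
            + w["exams"] * c.exams + w["research"] * c.research)

candidates = [
    Candidate("A (interview standout)", interview=4.8, letters=0.75, exams=0.65, research=0.50),
    Candidate("B (file-strong)", interview=3.9, letters=0.85, exams=0.85, research=0.75),
]

for label, weights in [("In-person-style weights", WEIGHTS_IN_PERSON), ("Virtual-style weights", WEIGHTS_VIRTUAL)]:
    ranked = sorted(candidates, key=lambda c: composite(c, weights), reverse=True)
    print(label, "->", [f"{c.name}: {composite(c, weights):.2f}" for c in ranked])
```

With these made-up inputs, the interview standout tops the in-person-style weighting and the file-strong candidate tops the virtual-style weighting. When the interview signal compresses, the spreadsheet leans harder on everything else.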


8. What the data implies for how you should behave (without fluff)

I am not interested in giving you generic “be yourself” advice. The numbers tell you where the leverage is.

Leverage point #1: Over-structured behavioral answers

Virtual environment → increased weight on clarity and structure.

You should:

  • Use explicit signposting almost to the point of feeling stiff: “I will give one example of conflict resolution that shows how I listen, de-escalate, and follow through.”
  • Keep each example tightly bounded (60–90 seconds). Don’t rely on expressive delivery to keep them engaged – you won’t win that battle on Zoom.
  • Make your key takeaway sentence unmissable: “What I learned from this was the importance of…” Evaluators type those phrases into their comments.

Leverage point #2: Engineer the nonverbal channel you still control

You cannot change that video compresses your presence. You can optimize the variables that correlate with higher professionalism and warmth scores:

  • Stable framing: head and upper torso visible, eyes roughly aligned with camera.
  • Lighting: front-lit face, no strong backlight, neutral background.
  • Audio: clean external mic or quality headset, no echo.

I have seen side-by-side scoring on sample videos where the exact same answer, delivered by the same person, gets:

  • +0.3 to +0.5 points on “professionalism” and “communication” when the tech setup is strong vs weak.
  • Fewer negative written comments (“hard to hear,” “distracting background”).

You would be surprised how many “average” candidates become top-third simply by removing friction for the evaluator.

Leverage point #3: Compensate for reduced warmth with explicit relational behavior

Because warmth and “fit” signals are weaker on video, you must make some of that explicit:

  • Name the interviewer and program more: “Dr Smith, what you said about…” This small thing boosts relational perception.
  • On behavioral questions about teamwork, over-mention others by name/title: “the senior resident,” “the night nurse,” “the RT.” It shows social awareness that does not rely on nonverbal charm.
  • At least once per interview, explicitly articulate your alignment with the program’s culture, not just its features. “The emphasis you place on X fits how I work; for example…” That creates artificial “fit” data where organic impression is weaker.

9. Where this is heading: hybrid, data-heavy selection

Looking forward, the trajectory is clear: most specialties are not going back to 100% in-person across the board. The cost savings and access benefits of virtual are too strong.

The likely equilibrium:

  • Hybrid models (virtual screens + optional in-person second looks).
  • More structured behavioral question sets, scored on clearer rubrics.
  • Greater use of multiple short interviews (MMI style) to average out virtual noise.
  • Possibly, incorporation of asynchronous or recorded responses that can be re-scored if needed.

[Line chart] Projected Residency Interview Format Mix (2020–2027)

  • 2020: 90
  • 2021: 95
  • 2022: 80
  • 2023: 70
  • 2024: 65
  • 2025 (proj): 60
  • 2027 (proj): 55

Here I am treating “value” as percent of programs primarily virtual. Expect some rebound toward hybrid, but no full reversion to pre-2020.

For you, that means:

  • You must be able to perform behaviorally both on-screen and in-room.
  • You should treat virtual interviews as a distinct skill set, not a slightly worse version of in-person.

The data is blunt: the format you interview in changes how your behaviors are interpreted, scored, and weighted in the final ranking algorithm.

Your job is not to complain about that. Your job is to exploit it.


FAQ

1. Are virtual behavioral interviews “worse” for getting a high rank than in-person ones?
Not inherently. What the data shows is different: virtual formats compress variability and reduce the influence of nonverbal presence. If your strengths are polished structure, strong content, and meticulous preparation, virtual can even help you by minimizing the advantage of more naturally charismatic competitors. If your main edge is in-person charisma and room presence, virtual will blunt that.

2. Do programs trust virtual behavioral interviews as much as in-person when building rank lists?
Most programs say they do, but their own analytics usually show slightly weaker correlation between interview scores and final rank positions in fully virtual years. To compensate, many committees increase the weight of file data (scores, letters, clerkship grades, research) and sometimes add more interviewers or structured rubrics. So virtual interviews still matter a lot, but the “gut feeling from the day” drives less of the final list compared to in-person cycles.

3. Is there any evidence that virtual interviews change match outcomes by gender or race?
Preliminary data is mixed and not conclusive. Some corporate studies suggest modest reductions in certain appearance-based biases while introducing new environment/SES-related biases. Within residency, early analyses have not shown consistent large swings in match outcomes by gender or race strictly attributable to interview format, but sample sizes and controls are limited. The more robust finding is that which superficial cues drive bias changes between formats, not that bias vanishes.

4. How much does tech quality really affect behavioral evaluations?
Enough that you cannot ignore it. Studies that manipulate audio/video quality while holding content constant often find 0.3–0.5 point shifts on 5-point professionalism and communication scales. Evaluators write more negative comments when they struggle to hear or when the video is grainy or unstable. It is not about impressing them with fancy equipment; it is about reducing friction so your actual behavioral content can be processed cleanly.

5. If I have a choice, should I push for an in-person interview instead of virtual?
It depends on your profile. If you know from repeated feedback that your interpersonal presence is a major strength and you build rapport easily in person, in-person can amplify that and increase behavioral score variance in your favor. If you are more deliberate, structured, and slightly less naturally charismatic, virtual can actually level the playing field and let your content and preparation dominate. The rational approach is to treat both as different games: design your strategy for each format rather than assuming one is universally “better.”

With this data-grounded view of how virtual and in-person behavioral evaluations diverge, you are equipped to treat interview format as a strategic variable, not a background detail. The next step is to build a rehearsal plan that stress-tests you in both environments—and that, frankly, is where most applicants still lose ground. But that is a discussion for another day.
