
Basic Biostatistics for Student Researchers: Tests You Actually Use

December 31, 2025
18 minute read

[Image: Medical students learning biostatistics together]

Most student research projects fail not because the idea is bad, but because the statistics are an afterthought.

You do not need to be a statistician to do credible research. You do need to know a tight, practical set of tests cold—and when not to use them. That is what separates a “poster that looks fine” from a project faculty actually respect.

Let me break this down specifically.


1. The Five Questions That Choose Your Test

Every statistical test you will actually use in student research comes down to five questions. If you train yourself to answer these before you open SPSS, R, or Prism, most of the confusion disappears.

[Image: Whiteboard with a decision tree for choosing statistical tests]

Question 1: What is your outcome variable’s type?

Forget the fancy names; you need to decide what you are measuring as your main outcome.

You will most commonly see:

  1. Continuous (scale / numeric)

    • Has a real numeric scale, where differences and averages make sense.
    • Examples:
      • Systolic blood pressure (mmHg)
      • HbA1c (%)
      • Test score (0–100)
      • Length of stay (days)
  2. Binary (dichotomous)

    • Only two categories.
    • Examples:
      • Passed vs failed exam
      • Complication: yes/no
      • Smoker vs non-smoker
  3. Categorical (more than two levels)

    • No intrinsic order (nominal):
      • Blood type (A/B/AB/O)
      • Specialty choice (IM, surgery, pediatrics…)
    • Ordered (ordinal):
      • Likert scale (“strongly disagree” to “strongly agree”)
      • NYHA class I–IV
  4. Time-to-event

    • Time until something happens (or study ends).
    • Examples:
      • Time to relapse
      • Time to death
      • Time to hospital readmission

Question 2: How many groups are you comparing?

You need to know if you are:

  • Describing one group only
    (e.g., mean Step 1 score in your class)
  • Comparing two groups
    (e.g., students who did vs did not use Anki)
  • Comparing three or more groups
    (e.g., mean OSCE scores across three different sites)

Question 3: Independent or paired?

This is the single most common error in student projects.

  • Independent (unpaired) data

    • Different individuals in each group
    • Example: Comparing mean exam scores between MS1 and MS2
  • Paired (dependent) data

    • Same individuals measured twice, or matched pairs
    • Example: Pre- and post-bootcamp exam scores in the same students
    • Example: Case–control study where each case is matched to a control by age and sex

Paired data usually need paired tests. If you ignore this, your p-values will be misleading.

Question 4: What is your study design?

You do not need to name it for publication, but it determines what you can validly claim.

Common student designs:

  • Cross-sectional (single time point, e.g., survey)
  • Pre–post (before and after an intervention)
  • Cohort (follow a group over time)
  • Case–control (start with outcome, look back for exposures)
  • Randomized trial (rare in student work, but not impossible)

The design constrains your test choice—and even more, what kind of causal language you should avoid.

Question 5: Are the assumptions met?

Student researchers often ignore this completely. That is dangerous.

Key assumptions (for parametric tests):

  • Outcome is roughly normally distributed (for small samples)
  • Variances are similar between groups (for some tests)
  • Observations are independent (no clustering you ignore)

If your data break these assumptions badly, you use a non-parametric alternative.


2. The Core Tests: A Decision Map You Should Memorize

If you know these 10–12 tests very well, you can handle 90% of student research:

  • t-test (independent and paired)
  • ANOVA (one-way)
  • Chi-square test
  • Fisher’s exact test
  • Correlation (Pearson, Spearman)
  • Linear regression
  • Logistic regression
  • Mann–Whitney U
  • Wilcoxon signed-rank
  • Kruskal–Wallis

Here is how to pick among them based on outcome type and group structure.

[Image: Table of common statistical tests and when to use them]

2.1 Continuous outcome, two independent groups

Scenario: You are comparing mean values between two separate groups.

Examples:

  • Mean Step 1 score in students who did vs did not take a commercial course
  • Mean systolic BP in patients on Drug A vs Drug B

Parametric test (normal-ish data):
Independent samples t-test (a.k.a. two-sample t-test)

Assumptions (simplified):

  • Outcome is approximately normally distributed in each group (especially important if n < 30 per group)
  • Variances are similar between groups (most software can handle unequal variances; that gives Welch’s t-test)

Non-parametric alternative (skewed/small sample):
Mann–Whitney U test (also called Wilcoxon rank-sum test)

Key points for student projects:

  • Do not report “median” and then use a t-test. That combination is a red flag.
  • With very large sample sizes (hundreds per group), t-tests are robust even if data are a bit skewed.
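
If you work in Python rather than SPSS or Prism, a minimal sketch of both tests with scipy might look like this (the scores are made-up illustration data):

```python
# Independent-samples comparison of a continuous outcome between two groups.
# Illustrative made-up data: exam scores for course vs no-course students.
from scipy import stats

course = [231, 244, 228, 252, 239, 247, 235, 241]
no_course = [225, 238, 222, 230, 236, 228, 233, 229]

# Welch's t-test (equal_var=False) does not assume equal variances.
t_stat, p_t = stats.ttest_ind(course, no_course, equal_var=False)

# Non-parametric alternative for skewed data / small samples.
u_stat, p_u = stats.mannwhitneyu(course, no_course, alternative="two-sided")

print(f"Welch t-test: t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {p_u:.3f}")
```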

2.2 Continuous outcome, two paired measurements

Scenario: You measure the same individuals twice.

Examples:

  • Pre-course vs post-course exam scores in the same students
  • Baseline vs 3-month HbA1c in the same patients
  • Left vs right eye measurement in the same person

Parametric test:
Paired t-test

You are not comparing two separate groups; you are comparing within-person change.

Non-parametric alternative:
Wilcoxon signed-rank test

Common mistakes:

  • Using an independent t-test when you should use a paired t-test
    • Forgetting to check the distribution of the differences (the pre–post changes), rather than of the raw scores
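
A minimal Python sketch of the paired versions, again with made-up scores (the two lists must be in the same student order):

```python
# Paired comparison: same students measured before and after a bootcamp.
# Illustrative made-up data.
from scipy import stats

pre  = [62, 70, 55, 68, 74, 60, 66, 71]
post = [68, 75, 58, 72, 80, 63, 70, 78]

t_stat, p_t = stats.ttest_rel(pre, post)     # paired t-test
w_stat, p_w = stats.wilcoxon(pre, post)      # non-parametric alternative

# Check the distribution of the *differences*, not the raw scores.
diffs = [b - a for a, b in zip(pre, post)]
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_w:.3f}")
print("Differences:", diffs)
```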

2.3 Continuous outcome, ≥3 independent groups

Scenario: You have one continuous outcome and three or more unrelated groups.

Examples:

  • Mean OSCE score in students at three clinical sites
  • Average BMI across four ethnic groups
  • Mean hospital stay length across three treatment arms (non-randomized)

Parametric test:
One-way ANOVA

ANOVA tests if at least one group mean differs from the others. It does not tell you which groups differ. For that you need:

  • Post hoc tests (Tukey, Bonferroni, etc.)
  • Or pre-specified pairwise comparisons (with some correction for multiple testing)

Non-parametric alternative:
Kruskal–Wallis test

Typical student project pattern:

  • You have three or more specialties, clerkship sites, or years of training.
  • You use ANOVA if data are roughly normal and variances are similar.
  • If not, you fall back to Kruskal–Wallis and report medians and IQR.
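
A minimal scipy sketch of ANOVA and Kruskal–Wallis across three hypothetical sites, with made-up scores (the Tukey post hoc call assumes a reasonably recent scipy version):

```python
# One continuous outcome across three independent groups (clerkship sites).
# Illustrative made-up OSCE scores.
from scipy import stats

site_a = [81, 79, 85, 88, 76, 83]
site_b = [74, 78, 72, 80, 77, 75]
site_c = [86, 90, 84, 88, 91, 83]

f_stat, p_anova = stats.f_oneway(site_a, site_b, site_c)   # one-way ANOVA
h_stat, p_kw = stats.kruskal(site_a, site_b, site_c)       # non-parametric

print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")

# If the overall ANOVA is significant, a post hoc test (here Tukey HSD)
# tells you which pairs of sites differ; groups are indexed 0, 1, 2.
print(stats.tukey_hsd(site_a, site_b, site_c))
```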

2.4 Continuous outcome, ≥3 repeated measures

You will encounter this less often as a student, but it does appear in education interventions or small clinical cohorts.

Examples:

  • Pre, immediately post, and 3-month follow-up exam scores in same students
  • Baseline, 6-month, 12-month blood pressure in same patients

Technically:

  • Repeated measures ANOVA (parametric)
  • Friedman test (non-parametric)

For most simple student projects, it is acceptable to:

  • Focus on pre vs post (two timepoints) and use a paired t-test or Wilcoxon signed-rank
  • Clearly say in limitations that you simplified the analysis

If you have a serious education project with >3 time points, involve a statistician.
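
If you do want a quick non-parametric check across three repeated time points yourself, scipy's Friedman test is one option. A minimal sketch with made-up scores (the lists must be the same students in the same order):

```python
# Friedman test: non-parametric comparison of >=3 repeated measurements
# on the same individuals. Illustrative made-up exam scores.
from scipy import stats

pre       = [60, 72, 55, 68, 64, 70]
post      = [68, 78, 59, 74, 70, 77]
follow_up = [65, 75, 57, 71, 68, 74]

chi2, p = stats.friedmanchisquare(pre, post, follow_up)
print(f"Friedman test: chi2 = {chi2:.2f}, p = {p:.3f}")
```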


3. Categorical Outcomes: Chi-square, Fisher, and Friends

Most medical student projects involve categorical outcomes somewhere: complications yes/no, pass/fail, response categories, etc.

Two tests dominate here: Chi-square and Fisher’s exact.

3.1 Two categorical variables: Are they associated?

Scenario: Both variables are categorical. You want to know if there is an association.

Examples:

  • Smoking status (yes/no) and lung cancer (yes/no)
  • Gender (M/F) and specialty choice (IM/surgery/pediatrics/etc.)
  • Group (intervention/control) and exam pass/fail

Data are usually summarized in a contingency table.

2 × 2 tables (both variables binary)

Example:

                Outcome: Yes    Outcome: No
  Exposed            a               b
  Not exposed        c               d

Main test:
Chi-square test of independence

Interpretation:

  • p-value < 0.05 suggests exposure and outcome are not independent (association exists)

When to use Fisher’s exact test

If:

  • Expected cell counts are small (classic rule of thumb: any expected cell < 5; some are stricter)
  • Sample size is very small

Then you switch to:

Fisher’s exact test

Student mistake to avoid:

  • Running Chi-square on tiny tables (e.g., some cells have 0, 1, 2) instead of Fisher. Many reviewers will immediately question the validity.
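
A minimal scipy sketch of both tests on a made-up 2 × 2 table, including the expected-count check that helps you decide between them:

```python
# 2 x 2 table: rows = exposure, columns = outcome (yes, no).
# Illustrative made-up counts.
from scipy import stats

table = [[12, 38],    # exposed:     a, b
         [ 5, 45]]    # not exposed: c, d

chi2, p_chi, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)

print("Expected counts:\n", expected)   # if any are < 5, prefer Fisher
print(f"Chi-square: chi2 = {chi2:.2f}, p = {p_chi:.3f}")
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {p_fisher:.3f}")
```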

Larger tables (e.g., 2 × 3, 3 × 4)

Examples:

  • Race (3 categories) vs outcome (yes/no) → 3 × 2
  • Specialty (4 choices) vs gender (M/F) → 4 × 2

Still use:

Chi-square test

If many cells have very low expected counts, you may need to:

  • Collapse categories (e.g., combine smaller specialties) if it makes clinical sense
  • Or get more data

3.2 Ordinal data: Likert scales and ordered categories

Students love Likert scales. Journals are more skeptical.

Common issue: treating 1–5 Likert scores as continuous and using t-tests or ANOVA.

Options:

  • Treat as ordinal:
    • Compare distributions with Mann–Whitney U (two groups) or Kruskal–Wallis (≥3 groups)
  • Or, very cautiously, treat as approximately continuous:
    • Use t-tests / ANOVA if:
      • You have many respondents
      • The scores are not extremely skewed
      • You acknowledge this in limitations

Binary simplification is sometimes reasonable:

  • E.g., collapse “strongly agree” + “agree” vs the rest, then use Chi-square
  • But you lose detail; do not do this mechanically

4. Correlation and Regression: When You Want More Than a p-value

Describing “association” with a p-value alone is shallow. You should know how to think about:

  • Correlation: strength and direction of linear relationship
  • Regression: predicting an outcome via one or more predictors

[Image: Scatterplot showing correlation between two clinical measurements]

4.1 Correlation

Pearson correlation (r)

  • Both variables continuous
  • Relationship is approximately linear
  • No big outliers dominating the pattern

Interpretation:

  • r ranges from –1 to +1
  • Very rough benchmarks: |r| ≈ 0.1 is small, |r| ≈ 0.3 is moderate, |r| ≥ 0.5 is strong
  • r > 0 means as one increases, the other tends to increase
  • r < 0 means as one increases, the other tends to decrease

Example:

  • Correlation between number of practice questions done and NBME score.

Spearman correlation (ρ)

Use Spearman when:

  • Data are skewed
  • Relationship is monotonic but not linear
  • Variables are ordinal (e.g., ranks, Likert scores)

Think of Spearman as “correlation of ranks” rather than raw values.

Student trap: Reporting only p-value for correlation (“p < 0.001”) without the actual r. Always report r and p, and ideally a 95% CI.
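
A minimal scipy sketch of both correlations on made-up data:

```python
# Correlation between practice questions done and NBME score.
# Illustrative made-up data.
from scipy import stats

questions = [500, 1200, 800, 2000, 1500, 300, 1800, 1000]
nbme      = [215, 232, 224, 245, 238, 210, 241, 229]

r, p_pearson = stats.pearsonr(questions, nbme)    # linear relationship
rho, p_spear = stats.spearmanr(questions, nbme)   # rank-based, robust to skew

print(f"Pearson r = {r:.2f}, p = {p_pearson:.3f}")
print(f"Spearman rho = {rho:.2f}, p = {p_spear:.3f}")
```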

4.2 Linear regression

You move from “Are these associated?” to “How much does Y change when X changes?”

Simple linear regression

Outcome: continuous
Predictor: one (continuous or binary)

Example:

  • Outcome: NBME score (continuous)
  • Predictor: # of UWorld questions completed (continuous)

Model:
NBME = β₀ + β₁*(questions) + error

Interpretation of β₁:

  • Estimated change in NBME score for each additional unit of questions (often scaled, like per 100 questions)

Output to pay attention to:

  • β₁ (coefficient)
  • 95% CI for β₁
  • p-value for β₁
  • R²: proportion of variance in Y explained by the model

Multiple linear regression

You add more predictors:

  • Outcome: HbA1c
  • Predictors: age, BMI, medication adherence score

This lets you adjust for confounders. For a student project, even a small model with 2–3 predictors can dramatically strengthen your analysis.

Golden rule: You need enough events / sample size for the number of predictors. A rough rule for continuous outcomes is at least 10–15 observations per predictor, though this is context-dependent.
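
If you use Python, statsmodels handles both the simple and the adjusted model. A minimal sketch with made-up data and hypothetical column names:

```python
# Simple and multiple linear regression with statsmodels formulas.
# Illustrative made-up data; column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "nbme":      [215, 232, 224, 245, 238, 210, 241, 229, 236, 220],
    "questions": [500, 1200, 800, 2000, 1500, 300, 1800, 1000, 1600, 700],
    "gpa":       [3.4, 3.7, 3.5, 3.9, 3.8, 3.2, 3.8, 3.6, 3.7, 3.3],
})

# Simple linear regression: one predictor.
simple = smf.ols("nbme ~ questions", data=df).fit()

# Multiple linear regression: adjust for prior GPA.
adjusted = smf.ols("nbme ~ questions + gpa", data=df).fit()

# Coefficients, 95% CIs, p-values, and R-squared.
print(adjusted.params)        # beta estimates
print(adjusted.conf_int())    # 95% confidence intervals
print(adjusted.pvalues)
print(f"R-squared: {adjusted.rsquared:.2f}")
```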


5. Logistic Regression: When the Outcome Is Yes/No

If your outcome is binary, linear regression is inappropriate. That is where logistic regression comes in.

5.1 Basic idea

Outcome: binary (0/1, yes/no)
Predictor(s): continuous, categorical, or both

Example student question:

  • Does using a spaced repetition app predict passing Step 1 on first attempt?

Outcome: pass (yes/no)
Predictors: app use (yes/no), MCAT score, undergrad GPA

Logistic regression estimates odds ratios (ORs).

5.2 Interpreting logistic regression output

Key pieces:

  • Odds ratio (OR)
  • 95% confidence interval for OR
  • p-value

Example:

  • OR for app use = 2.3, 95% CI 1.1–4.8, p = 0.03

Interpretation:

  • Students using the app had 2.3 times the odds of passing Step 1 on first attempt compared to those not using it, adjusting for MCAT and GPA.
  • The entire 95% CI lies above 1, so the association is statistically significant at α = 0.05.

Common misinterpretations:

  • OR is not exactly the same as risk ratio, especially when outcome is common.
  • p = 0.049 vs p = 0.051 is not a hard border between “truth” and “no effect”.

Student tip: For simple projects, keep your models small and conceptually justified. Do not throw in every variable you measured “to see what sticks”.
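
A minimal statsmodels sketch with made-up data and hypothetical column names; the key step is exponentiating the coefficients to turn them into odds ratios:

```python
# Logistic regression for a binary outcome (pass / fail).
# Illustrative made-up data; column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "passed":  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
    "app_use": [1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1],
    "mcat":    [512, 515, 505, 518, 499, 510, 503, 514,
                500, 508, 511, 517, 509, 506, 502, 516],
})

model = smf.logit("passed ~ app_use + mcat", data=df).fit()

# Exponentiate coefficients to get odds ratios and their 95% CIs.
print(np.exp(model.params))       # odds ratios
print(np.exp(model.conf_int()))   # 95% CIs for the odds ratios
print(model.pvalues)
```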


6. Survival Analysis: The One You Will Mostly Recognize, Not Run

Many premeds and early medical students read, but rarely perform, survival analyses. Still, you should recognize the basics.

6.1 Time-to-event data

Outcome: time until something happens (or you stop observing)

Key features:

  • Censoring: patient did not have the event by end of follow-up
  • Unequal follow-up times across patients

Two main tools:

  1. Kaplan–Meier curves

    • Plot of survival probability over time for one or more groups
    • Often compared with the log-rank test
  2. Cox proportional hazards model

    • Regression model for time-to-event
    • Gives hazard ratios (HRs)

Example interpretation:

  • HR = 0.70 (95% CI 0.55–0.89; p = 0.003) means a 30% relative reduction in the hazard of the event in the treatment group vs control.

You probably won’t run Cox models alone as a student, but you should be able to read and explain them in journal club.
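
If you are ever curious, the third-party lifelines package (which you would need to install separately) covers all three tools in Python. A minimal sketch with made-up follow-up times and hypothetical column names:

```python
# Minimal survival-analysis sketch using the third-party lifelines package
# (pip install lifelines). Illustrative made-up data; column names hypothetical.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 15, 7, 11, 6, 14],   # months of follow-up
    "event": [1, 1, 0, 1, 0, 0, 1, 1, 0, 1],       # 1 = event, 0 = censored
    "group": ["treat", "treat", "treat", "control", "control",
              "treat", "control", "control", "treat", "control"],
})
treat = df[df["group"] == "treat"]
control = df[df["group"] == "control"]

# Kaplan-Meier estimate for one group.
kmf = KaplanMeierFitter()
kmf.fit(treat["time"], event_observed=treat["event"], label="treatment")
print(kmf.survival_function_)

# Log-rank test comparing the two groups.
lr = logrank_test(treat["time"], control["time"],
                  event_observed_A=treat["event"],
                  event_observed_B=control["event"])
print(f"Log-rank p = {lr.p_value:.3f}")

# Cox model: hazard ratios come from exponentiated coefficients.
cph = CoxPHFitter()
cph.fit(df.assign(treat=(df["group"] == "treat").astype(int)).drop(columns="group"),
        duration_col="time", event_col="event")
cph.print_summary()   # includes exp(coef) = hazard ratio and its 95% CI
```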


7. Effect Size, Confidence Intervals, and p-values: How Not to Embarrass Yourself

Most student posters are p-value heavy and effect-size blind. That is backwards.

7.1 p-values: what they are, and what they are not

  • A p-value is the probability of observing your data (or more extreme) if the null hypothesis is exactly true.
  • It is not:
    • The probability the null is true
    • The probability your results are due to chance
    • A measure of clinical importance

Binary thinking (“significant” vs “not significant”) is crude, especially for small student samples.

7.2 Effect sizes

Always ask: How big is the difference?

For continuous outcomes:

  • Mean difference (e.g., 5-point difference in test scores)
  • Standardized effect size (Cohen’s d), though often not essential in early projects

For binary outcomes:

  • Risk difference (absolute difference in risk)
  • Risk ratio
  • Odds ratio

Effect sizes let you talk about clinical or educational relevance, not just statistical significance.

7.3 Confidence intervals

You should report 95% CIs whenever possible.

Why they matter:

  • Narrow CI → precise estimate
  • Wide CI → lots of uncertainty

Interpretation example:

  • Mean difference = 3.2 points (95% CI 0.1 to 6.3), p = 0.04

This suggests:

  • The improvement is statistically significant
  • The true effect might be as small as 0.1 or as large as 6.3 points
  • You can discuss whether that range is educationally meaningful

Contrast with:

  • OR = 2.0 (95% CI 0.9–4.5), p = 0.09

The point estimate suggests a doubling of the odds, but the CI crosses 1 and is wide. With a modest sample size, you have a lot of uncertainty. You should not overclaim.
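
If you want to see where such an interval comes from, here is a minimal sketch computing a mean difference and its Welch-style 95% CI by hand on made-up scores; in practice your software reports this for you:

```python
# Mean difference between two groups with a Welch-style 95% CI.
# Illustrative made-up scores.
import numpy as np
from scipy import stats

group_a = np.array([82, 85, 79, 88, 84, 81, 86, 83])
group_b = np.array([78, 80, 75, 83, 79, 77, 82, 76])

diff = group_a.mean() - group_b.mean()
va = group_a.var(ddof=1) / len(group_a)
vb = group_b.var(ddof=1) / len(group_b)
se = np.sqrt(va + vb)

# Welch-Satterthwaite degrees of freedom.
dof = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))

t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print(f"Mean difference = {diff:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
```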


8. Sample Size, Power, and Common Student Pitfalls

Many student projects are underpowered. You cannot fix this with “better statistics” afterward.

[Image: Medical student calculating sample size for a research project]

8.1 Basic power concepts

Four things are intertwined:

  • Effect size (how big a difference you care about)
  • Sample size (n)
  • Significance level (α, usually 0.05)
  • Power (probability of detecting a true effect, often set at 0.8)

If you keep α and power fixed, then:

  • Smaller effect size → need larger sample
  • Larger effect size → can detect with smaller sample

As a student, the key practical takeaway:

  • If your sample is tiny, treat non-significant p-values as “we cannot be sure” rather than “there is no difference”.
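
To get a feel for these trade-offs, statsmodels has a simple power calculator. A minimal sketch, assuming a moderate standardized effect size of 0.5:

```python
# Sample size needed per group for an independent t-test,
# using statsmodels' power calculator.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Effect size is standardized (Cohen's d). Assuming a moderate effect of 0.5,
# alpha = 0.05, and 80% power:
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"About {n_per_group:.0f} participants per group")   # roughly 64

# Flip it around: with only 20 per group, what power do you have?
power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"Power with n = 20 per group: {power:.2f}")          # roughly 0.34
```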

8.2 Typical student mistakes

  1. Testing everything that moves (multiple comparisons)

    • You have 30 survey items and 10 subgroups.
    • You run 300 p-values.
    • A handful show p < 0.05 by chance alone.
    • Strategy: pre-specify primary outcomes and main comparisons. Label others as exploratory.
  2. Using the wrong test for paired data

    • Pre–post survey scores analyzed with independent t-tests.
    • Fix: Recognize repeated measures and use paired tests.
  3. Ignoring distribution

    • Using t-tests on highly skewed data with small n.
    • Always look at a histogram or at least median vs mean.
  4. Confusing correlation with causation

    • “Using Anki causes higher scores.”
    • Your cross-sectional study cannot prove causality.
  5. Fishing for significance

    • Switching tests repeatedly until p < 0.05 appears.
    • Intellectually lazy and scientifically dishonest.

9. How to Present Your Stats in a Way Faculty Respect

Good analysis means little unless you can present it clearly.

9.1 Descriptive statistics

Before any inferential tests, describe your sample:

For continuous variables:

  • If approximately normal: mean ± SD
  • If skewed: median (IQR)

For categorical variables:

  • n (percentage)

Example:

  • Age: 24.3 ± 1.8 years
  • Step 1 score: median 231 (IQR 225–239)
  • Female: 38/62 (61%)
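
A minimal pandas sketch that produces exactly these kinds of summaries from a made-up data frame with hypothetical column names:

```python
# Descriptive statistics with pandas: mean +/- SD for roughly normal variables,
# median (IQR) for skewed ones, n (%) for categorical ones.
# Illustrative made-up data; column names are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age":   [23, 24, 25, 24, 26, 23, 27, 24, 25, 26],
    "step1": [218, 231, 225, 252, 239, 228, 244, 233, 221, 247],
    "sex":   ["F", "F", "M", "F", "M", "F", "M", "F", "F", "M"],
})

print(f"Age: {df['age'].mean():.1f} ± {df['age'].std():.1f} years")

q1, q3 = df["step1"].quantile([0.25, 0.75])
print(f"Step 1: median {df['step1'].median():.0f} (IQR {q1:.0f}–{q3:.0f})")

counts = df["sex"].value_counts()
print(f"Female: {counts['F']}/{len(df)} ({100 * counts['F'] / len(df):.0f}%)")
```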

9.2 Reporting tests

Link the test to the structure of the data in your methods.

Example methods phrasing:

  • “Continuous variables were compared using independent-samples t-tests or Mann–Whitney U tests where appropriate.”
  • “Categorical variables were compared using Chi-square or Fisher’s exact tests.”
  • “Pearson correlation was used to assess linear association between practice questions completed and NBME scores.”
  • “A multivariable logistic regression model was constructed to identify predictors of Step 1 passing on the first attempt.”

In the results, do not just give p-values.

Example:

Mean OSCE scores were higher among students who completed the simulation module (82.4 ± 6.1) compared with those who did not (78.2 ± 7.3; mean difference 4.2, 95% CI 1.3–7.1; p = 0.006; independent-samples t-test).

For a 2 × 2 table:

Complication rates did not differ significantly between laparoscopic and open approaches (5/60 [8.3%] vs 9/58 [15.5%]; OR 0.50, 95% CI 0.16–1.49; p = 0.21; Fisher’s exact test).

Notice the pattern:

  • Exact numbers
  • Effect size (difference or OR)
  • 95% CI
  • p-value
  • Test name

If you use regression:

On multivariable linear regression, each additional 100 practice questions completed was associated with a 1.8-point higher NBME score (β = 1.8, 95% CI 0.9–2.7; p < 0.001), adjusting for prior GPA.

This is the level of clarity that makes a faculty mentor comfortable putting their name on your work.


10. A Minimal Mental Checklist Before You Choose Any Test

When you sit down with your dataset, run through this:

  1. What is my primary outcome?

    • Type: continuous, binary, categorical, time-to-event?
  2. What is my main comparison or question?

    • How many groups? Independent or paired?
  3. What is my sample size in each group?

    • Very small n may push you toward non-parametric or exact tests.
  4. What does the distribution look like?

    • Quick histogram or summary: mean vs median, SD vs IQR.
  5. Which simple test from the core set fits this structure?

    • t-test / Mann–Whitney
    • Paired t / Wilcoxon signed-rank
    • ANOVA / Kruskal–Wallis
    • Chi-square / Fisher
    • Correlation
    • Linear or logistic regression
  6. Can I clearly justify that choice in one sentence in my Methods section?

    • If not, simplify or ask for help.

Key Takeaways

  • Most student research needs only a small toolkit: t-tests, ANOVA, Chi-square/Fisher, correlation, and basic regression. Learn these deeply rather than superficially sampling everything.
  • Always start from your outcome type and group structure, then check assumptions and choose the simplest valid test. Do not let software menus drive your decisions.
  • Report effect sizes and confidence intervals alongside p-values, and be conservative in making causal claims; good judgment about interpretation matters more than fancy statistics.
