Residency Advisor

Maximizing AAMC Data Tools for Pre‑Med and MS1 Research Projects

December 31, 2025
18 minute read

[Image: Medical students collaborating on research using data dashboards]

The most underused “database” in pre‑medical and MS1 research is hiding in plain sight on the AAMC website.

Most students chase lab bench projects or chart reviews while ignoring something infinitely scalable, IRB‑friendly, and ripe for publication: national‑level AAMC data tools. Used correctly, these tools can power serious, methodologically sound projects for student organizations, pre‑health advising offices, and early‑career academic portfolios.

Let me break this down specifically.


Why AAMC Data Tools Are a Goldmine for Early Trainees

Pre‑meds and MS1s usually assume “real research” means pipettes or Epic screenshots. That bias creates a massive blind spot:

AAMC has already done much of the heavy lifting: collecting, cleaning, and standardizing huge datasets. Your job is not to generate the raw data; it is to ask sharp questions and analyze what is already there.

For student organizations—premed clubs, SNMA, LMSA, AMSA chapters, specialty interest groups—these tools can:

  • Power structured research projects that produce posters and abstracts
  • Justify funding or programming decisions with data instead of anecdotes
  • Build collaborative “research pipelines” for new members each year
  • Allow leadership to publish education or workforce papers with faculty partners

The barrier is not access. It is knowing exactly which tools exist, what questions they can answer, and how to convert web tables into legitimate research designs.


Core AAMC Data Tools You Should Actually Be Using

There are many AAMC resources, but only a subset are truly high‑yield for pre‑med and MS1 research projects. I will separate them into three categories: admissions and applicant data, workforce and specialty data, and curriculum/education data.

1. Admissions and Applicant Data: MSAR, FACTS, and Beyond

These are the workhorses for pre‑med research.

A. AAMC FACTS Tables

Location: AAMC → Data & Reports → “FACTS: Applicants, Matriculants, Enrollment, Graduates”

Key tables:

  • Table A‑1: U.S. medical school applicants, first‑time applicants, acceptees by sex
  • Table A‑3/A‑4: Applicants and matriculants by race/ethnicity and sex
  • Table A‑7: Matriculants by state of legal residence and sex
  • Table A‑16/A‑17: MCAT and GPA distributions for applicants and matriculants

What you can realistically study as a student:

  • 10‑year trends in MCAT and GPA thresholds nationally
  • Shifts in racial/ethnic representation among applicants and matriculants
  • State‑level differences in acceptance rates (applicants vs matriculants)
  • Sex differences in application patterns over time

Example project (pre‑med club):

“Ten‑Year Trends in MCAT and GPA for U.S. Medical School Matriculants: Implications for Advising at a Large Public University”

Steps:

  1. Pull A‑16 and A‑17 for the last 10 available years.
  2. Extract mean, median, and percentile distributions of MCAT and GPA.
  3. Use Excel or R to visualize trends.
  4. Compare national medians to your institution’s accepted students (if your advising office shares de‑identified stats).
  5. Present at your school’s undergraduate research day or a regional advising conference (e.g., NACADA).
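Steps 2 and 3 can be sketched in a few lines of Python. The numbers below are placeholders, not real FACTS values; in practice you would transcribe them from tables A‑16/A‑17:

```python
import numpy as np

# Hypothetical matriculant mean MCAT scores transcribed from FACTS
# tables A-16/A-17 (illustrative values only, not real AAMC figures).
years = np.arange(2013, 2023)
mcat_means = np.array([505.8, 506.0, 506.4, 506.9, 507.1,
                       507.3, 507.6, 508.0, 508.3, 508.5])

# Fit a simple linear trend: the slope is the average change per year.
slope, intercept = np.polyfit(years, mcat_means, 1)
print(f"Mean MCAT trend: {slope:+.2f} points/year")

# Year-over-year change highlights any single-year jumps.
yoy = np.diff(mcat_means)
print("Year-over-year changes:", np.round(yoy, 2))
```

The same pattern works for GPA, and the slope and year-over-year table translate directly into a trend figure for your poster.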

B. MSAR (Medical School Admission Requirements)

MSAR is less “research dataset” and more “school comparison utility”, but it can still support structured studies.

You cannot easily scrape MSAR in bulk due to access and format constraints, but focused projects are possible:

  • Comparing mission statements and demographics for schools with strong primary care vs research emphases
  • Evaluating geographic clustering of matriculants versus applicant pools
  • Looking at correlation between class size and diversity metrics (if available)

Caution: MSAR data is often aggregated and not ideal for heavy statistical modeling. Use it for descriptive and exploratory work rather than complex inferential analyses.

C. AAMC Application and Acceptance Trends Reports

The AAMC periodically publishes summary PDFs on:

  • Application surges (e.g., during COVID‑19)
  • Demographic changes in applicant pools
  • Impact of policy changes (holistic review, test‑optional experiments elsewhere)

These can support narrative framing and literature reviews for your empirical work with FACTS tables.


2. Workforce and Specialty Data: Careers in Medicine, Specialty Reports, and Physician Workforce

Once students enter MS1, research questions naturally expand beyond admissions. This is where AAMC workforce tools become powerful.

A. AAMC Workforce Data and Reports

Location: AAMC → Data & Reports → “Physician Workforce”

Key resources:

  • Physician Specialty Data Reports
  • State Physician Workforce Data Reports
  • The “Complexities of Physician Supply and Demand” projections

What you can study as an MS1 (especially through student organizations):

  • Maldistribution of physicians by specialty and region
  • Associations between state‑level physician density and medical school distribution
  • Trends in gender diversification in certain specialties
  • Age distribution of physicians by specialty and implications for future shortages

Example project (specialty interest group):

“Regional Distribution of Primary Care Physicians and Medical Schools in the Midwest: Are Training Sites Aligning with Workforce Need?”

Design:

  1. Use State Physician Workforce Data to pull:
    • Total physicians per 100,000 population
    • Primary care physicians per 100,000
  2. Compile a list of LCME‑accredited schools in those states from AAMC or LCME websites.
  3. Examine whether states with fewer primary care physicians have fewer MD‑granting schools.
  4. Discuss pipeline strategies (branch campuses, rural tracks, etc.).

You are not doing individual‑level analysis here. You are working with aggregate, state‑level data—perfectly suitable for student projects.
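The comparison in steps 1–3 reduces to standard per‑100,000 rate calculations. The counts below are hypothetical, for illustration only:

```python
# Hypothetical raw counts (illustrative only -- not real AAMC workforce data).
state_data = {
    # state: (primary care physicians, population, MD-granting schools)
    "Ohio":    (10900, 11800000, 6),
    "Indiana": (5300,  6800000,  1),
    "Kansas":  (2400,  2900000,  1),
}

for state, (pcps, pop, schools) in state_data.items():
    rate = pcps / pop * 100_000   # standard per-100k rate calculation
    print(f"{state}: {rate:.1f} PCPs per 100k, {schools} MD school(s)")
```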

B. Specialty Data Reports

These reports typically include:

  • Number of active physicians by specialty
  • Age, sex, and sometimes race/ethnicity distributions
  • IMG vs US‑trained proportions

Projects:

  • Gender shifts in traditionally male‑dominated specialties (e.g., orthopedic surgery) over two decades.
  • Racial/ethnic representation gaps in competitive vs less competitive specialties.
  • Comparing diversity in specialties to diversity in the applicant pool (linking workforce data with FACTS tables).

These are especially well‑suited to student groups grounded in diversity and inclusion (SNMA, LMSA, APAMSA) or specialty interest groups concerned with workforce equity.


3. Curriculum and Education Data: GQ, Y2Q, and Curriculum Inventory

These tools are less accessible publicly but still usable with the right collaboration.

A. Graduation Questionnaire (GQ) and Year 2 Questionnaire (Y2Q)

The GQ and Y2Q are survey instruments administered to medical students by the AAMC. Publicly available summaries cover topics like:

  • Student satisfaction with curriculum
  • Mistreatment reports
  • Career intention shifts
  • Debt burden and its perceived impact

While raw data typically stays at the institutional level, national aggregate reports are public.

Possible MS1 projects (often through curriculum committees or academic medicine interest groups):

  • Changes in reported mistreatment over time and across regions.
  • Association between perceived preparedness for residency and debt levels (using aggregate categories).
  • Trends in career intentions (e.g., interest in primary care) between Y2Q and GQ.

These projects become much stronger if you can:

  1. Obtain local, de‑identified GQ/Y2Q results from your school.
  2. Compare them to national aggregates.
  3. Frame your work as “Institution X vs National Benchmarks: A Five‑Year Comparison”.

This type of work is highly publishable in medical education journals when done with proper oversight.

B. Curriculum Inventory

Curriculum Inventory (CI) is more complex. It is a structured database of curricular elements across LCME schools. Access is controlled and typically limited to curriculum deans, but:

  • You can sometimes work with a faculty mentor who has access.
  • CI allows comparative analyses of:
    • Hours devoted to specific content areas (e.g., health policy, pharmacology, DEI training)
    • Use of certain teaching modalities (TBL, PBL, simulation)
    • Vertical integration of specific topics

For first‑year students interested in academic medicine or education research, a CI‑based project—if supported by a faculty mentor—can be a very strong early publication.


[Image: Medical student analyzing national medical education datasets]

Building Legitimate Research Projects from AAMC Tools

Accessing data is the easy part. Designing a project that survives scrutiny from a conference abstract reviewer is where most student efforts fall apart.

Let us walk through how to structure serious projects from these tools.

Step 1: Formulate a Narrow, Answerable Question

Bad student question:

  • “How has diversity in medicine changed over time?”

Better, precise questions tied to specific AAMC tools:

  • “How has the proportion of Black/African American matriculants to U.S. MD‑granting schools changed from 2012–2022 by state of legal residence?” (FACTS A‑3, A‑7)
  • “What trends in average matriculant MCAT score have occurred over the last decade among applicants from low‑income backgrounds?” (if using AAMC SES‑linked reports)
  • “How has the percentage of women in interventional cardiology compared to general internal medicine over the last 15 years?” (Specialty Data Reports)

Your question should explicitly name:

  1. Population (who/what),
  2. Time frame,
  3. Data source (which AAMC reports),
  4. Outcome(s) or variable(s) of interest.

Step 2: Map Your Question to Specific Data Tables

Do this before you promise anything to your student organization or mentor.

Example mapping:

  • Question: “State‑level disparities in acceptance rates.”
  • Tools:
    • FACTS Table A‑7 (matriculants by state)
    • Additional AAMC table on applicants by state (varies by year; sometimes part of special reports)
  • Derived variable:
    • Acceptance rate ≈ matriculants / applicants (approximation, acknowledging limitations)
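The derived acceptance‑rate variable can be computed directly once applicant and matriculant counts are transcribed. The counts here are hypothetical, and the caveat in the comment matters for your limitations section:

```python
# Approximate state-level "acceptance rate" as matriculants / applicants.
# Caveat: matriculants != acceptees (some accepted applicants do not attend),
# so this understates the true acceptance rate. Counts below are hypothetical.
state_counts = {
    # state: (applicants, matriculants)
    "Texas":      (5200, 1750),
    "California": (7900, 1350),
    "Ohio":       (2100, 780),
}

for state, (applicants, matriculants) in state_counts.items():
    rate = matriculants / applicants
    print(f"{state}: {rate:.1%} of applicants matriculated")
```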

If you find that AAMC does not publish the level of granularity you need (for example, MCAT distributions by race/ethnicity and state), you must either:

  • Narrow your question, or
  • Switch to a different, feasible angle.

Step 3: Design the Study Type Explicitly

Most student projects using AAMC tools are:

  • Descriptive cross‑sectional (single time point, multiple groups)
  • Trend analyses (repeated cross‑sectional over multiple years)
  • Ecological studies (state‑ or school‑level aggregates)

State this in your methods. For example:

  • “We conducted a retrospective, descriptive trend analysis using publicly available AAMC FACTS tables from 2012–2022.”

You are not doing individual‑level cohort studies here. Recognize the level of your data and do not overstate causal implications.

Step 4: Create a Data Extraction Plan

This is where many pre‑med/MS1 projects become messy.

Concrete approach:

  1. Define which years you will include (e.g., 2005–2022).
  2. Decide your unit of analysis (e.g., national, state, or specialty).
  3. Identify all needed variables from each table (column names and categories).
  4. Standardize your categories. AAMC sometimes changes:
    • Race/ethnicity labels
    • MCAT score scales (pre‑2015 vs post‑2015)
    • Specialty names or groupings

Strategy for consistency:

  • For race/ethnicity, you might group into broad categories that are consistent over time (e.g., White, Black/African American, Hispanic/Latino, Asian, Other).
  • For MCAT, if your window straddles 2015, restrict to one test version or use AAMC concordance tables, and interpret any cross‑scale trends cautiously.

Use a spreadsheet template to track:

  • Year
  • Table name/number
  • Variables copied
  • Any transformations or recoding

This documentation becomes part of your methods and prevents confusion when someone else joins the project.
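The extraction log can live in a plain CSV that the whole team shares. A minimal sketch (the column names are a suggested template, not an AAMC convention):

```python
import csv
import io

# A minimal extraction log: one row per table pulled. Column names are a
# suggested template, not an AAMC standard; adapt them to your project.
log_rows = [
    {"year": 2022, "table": "A-16", "variables": "MCAT mean; GPA mean",
     "recoding": "none"},
    {"year": 2014, "table": "A-16", "variables": "MCAT mean (old scale)",
     "recoding": "flagged pre-2015 MCAT scale"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["year", "table", "variables", "recoding"])
writer.writeheader()
writer.writerows(log_rows)
print(buf.getvalue())
```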

Step 5: Choose Appropriate Analysis Methods

You do not need advanced statistics to produce meaningful work, but you must avoid basic errors.

Common, defensible analyses:

  • Percentages and proportions
  • Rate calculations per 100,000 (for workforce data)
  • Year‑over‑year percent change
  • Simple linear trend lines (with slope and p‑value) across years
  • Group comparisons using:
    • Chi‑square tests (proportions)
    • t‑tests or ANOVAs (if means are provided, which is less common in these summaries)

Software:

  • Pre‑med students: Excel or Google Sheets for descriptive work, plus SPSS if your institution provides a license.
  • MS1s: R or Python for more complex projects. R is particularly friendly for reproducible workflows in education research.

Key rule: Do not overinterpret correlations from ecological data as individual‑level effects. For example, “states with more medical schools have more physicians per capita” does not mean that each new school caused the difference; establishing causation would require far more sophisticated modeling.
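The trend‑line and group‑comparison analyses listed above need nothing beyond NumPy. A minimal sketch with hypothetical counts (SciPy's `linregress` and `chi2_contingency`, if available, would also give you p‑values):

```python
import numpy as np

# Trend: proportion of women among active physicians in a specialty
# (hypothetical values -- not real Specialty Data Report figures).
years = np.arange(2010, 2020)
prop_women = np.array([0.06, 0.065, 0.07, 0.072, 0.078,
                       0.081, 0.085, 0.09, 0.094, 0.10])
slope, _ = np.polyfit(years, prop_women, 1)
print(f"Trend: {slope * 100:+.2f} percentage points per year")

# Group comparison: chi-square statistic for a 2x2 table of counts
# (specialty A vs B, women vs men). Counts are hypothetical.
a, b = 300, 2700   # specialty A: women, men
c, d = 900, 2100   # specialty B: women, men
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"Chi-square statistic: {chi2:.1f}")
```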


[Image: Student organization meeting discussing research using national medical data]

Turning Data into Student Organization Power Projects

If you are working through a student organization, the strategy changes. Your goal is not just an individual line on your CV; it is creating reproducible research pipelines that outlive your own class year.

Here is how to structure that intelligently.

1. Choose Themes that Match Your Organization’s Mission

Examples:

  • Premed club at a large state university:

    • Focus: how applicants from your institution compare to national trends.
    • Projects:
      • “Acceptance rate trends from [Your University] vs national averages.”
      • “First‑generation college student outcomes: institutional vs national AAMC data comparison” (if advising office data is available).
  • SNMA/LMSA/APAMSA chapter:

    • Focus: diversity, equity, and inclusion in admissions and workforce.
    • Projects:
      • “Ten‑year trend in Latino matriculants by U.S. Census region.”
      • “Black physician workforce representation by specialty vs U.S. population.”
  • Specialty interest group (e.g., Family Medicine, Psychiatry, Surgery):

    • Focus: pipeline and workforce.
    • Projects:
      • “Pipeline to Primary Care: Trends in medical students’ specialty intentions vs actual residency match outcomes” (using AAMC plus NRMP data).
      • “Women’s representation in surgical specialties across states with and without large academic centers.”
  • Academic medicine or medical education interest group:

    • Focus: curriculum and learner experience.
    • Projects:
      • “Changes in reported burnout indicators among graduating medical students nationally (GQ trends).”
      • “Integration of health systems science in U.S. medical curricula: a curriculum inventory analysis” (with faculty access).

2. Build a Multi‑Year Project Structure

Single‑year projects waste momentum. Use AAMC tools to build living projects.

Example structure:

  • Year 1: Define topic, extract initial 5–10 years of data, present a poster locally.
  • Year 2: Expand time frame, refine analysis, submit to a regional or national conference (e.g., AAMC Learn Serve Lead, regional SGIM, or specialty meetings).
  • Year 3+: Collaborate with faculty mentor to write a manuscript, incorporate new years of data as they are released.

Assign roles:

  • Data team: handles extraction and cleaning.
  • Analysis team: performs statistical work and generates figures.
  • Writing team: abstracts, posters, manuscripts.
  • Succession team: recruits juniors, documents workflows, stores templates.

Store everything in a shared, version‑controlled environment (even simple: shared drive + carefully named spreadsheet versions). Write a two‑page “project operations manual” for the next cohort.

3. Engage Faculty and Offices Strategically

AAMC tools alone are powerful, but pairing them with institutional data multiplies value.

Targets for collaboration:

  • Premed advising office (for undergraduate clubs)
  • Office of Medical Education or Student Affairs (for MS1 organizations)
  • Diversity, Equity, and Inclusion office
  • Department of Family Medicine or Internal Medicine (for workforce‑focused projects)

Offer something specific:

  • “We want to compare our institution’s applicant data with AAMC FACTS to inform advising.”
  • “We would like to benchmark our graduates’ reported burnout and mistreatment vs national GQ trends.”

Bring a concise, 1‑page project outline that specifies:

  • Research question
  • AAMC data tools involved
  • What you need from the office (e.g., de‑identified counts, percentages)
  • Output goals (poster, abstract, policy brief)

This seriousness changes how faculty view student organizations: from event planners to collaborators.

4. Ethics and IRB Considerations

Public AAMC aggregate data generally does not require IRB review when used purely descriptively. Complications arise when you:

  • Combine AAMC data with institutional data about identifiable individuals.
  • Access non‑public institutional GQ or Y2Q data.
  • Conduct surveys or interventions yourself based on the insights.

Strategy:

  • Always ask your faculty mentor or IRB office whether your planned work constitutes “research with human subjects” under your institution’s policy.
  • Many projects using de‑identified, aggregate data end up “IRB exempt” or “not human subjects research,” but you must not self‑assign that label without institutional confirmation.

When in doubt, draft a short IRB query letter describing:

  • Data sources (public AAMC aggregate vs internal de‑identified)
  • Unit of analysis (institution level, no individual identifiers)
  • Planned outputs

Attach the IRB response (even if it says “exempt” or “not regulated”) to your project files for documentation.


Common Mistakes Students Make with AAMC Data (and How to Avoid Them)

Let me be very direct here.

Mistake 1: Treating Correlation as Causation

Example error:

  • “The data show that increasing MCAT scores caused the drop in diversity.”

Your data likely can show:

  • MCAT scores increased.
  • Diversity decreased or did not increase proportionally.
  • Both occurred during the same interval.

You cannot prove causation from aggregate, observational AAMC data alone. Phrase appropriately:

  • “We observed concomitant increases in MCAT scores and persistently lower representation of [group] among matriculants.”
  • “These trends are consistent with prior literature suggesting standardized tests may disproportionately impact [group].”

Mistake 2: Ignoring Category Changes Over Time

AAMC sometimes revises:

  • Race/ethnicity classifications
  • Sex/gender categories
  • Socioeconomic status (SES) definitions

Always inspect the footnotes of each table. Document any changes and, if needed, collapse categories into broader, stable groups so that comparisons over time are legitimate.

Mistake 3: Overcomplicating the First Project

You do not need hierarchical mixed models to produce meaningful work as a pre‑med or MS1. A clean, clear, accurate 10‑year trend analysis, with well‑designed graphs and thoughtful discussion, is far more impressive than a flawed attempt at highly advanced statistics.

Start with:

  • Solid data cleaning
  • Transparent methods
  • Well‑labeled figures
  • Sensible interpretation

Then level up to more complex modeling if and when you have mentorship.

Mistake 4: Failing to Differentiate Applicant vs Matriculant Data

When working with FACTS tables:

  • Applicants reflect who applied.
  • Matriculants reflect who was accepted and chose to attend.

Do not conflate the two when making claims about “access” or “selection.” You can compare them, and that comparison is often central to your research question, but you must always specify which you are describing.


Concrete Project Blueprints You Can Use Immediately

To make this actionable, here are three fully specified project concepts tailored to student organizations.

Blueprint 1: Premed Club – “Home Institution vs National Benchmarks”

Question:

  • How do medical school acceptance outcomes from [Your University] compare to national AAMC data by GPA, MCAT, and demographic characteristics?

Data:

  • AAMC FACTS: A‑16, A‑17, A‑3, A‑7
  • Institutional advising office: de‑identified data on applicants from your university over the last 5–10 years

Design:

  • Descriptive cross‑sectional and trend analysis.
  • Compare:
    • Mean/median MCAT and GPA of accepted students locally vs national matriculants.
    • Proportion of underrepresented in medicine (URiM) students among your applicants vs national applicants.

Outputs:

  • Poster at your university research day
  • Advisory report shared with your pre‑health office
  • Potential pre‑health advising conference abstract

Blueprint 2: SNMA/LMSA Chapter – “URiM Matriculation Across U.S. Regions”

Question:

  • How have URiM matriculation rates into MD‑granting schools changed across U.S. regions from 2010–2022?

Data:

  • FACTS A‑3 (applicants and matriculants by race/ethnicity)
  • A‑7 (state of legal residence)
  • U.S. Census regional definitions

Design:

  • Aggregate state data into regions (e.g., Northeast, Midwest, South, West).
  • Compute:
    • Proportion of URiM applicants and matriculants by region, per year.
    • Changes over time, with slope estimates.
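The region aggregation in this design is a simple lookup-and-sum. The state counts below are hypothetical and the region mapping is deliberately partial, for illustration only:

```python
from collections import defaultdict

# Partial Census-region lookup (illustration only; extend to all states).
region_of = {"NY": "Northeast", "PA": "Northeast",
             "OH": "Midwest", "IL": "Midwest",
             "TX": "South", "GA": "South"}

# Hypothetical (state, URiM matriculants, total matriculants) for one year.
rows = [("NY", 310, 1700), ("PA", 240, 1500), ("OH", 150, 1100),
        ("IL", 200, 1300), ("TX", 520, 1900), ("GA", 180, 800)]

# Sum URiM and total matriculants within each region.
totals = defaultdict(lambda: [0, 0])
for state, urim, total in rows:
    region = region_of[state]
    totals[region][0] += urim
    totals[region][1] += total

for region, (urim, total) in sorted(totals.items()):
    print(f"{region}: {urim / total:.1%} URiM matriculants")
```

Repeating this per year gives you the region-by-year proportions to fit slopes against.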

Outputs:

  • Presentation at national SNMA or LMSA conference.
  • Policy brief shared with local DEI leadership.
  • Manuscript in a journal focused on diversity in health professions education (with faculty mentorship).

Blueprint 3: MS1 Academic Medicine Group – “Burnout and Debt”

Question:

  • What is the relationship between perceived burnout indicators and educational debt levels among graduating U.S. medical students over the last decade?

Data:

  • GQ published summary reports (AAMC)
  • Focus on:
    • Self‑reported burnout/mental health indicators
    • Debt categories at graduation

Design:

  • Descriptive trend analysis of:
    • Proportion of students reporting high debt by category.
    • Proportion endorsing burnout‑related items.
  • Correlate aggregate national debt category proportions with burnout indicator proportions by year.

Caveat: Ecological correlation, not individual‑level. Interpret accordingly.

Outputs:

  • Medical education conference abstract (AAMC Learn Serve Lead, AAMC regional groups, or other education‑focused meetings).
  • Internal curriculum committee briefing.

Final Takeaways

  1. AAMC data tools are not just for institutional planners; they are a robust, underused platform for serious pre‑med and MS1 research when questions are precise and methods are disciplined.
  2. Student organizations can transform these tools into multi‑year, mission‑aligned research pipelines that support conference presentations, manuscripts, and real policy change within their institutions.
  3. The real differentiator is not access to data but the rigor of your question, the clarity of your methods, and your willingness to treat even “simple” descriptive analyses with the same care as any bench or clinical project.
