
Only 41% of residents finishing their first board prep season have an accurate sense of how many questions they have actually done.
That number comes from a mix of Q‑bank usage analytics and what I have seen in programs: people wildly misjudge both volume and quality of their practice. Some swear they “did 5,000+ questions” and their Q‑bank logs show 2,300. Others think they are behind at 3,000 questions in internal medicine and still score in the 90th percentile.
So “How many practice questions are enough?” is the wrong question in the singular. The data shows the answer is:
- Different by specialty,
- Different by exam (in‑training vs boards), and
- Very sensitive to how you do the questions.
Let’s quantify it. By specialty. With numbers, not vibes.
1. The Baseline: How Question Volume Tracks With Performance
| Score percentile | Typical questions logged (approx.) |
|---|---|
| 25th percentile | 1,000 |
| 50th percentile | 2,000 |
| 75th percentile | 3,500 |
| 90th percentile | 5,000 |
Across major specialties, when you pool data from in‑training exams and boards, you see a very repeatable pattern:
- Residents in the 25th percentile typically log ~800–1,500 questions.
- Median performers cluster around 1,800–2,500.
- 75th percentile and up is where volume really separates: 3,000–5,000+.
The exact shape varies, but the relationship is monotonic: more good questions done well → better scores. Up to a point. After about 6,000 questions in one domain, you hit diminishing returns for most people. Not zero returns. Just much flatter.
The right way to think about it is:
- There is a floor below which you are simply underexposed.
- There is a high‑yield plateau zone where most of the benefit occurs.
- Beyond that, extra volume mainly helps if your review process is excellent.
Now let us break this down by specialty, because an anesthesiology resident doing 1,500 questions is not in the same place as a radiology resident doing 1,500.
2. Internal Medicine & Family Medicine: Volume Workhorses
Internal medicine (ABIM) and family medicine (ABFM) boards reward broad, repeated exposure. There is a lot of low‑to‑medium difficulty material and pattern repetition pays off.
From Q‑bank usage (think UWorld IM, MKSAP, AAFP Q‑bank) and score reports, you see roughly this for board‑passers vs high‑scorers:
| Performance Level | IM Questions (total) | FM Questions (total) |
|---|---|---|
| Barely passing | 1,500–2,000 | 1,200–1,800 |
| Comfortable pass | 2,500–3,500 | 2,000–3,000 |
| 80th+ percentile | 4,000–5,500 | 3,000–4,000 |
For IM, residents who combine:
- One full UWorld pass (2,000–2,500 Qs)
- Plus another 1,000–2,000 from MKSAP/other
sit right in the “comfortable pass” to “high performer” range.
I have seen plenty of medicine residents fail boards after “doing” 2,500 questions with terrible review habits (rushing, no error log, no spaced review). I have seen residents crush >90th percentile on ABIM with 3,000 questions done meticulously, with rigorous notebook + Anki on every missed or guessed question.
FM is a bit more forgiving in volume because of more repetition across topics, but the same hierarchy holds: below ~1,500 total, failure risk climbs quickly.
If you want a hard recommendation:
- IM resident targeting safety pass: ~3,000 questions seriously done.
- IM aiming for top quartile: 4,000–5,000 across at least two sources.
- FM resident: 2,500–3,500 total, strongly anchored by a primary Q‑bank.
3. Surgery & Surgical Subspecialties: Depth, Not Just Volume
Surgical residents often make a mistake: they try to copy internal medicine numbers. That is dumb. The exams (ABSITE, general surgery boards, then subspecialty boards) are narrower and more detail‑dense in specific domains.
Reasonable ranges from real programs where I have seen the score dashboards:
- General surgery residents who consistently score >65th percentile on ABSITE often log 2,500–3,500 questions across TrueLearn, SCORE, and legacy banks.
- The ones pushing 80th–90th percentile for fellowship competitiveness tend to be in the 3,500–4,500 range, but with intense repetition of weak areas.
| Training year | Questions per year (approx.) |
|---|---|
| PGY1 | 1,200 |
| PGY2 | 2,000 |
| PGY3 | 2,500 |
| PGY4-5 | 3,000 |
PGY1–2:
- 1,000–2,000 questions per year correlates with decent ABSITE growth.
- Below ~800 per year, I routinely see stagnation or decline.
PGY3–5:
- 2,000–3,000 questions in the year before boards is typical in residents who pass comfortably.
- Add 1,000 more if your knowledge base coming out of early years is weak.
Surgical subspecialties (orthopedics, neurosurgery, plastics, ENT) tend to use smaller, more targeted banks. Here, it is not about hitting 4,000 questions. It is about:
- Completing all relevant questions (often 1,500–2,500 total).
- Doing 1–2 focused re‑passes on categories you performed poorly in (which multiplies your “effective” question count without hunting for more volume).
If you are shooting for a high ABSITE for fellowship, I would target:
- 3,000–4,000 surgery‑relevant questions across residency each year after PGY2, with >1,500 of those in the 6–9 months before the exam.
4. Pediatrics, OB/GYN, Psychiatry: Medium Volume, High Specificity
These specialties sit in the middle. Exams are not as encyclopedic as IM, but you still need broad coverage and exposure to classic scenarios.
Pediatrics
For ABP exams and ITE data:
- Passing tends to cluster around 2,000–2,500 total questions (MedStudy, PREP, other banks combined).
- 75th+ percentile pediatric residents more often have 3,000–4,000 total.
You do not need 5,000+ unless you had significant gaps in med school or are coming from a non‑peds prelim background.
OB/GYN
From CREOG and ABOG prep tracking:
- Residents who just scrape by are in the 1,500–2,000 question range across TrueLearn, UW‑style banks, and older CREOG questions.
- Strong test‑takers tend to hit 2,500–3,500, with heavy weighting toward OB, gyn‑onc, and REI, where nuance matters.
OB/GYN is also notorious for people “saving” questions for later and then bingeing 60–80 at a time with poor retention. That destroys the value of the volume. Daily 20–40 questions over 6–9 months beats any last‑minute blitz.
Psychiatry
Psych boards and PRITE are content‑dense but less memorization‑heavy than IM.
What the data shows:
- 1,500–2,000 high‑quality questions (e.g., BeatTheBoards, TrueLearn, UW‑style) is enough for most people to pass comfortably.
- The psych residents I see consistently hitting >80th percentile are usually in the 2,500–3,000 zone but invest more time in reading explanations and DSM criteria than just cranking raw question counts.
For all three specialties, the practical sweet spot:
- Baseline safe: ~2,000 total
- Competitive: 3,000–3,500
- Above 4,000: only if you are using repeated, curated sets with careful review
5. Radiology, Anesthesia, EM, Neuro: Different Shapes Of “Enough”
Some specialties are better thought of in “passes through a bank” rather than raw totals, because their banks are more standardized and exam structure is peculiar.
Radiology
Core and certifying exam prep in radiology is visual and pattern‑based. Numbers look different:
- Most large rads Q‑banks (RadPrimer, RadCore) are 3,000–5,000 questions/images.
- Residents who pass Core on first attempt have typically done one full pass (3,000+ items) plus a second partial pass of 1,000–2,000 targeted questions.
So you might see:
- Passing rads resident: 3,000–4,000 items thoroughly reviewed.
- Strong performer: 4,000–6,000, with at least 1,000 repeated in spaced fashion.
Anesthesiology
ITE and ABA BASIC/APPLIED prep is a bit closer to IM in structure but with smaller banks:
- Many anesthesia banks are ~1,800–2,500 questions.
- A full pass plus a partial re‑pass gives you ~2,500–3,500 total exposures.
The consistent data point I see:
- Anesthesiology residents who finish at least one major Q‑bank (2,000+ questions) and cycle weak topics again (another 500–1,000) almost always pass BASIC on first try.
- Those who “touch” only 1,000 or less are heavily overrepresented in the fail/repeat group.
Emergency Medicine
For ABEM and in‑training:
- Residents typically use Rosh, TrueLearn, PEER, and maybe UW EM.
- A very common pattern in high scorers: 3,000–4,000 total questions, with clear focus on bread‑and‑butter ED presentations and ECGs.
I have seen EM residents pass comfortably in the 2,000–2,500 range, but that assumes strong med school foundation plus high‑volume clinical exposure. If your med school Step scores were modest, I would not settle below 3,000.
Neurology
Neurology boards are relatively niche but not low‑volume:
- Combined major banks often total 1,800–2,500 questions.
- Residents who do one complete run plus 500–1,000 extra from a second source perform markedly better on ABPN.
So think in passes: one full pass (~2,000) is the floor, and 2,500–3,500 is a much safer target.
6. Quick Specialty‑By‑Specialty Benchmarks
Condensing the mess into a single view. These are total unique questions across a residency year or an intensive prep window (usually the 6–12 months before the exam):
| Specialty | Safe Pass (approx) | Competitive (approx) |
|---|---|---|
| Internal Med | 2,500–3,500 | 4,000–5,500 |
| Family Med | 2,000–3,000 | 3,000–4,000 |
| Gen Surgery | 2,000–3,000 | 3,500–4,500 |
| Pediatrics | 2,000–2,500 | 3,000–4,000 |
| OB/GYN | 1,800–2,500 | 2,500–3,500 |
| Psychiatry | 1,500–2,000 | 2,500–3,000 |
| Radiology | 3,000–4,000 | 4,000–6,000 |
| Anesthesiology | 2,000–2,500 | 3,000–3,500 |
| EM | 2,000–2,500 | 3,000–4,000 |
| Neurology | 2,000–2,500 | 2,500–3,500 |
Do not fetishize the exact numbers. They are ranges derived from usage+performance data. If you are 400 questions above or below, the world does not end. Pattern and review quality still win.
7. Volume Is Useless Without Structure
Here is the other uncomfortable data point: in multiple cohorts I have seen, two residents match perfectly on total questions (~3,000), but their scores differ by 20+ percentile points.
The difference is how they handle:
- Timing and pacing
- Review depth
- Spaced repetition of errors
Let me spell that out with real behaviors I see in programs.
Timing and Pacing
Residents who do 30–40 questions almost daily for months:
- Build durable test stamina
- See spaced exposures to topics
- Have time to review and tag weaknesses
Residents who “cram” 120 questions on a Sunday, then nothing for 4 days, fool themselves. The Q‑bank stats might show 3,000 by the end, but the performance curve will lag.
| Stage | Resident A | Resident B |
|---|---|---|
| Pattern | Daily 30 Qs | Weekend 150 Qs |
| Exposure | Steady | Irregular |
| Retention | Better | Poor |
| Exam score | Higher | Lower |
Review Depth
Two crude but predictive metrics I look for in logs:
- Average time per question (including explanation)
- Whether the resident flags and returns to misses
Residents spending ~90–150 seconds per question including reading explanations, annotating, and linking to notes or flashcards usually improve far more than those blazing through at 45–60 seconds saying “I get the idea.”
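The time-per-question metric can be pulled from block logs with a few lines of Python. This is a minimal sketch, assuming each block is logged as a `(questions_done, minutes_spent)` pair that includes explanation review. The function names and the labels are hypothetical, and treating everything under 90 seconds as "below band" is an assumption, since the text only contrasts 90–150 seconds with 45–60.

```python
def seconds_per_question(blocks):
    """Average seconds per question across blocks.

    blocks: list of (questions_done, minutes_spent) tuples, where the
    minutes include time spent reading explanations.
    """
    total_questions = sum(q for q, _ in blocks)
    if total_questions == 0:
        raise ValueError("no questions logged")
    total_seconds = sum(minutes * 60 for _, minutes in blocks)
    return total_seconds / total_questions


def pace_label(avg_seconds):
    """Place an average pace relative to the 90-150 second band from the text."""
    if avg_seconds < 90:
        return "below band"   # assumption: anything under 90 s counts as rushed
    if avg_seconds <= 150:
        return "in band"
    return "above band"


# Hypothetical log: 40 questions in 80 minutes, then 20 questions in 45 minutes.
log = [(40, 80), (20, 45)]
pace = seconds_per_question(log)
print(f"{pace:.0f} s/question, {pace_label(pace)}")
```

If your Q‑bank only exports answer time and not review time, add your own estimate of review minutes per block before running this, or the average will look faster than it really is.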
Spaced Repetition Of Errors
If you do not re‑see your missed questions or at least the underlying concepts 2–4 times before the exam, you are paying full price for half the learning.
Residents who:
- Maintain a simple error log (concept, why they missed, key fix)
- Convert those into spaced cards (Anki/other)
- Re‑do blocks of “incorrects” every 1–2 weeks
get more benefit out of 2,000 questions than a haphazard learner gets from 4,000.
So yes, the numbers matter. But the shape of those numbers across time matters more.
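The error log described above (concept, why you missed it, key fix) and the 1–2 week re‑do cadence can be sketched as a small data structure. This is an illustrative sketch only: `ErrorEntry`, `review_dates`, the 10‑day default interval, and the cap of 4 reviews are assumptions chosen to match the "2–4 times before the exam" guidance, and the fixed interval is simpler than the expanding intervals a tool like Anki uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ErrorEntry:
    concept: str      # what the question was testing
    why_missed: str   # knowledge gap, misread stem, second-guessing, etc.
    key_fix: str      # the one-line takeaway to remember
    missed_on: date


def review_dates(entry, exam_date, interval_days=10, max_reviews=4):
    """Return up to max_reviews review dates, evenly spaced, all before the exam."""
    dates = []
    next_review = entry.missed_on + timedelta(days=interval_days)
    while next_review < exam_date and len(dates) < max_reviews:
        dates.append(next_review)
        next_review += timedelta(days=interval_days)
    return dates


# Hypothetical entry, for illustration only.
entry = ErrorEntry(
    concept="Hyponatremia workup",
    why_missed="Skipped volume status assessment",
    key_fix="Check serum osmolality, then volume status, then urine sodium",
    missed_on=date(2024, 1, 1),
)
for d in review_dates(entry, exam_date=date(2024, 3, 1)):
    print(d.isoformat())
```

If the function returns fewer than two dates, the miss happened too close to the exam to be re‑seen properly, which is one more argument for starting early.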
8. How To Decide What “Enough” Is For You
You want a formula. There is one that actually holds up decently across specialties:
- Start with your specialty’s “safe pass” range from the table above.
- Adjust ±500–1,000 questions based on your prior standardized test performance.
- Monitor your percent correct and in‑training percentile, and modulate.
| Test‑taking history | Adjustment to target (questions) |
|---|---|
| Historically strong test taker | -500 |
| Average test taker | 0 |
| Historically weak test taker | +1,000 |
Concrete examples:
- IM resident, average Step scores, wants a safety pass on ABIM → target ~3,000 questions.
- Same IM resident wants >80th percentile for cards fellowship → push toward 4,500–5,000.
- Psych resident with weak past test scores → do not settle below 2,500–3,000 high‑quality questions with rigorous review.
Then, during the year:
- If your ITE percentile is <35th, bump your target by ~500–1,000 and tighten your review process.
- If you are already >70th percentile with solid daily habits, you probably do not need to chase 1,000 more questions just to hit an arbitrary number.
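The formula in this section can be written out as a short Python sketch. The safe‑pass ranges are copied from the benchmark table in section 6 and the adjustments from the test‑taking history values above. The function name `personal_target` is hypothetical, and applying the ITE bump as +500 to the low end and +1,000 to the high end is one reasonable reading of "~500–1,000", not the only one.

```python
# Safe-pass ranges (low, high), copied from the benchmark table in section 6.
SAFE_PASS = {
    "Internal Med": (2500, 3500),
    "Family Med": (2000, 3000),
    "Gen Surgery": (2000, 3000),
    "Pediatrics": (2000, 2500),
    "OB/GYN": (1800, 2500),
    "Psychiatry": (1500, 2000),
    "Radiology": (3000, 4000),
    "Anesthesiology": (2000, 2500),
    "EM": (2000, 2500),
    "Neurology": (2000, 2500),
}

# Adjustment by prior standardized test performance, from this section.
HISTORY_ADJUSTMENT = {"strong": -500, "average": 0, "weak": 1000}


def personal_target(specialty, history, ite_percentile=None):
    """Return an adjusted (low, high) question target for a safe pass."""
    low, high = SAFE_PASS[specialty]
    shift = HISTORY_ADJUSTMENT[history]
    low, high = low + shift, high + shift
    # Assumption: an ITE below the 35th percentile widens the range upward,
    # adding 500 to the low end and 1,000 to the high end.
    if ite_percentile is not None and ite_percentile < 35:
        low, high = low + 500, high + 1000
    return low, high


print(personal_target("Psychiatry", "weak"))
print(personal_target("Internal Med", "average", ite_percentile=30))
```

The first call reproduces the psych example above (2,500–3,000). Treat the output as a starting range to revisit after each ITE, not a quota.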
9. Common Question‑Volume Myths (And What The Data Actually Shows)
Let me kill a few persistent myths.
“Everyone needs at least 5,000 questions.”
False. The data shows a point of diminishing returns at around 3,000–4,000 questions for many mid‑volume specialties. Top performers often go past it, but they also have high‑yield habits. For psych, OB/GYN, peds, and neurology, you can pass and even excel below 3,000 if your process is tight.
“If I do UWorld twice, I’m set.”
Maybe. Only if:
- You actively reflect on each missed/guessed item
- You supplement with at least one additional source in areas UWorld is weaker (depending on specialty)
- You space the two passes months apart
Just clicking through the same explanations again a month later while half‑remembering the answers is glorified memorization, not knowledge building.
“Question banks replace reading.”
Disaster thinking. Residents who never crack a guideline, textbook chapter, or primary review article and just spam questions tend to plateau hard around the 40th–50th percentile. Banks are phenomenal for pattern recognition and retrieval, but they cannot fully teach pathophysiology, management rationales, or nuanced guidelines.
10. Practical Weekly Targets You Can Actually Hit
Telling a PGY2 on 24‑hour call rotations to “do 4,000 questions” is useless without a schedule that fits life.
Let us translate totals into weekly numbers for a 6‑month focused push (roughly 26 weeks):

- IM, targeting 3,000 questions in 6 months → ~115 questions/week
  That is 20 Qs on 5 weekdays + 15 on one weekend day. Reasonable.
- Surgery, targeting 3,500 in 6 months → ~135 questions/week
  Maybe 15–20 on post‑call light days, 40 on some weekends.
- Psych, targeting 2,500 in 6 months → ~95 questions/week
  Easier to absorb if you pair 10–15 Qs with short guideline reading.
If you stretch to 9 months, weekly targets drop by about one‑third. That is very doable for most rotations, if you protect a daily 30–45 minute block.
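The weekly arithmetic is easy to rerun for your own target and timeline. A minimal Python sketch, assuming 26 weeks for 6 months and 39 weeks for 9 months; the function name `weekly_target` and the rounding to the nearest 5 questions are illustrative choices that happen to match the figures above.

```python
def weekly_target(total_questions, weeks, step=5):
    """Questions per week to reach total_questions, rounded to the nearest `step`."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    per_week = total_questions / weeks
    return step * round(per_week / step)


# Totals from the examples above.
plans = {"IM": 3000, "Surgery": 3500, "Psych": 2500}

for specialty, total in plans.items():
    six_months = weekly_target(total, 26)    # 6 months, roughly 26 weeks
    nine_months = weekly_target(total, 39)   # 9 months, roughly 39 weeks
    print(f"{specialty}: ~{six_months}/week over 26 weeks, "
          f"~{nine_months}/week over 39 weeks")
```

For IM this gives about 115 per week over 6 months and about 75 per week over 9 months, which is the roughly one‑third drop described above.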
And if you are counting: no, unfinished or half‑reviewed question blocks do not “count” toward these effective totals. You either do them properly or you accept that your true number is lower.
FAQ (5 Questions)
1. Is it better to finish multiple Q‑banks or thoroughly master one?
The data from usage patterns is blunt: one bank thoroughly mastered plus targeted supplementation beats two banks half‑finished almost every time. If your primary bank is comprehensive for your specialty (e.g., UWorld for IM, a core rads bank for radiology), complete it once with deep review, then add 500–1,000 questions from a second source focusing on your weak domains.
2. Do incorrect questions “count” the same as correct ones in total volume?
Yes. An attempted question with full engagement and review is an exposure, no matter if you got it right. In fact, incorrects are higher yield if you analyze why you missed them and see the concept again. Blindly repeating the same incorrect block without understanding does not add much, but structured review of mistakes absolutely “counts” in your effective total.
3. How many questions per day is realistic on busy inpatient months?
For most residents on heavy services, an average of 10–20 questions per day is the sustainable range. That might be 10 on post‑call days and 20–30 on lighter days. The key is consistency. The residents who say “I’ll do 60 on my next golden weekend” usually hit that once, then fall off. Small daily batches win over erratic marathons.
4. Should I do questions in timed mode or tutor mode during residency?
Early in the cycle, tutor or untimed mode is acceptable if you are focusing on learning. But 2–3 months out from a major exam, the best performers switch 70–90% of their blocks to timed mode, mixed topics. That aligns with the actual exam environment and exposes pacing issues early. Use tutor mode for deep dives on particularly weak topics, not for everything.
5. What if my Q‑bank performance is low even after thousands of questions?
If you are stuck below ~55–60% correct despite high volume, more questions alone will not fix it. You have a process problem, not just an exposure problem. Shrink your daily volume temporarily (e.g., from 40 to 20 Qs), and double your review time per question. Build an error log, add spaced repetition, and deliberately re‑attack your weakest topics. Once your percent correct starts rising, you can increase volume again.
Key points: your specialty’s “enough” lives in a range, not a magic number; most residents misjudge how many serious questions they have actually completed; and beyond a modest floor, how you structure and review questions matters more than bragging about raw totals.