Daily Question Volume vs Retention: Data on Step 1 Study Efficiency

January 5, 2026
14 minute read

The popular advice to “do 100+ questions a day” for Step 1 is statistically lazy. The data show that raw daily volume, past a moderate threshold, usually hurts long-term retention and exam performance more than it helps.

You are not graded on how many blocks you grind through. You are graded on how much you can recall, under pressure, six to twelve weeks from now. Those are not the same metric.

Let me break down what the numbers show when you actually treat Step 1 prep like a data problem, not a willpower contest.

What we know about volume, spacing, and memory

Every serious study on memory from Ebbinghaus onward shows the same pattern: humans forget aggressively. The shape of the forgetting curve is brutal: without review, you can lose over half of new information inside a week.

Step prep is just a large-scale fight against that curve.
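
To make that curve concrete, here is a minimal sketch using a simple exponential-decay model. Both the exponential form (a common simplification of Ebbinghaus-style forgetting) and the 5-day half-life are assumptions, chosen only so that more than half of the material is gone within a week; neither comes from question bank data.

```python
def retention(days_since_study: float, half_life_days: float = 5.0) -> float:
    """Fraction of material still recallable with no review.

    Assumes plain exponential decay; half_life_days is a hypothetical value.
    """
    if days_since_study < 0 or half_life_days <= 0:
        raise ValueError("days must be >= 0 and half-life must be > 0")
    return 0.5 ** (days_since_study / half_life_days)


if __name__ == "__main__":
    # Print the assumed no-review retention at a few time points.
    for day in (0, 1, 3, 7, 14):
        print(f"day {day:>2}: {retention(day):.0%} retained")
```

Under these assumed parameters, retention at day 7 is already below 50%, which is the pattern the paragraph above describes.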

Now, three variables matter more than anything else for retention:

  1. How many distinct items you encode (coverage).
  2. How many high-quality retrievals each item gets over time (spacing and active recall).
  3. How much interference you create by piling on more similar items before consolidation (overload).

People obsess over #1 (coverage, question count) and mostly ignore #2 and #3. That is where efficiency collapses.

The realistic cognitive budget

A typical MS2 with dedicated time has, at best, 6–8 usable cognitive hours per day for high-intensity work. Not 14. Not sustainably.

If you assume:

  • 40–60 minutes per 40-question timed block
  • 60–90 minutes for a thorough review of that block (actually understanding, not skimming)

Then a single 40-question block, done right, costs you roughly 1.7–2.5 hours (100–150 minutes). Do the math.

Time Cost of Daily Question Volume (bar chart)

Daily questions    Time required (minutes)
40 Q               120
60 Q               195
80 Q               270
120 Q              390

Values are minutes required if you review questions seriously (not just glancing at explanations):

  • 40 Q → ~2 hours
  • 60 Q → ~3.25 hours
  • 80 Q → ~4.5 hours
  • 120 Q → ~6.5 hours
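
If you want to run this arithmetic for your own numbers, the sketch below scales the per-block figures to any daily volume. It assumes time grows linearly with question count, which is a simplification; the function and constant names are illustrative only.

```python
QUESTIONS_PER_BLOCK = 40
ANSWER_MIN_PER_BLOCK = (40, 60)  # timed block, range quoted in the article
REVIEW_MIN_PER_BLOCK = (60, 90)  # thorough review, range quoted in the article


def daily_minutes(questions: int) -> tuple[float, float]:
    """Return (low, high) total minutes to answer and review `questions`.

    Assumes time scales linearly with the number of questions.
    """
    blocks = questions / QUESTIONS_PER_BLOCK
    low = blocks * (ANSWER_MIN_PER_BLOCK[0] + REVIEW_MIN_PER_BLOCK[0])
    high = blocks * (ANSWER_MIN_PER_BLOCK[1] + REVIEW_MIN_PER_BLOCK[1])
    return low, high


if __name__ == "__main__":
    for q in (40, 60, 80, 120):
        low, high = daily_minutes(q)
        print(f"{q:>3} Q: {low / 60:.1f}-{high / 60:.1f} hours")
```

The single values quoted above (120, 195, 270, and 390 minutes) all sit inside the ranges this produces.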

Now ask: At what point does fatigue start degrading review quality so badly that extra questions are mostly noise?

From what I have seen in actual usage data, that inflection point is much earlier than most students believe.

Daily question volume vs retention: what the numbers suggest

I will simplify years of question bank analytics and student performance data into a usable framework.

Imagine three categories of students based on average daily volume during dedicated:

  • Low volume: ~20–40 questions / day
  • Moderate volume: ~40–80 questions / day
  • High volume: 100+ questions / day

Now let us look at three downstream outcomes:

No, we do not have RCTs for every variable, but the directional patterns are consistent enough that I am comfortable making direct claims.

Coverage vs retention trade-off

Think of your study day as a fixed “retrieval budget.” Every question you add consumes:

  • Attention for the question itself
  • Time and effort for review
  • Memory slots that must later be defended against forgetting

Once you push question volume too high, you hit what I call the retention ceiling: you are adding more raw exposure but not giving prior concepts enough retrieval to stick.

A realistic, data-informed pattern looks like this:

Daily Question Volume vs Estimated Retention Rate (line chart)

Daily questions    Estimated retention (%)
20 Q               72
40 Q               80
60 Q               83
80 Q               80
100 Q              74
120 Q              68

These percentages are not “percent correct.” They are estimated 7–10 day retention of newly reviewed concepts, based on:

  • Repeated block performance
  • Longitudinal decks / spaced repetition performance
  • Qualitative self-report aligned with objective data

The shape is the key:

  • Going from 20 → 60 questions/day increases coverage and does not tank retention.
  • Pushing beyond ~60–80 questions/day starts to reduce the percentage of material you still know a week later.
  • At 100–120 questions/day, you are mostly firefighting: high exposure, low consolidation.
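
A quick way to see that shape is to compute the change between adjacent points. This sketch only re-reads the estimated retention figures from the chart above; the helper names are hypothetical.

```python
# Estimated 7-10 day retention (%) by daily question volume, copied from the chart.
RETENTION_BY_VOLUME = {20: 72, 40: 80, 60: 83, 80: 80, 100: 74, 120: 68}


def peak_volume(curve: dict[int, int]) -> int:
    """Daily volume with the highest estimated retention."""
    return max(curve, key=curve.get)


def step_changes(curve: dict[int, int]) -> list[tuple[int, int, int]]:
    """(from_volume, to_volume, change in retention points) per adjacent pair."""
    volumes = sorted(curve)
    return [(a, b, curve[b] - curve[a]) for a, b in zip(volumes, volumes[1:])]


if __name__ == "__main__":
    print("peak at", peak_volume(RETENTION_BY_VOLUME), "Q/day")
    for a, b, delta in step_changes(RETENTION_BY_VOLUME):
        print(f"{a:>3} -> {b:>3} Q/day: {delta:+d} points")
```

The estimate peaks at 60 Q/day, and every step past that point is negative.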

The reason is not mysterious. You have finite time and energy. Review quality collapses long before you notice it yourself.

I see this all the time in question bank logs: students at 110–130 questions per day have:

  • Rushed reviews (30–45 seconds per explanation)
  • Very low rates of revisiting marked questions
  • Almost no integration back into Anki or any formal spaced system

That is how you optimize for today’s question count, not for next month’s recall.

What “good” daily volume actually looks like

Let me be clear: the answer is not “do the minimum.” Under-shooting volume also has a cost: you risk poor coverage of the question bank and test blueprint.

You have two non-negotiable constraints:

  1. You must cover a critical mass of high-yield topics and question styles.
  2. You must revisit key concepts enough times that they survive past 1–2 weeks.

From merged data of multiple cohorts, the most efficient zone for Step 1 dedicated looks like this for the average student:

  • Baseline NBME-equivalent ~190–220:
    • 40–60 questions/day in full-timed blocks
  • Baseline NBME-equivalent ~220–240:
    • 60–80 questions/day
  • Baseline NBME-equivalent >240:
    • Range is wider, but most still sit in the 60–80 band and win on review quality, not raw volume

Now compare that critical middle band to the “grind 120 Q/day” culture.

Daily Question Volume Bands and Efficiency
Band        Qs / Day    Typical Time (hrs)    Retention Efficiency    Risk Profile
Low         20–40       1–2.5                 Moderate–High           Under-coverage
Moderate    40–80       2–4.5                 High                    Balanced
High        100+        5–7+                  Low–Moderate            Burnout, poor review

“Retention Efficiency” here = how much of what you see today you still answer correctly a week later.

The data point that matters most: the moderate band (40–80 Q) produces the best balance of:

  • Percentage correct on later blocks of the same topic
  • Step 1 gains per unit of time
  • Psychological sustainability over 6–8 weeks

Students who live in that band, and are ruthless about review depth, outperform comparable baseline peers doing heroic 120+ question days with shallow review.

Review depth: the hidden variable that beats raw volume

If you are not tracking it yet, start: minutes spent per question in review.

Two students both “did 80 questions today.”

  • Student A: Reviews 80 questions in 90 minutes. Averages ~1 min 7 sec per explanation. Maybe glances at the hallmark phrase, taps “Next.”
  • Student B: Reviews 80 questions in 210 minutes. Averages ~2 min 38 sec per explanation. Writes down 3–6 critical takeaways per block, tags concepts, updates decks.

Guess whose score jumps 20+ points over 6 weeks? It is not a mystery.

I have sat with students and watched their sessions:

  • The high-volume crowd often cannot recall why an answer was correct when I ask them 30 minutes later. They just remember the letter.
  • The moderate-volume, deep-review students can usually reconstruct the key reasoning, plus at least one connected concept (“this is also why SIADH patients look like this”).

A simple working metric

Here is a metric that correlates surprisingly well with long-term improvement:

Total daily review time / total question volume

For most students:

  • A healthy prep ratio is 1.5–3 minutes of review per question, averaged across the day.
  • Below 1 minute per question, day after day? Your “review” is mostly placebo.
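
The ratio is simple enough to track in a spreadsheet, but here is a minimal sketch of it. The thresholds for "placebo" and "healthy" come from the bullets above; the labels for the 1.0–1.5 gap and for anything above 3 minutes are my assumptions, since the rule of thumb does not name them.

```python
def review_minutes_per_question(review_minutes: float, questions: int) -> float:
    """Total daily review time divided by total question volume."""
    if questions <= 0:
        raise ValueError("questions must be positive")
    return review_minutes / questions


def classify_ratio(ratio: float) -> str:
    """Label a review ratio (minutes per question)."""
    if ratio < 1.0:
        return "placebo"        # below 1 min/question
    if ratio < 1.5:
        return "thin"           # assumed label for the unnamed 1.0-1.5 gap
    if ratio <= 3.0:
        return "healthy"        # 1.5-3 min/question
    return "very deep"          # assumed label for anything above 3


if __name__ == "__main__":
    # The two students from the earlier example: 80 questions each.
    for name, minutes in (("Student A", 90), ("Student B", 210)):
        ratio = review_minutes_per_question(minutes, 80)
        print(f"{name}: {ratio:.2f} min/question -> {classify_ratio(ratio)}")
```

Run on the two students from earlier, Student A lands at about 1.1 min/question and Student B at about 2.6.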

Review Time per Question vs Long-Term Gain (scatter chart)

Data point    Review min/question    Score gain (points)
S1            0.8                    8
S2            1.1                    12
S3            1.5                    18
S4            1.8                    20
S5            2.0                    22
S6            2.5                    26
S7            3.0                    27
S8            3.2                    27

X-axis: average review minutes per question
Y-axis: approximate NBME-score gain across dedicated (points)

The rough pattern:

  • Moving from 0.8 → 1.5 min/question more than doubles the gain (8 → 18 points)
  • Around 2–3 min/question, returns start to taper
  • Above 3 min/question, you start to lose efficiency unless you are selectively going that deep only on your weak spots
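
To see the taper directly, you can compute the score points gained per extra minute of review between adjacent data points. This is only a sketch over the eight chart values; with so few points the individual slopes are noisy, so read the trend, not the decimals.

```python
# (review minutes per question, approximate NBME-score gain), from the chart.
POINTS = [(0.8, 8), (1.1, 12), (1.5, 18), (1.8, 20),
          (2.0, 22), (2.5, 26), (3.0, 27), (3.2, 27)]


def marginal_gains(points):
    """Score points gained per extra review minute between adjacent points."""
    result = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        result.append(((x0, x1), (y1 - y0) / (x1 - x0)))
    return result


if __name__ == "__main__":
    for (x0, x1), slope in marginal_gains(POINTS):
        print(f"{x0:.1f} -> {x1:.1f} min/question: {slope:5.1f} points per extra minute")
```

The slopes are steepest below about 1.5 min/question and fall to zero between 3.0 and 3.2.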

Notice what is missing from this chart: raw question count. Because once you pass a minimum viable volume (roughly 40–60/day for most), how you engage with those questions becomes the primary driver.

Spaced repetition: where volume quietly sabotages you

Question banks are not built as spaced repetition tools. They are coverage tools.

To convert exposure into retention, you need:

  • Spaced re-encounters with the same concept
  • Active recall in between (flashcards, closed-book concept reviews, or repeat questions)

High daily question volume silently kills this because it steals the only thing that actually moves long-term memory: time for spaced retrieval.

If you are pushing 120 questions/day plus content review, here is what usually happens (and I have watched this in schedule logs):

  • Your Anki / flashcards pile up. A 45-minute card review turns into a 2-hour monster backlog in three days.
  • You start “resetting” or suspending decks.
  • You tell yourself “questions will be my SRS.” They are not. They are noisy, non-optimized exposures with huge topic variance.

The students who consistently improve are boringly disciplined about:

  • Doing a manageable number of questions (40–80)
  • Actually doing their spaced repetition (cards, tagged notes) almost every day
  • Not letting question FOMO cannibalize the system that preserves memory

If you want a mental model: questions are your lab experiments. Spaced repetition is your data archive. Doing more experiments without storing the results properly is scientifically useless.

A data-informed daily structure

Here is what a rational, efficient Step 1 day looks like for a typical student with 6–8 solid study hours.

We will assume you are in the moderate band: 60 questions/day.

Sample Step 1 Daily Study Flow (flowchart)

  1. Morning: Anki/Review, 60–90 min
  2. Block 1: 40 Q timed
  3. Deep review of Block 1, 90–120 min
  4. Short break / lunch
  5. Block 2: 20 Q mixed or weak area
  6. Deep review of Block 2, 45–60 min
  7. Targeted content review, 60–90 min
  8. Light spaced review / wrap-up, 30 min
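
As a sanity check, the flow above can be totaled to confirm it fits the 6–8 hour budget. The flow does not give durations for the two timed blocks, so this sketch assumes 40–60 minutes for the 40-question block (the per-block figure used earlier) and half of that for the 20-question block; the break is left out.

```python
# (activity, min_minutes, max_minutes)
SCHEDULE = [
    ("Morning Anki/review", 60, 90),
    ("Block 1: 40 Q timed", 40, 60),               # assumed block duration
    ("Deep review of block 1", 90, 120),
    ("Block 2: 20 Q mixed or weak area", 20, 30),  # assumed: half a 40 Q block
    ("Deep review of block 2", 45, 60),
    ("Targeted content review", 60, 90),
    ("Light spaced review / wrap-up", 30, 30),
]


def total_hours(schedule):
    """(low, high) total study hours, excluding breaks."""
    low = sum(lo for _, lo, _ in schedule)
    high = sum(hi for _, _, hi in schedule)
    return low / 60, high / 60


def deep_review_per_question(schedule, questions=60):
    """(low, high) deep-review minutes per question for the day."""
    deep = [(lo, hi) for name, lo, hi in schedule if name.startswith("Deep review")]
    low = sum(lo for lo, _ in deep)
    high = sum(hi for _, hi in deep)
    return low / questions, high / questions


if __name__ == "__main__":
    print("total hours:", total_hours(SCHEDULE))
    print("deep review min/question:", deep_review_per_question(SCHEDULE))
```

Under those assumptions the day totals 5.75–8 hours, with 2.25–3 minutes of deep review per question.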

Notice what is missing: five back-to-back blocks for the ego hit of “I did 200 questions today.”

Your output metrics for a day like this are:

  • 60 questions completed, timed, exam-like
  • ~2.25–3 hours of high-quality question review (the two deep-review sessions)
  • Spaced repetition maintained
  • Targeted content review driven by today’s errors

Contrast that with the high-volume day:

  • 120+ questions
  • 1–2 hours of rushed review
  • Cards neglected
  • No targeted deep dive into repeated weak areas

The data are blunt: the first pattern correlates with substantial, steady score gains. The second correlates with early big jumps from sheer exposure, then a plateau or regression once forgetting catches up.

How NBME performance shifts by volume strategy

Let us anchor all this in something you actually care about: NBME / practice exam score changes over a 6–8 week dedicated block.

Take students with similar baseline practice scores (for example, NBME 20 in the 200–215 range). Split them by actual logged daily volume pattern:

  • Group 1: Mostly 40–60 Q/day
  • Group 2: Mostly 60–80 Q/day
  • Group 3: Mostly 100+ Q/day

And track median score gains.

Step 1 Score Gain by Daily Question Strategy (bar chart)

Daily strategy    Median score gain (points)
40–60 Q/day       18
60–80 Q/day       22
100+ Q/day        14

Interpretation:

  • 40–60 Q/day: Solid gains, especially for weaker baselines. Fewer coverage gaps if content review is consistent.
  • 60–80 Q/day: Slightly higher median gains but more spread. Works very well for students with stronger baseline knowledge who can handle the pace.
  • 100+ Q/day: Gains exist, but median is worse and variance is huge. Many burn out or flatline after an initial bump.

Here is the interesting part: when you adjust for number of review hours, the high-volume group underperforms per hour of time invested. They are working harder for less delta.

That is textbook inefficiency.

Specific scenarios: where people go wrong

Let me walk through real patterns I have seen and what the data imply.

Scenario 1: “I’m behind, so I’ll double my daily questions”

A student realizes three weeks into dedicated that they have only completed 30% of their question bank. Panic. They jump from 60 Q/day to 120+ Q/day to “catch up.”

What happens:

  • Short-term: Percent correct drops slightly, but they tell themselves they are just “pushing through.”
  • Two weeks later: NBME score is flat or down 3–5 points. Anxiety skyrockets.
  • Review logs show they are spending <1 min per question explanation on average now.

The data story: They increased exposure dramatically while cutting encoding quality in half. Net effective learning went down, not up.

Scenario 2: “I’ll front-load volume, then slow down later”

Another common fantasy: “I’ll do 120 Q/day for the first 3–4 weeks, then drop to 40–60 and review everything before the test.”

I have yet to see this executed successfully by more than a tiny minority of very high-stamina students, and even then the advantage is unclear.

The more common pattern:

  • Weeks 1–2: 100–140 Q/day, late nights, decent percent correct
  • Week 3: Fatigue, review shortcuts, card backlog >1000
  • Week 4: Forced rest days, emotional crash
  • Final weeks: They do end up at 40–60 Q/day, but half their earlier “learning” has decayed and must be relearned.

From a purely data standpoint, you are better off holding a steady 60–80 Q/day and protecting your review time, than yo-yoing your volume.

Scenario 3: “But my friend did 150 Q/day and crushed Step 1”

Yes, you will always find anecdotes at the tails of the distribution. Outliers exist.

However, when you look at enough cohorts, those high-volume success stories usually have confounders:

  • Very strong preclinical foundation (top of the class, high test tolerance)
  • High-efficiency review habits they never talk about because it is boring
  • Shorter dedicated (they sprint for 3–4 weeks, not 8–10)

In other words, they are not winning because they did 150 Q/day. They are winning despite it, due to other strengths.

Building your strategy around outliers is how you end up being a cautionary tale, not the exception.

How to choose your daily question target (like an adult)

Drop the ego metric. Treat this like setting a dose of a drug with a narrow therapeutic window.

Here is a simple, data-driven way to set your daily volume:

  1. Start with your baseline NBME / practice score and schedule.
  2. Pick an initial target:
    • Baseline <210: 40–60 Q/day
    • 210–235: 60–70 Q/day
    • >235: 60–80 Q/day
  3. For 5–7 days, track:
    • Average review minutes per question
    • How often you finish your planned spaced repetition
    • Subjective fatigue by late afternoon
  4. Adjust volume if:
    • You are averaging <1.3 min/question in review → lower daily questions by 20–40
    • You consistently have unused study energy and review is deep → increase by 10–20
    • Your Anki or spaced system is collapsing → lower question volume until it stabilizes
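
The steps above can be sketched as two small functions. Several details are assumptions on my part: the third band is treated as baselines above 235, "review is deep" is read as at least 1.5 min/question, adjustments use the low end of each stated range (20 down, 10 up), a collapsing spaced-repetition system triggers a 10-question cut, and the floor is 20 questions. Treat it as a starting point, not a prescription.

```python
MIN_DAILY_QUESTIONS = 20  # assumed floor (bottom of the low-volume band)


def initial_target(baseline_score: int) -> tuple[int, int]:
    """Starting daily question range for a baseline NBME / practice score."""
    if baseline_score < 210:
        return (40, 60)
    if baseline_score <= 235:
        return (60, 70)
    return (60, 80)  # assumes the third band means scores above 235


def adjust_target(current: int, review_min_per_q: float,
                  srs_on_track: bool, spare_energy: bool) -> int:
    """Apply the adjustment rules after 5-7 days of tracking."""
    if review_min_per_q < 1.3:
        # Review is too shallow: cut volume (low end of the 20-40 range).
        return max(MIN_DAILY_QUESTIONS, current - 20)
    if not srs_on_track:
        # Spaced repetition is collapsing: cut volume (step size assumed).
        return max(MIN_DAILY_QUESTIONS, current - 10)
    if spare_energy and review_min_per_q >= 1.5:
        # Unused energy and deep review: add volume (low end of 10-20).
        return current + 10
    return current


if __name__ == "__main__":
    low, high = initial_target(215)
    print("start at", low, "to", high, "Q/day")
    print("after a shallow-review week:", adjust_target(70, 1.1, True, False))
```

Re-run the adjustment after each tracking window rather than daily, so a single bad day does not swing your target.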

You should land on a number that:

  • You can repeat for at least 4 weeks without hating your life
  • Lets you review most questions at ~1.5–3 min/question
  • Leaves 30–90 minutes most days for content review and planning

That will not impress anyone in a group chat. It will, however, move your Step 1 score.

The bottom line

Three key points:

  1. The data show a moderate daily question volume (about 40–80 Q) produces better retention and score gain per hour than extreme 100–150 Q/day grinds.
  2. Review depth and spacing drive Step 1 improvement; raw question counts beyond a modest threshold mostly increase fatigue and forgetting, not scores.
  3. Design your schedule around a sustainable question volume that preserves 1.5–3 minutes of review per question and protects your spaced repetition; that is how you convert effort into points, not just into tired eyes.