Active recall vs re-reading — the 50%+ retention gap in three studies
Most people don't believe retrieval practice beats re-reading until they see the numbers. Here are the three studies that nail it down, and what to do about it on a Tuesday afternoon.
Ask a hundred people studying for an exam how they prepare, and most will tell you they re-read the textbook, re-watch the lecture, or work through their highlights one more time. Roughly none will say "I closed the book and tried to remember it from scratch." Re-reading feels like studying. Recalling feels like being tested: uncomfortable, slow, error-prone, faintly humiliating when the answer doesn't come.
Yet the cognitive-science finding that contradicts this preference is among the most-replicated in the field, and the size of the effect is not subtle. In head-to-head comparisons, students who try to recall material remember substantially more of it days and weeks later than students who re-read the same material. Often 50% more, sometimes more than double. The gap shows up in undergraduate prose comprehension, in middle-school science, in foreign-language vocabulary, in medical-school anatomy, in the lab and in the wild.
I want to walk through three of the cleanest demonstrations. Not because the science is contested (it isn't) but because the numbers are far enough from intuition that they're worth seeing once.
Study 1: Roediger and Karpicke, 2006
The canonical paper is Henry Roediger and Jeffrey Karpicke's Test-Enhanced Learning: Taking Memory Tests Improves Long-Term Retention, published in Psychological Science in 2006. The setup is almost embarrassingly simple.
Undergraduates studied short prose passages (the kind of material in a TOEFL reading test, a few hundred words on the sun or sea otters). Then they were assigned to one of two conditions:
- Re-study: read the passage again.
- Test: take a free-recall test on the passage with no feedback, just write down what they remembered.
Both groups had the same total time with the material. Then everyone came back later for a final test, at five minutes, two days, or a week.
At five minutes, the re-study group did slightly better. This is the trick: in the short term, re-reading wins. It's why students prefer it. You feel you "have" the material right after re-reading it, because in some shallow sense you do.
By two days, the curves had crossed. By a week, the gap was wide. In the prose-passage experiment, the re-study group recalled about 42% of the idea units. The retrieval-practice group recalled about 56%. That's a 33% relative improvement at one week, and the rate of forgetting in the re-study condition was much steeper. They were on track to lose more of it over the following weeks.
A second experiment in the same paper used more complex material and three exposures. Same result. Re-reading produced higher confidence and better short-term performance; retrieval produced better retention at every meaningful horizon. Roediger and Karpicke called this the testing effect, and the paper has been cited tens of thousands of times because subsequent labs kept getting the same answer.
What's striking is how passive the "test" condition was. No feedback, no second chance, no flashcards, no app. Just close the book and write what you remember. That single act, repeated once, beat re-reading on a 7-day horizon by a wide margin.
Study 2: Karpicke and Blunt, 2011
Five years later, Karpicke and Janell Blunt published a follow-up in Science that took aim at a more sophisticated rival: concept mapping. (Karpicke & Blunt, 2011, Science 331:772–775.) Concept mapping (drawing nodes-and-edges diagrams of how ideas relate) is what good students do when they're trying to study harder than re-reading. It's what teachers recommend when they want to recommend something other than re-reading. It feels rigorous, structured, generative.
The design was tighter than the 2006 study. Undergraduates studied science texts (about ecosystems and bodily functions), then were randomly assigned to one of four conditions: study once, study repeatedly, build an elaborate concept map of the material, or do free-recall retrieval practice. One week later, everyone took the same final test, scored both on verbatim memory and on inferential questions: the kind that require integrating ideas, not just regurgitating sentences.
The retrieval-practice group beat the concept-mapping group by about 50% on the final test. That margin held for both verbatim and inferential questions, which mattered: the standard objection to retrieval practice is that it's good for shallow recall but bad for deep understanding. The Karpicke and Blunt data say the opposite — retrieval-practice students did better on the questions that required deep, integrative reasoning than the students who had spent their study time literally drawing diagrams of how the ideas integrated.
There was a second, almost more telling result. Students predicted, before the final test, how well they expected to do. The concept-mappers predicted highest. The retrieval-practice group predicted lowest. Then the retrieval-practice group did best. The students were systematically wrong about which strategy was helping them, in the direction that matters: the strategy that felt worst was the one that worked best.
Call it the judgement-of-learning problem in miniature. It's the reason this whole field of research exists as a field. People can't tell from the inside which study technique is producing durable learning. Their introspection says re-read; the data say recall.
Study 3: Adesope, Trevisan and Sundararajan, 2017
Single experiments are suggestive. Meta-analyses are how a finding becomes settled. The most comprehensive synthesis of the testing-effect literature is Olusola Adesope, Dominic Trevisan, and Narayankripa Sundararajan's Rethinking the Use of Tests: A Meta-Analysis of Practice Testing, published in Review of Educational Research in 2017.
They aggregated 118 experiments from over four decades of research, encompassing more than 15,000 participants. Each study was coded for design features: kind of practice test, kind of final test, retention interval, age of learner, type of material, presence of feedback. Then they calculated effect sizes.
The headline number is the standardised mean difference between practice testing and re-reading: Hedges' g = 0.61 when both conditions had equal exposure time, with a 95% confidence interval that does not come anywhere close to zero. When practice testing was compared against no further study at all, the effect was g = 0.51. Against more elaborate alternatives like concept mapping or guided study, practice testing still won, with effect sizes in the 0.50 to 0.93 range depending on conditions.
For context, in education research, an effect size of g = 0.4 is considered roughly equivalent to one year of additional schooling. The testing effect, in this meta-analysis, is larger than that across more than a hundred independent studies.
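To make the statistic concrete: Hedges' g is the difference between two group means divided by their pooled standard deviation, multiplied by a small-sample correction factor. A minimal sketch, with made-up group numbers chosen only to land near the meta-analysis's headline value (they are not data from the paper):

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference with Hedges' small-sample correction."""
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = mean1 - mean2
    d /= pooled_sd                            # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # Hedges' correction factor J
    return d * correction

# Hypothetical scores out of 100: retrieval practice vs re-reading, 60 per group
print(round(hedges_g(56, 18, 60, 45, 18, 60), 2))  # → 0.61
```

An 11-point gap on a 100-point test, against a typical spread of 18 points, is what an effect of this size looks like in practice.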
Now the moderator analyses, where it gets interesting. Practice testing worked better when:
- The format of the practice test matched the format of the final test (no surprise).
- Feedback was given between the practice test and the final test (also unsurprising; feedback turns errors into corrections instead of fossilised mistakes).
- The retention interval was longer (a critical detail: the longer you wait, the bigger the gap between recall practice and re-reading, because re-reading's advantage is purely short-term).
Practice testing worked across age groups, across material types, across school subjects, across testing formats. There is no major moderator that makes the effect disappear. This is what people mean when they say a finding holds up.
Why this is hard to apply
If retrieval practice is this powerful, why doesn't everyone do it?
Two reasons. The first is the judgement-of-learning gap. Robert Bjork and Elizabeth Bjork have written for decades about desirable difficulty: the principle that techniques which produce more durable learning often feel worse in the moment than techniques that produce less. Re-reading is fluent. Recall is halting. The fluency is what people use as their internal signal for "this is working." It's a bad signal. In Karpicke and Blunt's study, the strategy with the worst self-rating was the one with the best result.
The second reason is even simpler: retrieval practice requires something to retrieve from. You need a question, a prompt, a cloze deletion. Something that asks before it tells. Most learning material is structured the other way. A lecture, a chapter, a video: these are all answer-first formats. To turn them into retrieval practice you have to do work — extract the questions, sit with them later, attempt them before peeking. This is what flashcard systems like Anki are for, and it's also why most people who hear about Anki download it, make twenty cards, and quit. The system works; the discipline of authoring cards on top of already studying is what doesn't.
That gap surfaces in the broader research literature on study skills too. The most influential review is John Dunlosky's Improving Students' Learning With Effective Learning Techniques, published in Psychological Science in the Public Interest in 2013. Dunlosky and his co-authors rated ten common study techniques as low, moderate, or high utility. Practice testing got the highest grade. Distributed practice (spacing, the topic for another post) also got the highest grade. Re-reading and highlighting, the two most popular techniques, got the lowest. The paper has been a touchstone in education research for over a decade and the conclusions have not been seriously challenged.
What to do about it this week
The good news is the research is not asking you to install software, build an Anki deck, or change your life. It's asking you to do one thing differently.
When you finish a chapter, lecture, or article you actually want to remember, close it and try to write down what you just learned, before you check. Three or four key ideas, in your own words, with examples if you can. Then look back and see what you missed. That's the whole move.
Do it once and you've done more for retention than re-reading the chapter twice. Do it on a delay — a few hours later, or the next morning — and you've done more still, because retrieval at a delay does double duty: you're testing yourself and spacing the practice.
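One way to make the delay mechanical is to write down the review dates the moment you finish studying. A sketch with expanding gaps; the specific intervals are illustrative, not prescribed by the studies above:

```python
from datetime import date, timedelta

def review_dates(studied_on, gaps_in_days=(1, 3, 7)):
    """Dates on which to attempt recall *before* re-opening the notes.
    The expanding gaps are illustrative, not taken from the studies above."""
    return [studied_on + timedelta(days=g) for g in gaps_in_days]

# Study on a Sunday; attempt recall Monday, Wednesday, and the next Sunday
print(review_dates(date(2024, 1, 7)))
```

Each date is an appointment with the closed book: try to recall first, then check what you missed.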
Phrase your notes as questions rather than statements when you can. "What was the headline effect size in the Adesope meta-analysis?" is a card. "Adesope found g = 0.61" is not — it's a sentence you can re-read but not retrieve from. The question is the engine. The statement is decoration.
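The question-first shape is easy to see as data: each note is a prompt paired with an answer, and the prompt is all you look at until you have attempted it. A minimal sketch; the structure and names are illustrative, not any particular app's format:

```python
# A question-first deck: each note is stored as (prompt, answer),
# and the prompt must be attempted before the answer is revealed.
deck = [
    ("What was the headline effect size in the Adesope meta-analysis?",
     "Hedges' g = 0.61"),
    ("At what retention interval did re-reading still beat retrieval practice?",
     "Five minutes; the curves crossed by two days"),
]

def drill(deck, attempt_fn):
    """One pass over the deck: call attempt_fn(question) first, then pair
    the attempt with the answer for self-grading."""
    return [(q, attempt_fn(q), a) for q, a in deck]

# Example pass with a placeholder attempt function
results = drill(deck, lambda q: "(your recall attempt here)")
```

The same fact stored as a statement has no retrieval step at all; stored as a prompt, it forces one every time it comes up.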
If you're studying with someone, take turns asking each other to explain what you just covered, without the book. The Feynman technique — explaining out loud as if to a beginner — is retrieval practice in conversational form. It works for the same reason.
If you fail to recall something, that's not wasted effort. The act of trying and failing, then seeing the answer, is better for memory than seeing the answer without the failed attempt. This is one of the more counter-intuitive results in the literature and it has been replicated many times. Your wrong answers are doing work.
How Strive does this
We built Strive around the assumption that the lesson isn't where the learning happens — the lesson is where the exposure happens. The learning happens later, when you try to remember it. So every lesson on Strive seeds questions that come back to you on a schedule, in your own queue, phrased in a way you can't pattern-match. You don't make the cards. You don't keep the schedule. You read the lesson, and then on Tuesday a question shows up that asks what you read on Sunday, and you answer it before the answer is shown. That's it.
That's why the research above sits at the centre of the product instead of being a side feature. The studies are forty years old in places. The numbers haven't moved. No reason to keep ignoring them.
If you want to read deeper: Roediger and Karpicke's 2006 paper is short, clean, and a good entry point. Adesope, Trevisan and Sundararajan's 2017 meta-analysis is the rigorous synthesis. Dunlosky et al.'s 2013 review ranks ten common study techniques against the evidence and is the best single reference if you only read one. Bjork and Bjork's chapter on desirable difficulty is the framing that makes the rest make sense.