Improving Learning Efﬁciency of Factual Knowledge in Medical Education D. Dante Yeh, MD, FACS,* and Yoon Soo Park, PhD† *
Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; and †Department of Medical Education, University of Illinois at Chicago, College of Medicine, Chicago, Illinois

OBJECTIVE: The purpose of this review is to synthesize recent literature relating to factual knowledge acquisition and retention and to explore its applications to medical education.

RESULTS: Distributing, or spacing, practice is superior to massed practice (i.e., cramming). Testing, compared with restudy, produces better learning and knowledge retention, especially when the test uses a retrieval format (short answer) rather than a recognition format (multiple choice). Feedback is important to solidify the testing effect.

CONCLUSIONS: Learning basic factual knowledge is often overlooked and underappreciated in medical education. Implications for applying these concepts to smartphones are discussed; smartphones are owned by the majority of medical trainees and can be used to deploy evidence-based educational methods to greatly enhance learning of factual knowledge. (J Surg ]:]]]-]]]. © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.)

KEY WORDS: medical education, testing effect, distributive practice, feedback, smartphones

COMPETENCIES: Medical Knowledge
Correspondence: Inquiries to Daniel Dante Yeh, MD, FACS, Department of Surgery, Massachusetts General Hospital, Harvard Medical School, 165 Cambridge St. 810, Boston, MA 02114; fax: (617) 726-9121; E-mail: [email protected], [email protected]

INTRODUCTION

With recent interest in clinical simulation and higher-order reasoning, learning basic factual knowledge is often overlooked and underappreciated in medical education. Yet without a solid knowledge base, the practice of medicine would not be possible. The sheer volume of information encountered by medical trainees is enormous. In addition to learning the jargon of medical terminology (comparable to learning a foreign language), students must learn and retain a body of factual knowledge in both basic and applied
sciences. Unfortunately, without repeated rehearsal, most of this knowledge is forgotten shortly after graduation.1-3 This is true for both basic science and clinically relevant knowledge.4,5

In the United States, the first 2 years of medical education are generally spent in the classroom setting (preclinical), whereas starting from the third year, students are immersed in the hospital environment. While working full time in the clinical setting, students are expected to independently continue reading and learning new material. With less time for formal didactic instruction, more emphasis is placed on self-directed learning. This expectation is further amplified during residency, when "protected" educational time is minimal and work demands are maximal. After residency, physicians are required to continually learn and master new material to maintain proficiency and certification.

Gaining acceptance into medical school and matching into a surgical residency require a high level of achievement, intelligence, and effort. Yet, learners are often uninformed regarding the optimal methods of study. Metacognition, defined as "knowing about knowing," describes a learner's awareness of his or her own knowledge strengths and deficits. It underlies the choices a learner makes in deciding what to study, when to study, how much to study, and when to stop studying. Research demonstrates that, up through the university level, learners have an inaccurate understanding of how learning works.
In a survey of college students, respondents were 8 times more likely to use inferior study strategies, with only 1% using the best strategy.6 The most frequently chosen method continues to be rereading of prior lecture notes and textbook chapters, a highly inefficient and nondurable strategy.6,7 Left on their own, some adult learners will restudy material up to 7 times longer than their colleagues, with minimal gain in accuracy, an effect termed "labor in vain."8

In the past 2 decades, there have been great strides in education research and advances in the understanding of how learners acquire and retain new knowledge. Most of these experiments have been performed in a laboratory setting under highly controlled and artificial conditions.
Journal of Surgical Education © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved. 1931-7204/$30.00 http://dx.doi.org/10.1016/j.jsurg.2015.03.012
However, recent translational studies under real world settings (including in medical education) have replicated and extended these ﬁndings. The purpose of this review is to synthesize recent literature relating to factual knowledge acquisition and retention and explore its applications to medical education. Speciﬁcally, we will discuss how the spacing effect, the testing effect, the choice of test format, and the provision of feedback all contribute to dramatic gains in learning.
THE SPACING EFFECT

The spacing effect describes the phenomenon whereby, for an equal amount of cumulative study time, spacing or distributing the study sessions with intervening time gaps (the interstudy interval [ISI]) results in superior knowledge acquisition and long-term retention compared with massing the study sessions together; the final examination is given after a retention interval (RI). Descriptions of the spacing effect stretch back as far as the classic Ebbinghaus experiments of 1885, and there are hundreds of reports confirming the effect in applications ranging from learning a new language, to mathematical concepts, to surgical skills training, and in subjects ranging from rats, to young children, to cognitively impaired adults, to physicians.9-12

Yet, despite this long history and oft-replicated results, the spacing effect is rarely used intentionally in real-world settings as a form of strategic studying.13 Massed repetition (i.e., "cramming") is still by far the dominant strategy, chosen because of flawed metacognitive judgments based on feelings of fluency or illusions of competency.14,15 In actuality, such strategies that maximize performance early in the learning process are associated with faster forgetting; conversely, strategies resulting in slower early knowledge or skill acquisition are associated with longer retention.16

One theory for why the spacing effect exists is the encoding, or contextual, variability theory.17,18 In this account, when an item is learned or studied, a memory trace representing the learning context is also stored. Thus, multiple learning sessions produce varied memory traces and a richer set of paths or cues, allowing the learner more chances to recall the item.
An alternative theory is the deficient processing theory.19 In this theory, item presentations subsequent to the initial learning session receive less attention (because of fatigue, boredom, or overconfidence), and thus the repeated sessions are reduced in quality. Congruent with this theory is the concept of desirable difficulty.16 Within this paradigm, memory has 2 characteristics: storage strength and retrieval strength. The former represents a permanent property of the item (i.e., long-term retention), and the latter refers to the immediate accessibility of the item. According to this theory, the 2 strengths are negatively correlated during initial learning. Less effortful immediate retrieval produces smaller increments in storage strength, and
conversely, more difficult retrieval (lower retrieval strength) results in higher gains in storage strength.

Until recently, most research on the spacing effect was conducted in the laboratory with very short gaps (minutes), short RIs (minutes to hours), and no long-term follow-up. For longer RIs in real-world settings, recent studies provide empirical evidence to help decide how long the ISI should be. In a large randomized study of 1354 subjects (mean age 34 y, standard deviation = 11; 72% women), Cepeda et al. presented the subjects with 32 obscure but true trivia facts (e.g., "What European nation consumes the most spicy Mexican food?" Answer: "Norway"). Using 26 combinations of ISIs and RIs, these investigators reported that the optimal ratio for maximal retention varies as a function of RI.20 To quote the authors, "if you want to know the optimal distribution of your study time, you need to decide how long you wish to remember something." As the RI increases from 1 week to 1 year, the optimal ISI in absolute terms increases, but as a percentage of RI, the optimal ISI decreases from 20% to 40% down to 5% to 10%.20,21 All else being equal, simply restudying material at the optimal time can more than double the amount retained. The Cepeda study also demonstrated that the costs of too-short spacing were greater than those of too-long spacing; put another way, the benefits of spacing may override the costs of the increased error rates associated with lengthening the ISI.22

In a study of dental students taking a theoretical radiological science course, subjects were randomized to usual practice or spaced testing (test questions delivered by email 14 d after the lecture). Nkenke et al. found that subjects in the spaced education group actually spent far more time (216 min vs 58 min, p < 0.001) engaged in the learning context.
Using a validated questionnaire, the authors found that the spaced education group rated the didactics significantly better (mean 4.9 of 6 vs 4.1, p = 0.034) and felt that their learning needs were better fulfilled (4.6 vs 1.4, p = 0.02).23 These findings are significant and meaningful when considering that the amount of time spent studying is directly correlated with scores on the American Board of Surgery In-Training Examination (ABSITE) and that most surveyed residents were dissatisfied with traditional study methods.24
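The ISI:RI guidance reported by Cepeda et al. can be turned into a back-of-the-envelope calculation. The sketch below interpolates between the two reported anchor points (roughly 20%-40% of RI at a 1-week RI, falling to 5%-10% at a 1-year RI); the log-linear interpolation between those anchors is our own illustrative assumption, not a formula from the study.

```python
import math

def optimal_isi_days(ri_days):
    """Rough optimal inter-study interval (ISI) band, in days, for a
    given retention interval (RI).

    Anchors follow the trend reported by Cepeda et al.: ~20%-40% of RI
    when RI = 1 week, falling to ~5%-10% when RI = 1 year. The
    log-linear interpolation between the anchors is an assumption made
    for illustration only.
    """
    ri = min(365.0, max(7.0, ri_days))  # clamp to the studied range
    t = (math.log(ri) - math.log(7.0)) / (math.log(365.0) - math.log(7.0))
    lo = 0.20 + t * (0.05 - 0.20)       # lower fraction: 20% -> 5%
    hi = 0.40 + t * (0.10 - 0.40)       # upper fraction: 40% -> 10%
    return (ri_days * lo, ri_days * hi)

# For a 1-year retention goal, restudy after roughly 18-37 days:
print(optimal_isi_days(365))  # (18.25, 36.5)
```

Consistent with the observation that too-long spacing costs less than too-short spacing, a scheduler built on such a band might prefer its upper end.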
THE TESTING EFFECT

It was previously believed that learning occurred only during active study; testing was regarded as formative assessment, measuring learning that had previously occurred but itself contributing little, if anything, to learning. Thus, it is common practice for a learner to study and test a topic until an item is successfully recalled and then drop that item from future study and testing. It seems intuitively obvious that once an item is mastered, attention should be focused
on learning new, unmastered items rather than on continually retesting mastered material. Recently, however, Karpicke and Roediger25 reported a series of elegant experiments challenging this practice. In studying Swahili-English word pairs, subjects were randomized to 1 of 4 conditions: in the standard (ST) condition, the entire 40-item list was studied and tested repeatedly regardless of recall mastery; in the second condition (SNT), once a word pair was recalled, it was dropped from further study but continually tested; in the third condition (STN), recalled pairs were continually studied but dropped from further testing; and in the fourth condition (SNTN), recalled word pairs were dropped from all future study and tests. The results were both counterintuitive and dramatic: long-term retention was highest when successfully recalled material was continually retested. Furthermore, repeated study after a single successful recall did not produce any further learning. To quote the authors, "… there is a striking absence of any benefit of repeated studying once an item could be recalled from memory." Congruent with previous studies demonstrating lack of metacognition, these students were completely unaware of the beneficial effects of retrieval practice.

Repeated studying may result in higher rates of initial learning if measured after a very short time, but testing substantially slows the inevitable rate of forgetting. Figure 1 illustrates the results of university-level students learning Swahili for the first time: initial test scores are higher for repeated study when challenged minutes after the last exposure, but the trend reverses in as little as 2 days. Most learners continue to choose studying because they are fooled by initially high learning rates.

One theory for why the testing effect occurs is termed transfer-appropriate processing.
In this theory, the act of testing as practice more closely approximates the conditions on the ﬁnal test (as opposed to simply rereading the correct
answer) and thus increases success by mimicking the required cognitive processes. A second theory for why the testing effect exists, termed the elaborative retrieval hypothesis, is based on the previously described concept of desirable difficulty: increasing the difficulty forces the learner to engage more elaborate cognitive processes and thus strengthens item storage strength.

Early investigations of the testing effect were conducted under artificially controlled experimental conditions involving word lists. Recent studies have confirmed the testing effect in classroom settings with educationally relevant material.26-28 Some have questioned whether the testing effect encourages only superficial memorization of answers without deeper understanding. However, testing benefits have been demonstrated for both factual and conceptual learning, transfer to different domains, learning of natural concepts, and visuospatial map learning.29-31 Testing has been shown to benefit the teaching of anatomy to university-level students and to significantly improve knowledge retention 6 months after a cardiac resuscitation course.32 Likewise, Larsen et al.33 showed that testing is superior to restudying for 6-month retention in first-year medical students learning clinical neurology topics. In another experiment, the same authors randomized pediatric and emergency medicine residents to repeated study or repeated testing of clinical topics (myasthenia gravis and status epilepticus). Testing led to a significant increase in knowledge retention 6 months after the initial teaching session (39% vs 26%, p < 0.001).34 End-user buy-in was strong: almost 90% indicated a willingness to continue taking tests after study completion. This experiment is also informative in that it demonstrates the rapid rate of forgetting (74% at 6 mo) after initially "mastering" new material through studying alone.
The testing effect can thus be summarized as follows: learners tested on material will remember it better in the future than if they had not been tested.35 The testing effect remains even when the time spent studying or testing is equalized,36 and it is further enhanced and amplified as the number of test opportunities (vs restudy opportunities) increases.35-37 Of note, the combination of testing and spacing seems to be particularly effective; a series of experiments demonstrated that a single well-timed test was more effective than 7 massed study repetitions.38
FIGURE 1. Mean percentage correct cued recall on a ﬁnal test as a function of learning condition (study vs test) and retention interval (2 min vs 48 h). RO, read only; MC, multiple choice; SA, short answer. (Reproduced with permission from Toppino and Cohen.65)
When considering test format, the 2 options most feasible for wide-scale use are retrieval (i.e., short answer) and recognition (i.e., multiple-choice questions [MCQ]). The overwhelming majority of study aids in medical education currently use multiple-choice questions, mainly because of ease of grading and perceived objectivity. Research demonstrates, however, that compared with
recognition testing, retrieval or recall testing produces superior long-term retention at the expense of short-term acquisition.39 One theory proposes that engagement of retrieval processes modifies the memory trace, increasing the probability of future successful retrieval. This has been termed the generation effect, whereby items self-generated by learners are better retained than items merely presented to be read.40 This effect occurs even if the learner fails to correctly generate the item, provided that corrective feedback is given. For example, McDaniel et al.26 demonstrated that facts missed on a short-answer quiz (compared with missed multiple-choice items or additional restudy) were more likely to be answered correctly on a later examination.

A second theory, the retrieval effort hypothesis, states that compared with recognition testing, retrieval testing induces more effortful cognition by the learner, thus promoting deeper processing.41 Hence, retrieval testing is superior to recognition testing, and both forms of testing are superior to simple restudy. This theory is also grounded in the concept of desirable difficulty.16,41 The retrieval effort hypothesis has been confirmed through multiple studies demonstrating that more effortful processing causes lower rates of initial learning but higher rates of long-term retention.42

As stated earlier, the retrieval attempt need not be successful to be beneficial.43 In an experiment involving fictional trivia questions, subjects forced to answer a fictional trivia question (for which the answer was impossible to guess correctly) displayed significantly higher rates of learning compared with those simply reading the fictional trivia fact.44 The act of trying to answer an unanswerable question appears to enhance encoding of the "correct" answer. This effect was replicated in the same study using weakly associated word pairs (e.g., mouse-hole and train-caboose) instead of fictional facts.
In medical education, a spaced retrieval test (without feedback) signiﬁcantly improved long-term retention after a life support course.45 A potential disadvantage of using MCQ recognition format is interference with learning: subjects may select “lures” (incorrect answer choices) and acquire false knowledge.46-48 This is especially true if no feedback is provided and the learner has not previously studied the material.49 In an experiment, the learners endorsed a previously selected lure on a ﬁnal examination 30% of the time despite corrective feedback given after the initial MCQ test.50 Hence, exposing a learner to incorrect information (via MCQ lures) in the acquisition phase of learning may result in encoding of incorrect information. The effect is diminished (but not eliminated) with corrective feedback.
FEEDBACK

Although the testing effect occurs even in the absence of feedback,51 it is generally accepted that feedback enhances and optimizes the gains achievable from testing through 2 mechanisms: reinforcement of correct responses and correction of incorrect guesses.52 The type of feedback can vary, ranging from simply indicating correct or incorrect (partial feedback), to showing the examinee the correct answer, to explaining the rationale for the correct answer. Research has shown that partial feedback is least effective, so more elaboration is generally recommended.53 So powerful is the combination of testing with feedback that some researchers have reported that it overrides any merits of prior study of the material!49 Without time constraints, there is little debate about whether feedback is beneficial or harmful; the overwhelming majority of research demonstrates that feedback is, at best, vastly beneficial and, at worst, learning neutral.26 A typical finding is illustrated in Figure 2, where providing feedback more than doubles the retention percentage on a delayed test. Note also that the combination of testing and feedback quadruples the retention percentage compared with no testing.

FIGURE 2. Proportion of correct responses on the final cued-recall test as a function of initial learning condition. (Reprinted with permission from Butler et al.57)

The 3 main theories for why the feedback effect occurs are the Learning Theory, the Spacing Theory, and the Interference Perseveration Hypothesis. In the Learning Theory, feedback is thought to reinforce correct responses; similar to operant conditioning, reinforcement is believed to be most effective when provided immediately after the stimulus. In a contrary argument, the Spacing Theory views feedback as simply an additional restudy opportunity; therefore, delaying feedback should provide improved memory benefits over immediate feedback. Both of these theories consider only feedback after correct responses. The third theory, the Interference Perseveration Hypothesis, focuses instead on feedback after incorrectly answered items. It is assumed that an incorrect response interferes with the correct answer when feedback is given immediately. Delay of feedback allows the learner to forget the
incorrect response and opens the way for encoding of the correct answer. Thus, the beneficial effects of feedback may depend on whether the item was answered correctly. For a correctly answered item, feedback may reinforce the correct answer and serve as a form of spaced restudy; however, some researchers have recently reported that feedback after a correct response makes little difference, whether immediate or delayed.54 For an incorrectly answered item, feedback serves as error correction, preventing the encoding of false information.

Significantly, a learner's perception of the difficulty of an item and his or her subsequent performance (correct or incorrect) influence the success of learning. Responses may be correct or incorrect and may be given with high or low confidence. Incorrect items answered with either low or high confidence benefit from feedback through correction of memory errors; without feedback, the error is almost never corrected (Fig. 3). When learners commit an error on a perceived easy item (high confidence), they are much more likely to improve after feedback on a subsequent test than when the error is committed on a perceived difficult item (low confidence).55 This phenomenon has been termed hypercorrection and is believed to occur because the feedback is surprising and unexpected to the learner, stimulating closer attention and interest.56

Correct items answered with high confidence are unlikely to derive additional benefit from feedback. However, a correct item answered with low confidence may be a lucky guess and has been described as an error of metacognitive monitoring. Thus, although there is no memory error to correct, feedback in these cases can serve to correct metacognitive errors, improving the accuracy of future confidence judgments.49 Hence, in both correct items answered with low confidence and incorrect items answered with high confidence, there is a discrepancy
between the objective correctness of the response and the learner’s subjective assessment. Providing feedback in these cases illuminates this metacognitive discrepancy and stimulates additional cognitive processing by the learner.56,57
ADDITIONAL STUDIES IN MEDICAL EDUCATION

Several investigators have demonstrated that the spacing effect and testing effect can powerfully affect learning in medical education, producing gains in knowledge detectable up to 2 years later.58,59 Kerfoot et al. enrolled 724 of the 1000 urology residents in the United States and divided them into 2 cohorts, the first receiving spaced education on prostate-testis histopathology and massed (bolus) education via web-based teaching modules on bladder-kidney histopathology, and the second cohort receiving the inverse. Congruent with laboratory education experiments, massed education increased scores more than spaced education in the short term; however, when measuring diagnostic skill retention in the long term (18-45 wk), spaced education improved learning efficiency fourfold. Compliance was high, and most of the enrollees completed all teaching modules. When surveyed, 99% of respondents requested to participate in future studies on spaced education.60 A similar study demonstrated comparable spacing effects for the teaching of core physical examination topics to medical students.61 Once again, acceptance was high, with 85% of participants recommending the teaching method for the following year's class of students.
FIGURE 3. Proportion of correct responses on the final cued-recall test. (Reprinted with permission from Butler et al.57)
Surveys report that 85% of Accreditation Council for Graduate Medical Education (ACGME) trainees, and up to 97% of surgical residents, own a smartphone.24,62 However, the most commonly used smartphone applications are drug guides and medical calculators.63 Most education applications are little more than electronic textbooks, video libraries, or question banks. Very few, if any, offer spaced study based on an optimal ISI:RI ratio, retrieval-format testing, or adaptive algorithms that adjust content according to user performance. With high existing market penetration, frequent end-user engagement, and enormous computing power, the smartphone is an ideal tool for delivering strategic testing to promote efficient learning and retention of factual knowledge.

One could easily imagine an application that provides a daily "push" notification, similar to a text message alert, at a specified time, such as during the morning subway commute. Upon accepting this push notification, the learner is presented with a retrieval-format question (e.g., The [ ] nerve provides cutaneous innervation to the plantar side of the heel.). Upon entering the answer ("tibial"), the user is next asked to rate the difficulty of the item (judgment of learning) and is then given immediate feedback. Based on the
response (correct or incorrect) and the perceived difficulty, the item is then processed through an algorithm to determine the optimal time to retest it. Response times can also be measured and factored into the algorithm. Additionally, if the learner is studying for a particular test (e.g., the ABSITE), the date of that examination could be entered into the application; a few simple calculations could then determine the optimal ISI based on the known RI. For example, if the ABSITE is 300 days away, item retesting should occur in 15 to 30 days, but if the ABSITE is only 30 days away, item retesting should occur in 3 to 6 days. Thus, the application would adapt to the responses and judgments of the individual user as well as to the learning objectives. Importantly, these spacing decisions must NOT be left to the user, as research has repeatedly demonstrated that illusions of competency lead learners to make poor study decisions. Even when learners make the correct decision to retest an item, the timing of the retesting opportunity may not be optimized for maximal efficiency.

Although several applications already exist (Anki, Osmosis, Firecracker), they do not adequately exploit all of the strategies for optimal learning. For example, Anki (Ankitects Pty. Ltd., Sydney) allows the user to choose the spacing interval and thus functions only as an electronic form of flashcards. Osmosis (Knowledge Diffusion, Inc., Baltimore, MD), a program designed by medical students for medical students, provides push notifications and social game mechanics. The application measures judgments of learning and features questions in multiple-choice, true-false, and label-matching formats, as well as flashcards that test recall. Like Anki, Osmosis allows the user to choose when to retake test items, though it also has the option to automatically generate quizzes based on prior performance.
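The exam-anchored scheduling described above can be sketched in a few lines. This toy scheduler is our own illustration (it assumes the midpoint of the 5%-10% ISI:RI band and halves the interval after a miss); it is not the algorithm of any application named here.

```python
from datetime import date, timedelta

def next_review(today, exam_date, was_correct):
    """Pick the next retest date from the known retention interval (RI).

    Illustrative assumptions: the inter-study interval (ISI) is set to
    ~7.5% of the days remaining until the exam (the midpoint of the
    5%-10% band), halved when the item was just answered incorrectly.
    """
    ri_days = (exam_date - today).days
    isi = ri_days * 0.075
    if not was_correct:
        isi /= 2                     # missed items come back sooner
    return today + timedelta(days=max(1, round(isi)))

# ABSITE 300 days out: a correct item returns in ~3 weeks,
# a missed one in ~11 days.
today = date(2015, 4, 1)
exam = today + timedelta(days=300)
print(next_review(today, exam, True), next_review(today, exam, False))
```

A real scheduler would also fold in the judgment-of-learning rating and response time mentioned above; the point here is only that the RI-to-ISI arithmetic is trivial once the exam date is known.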
Firecracker (Firecracker, Inc., Cambridge, MA), a medical education application, presents learners with both MCQ and retrieval questions. However, the retrieval questions are sometimes open-ended (e.g., How does G6PD deficiency lead to hemolysis?) and do not necessarily have 1 correct answer. Firecracker does, however, use a proprietary algorithm, based on inputs such as judgment of difficulty, number of repetitions, and elapsed time, to improve the efficiency of retesting. The validity of these applications has not been extensively studied using objective scientific methods, and additional research is required to evaluate and refine existing algorithms. Techniques for optimal selection of items using the learner's prior performance history can be calibrated using existing methods, such as computerized adaptive testing based on item response theory models.64
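As a sketch of what computerized adaptive testing based on item response theory could look like in such an application: under a one-parameter (Rasch) model, item information peaks when item difficulty matches the learner's estimated ability, so the selector simply serves the unseen item closest in difficulty to the current ability estimate. The item bank and difficulty values below are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) IRT model: probability of a correct response for a
    learner of ability theta on an item of difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pick_next_item(theta, difficulties, seen):
    """Serve the unseen item whose difficulty is closest to theta.

    For the Rasch model, Fisher information is p * (1 - p), which is
    maximal when b == theta, so nearest-difficulty selection is the
    maximum-information rule."""
    unseen = {name: b for name, b in difficulties.items() if name not in seen}
    return min(unseen, key=lambda name: abs(unseen[name] - theta))

# Hypothetical item bank (difficulties in logits):
bank = {"tibial nerve": -0.5, "G6PD hemolysis": 1.2, "coagulation cascade": 0.1}
print(p_correct(0.0, 0.0))                                # 0.5 at matched difficulty
print(pick_next_item(0.0, bank, seen={"tibial nerve"}))   # coagulation cascade
```

In practice the ability estimate would be updated after each response and the selection combined with the spacing constraints discussed earlier; calibrating real difficulty values requires response data, as the text notes.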
CONCLUSIONS

This brief review has highlighted several concepts that should be considered in medical education. One common theme underlying all of them is desirable difficulty. It is readily apparent that learners should choose learning strategies that challenge them and require deeper cognitive processing. Adult learners receive little, if any, instruction on how best to study, and most of them will choose strategies based on flawed metacognitive judgments. The most commonly employed methods (rereading text, using tests only as formative assessments, and cramming) are highly inefficient and result in rapid forgetting. The use of testing, especially retrieval format with feedback, combined with optimally distributed spacing, can greatly enhance learning and retention. Early experiments in medical and surgical education demonstrate that such strategies are feasible, effective, durable, and well accepted. Existing technologies, such as the ubiquitous smartphone, can be used to provide evidence-based strategic testing to maximize efficiency and retention.

REFERENCES

1. Custers EJ. Long-term retention of basic science knowledge: a review study. Adv Health Sci Educ Theory Pract. 2010;15(1):109-128.
2. Custers EJ, Ten Cate OT. Very long-term retention of basic science knowledge in doctors after graduation. Med Educ. 2011;45(4):422-430.
3. Rico E, Galindo J, Marset P. Remembering biochemistry: a study of the patterns of loss of biochemical knowledge in medical students. Biochem Educ. 1981;9(3):100-102.
4. Berden HJ, Willems FF, Hendrick JM, Pijls NH, Knape JT. How frequently should basic cardiopulmonary resuscitation training be repeated to maintain adequate skills? Br Med J. 1993;306(6892):1576-1577.
5. Ali J, Cohen R, Adam R, et al. Attrition of cognitive and trauma management skills after the Advanced Trauma Life Support (ATLS) course. J Trauma. 1996;40(6):860-866.
6. Karpicke JD, Butler AC, Roediger HL 3rd. Metacognitive strategies in student learning: do students practise retrieval when they study on their own? Memory. 2009;17(4):471-479.
7. Carrier LM. College students' choices of study strategies. Percept Mot Skills. 2003;96(1):54-56.
8. Nelson TO, Leonesio RJ. Allocation of self-paced study time and the "labor-in-vain effect". J Exp Psychol Learn Mem Cogn. 1988;14(4):676-686.
9. Rohrer D, Taylor K. The effects of overlearning and
distributed practise on the retention of mathematics knowledge. Appl Cognit Psychol. 2006;20:1209-1224.
10. Cepeda NJ, Pashler H, Vul E, Wixted JT, Rohrer D. Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychol Bull. 2006;132(3):354-380.
11. Bahrick HP, Bahrick LE, Bahrick AS, Bahrick PE. Maintenance of foreign language vocabulary and the spacing effect. Psychol Sci. 1993;4(5):316-321.
12. Bird S. Effects of distributed practice on the acquisition of second language English syntax. Appl Psycholinguistics. 2010;31:635-650.
13. Dempster FN. The spacing effect: a case study in the failure to apply the results of psychological research. Am Psychol. 1988;43(8):627-634.
14. Koriat A, Bjork RA, Sheffer L, Bar SK. Predicting one's own forgetting: the role of experience-based and theory-based processes. J Exp Psychol Gen. 2004;133(4):643-656.
15. Son LK. Spacing one's study: evidence for a metacognitive control strategy. J Exp Psychol Learn Mem Cogn. 2004;30(3):601-604.
16. Schmidt RA, Bjork RA. New conceptualizations of practice: common principles in three paradigms suggest new concepts for training. Psychol Sci. 1992;3:207-217.
17. Glenberg AM. Component-levels theory of the effects of spacing of repetitions on recall and recognition. Mem Cognit. 1979;7(2):95-112.
18. Melton A. The situation with respect to the spacing of repetitions and memory. J Verbal Learn Verbal Behav. 1970;9(5):596-606.
19. Dempster F. Spacing effects and their implications for theory and practice. Educ Psychol Rev. 1989;1(4):309-330.
20. Cepeda NJ, Vul E, Rohrer D, Wixted JT, Pashler H. Spacing effects in learning: a temporal ridgeline of optimal retention. Psychol Sci. 2008;19(11):1095-1102.
21. Cepeda NJ, Coburn N, Rohrer D, Wixted JT, Mozer MC, Pashler H. Optimizing distributed practice: theoretical analysis and practical implications. Exp Psychol. 2009;56(4):236-246.
22. Pashler H, Zarow G, Triplett B. Is temporal spacing of tests helpful even when it inflates error rates? J Exp Psychol Learn Mem Cogn. 2003;29(6):1051-1057.
23. Nkenke E, Vairaktaris E, Bauersachs A, et al. Spaced
24. Yeh DD, Hwabejire JO, Imam A, et al. A survey of study habits of general surgery residents. J Surg Educ. 2013;70(1):15-23.
25. Karpicke JD, Roediger HL 3rd. The critical importance of retrieval for learning. Science. 2008;319(5865):966-968.
26. McDaniel MA, Anderson JL, Debrish MH, Morrisette N. Testing the testing effect in the classroom. Eur J Cognit Psychol. 2007;19(4/5):494-513.
27. McDaniel MA, Roediger HL 3rd, McDermott KB. Generalizing test-enhanced learning from the laboratory to the classroom. Psychon Bull Rev. 2007;14(2):200-206.
28. Roediger HL 3rd, Agarwal PK, McDaniel MA, McDermott KB. Test-enhanced learning in the classroom: long-term improvements from quizzing. J Exp Psychol Appl. 2011;17(4):382-395.
29. Butler AC. Repeated testing produces superior transfer
of learning relative to repeated studying. J Exp Psychol Learn Mem Cogn. 2010;36(5):1118-1133. 30. Jacoby LL, Wahlheim CN, Coane JH. Test-enhanced
learning of natural concepts: effects on recognition memory, classiﬁcation, and metacognition. J Exp Psychol Learn Mem Cogn. 2010;36(6):1441-1451. 31. Carpenter SK, Pashler H. Testing beyond words: using
tests to enhance visuospatial map learning. Psychon Bull Rev. 2007;14(3):474-478 http://dx.doi.org/doi: 10.1002/ase.1489 [Epub a head of print]. 32. Dobson J.L., Linderholm T. The effect of selected
“desirable difﬁculties” on the ability to recall anatomy information. Anat Sci Educ. 2014. 33. Larsen DP, Butler AC, Roediger HL 3rd. Comparative
effects of test-enhanced learning and self-explanation on long-term retention. Med Educ. 2013;47(7): 674-682. 34. Larsen DP, Butler AC, Roediger HL 3rd. Repeated
testing improves long-term retention relative to repeated study: a randomised controlled trial. Med Educ. 2009;43(12):1174-1181. 35. Roediger HL III, Karpicke JD. Test-enhanced learn-
ing: taking memory tests improves long-term retention. Psychol Sci. 2006;17(3):249-255. 36. Carpenter SK, Pashler H, Wixted JT, Vul E. The
effects of tests on learning and forgetting. Mem Cognit. 2008;36(2):438-448.
education activates students in a theoretical radiological science course: a pilot study. BMC Med Educ. 2012;12:32.
37. Kuo TH. Investigations of the testing effect. Am J
Journal of Surgical Education Volume ]/Number ] ] 2015
38. Karpicke JD, Roediger HL. Expanding retrival pro-
52. Butler AC, Karpicke JD, Roediger HL 3rd. The effect of
motes short-term retention, but equally spaced retrieval enhances long-term retention. J Exp Psychol. 2007;33:704-719.
type and timing of feedback on learning from multiplechoice tests. J Exp Psychol Appl. 2007;13(4):273-281.
39. Butler AC, Henry LR. Testing improves long-term
retention in a simulated classroom setting. Eur J Cognit Psychol. 2007;19:514-527. 40. Carrier M, Pashler H. The inﬂuence of retrieval on
retention. Mem Cognit. 1992;20(6):633-642. 41. Pyc AP, Katherinw AR. Testing the retrieval effort
hypothesis: does greater difﬁculty correctly recalling information lead to higher levels of memory? J Mem Lang. 2009;60:437-447. 42. Carpenter SK. Cue strength as a moderator of the
testing effect: the beneﬁts of elaborative retrieval. J Exp Psychol Learn Mem Cogn. 2009;35(6):1563-1569. 43. Kornell N. Attempting to answer a meaningful ques-
tion enhances subsequent learning even when feedback is delayed. J Exp Psychol Learn Mem Cogn. 2014;40 (1):106-114. 44. Kornell N, Hays MJ, Bjork RA. Unsuccessful retrieval
attempts enhance subsequent learning. J Exp Psychol Learn Mem Cogn. 2009;35(4):989-998. 45. Turner NM, Scheffer R, Custers E, Cate OT. Use of
unannounced spaced telephone testing to improve retention of knowledge after life-support courses. Med Teach. 2011;33(9):731-737. 46. Roediger HL 3rd, Marsh EJ. The positive and negative
consequences of multiple-choice testing. J Exp Psychol Learn Mem Cogn. 2005;31(5):1155-1159. 47. Butler AC, Marsh EJ, Goode MK, Roediger MK, III
HL. When additional multiple-choice lures aid versus hinder later memory. Appl Cognit Psychol. 2006 941-956.
53. Bangert-Drowns RL, Kulik CC, Kulik JA, Morgan
MT. The Instructional Effect of Feedback in Test-Like Events. Revi Educ Res. 1991;61(2):213-238. 54. Pashler H, Cepeda NJ, Wixted JT, Rohrer D. When
does feedback facilitate learning of words? J Exp Psychol Learn Mem Cogn. 2005;31(1):3-8. 55. Butterﬁeld B, Metcalfe J. Errors committed with high
conﬁdence are hypercorrected. J Exp Psychol Learn Mem Cogn. 2001;27(6):1491-1494. 56. Fazio LK, Marsh EJ. Surprising feedback improves
later memory. Psychon Bull Rev. 2009;16(1):88-92. 57. Butler AC, Karpicke JD, Roediger HL. Correcting a
metacognitive error: feedback increases retention of low-conﬁdence correct responses. J Exp Psychol Learn Mem Cogn. 2008;34(4):918-928. 58. Kerfoot BP. Learning beneﬁts of on-line spaced
education persist for 2 years. J Urol. 2009;181(6): 2671-2673. 59. Kerfoot BP, Baker HE, Koch MO, Connelly D,
Joseph DB, Ritchey ML. Randomized, controlled trial of spaced education to urology residents in the United States and Canada. J Urol. 2007;177(4):1481-1487. 60. Kerfoot BP, Brotschi E. Online spaced education to
teach urology to medical students: a multi-institutional randomized trial. Am J Surg. 2009;197(1):89-95. 61. Kerfoot BP, Armstrong EG, O’Sullivan PN. Interac-
tive spaced-education to teach the physical examination: a randomized controlled trial. J Gen Intern Med. 2008;23(7):973-978. 62. Ozdalga E, Ozdalga A, Ahuja N. The smartphone in
memorial consequences of multiple-choice testing. Psychon Bull Rev. 2007;14(2):194-199.
medicine: a review of current and potential use among physicians and students. J Med Internet Res. 2012;14 (5):e128.
49. Butler AC, Roediger HL 3rd. Feedback enhances the
63. Franko OI, Tirrell TF. Smartphone app use among
48. Marsh EJ, Roediger HL 3rd, Bjork RA, Bjork EL. The
positive effects and reduces the negative effects of multiplechoice testing. Mem Cognit. 2008;36(3):604-616.
medical providers in ACGME training programs. J Med Syst. 2012;36(5):3135-3139.
50. Kang SM,, Kathleen BM, Roediger HL III. Test
64. van der Lindern WP, PJ, Pashley PJ. Item selection
format and corrective feedback modify the effect of testing on long-term retention. Eur JCognit Psychol. 2007;19:528-558.
and ability estimation in adaptive testing.Computerized Adaptive Testing: Theory and Practice. Boston: Kluwer; 2000. 1-25.
51. Karpicke JD, Roediger HL 3rd. Is expanding retrieval
65. Toppino TC, Cohen MS. The testing effect and the
a superior method for learning text materials? Mem Cognit. 2010;38(1):116-124.
retention interval: questions and answers. Exp Psychol. 2009;56(4):252-257.
Journal of Surgical Education Volume ]/Number ] ] 2015