Improving Learning Efﬁciency of Factual Knowledge in Medical Education D. Dante Yeh, MD, FACS,* and Yoon Soo Park, PhD† *
Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; and †Department of Medical Education, University of Illinois at Chicago, College of Medicine, Chicago, Illinois

OBJECTIVE: The purpose of this review is to synthesize recent literature relating to factual knowledge acquisition and retention and to explore its applications to medical education.

RESULTS: Distributing, or spacing, practice is superior to massed practice (i.e., cramming). Testing, compared with restudy, produces better learning and knowledge retention, especially when the test uses a retrieval format (short answer) rather than a recognition format (multiple choice). Feedback is important to solidify the testing effect.

CONCLUSIONS: Learning basic factual knowledge is often overlooked and underappreciated in medical education. Implications for applying these concepts to smartphones are discussed; smartphones are owned by the majority of medical trainees and can be used to deploy evidence-based educational methods to greatly enhance learning of factual knowledge. (J Surg ]:]]]-]]]. © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.)

KEY WORDS: medical education, testing effect, distributive practice, feedback, smartphones

COMPETENCIES: Medical Knowledge
Correspondence: Inquiries to Daniel Dante Yeh, MD, FACS, Department of Surgery, Massachusetts General Hospital, Harvard Medical School, 165 Cambridge St. 810, Boston, MA 02114; fax: (617) 726-9121; E-mail: [email protected], [email protected]

INTRODUCTION

With recent interest in clinical simulation and higher-order reasoning, learning basic factual knowledge is often overlooked and underappreciated in medical education. Yet without a solid knowledge base, the practice of medicine would not be possible. The sheer volume of information encountered by medical trainees is enormous. In addition to learning the jargon of medical terminology (comparable to learning a foreign language), students must learn and retain a body of factual knowledge in both basic and applied
sciences. Unfortunately, without repeated rehearsal, most of this knowledge is forgotten shortly after graduation.1-3 This is true for both basic science and clinically relevant knowledge.4,5

In the United States, the first 2 years of medical education are generally spent in the classroom setting (preclinical), whereas starting from the third year, students are immersed in the hospital environment. While working full time in the clinical setting, students are expected to independently continue reading and learning new material. With less time for formal didactic instruction, more emphasis is placed on self-directed learning. This expectation is further amplified during residency, when "protected" educational time is minimal and work demands are maximal. After residency, physicians are required to continually learn and master new material to maintain proficiency and certification.

Gaining acceptance into medical school and matching into a surgical residency require a high level of achievement, intelligence, and effort. Yet, learners are often uninformed regarding the optimal methods of study. Metacognition, defined as "knowing about knowing," describes a learner's awareness of his or her own knowledge strengths and deficits. It underlies the choices a learner makes in deciding what to study, when to study, how much to study, and when to stop studying. Research demonstrates that, up through the university level, learners have an inaccurate understanding of how learning works.
In a survey of college students, respondents were 8 times more likely to use inferior study strategies, with only 1% using the best strategy.6 The most frequently chosen method continues to be rereading of prior lecture notes and textbook chapters, a highly inefficient and nondurable strategy.6,7 Left on their own, some adult learners will restudy material up to 7 times longer than their colleagues, with minimal gain in accuracy, an effect termed "labor in vain."8

In the past 2 decades, there have been great strides in education research and advances in the understanding of how learners acquire and retain new knowledge. Most of these experiments have been performed in a laboratory setting under highly controlled and artificial conditions.
Journal of Surgical Education © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved. 1931-7204/$30.00 http://dx.doi.org/10.1016/j.jsurg.2015.03.012
However, recent translational studies under real world settings (including in medical education) have replicated and extended these ﬁndings. The purpose of this review is to synthesize recent literature relating to factual knowledge acquisition and retention and explore its applications to medical education. Speciﬁcally, we will discuss how the spacing effect, the testing effect, the choice of test format, and the provision of feedback all contribute to dramatic gains in learning.
THE SPACING EFFECT

The spacing effect describes the phenomenon whereby, for an equal amount of cumulative study time, spacing or distributing the study sessions with intervening time gaps (the interstudy interval [ISI]) results in superior knowledge acquisition and long-term retention compared with massing the study sessions together; the final examination is given after a retention interval (RI). Descriptions of the spacing effect stretch back as far as the classic Ebbinghaus experiments of 1885, and there are hundreds of reports confirming the effect in applications ranging from learning a new language, to mathematical concepts, to surgical skills training, and in subjects ranging from rats, to young children, to cognitively impaired adults, to physicians.9-12

Yet, despite this long history and oft-replicated results, the spacing effect is rarely used intentionally in real-world settings as a form of strategic studying.13 Massed repetition (i.e., "cramming") is still by far the dominant strategy, chosen because of flawed metacognitive judgments based on feelings of fluency or illusions of competency.14,15 In actuality, such strategies that maximize performance early in the learning process are associated with faster forgetting; conversely, strategies resulting in slower early knowledge or skill acquisition are associated with longer retention.16

One theory for why the spacing effect exists is the encoding, or contextual, variability theory.17,18 In this account, when an item is learned or studied, a memory trace representing the learning context is also stored. Thus, multiple learning sessions produce varied memory traces and a richer set of paths or cues, allowing the learner more chances to recall the item.
An alternative theory is the deficient processing theory.19 In this theory, item presentations subsequent to the initial learning session receive less attention (because of fatigue, boredom, or overconfidence), and thus the repeated sessions are reduced in quality. Congruent with this theory is the concept of desirable difficulty.16 Within this paradigm, memory has 2 characteristics: storage strength and retrieval strength. The former represents a permanent property of the item (i.e., long-term retention), and the latter refers to the immediate accessibility of the item. According to this theory, the 2 strengths are negatively correlated during initial learning. Less effortful immediate retrieval produces smaller increments in storage strength, and
conversely, more difficult retrieval (lower retrieval strength) results in higher gains in storage strength.

Until recently, most research on the spacing effect was conducted in the laboratory with very short gaps (minutes), short RIs (minutes to hours), and no long-term follow-up. For longer RIs in real-world settings, recent studies provide empirical evidence to help decide how long the ISI should be. In a large randomized study of 1354 subjects (mean age 34 y, standard deviation = 11; 72% women), Cepeda et al. presented the subjects with 32 obscure but true trivia facts (e.g., "What European nation consumes the most spicy Mexican food?" Answer: "Norway"). Using 26 combinations of ISIs and RIs, these investigators reported that the optimal ratio for maximal retention varies as a function of RI.20 To quote the authors, "if you want to know the optimal distribution of your study time, you need to decide how long you wish to remember something." As the RI increases from 1 week to 1 year, the optimal ISI in absolute terms increases, but as a percentage of RI, the optimal ISI decreases from 20% to 40% down to 5% to 10%.20,21 All else being equal, simply restudying material at the optimal time can more than double the amount retained. The Cepeda study also demonstrated that the costs of too-short spacing were greater than those of too-long spacing; put another way, the benefits of spacing may override the costs of the increased error rates associated with lengthening the ISI.22

In a study of dental students taking a theoretical radiological science course, subjects were randomized to usual practice or spaced testing (test questions delivered by email 14 d after the lecture). Nkenke et al. found that subjects in the spaced education group actually spent far more time (216 min vs 58 min, p < 0.001) engaged in the learning context.
Using a validated questionnaire, the authors found that the spaced education group rated the didactics significantly better (mean 4.9 of 6 vs 4.1, p = 0.034) and felt that their learning needs were better fulfilled (4.6 vs 1.4, p = 0.02).23 These findings are significant and meaningful when considering that the amount of time spent studying is directly correlated with scores on the American Board of Surgery In-Training Examination (ABSITE) and that most surveyed residents were dissatisfied with traditional study methods.24
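The ISI:RI guidance reported by Cepeda et al. can be turned into a back-of-the-envelope calculation. The sketch below interpolates between the two reported anchor points (roughly 20%-40% of RI at a 1-week RI, falling to 5%-10% at a 1-year RI); the log-linear interpolation between those anchors is our own illustrative assumption, not a formula from the study.

```python
import math

def optimal_isi_days(ri_days):
    """Rough optimal inter-study interval (ISI) band, in days, for a
    given retention interval (RI).

    Anchors follow the trend reported by Cepeda et al.: ~20%-40% of RI
    when RI = 1 week, falling to ~5%-10% when RI = 1 year. The
    log-linear interpolation between the anchors is an assumption made
    for illustration only.
    """
    ri = min(365.0, max(7.0, ri_days))  # clamp to the studied range
    t = (math.log(ri) - math.log(7.0)) / (math.log(365.0) - math.log(7.0))
    lo = 0.20 + t * (0.05 - 0.20)       # lower fraction: 20% -> 5%
    hi = 0.40 + t * (0.10 - 0.40)       # upper fraction: 40% -> 10%
    return (ri_days * lo, ri_days * hi)

# For a 1-year retention goal, restudy after roughly 18-37 days:
print(optimal_isi_days(365))  # (18.25, 36.5)
```

Consistent with the observation that too-long spacing costs less than too-short spacing, a scheduler built on such a band might prefer its upper end.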
THE TESTING EFFECT

It was previously believed that learning occurred only during active study; testing was regarded as formative assessment, measuring learning that had previously occurred but itself contributing little, if anything, to learning. Thus, it is common practice for a learner to study and test a topic until an item is successfully recalled and then drop that item from future study and testing. It seems intuitively obvious that once an item is mastered, attention should be focused
on learning new, unmastered items rather than on continually retesting mastered material. Recently, however, Karpicke and Roediger25 reported a series of elegant experiments challenging this practice. In studying Swahili-English word pairs, subjects were randomized to 1 of 4 conditions: in the standard (ST) condition, the entire 40-item list was studied and tested repeatedly regardless of recall mastery; in the second condition (SNT), once a word pair was recalled, it was dropped from further study but continually tested; in the third condition (STN), recalled pairs were continually studied but dropped from further testing; and in the fourth condition (SNTN), recalled word pairs were dropped from all future study and tests. The results were both counterintuitive and dramatic: long-term retention was highest when successfully recalled material was continually retested. Furthermore, repeated study after a single successful recall did not produce any further learning. To quote the authors, "… there is a striking absence of any benefit of repeated studying once an item could be recalled from memory." Congruent with previous studies demonstrating lack of metacognition, these students were completely unaware of the beneficial effects of retrieval practice.

Repeated studying may result in higher rates of initial learning if measured after a very short time, but testing substantially slows the inevitable rate of forgetting. Figure 1 illustrates the results of university-level students learning Swahili for the first time: initial test scores are higher for repeated study when challenged minutes after the last exposure, but the trend reverses in as little as 2 days. Most learners continue to choose studying because they are fooled by initially high learning rates.

One theory for why the testing effect occurs is termed transfer-appropriate processing.
In this theory, the act of testing as practice more closely approximates the conditions on the ﬁnal test (as opposed to simply rereading the correct
answer) and thus increases success by mimicking the required cognitive processes. A second theory for why the testing effect exists, termed the elaborative retrieval hypothesis, is based on the previously described concept of desirable difficulty: increasing the difficulty forces the learner to engage more elaborate cognitive processes and thus strengthens item storage strength.

Early investigations of the testing effect were conducted under artificially controlled experimental conditions involving word lists. Recent studies have confirmed the testing effect in classroom settings with educationally relevant material.26-28 Some have questioned whether the testing effect encourages only superficial memorization of answers without deeper understanding. However, testing benefits have been demonstrated for both factual and conceptual learning, transfer to different domains, learning of natural concepts, and visuospatial map learning.29-31 Testing has been shown to benefit the teaching of anatomy to university-level students and to significantly improve knowledge retention 6 months after a cardiac resuscitation course.32 Likewise, Larsen et al.33 showed that testing is superior to restudying for 6-month retention in first-year medical students learning clinical neurology topics. In another experiment, the same authors randomized pediatric and emergency medicine residents to repeated study or repeated testing of clinical topics (myasthenia gravis and status epilepticus). Testing led to a significant increase in knowledge retention 6 months after the initial teaching session (39% vs 26%, p < 0.001).34 End-user buy-in was strong: almost 90% indicated a willingness to continue taking tests after study completion. This experiment is also informative in that it demonstrates the rapid rate of forgetting (74% at 6 mo) after initially "mastering" new material through studying alone.
The testing effect can thus be summarized as follows: learners tested on material will remember it better in the future than if they had not been tested.35 The testing effect remains even when the time spent studying or testing is equalized,36 and it is further enhanced and amplified as the number of test opportunities (vs restudy opportunities) increases.35-37 Of note, the combination of testing and spacing seems to be particularly effective; a series of experiments demonstrated that a single well-timed test was more effective than 7 massed study repetitions.38
FIGURE 1. Mean percentage correct cued recall on a ﬁnal test as a function of learning condition (study vs test) and retention interval (2 min vs 48 h). RO, read only; MC, multiple choice; SA, short answer. (Reproduced with permission from Toppino and Cohen.65)
When considering test format, the 2 options most feasible for wide-scale use are retrieval (i.e., short answer) and recognition (i.e., multiple-choice questions [MCQ]). The overwhelming majority of study aids in medical education currently use multiple-choice questions, mainly because of ease of grading and perceived objectivity. Research demonstrates, however, that compared with
recognition testing, retrieval or recall testing produces superior long-term retention at the expense of short-term acquisition.39 One theory proposes that engagement of retrieval processes modifies the memory trace, increasing the probability of future successful retrieval. This has been termed the generation effect, whereby items self-generated by learners are better retained than items merely presented to be read.40 This effect occurs even if the learner fails to correctly generate the item, provided that corrective feedback is given. For example, McDaniel et al.26 demonstrated that facts missed on a short-answer quiz (compared with missed multiple-choice items or additional restudy) were more likely to be answered correctly on a later examination.

A second theory, the retrieval effort hypothesis, states that compared with recognition testing, retrieval testing induces more effortful cognition by the learner, thus promoting deeper processing.41 Hence, retrieval testing is superior to recognition testing, and both forms of testing are superior to simple restudy. This theory is also grounded in the concept of desirable difficulty.16,41 The retrieval effort hypothesis has been confirmed through multiple studies demonstrating that more effortful processing causes lower rates of initial learning but higher rates of long-term retention.42

As stated earlier, the retrieval attempt need not be successful to be beneficial.43 In an experiment involving fictional trivia questions, subjects forced to answer a fictional trivia question (for which the answer was impossible to guess correctly) displayed significantly higher rates of learning compared with those simply reading the fictional trivia fact.44 The act of trying to answer an unanswerable question appears to enhance encoding of the "correct" answer. This effect was replicated in the same study using weakly associated word pairs (e.g., mouse-hole and train-caboose) instead of fictional facts.
In medical education, a spaced retrieval test (without feedback) signiﬁcantly improved long-term retention after a life support course.45 A potential disadvantage of using MCQ recognition format is interference with learning: subjects may select “lures” (incorrect answer choices) and acquire false knowledge.46-48 This is especially true if no feedback is provided and the learner has not previously studied the material.49 In an experiment, the learners endorsed a previously selected lure on a ﬁnal examination 30% of the time despite corrective feedback given after the initial MCQ test.50 Hence, exposing a learner to incorrect information (via MCQ lures) in the acquisition phase of learning may result in encoding of incorrect information. The effect is diminished (but not eliminated) with corrective feedback.
FEEDBACK

Although the testing effect occurs even in the absence of feedback,51 it is generally accepted that feedback enhances and optimizes the gains achievable from testing through 2 mechanisms: reinforcement of correct responses and correction of incorrect guesses.52 The type of feedback can vary, ranging from simply indicating correct or incorrect (partial feedback), to showing the examinee the correct answer, to explaining the rationale for the correct answer. Research has shown that partial feedback is least effective, so more elaboration is generally recommended.53 So powerful is the combination of testing with feedback that some researchers have reported that it overrides any merits of prior study of the material!49 Without time constraints, there is little debate about whether feedback is beneficial or harmful; the overwhelming majority of research demonstrates that feedback is, at best, vastly beneficial and, at worst, learning neutral.26 A typical finding is illustrated in Figure 2, where providing feedback more than doubles the retention percentage on a delayed test. Note also that the combination of testing and feedback quadruples the retention percentage compared with no testing.

FIGURE 2. Proportion of correct responses on the final cued-recall test as a function of initial learning condition. (Reprinted with permission from Butler et al.57)

The 3 main theories for why the feedback effect occurs are the Learning Theory, the Spacing Theory, and the Interference Perseveration Hypothesis. In the Learning Theory, feedback is thought to reinforce correct responses; similar to operant conditioning, reinforcement is believed to be most effective when provided immediately after the stimulus. In a contrary argument, the Spacing Theory views feedback as simply an additional restudy opportunity; therefore, delaying feedback should provide improved memory benefits over immediate feedback. Both of these theories consider only feedback after correct responses. The third theory, the Interference Perseveration Hypothesis, focuses instead on feedback after incorrectly answered items. It is assumed that an incorrect response interferes with the correct answer when feedback is given immediately. Delay of feedback allows the learner to forget the
incorrect response and opens the way for encoding of the correct answer. Thus, the beneficial effects of feedback may depend on whether the item was answered correctly. For a correctly answered item, feedback may reinforce the correct answer and serve as a form of spaced restudy; however, some researchers have recently reported that feedback after a correct response makes little difference, whether immediate or delayed.54 For an incorrectly answered item, feedback serves as error correction, preventing the encoding of false information.

Significantly, a learner's perception of the difficulty of an item and his or her subsequent performance (correct or incorrect) influence the success of learning. Responses may be correct or incorrect and may be given with high or low confidence. Incorrect items answered with either low or high confidence benefit from feedback through correction of memory errors; without feedback, the error is almost never corrected (Fig. 3). When learners commit an error on a perceived easy item (high confidence), they are much more likely to improve after feedback on a subsequent test than when the error is committed on a perceived difficult item (low confidence).55 This phenomenon has been termed hypercorrection and is believed to occur because the feedback is surprising and unexpected to the learner, stimulating closer attention and interest.56

Correct items answered with high confidence are unlikely to derive additional benefit from feedback. However, a correct item answered with low confidence may be a lucky guess and has been described as an error of metacognitive monitoring. Thus, although there is no memory error to correct, feedback in these cases can serve to correct metacognitive errors, improving the accuracy of future confidence judgments.49 Hence, in both correct items answered with low confidence and incorrect items answered with high confidence, there is a discrepancy
between the objective correctness of the response and the learner’s subjective assessment. Providing feedback in these cases illuminates this metacognitive discrepancy and stimulates additional cognitive processing by the learner.56,57
ADDITIONAL STUDIES IN MEDICAL EDUCATION

Several investigators have demonstrated that the spacing effect and testing effect can powerfully affect learning in medical education, producing gains in knowledge detectable up to 2 years later.58,59 Kerfoot et al. enrolled 724 of the 1000 urology residents in the United States and divided them into 2 cohorts, the first receiving spaced education on prostate-testis histopathology and massed (bolus) education via web-based teaching modules on bladder-kidney histopathology, and the second cohort receiving the inverse. Congruent with laboratory education experiments, massed education increased scores more than spaced education in the short term; however, when measuring diagnostic skill retention in the long term (18-45 wk), spaced education improved learning efficiency fourfold. Compliance was high, and most of the enrollees completed all teaching modules. When surveyed, 99% of respondents requested to participate in future studies on spaced education.60 A similar study demonstrated comparable spacing effects for the teaching of core physical examination topics to medical students.61 Once again, acceptance was high, with 85% of participants recommending the teaching method for the following year's class of students.
FIGURE 3. Proportion of correct responses on the final cued-recall test. (Reprinted with permission from Butler et al.57)
Surveys report that 85% of Accreditation Council for Graduate Medical Education (ACGME) trainees, and up to 97% of surgical residents, own a smartphone.24,62 However, the most commonly used smartphone applications are drug guides and medical calculators.63 Most education applications are little more than electronic textbooks, video libraries, or question banks. Very few, if any, offer spaced study based on an optimal ISI:RI ratio, retrieval-format testing, or adaptive algorithms that adjust content according to user performance. With high existing market penetration, frequent end-user engagement, and enormous computing power, the smartphone is an ideal tool for delivering strategic testing to promote efficient learning and retention of factual knowledge.

One could easily imagine an application that provides a daily "push" notification, similar to a text message alert, at a specified time, such as during the morning subway commute. Upon accepting this push notification, the learner is presented with a retrieval-format question (e.g., The [ ] nerve provides cutaneous innervation to the plantar side of the heel.). Upon entering the answer ("tibial"), the user is next asked to rate the difficulty of the item (judgment of learning) and is then given immediate feedback. Based on the
response (correct or incorrect) and the perceived difficulty, the item is then processed through an algorithm to determine the optimal time to retest it. Response times can also be measured and factored into the algorithm. Additionally, if the learner is studying for a particular test (e.g., the ABSITE), the date of that examination could be entered into the application; a few simple calculations could then determine the optimal ISI based on the known RI. For example, if the ABSITE is 300 days away, item retesting should occur in 15 to 30 days, but if the ABSITE is only 30 days away, item retesting should occur in 3 to 6 days. Thus, the application would adapt to the responses and judgments of the individual user as well as to the learning objectives. Importantly, these spacing decisions must NOT be left to the user, as research has repeatedly demonstrated that illusions of competency lead learners to make poor study decisions. Even when learners make the correct decision to retest an item, the timing of the retesting opportunity may not be optimized for maximal efficiency.

Although several applications already exist (Anki, Osmosis, Firecracker), they do not adequately exploit all of the strategies for optimal learning. For example, Anki (Ankitects Pty. Ltd., Sydney) allows the user to choose the spacing interval and thus functions only as an electronic form of flashcards. Osmosis (Knowledge Diffusion, Inc., Baltimore, MD), a program designed by medical students for medical students, provides push notifications and social game mechanics. The application measures judgments of learning and features questions in multiple-choice, true-false, and label-matching formats, as well as flashcards that test recall. Like Anki, Osmosis allows the user to choose when to retake test items, though it also has the option to automatically generate quizzes based on prior performance.
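The exam-anchored scheduling described above can be sketched in a few lines. This toy scheduler is our own illustration (it assumes the midpoint of the 5%-10% ISI:RI band and halves the interval after a miss); it is not the algorithm of any application named here.

```python
from datetime import date, timedelta

def next_review(today, exam_date, was_correct):
    """Pick the next retest date from the known retention interval (RI).

    Illustrative assumptions: the inter-study interval (ISI) is set to
    ~7.5% of the days remaining until the exam (the midpoint of the
    5%-10% band), halved when the item was just answered incorrectly.
    """
    ri_days = (exam_date - today).days
    isi = ri_days * 0.075
    if not was_correct:
        isi /= 2                     # missed items come back sooner
    return today + timedelta(days=max(1, round(isi)))

# ABSITE 300 days out: a correct item returns in ~3 weeks,
# a missed one in ~11 days.
today = date(2015, 4, 1)
exam = today + timedelta(days=300)
print(next_review(today, exam, True), next_review(today, exam, False))
```

A real scheduler would also fold in the judgment-of-learning rating and response time mentioned above; the point here is only that the RI-to-ISI arithmetic is trivial once the exam date is known.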
Firecracker (Firecracker, Inc., Cambridge, MA), a medical education application, presents learners with both MCQ and retrieval questions. However, the retrieval questions are sometimes open-ended (e.g., How does G6PD deficiency lead to hemolysis?) and do not necessarily have 1 correct answer. Firecracker does, however, use a proprietary algorithm, based on inputs such as judgment of difficulty, number of repetitions, and elapsed time, to improve the efficiency of retesting. The validity of these applications has not been extensively studied using objective scientific methods, and additional research is required to evaluate and refine existing algorithms. Techniques for optimal selection of items using the learner's prior performance history can be calibrated using existing methods, such as computerized adaptive testing based on item response theory models.64
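As a sketch of what computerized adaptive testing based on item response theory could look like in such an application: under a one-parameter (Rasch) model, item information peaks when item difficulty matches the learner's estimated ability, so the selector simply serves the unseen item closest in difficulty to the current ability estimate. The item bank and difficulty values below are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) IRT model: probability of a correct response for a
    learner of ability theta on an item of difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pick_next_item(theta, difficulties, seen):
    """Serve the unseen item whose difficulty is closest to theta.

    For the Rasch model, Fisher information is p * (1 - p), which is
    maximal when b == theta, so nearest-difficulty selection is the
    maximum-information rule."""
    unseen = {name: b for name, b in difficulties.items() if name not in seen}
    return min(unseen, key=lambda name: abs(unseen[name] - theta))

# Hypothetical item bank (difficulties in logits):
bank = {"tibial nerve": -0.5, "G6PD hemolysis": 1.2, "coagulation cascade": 0.1}
print(p_correct(0.0, 0.0))                                # 0.5 at matched difficulty
print(pick_next_item(0.0, bank, seen={"tibial nerve"}))   # coagulation cascade
```

In practice the ability estimate would be updated after each response and the selection combined with the spacing constraints discussed earlier; calibrating real difficulty values requires response data, as the text notes.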
CONCLUSIONS

This brief review has highlighted several concepts that should be considered in medical education. One common theme underlying all of them is desirable difficulty. It is readily apparent that learners should choose learning strategies that challenge them and require deeper cognitive processing. Adult learners receive little, if any, instruction on how best to study, and most of them will choose strategies based on flawed metacognitive judgments. The most commonly employed methods (rereading text, using tests only as formative assessments, and cramming) are highly inefficient and result in rapid forgetting. The use of testing, especially retrieval format with feedback, combined with optimally distributed spacing, can greatly enhance learning and retention. Early experiments in medical and surgical education demonstrate that such strategies are feasible, effective, durable, and well accepted. Existing technologies, such as the ubiquitous smartphone, can be used to provide evidence-based strategic testing to maximize efficiency and retention.

REFERENCES

1. Custers EJ. Long-term retention of basic science knowledge: a review study. Adv Health Sci Educ Theory Pract. 2010;15(1):109-128.
2. Custers EJ, Ten Cate OT. Very long-term retention of basic science knowledge in doctors after graduation. Med Educ. 2011;45(4):422-430.
3. Rico E, Galindo J, Marset P. Remembering biochemistry: a study of the patterns of loss of biochemical knowledge in medical students. Biochem Educ. 1981;9(3):100-102.
4. Berden HJ, Willems FF, Hendrick JM, Pijls NH, Knape JT. How frequently should basic cardiopulmonary resuscitation training be repeated to maintain adequate skills? Br Med J. 1993;306(6892):1576-1577.
5. Ali J, Cohen R, Adam R, et al. Attrition of cognitive and trauma management skills after the Advanced Trauma Life Support (ATLS) course. J Trauma. 1996;40(6):860-866.
6. Karpicke JD, Butler AC, Roediger HL 3rd. Metacognitive strategies in student learning: do students practise retrieval when they study on their own? Memory. 2009;17(4):471-479.
7. Carrier LM. College students' choices of study strategies. Percept Mot Skills. 2003;96(1):54-56.
8. Nelson TO, Leonesio RJ. Allocation of self-paced study time and the "labor-in-vain effect". J Exp Psychol Learn Mem Cogn. 1988;14(4):676-686.
9. Rohrer D, Taylor K. The effects of overlearning and
distributed practise on the retention of mathematics knowledge. Appl Cognit Psychol. 2006;20:1209-1224.
10. Cepeda NJ, Pashler H, Vul E, Wixted JT, Rohrer D. Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychol Bull. 2006;132(3):354-380.
11. Bahrick HP, Bahrick LE, Bahrick AS, Bahrick PE. Maintenance of foreign language vocabulary and the spacing effect. Psychol Sci. 1993;4(5):316-321.
12. Bird S. Effects of distributed practice on the acquisition of second language English syntax. Appl Psycholinguistics. 2010;31:635-650.
13. Dempster FN. The spacing effect: a case study in the failure to apply the results of psychological research. Am Psychol. 1988;43(8):627-634.
14. Koriat A, Bjork RA, Sheffer L, Bar SK. Predicting one's own forgetting: the role of experience-based and theory-based processes. J Exp Psychol Gen. 2004;133(4):643-656.
15. Son LK. Spacing one's study: evidence for a metacognitive control strategy. J Exp Psychol Learn Mem Cogn. 2004;30(3):601-604.
16. Schmidt RA, Bjork RA. New conceptualizations of practice: common principles in three paradigms suggest new concepts for training. Psychol Sci. 1992;3:207-217.
17. Glenberg AM. Component-levels theory of the effects of spacing of repetitions on recall and recognition. Mem Cognit. 1979;7(2):95-112.
18. Melton A. The situation with respect to the spacing of repetitions and memory. J Verbal Learn Verbal Behav. 1970;9(5):596-606.
19. Dempster F. Spacing effects and their implications for theory and practice. Educ Psychol Rev. 1989;1(4):309-330.
20. Cepeda NJ, Vul E, Rohrer D, Wixted JT, Pashler H. Spacing effects in learning: a temporal ridgeline of optimal retention. Psychol Sci. 2008;19(11):1095-1102.
21. Cepeda NJ, Coburn N, Rohrer D, Wixted JT, Mozer MC, Pashler H. Optimizing distributed practice: theoretical analysis and practical implications. Exp Psychol. 2009;56(4):236-246.
22. Pashler H, Zarow G, Triplett B. Is temporal spacing of tests helpful even when it inflates error rates? J Exp Psychol Learn Mem Cogn. 2003;29(6):1051-1057.
23. Nkenke E, Vairaktaris E, Bauersachs A, et al. Spaced
24. Yeh DD, Hwabejire JO, Imam A, et al. A survey of study habits of general surgery residents. J Surg Educ. 2013;70(1):15-23.
25. Karpicke JD, Roediger HL 3rd. The critical importance of retrieval for learning. Science. 2008;319(5865):966-968.
26. McDaniel MA, Anderson JL, Debrish MH, Morrisette N. Testing the testing effect in the classroom. Eur J Cognit Psychol. 2007;19(4/5):494-513.
27. McDaniel MA, Roediger HL 3rd, McDermott KB. Generalizing test-enhanced learning from the laboratory to the classroom. Psychon Bull Rev. 2007;14(2):200-206.
28. Roediger HL 3rd, Agarwal PK, McDaniel MA, McDermott KB. Test-enhanced learning in the classroom: long-term improvements from quizzing. J Exp Psychol Appl. 2011;17(4):382-395.
29. Butler AC. Repeated testing produces superior transfer
of learning relative to repeated studying. J Exp Psychol Learn Mem Cogn. 2010;36(5):1118-1133. 30. Jacoby LL, Wahlheim CN, Coane JH. Test-enhanced
learning of natural concepts: effects on recognition memory, classiﬁcation, and metacognition. J Exp Psychol Learn Mem Cogn. 2010;36(6):1441-1451. 31. Carpenter SK, Pashler H. Testing beyond words: using
tests to enhance visuospatial map learning. Psychon Bull Rev. 2007;14(3):474-478 http://dx.doi.org/doi: 10.1002/ase.1489 [Epub a head of print]. 32. Dobson J.L., Linderholm T. The effect of selected
“desirable difﬁculties” on the ability to recall anatomy information. Anat Sci Educ. 2014. 33. Larsen DP, Butler AC, Roediger HL 3rd. Comparative
effects of test-enhanced learning and self-explanation on long-term retention. Med Educ. 2013;47(7): 674-682. 34. Larsen DP, Butler AC, Roediger HL 3rd. Repeated
testing improves long-term retention relative to repeated study: a randomised controlled trial. Med Educ. 2009;43(12):1174-1181. 35. Roediger HL III, Karpicke JD. Test-enhanced learn-
ing: taking memory tests improves long-term retention. Psychol Sci. 2006;17(3):249-255. 36. Carpenter SK, Pashler H, Wixted JT, Vul E. The
effects of tests on learning and forgetting. Mem Cognit. 2008;36(2):438-448.
education activates students in a theoretical radiological science course: a pilot study. BMC Med Educ. 2012;12:32.
37. Kuo TH. Investigations of the testing effect. Am J
Journal of Surgical Education Volume ]/Number ] ] 2015
38. Karpicke JD, Roediger HL. Expanding retrival pro-
52. Butler AC, Karpicke JD, Roediger HL 3rd. The effect of
motes short-term retention, but equally spaced retrieval enhances long-term retention. J Exp Psychol. 2007;33:704-719.
type and timing of feedback on learning from multiplechoice tests. J Exp Psychol Appl. 2007;13(4):273-281.
39. Butler AC, Henry LR. Testing improves long-term
retention in a simulated classroom setting. Eur J Cognit Psychol. 2007;19:514-527. 40. Carrier M, Pashler H. The inﬂuence of retrieval on
retention. Mem Cognit. 1992;20(6):633-642. 41. Pyc AP, Katherinw AR. Testing the retrieval effort
hypothesis: does greater difﬁculty correctly recalling information lead to higher levels of memory? J Mem Lang. 2009;60:437-447. 42. Carpenter SK. Cue strength as a moderator of the
testing effect: the beneﬁts of elaborative retrieval. J Exp Psychol Learn Mem Cogn. 2009;35(6):1563-1569. 43. Kornell N. Attempting to answer a meaningful ques-
tion enhances subsequent learning even when feedback is delayed. J Exp Psychol Learn Mem Cogn. 2014;40 (1):106-114. 44. Kornell N, Hays MJ, Bjork RA. Unsuccessful retrieval
attempts enhance subsequent learning. J Exp Psychol Learn Mem Cogn. 2009;35(4):989-998. 45. Turner NM, Scheffer R, Custers E, Cate OT. Use of
unannounced spaced telephone testing to improve retention of knowledge after life-support courses. Med Teach. 2011;33(9):731-737. 46. Roediger HL 3rd, Marsh EJ. The positive and negative
consequences of multiple-choice testing. J Exp Psychol Learn Mem Cogn. 2005;31(5):1155-1159. 47. Butler AC, Marsh EJ, Goode MK, Roediger MK, III
HL. When additional multiple-choice lures aid versus hinder later memory. Appl Cognit Psychol. 2006 941-956.
53. Bangert-Drowns RL, Kulik CC, Kulik JA, Morgan
MT. The Instructional Effect of Feedback in Test-Like Events. Revi Educ Res. 1991;61(2):213-238. 54. Pashler H, Cepeda NJ, Wixted JT, Rohrer D. When
does feedback facilitate learning of words? J Exp Psychol Learn Mem Cogn. 2005;31(1):3-8. 55. Butterﬁeld B, Metcalfe J. Errors committed with high
conﬁdence are hypercorrected. J Exp Psychol Learn Mem Cogn. 2001;27(6):1491-1494. 56. Fazio LK, Marsh EJ. Surprising feedback improves
later memory. Psychon Bull Rev. 2009;16(1):88-92. 57. Butler AC, Karpicke JD, Roediger HL. Correcting a
metacognitive error: feedback increases retention of low-conﬁdence correct responses. J Exp Psychol Learn Mem Cogn. 2008;34(4):918-928. 58. Kerfoot BP. Learning beneﬁts of on-line spaced
education persist for 2 years. J Urol. 2009;181(6): 2671-2673. 59. Kerfoot BP, Baker HE, Koch MO, Connelly D,
Joseph DB, Ritchey ML. Randomized, controlled trial of spaced education to urology residents in the United States and Canada. J Urol. 2007;177(4):1481-1487. 60. Kerfoot BP, Brotschi E. Online spaced education to
teach urology to medical students: a multi-institutional randomized trial. Am J Surg. 2009;197(1):89-95. 61. Kerfoot BP, Armstrong EG, O’Sullivan PN. Interac-
tive spaced-education to teach the physical examination: a randomized controlled trial. J Gen Intern Med. 2008;23(7):973-978. 62. Ozdalga E, Ozdalga A, Ahuja N. The smartphone in
memorial consequences of multiple-choice testing. Psychon Bull Rev. 2007;14(2):194-199.
medicine: a review of current and potential use among physicians and students. J Med Internet Res. 2012;14 (5):e128.
49. Butler AC, Roediger HL 3rd. Feedback enhances the
63. Franko OI, Tirrell TF. Smartphone app use among
48. Marsh EJ, Roediger HL 3rd, Bjork RA, Bjork EL. The
positive effects and reduces the negative effects of multiplechoice testing. Mem Cognit. 2008;36(3):604-616.
medical providers in ACGME training programs. J Med Syst. 2012;36(5):3135-3139.
50. Kang SM,, Kathleen BM, Roediger HL III. Test
64. van der Lindern WP, PJ, Pashley PJ. Item selection
format and corrective feedback modify the effect of testing on long-term retention. Eur JCognit Psychol. 2007;19:528-558.
and ability estimation in adaptive testing.Computerized Adaptive Testing: Theory and Practice. Boston: Kluwer; 2000. 1-25.
51. Karpicke JD, Roediger HL 3rd. Is expanding retrieval
65. Toppino TC, Cohen MS. The testing effect and the
a superior method for learning text materials? Mem Cognit. 2010;38(1):116-124.
retention interval: questions and answers. Exp Psychol. 2009;56(4):252-257.
Journal of Surgical Education Volume ]/Number ] ] 2015