Adv in Health Sci Educ DOI 10.1007/s10459-014-9529-1

The benefits of testing for learning on later performance

Meghan M. McConnell · Christina St-Onge · Meredith E. Young

Received: 6 August 2013 / Accepted: 19 June 2014 © Springer Science+Business Media Dordrecht 2014

Abstract Testing has been shown to enhance retention of learned information beyond simple studying, a phenomenon known as test-enhanced learning (TEL). Research has shown that TEL effects are greater for tests that require the production of responses [e.g., short-answer questions (SAQs)] relative to tests that require the recognition of correct answers [e.g., multiple-choice questions (MCQs)]. High-stakes licensure examinations have recently differentiated MCQs that require the application of clinical knowledge (context-rich MCQs) from MCQs that rely on the recognition of "facts" (context-free MCQs). The present study investigated the influence of different types of educational activities (including studying, SAQs, context-rich MCQs and context-free MCQs) on later performance on a mock licensure examination. Fourth-year medical students (n = 224) from four Quebec universities completed four educational activities: one reading-based activity and three quiz-based activities (SAQs, context-rich MCQs, and context-free MCQs). We assessed the influence of the type of educational activity on students' subsequent performance on a mock licensure examination, which consisted of two types of context-rich MCQs: (1) verbatim replications of previous items and (2) items that tested the same learning objective but were new. Mean accuracy scores on the mock licensure exam were higher when intervening educational activities contained either context-rich MCQs (mean z-score = 0.40) or SAQs (M = 0.39) compared to context-free MCQs (M = -0.38) or study only items (M = -0.42; all p < 0.001). Higher mean scores were only present for verbatim items (p < 0.001).

M. M. McConnell (corresponding author)
Program for Educational Research and Development, Faculty of Clinical Epidemiology and Biostatistics, McMaster University, MDCL 3510, Hamilton, ON, Canada
e-mail: [email protected]

C. St-Onge
Department of Medicine and Centre de pédagogie des sciences de la santé, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Canada

M. E. Young
Centre for Medical Education and Department of Medicine, McGill University, Montreal, Canada


The benefit of testing was observed when intervening educational activities required either the generation of a response (SAQs) or the application of knowledge (context-rich MCQs); however, this effect was only observed for verbatim test items. These data provide evidence that context-rich MCQs and SAQs enhance learning through testing compared to context-free MCQs or studying alone. The extent to which these findings generalize beyond verbatim questions remains to be seen.

Keywords Assessment · Learning · Medical education · Question format · Testing effect · Test-enhanced learning

Introduction

In education, testing is used primarily for assessment purposes such as measuring students' learning, assigning grades and evaluating curriculum efficiency. However, the pedagogical benefit of testing extends beyond assessment, with recent research demonstrating that testing may actually promote learning and long-term retention of tested information ("test-enhanced learning"; for review, see Larsen et al. 2008; McDaniel et al. 2007a; Roediger and Karpicke 2006). More specifically, research has shown that participants who engage in a testing activity retain information better than participants who study the same material for an equivalent amount of time (Butler and Roediger 2007; Carrier and Pashler 1992; Karpicke and Roediger 2007; McDaniel et al. 2007b). If repeated testing is associated with greater long-term retention of information relative to repeated studying, then it is worthwhile to examine the pedagogical impact of testing for learning within the context of health professions education.

Indeed, within the past several years, researchers have demonstrated the applicability of test-enhanced learning (TEL) in health professions education. For example, Larsen et al. (2009) invited pediatric and emergency medicine residents to participate in an interactive teaching session and found that residents who were tested repeatedly showed greater retention of the learned materials 6 months after the initial teaching session relative to residents who studied repeatedly. Similarly, Kromann et al. (2009, 2010, 2011), in the context of a mandatory cardiac resuscitation course, found that individuals who were tested on a cardiac simulator at the end of the course performed better on a later skills-based test than students who simply studied the test material. The results of Kromann et al.'s studies suggest that TEL may not be limited to general knowledge but may also be of benefit in skills learning.

Given that repeated testing appears to promote long-term knowledge retention, it is surprising that no published research, to date, has examined TEL in a high-stakes medical evaluation setting. The purpose of the present study was to examine the influence of different educational interventions on later performance, specifically in the context of a mock high-stakes medical licensure examination. As Boulet (2008) asked, should healthcare educators be "teaching to test, or testing to teach"?

Test-enhanced learning: theoretical foundations

The act of testing does not appear to be a neutral event; rather, the process of retrieving information appears to have a fundamentally different effect on memory than the process of simply restudying or re-reviewing information, as evidenced by enhanced retention of tested information relative to studied information.


According to Wheeler et al. (2003), studying strengthens representations in memory, thereby enhancing memory acquisition (e.g., the 'entering' of information into memory); testing, on the other hand, enhances memory retrieval (e.g., the ability to access and use information from memory). Support for Wheeler's theoretical explanation of TEL comes from research showing that different question formats have differential effects on TEL. For example, tests that require the production of a response (e.g., short-answer, fill-in-the-blank) require more effortful retrieval of information than tests that require the recognition of the correct response (e.g., multiple-choice, true–false), and a growing body of research has shown that TEL effects are greater for tests that require the generation of a response relative to those that require the recognition of the correct response (Bjork and Bjork 1992; Butler and Roediger 2007; Kang et al. 2007). In other words, the more challenging it is for an individual to retrieve the information from memory (e.g., short-answer, fill-in-the-blank), the greater the TEL effects.

Despite the robustness of effortful retrieval effects on TEL, concern has been raised over whether the benefits of effortful retrieval persist when the format of the practice test does not match that of the final criterion test (Carpenter and DeLosh 2006; Hinze and Wiley 2011; Marsh et al. 2009; McDaniel et al. 2007b). For example, it may be feasible to give trainees short-answer questions (SAQs) within the context of a course, but many final criterion tests, such as licensure examinations, consist solely of multiple-choice questions (MCQs). In such cases, the retrieval benefits of tests that encourage recall (e.g., SAQs) may be overridden by transfer appropriate processes, whereby memory is enhanced when the cognitive processes evoked during the initial test match those evoked in the final examination (often equated to 'matching' the practice and test formats). If transfer appropriate processes outweigh the retrieval benefits of recall over recognition, then performance should be greater when the intervening study and test conditions match (e.g., MCQs for both quizzes and final examination) than when they mismatch (e.g., SAQ-based quizzes and MCQ-based final examination). Fortunately, research has shown that the benefit of repeated testing persists even when the format of the practice test does not match that of the final criterion test. For example, McDaniel et al. (2007b) found that taking SAQ quizzes led to larger performance gains on the final exam relative to MCQ quizzes, even though the final examination consisted solely of MCQs. This finding emphasizes the importance of effortful retrieval in the modulation of TEL and further illustrates the potential educational benefit of repeated testing to support student learning.

Relating test-enhanced learning to high-stakes medical licensure examinations

Over the past several years, many high-stakes licensing groups, such as the National Board of Medical Examiners (NBME) and the Medical Council of Canada (MCC), have started to use context-rich MCQs on written licensure exams that consist of a clinical case presentation and a lead-in question (Medical Council of Canada 2010; National Board of Medical Examiners 2002). These were developed in contrast to context-free MCQs, which typically consist of a specific question that requires the direct recall of information or facts (see Table 1 for examples).
Context-rich MCQs are thought to test the application of clinical and diagnostic knowledge, as opposed to context-free MCQs, which emphasize rote memorization (National Board of Medical Examiners 2002; Schuwirth et al. 2001). It has been speculated that context-rich MCQs require more effortful retrieval processes, allow for a clinical application of basic science knowledge, and should therefore, based on the principles of TEL, lead to greater memory retention relative to context-free MCQs (i.e., "factoid" questions; Wood 2009).


Furthermore, because high-stakes medical licensure examinations use primarily context-rich MCQs, it is important to examine how best to prepare students for examinations that contain context-rich MCQs; that is, is it better for students to study using quiz items that match the structure of the test format (e.g., context-rich MCQs), as suggested by theories of transfer appropriate processing (Morris et al. 1977), or quiz items designed to maximize retrieval effort (e.g., SAQs), as suggested by theories of effortful retrieval (Wheeler et al. 2003)?

The present study

The purpose of the present study was twofold. First, we set out to examine the influence of a variety of educational interventions on performance on a mock licensure examination. More specifically, we investigated the effects of context-free MCQs, context-rich MCQs, SAQs, and a study-only condition on performance on a mock licensure examination containing only context-rich MCQs (in order to more closely match the structure of national licensure exams). Theories of transfer appropriate processing would predict that students would answer more questions correctly on the mock licensure examination for content tested using identical question formats, and therefore the highest performance would be seen for content tested using context-rich MCQs. In contrast, effortful retrieval theories would predict that students' mean accuracy on the mock licensure examination would be highest for content studied using the most retrieval-challenging items, and therefore the highest performance would be seen for content tested using SAQs.

Second, the goal of any educational intervention is to be able to demonstrate improved understanding of a concept or, at the very least, improved performance when knowledge is applied in a novel context or clinical case. Consequently, we investigated whether TEL effects depend on the degree of content overlap between intervening quizzes and final performance measures. With very few exceptions, research on TEL has used test items that were identical across both the intervening and final examinations. To address this shortcoming, the mock licensure examination used in this study included items that were (1) identical to those seen during an intervening quiz and (2) mapped to the same learning objective but unique in terms of context and content.

Method

Context

The present study was designed to run during a four-week course constructed to help prepare students for the MCC Qualifying Examination Part I (MCCQE Part I) at the Université de Sherbrooke. The materials designed and used for this study were intended to provide additional study material for the national licensure examination, and the study was originally designed to provide one educational activity per week for 4 weeks. While the initial intention was to conduct this study solely at the Université de Sherbrooke, all four medical schools in Québec (Université de Sherbrooke, Université de Montréal, Université Laval and McGill University) expressed interest in having the study material available to their students. Therefore, all final-year medical students enrolled in any of the four medical schools in Québec were invited to participate, and the order of topics presented was matched to the curriculum order of the host institution (Université de Sherbrooke). In order to facilitate integration into the local curricula, the order of the topics remained the same across schools.


Table 1 Sample item types for each educational activity, and sample content-matched context-rich MCQ for one MCC educational objective

Context-rich multiple choice
A 40-year-old man is brought to the emergency department (ED) by ambulance. His wife found him unconscious on the floor of their bathroom. She reports that he had had several episodes of rectorrhagia over the last 6 h. The patient is tachycardic and pale; his blood pressure is 90/60 mmHg. The paramedics tell you that they found on the patient a card indicating that he was a Jehovah's Witness and a no-transfusion order. What is the next step in managing this patient?
(A) Transfuse the patient, because it's an emergency situation and his life is in danger
(B) * Initiate volume repletion and call for an emergency gastroscopy, but do not use blood products
(C) Ask his wife for substituted consent
(D) Ask the hospital's legal counsel to clarify the situation
(E) Transfer the patient to a colleague who has already dealt with a similar situation

Context-free multiple choice
Which of the following statements best applies to patient refusal of treatment?
(A) * A patient can refuse any treatment without discrimination and without this affecting any other care subsequently provided
(B) A patient can only refuse treatment for medical conditions that are not imminently life threatening
(C) A patient cannot refuse treatment that a reasonable individual would agree to under the same circumstances
(D) When the refusal of treatment is deemed improper, the patient can be considered incompetent and therefore substituted consent can be requested
(E) An incompetent patient cannot refuse treatment when substituted consent has been given

Study only item
Regarding refusal of treatment, a patient can refuse any treatment without discrimination and without this affecting any other care subsequently provided

Short answer question
A 40-year-old man is brought to the emergency department (ED) by ambulance. His wife found him unconscious on the floor of their bathroom. She reports that he had had several episodes of rectorrhagia over the last 6 h. The patient is tachycardic and pale; his blood pressure is 90/60 mmHg. The paramedics tell you that they found on the patient a card indicating that he was a Jehovah's Witness and a no-transfusion order. What is the next step in managing this patient?

SAQ feedback
Initiate volume repletion and call for an emergency gastroscopy, but do not use blood products

Educational objective matched MCQ item (appeared only on the mock licensure examination)
A 35-year-old woman presenting with acute abdominal pain is assessed in the emergency room. The physical examination reveals pain on palpation in the right iliac fossa. The radiologist noted an inflamed, enlarged appendix on the image but no perforation. You explain your diagnosis of appendicitis to the patient, the operation required, and the risks involved. The patient refuses to undergo the surgery despite the inherent consequences of not doing so, which can even include death. In your opinion, she fully understands the information you provided. What is the best course of action in this situation?
(A) * Assess the reasons underlying the patient's decision and determine if her choice could be guided towards agreeing to the surgery by appropriately acting on her reasons
(B) Contact the hospital's ethics committee
(C) Call the patient's family members and discuss the situation with them with a view to eliciting their support in convincing the patient to agree to the surgery
(D) Respect her decision, since she is an adult and competent; however, you can no longer provide her care because you disagree with her decision
(E) Accept her decision despite the fact that you deem the consequences of not having the procedure to be dangerous. Offer to discuss consent to the surgery with her again, if she so wishes; treat her pain; monitor her closely; and initiate empiric antibiotic therapy

Materials translated to English from the original language of presentation (French) for purposes of this manuscript
* Denotes the correct response where applicable


However, some differences in the timing of access to the educational materials were present across the four schools (e.g., study-related materials were not available during mandatory local pedagogical activities).

The MCCQE Part I is a 3-h computer-based examination that assesses knowledge, clinical skills and attitudes of undergraduate medical students, and it primarily uses context-rich MCQ-style items. The MCQ component of the examination assesses candidates' knowledge in six medical disciplines: internal medicine, pediatrics, surgery, psychiatry, obstetrics and gynecology, and population health, ethics, legal and organizational (PHELO) aspects of medicine. For the purpose of this study, the PHELO component of the MCCQE Part I was chosen over other disciplines, such as surgery or pediatrics, due to local concerns. However, given that a variety of research has shown the benefits of TEL in the retention of traditional clinical knowledge and skills, such as paediatrics (Larsen et al. 2009), anatomy (Janssen et al. 2014), ambulatory medicine (Cook et al. 2014), neurology (Larsen et al. 2013a, b), clinical nephrology (Schmidmaier et al. 2011), and cardiopulmonary resuscitation (Kromann et al. 2009, 2010, 2011), the present study proposed to extend this body of research and examine whether repeated testing could also promote the retention of knowledge in areas such as ethical and legal aspects of medicine.

Participants

All final-year Québec undergraduate medical students were invited to participate in the present study. An email inviting students to participate was sent out by the Undergraduate Medical Education Dean or the Vice Dean for Education from each medical school and included a brief description of the purpose of the study. A $5.00 honorarium, in the form of a gift card of the participant's choice, was offered for each educational activity completed, including the mock licensure examination.

Materials

Materials for this study were developed by a panel of experts at the Université de Sherbrooke consisting of residents and instructors involved in the instruction of at least one of the four focal topics of the study [i.e., population health, ethics, legal and organizational aspects of medicine (PHELO)]. For each PHELO topic, the panel of experts worked with the research team to identify the relevant MCC learning objectives (Medical Council of Canada 2010). In total, seven objectives were identified for population health, eight for ethics, ten for legal and four for organizational topics in medicine.

Two sets of questions were written for each identified PHELO learning objective. Each question set comprised five versions: one study only item, one SAQ, one context-free MCQ and two context-rich MCQs (see Table 1). Pairs of context-rich MCQ questions were matched to the same objective but included different lead-in vignettes and response options. Only one of the two versions of the context-rich MCQs was presented during the quiz activities; however, both of the paired context-rich MCQ items were presented on the mock licensure examination. SAQs were similar to the context-rich MCQ, but the response options were removed. Context-free MCQs were created to reflect the content tested in the context-rich MCQ and SAQ but did not include a clinical vignette. Study only items were built using the answer key for the context-free MCQ, with the question and correct answer grouped together to form a statement (please see Table 1 for a sample of the different item types used in the educational activities). Each educational activity (study only, SAQ, etc.)
included two questions for each relevant learning objective.


The mock licensure exam contained four context-rich MCQs per objective (i.e., two context-rich MCQs previously included in the educational activities and two novel context-rich MCQs matched to the same learning objective) for a total of 116 context-rich MCQs (see Table 1).

Procedure

Once participants offered consent to participate, they were randomly assigned to one of four groups. Each group had the opportunity to participate in an educational activity for each of the four PHELO topics (one topic per week, presented in the order of the acronym: population health, ethics, legal and organizational aspects of medicine), and each group had the opportunity to complete each style of educational activity (study only, SAQs, context-rich MCQs and context-free MCQs), crossed by topic (see Fig. 1). In this design, all students had access to all basic material and had the opportunity to participate in all forms of educational activities, crossed with topic of study. The overall design can be seen in Fig. 1.

Weekly educational activities

The quizzes and study only items were available online for a pre-determined period (2–6 days), which varied across universities to accommodate the various curricular schedules. Feedback was provided to the students whereby the correct answer was highlighted after the student answered a question (see Table 1). Feedback for SAQs was the correct response from the matching context-rich MCQ (see Table 1). Quiz items were available for a set time (90 s per question) to discourage an open-book approach to the exam.

Mock licensure examination

The mock examination was also completed online and was accessible for a period of 3 days. After logging on to the electronic educational platform, the students had a 2-h period to complete the examination. The mock licensure examination consisted of 116 context-rich MCQs, half of which were identical to questions presented during the context-rich MCQ portion of the learning intervention. The remaining half of the context-rich MCQ questions were matched on the content of the learning objectives and included different clinical vignettes (see Table 1).

Data analysis

All results were considered significant at the 0.05 level, unless otherwise stated. All pairwise comparisons were Bonferroni-corrected to the 0.05 level. In order to balance the pedagogical value of participating in the educational activities with our research questions, we allowed participants to complete as many educational activities as they wished, meaning that they could participate in only one educational activity or only the mock licensure examination if they wished. Consequently, not all participants completed all educational activities and the mock licensure examination. Therefore, initial analyses were conducted using all subjects in order to maximize the amount of data available for analysis. Subsequent analyses used only the sub-sample of participants who completed all four educational activities and the mock licensure examination.


[Fig. 1 Experimental design. Final-year Québec medical students were randomized to one of four groups; each group completed all four educational activity types (study only, short-answer questions, context-free MCQs and context-rich MCQs), with activity type crossed with PHELO topic across groups.]

As the results did not vary across the two analyses, we present only the within-subjects analyses, although the between-subjects analyses are available from the authors on request.

Data standardization

Scores on the mock licensure examination were subdivided into the four PHELO topics using the original examination blueprint and corresponding MCC objectives. In order to ensure comparability of scores across the different subsections (as subsections had different numbers of items), z-scores were calculated using the mean and standard deviation of all responses within each PHELO topic. This transformation created a comparable subscore for each individual for each PHELO subject area.

Overall, we conducted two separate analyses, each of which was intended to address a specific research question. First, in order to examine the influence of educational activity on later performance, we conducted a 4 × 4 mixed-design ANOVA, with Type of Educational Activity (study only, context-free MCQ, context-rich MCQ and SAQ) as the within-subject factor of interest and PHELO Topic (population health, ethics, legal and organizational) as the between-subject factor of interest. We expected mean z-scores on the final examination to be higher when the initial educational activity contained context-rich MCQs and SAQs relative to activities that contained study only items and context-free MCQs. Second, in order to investigate the generalizability of TEL effects, we investigated the magnitude of TEL effects for previously seen context-rich MCQs (e.g., questions that were identical to those seen during the educational activities) compared to learning-objective-matched but novel context-rich MCQs (e.g., questions that had the same underlying content or educational objective but whose context was novel; see Table 1 for an example).


Because the topic that contained context-rich MCQs varied between participants (for example, some participants saw context-rich MCQs in population health while other candidates wrote context-rich MCQs in ethics), we conducted a 4 (topic: population health, ethics, legal and organizational) by 2 (question exposure: previously seen vs. novel) mixed-measures ANOVA. Overall, we expected TEL effects to be larger for previously seen context-rich MCQs than for novel but learning objective-matched MCQs.
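To make the standardization step concrete, the sketch below shows one way the topic-wise z-scores and per-condition means could be computed. It is a minimal illustration only, not the authors' analysis code: the input file name and the column names (student, topic, activity, score) are assumptions about how the data might be laid out.

```python
# Minimal sketch of the topic-wise z-score standardization described above.
# Assumed layout: one row per student x PHELO topic, with the raw sub-score
# and the educational activity type assigned to that topic.
import pandas as pd

df = pd.read_csv("mock_exam_subscores.csv")  # hypothetical file name
# assumed columns: 'student', 'topic', 'activity', 'score'

# Standardize each sub-score within its PHELO topic so that subsections
# with different numbers of items are directly comparable.
df["z_score"] = df.groupby("topic")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=1)
)

# Reshape to one row per student and one column per activity condition,
# ready for the within-subject comparisons of the four educational activities.
wide = df.pivot(index="student", columns="activity", values="z_score")
print(wide.mean())  # mean z-score per activity condition
```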

Results

Participants

A total of 424 medical students from four medical schools located across Quebec, Canada participated in at least one educational activity and the mock licensure exam: Université Laval (n = 153), Université de Sherbrooke (n = 110), McGill University (n = 21) and Université de Montréal (n = 21). However, only 224 students participated in all four educational activities and the mock licensure exam.

Role of educational activity type on later performance

Students' performance on the mock licensure exam, in percentages, can be seen in Table 2. Overall, performance on the mock licensure examination varied depending on the format of the intervening quiz [F(3,660) = 82.5, p < 0.001, partial η² = 0.27; see Fig. 2]. Participants were more likely to respond correctly to items on the mock licensure examination when the intervening educational activity contained either SAQs or context-rich MCQs than when the educational activities were either quizzes containing context-free MCQs or the study only condition (all ts > 10.8, dfs = 223, all ps < 0.001). Accuracy on the mock licensure examination did not differ between the SAQ and context-rich MCQ conditions [t(223) = 0.16, p > 0.05], or between the study only and context-free MCQ conditions [t(223) = 0.50, p > 0.05].

While there was no effect of PHELO topic on overall accuracy on the mock licensure examination (p > 0.05), there was a significant interaction between PHELO topic and type of educational activity [F(9,660) = 4.2, p < 0.001, partial η² = 0.06]. As illustrated in Fig. 2, the effect of the type of educational activity on mock licensure exam performance failed to reach significance for questions pertaining to population health after Bonferroni corrections were applied [F(3,272) = 3.0, p = 0.032]. However, the type of educational activity did influence students' performance on the mock licensure examination for the remaining three topics (all Fs > 21.7, all ps < 0.001). Mock licensure examination questions pertaining to ethics, legal and organizational aspects of medicine followed the previous trend, whereby mean accuracy did not differ between the study only and context-free MCQ quiz activity types (all ts < 2.0, all ps > 0.054), or between context-rich MCQs and SAQs (all ts < 1.6, all ps > 0.11). Performance on the mock licensure examination benefited significantly when students were exposed to context-rich MCQ/SAQ conditions relative to context-free MCQ/study only conditions (all ts > 4.7, all ps < 0.001).

Previously seen compared to novel context-rich multiple-choice questions


Table 2 Mean (standard deviation) accuracy on the mock final examination, as a function of the four PHELO topics

Domain               Mean score
Population health    65.7 % (0.67)
Ethics               67.7 % (0.54)
Legal                69.6 % (0.54)
Organizational       64.1 % (0.75)

Fig. 2 Performance (mean z-scores) on the mock licensure examination for quizzed (context-free MCQ, context-rich MCQ, and SAQ) and non-quizzed (study only) items across the four tested topics. Error bars represent the standard error of the mean z-score

There was a significant effect of prior exposure to specific questions, with higher mean scores found on previously seen questions relative to novel questions [F(1, 220) = 39.1, p < 0.001, partial η² = 0.15]. While there was no effect of PHELO topic on overall performance (F < 1.0), there was a significant interaction between topic and previous exposure to questions [F(3, 220) = 5.2, p < 0.001, partial η² = 0.07]. As illustrated in Fig. 3, the difference in performance across previously seen and new questions was significant for all PHELO topics (ts > 3.8, ps < 0.006), with the exception of questions on population health (ts < 1.0).
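As an illustration of the Bonferroni-corrected pairwise comparisons reported above, the sketch below shows how the within-subject contrasts between activity conditions could be computed. It is hypothetical rather than the authors' code: it reuses the wide table from the earlier sketch, and the condition labels are assumed names.

```python
# Sketch of Bonferroni-corrected paired comparisons between activity
# conditions (illustrative only). Assumes the 'wide' table from the earlier
# sketch: one row per student, one z-score column per activity condition.
from itertools import combinations
from scipy.stats import ttest_rel

conditions = ["study_only", "context_free_mcq", "context_rich_mcq", "saq"]
pairs = list(combinations(conditions, 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction across the six contrasts

for a, b in pairs:
    t, p = ttest_rel(wide[a], wide[b])  # paired t-test within students
    flag = "significant" if p < alpha else "n.s."
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f} ({flag})")
```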

Discussion

Theoretical applications

The present study investigated how different types of educational activities influenced students' performance on a mock licensure examination. We were specifically interested in the potential benefit of different educational activities on later test performance, as predicted by both theories of effortful retrieval and transfer appropriate processing (Morris et al. 1977; Wheeler et al. 2003).


Fig. 3 Performance on the mock licensure examination for previously seen and new context-rich MCQs

Theories of effortful retrieval predict that participants would perform better on the mock licensure examination in conditions that require them to 'work the hardest' during the learning phase, that is, when intervening educational activities included either SAQs or context-rich MCQs, assuming that context-rich MCQs require more effortful retrieval and application of information relative to context-free MCQs. In contrast, transfer appropriate processing predicts that performance on the mock examination would be greatest when the intervening educational activities match the conditions at test, in this case when intervening educational activities included context-rich MCQs.

In the present study, performance on the mock licensure examination was greatest when the intervening educational activities contained either context-rich MCQs or SAQs. The finding of equivalent TEL effects for context-rich MCQs and SAQs suggests that both question formats require more effortful retrieval than context-free MCQs, which would support the effortful retrieval account of TEL. However, because both SAQs and context-rich MCQs used the same lead-in clinical vignette (e.g., the same "context"), we cannot entirely rule out the potential benefit of transfer appropriate processing. It is feasible that both effortful retrieval and transfer appropriate processing have measurable influences on TEL; however, more research is needed to better tease these effects apart.

While students performed better on the mock final examination on concepts initially tested using SAQs or context-rich MCQs, the performance benefit was much smaller for concepts that were either tested using context-free MCQs or were only studied by the students. That is, TEL was reduced or even eliminated when the intervening educational activity required little to no retrieval difficulty. The finding of equivalent performance for the context-free MCQ and study only conditions is consistent with previous research (e.g., Butler and Roediger 2007; Kang et al. 2007), although other researchers have shown that even context-free MCQs lead to performance benefits relative to study only conditions (e.g., McDaniel et al. 2007b). Clearly, not all MCQs have equal influences on learning, and therefore more research is needed to understand the relationship between specific attributes of MCQs and the magnitude of TEL.


Practical applications

The finding of equivalent learning effects for context-rich MCQs and SAQs is a novel expansion on the traditional comparison between SAQs and MCQs (typically context-free) found in the literature. These findings provide support for the notion that context-rich MCQs may require more effortful retrieval of knowledge, as they produced learning effects that were similar to those of SAQs (Schuwirth et al. 2001; Wood 2009). This suggests that the benefit of using educational activities that are similar to those applied in high-stakes licensure examinations may be equivalent to the effortful retrieval typically attributed to SAQ-style learning activities.

Furthermore, the present study found that the beneficial effects of context-rich MCQs transferred only to questions that were verbatim replicas of questions seen during the educational activities. This was somewhat surprising given the growing body of research showing that the effect of testing on memory retention extends beyond simple replication of questions presented during intervening quizzes (Butler 2010; Chan 2010; Chan et al. 2006; Johnson and Mayer 2009; Rohrer et al. 2010; for review, see Carpenter 2012). For example, several studies have shown that intervening quizzes can enhance performance on memory tests that consist of non-tested information that is conceptually related to quizzed items (Chan 2010; Chan et al. 2006). Other research has shown that TEL occurs even when the final test consists of entirely new questions (Butler 2010; Johnson and Mayer 2009; Rohrer et al. 2010). These findings have led researchers to theorize that the "benefits of TEL are not limited to the retention of the specific response tested during initial learning but rather extend to the transfer of knowledge in a variety of contexts" (Butler 2010, p. 1118). We were unable to replicate these findings within the context of the present study.

An important factor in whether transfer to novel test items occurs is the degree of similarity between the contexts of initial learning and transfer (for review, see Barnett and Ceci 2002). In the present study, we matched previously seen and new mock licensure questions to the same MCC objective; however, while the questions were intended to test the same objective, the information contained in the clinical vignette and response options was quite different. The change in the context and content of the clinical vignette for new questions may have been large enough to disrupt transfer processes (see Table 1). In health professions education, context appears to be a significant challenge; as acknowledged by Norcini (cited in Colliver 2002), context specificity influences what is learned, how readily learning transfers to new situations and how reliably that learning is assessed. Because the context-rich MCQs and SAQs included a contextually rich clinical presentation, it is possible that the changed content of the clinical vignette interfered with the retrieval of the learned information. Clearly, more research is needed on the extent to which the contextual information presented in clinical vignettes modulates the degree of TEL.

While most of the research on TEL has been conducted in laboratory or classroom settings, the implementation of testing is not limited to simple written tests.
Retrieval practice can be promoted using a variety of activities, such as having students answer questions orally or perform a surgical procedure—the success of the activity is based on the retrieval of knowledge, principles, skills, or procedures from memory (Larsen and Butler 2013). Clinical educators need to provide purposeful learning and practice retrieval opportunities for students (Chamberland and St-Onge 2013). In this way, the implementation of testing within clinical settings does not necessarily require advanced technologies or complex materials, but rather, requires educators to engage and challenge the learner


through insightful, clinically relevant activities that promote active retrieval of information, skills, and procedures from memory.

Limitations and considerations

When assessing the results of the present study, several limitations should be considered. For example, because the present study attempted to align the mock licensure examination with the format used by the MCCQE Part I, only context-rich MCQs were present in the mock licensure examination. This design limited our ability to distinctly separate the potential influences of TEL and transfer appropriate processing (as only one item format was used on the mock licensure examination). This design format also limited our ability to examine the role of educational activities on the transfer of learning to later testing (by comparing identical and content-related items) across the spectrum of item types (e.g., context-free MCQs or SAQs).

Additionally, because the feedback provided for SAQs was the same as that provided for context-rich MCQs, some may argue that the learning experience (e.g., clinical vignette + feedback given) is identical across SAQs and the matching context-rich MCQ. While this is a valid concern, it is important to note that the two question types differ with regard to the cognitive processes being evoked during testing; that is, SAQs required students to recall the correct answer from memory, while context-rich MCQs required students to recognize the correct answer among a list of distractors. Because the two question types rely on different cognitive processes (e.g., recollection vs. recognition), it is unlikely that students perceived the two events as identical educational experiences. Rather, our argument is that context-rich MCQs require the clinical application of basic science knowledge, and this application of knowledge is more similar to the cognitive processes evoked during SAQs than during context-free MCQs. Nonetheless, because the present study used identical question stems for both SAQs and context-rich MCQs, more research is needed to differentiate between the underlying cognitive processes evoked by these two question types. However, we believe that our findings remain of value to the health professions education community.

Furthermore, it is important to note that the benefit of TEL observed across the different educational activities failed to reach statistical significance for questions pertaining to population health. As outlined in the methods description, we were not able to orthogonally manipulate the order in which the four PHELO topics were presented, and as a result, population health educational activities were always presented first, followed by educational activities pertaining to ethics, legal and finally organizational topics in medicine. Consequently, it is possible that this lack of an effect for population health suggests that the benefits of TEL may have deteriorated over the course of the four-week study period. However, previous research has illustrated TEL effects over intervals greater than 1 month (e.g., Kromann et al. 2010; Larsen et al. 2009), so it seems unlikely that the lack of a benefit of testing is simply an artifact of the amount of time that passed between the intervening and final tests.
Furthermore, if the benefits of testing decrease as the time between the initial educational activity and the final assessment increases, then we might expect a progressive decrease in testing effects across time, with the advantage of context-rich MCQs and SAQs lessening as that interval increased. Rather, the present study found equivalent TEL effects for ethics, legal and organizational topics. Alternatively, the reduced TEL effects for population health may have occurred because participants were more familiar, knowledgeable or comfortable with population health content than with the other three topics.


That is, if students were more knowledgeable regarding issues in population health compared to the other content areas, retrieval of such information from memory would have been easier relative to the other topics, and so the benefit of testing could be smaller. While mean accuracy was equivalent across all PHELO topics (Table 2), the present study did not measure students' pre-intervention PHELO knowledge, and the finding of equivalent population health scores on the mock licensure examination therefore provides no information about the amount of knowledge students had coming into the present study. The failure to find TEL for population health may thus be due to students being more knowledgeable in population health prior to the intervention and, consequently, gaining little from the intervening quizzes.

Conclusions

The present study revealed that context-rich MCQs, which are quite common in high-stakes licensure examinations, are capable of producing testing effects similar to those of SAQs. However, the benefits of context-rich MCQs on TEL were limited to questions that were identical across the intervening quizzes and the final mock licensure examination. Nonetheless, these findings demonstrate the educational benefit of repeated testing in an educationally relevant context, even without controlling for variability in studying habits, variability in motivation and different delays between quizzing and final examinations. Taken together, these data demonstrate that educators must be aware of what is tested, when it is tested and the potential pedagogical value of these assessment exercises.

As national licensure organizations and more medical schools move towards the use of context-rich MCQs, it is interesting to note that this question type can contribute to student learning in addition to providing a measure of students' knowledge. The results of this study thus support the potential later pedagogical benefit of this growing trend. As schools struggle to renew and revise their teaching methods, it would be interesting to explore the possibility of deliberately using testing as a combined teaching and learning strategy, thus providing students with the opportunity to broaden their interaction with the material to be learned, which would hopefully lead to better retrieval of knowledge within a variety of contexts.

Acknowledgments This research was supported by the Medical Council of Canada Research and Development Fund #MCC1/1112. The authors would like to thank Ms. Linda Bergeron, MSc, and Ms. Katharine Fisher for their help and support with the realization of the study.

References

Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. Psychological Bulletin, 128, 612–637.
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (pp. 35–67). Hillsdale, NJ: Erlbaum.
Boulet, J. (2008). Teaching to test or testing to teach? Medical Education, 42, 952–953.
Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1118–1133.
Butler, A. C., & Roediger, H. L. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514–527.
Carpenter, S. K. (2012). Testing enhances the transfer of learning. Current Directions in Psychological Science, 21, 279–283.


Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34, 268–276.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 633–642.
Chamberland, M., & St-Onge, C. (2013). Back to basics: Keeping students cognitively active between the classroom and the examination. Medical Education, 47, 641–643.
Chan, J. C. K. (2010). Long-term effects of testing on the recall of nontested materials. Memory, 18, 49–57.
Chan, J. C. K., McDermott, K. B., & Roediger, H. L. (2006). Retrieval-induced facilitation: Initially nontested material can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135, 553–571.
Colliver, J. A. (2002). Educational theory and medical education practice: A cautionary note for medical school faculty. Academic Medicine, 77, 1217–1220.
Cook, D. A., Thompson, W. G., & Thomas, K. G. (2014). Test-enhanced web-based learning: Optimizing the number of questions (a randomized crossover trial). Academic Medicine, 89, 169–175.
Hinze, S. R., & Wiley, J. (2011). Testing the limits of testing effects using completion tests. Memory, 19, 290–304.
Janssen, S. A. K., VanderMeulen, S. P., Shostrom, V. K., & Lomneth, C. S. (2014). Enhancement of anatomical learning and developing clinical competence of first-year medical and allied health profession students. Anatomical Sciences Education, 7, 181–190.
Johnson, C. I., & Mayer, R. E. (2009). A testing effect with multimedia learning. Journal of Educational Psychology, 101, 621–629.
Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modulate the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558.
Karpicke, J. D., & Roediger, H. L. (2007). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57, 151–162.
Kromann, C. B., Bohnstedt, C., Jensen, M. L., & Ringsted, C. (2010). The testing effect on skills learning might last 6 months. Advances in Health Sciences Education, 15, 395–401.
Kromann, C. B., Jensen, M. L., & Ringsted, C. (2009). The effect of testing on skills learning. Medical Education, 43, 21–27.
Kromann, C. B., Jensen, M. L., & Ringsted, C. (2011). Test-enhanced learning may be a gender-related phenomenon explained by changes in cortisol level. Medical Education, 45, 192–199.
Larsen, D. P., & Butler, A. C. (2013). Test-enhanced learning. In K. Walsh (Ed.), Oxford textbook of medical education (pp. 443–452). Oxford: Oxford University Press.
Larsen, D. P., Butler, A. C., & Roediger, H. L. (2008). Test-enhanced learning in medical education. Medical Education, 42, 959–966.
Larsen, D. P., Butler, A. C., & Roediger, H. L. (2009). Repeated testing improves long-term retention relative to repeated study: A randomized controlled trial. Medical Education, 43, 1174–1181.
Larsen, D. P., Butler, A. C., Lawson, A. L., & Roediger, H. L. (2013a). The importance of seeing the patient: Test-enhanced learning with standardized patients and written tests improve clinical application of knowledge. Advances in Health Sciences Education, 18, 409–425.
Larsen, D. P., Butler, A. C., & Roediger, H. L. (2013b). Comparative effects of test-enhanced learning and self-explanation on long-term retention. Medical Education, 47, 674–682.
Marsh, E. J., Agarwal, P. K., & Roediger, H. L., III. (2009). Memorial consequences of answering SAT II questions. Journal of Experimental Psychology: Applied, 15, 1–11.
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007a). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513.
McDaniel, M. A., Roediger, H. L., & McDermott, K. B. (2007b). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200–206.
Medical Council of Canada. (2010). Guidelines for the development of multiple-choice questions. Ottawa: Medical Council of Canada.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
National Board of Medical Examiners. (2002). Constructing written test questions for the basic and clinical sciences. Philadelphia, PA: National Board of Medical Examiners.
Roediger, H. L., & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Rohrer, D., Taylor, K., & Sholar, B. (2010). Tests enhance the transfer of learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 233–239.


Schmidmaier, R., Ebersbach, R., Schiller, M., Hege, I., Holzer, M., & Fischer, M. P. (2011). Using electronic flashcards to promote learning in medical students: Retesting vs. restudying. Medical Education, 45, 1101–1110.
Schuwirth, L. W. T., Verheggen, M. M., van der Vleuten, C. P. M., Boshuizen, H. P. A., & Dinant, G. J. (2001). Do short cases elicit different thinking processes than factual knowledge questions do? Medical Education, 35, 348–356.
Wheeler, M. A., Ewers, M., & Buonanno, J. (2003). Different rates of forgetting following study versus test trials. Memory, 11, 571–580.
Wood, T. (2009). Assessment not only drives learning, it may also help learning. Medical Education, 43, 5–6.

