Measuring cognitive load: performance, mental effort and simulation task complexity

Faizal A Haji,1,2,3 David Rojas,1,2 Ruth Childs,4 Sandrine de Ribaupierre3 & Adam Dubrowski5

CONTEXT Interest in applying cognitive load theory in health care simulation is growing. This line of inquiry requires measures that are sensitive to changes in cognitive load arising from different instructional designs. Recently, mental effort ratings and secondary task performance have shown promise as measures of cognitive load in health care simulation.

OBJECTIVES We investigate the sensitivity of these measures to predicted differences in intrinsic load arising from variations in task complexity and learner expertise during simulation-based surgical skills training.

METHODS We randomly assigned 28 novice medical students to simulation training on a simple or complex surgical knot-tying task. Participants completed 13 practice trials, interspersed with computer-based video instruction. On trials 1, 5, 9 and 13, knot-tying performance was assessed using time and movement efficiency measures, and cognitive load was assessed using subjective rating of mental effort (SRME) and simple reaction time (SRT) on a vibrotactile stimulus-monitoring secondary task.

RESULTS Significant improvements in knot-tying performance (F(1.04,24.95) = 41.1, p < 0.001 for movements; F(1.04,25.90) = 49.9, p < 0.001 for time) and reduced cognitive load (F(2.3,58.5) = 57.7, p < 0.001 for SRME; F(1.8,47.3) = 10.5, p < 0.001 for SRT) were observed in both groups during training. The simple-task group demonstrated superior knot tying (F(1,24) = 5.2, p = 0.031 for movements; F(1,24) = 6.5, p = 0.017 for time) and a faster decline in SRME over the first five trials (F(1,26) = 6.45, p = 0.017) compared with their peers. Although SRT followed a similar pattern, group differences were not statistically significant.

CONCLUSIONS Both secondary task performance and mental effort ratings are sensitive to changes in intrinsic load among novices engaged in simulation-based learning. These measures can be used to track cognitive load during skills training. Mental effort ratings are also sensitive to small differences in intrinsic load arising from variations in the physical complexity of a simulation task. The complementary nature of these subjective and objective measures suggests their combined use is advantageous in simulation instructional design research.

Medical Education 2015: 49: 815–827
doi: 10.1111/medu.12773

Discuss ideas arising from the article at www.mededuc.com discuss.

1 Faculty of Medicine, Wilson Centre, University of Toronto, Toronto, Ontario, Canada
2 SickKids Learning Institute, Hospital for Sick Children, Toronto, Ontario, Canada
3 Department of Clinical Neurological Sciences, University of Western Ontario, London, Ontario, Canada
4 Department of Leadership, Higher and Adult Education, Ontario Institute for Studies in Education, University of Toronto, Toronto, Ontario, Canada

5 Division of Emergency Medicine, Memorial University of Newfoundland, St John’s, Newfoundland and Labrador, Canada

Correspondence: Faizal A Haji, Faculty of Medicine, Wilson Centre, University of Toronto, Room 1ES-565, 200 Elizabeth Street, Toronto, Ontario M5G 2C4, Canada. Tel: 00 1 647 972 8086; E-mail: [email protected]




INTRODUCTION

A considerable evidence base has accumulated supporting the use of simulation in education and training in the health professions.1 Multiple systematic reviews and meta-analyses demonstrate the educational value of simulation,2–4 which has been shown to ‘flatten’ the learning curve of novices learning procedural and surgical skills5,6 and improve performance on transfer to the clinical environment.7,8 Notwithstanding these findings, there is limited research specifying the ‘active ingredients’ that optimise simulation training outcomes.2,9,10 Thus, it is not surprising that instructional design has emerged as a top priority for future research in the field.11–13

Studies grounded in established theories of instruction have been advocated to advance this line of inquiry.10,12 One framework that has garnered interest in health care simulation is cognitive load theory (CLT).14,15 Based on the premise that human working memory is limited,16,17 CLT argues that performance and learning are impaired when the total cognitive load (CL) associated with training exceeds working memory capacity.18

Cognitive load theory further articulates three sources of CL that may be imposed during training.19 Intrinsic load reflects the cognitive demands associated with executing a learning task, and is determined by task complexity (the number of information elements that must be simultaneously processed in working memory) and learner expertise (the sophistication of the learner’s schemas related to the task).18 As such, intrinsic load is a function of the task–learner interaction.20 Extraneous load arises when task elements that are unrelated to the goals of instruction must be processed, reflecting inefficient instructional design.21 Finally, germane load refers to the cognitive resources invested in dealing with intrinsic load that contribute to genuine learning (i.e. acquisition and automation of schemas stored in long-term memory).18 As these sources of load are considered additive (resulting in the total CL imposed by a learning task), CLT advocates for instructional designs that maximise training efficiency by minimising extraneous load, optimising germane load and managing intrinsic load.19 Importantly, whereas extraneous and germane loads can be modulated by changing the presentation of information or the activities in which learners engage during training, intrinsic load can only be managed by modifying: (i) the learning task itself (e.g. through simplification), or (ii) the act of learning (e.g. through training that facilitates the acquisition and automation of schemas).15,18,19
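This additive assumption can be stated compactly. The notation below is ours rather than drawn from CLT sources, which describe the relationship in prose; it simply restates the premise that performance and learning suffer once the summed loads exceed working memory capacity:

$$CL_{\text{total}} = CL_{\text{intrinsic}} + CL_{\text{extraneous}} + CL_{\text{germane}} \leq WM_{\text{capacity}}$$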


Applying CLT to investigate the effect of instructional design on simulation-based learning requires measures of CL that are sensitive to the working memory demands imposed during simulation training. Current CL measures fall into three broad categories: psychophysiological indices, secondary task techniques and subjective ratings.20 Both psychophysiological and secondary task methods estimate total CL,22 based on changes in physiological parameters that indirectly reflect working memory demands (pupil dilation, heart rate variability, etc.),23,24 or performance on secondary tasks (e.g. memory or stimulus detection tasks) that reflect ‘spare’ working memory capacity not consumed during training.23 Similarly, most subjective rating measures are used to estimate total CL,20–22,25 for instance by asking learners to introspect on the cognitive resources they invested towards a learning task.24

Currently, the measurement of CL in health care simulation research is heavily reliant on subjective rating scales, the psychometric properties of which have not been established in the setting of health professions education.26 Recent reviews also indicate the need for further research to examine the applicability of secondary task performance in the context of simulation-based procedural skills training.27 For instance, although the subjective rating of mental effort (SRME) and simple reaction time (SRT) to a vibrotactile stimulus detection secondary task have shown promise as measures of total CL in health care simulation,28,29 there is conflicting evidence regarding whether SRT-based measures are sensitive to changes in intrinsic load within novice learners as they acquire expertise through simulation-based procedural skills training. In addition, the sensitivity of both SRME and SRT-based measures to predicted differences in intrinsic load arising from training tasks of varying complexity has yet to be established,28 which limits the utility of these measures in simulation instructional design research.

To address these gaps, this study investigated the sensitivity of SRME and SRT-based secondary task measures to predicted changes in intrinsic load as a function of: (i) learner expertise in a basic surgical skill, and (ii) simulation training task complexity. This was achieved through an experiment in which novices’ task performance and total CL were assessed during simulation training on either a simple or a complex surgical knot-tying task.


To ensure that observed differences in total CL (reflected in the SRME and SRT measures) could be attributed to intrinsic load, we held extraneous and germane loads constant by standardising the presentation of instructional information and the nature of practice across all learners.21,25 In so doing, we hypothesised that: (i) during simulation training, novices would demonstrate improved knot-tying performance and lower SRME and SRT (indicating a reduction in intrinsic load associated with knot tying), and (ii) novices training on the simple task would demonstrate lower SRME and SRT (indicating that this task imposed lower intrinsic load) and superior knot-tying performance compared with their peers training on the complex task, based on the assumption that the latter task is too complex for our participants’ current level of expertise.

METHODS

Participants and randomisation

Ethical approval for the study was obtained. Subsequently, year 1 medical students from the University of Toronto (n = 28) with no prior experience in surgical knot tying were recruited to participate in a two-arm, prospective randomised trial. A computer-generated, blocked randomisation technique was used to assign participants to either the simple or complex simulation training task. Group allocation was concealed prior to training and participants were blinded to group assignment throughout the study.

Primary and secondary tasks

The simple and complex one-handed knot-tying tasks were developed using a benchtop simulator, similar to the apparatus described by Brydges et al.30 (Fig. 1). In the simple task, participants were instructed to tie one-handed surgical knots as quickly and accurately as possible on a ¾-inch wooden dowel using #1 silk ties to simulate a knot used to secure a chest tube. In the complex task, participants were also instructed to tie one-handed knots as quickly and accurately as possible; however, in this case the knots were tied on a ¼-inch Penrose drain placed within an abdominal training model using 4–0 nylon suture (while wearing sterile gloves) to simulate tying off a blood vessel in a deep surgical cavity. The drain was loosely anchored to a simulated tissue pad (such that excess force would result in ‘avulsion’ of the simulated vessel) and participants were explicitly instructed to pay attention to the force they were applying in order to tie off the vessel without strangulating or tearing it.


Figure 1 Simple and complex simulation tasks. (a, b) Participants in the complex task practised one-handed surgical knot tying using 4–0 nylon suture while gowned and gloved. Their simulator comprised a ¼-inch Penrose drain loosely secured to two skin pads set inside a laparoscopic box trainer and constricted by a circular piece of foam; the task thus simulated the tying of a friable vessel inside a deep surgical cavity. (c, d) Conversely, participants in the simple task practised surgical knot tying using #1 silk ties without gown or gloves; their simulator was constructed from a ¾-inch wooden dowel securely fastened to the top of a lapbox trainer to provide unobstructed access to the tying surface



Each of these experimental manipulations was designed to increase the complexity (and thus intrinsic load) of the complex learning task by modifying an element of the physical environment relevant to surgical knot tying.31 For instance, tying in a deep cavity with a smaller suture while wearing gloves reduced the visual and haptic feedback available to learners, which could subsequently impact their ability to refine their knot-tying performance. Similarly, tying onto a loosely secured Penrose drain (rather than a well-anchored wooden dowel) required that learners consider the force applied during knot tying, adding an additional cognitive element to the learning task.

Throughout the study, a simple stimulus-monitoring secondary task was used to facilitate the objective measurement of novices’ total CL.23 The task required the participant to monitor a small unit (dimensions: 152.0 × 107.0 × 54.0 mm) attached to his or her leg, which would vibrate four times at random time intervals. Each vibration lasted up to 3 seconds and there was a minimum of 7 seconds between vibrations. Each participant was instructed to press a pedal placed directly under his or her right foot as quickly as possible upon detecting the vibrotactile stimulus. Customised software recorded the SRT (the time from stimulus presentation to the pressing of the pedal) on each vibration.29

Experimental protocol

Participant flow during the study is described in Fig. 2. After providing informed consent and demographic data, all participants completed a 3-minute trial on the secondary task to establish their baseline SRT. This was followed by a single session of simulation-based knot-tying training lasting approximately 2.5 hours. During training, the content and presentation of instructional information, as well as the nature of practice, were standardised in both groups to control for the extraneous and germane loads imposed on learners. All participants began with 15 minutes of computer-based video instruction (CBVI) on surgical knot tying, using a pre-recorded instructional video that provided real-time and slow-motion demonstrations of an expert tying one-handed surgical knots with voiceover instructions describing the technique.32

Following CBVI, all participants completed 13 knot-tying trials (3 minutes each) in which they attempted to tie as many single square knots as possible on the simulation task to which they were assigned. Between each trial, learners were given up to 5 minutes to review the instructional video. The total practice time (approximately 40 minutes) was selected based on literature demonstrating that most novices achieve a plateau in knot-tying performance within this training period.32

Practice trials 1, 5, 9 and 13 were conducted under dual-task conditions and treated as ‘test’ trials in the subsequent analysis. During these trials, participants were instructed to focus on the knot-tying task, but also to attend to the vibrotactile stimulus as they were able. Following each of these trials, the participant’s SRME was recorded using the single-item measure described by Paas.33 Specifically, each participant was asked to ‘Please rate the level of mental effort you invested towards the knot-tying task by indicating the appropriate response’ on a Likert scale ranging from 1 (very, very low mental effort) to 9 (very, very high mental effort). To anchor participants’ responses, a rating of 1 was considered to represent the effort required to drive a car (as an example of an automatic activity requiring few or no cognitive resources), and a rating of 9 was indicative of the effort required to write an examination (as an example of an activity that required the investment of all available cognitive resources).28

Figure 2 Experiment design and participant flow. CBVI = computer-based video instruction; RT = reaction time; single task = tying surgical knots only; dual task = tying surgical knots while attending to a vibrotactile stimulus-monitoring secondary task
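The secondary-task parameters described above are concrete enough to sketch in code. The sketch below is purely illustrative: the authors' customised software is not described in the paper, and the rejection-sampling scheduler and function names are our own assumptions; only the numeric constants (trial length, four stimuli, 3-second vibrations, 7-second minimum gap) come from the text.

```python
import random

TRIAL_S = 180.0    # each trial lasted 3 minutes
N_STIM = 4         # four vibrations per dual-task trial
MAX_VIB_S = 3.0    # each vibration lasted up to 3 seconds
MIN_GAP_S = 7.0    # minimum of 7 seconds between vibrations


def schedule_stimuli(rng: random.Random) -> list[float]:
    """Draw N_STIM onset times within one trial, re-drawing until every pair of
    consecutive stimuli is separated by the vibration plus the minimum gap."""
    while True:
        onsets = sorted(rng.uniform(0.0, TRIAL_S - MAX_VIB_S) for _ in range(N_STIM))
        if all(b - a >= MAX_VIB_S + MIN_GAP_S for a, b in zip(onsets, onsets[1:])):
            return onsets


def simple_reaction_time(onset_s: float, pedal_press_s: float) -> float:
    """SRT: time from stimulus presentation to the pressing of the pedal."""
    return pedal_press_s - onset_s
```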



Outcome measures

Knot-tying performance was assessed using time and motion efficiency measures. These metrics are sensitive to performer expertise and task complexity, and have been shown to correlate with global rating measures for this skill.30,34,35 An electromagnetic motion tracking system (Patriot; Polhemus, Inc., Colchester, VT, USA) connected to the Imperial College Surgical Assessment Device (ICSAD; Department of Surgical Oncology and Technology, Imperial College of Science, Technology and Medicine, London, UK) was used to track the total number of movements during each practice trial.30,32 The number of square knots tied during each practice trial was also recorded in order to calculate the average number of movements and time per one-handed knot throw. These two metrics were used in all subsequent analyses, with fewer movements and less time per throw considered to reflect superior knot-tying performance.

Cognitive load was assessed objectively using participants’ performance on the secondary task. This was achieved by averaging the SRTs for all stimuli presented in a given trial. To account for individual differences in reaction time, each participant’s mean baseline SRT was subtracted from the mean SRT for each dual-task trial to generate the ‘SRT difference’. In addition, participants’ SRMEs were recorded after each dual-task trial. In accordance with the original conceptualisation of these measures,20,24,33,36 both SRT and SRME were treated as estimates of total CL. However, because extraneous and germane loads were held constant, differences in total CL between groups and across practice trials were assumed to reflect changes in intrinsic load.

Statistical analysis

All statistical analyses were conducted using IBM SPSS Statistics for Windows Version 21.0 (IBM Corp., Armonk, NY, USA). In order to make use of all available data, participants for whom some observations were missing (arising from equipment failure, representing < 10% of all data) were included in the analysis of all variables in which they had non-missing data (pairwise deletion). Demographic data were summarised using descriptive statistics, with group differences tested using Fisher’s exact test for categorical variables and independent-samples t-tests for continuous variables. Each knot-tying performance and CL measure was analysed using two-way mixed analysis of variance (ANOVA), with task (simple or complex) as the between-subjects factor and trial (1, 5, 9 or 13) as the repeated, within-subjects factor. The Greenhouse–Geisser correction was applied to analyses in which the sphericity assumption was violated. Significant task × trial interactions were explored using repeated contrasts to determine the nature of the interaction, with Bonferroni correction applied to adjust for multiple comparisons.37 Effect size is reported using Cohen’s d for task main effects and Cohen’s f for trial main effects and interactions.
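To make the analysis pipeline concrete, the sketch below reproduces the two steps described above (baseline-corrected SRT, then a two-way mixed ANOVA) in Python. This is our own illustration, not the authors' analysis code: they used SPSS, whereas this sketch uses the pingouin library as a stand-in, and the column names are hypothetical. Cohen's f is derived from partial eta-squared via f = sqrt(η²p / (1 − η²p)).

```python
import numpy as np
import pandas as pd
import pingouin as pg  # stand-in for SPSS; handles two-way mixed designs

# Assumed long-format data: one row per participant x test trial, with columns
# 'id', 'task' ('simple' or 'complex'), 'trial' (1, 5, 9 or 13),
# 'srt' (mean dual-task reaction time), 'srt_baseline' and 'srme'.


def add_srt_difference(df: pd.DataFrame) -> pd.DataFrame:
    """Subtract each participant's mean baseline SRT from the mean SRT of each
    dual-task trial, yielding the 'SRT difference' measure."""
    out = df.copy()
    out["srt_diff"] = out["srt"] - out["srt_baseline"]
    return out


def mixed_anova_with_cohens_f(df: pd.DataFrame, dv: str) -> pd.DataFrame:
    """Two-way mixed ANOVA: task (between) x trial (within), with sphericity
    correction applied (pingouin reports Greenhouse-Geisser corrected p-values)."""
    aov = pg.mixed_anova(data=df, dv=dv, within="trial", subject="id",
                         between="task", correction=True)
    # Convert partial eta-squared (np2) to Cohen's f for each effect; the paper
    # instead reports Cohen's d (from group means and pooled SD) for the
    # between-subjects task effect.
    aov["cohens_f"] = np.sqrt(aov["np2"] / (1.0 - aov["np2"]))
    return aov
```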

RESULTS

Participant demographics

There were no significant differences in demographic variables between participants in the two training groups (Table 1).

Table 1 Participant demographics by case assignment

Variable*                                                          Simple task   Complex task
Age, years, mean ± SE                                              24.0 ± 0.7    24.1 ± 0.6
Gender, n
  Male                                                             9             8
  Female                                                           5             6
Handedness, n
  Left                                                             1             2
  Right                                                            13            12
Prior teaching or clinical observation of surgical knot tying, n
  Yes                                                              6             7
  No                                                               8             7
Prior simulation training in knot tying (suturing workshop), n
  Yes                                                              2             2
  No                                                               12            12
Prior simulation training for other skills, n
  Yes                                                              0             1
  No                                                               14            13

SE = standard error
* Independent-samples t-tests for continuous variables and Fisher’s exact tests for binary variables revealed no statistically significant differences between the groups



The participants’ average age was 24 years (range: 21–30 years). The majority of participants were male (61%) and right-handed (89%). Thirteen participants (46%) had received some prior teaching on surgical knot tying or had observed the skill in a clinical setting; however, this experience was minimal (mean didactic teaching time: 0.25 hours; mean number of instances of knot tying observed: 1.14). Four participants (14%) had received training on instrument tying during a skills workshop and one participant had previously participated in a laparoscopic simulation study.

Knot-tying performance

Participants training on both the simple and complex tasks demonstrated significant improvement in their knot-tying performance during simulation training, requiring fewer movements (F(1.04,24.95) = 41.1, p < 0.001, f = 1.31) and less time (F(1.04,25.90) = 49.9, p < 0.001, f = 1.42) to complete each throw of the surgical knot over the course of the 13 practice trials. Participants training on the simple task also demonstrated superior knot-tying performance compared with those on the complex task, requiring fewer movements (F(1,24) = 5.2, p = 0.031, d = 0.93) and less time (F(1,24) = 6.5, p = 0.017, d = 1.05) per knot. A plateau in knot-tying performance was observed following the fifth trial in both groups (Fig. 3). A trial × task interaction was not observed, indicating that the rate of improvement in movement and time efficiency did not differ significantly between the two groups.

Figure 3 Knot-tying performance, represented by (a) mean number of movements and (b) time per throw for participants training in the simple and complex tasks at practice trials 1, 5, 9 and 13. Error bars represent 95% confidence intervals

Cognitive load

A significant reduction in mental effort (F(2.3,58.5) = 57.7, p < 0.001, f = 1.49) and improvement in secondary task performance (F(1.8,47.3) = 10.5, p < 0.001, f = 0.63) was observed over the 13 practice trials, indicating total CL declined over the simulation training period in both groups (Fig. 4). A significant task × trial interaction was observed for SRME (F(2.3,58.5) = 5.57, p = 0.005, f = 0.46), with a faster decline in mental effort from trials 1 to 5 observed among novices training on the simple task versus those on the complex task (F(1,26) = 6.45, p = 0.017, d = 0.99). This difference remained significant after correcting for multiple comparisons (adjusted α ≤ 0.017). A trend towards a significant overall difference between the simple and complex groups was also observed on SRME (F(1,26) = 3.9, p = 0.059, f = 0.78), with participants in the simple-task group demonstrating lower mental effort ratings compared with those in the complex-task group on trials 5, 9 and 13. Although a similar pattern was observed for SRT, the effect sizes related to group assignment (d = 0.19) and the interaction between task and trial (f = 0.27) were small and not statistically significant.

Figure 4 Cognitive load, represented by (a) mean simple reaction time (SRT) difference from baseline and (b) mean subjective rating of mental effort (SRME) for participants training in the simple and complex tasks at practice trials 1, 5, 9 and 13. Error bars represent 95% confidence intervals

DISCUSSION

We investigated the effect of task complexity on novices’ knot-tying performance and total CL during simulation training in an effort to establish the sensitivity of SRME and SRT-based secondary task performance to predicted variations in intrinsic load arising from: (i) changes in learners’ expertise during simulation-based surgical skills training, and (ii) differences in simulation training task complexity. Although this line of inquiry is in its early stages, the results are promising and reveal important insights regarding the measurement of CL in health professions education and its application to the study of simulation instructional design.

Interpretation of experimental findings

Our first hypothesis was that simulation training would result in improved knot-tying performance and reduced intrinsic load among novice learners. The results demonstrate precisely this pattern: in both groups a significant decline in SRME and SRT and improved knot-tying efficiency were observed during the training period. This finding is consistent with CLT’s prediction that the intrinsic load imposed by the knot-tying task declined as novices developed schemas for this surgical skill through simulation-based deliberate practice. From a motor learning perspective, the plateau in knot-tying performance observed between trials 5 and 9 may also reflect learners’ transition from the ‘cognitive’ phase of psychomotor skill acquisition (in which the steps required to tie surgical knots are schematised) to the ‘associative’ phase (in which this schema is elaborated to make task performance more efficient).38,39

Although these results corroborate those of prior research regarding the sensitivity of subjective rating measures,28,40 they contradict findings related to secondary task performance among novices engaged in simulation-based surgical skills training.5,28,41 Specifically, Stefanidis et al.5,41 observed that novices’ performance on a complex visuospatial secondary task did not improve until they approached automaticity on a laparoscopic knot-tying task (after approximately 10 hours of simulation training). Similarly, our research group observed that novices’ recognition reaction time did not improve after 30 minutes of one-handed surgical knot-tying training on a simple benchtop simulator; however, when a simple reaction time secondary task (identical to that employed in the current study) was used, improved reaction times were observed within this training period.28,29 Taken together, these results highlight the potential for interference between primary and secondary tasks.20 In turn, the sensitivity of dual-task performance measures is likely to depend on appropriately matching primary and secondary tasks so that they consume the same pool of cognitive resources,5,42 but do not overload novices’ working memory capacity when they are performed simultaneously.

Our second hypothesis was that training on a simple simulation task would result in superior knot-tying performance and lower intrinsic load compared with training on a complex task. The SRME data support this hypothesis. Specifically, the significant interaction observed in SRME and the trend towards significantly lower total CL in the simple-task group suggests that although both tasks imposed similar cognitive demands at the onset of training, after the fifth practice trial, the simple task imposed less load than the complex task. This pattern suggests that novices may have experienced similar intrinsic load during the initial ‘cognitive’ phase of skill acquisition, whereas differences in mental effort between the two tasks emerged as learners reached a plateau in knot-tying performance (reflecting a transition to the ‘associative’ phase). These data also suggest that the manipulations to the physical dimensions of the complex task (e.g. tying in a deep cavity, with a smaller suture, on a more friable surface) primarily affected intrinsic load when learners were refining and automating the motor elements of the knot-tying task. However, these interpretations should be adopted cautiously, given the potential for bias in the SRME measure. Although we assume that changes in novices’ mental effort ratings reflect reciprocal changes in intrinsic load, it is possible that they reflect some other confounding variable, such as learners’ perceptions of how well they performed the task.28 This underscores the need for further ‘response process’ validity evidence for the SRME measure43 in order to better understand exactly what learners base their mental effort ratings on.

Interestingly, despite similar trends in the SRME and SRT data, group differences in SRT were not statistically significant. This may indicate that SRME is more sensitive than SRT to differences in the intrinsic load imposed by the two training tasks. This interpretation is supported by an extensive evidence base in educational psychology which suggests that SRME is sensitive to relatively small differences in CL arising from different instructional designs.20,24,25,44 Conversely, a number of studies have documented individual variance in working memory capacity,36,45 which may impact the amount of ‘spare’ cognitive resources individual participants are able to invest in the secondary task, thereby increasing the variance in this measure. Alternatively, it is possible that our study was not sufficiently powered to detect significant group differences in SRT, given the smaller effect sizes associated with the task complexity manipulation compared with those associated with the change in SRT over the training period. However, we believe the most likely explanation for this finding is that the differences in intrinsic load between the simple and complex tasks were not particularly large. This may also account for the lack of significant difference in the rate of improvement in knot-tying performance between the two groups, particularly if the simple task made knot tying easier to perform, without impacting the rate at which the skill was learned in comparison with the complex task. A similar pattern was noted by Dubrowski et al.,46 who found that training under initially simplified practice conditions did not affect novices’ rate of laparoscopic knot-tying skill acquisition compared with training under complex conditions. Brydges et al.30 demonstrated a similar effect in junior and senior residents’ tying of surgical knots on simple and complex simulations: although performance in both groups was reduced on a task of higher complexity, the difference in performance between juniors and seniors remained constant, suggesting that the performance decrement was related to the inherent difficulty of the task rather than to the performers’ surgical skill.

Implications

The results of this study have a number of implications for educators and researchers interested in CL measurement and instructional design for simulation-based procedural skills training. The first relates to the independent measurement of intrinsic, extraneous and germane loads in instructional settings. This is an area of active controversy in the CLT literature. In fact, the theory has recently come under intense scrutiny because many existing measures of CL (including those used in this study) cannot differentiate between the three types of load.22,47–49 It has been argued that without such a distinction, the central assumptions of CLT cannot be falsified, as the theory can be used to explain every possible pattern of data arising from a comparison between instructional interventions after the fact.49 In light of this issue, a number of investigators have developed multiple-item subjective rating scales in an attempt to quantify each type of load separately.26,44,50 Although there is some evidence to indicate that such scales tap into different aspects of CL,26,44,50,51 these results are preliminary and somewhat inconsistent.47 Furthermore, many cognitive load theorists question whether the three types of load can ever be distinguished psychometrically.20–22,47 For instance, Sweller argues that all load types share a common underlying mechanism (the interactivity of information elements held in working memory) and, as a result, learners experience CL only as an overall phenomenon.21 Thus, whereas an educator may be able to determine which interacting task elements reflect intrinsic or extraneous load (based on prior knowledge of instructional content and design), learners may not be capable of making similar distinctions.21,47


These issues are compounded by recent reformulations of CLT, which argue that germane load is not an independent source of CL, but, rather, reflects the resources invested in dealing with intrinsic load (and is thus subsumed within it).21,52 Together, these arguments raise doubts about CL estimates generated from the aforementioned multiple-item rating scales.

We took an alternative approach in this study, whereby we specified the elements of our learning task that were theorised to impact intrinsic load a priori. We then based our hypotheses, experimental manipulations and subsequent interpretations exclusively on these elements, while holding others that would impact extraneous and germane loads constant. By doing so, we posit that the changes in total CL reflected in the SRME and SRT-based secondary task measures can be attributed to variations in intrinsic load arising from changes in learners’ expertise in surgical knot tying and differences in the complexity of the two training tasks. Although this technique has previously been applied with subjective ratings to detect variations in intrinsic load imposed by cognitive tasks (i.e. problem solving),25 to our knowledge this is the first study to demonstrate its utility with secondary task measures and in the setting of psychomotor skills training. Furthermore, although our study focused on sources of intrinsic load, the same approach could be applied to investigate sources of extraneous or germane load, as long as only a single load type is manipulated at a given time. A number of cognitive load theorists have recommended precisely this approach to avoid the aforementioned circularity associated with CLT.20,21,25

The second implication concerns the measurement of CL over time. Specifically, our findings demonstrate that both SRME and SRT-based secondary task measures of total CL can be used to detect variations in intrinsic load within novices over the course of skills training. This finding is a novel contribution to the CLT literature because most CLT studies compare the loads imposed by different instructional procedures, rather than measuring changes in load within learners over a training period. In fact, a recent criticism of CLT highlights the paucity of evidence regarding the sensitivity of existing measures to changes in CL over time.47 Our results help to address this gap, thereby furthering our understanding of how these measures can be used in instructional settings. These results also highlight added advantages of SRME and SRT over other CL measures currently employed in simulation-based training.5,40,53,54 Both are relatively easy to administer, can be used repeatedly within a single training session, and are sensitive to changes in load in the early stages of skill acquisition, making them particularly useful for CL measurement among novice populations.28 As such, educators can use these measures to track changes in intrinsic load within individual learners during simulation training, providing an additional method to gauge novices’ level of expertise on a given skill (beyond learning task performance).5,42,55 Future studies may also be able to apply these measures as a ‘diagnostic tool’ to estimate the cognitive demands imposed by learning tasks at various points during training, which may be useful for educators who wish to move learners between training tasks of differing levels of complexity.20,56

The final implication relates to the measurement of CL in simulation instructional design research. The finding that mental effort ratings and secondary task performance are sensitive to predicted differences in intrinsic load arising from different simulation training tasks adds to a growing body of literature indicating that these measures can be used to quantify the loads imposed by different simulation instructional design features. Our specific findings represent a novel contribution to the literature on CL measurement in health care simulation, as most studies in this domain focus on measuring changes in load as a function of learners’ expertise,5,41,53,54 emotion57 and fatigue.58 Our results also expand the ‘theory net’ for CLT (i.e. the range of instructional situations in which the theory is applicable) to include learning tasks with a significant motor component.48 This is important because CLT is most often applied to cognitive learning tasks, and there is ongoing controversy regarding the extent to which its principles (and working memory constraints in general) apply to the acquisition of motor skills.59,60 However, the clear pattern of interference between primary and secondary tasks in our study suggests that both draw on a common pool of cognitive resources, supporting the argument that working memory capacity plays an important role in simulation-based learning of surgical and procedural skills. Furthermore, the consistency with which CLT predictions matched our experimental data supports the argument that the theory is applicable to instructional design research in this domain.

The findings of this study also provide fruitful ground for future inquiry, given that the manipulations of the physical complexity of our simulation training task did not have as large an impact on learners’ intrinsic load (and the rate of knot-tying skill acquisition) as we had predicted.



This raises important questions regarding how manipulations of the physical dimensions of a simulation task (e.g. in comparison with the cognitive or affective dimensions) impact on CL during procedural skills training. Interestingly, there is some evidence that changes to the physical environment in which a learning task is performed (e.g. ambient noise or lighting) can influence CL, suggesting that the experimental manipulations used in the current study should have had an effect on total load.31 However, research investigating the link between working memory and the physical environment is still at an early stage, and many questions are as yet unanswered, including: (i) what are the conditions in the physical environment that increase CL, and how? (ii) Do these conditions impact CL at specific points during skill acquisition (e.g. in the ‘associative’ phase)? (iii) Can learners manage the resulting high CL using cognitive strategies, such as by selectively attending to some elements of a task while ignoring others? (iv) What are the implications for instructional design? Future studies investigating these issues will help to elucidate not only what the sources of CL are in simulation-based training environments, but also how learners respond in high-load situations and the subsequent implications for performance, learning and CL measurement.26,31,61

Limitations

There are several limitations to consider in this study. Firstly, we investigated the sensitivity of the CL measures among novice learners in the context of the development of a single psychomotor skill, and thus generalisations to more experienced trainees or other clinical skills should be undertaken with due caution. Secondly, our knot-tying performance measurement focused on time and motion efficiency, and thus we are unable to comment on the effect of the experimental manipulations on differences in knot-tying quality or expert-based assessment of technical skill. Similarly, because we were primarily interested in the relationship between the CL measures and primary task performance, our experimental design focused on measuring performance and CL during simulation training rather than after a retention period. As a result, the extent to which the observed performance patterns reflect motor learning (i.e. a permanent change in participants’ knot-tying skills)39,62,63 cannot be determined from our results. Thirdly, the ways in which the SRME and SRT measures were used in this study generate estimates of average CL during a given time period. Although such measures are useful for tracking changes in CL over multiple learning episodes or for comparing the cognitive efficiency of two instructional designs, they cannot provide additional information about instantaneous changes in CL within a given learning experience.20,47 Finally, our approach of using measures of total CL to capture changes in a single load type (i.e. intrinsic load) is limited in settings in which two or more types of load are manipulated at once (e.g. by combining two instructional design features). Given that some instructional manipulations are theorised to impact multiple types of load at once,48 finding ways to reliably capture such changes within a single study will continue to be an important area for future inquiry.

CONCLUSIONS

The results of this study demonstrate that both secondary task performance and mental effort ratings are sensitive to changes in intrinsic load resulting from simulation training, and that mental effort ratings in particular may be sensitive to small differences in intrinsic load arising from variations in task complexity. Considered in light of the existing literature, the findings also highlight the importance of matching primary and secondary tasks when the latter are used to measure CL. Ultimately, given the inherent limitations of each measure, the combined use of subjective ratings, secondary task performance and primary task performance may be helpful in determining the impact of instructional design on simulation learning outcomes.

Contributors: FAH contributed to the conception of this research and the design of the experimental protocol, and to data analyses and interpretation, and drafted the manuscript. DR contributed to data collection and analyses. RC contributed to the design of the experimental protocol and to data interpretation. SdR contributed to the design of the experimental protocol. AD contributed to the conception of this research, the design of the experimental protocol and the interpretation of findings. DR, RC, SdR and AD contributed to the critical revision of the paper. All authors approved the final manuscript for publication.

Acknowledgements: the authors wish to acknowledge the support of the Mount Sinai Surgical Skills Center in providing the equipment used to construct the simulators used in this study. The authors also wish to thank Drs Glenn Regehr, Centre for Health Education Scholarship, University of British Columbia, Vancouver, BC, Canada; James Drake, Division of Neurosurgery, Hospital for Sick Children, Toronto, ON, Canada; and Kulamakan (Mahan) Kulasegaram, The Wilson Centre, University of Toronto, for their support during the experimental design and data analysis, and Rob Shewaga, Faculty of Business and Information Technology, University of Ontario Institute of Technology, Oshawa, ON, Canada, for developing the custom software used for the secondary task.

Funding: this study was supported by a Royal College of Physicians and Surgeons of Canada (RCPSC) Medical Education Research Grant, as well as funds from an RCPSC Fellowship for Studies in Medical Education and a Canadian Institutes for Health Research Vanier Canada Graduate Scholarship held by the primary author.

Conflicts of interest: none.

Ethical approval: this study was approved by the University of Toronto Health Sciences Research Ethics Board (protocol 27427).

REFERENCES

1 Haji FA, Hoppe DJ, Morin M-P, Giannoulakis K, Koh J, Rojas D, Cheung JJ. What we call what we do affects how we do it: a new nomenclature for simulation research in medical education. Adv Health Sci Educ Theory Pract 2014;19 (2):273–80.
2 Cook DA, Hatala R, Brydges R, Zendejas B, Szostek JH, Wang AT, Erwin PJ, Hamstra SJ. Technology-enhanced simulation for health professions education: a systematic review and meta-analysis. JAMA 2011;306 (9):978–88.
3 McGaghie WC, Issenberg SB, Cohen ER, Barsuk JH, Wayne DB. Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Acad Med 2011;86 (6):706–11.
4 Nestel DD, Groom JJ, Eikeland-Husebø SS, O’Donnell JMJ. Simulation for learning and teaching procedural skills: the state of the science. Simul Healthc 2011;6 (7 Suppl):10–3.
5 Stefanidis D, Scerbo MW, Korndorffer JR Jr, Scott DJ. Redefining simulator proficiency using automaticity theory. Am J Surg 2007;193 (4):502–6.
6 Stefanidis D, Korndorffer JR, Markley S, Sierra R, Heniford BT, Scott DJ. Closing the gap in operative performance between novices and experts: does harder mean better for laparoscopic simulator training? J Am Coll Surg 2007;205 (2):307–13.
7 Gurusamy K, Aggarwal R, Palanivelu L, Davidson BR. Systematic review of randomised controlled trials on the effectiveness of virtual reality training for laparoscopic surgery. Br J Surg 2008;95:1088–97.
8 Teteris E, Fraser K, Wright B, McLaughlin K. Does training learners on simulators benefit real patients? Adv Health Sci Educ Theory Pract 2012;17 (1):137–44.
9 Cook DA, Bordage G, Schmidt HG. Description, justification and clarification: a framework for classifying the purposes of research in medical education. Med Educ 2008;42:128–33.
10 Cook DA, Hamstra SJ, Brydges R, Zendejas B, Szostek JH, Wang AT, Erwin PJ, Hatala R. Comparative effectiveness of instructional design features in simulation-based education: systematic review and meta-analysis. Med Teach 2013;35 (1):e844–75.
11 Issenberg SB, Ringsted C, Østergaard D, Dieckmann P. Setting a research agenda for simulation-based healthcare education. Simul Healthc 2011;6 (3):155–67.
12 Dieckmann PP, Phero JCJ, Issenberg SBS, Kardong-Edgren SS, Østergaard DD, Ringsted CC. The first Research Consensus Summit of the Society for Simulation in Healthcare: conduction and a synthesis of the results. Simul Healthc 2011;6 (Suppl):1–9.
13 Haji FA, Da Silva C, Daigle DT, Dubrowski A. From bricks to buildings: adapting the medical research council framework to develop programmes of research in simulation education and training for the health professions. Simul Healthc 2014;9 (4):249–59.
14 Sweller J. Cognitive load during problem solving: effects on learning. Cogn Sci 1988;12 (2):257–85.
15 van Merriënboer JJG, Sweller J. Cognitive load theory in health professional education: design principles and strategies. Med Educ 2010;44:85–93.
16 Miller GA. The magical number seven plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956;63 (2):81–97.
17 Cowan N. Working memory underpins cognitive development, learning, and education. Educ Psychol Rev 2014;26 (2):197–223.
18 Young JQ, van Merriënboer J, Durning S, ten Cate O. Cognitive load theory: implications for medical education: AMEE Guide No. 86. Med Teach 2014;36 (5):371–84.
19 Sweller J, van Merriënboer JJG, Paas FGWC. Cognitive architecture and instructional design. Educ Psychol Rev 1998;10 (3):251–96.
20 Paas F, Tuovinen JE, Tabbers H, van Gerven PWM. Cognitive load measurement as a means to advance cognitive load theory. Educ Psychol 2003;38 (1):63–71.
21 Sweller J. Element interactivity and intrinsic, extraneous, and germane cognitive load. Educ Psychol Rev 2010;22 (2):123–38.
22 Schnotz W, Kürschner C. A reconsideration of cognitive load theory. Educ Psychol Rev 2007;19 (4):469–508.
23 Brunken R, Plass JL, Leutner D. Direct measurement of cognitive load in multimedia learning. Educ Psychol 2003;38 (1):53–61.
24 Paas FG, van Merriënboer JJ, Adam JJ. Measurement of cognitive load in instructional research. Percept Mot Skills 1994;79 (1 Pt 2):419–30.
25 Ayres P. Using subjective measures to detect variations of intrinsic cognitive load within problems. Learn Instr 2006;16 (5):389–400.



26 Naismith LM, Cheung JJH, Ringsted C, Cavalcanti RB. Limitations of subjective cognitive load measures in simulation-based procedural training. Med Educ 2015;49:805–14.
27 Naismith L, Hambaz S, Cavalcanti RB. How should we measure cognitive load in postgraduate simulation-based education? Med Educ 2014;48 (Suppl 1):128.
28 Haji FA, Khan R, Regehr G, Drake J, de Ribaupierre S, Dubrowski A. Measuring cognitive load during simulation-based psychomotor skills training: sensitivity of secondary-task performance and subjective ratings. Adv Health Sci Educ Theory Pract 2015; PMID: 25764154. [Epub ahead of print.]
29 Rojas D, Haji F, Shewaga R, Kapralos B, Dubrowski A. The impact of secondary-task type on the sensitivity of reaction-time based measurement of cognitive load for novices learning surgical skills using simulation. Stud Health Technol Inform 2014;196:353–9.
30 Brydges R, Classen R, Larmer J, Xeroulis G, Dubrowski A. Computer-assisted assessment of one-handed knot tying skills performed within various contexts: a construct validity study. Am J Surg 2006;192 (1):109–13.
31 Choi HH, van Merriënboer J, Paas F. Effects of the physical environment on cognitive load and learning: towards a new model of cognitive load. Educ Psychol Rev 2014;26:225–44.
32 Jowett N, Leblanc V, Xeroulis G, MacRae H, Dubrowski A. Surgical skill acquisition with self-directed practice using computer-based video training. Am J Surg 2007;193 (2):237–42.
33 Paas FG. Training strategies for attaining transfer of problem-solving skill in statistics: a cognitive-load approach. J Educ Psychol 1992;84 (4):429.
34 Datta V, Mackay S, Mandalia M, Darzi A. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg 2001;193 (5):479–85.
35 Datta V, Chang A, Mackay S, Darzi A. The relationship between motion analysis and surgical technical assessments. Am J Surg 2002;184 (1):70–3.
36 Brunken R, Steinbacher S, Plass JL, Leutner D. Assessment of cognitive load in multimedia learning using dual-task methodology. Exp Psychol 2002;49 (2):109–19.
37 Field A. Repeated-measures designs (GLM 4). Discovering Statistics Using SPSS, 3rd edn. London: Sage Publications 2009;457–505.
38 Fitts P, Posner MI. Human Performance. Oxford: Brooks/Cole 1967.
39 Schmidt RA, Lee TD. Motor Control and Learning, 4th edn. Champaign, IL: Human Kinetics 2005.
40 Yurko YY, Scerbo MW, Prabhu AS, Acker CE, Stefanidis D. Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool. Simul Healthc 2010;5 (5):267–71.


41 Stefanidis D, Scerbo MW, Sechrist C, Mostafavi A, Heniford BT. Do novices display automaticity during simulator training? Am J Surg 2008;195 (2):210–3.
42 Carswell C, Clarke D, Seales W. Assessing mental workload during laparoscopic surgery. Surg Innov 2005;12 (1):80–90.
43 Downing SM. Validity – on the meaningful interpretation of assessment data. Med Educ 2003;37:830–7.
44 Leppink J, Paas F, van der Vleuten CPM, van Gog T, van Merriënboer JJG. Development of an instrument for measuring different types of cognitive load. Behav Res Methods 2013;45 (4):1058–72.
45 Kulasegaram KM, Grierson LEM, Norman GR. The roles of deliberate practice and innate ability in developing expertise: evidence and implications. Med Educ 2013;47:979–89.
46 Dubrowski A, Park J, Moulton C-A, Larmer J, MacRae H. A comparison of single- and multiple-stage approaches to teaching laparoscopic suturing. Am J Surg 2007;193 (2):269–73.
47 de Jong T. Cognitive load theory, educational research, and instructional design: some food for thought. Instr Sci 2010;38 (2):105–34.
48 Gerjets P, Scheiter K, Cierniak G. The scientific value of cognitive load theory: a research agenda based on the structuralist view of theories. Educ Psychol Rev 2009;21 (1):43–54.
49 Verkoeijen PPJL, Tabbers HK. Good research requires productive theories and guidelines. Med Educ 2013;47:863–5.
50 Cierniak G, Scheiter K, Gerjets P. Explaining the split-attention effect: is the reduction of extraneous cognitive load accompanied by an increase in germane cognitive load? Comput Human Behav 2009;25 (2):315–24.
51 Leppink J, Paas F, van Gog T, van der Vleuten CPM, van Merriënboer JJG. Effects of pairs of problems and examples on task performance and different types of cognitive load. Learn Instr 2014;30:32–42.
52 Kalyuga S. Cognitive load theory: how many types of load does it really need? Educ Psychol Rev 2011;23 (1):1–19.
53 Dubrowski A, Brydges R, Satterthwaite L, Xeroulis G, Classen R. Do not teach me while I am working! Am J Surg 2012;203 (2):253–7.
54 Kurahashi AM, Harvey A, MacRae H, Moulton C-A, Dubrowski A. Technical skill training improves the ability to learn. Surgery 2011;149 (1):1–6.
55 Szulewski A, Roth N, Howes D. The use of task-evoked pupillary response as an objective measure of cognitive load in novices and trained physicians. Acad Med 2015; PMID: 25738386. [Epub ahead of print.]
56 van Merriënboer JJG, Sweller J. Cognitive load theory and complex learning: recent developments and future directions. Educ Psychol Rev 2005;17 (2):147–77.


57 Fraser K, Ma I, Teteris E, Baxter H, Wright B, McLaughlin K. Emotion, cognitive load and learning outcomes during simulation training. Med Educ 2012;46:1055–62.
58 Tomasko JM, Pauli EM, Kunselman AR, Haluck RS. Sleep deprivation increases cognitive workload during simulated surgical tasks. Am J Surg 2012;203 (1):37–43.
59 Paas F, Sweller J. An evolutionary upgrade of cognitive load theory: using the human motor system and collaboration to support the learning of complex cognitive tasks. Educ Psychol Rev 2011;24 (1):27–45.
60 Maxwell JP, Masters RSW, Eves FF. The role of working memory in motor learning and performance. Conscious Cogn 2003;12 (3):376–402.

61 Bannert M. Managing cognitive load – recent trends in cognitive load theory. Learn Instr 2002;12 (1):139–46.
62 Dubrowski A. Performance vs. learning curves: what is motor learning and how is it measured? Surg Endosc 2005;19 (9):1290.
63 Lee TD, Magill RA. Can forgetting facilitate skill acquisition? In: Goodman D, Wilberg RB, Franks IM, eds. Differing Perspectives in Motor Learning, Memory, and Control. Amsterdam: Elsevier Science 1985;3–22.

Received 4 February 2015; editorial comments to author 3 March 2015; accepted for publication 21 April 2015


