
Emergency Medicine Australasia (2015) 27, 173–175

doi: 10.1111/1742-6723.12370

PERSPECTIVE

Simulation as a high stakes assessment tool in emergency medicine

Fenton O'LEARY
Emergency Department, The Children's Hospital at Westmead, Sydney, New South Wales, Australia

Abstract

The Australasian College for Emergency Medicine (ACEM) will introduce high stakes simulation-based summative assessment in the form of Objective Structured Clinical Examinations (OSCEs) into the Fellowship Examination from 2015. Miller's model emphasises that, no matter how realistic the simulation, it is still a simulation and examinees do not necessarily behave as they would in real life. OSCEs are suitable for assessing the CanMEDS domains of Medical Expert, Communicator, Collaborator and Manager. However, the need to validate the OSCE is emphasised by conflicting evidence on correlation with long-term faculty assessments, by discrepancies between essential actions checklists and global assessment scores, and by variable interrater reliability within individual OSCE stations and for crisis resource management skills. Although OSCEs can be a valid, reliable and acceptable assessment tool, the onus is on the examining body to ensure construct validity and high interrater reliability.

Key words: educational measurement, patient simulation.

The Australasian College for Emergency Medicine (ACEM) will introduce high stakes simulation-based summative assessment in the form of Objective Structured Clinical Examinations (OSCEs) into the Fellowship examination from 2015, while at the same time introducing Work-Based Assessments (WBAs) that specifically exclude simulation as an assessment tool. The aim of the present paper is to describe the current evidence for the use of simulation in high stakes assessment.

OSCEs have been used internationally in the United States Medical Licensing Examination (USMLE) since 2004 and in the College of Emergency Medicine (UK) Fellowship exam since 2005. Waterson has described the process the NSW Medical Board followed to incorporate simulation methods into its high stakes Performance Assessment Program and, in particular, describes the need for reliability, validity and acceptability.

Reliability is the extent to which test results will be reproduced by different raters (interrater reliability), by a candidate on different occasions (test–retest reliability) or by subsets of the same test (internal consistency). Reliability is considered best achieved by standardising as many components of tests as possible (e.g. case design and delivery, scoring criteria and raters).

Correspondence: Associate Professor Fenton O'Leary, Emergency Department, The Children's Hospital at Westmead, Locked Bag 4001, Westmead, NSW 2145, Australia. Email: [email protected]

Fenton O'Leary, MBBS, FACEM, Emergency Physician, Clinical Associate Professor.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

Accepted 21 January 2015

Validity is the extent to which the outcomes of a test faithfully reflect the tasks and traits it is intended to measure. A test has high content validity if experts have been adequately consulted about the relevant competencies; construct validity if candidates' scores increase with their level of experience; and concurrent validity if candidates' scores on the test correlate with those of other tests that assess the same tasks and traits. High validity is achieved in part by retaining as many variables as possible so as to preserve the realism of the cases.1

Miller's hierarchical performance assessment model emphasises that it is important to recognise that, no matter how realistic the simulation, it is still a simulation and examinees do not necessarily behave as they would in real life. This reinforces the need for summative WBAs during the period of training, scored against a recognised curriculum (Fig. 1).

Boulet has described some obstacles to be overcome before summative, simulation-based assessments can be used in emergency medicine (EM), including choosing the appropriate simulation tasks, developing appropriate metrics, assessing the reliability of test scores and providing the evidence to support the validity of test score inferences. In particular, he emphasises that as attempts are made to assess skills more broadly and to combine them with new technologies, new metrics and studies to support their accuracy will be needed.2 Table 1 describes the types of criteria used in scoring simulations,3 and Table 2 describes the barriers to using simulation in testing.4
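The construct validity criterion defined above, that scores should rise with a candidate's level of experience, can be checked empirically. The following is a minimal illustrative sketch in Python; the training years and OSCE scores are entirely hypothetical and are not drawn from any of the studies cited in this paper.

```python
# Illustrative construct-validity check (hypothetical data, not from the cited studies).
# If an OSCE measures what it claims to, station scores should rise with experience.
from scipy.stats import spearmanr

# Hypothetical total OSCE scores grouped by training year (1 = most junior).
training_year = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
osce_score    = [54, 61, 58, 63, 66, 60, 71, 69, 74, 78, 75, 80]

# Spearman's rank correlation tests for a monotone increase of score with seniority.
rho, p_value = spearmanr(training_year, osce_score)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

A strongly positive rank correlation would support construct validity for the station; a flat or negative relationship would call the station into question.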





Figure 1. Miller's prism of clinical competence (aka Miller's pyramid). Based on the work of Miller GE, The assessment of clinical skills/competence/performance. Acad. Med. 1990; 65(9 Suppl): S63–7. Adapted by Drs. R. Mehay and R. Burns, UK (Jan 2009).

TABLE 1. Criteria for scoring simulations3

Explicit process: Case-specific checklist used in a standardised patient chest-pain station to record the history findings obtained and physical examination manoeuvres performed by an examinee.
Implicit process: Global judgment of a physician-rater observing an examinee's work with an integrated simulator in a trauma-type scenario.
Explicit outcome: Indicators of overall patient status (alive vs dead; complications; physiological indicators) at the conclusion of a computer-based clinical simulation.
Implicit outcome: Global judgment of a physician-rater inspecting the sutures made by an examinee on a skin pad.
Combined criteria: Task-specific checklist of explicit process and outcome criteria for observation and inspection of an end-to-end anastomosis of pig bowel.

With the use of the CanMEDS (Canadian Medical Education Directives for Specialists) domains of competence for EM, it has been suggested that OSCEs and high-fidelity simulation are suitable for assessing the domains of Medical Expert (history taking, physical examination, technical skills performance and clinical decision making), Communicator (patient and family interaction, writing records), Collaborator (ability to manage conflict, interprofessional interaction), Manager (ability to lead) and Scholar (teaching ability).5

However, the need to validate the OSCE is emphasised by a paper examining the relationship between EM intern OSCE performance and EM faculty evaluation of resident performance. Over 5 years, the OSCE assessment of clinical performance did not correlate with faculty assessment in any of the measured domains of history taking, physical examination, overall performance and interpersonal skills.6

In anaesthetics, the Generic Integrated Objective Structured Assessment Tool (GIOSAT) has been evaluated as a means of integrating Medical Expert and intrinsic (non-medical expert) CanMEDS competencies with non-technical skills. The tool has shown construct validity for the Medical Expert domain, but was not valid for the intrinsic CanMEDS competencies.7

Hall et al. have developed and evaluated a high-fidelity simulation-based assessment tool for EM residents.8 This uses a three-station OSCE and, for each station, a corresponding assessment tool was developed with an essential actions (EA) checklist and a global assessment score (GAS).

TABLE 2. Identified barriers to using simulation as an assessment tool4

Costs and logistics
Standardisation across multiple simulation sites
Exposure of simulation modalities to trainees before high-stakes testing
Overreliance on psychometric criteria, which can lead to measures (e.g. checklists) that may fail to capture the complexities involved in healthcare, such as caring for the patient with multiple comorbidities
Validity, especially in maintenance of licensure and certification, where little evidence exists
Transferability to actual clinical practice
Training and recruitment of raters for high-stakes simulation-based assessment
Evidence base for some simulation-based activities not yet robust enough for high-stakes assessment

Using GAS scores there was construct validity, with scores increasing with increasing seniority, but on the EA scores junior residents outperformed senior residents in some situations. This may illustrate the problem with EA checklists for high stakes senior assessments: participants who have reached mastery may not follow a stepwise approach but will still perform better overall and receive superior GAS scores. This point is extremely important when designing assessment tools at Fellowship level.8

Looking at the individual skills that might be examined in an OSCE, one group developed and evaluated an objective structured assessment of technical skills for neonatal lumbar puncture (OSATS-LP). The domains of sterility and CSF collection had moderate statistical reliability (κ = 0.41 and 0.51, respectively). The domains of preparation, analgesia and management of laboratories had substantial reliability (κ = 0.60, 0.62 and 0.62, respectively). The domains of positioning and needle insertion were less reliable (κ = 0.16 and 0.16, respectively). In high stakes assessment a kappa of >0.75 (excellent) would be required as an acceptable benchmark, so this tool would not be suitable.9
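For context on the kappa values quoted above, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance. The sketch below is purely illustrative; the ratings are hypothetical and are not data from the OSATS-LP study.

```python
# Illustrative Cohen's kappa for two raters scoring the same dichotomous
# checklist item (1 = performed, 0 = not performed). Hypothetical data only.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving categorical ratings to the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's marginal rating frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Two examiners score the same 10 candidates on one checklist item.
rater_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rater_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # 0.52 despite 80% raw agreement
```

Even 80% raw agreement between raters yields a kappa of only about 0.5 once chance agreement is accounted for, which is why the >0.75 benchmark for high stakes assessment is a demanding one.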




In contrast, scoring systems for Paediatric Advanced Life Support (PALS) algorithms seem to be more reliable, with high interrater reliability (0.81) when using four scenarios (asystole, dysrhythmia, respiratory arrest, shock) and four raters, and also demonstrating construct validity.10 Adler and colleagues have demonstrated that a dichotomous checklist or the Global Performance Assessment Tool (GPAT), an anchored multidimensional scale, is highly reliable, with high interrater reliability (>0.9), when assessing paediatric EM residents in simulated paediatric emergencies.11

Formal tools to assess the crisis resource management skills of trainees already exist but have shown mixed results on evaluation. The Ottawa Global Rating Scale (GRS) has acceptable interrater reliability as well as construct validity,12 and a study by Adler et al. has shown that a simulation programme based on four cases (apnoea, asthma, supraventricular tachycardia and sepsis) can reliably measure and discriminate competence.13 An interesting paper on senior intensive care trainees undertaking a simulated specialist examination with high-fidelity simulation, scored using the Anaesthesia Non-technical Skills (ANTS) scale and the GRS, showed only fair interrater reliability for the ANTS and GRS scores, and only fair agreement for non-technical skills pass or fail (weighted kappa, 0.32) and for technical skills pass or fail (weighted kappa, 0.36). The low interrater reliability for the ANTS and GRS rating scales, which are evaluated and well-regarded tools, is a real concern for a potential high stakes assessment examination.14

In the anaesthetic literature, Everett and colleagues have validated checklists and a global rating scale as part of the Managing Emergencies in Paediatric Anaesthesia (MEPA) course, noting that at least two raters were required to achieve acceptable reliability and claiming that the global rating scale allows raters to make a judgement regarding a participant's readiness for independent practice.15

Although it does seem possible that simulation in the form of OSCEs can be a valid, reliable and acceptable assessment tool, the onus is on the examining body to ensure construct validity and high interrater reliability (even within individual stations) for both Medical Expert, or technical, skills and non-technical skills. This will ensure the OSCEs are acceptable to candidates, examiners and other health professionals.

Competing interests

None declared.

References

1. Waterson L. High-stakes performance assessment. In: Riley R, ed. A Manual of Simulation in Healthcare. London: Oxford University Press, 2008.
2. Boulet JR. Summative assessment in medicine: the promise of simulation for high-stakes evaluation. Acad. Emerg. Med. 2008; 15: 1017–24.
3. Boulet J, Swanson DS. Psychometric challenges of using simulations for high-stakes assessment. In: Dunn W, ed. Simulations in Critical Care and Beyond. Des Plaines, IL: Society of Critical Care Medicine, 2004; 119–30.
4. Holmboe E, Rizzolo MA, Sachdeva AK, Rosenberg M, Ziv A. Simulation-based assessment and the regulation of healthcare professionals. Simul. Healthc. 2011; 6(Suppl): S58–62.
5. Sherbino J, Bandiera G, Frank JR. Assessing competence in emergency medicine trainees: an overview of effective methodologies. CJEM 2008; 10: 365–71.
6. Shih RSM, Mayer C. A 5 year study of emergency medicine intern Objective Structured Clinical Examination (OSCE) performance does not correlate with emergency medicine faculty evaluation of resident performance. Ann. Emerg. Med. 2013; 62: S181.
7. Neira VM et al. 'GIOSAT': a tool to assess CanMEDS competencies during simulated crises. Can. J. Anaesth. 2013; 60: 280–9.
8. Hall AK, Pickett W, Dagnone JD. Development and evaluation of a simulation-based resuscitation scenario assessment tool for emergency medicine residents. CJEM 2012; 14: 139–46.
9. Iyer MS, Santen SA, Nypaver M et al. Assessing the validity evidence of an objective structured assessment tool of technical skills for neonatal lumbar punctures. Acad. Emerg. Med. 2013; 20: 321–4.
10. Donoghue A, Nishisaki A, Sutton R, Hales R, Boulet J. Reliability and validity of a scoring instrument for clinical performance during Pediatric Advanced Life Support simulation scenarios. Resuscitation 2010; 81: 331–6.
11. Adler MD, Vozenilek JA, Trainor JL et al. Comparison of checklist and anchored global rating instruments for performance rating of simulated pediatric emergencies. Simul. Healthc. 2011; 6: 18–24.
12. Kim J, Neilipovitz D, Cardinal P, Chiu M, Clinch J. A pilot study using high-fidelity simulation to formally evaluate performance in the resuscitation of critically ill patients: the University of Ottawa Critical Care Medicine, High-Fidelity Simulation, and Crisis Resource Management I Study. Crit. Care Med. 2006; 34: 2167–74.
13. Adler MD, Trainor JL, Siddall VJ, McGaghie WC. Development and evaluation of high-fidelity simulation case scenarios for pediatric resident education. Ambul. Pediatr. 2007; 7: 182–6.
14. Nunnink L, Foot C, Venkatesh B et al. High-stakes assessment of the non-technical skills of critical care trainees using simulation: feasibility, acceptability and reliability. Crit. Care Resusc. 2014; 16: 6–12.
15. Everett TC, Ng E, Power D et al. The Managing Emergencies in Paediatric Anaesthesia global rating scale is a reliable tool for simulation-based assessment in pediatric anesthesia crisis management. Paediatr. Anaesth. 2013; 23: 1117–23.

© 2015 The Author. Emergency Medicine Australasia published by Wiley Publishing Asia Pty Ltd on behalf of Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine
