
Medical Teacher, 2015, 1–6, Early Online

Reliability and validity of an extended clinical examination

G.A.M. BOUWMANS1, E. DENESSEN2, A.M. HETTINGA1, C. MICHELS2 & C.T. POSTMA1

1Radboud University Medical Centre, The Netherlands; 2Radboud University Nijmegen, The Netherlands


Abstract
Introduction: An extended clinical examination (ECE) was administered to 85 final-year medical students at the Radboud University Medical Centre in the Netherlands. The aim of the study was to determine the psychometric quality and the suitability of the ECE as a measurement tool to assess the clinical proficiency of eight separate clinical skills.
Methods: Generalizability studies were conducted to determine the generalizability coefficient and the sources of variance of the ECE. An additional D-study was performed to estimate the generalizability coefficients with varying numbers of stations.
Results: The largest sources of variance were found in skill difficulties (36.18%), the general error term (26.76%) and the rank ordering of skill difficulties across the stations (21.89%). The generalizability coefficient of the entire ECE was above the 0.70 lower bound (G = 0.74). D-studies showed that the separate skills could yield sufficient G coefficients in seven out of eight skills if the ECE were lengthened from 8 to 14 stations.
Discussion: The ECE proved to be a reliable clinical assessment that enables examinees to compose a clinical reasoning path through self-obtained data. The ECE can also be used as an assessment tool for separate clinical skills.

Introduction

Practice points

- The extended clinical examination enables candidates to go through the entire clinical process, from data acquisition to management plan.
- The design of the extended clinical examination allows for profound inferences about clinical proficiency as well as about internal validity.
- Dependent on resources (time, money, availability) or the required thoroughness of the research questions, the extended clinical examination facilitates various analysis configurations.

Assessing the clinical competence of students in an objective, reliable and valid way is not an easy challenge. Clinical competence comprises different features, and assessing the various components in a coherent way is a central tenet in the assessment of clinical proficiency. A frequently used method to assess clinical proficiency is the Objective Structured Clinical Examination (OSCE) (Harden 1988), and over the years many different OSCE forms have been developed. For instance, the employment of objective checklists was initially a key feature of OSCEs (Regehr et al. 1998), meant to remove subjectivity from the assessment. Gradually it became apparent that checklists might measure thoroughness rather than competence (Reznick et al. 1998; Regehr et al. 1999), and currently there seems to be consensus that, dependent on the context, a mixed approach is probably best (Newble 2004). Another circumstance that propelled the development of OSCE variants was the problem of case specificity. Case specificity (the fact that clinical proficiency in one medical case is a poor predictor of proficiency in another case) is a recurring issue in clinical assessments, generally resulting in assessments requiring large numbers of cases. Another strategy to address the case specificity problem was found in the combination of OSCEs with other assessment formats (Newble 2004), and a relatively new way of combining OSCEs with other formats was found in free-text, post-encounter forms that allow for the assessment of the intermediate steps in clinical reasoning (Durning et al. 2012).

At the Radboud University Medical Centre in the Netherlands, a need was felt to develop a formative clinical assessment for final-year students. To address the challenges of developing a valid and reliable assessment we set up an Extended Clinical Examination (ECE) consisting of eight 30-minute stations, in which the clinical problems are portrayed by highly trained, professional standardized patients. The entire examination took four hours, with an additional hour for the introduction, the explanation of the procedure and a short break. The content of the examination is derived from a national blueprint crossing eight clinical problems that the candidates are expected to be able to handle with eight separate clinical skills (1. history taking, 2. physical examination, 3. communication, 4. professional behaviour, 5. report of the most important conclusions from history taking and physical examination, 6. report of problem list, 7. report of differential diagnosis and 8. report of diagnostic management plan).

Correspondence: G.A.M. Bouwmans, IWOO Internal Postal Code 306, Geert Grooteplein Noord 21, 6525 EZ Nijmegen, The Netherlands. Tel: 31 243617344; Fax: 31 243560433; E-mail: [email protected]
ISSN 0142-159X print/ISSN 1466-187X online/15/000001–6 © 2015 Informa UK Ltd. DOI: 10.3109/0142159X.2015.1009423


In each of the eight stations the candidates had to execute each of these eight skills, resulting in an examination consisting of 64 components, that is, eight stations, each of which measures eight clinical skills. The particular problems chosen for the examination originate from a framework of problems that should be mastered by medical graduates, as established by the regulatory bodies (Sonderen et al. 2009; Laan et al. 2010). The theoretical rationale behind the ECE design is that the chosen set-up enables the candidates to go through the entire clinical process, from data acquisition to differential diagnosis and management plan, in each station. In other words, the various clinical skills are not executed in isolation but are assessed in an integrated manner, just as they would be in daily clinical practice, thus supporting the ecological validity of the assessment. Moreover, the symmetric design of the ECE also allows for estimation of validity and reliability parameters at the level of the separate skills. Assessment at skill level broadens the applicability of the ECE, because it allows the ECE to be used as an assessment tool for distinct skills of clinical proficiency (e.g. communication skills) or as an instrument for a more profound insight into the various components of clinical competence. For instance, a high score on data gathering skills (history taking and physical examination) combined with a low score on differential diagnosis skills could indicate that the candidate is able to systematically obtain relevant clinical data, but lacks the ability to combine the accumulated data into meaningful clinical conclusions (clinical reasoning). The first aim of this study was to determine the psychometric quality and the main sources of variance of the entire ECE, comprising eight stations of eight skills each. Our second aim was to determine the suitability of the ECE as a measurement tool to assess the proficiency of eight separate clinical skills.

Methods

To date, the entire ECE library of the Radboud University Medical Centre consists of 26 stations, each of which assesses eight separate skills: (1) history taking, (2) physical examination, (3) communication, (4) professional behaviour, (5) report of the most important conclusions from history taking and physical examination, (6) report of problem list, (7) report of differential diagnosis and (8) report of diagnostic management plan. For the ECE examination a set of eight stations was selected from the ECE library, and in each station all eight of the aforementioned skills were assessed. The stations were selected randomly, with the restriction that no stations in the examination assessed the same class of clinical problems; that is, since the stations were selected from a 26-station ECE library, it was considered undesirable for several stations to relate to, for example, musculoskeletal disorders. For this study, 85 final-year students from the Radboud University Medical Centre in the Netherlands sat an eight-station ECE assessment. In each station the examinee was instructed to take a history and perform a physical examination with a standardized patient (SP), as deemed necessary in the light of the presented complaints and signs (20 min). Directly after each encounter the examinee was required to compose a free-text, written summary (10 min) of the encounter comprising four elements: (1) report of the most important aspects of history taking and physical examination, (2) report of problem list, (3) report of differential diagnosis and (4) report of diagnostic management plan. While the examinee was composing these four written reports, the SP completed four questionnaires pertaining to the prior encounter with the examinee: three checklists (concerning history taking skills, physical examination skills and communication skills) and one global rating scale (concerning professional behaviour). The global rating scale concerning professional behaviour (two questions) and the communication checklist (18 questions) were generic, meaning that the questionnaires were identical for all stations. An example from the communication checklist is: "The physician showed adequate nonverbal behaviour (good/mediocre/bad)". The checklists concerning history taking (on average about 12 items) and physical examination (on average about 15 items) were specific, meaning that each station had a different set of items, dependent on the contents of the station (Hettinga et al. 2010). The history taking checklists determined whether the examinee made inquiries about distinct topics, for instance in the Jaundice station: "The physician asked whether the urine is brown/dark orange (yes/no)". The physical examination checklists determined whether well-defined examination activities took place, for instance: "The physician measures blood pressure (correctly executed/not correctly executed/not executed)". Each written report was assessed on a global rating scale by a principal clinical lecturer (PCL); for instance, in the report of history taking/physical examination, the PCLs rated the use of professional language (semantic qualifiers) and the integration of history taking and physical examination data on a 0–10 scale.

Analyses


The ratings "not correctly executed" and "not executed" from the physical examination checklists and the ratings "mediocre" and "bad" from the communication skills checklist were taken together, since those categories were considered indicators of inadequate performance. The items of each checklist were averaged into one grade, resulting in an assessment in which each candidate was assessed on 64 components, that is, eight different stations, each comprising the same eight skills (history taking, physical examination, communication, professional behaviour, report of history taking and physical examination, report of problem list, report of differential diagnosis and report of diagnostic management plan). To gain insight into the psychometric properties of the stations and the separate clinical skills, generalizability analyses were conducted with which various sources of score variance were identified. For the entire ECE, a two-facet, crossed design with eight stations and eight skills was used. The history taking and physical examination skills comprised station-specific items and were therefore analysed with a two-facet design in which the case-specific items were nested in the stations. The communication and professional behaviour skills were assessed by identical items across all stations and were measured with a two-facet, crossed design.
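For reference, the generalizability (G) coefficient for relative decisions in such a fully crossed person × station × skill design takes the standard G-theory form below, where the σ² terms denote the estimated variance components and n_s and n_k the numbers of stations and skills. This is the textbook expression rather than a formula quoted from the original analysis; the D-studies simply re-evaluate it with alternative values of n_s.

$$
E\rho^2 \;=\; \frac{\sigma^2_{p}}{\sigma^2_{p} \;+\; \dfrac{\sigma^2_{p \times s}}{n_s} \;+\; \dfrac{\sigma^2_{p \times k}}{n_k} \;+\; \dfrac{\sigma^2_{p \times s \times k,\,e}}{n_s\, n_k}}
$$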


Table 1. Average Pearson's r correlation among skills within stations of the ECE.

                          stat1   stat2   stat3   stat4   stat5   stat6   stat7   stat8
All ECE skills (n = 8)    0.28    0.41    0.33    0.23    0.28    0.27    0.31    0.38
SP skills (n = 4)a        0.55    0.42    0.41    0.35    0.57    0.31    0.42    0.45
PCL skills (n = 4)b       0.18    0.63    0.59    0.40    0.28    0.51    0.49    0.54

a History taking, physical examination, communication, professional behaviour.
b Report of history taking and physical examination, report of problem list, report of differential diagnosis, report of diagnostic management plan.


Table 2. Variance components and generalizability coefficient from the generalizability study of the overall assessment.

Source of variation         Variance component   Variance percentage
Person                      0.127                 4.83
Station                     0.000                 0.00
Skill                       0.952                36.18
Person × Station            0.184                 6.99
Person × Skill              0.088                 3.34
Station × Skill             0.576                21.89
Person × Station × Skill    0.704                26.76

G = 0.74

The four written skills solely comprised one global rating for each station and were analysed with a one-facet design. An additional D-study was conducted to estimate the G coefficients with varying numbers of stations. A value of 0.70 was used as the lower bound for the generalizability coefficient (G coefficient) (Downing 2004; Brannick et al. 2011). Internal validity was assessed by calculating the average correlations among the skills within each station. Since the ECE is still in use today, the stations are referred to by number rather than by name. Data were analysed using IBM SPSS Statistics 20 (Armonk, NY) and G_String IV, version 6.11 (Hamilton, Ontario, Canada) (Bloch & Norman 2012).
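As an illustration of this internal-validity calculation, the sketch below computes the average pairwise Pearson correlation among the eight skill scores within a single station. The data and column names are hypothetical placeholders (the original analyses were run in SPSS and G_String IV); pandas and NumPy are assumed to be available.

```python
import numpy as np
import pandas as pd

# Hypothetical skill scores (0-10 scale) of five candidates in one station.
# Column names are illustrative only, not taken from the original data set.
station = pd.DataFrame({
    "history_taking":      [6.5, 7.2, 5.8, 8.0, 6.9],
    "physical_exam":       [5.0, 6.1, 4.8, 7.2, 5.5],
    "communication":       [8.9, 9.1, 8.4, 9.3, 8.7],
    "prof_behaviour":      [7.0, 7.5, 6.8, 8.1, 7.2],
    "report_hx_pe":        [6.2, 7.0, 5.5, 7.8, 6.4],
    "report_problem_list": [6.0, 6.8, 5.2, 7.5, 6.1],
    "report_diff_dx":      [5.8, 6.9, 5.0, 7.6, 6.0],
    "report_mgmt_plan":    [6.1, 6.7, 5.4, 7.4, 6.2],
})

# Pairwise Pearson correlations among the eight skills within this station.
corr = station.corr(method="pearson")

# Average the off-diagonal entries (each pair counted once), analogous to
# the per-station values reported in Table 1.
pairwise = corr.values[np.triu_indices_from(corr.values, k=1)]
print(f"average within-station r = {pairwise.mean():.2f}")
```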

Results

Within each station, the average correlations among all eight skills of the ECE were moderate (average r = 0.31) and fairly stable across stations (Table 1, first data row). Closer inspection of the correlations revealed that the four skills assessed by the SPs (history taking, physical examination, communication, professional behaviour) correlated strongly (average r = 0.44) and fairly consistently (Table 1, second data row). The four written skills (PCL ratings of the report of history taking and physical examination, report of problem list, report of differential diagnosis and report of diagnostic management plan) also correlated strongly (average r = 0.45) but were more dispersed, with relatively low values in station 1 and station 5 (average r of 0.18 and 0.28, respectively) (Table 1, third data row). The estimated person variance component was relatively small (4.83%), meaning that the candidates' total scores (candidate scores averaged over stations and skills) did not vary strongly (Table 2). This proportion of variance represents the individual differences in medical competence levels between the candidates (Mushquash & O'Connor 2006).

Table 3. Generalizability coefficients for the eight separate skills.

                 HTa    PEb    COc    PBd    R1e    R2f    R3g    R4h
G coefficient    0.63   0.60   0.56   0.64   0.79   0.62   0.72   0.58

a History taking. b Physical examination. c Communication. d Professional behaviour. e Report of history taking/physical examination. f Report of problem list. g Report of differential diagnosis. h Report of management plan.

The low station variance (0.00%) indicates that the eight station scores (scores per station, averaged over skills and persons) did not differ significantly, pointing to equal difficulties of the eight stations. The largest source of variance was found in the skill component (36.18%), meaning that the skill difficulties differed substantially from each other. Closer inspection of the average scores showed that the communication skill was the main contributor, with high scores in every station (average score = 8.77), and to a lesser extent the physical examination skill, with low average scores (average score = 5.20), while the six remaining skills yielded average scores around 6.50. The person × station variance component was 6.99%, indicating that the eight stations showed relatively small but not negligible differences in the rank ordering of the candidates with respect to their proficiency level. The person × skill variance was low (3.34%), meaning that the skill scores indicated approximately the same ordering of candidates' proficiency levels. The station × skill variance was high (21.89%), signifying that the rank ordering of skill difficulties differed greatly between the eight stations. The person × station × skill variance (26.76%) reflected not only the ordering of candidates across stations and skills, but also other sources of error that were not included in the design. This variance component is therefore a general error term that cannot be further unravelled (van der Vleuten & Wijnen 1991; Mushquash & O'Connor 2006). The generalizability coefficient for the entire ECE (G = 0.74) was sufficient (Table 2), whereas the generalizability coefficients of the separate skills were below the lower bound in six out of eight skills (Table 3). Only the report of history taking/physical examination (G = 0.79) and the report of the differential diagnosis (G = 0.72) showed fair levels of generalizability. Figures 1 and 2 show the results of the D-studies concerning the G coefficients. For readability, the G coefficients are presented in two separate graphs of four skills each, with the results for the entire ECE included in each graph as a reference.
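As a rough arithmetic check (our own illustration, not a calculation reported by the authors), substituting the variance components from Table 2 into the relative G coefficient for the crossed design, with n_s = 8 stations and n_k = 8 skills, reproduces the reported overall value:

$$
E\rho^2 = \frac{0.127}{\,0.127 + \dfrac{0.184}{8} + \dfrac{0.088}{8} + \dfrac{0.704}{8 \times 8}\,}
        = \frac{0.127}{0.127 + 0.023 + 0.011 + 0.011} \approx 0.74
$$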


[Figure 1. Estimated G coefficients from the D-study for the skills assessed by standardized patients (history taking, physical examination, communication and professional behaviour), plotted for 6 to 14 stations, with the entire ECE as reference.]

[Figure 2. Estimated G coefficients from the D-study for the skills assessed by principal clinical lecturers (report of history taking/physical examination, report of problem list, report of differential diagnosis and report of management plan), plotted for 6 to 14 stations, with the entire ECE as reference.]

The D-studies showed that shortening the entire ECE from eight to seven stations could still result in a sufficient reliability value (G = 0.72). Lengthening the ECE to 14 stations could result in sufficient levels of the G coefficient for all skills, with the exception of the communication skill (G = 0.66).
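The D-study projection for the overall ECE can be reproduced from the Table 2 variance components. The sketch below is our own illustration under the standard random-effects formula for relative decisions; it covers only the overall assessment (eight skills per station), not the per-skill designs underlying Figures 1 and 2.

```python
# Variance components for the overall ECE, taken from Table 2.
VAR_P, VAR_PS, VAR_PK, VAR_PSK = 0.127, 0.184, 0.088, 0.704
N_SKILLS = 8  # the number of skills per station is kept fixed

def projected_g(n_stations: int, n_skills: int = N_SKILLS) -> float:
    """Relative G coefficient for a crossed person x station x skill design."""
    relative_error = (VAR_PS / n_stations
                      + VAR_PK / n_skills
                      + VAR_PSK / (n_stations * n_skills))
    return VAR_P / (VAR_P + relative_error)

for n in range(6, 15):
    print(f"{n:2d} stations: G = {projected_g(n):.2f}")
# 7 stations give G ~ 0.72 and 8 stations G ~ 0.74, matching the reported
# values for the entire ECE; 14 stations project to roughly G ~ 0.81.
```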

Discussion

Considering the entire ECE, all eight skills correlated moderately and positively, supporting the notion that all skills contributed to the underlying construct of clinical competence. The intercorrelations of the four skills assessed by the SPs, as well as those of the four skills assessed by the PCLs, were higher, indicating that the two clusters measured complementary yet different aspects of the clinical competence construct.


The first cluster of skills (history taking, physical examination, communication, professional behaviour), which primarily refers to practically oriented skills, that is, a competency aimed at the professional gathering of relevant clinical data, can be considered distinct from the second cluster of skills (problem list, differential diagnosis, management plan, report of history taking/physical examination), which primarily represents more cognitively oriented skills, that is, the integration of the clinical data gathered in the first cluster into meaningful clinical conclusions (clinical reasoning) (Durning et al. 2012). The main source of variance was found in the skill component of the ECE, more specifically in the high scores on all communication checklists and the low scores on half of the physical examination checklists. The high scores across all the communication checklists indicate that some of the (generic) items have little discriminating power. This might imply that candidates reached the required communication competencies more easily than problem-related medical skills, and that these skills are less related to the specific content of the medical problem at stake than the other skills. A potential reason for the low scores on the physical examination checklists could be found in the content validity of the checklists. Expert panels and case constructors (mostly clinicians who are specialized in the stations that have been developed) often diverge with respect to which items to include in a checklist (Boulet et al. 2008). Moreover, expert-based items often do not correspond to the evidence-based items found in the literature (Hettinga et al. 2010). A possible cause for this discrepancy might be that clinical reasoning paths can be idiosyncratic (Schmidt et al. 1990; Eva 2005; Charlin et al. 2007; Schuwirth 2009), that is, different clinicians can use different reasoning routes to arrive at the same clinical conclusions. Therefore, a deterministic checklist might not be the most effective method to capture the candidate's proficiency (Regehr et al. 1998, 1999; Reznick et al. 1998), for there is always the possibility that the candidate followed a valid reasoning route through items not included in the checklist, resulting in a low score on a standardized checklist. On the other hand, all candidates in this study were final-year students from the same university. They were all trained and educated in the same educational context, so there seemed to be little reason to assume that these students had already developed highly personalized clinical reasoning routes. These results might indicate that the developers of the ECE checklists (experienced clinicians with personalized reasoning routes) did not take into account the standards and methods used in the university curriculum, but placed too much emphasis on their personal, idiosyncratic viewpoints. Aligning the physical examination checklists more thoroughly with the methods and indications taught in the curriculum might improve the validity of the physical examination checklists, as well as the reliability coefficients. Obviously, this alignment is only justified in a research setting such as this one, concerning a homogeneous student group. In other situations, where different but equally correct physical examination manoeuvres are to be expected, it might be advisable to use (holistic) global rating scales instead of (deterministic) checklists, but owing to the required expertise this would probably mean that these global ratings have to be assigned by PCLs. Another major source of variance was found in the station × skill interaction, meaning that the rank ordering of skill difficulties differed across stations.
For instance, the differential diagnosis skill could be relatively easy in some stations but more difficult in other stations, and this was evidently the case even though the communication skill was the easiest in all stations

and the physical examination skill was the most difficult in half of the stations. To a certain extent this "skill specificity" is an inherent property of clinical proficiency; for instance, the relative difficulty of a differential diagnosis in a case concerning a mitochondrial disorder is likely higher than in a case concerning a fractured leg. Yet, owing to the integrated design of the ECE skills, we may expect some systematic relation between certain skills across all stations alike. For instance, it would be peculiar, not to say very problematic with regard to internal validity, if a relatively low score on the data gathering skills (possibly indicating that essential clinical data were not taken into account) could nevertheless result in an excellent differential diagnosis. Inferences such as these should be drawn carefully, as the scoring scales are relative, intertwined and complicated to disentangle. It is conceivable that aligning the checklists with the curriculum, or the employment of global rating scales instead of checklists as mentioned above, might also reduce this source of variance. In most clinical competence research, the person × station interaction (case specificity) is a major source of variance (van der Vleuten & Wijnen 1991; Guiton et al. 2004; Swanson & van der Vleuten 2013), but in our data this variance was moderate. The main difference between regular clinical assessments and the ECE is that the latter uses more skills (and therefore also more items) within each station. Adding more information to the stations probably made the station scores more robust and more consistent with each other. The low reliability of the communication skill is consistent with the literature (Brannick et al. 2011), and since communication skills and the professional behaviour skill are technically different from clinical skills, it might be considered to merge both into a new skill, e.g. an interpersonal skill, with possibly a higher reliability value. Merging skills to obtain higher reliability or to gain efficiency (reducing the required number of stations) is not uncommon. For instance, history taking and physical examination can also be seen as complementary skills, contributing to a data gathering construct (Whelan et al. 2005). The four written reports might be merged to measure clinical reasoning proficiency (Durning et al. 2012). Collapsing data like this is justified provided that it is supported by a sound fundamental rationale; moreover, it must be taken into account that every alteration changes the construct to be measured. A major advantage of the ECE design is that it allows for several ways of merging data, and this is a practically useful asset. Dependent on the research questions, the resources (time, money, availability) or the required thoroughness, the ECE design facilitates various analysis configurations. The entire ECE proved to be a reliable assessment that takes approximately half a day of testing time. The ECE can also be used as an assessment tool for seven separate clinical skills. The possibility of assessing separate skills is a valuable addition to the ECE, for it allows not only a profound insight into the strengths and weaknesses of the intermediate steps necessary to arrive at clinical conclusions, but also inferences regarding the internal validity of the ECE. In-depth analysis is costly, however: in order to measure the separate skills reliably, the ECE would have to be lengthened from 8 to 14 stations, approximately a full day of testing time.


Conclusion

The ECE is a reliable clinical assessment that enables examinees to compose a comprehensive clinical reasoning path solely through self-obtained data. The ECE can also be used as an adaptable and profound measure of candidates' proficiency levels on seven separate clinical skills, and as a measure of internal validity.

Notes on contributors


GEERT BOUWMANS, MSc, is a health psychologist working as a researcher at the Radboud University Medical Centre, Nijmegen, the Netherlands. He researches reliability and validity aspects of clinical skills assessments.

EDDIE DENESSEN, PhD, is an associate professor and member of the Behavioural Science Institute at Radboud University Nijmegen, the Netherlands. His research interests include educational research methods and psychometrics.

AGGIE HETTINGA, MD, is a general practitioner and researcher at the Radboud University Medical Centre in the city of Nijmegen, the Netherlands. She has special interest and experience in the development of clinical skills assessments with standardized patients.

CHRIS MICHELS, PhD, is a retired assistant professor at the Faculty of Social Sciences at Radboud University Nijmegen, the Netherlands. His research interests included data analysis, psychometrics and Aptitude × Treatment Interaction research.

CORNELIS T. POSTMA, MD, PhD, is an associate professor of Medicine and principal lecturer in medical education at the Department of Internal Medicine of the Radboud University Nijmegen Medical Centre. His main medical education interests are in the training of practical medical education and medical competence and in the field of clinical assessment.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

References

Bloch R, Norman G. 2012. Generalizability theory for the perplexed: A practical introduction and guide: AMEE Guide No. 68. Med Teach 34:960–992.
Boulet JR, van Zanten M, de Champlain A, Hawkins RE, Peitzman SJ. 2008. Checklist content on a standardized patient assessment: An ex post facto review. Adv Health Sci Educ Theory Pract 13:59–69.
Brannick MT, Erol-Korkmaz HT, Prewett M. 2011. A systematic review of the reliability of objective structured clinical examination scores. Med Educ 45:1181–1189.


Charlin B, Boshuizen HPA, Custers EJ, Feltovich PJ. 2007. Scripts and clinical reasoning. Med Educ 41:1178–1184.
Downing SM. 2004. Reliability: On the reproducibility of assessment data. Med Educ 38:1006–1012.
Durning SJ, Artino A, Boulet J, La Rochelle J, van der Vleuten C, Arze B, Schuwirth L. 2012. The feasibility, reliability, and validity of a post-encounter form for evaluating clinical reasoning. Med Teach 34:30–37.
Eva KW. 2005. What every teacher needs to know about clinical reasoning. Med Educ 39:98–106.
Guiton G, Hodgson CS, Delandshere G, Wilkerson L. 2004. Communication skills in standardized-patient assessment of final-year medical students: A psychometric study. Adv Health Sci Educ Theory Pract 9:179–187.
Harden RM. 1988. What is an OSCE? Med Teach 10:19–22.
Hettinga AM, Denessen E, Postma CT. 2010. Checking the checklist: A content analysis of expert- and evidence-based case-specific checklist items. Med Educ 44:874–883.
Laan RF, Leunissen RR, van Herwaarden CL. 2010. The 2009 framework for undergraduate medical education in the Netherlands. GMS Z Med Ausbild 27:Doc35.
Mushquash C, O'Connor BP. 2006. SPSS and SAS programs for generalizability theory analyses. Behav Res Methods 38:542–547.
Newble D. 2004. Techniques for measuring clinical competence: Objective structured clinical examinations. Med Educ 38:199–203.
Regehr G, Freeman R, Robb A, Missiha N, Heisey R. 1999. Students assessment and standardized patients – Will the question never end? Acad Med 74:135–137.
Regehr G, Macrae H, Reznick RK, Szalay D. 1998. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med 73:993–997.
Reznick R, Regehr G, Yee G, Rothman A, Blackmore D, Dauphinee D. 1998. High-stakes examinations: What do we know about measurement. Acad Med 73:3.
Schmidt HG, Norman GR, Boshuizen HP. 1990. A cognitive perspective on medical expertise: Theory and implication. Acad Med 65:611–621.
Schuwirth L. 2009. Is assessment of clinical reasoning still the Holy Grail? Med Educ 43:298–300.
Sonderen MJ, Denessen E, Cate OTJT, Splinter TAW, Postma CT. 2009. The clinical skills assessment for international medical graduates in the Netherlands. Med Teach 31:e533–e538.
Swanson DB, van der Vleuten CPM. 2013. Assessment of clinical skills with standardized patients: State of the art revisited. Teach Learn Med 25(Suppl 1):S17–S25.
van der Vleuten CPM, Wijnen WHFW. 1991. Niets is praktischer dan een goede theorie: Generaliseerbaarheidstheorie als instrument voor betrouwbaarheidsstudies [Nothing is more practical than a good theory: Generalizability theory as an instrument for reliability studies]. Bull Med Ond 10:2–14.
Whelan GP, Boulet JR, McKinley DW, Norcini JJ, van Zanten M, Hambleton RK, Burdick WP, Peitzman SJ. 2005. Scoring standardized patient examinations: Lessons learned from the development and administration of the ECFMG Clinical Skills Assessment (CSA®). Med Teach 27:200–206.
