Ir J Med Sci (2015) 184:1–12 DOI 10.1007/s11845-014-1069-4

REVIEW ARTICLE

Education, learning and assessment: current trends and best practice for medical educators

W. Tormey

Received: 26 June 2013 / Accepted: 15 January 2014 / Published online: 19 February 2014
© Royal Academy of Medicine in Ireland 2014

W. Tormey
Biomedical Sciences, University of Ulster, Coleraine, Northern Ireland, UK
e-mail: [email protected]

Abstract

Introduction Methods of teaching and assessment in medical schools have transformed over the recent past. Accreditation of medical schools through national licensing bodies and removal of bias at examinations is the norm. This review is intended to inform senior doctors who are peripherally involved in training at speciality or general professional level.

Materials and methods From the summative assessment of learning, the uses of assessment for learning and formative assessment have transformed the process of education. Student feedback has moved centre stage. The criteria used to rate questions and abilities are made explicit and there is accountability for student and examiner performances. Standard setting for medical professional examinations is formalised through norm or criterion referencing. Objective methods to determine the pass/fail border, including Angoff, Ebel, Nedelsky, Bookmark, Hofstee, borderline group and contrast by group, are described. There is some evidence for grade inflation over time at universities. Blueprinting, by setting test questions to learning objectives, is now standard. The Objective Structured Clinical Examination uses a wide variety of case tasks at different stations. Miller's pyramid is the road map for professional competence, where 'doing' becomes the benchmark standard. Item response theory and computer adaptive testing are available through the Concerto testing platform, which is an open resource.

Conclusion Examining the performance of examiners as well as that of students is a necessary part of good examination practice. The World Federation for Medical Education with the World Health Organisation has developed nine standards with two categories of basic and quality for the accreditation of medical education.

Keywords: Learning · Assessments · Standard setting · Education evaluation

From schools to medical schools

It is self-evident that assessment of learning and knowledge acquisition has always formed a part of the student experience in education. In Britain and Ireland, state primary education was introduced in 1831 [1]. Schools were allowed under religious patronage and were a driver of massive social change in the nineteenth century. The exit examination in Ireland was the Primary Certificate, which lasted from 1929 to 1967. This was an assessment of learning. In England, Wales and Northern Ireland, the 11 Plus was an exit-type examination from the primary school system to grade pupils for progression to second-level schools that differed in their academic or practical curricula. The Leaving Certificate and GCE A-levels perform a similar gateway function in the Republic of Ireland and in England, Wales and Northern Ireland at the interface between second- and third-level education. There is controversy about grade inflation in the UK, with five A-level boards offering examinations, some being thought 'easier' than others. Some pupils are sitting the same subject twice on the same day using two examination boards in an effort to maximise their grades [2]. School league tables based on a new English Baccalaureate, which rewards pupils who get good grades in core academic subjects, and more rigorous examination marking in 2012 and 2013, especially in so-called easier subjects, are reflected in a declining percentage of UK students gaining good grades.


Grade inflation is also controversial at second and third level in Ireland [3]. Thus, the assessment of learning by terminal examination is a driving force of policy change [4]. In Ireland, the Report of the Working Group on Undergraduate Medical Education and Training, chaired by Patrick Fottrell, provided a detailed blueprint for modernising medical education and detailed information on benchmarking Irish education against accreditation standards from across the world [5]. The Irish Medical Practitioners Act of 2007 sets out the requirements for education in medical schools, which will be accredited for 5-year periods and reviewed. The basic requirements are the World Federation for Medical Education Global Standards for Quality Improvement in Medical Education: European Specifications 2007, together with the requirements of Article 24 of the European Union Directive 2005/36/EC for programmes of basic medical education [6]. In the UK, the General Medical Council must set and maintain the standards of undergraduate medical education as required by the Medical Act 1983. There is a quality assurance scheme for basic medical education in which all medical schools in the UK must detail developments in their curriculum on an annual basis [7]. Graduate entry medical school places in Ireland and the UK require an upper second class honours degree (2.1) and an appropriate Graduate Australian Medical School Admissions Test (GAMSAT) score. For undergraduate courses, an Irish Leaving Certificate points score of 480 or greater and the Health Professions Admission Test Ireland (HPAT) are combined to rank candidates for entry.

The Oxbridge style of integrated education

Oxford University, founded before 1167, and Cambridge University, founded in 1209, have similar teaching policies and methods. Students go to Oxbridge to read a degree. As well as attending lectures and laboratory work appropriate to their courses, students at Oxford and Cambridge also benefit from highly personalised teaching time with world experts in their field. The only difference is that Oxford refers to these sessions as 'tutorials' while Cambridge calls them 'supervisions'. Both universities informally assess students, who must produce work for weekly tutorials with tutors. The final examination, used to classify the degree awarded, may be a combination of an end examination and a dissertation. Thus, there are elements of assessment for learning and assessment of learning in parallel.


Transformation in teaching and assessment practice

The role of teaching and assessment practice in higher education institutions is evolving rapidly. In October 2011, the Higher Education Authority (HEA) in the Republic of Ireland sought submissions on the potential roles of a National Academy for the Enhancement of Teaching and Learning. The Trinity College, Dublin, submission recommended that one of the objectives for a National Academy should be to raise the status of teaching within higher education institutions to ensure that excellent teaching is both properly recognised and rewarded [8]. University College Cork offers certificate, diploma and masters programmes for teaching and learning in higher education. Dublin Institute of Technology also offers academic teaching and learning programmes on a modular basis. UCD suggests 'that there is value in facilitating institutions to identify academic staff with talent for teaching and learning development and to then network such staff nationally to promote and develop specific teaching and learning enhancement themes'. Staff development through 'Centres of Teaching and Learning' is the direction being taken at UCD. In the UK, the Higher Education Academy has a teaching standards recognition scheme which is referenced to the UK Professional Standards Framework [9]. Four levels of recognition are awarded, commensurate with qualifications and experience. At the University of Ulster, the Centre for Higher Education Practice (CHEP) encourages all academic staff to engage in the scholarship of teaching and learning. CHEP events and seminars are organised regularly across the campuses. Associates and Fellows are required to fulfil exacting criteria before admission for a renewable 3-year period [10]. In Ireland, a National Strategy for Higher Education to 2030 was announced in January 2011 [11]. Summary recommendation number 8 states that "All higher education institutions must ensure that all teaching staff are both qualified and competent in teaching and learning, and should support ongoing development and improvement of their skills." There is no comment on the qualification level; consequently, the suggestion is wishful rather than wholly practical. A review of the quality of teaching and scholarship in all institutions, as an integral part of performance management, is recommended [12]. Using feedback from students to drive innovation and to alter the presentation and content of courses will also form a core part of the National Strategy.

Assessment

'Assessment of learning' is the traditional paradigm in formal education.


It differs from 'assessment for learning' (AfL) or 'formative assessment' (FA) because AfL is a core contributor to learning itself. AfL forms part of routine teaching and involves the student directly in their own acquisition of understanding through knowledge. It was redefined in 2009 at the Third International Conference on Assessment for Learning, Dunedin, New Zealand, as "assessment for learning is part of everyday practice by students, teachers and peers that seeks, reflects upon and responds to information from dialogue, demonstration and observation in ways that enhance ongoing learning" [13]. The participants were 31 assessment experts from New Zealand, Australia, the US, Canada and Europe. The main objective in AfL is to contribute to the learning process itself. When true learning occurs, the consequences are manifested in performance. A position paper was set out at the Dunedin conference and was republished when it became the subject of an academic editorial and commentary [14]. A key point is that "learners can be taught how to score well on tests without much underlying learning". AfL tries to side-step this problem by using information from the students themselves, from teachers and from others in the class: the responses to tasks set by teachers, sub-divided and modified by the interaction with student participants, are used in a dynamic, real-time manner to mould the synthesis and integration of the study objectives in the students' minds. This has formed part of the tutorial system experienced in colleges for decades.

Part of the practice is to give feedback to students which will clarify the objectives and the subject matter. This is an enquiry process which seeks to expose whether there is assimilation of capability and understanding. The process also directs the teacher to specific areas where students' understanding seems limited and in need of clarification. With the active engagement of the teacher and as many students as possible, the likelihood of gaining further insights and clarifications from classmates increases. This facilitates the educational process. The use of formative assessment tasks can be part of this process, but the downside is that sceptical teachers may morph these into mini-summative assessments. "Sources of evidence are formative if, and only if, students and teachers use the information they provide to enhance learning" [12]. It is important to provide students with direction on where to go next; telling someone to do better is an insufficient response. The most succinct quote, at the end of Klenowski's editorial, from the State Collaborative on Assessment and Student Standards, Council of Chief State School Officers, USA, is: "formative assessment is a process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to improve students' achievement of intended instructional outcomes." Validity and usefulness are the major concerns in formative assessment and should take precedence over reliability [15].


Other educationalists—Black and Wiliam—suggested an alternative definition in 2009 which reads "practice in a classroom is formative to the extent that evidence about student achievement is elicited, interpreted, and used by teachers, learners, or their peers, to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions they would have taken in the absence of the evidence that was elicited" [16]. Dr Sue Swaffield of Cambridge University succinctly explains the features which distinguish AfL from formative assessment [17]:

(a) AfL is a learning and teaching process—formative assessment is a purpose and may be a function of certain assessments.
(b) AfL concerns the immediate future—formative assessments can have a long time span.
(c) Both sides of AfL are the teachers and students in a specific class—formative assessments may involve and be of use to other teachers.
(d) In AfL, students exercise autonomy—formative assessments can render students passive recipients of teachers' decisions and actions.
(e) AfL is a learning process—formative assessment provides information to direct future learning.
(f) AfL is concerned with learning how to learn as well as specific learning intentions—formative assessments concentrate on curriculum objectives.

Thus, AfL is clearly not about testing and is not a staggered summative assessment. The confusion in the literature is exemplified by the introduction to the paper by Hargreaves on collaborative assessment for learning: "The purpose of a summative test is summarising learning, while the purpose of assessment for learning is promoting learning: therefore it is important to distinguish between what characterises a highly valid summative test and what characterises a highly valid assessment for learning." And further: "the phrase 'assessment for learning' (or formative assessment) denotes assessment used specifically to enhance learning processes or performances, rather than just measure them" [18]. These quotes are telling and indicate a degree of confusion amongst teaching theorists.

Teacher tactics in AfL

Use of questions to students in classrooms to direct and manage student learning—Harvard Business School methods

Questioning, listening and responding are the three basic skills in interactive learning. Questioning starts the process.


Questions in tutorials have three functions: firstly, to simply ask a question and, secondly, to issue a set of instructions. The question can be broad ranging, allowing a very wide range of responses, or couched in a narrow focus to restrict and direct the student to address a particular issue. An open question can start off a subject, allowing non-prescriptive, multi-faceted student answers, and a later closed question, which has only two possible answers, can be used to close out a topic. The third function is to heat up or cool down a discussion. Asking a personalised question of an individual heats up the room, and asking another student to agree or disagree, and why, adds to this. To cool off a discussion, ask a non-personal question and the atmosphere will cool. These are techniques taught at the Harvard Business School as part of their teaching processes. Listening must be concentrated and responding must involve verbal and body language cues. Listening to how things are said, or are not said, is important for the teacher. Echoing a comment back to the person and the class is important in some circumstances to elicit comments, but it must not be excessive because this becomes disruptive. Teacher reinforcement of the value of accurate hearing in student classroom interaction is an important facilitator [19]. To move on, you summarise or defer until later. Students will pick up the signal. Getting students to vote on issues in class is a good way of engaging them. Taking a survey or seeking answers to short typed questions is another method. These questions must be relevant to the issues at hand, otherwise they will be disruptive.

Collaborative AfL

This concept is old and requires a sublimation of competitive instincts in students. Collaborative learning is not the same as peer teaching, but there can be mixing and alternating of roles, as was my experience in the 1970s. Encouraging students to collaborate rather than compete is a constructive objective in teaching and in the past would fly in the face of the highly competitive appointments structure in medicine. Interestingly, collaborative students approach their subjects in a more in-depth fashion and are more aware of how they themselves learn. This may lead to better performance in examinations and more life-long learning [20]. The key to collaborative learning is high quality preparation. This allows maximum evolution of collaborative knowledge acquisition and is a core element in the teaching process of the Harvard MBA [21].

Medical college and AfL

In the context of classes of 150 or more medical students, the role of AfL is made much more difficult.


The lecture has stood the test of time for teaching large groups. There are limitations, including the passivity of the audience, visual aids such as PowerPoint slides overloaded with information, and content that may be inappropriate for the particular audience, being either too simple or too complex, with too many assumptions as to the state of knowledge of the student body. Thus, lectures delivered in isolation from knowledge of students' backgrounds have a greater chance of being discordant [22]. AfL, on the other hand, ensures that live adjustments can be made to the presentation as feedback and interactions take place. Greater numbers make unacceptable variation in the educational experience more likely. In medical education, small group teaching should ideally have 7 or 8 students and should probably never exceed 12. The teacher's role is primarily that of facilitator [23]. The 'Fottrell report' cites room tutorials of up to 20 students as a norm [5]. This has resource implications for expansion of medical student numbers at schools. At the Education MSc at the Institute of Leadership at the Royal College of Surgeons in Ireland, the operational policy of all presenters incorporated the AfL principle in teaching. When the student audience is heterogeneous, it is more difficult for the teacher to conduct the interactions constructively. The culture of AfL had been insufficiently assimilated by the graduate entry medical students at RCSI in 2013, because feedback to the course organisers after lectures and other teaching modalities is that the students prefer to be told exactly what is important to learn from amongst the totality of the information being delivered didactically and/or interactively. Assessment practices do influence the direction that students take in engaging with their courses [24]. This can be utilised as a measure of quality assessment in higher education [25]. In Australia, the objective of improving teaching and learning in Australian universities has directed institutions to re-evaluate their assessment policies and practices [26].

Core contents of assessments

The expectations in institutional assessment policies can be synthesised into six points [24]. In short, these are:

1. Assessment judgements should use specific criteria and standards rather than a norm-referenced method.
2. A course should use more than one form of summative assessment, such as essays, verbal presentation, written examination, multiple choice questions, journal article, laboratory report, literature review or a practical performance.
3. Marking of all assessments must be done in a class group or within a single course.
4. Timely feedback to students must be given on every item of summative assessment.
5. There must be a standard process which is transparent and easily defensible in arriving at the end examination result.
6. The course details, subjects and mode of assessments should be made available both in hard copy and electronically at the commencement of each module.

The data showed that 66 % of academics were confident in combining tasks into an assessment plan, contrasting with 80 % and 70 % confidence in developing summative and formative assessment tasks, respectively. The old traditional examination structure required less thought and innovation. Curriculum and assessment assumed learning independence, and standards were unclear to students. Casual and non-tenured staff had less interaction with students. The other interesting finding in that Australian study was that academics were confident in making assessment judgements when ranking students (86 %) and in making defensible (91 %) and transparent (85 %) judgements, but were least confident in making judgements that were consistent with other academics. The sources of influence on course coordinators' assessment practices were common sense (95 %) and the course profile, with informal advice from colleagues and from students ranked equal at 89 %. Only 56 % felt that workshops by the Teaching and Educational Development Institute were helpful.


Student feedback

In 2007, an Australasian survey of student engagement reported that 40 % of first year students never discussed their grades or assignments with teaching staff and 55 % never or rarely received prompt written or oral feedback on their performance. An executive summary is available [27]. The perception of students that formative assessment was entirely a pathway to the end examination, and not part of the assessment for learning concept, was derived from the experience of lecturers referring to the end examination in the context of course content and not the purpose of the learning experience. Criticism of feedback by students participating in classroom or laboratory sessions included telling students the right answer to a question, but not explaining the method and details of how the answer was arrived at. In the UK, 60 % of students reported that feedback had helped them in their learning, but 29 % were dissatisfied by the delay in response [28]. The number of students in a class will impact on the feedback process because simple time and motion may make the ideal unattainable. This is an issue identified by Associate Deans of Study which needs to be addressed and resolved where possible. Good feedback attributes derived from first year student focus groups in Australia were the provision of constructive criticism, directing the areas to improve in future, provision of feedback which can then be used to amend an assessment task, highlighting where the student has done well, and the attitude of staff in showing willingness to support, allowing time for consultation and promptness of feedback [24].

Education and degrees as a commodity

Marginson and Considine [29] found that some Australian academics felt under pressure to tailor the content and standard of their courses to ensure student satisfaction with outcomes. There appeared to be a shift in emphasis from teaching to support student learning to primarily ensuring the satisfaction of students with education as a product. The result was an undermining of academic respect for university quality control systems. In a survey of Associate Deans of Teaching and Learning, the 'how and why' of feedback to students was the most common assessment issue [16]. Good feedback practices were (1) annotating individual assignments with comments (in my world—a red biro job!); (2) addressing identified common problems through whole class explanations; (3) use of criteria sheets that make explicit the performance required; and (4) using focused staged assignments with regular feedback to morph into a project over a semester.

Postscript on AfL

The following quotation from Sue Swaffield of Cambridge University encapsulates the overall message from the papers and reports referenced here [17]: "Assessment for learning when the learner is centre stage is as much about fuzzy outcomes, horizons of possibilities, problem-solving, and evaluative objectives, as it is about tightly specified curriculum objectives matched to prescribed standards. It is the (mis)interpretation of AfL as a teacher driven mechanism for advancing students up a prescribed ladder of subject attainment that is the problem, not AfL itself. At the heart of this problem is the understanding of teachers' and learners' roles in AfL."

There are big differences between teaching through lectures and by discussion-based learning. In discussion-based classes, there must be great flexibility, in contrast to a lecture where the content is tightly regulated. Over-direction in a discussion class will negatively impact on students and will kill discussion.

Summative assessment

This takes place at certain intervals when achievement has to be reported.


It measures learning against a public standard. The results for different students may be combined and studied because there is a common criterion of assessment. The methods must be as reliable as possible without endangering validity. The tests must be quality controlled. The evidence must cover the full range of tasks being examined [15]. The results of both formative and summative assessments may be used to direct a student's learning course, where formative is a short-term and summative a long-term activity. Summative assessments should address what students are expected to achieve at a particular point. Teachers' meetings to discuss students' work should approach the work holistically and not issue by issue. The criteria being discussed should be applied evenly across all students. Quality control procedures should be applied to ensure that there is good comparability between teachers.

Curriculum and assessment

The Guidelines on Good Assessment Practice devised by the University of Tasmania (UTAS) in 2007 set out the principles of assessment at UTAS [30]. The fundamental role of assessment in learning and teaching is addressed. Three principles are set out:

1. Assessment should be seen as an integral part of the learning and teaching cycle.
2. Assessment has five key purposes: to guide students' development of meaningful learning; to inform the learner of their progress; to inform staff of the progress of students and the effectiveness of their teaching; to provide data for arriving at final grades of students and to rank students for awards; and to ensure that academic quality and standards are maintained at the university.
3. Assessment practices and processes must be transparent and fair.

An explanatory clarification and detail of implementation policies is outlined for every part of the above principles. In the implementation of these principles, there have been a number of important changes over the past decades. One requirement is that "early low-stakes, low-weight assessment should be used to provide the students with constructive feedback to improve their performance, with later assessment pieces of higher weight used for summative assessment."


With regard to Principle 3, anonymity of student work must be maintained (UCD required names and numbers on examination papers in 1976 and before, but now numbers only are used). There should be no inherent biases to disadvantage any student and, most importantly, clear descriptions of performance standards for student work should be made available at the commencement of a semester. As a clear exposition of the appropriate requirements and the reasons for them, the UTAS Guidelines 2007 offer a template for implementation at departmental level.

Current outcomes in the Republic of Ireland

The question of grade inflation in assessments in the Republic of Ireland arises, as it does in Australia and elsewhere. The table below lists recent overall institutional student performance at universities in the Republic with regard to the percentage of students awarded first class honours.

Institution       2012    Peak and year
UCD               12 %    19 % in 2009
TCD               18 %    20 % in 2009
UCC               18 %
NUIG              14 %    16 % in 2007
NUI Maynooth      12 %    18 % in 2005
DCU               19 %    25 % in 2005
UL                12 %    14 % (7-year average)

The Irish Higher Education Authority publishes annual outcomes data from various disciplines. Under the heading 'medicine and diagnostics', the first class honours national figures were 9.4 % in 2011, 8.9 % in 2012 and (in medicine alone) 3.7 % in 2005. These data contrast with the past in UCD. The medical school output in 1974 was about 143 students and no first class honours were awarded. This graduate cohort subsequently went on to some academic distinction, which included four senior professorships in UCD, two in the USA, masterships of two maternity hospitals, one professorship at RCSI and one at the University of Ulster. When considering grade inflation, there was no gold standard in 1974 against which student performance was compared. External assessors in Dublin were appointed from UK and Irish medical schools and no metrics were performed on examination questions, all of which were in the form of written essays. Across time periods, the question to be considered is whether there is grade/honours inflation or grade/honours standardisation.


Accreditation in medical education

There are nine standards developed by the World Federation for Medical Education (WFME). Each standard has two categories—a basic and a 'quality' standard [31]. Medical schools are obliged to clearly set out their assessment criteria and the balance between formative and summative assessment. A review of the subject was recently published [32]. Five criteria were set out as objectives of any particular assessment methodology: reliability, validity, impact on future learning and practice, acceptability to learners and faculty, and the cost of assessment [33]. The requirement for medical schools to train staff in assessment as a discipline, which includes essay writing and short answers, Objective Structured Clinical Examinations (OSCEs) and the setting of standards in both written and clinical examinations, is consistent with the policy objectives of the Higher Education Authority with regard to the National Strategy on standards to 2030 [11].

Training for medicine

In professional qualification courses, the framing of questions must be such as to ensure that graduates possess core competencies. When the curriculum is defined through a core content set out for undergraduate medical education, it is easier to assess than a broader postgraduate set of knowledge. Matching test questions with learning objectives is called blueprinting. In medicine, broad knowledge, skills and attitudes are imparted, so many different assessment methods are needed. Multiple choice examinations will check knowledge, while skills have to be measured individually, often in multiple clinical bays, and communication and attitudes by a live interactive test.

Standard setting in medical professional examinations

How is the standard set for professional examinations? There is no gold standard for a passing score or mark. Setting performance standards in examinations is complex. Test-centred models include Angoff, Ebel and Nedelsky. Examinee-centred models are the Hofstee method, the contrast by group approach and the borderline group method. There are other methodologies which are infrequently used [34]. Norm referencing and Angoff variations are the most often used in UK and Irish medical schools [35]. Many net-based slide shows on setting standards are available [36, 37]. A clear description can be found in Zieky and Perie's primer on setting cut scores [38].

Norm referencing

This method is often used. A candidate is ranked and a preordained percentage of the cohort is passed. The downside is that a student in a high ability group may fail, whereas the same student with a similar performance would pass in a lower ability cohort, simply because the pass standard has been reset in a random manner. The pass/fail border is thus influenced by the strength of the candidate group and not by the requirements of the course.
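As an illustration only (the scores and pass rate below are invented, not taken from the review), a norm-referenced cut score can be expressed as a percentile of the observed distribution, which is why the pass mark moves with the strength of the cohort:

# Hypothetical sketch: norm-referenced standard setting.
# The pass mark is whatever score the top X % of the cohort achieved,
# so it depends on the candidate group, not on the course requirements.

def norm_referenced_cut(scores, pass_rate=0.7):
    """Return the lowest score still inside the top `pass_rate` fraction of candidates."""
    ranked = sorted(scores, reverse=True)
    n_pass = max(1, int(round(len(ranked) * pass_rate)))
    return ranked[n_pass - 1]

strong_cohort = [82, 78, 75, 74, 70, 69, 66, 64, 60, 55]   # invented marks
weak_cohort   = [70, 66, 63, 60, 58, 55, 52, 50, 45, 40]

print(norm_referenced_cut(strong_cohort))   # higher cut score for the same rule
print(norm_referenced_cut(weak_cohort))     # lower cut score in the weaker cohort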

Criterion referencing

The minimum pass/fail standard is set before the test. Standards are set for each test, item by item. The Angoff and Ebel processes are the most common methods for doing this [35].

Angoff process

An expert panel of judges is assembled to estimate the proportion of minimally competent candidates who would correctly answer a given question. The average of the results from each expert for each question or part of a question becomes the predicted difficulty of the question and is the Angoff cut score. At the end of the first round of ratings, each expert is shown the results of the others for each test item and then asked whether they would like to re-rate their items based on the results of the other experts. This second round of ratings is then averaged across each question and a final cut score calculated. This method allows the pass grade to be empirically established and the test then becomes legally defensible. Unlike a norm-referenced pass/fail method, the Angoff method allows all to fail or all to pass. A 95 % confidence interval can be calculated from the ratings of the expert judges and a passing point can be selected from within the 95 % confidence interval. Once this passing point is set, it is used throughout the examination for each candidate. A practical difficulty is the evidence that judges have difficulty in accurately determining what constitutes a borderline candidate and are not confident in their estimates of candidates' performances. Previous experience of the process improves judges' performance in the process [39]. The advantage of this method is that it has held up in a court of law as valid.
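A minimal sketch of the Angoff arithmetic described above, with invented second-round ratings (rows are judges, columns are items):

# Hypothetical Angoff calculation.
# Each entry is a judge's estimate of the proportion of minimally
# competent candidates who would answer that item correctly.
import math
import statistics

ratings = [
    [0.60, 0.45, 0.80, 0.55],   # judge 1 (invented)
    [0.55, 0.50, 0.75, 0.60],   # judge 2
    [0.65, 0.40, 0.85, 0.50],   # judge 3
]

# Each judge's implied cut score is the mean of their item ratings.
judge_cuts = [sum(row) / len(row) for row in ratings]

cut = statistics.mean(judge_cuts)                       # Angoff cut score (proportion)
se = statistics.stdev(judge_cuts) / math.sqrt(len(judge_cuts))
ci = (cut - 1.96 * se, cut + 1.96 * se)                 # approximate 95 % interval across judges

print(f"cut score ~ {cut:.2f}, 95 % CI {ci[0]:.2f} to {ci[1]:.2f}")

The passing point would then be chosen from within that interval and applied to every candidate.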


Modified Angoff (yes/no) method

Judges reach a consensus on the characteristics of a borderline candidate and then pose the question: would the borderline candidate be able to correctly answer the question posed? A correct answer is scored +1 and an incorrect answer zero. The pass mark is calculated by averaging the scores. This is the standard setting method used by the American Board of Internal Medicine.

Nedelsky's method

The method is only used for MCQs. The borderline candidate responds to MCQs by firstly eliminating the responses that he/she believes are incorrect and then guessing at random from the remaining answers. The plausibilities of the remaining options are rated. The candidate's expected score for any question is 1 divided by the number of answers the test taker has to guess from. The examiners establish the pass level by independently identifying the answers that a borderline candidate would be able to recognise as implausible. The number of remaining options determines the probability that the candidate will answer the item correctly: 1 plausible response = 100 %, 2 = 50 %, 3 = 33 %, 4 = 25 % and 5 = 20 %. The average of the probabilities determines the pass point. The pass point can also be calculated using the median, which eliminates outlier judges, or a trimmed mean; the latter is usually done by eliminating the highest and lowest 25 % of scores and averaging the middle 50 %.

Ebel's method

The judges classify each item as easy, medium or difficult and its importance as questionable, acceptable, important or essential. This forms 12 groups of items (three difficulty levels multiplied by four importance levels). The judges make a decision on the percentage of items a borderline candidate should get right in each of the 12 categories. The number that should be correct across all categories is averaged and this determines the pass point.

Bookmark method

Actual examination data are used to arrange items by difficulty, with judges selecting the most difficult item that a borderline candidate would be likely to answer correctly; a 'bookmark' is placed at that point. Each judge places a bookmark separately, and the judges' collective average is calculated and given to the judges. The effect of the collective average is discussed with the judges, who are then invited to reassess. They may place the 'bookmark' at the same or a new site. A passing score is judged on the mean or median of the final judges' scores and this is the cutoff score.
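A toy illustration of the Nedelsky arithmetic for five-option MCQs (the elimination counts are invented): each judge records how many options a borderline candidate could rule out on each item, and the pass point is the average of the resulting guessing probabilities.

# Hypothetical Nedelsky calculation.
# eliminated[j][i] = number of options judge j believes a borderline
# candidate would recognise as implausible on item i (invented data).
import statistics

N_OPTIONS = 5
eliminated = [
    [3, 1, 4, 2],   # judge 1
    [2, 2, 4, 3],   # judge 2
    [3, 0, 3, 2],   # judge 3
]

def item_probability(n_eliminated):
    """Chance of answering correctly when guessing among the remaining options."""
    remaining = N_OPTIONS - n_eliminated
    return 1.0 / remaining            # 1 left = 100 %, 2 = 50 %, 3 = 33 %, ...

# Each judge's pass point is the mean probability over the items;
# the test pass point is the mean (or median) across judges.
judge_pass_points = [
    statistics.mean(item_probability(n) for n in row) for row in eliminated
]
pass_point = statistics.mean(judge_pass_points)
robust_pass_point = statistics.median(judge_pass_points)   # damps outlier judges

print(f"mean pass point {pass_point:.2f}, median {robust_pass_point:.2f}")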


Hofstee method

This method is a compromise between a relative and an absolute model. It is suitable for overall pass/fail decisions and is approved by the United States Medical Licensing Examination (USMLE). The judges must agree on the lowest and highest acceptable passing scores, and the lowest and highest acceptable failure rates. The passing score is the mean of the four judgements plotted against the cumulative score distribution. The difficulties with compromise methods are that they are unsuitable for high stakes examinations and the cutoff score may not lie in the area defined by the judges' estimates.

Borderline group method

Judges identify an actual borderline group and use the median score for this group as the pass score.

Contrast by group approach

Judges sort the group into competent and non-competent. The judgement is based on the prior record of the examinees and not the current test results. After the sorting is completed, the scores of the competent and non-competent groups are plotted and the point of intersection of the two distributions is considered the pass score.
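As a rough sketch only (the scores are invented), the intersection point used in the contrast by group approach can be approximated by scanning candidate cut scores and choosing the one that best separates the two pre-sorted groups:

# Hypothetical contrasting-groups calculation. Examinees are pre-sorted as
# competent / non-competent on their prior record; the pass score is taken
# near where the two score distributions cross, approximated here as the
# cut that misclassifies the fewest examinees.
competent     = [72, 68, 75, 80, 66, 70, 74, 69]   # invented exam scores
non_competent = [55, 60, 48, 63, 58, 52, 61, 50]

def misclassified(cut):
    """Competent scoring below the cut plus non-competent scoring at or above it."""
    return sum(s < cut for s in competent) + sum(s >= cut for s in non_competent)

candidate_cuts = range(min(non_competent), max(competent) + 1)
pass_score = min(candidate_cuts, key=misclassified)

print(f"approximate intersection / pass score: {pass_score}")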

Clinical examinations OSCE The Objective Structured Clinical Examination (OSCE) uses a wide variety of case tasks and scenarios at different stations. It was first introduced in 1975 and is recognised as


The objective structured long examination record (OSLER)

The traditional long case clinical examination has been replaced by the OSLER [41]. Over a 30-min period, the examiner uses a structured score sheet to assess history taking, the physical examination performed, the appropriateness of suggested investigations, the examiner's opinion of the management of the patient and the general clinical acumen of the candidate. Each area is graded separately and an overall grade is recorded for the entire performance.

Mini clinical evaluation exercise (mini-CEX)

The examiner watches the candidate take a focused history, perform relevant clinical examinations and provide a diagnosis and a management plan. The test lasts 15–20 min and the performance is rated on a six- or nine-point scale. The examiner then provides 5 min of feedback to the candidate. This is done at the workplace, usually in a postgraduate setting. The advantage is that clinical skills are observed in the workplace. This can be extended to assessment through the direct observation of procedural skills (DOPS) such as intubation, central venous line insertion, arterial blood sampling and others.


Case-based discussions

These are principally used in postgraduate training. The examiner selects a case from a portfolio of cases that the trainee has compiled. The assessment covers the application of knowledge, decision making, record keeping, treatment, follow-up, and ethical and professional issues. Fifteen minutes is allowed for the examination and 5 min for feedback. Each dimension assessed is scored on a six-point scale [41].

Miller's pyramid of competence

The Miller pyramid of competence is an index of the issues involved when analysing validity [42]. Miller's pyramid is used as a tool to develop assessment methods and may be used to delineate learning objectives in medical education. The lowest level is testing what the student knows through multiple choice questions (MCQ). The next level is testing whether the student knows how, through essays, case presentations or extended matching type MCQs. OSCEs and clinical examinations are used to allow the student to demonstrate their competencies. The highest level is where the student actually demonstrates their knowledge and ability by doing the requisite skill. Performance assessment using record analysis, direct observation and multisource feedback may be used to assess what the person does in practice. Teaching and assessment methods are designed to ensure that the pinnacle of the pyramid is reached.


Computer adaptive testing (CAT)

The system is used in US medical licensing examinations and also as an index of the standing of individual students in subjects taught at many medical schools. It is used in the Graduate Management Admission Test (GMAT) and in postgraduate medical examinations. A standard test is likely to contain questions that are either too easy or too difficult. The Psychometric Centre at Cambridge University provides an example of how to address the problem: item response theory (IRT) and CAT have been made available as open resource services through the Concerto testing platform [43]. Therefore, its use is likely to increase. IRT was described by Birnbaum [44]. Three item parameters are usually used to rank a question and answer: discrimination, difficulty and guessing. The metrics of the testing accurately determine the value of the individual item. The test adapts to the candidate's ability level as determined by previous questions. For example, a correct answer to a question of intermediate difficulty will generate a question of greater difficulty, or, conversely, a wrong answer will generate an easier question in response. A 95 % confidence interval is calculated regarding the candidate's ability. This is repeated until a consistent reflection of the candidate's ability is obtained: each question focuses in on the true ability and the 95 % interval narrows. CAT requires fewer questions than conventional MCQs to reach equally accurate scores. Standard MCQs have good precision for candidates of medium ability but are less discriminatory for high or low achievers, whereas CAT is a more accurate ranker of candidates [45]. IRT is used to reduce rater error in performance assessments, and a large, well-tested question and data bank is necessary to operate this successfully.
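A compact sketch of the three-parameter model and the adaptive loop described above (a toy item bank and a crude ability update, not the Concerto platform's actual implementation; operational CAT uses maximum-information item selection and maximum-likelihood or Bayesian ability estimation):

# Hypothetical 3-parameter IRT model and a tiny adaptive testing loop.
# Each item has discrimination (a), difficulty (b) and guessing (c).
import math

ITEMS = [  # invented item bank
    {"a": 1.2, "b": -1.0, "c": 0.20},
    {"a": 0.9, "b":  0.0, "c": 0.20},
    {"a": 1.5, "b":  0.8, "c": 0.20},
    {"a": 1.1, "b":  1.5, "c": 0.20},
]

def p_correct(theta, item):
    """3PL probability that a candidate of ability theta answers the item correctly."""
    a, b, c = item["a"], item["b"], item["c"]
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def next_item(theta, remaining):
    """Pick the unused item whose difficulty is closest to the current ability estimate."""
    return min(remaining, key=lambda it: abs(it["b"] - theta))

theta = 0.0                                    # start at average ability
remaining = list(ITEMS)
for answer in [True, True, False]:             # invented candidate responses
    item = next_item(theta, remaining)
    remaining.remove(item)
    theta += 0.5 if answer else -0.5           # crude step update for illustration only
    print(f"asked item b={item['b']:+.1f}, "
          f"P(correct)={p_correct(theta, item):.2f}, ability estimate {theta:+.2f}")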


Good examination practice

Transparency as well as inclusivity is a key ingredient of good examination practice. It is important that students are confident in the honesty and authenticity of the assessment systems used. With high numbers in classes at universities, the importance and difficulty of standard setting are clear. Feedback to students on elements of the course and examiner expectations is important. To quality assure an examination, explicit guidelines for the recruitment, selection and training of examiners are necessary. Each examination must be monitored item by item. Fitness for the role of examiner requires fair treatment of candidates across all ethnic groups and individuals. Licensing bodies must be satisfied that prescribed standards are upheld and are consistent. The medical colleges must develop and maintain a clear description of the necessary standards and play a full role in training, observation and developmental activities [46]. The road to better assessments of clinical competence was mapped in the Lancet in 2001 [47].
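One common way of monitoring an examination item by item (a sketch with invented response data, not a method prescribed by the review) is to compute each item's facility and discrimination:

# Hypothetical item analysis for monitoring an examination item by item.
# responses[c][i] = 1 if candidate c answered item i correctly, else 0 (invented).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

n_candidates = len(responses)
n_items = len(responses[0])
totals = [sum(row) for row in responses]

for i in range(n_items):
    item_scores = [row[i] for row in responses]
    facility = sum(item_scores) / n_candidates        # proportion answering correctly

    # Simple discrimination index: correlation between item score and total score.
    mean_item = facility
    mean_total = sum(totals) / n_candidates
    cov = sum((s - mean_item) * (t - mean_total) for s, t in zip(item_scores, totals))
    var_i = sum((s - mean_item) ** 2 for s in item_scores)
    var_t = sum((t - mean_total) ** 2 for t in totals)
    discrimination = cov / (var_i * var_t) ** 0.5 if var_i and var_t else 0.0

    print(f"item {i + 1}: facility {facility:.2f}, discrimination {discrimination:+.2f}")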


Attitudes in academe to student assessment have transformed to the point of student involvement in the choice of assessment methods, thus allowing students to play to very diverse strengths. Offering students the opportunity to be assessed in different ways is now regarded as best practice in education. The subject literature is reviewed by the University College Dublin group [48]. Ensuring that assessments are relatively equal for all students is a major challenge for staff.

Conclusion

Assessment for learning has many nuances and is a central part of the educational experience in the best universities in the global rating scales, including Harvard, Oxford and Cambridge. Student interaction in the classroom learning process is demanding on both students and teachers, requiring a greater commitment to preparation, openness and collaboration. Feedback plays a central role in quality monitoring of the learning process and provides directional guidance to both the teacher and students. Both formative and summative assessments may be used to direct a student's learning course, where formative is a short-term and summative a long-term activity. Summative assessments should address what students are expected to achieve at a particular point. The educational loss if assessment is solely confined to summative assessment is clear from the development of 'assessment for learning' (AfL). Because all areas of professional examinations are subjected to performance rating through statistical analysis, bias and eccentricity in examiners are constrained. Formal training and examination of the performance of examiners have become a new norm. Examiners whose ratings are demonstrably outside acceptable limits set by the medical colleges are removed from the examination roster.

Conflict of interest

There are no conflicts of interest.

References

1. Coolahan J (1981) Irish education history and structure. Institute of Public Administration, Dublin
2. Mansell W (2013) Exam entries—a double-edged sword? The Guardian, p 34
3. Network for Irish Education Standards (2007) Grade inflation: summary of papers 1–5. http://www.stopgradeinflation.ie. Accessed 24 Aug 2013
4. http://www.telegraph.co.uk/education/educationnews/10258795/GCSE-results-2013-record-fall-in-top-grades-as-tough-examreforms-take-effect.html. Accessed 24 Aug 2013

5. http://www.dohc.ie/publications/pdf/fottrell.pdf. Accessed 24 Aug 2013
6. Medical Council. Medical Council rules in respect of the duties of council in relation to medical education and training (section 88 of the Medical Practitioners Act 2007). http://www.medicalcouncil.ie/Education-and-Training/Undergraduate-Medical-Education/. Accessed 24 Aug 2013
7. General Medical Council (2010) The state of basic medical education, reviewing quality assurance and regulation. http://www.gmc-uk.org/publications/undergraduate_education_publications.asp#stateofbasic. Accessed 24 Aug 2013
8. http://www.hea.ie/files/SynthesisOfSubmissionsFromHEIstoNationalAcademyconsultation.pdf. Accessed 7 May 2013
9. http://www.heacademy.ac.uk/professional-recognition. Accessed 25 Aug 2013
10. http://www.ulster.ac.uk/centrehep/membership.html. Accessed 14 Feb 2014
11. Report of the Strategy Group. National strategy for higher education to 2030, pp 14–15. http://www.education.ie/en/Publications/Policy-Reports/National-Strategy-for-Higher-Education-2030.pdf. Accessed 14 Feb 2014
12. Report of the Strategy Group. National strategy for higher education to 2030, p 8. http://www.education.ie/en/Publications/Policy-Reports/National-Strategy-for-Higher-Education-2030.pdf. Accessed 14 Feb 2014
13. Third Assessment for Learning Conference Participants (2009) Third international conference on assessment for learning. Dunedin
14. Klenowski V (2009) Assessment for learning revisited: an Asia-Pacific perspective. Assess Educ Princ Policy Pract 16:263–268
15. Harlen W, James M (1997) Assessment and learning: differences and relationships between formative and summative assessment. Assess Educ 4:365–379
16. Black P, Wiliam D (2009) Developing a theory of formative assessment. Educ Assess Eval Account 21:5–31
17. Swaffield S (2011) Getting to the heart of authentic assessment for learning. Assess Educ Princ Policy Pract 18:433–449
18. Hargreaves E (2007) The validity of collaborative assessment for learning. Assess Educ 14:185–199
19. Garvin DA. http://hbsp.harvard.edu/multimedia/pcl/pcl_1/3/threequest.html. Accessed 25 April 2013
20. Watkins C, Carnell E, Lodge C, Wagner P, Whalley C (2001) Learning about learning enhances performance. NSIN Research Matters 13
21. Frei F. http://hbsp.harvard.edu/multimedia/pcl/pcl_2/2/6learningcollectivelyFF.html. Accessed 25 April 2013
22. Harden RM, Laidlaw JM (2012) The lecture and teaching with large groups. In: Essential skills for a medical teacher, Chapter 21. Churchill Livingstone Elsevier, Edinburgh, pp 131–6
23. Crosby JR, Hesketh EA (2004) Developing the teaching instinct: 11: small group learning. Med Teach 26:16–19
24. Goos M, Gannaway D, Hughes C (2011) Assessment as an equity issue in higher education: comparing the perceptions of first year students, course coordinators, and academic leaders. Aust Educ Res 38:95–107
25. Coates H (2005) The value of student engagement for higher education quality assurance. Qual High Ed 11:25–36
26. James R, McInnis C, Devlin M (2002) Assessing learning in Australian universities. Centre for the Study of Higher Education and AUTC, Melbourne
27. http://www.acer.edu.au/documents/aussereports/AUSSE_2008_Australasia_Uni_Exec_Summary.pdf. Accessed 12 April 2013
28. Yorke M, Longden B (2008) The first year experience of higher education in the UK. Final report. Higher Education Authority, Heslington


29. Marginson S, Considine M (2000) The enterprise university: power, governance and reinvention in Australia. Cambridge University Press, Cambridge
30. Guidelines for good assessment practice. Compiled by the University of Tasmania Assessment Working Group, 2007. University of Tasmania
31. World Federation for Medical Education (2003) Basic medical education: WFME global standards for quality improvement. WFME Office, University of Copenhagen, Denmark
32. MacCarrick GR (2011) A practical guide to using the World Federation for Medical Education standards. WFME 3: assessment of students. Ir J Med Sci 180:315–317
33. Van der Vleuten CP (1996) The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ 1:41–67
34. Arnett R (2013) Lies, damned lies and exam results. 2: setting performance standards. Quality Enhancement Office, RCSI Institute of Leadership, Dublin
35. Cusimano MD (1996) Standard-setting in medical education. Acad Med 71(suppl 10):s112–s120
36. Greenwood S. Standard setting in the MBChB programme. University of Bristol. http://www.bristol.ac.uk/medical-school/staffstudents/.../standardsetting.pdf. Accessed 27 Aug 2013
37. Boursicot K, Roberts T. Principles of standard setting. http://www.heacademy.ac.uk/assets/.../51_Principles_of_Standard_Setting.ppt. Accessed 27 Aug 2013
38. Zieky M, Perie M (2006) A primer on setting cut scores on tests of educational achievement. Educational Testing Service. http://www.ets.org/Media/Research/pdf/Cut_Scores_Primer.pdf. Accessed 28 Aug 2013


39. Boursicot K, Roberts T (2006) Setting standards in a professional higher education course: defining the concept of the minimally competent student in performance based assessment at the level of graduation from medical school. High Ed Q 60:74–90
40. Swanson DB, Norman GR, Linn RL (1995) Performance-based assessment: lessons learnt from the health professions. Educ Res 24:5–11
41. Harden RM, Laidlaw JM (2012) Clinical and performance-based assessment. In: Essential skills for a medical teacher, Chapter 30. Churchill Livingstone Elsevier, pp 195–202
42. Miller GE (1990) The assessment of clinical skills/competence/performance. Acad Med 65:563–567
43. http://www.psychometrics.cam.ac.uk/page/338/concerto-testingplatform.htm. Accessed 27 Aug 2013
44. Birnbaum A (1968) Some latent trait models and their use in inferring an examinee's ability. Part 5 in: Lord FM, Novick MR (1968) Statistical theories of mental test scores. Addison-Wesley, Reading
45. Weiss DJ, Kingsbury GG (1984) Application of computerized adaptive testing to educational problems. J Educ Meas 21:361–375
46. Foy F (2013) The logistics of clinical assessment. RCSI Institute of Leadership Data Bank, Dublin
47. Wass V, Van der Vleuten C, Shatzer J, Jones R (2001) Assessment of clinical competence. Lancet 357:945–949
48. O'Neill G, Doyle E, O'Boyle K, Clipson N (2010) Choice of assessment methods within a module: students' experiences and staff recommendations for practice. AISHE. http://ocs.aoshe.org/aishe/index/.php/international/2010/paper/paper/view/155
