© Longman Group UK Ltd 1992

Midwifery

Scoring and setting pass/fail standards for an essay certification examination in nurse-midwifery

Judith T Fullerton, Deborah L Greener and Leon J Gross

Examination for certification or licensure of health professionals (credentialing) in the United States is almost exclusively of the multiple-choice format. The certification examination for entry into the practice of the profession of nurse-midwifery has, however, used a modified essay format throughout its twenty-year history. The examination has recently undergone a revision in the method for score interpretation and for pass/fail decision-making. The revised method, described in this paper, has important implications for all health professional credentialing agencies which use modified essay, oral or practical methods of competency assessment. This paper describes criterion-referenced scoring, the process of constructing the essay items, the methods for assuring validity and reliability for the examination, and the manner of standard setting. In addition, two alternative methods for increasing the validity of the pass/fail decision are evaluated, and the rationale for decision-making about marginal candidates is described.

Judith T Fullerton PhD, CNM. Associate Professor of Clinical Family Medicine, Department of Community & Family Medicine, 0809, 9500 Gilman Drive, University of California, San Diego, La Jolla, CA 92093-0809, USA.
Deborah Greener PhD, CNM. Test Consultant, ACNM Certification Council, 8401 Corporate Drive, Suite 470, Landover, Maryland 20785, USA.
Leon Gross PhD. Director of Psychometrics and Research, National Board of Examiners in Optometry, 5530 Wisconsin Ave, NW, Suite 805, Washington, DC 20815, USA.
Manuscript accepted 19 November 1991
Requests for offprints to JTF

INTRODUCTION

The American College of Nurse-Midwives (ACNM) has been involved in the testing of American nurse-midwives since the early 1970s, initially through the ACNM Test Committee, subsequently through the Division of Examiners, and until 1991 through the Division of Competency Assessment (DCA). The focus of responsibility has been the development and administration of an examination which, when successfully completed, entitled the individual to be known as an 'ACNM certified nurse-midwife'. From September 1991, responsibilities for nurse-midwifery certification have been assumed by the ACNM Certification Council Inc., an independent testing corporation. The certificant will be known as an 'ACC certified nurse-midwife'.

Prior to 1970, individual schools of nurse-midwifery education had bestowed the title 'nurse-midwife' on their graduates. After 1970, this task was assigned to the national committee. The national examination, therefore, generated the credential necessary for entry into professional nurse-midwifery practice. Each of the United States separately regulates the practice of health professionals within its jurisdiction. The credential is itself sufficient for licensure to practice nurse-midwifery in most of the 50 United States, and required for licensure in some others.

Throughout its history, the examination has been offered in the modified essay format, consisting of short answer essays, using outlining or listing of responses to structured clinical scenarios. This format is not widely used in the United States for health professional credentialing. Multiple-choice formats have been preferred by state licensure agencies and by professional and occupational credentialing boards because of greater objectivity, compatibility with methods of statistical analysis and, recently, because of the ease with which this format can be adapted for computerised administration and scoring.

The modified essay format was selected by ACNM for several important reasons. First, in the earliest years of examination development, ACNM drew heavily from its relationship to European midwifery systems, and reflected, in its own work, the methods which had already been shown to be useful for the purpose of assessment for entry into the practice of midwifery. Second, ACNM recognised that the essay format was particularly relevant to the task of communication, which is an important skill of the professional midwife. The essay format requires that the candidate demonstrate the ability to convey client information in a concise and articulate fashion, as the practising midwife would need to do when writing in the records. In addition, the modified essay format has been considered to provide a more reality-based assessment of clinical situations when compared with the multiple-choice format. In actual clinical practice, midwives must make judgements and decide on actions based on principles (memory) rather than a list of previously identified options.
Pass/fail standards for ACNM's certification (credentialing) examination were originally determined in a norm-referenced manner; that is, the pass/fail standard was determined by comparing the performance of a candidate with that of all other candidates who took the same form of the examination. Norm-referencing is widely used in testing as a relative standard of performance. It has particular usefulness as an indicator of selection, as, for example, in admissions or hiring decisions. The limitation inherent in norm-referencing, however, is that group norms change, and, therefore, the standard for selection may change. A marginal performance in one group assessment may appear better or worse when compared to another group norm. A measure of competence should demonstrate less variance (Fullerton et al, 1989). Criterion-referenced scoring offers that desired stability, since decisions are made by reference to an objective, predetermined and task-related standard.

Interest in wider use of criterion-referenced testing and the essay format has only recently been raised in the United States (Haertel, 1985; Hambleton and Rogers, 1986; Turnbull, 1989; Luecht et al, 1990). Research into the psychometric properties of this format has, therefore, been limited. The issue of setting standards (pass/fail decisions) for essay format examinations has been largely unexplored (Lunz & Stahl, 1990).

In 1988 ACNM DCA implemented a change to criterion-referenced standard setting. Pass/fail standards are now established by comparison to a predetermined expected level of acceptable performance. This article describes the development of both a scoring and a standard setting model for ACNM's criterion-referenced modified essay examination and discusses some of the psychometric issues involved in the use of a criterion-referenced essay examination. These issues are particularly relevant to educational programs and to credentialing agencies which use a modified essay format for the assessment of students or candidates. The methods which are discussed represent an advancement in the art of constructing tests in this written format.
The standard setting model, in turn, represents an advance for the development of oral and performance (practical) examinations (Richards, 1985; Cantor, 1989), offering a technique which can be adapted for the development of performance evaluation instruments. Performance testing may be used as an adjunct to written evaluation, or may be the preferred method of evaluation in selected situations, as, for example, in the formative evaluation of students, or among groups with language or literacy deficits.

DEVELOPMENT OF THE CRITERION-REFERENCED ESSAY EXAM

Methods

The ACNM examination is constructed from a blueprint which defines the domain of professional competencies for the entry-level practitioner. The examination is developed to include items reflecting both professional (clinical) knowledge and judgement in four content areas: antepartum, intrapartum, postpartum-newborn, and family planning/gynaecology. In keeping with the role requirements of nurse-midwives, who care primarily for women experiencing an uncomplicated pregnancy, approximately two-thirds of the items represent normal, uncomplicated clinical cases or situations, and one-third represent deviations from the normal. Items relating to professional issues in nurse-midwifery, such as risk management, malpractice, and ethics, are also included according to blueprint specifications.

The essays are constructed by teams of 5-7 certified nurse-midwives (CNMs) who are active clinical practitioners, educators and/or administrators. Particular care is taken to ensure that the essays address entry-level responsibilities of the substantial majority of nurse-midwives, across a diversity of demographic (e.g., urban vs. rural), geographic (e.g., north vs. south), and practice (e.g., in- vs. out-of-hospital) settings.

During test development, each item is critically reviewed to eliminate ambiguity. Particular attention is given to delimiting the category and range of response. The answer keys are constructed concurrently with the items and validated by a separate group of CNMs who are responsible for reviewing the current nursing, midwifery, and medical literature and finding a minimum of three references documenting each answer. Another group of CNMs then takes the examination, critically reviews the items, and provides feedback for revision.

The challenge in constructing a modified essay item is to develop a question which requires high-order reasoning and allows the candidate to offer a range of responses, while requiring the responses to reflect several aspects of safe clinical practice. An example of a question and answer key is shown in Figure 1.

Content area: family planning/gynaecology

Question
A 22-year-old woman presents to the clinic stating that she has very sore blisters on her vulva, with pain on urination and during intercourse. She states that she has never had anything like this before. Upon examination you note five distinct, raised vesicles surrounded by a large erythematous area on her left labia. Speculum and bimanual examinations cannot be done due to client discomfort.
A. Identify the most probable diagnosis(es).
B. After the nurse-midwife takes steps to confirm the diagnosis, two specific areas of management for this problem remain: a) treatment, and b) provision of anticipatory guidance and patient education. Cite, within each management category, three specific clinical actions which should be taken.

Key
A. Primary herpes infection
B. 1. Treatment
      - acyclovir
      - synthetic analgesics
      - topical anaesthetics
      - urinary analgesics or comfort measures for urination
      - catheterise if necessary
   2. Anticipatory guidance and patient education
      - symptomatic relief, not cure, is available
      - recurrence is likely... provide instruction concerning diet, reduction of stress, hygiene measures which might decrease frequency of recurring events
      - mode of transmission of the infection... preventive measures
      - implications for pregnancy
      - need for follow-up and completion of patient examination

Fig. 1 Sample of question and answer key


Scoring

The modified essay format of the ACNM examination requires that candidate responses are handwritten. Each response is then read, compared to an answer key, and scored by a reader (also known as an examiner or rater). The readers for each examination are from the same group of CNMs who constructed that form of the examination. Extensive efforts are made by ACNM to structure tightly the essays and keys in order for the scoring to be as objective as possible; nevertheless some judgements must be made by the readers. Therefore, several steps have been taken to examine inter-reader reliability and to assure that scores have adequate reliability.

Inter-reader reliability is assessed using a 2-step process. First, each reader within a group of 5-7 readers scores every examination for a series of 60-80 candidates. These item scores, for all candidates across all readers, are used to compute Pearson correlations measuring reader agreement (i.e., inter-reader reliability). Outliers, defined as readers with low agreement indices, are neither retained as examiners nor used for scoring purposes.

The second step in the process involves examining reader scores across candidates. This is done to determine whether an individual reader's mean score across all candidates differs from the aggregate mean for all readers by more than one standard deviation (plus or minus). This degree of disparity may represent bias and, if found, is a second criterion for not retaining a reader as an examiner.

After inter-reader reliability has been established for an examination form through this 2-step process, each examination is read and scored by only one reader. Scores can thus be reported for an examination with established reliability in about 6 weeks.
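The two screening steps can be sketched in a few lines of Python. This is an illustration only, not ACNM's procedure: the function names, the `min_agreement` threshold of 0.85, and the choice to use the standard deviation of the pooled scores as the yardstick for bias are all assumptions, since the paper does not state a numerical agreement threshold or which standard deviation is meant.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_readers(scores, min_agreement=0.85):
    """Apply both screening steps to a dict of reader -> list of scores.

    Every reader's list must cover the same candidates/items in the same order.
    Returns (agreement, low_agreement, biased).
    """
    readers = list(scores)

    # Step 1: mean correlation of each reader with every other reader.
    agreement = {
        r: mean(pearson(scores[r], scores[o]) for o in readers if o != r)
        for r in readers
    }
    # Hypothetical threshold: the paper only says "low agreement indices".
    low_agreement = [r for r in readers if agreement[r] < min_agreement]

    # Step 2: flag readers whose mean score lies more than one standard
    # deviation from the aggregate mean (pooled-score SD is an assumption).
    pooled = [s for r in readers for s in scores[r]]
    grand_mean, pooled_sd = mean(pooled), stdev(pooled)
    biased = [r for r in readers if abs(mean(scores[r]) - grand_mean) > pooled_sd]

    return agreement, low_agreement, biased


# Invented scores for three readers who each scored the same four scripts.
example = {"A": [10, 20, 30, 40], "B": [12, 22, 32, 42], "E": [50, 60, 70, 80]}
agreement, low_agreement, biased = screen_readers(example)
print(low_agreement, biased)
```

In the invented data, reader E ranks the scripts exactly as the others do, so the correlation step does not flag E, yet E scores every script far higher. That is the kind of systematic leniency the second step is meant to catch.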

SETTING THE STANDARD FOR THE MODIFIED ESSAY EXAMINATION

In order to establish a pass/fail cutoff score for the credentialing examination, ACNM has developed a modification of the Nedelsky technique (Nedelsky, 1954; Gross, 1985) appropriate for essay format examinations. This technique involves identifying the amount of candidate error considered 'permissible', and then subtracting the point values of these errors from the weight of the correct response. Each item thus has an individual standard, the numerical value of which can be summed across items for a total test standard.

The pass/fail standard for each item, and therefore the total test, is determined by the same CNMs who developed the test items. Each completed and validated item-answer set is individually and independently reviewed and discussed by the group. The importance and relevance of each keyed response to safe nurse-midwifery practice, and to appropriate expectations for the beginning practitioner, are evaluated by the group in determining individual item standards.

RELIABILITY AND VALIDITY ISSUES

The ACNM DCA has addressed three main issues of particular importance to the validity and reliability of the examination. These issues include the validity of the examination content, the reliability of the scores, and the validity of the standard set for each examination. Each potential area of concern will be discussed in detail.

Content validity

Since its inception, the ACNM examination has been intended for use as an assessment of competency for entry into practice of the profession. The blueprint for the examination, which defines the competency domain for entry-level nurse-midwifery practice, was drawn from several sources and continues to be re-evaluated and updated as needed.

A major source of information about the competency domain was a detailed task analysis of American nurse-midwifery practice (Fullerton, 1987). A thorough review of standards for the practice of nurse-midwifery, as delineated in various state statutes and regulations which govern the profession, was also conducted. Documents promulgated by other professional work groups and committees, such as the Core Competencies for Nurse-Midwifery Education (ACNM, 1985), were also used as guides for blueprint development and revision. Content validity for the examination is achieved through adherence to the blueprint.

Score reliability

As previously mentioned, the traditional psychometric concerns regarding essay format examinations are the presence of examiner bias and the attainment of sufficient inter-rater (i.e., inter-reader) reliability. ACNM must determine that pass/fail outcomes result from the quality of candidate performance, not from reader subjectivity. Typical inter-reader agreement indices for the examinations range from 0.88 to 0.97. These correlations reflect a high level of agreement in scoring the examinations, influenced by the extensive discussion and revision of the questions and keys that occurs during the test development and standard setting process. In addition, intra-reader reliability (consistency of an individual reader in reading and scoring examinations over time) appears enhanced by this process and by the detailed notes that readers take about what they will or will not accept as an answer.

Standard setting validity

Each item and keyed response is discussed by the expert panel to determine the responses expected of the entry-level nurse-midwife. Since nurse-midwives in clinical practice may have different performance expectations than nurse-midwifery educators or administrators, all types of nurse-midwives are represented in this process. For each portion of each essay, the goal is to gain consensus on the performance standard for each item.

This modified Nedelsky procedure for setting standards for essay examinations has utility on an aggregate basis, but individual candidate essay data may be imperfect. In order to decrease possible bias associated with pass/fail scoring on an essay examination, ACNM conducted a research project to investigate methods of affirming the pass/fail decision for individual candidates whose examination scores are considered marginal. Marginal candidates are those whose total test scores are within one standard error of measurement of the pass/fail cutoff for the test. Two alternative methods of validating this pass/fail decision were compared: majority opinion and referee arbitration.

The majority opinion method involved two additional examiner readings for marginal candidates. The examination scripts of these candidates were read by examiners who had demonstrated adequate inter-reader reliability for that form of the examination, and who were unaware that the candidate had received a marginal score on a previous reading. After the additional two readers scored the script of the marginal candidate, the new scores and pass/fail decisions were re-examined in conjunction with the first score and pass/fail decision. The three scores were averaged and the pass/fail decisions tallied. If the average score was at or above the pass/fail cutoff score, or if two of the three pass/fail decisions yielded a pass, the candidate was considered to have passed the examination. This disjunctive approach extends the benefit of the doubt to the candidate.

The second method designated a single reader as an arbitrator or referee. Under this method, marginal candidates were rescored by a designated referee who either confirmed or contradicted the pass/fail decision. The referee was the reader who had previously demonstrated the highest inter-reader reliability for the particular exam form (if this reader was not available, the reader with the next highest reliability was selected).
If the referee offered a pass/fail status at variance with the first reader, then the two scores were averaged. The average score was compared to the pass/fail score for the final determination of pass/fail status. This method reduces by one the number of readers required to make a decision, and therefore takes less time and is less costly. However, this method withdraws the most effective reader from the general pool and increases the number of examinations which must be read by the remaining readers. Referee arbitration may also increase the possibility of subjective bias because the designated reader reads only the scripts of marginal candidates. In addition, having only one additional reader, rather than two, makes it more difficult to overturn an initial failing score and therefore does not extend the benefit of the doubt to the candidate.

Data from the administration of two new forms of the certification examination* (Form A: 66 candidates, Form B: 83 candidates) were used in evaluating the two methods of standard setting. For Form A, ten candidates were identified as marginal by being within one standard error of measurement of the pass/fail cutoff score. Using the majority opinion approach, six candidates passed and four candidates failed. Interestingly, the pass/fail outcomes produced by the score averages and pass/fail tallies were identical. For Form B, twelve candidates attained marginal scores. Using the majority opinion approach, ten candidates passed and two candidates failed. In two cases the average score of the three readers was above the cutoff point, although two readers had given failing scores and one reader a passing score. Using this method, such candidates would pass the examination on the basis of their average score (Table 1).

*Throughout the paper 'Form A' and 'Form B' refer to the two versions of the examination.

Table 1 Pass or fail decisions about marginal candidates using the majority opinion method. Raw score data for all candidates whose performance was within ± one standard error of measurement of the pass/fail score

Candidate   Reader 1**    Reader 2      Reader 3      Average of 3
Form A
A           261   (P)     270   (P)     254   (F)     261.6 (P)
B           253   (F)     271   (P)     232   (F)     252   (F)
C           254   (F)     245   (F)     259.5 (P)     252.8 (F)
D           255   (F)     261.5 (P)     251   (F)     255.8 (F)
E           255   (F)     269   (P)     228   (F)     250.6 (F)
F           264   (P)     265   (P)     259   (P)     262.6 (P)
G           257   (P)     265   (P)     255   (F)     259   (P)
H           262   (P)     269.5 (P)     252   (F)     261   (P)
I           273   (P)     273.5 (P)     262   (P)     269.5 (P)
J           282   (P)     265.5 (P)     265   (P)     277.5 (P)
Form B
A           183   (F)     208   (F)     189   (F)     193   (F)
B           216   (F)     234   (P)     218.5 (F)     223   (P)
C           212.5 (F)     216   (F)     214   (F)     214   (F)
D           219   (F)     242   (P)     209.5 (F)     224   (P)
E           222   (P)     230   (P)     215   (F)     223   (P)
F           227   (P)     246   (P)     219   (F)     222   (P)
G           230   (P)     242   (P)     213   (F)     228   (P)
H           224   (P)     235   (P)     213   (F)     224   (P)
I           234   (P)     246   (P)     202   (F)     227   (P)
J           231.5 (P)     239   (P)     234   (P)     235   (P)
K           228   (P)     261   (P)     225   (P)     238   (P)
L           228   (P)     236   (P)     229   (P)     231   (P)

Form A: Test SEM = 11.64; Total number of points = 396; Pass/Fail Score = 257
Form B: Test SEM = 12.17; Total number of points = 334; Pass/Fail Score = 221
**Forms A and B: Designated Referee = Reader 1

The readers selected as referees for the evaluation of the second method had empirical inter-reader correlations which ranged from 0.90 to 0.93 (Form A) and 0.93 to 0.97 (Form B) across all other readers. For purposes of method evaluation each marginal candidate's script was read by two (rather than one) readers, and also by the referee. The analysis of Form A revealed disagreement between the referee and the other reader in seven instances (the referee disagreed with reader 2 on three decisions and with reader 3 on four decisions). The average scores computed for these instances of disagreement are given in Table 2. The pass/fail decisions, using the average score as a final criterion, are also given. Using the referee approach for Form A, all original decisions were upheld for reader 2, confirming one failure. Three decisions made by reader 3 (two fail and one pass) were reversed by use of the referee, resulting in a total of five failures.

The analysis of Form B is also summarised in Table 2. There was disagreement between the referee and readers in seven cases (two disagreements between the referee and reader 2 and five disagreements between the referee and reader 3). All original decisions by reader 2 were upheld by the referee approach, resulting in two failures. There were two original failure decisions made by reader 3 that were overturned by the use of the referee method, resulting in seven failures in this group.

Table 2 Pass or fail decision about marginal candidates using the referee method*

Candidate   Comparison/decision      Comparison/decision
            Referee and Reader 2     Referee and Reader 3
Form A
A           Agree/Pass               257.5/Pass
B           262/Pass                 Agree/Fail
C           Agree/Fail               256.7/Fail
D           258.25/Pass              Agree/Fail
E           262/Pass                 Agree/Fail
F           Agree/Pass               Agree/Pass
G           Agree/Pass               256/Fail
H           Agree/Pass               257/Pass
I           Agree/Pass               Agree/Pass
J           Agree/Pass               Agree/Pass
Form B
A           Agree/Fail               Agree/Fail
B           225/Pass                 Agree/Fail
C           Agree/Fail               Agree/Fail
D           230.5/Pass               Agree/Fail
E           Agree/Pass               218.5/Fail
F           Agree/Pass               223/Pass
G           Agree/Pass               221.5/Pass
H           Agree/Pass               218.5/Fail
I           Agree/Pass               218/Fail
J           Agree/Pass               Agree/Pass
K           Agree/Pass               Agree/Pass
L           Agree/Pass               Agree/Pass

*Raw score data from Table 1 are used to compute the mean score between the referee and the reader in cases of disagreement; the decision is based on the mean score.

Candidates reviewed with the referee approach were, therefore, more likely to receive a failing score. This is because some readers are always 'stricter' or 'more liberal' in the answers they will accept, despite inter-reader reliability scores above 0.90. This evaluation was conducted with a 'liberal reader' (reader 2) and a 'strict reader' (reader 3), which resulted in the differing pass/fail rates when only one referee was used. Because of this disparity, the majority opinion method was selected as the preferred approach: it offers the broader review and is therefore considered more objective, reliable, and fair.

In addition to comparing results of pass/fail decision-making with these two criterion-referenced methods, we also examined the differences between pass/fail decisions made under the norm-referenced versus the criterion-referenced model. We wished to examine the impact which criterion-referenced scoring would have on the numbers of practitioners newly credentialed for practice of the profession since, obviously, the dual issues of practitioner availability and quality were both of concern. While a norm-referenced standard was not set for these forms of the examination, the cutoff score extant for several previous norm-referenced forms (1.25 standard deviations below the mean) was used for comparison purposes. The method for computation and transformation of the standard scale score for essay format examinations has been previously reported (Fullerton & Holley, 1982).

Table 3 presents the pass/fail performance of the marginal candidates demonstrated under the criterion-referenced standard setting method which was selected (the majority opinion approach) and the pass/fail outcomes which would have been calculated had the norm-referenced standard setting methodology been applied. The raw scores and pass/fail decisions represented for the majority opinion, criterion-referenced approach (both examination forms) are derived from the average of 3 readers, as shown in Table 1, column 4. The standard scale scores and pass/fail decisions representing the norm-referenced approach are derived from the formula, using data indicated in the footnote to Table 3. On Form A, eight of the ten marginal candidates would have failed under the arbitrary norm-referenced standard. Of these, four candidates attained the criterion-referenced standard, and four did not. The results for Form B were similar: nine of twelve marginal candidates failed using norm-referenced interpretation, while seven of these nine failing candidates attained the criterion-referenced standard.
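The three decision rules compared in this section can be sketched in Python and checked against the published figures. The function names are hypothetical. The linear transformation used for the standard scale score (mean 500, standard deviation 100) is inferred from the footnote to Table 3 and is an assumption; the exact computation is given in Fullerton & Holley (1982), not in this paper.

```python
from statistics import mean


def majority_opinion(scores, cutoff):
    """Disjunctive rule: pass on the three-reader average OR on a 2-of-3 tally."""
    passes = sum(s >= cutoff for s in scores)
    return mean(scores) >= cutoff or passes >= 2


def referee_decision(referee_score, reader_score, cutoff):
    """If referee and reader agree, keep the decision; otherwise average the two."""
    referee_pass = referee_score >= cutoff
    reader_pass = reader_score >= cutoff
    if referee_pass == reader_pass:
        return reader_pass
    return (referee_score + reader_score) / 2 >= cutoff


def standard_scale_score(raw, reader_mean, reader_sd):
    """Assumed linear rescaling to mean 500, SD 100 (see Table 3 footnote)."""
    return 500 + 100 * (raw - reader_mean) / reader_sd


def norm_referenced_pass(raw, reader_mean, reader_sd, cutoff=375):
    """Cutoff of 375 corresponds to 1.25 standard deviations below the mean."""
    return standard_scale_score(raw, reader_mean, reader_sd) >= cutoff


# Form A, candidate G (Table 1): referee 257, reader 2 265, reader 3 255.
FORM_A_CUTOFF = 257
print(majority_opinion([257, 265, 255], FORM_A_CUTOFF))   # average 259
print(referee_decision(257, 255, FORM_A_CUTOFF))          # referee vs reader 3
print(norm_referenced_pass(257, 295.21, 22.1))            # referee's raw score
```

For candidate G the majority opinion rule passes the candidate, whereas referee arbitration against reader 3 and the norm-referenced cutoff both fail the same script. This single case shows how the three models can diverge for a marginal candidate.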
Table 3 Pass/fail performance according to criterion-referenced and norm-referenced standard setting

            Majority opinion            Norm-referenced
            criterion-referenced        standard scale
Candidate   Raw score   Decision        Score*   Decision**
Form A
A           261.6       P               345      F
B           252         F               309      F
C           252.8       F               313      F
D           255.8       F               318      F
E           250.6       F               318      F
F           262.6       P               358      F
G           259         P               327      F
H           261         P               349      F
I           269.5       P               398      P
J           277.5       P               440      P
Form B
A           193         F               159      F
B           223         P               314.7    F
C           214         F               295.9    F
D           224         P               326.5    F
E           223         P               340.7    F
F           222         P               366.6    F
G           228         P               378.4    P
H           224         P               352.5    F
I           227         P               397.3    P
J           235         P               385.5    P
K           238         P               369      F
L           231         P               369      F

*Data are calculated using reader mean and reader standard deviation for designated Referee (from Table 1)
Form A: reader mean (295.21); reader standard deviation (22.1)
Form B: reader mean (255.78); reader standard deviation (21.2)
**Pass/fail score = Standard Scale Score 375 (mean = 500, standard deviation = 100)

Through these investigations ACNM has compared pass/fail results of three different decision-making methods. While it is impossible to label any of the three sets of pass/fail outcomes as the 'correct' set, ACNM has provided extensive rationale for its decision to use the criterion-referenced model for score interpretation (Fullerton et al, 1989). The majority opinion approach appears to be the most defensible adjunct for increasing the validity of a criterion-referenced pass/fail decision for an individual candidate.

SUMMARY

Several issues of concern in the use of examinations in a modified essay format have been addressed by ACNM. The adaptation of the Nedelsky technique for modified essay examination standard-setting, and the development of a method for affirming pass/fail decisions regarding marginal candidates are new techniques. These procedures have been developed in a generic manner, allowing application to the increasing number of tests that include essays and other 'non-objective' formats, as well as oral and performance examinations.

References

American College of Nurse-Midwives 1985 Core competencies for nurse-midwifery education. ACNM, Washington, D.C.
Cantor J 1989 A validation of Ebel's method for performance standard setting through its application with comparison approaches to a selected criterion-referenced test. Educational and Psychological Measurement 49: 709-721
Fullerton J 1987 A task analysis of American nurse-midwifery practice. Journal of Nurse-Midwifery 32: 291-296
Fullerton J, Greener D, Gross L 1989 Criterion-referenced competency assessment and the national certification examination in nurse-midwifery. Journal of Nurse-Midwifery 34: 71-74
Fullerton J, Holley M 1982 A new look at standard scale score transformation. Psychological Reports 50: 1148-1150
Gross L J 1985 Setting cutoff scores on credentialing examinations: a refinement in the Nedelsky procedure. Evaluation and the Health Professions 8: 469-493
Haertel E 1985 Construct validity and criterion-referenced testing. Review of Educational Research 55: 23-46
Hambleton R, Rogers H J 1986 Technical advances in credentialing examinations. Evaluation and the Health Professions 9: 205-229
Luecht R, Madsen M, Taugher M et al 1990 Assessing professional perceptions: design and validation of an interdisciplinary education perception scale. Journal of Allied Health 19: 181-191
Lunz M, Stahl J 1990 A comparison of intra- and interjudge decision consistencies using analytic and holistic scoring criteria. Journal of Allied Health 19: 173-179
Nedelsky L 1954 Absolute grading standards for objective tests. Educational and Psychological Measurement 14: 3-19
Richards B 1985 Performance objectives as the basis for criterion-referenced performance testing. Journal of Industrial Teacher Education 22: 28-37
Turnbull J 1989 What is ... normative versus criterion-referenced assessment. Medical Teacher 11: 145-150
