Quality-of-care appraisal in primary care: a quantitative method.

JOHN C. SIBLEY, M.D., F.R.C.P., F.R.C.P.(C), F.A.C.P., WALTER O. SPITZER, M.D., M.H.A., M.P.H., K. VINCENT RUDNICK, M.D., C.M., J. DOUGLAS BELL, M.D., RICHARD D. BETHUNE, M.D., DAVID L. SACKETT, M.D., KAREM WRIGHT, R.N., B.N., Hamilton, Ontario, Canada

A reproducible method has been developed for measuring the quality of clinical care provided by physicians and nurse practitioners. The distinctive features of the method are the extended use of the tracer disease concept, the evaluation of referrals, new procedures for probing the clinical operation of practices, a single-blind design, emphasis on the use of the untouched medical record, the ability to compare results with measurements of concurrent outcome, and relatively low cost. Three simultaneous approaches used in the method are described: surveillance of the management of indicator conditions, evaluation of the clinical use of drugs, and assessment of referral decisions. The three approaches gave consistent results on the relative performance of the practices compared, and these results agreed with concurrent outcome studies. The method was successfully implemented in a health care experiment.

• From the Departments of Medicine and Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada.

WE PRESENT HERE THE DESIGN for a quantitative method of quality-of-care appraisal designed for primary care practices. The method has six distinctive features: [1] an extension of the tracer disease concept of Kessner, Kalk, and Singer (1) to symptoms, states, injuries, and drugs; [2] concurrent application of outcome measures of quality, such as physical disability, to the same practice population; [3] evaluation of the referral process as a supplementary measure of quality; [4] the development of special sampling methods, which we called probes, that identify representative episodes of care from the existing clinical and insurance records of the practices; [5] a single-blind design made possible by the probes; and [6] reduction of the cost of implementation to a level within reach of unsubsidized small group practices. The method was first implemented as one of the five major components of the study known as the "Burlington Randomized Trial of the Nurse Practitioner." The methods and general results of the entire trial, including preliminary quality-of-care data, have been reported earlier (2). The analysis of physical, emotional, and social function of patients in the Burlington trial as a measure of health outcomes has also been reported (3).

The study focused primarily on the process of medical practice as suggested or defined by Weinerman (4), Donabedian (5, 6), Starfield (7), and Brook and Appel (8). Although the in-hospital application of medical audit systems has been studied extensively (9-11), further studies in the primary health care sector are essential. The pioneer studies of the Health Insurance Plan (H.I.P.) clinics in New York have been summarized by Densen (12). Fitzpatrick, Riedel, and Payne (13) and Riedel and Fitzpatrick (14) first developed and extensively applied predetermined criteria for quality-of-care appraisal, but they used this criterion approach, fundamental to the ambulatory care study presented here, primarily in hospitals. With the exception of the work of Williamson (15), studies of ambulatory practice done in the last 2 decades were only partly based on the criterion approach. Other well-known projects were those reported earlier by Peterson (16), Clute (17), and Jungfer and Last (18) in North Carolina, Canada, and Australia, respectively. By departing from Kessner's disease or diagnostic orientation and introducing presenting complaints, commonly used drugs, and the referral process for evaluation, we attempted to make the method more appropriate for measuring the quality of primary care.

Methods

After 7 months of development and pretesting, the instrument was used to evaluate clinical activity in the Burlington practices between July 1971 and June 1972.

THREE APPROACHES OF APPRAISAL

The three approaches used were the surveillance of (a) the management of indicator conditions, (b) the indications for prescribing drugs commonly used in family practice, and (c) referral decisions and patient management as assessed by consultants to the practices. A Peer Advisory Group (described in detail in another section) selected the conditions, drugs, and referral issues and developed the corresponding criteria.

Indicator Conditions: An indicator condition may be a disease, a presenting symptom, an injury, or a state. Indicator conditions were selected whose outcomes may be influenced by management and whose frequency was high enough to provide adequate data for analysis. Ten indicator conditions were identified. For each, explicit criteria for adequacy of management were established in advance of the appraisal. The conditions are listed in Table 1.

Drugs: The manner in which 13 drugs were used was evaluated*. Explicit criteria for the satisfactory use of the drugs shown in Table 2 were established in advance.

Referral Decisions: Appraisal reports concerning referral decisions and patient management were obtained from the group of consultants who provided 96% of all consulting services to the primary care practitioners studied. The data were elicited with consultant questionnaires that explored (a) communication among the referring health professional, the patient, and the consultant, (b) the timeliness and clinical usefulness of the referral, and (c) attitudes about the referral. Candor was assured by scrupulous confidentiality in the way the questionnaires were gathered, coded, and reported. As advocated by Donabedian (5), applying these normative criteria complemented the use of empirical standards.

* Nurse practitioners cannot legally prescribe in Ontario, but it was possible to determine how they made decisions about drugs for patients in their practices.

Annals of Internal Medicine 83:46-52, 1975

Table 1. Number of Episodes of Indicator Conditions Assessed and Percentage Scored Adequate

Practices under assessment: RNP = Randomized Nurse Practitioner; RC = Randomized Control; CC = Community Control.

Indicator Condition                      RNP        RC        CC
                                    no.    %  no.    %  no.    %
Otitis media                         39   74   39   74   36   67
Hypertension                          9   56   12   67   13   31
Prenatal care                        13   77   31   71   23   70
Care of the newborn                  17   71   34   64   33   82
Immunization in 1st year of life     10   90   11   45   19   11
Depression                           37   81   33   94   34   71
Urinary tract infection              24   42   26   62   36   53
Knee injury                          11   36   16   50   14   57
Pityriasis                            4  100   11   73   11   55
Anemia                                4   75   10   40   10   90
Total                               168   69  223   69  229   61

THE DEVELOPMENT OF CRITERIA FOR CLINICAL JUDGEMENT

The indicator conditions and drugs studied were selected to permit exploration of problems in all age groups and both sexes, with due attention to the frequency of clinical problems presenting to the physician in primary health care. Three general considerations guided the selection of the explicit criteria:

1. Observations (clinical or laboratory) considered essential for adequately monitoring the patient's progress were identified. For example, for the indicator condition care of the newborn in the first year of life, the criteria required weight to be recorded at every visit, and height and head circumference at least once every 6 months.
2. Management decisions should indicate sound clinical judgement. For example, in pityriasis rosea no therapy was indicated if the patient was asymptomatic, and oral steroids were not permitted.
3. The possible serious significance of apparently benign symptoms, signs, or laboratory findings should be recognized. For example, in a 20-year-old man a haemoglobin of 10 g/100 ml should lead the practitioner to question whether there is an underlying primary disorder, such as a duodenal ulcer or a blood dyscrasia.

Table 2. Number of Episodes of Drug Use Assessed and Percentage Scored Adequate

Practices under assessment: RNP = Randomized Nurse Practitioner; RC = Randomized Control; CC = Community Control.

Drug                               RNP        RC        CC
                              no.    %  no.    %  no.    %
Chloramphenicol*               38  100   38  100   37  100
Tetracycline                   38   97   38   97   37   97
Amphetamines                    -    -    1    0    7   14
Multivitamins                   3  100    7   86    7   86
Hematinic                      13   92   26   85   39   62
Phenylbutazone                 35   97   38   89   39   95
Hypertensive medications       15   53   15   60   32    9
Steroids                       14   86   21   86   29   97
Vitamin B12                     5   60    6   67    5   80
Antidepressants                13   38   33   39   25   40
Tranquillizers                 41   44   44   70   41   76
Cardiac glycosides              7   43   12   67   15   40
Antibiotics†                   42   60   43   72   39   69
Total                         226   71  284   75  315   68

* See comment on chloramphenicol in text.
† Other than tetracycline or chloramphenicol.

Sibley et al. • Appraisal of Care

The development of explicit criteria proved to be a very demanding task. The indicator condition hypertension included four categories of severity, each having seven or more criteria for adequate care*. Prenatal care presented fewer problems. For example, for an episode in an uncomplicated pregnancy to be scored adequate, the following criteria had to be met: (a) pelvic assessment, if there was no previous history of successful delivery; (b) past obstetrical history; (c) complete physical assessment within a 2-year period; (d) haemoglobin; (e) urinalysis at each visit; (f) monthly visits from the first through the seventh month, biweekly visits in the eighth month, and weekly visits to term; (g) record of weight at each visit; (h) record of blood pressure at each visit; (i) record of Rh and of a serological test for syphilis; and (j) a statement of gestational age. In addition, 14 intermediate states representing possible complications of pregnancy were identified, each with a mandatory intervention if a score of adequate was to be obtained. Such complications included premature rupture of the membranes, albuminuria, hypertension, glucosuria, and exposure to German measles.

The Peer Advisory Group selected the drugs to be studied and, concurrently, developed the criteria for their appropriate use in family practice. The following examples illustrate the approach adopted. Amphetamines were permitted only for (a) an established diagnosis of narcolepsy or (b) an established diagnosis of idiopathic postural hypotension; in children, they were permitted only for minimal brain damage, cerebral dysfunction, or functional behavioural problems in a hyperkinetic child.
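The prenatal criteria above amount to an all-or-nothing checklist with one conditional item: pelvic assessment is required only when there has been no previous successful delivery. As a minimal sketch, assuming hypothetical abstract fields (the actual abstract forms are in NAPS document #02421 and are not reproduced here), criteria (a) to (j) for an uncomplicated pregnancy might be checked as follows. The visit-schedule criterion (f) is reduced to a single abstractor judgement, and the 14 complication states are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Visit:
    """Items that criteria (e), (g), and (h) require at every antenatal visit."""
    urinalysis: bool
    weight: bool
    blood_pressure: bool


@dataclass
class PrenatalEpisode:
    """Hypothetical abstract of one uncomplicated-pregnancy episode."""
    prior_successful_delivery: bool
    pelvic_assessment: bool          # (a), needed only without a prior successful delivery
    obstetric_history: bool          # (b)
    physical_within_2_years: bool    # (c)
    haemoglobin: bool                # (d)
    visit_schedule_met: bool         # (f), simplified to one abstractor judgement
    rh_and_syphilis_serology: bool   # (i)
    gestational_age_stated: bool     # (j)
    visits: List[Visit] = field(default_factory=list)


def unmet_prenatal_criteria(ep: PrenatalEpisode) -> List[str]:
    """Return the letters of unmet criteria; an empty list means 'adequate'."""

    def at_every_visit(item: str) -> bool:
        # An episode with no recorded visits cannot satisfy a per-visit criterion.
        return bool(ep.visits) and all(getattr(v, item) for v in ep.visits)

    checks = {
        "a": ep.prior_successful_delivery or ep.pelvic_assessment,
        "b": ep.obstetric_history,
        "c": ep.physical_within_2_years,
        "d": ep.haemoglobin,
        "e": at_every_visit("urinalysis"),
        "f": ep.visit_schedule_met,
        "g": at_every_visit("weight"),
        "h": at_every_visit("blood_pressure"),
        "i": ep.rh_and_syphilis_serology,
        "j": ep.gestational_age_stated,
    }
    return [letter for letter, met in checks.items() if not met]


# Example: a second pregnancy in which blood pressure was not recorded at one visit.
episode = PrenatalEpisode(
    prior_successful_delivery=True,
    pelvic_assessment=False,
    obstetric_history=True,
    physical_within_2_years=True,
    haemoglobin=True,
    visit_schedule_met=True,
    rh_and_syphilis_serology=True,
    gestational_age_stated=True,
    visits=[Visit(True, True, True), Visit(True, True, False)],
)
unmet = unmet_prenatal_criteria(episode)
print("adequate" if not unmet else f"questionable (unmet: {', '.join(unmet)})")
```

Returning the unmet letters, rather than a bare yes/no, mirrors how a single missing item is enough to move an episode from "adequate" to "questionable".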
The combination of a diuretic and a cardiac glycoside was scored adequate if (a) the daily dose of diuretic was omitted at least 1 day per week, (b) the patient was taking a potassium supplement, (c) the serum potassium was measured at least every 2 months, or (d) an increased dietary intake of potassium was clearly indicated. Antidepressants such as amitriptyline were permitted only if there was evidence of follow-up visits and concurrent psychotherapeutic support by the family physician or a consultant. In addition, not more than 50 tablets could be prescribed at any one time.

ELIGIBILITY OF EPISODES OF CARE FOR INCLUSION IN THE STUDY

By the definitions adopted, an episode is not a single visit but includes all encounters for the management of an indicator condition. To be eligible, an episode had to be managed in full or in part by the health professional concerned in the ambulatory setting studied, and both its initial and final dates had to fall within the study period. In "care of the newborn," for example, the episode begins with the first well-child visit in the patient's first year of life, continues within the study period, and ends on the child's first birthday or at the end of the study period, whichever comes first. A child must be under surveillance for at least 5 months.

PEER ADVISORY GROUP

The purpose of the study was not to determine how excellent a practitioner might be as a cardiologist, gynaecologist, or paediatrician, but rather how the practitioner compared with community standards of adequacy in primary care management. A Peer Advisory Group was therefore established, composed of three family physicians highly regarded in the community and by the university. Each was 40 to 50 years old, had been in practice for 10 to 20 years, and represented, respectively, a four-man practice, a two-man practice, and a solo practice. These physicians were active in real-life community practice and were not full-time academics. The group's functions were to select the indicator conditions and drugs, to define episodes, to develop explicit criteria for each, and then to make their own practices available for pretesting the data-gathering instruments and validating the criteria. Measurements in their practices were to be made on patients seen before the date on which the group was first informed of the study.

* Complete lists of criteria, questionnaires, and abstract forms used are found in NAPS document #02421. Order from ASIS/NAPS Microfiche Publications, 305 East 46th St., New York, New York 10017. Remit $1.50 for microfiche or $12.30 for photocopies.

July 1975 • Annals of Internal Medicine • Volume 83 • Number 1

THE EXPERIMENTAL SETTING

The study was a collaborative project among three private family practices in Burlington, Ontario, and investigators from the McMaster University Faculty of Health Sciences. The practices compared were the same ones assessed concurrently with outcome measures. Two physicians practising in a conventional way and assisted by traditional office nurses formed the first study practice, with 1060 families (Randomized Control [RC] practice)†. The second practice, with 540 families, received care from nurse practitioner-physician teams (Randomized Nurse Practitioner [RNP] practice). The third practice, Community Control (CC), comprised the 1350 families of two other family physicians practising conventionally in close association with each other. The three practices were located in the same building and used the same hospital, and their patients had a similar geographic distribution. Further details about the practices and about the characteristics of the patients (7990, or 2.71 per family) are given elsewhere (2, 3). Although patients were randomly allocated to the RC and RNP study practices in a 2:1 ratio, episodes of care within practices were not selected at random: abstractors did not actively search further once 35 episodes of a given condition or of a given drug's use had been found in each practice.

GENERAL PLAN OF THE STUDY

The general plan was to derive scores that reflected the quality of performance as assessed with the three approaches employed. Each score is the number of episodes judged adequate, expressed as a percentage of all eligible episodes examined. The Peer Advisory Group developed explicit criteria for the adequacy of management of indicator conditions and the use of drugs. The physicians and nurse practitioners studied were unaware of the indicator conditions, the drugs being considered, and the aspects of care evaluated by the consultants. They were also unaware of the explicit criteria and of the manner in which the data were gathered, scored, and summarized.

† To facilitate comparisons with earlier reports (2, 3), the abbreviations previously used to designate the practices were retained.
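Because every eligible episode counts once, a practice score is a pooled proportion, not an average of per-condition percentages. The sketch below assumes nothing beyond the arithmetic stated in the text: it recomputes the worked figures given under SCALE FOR SCORING (16 of 26 urinary tract infection episodes and 154 of 223 indicator-condition episodes in practice RC, and 144 of 162 high consultant ratings in practice CC) and applies the stated 1-5 banding of consultant ratings. For contrast, it also takes the unweighted mean of the ten RC condition percentages in Table 1, which would weight each condition equally regardless of how many episodes it contributed. The function names are illustrative, not the study's own software.

```python
from typing import Iterable, List


def percent_adequate(adequate: int, eligible: int) -> int:
    """Adequate episodes as a percentage of eligible episodes, rounded."""
    if eligible <= 0:
        raise ValueError("at least one eligible episode is required")
    return round(100 * adequate / eligible)


def rating_band(rating: int) -> str:
    """Collapse a 1-5 consultant rating: 1-2 low, 3 equivocal, 4-5 high."""
    if rating not in range(1, 6):
        raise ValueError(f"rating must be 1-5, got {rating}")
    if rating <= 2:
        return "low"
    return "equivocal" if rating == 3 else "high"


def percent_high(ratings: Iterable[int]) -> int:
    """Percentage of consultant responses that fall in the high band."""
    bands: List[str] = [rating_band(r) for r in ratings]
    return percent_adequate(bands.count("high"), len(bands))


# Worked figures quoted in the text.
uti_rc = percent_adequate(16, 26)          # RC, urinary tract infection
total_rc = percent_adequate(154, 223)      # RC, all indicator conditions
judicious_cc = percent_adequate(144, 162)  # CC, judiciousness of referral

# For contrast: averaging the ten RC condition percentages from Table 1
# weights each condition equally instead of each episode.
rc_condition_percentages = [74, 67, 71, 64, 45, 94, 62, 50, 73, 40]
unweighted_mean = sum(rc_condition_percentages) / len(rc_condition_percentages)

print(uti_rc, total_rc, judicious_cc, unweighted_mean)
```

Run as written, this prints `62 69 89 64.0`: the pooled RC score of 69% sits well above the 64% that equal weighting of conditions would give, because the better-scored conditions in that practice contributed more episodes.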

PRINCIPAL DATA SOURCE

Ordinary clinical records existing in the primary care practices under assessment were the principal data source for the indicator condition and drug use components.

PROBES

To permit a "single-blind" design, a series of probes was developed. The probes made it possible to identify patients who had the preselected indicator conditions or who received the chosen drugs during the experimental period, without setting up a new clinical record system and without any distortion of usual patterns of practice. They also ensured that the indicator conditions and drugs remained unknown to the practitioners under study. Five probes were developed. First, each practice kept a daysheet journal listing the patient's name, complaint(s), diagnosis(es), procedure(s) done, and whether the patient was referred and to whom; entries for 44 083 visits were obtained in this fashion. Second, the physicians under study used only carbonized, personalized prescription forms (which they or the pharmacist also completed for telephone orders). The carbon copy retained by the research group permitted identification of the drugs used. The prescription of certain drugs also served as a probe for indicator conditions; for example, a prescription for amitriptyline (Elavil®) would suggest a possible case of depression. Third, consultants' records of consecutive consultations done during the experimental period were reviewed independently. Fourth, hospital records for the same period were reviewed to identify indicator conditions and drugs. Finally, once access had been gained to a patient's file by any of the above probes, a secondary direct search of the file was done to identify any additional episodes. The "yield" of the various probes is summarized in Table 3. In the Burlington study, these probes identified 620 episodes of indicator conditions and 825 episodes of drug use. The distribution of probes identifying each indicator condition or drug proved to be balanced.
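The probes act as overlapping case-finding sources, and the counts in Table 3 add up exactly to the episode totals, so each episode appears to have been credited to a single probe. The paper does not state the crediting rule. The sketch below therefore assumes, hypothetically, that an episode is credited to the first probe in a fixed order that detected it, and that the secondary direct record search adds only episodes found in files already opened by another probe. The probe order, data structures, and example patients are illustrative only.

```python
from collections import Counter
from typing import Dict, Mapping, Set, Tuple

Episode = Tuple[str, str]  # (patient id, indicator condition or drug)

# Assumed crediting order; the original study does not specify one.
PROBE_ORDER = ["daysheet", "prescription", "hospital", "consultation"]


def credit_episodes(
    hits: Mapping[str, Set[Episode]],
    file_contents: Mapping[str, Set[str]],
) -> Counter:
    """Count distinct episodes per probe, crediting each to the first probe that found it."""
    credited: Dict[Episode, str] = {}
    for probe in PROBE_ORDER:
        for episode in sorted(hits.get(probe, set())):
            credited.setdefault(episode, probe)
    # Secondary search: only files already opened by a probe are read in full.
    opened = {patient for patient, _ in credited}
    for patient in sorted(opened):
        for item in sorted(file_contents.get(patient, set())):
            credited.setdefault((patient, item), "direct record search")
    return Counter(credited.values())


def yield_percentages(counts: Mapping[str, int]) -> Dict[str, int]:
    """Share of all detected episodes contributed by each probe, rounded."""
    total = sum(counts.values())
    return {probe: round(100 * n / total) for probe, n in counts.items()}


# Hypothetical example: p2's hypertension is found by two probes but counted once,
# and p5's file is never opened by a probe, so p5's anemia is not found.
hits = {
    "daysheet": {("p1", "otitis media"), ("p2", "hypertension")},
    "prescription": {("p2", "hypertension"), ("p3", "depression")},
    "hospital": {("p4", "knee injury")},
}
file_contents = {"p2": {"hypertension", "anemia"}, "p5": {"anemia"}}
counts = credit_episodes(hits, file_contents)
print(dict(counts), yield_percentages(counts))

# Indicator-condition counts from Table 3 reproduce the published percentages.
table3_indicator = {"daysheet": 446, "prescription": 133, "hospital": 37,
                    "consultation": 4, "direct record search": 0}
print(yield_percentages(table3_indicator))
```

Applied to the drug-use column of Table 3, simple rounding gives 21% for direct record search (170 of 825) where the table shows 20%; the published column sums to 100 and may have been adjusted, but the paper does not say how.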

Table 3. Number of Eligible Episodes Detected by Probes in Experimental and Control Practices

                        Indicator Condition    Drug Use
Type of Probe              no.      %       no.      %
Daysheet                   446     72       439     53
Prescription               133     21       197     24
Hospital                    37      6        13      2
Consultation                 4      1         6      1
Direct record search*        0      0       170     20
Total                      620    100       825    100

* Secondary, after a probe had given access to the file.

SCALE FOR SCORING

A categorical scale was used that permitted each indicator condition and drug use episode to be scored as adequate or questionable. The percentage of adequate episodes among all episodes scrutinized was then calculated. For instance, for each of the 26 episodes of urinary tract infection reviewed in practice RC, the abstractors indicated whether the recorded care met the predetermined criteria of adequacy; 16 of the 26 eligible episodes (62%) were classified adequate. The procedure was repeated for the 10 indicator conditions. Of the 223 episodes examined, 154 (69%) were adequate, and this percentage (69%) has been reported as the score for practice RC (Table 1).

To score the consultant questionnaires, the consultant marked the response to each of the 10 questions on a five-point ordinal scale, on which 1 to 2 was low, 3 was equivocal, and 4 to 5 was high. The results have been expressed as the percentage of high-score responses among all responses in each group of questions, corresponding to communication, judiciousness of referral, adequacy of management, and attitudes. For example, of 162 questions about the judiciousness of referrals in practice CC, the consultants rated 144 (89%) as either 4 or 5, that is, as high performance; the score reported for that aspect of practice was 89%.

The resulting scores for indicator conditions, drugs, and referral decisions permitted us to compare each aspect of care in one practice with that in another at a given time. In the absence of any evidence or criteria by which to assign weights to the aspects of practice assessed, the three scores were not averaged to give a "total practice score."

PRETESTING OF MEASURING INSTRUMENTS AND ABSTRACTORS

Nurses were trained to abstract information from clinical records and to score the abstracts according to predetermined criteria. We compared scores obtained independently by two nurse abstractors, and these scores were further compared with scores obtained by two physicians. When 162 abstracts were independently assessed, the agreement between the nurses was 88%. For the abstracts on which the nurses agreed, 94% agreement was attained with the two physicians (one an internist, the other a family physician).
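The reliability check just described is simple percent agreement applied in two stages: first between the two nurse abstractors, then, on the abstracts where the nurses agreed, between the nurses' score and the physicians' score. A minimal sketch with hypothetical ratings ("A" for adequate, "Q" for questionable) follows. It treats the two physicians' judgements as a single reference score, which is an assumption, and it does not reproduce the published 88% and 94%, which come from 162 abstracts not available here. Percent agreement does not correct for agreement expected by chance; Cohen's kappa is the usual chance-corrected alternative.

```python
from typing import List, Sequence, Tuple


def two_stage_agreement(
    nurse_1: Sequence[str], nurse_2: Sequence[str], physicians: Sequence[str]
) -> Tuple[float, float]:
    """Return (% nurse-nurse agreement, % nurse-physician agreement on agreed abstracts)."""
    if not len(nurse_1) == len(nurse_2) == len(physicians):
        raise ValueError("all raters must score the same abstracts")
    # Stage 1: abstracts on which the two nurses gave the same score.
    agreed: List[int] = [i for i, (x, y) in enumerate(zip(nurse_1, nurse_2)) if x == y]
    if not agreed:
        raise ValueError("the nurses agreed on no abstracts")
    stage_1 = 100 * len(agreed) / len(nurse_1)
    # Stage 2: of those, how often the physicians' score matches the nurses'.
    stage_2 = 100 * sum(nurse_1[i] == physicians[i] for i in agreed) / len(agreed)
    return stage_1, stage_2


# Hypothetical scores for ten abstracts: A = adequate, Q = questionable.
nurse_1 = ["A", "A", "Q", "A", "Q", "A", "A", "Q", "A", "A"]
nurse_2 = ["A", "A", "A", "A", "Q", "A", "A", "A", "A", "A"]
physicians = ["A", "A", "Q", "A", "Q", "Q", "A", "A", "A", "A"]

nurse_pct, physician_pct = two_stage_agreement(nurse_1, nurse_2, physicians)
print(f"nurses agree on {nurse_pct:.0f}%; physicians concur on {physician_pct:.1f}% of those")
```

With these made-up scores the nurses agree on 8 of 10 abstracts (80%), and the physicians concur on 7 of those 8 (87.5%).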

VALIDATION OF CLINICAL CRITERIA

In a pretest done in the three practices of the peer advisors, the abstractors readily identified 355 indicator conditions, 191 drug use episodes, and 50 referral decisions within 8 weeks. Eleven indicator conditions were tested, and 1 was excluded. Two drugs were deleted, and 1 was added from 13 original categories. The aggregate scores of "adequate," expressed as a percentage of total episodes, were similar in the three practices, as anticipated from the consensus approach to criteria setting. For indicator conditions, practice A scored 66% adequate, practice B 65%, and practice C 61%. For drugs, the scores were 64% for A, 81% for B, and 66% for C. One question was dropped from the consultant questionnaire, certain explicit criteria were changed, and the abstracting methods were refined. The Peer Advisory Group judged that the evaluation in their practices was consistent with their performance and that there had been no distortions in their practices during pretesting and validation. Encouraged by this phase of the study, we then applied the method as part of a controlled experiment.

Results

INDICATOR CONDITIONS

Table 1 lists the indicator conditions evaluated and the number of episodes for each. The probes identified 168 episodes of indicator conditions in the RNP practice, 223 in the RC practice, and 229 in the CC practice, for a total of 620 episodes. The proportion of episodes scored "adequate" is also shown for each indicator condition in the three practices. In the RNP practice, 69% of the episodes of indicator conditions scored adequate; in the RC practice, 69%; and in the CC practice, 61%*. It should be stressed that the Peer Advisory Group decided that no indicator condition or drug would be weighted more heavily than another, so that each episode would have equal value in the final scoring. Examination of the individual cells in Table 1 shows considerable variation in the percentage of adequacy for indicator conditions within each practice.

DRUGS

The drugs studied are listed in Table 2, which also shows the percentage of drug use episodes scored adequate, by drug and by practice. As with the indicator conditions, the Peer Advisory Group decided that each episode was of equal value. The RNP practice scored 71% adequate, the RC practice 75%, and the CC practice 68%.

Three drugs are worthy of mention. Drug 1 (chloramphenicol) was considered contraindicated in primary care, whereas drug 2 (tetracycline) is an appropriate alternative. To reward a correct decision not to use chloramphenicol, it was assumed arbitrarily that a choice between these two drugs was made each time tetracycline was prescribed. To avoid giving double weight to a proper choice, the numbers for chloramphenicol are not included in the grand total. Drug 3 (amphetamines) shows a small number of episodes, as expected, because this drug should rarely be used in primary care. There are considerable variations in the scores within the practices and within drugs, suggesting that the method is reasonably sensitive and can elicit differences for particular drugs across practices and across all drugs within a particular practice.

REFERRAL DECISIONS

As shown in Figure 1, for each of the four areas explored by the consultants' questionnaire, the RC and RNP practices had 80% to 90% of responses rated in the high category, whereas the CC practice was rated in the 60% to 70% range. The exception was appropriateness of referral, in which all practices scored within 88% to 89%.

Figure 1. Profile of high-level performance in referral decisions as reported by consultants (by practice and by practice area explored). There were 22 questionnaires completed in the Randomized Nurse Practitioner (RNP) practice, 41 in the Randomized Control (RC) practice, and 41 in the Community Control (CC) practice.

* The conventional assessment of the differences between the RNP practice and the RC practice showed that the observed differences are not statistically significant at the 5% level. However, it is important in this situation to assess the likelihood that the adequacy of care could, in fact, be worse in the RNP practice. The analysis showed that the probability of a true deterioration of 10% or more in the quality of care in the RNP practice compared with the RC practice was 0.018 for indicator conditions and 0.072 for drugs. An earlier report (2) erroneously reported the RC rate for indicator conditions as 66%. Correction of the arithmetic error does not alter the conclusion in any significant way.

Discussion

INTERPRETATION OF RESULTS

The Untouched Medical Record as Evidence: Quality-of-care studies in primary health care present challenges distinct from those in the hospital setting. The medical record is brief and is primarily a shorthand reminder to the doctor of what was or was not done at a particular visit. In addition, the physician may appropriately choose to record only abnormal findings and significant normal findings. Furthermore, in primary health care the physician may have extensive previous knowledge of the patient or the family, the patient has easier access to the physician, most of the complaints seen are self-limited, and the diagnostic process may depend on repeated observations and sequential decision making over time. These factors influence the type of record keeping in primary health care.

To avoid the pitfall of measuring completeness of documentation rather than the quality of patient care, the abstractors operated on the assumption that the practice records must contain sufficient evidence to draw a reasonable conclusion that a particular intervention had actually been done. Comparisons of the abstractors' scores with the physicians' scores, and observer variation checks, allowed us to conclude that appropriately trained nurse abstractors could judge the sufficiency of evidence with a high degree of consistency.

Direct observation as a means of obtaining data was rejected not only because of its cost and the risk of altering the performance of the person observed, but particularly because the period of observation would have been too short. The assessment of depression or hypertension, for example, must take place over multiple visits for several months, and it is impractical for a trained medical observer to be present for that length of time or at every visit.

The Meaning of Absolute Scores: Because assessments have been completed in only six practices (including the pretest practices), we cannot draw firm conclusions about the meaning of the absolute scores reported. The scores obtained, however, have permitted us to compare one type of practice with another at a given time and to compare one mode of care with another within a practice. We are also able to assess changes over time. Undoubtedly, the scores for adequacy would have been higher if documentation had been more complete. A score of 75% adequate means that 25% of episodes are questionable or indeterminate, not necessarily inadequate: required interventions may have been done and simply not recorded. Further assessments now under way in additional practices will provide information about the meaning of absolute scores.

VALIDITY AND RELIABILITY

"Validity refers to the extent to which a situation as observed reflects the 'true' situation, or the situation as evaluated by other criteria that are thought to reflect the true situation more accurately" (19). This definition, given by MacMahon and Pugh, shows how difficult it is to reach conclusions about the validity of a new measure of quality of care. The "true situation" is seldom known and virtually never quantified. Furthermore, "other criteria [that] reflect the true situation more accurately" may not exist for primary care. Nevertheless, the consistency of the pattern of results obtained in a given practice with the methods reported, and their agreement with other results obtained independently (3), suggest that the method is sound and reliable. First, these results are in close agreement with the outcome measurements of mortality and of physical, social, and emotional function made on the same study subjects in the Burlington trial (3). Second, although the physicians' scores vary across indicator conditions and drugs, the scores from the three different approaches used (assessing indicator conditions, the use of drugs, and the opinion of consultants) are internally consistent for each of the three practices studied. Third, inter-observer agreement on scoring is high. Finally, when the results reported here are combined with results obtained, but not yet reported, in further studies of additional practices, the scores produce a rank order of performance identical to that achieved by a panel of experienced primary care physicians practising in the same area.

FEASIBILITY

Time: It was possible to complete the methodological phase, including the development of criteria and the design of data-gathering instruments, and to conclude the pretest in the three peer advisors' practices in 7 months. Three months were required to complete data gathering in the study practices for the 1-year experimental period, to develop the computer programme, and to do the scoring. With increasing experience, the time requirements are diminishing.

Cost: The use of specially designed clinical records and an increase in the number of indicator conditions per practice were considered, but the additional cost would have limited the replicability of the method in unsubsidized settings. Studies being completed on 15 additional physicians during 1973-1974 have shown that, once the developmental work is done, the cost is approximately $1500 per physician for a survey such as the one reported here. This figure does not include the time of university faculty investigators. Daysheets replaced the normal office procedure for government insurance purposes and did not add appreciably to the cost, and carbonized prescription forms added negligibly to it. In other practices assessed later, routinely prepared data-processing cards used for billing in the Canadian universal insurance plan were used with comparable success. Third-party insurance forms commonly used in the United States and the E-book clinical indexing cards used extensively in Great Britain might serve the purpose equally well. Our present capability of scoring episodes by computer reduces costs further and accelerates the process.

IMPLICATIONS FOR CONTINUING EDUCATION

Primary care practitioners cannot be expected to be uniformly "good" or "bad." Differences in scores among indicator conditions and drugs are the norm in primary care (this variation was also observed in subsequent assessments using 10 additional indicator conditions). Selective identification of the areas in which performance needs improvement has major implications and potential value in planning continuing medical education programmes and evaluating their results.

The method presented for assessing the quality of care in primary care practices seems sensitive, credible, practical, and economical. This method, which is currently being tested in additional rural, urban, and experimental practices in Ontario and Newfoundland, may be useful to other researchers wishing to measure quality of care, either as part of an experiment in health care delivery or for ongoing monitoring of existing practices when cost is a limiting factor.

ACKNOWLEDGMENTS: The authors thank Drs. W. Ian Hay, G. Patrick Sweeny, H. Sigurd Nielsen, and E. Vivienne MacKrell for their collaboration; Professor M. Gent of McMaster University and Dr. H. Smits of the University of Pennsylvania for comments and suggestions in the preparation of the manuscript; and Miss H. Otrosina for coordination and administrative assistance.

Grant support: in part by National Health Grant 606-21-66 of Health and Welfare Canada.

Received 27 November 1974; revision accepted 7 March 1975.

• Requests for reprints should be addressed to J. C. Sibley, M.D., Room 2E18, Health Sciences Centre, McMaster University, Hamilton, Ontario L8S 4J9, Canada.

References

1. KESSNER DM, KALK CE, SINGER J: Assessing health quality: the case for tracers. N Engl J Med 288:189-194, 1973
2. SPITZER WO, SACKETT DL, SIBLEY JC, et al: The Burlington randomized trial of the nurse practitioner. N Engl J Med 290:251-256, 1974
3. SACKETT DL, SPITZER WO, GENT M, et al: The Burlington randomized trial of the nurse practitioner: health outcomes of patients. Ann Intern Med 80:137-142, 1974
4. WEINERMAN ER: The quality of medical care. Ann Am Acad Polit Soc Sci 273:185-191, 1951
5. DONABEDIAN A: Medical care appraisal, in Guide to Medical Care Administration, Vol. 2. New York, American Public Health Association, 1969
6. DONABEDIAN A: Promoting quality through evaluating the process of patient care. Med Care 6:181-202, 1968
7. STARFIELD B: Health services research: a working model. N Engl J Med 289:132-136, 1973
8. BROOK RH, APPEL FA: Quality of care assessment: choosing a method for peer review. N Engl J Med 288:1323-1329, 1973
9. SLEE VN: Information systems and measurement tools. J Am Med Assoc 196:1063-1065, 1966
10. ROBERTSON CM: The hospital medical records system. Med Care 8:93-97, 1970
11. SHINDELL S, LONDON M: A method of hospital utilization review. Pittsburgh, University of Pittsburgh Press, 1966, p. 64
12. DENSEN PM: The quality of medical care. Yale J Biol Med 37:523-536, 1965
13. FITZPATRICK TB, RIEDEL DC, PAYNE BC: Character and effectiveness of hospital use (project 2), in Hospital and Medical Economics, edited by MCNERNEY W. Chicago, Hospital Research and Educational Trust, 1962, pp. 361-592
14. RIEDEL DC, FITZPATRICK TB: Patterns of patient care, a study of hospital use in six diagnoses, in Bureau of Hospital Administration Research Series, No. 4. Ann Arbor, University of Michigan, 1964
15. WILLIAMSON JW: Evaluating quality of patient care: a strategy relating outcome and process assessment. JAMA 218:564-5[...], 1971
16. PETERSON OL, ANDREWS LP, SPAIN RS, et al: An analytical study of North Carolina general practice. J Med Educ 31:1-1[...], 1956
17. CLUTE KF: The General Practitioner. Toronto, University of Toronto Press, 1963
18. JUNGFER CC, LAST JM: Clinical performance in Australian general practice. Med Care 2:71-83, 1964
19. MACMAHON B, PUGH TF: Epidemiology: Principles and Methods. Boston, Little, Brown and Co., 1970, p. 261

July 1975 • Annals of Internal Medicine • Volume 83 • Number 1
