International Journal of Technology Assessment in Health Care, 8:3 (1992), 419-443. Copyright © 1992 Cambridge University Press. Printed in the U.S.A.

PROBLEMS IN ASSESSING THE TECHNOLOGY OF CRITICAL CARE MEDICINE

William J. Sibbald
Kevin J. Inman
Victoria Hospital, London, Ontario

Abstract

Technology assessment is becoming increasingly important in the area of critical care, due both to the explosion of technology associated with this discipline and to the realization that future demand for these health care resources will undoubtedly exceed the ability to pay. Technology assessment nevertheless remains both confusing and controversial to many physicians. This review addresses some of the confusion by reviewing the basic strategies involved in this process. From there, problems and prospects for the evaluation of critical care as a program are presented, followed by the assessment of components within the area of critical care. Finally, recommendations are made on how technology assessment could proceed in the future to best achieve the efficient provision of this service.

The escalating imbalance between supply and demand in health care resources during the last decade has promoted an increased awareness of the need to assess new, as well as existing, health care technologies. Although various strategies have been proposed to measure the impact of technology on health care, the process of technology assessment remains both confusing and controversial. Concomitant with this increasing awareness of the need for improved technology assessment in health care has been the expansion of hospital-based critical care services. It seems paradoxical that critical care, a relatively new discipline that emphasizes technology in achieving its mission, could diffuse into so many markets without greater attention to establishing a lasting process for evaluating its mandate and outcomes.

Before commenting on the need for, and types of, assessments performed in critical care, this commentary will begin by defining the various terms that are commonly used when discussing technology assessment. We apologize to the informed reader but consider this a necessary aid to those critical care professionals who are just now becoming familiar with this area. A brief history of the development of the critical care unit will then introduce a discussion of technology assessment in critical care.

Technology has been defined in various manners. Battista (4) stated that health-related technology "encompasses all of the instruments, equipment, drugs and procedures used in health care delivery, as well as the organizations supporting delivery of such care." Technology assessment can be succinctly defined as the process of designing and conducting studies to evaluate and render judgment on the technology being assessed. Therefore, the goal of technology assessment is to


"discover for each technology, the sliding scale of value in terms of benefits and burdens for various types of patients, so as to define the limits of appropriate use" (38). Stated somewhat differently, technology assessments strive to establish the criteria of efficacious, effective, and efficient care. Given the disproportionate impact on resource use exerted by critical care medicine, it is an intuitive conclusion that the process of technology assessment will become an important tool in making those policy decisions that affect this discipline.

Efficacy can be defined as the power or capacity of a technology to produce the desired effect. In health care technology assessment, however, efficacy is described as the probability of benefit to individuals in a defined population from a medical technology applied for a given medical problem, under ideal conditions (48). Thus, efficacy addresses the question of whether the technology can achieve its stated goal under optimal circumstances. It is also at this stage that such issues as safety and the potential risks associated with the technology are examined.

In contrast to efficacy, effectiveness describes the probability of benefit to individuals in a defined population from a medical technology applied for a given medical problem under average conditions of use (48). When new technologies diffuse to average conditions of use, benefits realized under stringently controlled conditions may no longer apply. At this stage of the technology's assessment, the goal is to identify whether a new technology does more harm than good when applied under usual circumstances. Consider the following example as an illustration. Three large, multicenter, randomized controlled trials conducted in a well-defined group of patients in an intensive care unit (ICU) identify that a new treatment is associated with a decreased incidence of nosocomial pneumonia.
It is also found that the new treatment is associated with a decreased length of ICU stay, with no increase in adverse occurrences. On the basis of these results, the treatment is declared to be efficacious. Now, suppose that other hospitals adopt the technology on the basis of these results. Only if the technology remains associated with the same benefit as in the original evaluation trials, when such factors as differing case-mix groups, patient selection, and staff compliance are no longer being monitored, can the technology be deemed effective. Few evaluations that have concluded that a technology is efficacious based on results in tertiary care centers have gone on to ask questions regarding its effectiveness following the technology's diffusion to secondary and primary care health centers.

While some medical technologies may indeed be shown to be both efficacious and effective, this result is no guarantee that they will be employed efficiently. Efficiency can be defined as the relationship between inputs and outputs, or between costs and consequences (3). Thus, efficiency considers not only the effectiveness of the technology, but also the resources required to provide it. Various economic analyses have been proposed to examine this issue, including the cost-effectiveness evaluation, the cost-benefit analysis, and the cost-utility analysis. While all three methods entail calculating the costs associated with providing a "new" versus a "standard" strategy, each is associated with a different outcome measure. In the case of the cost-effectiveness analysis, the output or consequence of providing each treatment is measured in units of health, in this case the number of pneumonias prevented or the length of stay in the ICU. Thus, the costs associated with achieving the same unit of outcome, in this case the time spent in the ICU, are compared.
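As an illustration of the arithmetic behind such a comparison, the calculation can be sketched in a few lines of code. The figures and the helper function below are hypothetical, invented for this example; they are not drawn from any study cited here.

```python
# Hypothetical cost-effectiveness calculation: incremental cost per
# pneumonia prevented for a "new" versus a "standard" ICU strategy.

def cost_per_case_prevented(cost_new, cost_std, rate_new, rate_std, n_patients):
    """Incremental cost divided by the number of pneumonias prevented."""
    extra_cost = (cost_new - cost_std) * n_patients
    cases_prevented = (rate_std - rate_new) * n_patients
    return extra_cost / cases_prevented

# Suppose the new strategy costs $300 more per patient and lowers the
# pneumonia rate from 20% to 12% in a cohort of 500 ICU patients.
print(round(cost_per_case_prevented(1300, 1000, 0.12, 0.20, 500)))  # → 3750
```

The same ratio could be computed per ICU day saved; the unit of outcome changes, but the structure of the comparison does not.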
Although there is apparent confusion over the use of the term "cost-effective" in clinical medicine, it should be reserved for those technologies which provide a saving in cost along with an equal or better health outcome. According to recently published criteria (21), one strategy would be considered more cost-effective than another if it is:


1. less costly and at least as effective;
2. more effective and more costly, with the additional benefit worth its additional cost; or
3. less effective and less costly, with the added benefit of the rival strategy not being worth its extra cost.
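These criteria can be read as a small decision procedure. The sketch below is our paraphrase of the published criteria (21), not part of them: the willingness-to-pay threshold used to stand in for "worth its additional cost" is an assumed parameter, and the function name and figures are invented.

```python
# A paraphrase of the three cost-effectiveness criteria as a decision rule.
# "Worth its additional cost" is modelled with an assumed willingness-to-pay
# threshold (dollars per unit of health gained).

def more_cost_effective(cost_a, effect_a, cost_b, effect_b, threshold):
    """True if strategy A is more cost-effective than strategy B."""
    d_cost = cost_a - cost_b
    d_effect = effect_a - effect_b
    # 1. Less costly and at least as effective.
    if d_cost < 0 and d_effect >= 0:
        return True
    # 2. More effective and more costly, with the additional benefit
    #    worth its additional cost (incremental ratio below threshold).
    if d_cost > 0 and d_effect > 0 and d_cost / d_effect <= threshold:
        return True
    # 3. Less effective and less costly, with the rival's added benefit
    #    not worth its extra cost (rival's incremental ratio above threshold).
    if d_cost < 0 and d_effect < 0 and (-d_cost) / (-d_effect) > threshold:
        return True
    return False

# Hypothetical: the new strategy costs $1,200 more per patient and prevents
# 0.10 additional pneumonias, against a threshold of $15,000 per case.
print(more_cost_effective(5200, 0.90, 4000, 0.80, 15000))  # → True
```

Note that criteria 2 and 3 cannot be applied without some such value judgment; the threshold makes that judgment explicit rather than hiding it.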

In contrast to a cost-effectiveness analysis, outcome is converted to dollars in a cost-benefit analysis. Using the previous example, a monetary cost is calculated for each case of pneumonia or each day of hospitalization in the ICU. Finally, in the cost-utility approach, the relative social value associated with the gains made by the technologies under examination is used as the outcome measure. Table 1 summarizes these concepts.

Distinct from technology assessment, but often confused with it, are the areas of quality assessment, quality assurance, and quality monitoring. Unlike technology assessments, these activities do not directly evaluate the technologies themselves. For example, quality assessments examine the degree to which a given medical technology is used appropriately, judiciously, and skillfully (20). Issues regarding the efficacy and effectiveness of the technology are not addressed:

   Quality assessment makes no contribution to clinical medicine. It simply determines whether what clinical medicine has established to be the most appropriate management for any condition has been selected and implemented skillfully. If, in the course of doing this, it occasionally notes aberrations that cast doubt on the dicta of clinical medicine, it does so only as a byproduct of pursuing its primary purpose (20).

Table 1. Types of Technology Assessments and Their Attributes

1. Efficacy
   Goal: tries to establish whether the technology can achieve its stated goal under optimal conditions.
   Outcome measure: improved patient care/outcome.
2. Effectiveness
   Goal: tries to establish if the technology does more harm than good under average conditions of use.
   Outcome measure: improved patient care/outcome.
3. Efficiency
   Goal: examines the relationship between the costs and consequences of providing the technology.
   Outcome measure: ratio of inputs (costs) to outcomes, as below.
3a. Cost-effectiveness
   Goal: compares the costs associated with achieving a given level of health outcome.
   Outcome measure: health events (e.g., length of ICU stay).
3b. Cost-benefit
   Goal: same as a cost-effectiveness analysis, with the exception that a monetary value is attributed to the health outcome.
   Outcome measure: monetary value attributed to the health event.
3c. Cost-utility
   Goal: same as above, with the outcome being weighted as to its societal value.
   Outcome measure: societal value associated with the health event.

Simply put, quality assurance is all that a hospital does to ensure and enhance the quality of care that it provides. It would include, for example, administrative standards, professional standards, admission criteria, as well as measures of mortality, morbidity, and quality of life. Monitoring these quality-assurance measures is accomplished through such procedures as accreditation, staff evaluations, statistical analyses, audits, and surveys of patient satisfaction. Given the high cost of providing critical care, it is obvious that the critical care physician must become more familiar with these procedures.

WHAT IS CRITICAL CARE?

As is demonstrated in the writings of Florence Nightingale, ICUs had their origin in the postoperative recovery room:

   It is not uncommon, in small country hospitals, to have a recess or small room leading from the operating theatre in which the patients remain until they have recovered, or at least, recover from the immediate effects of the operation (56, 89).

This observation that the severity of some patients' illnesses required a greater commitment of both human and physical resources to facilitate their recovery introduced a new process of patient care to the hospital. The first ICU was developed in 1923, when a three-bed facility for postoperative neurosurgical patients was opened at Johns Hopkins Hospital in Baltimore, MD, by Dr. W. E. Dandy (32). The growth of intensive care was further fostered by organizational and treatment innovations which accompanied the U.S.'s participation in the Second World War, the Korean War, and the Vietnam War. During these periods,

significant advances were realized in the process of intravenous fluid therapy for shock, blood transfusion, anesthesia, and postoperative recovery, as well as in the rapid evacuation of patients to facilities that were best equipped to meet their needs within a tiered system of care. The introduction of mechanical ventilation was the next major development contributing to the rapid growth of ICUs in the United States. The sense that close monitoring by skilled health care workers had the potential to improve the outcome of patients with myocardial infarction (MI) assured the simultaneous development of coronary care units. From these lessons, critical care subsequently evolved as a unique hospital-based service which was held to improve the outcomes of patients with life-threatening illnesses through the commitment of greater human and technological resources than were available on a traditional hospital ward.

Throughout the 1960s and 1970s, the growth of ICUs was remarkable. The increased commitment to this type of care was significantly fostered by reimbursement practices which provided greater profit to hospitals than had existed when patients were admitted to standard ward beds. The introduction of a prospective payment process in the United States, and the increasing use of global budgeting in Canada, the U.K., and other countries where government primarily funded health care, led to greater analysis of critical care services in terms of their output and accountability. In 1983, a U.S. National Institutes of Health consensus conference defined the mission of critical care (57):

   Critical Care is a multidisciplinary field concerned with patients who have sustained, or are at risk of sustaining, acutely life-threatening, single or multiple organ system failures due to disease or injury. Critical Care seeks to provide for the needs of these patients through immediate and continuous observation and intervention so as to restore health and prevent complications.

In striving to achieve this goal, the provision of critical care services has become resource intensive, replete with technological innovations both expensive and inexpensive. Many of these innovations diffused into the critical care setting without having undergone any serious evaluation of their effectiveness or efficiency. "Thus, it is time for a rigorous effort to establish what procedures produce beneficial outcomes under what conditions—and to eliminate stark instances of 'over-utilization.' Physicians and hospital administrators should put establishing quality standards at the top of their agendas" (9). In other words, it is time to examine the efficacy, effectiveness, and efficiency of critical care services. The question then becomes: How should we proceed?

Conceivably, the process of critical care can be analyzed at two levels: a) the program itself, i.e., the hospital service; and b) the components of the program (i.e., the labor, equipment, and management structure) which, when combined, collectively define the program. At the program level, the primary focus is on the overall performance of this service in providing the care required by critically ill patients. At the component level, the emphasis must be on the assessment of the specific technologies that are used in achieving the program's mission. As has been mentioned, it should be a fundamental objective at both the program and component levels to define the parameters of efficacious, effective, and efficient care.

PROGRAM ASSESSMENT IN CRITICAL CARE

A stringent evaluation of the efficacy or effectiveness of a critical care program, both between and within hospitals, would be an enormous, lengthy, and expensive undertaking given the myriad of drugs, equipment, procedures, and management techniques


that are currently involved in delivering this service. Compounding this problem is the diverse nature of the case-mix groupings among patients who are admitted to individual ICUs; they differ not only by diagnosis, but also in the severity of their illnesses. The use of current methods to evaluate critical care as a program is rendered virtually impossible by the ethical ramifications: as it is practically impossible to separate the intensity of care from the setting in which it is provided, strict evaluations would require the randomization of patients to the ICU versus a standard ward. In reviewing the process by which individual critical care programs have developed, and are now perceived by both the consumer and the health care community, Berenson concluded that "there is general agreement that such randomized studies would be unethical, as it is felt that for many problems, treatment in an ICU is necessary if a patient is to have a chance of survival" (5). Nonetheless, in the absence of well-controlled randomized studies, there remains uncertainty regarding the efficacy and effectiveness of critical care.

While such stringent assessments remain difficult to envisage, some outcome research has been conducted, usually employing historical controls or pre-ICU/post-ICU designs. If technological innovations in the care process exert a positive impact on the provision of critical care, one might expect to observe certain benefits in patient outcome as a result. A reduction in mortality and/or morbidity over time would be an indication of improved effectiveness in the delivery of this service. Improvements in the quality of life of discharged patients would be an additional indication of the benefit of a critical care program. A third patient outcome that could provide evidence of benefit is a decreasing length of stay for a given illness.

Early attempts to document the effectiveness of critical care focused on defining the interaction of this service with its outcome, defined by mortality rate. Berenson (5) has reviewed some of the early methods by which mortality outcomes were applied to evaluating the effectiveness of critical care (Table 2).

Table 2. Retrospective Outcome Studies of ICU Care (Adapted from [5])

Studies showing definite reduction in mortality for condition
   Petty (61): Respiratory failure treated with ventilators
   Rogers (66): Respiratory failure treated with ventilators
   Bates (3): Status asthmaticus and emphysema
   Drake (23): Nonhemorrhagic stroke
   Skidmore (75): Postoperative trauma patients
   Feller (26): Severe burns

Studies showing no reduction in mortality for condition
   Pitner (63): Strokes
   Piper (62): Drug overdose
   Jennett (37): Head injuries with coma
   Casali (10): Postoperative acute renal failure
   Griner (29): Pulmonary edema
   Hook (34): Pneumococcal bacteremia

The conflicting results that were apparent in these studies were probably due to the analytical method that was employed, specifically the use of retrospectively collected data. The latter process is subject to certain biases, including the differential treatment of patient groups over time, as well as time-related differences that develop in the demographic and prognostic variables between patient groups. The heterogeneity of their patient populations, with ICU mortality rates ranging from 6% to 43% (Table 3), further hampered initial attempts at research into patient outcomes in the ICU setting.

Table 3. Heterogeneity of the Mortality Rates within Intensive Care Units (Adapted from [5])

Study                Type of ICU   Percent ICU    Percent hospital mortality
                                   mortality      for ICU patients
Rogers (66)          R             18.0           26.0
Pessi (60)           S             20.1           28.9
Spagnolo (76)        M             28.0           47.0
Turnbull (82)        M-S           22.3           38.6
Tomlin (81)          M-S           13.5           19.7
Vanholder (83)       M             32.6           42.6
Chassin (12)         M,R                          14.0
Fedulo (25)          M             21.0           29.0
Parno (59)           M-S           11.7           17.3
Thibault (80)        M-C           6.0            10.0
LeGall (49)          M-S           16.7           34.0
Murata (54)          M                            26.8
Knaus et al. (42)    M-S                          16.9

M = Medical ICU; M-S = Medical-surgical ICU; M-C = Medical-cardiac ICU; R = Respiratory ICU; S = Surgical ICU.

Thus, the basic epidemiological tenets of describing processes according to "person, place, and time" were not adequately addressed in critical care.

Data assembled for other administrative purposes have recently been used to compare hospitals and their major care areas. Using hospital discharge coding, it has been demonstrated that hospitals with high surgical volumes had lower death rates in their ICUs than did hospitals with low surgical volumes (20, 50, 51). Authors of such analyses appropriately note that inferences regarding causation are impossible in such databases, due to their primary source. That is, without knowledge of salient patient characteristics, it is impossible to separate the contributions of "person" from "place." The release of hospital-specific death rates by the U.S. Health Care Financing Administration (HCFA) also identified significant dissimilarities in mortality rates among hospitals. While it is possible that hospitals with higher death rates are those that are the least effective, and are thus inefficient, an alternative explanation would be that these are the hospitals that admit the most severely ill patients.

Thus, in order to make valid comparisons between units or hospitals, a mechanism to introduce quantifiable measures of illness severity, and thus to make comparable the characteristic of "person," was needed in these evaluation processes. Subsequently, scales for the classification of MI patients (58), patients with brain injury (39), and burn patients (26) were introduced to permit interhospital comparisons. Although a step in the right direction, comparisons among units remained imprecise because of the inability to control for differences in the case-mix groups among different units.
To improve the validity of comparisons both within and between critical care units of different sizes and patient compositions, a quantification of illness severity (independent of admission diagnosis) was required. The Therapeutic Intervention Scoring System (TISS), developed in 1974 and updated in 1983 (40), was introduced for this purpose. TISS assigns scores for each intervention that a patient receives; the scores range from 1 to 4, based on the intensity and severity of the intervention, with scores of 4 being the most intense therapies (e.g., controlled ventilation with or



without positive end-expiratory pressure [PEEP]), and points are summed over a 24-hour period to achieve a daily score. Patients are then grouped in classes I to IV, with patients in class IV representing the most severely ill.

TISS has been employed for a variety of purposes since its introduction. A critical care program cannot proceed at peak efficiency if financial resources must continually be directed at seeking and training new staff members. Therefore, an early application included the review of the ratios of nurses to patients, wherein a "patient points per nurse" index was derived in individual critical care units; where this measure was elevated, high stress levels and high turnover rates were demonstrated (14). The TISS score has also been used as a stratifying measure for cost analyses, and it has been concluded that it is a good predictor of total critical care costs (15). A third use of the TISS methodology has been in the assessment of current and future utilization of critical care beds (70). The introduction of TISS has thus provided a mechanism to evaluate a number of management issues within, as well as among, different critical care units.

For a number of reasons, however, the TISS program was hampered in its goal of quantifying the severity of illness. TISS was found to be a poor predictor of mortality in both critically ill adult (8) and pediatric (67) patient groups. Comparisons of either effectiveness or efficiency among individual critical care units were not possible using the TISS method, since it did not provide a sensitive predictor of mortality and/or other adverse occurrences. This issue is evident in the following example. Suppose we wanted to compare the critical care programs in two hospitals, A and B, each of which has identically equipped facilities and receives patients whose illnesses are of equal severity.
In addition, let hospital A have a standing order that requires that all patients have three cardiac output measurements daily, while hospital B's standing order requires only one. Measurement of cardiac output is a diagnostic intervention scored by TISS. Any comparison between hospitals A and B would indicate that hospital A's patients were more severely ill than hospital B's, since the TISS score would be higher in the former than the latter. Given the higher TISS score, we might therefore expect hospital A to demonstrate a higher mortality rate than hospital B. If this were not borne out, and hospital A actually had an equal or slightly lower mortality rate, the temptation would be to infer that the critical care program in hospital B was not as effective as it should be. Thus, the difficulty with using the TISS program for comparing outcome within or among critical care units is that the individual components defining the final TISS score are not solely patient-specific, but vary considerably according to physician interaction.

The Acute Physiology and Chronic Health Evaluation (APACHE) (46) scoring system was subsequently introduced to deal with some of the deficiencies inherent in TISS, specifically to permit valid comparisons of throughput and outcome in critical care units of different composition, size, and geography. Based on the premise that any severity-of-illness index should reflect the probability of mortality, Knaus et al. developed a two-part index, with the first part based on 33 routinely collected physiological parameters purporting to measure the acuity of illness. The second part of this index included the patient's age and an estimate of the patient's chronic health status, as these factors are known to influence the patient's physiological reserve. Similar to the TISS methodology, scores from 0 to 4 are assigned for each physiological measurement, with scores of 4 indicating the most severe physiological derangement.
Four chronic health categories were designed, A through D, with the patient being assigned to a category based on a review of the medical history. As with the physiology score, higher designations reflect greater impairment: patients in category D are the most chronically debilitated. APACHE scores



are calculated within the first 24 hours of admission, and include the most deranged physiological measurements occurring in that period.

From its inception, the APACHE scoring system has been subjected to rigorous validation and testing. This process has resulted in the development of the APACHE II program (44), a simplified version of the original, and a soon-to-be-reported APACHE III program (85). Preliminary uses of the original APACHE system proved it to be a good indicator of mortality in a variety of critical care settings in the United States (Figure 1) (43).

Figure 1. Acute Physiology and Chronic Health Evaluation (APACHE) II scores and mortality rates in 5,030 consecutive patients admitted to intensive care units at 13 hospitals. Actual (solid) and predicted (open) bars are indicated; r = 0.995 (Adapted from [43]).

Although there remain some methodological concerns, it is generally agreed that the APACHE II system quantifies differences in illness severity with reasonable sensitivity. Indeed, the existing benchmark for the evaluation of patients' outcome in critical care was reported by Knaus in 1984 using the APACHE II system (43). From a database generated in 13 large hospitals, observing some 5,030 patient outcomes, a number of pertinent issues relating to critical care were reported. Importantly, mortality was not uniform across all hospitals, even after controlling for severity of illness. One hospital experienced significantly fewer deaths than expected, while a second experienced significantly more deaths than expected. This observation remained consistent when mortality patterns were examined using only nonoperative patients. The study also demonstrated that hospitals with a high degree of coordination and positive interaction among the staff were those with the most favorable mortality experiences. Using TISS to assess the number and type of therapeutic interventions, the authors found that one hospital averaged 40% more points per patient than the other hospitals.
Surprisingly, this difference was not attributable to "high-tech" invasive monitoring techniques, nor to active treatment, but rather came from "frequent laboratory testing, dressing changes, and chest physiotherapy" (43). The important results of this study




imply that not all critical care services function at the same level of effectiveness or efficiency, that the process by which care is provided may have a significant impact on outcome, and that outcome may not necessarily relate to the quantity of resources that are consumed in the patient's care.

Using much the same method as in the aforementioned study, Knaus and colleagues subsequently compared critical care programs in the United States with those of France (45) and New Zealand (86). When comparing French programs with those in the U.S., the authors found that "French ICU patients were significantly younger, many more were transferred from another hospital and more were admitted for treatment rather than close observation" (45). Interestingly, for patients with equally severe illnesses, those in France had a length of stay approximately twice as long as their American counterparts. French physicians considered these longer lengths of stay necessary, since ward care could not monitor the patients closely enough. As there were no significant differences in mortality between the care provided in this sample of ICUs of the two nations, questions arose regarding the efficiency of the French ICU system. If the same ICU resources consumed in the monitoring of these patients could be channelled into an intermediate-care facility, the efficiency of the program should improve, as fewer resources would be needed to achieve the same outcome. This international study also demonstrated that invasive monitoring procedures were not used elsewhere to the same extent that they are in the U.S. Like France, critical care units in New Zealand are more likely to treat young, previously healthy nonoperative patients than are U.S. programs (86). Finally, the APACHE II system proved to be a good predictor of mortality in all three countries.
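The case-mix-adjusted comparisons described above reduce to comparing observed deaths with the number expected from each patient's predicted risk, often expressed as a standardized mortality ratio. The sketch below uses invented predicted probabilities and an invented function name; it does not reproduce the published APACHE II prediction equation.

```python
# Standardized mortality ratio (SMR): observed deaths divided by the
# expected number of deaths, where the expectation is the sum of the
# per-patient predicted probabilities of hospital death.

def standardized_mortality_ratio(predicted_risks, observed_deaths):
    expected_deaths = sum(predicted_risks)
    return observed_deaths / expected_deaths

# Hypothetical six-patient unit with severity-based predicted risks.
risks = [0.10, 0.25, 0.40, 0.05, 0.20, 0.50]  # expected deaths = 1.5
print(round(standardized_mortality_ratio(risks, observed_deaths=3), 2))  # → 2.0
```

An SMR near 1.0 suggests mortality in line with case mix; values well above or below 1.0 flag units, like the two outlier hospitals in the Knaus study, whose outcomes cannot be explained by severity of illness alone.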
Another use of APACHE II is to control for varying patient case mix and severity of illness when undertaking quality assurance, utilization review, or audits of critical care units. A recent study by Brown and Sullivan (7) examined mortality rates before and after an intensivist joined a critical care staff. Using mortality data collected for two consecutive years, one year without and one year with the specialist, the authors reported that both ICU mortality and hospital mortality were significantly reduced when the intensivist joined the critical care staff. In this study, the APACHE II system was used as a tool to examine the comparability of the two groups of patients who were sequentially evaluated in the two time periods. While the authors did not comment on issues such as effectiveness and efficiency, this study did confirm Knaus' (43) previous suggestion that both staffing and process of care may significantly affect mortality in critical care units.

Although the validity of applying the APACHE methodology to the quantification of illness severity in various analyses of critical care has been generally accepted, recent work has suggested that there remain some deficiencies in this system that must be recognized. In patients receiving total parenteral nutrition, Hopefl et al. (35) reported that neither the APACHE II scores calculated at admission, nor those calculated on the day that parenteral nutrition was started, were good indicators of mortality. Escarce and Kelley (24) and Dragsted et al. (22) have also suggested that some deficiencies remain in the validity of the APACHE II system, as they found that the source of a patient's admission was an important variable predicting hospital mortality, independent of the admission APACHE II score; this is likely an example of lead-time bias influencing the admission APACHE II score.
Our results indicate that the rate of hospital death predicted by the APACHE II classification system was the same as the actual hospital death rate among patients who were admitted to the MICU (medical ICU) directly from the emergency department, but was lower than the actual death rate among patients who were transferred to the MICU from regular hospital floors, the medical intermediate care unit, and other hospitals (24).
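The comparison above, between death rates predicted by APACHE II and those actually observed, amounts to computing a standardized mortality ratio. The sketch below illustrates the arithmetic only; the function name and the per-patient risks are invented for this example and are not drawn from the APACHE II equations.

```python
def standardized_mortality_ratio(predicted_risks, observed_deaths):
    """Observed deaths divided by the number of deaths expected from
    severity-derived risk estimates. A ratio near 1.0 means outcomes in
    line with the severity-adjusted prediction; above 1.0, worse than
    predicted (as reported for transferred patients); below 1.0, better."""
    expected_deaths = sum(predicted_risks)
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected_deaths

# Invented cohort: ten patients with hypothetical predicted death risks
# summing to 3.0 expected deaths; 3 deaths were actually observed.
risks = [0.10, 0.25, 0.40, 0.05, 0.60, 0.30, 0.15, 0.50, 0.20, 0.45]
smr = standardized_mortality_ratio(risks, observed_deaths=3)
```

On this invented cohort the ratio is 1.0; a unit whose transferred patients died more often than the score predicted would show a ratio above 1.0 for that subgroup.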

INTL. J. OF TECHNOLOGY ASSESSMENT IN HEALTH CARE 8:3, 1992

Problems in assessing critical care

In summary, the evaluation of a critical care program relies on measures of patient outcome to assess its effectiveness and efficiency, because traditional methods of assessing efficacy using, for example, randomized admission to critical care, are unacceptable. It is essentially a macro-evaluation process, as it would be virtually impossible to design, implement, and conduct an assessment of all the technologies currently employed in the provision of a critical care service. Due to the absence of an accepted classification scheme for stratifying patients admitted to critical care units, early attempts at assessing the effectiveness of these programs using patient outcome data were severely hampered. Only with the advent of measures that attempted to quantify the severity of a patient's illness could meaningful studies of patient outcome proceed. With the development of scoring systems, such as TISS and APACHE II, it is now apparent that there are distinct differences in the practice and performance of individual critical care units, irrespective of the type of patient population. While conducting these studies is an admirable beginning, the time has come to use this information to make sound policy decisions concerning the appropriate provision of critical care services. To date, however, the preponderance of studies have been concerned with assessing components of the critical care service. It is to this issue that we now turn.

COMPONENT ASSESSMENT IN CRITICAL CARE

Although there has been increasing attention paid to evaluating critical care at the program level, the vast majority of studies evaluating critical care have occurred at the level of the components defining this hospital-based program. Although the last 2-3 decades have seen a "technological revolution in all components of the health care sector, nowhere has this period had such an impact as in the hospital's critical care program" (73). It has been estimated that the cost of care for a critically ill patient may be 200 to 400% in excess of that required for providing care at the general ward level (5). There are those who would argue that a substantial portion of this increase is due to the inappropriate use of diagnostic and therapeutic technologies. It is with selected items in these two components of the critical care program that the remainder of this review will deal.

Assessing Diagnostic Technology in Critical Care

Jennett identified five circumstances under which the deployment of therapeutic or diagnostic technologies should be considered inappropriate. "Inappropriate deployment may be unnecessary, because the same end could be achieved by simpler means; unsuccessful, because the condition is beyond influence; unsafe, because the risks of complications outweigh the probable benefit; unkind, because the quality of life afterwards is unacceptable; and unwise, because resources are diverted from more useful activities" (36). Each of these issues is particularly germane in critical care, as it is here that the clinician must continuously make judgments about the relative merits of conflicting data, and apply them in the practice situation. It is obvious that both the effectiveness and efficiency of individual critical care units ultimately rest on these decisions. Many different types of technology are used in both the diagnosis and treatment of critically ill patients. It is easy to emphasize a large, expensive item, such as cardiac scintigraphy, nuclear magnetic resonance (NMR) imaging, or computed tomography (CT), when discussing the impact of technology on critical care. While these are undoubtedly an important aspect of costs generated in this discipline, less expensive, "low-tech" procedures likely have greater impact on the overall rate of technology use. It


Sibbald and Inman

Table 4. Possible Results of a Diagnostic Test(a)

                   Disease status as judged by gold standard
Test result        Present          Absent           Total
Positive           A                B                A + B
Negative           C                D                C + D
Total              A + C            B + D            N

(a) Stable properties: A/(A + C) = Sensitivity; D/(B + D) = Specificity. Prevalence ((A + C)/N)-dependent properties: A/(A + B) = Positive predictive value; D/(C + D) = Negative predictive value; (A + D)/N = Accuracy.

has even been suggested that the bulk of spending on technology in critical care is likely accounted for not by visible big-ticket items, but by low-cost, high-volume technologies and the many small but frequently performed procedures (16). In order to render judgment concerning a diagnostic technology's benefit to patients in critical care, measures of its efficacy, effectiveness, and cost are needed. The assessment of the efficacy of any diagnostic test includes a measure of its sensitivity and specificity, estimates of which can be derived from the information contained in Table 4. Sensitivity can be described as the number of diseased individuals with a positive test divided by the total number of diseased individuals (a/(a+c)). Thus, sensitivity answers the question: "If the patient has the disease, how likely is he/she to have a positive test?" (17). Specificity, on the other hand, is the number of individuals having a negative test without the disease divided by the total number of nondiseased individuals (d/(b+d)). Specificity then answers the question: "If the patient does not have the disease, how likely is he to have a negative test?" (17). While a high sensitivity and specificity are desirable for any diagnostic test, what is of greater concern to the clinician are the test's powers of positive and negative prediction, since the patient's true disease status is usually not known at the time that the diagnostic test is performed. Positive prediction can be defined as the probability of disease in an individual, given that a positive test result has been obtained (a/(a+b)). Therefore, this parameter gives the physician an idea of how good the test is at "ruling in" a disease. Conversely, negative prediction is the probability that an individual is not diseased, given that a negative test result has been obtained (d/(c+d)). In this case, the physician is supplied with information on the test's ability to "rule out" the disease in question.
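These definitions translate directly into arithmetic on the cells of Table 4. The sketch below is a hypothetical helper with invented cell counts; it computes all of the indices, and also illustrates how, with sensitivity and specificity both held at 90%, the positive predictive value collapses as the prevalence of disease falls.

```python
def diagnostic_indices(a, b, c, d):
    """Indices from the 2x2 layout of Table 4:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_prediction": a / (a + b),
        "negative_prediction": d / (c + d),
        "accuracy": (a + d) / n,
        "prevalence": (a + c) / n,
    }

# Invented counts: both populations give sensitivity and specificity of
# 0.90, but prevalence is 50% in the first and only 5% in the second.
high_prev = diagnostic_indices(a=90, b=10, c=10, d=90)
low_prev = diagnostic_indices(a=45, b=95, c=5, d=855)
```

In the high-prevalence population the positive predictive value is 0.90, whereas in the low-prevalence population it falls below 0.33 despite identical sensitivity and specificity.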
Intuitively, these predictive values are a function not only of the sensitivity and specificity of the test, but also of the prevalence of the condition in the population under study ((a+c)/N). It is in the evaluation of these four indices that a judgment can be rendered regarding the efficacy of any diagnostic test. Although application of these four indices would appear to be quite straightforward, certain biases and problems which have been described in this process seem to be the rule rather than the exception when it is applied to critical care. Ransohoff and Feinstein (65) identified three components of the diseased and comparative populations which could lead to erroneous conclusions regarding the efficacy of a test: the pathological component, the clinical component, and the comorbid component. In the diseased group, variations in the extent, location, and cell type (pathological component), or in the chronicity and severity (clinical component), or in the amount and type of coexisting ailments (comorbid component), may all affect the reported sensitivity of the test. These are particularly salient issues when evaluating diagnostic




technology in critical care in each of these components, since this patient population is typically extremely heterogeneous. In addition, equal attention must be paid to the impact of these three components in the nondiseased group when evaluating the specificity of any test. If a suitable pathological, clinical, and comorbid spectrum of patients is not used, the resulting measure of a test's specificity will be exaggerated. Thus, if the majority of the diagnostic tests that are employed in the work-up of critically ill patients were developed using remarkably dissimilar patient populations, quoted measures of sensitivity and specificity may not be even remotely valid for critically ill patients. Consider the following three diagnostic tests which are frequently employed on patients admitted to critical care: the CT scan, the labelled white blood cell (WBC) scan, and NMR imaging. While these tests may perform well in their respective populations, this fact does not guarantee that they will demonstrate the same degree of efficacy in the critical care setting. If they are not as efficacious when applied in the critical care population, it follows that both the effectiveness and efficiency of the unit could suffer. If they are not as effective, with both the sensitivity and specificity associated with the test being affected, optimal patient care could be hampered. Efficiency could also be reduced as a result of using an inappropriate test, since the amount of resources, and thus costs, being used for those patients for whom the test is inappropriate is greater than necessary. It is crucial that future evaluations of diagnostic technologies employed in critical care be conducted in a critical care population. In addition, evaluating a diagnostic technology is subject to certain inherent biases, even when the diseased and comparison populations have been adequately selected.
Work-up bias is possible when the course of a patient's treatment is affected by the test result. This problem is very real in critical care, and can be illustrated using the WBC scan as an example. The WBC scan is used as an aid to localize an inflammatory focus; when it is positive, another diagnostic test is usually required; laparotomy is one such confirmatory test. However, patients exhibiting a negative WBC scan usually do not undergo laparotomy. As a result, their inclusion in the assessment will result in a work-up bias. Three other forms of bias may affect the assessment of the efficacy of a diagnostic test in critical care. Diagnostic review bias is possible when the test result affects the subjective review needed to establish the true diagnosis. This type of bias can occur in any of those diagnostic tests where the physician who is establishing the true diagnosis is aware of the result of a diagnostic test. Examples of this type of bias can be found when the true diagnosis is arrived at by the subjective examination of venography, angiography, or radiographs. To deal with this concern, examination of the test result and the establishment of the diagnosis should be carried out independently. Test-review bias occurs when the diagnostic test is run after the true diagnosis has been established. Like diagnostic review bias, this problem can be alleviated by ensuring that test results are reviewed independently from the diagnosis. Incorporation bias, as the name implies, occurs when the diagnostic test result is actually used as part of the criteria for establishing the diagnosis. Implicit in the discussion of these problems and biases is the assumption that, for each diagnostic test being examined, there is a reproducible "gold standard" for establishing the presence or absence of disease.
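The distortion produced by work-up bias can be made concrete with a little arithmetic. The sketch below is purely hypothetical: it assumes a test whose true sensitivity is 80%, and supposes that every test-positive patient, but only 10% of test-negative patients, goes on to the gold-standard work-up (for the WBC scan, laparotomy). Computing sensitivity from the verified patients alone then inflates it.

```python
def apparent_sensitivity(true_sensitivity, verified_negative_fraction):
    """Sensitivity as it would be reported if only verified patients are
    analyzed. Per diseased patient: all true positives are verified, but
    only a fraction of the false negatives ever reach the gold standard,
    so the denominator is artificially shrunk."""
    verified_tp = true_sensitivity
    verified_fn = (1 - true_sensitivity) * verified_negative_fraction
    return verified_tp / (verified_tp + verified_fn)

biased = apparent_sensitivity(true_sensitivity=0.80,
                              verified_negative_fraction=0.10)
# biased is roughly 0.98, well above the true value of 0.80
```

Only when every test-negative patient is also verified (fraction = 1.0) does the apparent sensitivity equal the true sensitivity.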
Given the setting in which critical care medicine is practiced, along with the fact that most diagnostic tests which are employed were developed elsewhere, estimating the efficacy of a test in this environment becomes extremely difficult. Table 5 contains a list of questions which may be asked



Table 5. Questions Regarding the Validity of a Diagnostic Technology Assessment (Adapted from [72])

1. Is the diagnostic test compared against a well-defined gold standard?
2. Are positive and negative test results clearly defined?
3. Are the performance and interpretation of the test done in a blinded fashion?
4. Are the data displayed in tabular form?
5. Are the terms sensitivity and specificity used correctly, and if so, are the calculations done correctly?
6. Are the terms positive and negative prediction used correctly, and if so, are the calculations done correctly?
7. Has the test been used in a sufficiently diverse group of persons with the disease so as to establish its negative predictive value, and thus its potential efficacy in ruling out the disease?
8. Has the test been used in a sufficiently diverse group of persons without the disease so as to establish its positive predictive value, and thus its potential efficacy as a tool to rule in disease?
9. Is there any recognition of the influence of setting, prevalence, or pretest likelihoods on clinical utility?

when considering the merits of a diagnostic technology assessment which is conducted in the critical care setting.

Examples of Diagnostic Technology Assessment in Critical Care

It is, therefore, obvious that the inappropriate deployment of diagnostic tests will influence both the effectiveness and the efficiency of the critical care facility. While the indices that are used to assess the efficacy of a diagnostic test are usually reported, the distinct nature of the critical care population must also be kept in mind. Unless the diagnostic test has been rigorously evaluated using appropriate controls and in the critical care setting, caution should be exercised in the interpretation of the assessment's results. There are some germane examples of the problems that are inherent in technology assessments in critical care when appropriate controls are not employed. For example, pulse oximetry is a relatively new technology that is used in critical care to monitor arterial oxygen saturation continuously and noninvasively. While there have been reports comparing the accuracy of pulse oximeters between different manufacturers (52;71), as well as against oxygen saturation derived from arterial blood gas analysis (11;13), we have found surprisingly few studies which attempted to measure the efficacy of pulse oximetry using the additional methods previously described. We could find only one study in which the efficacy of pulse oximetry had been assessed in the adult critical care setting (55). Here the authors reported a sensitivity of 100% and a specificity of 82% for the Biox III pulse oximeter. Based upon these indices, it would appear that pulse oximetry should receive a positive evaluation in the adult critical care setting. However, with specific reference to the questions outlined in Table 5, critical analysis of this study identified several problems which should temper any conclusion about the efficacy of pulse oximetry in critical care.
While the authors used an appropriate gold standard (arterial blood gas measurements) to judge the test, as well as a clear definition of what would be considered a positive and negative test result, this clinical evaluation was based on a study of only 23 patients, wherein only 7 hypoxic events were recorded. Clearly, the sample size was too small and the adverse respiratory events too few to allow definitive answers about the positive and/or negative



predictive value of pulse oximetry. As a result, no valid inferences can be made from this study regarding the efficacy, effectiveness, or efficiency of pulse oximetry when it is applied in the critical care setting. As a second example, again consider the WBC scan. As was mentioned, this test may be used in the critical care setting to define the presence or absence of a site of infection in a critically ill patient. In a recent retrospective review of 45 patients in whom indium-111 leucocyte scintigraphy had been performed with the clinical suspicion of an intraabdominal abscess (2), the reported sensitivity and specificity of this scintigraphic procedure were 95% and 91%, respectively. This study, however, did not clearly define what constituted a positive test result, nor was there any mention of where the patients who were studied had been recruited; both issues could have substantially affected the reported results. As in the previous study evaluating pulse oximetry (55), the current study reporting on the utility of the WBC scan suggests that this test could be an invaluable diagnostic tool for detecting a focus of intraabdominal infection. However, upon closer inspection, it is clear that no definitive statement regarding the WBC scan was warranted from the results of that study. These clinical studies (2;55), both inferring some evidence of benefit from their respective diagnostic tests, illustrate the need for more thorough diagnostic technology assessments in critical care. Part of the difficulty with the need for more thorough technology assessments in this discipline is that only recently have educational objectives for critical care trainees emphasized technology evaluation. It is imperative that both physician and nonphysician scientists be encouraged to study the evaluation of technologies applied to critical care.
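The sample-size criticism above can be quantified. With only 7 hypoxic events, even a measured sensitivity of 100% is compatible with a much lower true value; a Wilson score interval (one standard method, sketched here as a hypothetical helper) makes the uncertainty explicit.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval (about 95% for z = 1.96) for a
    binomial proportion such as sensitivity."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# 7 of 7 hypoxic events detected: the point estimate is 100%, but the
# lower confidence bound is only about 65%.
low, high = wilson_interval(7, 7)
```

In other words, the reported "100% sensitivity" is statistically compatible with a test that misses roughly one hypoxic event in three.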
Failure to encourage this process will result in arbitrary decisions made by governing bodies without the appropriate database to make decisions in the best interest of the patients.

Assessing Therapeutic Technology in Critical Care

"With the development of clinical research, attempts to establish safety and efficacy became more systematic and scientific, culminating in the crown jewel of technology assessment, the randomized clinical [controlled] trial" (RCT) (28). While the determination of the parameters of safety and efficacy is still of vital interest in assessing therapeutic technologies in the critical care setting, technology assessments have added many more important issues, including cost-effectiveness and cost-benefit (19;33). As a result, technology assessments now include not only physician researchers, but also economists, epidemiologists, systems analysts, and medical sociologists, among others. A review of the literature (Compact Cambridge Medline) found there were over 100 RCTs conducted in critical care between 1985 and 1989 (using the search terms Randomized and Critical Care/ICU). While this may seem adequate in the sense that there are attempts being made at establishing the most efficacious procedures and treatments, the results are deceiving. In a review of three acute care journals often cited in the critical care literature (Annals of Emergency Medicine, Critical Care Medicine, Journal of Trauma), Kelen et al. (41) found that, according to previously published criteria for evaluating clinical trials (18), none of the journals consistently reported enough methodological information to allow the assessment of their bias-reducing techniques and statistical methodology. Given the fact that RCTs are both expensive and time-consuming to conduct, this lack of attention to methodological rigor is most disconcerting. One of the more common pitfalls in these reports is small sample size and the resulting lack of statistical power. The result of these drawbacks is that critical care physicians are forced to evaluate each piece of conflicting information individually, and thereby synthesize the information that they deem the most useful. Recently,

Table 6. Questions Regarding the Validity of a Therapeutic Technology Assessment

1. Are the primary, and all the other, questions that the researchers ask carefully selected, clearly defined, and stated in advance?
2. Is the study population selected using unambiguous inclusion and exclusion criteria?
3. Were the inclusion and exclusion criteria applied before the patient was allocated?
4. Is the study truly randomized, and if so, are both the researchers and patients blinded (when possible) as to treatment allocation?
5. Are all clinically relevant outcomes considered?
6. Is the assessment of outcome conducted in a blinded fashion, that is, without knowledge of patient allocation?
7. Is any information given about the number of patients lost to follow-up, and the reasons?
8. Are the groups compared at baseline so as to assess the degree to which random assignment was successful in distributing potentially confounding variables evenly between groups?
9. Are statistical analyses carried out, with mention of the statistical methods used?
10. Was any justification given for the sample size the authors used?
11. Is there any mention of the statistical power (the chance of missing a significant finding) associated with the sample size?
12. Are the results generalized beyond the study population?

attempts have been made to improve the standards of the RCT in critical care through the linkage of critical care units into broad-based groupings which collectively provide the throughput necessary to collect sufficient sample sizes of the required patients. Examples of these groups include the Sepsis Group (6), the Monoclonal Antibody Group (24), and the Multiple Organ Failure Study Group (74). Table 6 provides a list of general questions which may help readers select those articles that are worth reading. An example of the conflicting therapeutic information in critical care is evident in the controversy over prophylaxis of gastrointestinal (GI) bleeding. One type of treatment is the administration of H2 receptor blockers, while an alternate therapy is the administration of antacids. In a 1985 meta-analysis of the RCTs in this area (47), five of eight trials found H2 receptor blockers to be more effective than placebo or no treatment. In other studies employing antacids to prevent bleeding, six of nine studies found that antacids were better than placebo. To confuse the physician even more, in two of ten clinical trials antacids were found to be better than H2 receptor blockers in the prevention of GI bleeding. LaCroix's conclusions highlight problems with RCTs in critical care: ". . . weaknesses in the study designs, heterogeneity of treatment effects, the lack of strength of the accumulated evidence, and the fact that no utility has been shown in terms of reducing morbidity (shock, need for transfusion) or mortality, prevent any definitive conclusion in regard to compulsory use of upper GI bleeding prophylaxis for ICU patients" (47). Given the advances in patient classification and scoring of illness severity, there is clearly room for marked improvement in the conduct of RCTs in the critical care setting. While the RCT has been frequently employed in clinical research, it is a relatively recent addition.
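The complaint about small samples and low power can be given a rough numerical footing. Under the standard normal-approximation formula for comparing two proportions (sketched below; the mortality rates are invented for illustration, not drawn from any cited trial), detecting even a large absolute mortality reduction requires several hundred patients per arm, well beyond the throughput of a single unit, which is the rationale for the collaborative groups just mentioned.

```python
import math

def patients_per_arm(p_control, p_treated, z_alpha=1.96, z_beta=0.84):
    """Approximate patients per arm for a two-arm trial comparing
    proportions, with two-sided alpha = 0.05 (z_alpha = 1.96) and 80%
    power (z_beta = 0.84), using the normal approximation."""
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Illustrative: detecting a drop in mortality from 30% to 20% requires
# roughly 290 patients in each arm.
n = patients_per_arm(0.30, 0.20)
```

Halving the detectable effect roughly quadruples the requirement, so trials with a few dozen patients per arm are almost always underpowered for mortality endpoints.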
In the past, an approach that was commonly used to assess therapeutic technologies has been to compare patients treated with the novel treatment to historical controls treated with the standard treatment. This study design possesses distinct advantages: A major advantage offered by the historical control format is its potential to resolve the central ethical dilemma of the prospective study—how to evaluate a clinical intervention that you believe is more effective than existing interventions without depriving some of your patients of the benefits of that increased effectiveness as a necessary part of the evaluation (53).

The use of historical controls to evaluate therapeutic technologies has another advantage over the long and arduous process of conducting a RCT. Particularly germane to critical care research are those conditions where the outcome is uniformly, or near uniformly, fatal. In the case of end-stage renal failure, dialysis did not have to undergo a RCT to prove it was an effective therapy. Another scenario where the historical control may prove to be an adequate method of assessment is in the case of rare diseases. In fact, it may be the only feasible format if results of satisfactory clinical and statistical significance are to be achieved in a reasonable time frame (53). While the use of historical controls does have some distinct advantages over a prospective randomized design, its disadvantages are well chronicled. In a recent review (69), Sacks et al. found that the use of historical controls was associated with far more positive results than were randomized trials of the same procedures. "It is inevitable that studies using historical controls will not be able to ensure the absence of confounding because of imbalances between experimental and control groups in the distribution of unknown prognostic factors" (53). However, there are problems with this design which are avoidable. Most obvious is the failure to ensure that both control and experimental groups are comparable where important demographic and prognostic variables are concerned. Secondly, patients should be identically treated except for the therapy being assessed (64). It is well documented that critically ill patients have more physiological parameters recorded routinely than their counterparts who are receiving standard ward care. In addition, the course of their therapy is more concisely documented. As a result, the critical care environment may be particularly well suited to this methodology. 
This is not an endorsement of supplanting the well-conducted RCT as the definitive method for identifying efficacious therapeutic technologies, but rather a suggestion that the appropriate use of historical controls may provide the building blocks upon which more definitive research can be done. An example of the use of historical controls in the critical care setting includes studies of the prevention of infection by selective decontamination of the digestive tract (SDD). Using this approach, both Stoutenbeek et al. (77) and Sydow et al. (79) reported that SDD resulted in a significant reduction in mortality for polytrauma patients admitted to the ICU. Both studies attempted to control for important demographic, prognostic, and treatment variables, lending more credibility to their use of this design. However, in a recent review of SDD, Stoutenbeek and Van Saene concluded: The ultimate proof that selective decontamination reduces mortality can only be produced in a properly designed trial in primarily noncolonized and noninfected patients, with a comparable severity of the underlying curable disease and comparable age, irrespective of the length of stay in the ICU (78).

Clearly, the use of historical controls should not be abandoned, as these authors outlined the type of patients required to provide definitive proof regarding the efficacy of this therapeutic technology. Other methods of assessing the efficacy of a therapeutic technology are available, but have not, as yet, had much impact in the critical care setting. Included in this category would be the strategies of n-of-1 randomized controlled trials (30), the sequential adaptive design (1), and the use of the principles of the randomized trial applied to the case-control design (67). While these methods have not been utilized to any great


Table 7. Guidelines for Performing an n-of-1 Randomized Controlled Trial (Adapted from [31])

1. Is an n-of-1 trial indicated for this patient? (i.e., is this design applicable?)
   1a. Is the effectiveness of the treatment really in doubt?
   1b. Will the treatment, if effective, be long-term?
2. Is an n-of-1 trial feasible in this patient?
   2a. Does the treatment have a rapid onset?
   2b. Does the treatment stop acting soon after it is discontinued?
   2c. Is an optimal duration of treatment feasible?
   2d. Can clinically relevant targets be measured?
   2e. Can sensible criteria for stopping the trial be established?
   2f. Is an unblinded run-in period necessary?
3. Is the trial feasible in your practice setting?
   3a. Is the infrastructure in place for logistical help (e.g., pharmacy to assign drug orders)?
   3b. Are strategies in place for interpreting the data once it is collected?
4. Is the trial ethical?

extent, a brief mention of the methods of each will demonstrate their applicability to the critical care setting. The n-of-1 approach is analogous to the traditional RCT, but, as the name implies, it is an experimental study of single subjects. Briefly, a patient undergoes pairs of treatment periods during which he or she receives both the experimental and standard therapies; the order is assigned at random. Pairs of treatment periods are continued until effectiveness is proved or refuted. It is obvious that this method avoids some of the ethical dilemmas inherent in the traditional RCT. All patients enrolled in the study are treated equally, with none being deprived of a potentially beneficial treatment. "Moreover the research often has immediate value to the subject, for it is possible to determine not only which regimen works better on average, but also which regimen works better in that particular patient" (53). It is obvious that n-of-1 trials are not applicable to certain conditions such as surgical procedures or self-limited illnesses. Table 7 provides some guidelines for performing an n-of-1 trial. In order for this method to be appropriate, all answers to the questions must be affirmative. Intuitively, this design should be attractive to the critical care physician considering the following scenario. Critically ill patients are a very heterogeneous group, with many of them not fitting the eligibility criteria that were used in trials to determine efficacious treatment. As such, there is no guarantee that what is supposed to be the best course of treatment for an individual patient is actually the best treatment. One need only look at trials concerning Adult Respiratory Distress Syndrome (ARDS) and Sepsis Syndrome to see that the eligibility criteria differ from study to study. Once again, the clinicians are forced to synthesize conflicting information in an attempt to best treat their patients.
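The logic of paired treatment periods lends itself to a simple sign test. The sketch below is hypothetical: the symptom scores are invented, and a real n-of-1 trial would also randomize the order of therapies within each pair and blind both parties, as Table 7 requires.

```python
import math

def sign_test_p(wins, n_pairs):
    """One-sided sign-test p-value: the probability, under the null of no
    treatment difference, of at least `wins` pairs favouring the
    experimental therapy out of n_pairs."""
    return sum(math.comb(n_pairs, k)
               for k in range(wins, n_pairs + 1)) / 2 ** n_pairs

# Invented symptom scores (lower is better): (experimental, standard)
# for five completed treatment pairs.
scores = [(3, 6), (4, 5), (2, 6), (5, 4), (3, 7)]
wins = sum(1 for exp, std in scores if exp < std)
p_value = sign_test_p(wins, len(scores))
# Four of five pairs favour the experimental therapy, yet p is about
# 0.19: further pairs are needed before effectiveness is "proved."
```

This mirrors the stopping logic described in the text: pairs accumulate until the evidence for, or against, the experimental therapy becomes convincing for that particular patient.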
In this manner, establishing the criteria for efficacious, effective, and efficient therapies will be a long and arduous task in the critical care setting. Sequential adaptive designs have their roots, like the n-of-1 trials, in the traditional RCT. When investigators use the information generated during the course of an evaluation to make adjustments to the protocol, they are making use of an adaptive design. These assessments, in the planning stage, contain provisions for interim or sequential analysis of the study's results. Their most common application is to (a) adjust the needed sample size or (b) adjust the study's duration. Analyses are carried out at predetermined intervals to determine if the patient groups differ on the major health outcomes of interest. When significant results are obtained, the assessment may be stopped earlier,



with fewer patients accrued, and the most efficacious therapy instituted. Thus, while resembling the RCT, the sequential adaptive designs are a means to more quickly identify those technologies which are of benefit. In most large multicenter assessments this has become standard practice, but collaborative studies among groups of critical care units are just now becoming possible. Thus, although the RCT remains the "crown jewel" for the assessment of therapeutic technologies in critical care, there are alternative designs which can be applied. Given the expense, time, and rigor associated with conducting an adequate RCT, the case for increased utilization of these alternate methods may be advanced. Their utility lies in identifying those technologies where a randomized trial is not necessary, due to either large deleterious or beneficial effects.

CONCLUSIONS

In the preceding sections, problems associated with assessing the technology of critical care have been presented both for the program as a whole and for its diagnostic and therapeutic components. To summarize, the last 30 years have seen critical care medicine expand, in parallel with an explosive increase in medical technologies and an aging population. Critical care beds now make up approximately 5% of all acute hospital beds, but account for approximately 20% of total hospital expenditures. While critical care expanded relatively unhindered in its early years, it currently faces increasing pressures for cost containment, improved accountability, and other management initiatives that, reasonably, seek to improve the efficiency of this program. Since it has been argued that the acquisition of technologies has been a major contributor to the cost of providing critical care, it is astounding that a discipline which emphasizes technology to achieve its mission has not implemented appropriate processes for establishing the criteria for efficacious, effective, and efficient use of these technologies.

Evaluating critical care at the program level introduces unique problems. Separating the modes of critical care practice from the physical setting in which they are provided poses ethical dilemmas which, in effect, preclude randomized studies to evaluate this program's overall efficacy. While studies of the program's effectiveness have been attempted, until recently they have been plagued by methodological difficulties associated with the unique and heterogeneous nature of the typical critical care patient population. With recent advances in classifying patients according to illness severity, studies of effectiveness are becoming more sound. Unfortunately, they are also becoming more infrequent. It strikes these authors that the work by Knaus et al.
(43) would be a logical starting point from which to pose the following question: Given that there are differences between critical care units irrespective of the types of patients they serve, how can we proceed to improve the effectiveness of those programs where it is most needed? While this is more of a quality assessment process, it is extremely important, as it is an area where the provision of care could be improved. As Donabedian points out, individual services could be analyzed in this manner to determine whether what they have established as appropriate treatment has been efficient (20); noted aberrations might identify areas where the service has room for improvement.

At the component level, the evaluation of critical care technologies has failed dismally. Where diagnostic technologies are concerned, problems stem both from uncontrolled diffusion of the technology and from poor attention to the details of established methods of evaluation. Many diagnostic procedures that are employed in
critical care have their primary origin in other areas of medicine. While they may have received adequate evaluations in these primary patient populations, such evaluations do not necessarily endorse their diffusion into the critical care setting. Unfortunately, most of these diagnostic technologies diffuse into critical care with little or no attention given to an evaluation of their efficacy in the critical care patient population. Furthermore, when assessments of diagnostic technologies have been undertaken in the critical care setting, they have been plagued by methodological difficulties. Little attention has been paid to the prevalence of the condition in question, or to the spectrum of diseased and nondiseased patients included in the evaluation. Several controllable biases are also often overlooked. The result is that published measures of a diagnostic test's efficacy in critical care are often not even remotely valid. Intuitively, this results in an inappropriate allocation of valuable resources, which in turn compromises the efficiency with which critical care is provided.

From its inception, critical care medicine has attempted to define the most efficacious and effective therapeutic interventions for its patients. However, examination of the literature shows that the quantity of this research should not be equated with its quality. Although many randomized controlled trials have attempted to evaluate the efficacy of critical care therapies, they are, by and large, seriously flawed. As was mentioned previously, these trials often provide little information regarding attempts at reducing bias. In addition, sample sizes are usually too small to detect what would be clinically significant differences. While other methods are available for the process of technology assessment, they have been largely ignored in critical care.
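Why prevalence matters can be shown with a small worked example using Bayes' theorem. The test characteristics below are hypothetical; the point is only that, with sensitivity and specificity held fixed, the positive predictive value collapses as the prevalence of the condition falls, a shift easily encountered when a test diffuses from its primary patient population into critical care.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem: probability of disease given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# A hypothetical test that is 90% sensitive and 90% specific:
for prevalence in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}")
# prevalence 50%: PPV = 0.90
# prevalence 10%: PPV = 0.50
# prevalence 1%: PPV = 0.08
```

A test validated where half the patients have the disease can thus be nearly useless where one in a hundred does, which is exactly why published efficacy measures that ignore prevalence and spectrum cannot simply be carried into the critical care unit.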
For critical care providers, significant improvements in current attitudes toward technology assessment are required if there is to be any hope of implementing sound changes in the practice of this discipline. The following specific objectives could address these current problems in critical care.

1. Improve the ability of critical care to evaluate its own technology. To date, there has been little educational focus on the processes or importance of assessing technology in critical care. If those who are involved with the provision of this service do not accept the premise that they should become leaders in this process, the responsibility will be turned over to individuals without the clinical expertise that is required to prioritize, design, and implement sound testable hypotheses. Therefore, the onus is on those involved in education to teach our current and future critical care providers about the need for, importance of, and methods for evaluating technology in critical care.

2. Encourage the development of multidisciplinary national and international bodies with a specific mandate to advise on technology assessments. These should be multidisciplinary bodies composed of the various professionals necessary for the sound and comprehensive evaluation of specific aspects of technology assessments. Perhaps the most important early role of such a group would be to act as a data bank: by collecting and storing objective information on the types, scope, and merit of individual technology assessments, it would give the critical care practitioner a valuable asset for sorting out the currently muddled situation. Having established their credibility in this manner, these bodies could later propose guidelines and methods for both assessing and implementing the technologies associated with critical care. Thus, these multidisciplinary bodies would serve to impose some order on a currently chaotic process.
Since the cost, in both dollars and time, of conducting randomized controlled trials in critical care is high, these bodies could also research other methods for assessing the technology of critical care. While the well-conducted randomized controlled trial could remain the definitive form of assessing the efficacy of therapeutic technologies, there are other less costly, more expeditious methods available.
3. Establish funding sources with the express intent of funding evaluations of technology. A technology assessment, regardless of the method, is an expensive undertaking. Critical care managers cannot be expected to redirect funds from the clinical component of the budget to a budget area concerned with assessments, as the transfer of these funds from patient care would be ethically wrong. In Canada, one such effort is currently underway: in a partnership approach, the Medical Research Council of Canada and the medical technology industry have a joint program to fund technology-based research. Unfortunately, this encompasses not only assessments of technology but also the development of new technologies.

4. Encourage industry to develop stronger liaisons with the hospital sector. The current process of technological development takes place far removed from daily clinical practice. If industry were to forge better relations with the hospital sector before and during the development of new technologies, the process would be enhanced. By placing industry's scientists in strong clinical programs, the cost of bringing a technology to market could be drastically reduced. In this manner industry could maintain its profit margin, while medicine would be able both to participate in the development of new technologies and to receive the finished product at considerably less cost. Since efficiency is the relationship between costs and consequences, this is a desirable gain for both parties.

In summary, the state of technology assessment in critical care leaves much to be desired. While it has been acknowledged that technology assessments are extremely important, all three major players have failed. The critical care practitioner is not sufficiently educated to undertake meaningful and valid assessments; as a result, there is a dearth of valid information. Government has failed by demanding that costs be reduced while not providing the requisite funding for research into how best to use resources. Industry has failed by not establishing ties with the hospital sector until it has a finished product. The current cost-containment era in health care demands that each of these stakeholders take steps to correct these mistakes or suffer dire consequences for all involved, especially the patient.

REFERENCES

1. Armitage, P. Sequential medical trials. New York: John Wiley and Sons, 1975.
2. Baba, A. A., McKillop, J. H., Cuthbert, G. R., et al. Indium 111 leucocyte scintigraphy in abdominal sepsis: Do the results affect management? European Journal of Nuclear Medicine, 1990, 16, 307-09.
3. Bates, D. V. "Workshop on Intensive Care Units," comments of the National Academy of Sciences, National Research Council, Committee on Anesthesia. Anesthesiology, 1964, 25, 192.
4. Battista, R. N. Innovation and diffusion of health-related technologies: A conceptual framework. International Journal of Technology Assessment in Health Care, 1989, 5, 227-43.
5. Berenson, R. A. Intensive care units (ICUs): Clinical outcomes, costs, and decision-making. Health Technology Case Study 28, prepared for the Office of Technology Assessment, 1984.
6. Bone, R. C., Fisher, C. J. Jr., Clemmer, T. P., et al. Sepsis syndrome: A valid clinical entity. Critical Care Medicine, 1989, 17, 389-93.
7. Brown, J. J., & Sullivan, G. Effect on ICU mortality of a full-time critical care specialist. Chest, 1989, 96, 127-29.
8. Byrick, R. J., Mindor, F. F., McKee, L., & Mudge, B. Cost-effectiveness of intensive care for respiratory failure patients. Critical Care Medicine, 1980, 6, 332-37.
9. Califano, J. A. The health care chaos. New York Times Magazine, 1988, 44, 44-48.
10. Casali, R., et al. Acute renal insufficiency complicating major cardiovascular surgery. American Journal of Surgery, 1975, 181, 370.
11. Cecil, W. T., Petterson, M. T., Lamoonpun, S., & Rudolph, C. D. Clinical evaluation of the Biox IIA ear oximeter in the critical care environment. Respiratory Care, 1985, 30, 179-83.
12. Chassin, M. R. Costs and outcomes of medical intensive care. Medical Care, 1982, 20, 165.
13. Choe, H., Tashiro, C., Fukumitsu, K., Yagi, M., & Yoshiya, I. Comparison of recorded values from six pulse oximeters. Critical Care Medicine, 1989, 17, 678-81.
14. Cullen, D. J., Civetta, J. M., Burton, B. A., & Ferrara, L. C. Therapeutic Intervention Scoring System: A method for quantitative comparison of patient care. Critical Care Medicine, 1974, 2, 57-60.
15. Cullen, D. J., Ferrara, L. C., & Briggs, B. A. Survival, hospitalization charges and follow-up results in critically ill patients. New England Journal of Medicine, 1976, 294, 892-95.
16. Deber, R. B., Thompson, G. G., & Leatt, P. Technology acquisition in Canada. International Journal of Technology Assessment in Health Care, 1988, 4, 185-206.
17. Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre. How to read clinical journals: II. To learn about a diagnostic test. Canadian Medical Association Journal, 1981, 124, 703-10.
18. DerSimonian, R., Charette, L. J., McPeek, B., et al. Reporting on methods in clinical trials. New England Journal of Medicine, 1982, 306, 1332-37.
19. Detsky, A. S., & Naglie, G. A clinician's guide to cost-effectiveness analysis. Annals of Internal Medicine, 1990, 113, 147-54.
20. Donabedian, A. The assessment of technology and quality: A comparative study of certainties and ambiguities. International Journal of Technology Assessment in Health Care, 1988, 4, 487-96.
21. Doubilet, P., Weinstein, M. C., & McNeil, B. J. Use and misuse of the term "cost effective" in medicine. New England Journal of Medicine, 1986, 314, 253-56.
22. Dragsted, L., Jorgensen, J., Jensen, N., et al. Interhospital comparisons of patient outcome from intensive care: Importance of lead time bias. Critical Care Medicine, 1989, 17, 418-22.
23. Drake, W. E. Jr. Acute stroke management and patient outcome: The value of neurovascular care units (NCU). Stroke, 1973, 4, 933.
24. Escarce, J. J., & Kelley, M. A. Admission source to the medical intensive care unit predicts hospital death independent of APACHE II score. Journal of the American Medical Association, 1990, 264, 2389-94.
25. Fedullo, A. J., & Swinburne, A. J. Relationship of patient age to cost and survival in a medical ICU. Critical Care Medicine, 1983, 11, 155-59.
26. Feller, I., Tholen, D., & Cornell, R. G. Improvements in burn care, 1965 to 1979. Journal of the American Medical Association, 1980, 244, 2074-78.
27. Flood, A. B., Scott, W. R., & Ewy, W. Does practice make perfect? Part I. The relation between hospital volume and outcomes for selected diagnostic categories. Medical Care, 1984, 22, 98-114.
28. Fuchs, V. R., & Garber, A. M. The new technology assessment. New England Journal of Medicine, 1990, 323, 673-77.
29. Griner, P. F. Treatment of acute pulmonary edema: Conventional or intensive care? Annals of Internal Medicine, 1972, 77, 501-06.
30. Guyatt, G. H., Keller, J. L., Jaeschke, R., et al. The n-of-1 randomized controlled trial: Clinical usefulness. Annals of Internal Medicine, 1990, 112, 293-99.
31. Guyatt, G. H., Sackett, D., Adachi, J., et al. A clinician's guide for conducting randomized trials in individual patients. Canadian Medical Association Journal, 1988, 139, 497-503.
32. Harvey, A. M. Neurosurgical genius - Walter Edward Dandy. Johns Hopkins Medical Journal, 1974, 135, 358-68.
33. Holly, N. A call for randomized controlled cost-effectiveness analysis of percutaneous transluminal coronary angioplasty. International Journal of Technology Assessment in Health Care, 1988, 4, 497-510.
34. Hook, E. W., Horton, C. A., & Schaberg, D. R. Failure of intensive care unit support to influence mortality from pneumococcal bacteremia. Journal of the American Medical Association, 1983, 249, 1055-57.
35. Hopefl, A. W., Taaffe, C. L., & Herrmann, V. M. Failure of APACHE II alone as a predictor of mortality in patients receiving total parenteral nutrition. Critical Care Medicine, 1989, 17, 414-17.
36. Jennett, B. Inappropriate use of intensive care. British Medical Journal, 1984, 289, 1709-11.
37. Jennett, B. Resource allocation for the severely brain damaged. Archives of Neurology, 1976, 33, 595-97.
38. Jennett, B. Assessment of a technological package using a predictive tool. International Journal of Technology Assessment in Health Care, 1987, 3, 335-38.
39. Jennett, B., & Teasdale, G. Assessment of outcome after severe brain damage. Lancet, 1975, 1, 480-84.
40. Keene, A. R., & Cullen, D. J. Therapeutic Intervention Scoring System: Update 1983. Critical Care Medicine, 1983, 11, 1-3.
41. Kelen, G. D., Brown, C. G., Moser, M., et al. Reporting methodology protocols in three acute care journals. Annals of Emergency Medicine, 1985, 14, 880-84.
42. Knaus, W. A., Draper, E. A., Wagner, D. P., et al. Evaluating outcome from intensive care: A preliminary multihospital comparison. Critical Care Medicine, 1982, 491-96.
43. Knaus, W. A., Draper, E. A., Wagner, D. P., & Zimmerman, J. E. An evaluation of outcome from intensive care in major medical centers. Annals of Internal Medicine, 1986, 104, 410-18.
44. Knaus, W. A., Draper, E. A., Wagner, D. P., & Zimmerman, J. E. APACHE II: A severity of disease classification system. Critical Care Medicine, 1985, 13, 818-29.
45. Knaus, W. A., LeGall, J. R., et al. A comparison of intensive care in the U.S.A. and France. Lancet, 1982, 18, 642-46.
46. Knaus, W. A., Zimmerman, J. E., Wagner, D. P., Draper, E. A., et al. APACHE - Acute physiology and chronic health evaluation: A physiologically based classification system. Critical Care Medicine, 1981, 9, 591-97.
47. Lacroix, J., Infante-Rivard, C., Jenicek, M., & Gauthier, M. Prophylaxis of upper gastrointestinal bleeding in intensive care units: A meta-analysis. Critical Care Medicine, 1989, 17, 862-69.
48. Lasch, K., Maltz, A., Mosteller, F., & Tosteson, T. A protocol approach to assessing medical technologies. International Journal of Technology Assessment in Health Care, 1987, 3, 103-22.
49. LeGall, J. R., Brun-Buisson, C., Trunet, P., et al. Influence of age, previous health status, and severity of acute illness on outcome from intensive care. Critical Care Medicine, 1982, 10, 575-77.
50. Luft, H. The relation between surgical volume and mortality: An exploration of causal factors and alternative models. Medical Care, 1980, 18, 940-46.
51. Luft, H., Bunker, J., & Enthoven, A. Should operations be regionalized? The empirical relation between surgical volume and mortality. New England Journal of Medicine, 1979, 301, 1364-69.
52. Morris, R. W., Nairn, M., & Torda, T. A. A comparison of fifteen pulse oximeters. Part I: A clinical comparison; Part II: A test of performance under conditions of poor perfusion. Anaesthesia and Intensive Care, 1989, 17, 62-82.
53. Moser, M. M. Randomized clinical trials: Alternatives to conventional randomization. American Journal of Emergency Medicine, 1986, 4, 276-85.
54. Murata, G. H., & Ellrodt, A. G. Medical intensive care in a community teaching hospital. Western Journal of Medicine, 1982, 136, 462-70.
55. Niehoff, J., DelGuercio, C., LaMorte, W., et al. Efficacy of pulse oximetry and capnometry in postoperative ventilatory weaning. Critical Care Medicine, 1988, 16, 701-05.
56. Nightingale, F. Notes on Hospitals, 3rd ed. London: Longman, Green, Longman, Roberts and Green, 1863, 89.
57. NIH Consensus Development Conference Statement on Critical Care Medicine. In J. E. Parillo & S. M. Ayres (eds.). Baltimore, MD: Williams and Wilkins, 1984, 277-89.
58. Norris, R. M., Brandt, P. W. T., Caughey, D. E., et al. A new coronary prognostic index. Lancet, 1969, 1, 274-81.
59. Parno, J. R., Teres, D., Lemeshow, S., et al. Hospital charges and long-term survival of ICU versus non-ICU patients. Critical Care Medicine, 1982, 10, 569-74.
60. Pessi, T. T. Experiences gained in intensive care of surgical patients: A prospective clinical study of 1,001 consecutively treated patients in a surgical intensive care unit. Annales Chirurgiae et Gynaecologiae, 1973, 62 (Suppl.), 185, 3.
61. Petty, T. L., Lakshminarayan, S., Sahn, S. A., et al. Intensive respiratory care unit: Review of ten years experience. Journal of the American Medical Association, 1975, 233, 34-37.
62. Piper, K. W., & Griner, P. F. Suicide attempts with drug overdose: Outcomes of intensive vs. conventional floor care. Archives of Internal Medicine, 1974, 134, 703-06.
63. Pitner, S. E., & Mance, C. J. An evaluation of stroke intensive care: Results in a municipal hospital. Stroke, 1973, 4, 737-41.
64. Pocock, S. J. The combination of randomized and historical controls in clinical trials. Journal of Chronic Diseases, 1976, 29, 175-88.
65. Ransohoff, D. F., & Feinstein, A. R. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. New England Journal of Medicine, 1978, 299, 926-30.
66. Rogers, R. M., Weiler, C., & Ruppenthal, B. Impact of the respiratory intensive care unit in survival of patients with acute respiratory failure. Chest, 1972, 62, 94-97.
67. Ron, A., Aronne, L. J., Kalb, P. E., et al. The therapeutic efficacy of critical care units: Identifying subgroups of patients who benefit. Archives of Internal Medicine, 1989, 149, 338-41.
68. Rothstein, P., & Johnson, P. Pediatric intensive care: Factors that influence outcome. Critical Care Medicine, 1982, 10, 34-37.
69. Sacks, H., Chalmers, T. C., & Smith, H. Randomized versus historical controls for clinical trials. American Journal of Medicine, 1982, 72, 233-40.
70. Schwartz, S., & Cullen, D. J. How many intensive care beds does your hospital need? Critical Care Medicine, 1981, 9, 625-29.
71. Sheps, S. A., & Schechter, M. T. The assessment of diagnostic tests: A survey of current medical research. Journal of the American Medical Association, 1984, 252, 2418-22.
72. Severinghaus, J. W., Naifeh, K. H., & Koh, S. O. Errors in 14 pulse oximeters during profound hypoxia. Journal of Clinical Monitoring, 1989, 5, 72-81.
73. Sibbald, W. J., Escaf, M., & Calvin, J. E. How can new technology be introduced, evaluated, and financed in critical care? Clinical Chemistry, 1990, 36, 1604-11.
74. Sibbald, W. J., Marshall, J., Christou, N., et al. "Sepsis": Clarity of existing terminology ... or more confusion? (Editorial.) Critical Care Medicine, 1991, 19, 996.
75. Skidmore, F. D. A review of 460 patients admitted to the intensive therapy unit of a general hospital between 1965 and 1969. British Journal of Surgery, 1973, 60, 1-16.
76. Spagnolo, S. V., Hershberg, P. I., & Zimmerman, H. J. Medical intensive care unit: Mortality rate experience in a large teaching hospital. New York State Journal of Medicine, 1973, 73, 754-57.
77. Stoutenbeek, C. P., Van Saene, H. K. F., Miranda, D. R., et al. The effect of selective decontamination of the digestive tract on colonization and infection in multiple trauma patients. Intensive Care Medicine, 1984, 10, 185-92.
78. Stoutenbeek, C. P., & Van Saene, H. K. F. Infection prevention in intensive care by selective decontamination of the digestive tract. Journal of Critical Care, 1990, 5, 137-56.
79. Sydow, M., Burchardi, H., Crozier, T. A., et al. Prospective study of infection and mortality rates in critically ill patients during SDD regimen (abstract). In H. K. F. Van Saene, C. P. Stoutenbeek, P. Lawin, et al. (eds.), Infection control by selective decontamination. Berlin, FRG: Springer Verlag, 1989, 118.
80. Thibault, G. E., Mulley, A. G., Barnett, C. O., et al. Medical intensive care: Indications, interventions, and outcomes. New England Journal of Medicine, 1980, 302, 938-42.
81. Tomlin, P. J. Intensive care - medical audit. Anaesthesia, 1978, 33, 710-15.
82. Turnbull, A. D., Carlon, G., Baron, R., et al. The inverse relationship between cost and survival in critically ill cancer patients. Critical Care Medicine, 1979, 7, 20-23.
83. Vanholder, R., & Colardyn, F. Prognosis of intensive care patients: Correlation of diagnosis and complications to patient outcome. Acta Clinica Belgica, 1980, 35, 279-86.
84. Ziegler, E. J., Fisher, C. J. Jr., Sprung, C. L., et al. Treatment of gram-negative bacteremia and septic shock with HA-1A human monoclonal antibody against endotoxin: A randomized, double-blind, placebo-controlled trial. New England Journal of Medicine, 1991, 324, 429-36.

85. Zimmerman, J. E. (ed.). APACHE III study design: Analytic plan for evaluation of severity and outcome. Critical Care Medicine, 1989, 17(S), S169-S221.
86. Zimmerman, J. E., Knaus, W. A., Judson, J. A., et al. Patient selection for intensive care: A comparison of New Zealand and United States hospitals. Critical Care Medicine, 1988, 16, 318-26.
