Comparison of the Mortality Probability Admission Model III, National Quality Forum, and Acute Physiology and Chronic Health Evaluation IV Hospital Mortality Models: Implications for National Benchmarking* Andrew A. Kramer, PhD1,2; Thomas L. Higgins, MD, MBA, MCCM3,4; Jack E. Zimmerman, MD, FCCM1,5

Objective: To examine the accuracy of the original Mortality Probability Admission Model III, the ICU Outcomes Model/National Quality Forum modification of Mortality Probability Admission Model III, and the Acute Physiology and Chronic Health Evaluation IVa models for comparing observed and risk-adjusted hospital mortality predictions.

Design: Retrospective paired analyses of day 1 hospital mortality predictions using three prognostic models.

Setting: Fifty-five ICUs at 38 U.S. hospitals from January 2008 to December 2012.

Patients: Among 174,001 intensive care admissions, 109,926 met model inclusion criteria and 55,304 had data for mortality prediction using all three models.

Interventions: None.

Measurements and Main Results: We compared patient exclusions and the discrimination, calibration, and accuracy of each model. Acute Physiology and Chronic Health Evaluation IVa excluded 10.7% of all patients, ICU Outcomes Model/National Quality Forum 20.1%, and Mortality Probability Admission Model III 24.1%. Discrimination of Acute Physiology and Chronic Health Evaluation IVa was superior, with an area under the receiver operating curve of 0.88, compared with Mortality Probability Admission Model III (0.81) and ICU Outcomes Model/National Quality Forum (0.80). Acute Physiology and Chronic Health Evaluation IVa was better calibrated (lowest Hosmer-Lemeshow statistic). The accuracy of Acute Physiology and Chronic Health Evaluation IVa (adjusted Brier score = 31.0%) was superior to that of Mortality Probability Admission Model III (16.1%) and ICU Outcomes Model/National Quality Forum (17.8%). Compared with observed mortality, Acute Physiology and Chronic Health Evaluation IVa overpredicted mortality by 1.5% and Mortality Probability Admission Model III by 3.1%; ICU Outcomes Model/National Quality Forum underpredicted mortality by 1.2%. Calibration curves showed that Acute Physiology and Chronic Health Evaluation performed well over the entire risk range, unlike the Mortality Probability Admission Model and ICU Outcomes Model/National Quality Forum models. Acute Physiology and Chronic Health Evaluation IVa had better accuracy within patient subgroups and for specific admission diagnoses.

Conclusions: Acute Physiology and Chronic Health Evaluation IVa offered the best discrimination and calibration on a large common dataset and excluded fewer patients than Mortality Probability Admission Model III or ICU Outcomes Model/National Quality Forum. The choice of ICU performance benchmarks should be based on a comparison of model accuracy using data for identical patients. (Crit Care Med 2014; 42:544–553)

Key Words: Acute Physiology and Chronic Health Evaluation; benchmarking; health quality indicators; hospital mortality; intensive care; Mortality Prediction Model; outcome assessment

*See also p. 732.
1 Cerner Corporation, Vienna, VA.
2 Department of Biostatistics, Kansas University Medical Center, Kansas City, MO.
3 Critical Care Division, Baystate Medical Center, Springfield, MA.
4 Department of Medicine, Tufts University School of Medicine, Boston, MA.
5 Department of Anesthesiology and Critical Care Medicine, George Washington University, Washington, DC.
Supported, in part, by Cerner Corporation, Kansas City, MO.
Dr. Kramer is an employee of Cerner Corporation, which holds the marketing rights to APACHE and also markets MPM, and has stock options with Cerner Corporation. Dr. Higgins served as Chair of the Project IMPACT research committee (2003–2007) and had access to the Project IMPACT data used to develop the MPM model during that time. He has stock options with Cerner Corporation and has disclosed that he has previously collaborated with Drs. Zimmerman and Kramer on other MPM and APACHE papers. Dr. Zimmerman consulted for Cerner Corporation (critical care).
For information regarding this article, E-mail: [email protected]
Copyright © 2013 by the Society of Critical Care Medicine and Lippincott Williams & Wilkins
DOI: 10.1097/CCM.0b013e3182a66a49

Prognostic scoring systems were developed to assess ICU performance by comparing observed and risk-adjusted hospital mortality for critically ill patient groups. The use of scoring systems has been mandated for ICUs in the United Kingdom (1), the Netherlands (2), Austria (3), Finland (4), and at Veterans Administration hospitals in the United States (5). Scoring systems are voluntarily used in 10–15% of U.S. ICUs, with the Mortality Probability Admission Model (MPM0-III) and Acute Physiology and Chronic Health Evaluation (APACHE) IV prognostic systems being the most commonly used (6). Recent reviews have described the current generation of prognostic scoring systems and discussed their value and limitations for comparing observed and predicted mortality (6–9).

Studies comparing the accuracy of current prognostic systems in the United States have been few in number. A 2008 study compared the accuracy of the MPM0-III and APACHE IV models for ICU patients in 35 California hospitals and presented a recalibrated MPM0-III model (10). This study found that the original and recalibrated MPM0-III models were less accurate than APACHE IV for predicting hospital mortality but were a reasonable alternative for ICUs constrained by cost and manual data collection burden. Another comparison at a U.S. tertiary referral center also found that discrimination and overall model performance were better for APACHE IV than for MPM0-III (11).

The use of prognostic scoring systems for quality measurement may increase in the future as a result of the efforts of the California Healthcare Foundation and the National Quality Forum (NQF) (7). Both of these organizations have endorsed a modified and recalibrated version of the MPM0-III model, known as the "ICU Outcomes Model" (ICOM), for predicting risk-adjusted hospital mortality (12). This model includes 28 additional interaction terms and has different patient exclusions than MPM0-III. The ICOM modification of the MPM0-III model has been endorsed by the National Quality Forum for public reporting of risk-adjusted hospital mortality for ICU patients (12).

The objective of this study was to examine the suitability of the original MPM0-III, the ICOM/NQF modification of MPM0-III, and APACHE IV models for comparing observed and risk-adjusted predicted mortality in a broad sample of U.S. ICU patients. To do this, we used a large multi-institutional database to 1) examine the accuracy of the three models in a contemporary patient sample and 2) compare model performance across selected patient subgroups.

METHODS

Data for this study were obtained from the APACHE database (Cerner Corporation, Kansas City, MO) for ICU admissions between January 1, 2008, and December 31, 2012. This database supports collection of the variables used by the MPM0-III, ICOM/NQF modification of MPM0-III, and APACHE IV models. Cerner Corporation owns the registered trademark for APACHE and markets both the MPM0-III and APACHE IV models. The database was collected using a software program that integrates automated and computer-based data collection. All patient data were stripped of patient identifiers and are reported in compliance with the Health Insurance Portability and Accountability Act. This study was submitted to the Institutional Review Board at Baystate Medical Center and deemed exempt from review under current federal regulations.

The MPM0-III, ICOM/NQF modification of MPM0-III, and APACHE IV hospital mortality models are publicly available at no charge. Descriptions of the three models, their predictor variables, development, validation, and reliability are published elsewhere (12–14). For simplicity and to avoid confusion, the ICOM/NQF modification of MPM0-III will subsequently be referred to as the "NQF model."

Figure 1. Flow diagram showing reasons for patient exclusion. CABG = coronary artery bypass graft, MPM = Mortality Probability Admission Model, ICOM/NQF = ICU Outcomes Model/National Quality Forum, APACHE = Acute Physiology and Chronic Health Evaluation.

Data Collection

Information about each hospital and ICU was self-reported. All ICUs collected data for APACHE IV mortality predictions, but collection of data supporting MPM0-III was optional. Patient data were generated as a result of medical care and collected for consecutive unselected ICU admissions on day 1. The demographic, clinical, and physiologic data collected for each patient are shown in Appendix 1.

Patients Excluded From Hospital Mortality Prediction

We did not perform mortality predictions for patients who met the exclusion criteria for one or more of the models. All three models exclude ICU readmissions and patients who are younger than 18 years. MPM0-III excludes patients admitted for acute myocardial infarction (AMI) and elective or emergency cardiac surgery (15, 16). NQF excludes trauma patients and those admitted after coronary artery bypass graft surgery (12). In addition, NQF excludes patients admitted to rule out myocardial infarction and later found not to have AMI, but our database did not allow us to identify those patients. The APACHE IV model excludes patients admitted from another ICU during the same hospitalization, because extensive life support before ICU admission biases the prognostic implications of the day 1 physiologic measurements (13). We applied each of the above exclusion criteria to our database to generate a common patient dataset for comparison of model performance; a schematic sketch of this filtering step follows the Methods subsections below.

Hospital Mortality Prediction

We predicted aggregate risk-adjusted hospital mortality for each eligible patient using the methods prescribed for the most recent version of each model. The MPM0-III model was developed and validated using patient data from 2001 to 2004 (14) and externally validated using data from 2004 to 2005 (15, 16). The NQF model was developed and validated using 2001–2004 patient data from 35 California hospitals (10) and updated using external 2011 patient data (12). The NQF model is provided on request by the NQF at http://www.qualityforum.org/QPS and is shown in Appendix 2. The APACHE IV model was developed and validated using 2001–2003 patient data (13) and updated (APACHE IVa) using external data from 2006 to 2008 (A.A. Kramer, unpublished observation, 2010).

Hospital Mortality Prediction Across Subgroups

To examine factors that might influence model performance, we analyzed the accuracy of mortality prediction by each model across subgroups selected on the basis of prior studies (13, 16, 17). These included the following: 1) patients who did and did not receive mechanical ventilation on ICU day 1; 2) calibration curves generated on the basis of the ICU day 1 mortality risk predicted by each model, to address concerns about differences in model accuracy for patients with differing risks for hospital mortality (7, 18, 19); 3) patients admitted for medical diagnoses and after elective or emergency surgery; 4) patients admitted to teaching hospitals (with and without Council of Teaching Hospital membership) versus nonteaching hospitals (no residency training programs); and 5) a subset of the admission diagnoses (five medical and two postoperative) that represented a cross-section of body systems and had sufficient numbers and mortality (> 75 events for medical diagnoses, > 40 events for postoperative diagnoses) to permit statistical comparison.
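To make the combined exclusion logic concrete, here is a minimal sketch in Python. The record fields and diagnosis labels (for example, `icu_readmission`, `admitted_from_other_icu`) are hypothetical placeholders, not the study's actual database schema.

```python
# Hypothetical sketch of the model-specific exclusion rules described above.
# Field names and diagnosis labels are illustrative placeholders only.

def eligible_for_all_models(pt: dict) -> bool:
    """True if an admission survives the exclusion rules of all three models."""
    # Common to all three models: adults only, first ICU admission
    if pt["age"] < 18 or pt["icu_readmission"]:
        return False
    # MPM0-III: excludes acute MI and elective/emergency cardiac surgery
    if pt["diagnosis"] in {"acute_mi", "cardiac_surgery"}:
        return False
    # NQF: excludes trauma and post-CABG admissions
    if pt["diagnosis"] in {"trauma", "cabg"}:
        return False
    # APACHE IV: excludes transfers from another ICU in the same hospitalization
    if pt["admitted_from_other_icu"]:
        return False
    return True

admissions = [
    {"age": 67, "icu_readmission": False, "diagnosis": "sepsis",
     "admitted_from_other_icu": False},
    {"age": 54, "icu_readmission": False, "diagnosis": "cabg",
     "admitted_from_other_icu": False},
]
common_cohort = [pt for pt in admissions if eligible_for_all_models(pt)]
print(len(common_cohort))  # -> 1 (the CABG admission is excluded by NQF)
```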


Assessment of Model Performance

The accuracy of mortality predictions using the three models was compared. To assess discrimination, we used the area under the receiver operating characteristic curve (AU-ROC) (20), with comparisons carried out using the method proposed by DeLong et al (21). Calibration was measured by the delta mortality, defined as the difference between observed and mean predicted mortality; the Hosmer-Lemeshow chi-square statistic (22); and calibration curves that plot observed and predicted mortality across all risk ranges. Accuracy was determined by the Brier score (23), which was modified because the raw Brier score is affected by the incidence of mortality. The modified score adjusts for this and represents the percent reduction in deviation when using a specific predictive model as opposed to assigning everyone a probability equal to the incidence rate. A higher percentage reduction indicates better model accuracy. The adjusted Brier score is calculated as follows (24):

$\text{Raw Brier score} = \frac{1}{N}\sum_{i=1}^{N}(\text{Observed}_i - \text{Prediction}_i)^2$

$\text{Null Brier score} = \text{incidence} \times (1 - \text{incidence})$, where the overall mortality incidence is used as the prediction for all patients

$\text{Adjusted Brier score} = (\text{Null Brier score} - \text{Raw Brier score}) / \text{Null Brier score}$

Although a decreasing raw Brier score indicates better performance, an increasing adjusted Brier score signifies better performance.

Table 1. Characteristics of Hospitals and ICUs That Provided Care for the 55,304 Eligible Study Patients

Variable: n (%)
Hospitals (n = 38)
  Region
    East: 10 (26.3)
    Southeast: 7 (18.4)
    Midwest: 14 (36.8)
    West: 7 (18.4)
  Teaching status
    COTH: 17 (44.7)
    Teaching, non-COTH: 12 (31.6)
    Nonteaching: 9 (23.7)
  Bed size
    < 300: 10 (26.3)
    301–399: 9 (23.7)
    400–524: 5 (13.2)
    525–799: 10 (26.3)
    ≥ 800: 4 (10.5)
ICUs (n = 55)
  Coronary care: 5 (9.1)
  Medical: 8 (14.5)
  Mixed medical-surgical: 28 (50.9)
  Neurologic: 3 (5.5)
  Surgical: 11 (20.0)
COTH = Council of Teaching Hospitals.
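For readers who wish to reproduce the metric, below is a minimal sketch of the raw, null, and adjusted Brier calculations defined above. The observed outcomes and predicted risks are made-up illustrative values, not study data.

```python
# Minimal sketch of the raw, null, and adjusted Brier scores defined above.
# 'observed' is 1 if the patient died in hospital, 0 otherwise; 'predicted'
# is the model's day 1 hospital mortality probability. Values are illustrative.

observed  = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0.05, 0.10, 0.80, 0.20, 0.60, 0.15, 0.05, 0.70, 0.10, 0.25]

n = len(observed)
raw_brier = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n

# Null model: predict the overall mortality incidence for every patient.
incidence = sum(observed) / n
null_brier = incidence * (1 - incidence)

# Percent reduction in deviation relative to the null prediction;
# higher is better (e.g., 31.0% for APACHE IVa in Table 3).
adjusted_brier = (null_brier - raw_brier) / null_brier
print(f"adjusted Brier score = {adjusted_brier:.1%}")  # -> 79.0% for this toy data
```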

We used the Hosmer-Lemeshow test as one measure of calibration even though it is highly sensitive to sample size, because all three models were compared using the same patients in identical numbers (25). We did not recalibrate the three models for the study population because users are likely to employ the most recent model rather than a recalibrated version. In addition, model recalibration improves discrimination and calibration, but it does not indicate which model represents the best tool for comparing observed and predicted mortality (26, 27). All statistical methods were carried out using SAS 9.2 software (SAS Institute, Cary, NC).
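Because the Hosmer-Lemeshow statistic figures prominently in the comparisons below, a minimal sketch of the decile-based computation may help. This is an illustration in Python assuming equal-sized risk-ordered groups, not the authors' SAS implementation.

```python
# Minimal sketch of the decile-based Hosmer-Lemeshow chi-square statistic.
# Illustrative only; the study's analyses were carried out in SAS 9.2.
from typing import List

def hosmer_lemeshow(observed: List[int], predicted: List[float],
                    groups: int = 10) -> float:
    """Chi-square over risk-ordered groups; larger values mean worse fit."""
    pairs = sorted(zip(predicted, observed))  # order patients by predicted risk
    size = len(pairs) // groups
    chi2 = 0.0
    for g in range(groups):
        lo = g * size
        hi = (g + 1) * size if g < groups - 1 else len(pairs)
        chunk = pairs[lo:hi]
        n = len(chunk)
        expected_deaths = sum(p for p, _ in chunk)
        observed_deaths = sum(o for _, o in chunk)
        expected_survivors = n - expected_deaths
        observed_survivors = n - observed_deaths
        if expected_deaths > 0:
            chi2 += (observed_deaths - expected_deaths) ** 2 / expected_deaths
        if expected_survivors > 0:
            chi2 += (observed_survivors - expected_survivors) ** 2 / expected_survivors
    return chi2

# Example with made-up values, two risk groups:
obs  = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
pred = [0.1, 0.2, 0.9, 0.3, 0.7, 0.8, 0.2, 0.6, 0.1, 0.9]
print(round(hosmer_lemeshow(obs, pred, groups=2), 2))  # -> 2.51
```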

Table 2. Characteristics of 109,926 ICU Admissions That Met All Inclusion Criteria, Stratified by Whether or Not Mortality Probability Admission Model III (MPM0-III) Data Were Collected

Variable: MPM0-III data collected, Yes (n = 55,304) / No (n = 54,622) / p
Male, %: 50.8 / 53.1 / < 0.001
Ventilated on day 1, %: 29.2 / 28.9 / 0.27
≥ 1 chronic health item, %: 15.8 / 17.1 / < 0.001
Patient type, % (p < 0.001 for distribution)
  Medical: 82.5 / 85.9
  Elective surgery: 13.2 / 11.7
  Emergency surgery: 4.4 / 2.4
Location prior to admission, % (p < 0.001 for distribution)
  Emergency department: 47.5 / 32.6
  Operating/recovery room: 17.4 / 13.9
  Floor/direct admit: 14.3 / 16.2
  Other hospital: 10.7 / 23.9
  Step-down unit or telemetry: 8.8 / 10.5
  Other: 1.3 / 2.9
Age (mean ± SE): 61.83 ± 0.07 / 62.12 ± 0.07 / 0.002
Acute physiology score (mean ± SE): 41.34 ± 0.11 / 45.49 ± 0.12 / < 0.001
Length of stay before ICU admission, d: 1.02 ± 0.01 / 1.36 ± 0.01 / < 0.001
ICU length of stay, d: 3.19 ± 0.02 / 3.63 ± 0.02 / < 0.001
Hospital length of stay, d: 8.61 ± 0.04 / 10.63 ± 0.04 / < 0.001
ICU mortality, %: 7.7 / 8.5 / < 0.001
Hospital mortality, %: 11.8 / 12.7 / < 0.001
Top five medical diagnostic groups
  Yes: drug overdose, pulmonary sepsis, upper GI bleed, urinary tract sepsis, chronic obstructive pulmonary disease
  No: congestive heart failure, stroke, rhythm disturbance, cardiac arrest, upper GI bleed
Top three operative diagnostic groups
  Yes: GI neoplasm, cranial neoplasm, liver transplant
  No: GI neoplasm, cranial neoplasm, GI perforation
GI = gastrointestinal.


RESULTS

We retrospectively analyzed data for 174,001 patients admitted to 99 ICUs at 47 U.S. hospitals from January 1, 2008, to December 31, 2012. Of these patients, 64,705 (36.8%) met the exclusion criteria for one or more models. Figure 1 shows the number of patients excluded and the reasons for exclusion. Of the 109,926 patients meeting inclusion criteria, 54,622 (49.7%) did not have MPM0-III data collected, either because no data were collected or because data were collected only for a random sample of patients. Thus, 55,304 patients from 55 ICUs in 38 hospitals were used for model comparisons.

Table 1 shows the characteristics of the ICUs and hospitals for the 55,304 included admissions. There was adequate variability across geographic region, teaching status, and hospital bed size. Half of the ICUs were mixed medical-surgical, 20% were surgical, 14.5% were medical, and the remaining 14.6% were either coronary care or neurologic units.

Table 2 shows the characteristics and outcomes for the included admissions. There were significant (p < 0.001) differences between patients at ICUs that did and did not collect MPM0-III data. Most notably, patients at ICUs that did not collect MPM0-III data were more frequently admitted from another hospital, had higher acute physiology scores, longer lengths of stay, and higher mortality. They also differed in which medical diagnoses were the most prevalent.

Table 3 shows the discrimination, calibration, and difference between observed and predicted mortality for each model. The APACHE IVa model had the best discrimination (AU-ROC = 0.88), compared with the MPM0-III (AU-ROC = 0.81) and NQF (AU-ROC = 0.80) models. All pairwise differences in AU-ROC were highly significant (p < 0.001). Although all three models had a high Hosmer-Lemeshow chi-square statistic, the departure from perfect fit was lowest for the APACHE IVa model (219) and worse for the MPM0-III (554) and NQF (760) models. APACHE IVa overpredicted mortality by 1.5%, MPM0-III overpredicted mortality by 3.1%, and NQF underpredicted mortality by 1.2%. Table 3 also shows that the reduction in deviation of observed and predicted values, as reflected by the adjusted Brier score, was better for APACHE IVa (31.0%) than for MPM0-III (16.1%) and NQF (17.8%). Using APACHE IVa to predict mortality for patients who did not have MPM0-III data, the adjusted Brier score was 27.7%.

Figure 2 displays the annual adjusted Brier scores for the three models from 2008 to 2012. Adjusted Brier scores for all three models trended downward, but the gap between APACHE IVa and the MPM0-III and NQF models was consistent during this 5-year period. Figure 3 shows the adjusted Brier score by hospital teaching status, operative status, and whether the patient was mechanically ventilated.

Figure 2. Annual adjusted Brier score (percent improvement from random prediction) for the Mortality Probability Admission Model III (gray circle with solid line), National Quality Forum (gray circle with dashed line), and Acute Physiology and Chronic Health Evaluation IVa (black circle with solid line) hospital mortality models.

Figure 3. Adjusted Brier score (percent improvement from random prediction) for the Mortality Probability Admission Model III (gray bar), National Quality Forum (light shaded gray bar), and Acute Physiology and Chronic Health Evaluation IVa (black bar) hospital mortality models, stratified by preselected subgroups. COTH = Council of Teaching Hospital, Teach = non-COTH teaching hospital, Non-teach = nonteaching hospital, elective = elective surgery, emergency = emergency surgery, MV = mechanical ventilation.


Table 3. Comparison of Performance Measures for the Mortality Probability Admission Model III, National Quality Forum, and Acute Physiology and Chronic Health Evaluation IVa Hospital Mortality Models on 55,304 Admissions to 55 ICUs

Performance measure, by predictive model: APACHE IVa / MPM0-III / NQF
Area under the receiver operating characteristic curve: 0.880 / 0.805 / 0.801
Hosmer-Lemeshow chi-square statistic: 219 / 554 / 760
Adjusted Brier score (reduction in variability from random prediction): 31.0% / 16.1% / 17.8%
Difference between observed and predicted mortality, all patients: –1.5% / –3.1% / 1.2%
APACHE = Acute Physiology and Chronic Health Evaluation, MPM0-III = Mortality Probability Admission Model III, NQF = National Quality Forum.

APACHE IVa had the highest (best) adjusted Brier score for each of these subgroups; the MPM0-III and NQF models had lower values. All of the models performed better on mechanically ventilated patients than on nonventilated patients. None of the models was exceptional at predicting mortality in elective surgery patients.

Figure 4 displays the calibration curves for each model. With the exception of one decile, the APACHE model's predictions were close to the observed mortality. The MPM model severely overpredicted mortality throughout the risk range. The NQF model underpredicted mortality (except for the last decile), sometimes substantially.

Figure 5 shows the difference between observed and predicted mortality for the seven selected diagnostic groups. The NQF model had the largest deviation of the three models for patients with bacterial pneumonia, hepatic failure, and sepsis of gastrointestinal origin; MPM0-III had the largest deviation for patients with cardiac arrest, chronic obstructive pulmonary disease, surgery for gastrointestinal neoplasm, and surgery for gastrointestinal perforation.

DISCUSSION

Analysis of hospital mortality predictions for 55,304 admissions to 55 U.S. ICUs showed that APACHE IVa had better discrimination, calibration, and overall accuracy than the MPM0-III and NQF models. The accuracy of the NQF model was better than that of the original MPM0-III model. We also found that the accuracy of APACHE IVa was superior within subgroups preselected on the basis of operative status (yes/no), mechanical ventilation (yes/no), hospital teaching status, and specific ICU admission diagnoses. Calibration curves demonstrated that APACHE IVa was consistent in its accuracy across the range of risk, unlike the MPM and NQF models, which showed poor calibration.

There has not been a consensus about whether discrimination or calibration is a better measure of a predictive model's accuracy (28). We therefore presented both measures as well as the adjusted Brier score; the latter contains elements of both discrimination and calibration and provides an intuitive way to describe model accuracy. The similarity in the downward trends of annual adjusted Brier scores for the three models suggests that their deterioration in accuracy over time had a minimal impact on differences in model performance.

It has been repeatedly shown that predictive accuracy is influenced by how well a model accounts for case-mix differences (2, 11, 29, 30). We believe that differences in the complexity of the three models are the primary reason for differences in model accuracy. MPM0-III has 16 predictor variables plus seven interaction terms, NQF 19 predictor variables plus 28 interaction terms, and APACHE IVa 142 predictor variables (12–14). It is intuitive that the lower expected mortality for asthma differs from the higher rate for bacterial pneumonia, but MPM0-III and NQF forego the added data collection burden of diagnosis and assume that case mix will average out in most circumstances. APACHE IVa, on the other hand, incorporates 116 specific ICU admission diagnoses as well as the extent of abnormality of 17 physiological variables to generate expected mortality. We believe that this added complexity is the most important reason for the superior accuracy of APACHE IVa in the aggregate and within patient subgroups. However, it is possible that APACHE's better predictive performance is a result of using data over the first day rather than the first hour after admission. Our findings agree with and extend the suggestion by Keegan et al (11) that the larger number of predictor variables in APACHE IV is associated with better case-mix adjustment and superior predictive accuracy.

Our results also support previous findings regarding the accuracy of current prognostic models. In the multihospital study by Kuzniewicz et al (10), the accuracy of APACHE IV was superior to MPM0-III, both with and without recalibration. At two U.S. tertiary care institutions with high acuity and frequent hospital transfers, APACHE IV was more accurate than MPM0-III (11, 31). APACHE IV was also more accurate than MPM0-III and Simplified Acute Physiology Score 3 in a medical ICU at an Indian tertiary care center (32). We are not aware of prior accuracy comparisons using the NQF model.

The effectiveness of a predictive model across hospitals depends not only on accuracy but also on the extent of patient coverage.


Figure 4. Calibration curves of hospital mortality for the Acute Physiology and Chronic Health Evaluation (APACHE), Mortality Probability Admission Model (MPM), and National Quality Forum (NQF) models. Observed % = black circle, predicted % = gray circle.

The three models compared in this study all had exclusion criteria but differed substantially on which patients should be excluded: APACHE eliminated 10.7% of admissions, the NQF model 20.1%, and MPM0-III 24.1%. Model-specific exclusions are likely to limit the use of the MPM0-III in coronary and cardiac surgical ICUs. The NQF excludes coronary artery bypass surgery and trauma patients, which would preclude its use in cardiac surgical and trauma ICUs. In a prior study, the exclusion criteria of four older prognostic systems resulted in the elimination of 11.5–14.6% of ICU admissions in a large multi-institutional database (33). This earlier study also showed that differences in predictive accuracy at the ICU level were influenced by exclusion criteria. Our results extend these patient-related findings to contemporary models.

Another consideration when comparing models is the burden of data collection. Although APACHE's increased complexity results in improved accuracy, it also requires a much larger set of data to collect; in particular, a primary admission diagnosis must be entered. Manual data collection has been reported to take less time per patient for MPM0-III (11.1 min) than for APACHE IV (37.3 min) (10). However, with the increasing use of electronic data collection, this distinction is largely obviated. Automated collection has been shown to reduce the time to collect APACHE IV data to 1.5 minutes per patient (34). In addition, automation improves data completeness (35) and results in changes in severity-adjusted mortality rates (36).

This study has important implications for national benchmarking. First, APACHE IVa is more accurate than the MPM0-III and NQF models. These findings strongly suggest that providers of quality measures and regulatory agencies should base model endorsement on comparisons of model accuracy using identical data. Second, none of the models had outstanding calibration. Figure 2 demonstrates that the accuracy of each model needs to be validated periodically and either updated or replaced by another predictive model. Finally, benchmarking initiatives should consider which models exclude the fewest patients, in order to make the core measures meaningful to the largest number of patients.

Although this report presents a rigorous comparison of predictive model performance among U.S. patients, there are several limitations to our study. First, our results cannot be interpreted as representative of all ICU patients in the United States or other countries, because data were collected only in U.S. hospitals that elected to measure ICU performance using MPM0-III and APACHE IV. Second, we did not compare the impact of differences in model accuracy on the evaluation of ICU performance based on comparison of observed versus predicted hospital mortality at the unit level. We plan to do this in a future study but need to acquire data for a larger number of eligible patients to narrow CIs for each ICU. Third, model-specific exclusion of a large number of patients might have led to bias in the selection of study patients. Finally, not every ICU elected to collect data supporting MPM0-III and NQF predictions. The patients in these ICUs tended to have higher severity of illness and different diagnoses. However, the APACHE IVa model still had a relatively high adjusted Brier score in this subgroup of patients.

CONCLUSIONS

APACHE IVa demonstrated superior predictive accuracy compared with MPM0-III and its derivative provided by the National Quality Forum. The choice of a predictive model as a core measure should be based on patient coverage and on a comparison of model accuracy using data for identical patients.

Figure 5. Percent difference between observed and predicted mortality for the Mortality Probability Admission Model III (gray bar), National Quality Forum (light shaded gray bar), and Acute Physiology and Chronic Health Evaluation IVa (black bar) hospital mortality models in seven diagnostic subgroups. GI = gastrointestinal.

ACKNOWLEDGMENT

We thank R. Adams Dudley, MD, MBA, Associate Director, Research, Philip R. Lee Institute for Health Policy Studies, for providing the ICOM/NQF mortality equation.

REFERENCES
1. Harrison DA, Parry GJ, Carpenter JR, et al: A new risk prediction model for critical care: The Intensive Care National Audit & Research Centre (ICNARC) model. Crit Care Med 2007; 35:1091–1098
2. Brinkman S, Abu-Hanna A, van der Veen A, et al: A comparison of the performance of a model based on administrative data and a model based on clinical data: Effect of severity of illness on standardized mortality ratios of intensive care units. Crit Care Med 2012; 40:373–378
3. Metnitz B, Schaden E, Moreno R, et al; ASDI Study Group: Austrian validation and customization of the SAPS 3 Admission Score. Intensive Care Med 2009; 35:616–622
4. Niskanen M, Reinikainen M, Pettilä V: Case-mix-adjusted length of stay and mortality in 23 Finnish ICUs. Intensive Care Med 2009; 35:1060–1067
5. Render ML, Deddens J, Freyberg R, et al: Veterans Affairs intensive care unit risk adjustment model: Validation, updating, recalibration. Crit Care Med 2008; 36:1031–1042
6. Breslow MJ, Badawi O: Severity scoring in the critically ill: Part 1–Interpretation and accuracy of outcome prediction scoring systems. Chest 2012; 141:245–252
7. Breslow MJ, Badawi O: Severity scoring in the critically ill: Part 2–Maximizing value from outcome prediction scoring systems. Chest 2012; 141:518–527
8. Keegan MT, Gajic O, Afessa B: Severity of illness scoring systems in the intensive care unit. Crit Care Med 2011; 39:163–169
9. Vincent JL, Moreno R: Clinical review: Scoring systems in the critically ill. Crit Care 2010; 14:207
10. Kuzniewicz MW, Vasilevskis EE, Lane R, et al: Variation in ICU risk-adjusted mortality: Impact of methods of assessment and potential confounders. Chest 2008; 133:1319–1327
11. Keegan MT, Gajic O, Afessa B: Comparison of APACHE III, APACHE IV, SAPS 3, and MPM0-III and influence of resuscitation status on model performance. Chest 2012; 142:851–858
12. Philip R. Lee Institute for Health Policy Studies: ICU outcomes (mortality and length of stay) methods, data collection tool, and data. Available at: http://healthpolicy.ucsf.edu/content/icu-outcomes. Accessed April 1, 2013
13. Zimmerman JE, Kramer AA, McNair DS, et al: Acute Physiology and Chronic Health Evaluation (APACHE) IV: Hospital mortality assessment for today's critically ill patients. Crit Care Med 2006; 34:1297–1310
14. Higgins TL, Teres D, Copes WS, et al: Assessing contemporary intensive care unit outcome: An updated Mortality Probability Admission Model (MPM0-III). Crit Care Med 2007; 35:827–835
15. Higgins TL, Kramer AA, Nathanson BH, et al: Prospective validation of the intensive care unit admission Mortality Probability Model (MPM0-III). Crit Care Med 2009; 37:1619–1623
16. Nathanson BH, Higgins TL, Kramer AA, et al: Subgroup mortality probability models: Are they necessary for specialized intensive care units? Crit Care Med 2009; 37:2375–2386
17. Moreno RP, Metnitz PG, Almeida E, et al; SAPS 3 Investigators: SAPS 3–From evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 2005; 31:1345–1355
18. Moreno RP, Hochrieser H, Metnitz B, et al: Characterizing the risk profiles of intensive care units. Intensive Care Med 2010; 36:1207–1212
19. Moreno RP, Bauer P, Metnitz PG: Characterizing performance profiles of ICUs. Curr Opin Crit Care 2010; 16:477–481
20. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36
21. DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 1988; 44:837–845
22. Hosmer DW, Lemeshow S: Applied Logistic Regression. Second Edition. New York, Wiley, 2000
23. Brier GW: Verification of forecasts expressed in terms of probability. Mon Weather Rev 1950; 75:1–3
24. Steyerberg EW: Chapter 15. In: Clinical Prediction Models. New York, Springer Science+Business Media, 2009, p 257
25. Kramer AA, Zimmerman JE: Assessing the calibration of mortality benchmarks in critical care: The Hosmer-Lemeshow test revisited. Crit Care Med 2007; 35:2052–2056
26. Harrison DA, Brady AR, Parry GJ, et al: Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med 2006; 34:1378–1388
27. Bakhshi-Raiez F, Peek N, Bosman RJ, et al: The impact of different prognostic models and their customization on institutional comparison of intensive care units. Crit Care Med 2007; 35:2553–2560
28. Cook NR: Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007; 115:928–935
29. Murphy-Filkins R, Teres D, Lemeshow S, et al: Effect of changing patient mix on the performance of an intensive care unit severity-of-illness model: How to distinguish a general from a specialty intensive care unit. Crit Care Med 1996; 24:1968–1973
30. Glance LG, Osler T, Shinozaki T: Effect of varying the case mix on the standardized mortality ratio and W statistic: A simulation study. Chest 2000; 117:1112–1117
31. Hixson E, Guzman J: Agreement between commonly applied mortality prediction models for medical intensive care patients in a large academic medical center setting. Abstr. Crit Care Med 2010; 38:11
32. Juneja D, Singh O, Nasa P, et al: Comparison of newer scoring systems with the conventional scoring systems in general intensive care population. Minerva Anestesiol 2012; 78:194–200
33. Wunsch H, Brady AR, Rowan K: Impact of exclusion criteria on case mix, outcome, and length of stay for the severity of disease scoring methods in common use in critical care. J Crit Care 2004; 19:67–74
34. Lombardozzi K, Bible S, Eckman J, et al: Evaluation of efficiency and accuracy of a streamlined data entry process into an outcomes database. Abstr. Crit Care Med 2009; 37:758
35. Bosman RJ, Oudemans-van Straaten HM, Zandstra DF: The use of intensive care information systems alters outcome prediction. Intensive Care Med 1998; 24:953–958
36. Reinikainen M, Mussalo P, Hovilehto S, et al: Association of automated data collection and data completeness with outcomes of intensive care. A new customized model for outcome prediction. Acta Anaesthesiol Scand 2012; 56:1114–1122

Appendix 1. Characteristics and Outcomes Collected for Each Patient Included in the Hospital Mortality Predictions

Variable: Measurement
Gender: Male (reference), female
Race: White (reference), black, Hispanic, other
Age: Continuous measure
APS variables on ICU day 1 and at ICU discharge: Weight determined by most abnormal value on ICU day 1 and day of discharge; the sum of weights equals the APS, which ranges from 0 to 252. Variables include pulse rate, mean blood pressure, temperature, respiratory rate, Pao2:Fio2 ratio (or P(a-a)o2 for intubated patients with Fio2 > 0.5), hematocrit, WBC count, creatinine, urine output, blood urea nitrogen, sodium, albumin, bilirubin, glucose, acid-base abnormalities, and neurologic abnormalities based on Glasgow Coma Score; continuous measure
Chronic health items: AIDS, cirrhosis, hepatic failure, immunosuppression, lymphoma, leukemia or myeloma, metastatic tumor; not used for elective surgery patients; binary variable created for 0 vs > 0 items
ICU admission diagnosis: 116 Acute Physiology and Chronic Health Evaluation IV categories (see reference [13])
Location prior to ICU admission: Operating or recovery room, emergency department, acute care floor (reference), step-down unit, transfer from another ICU, other hospital, direct ICU admission from home, and other/unknown
Length of stay before ICU admission: Square root of time from hospital admission to ICU admission (in fractional days)
Discharge location: Acute care floor (reference), step-down unit, other
ICU length of stay, first admission: Continuous measure, truncated at 30.0 d
Patient admitted after emergency surgery: Yes, no
Glasgow Coma Score: 3–15
Unable to assess Glasgow Coma Score due to sedation/paralysis on day 1 and discharge: Yes, no
Ventilated on day 1 and discharge: Yes, no
Duration of mechanical ventilation: Continuous measure, truncated at 30.0 d
Pao2:Fio2 ratio: Continuous measure
Chronic renal failure(a): Yes, no
Acute renal failure(a): Yes, no
Cardiac dysrhythmia(a): Yes, no
Cerebrovascular incident(a): Yes, no
Intracranial mass effect(a): Yes, no
Gastrointestinal bleeding(a): Yes, no
Cardiopulmonary resuscitation within 24 hr prior to ICU admission: Yes, no
Full code (no do-not-resuscitate restrictions)(a): Yes, no
Mortality before hospital discharge: Yes, no
Mortality before ICU discharge: Yes, no
APS = Acute Physiology Score.
(a) On admission or within 1 hr after admission.
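Two of the continuous variables above enter the models only after simple transformations; below is a short sketch with assumed (hypothetical) helper names.

```python
import math

def pre_icu_los_feature(days_before_icu: float) -> float:
    """Square root of time from hospital admission to ICU admission."""
    return math.sqrt(days_before_icu)

def truncated_duration(days: float, cap: float = 30.0) -> float:
    """ICU length of stay and ventilation duration are truncated at 30.0 d."""
    return min(days, cap)

print(pre_icu_los_feature(4.0))   # -> 2.0
print(truncated_duration(45.2))   # -> 30.0
```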

Appendix 2. Equation for the ICU Outcomes Model/National Quality Forum Hospital Mortality Model

Main Effect: Coefficient
Constant: –3.77660
Coma/deep stupor (Glasgow Coma Scale 3 or 4): 1.17740
Heart rate ≥ 150 beats/min: 1.37961
Systolic blood pressure ≤ 90: 0.93005
Chronic renal insufficiency: 0.47906
Cirrhosis: 2.04594
Metastatic neoplasm: 3.25668
Acute renal failure: 1.45419
Cardiac dysrhythmia: 0.64270
Cerebrovascular incident: 1.44617
GI bleed: 0.99711
Intracranial mass effect: 0.11814
Age (per year): 0.02070
Age spline middle: 0.00432
Age spline upper: 0.00426
CPR before admission: 2.42896
Mechanical ventilation within 1 hr of admission: 1.63628
Elective surgery: –1.45047
Full code: –1.91206
Zero risk factors: –0.80970

Interaction Term: Coefficient
ACUTEREN_FULLCODE: 0.18697
ACUTEREN_RENALHIST: –0.58166
ACUTEREN_VENT: –0.27488
ARRHY_CPR: –0.51759
ARRHY_HEARTRATE: –0.59134
ARRHY_VENT: 0.21406
CANCER_ELECSURG: –0.60384
CANCER_SBP: –0.27854
CANCER_VENT: –0.19247
CANCER_FULLCODE: 0.17190
CIRRHOS_GIBLD: –0.65371
CVI_IME: 0.60941
CVI_VENT: 0.66294
ELECSURG_VENT: –0.34059
GIBLD_VENT: 0.37392
RENALHIST_SBP: 0.19529
SBP_VENT: –0.22969
Age × heart rate ≥ 150 beats/min: –0.00828
Age × cirrhosis: –0.01332
Age × metastatic neoplasm: –0.02895
Age × acute renal failure: –0.00993
Age × cardiac dysrhythmia: –0.00691
Age × cerebrovascular incident: –0.01487
Age × GI bleed: –0.01347
Age × CPR before admission: –0.01440
Age × mechanical ventilation: –0.00611
Age × elective surgery: 0.00821
Age × full code: 0.01305

GI = gastrointestinal, CPR = cardiopulmonary resuscitation.
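The coefficients above define an ordinary logistic regression: a patient's linear predictor is the constant plus the sum of the applicable main-effect and interaction coefficients (age terms are multiplied by age in years), and predicted hospital mortality is the inverse logit of that sum. The sketch below illustrates the form using only a small subset of the tabulated coefficients and a hypothetical patient; a faithful implementation must include every main effect, age spline, and interaction term exactly as tabulated, so the printed value is illustrative only.

```python
import math

# Illustrative subset of the Appendix 2 coefficients. A real implementation
# must include every main effect, age spline, and interaction term above.
COEF = {
    "constant": -3.77660,
    "cirrhosis": 2.04594,
    "gi_bleed": 0.99711,
    "cirrhos_gibld": -0.65371,   # cirrhosis x GI bleed interaction
    "age_per_year": 0.02070,
    "age_x_gi_bleed": -0.01347,
}

def nqf_risk_sketch(age: float, cirrhosis: bool, gi_bleed: bool) -> float:
    """Partial linear predictor -> inverse-logit mortality probability."""
    xb = COEF["constant"] + COEF["age_per_year"] * age
    if cirrhosis:
        xb += COEF["cirrhosis"]
    if gi_bleed:
        xb += COEF["gi_bleed"] + COEF["age_x_gi_bleed"] * age
    if cirrhosis and gi_bleed:
        xb += COEF["cirrhos_gibld"]
    return 1.0 / (1.0 + math.exp(-xb))

# Hypothetical patient: 70 years old, with cirrhosis and GI bleeding.
print(f"{nqf_risk_sketch(70, True, True):.3f}")  # -> 0.293 (partial model only)
```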

