Nursing Routine Data as a Basis for Association Analysis in the Domain of Nursing Knowledge

Björn Sellemann, RN, PhD1, Jürgen Stausberg, MD2, Ursula Hübner, PhD3

1 Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Lower Saxony, Germany; 2 IBE, Medical Faculty, Ludwig-Maximilians-Universität München, München, Bavaria, Germany; 3 Healthcare Informatics Unit, Department of Business Management and Social Sciences, University of Applied Sciences Osnabrück, Osnabrück, Lower Saxony, Germany

Abstract

This paper describes the data mining method of association analysis within the framework of Knowledge Discovery in Databases (KDD), with the aim of identifying standard patterns of nursing care. The approach is application-oriented and is applied to nursing routine data recorded with the method LEP Nursing 2. The increasing use of information technology in hospitals, especially of nursing information systems, entails the storage of large data sets, which hitherto have not always been analyzed adequately. Three association analyses, for the days of admission, surgery and discharge, were performed. The results of almost 1.5 million generated association rules indicate that it is valid to apply association analysis to nursing routine data. All rules are semantically trivial, since they reflect existing knowledge from the domain of nursing. This may be due either to the method LEP Nursing 2 or to the nursing activities themselves. Nonetheless, association analysis may in future become a useful analytical tool on the basis of structured nursing routine data.

Introduction

International1 as well as German2 studies have shown that implementing and documenting the nursing process in everyday practice is often difficult. Within the nursing process, as soon as a problem is identified, a diagnosis is made by means of nursing diagnoses. On the basis of the nursing aims, the problem is then addressed through appropriate nursing interventions.
According to Bulechek & McCloskey3, any direct nursing activity performed by a nurse in the patient’s interest is defined as a nursing intervention. The analysis by Güttler and Görres2, carried out in Germany, showed, however, that nursing is documented only infrequently during ward routine. Formulating the nursing problems and the expected goals/outcomes was observed to be a difficult task for nurses. One reason for these difficulties could be that all documentation is paper-based. It has already been stated that an electronic documentation system could provide valuable assistance in the form of pre-formulated problems, goals and results4. In the German healthcare system, especially in hospitals, the use of information technology has increased in recent years5,6. In addition to the classical applications of a hospital information system (HIS), such as administrative and clinical ones, nursing information systems are being used more widely7. With the acquisition and storage of structured nursing data within the HIS, it has become much easier to depict the extensive range of patient care tasks in the inpatient hospital setting. In Germany, however, data about nursing interventions are not captured systematically8, so that only structural data on staff numbers exist9. All German hospitals are obliged to collect a considerable amount of administrative data for billing purposes. In addition to these administrative routine data, medical and nursing routine data are recorded as well. These are data recorded in the course of daily routine; thus, they represent the scope of a hospital’s services. They are often referred to as “already-there data”. Because they arise in the actual care situation and thus reflect medical and nursing reality, routine data are of great importance for research and practice. Many of the methods of classical statistics are neither designed for nor suited to today’s enormous data sets10.
Due to the development of affordable digital recording and data storage, a new research area evolved about 20 years ago: the analysis of large amounts of data with respect to new, previously unknown patterns. In 1989, the term “Knowledge Discovery in Databases (KDD)” was used for the first time, as the name of a workshop in Detroit (USA)11. In this paper, the term “Knowledge Discovery in Databases (KDD)” is used to describe the complete process of knowledge discovery. According to Fayyad et al.12, KDD is the nontrivial process of discovering valid, new, useful, and easily understandable patterns in databases. One step of the KDD process is data mining, whose task is to apply methods of data analysis and discovery algorithms in order to discover patterns in the data sets13. The data mining method of association analysis, for example, was developed after the introduction of scanner checkout systems in the early 1990s14. Association rules always describe correlations between objects occurring together. These objects can be individual retail articles or objects from other applications, such as medicine15,16 or banking17. Duan18 uses association analysis in order to identify correlations between nursing diagnoses. This knowledge is then used to support electronic nursing documentation in the form of a recommender system. A Nursing Minimum Data Set (NMDS) is defined as “a minimum set of items [or elements] of information with uniform definitions and categories, concerning a specific aspect or dimension of nursing, which meets the information needs of multiple data users in the health care system”19. The NMDS includes three groups of elements: (1) nursing care elements such as nursing diagnoses, interventions, outcomes, and intensity of nursing care, (2) patient or client demographic elements, and (3) service elements. Park et al.20 employ association analysis to find out how nursing contributes to personal health. Their study data came from a nursing information system containing the elements of the NMDS. The nursing information system used the North American Nursing Diagnosis Association Classification (NANDA), the Nursing Intervention Classification (NIC), and the Nursing Outcome Classification (NOC). This paper tries to answer the question of whether the data mining method of association analysis is applicable to the nursing intervention data of the method “Leistungserfassung in der Pflege (LEP)”, and whether day-specific intervention patterns of nursing can be identified. In Germany, the scientific method LEP21 is used, in the context of electronic nursing documentation, to capture data on the nursing services performed. It is a method of statistical collection, calculation, and presentation that, on the basis of a list of nursing interventions in acute hospitals, retrospectively records/estimates the daily care activities21. Whether association analysis is applicable is tested for the days of admission, surgery, and discharge, as those are the days on which nursing and/or administrative effort usually increases.

Methods and Materials

The 6-step KDD process model by Cios et al.22 forms the methodological framework of this paper. The iterative KDD process model requires regular interventions by the analyst13.
For these interventions, however, the literature offers only a few vague and unevaluated decision aids, because the decisions depend strongly on the domain. Through the data mining method of association analysis, association rules of the form X→Y, where X and Y are subsets of a data collection, are generated14. Following Agrawal et al.14, the following formal model is employed in this paper: The nursing interventions of the method LEP Nursing 2 are given by I={i1, i2, i3, ..., in}; the transaction database is given by T={t1, t2, t3, ..., tm}. A transaction consists of one or more nursing interventions from I that are considered together within an analysis, and of a transaction identification that allows the transaction to be matched. An association rule X→Y consists of the antecedent (X ⊆ I) and the consequent (Y ⊆ I). Antecedent and consequent are non-empty, disjoint sets of items23. When using the Apriori Algorithm24, X may be composed of several objects from I. Y, however, always consists of only one object from I. The relevance of a rule is rated on the basis of measures of interestingness such as support23, confidence23 and lift25. Support (Supp(X→Y)), the share of transactions that contain both X and Y, indicates the relevance of an association rule; confidence (Conf(X→Y)), the share of transactions containing X that also contain Y, measures the strength of the rule14. Lift(X→Y) is the ratio of Conf(X→Y) to the probability that the consequent would have if X and Y were independent of each other, that is, the relative frequency of Y26. The Apriori Algorithm we used consists of two phases: 1) Calculation of all sets of items for which Supp(X→Y) is greater than or equal to the threshold value MinSup(X→Y). These sets of items are called “Large Itemsets”; their length is described by k24. 2) With the aid of the so-called apriori-gen function24, all rules whose confidence is at least MinConf(X→Y) are derived from the “Large Itemsets”.
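As an illustration only, the three measures and the two phases described above can be sketched in Python. The toy transactions, the intervention names, and all function names are assumptions made for this sketch; they are not taken from the study data or from the software used in the study. The parameter max_len limits the itemset length and is only loosely related to the setting k. Exact fractions are used so that threshold comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy data: one set of coded interventions per case and day.
transactions = [
    {"documentation", "food_drink", "blood_sample"},
    {"documentation", "food_drink"},
    {"documentation", "food_drink", "mobilisation"},
    {"documentation", "blood_sample"},
    {"food_drink", "mobilisation"},
]

def support(itemset, transactions):
    """Supp: share of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def frequent_itemsets(transactions, min_sup, max_len):
    """Phase 1: level-wise search for itemsets with Supp >= min_sup."""
    items = sorted({item for t in transactions for item in t})
    level = {frozenset([i]) for i in items}
    frequent = {}
    size = 1
    while level and size <= max_len:
        kept = {}
        for s in level:
            supp = support(s, transactions)
            if supp >= min_sup:
                kept[s] = supp
        frequent.update(kept)
        # Join step: unite frequent sets into candidates one item longer.
        joined = {a | b for a in kept for b in kept if len(a | b) == size + 1}
        # Prune step: every subset of the current size must itself be frequent.
        level = {c for c in joined
                 if all(frozenset(sub) in kept for sub in combinations(c, size))}
        size += 1
    return frequent

def rules(frequent, min_conf):
    """Phase 2: derive rules X -> y with a single-item consequent."""
    found = []
    for itemset, supp_xy in frequent.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            x = itemset - {y}
            conf = supp_xy / frequent[x]            # Conf = Supp(X and Y) / Supp(X)
            if conf >= min_conf:
                lift = conf / frequent[frozenset([y])]  # Lift = Conf / Supp(Y)
                found.append((tuple(sorted(x)), y, supp_xy, conf, lift))
    return found

# Toy thresholds: MinSup = 2/5 (far above the 1% of the study), MinConf = 7/10.
found = rules(frequent_itemsets(transactions, Fraction(2, 5), 3), Fraction(7, 10))
for x, y, supp, conf, lift in sorted(found):
    print(" + ".join(x), "->", y, f"Supp={supp} Conf={conf} Lift={lift}")
```

On these toy data four rules pass both thresholds; blood_sample → documentation, for instance, has Supp = 2/5, Conf = 1 and Lift = 5/4, whereas the reverse rule is dropped because its confidence is only 1/2.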
The association analysis is controlled by the two threshold values, MinSup(X→Y) and MinConf(X→Y), and by the value of k. The aim of an analysis is to identify all rules in a set of transactions that reach the pre-defined lower threshold values MinSup(X→Y) and MinConf(X→Y)14. The higher the threshold values, the fewer rules are identified. Three pseudonymized MS Access 2003 tables (case-based data, LEP data, cases treated in 2006) with over 45,000 inpatients treated at the University Hospital of Essen form the database of this study. The respective inpatient cases have admission dates after 31 December 2005 and discharge dates before 1 January 2007, and comprise approximately 10.5 million documented nursing interventions of the method LEP Nursing 2. The database for the analysis of the day of surgery contains 13,156 valid cases with an average age of 52.7 years and an average length of stay of 9.9 days. The database contains over 40,000 valid treatment cases each for the days of admission (42,243 cases) and discharge (43,429 cases). The average length of stay in both of these study groups was 6.8 days; the average age was 48 years. In all three data sets, the sex ratio is balanced and identical (male: 52%, female: 48%). The data preparation and the association analysis were performed with SPSS Clementine 10.1 for Windows. For the analysis of the day of surgery, only those cases were included that underwent at least one operation during their stay and that had the LEP information variable 21.01 (=surgery performed). The settings of the Apriori modeling node are identical for the three day-specific association analyses. The threshold values were adopted from the study of Ordonez et al.16 (MinSup(X→Y)=1%, MinConf(X→Y)=70%, k=4). All of the 119 LEP Nursing 2 care variables may appear in the antecedent as well as in the consequent.

Results

All three day-specific association analyses were completed successfully.
For the analysis of the day of surgery, however, the value of k had to be reduced to k=3 for methodological reasons (no rules with k=4 could be generated). Accordingly, the maximum number of variables in the antecedent is three and not four, as in the other two analyses. Due to the large number of generated rules, a ranking method is used for the presentation of the results: for each of the three measures of interestingness, the rules within the hit list are sorted according to the respective values. The most relevant rules on the scale of interestingness are presented at the top, while less relevant rules appear at the bottom of the ranking list. For presentation and content analysis of the results, a “cut-off value” was specified. The “cut-off value” was based not on a study but on the experience, intuition and expectations of the analyst27. Only the first ten rules/ranks (top 10 rules) of each scale of interestingness are considered in this paper. 961,295 rules were generated for the association analysis of the day of surgery, 126,914 for the day of discharge, and 347,364 for the day of admission (Table 1). Despite the reduction of k in the model settings for the analysis of the day of surgery, the largest number of rules was generated for this day. The distributions of rules for the days of admission and discharge differ only slightly in terms of complexity/length of itemsets (k). Rules with four variables in their antecedents account for 79.90% and 74.68% of all rules, respectively. The distribution of rule complexity on the day of surgery, however, differs markedly from that of the two other days. Although the antecedents on this day contain at most three variables, it is remarkable that almost 95% of the total number of rules are rules with this highest level of complexity. In summary, the rule with the maximum value for Supp(X→Y) is identical in all three analyses: the simple, elementary rule “nursing documentation, simple → Food/Drink, simple” has the maximum values for Supp(X→Y) (85.143%, 77.180% and 76.912%).
The value of Supp(X→Y), however, is considerably less than the maximum (100%), even though the rule is composed of two elementary nursing interventions carried out routinely with almost every patient. Unlike the rules with the maximum values for Supp(X→Y), the rules with the maximum values for Conf(X→Y) differ considerably in their composition. On the day of admission, for example, the probability that the nursing interventions “taking of a blood sample, complex + theme-centered nurse-patient communication/instruction, long + Administration/Coordination, complex + prepare bed/cot, complex” (antecedent) are accompanied by the nursing intervention “nursing documentation, simple” (consequent) is 100%. The value for Conf(X→Y) is also 100% for eleven rules on the day of admission and for 1,002 rules on the day of surgery. The rules with the maximum value for Lift(X→Y) also differ greatly in their composition (day of surgery: 34.154; day of admission: 39.209; day of discharge: 13.520).

Table 1. Results of the day-specific association analyses

Study period             Day of Surgery      Day of Admission    Day of Discharge
Valid treatment cases    13,156              42,243              43,429
Total number of rules    961,295 (100%)      347,364 (100%)      126,914 (100%)
Antecedent k=1           1,196 (0.12%)       295 (0.08%)         195 (0.15%)
Antecedent k=2           48,510 (5.05%)      6,767 (1.95%)       4,008 (3.16%)
Antecedent k=3           911,589 (94.83%)    62,750 (18.07%)     27,932 (22.01%)
Antecedent k=4           --                  277,552 (79.90%)    94,779 (74.68%)
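
The figures in Table 1 can be cross-checked with a few lines of Python. This is a minimal sketch that only re-adds the counts printed in the table; the variable and function names are assumptions of the sketch, and recomputed percentages may differ from the printed ones by 0.01 percentage points because of rounding.

```python
# Rule counts per antecedent length (k=1, k=2, ...), copied from Table 1.
table = {
    "surgery":   {"total": 961_295, "by_k": [1_196, 48_510, 911_589]},
    "admission": {"total": 347_364, "by_k": [295, 6_767, 62_750, 277_552]},
    "discharge": {"total": 126_914, "by_k": [195, 4_008, 27_932, 94_779]},
}

def shares(day):
    """Percentage of all rules of one day per antecedent length."""
    row = table[day]
    return [round(100 * n / row["total"], 2) for n in row["by_k"]]

# The counts per antecedent length add up to the total of each day.
for day, row in table.items():
    assert sum(row["by_k"]) == row["total"], day

# Sum over the three days: the "almost 1.5 million" rules of the abstract.
grand_total = sum(row["total"] for row in table.values())
print(grand_total)
for day in table:
    print(day, shares(day))
```

The per-day counts sum to the printed totals, and the three totals together give 1,435,573 rules.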

Discussion

The results show that it is valid to apply the data mining method of association analysis to the identification of patterns of actions in nursing. According to Lechner & Dannecker28, all of the association rules investigated in this work may be assigned to the group of semantically trivial association rules, since they reproduce existing knowledge from the domain of nursing. The reasons for the lack of useful new association rules may be complex. Nursing practice itself may be the reason, as it is questionable what may be defined as a new and useful nursing pattern. It is also possible that the method LEP Nursing 2 is not capable of depicting nursing practice in sufficient detail, which is a prerequisite for the creation of useful association rules. Beyond the method LEP Nursing 2 itself, the coding of nursing care through this method is of particular interest, because the coding of an LEP variable depends on the nurse’s assessment of the contents of the variable/action. If a nurse guides a patient to the bathroom, this action may be assigned either to the LEP category “motion” or to the category “excretion”29, depending on what the nurse perceives to be the primary support area. This example shows how difficult it is to identify a specific nursing action pattern when the components of a rule cannot be assigned unambiguously to a specific nursing action. Thus, not only may the nursing services be interpreted in a variety of ways, but the identified patterns themselves may affect different areas of support, which renders a clear allocation of the patterns almost impossible. Even if specific and identifiable nursing action patterns are contained in the association rules, their values for Supp(X→Y) may be so small, or lie so far in the middle range, that they go unnoticed among the large number of rules. This suggests that the method LEP Nursing 2 is too unspecific and too general for the generation of useful association rules. Nevertheless, with LEP the care providers in the German-speaking countries for the first time possess a scientific instrument to depict the services and achievements of nursing in patient care. The human resources needed to cope with everyday nursing work in the inpatient hospital setting may be represented as well. These nursing services, however, cannot be justified through a pure recording of the service, because a service is not automatically justified by its existence. The instrument LEP Nursing 2 gives no information about why a service has been performed on a patient and cannot, therefore, be used to assess the need for this nursing service. For this purpose, additional information on nursing problems and nursing objectives is required. This information has to be recorded in the nursing documentation system. Otherwise there is no indicator of, and no justification for, the nursing service, as medical diagnoses in themselves are not sufficient to determine nursing requirements30. In this respect, Bartholomeyczik31 holds the view that the nursing effort for patients with a specific medical diagnosis varies strongly and is significantly influenced by several factors. She believes that care has to deal less with the causes of a disease than with the state of being ill and its impact on the individual. Furthermore, the assessments of a patient by a nurse and by a doctor may differ greatly32. When trying to map North American Nursing Diagnosis Association (NANDA) diagnoses to ICD-10 diagnoses, Fischer32 found that only 21% of the NANDA diagnoses can be coded with ICD-10 diagnoses. This fact supports the thesis that medical diagnoses in themselves are not sufficient to identify nursing needs. They are always only a limited indicator for nursing services. This illustrates how important it is to develop and use nursing indicators: to begin with, they help to depict the nursing effort. In addition, they enable the identification of complex nursing services in general, and the successful implementation of nursing association analysis based on routine data in particular. This finding is supported by the study by Park et al.20. On the basis of a Nursing Minimum Data Set (NMDS) including nursing diagnoses, interventions and outcomes, Park et al.20 were able to identify useful association rules. In 2009, German policy makers acknowledged the need for nursing indicators to classify and justify the nursing expenditures incurred33. These expenditures have to be reflected in the G-DRG system for remuneration of health care. This led to the development of scores for complex nursing procedures (Pflegekomplexmaßnahmen-Score (PKMS)), which have been integrated into chapter nine of the German Procedure Classification catalog (Operationen- und Prozedurenschlüssel (OPS)) of 2011. The professional caregivers also realized that the benefit of nursing information systems is only evident if narrative information is collected in a structured manner. Results of the “IT-Report Gesundheitswesen”7 indicate a positive trend towards the use of classifications of nursing activities and of standardized terminologies for nursing diagnoses, nursing interventions and nursing outcomes. Whereas unstructured text prevailed in nursing documentation in 2002, institution-internal taxonomies and the NANDA taxonomy ranked 1st and 2nd in 2007. The results of this work are influenced by the poor diffusion of information technology in the nursing care profession: only 43% of German hospitals have implemented an electronic nursing documentation system6. It is true that the analyses did not generate “new” and “useful” nursing patterns that contain information about previously unknown, but comprehensible nursing contexts. Still, association analysis could be applied successfully to the available nursing routine data.
The insights gained should be viewed as a kind of confidence-building measure, since they reflect the current state of knowledge in the domain of nursing. They emphasize the importance of using taxonomies and classifications in nursing documentation in order to generate transparency of applied nursing activities. This leads to comparable data, which is vital for the development and establishment of a Nursing Minimum Data Set in Germany.

References

1. Boccoli E, Lavazza L, Tomaiuolo M, Brandi A, Melani AS, Trianni G. The content and structure of nursing documentation in Careggi Hospital, Florence, 1998: results and perspectives. Epidemiol Prev. 2001;25(45):174-180.
2. Güttler K, Görres S. Von APLE zu apenio. Wissenschaftlich entwickelte Typologie ist Basis der Pflegeplanungs- und -dokumentationssoftware apenio. PR-Internet/Informatik. 2006;8(5):306-312.
3. Bulechek GM, McCloskey JC. Defining and validating nursing interventions. Nurs Clin North Am. 1992;27(2):289-299.
4. Albrecht I, Fritsche S. Entwicklung und Implementierung einer mobilen elektronischen Patientendokumentation. Die Schwester Der Pfleger. 2009;48(07):673-676.
5. Hübner U, Sellemann B. Current and future use of ICT for patient care and management in German acute hospitals--a comparison of the nursing and the hospital managers’ perspectives. Methods Inf Med. 2005;44(4):528–536.
6. Hübner U, Sellemann B, Frey A, Egbert N, Liebe J, Flemming D. IT-Report Gesundheitswesen: Schwerpunkt Vernetzte Versorgung. Osnabrück: HS Osnabrück; 2010.
7. Sellemann B, Flemming D, Frey A, Hübner U. Informationssysteme in der Pflege: Fortschritt oder Stagnation in den letzten 5 Jahren? 2008; Available from: http://www.egms.de/static/en/meetings/gmds2008/08gmds171.shtml (last access: 07.09.2011).
8. Sermeus W, Goossen W. A Nursing Minimum Data Set. In: Mantas J, Hasman A (Eds.) Textbook in health informatics: a nursing perspective. IOS Press. 2002;65:98-109.
9. Pick P, Brüggemann J, Grote C, Grünhagen E, Lampert T. Schwerpunktbericht der Gesundheitsberichterstattung des Bundes - Schwerpunkt Pflege. Berlin: Robert-Koch-Institut; 2004.
10. Fayyad U. Diving into Databases. Database Programming & Design. 1998;11(3):24–31.
11. Piatetsky-Shapiro G, Frawley W. Knowledge Discovery in Databases. Cambridge: MIT Press; 1991.
12. Fayyad UM, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery: An overview. In: Fayyad UM, Piatetsky-Shapiro G, Uthurusamy R (Eds.) Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, Menlo Park, CA, USA; 1996;1-34.
13. Fayyad U, Piatetsky-Shapiro G, Smyth P. Knowledge Discovery and Data Mining: Towards a Unifying Framework. In: Simoudis E, Han J, Fayyad UM (Eds.) Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR. 1996;82–88.
14. Agrawal R, Imielinski T, Swami A. Mining Association Rules between Sets of Items in Large Databases. SIGMOD Rec. 1993;22(2):207–216.
15. Cios KJ, William Moore G. Uniqueness of medical data mining. Artif Intell Med. 2002;26(1-2):1–24.
16. Ordonez C, Ezquerra N, Santana C. Constraining and summarizing association rules in medical data. Knowl Inf Syst. 2006;9(3):259-283.
17. Eickbusch J. Kundenabwanderungen in Kreditinstituten – Eine empirische Analyse mittels Data-Mining-Methoden für das Privatkundengeschäft einer Großsparkasse. Frankfurt am Main: Fritz Knapp; 2002.
18. Duan L, Street WN, Lu DF. A nursing care plan recommender system using a data mining approach. In: Li J, Aleman D, Sikora R (Eds.) Proc. 3rd INFORMS Workshop on Data Mining in Health Informatics. 2008; Available from: http://morlab.mie.utoronto.ca/DMHI2008/papers/B2_4.pdf.
19. Werley HH, Devine EC, Zorn CR, Ryan P, Westra BL. The Nursing Minimum Data Set: abstraction tool for standardized, comparable, essential data. American Journal of Public Health. 1991;81(4):421.
20. Park M, Park JS, Kim CN, Park KM, Kwon YS. Knowledge discovery in nursing minimum data set using data mining. Taehan Kanho Hakhoe Chi. 2006;36(4):652–61.
21. Brügger U, Bamert U, Maeder C, Odermatt R. Beschreibung der Methode LEP Nursing 2: Leistungserfassung für die Gesundheits- und Krankenpflege. St. Gallen: LEP; 2002.
22. Cios KJ, Teresinska A, Konieczna S, Potocka J, Sharma S. A knowledge discovery approach to diagnosing myocardial perfusion. IEEE Eng Med Biol Mag. 2000;19(4):17–25.
23. Bollinger T. Assoziationsregeln – Analyse eines Data Mining Verfahrens. Informatik-Spektrum. 1996;19(5):257–261.
24. Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C (Eds.) Proc. 20th Int. Conf. Very Large Data Bases, VLDB. San Francisco: Morgan Kaufmann Publishers; 1994;487-499.
25. Adamo JM. Data Mining for Association Rules and Sequential Patterns: Sequential and Parallel Algorithms. Springer; 2001:37-43.
26. Fahrmeir L, Künstler R, Pigeot I, Tutz G. Statistik, der Weg zur Datenanalyse. 2. Aufl. Berlin: Springer-Verlag; 1999;237.
27. Kamber M, Han J. Data Mining: Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann; 2001.
28. Lechner U, Dannecker A. WrapUP – Data Warehousing. 2008; Available from: http://wi.informatik.unibwmuenchen.de/C14/lectures-DataWarehous(WT2008)/Document%20Library/DWDM-WT08Wrapup__01__ad.pdf (last access 07.09.2011).
29. Bamert U. LEP® Nursing 2.1.1 – Beschreibung der Variablen der Methode LEP® für die Gesundheits- und Krankenpflege. Revision B. St. Gallen: LEP; 2004.
30. Eberl I, Bartholomeyczik S, Donath E. Die Erfassung des Pflegeaufwands bei Patienten mit der medizinischen Diagnose Myokardinfarkt. Pflege. 2005;18:364–372.
31. Bartholomeyczik S. Erforderliche Pflege und die geplante Einführung der DRGs. Medizin und Gewissen. Wenn Würde ein Wert würde... Frankfurt am Main: Mabuse. 2002;229–235.
32. Fischer W. Diagnosis Related Groups (DRGs) und Pflege. Huber; 2002.
33. Bundesministerium für Gesundheit (BMG): Zweiter Pflegegipfel – Maßnahmen für bessere Pflege im Krankenhaus vorgestellt – Handlungsempfehlungen zur genauen Abbildung von pflegerisch hochaufwendigen Fällen im G-DRG-System. Pressemitteilung Nr. 26 und Anlage 2. Berlin; 2009.
