Front. Med. 2014, 8(3): 352–357 DOI 10.1007/s11684-014-0351-1

RESEARCH ARTICLE

Clinical data quality problems and countermeasure for real world study

Runshun Zhang1, Yinghui Wang1, Baoyan Liu (✉)2, Guangli Song1, Xuezhong Zhou3, Shizhen Fan4, Xishui Pan4

1 Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China; 2 China Academy of Chinese Medical Sciences, Beijing 100700, China; 3 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; 4 Beijing Fanglue Medical Information Co., Ltd., Beijing 100053, China

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2014

Abstract Real world study (RWS) has become a hotspot for clinical research. Data quality plays a vital role in research achievement and other clinical research fields. In this paper, the common quality problems in the RWS of traditional Chinese medicine are discussed, and a countermeasure is proposed.

Keywords real world study; traditional Chinese medicine; clinical and research information sharing system; data quality problem; data quality control

Introduction

The National Institutes of Health emphasized that studies should use findings generated by clinical research and translate them into treatments for patients in day-to-day non-research settings, so that real-life data can be obtained. Real world study (RWS) has become a hotspot for clinical research [1,2]. Experts in both efficacy and effectiveness research collaborated to address the methodological and data collection issues that need to be considered in designing a first-generation effectiveness study. Elements of the overall study design, setting or service delivery context, inclusion and exclusion criteria, recruitment and screening, assessment tools, and intervention modification are discussed to illustrate the idea and rationale for decisions regarding these different design components. Real life data are collected under real life practice circumstances. With the development of information technology and the wide use of hospital information systems (HIS), clinical data are now routinely collected in hospitals, at home and abroad. The real world data of clinical research will mainly come from the electronic medical records (EMRs) and other information systems in a hospital. The data quality of clinical research determines the quality and level of RWS. Thus, data quality has become a vital problem in clinical research [3].

Definition of data quality and basic characteristics of high quality data

Data quality is defined as the degree of excellence exhibited by the data in relation to the portrayal of the actual scenario, and the totality of features and characteristics of data that bear on their ability to satisfy a given purpose [4]. High quality data can truly reflect the essence of things. Given that RWS of clinical research is a relatively new research topic, no unified standard for clinical data quality currently exists. The same set of data may show different data qualities depending on the research needs. Data that reflect the objective world and realize information transfer should have the following basic properties [5].

Accuracy

Accuracy, or correctness, is the root property of data quality attributes. Clinical data must be in accordance with real clinical practice, and should correctly represent the real world construct and status.

Availability and usability

Availability means that the data stored in the database are in line with the data properties previously designed for any intended or external uses. Usability is a vital feature of data quality.

Integrity

Received May 13, 2014; accepted July 11, 2014 Correspondence: [email protected]

Integrity indicates the degree of completeness. It requires each record to be unique and within the range of legal values, with the correlations between records maintained.

Consistency and standardization

Consistency refers to logical consistency. Standardization means that content and format meet the requirements of standard specifications.

Timeliness

Timeliness means that valid clinical data are collected from the operational system in real time.

The above features reflect data quality from different aspects and are interrelated. High quality data must ensure that all properties meet the research needs.
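As a rough illustration of how some of these properties could be checked mechanically, the following Python sketch tests uniqueness (integrity), date format (standardization), and entry delay (timeliness). The field name `record_id`, the year-month-day date convention, and the 24-hour delay limit are assumptions chosen for the example; they are not requirements stated in this paper.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def duplicate_ids(records: List[Dict[str, str]]) -> List[str]:
    """Integrity: return record IDs that occur more than once."""
    counts = Counter(r["record_id"] for r in records)
    return sorted(rid for rid, n in counts.items() if n > 1)


def is_standard_date(value: str) -> bool:
    """Standardization: the value must parse as a hyphenated year-month-day date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_timely(event_time: datetime, entry_time: datetime,
              max_delay: timedelta = timedelta(hours=24)) -> bool:
    """Timeliness: the entry is made after the event and within max_delay.

    The 24-hour default is a hypothetical threshold for illustration.
    """
    return timedelta(0) <= entry_time - event_time <= max_delay
```

Each function returns a plain value, so the results can feed either an automated report or a manual review list.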

Basic characteristics of clinical RWS data

Information of traditional Chinese medicine (TCM) involves knowledge-intensive clinical data, but records are not comprehensive and abundant. Many subjective, reflective, or epistemological data exist. Different doctors or nurses describe the same findings in health records using different terminologies and texts. Thus, the connotation and extension of a concept or term are easy to change. Furthermore, RWS data come from different sources. Data types differ across clinical application software programs, and differ significantly from those of randomized controlled trial (RCT) research. These characteristics cause problems in the quality control of TCM clinical data in terms of theory, technology, and management.
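One common way to reduce the terminology variation described above is to map free-text variants to a single standard term before analysis. The sketch below only illustrates the idea: the synonym table, the term names, and the functions `normalize_term` and `normalize_record` are hypothetical and are not taken from any TCM terminology standard or from the authors' system.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table: each free-text variant maps to one standard term.
SYNONYMS: Dict[str, str] = {
    "insomnia": "insomnia",
    "sleeplessness": "insomnia",
    "poor sleep": "insomnia",
    "fatigue": "fatigue",
    "tiredness": "fatigue",
}


def normalize_term(raw: str) -> Optional[str]:
    """Return the standard term for a free-text entry, or None if unmapped."""
    key = " ".join(raw.lower().split())  # lower-case and collapse whitespace
    return SYNONYMS.get(key)


def normalize_record(symptoms: List[str]) -> Tuple[List[str], List[str]]:
    """Split a symptom list into standard terms and entries needing review."""
    standardized: List[str] = []
    unmapped: List[str] = []
    for raw in symptoms:
        term = normalize_term(raw)
        if term is None:
            unmapped.append(raw)          # left for a human to resolve
        elif term not in standardized:
            standardized.append(term)     # keep each standard term once
    return standardized, unmapped
```

Entries that cannot be mapped are returned separately rather than discarded, so that clinical staff can decide whether the table needs a new synonym.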

Common quality problems of real world clinical data

In RWS, researchers have to collect data during clinical work. Data quality problems are common because of the different goals of clinical and scientific research, the busy working schedules of doctors and nurses, insufficient awareness of the importance of quality control, and weak quality control measures.

Different understanding of quality control goal and standard

In previous clinical studies, particularly double-blind RCTs in which the research subjects are specific groups of people, data quality control should abide by the quality control standard for clinical trials (i.e., good clinical practice) to reduce imprecise data and improve data accuracy [6,7]. The volume of clinical data is currently increasing. Clinical RWS is entering the era of big data because of the increasing number of cases and the availability of information in each case. Some researchers mistakenly believe that data quality is relatively unimportant for analyzing big data, and that results can be obtained from fuzzy data. Even for large amounts of data, any analysis of their variability must rest on accurate data. Clinical data differ from general data. Data analysis and data mining aim to determine valuable rules about the clinical manifestations, diagnosis, treatment, and prognostic factors of disease, all of which relate to life and health [8]. Therefore, improving the quality of data should be a priority of clinical research. Improving data quality is a systematic project that involves the demand for data and the steps of data collection and use. Reasonable evaluation criteria and a management system should be established. A specialized team for data quality improvement or related service work is also necessary. Among these requirements, the quality standard is vital. For RWS data quality, we cannot neglect quality control because of the large amounts of data involved. Nor can we forbid all erroneous or incomplete data and strictly control every step as in standardized clinical RCTs. The quality control standards should be constructed in combination with the research goals and clinical information to meet the needs of scientific research. Researchers should set feasible quality standards on this basis.

Quality control system is not sound, and the method is not perfect

Data quality control should be evaluated in terms of availability, accuracy, timeliness, completeness, attainability, interpretability, cohesion, comparability, objectivity, effectiveness, cost, and so on. Among these parameters, applicability, accuracy, timeliness, attainability, comparability, and cohesion are internationally recognized as the basic elements of data quality. Given the lack of unified standards for data quality, the principle of the quality control system is unclear, and a method for quality control has not been established. Good quality control methods should be able to detect the problems hidden in data, assess the overall data quality, and locate the errors in data. Data collection software should be developed to provide technical support for improving data quality [9].

Responsibility and division of quality control work are unclear

Data quality control in RCTs requires specialized personnel. However, for clinical RWS, data acquisition involves clinical personnel, scientific research and management staff, interns, advanced training physicians, and other professional staff at all levels. Data preprocessing, statistical analysis, and data mining personnel are involved in the stage of data consumption. All of these people can affect data quality during data collection and transformation, and influence the completeness and correctness of the data. Unspecialized personnel may lack the experience needed for data quality control. A working environment with no rigorous division of work and unclear responsibilities can lead to insufficient quality work or difficulty in implementing quality control measures. Only high quality data can be used to draw scientific conclusions, so organizers should arouse the enthusiasm of each participant, strengthen the understanding of quality, organize the work scientifically, assign responsibility, establish rules and regulations, and require everyone to collect high quality clinical data while completing clinical work.

Cost and difficulty in implementing high quality control

Quality control requires manpower and material resources. The aim is to reduce inputs or cost while increasing the number and quality of outputs. Clinical RWS is unremitting work, and the main work of the personnel is clinical work. Too much emphasis on scientific research needs, or high costs of scientific research, may affect routine clinical work. If we do not focus on the data quality of scientific research, the data may not conform to the research needs and we will be unable to obtain the expected results. Therefore, data quality control should keep research costs low to increase the feasibility of the work.

Strategy for the quality control of real world clinical data

Confirm the basic requirement of quality standards of real world medical data

In the era of small data, researchers require accurate data. However, in the era of big data, the understanding of data quality requirements places more emphasis on the integrity of data. Some inaccurate and erroneous data are considered inevitable, and these mistakes are assumed not to affect the overall analysis of data mining. The big data era requires us to re-examine the pros and cons of accuracy, and on this view we do not have to try to avoid confounding factors. This principle may be applicable in other areas, but it is not suitable for clinical studies, which concern the life and health of patients and require high precision. Although the present clinical data are not yet big data in size, we should avoid errors and confounding factors to ensure the accuracy and integrity of data for each case. We should not treat the issue of data quality control lightly. We believe that high quality clinical research data in RWS should have the following characteristics: (1) meet the requirements of clinical work to achieve a high level of health records; (2) meet the specific research demand and common research needs; (3) keep the human and material resources invested in clinical research within an acceptable range; and (4) fit the requirements of accuracy, consistency, and completeness.


Establish data quality control methods and strengthen the quality control process

In RWS practice, under the guidance of TCM clinical research paradigm theory [2], we have developed a TCM clinical and research information sharing system (hereafter, the sharing system), which solves the problems of digitalizing TCM data, managing TCM data, and using complex clinical data. The structured TCM EMR system is an important part of the sharing system, and it is a key technological tool for converting clinical diagnosis and treatment information into structured data in the process of writing medical records. The designed template can ensure the clinical effectiveness of data collection, as well as data availability, integrity, and standardization. As a patient-centered platform that integrates the hospital information system, laboratory information system, picture archiving and communication system, and other data resources, together with the evaluation of effects and follow-up information in clinical research data, it can meet the requirements of normal medical work and efficiency demands [10–13]. The key points of clinical data quality include the aspects shown in Fig. 1.

Fig. 1 Flow chart of TCM RWS quality control.

Ensure the quality of the source data

RWS data of TCM come from clinical medical records. The quality of clinical research is based on these records. To obtain clinical information of high quality, doctors must take good care of patients, provide them with correct diagnoses and treatments, improve the effectiveness of treatments, and ensure the quality of medical record data.
(1) Doctors should have a high standard of professional ethics and sense of responsibility. Doctors should perform clinical activities, as well as promote and maintain the patient’s physical and mental health.
(2) Doctors should continue to learn medical knowledge and master the most comprehensive knowledge and latest technology. Patients should be treated with the most suitable equipment, technology, methods, and service.
(3) Doctors should use standardized clinical terms when completing medical records, and provide comprehensive and accurate clinical diagnosis and treatment information. Doctors should improve their understanding of the structured EMRs, make full use of the medical record templates, and constantly summarize and revise the templates.
(4) Doctors should make a definitive diagnosis that includes the TCM disease diagnosis, syndrome diagnosis, and western medicine diagnosis; analyze the records of TCM syndrome characteristics; and record the clinical effects and follow-up information [14].

Create a research plan, focus on key information, and formulate basic requirements of quality control

At the beginning of RWS based on the sharing system, we found a substantial amount of clinical medical records from the structured EMR. However, when we attempted to conduct a certain study, some quality problems occurred, such as insufficient necessary data. Such problems make the effective use of real data difficult. Scientific research requires complete, comprehensive, and standardized data, whereas clinical work requires real-time and efficient data that meet the basic requirements of medical work. Given the differences between the data of scientific research and clinical work, the information collected by clinical staff with limited time and energy is unable to meet the goal of scientific research. In some cases, clinical personnel spend much time collecting information, but still fail to meet the demand of scientific research. The problems of the early stage of RWS are due to insufficient recognition of the need for an implementation plan. Thus, a study plan that includes the design and optimization of the implementation of scientific research can solve the data quality problems in the RWS process. A scientific research plan provides a standard of relatively high quality data to form the expected target research data within a specific period. We propose that the plan should be designed based on TCM development and geared to the needs of clinical problems. Furthermore, the complexity of the clinical data and the difficulties in the process of data collection should be considered. The design is based on the design principles of clinical scientific research, and combines the technical features of the sharing system in the TCM RWS with its implementation process. Researchers should constantly adjust and optimize the plan, and determine the appropriate digital content and method to match the program.

Design scientific and reasonable acquisition methods according to quality requirements

The requirements of clinical data quality are stricter in the RWS of TCM than the general requirements of written medical records. For example, in terms of the integrity of data, data of hospitalized patients should include hospital records, progress notes, first course records, discharge records, complete structured orders, laboratory examination results, physical examination results, and other treatment information. According to research needs, dynamic and comprehensive information, such as scale and follow-up information, should also be included. In terms of data accuracy, index values should fall within a reasonable range and be consistent with clinical practice. In terms of the consistency of data, clinical information must be relevant to and highly consistent with the data of scientific research. The clinical evaluation in in-hospital records must correspond with the discharge records. Clinical documents should make data collection for scientific research convenient and accurate. Similar data recorded in different places should have a mapping relationship, which reduces the workload of clinical staff and ensures the consistency and standardization of data. Multiple constraint mechanisms to control medical records should be established to ensure the integrity of data. Moreover, the contents of each sampling point should be controlled via scientific and reasonable scheduling of tasks.

Quantitative quality indicators

Scientific quality management requires quantified quality control indexes. The correctness rate can be estimated by dividing the number of correct records by the total number of records. The missing rate can be evaluated by dividing the number of records with missing data by the total number of records. Integrity can be estimated by dividing the number of records in the data set that meet the completeness conditions (no missing or erroneous items) by the total number of records. Consistency can be examined, for a particular rule, by dividing the number of records in the database that meet the rule by the total number of records reviewed. Timeliness can be calculated by dividing the number of records that have not yet become invalid by the total number of records collected.
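The indicators above are all ratios of a count of qualifying records to a total count, so they can be computed with one small helper. The following Python sketch is a minimal illustration under stated assumptions: the field names (`admission_note`, `progress_note`, `discharge_record`, `admission_diagnosis`, `discharge_diagnosis`) and the two example conditions are hypothetical and would have to be replaced by the rules of a concrete study.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical required components of one inpatient record.
REQUIRED_FIELDS = ("admission_note", "progress_note", "discharge_record")


def rate(records: List[Record], condition: Callable[[Record], bool]) -> float:
    """Share of records that satisfy `condition` (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for r in records if condition(r)) / len(records)


def is_complete(record: Record) -> bool:
    """Integrity condition: every required field is present and non-empty."""
    return all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)


def is_consistent(record: Record) -> bool:
    """Example consistency rule: admission and discharge diagnoses agree."""
    return record.get("admission_diagnosis") == record.get("discharge_diagnosis")


def quality_report(records: List[Record]) -> Dict[str, float]:
    """Compute the ratio-style indicators for a batch of records."""
    return {
        "integrity": rate(records, is_complete),
        "missing": rate(records, lambda r: not is_complete(r)),
        "consistency": rate(records, is_consistent),
    }
```

Correctness and timeliness rates follow the same pattern: supply a condition that decides whether a record is correct or still valid, and pass it to `rate`.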

Implement process management strictly to control bias

In the process of RWS using the sharing system, clinical and research work should be managed synchronously. High quality data should be traceable, accurate, timely, complete, and consistent. Only strict process management and control of the various types of bias throughout the course of clinical research work can ensure the quality of data. Quality control can be strengthened and high quality clinical data can be ensured in the stages of data collection, integration, transmission, analysis, and mining. Among these stages, data collection is the most crucial, and quality control in this stage can be achieved by performing the following steps [15,16].
(1) Clinical doctors or data collection personnel should be regularly trained and assessed.
(2) The quality control standard should be raised above the general clinical quality control requirements for medical records. The quality control standards of clinical research data should not only meet the medical record writing standards, but also be combined with the research topic.
(3) The clinical information collecting system, such as the structured EMR and its templates, should be constantly improved. The structured EMR quality control module should be used to the fullest. The man-machine combination can strengthen quality control. For example, setting key information, settling quality problems via man-machine combination, auditing strictly, and using personnel to solve problems based on quantitative quality indicators and an effective data quality control implementation process can ensure that data meet clinical and scientific needs.
(4) In the process of quality control, a mechanism to detect and solve problems should be established.

Implement staff responsibilities

Relevant personnel should be assigned common data quality control tasks. Quality control should be considered in performance appraisal. The results of data quality should become the basis for regular assessment of the staff. Responsibilities that are executed effectively will promote the improvement in data quality.

Establish perfect quality control technical support system based on the man-machine combination

In terms of quality control, the technology system plays a very important role. However, this system cannot solve all problems on its own. It still depends on the participation of relevant experts and personnel, who use man-machine combination methods, take the initiative to review the completeness and accuracy of data, and share responsibility for common data quality control and governance. The quality control technical system is an important tool to solve the problem of data quality.

Establish data quality evaluation system and normalize quality control work

Data quality is vital in scientific research. The RWS is a long-term process. Data quality methods and management should be established on the basis of long-term mechanisms. Quality control work should be routine. Researchers should focus on improving the quality of data from different sources and of different types according to the research target, and use technical means to solve quality problems as research topics change, so as to gear the work to the needs of clinical problems. Based on the big data method, we can accumulate clinical data to develop deeper and wider clinical studies, and obtain more valuable conclusions in clinical RWS.
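The man-machine combination described in the strategy above can be pictured as automated rules that screen every record and hand doubtful ones to staff for manual review. The Python sketch below is a simplified illustration, not the quality control module of the sharing system: the rule functions, the field names `id`, `tcm_syndrome`, and `temperature_c`, and the 30–45 °C plausibility range are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
# A rule returns a problem description for a bad record, or None if it passes.
Rule = Callable[[Record], Optional[str]]


def require_syndrome(record: Record) -> Optional[str]:
    """Hypothetical key-information rule: a TCM syndrome must be recorded."""
    if not record.get("tcm_syndrome"):
        return "TCM syndrome diagnosis is missing"
    return None


def plausible_temperature(record: Record) -> Optional[str]:
    """Hypothetical range rule: body temperature must lie in 30-45 degrees C."""
    temp = record.get("temperature_c")
    if isinstance(temp, (int, float)) and not 30.0 <= temp <= 45.0:
        return "temperature is outside the plausible range"
    return None


@dataclass
class ReviewQueue:
    """Records flagged by the machine and waiting for a human reviewer."""
    items: List[dict] = field(default_factory=list)

    def add(self, record_id: object, problems: List[str]) -> None:
        self.items.append({"record_id": record_id, "problems": problems})


def audit(records: List[Record], rules: List[Rule],
          queue: ReviewQueue) -> List[Record]:
    """Apply every rule; return clean records and queue the rest for review."""
    accepted: List[Record] = []
    for record in records:
        problems = []
        for rule in rules:
            message = rule(record)
            if message is not None:
                problems.append(message)
        if problems:
            queue.add(record.get("id"), problems)
        else:
            accepted.append(record)
    return accepted
```

Keeping the rules as separate functions lets the list grow with the research topic, while the review queue leaves the final judgment on each flagged record to the responsible staff.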

Acknowledgements

This work was partially supported by the National Special Fund Projects for Traditional Chinese Medicine (201207001) and National High Technology Research and Development Program of China (863 Program, digital information system research and development of traditional Chinese medicine, 2012AA02A609).


Compliance with ethics guidelines

Runshun Zhang, Yinghui Wang, Baoyan Liu, Guangli Song, Xuezhong Zhou, Shizhen Fan, and Xishui Pan declare that they have no conflict of interest. This article does not contain any studies with human or animal subjects performed by any of the authors.

References

1. Roy-Byrne PP, Sherbourne CD, Craske MG, Stein MB, Katon W, Sullivan G, Means-Christensen A, Bystritsky A. Moving treatment research from clinical trials to the real world. Psychiatr Serv 2003; 54(3): 327–332
2. Liu BY. Traditional Chinese medicine clinical research paradigm of real-world. J Tradit Chin Med (Zhong Yi Za Zhi) 2013; 54(6): 451–455 (in Chinese)
3. Jiang LJ, Xie Q, Liu BY. Overview of the comparative effectiveness research. World Tradit Chin Med (Shi Jie Zhong Yi Yao) 2013; (6): 695–700 (in Chinese)
4. Roebuck K. Data Quality: High-impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors. Australia: Tebbo Press, 2013
5. Ding HL, Xu HB. The data quality analysis and application. Comput Technol Dev (Ji Suan Ji Ji Shu Yu Fa Zhan) 2007; (3): 236–238 (in Chinese)
6. Roche N, Reddel H, Martin R, Brusselle G, Papi A, Thomas M, Postma D, Thomas V, Rand C, Chisholm A, Price D. Quality standards for real-world research. Focus on observational database studies of comparative effectiveness. Ann Am Thorac Soc 2014; 11 (Suppl 2): S99–S104
7. Sun SL, Weng WL, Yang LH. Quality and Management in the Traditional Chinese Medicine Clinical Research Process. Beijing: China Press of Traditional Chinese Medicine, 2010: 4: 14–65 (in Chinese)
8. Schonberge VM, Cukier K. Big Data: A Revolution that Will Transform How We Live, Work, and Think (in Chinese, trans. Cheng YY, Zhou T). Hangzhou: Zhejiang People’s Publishing House, 2013: 45–66
9. Song HM, Liu BY, He LY, Zhang RS, Zhou XJ, Zhou XZ. The quality problems in the process of data collection and countermeasures in scientific research by the electronic medical records of traditional Chinese medicine. Chin J Basic Med Tradit Chin Med (Zhongguo Zhong Yi Ji Chu Yi Xue Za Zhi) 2011(9): 955–956 (in Chinese)
10. Xie Q, Jiang LJ, Liu BY, Shi HX. Key issue and countermeasure to carry out research in the real world comparative effectiveness research of traditional Chinese medicine. World Tradit Chin Med (Shi Jie Zhong Yi Yao) 2014; (1): 28–31 (in Chinese)
11. Liu BY, Zhou XZ, Li P, Wang YH, Wen TC, Guo YF, Zhang RS, Chen SB. The individualized diagnosis and treatment of clinical scientific research information integration platform. China Digit Med (Zhongguo Shu Zi Yi Xue) 2007; 2(6): 31–36 (in Chinese)
12. Liu B, Zhou X, Wang Y, Hu J, He L, Zhang R, Chen S, Guo Y. Data processing and analysis in real-world traditional Chinese medicine clinical data: challenges and approaches. Stat Med 2012; 31(7): 653–660
13. Zhou X, Chen S, Liu B, Zhang R, Wang Y, Li P, Guo Y, Zhang H, Gao Z, Yan X. Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artif Intell Med 2010; 48(2–3): 139–152
14. Liu BY, Zhou XZ, Zhang RS, Wang YH, Xie Q, Guo YF, Zhang XP, Zhou XJ, He LY, Zhang L, Song GL, Zhang YH, Zhang H, Li BS, Zhao SJ. Basic requirements of TCM electronic medical records system in medical treatment and clinical scientific research information sharing system. China Digit Med (Zhongguo Shu Zi Yi Xue) 2012; 7(10): 57–60 (in Chinese)
15. Liu BY, Xie Q, Shi HX, Wang B, Zhou XZ, Zhang RS, Guo YF, Zhang XP. To build real world clinical research technology platform of organization management strategy. J Tradit Chin Med (Zhong Yi Za Zhi) 2013; 54(24): 2071–2075 (in Chinese)
16. Shi HX, Liu BY, Xie Q, Wang B, Cao XY, Jiang LJ, Zhao Y, Wang ZY. Build a service flow and promote the transformation and application of Chinese medicine scientific and technological achievements. J Tradit Chin Med (Zhong Yi Za Zhi) 2013; 9: 726–728 (in Chinese)
