SCHOLARS’ CORNER

The use of large healthcare data sets in pursuit of a clinical question

Raymond R. Arons, DrPH (Adjoint Professor)1 & Leslie-Faith Morritt Taub, PhD, A/GNP-BC (Clinical Associate Professor)2

1 Anschutz Medical, University of Colorado, Denver, Colorado
2 NYU College of Nursing, Staten Island, New York

Keywords: Clinical decision making; evidence-based practice; healthcare delivery; research.

Correspondence: Leslie-Faith Morritt Taub, PhD, A/GNP-BC, 5 Brunswick St., Staten Island, NY 10314. NYU College of Nursing, 433 First Avenue, 5th Floor, Room 514, NY, NY 10010. Tel: 212-992-7342; Fax: 212-995-3143; E-mail: [email protected]

Abstract

The use of large healthcare databases may be of interest to nurse practitioners who wish to answer clinical questions. This column will provide information about access to selected large healthcare databases, requirements for statistical software, and the skills required to utilize these databases.

Received: 22 November 2014; accepted: 9 December 2014 doi: 10.1002/2327-6924.12238

Healthcare data collection is entering its 50th year. It began with the passage in 1966 of the Medicare and Medicaid legislation (New York State Department of Health, 2014) and has resulted in multiple, extremely large, public use data sets. The use of large healthcare data sets may be of interest to nurse practitioners (NPs) who wish to answer clinical questions such as how closely a representative sample of all U.S. ambulatory care practitioners follows new treatment guidelines or what the trend is in bariatric surgery for those with type 2 diabetes (Davis, Slish, Chao, & Cabana, 2006; Taub, 2005). This column provides some guidance for those interested in using public use data sets to answer clinical questions.

In 1979, New York State passed regulations requiring hospitals to submit electronic data annually on all patient encounters, including hospital discharges, ambulatory surgery visits, and emergency department visits (Arons, 1984). Over the years, the variables collected have become standardized. They include, but are not limited to, patient demographics (such as zip code, age, race, and Hispanic ethnicity), diseases presented, procedures performed, time of procedure, method of arrival, provider identifiers (license numbers), the dates of service, and the condition of the patient when leaving the facility, such as discharge to home or death (Missouri Department of Health and Senior Services, 2014).


Based on the annual required reporting from hospitals, large data sets have been compiled for various types of settings. In 2014, there were 4999 acute care, short stay, nonfederal hospitals in the United States, averaging 36 million discharges annually (American Hospital Association, 2014). There are 5724 ambulatory surgery centers with 53.3 million ambulatory surgery visits yearly. Lastly, there are 129 million emergency department visits yearly, a number that has been increasing for decades (Centers for Disease Control and Prevention [CDC], 2014d). An important reason for collecting these data is to measure and report adverse events, cardiac mortality rates, comorbidities, unnecessary procedures and treatments, quality indicators, and hospital-to-hospital cost comparisons (National Association of Health Data Organizations, 2014).

Healthcare researchers across the nation can obtain access to these data sets by requesting a data use agreement from the supplying agency. Researchers must agree not to identify a specific patient or share the data with anyone who has not signed an agreement, and the researcher must pay fees based on the amount and complexity of data requested (State of California Office of Statewide Health Planning and Development, 2014a). Many large data sets collected by the federal government, such as the National Health and Nutrition Examination Survey (NHANES), the National Ambulatory Medical Care Survey (NAMCS), the National Health Interview Survey (NHIS), and the National Hospital Care Survey (NHCS), have deidentified data files that can be accessed online without any charge (CDC, 2014a, 2014b, 2014c, 2014e). These data sets are kept current and are designed to provide estimates for the U.S. population (see Table 1 for selected content in these free data sets).

Table 1 Public use data files

National Health and Nutrition Examination Survey (NHANES)
  Data location: http://www.cdc.gov/nchs/nhanes.htm
  Years available: Annually from 1999 to 2014
  Selected variables: Assesses the health and nutritional status of adults and children in the United States; combines interviews with physical and laboratory examinations

National Health Interview Survey (NHIS)
  Data location: http://www.cdc.gov/nchs/nhis.htm
  Years available: 1997–present; 1996–prior
  Selected variables: Personal household interviews. Survey results track health status, healthcare access, and progress toward achieving national health objectives for 50 years

National Health Care Surveys (NHCS)
  Data location: http://www.cdc.gov/nchs/ahcd.htm
  Years available, by component survey:
    National Ambulatory Medical Care Survey (NAMCS): 1973–1981, 1985, 1989–present. Most recent year of data available: 2007
    National Hospital Ambulatory Medical Care Survey (NHAMCS): 1992–present. Most recent year of data available: 2007
    National Survey of Ambulatory Surgery (NSAS): 1994–1996, 2006. Most recent year of data available: 2006
    National Hospital Discharge Survey (NHDS): 1965–2010. Most recent year of data available: 2009
    National Nursing Home Survey (NNHS): 1992–1994, 1996, 1998, 2000, 2007. Most recent year of data available: 2007
    National Survey of Residential Care Facilities (NSRCF): 2010–2011. Most recent year of data available: 2011
    National Study of Long-Term Care Providers (NSLTCP): 2012–2013. Most recent year of data available: expected late 2013
  Selected variables: Nationally representative, provider based, covers a broad spectrum of healthcare settings (home healthcare agencies, inpatient hospital units, or physician offices)

Note: Go to http://www.cdc.gov/nchs/data_access/ftp_data.htm to access data sets, documentation, and questionnaires from NCHS surveys and data collection systems. Downloading instructions are available in "readme" files.

Journal of the American Association of Nurse Practitioners 27 (2015) 236–239 © 2015 American Association of Nurse Practitioners

Although the use of large data sets is compelling, NPs will require advanced education in how to manage this complex task. Completing a university course in large-scale data analysis gives researchers a unique ability to query population-level data (Arons, 2001; University of Pittsburgh School of Nursing, 2014). A second approach to acquiring the necessary skills is to attend a center that specializes in training in data analytics (SAS Training, 2014). A third approach is to be mentored by a faculty member or statistics expert in large data analysis. It is also important to find recent textbooks and high-quality online resources for reference use (Arons, 2013, 2014; Kirkpatrick & Feeney, 2013).

In addition to adequate preparation for approaching research using a large data set, an appropriate statistical software program capable of analyzing large data sets is essential. Four examples of the 50 proprietary software packages are (a) SAS, a comprehensive statistical package; (b) Stata, a comprehensive statistical package; (c) IBM SPSS, the Statistical Package for the Social Sciences; and (d) BMDP, a general statistics package (Columbia University Information Technology, 2014; Delwiche & Slaughter, 2012; University of Maryland, 2014). For those who choose to use open or free software, the following are three examples of the 50 open source statistical packages: (a) DAP, a free replacement for SAS (The DAP Project for Statistics and Graphics, 2014); (b) R, a free implementation of the S language (The R Project for Statistical Computing); and (c) PSPP, a free software alternative to IBM SPSS Statistics (GNU Project PSPP, 2014).

Table 2 Terminology

Data dictionary: Information describing the contents, format, and structure of a database and the relationship between its elements, used to control access to and manipulation of the database. Access to the database’s data dictionary is required to query the database. Source: http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049331.hcsp?dDocName=bok1_049331

Sampling error: The error that results from taking a sample from a population rather than using the whole population.

Stratified sampling: The process of dividing members of the population into homogeneous subgroups before sampling to improve the representativeness of the sample by reducing sampling error.

Weighted means: Mathematical manipulation to assist in making the population sample representative of the whole (e.g., United States) population. An example of this is the weighting of a racial group that is underrepresented in the data to provide numbers more closely aligned to that group's share of the entire U.S. population.

Sample design: An explanation of how the sample was chosen to, for example, represent national estimates for the civilian noninstitutionalized population in the United States.

Some examples of data sets of potential interest to NPs, in addition to those in Table 1, are provided here with examples of how they can be used to answer clinical questions. The National Hospital Discharge Survey (CDC, 2014e) can be easily accessed online from the National Center for Health Statistics, with data beginning in 1970 and running up to 2010. Documentation is available in downloadable PDF files (see Table 2 for an explanation of some of the terminology that follows). The files contain the data dictionaries that are required to identify all of the variables and where they are located in the survey records. They also provide the survey design, survey methodology, expected error rates, and summary tables of principal variables. This survey is a stratified multistage randomized data set.
The documentation provides SAS coding directions and the variables with each record, such as strata and weight, which use sample observations to reflect those discharges occurring in the nation for the year of study. Most importantly, it provides the basic tools to write the necessary code to answer your research question about the nation’s hospitals. Having more than 40 years of data allows for studying significant trends over a long time period. For example: How has hospital breast cancer or prostate surgery by region varied over the last decade by age, race, insurance, and principal diagnoses?

The second data set discussed as an example is from the Organ Procurement and Transplantation Network. This agency follows and collects data on the recipients of kidney, kidney–pancreas, heart, heart–lung, liver, pancreas, lung, and intestine transplants. Follow-up began in 1988 and continues today. Between 1988 and 2014, 129,249 liver transplants were performed in the United States (Health Resources and Services Administration [HRSA], 2014). There are 30 variables included for each recipient. They include, but are not limited to, diagnoses at transplantation, height, weight, blood type, gender, living versus cadaver donor, age, ethnicity, a panel of blood chemistry at time of transplant, and survival time. A sample research question would be: What are the predictors of surviving a liver transplantation, considering the effects of gender, race, age, blood type, status pretransplant, principal diagnoses, and donor type?

A third example is the California emergency department data set from the Office of Statewide Health Planning and Development (OSHPD). Again, a data request form is necessary. This data set comes on CD, with data available from 2005 through 2013 (State of California Office of Statewide Health Planning and Development, 2014d). It should be emphasized that for faculty members, students, or anyone working at a 501(c)(3) nonprofit, there are no fees associated with the three data sets reviewed. Again, the primary documentation is the data dictionaries, which are readily available for downloading (State of California Office of Statewide Health Planning and Development, 2014b).
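To illustrate why the data dictionary is described as the primary documentation, the short Python sketch below shows how a dictionary of variable names and column positions can be used to read raw records. It assumes a fixed-width text layout, and the variable names, column positions, and sample records are all hypothetical; the real layout must be taken from the documentation supplied with each data set.

```python
# Hypothetical data dictionary: variable name -> (start column, end column),
# 1-based and inclusive, the way record layouts are commonly documented.
# These names and positions are invented for illustration only.
DATA_DICTIONARY = {
    "age": (1, 3),
    "sex": (4, 4),
    "payer": (5, 6),
    "disposition": (7, 8),
}


def parse_record(line, dictionary):
    """Slice one fixed-width record into named fields using the dictionary."""
    record = {}
    for name, (start, end) in dictionary.items():
        # Convert 1-based inclusive columns to a Python slice.
        record[name] = line[start - 1:end].strip()
    return record


# Two made-up records that follow the hypothetical layout above.
sample_lines = [
    "067" + "1" + "01" + "06",
    " 45" + "2" + "09" + "01",
]
records = [parse_record(line, DATA_DICTIONARY) for line in sample_lines]

for record in records:
    print(record)
```

Keeping the layout in one dictionary, rather than scattering column numbers through the code, makes it easier to check the program against the published documentation when a new data year changes the record format.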
There are 30 data categories, including but not limited to facility identification numbers, age, gender, ethnicity, race, first three zip code digits, patient county, the quarter of the year of service, patient disposition, expected source of payment, injury codes, diagnosis codes, and procedure codes. With 10.8 million visits in 2013 alone, the research possibilities of this large data set are enormous (State of California Office of Statewide Health Planning and Development, 2014c). A potential study question is: What was the percentage of uninsured patients using the emergency department, along with their demographics, principal diagnoses, principal causes of injury, and disposition?

Large data sets, which are population based, support generalizability and can be used to answer clinical questions that are specific to the variables within the database. It is critical to understand the strengths and weaknesses of the data you have selected to study. Secondary data analysis of large data sets provides an opportunity to query a very large sample within the inherent design representing, for example, all of the U.S. hospital discharges, all liver transplants in the United States, or all of the California emergency rooms. Before you start, it is important to ask: Will the data set I have chosen provide answers to my clinical questions? Further, data used for benchmarking must use the same terminology and definitions as the data you wish to compare them with, so you must be cognizant of the definitions used in the database you have chosen to work with. As we seek to anchor clinical practice in evidence-based, high-quality research that will have an impact on the public’s health, it becomes important for both clinicians and nurse researchers to obtain the skill sets needed to be comfortable manipulating readily available, rich data sets that offer an opportunity to provide answers of interest for practice.
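To make the weighting terminology in Table 2 concrete, the following minimal Python sketch contrasts an unweighted percentage with a weighted one. The records, the "uninsured" flag, and the weights are invented for illustration and are not drawn from any of the data sets discussed. The sketch produces a point estimate only; standard errors for a stratified multistage design require the strata and sampling-unit variables and dedicated survey procedures in a statistical package.

```python
def weighted_percent(records, flag_key, weight_key="weight"):
    """Return the weighted percentage of records where flag_key is true."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    flagged_weight = sum(r[weight_key] for r in records if r[flag_key])
    return 100.0 * flagged_weight / total_weight


# Invented sample: each record stands for `weight` encounters in the population.
sample = [
    {"uninsured": True, "weight": 150.0},
    {"uninsured": False, "weight": 100.0},
    {"uninsured": False, "weight": 200.0},
    {"uninsured": True, "weight": 50.0},
]

# Unweighted: 2 of 4 sampled records are flagged.
unweighted = 100.0 * sum(r["uninsured"] for r in sample) / len(sample)

# Weighted: flagged weight is 150 + 50 = 200 out of a total of 500.
weighted = weighted_percent(sample, "uninsured")

print(f"Unweighted: {unweighted:.1f}%  Weighted: {weighted:.1f}%")
```

The gap between the two figures shows why the weight variable supplied with a survey record should be applied before reporting national estimates.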

References

American Hospital Association. (2014). Research and trends: Fast facts on US hospitals. Retrieved from http://www.aha.org/research/rc/stat-studies/fast-facts.shtml

Arons, R. R. (1984). The new economics of health care: DRGs, Case Mix, and the Prospective Payment System (PPS) (1st ed.). New York: Praeger.

Arons, R. R. (2001). Using technology to advance public health. American Journal of Public Health, 91(8), 1178–1179.

Arons, R. R. (2013). Health services research using SAS. Retrieved from http://www.drraysastraining.com/

Arons, R. R. (2014). SAS data driven by rocket science. Retrieved from http://www.drraysastraining.com/

Centers for Disease Control and Prevention (CDC). (2014a). National Ambulatory Medical Care Survey. Retrieved from http://www.cdc.gov/nchs/ahcd.htm

Centers for Disease Control and Prevention (CDC). (2014b). National Health and Nutrition Examination Survey. Retrieved from http://www.cdc.gov/nchs/nhanes.htm

Centers for Disease Control and Prevention (CDC). (2014c). National Health Interview Survey. Retrieved from http://www.cdc.gov/nchs/dhcs.htm

Centers for Disease Control and Prevention (CDC). (2014d). Emergency department visits. Retrieved from http://www.cdc.gov/nchs/fastats/emergency-department.htm

Centers for Disease Control and Prevention (CDC). (2014e). National Hospital Care Survey. Retrieved from http://www.cdc.gov/nchs/nhcs.htm

Columbia University Information Technology. (2014). SAS, how to obtain licenses. Retrieved from http://cuit.columbia.edu/sas

Davis, M. M., Slish, K., Chao, C., & Cabana, M. D. (2006). National trends in bariatric surgery, 1996–2002. Archives of Surgery, 141(1), 71–74. doi:10.1001/archsurg.141.1.71

Delwiche, L. D., & Slaughter, S. J. (2012). The little SAS book, a primer (5th ed.). Cary, NC: SAS Institute Inc.

GNU Project PSPP. (2014). Retrieved from https://www.gnu.org/software/pspp/

Health Resources and Services Administration (HRSA). (2014). Request data, organ procurement and transplantation network, transplantation by donor type. Retrieved from http://optn.transplant.hrsa.gov/converge/data/default.asp

Kirkpatrick, L. A., & Feeney, B. C. (2013). A simple guide to IBM SPSS, version 20 (12th ed.). Belmont, CA: Wadsworth.

Missouri Department of Health and Senior Services. (2014). Missouri Center for Health Statistics (MCHS) Patient Abstract System. Retrieved from http://health.mo.gov/data/patientabstractsystem/

National Association of Health Data Organizations. (2014). Retrieved from https://www.nahdo.org/

New York State Department of Health. (2014). Statewide Planning and Research Cooperative System (SPARCS). Retrieved from http://www.health.ny.gov/statistics/sparcs/

SAS Training. (2014). The power to know, business knowledge series. Retrieved from https://support.sas.com/edu/qs.html?id=BKS&ctry=us

State of California Office of Statewide Health Planning and Development. (2014a). Healthcare Information Division Data Request Center Public Use File requests. Retrieved from http://www.oshpd.ca.gov/HID/ Data Request Center/PUF.html

State of California Office of Statewide Health Planning and Development. (2014b). Healthcare Information Division, Data Request Center, Data Dictionaries 2010–2013. Retrieved from http://www.oshpd.ca.gov/HID/ Data Request Center/Data Documentation.html

State of California Office of Statewide Health Planning and Development. (2014c). Healthcare Information Division, Data Request Center, Statewide Frequency of Encounters by Disposition, Emergency Department Data 2013. Retrieved from http://www.oshpd.ca.gov/HID/Products/EmerDeptData/

State of California Office of Statewide Health Planning and Development. (2014d). Healthcare Information Division, Emergency Department and Ambulatory Surgery Data. Retrieved from http://www.oshpd.ca.gov/HID/Products/EmerDeptData/

Taub, L.-F. M. (2005). Concordance of provider recommendations for the care of individuals with diabetes with evidence based guidelines. Journal of the American Academy of Nurse Practitioners, 18(3), 124–133.

The DAP Project for Statistics and Graphics. (2014). From Free Software Foundation, Inc. Retrieved from https://www.gnu.org/software/dap/

The R Project for Statistical Computing. Retrieved from http://www.r-project.org/

University of Maryland. (2014). Statistical software. Retrieved from http://www.oacs.umd.edu/software/StatPackages.asp

University of Pittsburgh School of Nursing. (2014). Center for Research and Evaluation. Retrieved from http://www.nursing.pitt.edu/department/cre/


