Medicare Claims Data as Public Use Files: A New Tool for Public Health Surveillance

Erkan Erdem, PhD; Holly Korda, PhD, MA; Samuel “Chris” Haffer, PhD; Cary Sennett, MD, PhD

Claims data are an important source of data for public health surveillance but have not been widely used in the United States because of concern with personally identifiable health information and other issues. We describe the development and availability of a new set of public use files created using de-identified health care claims for fee-for-service Medicare beneficiaries, including individuals 65 years and older and individuals with disabilities younger than 65 years, and their application as tools for public health surveillance. We provide an overview of these files and their attributes; a review of beneficiary de-identification procedures and implications for analysis; a summary of advantages and limitations for use of the public use files for surveillance, alone and in combination with other data sources; and a discussion of their application for public health surveillance, with examples that address chronic conditions monitoring, hospital readmissions, and prevalence and expenditures in diabetes care.

KEY WORDS: claims data, Medicare, public health surveillance, public use files

J Public Health Management Practice, 2014, 20(4), 445–452
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins

Public health surveillance, the systematic ongoing collection, management, analysis, and interpretation of data followed by its dissemination to stimulate public health action [1], uses data from various sources to support program planning and policy making, assess population health and prevention strategies, monitor changes in health care practices and quality of care, identify emerging health issues, and monitor progress toward state and national objectives, such as those for Healthy People 2020 [2]. While health surveys, registries, information systems, environmental monitoring, and clinical and public health research have become standard public health information sources [3], administrative claims data from public health care systems, such as Medicare, Medicaid, and the Veterans Administration, are gaining recognition as important public health surveillance tools.

Claims data offer an inexpensive complement or alternative to costly surveys, such as the National Health Interview Survey or state-level surveys such as the Behavioral Risk Factor Surveillance System. They use standardized definitions to identify prevalence, risk factors, and complications; to assess changes over time; and to determine trends in specific populations and locations [4]. These data also provide indicators of resource use and spending for specific conditions, patient and provider types, and regions. The Centers for Disease Control and Prevention identified improving access to and sharing data useful for public health surveillance, including claims data, as a priority in its vision statement for public health surveillance in the 21st century [5]. Recent statements by the Council of State and Territorial Epidemiologists also call for strategic assessment and application of data to meet public health surveillance challenges of the future [6].

Despite the considerable potential of claims data for public health surveillance, their use has been limited in the United States. With the exception of some states that use Medicaid data for surveillance, administrative claims data have not been widely available or accessible for public use, given concerns with personally identifiable information and legislative and regulatory restrictions on data sharing. Data timeliness and availability, and the inherent limitations of data collected for “billing” purposes, are also concerns.

This report describes the development and application of a newly available set of public use files (PUFs) from the Centers for Medicare & Medicaid Services (CMS) as tools for public health surveillance. These PUFs were created using de-identified health care claims for fee-for-service (FFS) Medicare beneficiaries, including individuals 65 years and older and individuals with disabilities younger than 65 years, and are available online within the public domain.∗ In the following sections, we use these data to address priority policy issues (chronic conditions monitoring, hospital readmissions, and prevalence and expenditures in diabetes care), providing (1) an overview of these files and their attributes; (2) a review of beneficiary de-identification procedures and implications for analysis; (3) a summary of advantages and limitations for use of the PUFs for surveillance, alone and in combination with other data sources; and (4) discussion and examples of their application for public health surveillance.

Author Affiliations: IMPAQ International, Washington, District of Columbia (Dr Erdem); Health Systems Research Associates, Portland, Maine (Dr Korda); Office of Information Products & Data Analytics, US Centers for Medicare & Medicaid Services, Baltimore, Maryland (Dr Haffer); and IMPAQ International, Columbia, Maryland (Dr Sennett).

The research in this article was supported by the Centers for Medicare & Medicaid Services under contract number 500-2006-00007I/#T0004 with IMPAQ International. The views expressed in this article are those of the authors and do not necessarily reflect the views of the US Department of Health and Human Services, the Centers for Medicare & Medicaid Services, or IMPAQ International.

The authors declare no conflicts of interest.

Correspondence: Holly Korda, PhD, MA, Health Systems Research Associates, 15 Birch Lane, Portland, ME 04064 ([email protected]).

DOI: 10.1097/PHH.0b013e3182a3e958

Copyright © 2014 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.

∗ Available at www.healthdata.gov and http://www.cms.gov/BSAPUFS with documentation, data dictionary, and application programming interface. All files and documentation downloaded in March 2013.

● Background

The CMS developed the Medicare Claims PUFs in 2010 as part of the efforts by the Department of Health and Human Services to improve the accessibility of public data for decision making to support a transformed health system. This initiative, funded by the American Recovery and Reinvestment Act, was designed to make administrative data available to the general public for comparative effectiveness research and allow for evidence-based research on various health conditions.

The CMS maintains one of the most comprehensive administrative data resources: claims for all FFS Medicare beneficiaries (about 80% of the Medicare population, or about 38.5 million enrollees in 2010) across different settings and types of care. This information is available in 2 versions: research-identifiable files (RIFs), which include beneficiary-level protected health information, and limited data set (LDS) files, which also contain beneficiary-level protected health information but with selected variables encrypted, blanked, or ranged.† Specifically, LDS files exclude direct identifiers (see later) as outlined in the Health Insurance Portability and Accountability Act of 1996 (HIPAA).‡ Given the sensitivity of information contained in RIFs, requests to obtain RIFs are reviewed by the CMS’s Privacy Board to assess the need for identifiable data. Both RIFs and LDS files require a data user agreement, which details how the data may be used, limitations on the uses of the data, and required data security standards to ensure the maximum amount of privacy protection. To access these data, researchers need to pay the CMS a recovery-of-cost fee, which represents a significant barrier for many.

The purpose of this initiative is to increase access to CMS data through the creation and dissemination of PUFs, while continuing to strictly protect beneficiary and provider confidentiality. This third method of data access (in addition to RIFs and LDS files) is also designed to remove the recovery-of-cost fee and eliminate the need for a data user agreement, by de-identifying the information contained in the PUFs and complying with the HIPAA Privacy Rule and the Privacy Act of 1974. Accordingly, these PUFs not only exclude the 18 direct identifiers defined by HIPAA but also implement additional statistical disclosure limitation techniques to ensure that the privacy and confidentiality of Medicare beneficiaries are protected.

† Available at http://www.resdac.org/resconnect/articles/148.

‡ Available at http://www.cms.gov/Research-Statistics-Data-and-Systems/Computer-Data-and-Systems/Privacy/DUA_-_LDS.html.

● CMS Medicare Claims PUFs

Development of the Medicare Claims PUFs included a comprehensive legal analysis, assessment of the needs of health care researchers, case studies of other PUF initiatives in the United States, and interviews with stakeholders [7-9]. As of May 2013, the initiative has prepared and released 8 basic stand-alone (BSA) PUFs covering all 8 settings for 2008 and 2010 (Table 1). The unit of observation depends on the size and richness of information contained in each data set and is strategically designed for each PUF. Each of these PUFs is based on a 5% random (without replacement) sample of Medicare beneficiaries and cannot be linked to any of the other PUFs. Each PUF provides demographic information (ie, gender and age of the beneficiaries) and claim-specific information (eg, diagnosis and procedure codes, spending, and utilization measures) for the sample of beneficiaries. Other aggregated files that link information across multiple types of settings and, in some cases, multiple databases were also produced and made public.


TABLE 1 ● Medicare Claims PUFs

| PUF Name | Years | Description |
|---|---|---|
| BSA Medicare Claims PUFs | | |
| BSA Inpatient Claims | 2008 | This claim-level file contains 7 analytic variables: age, gender, base diagnosis related group, ICD-9 primary procedure code, length of stay, and Medicare payment amount. |
| BSA Outpatient Procedures | 2008 and 2010 | This procedure-level file contains 6 analytic variables: age, gender, ICD-9 primary diagnosis code, HCPCS procedure code^a, count of services provided, and Medicare payment amount. |
| BSA Durable Medical Equipment Line Items | 2008 and 2010 | This line item-level file contains 6 analytic variables: age, gender, ICD-9 primary diagnosis code, HCPCS code, count of services provided, and Medicare payment amount. |
| BSA Prescription Drug Events | 2008 and 2010 | This event-level file contains 12 analytic variables: age, gender, drug name, drug strength and unit of strength, dose form, class, quantity dispensed, days supply, total drug cost, payment by patient, and drug type. |
| BSA Carrier Line Items | 2008 and 2010 | This line item-level file contains 10 analytic variables: age, gender, ICD-9 diagnosis code, HCPCS code, BETOS code^b, count of services provided, provider type, service type, place of service, and Medicare payment amount. |
| BSA Home Health Agency Beneficiaries | 2008 and 2010 | This beneficiary-level file contains 7 analytic variables: age, gender, number of admissions, count of therapy visits, count of skilled nursing visits, count of home health aide visits, and Medicare payment amount. |
| BSA Hospice Beneficiaries | 2008 and 2010 | This beneficiary-level file contains 7 analytic variables: age, gender, indicator for deceased at discharge, terminal diagnosis, cancer diagnosis indicator, covered days, and Medicare payment amount. |
| BSA Skilled Nursing Facility Beneficiaries | 2008 and 2010 | This beneficiary-level file contains 7 analytic variables: age, gender, admissions, covered days of skilled nursing facility rehabilitation services, covered days of skilled nursing facility rehabilitation plus extensive services, and Medicare payment amount. |
| Aggregated Medicare Claims PUFs | | |
| Chronic Conditions | 2008 and 2010 | This is an aggregated file summarizing utilization by 100% of the FFS Medicare beneficiaries. Each record is a profile defined by age, gender, 11 chronic condition indicators, and dual-eligibility^c status of the beneficiaries. For each profile, many claim-related variables are provided in the form of averages. The data are disaggregated by length of enrollment: full year (12 mo) or less than full year. |
| Institutional Provider and Beneficiary Summary File | 2008 and 2010 | This is an aggregated file summarizing care and services provided by institutional providers (eg, hospital, skilled nursing facility) to FFS Medicare beneficiaries. The file includes 3 types of measures: (1) beneficiary measures; (2) cost and utilization measures; and (3) prevention quality indicators. |
| Prescription Drug Profiles | 2008 and 2010 | This is an aggregated file summarizing drug events by the characteristics of Medicare beneficiaries, drugs, plans, and prescribers for 100% of prescription drug claims for Medicare beneficiaries covered by Part D. |

Abbreviations: BETOS, Berenson-Eggers Type of Service; BSA, basic stand-alone; CPT, Current Procedural Terminology; FFS, fee-for-service; HCPCS, Healthcare Common Procedure Coding System; ICD-9, International Classification of Diseases, Ninth Revision; PUF, public use file.
a The HCPCS codes are based on the CPT codes developed by the American Medical Association and describe services provided by medical practitioners. Level I HCPCS codes are numeric and identical to CPT codes developed by the American Medical Association. Level II HCPCS are alphanumeric codes that are used by medical suppliers other than physicians, such as ambulance services or durable medical equipment.
b All HCPCS codes are assigned to BETOS clinical categories, which are used to analyze Medicare costs.
c Dual-eligibles are entitled to Medicare Part A and/or Part B and also eligible for some form of Medicaid benefit. For a review of different types of Medicaid coverage of Medicare beneficiaries, see https://www.cms.gov/MLNProducts/downloads/Medicare_Beneficiaries_Dual_Eligibles_At_a_Glance.pdf.

Medicare PUFs vary in structure and content, with some files offering greater potential to support specific types of surveillance activities than others. Because the files use retrospective, administrative claims data, they are of limited use for active surveillance or to detect outbreaks and incidence. However, they are well suited to investigations of disease prevalence and expenditures in the Medicare population and studies of institutional provider treatment patterns and performance, including examination of geographic variation in utilization.

● Analytic Issues and Examples

Administrative claims data contained in the PUFs can provide cross-sectional and longitudinal information and enable studies of prevalence and of spending by condition, provider, or beneficiary subgroup. Because the PUFs use claims from the Medicare FFS program, which covers nearly 96% of US residents 65 years and older, the files offer a unique opportunity to investigate patterns and trends among beneficiaries in the US population aged 65 years and older. Race and socioeconomic status (SES) are not specified in the PUFs, limiting their use as stand-alone files for monitoring racial-ethnic or SES disparities, although the files can be used in combination with other information, such as census data, to estimate community demographics. The institutional files enable studies of resource use, institution-specific and class treatment intensity, quality of care and performance, and cross-sectional comparisons and trending. Some PUFs can support studies of geographic variation (eg, by hospital referral regions).

Single and multiple chronic conditions monitoring

The CMS 2008 and 2010 Chronic Conditions PUFs are appropriate for surveillance and monitoring of chronic conditions, here monitored over a 2-year period. The PUFs can support studies of (1) periodic trending of prevalence, (2) spending for specific and multiple chronic conditions, and (3) changes in the prevalence of any of the 11 conditions or any combination of conditions (ie, comorbidities) available in the data set. These files summarize claims information for 100% of Medicare FFS enrollees grouped by gender, age, dual-eligibility status, 11 chronic condition indicators (eg, depression, diabetes, congestive heart failure), and length of enrollment (ie, full year or less than full year).

Table 2 summarizes trends in the prevalence and costs of Medicare Part A (hospitalization) full-year enrollees who were not dual-eligible beneficiaries [10]. Beneficiaries with chronic conditions account for a disproportionate share of program payments for Part A. In 2008, approximately 36.4% of Part A beneficiaries had 2 or more chronic conditions, accounting for 85.5% of total Part A payments (among the Part A full-year enrollees who were not dual-eligible beneficiaries). By 2010, the share of Part A beneficiaries with 2 or more chronic conditions had increased to 36.8%, accounting for 86.2% of total Part A payments. Total Medicare payments for Part A enrollees in this subpopulation increased by 5.2% (about $4.1 billion) between 2008 and 2010. Approximately 98% of that increase was for the care of those with multiple (≥2) chronic conditions.

TABLE 2 ● Changes in Enrollment, Total Payments, and Average Payment per Enrollee for Medicare Part A Full-Year Beneficiaries by the Number of Chronic Conditions

| No. Chronic Conditions | No. Enrollees (Part A), 2008 | No. Enrollees (Part A), 2010 | % Change | Total Payment Part A (Millions), 2008 | Total Payment Part A (Millions), 2010 | % Change | Average Payment per Beneficiary, 2008 | Average Payment per Beneficiary, 2010 | % Change |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 138 926 | 10 245 731 | 1.1 | $2 511 | $2 512 | 0.0 | $248 | $245 | −1.0 |
| 1 | 6 663 517 | 6 609 818 | −0.8 | $8 754 | $8 833 | 0.9 | $1 314 | $1 336 | 1.7 |
| 2 | 4 583 587 | 4 605 347 | 0.5 | $13 740 | $13 936 | 1.4 | $2 998 | $3 026 | 0.9 |
| 3 | 2 632 736 | 2 680 459 | 1.8 | $15 714 | $16 203 | 3.1 | $5 969 | $6 045 | 1.3 |
| 4 | 1 399 364 | 1 445 912 | 3.3 | $15 097 | $15 890 | 5.2 | $10 789 | $10 989 | 1.9 |
| 5 | 649 251 | 686 250 | 5.7 | $11 391 | $12 292 | 7.9 | $17 545 | $17 912 | 2.1 |
| 6 | 251 404 | 276 226 | 9.9 | $6 575 | $7 418 | 12.8 | $26 153 | $26 854 | 2.7 |
| 7 | 80 674 | 91 023 | 12.8 | $2 924 | $3 442 | 17.7 | $36 243 | $37 813 | 4.3 |
| 8 | 19 532 | 23 089 | 18.2 | $913 | $1 114 | 21.9 | $46 766 | $48 243 | 3.2 |
| 9 | 2 991 | 3 910 | 30.7 | $168 | $226 | 34.9 | $56 014 | $57 806 | 3.2 |
| 10 | 225 | 269 | 19.6 | $15 | $18 | 19.8 | $68 333 | $68 495 | 0.2 |
| Total | 26 422 207 | 26 668 034 | 0.9 | $77 803 | $81 884 | 5.2 | $2 945 | $3 070 | 4.3 |

Hospital readmissions trending by facility and facility type

High hospital readmission rates are an indicator of poor quality care and a focus of performance improvement under the Affordable Care Act. Identifying hospitals with high readmissions and understanding factors that lead to readmissions are priorities for health system change. To investigate this issue, we used the 2010 Institutional Provider and Beneficiary Summary PUF, which summarizes the types of Medicare FFS beneficiaries served by institutions, including disaggregation by age group, race, dual-eligibility status, and health status (ie, data on how many beneficiaries had any of the 26 chronic conditions), as well as total Medicare payments, admissions, readmissions, and covered days. The CMS 2010 Institutional Provider and Beneficiary Summary PUF also provides the average hierarchical condition category (HCC) risk score for beneficiaries using the services of each hospital.§ Beneficiaries with higher risk scores are relatively more costly (and less healthy) than beneficiaries with lower scores.

Using this PUF, we examined readmission rates for 3541 acute care (short-stay) hospitals that served Medicare beneficiaries in 2010 [11]. As Table 3 shows, readmission rates were higher for (i) larger hospitals (measured by the number of beds), (ii) highly used hospitals (measured by the occupancy rate), (iii) hospitals that serve more dual-eligible beneficiaries and more beneficiaries younger than 65 years, and (iv) hospitals that serve sicker beneficiaries (measured by the average risk score of beneficiaries). Although these findings do not control for other confounding factors, they were also confirmed in the context of a multivariate analysis that controls for all factors at the same time [11].

§ The CMS HCC risk adjustment model is used to adjust payments for Part C (Medicare Advantage) plans. The model assigns a risk score to each Medicare beneficiary on the basis of his or her Medicare FFS claims history. The scores are normalized so that the average risk score is 1.0. For additional information on the CMS HCC risk score model, visit https://www.cms.gov/MedicareAdvtgSpecRateStats/06a_Risk_adjustment_prior.asp.

TABLE 3 ● Hospital Readmission Rates for Medicare Fee-for-Service Beneficiaries

| Category | Type | No. Hospitals | Readmission Rate |
|---|---|---|---|
| Size (beds) | 0-25 | 167 | 12.0% |
| | 25-100 | 954 | 17.8% |
| | 100-250 | 1267 | 18.8% |
| | 250-500 | 753 | 19.1% |
| | 500-2500 | 276 | 20.2% |
| | Unknown | 126 | 18.0% |
| Occupancy rate | 0%-10% | 477 | 18.1% |
| | 10%-20% | 1042 | 18.9% |
| | 20%-40% | 1746 | 19.2% |
| | 40%-100% | 134 | 19.7% |
| | Unknown | 144 | 18.0% |
| Dual share | 0%-10% | 171 | 16.4% |
| | 10%-25% | 1591 | 18.3% |
| | 25%-50% | 1410 | 20.0% |
| | 50%-75% | 258 | 23.2% |
| | 75%-100% | 84 | 25.7% |
| | Unknown | 29 | 11.9% |
| Under 65 share | 0%-10% | 175 | 16.9% |
| | 10%-25% | 2432 | 18.7% |
| | 25%-50% | 852 | 21.4% |
| | 50%-75% | 46 | 25.5% |
| | 75%-100% | 14 | 17.7% |
| | Unknown | 24 | 0.0% |
| Average risk score | 0.75-1.25 | 948 | 16.3% |
| | 1.25-1.5 | 1622 | 18.6% |
| | 1.5-4.5 | 972 | 21.3% |
| Total | | 3541 | 19.1% |
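Returning to Table 2, the shares quoted in the chronic conditions discussion can be rechecked directly from the table rows. The sketch below is illustrative only: the lists are transcribed from Table 2 (rows for 0 through 10 chronic conditions), while the function names are our own and are not part of any CMS tooling. Because the published payment figures are rounded to millions of dollars, the recomputed shares may differ from the text in the last decimal place.

```python
# Rows 0..10 chronic conditions, transcribed from Table 2
# (Part A full-year enrollees who were not dual-eligible beneficiaries).
ENROLLEES_2008 = [10_138_926, 6_663_517, 4_583_587, 2_632_736, 1_399_364,
                  649_251, 251_404, 80_674, 19_532, 2_991, 225]
ENROLLEES_2010 = [10_245_731, 6_609_818, 4_605_347, 2_680_459, 1_445_912,
                  686_250, 276_226, 91_023, 23_089, 3_910, 269]
# Total Part A payments in millions of dollars, same row order.
PAYMENTS_2008 = [2_511, 8_754, 13_740, 15_714, 15_097,
                 11_391, 6_575, 2_924, 913, 168, 15]
PAYMENTS_2010 = [2_512, 8_833, 13_936, 16_203, 15_890,
                 12_292, 7_418, 3_442, 1_114, 226, 18]


def multi_condition_share(by_count, threshold=2):
    """Percent of the column total contributed by rows with >= threshold conditions."""
    return 100 * sum(by_count[threshold:]) / sum(by_count)


def multi_condition_share_of_increase(before, after, threshold=2):
    """Percent of the total change attributable to rows with >= threshold conditions."""
    change_multi = sum(after[threshold:]) - sum(before[threshold:])
    change_total = sum(after) - sum(before)
    return 100 * change_multi / change_total


if __name__ == "__main__":
    for year, enrollees, payments in [(2008, ENROLLEES_2008, PAYMENTS_2008),
                                      (2010, ENROLLEES_2010, PAYMENTS_2010)]:
        print(year,
              f"enrollee share: {multi_condition_share(enrollees):.1f}%",
              f"payment share: {multi_condition_share(payments):.1f}%")
    growth = multi_condition_share_of_increase(PAYMENTS_2008, PAYMENTS_2010)
    print(f"share of payment increase: {growth:.1f}%")
```

The same two helpers would apply unchanged to any other threshold (for example, 3 or more conditions), which is one way to explore how concentrated Part A spending is among beneficiaries with many comorbidities.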

Prevalence and expenditures in medical and hospital-based diabetes care

Diabetes mellitus is a common chronic condition affecting an estimated 25.8 million people, or 8.3% of the population of the United States in 2010, including 10.9 million adults 65 years and older (26.8%) [12]. Diabetes refers to a set of conditions, including type 1 diabetes mellitus, usually diagnosed in children and young adults, and type 2 diabetes mellitus, a metabolic disorder formerly known as adult-onset diabetes. Diabetes is a leading cause of death and disability and is often associated with comorbidities, including kidney failure, lower limb amputations, adult-onset blindness, obesity, hypertension, nerve damage, heart disease, and stroke. These conditions complicate treatment and increase related spending on diabetes in the United States, estimated at $176 billion in direct medical costs with an additional $69 billion in reduced productivity in 2012 [13].

To examine the prevalence and impact of diabetes, we used the 2010 Chronic Conditions PUFs, which summarize more than 50 million Medicare beneficiaries. In 2010, average Medicare reimbursements for a full-year enrollment (ie, 12 months of continuous enrollment) for beneficiaries who are not dual-eligible beneficiaries were $3146 and $4056 per beneficiary in Part A and Part B, respectively, and average drug costs per beneficiary were $2048. Table 4 shows that diabetes appears among the 15 most common combinations of chronic conditions among Part A, Part B, and Part D beneficiaries [14]. About 38.3% of Medicare Part A beneficiaries and 31.5% of Medicare Part B beneficiaries did not have any of these chronic conditions. Beneficiaries with only diabetes were about 5.4% (6.0%) of the Part A (Part B) beneficiaries, but another 2.4% (2.6%) had both diabetes and ischemic heart disease, and 1.1% (1.2%) had both diabetes and rheumatoid arthritis/osteoarthritis within the most common combinations provided in the table. Ischemic heart disease is the most common comorbidity of diabetes among Medicare beneficiaries.
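The diabetes percentages quoted above can be recomputed from the beneficiary counts in Table 4. This is a minimal sketch under stated assumptions: the counts and totals are transcribed from Table 4, the dictionary keys reuse the table's abbreviations, and the helper names are hypothetical. The combined figure is only a lower bound on diabetes prevalence, because beneficiaries with diabetes in the "All other" row are not counted.

```python
# Counts transcribed from Table 4 (2010); keys use the table's abbreviations.
DIABETES_ROWS = {
    "Part A": {"DIAB": 1_457_641, "DIAB and IHD": 642_840, "DIAB and RA/OA": 296_557},
    "Part B": {"DIAB": 1_412_250, "DIAB and IHD": 626_484, "DIAB and RA/OA": 294_210},
}
TOTAL_BENEFICIARIES = {"Part A": 26_751_682, "Part B": 23_677_771}


def percent_of_total(count, total):
    """Percent of beneficiaries in one row, rounded to 1 decimal as in Table 4."""
    return round(100 * count / total, 1)


def listed_diabetes_share(part):
    """Unrounded percent of beneficiaries in any listed combination that includes diabetes."""
    return 100 * sum(DIABETES_ROWS[part].values()) / TOTAL_BENEFICIARIES[part]


if __name__ == "__main__":
    for part, rows in DIABETES_ROWS.items():
        for label, count in rows.items():
            print(part, label, percent_of_total(count, TOTAL_BENEFICIARIES[part]))
        print(part, "listed combinations with diabetes:",
              f"{listed_diabetes_share(part):.2f}%")
```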

TABLE 4 ● Most Common 15 Combinations of Chronic Conditions

| Conditions (Part A and Part B) | No. Beneficiaries (Part A) | % Beneficiaries (Part A) | No. Beneficiaries (Part B) | % Beneficiaries (Part B) | Conditions (Part D) | No. Beneficiaries (Part D) | % Beneficiaries (Part D) |
|---|---|---|---|---|---|---|---|
| None | 10 245 731 | 38.3 | 7 468 750 | 31.5 | None | 10 200 308 | 57.2 |
| DIAB | 1 457 641 | 5.4 | 1 412 250 | 6.0 | DIAB | 721 715 | 4.0 |
| IHD | 1 381 884 | 5.2 | 1 337 306 | 5.6 | IHD | 663 984 | 3.7 |
| RA/OA | 1 087 123 | 4.1 | 1 075 092 | 4.5 | RA/OA | 506 029 | 2.8 |
| OSTEO | 780 153 | 2.9 | 771 700 | 3.3 | OSTEO | 362 094 | 2.0 |
| DIAB and IHD | 642 840 | 2.4 | 626 484 | 2.6 | DIAB and IHD | 297 262 | 1.7 |
| DEPR | 506 759 | 1.9 | 485 388 | 2.0 | DEPR | 243 769 | 1.4 |
| CAN | 364 414 | 1.4 | 356 644 | 1.5 | ALZ | 155 801 | 0.9 |
| IHD and RA/OA | 324 979 | 1.2 | 323 380 | 1.4 | CAN | 151 442 | 0.8 |
| DIAB and RA/OA | 296 557 | 1.1 | 294 210 | 1.2 | IHD and RA/OA | 149 652 | 0.8 |
| OSTEO and RA/OA | 292 620 | 1.1 | 292 042 | 1.2 | DIAB and RA/OA | 139 793 | 0.8 |
| CHF and IHD | 282 873 | 1.1 | 277 055 | 1.2 | OSTEO and RA/OA | 137 675 | 0.8 |
| CKD | 278 485 | 1.0 | 263 884 | 1.1 | CHF and IHD | 134 076 | 0.8 |
| ALZ | 263 117 | 1.0 | 257 461 | 1.1 | CKD | 124 410 | 0.7 |
| COPD | 252 725 | 0.9 | 245 031 | 1.0 | COPD | 110 731 | 0.6 |
| All other | 8 293 781 | 31.0 | 8 191 094 | 34.6 | All other | 3 733 817 | 20.9 |
| Total | 26 751 682 | 100.0 | 23 677 771 | 100.0 | Total | 17 832 558 | 100.0 |

Abbreviations: ALZ, Alzheimer disease; CAN, cancer; CHF, congestive heart failure; CKD, chronic kidney disease; COPD, chronic obstructive pulmonary disease; DEPR, depression; DIAB, diabetes; IHD, ischemic heart disease; OSTEO, osteoporosis; RA/OA, rheumatoid arthritis/osteoarthritis.

● Discussion

There are many advantages to conducting surveillance using PUFs. First, the information is calculated from a large number of observations. Even the “smallest” CMS BSA PUF is based on a 5% sample of Medicare FFS beneficiaries, about 2.4 and 2.5 million individuals in 2008 and 2010, respectively. Hence, PUFs can allow for analyses based on millions of observations compared with, for example, survey data based on much smaller samples. These PUFs also allow for analyses of subpopulations (eg, age groups, males vs females, dual-eligible vs non–dual-eligible) for finer research questions. Second, CMS PUFs include health care utilization of approximately 96% of adults 65 years and older in the United States. Given that older adults use a higher share of health care resources, these PUFs cover a significant share of spending. Third, the source data are collected by the CMS and have been available for research for many years (with the restrictions described earlier). The additional cost of producing and releasing PUFs from these data files is especially small in light of the American Recovery and Reinvestment Act investment in the initial costs of developing the appropriate methodology and expertise. The fact that the PUFs are made available to the public freely and without a data user agreement also lowers the costs to the rest of the public. Fourth, the PUFs allow for research (eg, examination of prevalence and treatment) based on standardized definitions of diagnoses and conditions (eg, using ICD-9 [International Classification of Diseases, Ninth Revision] codes). This property eliminates the difficulties that are associated with most self-reported survey data. For example, the 2008 and 2010 Chronic Conditions PUFs allow for analyses of 11 chronic conditions that are determined by searching for specific ICD-9, CPT (Current Procedural Terminology) 4, and HCPCS (Healthcare Common Procedure Coding System) codes in the beneficiaries’ Medicare FFS claims data for the last few years.‖ Production and release of these PUFs on an annual basis would facilitate longitudinal, year-on-year investigation of trends by disease or condition grouping (ie, a chronic condition with its important comorbidities) and provide information on spending associated with specific conditions for subpopulations (eg, gender and age groups). Finally, because the PUFs are de-identified and no protected health information is released, concerns regarding the privacy and confidentiality of individuals are significantly reduced.

‖ For more information, see the Chronic Condition Warehouse Web site: http://www.ccwdata.org/cs/groups/public/documents/document/ccw_conditioncategories2011.pdf.

The CMS PUFs are not without limitations. For example, the main de-identification principle is k-anonymity, which makes the information included in the PUF anonymous by ensuring that there exist at least k identical observations for each combination of identifying variables [15]. That is, information for a cell (in a table of frequencies or a PUF) cannot be released if the number of individuals for that cell is less than k, which is equal to 11 in the CMS PUFs. Values of 3 to 5 are commonly used by agencies in the United States, but k may be larger depending on the sensitivity of the information and the amount of protection required [16]. This principle typically requires exclusion or suppression of information for rare diseases that affect very few individuals and/or small populations. In fact, the principle of k-anonymity makes inclusion of an extended list of variables increasingly difficult, because each additional variable leads to smaller cells, that is, combinations of variables with a small number of beneficiaries. As a result, most CMS BSA PUFs contain only 6 to 8 variables, which limits the utility of the files. Geography (eg, state) and race/ethnicity are not included because the combination of the 2 variables creates many small cells. Hence, these PUFs may not be useful for policy or intervention analysis but may provide value in a supportive role.

Another technique used in the CMS PUFs is generalization in the form of coarsening information (eg, providing age categories rather than exact ages) or rounding values (eg, rounded payment values rather than actual ones). This is a more desirable option than other disclosure limitation techniques, such as data swapping and adding random noise, but it still limits the utility of the information and potentially leads to biased estimates. Also, because PUFs are de-identified (including the ones with encrypted identifiers), they cannot be linked to external databases, registries, or surveys, reducing their potential. In the case of the CMS BSA PUFs, each BSA PUF (eg, inpatient) is a stand-alone file and cannot be linked to any of the other PUFs (eg, outpatient).
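The k-anonymity rule and the coarsening step described above can be illustrated with a short sketch. It is not the CMS production procedure: the age bands, the two-variable cell definition, and the function names are assumptions made for illustration. Only the threshold k = 11 comes from the text.

```python
from collections import Counter

K = 11  # minimum cell size stated in the text for the CMS PUFs


def age_category(age):
    """Coarsen an exact age into a band (illustrative bands, not the CMS categories)."""
    if age < 65:
        return "under 65"
    if age < 75:
        return "65-74"
    if age < 85:
        return "75-84"
    return "85+"


def cell_counts(records):
    """Count records per (age band, gender) cell; records are (age, gender) tuples."""
    return Counter((age_category(age), gender) for age, gender in records)


def releasable_cells(records, k=K):
    """Keep only cells that contain at least k records; smaller cells are suppressed."""
    return {cell: n for cell, n in cell_counts(records).items() if n >= k}


if __name__ == "__main__":
    # Toy records: 12 women aged 70 and 3 men aged 90.
    toy_records = [(70, "F")] * 12 + [(90, "M")] * 3
    print(releasable_cells(toy_records))
```

With these toy records, the cell for women aged 65-74 holds 12 records and is kept, while the cell for men aged 85 and older holds only 3 and is suppressed. Adding a third variable to the cell key would split each cell further, which is the mechanism behind the small number of variables in each BSA PUF.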
While patient-level linkage is not possible, one of the CMS BSA PUFs (Institutional Provider and Beneficiary Summary PUF described earlier) can be used in combination with other data sources, for example, census and/or state-level data (eg, state surveillance systems), to link the files with contextual, demographic, and market area information useful for public health surveillance. For example, census data on race, income, and age composition of the surrounding service area, metropolitan service area, or county can be overlaid with the PUFs to provide location and service area information. Other secondary data sets that describe characteristics and availability of other provider types, or that provide information about multiple determinants of health, such as the Robert Wood Johnson Foundation–funded County Health Rankings data developed by the University of Wisconsin Population Health Institute,¶ can also be used to enrich information available in the PUFs. Using the PUFs in this way, as one of several secondary data sets, is consistent with the national public health grid strategy, which includes consumers, providers, and public health agencies participating in data sharing for public health surveillance.17 This concept has been advanced by the Centers for Disease Control and Prevention’s National Center for Public Health Informatics.18 Moving forward, public health researchers may want to cross-validate data definitions and prevalence from the Medicare PUFs with surveys and other similar data sources to discover how best to integrate and use the PUFs as part of the broader surveillance picture. Validation with Centers for Disease Control and Prevention surveillance data sets and large national surveys, such as the National Health Interview Survey, currently used to examine prevalence of various conditions, may also help advance understanding of public health surveillance across data sources.

¶ Available at http://www.countyhealthrankings.org.
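The aggregate-level overlay and cross-validation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the field names (`county_fips`, `beneficiaries`, `with_condition`, `median_income`, `survey_prevalence`) and all values are hypothetical and synthetic, and they are not taken from the actual PUF layouts, census files, or any survey. The join is on a shared geographic key only; no patient-level linkage is involved.

```python
# Synthetic rows standing in for an aggregate-level PUF extract (hypothetical fields).
puf_rows = [
    {"county_fips": "00001", "beneficiaries": 1200, "with_condition": 300},
    {"county_fips": "00002", "beneficiaries": 800, "with_condition": 160},
    {"county_fips": "00003", "beneficiaries": 500, "with_condition": 150},
]

# Synthetic contextual data keyed by the same geography (hypothetical fields).
context_rows = [
    {"county_fips": "00001", "median_income": 52000, "survey_prevalence": 0.27},
    {"county_fips": "00002", "median_income": 61000, "survey_prevalence": 0.18},
]


def overlay(puf, context, key="county_fips"):
    """Left-join contextual attributes onto aggregate rows by a shared geographic key."""
    lookup = {row[key]: row for row in context}
    merged = []
    for row in puf:
        combined = dict(row)
        extra = lookup.get(row[key], {})
        combined.update({k: v for k, v in extra.items() if k != key})
        combined["matched"] = row[key] in lookup  # flag areas with no contextual data
        merged.append(combined)
    return merged


def claims_prevalence(row):
    """Share of beneficiaries in the area with the condition, from claims counts."""
    return row["with_condition"] / row["beneficiaries"]


def prevalence_gap(row):
    """Claims-based minus survey-based prevalence; None if no survey estimate exists."""
    if "survey_prevalence" not in row:
        return None
    return claims_prevalence(row) - row["survey_prevalence"]


if __name__ == "__main__":
    for row in overlay(puf_rows, context_rows):
        print(row["county_fips"], row["matched"], prevalence_gap(row))
```

Keeping unmatched areas in the output, rather than dropping them, makes gaps in the contextual data visible before any comparison is interpreted.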

● Conclusion
Claims data from public and private health systems have been an underutilized resource for public health surveillance. New attention to population health and public health surveillance, combined with availability of identifiable and de-identified claims data for the nearly universal enrollment of individuals 65 years and older in the Medicare program, brings into focus the potential of claims data, specifically Medicare Claims PUFs, as a secondary data source for surveillance. The Medicare PUFs we describe and illustrate are available free of charge as part of the US Department of Health and Human Services’ efforts to promote transparent, available public health data. They include information for 2008 and 2010, with possible updates for future years. These valuable resources are accessible, easy to use, and contribute additional tools for the national grid for public health surveillance with unique, comprehensive access to the older adult US population. These files also provide useful information to generate questions, test hypotheses, and/or perform power calculations. Finally, they may provide an initial look into claims data and an opportunity for preparation (eg, developing codes) for researchers who plan to request and access identifiable data from the CMS.

REFERENCES
1. Thacker SB, Berkelman RI. Public health surveillance in the United States. Epidemiol Rev. 1988;10:164-190.
2. Desai J, Geiss L, Mukhtar Q, et al. Public health surveillance of diabetes in the United States. J Public Health Manag Pract. 2003;(suppl):S44-S51.
3. Thacker SB, Qualters JR, Lee LM; Centers for Disease Control and Prevention. Public health surveillance in the United States: evolution and challenges. MMWR Surveill Summ. 2012;61(suppl):3-9.
4. Bernstein AB, Sweeney MH; Centers for Disease Control and Prevention. Public health surveillance data: legal, policy, ethical, regulatory, and practical issues. MMWR Surveill Summ. 2012;61(suppl):30-34.
5. Buehler J. Introduction: CDC’s vision for public health surveillance in the 21st century. MMWR Surveill Summ. 2012;61(suppl):1-2.
6. Smith PF, Hadler JL, Stanbury M, Rolfs RT, Hopkins RS; the CSTE Surveillance Strategy Group. “Blueprint Version 2.0”: updating public health surveillance for the 21st century. J Public Health Manag Pract. 2013;19(3):231-239.
7. Erdem E, Prada S. Creation of public use files: lessons learned from the comparative effectiveness research public use files data pilot project. In: Joint Statistical Meeting Proceedings, 2011. Alexandria, VA: Government Statistics Section, American Statistical Association; 2011:4095-4109.
8. Erdem E, Concannon TW. What do researchers say about proposed Medicare claims public use files? J Comp Eff Res. 2012;1(6):519-525.
9. Prada S, González-Martínez C, Borton J, et al. Avoiding disclosure of individually identifiable health information: a literature review [published online ahead of print December 14, 2011]. SAGE Open. doi:10.1177/2158244011431279.
10. Erdem E, Prada S, Haffer S. Medicare payments: how much do chronic conditions matter? Medicare Medicaid Res Rev. 2013;3(2):E1-E15.
11. Erdem E, Fout BT, Korda H, Abolude A. Hospital readmission rates in Medicare. Working Paper. Submitted.
12. Centers for Disease Control and Prevention. National Diabetes Fact Sheet: National Estimates and General Information on Diabetes and Prediabetes in the United States, 2011. Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention; 2011.
13. American Diabetes Association. Economic costs of diabetes in the U.S. in 2007. Diabetes Care. 2008;31(3):596-615.
14. Erdem E, Korda H. Medicare fee-for-service spending for diabetes and co-morbidities. Working Paper. 2013. Submitted.
15. Sweeney L. k-Anonymity: a model for protecting privacy. Int J Uncertainty Fuzziness Knowledge-Based Syst. 2002;10(7):557-570.
16. Federal Committee on Statistical Methodology. Report on Statistical Disclosure Limitation Methodology. Rev 2005. Washington, DC: Office of Management and Budget. Statistical Policy Working Paper 22.
17. Thacker SB, Qualters JR, Lee LM; Centers for Disease Control and Prevention. Public health surveillance in the United States: evolution and challenges. MMWR Surveill Summ. 2012;61(suppl):3-9.
18. Savel TG, Hall KE, McMullin V, et al. A public health grid (PH Grid): architecture and value proposition for 21st century public health. Int J Med Inform. 2010;79:523-529.

Journal of Public Health Management and Practice

Copyright © 2014 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.
