At the Intersection of Health, Health Care and Policy

Cite this article as: Dawn Fallik. For Big Data, Big Questions Remain. Health Affairs 33, no. 7 (2014): 1111-1114. doi: 10.1377/hlthaff.2014.0522

The online version of this article, along with updated information and services, is available at: http://content.healthaffairs.org/content/33/7/1111.full.html


Health Affairs is published monthly by Project HOPE at 7500 Old Georgetown Road, Suite 600, Bethesda, MD 20814-6133. Copyright © 2014 by Project HOPE - The People-to-People Health Foundation. As provided by United States copyright law (Title 17, U.S. Code), no part of Health Affairs may be reproduced, displayed, or transmitted in any form or by any means, electronic or mechanical, including photocopying or by information storage or retrieval systems, without prior written permission from the Publisher. All rights reserved.

Not for commercial use or unauthorized distribution Downloaded from content.healthaffairs.org by Health Affairs on August 24, 2015 at Haiti: HEALTH AFFAIRS Sponsored

Entry Point

Delivering data: Cardiologist Jose Soler uses an app on an iPad to review medical tests of one of his patients at Northwest Medical Center, in Margate, Florida. Such digital tools are part of the big-data revolution in US health care.
Photograph by Emily Michot/Miami Herald/MCT via Getty Images

doi: 10.1377/hlthaff.2014.0522

For Big Data, Big Questions Remain

Medicare’s release of practitioner payments highlights the strengths and weaknesses of digging into big data.

BY DAWN FALLIK

During the second week of April 2014, medical journalists nationwide dropped everything and started playing with Medicare records. In response to a lawsuit from the Wall Street Journal, the federal government released payment records for more than 825,000 practitioners across the country. The multimillion-record release from the Centers for Medicare and Medicaid Services (CMS) meant that reporters could figure out which doctors, down to their very names and specialties, were bringing in the biggest federal payments. It was great information, fantastic source material, and just the latest embodiment of the promise of big data.

It is that same promise that recently led pharmaceutical companies to agree to merge clinical data in the hope of deciphering diseases such as Alzheimer’s and type 2 diabetes. And it is the same promise that might give the public the transparent, useful information they need to choose the best doctor, the best price, and maybe even (by combining crime statistics with real estate numbers and restaurant reviews) the best place to live.

Except none of these things happens magically, seamlessly, or cleanly. Databases managed by one state, for example, rarely match up with those from another state: they may collect different information or use different diagnosis codes. And rarely can data, especially in great volume, tell a clear story on their own. They have to be understood, interpreted, and explained.

With the Medicare filings, some of the reporters writing stories had never worked with data before. Even veteran computer-assisted reporting experts such as Sarah Cohen at the New York Times had to fight their way through the files. “This first round of stories has to be turned around so fast, and the data is so complicated and there are so many ways to go wrong,” Cohen said. “When you see a big payment, you have to imagine all the ways that doctors run their offices.” For example, not all payments attributed to a particular doctor ultimately end up in his or her own pocket. “If the internist has their own lab, all the lab charges are in that doctor’s name,” Cohen said.

Although the data are available to the public, that doesn’t mean that they’re offered in an accessible or simple format. To paint any kind of clear picture, researchers and journalists must download the millions of records and analyze them using statistical software. It’s not as if every consumer can now go online to Medicare.gov and start searching by doctor or ZIP code. “Medicare has not made this easy,” Cohen said.
“I think they’re waiting to see what kind of things trip up the news organizations, and then they can figure out whether or not they can make something available for the public.”

Defining ‘Big Data’

It’s unclear when “big data” became the buzzword of the day, or, really, what it means. New York University (NYU) computer science professor Ernest Davis said that the term is used very broadly for all kinds of large data sets, structured and unstructured. “It’s a really vague term. Is it data that people are unconsciously providing through ordinary communication?” Davis asked, describing people’s tweets or even their location reported unknowingly by a smartphone: the kind of information retailers could use to identify likely customers in the immediate area. “So if you’re passing by a store, they might call you on your smartphone and tell you about a sale going on,” he continued.

The term can also refer to the volume of data. But Stephen Doig, a data reporting pioneer and Knight Chair in Journalism at Arizona State University, wondered just how large a data set has to be before it counts as big data: “Is big data when you have a million records? Ten million?” It’s a moving target.

After all, using millions of records to analyze trends and predict patterns isn’t new. That is how Amazon, sorting through millions of sales records, can figure out what customers want before they know they want it. It is how your credit card company can spot sudden changes in spending patterns, alerting you that your card may have been stolen before you even notice it is gone.

What’s new is how easy the analysis has become. “The computing tools are much more powerful to work with, and much more affordable,” Doig said. “I’m doing a project for California Watch right now on a Mac Mini, and it [has] 50 million records. The analysis techniques don’t change. You’re still merging, sorting, filtering. It’s just that the software is powerful enough to handle these huge amounts.” As computing power has become cheaper and easier to use, the amount of information available has also exploded, particularly in the business world.
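Doig’s summary of the workflow (merging, sorting, filtering) can be made concrete with a short sketch. This is a minimal illustration in Python using only the standard library, not a description of how CMS, California Watch, or any newsroom actually processes these files: the practitioner IDs, names, specialties, charge categories, and dollar amounts are all hypothetical, and the `kind` field is an invented stand-in for however a real file might distinguish professional services from drug or lab charges. It also illustrates the caveat Cohen raised: once lab and drug charges billed under a doctor’s name are filtered out, a ranking built on raw totals can reverse.

```python
from collections import defaultdict

# Hypothetical practitioner directory keyed by an invented ID.
PRACTITIONERS = {
    "P1": {"name": "Practitioner One", "specialty": "Ophthalmology"},
    "P2": {"name": "Practitioner Two", "specialty": "Internal Medicine"},
    "P3": {"name": "Practitioner Three", "specialty": "Family Practice"},
}

# Hypothetical payment lines: (practitioner ID, kind of charge, amount in dollars).
# The "kind" categories are assumptions for illustration, not real CMS fields.
PAYMENTS = [
    ("P1", "service", 200.0),
    ("P1", "drug", 900.0),     # drug cost billed under the administering doctor
    ("P2", "service", 400.0),
    ("P2", "lab", 300.0),      # in-house lab charges billed in the doctor's name
    ("P3", "service", 500.0),
    ("P9", "service", 999.0),  # ID missing from the directory
]


def merge(payments, directory):
    """Inner-join payment lines to the directory; unmatched IDs are dropped."""
    rows = []
    for pid, kind, amount in payments:
        info = directory.get(pid)
        if info is None:
            continue
        rows.append({"id": pid, "kind": kind, "amount": amount, **info})
    return rows


def totals(rows, kinds=None):
    """Keep only the given charge kinds (all if None), then sum per practitioner."""
    sums = defaultdict(float)
    for row in rows:
        if kinds is not None and row["kind"] not in kinds:
            continue
        sums[row["id"]] += row["amount"]
    return dict(sums)


def rank(sums):
    """Sort practitioners by total payment, largest first."""
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rows = merge(PAYMENTS, PRACTITIONERS)
    print(rank(totals(rows)))                     # all charges
    print(rank(totals(rows, kinds={"service"})))  # professional services only
```

With these made-up numbers, P1 tops the raw ranking at $1,100 but drops to last ($200) once only service lines are kept. The point is the mechanics of merge, filter, and sort, not the figures.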


“Businesses realize that gathering information on customers and predicting buying patterns can get them, presumably, better information so that they can sell better and sell more,” Doig said. That’s important not just for picking the next electronic gadget or hot Christmas toy. Researchers, including many whose work appears in this issue of Health Affairs, say that when it comes to health care, figuring out how patients access medical care, fill prescriptions, and recover from procedures could greatly change the health care system over the next decade.

But all of those opportunities come at a cost. White House officials have expressed concern about the collection of records and privacy. In May the White House issued a report1 on government and private-sector data use, noting concerns about housing and employment discrimination in particular. White House counselor John Podesta led the team that drafted the report. “It was a moment to step back and say, ‘Does this change our basic framework or our look at the way we’re dealing with records and privacy,’” Podesta said in an Associated Press interview.2

Digging Deeper

Journalists could load the newly released Medicare data into their computers and find the doctors who filed the most claims, Doig explained. But those stories were often misleading. For example, several publications reported that ophthalmologists were the greatest beneficiaries of Medicare payments.3 But not all of the stories explained that ophthalmologists typically serve an older patient population and, therefore, bill Medicare more frequently than other doctors. Nor did all of the stories explore the fact that ophthalmologists were more likely than other doctors to administer expensive drugs during office visits for conditions such as macular degeneration. The cost of those drugs may be billed to Medicare by the doctor administering them, but most of the sizable payments go to drug companies and distributors.

It’s not just newspapers that make mistakes. Detractors of big data often point to Google Flu Trends, a model for following how flu spreads in real time by analyzing search-engine queries. In 2009 Google researchers published an article in Nature,4 reporting that Flu Trends’ results were 97 percent accurate. But a follow-up report5 found that in 2013 Google estimated almost twice as many flu cases as the Centers for Disease Control and Prevention did.

So who was searching for flu symptoms? Was it only those who were sick? Or was it also people simply trying to avoid the disease, suddenly interested or fearful because someone coughed on them on the subway? It’s not clear. The Google Flu Trends website, which now tracks flu-related searches in twenty-five countries, states that “historically, national and regional estimates have been very consistent with traditional surveillance data collected by health agencies[;] however, it is possible that future estimates may deviate from actual flu activity.”6

A Public Resource

In Rhode Island, Jennifer Wood is on the front lines of her state’s big data collection. As chief of staff and general counsel for Elizabeth Roberts, Rhode Island’s lieutenant governor, Wood coordinates an interagency team to develop the state’s all-payer claims database. The database, still in its collection phase, includes medical, pharmacy, and dental claims, as well as payer providers.

Hopes for the data are high. “Until we can get our arms around the total costs of care from all sources, we can’t do any analysis to address cost issues in health care,” said Wood, adding that state officials worked with the School of Public Health at Brown University to make sure the database was set up in a way that would be useful for the public and researchers.

Other states, including Colorado, Massachusetts, and Vermont, have similar databases, but Rhode Island’s is different in one key way: The state agreed to give health insurance policyholders the ability to opt out of being part of the database. The state has a million residents, and after accounting for the uninsured, who are far less likely than the insured to be tracked, officials hope that about 900,000 will be in the database. They expected that another 1–3 percent of the insured would opt out. “Our question was: if 3 percent decided to opt out, would that erode the utility of the data set? And we found that it would still be reliable and valid,” Wood said. Fortunately, the opt-out rate has been “much less,” she said, though she could not give precise numbers.

The truth is, it is unclear what use the public will have for these data. Rhode Island officials hope to have their first reports on spending and disease trends around December 2014. “As a policy wonk, I find it really exciting that this could shed light on patterns of health treatment and disease as well as patterns in terms of costs and efficacy,” Wood said. “But I’m not so naïve that I’m thinking the general public is going to download SPSS and do the analysis,” she continued, referring to the Statistical Package for the Social Sciences, a complex statistical software program.

Getting Personal

Having publicly available health data isn’t new. Mortality data are available through the National Center for Health Statistics. And for anyone who wants to know how many people had bad reactions to a specific drug, the Food and Drug Administration’s adverse events database is available online for download.

What’s new is just how personal those data can be: details about people’s locations, purchasing habits, even what they searched for on Google and how they responded to that new blood pressure medication. Much of that information we hand over with a distracted swipe of the pen in a doctor’s waiting room or by clicking “accept” without reading every line of a website’s terms of use.

Davis, the NYU computer science professor, worries particularly about privacy and illegal discrimination, but also about the fact that big data sets are generally collected unsystematically, so they can easily incorporate large biases that are hard to detect and hard to correct for. “If important decisions are being made by applying opaque statistical techniques to big data sets, then there is a serious danger that these decisions are in fact being based on features that are essentially proxies for race, gender, and so on,” he said.

Collecting millions of records, often from various agencies, doesn’t just create analytical problems. There are ethical issues as well, especially when it comes to sharing data.

The Case Of Henrietta Lacks

Consider the well-known case of Henrietta Lacks, a young black woman who was diagnosed with and died of cervical cancer in 1951. Doctors took samples of her cells without her knowledge and then shared them with scientists, who used them to develop a human cell line for research. No one knows why, but her cells, known as HeLa cells, didn’t die. They were used in numerous experiments over many years, including one that helped develop the polio vaccine. Eventually, labs were selling vials of the cells to researchers around the world, yet Lacks’s family didn’t see any of that money or even know how the cells were being used. The 2010 publication of Rebecca Skloot’s national bestseller The Immortal Life of Henrietta Lacks finally brought this story to a broader audience.

In March 2013 German scientists published the HeLa cell line’s entire genome sequence.7 That is when the National Institutes of Health’s (NIH’s) director, Francis Collins, and deputy director for science, outreach, and policy, Kathy Hudson, intervened. After extensive consultation with the Lacks family, the NIH put in place a policy to respect the family’s wishes and protect their privacy. Now the HeLa genome data are no longer freely available to the public, and, according to the NIH, investigators have to apply for access and acknowledge the Lacks family in any papers or presentations, among other restrictions.

Implications For Research Policy

In the process of establishing the policy, Hudson and fellow researchers were forced to confront broader questions of how to respect personal privacy in the era of big data. “With human genomic data, the challenge is, how do you make it available for research and protect the privacy of those who provide that data,” Hudson said. “We tried to strike the right balance.”

In a call monitored by a department spokesperson, Hudson pointed out that there are different kinds of data being collected in NIH-funded research: genomic data, imaging data, and electronic health record (EHR) information. Each creates its own kind of challenge, both medically and ethically.

EHRs offer tremendous opportunities to understand diagnoses and treatments on a broad scale, serving as a sort of large clinical trial. In the past it was usually only an individual physician and his or her patient who knew all of the details: the patient’s diagnosis, prescription, and outcome. The data weren’t collected in a systematic way, Hudson said. Now with EHRs, researchers can look at large numbers of patients and readily answer fundamental questions about disease management. “For example, with blood pressure in older people, what is the best management? There are lots of medications used, but if we could analyze thousands of records of people over a certain age with a certain medical history and extract [those data] and follow [the people] over time, we could do an observational study comparing drug A with drug B,” Hudson said. The challenge is being mindful of patients’ wishes and expectations, she added.

Another factor that people handling big data must consider is the possibility that some populations may be excluded from records collection in a way that skews analysis: those served by hospitals too understaffed to document and share the information, or by underfunded clinics that lack the necessary computer software and information technology infrastructure. If the data aren’t broad and inclusive, then it’s hard to get accurate results, Hudson said. ▪

Dawn Fallik is a journalist specializing in database analysis, medical coverage, and digital storytelling. She is a visiting professor at the University of Kansas William A. White School of Journalism and Mass Communication, in Lawrence. She will return to her position as a journalism professor at the University of Delaware, in Newark, in the fall of 2015.


NOTES

1. Podesta J, Pritzker P, Moniz EJ, Holdren J, Zients J. Big data: seizing opportunities, preserving values [Internet]. Washington (DC): Executive Office of the President; 2014 May [cited 2014 May 30]. Available from: http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf
2. Sullivan E. White House says big data could be used to discriminate against Americans. PBS NewsHour [serial on the Internet]. 2014 Apr 26 [cited 2014 Jun 17]. Available from: http://www.pbs.org/newshour/rundown/white-house-says-big-data-used-discriminate-americans/
3. Chen C, Pearson S. Top Medicare doctor paid $21 million in 2012, data show. Bloomberg News [serial on the Internet]. 2014 Apr 9 [cited 2014 May 30]. Available from: http://www.bloomberg.com/news/2014-04-09/top-medicare-doctor-paid-21-million-in-2012-data-shows.html
4. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature. 2009;457(7232):1012–4.
5. Butler D. When Google got flu wrong. Nature. 2013;494(7436):155–6.
6. Google Flu Trends. Frequently asked questions [Internet]. Mountain View (CA): Google; [cited 2014 May 30]. Available from: http://www.google.org/flutrends/about/faq.html
7. Landry JJ, Pyl PT, Rausch T, Zichner T, Tekkedil MM, Stütz AM, et al. The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda). 2013;3(8):1213–24.
