Big Data
Big data is a big deal. Witness the White House reports, industry analyses, professional association strategic planning, an entire issue of Health Affairs, consulting shop tomes, and more, most touting the transformative potential of big data analytics.
What is big data? Although no standard definition exists, most attribute 3 characteristics to the term: very large volumes of data, rapid growth in the accumulation of information, and multiple sources of inputs in varying formats. In 2013, an estimated 4 zettabytes of data were accumulated worldwide (a zettabyte is 1,000,000,000,000,000,000,000 bytes), more than double the volume of 2011 [1]. As context, if every person in the United States took a digital photo every second of every day for a month, those photos together would equal 1 zettabyte.
Although commerce, entertainment, social media, and national security all occupy a place on the big data spectrum, health care holds particular appeal, namely for the potential of analytic techniques to shape improvements in quality, access, and affordability. As an example, in April 2014, the Wall Street Journal released data from the Centers for Medicare & Medicaid Services (CMS) describing the payment of 14% of all Medicare professional fees ($77 billion to 888,000 physicians) to the top 1% of physicians. In 2012, 334 physicians received more than $3 million each; the top 1,000 collected $3.05 billion. Observers hailed the findings, foreseeing a means of increasing transparency and pinpointing inefficiency, perhaps even fraud.
The CMS release was a stable, 2-year-old data set. A corollary of rapid growth, termed velocity, is real-time, two-way communication, which goes beyond the mining of such static data and brings a revolutionary (and disruptive) change in some practices and businesses.
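The scale figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only a rough check, not part of the original analysis. It assumes a US population of about 316 million and a 30-day month (neither is stated in the text), and it reads the parenthetical "$77 billion to 888,000 physicians" as the total from which the top 1% received 14%. The function names are illustrative.

```python
# Back-of-the-envelope checks on figures quoted in the text.
ZETTABYTE = 10**21  # bytes, as defined in the text


def bytes_per_photo(population, days=30, total_bytes=ZETTABYTE):
    """Photo size implied if each person takes one photo per second for `days` days."""
    photos = population * days * 24 * 60 * 60
    return total_bytes / photos


def average_payment_to_top(total_dollars, n_physicians, top_pct, share_pct):
    """Average payment to the top `top_pct` percent, who receive `share_pct` percent of the total."""
    top_count = n_physicians * top_pct // 100
    top_total = total_dollars * share_pct // 100
    return top_total / top_count


# Assumption: about 316 million US residents (not stated in the text).
photo_mb = bytes_per_photo(316_000_000) / 1e6
top_avg = average_payment_to_top(77_000_000_000, 888_000, top_pct=1, share_pct=14)
overall_avg = 77_000_000_000 / 888_000

print(f"Implied photo size: {photo_mb:.2f} MB")
print(f"Average payment, top 1%: ${top_avg:,.0f}")
print(f"Average payment, all physicians: ${overall_avg:,.0f}")
```

Under these assumptions, the photo analogy implies roughly 1.2 MB per image, a plausible size for a digital photo. The payment figures imply an average of about $1.2 million for each physician in the top 1%, against about $87,000 across all physicians.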
An example of dynamic two-way data mining is a new Global Positioning System navigational application. Waze, founded in Israel in 2008 and acquired by Google in 2013, starts with a map providing voice and image-guided driving directions. But Waze then goes another step: the driver's smartphone not only receives such input but also generates information about the speed and volume of traffic on the designated route.
Are there examples of such dynamic real-time data analytics in medicine? Certainly, including admissions for chronic obstructive pulmonary disease exacerbations correlated with weather data such as summer heat waves, spikes in emergency department visits for asthma associated with pollen counts, declining physical activity with progression of dementia, and biometric surveillance of vital signs.
Several concerns emerge in any discussion about big data and health care. Misattribution is a persistent threat in big data mining. In the previously mentioned CMS physician payment report, pathologists, ophthalmologists, and oncologists are conspicuously at the top of the lists. What is not so obvious, at least to most, is that a large share of such fees covers disproportionately high expenses for drugs, supplies, and equipment.
Conclusions drawn from analysis of big data are only as valid as the source information. As described in a widely cited report published in Nature, in 2012 and 2013 Google Flu Trends, which calculates disease prevalence on the basis of Internet searches for symptoms, dramatically overestimated the extent of the flu outbreak compared with conventional (and slower) data surveillance methods [2]. It turns out that, for whatever reason, many more people searched for flulike symptoms without personally having evidence of the disease.
Most important, there is a natural tension between data sharing and privacy. Any breach in the confidentiality of personal health information will unavoidably undermine the public trust.
As a counterpoint to these concerns, advocates of big data research emphasize two areas of inquiry. The first is predictive analytics: the use of data (current performance, past performance, or both) to anticipate and shape future actions [3]. In medicine, the most important metric of success would be positively influencing the clinical status of a patient. Such techniques hold the promise of reducing or avoiding the risk for readmissions, clinical decompensation, adverse events, and other perils.
The second is constant learning, which can be fueled by the analysis of big data. Rather than being confined to hypothesis-driven, discrete clinical trials of tightly defined subjects with data prospectively collected, proponents of big data research envision the discovery of new knowledge by the systematic harvesting of new data from observations [4]. Causal associations between these observations are explored constantly, with speed and efficiency, by machine-driven algorithms.
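As a rough illustration of the predictive analytics idea described above, the following minimal sketch estimates 30-day readmission rates from past records and flags new patients whose historical group rate exceeds a threshold. It is a toy example, not a method taken from the cited literature. The records, the single risk factor (number of prior admissions), the 0.5 threshold, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only:
# (prior admissions in the past year, readmitted within 30 days)
HISTORY = [
    (0, False), (0, False), (0, True), (0, False),
    (1, False), (1, True), (1, False),
    (2, True), (2, True), (2, False),
    (3, True), (3, True),
]


def readmission_rates(history):
    """Past readmission rate for each count of prior admissions."""
    counts = defaultdict(lambda: [0, 0])  # key -> [readmitted, total]
    for prior, readmitted in history:
        counts[prior][0] += int(readmitted)
        counts[prior][1] += 1
    return {prior: r / n for prior, (r, n) in counts.items()}


def flag_high_risk(patients, rates, threshold=0.5):
    """Return IDs of patients whose group's historical rate meets the threshold.

    Groups never seen in the history are treated as rate 0.0 here, a
    simplification that a real system would need to handle explicitly.
    """
    return [pid for pid, prior in patients if rates.get(prior, 0.0) >= threshold]


rates = readmission_rates(HISTORY)
# Hypothetical current patients: (patient ID, prior admissions)
flagged = flag_high_risk([("A", 0), ("B", 2), ("C", 3), ("D", 5)], rates)
print(flagged)
```

The point of the sketch is the shape of the workflow: past performance yields an estimate that shapes a present action, here the choice of which patients receive closer follow-up. Real models draw on far richer data and need validation against the misattribution and data quality concerns raised earlier.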
REFERENCES
1. Podesta J, Pritzker P, Moniz EJ, Holdren J, Zients J. Big data: seizing opportunities, preserving values. Washington, District of Columbia: Executive Office of the President; 2014.
2. Butler D. When Google got flu wrong. Nature 2013;494:155-6.
3. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood) 2014;33:1123-31.
4. Krumholz HM. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff (Millwood) 2014;33:1163-70.
Michael J. Pentecost, MD: Magellan Health, 6950 Columbia Gateway Drive, Columbia, MD 21046.
© 2015 American College of Radiology. 1546-1440/14/$36.00. http://dx.doi.org/10.1016/j.jacr.2014.10.018
Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities