Foreword: Big Data and its Application in Health Disparities Research
Foreword
Guest Editors: Eberechukwu Onukwugha, MS, PhD1; O. Kenrik Duru, MD, MS2; Emmanuel Peprah, PhD3 We developed this themed is-
tives; 2) a clinical trial perspective
sue out of a desire to understand
on whether Big Data will increase
the relevant issues that arise when
or decrease disparities within the
utilizing Big Data to document,
health care system; 3) results from
explain and address health care dis-
an investigation of how differential
parities. The term, Big Data, has
linkage by race can affect health
been coined to define the large and
care disparities research using the
complex datasets that are character-
linked data; 4) results from a study
ized by volume (large number of
of Hispanic residential isolation
records), variety (multiple sources),
and the relationship with Atten-
velocity (frequent, and not neces-
tion Deficit Hyperactivity Disorder
sarily uniform, updates) and value
health service utilization; 5) a scop-
(unique and powerful insights). We
ing review of opportunities for Big
sought manuscripts from diverse
Data science to address health dis-
settings including academia, health
parities as well as some challenges.
care systems, contract research or-
ganizations, and government. The
Big Data from a modern view-
contributed manuscripts offer an
point, the problem of analyzing
important, baseline snapshot of
large amounts of data to under-
current perspectives on Big Data
stand the content of said data, draw
and health disparities. Specifically,
reasonable conclusions, and make
the articles in this special section
meaningful inferences is an age-old
provide: 1) an overview of recent ef-
problem requiring the development
forts to improve workforce diversity
of novel tools. Big Data problems
and training through the National
existed even before the invention
Institutes of Health Big Data to
of the 20th century computer. For
Knowledge (BD2K) diversity initia-
example, it was predicted that it
The articles presented in this special issue advance the conversation by describing the current efforts, findings and concerns related to Big Data and health disparities. They offer important recommendations and perspectives to consider when designing systems that can usefully leverage Big Data to reduce health disparities. We hope that ongoing Big Data efforts can build on these contributions to advance the conversation, address our embedded assumptions, and identify levers for action to reduce health care disparities. Ethn Dis. 2017;27(2):6972; doi:10.18865/ed.27.2.69.
Keywords: Big Data, Health Disparities Pharmaceutical Health Services Research Department; University of Maryland School of Pharmacy 2 Division of General Internal Medicine; Geffen School of Medicine, University of California Los Angeles 3 Center for Translation Research and Implementation Science (CTRIS); National Heart, Lung, and Blood Institute; National Institutes of Health 1
Address correspondence to Eberechukwu Onukwugha, MS, PhD; Pharmaceutical Health Services Research Department, University of Maryland School of Pharmacy, 220 Arch Street, Baltimore, MD 21201; 410.706.8981;
[email protected]. edu
Ethnicity & Disease, Volume 27, Number 2, Spring 2017
Although we are familiar with
69
Foreword: Big Data and Health Disparities - Onukwugha et al would take more than eight years
over, characterization of pleiotropic
ignoring best practices for the use
to hand-tabulate all the data col-
effects based on integration of vari-
and interpretation of either clinical
lected in the 1880 US census, and
ous Big Data types will be critical to
trial or observational data can limit
it would take more than 10 years
identifying molecular pathways that
our ability to identify and address
to analyze data from the 1890 cen-
underlie disease and health. Using
health disparities. Most EHRs are
sus.1 The US Census Bureau needed
Hollerith’s example, we note that
currently constructed to maximize
a major technological advance to
these challenges provide the oppor-
billing revenue; increasing the ca-
analyze and produce accurate data
tunity to develop solutions that can
pacity of these datasets to collect
to inform policy development at
address the vexing problems posed
and merge clinically relevant data
the government level. To accom-
by Big Data in the modern era.
elements will be an important step
plish this task, Herman Hollerith
Currently, 87% of physicians
toward building the infrastructure
automated the process and devel-
and 80% of non-Federal acute
needed to tackle health disparities.
oped a storing and information
hospitals in the United States use
processing system in the 1880s.2
single-system electronic health re-
data such as genetic information
Data standardization, computa-
cords (EHRs), in addition to the
and patient-reported health out-
tion analysis, methodology, statis-
development of aggregated Clini-
comes have even greater potential to
tical approaches, and data storage
cal Data Repositories (CDRs) and
address health disparities. However,
all represented challenges that Hol-
other massive databases.6 It is pos-
exciting opportunities such as algo-
lerith had to solve in addressing his
sible that novel uses of Big Data
rithms for machine learning using
Big Data problem. We have simi-
will finally reduce or eliminate stub-
large amounts of data from smart-
lar challenges in the modern era.
bornly persistent racial/ethnic and
phone health apps, or programs
As an example, the development
socioeconomic disparities in health
that analyze genetic mutations in a
of genome-wide association stud-
outcomes. The articles in this issue
large patient population to unlock
ies (GWAS) required the creation
discuss the current data, privacy,
triggers for cancer growth, will re-
of novel analytical pipelines that
and larger societal environments
quire patients to consent to use
combined statistical analysis with
and illustrate how we can make
of their data or ideally provide it
computing power. These GWAS
progress toward this goal. Canner
themselves. Because of longstand-
analytical pipelines have had lim-
et al describe the ongoing efforts to
ing concerns about historical abuse
ited integration power because of
increase workforce diversity within
at the hands of medical leaders as
the utilization of linear univari-
the field of biomedical data science.
noted by Zhang et al, minority
ate methodology. Higher order in-
Creative projects that combine pre-
patients may be particularly suspi-
teraction studies on large GWAS
existing public databases, as illus-
cious about sharing their data with
utilizing multivariate analysis are
trated by Miller et al and Pennap et
researchers or other entities. There
needed but currently computation-
al, can provide important insights
are plenty of present-day concerns
ally intensive and thus limited by
in the delivery and outcomes of
as well that justify these serious
the availability of hardware.4 More-
care. As noted by Heller & Seltzer,
concerns – the number of individu-
3
70
5
Ethnicity & Disease, Volume 27, Number 2, Spring 2017
Datasets built on personalized
Foreword: Big Data and Health Disparities - Onukwugha et al als whose personal health informa-
as: What are the source experienc-
will require that we re-think how we
tion was affected by hacking/secu-
es, triggers or events that result in
generate, store, and protect patient
rity breaches of network servers or
health disparities? What are the per-
data and will require the leadership
EHRs jumped 10-fold from 2014
sonal, health system and geographic
to implement solutions to the exist-
to 2015, exceeding 110 million
barriers to improved individual
ing challenges with Big Data. The
people.6 Health systems are already
health behaviors? What are the le-
articles represented in this issue ad-
somewhat wary of sharing patient
vers for action needed to address the
vance the conversation by describ-
data to create the massive databases
identified health disparities? These
ing the current efforts, findings and
needed to advance this work due to
data also can be used to deepen our
concerns related to Big Data and
a perceived loss of competitive ad-
understanding of heterogeneity and
health disparities. They offer impor-
vantage. At the same time, the po-
the implications for how we use Big
tant recommendations and perspec-
tential for data breaches will further
Data. To the extent that “Models
tives to consider when designing
dampen enthusiasm if the scientific
are opinions embedded in mathe-
systems that can usefully leverage
community cannot provide assur-
matics”7 we will need to examine our
Big Data to reduce health dispari-
ance to patients that their data will
embedded assumptions regarding
ties. We hope that ongoing Big Data
be safeguarded. Reversing each of
the health care system we desire be-
efforts can build on these contribu-
these trends will require intensive
fore we can conduct a complete and
tions to advance the conversation,
efforts to demonstrate the value to
inclusive investigation of the health
address our embedded assump-
patients of sharing their individual
disparities that persist within this
tions, and identify levers for action
data and to health systems of shar-
system. Big Data is not immune to
to reduce health care disparities.
ing their aggregate data, as well as
these embedded assumptions. For
boosting data security safeguards
example, if we assume that qual-
Acknowledgements
using the most effective technol-
ity indices (and their contributing
ogy and eliminating all “weak links”
subdomains) can be applied to sub-
within these shared data networks.
populations in which they have not
The contents of this article are those of the authors and do not necessarily represent the official view of the National Institutes of Health. E.P. is a staff member of the National Heart, Lung, and Blood Institute, National Institutes of Health.
These efforts are necessary in order
been formally tested, any errors in
References
for Big Data to usefully support clin-
this assumption will be embedded
ical and population health research.
in models that utilize Big Data to
With access to the increased vol-
derive quality measures and then
ume, velocity and variety offered by
perpetuated when these measures
Big Data, we have to ask whether
are adopted across institutions.
there is value in the use of these data
to address health care disparities. In
ination of health disparities using
order to represent value, these data
Big Data is an admirable goal and
will need to support the investiga-
one that we believe is achievable.
tion of fundamental questions such
Sustained progress toward this goal
The reduction and ultimate elim-
Ethnicity & Disease, Volume 27, Number 2, Spring 2017
1. United States Census. Tabulation and Processing. https://www.census.gov/history/ www/innovations/technology/tabulation_ and_processing.html Accessed February 14, 2017. 2. da Cruz, F. Columbia University Computing History. http://www.columbia.edu/cu/ computinghistory/hollerith.html Accessed February 14, 2017. 3. Williams NB. The rise of Big Data: Trends and opportunity for the lab. Clinical Laboratory News. March 2014. https://www.aacc. org/publications/cln/articles/2014/march/bigdata Accessed February 20, 2017. 4. Goudey, B, Abedini M, Hopper JL, et al. High performance computing enabling exhaustive analysis of higher order single
71
Foreword: Big Data and Health Disparities - Onukwugha et al nucleotide polymorphism interaction in Genome Wide Association Studies. Health Inf Sci Syst, 2015. 3(Suppl 1 HISA Big Data in Biomedicine and Healthcare 2013 Con): S1-S3. https://doi.org/10.1186/2047-2501-3S1-S3. 5. Yang C, Li C, Wang Q, Chung D, Zhao H. Implications of pleiotropy: challenges and op-
portunities for mining Big Data in biomedicine. Frontiers in Genetics. 2015 Jun 30;6:229. doi: 10.3389/fgene.2015.00229. eCollection 2015. 6. Office of the National Coordinator for Health Information Technology, Health IT Dashboard. https://dashboard.healthit.gov/ quickstats/quickstats.php
7. Zhang, C. Q&A Cathy O’Neil, author of ‘Weapons of Math Destruction,’ on the dark side of big data. LA Times. December 30, 2016. http://www.latimes.com/books/jacketcopy/la-ca-jc-cathy-oneil-20161229-story. html Accessed: January 30, 2017.
Big Data articles in this issue of Ethnicity & Disease, Spring 2017 In the Rising Era of Big Data, Small Steps are Key Nina Heller, Jonathan H. Seltzer Racial and Ethnic Differences in a Linkage With the National Death Index Eric A. Miller, Frances A. McCarty, Jennifer D. Parker Hispanic Residential Isolation, ADHD Diagnosis and Stimulant Treatment among Medicaid-insured Youth Dinci Pennap, Mehmet Burcu, Daniel J. Safer, Julie Magno Zito Big Data Science: Opportunities and Challenges to Address Minority Health and Health Disparities in the 21st Century Xinzhi Zhang, Eliseo J. Pérez-Stable, Philip Bourne, Emmanuel Peprah, Kenrik Duru, Nancy Breen, David Berrigan, Fred Wood, James S. Jackson, David W.S. Wong, Josh Denny Enhancing Diversity in Biomedical Data Science Judith E. Canner, Archana J. McEligot, María-Eglée Pérez, Lei Qian, Xinzhi Zhang
72
Ethnicity & Disease, Volume 27, Number 2, Spring 2017