COMMENTARY

Case for fMRI data repositories

Satish Iyengar

Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260. Email: [email protected].
Author contributions: S.I. wrote the paper. The author declares no conflict of interest. See companion article on page 7900.

Functional magnetic resonance imaging (fMRI) technology for studying the human brain is the result of considerable efforts by distinguished physicists and engineers (1, 2). Despite concerns about some of its conclusions (3), it is now a common tool in neuroscience, psychology, and psychiatry. In these applications, statistical methods provide many of the tools for both the preprocessing of fMRI signals and the analysis of the processed data (4). These statistical procedures are implemented in software packages whose algorithms are rather involved, typically using parametric models for the error distribution and the spatial autocorrelation function. They are used to study both individual voxels and clusters of voxels. As a result, there are many thousands of fMRI-based publications. In PNAS, Eklund et al. (5) report on their study of the statistical underpinnings of fMRI research. In particular, they assess the performance of three software packages on datasets made available through sharing agreements. They tell a sobering tale: familywise error (FWE) rates that should be about 5% can be much higher; a software bug led to inflated error rates for 15 years; and a survey of 241 recent fMRI papers indicates that about 40% did not report using well-known methods for correcting for multiple testing. These findings cast doubt on those earlier publications. However, their work also provides a template for a way forward: encourage the sharing of data, and use shared data to test statistical methods and software to improve fMRI-based research, for example, by reducing the number of nonreproducible findings (6).
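To see why skipping multiple-testing correction matters, consider a minimal Monte Carlo sketch (illustrative only; the voxel count, sample size, and simulation settings are invented): each simulated dataset is pure noise, yet testing 1,000 independent "voxels" at the 5% level virtually guarantees at least one false positive, whereas a Bonferroni correction restores a roughly 5% familywise rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_voxels, n_subjects, n_sims, alpha = 1000, 20, 500, 0.05

fwe_uncorrected = 0
fwe_bonferroni = 0
for _ in range(n_sims):
    # Pure-noise data: no voxel has a true effect.
    data = rng.standard_normal((n_subjects, n_voxels))
    _, p = stats.ttest_1samp(data, popmean=0.0)
    fwe_uncorrected += (p < alpha).any()             # any voxel "significant"?
    fwe_bonferroni += (p < alpha / n_voxels).any()   # Bonferroni-corrected

print(f"FWE, uncorrected: {fwe_uncorrected / n_sims:.2f}")  # close to 1.0
print(f"FWE, Bonferroni:  {fwe_bonferroni / n_sims:.2f}")   # close to 0.05
```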

An Assessment of Current fMRI Practice

The discipline of statistics is undergoing a period of rapid growth (7, 8). As a result, new methods for analyzing large, high-dimensional data are being developed rapidly, with fMRI a standard source of high-dimensional data (consider the signal at each voxel as a dimension). These methods are assessed in many ways. A principal classical approach is asymptotics (9), in which the sample size increases without bound. However, this approach is often intractable for high-dimensional data and is of limited relevance when datasets are not sufficiently large. There is also evidence that methods (e.g., bootstrapping) that work well in classical problems break down in high dimensions (10). Simulation studies are common in such cases (11), but as Eklund et al. (5) note, it is "hard to simulate the complex spatiotemporal noise that arises from a human subject in an MR scanner." Thus, access to actual datasets that can serve as test beds for assessing statistical methods and for developing code that implements them is important. Eklund et al. (5) use data from the 1000 Functional Connectomes Project (12) and encourage scientists to share their data with the OpenfMRI project (13), which, to date, holds 49 raw MRI datasets on 1,811 subjects. Such repositories can also facilitate the use of meta-analysis to further reduce the problem of false-positive results.

The motivation for the study by Eklund et al. (5) comes from their earlier work (14), in which they used SPM software on 1,484 resting-state datasets for task-based, single-subject fMRI analyses and found false-positive rates as high as 70% rather than the expected 5%. To understand the sources of these higher error rates, they expanded their scope to the three most common software packages, SPM, FSL, and AFNI, generally using each package's default settings. They used resting-state fMRI data from 499 healthy controls at three sites, obtained from the 1000 Functional Connectomes Project. They then mimicked several activity paradigms and applied one-sample and two-sample t tests, controlling the FWE rate for both voxel-wise and cluster-wise inference. Because resting-state data should not contain systematic changes in brain activity, the error rates should have been around 5%. Their main finding is that all packages were conservative for voxel-wise inference but not for cluster-wise inference. In particular, cluster-wise inference based on a parametric Gaussian spatial autocorrelation model gave much higher FWE rates, whereas a nonparametric approach using a permutation test gave valid inferences. They then did extensive exploratory data analysis to identify the reasons for the poor performance. They concluded that the main reason was misspecification of the spatial autocorrelation function; they also discuss many other aspects of their findings, such as possible reasons for the differences between voxel-wise and cluster-wise inference, as well as for the differences between the software packages (which presumably led to the discovery of the long-standing bug).
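Cluster-wise inference hinges on the null distribution of the largest suprathreshold cluster, which parametric methods derive from an assumed spatial autocorrelation. The sketch below (an illustration of the idea only, not any package's algorithm; the grid size, smoothing width, and cluster-forming threshold are invented values) estimates that null distribution empirically from smoothed Gaussian noise. If the real data have longer-tailed spatial correlation than the parametric model assumes, the model's cluster-size threshold falls below the empirical 95th percentile computed this way, and the FWE rate is inflated.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
shape = (64, 64, 30)   # hypothetical image grid
z_thresh = 2.3         # hypothetical cluster-forming threshold
n_sims = 200

def max_cluster_size(fwhm_voxels):
    """Largest suprathreshold cluster in one smoothed-noise image."""
    sigma = fwhm_voxels / 2.355                # FWHM -> Gaussian sigma
    noise = ndimage.gaussian_filter(rng.standard_normal(shape), sigma)
    noise /= noise.std()                       # re-standardize to unit variance
    labels, n_clusters = ndimage.label(noise > z_thresh)
    if n_clusters == 0:
        return 0
    return np.bincount(labels.ravel())[1:].max()

# Null distribution of the maximum cluster size; its 95th percentile is
# an empirical cluster-extent threshold with a 5% familywise error rate.
sizes = [max_cluster_size(fwhm_voxels=3.0) for _ in range(n_sims)]
print("empirical 5% FWE cluster-size threshold:", np.percentile(sizes, 95))
```

The permutation approach that performed well in their study builds an analogous empirical null from the data themselves rather than from an assumed autocorrelation model.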

Remedies Old and New

The nonparametric permutation test is an old method (15) based on the simple idea that if the null hypothesis holds (for, say, the two-sample problem), then the group labels are arbitrary. It uses all permutations of the labels into two groups to assess the significance of the observed difference. The number of permutations is usually too large to enumerate them all; typically, a sample of several thousand permutations suffices to get a good estimate of the significance level. Using the permutation test, Eklund et al. (5) are able to show that the actual data have much longer tails than the Gaussian model. The permutation test is not a panacea, however, because it appears to be invalid for the one-sample test they studied; they attribute that finding to asymmetrical errors, but a more careful analysis of that case is needed.
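A minimal sketch of this two-sample permutation test, applied to a simple difference in means rather than a full fMRI pipeline:

```python
import numpy as np

def permutation_test_two_sample(x, y, n_perm=5000, rng=None):
    """Two-sided permutation p-value for a difference in group means.

    Under the null hypothesis the group labels are arbitrary, so we
    re-randomize them and ask how often the shuffled difference is at
    least as extreme as the observed one.
    """
    rng = rng or np.random.default_rng()
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        count += diff >= observed
    # The +1 terms keep the estimated p-value away from exactly zero.
    return (count + 1) / (n_perm + 1)

# Two null groups: the p-value should be roughly uniform on (0, 1).
rng = np.random.default_rng(2)
x, y = rng.standard_normal(20), rng.standard_normal(20)
print(permutation_test_two_sample(x, y, rng=rng))
```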

Eklund et al. (5) focus on the simplest analyses, the one-sample and two-sample problems. In most studies, there are also many covariates that may help explain the variation in fMRI data. When the number of covariates is large, some model selection method, such as the lasso (16), is used to whittle down the number of covariates. In such cases, another relevant strain of recent statistical research is adaptive inference (17, 18). In practice, the same data are typically used both to decide which covariates and interactions enter a regression or classification model and to assess the significance of the regression coefficients or the error rate, respectively (for example, see refs. 19 and 20). This practice violates the often implicit assumption that the model is selected before the data are seen, making inferences about the regression coefficients difficult. Adaptive inference aims to produce more honest estimates and confidence intervals by accounting for the selection of covariates from the same data. It would be useful to apply the adaptive inference framework to assess the magnitude of the improvement in false-positive rates.
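The simplest honest remedy is data splitting, a crude relative of the selective-inference methods in refs. 17 and 18: select covariates on one half of the data and reserve the other half for inference. A sketch on synthetic data (the sizes, the synthetic regression, and the use of scikit-learn's LassoCV are illustrative choices, not the authors'):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, p = 200, 50                  # hypothetical sample size and covariate count
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 1.0                  # only three covariates truly matter
y = X @ beta + rng.standard_normal(n)

# Split: select covariates on one half, do inference on the other half.
X_sel, X_inf, y_sel, y_inf = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
lasso = LassoCV(cv=5).fit(X_sel, y_sel)
chosen = np.flatnonzero(lasso.coef_)

# Refit on the held-out half: because these observations were never used
# for selection, classical standard errors and t intervals for the
# refitted coefficients are valid.
ols = LinearRegression().fit(X_inf[:, chosen], y_inf)
print("selected covariates:", chosen)
print("held-out coefficients:", ols.coef_.round(2))
```

Splitting sacrifices half the data; the selective-inference methods of refs. 17 and 18 instead adjust the inference to account for the selection event while using all of the data.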

Acknowledgments

The author’s research is supported by National Institute of Mental Health Grant R01 MH100041-01A1.

1 Uludag K, et al. (2005) Basic principles of functional MRI. Clinical MRI, eds Edelman R, Hesselink J, Zlatkin M (Elsevier, San Diego), pp 249–287.
2 Ogawa S, et al. (1992) Intrinsic signal changes accompanying sensory stimulation: Functional brain mapping with magnetic resonance imaging. Proc Natl Acad Sci USA 89(13):5951–5955.
3 Vul E, Harris C, Winkielman P, Pashler H (2009) Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspect Psychol Sci 4(3):274–290.
4 Ashby G (2011) Statistical Analysis of fMRI Data (MIT Press, Cambridge, MA).
5 Eklund A, Nichols TE, Knutsson H (2016) Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proc Natl Acad Sci USA 113:7900–7905.
6 Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8):e124.
7 Lin X, et al., eds (2014) Past, Present, and Future of Statistical Science (CRC Press, New York).
8 Committee on the Analysis of Massive Data, et al. (2013) Frontiers in Massive Data Analysis (National Academy Press, Washington, DC).
9 van der Vaart AW (2000) Asymptotic Statistics (Cambridge Univ Press, Cambridge, UK).
10 El Karoui N, Purdom E (2015) Can we trust the bootstrap in high dimension? (Department of Statistics, University of California, Berkeley, CA), Report 824.
11 Welvaert M, Rosseel Y (2014) A review of fMRI simulation studies. PLoS One 9(7):e101953.
12 Biswal B, et al. (2010) Toward discovery science of human brain function. Proc Natl Acad Sci USA 107:4734–4739.
13 Poldrack R, et al. (2014) Making big data open: Data sharing in neuroimaging. Nat Neurosci 17:1510–1517.
14 Eklund A, Andersson M, Josephson C, Johannesson M, Knutsson H (2012) Does parametric fMRI analysis with SPM yield valid results? An empirical study of 1484 rest datasets. Neuroimage 61(3):565–578.
15 Fisher RA (1935) The Design of Experiments (Hafner, New York).
16 Hastie T, Tibshirani R, Friedman JH (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York), 2nd Ed.
17 Taylor J, et al. (2016) Statistical learning and selective inference. Proc Natl Acad Sci USA 112:7629–7634.
18 Lockhart R, et al. (2014) Post-selection adaptive inference for least angle regression and the lasso. Ann Statist 42(2):413–468.
19 Rashid B, et al. (2014) Dynamic connectivity states estimated from resting fMRI identify differences among schizophrenia, bipolar disorder, and healthy control subjects. Front Hum Neurosci 8:897.
20 Wu G, et al. (2013) Mapping the voxel-wise effective connectome in resting state fMRI. PLoS One 8(9):e73670.
