Accepted Manuscript
Desperately seeking grey matter volume changes in sleep apnea: a methodological review of magnetic resonance brain voxel-based morphometry studies
Sébastien Celle, Chantal Delon-Martin, Frédéric Roche, Jean-Claude Barthelemy, Jean-Louis Pépin, Michel Dojat
PII: S1087-0792(15)00043-X
DOI: 10.1016/j.smrv.2015.03.001
Reference: YSMRV 872
To appear in: Sleep Medicine Reviews
Received Date: 21 July 2014
Revised Date: 11 March 2015
Accepted Date: 11 March 2015
Please cite this article as: Celle S, Delon-Martin C, Roche F, Barthelemy J-C, Pépin J-L, Dojat M, Desperately seeking grey matter volume changes in sleep apnea: a methodological review of magnetic resonance brain voxel-based morphometry studies, Sleep Medicine Reviews (2015), doi: 10.1016/j.smrv.2015.03.001. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Desperately seeking grey matter volume changes in sleep apnea: a methodological review of magnetic resonance brain voxel-based morphometry studies
Sébastien Celle1, Chantal Delon-Martin2,3, Frédéric Roche1, Jean-Claude Barthelemy1, Jean-Louis Pépin3,4, Michel Dojat2,3
1 EA 4607 SNA-EPIS, Service de Physiologie Clinique et de l'Exercice, Pole NOL, CHU Nord, 42055 Saint-Étienne, Faculté de Médecine Jacques Lisfranc, Université Jean Monnet, 42023 Saint-Étienne, PRES Université de Lyon, France
2 Inserm, U836, F-38000 Grenoble, France
3 Université Grenoble Alpes, GIN, F-38000 Grenoble, France
4 Inserm, U 1042, F-38000 Grenoble, France
Corresponding author: Sébastien CELLE, Service de physiologie clinique, Niveau 6, CHU de Saint-Étienne, 42055 Saint-Étienne, France; E-mail: [email protected]; Phone: +33 4 77 82 83 00; Fax: +33 4 77 82 84 47
Short title: A critical review of voxel-based morphometry brain imaging literature in sleep apnea
Acknowledgments: No conflict of interest
Summary
Cognitive impairment related to obstructive sleep apnea might be explained by subtle changes in brain anatomy. This has been mainly investigated using magnetic resonance brain scans coupled with a voxel-based morphometry analysis. However, this approach is prone to several methodological pitfalls that may explain the large discrepancy in the results reported in the literature. We critically reviewed twelve papers addressing grey matter volume modifications in association with obstructive sleep apnea. Finally, based on strict methodological criteria, only three studies reported robust, but conflicting, results. No clear evidence has emerged and exploring brain alteration due to obstructive sleep apnea should thus be considered as an open field. We provide recommendations for designing additional robust voxel-based morphometry studies, notably the use of larger cohorts, which is the only way to solve the underpowered issue and the underestimated role of confounders in neuroimaging studies.
Keywords: Obstructive sleep apnea, voxel-based morphometry, registration, segmentation, anatomy, neuroimaging
Abbreviations
AHI: apnea/hypopnea index
BMI: body mass index
BRAVO: brain volume
CPAP: continuous positive airway pressure
DTI: diffusion tensor imaging
ESS: Epworth sleepiness score
FDR: false discovery rate
FWE: family wise error
FWHM: full width at half maximum
GLM: general linear model
GM: grey matter
MPRAGE: magnetization prepared rapid acquisition gradient echo
MR/MRI: magnetic resonance/magnetic resonance imaging
OSA: obstructive sleep apnea
SPGR: spoiled gradient recalled
SPM: statistical parametric mapping
SVC: small volume correction
TIV: total intracranial volume
VBM: voxel-based morphometry
WM: white matter
Introduction
Obstructive sleep apnea/hypopnea (OSA) is characterized by the repetitive occurrence of partial or complete pharyngeal collapse during sleep, ended by oxyhemoglobin desaturation and/or micro-arousals. It is a growing health concern, with a prevalence ranging from 4% in middle-aged men [1] to 50% in the elderly population [2]. Many adverse consequences are claimed to be associated with sleep apnea, such as sleepiness and associated car accidents [3], cardiovascular disease [4], cognitive impairment [5], diabetes [6], or even Alzheimer's disease [7], though some of these links remain debated in the scientific community.
This paper will only focus on OSA. Central sleep apnea [8], mainly caused by a defect in respiratory control, is frequently encountered in heart failure as well as in the elderly or after a stroke and represents a specific entity. Hence, the relationship between central sleep apnea and brain structure abnormalities will not be explored in this paper.
Cognitive impairment, as well as the potential link with Alzheimer's disease, suggests that brain structures are altered in OSA. Brain insult may result from sleep fragmentation due to micro-arousals and intermittent hypoxemia (i.e., the repetition of a desaturation-reoxygenation sequence), which are the hallmarks of sleep apnea. In rodent models exposed to intermittent hypoxia, this intermittent hypoxemia is associated with cell death in some brain structures, particularly in the hippocampus [9].
Several authors have explored potential modifications of brain anatomy in patients with OSA using structural magnetic resonance (MR) brain scans and the voxel-based morphometry (VBM) methodology. The published results vary significantly and, recently, two review papers [10, 11] and a meta-analysis [12] have attempted to draw conclusions from the synthesis of these data. Although some of the authors [10,11] indicated that the discrepancies in the literature probably reflect differences in image processing and statistical methods, a systematic analysis of the methodology adopted to produce the published results has not yet been conducted.
As a result, the goal of our paper was to question the neuroimaging methodology, from magnetic resonance imaging (MRI) acquisition to statistics used in the MRI/OSA literature and, consequently, the interpretation of results. Therefore, we first defined the minimum set of methodological criteria to be respected for a statistically robust exploration of the grey matter (GM) modifications using VBM. We then reviewed all studies addressing this point in OSA patients. Based on our predefined criteria, we critically reviewed the relevant literature and selected the robust papers to conclude about the possible GM modification due to OSA. We finally proposed methodological guidelines for further studies.
Methods
VBM standard pipeline and key methodological issues
VBM is a methodology developed to explore local brain volume changes [13] in which voxels are used as outcome measures to study the effects of explanatory variables. In the early days
of VBM, Bookstein raised some concerns regarding the VBM methodology [14]. Fifteen years later, these concerns remain valid. The VBM pipeline, i.e., the image processing chain used to assess the possible tissue changes in MR brain scans due to some conditions, is composed of four steps: 1) Image preprocessing, 2) Modulation, 3) Model definition and 4) Statistical analysis. The quality of each step clearly has a determinant influence on the quality of the final results and thus on their interpretation. We defined a set of criteria to assess the quality of each step. Note that image acquisition is also an important step. In practice, image acquisition conditions may differ between studies, including different magnetic fields (from 1 T to 3 T), different voxel sizes (from 2×2×2 mm³ to 1×1×1 mm³) or the use of different MRI sequences (spoiled gradient recalled (SPGR), magnetization prepared rapid acquisition gradient echo (MPRAGE) and brain volume (BRAVO)). However, no clear quality criteria can be defined on these parameters.
Image preprocessing: VBM requires three basic steps: registration, segmentation and the
subsequent spatial smoothing of the set of MR structural images for exploration. Since each individual brain image is different, each image must first be registered to a common
reference. This step is crucial because imperfection in the registration of images among individuals may introduce bias to the statistics [14]. This reference can be either specific to the population under study and provided by a specific realignment algorithm, such as DARTEL [15], or a template based on the mean of several subjects, such as MNI305, which
is based on the accurate realignment of MR brain scans of 305 healthy subjects. This latter reference allows for comparisons of the coordinates of the detected structural differences between studies using the same template. Segmentation provides a probability for each voxel
to belong to a specific tissue. For instance, a probability equal to 0.8 for GM and 0.2 for white matter (WM) indicates that the corresponding voxel is likely to consist of GM. Spatial
smoothing with a Gaussian kernel is then applied to respect the conditions of validity of the Gaussian random field theory, which is mainly used for statistical analysis, and also attenuates possible remaining differences between individual brains after registration. For each study, the quality of the registration and segmentation steps crucially depends on the version of the corresponding algorithm available at that time. This dependence may explain why a reanalysis of a set of data with an upgraded version of the software could lead to different conclusions.
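
To make the smoothing step concrete, the following minimal sketch converts a kernel width given as FWHM into the standard deviation of the Gaussian and applies it to a one-dimensional profile of GM probabilities. It is an illustration only, not the SPM implementation: the function names, the 8 mm FWHM, the 1 mm voxel size, the truncation at about three standard deviations and the border handling are all assumptions chosen for the example.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert the FWHM of a Gaussian kernel (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm):
    """Discrete, normalized 1D Gaussian kernel, truncated at about 3 sigma (assumption)."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    radius = max(1, math.ceil(3.0 * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values, kernel):
    """Smooth a 1D profile; borders are handled by repeating the edge value (assumption)."""
    radius = len(kernel) // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed


# Toy profile along one axis (1 mm voxels): GM probability 0.8, then 0.2.
gm_profile = [0.8] * 10 + [0.2] * 10
kernel = gaussian_kernel_1d(fwhm_mm=8.0, voxel_mm=1.0)
smoothed_profile = smooth_1d(gm_profile, kernel)
```

The sharp step in the toy profile becomes a gradual transition after smoothing, which illustrates how residual misregistration between individual brains is attenuated, at the price of a lower effective spatial resolution.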
Statistical parametric mapping (SPM) is widely used in neuroimaging. Over the last 15 years, the software has been released in successive versions, from SPM99 to SPM12 [16]. Each version improved some key steps compared to the previous one. SPM99 was released in January 2000 with a fully 3D nonlinear registration and used the MNI305 registration template as the default. Compared to SPM99, SPM2 contained few methodological improvements concerning registration and segmentation. In 2001, Good et al. [17] proposed a major modification to the protocol in use at that time, called “optimized VBM”. This protocol aimed to correct the misclassification of some non-brain tissue by creating specific GM and WM templates, computing the transformation parameters to realign the segmented individual images to these specific templates, applying such parameters to the original images and finally segmenting the realigned images. SPM5 was a major improvement: it introduced the unified segmentation method [18] to realign and segment images in a combined and iterative manner. VBM5 was a VBM-dedicated toolbox for the SPM5 version. SPM8 provided a new registration algorithm called DARTEL [15], which used an elastic deformation with a high number of degrees of freedom and iteratively built a template specific to the studied population to considerably improve the quality of the fitting between each image and the computed template. The transformation of the template to a common reference, such as MNI305, was provided. Finally, unified segmentation was improved by modeling 6 head components, such as the fat signal from the scalp or signals from large veins, as opposed to only 3 brain tissues, a method often referred to as “New Segment”. This improvement permitted the removal of potential contamination from non-brain tissues that could lead to false positives.
Criterion 1: We considered that the more recent the software is, the more robust the result is; in particular, the use of elastic registration tools was of high importance.
Modulation: Modulation is an important aspect to consider. After realignment to a reference, the tissue volumes present in the realigned image may be modified due to the application of the corresponding spatial transformation, e.g., when a large brain is realigned to a smaller template, information about its initial tissue volumes is lost. Modulation, which was introduced in 2001 by Good et al. [17], aims to compensate for such artifactual modification. A change in GM volume is detected via modulated images, whereas a change in GM concentration is detected via unmodulated images.
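
The effect of modulation can be illustrated with a toy example. The sketch below assumes the usual convention in which each warped tissue probability is multiplied by the local volume ratio between native and template space (the Jacobian determinant of the deformation). The function name `modulate` and all numerical values are hypothetical and chosen only for illustration.

```python
def modulate(warped_gm_prob, jacobian_det):
    """Scale each warped GM probability by the local volume change of the warp.

    jacobian_det[i] is assumed to be the ratio of native-space volume to
    template-space volume at voxel i (> 1 where the brain was shrunk to fit).
    """
    return [p * j for p, j in zip(warped_gm_prob, jacobian_det)]


# Hypothetical structure: 1200 mm3 of pure GM in a large native brain,
# compressed onto 1000 template voxels of 1 mm3 during registration.
warped_gm_prob = [1.0] * 1000
jacobian_det = [1.2] * 1000

unmodulated_total = sum(warped_gm_prob)                       # "concentration" view
modulated_total = sum(modulate(warped_gm_prob, jacobian_det))  # "volume" view
```

In this toy case the unmodulated sum reflects only the template-space extent (1000), whereas the modulated sum recovers the native volume (1200), which is why modulated images are read as GM volume and unmodulated images as GM concentration.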
Criterion 2: Since the GM volume is usually the variable of interest for OSA studies, we only considered the use of modulated images as being valid.
Model definition: The choice of covariates is also important. Age and gender are the most common covariates included in the literature. However, the total intracranial volume (TIV) or, alternatively, the total GM volume, which obviously impacts the whole brain matter quantity, also has to be included as a covariate in the analysis, as reported by Pell et al. [19].
Criterion 3: We only considered studies that introduced TIV or GM volume as a covariate as being valid.
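
The reason for Criterion 3 can be illustrated with a small simulation. In the sketch below, the GM value at one voxel depends only on TIV, and the patient group happens to have smaller heads. The data are entirely synthetic and noise-free, age and gender are omitted for brevity, and the helper functions `solve` and `ols` are a minimal least-squares implementation written for this example, not the SPM estimation code.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (no singularity check)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: TIV in litres; 5 controls (group 0) and 5 patients (group 1).
tiv = [1.50, 1.55, 1.60, 1.65, 1.70, 1.30, 1.35, 1.40, 1.45, 1.50]
group = [0.0] * 5 + [1.0] * 5
gm = [0.10 + 0.40 * t for t in tiv]  # GM depends on TIV only, not on group

beta_without_tiv = ols([[1.0, g] for g in group], gm)
beta_with_tiv = ols([[1.0, g, t] for g, t in zip(group, tiv)], gm)

group_effect_without_tiv = beta_without_tiv[1]
group_effect_with_tiv = beta_with_tiv[1]
```

Without the TIV column, the model attributes the head-size difference to the group factor and reports an apparent GM reduction in patients; once TIV is included, the estimated group effect vanishes in this constructed example.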
Statistical analysis:
Statistical tests using the general linear model (GLM) are performed on each voxel of the complete set of images (univariate statistics). With 100,000 voxels in an image, an alpha risk of 5% leads to about 5,000 voxels being declared statistically significant by chance alone. A correction for multiple comparisons is therefore required. A Bonferroni correction (dividing the p-threshold by the number of performed tests) is too conservative here: it is calibrated for independent tests, whereas the voxel-wise tests of a VBM analysis are spatially correlated. Random field theory can be used to control for the family wise error (FWE) rate [20]. A correction based on the proportion of incorrect rejections of the null hypothesis (false discovery rate, FDR) was also introduced in SPM [21]. Since these corrections depend on the number of voxels included in the analysis, reducing the number of voxels decreases the number of multiple comparisons. Regions of interest can be used to decrease the set size and increase power with a specific multiple comparison correction (small volume correction, SVC), but this introduces a strong hypothesis regarding the location of expected differences. Such a hypothesis should clearly be indicated when reporting results [22]. For example, assuming that the hippocampal [23] or brainstem [24] region is modified by OSA is a reasonable hypothesis, leading authors to apply “small volume correction”, i.e., only correcting their p-values by the number of voxels in the vicinity of the hippocampus. However, in doing so the authors cannot prove that the hippocampus or brainstem is modified by OSA. The correct interpretation is the following: if we assume that OSA modifies hippocampus or brainstem size, then we can identify which parts of these regions are most likely to be modified by OSA. The detection of differences in these regions does not exclude that similar (or higher) differences could be detected in another part of the brain, especially if SVC and a lenient statistical threshold were used. The probability that a voxel is truly modified increases when voxels in its neighborhood are also truly modified to define a cluster of voxels.
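
The arithmetic behind the multiple-comparison problem, and two of the corrections mentioned above, can be sketched as follows. This is a generic illustration: the Benjamini-Hochberg step-up procedure is used here as a standard way of controlling the FDR and is assumed, not confirmed, to correspond to the variant implemented in SPM [21]. The function names and the list of p-values are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant-by-chance tests when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test p-threshold that controls the family wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q):
    """Return one boolean per test: True where the test is significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# The example from the text: 100,000 voxels tested at an alpha risk of 5%.
n_chance = expected_false_positives(100_000, 0.05)
p_bonferroni = bonferroni_threshold(100_000, 0.05)

# Hypothetical p-values for ten voxels, tested at an FDR of 5%.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fdr_significant = benjamini_hochberg(p_values, q=0.05)
```

With these hypothetical p-values, only the two smallest survive the FDR procedure, although five of the ten fall below an uncorrected threshold of 0.05.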
Cluster-size statistics were developed for functional MRI [25]. Their use was strongly discouraged for VBM because the smoothness of the images was required to be uniform, which was rarely the case in this context. In recent years, various algorithms were developed to bypass this VBM limitation by using random field theory and permutation methods [26] or topological FDR [27].
Criterion 4: Results obtained without correction for multiple comparisons are not robust enough to be reliable. Consequently, studies that reported results uncorrected for multiple comparisons should only be considered as exploratory. Corrections for multiple comparisons at the voxel or cluster level, when correctly reported (see Ridgway et al. [28]), were considered valid.
Criterion 5: SVC should be used with caution and the underlying hypothesis should clearly be indicated when reporting results, which is rarely the case in the literature.
Studies with a low statistical power overestimate effect size and are difficult to reproduce [29,30].
Criterion 6: Studies with fewer than 10 subjects in each group were excluded.
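
To give an order of magnitude for the link between group size and statistical power, the sketch below uses the standard normal approximation for a two-sided comparison of two group means. It concerns a single test with a hypothetical standardized effect size (Cohen's d); it is not a power analysis for a whole-brain VBM study, and the function name `n_per_group` and the chosen effect sizes are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size_d) ** 2)


n_medium = n_per_group(0.5)  # medium effect, single uncorrected test
n_large = n_per_group(0.8)   # large effect, single uncorrected test
# Same large effect with a Bonferroni-like voxel-wise threshold (100,000 tests).
n_large_corrected = n_per_group(0.8, alpha=0.05 / 100_000)
```

Under this approximation, even a large effect tested once requires about 25 subjects per group, and the requirement grows substantially when the threshold is tightened to account for the number of voxels tested.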
Multicenter study: Pooling the images from multiple centers should help to recruit subjects. However, even when an additional covariate is included in the statistical analysis to model the
center effect, inter-scanner differences can reach the significant threshold of p15, where comorbidities have to be taken into account for defining a disease. However, we did not exclude any studies based on the chosen AHI threshold. Note that Macey et al. [41] included treated patients and, as the treatment could be a confounding factor, the paper was excluded from our final analysis.
Population size:
Population size also differed, from 7 patients and 7 controls in Huynh et al. [10] to 60/60 in Morrell et al. [46] and 76/76 in Celle et al. [43]. Between these extremes, the other studies included about 15 to 25 subjects in each group. According to our selection criteria (Criterion 6), we chose to exclude the studies by Morrell et al. [23] and Huynh et al. [10] from our analysis.
Preprocessing
Realignment and segmentation
The list of SPM versions used, from SPM99 to SPM8, reflects the history of SPM through the years, with the weaknesses and strengths of its built-in image-processing algorithms. The accuracy of the different algorithms would be interesting to discuss but is beyond the scope of this paper. Notably, some authors failed to reproduce previous results when using different versions of the same pipeline. Canessa et al. [48] reported no differences with SPM2 but differences in the frontal and hippocampal areas using SPM5. Celle et al. did not replicate their results obtained with SPM2 [43] using SPM8 (unpublished results). Consequently, based on Criterion 1, results from Celle et al. [43] and Canessa et al. [48] were not considered in our final analysis.
Modulation
Across the twelve studies, most used the modulation step; only Macey et al. in 2002 [41] and Yaouhi et al. [44] did not report using this step to correct local deformations. The 2010 study by Joo et al. [45] is interesting because results were given with and without modulation, i.e., as GM volume and concentration. In that paper, the authors noticed differences in GM concentration but not in volume. According to Criterion 2, we excluded the studies by Macey et al. [41] and Yaouhi et al. [44], as well as the GM concentration results from Joo et al. [45].
Model definition
Three papers [10, 23, 24] did not provide any information about the covariates used. The majority of publications used age, with the exception of Celle et al. [43], because of the low dispersion of age in their study. Some authors also used other covariates such as body mass index (BMI) or blood pressure [43], gender [43,46], handedness [41] or comorbidities and demographic characteristics [47], although the precise variables included were not specified. Only four studies used a covariate expressing the global brain volume: GM volume for Celle et al. [43] and TIV for Joo et al. [45], Morrell et al. [46] and Torelli et al. [47]. Following Criterion 3, the papers from Macey et al. [41], Morrell et al. [23], O'Donoghue et al. [42], Yaouhi et al. [44], Canessa et al. [48], Zhang et al. [49] and Huynh et al. [10] were not considered in our final analysis.
Statistical analysis
Finally, a broad range of statistical methods and thresholds to consider a difference as valid has been reported in the literature. Macey et al. [41] used basic statistics without any correction for multiple comparisons. Yaouhi et al. [44] and Celle et al. [43] used a cluster-level correction that was inappropriate for a VBM study. Huynh et al. [10] and Morrell et al. [46] used topological FDR [27], while Joo et al. [45] and O'Donoghue et al. [42] used FDR correction [21]. Canessa et al. [48] and Torelli et al. [47] used appropriate cluster correction and Zhang et al. [49] used a FWE correction at voxel level. According to our criteria, Macey et al. [41], Yaouhi et al. [44] and Celle et al. [43] were excluded from the final analysis (Criterion 4). Based on Criterion 5, we also chose to exclude Morrell et al. [23] and Lundblad et al. [24]. The use of a priori hypotheses, such as hippocampus [23] or brainstem [24], is debatable and does not exclude that other areas may be modified (even strongly) in OSA.
To conclude, three references survived the exclusion criteria (Table 3) and the reported results were not coherent.
Discussion
Due to sleep fragmentation and intermittent hypoxia, OSA might induce changes in the central nervous system. Based on MR brain scans, some studies have reported such changes in brain tissue volumes. However, a large set of nonreproducible results has been reported, and a clear picture of the effect of OSA on the brain has not yet emerged. In the present paper, we systematically reviewed twelve studies that explored GM modifications due to OSA using VBM to assess whether an effect of OSA on brain structure has yet been demonstrated. Our goal was to demonstrate the serious methodological difficulties faced when using VBM to detect the possible impact of OSA on GM.
When using our discriminating methodological criteria to assess the robustness of the results reported, only three of the twelve studies remained [45–47]. However, these three studies were not in agreement: Joo et al. [45] did not find a GM difference between OSA patients and controls, whereas Morrell et al. [46] observed GM decreases in the right middle temporal gyrus and cerebellum and Torelli et al. [47] in the right hippocampus. In our opinion, this clearly indicates that specific OSA-related brain modifications have not been demonstrated to date. On that point we do not agree with Weng et al. [12]. This meta-analysis inappropriately pooled results obtained from different populations (subjects with or without treatment, severe or moderate apneic patients, male only or mixed-gender studies), with different outcome measures (GM volume and GM concentration) and different statistical thresholds (results with uncorrected statistics, with FWE or FDR correction at voxel level, or with topological or FWE correction at cluster level). The conclusion that the brain alteration due to OSA, if any, has not yet been demonstrated is not surprising because several methodological flaws hamper the coherence between studies and our quest for brain changes due to OSA. We will discuss some of these flaws below.
The population under study differed among studies. If we consider only the three selected studies, the patient inclusion criteria were AHI>30 for Joo et al. [45] and Morrell et al. [46] and AHI>15 for Torelli et al. [47]. We did not expect similar GM decreases between patients with severe (AHI>30) and moderate (AHI>15) OSA. More heterogeneous criteria, from the lack of a precise AHI level [23] to AHI>5 [24, 41], were used in other studies. Moreover, following Killgore et al. [37], sleepiness could be the main factor that may induce some brain changes. The Epworth sleepiness score (ESS) reported for patients in the twelve studies varied from 6.0 in the study by Celle et al. [43] to 15.2 in that reported by Zhang et al. [49]. Even in the three selected papers, the mean ESS in the pathological group ranged from 8.5 to 13.2. Ideally, patients and controls should be matched for sleepiness and for several influential parameters, such as hypertension [51] or BMI [52]. A very interesting study from Kendzerska et al. [53] on OSA and the risk of cardiovascular events showed the following: 1) traditional cardiovascular risk factors (BMI, age, gender, smoking status, hypertension, diabetes, etc.) may have a greater impact than OSA; and 2) AHI may not be the most relevant OSA-related factor. Some of these factors have been taken into account in previously published papers, but some factors remain different between patients and controls. For instance, BMI and sleepiness were not controlled in most of the studies; in Celle et al. [43], sleepiness and BMI were not different between subjects with and without sleep-related breathing disorders, but systolic blood pressure differed. However, the selection of the right influential variables is difficult, and this difficulty is not specific to the sleep apnea literature. Mazziotta et al. [54] even considered the concept of the normal, average human brain a myth.
Moreover, the matching process between the OSA and control groups was not accurate in most of these studies. A 1:1 or 1:n matching was rarely used, and a limited number of confounders were included. Since OSA is a heterogeneous disease with cardiovascular and metabolic comorbidities that directly favor stroke or brain lesions, this inaccurate matching represents a major limitation. CPAP treatment has mainly been evaluated in open studies, and CPAP intervention has been randomized against sham CPAP in only one study with a small sample size to date [10].
The image acquisition procedure is also a factor of interest. Tardif et al. [55] indicated substantial differences between MPRAGE, modified driven equilibrium Fourier transform (MDEFT) and SPGR images, even when identically processed. More precisely, MPRAGE was more accurate for the cortex, whereas SPGR was better for deep structures. Moreover, a larger number of subjects was required when using MPRAGE images to reach a similar statistical threshold compared to SPGR images. Our three selected papers used three different MRI sequences (SPGR, MPRAGE and BRAVO). Interestingly, the only study that observed a hippocampal decrease used a sequence reported not to be the most accurate in this region. The magnetic field strength can also affect VBM studies. Marchewka et al. [56] reported strong differences in the cerebellum, precentral cortex and thalamus between 1.5 T and 3 T scanners. Since 3 T MR scanners are becoming widespread, the importance of this parameter will decrease. Images should be acquired within a reasonably short period because even a scanner upgrade may influence the results [57], which may be a problem for large cohorts, and even more so for longitudinal studies. The coil used may also influence the sensitivity of the method, and results obtained with a 32-channel coil generally outperformed those obtained with a 12-channel coil [58]. The voxel size directly influences the spatial resolution of the difference that can be detected [58]. The voxel size was relatively homogeneous in the OSA literature (1×1×1 mm³), but some authors used a smaller resolution [43].
Processing pipeline. Registration is a crucial step in a VBM analysis. Klein et al. [59] showed major differences between registration algorithms applied to the whole brain, and these differences varied by region. According to Hellier et al. [60], the quality of registration is directly related to the number of degrees of freedom of the transformation; as such, DARTEL, which is an elastic deformation technique that deals with 6 million parameters, should perform better than SPM2 registration, which deals with only one thousand parameters. Klein et al. [59] also demonstrated a modest correlation between the registration accuracy and the number of degrees of freedom of the registration, as well as a correlation between the number of degrees of freedom and the year of release; the most recent algorithms usually consider more degrees of freedom and thus are more accurate. According to Bergouignan et al. [61] and Yassa et al. [62], DARTEL should also improve the VBM sensitivity in small structures.
Many studies have attempted to evaluate MR image segmentation [63–65]. The main conclusion we can draw from these studies is that discrepancies between different algorithms can reach the same order of magnitude as the expected volume change we search for using VBM, as demonstrated by Klauschen et al. [64], and they are spatially heterogeneous [55]. Different pipelines can lead to different results [66, 67]. In sleep apnea, some authors have attempted to reproduce previous results with different versions of the same pipeline [42, 48]. O’Donoghue et al. [42] showed the absence of differences in OSA patients compared to controls using either SPM99 or SPM2 at valid statistical thresholds. Canessa et al. [48] reported no differences with SPM2, but they did show differences in the frontal and hippocampal areas with SPM5. We did not replicate our results obtained with SPM2 [43] using SPM8.
We considered in the present review that modulation is a necessary step for VBM. However, a recent study [68] based on simulated brain abnormalities challenges this view, suggesting that modulation can be omitted with the conjoint use of large smoothing kernels, in accordance with Silver et al. [69].
Statistical analysis. Most of the discrepancies across OSA studies are apparently due to the statistical standard used in the neurosciences community [70] or cognitive sciences [71]. Seven [23, 24, 41–45] of the twelve selected studies did not show a GM difference when using an appropriately corrected p-value threshold. In the absence of a difference, reducing the statistical demands is tempting, even if it impacts the confidence in the results. In this case, the study should only be considered exploratory. Random field theory allows the calculation of cluster extent statistics, which state the probability of observing, under the null hypothesis, a cluster of voxels of a given size in which all voxels have values above a given t or z threshold. The validity of cluster extent statistics crucially depends on the spatial smoothing and the selected voxel threshold, which have not been systematically reported. For VBM, Silver et al. [69] recommended a “p = 0.001 for voxel threshold and a 12 mm Gaussian kernel (full width at half maximum, FWHM)” based on empirical information. Based on simulations they observed that “false positive rates ranged from 9.8 to 67.6%” when using a 6 mm Gaussian kernel and thresholds such as p = 0.05 or p = 0.01.
SC
Underpower. OSA studies are clearly underpowered (i.e., they have low sensitivity), similarly to numerous neuroimaging [72,73] and even neuroscience [29] studies. Studies with low power tend to inflate the detected significance, possibly because of sampling errors [29,30]. A generic criterion to determine when a study is underpowered is lacking because such a criterion depends on the corresponding effect size, which is unknown a priori (see some discussions on this point in Friston, Ingre and Lindquist [74–77]). Shen et al. [78, 79] indicated “that a VBM study with groups smaller than 25 may acquire unreliable detection”, which would invalidate two of our three selected studies. In 2014, Ioannidis and co-authors [71] showed in a simulation study that, for sample sizes below 30 (15 in each group), the numbers of correctly and falsely detected abnormalities are about the same and close to zero. A recent study by the same group, focused on voxel-based meta-analyses including VBM, suggested a median sample size of 47 subjects per group [30]. However, we can infer from our failure to reproduce our results in a population of 152 subjects [43] that no universal “magic number” exists.

Limitations of the current review
The main limitation of this review was the difficulty of obtaining detailed information on the pipeline and statistics used in some studies, mainly the oldest ones. The review was also restricted to cortical density modifications. It may be extended to studies that explore possible changes to basal ganglia morphometry and WM due to OSA, using structural MRI for brain tissue density changes or DTI for fractional anisotropy modifications.
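Returning to the dependence of power on group size and effect size raised in the Underpower paragraph, the sketch below gives a rough Python approximation of the power of a two-group comparison. It is only an illustration under stated assumptions: a normal approximation to the two-sample t-test, a single test at two-sided alpha = 0.05 (no correction for multiple comparisons), and a hypothetical standardized effect size of 0.5. The group sizes are those mentioned in the text (15, 25, 47) and the 76 subjects per group of the largest study in Table 1.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison with equal
    group sizes, using a normal approximation to the t-test.
    effect_size is the standardized mean difference (Cohen's d)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2.0)
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    d = 0.5  # hypothetical effect size (assumption, unknown a priori in practice)
    for n in (15, 25, 47, 76):
        print(f"n per group = {n:3d}: approximate power = {approx_power(d, n):.2f}")
```

Because the true effect size is unknown a priori, such a calculation cannot yield a universal sample size; it only shows how quickly sensitivity falls for small groups under a given assumption.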
Conclusion
To date, only three studies were robust enough to provide reliable results. However, no clear conclusion on a potential link between OSA and GM alteration can be drawn from these three studies because of the absence of congruence between them. To improve our knowledge of the possible brain modifications due to OSA, we suggest several important methodological points for future studies. First, we suggest a homogeneous recruitment of subjects with controlled OSA-related parameters (AHI or oxyhemoglobin desaturation index) and comorbidities (sleepiness, hypertension, and BMI). Second, the acquisition sequence should be the same for all subjects, and the spatial resolution should be suited to the cortical ribbon alterations in question. Since the cortical thickness ranges between 3 mm and 5 mm, partial volume effects should be limited at acquisition by using a high spatial resolution (typically 1 mm³ or below). Third, image preprocessing should include the following: (i) segmentation that excludes large vessels, skull and meninges; (ii) accurate inter-individual image registration using elastic deformations; (iii) a modulation step to generate volume images; and (iv) spatial smoothing (6–12 mm Gaussian kernel). If the study relies on a population younger or older than normal adulthood, a specific template should be computed for the registration step. An elegant way to preprocess images is to use a pipeline that includes a correction for partial volume effects (see for instance VBM8). Longitudinal studies are also of particular importance to study the impact of age or treatment. In these cases, care should be taken in the template selection to prevent false positive results [80]. Fourth, the statistical analysis should at least include the TIV or total GM volume regressor, age, gender, and other comorbidity regressors if not matched between groups. A minimal threshold must
be applied to the GM content to exclude statistics from areas that contain insufficient GM (absolute threshold typically higher than 10% or even 20%). A correction for multiple comparisons (p […]

Study | Nb subjects | AHI inclusion criterion | AHI | BMI | ESS | MR field (T) | MR sequence | Voxel size (mm)
[…] | […] | […] 25 | 28 | NC | 14 | 1.5 | MPRAGE | 1*1*2
O'Donoghue et al. 2004 [42] | 27 / 24 | > 30 | 71.7 | 33.2 | 13.1 | 3 | Fast SPGR | 0.97*0.73*1.5
Yaouhi et al. 2009 [44] | 16 / 14 | ≥ 10 | 38.3 | NC | 12.5 | 1.5 | SPGR | 0.94*0.94*1.5
Celle et al. 2009 [43] | 76 / 76 | ≥ 15 | 29.2 | 26.3 | 5.9 | 1 | MPRAGE | 2*2*2
Joo et al. 2010 [45] | 36 / 31 | > 30 | 52.5 | 26.0 | 10.4 | 1.5 | SPGR | 0.86*0.86*1.6
Morrell et al. 2010 [46] ¥ | 26 / 25 ; 34 / 35 | > 30 | 71.5 ; 41.6 | 32.6 ; 31.4 | 13.5 ; 13.0 | 3 ; 1.5 | Fast SPGR ; MPRAGE | 0.49*0.49*2 ; 1*1*2
Canessa et al. 2011 [48] | 17 / 15 | > 30 | 55.8 | 31.2 | 11.9 | 3 | NC | 1*1*1
Torelli et al. 2011 [47] | 16 / 14 | Moderate (> 15) | 52.5 | 31.7 | 8.5 | 3 | MPRAGE | 1*1*1
Zhang et al. 2012 [49] | 24 / 21 | > 15 | 54.7 | 29.8 | 15.2 | 3 | BRAVO | 1*1*1
Huynh et al. 2014 [10] * | 27 / 7 | ≥ 15 | 38.9* | 27.4* | NC | 3 | Fast SPGR | 1*1*1.2
Lundblad et al. 2014 [24] | 20 / 19 | ≥ 5 | 38 | 31.7 | 9 | 3 | TFE | 0.8

Table 1: characteristics of patients included in the eleven selected studies. Only means were reported.
* For Huynh et al., we computed the values from the values given for sham (n=13) and active CPAP (n=14) patient groups.
¥ Patients and subjects were recruited in two sites: Melbourne, Australia and London, UK.
Nb subjects: number of subjects given as patients/controls.
AHI: apnea/hypopnea index, BMI: body-mass index, BRAVO: Brain Volume, ESS: Epworth sleepiness score, MPRAGE: magnetization prepared rapid acquisition gradient echo, MR/MRI: magnetic resonance/magnetic resonance imaging, NC: not communicated, SPGR: spoiled gradient recalled, TFE: Turbo Field Echo
Processing
Statistics
Morrell et al. 2003 [23]
SPM99 MNI 152 NC
SPM99 default
P uncorr at voxellevel
Threshold on cluster size
Correction
P corr
300
No correction
NC
Yes
12
ANOVA
Age Handedness
0.001
No
12
ANOVA
NC
NC
SVC (hipp)
0.01
Optimised
Yes
NC
ANCOVA
Age
0.001
Voxel level FDR
0.05
MNI 152 Optimised
SPM2
Yaouhi et al. 2009 [44]
SPM2
Custom Optimised
No
12
T-test
Age
0.005
Cluster level
0.05
Celle et al. 2009 [43]
SPM2
Custom Optimised
Yes
12
T-test Regression
Gender, BMI, BP, GM volume
NC
Cluster level
0.05
Joo et al. 2010 [45] ¥
SPM2
Custom Optimised Yes & No
8 & 12
ANCOVA
Age TIV & Age
NC
Voxel level FDR
0.05
Morrell et al. 2010 [46]
SPM8
NC SUIT*
SPM8 NS SUIT*
Yes
8 & 12
Factorial design
Age, gender TIV
0.001
Topological FDR
0.05
Canessa et al. 2011 [48]
SPM5
MNI 152
VBM5
Yes
8
T-test
Age
0.005
Cluster FWE
0.05
Torelli et al. 2011 [47]
SPM8
Custom
NS+D
Yes
10
ANOVA
Demog char Comorb TIV
NC
Cluster FWE
0.05
Zhang et al. 2012 [49]
SPM8
Custom
NS+D
Yes
8
T-test
Age
NC
Voxel level FWE
0.05
Huynh et al. 2014 [10]
SPM8
NC
VBM8
Yes
8
ANOVA
NC
0.005
Topological FDR
0.05
O'Donoghue et al. 2004 [42]
NC
Covariate
Macey et al. 2002 [41]
FWHM Statistical Smoothing tests (mm)
Software Template Methods Modulation
200
Lundblad et al. 2014 [24]
NC
SUIT*
SUIT
Yes
3
T-test
NC
NC
3
SVC
0.05
Table 2: summary of the pipeline and statistics used in sleep apnea/VBM studies.
ANCOVA: analysis of covariance, ANOVA: analysis of variance, BMI: body-mass Index, BP: blood pressure, FDR: false discovery rate, FWE: family wise error, GM: grey matter, Hipp: hippocampus, FWHM: full width at half maximum, MNI: Montreal Neurological institute, NC: not communicated, NS+D: NewSegment + DARTEL, P corr: P corrected for multiple comparisons, P uncorr: P uncorrected for multiple comparisons, SVC: small volume correction, TIV: total intracranial volume
¥ Joo et al. used modulated and unmodulated images.
* SUIT is a toolbox for SPM designed to specifically explore the brainstem and cerebellum [50].
 | Joo et al. 2010 [45] | Morrell et al. 2010 [46] | Torelli et al. 2011 [47]
Covariates | Age and TIV | Age, gender, TIV | Demographic characteristics (no clarification is given), comorbidities (no clarification), TIV
Statistical correction for multiple comparisons | FDR at voxel level | Topological FDR | Cluster FWE
Localisation of GM diminution due to OSA | No diminution | Right middle temporal gyrus; left cerebellum (lobe VIIIb) | Right hippocampus
MNI coordinates (mm) [x y z] | N/A | [52 4 -22]; [-12 -62 -56] | [30 -5 -48]

Table 3: Surviving results with correction for multiple comparisons and total intracranial volume as covariate; modulated images were used.
FDR: false discovery rate, FWE: family wise error, GM: grey matter, MNI: Montreal Neurological Institute, N/A: not applicable, OSA: obstructive sleep apnea, TIV: total intracranial volume
Morrell et al.
O'Donoghue et Yaouhi et al.
Celle et al.
Morrell et al.
Joo et al. 2010
Canessa et al.
Torelli et al.
Zhang et al.
Huynh et al.
Lundblad et al.
2002 [41]
2003 [23]
al. 2004 [42]
2009 [44]
2009 [43]
2010 [46]
[45]
2011 [48]
2011 [47]
2012 [49]
2014 [10]
2014 [24]
Inclusion
Confirmed
Newly
AHI >30
Newly
Being 65 at the AHI > 30
Male
Newly
Newly
Male
Male
Mild apnea
criteria for
sleep diagnosis
diagnosed
15 % of total
diagnosed
beginning of
Age between
diagnosed
diagnosed
Moderate
AHI>=15
(AHI>=5)
male
sleep spent at
Subjective
the study
18 and 55
AHI>30
Moderate
apnea
Mild to severe
No CPAP
SaO2 < 90 %
complaints of
AHI >= 15 or
AHI>30
apnea
(AHI>15)
OSA
treatment
OSA
ODI >= 15
(Hypothese :
Normal lung
AHI >= 10
AHI>5)
function
(AHI>15)
patients
Macey et al.
Weight < 130kg Girth < 152cm No sleep
Matched for
Matched for
No history of
Being 65 at the No history of
AHI 90
airflow >= 90 % for 10
AASM 1999
AASM 1999
ons Standard
More than
Complete
Absence of
>80 % drop of
Reduction
practices
50 % reduction
cessations of
airflow for
airflow in
airflow > 90 % respiratory
of airflow for
airflow for 10s
more than 10s
London >= 10s during 10s with amplitude for
for 10s
despite
Complete
evidence of
associated with associated with
respiratory
cessation of
persistent
effort
persistent
airflow >= 10s respiratory
increased
respiratory
in Melbourne
respiratory
effort
effort
50 % of airflow Reduction of
Decrease of
reduction of
50 % of airflow 50 % of the
reduction with
airflow for 10s
for 10s
amplitude of
a >4 % dip in
accompanied
the respiratory
effort 30 % drop of
Reduction
Reduction in
airflow of 50 % respiratory
>30 % for 10s
airflow >=
>=10
amplitude for
with 4 % or
30 % for 10s
saturation or an or
more than 10s
greater oxygen
and
by desat >=3 % effort, with a
arousal for
Reduction of
and 3 % or
desat.
accompanied
or
fall of 3 % in
London
airflow by
greater oxygen
with 4 % or
oxygen
50 % decrease
30 % > 10s
desat.
greater oxygen
saturation, or
in airflow
accompanied
desaturation or
autonomic
without
with EEG
EEG arousal
arousal
requirement for arousal and/or
according to a
desat, plus
decrease in
those with a
Reduction of
more than 10s
continued or
50 to 75 %
microarousals
Hypopnea def
80 % drop of
10s
Apnea def
Recommandati ASDA
2009 [44]
3% desat
PTT
less airflow reduction associated with
3 % desat or arousal for Melbourne
Device
CID-102
Tyco
Healthdyne /
Healthcare
Embla
HypnoPTT
Alice-3 /
Flaga
Cidelec
Manufacturer
Oro-nasal
PSG
PG
Nasal cannula
Nasal pressure
Nasal flux
PSG
Oral flux
Oro-nasal
Yes
thermistor Saturation
Thermistor
Finger pulse
Pulse oxymetry
Embletta
Compumedics
Crystal monitor ApneaLink
E-series Sleep
/ Sandman
System PSG
PG ?
PSG
PSG
Nasal cannula
YES
Nasal cannula
PG
pressure monitor Thermal sensor YES
Oximeter
oxymeter
EEG
ResMed
+ nasal air
thermistor
PSG
CleveMed
Sleep
Somnologica
PSG/PG
Compumedics
Finger pulse
YES
YES
Pulse oximetry
Standard EEG
YES
oxymeter
FP1/FP1 ;
4-channel
C3/C4 ;
(C3/A4 ;
T3/T4;O1/O2
C4/A1 ;
O1/A2 ;
Electro-
Left and right
4-channel
Electromyogra
Anterior
Submental
m
tibialis muscles
Intercostal
oculogram
Anterior
O2/A1) YES
YES
YES
Chin and legs
YES
YES
tibialis muscles Yes (heart rate)
Bodyposition
One-lead ECG
Yes
Yes
ECG with
One-lead &
surface
thoracic
electrodes
electrodes
ECG
Bodyposition
YES
sensor
Rib cage
Thoracic
excursions
impedance Strain gauges
Abdominal respiratory
Snoring
Piezoelectric
Strain gauges
Yes
YES
bands
Abdominal and thoracic belts
movements
YES
Tracheal
Microphone
YES
microphone
Treatment
Remarks
11 with CPAP
Before and
Before and
Sham and
at MRI
after CPAP
after CPAP
active CPAP
Occurrence of
Controls :
Videotape
AHI 0.2
whole brain
Table S6: statistics used in the VBM sleep apnea literature
AHI: apnea/hypopnea index, ANCOVA: analysis of covariance, ANOVA: analysis of variance, BMI: body-mass index, BP: blood pressure, CPAP: continuous positive airway pressure, FDR: false discovery rate, FWE: family wise error, GM: grey matter, ODI: oxyhemoglobin desaturation index, ROI: region of interest, SVC: small volume correction, SUIT: Spatially unbiased infra tentorial template [50], TIV: total intracranial volume
Macey et al.
Morrell et al.
O'Donoghue Yaouhi et al.
Celle et al.
Morrell et
Joo et al.
Canessa et
2002 [41]
2003 [23]
et al. 2004
2009 [43]
al. 2010
2010 [45]
al. 2011 [48] 2011 [47]
2009 [44]
[46]
Bilateral
Posterior parietal cortex
Anterior cingulate gyrus Left
Bilateral
Ventral lateral cortex Left
sites
Posterior inferior Dorsolateral prefrontal Middle gyrus Temporal
Frontomarginal gyrus
Bilateral
Lateral prefrontal Multiple
Gyrus rectus
2014 [10]
Anterior superior gyrus Bilateral
Superior frontal gyrus
al. 2012
Left
Frontal
Prefrontal cortex
Huynh et al. Lundblad et
[49]
Parietal Inferior parietal gyrus
Zhang et
[42]
Torelli et al.
Left Bilateral
Right
Left Bilateral Left Left Left
al. 2014 [24]
Inferior temporal gyrus Bilateral
Bilateral Bilateral
Superior temporal gyrus
Bilateral
Right
Right
Middle temporal gyrus
Anterior lobe Bilateral Bilateral
Mesial lobe
Bilateral
Posterior lobe
Occipital Middle occipital gyrus
Right
Lingual gyrus
Right
Cuneus Hippocampus
Right
Parahippocampal gyrus
Bilateral
Right
Cerebellum
Quadrangula
Right
Brainstem
Right
r lobule
Left
Vermis Insula
Left
Fusiform gyrus
Left
Right Right+left
Bilateral Left Quadrangul ar & biventer lobules
Bilateral
Yes Left
Bilateral
Lateral temporal
Right
Bilateral
Basal ganglia Right
Putamen
Left
Caudate nucleus
Left
Pallidum
Left
Bilateral
Bilateral
Amygdala
Uncorr
Only GM
Uncorr
Only GM
Reduction of
GM increase
No CSF
concentration
GM
concentration
WM in the
in RVLM
No WM
results
increase :
results
temporal
Dorsolateral
No results in
lobes nearby
pons
volume (ie
the
with
hippocampus
No increase
Remarks
Bilateral
Thalamus
R basal ganglia
parietal lobes
frontal &
modulation)
Table S7: grey matter results according to the VBM sleep apnea literature
CSF: cerebrospinal fluid, GM: grey matter, RVLM: rostral ventrolateral medulla, VBM: voxel-based morphometry, WM: white matter
Bold represents areas where a corrected threshold (correction on cluster or on voxel-level) was used.