Accepted Manuscript

Desperately seeking grey matter volume changes in sleep apnea: a methodological review of magnetic resonance brain voxel-based morphometry studies

Sébastien Celle, Chantal Delon-Martin, Frédéric Roche, Jean-Claude Barthelemy, Jean-Louis Pépin, Michel Dojat

PII: S1087-0792(15)00043-X
DOI: 10.1016/j.smrv.2015.03.001
Reference: YSMRV 872

To appear in: Sleep Medicine Reviews

Received Date: 21 July 2014
Revised Date: 11 March 2015
Accepted Date: 11 March 2015

Please cite this article as: Celle S, Delon-Martin C, Roche F, Barthelemy J-C, Pépin J-L, Dojat M, Desperately seeking grey matter volume changes in sleep apnea: a methodological review of magnetic resonance brain voxel-based morphometry studies, Sleep Medicine Reviews (2015), doi: 10.1016/j.smrv.2015.03.001. This is a PDF file of an unedited manuscript that has been accepted for publication.


Sébastien Celle1, Chantal Delon-Martin2,3, Frédéric Roche1, Jean-Claude Barthelemy1, Jean-Louis Pépin3,4, Michel Dojat2,3

1 EA 4607 SNA-EPIS, Service de Physiologie Clinique et de l'Exercice, Pole NOL, CHU Nord, 42055 Saint-Étienne, Faculté de Médecine Jacques Lisfranc, Université Jean Monnet, 42023 Saint-Étienne, PRES Université de Lyon, France
2 Inserm, U836, F-38000 Grenoble, France
3 Université Grenoble Alpes, GIN, F-38000 Grenoble, France
4 Inserm, U 1042, F-38000 Grenoble, France

Corresponding author: Sébastien CELLE, Service de physiologie clinique, Niveau 6, CHU de Saint-Étienne, 42055 Saint-Étienne, France; E-mail: [email protected]; Phone: +33 4 77 82 83 00; Fax: +33 4 77 82 84 47

Short title: A critical review of voxel-based morphometry brain imaging literature in sleep apnea

Acknowledgments: No conflict of interest

Summary

Cognitive impairment related to obstructive sleep apnea might be explained by subtle changes in brain anatomy. This has mainly been investigated using magnetic resonance brain scans coupled with voxel-based morphometry analysis. However, this approach is prone to several methodological pitfalls that may explain the large discrepancies among the results reported in the literature. We critically reviewed twelve papers addressing grey matter volume modifications in association with obstructive sleep apnea. Based on strict methodological criteria, only three studies reported robust, but conflicting, results. No clear evidence has emerged, and the exploration of brain alterations due to obstructive sleep apnea should thus be considered an open field. We provide recommendations for designing additional robust voxel-based morphometry studies, notably the use of larger cohorts, which is the only way to solve the problem of underpowered studies and the underestimated role of confounders in neuroimaging studies.

Keywords: Obstructive sleep apnea, voxel-based morphometry, registration, segmentation, anatomy, neuroimaging

Abbreviations

AHI: apnea/hypopnea index
BMI: body mass index
BRAVO: brain volume
CPAP: continuous positive airway pressure
DTI: diffusion tensor imaging
ESS: Epworth sleepiness score
FDR: false discovery rate
FWE: family wise error
FWHM: full width at half maximum
GLM: general linear model
GM: grey matter
MPRAGE: magnetization prepared rapid acquisition gradient echo
MR/MRI: magnetic resonance/magnetic resonance imaging
OSA: obstructive sleep apnea
SPGR: spoiled gradient recalled
SPM: statistical parametric mapping
SVC: small volume correction
TIV: total intracranial volume
VBM: voxel-based morphometry
WM: white matter

Introduction

Obstructive sleep apnea/hypopnea (OSA) is characterized by the repetitive occurrence of partial or complete pharyngeal collapse during sleep, leading to oxyhemoglobin desaturation and/or micro-arousals. It is a growing health concern, with a prevalence ranging from 4% in middle-aged men [1] to 50% in the elderly population [2]. Many adverse consequences have been associated with sleep apnea, such as sleepiness and related car accidents [3], cardiovascular disease [4], cognitive impairment [5], diabetes [6] and even Alzheimer's disease [7], although some of these links remain debated in the scientific community.

This paper focuses only on OSA. Central sleep apnea [8], mainly caused by a defect in respiratory control, is frequently encountered in heart failure, in the elderly or after a stroke, and represents a specific entity. Hence, the relationship between central sleep apnea and brain structure abnormalities will not be explored in this paper.

Cognitive impairment, as well as the potential link with Alzheimer's disease, suggests that brain structures are altered in OSA. Brain insult may result from sleep fragmentation due to micro-arousals and from intermittent hypoxemia (i.e., the repetition of desaturation-reoxygenation sequences), which are the hallmarks of sleep apnea. In rodent models, exposure to intermittent hypoxia is associated with cell death in some brain structures, particularly in the hippocampus [9]. Several authors have explored potential modifications of brain anatomy in patients with OSA using structural magnetic resonance (MR) brain scans and the voxel-based morphometry (VBM) methodology. The published results vary considerably, and two recent review papers [10, 11] and a meta-analysis [12] have attempted to draw conclusions from the synthesis of these data. Although some of these authors [10, 11] indicated that the discrepancies in the literature probably reflect differences in image processing and statistical methods, a systematic analysis of the methodology used to produce the published results has not yet been conducted.

The goal of our paper was therefore to question the neuroimaging methodology used in the MRI/OSA literature, from magnetic resonance imaging (MRI) acquisition to statistics, and, consequently, the interpretation of the results. We first defined a minimum set of methodological criteria to be respected for a statistically robust exploration of grey matter (GM) modifications using VBM. We then reviewed all studies addressing this point in OSA patients. Based on our predefined criteria, we critically reviewed the relevant literature and selected the robust papers from which to draw conclusions about possible GM modifications due to OSA. Finally, we propose methodological guidelines for further studies.

Methods

VBM standard pipeline and key methodological issues

VBM is a methodology developed to explore local brain volume changes [13] in which voxels are used as outcome measures to study the effects of explanatory variables. In the early days of VBM, Bookstein’s controversy raised several concerns regarding the VBM methodology [14]. Fifteen years later, these concerns remain valid. The VBM pipeline, i.e., the image processing chain used to assess possible tissue changes in MR brain scans due to a given condition, is composed of four steps: 1) image preprocessing, 2) modulation, 3) model definition and 4) statistical analysis. The quality of each step strongly influences the quality of the final results and hence their interpretation. We defined a set of criteria to assess the quality of each step. Note that image acquisition is also an important step. In practice, image acquisition conditions may differ between studies, including different magnetic field strengths (from 1 T to 3 T), different voxel sizes (from 2×2×2 mm³ to 1×1×1 mm³) or different MRI sequences (spoiled gradient recalled (SPGR), magnetization prepared rapid acquisition gradient echo (MPRAGE) and brain volume (BRAVO)). However, no clear quality criteria can be defined on these parameters.

Image preprocessing: VBM requires three basic steps: registration, segmentation and subsequent spatial smoothing of the set of structural MR images. Since each individual brain is different, each image must first be registered to a common reference. This step is crucial because imperfect registration across individuals may bias the statistics [14]. The reference can be either specific to the population under study and built by a dedicated realignment algorithm, such as DARTEL [15], or a template based on the mean of several subjects, such as MNI305, which is based on the accurate realignment of MR brain scans of 305 healthy subjects. The latter allows the coordinates of detected structural differences to be compared between studies using the same template. Segmentation provides, for each voxel, the probability of belonging to each tissue class. For instance, probabilities of 0.8 for GM and 0.2 for white matter (WM) indicate that the corresponding voxel most likely consists of GM. Spatial smoothing with a Gaussian kernel is then applied to satisfy the validity conditions of Gaussian random field theory, which is mainly used for the statistical analysis; smoothing also attenuates residual differences between individual brains after registration. For each study, the quality of the registration and segmentation steps crucially depends on the version of the corresponding algorithm available at that time. This dependence may explain why reanalyzing a data set with an upgraded version of the software can lead to different conclusions.
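The smoothing step can be illustrated with a short sketch. It assumes NumPy and SciPy (`scipy.ndimage.gaussian_filter`) rather than SPM itself, and uses a toy cube as a hypothetical stand-in for a registered GM probability map. The kernel width is given as a FWHM in mm, as in VBM reports, and converted to a standard deviation in voxels using FWHM = 2√(2 ln 2) σ ≈ 2.355 σ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fwhm_to_sigma_vox(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian FWHM in mm into a standard deviation in voxels (per axis)."""
    sigma_mm = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM = 2*sqrt(2 ln 2)*sigma
    return sigma_mm / np.asarray(voxel_size_mm, dtype=float)

def smooth_gm_map(gm_prob, fwhm_mm=8.0, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Smooth a registered GM probability map with an isotropic Gaussian kernel."""
    sigma = fwhm_to_sigma_vox(fwhm_mm, voxel_size_mm)
    return gaussian_filter(gm_prob, sigma=sigma)

# Toy GM map: a 10-voxel cube of probability 1 inside an empty 40^3 volume (hypothetical)
gm = np.zeros((40, 40, 40))
gm[15:25, 15:25, 15:25] = 1.0
smoothed = smooth_gm_map(gm, fwhm_mm=8.0)

print(fwhm_to_sigma_vox(8.0, (1.0, 1.0, 1.0)))  # about 3.397 voxels per axis
print(gm.sum(), smoothed.sum())                 # total GM is preserved (normalized kernel)
print(smoothed.max())                           # peak below 1: probability spread to neighbours
```

Because the kernel is normalized, the total amount of GM is preserved while each voxel now reflects a weighted neighborhood average; the 8 mm FWHM here is an arbitrary illustrative value, not a recommendation.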

Statistical parametric mapping (SPM) is widely used in neuroimaging. The software has been released over the last 15 years in successive versions, from SPM99 to SPM12 [16], each version improving some key steps compared with the previous one. SPM99 was released in January 2000 with a fully 3D nonlinear registration and used the MNI305 registration template as the default. Compared with SPM99, SPM2 contained few methodological improvements concerning registration and segmentation. In 2001, Good et al. [17] proposed a major modification to the protocol, called “optimized VBM”, which is still in use. This protocol aimed to correct the misclassification of some non-brain tissue by creating specific GM and WM templates, computing the transformation parameters to realign the segmented individual images to these specific templates, applying these parameters to the original images and finally segmenting the realigned images. SPM5 was a major improvement: it introduced the unified segmentation method [18], which realigns and segments images in a combined, iterative manner. VBM5 was a toolbox dedicated to VBM for SPM5. SPM8 provided a new registration algorithm called DARTEL [15], which uses an elastic deformation with a high number of degrees of freedom and iteratively builds a template specific to the studied population, considerably improving the fit between each image and the computed template. The transformation of this template to a common reference, such as MNI305, is also provided. Finally, unified segmentation was improved by modeling 6 head components, such as the fat signal from the scalp or signals from large veins, instead of only 3 brain tissues, a method often referred to as “New Segment”. This improvement removes potential contamination from non-brain tissues that could lead to false positives.

Criterion 1: We considered results obtained with more recent software to be more robust; in particular, we considered the use of elastic registration tools to be of high importance.

Modulation: Modulation is another important step. After realignment to a reference, the tissue volumes in the realigned image may be modified by the applied spatial transformation; e.g., when a large brain is realigned to a smaller template, information about its initial tissue volumes is lost. Modulation, introduced in 2001 by Good et al. [17], aims to compensate for this artifactual modification. Changes in GM volume and changes in GM concentration are detected using modulated and unmodulated images, respectively.

Criterion 2: Since GM volume is usually the variable of interest in OSA studies, we only considered studies using modulated images as valid.
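A one-dimensional toy sketch may help to see what modulation does. It assumes NumPy, 1 mm voxels and a known mapping `phi` from template voxels to native-space coordinates; both are hypothetical simplifications, since real pipelines estimate a 3D deformation and multiply by its Jacobian determinant. Two subjects whose GM extends over 20 mm and 10 mm are warped onto the same 10-voxel template.

```python
import numpy as np

def warp_and_modulate(gm_native, phi):
    """Resample a 1D native GM map into template space, without and with modulation.

    gm_native: GM probability in each native 1 mm voxel.
    phi: native-space coordinate (mm) to which each template voxel maps.
    """
    native_coords = np.arange(gm_native.size, dtype=float)
    # Unmodulated image: resampled GM probability, i.e. a "concentration"
    warped = np.interp(phi, native_coords, gm_native)
    # 1D Jacobian of the mapping: native mm represented by one template mm
    jacobian = np.gradient(phi)
    # Modulated image: scaled by the Jacobian so that total GM volume is preserved
    modulated = warped * jacobian
    return warped, modulated

template_x = np.arange(10, dtype=float)  # a 10-voxel template
big_warped, big_mod = warp_and_modulate(np.ones(20), 2.0 * template_x)  # 20 mm of GM, shrunk 2x
small_warped, small_mod = warp_and_modulate(np.ones(10), template_x)    # 10 mm of GM, identity

print(big_warped.sum(), small_warped.sum())  # unmodulated: 10.0 10.0 (volume difference lost)
print(big_mod.sum(), small_mod.sum())        # modulated: 20.0 10.0 (native volumes recovered)
```

Without modulation both subjects look identical in template space; only the modulated images keep the twofold difference in native GM volume.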

Model definition: The choice of covariates is also important. Age and gender are the most common covariates included in the literature. However, the total intracranial volume (TIV), or alternatively the total GM volume, which directly scales the overall amount of brain tissue, also has to be included as a covariate in the analysis, as reported by Pell et al. [19]

Criterion 3: We only considered studies that introduced TIV or GM volume as a covariate as valid.
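A minimal voxel-wise GLM sketch, assuming NumPy and simulated data (all values hypothetical), shows why a global volume covariate matters. Here GM scales with TIV and age but has no true group effect; because the simulated patients happen to have larger heads, omitting TIV produces a spurious group difference. Gender would be added as one more column of the design matrix in the same way.

```python
import numpy as np

def voxelwise_glm_t(Y, X, contrast):
    """Fit Y = X @ beta + e independently at every voxel and return a t-map.

    Y: (n_subjects, n_voxels) modulated GM values.
    X: (n_subjects, n_regressors) design matrix.
    contrast: (n_regressors,) vector selecting the effect of interest.
    """
    beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    dof = X.shape[0] - rank
    sigma2 = (resid ** 2).sum(axis=0) / dof          # residual variance per voxel
    c = np.asarray(contrast, dtype=float)
    var_c = c @ np.linalg.pinv(X.T @ X) @ c          # c' (X'X)^-1 c
    return (c @ beta) / np.sqrt(sigma2 * var_c)

rng = np.random.default_rng(0)
n = 20                                               # subjects per group (hypothetical)
group = np.repeat([0.0, 1.0], n)                     # 0 = control, 1 = OSA
age = rng.normal(55, 8, 2 * n)
tiv = np.where(group == 1, 1600.0, 1400.0) + rng.normal(0, 50, 2 * n)  # ml
# Simulated GM depends on TIV and age only: there is NO true group effect
gm = 0.5 * tiv - 0.8 * age + rng.normal(0, 5, 2 * n)
Y = gm[:, None]                                      # a single voxel for clarity

ones = np.ones(2 * n)
t_no_tiv = voxelwise_glm_t(Y, np.column_stack([ones, group, age]), [0, 1, 0])[0]
t_tiv = voxelwise_glm_t(Y, np.column_stack([ones, group, age, tiv]), [0, 1, 0, 0])[0]
print(f"group t without TIV: {t_no_tiv:.1f}, with TIV: {t_tiv:.1f}")
```

The large group t-value without TIV reflects head size, not OSA; once TIV enters the model the spurious effect largely disappears.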

Statistical analysis:

Statistical tests using the general linear model (GLM) are performed on each voxel of the complete set of images (univariate statistics). With 100,000 voxels in an image and an alpha risk of 5%, about 5,000 voxels are expected to be significant by chance alone. A correction for multiple comparisons is therefore required. A Bonferroni correction (dividing the p threshold by the number of tests performed) is too conservative here: its bound is close to exact only for independent tests, which is not the case in a VBM analysis, where neighboring voxels are correlated. Random field theory can be used to control the family-wise error (FWE) rate [20]. A correction based on the expected proportion of incorrect rejections of the null hypothesis (false discovery rate, FDR) was also introduced in SPM [21]. Since these corrections depend on the number of voxels included in the analysis, reducing the number of voxels reduces the number of multiple comparisons. Regions of interest can be used to decrease the set size and increase power with a specific multiple comparison correction (small volume correction, SVC), but this introduces a strong hypothesis regarding the location of the expected differences. Such a hypothesis should clearly be indicated when reporting results [22]. For example, assuming that the hippocampal [23] or brainstem [24] region is modified by OSA is a reasonable hypothesis, leading authors to apply “small volume correction”, i.e., to correct their p-values only for the number of voxels in the vicinity of the hippocampus. However, in doing so, the authors cannot prove that the hippocampus or brainstem is modified by OSA. The correct interpretation is the following: if we assume that OSA modifies hippocampal or brainstem size, then we can identify which parts of these regions are most likely to be modified by OSA. The detection of differences in these regions does not exclude that similar (or larger) differences could be detected in other parts of the brain, especially if SVC and a lenient statistical threshold were used. The probability that a voxel is truly modified increases when voxels in its neighborhood are also truly modified, together defining a cluster of voxels.
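The arithmetic behind these corrections can be sketched in a few lines of Python (NumPy assumed). The 100,000-voxel figure comes from the text; the 2,000-voxel region standing in for an SVC mask is hypothetical. The FDR function implements the Benjamini-Hochberg step-up procedure, the classic FDR method; it is shown for illustration and is not claimed to reproduce any particular SPM version exactly. Neither function models spatial correlation, which is what random field theory adds.

```python
import numpy as np

def bonferroni_threshold(alpha, n_tests):
    """Per-voxel p threshold for FWE control by Bonferroni (valid but conservative for smooth maps)."""
    return alpha / n_tests

def bh_fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up: largest sorted p(k) with p(k) <= k/m * q, or None."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    passes = p <= np.arange(1, m + 1) / m * q
    if not passes.any():
        return None
    return p[np.flatnonzero(passes).max()]

alpha = 0.05
n_whole_brain = 100_000  # voxels in the analysis (figure from the text)
n_roi = 2_000            # hypothetical hippocampal mask used for SVC
print(n_whole_brain * alpha)                       # voxels expected significant by chance, uncorrected
print(bonferroni_threshold(alpha, n_whole_brain))  # whole-brain per-voxel threshold
print(bonferroni_threshold(alpha, n_roi))          # far more lenient threshold inside the ROI only

p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bh_fdr_threshold(p_example))  # tests with p <= this value are declared significant
```

The ROI threshold is 50 times more lenient than the whole-brain one, which is exactly why the hypothesis behind an SVC must be stated explicitly.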

Cluster-size statistics were developed for functional MRI [25]. Their use was strongly discouraged for VBM because they require uniform image smoothness, which is rarely the case in this context. In recent years, various algorithms have been developed to bypass this limitation using random field theory and permutation methods [26] or topological FDR [27].

Criterion 4: Results obtained without correction for multiple comparisons are not reliable. Consequently, studies that reported results uncorrected for multiple comparisons should only be considered exploratory. Corrections for multiple comparisons at the voxel or cluster level, when correctly reported (see Ridgway et al. [28]), were considered valid.
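A permutation approach to FWE control can be sketched as follows, assuming NumPy and simulated data (group sizes, voxel count and effect size are hypothetical). This is the voxel-level maximum-statistic variant: under the null hypothesis, group labels are exchangeable, so the distribution of the largest |t| over all voxels is built by shuffling labels. Cluster-level permutation methods such as those cited above instead record the largest suprathreshold cluster in each permutation; this sketch is not the implementation used by any reviewed study.

```python
import numpy as np

def welch_t(Y, labels):
    """Voxel-wise Welch t statistic, patients (label 1) minus controls (label 0)."""
    a, b = Y[labels == 1], Y[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def max_t_permutation_threshold(Y, labels, n_perm=1000, alpha=0.05, seed=0):
    """FWE-corrected |t| threshold from the permutation distribution of the maximum |t|."""
    rng = np.random.default_rng(seed)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle group labels and keep only the most extreme voxel of each permutation
        max_null[i] = np.abs(welch_t(Y, rng.permutation(labels))).max()
    return np.quantile(max_null, 1.0 - alpha)

rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 15)      # 15 controls, 15 patients (hypothetical)
Y = rng.normal(size=(30, 500))      # 500 pure-noise voxels
Y[labels == 1, 0] += 3.0            # one voxel with a true group effect
t_obs = welch_t(Y, labels)
threshold = max_t_permutation_threshold(Y, labels)
print(threshold, np.flatnonzero(np.abs(t_obs) > threshold))
```

The corrected threshold lies well above the familiar uncorrected value of about 2, so only strong effects survive.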

Criterion 5: SVC should be used with caution, and the underlying hypothesis should clearly be indicated when reporting results, which is rarely the case in the literature.

Studies with low statistical power overestimate effect sizes and are difficult to reproduce [29, 30].

Criterion 6: Studies with fewer than 10 subjects in each group were excluded.
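To make the link between sample size and power concrete, here is a sketch for a single two-sample comparison, assuming SciPy's noncentral t distribution and a moderate standardized effect size d = 0.5 chosen only for illustration. In VBM, the effective alpha after correction for multiple comparisons is far smaller than 0.05, so the power at each voxel would be lower still.

```python
import numpy as np
from scipy import stats

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test for standardized effect size d."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2.0)      # noncentrality parameter
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    # Probability that |T| exceeds t_crit when T follows a noncentral t distribution
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

for n in (10, 25, 47, 64):
    print(n, round(two_sample_power(0.5, n), 2))
```

With 10 subjects per group, a moderate effect is detected less than one time in four even at an uncorrected 5% threshold, whereas about 64 subjects per group are needed to reach the conventional 80% power.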

Multicenter study: Pooling images from multiple centers can help to recruit subjects. However, even when an additional covariate is included in the statistical analysis to model the center effect, inter-scanner differences can reach the significance threshold of p15, where comorbidities have to be taken into account for defining a disease. However, we did not exclude any studies based on the chosen AHI threshold. Note that Macey et al. [41] included treated patients; as treatment could be a confounding factor, this paper was excluded from our final analysis.

Population size:

Population size also ranged from 7 patients and 7 controls in Huynh et al. [10] to 60/60 in Morrell et al. [46] and 76/76 in Celle et al. [43]. Between these extremes, other studies included about 15 to 25 subjects in each group. According to our selection criteria (Criterion 6), we chose to exclude the studies by Morrell et al. [23] and Huynh et al. [10] from our analysis.

Preprocessing

Realignment and segmentation

The list of SPM versions used, from SPM99 to SPM8, reflects the history of SPM through the years, with the weaknesses and strengths of its built-in image-processing algorithms. The accuracy of the different algorithms would be interesting to discuss but is beyond the scope of this paper. Notably, some authors failed to reproduce previous results when using different versions of the same pipeline. Canessa et al. [48] reported no differences with SPM2 but differences in the frontal and hippocampal areas using SPM5. Celle et al. did not replicate their results obtained with SPM2 [43] using SPM8 (unpublished results). Consequently, based on Criterion 1, results from Celle et al. [43] and Canessa et al. [48] were not considered in our final analysis.

Modulation

Across the twelve studies, most used the modulation step; only Macey et al. in 2002 [41] and Yaouhi et al. [44] did not report using this step to correct for local deformations. The 2010 study by Joo et al. [45] is interesting because results were given with and without modulation, i.e., as GM volume and GM concentration. In that paper, the authors found differences in GM concentration but not in GM volume. According to Criterion 2, we excluded the studies by Macey et al. [41] and Yaouhi et al. [44], as well as the GM concentration results from Joo et al. [45].

Model definition

Three papers [10, 23, 24] did not provide any information about the covariates used. The majority of publications used age, with the exception of Celle et al. [43], because of the low dispersion of age in their study. Some authors also used other covariates, such as body mass index (BMI) or blood pressure [43], gender [43, 46], handedness [41] or comorbidities and demographic characteristics [47], although the precise variables included were not specified. Only four studies used a covariate expressing global brain volume: GM volume for Celle et al. [43] and TIV for Joo et al. [45], Morrell et al. [46] and Torelli et al. [47]. Following Criterion 3, the papers from Macey et al. [41], Morrell et al. [23], O'Donoghue et al. [42], Yaouhi et al. [44], Canessa et al. [48], Zhang et al. [49] and Huynh et al. [10] were not considered in our final analysis.

Statistical analysis

Finally, a broad range of statistical methods and thresholds for considering a difference valid has been reported in the literature. Macey et al. [41] used basic statistics without any correction for multiple comparisons. Yaouhi et al. [44] and Celle et al. [43] used a cluster-level correction that was inappropriate for a VBM study. Huynh et al. [10] and Morrell et al. [46] used topological FDR [27], while Joo et al. [45] and O'Donoghue et al. [42] used FDR correction [21]. Canessa et al. [48] and Torelli et al. [47] used appropriate cluster correction, and Zhang et al. [49] used FWE correction at the voxel level. According to our criteria, Macey et al. [41], Yaouhi et al. [44] and Celle et al. [43] were excluded from the final analysis (Criterion 4). Based on Criterion 5, we also chose to exclude Morrell et al. [23] and Lundblad et al. [24]. The use of a priori hypotheses, such as the hippocampus [23] or brainstem [24], is debatable and does not exclude that other areas may be modified (even strongly) in OSA.

To conclude, three references survived the exclusion criteria (Table 3), and the reported results were not coherent.

Discussion

Due to sleep fragmentation and intermittent hypoxia, OSA might induce changes in the central nervous system. Based on MR brain scans, some studies have reported such changes in brain tissue volumes. However, a large set of nonreproducible results has been reported, and a clear picture of the effect of OSA on the brain has not yet emerged. In the present paper, we systematically reviewed twelve studies that explored GM modifications due to OSA using VBM to assess whether an effect of OSA on brain structure has yet been demonstrated. Our goal was to highlight the serious methodological difficulties faced when using VBM to detect the possible impact of OSA on GM.

When our discriminating methodological criteria were used to assess the robustness of the reported results, only three of the twelve studies remained [45–47]. However, these three studies were not in agreement: Joo et al. [45] did not find a GM difference between OSA patients and controls, whereas Morrell et al. [46] observed GM decreases in the right middle temporal gyrus and cerebellum, and Torelli et al. [47] in the right hippocampus. In our opinion, this clearly indicates that specific OSA-related brain modifications have not been demonstrated to date. On this point, we disagree with Weng et al. [12], whose meta-analysis inappropriately pooled results obtained from different populations (subjects with or without treatment, severe or moderate apneic patients, male-only or mixed-gender studies), with different outcome measures (GM volume and GM concentration) and different statistical power (results with uncorrected statistics, with FWE or FDR correction at the voxel level, or with topological or FWE correction at the cluster level). The conclusion that brain alteration due to OSA, if any, has not yet been demonstrated is not surprising, because several methodological flaws hamper the coherence between studies and our quest for brain changes due to OSA. We discuss some of these flaws below.

The population under study differed among studies. Considering only the three selected studies, the patient inclusion criteria were AHI>30 for Joo et al. [45] and Morrell et al. [46] and AHI>15 for Torelli et al. [47]. We would not expect similar GM decreases in patients with severe (AHI>30) and moderate (AHI>15) OSA. Even more heterogeneous criteria, ranging from no precise AHI level [23] to AHI>5 [24, 41], were used in other studies. Moreover, according to Killgore et al. [37], sleepiness could be the main factor inducing some brain changes. The Epworth sleepiness score (ESS) reported for patients in the twelve studies varied from 6.0 in the study by Celle et al. [43] to 15.2 in that of Zhang et al. [49]. Even in the three selected papers, the mean ESS in the pathological group ranged from 8.5 to 13.2. Ideally, patients and controls should be matched for sleepiness and for several influential parameters, such as hypertension [51] or BMI [52]. An interesting study by Kendzerska et al. [53] on OSA and the risk of cardiovascular events showed that 1) traditional cardiovascular risk factors (BMI, age, gender, smoking status, hypertension, diabetes, etc.) may have a greater impact than OSA; and 2) AHI may not be the most relevant OSA-related factor. Some of these factors have been taken into account in previously published papers, but others remain different between patients and controls. For instance, BMI and sleepiness were not controlled in most of the studies; in Celle et al. [43], sleepiness and BMI did not differ between subjects with and without sleep-related breathing disorders, but systolic blood pressure did. However, selecting the right influential variables is difficult and is not specific to the sleep apnea literature. Mazziotta et al. [54] even considered the concept of the normal, average human brain a myth.

Moreover, the matching between OSA patients and controls was not accurate in most of these studies. A 1:1 or 1:n matching was rarely used, and only a limited number of confounders were included. Since OSA is a heterogeneous disease with cardiovascular and metabolic comorbidities that directly favor stroke or brain lesions, this inaccurate matching represents a major limitation. CPAP treatment has mainly been evaluated in open studies, and to date CPAP has been randomized against sham CPAP in only one study, with a small sample size [10].
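Greedy nearest-neighbor matching is one simple way to obtain a 1:1 design. The sketch below is hypothetical (NumPy assumed, toy age and BMI values) and is not a procedure used in the reviewed studies; more principled alternatives such as optimal or propensity-score matching exist. Confounders are standardized so that, for example, years of age and BMI units are comparable.

```python
import numpy as np

def greedy_match(patients, controls):
    """Greedy 1:1 nearest-neighbour matching of controls to patients, without replacement.

    patients, controls: arrays of shape (n, k) holding k confounders (e.g. age, BMI).
    Returns (patient_index, control_index) pairs; patients left without a control are skipped.
    Assumes no confounder is constant across the pooled sample (its std would be 0).
    """
    pooled = np.vstack([patients, controls]).astype(float)
    mu, sd = pooled.mean(axis=0), pooled.std(axis=0)
    p = (patients - mu) / sd  # z-scores so that units are comparable
    c = (controls - mu) / sd
    dist = np.linalg.norm(p[:, None, :] - c[None, :, :], axis=2)
    pairs, used = [], set()
    for i in np.argsort(dist.min(axis=1)):  # patients with the closest candidate first
        for j in np.argsort(dist[i]):
            if int(j) not in used:
                used.add(int(j))
                pairs.append((int(i), int(j)))
                break
    return pairs

patients = np.array([[50.0, 30.0], [60.0, 35.0]])                # age, BMI (hypothetical)
controls = np.array([[61.0, 34.0], [49.0, 29.0], [70.0, 25.0]])
print(greedy_match(patients, controls))
```

Each patient receives the closest still-available control, and the unmatched control (age 70, BMI 25) is left out of the analysis.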

The image acquisition procedure is also a factor of interest. Tardif et al. [55] reported substantial differences between identically processed MPRAGE, modified driven equilibrium Fourier transform (MDEFT) and SPGR images. More precisely, MPRAGE was more accurate for the cortex, whereas SPGR was better for deep structures. Moreover, more subjects were required with MPRAGE images than with SPGR images to reach a similar statistical threshold. Our three selected papers used three different MRI sequences (SPGR, MPRAGE and BRAVO). Interestingly, the only study that observed a hippocampal decrease used a sequence reported not to be the most accurate in this region. The magnetic field strength can also affect VBM studies. Marchewka et al. [56] reported strong differences in the cerebellum, precentral cortex and thalamus between 1.5 T and 3 T scanners. As 3 T MR scanners become widespread, the importance of this parameter will decrease. Images should be acquired within a reasonably short period, because even a scanner upgrade may influence the results [57]; this may be a problem for large cohorts and even more so for longitudinal studies. The coil used may also influence the sensitivity of the method: results obtained with a 32-channel coil generally outperformed those obtained with a 12-channel coil [58]. The voxel size directly influences the spatial resolution of the differences that can be detected [58]. The voxel size was relatively homogeneous in the OSA literature (1×1×1 mm³), but some authors used a lower spatial resolution [43].
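A one-dimensional toy calculation shows how voxel size governs partial volume effects. It assumes NumPy and an idealized 3 mm-thick ribbon of cortex starting 0.5 mm into a 6 mm field of view (hypothetical values); each voxel's value is the fraction of it covered by tissue.

```python
import numpy as np

def ribbon_coverage(start_mm, end_mm, voxel_mm, fov_mm):
    """Fraction of each voxel along one axis that lies inside a tissue ribbon [start_mm, end_mm]."""
    n_vox = int(round(fov_mm / voxel_mm))
    lo = np.arange(n_vox) * voxel_mm  # lower edge of each voxel
    hi = lo + voxel_mm                # upper edge of each voxel
    overlap = np.minimum(hi, end_mm) - np.maximum(lo, start_mm)
    return np.clip(overlap, 0.0, None) / voxel_mm

for voxel in (1.0, 2.0):
    f = ribbon_coverage(0.5, 3.5, voxel, fov_mm=6.0)
    pure = int(np.isclose(f, 1.0).sum())
    mixed = int(((f > 0) & ~np.isclose(f, 1.0)).sum())
    print(f"{voxel} mm voxels: coverage {f}, pure={pure}, partial-volume={mixed}")
```

With 1 mm voxels, two voxels are purely cortical; with 2 mm voxels, none are, so every cortical voxel mixes GM with neighboring tissue.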

Processing pipeline. Registration is a crucial step in a VBM analysis. Klein et al. [59] showed major differences between registration algorithms applied to the whole brain, and these differences varied by region. According to Hellier et al. [60], the quality of registration is directly related to the number of degrees of freedom of the transformation; as such, DARTEL, an elastic deformation technique that deals with 6 million parameters, should perform better than SPM2 registration, which deals with only one thousand parameters. Klein et al. [59] also demonstrated a modest correlation between registration accuracy and the number of degrees of freedom of the registration, as well as a correlation between the number of degrees of freedom and the year of development: the most recent algorithms usually consider more degrees of freedom and are thus more accurate. According to Bergouignan et al. [61] and Yassa et al. [62], DARTEL should also improve VBM sensitivity in small structures.

Many studies have attempted to evaluate MR image segmentation [63–65]. The main conclusion we can draw from these studies is that discrepancies between different algorithms can reach the same order of magnitude as the expected volume change sought with VBM, as demonstrated by Klauschen et al. [64], and that these discrepancies are spatially heterogeneous [55].
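The scale of this problem can be illustrated with a toy comparison of two binary segmentations, assuming NumPy. A hypothetical 3-voxel-thick GM slab is labeled one voxel thicker by a second, equally hypothetical, algorithm: the Dice overlap remains fairly high while the volume estimate differs by one third.

```python
import numpy as np

def compare_segmentations(seg_a, seg_b, voxel_volume_mm3=1.0):
    """Dice overlap and relative volume difference (b vs a) between two binary masks."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    dice = 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
    vol_a = a.sum() * voxel_volume_mm3
    vol_b = b.sum() * voxel_volume_mm3
    return dice, (vol_b - vol_a) / vol_a

shape = (40, 40, 10)
seg_a = np.zeros(shape, dtype=bool)
seg_a[:, :, 3:6] = True  # algorithm A: GM slab 3 voxels thick
seg_b = np.zeros(shape, dtype=bool)
seg_b[:, :, 3:7] = True  # algorithm B: one extra voxel at the boundary
dice, rel_vol = compare_segmentations(seg_a, seg_b)
print(f"Dice = {dice:.3f}, relative volume difference = {rel_vol:.1%}")  # Dice = 0.857, relative volume difference = 33.3%
```

A single boundary voxel of disagreement on a thin structure is enough to produce a volume discrepancy far larger than the few-percent changes a VBM study typically seeks.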

Different pipelines can lead to different results [66, 67]. In sleep apnea, some authors have attempted to reproduce previous results with different versions of the same pipeline [42, 48]. O’Donoghue et al. [42] found no differences between OSA patients and controls using either SPM99 or SPM2 at valid statistical thresholds. Canessa et al. [48] reported no differences with SPM2, but did find differences in the frontal and hippocampal areas with SPM5. We did not replicate our results obtained with SPM2 [43] using SPM8.

We considered in the present review that modulation is a necessary step for VBM. However, a recent study [68] based on simulated brain abnormalities challenges this view, suggesting that modulation can be omitted when large smoothing kernels are used, in accordance with Silver et al. [69].

Statistical analysis. Most of the discrepancies across OSA studies appear to stem from the statistical standards used in the neuroscience [70] and cognitive science [71] communities. Seven [23, 24, 41–45] of the twelve reviewed studies did not show a GM difference when using an appropriately corrected p-value threshold. In the absence of a difference, it is tempting to relax the statistical requirements, even though this reduces confidence in the results; in this case, the study should only be considered exploratory. Random field theory allows the calculation of cluster extent statistics, which give the probability, under the null hypothesis, of observing a cluster of a given size in which all voxels exceed a given t or z threshold. The validity of cluster extent statistics crucially depends on the spatial smoothing and the selected voxel threshold, which were not reported in some studies. For VBM, Silver et al. [69] recommended a “p = 0.001 for voxel threshold and a 12 mm Gaussian kernel (full width at half maximum, FWHM)” based on empirical information. Based on simulations, they observed that “false positive rates ranged from 9.8 to 67.6%” when using a 6 mm Gaussian kernel and thresholds such as p = 0.05 or p = 0.01.
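The cluster-forming step that precedes any extent statistic can be sketched with SciPy (`scipy.stats.t.isf`, `scipy.ndimage.label`), assuming a toy t-map and 30 degrees of freedom (hypothetical). Only the thresholding at the voxel p-value and the measurement of cluster sizes are shown; turning a cluster size into a corrected p-value requires random field theory or permutation, which is not implemented here. `ndimage.label` uses face connectivity by default, and other packages may use a different neighborhood definition.

```python
import numpy as np
from scipy import ndimage, stats

def suprathreshold_clusters(t_map, df, p_voxel=0.001):
    """Threshold a t-map at an uncorrected voxel p-value and return cluster sizes (in voxels)."""
    t_crit = stats.t.isf(p_voxel, df)            # one-sided cluster-forming threshold
    labels, n_clusters = ndimage.label(t_map > t_crit)
    sizes = np.bincount(labels.ravel())[1:]      # drop label 0 (background)
    return t_crit, n_clusters, sizes

t_map = np.zeros((20, 20, 20))
t_map[5:8, 5:8, 5:8] = 5.0                       # a 27-voxel blob
t_map[15, 15, 15] = 5.0                          # an isolated voxel
t_crit, n_clusters, sizes = suprathreshold_clusters(t_map, df=30)
print(round(t_crit, 2), n_clusters, sorted(sizes.tolist()))
```

Changing `p_voxel` or the amount of smoothing applied before computing the t-map changes which clusters form, which is why both must be reported alongside any cluster-level result.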

Underpower. OSA studies are clearly underpowered (i.e., they have low sensitivity), as are numerous neuroimaging [72,73] and, more generally, neuroscience [29] studies. Studies with low power tend to inflate the effects they detect, possibly because of sampling errors [29,30]. A generic criterion for deciding when a study is underpowered is lacking because such a criterion depends on the corresponding effect size, which is unknown a priori (see the discussion of this point by Friston, Ingre and Lindquist [74–77]). Shen et al. [78, 79] indicated "that a VBM study with groups smaller than 25 may acquire unreliable detection", which would invalidate two of our three selected studies. In 2014, Ioannidis and co-authors [71] showed in a simulation study that, for sample sizes below 30 (15 per group), the number of correctly detected abnormalities is close to the number of falsely detected abnormalities, and both are close to zero. A more recent study by the same group, a voxel-based meta-analysis that included VBM studies, suggested a median sample size of 47 subjects per group [30]. However, our own failure to reproduce our results in a population of 152 subjects [43] suggests that no universal "magic number" exists.

Limitations of the current review

The main limitation of this review was the difficulty of obtaining detailed information on the pipelines and statistics used in some studies, mainly the oldest ones. The review was also restricted to cortical density changes. It could be extended to studies exploring OSA-related changes in basal ganglia morphometry and in WM, using structural MRI for brain tissue density changes or DTI for fractional anisotropy changes.
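The dependence of power on sample size and effect size, discussed above under Underpower, can be made concrete with the usual normal approximation to a two-sided, two-sample comparison. The Python sketch below is illustrative only: the effect sizes (Cohen's d), the alpha level and the pre-study odds R are assumptions, not values estimated from the reviewed studies, and a whole-brain VBM analysis would additionally require a multiple-comparison-corrected alpha. It also computes the positive predictive value, PPV = power × R / (power × R + α), which shows why a low-powered study yields true and false detections in similar proportions.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test for effect size d
    (normal approximation; the negligible opposite tail is ignored)."""
    z_crit = _Z.inv_cdf(1.0 - alpha / 2.0)
    shift = d * math.sqrt(n_per_group / 2.0)
    return _Z.cdf(shift - z_crit)


def n_per_group(d: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Subjects per group needed to reach the requested power (normal approximation)."""
    z_a = _Z.inv_cdf(1.0 - alpha / 2.0)
    z_b = _Z.inv_cdf(power)
    return math.ceil(2.0 * (z_a + z_b) ** 2 / d ** 2)


def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a significant finding is true, given pre-study odds R."""
    return power * prior_odds / (power * prior_odds + alpha)


if __name__ == "__main__":
    for d in (0.5, 0.8, 1.0):  # hypothetical standardized group differences
        print(f"d = {d}: power with 15/group = {approx_power(d, 15):.2f}, "
              f"n/group for 80% power = {n_per_group(d)}")
    # Hypothetical pre-study odds of 1:4 that a tested effect is real.
    for power in (0.2, 0.8):
        print(f"power {power}: PPV = {positive_predictive_value(power, 0.05, 0.25):.2f}")
```

The sketch makes the circularity noted in the text explicit: the required group size cannot be computed without an assumed effect size d.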

Conclusion

To date, only three studies were robust enough to provide reliable results. However, no clear conclusion on a potential link between OSA and GM alteration can be drawn from these three studies because their findings are not congruent. To improve our knowledge of the possible brain modifications due to OSA, we suggest several important methodological points for future studies. First, we suggest a homogeneous recruitment of subjects with controlled OSA-related parameters (AHI or oxyhemoglobin desaturation index) and comorbidities (sleepiness, hypertension, and BMI). Second, the acquisition sequence should be the same for all subjects, and the spatial resolution should match the cortical ribbon alterations under investigation. Since cortical thickness ranges between 3 mm and 5 mm, partial volume effects should be limited at acquisition by using a high spatial resolution (typically 1 mm³ or below). Third, image preprocessing should include the following: (i) segmentation that excludes large vessels, skull and meninges; (ii) accurate inter-individual image registration using elastic deformations; (iii) a modulation step to generate volume images; and (iv) spatial smoothing (6–12 mm Gaussian kernel). If the study population is younger or older than typical adults, a specific template should be computed for the registration step. An elegant way to preprocess images is to use a pipeline that includes a correction for partial volume effects (see, for instance, VBM8). Longitudinal studies are also of particular importance to study the impact of age or treatment; in these cases, care should be taken in template selection to prevent false positive results [80]. Fourth, the statistical analysis should at least include TIV or total GM volume, age and gender as regressors, plus regressors for any comorbidities not matched between groups. A minimal threshold must be applied to the GM content to exclude statistics from areas that contain insufficient GM (absolute threshold typically higher than 10% or even 20%). A correction for multiple comparisons (p
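Two of the steps recommended above, modulation and absolute threshold masking, reduce to simple voxelwise operations. The sketch below assumes that the images have already been segmented and spatially normalized and are flattened into Python lists of equal length; the function names are hypothetical and only illustrate the arithmetic, not how SPM, VBM8 or any reviewed pipeline implements it. Modulation multiplies each normalized GM value by the Jacobian determinant of the deformation so that the result reflects local GM volume rather than concentration. The mask keeps a voxel only if its GM value exceeds the threshold in every subject, which is one common convention and is assumed here. TIV is approximated as the summed GM, WM and CSF probabilities of native-space segments multiplied by the voxel volume.

```python
from typing import List, Sequence


def modulate(gm_normalized: Sequence[float], jacobian_det: Sequence[float]) -> List[float]:
    """Scale spatially normalized GM values by the local Jacobian determinant
    so that each voxel preserves the GM volume of the native image."""
    if len(gm_normalized) != len(jacobian_det):
        raise ValueError("images must contain the same number of voxels")
    return [gm * jac for gm, jac in zip(gm_normalized, jacobian_det)]


def absolute_threshold_mask(subject_images: Sequence[Sequence[float]],
                            threshold: float = 0.1) -> List[bool]:
    """Keep a voxel only if its GM value exceeds `threshold` in every subject
    (e.g. 0.1 or 0.2 for the 10% or 20% thresholds recommended above)."""
    n_voxels = len(subject_images[0])
    return [all(img[v] > threshold for img in subject_images) for v in range(n_voxels)]


def tiv_ml(gm: Sequence[float], wm: Sequence[float], csf: Sequence[float],
           voxel_volume_mm3: float) -> float:
    """Total intracranial volume in mL from native-space tissue probability maps."""
    return (sum(gm) + sum(wm) + sum(csf)) * voxel_volume_mm3 / 1000.0


if __name__ == "__main__":
    # Toy three-voxel images for two hypothetical subjects.
    subj_a = modulate([0.05, 0.60, 0.50], [1.0, 0.5, 1.2])
    subj_b = modulate([0.30, 0.25, 0.05], [1.1, 1.0, 0.9])
    print(subj_a, subj_b)
    print(absolute_threshold_mask([subj_a, subj_b], threshold=0.1))
```

In a real analysis, the TIV value computed per subject would enter the statistical model as a regressor alongside age and gender, as recommended in the text.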

| Study | Nb subjects | AHI inclusion | AHI | BMI | ESS | Field (T) | Sequence | Resolution (mm) |
|---|---|---|---|---|---|---|---|---|
| … | … | 25 | 28 | NC | 14 | 1.5 | MPRAGE | 1*1*2 |
| O'Donoghue et al. 2004 [42] | 27 / 24 | > 30 | 71.7 | 33.2 | 13.1 | 3 | Fast SPGR | 0.97*0.73*1.5 |
| Yaouhi et al. 2009 [44] | 16 / 14 | ≥ 10 | 38.3 | NC | 12.5 | 1.5 | SPGR | 0.94*0.94*1.5 |
| Celle et al. 2009 [43] | 76 / 76 | ≥ 15 | 29.2 | 26.3 | 5.9 | 1 | MPRAGE | 2*2*2 |
| Joo et al. 2010 [45] | 36 / 31 | > 30 | 52.5 | 26.0 | 10.4 | 1.5 | SPGR | 0.86*0.86*1.6 |
| Morrell et al. 2010 [46] ¥ | 26 / 25; 34 / 35 | > 30 | 71.5; 41.6 | 32.6; 31.4 | 13.5; 13.0 | 3; 1.5 | Fast SPGR; MPRAGE | 0.49*0.49*2; 1*1*2 |
| Canessa et al. 2011 [48] | 17 / 15 | > 30 | 55.8 | 31.2 | 11.9 | 3 | NC | 1*1*1 |
| Torelli et al. 2011 [47] | 16 / 14 | Moderate (> 15) | 52.5 | 31.7 | 8.5 | 3 | MPRAGE | 1*1*1 |
| Zhang et al. 2012 [49] | 24 / 21 | > 15 | 54.7 | 29.8 | 15.2 | 3 | BRAVO | 1*1*1 |
| Huynh et al. 2014 [10] * | 27 / 7 | ≥ 15 | 38.9* | 27.4* | NC | 3 | Fast SPGR | 1*1*1.2 |
| Lundblad et al. 2014 [24] | 20 / 19 | ≥ 5 | 38 | 31.7 | 9 | 3 | TFE | 0.8 |

Table 1: Characteristics of patients included in the eleven selected studies. Only means are reported.

* For Huynh et al., we computed the values from those given for the sham (n = 13) and active CPAP (n = 14) patient groups. ¥ Patients and subjects were recruited at two sites: Melbourne, Australia, and London, UK. Nb subjects: number of subjects given as patients/controls. AHI: apnea/hypopnea index, BMI: body-mass index, BRAVO: Brain Volume, ESS: Epworth sleepiness score, MPRAGE: magnetization prepared rapid acquisition gradient echo, MR/MRI: magnetic resonance/magnetic resonance imaging, NC: not communicated, SPGR: spoiled gradient recalled, TFE: Turbo Field Echo
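The AHI used in the inclusion criteria of Table 1 is the number of apneas plus hypopneas per hour of sleep, and the oxyhemoglobin desaturation index (ODI) is computed analogously from desaturation events. The Python sketch below shows this arithmetic together with the commonly used adult severity bands (mild 5 to <15, moderate 15 to <30, severe ≥30 events/h). The event counts in the example are hypothetical, and because the reviewed studies scored apneas and hypopneas with different definitions, identical AHI values may not be strictly comparable across studies.

```python
def event_index(n_events: int, total_sleep_time_h: float) -> float:
    """Events per hour of sleep: the AHI if the events are apneas plus
    hypopneas, the ODI if they are oxyhemoglobin desaturations."""
    if total_sleep_time_h <= 0:
        raise ValueError("total sleep time must be positive")
    return n_events / total_sleep_time_h


def osa_severity(ahi: float) -> str:
    """Map an AHI value onto the common adult severity bands."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    # Hypothetical night: 6.5 h of sleep, 120 apneas and 75 hypopneas.
    ahi = event_index(120 + 75, 6.5)
    print(f"AHI = {ahi:.1f} events/h -> {osa_severity(ahi)}")
```

Note that the inclusion thresholds in Table 1 use both strict (> 30) and non-strict (≥ 15) comparisons, so a patient exactly at a band boundary may be classified differently depending on the study.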

| Study | Software | Template | Methods | Modulation | FWHM smoothing (mm) | Statistical tests | Covariates | P uncorr at voxel level | Threshold on cluster size | Correction | P corr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Macey et al. 2002 [41] | SPM99 | MNI 152 | NC | Yes | 12 | ANOVA | Age, handedness | 0.001 | 300 | No correction | NC |
| Morrell et al. 2003 [23] | SPM99 | Default | Optimised | No | 12 | ANOVA | NC | NC | 200 | SVC (hipp) | 0.01 |
| O'Donoghue et al. 2004 [42] | SPM2 | MNI 152 | Optimised | Yes | NC | ANCOVA | Age | 0.001 | NC | Voxel level FDR | 0.05 |
| Yaouhi et al. 2009 [44] | SPM2 | Custom | Optimised | No | 12 | T-test | Age | 0.005 | | Cluster level | 0.05 |
| Celle et al. 2009 [43] | SPM2 | Custom | Optimised | Yes | 12 | T-test, regression | Gender, BMI, BP, GM volume | NC | | Cluster level | 0.05 |
| Joo et al. 2010 [45] ¥ | SPM2 | Custom | Optimised | Yes & No | 8 & 12 | ANCOVA | Age, TIV & Age | NC | | Voxel level FDR | 0.05 |
| Morrell et al. 2010 [46] | SPM8 | NC & SUIT* | SPM8 NS & SUIT* | Yes | 8 & 12 | Factorial design | Age, gender, TIV | 0.001 | | Topological FDR | 0.05 |
| Canessa et al. 2011 [48] | SPM5 | MNI 152 | VBM5 | Yes | 8 | T-test | Age | 0.005 | | Cluster FWE | 0.05 |
| Torelli et al. 2011 [47] | SPM8 | Custom | NS+D | Yes | 10 | ANOVA | Demographic characteristics, comorbidities, TIV | NC | | Cluster FWE | 0.05 |
| Zhang et al. 2012 [49] | SPM8 | Custom | NS+D | Yes | 8 | T-test | Age | NC | | Voxel level FWE | 0.05 |
| Huynh et al. 2014 [10] | SPM8 | NC | VBM8 | Yes | 8 | ANOVA | NC | 0.005 | | Topological FDR | 0.05 |
| Lundblad et al. 2014 [24] | NC | SUIT* | SUIT | Yes | 3 | T-test | NC | NC | 3 | SVC | 0.05 |

Table 2: Summary of the pipelines and statistics used in sleep apnea/VBM studies.

ANCOVA: analysis of covariance, ANOVA: analysis of variance, BMI: body-mass index, BP: blood pressure, FDR: false discovery rate, FWE: family-wise error, GM: grey matter, Hipp: hippocampus, FWHM: full width at half maximum, MNI: Montreal Neurological Institute, NC: not communicated, NS+D: NewSegment + DARTEL, P corr: P corrected for multiple comparisons, P uncorr: P uncorrected for multiple comparisons, SVC: small volume correction, TIV: total intracranial volume

¥ Joo et al. used modulated and unmodulated images. *SUIT is a toolbox for SPM designed specifically to explore the brainstem and cerebellum [50].
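Table 2 shows that the reviewed studies controlled for multiple comparisons in different ways (FWE, voxel-level FDR, topological FDR, SVC, or no correction). As a minimal sketch of what a voxel-level FDR threshold does, the Python function below applies the Benjamini-Hochberg step-up rule to a list of p-values and, for comparison, counts the rejections under a Bonferroni (FWE) threshold. This is a generic illustration with made-up p-values: SPM computes its FDR thresholds internally, and the topological FDR available in SPM8 operates on peaks or clusters rather than on individual voxels, so neither is reproduced here.

```python
from typing import Optional, Sequence


def bh_threshold(p_values: Sequence[float], q: float = 0.05) -> Optional[float]:
    """Benjamini-Hochberg step-up rule: return the largest sorted p-value p_(k)
    with p_(k) <= k * q / m, or None if no test survives."""
    m = len(p_values)
    threshold = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * q / m:
            threshold = p  # keep updating: the largest qualifying k wins
    return threshold


def count_rejections(p_values: Sequence[float], threshold: Optional[float]) -> int:
    """Number of tests whose p-value is at or below the threshold."""
    if threshold is None:
        return 0
    return sum(p <= threshold for p in p_values)


if __name__ == "__main__":
    p = [0.001, 0.004, 0.019, 0.03, 0.2]  # hypothetical voxel p-values
    fdr = bh_threshold(p, q=0.05)
    fwe = 0.05 / len(p)  # Bonferroni
    print(f"FDR threshold {fdr}: {count_rejections(p, fdr)} voxels")
    print(f"Bonferroni threshold {fwe}: {count_rejections(p, fwe)} voxels")
```

On these toy values the FDR rule declares four tests significant while Bonferroni keeps two, which illustrates why results obtained under different corrections in Table 2 are not directly comparable.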

| Study | Covariates | Statistical correction for multiple comparisons | Localisation of GM diminution due to OSA | MNI coordinates [x y z] (mm) |
|---|---|---|---|---|
| Joo et al. 2010 [45] | Age and TIV | FDR at voxel level | No diminution | N/A |
| Morrell et al. 2010 [46] | Age, gender, TIV | Topological FDR | Right middle temporal gyrus; left cerebellum (lobe VIIIb) | [52 4 -22]; [-12 -62 -56] |
| Torelli et al. 2011 [47] | Demographic characteristics (no clarification is given), comorbidities (no clarification), TIV | Cluster FWE | Right hippocampus | [30 -5 -48] |

Table 3: Surviving results with correction for multiple comparisons and total intracranial volume as covariate; modulated images were used. FDR: false discovery rate, FWE: family wise error, GM: grey matter, MNI: Montreal Neurological institute, N/A: not applicable, OSA: obstructive sleep apnea, TIV: total intracranial volume


Morrell et al.

O'Donoghue et Yaouhi et al.

Celle et al.

Morrell et al.

Joo et al. 2010

Canessa et al.

Torelli et al.

Zhang et al.

Huynh et al.

Lundblad et al.

2002 [41]

2003 [23]

al. 2004 [42]

2009 [44]

2009 [43]

2010 [46]

[45]

2011 [48]

2011 [47]

2012 [49]

2014 [10]

2014 [24]

Inclusion

Confirmed

Newly

AHI >30

Newly

Being 65 at the AHI > 30

Male

Newly

Newly

Male

Male

Mild apnea

criteria for

sleep diagnosis

diagnosed

15 % of total

diagnosed

beginning of

Age between

diagnosed

diagnosed

Moderate

AHI>=15

(AHI>=5)

male

sleep spent at

Subjective

the study

18 and 55

AHI>30

Moderate

apnea

Mild to severe

No CPAP

SaO2 < 90 %

complaints of

AHI >= 15 or

AHI>30

apnea

(AHI>15)

OSA

treatment

OSA

ODI >= 15

(Hypothesis:

Normal lung

AHI >= 10

AHI>5)

function


(AHI>15)


patients


Macey et al.

Weight < 130kg Girth < 152cm No sleep

Matched for

Matched for

No history of

Being 65 at the No history of

AHI 90

airflow >= 90 % for 10

AASM 1999

AASM 1999

ons Standard

More than

Complete

Absence of

>80 % drop of

Reduction

practices

50 % reduction

cessations of

airflow for

airflow in

airflow > 90 % respiratory

of airflow for

airflow for 10s

more than 10s

London >= 10s during 10s with amplitude for

for 10s

despite

Complete

evidence of

associated with associated with

respiratory

cessation of

persistent

effort

persistent

airflow >= 10s respiratory

increased

respiratory

in Melbourne

respiratory

effort

effort

50 % of airflow Reduction of

Decrease of

reduction of

50 % of airflow 50 % of the

reduction with

airflow for 10s

for 10s

amplitude of

a >4 % dip in

accompanied

the respiratory

effort 30 % drop of

Reduction

Reduction in

airflow of 50 % respiratory

>30 % for 10s

airflow >=

>=10

amplitude for

with 4 % or

30 % for 10s

saturation or an or

more than 10s

greater oxygen

and

by desat >=3 % effort, with a

arousal for

Reduction of

and 3 % or

desat.

accompanied

or

fall of 3 % in

London

airflow by

greater oxygen

with 4 % or

oxygen

50 % decrease

30 % > 10s

desat.

greater oxygen

saturation, or

in airflow

accompanied

desaturation or

autonomic

without

with EEG

EEG arousal

arousal

requirement for arousal and/or

according to a

desat, plus

decrease in

those with a


Reduction of

more than 10s

continued or

50 to 75 %

microarousals


Hypopnea def

80 % drop of


10s


Apnea def


Recommandati ASDA

2009 [44]

3% desat


PTT

less airflow reduction associated with


3 % desat or arousal for Melbourne

Device

CID-102

Tyco

Healthdyne /

Healthcare

Embla

HypnoPTT

Alice-3 /

Flaga


Cidelec


Manufacturer

Oro-nasal

PSG

PG

Nasal cannula

Nasal pressure


Nasal flux

PSG

Oral flux

Oro-nasal

Yes


thermistor Saturation

Thermistor

Finger pulse

Pulse oximetry

Embletta

Compumedics

Crystal monitor ApneaLink

E-series Sleep

/ Sandman

System PSG

PG ?

PSG

PSG

Nasal cannula

YES

Nasal cannula

PG

pressure monitor Thermal sensor YES

Oximeter

oximeter

EEG

ResMed

+ nasal air


thermistor

PSG

CleveMed

Sleep

Somnologica

PSG/PG

Compumedics

Finger pulse

YES

YES

Pulse oximetry

Standard EEG

YES

oximeter

FP1/FP1 ;

4-channel

C3/C4 ;

(C3/A4 ;

T3/T4;O1/O2

C4/A1 ;


O1/A2 ;

Electro-

Left and right

4-channel

Electromyogra

Anterior

Submental

m

tibialis muscles

Intercostal

oculogram


Anterior


O2/A1) YES

YES

YES

Chin and legs

YES

YES

tibialis muscles Yes (heart rate)

Body position

One-lead ECG

Yes

Yes

ECG with

One-lead &

surface

thoracic

electrodes

electrodes


ECG

Body position

YES


sensor

Rib cage

Thoracic

excursions

impedance Strain gauges


Abdominal respiratory

Snoring

Piezoelectric

Strain gauges

Yes

YES

bands

Abdominal and thoracic belts


movements

YES

Tracheal

Microphone

YES

microphone

Treatment

Remarks

11 with CPAP

Before and

Before and

Sham and

at MRI

after CPAP

after CPAP

active CPAP

Occurrence of

Controls :

Videotape


AHI 0.2


whole brain

Table S6: statistics used in the VBM sleep apnea literature

AHI: apnea/hypopnea index, ANCOVA: analysis of covariance, ANOVA: analysis of variance, BMI: body-mass index, BP: blood pressure, CPAP: continuous positive airway pressure, FDR: false discovery rate, FWE: family-wise error, GM: grey matter, ODI: oxyhemoglobin desaturation index, ROI: region of interest, SVC: small volume correction, SUIT: spatially unbiased infratentorial template [50], TIV: total intracranial volume


Macey et al.

Morrell et al.

O'Donoghue Yaouhi et al.

Celle et al.

Morrell et

Joo et al.

Canessa et

2002 [41]

2003 [23]

et al. 2004

2009 [43]

al. 2010

2010 [45]

al. 2011 [48] 2011 [47]

2009 [44]

[46]

Bilateral

Posterior parietal cortex


Anterior cingulate gyrus Left

Bilateral

Ventral lateral cortex Left

sites

Posterior inferior Dorsolateral prefrontal Middle gyrus Temporal


Frontomarginal gyrus

Bilateral


Lateral prefrontal Multiple

Gyrus rectus

2014 [10]


Anterior superior gyrus Bilateral

Superior frontal gyrus

al. 2012

Left

Frontal

Prefrontal cortex

Huynh et al. Lundblad et

[49]

Parietal Inferior parietal gyrus

Zhang et


[42]

Torelli et al.

Left Bilateral

Right

Left Bilateral Left Left Left

al. 2014 [24]


Inferior temporal gyrus Bilateral

Bilateral Bilateral

Superior temporal gyrus

Bilateral

Right

Right


Middle temporal gyrus

Anterior lobe Bilateral Bilateral

Mesial lobe

Bilateral


Posterior lobe

Occipital Middle occipital gyrus

Right

Lingual gyrus

Right

Cuneus Hippocampus

Right

Parahippocampal gyrus

Bilateral

Right

Cerebellum

Quadrangular lobule

Right

Brainstem

Right

Left

Vermis Insula


Left

Fusiform gyrus

Left

Right Right+left

Bilateral Left Quadrangular & biventer lobules

Bilateral

Yes Left

Bilateral


Lateral temporal

Right

Bilateral


Basal ganglia Right

Putamen

Left

Caudate nucleus

Left

Pallidum

Left

Bilateral

Bilateral

Amygdala


Uncorr

Only GM

Uncorr

Only GM

Reduction of

GM increase

No CSF

concentration

GM

concentration

WM in the

in RVLM

No WM

results

increase :

results

temporal

Dorsolateral

No results in

lobes nearby

pons

volume (ie

the

with

hippocampus

No increase


Remarks

Bilateral


Thalamus

R basal ganglia

parietal lobes


frontal &

modulation)


Table S7: grey matter results according to the VBM sleep apnea literature


CSF: cerebrospinal fluid, GM: grey matter, RVLM: rostral ventrolateral medulla, VBM: voxel-based morphometry, WM: white matter Bold represents areas where a corrected threshold (correction on cluster or on voxel-level) was used.
