Assessment and quantification of sources of variability in breast apparent diffusion coefficient (ADC) measurements at diffusion weighted imaging.

G Model EURR-7147; No. of Pages 8

ARTICLE IN PRESS European Journal of Radiology xxx (2015) xxx–xxx


European Journal of Radiology journal homepage: www.elsevier.com/locate/ejrad

E. Giannotti a,1, S. Waugh b,c,∗, L. Priba b,2, Z. Davis a,3, E. Crowe c, S. Vinnicombe d

a Breast Imaging Department, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
b Department of Medical Physics, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
c Department of Clinical Radiology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
d Division of Imaging and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK

Article info

Article history: Received 30 March 2015; received in revised form 21 May 2015; accepted 29 May 2015

Keywords: Breast cancer; Magnetic resonance imaging; Diffusion weighted imaging; Apparent diffusion coefficients; Reproducibility

Abstract

Purpose: Apparent Diffusion Coefficient (ADC) measurements are increasingly used for assessing breast cancer response to neoadjuvant chemotherapy, although little data exists on ADC measurement reproducibility. The purpose of this work was to investigate and characterise the magnitude of errors in ADC measures that may be encountered in such follow-up studies, namely scanner stability, scan–scan reproducibility, inter- and intra-observer measures and the most reproducible measurement of ADC.

Methods: Institutional Review Board approval was obtained for the prospective study of healthy volunteers and written consent acquired for the retrospective study of patient images. All scanning was performed on a 3.0-T MRI scanner. Scanner stability was assessed using an ice-water phantom weekly for 12 weeks. Inter-scan repeatability was assessed across two scans of 10 healthy volunteers (26–61 years; mean: 44.7 years). Inter- and intra-reader analysis repeatability was measured in 52 carcinomas from clinical patients (29–70 years; mean: 50.0 years) by measuring the whole tumor ADC value on a single slice with maximum tumor diameter (ADCS) and the ADC value of a small region of interest (ROI) on the same slice (ADCmin). Repeatability was assessed using intraclass correlation coefficients (ICC) and coefficients of repeatability (CoR).

Results: Scanner stability contributed 6% error to phantom ADC measurements (0.071 × 10⁻³ mm²/s; mean ADC = 1.089 × 10⁻³ mm²/s). The measured scan–scan CoR in the volunteers was 0.122 × 10⁻³ mm²/s, contributing an error of 8% to the mean measured values (ADCscan1 = 1.529 × 10⁻³ mm²/s; ADCscan2 = 1.507 × 10⁻³ mm²/s). Technical and clinical observers demonstrated excellent intra-observer repeatability (ICC > 0.9). Clinical observer CoR values were marginally better than technical observer measures (ADCS = 0.035 × 10⁻³ mm²/s vs. 0.097 × 10⁻³ mm²/s; ADCmin = 0.090 × 10⁻³ mm²/s vs. 0.114 × 10⁻³ mm²/s). Inter-reader ICC values were good for ADCS (0.864) and fair for ADCmin (0.677). Corresponding CoR values were 0.202 × 10⁻³ mm²/s and 0.264 × 10⁻³ mm²/s, respectively.

Conclusions: Both scanner stability and scan–scan variation have minimal influence on breast ADC measurements, each contributing less than 10% error relative to average measured ADC values. Measurement of ADC values from a small ROI introduces greater variability than measurement of ADC across the whole visible tumor on one slice. The greatest source of error in follow-up studies is likely to be associated with measures made by multiple observers, and this should be considered where multiple measures are required to assess response to treatment.

© 2015 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding author at: Department of Medical Physics, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK. Tel.: +44 1382 496551; fax: +44 1382 640177.
1 Present address: Department of Experimental and Clinical Biochemical Sciences “Mario Serio”, University of Florence, Firenze 50134, Italy
2 Present address: Department of Medical Physics, Edinburgh Royal Infirmary, Edinburgh EH16 4SA, UK
3 Present address: Department of Radiology, Edinburgh Royal Infirmary, Edinburgh EH16 4SA, UK
http://dx.doi.org/10.1016/j.ejrad.2015.05.032
0720-048X/© 2015 Elsevier Ireland Ltd. All rights reserved.

Please cite this article in press as: E. Giannotti, et al., Assessment and quantification of sources of variability in breast apparent diffusion coefficient (ADC) measurements at diffusion weighted imaging, Eur J Radiol (2015), http://dx.doi.org/10.1016/j.ejrad.2015.05.032



1. Introduction

Breast magnetic resonance imaging (MRI) is becoming a more widely utilized tool in clinical practice for a range of indications including screening, local staging and, increasingly, the assessment of response to neoadjuvant chemotherapy (NAC) in breast cancer patients [1–3]. The addition of diffusion weighted imaging (DWI) to standard morphological and dynamic contrast-enhanced imaging sequences can improve lesion characterization, and results in a concomitant increase in the specificity and positive predictive value of breast MRI studies [4–7]. There is a large volume of published research into the use of diffusion-weighted imaging and apparent diffusion coefficients (ADC) in prediction and response assessment in patients undergoing NAC for breast cancer but, despite this, DWI is not routinely used for response assessment in clinical practice. Whereas some authors report that early changes in ADC values predict response before significant changes in tumor size occur [1,2,8], others have found no such relationship [3]. This may be partially attributable to the heterogeneity of the studies and indeed of the subjects [9]. However, the lack of standardization of DWI parameters and of a uniform method of interpretation [10] may also be a contributory factor in the under-utilization of DWI. The development of standardized interpretation criteria requires an understanding of the effect of biological variability on the range of ADC values encountered in normal breast and breast lesions [11], but also of the effect of measurement error. To date, there have been numerous studies considering the influence of scan sequence and parameters [12–14], but there is little data available on the repeatability of ADC quantification in the breast.

In order for ADC measurements to be routinely used in clinical practice, characterization and quantification of typical sources of variability in ADC measures is imperative [15,16], particularly if changes in ADC measures are ever to be utilized in assessing response to therapy. Measurements of ADC values may be subject to several sources of variability, including patient-related factors (precise population under study, biological within-patient variation), scanner stability and systematic reader errors (intra- and inter-observer repeatability, choice of region of interest). The purpose of this work was to investigate and characterize the relative magnitude of these factors, which would give rise to typical ADC variability within routine clinical scanning, whilst utilizing a fixed imaging protocol to minimize any sequence-related variations in ADC measures. We addressed these sources of variability through a combination of studies using phantoms, healthy volunteers and retrospective analysis of DWI data collected routinely during clinical breast MRI examinations.

2. Materials and methods

2.1. Patient selection

Both volunteers and patients were scanned as part of this study. Institutional Review Board approval was acquired for the prospective study involving ten consenting healthy female volunteers (age range: 26–61 years, mean age 44.7 years). For the patient study, 54 patients (29–70 years; mean: 50.0 years), referred for a breast MRI examination for known breast cancer between November 2011 and October 2012, provided written consent for the use of their anonymized images for research purposes. In accordance with local and national ethics guidance, informed consent and ethical approval were waived for this retrospective study.

2.2. MRI acquisition/technique

All MRI examinations were performed using a 32-channel 3.0 T MRI scanner (Siemens Trio; Erlangen, Germany). All patients were imaged in a head-first orientation, lying prone with both breasts positioned in a dedicated 7-channel breast coil (Invivo Breast Biopsy Coil; Orlando, USA). The standard acquisition protocol consisted of a T1 weighted fast spoiled gradient echo sequence (TR/TE/α = 6.5 ms/2.6 ms/25°, slice thickness 1 mm, field of view 340 × 340 mm, matrix 384 × 384) and a T2 turbo spin echo sequence (TR/TE 7500/82 ms, slice thickness 2 mm, field of view 340 × 340 mm, matrix 384 × 384). Patients included in the study also underwent a dynamic contrast enhanced sequence using a spoiled gradient echo sequence (TR/TE/α = 3.86 ms/1.46 ms/10°, slice thickness 0.9 mm, field of view 340 × 340 mm, matrix 384 × 384). Diffusion weighted imaging (DWI) was carried out prior to acquisition of the dynamic contrast-enhanced sequence in patients. DWI was acquired in the axial plane covering both breasts with an echo planar imaging sequence utilizing a short TI inversion recovery (STIR) preparation pulse at 220 ms (TR/TE 10,400/93 ms; thickness 4 mm; no inter-slice gap; field of view 360 × 132; matrix 256 × 256; 4 signal averages; GRAPPA parallel imaging factor 2; voxel size 1.4 × 1.4 × 4 mm; acquisition time 5 min 22 s) with two b-values (50, 800 s/mm²). Gradients were applied in three orthogonal directions. Average ADC maps were generated by system software according to the mono-exponential equation S(b) = S(0) exp(−b · ADC), which for two b-values gives ADC = ln[S(b1)/S(b2)]/(b2 − b1), where S(b1) and S(b2) are the signal intensities for the two user-defined b-values b1 and b2.
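The mono-exponential model described for ADC map generation can be illustrated with a short calculation. The following is a minimal sketch only, assuming noise-free signals from a single voxel; the function name `adc_two_point` and the synthetic signal values are hypothetical and are not taken from the scanner software used in the study.

```python
import math

def adc_two_point(s_b1, s_b2, b1=50.0, b2=800.0):
    """Return ADC (mm^2/s) from signals at two b-values (s/mm^2).

    Rearranges S(b) = S(0) * exp(-b * ADC) for two b-values:
    ADC = ln(S(b1) / S(b2)) / (b2 - b1).
    """
    if s_b1 <= 0 or s_b2 <= 0:
        raise ValueError("signal intensities must be positive")
    if b2 == b1:
        raise ValueError("b-values must differ")
    return math.log(s_b1 / s_b2) / (b2 - b1)

# Synthetic voxel (assumed values): S(0) = 1000, true ADC = 1.1e-3 mm^2/s
true_adc = 1.1e-3
s0 = 1000.0
s50 = s0 * math.exp(-50.0 * true_adc)
s800 = s0 * math.exp(-800.0 * true_adc)

estimated_adc = adc_two_point(s50, s800)
print(f"estimated ADC = {estimated_adc * 1e3:.3f} x 10^-3 mm^2/s")
```

Because only two b-values are acquired, S(0) cancels out of the ratio, so the estimate does not depend on the unweighted signal.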

2.3. Image analysis

Image analysis was performed using either a Leonardo Multi Modality Workplace (MMWP) workstation (for phantom and healthy volunteer studies) or a Kodak Carestream workstation (patient studies). The healthy volunteer images were anonymized and randomly presented to the reader to minimize bias. All image analysis was performed by an experienced breast radiologist (SV, 15 years reporting breast MRI). For the patient dataset, an MR physicist with breast MRI expertise (SW, technical observer, 9 years' experience) and a senior radiology resident (EG, 3 years reporting breast MRI, no previous experience in breast DWI) also performed the analysis to assess the influence of different readers on ADC measures.

2.4. Scanner stability

To assess scanner stability, a phantom was built by immersing a vial of water into a larger container filled with iced water [15] and left to equilibrate for approximately 30 min before scanning. This method has been shown to provide thermal stability and hence minimize temperature-related variations of ADC value [15]. The phantom was scanned weekly, for twelve weeks, in each side of the breast coil using the standard DWI sequence described above. For ADC measurement, circular ROIs (diameter 20 mm) were manually drawn on the Siemens MMWP in the centre of the vial on 5 consecutive slices and an average value calculated.

2.5. Scan–scan reproducibility

Scan–scan reproducibility was assessed in 10 consenting, healthy female volunteers who were scanned using the setup described. The volunteers were scanned twice, with a 1-month interval between examinations to ensure each pre-menopausal volunteer was scanned in the same phase of their menstrual cycle, to avoid potential hormonally induced variations in ADC measurements [11]. The slice to be interrogated was specified by a third party not involved in the analysis, and a region of interest (ROI) of around 40 pixels was placed in the most homogeneous area of abundant fibroglandular tissue on the ADC maps to minimize partial volume effects. This was typically a central slice, as confirmed on corresponding T2 weighted images. All healthy volunteer analysis was performed using the Siemens MMWP.

2.6. Measurement repeatability in patients

Reader repeatability of ADC measurements was assessed using images from 54 consecutively imaged patients, referred for breast MRI for local staging or as a baseline prior to commencement of NAC. A total of 4 studies were excluded (failure of DWI sequence [n = 2]; primary lesion too small or posterior to be measurable [n = 2]). In total, 52 lesions were analyzed in 50 patients. The appropriate slice for analysis was specified by considering both the DWI and T2-weighted sequences. Generally this was the slice with the largest area of restricted diffusion or (in cases of tumors with obvious and gross cystic change) the slice with the largest solid component. Analysis was performed on a Kodak Carestream workstation using the selected slice of the diffusion-weighted images only. This was carried out twice by the 2 expert readers and once by the radiology resident. Average lesion ADC on a single slice (ADCS) was measured by manually segmenting the entire area of restricted diffusion. The lowest ADC (ADCmin) was measured using a small ROI of around 10 pixels (approx. 3 mm²) which was manually moved within the boundary of the ADCS contour to identify the minimum ADC value within the tumor on the selected slice (Fig. 1). Lesions were characterized as mass or non-mass according to the American College of Radiology BI-RADS Lexicon after comparison with the contrast enhanced series; maximum lesion diameter was recorded on the measured slice, as was tumor environment (fatty, glandular, mixed), as assessed on the T2 weighted image. Where the lesion was entirely surrounded by fat, the environment was classified as fatty; where it was surrounded by parenchymal tissue, it was classified as glandular; and where there was a mixture of both surrounding the lesion, it was classified as a mixed environment. Measurements were repeated after a minimum interval of 2 weeks by both expert readers.

2.7. Statistical analysis

Scanner stability was assessed using the coefficient of variation (CoV) and Student's t-test. Scan–scan reproducibility in the healthy volunteers was assessed with intra-class correlation coefficients (ICC) calculated using SPSS version 21.0 for Windows (IBM Corporation; New York, USA) and Bland–Altman plots with calculation of the coefficient of repeatability (CoR). Inter- and intra-observer agreement for the patient studies was similarly assessed [17]. Coefficients of repeatability were expressed as a percentage of the mean value for comparison.

Fig. 1. In-plane tumor apparent diffusion coefficient (ADCS) was measured by drawing a region of interest (ROI) around the entire perceived region of restricted diffusion. To calculate minimum ADC (ADCmin), a 3 mm² ROI (approximately 10 pixels) was used to interrogate within the whole tumor ROI.

3. Results

3.1. Scanner stability and scan–scan reproducibility

Scanner stability, as assessed using the phantom measurements, was excellent, with an average ADC of 1.089 × 10⁻³ mm²/s and a CoV of 6.6% (0.071 × 10⁻³ mm²/s). Average measurements over time from each individual side of the breast coil were not significantly different (ADCleft = 1.096 × 10⁻³ mm²/s and ADCright = 1.082 × 10⁻³ mm²/s, p = 0.74; Student's t-test). For the healthy volunteers, there was no significant difference in ADC measurements between the left and right breast (mean values ADCleft 1.521 × 10⁻³ mm²/s, ADCright 1.468 × 10⁻³ mm²/s for both visits; p = 0.887, Student's t-test). Comparing the first and second examinations, the CoR was 0.122 × 10⁻³ mm²/s (8%), with average ADC values of ADCscan1 = 1.529 × 10⁻³ mm²/s and ADCscan2 = 1.507 × 10⁻³ mm²/s.

3.2. Intra- and inter-observer agreement

For the patient study, there were 38 mass (mean size: 27.9 mm, range: 15–65 mm) and 14 non-mass lesions (mean size: 48.1 mm, range: 18–78 mm). Median lesion diameter (as measured on the same slice from which ADC measurements were taken) was 27 mm (range 15–78 mm). As expected, the entire lesion ADC measure on one slice was significantly higher (mean: 1.046 × 10⁻³ mm²/s) than the ADCmin measure on the same slice (mean: 0.727 × 10⁻³ mm²/s) (p < 0.001; Student's t-test). Overall, technical and clinical observers demonstrated excellent repeatability. Intra-observer repeatability for the clinical reader was CoR = 0.035 × 10⁻³ mm²/s for ADCS and 0.090 × 10⁻³ mm²/s for ADCmin, equating to 3.4% and 12.4%, respectively, when expressed as a percentage of the mean measured values. For the technical reader, repeatability was slightly lower, with CoR = 0.097 × 10⁻³ mm²/s for ADCS and 0.114 × 10⁻³ mm²/s for ADCmin, equating to 9.2% and 16.2% of the mean values. For both observers and both measures of ADC, ICC values were greater than 0.9 for intra-observer agreement. Results are summarized in Table 1 and in the Bland–Altman plots (Fig. 2).
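The repeatability statistics reported here can be sketched in a few lines. This is an illustrative sketch rather than the analysis actually performed: it assumes the common Bland–Altman convention that the CoR is 1.96 times the standard deviation of the paired differences (the paper does not state the exact formula), and the paired readings below are invented for demonstration only.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def coefficient_of_repeatability(first, second):
    """Assumed definition: 1.96 x SD of the paired differences."""
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * statistics.stdev(diffs)

def percent_of_mean(cor, mean_value):
    """Express a CoR as a percentage of the mean measured value."""
    return 100.0 * cor / mean_value

# Reported scan-scan figures (units: x10^-3 mm^2/s)
scan_mean = statistics.mean([1.529, 1.507])
scan_cor_percent = percent_of_mean(0.122, scan_mean)
print(f"scan-scan CoR = {scan_cor_percent:.1f}% of the mean")

# Hypothetical repeated readings by one observer (not study data)
read1 = [1.02, 0.98, 1.10, 0.95, 1.05]
read2 = [1.00, 1.01, 1.08, 0.97, 1.06]
demo_cor = coefficient_of_repeatability(read1, read2)
print(f"demo CoR = {demo_cor:.3f} x 10^-3 mm^2/s")
```

Expressing the CoR relative to the mean, as done in the text, allows the whole-tumor and small-ROI measures to be compared even though their absolute ADC values differ.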

Table 1
Intra-observer measurements and repeatability for whole tumor (ADCS) and minimum (ADCmin) apparent diffusion coefficient (ADC) measures on a single slice for the technical and clinical observers (CoR: coefficient of repeatability; ICC: intra-class correlation coefficient).

                       ADCS technical   ADCS clinical   ADCmin technical   ADCmin clinical
Mean (×10⁻³ mm²/s)     1.009            1.045           0.702              0.725
CoR (×10⁻³ mm²/s)      0.097            0.035           0.114              0.090
ICC                    0.962            0.996           0.917              0.972
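For readers unfamiliar with the ICC, the calculation can be sketched directly from a ratings matrix. The sketch below is an assumption-laden illustration: the study computed ICC in SPSS and does not state which ICC model was selected, so the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here, and the ratings are illustrative values rather than study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, raters, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Illustrative ratings: 6 subjects scored by 4 raters (not study data)
demo_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
demo_icc = icc_2_1(demo_ratings)
print(f"ICC(2,1) = {demo_icc:.3f}")
```

In this form a systematic offset between raters lowers the ICC, which is why it is a stricter measure of agreement than a simple correlation.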



Fig. 2. Bland–Altman plots showing the intra-observer repeatability for both clinical and technical observers for (a) in-plane tumor apparent diffusion coefficient (ADCS) and (b) minimum apparent diffusion coefficient (ADCmin). Horizontal lines indicate limits of agreement, calculated as mean ± 2 standard deviations. All ADC measures are in units of ×10⁻³ mm²/s.

Table 2
Intra-observer measurements and repeatability for whole tumor (ADCS) and minimum tumor (ADCmin) apparent diffusion coefficient (ADC) measures on a single slice by tumor environment for clinical and technical observers (CoR: coefficient of repeatability; ICC: intra-class correlation coefficient).

                       ADCS clinical   ADCS technical   ADCmin clinical   ADCmin technical
Fatty breast
Mean (×10⁻³ mm²/s)     1.053           1.000            0.732             0.696
CoR (×10⁻³ mm²/s)      0.034           0.108            0.116             0.101
ICC                    0.995           0.957            0.937             0.960
Mixed breast
Mean (×10⁻³ mm²/s)     1.040           1.012            0.722             0.703
CoR (×10⁻³ mm²/s)      0.053           0.135            0.111             0.175
ICC                    0.996           0.965            0.983             0.875



Table 3
Intra-observer measurements and repeatability for whole tumor (ADCS) and minimum tumor (ADCmin) apparent diffusion coefficient (ADC) measures on a single slice dichotomized by lesion size for both clinical and technical observers (CoR: coefficient of repeatability; ICC: intra-class correlation coefficient).

                         ADCS clinical   ADCS technical   ADCmin clinical   ADCmin technical
Large lesions (≥27 mm)
Mean (×10⁻³ mm²/s)       1.060           1.023            0.723             0.685
CoR (×10⁻³ mm²/s)        0.039           0.096            0.065             0.093
ICC                      0.997           0.973            0.991             0.947
Small lesions (≤27 mm)
Mean (×10⁻³ mm²/s)       1.031           0.995            0.728             0.719
CoR (×10⁻³ mm²/s)        0.033           0.100            0.111             0.133
ICC                      0.994           0.943            0.891             0.891

Intra-observer repeatability was further analyzed according to the nature of the surrounding breast tissue (fatty, mixed or glandular) as assessed by the expert breast radiologist using the T2-weighted image. There were 17 lesions with mainly fatty tissue surrounding the lesion and 33 with a mixed environment. Only 2 lesions were within densely glandular tissue and therefore these were not included in this analysis. Overall, reproducibility was found to be slightly better within fatty breasts than in those with a mixed tumor surround, as illustrated in Table 2.

The effect of lesion size was also considered, after simple dichotomization by the median value (27 mm). Results are shown in Table 3. Repeatability for ADCS measures was similar between the two categories, but ADCmin was more reproducibly measured within larger lesions than within small lesions (p < 0.05) (Table 3).

Inter-observer repeatability was on average 0.202 × 10⁻³ mm²/s for ADCS (19.6% of the mean value) and 0.264 × 10⁻³ mm²/s for ADCmin (36.8% of the mean value). These results are summarized in the Bland–Altman plots shown in Fig. 3. ICC values were 0.864 for ADCS and 0.677 for ADCmin, the latter representing only fair agreement. The effect of lesion type (mass versus non-mass) is demonstrated in Table 4. ADC reproducibility was lower in non-mass lesions, particularly for ADCmin.

4. Discussion

In this work we have investigated some of the possible sources of ADC measurement variability in the breast, including the influence of scanner stability, intra-subject scan–scan variability, intra- and inter-observer repeatability and the effect of the size of the region of interest chosen to measure the ADC. We have attempted to quantify the relative magnitude of each of these sources of variability in relation to typical ADC values measured in normal breast parenchyma in healthy volunteers and in breast malignancies in patients.
Not all sources of variation were considered in this study, such as menstrual status and sequence variables, since the effects of these sources of variability have been well characterized elsewhere [11–14,18]. The phantom utilized in this study is simple and inexpensive, offers a highly reproducible material to monitor temporal stability of MRI systems, and is well suited for performing serial ADC quality assurance measurements as may be required on systems on which ADC measures are employed in treatment response studies [15,19]. Phantom ADC measurements were within 1% of the ADC reported for water at 0 °C (1.099 × 10⁻³ mm²/s) [15] and results indicate that scanner stability has a minimal effect on the repeatability of ADC measurements, with a CoV of 6%, which is in line with other reports in the literature [15]. Malyarenko et al. [15] used a similar phantom to investigate the repeatability of ADC measurements across multiple scanners and found standard deviations at the isocentre to be less than 2% of that reported for water at 0 °C, with a day-to-day repeatability within 4.5%. Giannelli et al. [19] reported that short term stability for an individual scanner was of the order of 1%, and that inter-scanner measures (using MRI scanners from 3 different manufacturers) varied by around 7% of the measured mean diffusion value. Off-centre measurements demonstrated significant differences greater than 10%, and were vendor- and system-specific [15] and therefore most likely to be attributable to gradient non-linearity. Breast MRI is, by definition, performed off-centre; however, we found only small (non-significant) differences between the left and right breast coils over the 12 weeks of the phantom study.

In our healthy population, the scan-to-scan coefficient of repeatability was 8% with good agreement between visits (ICC 0.81) for a single experienced observer. Potential sources of variability are likely to be positional differences in ROI placement within healthy fibroglandular tissue and the mixed fatty/fibroglandular composition of some of the breasts, which could lead to partial volume effects. A recent study by Aliu et al. [20] highlighted the importance of investigating the variability and repeatability of MRI measurements in normal breast tissue. In a small cohort of nine patients who underwent DWI twice within 72 h, they concluded that ADC measures were reproducible with a CoR of 6.1 × 10⁻⁴ mm²/s, just over five times larger than the reported values in this study. While the absolute ADC values reported in this study are variable, this is likely to be related to volunteer age, breast characteristics and menstrual status, although all pre-menopausal women were imaged in the same phase of their menstrual cycle at both time points to negate any cyclic effects. O’Flynn et al. [11] found that while there were no significant variations within the menstrual cycle, there were significantly lower ADC values in the postmenopausal breast.
That study demonstrated a higher inter-observer reproducibility (ICC = 0.93) compared with the present study (ICC = 0.811), which may be partly due to the smaller number of volunteers but is more likely attributable to the prevalence of fatty involuted breasts in our volunteer population.

The results from the patient study identified ADCS as the most repeatable metric, with intra-reader ICC values greater than 0.91 for both experienced observers. Between readers, ADCS measures were also more reproducible, with CoR values of 19.6% of mean values compared with much higher errors associated with ADCmin measures (36.8% of mean). Higher variability in both intra- and inter-observer repeatability measures is likely to be related to influences of tumor heterogeneity, observer perception and imaging artefacts, such as marker clips, necrosis or post-biopsy hemorrhage. These factors are likely to also explain the greater variability of ADCmin measurements in non-mass lesions. As expected, reproducibility was higher for mass than non-mass lesions, most likely due to easier border delineation in mass lesions. For these reasons, measurement of ADC across the entire visible region of restricted diffusion is likely to be more reproducible than ADCmin, particularly in follow-up studies, even if potentially important information on tumor heterogeneity is obscured. Currently there is no consensus as to whether ADC values should be averaged across the lesion area on one slice or minimum ADC values within a lesion



Fig. 3. Bland–Altman plots showing inter-observer repeatability for clinical, technical and radiology resident observers for (a) in-plane tumor apparent diffusion coefficient (ADCS) and (b) minimum apparent diffusion coefficient (ADCmin). Horizontal lines indicate limits of agreement, calculated as mean ± 2 standard deviations. All ADC measures are in units of ×10⁻³ mm²/s.

should be reported and there are a number of studies reporting both methods [21–24]. Studies reporting ADCmin used various ROI sizes, with some reporting only minimum pixel value. However, smaller ROI sizes are more susceptible to noise; therefore we utilized a ROI with approximately 10 pixels to average out some of the noise effect. Measurement of ADCmin was carried out by manually ‘scanning’ the small ROI across the slice, within the ADCS boundary. While other methods are possible, this was considered to be a consistent approach that could be readily implemented in a clinical scenario.
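The manual ‘scanning’ of a small ROI within the tumor boundary can be mimicked computationally. The following is a hypothetical sketch, not the method used in the study (where the ROI was moved by hand): it assumes a 2D ADC map stored as nested lists, a boolean tumor mask of the same shape, and a 3 × 3 window (9 pixels) as a stand-in for the roughly 10-pixel ROI. The function name `min_roi_mean` and the toy data are invented for illustration.

```python
def min_roi_mean(adc_map, mask, size=3):
    """Find the size x size window with the lowest mean ADC.

    Only windows lying entirely inside the mask are considered.
    Returns (mean, top_row, left_col), or None if no window fits.
    """
    rows, cols = len(adc_map), len(adc_map[0])
    best = None
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            cells = [(r + i, c + j) for i in range(size) for j in range(size)]
            if not all(mask[y][x] for y, x in cells):
                continue  # window crosses the tumor boundary
            mean = sum(adc_map[y][x] for y, x in cells) / len(cells)
            if best is None or mean < best[0]:
                best = (mean, r, c)
    return best

# Toy 5 x 5 ADC map (x10^-3 mm^2/s) with a low-ADC core; all pixels in mask
adc_map = [[1.0] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        adc_map[y][x] = 0.7
mask = [[True] * 5 for _ in range(5)]

result = min_roi_mean(adc_map, mask)
print(result)
```

Averaging over a small window rather than taking the single lowest pixel follows the reasoning in the text that very small ROIs are more susceptible to noise.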

Table 4
Inter-observer measurements between clinical, technical and radiology resident observers for whole tumor (ADCS) and minimum tumor (ADCmin) apparent diffusion coefficient (ADC) measures on a single slice according to lesion type (CoR: coefficient of repeatability; ICC: intra-class correlation coefficient).

                       ADCS mass   ADCS non-mass   ADCmin mass   ADCmin non-mass
Mean (×10⁻³ mm²/s)     1.012       1.088           0.705         0.751
CoR (×10⁻³ mm²/s)      0.181       0.273           0.162         0.467
ICC                    0.855       0.890           0.832         0.537




The biggest source of error in ADC measures was inter-observer variability, with between 20% and 37% error on the average measured ADC value for ADCS and ADCmin, respectively, even when slices were matched. Other investigators have also investigated intra- and inter-observer variability [24,25] and report similar findings to those of this study, with better intra-observer than inter-observer repeatability. Whilst Petralia et al. [25] suggested that different operators could potentially obtain comparable ADC values where image analysis was rigorously standardized, their conclusion was that the same operator should obtain serial ADC measurements, for example in patients undergoing neoadjuvant chemotherapy, which is in agreement with our findings. Rubesova et al. [24] briefly described intra- and inter-observer variation in calculating the ADC of both benign and malignant breast lesions greater than 7 mm, and found good repeatability, but data was presented graphically with no numerical data specified. Petralia et al. [25] analyzed Rubesova’s graphics and found limits of agreement of approximately −0.3; +0.3 × 10⁻³ mm²/s (intra-observer) and −0.55; +2.5 × 10⁻³ mm²/s (inter-observer). These two studies utilized different analysis techniques, one drawing ROIs by co-registering the contrast-enhanced images with the ADC maps [24] and the other using only the high b-value DWI to delineate the cancer [25]. This highlights the impact that operator analysis technique can potentially have in measuring ADC values, and demonstrates the requirement for a standardized analytic approach for breast cancer ADC measures if the technique is to be used in clinical practice for patient management decisions.

It is likely that Computer Aided Diagnosis (CAD) systems could improve diagnostic performance and repeatability of results, and these could potentially also provide data across the entirety of the lesion [26]. Co-registration and automated detection and segmentation of lesions could ultimately result in robust, reliable and reproducible ADC quantitation alongside dynamic and morphological features [26]; however, at present we are aware of no studies that have validated such software for reporting of ADC measures.

One limitation of this study is that only one specified slice was utilized rather than the entire lesion ADC value. However, ADC measurement on every slice without automated CAD systems is time-consuming and unlikely to be performed in a clinical scenario. We have also considered only one scanner, with one particular DWI protocol. Previous studies report the impact of differing b-values and different pulse sequences on ADC measures [12,13]; however, this study was focused on replicating variability likely to be encountered in clinical studies where protocols are unlikely to change between examinations. We did not consider different fat suppression techniques, nor non-fat suppressed techniques, which systematically underestimate ADC values in normal and malignant breast tissue [6,27]. In the two patients excluded from our study due to DWI sequence failure, the breasts were large and fatty involuted, resulting in poor and non-uniform fat suppression and unreadable images. Tagliafico et al. used a diffusion tensor imaging (DTI) sequence with 32 diffusion directions to generate ADC maps and measured intra- and inter-observer reproducibility of ADC values in the breast. They found that for four observers with 4–6 years of breast MRI experience there was still some inter-observer variation, with intra-class correlation coefficients from 0.63 to 0.94 [28]. We have not considered non-mono-exponential models for calculation of ADC maps in this study as we used scanner software to generate ADC maps, as is likely to be the case in a clinical setting.
However, the observer reproducibility assessment is still expected to be comparable, irrespective of the model used to produce ADC maps. Our scan–scan assessment of reproducibility was performed on multiple days (with one month time-interval between examinations) and therefore temporal influences could be introduced.


However, as clinically referred patients in follow-up studies would be scanned on multiple days, we felt this approach to be most appropriate, whilst matching menstrual status to minimize this known source of variation [11].

We conclude that breast ADC measurements are relatively unaffected by scanner stability or intra-subject scan–scan variation, as both of these measures contribute less than 10% error to the mean ADC measurement in both normal and malignant breast tissue. Intra-observer repeatability was good, with ICC values greater than 0.9 for all in-plane tumour measurements. Inter-observer variability was the largest factor influencing ADC measures, with up to 37% error on the average measured value when ADC ‘hotspots’ are identified manually. We suggest that in follow-up clinical or research studies, particularly in departments with multiple reporting radiologists, the mean ADC of the whole lesion area in the slice with maximum restriction of diffusion is utilized, and that, where possible, only one observer measures serial ADC values in any one patient. In order to homogenize findings in multi-centre studies, a standardized reporting approach may be required, either by centralized reporting or, at the very least, after formal assessment of observer repeatability with intensive training as necessary.

References

[1] X.R. Li, L.Q. Cheng, M. Liu, Y.J. Zhang, J.D. Wang, A.L. Zhang, et al., DW-MRI ADC values can predict treatment response in patients with locally advanced breast cancer undergoing neoadjuvant chemotherapy, Med. Oncol. 29 (2012) 425–431.
[2] C. Iacconi, M. Giannelli, Can diffusion-weighted MR imaging be used as a biomarker for predicting response to neoadjuvant chemotherapy in patients with locally advanced breast cancer, Radiology 259 (2011) 303–314.
[3] L. Nilsen, A. Fangberget, O. Geier, D.R. Olsen, T. Seierstad, Diffusion-weighted magnetic resonance imaging for pretreatment prediction and monitoring of treatment response of patients with locally advanced breast cancer undergoing neoadjuvant chemotherapy, Acta Oncol. 49 (2012) 354–360.
[4] Y. Guo, Y.Q. Cai, Z.L. Cai, Y.G. Gao, N.Y. An, L. Ma, et al., Differentiation of clinically benign and malignant breast lesions using diffusion-weighted imaging, J. Magn. Reson. Imaging 16 (2002) 172–178.
[5] X. Chen, W.L. Li, Y.L. Zhang, Q. Wu, Y.M. Guo, Z.L. Bai, Meta-analysis of quantitative diffusion-weighted MR imaging in the differential diagnosis of breast lesions, BMC Cancer 10 (2010) 693–703.
[6] S.C. Partridge, H. Rahbar, R. Murthy, X. Chai, B.F. Kurland, W.B. DeMartini, et al., Improved diagnostic accuracy of breast MRI through combined apparent diffusion coefficients and dynamic contrast-enhanced kinetics, Magn. Reson. Med. 65 (2011) 1759–1767.
[7] N.H. Peters, I.H. Borel Rinkes, N.P. Zuithoff, W.P. Mali, K.G. Moons, P.H. Peeters, Meta-analysis of MR imaging in the diagnosis of breast lesions, Radiology 246 (2008) 116–124.
[8] M.D. Pickles, P. Gibbs, M. Lowry, L.W. Turnbull, Diffusion changes precede size reduction in neoadjuvant treatment of breast cancer, Magn. Reson. Imaging 24 (2006) 843–847.
[9] E.S. McDonald, J.G. Schopp, S. Peacock, W.B. DeMartini, H. Rahbar, C.D. Lehman, et al., Diffusion-weighted MRI association between patient characteristics and apparent diffusion coefficients of normal breast fibroglandular tissue at 3 T, AJR Am. J. Roentgenol. 202 (2014) W496–W502.
[10] R. Prevos, M.L. Smidt, V.C. Tjan-Heijnen, M. van Goethem, R.G. Beets-Tan, J.E. Wildberger, et al., Pre-treatment differences and early response monitoring of neoadjuvant chemotherapy in breast cancer patients using magnetic resonance imaging: a systematic review, Eur. Radiol. 22 (2012) 2607–2616.
[11] E.A. O’Flynn, V.A. Morgan, S.L. Giles, N.M. deSouza, Diffusion weighted imaging of the normal breast: reproducibility of apparent diffusion coefficient measurements and variation with menstrual cycle and menopausal status, Eur. Radiol. 22 (2012) 1512–1518.
[12] F.P.A. Pereira, G. Martins, E. Figueiredo, M.N.A. Domingues, R.C. Domingues, L.M. Barbosa da Fonseca, E.L. Gasparetto, Assessment of breast lesions with diffusion weighted MRI: comparing the use of different b-values, Am. J. Roentgenol. 193 (2009) 1030–1035.
[13] E. Wenkel, C. Geppert, R. Schulz-Wendtland, M. Uder, B. Kiefer, W. Bautz, R. Janka, Diffusion weighted imaging in breast MRI: comparison of two different pulse sequences, Acad. Radiol. 14 (2007) 1077–1083.
[14] S.C. Partridge, E.S. McDonald, Diffusion weighted magnetic resonance imaging of the breast: protocol optimization, interpretation and clinical applications, Magn. Reson. Imaging C 21 (2013) 601–624.
[15] D. Malyarenko, C.J. Galbán, F.J. Londy, C.R. Meyer, T.D. Johnson, A. Rehemtulla, et al., Multi-system repeatability and reproducibility of apparent diffusion coefficient measurement using an ice-water phantom, J. Magn. Reson. Imaging 37 (2013) 1238–1246.
[16] S. Colagrande, F. Pasquinelli, L.N. Mazzoni, G. Belli, G. Virgili, MR-diffusion weighted imaging of healthy liver parenchyma: repeatability and reproducibility of apparent diffusion coefficient measurement, J. Magn. Reson. Imaging 31 (2010) 912–920.
[17] J.M. Bland, D.G. Altman, Statistical methods for assessing agreement between two methods of clinical measurement, Lancet 1 (1986) 307–310.
[18] A. Ogura, K. Hayakawa, T. Miyati, F. Maeda, Imaging parameter effects in apparent diffusion coefficient determination of magnetic resonance imaging, Eur. J. Radiol. 77 (2011) 185–188.
[19] M. Giannelli, R. Sghedoni, C. Iacconi, M. Iori, A.C. Traino, M. Guerrisi, et al., MR scanner systems should be adequately characterized in diffusion-MRI of the breast, PLoS One 9 (2014) e86280.
[20] S.O. Aliu, E.F. Jones, A. Azziz, J. Kornak, L.J. Wilmes, D.C. Newitt, et al., Repeatability of quantitative MRI measurements in normal breast tissue, Transl. Oncol. 7 (2014) 130–137.
[21] M. Hirano, H. Satake, S. Ishigaki, M. Ikeda, H. Kawai, S. Naganawa, Diffusion weighted imaging of breast masses: comparison of diagnostic performance using various apparent diffusion coefficient parameters, Am. J. Roentgenol. 198 (2012) 717–722.
[22] S.K. Jeh, S.H. Kim, H.S. Kim, B.J. Kang, S.H. Jeong, H.W. Yim, B.J. Song, Correlation of the apparent diffusion coefficient values and dynamic magnetic resonance imaging findings with prognostic factors in invasive ductal carcinoma, J. Magn. Reson. Imaging 33 (2011) 102–109.
[23] S. Kul, I. Eyuboglu, A. Cansu, E. Alhan, Diagnostic efficacy of the diffusion weighted imaging in the characterisation of different types of breast lesions, J. Magn. Reson. Imaging 40 (2014) 1158–1164.
[24] E. Rubesova, A.S. Grell, V. De Maertelaer, T. Metens, S.L. Chao, M. Lemort, Quantitative diffusion imaging in breast cancer: a clinical prospective study, J. Magn. Reson. Imaging 24 (2006) 319–324.
[25] G. Petralia, L. Bonello, P. Summers, L. Preda, A. Malasevschi, S. Raimondi, et al., Intraobserver and interobserver variability in the calculation of apparent diffusion coefficient (ADC) from diffusion-weighted magnetic resonance imaging (DW-MRI) of breast tumours, Radiol. Med. 116 (2011) 466–476.
[26] M.D. Dorrius, M.C. Jansen-van der Weide, P.M. van Ooijen, R.M. Pijnappel, M. Oudkerk, Computer-aided detection in breast MRI: a systematic review and meta-analysis, Eur. Radiol. 21 (8) (2011) 1600–1608.
[27] P. Mürtz, M. Tsesarskiy, A. Kowal, F. Traber, J. Gieseke, W.A. Willinek, et al., Diffusion-weighted magnetic resonance imaging of breast lesions: the influence of different fat-suppression techniques on quantitative measurements and their reproducibility, Eur. Radiol. 24 (2014) 2540–2551.
[28] A. Tagliafico, G. Rescinito, F. Monetti, A. Villa, F. Chiesa, E. Fisci, et al., Diffusion tensor magnetic resonance imaging of the normal breast: reproducibility of DTI-derived fractional anisotropy and apparent diffusion coefficient at 3.0T, Radiol. Med. 117 (2012) 992–1003.

