Eur Radiol DOI 10.1007/s00330-015-4051-2

BREAST

Is there a systematic bias of apparent diffusion coefficient (ADC) measurements of the breast if measured on different workstations? An inter- and intra-reader agreement study

Paola Clauser 1,2 & Magda Marcon 2,3 & Marta Maieron 4 & Chiara Zuiani 2 & Massimo Bazzocchi 2 & Pascal A. T. Baltzer 1

Received: 20 August 2015 / Revised: 22 September 2015 / Accepted: 28 September 2015
© European Society of Radiology 2015

Abstract

Objectives To evaluate the influence of post-processing systems and of intra- and inter-reader agreement on the variability of apparent diffusion coefficient (ADC) measurements in breast lesions.

Methods Forty-one patients with 41 biopsy-proven breast lesions gave their informed consent and were included in this prospective IRB-approved study. Magnetic resonance imaging (MRI) examinations were performed at 1.5 T using an EPI-DWI sequence, with b-values of 0 and 1000 s/mm². Two radiologists (R1, R2) reviewed the images in separate sessions and measured the ADC of each lesion, using the MRI workstation (S-WS), the PACS workstation (P-WS) and a commercial DICOM viewer (O-SW). Agreement was evaluated using the intraclass correlation coefficient (ICC), Bland–Altman plots and the coefficient of variation (CV).

Results Thirty-one malignant, two high-risk and eight benign mass-like lesions were analysed. Intra-reader agreement was almost perfect (ICC-R1=0.974; ICC-R2=0.990), while inter-reader agreement was substantial (ICC from 0.615 to 0.682). Bland–Altman plots revealed a significant bias in ADC values measured between O-SW and S-WS (P=0.025); no further systematic differences were identified. CV varied from 6.8 % to 7.9 %.

Conclusion Post-processing systems may have a significant, although minor, impact on ADC measurements in breast lesions. While intra-reader agreement is high, the main source of ADC variability seems to be inter-reader variation.

Key points
• ADC provides quantitative information on breast lesions independent from the system used.
• ADC measurement using different workstations and software systems is generally reliable.
• Systematic, but minor, differences may occur between different post-processing systems.
• Intra-reader agreement of ADC measurements exceeded inter-reader agreement.

Keywords Breast . Magnetic Resonance Imaging . Diffusion-weighted Imaging . Apparent Diffusion Coefficient . Observer Variation

* Pascal A. T. Baltzer

1 Department of Biomedical Imaging and Image-guided Therapy, Division of Molecular and Gender Imaging, Medical University of Vienna/General Hospital Vienna, Waehringer Guertel 18-20, 1090 Vienna, Austria

2 Institute of Diagnostic Radiology, Department of Medical and Biological Sciences, University of Udine, Udine, Italy

3 Department of Radiology, University Hospital Zurich, Zurich, Switzerland

4 SOC Fisica Sanitaria, Azienda Ospedaliero-Universitaria, S. Maria della Misericordia, Udine, Italy

Introduction

Diffusion-weighted imaging (DWI) is increasingly used in breast MRI, owing to its ability to differentiate between malignant and benign lesions with good sensitivity and specificity [1–3], and to evaluate response after neoadjuvant chemotherapy [4]. With the application of DWI sequences, two sets of images are obtained, providing both qualitative and quantitative information. Diffusion-weighted images show a signal decrease that is positively correlated with the degree of water molecule diffusion. In other words, a high remaining signal on highly diffusion-weighted images corresponds to restricted diffusion. This is regularly observed in malignant neoplasms and can be attributed to microstructural tissue properties, one of them being cellularity [5]. When at least two different b-values are used during the acquisition, quantitative information can be obtained by calculating the apparent diffusion coefficient (ADC).

Fig. 1 Example of measurements performed by the readers. ADC maps were obtained using the MRI system software (S-WS; software NUMARIS 4 version Syngo MR B17), and measured on the corresponding workstation (a). The ADC maps were then sent to the hospital PACS (P-WS; Suitestensa RIS PACS, Esaote, Genova, Italy) for the second measurement (b). Finally, DICOM data were uploaded into a commercially available DICOM viewer (O-SW; Osirix PRO, Aycan Medical Systems, NY, USA) and ADC values were calculated using a dedicated ADC toolbox (ADC map plug-in software, Stanford, USA; c)

However promising DWI is for clinical application in breast MRI, there are still issues of standardization and reliability of DWI sequences and ADC measurements. Variation in ADC values, and consequently in ADC thresholds for lesion characterization, is influenced by the choice of b-values [3] and can be affected by variations in field homogeneity, eddy currents and coil systems [2, 6].

Post-processing, either on vendor-supplied workstations or on an institutional picture archiving and communication system (PACS), is necessary to calculate the ADC values. Software calculates ADC using linear regression: the signal decay is evaluated on a logarithmic scale, and the slope of the assumed monoexponential decay provides the ADC value. A noise level threshold is defined to minimize the effects of background noise, and the ADC is calculated on a voxel-by-voxel basis.

In clinical practice, the software platform provided by the institutional PACS vendor is regularly used to obtain quantitative information on ADCs. In many published studies, on the other hand, dedicated workstations are used. The study setting therefore often does not resemble everyday practice, where several different post-processing systems might be used to evaluate the same exam, and any workstation-related variability is not considered. An investigation of the reliability of ADC measurements using different workstations and software platforms is still missing.

The purpose of the study was to investigate whether the use of different workstations or software platforms for ADC measurements is associated with a systematic bias of the obtained ADC values.
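The log-linear estimation described in the Introduction (signal decay evaluated on a logarithmic scale, with the slope giving the ADC) can be illustrated with a minimal sketch. It is not the code of any of the systems studied here: the function name `fit_adc`, the three b-values and the synthetic signal values are assumptions chosen only for illustration.

```python
import math

def fit_adc(b_values, signals):
    """Least-squares slope of ln(S) against b; the ADC is the negated slope.

    b_values in s/mm^2, signals in arbitrary units (> 0); result in mm^2/s.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching b-values and signals")
    log_s = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    return -sxy / sxx

# Synthetic voxel (hypothetical numbers): S0 = 500 a.u., true ADC = 1.0e-3 mm^2/s.
true_adc = 1.0e-3
b_values = [0.0, 500.0, 1000.0]
signals = [500.0 * math.exp(-b * true_adc) for b in b_values]
adc = fit_adc(b_values, signals)
```

With only two b-values, as in this study, the regression reduces to the two-point formula, so the fit and the closed-form expression give the same value.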

Materials and methods

Patient population and lesion characteristics

Our institutional review board (IRB) approved the study and informed consent for the examination was obtained. Consecutive patients who underwent breast MRI between February 2014 and May 2014 were eligible for this prospective single-centre, cross-sectional study. Inclusion criteria were: presence of a mass-like lesion on MRI and availability of histological characterization for this lesion. Exclusion criteria were: absence of a complete MRI examination; presence of artefacts that affected DWI interpretation (incorrect fat-suppression, motion artefacts); presence of post-biopsy artefacts that affected image analysis (susceptibility artefacts due to a clip positioned in small lesions; post-biopsy hematoma); presence of non-mass-like lesions only.

Table 1 Means and ranges of ADC values, measured by both observers using an MRI workstation, the hospital PACS and a commercially available system. Results are stratified by lesion type (benign vs. malignant). Values are mean (range) in × 10⁻³ mm²/s

| System | Reader 1, benign | Reader 1, malignant | Reader 2, benign | Reader 2, malignant |
|--------|------------------|---------------------|------------------|---------------------|
| S-WS | 1.4 (1.14–1.73) | 0.89 (0.58–1.44) | 1.31 (0.62–1.89) | 0.99 (0.59–1.4) |
| P-WS | 1.39 (1.15–1.66) | 0.88 (0.6–1.2) | 1.3 (0.7–1.97) | 0.99 (0.65–1.4) |
| O-SW | 1.38 (1.03–1.59) | 0.9 (0.68–1.19) | 1.35 (0.79–1.87) | 0.98 (0.64–1.39) |

S-WS: MRI system software; P-WS: hospital PACS system; O-SW: commercially available DICOM viewer

Fig. 2 Bland–Altman plots evaluating inter-reader agreement for ADC values measured on breast lesions using the MRI software system (a), the available PACS system (b) and a commercial DICOM viewer (c). R1=Reader 1; R2=Reader 2; S=MRI workstation; P=picture archiving and communications system workstation; O=commercially available DICOM viewer with dedicated software for ADC calculation

MR imaging

MRI examination was performed on a 1.5-T scanner (Magnetom Avanto, Siemens Medical System, Erlangen, Germany; software NUMARIS 4 version Syngo MR B17) with a dedicated, bilateral, 4-channel coil. The imaging protocol in our study consisted of: a DWI sequence acquired before contrast medium injection, a short tau inversion recovery (STIR) T2-weighted sequence and a non-fat-saturated T1-weighted spoiled gradient-echo (fast low-angle shot) sequence acquired once before and five times after intravenous contrast medium administration (0.1 ml/kg gadobenate dimeglumine, 0.5 M, MultiHance; Bracco Imaging, Milan, Italy). DWI was acquired in the transverse plane using single-shot echo planar imaging (SS-EPI), with spectrally adiabatic inversion recovery (SPAIR) fat suppression (TR 7100 ms, effective TE 84 ms, FOV 330 × 165 mm, matrix 164 × 85 pixels, in-plane spatial resolution 2 × 2 mm², slice thickness 4 mm, 24 slices, 5 averages, b-values 0 and 1000 s/mm², acquisition time 2 min 29 s).

Monoexponential ADC maps were first calculated from the DWI images by the MRI system software (S-WS; software NUMARIS 4 version Syngo MR B17), using the formula $\mathrm{ADC} = \ln(S_2/S_1)/(b_1 - b_2)$, where $S_1$ and $S_2$ are the signal intensities at the b-values $b_1$ and $b_2$. For noise reduction, a b0 threshold of 15 was applied, meaning that in voxels not surpassing 15 arbitrary units of signal intensity, no ADC value was calculated. Images were then sent to the hospital PACS (P-WS; Suitestensa RIS PACS, Esaote, Genova, Italy) and uploaded into a commercially available “Digital Imaging and Communications in Medicine” (DICOM) viewer with dedicated software for ADC calculation (O-SW; Osirix PRO, Aycan Medical Systems, NY, USA). This software uses a monoexponential fit to calculate the ADC value. Again, the user-defined noise threshold was set at 15 arbitrary units on the b0 image.
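The two-point ADC calculation with a b0 noise threshold used in this study can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the vendor or plug-in code: the function name `adc_map_two_point`, the nested-list image layout and the extra guard against non-positive high-b signal are hypothetical choices.

```python
import math

def adc_map_two_point(s_low, s_high, b_low=0.0, b_high=1000.0, noise_threshold=15.0):
    """Voxel-wise ADC = ln(S_high / S_low) / (b_low - b_high), in mm^2/s.

    s_low, s_high: 2D lists of signal intensities at b_low and b_high.
    Voxels whose b0 signal does not surpass noise_threshold get None (no ADC).
    """
    adc = []
    for row_low, row_high in zip(s_low, s_high):
        out_row = []
        for v_low, v_high in zip(row_low, row_high):
            # Assumption: also skip voxels with non-positive high-b signal,
            # where the logarithm is undefined.
            if v_low <= noise_threshold or v_high <= 0:
                out_row.append(None)
            else:
                out_row.append(math.log(v_high / v_low) / (b_low - b_high))
        adc.append(out_row)
    return adc

# Hypothetical 1 x 2 image: one tissue voxel and one background voxel.
s_b0 = [[400.0, 10.0]]
s_b1000 = [[400.0 * math.exp(-1.0), 5.0]]
adc_map = adc_map_two_point(s_b0, s_b1000)
```

In this toy example the first voxel decays by a factor of e between b = 0 and b = 1000 s/mm², which corresponds to an ADC of 1.0 × 10⁻³ mm²/s; the second voxel stays below the threshold of 15 and is left without a value.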

Fig. 3 Bland–Altman plots evaluating intra-reader agreement for ADC values measured on breast lesions using the MRI software system (a), the available PACS system (b) and a commercial DICOM viewer (c). R1=Reader 1; R2=Reader 2; S=MRI workstation; P=PACS workstation; O=commercially available DICOM viewer with dedicated software for ADC calculation

Table 2 Coefficient of variation (CV) according to reader and workstation

| | R1+R2, S-WS | R1+R2, P-WS | R1+R2, O-SW | S-WS+P-WS+O-SW, R1 | S-WS+P-WS+O-SW, R2 |
|---------|-------|-------|-------|-------|-------|
| Lesions | 7.3 % | 7.9 % | 6.8 % | 7.4 % | 6.8 % |

R1=Reader 1; R2=Reader 2; S-WS=MRI workstation; P-WS=hospital PACS; O-SW=commercially available DICOM viewer

Data analysis

Two readers with more than 3 years of experience in breast MRI performed the measurements. The readers were not aware of the clinical history of the patient or of the histology of the lesion. They initially reviewed the whole pre- and post-contrast MRI examination in consensus and identified the target lesions. ADC values were measured separately by the two readers using three methods: 1) the S-WS, on the maps created by the dedicated software; 2) the P-WS, using ADC maps created by the MRI system software and sent to the hospital PACS; 3) the O-SW, using a dedicated ADC calculation software (ADC map plug-in for Osirix, Brian Hargreaves, Stanford, USA; Figure 1). Each reader performed the measurements in three different sessions, one for each modality, separated by at least three weeks in order to reduce any memory effect.

Both readers measured the target lesion by using the circular region of interest (ROI) tool available in all the workstations used; they were instructed to draw a two-dimensional ROI as small as possible, with a minimum area of around 0.1 cm² (as shown in Fig. 1), within the region with the lowest intensity on the ADC map. Care was taken to avoid a partial volume effect due to inclusion of surrounding parenchyma and areas of necrosis. Both readers entered the data in an Excel spreadsheet (Microsoft Corporation, Redmond, WA).

The intra-reader agreement among measurements on the different post-processing systems and the inter-reader agreement were evaluated using the intraclass correlation coefficient (ICC) and interpreted according to the criteria of Landis and Koch [7]: an ICC of 0.41–0.60 indicated moderate agreement; an ICC of 0.61–0.80 indicated substantial agreement; an ICC of 0.81–1.0 indicated almost perfect agreement. The data were graphically examined using Bland–Altman plots. Two different statistical software programs were used: SPSS v20.0 (IBM, Armonk, New York, USA) and MedCalc v12.5 (MedCalc Software, Ostend, Belgium).

The coefficient of variation (CV) was also calculated using the formula $\mathrm{CV} = S \times 100 / M$, with S being the standard deviation and M the mean of the ADC values measured. A significance level of 0.05 was used. No Bonferroni correction was used in this exploratory study.
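As an illustration of the ICC-based agreement assessment, the sketch below computes one common ICC variant and maps it to the Landis and Koch bands listed above. The text does not state which ICC model was used in SPSS or MedCalc, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption, as are the function names and the example ADC values.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per lesion and one column per reader/system."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two lesions and two readers/systems")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between lesions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readers/systems
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def landis_koch(icc):
    """Agreement band for an ICC rounded to two decimals (bands as given in the text)."""
    value = round(icc, 2)
    if value >= 0.81:
        return "almost perfect"
    if value >= 0.61:
        return "substantial"
    if value >= 0.41:
        return "moderate"
    return "below the bands listed"

# Hypothetical ADC values (x 10^-3 mm^2/s) for four lesions measured by two readers.
example = [[0.90, 0.95], [1.40, 1.30], [1.10, 1.20], [0.80, 0.85]]
example_icc = icc_2_1(example)
example_band = landis_koch(example_icc)
```

Applied to the coefficients reported in this study, `landis_koch(0.974)` falls in the almost perfect band and `landis_koch(0.615)` in the substantial band, matching the interpretation given in the Results.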

Results

Forty-one consecutive patients (mean age 54 years, range 31–77 years) were included. Indications for the MRI examination were: pre-operative in 31 cases (75.6 %), screening in high-risk patients in 3 (7.3 %) and problem solving in 7 cases (17.1 %). All lesions were also visible on ultrasound, and were assessed with ultrasound-guided core needle biopsy, before or after MRI. Histological analysis showed: 31 malignant lesions (25 invasive ductal carcinoma, 3 invasive lobular cancer, 1 mucinous invasive ductal carcinoma, 1 papillary carcinoma, 1 high-grade ductal carcinoma in situ), 2 high-risk lesions (papilloma) and 8 benign lesions (4 fibroadenoma, 3 fibrosis, 1 adenosis). Maximum diameters of the lesions ranged from 8 mm to 75 mm (mean 21.6 mm).

Table 1 summarizes mean ADC values and ranges measured in malignant and benign lesions by both readers using the S-WS, P-WS and the dedicated software platform on the O-SW.

The ICC showed an almost perfect intra-reader agreement for the target lesion both when measuring the ADC values on S-WS and P-WS and when calculating the value with the dedicated software on the O-SW (ICC R1=0.974; ICC R2=0.990). The ICC showed a substantial inter-reader agreement in the measurements of the target lesions. In particular, the ICC was 0.682 for S-WS, 0.615 for P-WS and 0.674 for O-SW.

Bland–Altman plots for inter-reader agreement showed rather broad limits of agreement with a substantial overlap for all three systems used; no systematic bias between readers was present (Figure 2). Bland–Altman plots for intra-reader agreement showed narrow limits of agreement for all three systems (Figure 3). O-SW showed systematic ADC differences as compared to S-WS: low ADC values were lower on O-SW and high ADC values were higher on O-SW as compared to S-WS (P=0.025, Figure 3b). A similar, but not significant, tendency was observed comparing O-SW and P-WS (P=0.14).

The CV calculated both between readers and between workstations ranged between 6.8 % and 7.9 % (as shown in Table 2).
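The quantities behind the Bland–Altman plots and the CV can be sketched in a few lines. This is a hedged illustration only: the 1.96 × SD limits of agreement are the conventional choice rather than something stated in the text, the slope of the differences against the pair means is just one possible way to look for the kind of value-dependent difference reported for O-SW versus S-WS, and all function names and numbers are hypothetical.

```python
import statistics

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def difference_slope(a, b):
    """Least-squares slope of (a - b) against the pair means.

    A slope clearly above zero means a reads lower than b at low values and
    higher than b at high values.
    """
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    m_bar = statistics.mean(means)
    d_bar = statistics.mean(diffs)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - d_bar) for m, d in zip(means, diffs))
    return sxy / sxx

def cv_percent(values):
    """CV = S * 100 / M, with S the sample standard deviation and M the mean."""
    return statistics.stdev(values) * 100 / statistics.mean(values)

# Hypothetical paired ADC values (x 10^-3 mm^2/s) for four lesions on two systems.
o_sw = [0.70, 0.90, 1.10, 1.50]
s_ws = [0.75, 0.92, 1.08, 1.40]
bias, lower, upper = bland_altman(o_sw, s_ws)
slope = difference_slope(o_sw, s_ws)
cv = cv_percent([0.9, 1.0, 1.1])
```

How the per-lesion CV values were pooled into the figures of Table 2 is not described in the text, so the sketch stops at the CV of a single set of repeated measurements.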

Discussion

Our findings indicate almost perfect intra-reader and substantial inter-reader agreement in the evaluation of mass lesion ADC values in the breast, both when measuring ADC maps on different workstations and when calculating the values from DICOM data with a dedicated software plug-in. Systematic, but minor, ADC differences were observed between two of the investigated systems but not between readers.

Our Bland–Altman analysis shows the limitations of considering absolute ADC values for lesion diagnosis. While intra-reader agreement showed narrow limits of agreement, the wider inter-reader limits of agreement demonstrate substantial variation in ADC measurements in mass lesions. Applying fixed (absolute) ADC thresholds for lesion interpretation [8, 9] implies that false positive and false negative findings may occur.

Possible bias in evaluating quantitative ADC information related to differences in image post-processing at different workstations has not been fully investigated. To our knowledge, there is only one paper comparing the reproducibility of ADC measurements on dedicated workstations and PACS software. Notably, the authors found no software-related bias [10]. Our findings are important as they indicate that while ADC can be measured on parametric maps irrespective of the platform used, systematic and non-systematic variation might occur. Knowing the amount of disparity is a prerequisite for a standardized assessment of a quantitative tissue marker such as the ADC value.

Prior studies evaluated the reproducibility of ADC measurements outside the breast, using similar diffusion-weighted sequences performed on magnets from different vendors, with different field strengths and coil systems. Sasaki et al. [6], performing brain ADC measurements on ten different MR imagers from four different vendors and using the same post-processing software, documented ADC value variability of up to 9 %. On the other hand, Donati and coworkers [11] compared measurements performed in the upper abdomen using three different systems from three different vendors and found that ADC values were comparable in most of the anatomic regions assessed. CVs differed between the investigated organs, ranging between 7 and 27 % with most values below 20 %. We also obtained CV values below 20 %. Similar results were obtained by Ye et al. in the evaluation of the pancreas [12].

Variability can also be related to the shape and size of the ROI, for example a small ROI drawn only in the area of lowest signal intensity versus an ROI containing the whole lesion. Inoue et al. [13] found significant differences in the values measured using a free-hand ROI drawn on the tumour border compared to ROIs with different shape, but similar dimension, drawn inside the tumour in endometrial cancer. Lambregts et al. [14] also found a significant variability related to ROI size and positioning in rectal cancer. Bilgili and Unal [15] obtained similar results when comparing measurements obtained with various ROI sizes in the normal brain. A recent study by Giannotti et al. [16] suggested that inter-reader variability can be reduced by the use of a larger ROI covering the whole lesion, as compared to a single small ROI. According to our previous experience [17], we decided to use standardized circular ROIs of similar size positioned in a defined area of the lesion, with a lower intensity, in order to minimize ROI-induced variability.

Regarding further influencing factors on ADC measurements in the breast, a recent meta-analysis underlined that the choice of b-values can affect breast lesion ADC values: the ADC decreases when higher b-values are used [3]. From our results, we can assume that post-processing systems may have a significant, but minor, impact on ADC measurement as compared to acquisition parameters. The main source of ADC variability seems to be inter-reader variation.

Our study has some limitations. First, only mass lesions were evaluated. It has been shown that ADC measurements may be more useful for discriminating mass lesions than non-mass lesions [18]. The high inter-reader reproducibility we obtained in mass lesions might not be valid in non-mass lesions, but we consider that our conclusions concerning the high reproducibility of different post-processing methods can also be extended to non-mass lesions. Moreover, the comparisons were performed exclusively for the mean ADC values, and the variability of minimum or maximum ADC values was not evaluated. Nevertheless, mean ADC values are the most common ADC values applied in the clinical routine. Finally, we investigated only a sample of three out of the multitude of clinically available post-processing systems.

In conclusion, our results show that post-processing systems may have a significant, although minor, impact on ADC measurements in breast lesions. While intra-reader agreement is high, the main source of ADC variability seems to be inter-reader variation.

Acknowledgments The scientific guarantor of this publication is Chiara Zuiani. The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article. The authors state that this work has not received any funding. One of the authors has significant statistical expertise. Institutional review board approval was obtained. Written informed consent was obtained from all subjects (patients) in this study. Methodology: prospective, cross-sectional, performed at one institution.

References

1. Baltzer PAT, Benndorf M, Dietzel M et al (2010) Sensitivity and specificity of unenhanced MR mammography (DWI combined with T2-weighted TSE imaging, ueMRM) for the differentiation of mass lesions. Eur Radiol 20:1101–1110
2. Partridge SC, McDonald ES (2013) Diffusion weighted magnetic resonance imaging of the breast: protocol optimization, interpretation, and clinical applications. Magn Reson Imaging Clin N Am 21:601–624
3. Dorrius MD, Dijkstra H, Oudkerk M, Sijens PE (2014) Effect of b value and pre-admission of contrast on diagnostic accuracy of 1.5-T breast DWI: a systematic review and meta-analysis. Eur Radiol 24:2835–2847
4. Wu L-M, Hu J-N, Gu H-Y et al (2012) Can diffusion-weighted MR imaging and contrast-enhanced MR imaging precisely evaluate and predict pathological response to neoadjuvant chemotherapy in patients with breast cancer? Breast Cancer Res Treat 135:17–28
5. Woodhams R, Ramadan S, Stanwell P et al (2011) Diffusion-weighted imaging of the breast: principles and clinical applications. Radiogr Rev Publ Radiol Soc N Am Inc 31:1059–1084
6. Sasaki M, Yamada K, Watanabe Y et al (2008) Variability in absolute apparent diffusion coefficient values across different platforms may be substantial: a multivendor, multi-institutional comparison study. Radiology 249:624–630
7. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
8. Pinker K, Bickel H, Helbich TH et al (2013) Combined contrast-enhanced magnetic resonance and diffusion-weighted imaging reading adapted to the “Breast Imaging Reporting and Data System” for multiparametric 3-T imaging of breast lesions. Eur Radiol 23:1791–1802
9. Baltzer A, Dietzel M, Kaiser CG, Baltzer PA (2015) Combined reading of contrast enhanced and diffusion weighted magnetic resonance imaging by using a simple sum score. Eur Radiol. doi:10.1007/s00330-015-3886-x
10. El Kady RM, Choudhary AK, Tappouni R (2011) Accuracy of apparent diffusion coefficient value measurement on PACS workstation: a comparative analysis. AJR Am J Roentgenol 196:W280–284
11. Donati OF, Chong D, Nanz D et al (2014) Diffusion-weighted MR imaging of upper abdominal organs: field strength and intervendor variability of apparent diffusion coefficients. Radiology 270:454–463
12. Ye X-H, Gao J-Y, Yang Z-H, Liu Y (2014) Apparent diffusion coefficient reproducibility of the pancreas measured at different MR scanners using diffusion-weighted imaging. J Magn Reson Imaging JMRI 40:1375–1381
13. Inoue C, Fujii S, Kaneda S et al (2014) Apparent diffusion coefficient (ADC) measurement in endometrial carcinoma: effect of region of interest methods on ADC values. J Magn Reson Imaging JMRI 40:157–161
14. Lambregts DMJ, Beets GL, Maas M et al (2011) Tumour ADC measurements in rectal cancer: effect of ROI methods on ADC values and interobserver variability. Eur Radiol 21:2567–2574
15. Bilgili Y, Unal B (2004) Effect of region of interest on interobserver variance in apparent diffusion coefficient measures. AJNR Am J Neuroradiol 25:108–111
16. Giannotti E, Waugh S, Priba L et al (2015) Assessment and quantification of sources of variability in breast apparent diffusion coefficient (ADC) measurements at diffusion weighted imaging. Eur J Radiol. doi:10.1016/j.ejrad.2015.05.032
17. Molinari C, Clauser P, Girometti R et al (2015) MR mammography using diffusion-weighted imaging in evaluating breast cancer: a correlation with proliferation index. Radiol Med 120(10):911–918
18. Partridge SC, Mullins CD, Kurland BF et al (2010) Apparent diffusion coefficient values for discriminating benign and malignant breast MRI lesions: effects of lesion type and size. AJR Am J Roentgenol 194:1664–1673
