Cardiopulmonary Imaging • Original Research Lahiji et al. Computer-Aided Detection of Pulmonary Embolism

Downloaded from www.ajronline.org by UCSF LIB & CKM/RSCS MGMT on 12/12/14 from IP address 169.230.243.252. Copyright ARRS. For personal use only; all rights reserved

Cardiopulmonary Imaging Original Research

Improved Accuracy of Pulmonary Embolism Computer-Aided Detection Using Iterative Reconstruction Compared With Filtered Back Projection

Kian Lahiji,1 Seth Kligerman, Jean Jeudy, Charles White

Lahiji K, Kligerman S, Jeudy J, White C

OBJECTIVE. The purpose of this study was to determine whether use of iterative reconstruction (IR) can improve performance of a pulmonary embolism computer-aided detection (PE CAD) prototype.

MATERIALS AND METHODS. Images were collected from 40 consecutive pulmonary CT angiographic examinations in which PE was found and 26 studies in which it was not found for use as control cases. All images were reconstructed with filtered back projection (FBP) and six levels of a hybrid IR algorithm. The studies were evaluated with a prototype PE CAD system, and its performance was compared across reconstruction types on a per-embolus and a per-study basis.

RESULTS. Use of the hybrid IR algorithm led to a significant and progressive decrease in false-positive marks made by PE CAD compared with those made on FBP reconstructions (239 false-positive marks for FBP and 154, 136, 125, 116, 107, and 98 false-positive marks for the six hybrid IR [HIR] levels). Per-mark positive predictive value improved with increasing HIR level (45.6% for level 6; 30.3% for FBP). However, compared with FBP, increasing levels of HIR resulted in a progressive decrease in per-embolism sensitivity (70.3% for FBP; 55.4% for HIR level 6) and, with the exception of HIR level 4, a progressive decrease in per-study sensitivity (97.5% for FBP; 85.0% for HIR level 6). Overall accuracy was highest for HIR level 1 (77.3%).

CONCLUSION. The use of IR leads to a significant reduction in false-positive marks by PE CAD at a cost of decreasing sensitivity. Very high levels of IR, which had the lowest sensitivities, should be avoided if being used concomitantly with PE CAD.

Keywords: computer-aided detection, CT, iterative reconstruction, pulmonary embolism

DOI:10.2214/AJR.13.11838

Received September 3, 2013; accepted after revision January 13, 2014.

1 All authors: Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland Medical Center, 22 S Greene St, Baltimore, MD 21201. Address correspondence to K. Lahiji ([email protected]).

AJR 2014; 203:763–771

0361–803X/14/2034–763

© American Roentgen Ray Society

Pulmonary embolism (PE) is the third most common cause of cardiovascular death in the United States, after stroke and myocardial infarction [1]. Prompt diagnosis is crucial to avoid rapid hemodynamic compromise and death. The clinical presentation of PE varies considerably, and the diagnosis relies heavily on clinical suspicion in conjunction with radiologic findings. The advent of MDCT technology allows single-breath-hold high-resolution acquisition of thoracic images of submillimeter section thickness through fifth-order branches of the pulmonary vascular tree. Acquisition of images at this level of anatomic detail has led to an increase in the accuracy of detecting small subsegmental emboli [2, 3]. However, a disadvantage of this capability is the large volume of data acquired per pulmonary CT angiography (CTA) study.

Computer-aided detection (CAD) algorithms have been found to improve sensitivity for disease detection in a variety of scenarios, including mammography, CT colonography, and both CT and radiographic lung nodule detection, by directing the radiologist's attention to areas of potential diagnostic significance [4–8]. Because missing a PE is typically due to lack of detection by the reader rather than to interpretive error [9], the pulmonary vasculature is also an area of interest for CAD implementation. Some investigators have reported sensitivities and negative predictive values (NPVs) of PE CAD approaching 90%. One of the main drawbacks of PE CAD, however, is the high number of false-positive (FP) computerized marks, ranging from 0.93 to 25 per examination [10]. There are multiple causes of FP marks, but many are due to factors that lead to image degradation, such as suboptimal contrast enhancement of the pulmonary arterial system, respiratory or cardiac motion, and image noise [10].

Use of iterative reconstruction (IR) algorithms reduces image noise by providing a more exact approximation of the sinogram obtained during CT. These algorithms are currently commercially available on all major platforms, and their use is becoming more widespread. The effect of use of IR algorithms on PE CAD effectiveness is unknown. The purpose of this study was to determine whether use of an IR technique can improve the performance of a PE CAD prototype compared with standard filtered back projection (FBP) reconstruction.

Materials and Methods

Study Design

Our institutional review board approved this study with waiver of informed consent. All CT examinations included in this study were performed as standard of care. The study protocol complied with HIPAA.

Patient Population

Between August 2011 and December 2012, raw data were collected from 40 consecutive pulmonary CTA studies that had findings positive for PE. Studies were excluded if PE was present in the main, left main, or right main pulmonary artery, because it was deemed highly unlikely that these central emboli would be overlooked by an interpreting radiologist. Studies with varying degrees of contrast enhancement of the pulmonary arteries were included as long as PE could be visualized and the standard departmental protocol was followed. Studies with qualitatively mild to moderate respiratory motion were included, but those with qualitatively severe respiratory motion were excluded. To confirm the presence of PE, all studies with positive findings were reviewed by two fellowship-trained thoracic radiologists, who had 5 and 9 years of experience.

Control Subjects

For the control group, 26 studies acquired during the same time frame as the images with positive findings (August 2011–December 2012) and interpreted as negative for PE were selected at random from the PACS. These studies were performed with the same scanner as the positive cases with adherence to the same protocol and with identical inclusion and exclusion criteria. To confirm the absence of PE, all negative studies were reviewed by the two fellowship-trained thoracic radiologists who interpreted the positive cases.

Fig. 1: Effect of hybrid iterative reconstruction level (HIR1–HIR6) on false-positive (FP) detection. Bar graph shows four most common causes of FP identification of pulmonary embolism by computer-aided detection (CAD). Use of hybrid iterative reconstruction algorithm led to dramatic decreases in number of FP marks due to incorrect identification of pulmonary vein and image noise. There was only minimal change in number of FP marks due to motion or bronchial wall thickening (BWT), mucus plugging, or consolidation across reconstructions. FBP = filtered back projection.
[Bar graph. y-axis: No. of False Positives Detected by CAD (0–120). x-axis: reconstruction type (FBP, HIR1–HIR6). Series: Vein, Motion, BWT, Noise.]

Scanner Protocol

A positive pulmonary CTA study was included in the PE CAD assessment only if our standard institutional protocol had been followed. The scans were obtained with a 256-MDCT scanner (Brilliance iCT, Philips Healthcare) using the following parameters: 128 × 0.625 mm collimation, 0.9-mm slice thickness, 0.758 pitch, and B filter. The tube voltage was 120 kVp for patients whose body mass index (BMI) was 40 or greater and 100 kVp for patients with a BMI less than 40. BMI was calculated as patient weight in kilograms divided by the square of patient height in meters. If a BMI was not available on the hospital electronic medical record system to confirm protocol adherence, the examination was not included in the study. The tube current–time setting in milliampere-seconds was recommended by the scanner software on the basis of dual scout images. Tube current modulation was used for all studies. A rotation time of 0.33 second was used for studies obtained at 250 mAs or less and was increased to 0.5 second for 300 mAs or greater. All examinations were reconstructed with 0.9-mm slice thickness and 50% overlap for viewing.

IV contrast medium was injected at a rate of 5 mL/s through a central or peripheral venous catheter measuring at least 20 gauge. A test bolus of 20 mL of contrast medium was administered to estimate pulmonary artery transit time; 65 mL of contrast medium was administered for the diagnostic study. Pulmonary CTA examinations that deviated from the contrast injection protocol were also excluded from the study.
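The BMI-based voltage rule and the protocol-adherence check described above can be written down in a few lines. The following is a minimal sketch under stated assumptions, not the scanner software or the authors' workflow; the function names `bmi`, `tube_voltage_kvp`, and `adheres_to_protocol` are hypothetical and exist only for illustration.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def tube_voltage_kvp(bmi_value: float) -> int:
    """Protocol voltage from the text: 120 kVp for BMI >= 40, otherwise 100 kVp."""
    return 120 if bmi_value >= 40 else 100


def adheres_to_protocol(bmi_value: Optional[float], scanned_kvp: int) -> bool:
    """Hypothetical inclusion check.

    An examination with no recorded BMI cannot be confirmed and is not included;
    otherwise the scanned voltage must match the protocol voltage for that BMI.
    """
    if bmi_value is None:
        return False
    return scanned_kvp == tube_voltage_kvp(bmi_value)
```

Under this sketch, a patient with a BMI of 45 scanned at 100 kVp fails the check, which mirrors the kind of protocol deviation that led to exclusion in this study.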

Iterative Reconstruction Technique

For this study, we used the iDose hybrid IR technique (Philips Healthcare), which has been approved by the U.S. Food and Drug Administration. By identifying measurements in the projection domain and voxels in the image domain, the reconstruction algorithm selectively processes data that are likely to cause noise-related artifacts in the image volume, namely, low-signal streaks and pixel-to-pixel noise. The noisy data are treated in two stages. In the projection domain, only the noisiest projections are processed with the iterative algorithm. This allows streak removal and correction of CT bias errors due to photon starvation. After reconstruction, image volumes are produced in which most of the correlated noise has been removed. After this initial stage, mostly pixel-to-pixel noise remains. This noise is further corrected by an image-domain noise reduction process in which noise is removed uniformly across the entire frequency band while the underlying anatomic edges of the scanned object are preserved.

The hybrid IR technique reconstructs approximately 22 images per second compared with 31 images per second for FBP. The iDose level can be scaled from 1 to 7; the level is proportional to the degree of noise reduction compared with the standard FBP reconstruction algorithm. In our study, levels from 1 to 6 were used to reconstruct all studies. These levels corresponded to noise reduction of 11%, 16%, 22%, 29%, 37%, and 45%, respectively [11]. A standard FBP reconstruction was also generated.
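As a rough illustration of what these nominal figures imply, the sketch below scales an FBP noise measurement by the cited reduction for each level. It assumes the percentages from [11] apply uniformly to a measured SD, which is a simplification; `NOISE_REDUCTION` and `expected_noise_sd` are hypothetical names, and the input value in the usage note is made up rather than taken from the study.

```python
# Nominal noise reduction per hybrid IR level, as cited in the text [11].
NOISE_REDUCTION = {1: 0.11, 2: 0.16, 3: 0.22, 4: 0.29, 5: 0.37, 6: 0.45}


def expected_noise_sd(fbp_sd_hu: float, level: int) -> float:
    """Estimate image noise (SD, in HU) at a given level from the FBP noise.

    Assumption: the nominal reduction applies as a simple multiplicative factor.
    """
    if level not in NOISE_REDUCTION:
        raise ValueError("level must be an integer from 1 to 6")
    return fbp_sd_hu * (1.0 - NOISE_REDUCTION[level])
```

For example, an FBP noise of 40 HU would correspond to roughly 22 HU at level 6 under this assumption.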

Fig. 2: 51-year-old morbidly obese woman with shortness of breath. Representative axial CT images show reduction in false-positive (FP) and true-positive (TP) marks by pulmonary embolism computer-aided detection (PE CAD) with increasing levels of hybrid iterative reconstruction (HIR).
A, Filtered back projection reconstruction shows PE CAD made 30 marks total, 24 FP and six TP.
B, Reconstruction at HIR level 2 shows PE CAD made 12 marks total, nine FP and three TP.
C, Reconstruction at HIR level 4 shows PE CAD made five marks, three FP and two TP.
D, Reconstruction at HIR level 6 shows PE CAD made only two marks: one FP and one TP.

Quantitative Analysis of Image Quality

For each study, all reconstructions (FBP, iDose hybrid IR [HIR] levels 1–6) were generated with identical parameters, including FOV, slice thickness, and percentage overlap. The seven reconstructions for each patient were loaded into a workstation (Brilliance Workspace Portal, Philips Healthcare). The reconstructions were subsequently linked at corresponding anatomic levels and displayed as a single image in 4 × 2 format.

Attenuation and image noise were measured at four levels of the pulmonary vasculature for all positive PE studies. Circular ROIs were placed on the left main pulmonary artery, left lower lobe basilar artery (just distal to the origin of the inferior lingular branch), left lower lobe posterior basal segmental artery, and posterior basal subsegmental artery. If a filling defect was present in either a segmental or a subsegmental branch, an adjacent nonthrombosed vessel was used for measurements. The ROI was initially placed on the FBP reconstruction and subsequently copied and pasted onto the exact areas of all six HIR reconstructions to ensure consistency of measurement. ROIs were also obtained from the left supraspinatus muscle in a similar manner. The attenuation values, in HU, were recorded. Noise levels were measured as the SD of attenuation values within the selected ROI. Contrast-to-noise ratio (CNR) was measured with the standard equation:

$$\mathrm{CNR} = \frac{\mathrm{HU}_{\mathrm{vessel}} - \mathrm{HU}_{\mathrm{muscle}}}{(\mathrm{SD}_{\mathrm{vessel}} + \mathrm{SD}_{\mathrm{muscle}}) / 2}$$

where HU_vessel is the attenuation of the object vessel, HU_muscle is the attenuation of muscle, SD_vessel is the noise in the object vessel, and SD_muscle is the noise in the muscle [12].
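The CNR definition above (attenuation difference between vessel and muscle divided by the mean of the two noise values) can be sketched as follows. This is an illustrative helper, not the workstation's measurement tool; `roi_stats` and `cnr` are hypothetical names, and the use of the sample SD is an assumption because the text does not state which SD convention was applied.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def roi_stats(hu_values: Sequence[float]) -> Tuple[float, float]:
    """Mean attenuation and noise (SD) of the HU values inside one ROI.

    Assumption: sample SD (n - 1 denominator); needs at least two values.
    """
    return fmean(hu_values), stdev(hu_values)


def cnr(hu_vessel: float, hu_muscle: float,
        sd_vessel: float, sd_muscle: float) -> float:
    """Contrast-to-noise ratio: (HU_vessel - HU_muscle) / mean of the two SDs."""
    mean_noise = (sd_vessel + sd_muscle) / 2
    if mean_noise <= 0:
        raise ValueError("mean noise must be positive")
    return (hu_vessel - hu_muscle) / mean_noise
```

With made-up values of 350 HU in the vessel, 50 HU in muscle, and SDs of 20 and 10 HU, `cnr(350, 50, 20, 10)` evaluates to 20.0. Lowering both SDs while leaving attenuation unchanged raises the CNR, which is the effect expected from noise reduction.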

Computer-Aided Detection Software and Pulmonary Embolism Characterization

There were no instances in which the two reviewing radiologists believed that the interpretation was an FP reading. All confirmed positive pulmonary CTA studies that met the inclusion criteria were evaluated with a non–commercially available PE CAD prototype (Philips Healthcare). This prototype was installed on a separate workstation. Before CAD analysis, a reference standard was created by digitally marking each PE on the FBP reconstruction by use of a crosshair placed on the proximal and distal portions of the embolus. These marks were agreed on in consensus by the two cardiothoracic radiologists.

The emboli were categorized according to location within the vascular tree and marked as lobar, segmental, or subsegmental according to standard definitions. Emboli that extended into multiple smaller segmental or subsegmental branches without discontinuity were counted as a single lesion originating from the most proximal vessel in which it was present. Discontinuous emboli were classified as individual PEs only if the discontinuous filling defects were located in a distinct segment or subsegment.

Confidence scores were assigned to each embolus and agreed on by consensus as follows: A score of 1 indicated very low confidence of PE; the supposed filling defect is likely an artifact. A score of 2 indicated low confidence of PE; although the finding could potentially represent a PE, more likely it is not an embolus. A score of 3 indicated moderate confidence of PE; although there is some question in diagnosis, such as vessel blurred by respiratory motion or very small, the finding would still be interpreted as a PE even if an isolated finding. A score of 4 indicated confident of PE; although the finding may initially be questioned, the diagnosis is not in doubt once the vessel is evaluated. A score of 5 indicated very confident of PE; that is, clear certainty that the filling defect is PE with almost instantaneous recognition.

Both radiologists provided a consensus opinion on the characteristics and location of all emboli. If there was disagreement in the classification that could not be resolved after consensus review, a third cardiothoracic radiologist, with 33 years of experience, served as the tiebreaker. Emboli with a confidence score of 1 or 2 were judged likely not to represent true emboli and were not included in the analysis.

Pulmonary Embolism Computer-Aided Detection Analysis

The PE CAD analysis was performed on seven reconstructions per patient (FBP, HIR levels 1–6). The time required for CAD analysis was recorded for each reconstruction. After analysis by CAD, the markings were evaluated and categorized as true-positive (TP), FP, or false-negative (FN). A TP mark corresponded to correct CAD identification of a reference standard PE mark. To be classified as TP, more than 50% of the green ovoid CAD mark had to be placed between the crosshairs of the reference standard mark. If CAD failed to identify a reference standard mark, or most of the CAD mark was placed outside the reference standard crosshairs, the result was deemed FN. The anatomic locations of each TP and FN mark were recorded in a manner similar to that for the reference standard PE markings (i.e., lobar, segmental, subsegmental). In cases in which CAD marked two portions of the same embolus, the second mark was disregarded.

In addition, if PE CAD made a mark that was not made during the reference standard evaluation but was thought to represent a PE with at least moderate confidence (score of 3–5), the mark was changed from FP to TP after consensus agreement by the two radiologists. If the CAD mark did not correspond to a reference standard mark, the CAD mark was categorized as FP. Each of the FP marks was evaluated by the cardiothoracic radiologist with 5 years of experience to classify its cause. If a CAD mark appeared to correspond to a PE that was not detected by either radiologist during the consensus review, it was re-reviewed by both thoracic radiologists. If both radiologists agreed that the CAD mark did represent a PE with at least moderate confidence (score of 3, 4, or 5), it was subsequently relabeled as a TP mark. If a PE CAD mark corresponded to a reference standard scored as 1 or 2 by consensus, it was counted as an FP mark.

In addition to evaluation of each PE CAD mark, individual studies were also categorized as TP, FN, FP, or TN. If CAD failed to mark any embolus on a positive study, the study was deemed FN. Conversely, if at least one mark was correctly made that corresponded to an embolus on a positive study, the study was considered TP. An FP study was classified as any control study in which at least one FP mark was present. A TN study was a control study in which no marks were made by PE CAD.

Fig. 3: Graph shows plot of number of false-positive (FP) marks made by pulmonary embolism computer-aided detection (CAD) per iterative reconstruction level (HIR1–HIR6) against contrast-to-noise ratio (CNR). Inverse exponential relation exists between FP detection rate and image noise (lower CNR) such that highest number of FP marks (239) were made on filtered back projection (FBP) reconstructions, and lowest number (98) were made with HIR6. R² of 0.816 indicates high goodness of fit.
[Scatterplot. x-axis: Contrast-to-Noise Ratio. y-axis: No. of FP Marks Made by CAD. Points: FBP, HIR1–HIR6. Fit: R² = 0.816.]

Fig. 4: Graph shows per-lesion sensitivity (solid line) and positive predictive value (PPV) (dashed line) as functions of contrast-to-noise ratio (CNR). With noise removal and subsequent improvement in CNR, there is increase in PPV of pulmonary embolism computer-aided detection (PE CAD) marking system with varying levels (1–6) of hybrid iterative reconstruction (HIR). Inverse linear relation exists between amount of noise removal and number of true-positive lesions PE CAD marked. Best per-lesion sensitivity was seen with filtered back projection (FBP) reconstruction (70.3%) and lowest sensitivity with HIR level 6 (55.4%). R² = 0.833 for PPV, R² = 0.97 for sensitivity, indicating high goodness of fit and low degree of variance.
[Scatterplot. x-axis: Contrast-to-Noise Ratio. y-axis: Value (%). Points: FBP, HIR1–HIR6 for each series. Plot labels: Sensitivity R² = 0.975; PPV R² = 0.833.]

TABLE 1: Sensitivity of Pulmonary Embolism Computer-Aided Detection in True-Positive Marking

| Location of Pulmonary Embolism | Filtered Back Projection | HIR Level 1 | HIR Level 2 | HIR Level 3 | HIR Level 4 | HIR Level 5 | HIR Level 6 | Total |
|---|---|---|---|---|---|---|---|---|
| Lobar (13) | 76.9 (10) | 69.2 (9) | 84.6 (11) | 69.2 (9) | 61.5 (8) | 53.8 (7) | 53.8 (7) | 67.0 (61/91) |
| Segmental (54) | 87.0 (47) | 81.5 (44) | 75.9 (41) | 75.9^a (41) | 68.5^a (37) | 72.2 (39) | 64.8^a (35) | 75.1 (284/378) |
| Subsegmental (81) | 58.0 (47) | 55.6 (45) | 54.3 (44) | 54.3 (44) | 54.3 (44) | 53.1 (43) | 49.4 (40) | 54.1 (307/567) |
| Total (148) | 70.3 (104) | 66.2 (98) | 64.9 (96) | 63.5^a (94) | 60.1^a,b (89) | 60.1^a (89) | 55.4^a,b (82) | 62.9 (652/1036) |

Note: Values are percentages with number of true-positive results in parentheses. HIR = hybrid iterative reconstruction.
^a Number of true-positive detections significantly different from filtered back projection (p < 0.05).
^b Number of true-positive detections significantly different from hybrid iterative reconstruction level 1 (p < 0.05).

Statistical Evaluation

An unpaired Student t test was used to assess for differences in BMI and age between the PE-positive and control study groups. A paired Student t test was used to determine whether CAD markings (i.e., TP, TN, FP, FN) differed significantly by reconstruction type. Statistical significance was defined as p < 0.05. Sensitivities were calculated for all emboli for each reconstruction at the lobar, segmental, and subsegmental levels. Per-study sensitivity, specificity, NPV, and positive predictive value (PPV) were calculated. Accuracy on a per-study basis was calculated with the following formula:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Studies in which the reference standard showed an isolated pulmonary embolus were grouped together for separate analysis of PE CAD performance. To determine the effects of image quality on CAD performance, standard plots were created of PE CAD outcomes versus average CNR values obtained at four levels of the pulmonary arterial tree. The coefficient of determination (R²) was calculated by squaring the correlation coefficient (r) to determine the proportion of the variance (fluctuation) of one variable predictable from the other variable. All calculations were performed with Microsoft Excel 2010.
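The per-study measures and the coefficient of determination described in this section can be sketched in a few lines. This is an illustrative reimplementation, not the authors' spreadsheet; `per_study_metrics` and `r_squared` are hypothetical names. The example counts for FBP are an assumption back-calculated from figures reported in the Results (39 of 40 positive studies detected, and a per-study specificity of 26.9% among 26 control studies, which implies 7 TN and 19 FP studies).

```python
from math import sqrt
from typing import Dict, Sequence


def per_study_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard measures from study counts (assumes every denominator is nonzero)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination as the square of Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    r = sxy / sqrt(sxx * syy)
    return r * r


# Assumed FBP study counts, back-calculated as described in the lead-in.
fbp = per_study_metrics(tp=39, tn=7, fp=19, fn=1)
```

Rounded to one decimal place, these counts give 97.5% sensitivity, 26.9% specificity, 67.2% PPV, 87.5% NPV, and 69.7% accuracy. The same PPV arithmetic applied per mark to the FBP totals in Tables 1 and 2 (104 TP and 239 FP marks) gives 104 / 343, or about 30.3%.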

Results Patient Characteristics Between August 1, 2011, and December 1, 2012, 86 studies obtained with the reference 256-MDCT scanner were interpreted as being positive for PE. Twenty-one (24.4%) showed PE within the main pulmonary arteries and were excluded. Of the remaining 65, eight studies were excluded because the standard departmental protocol was not followed (three patients with BMI ≥ 40 were scanned at 100 kV, and five patients with BMI < 40 were scanned at 120 kV). Severe respiratory motion was present on eight studies, leading to exclusion. Six studies were excluded because BMI was not recorded in the electronic medical record. Three patients were excluded because of deviations from the contrast protocol (two with injection rate less than 5 mL/s and one with partial contrast infiltra-

tion). After exclusion of these 46 studies, 40 studies of 40 different patients (21 male patients, 19 female patients; average age, 51.4 years; range, 15–78 years; SD, 14.7 years; average BMI, 31.3; range, 14.8–71.3; SD, 11.80) composed the study group. Five of the 40 studies were performed on morbidly obese patients (BMI ≥ 40) with a tube voltage of 120 kV according to institutional protocol. The other 35 patients underwent scanning at 100 kV. The 26 negative studies forming the control group were obtained during the same time frame as the positive studies and according to departmental protocol. They were selected at random from the PACS. The patients were four men and 22 women (average age, 44.7 years; range, 19–79 years; SD, 17.4 years; average BMI, 28.2; range, 16.5–53.3; SD, 9.3). Three of the 26 patients were morbidly obese (BMI ≥ 40), and the studies were performed at a tube voltage of 120 kV. The other 23 patients underwent scanning at 100 kV. There was no statistically significant difference in BMI (p = 0.265) or age (p = 0.097) between the study and control groups. Consensus Review of Pulmonary Embolism Studies Among the 40 cases of PE, a total of 193 filling defects were identified by consensus evaluation. Forty-five of these defects were scored 1 or 2 on the confidence scale (very low or low confidence) and were excluded. Each of these excluded marks occurred in patients with multiple emboli and did not lead to a change in the diagnosis from a positive to a negative PE study. In the reference standard evaluation, a total of 146 PEs were marked. An additional two subsegmental PEs were missed by both of the readers during consensus review but were marked by PE CAD and determined by consensus review to represent PE with at least a moderate degree of confidence. These two emboli were added

to the reference standard marks for a total of 148 individual PEs. The average number of PEs per study was 3.7 (range, 1–20; SD, 3.9). Thirteen studies had a solitary embolus, and 27 had multiple emboli. With respect to vascular distribution, 13 emboli were lobar, 54 segmental, and 81 subsegmental, corresponding to 8.8%, 36.5%, and 54.7% of total emboli. Eighty PEs (54.1%) received a confidence rating of 5. Confidence scores of 4 and 3 were assigned to 27 (18.2%) and 41 (27.7%) of emboli. The isolated PE subgroup contained eight studies with subsegmental lesions (61.5%) and five studies with segmental emboli (38.5%). Eight (61.5%) of these studies were rated with a confidence level of 5, two (15.4%) with a confidence score of 4, and three (23.1%) with a confidence score of 3. Overall Computer-Aided Detection Performance True-positive findings—Across all seven reconstructions evaluated per patient, CAD made a total of 652 TP marks. Compared with the FBP reconstruction, increasing HIR levels were associated with a decreasing overall number of TP marks made by PE CAD (Table 1). The TP numbers for the FBP reconstruction were significantly higher than for HIR level 3 (p = 0.048), level 4 (p = 0.01), level 5 (p = 0.03), and level 6 (p = 0.003). There was no significant difference between FBP and HIR level 1 (p = 0.2) and HIR level 2 (p = 0.09). Moreover, HIR level 1 reconstructions had significantly more TP marks than HIR level 4 (p = 0.048) and level 6 (p = 0.005) reconstructions. With the exception of HIR level 4 (p = 0.051), reconstructions at all HIR levels had a significantly higher number of TP marks than HIR level 6 reconstructions (level 1, p = 0.005; level 2, p = 0.014; level 3, p = 0.017; level 5, p = 0.018). Although there was a trend toward an increase in the number of TP marks made by CAD on the FBP reconstruction compared with the increasing

AJR:203, October 2014 767

Lahiji et al.

Downloaded from www.ajronline.org by UCSF LIB & CKM/RSCS MGMT on 12/12/14 from IP address 169.230.243.252. Copyright ARRS. For personal use only; all rights reserved

TABLE 2: Classification of False-Positive Marks Made by Computer-Aided Detection Filtered Back Projection

Cause of False-Positive Mark

Iterative Reconstruction Level 1

2

3

4

5

Total 6

No.

%

Pulmonary vein

112

74

56

51

43

45

32

413

42.4

Motion

38

35

35

32

34

32

33

239

24.5

Adjacent bronchial wall thickening, mucus plugging, or consolidation

25

19

22

23

23

17

19

148

15.2

Noise

48

17

12

5

3

4

4

93

9.5

Poor opacification

4

4

4

7

6

5

4

34

3.5

No reason identified

5

2

4

4

3

1

3

22

2.3

Branch point

4

2

2

2

2

1

2

15

1.5

Tumor

1

1

1

1

1

1

1

7

0.7

Lymph node

2

0

0

0

1

1

0

4

0.4

Total

239

154

136

125

116

107

98

975

100

Average

3.6

2.3

2.1

1.9

1.8

1.6

1.5

2.1

levels of HIR, there was no statistical difference in numbers of PE detected in the lobar and subsegmental branches. For segmental emboli, PE CAD made a significantly greater number of TP marks on the FBP reconstruction than on the HIR level 3 (p = 0.03), level 4 (p = 0.01), and level 6 (p = 0.01) reconstructions. Sensitivity—On a per-clot basis, the pooled sensitivity for all reconstructions was 62.9% (652/1036).With respect to most proximal embolus, the sensitivity of PE CAD was 67.0% for lobar, 75.9% for segmental, and 54.1% for subsegmental emboli. The overall sensitivity was highest for the FBP reconstruction (70.3%). There was a progressive decrease in sensitivity with increasing HIR level (Table 1). For per-study sensitivity, PE CAD correctly identified at least one PE on the FBP reconstruction in 39 of 40 TP cases, corresponding to a per-study sensitivity of 97.5%. In the single FN case, an isolated subsegmental PE with a conspicuity score of 3 was not detected on any reconstruction. With the exception of HIR level 4 (90% sensitivity), there was a progressive decrease in per-reconstruction sensitivity with increasing HIR level, from a sensitivity of 92.5% for level 1 to 85.0% for levels 5 and 6. On a per-reconstruction basis, the total number of TP cases detected by PE CAD on the FBP reconstruction was significantly greater than those on HIR level 2 (p = 0.04), level 3 (p = 0.04), level 5 (p = 0.02), and level 6 (p = 0.02) reconstructions. There was no significant difference in the number of TP cases between FBP and HIR level 1 (p = 0.16) and HIR level 4 (p = 0.083). In the 13 cases of isolated PE, PE CAD likewise performed best on the FBP recon-

768

struction, detecting a PE in all but one study (12/13). As before, HIR level 1 and level 4 performed better than the other HIR types (11/13), which made 9 or 10 TP marks. There was no significance difference between FBP and any of the HIR levels (p = 0.08–0.34). False-positive findings—PE CAD made 975 FP marks among the seven reconstructions across the 66 cases (40 positive cases, 26 control cases), for an average of 2.1 FP marks per reconstruction (range, 0–24; SD, 3.4). The causes of FP marks by CAD are listed in Table 2. For 14 patients (eight positive cases, six control cases), PE CAD did not make a single FP mark on any of the seven reconstructions. Increasing levels of noise reduction in the HIR algorithms led to a progressive decrease in the number of FP marks made by PE CAD (Table 2 and Figs. 1 and 2). Compared with the results with FBP, the decrease in FP marks was significant across all HIR levels (all, p < 0.001). The decrease in number of FP marks was greatest between FBP (239 FP marks; average, 3.6 FP marks per case) and IR level 6 (98 FP marks; average, 1.5 FP marks per case). Compared with HIR level 1, there was a significant reduction in number of FP marks for level 2 (p = 0.03), level 3 (p < 0.001), level 4 (p = 0.049), level 5 (p = 0.03), and level 6 (p = 0.02). HIR level 6 was associated with a statistically significant reduction in number of FP detections compared with all other HIR levels (level 1, p = 0.02; level 2, 0.04; level 3, 0.048; level 4, 0.046) except level 5 (p = 0.21). Nearly 90% of the FP marks made by PE CAD were judged to be due to incorrect identification of a pulmonary vein (42.4%);

cardiac or respiratory motion (24.5%); adjacent bronchial wall thickening, mucus plugging, or consolidation (15.2%); and image noise (9.5%). There was no significant difference in number of FP marks due to respiratory or cardiac motion made by PE CAD for FBP compared with any HIR level (p = 0.28–0.59) reconstruction. FP marks due to adjacent lung lesions also had a similar trend (p = 0.13–0.67). With regard to FP marks made by PE CAD that corresponded to a portion of pulmonary vein, there was a significant increase in the number of FP marks on FBP reconstructions compared with all HIR levels (p < 0.001 to p < 0.006). Moreover, compared with HIR level 1, each incremental HIR level exhibited significant reduction in the number of veins incorrectly marked. FP marks due to image noise decreased from 48 with FBP reconstructions to four with HIR level 6 (Table 2). However, these decreases did not reach significance at any HIR (p = 0.06–0.08). In no instance did PE CAD mark a nonpulmonary vascular structure. True-negative findings—Of the 182 control reconstructions among the 26 cases (seven reconstructions per case), PE CAD accurately made no mark on 87 reconstructions. There was a significant increase in number of TN studies for HIR level 1 (p = 0.01), level 2 (p = 0.03), level 4 (p = 0.01), and level 5 (p < 0.001) compared with FBP reconstruction. There was no significant difference ­between FBP and HIR level 3 (p = 0.39) or between FBP and HIR level 6 (p = 0.16). Specificity—PE CAD did not correctly mark 87 of 182 total control studies across all reconstructions, a combined specificity


TABLE 3: Per-Study Performance Measures

| Measure | Filtered Back Projection | HIR Level 1 | HIR Level 2 | HIR Level 3 | HIR Level 4 | HIR Level 5 | HIR Level 6 |
|---|---|---|---|---|---|---|---|
| Sensitivity | 97.5 | 92.5 | 87.5 | 87.5 | 90.0 | 85.0 | 85.0 |
| Specificity | 26.9 | 53.8 | 50.0 | 38.5 | 53.8 | 61.5 | 50.0 |
| Positive predictive value | 67.2 | 75.5 | 72.9 | 68.6 | 75.0 | 77.3 | 72.3 |
| Negative predictive value | 87.5 | 82.4 | 72.2 | 66.7 | 77.8 | 72.7 | 66.7 |
| Accuracy | 69.7 | 77.3 | 72.7 | 68.2 | 75.8 | 75.8 | 70.7 |

Note—All values are percentages. Numbers in bold indicate highest percentage per statistic. Numbers in italics indicate second highest percentage per statistic.

of 47.8%. The lowest per-case specificity occurred with PE CAD on FBP reconstructions (26.9%), and the highest was with HIR level 5 (61.5%). Except for HIR level 3, the use of HIR significantly decreased the number of FP cases compared with FBP (p < 0.001–0.03).

Positive predictive value, negative predictive value, and accuracy—Because of the significant reduction in FP marks, the PPV for each CAD mark increased as the HIR level was augmented. On a per-mark basis, the lowest PPV was 30.3% for FBP reconstruction. This increased to 38.9% for HIR level 1 and 45.6% for level 6. On a per-study basis, in comparison with the results for FBP, PPV improved with increasing HIR level, except level 3 (Table 3). The greatest improvement in per-study PPV was seen between FBP (67.2%) and HIR level 5 (77.3%).

The per-study NPV also changed with the use of HIR. Increasing levels of noise reduction inversely affected the NPV (Table 3). The highest NPV was seen with FBP reconstruction (87.5%) and the lowest with HIR levels 3 and 6 (66.7%).

Overall accuracy was calculated for each reconstruction type to determine optimal performance of CAD (Table 3). With the FBP reconstruction, PE CAD had an accuracy of 69.7%. All HIR levels except level 3 (68.2%) achieved a higher accuracy than FBP, with HIR level 1 having the highest accuracy at 77.3%.

Image Noise

At all four areas measured (left main pulmonary artery and left lower lobar, segmental, and subsegmental vessels), there was a significant reduction (p < 0.001) in image noise between the HIR studies and FBP reconstructions and between increases in HIR level. In addition, CNR improved with increasing levels of noise reduction across all levels of the pulmonary arterial tree (Table 4). An increase in CNR was associated with a decrease in overall FP marks (Fig. 3) and an associated increase in PPV (Fig. 4). Interestingly, with improving CNR, there was a resultant decrease in PE CAD sensitivity (Fig. 4). The R² values for these plots indicated a strong goodness of fit between the CNR and the number of FP marks (R² = 0.816), CNR and PPV per study (R² = 0.833), and CNR and per-lesion sensitivity (R² = 0.975).

Discussion

The ideal CAD system should improve a radiologist’s workflow by highlighting areas of potential clinical concern so that greater emphasis can be placed on these areas. The efficacy of PE CAD can therefore be defined by maintenance of both a low number of FP findings and acceptable sensitivity. Several studies have evaluated the efficacy of PE CAD software as an adjunct to reader interpretation [9, 10, 13–19]. Sensitivities of 60–90% in a series of publications suggest that there may be a benefit to concomitant use of CAD during interpretation, particularly for inexperienced radiologists [9, 10, 13, 15]. However, a common finding has been poor PPV, reportedly as low as 20% [4], which is a drawback because potential FP markings must be carefully assessed to rule out emboli.

In our study, we evaluated the effect of IR techniques on PE CAD performance to determine whether the reduction in noise would provide any benefit. PPV is inversely related to the number of FP detections by PE CAD. The four major contributors to FP markings that we found were incorrect identification of a pulmonary vein, image noise, cardiac and respiratory motion, and airway or parenchymal abnormality adjacent to a pulmonary artery branch. Our findings agree with results from the current literature, which show similar trends [9, 10]. Overall, these factors contributed to more than 90% of the FP markings across the 462 total reconstructions evaluated.

Through the use of HIR, we found a nearly 60% reduction in the total number of FP marks between FBP (239 FP marks) and the highest level of HIR, that is, level 6 (98 FP marks). As anticipated, the use of HIR had little effect on FP marks caused by motion (38 with FBP, 33 with HIR level 6) and localized lung abnormalities (25 with FBP, 19 with HIR level 6). However, to reduce the number of these types of FP results, measures can be taken before scanning, such as practicing proper breath-hold techniques. In addition, one may choose not to run the PE CAD on studies with numerous areas of parenchymal consolidation or extensive large airway abnormalities.

By decreasing image noise and increasing CNR, the use of HIR led to a dramatic decrease in the number of FP marks due to pulmonary veins and image noise. With FBP reconstruction, PE CAD marked 112

TABLE 4: Average Contrast-to-Noise Ratios per Anatomic Level Across All Reconstructions

| Anatomic Level | Filtered Back Projection | HIR Level 1 | HIR Level 2 | HIR Level 3 | HIR Level 4 | HIR Level 5 | HIR Level 6 |
|---|---|---|---|---|---|---|---|
| Lobar | 8.1 | 10.0 | 10.6 | 11.2 | 12.0 | 13.1 | 14.5 |
| Segmental | 7.1 | 8.5 | 8.9 | 9.3 | 9.8 | 10.7 | 11.5 |
| Subsegmental | 5.9 | 7.1 | 7.3 | 7.7 | 8.1 | 8.6 | 9.3 |


pulmonary veins, which decreased in a nearly linear manner to 32 with the highest level of HIR. This suggests that the PE CAD program better differentiated pulmonary artery and pulmonary venous branches on the HIR reconstructions. The algorithm works by tracing the pulmonary artery segments centrally to peripherally and will mark a location as a PE if there is an abrupt decrease in attenuation in what the program assumes is a pulmonary artery. Thus most FP marks occur on low-attenuation structures that course directly adjacent to the pulmonary arteries, such as poorly opacified pulmonary veins. We theorize that with improvements in noise and CNR, the program likely can better define the course and wall of the pulmonary artery. This would decrease the likelihood that the program will inadvertently begin to trace the pulmonary vein instead of the pulmonary artery at a point where a pulmonary artery and vein cross.

Similarly, image noise contributed to approximately 20% of the overall FP marks made by PE CAD on FBP studies compared with 4% with HIR level 6 reconstructions. As expected, this area had the largest proportional improvement after the use of IR, showing a 92% decline between the FBP (48 FP marks) and HIR level 6 reconstructions (four FP marks).

Paradoxically, our findings also showed that as the level of noise was reduced and CNR improved, the sensitivity of CAD decreased both per lesion (70.3% with FBP to 55.4% with HIR level 6) and per study (97.5% with FBP to 85.0% with HIR level 6). The number of individual TP marks made was significantly higher with FBP than with all but the two lowest levels of HIR. The reason for this decrease is unclear. Because the diagnosis of PE ultimately depends on detecting a single embolus rather than identifying every clot, it is of more clinical relevance to determine PE CAD performance on a per-study basis.
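The inverse relationship between CNR and the number of FP marks can be illustrated with an ordinary least-squares fit. The following is a minimal sketch, not the authors' analysis: it pairs the lobar CNR values from Table 4 with the total FP marks per reconstruction reported in the abstract (239 for FBP; 154, 136, 125, 116, 107, and 98 for HIR levels 1 through 6). The choice of the lobar row is an assumption, because the text does not state which anatomic level was plotted in Figs. 3 and 4, so the resulting R² need not match the reported value of 0.816 exactly. The function and variable names are illustrative.

```python
# Order of entries: FBP, then HIR levels 1 through 6.
LOBAR_CNR = [8.1, 10.0, 10.6, 11.2, 12.0, 13.1, 14.5]  # Table 4, lobar row (assumed choice)
FP_MARKS = [239, 154, 136, 125, 116, 107, 98]           # total FP marks per reconstruction


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus the R^2 of the fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(LOBAR_CNR, FP_MARKS)
print(f"slope = {slope:.1f} FP marks per unit CNR, R^2 = {r_squared:.3f}")
```

A negative slope corresponds to fewer FP marks at higher CNR. With only seven points, one per reconstruction type, the fit describes the trend across reconstructions and says nothing about individual studies.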
Although the FBP reconstruction had the highest per-study sensitivity (97.5%), the numbers of TP marks between the FBP reconstruction and HIR level 1 and level 4 were not significantly different. Paired with the dramatic decrease in the number of FP marks with the use of HIR, both HIR level 1 and level 4 had a higher PPV, specificity, and number of TN examinations compared with the FBP reconstruction. Similarly, although FBP had the highest NPV, the NPVs of both HIR level 1 and level 4 were not much lower. This combination led to the highest overall accuracy for HIR level 1 (77.3%) followed by level 4 (75.8%).
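The per-study measures compared above all follow from four counts: true-positive, false-negative, true-negative, and false-positive studies. The sketch below shows the standard formulas. The counts are not reported directly in this section; as an assumption, they are back-calculated from the sensitivity and specificity in Table 3 and the 40 positive and 26 control cases. The function names are illustrative.

```python
def counts_from_rates(sensitivity_pct, specificity_pct, n_positive=40, n_control=26):
    """Back-calculate (TP, FN, TN, FP) study counts from percentages (an assumption)."""
    tp = round(sensitivity_pct / 100 * n_positive)
    tn = round(specificity_pct / 100 * n_control)
    return tp, n_positive - tp, tn, n_control - tn


def per_study_metrics(tp, fn, tn, fp):
    """Standard diagnostic performance measures, returned as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
    }


# Sensitivity and specificity taken from Table 3.
fbp = per_study_metrics(*counts_from_rates(97.5, 26.9))
hir1 = per_study_metrics(*counts_from_rates(92.5, 53.8))

for name, m in (("FBP", fbp), ("HIR level 1", hir1)):
    print(f"{name}: PPV {m['ppv']:.1f}%, NPV {m['npv']:.1f}%, accuracy {m['accuracy']:.1f}%")
```

For FBP and HIR level 1, the recomputed PPV, NPV, and accuracy agree with the Table 3 entries (67.2%, 87.5%, and 69.7% for FBP; 75.5%, 82.4%, and 77.3% for HIR level 1). This shows how a small loss of true-positive studies can be outweighed by a larger gain in true-negative studies.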


Therefore, the recommendation of what reconstruction to use for PE CAD analysis depends on the initial interpretation and the degree of noise and CNR of the CT scan. For studies the radiologist initially interprets as showing abnormal findings, there is little utility in using the PE CAD software. The only exception would be cases in which only a single subsegmental embolus is detected and the presence of a second embolus would alter treatment decisions. In normal studies with little perceptible image noise and a high CNR on the FBP reconstruction, PE CAD analysis should be performed on the FBP reconstruction because it has the highest per-study and per-lesion sensitivity and NPV. In studies with extensive image noise and a poor CNR on the FBP reconstruction, the use of a moderate level of HIR, such as iDose4, would be recommended because the further reduction of FP marks with only a slight decrease in sensitivity would lead to overall improved accuracy compared with the FBP reconstruction. All other studies should be analyzed with a low level of IR because it leads to improved accuracy primarily through a reduction in the number of FP marks without a significant loss in the number of TP studies or sensitivity. Very high levels of HIR, such as iDose6, should be avoided because they lead to a significant reduction in the number of TP marks per patient and per study. In general, PE CAD should not be performed on studies severely degraded by respiratory motion because the high number of FP marks may cause a decrease in reader accuracy. Although previous studies have investigated the relation between CNR and the diagnostic image quality of pulmonary CTA [20–23], to our knowledge, this is the first study that has examined the effect of IR on PE CAD performance. This was a retrospective study with certain limitations due to selection bias. However, we tried to mitigate this factor by selecting all consecutive positive PE studies performed over a 1-year period. 
To further limit selection bias, we did not specifically match the control group to the study group in terms of sex, age, or BMI. We studied the IR algorithm of only a single vendor, and it is unclear whether these results would be reproducible across all platforms. Except for iDose6, the per-mark sensitivity of the PE CAD system fell between the 60–90% sensitivity previously reported. However, our study was also limited in that we did not analyze specific reasons for FN marks by the PE CAD system.

Another limitation was that large central pulmonary emboli proximal to the lobar vessels were excluded from this study. This exclusion was made for two reasons. First, limitations stipulated by the software developer (Philips Healthcare) indicate that the mapping and detection process can be limited within the central pulmonary arteries. Second, and more important, these emboli are usually quite large and would rarely, if ever, be missed by a radiologist on a pulmonary CTA study, limiting the clinical utility of PE CAD in this situation.

Another limitation to our study was that the diagnosis of pulmonary emboli is not always certain, especially in very distal subsegmental vessels and in areas degraded by motion or a suboptimal bolus. This could lead to inaccuracies in the assessment of PE CAD performance. To minimize any inaccuracies in the detection of PE, two fellowship-trained thoracic radiologists evaluated each study before PE CAD assessment, and if either radiologist had very low or low confidence in the diagnosis of an embolus, the finding was regarded as negative.

Conclusion

The use of IR to improve image quality can be of benefit in the use of CAD for PE detection. Noise removal results in a significant reduction in the number of FP detections by the software. As CNR improves, specificity and PPV also improve. However, the sensitivity of CAD decreases to below values obtained with FBP reconstruction, suggesting that using IR before PE CAD would be of greatest benefit for noisy images. Overall, the greatest accuracy was seen with the use of mild IR (iDose1), which showed 11% improvement over FBP.

References

1. Kuriakose J, Patel S. Acute pulmonary embolism. Radiol Clin North Am 2010; 48:31–50
2. Schoepf UJ, Holzknecht N, Helmberger TK. Subsegmental pulmonary emboli: improved detection with thin-collimation multi-detector row spiral CT. Radiology 2002; 222:483–490
3. Patel S, Kazerooni EA, Cascade PN. Pulmonary embolism: optimization of small pulmonary artery visualization at multi-detector row CT. Radiology 2003; 227:455–460
4. Lauria A, Palmiero R, Forni G, et al. A study on two different CAD systems for mammography as an aid to radiological diagnosis in the search of microcalcification clusters. Eur J Radiol 2005; 55:264–269
5. Cole EB, Zhang Z, Marques HS, et al. Assessing the stand-alone sensitivity of computer-aided detection with cancer cases from the Digital Mammographic Imaging Screening Trial. AJR 2012; 199:[web]W392–W401
6. Beigelman-Aubry C, Raffy P, Yang W, Castellino RA, Grenier PA. Computer-aided detection of solid lung nodules on follow-up MDCT screening: evaluation of detection, tracking, and reading time. AJR 2007; 189:948–955
7. Regge D, Delle Monica P, Galatola G, et al. Efficacy of computer-aided detection as a second reader for 6-9-mm lesions at CT colonography: multicenter prospective trial. Radiology 2013; 266:168–176
8. Miyake M, Iinuma G, Taylor SA, et al. Comparative performance of a primary-reader and second-reader paradigm of computer-aided detection for CT colonography in a low-prevalence screening population. Jpn J Radiol 2013; 31:310–319
9. Walsham AC, Roberts HC, Kashani HM, et al. The use of computer-aided detection for the assessment of pulmonary arterial filling defects at computed tomographic angiography. J Comput Assist Tomogr 2008; 32:913–918
10. Blackmon KN, Florin C, Bogoni L, et al. Computer-aided detection of pulmonary embolism at CT pulmonary angiography: can it improve performance of inexperienced readers? Eur Radiol 2011; 21:1214–1223
11. Noël PB, Fingerle AA, Renger B, Rummeny EJ, Dobritz M. A clinical comparison study of a novel statistical iterative and filtered back projection reconstruction. In: Pelc J, Ehsan D, Nishikawa R, eds. Medical imaging 2011: physics of medical imaging—proceedings of SPIE, vol. 7961. Bellingham, WA: SPIE, 2011:877971
12. Marin D, Nelson RC, Schindera ST, et al. Low-tube-voltage, high-tube-current multidetector abdominal CT: improved image quality and decreased radiation dose with adaptive statistical iterative reconstruction algorithm—initial clinical experience. Radiology 2010; 254:145–153
13. Wittenberg R, Peters JF, Sonnemans JJ, Prokop M, Schaefer-Prokop CM. Computer-assisted detection of pulmonary embolism: evaluation of pulmonary CT angiograms performed in an on-call setting. Eur Radiol 2010; 20:801–806
14. Maizlin ZV, Vos PM, Godoy MC, Cooperberg PL. Computer-aided detection of pulmonary embolism on CT angiography: initial experience. J Thorac Imaging 2007; 22:324–329
15. Wittenberg R, Berger FH, Peters JF, et al. Acute pulmonary embolism: effect of a computer-assisted detection prototype on diagnosis—an observer study. Radiology 2012; 262:305–313
16. Das M, Muhlenbruch G, Helm A, et al. Computer-aided detection of pulmonary embolism: influence on radiologists’ detection performance with respect to vessel segments. Eur Radiol 2008; 18:1350–1355
17. Buhmann S, Herzog P, Liang J, et al. Clinical evaluation of a computer-aided diagnosis (CAD) prototype for the detection of pulmonary embolism. Acad Radiol 2007; 14:651–658
18. Engelke C, Schmidt S, Bakai A, Auer F, Martehn K. Computer-assisted detection of pulmonary embolism: performance evaluation in consensus with experienced and inexperienced chest radiologists. Eur Radiol 2008; 18:298–307
19. Wittenberg R, Peters JF, Weber M, et al. Standalone performance of a computer-assisted detection prototype for detection of acute pulmonary embolism: a multi-institutional comparison. Br J Radiol 2012; 85:758–764
20. Kazakauskaite E, Husmann L, Stehli J, et al. Image quality in low-dose coronary computed tomography angiography with a new high-definition CT scanner. Int J Cardiovasc Imaging 2013; 29:471–477
21. Wittenberg R, Peters JF, Sonnemans JJ, Bipat S, Prokop M, Schaefer-Prokop CM. Impact of image quality on the performance of computer-aided detection of pulmonary embolism. AJR 2011; 196:95–101
22. Kligerman S, Mehta D, Farnadesh M, Jeudy J, Olsen K, White C. Use of a hybrid iterative reconstruction technique to reduce image noise and improve image quality in obese patients undergoing computed tomographic pulmonary angiography. J Thorac Imaging 2013; 28:49–59
23. Nyman U, Bjorkdahl P, Olsson ML, Gunnarsson M, Goldman B. Low-dose radiation with 80kVp computed tomography to diagnose pulmonary embolism: a feasibility study. Acta Radiol 2012; 53:1004–1013
