Cell morphology-based classification of red blood cells using holographic imaging informatics Faliu Yi,1 Inkyu Moon,1,* and Bahram Javidi2 1

Department of Computer Engineering, Chosun University, 309 Pilmun-daero, Dong-gu, Gwangju 501-759, South Korea

2

Department of Electrical and Computer Engineering, U-2157, University of Connecticut, Storrs, Connecticut 06269, USA * [email protected]

Abstract: We present methods that automatically select a linear or nonlinear classifier for red blood cell (RBC) classification by analyzing the equality of the covariance matrices in Gabor-filtered holographic images. First, the phase images of the RBCs are numerically reconstructed from their holograms, which are recorded using off-axis digital holographic microscopy (DHM). Second, each RBC is segmented using a markercontrolled watershed transform algorithm and the inner part of the RBC is identified and analyzed. Third, the Gabor wavelet transform is applied to the segmented cells to extract a series of features, which then undergo a multivariate statistical test to evaluate the equality of the covariance matrices of the different shapes of the RBCs using selected features. When these covariance matrices are not equal, a nonlinear classification scheme based on quadratic functions is applied; otherwise, a linear classification is applied. We used the stomatocyte, discocyte, and echinocyte RBC for classifier training and testing. Simulation results demonstrated that 10 of the 14 RBC features are useful in RBC classification. Experimental results also revealed that the covariance matrices of the three main RBC groups are not equal and that a nonlinear classification method has a much lower misclassification rate. The proposed automated RBC classification method has the potential for use in drug testing and the diagnosis of RBC-related diseases. ©2016 Optical Society of America OCIS codes: (090.1995) Digital holography; (100.6890) Three-dimensional image processing; (170.3880) Medical and biological imaging; (150.0150) Machine vision; (150.1135) Algorithms; (100.5010) Pattern recognition; (170.1530) Cell analysis.

References and links 1. 2. 3. 4. 5. 6. 7.

R. Liu, D. K. Dey, D. Boss, P. Marquet, and B. Javidi, “Recognition and classification of red blood cells using digital holographic microscopy and data clustering with discriminant analysis,” J. Opt. Soc. Am. A 28(6), 1204– 1210 (2011). F. Yi, I. Moon, and Y. H. Lee, “Three-dimensional counting of morphologically normal human red blood cells via digital holographic microscopy,” J. Biomed. Opt. 20(1), 016005 (2015). J. Laurie, D. Wyncoll, and C. Harrison, “New versus old blood - the debate continues,” Crit. Care 14(2), 130– 131 (2010). C. G. Koch, L. Li, D. I. Sessler, P. Figueroa, G. A. Hoeltge, T. Mihaljevic, and E. H. Blackstone, “Duration of red-cell storage and complications after cardiac surgery,” N. Engl. J. Med. 358(12), 1229–1239 (2008). I. Moon, F. Yi, Y. H. Lee, B. Javidi, D. Boss, and P. Marquet, “Automated quantitative analysis of 3D morphology and mean corpuscular hemoglobin in human red blood cells stored in different periods,” Opt. Express 21(25), 30947–30957 (2013). T. Tishko, Holographic Microscopy of Phase Microscopic Objects: Theory and Practice (MA, World Scientific, 2011). J. W. Bacus and J. H. Weens, “An automated method of differential red blood cell classification with application to the diagnosis of anemia,” J. Histochem. Cytochem. 25(7), 614–632 (1977).

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2385

8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39.

H. Lee and Y. Chen, “Cell morphology based classification for red cells in blood smear images,” Pattern Recognit. Lett. 49, 155–161 (2014). I. Moon, M. Daneshpanah, B. Javidi, and A. Stern, “Automated three-dimensional identification and tracking of micro/nanobiological organisms by computational holographic microscopy,” Proc. IEEE 97(6), 990–1010 (2009). I. Moon, M. Daneshpanah, A. Anand, and B. Javidi, “Cell identification computational 3-D holographic microscopy,” Opt. Photonics News 22(6), 18–23 (2011). A. Anand and B. Javidi, “Digital holographic microscopy for automated 3D cell identification: an overview,” Chin. Opt. Lett. 12(6), 060012 (2014). R. Fiolka, L. Shao, E. H. Rego, M. W. Davidson, and M. G. Gustafsson, “Time-lapse two-color 3D imaging of live cells with doubled resolution using structured illumination,” Proc. Natl. Acad. Sci. U.S.A. 109(14), 5311– 5315 (2012). B. Javidi, I. Moon, S. Yeom, and E. Carapezza, “Three-dimensional imaging and recognition of microorganism using single-exposure on-line (SEOL) digital holography,” Opt. Express 13(12), 4492–4506 (2005). I. Moon and B. Javidi, “Shape tolerant three-dimensional recognition of biological microorganisms using digital holography,” Opt. Express 13(23), 9612–9622 (2005). I. Moon and B. Javidi, “3-D visualization and identification of biological microorganisms using partially temporal incoherent light in-line computational holographic imaging,” IEEE Trans. Med. Imaging 27(12), 1782– 1790 (2008). S. Abrahamsson, J. Chen, B. Hajj, S. Stallinga, A. Y. Katsov, J. Wisniewski, G. Mizuguchi, P. Soule, F. Mueller, C. Dugast Darzacq, X. Darzacq, C. Wu, C. I. Bargmann, D. A. Agard, M. Dahan, and M. G. Gustafsson, “Fast multicolor 3D imaging using aberration-corrected multifocus microscopy,” Nat. Methods 10(1), 60–63 (2012). E. H. Seeley and R. M. Caprioli, “3D imaging by mass spectrometry: a new frontier,” Anal. Chem. 84(5), 2105– 2110 (2012). M. A. Robinson, D. J. Graham, and D. G. Castner, “ToF-SIMS depth profiling of cells: z-correction, 3D imaging, and sputter rate of individual NIH/3T3 fibroblasts,” Anal. Chem. 84(11), 4880–4885 (2012). F. Yi, I. Moon, B. Javidi, D. Boss, and P. Marquet, “Automated segmentation of multiple red blood cells with digital holographic microscopy,” J. Biomed. Opt. 18(2), 26006 (2013). B. Rappaz, I. Moon, F. Yi, B. Javidi, P. Marquet, and G. Turcatti, “Automated multi-parameter measurement of cardiomyocytes dynamics with digital holographic microscopy,” Opt. Express 23(10), 13333–13347 (2015). T. Colomb, E. Cuche, F. Charrière, J. Kühn, N. Aspert, F. Montfort, P. Marquet, and C. Depeursinge, “Automatic procedure for aberration compensation in digital holographic microscopy and applications to specimen shape compensation,” Appl. Opt. 45(5), 851–863 (2006). E. Cuche, P. Marquet, and C. Depeursinge, “Simultaneous amplitude-contrast and quantitative phase-contrast microscopy by numerical reconstruction of Fresnel off-axis holograms,” Appl. Opt. 38(34), 6994–7001 (1999). R. Gonzalez and R. Woods, Digital Imaging Processing (NJ, Prentice Hall, 2002). J. Li, X. Li, B. Yang, and X. Sun, “Segmentation-based Image Copy-move Forgery Detection Scheme,” IEEE Trans. Inf. Forensics Security 10(3), 507–518 (2015). Y. Zheng, B. Jeon, D. Xu, Q. M. Wu, and H. Zhang, “Image segmentation by generalized hierarchical fuzzy Cmeans algorithm,” Journal of Intelligent and Fuzzy Systems: Applications in Engineering and Technology 28(2), 961–973 (2015). P. Guo, J. Wang, B. Li, and S. Lee, “A variable threshold-value authentication architecture for wireless mesh networks,” Journal of Internet Technology 15(6), 929–936 (2014). I. Moon and B. Javidi, “Three-dimensional identification of stem cells by computational holographic imaging,” J. R. Soc. Interface 4(13), 305–313 (2007). P. Green, Mathematical Tools for Applied Multivariate Analysis (Academic Press, 2014). A. Rencher, Methods of Multivariate Analysis (New York, Wiley-Interscience, 2002). X. Wen, L. Shao, Y. Xue, and W. Fang, “A rapid learning algorithm for vehicle classification,” Inf. Sci. 295, 395–406 (2015). B. Gu, V. S. Sheng, K. Y. Tay, W. Romano, and S. Li, “Incremental support vector learning for ordinal regression,” IEEE Trans. Neural Netw. Learn. Syst. 26(7), 1403–1416 (2015). I. Moon, B. Javidi, F. Yi, D. Boss, and P. Marquet, “Automated statistical quantification of three-dimensional morphology and mean corpuscular hemoglobin of multiple red blood cells,” Opt. Express 20(9), 10295–10309 (2012). F. Yi, I. Moon, and Y. H. Lee, “Extraction of target specimens from bioholographic images using interactive graph cuts,” J. Biomed. Opt. 18(12), 126015 (2013). J. Gall, A. Yao, N. Razavi, L. Van Gool, and V. Lempitsky, “Hough forests for object detection, tracking, and action recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 33(11), 2188–2202 (2011). M. Winter, E. Wait, B. Roysam, S. K. Goderie, R. A. Ali, E. Kokovay, S. Temple, and A. R. Cohen, “Vertebrate neural stem cell segmentation, tracking and lineaging with validation and editing,” Nat. Protoc. 6(12), 1942– 1952 (2011). I. Seroussi, D. Veikherman, N. Ofer, S. Yehudai-Resheff, and K. Keren, “Segmentation and tracking of live cells in phase-contrast images using directional gradient vector flow for snakes,” J. Microsc. 247(2), 137–146 (2012). I. Fogel and D. Sagi, “Gabor filters as texture discriminator,” Biol. Cybern. 61(2), 103–113 (1989). D. Zhang, M. Islam, G. Lu, and I. Sumana, “Rotation invariant curvelet features for region based image retrieval,” Int. J. Comput. Vis. 98(2), 187–201 (2012). T. Fawcett, “An Introduction to ROC Analysis,” Pattern Recognit. Lett. 27(8), 861–874 (2006).

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2386

1. Introduction Red blood cells (RBCs) are essential components of human blood. They not only deliver oxygen to body tissues in vertebrates via the circulatory system, but also absorb oxygen in the lungs and release it when flowing through the capillaries [1, 2]. Normally, a relatively stable number of RBCs is maintained in the circulatory system; however, for anemic patients, it is necessary to stimulate RBC production with medication. Recent studies showed that in RBCs stored for long periods, structural and functional changes occur that affect their viability, which, in turn, increases the latent risk to patients when the blood is transfused [1–5]. Moreover, the types of RBCs in human blood vary with respect to RBC-related diseases [2, 6]. Therefore, a robust classification method is needed to analyze RBCs for medical diagnosis and therapeutics. In addition, an RBC classification algorithm would be helpful in screening RBC-related drugs or agents. Existing RBC classification methods are either time-consuming or have a high misclassification rate [1, 2, 7, 8], and most are based on two-dimensional (2D) imaging techniques [7, 8]. Conventional 2D intensity images of semitransparent or transparent biological specimens are virtually the same and do not have enough detail to enable separation of different types of targets. The use of traditional 2D imaging systems for quantitative analysis of RBCs is limited because they cannot provide important biophysical cell parameters related to the structure and function of RBCs. Consequently, development of an RBC classification method based on quantitative three-dimensional (3D) imaging with greater efficiency and accuracy is imperative. Recently, 3D imaging systems for the analysis of semitransparent or transparent biological specimens such as cardiomyocytes and RBCs have been presented [9–21]. Digital holographic microscopy (DHM) has been widely investigated for use in observing, measuring, and tracking biological micro- and nano-organisms because it allows for fast, noninvasive imaging of the observed specimen with a nanometric axial sensitivity. DHM overcomes the depth-of-field limitation of classical optical microscopy [5–15] and is a quantitative phase-imaging technique that directly and noninvasively provides the optical thickness of a cell under dynamic conditions [9–11]. This phase information, which is related to cell morphology and biophysical cell parameters such as dry mass, volume, and density, can be used in the classification of RBCs. In this paper, we present an automated RBC classification algorithm in which the holograms of the RBCs are recorded using off-axis DHM and the corresponding phase images are reconstructed from these holograms using numerical reconstruction techniques [9–11, 21, 22]. We used the three main types of RBCs based on shape, i.e., stomatocyte, discocyte, and echinocyte, for classification training and testing [2, 6]. The RBC samples were extracted from multiple human RBC phase images and a biologist determined the type of each RBC. The marker-controlled watershed transform algorithm [19, 23], which is a preprocessing step in the edge-based segmentation algorithm [23–26], was used to segment cell targets in the phase images and isolate cells beneficial to further analysis. Also, it was used to extract and segment the inner part of the RBCs that can contribute to the discrimination of RBC groups [2]. Unlike other holographic RBC classification methods in which the RBC features are measured directly from the segmented cells, our classification scheme applies Gabor wavelet filter [27] to the selected RBC before feature extraction. The Gabor wavelet filter provides optimal resolution in both space and frequency domains and is optimal at extracting local features such as discontinuities [27]. The ten features (see Table 1) extracted from a single RBC after application of the Gabor wavelet filter were projected surface area, perimeter, circularity, average phase value, mean corpuscular hemoglobin (MCH), MCH surface density (MCHSD), mean phase of the central part, sphericity coefficient, elongation, and D value, and the four features extracted from the inner part of the RBC were projected surface area, perimeter, average phase value, and elongation [2]. Some of these 14 features are redundant, which reduces the efficiency of the RBC classification scheme. Thus, our scheme uses the stepwise selection procedure [28, 29] to statistically check for redundancy. Therefore, only the remaining features are used in the design of the RBC classification However, the

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2387

conventional linear classification method may not work well when the three RBC groups have different covariance matrices [28–31]. The multivariate χ2 test is used to verify the equality of the covariance matrices [29]. If covariance matrices are different, the use of a nonlinear classifier with quadratic functions is preferable, and if they are the same, the use of a linear classifier is better. Statistical analysis shows that the covariance matrices of the three typical RBC shapes in our experiments are not equal. Experimental results showed that our proposed nonlinear classification method achieves a much lower misclassification rate than that presented in [2]. In addition, the RBC features extracted from the segmented RBC target with the Gabor wavelet filter improve the performance of the final RBC classification scheme. Section 2 discusses the principle of off-axis DHM. Section 3 presents the statistical methods used for the RBC classification. Section 4 presents the experimental results and Section 5 is a summary of our conclusions. 2. Off-axis digital holographic microscopy Figure 1 shows the schematic of the off-axis DHM used to record the RBC holograms [2, 32, 33]. In the off-axis geometry, the 682-nm-wavelength laser diode beam is divided into a reference wave and an object wave. When the object wave goes through the RBC sample, it is diffracted and magnified by a 40 × , 0.75-NA microscope objective (MO). A 1024 × 1024 pixels charge-coupled device (CCD) camera (pixel size = 6.5μm) captures the RBC interference patterns between the diffracted object wave and the reference wave. Finally, the RBC wave front or phase image is reconstructed from the off-axis digital hologram using the numerical algorithm described in [21, 22]. Beam splitter Mirror

CCD

MO Reference wave (R)

R

Sample

θ

Lens

Object wave (O) Laser

O Mirror

Beam splitter

Fig. 1. Schematic of off-axis DHM.

3. Statistical methods for RBC classification In this section, we present the procedures used to design the statistical RBC classification method. First, the RBCs are extracted from the RBC phase images that are reconstructed from the RBC holograms recorded by the off-axis DHM. Second, the Gabor wavelet filter is applied to the segmented RBC target from which 14 features are determined. The redundancy of each RBC feature is statistically analyzed by the stepwise selection method and the redundant features are removed. Finally, a linear or nonlinear RBC classifier is selected based on the multivariate tests of the equality of covariance matrices among the three RBC groups using the remaining features. 3.1 RBC segmentation The preprocessing step of RBC segmentation is necessary for RBC classification, recognition, and tracking [23, 32–36]. The marker-controlled watershed transform algorithm, which is an edge-based segmentation method, is used for RBC segmentation because it can isolate an

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2388

object that is beneficial to the separation of connected RBCs. This transform is the extension of the watershed transform method that can easily produce an over-segmentation problem [19, 23]. With the watershed transform algorithm, the regional minimal values in the image are viewed as valleys and the regional maximal values as peaks. In a simulation model of flooding [23], the intensity values of the entire image are considered the elevation values of the terrain. Note that for the algorithm applied to the holographic phase images, the intensity values refer to phase values. Dams are constructed in neighboring valleys when the terrain is flooded and two neighboring valleys merge into one. The final dams or peaks form the watershed line. The watershed transform algorithm tries to find all of the peaks. However, there are many regional minimal values in the image and thus there are many peaks, which are referred to as oversegmentation and can be found using a conventional marker-controlled watershed transform method. With this approach, the regional minimal values occur only at marked locations. The marker usually consists of an internal marker that denotes an object that needs to be extracted and an external marker that represents the background around the object. All external markers are connected [23] so the marker-controlled watershed algorithm finds the peaks between the internal and external markers. When the marker-controlled watershed transform algorithm is applied to the gradient image, the object boundaries can be searched because they tend to have high values in the gradient image when the object is marked as an internal marker and the background is marked as an external marker. We use the marker-controlled watershed transform scheme to extract both a single RBC and the inner part of the RBC in the phase image reconstructed from digital hologram [19]. A single RBC and its inner part are illustrated in Fig. 2 where the intensity value represents phase value.

Single RBC Inner part of RBC 8μm

8μm

Fig. 2. Illustration of single RBC and the inner part of the cell.

3.2 Gabor wavelet filter The Gabor wavelet filter is applied to the segmented and identified RBCs. The filter is the optimal compromise between spatial resolution and frequency resolution [27]. It also has good noise tolerance because of its band-limited behavior and the representation of the local features centered on feature points by its coefficients. The generalized 2D Gabor elementary function is expressed as [27]:

{

}

2 2 ψ ( m, n ) = exp − α 2 ( m cos θ + n sin θ ) + β 2 ( −m sin θ + n cos θ ) 



× exp { j 2πf 0 ( m cos θ + n sin θ )} ,



(1)

where θ is the rotation of the Gaussian and sinusoid functions, f 0 is the frequency of the sinusoid function, j is the imaginary unit, and α and β are the sharpness of the Gaussian major and minor axes, respectively. The continuous wavelet transformation of the extracted RBCs is then defined as [27]: Rˆ (m′, n′) = R ( m, n ) ⊗ψ ( m, n ) ,

#262009 (C) 2016 OSA

(2)

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2389

where R(m,n) is the extracted RBC image after segmentation, Rˆ (m′, n′) is the Gabor-waveletfiltered RBC image, and ⊗ denotes convolution. Gabor wavelet filter is basically a Gaussian function that is modulated by a complex sinusoid of frequency and orientation [37]. Unlike Fourier transform which mainly defines the frequency characteristic of image, Gabor transform consists of characteristics of both spatial frequency and their orientations in image. The Gabor response from Eq. (2) emphasizes three types of characteristics in the image which are edge-oriented characteristics, texture-oriented characteristics, and a combination of both. Gabor wavelet filter is widely used to extract features that differentiate between targets with similar colors or shapes for object classification. It is also shown in [38] that texture features extracted from Gabor filter outperform these from other methods. Accordingly, Gabor filter is adopted to extract the RBC features for the purpose of RBC classification. 3.3 Extraction of RBC features After the Gabor wavelet filter is applied to the extracted RBCs, the corresponding features are measured. Features F1–F10 are identified from the single RBC and F11–F14 are determined from the inner part of the RBC on the Gabor-wavelet-filtered single RBC image. These features are described in Table 1 [2]. Table 1. Feature Descriptions Feature Segmented Single RBC

Description

F1: Projected surface area S

Number of pixels within single RBC × one pixel area

F2: Perimeter

Length of the cell boundary

F3: Circularity

(Perimeter × perimeter)/area

F4: Average phase value

ϕ

F5: MCH F6: MCHSD

Average phase value for pixels within a single RBC Mean corpuscular hemoglobin MCH surface density

F7: Phase of center pixel

Phase value of center pixel (average 5 × 5 pixels)

F8: Sphericity coefficient

Phase of center pixel/maximal phase value

F9: Elongation

Orientation of chain code in the cell boundary

F10: D value

Phase of center pixel minus maximal pixel phase value

Inner part of the segmented RBC F11: Projected surface area

Number of pixels within the inner part of RBC × one pixel area

F12: Perimeter F13: Average phase value F14: Elongation

Length of the boundary in the inner part of cell Average phase value for pixels within the inner part of RBC Orientation of chain code in the boundary of the inner part of cell

The projected surface area (S), average phase value ( ϕ ), mean corpuscular hemoglobin (MCH), and MCH surface density (MCHSD) are defined as [2]: S=N

p2 1 ,ϕ= 2 N M

10λ

N

 ϕ , MCH = 2πα Sϕ , MCHSD = i =1

i

MCH , S

(3)

where N is the number of pixels within the measured area of the RBC, p is the pixel size (pixel size is the same in the x and y directions), M is the magnification of the off-axis DHM used to image RBCs, ϕi is the corresponding RBC phase value at the ith pixel, λ is the wavelength of the light source used in the DHM system, and α is the specific refraction increment and is a constant in our system. When the boundary of the analyzed cell area is marked using the Freeman chain code of 8 directions, the perimeter (Per), circularity (Cir), and elongation (Elo) features are defined as [2, 7]:

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2390

Per = N e × 1 + N o × 2, Cir = Per 2 / S ,

(4)

max(| N 0,4 − N 2,6 |,| N1,5 − N 3,7 |) , if | N 0,4 − N 2,6 |≠| N1,5 − N 3,7 | Elo =  , if | N 0,4 − N 2,6 |=| N1,5 − N 3,7 |  0,

where Ne and No are the number of even- and odd-valued elements, respectively, in the chain code of the boundary in the analyzed cell area, i.e., the segmented single cell or the inner part of the cell. And, N j ,k =

n



1

(5)

i =1 ai = j or ai =k

where n ( = Ne + No) is the total number of elements in the Freeman chain code along the object boundary, and ai (i = 1, 2, …, n) is an integer from 0 to 7 in the Freeman chain code of 8 directions [7]. The phase of the center pixel (PCP), the sphericity coefficient (SC), and the D value (D value) are defined as [2]: PCP= SC =

1   ϕ ij , 25 -2≤ i ≤ 2 −2≤ j ≤ 2

PCP

ϕ

,

(6)

max

Dvalue = PCP − ϕ max ,

where ϕ ij is the phase at location i,j within a 5 × 5 window and 0,0 is the center of the single RBC or the inner part of the RBC, and ϕ max is the maximal phase within the analyzed single RBC or the inner part of the RBC. 3.4 Analysis of RBC features Fourteen features are extracted for each class of RBC. However, any dependent or redundant features should be discarded when separating the RBC groups. The stepwise selection of variables, a combination of the forward and backward selection methods [29] and commonly referred to as stepwise discriminant analysis [28, 29], is used to analyze the measured RBC features. For the forward selection procedure, the Wilks Λ or F test is used to test the null hypothesis that a particular variable does not help separate the groups beyond what other existing variables yield. In this procedure, first the Wilks Λ value [29] is calculated for each individual variable, i.e., Λ( yi ) is calculated for each variable yi, where i = 1, 2, …, p and p is the number of variables. Then, the variable with the minimum Λ( yi ) or the maximum associated F value is selected. For example, if y1 is selected in the first step, then Λ( yi y1 ) for each of the other p – 1 variables that are not entered at the first step are calculated and that with the minimum Λ( yi y1 ) or the maximum associated partial F is chosen. Then assume that y2 is chosen in the second step. In the third step, Λ ( yi y1 , y2 ) is calculated for each of the p – 2 remaining variables and the variable with the minimum Λ ( yi y1 , y2 ) or the maximum associated F value is chosen. This process continues until no more variables can be selected. Thus, Λ( yi y1 ) , Λ ( yi y1 , y2 ) , …, Λ ( yi y1 , y2 ,..., y p −1 ) are measured using [29]:

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2391

(

)

Λ x y1 ,..., y p =

Λ ( y1 ,..., y p , x ) Λ ( y1 ,..., y p )

,

(7)

which is distributed as Λ1,vH ,vE − p with the degrees of freedom of vH = k – 1 and vE = k(n – 1), where k is the number of groups and n is the number of observed data. In Eq. (7), if the additional variable x makes Λ ( y1 ,..., y p , x ) sufficiently smaller than Λ ( y1 ,..., y p ) , then

(

)

Λ x y1 ,..., y p is small enough such that the null hypothesis that the variable x is not helpful

in separating the groups beyond other variables would yield is rejected. Moreover, Eq. (7) can be expressed as [29]: Λ=

E E+H

,

(8)

where k

n

E =  ( yij − y i )( yij − y i ) , T

i =1 j =1

(9)

k

H = n ( y i − y )( y i − y ) , T

i =1

where yij is the jth data point of the ith group, y i is the mean vector of the ith group, and y is the mean of all the sample data. The Λ statistics of Eq. (7) can also be transformed into a partial F statistic as follows [29]: F=

1 − Λ vE − p , Λ vH

(10)

which is distributed as FvH ,vE − p , where vH = k – 1 and vE = k(n – 1). The null hypothesis is rejected if Λ in Eq. (7) or F in Eq. (10) is smaller than the critical values Λα ,1, vH , vE − p or Fα ,vH ,vE − p , where α is the significance level.

Similar to the forward selection operation, the backward selection begins with all the features. Then, after the Wilks Λ or the partial F test is applied at each step, the feature that contributes the least is removed. For stepwise selection, only one feature is added at a time and the Λ value is checked against Wilks’ Λ or the equal partial F test. At the same time, the features are rechecked to see if any feature that had been selected previously had become redundant with the recently chosen feature. This check is performed using Wilks’ Λ or the equal partial F test of Eq. (7) or Eq. (10). 3.5 Multivariate tests of the equality of covariance matrices The equality or inequality of the covariance matrices affects the design of the RBC classifier so it is reasonable to test the equality of the covariance matrices for multivariate populations. The null hypothesis H0 for the equality of covariance matrices is expressed as [29]: H 0 : Σ1 = Σ 2 = ⋅⋅⋅ = Σ k .

(11)

To perform the test, W must be calculated from multivariate normal distributions using the assumption that the independent groups are of sizes n1, n2, …, nk [29]:

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2392

S1

W=

v1 / 2

v2 / 2

S2 S pl

 Sk

vk / 2

 i vi / 2

,

(12)

where vi = ni – 1, Si is the covariance matrix of the ith RBC group sample, and Spl is the covariance matrix of the pooled sample and is defined as [29]:

  k

S pl

v Si

i =1 i k

v

i =1 i

=

E . vE

(13)

The distribution of W can be further approximated using χ2. For this approximation, the variable c1 [29] is calculated first:  k 1 1   2 p2 + 3 p −1  c1 =   − k   ,  i =1 vi  vi   6( p + 1)(k − 1)  i =1  

(14)

where p is the number of features, k is the number of classes, and vi = ni – 1. Then, we define [29]: u = −2(1 − c1 )lnM,

(15)

which is distributed approximately as χ 2 {[(k − 1) p ( p + 1)] / 2} . Consequently, the null

hypothesis H0 is rejected if u > χα2 , [( k −1) p ( p +1) ]/ 2 , where α is the significance level and [(k – 1)p(p + 1)]/2 is the degree of freedom. 3.6 Classifier selection Only features that are helpful in separating the RBC groups remain after feature analysis by stepwise selection. One approach to performing an input test using an RBC with reduced feature y of unknown group membership is to use a distance function to find the mean vector of the k groups ( y1 , y 2 , …, y k ) to which y is the closest and then assign y to the corresponding group. However, the distance function or discriminant function varies and depends on the equality of the covariance matrices. If the χ2 test finds that the covariance matrices of the different RBC groups are equal, then linear classification functions can be adopted. Thus, if Σ1 =Σ 2 = ⋅⋅⋅ =Σ k , the input RBC feature y is compared to each group mean vector yi using the following distance function [29]: Di2 ( y ) = ( y − yi ) S-1pl ( y − yi ) , T

(16)

where i = 1, 2, …, k and S pl =

1 k  (ni − 1)Si . N − k i =1

(17)

ni and Si are the size of the sample data and the covariance matrix of the ith RBC group, respectively, and N = n1 + n2 + … + nk. Equation (16) can be expanded to the following linear classification rule [29]: Di2 ( y ) = y T S -1pl y − 2 y iT S −pl1y + y iT S -1pl y i .

(18)

Because the first term in Eq. (18) does not affect the classification result from group to group, it can be omitted. The second term in Eq. (18) is a linear function of y so the linear classification function is expressed as [29]: #262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2393

Di2 ( y ) = −2y iT S pl−1y + y iT S pl−1 y i .

(19)

Then, y is assigned to the group in which Di2 ( y ) achieves a minimum value. On the other hand, when Σ1 =Σ 2 = ⋅⋅⋅ =Σ k for these RBC groups is not satisfied, the observation RBC feature y tends to be classified too frequently into groups whose covariance matrices have a larger variance on the diagonal, as indicated in Eq. (19). Therefore, the use of the linear classification function for RBC classification is not recommended when the covariance matrices are not equal. In this case, the nonlinear quadratic classification method is used for the RBC classification [29]: Di2 (y ) = ( y − y i ) S i−1 ( y − y i ) , T

(20)

where Si and y i are the covariance matrix and the mean vector of the ith RBC group, respectively, and y is the input RBC feature vector. Equation (20) is a quadratic function and cannot be reduced to a linear function of y, as was the case for Eq. (16). Similar to the linear classification function, the RBC feature y, of an unknown group, is assigned to the group for which Di2 (y ) in Eq. (20) is the smallest. Figure 3 is the flowchart for the design of the classifier and the classification of the unknown input RBC. First, the classifier is designed using the RBC training data with the RBC class labeled and then the designed classifier is used to classify the input RBC that is in an unknown group. 1: Training

RBC Phase Image

RBC Phase Image

RBC Segmentation

RBC Segmentation

Gabor Wavelet Filter

Gabor Wavelet Filter Classifier

Feature Extraction

Yes

Feature Analysis Equality Check of Covariance Matrices

=?

No

Feature Extraction Linear Classification Functions

Feature Reduction

Nonlinear Classification Functions

2: Testing

Fig. 3. Flowchart of the design of the RBC classification method.

4. Experimental results

All the RBCs used in our experiment were obtained from healthy blood donors stored in a transfusion bag at the Laboratoire Suisse d’Analyse Du Dopage, CHUV. RBC stock solution (100–150 μL) was mixed with a HEPA buffer (15 mM HEPES, pH 7.4, 130 mM NaCl, 5.4 mM KCl, 10 mM glucose, 1 mM CaCl2, 0.5 mM MgCl2, and 1 mg/mL bovine serum albumin) at 0.2% hematocrit and stored at 4 °C. The experimental chamber included two cover slips separated by 1.2-mm spacers and was filled with 150 μL of HEPA buffer diluted with 4 μl of the RBC suspension. The cells were incubated for 30 min at 37 °C before the chamber was mounted for off-axis DHM imaging. All experiments were performed in a room

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2394

kept at 22°C. The RBC phase images were numerically reconstructed from the RBC holograms recorded by the off-axis DHM using the computational numerical algorithm presented in [21, 22]. Figure 4 shows three reconstructed RBC phase images [2, 19].

Fig. 4. Three reconstructed RBC phase images.

To design the RBC classifier, the RBCs are segmented and then each RBC is extracted from the segmented RBC phase images. Figure 5 shows segmented RBC phase images and the inner part of the RBCs obtained with the marker-controlled watershed transform algorithm [2, 5, 19]. Our segmentation algorithm can work well on phase images with several RBCs slightly connected [19].

Fig. 5. Segmented RBC phase images. Top row: segmented whole RBCs corresponding to the RBC phase images in Fig. 4. Bottom row: segmented inner part of the RBCs corresponding to the RBC phase images in Fig. 4.

A biologist expert selected samples of the three typical RBC shapes for training and testing: (1) stomatocyte-shaped RBCs (117 samples), which have a narrow central part, (2) discocyte-shaped RBCs (105 samples), which have a relatively circular area, and (3) echinocyte-shaped RBCs (100 samples), which have no apparent central region. Figure 6 shows examples of some of the extracted RBC samples. The Gabor wavelet filter was applied to each RBC before the RBC features were measured. Figure 7 shows some of the filtered RBC images resulting from averaging all of the Gabor coefficients obtained by changing the Gabor kernel frequency from 0.1 to 0.4 in steps of 0.1 and changing the rotation angle of the Gabor kernel from 0° to 180°in 30° steps at each kernel frequency for the rotation-invariant property [27]. The sharpness of the Gaussian major and minor axes [α and β in Eq. (1)] were set at 2 and 1, respectively [27].

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2395

Fig. 6. Examples of the three types of RBCs.

Fig. 7. The Gabor-wavelet-filtered RBC images. Top row: extracted RBCs. Bottom row: corresponding Gabor-wavelet-filtered RBCs.

Next, features F1–F10 and F11–F14 were measured in the single RBC and the inner part of the RBC on the Gabor-wavelet-filtered RBC image, respectively. Four random values generated from a standard normal distribution were assigned to F11–F14 for the RBCs in which the inner part was not detected. Next, the importance of each feature was analyzed, i.e., the features not helpful in distinguishing the three RBC groups were screened out. Each variable was checked using the stepwise selection method described in Sec. 3.4. Table 2 presents the results of the analysis with α = 0.05. The null hypothesis for this test was that the variable was redundant or not important for separating the RBC groups. Ten features (F1, F2, F4, F5, F7, F9, F11, F12, F13, and F14) in Table 2 were not redundant and were good features for designing the RBC classifier. Table 2. Results of RBC Feature Analysis Using Stepwise Selection Method Features

p value (significance α = 0.05)

Decision

−8

Keep Keep

F3

5.9818 × 10 1.9109 × 10−5 0.0616

F4

3.2960 × 10−4

Keep

F5

Keep

F6

2.0473 × 10−18 0.9999

F7

0.0123

Keep

F1 F2

Remove

Remove

F8

0.8489

Remove

F9

0.0301

Keep

F10

0.7641

Remove −13

F11

8.5389 × 10

F12

1.9566 × 10−4

Keep

F13

3.6225 × 10−81 0.0036

Keep

F14

Keep

Keep

Figure 8 presents four scatterplots for selected pairs of features in which the separation of the three types of RBCs is clearly seen. Figure 8(c) and Table 2 show that F13 contributes to the RBC classification in a dominant way.

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2396

30

28

(a)

(b) 25

24

20

22

15

F2

F11

26

20

10

18

5 Discocyte RBCs Stomatocyte RBCs Echinocyte RBCs

16 14 15

20

25

30

35

40

45

50

55

Discocyte RBCs Stomatocyte RBCs Echinocyte RBCs

0 -5 20

60

30

40

140

50

60

70

F5

F1 20

(c)

(d)

120 15

100 10

F12

F13

80 60

5

40 20

0

Discocyte RBCs Stomatocyte RBCs Echinocyte RBCs

0 -20 20

30

40

50

60

70

-5 -5

Discocyte RBCs Stomatocyte RBCs Echinocyte RBCs 0

5

F5

10

15

20

25

30

F11

Fig. 8. Scatterplots of selected pairs of features for the three types of RBCs: (a) scatterplot for features F1 and F2, (b) scatterplot for features F5 and F11, (c) scatterplot for features F5 and F13, and (d) scatterplot for features F11 and F12.

The features retained after the stepwise selection step were used to design the RBC classifier. However, choosing a linear or a nonlinear RBC classification method depends on the equality of the covariance matrices of the three RBC types. Therefore, multivariate tests of the equality of the covariance matrices using the remaining ten features were performed with the null hypothesis that the covariance matrices for the three kinds of RBCs are the same ( H 0 : Σ1 =Σ 2 =Σ3 ). The χ2 test was applied to the ten useful features, where χ2 = 4035 and the critical value was 162 at significance level 0.05. However, the equivalent p value for the null hypothesis was approximately 0 at α = 0.05; therefore, the null hypothesis was rejected because the covariance matrices for the three types of RBC were significantly different. This led to the adoption of the following three nonlinear discriminant functions for the three classes of RBC: Stomatocyte RBC : L1 ( y ) = ( y − y1 ) S1−1 ( y − y1 ) , T

Discocyte RBC :

L2 ( y ) = ( y − y2 ) S−21 ( y − y2 ) ,

Echinocyte RBC :

L3 ( y ) = ( y − y3 ) S3−1 ( y − y3 ) ,

T

(21)

T

where S1, S2, and S3 are the covariance matrices and y1 , y2 , and y3 are the mean vectors of the three types of RBC, and y is feature vector of the input test RBC, the group of which is unknown. Finally, we performed a holdout procedure, also called cross validation [29], to estimate the error rates of our proposed RBC classification method based on individual cell level. In this cross-validation method, data from only one observation are tested while data from the

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2397

other observations are used to compute the classifier. This process is repeated using different observation data so eventually each observation is tested. For example, for observation data of size N = 117 + 105 + 100 = 322, each observation is classified by a classifier designed from the data of the other 321 observations. Thus, the testing data are not used to design the classifier of that data. This improves the estimation of the error because the data used to form the classifier are not used to evaluate the error [29]. Table 3 presents the classification result and rate of misclassification for the three types of RBC. Table 3. Classification of the Three Types of RBCs Using the Holdout Procedure Predicted group Actual group

Number of observations

Stomatocyte RBC

Discocyte RBC

Echinocyte RBC

Misclassification rate (%)

Stomatocyte RBC

117

117

0

0

0

Discocyte RBC

105

1

104

0

0.95

Echinocyte RBC

100

0

0

100

0

Table 3 shows that the misclassification rates of our RBC classification scheme were approximately 0, an excellent classification result. Finally, we conducted the 3D RBC classification method in [2] using the same training and testing RBC samples used for our method for comparison to show that our statistical RBC classification method is superior. Table 4 gives the results obtained with the classification method in [2]. Comparison of the misclassification rates in Tables 3 and 4 shows that the RBC classification method presented in this paper is superior to that in [2]. Table 4. Results of the Classification of the Three Types of RBCs Using Method in [2] Predicted group Actual group

Number of observations

Stomatocyte RBC

Discocyte RBC

Echinocyte RBC

Misclassification rate (%)

Stomatocyte RBC

117

111

6

0

5.13

Discocyte RBC

105

1

104

0

0.95

Echinocyte RBC

100

0

2

98

2.00

We also used the receiver operating characteristic (ROC) curve [39] to compare our proposed method with that presented in [2]. The average area under the curve (AUC) for the proposed RBC classification algorithm was 0.9984 and that for the scheme in [2] was 0.9793. Thus, from the AUC values and the results in Tables 3 and 4, we conclude that the presented RBC classification approach is better than that in [2]. To show that the features extracted from Gabor-wavelet-filtered images can contribute to the accuracy of RBC classification, the same classification procedures without using the Gabor wavelet filters are conducted and the misclassification rates are measured to be 0.85%, 2.86%, and 1.00% for stomatocyte, discocyte, and echinocyte RBC, respectively. Comparing these misclassification rates with those in Table 3, it demonstrates that the Gabor wavelet filter can be helpful to the RBC classification since the misclassification rates are increased without using Gabor wavelet filtering. In addition, we have verified that the removal of redundant features using stepwise selection analysis is beneficial to the final RBC classification because the misclassification rates can reach 5.98%, 3.81%, and 1.00% for the stomatocyte, discocyte, and echinocyte RBC when there are no feature selection analysis in the classification procedure. 5. Conclusion

We have proposed a statistical method for classifying RBCs based on 3D morphology and off-axis digital holographic microscopy. The RBC phase images are numerically reconstructed from digital holograms while the single cell and the inner part of the cell are

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2398

segmented using a marker-controlled watershed transform algorithm. Fourteen features are determined from the Gabor-wavelet-filtered RBC. Features that do not help in distinguishing the RBC groups are removed using a statistical stepwise selection scheme, and only the remaining features are used for RBC classification. We designed a linear or nonlinear classification method based on the multivariate χ2 test of the equality of covariance matrices of the different RBC groups. Experimental results showed that the proposed method yields good classification results, with a misclassification rate of approximately zero. In addition, the classification results show that the Gabor wavelet filter contributes to the RBC classification. The proposed classification algorithm can be used to analyze RBC-related diseases and screen compounds during RBC-related drug testing. Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), which is funded by the Ministry of Science, ICT & Future Planning (NRF-2015K1A1A2029224). B. Javidi acknowledges support under NSF ECCS 1545687. We thank Daniel Boss and Pierre Marquet from Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, for their help with the experiments.

#262009 (C) 2016 OSA

Received 28 Mar 2016; revised 22 May 2016; accepted 23 May 2016; published 25 May 2016 1 June 2016 | Vol. 7, No. 6 | DOI:10.1364/BOE.7.002385 | BIOMEDICAL OPTICS EXPRESS 2399

Cell morphology-based classification of red blood cells using holographic imaging informatics.

We present methods that automatically select a linear or nonlinear classifier for red blood cell (RBC) classification by analyzing the equality of the...
2MB Sizes 0 Downloads 11 Views