Original Article

Automatic recognizing of vocal fold disorders from glottis images

Proc IMechE Part H: J Engineering in Medicine 2014, Vol. 228(9) 952–961 © IMechE 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0954411914551851 pih.sagepub.com

Chang-Chiun Huang1, Yi-Shing Leu2, Chung-Feng Jeffrey Kuo3, Wen-Lin Chu3, Yueng-Hsiang Chu4 and Han-Cheng Wu3

Abstract

The laryngeal video stroboscope is an important instrument for examining glottal diseases: physicians use it to read vocal fold images and assess voice quality for clinical diagnosis. This study aims to develop a medical system that automatically and intelligently recognizes dynamic images. The static images of the glottis opening to the largest extent and closing to the smallest extent were screened automatically using color space transformation and image preprocessing, and the glottal area was quantized. Because tongue base movements affect the position of the laryngoscope and saliva results in unclear images, this study used a gray scale adaptive entropy value to set the threshold of an elimination system. The proposed system improves the automatic capture of glottis images and achieves an accuracy rate of 96%. In addition, the glottal area and the area segmentation threshold were calculated effectively, the glottis area segmentation was corrected, and the glottal area waveform pattern was drawn automatically to assist in vocal fold diagnosis. For the intelligent recognition system for vocal fold disorders, this study analyzed the characteristic values of four vocal fold patterns, namely, normal vocal fold, vocal fold paralysis, vocal fold polyp, and vocal fold cyst, and used a support vector machine classifier to identify them, achieving an identification accuracy rate of 98.75%. The results can serve as a valuable reference for diagnosis.

Keywords Laryngeal video stroboscope, glottis physiological parameters, digital image processing, laser projection marking module, glottal area, dynamic image search

Date received: 18 July 2013; accepted: 19 August 2014

Introduction

Literature review

Computer vision techniques can reveal what is invisible to the human eye and thereby enhance diagnostic efficiency. The human glottis opens to the largest extent when individuals breathe with maximal effort and closes to the smallest extent when they phonate. At present, vocal fold examination is mainly performed with the laryngeal video stroboscope, and vocal fold disorders are judged from the observed dynamic images. The laryngeal video stroboscope uses an automatic-frequency electronic flash to capture video of the throat, so that physicians are able to identify diseases in the glottal area from the laryngeal video.1,2 However, physicians manually select the static images of the glottis opening to the largest extent and closing to the smallest extent from the video clips recorded by the laryngeal video stroboscope for diagnosis. In the medical engineering field, research on using computer vision technologies to develop automated recognition systems for vocal fold disorders and related medical equipment is growing rapidly.

Analysis of dynamic images can provide more information than analysis of static images. The relationship between the glottal area and time in dynamic images has great reference value in medical diagnosis.3 Eysholdt et al.4 integrated high-speed glottography (HGG) with image-processing technology and quantified the vibrations of the vocal folds to find the motion curves of the vocal folds

1 Department of Materials Science and Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
2 Mackay Medical College, Mackay Memorial Hospital, Taipei, Taiwan
3 Graduate Institute of Automation and Control, National Taiwan University of Science and Technology, Taipei, Taiwan
4 Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan

Corresponding author: Chung-Feng Jeffrey Kuo, Graduate Institute of Automation and Control, National Taiwan University of Science and Technology, Taipei 106, Taiwan. Email: [email protected]

Downloaded from pih.sagepub.com at NORTH CAROLINA STATE UNIV on March 14, 2015


before identifying the actuation parameters of the vibration by computer simulation. Döllinger et al.5 obtained vocal fold motion curves by image processing, identified the relevant physiological parameters, and discussed the differences between normal vocal folds and the vocal folds of patients with voice disorders on the basis of a mechanical model. Yan et al.6 developed an automated detection system that locates the glottis and distinguishes normal from disordered voices. Skalski and Zielinski7 analyzed vocal fold records obtained by high-speed videoendoscopy (HSV), employed the level set method for glottis edge segmentation, and compared the displacement vectors of the vocal fold vibrations. Yan et al.8 employed an adaptive image segmentation method for glottis edge detection and glottal area calculation and analyzed the signals according to the waveforms of the opening and closing of the glottis; in this way, the glottis waveforms could be analyzed and traced to display the status of the vocal folds. Verikas et al.9 used a co-occurrence matrix and Gabor filters to segment the glottis location and area and fed color, texture, and geometric characteristic values into a support vector machine (SVM) classifier to distinguish healthy from disordered vocal folds. Méndez et al.10 established a vocal fold recognition module based on the opening angle and the closed glottis area to assist experts in objective diagnosis. Bresch and Narayanan11 applied unsupervised area segmentation to two-dimensional (2D) magnetic resonance imaging (MRI) images of the human upper respiratory tract in order to calculate the vocal fold contour and the location of articulation automatically. Marendicl et al.12 proposed a high-speed digital image capture technology to calculate and describe the main contours of vocal fold vibrations; the algorithm could be integrated with local spatial and temporal information of the glottis to increase the judgment dimensions. Allin et al.13 used the laryngeal video stroboscope to obtain information on the vocal folds, employed a color transformation for optimal recognition, and, using the Fisher linear discriminant, distinguished the trachea from the vocal folds to segment the vocal fold edges and detect the glottis contours effectively. Yan et al.6 applied a new automatic tracing method for glottis motion to track the opening and closing of the glottis. This method used a global threshold and morphological operations to identify the contours of the opening and closing glottis, as well as the glottal area waveforms (GAWs), and can be applied to the clinical diagnosis of voice disorders. Méndez et al.14 applied image processing to segment the glottal space automatically, defined disorder symptoms according to the area, and summarized the relevant parameters in a database as the basis for diagnosing vocal fold disorders. Chen et al.15 proposed a threshold segmentation method for the glottis, coupled with morphological operators and a region growing method, to detect glottis contours; the GAW was then calculated to identify normal and disordered vocal folds. Zorrilla and Zapirain16 proposed an image-processing method that integrates Gabor filtering with Chan-Vese segmentation to determine the glottis location by using different segmentation methods and to distinguish normal vocal folds from vocal fold nodules. Behroozmand and Almasganj17 applied two artificial intelligence technologies, namely, neural networks and SVM, to vocal fold detection. Integrated with image-processing technology, the system can classify three types of vocal fold pathology: vocal fold edema, vocal fold nodules, and vocal fold polyps. The color, structure, and geometric parameters of vocal fold images were used as the characteristics of the vocal fold, and the SVM was applied to identify normal vocal folds, vocal fold nodules, and vocal fold diffusion. Bacauskiene et al.18 used SVM to identify the glottis images and sound features and employed a genetic algorithm to search for its correctness.

Based on the above, assistive devices are required for the clinical diagnosis of the various physical characteristics of the glottis. At present, the capture of glottis images relies on manual selection of images from video clips. Blurred, poor-quality images may lead to judgment errors. To reduce human error in selection and the time needed to capture dynamic images, more accurate and direct glottis detection methods are necessary. The main objective of this study is to process the dynamic glottis images recorded by the laryngeal video stroboscope so as to automatically capture, from the video clips, the images of the glottis opening to the largest extent and closing to the smallest extent and to depict the dynamic GAW automatically. Moreover, the SVM is employed to develop an intelligent recognition system for vocal fold disorders from dynamic glottis images.

Research method

This study proposed an algorithm for the dynamic images of the glottis recorded by the laryngeal video stroboscope and conducted experiments. The videos were in audio video interleave (AVI) format at 30 frames/s. The method is divided into three parts. The first part automatically selects, from the dynamic glottis images (video), the images of the glottis opening to the largest extent and closing to the smallest extent. The second part conducts area segmentation on the images of the glottis opening to the largest extent, labels the blocks of largest area by image labeling, and records the detection results for the entire video. The third part integrates the SVM classifier to establish the identification system for vocal fold disorders. The process is shown in Figure 1.


Figure 1. Research flowchart. SVM: support vector machine; GAW: glottal area waveform.

Automatic selection of images of glottis opening to the largest extent and closing to the smallest extent from the video

The automatic selection of images of the glottis opening to the largest extent and closing to the smallest extent from the video can be divided into the following steps.

Input video. The video used in this study was stored in a dynamic image format at 30 frames/s, with a frame size of 320 × 240. This study captured one image every 15 frames for subsequent processing, as shown in Figure 2.

Image tag. In order to display the original images of the glottis opening to the largest extent and closing to the smallest extent, the image codes shown in Figure 2 were tagged to facilitate the follow-up retrieval of the original images, as referred to by Bierbraurer.19

Create template. In order to quickly find images of the glottis opening to the largest extent and closing to the smallest extent, this study created templates to facilitate similarity comparison, as shown in Figure 3. The self-established templates were selected from images of the glottis opening to the largest extent and closing to the smallest extent and took into account whether the anterior thyroid notch was captured in the video. This study used templates with and without the anterior thyroid notch to calculate the similarity measure, and these vocal fold templates were used for the similarity computation on different samples. Based on the glottis color characteristics, this study conducted image processing on the established templates and on each frame captured from the video by using two different methods.

Image preprocessing 1. To improve the similarity computation results for images of the glottis opening to the largest extent, this method used a weighted calculation on the components of the red, green, and blue (RGB) and hue, saturation, and intensity (HSI) color spaces to highlight the features for gray scale processing. An automatic threshold value was employed for binarization.

Image preprocessing 2. To improve the similarity computation results for images of the glottis closing to the smallest extent, this method used the R component of RGB to highlight the features for gray scale processing. An automatic threshold value was employed for binarization.

The similarity computation used the correlation coefficient to find the images of the glottis opening to the largest extent and closing to the smallest extent. The equation is as follows

$$ r(x, y) = \frac{\sum_{s=0}^{K} \sum_{t=0}^{J} w(s, t)\, f(x + s, y + t)}{\left[ \sum_{s=0}^{K} \sum_{t=0}^{J} w^{2}(s, t) \sum_{s=0}^{K} \sum_{t=0}^{J} f^{2}(x + s, y + t) \right]^{1/2}} \tag{1} $$

where w(s, t) is the template image, f(x, y) is the vocal fold image, and K and J are the template image sizes. After the similarity computation with the templates, the comparison results were saved as a matrix [N1, N2, N3, ..., Nn], where n depends on the length of each video. This study then selected the top three similarity values from the comparison results and used the following equation to locate the corresponding original images

$$ O_{image} = (\mathrm{Tag} - 1) \times \mathrm{Frame\_number} + 1 \tag{2} $$

where O_image is the original image, Tag is the tag number, and Frame_number is the frame number. The top three original images in the similarity ranking selected by the above equation are shown in Figure 4.

Figure 2. Images captured every 15 frames.

Figure 3. Self-established templates for glottis opening and closing. Panels: glottis opening, Template 1 (without the anterior thyroid notch) and Template 2 (with the anterior thyroid notch); glottis closing, without and with the anterior thyroid notch; and the top three similar images found by each of Templates 1 to 4.

Eliminate unclear images. The videos may capture non-vocal-fold content or unclear frames; the gray scale Shannon entropy value was therefore applied to distinguish clear from unclear images, with an elimination system based on a given threshold. The concept of entropy is to define the level of variation in information.

Figure 4. Images of the top three original images of similarity ranking.
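The similarity search behind the ranking in Figure 4 can be illustrated with a minimal sketch of equations (1) and (2). This is not the authors' MATLAB code: the function names, the tiny binarized arrays, and the choice to take the best correlation over all template placements are assumptions made for illustration, and `frame_step=15` assumes that Frame_number in equation (2) is the 15-frame sampling interval.

```python
from math import sqrt

def correlation(template, image, x, y):
    """Equation (1): correlation between template w and the image patch at (x, y)."""
    num = w2 = f2 = 0.0
    for s, row in enumerate(template):
        for t, w in enumerate(row):
            f = image[x + s][y + t]
            num += w * f
            w2 += w * w
            f2 += f * f
    denom = sqrt(w2 * f2)
    return num / denom if denom > 0 else 0.0  # all-zero patch: define r = 0

def best_score(template, image):
    """Largest r(x, y) over every placement of the template inside the image."""
    k, j = len(template), len(template[0])
    return max(correlation(template, image, x, y)
               for x in range(len(image) - k + 1)
               for y in range(len(image[0]) - j + 1))

def top_tags(scores, count=3):
    """Tags (1-based positions) of the `count` sampled frames with the highest similarity."""
    order = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return [n + 1 for n in order[:count]]

def original_frame(tag, frame_step=15):
    """Equation (2): index of the original video frame behind a tagged sample."""
    return (tag - 1) * frame_step + 1

# Hypothetical binarized data: one 2x2 template and four sampled 3x3 frames.
template = [[0, 1], [1, 1]]
frames = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 1]],
    [[0, 1, 0], [1, 1, 0], [0, 0, 0]],  # contains the template exactly
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 0]],
]
scores = [best_score(template, frame) for frame in frames]
tags = top_tags(scores)
print(tags, [original_frame(tag) for tag in tags])
```

Because the numerator and denominator scale together, the score does not depend on overall brightness, which is one reason a normalized correlation suits binarized or gray scale templates.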

The mathematical equations for the calculation on gray scale images are as follows

$$ P_i = N_i / N_s \tag{3} $$

$$ H = -\sum_{i=1}^{n} P_i \log P_i \tag{4} $$


Figure 5. Various unclear images.

where N_i is the number of pixels with gray level i in the image, N_s is the total number of pixels in the image, P_i is the probability of occurrence of gray level i, and H is the Shannon entropy. A fixed threshold is not an objective criterion: in some cases normal images may be eliminated, while unclear images such as those shown in Figure 5 may be retained. Hence, for each template, this study took the entropy of the image with the highest similarity and subtracted a fixed threshold from it to obtain a new threshold. Adaptive entropy judgment was established on the basis of this new threshold, which is more comprehensive than a single fixed entropy threshold. This study used four templates, so there were four adaptive entropy values for the elimination system.

Identify original images of glottis opening to the largest extent and closing to the smallest extent. The adaptive entropy judgment eliminates the unclear images and avoids errors in the follow-up identification. This study applied two different image-processing methods to the clear images.

Image preprocessing 3. This method calculates the glottis opening to the largest extent by using a weighted operation on the components of the RGB and HSI color spaces to filter out non-glottis areas, followed by binarization after gray scale transformation.

Image preprocessing 4. This method calculates the glottis closing to the smallest extent by using a weighted operation on the components of the RGB and HSI color spaces to filter out non-glottis areas, followed by binarization after gray scale transformation.
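The entropy-based elimination step (equations (3) and (4) with the adaptive threshold) can be sketched as follows. Several details are assumptions, since the text does not state them: the logarithm base (base 2 here), the size of the fixed value subtracted from the reference entropy (`margin`), and the direction of the test (frames are kept when their entropy is at or above the adaptive threshold). The function names and the 2 × 2 arrays are hypothetical.

```python
from collections import Counter
from math import log2

def gray_entropy(image):
    """Equations (3) and (4): Shannon entropy of the gray-level histogram."""
    counts = Counter(pixel for row in image for pixel in row)  # N_i per gray level
    total = sum(counts.values())                               # N_s
    return -sum((n / total) * log2(n / total) for n in counts.values())

def adaptive_threshold(reference_image, margin):
    """Entropy of the best-matching image for a template, minus a fixed margin."""
    return gray_entropy(reference_image) - margin

def is_clear(image, threshold):
    """Assumed rule: keep a frame whose entropy reaches the adaptive threshold."""
    return gray_entropy(image) >= threshold

# Hypothetical 2x2 gray scale frames.
reference = [[0, 1], [2, 3]]   # best match for one template, entropy 2 bits
blank = [[7, 7], [7, 7]]       # a single gray level, entropy 0 bits
half = [[0, 0], [1, 1]]        # two equally likely gray levels, entropy 1 bit
threshold = adaptive_threshold(reference, margin=0.5)
kept = [is_clear(frame, threshold) for frame in (reference, blank, half)]
print(threshold, kept)
```

With four templates, one threshold per template would be computed in the same way, which gives the four adaptive entropy values mentioned in the text.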

Figure 6. Processed images. Panels: the images found by Templates 1, 2, 3, and 4, each after image processing.

The glottis location highlighted by the above image-preprocessing methods is shown in Figure 6. The six images of the glottis opening to the largest extent and the six images of the glottis closing to the smallest extent were evaluated by pixel count. For the six similar images identified by Template 1 and Template 2, this study calculated the white area after image processing; the image with the largest white area was taken as the image of the glottis opening to the largest extent. Likewise, this study calculated the white area of the six similar images identified by Template 3 and Template 4 after image processing; the one with the largest white area was taken as the image of the glottis closing to the smallest extent. Then, the tagging method was applied to automatically find the images of the glottis opening to the largest extent and closing to the smallest extent, as shown in Figure 7.

Figure 7. Automatically selected images of glottis opening to the largest extent and closing to the smallest extent.

Automatic depiction of dynamic GAW

As the vocal fold condition is related to the cycle of glottis opening and closing, this study determined the glottis region for segmentation by labeling regions with the eight-connection method and using the labeled area to decide whether a region is background noise. Furthermore, the location of the glottis in the video was selected for area segmentation according to the block with the largest labeled area in the video. The results of area segmentation are shown in Figure 8. This study used the video time and the corresponding glottal area to draw the GAW diagram; the normal vocal fold GAW diagram is shown in Figure 9.

Figure 8. Confirmation and separation of the vocal fold location: (a) glottis opening, (b) image labeling, and (c) region separation.

Figure 9. Normal vocal fold GAW diagram. GAW: glottal area waveform.

Using SVM for disorder classification

This study used the gray scale co-occurrence matrix, a statistical method of texture analysis,20 to obtain characteristics. The gray scale co-occurrence matrix summarizes pixel distribution probability statistics in matrix form to examine the relationship between the gray scale values of pairs of pixels separated by a given distance and direction,21 and can thus describe the geometric distribution of image gray levels. This study aimed to recognize four types of vocal fold condition, namely, normal vocal fold, vocal fold paralysis, vocal fold polyp, and vocal fold cyst. The characteristic values were taken from the gray level co-occurrence matrix and the area of the glottis opening to the largest extent. The characteristic values of the co-occurrence matrix are as follows:

1. Correlation

Correlation reflects the level of linear dependence of the gray scale values of the image. A higher correlation suggests a linear relationship between the gray scale values of pixels, which reflects the direction of texture in the image. Its mathematical equation is shown in equation (5)

$$ \mathrm{Correlation} = \frac{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} i \times j \times P_d(i, j) - \mu_x \mu_y}{\sigma_x^{2} \sigma_y^{2}} \tag{5} $$

where

$$ \mu_x = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} i\, P_d(i, j) \tag{6} $$

$$ \mu_y = \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} j\, P_d(i, j) \tag{7} $$

$$ \sigma_x = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (i - \mu_x)^{2}\, P_d(i, j) \tag{8} $$

$$ \sigma_y = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (j - \mu_y)^{2}\, P_d(i, j) \tag{9} $$

μx, μy and σx, σy are the row and column averages and standard deviations of matrix Pd, respectively.

2. Entropy

Entropy reflects the complexity of the texture of the image, or the degree of disorder in the arrangement of the elements of the gray scale co-occurrence matrix. Therefore, the more complex the texture, the larger the entropy value. The mathematical equation is shown in equation (10)

$$ \mathrm{Entropy} = -\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} P_d(i, j) \log\left(P_d(i, j)\right) \tag{10} $$

3. Homogeneity

Homogeneity reflects the level of concentration of the elements of the gray scale co-occurrence matrix around the major diagonal. When the non-zero elements concentrate around the major diagonal, the homogeneity value is larger, indicating a higher level of homogeneity. Its mathematical equation is shown in equation (11)

$$ \mathrm{Homogeneity} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \frac{P_d(i, j)}{1 + (i - j)^{2}} \tag{11} $$

where Pd is the normalized gray scale co-occurrence matrix, given by equation (12)

$$ P_d = \frac{1}{\sum_{i=0}^{255} \sum_{j=0}^{255} P(i, j)}\, P \tag{12} $$

4. Contrast

Contrast reflects the concentration of the elements of the gray scale co-occurrence matrix around the major diagonal. If the contrast value distribution is even, it indicates the existence of texture in the image. Conversely, a smaller contrast value indicates a lower level of contrast in the image and a smaller gray scale variation between two pixels. The mathematical equation is shown in equation (13)

$$ \mathrm{Contrast} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (i - j)^{2}\, P_d(i, j) \tag{13} $$

Figure 10. GAW diagrams of vocal folds: (a) tumor, (b) polyp, (c) papilloma, and (d) cyst. GAW: glottal area waveform.

Figure 11. Distributed data diagram of characteristic values: (a) class, (b) contrast (0°), (c) contrast (45°), (d) contrast (90°), (e) contrast (135°), (f) correlation (0°), (g) correlation (45°), (h) correlation (90°), (i) correlation (135°), (j) entropy (0°), (k) entropy (45°), (l) entropy (90°), (m) entropy (135°), (n) homogeneity (0°), (o) homogeneity (45°), (p) homogeneity (90°), (q) homogeneity (135°), and (r) the largest glottis opening area.
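The four co-occurrence features, computed at 0°, 45°, 90°, and 135° and combined with the largest glottal area, give the 17 characteristic values. A minimal sketch under stated assumptions is given below: pixel pairs are taken at a distance of one pixel and counted in one direction only (a non-symmetric matrix), the entropy uses the natural logarithm, and the correlation is normalized by the product of the standard deviations (the usual Haralick form), which may differ in scaling from the expression printed in equation (5). The function names and the 4 × 4 test image are hypothetical.

```python
from math import log, sqrt

# (row, column) offsets for the four directions, at a distance of one pixel
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, levels, angle=0):
    """Normalized co-occurrence matrix P_d (equation (12)) for one direction."""
    dr, dc = OFFSETS[angle]
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]

def features(p):
    """Correlation, entropy, homogeneity, and contrast of a normalized matrix."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    mu_x = sum(i * v for i, j, v in cells)
    mu_y = sum(j * v for i, j, v in cells)
    var_x = sum((i - mu_x) ** 2 * v for i, j, v in cells)
    var_y = sum((j - mu_y) ** 2 * v for i, j, v in cells)
    cov = sum(i * j * v for i, j, v in cells) - mu_x * mu_y
    spread = sqrt(var_x * var_y)  # product of the two standard deviations
    return {
        "correlation": cov / spread if spread > 0 else 0.0,
        "entropy": -sum(v * log(v) for i, j, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
    }

def feature_vector(image, levels, area):
    """16 texture values (4 features x 4 directions) plus the glottal area."""
    vector = []
    for name in ("contrast", "correlation", "entropy", "homogeneity"):
        vector += [features(glcm(image, levels, angle))[name] for angle in OFFSETS]
    return vector + [area]

# Hypothetical 4-level image and a made-up area value in pixels.
image = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
f0 = features(glcm(image, levels=4, angle=0))
vector = feature_vector(image, levels=4, area=120.0)
print(f0, len(vector))
```

For 8-bit images `levels` would be 256, matching the summation limits of equation (12); a small number of levels is used here only to keep the example readable.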

Discussion

To evaluate the performance of the proposed algorithm for the automatic recognition of vocal fold disorders, the algorithm was implemented in MATLAB. The detected and depicted diagrams can serve as reference indicators as well as the basis for distinguishing normal vocal folds from vocal fold disorders. The glottal images used in this experiment included images of vocal fold tumor, vocal fold polyp, vocal fold papilloma, and vocal fold cyst. The glottal closing, opening, and GAW diagrams are shown in Figure 10. The GAW diagrams of vocal fold disorders can help physicians diagnose disorders.

The characteristic values captured for the four types include the gray scale co-occurrence matrix contrast (0°, 45°, 90°, and 135°), correlation (0°, 45°, 90°, and 135°), entropy (0°, 45°, 90°, and 135°), homogeneity (0°, 45°, 90°, and 135°), and the area of the glottis opening to the largest extent. The data of the 17 characteristic values are shown in Figure 11, which can be used to check the input characteristics, characteristic values, and original sample tags. The sample and tag matching diagram is in the upper left corner, and the remaining panels illustrate the samples of characteristic values 1–17 and the corresponding attribute values. As seen, the range of the area characteristic value is relatively broad, and the ranges of the other characteristic values are comparatively small.

To assist physicians in recognizing vocal fold disorders, this study recognizes four vocal fold types, namely, normal vocal fold, vocal fold paralysis, vocal fold polyp, and vocal fold cyst. In vocal fold paralysis, the vocal folds cannot open or close properly, which causes difficulty in swallowing or allows food to enter the lungs. The recognized types did not include tumor and papilloma, mainly because tumor and papilloma samples are rare in practice. For reliability, this study used the SVM to recognize normal vocal fold, vocal fold paralysis, vocal fold polyp, and vocal fold cyst. For SVM training and classification,22 the training samples included 162 samples of normal vocal fold, 186 samples of vocal fold paralysis, 234 samples of vocal fold polyp, and 84 samples of vocal fold cyst, totaling 666 training samples. The training samples must be independent and distinct so that the SVM can find the optimal hyperplane, and the testing samples are used to verify the recognition accuracy of the algorithm. The testing samples included 68 samples of normal vocal fold, 80 samples of vocal fold paralysis, 108 samples of vocal fold polyp, and 64 samples of vocal fold cyst, totaling 320 testing samples; the recognition rate was 98.75%. The results can serve as a valuable reference for diagnosis.
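The train-then-test procedure described above can be illustrated with a small stand-in classifier. The text does not state the kernel or the multi-class scheme used in the MATLAB implementation, so the sketch below swaps in a linear soft-margin SVM trained with the Pegasos stochastic sub-gradient method, combined one-vs-rest over the four classes. The data are synthetic two-dimensional points rather than the 17 characteristic values, and all names and parameter values (`lam`, `epochs`, the cluster centers) are hypothetical.

```python
import random

CLASSES = ["normal", "paralysis", "polyp", "cyst"]

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_binary_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularized hinge loss.
    Each row of X already ends with a constant 1.0, so the bias is part of w."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)                  # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1.0      # margin check with current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def train_one_vs_rest(X, labels):
    """One binary SVM per class: that class against the other three."""
    Xb = [list(x) + [1.0] for x in X]
    return {c: train_binary_svm(Xb, [1.0 if l == c else -1.0 for l in labels])
            for c in CLASSES}

def predict(models, x):
    """Pick the class whose binary SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(CLASSES, key=lambda c: dot(models[c], xb))

def synthetic_samples(per_class, seed):
    """Made-up, well-separated clusters standing in for real feature vectors."""
    rng = random.Random(seed)
    centers = {"normal": (3.0, 3.0), "paralysis": (3.0, -3.0),
               "polyp": (-3.0, 3.0), "cyst": (-3.0, -3.0)}
    X, labels = [], []
    for name in CLASSES:
        cx, cy = centers[name]
        for _ in range(per_class):
            X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            labels.append(name)
    return X, labels

X_train, y_train = synthetic_samples(30, seed=1)
X_test, y_test = synthetic_samples(10, seed=2)
models = train_one_vs_rest(X_train, y_train)
predictions = [predict(models, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"accuracy on synthetic test samples: {accuracy:.2%}")
```

With real data, the features would normally be rescaled first, since the area value spans a much broader range than the texture values. For scale, the reported recognition rate of 98.75% on 320 testing samples corresponds to 316 correctly recognized samples.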

Conclusion

This study proposed an algorithm for the automated recognition of vocal fold disorders, which removes the need for manual image selection in glottal examination and improves the efficiency of diagnosing vocal fold disorders. First, the algorithm searched the laryngeal video stroboscope recordings for the glottis opening to the largest extent and closing to the smallest extent. From these two images, the patient’s breathing efficiency and the severity of aspiration can be identified. Second, this study used the glottal area to draw the GAW diagrams, which establish reference indicators before and after operation for use in diagnosis. Finally, an experiment was conducted on the recognition of four vocal fold conditions, namely, normal vocal fold, vocal fold paralysis, vocal fold polyp, and vocal fold cyst. This study used the characteristic values in the SVM to recognize the vocal fold disorder. The results indicated that applying the SVM classifier to the recognition of vocal fold disorders can achieve an accuracy rate of up to 98.75% across the four types. The approach was shown to be effective for medical academic research.

Declaration of conflicting interests

I certify that none of the authors have any financial and personal relationships with any other people or organizations that could inappropriately influence (bias) their work.

Funding

The research was supported by the National Taiwan University of Science and Technology–Mackay Memorial Hospital Joint Research Program (MMHNTUST-102-11).

References

1. Hess MM and Ludwigs M. Strobophotoglottographic transillumination as a method for the analysis of vocal fold vibration patterns. J Voice 2001; 14: 255–271.
2. Pontes P, Brasolotto A and Behlau M. Glottic characteristics and voice complaint in the elderly. J Voice 2005; 19: 84–94.
3. Noordzij JP and Woo P. Glottal area waveform analysis of benign vocal fold lesions before and after surgery. Ann Otol Rhinol Laryngol 2000; 109: 441–446.
4. Eysholdt U, Rosanowski F and Hoppe U. Vocal fold vibration irregularities caused by different types of laryngeal asymmetry. Eur Arch Otorhinolaryngol 2003; 260: 412–417.
5. Döllinger M, Braunschweig T, Lohscheller J, et al. Normal voice production: computation of driving parameters from endoscopic digital high speed images. Methods Inf Med 2003; 42: 271–276.
6. Yan Y, Chen X and Bless D. Automatic tracing of vocal-fold motion from high-speed digital images. IEEE Trans Biomed Eng 2006; 53: 1394–1400.
7. Skalski A and Zielinski T. Analysis of vocal folds movement in high speed videoendoscopy based on level set segmentation and image registration. In: 2008 international conference on signals and electronic systems (ICSES), Krakow, Poland, 14–17 September 2008, pp.223–226. New York: IEEE.
8. Yan Y, Bless D and Chen X. Biomedical image analysis in high-speed laryngeal imaging of voice production. Conf Proc IEEE Eng Med Biol Soc 2005; 7: 7684–7687.
9. Verikas A, Gelzinis A, Bacauskiene M, et al. A kernel-based approach to categorizing laryngeal images. Comput Med Imaging Graph 2007; 31: 587–594.
10. Méndez A, García B, Vicente J, et al. Objective model of vocal folds, based on glottal closure, opening and morphologic criteria. In: 9th international symposium on signal processing and its applications (ISSPA), Sharjah, UAE, 12–15 February 2007, vol. 53, pp.1–4. New York: IEEE.
11. Bresch E and Narayanan S. Region segmentation in the frequency domain applied to upper airway real-time magnetic resonance images. IEEE Trans Med Imaging 2009; 28: 323–338.
12. Marendicl B, Galatsanos N and Bless D. A new active contour algorithm for tracking vibrating vocal folds. In: 2001 international conference on image processing (ICIP), Thessaloniki, Greece, 7–10 October 2001, vol. 1, pp.397–400. New York: IEEE.
13. Allin S, Galeotti J, Stetten G, et al. Enhanced snake based segmentation of vocal folds. In: 2004 IEEE international symposium on biomedical imaging: nano to macro (ISBI), Arlington, VA, USA, 15–18 April 2004, vol. 1, pp.812–815. New York: IEEE.
14. Méndez A, García B and Ruiz I. Glottal area segmentation without initialization using Gabor filters. In: 2008 IEEE international symposium on signal processing and information technology (ISSPIT), Sarajevo, Bosnia and Herzegovina, 16–19 December 2008, pp.18–22. New York: IEEE.
15. Chen X, Bless D and Yan Y. A segmentation scheme based on Rayleigh distribution model for extracting glottal waveform from high-speed laryngeal images. In: 2005 Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005, Shanghai, China, 1–4 September 2005, vol. 7, pp.6269–6272.
16. Zorrilla AM and Zapirain BG. Vocal folds paralysis study using a pre-processing stage of Gabor filtering and Chan-Vese segmentation. In: 2010 IEEE international symposium on signal processing and information technology (ISSPIT), Luxor, Egypt, 15–18 December 2010, pp.360–365. New York: IEEE.
17. Behroozmand R and Almasganj F. Comparison of neural networks and support vector machines applied to optimized features extracted from patients’ speech signal for classification of vocal fold inflammation. In: 2005 IEEE international symposium on signal processing and information technology (ISSPIT), Athens, Greece, 18–21 December 2005, pp.844–849. New York: IEEE.
18. Bacauskiene M, Verikas A, Gelzinis A, et al. A feature selection technique for generation of classification committees and its application to categorization of laryngeal images. Pattern Recognit 2009; 42: 645–654.
19. Bierbraurer J. Introduction to coding theory. London: Chapman & Hall, 2005.
20. Deng Y and Manjunath BS. Unsupervised segmentation of color texture regions in images and video. IEEE Trans Pattern Anal Mach Intell 2001; 23: 800–810.
21. Haralick RM, Shanmugam K and Dinstein IH. Textural features for image classification. IEEE Trans Syst Man Cybern 1973; 3: 610–621.
22. Vapnik VN. Statistical learning theory. New York: John Wiley & Sons, 1998.
