Computerized Analysis of Vocal Folds Vibration From Laryngeal Videostroboscopy

*Silvia Gora, *Noy Yavin, *David Elad, †,‡Michael Wolf, and †,‡Adi Primov-Fever, *‡Tel-Aviv and †Tel Hashomer, Israel

Summary: Objectives. To develop an objective analysis of laryngeal videostroboscopy (VSS) movies in the space-time domain for quantitative determination of the true vocal folds (TVFs) vibratory pattern to allow for detection of local pathologies at early stages of development. Methods. Contours of the TVF and false vocal folds (FVFs) were tracked on each frame of a VSS movie. A registration algorithm was used with respect to the centerline of the FVF to eliminate movements not related to TVF vibration. The registered contours of the TVF were analyzed in time and frequency domains. Results. In healthy subjects, the TVF vibration demonstrated a sinusoidal pattern with the same fundamental frequency at every section along the folds. In TVFs with local pathologies, the analysis detected an abnormal area with a different fundamental frequency. Analysis of the TVF vibration time delay of a healthy subject revealed a posterior-to-anterior longitudinal wave that was not detected by visual observation. Conclusions. An objective analysis of laryngeal VSS movies was developed for quantitative determination of the TVF vibration. This analysis was able to detect and quantify TVF characteristics in normal subjects as well as in patients with pathologies beyond the ability of the examiner's naked eye.

Key Words: Vocal folds vibration–Active contours–Videostroboscopy.

Accepted for publication May 29, 2015.
S.G. and N.Y. have contributed equally to this work.
From the *Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv, Israel; †Department of Otorhinolaryngology, Chaim Sheba Medical Center, Tel Hashomer, Israel; and the ‡Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel.
Address correspondence and reprint requests to David Elad, Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, 69978 Tel Aviv, Israel. E-mail: [email protected]
Journal of Voice, Vol. -, No. -, pp. 1-7
0892-1997/$36.00
© 2015 The Voice Foundation
http://dx.doi.org/10.1016/j.jvoice.2015.05.021

INTRODUCTION

Laryngeal videostroboscopy (VSS) is the preferred method for evaluating true vocal fold (TVF) vibration and serves as the gold standard for clinical evaluation of vocal folds (VFs) pathologies.1 The acquired data are a sequence of images, each taken from a successive cycle in a slightly different phase, which yields the illusion of a complete glottal cycle called "the stroboscopic glottal cycle."2,3 It enables visual exploration of the closure of the TVF, the mucosal wave movement, and the symmetry and periodicity of vibrations.4 However, the interpretation of VSS is a subjective, physician-dependent observation, and thus, development of an objective and precise evaluation is of paramount importance.

Many studies were conducted to develop objective tools for processing and optimization of recorded images of laryngeal VSS to improve the diagnosis and evaluation of VF disorders. The first step in any analysis was segmentation of the TVF, which required overcoming illumination, movement, and resolution problems. The methods that were implemented included region growing,5 thresholding and edge detection,6 level set segmentation,7 motion estimation,8 morphology,9 and different variations of active contours.10–15 However, tracking of the TVF contours is still problematic at TVF closure, when the glottal area between the TVFs is minimal.

Various post-segmentation methods were proposed for analysis of the TVF performance that was acquired either via VSS or high-speed videoendoscopy (HSVSS). These methods include fitting an affine model to the TVF displacements,10 Nyquist plots to study the opening and closing pattern of the TVFs,6,16 comparison of contour displacements, connected component analysis of the glottal area,12 TVF opening angle,13 deviation of the TVF boundaries from the medial axis in time and space, and the mean deviation at each point along the boundary.15 These analyses were used to explore the different patterns of TVF opening and closing in different populations of healthy subjects, as well as in pathologic cases. Most of these evaluations were applied to data from the glottal area.

In this work, we used a new method for time and space analysis of VSS movies for exploration of the dynamics of the TVF. The space analysis may be used to objectively explore development of irregularities and abnormal structures, which may allow for more precise clinical diagnosis and improved medical monitoring.

METHODS

Study design

Videostroboscopic movies of the vibrating TVF were recorded in the Department of Otolaryngology of the Chaim Sheba Medical Center using a digital EndoSTROB system and a rigid endoscope (model DX Ls6035; XION Medical GmbH, Germany). The recordings were acquired in audio video interleave (AVI) format at a rate of 25 frames/s. The study included four subjects: two healthy volunteers and two patients with a pseudocyst on one of their TVFs. The study was approved by the hospital ethical committee (#0415-13-SMC) and the subjects signed an informed consent form. A sequence of at least three periods of vibratory cycles in the same pitch was selected from the VSS movies from each subject. The frames of the selected footage were sampled into a sequence of bitmap (BMP) files of resolution 576 × 720 for further processing using MATLAB by MathWorks. An example is shown in Figure 1A. The analysis of the dynamic characteristics of the TVF was composed of (1) tracking the contours of the TVF in each frame; (2) registration of the data to filter noise not related to TVF vibration; and (3) dynamic analysis of the TVF vibration.

FIGURE 1. Description of the procedure for detection of the outline contours of the true vocal folds (TVFs) and the false vocal folds (FVFs) on each frame. (A) The region of interest of the original image. (B) The gray level image with specified regions for the TVFs and the left and right FVFs. (C) The regions of the glottal area and the left and right FVFs after filtration and thresholding. (D) The detected outlines of the TVF (purple) and the FVF (green) with the centerline of the FVF (yellow). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Contour tracking

Image processing for contour tracking of the TVF started with selection of a region of interest (ROI) that included the TVF and FVF on all frames from a sequence of a single subject (Figure 1A). Then, the regions of the TVF and of the left and right FVFs were manually marked on the first frame (Figure 1B). The following processing steps included texture classification using a Gabor filter to emphasize the differences between the TVF, the glottal area, and the FVF,17 local thresholding (based on the preselected border points), and connected-component analysis. The outcome at this stage was a binary image of the glottal area surrounded by the left and right FVF bands (Figure 1C). In the final stage of the tracking analysis, we implemented the greedy active contour algorithm18 to identify the contours that trace the TVF and FVF outlines. The algorithm was modified for open contours by limiting the movement of the contour end points. The required initialization for the snake algorithm was done automatically by sampling points at the binary boundaries of the segmented image (Figure 1C). This avoided, at this stage, the need to handle large movements between frames. The contours that outline the TVF (purple curves) and the FVF (green curves) were obtained within six iterations (Figure 1D). Examples of the tracked contours on different frames of the VSS are demonstrated in Figure 2.

FIGURE 2. (A–C) Examples of the detected true vocal folds (purple) and the false vocal folds (green) on different frames of videostroboscopy. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

The acquired images contained movements not related to the TVF functional vibrations, caused by the patient's breathing and the instability of the examiner's hand. These movements were filtered before the dynamic analysis by registration of all images with respect to a quasi-stationary feature on the acquired images. Further examination of the video recordings revealed that during phonation, the FVF movements between frames are negligible and appear to be symmetrical. Accordingly, we also assumed in this study that the FVF contours were symmetric and calculated their centerline (see the yellow line in Figure 1D), which was used for registration of the extracted contours. The TVF and FVF contours from all the frames of the video recording, as well as the centerlines of the FVF, are depicted in Figure 3A. For the registration, we first aligned the centerline of the first image to be vertical and then registered all the centerlines with respect to this vertical line. The results after registration are depicted in Figure 3B.
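The registration step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each frame supplies the FVF centerline as two end points and the tracked contour as a list of (x, y) points, and that a rigid rotation plus translation is sufficient. The names `make_vertical_transform` and `register_frames` are hypothetical and do not come from the study's MATLAB code.

```python
import math

def make_vertical_transform(p_top, p_bottom, ref_mid):
    """Build a rigid transform that makes the segment p_top -> p_bottom
    vertical and moves its midpoint onto ref_mid.
    Assumes image coordinates, with p_top having the smaller y value."""
    dx = p_bottom[0] - p_top[0]
    dy = p_bottom[1] - p_top[1]
    # Angle of the centerline measured from the vertical (y) axis.
    theta = math.atan2(dx, dy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    mid = ((p_top[0] + p_bottom[0]) / 2.0, (p_top[1] + p_bottom[1]) / 2.0)

    def apply(point):
        # Rotate about the centerline midpoint, then translate to ref_mid.
        x, y = point[0] - mid[0], point[1] - mid[1]
        xr = cos_t * x - sin_t * y
        yr = sin_t * x + cos_t * y
        return (xr + ref_mid[0], yr + ref_mid[1])

    return apply

def register_frames(frames):
    """frames: list of dicts with keys 'centerline' ((top, bottom) points)
    and 'contour' (list of (x, y) points). Hypothetical data layout."""
    first_top, first_bottom = frames[0]["centerline"]
    # The first frame's centerline midpoint anchors the common vertical line.
    ref_mid = ((first_top[0] + first_bottom[0]) / 2.0,
               (first_top[1] + first_bottom[1]) / 2.0)
    registered = []
    for frame in frames:
        top, bottom = frame["centerline"]
        transform = make_vertical_transform(top, bottom, ref_mid)
        registered.append({
            "centerline": (transform(top), transform(bottom)),
            "contour": [transform(p) for p in frame["contour"]],
        })
    return registered
```

After this transform every centerline lies on the same vertical line, so any remaining horizontal motion of the TVF contours is measured relative to a fixed reference.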

FIGURE 3. (A) The detected contours of the true vocal folds (TVFs; purple) and the false vocal folds (FVFs; green) from all the frames of one video clip before registration. (B) The TVF and FVF contours after registration about the FVF’s centerlines (yellow). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
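Once the contours are registered about a vertical centerline, as in Figure 3B, the horizontal distance of a fold from the centerline along a given image row is a simple interpolation. The sketch below assumes a contour stored as an ordered list of (x, y) points and a centerline at a constant x position. Both the data layout and the function names are illustrative assumptions, not the study's implementation.

```python
def x_at_row(contour, row):
    """Linearly interpolate the contour's x coordinate at image row `row`.
    Returns None when the contour does not cross that row."""
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        lo, hi = min(y0, y1), max(y0, y1)
        if lo <= row <= hi:
            if y1 == y0:
                # Horizontal segment: use its midpoint.
                return (x0 + x1) / 2.0
            t = (row - y0) / (y1 - y0)
            return x0 + t * (x1 - x0)
    return None

def distances_from_centerline(contour, centerline_x, rows):
    """Horizontal distance between the contour and a vertical centerline,
    evaluated on each requested row (None where the contour is absent)."""
    result = []
    for row in rows:
        x = x_at_row(contour, row)
        result.append(None if x is None else abs(x - centerline_x))
    return result
```

Repeating this for every frame yields, for each row, one distance value per frame, which is the kind of local time series that a time and frequency analysis can operate on.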

Dynamic analysis

The VSS movies were recorded at a rate of 25 frames/s. Thus, the tracked TVF contours from successive frames do not represent the realistic time domain of the physiological phenomena. Accordingly, we considered the video sampling rate and analyzed the dynamics of the "slowed motion" of the TVF to explore objective differences between healthy and pathologic conditions. First, we analyzed the glottal area as a function of frame number, as demonstrated in Figure 4. The outcome was similar to previous publications,6,15 but without information regarding spatial variations along the TVF.

A more complex analysis was developed for evaluation of both the time and spatial variations during glottal opening/closing oscillations. For this purpose, we calculated the horizontal distances of the left and right TVF contours from the centerline in the registered images for a range of horizontal lines within the central domain of the TVF (Figure 5). The variation of these distances with time is shown in Figure 6A and B, where the curve for each line was scaled about zero between the maximal and minimal distances. The DC offset of these motilities was removed to focus on the fundamental frequencies of the local vibration. The frequency domain of the TVF vibration about each of the horizontal lines was analyzed by applying a fast Fourier transform (FFT) algorithm to the local time-dependent curves (Figure 6C and D). The longitudinal pattern of the TVF vibration was explored by computing the averaged time delay between the periodic vibration curves over successive horizontal lines using the MATLAB function "finddelay", which performs cross-correlation between two close signals. These time delays enabled objective analysis of either anterior-to-posterior longitudinal movement of the TVF or vice versa.

FIGURE 4. Variation of the glottal area with time in dimensions of pixels versus frames: (A) healthy volunteer and (B) patient with pseudocyst.

FIGURE 5. Schematic description of the outlines of the right and left true vocal folds (TVFs) and the horizontal lines along the glottal area on which local time variation of TVFs was analyzed and shown in Figures 6 and 7.

RESULTS

The VSS movies of the healthy subjects and patients with VF pathology (eg, pseudocyst) were processed to detect the TVF and FVF outlines. Sample results from the recorded data of subject 1 are shown in Figure 2 by the purple and green contours for the TVF and FVF, respectively, and illustrate the accuracy of the detection method. The registration algorithm about the centerline of the FVF in each frame revealed a very narrow band of up to seven pixels (ie, the yellow band in Figure 3B), which is expected because of the inherent noise of the acquisition system and the processing procedure. The oscillations of the contours representing the FVF (green) are markedly smaller than those of the contours representing the TVF (purple). This outcome supports our assumption that the movement of the FVF is minimal and approximately symmetric about the centerline.

Glottal area was easily calculated from the identified TVF contours as a function of frame number (which represented different times) and revealed a sine-like behavior similar to other studies, without markers that may identify development of pathologic performance (Figure 4).

The horizontal vibrations of the left and right TVFs of healthy subject 1 with respect to the centerline of the contours (see black arrows in Figure 5) are depicted in Figure 6A and B for different locations within the central domain of the TVF. The vibration of both left and right TVFs about the 14 horizontal lines is clearly periodic and smooth, with very similar amplitudes. Performing FFT on each of the curves provided the spectrum for all the lines of both the left and the right TVF; the spectra collapsed on each other with the same dominant frequency (Figure 6C and D). Similar movement patterns were also obtained for the patient's TVF with pseudocyst, as can be seen in Figure 7. The distributed vibration pattern about the horizontal lines for this case with pseudocyst does not reveal any difference between the left and right folds. On the other hand, the frequency spectrum depicted in Figure 7C and D clearly revealed a different pattern for the horizontal lines in the vicinity of the pathologic lesion on the right fold. Similar results were obtained for the additional healthy and pathologic subjects.

Calculation of the time delay of TVF vibrations along the folds of the healthy subject revealed that the vibration about line 14 was delayed with respect to those about lines 7 and 1 by 0.04 seconds and 0.12 seconds, respectively. Thus, the movement was in the posterior-to-anterior direction. A similar analysis of the TVF vibration of the pathologic subject revealed that the vibration about lines 1 and 8 was delayed with respect to that of line 16 by 0.12 seconds and 0.04 seconds, respectively. This type of vibration demonstrated an opposite, anterior-to-posterior, movement. The vibration of the TVF for the additional healthy and pathologic subjects revealed no directed movement or chaotic movements.
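The reported delays of 0.04 and 0.12 seconds are whole multiples of the frame interval: at 25 frames/s one frame lasts 1/25 = 0.04 seconds, so they correspond to lags of one and three frames in stroboscopic video time. The sketch below shows one way such a lag could be estimated by cross-correlation, in the spirit of MATLAB's `finddelay`. It is a plain-Python illustration with hypothetical function names and an assumed search window (`max_lag`), not the code used in the study.

```python
import math

def find_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation of
    x and y. A positive value means y is delayed with respect to x."""
    n = min(len(x), len(y))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def lag_to_seconds(lag, frame_rate):
    """Convert a lag in frames to seconds of video (not physiological) time."""
    return lag / frame_rate

# Synthetic example: two zero-mean curves, the second delayed by 3 frames.
frames = range(60)
line_a = [math.sin(2 * math.pi * n / 20) for n in frames]
line_b = [math.sin(2 * math.pi * (n - 3) / 20) for n in frames]
delay_frames = find_delay(line_a, line_b, max_lag=5)
delay_seconds = lag_to_seconds(delay_frames, 25.0)
```

For periodic signals the cross-correlation has one peak per period, so the search window should stay shorter than half a period. The sign of the lag between lines at opposite ends of the fold then indicates the direction of the longitudinal movement.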

DISCUSSION

An objective and quantitative computerized processing of laryngeal VSS movies was developed for the analysis of the TVF vibration, aimed at identification of minor local lesions. The analysis was performed on four VSS videos acquired from two healthy subjects and two patients with minor pathologies. The contours of the TVF were successfully tracked in each frame of a video clip of approximately three glottal cycles, recorded during production of constant pitches. The tracking algorithm was also successful in frames where the TVFs were partially closed, a situation that complicated detection in previous studies.8,11

Rigid registration of the detected outlines of the TVF and FVF about the centerline of the FVF contours revealed much smaller vibrations for the FVF compared with those of the TVF, thus justifying the assumption that the FVFs move symmetrically to each other and maintain an approximately constant shape during phonation. Accordingly, we concluded that the registration succeeded in removing background movements and enabled isolation of the TVF absolute movement signal. Obviously, prominent asymmetry and irregularities of FVF structure and function may prevent reliable registration. Similarly, TVF irregularities may prevent stroboscopic evaluation.

We used three methods to analyze the dynamics of TVF vibration: glottal area, time-frequency domain, and phase difference. Analysis of the glottal area as a function of frame number showed a sine-like behavior, as reported in previous works.6,15 Therefore, we could rely on the segmentation and registration results for the following time and frequency analysis stages. We also observed similar sine-like periodic behavior in both the pathologic and healthy cases.

In the time and frequency domain analysis, we detected pathologic characteristics in a specific TVF that were not detected with the glottal area analysis. This suggests a better way to identify abnormalities that are not obvious. It is important to note that these minor pathologies were also observed by expert physicians via conventional VSS; the analysis therefore provides additional support for the diagnostic procedure in regions of uncertainty.

The method proposed in this study for analyzing the TVF vibration also allowed exploring the spatial propagation of the vibration phase delay along the folds. Previous studies of the phase difference were conducted on laryngeal HSVSS images of healthy subjects to determine the opening direction.19–21 Here, we successfully determined the direction of movement from data taken at a much lower rate using the widely used method of VSS.

In conclusion, a quantitative method was developed for objective analysis of the TVF vibration from laryngeal VSS movies. The analysis was performed on the TVF and FVF outlines that were accurately tracked on each image, after registration of the images to cancel movements not related to the TVF vibration. The VSS videos from two healthy subjects and two subjects with VF pathologies were successfully analyzed in both the time and frequency domains. The frequency spectra for all points along the TVF were identical, without observable differences between the right and left healthy TVFs. However, the frequency domain revealed significant differences between the frequency spectra of the left and right TVFs at locations of defined lesions like pseudocysts. Analysis of the spatial vibration of the TVF in time and frequency domains also differed between healthy and diseased folds. This innovative and objective analysis of VSS movies provides important additional information on VF characteristics to complement the physician's naked-eye interpretation.

FIGURE 6. Motility of the true vocal folds (TVFs) of a healthy subject about the 14 horizontal lines within the glottal area shown in Figure 5. (A) Periodic oscillations of the left fold, (B) periodic oscillations of the right fold, (C) frequency spectrum of the left fold about each of the horizontal lines, (D) frequency spectrum of the right fold about each of the horizontal lines. The patterns of motility are the same for left and right TVFs, and the frequency spectra are identical for all lines and both sides.

FIGURE 7. Motility of the true vocal folds (TVFs) of a patient with pseudocyst in his right fold about the 16 horizontal lines within the glottal area shown in Figure 5. (A) Periodic oscillations of the left fold, (B) periodic oscillations of the right fold, (C) frequency spectrum of the left fold about each of the horizontal lines, (D) frequency spectrum of the right fold about each of the horizontal lines. The patterns of motility are the same for left and right TVFs; however, the frequency spectra are clearly different with an additional dominant frequency in the vicinity of the lesion.
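The per-line frequency comparison summarized in Figures 6 and 7 can be sketched as follows. Each local distance curve is scaled about zero, its DC offset is removed, and the dominant spectral bin is located. Lines whose dominant bin differs from the most common one are then flagged. This is a plain-Python illustration that uses a direct discrete Fourier transform instead of an FFT. The function names and the flagging rule are assumptions made for the example, and the resulting frequencies are in stroboscopic video time, not physiological time.

```python
import cmath
import math
from collections import Counter

def normalize(signal):
    """Scale the curve to [-1, 1] between its min and max, then remove the mean."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        return [0.0] * len(signal)
    scaled = [2.0 * (v - lo) / span - 1.0 for v in signal]
    mean = sum(scaled) / len(scaled)
    return [v - mean for v in scaled]

def magnitude_spectrum(signal):
    """Magnitudes of DFT bins 0..n//2, computed directly (O(n^2))."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def dominant_bin(signal):
    """Index of the strongest non-DC bin of the normalized curve."""
    mags = magnitude_spectrum(normalize(signal))
    return max(range(1, len(mags)), key=lambda k: mags[k])

def dominant_frequency(signal, frame_rate):
    """Dominant frequency in cycles per second of video time."""
    return dominant_bin(signal) * frame_rate / len(signal)

def deviant_lines(signals):
    """Indices of lines whose dominant bin differs from the most common one."""
    bins = [dominant_bin(s) for s in signals]
    common = Counter(bins).most_common(1)[0][0]
    return [i for i, b in enumerate(bins) if b != common]
```

With curves taken from one fold, `deviant_lines` would return the indices of horizontal lines whose spectrum is dominated by a different frequency, which is the kind of local deviation reported near the lesion.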

REFERENCES

1. Dejonckere PH, Bradley P, Clemente P, et al. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Guideline elaborated by the Committee on Phoniatrics of the European Laryngological Society (ELS). Eur Arch Otorhinolaryngol. 2001;258:77–82.
2. Bonilha HS, Deliyski DD. Phase asymmetries in normophonic speakers: visual judgments and objective findings. Am J Speech Lang Pathol. 2008;17:367–376.
3. Kaszuba SM, Garrett CG. Strobovideolaryngoscopy and laboratory voice evaluation. Otolaryngol Clin North Am. 2007;40:991–1001.
4. Simpson B, Rosen C. Operative Techniques in Laryngology. Springer; 2008.
5. Lohscheller J, Toy H, Rosanowski F, Eysholdt U, Dollinger M. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal. 2007;11:400–413.
6. Yan Y, Bless D, Chen X. Biomedical image analysis in high speed laryngeal imaging of voice production. Conf Proc IEEE Eng Med Biol Soc. 2005;7:7684–7687.
7. Skalski A, Zielinski T, Deliyski D. Analysis of Vocal Movement in High Speed Videoendoscopy Based on Level Set Segmentation and Image Registration. IEEE ICSES. 2008;223–226.
8. Mendez A, Ismaili Alaoui EM, Garcia B, IBN-Elhaj E, Ruiz I. Glottal Space Segmentation from Motion Estimation and Gabor Filtering. Conf Proc IEEE Eng Med Biol Soc. 2009;2009:5756–5759.
9. Aghlmandi D, Faez K. Automatic segmentation of glottal space from video images based on mathematical morphology and the Hough transform. IJECE. 2012;2:223–230.
10. Saadah AK, Galatsanos NP, Bless D, Ramos CA. Deformation analysis of the vocal folds from videostroboscopic image sequences of the larynx. J Acoust Soc Am. 1998;103:3627–3641.
11. Allin S, Galeotti J, Stetten G, Dailey SH. Enhanced snake based segmentation of vocal folds. Proc IEEE Int Symp Biomed Imaging. 2004;1:812–815.
12. Mendez Zorrilla A, El-Zehiry N, Garcia Zapirain B, Elmaghraby A. Pathological vocal folds features extraction using a modified graph based active contour segmentation. MJEE. 2010;4:55–61.
13. Mendez Zorrilla A, Garcia Zapirain B. Vocal folds paralysis using a preprocessing stage of Gabor filtering and Chan-Vese segmentation. In: Proc IEEE Int Symp Signal Proc Inf Tech. 2011:360–365.
14. Marendic B, Galatsanos N, Bless D. A new active contour algorithm for tracking vibrating vocal folds. Proc Int Symp Image Signal Process Anal IEEE. 2011;1:397–400.
15. Elidan G, Elidan J. Vocal folds analysis using global energy tracking. J Voice. 2012;26:760–768.
16. Ahmad K, Yan Y, Bless DM. Vocal fold vibratory characteristics on normal female speakers from high-speed digital imaging. J Voice. 2011;26:239–253.
17. Bianconi F, Fernandez A. Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recogn. 2007;40:3325–3335.
18. Williams DJ, Shah M. A fast algorithm for active contours and curvature estimation. CVGIP: Image Understanding. 1992;55:14–26.
19. Yamauchi A, Imagawa H, Sakakibara K, et al. Phase difference of vocally healthy subjects in high-speed digital imaging analyzed with laryngotopography. J Voice. 2013;27:39–45.
20. Orlikoff RF, Golla ME, Deliyski DD. Analysis of longitudinal phase differences in vocal-fold vibration using synchronous high-speed videoendoscopy and electroglottography. J Voice. 2012;26:816.e13–816.e20.
21. Yamauchi A, Imagawa H, Sakakibara K, et al. Characteristics of vocal fold vibrations in vocally healthy subjects: analysis with multi-line kymography. J Speech Lang Hear Res. 2014;57:S648–S657.
