Breath-by-breath detection of apneic events for OSA severity estimation using non-contact audio recordings.

T. Rosenwein, E. Dafna, A. Tarasiuk, Y. Zigel

Abstract—Obstructive sleep apnea (OSA) is a prevalent sleep disorder characterized by recurrent episodes of upper airway obstruction during sleep. We hypothesize that breath-by-breath audio analysis of the respiratory cycle (i.e., inspiration and expiration phases) during sleep can reliably estimate the apnea-hypopnea index (AHI), a measure of OSA severity. The AHI is calculated as the average number of apnea (A)/hypopnea (H) events per hour of sleep. Audio recordings of 186 adults referred for OSA diagnosis were acquired in in-laboratory and at-home conditions during polysomnography and a WatchPAT study, respectively. Suspected A/H events were automatically segmented and classified using a binary random forest classifier. A total accuracy rate of 86.3% and an agreement of κ=42.98% were achieved in A/H event detection. A correlation of r=0.87 (r=0.74), a diagnostic agreement of 76% (81.7%), and an average absolute AHI error of 7.4 (7.8) events/hour were achieved in in-laboratory (at-home) conditions. Here we provide evidence that A/H events can be reliably detected at their exact time locations during sleep using a non-contact audio approach. This study highlights the potential of this approach to reliably evaluate AHI in at-home conditions.

Keywords: OSA, audio signal processing, random forest.

*This work was supported in part by the Israel Ministry of Industry and Trade, The Kamin Program, award no. 46168. T. Rosenwein and E. Dafna are with the Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beer–Sheva, Israel. A. Tarasiuk is with the Sleep-Wake Disorders Unit, Soroka University Medical Center, and the Department of Physiology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Israel. Y. Zigel is with the Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beer–Sheva, Israel (corresponding author; phone: +972-8-642-8372; fax: +972-8-642-8371).

978-1-4244-9270-1/15/$31.00 ©2015 IEEE

I. INTRODUCTION

Sleep-disordered breathing (SDB) is a group of sleep-related breathing abnormalities, including obstructive sleep apnea (OSA), that affect about 7% of the adult population. OSA can lead to excessive daytime sleepiness, cardiovascular morbidity, and death [1-4]. OSA is characterized by repeated complete breathing cessations (apneas, A) or partial cessations (hypopneas, H) during sleep. OSA severity is measured by the apnea-hypopnea index (AHI), which is the average number of apnea and hypopnea events per hour of sleep [5-7].

Polysomnography (PSG) is the gold standard test for OSA diagnosis; during PSG, patients are required to spend a night in a sleep laboratory connected to various sensors. PSG is expensive, unsuitable for mass screening, and may affect normal sleep [8]. Due to these drawbacks, researchers seek non-invasive alternatives for OSA diagnosis, such as the WatchPAT device [9, 10]. Recently, a non-contact audio approach based on respiratory sound analysis was proposed [11-15]. However, little is known about the detection of A/H events at their exact time locations or about the validity of audio respiratory sound analysis in at-home conditions.

In recent years, several studies on OSA detection (OSA/non-OSA) and OSA severity estimation have been published. Karunajeewa et al. [16] proposed a method to classify OSA/non-OSA subjects with a logistic regression model; a sensitivity of 89.3% and a specificity of 92.3% were reported. Azarbarzin et al. [13] proposed a method to distinguish OSA subjects from non-OSA subjects by analyzing their snore variations during sleep. Using linear discriminant analysis, they achieved an accuracy of 96.4%. Alencar et al. [12] examined the hypothesis that the time interval between snores is in good agreement with AHI; a correlation of r=0.841 was reported. In these studies, the entire set of detected snores was used to extract snoring parameters averaged across the night. However, this approach ignores the breathing dynamics and the specific locations of A/H events during sleep.

We hypothesize that breath-by-breath analysis of the respiratory cycle, i.e., the inspiration and expiration phases during sleep, will contribute to better detection of respiratory activity and apneas. To our knowledge, few studies have tried to locate A/H events at their specific time of occurrence using breathing sound analysis [17, 18], and all of these studies were performed in laboratory settings. In this study we developed a fully automated algorithm to detect A/H events at their exact time locations during sleep. The algorithm distinguishes between regular breathing events and A/H events using a binary random forest classifier [19]. To the best of our knowledge, this is the first attempt to estimate OSA severity in at-home conditions (characterized by a low signal-to-noise ratio) using non-contact audio signals.

II. METHODS

The proposed algorithm was designed and validated using the hold-out method (Figure 1). Once the breathing events were detected as described in [11], suspected A/H events were automatically segmented, and a feature vector was extracted from each suspected A/H event. The feature vectors were fed to a binary random forest classifier, followed by an adaptive individual threshold that determined the classification decision: A/H or non-A/H event. Given the classified events, OSA severity estimation is straightforward.

A. Experimental setup

The database for the current study consists of audio recordings of 186 adult patients (>18 years old), as shown in Table 1.


B. Breathing sounds detection

In this study we used our random-forest-based breathing sounds detector [11]. The detector locates each event in time and distinguishes between inspiration (snore), expiration, and non-respiration (noise).


C. Apnea-hypopnea events' segmentation

Using the breathing sound detector's output, all non-respiratory events were eliminated; the energy envelope of the audio signal was then calculated, and its baseline was subtracted. A suspected A/H event was defined as a period of 10-90 seconds in which the baseline-subtracted energy envelope is negative [14]. Suspected A/H events were labelled by a technician in the Sleep-Wake Disorders Unit.

D. Feature extraction & selection

For each suspected A/H event, an 18-dimensional feature vector was extracted. The feature set consists of 6 features (Figure 2), each extracted from 3 different intervals: (i) the suspected A/H event itself, (ii) the 10 seconds preceding the start of the suspected A/H event, and (iii) the 10 seconds following its end.

TABLE I. SUBJECT'S DATABASE INFORMATION
(values are mean ± std, with the range in parentheses)

                     Design in-laboratory    Validation in-laboratory    Validation at-home
# Subjects           18                      75                          93
Gender (M/F)         16/2                    47/28                       55/38
Age [years]          46.5±11.8 (25.2÷63.6)   52.4±15.2 (24÷81)           51.3±15.2 (19.1÷83)
AHI [events/hr]      13.8±9.5 (1.4÷42.6)     21.5±18.5 (0.5÷87.4)        17.6±17.3 (0.3÷92)
BMI [kg/m²]          27.7±4.9 (17÷39)        30.2±6.1 (17.2÷38.6)        30±6.5 (18.6÷53)
TST [min]            380.1±46.8 (239÷425)    395.6±50.9 (250÷440)        364.6±68.9 (182÷520)
# Insp. [×10³]       3.6±1.3 (1.3÷5.4)       3.1±1.5 (0.107÷6.9)         2.5±1.7 (0.018÷7.4)
# Exp. [×10³]        3.7±1.2 (1÷5.5)         3.3±1.4 (0.239÷6.8)         2.6±1.6 (0.055÷6.9)
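The segmentation rule of Section C can be illustrated with a minimal sketch. It assumes the baseline-subtracted energy envelope is already available as a uniformly sampled sequence. The function name `suspected_events`, the envelope sampling rate `fs`, and the treatment of the 10 s and 90 s bounds as inclusive are assumptions of this illustration, not details given in the text.

```python
def suspected_events(envelope, fs, min_s=10.0, max_s=90.0):
    """Return (start_s, end_s) spans where the envelope stays negative.

    envelope: baseline-subtracted energy envelope (hypothetical input).
    fs: envelope sampling rate in Hz (hypothetical parameter).
    Only runs lasting between min_s and max_s seconds are kept.
    """
    events = []
    start = None
    # A non-negative sentinel closes a run that reaches the end of the signal.
    for i, value in enumerate(list(envelope) + [0.0]):
        if value < 0 and start is None:
            start = i                      # a negative run begins
        elif value >= 0 and start is not None:
            duration = (i - start) / fs    # run length in seconds
            if min_s <= duration <= max_s:
                events.append((start / fs, i / fs))
            start = None
    return events


# Toy envelope sampled at 1 Hz: runs of 12 s, 4 s and 95 s below baseline.
toy = [1] * 5 + [-1] * 12 + [1] * 3 + [-1] * 4 + [1] * 2 + [-1] * 95 + [1]
spans = suspected_events(toy, fs=1.0)
```

In this toy signal only the 12-second run satisfies the duration constraint; the 4-second and 95-second runs are discarded.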


The database is summarized in Table 1. The Institutional Review Committee of Soroka University Medical Center approved the study protocol (protocol number 10141). All audio signals were digitized at a sampling frequency of 44.1 kHz and downsampled to 16 kHz (PCM, 16 bits per sample). The audio signals of 93 patients were acquired at the Sleep-Wake Disorder Unit (Soroka University Medical Center) during a PSG test. These signals were recorded using a digital audio recording device (Edirol R-4 Pro) connected to a non-contact directional condenser microphone (RODE NTG-1) placed 1.0 m above the patient's head. The database also includes signals of 93 adult patients who were recorded in their own homes. For these recordings, a technician installed a WatchPAT device (treated as the gold standard) on the patient's wrist and placed a handheld recorder (Olympus LS-5) on the dresser beside the patient's head. Along with the acquired signals, we included the patients' information and data from the PSG and the WatchPAT device.

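Figure 1: Block diagram of the proposed system. Blocks of the design phase: design database, breathing sounds detection, A/H events segmentation, feature extraction & selection, A/H or non-A/H model estimation (outputs: selected features, model's parameters). Blocks of the validation phase: validation database, breathing sounds detection, A/H events segmentation, feature extraction, model matching, classification decision, OSA severity estimation.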

The features:

Breathing rate (BR): During an A/H event, the BR decreases compared to a non-A/H event. BR was calculated as the number of detected breathing events during the interval.

Non-respiratory rate (NR): NR was calculated as the number of non-respiratory events during the interval. This feature holds information complementary to the BR.

Duration to last respiratory event (Dur2Resp): The most intuitive feature. Dur2Resp was calculated as the duration to the last detected respiratory event; high values are expected for A/H events.

Variation of respiratory energy (VRNRG): The log ratio (LR) between the averaged energy of the inspiration (NRG_insp) and expiration (NRG_exp) classes is calculated for each of the n = 1,...,N respiratory cycles (1). VRNRG is the range of LR during the interval.

$$ LR(n) = \log \frac{NRG_{insp}(n)}{NRG_{exp}(n)}, \quad n = 1, \ldots, N \qquad (1) $$

Averaged ventilation (AV): When an A/H event is present, the ventilation decreases (compared to a non-A/H event). AV was estimated as the averaged energy of the respiratory events within the suspected A/H event.

Mean energy value (ME): Mean value of the baseline-subtracted energy envelope of the suspected A/H event.

Features were normalized to zero mean and unit standard deviation. A feature selection algorithm based on random forest's Gini importance measure was applied [20].
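As an illustration of (1) and of the normalization step, the following sketch computes LR, VRNRG, and a zero-mean, unit-variance scaling. The natural logarithm, the population form of the standard deviation, and all function names are assumptions made for the example; the text does not specify them.

```python
import math


def log_ratio(nrg_insp, nrg_exp):
    """LR(n) = log(NRG_insp(n) / NRG_exp(n)) for each respiratory cycle n."""
    return [math.log(i / e) for i, e in zip(nrg_insp, nrg_exp)]


def vrnrg(nrg_insp, nrg_exp):
    """VRNRG: range (max minus min) of LR over the cycles of one interval."""
    lr = log_ratio(nrg_insp, nrg_exp)
    return max(lr) - min(lr)


def zscore(values):
    """Scale one feature to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


# Three cycles whose inspiration/expiration energy ratios are e, 1 and e^2.
example_vrnrg = vrnrg([math.e, 1.0, math.e ** 2], [1.0, 1.0, 1.0])
example_scaled = zscore([1.0, 2.0, 3.0])
```

Here the per-cycle LR values are approximately 1, 0 and 2, so the range is approximately 2.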

Table I abbreviations: BMI – body mass index; AHI – apnea-hypopnea index; TST – total sleep time; Insp. – inspirations detected; Exp. – expirations detected.

E. Apnea/hypopnea (A/H) or non-A/H model estimation

Suspected A/H events are classified as A/H or non-A/H using a random forest classifier [19]. Random forest generates Ntree bootstrap replications of the data and trains a classification tree on each replication. Each node is split using the best among mtry randomly chosen features [21]. Hence, random forest has only two parameters to be determined: the number of trees, Ntree, and the number of candidate features per split, mtry. The data used for building a tree is called "in-bag", while the remaining data is called "out-of-bag" (OOB). A feature selection algorithm based on the Gini importance measure was applied [20], and the optimal number of features was determined according to the OOB error.

F. Model matching

To estimate the test error, the independent test dataset is passed down the forest. For each suspected A/H event, the majority vote of all trees determines the final class estimation, A/H or non-A/H (2):

$$ \hat{Y} = \arg\max_{i = 1, 2, \ldots, C} (s_i); \quad s_i = \frac{1}{N_{tree}} \sum_{b=1}^{N_{tree}} I(\hat{Y}_b = i), \qquad (2) $$

where I is the indicator function that returns 1 if \hat{Y}_b, the decision of the bth tree, is equal to i, the ith class label, and 0 otherwise, and C is the number of classes (two in this case).
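The voting rule in (2) can be sketched in a few lines. The class labels, the function names, and the tie-breaking rule (the first class listed wins) are choices made for this illustration; the per-tree decisions are assumed to be given as a list of labels.

```python
from collections import Counter

CLASSES = ("A/H", "non-A/H")  # illustrative labels for the C = 2 classes


def class_scores(tree_votes, classes=CLASSES):
    """s_i: fraction of the Ntree trees whose decision equals class i."""
    counts = Counter(tree_votes)
    n_tree = len(tree_votes)
    return {c: counts[c] / n_tree for c in classes}


def majority_vote(tree_votes, classes=CLASSES):
    """Final estimate: the class with the largest score s_i."""
    scores = class_scores(tree_votes, classes)
    return max(classes, key=lambda c: scores[c])


# Four hypothetical trees voting on one suspected A/H event.
votes = ["A/H", "non-A/H", "A/H", "A/H"]
scores = class_scores(votes)
decision = majority_vote(votes)
```

The score of the A/H class is also the quantity to which a per-subject threshold can later be applied in place of the fixed majority rule.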

Figure 2: The features. A: Audio signal (blue) and A/H events’ manual labels (red). B: Energy signal (blue), its envelope without baseline (black) and threshold (red). C: BR, NR, and Dur2Resp. D: VRNRG, AV, and ME.

G. Classification decision

To reduce the false negatives (and increase the accuracy) of the A/H detector, as presented in Figure 3 Panel A, an adaptive threshold is found for each subject's score distribution. The algorithm seeks a minimum ("valley") in the bi-modal histogram of the scores, as shown in Figure 3 Panel B.

H. OSA severity estimation

To estimate the AHI of a subject, we divided the number of suspected A/H events that were classified as A/H by the total sleep time (TST) measured by the PSG (WatchPAT device) for in-laboratory (at-home) recordings.

III. RESULTS AND DISCUSSION

We recruited 186 subjects routinely referred for OSA diagnosis. No significant differences were found between the system design and validation datasets in age, AHI, BMI, and TST. A significant difference in the number of detected inspiration and expiration events was found between the in-laboratory and at-home recordings (Table 1). This finding could be related to differences in SNR between laboratory and home conditions.

A forest was grown using 18 features and the following parameters (chosen according to the OOB error): mtry = 5 and Ntree = 400. The 5 most important features include representatives from all 3 time intervals; this provides evidence that respiratory information about an A/H event exists not only during the event itself but also in its close temporal vicinity. Table 2 shows the confusion matrix of the A/H system output after the adaptive individual threshold was applied. The diagonal represents the true detection rate of each class. A total accuracy rate of 86.3% (90% CI: 72.97÷99.63%) was achieved. To account for the unbalanced a priori probabilities of the classes (16.95% A/H events vs. 83.05% non-A/H events), an agreement measure [22] of κ=42.98% between the true labels and our class estimation was calculated.

TABLE II. CLASSIFICATION RESULT: A/H EVENTS DETECTOR

True label / Estimated    Apnea/hypopnea    Non-apnea/hypopnea
Apnea/hypopnea            51.02%            48.98%
Non-apnea/hypopnea        8.93%             91.07%

Figure 3: An example of the proposed system's output: a 35-year-old male with an AHI of 28 events/hour. A: Breath-by-breath OSA analysis. Positive means that the true label of the suspected A/H event is A/H, and negative means that its true label is non-A/H. The black line represents the hypnogram. B: Classification threshold. The new threshold is adapted to the "valley" of the bi-modal distribution of the subject's suspected A/H event scores.

To estimate system performance, the estimated AHI was compared to the AHI determined by the PSG and by the WatchPAT device for the in-laboratory and at-home recordings, respectively. As shown in Figure 4, a correlation between AHI_PSG and AHI_EST of r=0.874 (p-value 1.4E-24), a diagnostic agreement [23] of 76%, and an average absolute AHI error of 7.42 events/hour were achieved using the in-laboratory database. The Bland-Altman plot in Figure 5 Panel A shows no consistent bias, i.e., the mean difference between AHI_EST and AHI_PSG was close to zero (-1.9 events/hour). For the at-home database, a correlation between AHI_PAT and AHI_EST of r=0.737 (p-value 4E-17), a diagnostic agreement of 81.72%, and an average absolute AHI error of 7.838 events/hour were achieved (Figure 4). The Bland-Altman plot in Figure 5 Panel B likewise shows no consistent bias, i.e., the mean difference between AHI_EST and AHI_PAT was close to zero (1.4 events/hour).
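The decision, severity, and agreement computations described in Sections G, H and III can be sketched as follows. The valley search shown here (lowest histogram bin between the highest bin of the lower half and the highest bin of the upper half) is only one possible reading of "seeks a minimum in the bi-modal histogram". The bin count, the function names, and the assumption that scores lie in [0, 1] are illustrative and do not come from the text. The AHI and Cohen's kappa functions follow their standard definitions.

```python
def valley_threshold(scores, n_bins=10):
    """Per-subject threshold: centre of the lowest bin between two modes.

    Assumes scores in [0, 1] with one mode in each half of the range.
    """
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    half = n_bins // 2
    lo_peak = max(range(half), key=lambda i: counts[i])
    hi_peak = max(range(half, n_bins), key=lambda i: counts[i])
    valley = min(range(lo_peak, hi_peak + 1), key=lambda i: counts[i])
    return (valley + 0.5) / n_bins


def estimate_ahi(n_ah_events, tst_minutes):
    """AHI: events classified as A/H per hour of total sleep time."""
    return n_ah_events / (tst_minutes / 60.0)


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (counts or proportions)."""
    total = tp + fn + fp + tn
    p_obs = (tp + tn) / total
    p_true_pos = (tp + fn) / total
    p_pred_pos = (tp + fp) / total
    p_exp = p_true_pos * p_pred_pos + (1 - p_true_pos) * (1 - p_pred_pos)
    return (p_obs - p_exp) / (1 - p_exp)


# Row percentages of Table 2 weighted by the stated class priors.
prior_ah = 0.1695
kappa = cohen_kappa(tp=prior_ah * 0.5102, fn=prior_ah * 0.4898,
                    fp=(1 - prior_ah) * 0.0893, tn=(1 - prior_ah) * 0.9107)

# Hypothetical subject: 42 detected A/H events over 360 minutes of sleep.
ahi = estimate_ahi(42, 360)

# Hypothetical bi-modal score sample for one subject.
threshold = valley_threshold([0.05] * 5 + [0.15] * 3 + [0.45] + [0.85] * 4 + [0.95] * 2)
```

With the Table 2 percentages and the stated priors, the kappa computed this way is approximately 0.43, in line with the reported agreement.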

IV. CONCLUSIONS AND FUTURE WORK

We propose an automatic OSA severity estimation algorithm based on breath-by-breath detection of A/H events. The proposed algorithm is robust and can be used with various recording devices and under various SNR conditions. Here we provide evidence that, using non-contact audio analysis, A/H events can be detected with high temporal resolution; moreover, AHI can be estimated in at-home conditions. The effects of different body positions on system performance should be explored. This study emphasizes the value of this approach for AHI estimation in at-home conditions.

ACKNOWLEDGMENT

We thank Mrs. Bruria Freidman from the Sleep Wake Disorder Unit of Soroka University Medical Center for her collaboration and support.

Figure 4: AHI estimation results. Solid lines represent the boundaries of diagnostic agreement. The dashed line represents identity between the estimated AHI and the gold standard's AHI (blue for in-laboratory, red for at-home recordings).

Figure 5: Bland-Altman plot: no consistent bias is found in either case. A: In-laboratory database. B: At-home conditions database. The solid line indicates the average difference and the dashed lines indicate two standard deviations.

REFERENCES

[1] A. Malhotra and D. P. White, "Obstructive sleep apnoea," The Lancet, vol. 360, pp. 237-245, 2002.
[2] N. M. Punjabi, "The epidemiology of adult obstructive sleep apnea," Proc. of the American Thoracic Society, vol. 5, p. 136, 2008.
[3] E. Shahar, C. W. Whitney, S. Redline, E. T. Lee, A. B. Newman, F. Javier Nieto, et al., "Sleep-disordered breathing and cardiovascular disease: cross-sectional results of the Sleep Heart Health Study," Am. J. of Resp. and Critical Care Med., vol. 163, pp. 19-25, 2001.
[4] A. Tarasiuk, S. Greenberg-Dotan, T. Simon, A. Tal, A. Oksenberg, and H. Reuveni, "Low socioeconomic status is a risk factor for cardiovascular disease among adult OSAHS patients requiring treatment," CHEST, vol. 130, pp. 766-773, 2006.
[5] W. W. Flemons, M. R. Littner, J. A. Rowley, P. Gay, W. M. Anderson, D. W. Hudgel, et al., "Home diagnosis of sleep apnea: a systematic review of the literature: an evidence review cosponsored by the American Academy of Sleep Med., the American College of Chest Physicians, and the American Thoracic Society," CHEST, vol. 124, pp. 1543-1579, 2003.
[6] D. Hudgel, R. Martin, B. Johnson, and P. Hill, "Mechanics of the respiratory system and breathing pattern during sleep in normal humans," 1984.
[7] P. E. Peppard, T. Young, M. Palta, and J. Skatrud, "Prospective study of the association between SDB and hypertension," New England J. of Med., vol. 342, pp. 1378-1384, 2000.
[8] E. Osman, J. Osborne, P. Hill, and B. Lee, "Snoring assessment: do home studies and hospital studies give different results?," Clinical Otolaryngology & Allied Sciences, vol. 23, pp. 524-527, 1998.
[9] G. Pillar, A. Bar, M. Betito, R. P. Schnall, I. Dvir, J. Sheffy, et al., "An automatic ambulatory device for detection of AASM defined arousals from sleep: the WP100," Sleep Med., vol. 4, pp. 207-212, 2003.
[10] A. Bar, G. Pillar, I. Dvir, J. Sheffy, R. P. Schnall, and P. Lavie, "Evaluation of a portable device based on peripheral arterial tone for unattended home sleep studies," CHEST, vol. 123, pp. 695-703, 2003.
[11] T. Rosenwein, E. Dafna, A. Tarasiuk, and Y. Zigel, "Detection of breathing sounds during sleep using non-contact audio recordings," in EMBC, Chicago, IL, 2014.
[12] A. M. Alencar, D. G. V. da Silva, C. B. Oliveira, A. P. Vieira, H. T. Moriya, and G. Lorenzi-Filho, "Dynamics of snoring sounds and its connection with obstructive sleep apnea," Physica A: Statistical Mechanics and its Apps., vol. 392, pp. 271-277, 2013.
[13] A. Azarbarzin and Z. Moussavi, "Snoring sounds variability as a signature of obstructive sleep apnea," Med. Eng. & Physics, vol. 35, pp. 479-485, 2013.
[14] N. Ben-Israel, A. Tarasiuk, and Y. Zigel, "Obstructive apnea hypopnea index estimation by analysis of nocturnal snoring signals in adults," Sleep, vol. 35, pp. 1299-305C, 2012.
[15] E. Dafna, A. Tarasiuk, and Y. Zigel, "OSA severity assessment based on sleep breathing analysis using ambient microphone," in EMBC, 2013, pp. 2044-2047.
[16] A. S. Karunajeewa, U. R. Abeyratne, and C. Hukins, "Multi-feature snore sound analysis in obstructive sleep apnea–hypopnea syndrome," Physiol. Meas., vol. 32, p. 83, 2011.
[17] B. R. Snider and A. Kain, "Automatic classification of breathing sounds during sleep," in ICASSP, 2013, pp. 699-703.
[18] H. Nakano, K. Hirayama, and T. Tanigawa, "Monitoring sound to quantify snoring and sleep apnea severity using a smartphone: proof of concept," J. of Clinical Sleep Med., vol. 10, pp. 73-78, 2014.
[19] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[20] B. H. Menze, B. M. Kelm, and F. A. Hamprecht, "A comparison of RF and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data," BMC Bioinformatics, vol. 10, p. 213, 2009.
[21] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, pp. 18-22, 2002.
[22] J. Cohen, "A coefficient of agreement for nominal scales," Educational and Psychol. Meas., vol. 20, pp. 37-46, 1960.
[23] D. P. White, T. J. Gibb, J. M. Wall, and P. R. Westbrook, "Assessment of accuracy and analysis time of a novel device to monitor sleep and breathing in the home," Sleep, vol. 18, pp. 115-126, 1995.
