sensors Article

Toward Improving Electrocardiogram (ECG) Biometric Verification using Mobile Sensors: A Two-Stage Classifier Approach Robin Tan and Marek Perkowski * Department of Electrical and Computer Engineering, Portland State University, Portland, OR 97201, USA; [email protected] * Correspondence: [email protected] Academic Editor: Panicos Kyriacou Received: 4 December 2016; Accepted: 9 February 2017; Published: 20 February 2017

Abstract: Electrocardiogram (ECG) signals sensed from mobile devices pertain the potential for biometric identity recognition applicable in remote access control systems where enhanced data security is demanding. In this study, we propose a new algorithm that consists of a two-stage classifier combining random forest and wavelet distance measure through a probabilistic threshold schema, to improve the effectiveness and robustness of a biometric recognition system using ECG data acquired from a biosensor integrated into mobile devices. The proposed algorithm is evaluated using a mixed dataset from 184 subjects under different health conditions. The proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from random forest alone and 96.31% accuracy from wavelet distance measure algorithm alone. These results demonstrate the superiority of the proposed algorithm for biometric identification, hence supporting its practicality in areas such as cloud data security, cyber-security or remote healthcare systems. Keywords: electrocardiogram (ECG); biometric recognition; random forest; wavelet distance measure; data security

1. Introduction Mobile devices are now indispensable in our daily lives in social networking, ecommerce, online banking, and remote healthcare. As a result, a large amount of personal data is stored in the cloud and can be accessed anywhere around the globe. This poses a major concern in data security and confidentiality [1,2]. Traditional biometric recognition systems based on fingerprints or iris exhibit certain limitations regarding robustness against falsified credentials or spoof attacks [3,4]. To overcome the existing challenges, electrocardiograms (ECGs) have been studied as an emerging biometric modality for subject verification because of their high distinctiveness, difficult replication, and intrinsic aliveness detection [5,6]. Conventionally, ECG tests rely on a set of 12 leads placed on a human body in a clinical environment. More recently, low-power and small-size biosensors are being integrated into mobile devices for real-time monitoring of people’s physiological conditions during daily activities, including electrocardiograms. This makes mobile ECG data readily available for biometric recognition without added hardware cost. The use of ECG for identity recognition dates back to the pioneer studies of Biel et al. [7], Irvine et al. [8], and Kyoso and Uchiyama [9], which revealed that ECG contains sufficiently detailed information to uniquely identify an individual. During the last decade, considerable efforts have been made to develop various algorithms using ECG for subject identification [10–15]. Generally speaking, the ECG identification systems are categorized based on the method of feature (also called template) extraction as well as the type of template matching for classification. The ECG feature extraction Sensors 2017, 17, 410; doi:10.3390/s17020410

www.mdpi.com/journal/sensors

2017, 17, 17, 410 410 Sensors 2017,

2 of 16

extraction as well as the type of template matching for classification. The ECG feature extraction methods methods are are mainly mainly categorized categorized as as either either fiducial-based fiducial-based or or non-fiducial-based. non-fiducial-based. The The fiducial-based fiducial-based feature extraction relies on an accurate detection of ECG fiducial characteristic points such as as P, P, Q, Q, R, R, feature extraction relies on an accurate detection of ECG fiducial characteristic points such S, QRSon QRSoffoff, ,TTonon andTToffoffasasshown shownininFigure Figure1,1,to to obtain obtain their their relative relative amplitude, amplitude, on, , QRS S, T, T, PPon on,, P Poff off,, QRS , ,and temporal intervals and morphological features [5–7,9,16–18]. Here P , P , QRS , QRS on on on , and temporal intervals and morphological features [5–7,9,16–18]. Here Pon, Poff,offQRSon, QRSoff, off Ton, ,Tand Toff T represent the onset and offset timing location of P-wave, QRS-wave, and T-wave (Figure 1). off represent the onset and offset timing location of P-wave, QRS-wave, and T-wave (Figure 1). The nonThe non-fiducial-based feature extraction analyzes the ECG complexes (of heartbeats) time or fiducial-based feature extraction analyzes the ECG complexes (of heartbeats) using timeusing or frequency frequency analysis such as discrete wavelet transform (DWT) or discrete cosine transform (DCT) to analysis such as discrete wavelet transform (DWT) or discrete cosine transform (DCT) to obtain other obtain other statistical features [10,14,19,20]. The template-matching include one-to-many statistical features [10,14,19,20]. The template-matching classifiers classifiers include one-to-many template template [5,9,10], neighbor k-nearest algorithms neighbor algorithms (KNN)and [21,22], and non-linear matching matching [5,9,10], k-nearest (KNN) [21,22], non-linear machine machine learning learning techniques such as the artificial neural networks (ANN) [23] and support vector machines techniques such as the artificial neural networks (ANN) [23] and support vector machines (SVM) [24]. (SVM) [24].

R

R

Pon Poff

Q

T

QRSoff

QRSon

P

Ton

Toff

S TR-R

Figure 1. Electrocardiogram (ECG) P-QRS-T complex and fiducial characteristic points. Figure 1. Electrocardiogram (ECG) P-QRS-T complex and fiducial characteristic points.

The ECG verification performance reported by theby literature varies depending on databases, ECGsubject subject verification performance reported the literature varies depending on methods ECG data recording, sample size, signal pre-processing techniques, types oftypes feature databases,ofmethods of ECG data recording, sample size, signal pre-processing techniques, of extraction, and classification methods.methods. Many studies based on the clinical multior single-lead feature extraction, and classification Manywere studies were based on the clinical multi- or ECG data, such MIT-Beth HospitalIsrael (MIT-BIH) and(MIT-BIH) Physikalisch-Technische Bundesanstalt single-lead ECGasdata, such Israel as MIT-Beth Hospital and Physikalisch-Technische (PTB) diagnostic databases hosted on the Physionet [25–32]. website A few other studies built the Bundesanstalt (PTB) diagnostic databases hosted onwebsite the Physionet [25–32]. A few other sensor design into a lab prototype to obtain “off-the-person” ECG data for analyzing the performance studies built the sensor design into a lab prototype to obtain “off-the-person” ECG data for analyzing of biometric of identification. For identification. instance, Shen For et al.instance, [6] implemented the[6] identity verification theECG performance ECG biometric Shen et al. implemented the using template matching decision-based network (DBNN) fornetwork a group (DBNN) of 20 subjects identity verification usingand template matching neural and decision-based neural for a from database, reported the rateand of correct identity verification as 95% for template groupthe of MIT-BIH 20 subjects from theand MIT-BIH database, reported the rate of correct identity verification matching 80% for the DBNN, combining thewhile two methods produced 100% correct rate. as 95% forand template matching and while 80% for the DBNN, combining the two amethods produced Wang al. [33]rate. developed a hierarchical architecture integrating human ECG fiducial features and a 100%etcorrect Wang et al. [33] developed a hierarchical architecture integrating human ECG appearance features for appearance classification,features and achieved 100% correct human identification evaluated fiducial features and for classification, and achieved 100% when correct human with 13 subjects. Shen et al. [11] applied a combined matching and distance classification identification when evaluated with 13 subjects. Shen ettemplate al. [11] applied a combined template matching methods to the classification ECG signal recorded from biometric recognition group of subjects, and distance methods tothe thepalms ECGfor signal recorded from for thea palms for168biometric and achieved rate.and Chan et al. a[10] studied the biometric of recognition fora a95.3% groupidentification of 168 subjects, achieved 95.3% identification rate. performance Chan et al. [10] one-channel signalsperformance recorded fromoftheone-channel pads of individuals’ thumbs, from a group subjects studied the ECG biometric ECG signals recorded from of the50 pads of during three data-recording on50 different days, and three achieved 89% classification accuracy using individuals’ thumbs, from asessions group of subjects during data-recording sessions on different adays, wavelet classifier. For more details, suggest the review paper of ECG biometric and distance achievedmeasure 89% classification accuracy using awe wavelet distance measure classifier. For more recognition by Odinaka et al. [34]. details, we suggest the review paper of ECG biometric recognition by Odinaka et al. [34]. The latest advancement of integrating biosensors into mobile devices has facilitated the real-time measurement human health conditions [3].[3]. Taking advantage of the measurement of ofECG ECGsignal signalfor formonitoring monitoring human health conditions Taking advantage of fact the that mobile ECG ECG data might be readily available, we aimed to develop a robust effective mobile fact that mobile data might be readily available, we aimed to develop a and robust and effective ECG biometric recognition system forsystem reliablefor access control in different to mobile ECG biometric recognition reliable access controlapplications. in different Compared applications. the ECG data measured from a multi-lead setup,clinical the mobile ECG from poorer Compared to the ECG data measured fromclinical a multi-lead setup, the signals mobile suffer ECG signals suffer signal-to-noise ratio and baseline drift due todrift human or motion or artifacts finger to from poorer signal-to-noise ratio and baseline duerespiration to human respiration motionfrom artifacts from electrode pad contact, well asas power line interferences [35–37]. As a result, detections of mobile finger to electrode padascontact, well as power line interferences [35–37]. Asthe a result, the detections of mobile ECG wave boundaries (i.e., onset and offset points) become less reliable as they are mostly

Sensors 2017, 17, 410 Sensors 2017, 17, 410

3 of 16 3 of 16

susceptible to noise and baseline wandering, leading to a sub-optimal performance using fiducialbased feature extraction methods. Alternatively, although non-fiducial-based feature extraction ECG wave boundaries (i.e., onset and offset points) become less reliable as they are mostly susceptible methods obviate the need for detecting ECG fiducial points, the subject verification accuracy varies to noise and baseline wandering, leading to a sub-optimal performance using fiducial-based feature depending on the ECG signal quality as well as the choice of classification method. Moreover, when extraction methods. Alternatively, although non-fiducial-based feature extraction methods obviate the the subject size magnifies, the performance tends to degrade and computational load increases need for detecting ECG fiducial points, the subject verification accuracy varies depending on the ECG significantly [34,38–39]. signal quality as well as the choice of classification method. Moreover, when the subject size magnifies, To overcome the mobile ECG limitations and improve the accuracy and robustness of a the performance tends to degrade and computational load increases significantly [34,38,39]. biometric recognition system using mobile ECG, a new two-stage subject verification system is To overcome the mobile ECG limitations and improve the accuracy and robustness of a biometric proposed in this paper that takes the advantages of both fiducial and non-fiducial features and recognition system using mobile ECG, a new two-stage subject verification system is proposed in this intelligently combines a probabilistic random forest classifier with a one-to-many template matching paper that takes the advantages of both fiducial and non-fiducial features and intelligently combines a classifier based on wavelet coefficients. To objectively assess the performance of the proposed probabilistic random forest classifier with a one-to-many template matching classifier based on wavelet algorithm, a new ECG database is created by combining ECG data from four sources, including the coefficients. To objectively assess the performance of the proposed algorithm, a new ECG database is ECG from a mobile phone, the ECG in the presence of arrhythmia, the ECG with normal sinus created by combining ECG data from four sources, including the ECG from a mobile phone, the ECG rhythm, and the ECG data measured over a 6-month span. To the best of our knowledge, our method in the presence of arrhythmia, the ECG with normal sinus rhythm, and the ECG data measured over a is the first study to use a prototype system of industrial sensors integrated into mobile phones to 6-month span. To the best of our knowledge, our method is the first study to use a prototype system of obtain ECG data. An up/down sampling technique is utilized to re-sample the ECG signals at industrial sensors integrated into mobile phones to obtain ECG data. An up/down sampling technique different sampling rates into a uniform one, to ensure the proper operation of the two-stage is utilized to re-sample the ECG signals at different sampling rates into a uniform one, to ensure the classification system. proper operation of the two-stage classification system. The rest of this paper is organized as follows. The proposed ECG biometric verification system The rest of this paper is organized as follows. The proposed ECG biometric verification system is is presented in Section 2; Section 3 describes the ECG data sources as well as the generation of the presented in Section 2; Section 3 describes the ECG data sources as well as the generation of the new new database using cubic spline interpolation; the methodology of the proposed algorithm is database using cubic spline interpolation; the methodology of the proposed algorithm is presented presented in detail in Section 4; the results and discussions are demonstrated in Section 5; finally, in detail in Section 4; the results and discussions are demonstrated in Section 5; finally, Section 6 Section 6 concludes the paper. concludes the paper. SystemFramework Framework 2.2.System Figure 22 shows shows the verification system using mobile ECG. The Figure the framework frameworkofofa atypical typicalbiometric biometric verification system using mobile ECG. system operates in two stages: the enrollment stagestage and the stage.stage. The ECG signalsignal taken The system operates in two stages: the enrollment andverification the verification The ECG from an individual by a mobile device is transmitted to the biometric verification system in a remote taken from an individual by a mobile device is transmitted to the biometric verification system in a center center over wireless networks. At theAt time enrollment, the system extracts a set aofset features from remote over wireless networks. theoftime of enrollment, the system extracts of features the ECG signal of each individual, and stores the feature template into the database. During the from the ECG signal of each individual, and stores the feature template into the database. During the verification stage, when the ECG signal from an unknown subject is received, the system again verification stage, when the ECG signal from an unknown subject is received, the system again extracts a set of using features same methodology andthem applies to a classifier for making. decision aextracts set of features theusing same the methodology and applies to athem classifier for decision making. In this study, a new two-stage verification algorithm is proposed, as illustrated in Figure 3. In this study, a new two-stage verification algorithm is proposed, as illustrated in Figure 3.

Figure Figure2.2.Biometric Biometricpatient patientverification verificationsystem systemusing usingmobile mobileECG. ECG.

From Figure 3, in the enrollment stage the ECG signal received from each individual i (i = 1, . . . N ) is first pre-processed, where N is the total number of individuals registered in the system. The system then extracts the fiducial features Xei (l ) based on limited ECG significant points (P, Q, R, S, T), as well as non-fiducial features Xei (w) based on significant wavelet coefficients, and next stores both feature sets into the database. When the ECG signal from an unknown subject j is received, j j the system once again extracts its fiducial features Xq (l ) and non-fiducial features Xq (w). Next, the proposed two-stage subject verification system includes the following steps: (1) application of the

Sensors 2017, 17, 410

4 of 16

j

ji

Xq (l ) feature set to a random forest classifier with function f [·]. The probabilities Pq for the unknown subject j being identified as the individual i in the database are then derived as given in Equation (1): h i j1 j2 jN j Pq , Pq , . . . , Pq = f Xq (l ) Xe1 (l ), Xe2 (l ), . . . , XeN (l )

(1)

(2) selection of K candidate subjects (K  N) whose probabilities are higher than a pre-determined probability threshold Pth and application of the data of those selected K subjects to the subsequent one-to-many template matching classifier; (3) use of a 1-to-K template matching classifier, based on significant wavelet coefficients, to calculate the wavelet distance D using the unknown identity feature j Xq (w) against each of the selected candidate subjects Xek (w) (k = 1, . . . , K) as given in Equation (2): h i j Dk = WDIST Xq (w)| Xek (w) , k = 1, . . . K

(2) j

Here WDIST [·] is the function that calculates the distance between the test subject Xq (w) and each candidate Xek (w). Finally, a decision is made using the minimum Dk which means that the candidate j Xek (w) with the smallest distance to Xq (w) is selected. Compared to a conventional 1-to-N template matching classifier, the proposed two-stage classifier using 1-to-K template matching eliminates the concerns of significantly increased computational load and performance degradation as subject size N Sensors 2017, 17, 410 4 of 16 increases [34,38,39].

j

Figure 3. Proposed two-stage ECG biometric subject verification algorithm. The fiducial feature Xq (l ) Figure 3. Proposed two-stage ECG biometric subject verification algorithm. The fiducial feature is first applied to the random forest classifier, to identify the limited K potential candidates (usually j is random forest classifier, identify coefficient) the limited is applied potentialtocandidates (usually K first < 5).applied Next, to thethe non-fiducial feature Xq (w)to(wavelet a 1-to-K template 5). Next, the non-fiducial feature matching classifier to make the final decision. (wavelet coefficient) is applied to a 1-to- template matching classifier to make the final decision.

3. Data Sources From Figure 3, in the enrollment stage the ECG signal received from each individual To real-life practicalitywhere of biometric verification, the proposed algorithm is 1, …support is first pre-processed, is thesubject total number of individuals registered in the evaluated using multiple types ofthe ECG data features on subjects under different health conditions. 1 system. The system then extracts fiducial based on limited ECG significantTable points summarizes the database used, including a comparatively large database generated in this study. (P, Q, R, S, T), as well as non-fiducial features based on significant wavelet coefficients, and

next stores both feature sets into the database. When the ECG signal from an unknown subject j is Table 1. Summary of the datasets used in this study. MIT-BIH: MIT-Beth Israel Hospital. received, the system once again extracts its fiducial features and non-fiducial features . Next, the proposed two-stage subject verification system includes the following steps: (1) application Subject Sampling Subject Typeclassifier with function Number Recordings feature set to a random forest ∙ . of The probabilities Ratefor the of the Data Sources Quantity unknown subject as the individual 30 i in the databaseSingle are then derived 400 as given in Mobile Phone being identified Healthy Hz PhysioNet Human-ID Age between 13 and 75 89 2–20 times over 6 months 500 Hz Equation (1): MIT-BIH MIT-BIH Mix of above

Arrhythmia Normal sinus , ,… , rhythm Mix of above

47

| 18 ,

,…,

184

Single Single Mix of above

360 Hz 128 Hz (1) 360 Hz

(2) selection of candidate subjects ( ≪ ) whose probabilities are higher than a preand application of the data of those selected subjects to the determined probability threshold A majority of previously developed ECG biometric recognition systems utilize ECG data measured subsequent one-to-many template matching classifier; (3) use of a 1-totemplate matching from the same device at the same sampling rate. However, in real applications, the ECG signals may classifier, based on significant wavelet coefficients, to calculate the wavelet distance using the unknown identity feature against each of the selected candidate subjects ( 1, … , ) as given in Equation (2): |

,

1, …

(2)

PhysioNet Human-ID MIT-BIH MIT-BIH Mix of above

Age between 13 and 75 Arrhythmia Normal sinus rhythm Mix of above

89 47 18 184

2–20 times over 6 months Single Single Mix of above

500 Hz 360 Hz 128 Hz 360 Hz

Sensors 2017, 17, 410

5 of 16

A majority of previously developed ECG biometric recognition systems utilize ECG data measured from the same device at the same sampling rate. However, in real applications, the ECG signals taken from different types of mobile devices with different samplingThe frequencies. The be taken may frombe different types of mobile devices with different sampling frequencies. unmatched unmatched may lead to operation an improper of asystem. classification system. resolve sampling ratesampling may leadrate to an improper of aoperation classification To resolve thisTo problem, this problem, an up/down-sampling interpolation technique is introduced in this study to convert an up/down-sampling interpolation technique is introduced in this study to convert the ECG signals ECG signals at different sampling ratessampling into a uniformed rate (360 Hzstudy). is selected in this atthe different sampling rates into a uniformed rate (360sampling Hz is selected in this A unified study). A database is then created by integrating all four datasets using cubic spline database is unified then created by integrating all four datasets together using cubictogether spline data interpolation data interpolation method for data up/down-sampling. 4 illustrates an exampleusing of up-sampling method for data up/down-sampling. Figure 4 illustratesFigure an example of up-sampling the cubic usinginterpolation the cubic spline interpolation for the ECG data from the MIT-BIH normal sinus rhythm. spline method for ECGmethod data from MIT-BIH normal sinus rhythm.

Figure4.4.ECG ECGup-sampling up-samplingusing usingcubic cubicspline splinedata datainterpolation. interpolation.Left: Left:128 128Hz; Hz;Right: Right:360 360Hz. Hz. Figure

Methodology 4.4.Methodology FollowingSection Section22on onthe theframework frameworkofofthe theproposed proposedECG ECGbiometric biometricsubject subjectverification verificationsystem system Following using combined random forest and wavelet distance measure classifiers, this section describes the using combined random forest and wavelet distance measure classifiers, this section describes the methodology in detail for identifying an unknown subject. methodology in detail for identifying an unknown subject. 4.1.Data DataPre-Processing Pre-Processing 4.1. Theraw rawECG ECGdata data first applied a bandpass using fast Fourier transform The The is is first applied to atobandpass filterfilter using fast Fourier transform (FFT).(FFT). The low low frequency of the bandpass filter setto toget 2 Hz rid ofwandering; baseline wandering; the high frequency cutoff cutoff of the bandpass filter is set to 2isHz rid to of get baseline the high frequency frequency of thefilter bandpass setto to keep 50 Hzastomuch keep as much ECGenergy signal as energy as possible cutoff of thecutoff bandpass is setfilter to 50isHz ECG signal possible while while removing the line power line interference and other high frequency removing the power interference (60 Hz) (60 andHz) other high frequency noise. noise. 4.2. 4.2.R-Peak R-PeakDetection Detection R-peak R-peakdetection detectionisisthe thekey keytotoensure ensureeach eachP-QRS-T P-QRS-Tcomplex complexisiscorrectly correctlydelineated. delineated.The Thegoal goalfor for R-peak R-peakdetection detectionisistotolocate locatethe thetiming timingposition positionfor forall alltrue truepositive positiveR-peaks R-peakswhile whileeliminating eliminatingfalse false positive positiveR-peaks. R-peaks.In Inthis thisstudy, study,a amodified modifiedR-Peak R-Peakdetection detectionisiscreated. created.ItItcombines combinesthe thevalley–peak valley–peak detection detectionalgorithm algorithm with withshifting shifting windowing windowing from from [40] [40]with withan anadaptive adaptivethreshold thresholdalgorithm algorithmasas proposed proposedby byPan Panand andTompkins Tompkins[41]. [41].The Theaccuracy accuracyofofthis thisnew newR-peak R-peakdetection detectionalgorithm algorithmisisevaluated evaluated using the MIT-BIH arrhythmia database as the R-peak positions are accurately annotated by the medical staff. The proposed R-peak detection algorithm achieves a 99.46% true positive rate (TPR), which demonstrates the effectiveness of the modified R-peak detection algorithm. The computation time is 0.53 s, which is ~1/5 of the time needed by the Pan–Tompkins algorithm under the same development environment (the same computer device, with dual CPU cores of Intel® core™-i5 processor 3.6 GHz). 4.3. P-QRS-T Complex Delineation After R-peak detection, the next step is to delineate the P-QRS-T complex through time windowing for ECG feature extraction. In this study, a time window of 800 ms centered on R-peak location is

development environment (the same computer device, with dual CPU cores of Intel® core™-i5 processor 3.6 GHz). 4.3. P-QRS-T Complex Delineation Sensors 2017, 17, 410

6 of 16

After R-peak detection, the next step is to delineate the P-QRS-T complex through time windowing for ECG feature extraction. In this study, a time window of 800 ms centered on R-peak used to segment each P-QRS-T complex. The 3D arrayThe of the complexECG is illustrated location is used to segment each P-QRS-T complex. 3D delineated array of theECG delineated complex in is Figure 5. in Figure 5. illustrated

Figure 5. 3D array of P-QRS-T complexes.

To further improve improve the the feature feature extraction extraction accuracy, accuracy, some some of of the the P-QRS-T P-QRS-T outliers outliers are are removed removed To further as suggested by al. [10] [10] by by calculating calculating the the Pearson Pearson correlation correlation coefficients each P-QRS-T P-QRS-T as suggested by Chan Chan et et al. coefficients for for each complex against the mean complex calculated from the 3D array. Figure 5 presents an example of 10 complex against the mean complex calculated from the 3D array. Figure 5 presents an example of waveforms, and we define the mean complex as the mean of these waveforms. The distribution 10 waveforms, and we define the mean complex as the mean of these waveforms. The distribution of of the examined as where the correlation correlation coefficients coefficients is is examined as follows. follows. A A threshold threshold is is determined determined as: as: µ − 0.5 0.5 × σ,, where is the the mean mean value value of of the the correlation correlation coefficient thestandard standard deviation. deviation. If If the the correlation correlation µ is coefficient and and σ isisthe coefficient of aa P-QRS-T P-QRS-T complex complex falls falls below below this this pre-determined pre-determined threshold, considered as an coefficient of threshold, it it is is considered as an outlier and is removed. outlier and is removed. 4.4. 4.4. Fiducial Fiducial Feature Feature Extraction Extraction Once Q, S, S, T T peaks peaks and and valleys valleys are are first first detected detected using using aa local local Once the the R R point point is is detected, detected, the the P, P, Q, maximum/minimum within aa defined definedphysical physicalregion. region.In Inthis thisstudy, study,the theQQand and maximum/minimum searching searching algorithm algorithm within S S points are limited within the 150 ms width window, centered at the R point. The P point is within points are limited within the 150 ms width window, centered at the R point. The P point is within a a200 200 ms period advance from the point. The point within a 400 period backward from ms period advance from the RR point. The TT point is is within a 400 msms period backward from thethe R R point. Next, the onset and offset points P wave on , P off ) and T wave (T on,,T T off) ) are determined point. Next, the onset and offset points forfor P wave (P(P , P ) and T wave (T are determined on on off off using to Singh Singh et et al. al. [14]). [14]). Once using the the triangle triangle optimization optimization method method (for (for details details please please refer refer to Once those those ECG ECG significant pointsare arecorrectly correctly identified, fiducial features are extracted then extracted based their significant points identified, the the fiducial features are then based on theiron relative relative amplitude, as angles well asofangles of the ECGaswave, as illustrated 6. temporaltemporal interval,interval, amplitude, as well as the ECG wave, illustrated in Figurein6.Figure Prior to Prior to data processing by the proposed patient verification system, the QT temporal intervals data processing by the proposed patient verification system, the QT temporal intervals Ti are scaled are scaled to according to the Framingham formula [42]: according the Framingham formula [42]: Tiscaled = Ti + 0.154 × (1 − Tr−r )

(3)

where Tr−r is the time interval between the adjacent R peaks. To achieve an optimized performance, three combinations of ECG significant points are investigated for feature extraction. Table 2 shows a summary of the extracted fiducial features from the three combinations of ECG fiducial points. Since the mobile ECG signals suffer from a poorer signal-to-noise ratio and baseline drift due to human respiration or motion artifacts from finger-to-electrode pad contact, the detections of mobile ECG wave boundaries (i.e., onset and offset

is the time interval between the adjacent R peaks. where To achieve an optimized performance, three combinations of ECG significant points are investigated for feature extraction. Table 2 shows a summary of the extracted fiducial features from the three combinations of ECG fiducial points. Since the mobile ECG signals suffer from a poorer Sensors 2017, 17, 410 7 of 16 signal-to-noise ratio and baseline drift due to human respiration or motion artifacts from finger-toelectrode pad contact, the detections of mobile ECG wave boundaries (i.e., onset and offset points) points) less reliable they are mostly susceptible to noise and baseline wandering. Therefore, becomebecome less reliable as theyasare mostly susceptible to noise and baseline wandering. Therefore, the the presented two-stage cascaded method allows for fiducial features extracted from only P, Q, R, S, presented two-stage cascaded method allows for fiducial features extracted from only P, Q, R, S, T peaks and valleys to be sufficient sufficient for for subject subject verification. verification.

Figure fiducial points. points. Figure 6. 6. Feature Feature extraction extraction based based on on fiducial Table 2. Summary of extracted fiducial features. ML: machine learning; ML-N: using N fiducial points in ML.2. Summary of extracted fiducial features. ML: machine learning; ML-N: using N fiducial points Table in ML.

Method Method ML-9 ML-9 ML-5 ML-3 ML-5

Fiducial Points P, Q, R, S, T, Pon, Fiducial Points Poff, Ton, Toff P, Q, R, S, T, Pon , P, Q, R, S, T Poff , Ton , Toff Q, R, S P, Q, R, S, T

ML-3 Feature Extraction Q, R, S 4.5. Non-Fiducial

Temporal Temporal T1 to T15 T11 T3 toT1Tto 6, T 15, T12, T15 T3, T4

Amplitude Angle V1, V2, V3, V4, V5, Amplitude Angle A1 to A6 V 6, V 7 V ,V ,V ,V , A6A5 V11V, V, 22V, V, 33V, V44, V7 A1Ato 3 to 5 6 7 V 1, V 2, V 7 A5

T3 to T6 , T11 , T12 , T15

V1 , V2 , V3 , V4 , V7

A3 to A5

T3 , T4

V 1 , V2 , V7

A5

The ECG non-fiducial features are obtained based on DWT. The wavelet analysis provides a 4.5. Non-Fiducial Feature Extraction time–frequency representation of an analyzed signal in time domain, allowing a higher The ECG non-fiducial features are obtained based on DWT. The wavelet analysis provides a temporal resolution for high frequency components of and lower temporal resolution for its time–frequency representationThis of anmulti-scale analyzed signal x (t) in timeisdomain, allowing a higher temporal low-frequency components. time resolution beneficial for ECG complex data resolution for high frequency components of x t and lower temporal resolution for its low-frequency ( ) processing, as the ECG waveform exhibits both high frequency data transitions (related to QRS) as components. This multi-scale time resolution is beneficial complex data as the well as low frequency waves (P, T) within a small P-QRS-T for timeECG window. Using theprocessing, wavelet analysis ECG waveform exhibits ECG both high data transitions (related well as lowatfrequency algorithm, the original data frequency is hierarchically decomposed intoto QRS) -levelassub-series different waves (P, T) within a small P-QRS-T time window. Using the wavelet analysis algorithm, the original frequency bands. This is done by processing the input ECG data with two complementary high ECG data low is hierarchically decomposed into N-level at different frequency This in is pass and pass filters in a tree-structured fashionsub-series and down-sampling by two atbands. each stage done input ECG data x (t) with two complementary and lowinpass filters in orderby to processing decomposethe into a set of orthogonal components (D1 to D5,high A5),pass as shown Figure 7. In a tree-structured fashion andthe down-sampling twoand at each stage in order to decomposetime intoseries a set Figure 7, D1 to D5 represent detailed time by series A5 represents the approximate of orthogonal components (D1 to D5,isA5), shownfrequency in Figure [43] 7. Inof Figure 7, D1 to D5 represent the at different frequency sub-bands; the as Nyquist the ECG data. detailed time series and A5 represents the approximate time series at different frequency sub-bands; Fn is the Nyquist frequency [43] of the ECG data. The corresponding wavelet coefficients ω pr at each level of decomposition can be derived by Equation (4):   Z +∞ 1 t − r ∗ 2p ω pr = x (t) ∗ √ ϕ dt (4) 2p −∞ 2p where ϕ(t) is the selected mother wavelet, p is the scale parameter which represents the level of decomposition, and r is the shifting parameter which gives the number of wavelet coefficients at decomposition level p. In this study, the Daubechies (db) mother wavelet is used, which provides a family of wavelets called dbN, where N is the order of wavelets. For achieving the best overall patient

Sensors 2017, 17, 410

8 of 16

verification accuracy, db3 is used to decompose each ECG P-QRS-T complex into sub-series D1 to D5 Sensors 2017, 17, 410 8 of 16 and A5, as illustrated in Figure 8. Figure 7. The 5-level decomposition using Daubechies discrete wavelet transform (DWT).

The corresponding wavelet coefficients Equation (4):

at each level of decomposition can be derived by ∗



(4)



where is the selected mother wavelet, p is the scale parameter which represents the level of decomposition, and r is the shifting parameter which gives the number of wavelet coefficients at decomposition level p. In this study, the Daubechies (db) mother wavelet is used, which provides a family of wavelets called , where is the order of wavelets. For achieving the best overall patient verification accuracy, 3 is used to decompose each ECG P-QRS-T complex into sub-series D1 to D5 and A5, as illustrated in Figure 8. Figure 7. The 5-level decomposition using Daubechies discrete wavelet transform (DWT). Figure 7. The 5-level decomposition using Daubechies discrete wavelet transform (DWT).

Amplitude

+300 The corresponding wavelet coefficients +250 Equation (4): +200 +150

P-QRS-T Complex

at each level of decomposition can be derived by



+100



(4)



+50

Amplitude

Amplitude

where is the 0 selected mother wavelet, p is the scale parameter which represents the level of decomposition, -50 and r is the shifting parameter which gives the number of wavelet coefficients at -100 decomposition level study, wavelet is0.7used, 0.8 which provides a 0 p. In this 0.1 0.2 the Daubechies 0.3 0.4 (db) mother 0.5 0.6 Time (second) family of wavelets called , where is the order of wavelets. For achieving the best overall Decomposed Wavelet Coefficients patient verification 3 is used to decompose each ECG P-QRS-T complex into sub-series +600 accuracy, +500 D1 to D5 and A5, as illustrated in Figure 8. +400 +300 +200 +100 0 +300 -100 +250 -200 +200 -300 +150 -400 0 +100

P-QRS-T Complex

0.1

0.2

+50

0.3 0.4 0.5 Length of Wavelet Coefficients (second)

0.6

0.8

0.7

0

Figure 8. ECG original complex (top); and decomposed signal (bottom). 8. ECG original complex (top); and decomposed signal (bottom).

-50Figure -100 0

0.1

0.2

0.3

0.4

0.5

0.6

0.8

0.7

Time (second) For For ECG ECG data data with withsampling samplingfrequency frequency Fs , ,its itsNyquist Nyquistfrequency frequencyFn = Fs /2. /2. The The relationship relationship Decomposed Wavelet bandwidth Coefficients between the sampling frequency F and the frequency range F at the p level s p between the sampling frequency and the frequency bandwidth range at the p level wavelet wavelet +600 +500 decomposition is derived in Equation (5) as: decomposition is derived in Equation (5) as:

Amplitude

+400 +300 Fs Fs +200 (5) ≤ Fp ≤ p (5) +100 p2+1 2 2 2 0 -100 Taking the -200 MIT-BIH arrhythmia database as an example (Fs = 360 Hz), Table 3 shows -300 the frequency sub-band range and the number of wavelet coefficients at each level of wavelet -400 0 0.8 0.1 0.2 0.3 0.4 0.5 0.6 decomposition. Considering the fact that ECG signal is pre-processed by0.7a bandpass filter from Length of Wavelet Coefficients (second)

2 Hz to 50 Hz, the ECG non-fiducial features will only take the wavelet coefficients at the significant level of decomposition, within the frequency range of interest. Figure 8. ECG original complex (top); and decomposed signal (bottom).

For ECG data with sampling frequency , its Nyquist frequency between the sampling frequency and the frequency bandwidth range decomposition is derived in Equation (5) as: 2

2

/2. The relationship at the p level wavelet

(5)

frequency sub-band range and the number of wavelet coefficients at each level of wavelet decomposition. Considering the fact that ECG signal is pre-processed by a bandpass filter from 2 Hz to 50 Hz, the ECG non-fiducial features will only take the wavelet coefficients at the significant level of decomposition, within the frequency range of interest. Sensors 2017, 17, 410

9 of 16

Table 3. Number of wavelet coefficients at each frequency sub-band.

Frequency Number of Table 3. Number Decomposition Level (p =of1 wavelet … 5) coefficients at each frequency sub-band. Bandwidth Fp (Hz) Wavelet Coefficients 90Fto 180 144 Coefficients Decomposition Level (pD1 = 1 . . . 5) Frequency Bandwidth Number of Wavelet p (Hz) D2 45 to 90 72 D1 90 to 180 144 D3 D2 45 to 90 22.5 to 45 7236 D3 22.5 to 4511.25 to 22.5 3618 D4 D4 11.25 to 22.5 18 D5 6.75 to 11.25 9 D5 6.75 to 11.25 9 A5 0–6.75 A5 0–6.75 99 4.6. Two-Stage Subject Verification System 4.6. Two-Stage Subject Verification System To enhance the robustness of the biometric subject verification system using mobile ECG, a To enhance the robustness of the biometric subject verification system using mobile ECG, two-stage automatic subject verification algorithm is developed. The ECG fiducial features are first a two-stage automatic subject verification algorithm is developed. The ECG fiducial features are applied to a random forest ensemble learning classifier. A random forest operates by constructing a first applied to a random forest ensemble learning classifier. A random forest operates by constructing multitude of independent decision tree predictors. A bootstrapping technique is used to resample the a multitude of independent decision tree predictors. A bootstrapping technique is used to resample training datasets such that each decision tree takes only a subset of input data. During the model the training datasets such that each decision tree takes only a subset of input data. During the model training process, each leaf node of the decision tree produces estimates of several conditional class training process, each leaf node of the decision tree produces estimates of several conditional class probabilities. When an input subject arrives at this leaf node, it gives the probability of being classified probabilities. When an input subject arrives at this leaf node, it gives the probability of being classified as any subject 1, … in the database. Figure 9 shows the decision-making process when an as any subject i (i = 1, . . . N ) in the database. Figure 9 shows the decision-making process when an unknown subject j is applied to the random forest classifier. unknown subject j is applied to the random forest classifier.

Figure 9. Random j represents anan unknown subject; T is the Random forest forestclassifier classifierfor forfiducial-based fiducial-basedfeatures. features. j represents unknown subject; is total number of decision trees trees used in thein random forest algorithm; N is the total number of subjects the total number of decision used the random forest algorithm; is the total number of stored in stored the database; P ji is aand probability with size N, indicating classified any subjects in the and database; is avector probability vector with size j as, being indicating as as being subject i i = 1, . . . N in the database. ( ) classified as any subject 1, … in the database.

In Figure Figure 9, the total total number number of of decision decision trees trees used used in is In 9, T isis the in the the random random forest forest algorithm; algorithm; N is the total total number number of of subjects subjects stored stored in in the the database. database. The probability of of the the random random the The final final classification classification probability forest classifier classifier is is the the average average of of the the probabilities probabilities from from the the terminal terminal nodes nodes of of all all decision decision trees trees that that aa forest test subject has reached. Only a few candidate subjects K whose probability is above an optimized probability threshold Pth will be applied to the subsequent 1-to-K template matching classifier for further decision. The probability threshold Pth is determined based on the overall subject verification accuracy. Unlike a conventional 1-to-N template matching classifier where the feature vector of an

Sensors 2017, 17, 410

10 of 16

test subject has reached. Only a few candidate subjects whose probability is above an optimized probability threshold will be applied to the subsequent 1-to- template matching classifier for Sensors 2017, 17, 410 10 of 16 further decision. The probability threshold is determined based on the overall subject verification accuracy. Unlike a conventional 1-to- template matching classifier where the feature vector of subject an unknown subject compared to that of elements of the database, the unknown is compared to is that of all N elements of all the subject database, thesubject proposed algorithm proposed algorithm only needs to implement 1-totemplate matching, as shown in Figure 10. only needs to implement 1-to-K template matching, as shown in Figure 10.

1-to-K template Figure 10. 1-totemplate matching matching using usingwavelet waveletdistance distancemeasure. measure.

Assuming the the stored stored wavelet wavelet coefficients coefficients vector vector for for the the subject subject ii isis represented represented as as Xei (w) = Assuming   D feature vector vector X j (w) = D1i ,,DD2i… D . . . D,iPwhere , where Pis isthe thedecomposed decomposedlevel, level,a aquery querysubject subject j has has aa feature q Dj , D j, … D .j The template matching classifier calculates the distance between the wavelet (D1 , D2 , . . . DP ). The template matching classifier calculates the distance between the wavelet coefficients by: coefficients by: j i P D D p Dp − D (6)   (6) WDIST (i ) = ∑ max D i p=1 max D p The smallest smallest wavelet wavelet distance distance WDIST(i) WDIST(i) indicates indicates the the identified identified subject subjecti.i. The In summary, summary, the the proposed proposed two-stage two-stage classifier classifier offers offers two two advantages. advantages. First, First, it it eliminates eliminates the the In concern from previous studies in which the subject verification accuracy of a conventional 1-toconcern from previous studies in which the subject verification accuracy of a conventional 1-to-N matchingclassifier classifiertends tends to worsen when the subject large The algorithm proposed matching to worsen when the subject size Nsize gets toogets largetoo [34]. The [34]. proposed algorithm this eliminates this performance concern only 1-totemplate matching is eliminates performance degradation degradation concern as only 1-to-Kastemplate matching is needed, where needed, where ≪ . Second, the proposed algorithm significantly reduces the computation load. K  N. Second, the proposed algorithm significantly reduces the computation load. 5. Results and Discussion Discussion 5. 5.1. 5.1. Single Random Forest Forest Classifier Classifier The The accuracy accuracy of of machine machine learning learning classifier classifier depends depends on the the accurate accurate detection detection of of ECG ECG fiducial fiducial points, as as thethe features extracted from from those those fiducialfiducial points. points. For performance comparison purposes, points,asaswell well features extracted For performance comparison we first assume thatassume a singlethat random forest algorithm is used and would be and first would independently purposes, we first a single random forest algorithm is used be first evaluated in order to explore howto the subjecthow verification accuracy changes when changes the ECGwhen features independently evaluated in order explore the subject verification accuracy the are different from combinations fiducial points. Following theFollowing above discussions, ECGextracted features from are extracted different of combinations of fiducial points. the above three cases listed Table 2 are evaluated this study our hypothesis that the local discussions, three in cases listed in Table 2 areinevaluated in to thissupport study to support our hypothesis that maximum/minimum fiducial fiducial points (P, Q, R,(P, S, Q, and enoughenough for the for random forest the local maximum/minimum points R,T) S, are andsufficient T) are sufficient the random classifier. Figure Figure 11 shows subject accuracyaccuracy results for the three cases, where machine forest classifier. 11 the shows the verification subject verification results for the three cases, where learning represents the features extracted from the Q, the R, and points; ML-5ML-5 represents the machine ML-3 learning ML-3 represents the features extracted from Q, R,Sand S points; represents features extracted from from the P, the Q, R,P,S,Q, and ML-9 and further includes theincludes wave boundaries the features extracted R,TS,points; and Tand points; ML-9 further the wave Pboundaries , on and to Table 2).to Table 2). , PoffT,off Ton(according , and Toff (according on , Poff , TonP

Sensors Sensors2017, 2017,17, 17,410 410

11 11 of of 16 16

Figure Figure 11. 11. ECG ECG biometric biometric patient patient verification verification accuracy accuracy using using random random forest forest classifier classifier only. only. ML-3 ML-3 represents the features extracted from the Q, R, and S points; ML-5 represents the features represents the features extracted from the Q, R, and S points; ML-5 represents the features extracted extracted from points; and and ML-9 ML-9 further furtherincludes includesthe thewave waveboundaries boundariesPP P,off , and onon , P,off T,onT, on and Toff from the the P, P, Q, Q, R, R, S, S, and and T T points; T(according to Table off (according to Table 2). 2).

Four individual It isItobserved that three datasets, MITFour individual datasets datasetsare areevaluated evaluatedindependently. independently. is observed that three datasets, arrhythmia, MIT-normal, and mobile ECG ECG showshow a high accuracy (>99%) for both and ML-9 MIT-arrhythmia, MIT-normal, and mobile a high accuracy (>99%) for ML-5 both ML-5 and scenarios. ML-9ML-9 shows slightly better accuracy than ML-5 forforMIT-normal other ML-9 scenarios. shows slightly better accuracy than ML-5 MIT-normaldataset. dataset. For For the other datasets (MIT-arrhythmia (MIT-arrhythmia and and mobile mobile ECG), the ML-9 does not show improvement over the ML-5. datasets This is is possibly possibly due due to to the theunreliable unreliabledetections detections of ofthe thewave waveboundaries. boundaries. The The PhysioNet PhysioNet Human-ID Human-ID This dataset shows showsaa totally totallydifferent differentperformance. performance.ML-3 ML-3gives givesthe thebest bestaccuracy, accuracy,while whilethe theML-5 ML-5and andML-9 MLdataset 9 results degrade performance.Further Furtherinvestigation investigationofofthe thePhysioNet PhysioNetHuman-ID Human-IDdataset datasetindicates indicates results degrade inin performance. that there are lots of irregular it that irregular ECG ECG shapes, shapes,resulting resultingin inunreliable unreliablefiducial fiducialpoint pointdetection. detection.Overall, Overall, our itis isconcluded concludedthat thatML-5 ML-5gives givesthe thebest bestperformance performance among among the the four four datasets, which confirms our hypothesis and and is is therefore thereforeused usedfor forthe thefinal finalsystem systemsolution. solution. hypothesis 5.2. 5.2. Single Single Wavelet Wavelet Distance Distance Classifier Classifier To best performance for the template matching classifier based onbased wavelet Toobtain obtainthe the best performance for one-to-many the one-to-many template matching classifier on coefficients while maintaining a low computational load, we investigate the subject verification wavelet coefficients while maintaining a low computational load, we investigate the subject accuracy of accuracy the template wavelet derived derived from three sets of verification of thematching template classifier matchingusing classifier usingcoefficients wavelet coefficients from three decomposition levels, S1 = {D1, D3,D2, D4,D3, D5}, S2D5}, = {D2, D4,D3, D5}, and S3 and = {D3, D5}D4, forD5} all sets of decomposition levels, S1 D2, = {D1, D4, S2 D3, = {D2, D4, D5}, S3 D4, = {D3, four individual ECG databases. Our hypothesis is that the wavelet coefficients are obtained for all four individual ECG databases. Our hypothesis is optimized that the optimized wavelet coefficients are from thosefrom frequency sub-bands sub-bands (see Table 3) within frequency of 2–50range Hz, considering obtained those frequency (see Tablethe 3)ECG within the ECGrange frequency of 2–50 Hz, the fact that ECG signal pre-processed by a bandpass from 2 to 50 Hz. considering the fact thatisECG signal is pre-processed byfilter a bandpass filter from 2 to 50 Hz. Figure Figure 12 12 presents presents the the subject subject verification verification accuracy accuracy results results using using the the 1-to-N 1-to- template template matching matching wavelet classifier withwith the three sets S1, S2, S1, andS2, S3 defined Forabove. the PhysioNet waveletdistance distance(WDIST) (WDIST) classifier the three sets and S3 above. defined For the Human-ID dataset, the sampling rate is 500 Hz. The set S3 contains the ECG frequency range (2–50 Hz) PhysioNet Human-ID dataset, the sampling rate is 500 Hz. The set S3 contains the ECG frequency and therefore gives best performance. For performance. the MIT-normal the set S1 dataset, containsthe theset ECG range (2–50 Hz) andthe therefore gives the best Fordataset, the MIT-normal S1 frequency range due to the sampling ratetoofthe 128sampling Hz. Therefore, degrades when less contains the ECG frequency range due rate ofthe 128performance Hz. Therefore, the performance wavelet coefficients S2, and set S3) are (set included. two datasets For withother sampling rates 360with Hz degrades when less(set wavelet coefficients S2, andFor setother S3) are included. two datasets and 400 Hz, on average, the400 set Hz, S2 gives the bestthe performance. sampling rates 360 Hz and on average, set S2 gives the best performance.

Sensors2017, 2017,17, 17,410 410 Sensors Sensors 2017, 17, 410

12 of of16 16 12 of 16 12

Figure biometric patient verification accuracy using 1-to-N Three combinations Figure12. 12.ECG ECG biometric patient verification accuracy usingclassifier 1-to- only. classifier only. Three Three Figure 12. ECG biometric patient verification accuracy using 1-toclassifier only. of wavelet distance (WDIST) coefficients are presented. WDIST1-5: D1 to D5; WDIST2-5: D2 to D5; and combinationsof ofwavelet waveletdistance distance(WDIST) (WDIST)coefficients coefficientsare arepresented. presented.WDIST1-5: WDIST1-5: D1 D1 to to D5; D5;WDIST2WDIST2combinations WDIST3-5: D3and to D5. 5: D2 D2 to to D5; D5; WDIST3-5: D3 D3 to to D5. D5. 5: and WDIST3-5:

5.3. Two-Stage Two-StageClassifier Classifier 5.3. Two-Stage 5.3. Classifier Basedon onthe theresults resultsfrom fromFigures Figure11 11and and12, Figure 12,performance further performance performance optimization istoapplied applied Based further optimization is appliedis utilize Based on the results from Figure 11 and Figure 12, further optimization to utilize a probabilistic approach that combines the random forest method with the wavelet distance atoprobabilistic approachapproach that combines the random method the wavelet utilize a probabilistic that combines theforest random forestwith method with thedistance wavelettemplate distance template matching classifier. The results are presented in Figure 13. matching classifier. The results are presented in Figure 13. template matching classifier. The results are presented in Figure 13.

Figure 13. Comparisons of classifiers: the machine learning (ML-5), the one-to-many template matching Figure 13. 13. Comparisons Comparisons of of classifiers: the the machine machine learning learning (ML-5), (ML-5), the the one-to-many one-to-many template template Figure (WDIST2-5), and the proposedclassifiers: two-stage classifiers (ML-5 + WDIST2-5) on the four individual datasets matching (WDIST2-5), and the proposed two-stage classifiers (ML-5 + WDIST2-5) on the four four matching (WDIST2-5), and the proposed two-stage classifiers (ML-5 + WDIST2-5) on the and the combined large dataset. individual datasets and the combined large dataset. individual datasets and the combined large dataset.

The algorithm is evaluated on the fourfour individual datasets and also The proposed proposedtwo-stage two-stagecascaded cascaded algorithm is evaluated evaluated on the the individual datasets and The proposed two-stage cascaded algorithm is on four individual datasets and on the combined dataset to demonstrate its ability to support the real-life practicality of ECG biometric also on on the the combined combined dataset dataset to to demonstrate demonstrate its its ability ability to to support support the the real-life real-life practicality practicality of of ECG ECG also verification. The probability threshold value is optimized as 0.15 for pre-determining the candidate biometric verification. The probability threshold value is optimized as 0.15 for pre-determining the biometric verification. The probability threshold value is optimized as 0.15 for pre-determining the subjects to subjects be applied to the 1-to-Ktotemplate matching classifier. Theclassifier. results inThe Figure 13 indicate that candidate to be applied the 1-totemplate matching results in Figure 13 candidate subjects to be applied to the 1-to- template matching classifier. The results in Figure 13 the two-stage classifier achieves an overall better accuracy than each individual classifier. indicate that cascaded the two-stage two-stage cascaded classifier achieves an overall overall better accuracy than than each indicate that the cascaded classifier achieves an better accuracy each For instance,classifier. for the combined dataset withcombined 184 subjects, the WDIST2-5 and ML-5 achieves 96.31% and individual For instance, for the dataset with 184 subjects, the WDIST2-5 and MLindividual classifier. For instance, for the combined dataset with 184 subjects, the WDIST2-5 and ML98.33% subject verification accuracies respectively, while the cascaded two-stage classifier achieves a achieves 96.31% 96.31% and and 98.33% 98.33% subject subject verification verification accuracies accuracies respectively, respectively, while while the the cascaded cascaded twotwo55 achieves stage classifier classifier achieves achieves aa 99.52% 99.52% subject subject verification verification accuracy. accuracy. ItIt should should be be noted noted that that the the unified unified stage

Sensors 2017, 17, 410

13 of 16

99.52% subject verification accuracy. It should be noted that the unified dataset includes subject ECG measured from mobile phones, subject ECG measured in the presence of arrhythmia and subject ECG data measured 2–20 times over a 6-month period. Therefore, the high subject verification accuracy of 99.52% indicates the robustness of our proposed algorithm. In our study, the subject verification accuracy is defined as the number of subjects being correctly classified divided by the total number of subjects being tested. This leads to a small error rate of 0.48% where a subject is incorrectly classified. In our simulation, it is assumed that one subject is represented by only one ECG complex. In reality, the test subject could easily record more heartbeats (e.g., >10 heart beats) at a time to be used for subject verification. According to binomial theorem, using more than one heartbeat would further improve the subject verification rate over the results presented in this paper. From Figure 13, it can also be seen that the achieved accuracy improvement by the two-stage classifier depends on the quality of ECG data acquired. For instances, when ECG data is taken multiple times over a 6-month period, the proposed two-stage classifier achieves a 98.79% accuracy, better than the 93.54% from random forest alone and 96.23% from 1-to-N template matching alone. However, for subject ECG data that were taken from a single time test, such as MIT-BIH arrhythmia, MIT-BIH normal sinus rhythm, and ECG data from mobile phones, a high verification accuracy of better than 99% can be achieved using the single random forest classifier or the single 1-to-N wavelet distance measure classifier. Therefore, it is understandable that the improvement from the two-stage classifier over each single stage classifier would not be very obvious. Last, but not least, it is important to compare the performance of the proposed two-stage classifier to what was reported in literature. Due to the fact that many ECG studies use different datasets from different sources, and often, the data pre-processing technique, feature extraction and classification methods might all be different, a rigorous performance comparison might be impractical. However, the classification accuracy performance reported in literature can still be used as a benchmark and provide a relative reference for our study. Taking MIT-BIH ECG database as examples, Fatemian et al. [44] used the template matching approach for ECG subject identification, and reported an average of 99.61% verification accuracy when evaluated with MIT-BIH normal sinus rhythm database; Zokaee et al. [45] proposed using Mel-Frequency Cepstrum Coefficients (MFCC) for ECG feature extraction and k-Nearest Neighbors (KNN) for classification, and achieved 100% identification accuracy for MIT-BIH normal sinus rhythm database and 89% for ECG data gathered from 50 subjects in a hospital with three records of different times. Sasikala et al. [46] tested the feature extraction approach on ECG verification performance using the MIT-BIH arrhythmia database and achieved 99.0% accuracy; Zeng et al. [47] implemented the reduced binary pattern template matching on MIT-BIH arrhythmia and normal sinus rhythm databases and achieved a success rate of 95.79% and 90.19% separately. As a comparison, our proposed two-stage classification algorithm achieved a 99.43% accuracy for the MIT-BIH arrhythmia database and 99.98% for the MIT-BIH normal sinus rhythm database. These results further demonstrate the robustness and efficiency of the proposed two-stage classifier. 6. Conclusions The main goal of this study is to facilitate the application of mobile ECG for biometric subject verification for applications where data security is demanding. Compared to the ECG data measured from a multi-lead clinical setup, the mobile ECG signals suffer from a poorer signal-to-noise ratio, baseline drift due to human respiration or motion artifacts from finger to electrode pad contact, as well as power line interferences. To overcome those challenges, a new two-stage ECG biometric verification system is proposed by combining probabilistic random forest method with wavelet distance measure. The motivation behind this approach is to decrease the environmental interference into a minimum effect on the verification accuracy. The proposed approach consists of two stages: (1) ECG fiducial features are first applied to a random forest ensemble learning classifier; (2) after the first stage classifier, only few candidates with probability higher than the threshold are applied to the template matching

Sensors 2017, 17, 410

14 of 16

classifier. Compared with either the random forest classifier with ECG fiducial feature extraction, or the template matching classifier using wavelet distance measure only, the proposed hybrid approach is more robust against the environmental variations while still maintaining a low computational load. The investigations of the robustness and effectiveness of the proposed two-stage algorithm are performed over four individual datasets. In addition, in order to simulate the real application environment where the sampling frequency of ECG signals might be different, a combined dataset is created using up/down sampling technique. Overall, for the combined 184 subjects, the proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from random forest alone and 96.31% accuracy from wavelet distance measure algorithm alone. It is also noted that the proposed two-stage algorithm is particularly effective when ECG data is acquired multiple times over a long time span. The evaluation results support that ECG signals can be potentially used for human biometric recognition using biosensors embedded into mobile devices. Furthermore, this algorithm can be adapted with dual ECG/fingerprint scanning to provide a highly reliable solution for a wide range of applications such as biosecurity and cybersecurity. Author Contributions: Robin Tan proposed the idea, processed the data, developed the algorithms and ran simulations. Professor Marek Perkowski guided the project during simulation and development. Conflicts of Interest: The authors declare no conflict of interest.

References 1. 2. 3.

4. 5. 6.

7. 8.

9.

10. 11. 12. 13.

Patel, S.; Park, H.; Bonato, P.; Chan, L.; Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. Neuroeng. Rehabil. 2012, 9, 21. [CrossRef] [PubMed] Wang, C.; Wang, Q.; Ren, K.; Lou, W. Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing. In Proceedings of the INFOCOM, San Diego, CA, USA, 15–19 March 2010; pp. 1–9. Boulos, M.N.K.; Wheeler, S.; Tavares, C.; Jones, R. How smartphones are changing the face of mobile and participatory healthcare: An overview, with example from eCAALYX. Biomed. Eng. Online 2011, 10, 24. [CrossRef] [PubMed] Jain, A.; Bolle, R.; Pankanti, S. (Eds.) Biometrics: Personal Identification in Networked Society; Springer Science & Business Media: New York, NY, USA, 2006; Volume 479. Israel, S.A.; Irvine, J.M.; Cheng, A.; Wiederhold, M.D.; Wiederhold, B.K. ECG to identify individuals. Pattern Recogn. 2005, 38, 133–142. [CrossRef] Shen, T.W.; Tompkins, W.J.; Hu, Y.H. One-lead ECG for identity verification. In Proceedings of the Second Joint Conference 24th Annual International Conference of the Engineering in Medicine and Biology Society, Annual Fall Meeting of the Biomedical Engineering Society, Houston, TX, USA, 23–26 October 2002; Volume 1, pp. 62–63. Biel, L.; Pettersson, O.; Philipson, L.; Wide, P. ECG analysis: A new approach in human identification. IEEE Trans. Instrum. Meas. 2001, 50, 808–812. [CrossRef] Irvine, J.M.; Wiederhold, B.K.; Gavshon, L.W.; Israel, S.A.; McGehee, S.B.; Meyer, R.; Wiederhold, M.D. Heart rate variability: A new biometric for human identification. In Proceedings of the International Conference on Artificial Intelligence (IC-AI’01), Las Vegas, NV, USA, 25–28 June 2001; pp. 1106–1111. Kyoso, M.; Uchiyama, A. Development of an ECG identification system. In Proceedings of the 23rd Annual International Conference of Engineering in Medicine and Biology Society, Istanbul, Turkey, 25–28 October 2001; Volume 4, pp. 3721–3723. Chan, A.D.; Hamdy, M.M.; Badre, A.; Badee, V. Wavelet distance measure for person identification using electrocardiograms. IEEE Trans. Instrum. Meas. 2008, 57, 248–253. [CrossRef] Shen, T.W.D.; Tompkins, W.J.; Hu, Y.H. Implementation of a one-lead ECG human identification system on a normal population. J. Eng. Comput. Innov. 2010, 2, 12–21. Wübbeler, G.; Stavridis, M.; Kreiseler, D.; Bousseljot, R.D.; Elster, C. Verification of humans using the electrocardiogram. Pattern Recogn. Lett. 2007, 28, 1172–1175. [CrossRef] Agrafioti, F.; Hatzinakos, D. Fusion of ECG sources for human identification. In Proceedings of the 2008 ISCCSP 2008 3rd International Symposium on Communications, Control and Signal Processing, St. Julian’s, Malta, 12–14 March 2008; pp. 1542–1547.

Sensors 2017, 17, 410

14.

15.

16. 17. 18. 19. 20. 21. 22.

23. 24.

25.

26. 27. 28. 29.

30.

31. 32.

33.

34.

15 of 16

Singh, Y.N.; Gupta, P. Biometrics method for human identification using electrocardiogram. In Proceedings of the International Conference on Biometrics, Alghero, Italy, 2–5 June 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1270–1279. Agrafioti, F.; Hatzinakos, D. ECG based recognition using second order statistics. In Proceedings of the CNSR 2008 6th Annual Communication Networks and Services Research Conference, Halifax, NS, Canada, 5–8 May 2008; pp. 82–87. Irvine, J.M.; Israel, S.A. A sequential procedure for individual identity verification using ECG. EURASIP J. Adv. Signal Process. 2009, 1, 1–13. [CrossRef] Ince, T.; Kiranyaz, S.; Gabbouj, M. A generic and robust system for automated patient-specific classification of ECG signals. IEEE Trans. Biomed. Eng. 2009, 56, 1415–1426. [CrossRef] [PubMed] Martis, R.J.; Acharya, U.R.; Min, L.C. ECG beat classification using PCA, LDA, ICA and discrete wavelet transform. Biomed. Signal Process. Control 2013, 8, 437–448. [CrossRef] Lin, C.H. Frequency-domain features for ECG beat discrimination using grey relational analysis-based classifier. Comput. Math. Appl. 2008, 55, 680–690. [CrossRef] Daamouche, A.; Hamami, L.; Alajlan, N.; Melgani, F. A wavelet optimization approach for ECG signal classification. Biomed. Signal Process. Control 2012, 7, 342–349. [CrossRef] Saini, I.; Singh, D.; Khosla, A. QRS detection using K-Nearest Neighbor algorithm (KNN) and evaluation on standard ECG databases. J. Adv. Res. 2013, 4, 331–344. [CrossRef] [PubMed] Christov, I.; Gómez-Herrero, G.; Krasteva, V.; Jekova, I.; Gotchev, A.; Egiazarian, K. Comparative study of morphological and time-frequency ECG descriptors for heartbeat classification. Med. Eng. Phys. 2006, 28, 876–887. [CrossRef] [PubMed] Wan, Y.; Yai, J. A neural network to identify human subjects with electrocardiogram signals. In Proceedings of the World Congress on Engineering and Computer Science, San Francisco, CA, USA, 22–24 October 2008. Li, M.; Narayanan, S. Robust ECG biometric by fusing temporal and cepstral information. In Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 1326–1329. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals. Circulation 2000, 101, 215–220. [CrossRef] Moody, G.B.; Mark, R.G. The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 2001, 20, 45–50. [CrossRef] [PubMed] Martínez, J.P.; Almeida, R.; Olmos, S.; Rocha, A.P.; Laguna, P. A wavelet-based ECG delineator: Evaluation on standard databases. IEEE Trans. Biomed. Eng. 2004, 51, 570–581. [CrossRef] [PubMed] Hu, Y.H.; Palreddy, S.; Tompkins, W.J. A patient-adaptable ECG beat classifier using a mixture of experts approach. IEEE Trans. Biomed. Eng. 1997, 44, 891–900. [PubMed] Maglaveras, N.; Stamkopoulos, T.; Diamantaras, K.; Pappas, C.; Strintzis, M. ECG pattern recognition and classification using non-linear transformations and neural networks: A review. Int. J. Med. Inform. 1998, 52, 191–208. [CrossRef] Prasad, G.K.; Sahambi, J.S. Classification of ECG arrhythmias using multi-resolution analysis and neural networks. In Proceedings of the TENCON 2003 Conference on Convergent Technologies for the Asia-Pacific Region, Bangalore, India, 15–17 October 2003; Volume 1, pp. 227–231. De Chazal, P.; O’Dwyer, M.; Reilly, R.B. Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng. 2004, 51, 1196–1206. [CrossRef] [PubMed] Salahuddin, L.; Kim, D. Detection of acute stress by heart rate variability (HRV) using a prototype mobile ECG sensor. In Proceedings of the International Conference on Hybrid Information Technology, Cheju Island, Korea, 9–11 November 2006; pp. 453–459. Wang, Y.; Plataniotis, K.; Hatzinakos, D. Integrating Analytic and Appearance Attributes for Human Identification from ECG signals. In Proceedings of the IEEE Biometrics Symposium 2006, Baltimore, MD, USA, 19–21 September 2006. Odinaka, I.; Lai, P.H.; Kaplan, A.D.; O’Sullivan, J.A.; Sirevaag, E.J.; Rohrbaugh, J.W. ECG biometric recognition: A comparative analysis. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1812–1824. [CrossRef]

Sensors 2017, 17, 410

35.

36.

37. 38. 39.

40. 41. 42. 43. 44. 45. 46. 47.

16 of 16

Kailanto, H.; Hyvarinen, E.; Hyttinen, J. Mobile ECG measurement and analysis system using mobile phone as the base station. In Proceedings of the 2008 Second International Conference on Pervasive Computing Technologies for Healthcare, Tampere, Finland, 30 January–1 February 2008; pp. 12–14. Ottenbacher, J.; Kirst, M.; Jatoba, L.; Huflejt, M.; Grossmann, U.; Stork, W. Reliable motion artifact detection for ECG monitoring systems with dry electrodes. In Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–24 August 2008; pp. 1695–1698. Szczepanski, ´ A.; Saeed, K. A mobile device system for early warning of ECG anomalies. Sensors 2014, 14, 11031–11044. [CrossRef] [PubMed] Sufi, F.; Khalil, I.; Habib, I. Polynomial distance measurement for ECG based biometric authentication. Secur. Commun. Netw. 2010, 3, 303–319. [CrossRef] Zhang, J.; Hu, X.; Liu, X.; Dong, J. A framework for ECG morphology features recognition. In Proceedings of the 2010 IEEE 23rd International Symposium on Computer-Based Medical Systems (CBMS), Perth, Australia, 12–15 October 2010; pp. 85–91. Chernenko, S. ECG Processing-R-Peaks Detection, Librow TM. 2012. Available online: www.librow.com (accessed on 14 January 2016). Pan, J.; Tompkins, W.J. A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng. 1985, 3, 230–236. [CrossRef] [PubMed] Vandenberk, B.; Vandael, E.; Robyns, T.; Vandenberghe, J.; Garweg, C.; Foulon, V.; Willems, R. Which QT Correction Formulae to Use for QT Monitoring? J. Am. Heart Assoc. 2016, 5, e003264. [CrossRef] [PubMed] Nyquist, H. Certain topics in telegraph transmission theory. Proc. IEEE 2002, 90, 280–305. [CrossRef] Fatemian, S.Z.; Hatzinakos, D. A new ECG feature extractor for biometric recognition. In Proceedings of the 16th International Conference on Digital Signal Processing, Santorini, Greece, 5–7 July 2009; pp. 1–6. Zokaee, S.; Faez, K. Human Identification Based on Electrocardiogram and Palmprint. Int. J. Electr. Comput. Eng. 2012, 2, 261–266. Sasikala, P.; Wahidabanu, R.S.D. Identification of individuals using electrocardiogram. Int. J. Comput. Sci. Netw. Secur. 2010, 10, 147–153. Zeng, F.; Tseng, K.K.; Huang, H.N.; Tu, S.Y.; Pan, J.S. A new statistical-based algorithm for ECG identification. In Proceedings of the Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), Piraeus-Athens, Greece, 18–20 July 2012; pp. 301–304. © 2017 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Toward Improving Electrocardiogram (ECG) Biometric Verification using Mobile Sensors: A Two-Stage Classifier Approach.

Electrocardiogram (ECG) signals sensed from mobile devices pertain the potential for biometric identity recognition applicable in remote access contro...
4MB Sizes 1 Downloads 8 Views