A Preliminary Investigation of Potential Biases in Phonation Threshold Pressure Analysis

Anusha Sundarrajan, Elizabeth Erickson-Levendoski,1 and M. Preeti Sivasankar, West Lafayette, Indiana

Summary: Objective. Phonation threshold pressure (PTP) is a voice measure used in both research and the clinic. PTP data analysis is susceptible to bias from investigator awareness of the experimental hypothesis and from poor investigator training. The objective of this study was to systematically examine the role of these two biases in PTP data analysis. Study Design. Prospective design. Methods. Two trained investigators analyzed PTP datasets. The datasets were identical but uniquely labeled, so the investigators were not aware that the datasets contained the same data. Each investigator analyzed two datasets. For one dataset, the investigators were “blinded” to the experimental hypothesis. For the other dataset, the investigators were “unblinded” and provided with a fake experimental hypothesis. Intraclass correlations were used to examine intrarater and interrater reliability. Results. For both investigators, intraclass correlations for intrarater reliability were in the excellent range. In contrast, lower intraclass correlations were obtained for interrater reliability. Conclusions. The high intrarater reliability obtained in this preliminary study suggests that awareness of the experimental hypothesis may not significantly bias PTP analysis. Conversely, the lower interrater reliability indicates differences between investigators analyzing the same data. Our findings contribute to the growing body of literature that seeks to standardize the use of PTP in research and the clinic. Future investigations are needed to identify methods that improve interrater reliability and to quantify the effects of biases on PTP data collection.

Key Words: Phonation threshold pressure–Biases–Reliability–Intraclass correlations.
Accepted for publication July 3, 2014.
This research was presented at the Annual Symposium: Care of the Professional Voice, May 28–June 1, 2014, Philadelphia, Pennsylvania.
From the Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana.
1 Currently at University of Wisconsin-Madison, Division of Otolaryngology-Head & Neck Surgery, Department of Surgery, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin.
Address correspondence and reprint requests to M. Preeti Sivasankar, Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907.
Journal of Voice, Vol. 29, No. 1, pp. 22-25
0892-1997/$36.00
© 2015 The Voice Foundation
http://dx.doi.org/10.1016/j.jvoice.2014.07.001

INTRODUCTION

Phonation threshold pressure (PTP) is the minimum lung pressure required to initiate and sustain vocal fold oscillation.1 PTP is commonly used in research laboratories to noninvasively assess changes in laryngeal biomechanics2–6 but is less widely used in the clinic. One obstacle to the widespread adoption of PTP is its susceptibility to bias.7 PTP data collection and analysis retain some subjectivity. Potential sources of bias in data collection and analysis include two kinds of investigator factors. The first is investigator knowledge of the hypothesis and experimental conditions. The second is investigator training and experience. These biases may manifest as inconsistent cuing to elicit threshold phonation, inadequate feedback about productions, and selective peak picking.

Blinding, defined as the deliberate withholding of information from participants and/or investigators involved in a study,8,9 is used to reduce biases in PTP data collection and analysis. Double-blinded research designs have been used in studies of PTP. Verdolini et al2 documented the role of systemic and secretory dehydration in altering PTP with a double-blinded, placebo-controlled study. Tanner et al3 used a double-blinded, within-subject crossover design to examine the effects of nebulized treatments on PTP following a surface laryngeal dehydration challenge in classically trained sopranos. Erickson-Levendoski and Sivasankar10 investigated the adverse phonatory effects of caffeine with a double-blinded, sham-controlled experiment. Double blinding, however, may not be applicable in all PTP studies. Complete participant and investigator blinding may be precluded in studies that investigate the effects of environmental background noise or of different ambient humidities and temperatures on PTP.11–13 In such circumstances, investigator blinding during data analysis alone may be used.14 However, to fully understand the importance of investigator blinding in PTP data analysis, it is essential to determine whether awareness of the experimental hypothesis can bias PTP analysis. In addition, it is useful to quantify how much investigator training is needed to measure PTP accurately.

The overall goal of this preliminary study was to investigate potential biases in PTP analysis. The primary focus was to determine whether knowledge of the experimental hypothesis can influence PTP analysis. The secondary focus was to compare PTP analyses between two investigators trained simultaneously by the same researcher. We hypothesized that knowledge of the experimental hypothesis would bias PTP analysis and, furthermore, that PTP data analyzed by similarly trained investigators would be in close agreement. To test these hypotheses, two trained investigators analyzed PTP data that had been collected previously by the authors. The investigators analyzed coded data. For one dataset they were blinded to the experimental hypothesis, and for the other they were unblinded and given a fake experimental hypothesis. We anticipate that knowledge from this study will inform practices for training investigators to analyze PTP data in both research and clinical settings.

METHODS

The Purdue University Institutional Review Board approved all procedures used in the study.

Anusha Sundarrajan, et al


Biases in PTP Analysis

Investigators

PTP data were analyzed by two investigators who did not participate in data collection. The investigators read several published articles on PTP and discussed them with the senior author (M.P.S.). The senior author trained both investigators in PTP analysis, and the two investigators then practiced PTP analysis.

Data categorization and instructions

The senior author created four PTP datasets. Each dataset contained the same data, comprising 606 /pi/ pressure peaks. The datasets were labeled blinded1 (BL1), blinded2 (BL2), unblinded1 (UB1), and unblinded2 (UB2). The first investigator analyzed UB1 and BL1, and the second investigator analyzed UB2 and BL2. The following instructions were provided to the investigators. For the UB1 dataset: “This dataset contains PTP data for 6 subjects (subject 1 through subject 6). Each subject performed the PTP task before loud reading and after loud reading. So each subject has PTP-pre data and PTP-post data. We want to test the hypothesis that loud reading increases PTP. This has been demonstrated before in PTP studies. So we would expect that PTP is higher for PTP-post data as compared to PTP-pre data.” For the BL1 dataset, the instructions were: “This dataset contains PTP data for 12 subjects (subject 13 through subject 24). Please analyze these PTP data.” The second investigator received the same instructions for their unblinded (UB2) and blinded (BL2) datasets, with two exceptions. For UB2, they were told they were analyzing subjects 37–48. For BL2, they were asked to analyze PTP data from subjects 25–36. We chose a loud reading task as the fake hypothesis for the unblinded dataset because this loading challenge is commonly incorporated in PTP studies. In addition, the expected direction of change (an increase in PTP) is well documented and was known to the investigators.4,15 Each investigator analyzed their two assigned datasets, and the analyses were compared to assess intrarater and interrater reliability.
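The four labeled datasets and the comparisons drawn from them can be summarized compactly. The sketch below assumes a Python representation; the dictionary layout and the function names `intrarater_pairs` and `interrater_pairs` are hypothetical and are not part of the study's materials. It shows which pairs of identically contentful datasets feed each reliability comparison.

```python
# Hypothetical encoding of the study design: four labels for the same 606 /pi/ peaks.
DATASETS = {
    "BL1": {"investigator": 1, "condition": "blinded"},
    "UB1": {"investigator": 1, "condition": "unblinded"},
    "BL2": {"investigator": 2, "condition": "blinded"},
    "UB2": {"investigator": 2, "condition": "unblinded"},
}


def intrarater_pairs(datasets):
    """Pair each investigator's blinded and unblinded analyses of the same data."""
    by_investigator = {}
    for label, info in datasets.items():
        by_investigator.setdefault(info["investigator"], {})[info["condition"]] = label
    return {
        inv: (conds["blinded"], conds["unblinded"])
        for inv, conds in sorted(by_investigator.items())
    }


def interrater_pairs(datasets):
    """Pair the two investigators' analyses within each labeling condition."""
    by_condition = {}
    for label, info in datasets.items():
        by_condition.setdefault(info["condition"], {})[info["investigator"]] = label
    return {
        cond: (invs[1], invs[2])
        for cond, invs in sorted(by_condition.items())
    }


if __name__ == "__main__":
    print(intrarater_pairs(DATASETS))
    print(interrater_pairs(DATASETS))
```

Running the script prints `{1: ('BL1', 'UB1'), 2: ('BL2', 'UB2')}` for the intrarater comparisons. It prints `{'blinded': ('BL1', 'BL2'), 'unblinded': ('UB1', 'UB2')}` for the interrater comparisons.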
PTP instrumentation, data collection, and analysis

The instrumentation for PTP data collection included a circumferentially vented pneumotachograph face mask coupled to transducers (Glottal Enterprises, Syracuse, NY) for the measurement of oral flow and PTP. Using validated methods, PTP data were collected from six individuals (two males and four females) in a humidity-controlled (50 ± 10%) and temperature-controlled (70 ± 5°F) environment (Traceable Memory Hygrometer, VWR, Radnor, PA). The subjects were trained on the PTP task using established methods.16 In brief, subjects produced 7–10 repetitions of /pi/ as softly as possible in a single breath at 92 beats/minute, paced by a Seiko Digital Metronome (Model# DM50; Seiko Sports Life Co., Ltd, China). These repetitions constituted one string. Subjects produced 8–10 strings at a C4 pitch cued on a keyboard. They were provided with visual feedback and investigator modeling. PTP data were deemed accurate when they met two criteria. First, the /pi/ pressure peaks had to be of equal height, as assessed visually. Second, oral flow had to be ≤15 mL/s during /p/ production. For PTP analysis, the investigators analyzed the pressures of three consecutive /p/ peaks from each string. The first and last pressure peaks of each string were not considered for inclusion. These peak pressures were averaged to yield PTP.

Statistical analysis

PTP data were summarized as mean ± standard deviation (SD). Data were analyzed using SPSS statistical software (Version 20; IBM, Chicago, IL). Intrarater reliability was assessed with a two-way mixed, absolute-agreement, single-measures intraclass correlation coefficient (ICC), calculated separately for each investigator. For investigator 1, intrarater reliability was the ICC between UB1 and BL1. For investigator 2, it was the ICC between UB2 and BL2. Interrater reliability was assessed with the same type of ICC, calculated separately for the blinded and unblinded datasets. For the blinded datasets, interrater reliability was the ICC between BL1 and BL2. For the unblinded datasets, it was the ICC between UB1 and UB2.

RESULTS

The primary purpose of this study was to investigate whether knowledge of the experimental hypothesis would affect PTP analysis. We compared each investigator's analyses of identical PTP data that were labeled differently (ie, BL1 and UB1 for investigator 1, and BL2 and UB2 for investigator 2). PTP results are shown in Table 1. For investigator 1, mean PTP was 3.96 cm H2O in the BL1 condition and 3.86 cm H2O in the UB1 condition, a mean difference of 0.10 cm H2O. For investigator 2, mean PTP was 3.62 cm H2O in the BL2 condition and 3.66 cm H2O in the UB2 condition, a mean difference of 0.04 cm H2O. We also computed the intrarater ICC for each investigator. ICCs in the range of 0.75–1 are classified as excellent. The intrarater ICC was excellent for both investigators: 0.92 for investigator 1 and 0.96 for investigator 2.
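The per-string PTP rule described in the Methods can be sketched as follows. This sketch assumes the peak pressures of one string are already available as an ordered list in cm H2O. The function name and the `start` parameter are hypothetical. The study does not state how investigators chose which three consecutive peaks to analyze, nor how string-level values were combined across strings.

```python
from statistics import mean


def ptp_from_string(peaks_cm_h2o, start):
    """Average three consecutive /p/ peak pressures from one string of /pi/ repetitions.

    peaks_cm_h2o: peak oral pressures (cm H2O) for every /p/ in the string, in order.
    start: index of the first of the three analyzed peaks (hypothetical parameter;
           in the study, the analyst chose which peaks to use).
    """
    if len(peaks_cm_h2o) < 5:
        raise ValueError("need at least 5 peaks so the first and last can be excluded")
    # The first (index 0) and last (index -1) peaks of a string are never analyzed.
    if start < 1 or start + 3 > len(peaks_cm_h2o) - 1:
        raise ValueError("window must exclude the first and last peaks of the string")
    return mean(peaks_cm_h2o[start:start + 3])


if __name__ == "__main__":
    # Hypothetical string of seven /pi/ repetitions; peaks 1-3 are averaged.
    print(ptp_from_string([5.0, 4.0, 4.5, 5.0, 4.2, 3.0, 6.0], 1))
```

With the illustrative values above, the function averages 4.0, 4.5, and 5.0 and prints `4.5`. Choosing `start=0` or a window that reaches the last peak raises an error, which encodes the exclusion rule rather than leaving it to the analyst.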
The second purpose of this study was to investigate the extent of agreement between two investigators who were trained simultaneously in PTP analysis. We predicted that these investigators would be in close agreement in their analysis of identical data (Table 1). Mean PTP was 3.96 cm H2O for BL1 and 3.62 cm H2O for BL2, a mean difference of 0.34 cm H2O. Mean PTP was 3.86 cm H2O for UB1 and 3.66 cm H2O for UB2, a mean difference of 0.20 cm H2O. We also compared interrater ICCs for the blinded and unblinded datasets. The interrater ICC was 0.76 for the blinded datasets and 0.78 for the unblinded datasets.

TABLE 1. Interrater and Intrarater Intraclass Correlation Coefficients (ICC) for Blinded and Unblinded Datasets for Each Investigator

Datasets                                    Blinded        Unblinded      Intrarater ICC
Investigator 1, PTP (cm H2O), mean ± SD     3.96 ± 0.55    3.86 ± 0.62    0.92
Investigator 2, PTP (cm H2O), mean ± SD     3.62 ± 0.57    3.66 ± 0.61    0.96
Interrater ICC                              0.76           0.78

DISCUSSION

The present study examined two potential biases in PTP data analysis, stemming from (1) investigator knowledge of the experimental hypothesis and (2) investigator training. Two investigators each analyzed two identical PTP datasets. Each dataset was labeled uniquely, so the investigators were not aware that the datasets were identical. Instead, each investigator was given a hypothesis for one dataset and asked to investigate whether the PTP results agreed with it. The hypothesis (loud reading increases PTP) was selected based on published PTP studies.4,15 Both investigators had read and discussed these studies with the senior author. Investigators were not given a hypothesis for the second dataset. Our results reveal excellent intrarater reliability. This suggests that, in the context of the present study, awareness of the experimental hypothesis may not significantly bias PTP analysis. In contrast, interrater reliability for PTP analysis was lower, suggesting that there were differences between investigators analyzing the same data.

Intrarater reliability is defined as the degree to which measurements taken by the same investigator are consistent.17 Each investigator analyzed PTP with high consistency. This suggests that both investigators learned and implemented PTP analysis effectively and selected the PTP peaks with precision. Interestingly, the ability to select pressure peaks consistently did not appear to depend on years of experience with PTP data collection and analysis, because neither investigator had prior experience with PTP analysis before commencing this study.
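The ICCs reported above were computed in SPSS. As a cross-check, the two-way, absolute-agreement, single-measures ICC can be sketched directly from two-way ANOVA mean squares. The sketch follows McGraw and Wong's ICC(A,1). To our understanding, this is the computational formula SPSS applies to both its two-way mixed and two-way random absolute-agreement, single-measures options. The sketch assumes each row is one analyzed unit (for example, one string's PTP) and each column is one dataset (for example, BL1 and UB1). The study does not state its unit of analysis. The function names are hypothetical, and `is_excellent` encodes only the 0.75–1 range quoted in the Results.

```python
def icc_a1(ratings):
    """Two-way, absolute-agreement, single-measures ICC (McGraw & Wong's ICC(A,1)).

    ratings: list of rows; each row is one rated unit, each column one rater/dataset.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 rated units")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: units (rows), raters (columns), residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Absolute agreement: systematic differences between columns lower the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def is_excellent(icc):
    """True if the ICC falls in the 0.75-1 range described as excellent."""
    return 0.75 <= icc <= 1.0


if __name__ == "__main__":
    # Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
    example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_a1(example), 2))
```

The six-target, four-judge example of Shrout and Fleiss (1979) prints `0.29` here, matching their published ICC(2,1). With two columns (eg, BL1 and UB1), the same function gives the intrarater comparison for one investigator. The function divides by zero if every rating is identical, a case the sketch does not handle.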
One possible conclusion is that the training given to the investigators was sufficient to enable them to analyze PTP with high intrarater reliability. The excellent intrarater ICCs also suggest that having the same rater measure all PTP peaks in a study may minimize measurement error. Intrarater reliability is typically high in published PTP studies,6,18,19 consistent with the present findings. However, high intrarater reliability in data analysis does not diminish the importance of double blinding in research studies. A significant source of bias in PTP can arise during data collection, specifically from investigator knowledge of the hypothesis being tested.

Interrater reliability is defined as the degree of agreement between two or more investigators who analyze the same data independently.20 The interrater ICCs were in the excellent range (0.75–1) but lower than the intrarater ICCs. These results suggest differences in PTP analysis between the two trained investigators. The reasons for this finding are unclear, because PTP training was conducted simultaneously and identically for both investigators. Despite this training, PTP analysis appears to retain some subjectivity. A follow-up study could compare reliability across investigators trained at different facilities. Automating peak picking in PTP analysis using predetermined criteria could also reduce investigator bias and should be examined further. Automated PTP analyses have been used in PTP research studies.6,18 However, automated analysis may reduce researcher autonomy. This is especially true when analyzing PTP data from individuals with voice disorders, in whom PTP peaks tend to be variable because of altered vocal fold biomechanics resulting from the underlying pathology.

The findings of higher intrarater reliability and lower interrater reliability in this preliminary investigation should be interpreted with several caveats. Only two trained investigators analyzed the data, so the absence of a blinding effect may reflect the small number of investigators. Additionally, the investigators had no stake in the outcome of the study and may therefore have been less susceptible to bias. Because bias is context-driven, it will be important to replicate this study with investigators who have a stake in a particular outcome. Finally, the investigators were trained in PTP analysis solely for this study, so a recency effect on the analysis cannot be ruled out.

In summary, our results draw attention to the need to identify robust techniques to improve interrater reliability. Future work should focus on standardizing procedures for training investigators in PTP analysis, because differences in training methodology across facilities could affect analysis and interpretation. Multicenter training and data sharing would be beneficial in developing standardized protocols. In addition, potential sources of bias in PTP data collection were not assessed here.7,21 PTP data are influenced by the quality of instruction, and investigator awareness of the experimental hypothesis is another potentially large source of error. Until sources of bias in data collection are fully quantified and understood, double blinding should be used in PTP studies.
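The sketch below is one hypothetical illustration of automated peak picking with predetermined criteria. It detects /p/ pressure peaks in a sampled oral-pressure trace using a minimum height and a minimum spacing. It then excludes the first and last peaks and averages the three consecutive peaks whose heights are most nearly equal, as a stand-in for the visual equal-height criterion. All thresholds, function names, and the window-selection rule are assumptions. This is not the automated method used in the cited studies.

```python
def find_pressure_peaks(trace, min_height, min_distance):
    """Return indices of local maxima in a sampled oral-pressure trace.

    A sample is accepted as a peak if it is >= min_height, strictly greater than its
    left neighbour, >= its right neighbour, and at least min_distance samples after
    the previously accepted peak. Both thresholds are hypothetical criteria.
    """
    peaks = []
    for i in range(1, len(trace) - 1):
        if trace[i] < min_height:
            continue
        if trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def auto_ptp(trace, min_height, min_distance):
    """Hypothetical automated PTP for one string.

    Excludes the first and last detected peaks, then averages the run of three
    consecutive peaks with the smallest height range (most nearly equal heights).
    """
    heights = [trace[i] for i in find_pressure_peaks(trace, min_height, min_distance)]
    inner = heights[1:-1]
    if len(inner) < 3:
        raise ValueError("fewer than three usable peaks after excluding first and last")
    best = min(
        (inner[s:s + 3] for s in range(len(inner) - 2)),
        key=lambda window: max(window) - min(window),
    )
    return sum(best) / 3


if __name__ == "__main__":
    # Synthetic trace (cm H2O): a small 0.3 blip below min_height is ignored.
    trace = [0, 0.3, 0, 6, 0, 4.5, 0, 4, 0, 4, 0, 4, 0, 2, 0]
    print(find_pressure_peaks(trace, 1.0, 2))
    print(auto_ptp(trace, 1.0, 2))
```

On the synthetic trace, the detected peaks are at indices `[3, 5, 7, 9, 11, 13]`, and the estimate is `4.0`. The greedy spacing rule keeps the first qualifying peak and ignores any taller peak that follows within `min_distance` samples. Variable peaks from dysphonic voices would likely need more careful criteria.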
CONCLUSIONS

PTP data analysis is susceptible to bias from investigator awareness of the experimental hypothesis and from investigator training. This preliminary study sought to investigate these biases. We found high intrarater reliability for each investigator, suggesting that knowledge of the hypothesis may not influence PTP analysis. Interrater reliability for PTP analysis, on the other hand, was lower. The current findings contribute to the growing body of literature on methods to improve the reliability of PTP during data collection and analysis.

Acknowledgments

We thank Leasa Rueter and Morgan Robertson for their contributions to data analysis.

REFERENCES

1. Titze IR. The physics of small-amplitude oscillation of the vocal folds. J Acoust Soc Am. 1988;83:1536–1552.
2. Verdolini-Marston K, Min Y, Titze IR, Lemke J, Brown K, Jiang J, Fisher KV. Biological mechanisms underlying voice changes due to dehydration. J Speech Lang Hear Res. 2002;45:268–281.
3. Tanner K, Roy N, Merrill RM, et al. Nebulized isotonic saline versus water following a laryngeal desiccation challenge in classically trained sopranos. J Speech Lang Hear Res. 2010;53:1555–1566.
4. Solomon NP, Glaze LE, Arnold RR, van Mersbergen M. Effects of a vocally fatiguing task and systemic hydration on men’s voices. J Voice. 2003;17:31–46.



5. Chang A, Karnell MP. Perceived phonatory effort and phonation threshold pressure across a prolonged voice loading task: a study of vocal fatigue. J Voice. 2004;18:454–466.
6. Roy N, Tanner K, Gray SD, Blomgren M, Fisher KV. An evaluation of the effects of three laryngeal lubricants on phonation threshold pressure. J Voice. 2003;17:331–342.
7. Plexico LW, Sandage MJ, Faver KY. Assessment of phonation threshold pressure: a critical review and clinical implications. Am J Speech Lang Pathol. 2011;20:348–366.
8. Polit DF. Blinding during the analysis of research data. Int J Nurs Stud. 2011;48:636–641.
9. Schulz KF, Grimes DA. Blinding in randomized trials: hiding who got what. Lancet. 2002;359:696–700.
10. Erickson-Levendoski E, Sivasankar M. Investigating the effects of caffeine on phonation. J Voice. 2011;25:e215–e219.
11. Sandage MJ, Connor NP, Pascoe DD. Voice function differences following resting breathing versus submaximal exercise. J Voice. 2013;27:572–578.
12. Sivasankar M, Erickson-Levendoski E, Schneider S, Hawes A. Phonatory effects of airway dehydration: preliminary evidence for impaired compensation to oral breathing in individuals with vocal fatigue. J Speech Lang Hear Res. 2008;51:1494–1506.

13. Sandage MJ, Connor NP, Pascoe DD. Vocal function and upper airway thermoregulation in five different environmental conditions. J Speech Lang Hear Res. 2014;57:16–25.
14. Erickson-Levendoski E, Sundarrajan A, Sivasankar M. Reducing the negative vocal effects of superficial laryngeal dehydration with humidification. Ann Otol Rhinol Laryngol. 2014;123:475–481.
15. Solomon NP, DiMattia MS. Effects of a vocally fatiguing task and systemic hydration on phonation threshold pressure. J Voice. 2000;14:341–362.
16. Fisher KV, Swank PR. Estimating phonation threshold pressure. J Speech Lang Hear Res. 1997;40:1122–1129.
17. Hayen A, Dennis RJ, Finch CF. Determining the intra- and inter-observer reliability of screening tools used in sports injury research. J Sci Med Sport. 2007;10:201–210.
18. Tanner K, Roy N, Merrill RM, Elstad M. The effects of three nebulized osmotic agents in the dry larynx. J Speech Lang Hear Res. 2007;50:635–646.
19. Solomon NP, Ramanathan P, Makashay MJ. Phonation threshold pressure across the pitch range: preliminary test of a model. J Voice. 2007;21:541–550.
20. Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8:23–34.
21. Plexico LW, Sandage MJ. Influence of vowel selection on determination of phonation threshold pressure. J Voice. 2012;26:673.e7–673.e12.
