International Journal of Audiology 2014; 53: 836–837

Letter to the Editor

Reliability of categorical loudness scaling in the electrical domain: Common mistakes

Advances in knowledge
1. Reliability (precision) is an important methodological issue in all fields of research.
2. Reliability is often assessed with inappropriate tests, which are among the common mistakes published even in high-impact journals.
3. Take-home message: clinical researchers should apply appropriate tests when analysing reliability.

Implication for patient care
When inappropriate tests are used to assess reliability, misdiagnosis and mismanagement of patients in routine clinical care cannot be avoided.

I was interested to read the paper by Theelen-van den Hoek and colleagues published in the April 2014 issue of the International Journal of Audiology. The authors investigated the reproducibility of categorical loudness scaling (CLS), following the recommendations of ISO 16832, using electrical stimuli presented to cochlear implant (CI) users (Theelen-van den Hoek et al, 2014). Loudness growth functions (LGFs) described loudness as a function of stimulation level (mA). The authors analysed the reproducibility of the b parameter and the inter-session, intra-subject differences in percentage dynamic range (DR) between the “Very Soft” and “Loud – Very Loud” levels. They reported that the reproducibility of LGF shapes was moderate (r = 0.63) (Theelen-van den Hoek et al, 2014). This result has nothing to do with reliability; in fact, it reflects one of the common mistakes in reliability analysis (Lin, 1989).

Reliability (repeatability or reproducibility) is often assessed with statistical tests such as Pearson's r, least squares, and the paired t-test, all of which are common mistakes in reliability analysis (Lin, 1989; Rothman et al, 2010; Sabour & Dastjerdi, 2013; Sabour, 2013). Briefly, the intraclass correlation coefficient (ICC) should be used for quantitative variables and a weighted kappa for qualitative variables; the latter should be used with caution because kappa has its own limitations (Lin, 1989; Rothman et al, 2010; Sabour & Dastjerdi, 2013; Sabour, 2013).

In their conclusion, the authors state that the reproducibility was comparable to that for acoustical stimulation in normal-hearing and hearing-impaired listeners (Theelen-van den Hoek et al, 2014). This conclusion is misleading because it rests on an inappropriate use of statistical tests to evaluate reproducibility.

Kind regards,
Siamak Sabour
Safety Promotion and Injury Prevention Research Center, and the Department of Clinical Epidemiology, School of Health, Shahid Beheshti University of Medical Sciences, Tehran, I.R. Iran.

References
Theelen-van den Hoek F.L., Boymans M., Stainsby T. & Dreschler W.A. 2014. Reliability of categorical loudness scaling in the electrical domain. Int J Audiol, 53, 409–17.
Lin L.I.-K. 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255–268.
Rothman K.J., Greenland S. & Lash T.L. 2010. Modern Epidemiology, 4th edition. Baltimore, USA: Lippincott Williams & Wilkins.
Sabour S. & Dastjerdi E.V. 2013. Reliability of four different computerized cephalometric analysis programs: A methodological error. Eur J Orthod, 35(6), 848. doi: 10.1093/ejo/cjs074. Epub 2013 Oct 16.
Sabour S. 2013. Interlaboratory and interstudy reproducibility of a novel lateral-flow device: A statistical issue. J Clin Microbiol, 51(5), 1652. doi: 10.1128/JCM.00111-13.

ISSN 1499-2027 print/ISSN 1708-8186 online © 2014 British Society of Audiology, International Society of Audiology, and Nordic Audiological Society DOI: 10.3109/14992027.2014.930522

Reply from Theelen-van den Hoek et al

With interest we have read the letter “Reliability of categorical loudness scaling in the electrical domain: Common mistakes” sent by Dr. Sabour in response to our paper “Reliability of categorical loudness scaling in the electrical domain”. In his letter, Dr. Sabour argues that the reproducibility of measurement tools should be assessed by means of intraclass correlation coefficients (ICCs), or kappas in the case of qualitative variables. More specifically, Dr. Sabour argues that the correlation coefficient we used for the b parameter is not suitable for concluding that the reproducibility of this variable was only moderate. Dr. Sabour also questions our conclusion that the reproducibility of categorical loudness scaling (CLS) is similar for CI users, normal-hearing listeners, and hearing-impaired listeners because, in his opinion, this conclusion is based on an invalid outcome measure.

We would like to thank Dr. Sabour for suggesting the use of ICCs to assess the reproducibility of CLS in the electrical domain. In response, we would like to share our motivation for the statistical analysis used in our paper. We will also show that our conclusions would be the same if ICCs were used. Our main dataset consists of stimulation levels corresponding to categorical loudness levels, measured by means of CLS in the electrical domain during different sessions. For this continuous variable we chose an outcome measure that others have used for similar measurements in the field of audiology. This had the advantage of allowing us to compare our data for CI users with similar data for normal-hearing and hearing-impaired listeners. We do not agree with Dr. Sabour that this comparison is misleading. ICCs reflect the variability between repeated observations over time relative to the variability in the total dataset. We have retrospectively calculated the two-way random ICC for absolute agreement between the stimulation levels (in mA) measured in the two sessions. This value was 0.98, which confirms our conclusion that CLS is a reliable measurement tool in the electrical domain.
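The reply does not give the computational details of the two-way random ICC for absolute agreement. A minimal sketch follows, assuming the single-measures form (ICC(A,1) in McGraw and Wong's notation, ICC(2,1) in Shrout and Fleiss's) and a complete subjects × sessions table with no missing values. The function name `icc_a1` and the toy values are hypothetical; they are not the authors' data or code.

```python
from statistics import mean


def icc_a1(data):
    """Two-way random-effects ICC, absolute agreement, single measures.

    data: one row per subject; each row holds that subject's values from
    k sessions (e.g. stimulation levels in mA). Assumes n >= 2 subjects,
    k >= 2 sessions, and no missing values.
    """
    n = len(data)      # subjects (rows)
    k = len(data[0])   # sessions (columns)
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Sums of squares for a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)             # between subjects
    ms_cols = ss_cols / (k - 1)             # between sessions
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual

    # The session term in the denominator is what makes this an
    # absolute-agreement (not merely consistency) coefficient.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    same = [[1, 1], [2, 2], [3, 3]]      # identical values in both sessions
    shifted = [[1, 2], [2, 3], [3, 4]]   # session 2 = session 1 + 1
    print(round(icc_a1(same), 3))        # 1.0
    print(round(icc_a1(shifted), 3))     # 0.667
```

In the shifted toy table every subject's ranking is preserved, so a consistency ICC (or Pearson's r) would still be 1, yet the absolute-agreement ICC drops to about 0.67 because the between-session mean square enters the denominator. That sensitivity to systematic session effects is why the absolute-agreement form suits test-retest data.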

Furthermore, we conclude from the correlation coefficient for the b parameter that its reproducibility is limited. Dr. Sabour indicates that the correlation coefficient is not appropriate for assessing reproducibility. We agree with Dr. Sabour that high correlation coefficients do not necessarily indicate good reproducibility. However, if the correlation coefficient is well below 1 (in our case 0.6), we feel confident in concluding that the reproducibility is limited. After all, had the consistency of the b parameter between sessions been high relative to the total variability in b parameter values (i.e. a high ICC and good reliability), the correlation coefficient would also have been high. Additionally, we would like to emphasize that we provided all individual b parameter values in Figure 2. Had a location shift been present in our data, this ‘flaw’ of the correlation coefficient would have been visible in that figure. Given the above, we are confident that our conclusion about the reproducibility of the b parameter is valid.

Kind regards,
Femke Theelen-van den Hoek
Monique Boymans
Wouter Dreschler
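The reply's argument, that a correlation well below 1 rules out good agreement while a high correlation does not guarantee it, can be made concrete with Lin's concordance correlation coefficient (Lin, 1989, cited in the letter). The CCC is used here only as an illustration closely related to the absolute-agreement ICC; it is not a statistic reported by either party. The sketch assumes paired values from two sessions, and the function name `pearson_and_ccc` and the toy data are hypothetical.

```python
import math


def pearson_and_ccc(x, y):
    """Return (Pearson r, Lin's concordance correlation coefficient) for
    paired values, e.g. a parameter estimated for the same listeners in two
    sessions. Moments use divisor n, as in Lin (1989); r does not depend on
    the divisor. Assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / math.sqrt(sxx * syy)
    # The CCC penalises both scatter (through sxy) and any location shift
    # (through the squared difference between the session means).
    ccc = 2 * sxy / (sxx + syy + (mx - my) ** 2)
    return r, ccc


if __name__ == "__main__":
    session1 = [1.0, 2.0, 3.0, 4.0]
    shifted = [v + 0.5 for v in session1]      # same ranking, constant offset
    print(pearson_and_ccc(session1, shifted))  # r = 1.0, CCC ≈ 0.909
```

Because sxx + syy ≥ 2·sqrt(sxx·syy), the CCC denominator is never smaller than that of r, so |CCC| ≤ |r|. If the reported r = 0.63 is Pearson's r on the same paired b values, their concordance could therefore not exceed 0.63, which supports the direction of the reply's argument. The converse does not hold: the shifted example has r = 1.0 but a CCC below 1, which is the kind of location shift the letter warns that r cannot detect.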
