Original Paper Folia Phoniatr Logop 2013;65:312–317 DOI: 10.1159/000365006

Published online: July 12, 2014

The Boston Residue and Clearance Scale: Preliminary Reliability and Validity Testing

Asako S. Kaneoka (a), Susan E. Langmore (a, b), Gintas P. Krisciunas (c), Katherine Field (a), Rebecca Scheel (c), Edel McNally (c), Michael J. Walsh (c), Meredith B. O’Dea (c, d), Howard Cabral (e)

(a) Department of Speech Language Hearing Sciences, Sargent College, Boston University; (b) Department of Otolaryngology, Boston University School of Medicine; (c) Department of Otolaryngology, Boston Medical Center; (d) Department of Communication Sciences and Disorders, MGH Institute of Health Professions; (e) Department of Biostatistics, Boston University School of Public Health, Boston, Mass., USA

Key Words Dysphagia · Residue · Fiberoptic endoscopic evaluation of swallowing

Abstract Background: There is no appropriately validated scale with which to rate the problem of residue after swallowing. The Boston Residue and Clearance Scale (BRACS) was developed to meet this need, and its initial reliability and validity were assessed. Methods: BRACS is an 11-point ordinal residue rating scale that scores three aspects of residue during a fiberoptic endoscopic evaluation of swallowing (FEES): (1) the amount and location of residue, (2) the presence of spontaneous clearing swallows, and (3) the effectiveness of clearing swallows. To determine inter-rater and test-retest reliability, 63 swallows from previously recorded FEES procedures were scored twice by 4 raters using (1) clinical judgment (none, mild, mild-moderate, moderate, moderate-severe, severe) and (2) BRACS. Concurrent validity was tested by correlating clinical judgment scores with BRACS scores. Internal consistency of the items in BRACS was examined. A factor analysis was performed to identify latent factors that suggest groupings among the 12 location items in BRACS. Results: BRACS showed excellent inter-rater reliability (intraclass correlation coefficient, ICC = 0.81), test-retest reliability (ICC: 0.82–0.92), high concurrent validity (Pearson’s r = 0.76), and high internal consistency (Cronbach’s α = 0.86). Factor analysis revealed 3 main latent factors for the 12 location items. Conclusion: BRACS is a valid and reliable tool that can rate the severity of residue.

© 2014 S. Karger AG, Basel

Correspondence: Asako S. Kaneoka, MS, Department of Speech, Language and Hearing Sciences, Sargent College, Boston University, 635 Commonwealth Avenue, Boston, MA 02215 (USA), E-Mail kaneoka@bu.edu

Introduction

Pharyngeal residue, the retention of food or liquid in the pharynx after swallowing, is a major sign of dysphagia caused by reduced movement or force of key muscles and structures that propel and clear the bolus through the oral and pharyngeal cavities [1, 2]. Because pharyngeal residue can lead to a restricted diet, inadequate oral intake, aspiration, and reduced quality of life [2–4], clinicians need to measure residue severity reliably and accurately. A clinically meaningful residue scale addresses three aspects: the amount of residue, its location, and the patient’s response to it. The amount of residue is often directly related to the severity of the physiologic dysfunction. The location of residue affects the risk of aspiration, regardless of the quantity [3, 5]. The ability of a patient to elicit a spontaneous swallow in response to residue is important because reduced sensation and awareness of residue increase aspiration risk. Unfortunately, to date, there is no validated, reliable, and clinically useful scale that measures all aspects of the residue problem.

Two instrumental examinations of swallowing, the videofluorographic swallow study (VFSS) [6] and fiberoptic endoscopic evaluation of swallowing (FEES) [7], provide clinicians with different information about the anatomical and physiological patterns of residue problems. Visualization in FEES is limited to the pharyngeal stage of the swallow; however, FEES has been shown to be more sensitive than VFSS for identifying aspiration and detecting the amount of residue [8]. This superior sensitivity is most likely related to the axial, direct view of the surface anatomy within the laryngopharynx.

Most residue rating scales created to date have been developed for VFSS. Of these, the quantitative scales have demonstrated higher inter-rater and test-retest reliability [9, 10]. However, these tools are not widely used in clinical practice because they require special recording systems or analysis software. When scoring residue in VFSS studies, clinicians use subjective scales more often than quantitative ones because they are simpler and faster. Such subjective tools rate the amount of residue with binary methods (absent or present) [1, 11], ordinal scales (e.g., none, mild, moderate, or severe) [2–4, 12, 13], or a percent estimate of the remaining bolus [14, 15]. A significant problem with subjective scales, however, is their poor to moderate inter-rater reliability; in other words, different clinicians are likely to assign different severity ratings to the same swallow. In addition, most of these scales do not include residue clearance as a factor that contributes to residue severity.
The Modified Barium Swallow Impairment Profile (MBSImp) is a perceptually based, validated tool for scoring swallowing problems, including pharyngeal residue identified by VFSS [16]. Even this established tool, however, addresses only the amount and location of residue. Only one scale to date has addressed the effectiveness of clearing swallows, and its validity and reliability testing was limited [13]. Although FEES is more sensitive than VFSS in detecting pharyngeal residue [8], few residue rating scales exist for FEES [5, 8, 17] compared with VFSS. The scale of Kelly et al. [8] and that of Tohara et al. [17] both rate the location and amount of residue, but the response to residue, otherwise called residue management, is not scored. Only one study to date [5] has included the response to residue in addition to amount and location, and the reliability of that scale has not been tested. A reliable and validated tool that grades the full residue problem on FEES would therefore be of great clinical use. The Boston Residue and Clearance Scale (BRACS), a perceptually based rating system, was developed to assess the whole residue problem in a clinically useful manner: the amount and location of residue and the patient’s ability to clear it. Clear definitions and scoring criteria were developed. The purpose of this preliminary study was to examine the reliability and validity of BRACS when used with FEES.

Materials and Methods

Test Construction
BRACS was conceptualized by a focus group of 5 speech-language pathologists (SLPs), each with at least 5 years of experience using FEES, and 1 research associate. The SLPs scored 10 swallows taken from FEES videos using a prototype of BRACS to confirm its face validity. After further discussion and several modifications, followed by rescoring of the same 10 FEES videos, the pilot version of BRACS was created (see Appendix for questions included in the scale). The SLP raters agreed that this early version adequately encompassed the residue problem.

BRACS is an 11-point ordinal scale measuring the severity of a residue problem. The scale specifically defines the amount of residue (none/coating; mild = covering/filling 2/3 of the location). The amount of residue is scored in 12 locations in the laryngopharynx. An extra point is added if residue is noted in 4 or more anatomical regions. An additional point is added if residue was ever present inside the laryngeal vestibule, because this places the individual at highest risk for aspiration after the swallow. If residue is observed and the individual demonstrates no spontaneous clearing swallows, an extra point is added to account for the apparent lack of pharyngeal sensation. Cued or spontaneous swallows are then judged for effectiveness (yes = 80–100% cleared, partially = 20–80% cleared, no = 0–20% cleared). The scoring instruction sheet gives raters detailed directions on how and when to score each section while reviewing a FEES video, which helps maximize inter-rater reliability.

Data Source
Hospital video archives (nStream) were used to obtain FEES video samples for reliability and validity testing of BRACS. These FEES examinations had been performed using distal chip laryngoscopy systems (with either Olympus ENF TYPE P4 or Pentax VNL1070STK endoscopes). The sampled videos were of patients who had been examined for suspected dysphagia between January 2008 and August 2012.
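The scoring rules described under Test Construction can be expressed as a small function. This is a minimal sketch under stated assumptions, not the published scoring sheet: per-location amounts are taken as 0–3 (the score columns shown in the Appendix), the amount item contributes the worst single-location score, "residue noted" is taken to mean a location score of 1 or more, the shared 20% and 80% boundaries are assigned as shown, and the effectiveness item is given hypothetical values of 0 (yes), 2 (partially), and 4 (no). Those values were chosen only so that the maximum total is 10, matching an 11-point (0–10) range; the published point values may differ. The names `bracs_total`, `effectiveness_category`, and `EFFECTIVENESS_POINTS` are illustrative.

```python
from typing import Dict

# HYPOTHETICAL point values for the effectiveness item (assumption, not taken
# from the published scoring sheet); chosen only so that the total spans 0-10.
EFFECTIVENESS_POINTS = {"yes": 0, "partially": 2, "no": 4}


def effectiveness_category(fraction_cleared: float) -> str:
    """Map the fraction of residue cleared (0.0-1.0) to yes/partially/no.

    The text gives yes = 80-100%, partially = 20-80%, no = 0-20%; assigning
    exactly 80% to "yes" and exactly 20% to "partially" is an assumption.
    """
    if not 0.0 <= fraction_cleared <= 1.0:
        raise ValueError("fraction_cleared must be between 0 and 1")
    if fraction_cleared >= 0.8:
        return "yes"
    if fraction_cleared >= 0.2:
        return "partially"
    return "no"


def bracs_total(
    amounts: Dict[str, int],
    residue_in_vestibule: bool,
    spontaneous_clearing_swallow: bool,
    fraction_cleared: float,
) -> int:
    """Sketch of a BRACS total from per-location amount scores (assumed 0-3)."""
    if any(not 0 <= a <= 3 for a in amounts.values()):
        raise ValueError("amount scores must be between 0 and 3")
    worst = max(amounts.values(), default=0)  # amount item: worst location
    total = worst
    # Extra point when residue is noted (score >= 1, assumed) in 4+ locations.
    if sum(1 for a in amounts.values() if a >= 1) >= 4:
        total += 1
    # Extra point when residue was ever present in the laryngeal vestibule.
    if residue_in_vestibule:
        total += 1
    # Extra point when residue is present but no spontaneous clearing swallow.
    if worst >= 1 and not spontaneous_clearing_swallow:
        total += 1
    # Effectiveness of cued/spontaneous swallows; scored only when residue is
    # present (assumption).
    if worst >= 1:
        total += EFFECTIVENESS_POINTS[effectiveness_category(fraction_cleared)]
    return total
```

For example, a worst amount of 3 with residue in four locations, no spontaneous clearing swallow, and 50% cleared gives 3 + 1 + 1 + 2 = 7 under these assumptions: `bracs_total({"valleculae": 3, "left piriform": 2, "right piriform": 2, "postcricoid": 1}, False, False, 0.5)`. The location keys are free-form labels in this sketch.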
The FEES procedures had been performed according to the protocol described by Langmore [7] by 1 of the 5 SLPs involved in constructing BRACS. A total of 63 swallows were chosen from 51 patients. Swallows were selected to cover a wide range of residue severity and medical diagnoses. Boluses that were rated included 3–5 ml of liquid, 1/2–1 teaspoon of apple sauce, and 1/4–1/2 of a cracker. Swallows of different consistencies taken from the same patient accounted for about 20% of all sample swallows. Videos with poor visualization of the pharynx and larynx were excluded. Because the effectiveness of additional clearing swallows had to be measured, all trials without follow-up swallows or without cues to swallow from the examiner were also excluded. Patients with structures altered by surgery for head and neck cancer were accepted as long as residue was well visualized. Patients with nasogastric feeding tubes or tracheostomy tubes were also included.

Patients’ age, gender, medical diagnosis, bolus type, and estimated residue severity were obtained from the hospital electronic medical record system and saved in a spreadsheet (Excel 2010; Microsoft, Redmond, Wash., USA). The mean age of the 51 patients was 61.4 ± 12.3 years. Of the 63 swallows, 44 came from male and 19 from female patients, and 33 from outpatients and 30 from inpatients. Medical diagnoses were diverse: head and neck cancer (21 swallows), neurological diseases (13), cardiovascular diseases (7), respiratory diseases (10), esophageal diseases (5), and other (7).

The 63 selected FEES videos were saved to a MacBook Pro (Apple, Cupertino, Calif., USA), trimmed, and converted into MPEG files with a video converter (iSkysoft Video Converter for Mac, iSkysoft Studio). The audio of 14 swallows was selectively and partially muted with this software because the examiners had commented aloud on the patients’ residue severity during the FEES procedure. The remaining audio allowed the raters to tell whether a swallow was spontaneous or cued. All 63 edited swallows were randomized with the random number generator function in Excel and then assigned a unique bolus number (1 through 63) for identification. The videos were compiled into a stand-alone movie file (iMovie’11; Apple). Text showing the bolus number and the bolus consistency was displayed before each swallow.
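The randomization step, together with the rerandomized second movie described next, amounts to drawing two different permutations of the 63 clips. The sketch below uses Python's seeded `random.Random` in place of Excel's random number function; this is an assumption about tooling, not what the authors ran, and the function names and seed values are hypothetical.

```python
import random
from typing import Dict, List


def presentation_order(n_swallows: int, seed: int) -> List[int]:
    """Return a random order of clip indices 0..n_swallows-1.

    A seeded generator stands in for Excel's random-number function
    (assumption) so that the same order can be reproduced later.
    """
    rng = random.Random(seed)
    order = list(range(n_swallows))
    rng.shuffle(order)  # in-place Fisher-Yates shuffle
    return order


def bolus_numbers(order: List[int]) -> Dict[int, int]:
    """Map each clip index to its bolus number (1..n) in presentation order."""
    return {clip: position + 1 for position, clip in enumerate(order)}


# Hypothetical seeds; distinct seeds give different presentation orders.
first_movie = presentation_order(63, seed=1)
second_movie = presentation_order(63, seed=2)  # rerandomized second movie
first_movie_numbers = bolus_numbers(first_movie)
```

Seeding the generator is a design choice that keeps both orders reproducible, so the mapping from bolus number back to the source clip can be regenerated if the spreadsheet is lost.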
The same 63 swallows were rerandomized and saved in a second movie. Both movies were exported to DVDs or flash drives for the raters to view on their personal computers.

Reliability
Inter-rater reliability was analyzed to determine the degree of agreement among raters on BRACS. Test-retest reliability, also known as intra-rater reliability, was assessed to determine how consistently each rater scored each swallow over time using the same scale. Four SLP members of the focus group served as raters for reliability testing of BRACS. For comparison, inter-rater and test-retest reliability were also assessed for the raters’ clinical judgment. First, the 4 SLPs independently scored the severity of residue for all 63 swallows using their clinical judgment, as they usually grade in clinical practice (a 6-step subjective rating of none, mild, mild-moderate, moderate, moderate-severe, and severe). This was done with the first movie, and no guidelines or scoring criteria were provided. After 1 week, the second, rerandomized movie was scored in the same manner.

Reliability of BRACS was then determined. First, a 3-hour training session was conducted by the fifth expert clinician, a cocreator of BRACS. The raters independently scored the swallows, and their scores were compared with the trainer’s scores. If scores disagreed, training continued. When the total BRACS scores of the 4 raters and the trainer fell within a 3-point range across all three consistencies, concordance was accepted and the 4 raters were deemed proficient. The 4 SLPs then scored the initial movie using BRACS. One week later, the same SLPs scored the second movie (with the swallows rerandomized) with BRACS again. The videos were played on the raters’ personal computers with QuickTime (Apple). The raters were allowed to analyze the FEES videos frame by frame and to review them as many times as needed. Intraclass correlation coefficients (ICCs) were calculated for both inter-rater and test-retest reliability. The ICC typically ranges from 0.0 to 1.0, and an ICC >0.8 is generally considered excellent reliability. The raters also recorded the time they spent scoring the 63 swallows.

Validity
The fifth expert SLP independently scored the swallows by clinical judgment and with BRACS, following the same protocols described above. Only this SLP’s scores were used for the validity testing that follows. Concurrent validity examines the accuracy of a newly developed scale against a previously established criterion measure [18]. Because no established criterion scales exist for residue assessment using FEES, concurrent validity of BRACS was tested by correlating BRACS scores with the expert’s clinical judgment, which was assumed to be the current best residue assessment using FEES. A correlation coefficient >0.8 is usually taken to indicate good concurrent validity. Internal consistency estimates how reliably a scale measures a single concept [19]. Cronbach’s α, which is computed from the variances and covariances of the items, was calculated to assess the homogeneity of the items in BRACS; a Cronbach’s α >0.8 generally indicates good internal consistency. Factor analysis is a statistical approach used to identify correlated items in a scale that share unobserved concepts, or factors. It is often used to reduce redundant items in a scale, on the assumption that items that co-occur tend to have a common cause [19].
Factor analysis was performed to explore latent factors underlying the 12 BRACS items concerning the location of residue.

Statistical Analysis
Inter-rater ICCs and test-retest ICCs of the 4 raters were computed for both clinical judgment and BRACS [19]. Pearson’s correlation coefficient between the BRACS score and clinical judgment was obtained to test concurrent validity. Internal consistency of BRACS was tested with Cronbach’s α. Factor analysis with Varimax rotation was performed to find common factors among the 12 location items in BRACS. Statistical analyses were performed with SAS® for Windows (version 9.3; SAS Institute Inc., Cary, N.C., USA). All procedures of this study were approved by the Institutional Review Board at Boston University Medical Center.
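The three reliability and validity statistics named above can be sketched in plain Python for readers who want to reproduce them outside SAS. The paper does not state which ICC model was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation, which is one common choice but may not match the authors' analysis. Rows are swallows and columns are raters (inter-rater ICC) or sessions (a single rater's test-retest ICC). To correlate BRACS with clinical judgment, the 6-step labels would first need numeric codes (e.g., none = 0 through severe = 5, an assumption). Function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Assumption: the paper does not name its ICC form; this is one common one.
    `ratings` has one row per swallow and one column per rater (or session).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between swallows
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r, e.g., BRACS totals vs. numerically coded clinical judgment."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

    `items` holds one sequence of scores per item, all over the same cases.
    """
    k = len(items)
    totals = [sum(case) for case in zip(*items)]  # total score per case
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Example data of Shrout and Fleiss (1979): 6 targets rated by 4 judges.
SF_RATINGS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
```

As a check, `icc_2_1(SF_RATINGS)` gives about 0.29, the ICC(2,1) that Shrout and Fleiss report for this example. BRACS item-level data are not published, so no study values are reproduced here.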

Results

Reliability
For clinical judgment ratings, inter-rater ICCs for the first and second sessions were 0.60 and 0.61, respectively. For BRACS scores, inter-rater ICCs for the first and second sessions were 0.81 and 0.80, respectively. For clinical judgment scores, test-retest ICCs for the 4 SLPs ranged between 0.72 and 0.86 (rater 1 = 0.80, rater 2 = 0.86, rater 3 = 0.72, rater 4 = 0.80). For BRACS, test-retest ICCs for the 4 SLPs ranged between 0.82 and 0.92 (rater 1 = 0.84, rater 2 = 0.82, rater 3 = 0.92, rater 4 = 0.86).

Validity
BRACS scores correlated highly with the expert’s clinical judgment (Pearson’s correlation coefficient = 0.76). This indicates that criterion validity was good and that BRACS measured residue in a manner similar to the expert clinician’s clinical judgment. The items of BRACS yielded a Cronbach’s α of 0.87, indicating high internal consistency. Factor analysis identified three factors with an eigenvalue >1, the commonly applied Kaiser criterion (eigenvalues: factor 1, 5.67; factor 2, 1.91; factor 3, 1.43). For factor 1, 6 of the 12 locations had high loadings, all anatomically corresponding to the laryngeal vestibule. For factor 2, 3 of the 12 locations had high loadings, corresponding to the lower pharynx, and for factor 3, 2 of the 12 locations had high loadings, corresponding to the upper pharynx. Accordingly, the 12 locations could be grouped into three main regions: laryngeal vestibule, lower pharynx, and upper pharynx (Table 1). The first factor, ‘laryngeal vestibule’, explained 47.3% of the variance; the second, ‘lower pharynx’, 15.9%; and the third, ‘upper pharynx’, 11.9%. Collectively, these three factors accounted for 75.1% of the total variance. The lateral and posterior pharyngeal walls did not load highly on any factor.

Table 1. Rotated factor loadings of the 12 location items in BRACS

Location of residue | Factor 1 ‘laryngeal vestibule’ | Factor 2 ‘lower pharynx’ | Factor 3 ‘upper pharynx’
Lateral and posterior pharyngeal wall | 0.49 | 0.08 | 0.34
Base of tongue | –0.06 | –0.16 | –0.85*
Valleculae, tip of epiglottis | 0.09 | 0.37 | –0.76*
Left lateral channel and piriform recess | 0.21 | –0.82* | 0.13
Right lateral channel and piriform recess | 0.16 | –0.89* | 0.15
Postcricoid region | 0.18 | –0.81* | –0.14
Left arytenoid and aryepiglottic fold | –0.95* | 0.16 | –0.01
Right arytenoid and aryepiglottic fold | –0.79* | 0.33 | –0.03
Interarytenoid space | –0.74* | 0.33 | –0.05
Laryngeal surface of epiglottis | –0.75* | 0.11 | 0.14
Laryngeal surface of aryepiglottic fold, false vocal fold | –0.95* | 0.10 | –0.02
Anterior and posterior commissure, true vocal fold | –0.95* | 0.10 | –0.02

High loadings on each factor are marked with an asterisk (*).

Necessary Time for Scoring per Swallow
When scoring the swallows using clinical judgment, all raters scored the 63 swallows within 1 h in both the first and second trials; the mean time the 4 raters spent scoring a single swallow was therefore less than 1 min. When rating with BRACS, the average time required to score all 63 swallows was 2 h 45 min in the first trial and 1 h 57 min in the second. The mean time the 4 raters spent scoring a single swallow was thus approximately 2 min 37 s in the first BRACS trial and 1 min 52 s in the second.
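Several derived figures in these results can be re-checked from the published numbers. The sketch below (plain Python; function names are illustrative) does three things: it reproduces the grouping in Table 1 by assigning each location to the factor with its largest absolute loading, using a cutoff of 0.7 that is an assumption because the table states no cutoff; it divides each eigenvalue by the 12 items, which reproduces the reported 47.3%, 15.9%, and 11.9% (so these percentages correspond to the unrotated solution); and it converts the average total scoring times into seconds per swallow.

```python
from typing import Dict, List, Sequence, Tuple

# Rotated factor loadings transcribed from Table 1 (factor 1, factor 2, factor 3).
LOADINGS: Dict[str, Tuple[float, float, float]] = {
    "Lateral and posterior pharyngeal wall": (0.49, 0.08, 0.34),
    "Base of tongue": (-0.06, -0.16, -0.85),
    "Valleculae, tip of epiglottis": (0.09, 0.37, -0.76),
    "Left lateral channel and piriform recess": (0.21, -0.82, 0.13),
    "Right lateral channel and piriform recess": (0.16, -0.89, 0.15),
    "Postcricoid region": (0.18, -0.81, -0.14),
    "Left arytenoid and aryepiglottic fold": (-0.95, 0.16, -0.01),
    "Right arytenoid and aryepiglottic fold": (-0.79, 0.33, -0.03),
    "Interarytenoid space": (-0.74, 0.33, -0.05),
    "Laryngeal surface of epiglottis": (-0.75, 0.11, 0.14),
    "Laryngeal surface of aryepiglottic fold, false vocal fold": (-0.95, 0.10, -0.02),
    "Anterior and posterior commissure, true vocal fold": (-0.95, 0.10, -0.02),
}
FACTOR_NAMES = ("laryngeal vestibule", "lower pharynx", "upper pharynx")
EIGENVALUES = (5.67, 1.91, 1.43)  # reported eigenvalues (> 1, Kaiser criterion)
N_ITEMS = 12                      # number of location items


def group_items(loadings, threshold: float = 0.7):
    """Assign each item to the factor with its largest |loading|.

    Items whose largest |loading| is below `threshold` (an assumed cutoff)
    are left unassigned. Signs are ignored because only magnitude matters.
    """
    groups: Dict[str, List[str]] = {name: [] for name in FACTOR_NAMES}
    unassigned: List[str] = []
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[best]) >= threshold:
            groups[FACTOR_NAMES[best]].append(item)
        else:
            unassigned.append(item)
    return groups, unassigned


def percent_variance(eigenvalues: Sequence[float], n_items: int) -> List[float]:
    """Percent of total variance per factor: eigenvalue / number of items * 100."""
    return [100 * ev / n_items for ev in eigenvalues]


def seconds_per_swallow(total_minutes: float, n_swallows: int = 63) -> float:
    """Mean scoring time per swallow, in seconds."""
    return total_minutes * 60 / n_swallows


groups, unassigned = group_items(LOADINGS)
for name in FACTOR_NAMES:
    print(name, len(groups[name]))
print("unassigned:", unassigned)
print([round(p, 2) for p in percent_variance(EIGENVALUES, N_ITEMS)])
```

Dividing 2 h 45 min (165 min) by 63 swallows gives about 157 s, or 2 min 37 s. Dividing 1 h 57 min (117 min) gives about 111 s, slightly under the reported 1 min 52 s, a gap consistent with the average total time having been rounded to the nearest minute.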

Discussion

BRACS was created to assess residue severity and was tested for reliability and concurrent validity. Our preliminary study demonstrated that BRACS had excellent inter-rater and test-retest reliability, whereas clinical judgment showed moderate inter-rater reliability and relatively high test-retest reliability (ICC 0.72–0.86). Concurrent validity and internal consistency of BRACS were also established. These results support the view that BRACS provides reliable and valid information about the severity of pharyngeal residue. Factor analysis revealed three main latent factors in the 12 location items of BRACS: ‘laryngeal vestibule’, ‘lower pharynx’, and ‘upper pharynx’.

BRACS showed high inter-rater and test-retest reliability. The training session and the clear instructions are possible reasons for this, given that other studies that did not specifically define scoring criteria [14] or did not provide training sessions [11] failed to obtain high reliability. Although understanding the scoring criteria and using frame-by-frame analysis require clinicians to spend more time scoring residue, these steps may be crucial for BRACS to maintain high reliability. The high internal consistency of BRACS suggested that the items included in BRACS are related to one another and relevant to determining residue severity.

However, BRACS was not time-efficient, taking twice as long to evaluate a swallow as a traditional 6-step clinical judgment scale. Scoring 12 different anatomical sites was most likely the main contributor to the increased time needed for residue assessment. Such a detailed scoring instrument may be useful for research purposes but is likely to be overly cumbersome for routine clinical practice. To make BRACS more practical in the clinical setting, the scale will be simplified by removing or merging redundant items. The three factors identified in the factor analysis will help guide item reduction. Using only the more general locations ‘laryngeal vestibule’, ‘lower pharynx’, and ‘upper pharynx’ would make BRACS a more concise but still statistically valid scale that could be used routinely in a clinical setting.

There are some limitations to the current study. BRACS was tested for reliability and validity using SLP raters who worked in the same institution, and the same group that developed the scale then used it to rate swallows.
This may have yielded much higher reliability and consensus validity than would be obtained with raters who have a wider range of expertise and come from different institutions. Another limitation is that BRACS was validated only for FEES. All other residue rating systems described in the literature were likewise developed for either VFSS or FEES alone, but a superior rating scale would be applicable to both procedures. This has been done for the Penetration-Aspiration Scale, a widely used scale for measuring another parameter of dysphagia [20]. Such an effort might uncover interesting differences between the two diagnostic tools, but would ideally result in a truly useful scale that generalizes to either procedure.

Conclusion

A new scale for assessing the problem of residue seen during a FEES procedure was developed. It showed high inter-rater and test-retest reliability, concurrent validity, and internal consistency. The scale contains items appropriate for assessing the severity and location of residue, the patient’s response to the residue, and the effectiveness of any clearing swallows. On the same swallows, BRACS was more reliable than clinical judgment. Simplification based on the results of the factor analysis, together with further testing with external raters, should increase the clinical utility and validity of BRACS.

Appendix

Q1. Location and amount of residue (Bolus 1): mark all that apply, then indicate the worst score attained from any location in the last row.

Location | 0 (none/coating) | 1 (mild) | 2 | 3
Zone 1
Lateral pharyngeal wall, posterior pharyngeal wall | 0 | 1 | 2 | 3
Base of tongue | 0 | 1 | 2 | 3
Valleculae, tip of epiglottis | 0 | 1 | 2 | 3
Zone 2
Left lateral channel and left piriform recess | 0 | 1 | 2 | 3
Right lateral channel and right piriform recess | 0 | 1 | 2 | 3
Postcricoid region | 0 | 1 | 2 | 3

References

1 Dejaeger E, Pelemans W, Ponette E, Joosten E: Mechanisms involved in postdeglutition retention in the elderly. Dysphagia 1997;12:63–67.
2 Eisenhuber E, Pokieser P, Oschatz E: Videofluoroscopic assessment of patients with dysphagia: pharyngeal retention is a predictive factor for aspiration. AJR Am J Roentgenol 2002;178:393–398.
3 Perlman PW, Cohen MA, Setzen M, Belafsky PC, Guss J, Mattucci KF, et al: The risk of aspiration of pureed food as determined by flexible endoscopic evaluation of swallowing with sensory testing. Otolaryngol Head Neck Surg 2004;130:80–83.
4 Han TR, Paik NJ, Park JW: Quantifying swallowing function after stroke: a functional dysphagia scale based on videofluoroscopic studies. Arch Phys Med Rehabil 2001;82:677–682.
5 Farneti D: Pooling score: an endoscopic model for evaluating severity of dysphagia. Acta Otorhinolaryngol Ital 2008;28:135–140.
6 Logemann JA: Evaluation and Treatment of Swallowing Disorders, ed 2. Austin, PRO-ED, 1998.
7 Langmore SE: Endoscopic Evaluation and Treatment of Swallowing Disorders, ed 2. New York, Thieme, 2001.
8 Kelly AM, Leslie P, Beale T, Payten C, Drinnan MJ: Fibreoptic endoscopic evaluation of swallowing and videofluoroscopy: does examination type influence perception of pharyngeal residue severity? Clin Otolaryngol 2006;31:425–432.
9 Dyer JC, Leslie P, Drinnan MJ: Objective computer-based assessment of valleculae residue – is it useful? Dysphagia 2008;23:7–15.
10 Pearson WG, Molfenter SM, Smith ZM, Steele CM: Image-based measurement of post-swallow residue: the normalized residue ratio scale. Dysphagia 2013;28:167–177.
11 McCullough GH, Wertz RT, Rosenbek JC, Mills RH, Webb WG, Ross KB: Inter- and intrajudge reliability for videofluoroscopic swallowing evaluation measures. Dysphagia 2001;16:110–118.
12 Kuhlemeier KV, Yates P, Palmer JB: Intra- and interrater variation in the evaluation of videofluorographic swallowing studies. Dysphagia 1998;13:142–147.
13 O’Neil KH, Purdy M, Falk J, Gallo L: The Dysphagia Outcome and Severity Scale. Dysphagia 1999;14:139–145.
14 Stoeckli SJ, Huisman TA, Seifert B, Martin-Harris BJW: Interrater reliability of videofluoroscopic swallow evaluation. Dysphagia 2003;18:53–57.
15 Logemann JA, Williams RB, Rademaker A, Pauloski BR, Lazarus CL, Cook I: The relationship between observations and measures of oral and pharyngeal residue from videofluorography and scintigraphy. Dysphagia 2005;20:226–231.
16 Martin-Harris B, Brodsky MB, Michel Y, Castell DO, Schleicher Blair M, Sandidge J, Maxwell RJB: MBS measurement tool for swallow impairment – MBSImp: establishing a standard. Dysphagia 2008;23:392–405.
17 Tohara H, Nakane A, Murata S, Mikushi S, Ouchi Y, Wagasugi Y, et al: Inter- and intrarater reliability in fibroptic endoscopic evaluation of swallowing. J Oral Rehabil 2010;37:884–891.
18 Cronbach LJ, Meehl PE: Construct validity in psychological tests. Psychol Bull 1955;52:281–302.
19 Cohen RJ, Swerdlik ME: Psychological Testing and Assessment: An Introduction to Tests and Measurement, ed 7. Boston, McGraw-Hill, 2009.
20 Colodny N: Interjudge and intrajudge reliabilities in fiberoptic endoscopic evaluation of swallowing (FEES®) using the Penetration-Aspiration Scale: a replication study. Dysphagia 2002;17:308–315.
