ORIGINAL ARTICLE

Pain Assessment in Children: Validity of Facial Expression Items in Observational Pain Scales

Julie Chang, BA (Hons),* Judith Versloot, PhD,*† Samantha R. Fashler, BA,* Kalie N. McCrystal, BA (Hons),* and Kenneth D. Craig, PhD*

Objectives: Assessing pain in young children requires astute judgment by observers. Multidimensional observational scales for pediatric pain contribute by providing behavioral cues believed to characterize pain in children; yet, few measurement items have undergone rigorous psychometric evaluation. This is the case with facial expression, which has been widely recognized as the most sensitive and specific nonverbal indicator of pain. The criteria for identifying facial expressions of pain differ substantially across scales and are frequently inconsistent with empirical descriptions.

Materials and Methods: The present study compared observer ratings of children’s (aged 1 to 6 y, inclusive) videotaped postoperative pain reactions using the facial activity items from 6 widely used pediatric pain assessment scales and an anatomically based and empirically validated measure, the Child Facial Coding System. We hypothesized that facial expression items that did not correspond to empirical descriptions would lead to less reliable and divergent pain estimates. Intercoder reliability, criterion validity (empirical and convergent), content validity, and face validity were examined.

Results: Findings supported hypotheses and indicated that variation in cues proposed for assessing facial expression led to widely ranging scores that could be insensitive to differences in children’s pain intensity.

Discussion: The facial items varied considerably in coder judgment reliability as well as criterion (empirical and convergent), content, and face validity. Observational scales should provide behavioral cues that correspond to empirical descriptions of the facial expression of pain.

Key Words: pain, pediatric pain assessment, facial expression

(Clin J Pain 2015;31:189–197)

Received for publication September 10, 2013; revised April 16, 2014; accepted March 18, 2014.
From the *University of British Columbia, Vancouver, BC; and †St Michaels Hospital, Toronto, Canada.
Supported in part by a Social Sciences and Humanities Research Council of Canada grant, Ottawa, Canada. The authors declare no conflict of interest.
Reprints: Kenneth D. Craig, PhD, Department of Psychology, University of British Columbia, Vancouver, BC, Canada V6T 1Z4 (e-mail: [email protected]).
Copyright © 2014 Wolters Kluwer Health, Inc. All rights reserved.
DOI: 10.1097/AJP.0000000000000103

Clin J Pain, Volume 31, Number 3, March 2015

Pain as a result of injury, disease, and medical procedures is a familiar experience for children.1,2 Recognizing and assessing pain accurately are fundamental to identifying sources of tissue damage and alleviating distress effectively and efficiently. Unfortunately, a number of challenges impede the efforts of parents and health care professionals to achieve valid and useful assessment.3,4 In particular, the limited verbal capacity of young children5–7 and often inconsistent or distorted verbal reports resulting from children’s unique cognitive strategies,8,9 including suppression or exaggeration of pain report, can substantially increase the difficulty of pain evaluation. In consequence, observational scales focusing upon nonverbal behavior are typically recommended as alternatives to self-report or as sources of complementary information.10 Observational scales utilize nonverbal expression of pain and aim to diminish biases and questionable reliability that accompany global estimates of children’s pain.11–13 Numerous behavioral scales are available at present,4,14 although, despite their availability, pain often remains unrecognized and underestimated in pediatric populations.15–18

To be maximally effective, pediatric pain measures must meet established psychometric requirements of sensitivity, specificity, validity, and reliability.19–21 No single scale satisfies these criteria for all types of pain and all ages,19,22 and measures vary substantially in the behavioral criteria recommended to identify and determine pain severity.20 Furthermore, scales often fail to provide detailed instruction on how to use the scale or information concerning the age group, setting, or type of pain for which the scale has been validated.

Substantial support has emerged for a focus on facial activity in pain assessment of infants and young children because of its accessibility, universality, and specificity.23–26 Facial activity during pain serves important social functions, such as alerting observers to danger and attracting observer attention in the interest of eliciting aid.27–29 Its importance is exemplified by Hadjistavropoulos et al,30 who found that 49% of variance in parental estimates of pain was accounted for by the presence of facial actions associated with pain.

Although most behavioral scales include facial expression as an indicator of pain, many fail to provide item descriptions that are consistent with empirically described accounts.14,27,31 Most scales do not justify the choice of item descriptions.
There tends to be oversimplification of complex facial activities, such as reference to “frown” or “negative facial expression.”5 This is problematic because it confuses painful displays with expressions of nonpainful, aversive emotional states. The revised FLACC, for example, includes emotional displays (eg, “appears sad or worried”) as part of the facial pain assessment.32 Some scales provide descriptors referring to facial actions rarely observed in empirical accounts of pain expression (eg, “quivering chin” or “clenched teeth”).21,33 Other scales provide ambiguously described items, such as “wry mouth,”19 thereby contributing to variable interpretations of both the term and the expression of the patient. Finally, some scale items are inconsistent with the level of pain severity their descriptions denote. For example, the descriptor “composed” has been represented as an intermediate level of pain.34 These ambiguities, inconsistencies, and inaccuracies indicate that these items lack content validity—they would appear to hinder accurate assessment and to indicate a need for further refinement of certain behavioral observational scales. www.clinicalpain.com |


The objective of the current study was to examine the impact on observer judgments of pain of different descriptors used for items representing the facial expression of pain in 6 widely used multidimensional pediatric observational pain scales, as compared with the Child Facial Coding System (CFCS).35 The CFCS provides an objective, detailed, and anatomically based index of pain with established reliability, specificity, and validity,25,36–38 making it an appropriate criterion index for comparison in the current investigation. The pattern of facial expression identified using CFCS items can be differentiated from facial activity associated with other aversive but non-noxious states, for example, fear or anger.39 Scales using explicit descriptors that correspond to empirically based accounts of the facial expression of pain were expected to yield higher levels of intercoder reliability, to better discriminate levels of pain intensity, to correlate more strongly with CFCS scores, and to have better face validity, compared with ambiguous items.

MATERIALS AND METHODS

Participants

Coders viewed the facial behavior of 44 children who had provided video-recorded behavior for a previous study and whose parents had consented to secondary use of the video footage in future studies.25 The children were aged between 13 and 74 months (M = 41.8 mo) and were recorded immediately following minor surgical procedures in the Post Anaesthetic Care Unit (PACU) of the British Columbia Children’s Hospital in Vancouver, BC. The families of the children all spoke English fluently, as did all of the patients old enough to converse. Children with developmental disabilities, those who had received major surgery or ocular/facial procedures, and those scheduled to receive major conductive blocks were excluded from participation.

Fifty percent of the surgeries performed were myringotomies, 27% were tonsillectomies and/or adenoidectomies, 8% were cyst removals, and 4% were inguinal hernia repairs. The remaining 10% included tympanotomy, microlaryngoscopy, laryngobronchoscopy, removal of arch bars, removal of a submandibular duct calculus, release of tendons in the hand, electrocautery, and removal of an oral lesion. The duration of the surgeries ranged from 2 to 65 minutes (M = 15), with the length of stay in the recovery room ranging from 20 to 300 minutes (M = 48). All children received general anesthesia, whereas 44% also received intravenous opioids intraoperatively and 45% received local anesthetic for the surgery site.

Measures

Video-clips of children’s facial activity after surgery were coded for behavioral signs of pain using the facial items from 7 observational scales: the CFCS,35 the Children’s Hospital of Eastern Ontario Pain Scale (CHEOPS),34 the Face, Legs, Activity, Cry and Consolability Scale (FLACC),33 the Riley Infant Pain Scale (RIPS),21 the Pre-verbal, Early Verbal Pediatric Pain Scale (PEPPS),40 the Children and Infants’ Postoperative Pain Scale (CHIPPS),19 and the Toddler-Preschooler Postoperative Pain Scale (TPPPS).41 Table 1 summarizes all facial scale items from the observational measures. The scales selected were designed to assess pain in the age range studied. Several are widely used in clinical pain assessment, and they represent different approaches to characterizing the facial expression of pain.


CFCS

The CFCS35 was designed to assess pain in children aged 1 to 6 years. Facial actions representing pain were derived from the Facial Action Coding System, an objective coding system that provides comprehensive, anatomically based descriptions of facial activity.42 The CFCS focuses on the following 13 facial actions likely to be observed during acute pain: “brow lower,” “eye squeeze,” “squint,” “blink,” “flared nostril,” “nose wrinkle,” “nasolabial furrow,” “cheek raise,” “open lips,” “upper lip raise,” “lip corner pull,” “horizontal mouth stretch,” and “vertical mouth stretch.” Three items, “blink,” “flared nostril,” and “open lips” are coded as present or absent, whereas the remainder are coded for intensity using 0 (no action), 1 (slight action), and 2 (distinct to maximal action) (total score range: 0 to 26). High reliability has been established and validity has been demonstrated in diverse settings.25,36,37,43

CHEOPS

The CHEOPS was developed for evaluating pain in children aged 1 to 7 years.34 Items were based on descriptions of the most frequently observed behaviors of children in pain, as described by experienced postoperative recovery room nurses. The full scale (total score 4 to 13) comprises 6 behavioral categories: “cry,” “facial,” “verbal,” “torso,” “touch,” and “legs.”34 The “facial” item is scored from 0 to 2, with 0 indicating smiling (scored only if definite positive facial expression), 1 indicating composed (neutral facial expression), and 2 indicating grimace (scored only if definite negative facial expression).34

FLACC Scale

The FLACC scale assesses pain in children aged 2 months to 7 years. The items included on the FLACC scale were derived from accounts of the behavioral expression of pain provided by experienced clinicians.33 There are 5 categories of behavior (total score 0 to 10): “face,” “legs,” “activity,” “cry,” and “consolability.”33 The face subscale is scored for intensity, with 0 denoting no particular expression or smile, 1 denoting occasional grimace or frown, withdrawn, disinterested, and 2 denoting frequent to constant quivering chin, clenched jaw.33

RIPS

The RIPS was developed for assessment of pain in infants and children aged 0 to 36 months by staff nurses in a children’s hospital.21 The 6 behavioral categories (total score 0 to 18) used in the scale are “facial,” “body movement,” “sleep,” “verbal/vocal,” “consolability,” and “response to movement/touch.”21 The facial item is scored from 0 to 3, with 0 indicating neutral/smiling, 1 indicating frowning/grimacing, 2 indicating clenched teeth, and 3 indicating full cry expression.

PEPPS

The PEPPS provides measurement of pain in premature infants through to toddlers of 36 months.40 Items were based on consultations with experienced health care providers.40 The scale has 7 categories (total score 0 to 26): “heart rate,” “facial,” “cry (audible/visible),” “consolability/state of restfulness,” “body posture,” “sociability,” and “sucking/feeding.”40 For the facial item, 0 indicates relaxed facial expression, 2 indicates grimace, brows drawn together, eyes partially closed, squinting, and 4 indicates severe grimace, brows lowered, tightly drawn together, eyes tightly closed.


TABLE 1. Facial Expression Items and Scoring for the Different Pediatric Observational Behavior Pain Scales

Scoring is shown as the original score, with the score transformed to the 0-10 scale in parentheses.†

CHEOPS (dimension: Facial)
0 (0): Smiling
1 (5): Composed
2 (10): Grimace

FLACC (dimension: Face)
0 (0): No particular expression or smile
1 (5): Occasional grimace or frown, withdrawn, disinterested
2 (10): Frequent/constant quivering chin, clenched jaw

CHIPPS (dimension: Facial expression)
0 (0): Relaxed/smiling
1 (5): Wry mouth
2 (10): Grimacing (mouth and eyes)

TPPPS (dimension: Facial pain expression)*
0 or 1 (0/3.33): Open mouth
0 or 1 (0/3.33): Squint, closed eyes
0 or 1 (0/3.33): Furrow forehead, brow bulging
Total score: 0-3 (0-10)

RIPS (dimension: Facial)
0 (0): Neutral/smiling
1 (3.33): Frowning/grimacing
2 (6.67): Clenched teeth
3 (10): Full cry expression

PEPPS (dimension: Facial)
0 (0): Relaxed facial expression
1 (2.5): no descriptor listed
2 (5): Grimace; brows drawn together; eyes partially closed, squinting
3 (7.5): no descriptor listed
4 (10): Severe grimace; brows lowered, tightly drawn together; eyes tightly closed

For all scales, higher scores reflect greater pain severity.
*Items on the TPPPS are evaluated for presence or absence of each item.
†Ranges for the scores of all measures were transformed arithmetically into a 0 to 10 scale, thereby standardizing ratings and allowing comparisons between scales. For example, for a scale that has a scoring range from 0 to 4 for the facial component, each score was multiplied by 2.5 to transform the scores to a 0 to 10 scale. Scores on the Child Facial Coding System (CFCS) were transformed to a 0 to 10 metric by dividing the summed score by 2.6.
CHEOPS indicates Children’s Hospital of Eastern Ontario Pain Scale; CHIPPS, the Children and Infants’ Postoperative Pain Scale; FLACC, the Face, Legs, Activity, Cry and Consolability scale; PEPPS, the Pre-verbal, Early Verbal Pediatric Pain Scale; RIPS, the Riley Infant Pain Scale; TPPPS, Toddler-Preschooler Postoperative Pain Scale.
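The arithmetic transformation described in the Table 1 footnote can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code: the function names and the `FACIAL_ITEM_MAX` lookup are hypothetical, and the maxima are simply the top of each facial item's original range as listed in Table 1 (with 26 for the CFCS, which corresponds to dividing the summed score by 2.6).

```python
# Hypothetical helper: rescale a raw facial-item score onto the common 0-10 metric.
# Maximum raw scores are taken from Table 1; the names are illustrative assumptions.
FACIAL_ITEM_MAX = {
    "CHEOPS": 2,
    "FLACC": 2,
    "CHIPPS": 2,
    "TPPPS": 3,   # three present/absent items summed
    "RIPS": 3,
    "PEPPS": 4,
    "CFCS": 26,   # summed score divided by 2.6
}


def to_ten_point(score, original_max):
    """Linearly rescale a score from the range 0..original_max to 0..10."""
    if not 0 <= score <= original_max:
        raise ValueError("score lies outside the item's original range")
    return score * 10 / original_max


def transform(scale, score):
    """Return the transformed score for a named scale, rounded to 2 decimals."""
    return round(to_ten_point(score, FACIAL_ITEM_MAX[scale]), 2)


if __name__ == "__main__":
    for scale, raw in [("PEPPS", 2), ("RIPS", 1), ("CFCS", 13)]:
        print(scale, raw, "->", transform(scale, raw))
```

A PEPPS score of 2 maps to 5 (that is, 2 multiplied by 2.5), matching the worked example in the footnote.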

CHIPPS

The CHIPPS was designed for pain assessment in newborn to 5-year-old children.19 It was developed using behavioral indicators of pain drawn from the current literature on the expression of pain. The CHIPPS is composed of 5 behavioral categories (total score 0 to 10): “crying,” “facial expression,” “posture of the trunk,” “posture of the legs,” and “motor restlessness.” For the facial expression item, 0 indicates relaxed, smiling, 1 indicates wry mouth, and 2 indicates grimace (mouth and eyes).

TPPPS

The TPPPS was designed for pain assessment in children aged 1 to 5 years.41 Items were based on observational studies of pediatric pain behavior. The 7 items on the scale fall into 3 categories (total score 0 to 7): “vocal pain expression,” “facial pain expression,” and “bodily pain expression.” The facial pain expression scale comprises 3 items scored for their presence or absence: “open mouth, lips pulled back at corners,” “squint, closed eyes,” and “brow bulge, furrowed forehead.”

Procedure

Video Recordings

Before surgery, informed consent for the child’s participation was obtained from parents. The study protocol was approved by the Behavioural Research Ethics Board at the University of British Columbia. The children were videotaped by a research assistant, beginning with their arrival following surgery at the PACU and ending with their discharge to the day surgery unit or after 1 hour of filming. Views of the face were occasionally obstructed by caregivers and medical equipment during filming. The mean duration of recording time was 35 minutes, ranging from 2 to 62 minutes.

Video Editing

Each real-time video recording was edited to provide two 10-second intervals likely expressive of pain. The first was based upon a time criterion and began one third into the time the child spent in the PACU (the “one-third” time clip). This 10-second interval was selected because, in the investigation by Gilbert et al,25 this period corresponded to the waning of anesthetic impact, leading to a greater likelihood of pain expression in the child. The second criterion was designed to capture the most vigorous facial activity. An independent coder viewed each child’s video recording and identified the moment that represented the most vigorous facial activity, with the preceding and succeeding 5 seconds comprising the 10-second clip (the “most vigorous” time clip). Both 10-second intervals were selected with the constraint that they featured a clear 10-second view of facial activity. If this was not possible, the first 10-second interval available thereafter was selected for coding. The one third time clip was conceptualized as indicating less severe pain as its selection was based upon a time rather than a severity criterion.

Video Coding

Four research assistants were trained to code the CHEOPS, FLACC, RIPS, PEPPS, CHIPPS, and TPPPS facial expression items, adhering to the original instructions provided by the authors of the scales. Scales were assigned to coders in random order to reduce order effects and coders were blind to the medical condition, surgery type, and anesthesia received by patients. After becoming acquainted with the facial activity items on a scale, coders independently rated all of the video-clips in random order. A forced-choice rating method was employed, with coders required to use one of the categories provided by the scale (ie, no omissions were permitted). After rating every clip according to a scale, the coders were asked to answer “yes” or “no” to the question “Did the action fit the item description?” to determine whether the coders believed that the scale adequately represented the painful expression they had viewed in each individual clip. All coding for each scale was completed before subsequent coding using a new scale commenced. CFCS coding was undertaken by 2 experienced coders who had previously demonstrated high levels of intercoder reliability following training. Both coders completed coding for all of the children. These coders did not code the other scales.

Sample Size Calculation and Statistical Analyses

A post hoc sample size estimation was computed to determine the number of participants required to show a statistically significant difference between sample means. The estimate was based on α = 0.05, power = 0.8, and effect size = 0.60 (calculated using the adjusted mean scores of the CFCS at the one third time interval and the most vigorous time interval, mean difference = 1.31). Forty-five participants were required, indicating that the current study achieved adequate power.

As the measures have different ranges of scoring, a direct comparison of the raw scores was not suitable. Therefore, metrics for the scores from all measures were transformed arithmetically into a 0 to 10 scale to standardize ratings and to allow comparisons between scales. For example, for a scale that has a scoring range from 0 to 4 for the facial component, each score was multiplied by 2.5 to transform the scores to a 0 to 10 scale (Table 1 provides details regarding original and transformed scoring).44 The α level for analyses was set at 0.01 to control for the possible inflation of type 1 error due to the number of comparisons being conducted.

Descriptive statistics were used to report the incidence and mean scores of the different facial scale items. The response to the “scale representativeness” question was used to measure face validity. Effective reliability, the stability of the mean of all coders’ ratings, was examined using the Spearman-Brown formula recommended by Rosenthal.45 Intercoder reliability was determined using the intraclass correlation (ICC) (absolute agreement) with ordinal categories and by calculating proportion agreement for the scales, wherein each individual behavior was coded as absent or present.42 To determine the empirical validity of each scale, the Wilcoxon signed rank test was used to compare scores for the 2 time blocks. Finally, convergent validity, the extent to which the measures agreed with each other, was calculated using a repeated measures analysis of variance, as well as by examining intercorrelations among the measures.

RESULTS

Effective reliability, intercoder reliability, empirical validity, convergent validity, and face validity of the facial components of the 6 observational pediatric pain scales were examined.


Effective Reliability

To establish the stability of the mean score across coders for the facial item on the individual scales (CHEOPS, FLACC, RIPS, PEPPS, and CHIPPS), Rosenthal’s45 effective reliability was calculated using the Spearman-Brown formula. This approach to the stability of the mean score treats individual differences among coders as noise and establishes whether variation among additional coders would change the mean substantially. We considered an effective reliability of >0.70 for 4 coders as representing moderately high levels.45 The Spearman-Brown R ranged between 0.73 and 0.77, indicating that the mean scores for scales were stable and reliable, thereby permitting comparisons of mean scores across scales.
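For readers unfamiliar with the Spearman-Brown formula, the following Python sketch shows how the effective reliability of a mean rating grows with the number of coders, R = n r / (1 + (n - 1) r), where r is the mean correlation between pairs of coders and n is the number of coders. The function names and the example correlation of 0.40 are illustrative assumptions and are not values reported in this study.

```python
def effective_reliability(mean_r, n_coders):
    """Spearman-Brown reliability of the mean of n_coders ratings.

    mean_r is the mean correlation between pairs of coders.
    """
    if n_coders < 1:
        raise ValueError("at least one coder is required")
    return n_coders * mean_r / (1 + (n_coders - 1) * mean_r)


def implied_mean_r(effective_r, n_coders):
    """Invert the formula: mean pairwise r implied by an effective reliability."""
    return effective_r / (n_coders - (n_coders - 1) * effective_r)


if __name__ == "__main__":
    # Illustrative value only: a mean pairwise correlation of 0.40 among 4 coders.
    print(round(effective_reliability(0.40, 4), 3))
    # Mean pairwise correlation implied by an effective reliability of 0.73 with 4 coders.
    print(round(implied_mean_r(0.73, 4), 3))
```

Under this formula, an effective reliability in the reported range of 0.73 to 0.77 for 4 coders corresponds to a considerably lower mean correlation between individual coders, which is why the effective reliability speaks to the stability of the group mean rather than to agreement between any 2 coders.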

Intercoder Reliability

The extent to which coders agreed with each other was established for facial items on each of the different scales separately for both the one third time-point and the most vigorous time-point. Intercoder reliability was determined using ICC for all pairs of coders on the CHEOPS, FLACC, RIPS, PEPPS, and CHIPPS scales, creating 6 pairwise comparisons for each scale. ICCs exceeding 0.75 are considered as very strong.46 Because TPPPS uses a different structure to assess pain, that is, coders identified actions as present or absent, intercoder reliability was calculated for facial items on this scale by calculating proportion agreement.42 This method was also used to calculate intercoder reliability for the CFCS.

For CHEOPS, FLACC, RIPS, PEPPS, and CHIPPS scales, ICC values ranged from 0.66 to 0.84 (Table 2). For the one third time-point, the CHEOPS and CHIPPS both had substantial reliability and emerged as reliable scales (mean k = 0.79 and 0.78, respectively). The RIPS demonstrated moderate to substantial reliability and was the least reliable scale (k = 0.66).47 For the most vigorous time-point, CHEOPS and CHIPPS (k = 0.83 and 0.84) remained highly reliable and the FLACC demonstrated moderate to substantial reliability (k = 0.66).47 The intercoder reliability for the TPPPS was high at the one third clip (r = 0.83) and increased for the most vigorous clip (r = 0.92). The intercoder reliability for the CFCS was low at the one third clip (r = 0.55) but increased slightly for the most vigorous clip (r = 0.61). According to Landis and Koch,47 reliability for the TPPPS could be categorized as almost perfect, whereas it was moderate to substantial for the CFCS.
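A minimal Python sketch of pairwise proportion agreement for present/absent codes follows. It assumes the simplest definition, the share of judgments on which 2 coders gave the same code; the exact agreement formula applied in the study is not spelled out here and may differ. The coder labels and codes are invented for illustration. With 4 coders, `itertools.combinations` yields the 6 pairwise comparisons mentioned above.

```python
from itertools import combinations


def proportion_agreement(codes_a, codes_b):
    """Share of judgments on which two coders gave the same present/absent code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def mean_pairwise_agreement(coders):
    """Average the proportion agreement over every pair of coders."""
    pairs = list(combinations(coders.values(), 2))
    return sum(proportion_agreement(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented codes (1 = action present, 0 = absent) for four hypothetical coders.
    coders = {
        "coder_1": [1, 0, 1, 1],
        "coder_2": [1, 0, 0, 1],
        "coder_3": [1, 0, 1, 1],
        "coder_4": [0, 0, 1, 1],
    }
    print(len(list(combinations(coders, 2))))        # number of coder pairs
    print(round(mean_pairwise_agreement(coders), 3))
```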

TABLE 2. Range and Means of Intercoder Reliability for Scales

Scale: one third time interval, mean (range); most vigorous time interval, mean (range)
CHEOPS: 0.79 (0.70-0.87); 0.83 (0.75-0.89)
FLACC: 0.70 (0.56-0.81); 0.66 (0.41-0.81)
RIPS: 0.66 (0.53-0.78); 0.73 (0.61-0.83)
PEPPS: 0.75 (0.63-0.85); 0.73 (0.60-0.83)
CHIPPS: 0.78 (0.67-0.86); 0.84 (0.76-0.90)
TPPPS: 0.83 (0.67-0.97); 0.92 (0.83-0.97)
CFCS: 0.55 (0-0.94); 0.61 (0.20-0.98)

CFCS indicates Child Facial Coding System; CHEOPS, Children’s Hospital of Eastern Ontario Pain Scale; CHIPPS, the Children and Infants’ Postoperative Pain Scale; FLACC, the Face, Legs, Activity, Cry and Consolability scale; PEPPS, the Pre-verbal, Early Verbal Pediatric Pain Scale; RIPS, the Riley Infant Pain Scale; TPPPS, Toddler-Preschooler Postoperative Pain Scale.

Empirical Validity

Empirical validity was examined using the Wilcoxon signed rank test to establish whether each of the 6 scales could distinguish between the different levels of pain intensity represented by the one third time interval and the most vigorous time interval (Fig. 1 and Table 3). Except for the CHEOPS (P = 0.02), all scales had a significantly higher score for the most vigorous interval than for the one third time interval (P < 0.01). This demonstrated good discrimination between levels of pain severity for all scales except the CHEOPS scale.
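The paired comparison underlying the Wilcoxon signed rank test can be sketched in plain Python as follows. This is a textbook normal approximation that drops zero differences, assigns average ranks to tied absolute differences, and applies the usual tie correction to the variance; it omits a continuity correction. The software and corrections used in the study are not stated, so the Z values in Table 3 need not match this sketch exactly, and the function name and example scores are hypothetical.

```python
import math


def wilcoxon_signed_rank_z(first, second):
    """Return (W_plus, z) for paired scores using the normal approximation.

    Differences are second - first; zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1

    # Sum of ranks for positive differences, then standardize.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return w_plus, (w_plus - mean_w) / math.sqrt(var_w)


if __name__ == "__main__":
    # Invented paired scores: one third clip versus most vigorous clip.
    one_third = [1, 2, 3, 4, 5]
    most_vigorous = [2, 4, 6, 8, 10]
    print(wilcoxon_signed_rank_z(one_third, most_vigorous))
```

Because transformed scores such as 3.33 are not exact in floating point, ties are more safely detected on the original integer scores than on the transformed values.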

Convergent Validity

Convergent validity was evaluated by comparing relative magnitudes of pain severity scores across all 7 scales for both the one third time interval and the most vigorous time interval. Substantial differences were observed (Fig. 1 and Table 3). A repeated measures analysis of variance (ANOVA) was performed to compare the mean ratings of each scale to each other at each of the 2 time-points. There were significant main effects for differences among the various scales, F(6, 38) = 138.27, P < 0.001, and the 2 time-points, F(1, 43) = 14.29, P < 0.001, as well as a significant interaction for the different scales across the 2 time-points, F(6, 38) = 3.66, P = 0.006. The source of interaction was established through paired comparisons (univariate analysis, simple effects) across the individual scales at each of the 2 time-points.

At the one third time-point, ratings of severity on FLACC, PEPPS, RIPS, and CFCS were not significantly different from each other. The TPPPS and CHIPPS had similar scores but these were significantly greater scores than all other scales (P < 0.001), save for CHEOPS, whereas the CHEOPS had significantly greater scores than all other scales (P < 0.01), indicating poor convergent validity across scales.

At the most vigorous time-point, comparison of the scales was similar to the one third time-point, with minor differences. Again, the PEPPS, RIPS, and CFCS were not significantly different from each other, but at this time-point the FLACC was only similar to the CFCS and had a lower score in comparison with all other scales (P < 0.01). The TPPPS and CHIPPS had similar scores but had significantly greater scores than all other scales (P < 0.001), save again for CHEOPS, and the CHEOPS had significantly greater scores than all other scales (P < 0.01). Note that scores for the most objective measure, CFCS, were intermediate to most other scales. The CHEOPS provided the greatest estimate and the FLACC provided the lowest estimate.

Intercorrelations between the CFCS measure and each of the other scales were calculated. The correlations for the one third time interval ranged from 0.75 for the CHEOPS to 0.90 for the TPPPS and for the most vigorous time interval the range was from 0.76 for the TPPPS to 0.88 for the FLACC and the PEPPS. Correlations were universally strong according to Cohen’s recommended guidelines, with the correlation modestly greater when the most vigorous time interval was being coded.48

[Figure 1: Pain Severity Rating (vertical axis, 0 to 10) by Scale (horizontal axis: CHEOPS, FLACC*, RIPS*, PEPPS*, CHIPPS*, TPPPS*, CFCS*), plotted for the most vigorous and one third time-points. *Refers to significant differences between the two intensity levels at P < .01.]

FIGURE 1. Comparison of pain severity scores on facial items of the clinical observational scales at the one third and most vigorous time-points. CFCS indicates Child Facial Coding System; CHEOPS, Children’s Hospital of Eastern Ontario Pain Scale; CHIPPS, the Children and Infants’ Postoperative Pain Scale; FLACC, the Face, Legs, Activity, Cry and Consolability scale; PEPPS, the Pre-verbal, Early Verbal Pediatric Pain Scale; RIPS, the Riley Infant Pain Scale; TPPPS, Toddler-Preschooler Postoperative Pain Scale.

Face Validity

To examine coder perception of how accurately the various scale descriptors seemed to represent the actual facial expressions of pain evidenced by the children, each coder indicated whether the descriptive language fit the observed behavior of the children for each video clip for each scale, producing 44 ratings for each time interval per scale per coder. Table 4 provides means and the range for the coders’ estimates for each of the 5 scales. TPPPS was not included in this analysis because this scale was measured on a different metric, with coders asked to identify presence or absence of specific actions. At the one third time-point, the CHEOPS was rated as being the most accurate (85% agreed, ie, the coders deemed the facial expression item corresponded to the child’s pain), whereas at the most vigorous time-point, RIPS was rated as the most accurate (85% agreement). CHEOPS and CHIPPS also reached high levels of face validity at the most vigorous time-point (83% agreement). FLACC had the lowest face validity at both the one third time-point (69% agreement) and the most vigorous time-point (60% agreement).

TABLE 3. Adjusted Means and SDs of the Scales for Each Time Interval

Scale: one third M (SD); most vigorous M (SD); Z*; P
CHEOPS: 7.22 (2.56); 8.07 (2.70); 2.37; 0.018
FLACC: 2.81 (2.70); 4.03 (2.68); 3.07; 0.002
RIPS: 3.18 (3.23); 5.04 (3.85); 3.47; 0.001
PEPPS: 3.32 (3.45); 5.14 (3.66); 2.82; 0.005
CHIPPS: 4.97 (4.56); 6.79 (4.38); 2.83; 0.005
TPPPS: 5.51 (3.54); 7.31 (3.49); 3.48; 0.001
CFCS: 3.28 (2.07); 4.59 (2.32); 3.88; < 0.001

*The Wilcoxon Signed Rank Test.
CFCS indicates Child Facial Coding System; CHEOPS, Children’s Hospital of Eastern Ontario Pain Scale; CHIPPS, the Children and Infants’ Postoperative Pain Scale; FLACC, the Face, Legs, Activity, Cry and Consolability scale; PEPPS, the Pre-verbal, Early Verbal Pediatric Pain Scale; RIPS, the Riley Infant Pain Scale; TPPPS, Toddler-Preschooler Postoperative Pain Scale.

DISCUSSION

The scales differed in their ability to satisfy standard psychometric criteria. The findings suggest that use of certain scales may lead to unreliable and inaccurate pain assessment.

Effective Reliability

We established that the mean score for the 4 coders represented a stable estimate of the score of any group of coders. Additional coders would not have led to significant divergence from this estimate. Thus, comparisons of the relative magnitude of scale scores were considered acceptable.

Intercoder Reliability

Although group means were stable, there was substantial variability in intercoder reliability across the scales. It is noteworthy that reliability was generally better when there were more vigorous levels of pain expression. Disagreement as to the presence or magnitude of pain is most likely to be evident when children are experiencing less rather than more severe pain. This was the case with the CFCS, with intercoder reliability varying substantially on this scale. It seems likely that reducing items on the CFCS to those most likely to be observed during painful events would increase reliability on this scale and others.

The most reliable scales were the CHEOPS, CHIPPS, and TPPPS scales; all 3 were successful in yielding high levels of intercoder reliability on the 2 levels of pain expression scored. CHEOPS achieved this by requiring only a judgment of the difference between “composed” and “grimace,” that is, the distinction would seem easily achieved. However, these terms did not differentiate between the 2 levels of pain observed in the present study, indicating relative insensitivity of the CHEOPS scale to variations in pain expression. The CHIPPS scale on the other hand used vague and unusual language. An absence of pain is described as “relaxed/smiling,” which would seem a more positive description for children not in pain in clinical settings than would be expected; children can be pain free yet not relaxed or smiling, suggesting a risk of false positives for pain. The first level of pain for CHIPPS is to be coded if the child displays a “wry mouth.” This description is vague as it fails to distinguish the manner in which one would recognize the mouth as wry and it is not clear how the term “wry” pertains to the facial expression of pain. Moreover, the term only describes 1 portion of the face that could indicate pain. In contrast, TPPPS provided cues associated empirically with the facial expression of pain in children (“open mouth,” “squint,” “closed eyes,” “furrow forehead,” and “brow bulging”) that were both reliable in coding and differentiated the 2 levels of pain scored, indicating sensitivity to variation in pain expression, making it a more satisfactory scale and affirming the importance of relatively precise behavioral descriptors.

The least reliable scales were the FLACC and RIPS, reflecting substantial disagreement among coders in application of the descriptors used on these scales. Examination of items on the FLACC scale provides insight for this finding. For example, the first level indicating pain requires observation of “occasional grimace, or frown, withdrawn, disinterested,” descriptors that do not correspond to empirical descriptions of facial expressions of pain in children. Specifically, although grimacing is related to pain, it is not specific to a painful experience. The descriptor “frown” similarly could mislead observers because it can also signal that the child is unhappy, which does not necessitate pain. In addition, although children in pain can be withdrawn49 or voluntarily suppress their expressions,50 pain most often is associated with active expression. The descriptor for the most severe pain rating, “frequent, constant quivering chin,

TABLE 4. Percentages of Coders Who Indicated Good Fit Between the Scale Descriptors and the Observed Behavior

Time Intervals One third (%) Range M Most vigorous (%) Range M

CHEOPS

FLACC

RIPS

PEPPS

CHIPPS

75-100 85

52-100 69

61-96 80

65-89 80

55-100 78

57-100 83

43-100 60

71-100 85

50-95 75

64-100 83

TPPPS and CFCS were not included in this analysis because coders are asked to identify presence or absence of specific actions. CHEOPS indicates Children’s Hospital of Eastern Ontario Pain Scale; CHIPPS, the Children and Infants’ Postoperative Pain Scale; FLACC, the Face, Legs, Arms, Cry and Consolability scale; PEPPS, the Pre-verbal, Early Verbal Pediatric Pain Scale; RIPS, the Riley Infant Pain Scale.

194 | www.clinicalpain.com

Copyright

r

2014 Wolters Kluwer Health, Inc. All rights reserved.

Clin J Pain



Volume 31, Number 3, March 2015

clenched jaw,” suffers from similar problems; empirical descriptions of children’s facial reactions when they are in pain have not identified a “quivering chin” to be associated with pain and, rather than observing “clenched jaw,” children in pain typically display varying degrees of open mouth.27,31 The coders also rated this description as having the least face validity. Similar misleading language is used for items on the RIPS scale. For example, “frown/ grimacing” are associated with the first level of pain; again, “frown” is usually associated with lower mood or unhappiness, states not necessarily indicative of pain, and the item “grimacing” is not accompanied by specific behavioral cues to distinguish it from other aversive, but non-noxious, emotional states. The item “clenched teeth” indicates the next level of pain. This action, like “clenched jaw” for the FLACC scale, is rarely observed when children display pain.31 Finally, the maximal level of pain on the RIPS is to be scored when children display a “full cry expression.” Although crying provides a reasonable index of the severity of children’s distress,51 it lacks specificity as an index of pain. It often is precipitated by nonpainful states of distress, such as fear and hunger, and children in pain often do not cry.31 Thus, on close inspection, the RIPS provides confusing cues as to the presence of pain, thereby producing modest levels of reliability. The PEPPS scale was substantially reliable for both the levels of pain estimated. As observed in Table 1, the PEPPS provides relatively specific behavioral cues that correspond to empirical descriptions when children are in pain.
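To make the notion of intercoder reliability concrete, the following Python sketch computes Cohen's kappa for two coders and attaches the verbal benchmarks of Landis and Koch, which appear in the reference list. The choice of kappa is an assumption made for illustration; this section does not state which agreement statistic was used, and the coder scores below are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two coders on categorical scores."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of clips on which the two coders gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal benchmark for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical facial-item scores (0 = no pain, 1 = some, 2 = severe)
# assigned by two coders to the same 10 video clips.
coder_a = [0, 0, 1, 1, 2, 2, 1, 0, 2, 1]
coder_b = [0, 0, 1, 2, 2, 2, 1, 0, 1, 1]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3), landis_koch_label(kappa))
```

Vague descriptors such as “wry mouth” or “quivering chin” would be expected to lower the observed agreement term in this calculation, because coders have less shared basis for choosing the same score.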

Empirical Validity

All scales investigated, with the exception of the CHEOPS, demonstrated empirical validity, that is, they were sensitive to the difference between the most vigorous time-point and the one third time-point levels of pain. This also supported the assumption that the one third time interval would represent a lesser severity of pain. As the CHEOPS only asks for discrimination between “composed” and “grimacing,” it could be characterized as focusing upon recognition of pain rather than attempting to identify different levels of pain severity using the facial item.
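The comparison between the two time points can be sketched with the Wilcoxon signed rank statistic, the test named in the table note. The pure-Python function below computes only the rank sums W+ and W-; the scores are hypothetical, and a p value would still need a critical-value table or a library routine such as scipy.stats.wilcoxon.

```python
def wilcoxon_signed_rank(first, second):
    """Return (W_plus, W_minus) for paired scores, dropping zero differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(first, second) if b != a]
    # Order the nonzero differences by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical facial-item scores for 8 children at the two time points.
one_third = [2, 3, 1, 4, 2, 5, 3, 2]
most_vigorous = [5, 6, 1, 7, 4, 4, 8, 6]

print(wilcoxon_signed_rank(one_third, most_vigorous))
```

A scale that is sensitive to pain intensity should give a large W+ relative to W- in this comparison. A two-level item such as “composed” versus “grimace” tends to produce many zero differences, which are dropped and leave little information for the test.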

Convergent Validity

The scales differed substantially in the levels of pain identified. The most substantial difference observed for the most vigorous expression was between the overall mean ratings of 8 of 10 for the CHEOPS scale and 4 of 10 for the FLACC scale. The CFCS, TPPPS, RIPS, CHIPPS, and PEPPS mean scores were intermediate to these scores. Substantial differences in scale item descriptions would appear to account for the different levels of severity identified; CHEOPS only allows for identifying pain at the highest possible level (grimace) and consequently does not permit observers to judge gradations of pain. As a result, if there is any facial indication of pain, the coder’s only option is to choose the highest score possible, conceivably inflating ratings. Inflated levels of pain may also reflect use of “smiling” to indicate an absence of pain on CHEOPS. Because of the anxiety and discomfort involved in surgery, as well as the sedation, it is unlikely that any of the children would exhibit a smiling expression, further inflating ratings. In contrast, low scores on the FLACC scale suggest underestimation of pain. This may relate to the lack of correspondence between the pain behaviors described as most indicative of severe pain on the scale, “quivering chin” and “clenched jaw,” and those that have been reported in empirical studies. It is very unlikely that observers would see these actions or code for severe pain. The nonspecificity of the FLACC scale was also evident in the low appraisal of face validity by coders; they indicated that the item descriptions were consistent with the observed behavior for only 69% of the items at the one third time, and this number dropped to 60% at the most vigorous point, a rate far lower than any other scale.

We are in agreement with the position that efforts should be made to improve the psychometric properties of existing pain assessment instruments within various populations, types of pain, and contextual factors [52,53]. The emphasis in this paper was on facial expression, but psychometric evaluation of items in other behavioral categories is warranted. Inspection of items on the numerous observational scales used for the assessment of pain in children [14] discloses a remarkable diversity of items, ranging from self-report (eg, “asking for help,” “complaining of pain,” “cursing”), paralinguistic qualities of speech (eg, “whimpering,” “crying,” “moaning”), limb action (eg, “flailing arms and legs,” “rubbing,” “protecting/favoring/guarding part of body that hurts”), torso posture and activity (eg, “tensing up,” “restless,” “shivering torso”), and manifestations of physiological activity (eg, “looking pale,” “irregular breathing,” “tensing up”) to social reactivity (eg, “withdrawn,” “hard to console,” “angry verbalizations”). Most scales have relied upon expert judgment in the development of items rather than a careful ethological approach to fine-grain description. Although the value of careful attention to facial expression has been demonstrated, the relative weight attached to facial expression in behavioral observation scales must await careful psychometric evaluation of other categories of behavioral items.

We are indebted to a reviewer who observed that future development of observational pain scales should incorporate evidence-based weighting of items rather than arbitrarily weighted items, as in most or all of the present scales. The findings of the current study make it clear that using well-defined behavioral descriptors based upon empirical description of how children behave when in pain improves the psychometric quality of the instrument. When this desirable property was present we observed: (1) improved face validity, a feature likely to increase compliance in the use of standardized observation scales for children’s pain assessment [16]; (2) higher intercoder reliability, as less ambiguous cues would diminish uncertainty as to what was to be observed and reduce the likelihood of clinician biases; and (3) increased sensitivity to differences in levels of pain severity. Although not directly investigated, focusing upon specific features of pain expression is likely to reduce the likelihood of false positives, that is, confusion of aversive but non-noxious psychological states with pain.

Limitations

We identify the following as limitations of the study: the sample size was relatively small, the age range was relatively wide, there was overrepresentation of otolaryngologic procedures, and only postoperative pain was examined. Given that there is no absolute index of pain, we were operating on the supposition that the behavior manifested by the children was an expression of pain. This was supported by the likelihood of postoperative pain, observers’ judgments that the children were displaying pain behavior, and the applicability of the scales designed to identify pain. Nevertheless, there could have been confusion between pain and distress behaviors associated with the emergence of delirium or agitation, a common phenomenon among children emerging from anesthesia. Therefore, care should be taken when generalizing the results. Replication of the approach in other contexts of pain in children is warranted. Nevertheless, the data were reliable and consistent with expectations. The findings also were consistent with a similar empirical investigation of the use of facial activity in measures of pain in elderly persons with dementia [44]. Finally, we note that only facial activity was examined in the present study. The facial items were studied without consideration of other items on the various scales. Nevertheless, the lessons learned likely are applicable to observational scale items concerning other behavioral activity.

CONCLUSIONS

This investigation emphasizes the need for further development of objective coding observational measures to standardize pain assessment in children using empirically derived characterizations of facial display.

REFERENCES

1. Stevens BJ, Abbott LK, Yamada J, et al. Epidemiology and management of painful procedures in children in Canadian hospitals. Can Med Assoc J. 2011;183:E403–E410.
2. Perquin CW, Hazebroek-Kampschreur AAJM, Hunfeld JAM, et al. Pain in children and adolescents: a common experience. Pain. 2000;87:51–58.
3. Crellin D, Sullivan TP, Babl FE, et al. Analysis of the validation of existing behavioral pain and distress scales for use in the procedural setting. Pediatr Anesth. 2007;17:720–733.
4. Stinson JN, Kavanagh T, Yamada J, et al. Systematic review of the psychometric properties, interpretability and feasibility of self report pain intensity measures for use in clinical trials in children and adolescents. Pain. 2006;125:143–157.
5. Hesselgard K, Larsson S, Romner B, et al. Validity and reliability of the Behavioural Observational Pain Scale for postoperative pain measurement in children 1-7 years of age. Ped Crit Care Med. 2007;8:102–108.
6. Ramelet AS, Rees NW, McDonald S, et al. Clinical validation of the multidimensional assessment of pain scale. Paediatr Anaesth. 2007;17:1156–1165.
7. Stanford EA, Chambers CT, Craig KD. A normative analysis of the development of pain-related vocabulary in children. Pain. 2005;114:278–284.
8. von Baeyer CL, Forsyth SJ, Stanford EA, et al. Response biases in preschool children’s ratings of pain in hypothetical situations. Eur J Pain. 2009;13:209–213.
9. von Baeyer C, Uman LS, Chambers CT, et al. Can we screen young children for their ability to provide accurate self-reports of pain? Pain. 2011;152:1327–1333.
10. Schiavenato M, Craig KD. Pain assessment as a social transaction: beyond the “Gold Standard”. Clin J Pain. 2010;26:667–676.
11. Stevens B. Development and testing of a pediatric pain management sheet. Pediatr Nurs. 1990;16:543–548.
12. Riddell RRP, Badali MA, Craig KD. Parental judgments of infant pain: importance of perceived cognitive abilities, behavioural cues and contextual cues. Pain Res Manag. 2004;9:73–80.
13. Schiavenato M. Facial expression and pain assessment in the pediatric patient: the primal face of pain. J Spec Ped Nurs. 2008;13:89–97.
14. von Baeyer CL, Spagrud LJ. Systematic review of observational (behavioral) measures of pain for children and adolescents ages 3 to 18 years. Pain. 2007;127:140–150.
15. Chambers CT, Reid GJ, Craig KD, et al. Agreement between child and parent reports of pain. Clin J Pain. 1998;14:336–342, 12.


16. Franck LS, Bruce E. Putting pain assessment into practice: why is it so painful? Pain Res Manag. 2009;14:13–20.
17. Schechter NL. The undertreatment of pain in children: an overview. Pediatr Clin N Am. 1989;36:781–794.
18. Manworren RC. It’s time to relieve children’s pain. J Spec Ped Nurs. 2007;12:196–198.
19. Büttner W, Finke W. Analysis of behavioural and physiological parameters for the assessment of postoperative analgesic demand in newborns, infants and young children: a comprehensive report on seven consecutive studies. Pediatr Anesth. 2000;10:303–318.
20. Ramelet AS, Abu-Saad HH, Rees N, et al. The challenges of pain measurement in critically ill young children: a comprehensive review. Aus Crit Care. 2004;17:33–45.
21. Schade JG, Joyce BA, Gerkensmeyer J, et al. Comparison of three preverbal scales for postoperative pain assessment in a diverse pediatric sample. J Pain Symptom Manage. 1996;12:348–359.
22. Foster RL. State-of-the-art pain assessment and management. J Spec Ped Nurs. 2007;12:137–138.
23. Xavier Balda R, Guinsburg R, de Almeida MF, et al. The recognition of facial expression of pain in full-term newborns by parents and health professionals. Arch Pediat Adol Med. 2000;154:1009–1016.
24. Franck LS, Miaskowski C. Measurement of neonatal responses to painful stimuli: a research review. J Pain Symptom Manage. 1997;14:343–378.
25. Gilbert CA, Lilley CM, Craig KD, et al. Postoperative pain expression in preschool children: validation of the Child Facial Coding System. Clin J Pain. 1999;15:192–200.
26. Grunau RE, Oberlander T, Holsti L, et al. Bedside application of the neonatal facial coding system in pain assessment of premature neonates. Pain. 1998;76:277–286.
27. Craig KD, Prkachin KM, Grunau RE. The facial expression of pain. In: Turk D, Melzack R, eds. Handbook of Pain Assessment. 3rd ed. New York, NY: The Guilford Press; 2011:117–133.
28. Deyo KS, Prkachin KM, Mercer SR. Development of sensitivity to facial expression of pain. Pain. 2004;107:16–21.
29. Hadjistavropoulos T, Craig KD, Duck S, et al. A biopsychosocial formulation of pain communication. Psychol Bull. 2011;137:910–939.
30. Hadjistavropoulos HD, Craig KD, Grunau RVE, et al. Judging pain in newborns: facial and cry determinants. J Pediatr Psychol. 1994;19:485–491.
31. Craig KD. The facial display of pain. In: Finley GA, McGrath PJ, eds. Measurement of Pain in Infants and Children. Seattle, WA: IASP Press; 1998:103–121.
32. Malviya S, Voepel-Lewis T, Burke C, et al. The revised FLACC observational pain tool: improved reliability and validity for pain assessment in children with cognitive impairment. Pediatr Anesth. 2006;16:258–265.
33. Merkel SI, Voepel-Lewis T, Shayevitz JR, et al. The FLACC: a behavioral scale for scoring postoperative pain in young children. Pediatr Nurs. 1997;23:293–297.
34. McGrath PJ, Johnson G, Goodman JT, et al. CHEOPS: a behavioral scale for rating postoperative pain in children. In: Fields HL, Dubner R, Cervero F, eds. Advances in Pain Research and Therapy. New York, NY: Raven Press; 1985:395–402.
35. Chambers CT, Cassidy KL, McGrath PJ, et al. Child Facial Coding System Revised Manual. Halifax, Nova Scotia and Vancouver, British Columbia: Dalhousie University and the University of British Columbia; 1996.
36. Breau LM, McGrath PJ, Craig KD, et al. Facial expression of children receiving immunizations: a principal components analysis of the child facial coding system. Clin J Pain. 2001;17:178–186.
37. Cassidy K, Reid GJ, McGrath PJ, et al. Watch needle watch TV: audiovisual distraction in preschool immunization. Pain Med. 2002;3:108–118.
38. Vervoort T, Craig KD, Goubert L, et al. Expressive dimensions of pain catastrophizing: a comparative analysis of school children and children with clinical pain. Pain. 2008;134:59–68.


39. Williams AC. Facial expression of pain: an evolutionary account. Behav Brain Sci. 2002;25:439–455.
40. Schultz AA, Murphy E, Morton J, et al. Preverbal, early verbal pediatric pain scale (PEPPS): development and early psychometric testing. J Pediatr Nurs. 1999;14:19–27.
41. Tarbell SE, Cohen TI, Marsh JL. The Toddler-Preschooler postoperative pain scale: an observational scale measuring postoperative pain in children aged 1-5. Preliminary report. Pain. 1992;50:273–280.
42. Ekman P, Friesen WV. Facial Action Coding System. Palo Alto, CA: Consulting Psychologists Press; 1978.
43. Cohen LL, Lemanek K, Blount RL, et al. Evidence-based assessment of pediatric pain. J Pediatr Psychol. 2008;33:939–955.
44. Sheu S, Versloot J, Nader R, et al. Pain in the elderly: validity of facial expression components of observational measures. Clin J Pain. 2011;27:593–601.
45. Rosenthal R. Estimating effective reliabilities in studies that employ judges’ ratings. J Clin Psychol. 1973;29:342–345.
46. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: John Wiley & Sons; 1986.
47. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.


48. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
49. Beyer JE, McGrath PJ, Berde CB. Discordance between self-report and behavioral pain measures in children aged 3–7 years after surgery. J Pain Symptom Manage. 1990;5:350–356.
50. Larochette AC, Chambers CT, Craig KD. Genuine, suppressed and faked facial expressions of pain in children. Pain. 2006;126:64–71.
51. Craig KD, Gilbert-MacLeod C, Lilley CM. Crying as an indicator of pain in infants. In: Barr RG, Hopkins B, Green JA, eds. Crying as a Sign, a Symptom, & a Signal: Clinical Emotional and Developmental Aspects of Infant and Toddler Crying. New York, NY: Cambridge University Press; 2000:23–40.
52. Ranger M, Johnston CC, Rennick JE, et al. A multidimensional approach to pain assessment in critically ill infants during a painful procedure. Clin J Pain. 2013;29:613–620.
53. Stevens B, McGrath PJ, Gibbins S, et al. Determining behavioural and physiological responses to pain in infants at risk for neurological impairment. Pain. 2007;127:94–102.

Clin J Pain. Volume 31, Number 3, March 2015. Copyright © 2014 Wolters Kluwer Health, Inc. All rights reserved.
