j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Q10 17 18 19 20 21 22 23 24 25 26 27 28 29 Q1 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

Available online at www.sciencedirect.com

ScienceDirect journal homepage: www.JournalofSurgicalResearch.com

Reliable scar scoring system to assess photographs of burn patients Gabriel A. Mecott, MD,a Celeste C. Finnerty, PhD,a,b,* David N. Herndon, MD,a Ahmed M. Al-Mousawi, MD,a Ludwik K. Branski, MD,a Sachin Hegde, MD,a Robert Kraft, MD,a Felicia N. Williams, MD,a Susana A. Maldonado, MD,a Haidy G. Rivero, MD,a Noe Rodriguez-Escobar, MD,a and Marc G. Jeschke, MD, PhDc a

Department of Surgery, University of Texas Medical Branch and Shriners Hospitals for Children, Galveston, Texas Sealy Center for Molecular Medicine and the Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, Texas c Ross Tilley Burn Centre, Sunnybrook Health Sciences Centre and Division of Plastic Surgery, University of Toronto, Toronto, Ontario, Canada b

article info

abstract

Article history:

Background: Several scar-scoring scales exist to clinically monitor burn scar development and

Received 18 January 2013

maturation. Although scoring scars through direct clinical examination are ideal, scars must

Received in revised form

sometimes be scored from photographs. No scar scale currently exists for the latter purpose.

8 October 2014

Materials and methods: We modified a previously described scar scale (Yeong et al., J Burn

Accepted 31 October 2014

Care Rehabil 1997) and tested the reliability of this new scale in assessing burn scars from

Available online xxx

photographs. The new scale consisted of three parameters as follows: scar height, surface appearance, and color mismatch. Each parameter was assigned a score of 1 (best) to 4

Keywords:

(worst), generating a total score of 3e12. Five physicians with burns training scored 120

Scar

representative photographs using the original and modified scales. Reliability was analyzed

Scale

using coefficient of agreement, Cronbach alpha, intraclass correlation coefficient, variance,

Burn

and coefficient of variance. Analysis of variance was performed using the KruskaleWallis

Photograph

test. Color mismatch and scar height scores were validated by analyzing actual height and

Hypertrophic scar

color differences. Results: The intraclass correlation coefficient, the coefficient of agreement, and Cronbach alpha were higher for the modified scale than those of the original scale. The original scale produced more variance than that in the modified scale. Subanalysis demonstrated that, for all categories, the modified scale had greater correlation and reliability than the original scale. The correlation between color mismatch scores and actual color differences was 0.84 and between scar height scores and actual height was 0.81. Conclusions: The modified scar scale is a simple, reliable, and useful scale for evaluating photographs of burn patients. ª 2014 Elsevier Inc. All rights reserved.

* Corresponding author. Department of Surgery, Shriners Hospitals for Children, 815 Market Street, Galveston, TX 77550. Tel.: þ1 409 770 6567; fax: þ1 409 770 6919. Q2 E-mail address: [email protected] (C.C. Finnerty). 0022-4804/$ e see front matter ª 2014 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jss.2014.10.055

5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:57 pm  ce

66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130

2

131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195

1.

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

Introduction

photographs of burn patients. Validation of the method was performed to demonstrate utility for future studies.

Burn treatment has improved dramatically in recent decades. Early excision and grafting of burn wounds have greatly reduced morbidity and mortality [1,2]. Unfortunately, pathologic scarring affecting both function and cosmesis remains common place in burn patients, often delaying reintegration of these individuals into society. Improved outcomes have shifted greater attention to wound and scar management in an attempt to improve esthetic and functional results. Evaluation of esthetic results depends on the ability to assess the evolution of scars over time and the outcomes of corrective interventions. This requires scar scoring methods that are simple, reliable, and objective. Several scar scores have been described [3e5], with the Vancouver scale being the most widely used in clinical practice [6]. However, an ideal scoring system has yet to be developed. Scoring scars through direct clinical examination is vital and irreplaceable. However, photographs of patients are commonly used in clinical practice to score scars, either because it is the only option or more convenient. Some scales have been applied to patient photographs to measure reliability between observers [7e10]. Unfortunately, these clinicbased scales are not specifically designed for analysis of patient photographs and may not be appropriate for this purpose. Development of a simple and reliable system to assess scars in photographs would be extremely useful from both a clinical and a research standpoint. Reliable comparisons over time, by multiple observers who may be blinded to treatment interventions, would be possible with a scale developed specifically for photographic evaluation. Objective, blinded analysis of a large number of patients enrolled in clinical studies would also be possible. Yeong et al. [8] described a very reliable clinic-based scar scoring system that relies on assessments of four scar characteristics. Here, we describe a modification of this scale to obtain a simple and highly reliable score system for the assessment of

2.

Methods

2.1.

Photographs

The study was approved by the Institutional Review Board of the University of Texas Medical Branch, and written informed consent was obtained from patients, parents, or legal guardians before enrollment. One-hundred twenty representative photographs of 40 severely burned patients admitted to our hospital from 2000e2008 were selected by a plastic surgeon who did not participate in the scoring. Representative photographs from 6, 12, and 24 mo after burn injury were selected so that different stages of scar maturation and hypertrophy could be presented to observers and the full-spectrum of scoring could be tested. All photographs were taken in the Medical Photography Department at our hospital using standard lighting conditions as well as a standard background and distance. Photographs were taken using two Photogenic Powerlight 600s located 9 feet from the subject and set at 58, 2 white umbrellas (Photogenic Professional Lighting, IL) reflecting 6 Q3 feet, and a Nikon D200 camera with a Nikon 105 mm f/2.8 AF Micro lens (Nikon Inc, NY), which was set to Manual: ISO 160, 1/ 60 at f10. The focal length was adjusted in accordance with the patient’s height. The photographer was located between the two Powerlights and 9 feet from the patient, who was standing 3 feet in front of a standard surgical blue background. Color control patches (Eastman Kodak Company, Rochester, NY) were used to calibrate color. All photographs were stored digitally and then randomly ordered in a PowerPoint slideshow (Microsoft Office PowerPoint 2003 SP3; Microsoft Corporation, Redmond, WA). The representative scar was outlined, and a full picture of the patient was included to illustrate the normal skin for comparison purposes (Fig. 1).

Fig. 1 e Example of a slide presented to observers. The studied area was outlined, and normal skin was included in the photograph for comparison. If necessary, a zoom-out or a picture of the entire patient was included in the same slide. (Color version of the figure is available online.) 5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:57 pm  ce

196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325

2.2.

Scoring

Five physicians with burns training scored all patient photographs. These observers were provided with a reference chart for the scale (the original or modified scale, see description in the following). The reference chart included sample pictures of each category (Figs. 2 and 3). The observers scored the photographs in two separate sessions (one session for each scale), with the sessions being separated by a 4-mo interval. The observers were blinded to the identity of the patients and time point of the photograph. This, taken with the random order of the slides ensured that any bias in the scoring would be minimized. The slideshow was identical in both sessions. The scores were captured in a database file (Microsoft Office Access 2003; Microsoft Corporation) for further analysis.

2.3.

Scar-scoring scales

The original scale described by Yeong et al. [8] grades four characteristics of the scar as follows: surface, thickness, border height, and color differences. In this scale, each category is assigned a score ranging from 1 to 4, yielding a final total score of 4 to 16 (Fig. 4).

3

We modified the original scale of Yeong et al. [8] to increase reproducibility. This modified scale included characteristics that describe the general appearance of the scars [11]. In this scale, three scar characteristics were assessed as follows: color mismatch, surface appearance, and scar height. Scar thickness in the original scale was omitted because of difficulty in reliably assessing this from only patient photographs. Each category received a score ranging from 1e4. This generated a total possible score of 3e12, which described the general appearance of the scar. The lower the score, the better the overall appearance of the scar and vice versa. Negative numbers on the scale, which also indicated a deviation from normal, were eliminated because negative numbers would decrease the final score and falsely imply a better scar appearance. Finally, the score started at 1 rather than 0 for “normal,” as it was felt that observers may exhibit reluctance in determining any scar to have a score of 0. A brief description of each score is provided in Table 1.

2.4. Validation of scar height and color mismatch scoring Color mismatch and scar height scores were validated by analyzing actual height and color differences. The five

Fig. 2 e Original scale teaching chart. The teaching chart, which was provided to observers, contained sample pictures for every category. (Color version of the figure is available online.) 5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:57 pm  ce

326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390

4

391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

Fig. 3 e Modified scale teaching chart. The teaching chart contained sample photographs for scar surface appearance, scar height, and color mismatch. A brief description of the characteristics of each score was provided. (Color version of the figure is available online.)

5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:58 pm  ce

456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585

5

Fig. 4 e Scar score scale described by Yeong et al. [8]. Four characteristics of the scars are scored from L1 to 4, for a total score ranging from L4 to 16. (Color version of the figure is available online.)

Table 1 e Modified scale. Category Scar surface appearance

Score*

Description

1

Surface appearance: similar to normal skin. Surface appearance: slight mismatch (smoother or rougher than normal skin). Surface appearance: noticeably rougher than normal skin. Shallow Surface appearance: very rough compared with that of normal skin. Deep depressions and irregularities. Loss of normal architecture. No difference. Scar surface at the same plane of the normal skin. Slight difference. Smooth slope at the edge of the scar (positive or negative). Moderate difference. Defined slope at the edge of the scar (positive or negative). Extreme difference. Abrupt dropping at the edge of the scar (positive or negative). Color difference: difficult to distinguish. Color difference: subtle but noticeable (includes differences in pigmentation or erythema). Color difference: moderate color difference. Easy to distinguish (includes differences in pigmentation or erythema). Color difference: major color difference. Prominent mismatch. (Include differences in pigmentation or erythema.)

2

3

4

Scar height

1 2

3

4

Color mismatch

1 2

3

4

*

Each category is assigned a score from 1e4, for a total possible score of 3e12.

observers’ color mismatch ratings were compared with actual color differences in digital photographs obtained using Adobe Photoshop and actual color differences in skin Q4 obtained using a scanning reflectance spectrophotometer with xenon flash lamps. Briefly, color differences in digital photographs were determined by obtaining the red, green, blue or CIELAB (CIE 1976 L*a*b*) values from Photoshop software and then calculating the Euclidean distance with these values (see Supplementary Methods for further description of Euclidean distance). Color differences in skin were also assessed by calculating the Euclidean distance, which was obtained using CIELAB readings from a spectrophotometer. The inter rater reliability between the methods of color mismatch determination was assessed by the intraclass correlation coefficient (ICC) (SPSS; SPSS Inc, Chicago, IL). A more detailed description of the methods used for color mismatch validation can be accessed in the Online Supplement (see Supplementary Methods and Supplementary Figs. 1 and 2).

2.5.

Statistical analysis

Exact agreement between observers was determined using the coefficient of agreement (Po), calculated as the number of exact agreements/number of possible agreements [12]. ICC Model 2 form 1 [13] was used to test the reliability of each category (i.e., scar height, scar surface appearance, and color mismatch). For the mean values, we used Model 2, form 2 of the ICC to test reliability [13] (PASW Statistics 17.0; SPSS Inc). Reliability was calculated with Cronbach alpha (PASW Statistics 17.0; SPSS Inc). Variance and coefficient of variance (CV; CV ¼ standard deviation/mean) were calculated for each photograph to evaluate variance between observers. Analysis of variance was performed by KruskaleWallis one-way analysis of variance (ANOVA) on ranks (SigmaStat version 3.5; Systat Software, Inc Chicago, IL). Values of P 0.7) among all observers was overall disfigurement (clothed). Irregularity, thickness, and discoloration did not achieve this level of reliability in the evaluation. A high reliability (>0.8) could be obtained with any 8 raters randomly selected from the pool of 95 observers, though this was achieved using the

5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:58 pm  ce

716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780

7

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845

Table 2 e Original scale: correlation and variance by category.

Q9

Statistical parameter

Category

Cronbach alpha Intraclass correlation* (95% confidence interval) Po CV  SEM Variance  SEM

Surface

Height

Color

Thickness

0.74 0.32 (0.24e0.42) 0.33 0.67  0.94 1.40  1.28

0.83 0.47 (0.48e0.57) 0.39 0.45  0.31 0.63  0.50

0.75 0.36 (0.28e0.46) 0.26 NCy 0.66  0.44

0.77 0.39 (0.30e0.48) 0.35 0.41  0.21 1.37  1.07

SEM ¼ standard error of the mean. * Based on standardized items. y NC: resulted in error (mean ¼ 0).

SpearmaneBrown formula. Unfortunately, this formula can only be used when the half-tests are classically parallel (the true scores and error variances for the halves should be equal for every population of examinees taking the half-tests) [15]. Crowe et al. [7] scored ten sets of four pictures (different time points) of each patient. Once the pictures of one patient were scored, the set of pictures of the next patient were scored. The observers were aware of this picture arrangement, which could have biased the scoring. The observers consisted of two novices and two experts. However, the ICC was calculated according to expertise, and the authors did not describe the ICC of the four observers together. Beausang et al. [9] analyzed image (photograph) scoring as well as clinical and histologic scar assessment. For the image analysis, ten observers first assessed 22 photographs and then assessed 14 photographs for test-retest analysis. The photograph scoring was not compared with clinical and histological analysis. The overall reliability of image scoring, as judged by the Spearman correlation coefficient, was 0.87. Among all scar scales, that described by Yeong et al. [8] has yielded the highest reliability (ICC of 0.94 for scar surface, 0.95 for border height, 0.90 for thickness, and 0.85 for color differences). For this reason, we chose to base our scale on this scheme. Our scale consisted of three modified versions of characteristics from Yeong et al. scale. Of these, the color mismatch category was used to account for differences in pigmentation (hypo and hyperpigmentation) and the presence of erythema. One of the main problems with some scoring systems is that they assign a specific color (e.g., red, purple) to scars, resulting in high variability. Our color mismatch scoring system avoided this pitfall and required

that scores for differences in pigmentation and erythema be combined. We believe that use of three categories keeps the scoring simple and reliable, but is sufficient to allow one to discriminate between varying degrees of maturation (erythema) and to monitor outcomes of therapy aimed at rectifying hypo and hyperpigmentation. One difficulty in assessing color differences is the diversity of possible skin tones, even within an individual patient. Thus, assessing color mismatch is most appropriately done by comparing the scar with adjacent (or the closest) nonaffected skin. Including nonaffected skin in the photograph or zooming out to include a larger area (even a picture of the entire patient, if necessary) is essential to adequately evaluate the appearance of the patient’s normal skin (Fig. 1). This is particularly useful when the scored scar exhibits mixed hypo and hyperpigmentation areas. Assessing color differences between scars and normal skin is also difficult because of the numerous variables involved in the perception of color. These include, but are not limited to, photographic settings (e.g., camera resolution and source of light) and individual perception. Nevertheless, we have demonstrated that color differences between scarred and normal skin can be reliably assessed in photographs of burn patients, with a correlation of 0.83. The second characteristic in our scale was surface appearance, which is related to the cosmetic appearance of the scar. The presence of smooth skin is non-natural and should be differentiated from normal skin. However, because this distinction is subtle and the scar may have an appearance close to that of normal skin, we decided to score it as 2 instead of 1, as in the original scale. For variables without true values, such as scar surface evaluation, validation requires

Table 3 e Modified scale: correlation and variance by category. Statistical Parameter

Category

Cronbach alpha Intraclass correlation* (95% confidence interval) Po CV  SEM Variance  SEM

Surface appearance

Height

Color

0.88 0.59 (0.52e0.67) 0.52 0.19  0.11 0.32  0.26

0.87 0.56 (0.48e0.64) 0.54 0.20  0.12 0.30  0.24

0.82 0.46 (0.38e0.57) 0.43 0.24  0.10 0.41  0.26

SEM ¼ standard error of the mean. * Based on standardized items.

5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:58 pm  ce

846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910

8

911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

Fig. 6 e Comparison of the CV for each individual photograph in the original and modified scoring systems. (A) Total score. (B) Height. (C) Color mismatch. (D) Surface appearance (modified scale)/surface (original scale). (Color version of the figure is available online.)

assessment of reliability and the correlation between observers to demonstrate a score’s utility. This was the case with assessment of the cosmetic appearance of the scar surface, which had the highest correlation among observers (Table 3). The final characteristic in our scale was scar height, which is usually intended to assess hypertrophy. The only

way to accurately assess hypertrophy (scar height) is to objectively measure the elevation from the normal skin plane. However, we found that when a description of the characteristics of the scar border was provided, the scale proved to be highly reliable in assessing border height. The intention of the score was not to provide a specific height (or

Fig. 7 e Comparison of variance for each individual photograph in the original and modified scoring systems. (A) Total score. (B) Height. (C) Color mismatch. (D) Surface appearance (modified scale)/surface (original scale). (Color version of the figure is available online.) 5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:58 pm  ce

976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105

depression) of the scar in millimeters. Indeed, this would be impossible to determine in a photograph. Instead, our goal was to demonstrate that different levels of depression and elevation of the scar could be reliably scored. Given this goal, the standardization of photography is mandatory. Scar depression can be subtle or prominent and was included in the height category as varying degrees of negative slope (drop of the border). Our goal was to allow analysis of different clinical scenarios (e.g., reepithelialization, tangential, and fascial excision, treatments oriented to decrease hypertrophic scarring). The utility of the modified scale was determined by analyzing reliability, which indicates the quality of the measurements [16]. Several methods can be used to evaluate the reliability. Each provides specific information and has pros and cons. We decided to apply more than one to obtain a comprehensive evaluation of this scoring system. Correlation has limitations in testing reliability because it does not allow for the assessment of more than two observers [12]. Furthermore, correlation does not provide a measure of reliability, only covariance [12]. Therefore, we analyzed the scores of the ten possible pairs of observers (all raters pair-wise) to test Po and correlations. The ICC reflects both degrees of correspondence and agreement among two or more ratings. Consequently, ICC is an extension of the reliability coefficient [12]. A value of 1 reflects perfect agreement (no variance), whereas 0 reflects perfect disagreement. Both the original and modified scales resulted in a good ICC, but the modified scale was closer to 1 than the original scale. Yeong et al. [8] reported an ICC of 0.95 for scar surface, 0.93 for scar thickness, 0.95 for scar border height, and 0.85 for color difference from normal. We were unable to reproduce these values in our study, perhaps because of the larger sample size of this study (120 photographs in this study versus 10 in the Yeong study). For a significance level of 0.95 and a confidence-interval width of 0.2, the sample size necessary to generate the ICCs calculated for the original-scale categories (ICC ¼ 0.32e0.47) was 87e94 samples and that necessary to generate the ICCs of the modifiedscale categories (ICC ¼ 0.46e0.59) was 71e84 samples [17]. For the final scores, the required sample size for the modified scale was 10 samples (ICC ¼ 0.9) and for the original scale was 28 samples (ICC ¼ 0.83). Unfortunately, Yeong et al. [8] did not report the ICC for the final score. Kappa statistics is a chance-corrected analysis of exact agreement between scores. It provides a general idea of agreement. However, it does not provide a measure for “close agreement” and gives no value to scores that remain close over several events [13], making it inadequate for assessing a scar scale score. We decided to analyze CV to determine how close or different the values between observers were using both scales. Because of the fact that the original scale included negative numbers, the scores were equal to zero in some cases. This made it impossible to calculate the CV. The presence of negative numbers also consistently affected the CV. Therefore, we analyzed variance using KruskaleWallis oneway ANOVA by ranks, as previously described. Repeated measures of ANOVA are needed for assessing the variance between scores (raters) in this model [12]. Both analyses detected significant differences among observers for the

9

original scale, and the variance in scores was significantly higher for the original scale than for the modified scale. No significant difference between observers was detected for the modified scale.

5.

Conclusions

Q5

In summary, this modified scar scale score can be used to evaluate photographs of severely burned patients to assess the severity of post-burn scarring, either prospectively or retrospectively. Use of this method requires standard photographic conditions and a standard comparison chart (Fig. 3). By adhering to these guidelines, investigators can reliably evaluate hypertrophic scarring progression over time. We have used one of the largest sample sizes of scored photographs (compared with studies in the literature) and comprehensive statistical analysis to show that this new, modified scale is valid and reliable. It is important to note that comparison of this new scoring system with other clinicbased scoring systems (e.g., the Vancouver scar scale) is not feasible because some parameters such as pliability are impossible to assess in photographs. Ultimately, these clinicbased scoring systems were not designed for photograph analysis. Our newly described modifications provide a useful tool to investigators interested in assessing hypertrophic scarring.

Acknowledgment The authors thank Dr Kasie Cole for editing and proofreading this article. This study was supported by grants from Shriners Hospitals for Children (80100, 80480, 71008, 8660, 8740, 71001, 8760, and 9145), National Institutes of Health (R01-GM56687, R01-GM56687-11S1, R01-GM087285, T32-GM008256, and P50GM60338), the National Institute on Disability and Rehabilitation Research (H133A020102), The Canadian Institutes of Health Research (#123336), the CFI Leader’s Opportunity Fund (#25407), the Physicians’ Services Incorporated Q6 FoundationdHealth Research Grant Program, and the Wound Q7 Healing Foundation. CCF is an ITS Career Development Scholar supported, in part, by NIH KL2RR029875 and NIH UL1RR029876. This work was also supported in part by a Clinical and Translational Science Award from NCATS (UL1TR000071). Q8 Authors’ contributions: C.C.F., D.N.H., L.K.B., R.K., S.A.M., H.G.R., N.R.-E., and M.G.J. made substantial contributions to the conception or design of the work. G.A.M., A.M.A., S.H., F.N.W., S.A.M., H.G.R., and N.R.-E. contributed to the acquisition, analysis, or interpretation of data for the work. All authors did the drafting of the work or revising it critically for important intellectual content.

Disclosure None declared.

5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:58 pm  ce

1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170

10

1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201

j o u r n a l o f s u r g i c a l r e s e a r c h x x x ( 2 0 1 4 ) 1 e1 0

Supplementary data Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jss.2014.10.055.

references

[1] Desai MH, Herndon DN, Broemeling L, Barrow RE, Nichols RJ Jr, Rutan RL. Early burn wound excision significantly reduces blood loss. Ann Surg 1990; 211:753. [2] Herndon DN, Barrow RE, Rutan RL, Rutan TC, Desai MH, Abston S. A comparison of conservative versus early excision. Therapies in severely burned patients. Ann Surg 1989;209:547. [3] Oliveira GV, Chinkes D, Mitchell C, Oliveras G, Hawkins HK, Herndon DN. Objective assessment of burn scar vascularity, erythema, pliability, thickness, and planimetry. Dermatol Surg 2005;31:48. [4] Lau JC, Li-Tsang CW, Zheng YP. Application of tissue ultrasound palpation system (TUPS) in objective scar evaluation. Burns 2005;31:445. [5] Katz SM, Frank DH, Leopold GR, Wachtel TL. Objective measurement of hypertrophic burn scar: a preliminary study of tonometry and ultrasonography. Ann Plast Surg 1985;14:121. [6] Sullivan T, Smith J, Kermode J, McIver E, Courtemanche DJ. Rating the burn scar. J Burn Care Rehabil 1990;11:256.

[7] Crowe JM, Simpson K, Johnson W, Allen J. Reliability of photographic analysis in determining change in scar appearance. J Burn Care Rehabil 1998;19:183. [8] Yeong EK, Mann R, Engrav LH, et al. Improved burn scar assessment with use of a new scar-rating scale. J Burn Care Rehabil 1997;18:352. [9] Floyd H, Dunn KW, Orton CI, Ferguson MW. A new quantitative scale for clinical scar assessment. Plast Reconstr Surg 1998;102:1954. [10] Micomonaco DC, Fung K, Mount G, et al. Development of a new visual analogue scale for the assessment of area scars. J Otolaryngol Head Neck Surg 2009;38:77. [11] Masters M, McMahon M, Svens B. Reliability testing of a new scar assessment tool, Matching Assessment of Scars and Photographs (MAPS). J Burn Care Rehabil 2005;26:273. [12] Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River, N.J: Pearson/ Prentice Hall; 2009. [13] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420. [14] Smith GM, Tompkins DM, Bigelow ME, Antoon AY. Burninduced cosmetic disfigurement: can it be measured reliably? J Burn Care Rehabil 1988;9:371. [15] Charter RA. It is time to bury the Spearman-Brown “Prophecy” formula for some common applications. Educ Psychol Meas 2001;61:690. [16] Trochim WM. Research methods knowledge base. Ohio: Atomic Dog Publishing; 2006. [17] Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med 2002; 21:1331.

5.2.0 DTD  YJSRE13012_proof  21 November 2014  5:58 pm  ce

1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232

Reliable scar scoring system to assess photographs of burn patients.

Several scar-scoring scales exist to clinically monitor burn scar development and maturation. Although scoring scars through direct clinical examinati...
3MB Sizes 2 Downloads 8 Views