The Knee xxx (2015) xxx–xxx


The Knee

Assessing participation in the ACL injured population: Selecting a patient reported outcome measure on the basis of measurement properties

Robert Letchford a,c,⁎, Valerie Sparkes a,b, Robert W.M. van Deursen a,b

a School of Healthcare Sciences, Cardiff University, Second Floor, Cardigan House, Heath Park Campus, Cardiff CF14 4XN, United Kingdom
b Arthritis Research UK Biomechanics and Bioengineering Centre, Cardiff University, Cardiff, United Kingdom
c Aneurin Bevan Health Board, Physiotherapy Department, Royal Gwent Hospital, Cardiff Road, Newport NP20 2UB, Gwent, United Kingdom

Article info

Article history: Received 30 October 2014; received in revised form 19 January 2015; accepted 21 January 2015; available online xxxx

Keywords: ACL reconstruction; Activity; Participation; Patient reported outcome measure; Measurement properties

Abstract

Background/aim: A return to pre injury activity participation remains a common but often elusive goal following ACL injury. Investigations to improve our understanding of participation restrictions are limited by inconsistent use of insufficiently investigated measurement tools. The aim of this study was to follow the consensus based standards for the selection of health measurement instruments (COSMIN) guideline to provide a comparative evaluation of four patient reported outcome measures (PROMs) on the basis of measurement properties. This will inform recommendations for measuring participation of ACL injured subjects, particularly in the United Kingdom (UK) National Health Service (NHS).

Methods: Thirteen criteria were compiled from the COSMIN guideline. These included reliability, measurement error, content validity, construct validity, responsiveness and interpretability. Data from 51 subjects, collected as part of a longitudinal observational study of recovery over the first year following ACL reconstruction (ACLR), were used in the analysis.

Results: Of the thirteen criteria, the required standard was met in 11 for the Tegner, 11 for the International Knee Documentation Committee (IKDC), 6 for the Cincinnati Sports Activity Scale (CSAS) and 6 for the Marx. The two weaknesses identified for the Tegner are more easily compensated for during interpretation than those of the IKDC; for this reason the Tegner is the recommended PROM.

Conclusions: The Tegner activity rating scale performed consistently well in respect of all measurement properties in this sample, with clear benefits over the other PROMs. The measurement properties presented should be used to inform implementation and interpretation of this outcome measure in clinical practice and research.

Level of evidence: Level II prospective study.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Short term success of interventions for the anterior cruciate ligament (ACL) injured knee has been defined by a symptom free return to participation in the individual's chosen activities [1–3]. However, recent publications of rehabilitative [1] and surgical [2,3] interventions have demonstrated lower rates of success than previously expected [4]. Whilst a multifactorial interaction between physical, physiological, psychological and social factors has been proposed [4], investigations of these factors are limited by inconsistencies in the measurement of participation outcomes [2,5]. No gold standard measure of participation exists; however, patient reported outcome measures (PROMs) have become widely accepted in the literature. A recent systematic review demonstrated that PROMs are inconsistently adopted and that the four most commonly reported (Tegner, Cincinnati sports activity scale (CSAS), Marx and International knee documentation committee (IKDC)) lack a comprehensive exploration of measurement properties [5]. There has been considerable debate regarding terminology and methodology for the assessment of measurement properties of PROMs [6,7]. The COSMIN group (Consensus based standards for the selection of health measurement instruments) have published an international consensus guideline that goes a significant way to resolving this debate and offers a framework for the conduct and reporting of such studies [8]. This study therefore aimed, following the COSMIN guideline, to provide a comparative evaluation of the measurement properties of these four PROMs. This will inform recommendations for participation PROMs for ACL injured subjects, particularly in the United Kingdom (UK) National Health Service (NHS).

2. Materials and methods

2.1. The four patient reported outcome measures (PROMs)

The Tegner activity rating scale [9] is a single item with 11 responses ranked by activity type and intensity on an ordinal scale between 0 and 10. Investigations of its measurement properties [9–13] have reported adequate evidence for test–retest reliability, measurement error and known groups validity [5]. The Cincinnati Sports Activity Scale (CSAS) [14,15] is a single item with 12 responses ranked by activity type and frequency on an ordinal scale between 0 and 100. There is adequate evidence only for its reliability [5]. The Marx activity rating scale [16] has four items, each with five responses ranked by frequency on an ordinal scale between 0 and 4; the item scores are summed to a maximum score of 16. There is adequate evidence for its reliability and convergent/divergent validity [5]. The International Knee Documentation Committee (IKDC) includes participation on both the knee evaluation [17] and the subjective knee form [18]. It is a single item with five responses ranked between 1 and 5 by the intensity and type of activity. No studies of the measurement properties of the activity rating section were identified [5].

⁎ Corresponding author at: Floor 2 Cardigan House, School of Healthcare Sciences, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom. Tel.: +44 7973661467. E-mail address: [email protected] (R. Letchford).

http://dx.doi.org/10.1016/j.knee.2015.01.010
0968-0160/© 2015 Elsevier B.V. All rights reserved.

Please cite this article as: Letchford R, et al, Assessing participation in the ACL injured population: Selecting a patient reported outcome measure on the basis of measurement pr..., Knee (2015), http://dx.doi.org/10.1016/j.knee.2015.01.010

2.2. Data collection

Data were collected as part of a prospective longitudinal observational study investigating recovery following primary hamstring autograft ACL reconstruction. All patients attending our unit for this procedure between January 2011 and July 2013 were invited to participate. Subjects were excluded only if additional surgical procedures that altered the standard rehabilitation programme were performed (e.g. microfracture). Data were collected prior to surgery and 1, 2, 3, 6 and 12 months following surgery. Pre operative data were collected on average 25 (standard deviation, SD = 34) days before surgery, which was on average 19 (SD = 17) months following injury. Retrospective measures of pre injury participation were also collected at the pre operative visit; however, for 22 subjects these were delayed because the IKDC was included in the study later. All four participation PROMs were provided in the original published format [9,14,16,18].
Additional data were collected simultaneously at each visit for use in the validity and interpretability analyses: measures of knee function (the Lysholm knee scale and the IKDC Subjective Knee Form, IKDC SKF), a Visual Analogue Scale (VAS) for pain, and a global rating of change score (GRCS) for participation. Ethical approval was received from the South Wales Research Ethics Committee (Reg: 10/WSE04/48). Where missing data occurred, the Missing Completely at Random (MCAR) assumption was assessed using Little's MCAR test [21], and differences in baseline participation and demographics between subjects with and without missing data were explored [22]. Listwise deletion was used when the MCAR assumption was supported [22,23].

2.3. Measurement properties

The definitions and methodological guidelines established by the COSMIN group [8,19] were fully adopted. These are summarised below.

2.3.1. Reliability and measurement error

Test–retest reliability was calculated from repeated measures in a convenience sample of 35 subjects (on the basis of the COSMIN recommendations) [19], who completed each PROM at consultation, then repeated them two and four days later [20] and returned them in a sealed envelope. Intraclass correlation coefficients (ICC) were considered acceptable when values were >0.8 for group and >0.9 for individual analysis [19,24,25]. Measurement error was calculated from the same repeated measures data and was considered acceptable when the smallest detectable change (SDC) for both individual and group analysis [19,26,27] was lower than one category of change. Because of differences in the PROMs' scoring structures, this was a one point change on the Tegner, Marx and IKDC, but 5 points on the CSAS.

2.3.2. Validity and responsiveness

The relevance of items and responses to the construct, population and purpose of the instrument (content validity) was assessed by cross matching the items on each PROM to the participation domain of the World Health Organisation International Classification of Functioning, Disability and Health (WHO ICF) [28], published national sports participation data [29] and ACL injury risk data [30]. The PROM with the greatest number of ICF domains and qualifiers, and the greatest number of high risk and high frequency participation sports, was considered to best represent the diverse ACL injured population and was therefore preferred. Questionnaire development (item generation and reduction) was previously assessed by our systematic review [5] and was therefore not further investigated in this study; only the Marx scale was considered adequate on this criterion [5]. Since there is no gold standard measure for participation, construct rather than criterion methods were applied to assess validity and responsiveness [8]. Hypothesis testing of relationships between known groups (healthy, ACL injured and ACL reconstructed), convergent (the four participation PROMs) and divergent (knee function and pain) constructs was used to define validity. The direction and magnitude of the change in scores over time, as subjects passed from healthy to injured and reconstructed, were used to define responsiveness. The following hypotheses were generated. For convergent validity: since all four PROMs measure the same construct, Hypothesis one stated that they would correlate highly (r > 0.7). For divergent validity: whilst function is considered a primary limiter of participation [2,3,31,32], the 'knee abuser' is known to continue to participate despite impairments [14]. Hypothesis two therefore stated that there would be a moderate (0.4–0.7) correlation between function and participation. Participation is known to reduce with age [3,33], and Hypothesis three stated a moderate (0.4–0.7) inverse correlation with age.
In an active population there is no reason to expect BMI to influence participation, and Hypothesis four stated a low correlation (r < 0.4) with BMI. Correlations were calculated using Spearman's rank correlation coefficient (r) and interpreted using the categorisation of Dancey and Reidy (2004) [34]: coefficients between 0.7 and 0.9 are considered strong, 0.4 to 0.6 moderate and 0.1 to 0.3 weak. For known groups validity and responsiveness: participation is known to be restricted following ACL injury and is expected to improve, but not resolve, 12 months after surgery [2,3]. Since most rehabilitation schedules consider return to pre injury activities after six months [40], change in participation between six and 12 months is expected to be greater than between injury and six months. Therefore Hypothesis five, for validity, stated that pre operative scores would be lowest, becoming sequentially greater at six months and 12 months, and highest before injury. Hypothesis six, for responsiveness, stated that change scores between pre-injury and pre-operative would be negative and larger than the positive changes occurring between pre-operative and six months or between six and 12 months. These differences were tested using Friedman's ANOVA with post hoc Wilcoxon signed rank tests and Bonferroni correction. Effect size (r) was calculated from the Z statistic and interpreted following the categorisation of Cohen (1998): >0.5 is considered large, 0.3–0.5 moderate and <0.3 small. Adequate known groups validity and responsiveness were achieved if hypotheses were confirmed with an appropriate effect size.

2.3.3. Interpretability

Floor and ceiling effects were considered significant when >15% of scores were located within the lowest or highest category of the scale; the absence of significant effects is preferred [36]. Minimally important change (MIC) is the subject of considerable debate, with both distribution and anchor based methods described [7,37].
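The floor and ceiling criterion above reduces to counting the share of scores that fall in the extreme categories. A minimal Python sketch, assuming the scores for one PROM at one time point are held in a plain list and that the lowest and highest possible categories are passed in explicitly (the function and argument names are illustrative, not taken from the study):

```python
from collections.abc import Sequence

# Criterion from Section 2.3.3: an effect is significant when >15% of scores
# fall in the lowest (floor) or highest (ceiling) category of the scale.
THRESHOLD = 0.15


def floor_ceiling_effects(
    scores: Sequence[float],
    scale_min: float,
    scale_max: float,
    threshold: float = THRESHOLD,
) -> dict[str, object]:
    """Share of scores at each numerical end of the scale, flagged against the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == scale_min) / n
    ceiling_share = sum(1 for s in scores if s == scale_max) / n
    return {
        "floor": floor_share,
        "ceiling": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical Tegner (0-10) scores, for illustration only.
print(floor_ceiling_effects([3, 4, 4, 5, 6, 7, 7, 8, 9, 10], scale_min=0, scale_max=10))
```

The labels refer to the numerical ends of the scale; for an instrument on which a lower number means a higher activity level, the clinical meaning of "floor" and "ceiling" is swapped.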
The visual anchor based MIC distribution provides an appropriate compromise, fulfilling the requirements of both methods [7]. The 'minimally changed' and 'much changed' levels of the global rating of change score (GRCS) were used to define important change. Receiver operating characteristic (ROC) curves were used to define the optimal cut off point for both directions of change at the 'minimally' and 'much' changed levels of the GRCS [38]. The area under the curve (AUC) was used as a summary statistic of sensitivity and specificity [19,39]. For adequate interpretability the MIC value should be greater than the SDC [19].

3. Results

Sample characteristics are presented in Table 1. There were 34 (11.33%) missing data observations. Little's MCAR test was not significant (χ2 = 103.116, P = 0.267), and there were no significant (P > 0.05) differences between subjects with and without missing data for any of the demographic or participation variables at baseline. The MCAR assumption was therefore supported, and analysis was completed using listwise deletion with a sample of 51 subjects.

3.1. Reliability

Reliability and measurement error are presented in Table 2. All PROMs met the ICC requirement for groups (>0.80); however, only the Tegner and Marx satisfied the requirement for individuals (>0.90). The standard error of measurement (SEM) was within one category for both the Tegner and IKDC, but greater for the CSAS and Marx. The IKDC was the only PROM with an SDC of less than one category for individuals, whereas the IKDC, Tegner and CSAS all met this criterion for groups. The Marx failed the SDC criterion in both situations.

3.2. Validity and responsiveness

Content validity data are displayed in Tables 3 and 4. All items belonged to the ICF domains relating to physical activity. They were, however, represented differently across the PROMs: the CSAS and IKDC represent four domains, the Tegner three and the Marx two. The number of qualifiers also differed, with the fewest on the Marx. All but one of the activities on the scales were included in the Sports Wales participation data, and the sports with the highest risk of injury were all included on the scales. Hypotheses one to four were supported (Table 5) for statistical significance but not for strength of correlation. All PROMs correlated significantly, but the coefficients were generally lower than hypothesised; only the Tegner and IKDC reached a strong correlation with each other.
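Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank; ties are common on these short ordinal scales. The sketch below is a plain-Python illustration, not the software used in the study. The labelling function applies the Dancey and Reidy bands quoted in Section 2.3.2, and closing the gaps between those bands (for example 0.6 to 0.7) by letting each band run from its lower bound up to the next is an assumption.

```python
import math
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


def strength_label(r: float) -> str:
    """Dancey and Reidy style label; gaps between published bands closed upwards (assumption)."""
    a = abs(r)
    if a >= 0.7:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"


# Hypothetical paired Tegner and Marx scores, for illustration only.
tegner = [3, 4, 4, 6, 7, 7, 9]
marx = [2, 5, 4, 8, 10, 12, 14]
r = spearman_r(tegner, marx)
print(f"r = {r:.3f} ({strength_label(r)})")
```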
The Tegner and IKDC also showed higher correlations than hypothesised with the IKDC SKF, and for all PROMs the correlation with the IKDC SKF was stronger than that with the Lysholm scale. Hypotheses five and six were supported by significant differences of appropriate direction and magnitude across the study (Table 6). For validity, the differences were large over the first six months; after six months this remained the case for the Tegner and IKDC, whereas the CSAS and Marx were both reduced to moderate effects. For responsiveness, change scores were large between pre injury, pre-operative and six months. However, between six and 12 months the Tegner and IKDC showed a trend towards smaller increases, whilst the CSAS and Marx showed significant and moderate reductions in change score.

3.3. Interpretability

The Tegner was the only PROM with no floor or ceiling effects (Table 7). The CSAS and IKDC had ceiling effects in the healthy state and at 12 months post op, and the Marx scale had a significant floor effect following surgery. The visual anchor based MIC distribution graphs (Fig. 1) demonstrated that all four PROMs have overlap in the distribution of scores for the groups classified as better and worse by the GRCS. The optimal MIC cut off scores (Table 8) had good sensitivity and specificity (AUC > 0.8) for deterioration on all PROMs; this was, however, lower (AUC < 0.7) for improvements. The Marx and IKDC had the lowest sensitivity, below 0.5 for minimal improvement. The MIC was the same for improvement and deterioration for the Tegner, CSAS and IKDC, where a change of one point in either direction was considered important. For the Marx, however, the MIC was greater for deterioration than for improvement. The relationship between MIC and SDC was appropriate for the IKDC; the Tegner MIC exceeded the SEM but not the SDC, whilst the CSAS and Marx had MIC values lower than both the SEM and SDC. Performance of each tool against the 14 criteria is presented in Table 9.
The required standard was met by the Tegner in 11, IKDC in 11, CSAS in 6 and Marx in 6 (Table 9).
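The ROC-based MIC described in Section 2.3.3 can be sketched as follows. The paper does not state how the "optimal" cut off was chosen; this sketch assumes Youden's index (sensitivity + specificity − 1) and computes the AUC with the Mann–Whitney formulation, counting ties as one half. The change scores and GRCS classifications in the usage lines are hypothetical.

```python
from collections.abc import Sequence


def _split(scores: Sequence[float], changed: Sequence[bool]) -> tuple[list[float], list[float]]:
    """Separate scores of subjects classified as changed / not changed by the anchor."""
    if len(scores) != len(changed):
        raise ValueError("scores and anchor classifications must be paired")
    pos = [s for s, c in zip(scores, changed) if c]
    neg = [s for s, c in zip(scores, changed) if not c]
    if not pos or not neg:
        raise ValueError("need subjects both with and without important change on the anchor")
    return pos, neg


def roc_points(scores: Sequence[float], changed: Sequence[bool]) -> list[tuple[float, float, float]]:
    """(cut off, sensitivity, specificity), calling 'changed' when score >= cut off."""
    pos, neg = _split(scores, changed)
    points = []
    for cut in sorted(set(scores)):
        sensitivity = sum(s >= cut for s in pos) / len(pos)
        specificity = sum(s < cut for s in neg) / len(neg)
        points.append((cut, sensitivity, specificity))
    return points


def auc(scores: Sequence[float], changed: Sequence[bool]) -> float:
    """Area under the ROC curve as P(changed score > unchanged score), ties counted as 1/2."""
    pos, neg = _split(scores, changed)
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], changed: Sequence[bool]) -> tuple[float, float, float]:
    """Cut off maximising sensitivity + specificity - 1; the lowest cut off wins ties."""
    return max(roc_points(scores, changed), key=lambda p: p[1] + p[2] - 1)


# Hypothetical change scores and GRCS 'minimally improved' classifications.
change = [0, 0, 0, 1, 1, 1, 2, 2]
improved = [False, False, False, False, True, True, True, True]
print(youden_cutoff(change, improved), auc(change, improved))
```

For deterioration, the negated change scores can be passed in so that larger values again point in the direction of interest, with the resulting cut off reported with its sign reversed. The MIC found this way is then compared with the SDC, following the criterion that the MIC should exceed it.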


Table 2 Reliability and measurement error. Intraclass correlation coefficients (ICC) for individual (Ind) and group (Grp) comparisons, standard error of measurement (SEM) and smallest detectable change (SDC) are presented. Greyscale indicates where the standard is not met.

| PROM | Test–retest ICC (Ind) | Test–retest ICC (Grp) | Measurement error SEM | SDC (Ind) | SDC (Grp) |
|---|---|---|---|---|---|
| Tegner | 0.92 | 0.92 | 0.63 | 1.75 | 0.29 |
| CSAS | 0.82 | 0.82 | 9.60 | 26.59 | 4.49 |
| Marx | 0.92 | 0.92 | 1.23 | 3.41 | 1.60 |
| IKDC | 0.86 | 0.86 | 0.29 | 0.80 | 0.14 |
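The quantities in Table 2 are linked by standard formulas. The paper does not state which ICC model or SEM formula was used, so the sketch below assumes a two-way random effects, absolute agreement, single measure ICC (Shrout and Fleiss ICC(2,1)), SEM = SD × √(1 − ICC), SDC (Ind) = 1.96 × √2 × SEM and SDC (Grp) = SDC (Ind)/√n. Applying the SDC (Ind) formula to the published SEM values gives the Tegner, Marx and IKDC entries of Table 2 at two decimals and the CSAS entry to within 0.02; the group formula is shown for completeness only, because it does not reproduce every SDC (Grp) entry and the study's group calculation may have differed.

```python
import math
from collections.abc import Sequence

Z_95 = 1.96  # two-sided 95% normal quantile


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` is an n-subjects x k-occasions table of scores for one PROM.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a rectangular table with at least 2 subjects and 2 occasions")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def sdc_individual(sem: float) -> float:
    """SDC for one person: sqrt(2) because a change involves two measurements."""
    return Z_95 * math.sqrt(2.0) * sem


def sdc_group(sem: float, n: int) -> float:
    """SDC for the mean change of a group of n subjects."""
    return sdc_individual(sem) / math.sqrt(n)


# SEM values as published in Table 2; one category = 1 point (5 points on the CSAS).
TABLE2_SEM = {"Tegner": 0.63, "CSAS": 9.60, "Marx": 1.23, "IKDC": 0.29}
ONE_CATEGORY = {"Tegner": 1, "CSAS": 5, "Marx": 1, "IKDC": 1}

for prom, sem in TABLE2_SEM.items():
    sdc = sdc_individual(sem)
    print(f"{prom}: SDC (Ind) = {sdc:.2f}; below one category: {sdc < ONE_CATEGORY[prom]}")
```

The loop mirrors the measurement error criterion of Section 2.3.1: only the IKDC has an individual SDC below one category.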

4. Discussion

The main finding of this study is that both the Tegner and IKDC performed consistently better than the CSAS and Marx across the 14 criteria. Both the Tegner and IKDC passed 11 criteria; however, there were differences upon which a preferred PROM might be selected. Performance of each PROM against each criterion will be discussed before final recommendations are made. The high levels of reliability identified in this sample were consistent with the literature [9–12,15,16]. However, only the Tegner and Marx met the standard for both group and individual analyses. Measurement error is rarely reported in the literature, but in this sample it was a criterion on which the PROMs differed: only the IKDC met the standard for both group and individual analysis. The Tegner is therefore the more reliable PROM, but whilst a 1 point change exceeds measurement error in group analysis, a change of 2 points is required within individuals to be confident of exceeding measurement error. Assessment of content validity confirmed that all of the PROMs measure the participation construct and are suitable for the local ACL injured population. However, differences in the distribution of ICF domains and qualifiers indicate clear differences in the way in which each PROM measures the participation construct. The greater number of ICF domains and qualifiers indicates that the Tegner, CSAS and IKDC offer a broader representation of participation that should be preferred for this population. The Marx is more specific, assessing high level sports activities, and should be used in populations with these specific demands, as was the author's intention [16]. The convergent/divergent hypotheses were supported with just two exceptions. Whilst correlations between PROMs were lower than hypothesised, they are similar to those in the literature (r = 0.66–0.67) [16].
This is not surprising since the content analysis demonstrated differences in the way the construct is measured. In fact, the PROMs with similar content also show the highest correlations. There was also higher correlation to the IKDC SKF than the Lysholm, which is likely to

Table 1 Sample characteristics. Baseline characteristics and pre injury participation scores are presented for the study sample. Subjects with complete data are included in the analysis. There are no significant differences between the complete and incomplete groups at baseline. The test retest sample was selected by convenience to complete the reliability data collection. There were no statistically significant (P < 0.05) differences between samples. Baseline characteristics are mean (standard deviation); pre injury participation scores are median (range).

| Sample | Gender | Age (years) | Height (cm) | Weight (kg) | Time to surgery (months) | Tegner (0–10) | CSAS (0–100) | Marx (0–16) | IKDC (1–4) |
|---|---|---|---|---|---|---|---|---|---|
| Study cohort (n = 75) | 11 F / 64 M | 30.2 (8.8) | 175.8 (7.3) | 85.8 (17.2) | 24.0 (41.8) | 7 (3–10) | 100 (40–100) | 12 (0–16) | 1 (1–3) |
| Complete data (n = 51) | 8 F / 43 M | 30.9 (9.8) | 175.7 (11.4) | 86.7 (82.7) | 23.2 (39.3) | 7 (4–10) | 100 (40–100) | 12 (0–16) | 1 (1–3) |
| Missing data (n = 24) | 3 F / 21 M | 28.8 (6.2) | 176.2 (6.4) | 83.8 (13.8) | 27.0 (47.3) | 8 (3–9) | 97.5 (40–100) | 12 (0–16) | 1 (1) |
| Test retest (n = 35) | 7 F / 28 M | 30.2 (8.3) | 176.6 (9.0) | 83.36 (15.9) | 20.2 (16.4) | 7 (5–9) | 95 (40–100) | 12 (0–16) | 1 (1–3) |


Table 3 Content validity—cross matching to the participation component of the WHO ICF [28]. The relation between items on the PROMs and the five domains of the physical activity participation component of the WHO ICF [28] are presented. Representation by a greater number of domains and qualifiers (indicated by Y) is considered more appropriate; the weakest on this basis is highlighted in greyscale.

WHO ICF

PROM

Domain Mobility

4

Qualifier

Tegner

Walk Walk (uneven) Run Jump Swim Cycle Specified

Y Y Y Y Y Y

Marx

CSAS

IKDC Y

Y Y y Y Pivot Cut Turning Twisting

5

Self care

ADL

Y

6

Domestic life

ADL Yard work House work

Y

Y

Y Y

Pivot Cut Decelerate

Pivot

Y Y Y

8

Major life areas

Employment

Y

9

Community social and civic life

Sports

Y

Y

Y

Y

Total domains

3

4

2

4

Total qualifiers

8

11

5

8

Table 4 Content validity: cross matching to Sports Wales participation [29] and injury risk [30] data. The relation between activities explicitly included on each of the PROMs (indicated by a Y), local participation rates and published ACL injury risk data are presented. Participation is the percentage of the Welsh population (M = male and F = female) reporting regular participation in the activity to the Sports Wales active adults survey 2008/9; injury risk data is the percentage injury rate per 1000 exposures reported in Renstrom et al. (2008). Greater coverage of high risk and high participation activities is considered more appropriate; the weakest on this basis are highlighted in greyscale.

Item

Participation (%)

Injury risk

M

F

Total

M

PROM F

Tegner

CSAS

32.7

34.8

33.8

Y

10.3

12.5

11.4

Y

Y

Cycling

11.0

4.3

7.5

Y

Y

Y

Y

Y

Y

Soccer

12.9

1.0

6.7

3.3 7.5

1.4 3.0

2.3 5.2

1.3

3.7

Y Y

5.6

0.6

3.0

4.6

0.1

2.3

Badminton

1.7

0.9

1.3

Y

Y

Y

Tennis

1.4

0.8

1.1

Y

Y

Y

Squash

1.1

0.2

0.6

Y

Y

Y

Basketball

1.0

0.3

0.6

Y

Y

Athletics

0.7

0.3

0.5

0.7

0

0.4

Hockey

0.3

0.2

0.3

1.4

4.9

Y

Y

Y Y Y

1.6

Volleyball

0.2

0.3

0.3

2.0

Gymnastics

0.2

0.3

0.3

4.9

Skiing

0.3

0.1

0.2

Baseball

0.2

0.1

0.1

0.7

Wrestling

0.1

0.1

0.1

1.5

Ice Hockey

0.1

0

0

1.2

Handball

0.1

0

0

0

0

0

Bandy

Y

Y

Golf Rugby

Motocross

IKDC Y

Walking Swimming

Running Jogging

Marx

Y Y

Y

Y

Y

Y

5

6

Y

2.4 Y 0.7

Y

Y

Y Y Y

TOTAL

17

15


Table 5 Construct validity: hypothesis testing of convergent and divergent constructs. Correlation coefficients (r) for the convergent and divergent hypotheses are presented with 1 tailed significance indicated by ⁎ at P < 0.05 and ⁎⁎ at P < 0.01. Results which are not consistent with a priori hypotheses are highlighted in greyscale.

| Validity | Hypothesis | Measure | Tegner | CSAS | Marx | IKDC |
|---|---|---|---|---|---|---|
| Convergent | 1 (participation) | Tegner | | 0.628⁎⁎ | 0.686⁎⁎ | −0.719⁎⁎ |
| | | CSAS | 0.628⁎⁎ | | 0.690⁎⁎ | −0.528⁎⁎ |
| | | Marx | 0.686⁎⁎ | 0.690⁎⁎ | | −0.640⁎⁎ |
| | | IKDC | −0.719⁎⁎ | −0.528⁎⁎ | −0.640⁎⁎ | |
| Divergent | 2 (function) | Lysholm | 0.612⁎⁎ | 0.478⁎⁎ | 0.548⁎⁎ | −0.567⁎⁎ |
| | | IKDC SKF | 0.700⁎⁎ | 0.573⁎⁎ | 0.684⁎⁎ | −0.710⁎⁎ |
| | | Pain VAS | −0.436⁎⁎ | −0.335⁎⁎ | −0.328⁎⁎ | 0.400⁎⁎ |
| | 3 (demographic) | Age | −0.251⁎ | −0.421⁎⁎ | −0.491⁎⁎ | 0.396⁎⁎ |
| | 4 (demographic) | BMI | 0.060 | −0.125 | 0.150 | 0.223 |

Abbreviations: Lysholm knee rating scale (Lysholm), IKDC subjective knee form (IKDC SKF), visual analogue scale for pain (Pain VAS), and body mass index (BMI).
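The a priori bands of Section 2.3.2 can be applied mechanically to the coefficients in Table 5. The sketch below encodes a subset of the tabulated values (Pain VAS is omitted). Two choices are assumptions rather than statements from the paper: the moderate band is treated as 0.4 ≤ |r| < 0.7, which makes r = 0.700 "higher than hypothesised" as reported in Section 3.2, and the IKDC sign is flipped for the direction check because its correlations with the other PROMs in Table 5 are negative, suggesting reversed scoring.

```python
# Correlations transcribed from Table 5 (Spearman's r).
CONVERGENT = {
    ("Tegner", "CSAS"): 0.628, ("Tegner", "Marx"): 0.686, ("Tegner", "IKDC"): -0.719,
    ("CSAS", "Marx"): 0.690, ("CSAS", "IKDC"): -0.528, ("Marx", "IKDC"): -0.640,
}
FUNCTION = {  # Hypothesis 2
    "Lysholm": {"Tegner": 0.612, "CSAS": 0.478, "Marx": 0.548, "IKDC": -0.567},
    "IKDC SKF": {"Tegner": 0.700, "CSAS": 0.573, "Marx": 0.684, "IKDC": -0.710},
}
AGE = {"Tegner": -0.251, "CSAS": -0.421, "Marx": -0.491, "IKDC": 0.396}  # Hypothesis 3
BMI = {"Tegner": 0.060, "CSAS": -0.125, "Marx": 0.150, "IKDC": 0.223}  # Hypothesis 4

# Assumed scoring direction: IKDC activity scores appear reversed relative to the others.
ORIENTATION = {"Tegner": 1, "CSAS": 1, "Marx": 1, "IKDC": -1}


def meets_band(hypothesis: int, r: float) -> bool:
    """Check |r| against the a priori band for hypotheses 1-4."""
    a = abs(r)
    if hypothesis == 1:
        return a > 0.7
    if hypothesis in (2, 3):
        return 0.4 <= a < 0.7
    if hypothesis == 4:
        return a < 0.4
    raise ValueError(f"unknown hypothesis {hypothesis}")


def meets_h3(prom: str, r: float) -> bool:
    """Hypothesis 3 also requires an inverse relation once scoring direction is aligned."""
    return r * ORIENTATION[prom] < 0 and meets_band(3, r)


for (a, b), r in CONVERGENT.items():
    print(f"H1 {a}-{b}: {'met' if meets_band(1, r) else 'not met'}")
for measure, row in FUNCTION.items():
    for prom, r in row.items():
        print(f"H2 {measure}/{prom}: {'met' if meets_band(2, r) else 'not met'}")
for prom, r in AGE.items():
    print(f"H3 age/{prom}: {'met' if meets_h3(prom, r) else 'not met'}")
for prom, r in BMI.items():
    print(f"H4 BMI/{prom}: {'met' if meets_band(4, r) else 'not met'}")
```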

reflect the dependence of many of the IKDC SKF response categories on activity restrictions. This dependence indicates that the IKDC SKF is a measure of both participation and function, and the higher correlation with both the Tegner and IKDC may be considered additional validation of their association with the broader participation construct. Whilst all hypotheses for known groups validity and responsiveness were confirmed, the PROMs did not behave as expected between six and 12 months. During this period most rehabilitation schedules consider return to sports [40], and change in participation is therefore expected to increase. This was the case for both the Tegner and IKDC; however, changes on the Marx and CSAS reduced. This may in part be due to the dependence of these scales on frequency, or to the higher intensity activities that they measure. The Tegner and IKDC are more responsive to the demands of this population and are preferred on this basis. The Tegner was the only PROM to demonstrate no floor or ceiling effects. This is likely due to the extreme nature of the descriptors for both the ceiling (Competitive sport: national and international elite) and floor (Sick leave or disability pension) categories. The Marx scale has a

Table 6 Known groups hypothesis testing for validity and responsiveness. Group mean (validity) and mean change scores (responsiveness) are presented with the results of Friedman's ANOVA comparing groups across the longitudinal data. Smaller effect sizes in the validity data and a reduction in change score in the responsiveness data do not meet the a priori hypotheses and are highlighted in greyscale.

Group mean (SD)

Friedman’s ANOVA

Post hoc contrasts (Wilcoxon)

PROM

PI - PO PI

Tegner Validity hypothesis 5

CSAS

Marx

IKDC

PO

6

Tegner

Responsiveness

CSAS

hypothesis 6 Marx

IKDC

6 – 12

X2

P

Z

P

ES

Z

P

ES

Z

P

ES

96.807
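The post hoc step of Section 2.3.2 (Wilcoxon signed rank tests, Bonferroni correction and effect size r) can be sketched in plain Python. Assumptions: Z comes from the normal approximation with zero differences dropped and a tie-corrected variance; r = Z/√N, with the choice of N (pairs or total observations) left to the caller because the paper does not state it; and three post hoc contrasts per PROM, matching the three Z/P/ES column groups of Table 6. The scores in the usage lines are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_z(earlier: Sequence[float], later: Sequence[float]) -> float:
    """Normal-approximation Z for the Wilcoxon signed rank test (positive = increase)."""
    if len(earlier) != len(later):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(earlier, later) if b != a]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t)/48 over groups of tied |differences|.
    tie_term = sum(t ** 3 - t for t in Counter(abs(d) for d in diffs).values()) / 48
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    if variance <= 0:
        raise ValueError("variance is zero; Z is undefined")
    return (w_plus - mean) / math.sqrt(variance)


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = Z / sqrt(N)."""
    if n <= 0:
        raise ValueError("N must be positive")
    return z / math.sqrt(n)


def cohen_label(r: float) -> str:
    """Bands as stated in Section 2.3.2: >0.5 large, 0.3-0.5 moderate, <0.3 small."""
    a = abs(r)
    if a > 0.5:
        return "large"
    if a >= 0.3:
        return "moderate"
    return "small"


def bonferroni_alpha(alpha: float = 0.05, comparisons: int = 3) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / comparisons


# Hypothetical Tegner scores for eight subjects at six and 12 months (not study data).
six_months = [4, 5, 5, 6, 3, 4, 6, 5]
twelve_months = [6, 6, 5, 7, 5, 6, 7, 6]
z = wilcoxon_z(six_months, twelve_months)
r = effect_size_r(z, n=len(six_months))  # N = number of pairs here; an assumption
print(f"Z = {z:.2f}, r = {r:.2f} ({cohen_label(r)}), alpha = {bonferroni_alpha():.4f}")
```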
