http://informahealthcare.com/dre ISSN 0963-8288 print/ISSN 1464-5165 online Disabil Rehabil, 2014; 36(20): 1704–1712 © 2014 Informa UK Ltd. DOI: 10.3109/09638288.2013.868045

ASSESSMENT PROCEDURES

General motor function assessment scale – reliability of a Norwegian version

Birgitta Langhammer¹ and Birgitta Lindmark²

¹Department of Physiotherapy, Faculty of Health, Oslo and Akershus University College, Oslo, Norway and ²Physiotherapy Programme, Institution of Neurosciences, Uppsala University, Uppsala, Sweden

Abstract

Purpose: The General Motor Function assessment scale (GMF) measures activity-related dependence, pain and insecurity among older people in frail health. The aim of the present study was to translate the GMF into Norwegian (N-GMF) and to establish its reliability and clinical feasibility. Methods: The GMF was translated using a forward and backward procedure, and the Norwegian version was tested on a convenience sample of 30 frail elderly people. The intra-rater reliability tests were performed by three physiotherapists, and the inter-rater reliability test by the same three plus nine independent colleagues. The statistical analyses were pairwise analyses of intra- and inter-rater reliability, using Cronbach's α, Percentage Agreement (PA), Svensson's rank-transformable method and Cohen's κ. Results: The Cronbach's α coefficients for the N-GMF subscales were 0.68 for Dependency, 0.73 for Pain and 0.75 for Insecurity. Intra-rater reliability: The PA for the total score varied from 40 to 70% for Dependence, 30 to 40% for Pain and 30 to 60% for Insecurity. The Relative Rank Variance (RV) indicated a modest individual bias, with augmented rank-order agreement coefficients (ra) of 0.96, 0.96 and 0.99, respectively. The κ statistics varied from 0.27 to 0.62 for Dependence, 0.17 to 0.35 for Pain and 0.13 to 0.47 for Insecurity. Inter-rater reliability: The PA between different testers for Dependence, Pain and Insecurity was 74%, 89% and 74%, respectively. The augmented rank-order agreement coefficients were ra = 0.97 for Dependence, ra = 0.99 for Pain and ra = 0.99 for Insecurity. Conclusion: The N-GMF is a fairly reliable instrument for use with frail elderly people, with moderate intra-rater and inter-rater reliability for Dependence and slight to fair reliability for Pain and Insecurity. Its clinical usefulness was emphasized with regard to its main target group, the frail elderly, and for communication within a multidisciplinary team.

Keywords

Assessments, general motor function assessment scale, Norway, older adults, reliability

History

Received 7 March 2013; Revised 3 September 2013; Accepted 18 November 2013; Published online 17 December 2013

Implications for Rehabilitation

• The Norwegian-General Motor Function Assessment Scale (N-GMF) is a reliable instrument.
• The N-GMF is an instrument for screening and assessment of activity-related dependence, pain and insecurity in frail older people.
• The N-GMF may be used as a tool of communication in a multidisciplinary team.

Address for correspondence: Birgitta Langhammer, Associate Professor PhD, Department of Physiotherapy, Faculty of Health, Oslo and Akershus University College, Postbox 4, St Olavs pl., 0130 Oslo 4, Norway. Tel: + 47 22 45 25 10. Fax: + 47 22 45 25 05. E-mail: Birgitta. [email protected]

Introduction

Sufficient motor functioning is an essential prerequisite for independence in the activities of daily living (ADL), which in turn has been shown to improve life satisfaction among older people undergoing rehabilitation [1]. Conversely, activity limitation is an important risk factor for functional decline, morbidity and mortality among community-dwelling older adults [2,3]. In particular, those in frail health are vulnerable to the risk of secondary conditions, as well as to dysfunction caused by reduced activity levels [4]. However, in this group, a reduction in activity is also commonly caused by subjective experiences, such as performance-triggered pain and insecurity [5], which reduce motivation and activity level. Notably, both musculoskeletal pain and insecurity, including fear of falling, are prevalent among older people in poor health [1–5]. Research has confirmed that among older people, pain and fear of falling are associated with activity limitation, avoidance of ADL and declining quality of life (QoL) [6].

The General Motor Function assessment scale (GMF) was constructed to meet the requirements of an instrument for screening and assessing activity-related dependence, pain and insecurity among older people in frail health [7,8]. The entire GMF assessment, including its subjective aspects, is based on performance testing, which for several reasons has been considered best for decisions about the functional (activity) capacity of older people [9]. In contrast to other instruments that target motor activities, the GMF assessment procedure combines observation of dependence with simultaneous reports of pain and insecurity triggered by the execution of the tasks. These are important aspects of independence and function for older people. The combination of mobility and upper-limb activities is considered to have an impact on the ability to perform ADL. Moreover, verbalizing experiences associated with performing a task has been thought to be important for the motivation to be physically active [10]. It has also been suggested that older patients may be less willing to report pain [11] and that, among patients who are afraid of falling, a lack of communication about falls is associated with activity limitation [12]. Hence, the GMF assessment procedure may serve as a facilitator in the rehabilitation of older people, since it may identify problem areas that need to be observed, expressed, noticed and treated. Until now, the test has been available only in Swedish and has not been translated into any other language. It is therefore of interest to translate this assessment scale into Norwegian for use in geriatric care. The aim of the present study was to translate the GMF into Norwegian and to establish the intra-rater and inter-rater reliability, as well as the clinical feasibility, of the Norwegian version.

Methods

Setting and subjects

The subjects were recruited by their physiotherapists at a geriatric competence centre in a small community with a total of 64 000 inhabitants. The study's convenience sample of 30 frail elderly people, identified as the target group for the GMF assessment, affords 90% power for a scale of approximately 20 items to demonstrate minimally acceptable internal consistency in the form of a Cronbach's α coefficient of 0.8 [13,14]. The inclusion criteria were: frail elderly people with transfer difficulties, as measured with the short version of the Falls Efficacy Scale [15] and the Timed Up and Go test [16]; dependence in ADL, as measured with the Barthel Index [17]; and permanent or short-term residence in a nursing home, or alternatively daycare attendance, where they received physiotherapy and occupational therapy services twice a week. The exclusion criteria were cognitive impairment, inability to understand instructions, and terminal disease. Ethical approval of the study was obtained from the Regional Ethics Committee on 14 February 2011 (no. 2010/3223). Information about the aim of the study was given both verbally and in writing to potential participants. Participation was voluntary, and all participants signed informed consent.

The general motor function assessment

The GMF is a performance-based test of three components of physical function: dependence, pain and insecurity. A lower score indicates better performance; a score of 0 indicates total independence, no pain or no insecurity. The 13 mobility items (of 21) are scored in three categories: 0 = independent, 1 = needs help from one person and 2 = needs help from two people. By contrast, the eight arm and hand items (of 21) are scored in two categories: 0 = independent and 1 = needs help. Pain is reported for all items: a score of 0 indicates that no pain was felt, whereas 1 indicates pain.
Insecurity is reported for the nine mobility items and two of the arm items (touch left/right toe), where 0 indicates no insecurity and 1 indicates insecurity. Three separate total scores are calculated, for dependence, pain and insecurity, respectively [7].

A measure is said to be highly reliable if it produces similar results under consistent conditions [18–20]. Previous tests of the clinical and psychometric properties of the GMF showed satisfactory overall results, while field tests strengthened the evidence of its clinical feasibility by indicating clinical practicality, relevance and usefulness [7]. The absence of floor effects is another characteristic of vital importance, because it demonstrates the ability of the GMF to discriminate between subjects with severe activity limitations. In addition, very high inter-rater and test–retest reliability have been reported [7]. In validity testing, GMF/dependence showed moderate to strong correlations with the ADL taxonomy and the Katz Index of ADL, respectively [7,8], confirming that dependence on the tasks included in the GMF is related to the capacity to perform ADL.

Cross-cultural translation

The Swedish version of the GMF assessment scale was translated into Norwegian with a forward and backward procedure [21], as follows. A professional translator translated the scale from Swedish to Norwegian. This version was discussed with therapists who were familiar with the test and both languages until a first Norwegian version had been developed. This first Norwegian draft was translated back into Swedish by a second professional translator, and the result was discussed by the professional translators and therapists until they agreed on a final version. This final version is the subject of the reliability testing in this study.

Test procedure

As with the Swedish version, the Norwegian-General Motor Function assessment scale (N-GMF) is a performance-based test of three components of physical performance (dependence, pain and insecurity), with three separate total scores [7].
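As an illustration of how the three separate total scores are formed, here is a minimal Python sketch. The `ItemScore` structure, its field names and the example items are hypothetical and not part of the published GMF form; the sketch assumes only what the text states: dependence is scored 0–2 or 0–1 per item, pain 0/1 on every item, insecurity 0/1 on the subset of items where it is recorded, and each total is a plain sum in which a lower value means better performance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ItemScore:
    """One GMF item scored on one test occasion (hypothetical structure)."""
    name: str
    dependence: int                   # 0 = independent, 1 = help from one person, 2 = help from two
    dependence_max: int               # 2 for three-category items, 1 for two-category items
    pain: int                         # 0 = no pain, 1 = pain (recorded for every item)
    insecurity: Optional[int] = None  # 0/1, or None where insecurity is not recorded


def total_scores(items: List[ItemScore]) -> Dict[str, int]:
    """Sum the three subscales separately; lower totals indicate better performance."""
    totals = {"dependence": 0, "pain": 0, "insecurity": 0}
    for item in items:
        if not 0 <= item.dependence <= item.dependence_max:
            raise ValueError(f"{item.name}: dependence score out of range")
        if item.pain not in (0, 1):
            raise ValueError(f"{item.name}: pain must be 0 or 1")
        totals["dependence"] += item.dependence
        totals["pain"] += item.pain
        if item.insecurity is not None:
            if item.insecurity not in (0, 1):
                raise ValueError(f"{item.name}: insecurity must be 0 or 1")
            totals["insecurity"] += item.insecurity
    return totals


if __name__ == "__main__":
    example = [
        ItemScore("Turn over in bed", dependence=1, dependence_max=2, pain=1, insecurity=0),
        ItemScore("Stand up from sitting", dependence=2, dependence_max=2, pain=0, insecurity=1),
        ItemScore("Move left hand to mouth", dependence=0, dependence_max=1, pain=1),
    ]
    print(total_scores(example))  # {'dependence': 3, 'pain': 2, 'insecurity': 1}
```

Keeping the per-item maximum on each record lets the same function validate both the three-category mobility items and the two-category arm and hand items.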
To become familiar with the test and to help standardize the test occasions, several meetings and a workshop were held with the participating physiotherapists, focusing on putting the N-GMF into practice. The process and the scoring, as well as the results of the scoring, were discussed and summarized into a consensus on test procedures during the trial (Figure 1). The whole process took about 6 months, starting in January and ending in July 2011 with a summing-up and closure of the study. Tests were performed consecutively as subjects agreed to participate; the first patient was included at the beginning of March and the last at the end of June 2011. The intra-rater reliability tests were performed by the same three volunteer physiotherapists on two occasions: each therapist evaluated 10 patients twice, with test and retest performed 1 week apart. This interval was chosen in accordance with the Swedish reliability investigation, so that the participants would not be tired [7]. The inter-rater reliability test was performed by the three original volunteer physiotherapists and nine independent colleagues; two testers assessed each person simultaneously, with a distribution of testers of 1:3. That is, each of the three main physiotherapists rated their 10 patients together with three different colleagues, so that the same pair of therapists rated 3, 3 and 4 patients, respectively (3 + 3 + 4 = 10). The inter-rater reliability tests were performed on the first test occasion. A total of 12 therapists were involved in this procedure, and there was no discussion or comparison of scores or results either during or after the tests. The test forms were filled out separately, sealed and sent to the project leader.
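The 1:3 pairing scheme can be made concrete with a short Python sketch that enumerates which two therapists rate each patient. The rater labels (M1–M3 for the main testers, C1–C9 for the control testers), the patient identifiers and the order in which patients are allocated to each colleague are hypothetical; the text specifies only that each main tester rated 10 patients together with three different colleagues, split 3 + 3 + 4.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def assign_pairs(main_raters: Sequence[str],
                 control_raters: Sequence[str],
                 split: Sequence[int] = (3, 3, 4)) -> List[Tuple[str, str, str]]:
    """Return (main rater, control rater, patient id) triples for the simultaneous test."""
    if len(control_raters) != len(main_raters) * len(split):
        raise ValueError("need len(split) distinct control raters per main rater")
    controls = iter(control_raters)
    schedule = []
    for main in main_raters:
        patient_no = 0
        for n_patients in split:
            control = next(controls)  # each colleague is paired with only one main rater
            for _ in range(n_patients):
                patient_no += 1
                schedule.append((main, control, f"{main}-P{patient_no}"))
    return schedule


if __name__ == "__main__":
    mains = ["M1", "M2", "M3"]
    colleagues = [f"C{i}" for i in range(1, 10)]
    schedule = assign_pairs(mains, colleagues)
    pair_counts = Counter((main, control) for main, control, _ in schedule)
    print(len(schedule))                 # 30
    print(sorted(pair_counts.values()))  # [3, 3, 3, 3, 3, 3, 4, 4, 4]
```

The design yields 30 patients, 12 distinct raters and nine rater pairs, each pair rating either 3 or 4 patients.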

Figure 1. Flow chart of the reliability testing procedure.
N = 30 patients consecutively included in the study.
Test 1, inter-rater: two testers perform the General Motor Function assessment simultaneously, with a distribution of testers of 1:3; a total of three "main" testers and nine "control" testers (n = 12).
Test 2, intra-rater: the main testers (n = 3) perform the General Motor Function assessment on their respective patients (n = 10) one week after Test 1, with the same procedure in the same locations.

The intra-rater and inter-rater reliability of observer-assessed dependence and self-assessed pain and insecurity was evaluated for all items [7]. Demographic data were obtained from the records held by the geriatric competence centre.

Field test

The 12 participating physiotherapists were also asked about their perceptions of the clinical practicality of the N-GMF by means of a semi-structured questionnaire distributed to all. The questions had no fixed answer options; the respondents answered in their own words. The questionnaire focused on three themes: the relevance of the test items, the clinical usefulness of the test, and the test as a tool for communication within the multidisciplinary team. In addition, a focus group session was held to capture other comments and thoughts about the test and its practical aspects in the clinic. The focus group interview was audiotaped and transcribed verbatim, and the text was read to identify similarities to and differences from the questionnaire answers in relation to the three themes.

Statistical methods

For inter-rater reliability, pairwise analyses were performed on simultaneous assessments made by a total of 12 raters working in pairs, whereas for intra-rater reliability, paired assessments were made by the same rater (n = 3) on the same individual on two occasions 1 week apart. The distributions of the N-GMF data were examined by calculating the frequencies of floor and ceiling effects. Floor and ceiling effects reflect the extent to which scores cluster at the bottom and top of the scale range, and their magnitude indicates the scale's ability to discriminate between subjects. In this study, floor or ceiling effects observed in more than 20% of the participants were considered substantial [22,23].

Cronbach's α, a coefficient of reliability, was also evaluated. It is commonly used as a measure of the internal consistency of a psychometric test score for a sample of patients [24]. Cronbach's α generally increases as the inter-correlations among the test items increase; hence, it is known as an internal-consistency estimate of the reliability of test scores. Generally speaking, a value > 0.7 is desired for a scale to be considered reliable. Because inter-correlations among test items are maximized when all items measure the same construct, Cronbach's α is widely believed to indicate indirectly the degree to which a set of items measures a single unidimensional latent construct, based on the τ-equivalent model. Cronbach's α is a lower-bound estimate, and if the standardized item α reported in the SPSS calculation is higher than Cronbach's α, further examination of τ-equivalent measurement in the data may be called for [24]. Cronbach's α is also said to be equal to the stepped-up consistency version of the intra-class correlation coefficient commonly used in observational studies. In this study, Cronbach's α and the standardized item α are reported for the total scores of dependence, pain and insecurity, to evaluate whether the test items measure the same latent trait in the N-GMF [24]. Percentage Agreement (PA) was used for both intra- and inter-rater reliability to measure exact agreement for all items of the N-GMF.
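The descriptive checks above can be sketched in Python with the standard library only. This is an illustrative implementation of the textbook formulas, not the SPSS procedure used in the study: `cronbach_alpha` computes k/(k − 1) × (1 − sum of item variances / variance of the total score), `standardized_alpha` computes k × r̄ / (1 + (k − 1) × r̄) with r̄ the mean inter-item Pearson correlation, and `end_effects` reports the share of total scores at each end of the range against the 20% threshold. Because the N-GMF scores lower for better performance, the sketch names the ends "best" and "worst" rather than floor and ceiling; all function names are our own.

```python
from statistics import mean, variance
from typing import Dict, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """scores[s][i] is subject s's score on item i."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two subjects")
    items = list(zip(*scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_var / total_var)


def _pearson(u: Sequence[float], v: Sequence[float]) -> float:
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    if su == 0 or sv == 0:
        raise ValueError("an item with zero variance has no defined correlation")
    return cov / (su * sv)


def standardized_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Alpha computed from the mean inter-item correlation (items rescaled to unit variance)."""
    items = list(zip(*scores))
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    r_bar = mean(_pearson(items[i], items[j]) for i in range(k) for j in range(i + 1, k))
    return k * r_bar / (1 + (k - 1) * r_bar)


def end_effects(totals: Sequence[int], best: int, worst: int,
                threshold: float = 0.20) -> Dict[str, object]:
    """Share of subjects at each end of the total-score range; > threshold is flagged."""
    n = len(totals)
    best_share = sum(t == best for t in totals) / n
    worst_share = sum(t == worst for t in totals) / n
    return {
        "best_share": best_share,
        "worst_share": worst_share,
        "best_substantial": best_share > threshold,
        "worst_substantial": worst_share > threshold,
    }


if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [3, 3]]  # three subjects, two items (toy data)
    print(cronbach_alpha(toy), standardized_alpha(toy))  # both about 0.667 here
```

When the items have unequal variances, the two α values diverge; that divergence is the situation the text describes as calling for a closer look at τ-equivalence.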
Systematic disagreement was illustrated by plotting the cumulative relative frequencies of the two marginal distributions against each other, which yields a relative operating characteristic (ROC) curve [19,20,25]. With total agreement, the ROC curve is the diagonal of identical co-ordinates; a curve located on one or the other side of the diagonal indicates systematic over- or under-estimation (bias). A rank-transformable method for evaluating the reliability of ordered categorical assessments, developed by Svensson, was used to measure random disagreement separately from systematic disagreement [19]. The presence of systematic disagreement, which implies a lack of stability in the use of the scale, is summarized by two statistics, Relative Position (RP) and Relative Concentration (RC), which lie in the interval from −1 to +1. Values close to zero indicate negligible systematic disagreement. A positive RP indicates a systematic shift in the use of the scale from lower to higher categories, and a negative RP the reverse. RC is positive if the categorization in the second session is more concentrated on the central part of the scale than in the first session, and negative if the categorization moves towards the periphery of the scale. The Relative Rank Variance (RV) measures the individual variation in the rank-transformable pattern, i.e. the disagreement between two evaluations that cannot be explained by systematic disagreement. The smaller the RV, the less occasional variation there is between repeated evaluations. The measure of disorder defines the level of discordance relative to total agreement; it is a useful measure of order consistency in paired assessments, as well as a measure of individual variability. The method also allows the definition of an augmented rank-order agreement coefficient, ra, related to RV [20].
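The ROC construction and Svensson's RP, RC, RV and ra can be sketched in Python as follows. This is our reading of the formulas in Svensson's papers [19,20], not code from the study, and it should be checked against those sources before use. With X and Y drawn independently from the first and second marginal distributions: RP = P(X < Y) − P(Y < X); RC = [P(X_l < Y < X_k) − P(Y_l < X < Y_k)] / M with M = min(p0 − p0², p1 − p1²), where p0 = P(X < Y) and p1 = P(Y < X); RV = (6/n³) × Σ(R_X − R_Y)² and ra = 1 − 6 × Σ(R_X − R_Y)² / (n³ − n), where R_X and R_Y are augmented mean ranks (ranking by one assessment and breaking ties by the other). All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def _marginal(values: Sequence[int], categories: Sequence[int]) -> List[float]:
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def _cumulative(probs: Sequence[float]) -> List[float]:
    out, running = [], 0.0
    for p in probs:
        running += p
        out.append(running)
    return out


def roc_points(x, y, categories) -> List[Tuple[float, float]]:
    """Cumulative relative frequencies of the first (x) vs the second (y) assessment."""
    cx = _cumulative(_marginal(x, categories))
    cy = _cumulative(_marginal(y, categories))
    return [(0.0, 0.0)] + list(zip(cx, cy))  # identical marginals lie on the diagonal


def _augmented_mean_ranks(primary, secondary) -> List[float]:
    """Rank by primary, break ties by secondary; identical pairs share a mean rank."""
    n = len(primary)
    order = sorted(range(n), key=lambda i: (primary[i], secondary[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        key = (primary[order[start]], secondary[order[start]])
        end = start
        while end + 1 < n and (primary[order[end + 1]], secondary[order[end + 1]]) == key:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # mean of 1-based positions
        start = end + 1
    return ranks


def svensson_measures(x, y, categories) -> Dict[str, float]:
    """x: first assessment, y: second assessment, categories: ordered scale values."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long lists with at least two pairs")
    p, q = _marginal(x, categories), _marginal(y, categories)
    cp, cq = _cumulative(p), _cumulative(q)
    below_x, above_x = [0.0] + cp[:-1], [1.0 - c for c in cp]  # P(X < c_j), P(X > c_j)
    below_y, above_y = [0.0] + cq[:-1], [1.0 - c for c in cq]  # P(Y < c_j), P(Y > c_j)

    p0 = sum(qj * b for qj, b in zip(q, below_x))  # P(X < Y)
    p1 = sum(qj * a for qj, a in zip(q, above_x))  # P(X > Y)
    rp = p0 - p1

    y_between_x = sum(qj * b * a for qj, b, a in zip(q, below_x, above_x))
    x_between_y = sum(pi * b * a for pi, b, a in zip(p, below_y, above_y))
    m = min(p0 - p0 ** 2, p1 - p1 ** 2)
    rc = (y_between_x - x_between_y) / m if m > 0 else float("nan")  # undefined if M = 0

    rx = _augmented_mean_ranks(x, y)
    ry = _augmented_mean_ranks(y, x)
    ssd = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return {"RP": rp, "RC": rc, "RV": 6.0 * ssd / n ** 3, "ra": 1.0 - 6.0 * ssd / (n ** 3 - n)}


if __name__ == "__main__":
    # Pure upward shift: every second score is one category higher.
    print(svensson_measures([1, 1, 2, 2], [2, 2, 3, 3], [1, 2, 3]))
    # {'RP': 0.75, 'RC': nan, 'RV': 0.0, 'ra': 1.0}
```

The example illustrates the separation the method is designed for: a uniform shift shows up entirely as systematic disagreement (RP), with no individual variation (RV = 0), whereas swapping two subjects' scores leaves RP near zero but raises RV.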
A high value of ra, which ranges from 0 to 1, implies a high level of agreement between the paired assessments, which is an expression of good reliability. The intra- and inter-rater agreement for individual items of the GMF was also analyzed by means of Cohen's kappa (κ), which measures the agreement between two raters who each classify N items into C mutually exclusive categories. The κ score expresses the agreement between raters adjusted for the amount of agreement expected by chance and for the magnitude of the disagreement [26]. A general interpretation of κ has been reported as follows: κ < 0, less than chance agreement; 0.01–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; 0.81–0.99, almost perfect agreement [26–29]. These statistical analyses were chosen in order to explore the data and present them as thoroughly as possible.
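Percentage agreement, κ and the interpretation bands quoted above can be sketched in Python as follows. The sketch computes κ as 1 − (weighted observed disagreement) / (weighted chance-expected disagreement): with `weights="none"` this is Cohen's unweighted κ, and with `weights="linear"` it is a linearly weighted κ, one common way of taking the magnitude of disagreement into account. The text does not state which weighting was used, so linear weights are an assumption, and the function names are hypothetical. A value of exactly 0, which falls between the quoted bands, is treated here as "slight".

```python
from typing import Hashable, Sequence


def percentage_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of paired ratings that are exactly equal, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def kappa(a, b, categories, weights: str = "none") -> float:
    """Cohen's kappa; weights='linear' gives a linearly weighted kappa for ordered categories."""
    n, k = len(a), len(categories)
    if n == 0 or n != len(b) or k < 2:
        raise ValueError("need paired ratings and at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n
    row = [sum(r) for r in observed]                                 # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    def w(i: int, j: int) -> float:
        # disagreement weight: 0 on the diagonal; grows with distance if linear
        if weights == "linear":
            return abs(i - j) / (k - 1)
        if weights == "none":
            return 0.0 if i == j else 1.0
        raise ValueError("weights must be 'none' or 'linear'")

    obs = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        return float("nan")  # both raters used one and the same category: undefined
    return 1.0 - obs / expected


def interpret_kappa(value: float) -> str:
    """Bands as listed in the text; exact boundary handling is an assumption."""
    if value < 0:
        return "less than chance agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return f"{label} agreement"
    return "almost perfect agreement"


if __name__ == "__main__":
    first = [0, 1, 2, 0, 1, 0]   # toy dependence ratings, occasion 1
    second = [0, 1, 1, 0, 1, 0]  # toy dependence ratings, occasion 2
    print(percentage_agreement(first, second))  # 5 of 6 pairs agree
    k_value = kappa(first, second, [0, 1, 2])
    print(k_value, interpret_kappa(k_value))
```

Comparing the two outputs shows why both statistics are reported: PA counts exact matches, whereas κ discounts the matches that the raters' marginal distributions would produce by chance.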

Results

Three physiotherapists volunteered to be "head investigators", recruiting 10 persons each to the sample. All three were female, 30 to 60 years of age, with 5 to 40 years of experience, and they had worked together for some years in the same department. A total of 30 elderly people, 20 women and 10 men, all in need of community-based services, were included consecutively, resulting in 30 pairwise intra-rater tests and 30 pairwise inter-rater tests. The majority were short-term patients in a nursing home, while a few were living at home but receiving help from community-based services (Table 1). The mean age was 85 years (SD 7.4). All participants had multiple diagnoses, including hip fracture, chronic obstructive lung disease, stroke, cancer, osteoporosis and heart conditions. The subjects were in need of multiple medications: 27 took 2–12 medicines per day and only three did not receive any medication, while the majority needed help with ADL, as evaluated by the Barthel Index [17]. The participants displayed reduced mobility: 28 of the 30 used assistive devices, and several had reduced strength in the lower extremities, indicating that they fulfilled the description of physically frail elderly persons. The internal consistency of the N-GMF, tested for each of the three concepts, is shown by Cronbach's α coefficients of 0.68 for Dependency, 0.73 for Pain and 0.75 for Insecurity. The standardized item α values were slightly higher, at 0.70, 0.80 and 0.90, respectively.

The distribution of GMF data: floor and ceiling effects

The distribution of the N-GMF scores showed that the total score was reached by approximately 50% of the 30 subjects, which indicated a ceiling effect in this population of frail elderly people (Table 2).
This means that the majority of the participants could perform the items independently, which indicates low sensitivity to further improvement of function. However, there were no floor effects in the three subscales, which means that the subscales were able to discriminate between individuals with functional limitations and disability.

Table 1. Description of participating patients.
Subjects (n = 30)
Age, mean (SD): 84.9 (7.4)
Gender, male/female: 20/10
Diagnosis (n): hip fracture 3; cerebral insult 6; cancer 2; osteoporosis 1; obstructive lung disease 2; heart condition 1; reduced capacity 2; other 13
Medication (n): 0 medicines, 3; 1–5 medicines, 13; 6–10 medicines, 9; 11–12 medicines, 5

Intra-rater reliability – test–retest

In the intra-rater reliability test, the PA of the three testers' paired data for the individual N-GMF items ranged from 70 to 100% for Dependence, 42 to 91% for Pain and 33 to 100% for Insecurity. The PA for the total score of the three testers varied from 40 to 70% for Dependence, 30 to 40% for Pain and 30 to 60% for Insecurity (Table 3). RP and RC indicated a systematic disagreement in Dependence (n = 30). The RP shift was negative, indicating a distribution towards lower scores in the second test. This shift indicates better performance by the person tested, since a score of 0 represents independence and scores of 1–2 represent dependence. For the patients scored by Tester 1, this negative shift was more likely to occur (22%), though the result is not significant. Participants tested by Testers 2 and 3 were 5% and 9% more likely, respectively, to score significantly higher in the second test, indicating a deterioration in performance (Table 3), while the RC was positive for one of the three testers, indicating more concentrated scoring in the second test, and significantly so (Figure 2). The RV indicated a modest individual bias for all three testers, with augmented rank-order agreement coefficients ra of 0.964, 0.964 and 0.988, respectively (Table 3). Regarding Dependence, the measures of disorder indicated a 3–5% difference between possible pairs. The RP for Pain indicated higher scores in the second test, although this difference was significant for only one of the testers (Table 3). The RC did not reveal any systematic disagreement in any test, while the RV indicated a modest individual bias with an augmented rank-order agreement

Table 2.
Distribution of Dependence scores (Test 1/Test 2) for all testers in the intra-rater reliability test (n = 30), for the mobility and hand-function items of the Norwegian-General Motor Function assessment.
Scoring values (columns): 2, 1, 0.
Items and description (score range): Turn over in bed (0–2); From lying to sitting (0–2); Lie down from sitting (0–2); Transfer from bed to chair (0–2); Touch left big toe, with either hand (0–1); Touch right big toe, with either hand (0–1); Stand up from sitting position (0–2); Stand for >10 s (0–2); Transfer indoors 10 m (0–2); Stair walking up and down 7 steps (0–2); Transfer outdoors 25 m (0–2); Move left hand to mouth (0–1); Move right hand to mouth (0–1); Put left hand on top of head (0–1); Put right hand on top of head (0–1); Put left hand behind back (0–1); Put right hand behind back (0–1); Shake hands with left hand (0–2); Shake hands with right hand (0–2); Take hold of paper with left thumb and index finger (0–2); Take hold of paper with right thumb and index finger (0–2).

2/0 0/0 0/0 0/0

0/0 0/0 0/1

5/2 7/2 3/1 1/1 5/4 3/3 2/0 2/0 1/0 8/8 11/1 0/0 1/1 3/3 2/2 1/1 4/2 0/0 0/0 2/1

23/28 23/28 27/29 29/29 25/26 27/27 28/30 28/30 28/30 18/18 19/19 30/30 29/29 27/27 28/28 29/29 26/28 30/30 30/30 28/28

1/1

3/2

26/27

0/0 0/0 1/0 4/4 0/10

Dependence is evaluated as 0 = independent, 1 = needs help from one person and 2 = needs help from two persons. The difference in the evaluation of items is indicated in brackets in the text.


Table 3. Intra-rater reliability of the Norwegian-General Motor Function assessment scale (N-GMF) for Dependence, Pain and Insecurity, for the three testers separately (T1, T2 and T3) and for all testers combined.

Tester         PA    RP (CI)              RC (CI)              RV (CI)             ra
Dependence
T1 (n = 10)    70%   0.22 (0.42; 0.02)    0.07 (0.33; 0.47)    0.04 (0.00; 0.12)   0.96
T2 (n = 10)    60%   0.05 (0.29; 0.19)    0.33 (0.64; 0.02)    0.04 (0.00; 0.12)   0.96
T3 (n = 10)    90%   0.09 (0.25; 0.07)    0.08 (0.24; 0.09)    0.01 (0.00; 0.04)   0.99
All (n = 30)   70%   0.10 (0.72; 0.51)*   0.07 (0.79; 0.66)*   0.02 (0.00; 0.54)   0.98
Pain
T1 (n = 10)    70%   0.30 (0.59; 0.01)    0.13 (0.48; 0.21)    0.09 (0.00; 0.28)   0.90
T2 (n = 10)    60%   0.22 (0.49; 0.05)    0.04 (0.42; 0.34)    0.09 (0.00; 0.28)   0.90
T3 (n = 10)    60%   0.30 (0.06; 0.04)    0.16 (0.57; 0.26)    0.04 (0.00; 0.12)   0.96
All (n = 30)   67%   0.16 (0.32; 0.00)*   0.10 (0.11; 0.19)*   0.07 (0.00; 0.16)   0.93
Insecurity
T1 (n = 10)    60%   0.26 (0.61; 0.09)    0.11 (0.50; 0.28)    0.22 (0.00; 0.55)   0.78
T2 (n = 10)    40%   0.00 (0.26; 0.26)    0.00 (0.43; 0.43)    0.11 (0.00; 0.29)   0.89
T3 (n = 10)    50%   0.07 (0.24; 0.38)    0.12 (0.24; 0.48)    0.12 (0.00; 0.36)   0.88
All (n = 30)   57%   0.12 (0.29; 0.05)*   0.08 (0.30; 0.14)*   0.07 (0.00; 0.16)   0.93

*Significant difference, p < 0.05. Percentage Agreement (PA); Relative Position (RP); Relative Concentration (RC); and Relative Rank Variance (RV), with confidence intervals (CI), are given for each tester and their sample (n = 10), as well as for all three testers combined ("All", total sample n = 30). The augmented rank-order agreement coefficient (ra) was also calculated.

Figure 2. ROC curves of the systematic disagreement between paired N-GMF assessments of intra-rater reliability test for total score in Dependence, Pain and Insecurity.


coefficient ra = 0.964, 0.904 and 0.904, respectively, for each of the therapists (Figure 2). The measures of disorder indicated a 5–10% difference between possible pairs regarding Pain. For Insecurity, the RP indicated no systematic differences, and the RC indicated a modest shift in concentration (Table 3). However, the RV indicated an individual bias, with augmented rank-order agreement coefficients ra = 0.79, 0.89 and 0.88 (Figure 2). The measures of disorder indicated a 10–22% difference between possible pairs regarding Insecurity. The κ statistics for the total score varied between the three testers: from fair to moderate for Dependence (0.27–0.62), slight to fair for Pain (0.17–0.35) and slight to moderate for Insecurity (0.13–0.47) (Table 4). Spearman's correlation (association) and κ (agreement) for each item and for the total scores, with all three testers aggregated, are presented in Table 5. The association varied from not applicable to high, being highest for transfers from bed, stair climbing, outdoor transfers and arm and hand activities. κ varied from slight to substantial agreement on the different items, with fair agreement for the total score (Table 5). κ and the Spearman coefficient were 0.29 and 0.8, respectively, for the total Pain score, and 0.28 and 0.6, respectively, for the total Insecurity score.

Inter-rater reliability

The PA between different testers for Dependence, Pain and Insecurity (n = 30) was 74%, 89% and 74%, respectively, whereas the Relative Operating Characteristic (ROC) curves revealed a

Table 4. Intra-rater reliability for each of the three testers: kappa coefficient (κ) for the total score in each subscale, with each test person evaluating the same 10 patients on two occasions 1 week apart.

              Test person 1   Test person 2   Test person 3
Dependence    0.62            0.27            0.56
Pain          0.30            0.35            0.17
Insecurity    0.17            0.13            0.47

Table 5. Intra-rater reliability for all three testers aggregated: Spearman's correlation (rs) and kappa coefficient (κ) (n = 30).
Items: Turn over in bed; Sit up from recumbent position; Lie down from sitting; Transfer from bed to chair; Touch left big toe; Touch right big toe; Stand up from sitting; Stand for >10 s; Transfer indoors 10 m; Stair climbing/steps up/down; Transfer outdoors 25 m; Move left hand to mouth; Move right hand to mouth; Move left hand to head; Move right hand to head; Move left hand behind back; Move right hand behind back; Shake hands with left hand; Shake hands with right hand; Pinch grip with left hand; Pinch grip with right hand; Total dependence; Total pain; Total insecurity.

Columns: Spearman (rs), p, κ, p

0.59 0.48 0.56 1.0 0.89 0.66 – – – 0.86 0.99 1 1 1 1 1 0.68 1 1 0.45 0.56 0.87 0.77 0.55

0.001 0.007 0.001 0.0001 0.001 0.001 – – – 0.001 0.001

0.29 0.38 0.47 1.0 0.87 0.64 0 0 0 0.76 0.93 1 1 1 1 1 0.63 1 1 0.47 0.53 0.48 0.29 0.28

0.0001 0.008 0.002 0.0000 0.001 0.001 0.001

0.0001 0.01 0.001 0.0001 0.0001 0.0001

0.001 0.001


slight systematic disagreement for Dependence and Insecurity (Figure 3). The RP for Dependence indicated a slight systematic disagreement, with lower ratings in the second test than in the first (Table 6). The RC confirmed this impression, with values diverging from 0, but the RV indicated no individual bias, with an augmented rank-order agreement coefficient of ra = 0.974. The disagreement between pairs in Dependence was 5%. Pain showed little or no systematic disagreement between raters (Table 6), with low individual variability (a small RV), an augmented rank-order agreement coefficient of ra = 0.996 and a measure of disagreement of only 1% between pairs. Insecurity showed a minor systematic disagreement with regard to RP and RC (Table 6), while the RV gave no indication of individual bias, and the augmented rank-order agreement coefficient was ra = 0.993. The disagreement between pairs in Insecurity was <1%. The inter-rater reliability for the total score was also assessed with the κ coefficient: κ was 0.38 ("fair") for Dependence, 0.47 ("moderate") for Pain and 0.59 ("moderate") for Insecurity.

Clinical usefulness

A focus group session with the 12 participating physiotherapists at the Centre was held at the end of the study, during which the questions used in the questionnaire were discussed. One of the major benefits put forward was that the N-GMF combines questions about pain and insecurity with the tasks being performed. It was also agreed that the N-GMF was relevant to their work, could serve as a screening instrument and as a communication tool for a multidisciplinary team, and was practical in use. It was also mentioned that the N-GMF could help the team visualize how declining ability reduced the number of activities and the level of performance reached in each activity. A negative aspect was that the test took some time to perform, generally between 15 and 45 min per test. Some patients also reported that they were not able to perform the entire test in one session. In particular, testing outdoor walking ability was perceived as difficult, and patients were often not motivated. There were also some differences between therapists in their views of the items included in the test. All considered the activities important, but there was disagreement about the importance of the items assessing arm and hand function. Some physiotherapists regarded these items as unnecessary, since they were often "part" of an activity, so that the therapists already had an impression of the range of motion. Others insisted that these items were important, since they showed how a reduced range of movement could affect activities and in that respect were "educational" for the rest of the team. The large amount of documentation that is obligatory in daily work was another topic put forward, and the difficulty of introducing a new instrument into daily practice was emphasized. It was stressed that a new outcome measure must be perceived as important and give the team extra information in order to be accepted and considered worthwhile in daily use.

Discussion

The primary goal of this study was to establish the reliability and clinical usefulness of the N-GMF. The results indicate that the N-GMF may be perceived as a relatively reliable instrument and considered useful, both in regard to the consistency of the scale and to intra- and inter-observer agreement, which is in line with the original Swedish reliability test on a similar population of elderly people (n = 20, n = 22) [7].


Figure 3. ROC curves of the systematic disagreement between paired N-GMF assessments of inter-rater reliability tests for total score in Dependence, Pain and Insecurity.

Table 6. Inter-rater reliability in the Norwegian-General Motor Function assessment scale (N-GMF) for total score in Dependence, Pain and Insecurity; Percentage Agreement (PA), Relative Position (RP), Relative Concentration (RC) and Relative Rank Variation (RV) in all therapists (n = 12) on 30 patients.

Subscale | PA | RP (95% CI) | RC (95% CI) | RV (95% CI) | ra
Dependence | 74% | 0.04 (0.15; 0.05) | 0.11 (0.29; 0.08) | 0.03 (0.00; 0.06) | 0.97
Pain | 89% | 0.02 (0.05; 0.08) | 0.05 (0.15; 0.05) | 0.00 (0.00; 0.02) | 1
Insecurity | 74% | 0.07 (0.00; 0.13) | 0.05 (0.19; 0.09) | 0.00 (0.00; 0.01) | 1
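Table 6 reports PA and Svensson's RP for each subscale. As an illustration only, the following Python sketch computes both from paired ordinal ratings coded as ordered integers. It assumes our reading of Svensson's definition [19]: RP = P(X < Y) − P(Y < X), where X and Y are independent draws from the marginal distributions of the first and second assessment, so that a negative RP means the second assessment tends to use lower categories. The function names and toy ratings are hypothetical; RC, RV and the confidence intervals are not reproduced here.

```python
from collections import Counter


def percentage_agreement(first, second):
    """PA: share of pairs in which both assessments give the same category."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(first, second)) / len(first)


def relative_position(first, second):
    """Svensson-style RP = P(X < Y) - P(Y < X), using only the two marginals.

    X is drawn from the first assessment's ratings and Y, independently,
    from the second's; RP < 0 means the second assessment tends to be lower.
    (Assumed sign convention, see the lead-in.)
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    fx, fy = Counter(first), Counter(second)
    # Weight every cross-combination of categories by its marginal frequencies.
    p0 = sum(cx * cy for x, cx in fx.items() for y, cy in fy.items() if x < y) / n**2
    p1 = sum(cx * cy for x, cx in fx.items() for y, cy in fy.items() if x > y) / n**2
    return p0 - p1


# Hypothetical ratings: identical marginals, so no systematic shift despite PA < 1.
test1 = [0, 1, 1, 2, 2]
test2 = [0, 1, 2, 2, 1]
print(percentage_agreement(test1, test2))  # 0.6
print(relative_position(test1, test2))  # 0.0
```

The toy example shows why PA and RP answer different questions: the two assessments disagree on two of five patients, yet neither assessment is systematically higher than the other.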

The N-GMF is both performance-based (Dependence) and self-evaluated (Pain and Insecurity), which may be seen as both a strength and a weakness of the scale. The slight difference between Cronbach's α and the standardized item α reported in this study may indicate that some multifactorial items violate the τ-equivalent model [24]. The τ-equivalent model assumes that each test item measures the same latent trait on the same scale; when this holds, α gives a good estimate of reliability. If the test items were in fact heterogeneous, however, the assumption of τ-equivalence would be violated, and the standardized item α would then be higher than Cronbach's α, indicating that the τ-equivalence of the data should be examined more closely [24].

The arm and hand function items showed higher agreement than the mobility and transfer items in both intra-rater and inter-rater reliability [19]. In the mobility scale, 13 of the 21 items are divided into three categories: “can do alone”, “needs help from one person” and “needs help from two persons”. For the arm and hand items, however, 8 of 21 items are divided into only two categories, and this may have influenced the results (Table 2). One could speculate that the arm and hand items were less difficult to interpret, since they showed a higher degree of correspondence than the mobility items (Table 5). Less disagreement might have been expected in the intra-rater reliability test, where each tester rated the same patients twice, but differences between testers emerged, indicating systematic differences (Table 3). In the Swedish intra-rater reliability study the results were less varied [7]. There, one rater tested 20 patients twice, 1 week apart, whereas in the Norwegian study three raters each tested 10 patients 1 week apart (Table 3). The inter-rater reliability tests involving 12 therapists in the Norwegian study compared three samples (n = 10), while in the Swedish study two and five therapists, respectively, tested two samples (n = 20, n = 25) [7]. The N-GMF reliability results, presented separately for each of the three testers and aggregated for all three, showed that, despite workshops and discussions, the individual therapists diverged slightly from one test to another (Table 3). One possible explanation for this difference is the time interval of 1 week between test occasions: the difference in Dependence scores from one test to the next may reflect a change of function in the elderly participant. These elderly people were registered as short-term patients in the institution or had been assigned a treatment setting at home, with the aim of improving performance through treatment and care. A shorter period between test and re-test might have benefited the study, although the same design was used in the Swedish study, on the grounds that elderly, frail people need a longer break between tests [7]. Another possible explanation is that the differences reflect varying physical performance among the elderly from one week to another. Svensson's method makes it possible to evaluate systematic bias and individual variation, aspects that have been reported not to be captured by κ analyses [19,20,26]. This was expressed in the ROC curves and in the RP, RC and RV for the different therapists. In the individual analyses of the testers, a modest systematic bias influenced the test results. This may be related to experience, which influences how an evaluated function is interpreted. Moreover, the measures of Pain and Insecurity indicated individual bias, with RV varying between 0.04 and 0.09 and between 0.11 and 0.22, respectively, for the individual testers, indicating the heterogeneous pattern of change from one test occasion to the next discussed earlier.
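The argument about Cronbach's α and the standardized item α can be illustrated with a small Python sketch. It uses the textbook formulas α = k/(k − 1) · (1 − Σ item variances / total-score variance) and standardized α = k·r̄ / (1 + (k − 1)·r̄), where r̄ is the mean inter-item correlation. The function names and the two toy items are hypothetical and are not N-GMF data: the items are perfectly correlated but scored on different scales (congeneric rather than τ-equivalent), so the standardized α exceeds Cronbach's α.

```python
from itertools import combinations
from statistics import mean, variance


def cronbach_alpha(items):
    """items: k lists of item scores, each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(a, b):
    """Pearson correlation between two equally long score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def standardized_alpha(items):
    """Standardized alpha, computed from the mean inter-item correlation r_bar."""
    k = len(items)
    r_bar = mean(pearson(a, b) for a, b in combinations(items, 2))
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical items: item 2 is item 1 on a doubled scale (not tau-equivalent).
items = [[1, 2, 3, 4], [2, 4, 6, 8]]
print(round(cronbach_alpha(items), 3))      # 0.889
print(round(standardized_alpha(items), 3))  # 1.0
```

With real N-GMF data, each inner list would hold one item's scores across the patients; missing values would need handling that this sketch omits.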
The κ statistics revealed fair to moderate agreement between raters in the categories of Dependence, Pain and Insecurity, underlining the impression of differences between the first test and the second (Tables 4 and 5). For both intra-rater and inter-rater reliability, the results indicated negligible individual-based disagreement in Dependence, with RV < 0.1, a high overall ra and small differences in disagreement. The systematic differences indicated by RP and RC may be influenced by the fact that the test was new to the therapists and that the test period stretched over a long time. In addition, the testing period was lengthened by a stressful work period and a shortage of personnel for the amount of work to be done, which possibly contributed to the systematic differences between the raters. In this respect, the test was applied in regular clinical practice and could be viewed as “a normal day” in a clinic, although the systematic differences that occurred might be easily overcome by education, more practice and repetition [19]. Approximately 50% of the cases received the maximum total score, indicating a ceiling effect that was also shown in the Swedish sample [7]. This ceiling effect may limit the clinical usefulness of the N-GMF: if the goal is to show improvement, it may only be suitable for the frailest persons. However, for clinical use in which the goal is to monitor the maintenance of performance in the elderly in order to prevent deterioration, and as a tool for communication in multidisciplinary work in nursing homes and community care, the N-GMF can be perceived as a useful and helpful tool.
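For readers less familiar with κ, a minimal Python sketch of unweighted Cohen's κ [27] follows, together with the verbal bands tabulated by Viera and Garrett [26], which match the “slight”, “fair” and “moderate” wording used here. Whether the study used unweighted or weighted κ [28] is not stated in this section, so the unweighted form is an assumption; the function names and toy ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of category codes."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    p_obs = sum(x == y for x, y in zip(first, second)) / n
    cx, cy = Counter(first), Counter(second)
    # Chance agreement: product of the two marginal proportions, summed over categories.
    p_exp = sum(cx[c] * cy[c] for c in cx.keys() | cy.keys()) / n**2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


def agreement_label(kappa):
    """Verbal bands as tabulated by Viera and Garrett (after Landis and Koch)."""
    if kappa < 0:
        return "less than chance"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical paired ratings of five patients.
test1 = [0, 1, 1, 2, 2]
test2 = [0, 1, 2, 2, 1]
k = cohens_kappa(test1, test2)
print(round(k, 3), agreement_label(k))  # 0.375 fair
```

Applied to the reported inter-rater coefficients, these bands give “fair” for 0.38 and “moderate” for 0.47 and 0.59.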
There are some weaknesses in the study: the relatively small number of elderly people tested (n = 30), the fact that the tests were performed on short-term patients only, and the time interval used in the intra-rater reliability tests, which may, at least in this group of patients, have influenced the test results. Furthermore, the field test was performed to obtain an indication of clinical usefulness; further analyses are needed before general conclusions can be drawn.


Conclusion

The N-GMF is a fairly reliable instrument for use with frail elderly persons. Both intra-rater and inter-rater reliability were moderate for Dependence and slight to fair for Pain and Insecurity. Furthermore, a negligible systematic bias was observed in the rating of Dependence, whereas indications of individual bias were observed for Pain and Insecurity. The clinical usefulness of the test was stressed in relation to its main target group, the frail elderly, and to communication within the multidisciplinary team.

Acknowledgements

We would like to thank the Geriatric Competence Centre in Drammen, Norway, for their valuable help and cooperation in this study. We would also like to thank statistician M. Småstuen for valuable statistical help, A.-C. Åberg, creator of the GMF, Uppsala University, for support, and Pearson Assessment Sweden, the copyright holder of the GMF, for allowing us to publish the results.

Declaration of interest

The authors report no conflict of interest. The authors alone are responsible for the content and writing of this article. We would like to acknowledge the support from Oslo and Akershus University College during this study.

References

1. Mercier L, Audet T, Hébert R, et al. Impact of motor, cognitive, and perceptual disorders on ability to perform activities of daily living after stroke. Stroke 2001;32:2602–8.
2. Ueshima K, Ishikawa-Takata K, Yorifuji T, et al. Physical activity and mortality risk in the Japanese elderly: a cohort study. Am J Prev Med 2010;38:410–18.
3. Fletcher PC, Guthrie DM, Berg K, Hirdes JP. Risk factors for restriction in activity associated with fear of falling among seniors within the community. J Patient Saf 2010;6:187–91.
4. Vermeulen J, Neyens JC, van Rossum E, et al. Predicting ADL disability in community-dwelling elderly people using physical frailty indicators: a systematic review. BMC Geriatric 2011;11:33 (1–11).
5. Berlin Hallrup L, Albertsson D, Bengtsson Tops A, et al. Elderly women's experiences of living with fall risk in a fragile body: a reflective life world approach. Health Soc Care Commun 2009;17:379–87.
6. Kempen GI, van Haastregt JC, McKee KJ, et al. Socio-demographic, health-related and psychosocial correlates of fear of falling and avoidance of activity in community-living older persons who avoid activity due to fear of falling. BMC Public Health 2009;9:170.
7. Aberg AC, Lindmark B, Lithell H. Development and reliability of the General Motor Function Assessment Scale (GMF) – a performance-based measure of function-related dependence, pain and insecurity. Disabil Rehabil 2003;25:462–72.
8. Gustafsson U, Grahn B. Validation of the General Motor Function Assessment Scale – an instrument for the elderly. Disabil Rehabil 2008;30:1177–84.
9. Elam JT, Graney MJ, Beaver T, et al. Comparison of subjective ratings of function with observed functional ability of frail older persons. Am J Public Health 1991;81:1127–30.
10. Resnik B. Motivation to perform activities of daily living in the institutionalized older adult: can a leopard change its spots? J Adv Nurs 1999;29:792–9.
11. Scudds RJ, Robertson JM. Empirical evidence of the association between the presence of musculoskeletal pain and physical disability in community-dwelling senior citizens. Pain 1998;75:229–35.
12. Howland J, Lachman ME, Peterson EW, et al. Covariates of fear of falling and associated curtailment. Gerontologist 1998;38:549–55.
13. Conroy R. Sample size. A rough guide. Cited 2012 October 16. Available from: http://www.beaumontethics.ie/docs/application/samplesizecalculation.pdf [last accessed 9 Dec 2013].


14. Shoukri MM, Aysyali MH, Donner A. Sample size requirements for the design of reliability study: review and new results. Stat Methods Med Res 2004;13:251–71.
15. Tinetti ME, Richman D, Powell L. Falls efficacy as a measure of fear of falling. J Gerontol 1990;45:239–43.
16. Podsiadlo D, Richardson S. The timed up and go – a test of basic functional mobility for frail elderly persons. J Am Geriatric Soc 1991;24:398–401.
17. Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J 1965;14:61–5.
18. Domholdt E. Physical therapy research. Principles and applications. Philadelphia (PA): WB Saunders Company; 1993.
19. Svensson E. Application of a rank-invariant method to evaluate reliability of ordered categorical assessments. J Epidemiol Biostat 1998;3:403–9.
20. Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biomed J 1997;39:643–57.
21. Lin YH, Chen CY, Chiu PK. Cross-cultural research and backtranslation. Sport J 2005;8:1–10.


22. Mao HF, Hsueh IP, Tang PF, et al. Analysis and comparison of the psychometric properties of three balance measures for stroke patients. Stroke 2002;33:1022–7.
23. Holmes WC, Shea JA. Performance of a new, HIV/AIDS-targeted quality of life (HAT-QoL) instrument in asymptomatic seropositive individuals. Qual Life Res 1997;6:561–71.
24. Tavakol M, Dennick R. Making sense of Cronbach's alpha. IJME 2011;2:53–5.
25. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.
26. Viera AJ, Garrett JM. Understanding inter-observer agreement: the kappa statistic. Fam Med 2005;37:360–3.
27. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
28. Cohen J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–20.
29. Tooth LR, Ottenbacher KJ. The κ statistic in rehabilitation research: an examination. Arch Phys Med Rehabil 2004;85:1371–6.
