Menopause: The Journal of The North American Menopause Society, Vol. 22, No. 3, pp. 325-336. DOI: 10.1097/gme.0000000000000321. © 2014 by The North American Menopause Society

Structural validity of a 16-item abridged version of the Cervantes Health-Related Quality of Life scale for menopause: the Cervantes Short-Form Scale

Pluvio J. Coronado, MD, PhD,1 Rafael Sánchez Borrego, MD, PhD,2 Santiago Palacios, MD, PhD,3 Miguel A. Ruiz, PhD,4 and Javier Rejas, MD, PhD5

Abstract

Objective: The Cervantes Scale is a specific health-related quality of life questionnaire that was originally developed in Spanish to be used in Spain for women through and beyond menopause. It contains 31 items and is time-consuming. The aim of this study was to produce an abridged version with the same dimensional structure and with similar psychometric properties.

Methods: A representative sample of 516 postmenopausal women (mean [SD] age, 57 [4.31] y) seen in outpatient gynecology clinics and extracted from an observational cross-sectional study was used. Item analysis, internal consistency reliability, item-total and item-dimension correlations, and item correlation with the 12-item Medical Outcomes Study Short Form Health Survey Version 2.0 were studied. Dimensional and full-model confirmatory factor analyses were used to check structure stability. A threefold cross-validation method was used to obtain stable estimates by means of multigroup analysis.

Results: The scale was reduced to a 16-item version, the Cervantes Short-Form Scale, containing four main dimensions (Menopause and Health, Psychological, Sexuality, and Couple Relations), with the first dimension composed of three subdimensions (Vasomotor Symptoms, Health, and Aging). Goodness-of-fit statistics were better than those of the extended version (χ²/df = 2.493; adjusted goodness-of-fit index, 0.802; parsimony comparative fit index, 0.749; root mean standard error of approximation, 0.054). Internal consistency was good (Cronbach's α = 0.880). Correlations between the extended and the reduced dimensions were high and significant in all cases (P < 0.001; r values ranged from 0.90 for Sexuality to 0.969 for Vasomotor Symptoms).

Conclusions: The Cervantes Scale can be reduced to a 16-item abridged version (Cervantes Short-Form Scale) that maintains the original dimensional structure and psychometric properties. At 51% of the original length, this version can be administered faster, making it especially suitable for routine medical practice.

Key Words: Cervantes Scale; Abridged; Menopause; Health-related quality of life.

Menopause is defined as permanent cessation of menstruation, confirmed after 12 consecutive months of amenorrhea. It can be natural (occurring without other pathological or physiological causes) or it can be induced by surgical operation, radiation, or drugs.1 Natural menopause results from a loss of ovarian follicular activity and occurs at about ages 50 to 52 years.1 Climacteric terminology is somewhat confusing, but the World Health Organization and the International Menopause Society (IMS) have defined the following terms: "menopausal transition": the period before the final menstrual period (FMP), when variability in the menstrual cycle is usually increased (it can be synonymous with "premenopause"); "perimenopause": the period immediately before menopause, when endocrinological, biological, and clinical features commence; "climacteric": the period of transition from the reproductive phase to the nonreproductive phase,

Received March 26, 2014; revised and accepted July 7, 2014.

From the 1Department of Obstetrics and Gynecology, Hospital Clínico San Carlos, Madrid, Spain; 2Clínica DIATROS, Barcelona, Spain; 3Instituto Palacios de Salud y Medicina de la Mujer, Madrid, Spain; 4Department of Methodology, School of Psychology, Universidad Autónoma de Madrid, Madrid, Spain; and 5Health Economics and Outcomes Research Department, Pfizer SLU, Alcobendas, Spain.

All authors had complete access to the data, participated in the analysis and/or interpretation of results, and drafted the manuscript. P.J.C., R.S.B., S.P., and J.R. were responsible for the design of the study. Analysis of data was performed by M.A.R.

Funding/support: This study was carried out using the database of the GINERISK study, which was funded by Pfizer SLU. The statistical work included in this article was performed by M.A.R. under a grant from Pfizer SLU. No editorial support was provided except for review of the English manuscript (by Sharon Grevet), which was funded by Pfizer SLU.

Financial disclosure/conflicts of interest: J.R. is an employee of Pfizer SLU. M.A.R. is a professor of statistics and methodology at Universidad Autónoma de Madrid and has received a grant from Pfizer SLU for statistical analysis. All other authors declare that they have no conflicts of interest.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Website (www.menopause.org).

Address correspondence to: Pluvio J. Coronado, MD, PhD, Servicio de Obstetricia y Ginecología, Hospital Clínico San Carlos, C/ Prof. Martín Lagos, Madrid 28040, Spain. E-mail: [email protected]

Copyright © 2015 The North American Menopause Society. Unauthorized reproduction of this article is prohibited.


which incorporates perimenopause by extending for a longer variable period before and after perimenopause (IMS); "climacteric syndrome": the symptoms possibly associated with the climacteric phase (IMS); "postmenopause": the time dating from the FMP, regardless of whether menopause was natural or induced.1,2 During these periods, women experience significant physical and psychological changes as a consequence of decline in estrogenic activity.3,4 It is noteworthy that the effects of menopause-related changes on health-related quality of life (HRQoL) are closely related to personal and sociocultural characteristics, which in turn decisively influence how each woman perceives many of these changes.5,6

HRQoL assessment has become an essential component, both when studying the effects of menopause on well-being and when evaluating the benefits of hormone therapy or any other type of treatment used at this stage of a woman's life.7,8

The Cervantes Scale is a self-completed questionnaire designed to measure HRQoL in perimenopausal and postmenopausal women.9 The existing questionnaire consists of 31 items and was developed from an extended version containing 83 items. The extended version was originally structured in seven dimensions and had excellent reliability (α = 0.951). The dimensions were: Menopause Symptoms, Self-Perceived Health and Well-Being, Couple Relations, Sexuality, Depression, Anxiety, and Healthy Lifestyle. The original authors decided to make the scale more practical for use in a clinical setting and reduced it to the published 31-item version.9 Internal consistency was kept high (α = 0.909), but the structure was reduced to four dimensions: Menopause and Health (divided into three subdimensions: Vasomotor Symptoms, Health, and Aging), Psychological, Sexuality, and Couple Relations, with scores ranging from 0 to 155 points. The target population was women aged 45 to 64 years.
Although appropriate for use in clinical trials and other health studies, the 31-item version is still time-consuming and thus difficult to use on a routine basis in daily medical practice. Following the usual guidelines for development of multidimensional instruments,10,11 two possible approaches were considered. The first approach was to detach the individual dimensions from one another and to define them separately with at least three items so that the structure would be defined.12 The second approach was to forgo measuring isolated individual dimensions and to develop a more generic instrument where only the overall score would retain all the adequate psychometric properties as a composite measure without losing the contribution of the original attributes. The former would make it possible to assess women in each individual dimension but would entail a minimum of 18 (6 × 3) items, which was considered too long. The latter would make it possible to reduce the questionnaire to at least half its size but would compromise the assessment of individual dimensions. As a result, we decided to follow the second strategy and to reduce the length of the scale as far as possible without giving up the original structure. If a particular study required individual assessment of each dimension, the original 31-item version would be the best choice. Thus, the aim of this study was to shorten the Cervantes Scale, while


retaining the original psychometric properties and structure, and to produce a validated abridged version, the Cervantes Short-Form Scale (Supplementary Data: Abridged Questionnaire, Supplemental Digital Content 1, http://links.lww.com/MENO/A106).

METHODS

Sample of women

The present work was carried out using a subsample extracted from the GINERISK study.13 The GINERISK study was a noninterventional cross-sectional epidemiological study in a broad population of Spanish postmenopausal women (FMP >12 mo earlier) older than 18 years who had a clinical diagnosis of osteoporosis, were seen in gynecology clinics, and were able to understand written Spanish questionnaires. The women in the GINERISK study were representative of the entire Spanish territory. The subsample used here consisted of 503 women aged 45 to 64 years, with no previous fractures, and who had been diagnosed with osteopenia (dual-energy x-ray absorptiometry T score ≥ −2.5 in spine or hip). Because there is an estimated 20% prevalence of osteoporosis in the general population,14 a second subsample of 123 women was added by random selection from women with a diagnosis of osteoporosis (dual-energy x-ray absorptiometry T score < −2.5 in spine or hip). Thus, a total population of 626 participants was used here. According to the usual requirements for factor analysis15 and structural equation modeling,16 the sample size was considered sufficient for this study. Relevant variables were collected at a single visit after a written informed consent form had been obtained. The study followed the ethical principles of the Declaration of Helsinki and was approved by the Hospital de la Princesa Ethics Committee.
Instruments

In addition to the Cervantes Scale, the Spanish version of the 12-item Medical Outcomes Study Short Form Health Survey Version 2.0 was also used here.17,18 The 12-item Short Form Health Survey Version 2.0, an updated version of the 12-item Short-Form Health Survey (SF-12), is a generic instrument used to assess HRQoL, adapted and validated for the Spanish population.18 It offers two component summary scales (Physical and Mental Health) and comprises 12 items and eight dimensions: "self-perceived general health" (item 1); "bodily pain" (item 2); "physical functioning," which includes "engagement in moderate activities" (item 3) and "ability to climb a flight of stairs" (item 4); "physical role," which includes "accomplishment" (item 5) and "limitation" (item 6); "vitality" (item 7); "social functioning" (item 8); "mental health," which includes "feeling calm and peaceful" (item 9) and "feeling downhearted and depressed" (item 10); and "emotional role," which includes "accomplishment" (item 11) and "limitation" (item 12). High scores mean better HRQoL. Results were compared with population-based norms and corrected to obtain Z scores and standardized metric scores (0-100).18

Statistical analysis

In the first step, items with worse metric properties were identified under classical test theory item analysis assumptions.11,19


Blank responses, unimodal response distribution, ceiling and floor effects, and skewness were studied. Internal consistency reliability was studied for each subdimension. Cronbach's α, item-total correlation, and adjusted item-total correlation (deleting each particular item from the total score) were computed (α < 0.70, poor; α ≥ 0.70, acceptable; α > 0.80, good; α > 0.90, very good; α > 0.95, excellent).11 Item-total statistics were computed by considering the corresponding dimension total score.

Structural validity was assessed through confirmatory factor analysis (CFA), using listwise treatment of missing values. The original structure proposed by the scale developers was assumed: four dimensions, with one of the dimensions divided into three subdimensions. A separate CFA was performed for each dimension to identify items with lower factor loading, and final overall analyses including all items were also estimated. The following models were estimated: (1) a second-order CFA for the Menopause and Health scale with three first-order factors (Vasomotor Symptoms, Health, and Aging); (2) a first-order CFA including the Psychological scale items; (3) a first-order CFA including the Sexuality scale items; (4) a first-order CFA including the Couple Relations scale items; and (5) a second-order CFA for all items with six first-order factors (Vasomotor Symptoms, Health, Aging, Psychological, Sexuality, and Couple Relations) and a second-order HRQoL dimension.

Given that goodness-of-fit (GOF) statistics are very sensitive to large sample sizes (more than 200-400 cases) and to prevent overfitting to a single sample, we adopted a multigroup strategy. The sample was divided at random into three equal-size groups (172 cases), and each confirmatory model was estimated simultaneously in the threefold sample (cross-validation), gaining generalizability. The stepwise multigroup strategy starts with a separate model estimation for each subsample.
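The random division into three equal-size groups can be sketched as follows. This is only a minimal illustration, not the authors' procedure: the function name `threefold_split`, the fixed seed, and the use of integer case identifiers are assumptions introduced here, and the total of 516 cases is taken from the effective sample size reported in the Results.

```python
import random

def threefold_split(case_ids, seed=0):
    """Shuffle case identifiers and deal them into three groups of near-equal size.

    The seed is a hypothetical choice, used only to make the sketch reproducible.
    """
    ids = list(case_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Dealing every third shuffled case to the same group keeps the sizes balanced.
    return [ids[i::3] for i in range(3)]

# 516 complete cases split into three cross-validation groups.
groups = threefold_split(range(516))
sizes = [len(g) for g in groups]
```

With 516 cases, each group holds 172 cases, which matches the group size quoted in the text; each confirmatory model would then be estimated simultaneously across the three groups.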
Measurement loadings are imposed equally among all groups, and loss of fit is assessed. If no statistical loss is found, the following sequence of equality restrictions is imposed among groups until a significant loss is found or until the more restricted model is accepted: factor variances, factor covariances, structural weights (if defined), structural residuals (if defined), and measurement errors. The best-fitting model was reported. The following GOF criteria were used15: χ²/df less than 4; GOF index higher than 0.90; adjusted GOF index higher than 0.90; comparative fit index higher than 0.90; Tucker-Lewis index higher than 0.90; and root mean standard error of approximation less than 0.05. Owing to the sample size, χ² was expected to be significant in all cases, whereas the significance of the difference between nested models (Δχ²) was considered informative.

When more than three observed indicators (items) per dimension were available, additional indicator suitability information was assessed by estimating a regression model for the dimension indicators using the scores obtained from the SF-12 Psychological and Physical dimensions (two separate regression models). When possible, at least three items were retained per dimension to ensure local identification.15

After decisions had been made about the items retained for each dimension and subdimension within the shortened scale, two final CFA models were estimated: (6) a second-order CFA with six first-order

factors (Vasomotor Symptoms, Health, Aging, Psychological, Sexuality, and Couple Relations) and a second-order HRQoL dimension (similar to model 5 described previously); and (7) a hierarchical model with a general HRQoL dimension common to the four main scales (Menopause and Health, Psychological, Sexuality, and Couple Relations) and the Menopause and Health dimension split into three first-order factors (Vasomotor Symptoms, Health, and Aging). Model 7 would most resemble the theoretical structure, but it entails high complexity, whereas model 6 is a robust standard in CFA models and could be easily compared with the long version estimates. Scores were computed following the proposed correction algorithm. Mean scores for known groups were compared to assess discriminant validity. All analyses were carried out using IBM SPSS 20.0 software. RESULTS The effective sample size after deletion of missing values (discussed later) was composed of 516 women. The mean (SD) age of the sample was 57.2 (4.2) years, with a minimal age of 45 years and a maximum age of 64 years. Table 1 shows the main demographic descriptors. Missing responses None of the items in the 31-item scale had more than 5% missing responses. According to the rule of discarding items with more than 15% blank responses, none of the items could be discarded. Item 26 (equality within the couple) was the item with the highest nonresponse rate (3.5%). All 31 items had a unique modal category (with maximal response frequency) with diminishing frequency for adjacent categories. Given the low nonresponse rates obtained, it was concluded that all items were well understood and pertinent in this sample. A more detailed study of nonresponses showed that 82.4% of women answered all items and 10.9% left one item blank (not always TABLE 1. 
Demographic data

Age, mean (SD), y                      57.2 (4.2)
Educational level, n (%)
  None                                 16 (3)
  Primary                              162 (31)
  Secondary                            165 (32)
  Undergraduate                        86 (17)
  Higher                               87 (17)
Working status, n (%)
  Active                               301 (58)
  On leave                             10 (2)
  Disabled                             3 (1)
  Unemployed                           16 (3)
  Retired                              33 (6)
  Homemaker                            153 (30)
Habitat, n (%)
  Rural                                49 (10)
  Semiurban                            105 (20)
  Urban                                238 (46)
  Metropolitan                         122 (24)
Body mass index, mean (SD), kg/m2      25.1 (3.7)
Age at menarche, mean (SD), y          12.5 (1.4)

Rural, 10,000 inhabitants or less; semiurban, more than 10,000 to 30,000 inhabitants; urban, more than 30,000 to 200,000 inhabitants; metropolitan, more than 200,000 inhabitants.


the same one). Therefore, it was decided to discard all cases with missing values when complete information was needed, resulting in 516 cases.

Floor or ceiling effect

Most items exhibited a distribution of responses more or less centered on the middle categories. Items accumulating more than 50% of responses in the highest (or lowest) category were selected for deletion. None of the items exhibited a ceiling effect. Item 17 ("I think that others would be better off without me"), with 50.6% of responses in the lowest category, and item 24 ("Sometimes I think I wouldn't mind being dead"), with 61.4% of responses in the lowest category, exhibited a marked floor effect. Both items belong to the psychological well-being dimension and refer to depressive thoughts at their highest intensity; these were excluded.

Reliability analysis

The Menopause and Health dimension was originally composed of 15 items with good reported reliability (α = 0.850; Table 2). In the present sample, the scale attained a good α value of 0.877 (95% CI, 0.861-0.891). Adjusted item-total correlations remained around an r value of 0.5, except for item 20 (r = 0.338). Discarding item 20 would raise the scale reliability to an α value of 0.879. The following paragraphs report the behavior of the isolated subdimensions.

The Vasomotor Symptoms subdimension was originally composed of three items with good internal consistency (α = 0.868; 95% CI, 0.849-0.885). The adjusted item-total correlation was higher than 0.73 in all cases. Discarding any item would undermine the subdimension reliability, limiting it to an α value of 0.824 (upper bound).

The Health subdimension was composed of five items with acceptable internal consistency (α = 0.746; 95% CI, 0.712-0.776). The adjusted item-total correlation ranged from 0.4 to 0.6, with the lowest correlation being for item 1 (r = 0.431). Discarding any item would reduce reliability to an α value of 0.728 or lower.
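The reliability statistics used in this section (Cronbach's α, the adjusted item-total correlation, and α after discarding an item) can be sketched as below. This is a generic textbook formulation, not the authors' SPSS output; the function names and the five-respondent toy data are hypothetical and serve only to show the computation.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def adjusted_item_total(rows, j):
    """Correlation of item j with the total score computed without item j."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)

def alpha_if_deleted(rows, j):
    """Cronbach's alpha of the scale after discarding item j."""
    return cronbach_alpha([r[:j] + r[j + 1:] for r in rows])

# Hypothetical toy data: 5 respondents, 3 items scored 0-5.
toy = [[0, 1, 1], [2, 2, 3], [3, 3, 4], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(toy)
corrected = [adjusted_item_total(toy, j) for j in range(3)]
dropped = [alpha_if_deleted(toy, j) for j in range(3)]
```

Comparing `alpha` with each entry of `dropped` mirrors the statements in the text about whether discarding a given item would raise or lower the reliability of its subdimension.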
The Aging subdimension was composed of seven items with acceptable internal consistency (α = 0.743; 95% CI, 0.710-0.774). The adjusted item-total correlations were slightly lower than 0.5, with the lowest correlations for item 20 (r = 0.364) and item 31 (r = 0.343). Discarding any item would reduce reliability to an α value of 0.738 or lower. The item 20 metric is reversed relative to the rest of the subdimension items.

The Psychological dimension scale was composed of nine items with good reported reliability (α = 0.826). In our sample, after removing items 17 and 24 (described previously), the scale was reduced to seven items and reliability increased to an α value of 0.855 (95% CI, 0.837-0.873). The adjusted item-total correlations were all around an r value of 0.6. No additional item could be excluded without reducing reliability, which would be limited to an upper bound (α = 0.841).

The Sexuality scale was composed of four items with good to acceptable reported reliability (α = 0.799). In our sample, reliability was good (α = 0.829; 95% CI, 0.805-0.851). The adjusted item-total correlations were all higher than an r value of 0.6. Items in this scale are reversed. No item could be omitted without reducing reliability to an α value of 0.788 or lower.

The Couple Relations scale was originally composed of three items with good reported reliability (α = 0.826). In our sample, the reliability was also good (α = 0.811; 95% CI, 0.783-0.836). The adjusted item-total correlations were all higher than an r value of 0.6. Items in this scale are reversed. No item could be rejected without reducing reliability to an α value of 0.781 or lower.

Dimensionality analysis

The structure of the Menopause and Health scale was estimated jointly for the three subdimensions (Vasomotor Symptoms, Health, and Aging), assuming a second-order factor model with a common underlying latent HRQoL factor (Fig. 1), where the items were the observed indicators. GOF statistics were acceptable (Table 3). All factor loadings were significant (P < 0.001), with the lowest being those for item 20 (λ20,3 = −0.44), item 27 (λ27,3 = 0.47), and item 31 (λ31,3 = 0.41) in the Aging subdimension and for item 14 (λ14,2 = 0.56) in the Health subdimension.
The proportion of item-explained variance varies between r² values of 0.68 and 0.71. The Health subdimension exhibited more variability, with loadings ranging between 0.56 and 0.76. The proportion of item-explained variance was low for item 14 (r² = 0.31) and item 23 (r² = 0.36). The Aging subdimension had the lowest indicator loadings, varying (in absolute value) between 0.44 and 0.68. Consequently,

TABLE 2. Reliability estimates (Cronbach's α) for the Cervantes Short-Form Scale and the original scale

                         Short-form scale                  Original (long) scale
Dimension                Items   α       ICC 95% CI        Items   αa      αb
Menopause and Health     9       0.826   0.803-0.846       15      0.850   0.877
  Vasomotor Symptoms     (2)     0.800   0.766-0.830       (3)     0.906   0.868
  Health                 (3)     0.635   0.581-0.683       (5)     0.708   0.746
  Aging                  (4)     0.607   0.553-0.656       (7)     0.726   0.743
Psychological            3       0.752   0.716-0.785       9       0.836   0.855
Sexuality                2       0.742   0.697-0.780       4       0.799   0.829
Couple Relations         2       0.768   0.728-0.802       3       0.826   0.811
Total                    16      0.880   0.865-0.894       31      0.909   0.928

ICC, intraclass correlation coefficient.
a Original study estimates.
b Estimates obtained with the present 516-case sample.

the proportion of item-explained variance was very low for item 20 (r² = 0.19) and item 31 (r² = 0.17). These results suggest that the latter two subdimensions could be shortened without incurring measurement problems. The second-order HRQoL dimension was well represented by the three subdimensions. The highest amount of variance was shared by the Aging subdimension (r² = 0.93), followed by the Health (r² = 0.80) and Vasomotor Symptoms (r² = 0.43) subdimensions.

The structure of the Psychological dimension scale was defined as a single factor with seven indicators (figure not shown). All factor loadings were significant (P < 0.001), and GOF statistics were good or very good (Table 3). Factor loadings ranged between 0.61 and 0.78, confirming that they belong to a single common dimension. No clues were obtained regarding anomalous item behaviors, and all items could initially be retained as indicators of the latent factor.
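The link between the standardized loadings and the item-explained variances quoted above is that, for an indicator loading on a single factor, the explained variance is the square of the standardized loading. The short sketch below checks this against three values reported in the text for the stand-alone Menopause and Health model; the dictionary and function names are illustrative only, and the 0.41 loading is attributed to item 31 on the basis of its subscript.

```python
# Standardized loadings quoted in the text (stand-alone Menopause and Health model).
loadings = {"item 20": -0.44, "item 31": 0.41, "item 14": 0.56}

def explained_variance(loading):
    """Share of item variance explained by its factor: the squared standardized loading."""
    return loading ** 2

r2 = {item: round(explained_variance(value), 2) for item, value in loadings.items()}
```

The rounded results reproduce the reported r² values for items 20, 31, and 14, which is why items with loadings below about 0.5 in absolute value share less than a quarter of their variance with the factor.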

The structure of the Sexuality scale was defined as a single factor with four indicators (figure not shown). All factor loadings were significant (P < 0.001), and GOF statistics were very good (Table 3). Factor loadings ranged between 0.73 and 0.78, confirming that they belong to a single common dimension. All items behaved properly.

The structure of the Couple Relations scale was defined as a single factor with three indicators (figure not shown). All factor loadings were significant (P < 0.001), and GOF statistics were excellent (Table 3). Factor loadings ranged between 0.72 and 0.89, confirming that they belong to a single common dimension. All items behaved properly. Discarding any item would lead to problems of local identification. No shortening would be recommended.

To obtain a joint vision of the overall relationships among dimensions, we proposed an overall model in which the three

FIG. 1. Menopause and Health dimension confirmatory factor analysis. Standardized estimates. HRQoL, health-related quality of life; e, indicator observed; I, item; z, domain.

TABLE 3. Goodness-of-fit statistics for estimated structural equation models

Model                        χ²        df      P        χ²/df   GFI     AGFI    PGFI    CFI     PCFI    RMSEA
Isolated solutions
  Menopause and Health       726.7     328     <0.001   2.216   0.839   0.823   0.765   0.862   0.898   0.049
  Psychological              123.2     70      <0.001   1.760   0.937   0.925   0.781   0.960   0.999   0.038
  Sexuality                  55.7      22      <0.001   2.534   0.947   0.928   0.695   0.958   1.000   0.055
  Couple Relations           12.3      12      0.418    1.029   0.985   0.978   0.657   0.999   1.000   0.008
Global solutions
  Six-factor complete form   2,766.1   1,244   <0.001   2.224   0.718   0.704   0.684   0.796   0.813   0.049
  Six-factor short form      912.3     373     <0.001   2.446   0.822   0.806   0.752   0.843   0.843   0.053
  Hierarchical short form    930.0     373     <0.001   2.493   0.819   0.802   0.749   0.837   0.868   0.054

All models assumed that measurement errors were equal. GFI, goodness-of-fit index; AGFI, adjusted goodness-of-fit index; PGFI, parsimony goodness-of-fit index; CFI, comparative fit index; PCFI, parsimony comparative fit index; RMSEA, root mean standard error of approximation.

Menopause and Health subdimensions and the other three dimensions were all included as first-level factors measuring a common second-order factor (Fig. 2). This is a robust model that is also simpler to estimate afterward than the hierarchical model. The initial estimation found an improper solution for the variance estimate of the Psychological dimension in one of the cross-validation groups; thus, the error variance was set to the maximal estimate of the other two stable solutions (#44 = 0.04). After imposing the previously mentioned restriction, all three estimates were similar and technically acceptable. GOF indexes were modest (Table 3), suggesting the possibility of improving the model either by excluding unneeded variables or by simplifying the proposed structure. All estimated factor loadings were statistically significant (P < 0.001) and similar to those obtained in the corresponding stand-alone model. The lowest observed factor loadings were those for item 31 (λ31,1 = 0.39), item 20 (λ20,3 = −0.49), item 27 (λ27,3 = 0.45), and item 14 (λ14,2 = 0.51); hence, some items shared a small amount of variance with their first-order factor, that is, item 31 (r² = 0.15) and item 20 (r² = 0.24). The first-order dimensions with the highest loading in the common second-order dimension were the Health subdimension (γ21 = 0.99), Aging subdimension (γ31 = 0.95), and Psychological dimension (γ41 = 0.97), whereas the other three also showed high loading: Vasomotor Symptoms subdimension (γ11 = 0.61), Couple Relations dimension (γ61 = −0.52), and Sexuality dimension (γ51 = −0.41).

Convergent validity

Additional evidence on psychometric properties was explored using items as predictors of the SF-12 dimension scores. It was assumed that good items should not only be consistent with other scale items belonging to the same dimension, but they should also be able to predict convergent HRQoL scores.
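As an arithmetic aside on the fit statistics cited from Table 3, the χ²/df ratio can be recomputed from the reported χ² and df values and compared with the criterion of less than 4 stated in the Methods. The sketch below is only a consistency check; the variable names are illustrative, and small differences from the published ratios are expected because the χ² values are reported to one decimal place.

```python
# (chi-square, degrees of freedom) as reported in Table 3.
models = {
    "Menopause and Health": (726.7, 328),
    "Psychological": (123.2, 70),
    "Sexuality": (55.7, 22),
    "Couple Relations": (12.3, 12),
    "Six-factor complete form": (2766.1, 1244),
    "Six-factor short form": (912.3, 373),
    "Hierarchical short form": (930.0, 373),
}

CUTOFF = 4.0  # criterion stated in the Methods: chi-square/df less than 4

ratios = {name: round(chi2 / df, 3) for name, (chi2, df) in models.items()}
meets_criterion = {name: ratio < CUTOFF for name, ratio in ratios.items()}
```

Every model falls below the cutoff, and the recomputed ratios for the three global solutions show that the short forms have a somewhat higher χ²/df than the complete form while remaining well within the stated criterion.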
However, SF-12 dimensions (Physical and Mental Health) showed considerable correlation (r = 0.554); thus, items could correlate with both dimensions and not only with the one conceptually closest. Table 4 shows item correlations with SF-12 dimensions, along with standardized coefficients and significance, when items were used as predictors of the SF-12 dimension scores. Regression models were estimated using the


Cervantes subdimension item parcels to preserve the original structure. Items that correlated significantly with the SF-12 dimension that was not theoretically congruent, as well as items with nonsignificant regression weights, were proposed for deletion. Although not reported here, items were deleted from the model using a stepwise strategy because multicollinearity could affect the regression weights of items remaining in the model.

Correlations of the three items in the Vasomotor Symptoms subdimension with both SF-12 dimensions were similar (Table 4). The Vasomotor Symptoms subdimension could be considered a Physical dimension (although associated with great psychological impairment). Item 9 showed the smallest correlation with the Physical dimension, and it also was the worst predictor in this model. However, discarding either item 9 or item 3 from the model led to a regression model where the two remaining items were significant (P < 0.05) in both dimensions. It was decided to retain items 9 and 29 because they were less redundant from a conceptual point of view.

Correlations of the five items in the Health subdimension with SF-12 dimensions were similar (Table 4). The Health subdimension could be considered a Physical dimension (albeit with psychological impact); in fact, correlations are slightly higher with this dimension. The items with the highest correlation were items 1, 11, and 23, and they also had significant regression weights. Therefore, all three were retained.

The Aging subdimension consists of seven items (with item 20 reversed) and could be considered a Physical dimension. The items with the highest correlations were items 25, 20, and 18; all but item 27 had a significant regression weight. Along with item 27, item 20 was discarded because of the inverted metric and because it had the lowest factor loading in the CFA model. After the stepwise deletion of variables, items 7, 18, 25, and 31 were retained.
The Psychological dimension is composed of nine items, and it could be related to the SF-12 Mental Health dimension. Items 19, 6, 28, and 2 showed the highest correlations with the Mental Health dimension, whereas items 19, 6, and 2 had significant regression weights. Therefore, these last three items were retained.

The Sexuality dimension is composed of four items, and it could be considered a Mental Health dimension. Items 15 and


FIG. 2. Original scale six first-order factors model. Standardized estimates. HRQoL, health-related quality of life; e, indicator observed; I, item; z, domain.

30 were the items with the highest correlation and significant weight with the Mental Health dimension. These two items were retained. The Couple Relations dimension consists of three items, and it should be considered a Mental Health dimension. Items 13 and 26 were the items with the highest correlation with the Physical dimension, whereas items 13 and 8 were the items correlating highest with the Mental Health dimension and having significant regression weights. Items 13 and 8 were retained.

Psychometric properties of the abridged version of the scale

Table 5 shows the item numbers kept in the short-form scale and the dimension they belong to, and the scoring algorithm needed to obtain a score in the range 0 to 100 in all cases. A total

of 16 items were retained; at the lowest disaggregation level, subdimensions were composed of two to four items. We assumed that the short form would always be presented with all items and that future structural assessments would also be carried out with all items. Otherwise, those dimensions with only two items would be underidentified. The internal consistency for the overall short form was good (α = 0.880), although a generalized reduction of individual dimension reliabilities was observed (Table 2). The individual dimension with the highest reliability was Menopause and Health (Cronbach’s α = 0.826), whereas other dimensions and subdimensions did not behave so well. This tends to be the result when reducing to such a small number of items and was therefore not unexpected. Furthermore, the internal consistency value for the overall score captures the effect



of combining different dimensions that do not overlap in a combined score; thus, the internal consistency figures are not so good.

To check that the original structure was preserved in the short form, we estimated two different CFA models: a six first-order factors solution with one common second-order factor (similar to the one carried out with all items) and a hierarchical structure in which the Menopause and Health dimension was split into its three subdimensions (Figs. 3 and 4, respectively). As derived from the preceding analyses, the six first-order dimensions were composed of four, three, or two observed indicators (Fig. 2). To make the model identifiable, we imposed three restrictions on error variances based on the stable solutions (#22 = 0.03, #33 = 0.02, #44 = 0.03). The estimated solution obtained good fit, and most GOF values improved with respect to the corresponding model using the complete questionnaire (Table 3). Factor loadings were all significant (P < 0.001) and did not differ substantially from those obtained previously for the complete questionnaire. Once more, the dimensions with the highest shared variance with the second-order factor were the Psychological dimension (r² = 0.97), Aging subdimension (r² = 0.97), and Health subdimension (r² = 0.94). The other three first-order factors shared a smaller amount of variance: Vasomotor Symptoms (r² = 0.38), Couple Relations (r² = 0.21), and Sexuality (r² = 0.26).

The more complex hierarchical structures obtained GOF statistics very similar to those of the two-level structure for the short form. Because the number of degrees of freedom is equal in both models, no statistical test could be carried out to compare them, and we have to conclude that they fit the existing data equally well (Table 3). As before, the Menopause and Health dimension and the Psychological dimension obtained the highest loadings (0.97 and 0.99, respectively), dominating the information gathered in the general common HRQoL factor (Fig. 4). Sexuality (−0.49) and Couple Relations (−0.54) attained relatively lower loadings. Within the Menopause and Health dimension, the Vasomotor Symptoms subdimension was once again less prominent than the Health subdimension or the Aging subdimension.

TABLE 4. Correlation (r), standardized regression coefficient (β), and significance (P) of individual items with SF-12 dimensions

                          SF-12 Physical dimension    SF-12 Mental Health dimension
Dimension                 r        β        P         r        β        P
Menopause and Health
  Vasomotor Symptoms
    Item 3                −0.264   −0.115   0.056     −0.236   −0.064   0.291
    Item 9^a              −0.242   −0.070   0.223     −0.249   −0.130   0.026
    Item 29^a             −0.274   −0.150   0.011     −0.250   −0.116   0.048
  Health
    Item 1^a              −0.377   −0.200   <0.001    −0.356   −0.205   <0.001
    Item 5                −0.263    0.034   0.432     −0.247    0.009   0.836
    Item 11^a             −0.435   −0.260   <0.001    −0.391   −0.222   <0.001
    Item 14               −0.266   −0.036   0.352     −0.270   −0.077   0.059
    Item 23^a             −0.427   −0.259   <0.001    −0.358   −0.170   <0.001
  Aging
    Item 7^a              −0.431   −0.085   0.016     −0.327   −0.109   0.010
    Item 16               −0.423   −0.075   0.035     −0.310   −0.044   0.304
    Item 18^a             −0.424   −0.142   <0.001    −0.324   −0.096   0.017
    Item 20                0.487    0.238   <0.001     0.391    0.230   <0.001
    Item 25^a             −0.622   −0.385   <0.001    −0.394   −0.162   <0.001
    Item 27               −0.247   −0.003   0.926     −0.276   −0.108   0.007
    Item 31^a             −0.239   −0.093   0.005     −0.213   −0.086   0.029
Psychological
    Item 2^a              −0.356   −0.040   0.322     −0.380   −0.148   0.001
    Item 6^a              −0.419    0.008   0.850     −0.423   −0.117   0.013
    Item 10               −0.394   −0.025   0.547     −0.345    0.005   0.914
    Item 12               −0.409   −0.092   0.019     −0.300   −0.002   0.960
    Item 19^a             −0.573   −0.355   <0.001    −0.485   −0.296   <0.001
    Item 21               −0.478   −0.182   <0.001    −0.342   −0.051   0.246
    Item 28               −0.465   −0.128   0.002     −0.384   −0.086   0.060
Sexuality
    Item 4                 0.235    0.018   0.720      0.277    0.030   0.557
    Item 15^a              0.377    0.248   <0.001     0.365    0.203   <0.001
    Item 22                0.212   −0.065   0.209      0.317    0.100   0.053
    Item 30^a              0.368    0.254   <0.001     0.358    0.161   0.002
Couple Relations
    Item 8^a               0.380    0.129   0.006      0.334    0.143   0.004
    Item 13^a              0.441    0.222   <0.001     0.389    0.267   <0.001
    Item 26                0.425    0.217   <0.001     0.305    0.057   0.255

Correlation r between SF-12 dimensions equals 0.554. SF-12, 12-item Short-Form Health Survey.
^a Items retained in the abridged version.

Correction method

The correction method needed to obtain the dimension and overall scores for the abridged version can be found in Table 5.


TABLE 5. Cervantes Short-Form Scale scoring algorithm using original item numbers and short-form item numbers

                                         Scoring algorithm
Dimension              Shorthand  Items  Original version^a               Short version^b
Menopause and Health   MS         9      MS = (VM + SL + EN) / 3          MS = (VM + SL + EN) / 3
  Vasomotor Symptoms   VM         2      VM = (X9 + X29) × 10             VM = (X1 + X2) × 10
  Health               SL         3      SL = (X1 + X23 + X11) × 6.67     SL = (X3 + X4 + X5) × 6.67
  Aging                EN         4      EN = (X7 + X18 + X25 + X31) × 5  EN = (X6 + X7 + X8 + X9) × 5
Psychological          PS         3      PS = (X2 + X6 + X19) × 6.67      PS = (X10 + X11 + X12) × 6.67
Sexuality              SX         2      SX = 100 − (X15 + X30) × 10      SX = 100 − (X13 + X14) × 10
Couple Relations       RP         2      RP = 100 − (X8 + X13) × 10       RP = 100 − (X15 + X16) × 10
Total                  TOT        16     TOT = (MS + PS + SX + RP) / 4    TOT = (MS + PS + SX + RP) / 4

^a Items numbered as in the original instrument.
^b Items numbered as in the short-form version.
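As a rough illustration of the short-version column of Table 5, the scoring rules can be sketched in Python. Several points are assumptions rather than statements from the source: each item response is taken to be coded 0 to 5 (which is what the multipliers 10, 6.67, and 5 imply for a 0 to 100 range), the multiplier 6.67 is treated as a rounded 100/15, and the function name `score_short_form` and the dictionary input format are hypothetical.

```python
MAX_RESPONSE = 5  # assumed highest response category (items coded 0-5)

def _scaled_sum(x, items):
    """Sum the listed items and rescale the sum to a 0-100 metric."""
    return sum(x[i] for i in items) * 100.0 / (MAX_RESPONSE * len(items))

def score_short_form(x):
    """Score the 16-item short form.

    x maps short-form item numbers (1..16) to responses.
    Returns a dict keyed by the shorthand labels used in Table 5.
    """
    vm = _scaled_sum(x, (1, 2))           # Vasomotor Symptoms
    sl = _scaled_sum(x, (3, 4, 5))        # Health
    en = _scaled_sum(x, (6, 7, 8, 9))     # Aging
    ms = (vm + sl + en) / 3               # Menopause and Health
    ps = _scaled_sum(x, (10, 11, 12))     # Psychological
    sx = 100 - _scaled_sum(x, (13, 14))   # Sexuality (reversed)
    rp = 100 - _scaled_sum(x, (15, 16))   # Couple Relations (reversed)
    tot = (ms + ps + sx + rp) / 4         # overall score
    return {"MS": ms, "VM": vm, "SL": sl, "EN": en,
            "PS": ps, "SX": sx, "RP": rp, "TOT": tot}
```

To score historic databases that use the original item numbers, the same function could be applied after renaming items according to the original-version column of Table 5.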

Two algorithms are offered (one using consecutive item numbers from the proposed abridged version, labeled short version, and the other using item numbers as they appear in the long version, labeled original version) for those cases in which scores for the abridged version need to be recalculated from historic databases. Scores are all standardized to a 0 to 100 metric, where higher scores represent worse quality of life. Depending on researchers' needs, three possible sets of scores may be computed: an overall summary score including all items, a set of four dimension scores, and an additional unfolding in three subdimensions of the Menopause and Health dimension.

DISCUSSION

This secondary analysis has shown that it is possible to establish a shortened form of the Cervantes Scale to measure HRQoL in perimenopausal and postmenopausal women. In accordance with the requirements of classical test theory, item analysis, reliability, structural validity, and concurrent validity

FIG. 3. Short scale six first-order factors model. Standardized estimates. e, indicator observed; I, item; z, domain.




FIG. 4. Short-form scale hierarchical model. Standardized estimates. e, indicator observed; I, item; z, domain.

were studied to ensure that the new short version retains the original psychometric properties. Items with a marked floor effect were deleted because they actually do not discriminate enough between different women and responses collapse in the lower part of the metric, with more than half of women obtaining the same score in these items. Such items are permissible only in longer instruments where there is a need to fully segregate those women in a more severe stage of the condition. Internal consistency has also been ensured, with special focus on the overall score. The Cronbach’s α value for the total score is good (α = 0.880), and it is higher than 0.74 for the individual dimensions. Cronbach’s α assumes that the scale being studied is unidimensional, with all items measuring the same concept or trait in a consistent way. This is why α should be computed for each dimension. However, because all dimensions in this questionnaire are interrelated, the overall composite score retains good reliability properties. Nevertheless, effort should be made to retain all concepts that are important for measuring the condition of interest, and decisions regarding item reduction should not be made based on reliability alone (α, test-retest, or intraclass correlation coefficient [ICC]). Unfortunately, reliability


diminishes quickly as the length of a scale is reduced, and there is always a tradeoff between shortening an instrument and retaining good reliability. In this case, reliability values obtained with the short form are better than those obtained with culturally adapted versions of the long form.20 Content validity was inherited from the processes used by the original authors for determining those concepts that are relevant to measuring the condition and important for the woman. We ensured content validity by preserving all dimensions in the original instrument.9 A closely related issue is structural validity. As is the case with most HRQoL instruments, quality of life is a multidimensional concept that involves the measurement of different dimensions or factors. Efforts have been made to combine all relevant dimensions in a multiattribute model using one single question for each attribute or concept, as was done in the EuroQoL project21 and the Health Utilities Index-3 questionnaire.22,23 However, it is more common to measure each dimension with various items to ensure that measurement error estimates are obtained and distinguished from unique item variance (unicity).24 No matter which approach is taken, it is important that all critical concepts be included in the combined measure. In the item reduction process, we were able to ensure that the original structure, containing four relevant dimensions, was maintained: (1) Menopause and Health, (2) Psychological, (3) Sexuality, and (4) Couple Relations. Furthermore, the first dimension has been divided into three subdimensions: Vasomotor Symptoms, Health, and Aging. The individual dimensions exhibited good fit statistics in their original extended format, demonstrating that the instrument is well defined, using a confirmatory task that the original authors did not include in their seminal work. 
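The internal consistency statistic discussed above, Cronbach's α, is computed from the item variances and the variance of the summed score. The following Python sketch uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total); the function name `cronbach_alpha` and the row-per-respondent input layout are illustrative assumptions, not part of the original analysis.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list per respondent, each holding that respondent's
    k item scores (k >= 2, no missing values).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed (total) score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because the factor k/(k − 1) and the share of variance due to item covariances both depend on the number of items, removing items from a dimension tends to lower α, which is the tradeoff between brevity and reliability described in the text.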
With respect to the short form, whether the proposed structure is defined as a six first-order factors model or as a hierarchical model, the resulting GOF is good or even better than with the complete form in some cases. We can conclude that the original dimensions and subdimensions are preserved in the short version, but we have to remain aware that the shortened version, although covering all original concepts, is less appropriate than the extended version for measuring the isolated dimensions. Some evidence of adequate convergent validity has been found because items were selected based on their capacity to predict HRQoL dimensions, as measured by the SF-12. Even though the original SF-12 Physical and Mental Health dimensions were developed to be independent (r = 0.06),25 in the present sample, a rather high correlation between dimensions was observed (r = 0.55), making it difficult to assign each item to a single dimension. However, this finding also supports the need to use both Psychological and Physical dimensions to measure HRQoL properly. A similar proposal by other researchers for abridging the Cervantes Scale using a Colombian population was found in the literature.26 In that study, the original Cervantes Scale was reduced to a 10-item version based mainly on reliability considerations. However, several drawbacks have been identified in that study. First, the authors made no effort to maintain the original dimensional structure of the Cervantes Scale. Second, the Sexuality dimension was removed in its entirety and



considered nonrelevant to measuring HRQoL in perimenopausal and postmenopausal women. Third, items with a marked floor effect (five items represented 64%-79% of responses in the lowest response category) were retained. Fourth, only individual ICCs were considered for deletion, but no check for unidimensionality was performed to ensure the reliability of the ICC statistics. Fifth, no convergent validity measures with other valid questionnaires were performed. Sixth, a study that had recruited one third of women before menopause was used as the data source. In fact, the authors mentioned that “the 10-item version provides quick menopausal symptom assessment,” but not specifically quality of life. It is important to bear in mind that the original Cervantes Scale was developed “to assess the impact of menopause on couple relations”9 based on previous findings derived from the development of the Menopausia y Calidad de Vida (MENCAV) scale,27 where the importance of couple relations in the demographic structure of the Spanish society was identified. At the time of development, inclusion of the Sexuality and Couple Relations dimensions was deemed to be a central aspect differentiating the Cervantes Scale from other existing instruments.28
