Reliability, validity, and methodological issues in assessing physical activity in a cross-cultural setting.

Research Quarterly for Exercise and Sport

ISSN: 0270-1367 (Print) 2168-3824 (Online) Journal homepage: http://www.tandfonline.com/loi/urqe20

Louise C. Mâsse

To cite this article: Louise C. Mâsse (2000) Reliability, validity, and methodological issues in assessing physical activity in a cross-cultural setting, Research Quarterly for Exercise and Sport, 71:sup2, 54-58, DOI: 10.1080/02701367.2000.11082787

To link to this article: http://dx.doi.org/10.1080/02701367.2000.11082787

Published online: 13 Feb 2015.


Masse

Research Quarterly for Exercise and Sport ©2000 by the American Alliance for Health, Physical Education, Recreation and Dance Vol. 71, No. 2, pp. 54-58



Key words: psychometric properties, questionnaire, cross-cultural differences, physical activity

Few validation studies have included various cultural groups or provided psychometric estimates for these groups. Approximately 19% of the 31 questionnaires reviewed by Pereira et al. (1997) included a significant proportion (at least 50%) of participants of varying ethnic or racial backgrounds in their validation. Unfortunately, in the majority of the studies, the psychometric estimates were pooled when multiple ethnic or racial groups were involved. In addition, no study has assessed all types of cross-cultural equivalence, defined as structural, measurement-unit, and scalar equivalence. Structural equivalence occurs when the instruments have the same psychometric properties. Measurement-unit equivalence occurs when the instruments have comparable origins; different origins may occur when cultural groups have different interpretations of common physical activity terms. Finally, scalar equivalence occurs when instruments have both a common origin and identical units of measurement (van de Vijver & Leung, 1997). Equivalence is a major concern for making valid cross-cultural comparisons.

Dr. Kriska identified a number of issues that researchers need to address when developing and modifying current physical activity questionnaires for various cultural groups. As a complement to Dr. Kriska's paper, this paper discusses a number of methodological issues related to assessing and comparing the physical activity behaviors of various cultural groups. Specifically, the current paper will identify: 1) the issues investigators encounter when making valid cross-cultural comparisons, and 2) the methodological considerations that need to be addressed to adequately demonstrate the cross-cultural equivalency of physical activity questionnaires. Because anthropologists do not agree as to how culture should be operationalized, in this paper the term "cross-cultural comparisons" will be used to refer broadly to subgroup comparisons of (but not limited to) age, gender, ethnicity, economic status, and education.

Louise C. Mâsse is with the School of Public Health, Center for Health Promotion Research and Development at The University of Texas-Houston.

Issues related to making valid cross-cultural comparisons

Quasi-experimental design

Cross-cultural comparison studies are by default quasi-experiments, which means that participants are not randomly assigned to a subgroup, in this case a cultural group (van de Vijver & Leung, 1997). Even if participants are randomly selected, cultural membership is predetermined. Randomization implies that the groups are equivalent, on average, except for the variable that is manipulated, which allows valid group comparisons. However, because cultural membership cannot be manipulated, researchers must demonstrate that the groups are comparable by ensuring the viability of the comparison from a methodological standpoint. Although some group differences may be statistically corrected, the value of a strong methodological design cannot be overstated given the limitations of statistical approaches to correct for initial group differences. Some of these limitations are discussed later.

RQES: June 2000



Truncated comparisons

As indicated by Ragin and Hein (1993), many comparative analyses in the social sciences have made truncated comparisons that are of little value in broadening our understanding of cultural differences. Truncated comparisons attempt, for example, to show that physical activity behaviors differ among racial and ethnic groups, but they do not include enough demographic variables to demonstrate that the observed cultural differences are valid and not sample specific. Most importantly, a truncated comparison does not decompose the significant difference to determine which aspect of culture may explain it, nor does it rule out other plausible hypotheses, for example, whether the observed differences are related to ethnicity or to social class. Currently, no study has attempted to identify which component of culture is responsible for the observed differences in physical activity behaviors, nor has any study assessed the interactive effects of important cultural indicators. I concur with Dr. Kriska: to advance our cross-cultural understanding, it is important that we assess, for example, whether ethnicity is a substitute for social class or whether ethnicity and social class interact.

Elaborated comparisons

To produce valid cross-cultural comparisons, participants in the different cultural groups must be comparable, that is, have some similarities, which requires researchers to elaborate on the groups' differences (Ragin & Hein, 1993). Ideally, the groups should be similar in all areas other than the dimension of culture that is to be compared. If we assume that two groups differ by age, education, income, marital status, occupation, and socioeconomic status and that all six variables are trichotomized, there are 729 possible ways in which the groups may differ. Although this example does not elaborate on all possible between-group differences, it indicates that cross-cultural differences may in some instances be due to other between-group differences, and it emphasizes the complexity and challenges associated with conducting elaborated comparisons.

While it is critical to account for between-group differences in cross-cultural comparison studies, it is also important to be concerned about within-group differences. For example, the typical ethnic categorizations (Asian, Black, Hispanic, White, and others) do not account for within-group differences. This categorization does not account for other dimensions of ethnicity, nor does it consider that Hispanics come from two dozen countries and that a Hispanic person might be pure Spanish or a mixture of Spanish blood with Native American, African, German, or Italian, to name a few (Robinson, 1998). Helping researchers better define the ethnic groups they have studied requires that we conceptually agree on the definition of cultural terms. Because the conceptualization of cultural terms is continuously being debated, it is difficult for researchers to define and operationalize such terms (Cooper & Denner, 1998). Although a number of theories have attempted to conceptualize how culture interacts with psychological and behavioral processes (Cooper & Denner, 1998), empirical support for these theories is lacking. Therefore, more empirical studies are needed because the success of comparative studies hinges on the empirical validation of current theories or the elaboration of viable theories.

In practice, elaborated comparison studies are difficult if not impossible to conduct. Fortunately, a number of methodological and statistical strategies may make elaborated comparisons more feasible to implement. These strategies include, but are not limited to, the use of covariates and macrovariables. Because these two strategies are often used, they are discussed in turn.
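The count of 729 in the trichotomized example above follows from six variables with three levels each (3 to the power 6). A minimal sketch in Python makes the enumeration explicit; the level labels are hypothetical and are not taken from the paper.

```python
from itertools import product

# The six demographic variables named in the text.
variables = [
    "age", "education", "income",
    "marital status", "occupation", "socioeconomic status",
]

# Hypothetical labels for the three levels of a trichotomized variable.
levels = ("low", "middle", "high")

# Every combination of one level per variable is one possible profile.
profiles = list(product(levels, repeat=len(variables)))
n_profiles = len(profiles)

print(n_profiles)  # 729, i.e. 3 ** 6
```

Adding a seventh trichotomized variable would triple the count again, which is one way to see why fully elaborated comparisons quickly become unmanageable.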

Covariates. Perfect matches between groups are rarely achieved in practice, so a number of variables may need to be used as covariates. The use of covariates requires that: 1) the viability of the analysis of covariance be determined; and 2) the inclusion of certain covariates be empirically or theoretically justified. First, from a statistical perspective, analyses of covariance may be misleading if the assumption of parallel regression lines within the groups is violated (Lord, 1967). For example, a covariate cannot equate the groups if the analyst attempts to correct for group differences in education when one group has only high-education participants and the other group has only low-education participants. Second, a successful cross-cultural comparison study lies in the empirical and theoretical elaboration of viable covariates or of variables that may be kept constant through experimental manipulation. Physical activity researchers are encouraged to provide the empirical or theoretical justification that has guided the inclusion of certain covariates, because this may help us understand how the covariates interact with certain components of culture and physical activity behavior.
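The two statistical concerns above (non-parallel regression lines and groups with no shared covariate range) can be screened descriptively before an analysis of covariance is run. The sketch below is illustrative only: the data are invented, the helper names `ols_slope` and `covariate_overlap` are hypothetical, and comparing slopes by eye is not a formal test of the parallel-slopes assumption.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y regressed on x within one group."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def covariate_overlap(x_a, x_b):
    """Return the shared covariate range (low, high), or None if there is none."""
    low = max(float(np.min(x_a)), float(np.min(x_b)))
    high = min(float(np.max(x_a)), float(np.max(x_b)))
    return (low, high) if low <= high else None

# Invented data: years of education and a physical activity score.
edu_a = np.array([8, 9, 10, 11, 12])      # low-education group only
act_a = np.array([10, 12, 14, 16, 18])
edu_b = np.array([16, 17, 18, 19, 20])    # high-education group only
act_b = np.array([30, 31, 32, 33, 34])

slope_a = ols_slope(edu_a, act_a)         # within-group slope for group A
slope_b = ols_slope(edu_b, act_b)         # within-group slope for group B
overlap = covariate_overlap(edu_a, edu_b)

print(slope_a, slope_b, overlap)
```

In this invented case the within-group slopes differ and the education ranges do not overlap, so adjusting for education would rest on extrapolation rather than on comparable participants.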

Macrovariables. Macrovariables map the characteristics of many variables into a single variable (Ragin & Hein, 1993). For example, social class may be construed as a macrovariable because it is believed to aggregate education, occupation, and income, among other factors. Macrovariables are essential to obtain manageable elaborated comparisons.




However, the success of cross-cultural studies hinges on our ability to conceptually defend the viability of these macrovariables. Even more challenging is ensuring that a macrovariable measures the same concept in different cultural groups. It is important that researchers begin to explicitly articulate their assumptions about macrovariables in research papers and that we begin to empirically validate these macrovariables.
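As one illustration of how several indicators might be mapped into a single macrovariable, the sketch below averages standardized scores for education, occupation, and income into a social class index. This is an assumed operationalization chosen for simplicity, not a method proposed in the paper; the data, the variable names, and the equal weighting are all hypothetical.

```python
import numpy as np

def zscore(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

# Invented indicator values for four participants.
education_years = [10, 12, 16, 18]
occupation_rank = [1, 2, 3, 4]        # hypothetical ordinal occupational ranking
income_thousands = [20, 35, 60, 85]

# Equal-weight composite: an assumption that would itself need to be defended.
social_class = np.mean(
    [zscore(education_years), zscore(occupation_rank), zscore(income_thousands)],
    axis=0,
)

print(social_class)
```

The equal weights encode exactly the kind of assumption the text asks researchers to state explicitly, and nothing in the calculation shows that the index means the same thing in each cultural group.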

Methodological considerations to empirically demonstrate cross-cultural equivalency

As previously mentioned, few studies have demonstrated the cross-cultural equivalency of physical activity questionnaires. Demonstrating equivalency is important for making valid cross-cultural comparisons. This section discusses three methodological considerations that matter when demonstrating the equivalence of questionnaires: 1) conceptualization of physical activity constructs; 2) selection of comparable validation samples; and 3) statistical approaches to demonstrate the equivalency of composite scores.

Operationalization of physical activity constructs

Fundamentally, it is important to evaluate how physical activity constructs are conceptualized. Cross-cultural researchers define constructs that have a universal interpretation as etic constructs, whereas constructs that are group-specific are referred to as emic constructs (Marin & Marin, 1993). A true etic construct has a universal interpretation of the dimension of physical activity it measures and uses the same activities to measure that dimension. However, the inclusion of group-specific activities or dimensions is very informative and is believed to provide a clearer picture of physical activity behaviors. Failing to include group-specific activities or dimensions may underestimate a group's total physical activity. Because individuals or groups are expected to engage in different activities, it seems natural that different cultural groups may have group-specific activities. However, valid comparisons can still be made if the groups share a common interpretation of a dimension (construct) even though some activities are group-specific. These dimensions may still be conceived as etic constructs or dimensions, assuming that only a small proportion of the activities are group-specific. Finally, some dimensions of physical activity may have a unique interpretation by group and would capture activities that are performed exclusively by one cultural group. Such emic dimensions may arise in international comparisons; they are important for capturing all physical activity energy expenditure, but cross-cultural comparisons on these dimensions would not be possible.

As indicated by Dr. Kriska, a number of studies have indicated that adding group-specific activities helps capture the physical activity patterns of various cultural groups. However, it is important to make sure that adding group-specific activities does not artificially increase estimates of physical activity energy expenditure. Most importantly, studies that empirically verify the universal interpretation of these constructs are needed. Many studies implicitly assume in their developmental process that physical activity constructs have a universal interpretation. As indicated by Marin and Marin (1993), it is inadequate to develop an instrument for a given group and then assume that the instrument can be translated for use with another cultural group. Similarly, having a cultural group review an instrument does not necessarily make the constructs culturally sensitive. In many circumstances, both processes, translating and reviewing, do not allow the possibility of discovering different ways of conceptualizing or interpreting these constructs but rather implicitly impose a given formulation for them. Cultural appropriateness needs to be present at all levels of research, including when planning the study, developing the instrument, and interpreting the results (Marin & Marin, 1993). Studies that specifically include cultural groups at the initial stage of development are particularly needed.

Sampling issues

Demonstrating cross-cultural equivalence of the psychometric properties requires that comparable validation samples be employed. Structural equivalence is often assessed by correlating a questionnaire with an external criterion (van de Vijver & Leung, 1997), which means that many researchers have relied on the Pearson product-moment correlation to determine the validity of instruments. However, it is well known that restriction of range has an impact on the correlation coefficient and therefore may mask the psychometric properties of an instrument. In raw-score regression, the correlation ($r_{xy}$) may be written as a function of the standard deviations of $x$ ($\sigma_x$) and $y$ ($\sigma_y$) and the slope ($b$), i.e., $r_{xy} = b\,(\sigma_x / \sigma_y)$. Thus, any change in the distribution of $x$ or $y$ affects the correlation coefficient. For instrument development, this means that if one sample includes only participants who have low and moderate levels of physical activity and the other sample has low, moderate, and high levels of physical activity, the correlation coefficient in the first sample would tend to be lower. However, this reduction would not necessarily indicate that the psychometric properties differ between the samples.

Consideration of the distribution of validity samples is particularly relevant when the psychometric properties of various cultural groups are compared, since we know that some cultural groups appear to be less active (Crespo, Keteyian, Heath, & Sempos, 1996; U.S. Department of Health and Human Services, 1996; Diez-Roux, Northridge, Morabia, Bassett, & Shea, 1999). Thus, to answer the question posed by Dr. Kriska, "Are low levels of physical activity observed among cultural groups in our national data sets due to measurement errors?", it appears that little is known about this issue because few validation studies have used stratified validation samples or validation samples that have comparable distributions among cultural groups. Therefore, more comparative psychometric studies that employ stratified validation samples are needed. At the very least, researchers need to demonstrate that the samples have similar distributions and enough variability in physical activity behaviors to obtain valid psychometric values.
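The restriction-of-range argument above can be illustrated with a small simulation. The sketch below uses invented data: a criterion measure x, a questionnaire score y generated with a true slope of 1 plus random error, and a "restricted" sample that keeps only the less active half of participants. The variable names, the sample size, and the error variance are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is repeatable
n = 5000

criterion = rng.normal(0.0, 1.0, n)                    # x: hypothetical criterion
questionnaire = criterion + rng.normal(0.0, 1.0, n)    # y = b*x + error, with b = 1

def slope_and_r(x, y):
    """Return the raw-score slope of y on x and the Pearson correlation."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    r = np.corrcoef(x, y)[0, 1]
    return float(b), float(r)

# Full sample: low, moderate, and high activity levels.
b_full, r_full = slope_and_r(criterion, questionnaire)

# Restricted sample: only participants below the median on the criterion.
keep = criterion < np.median(criterion)
b_restricted, r_restricted = slope_and_r(criterion[keep], questionnaire[keep])

# The identity r = b * (sd_x / sd_y) holds in any sample.
identity_gap = abs(
    r_full - b_full * np.std(criterion, ddof=1) / np.std(questionnaire, ddof=1)
)

print(b_full, r_full)
print(b_restricted, r_restricted)
```

Under these assumptions the slope is similar in both samples while the correlation is clearly lower in the restricted one, which is the pattern that can make an instrument look less valid in a less active group even when it behaves the same way.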

Statistical methods

Multidimensional scaling and factor analysis procedures have been used to assess structural equivalence of affective tests (i.e., psychological and emotional tests). However, for the assessment of physical activity behaviors, these procedures cannot be used because the activities within and between the dimensions of physical activity are not inter-correlated. For example, it cannot be assumed that leisure activities are correlated with work or household behaviors, so alternative procedures are needed to determine the equivalency of composite scores. These procedures would assess whether individuals who have similar physical activity energy expenditures but belong to two different cultural groups have similar physical activity profiles. If they do not, it would indicate that the composite scores obtained by aggregating energy expenditures from all physical activity dimensions are not comparable. Furthermore, it would indicate that a common interpretation of these composite scores may not be valid.

Physical activity data are often aggregated at two levels: 1) by dimensions (occupation, home chores, physical recreation, leisure activities, caregiving activities, social-community-church involvement, and transportation); and 2) to include all activities. Within-dimension differences are expected, and for cross-cultural comparisons, it appears most important to determine the equivalence of aggregated dimensions (i.e., all activities aggregated to assess total energy expenditures) because it has implications for correlational and predictive studies. For example, when the relationship between stress and physical activity is compared in two cultural groups, one may find a moderate relationship in the first group and no relationship in the other group, even though the groups have similar total physical activity energy expenditures. However, when the physical activity dimensions are examined, the first group might be found to derive its physical activity from exercise whereas the other group might derive it from work. This simplified example shows that even though the groups had similar total energy expenditures, their activity profiles differed, suggesting that aggregated group comparisons are not valid. This is not to say that meaningful comparisons are impossible when one group is less active than another; rather, differential-composite interpretation results when those who have comparable total energy expenditures but belong to two different cultural groups have different activity profiles.

Dr. Kriska presented a number of cross-cultural comparison studies that assessed the relationship with various health outcomes; however, there is a need to determine a priori whether the groups' total energy expenditures have a similar interpretation to allow for valid interpretation of cross-cultural differences. Note that differential-composite interpretation analyses are pertinent for correlational and predictive studies but are not a concern for behavioral studies aimed at comparing physical activity patterns. While this section emphasized the need to demonstrate structural equivalence, more studies that also determine the measurement-unit and scalar equivalence of physical activity instruments are needed because no study has yet addressed these issues.
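The exercise-versus-work example above can be made concrete with a rough profile comparison. The sketch below is only one assumed way to quantify it: the energy expenditure values, the subset of dimensions, and the helper names `profile` and `profile_distance` are hypothetical, and the distance used (half the sum of absolute differences in proportions) is a generic choice rather than a procedure from the paper.

```python
# A subset of the dimensions listed in the text, used here for illustration.
DIMENSIONS = ("occupation", "home chores", "exercise", "transportation")

def profile(energy_by_dimension):
    """Share of total energy expenditure contributed by each dimension."""
    total = sum(energy_by_dimension.values())
    return {d: energy_by_dimension.get(d, 0.0) / total for d in DIMENSIONS}

def profile_distance(p, q):
    """Half the summed absolute difference in shares: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in DIMENSIONS)

# Invented group means in arbitrary energy units per week.
group_1 = {"occupation": 200.0, "home chores": 300.0,
           "exercise": 1200.0, "transportation": 300.0}
group_2 = {"occupation": 1200.0, "home chores": 300.0,
           "exercise": 200.0, "transportation": 300.0}

total_1 = sum(group_1.values())
total_2 = sum(group_2.values())
distance = profile_distance(profile(group_1), profile(group_2))

print(total_1, total_2, distance)
```

Here the two invented groups have identical totals but half of their energy expenditure is distributed differently, the situation the text describes as differential-composite interpretation.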

Summary

To complement the issues raised in Dr. Kriska's paper, this paper identified a number of methodological issues that researchers need to address before valid cross-cultural comparisons can be performed (see summary of issues and proposed actions for researchers in Table 1). In addition, this paper cautions us to interpret published findings in light of these methodological issues. This paper selectively addressed some methodological issues and leaves ample room for other researchers to identify other important ones. Many of the issues raised here are related to the use of questionnaires to make cross-cultural comparisons; however, many of them also apply to other methods of assessing physical activity behaviors. As a final remark, I want to acknowledge that other methods used to measure physical activity (including activity monitors) provide valuable but different information than questionnaires. It remains that in large-scale epidemiological studies, questionnaires are often the most practical instrument to use, and given the errors associated with questionnaires, we need to consider whether questionnaires are better at measuring physical activity behaviors at the group level or at the individual level.


References

Cooper, C. R., & Denner, J. (1998). Theories linking culture and psychology: Universal and community-specific processes. Annual Review of Psychology, 49, 559-584.

Crespo, C. J., Keteyian, S. J., Heath, G. W., & Sempos, C. T. (1996). Leisure-time physical activity among US adults. Archives of Internal Medicine, 156, 93-98.

Diez-Roux, A. V., Northridge, M. E., Morabia, A., Bassett, M., & Shea, S. (1999). Prevalence and social correlates of cardiovascular disease risk factors in Harlem. American Journal of Public Health, 89, 302-307.

Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304-305.

Marin, G., & Marin, B. V. (1993). Research with Hispanic populations. Newbury Park, CA: Sage Publications.

Pereira, M. A., FitzGerald, S. J., Gregg, E. W., Joswiak, M. L., Ryan, W. J., Suminski, R. R., Utter, A. C., & Zmuda, J. M. (1997). A collection of physical activity questionnaires for health-related research. Medicine and Science in Sports and Exercise, 29(Suppl.), S1-S205.

Ragin, C. C., & Hein, J. (1993). The comparative study of ethnicity. In J. H. Steinfield & R. M. Dennis (Eds.), Race and ethnicity in research methods (pp. 254-272). Newbury Park, CA: Sage Publications.

Robinson, L. (1998). Hispanics don't exist. U.S. News & World Report, 26-32.

U.S. Department of Health and Human Services. (1996). Physical activity and health: A report from the Surgeon General. Department of Health and Human Services, Center for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion. Atlanta, GA: Authors.

Van de Vijver, F., & Leung, K. (1997). Methods and data analysis of comparative research. In J. W. Berry, Y. H. Poortinga, & J. Panday (Eds.), Handbook of cross-cultural psychology (pp. 257-300). Boston, MA: Allyn and Bacon.

Table 1: Summary of cross-cultural methodological issues, considerations discussed, and suggestions for future research.

Issue: Defining cross-cultural equivalency

Considerations:
* Cross-cultural equivalency includes structural equivalence, measurement-unit equivalence, and scalar equivalence.

Suggestions for future research:
* Making valid cross-cultural comparisons requires that more researchers demonstrate the equivalency of current physical activity questionnaires.

Issue: Making valid cross-cultural comparisons

Considerations:
* Cross-cultural studies are quasi-experimental.
* Elaborated comparisons are more informative than truncated comparisons, [...] feasible when covariates or macrovariables are used.
* Cultural terms should be [...] group heterogeneity.

Suggestions for future research:
* More elaborated comparisons are needed while considering 1) the issues of selecting proper covariates and statistical assumptions of covariate analyses; and 2) the potential of macrovariables to simplify our design and the validity of these variables.
* Empirical validation of theories or elaboration of viable theories that conceptualize how culture interacts with physical activity behaviors is needed.

Issue: Making methodologically valid cross-cultural comparisons

Considerations:
* The inclusion of group-specific activities and dimensions may produce better estimates of physical activity behaviors.
* Do not assume that translating or having cultural groups review a questionnaire will result in universal constructs.
* Non-stratified validation samples may mask the psychometric properties of instruments.
* Differential-composite interpretation has important implications for the interpretation of correlational and predictive studies.

Suggestions for future research:
* Studies should assess whether adding group-specific activities artificially increases estimates of physical activity behaviors.
* Studies that include cultural groups at the planning and instrument development stage are needed.
* Comparative validation studies that employ stratified or comparative distributions are needed.
* Studies that determine whether physical activity composite scores have the same interpretation are needed.

