This article was downloaded by: [University of Oklahoma Libraries] On: 05 February 2015, At: 14:01 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Experimental Aging Research: An International Journal Devoted to the Scientific Study of the Aging Process Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uear20

Continuously distributed random variables in factorial designs

Robert W. Bell, Texas Tech University

Published online: 27 Sep 2007.

To cite this article: Robert W. Bell (1992) Continuously distributed random variables in factorial designs, Experimental Aging Research: An International Journal Devoted to the Scientific Study of the Aging Process, 18:2, 47-50, DOI: 10.1080/03610739208253910 To link to this article: http://dx.doi.org/10.1080/03610739208253910


Experimental Aging Research, Volume 18, Number 2, 1992, ISSN 0734-0664 © 1992 Beech Hill Enterprises Inc.


Continuously Distributed Random Variables in Factorial Designs

ROBERT W. BELL
Texas Tech University

Age as a variable in lifespan research usually is sampled as several age blocks which, in turn, are combined with additional variables in a factorial design. Sampling a continuous variable in discrete blocks makes adequate sampling more difficult, reduces power, and prevents a fine-grain analysis of Age x Treatment interactions. Age can instead be sampled as a continuously distributed variable, factorially combined with treatment groups, and analyzed within an analysis of variance framework by the use of regression analysis and comparison of multiple R² coefficients. Such a sampling strategy offers both practical sampling advantages and statistical advantages over the usual sampling approach.

Send correspondence regarding this article to Robert W. Bell, Department of Psychology, P.O. Box 4100, Texas Tech University, Lubbock, TX 79409, USA.

Lifespan research characteristically is conducted using factorial designs in which several age groupings, selected to represent early, middle and later aged populations, are combined factorially with one or more additional variables. Usually all variables are treated as “fixed effects” variables for purposes of statistical analysis; however, the interpretation of any “main effect” for age or Age x Treatment “interaction” is discussed (interpreted?) as if age were a random effects variable. That is, the functional relationship between the dependent variable and age is discussed, rather than treating the ages included in the sample as “point” estimates.

An alternative design and sampling strategy, in which age is incorporated as a dimension in a factorial design but sampled as a continuously distributed variable and analyzed as a random effects variable, offers many advantages relative to the usual design approach. Pedhazur (1982) presents the statistical bases for this approach but does not explicitly relate the outcome to a conventionally designed factorial experiment. Keppel and Zedeck (1989) present a similar analysis for use with unbalanced factorials with missing data.

A conventional analysis of variance source table containing main effects and interactions can be obtained from a sampling experiment which combines treatment groups with continuously distributed variables. If the continuous variable is comparably distributed within each of the treatment groups, the data may be analyzed as a linear regression problem with groups defined as coded vectors, each of which corresponds to a source in a factorial experimental design. Appropriate manipulation of squared multiple correlations yields treatment sums of squares corresponding to main effects, interactions and error. The SAS/STAT (1989) GLM program can combine continuous and discrete variables into an analysis of variance source table. In order for investigators to understand the underlying process by which this regression model yields an ANOVA source table, the following sections outline a sampling strategy, an approach to coding vectors using “effect” coding, and the multiple correlations which are combined to create each source in the ANOVA table.

Sampling Strategy

In the case where all treatment sources other than age are experimental variables (random assignment of subjects), the sampling to create a factorial should pose no problems. Once the age distribution of interest is specified, a sample of subjects representing the full range of interest can be identified. Subjects can be assigned randomly to treatment groups with the usual restriction of equal sample sizes, unless some theoretical foundation for using unequal sample sizes exists. Samples then should be compared to ensure that no significant differences in age exist between samples. To ensure orthogonality of sources the groups should be compared for significant differences in means, variances and, if the distribution of age is drastically non-normal, on estimates of skewness and kurtosis. Given the robustness of the analysis of variance, these last two indices probably are not important unless extreme values are obtained (Keppel, 1991, Ch. 5).

If one of the treatment sources is not experimental, e.g., gender, the overall age distribution by gender must be matched by mean and variance (not by matching individual pairs of subjects) before subjects are randomly assigned to additional experimental groupings. The above-specified comparisons for significant differences between groups on age then would be conducted. In order to ensure that each group has a comparable age distribution and, hence, meets the requirement of a factorial combination of treatment level by age, one or more subjects may have to be discarded or transferred between groups. Although inter-group transfer of subjects should be reflected in reduced degrees of freedom for the required F-Ratio when analyzing the dependent variable, it rarely should cause greatly disparate sample sizes, nor should it appreciably influence statistical power. Certainly it never should approach the elimination of subjects and resultant loss of power which is inevitable if the investigator restricts the sample to include only subjects within narrowly defined age ranges (e.g., 20-25, 45-50, 65 and older), a restriction which characterizes much of the extant research.
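The comparability check described above can be sketched in a few lines. The following is only an illustration, not the author's procedure: the function names are hypothetical, the ages are the six values from Table 1 (two subjects per group), and the F-ratio must still be compared against a tabled critical value to judge significance.

```python
from statistics import mean, variance

def age_summary(ages_by_group):
    """Mean and sample variance of age for each group."""
    return {g: (mean(a), variance(a)) for g, a in ages_by_group.items()}

def f_ratio_for_means(ages_by_group):
    """One-way ANOVA F-ratio (with its df) for differences among group mean ages."""
    groups = list(ages_by_group.values())
    n_total = sum(len(g) for g in groups)
    grand = mean([a for g in groups for a in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((a - mean(g)) ** 2 for g in groups for a in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Ages taken from Table 1; the group labels are hypothetical.
ages_by_group = {"G1": [45, 62], "G2": [59, 48], "G3": [36, 71]}
summary = age_summary(ages_by_group)
f, df1, df2 = f_ratio_for_means(ages_by_group)
print(summary)
print(f, df1, df2)
```

With the Table 1 ages every group has the same mean age (53.5), so the F-ratio for means is zero, while the variances differ considerably; with only two subjects per group this is an illustration of the bookkeeping rather than a meaningful test.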

TABLE 1
Examples of Data Input for an Age x 3-Group Factorial Displaying 2 Subjects Per Group

Case 1. Groups coded by orthogonal polynomials for trend analysis.

Subject   Age(A)   B1    B2    AB1    AB2    Y
S1        45       -1    -1    -45    -45    Y1
S2        62       -1    -1    -62    -62    Y2
S3        59        0    +2      0   +118    Y3
S4        48        0    +2      0    +96    Y4
S5        36       +1    -1    +36    -36    Y5
S6        71       +1    -1    +71    -71    Y6

Case 2. Groups coded for planned comparisons of Group 1 (S1, S2) vs. Groups 2+3 (S3, S4, S5, S6) and of Group 2 (S3, S4) vs. Group 3 (S5, S6).

Subject   Age(A)   B1    B2    AB1    AB2    Y
S1        45       -2     0    -90      0    Y1
S2        62       -2     0   -124      0    Y2
S3        59       +1    -1    +59    -59    Y3
S4        48       +1    -1    +48    -48    Y4
S5        36       +1    +1    +36    +36    Y5
S6        71       +1    +1    +71    +71    Y6
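The product columns in Table 1 are simply each subject's age multiplied by that subject's group code. A minimal sketch of how such a data file could be built follows; the dictionary and function names are hypothetical, and only the ages and coding coefficients come from Table 1.

```python
# Coding coefficients (B1, B2) for groups 1, 2 and 3, as in Table 1.
TREND = {1: (-1, -1), 2: (0, 2), 3: (1, -1)}    # Case 1: linear, quadratic
PLANNED = {1: (-2, 0), 2: (1, -1), 3: (1, 1)}   # Case 2: G1 vs G2+G3, G2 vs G3

def design_rows(ages, groups, coding):
    """Return one (A, B1, B2, AB1, AB2) row per subject."""
    rows = []
    for age, g in zip(ages, groups):
        b1, b2 = coding[g]
        rows.append((age, b1, b2, age * b1, age * b2))
    return rows

ages = [45, 62, 59, 48, 36, 71]     # subjects S1..S6
groups = [1, 1, 2, 2, 3, 3]

for row in design_rows(ages, groups, TREND):
    print(row)
```

Passing `PLANNED` instead of `TREND` gives the Case 2 rows. In both schemes the two group vectors are orthogonal to each other when group sizes are equal.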

Effect Coding of Groups

Pedhazur (1982) presents several coding schemes for entering group identity into a linear regression equation. Although any of the schemes will suffice, effect coding has certain advantages. If the source has three or more levels, the coding coefficients permit testing specific single-degree-of-freedom hypotheses rather than an omnibus test of that source (whether main effect or interaction). If the source is a quantitative one, e.g., dosage, these hypotheses may take the form of testing specific functions via orthogonal polynomials. In any event it ensures that each single-degree-of-freedom vector contributing to a source is independent of all other vectors and that the sources are additive. Using this system a main effect with two levels would be coded -1, +1 to denote the levels; a main effect with three levels would require two vectors using contrast coefficients which create two orthogonal vectors. If it were desirable to define the vectors as linear and quadratic functions, for example, the three groups would be coded -1, 0, +1 for the first vector and -1, +2, -1 for the second vector.

Table 1 presents a (hypothetical) data file for a 3-Treatment x Age factorial experimental design in which age is a continuous variable. Two subjects per group provide an illustration of the combinations of coding vectors needed to define the factorial (not intended to suggest adequate sample size for estimating systematic effects plus error). Since there are 2 degrees of freedom for groups, two coded vectors are needed to define the main effect and two additional vectors to define the Group x Age interaction. The first example uses contrasts corresponding to linear and quadratic functions, which would be appropriate if the three treatment groups constituted three points on some quantitative dimension. The second example uses contrasts which would be more useful if the groups consisted of a control condition plus two treatment conditions. In this example the vectors could be separately tested to compare Control vs. Combined Treatments, followed by a comparison of the two treatment groups to one another. However, if only main effects and the interactions are estimated, the two coding schemes would yield identical sums of squares for those sources.

Table 2 presents the analysis of variance with Sums of Squares and degrees of freedom (df error based upon 10 subjects per group). Each treatment SS is estimated by multiplying SS Total by the difference between the multiple R² based upon all variables and the multiple R² based upon all variables except the vector(s) defining the source being estimated. Since R² estimates the proportion of variance due to the combined variables in the multiple regression equation, the difference between the two R² values represents the proportion of variance due to the variable(s) omitted from the less complete model. Multiplying this value by the SS Total yields the SS for that source. The SS Residual equals SS Total multiplied by 1 minus the R² for the complete model, with df equal to df Total minus df Treatments.
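The R² difference procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the SAS GLM routine the article mentions: the data (four subjects per group) are invented, the function names are hypothetical, and ordinary least squares is solved with exact fractions from the standard library so that the comparison between coding schemes is free of rounding error. In practice a regression package would be used.

```python
from fractions import Fraction

def r_squared(predictors, y):
    """Exact R^2 of y regressed on an intercept plus the given predictor columns."""
    n = len(y)
    rows = [[Fraction(1)] + [Fraction(col[i]) for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination; assumes the predictors are linearly independent.
    for c in range(k):
        p = next(r for r in range(c, k) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    b = [row[k] for row in aug]
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    mean_y = Fraction(sum(y), n)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_resid / ss_total

def source_ss(y, age, b1, b2):
    """SS for A, B, AB and Residual, each as SST times an R^2 difference."""
    ab1 = [a * b for a, b in zip(age, b1)]
    ab2 = [a * b for a, b in zip(age, b2)]
    mean_y = Fraction(sum(y), len(y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    full = r_squared([age, b1, b2, ab1, ab2], y)
    return {
        "A": sst * (full - r_squared([b1, b2, ab1, ab2], y)),
        "B": sst * (full - r_squared([age, ab1, ab2], y)),
        "AB": sst * (full - r_squared([age, b1, b2], y)),
        "Residual": sst * (1 - full),
    }

# Invented data: 3 groups x 4 subjects (not from the article).
age = [45, 62, 30, 71, 59, 48, 33, 70, 36, 71, 52, 60]
y = [12, 15, 9, 18, 14, 11, 10, 19, 8, 20, 13, 15]
trend = source_ss(y, age, [-1] * 4 + [0] * 4 + [1] * 4, [-1] * 4 + [2] * 4 + [-1] * 4)
planned = source_ss(y, age, [-2] * 4 + [1] * 8, [0] * 4 + [-1] * 4 + [1] * 4)
print({k: float(v) for k, v in trend.items()})
print(trend == planned)
```

The final comparison should print `True`: both coding schemes span the same set of group contrasts, so each reduced model fits identically. One caution that goes beyond the source: with raw (uncentered) age, as in Table 1, the product vectors are correlated with the group vectors, so the source SS need not add up to SS Total; the sketch does not assume that they do.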


TABLE 2
Analysis of Variance Source Table with Sums of Squares (SS) and Degrees of Freedom (df)

Source     SS                                                      df
Age (A)    SST [(R² Y.A,B1,B2,AB1,AB2) - (R² Y.B1,B2,AB1,AB2)]     1
B          SST [(R² Y.A,B1,B2,AB1,AB2) - (R² Y.A,AB1,AB2)]         2
AB         SST [(R² Y.A,B1,B2,AB1,AB2) - (R² Y.A,B1,B2)]           2
Residual   SST (1 - R² Y.A,B1,B2,AB1,AB2)                          24

Note: To further analyze Main Effect B into planned comparisons, each with a single df, SS can be estimated by omitting either B1 or B2 singly from the regression equation to be subtracted. Similarly, the interaction term could be further analyzed into two orthogonal Age x Planned Comparison components. SST = Sums-of-Squares-Total, which equals SSY in typical regression notation. Degrees of Freedom for Residual is based upon 10 subjects per group. In general, it is Total df minus Treatment df.
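The degrees of freedom in Table 2, and the step from SS to mean squares and F-ratios, amount to simple arithmetic. The sketch below is illustrative only: the function names and the SS values are hypothetical, and the F-ratios use the residual as the error term, which is the fixed-effects form; the choice of error term when age is treated as random is taken up in the next section of the article.

```python
def anova_df(n_per_group, n_groups):
    """Degrees of freedom for an Age x Groups design with one continuous age vector."""
    n_total = n_per_group * n_groups
    df = {"A": 1, "B": n_groups - 1, "AB": n_groups - 1}
    # Residual df is Total df minus the df used by all treatment sources.
    df["Residual"] = (n_total - 1) - sum(df.values())
    return df

def f_ratios(ss, df):
    """Mean square of each source divided by the residual mean square."""
    ms_error = ss["Residual"] / df["Residual"]
    return {s: (ss[s] / df[s]) / ms_error for s in ("A", "B", "AB")}

df = anova_df(n_per_group=10, n_groups=3)
print(df)   # Residual df is 24, as in Table 2

# Hypothetical sums of squares, for illustration only.
ss = {"A": 120.0, "B": 60.0, "AB": 40.0, "Residual": 480.0}
print(f_ratios(ss, df))
```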

Random vs. Fixed Effects Analysis

The sampling strategy described above treats age as a random effects variable and the analysis of variance should reflect this randomness. As stated earlier, even when age is sampled as a stratified variable it frequently is interpreted as if it had been analyzed as a random-effects variable. Investigators rarely conduct random effects analysis, probably because it requires that variables which are factorially combined with a random variable be tested for significance using the interaction between the two variables as the error term (Keppel, 1991, Ch. 22). For the design and analysis illustrated in Tables 1 and 2 this would require testing the main effect for groups using the Mean Square Interaction as the error term. Hence, the F-Ratio would have 2 and 2 degrees of freedom rather than 2 and 24 degrees of freedom, yielding a statistical test with unacceptably low power except in the presence of very large treatment effects.

An approach to solving the problem of limited power for this main effect depends, in most cases, upon the magnitude and significance or non-significance of the interaction term. If the interaction is significant at an acceptable p-value, e.g., .05, the investigator may interpret the interaction with little concern for the main effects. If the interaction is not significant with power at an acceptable level, e.g., .80 (Cohen, 1990), it may be pooled with the residual term and the pooled interaction/residual used as the error term for the main effects. To ensure adequate power before implementing a pooling procedure the investigator might specify a liberal level of significance, e.g., p = .10 or .25. Only if the interaction between a fixed-effects and random-effects variable is significant at the liberal p-value but not significant at the more stringent p-value does this analysis fail to provide adequate tests of all sources of interest. Such an outcome might require additional sampling to increase the power of the test of the interaction. Since hypotheses in lifespan research usually center upon a Treatment x Age interaction, most cases can be analyzed with acceptable power if the usual requirements for acceptable power are met.

Statistical evaluations in the presence of a significant interaction must take into account the continuous nature of the age dimension. Any point estimate of Treatment x Age is meaningless since any given age is likely to be represented by very few subjects. Instead, an appropriate follow-up analysis is to calculate a regression function for age separately for each treatment group. Treatment groups can be compared in terms of regression coefficients in the obtained age functions. Linear functions could be tested for non-linear components and, if present, higher-order functions can be formulated and tested for goodness-of-fit. This approach not only would provide a basis for comparing linear age functions among the treatment groups but would permit a statistically based conclusion regarding different non-linear age functions by treatment group. For example, if groups defined gender an investigator could determine if some age-related difference, reflected in the dependent variable, shows comparable differences in the two genders, whether the difference appears at about the same age for the two genders, whether the magnitude of difference is some monotonic function of age for both genders or levels off or is potentiated at about the same age for the two genders, and perform other possible fine-grain analyses of Age x Treatment effects not possible with few fixed levels of age as in the traditionally designed factorial.
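Two of the steps in this section reduce to short calculations: pooling a nonsignificant interaction with the residual, and fitting a separate age function for each treatment group. The sketch below is a hedged illustration; the function names and all numeric values are hypothetical, only a linear age function is fitted, and the decision about whether pooling is justified (the liberal significance level) is left to the investigator.

```python
def pooled_error(ss_ab, df_ab, ss_resid, df_resid):
    """Pool the interaction with the residual; return (mean square, df)."""
    df_pooled = df_ab + df_resid
    return (ss_ab + ss_resid) / df_pooled, df_pooled

def age_line(ages, y):
    """Least-squares (intercept, slope) of y on age for a single group."""
    n = len(ages)
    mean_a = sum(ages) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_a) * (v - mean_y) for a, v in zip(ages, y))
    sxx = sum((a - mean_a) ** 2 for a in ages)
    slope = sxy / sxx
    return mean_y - slope * mean_a, slope

# Hypothetical SS values with the df from Table 2 (2 for AB, 24 for Residual).
ms_pooled, df_pooled = pooled_error(40.0, 2, 480.0, 24)
print(ms_pooled, df_pooled)   # error term with 26 df instead of 2

# Hypothetical ages and scores for two treatment groups.
lines = {
    "control": age_line([30, 40, 50, 60], [10, 12, 14, 16]),
    "treated": age_line([30, 40, 50, 60], [10, 14, 18, 22]),
}
print(lines)
```

Comparing the slopes across groups is the starting point for the follow-up analysis described above; testing for non-linear components would require adding higher-order age terms to each group's regression.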
Summary and Conclusions

Any continuously distributed variable, such as age, can be incorporated into factorial designs without selecting specific limited ranges of values for inclusion in the experimental design. A design that combines a full distribution with groups, using coded vectors to define groups, can be analyzed into sources from which sums of squares, degrees of freedom, mean squares and F-Ratios can be estimated. If the continuous variable is comparably distributed for each group, defined by equal means and variances, the sources are orthogonal and can be interpreted in terms of main effects and interactions in a factorial design. The continuous variable should be defined as a random-effects variable for purposes of specifying the appropriate error term for the F-Ratios. This reduces the power of the F-Test on the main effect of sources which are factorially combined with the continuous variable, e.g., groups, but does not affect the test for interactions, which usually define the major hypotheses. Significant interactions can be followed by comparing regression functions of the continuous variable obtained separately for each group of the variable which contributed to the interaction.

When compared to the usual procedure, which involves creating blocks to represent the continuous variable, this sampling strategy has a number of major advantages. First, it makes obtaining a sample much easier. The capability of incorporating all available subjects, rather than restricting the sample within relatively narrow age limits, facilitates the inclusion of an adequately sized sample. Second, incorporating the entire sample yields greater statistical power than any sampling approach which either discards subjects or combines non-identical but similar subjects into blocks and treats them as if identical. Third, it permits a finer grain analysis of age effects, which may be particularly useful if non-linear age functions are present for any or all treatment groups.

References

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.

Keppel, G. (1991). Design and analysis (3rd Edition). Englewood Cliffs, NJ: Prentice-Hall.

Keppel, G., & Zedeck, S. (1989). Data analysis for research designs: Analysis of variance and multiple/correlational approaches. New York: W. H. Freeman.

Pedhazur, E. J. (1982). Multiple regression in behavioral research (2nd Edition). New York: CBS College Publishing.

SAS Institute Inc. (1989). SAS/STAT User's Guide (Version 6, 4th Edition, Vol. 2). Cary, NC: SAS Institute Inc.
