STATISTICS IN MEDICINE, VOL. 10, 1241-1256 (1991)

AN EMPIRICAL BAYES FORMULATION OF COHORT MODELS IN CANCER EPIDEMIOLOGY

CYNTHIA M. DESOUZA
Department of Mathematical Sciences, University of North Carolina at Wilmington, Wilmington, NC 28403-3297, U.S.A.

SUMMARY

This paper concerns the incidence rates of malignant skin melanoma for several age-sex groups and time periods in three geographic regions. It uses a method of cohort analysis and employs a two-stage random effects model. The first stage entails the assumption that the within-region variation in the frequency of disease incidence for a fixed age-sex-cohort group has a Poisson distribution with mean proportional to the population at risk. The second stage, after adjusting for age and sex, entails the assumption that the between-region geographic variation in the logarithm of the true incidence rate has a prior distribution with parameters estimated by the method of maximum likelihood. After adjusting for age effects, we estimate random geographic-specific cohort effects for each sex with use of an empirical Bayes method and compare the results with the usual multiplicative Poisson model that assumes fixed geographic-specific cohort effects for each sex. This comparison shows that the method presented here provides more stable estimates of geographic-specific cohort effects, and in addition the random effects model describes these data more adequately.

0277-6715/91/081241-16$08.00 © 1991 by John Wiley & Sons, Ltd.
Received June 1990, Revised December 1990

INTRODUCTION

Epidemiologists use cancer incidence data, usually stratified by region, sex, year and age, to study geographic and temporal trends in cancer disease incidence. Many of the statistical models considered assume a strong influence on disease incidence rates of year of birth, called a cohort effect, in addition to age and sex. Graphical techniques show that age-specific disease incidence rises in a cohort-wise fashion for each sex. With simultaneous examination of cohorts in different geographic regions, the wide variation in the magnitude of incidence rates between regions makes it difficult to study the temporal trends in disease incidence.

Previous methods of analysis include a method of age-adjusted rates by direct or indirect standardization as presented by Mason and McKay¹ for cancer mortality in the United States during 1950-69 by county, sex, race and cancer type. These age-adjusted rates have use in descriptive studies but mask important differences among age-sex groups as well as geographic variations within these groups. Breslow and Day² studied the geographic variation in a fixed effects multiplicative Poisson model for cancer incidence rates, and used a method of proportional iterative fitting to estimate the age-adjusted geographic effects. Venzon and Moolgavkar³ modelled the geographic variation in an age-cohort model for cancer mortality rates by means of a geographic-specific cohort effect. Manton et al.⁴ used a more complex model where the frequency of death is assumed to have a negative binomial distribution, with demographic and geographic parameters included as fixed effects. These models account for the geographic effect as a fixed effect which in turn one estimates by maximum likelihood. One problem with maximum likelihood estimation, however, is that a region with no observed cases yields a zero estimate of the incidence or mortality rate for every age-sex group in that region.

More recently, hierarchical models have been used to study the geographic variation in these data. Pocock et al.⁵ studied the geographic variation in area mortality rates by using a random effects model and a method of iteratively reweighted least squares to estimate the extra component of variance. Breslow⁶ used a method of moments to estimate the component of extra-Poisson variation present in cancer mortality data. Tsutakawa et al.⁷ employed a two-stage hierarchical model and an empirical Bayes method to estimate cancer mortality rates for a given age-sex group over several geographic regions. Tsutakawa⁸ used a three-stage hierarchical model for studying cancer data over several geographic regions in a single demographic stratum, and a Bayesian method to estimate cancer mortality rates. This method provides a higher-order approximation to the posterior moments involved in the Bayesian estimation of random effects.

In this article, interest is in an age-cohort model for disease incidence for both sexes over several geographic regions. For a given age-sex-cohort group in a region, we assume that the number of incident cases over a period of $T$ years has a Poisson distribution with mean $\lambda$ proportional to the population at risk $n$, which we estimate as the number of person-years at risk. To compare populations of different sizes we consider $p = \lambda/n$, where we interpret $p$ as the annual incidence rate per individual. We further assume that the rate $p$ varies from region to region in such a manner that we can treat the logarithms of the $p$'s as a random sample from a prior distribution with unknown parameters.
We include demographic factors such as age and sex through a log-linear model for the rates, together with random geographic-specific cohort effects that have a prior distribution with unknown parameters. We estimate the demographic and prior parameters by maximum likelihood and then use these estimates to obtain estimates of the random geographic-specific cohort effects. These are obtained as posterior moments after we replace the prior parameters by their maximum likelihood estimates. This method of estimation is referred to as empirical Bayes. Much of the computation relies on numerical integration, since the mixed likelihood function is analytically intractable. The principle followed is similar to that of Tsutakawa et al.⁷ Efron and Morris⁹ provide a general presentation of the empirical Bayes principle and show that empirical Bayes estimates produce improved parameter estimates.

RANDOM EFFECTS MODEL

For a fixed sex $i$ ($i = 1, 2$), age group $j$ ($j = 1, \ldots, J$), birth cohort $k$ ($k = 1, \ldots, K$) and geographic region $s$ ($s = 1, \ldots, S$), we assume that the number of incident cases of disease $Y_{ijks}$ observed during a period of $T$ years has a Poisson distribution with mean $\lambda_{ijks}$ proportional to the number of person-years at risk $n_{ijks}$. We can express the mean $\lambda_{ijks}$ as $n_{ijks} p_{ijks}$, where $p_{ijks}$ is the annual incidence rate per individual. Thus all $Y_{ijks}$ given $\lambda_{ijks}$ are independently distributed. We include age, sex and cohort effects through a log-linear model for the rates, namely we parameterize $\theta_{ijks} = \log(p_{ijks})$ as $\theta_{ijks} = a_{ij} + c_{iks}$, where $a_{ij}$ is a sex-specific age effect and $c_{iks}$ is a geographic-specific cohort effect for each sex. Given no a priori difference in cohort effects among geographic regions, we assume that $\{c_{iks}\}$ ($s = 1, \ldots, S$) is a random sample from a normal distribution with mean $c_{ik}$ and variance $\sigma^2$, where $c_{ik}$ is a sex-specific mean cohort effect, and where the mean is over all geographic regions for each sex. Since the model is sex-specific, for simplicity we drop the subscript $i$ to give

\[
\begin{aligned}
Y_{jks} &\sim \text{Poisson}(\lambda_{jks}) && (\lambda_{jks} = n_{jks}\, p_{jks}), \\
\theta_{jks} &= a_j + c_{ks} && (\theta_{jks} = \log(p_{jks})), \\
c_{ks} &\sim \text{normal}(c_k, \sigma^2).
\end{aligned}
\qquad (1)
\]
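As an illustration of the two-stage structure in (1), the following minimal sketch simulates counts for one sex. It is not the paper's code: the function names, the nested-list layout `person_years[j][k][s]`, and the use of Knuth's multiplication method for Poisson draws are assumptions made here for a dependency-free example.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the modest cell means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(age_effects, mean_cohort_effects, sigma, person_years, seed=0):
    """Draw c_ks ~ normal(c_k, sigma^2), then Y_jks ~ Poisson(n_jks * exp(a_j + c_ks)).

    person_years[j][k][s] holds n_jks (hypothetical layout).
    Returns (counts, cohort_effects) with counts[j][k][s] and cohort_effects[k][s].
    """
    rng = random.Random(seed)
    n_regions = len(person_years[0][0])
    # Second stage: region-specific cohort effects scattered around the mean effect c_k.
    c_ks = [[rng.gauss(c_k, sigma) for _ in range(n_regions)]
            for c_k in mean_cohort_effects]
    # First stage: Poisson counts with mean n_jks * p_jks, where log p_jks = a_j + c_ks.
    counts = [[[poisson_draw(rng, person_years[j][k][s] * math.exp(a_j + c_ks[k][s]))
                for s in range(n_regions)]
               for k in range(len(mean_cohort_effects))]
              for j, a_j in enumerate(age_effects)]
    return counts, c_ks


# Hypothetical inputs: 2 age groups, 2 cohorts, 3 regions, 50,000 person-years per cell.
n = [[[50000.0] * 3 for _ in range(2)] for _ in range(2)]
counts, c_ks = simulate_counts([-10.0, -9.0], [0.0, 0.5], 0.2, n, seed=1)
```

Setting `sigma` to zero in this sketch collapses every region onto the mean cohort effect, which is the sense in which the prior variance controls how far regions may differ.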


The parameters $\{a_j\}$ are age effects, and the parameters $\{c_k\}$ are mean cohort effects. Further, we call $\{c_k\}$ and $\sigma$ prior parameters. The standard fixed effects model differs from the above random effects model in only one respect, namely it assumes fixed rather than random geographic-specific cohort effects.

The rationale for considering a random effects model is as follows. In younger and older cohorts of a given region, where the populations are small and the number of cases is usually small, the standard Poisson maximum likelihood method does not provide reliable estimates of these cohort effects. The assumption that the $c_{ks}$ have a prior distribution is a vehicle for borrowing strength across regions to estimate $c_{ks}$ for each region. Further, since $E(c_{ks}) = c_k$, the parameters $a_j$ and $c_k$ may be used to describe on average how a region's log-probability of disease incidence depends on age at diagnosis and year of birth.

MAXIMUM LIKELIHOOD ESTIMATION OF PRIOR PARAMETERS

As described in the previous section, we have a compound sampling model

\[ Y_{jks} \sim \text{Poisson}(\lambda_{jks} = n_{jks}\, p_{jks}) \qquad (2) \]

with probability mass function $f(y_{jks} \mid \lambda_{jks})$, and

\[ \theta_{jks} = \log(p_{jks}) \sim \text{normal}(a_j + c_k, \sigma^2) \qquad (3) \]

with probability density function $g(\theta_{jks} \mid a_j, c_k, \sigma)$. This parameterization of the model is non-unique, since the parameters $\{a_j\}$ and $\{c_k\}$ are unique only up to an additive constant. We estimate the parameters $\{a_j\}$, $\{c_k\}$ and $\sigma$ by the method of maximum likelihood applied to the marginal likelihood function. Let $y = \{y_{jks}\}$ ($j = 1, \ldots, J$; $k = 1, \ldots, K$; $s = 1, \ldots, S$), $a = \{a_j\}$ ($j = 1, \ldots, J$) and $c = \{c_k\}$ ($k = 1, \ldots, K$). Then the marginal likelihood for each $y_{jks}$ is

\[ l(y_{jks}; a_j, c_k, \sigma) = \int f(y_{jks} \mid \theta_{jks})\, g(\theta_{jks} \mid a_j, c_k, \sigma)\, d\theta_{jks}. \qquad (4) \]
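The integral in (4) has no closed form, so it has to be evaluated numerically. The sketch below is one hedged way to do that for a single cell. The paper itself uses a 48-point Gauss-Hermite formula; to stay within the standard library this example swaps in a plain trapezoid rule on a standardized grid, and the function names and default grid settings are assumptions made here.

```python
import math


def poisson_pmf(y, mean):
    """Poisson probability mass f(y | lambda), computed on the log scale."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def marginal_likelihood(y, n, a_j, c_k, sigma, half_width=8.0, steps=800):
    """Approximate l(y; a_j, c_k, sigma), the integral of f(y | theta) g(theta) d(theta).

    Substituting theta = a_j + c_k + sigma * z turns g into a standard normal
    density in z, which is integrated by the trapezoid rule on
    [-half_width, half_width] (a stand-in for Gauss-Hermite quadrature).
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        theta = a_j + c_k + sigma * z
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * poisson_pmf(y, n * math.exp(theta)) * density
    return total * h


# Hypothetical cell: 4 cases in 100,000 person-years.
value = marginal_likelihood(4, 1e5, -10.0, 0.3, 0.2)
```

The log-likelihood is then the sum of `math.log(marginal_likelihood(...))` over all cells, which a general-purpose optimizer could maximize subject to one identifying constraint on the additive constant.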

By the conditional independence among cells in a $J \times K \times S$ contingency table given $(a, c, \sigma)$,¹⁰ the likelihood function of $(a, c, \sigma)$ is the product of the cell contributions (4),

\[ \prod_{j,k,s} l(y_{jks}; a_j, c_k, \sigma), \]

with logarithm $L = \sum_{j,k,s} \log l(y_{jks}; a_j, c_k, \sigma)$. Then we obtain a maximum likelihood estimate of $(a, c, \sigma)$ by solving the likelihood equations

\[ L_{a_j} = 0 \ (j = 1, \ldots, J), \qquad L_{c_k} = 0 \ (k = 1, \ldots, K), \qquad L_{\sigma} = 0, \qquad (7) \]

where $L_u$ is the first-order partial derivative of $L$ with respect to $u = a_j, c_k, \sigma$. The above likelihood equations are a system of non-linear equations that we can solve by the EM algorithm of Dempster et al.¹¹ with use of $\{\theta_{jks}\}$ as missing data, or alternatively by Marquardt's¹² method. The latter method is more efficient, in the sense that it has a faster rate of convergence. Since these equations involve the computation of single integrals that are analytically intractable, we use a numerical integration procedure, namely the 48-point Gauss-Hermite quadrature formula, which is sufficient for our purposes.

The differences in computation between the standard model and the random effects model are as follows. The standard model uses maximum likelihood to estimate the fixed effects, whereas the random effects model uses maximum likelihood to estimate the prior parameters and an empirical Bayes method to estimate the random effects. The estimation of fixed effects in the standard model employs Breslow and Day's² method of proportional iterative fitting, whereas the estimation of prior parameters in the random effects model employs the EM algorithm¹¹ with use of $\{\theta_{jks}\}$ as missing data. The latter involves a large number of estimating equations for $\{a_j\}$, $\{c_k\}$ and $\sigma$, in addition to the calculation of conditional posterior expectations of the form $E(\theta_{jks} \mid y_{jks}, \hat{a}_j, \hat{c}_k, \hat{\sigma})$ (see Appendixes).

EMPIRICAL BAYES ESTIMATION OF GEOGRAPHIC-SPECIFIC COHORT EFFECTS

Let $Y_{ks} = \{Y_{jks}\}$ ($j = 1, \ldots, J$). Then given $(a_j, c_k, \sigma)$, the joint probability of $(Y_{jks}, c_{ks})$ is

\[ p(y_{jks}, c_{ks} \mid a_j, c_k, \sigma) = f(y_{jks} \mid a_j, c_{ks})\, g(c_{ks} \mid c_k, \sigma), \qquad (8) \]

and the marginal probability of $Y_{jks}$ is

\[ p(y_{jks} \mid a_j, c_k, \sigma) = \int f(y_{jks} \mid a_j, c_{ks})\, g(c_{ks} \mid c_k, \sigma)\, dc_{ks}. \qquad (9) \]

Assuming conditional independence among cells given $(a, c_k, \sigma)$,¹⁰ the joint probability of $(Y_{ks}, c_{ks})$ is

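\[ p(y_{ks}, c_{ks} \mid a, c_k, \sigma) = \Big\{ \prod_{j=1}^{J} f(y_{jks} \mid a_j, c_{ks}) \Big\}\, g(c_{ks} \mid c_k, \sigma), \qquad (10) \]

and the marginal probability of $Y_{ks}$ is

\[ p(y_{ks} \mid a, c_k, \sigma) = \int \Big\{ \prod_{j=1}^{J} f(y_{jks} \mid a_j, c_{ks}) \Big\}\, g(c_{ks} \mid c_k, \sigma)\, dc_{ks}. \qquad (11) \]

Assuming known parameters $(a, c, \sigma)$, the posterior PDF of $c_{ks}$ upon observing $Y_{ks}$ is

\[ h(c_{ks} \mid y_{ks}, a, c_k, \sigma) = p(y_{ks}, c_{ks} \mid a, c_k, \sigma) \,/\, p(y_{ks} \mid a, c_k, \sigma). \qquad (12) \]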

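In the absence of known $(a, c_k, \sigma)$, we estimate $(a, c_k, \sigma)$ by maximum likelihood as shown in the previous section. Then we replace $(a, c_k, \sigma)$ in (12) by its maximum likelihood estimate $(\hat{a}, \hat{c}_k, \hat{\sigma})$ to obtain the estimated posterior PDF of $c_{ks}$. We use this posterior PDF to estimate $c_{ks}$ by its expectation

\[ \hat{c}_{ks} = \int c_{ks}\, h(c_{ks} \mid y_{ks}, \hat{a}, \hat{c}_k, \hat{\sigma})\, dc_{ks}, \qquad (13) \]

and we estimate the standard error of $\hat{c}_{ks}$ by the posterior standard deviation

\[ \mathrm{SE}(\hat{c}_{ks}) = \Big\{ \int (c_{ks} - \hat{c}_{ks})^2\, h(c_{ks} \mid y_{ks}, \hat{a}, \hat{c}_k, \hat{\sigma})\, dc_{ks} \Big\}^{1/2}. \qquad (14) \]

This estimate of standard error, however, underestimates the true standard error since it does not take into account the error in $(\hat{a}, \hat{c}_k, \hat{\sigma})$. Further, the posterior PDF of $\theta_{jks}$ upon observing $Y_{jks}$ is

\[ h(\theta_{jks} \mid y_{jks}, a_j, c_k, \sigma) = p(y_{jks}, c_{ks} \mid a_j, c_k, \sigma) \,/\, p(y_{jks} \mid a_j, c_k, \sigma), \qquad (15) \]

with $c_{ks} = \theta_{jks} - a_j$.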
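The posterior mean (13) and posterior standard deviation (14) can be sketched numerically as follows. This is an illustrative stand-in rather than the paper's procedure: it uses a trapezoid rule on a standardized grid instead of Gauss-Hermite quadrature, and the function name, argument layout (`y[j]`, `n[j]`, `age_effects[j]` for one cohort-region pair) and example values are assumptions made here.

```python
import math


def posterior_mean_sd(y, n, age_effects, c_k, sigma, half_width=8.0, steps=800):
    """Posterior mean and SD of c_ks given counts y[j] and person-years n[j].

    The unnormalised posterior is prod_j Poisson(y_j | n_j * exp(a_j + c)) times
    the normal(c_k, sigma^2) prior. Terms that do not depend on c are dropped
    because they cancel when the moments are normalised.
    """
    h = 2.0 * half_width / steps
    grid, log_post = [], []
    for i in range(steps + 1):
        z = -half_width + i * h
        c = c_k + sigma * z
        log_lik = sum(y_j * (a_j + c) - n_j * math.exp(a_j + c)
                      for y_j, n_j, a_j in zip(y, n, age_effects))
        edge = math.log(0.5) if i in (0, steps) else 0.0  # trapezoid end-point weights
        grid.append(c)
        log_post.append(log_lik - 0.5 * z * z + edge)
    shift = max(log_post)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    mean = sum(w * c for w, c in zip(weights, grid)) / total
    var = sum(w * (c - mean) ** 2 for w, c in zip(weights, grid)) / total
    return mean, math.sqrt(var)


# Hypothetical cohort-region pair with two age groups.
c_hat, se_hat = posterior_mean_sd([5, 7], [1e5, 1e5], [-10.0, -9.5], 1.0, 0.3)
```

In this sketch the estimate is pulled from the region's own maximum likelihood value toward the prior mean `c_k`, with the amount of shrinkage governed by `sigma` and by how many cases the region contributes.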


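Then we can estimate the individual rate $p_{jks} = e^{\theta_{jks}}$ by its empirical Bayes estimate

\[ \hat{p}_{jks} = \int e^{\theta_{jks}}\, h(\theta_{jks} \mid y_{jks}, \hat{a}_j, \hat{c}_k, \hat{\sigma})\, d\theta_{jks}, \qquad (16) \]

and the cell mean $\lambda_{jks}$ by its empirical Bayes estimate

\[ \hat{\lambda}_{jks} = n_{jks}\, \hat{p}_{jks}. \qquad (17) \]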
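Estimates (16) and (17) are ratios of one-dimensional integrals, so they can be approximated in the same way as the marginal likelihood. The sketch below is a hedged illustration for a single cell: the function name `eb_rate`, the trapezoid rule (standing in for the Gauss-Hermite formula used in the paper) and the example inputs are assumptions made here.

```python
import math


def eb_rate(y, n, a_j, c_k, sigma, half_width=8.0, steps=800):
    """Empirical Bayes rate and cell mean for one cell.

    The rate is the posterior expectation of exp(theta), i.e. the ratio of the
    integral of exp(theta) * f * g to the integral of f * g, with
    theta = a_j + c_k + sigma * z integrated over a standardized grid in z.
    """
    h = 2.0 * half_width / steps
    thetas, log_w = [], []
    for i in range(steps + 1):
        z = -half_width + i * h
        theta = a_j + c_k + sigma * z
        edge = math.log(0.5) if i in (0, steps) else 0.0  # trapezoid end-point weights
        thetas.append(theta)
        # Poisson kernel times normal kernel; constants cancel in the ratio.
        log_w.append(y * theta - n * math.exp(theta) - 0.5 * z * z + edge)
    shift = max(log_w)  # guard against underflow before exponentiating
    weights = [math.exp(v - shift) for v in log_w]
    rate = sum(w * math.exp(t) for w, t in zip(weights, thetas)) / sum(weights)
    return rate, n * rate


# Hypothetical cell with no observed cases in 100,000 person-years.
rate_zero, mean_zero = eb_rate(0, 1e5, -10.0, 0.0, 0.5)
```

Under these assumptions a cell with zero cases still receives a strictly positive rate estimate, in contrast to the zero estimate that fixed effects maximum likelihood gives when a region has no observed cases.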

In the limit as $\sigma \to 0$, $\hat{c}_{ks}$ tends to an overall mean

\[ \hat{c}_k = \log\Big( \sum_{j,s} y_{jks} \Big/ \sum_{j,s} e^{\hat{a}_j} n_{jks} \Big), \quad \text{where} \quad \hat{a}_j = \log\Big( \sum_{k,s} y_{jks} \Big/ \sum_{k,s} e^{\hat{c}_k} n_{jks} \Big), \]

and in the limit as $\sigma \to \infty$, $\hat{c}_{ks}$ tends to

\[ \tilde{c}_{ks} = \log\Big( \sum_{j} y_{jks} \Big/ \sum_{j} e^{\hat{a}_j} n_{jks} \Big), \]

the Poisson maximum likelihood estimate, where $\hat{a}_j = \log\big( \sum_{k,s} y_{jks} \big/ \sum_{k,s} e^{\tilde{c}_{ks}} n_{jks} \big)$. These limiting solutions obtain from Breslow and Day's² method of proportional iterative fitting, which is equivalent to use of the EM algorithm¹¹ with no missing data.

APPLICATION TO MALIGNANT SKIN MELANOMA INCIDENCE

We have malignant skin melanoma incidence data available for three white populations (Connecticut, 1935-79; Denmark, 1943-77; and Norway, 1955-84) for ages 20-80+ years. Calendar periods and ages up to 79 are grouped in 5-year intervals, with a final age group of 80+. Figure 1 presents the age- and sex-specific incidence rates per 100,000 person-years at risk by mid-year birth cohort. This graphical analysis shows a rise in incidence rates for each cohort, with some geographic variation in the magnitude of rates across countries. Further, Figure 2 provides an age-standardized (age-adjusted to the European standard) incidence curve per 100,000 person-years at risk by country, sex and calendar period. This suggests a rise in incidence rates by calendar period, with some variation in the magnitude of rates across countries. For Connecticut males the age-standardized incidence rates ranged from 1.86 to 13.57, and for females from 1.43 to 11.07. For Denmark males the rates ranged from 2.09 to 8.54, and for females from 2.47 to 11.34. For Norway males the rates ranged from 3.76 to 17.90, and for females from 4.24 to 21.22. These increases in incidence rates have been attributed in large part to increases in successive birth cohorts, based on the graph in Figure 1. A cohort analysis was performed on these data. We first fitted a fixed effects, sex-specific age-cohort Poisson model with constant age effects and geographic-specific cohort effects to these data, which provided a deviance of 255.44 for males and 305.78 for females.
[Figure 1. Age-specific incidence rates per 100,000 person-years at risk for malignant skin melanoma for ages 20-80+ for (a) Connecticut males 1935-79, (b) Connecticut females 1935-79, (c) Denmark males 1943-77, (d) Denmark females 1943-77, (e) Norway males 1955-84, (f) Norway females 1955-84 (A 20-24, B 25-29, C 30-34, D 35-39, E 40-44, F 45-49, G 50-54, H 55-59, I 60-64, J 65-69, K 70-74, L 75-79, M 80+). Horizontal axis: mid-year birth cohort.]

[Figure 2. Age-standardized incidence rates per 100,000 person-years at risk by sex, age adjusted to the European standard, for (a) Connecticut 1935-79, (b) Denmark 1943-77, (c) Norway 1955-84 (M males, F females). Horizontal axis: period.]

For the Poisson model, the deviance statistic is defined as the likelihood ratio statistic, $2\sum y_{jks} \log(y_{jks}/\hat{\lambda}_{jks}) - 2\sum (y_{jks} - \hat{\lambda}_{jks})$, where $\hat{\lambda}_{jks}$ is a maximum likelihood estimate of the cell mean $\lambda_{jks}$. The second term in this expression is identically zero under the standard fixed effects model. The deviance values for the fixed effects model, compared against an approximate critical value that corresponds to $\chi^2_{0.95}$(216 d.f.) = 251.00, suggest a poor fit of that model. (We estimated this critical value using Fisher's approximation, $\sqrt{2\chi^2_\nu} - \sqrt{2\nu - 1} \sim N(0, 1)$.) The degrees of freedom parameter $\nu$ is defined as $n - p$, where $n$ is the number of observations and $p$ is the number of independent parameters. For the fixed effects age-cohort model, $n$ is 286 and $p$ is 13 + 58 − 1, where the number of age parameters $\{a_j\}$ is 13, the number of cohort parameters $\{c_{ks}\}$ is 58, and 1 is subtracted because of the one degree of indeterminacy, resulting in $\nu$ equal to 216.

On the other hand, the random effects, sex-specific age-cohort model provided an approximate deviance of 157.35 for males and 194.26 for females. The approximate deviance was computed from that of the Poisson model by replacing the maximum likelihood estimate of $\lambda_{jks}$ with the empirical Bayes estimate (17) (the second term in the expression for the deviance is then no longer zero). Compared against an approximate critical value that corresponds to $\chi^2_{0.95}$(251 d.f.) = 288.67, these values suggest a good fit of the random effects model. For this model, the number of independent parameters estimated is $p$ equal to 13 + 22 + 1 − 1, where the number of age parameters $\{a_j\}$ is 13, the number of cohort parameters $\{c_k\}$ is 22, the number of extra-Poisson variation parameters $\sigma$ is 1, and 1 is subtracted because of the one degree of indeterminacy, resulting in $\nu$ equal to 251.

Further, the standardized Pearson residuals $(y_{jks} - \hat{\lambda}_{jks})/\sqrt{\hat{\lambda}_{jks}}$, with $\hat{\lambda}_{jks}$ the empirical Bayes estimate, had a sample mean and standard deviation of 0.0003 and 0.7350 for males and 0.0419 and 0.7996 for females. This provides some sense of the adequacy of the mean-variance structure of the model. Further, the coefficient of relative variation for the standard deviation of $\hat{c}_{ks}$, namely $\mathrm{SE}(\hat{c}_{ks})/\hat{c}_{ks}$, had a maximum value of 0.072 for males and 0.051 for females, which provides some sense of the reliability of the empirical Bayes estimates of geographic-specific cohort effects. For the fixed effects model, on the other hand, the coefficient of relative variation for the standard deviation of estimated geographic-specific cohort effects had a maximum value of 0.3537 for males and 0.3373 for females. Also, compared with the fixed effects model, the empirical Bayes estimates provided a reduction of up to 80.8 and 86.1 per cent in the standard error of $\hat{c}_{ks}$ for males and females, respectively. The maximum reductions were obtained at the endpoints of the birth cohort spectrum (oldest and youngest cohorts), where the numbers of observations are usually very small and where one usually obtains maximum improvement. Thus empirical Bayes estimation of geographic-specific cohort effects provides more stable estimates than the maximum likelihood fixed effects model. Estimates of the extra-Poisson variation $\sigma$ (on the logarithmic scale) were 0.1953 and 0.1407 for males and females, respectively.

In order to assess the significance of period effects in the random effects model, we assumed an age-period-cohort model with fixed age and period effects and random geographic-specific cohort effects. This model provided an approximate deviance statistic of 153.28 for males and 190.79 for females. These values, compared with an approximate critical value that corresponds to $\chi^2_{0.95}$(242 d.f.) = 279.01, suggest a good fit. For this model, the number of independent parameters estimated is $p$ equal to 13 + 10 + 22 + 1 − 2, where the number of age parameters $\{a_j\}$ is 13, the number of period parameters $\{p_i\}$ is 10, the number of cohort parameters $\{c_k\}$ is 22, the number of extra-Poisson variation parameters $\sigma$ is 1, and 2 is subtracted because of the two degrees of indeterminacy, resulting in $\nu$ equal to 242. Further, this model provided a log-likelihood $L$ of −907.93 for males and −911.43 for females, compared with −910.95 and −918.62 for males and females, respectively, obtained from the random effects age-cohort model. A formal test for the significance of period effects is based on the likelihood ratio statistic $-2\{L(\{\hat{a}_j\}, \{\hat{c}_k\}, \hat{\sigma}) - L(\{\hat{a}_j\}, \{\hat{p}_i\}, \{\hat{c}_k\}, \hat{\sigma})\}$. This statistic obtains as 6.04 for males and 14.38 for females. These values are not significant compared with a 5 per cent critical value of 16.92 based on $\chi^2$ (9 d.f.). Thus period effects in the random effects model were assessed to be non-significant for these data.

Figure 3 presents the estimated age effects $\{a_j\}$ for both sexes obtained from the random effects model. As in the fixed effects model, only the relative shapes of the sex-specific age-incidence curves are important, because these age effects are determined only up to an additive constant. It is evident from these age curves that the older age groups have higher incidence rates. It is also evident that the age curve rises more rapidly for males (after fitting a least-squares regression line). Figure 4 presents the estimated mean cohort effects $\{c_k\}$ for both sexes obtained from the random effects model. Here again, only the relative shapes of the sex-specific cohort-incidence curves are important, because these mean cohort effects are determined only up to an additive constant. This graph indicates that incidence rates rise in successive birth cohorts, with a slight indication of a less rapid rise for males (which is evident after fitting a least-squares regression line). Finally, in Figure 5, we have a summary of estimated geographic-specific cohort effects $\{\hat{c}_{ks}\}$ by sex. We can compare these country-specific cohort effects only within each sex, because these effects are determined only up to an additive constant. It is evident, however (after fitting a least-squares regression line), that for Denmark and Norway the cohort curve rises slightly faster for females compared with males, whereas the reverse is true for Connecticut. Further, we can use these country-specific estimates to identify countries with higher rates for further epidemiological study. For example, for males of successive birth cohorts, Norway and Connecticut appear to have higher incidence rates than Denmark. Also, for females of successive birth cohorts, Norway appears to have higher incidence rates than Connecticut and Denmark.

[Figure 3. Estimated age effects on logarithmic scale by sex for malignant skin melanoma incidence in Connecticut 1935-79, Denmark 1943-77 and Norway 1955-84 (M males, F females). Horizontal axis: age group.]

[Figure 4. Estimated mean cohort effects on logarithmic scale by sex for malignant skin melanoma incidence in Connecticut 1935-79, Denmark 1943-77 and Norway 1955-84 (M males, F females). Horizontal axis: mid-year birth cohort.]
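The goodness-of-fit arithmetic reported in this section can be re-traced with a short script. The sketch below is illustrative only: the function and variable names are chosen here, the deviance helper simply transcribes the formula given in the text, and the numerical inputs are the figures quoted above.

```python
import math
from statistics import NormalDist


def fisher_critical_value(df, level=0.95):
    """Fisher's approximation: sqrt(2 * chi2) - sqrt(2 * df - 1) is roughly N(0, 1)."""
    z = NormalDist().inv_cdf(level)
    return 0.5 * (z + math.sqrt(2.0 * df - 1.0)) ** 2


def poisson_deviance(observed, fitted):
    """2 * sum y * log(y / fit) - 2 * sum (y - fit); cells with y == 0 add 0 to the first sum."""
    first = sum(y * math.log(y / m) for y, m in zip(observed, fitted) if y > 0)
    second = sum(y - m for y, m in zip(observed, fitted))
    return 2.0 * first - 2.0 * second


n_cells = 286
df_fixed = n_cells - (13 + 58 - 1)             # fixed effects age-cohort model
df_random = n_cells - (13 + 22 + 1 - 1)        # random effects age-cohort model
df_period = n_cells - (13 + 10 + 22 + 1 - 2)   # random effects age-period-cohort model

# Likelihood ratio statistics for period effects from the quoted log-likelihoods.
lrt_males = -2.0 * (-910.95 - (-907.93))
lrt_females = -2.0 * (-918.62 - (-911.43))

for label, df in (("fixed", df_fixed), ("random", df_random), ("period", df_period)):
    print(label, df, round(fisher_critical_value(df), 2))
print(round(lrt_males, 2), round(lrt_females, 2))
```

With the standard normal 95th percentile taken from `NormalDist`, the approximated critical values agree with those quoted in the text to within rounding of the percentile.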

[Figure 5. Empirical Bayes estimates of geographic-specific cohort effects on logarithmic scale for malignant skin melanoma incidence in Connecticut 1935-79, Denmark 1943-77 and Norway 1955-84 for (a) males, (b) females (C Connecticut, D Denmark, N Norway). Horizontal axis: mid-year birth cohort.]


CONCLUSION

Our data analysis shows that empirical Bayes estimation provides more stable estimates of geographic-specific cohort effects, as indicated by the low magnitude of the coefficient of variation. Further, this method provides substantial improvement for small birth cohorts that usually have very small numbers of incident cases, as indicated by the large reduction in standard error of estimates of geographic-specific cohort effects for the younger and older birth cohorts. In addition, the estimates serve to identify for further epidemiological study those geographic regions that have high incidence rates, as indicated in the previous section.

On the other hand, empirical Bayes methods require the specification of a prior distribution that often does not form a conjugate pair with the likelihood function. This does not permit an analytically tractable analysis and results in the need to use numerical procedures or to find analytic approximations. All the calculations undertaken in this article were based on numerical integration procedures. The method proposed here falls short of the fully Bayesian approach. Although the latter accounts for the uncertainty in the maximum likelihood estimates of prior parameters, the calculation of third-order partial derivatives of the log-likelihood function required to carry out the Bayesian analysis results in a considerable increase in computation.

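APPENDIX I: COMPUTATIONAL FORMULAE FOR MARQUARDT'S PROCEDURE

We summarize expressions of derivatives of the log-likelihood function used in finding the maximum likelihood estimates. The marginal likelihood for each $Y_{jks}$ is

\[ l(y_{jks}; a_j, c_k, \sigma) = \int f(y_{jks} \mid \theta_{jks})\, g(\theta_{jks} \mid a_j, c_k, \sigma)\, d\theta_{jks}. \]

Then the log-likelihood function is

\[ L(y; a, c, \sigma) = \sum_{j,k,s} \log l(y_{jks}; a_j, c_k, \sigma). \qquad (18) \]

For simplicity, we drop the subscripts $j$, $k$, $s$ from $f(y_{jks})$, $g(\theta_{jks})$ and $\theta_{jks}$. The first-order partial derivatives of the log-likelihood function are

\[ L_{a_j} = \sum_{k,s} \frac{\int f g_{a_j}\, d\theta}{\int f g\, d\theta} \ (j = 1, \ldots, J), \qquad L_{c_k} = \sum_{j,s} \frac{\int f g_{c_k}\, d\theta}{\int f g\, d\theta} \ (k = 1, \ldots, K), \qquad L_{\sigma} = \sum_{j,k,s} \frac{\int f g_{\sigma}\, d\theta}{\int f g\, d\theta}, \]

where

\[ g_{a_j} = g_{c_k} = g\, \frac{\theta_{jks} - a_j - c_k}{\sigma^2}, \qquad g_{\sigma} = g \Big\{ \frac{(\theta_{jks} - a_j - c_k)^2}{\sigma^3} - \frac{1}{\sigma} \Big\}. \]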

The second-order partial derivatives of the log-likelihood function are

APPENDIX II: COMPUTATIONAL FORMULAE FOR THE EM ALGORITHM

Denote $(a_j^{(p)}, c_k^{(p)}, \sigma^{(p)})$ as the solution of the $p$th iteration. Then we carry out the usual E- and M-steps of the EM algorithm to obtain iterative updating equations for the unknown parameters $(a_j, c_k, \sigma)$, $j = 1, \ldots, J$, $k = 1, \ldots, K$.

At the $p$th iteration we do the following steps:

E-step

We set

M-step

We then solve the following equations for $(a_j^{(p+1)}, c_k^{(p+1)}, \sigma^{(p+1)})$, $j = 1, \ldots, J$, $k = 1, \ldots, K$, after substituting the quantities obtained from the E-step:

ACKNOWLEDGEMENTS

Support for preliminary work on this study came from USPHS grant CA39949 from the National Institutes of Health, through the Fred Hutchinson Cancer Research Center where the author was a visiting research associate in 1985-86. Computations were done principally on the mainframe computer at the University of North Carolina, Wilmington.

REFERENCES

1. Mason, T. J. and McKay, F. W. U.S. Cancer Mortality by County, 1950-1969, DHEW publication (NIH) 74-615, 1974.
2. Breslow, N. E. and Day, N. E. 'Indirect standardization and multiplicative models of rates with reference to the age adjustment of cancer incidence and relative frequency data', Journal of Chronic Diseases, 28, 289-303 (1975).
3. Venzon, D. J. and Moolgavkar, S. H. 'Cohort analysis of malignant melanoma in five countries', American Journal of Epidemiology, 119 (1), 62-70 (1984).
4. Manton, K. G., Woodbury, M. A. and Stallard, E. 'A variance components approach to categorical data models with heterogeneous cell populations: analysis of spatial gradients in lung cancer mortality rates in North Carolina counties', Biometrics, 37, 259-269 (1981).
5. Pocock, S. J., Cook, D. G. and Beresford, S. A. 'Regression of area mortality rates: what weighting is appropriate?', Journal of Applied Statistics, 30 (3), 286-295 (1981).
6. Breslow, N. E. 'Extra-Poisson variation in log-linear models', Journal of Applied Statistics, 33 (1), 38-44 (1984).
7. Tsutakawa, R. K., Shoop, G. L. and Marienfeld, C. J. 'Empirical Bayes estimation of cancer mortality rates', Statistics in Medicine, 4, 201-212 (1985).
8. Tsutakawa, R. K. 'Estimation of cancer mortality rates: a Bayesian analysis of small frequencies', Biometrics, 41, 69-79 (1985).
9. Efron, B. and Morris, C. 'Stein's estimation rule and its competitors-an empirical Bayes approach', Journal of the American Statistical Association, 68 (341), 117-139 (1973).
10. Deeley, J. J. and Lindley, D. V. 'Bayes empirical Bayes', Journal of the American Statistical Association, 76 (376), 833-841 (1981).
11. Dempster, A. P., Laird, N. M. and Rubin, D. B. 'Maximum likelihood from incomplete data via the EM algorithm (with discussion)', Journal of the Royal Statistical Society, Series B, 39, 1-38 (1977).
12. Marquardt, D. W. 'An algorithm for least squares estimation of non-linear parameters', Journal of the Society of Applied Mathematics, 11, 431-441 (1963).
