Article

Evaluation of cluster recovery for small area relative risk models

Statistical Methods in Medical Research 2014, Vol. 23(6) 531–551 ! The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0962280214527382 smm.sagepub.com

Chawarat Rotejanaprasert

Abstract The analysis of disease risk is often considered via relative risk. The comparison of relative risk estimation methods with ‘‘true risk’’ scenarios has been considered on various occasions. However, there has been little examination of how well competing methods perform when the focus is clustering of risk. In this paper, a simulated evaluation of a range of potential spatial risk models and a range of measures that can be used for (a) cluster goodness of fit, (b) cluster diagnostics are considered. Results suggest that exceedence probability is a poor measure of hot spot clustering because of model dependence, whereas residual-based methods are less model dependent and perform better. Local deviance information criteria measures perform well, but conditional predictive ordinate measures yield a high false positive rate. Keywords Model-based clustering, spatial, small area, health, disease mapping, evaluation

1 Introduction A range of models are now available with a variety of goals in disease mapping applications. One goal is to isolate clusters of disease while another is to model clustering. To this end, standard relative risk models as well as specific clustering models have been proposed for use. The comparison of relative risk estimation methods with ‘‘true risk’’ scenarios has been considered on various occasions.1,2 A variety of measures have also been proposed for assessing clustering (e.g. exceedence, residual thresholding, or parametric estimation). This paper describes exploration of the appropriateness of both different models and also different metrics in finding unusual aggregations (or clusters) of disease in small areas. Cluster analysis of disease incidence has a long history, and a variety of approaches can be adopted for this analysis ranging from testing-based methods such as the use of SatScan3 to fully parameterized cluster models.4,5 Some evaluation of model-based clustering of risk has been attempted previously. For example, local likelihood models were evaluated via a small range of performance measures.6 Furthermore, a wider evaluation of spatiotemporal models has also been attempted.7 In this paper, a novel evaluation of both a range of Bayesian spatial models and a range Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA Corresponding author: Chawarat Rotejanaprasert, Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA. Email: [email protected]

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

532

Statistical Methods in Medical Research 23(6)

of measures which may be used for cluster assessment is made. The paper is structured as follows. First, notation and study context and structure (including simulation approach) are introduced. Next, the models to be evaluated and then the relevant measures are established. Following that, results of the simulated evaluation are presented and finally a discussion and conclusions are made.

2 Methodology In this work, a data format commonly used in small area health studies, counts of disease in small areas, is assumed as the focus. Arbitrary areas (e.g. zip codes, counties, post codes, etc.) are denoted as i ¼ 1, . . . , m where yi and ei , respectively, denote the count and expected count within the ith area. A common assumption is made that the counts are independently distributed Poisson variates, i.e. yi  Poissonði Þ, i ¼ ei i where i is the relative risk parameter of the ith area. A primary focus of disease mapping is the modeling of relative risk of disease, and the commonest approach to the specification of the risk model is to assume that the log of the relative risk can be modeled linearly via covariates and/or random effects. Hence, logði Þ is then parameterized so that relevant variation in risk is captured. Often in relative risk models, random effects are included to allow for uncorrelated or correlated noise in the risk. Commonly, an additive combination of uncorrelated and correlated effects is assumed and this is known as a convolution model (additive random effects with Gaussian prior distributions). Other models can be considered however. For example, simple long range variation in risk such as spatial trend can be assumed. The use of relative risk models with a focus on isolating clusters of unusual risk has been proposed by Richardson et al.8 Some of these risk models are examined here but they are complemented with models that specifically focus on clustering of risk (cluster models). A cluster of risk is typically defined as an area of unusually elevated or reduced risk. Here, increased risk is the only focus. The definition of ‘‘unusual’’ depends on the assumptions. In the simplest approach, any area displaying unusually elevated risk is a cluster or part of a cluster. In more sophisticated models, criteria are introduced concerning contiguity of areas or morphology of the cluster. Membership of cluster sets becomes important because this is the classification of an area as being within a cluster or not. A spatial cluster is here defined as a group of neighboring areas with unusual relative risks; the relative risk is usually modeled as a function of spatial components. Here, a range of scenarios where true clusters (predefined areas of elevated risk) are given and the recovery of these areas as a criterion of the evaluation procedure are considered. Before describing the simulation scenarios in detail, the models to be evaluated with respect to cluster detection capabilities are described.

2.1

Simple trend model (Trend)

The simplest choice to model the relative risk is the simple trend model which is defined as logði Þ ¼ 0 þ x Sxi þ y Syi where 0 represents the overall risk under the log scale, Sxi and Syi are, respectively, the geographical coordinates of the centriod of the ith area, and x and y are the coefficients of the geographic coordinates, respectively. All parameters are assumed stochastic and, therefore, have prior distributions under our Bayesian formulation. It is commonly assumed that the intercept and slope coefficients can be described by a zero-mean Gaussian distribution if the parameter can have both negative and positive values and this is the case of log relative risk models. Thus, a centered Gaussian distribution is assumed for each parameter. The standard deviation of each regression parameter and the intercept in the trend model were assumed to have a uniform distribution with a range of (0,100).

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Rotejanaprasert

533

The simple trend model consists of only a trend component which is not often a popular choice used to model relative risk as the linear trend assumption does not include any random noise or variation. However, the trend model is considered because it could be regarded as a base model for spatial variation and in some cases is more parsimonious and fits better than more sophisticated random effect models in certain situations.5

2.2

Uncorrelated heterogeneity (UH) model

The trend model assumes that the log relative risk is linearly described only by location. Simple regression models usually do not capture the overdispersion or random variation due to unobserved confounders. Perhaps a more appropriate model should include some terms to capture such effects. Initially, an uncorrelated component can be included to account for random variation in the model and the model can be defined as logði Þ ¼ 0 þ vi where vi represents the variation in risk of the ith area differing from the overall risk, 0 . The random effect can allow for the effect of unobserved covariates and residual noise in relative risk. Since the support of vi is on the real line, a zero-mean Gaussian prior distribution is sensible to describe the parameter. The common standard deviation of vi and intercept were assumed to have a uniform distribution with a range of (0,100).

2.3

Convolution model (BYM)

Often, spatial data contain geographical characteristics which usually appear to be positively correlated. Spatial dependency can lead to spatial autocorrelation in disease maps. Although UH models are useful and capture some extra random variation, spatial correlation is not accounted for. One approach to inclusion of spatial autocorrelation is to consider the addition of a spatial Gaussian process. A spatial Gaussian process9 is defined marginally by a multivariate normal distribution with mean and covariance inherited from the process. This provides a sophisticated description of continuous spatial variation. A simpler alternative is the convolution model suggested by Besag et al.10 In the model, both UH and correlated heterogeneity random effects are employed. This is sensible since unobserved risk confounding effects within a study area can take on a variety of forms, both structured and unstructured effects. There is no reason a priori to exclude either of them. Thus, the convolution model includes both within an additive formulation: logði Þ ¼ 0 þ vi þ ui where 0 and vi are specified as in the UH model described earlier. For the correlated component, ui , Besag et al.10 proposed an improper conditional autoregressive model (ICAR) based on i the neighborhood of the ith area. The neighborhood, i , is assumed to be only the adjacent neighbors and ni denotes the number in the adjacent set. That is, the BYM model, with the conditional specification of the ICAR component, can be written as logði Þ ¼ 0 þ vi þ ui ! 1X u1 ui jui  Normal uj , ni j2 ni i

vi  0   ¼

Normalð0, v1 Þ Normalð0, 1 Þ 0 1 ; sd  Unifð0, 10Þ sd2

 ¼ u , v , 0

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

534

Statistical Methods in Medical Research 23(6)

The conditional specification of this model provides a simplified description, assuming Markovian dependence on immediate neighbors of areas. The combination of spatial correlation and the uncorrelated component makes the convolution model a robust method in various situations of risk estimation and, therefore, is appropriate for the description of clustered risk due to the general definition of a cluster as spatial association with elevated risks. This model has previously shown to perform well in recovery of true relative risk across simulation scenarios.1,2 It has not been evaluated for clustering however.

2.4

Dirichlet process (DP)

In parametric Bayesian settings, parameters are assumed to be random, and the data level distribution is assumed to be known. Nonparametric Bayesian models are defined to allow for a more flexible mixture specification of data models. The most commonly used method for this is the DP.11 This gives us a way to estimate a mixture probability density with unrestricted shapes and potentially provides an alternative flexible model for risk variation. To construct the DP, Blackwell and MacQueen12 proposed an algorithm known as the generalized Polya urn scheme. Another wellknown representation of the DP is the Chinese restaurant process.13 Here, we want to examine the relaxation of parametric assumptions and adopt the construction based on the stick-breaking representation proposed by Sethuraman14 which is easily implemented. It is assumed that theP relative risks can be modeled using the stick-breaking representation such that i  G where G ¼ 1 delta function and G0 is the base j¼1 j j and j  G0 .  is the Dirac Q distribution. The weights can be obtained as j ¼ 0j j1 ð1  0i Þ and 0j  Betað1,  Þ. This i¼1 representation of a DP provides a simple way to obtain the nonparametric sample and shows the fact that the samples are discrete. G0 is considered to be a Gamma distribution with parameters G0 and G0 , and a Gamma distribution is assumed for each parameter. Here,  is a concentration parameter and is correlated with the estimated number of components. A log-normal distribution with a zero mean and large variance is considered as a prior distribution for  . The number of mixtures in the stick-breaking weights can be potentially infinite. However, the possible maximum number of mixtures, J, can be set to a number so that the number of components can be estimated and the component weights need to be normalized. The DP using this stick-breaking process, with associated prior distributions, can be expressed as J X i  G; G ¼ j j j¼1

j

j

¼ PJ

j¼1

j

j  GammaðG0 , G0 Þ j ¼ 0j

j1 Y

ð1  0i Þ; 0j  Betað1,  Þ

i¼1

for j ¼ 1, . . . , J G0 , G0  Gammað0:1, 0:1Þ Þ logð Þ  Normal ð0, 1  1  ¼ 2 ; sd  Unif ð0, 10Þ sd

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Rotejanaprasert

535

The prior mean of number of mixtures can be approximated from the Polya urn scheme as  logð1 þ n Þ when n,   0 where n is the number of observations.13,15 Thus, the prior number of mixtures is proportional to  as the large value of  implies a larger number of mixtures a priori which grows logarithmically in the number of observations. This slow growth of the number of mixtures is sensible because the new observations tend to belong to a big mixture. It is expected that there will be a small number of mixtures with large membership.

2.5

Spatial cluster model (SCL)

Bayesian nonparametric models have gained considerable acceptance in many fields due to their flexibility. Additionally, the nonparametric nature of these models makes them very effective for clustering problems where the distinct number of components is unknown beforehand. By the construction, DP can be seen as a mixture model with the stick-breaking weights. However, the model does not incorporate an explicit spatial structure which is an important component of spatial models as mentioned previously. Several Bayesian parametric and nonparametric spatial mixture models have been proposed16 but they have been little focused on spatial cluster data in the context of disease mapping. In this paper, a spatial mixture model for cluster data in small areas is also proposed. Variants of this proposal have been previously proposed.17,18 A finite mixture is assumed as a mean mixture model. The latent allocation vector z ¼ (z1, . . . , z159) with zi 2 f1, . . . , Jg as the mixture (component) label of the component generating the ith observation is assumed. Then P½zi ¼ j jJ, pi  ¼ ij independently and yi jJ, z, pi ,   h  Poisson ei zi independently. An approach to avoid fixing the number of mixtures is to employ an entry parameter.19,20 Let the maximum number of mixtures be J which is potentially as large as the number of areas but can be assumed to be sufficiently large, e.g. 10 or 20, to find true number. Define j as an entry parameter to have a Bernoulli distribution with the parameter pj 21 described by P a Beta prior distribution, following Kuo and Mallick. Then, ij is defined as ij ¼ j hij = k k hik . The jth component is excluded from the model when j ¼ 0 and is included when j ¼ 1. The cluster function, hij, is specified with two components as hij ¼ expfd1 ðsi , cj Þ  d2 ð yi =ei , j Þg where si is the centroid of the ith area, yi is the count of the ith area, cj is the center of the jth cluster, and j is the estimated relative risk of the jth cluster. That is, d1 represents a spatial distance between the ith area and the jth cluster while d2 measures the similarity of the crude rate, yi/ei, and the relative risk of the jth cluster. Hence, it is natural to assume d1 and d2 to be the Euclidean and absolute distances, respectively. The areas with similarity in location and magnitude of risk would have a higher probability to have the same component. For the location of a cluster center, each of the coordinates is assumed to have a prior normal distribution with mean as the center of the map with its standard deviation described by a uniform distribution with a range of (0,100). The centers are posterior sampled with other parameters. In the posterior any cluster center falling outside the study region had low probability of acceptance. Moreover, the relative risk of the jth cluster, j , is identically distributed as Gamma with parameters  and  each also described by a Gamma distribution. The relative risk of the ith area is assumed to inherit the risk of the cluster. That is, assigning i ¼ zi for all i areas. Thus, the hierarchical cluster model can be formulated as   yi jJ, z, pi , h  Poisson ei zi P½zi ¼ jjJ, pi  ¼ ij

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

536

Statistical Methods in Medical Research 23(6) ij ¼ j hij =

X

k hik

k

hij ¼ expfd1 ðsi , cj Þ  d2 ð yi =ei , j Þg j  Betað1, 1Þ j  Gammað ,  Þ cj ¼ coordinateðcxj , cyj Þ   cxj  Normal xcenter , x1 cyj  Normal ð ycenter , y1 Þ for j ¼ 1, . . . , J  ,   Gammað0:1, 0:1Þ 1 x ¼ 2 ; sdx  Unif ð0, 10Þ sdy y ¼

1 ; sdy  Unif ð0, 10Þ sd2y

3 Cluster evaluation The evaluation of the spatial models can be divided into two components. First, global measures to assess the overall performance of each model fitting to the data can be considered. Second, localized clustering behavior can be evaluated using local diagnostics. The first approach evaluates model behavior while the second approach can be used to evaluate model diagnostics. One of the most common global goodness-of-fit (GOF) measures is the deviance information criteria (DIC), and we also examine local unit-wise measures such as the local deviance information criteria (local DIC), standardized Bayesian residual, conditional predictive ordinate (CPO), and relative risk exceedence probability.

3.1

GOF measures

3.1.1 DIC Information criteria are measures of the relative GOF of a statistical model. Usually, these measures of model adequacy compare a fitted model to a saturated model. It is based on the difference between the log likelihood of the data under either model. An important variant of information criteria that is widely used in Bayesian modeling is the DIC. This is defined as DIC ¼ 2Ehjy ðDÞ  D½Ehjy ðhÞ where D ¼ 2ðlog f ðyjhÞÞ, the Bayesian deviance of the model. The   DIC is based on a comparison of the average deviance, Ehjy ðDÞ ¼ 2E log fhjy ðyjh Þ  P log fhjy ðyjhg Þ D ¼ 2 G from Markov chain Monte Carlo (MCMC) sampling with G iterations, g¼1 G     and the  deviance of the posterior expected parameter estimates, D Ehjy ðhÞ  D h~ ¼  g g 2 log fhjy yjh~ . For any sample parameter value g the deviance  is  Dðh Þ ¼ 2 log fhjy ðyjh Þ. The effective number of parameters (pD) is estimated as pD ¼ D  D h~ , and finally, DIC ¼ D þ pD: One natural choice for pD would be to use the posterior mean h ¼ Ehjy ðhÞ as the estimator for h~ which leads to the standard DIC proposed by Spiegelhalter et al.22 From Jensen’s inequality, pD is

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Rotejanaprasert

537

guaranteed to be nonnegative when using the posterior mean as an estimator of ~ for any logconcave likelihood function. However, pD can be negative for non-log-concave likelihoods in certain situations. An alternative estimatorPof pD proposed by Gelman et al.23 is based on the g  2 posterior variance of the deviance: pDr ¼ ð G g¼1 ðDðh Þ  DÞ Þ=ð2ðG  1ÞÞ. The DIC based on this  estimator is DICr ¼ D þ pDr: 3.1.2 Mean squared predictive error (MSPE) The approach proposed by Gelfand and Ghosh24 measures predictive capability of a model by comparing the observed data to predicted data from the fitted model. Define the ith predictive data item as ypred . An overall P crude loss across the data is afforded by the average i P measure of 2 loss across all items: MSPE ¼ i j ðyi  ypred ij Þ =ðG  mÞ, where m is the number of observations and G is the sampler sample size.

3.2

Cluster diagnostics

In addition to global GOF measures, which address overall performance of models we have also examined the local behavior of diagnostics relevant to clustering. To evaluate clustering behavior, a number of local measures of GOF have been examined. These include local DIC, CPO, and exceedence probabilities. 3.2.1 Local DIC A variety of forms of DIC have been used as a primary measure for overall or global model comparisons in Bayesian settings. However, the DIC can be evaluated locally providing a finer level of model diagnostics. The partitioning of the DIC was initially given by Spiegelhalter et al.22 according to individual observations as DICi ¼ D i þ pDi where D i is the average deviance of the observation i, and pDi is the effective number of parameters or the information contributing to its own fitted value of the observation i. One can also estimate the pDi from the local average deviance and the local deviance of the mean parameters. This calculation can also apply to DICr (DICr is based on the term pDr which is a notation for approximation of pD. Then DICr is a notation of approximation of DIC using pDr) as well. 3.2.2 CPO Although posterior predictive measures are useful for model checking and diagnostics, they have a drawback as they make double use of the data. A solution is proposed by Geisser and Eddy25 using the leave-one-out cross-validation which is known as the   R predictive  density  CPO. The CPO is defined as CPO ¼ f y jy ð jh Þ f hjy ¼ f y dh and is estimated from a i i ni i ni P 1 1 sampler by ðG1 G Þ , where y is the ith observation of y and y is y after omitting yi g i \i g¼1 fðyi jh Þ and G is the sampler sample size. The CPO is a convenient posterior predictive checking tool because it is easily implemented using MCMC and can be used to identify outliers, influential observations across different nonnested models. A larger value implies a better fit of the model to the observation and a low value suggests that the observation is an outlier or influential observation. 3.2.3 Exceedence probability To investigate localized behavior of a model, an exceedence probability is an important tool for the assessment of unusual elevation of disease.8,26 The probability can be estimated by recording how

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

538

Statistical Methods in Medical Research 23(6)

often the relative risk exceeds a threshold and has been used to evaluate how unusual the risk is in an area.6 The exceedence probability is usually calculated from the posterior sample values and defined P g as P^ ði 4 cÞ ¼ G g¼1 Iði 4 cÞ=G where G is the sampler sample size and  1 if  4 c Ið 4 cÞ ¼ 0 otherwise There are two components needed to calculate the probability: the cut-off point c for the theta and the threshold for a probability to define an unusual risk. Some thresholds conventionally chosen are, for example, 0.95, 0.975, and 0.99 while c can be a range of extreme risks such as 1 or 2 and 3. Larger values of c represent more extreme risk levels. Analysis of residuals forms a fundamental part of the assessment of model GOF in any area of statistical application. When Poisson likelihood models are assumed with yi  Poissonði Þ, then a standardized residual can be defined by yi  ^ i resi ¼ pffiffiffiffiffi ^ i In general, when a model fits well, one expects the residuals from that model to have a number of features. Residuals are commonly used in regression analysis to detect data points with extreme variation. They can also be used in cluster recovery since residuals potentially contain cluster information, evidence of grouped areas with unusually elevated risk, which is not captured in a model. In this paper, I also calculate the exceedence probability of standardized as a P residuals g measure to evaluate the unusual-risk areas which is defined as P^ ðresi 4 cÞ ¼ G Iðres 4 cÞ=G i g¼1 where G is the sampler sample size. 3.2.4 Receiver operating characteristic curve (ROC curve) The cluster diagnostics are used to assess whether an area is considered to have unusual risk or not. This concept is similar to a binary classification. The concepts of sensitivity and specificity are readily usable in this context to measure the performance of cluster diagnostics. A ROC curve is a graphical representation exemplifying the effectiveness of a binary discrimination system as its classification threshold is varied. It is generated by plotting the true positive rate, the proportion of high-risk areas exceeding a threshold, against the false positive rate, the proportion of low-risk areas exceeding a threshold at various threshold settings. The true positive rate is also known as sensitivity, and the false positive rate is one minus the specificity or true negative rate.

4 Evaluation procedures To assess the performance of various models a simulation-based procedure was adopted. A suitable simulation map was identified to use as a basis for our evaluation. This map is used as a study area into which simulated risks and counts from known models were generated. The assessment of clustering was performed based on the county map of the state of Georgia, USA. This state has a large number of counties (159) with a reasonably regular spatial distribution and small range of county size and shape. Figure 1 displays this map. To compare the cluster detection capability of the models and measures, a simulation study was conducted with spatial clusters defined as a group of areas with elevated relative risks. Count yik for county i of the kth dataset was simulated from yik  Poissonðeik ik Þ, where i ¼ 1, . . . , 159 and

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Rotejanaprasert

539

Figure 1. Georgia county map (left) with cluster design one (middle) and two (right) of the simulated data. The dark-colored areas have a high relative risk and the light-colored areas have a low relative risk.

k ¼ 1, . . . , 300. The expected rate, eik , was assumed to be one for each of the county and simulation. For the first simulation scenario (relative risk scenario 1), each of simulated clusters (high-risk areas) was assumed to have aggregated areas with the elevated relative risk of 3 while the other counties (low-risk areas) were assumed to have a one-unit risk. The true relative risk, ik , was thus modeled as ik equals 3 and 1 for high- and low-risk areas, respectively. The relative risk of 3 could be considered as a very high risk and most of the measures were expected to have a good recovery rate. Next, another scenario (relative risk scenario 2) assuming high-risk areas to have a relative risk of 2 was conducted to examine how well the measures can detect a cluster in a situation with low-risk clusters. Moreover, models and diagnostics were also evaluated to access their performance under the null situation, a condition of no clusters (relative risk scenario 3). In this null design, the true relative iid risks were modeled as ik  Gammað1, 1Þ for all counties. This distribution can produce some isolated areas of high or low risk but yields risks 90% of which are below 2.9. In design one (Figure 1, middle), the size and shape of the clusters are quite regular and similar. In addition, another clustering design with different sizes and shapes was adopted and presented in design two (Figure 1, right). Each design made use of the three scenarios of relative risks to compare models and measures. For each scenario and design, 300 (K) simulated data sets were generated over which results were averaged. Since the number of high-risk areas of design one and two is 3 and 6, respectively, the maximum number of clusters, J, was chosen to be 10 and 20 for design one and two, respectively. To construct a ROC curve, points obtained from all possible chosen thresholds can be connected. The value of local DICs lies on the positive real line and a threshold can be a very large positive number. However, after initial simulations, the value never exceeded 30. Then, a range of 0–30 was set as the threshold for local DIC and local DICr. For CPO and exceedence probabilities, the threshold was set to range from 0 to 1—this is the value’s natural range. Once the range is set, the thresholds for ROC are based on increments of 1/1000. Inference for all models was achieved using posterior sampling via Markov chain Monte Carlo. WinBUGS was employed using a combination of Gibbs sampling and Metropolis–Hastings steps via R2WinBUGS to fit the models. The burn-in period was

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

540

Statistical Methods in Medical Research 23(6)

set as the first 10,000 iterations. This was pretested to ensure that convergence occurred for all models by this time. Two chains were sampled and parameter estimation was based on a sample of 5000 after the burn-in period.

5 Results 5.1 Overall GOF differences Tables 1 and 2 present the performances of the five models for two different map designs and the three scenarios described previously; note that relative risk scenario 3 does not change based on the design applied. The tables summarize the results of model fitting in terms of average DIC, DICr, and MSPE over the 300 simulations. When comparing performance of the models, a model with smaller measure represents an overall average better GOF. The SCL model appears to yield very low values across the range of scenarios and designs. Though for both designs on scenario 3 (the no cluster case), the UH and BYM models fit better than the SCL models in terms of DIC. Overall, the SCL model appears to provide the best fit in a variety of scenarios, especially when large risk clusters are present. The worst models appear to be trend followed by the DP, although the predictive capability of the DP is good in the case of medium risk levels.

5.2

Local diagnostic differences

Although global GOF measured by DIC or MSPE can be used to assess how good a model fits a set of data in general, they do not provide much information about cluster recovery. Cluster diagnostics are needed to assess local behavior. For cluster evaluation, models are compared using the ROC curve computed based on a number of local measures. To calculate ROC curves, true positive and false positive rates are needed. A true positive rate for a measure was calculated by averaging the proportion of times when the posterior sampler of the measure was greater than a threshold during the convergent period over the areas of true high risks (dark blue areas). A false positive rate was computed in a similar way but averaged over the low-risk areas (light blue areas) which are considered as the noncluster. A threshold was picked in a range of possible values of the measure. Then, the threshold was varied over the range to get a curve by plotting the true positive rates, proportion of high-risk areas exceeding a threshold, against false positive rates, proportion of low-risk areas exceeding a threshold. These curves are seen in Figures 2 to 7. We examined a range of local measures including local DIC, CPO, exceedence probabilities for relative risk, and standardized residuals. With the null scenario, no models find clusters: the ROC curves come to the 45 diagonal of the ROC space. With moderate risk, scenario 2, the DP and BYM models perform best for risk exceedence and residual exceedence. The SCL and UH models also achieve reasonable performance for the moderate risk scenario. Finally for the high-risk situation, scenario 1, the BYM and SCL models perform reasonably well. The simple trend model performs poorly for risk exceedence and only slightly better for residual exceedence probability. The poor performance of exceedence measures is further highlighted by the Figures 8 and 9 which display a selection of posterior estimated maps under relative risk exceedence (Prði 4 1Þ) and residual exceedence (Prðresi 4 1Þ) for design 1 (three clusters) at high risk. It is clear that relative risk exceedence is highly model dependent and thus, highly sensitive to assumptions about risk variation. On the other hand, the residual exceedence recovers the three clusters in all cases.

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

474.89 409.64 409.97 470.45 390.16

103.77 (0.35) 89.81 (11.76) 89.9 (11.759) 103.4 (14.43) 84.88 (10.02)

371.11 319.82 320.06 367.05 305.28

Trend UH BYM DP SCL

High risk Trend UH BYM DP SCL

Low risk Trend UH BYM DP SCL

(1.69) (42.02) (42.52) (51.49) (34.91)

(2.04) (53.78) (54.28) (65.92) (44.94)

DIC (pD)

Overall

Risk scenario 3

376.01 347.51 347.96 604.57 360.91

104.81 97.54 97.72 160.01 98.40

480.82 445.05 445.68 764.58 459.32

(6.58) (69.69) (70.42) (289.01) (90.54)

(1.39) (19.49) (19.57) (71.04) (23.55)

(7.97) (89.19) (89.98) (360.06) (114.09)

DICR (pDr)

365.82 246.82 246.95 255.29 225.76

101.60 69.01 68.99 72.32 62.28

467.41 315.83 315.94 327.60 288.04

MSPE

(0.55) (7.5) (8.36) (8.9) (8.5)

(1.84) (19.89) (23.92) (25.91) (30.68)

329.75 (1.29) 329.6 (12.38) 330.01 (15.55) 330.96 (17.01) 273.38 (22.19)

133.08 131.42 129.99 131.97 103.54

462.82 461.02 459.99 462.93 376.93

DIC (pD)

Risk scenario 2

331.14 338.62 340.66 354.15 305.63

133.91 137.99 136.43 142.21 116.66 (2.68) (21.41) (26.2) (40.2) (54.43)

(1.38) (14.07) (14.81) (19.14) (21.61)

465.05 (4.07) 476.61 (35.48) 477.09 (41.01) 496.36 (59.34) 422.3 (76.05)

DICR (pDr)

281.49 266.86 262.23 261.30 188.79

130.66 118.81 117.28 117.67 88.42

412.15 385.68 379.51 378.97 277.21

MSPE

Table 1. Goodness-of-fit measures of the five models under relative risk scenarios of map design 1.

343.11 337.51 337.22 337.56 330.57

179.22 163.49 156.68 161.78 143.49

(1.146) (23.58) (25.09) (27.13) (28.39)

(0.72) (18.41) (16.84) (17.27) (12.16)

522.33 (1.87) 501.01 (41.99) 493.9 (41.93) 499.34 (44.41) 474.06 (40.55)

DIC (pD)

Risk scenario 1

(2.54) (32.96) (27.71) (33.39) (19.2) 344.70 (2.73) 353.02 (39.09) 352.9 (40.76) 364.16 (53.63) 343.25 (41.07)

181.04 178.03 167.55 177.93 150.52

525.74 (5.27) 531.06 (72.04) 520.45 (68.47) 542.1 (87.03) 493.77 (60.27)

DICR (pDr)

327.73 284.78 276.16 271.36 240.92

235.59 177.44 175.73 172.67 158.53

563.32 462.22 451.88 444.04 399.45

MSPE

Rotejanaprasert 541

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

476.03 (2.05) 410.28 (54.07) 410.61 (54.57) 469.8 (66.2) 386.87 (46.6)

68.49 58.95 59.08 67.87 55.01

407.54 351.33 351.53 401.93 333.01

Trend UH BYM DP SCL

High risk Trend UH BYM DP SCL

Low risk Trend UH BYM DP SCL

(1.76) (46.34) (46.82) (56.61) (46.59)

(0.28) (7.73) (7.74) (9.59) (6.46)

DIC (pD)

Overall

Risk scenario 3

(7.91) (89.84) (90.71) (351.66) (134.35)

412.59 381.92 382.36 645.52 410.67

(6.81) (76.92) (77.66) (300.2) (117.8)

69.31 (11) 64.14 (12.92) 64.38 (13.05) 109.74 (51.46) 65.1 (16.55)

481.89 446.07 446.74 755.26 475.77

DICR (pDr)

400.78 271.51 271.73 279.56 247.17

66.42 45.44 45.47 46.38 39.75

467.20 316.95 317.21 325.95 286.91

MSPE

358.68 359.13 359.86 362.92 358.49

(1.52) (12.94) (15.55) (18.46) (24.07)

89.49 (0.48) 88.05 (4.79) 87.64 (5.27) 88.92 (5.86) 84.9 (5.15)

448.17 (2) 447.18 (17.74) 447.5 (20.83) 451.85 (24.33) 443.39 (29.23)

DIC (pD)

Risk scenario 2

(1.27) (9.69) (10.34) (14.2) (15.03)

(4.32) (31.59) (36.26) (59.34) (91.5)

360.21 (3.05) 368.09 (21.9) 370.22 (25.92) 389.6 (45.14) 410.89 (76.47)

90.27 92.94 92.71 97.26 94.78

450.48 461.04 462.93 486.86 505.67

DICR (pDr)

294.27 280.96 278.81 274.69 260.00

86.45 78.92 77.95 78.18 75.77

380.72 359.88 356.76 352.87 335.77

MSPE

Table 2. Goodness-of-fit measures of the five models under relative risk scenarios of map design 2.

365.5 (1.34) 364.73 (23.71) 365.46 (25.2) 366.16 (26.48) 356.36 (31.96)

125.38 (0.64) 112.18 (11.64) 110.862 (11.6) 112.07 (11.3) 98.44 (7.64)

490.89 (1.98) 476.91 (35.35) 476.32 (36.8) 478.22 (37.78) 454.8 (39.6)

DIC (pD)

Risk scenario 1

367.04 378.45 379.97 392.67 389.86

127.18 124.76 122.81 126.91 106.37

494.22 503.21 502.78 519.58 496.24

(2.87) (37.44) (39.71) (52.99) (65.47)

(2.44) (24.21) (23.54) (26.14) (15.58)

(5.31) (61.65) (63.25) (79.13) (81.04)

DICR (pDr)

323.01 291.46 290.43 287.67 254.51

161.20 121.84 120.61 121.61 110.32

484.22 413.29 411.03 409.29 364.83

MSPE

542 Statistical Methods in Medical Research 23(6)

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Figure 2. ROC curves of different cluster diagnostics using data generated from map design 1 and relative risk scenario 3. (a) Local DIC, (b) local DICr, (c) cpo, (d) prob (y > 1), (e) prob (y > 2), (f) prob (y > 3), (g) prob (res > 1), (h) prob (res > 2), (i) prob (res > 3).

Rotejanaprasert 543

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Figure 3. ROC curves of different cluster diagnostics using data generated from map design 1 and relative risk scenario 2. (a) Local DIC, (b) local DICr, (c) cpo, (d) prob (y > 1), (e) prob (y > 2), (f) prob (y > 3), (g) prob (res > 1), (h) prob (res > 2), (i) prob (res > 3).

544 Statistical Methods in Medical Research 23(6)

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Figure 4. ROC curves of different cluster diagnostics using data generated from map design 1 and relative risk scenario 1. (a) Local DIC, (b) local DICr, (c) cpo, (d) prob (y > 1), (e) prob (y > 2), (f) prob (y > 3), (g) prob (res > 1), (h) prob (res > 2), (i) prob (res > 3).

Rotejanaprasert 545

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Figure 5. ROC curves of different cluster diagnostics using data generated from map design 2 and relative risk scenario 3. (a) Local DIC, (b) local DICr, (c) cpo, (d) prob (y > 1), (e) prob (y > 2), (f) prob (y > 3), (g) prob (res > 1), (h) prob (res > 2), (i) prob (res > 3).

546 Statistical Methods in Medical Research 23(6)

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Figure 6. ROC curves of different cluster diagnostics using data generated from map design 2 and relative risk scenario 2. (a) Local DIC, (b) local DICr, (c) cpo, (d) prob (y > 1), (e) prob (y > 2), (f) prob (y > 3), (g) prob (res > 1), (h) prob (res > 2), (i) prob (res > 3).

Rotejanaprasert 547

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Figure 7. ROC curves of different cluster diagnostics using data generated from map design 2 and relative risk scenario 1. (a) Local DIC, (b) local DICr, (c) cpo, (d) prob (y > 1), (e) prob (y > 2), (f) prob (y > 3), (g) prob (res > 1), (h) prob (res > 2), (i) prob (res > 3).

548 Statistical Methods in Medical Research 23(6)

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Rotejanaprasert

549

Figure 8. Relative risk exceedence probabilities averaged over 300 simulations for a selection of basic relative risk models: Prði 4 1Þ for cluster design one and scenario 1. (a) Trend, (b) UH, (c) BYM.

Figure 9. Residual exceedence probabilities averaged over 300 simulations for a selection of basic relative risk models: Pr(resi > 1) for cluster design one and scenario 1. (a) Trend, (b) UH, (c) BYM.

6 Discussion A range of diagnostic measures and overall GOF summaries have been evaluated over a range of scenarios for clustering of risk. It is clear that the SCL model provides as good overall fit over a variety of clustering scenarios, but for local diagnostics, the DP, BYM, and SCL models appear to provide a good overall performance. It should be noted that the DP used in this study is not specifically targeted as a spatial model and alternative specifications could also be utilized.16 It is also apparent that trend models are poor choices. One caveat in this analysis is that the simulated ground truth has an idealized form with jumps in risk between 1 and 2 and 1 and 3 depending on the scenario. In addition, the background variation is assumed to be stationary and constant in the clustered realizations. Hence an idealized ground truth has been used. However, the overall differences in performance between the models would be mirrored in other formulations where

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

550

Statistical Methods in Medical Research 23(6)

risk varied more on a continuous basis, which can be significantly assumed. Here, it is more important to assess distinct clustering in the simulations and in the evaluation of local measures.

7 Conclusions The spatial cluster model assumed here appears to provide the best overall fit to the simulation scenarios. The UH and BYM models also do quite well in overall fit diagnostics for the no-clustering scenario and high-risk areas but not for the highest risk simulations. The trend and DP models behave poorly in terms of overall fit. For cluster diagnostics, however, the DP and BYM appear good for risk and residual exceedence probabilities. The SCL and UH models also achieve recovery reasonably well. The simple trend model performs poorly for exceedences as it averages over the risk surface. This suggests that relative risk exceedence measures can be very unreliable and are highly dependent on the underlying model. It is also clear that while trend models are generally poor in recovery of true risk, as might be expected, the local diagnostics based on relative risk exceedence has very poor robustness to misspecification of risk. It is therefore crucial that if such measures are used, they must be accompanied with evaluation of sensitivity of risk form and prior distributions. Residual exceedence appears to be more reliable for cluster diagnostic purposes. In addition, the use of models specifically designed to capture relative risk (BYM or UH) does not provide the best fits in terms of overall cluster GOF, but does have reasonable performance locally. Acknowledgement I would like to acknowledge the support and encouragement of my supervisor Dr Andrew Lawson in this work. The manuscript benefited greatly from the comments of referees.

Funding The financial support was provided by the Thai government and the Medical University of South Carolina.

References 1. Lawson A, Biggeri A, Boehning D, et al. Disease mapping models: An empirical evaluation. Disease Mapping Collaborative Group. Stat Med 2000; 19: 2217–2241. 2. Best N, Richardson S and Thomson A. A comparison of Bayesian spatial models for disease mapping. Stat Methods Med Res 2005; 14: 35–59. 3. Kulldorff M and Nagarwalla N. Spatial disease clusters: Detection and inference. Stat Med 1995; 14: 799–810. 4. Lawson AB. Statistical methods in spatial epidemiology. 2 ed. New York: John Wiley, 2006. 5. Lawson AB. Bayesian disease mapping: hierarchical modeling in spatial epidemiology. 2 ed. Boca Raton: CRC Press, 2013. 6. Hossain M and Lawson AB. Cluster detection diagnostics for small area health data: With reference to evaluation of local likelihood models. Stat Med 2006; 25: 771–786. 7. Hossain MM and Lawson AB. Space-time Bayesian small area disease risk models: Development and evaluation with a focus on cluster detection. Environ Ecol Stat 2010; 17: 73–95. 8. Richardson S, Thomson A, Best N, et al. Interpreting posterior relative risk estimates in disease-mapping studies. Environ Health Perspect 2004; 112: 1016.

9. Banerjee S, Gelfand AE and Carlin BP. Hierarchical modeling and analysis for spatial data. Boca Raton: Crc Press, 2003. 10. Besag J, York J and Mollie´ A. Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 1991; 43: 1–20. 11. Ferguson TS. A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1973; 1: 209–230. 12. Blackwell D and MacQueen JB. Ferguson distributions via Po´lya urn schemes. The Annals of Statistics 1973; 1: 353–355. 13. Teh YW. Dirichlet process. In: Sammut C and Webb G (eds) Encyclopedia of machine learning. New York: Springer, 2010, pp.280–287. 14. Sethuraman J. A constructive definition of Dirichlet priors. Stat Sin 1994; 4: 639–650. 15. Hjort NL. Bayesian nonparametrics. Cambridge University Press, 2010. 16. Duan JA, Guindani M and Gelfand AE. Generalized spatial Dirichlet process models. Biometrika 2007; 94: 809–825. 17. Knorr-Held L and Raßer G. Bayesian detection of clusters and discontinuities in disease maps. Biometrics 2000; 56: 13–21.

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Rotejanaprasert

551

18. Gangnon RE and Clayton MK. Bayesian detection and modeling of spatial disease clustering. Biometrics 2000; 56: 922–935. 19. Lawson AB, Song HR, Cai B, et al. Space–time latent component modeling of geo-referenced health data. Stat Med 2010; 29: 2012–2027. 20. Choi J, Lawson AB, Cai B, et al. Evaluation of Bayesian spatiotemporal latent models in small area health data. Environmetrics 2011; 22: 1008–1022. 21. Kuo L and Mallick B. Variable selection for regression  The Indian Journal of Statistics, Series B models. Sankhya: 1998; 60: 65–81.

22. Spiegelhalter DJ, Best NG, Carlin BP, et al. Bayesian measures of model complexity and fit. J R Stat Soc Ser B (Stat Methodol) 2002; 64: 583–639. 23. Gelman A, Carlin JB, Stern HS, et al. Bayesian data analysis. 2 ed. Boca Raton: CRC press, 2004. 24. Gelfand AE and Ghosh SK. Model choice: A minimum posterior predictive loss approach. Biometrika 1998; 85: 1–11. 25. Geisser S and Eddy WF. A predictive approach to model selection. J Am Stat Assoc 1979; 74: 153–160. 26. Abellan JJ, Richardson S and Best N. Use of space–time models to investigate the stability of patterns of disease. Environ Health Perspect 2008; 116: 1111.

Downloaded from smm.sagepub.com at University of Ulster Library on May 10, 2015

Evaluation of cluster recovery for small area relative risk models.

The analysis of disease risk is often considered via relative risk. The comparison of relative risk estimation methods with "true risk" scenarios has ...
2MB Sizes 2 Downloads 3 Views