Multiple censored data in dentistry: A new statistical model for analyzing lesion size in randomized controlled trials.

384

Biometrical Journal 57 (2015) 3, 384–394

DOI: 10.1002/bimj.201400118

Multiple censored data in dentistry: A new statistical model for analyzing lesion size in randomized controlled trials Marvin N. Wright1 and Andreas Ziegler∗,1,2,3 1

2

3

¨ Medizinische Biometrie und Statistik, Universität zu Lubeck, ¨ Institut fur Universitätsklinikum ¨ ¨ Schleswig-Holstein, Campus Lubeck, Ratzeburger Allee 160, 23562 Lubeck, Germany ¨ Klinische Studien, Universität zu Lubeck, ¨ Zentrum fur Universitätsklinikum Schleswig-Holstein, ¨ ¨ Campus Lubeck, Lubeck, Germany School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa

Received 3 June 2014; revised 17 December 2014; accepted 12 January 2015

Caries infiltration is a novel treatment option for proximal caries lesions. The idea is to build a diffusion barrier inside the lesion to slow down or stop the caries progression. If a lesion still reaches a critical size, restorative treatment is required. Clinical trials investigating caries infiltration thus produce multiple censored ordinal data. Standard statistical models do not take into account this censoring, and we therefore propose the Multiple Ordered Tobit (MOT) model. The model is implemented in R and compared with standard approaches. Simulation studies demonstrate that for all sample sizes and scenarios the MOT model has the largest statistical power among all methods compared, and it is robust against heteroscedasticity to some extent. Finally, a comparison with dichotomous and ordinal scaled models shows that the use of metric data for the lesion size reduces the required sample size considerably.

Keywords: Caries infiltration; Censored data; Lesion size; Tobit model.

Additional supporting information including source code to reproduce the results may be found in the online version of this article at the publisher’s web-site

1 Introduction One common goal in the treatment of caries is to preserve as much natural substance of a tooth as possible. In early stages of caries, this can be achieved by preventive, noninvasive methods, such as fluoridation or improvement of oral hygiene. For lesions reaching the inner enamel or outer dentin, it is too late for this kind of treatment but too early for invasive treatment, such as drilling. The standard therapy is to continue the noninvasive treatment and monitor the progression of the lesions. If cavitation finally occurs, conventional invasive treatment, that is drilling and filling, is used. Recently, a new method called caries infiltration has been proposed for proximal lesions to fill the gap between noninvasive and invasive treatment (Paris et al., 2007). Here, fissures in molars are filled with composite material to prevent bacteria growth. The diffusion barrier is not applied at the surface but inside the lesion. To assess efficacy and safety of the caries infiltration approach, it was discussed to conduct a randomized controlled parallel group trial. The primary endpoint of such a trial is lesion size. Although it is quantitative in nature, it is measured ordinally with current technology. ∗ Corresponding

author: e-mail: [email protected], Phone: +49-451-500-2780, Fax: +49-451-500-2999

C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Biometrical Journal 57 (2015) 3

385

In the course of a trial, lesions may increase in size so that they need to be drilled and filled. As a result, the data are censored for all subsequent follow-up time points, and the quantitative information on lesion size is not available at the end of the follow-up period. Follow-up in these studies generally is once per year for each patient so that the only available information is whether the filling was in the first or second year, and so on. The time point of censoring is informative because late drilling and filling is preferable over early drilling and filling. Since follow-up time points are fixed, the lesion size is censored at fixed thresholds. In recent studies on caries infiltration two approaches to model the data were used. Martignon et al. (2012) used ordinal categories for lesion progression and added another category for cases when filling took place. Paris et al. (2007) used only two categories: Progression and no progression, and treated filling as progression. Both approaches implicitly use threshold models for the lesion size, but restrict them to ordinal categories, ignoring the quantitative nature of the lesion size. The measurement in categories could be interpreted as interval censoring, and an ordinal logit model, also termed proportional odds model (McCullagh, 1980), or the more sophisticated Bayesian method by Komárek et al. (2005) could be used. The former can be applied to categorical data only, the latter additionally incorporates left and right censoring. But still, the quantitative lesion size would be ignored. One approach for quantitative data with one point of censoring would be the Tobit model (Tobin, 1958). Originally developed for econometric applications, the Tobit model is often used in medical and biological applications in case of a lower or upper limit of detection (Lorimer and Kiermeier, 2007). If patients are followed up over several time points and censoring can occur at any of these time points, the standard Tobit model is not appropriate because it only allows for a single time point for censoring. In summary, no appropriate model for the data from a randomized controlled parallel group trial on caries infiltration could be found. In this paper, we therefore extend the standard Tobit model with one time point of censoring to a multiple ordinal Tobit (MOT) model with several time points of censoring. The approach has been implemented in the R (R Core Team, 2014) package lmmot, which is available on CRAN (Wright, 2014). The package uses the maxLik (Henningsen and Toomet, 2011) package for maximum likelihood estimation and MASS (Venables and Ripley, 2002) for the computation of the Moore–Penrose pseudoinverse. Its validity for estimation is illustrated in a simulation study, where we show that the linear and Tobit models yield biased estimates. In a second simulation study, type I error levels and power of the MOT model are compared with alternative statistical approaches. Finally, in an application to the planning of a clinical trial, the methods are compared with respect to the needed sample size.

2 The multiple ordinal Tobit model The standard Tobit model has an observable variable y and a continuous latent variable y∗ , which are connected through a left censoring threshold model with threshold τ y=

c, y∗ ,

if y∗ ≤ τ, if y∗ > τ.

(1)

For censored data, the observation y equals a real-valued constant c, while y∗ is observed otherwise. In most laboratory applications, the threshold is known a priori, and c = τ . This basic type of left censoring can be insufficient, and natural extensions, which are used in laboratory medicine are right censoring and double censoring. For example, in the double censoring model, values below and values above a quantification range are censored (Nakamura and Nakamura, 1983). Other extensions include the random censoring Tobit model (Buchinsky and Hahn, 1998) and the ordered Tobit model (Bellemare and Barrett, 2006). C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

www.biometrical-journal.com

386

M. N. Wright and A. Ziegler: Multiple censored data in dentistry

For multiple censored data, we extend the threshold model (1) by K ordinal categories, representing the censoring intervals: ⎧ ∗ y , ⎪ ⎪ ⎪ ⎪ ⎨ c1 , y = c2 , ⎪ ⎪.. ⎪ ⎪ ⎩. cK ,

if −∞ < y∗ < τ1 , if τ1 ≤ y∗ < τ2 , if τ2 ≤ y∗ < τ3 ,

(2)

if τK ≤ y∗ < ∞,

for K intervals with thresholds τk and k = 1, . . . , K. As in the standard Tobit model (Tobin, 1958), we consider a sample of n independent subjects i, i = 1, . . . , n and assume a linear model for the latent variable y∗i y∗i = xi β + εi with independent and identically distributed non-systematic and homoscedastic errors εi ∼ N 0, σ 2 . xi is the vector of independent variables of length p, and β is a vector of the parameters of interest. In contrast to the Tobit model, K dummy variables are used to indicate censoring in the K ordinal categories: 1, zik = 0,

if τk ≤ y∗i < τk+1 , otherwise,

with τ0 = −∞ and τK+1 = ∞. The likelihood function is therefore obtained as L(β, σ ) =

n i=1

K zi0 zik f yi − xi β, σ , F τk+1 − xi β, σ − F τk − xi β, σ k=1

where F (yi ) and f (yi ) denote the distribution and density functions of yi , respectively. For maximum likelihood estimation, we assume normally distributed errors so that the log-likelihood function of the MOT model is given by n K

τk+1 − xi β yi − xi β τk − xi β l (β, σ ) = − ln σ + − zik ln zi0 ln φ , σ σ σ i=1

k=1

where (·) and φ(·) denote the distribution and density functions of the standard normal distribution, respectively. In the following, we use the following abbreviations: φi,k := φ τk − xi β /σ , i,k := τk − xi β /σ and vi,k := τk − xi β. The score function of the MOT model is obtained as ⎞ n K φi,k − φi,k+1 1 xi zi0 y − xi β + zik xi ⎟ ⎜ σ σ i i,k+1 − i,k ⎟ ∂l (β, σ ) ⎜ i=1 k=1 ⎟, ⎜ = n K ⎜

1 vi,k φi,k − vi,k+1 φi,k+1 ⎟ ∂ (β, σ ) 2 1 ⎠ ⎝1 y − xi β − 1 + zik zi0 σ σ2 i σ i,k+1 − i,k ⎛

i=1


k=1



387

and the Fisher information matrix is subsequently derived to be I(β, σ ) =

Iββ Iσ β

Iσ β , Iσ σ

with Iββ Iσ σ

2 n K

φi,k − φi,k+1 1 1 v φ − vi,k+1 φi,k+1 − = 2 xi xi i,1 xi xi − , σ σ i,k i,k i,k+1 − i,k i=1 k=1 n K

1 3 1 vi,k − 2vi,k σ 2 φi,k = 2 2i,1 − σ σ3 i=1 k=1 2 3 1 vi,k φi,k − vi,k+1 φi,k+1 2 − vi,k+1 − 2vi,k+1 σ φi,k+1 − 2 σ i,k+1 − i,k

and Iσ β

n K 1

= 2 −xi σ

1 2 vi,k − σ 2 φi,k − v2i,k+1 − σ 2 φi,k+1 2 σ i=1 k=1 1 vi,k φi,k − vi,k+1 φi,k+1 φi,k − φi,k+1 − . σ i,k+1 − i,k

3 Simulation study I: Model validation To evaluate the implementation of the likelihood function, score function, and Fisher information of the MOT model in the R implementation lmmot, we performed a Monte–Carlo simulation study. To this end, we estimated the parameters of the MOT model, the standard linear regression model and the standard right censored Tobit model using data generated from a simple linear regression model y∗i = β0 + β1 xi + εi , with normally distributed errors εi ∼ N (0, 4). The independent variables xi were drawn from a normal distribution with mean 1 and variance 9. All data points (xi , y∗i ) were ordinal right censored with thresholds τ = (τ1 , τ2 , τ3 ). The linear model and the standard Tobit model were estimated using the R function lm() and the censReg package (Henningsen, 2012), respectively. The MOT model was estimated with the new R package lmmot, provided with this article. Parameter estimates were averaged over all replicates. Bias, variance, and mean squared error (MSE) of 100,000 simulations with 100 data points per replicate are shown in Table 1 for β0 = 1, β1 = 1, σ 2 = 4, and thresholds τ ∈ {(1, 2, 3), (2, 3, 4), (3, 4, 5)}. The mean proportions of censoring for τ = (1, 2, 3) were 39% uncensored observations, 11% between τ1 and τ2 , 11% between τ2 and τ3 , and 39% larger than τ3 . For τ = (2, 3, 4) the proportions were 50%, 11%, 10%, 29%, respectively, and for τ = (3, 4, 5) they were 61%, 10%, 9%, 20%, respectively. The linear model underestimated both intercept and slope in all scenarios. The Tobit and MOT models slightly overestimated intercept and slope, with lower bias and variance for the MOT model. The bias decreased with increasing thresholds for all models. The variance of the linear model increased for larger thresholds, while the variance for the Tobit and MOT models decreased. For larger threshold, C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


388


Table 1 Bias, variance, and mean squared error (MSE) from a simulation study for the true multiple ordinal Tobit model with β0 = 1, β1 = 1, σ 2 = 4, and varying thresholds τ = (τ1 , τ2 , τ3 ). Intercept

Slope

Thresholds

Method

Bias

Variance

MSE

Bias

Variance

MSE

(1, 2, 3)

Linear model Tobit model MOT model

−0.7027 0.0092 0.0042

0.0360 0.1015 0.0558

0.5298 0.1015 0.0558

−0.3911 0.0048 0.0033

0.0041 0.0161 0.0091

0.1570 0.0162 0.0092

(2, 3, 4)


−0.4530 0.0068 0.0028

0.0384 0.0695 0.0496

0.2436 0.0696 0.0497

−0.2972 0.0040 0.0026

0.0041 0.0118 0.0075

0.0924 0.0118 0.0075

(3, 4, 5)


−0.2785 0.0044 0.0019

0.0404 0.0554 0.0470

0.1179 0.0555 0.0470

−0.2159 0.0033 0.0022

0.0040 0.0091 0.0064

0.0506 0.0091 0.0064

Notes: In the simulation, 100,000 replicates were drawn with 100 data points each.

thus fewer censored observations, the differences between the models decreased. In summary, the MOT model estimates revealed the least bias and the lowest MSE in all analyzed scenarios.

4 Simulation study II: Power In the second simulation study, the Wald and likelihood ratio test of the MOT model were compared with standard statistical methods for testing for a treatment effect in clinical trials with multiple ordinal right censored data. Starting point was a parallel group trial with two groups. The treatment variable equaled 1 for the treatment with caries infiltration, and 0 for the waiting strategy. The size of a caries lesion was measured annually over a three year period. If the lesion exceeded a certain size, restorative treatment was performed. In this case, the lesion size could not be measured at later follow-ups, and only the year of treatment was available. We used the simple linear regression model y∗i = β0 + β1 xi + εi

(3)

for the latent variable y∗i , which denoted the lesion size after three years. The samples were equally split to control (xi = 0) and treatment (xi = 1). The errors were assumed to be normally distributed with εi ∼ N 0, σ 2 . Under the assumption that a lesion treated earlier is worse than a lesion treated later, we obtain a triple ordinal right censored model with threshold model ⎧ ∗ y , ⎪ ⎨ i c1 , yi = ⎪ ⎩c2 , c3 ,

if −∞ < y∗i < τ1 , if τ1 ≤ y∗i < τ2 , if τ2 ≤ y∗i < τ3 , if τ3 ≤ y∗i < ∞.

Subjects with yi = c1 correspond to patients with restorative treatment in year 3, while restoration was assumed to be performed in years 2 and 1 for subjects with yi = c2 and yi = c3 , respectively. Censored data were generated as in simulation study I. In the simulations the intercept, effect size, error variance, and threshold values were varied, and we investigated a total of 48 scenarios. For all combinations of β0 ∈ {0, 1}, β1 ∈ {0, 0.5, 1, 1.5}, σ 2 ∈ {0.6, 1, 1.4}, and τ ∈ {(1, 2, 4), (2, 4, 6)}, simulations were repeated with 100,000 replications each. To C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



389

assess robustness of the MOT model, we also performed simulations with heterogeneous variances. In the corresponding 16 scenarios, equal group sizes for treatment and control were chosen with equal β0 ∈ {0, 1}, β1 ∈ {0, 1} and τ ∈ {(1, 2, 4), (2, 4, 6)} for both groups. The variance σ 2 was either 0.6 for the treatment group and 1.4 for the control group or 1.4 for the treatment group and 0.6 for the control group. Finally, we performed simulations with misspecified thresholds. The data were simulated with thresholds τ real , while the models assumed thresholds of τ model . The parameters were set to β0 = 0, β1 = 1, σ 2 = 1, and τ real = (1, 2, 4), and model thresholds were set to τ model = τ real + d, with d ∈ {−2, −1, 0, 1, 2}. Again, 100,000 replications of each simulation scenario were computed. Robustness of a test statistic was defined adopting Bradley’s (1978) liberal criterion for the type I error. In detail, a test was regarded robust, if the estimated type I error was between 0.5 α and 1.5 α, where the size of the two-sided test was set to α = 0.05. The Wald-test MOT approach (MOT–Wald) and the likelihood ratio test MOT approach (MOT– LRT) as implemented in the R package lmmot were compared with the following standard statistical methods: (a) U-test using the R function wilcox.test(); (b) two-sample t-test using the R function t.test() (t-test); (c) Cochran–Armitage test for trend (Cochran, 1954) with four categories according to the censoring and linear ascending weights (trend test) — the R function used for the estimations is provided in the supplement; d) standard Wald-test Tobit model (Tobit) for a single parameter, available in the R package censReg; (e) ordinal logit model (Ordinal logit) with four categories according to the censoring. The R package ordinal from Christensen (2012) was used for estimation. Finally, (f) Pearson’s chi-squared test (chi-squared) using one category for all noncensored data was employed using the R function chisq.test(). Table 2 displays the results under the null hypothesis for varying sample sizes, variance, and intercept for 100,000 replications. Results of all simulated models are provided in the supplementary material (Supporting Information Tables S1–S64). For the case of homogeneous variances and smaller sample sizes, the MOT–Wald test showed slightly inflated type I errors although it satisfied Bradley’s criterion for robustness in all scenarios. The type I error levels of the MOT-LRT were substantially increased for small sample sizes and were higher than for MOT–Wald in all cases. However, with increasing sample size, the inflation diminished for both tests. The standard Tobit model also showed slightly inflated type I errors for small sample sizes in some simulation scenarios but satisfied Bradley’s criterion. The other approaches generally had type I errors close to or even below the nominal value α = 0.05. Pearson’s chi-squared test was conservative for all sample sizes in all scenarios, and, as a result, one might expect relatively low power for this test under the alternative hypothesis. For larger thresholds, type I errors were comparable, except for lower type I error levels for the trend and chi-squared test; see supplementary material for details. For heterogeneous variances and thresholds of τ = (1, 2, 4) the MOT–Wald test satisfied Bradley’s criterion in all scenarios. For thresholds of τ = (2, 4, 6) there were two cases, where Bradley’s criterion was not met for a sample size of n = 10 (supplementary material, Supporting Information Tables S50 and S58). The type I error was 0.076 in both cases, just failing this criterion. Again, the MOT–LRT showed substantially increased type I errors for small sample sizes. For the trend test, chi-squared test and, for some scenarios the Tobit model and ordinal logit model, the type I error increased with sample size. The models were apparently misspecified in these cases. Finally, both the t-test and the U-test can be regarded as robust in the simulated scenarios. Empirical power from 100,000 replicates is displayed for varying sample sizes, error variance, intercept, effect size, and thresholds for the different approaches in Fig. 1. The MOT model had the highest power of the compared methods for all scenarios with homogeneous variances. Although the MOT–LRT had high power in all cases, we do not recommend this approach for applications because of inflated type I error levels under the null hypothesis. In most simulation scenarios, the Tobit model, the U-test and the t-test were next best in power. The ordinal logit model and trend test had low power. Pearson’s chi-squared test had the lowest power among all methods investigated because it tested only for the year of censoring. For heterogeneous variances with larger variance in the treatment group, the trend test appeared to have the highest power for large sample sizes. However, this test had inflated



390


Table 2 Empirical type I errors from the simulation study for varying sample size n and variance σ 2 for the treatment and control group. σ2

Method

n = 10

n = 20

n = 40

n = 60

n = 80

n = 100

Control 1.0

MOT–Wald MOT–LRT Tobit U-test t-test Ordinal logit Trend test Chi-squared

0.068 0.115 0.066 0.032 0.044 0.004 0.018 0.000

0.060 0.079 0.058 0.044 0.049 0.043 0.024 0.005

0.054 0.062 0.053 0.048 0.049 0.049 0.047 0.013

0.052 0.058 0.051 0.048 0.049 0.048 0.049 0.019

0.053 0.057 0.053 0.050 0.050 0.050 0.049 0.019

0.053 0.056 0.052 0.050 0.051 0.050 0.050 0.024


0.069 0.116 0.068 0.032 0.045 0.001 0.008 0.000

0.060 0.079 0.060 0.044 0.049 0.040 0.009 0.001

0.054 0.063 0.054 0.048 0.049 0.048 0.039 0.004

0.052 0.058 0.052 0.048 0.049 0.047 0.049 0.009

0.053 0.057 0.053 0.050 0.050 0.050 0.056 0.014

0.052 0.055 0.052 0.050 0.051 0.050 0.050 0.018


0.068 0.115 0.064 0.032 0.044 0.007 0.025 0.000

0.060 0.079 0.058 0.044 0.049 0.045 0.034 0.008

0.053 0.063 0.053 0.047 0.049 0.049 0.047 0.018

0.052 0.058 0.051 0.048 0.049 0.049 0.048 0.022

0.053 0.057 0.052 0.049 0.050 0.050 0.050 0.022

0.053 0.056 0.052 0.050 0.051 0.050 0.049 0.028

MOT–Wald MOT–LRT Tobit–Wald U-test t-test Ordinal logit Trend test Chi-squared

0.073 0.119 0.071 0.034 0.047 0.005 0.024 0.000

0.062 0.082 0.060 0.048 0.050 0.048 0.049 0.012

0.055 0.064 0.055 0.050 0.051 0.060 0.146 0.051

0.053 0.058 0.054 0.051 0.053 0.067 0.212 0.096

0.054 0.058 0.056 0.053 0.056 0.075 0.286 0.137

0.054 0.057 0.058 0.054 0.060 0.080 0.355 0.198

MOT–Wald MOT–LRT Tobit–Wald U-test t-test Ordinal logit Trend test Chi-squared

0.073 0.120 0.070 0.035 0.047 0.005 0.025 0.000

0.062 0.081 0.061 0.047 0.051 0.048 0.051 0.013

0.055 0.064 0.056 0.050 0.051 0.060 0.147 0.051

0.054 0.059 0.056 0.051 0.054 0.067 0.210 0.095

0.054 0.059 0.056 0.053 0.057 0.075 0.286 0.135

0.054 0.057 0.056 0.053 0.058 0.081 0.355 0.199

Treatment 1.0

Control 0.6 Treatment 0.6




Notes: The thresholds were set to τ = (1, 2, 4) and the intercept to β0 = 0. Simulations were performed with 100,000 replications each and a nominal type I error level of α = 0.05.




391

σ2ctrl = 1, σ2treat = 1, β = (0,1)’, τ = (1,2,4)’

σ2ctrl = 1, σ2treat = 1, β = (0,0.5)’, τ = (1,2,4)’

σ2ctrl = 1.4, σ2treat = 1.4, β = (0,1)’, τ = (1,2,4)’

σ2ctrl = 0.6, σ2treat = 0.6, β = (0,0.5)’, τ = (1,2,4)’

σ2ctrl = 1.4, σ2treat = 1.4, β = (1,1)’, τ = (2,4,6)’

σ2ctrl = 1.4, σ2treat = 1.4, β = (0,1.5)’, τ = (2,4,6)’

σ2ctrl = 0.6, σ2treat = 1.4, β = (0,1)’, τ = (1,2,4)’

σ2ctrl = 1.4, σ2treat = 0.6, β = (1,1)’, τ = (1,2,4)’

1.00

0.75

0.50

0.25

0.00

1.00

Power

0.75

0.50

0.25

0.00

Method

1.00

MOT−Wald MOT−LRT

0.75

U−Test t−Test 0.50

Tobit Ordinal logit Trend test

0.25

Chi−squared 0.00 25

50

75

100

25

50

75

100

Sample size

Figure 1 Empirical power from the simulation study for varying sample sizes with different thresholds τ, intercept β0 , effect size β1 and variances σ 2 for the control and treatment group. Simulations were performed with 100,000 replications each and a nominal type I error level of α = 0.05.



392


Table 3 Simulated sample size required for 80% power in a randomized controlled trial. Method

Sample size required for 80% power


233 233 250 245 238 278 371 428

Notes: The parameters were set to β = (5.8, −2), σ = 5.4, and τ = (7.6, 9.4, 11.8). Simulations were performed with 100,000 replications each and a nominal type I error level of α = 0.05.

type I error levels under the null hypothesis. As expected with heteroscedasticity, the U-test gained some power compared to the other approaches, with the MOT approaches performing best. Results of the simulation study with misspecified thresholds are provided in the supplementary material (Supporting Information Tables S65–S74). Type I errors were mostly unchanged, except for d = −2, where they were lower for the Tobit model at larger sample sizes, higher for the MOT–Wald at n = 10, and slightly lower for MOT–Wald at larger sample sizes. The power was generally reduced for the MOT model, the Tobit model and the t-test. For the U-test, ordinal logit model and chi-squared test it was unchanged or slightly increased for d > 0 and reduced for d < 0. In contrast, the trend test had higher power for d = −1 and lower power for d > 0. For d = −2 the power of all tests was comparatively low. The MOT model and trend test reached 50% power for large sample sizes, while the others stayed below 20% for all samples sizes. In summary, the MOT–Wald performed best in all scenarios of this simulation study. It is important to note that the performance of the approaches depends on the proportion of censored observations. If no observation was censored, the Tobit and MOT–Wald tests are equal to the t-test. If all censored observations are in the first category, the MOT model is identical to the standard Tobit model. Finally, if all observations are censored, the MOT model equals the ordinal logit model.

5 Application: Planning of a randomized controlled trial The development of the MOT model was motivated by a dental caries study, and in this section we provide sample sizes required for such a trial. Given the lack of closed formulae, we simulated required sample sizes for a two group study with one-to-one randomization comparing caries infiltration with the waiting strategy. Based on a study on quantitative light-induced fluorescence (Heinrich-Weltzien et al., 2005), we simulated lesion size as a normally distributed random variable with mean sizes of 5.8 mm2 and 3.8 mm2 for the control and the active treatment group at a common standard deviation of 5.4 mm2 . The thresholds were chosen to lead to censoring proportions of 70% uncensored lesions and 10% in each category, representing the lesions treated restoratively in the years 3, 2, and 1 of the follow-up. This resulted in a threshold vector of τ = (7.6, 9.4, 11.8). We performed simulations with 100,000 replications each. At the two-sided 5% test-level, the sample sizes required to achieve an average power of 80% with the methods introduced in Section 4 are provided in Table 3. The Wald test and likelihood ratio test for the MOT model performed equally well, leading to a required sample size of 233. The methods using continuous data required considerably smaller sample sizes than the methods using only ordinal or dichotomous information. Specifically, the t-test performed next best after the MOT methods, confirming the results from the previous simulation study. C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



393

The U-test required more samples than the t-test, which was expected for homogeneous variances and correctly specified thresholds. The Tobit model could not confirm the results from the previous study, and it performed worse than the other models using the continuous data. The sample size increased by, respectively, 19%, 59%, and 84% for the ordinal logit model, the trend test, and the chi-square test. In this section, we have illustrated the superiority of the MOT model over competitive analysis approaches for a caries lesion study. For these censored data it is important to use the available quantitative information.

6 Discussion Motivated by clinical trials for caries infiltration, the Multiple Ordinal Tobit (MOT) model was proposed as a new statistical model for analyzing multiple censored ordinal data. Model specification and implementation were validated in a first simulation study. For triple ordinal right censored data, as intended in the clinical trial, the MOT model showed lower bias in point estimates and lower mean squared errors than both the classical linear model and the standard Tobit model. In a second simulation study, the MOT model was compared with standard statistical approaches for multiple ordinal censored data. The results indicate that with respect to type I error and power, the MOT model with the Wald–type test performed best in all analyzed scenarios, and it was robust in case of heteroscedastic variances or misspecified thresholds. Furthermore, the trend and the chi-squared test suffered from low power compared to other approaches. For small samples sizes, the MOT–Wald model showed slightly inflated type I errors. We do not recommend its use with fewer than n = 10 samples. Modifications of the test statistic would be required to keep the nominal type I error level. Alternatively, the U-test could be employed. For heterogeneous variances, the type I error levels were slightly larger in a few settings, and here a sample size of at least n = 20 is indicated by the simulation studies for acceptable type I error levels. However, in applications, sample sizes smaller than n = 20 are rarely expected, especially for parallel group phase III trials in dentistry. In previous studies for caries infiltration, the lesion size was measured only in categories (Martignon et al., 2006, 2012; Paris et al., 2010). The size or progression was measured radiographically and classified manually afterwards. Additionally, in the statistical analysis the categories were neglected and a dichotomous test with only progression or no progression was used. As shown here, both simplifications lower the power considerably for detecting a treatment effect. Standard errors of maximum likelihood estimators can be estimated using the Hessian matrix or the Fisher information matrix (Greene, 2011). For the MOT model, we evaluated the performance of these two approaches in simulations. The use of the Hessian matrix led to slightly inflated type I errors compared to the use of the Fisher information matrix. We observed no difference in the power. We therefore decided to use the Fisher information matrix in all simulations presented in this work. Future research on the assessment of caries infiltration should focus on the development and establishment of methods to measure the lesion size of infiltrated lesions quantitatively. As shown here, this would lead to a reduction of required sample sizes. This might be possible with modern imaging technologies, such as subtraction radiology (Pretty, 2006). The MOT model proved to be suitable for analyzing such data. To widen the field of possible applications of the MOT model, it could be extended to unknown (Arminger et al., 1995) or random (Buchinsky and Hahn, 1998) thresholds. Reproducible Research All R scripts to run the simulation studies are supplied with the Supporting Information. simulation_study_1.R, simulation_study_2.R: Runscripts for simulation studies; study_planning.R Script to run the study planning in Section 5; testPower.R: Function for computation of power and type I error for all methods; tabtrend.R: Function for Cochran–Armitage test for trend. C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


394


Conflict of interest A.Z. is Associate Editor of the Biometrical Journal, Statistics in Medicine and Methods of Information in Medicine. M.N.W. declares no conflict of interest.

References Arminger, G., Clogg, C. and Sobel, M. (1995). Handbook of Statistical Modeling for the Social and Behavioral Sciences. Plenum Publishing Corporation, New York, NY. Bellemare, M. F. and Barrett, C. B. (2006). An ordered Tobit model of market participation: evidence from Kenya and Ethiopia. American Journal of Agricultural Economics 88, 324–337. Bradley, J. (1978). Robustness? British Journal of Mathematical & Statistical Psychology 31, 144–152. Buchinsky, M. and Hahn, J. (1998). An alternative estimator for the censored quantile regression model. Econometrica 66, 653–671. Christensen, R. H. B. (2012). Ordinal: Regression Models for Ordinal Data. R package version 2012.01-19. http://www.cran.r-project.org/package=ordinal/. Cochran, W. G. (1954). Some methods for strengthening the common χ 2 tests. Biometrics 10, 417–451. Greene, W. H. (2011). Econometric Analysis (7th edn.). Prentice Hall, Upper Saddle River, NJ. ¨ ¨ Heinrich-Weltzien, R., Kuhnisch, J., Ifland, S., Tranæus, S., Angmar-M˚ansson, B. and Stoßer, L. (2005). Detection of initial caries lesions on smooth surfaces by quantitative light-induced fluorescence and visual examination: an in vivo comparison. European Journal of Oral Sciences 113, 494–498. Henningsen, A. (2012). censReg: Censored Regression (Tobit) Models. R package version 0.5-10. http://CRAN.R-project.org/package=censReg. Henningsen, A. and Toomet, O. (2011). maxlik: a package for maximum likelihood estimation in R. Computational Statistics 26, 443–458. Komárek, A., Lesaffre, E., Härkänen, T., Declerck, D. and Virtanen, J. I. (2005). A Bayesian analysis of multivariate doubly-interval-censored dental data. Biostatistics 6, 145–155. Lorimer, M. F. and Kiermeier, A. (2007). Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318. Martignon, S., Ekstrand, K. and Ellwood, R. (2006). Efficacy of sealing proximal early active lesions: an 18-month clinical study evaluated by conventional and subtraction radiography. Caries Research 40, 382–388. Martignon, S., Ekstrand, K., Gomez, J., Lara, J. and Cortes, A. (2012). Infiltrating/sealing proximal caries lesions: a 3-year randomized clinical trial. Journal of Dental Research 91, 288–292. McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, Series B 42, 109–142. Nakamura, A. and Nakamura, M. (1983). Part time and full time work behavior of married women: a model with a double truncated dependent variable. Canadian Journal of Economics 6, 144–159. ¨ Paris, S., Hopfenmuller, W. and Meyer-Lueckel, H. (2010). Resin infiltration of caries lesions. Journal of Dental Research 89, 823–826. Paris, S., Meyer-Lueckel, H. and Kielbassa, A. (2007). Resin infiltration of natural caries lesions. Journal of Dental Research 86, 662–666. Pretty, I. (2006). Caries detection and diagnosis: novel technologies. Journal of Dentistry 34, 727–739. R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/. Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica 26, 24–36. Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S (4th edn.). Springer, New York, NY. Wright, M. N. (2014). lmmot: Multiple Ordinal Tobit (MOT) model. R package version 0.1.2. http://CRAN.Rproject.org/package=lmmot.



Comparison of methods for analyzing left-censored occupational exposure data.

Statistical inference from multiply censored environmental data.

Sample size and robust marginal methods for cluster-randomized trials with censored event times.

A statistical framework for analyzing deep mutational scanning data.

Analyzing overall survival in randomized controlled trials with crossover and implications for economic evaluation.

Conflict of interest reporting in dentistry randomized controlled trials: a systematic review.

A Bayesian proportional hazards model for general interval-censored data.

Method for analyzing left-censored bioassay data in large cohort studies.

Statistical considerations when analyzing biomarker data.

Challenges in randomized controlled trials and emerging multiple sclerosis therapeutics.

Multiple frailty model for clustered interval-censored data with frailty selection.

A Statistical Model for Event Sequence Data.

Randomized controlled trials in dentistry: common pitfalls and how to avoid them.

Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.

Statistical reporting in randomized controlled trials from the dermatology literature: a review of 44 dermatology journals.

A general method for handling missing binary outcome data in randomized controlled trials.

Stratified Weibull Regression Model for Interval-Censored Data.

The effect of cluster size variability on statistical power in cluster-randomized trials.

A multiple imputation strategy for sequential multiple assignment randomized trials.

A model-based approach for imputing censored data in source apportionment studies.

Effect of Risk of Bias on the Effect Size of Meta-Analytic Estimates in Randomized Controlled Trials in Periodontology and Implant Dentistry.

Correction: Effect of Risk of Bias on the Effect Size of Meta-Analytic Estimates in Randomized Controlled Trials in Periodontology and Implant Dentistry.

Sample size calculations for micro-randomized trials in mHealth.

Early versus deferred treatment for smoldering multiple myeloma: a meta-analysis of randomized, controlled trials.