
Biometrical Journal 57 (2015) 3, 422–440

DOI: 10.1002/bimj.201400106

Focused information criterion on predictive models in personalized medicine

Hui Yang¹, Yutao Liu², and Hua Liang∗,³

¹ Medical Science Biostatistics, Amgen Inc, Thousand Oaks, CA 91320, USA
² School of Statistics and Mathematics, Central University of Finance and Economics, 39 South College Road, Beijing 100081, China
³ Department of Statistics, George Washington University, Washington, DC 20052, USA

Received 28 May 2014; revised 29 September 2014; accepted 27 October 2014

Instead of assessing the overall fit of candidate models, as traditional model selection criteria do, the focused information criterion (FIC) focuses directly on the parameter of primary interest and selects the model with the minimum estimated mean squared error for the estimator of that focus parameter. In this article, we apply the focused information criterion to personalized medicine. Using individual-level information from clinical observations, demographics, and genetics, we build personalized predictive models for individual prognosis and diagnosis. Accounting for heterogeneity among individuals helps reduce prediction uncertainty and improve prediction accuracy. Two real data examples from biomedical research illustrate the approach.

Keywords: Heterogeneity; Model selection criterion; Personalized medicine; Predictive model; Prognosis and diagnosis.



Additional supporting information, including source code to reproduce the results, may be found in the online version of this article at the publisher's website.

1 Introduction

With the continuing growth of biotechnology and genomics, personalized medicine has become an important topic in medical practice and is increasingly practicable. Evidence-based medicine selects therapy based on a whole group of patients; however, it ignores the heterogeneity among patients within the cohort. In oncology studies especially, cancers have been shown to be diverse in their oncogenesis, pathogenesis, and responsiveness to therapy, even when they share the same primary site and stage (Simon, 2013). A medication that has a significant treatment effect on some patients may be of no use to others, and misuse of medication may expose patients to the risk of adverse events with no benefit, as illustrated in Dumas et al. (2007). By utilizing individual-level characteristics, such as patient demographics, imaging and exam results, laboratory parameters, and genetic or genomic information, personalized predictive models can improve individualized prognosis and diagnosis and, correspondingly, individualize and optimize therapy, as noted in Simon (2005, 2012). They can therefore be applied to many areas of personalized medicine, including personalized preventive care; personalized prognosis, diagnosis, and monitoring; and personalized therapy selection. Since the turn of the century, there has been vigorous statistical research on personalized therapy selection, including Murphy (2002); Robins (2004); Moodie et al. (2007); Robins et al. (2008); Li et al. (2008); Qian and Murphy (2011); Brinkley et al. (2010); Gunter et al. (2011); and Zhang et al. (2012).

∗ Corresponding author: e-mail: [email protected], Phone: +1-202-994-7844, Fax: +1-202-994-6917

© 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


Most of this research involved a single decision or a series of sequential decisions and focused on estimating optimal treatment regimes. Some statisticians have also conducted subgroup analyses to tailor their findings to specific groups, such as Bonetti and Gelber (2000, 2004); Song and Pepe (2004); Pfeffer and Jarcho (2006); Wang et al. (2007); and Cai et al. (2011). For certain patients or subgroups, the therapy expected to yield the best estimated mean response, under a specified model with specified explanatory variables, is chosen from a set of candidate therapies. However, most of these methods rest on the assumption that the model with the specified explanatory variables is the true underlying model. Because of heterogeneity in the population, different explanatory variables might be significant for different patients or subgroups. Therefore, instead of focusing on personalized therapy selection, in this article we target personalized prognosis and diagnosis using personalized predictive models.

The construction of a reliable prediction rule for future responses depends heavily on the "adequacy" of the fitted model. Under traditional model selection criteria, such as Akaike's information criterion (AIC; Akaike, 1973), the Bayesian information criterion (BIC; Schwarz, 1978), and the deviance information criterion (DIC; Ando, 2007), the final fitted model is the one with the best overall properties for the whole population, regardless of individuals. Even if the model is selected based on the observations of a particular subgroup, it captures the overall information of that subgroup but does not necessarily work best for each individual in it, as illustrated in Henderson and Keiding (2005). Claeskens and Hjort (2003) introduced a model selection criterion from a different perspective, the focused information criterion (FIC).
It does not attempt to assess the overall fit of candidate models; instead, it focuses directly on the parameter of primary interest and selects the model with the minimum estimated mean squared error of that parameter's estimator. Ideally, the final model is therefore the best one for that parameter only. This characteristic motivates us to apply FIC to personalized medicine, in particular to the use of personalized predictive models for individualized prognosis and diagnosis. Although FIC has been well recognized and studied in the literature, to the best of our knowledge this is the first application of FIC to personalized medicine, an application that fully reflects FIC's emphasis on individualization. Section 2 briefly introduces the classic FIC and its related properties. Building on this well-established framework, in Section 3 we first apply the personalized version of the classic FIC to a cross-sectional binary case study and provide a personalized diagnosis of tumor penetration of the prostatic capsule for prostate cancer patients. In Section 4, the personalized QFIC is applied to a longitudinal case study and used to make personalized prognoses of patients' treatment responses in relapsing remitting multiple sclerosis. We conclude in the final section.

2 Focused information criterion and QFIC

As a parameter-oriented model selection criterion, FIC has been well studied in several commonly used models, such as generalized linear models (Claeskens et al., 2006), Cox proportional hazards models (Hjort and Claeskens, 2006), semiparametric partial linear models (Claeskens and Carroll, 2007), generalized additive partial linear models (Zhang and Liang, 2011), and generalized estimating equations (Yang et al., 2014). In this section, we take a classical linear regression model as an example to illustrate the framework of FIC. Consider a study with $n$ independent patients. Patient $i$ has response $y_i$ and a set of explanatory variables $x_i = (x_{1i}, \ldots, x_{ki})$, which can be grouped into two categories: $p$ certain covariates, which are always included in the final model, and $q$ uncertain ones, about which we are not sure (so $k = p + q$). The corresponding $p + q$ unknown coefficients therefore consist of certain coefficients $\theta = (\theta_1, \ldots, \theta_p)$ and uncertain coefficients $\gamma = (\gamma_1, \ldots, \gamma_q)$. Write $\beta = (\theta, \gamma)$. The linear regression model can then be written as $y_i = x_i^\top \beta + \varepsilon_i$, $i = 1, \ldots, n$. Any candidate model $S$ can be written as a special case of the full model, $\beta_S = (\theta, \gamma_S, 0_{S^c})$, where $\gamma_S$

www.biometrical-journal.com


H. Yang et al.: FIC in personalized medicine

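is a $q_S$-subvector of $\gamma$ and $0_{S^c}$ is a zero vector of length $q_{S^c} = q - q_S$, with $S \subset \{1, \ldots, q\}$ and $S^c$ its complement. When $S = N$, the narrow model $\beta = (\theta, 0)$ includes only the certain covariates. The corresponding $(p + q) \times (p + q)$ information matrix and its inverse are partitioned as

$$
J = \begin{pmatrix} J_{00} & J_{01} \\ J_{10} & J_{11} \end{pmatrix}
\qquad \text{and} \qquad
J^{-1} = \begin{pmatrix} J^{00} & J^{01} \\ J^{10} & J^{11} \end{pmatrix}.
$$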

The targeted patient, for whom we want to make a prediction, can be either in this study or a new patient arriving later, say patient $j$. The corresponding personalized prediction can therefore be written as a function of the model parameters, $\zeta_j = \zeta_j(\theta, \gamma)$. The personalized focused information criterion, which is the part of the estimated mean squared error of the personalized prediction estimate $\hat\zeta_j$ that depends on the candidate model $S$, has the form

$$
\mathrm{FIC}_{j,S} = 2\,\hat\omega_j^\top \pi_S^\top \hat J^{11}_S \pi_S \hat\omega_j + n\left\{\hat\omega_j^\top (I_q - \hat D_S)\,\hat\gamma\right\}^2,
$$

where $\hat\omega_j = \hat J_{10} \hat J_{00}^{-1}\,\partial\zeta_j/\partial\theta - \partial\zeta_j/\partial\gamma$, $\pi_S$ is the $q_S \times q$ projection matrix mapping $\gamma$ to $\gamma_S$, $\hat J^{11}_S = \{\pi_S (\hat J^{11})^{-1} \pi_S^\top\}^{-1}$, and $\hat D_S = \pi_S^\top \hat J^{11}_S \pi_S (\hat J^{11})^{-1}$. In particular, $\hat J_{10}$, $\hat J_{00}$, and $\hat J^{11}$ are obtained from the information matrix estimate $\hat J$, and both $\hat J$ and $\hat\gamma$ are estimated under the full model. The behavior of FIC is thus influenced not only by the uncertain parameter $\gamma$, through the candidate model $S$, but also by $\omega_j$, which is determined by the targeted patient's prediction $\zeta_j$. In the large-sample context, FIC has nice asymptotic properties, shown in the Appendix. In summary, FIC chooses the personalized predictive model according to the targeted patient: for each targeted patient, the model with the smallest FIC value, and therefore the smallest estimated mean squared error, is selected. In the context of the GEE approach, Yang et al. (2014) propose the quasi-likelihood-based focused information criterion (QFIC) for longitudinal data,

$$
\mathrm{QFIC}_{n,S} = 2\,\hat\omega^\top \pi_S^\top \hat J^{11}_S \pi_S \hat\omega + n\left\{\hat\omega^\top (I_q - \hat D_S)\,\hat\gamma\right\}^2. \tag{1}
$$

This criterion, derived in the large-sample sense, depends not only on the uncertain parameter γ but also on the focus parameter ζ through ω. QFIC may therefore choose different models for different focus quantities; the model with the smallest QFIC value, that is, the smallest estimated mean squared error of the focus parameter's estimator, is selected.
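To make the criterion concrete, the following Python sketch evaluates $\mathrm{FIC}_{j,S}$ for one targeted patient and returns the minimizing candidate model. It assumes that $\hat J$ (the full-model information matrix on the per-observation scale, with the $p$ certain coefficients ordered first), $\hat\gamma$, and the gradients $\partial\zeta_j/\partial\theta$ and $\partial\zeta_j/\partial\gamma$ have already been computed. The function names `fic` and `select_model` are hypothetical, and the QFIC in (1) would need the GEE-based matrices of Yang et al. (2014), which this sketch does not construct. For a logistic prediction $\zeta_j = \exp(x_j^\top\beta)/\{1+\exp(x_j^\top\beta)\}$, the gradient with respect to $\beta$ is $\zeta_j(1-\zeta_j)x_j$.

```python
import itertools
import numpy as np

def fic(J, gamma_hat, dz_dtheta, dz_dgamma, S, n):
    """Personalized FIC for one target and one candidate model S.

    J         : (p+q) x (p+q) full-model information matrix, certain block first
    gamma_hat : full-model estimate of the q uncertain coefficients
    dz_dtheta : gradient of the focus zeta_j with respect to theta (length p)
    dz_dgamma : gradient of the focus zeta_j with respect to gamma (length q)
    S         : tuple of 0-based indices of the uncertain coefficients kept
    n         : sample size
    """
    p, q = len(dz_dtheta), len(dz_dgamma)
    J00, J10 = J[:p, :p], J[p:, :p]
    J11_up = np.linalg.inv(J)[p:, p:]                  # J^{11}: lower-right block of J^{-1}
    omega = J10 @ np.linalg.solve(J00, dz_dtheta) - dz_dgamma
    if len(S) == 0:                                    # narrow model: D_S = 0, no variance term
        var_term, D = 0.0, np.zeros((q, q))
    else:
        pi = np.eye(q)[list(S), :]                     # q_S x q projection matrix pi_S
        K_inv = np.linalg.inv(J11_up)
        J11_S = np.linalg.inv(pi @ K_inv @ pi.T)       # J^{11}_S
        D = pi.T @ J11_S @ pi @ K_inv                  # D_S
        var_term = 2.0 * omega @ pi.T @ J11_S @ pi @ omega
    bias = omega @ (np.eye(q) - D) @ gamma_hat
    return float(var_term + n * bias ** 2)

def select_model(J, gamma_hat, dz_dtheta, dz_dgamma, n):
    """Return the subset S with the smallest FIC, together with its FIC value."""
    q = len(gamma_hat)
    subsets = [S for r in range(q + 1) for S in itertools.combinations(range(q), r)]
    scores = {S: fic(J, gamma_hat, dz_dtheta, dz_dgamma, S, n) for S in subsets}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Only $\hat\omega_j$ changes from one targeted patient to the next, so $\hat J$ and $\hat\gamma$ can be computed once and `select_model` called per patient with that patient's gradients. Two sanity checks follow from the formula: for the narrow model the variance term vanishes, and for the full model $\hat D_S = I_q$, so the bias term vanishes.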

3 Prostate cancer study

Prostate cancer is one of the most common cancers in American men. As it advances, cancer cells may spread from the prostate to the capsule. Knowing the cancer stage can help the doctor make a diagnosis and select a corresponding therapy. The first case study we discuss in this article is a prostate cancer trial with possible capsular involvement (Hosmer and Lemeshow, 1989). In this trial, 151 of 376 patients had prostate cancer that penetrated the prostatic capsule. The binary response, penetrat, indicates tumor penetration (0 – absence and 1 – presence). The potential explanatory factors include: dre, result of the digital rectal exam (1 – no nodule, 2 – unilobar left nodule, 3 – unilobar right nodule, and 4 – bilobar nodule); caps, detection of capsular involvement in the rectal exam (1 – absence and 2 – presence); psa, prostate-specific antigen value (in mg/ml); volume, tumor volume obtained from ultrasound (in cm³); gscore, total Gleason score (0–10); race (1 – white and 2 – black); and age.


Table 1 Prostate cancer: Estimates, standard errors, Z-values, and p-values from the ordinary logistic regression model.

Covariate   Estimate   Std. err.   Z-value   p-value
int.        −6.1e+00   1.9e+00     −3.2      1.6e−03
gscore       9.7e−01   1.7e−01      5.8      5.8e−09
dre(2)       7.3e−01   3.6e−01      2.1      4.0e−02
dre(3)       1.5e+00   3.8e−01      4.0      5.9e−05
dre(4)       1.4e+00   4.6e−01      3.0      2.5e−03
psa          2.9e−02   1.0e−02      3.0      3.5e−03
race        −6.8e−01   4.7e−01     −1.4      1.5e−01
caps         5.3e−01   4.6e−01      1.1      2.6e−01
volume      −2.6e−03   2.6e−03     −1.0      3.2e−01
age         −1.3e−02   2.0e−02     −0.7      5.0e−01
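The Z-values and p-values in Table 1 follow from the estimates and standard errors through $Z = \hat\beta/\mathrm{se}$ and the two-sided normal p-value $2\{1-\Phi(|Z|)\}$. A minimal Python check is sketched below; it uses only the rounded entries printed in the table, so the recomputed statistics differ slightly from the tabulated ones (for example, about 5.7 instead of 5.8 for gscore). The helper name `z_test` is hypothetical.

```python
import math

# (estimate, standard error) pairs as printed (rounded) in Table 1
TABLE1 = {
    "gscore": (0.97, 0.17),
    "psa": (0.029, 0.010),
    "race": (-0.68, 0.47),
    "age": (-0.013, 0.020),
}

def z_test(estimate, se):
    """Wald Z statistic and two-sided p-value, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

for name, (est, se) in TABLE1.items():
    z, p = z_test(est, se)
    print(f"{name:7s} z = {z:6.2f}  p = {p:.1e}")
```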

Table 2 Prostate cancer: Candidate models.

Model  race  caps  volume  age     Model  race  caps  volume  age
m1     ×     ×     ×       ×       m9     ◦     ×     ×       ×
m2     ×     ×     ×       ◦       m10    ◦     ×     ×       ◦
m3     ×     ×     ◦       ×       m11    ◦     ×     ◦       ×
m4     ×     ×     ◦       ◦       m12    ◦     ×     ◦       ◦
m5     ×     ◦     ×       ×       m13    ◦     ◦     ×       ×
m6     ×     ◦     ×       ◦       m14    ◦     ◦     ×       ◦
m7     ×     ◦     ◦       ×       m15    ◦     ◦     ◦       ×
m8*    ×     ◦     ◦       ◦       m16    ◦     ◦     ◦       ◦

Note: × indicates presence of the covariate in the candidate model and ◦ means its absence. * Circled: the model selected by AIC.

In this section, we aim to select a personalized predictive model for a targeted prostate cancer patient based on the personalized FIC. By doing so, we can better predict the targeted patient's tumor penetration rate and therefore provide a personalized diagnosis of cancer progression.

3.1 Model selection implementation

We first fit the data using a classic logistic regression model with all the potential explanatory covariates listed above. The full model can be written as

$$
\mathrm{logit}(\mu) = \beta_0 + \beta_1\,\mathrm{gscore} + \beta_2\,\mathrm{dre} + \beta_3\,\mathrm{psa} + \beta_4\,\mathrm{race} + \beta_5\,\mathrm{caps} + \beta_6\,\mathrm{volume} + \beta_7\,\mathrm{age}.
$$

The corresponding statistical inference, including the estimated values, standard errors, and p-values, is listed in Table 1 in order of significance. Based on Table 1, we identify four highly significant certain covariates: int., gscore, dre, and psa. The predictive model is then selected by the personalized FIC, and also by the traditional AIC for comparison, from the remaining $2^4 = 16$ candidate models listed in Table 2. Here and below, "×" indicates the presence of the specific covariate in the specific candidate model and "◦" means its absence.
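The 16 candidate models are all subsets of the four uncertain covariates. The sketch below enumerates them so that the last covariate varies fastest, which reproduces the numbering m1 to m16 of Table 2; with the five uncertain factors of Section 4, the same ordering gives the 32 models of Table 6. The helper name `candidate_models` is hypothetical.

```python
from itertools import product

def candidate_models(names):
    """Map 'm1', 'm2', ... to the tuple of uncertain covariates kept in each model.

    product((True, False), ...) varies the last covariate fastest, so model 1
    keeps every uncertain covariate and the last model keeps none.
    """
    models = {}
    for k, keep in enumerate(product((True, False), repeat=len(names)), start=1):
        models[f"m{k}"] = tuple(n for n, flag in zip(names, keep) if flag)
    return models

prostate = candidate_models(("race", "caps", "volume", "age"))
print(len(prostate), prostate["m8"])  # m8 keeps race only
```

Under this ordering, m8 keeps race only and m16 is the narrow model, matching the description of m8 in the next paragraph.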


Figure 1 Prostate cancer: Frequency of candidate models selected by the personalized FIC as the personalized predictive models for 376 patients.

Under AIC, m8 is selected as the single overall predictive model for all patients in the study; it is circled in Table 2. Besides the four certain covariates, m8 contains one uncertain covariate: race. For the personalized FIC, in contrast, we treat each patient's capsular penetration prediction as an individual targeted parameter. The personalized FIC therefore selects 376 personalized predictive models from the 16 candidates, one for each of the 376 patients. Figure 1 shows how often each of the 16 candidate models is selected as the final personalized predictive model. From this histogram, we observe that instead of concentrating on the single predictive model m8, the personalized predictive models are distributed mainly among m6, m7, m8, m11, m12, m14, and m16. In particular, m8, m12, and m16 are each chosen as the predictive model for more than 50 patients.

3.2 Cross-validation and simulation examination

To examine the predictive power of the personalized predictive models and the single final model m8, we run a leave-one-out cross-validation experiment. The prediction error rates are 0.345 for the personalized predictive models and 0.351 for the single predictive model, with bootstrap standard deviations of about 0.001 for both. The smaller prediction error rate of the personalized predictive models indicates a slight advantage of the personalized FIC over the traditional AIC. We also conduct a simulation study to assess their performance at a relatively small sample size. To mimic the patients in this study, we randomly sample with replacement from the 376 patients and generate 100 pseudo-patients with observations on the response and the seven potential covariates. Again, we apply the personalized FIC and the traditional AIC to these 100 pseudo-patients, identify 100 personalized predictive models and one overall predictive model, and use them to predict the penetration rates. The mean squared errors are obtained by comparing these predictions with the true penetration rates, which we calculate through the following formula:

$$
p = \frac{\exp(\tau)}{1 + \exp(\tau)},
$$

where

$$
\tau = -6.1 + 0.97\,\mathrm{gscore} + 0.73\,\mathrm{dre}(2) + 1.5\,\mathrm{dre}(3) + 1.4\,\mathrm{dre}(4) + 0.029\,\mathrm{psa} - 0.68\,\mathrm{race} + 0.53\,\mathrm{caps} - 0.0026\,\mathrm{volume} - 0.013\,\mathrm{age}.
$$

The coefficients used here are from Table 1. With one thousand replications, we obtain estimated mean squared errors of 3.14 for the personalized predictive models and 3.25 for the single predictive model. The smaller mean squared error again shows the better performance of the personalized FIC compared with AIC.

Figure 2 Prostate cancer: Histograms of tumor volume and age.
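The simulation truth above can be written as a small function. This sketch assumes that dre enters through indicators dre(2) to dre(4) with level 1 (no nodule) as the baseline, as the coefficient names in Table 1 suggest, and that race and caps enter with the numeric codes listed at the start of Section 3; the names `tau`, `true_rate`, and `mse` are hypothetical. The `mse` helper is a plain mean of squared differences; the averaging convention behind the reported 3.14 and 3.25 is not specified, so its scale may differ.

```python
import math

# Coefficients of the full logistic model, copied from Table 1 (rounded).
COEF = {
    "int": -6.1, "gscore": 0.97, "dre2": 0.73, "dre3": 1.5, "dre4": 1.4,
    "psa": 0.029, "race": -0.68, "caps": 0.53, "volume": -0.0026, "age": -0.013,
}

def tau(gscore, dre, psa, race, caps, volume, age):
    """Linear predictor; dre in {1, 2, 3, 4} is dummy-coded with level 1 as baseline."""
    dre_effect = {1: 0.0, 2: COEF["dre2"], 3: COEF["dre3"], 4: COEF["dre4"]}[dre]
    return (COEF["int"] + COEF["gscore"] * gscore + dre_effect + COEF["psa"] * psa
            + COEF["race"] * race + COEF["caps"] * caps
            + COEF["volume"] * volume + COEF["age"] * age)

def true_rate(t):
    """True penetration rate p = exp(tau) / (1 + exp(tau))."""
    return math.exp(t) / (1.0 + math.exp(t))

def mse(predicted, truth):
    """Plain mean of squared differences between predicted and true rates."""
    return sum((a - b) ** 2 for a, b in zip(predicted, truth)) / len(truth)

# Hypothetical patient: gscore 6, no nodule, psa 10, white, no capsular involvement, volume 0, age 65
t = tau(gscore=6, dre=1, psa=10.0, race=1, caps=1, volume=0.0, age=65)
print(round(t, 3), round(true_rate(t), 3))
```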


Table 3 Prostate cancer: Group partition criteria.

Criterion   Group A      Group B
race        white        black
caps        presence     absence
volume      > 2 cm³      = 2 cm³
age         (60, 75)     [40, 60] or [75, 80]

3.3 Group-specific analysis

To illustrate how the personalized FIC accounts for patients' heterogeneity, we also perform a group-specific analysis based on the four uncertain covariates. Figure 2 presents histograms of the observations on the two continuous uncertain covariates, volume and age. In particular, about 50% of the patients have a tumor volume of 2 cm³ obtained from ultrasound, and the distribution of age is roughly bell-shaped. Based on these two histograms and on the outcomes of the two binary uncertain covariates, race and caps, we divide the 376 patients into two groups under each of four partition criteria, as listed in Table 3. In this subsection, we specifically target the patients whose personalized predictive models differ from the single predictive model m8 with respect to each uncertain covariate. As reported in Subsection 3.1, m8 includes one uncertain covariate, race, in addition to the certain covariates. The personalized predictive models of these targeted patients therefore either (i) exclude race, which is included in m8 (race[◦]); (ii) include caps, which is excluded from m8 (caps[×]); (iii) include volume, which is excluded from m8 (volume[×]); or (iv) include age, which is excluded from m8 (age[×]). For each partition criterion, the percentage (pct.) of targeted patients is calculated from the number of patients in each group (size), as reported in Table 4. This percentage measures, within each group, how far the personalized predictive models depart from the single predictive model with respect to each specific uncertain covariate. The prediction error rates of the personalized predictive models (erFIC) and of the single predictive model (erAIC), calculated only over the targeted patients, are also shown in Table 4. We highlight the relatively high percentages, those greater than 50%.
In particular, the bottom row of Table 4 shows that a total of 56% of the patients exclude race from their predictive models, regardless of the group partition criterion. The smaller prediction error rates of the personalized predictive models compared with the single predictive model in almost every category show the advantage of tailoring the predictive model to each patient's personal information. For each group partition criterion, we also compare the percentages of targeted patients between the two corresponding groups. Generally speaking, the percentages in Table 4 do differ between groups in each comparison, especially for the pairs circled by dashed lines, whose percentages are quite different. For the race-based partition, 61% of black patients include caps in their personalized predictive models, whereas only 29% of white patients do so. Because white patients make up the majority of this study (340 of 376) and show a low percentage, this also reflects the overall-fit property of m8 selected by the traditional AIC. Under the caps partition criterion, 60% versus 20% of the patients with and without capsular involvement exclude race, and 25% versus 65% include volume in their final personalized predictive models. Groups A and B partitioned by volume also show quite different percentages, 72% versus 36%, of patients excluding race from their personalized predictive models. In summary, by using the individual-level information of prostate cancer patients, the personalized FIC accounts for patients' heterogeneity and provides the best personalized predictive model for each targeted patient. The smaller prediction error rate, the smaller mean squared error, and


Table 4 Prostate cancer: Group-specific percentages and prediction error rates of targeted patients' tumor penetration under four partition criteria.

                          race[◦]              caps[×]              volume[×]            age[×]
Criterion  Group  size    pct.  erFIC  erAIC   pct.  erFIC  erAIC   pct.  erFIC  erAIC   pct.  erFIC  erAIC
race       A      340     56%   0.337  0.339   29%   0.342  0.347   29%   0.317  0.320   26%   0.302  0.310
race       B       36     61%   0.355  0.398   61%   0.399  0.437   33%   0.460  0.484   36%   0.175  0.211
caps       A      336     60%   0.338  0.342   36%   0.355  0.366   25%   0.347  0.344   25%   0.291  0.295
caps       B       40     20%   0.365  0.418    3%   0.009  0.004   65%   0.286  0.317   45%   0.264  0.308
volume     A      211     72%   0.349  0.357   48%   0.368  0.378   27%   0.381  0.383   26%   0.290  0.302
volume     B      165     36%   0.312  0.314   12%   0.274  0.284   33%   0.283  0.291   28%   0.282  0.291
age        A      267     58%   0.335  0.339   35%   0.320  0.327   23%   0.341  0.343   33%   0.287  0.295
age        B      109     50%   0.349  0.362   27%   0.456  0.480   44%   0.322  0.331   11%   0.278  0.315
Total             376     56%                  32%                  29%                  27%

Note: pct. indicates the percentage of targeted patients in each group; erFIC and erAIC indicate the prediction error rates of the personalized predictive models and of the single predictive model among the targeted patients in each group; size indicates the number of patients in each group; race[◦] indicates targeted patients whose personalized predictive models exclude race; caps[×], volume[×], and age[×] indicate targeted patients whose personalized predictive models include caps, volume, and age, respectively.

the results of the group-specific analysis all show the advantage of the personalized predictive models selected by the personalized FIC over the single predictive model selected by AIC. Diagnosis of the targeted prostate cancer patients' capsular penetration can therefore be better made individually, based on the different personalized predictive models chosen by the personalized FIC.
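The group-specific quantities reported in Table 4 reduce to simple counting. The following sketch assumes per-patient inputs that the text does not spell out: a group label, a flag marking whether the patient is targeted (for example, whether the personalized model excludes race), and 0/1 indicators of a wrong cross-validated prediction under the FIC-selected and AIC-selected models. The function name `group_summary` is hypothetical.

```python
from collections import defaultdict

def group_summary(groups, targeted, wrong_fic, wrong_aic):
    """Per-group size, share of targeted patients, and their error rates.

    groups    : group label per patient (e.g. "A" or "B")
    targeted  : True if the patient's personalized model differs from the
                single model in the covariate under study
    wrong_fic : 1 if the prediction from the FIC-selected model is wrong, else 0
    wrong_aic : the same indicator for the AIC-selected single model
    """
    acc = defaultdict(lambda: {"size": 0, "k": 0, "fic": 0, "aic": 0})
    for g, t, wf, wa in zip(groups, targeted, wrong_fic, wrong_aic):
        a = acc[g]
        a["size"] += 1
        if t:                       # error rates use targeted patients only
            a["k"] += 1
            a["fic"] += wf
            a["aic"] += wa
    summary = {}
    for g, a in acc.items():
        k = a["k"]
        summary[g] = {
            "size": a["size"],
            "pct": k / a["size"],
            "erFIC": a["fic"] / k if k else None,   # shown as "–" when no one is targeted
            "erAIC": a["aic"] / k if k else None,
        }
    return summary

out = group_summary(["A", "A", "B", "B"], [True, False, True, True], [0, 1, 1, 0], [1, 1, 1, 1])
print(out["A"], out["B"])
```

The Total row corresponds to calling the same function with a single label for all patients; under this definition, each group's pct. is weighted by its size in that total.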

4 Relapsing remitting multiple sclerosis case study

Beyond cross-sectional studies, personalized predictive models can also be used for individualized prognosis and diagnosis in longitudinal studies. As an illustration, our second case study comes from a longitudinal clinical trial that aims to assess the effects of neutralizing antibodies on interferon beta-1 (IFNB) in relapsing remitting multiple sclerosis (RRMS), a disease that destroys the myelin sheath surrounding the nerves.


Figure 3 RRMS: Empirical and estimated exacerbation rates on visit days and duration time.

Table 5 RRMS: Statistical inference under the full model.

Covariate   Estimate   Std. err.   Wald   p-value
edss         2.9e−01   8.6e−02     11.6   6.6e−04
int.        −1.4e+00   8.1e−01      3.0   8.4e−02
dose(1)      7.5e−02   3.1e−01      0.1   8.1e−01
dose(2)     −3.5e−01   3.1e−01      1.3   2.6e−01
lot          3.9e−01   3.3e−01      1.3   2.5e−01
age         −1.3e−02   1.5e−02      0.8   3.8e−01
sex          1.4e−01   3.4e−01      0.2   6.7e−01
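The Wald column in Table 5 agrees with the squared ratio $W = (\hat\beta/\mathrm{se})^2$ referred to a chi-square distribution with one degree of freedom. The short Python check below recomputes it from the rounded entries, so the values differ slightly from the table (about 11.4 rather than 11.6 for edss). The helper name `wald_chi2` is hypothetical.

```python
import math

# (estimate, standard error) pairs as printed (rounded) in Table 5
TABLE5 = {
    "edss": (0.29, 0.086),
    "dose(1)": (0.075, 0.31),
    "lot": (0.39, 0.33),
    "age": (-0.013, 0.015),
}

def wald_chi2(estimate, se):
    """Wald statistic (est/se)^2 and its upper-tail p-value under chi-square(1).

    For one degree of freedom, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    """
    w = (estimate / se) ** 2
    return w, math.erfc(math.sqrt(w / 2.0))

for name, (est, se) in TABLE5.items():
    w, p = wald_chi2(est, se)
    print(f"{name:8s} W = {w:5.2f}  p = {p:.1e}")
```

For a single coefficient, this is the square of the Z statistic used in Table 1, so the two tables report equivalent tests on different scales.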

We focus on a 15-week magnetic resonance imaging (MRI) study involving 50 patients at two locations, randomized into three treatment groups: 17 in placebo, 17 in low dose, and 16 in high dose. At each of 17 scheduled visits, a binary exacerbation outcome, exacerb, was recorded at the time of each MRI scan, according to whether an exacerbation had begun since the previous scan (1 – positive and 0 – negative). The potential explanatory covariates include: edss, expanded disability status scale; dose, treatment group (0 – placebo, 1 – low dose, and 2 – high dose); duration, RRMS duration (in years); lot, location indicator (0 – location A and 1 – location B); age; sex; and visit, the visit time (in days). The goal of this study is to identify a prediction rule by which we can accurately predict the targeted patients' exacerbation response to the specific treatment, even at a targeted visit time.

4.1 Model selection implementation

We consider the following generalized additive partially linear model, incorporating the GEE approach, for this study:

$$
\mathrm{logit}(\mu) = \eta_1(\mathrm{visit}) + \eta_2(\mathrm{duration}) + \beta_1\,\mathrm{edss} + \beta_2\,\mathrm{dose} + \beta_3\,\mathrm{lot} + \beta_4\,\mathrm{age} + \beta_5\,\mathrm{sex},
$$


Table 6 RRMS: Candidate models.

Model  int.  dose  lot  age  sex     Model  int.  dose  lot  age  sex
m1     ×     ×     ×    ×    ×       m17    ◦     ×     ×    ×    ×
m2     ×     ×     ×    ×    ◦       m18    ◦     ×     ×    ×    ◦
m3     ×     ×     ×    ◦    ×       m19    ◦     ×     ×    ◦    ×
m4     ×     ×     ×    ◦    ◦       m20    ◦     ×     ×    ◦    ◦
m5     ×     ×     ◦    ×    ×       m21    ◦     ×     ◦    ×    ×
m6     ×     ×     ◦    ×    ◦       m22    ◦     ×     ◦    ×    ◦
m7     ×     ×     ◦    ◦    ×       m23    ◦     ×     ◦    ◦    ×
m8     ×     ×     ◦    ◦    ◦       m24    ◦     ×     ◦    ◦    ◦
m9     ×     ◦     ×    ×    ×       m25    ◦     ◦     ×    ×    ×
m10    ×     ◦     ×    ×    ◦       m26    ◦     ◦     ×    ×    ◦
m11    ×     ◦     ×    ◦    ×       m27    ◦     ◦     ×    ◦    ×
m12    ×     ◦     ×    ◦    ◦       m28    ◦     ◦     ×    ◦    ◦
m13    ×     ◦     ◦    ×    ×       m29    ◦     ◦     ◦    ×    ×
m14    ×     ◦     ◦    ×    ◦       m30*   ◦     ◦     ◦    ×    ◦
m15    ×     ◦     ◦    ◦    ×       m31    ◦     ◦     ◦    ◦    ×
m16    ×     ◦     ◦    ◦    ◦       m32    ◦     ◦     ◦    ◦    ◦

Note: × indicates presence of the covariate in the candidate model and ◦ means its absence. * Circled: the model selected by AIC.

where visit and duration enter as nonparametric components and μ is the conditional expectation of exacerb. Figure 3 plots the empirical exacerbation rates at different visit days and for different RRMS durations, confirming the nonlinear effects of these two covariates on the log odds of the response. We therefore fit this full model using the polynomial spline method combined with the generalized estimating equations (GEE) approach with an exchangeable working correlation. Two-degree natural splines are used to approximate the two nonparametric functions. The fitted curves of these two nonparametric components, η1(visit) and η2(duration), are shown as dashed lines in Fig. 3. The statistical inference on the remaining coefficients, including their estimates, standard errors, and corresponding p-values, is listed in Table 5. Based on Table 5, in addition to the two nonparametric components, we include the highly significant covariate edss in the narrow model and perform the model selection procedure among the remaining five uncertain factors. There are $2^5 = 32$ candidate models in total, as listed in Table 6. The AIC-type model selection criterion for longitudinal data incorporating the GEE approach, proposed in Yang et al. (2014), selects m30 as the final single predictive model, which can be written as

$$
\mathrm{logit}(\mu) = \eta_1(\mathrm{visit}) + \eta_2(\mathrm{duration}) + \beta_1\,\mathrm{edss} + \beta_4\,\mathrm{age}.
$$

It is circled in Table 6. Regardless of the different characteristics of different patients at different visit times, m30 is chosen from an overall perspective for this longitudinal study. On the other hand, the personalized QFIC, proposed in Yang et al. (2014), accounts for heterogeneity among patients and even among the same patient's different visit times.
By taking each observation's exacerbation prediction as an individual targeted parameter, the personalized QFIC chooses different personalized predictive models for different patients at different visit times. In this study, we have 50 patients with about 17 visits each, for a total of 822 observations. Figure 4 shows the frequencies of the 32 candidate models chosen as the corresponding


Figure 4 RRMS: Frequencies of candidate models selected by the personalized QFIC as the personalized predictive models for 822 observations.

822 personalized predictive models. The histogram in Fig. 4 shows that, besides AIC's single predictive model m30, m16 also has a relatively high frequency. For specific targeted visit days, namely days 7, 31, 61, and 104, the frequencies of the candidate models selected as the personalized predictive models for the 50 patients are plotted in Fig. 5. We observe slight differences among these four histograms. Owing to the relatively strong correlation among each patient's repeated measurements, however, all four histograms show relatively high frequencies of m16 and m30 for the 50 patients, consistent with the general trend in the overall histogram in Fig. 4. Using cross-validation, we also examine the predictive power of the 822 personalized predictive models and of the single predictive model m30. Because of the complicated correlation structure of each patient's repeated measurements, a leave-one-patient-out scheme is used. Based on one thousand replications, the prediction error rates are 0.2643 for the personalized predictive models and 0.2733 for the single predictive model (both with bootstrap standard deviations of about 0.0004), showing the advantage of tailoring predictive models individually with the personalized QFIC.

4.2 Group-specific analysis

Similar to Subsection 3.3, to illustrate how the personalized QFIC accounts for patients' heterogeneity, we carry out a group-specific analysis based on four uncertain covariates: dose, lot, sex, and age. The analysis is performed separately at four targeted visit days, namely days 7, 31, 61, and 104. Again, we focus on the patients whose personalized predictive models differ from the single predictive model m30 in the presence or absence of each uncertain covariate. As reported in Subsection 4.1, m30 includes one uncertain covariate, age, in addition to the certain covariates. Therefore, the personalized predictive models of the targeted patients in this subsection either (i) include dose (dose[×]) under the dose partition criterion; (ii) include lot (lot[×]) under the lot partition criterion; (iii) include sex (sex[×]) under the sex criterion; or (iv) exclude age (age[◦]) under the age criterion. The corresponding percentages and prediction error rates estimated through the personalized

www.biometrical-journal.com

Biometrical Journal 57 (2015) 3

433

Figure 5 RRMS: Frequencies of candidate models selected by the personalized QFIC as the personalized predictive models for 50 patients at visit days 7, 31, 61, and 104.

predictive models and the single predictive model for the targeted patients at visit days 7, 31, 61, and 104 are reported in Table 7. The high percentages of targeted patients in Table 7 indicate relatively large differences between the personalized predictive models and the single predictive model in the corresponding groups with respect to the presence of the specific uncertain covariate. In particular, we highlight percentages greater than 30%. For the uncertain covariate age, at the earlier visit days 7 and 31, the younger group, composed of patients aged 30 or younger, has the higher percentages. In other words, around 40% of the patients in the younger group exclude age from their personalized predictive models at days 7 and 31. The majority of the patients in the remaining two groups have personalized predictive models consistent with m30 with respect to the presence of age. But at the later visit days 61 and 104,

www.biometrical-journal.com

 C 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

35% 0% 19% 18% 0% 0% 0% 0% 8% 2% 38% 23% 31%

28%

placebo low high total A B total male female total ࣘ30 30-40 ࣙ40

total

dose [×]

0.163

0.281 – 0.142 0.235 – – – – 0.273 0.273 0.357 0.125 0.144

erFIC

0.175

0.305 – 0.238 0.283 – – – – 0.505 0.505 0.357 0.141 0.155

erAIC

50

17 17 16 50 10 40 50 38 12 50 8 26 16

size

31%

31% 0% 19% 16% 0% 0% 0% 5% 17% 8% 38% 28% 31%

pct.

0.144

0.153 – 0.075 0.124 – – – 0.430 0.111 0.271 0.344 0.138 0.064

erFIC

0.157

0.176 – 0.121 0.156 – – – 0.446 0.191 0.319 0.367 0.152 0.071

erAIC

Day 31

49

16 16 16 49 9 40 49 37 12 49 8 25 16

size

40%

24% 0% 13% 12% 0% 8% 6% 13% 17% 14% 50% 35% 44%

pct. 0.102 – 0.052 0.085 – 0.379 0.379 0.105 0.142 0.116 0.187 0.084 0.179 0.126

0.125

erAIC

0.101 – 0.052 0.0843 – 0.368 0.368 0.118 0.109 0.115 0.205 0.072 0.192

erFIC

Day 61

50

17 17 16 50 10 40 50 38 12 50 8 26 16

size

47%

25% 0% 7% 11% 0% 11% 9% 6% 10% 7% 67% 40% 50%

pct.

0.294

0.181 – 0.814 0.307 – 0.535 0.535 0.547 0.183 0.425 0.899 0.217 0.286

erFIC

0.319

0.223 – 0.853 0.349 – 0.525 0.525 0.656 0.307 0.540 0.961 0.244 0.296

erAIC

Day 104

45

16 15 14 45 8 37 45 35 10 45 6 25 14

size

Note: At each targeted visit day, pct. indicates the percentage of the targeted patients in each group; erFIC and erAIC indicate the prediction error rates of the personalized predictive models and the single predictive model based on the targeted patients in each group; size indicates the number of patients in each group; age[◦] indicates the targeted patients whose personalized predictive models exclude age; dose[×] indicates the targeted patients whose personalized predictive models include dose; lot[×] indicates the targeted patients whose personalized predictive models include lot; and sex[×] indicates the targeted patients whose personalized predictive models include sex.

age [◦]

sex [×]

lot [×]

pct.

Group

Criterion

Day 7

Table 7 RRMS: Group-specific percentages and prediction error rates for the targeted patients’ exacerbation response at the targeted visit days with four partition criteria.

434 H. Yang et al.: FIC in personalized medicine

www.biometrical-journal.com

Biometrical Journal 57 (2015) 3

435

Table 8 RRMS: Personalized predictive models concluded by the personalized QFIC for targeted patients under 12 scenarios. sex dose placebo low high

lot

Male

Female

A B A B A B

int. age sex int. age int. dose age age

age int. sex age int. lot sex dose age sex age

more than 35% of patients in all three groups exclude age from their personalized predictive models, as shown in the dashed box at the bottom right corner of Table 7. This indicates the larger difference between the predictive models concluded by the personalized QFIC and AIC in the later visit days compared to the earlier days. It simultaneously shows the personalized QFIC’s consideration of the heterogeneity among observations at different visits. In addition, for consideration of the heterogeneity among the patients, the top dashed box encircles the placebo group based on the dose group partition criterion. At all four visit days, more than 24% of patients in the placebo group include dose in their personalized predictive models compared to the other two treatment groups. It is reasonable to infer that the patients in the placebo group tend to have no treatment effect. Therefore, the treatment indicator dose may be significant for patients in the placebo group to better predict their exacerbation rate. Finally, most of the categories based on the personalized QFIC in Table 7 have smaller prediction error rates compared to AIC. This again shows the advantage of tailoring the predictive model individually based on the individual level information. 4.3

Statistical inference on targeted patients

Rather than focusing only on the prediction accuracy for patients in the study, we also consider prediction for future targeted patients. In this subsection, we consider 36-year-old patients who have had RRMS for 8.8 years and have an expanded disability status scale (EDSS) score of 4. These values are the medians of the observations on the continuous potential covariates age, duration, and edss in the current study. To illustrate how the personalized QFIC tailors the predictive model to each targeted patient, we place these patients into twelve scenarios defined by the three categorical potential factors sex, lot, and dose. Table 8 records the twelve corresponding personalized predictive models selected by the personalized QFIC, and the exacerbation rate predictions over the 17 visit times for each scenario are plotted in Fig. 6. In Table 8, the female patients in the placebo and low-dose groups at location A, and the male patients in the low-dose group at location A and in the high-dose group at location B, have personalized predictive models that include only age among the uncertain covariates, and are thus consistent with the single predictive model m30. The targeted patients receiving the high dose at location A include dose in their personalized predictive models. The female patients in the low-dose group at location B include lot in their personalized predictive models. The uncertain covariate age is selected for all patients at location A, regardless of sex and treatment; at location B, however, only the high-dose males retain age. Among these twelve scenarios, females tend to include sex in their personalized predictive models.
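The twelve scenarios are simply the full cross of the three categorical factors (2 sexes × 2 lots × 3 dose levels), each combined with the median values of the continuous covariates quoted above. The minimal sketch below builds those covariate profiles; the dictionary layout, the string codings such as "placebo" or "A", and the helper name `targeted_profiles` are hypothetical choices for illustration, and the step that fits the personalized QFIC to each profile is not shown.

```python
from itertools import product

# Medians of the continuous covariates quoted in the text
# (age in years, disease duration in years, EDSS score).
MEDIANS = {"age": 36, "duration": 8.8, "edss": 4}


def targeted_profiles():
    """Return the 12 hypothetical covariate profiles: 2 sexes x 2 lots x 3 doses."""
    return [
        {**MEDIANS, "sex": sex, "lot": lot, "dose": dose}
        for sex, lot, dose in product(("male", "female"), ("A", "B"),
                                      ("placebo", "low", "high"))
    ]


profiles = targeted_profiles()
for profile in profiles:
    # Each profile would then be passed to the personalized QFIC (not shown here).
    print(profile["sex"], profile["lot"], profile["dose"])
```

Only the categorical factors vary across profiles, so any difference among the twelve selected models comes from sex, lot, and dose alone.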

Figure 6 RRMS: Exacerbation rate predictions for targeted patients under the single predictive model and the 12 personalized predictive models. Note: QFICplacebo, QFIClow, and QFIChigh denote the personalized predictive models for targeted patients in the placebo, low-dose, and high-dose groups, respectively.

From Fig. 6 we observe that both the personalized predictive models and the single predictive model m30 produce a U-shaped pattern of predicted exacerbation rate over visit time. Across the treatment groups, however, the personalized predicted exacerbation rate decreases as the dose level increases from placebo to high, whereas the single predictive model m30, which does not include dose, gives the same prediction for all three groups. In conclusion, the personalized QFIC uses individual-level information and accounts for heterogeneity both among RRMS patients and among repeated measurements from the same patient at different visit times. With the personalized predictive model selected by the personalized QFIC, we can therefore obtain a more accurate exacerbation rate prediction and a better prognosis and diagnosis for each individual targeted patient.

5 Conclusion and remarks

Through three case studies, namely a cross-sectional study in prostate cancer, a longitudinal study in relapsing remitting multiple sclerosis, and a survival study in lung cancer, we illustrate the advantage of the personalized FIC in identifying personalized predictive models for individual prognosis and diagnosis, and thus its applicability in one field of personalized medicine. Unlike traditional model selection criteria, FIC does not attempt to assess the overall fit of candidate models but instead focuses directly on the parameter of primary interest. Generally speaking, in model selection, including unnecessary covariates may lead to estimates with small bias but high variance, while excluding necessary covariates typically yields large bias but small variance. FIC balances these two goals and aims for the smallest mean squared error of the estimate of the focus parameter. The traditional procedure uses the information from all patients in the study and makes inference and predictions for any targeted patient with the single “overall fitting” model chosen by a traditional model selection criterion. Because of the patients’ heterogeneity, the model with the best overall properties may not be the best for a particular targeted patient. By using the individual-level information of the targeted patient, the personalized FIC focuses on individual prediction and aims to find that patient’s own best model, namely the one that minimizes the estimated mean squared error of his or her own prediction. Leave-one-out cross-validation experiments and group-specific analyses are performed for all three case studies. The smaller prediction error rates attained by the personalized predictive models, compared with the single “overall best” model, support the advantage of our perspective.
In this article, we apply the individualized perspective of FIC only to personalized prognosis and diagnosis. Further research is needed to apply the personalized FIC to other areas of personalized medicine, such as personalized therapy selection and monitoring.

Acknowledgments

The authors thank two reviewers for their insightful comments and suggestions, which led to an improved manuscript. Liang’s research was partially supported by NSF grants DMS-1207444 and DMS-1418042, and by Award Number 11228103 from the National Natural Science Foundation of China.

Conflict of interest

The authors have declared no conflict of interest.

Appendix

Here, we take the classical linear regression model as an example to illustrate the large-sample behavior of the focused information criterion under the local misspecification framework. More details are given in Claeskens and Hjort (2008). Using the notation introduced in Section 2, the true model is defined in the local misspecification framework of Claeskens and Hjort (2008) as

$$\beta_0 = (\theta_0, \gamma_0) = \left(\theta_0, \delta/\sqrt{n}\right).$$

Here $\delta = (\delta_1, \ldots, \delta_q)^\top$ measures how far the true model is from the narrow model in directions $1, \ldots, q$, at order $O(1/\sqrt{n})$, and some of the $\delta_i$'s can be 0. Under this scenario, the squared model biases and the model variances are both of order $O(1/n)$, so neither term dominates the other in the large-sample approximation.
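The balance between squared bias and variance under $\gamma = \delta/\sqrt{n}$ can be seen in a small Monte Carlo experiment. The sketch below uses settings that are assumptions for illustration only, not from the case studies: certain covariates (intercept, x), one uncertain covariate w with coefficient $\delta/\sqrt{n}$, corr(x, w) = ρ = 0.5, unit error variance, and focus equal to the mean response at (x, w) = (0, 1). For this setup the narrow estimator should have n·mse close to 1 + δ² and the full ("wide") estimator close to 1 + 1/(1 − ρ²) ≈ 2.33, so the narrow model should win for δ below about 1.15. The function name `mc_mse` is hypothetical.

```python
import numpy as np


def mc_mse(delta, n=100, rho=0.5, reps=2000, seed=1):
    """Monte Carlo MSE of the narrow and wide estimators of the focus.

    Hypothetical setup: y = theta0 + theta1 * x + gamma * w + eps with
    gamma = delta / sqrt(n), corr(x, w) = rho, eps ~ N(0, 1), and focus
    zeta = mean response at (x, w) = (0, 1).
    """
    rng = np.random.default_rng(seed)
    gamma = delta / np.sqrt(n)                  # local misspecification
    theta = np.array([1.0, 2.0])                # certain: intercept, slope on x
    truth = theta[0] + gamma                    # zeta at (x, w) = (0, 1)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    sq_err = np.zeros((reps, 2))
    for r in range(reps):
        xw = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        Z = np.column_stack([np.ones(n), xw[:, 0]])   # narrow design
        X = np.column_stack([Z, xw[:, 1]])            # wide design adds w
        y = Z @ theta + gamma * xw[:, 1] + rng.standard_normal(n)
        b_narrow = np.linalg.lstsq(Z, y, rcond=None)[0]
        b_wide = np.linalg.lstsq(X, y, rcond=None)[0]
        sq_err[r, 0] = (b_narrow[0] - truth) ** 2      # narrow prediction at x = 0
        sq_err[r, 1] = (b_wide[0] + b_wide[2] - truth) ** 2
    return sq_err.mean(axis=0)                  # (mse_narrow, mse_wide)


n = 100
for delta in (0.0, 1.0, 2.0, 3.0):
    mse_narrow, mse_wide = mc_mse(delta, n=n)
    # n * mse should be near 1 + delta**2 (narrow) and 1 + 1/(1 - rho**2) (wide)
    print(f"delta={delta}: n*mse narrow={n * mse_narrow:.2f}, wide={n * mse_wide:.2f}")
```

Because both terms scale as 1/n, multiplying by n gives numbers that stay stable as n grows, which is exactly what the local misspecification framework is designed to capture.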

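Thus, the score function of the full model, evaluated at $(\theta_0, 0)$, can be written as

$$U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} \partial l(\theta, \gamma; D)/\partial\theta \\ \partial l(\theta, \gamma; D)/\partial\gamma \end{pmatrix}\Bigg|_{\theta=\theta_0,\ \gamma=0}.$$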
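For the linear regression example used in this appendix, the score and information blocks have closed forms. A sketch, assuming $y_i = z_i^\top\theta + w_i^\top\gamma + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2)$ and $\sigma^2$ treated as known, where $z_i$ collects the $p$ certain and $w_i$ the $q$ uncertain covariates (symbols introduced here for illustration), and with the information normalized per observation, which is the scaling under which $\sqrt{n}\,\hat\gamma$ has limiting covariance $J^{11}$:

```latex
\begin{align*}
U_1 &= \sigma^{-2} \textstyle\sum_{i=1}^n z_i \,(y_i - z_i^\top\theta_0), \qquad
U_2 = \sigma^{-2} \textstyle\sum_{i=1}^n w_i \,(y_i - z_i^\top\theta_0), \\
J_{00} &= \frac{1}{n\sigma^2}\textstyle\sum_{i} z_i z_i^\top, \quad
J_{01} = J_{10}^\top = \frac{1}{n\sigma^2}\textstyle\sum_{i} z_i w_i^\top, \quad
J_{11} = \frac{1}{n\sigma^2}\textstyle\sum_{i} w_i w_i^\top.
\end{align*}
```

In this case every block is a scaled cross-product of the design, so the matrices defined next can be computed directly from the covariates.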

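The corresponding $(p + q) \times (p + q)$ information matrix is denoted by

$$J = \operatorname{var}_N(U) = \begin{pmatrix} J_{00} & J_{01} \\ J_{10} & J_{11} \end{pmatrix} \quad\text{and}\quad J^{-1} = \begin{pmatrix} J^{00} & J^{01} \\ J^{10} & J^{11} \end{pmatrix},$$

where $J^{11} = \left(J_{11} - J_{10} J_{00}^{-1} J_{01}\right)^{-1}$. Let $\pi_S$ be the $q_S \times q$ projection matrix mapping $\gamma$ to $\gamma_S$; then the quasi-score of submodel $S$, evaluated at $(\theta_0, 0)$, can be written as

$$U_S = \begin{pmatrix} U_1 \\ U_{2,S} \end{pmatrix} = \begin{pmatrix} U_1 \\ \pi_S U_2 \end{pmatrix}.$$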

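The corresponding information matrix has dimension $(p + q_S) \times (p + q_S)$:

$$J_S = \begin{pmatrix} J_{00} & J_{01}\pi_S^\top \\ \pi_S J_{10} & \pi_S J_{11} \pi_S^\top \end{pmatrix} \quad\text{and}\quad \left(J_S^{11}\right)^{-1} = \pi_S \left(J^{11}\right)^{-1} \pi_S^\top.$$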

Assume that the focus parameter can be written as a function of the model parameters, $\zeta = \zeta(\theta, \gamma)$, with continuous partial derivatives in a neighborhood of $\zeta_0 = \zeta(\theta_0, \gamma_0)$. To simplify the notation, let

$$\tau_0^2 = \left(\frac{\partial\zeta}{\partial\theta}\right)^\top J_{00}^{-1} \frac{\partial\zeta}{\partial\theta} \quad\text{and}\quad D_S = \pi_S^\top J_S^{11} \pi_S \left(J^{11}\right)^{-1}.$$

Denote the maximum likelihood estimates under submodel $S$ by $(\hat\theta_S, \hat\gamma_S)$. Claeskens and Hjort (2003) showed that the estimator of the uncertain parameters under the full model has the limiting distribution

$$\sqrt{n}\,\hat\gamma \xrightarrow{d} D \sim N_q\left(\delta, J^{11}\right).$$

In particular, under submodel $S$,

$$\sqrt{n}\,\hat\gamma_S \xrightarrow{d} J_S^{11} \pi_S \left(J^{11}\right)^{-1} D.$$

Therefore, the corresponding maximum likelihood estimator of the focus parameter under submodel $S$, $\hat\zeta_S$, has a limiting distribution of the form

$$\sqrt{n}\left(\hat\zeta_S - \zeta_0\right) \xrightarrow{d} \Lambda_S = \left(\frac{\partial\zeta}{\partial\theta}\right)^\top J_{00}^{-1} M_1 + \omega^\top \delta - \omega^\top D_S D,$$

with mean squared error

$$\operatorname{mse}(\Lambda_S) = \tau_0^2 + \omega^\top \pi_S^\top J_S^{11} \pi_S\, \omega + \left\{\omega^\top \left(I_q - D_S\right) \delta\right\}^2.$$
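Two special cases of this mean squared error make the bias-variance trade-off explicit. For the narrow model $S = \emptyset$, $\pi_S$ has no rows, so the variance term vanishes and $D_S = 0$; for the full model, $\pi_S = I_q$, so $J_S^{11} = J^{11}$ and $D_S = I_q$:

```latex
\begin{align*}
\operatorname{mse}(\Lambda_{\emptyset}) &= \tau_0^2 + \left(\omega^\top \delta\right)^2, \\
\operatorname{mse}(\Lambda_{\mathrm{full}}) &= \tau_0^2 + \omega^\top J^{11} \omega .
\end{align*}
```

Hence the narrow model is preferable for the focus parameter exactly when $(\omega^\top\delta)^2 < \omega^\top J^{11}\omega$, that is, when the squared bias it incurs is smaller than the variance the full model adds.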

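The quantities $\tau_0$, $\omega$, $J_S^{11}$, $D_S$, and $\delta$ can be estimated under the full model by incorporating the GEE approach. The focused information criterion is therefore defined as

$$\mathrm{FIC}_{n,S} = 2\,\hat\omega^\top \pi_S^\top \hat J_S^{11} \pi_S\, \hat\omega + n\left\{\hat\omega^\top \left(I_q - \hat D_S\right) \hat\gamma\right\}^2. \tag{1}$$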
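A minimal numerical sketch of equation (1) follows, under stated assumptions: a homoscedastic linear model $y = Z\theta + W\gamma + \varepsilon$ with narrow model $\gamma = 0$, focus $\zeta = z_0^\top\theta + w_0^\top\gamma$ (the mean response at a chosen covariate profile), and $\omega = J_{10}J_{00}^{-1}\partial\zeta/\partial\theta - \partial\zeta/\partial\gamma$ as defined by Claeskens and Hjort (2003). In place of the GEE estimates used in the paper, it plugs in ordinary least squares and the per-observation information $X^\top X/(n\hat\sigma^2)$. The helper `fic_linear` is hypothetical and is not the authors' supporting-information code.

```python
import itertools

import numpy as np


def fic_linear(Z, W, y, z0, w0):
    """FIC_{n,S} of equation (1) for every subset S of the uncertain covariates W."""
    n, p = Z.shape
    q = W.shape[1]
    X = np.hstack([Z, W])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]     # full-model estimates
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - p - q)
    J = X.T @ X / (n * sigma2)                           # per-observation information
    J00, J10 = J[:p, :p], J[p:, :p]
    K = np.linalg.inv(J)[p:, p:]                         # J^{11}
    K_inv = np.linalg.inv(K)                             # J11 - J10 J00^{-1} J01
    omega = J10 @ np.linalg.solve(J00, z0) - w0          # Claeskens and Hjort (2003)
    gamma_hat = beta_hat[p:]
    fic = {}
    for r in range(q + 1):
        for S in itertools.combinations(range(q), r):
            if r == 0:                                   # narrow model: no variance term, D_S = 0
                var_term, D_S = 0.0, np.zeros((q, q))
            else:
                pi_S = np.eye(q)[list(S)]                # q_S x q projection matrix
                K_S = np.linalg.inv(pi_S @ K_inv @ pi_S.T)   # J_S^{11}
                var_term = 2.0 * omega @ pi_S.T @ K_S @ pi_S @ omega
                D_S = pi_S.T @ K_S @ pi_S @ K_inv
            bias = omega @ (np.eye(q) - D_S) @ gamma_hat
            fic[S] = var_term + n * bias ** 2
    return fic


# Usage on simulated data (all settings hypothetical).
rng = np.random.default_rng(0)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=n)])   # certain: intercept, x
W = rng.normal(size=(n, 2))                              # uncertain: w1, w2
y = Z @ np.array([1.0, 2.0]) + W @ np.array([0.8, 0.0]) + rng.normal(size=n)
scores = fic_linear(Z, W, y, z0=np.array([1.0, 0.5]), w0=np.array([1.0, 1.0]))
best = min(scores, key=scores.get)
print("selected subset of uncertain covariates:", best)
```

The subset with the smallest FIC is selected; because $z_0$ and $w_0$ enter through $\omega$, changing the covariate profile can change the selected model, which is the mechanism behind the personalized FIC.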

References

Akaike, H. (1973). Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60, 255–265.
Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94(2), 443–458.
Bonetti, M. and Gelber, R. D. (2000). A graphical method to assess treatment-covariate interactions using the Cox model on subsets of the data. Statistics in Medicine 19(19), 2595–2609.
Bonetti, M. and Gelber, R. D. (2004). Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics 5(3), 465–481.
Brinkley, J., Tsiatis, A. and Anstrom, K. J. (2010). A generalized estimator of the attributable benefit of an optimal treatment regime. Biometrics 66(2), 512–522.
Cai, T., Tian, L., Wong, P. H. and Wei, L. J. (2011). Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics 12(2), 270–282.
Claeskens, G. and Carroll, R. J. (2007). An asymptotic theory for model selection inference in general semiparametric problems. Biometrika 94, 249–265.
Claeskens, G., Croux, C. and van Kerckhoven, J. (2006). Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62, 972–979.
Claeskens, G. and Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association 98, 900–916.
Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge University Press, Cambridge, UK.
Dumas, T. E., Hawke, R. L. and Lee, C. R. (2007). Warfarin dosing and the promise of pharmacogenomics. Current Clinical Pharmacology 2(1), 11–21.
Gunter, L., Zhu, J. and Murphy, S. A. (2011). Variable selection for qualitative interactions. Statistical Methodology 8(1), 42–55.
Henderson, R. and Keiding, N. (2005). Individual survival time prediction using statistical models. Journal of Medical Ethics 31(12), 703–706.
Hjort, N. L. and Claeskens, G. (2006). Focused information criteria and model averaging for the Cox hazard regression model. Journal of the American Statistical Association 101, 1449–1464.
Hosmer, D. W. and Lemeshow, S. (1989). Applied Logistic Regression. John Wiley & Sons, New York, NY.
Li, C.-Y., Mao, X. and Wei, L. (2008). Genes and (common) pathways underlying drug addiction. PLoS Computational Biology 4(1), e2.
Moodie, E. E. M., Richardson, T. S. and Stephens, D. A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics 63(2), 447–455.
Murphy, S. A. (2002). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society, Series B 65, 331–366.
Pfeffer, M. A. and Jarcho, J. A. (2006). The charisma of subgroups and the subgroups of charisma. New England Journal of Medicine 354(16), 1744–1746.
Qian, M. and Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. The Annals of Statistics 39(2), 1180–1210.
Robins, J. M. (2004). Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium on Biostatistics, Springer, New York, NY.
Robins, J., Orellana, L. and Rotnitzky, A. (2008). Estimation and extrapolation of optimal treatment and testing strategies. Statistics in Medicine 27(23), 4678–4721.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Simon, R. (2005). Roadmap for developing and validating therapeutically relevant genomic classifiers. Journal of Clinical Oncology 23, 7332–7341.

Simon, R. (2012). Clinical trials for predictive medicine. Statistics in Medicine 31(25), 3031–3040.
Simon, R. (2013). Genomic Clinical Trials and Predictive Medicine. Practical Guides to Biostatistics and Epidemiology. Cambridge University Press, Cambridge, UK.
Song, X. and Pepe, M. (2004). Evaluating markers for selecting a patient’s treatment. Biometrics 60(4), 874–883.
Wang, R., Lagakos, S. W., Ware, J. H., Hunter, D. J. and Drazen, J. (2007). Statistics in medicine: reporting of subgroup analyses in clinical trials. New England Journal of Medicine 357(21), 2189–2194.
Yang, H., Lin, P., Zou, G. and Liang, H. (2014). Variable selection and model averaging for longitudinal data incorporating GEE approach. Technical report, George Washington University, Washington, DC.
Zhang, B., Tsiatis, A. A., Laber, E. B. and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics 68(4), 1010–1018.
Zhang, X. Y. and Liang, H. (2011). Focused information criterion and model averaging for generalized additive partial linear models. The Annals of Statistics 39, 174–200.