Human Reproduction, Vol.30, No.1 pp. 28– 36, 2015 Advanced Access publication on November 5, 2014 doi:10.1093/humrep/deu295

ORIGINAL ARTICLE Embryology

Which set of embryo variables is most predictive for live birth? A prospective study in 6252 single embryo transfers to construct an embryo score for the ranking and selection of embryos 1 Department of Women’s and Children’s Health, Uppsala University, S-751 85 Uppsala, Sweden 2Uppsala Clinical Research Center, Akademiska sjukhuset (UCR), S-751 85 Uppsala, Sweden 3Carl von Linne´ Clinic, Uppsala Science Park, S-751 83 Uppsala, Sweden 4 Centre for Reproductive Biology in Uppsala (CRU), PO Box 7054, S-750 07 Uppsala, Sweden

*Correspondence address. E-mail: [email protected]

Submitted on February 5, 2014; resubmitted on October 9, 2014; accepted on October 15, 2014

study question: Which embryo score variables are most powerful for predicting live birth after single embryo transfer (SET) at the early cleavage stage?

summary answer: This large prospective study of visual embryo scoring variables shows that blastomere number (BL), the proportion of mononucleated blastomeres (NU) and the degree of fragmentation (FR) have independent prognostic power to predict live birth. what is known already: Other studies suggest prognostic power, at least univariately and for implantation potential, for all five variables. A previous study from the same centre on double embryo transfers with implantation as the end-point resulted in the integrated morphology cleavage (IMC) score, which incorporates BL, NU and EQ.

study design, size and duration: A prospective cohort study of IVF/ICSI SET on Day 2 (n ¼ 6252) during a 6-year period (2006 –2012). The five variables (BL NU, FR, EQ and symmetry of cleavage (SY)) were scored in 3- to 5-step scales and subsequently related to clinical pregnancy and LBR.

participants/materials, setting, methods: A total of 4304 women undergoing IVF/ICSI in a university-affiliated private fertility clinic were included. Generalized estimating equation models evaluated live birth (yes/no) as primary outcome using the embryo variables as predictors. Odds ratios with 95% confidence intervals and P-values were presented for each predictor. The C statistic (i.e. area under receiver operating characteristic curve) was calculated for each model. Model calibration was assessed with the Hosmer –Lemeshow test. A shrinkage method was applied to remove bias in c statistics due to over-fitting. main results and the role of chance: LBR was 27.1% (1693/6252). BL, NU, FR and EQ were univariately highly significantly associated with LBR. In a multivariate model, BL, NU and FR were independently significant, with c statistic 0.579 (age-adjusted c statistic 0.637). EQ did not retain significance in the multivariate model. Prediction model calibration was good for both pregnancy and live birth. We present a ranking tree with combinations of values of the BL, NU and FR embryo variables for optimal selection of the embryo/s to transfer, providing a revised IMC score. The five embryo variables had similar effects over all age groups.

limitations, reasons for caution: Limitations of the present study are those inherent for real-time visual scoring, including risks of inter-observer variation and the hazards of fixed time-point scoring procedures in a dynamic process. The study is restricted to Day-2 transfers. wider implications of the findings: To our knowledge this is the largest prospective, SET study performed with the explicit aim of constructing an evidence-based embryo score for the ranking and selection of early cleavage stage embryos. In line with previous research, our data suggest that the symmetry of cleavage variable may be omitted when scoring embryos in the early cleavage stage. We suggest that, following validation in other populations, the revised IMC score may be used when international standards for embryo scoring are discussed. & The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: [email protected]

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

A. Rhenman 1,*, L. Berglund 2, T. Brodin 3, M. Olovsson1, K. Milton 3, N. Hadziosmanovic 2, and J. Holte 1,3,4

29

Construction of an embryo score to predict live birth

study funding/competing interest: Carl von Linne´ Clinic, Uppsala and the Department of Women’s and Children’s Health and the Family Planning Fund in Uppsala, Uppsala University, Uppsala, Sweden financed this study. There are no competing interests to declare. Key words: embryo morphology / embryo score / implantation potential / prediction / embryo transfer

Introduction

Materials and Methods All data were collected prospectively. Unselected treatments at the Carl von Linne´ Clinic during a period of 6 years (2006– 2012) were included. Only treatments that resulted in a SET on Day 2 after oocyte retrieval were studied regardless of whether SET was elective or not. Patients (i.e. treatments) were enrolled regardless of cause and duration of infertility and regardless of infertility being primary or secondary.

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

Embryo quality is widely regarded as the single most important factor for the chance of a successful IVF treatment (Harton et al., 2013). Scoring of embryo morphology and cleavage rate is a clinically useful method in embryo selection and may serve as a proxy for embryo quality (Gardner et al., 2000, 2004; Balaban et al., 2006; Holte et al., 2007). There is still a lack of evidence-based standardized morphological and cleavage variables for ranking and selection of the embryo with the greatest chance of resulting in a live baby. The variation in embryo scoring models makes comparisons between different clinics and treatment strategies difficult, a fact which is highlighted when accumulating national data (Vernon et al., 2011). Solid data on how to rank and select individual embryos with different types and degrees of deviations from the ‘ideal’ morphologic and cleavage appearance are still scarce. Thus, in clinical practice, it is not clear if, for example, an embryo with ‘too fast’ or ‘too slow’ cleavage and no fragmentation should be selected for transfer before an embryo with normal cleavage rate and moderate fragmentation, or if a slight variation in blastomere size is a worse prognostic sign than the absence of a visible normal nucleus in a proportion of the blastomeres. In a previous study it was investigated whether the five commonly used embryo variables, namely blastomere number (BL), the proportion of mononucleated blastomeres (or presence of multinuclearity; NU), the degree of fragmentation (FR), the size variation of the blastomeres (equality, EQ) and the symmetry of the embryo (SY), were independent predictors of implantation potential of the embryo (Holte et al., 2007). Of those five variables, only three were independent predictors in multivariate regression analysis: BL, EQ and NU. Although also significant in univariate testing, the variables FR and SY did not retain significance in the multivariate model. The Holte et al. (2007) study was conducted on double embryo transfers (DET), and thus the primary outcome was either the implantation of both or neither of the transferred embryos. The study resulted in a clinically useful algorithm, the integrated morphology cleavage score (IMC), which enables evidence-based scoring of embryos on Day 2 after oocyte retrieval in a 10-point scale. The IMC score forms an essential part of a validated prediction model as an aid in the decision between SET and DET (Holte et al., 2011a), a model which has been in clinical use in several clinics for a number of years. The aim of the present study was to re-evaluate the same five embryo variables in a second prospective study, using single embryo transfer (SET) only and with live birth rate (LBR) as primary end-point.

Ovarian stimulations were conducted with individual doses of recombinant FSH in the form of follitropin alfa (Merck Serono, Darmstadt, Germany) or follitropin beta (MSD, Whitehouse Station, New Jersey) in 79% of the cycles and with human menopausal gonadotrophin (hMG) (Ferring, Lausanne, Switzerland) in 21% of the cycles. In 90% of the cycles, patients underwent long GnRH-agonist down-regulation with nafarelin (Pfizer, New York, New York) or buserelin (Sanofi, Paris, France). In 10% of the treatments, a GnRH antagonist (Cetrorelix, Merck Serono, Darmstadt, Germany; or Ganirelix, MSD, Whitehouse Station, New Jersey) was used, starting on the fifth day of stimulation. Final oocyte maturation was triggered with hCG using either 10 000 IU of urinary derived hCG (MSD, Whitehouse Station, New Jersey) or 250 mg (6500 IU) of recombinant hCG (Merck Serono, Darmstadt, Germany), and oocytes were aspirated after 35– 37 h. Oocytes were inseminated (or injected with sperm after denudation for ICSI) after 2– 6 h of incubation and then cultured in one of three culture medium (MediCult, Denmark; Cook Medical, Australia or Vitrolife, Sweden) in a 5% CO2 incubator at 378C. Fertilization was checked 16 – 20 h after the insemination. The embryos were evaluated 46 – 48 h after insemination by placing the Petri dish with the embryos on the heating plate under a stereomicroscope (Nikon SMZ 1500) at maximal magnification (×110). The embryos were gently rolled with a Pasteur pipette to assess the five morphological parameters. Light was directed obliquely through an angled mirror beneath the microscope to enable the visualization of the nuclei. The five embryo variables were scored as follows: BL was recorded as 2, 3, 4, 5 or ≥6, FR was scored as 0 (no fragments), 1 (≤10% fragmentation), 2 (.10 ≤25%), 3 (.25 ≤50%) and 4 (.50% fragmentation). EQ was scored according to a three-level system, where 0 denotes uniform size of the blastomeres, 1 denotes varying size but ,50% variation and 2 denotes .50% variation in blastomere size. Similarly, SY was scored as 0, 1 and 2, describing full symmetry of the cleaved embryo, slightly asymmetric cleavage, and pronounced asymmetry, respectively. The difference between these two parameters is thus that EQ scores variation in blastomere size whereas SY scores the arrangement of the blastomeres three-dimensionally. The parameter NU was defined as the number of visible mononucleated blastomeres divided by the total number of blastomeres in the embryo (to correct for the cleavage rate); a ratio of 0 – 0.29 gave a NU score of 0, 0.30 – 0.69 gave a NU score of 1, 0.70 – 0.99 gave a score of 2, and a ratio of 1 equalled a NU score of 3. The NU score 21 denotes that the embryo contains at least one multinucleated blastomere. Supplementary Table SI gives all the possible NU scores for embryos with 2 – 8 blastomeres and varying proportions of visible nuclei. Supplementary Table SI is given to facilitate the NU scoring, which otherwise must be calculated manually for each combination. With similar definitions, these five variables were used in our previous study, which resulted in the IMC (Holte et al., 2007). The IMC included BL, NU and EQ, which remained significant after regression analysis, incorporated in an algorithm and presented as a 10-point scale. In that study, only DETs were included, and the end-point was clinical pregnancy. In the present study, the IMC was the basis for ranking and selection of embryos, but FR and SY were still scored for the purpose of comparing their potential to compete with the other three variables to be included in the new algorithm, which also predicts a slightly different end-point, LBR. Embryos were transferred 51 – 55 h after oocyte retrieval. After embryo transfer, all patients were given luteal phase support vaginally for 2 weeks, either as 400 mg of micronized progesterone (vagitories, Apoteket AB, Stockholm, Sweden) or 100 mg vaginal tablets (Ferring, Lausanne,

30 Switzerland) three times daily, or as 90 mg gel (Merck, Darmstadt, Germany) twice daily. A clinical pregnancy was defined by the presence of an intrauterine gestational sac at vaginal ultrasonography 5– 6 weeks after oocyte retrieval. The delivery of a living child defined live birth.

Statistics

Ethical approval The study was approved by the Regional Ethics Review Board, Uppsala, Sweden. Written informed consent was obtained from all participants.

Results The study cohort consisted of 6252 treatment cycles in 4304 couples. Table I shows the characteristics of the study cohort. Mean female age (SD) was 34 (4.2) years. The overall LBR was 27.1%.

Effects of embryo variables in univariate models Figure 1 shows the relationships between each of the five embryo variables and predicted and observed LBRs. All variables except for SY were significantly associated with LBR. All variables except BL could best be described by linear functions; for BL, LBRs increased from 2- to 4-cell embryos but then declined again with higher numbers of blastomeres. As BL was not linearly related to the outcome, it had to be de-composed into two variables. The linear BL (linBL) term is the number of blastomeres and the term abs(BL-4) is the deviation from the ideal number of blastomeres. Thus, the relation was best described by the combination of a linear term and the absolute deviation from 4 cells (e.g. 3 cells and 5 cells both have an absolute deviation of 1).

Table I Characteristics of the cohort in a prospective study of embryo variables and prediction of live birth. Parameter

Value

........................................................................................ Women, n

4304

Embryo transfers, n

6252

ET/woman, mean (min-max)

1.45 (1– 7)

Couples with 1 treatment, n (%)

2884 (67.0)

Couples with 2 treatments, n (%)

1006 (23.4)

Couples with 3 treatments, n (%)

328 (7.6)

Couples with 4 or more treatments, n (%)

86 (2.0)

Female age in years, mean (SD)

34 (4.2)

Eggs per oocyte retrieval, mean (range)

9.75 (1– 39)

Tubal factor, n (%)

692 (11.1)

Anovulation, n (%)

525 (8.4)

Unexplained infertility, n (%)

2512 (40.1)

Male factor (%), n (%)

1665 (26.6)

Other cause of infertility, n (%)

858 (13.8)

ICSI, n (%)

2164 (34.6)

IVF, n (%)

3445 (55.1)

Combined IVF/ICSI, n (%)

643 (10.3)

Pregnancy rate, %

34.3

Live birth rate, %

27.1

For category variables, n (%) is presented and for continuous variables, mean + SD or range (minimum; maximum) is presented. ET, embryo transfer.

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

The sample size was based on the perceived length of the 95% confidence interval (CI) for the pregnancy odds ratio for the most important predictor; the absolute deviation of blastomeres from 4 (abs(BL-4)). This odds ratio was 0.39 in our previous study (Holte et al., 2007). We calculated that 5000 treatments would be sufficient to make the length of the 95% CI 0.2. Subsequently, as shown in the Results section, this power calculation proved valid, given the lengths of the calculated 95% CIs for odds ratios for abs(BL-4); the estimated odds ratio is 0.47 with 95% CI 0.39 –0.58, which is highly informative. The interrelations between the predictor variables were examined using Spearman’s rank correlation coefficients. Univariate relations between predictor variables and the outcomes pregnancy and live birth were described in figures and statistically analysed in generalized estimating equation (GEE) models (Zeger et al., 1988) with clinical pregnancy and live birth (yes/no) as outcome variables and embryo factors as predictors. GEE models take into account the dependence between repeated treatments for the same woman. Results were presented as odds ratios with 95% CIs and P-values for each predictor and c statistic, i.e. area under receiver operating characteristic curve, for each model. A model’s predictive capacity, which is measured as the c statistic in this study, when estimated from the same data set as was used for model development, is commonly too optimistic compared with the accuracy found in new patients. This over-fitting is more likely to occur when the derivation data set is small, and/or there is a large number of candidate predictors compared with the number of outcomes (in our case the number of delivered babies) (Moons et al., 2004). Although none of these conditions characterized the present study, we still used a shrinkage method to adjust for possible over-fitting and remove bias in c statistics (Harrell et al., 1996). The method was applied to a logistic regression model with data from the first treatment for each woman. Model calibration was assessed with the Hosmer –Lemeshow test (Hosmer and Lemeshow, 2000). The test assesses whether or not the observed event rates match expected event rates in subgroups based on predicted probabilities from the model. The Hosmer –Lemeshow test results are non-significant if the calibration is acceptable. Each predictor’s (i.e. embryo variable’s) functional relation to the outcome was examined and, if necessary, the variable was transformed. Thereafter, a GEE multivariate regression model with forward selection of variables was applied to find the set of predictor variables with independent impact on the outcome. P-values from two-tailed tests ,5% were considered statistically significant. From this model, estimated probabilities were calculated for each treatment using a weighed sum of mean embryo variable values where the estimated regression coefficients were used as weights. This calculated value is the estimated probability of pregnancy or a live birth. The relative merits of the predictor variables were illustrated by the calculation of the estimated probabilities (transferred to a scale 0 –10, where 10 is the highest probability) for a number of combinations of the predictors’ values.

Rhenman et al.

31

Construction of an embryo score to predict live birth

Table II Effects of single embryo variables on live birth rates: univariate analysis. Parameter

OR (CI)

P-value

c-stat

........................................................................................ linBL abs(BL-4)

1.38 (1.15–1.65) 0.39 (0.32–0.47)

0.001 ,0.0001

0.548

NU

1.68 (1.51–1.87)

,0.0001

0.545

FR

0.77 (0.71–0.84)

,0.0001

0.535

EQ

0.69 (0.61–0.78)

,0.0001

0.537

SY

0.88 (0.75–1.03)

0.103

0.513

Odds ratios (OR) with 95% confidence intervals (CI) and P-values and c-statistics for the end-point live birth for each of the variables; Blastomere (BL), Nucleus (NU), Fragmentation (FR), Equality (EQ) and Symmetry (SY). Number of blastomeres is represented both by the linear number of blastomeres, lin BL, and the absolute number of blastomeres – 4, abs(BL-4), given the non-linear association between the blastomere variable and the end-point.

The term abs(BL-4) implies a symmetric relation but addition of the linear term introduces a slight asymmetric feature such that 5 cells gives a better outcome than 3 cells. Thus, the combination of these terms captures the asymmetric relation between BL and LBR evident from Fig. 1. In the following, BL refers to the combination of linBL term and abs(BL-4). Table II shows univariate odds ratios and the discriminative abilities for a live birth for the five embryo variables. BL had the highest discriminative capacity (c statistic 0.548), closely followed by NU and FR and EQ. Of the five embryo variables, only SY was univariately insignificant.

Effects of embryo variables and their relative importance in multivariate models In the GEE multivariate regression model, BL, NU and FR remained independent significant predictors for a live birth, but not EQ, which is why this variable was subsequently excluded. Table III shows the discriminative capacity for different combinations of the independently significant variables. The model including BL, NU and FR (Table III, model 1) had the highest discriminative capacity for prediction of live birth (c statistic 0.579). Removal of FR or NU from the multivariate model (model 2 and 3, respectively) resulted in a c statistic of 0.568 and 0.569, respectively. The combination NU and FR gave the lowest c statistic of 0.559 (model 4). To construct the revised IMC score (rIMC) we transferred the resulting figure of predicted probabilities of live birth from the algorithm of model 1 (Table III) to a scale of 0–10. Supplementary Figure S1 displays all combinations of BL, NU and FR values with corresponding rIMC scores, the value 10 (BL 4, NU 3, i.e. all four blastomeres with a visible nucleus, and FR 0) thus implying the highest probability of a live birth. Supplementary Table SI shows the calculation of all possible NU scores up to eight blastomeres and each possible rIMC score. Examples of the effect of different rIMC scores: Transfer of an embryo with a score of 10 points resulted in a mean LBR of 33.1% (predicted: 33.2%; N ¼ 1126), whereas the corresponding figure for a rIMC score of 9.1 was 29.3% (predicted: 30.3%; N ¼ 3469). Table IV summarizes the observed and predicted LBRs for different levels of rIMC scores. Age did not interact with any of the variables, i.e. the embryo variables had similar effects over all age groups. The age-adjusted c statistic for

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

Figure 1 Number of observations, predicted (blue lines) and observed (red bars), live birth rates (LBR) at all scores for the five embryo variables (no age-adjustment). BL ¼ number of blastomeres, NU ¼ Nucleus score, the number of visible mononucleated blastomeres divided by the total number of blastomeres, with the score 21 denoting that the embryo contains at least one multinucleated blastomere, FR ¼ degree of fragmentation, EQ ¼ variation in blastomeres size and SY ¼ symmetry of the cleavage. BL, NU, FR and EQ were all significantly associated with LBR, whereas SY was non-significant.

32

Rhenman et al.

Table III Effects of embryo variables in multiple models versus live birth rates. OR with 95% CI and P-values and c-statistics. OR (CI)

P-value

c-stat

........................................................................................ Model 1 linBL

1.46 (1.21– 1.76)

0.0001

abs(BL-4)

0.47 (0.39– 0.58)

,0.0001

NU

1.39 (1.24– 1.57)

,0.0001

FR

0.87 (0.79– 0.96)

0.005

0.579

Model 2 1.38 (1.16– 1.64)

0.001

abs(BL-4)

0.48 (0.40– 0.57)

,0.0001

NU

1.46 (1.30– 1.64)

,0.0001

0.568

Figure 2 Mean live birth rates at revised integrated morphology

Model 3 linBL

1.46 (1.20– 1.77)

0.001

abs(BL-4)

0.39 (0.32– 0.48)

,0.0001

FR

0.83 (0.76– 0.91)

,0.0001

0.569

Correlation of the variables

Model 4 NU

1.60(1.43–1.79)

,0.0001

FR

0.85 (0.78– 0.94)

0.001

0.559

BL, NU and FR were significant predictors, whereas EQ was lost because of non-significance when tested in multivariate setting. Owing to its non-linear relationship with live birth rate, blastomere is represented by two variables: linBL, the total number of blastomeres; and abs(BL-4), the absolute figure for the deviation from the ideal four blastomeres. Model 1 was chosen as the revised integrated morphology cleavage score (rIMC) due to the highest c statistics.

Table IV The observed and predicted live birth rates for five different levels of rIMC score. rIMC (0– 10 point score)

cleavage (rIMC) scores from 1 to 10 points in different age groups. The impact of embryo score on implantation is proportionally affected by age over the entire range of embryo scores.

0–4

4–5

5– 8

8 –9.9

10

........................................................................................ 543 Number of observations (n)

144

473

3966

1126

Observed live birth, n (%)

29 (5.3)

24 (16.7)

100 (21.1)

1167 (29.4)

373 (33.1)

Predicted live birth %

6.4

14.9

21.8

29.9

33.2

The cohort is divided into five revised Integrated Morphology Cleavage score (rIMC) levels; 0– 4, 4 –5, 5–8, 8–9.9 and 10. For each rIMC group the total number of observations and the observed and predicted live birth rates are displayed.

model 1 was 0.637. LBRs for different embryo scores were calculated in seven age groups and the resulting graphs are depicted in Fig. 2.

Adjustment for over-fitting To adjust for over-fitting, we first estimated a logistic regression model with data from the first treatment for each woman (n ¼ 4452) and thereafter a shrinkage model using the same data was estimated and the c statistics were compared. The c statistics for these models were 0.593 and 0.593, respectively.

Overall, the embryo variables displayed modest intercorrelations (Table V). The strongest correlation was between abs(BL-4) and NU (r ¼ 20.40), followed by NU and EQ (r ¼ 20.35), abs(BL-4) and EQ (r ¼ 0.25) and FR and NU (r ¼ 20.23).

Calibration of the multivariate model Figure 3 shows the association between the mean predicted probability of a pregnancy and the observed pregnancy rates. The corresponding results for LBRs are given in Fig. 4. The goodness of fit was good for both outcomes, assessed with the Hosmer –Lemeshow test, which was insignificant for both outcomes (P ¼ 0.60 for the prediction of pregnancy and P ¼ 0.39 for a live birth).

Discussion This is by far the largest study on classic embryo morphological markers and their relation to chance of a successful IVF/ICSI treatment. The study is unique in the sense that all studied embryos were scored prospectively and SET was used. We focused on evaluating five morphological variables at the early cleavage stage, determining each variable’s individual relationship with treatment outcome, and further aimed at determining the relative weight of these parameters to predict live birth. The study is a follow-up of our previous study, which investigated the same variables in DET and using implantation as the end-point (Holte et al., 2007). We found that the present embryo model, derived from only SET in a much larger cohort and with live birth as main end-point, is very similar to the one presented previously. In univariate analysis, the results were almost identical, confirming that treatment outcome is associated with cleavage rate in a non-linear way, and in a linear way with nuclear status, degree of fragmentation and variation in blastomere size. After regression modelling, the degree of fragmentation qualified in the final new model instead of the variation in blastomere size, which was incorporated in the previous model (Holte et al., 2007). Thus the new algorithm, the rIMC, incorporates cleavage rate, the percentage of blastomeres showing a normal nucleus, and the degree of fragmentation, whereas the variation in

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

linBL

33

Construction of an embryo score to predict live birth

Table V Spearman’s coefficient of correlation between the five embryo variables. linBL

abs(BL-4)

NU

FR

EQ

SY

............................................................................................................................................................................................. linBL

(r) (P)

1.00

abs(BL-4) (r) (P)

20.19 ,0.000

1.00

NU

(r) (P)

0.02 0.050

20.40 ,0.000

1.00

FR

(r) (P)

20.03 0.012

0.16 ,0.000

20.23 ,0.000

1.000

EQ

(r) (P)

0.07 ,0.000

0.25 ,0.000

20.35 ,0.000

0.21 ,0.000

1.000

SY

(r) (P)

0.07 ,0.000

20.02 0.213

20.11 ,0.000

0.05 0.003

0.17 ,0.000

1.000

Figure 3 Calibration of the age-adjusted rIMC score (including blastomere, fragmentation and nucleus) for observed (y-axis) versus predicted (x-axis) chance of pregnancy in stratas of 10%. Hosmer – Lemeshow test P-value ¼ 0.60. A non-significant value indicates the calibration is good. Error bars indicate SEM.

blastomere size and the symmetry of the cleavage did not qualify for the final model. The use of poor-quality prediction models could have a negative effect on clinical strategies by introducing the illusion of objective improvement over clinical judgement (Leushuis et al., 2009). Many previous studies on embryo morphology have focused on a single factor, such as cleavage rate, degree of fragmentation, uneven cleavage and multinuclearity, all factors which are associated with implantation and birth rates in univariate analyses (Alpha Scientists in Reproductive Medicine and ESHRE Special Interest Group of Embryology, 2011). There are, however, only a limited number of studies that take a multi-dimensional approach, which better reflects the common clinical situation, when ranking embryos with various degrees and types of morphological deviation is needed (Scott, 2003; Sjoblom et al., 2006; Guerif et al., 2007; Holte

Figure 4 Calibration of the age-adjusted rIMC score (including blastomere, fragmentation and nucleus) for observed (y-axis) versus predicted (x-axis) chance of a live birth in stratas of 10%. Hosmer – Lemeshow test P-value ¼ 0.39. A non-significant value indicates the calibration is good. Error bars indicate SEM.

et al., 2007; Rehman et al., 2007; Racowsky et al., 2009; Ahlstrom et al., 2011; Fauque et al., 2013). A multivariate model is necessary to find out which variables have independent, and thus true predictive, impact on the outcome. Some variables may lack biological significance, but show an association with other variables of real importance and thus falsely give the impression of true predictors for embryo euploidy. The best way to avoid misclassification of embryos due to univariate or subjective embryo scoring is to perform prospective, large-scale studies with several embryo variables, resulting in a scoring algorithm that is based on multivariate statistical analyses, in which all relevant variables ‘compete’. Such an algorithm thus includes only the factors that show independent impact on treatment outcome and it also gives the correct power balance between those variables. In the present study we included all treatments, also from repeated cycles, as this strategy may better reflect a couple’s

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

Owing to its non-linear relationship with live birth rate, Blastomere (BL) is represented by two variables: linBL, the total number of blastomeres; and abs(BL-4), the absolute figure for the deviation from the ideal four blastomeres. r, coefficient of correlation; n ¼ 6252 embryos.

34

predictor blastomere number in the multivariate model. Thus, our findings should not discourage from scoring blastomere size variation, given the previous support for a biological significance of this variable per se, although it did not qualify in the final ranking model for predicting live birth. In fact, applying our previous IMC model (with blastomere size variation instead of degree of fragmentation in the model) to the present data resulted in discriminatory levels close to those attained with the new model (data not shown). Symmetry of cleavage is the variable in this study with the least previous scientific support, and it was the only variable which showed no significant association with LBR also in univariate testing, confirming our previous results (Holte et al., 2007). Together the findings suggest this variable may be omitted when scoring embryos in the early cleavage stage. The best predictive capacity (c statistic) of live birth in our cohort was 0.579, with the variables BL (linBL and abs(BL-4)), NU and FR. It should be stressed that this relatively low discrimination level resulted from only measures of embryo quality. Nevertheless, the c statistic we reached for the rIMC score only corrected for age is comparable to the discrimination levels attained in most full IVF prediction models (Leushuis et al., 2009; van Loendersloot et al., 2013). Other variables, such as measures of ovarian reserve or ovarian response to FSH/hMG (Holte et al., 2011b; Brodin et al., 2013; Huber et al., 2013), and a variable covering outcomes of previous IVF attempts would probably show independent effects above those in the present model and consequently increase the total discriminatory capacity (Leushuis et al., 2009; Holte et al., 2011a). However, the aim of this study was not to construct a full prediction model for counselling of patients. The interpretation of the present results should primarily focus on the rank order scheme for optimal selection within a cohort of embryos. Furthermore, there is a growing recognition that the c statistic, which plays a central role in evaluating diagnostic tests, has limitations in the evaluation of prediction models (Cook, 2007). Modest discriminative qualities may still result in clinically valuable calibrations at the group level (Cook, 2007; Leushuis et al., 2009; Te Velde et al., 2014). The present model did indeed show a strikingly good calibration on the group level over the whole range of predicted probabilities of pregnancy and live birth, also when subjected to a strictly conservative statistical test. Thus, the rIMC alone could correctly differentiate treatment prognosis in groups of patients in 10% stratas from low to high chance in this large population. Such good calibration has not been shown for other prediction models, even when other treatment variables were included and the ambition was to create a full prediction model (Te Velde et al., 2014). This finding underlines the great importance of an accurate embryo scoring protocol for ranking and selecting embryos to transfer, and suggests that such precise and standardized scoring should be the cornerstone of full prediction models. The results must, however, be interpreted with caution. One limitation of the study involves the risk for overestimation when a prediction model is created and validated in the same data set. However, when tested with a specific statistical method, which controls for such bias (Harrell et al., 1996; Moons et al., 2004), we observed no decrease in statistical power. Nevertheless, the model should be validated in other independent populations (Te Velde et al., 2014). It could also be seen as a limitation of this study that we studied only Day 2 transfers. However, short time of culture may be associated with lower risks for the offspring compared with prolonged culture, as recently suggested by three large national follow-up studies (Kallen et al., 2010; Kalra et al., 2012; Dar et al., 2013). Indeed, a major argument

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

full fertility potential, compared with restricting the data analysis to the first treatment only (Broekmans et al., 2006; Holte et al., 2011a; Brodin et al., 2013). We applied the GEE model which statistically corrects for the correlation between repeated treatments within couples (Zeger et al., 1988). Refraining from correcting for such bias may alter c statistic levels, in some cases markedly (Racowsky et al., 2009). Cleavage rate was the most powerful predictor for live birth. The ideal cleavage rate on Day 2 was four blastomeres. This finding corroborates several other studies (Holte et al., 2007; Machtinger and Racowsky, 2013). The slight reduction in implantation and LBRs associated with a more rapid cleavage rate (not as marked as for slow cleavage) confirms our own (Holte et al., 2007) and other previous findings (Cummins et al., 1986; Claman et al., 1987; Staessen et al., 1992; Giorgetti et al., 1995; Ziebe et al., 1997; Van Royen et al., 1999; Magli et al., 2001). This complex, non-linear relationship between treatment outcome and blastomere number is important to capture mathematically to give a high accuracy of the algorithm. The finding was identical to our previous finding constructing the IMC, and therefore the blastomere relationship with treatment outcome was again best described by a composite variable (linBL and abs(BL-4)) (Holte et al., 2007). The proportion of mononucleated blastomeres had almost as strong predicting power for live birth as cleavage rate. The linear prognostic impact of the embryo nuclear status, with multinuclearity being the least favourable predictor, was first shown in our previous paper (Holte et al., 2007). Multinuclearity is an established risk marker for embryo abnormality and decreased implantation potential (Tesarik et al., 1987; Van Royen et al., 1999, 2001, 2003). A recent study (Fauque et al., 2013), with 1629 treatments, demonstrated multinucleation as an independent negative variable after multivariate regression analysis, but could not confirm the proportion of mononucleated blastomeres as an independent variable, although the variable was significant in the univariate analysis. This study shows that degree of fragmentation qualifies as an independent significant predictor of live birth in the multivariate model, as a marginally less strong marker than cleavage rate and on the same power level as nuclear status. As is illustrated in Fig. 1, the variable shows a similar linear relationship with LBR, as does nuclear status, with subtle but visible effects also at low degree fragmentation. This effect is also illustrated in the ranking tree for the rIMC score with slight fragmentation resulting in a somewhat lower rIMC score (9.1 versus 10 at ideal BL 4 and NU 3; Supplementary Fig. S1), corresponding to a reduction in observed and predicted LBRs of 3 –4%. The variation in blastomere size has been observed to have prognostic importance in earlier studies (Giorgetti et al., 1995; Ziebe et al., 1997; Antczak and Van Blerkom, 1999; Hardarson et al., 2001; Rienzi et al., 2005; Holte et al., 2007). The present study could not demonstrate blastomere size variation as an independent marker, as opposed to the results in our previous study (Holte et al., 2007), although it was highly significant in univariate analysis. It should however be stressed that the difference with our previous results was subtle, resulting in a switch between including degree of fragmentation instead of blastomere size variation. It is difficult to draw solid biological conclusions from these differences, but they may be related to differences in the two study protocols (SET versus DET, primary end-points LBRs versus implantation), and differences in size of study populations. Fragmentation showed a weaker association with blastomere number than did the variation in blastomere size (Table V), a factor of possible statistical importance when qualifying as an independent variable together with the strongest

Rhenman et al.

35

Construction of an embryo score to predict live birth

Supplementary data Supplementary data are available at http://humrep.oxfordjournals.org/.

Authors’ roles A.R.: participated in the study design, the interpretation of the data and wrote the main parts of the manuscript in cooperation with the other authors. I have had full access to all the data in the study and had final responsibility for the decision to submit for publication. L.B.: supervised the statistical evaluation and advised in interpretation of the data. He has also been involved in the design of the study. T.B.: participated in collecting the data, the design of the study and has taken part in supervising the development of the manuscript. M.O.: interpretation of data; supervised the development of the article and advised in the writing of the manuscript. K.M.: participated in the study design, performed active scoring of the embryos and has taken part in revision of the article and final approval. N.H.: performed statistical analysis, participated in study design and interpretation of data. J.H.: conception of the study and the study design, interpretation of the data and revised the manuscript for final approval.

Funding Carl von Linne´ Clinic, Uppsala and the Department of Women’s and Children’s Health and the Family Planning Fund in Uppsala, Uppsala University, Uppsala, Sweden financed this study.

Conflict of interest None declared.

References Ahlstrom A, Westin C, Reismer E, Wikland M, Hardarson T. Trophectoderm morphology: an important parameter for predicting live birth after single blastocyst transfer. Hum Reprod 2011;26:3289– 3296. Alpha Scientists in Reproductive Medicine and ESHRE Special Interest Group of Embryology. The Istanbul consensus workshop on embryo assessment: proceedings of an expert meeting. Hum Reprod 2011;26: 1270 – 1283. Antczak M, Van Blerkom J. Temporal and spatial aspects of fragmentation in early human embryos: possible effects on developmental competence and association with the differential elimination of regulatory proteins from polarized domains. Hum Reprod 1999;14:429 – 447. Aparicio B, Cruz M, Meseguer M. Is morphokinetic analysis the answer? Reprod Biomed Online 2013;27:654 – 663. Balaban B, Yakin K, Urman B. Randomized comparison of two different blastocyst grading systems. Fertil Steril 2006;85:559 – 563. Brodin T, Hadziosmanovic N, Berglund L, Olovsson M, Holte J. Antimullerian hormone levels are strongly associated with live-birth rates after assisted reproduction. J Clin Endocrinol Metab 2013;98:1107 – 1114. Broekmans FJ, Kwee J, Hendriks DJ, Mol BW, Lambalk CB. A systematic review of tests predicting ovarian reserve and IVF outcome. Hum Reprod Update 2006;12:685 – 718. Claman P, Armant DR, Seibel MM, Wang TA, Oskowitz SP, Taymor ML. The impact of embryo quality and quantity on implantation and the establishment of viable pregnancies. J In Vitro Fert Embryo Transf 1987; 4:218 – 222. Conaghan J, Chen AA, Willman SP, Ivani K, Chenette PE, Boostanfar R, Baker VL, Adamson GD, Abusief ME, Gvakharia M et al. Improving embryo selection using a computer-automated time-lapse image analysis test plus day 3 morphology: results from a prospective multicenter trial. Fertil Steril 2013;100:412 – 419 e415. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928 – 935. Cummins JM, Breen TM, Harrison KL, Shaw JM, Wilson LM, Hennessey JF. A formula for scoring human embryo growth rates in in vitro fertilization: its value in predicting pregnancy and in comparison with visual estimates of embryo quality. J In Vitro Fert Embryo Transf 1986;3:284– 295. Dar S, Librach CL, Gunby J, Bissonnette F, Cowan L, IVF Directors Group of Canadian Fertility and Andrology Society. Increased risk of preterm birth in singleton pregnancies after blastocyst versus Day 3 embryo transfer: Canadian ART Register (CARTR) analysis. Hum Reprod 2013; 28:924– 928. Fauque P, Audureau E, Leandri R, Delaroche L, Assouline S, Epelboin S, Jouannet P, Patrat C. Is the nuclear status of an embryo an independent factor to predict its ability to develop to term? Fertil Steril 2013; 99:1299– 1304 e1293. Gardner DK, Lane M, Stevens J, Schlenker T, Schoolcraft WB. Blastocyst score affects implantation and pregnancy outcome: towards a single blastocyst transfer. Fertil Steril 2000;73:1155 – 1158. Gardner DK, Surrey E, Minjarez D, Leitz A, Stevens J, Schoolcraft WB. Single blastocyst transfer: a prospective randomized trial. Fertil Steril 2004; 81:551– 555. Giorgetti C, Terriou P, Auquier P, Hans E, Spach JL, Salzmann J, Roulier R. Embryo score to predict implantation after in-vitro fertilization: based on 957 single embryo transfers. Hum Reprod 1995;10:2427 – 2431. Guerif F, Le Gouge A, Giraudeau B, Poindron J, Bidault R, Gasnier O, Royere D. Limited value of morphological assessment at days 1 and 2 to predict blastocyst development potential: a prospective study based on 4042 embryos. Hum Reprod 2007;22:1973– 1981. Hardarson T, Hanson C, Sjogren A, Lundin K. Human embryos with unevenly sized blastomeres have lower pregnancy and implantation rates:

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

for blastocyst culture is the possibility to perform more accurate embryo selection at that stage. Given the excellent calibration for the present model, it should be possible to switch to early cleavage transfers with the use of the rIMC. Other limitations of the present study are those inherent for real-time visual scoring including risks of inter-observer variation and the hazards of fixed time-point scoring procedures in a dynamic process. Time-lapse recordings have so far resulted in important results for deselection of aberrant cleaving embryos (Herrero et al., 2013). Some results indicate that time-lapse recording may also prove to be powerful for positive selection between embryos of otherwise similar morphology and cleavage stage (Aparicio et al., 2013; Conaghan et al., 2013; Herrero et al., 2013; Montag et al., 2013). It would be of great interest to combine the present scoring model with the time-lapse technique to further optimize embryo selection. In conclusion, the findings of this large prospective study on visual embryo scoring variables in SET show that blastomere number (in a non-linear way), the proportion of mononucleated (including information on multinucleation) blastomeres and the degree of fragmentation (both linearly) have independent prognostic power to predict live birth. Together, these variables are incorporated in an algorithm, which is presented as the rIMC score. It is suggested that applying the ranking tree, as presented here, gives an easy and structured clinical guide to the ranking and selection of embryo/s most suitable for transfer in the early cleavage stage.

36

Racowsky C, Ohno-Machado L, Kim J, Biggers JD. Is there an advantage in scoring early embryos on more than one day? Hum Reprod 2009; 24:2104 – 2113. Rehman KS, Bukulmez O, Langley M, Carr BR, Nackley AC, Doody KM, Doody KJ. Late stages of embryo progression are a much better predictor of clinical pregnancy than early cleavage in intracytoplasmic sperm injection and in vitro fertilization cycles with blastocyst-stage transfer. Fertil Steril 2007;87:1041 – 1052. Rienzi L, Ubaldi F, Iacobelli M, Romano S, Minasi MG, Ferrero S, Sapienza F, Baroni E, Greco E. Significance of morphological attributes of the early embryo. Reprod Biomed Online 2005;10:669– 681. Scott L. The biological basis of non-invasive strategies for selection of human oocytes and embryos. Hum Reprod Update 2003;9:237 – 249. Sjoblom P, Menezes J, Cummins L, Mathiyalagan B, Costello MF. Prediction of embryo developmental potential and pregnancy based on early stage morphological characteristics. Fertil Steril 2006;86:848 – 861. Staessen C, Camus M, Bollen N, Devroey P, Van Steirteghem AC. The relationship between embryo quality and the occurrence of multiple pregnancies. Fertil Steril 1992;57:626 – 630. Tesarik J, Kopecny V, Plachot M, Mandelbaum J. Ultrastructural and autoradiographic observations on multinucleated blastomeres of human cleaving embryos obtained by in-vitro fertilization. Hum Reprod 1987; 2:127 – 136. Te Velde ER, Nieboer D, Lintsen AM, Braat DD, Eijkemans MJ, Habbema JD, Vergouwe Y. Comparison of two models predicting IVF success; the effect of time trends on model performance. Hum Reprod 2014;29:57 – 64. van Loendersloot L, van Wely M, Goddijn M, Repping S, Bossuyt P, van der Veen F. Pregnancy and twinning rates using a tailored embryo transfer policy. Reprod Biomed Online 2013;26:462– 469. Van Royen E, Mangelschots K, De Neubourg D, Valkenburg M, Van de Meerssche M, Ryckaert G, Eestermans W, Gerris J. Characterization of a top quality embryo, a step towards single-embryo transfer. Hum Reprod 1999;14:2345 – 2349. Van Royen E, Mangelschots K, De Neubourg D, Laureys I, Ryckaert G, Gerris J. Calculating the implantation potential of day 3 embryos in women younger than 38 years of age: a new model. Hum Reprod 2001; 16:326 – 332. Van Royen E, Mangelschots K, Vercruyssen M, De Neubourg D, Valkenburg M, Ryckaert G, Gerris J. Multinucleation in cleavage stage embryos. Hum Reprod 2003;18:1062 – 1069. Vernon M, Stern JE, Ball GD, Wininger D, Mayer J, Racowsky C. Utility of the national embryo morphology data collection by the Society for Assisted Reproductive Technologies (SART): correlation between day-3 morphology grade and live-birth outcome. Fertil Steril 2011;95:2761–2763. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics 1988;44:1049– 1060. Ziebe S, Petersen K, Lindenberg S, Andersen AG, Gabrielsen A, Andersen AN. Embryo morphology or cleavage stage: how to select the best embryos for transfer after in-vitro fertilization. Hum Reprod 1997; 12:1545 – 1549.

Downloaded from http://humrep.oxfordjournals.org/ at Selcuk University on December 29, 2014

indications for aneuploidy and multinucleation. Hum Reprod 2001; 16:313 – 318. Harrell FE, Jr., Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361 –387. Harton GL, Munne S, Surrey M, Grifo J, Kaplan B, McCulloh DH, Griffin DK, Wells D, PGD Practitioners Group. Diminished effect of maternal age on implantation after preimplantation genetic diagnosis with array comparative genomic hybridization. Fertil Steril 2013;100:1695– 1703. Herrero J, Tejera A, Albert C, Vidal C, de Los Santos MJ, Meseguer M. A time to look back: analysis of morphokinetic characteristics of human embryo development. Fertil Steril 2013;100:1602– 1609 e1604. Holte J, Berglund L, Milton K, Garello C, Gennarelli G, Revelli A, Bergh T. Construction of an evidence-based integrated morphology cleavage embryo score for implantation potential of embryos scored and transferred on day 2 after oocyte retrieval. Hum Reprod 2007;22:548 – 557. Holte J, Berglund L, Hadziosmanovic N, Tilly J, Pettersson H, Bergh T. The construction and validation of a prediction model to minimize twin rates at preserved live birth rates in ART. Paper presented at 27th Annual Meeting of the European Society of Human Reproduction & Embryology (ESHRE), July 2011a; Stockholm, Sweden. Holte J, Brodin T, Berglund L, Hadziosmanovic N, Olovsson M, Bergh T. Antral follicle counts are strongly associated with live-birth rates after assisted reproduction, with superior treatment outcome in women with polycystic ovaries. Fertil Steril 2011b;96:594 – 599. Hosmer DW, Lemeshow S. Applied Logistic Regression, 2nd edn. New York; Chichester: Wiley, 2000. Huber M, Hadziosmanovic N, Berglund L, Holte J. Using the ovarian sensitivity index to define poor, normal, and high response after controlled ovarian hyperstimulation in the long gonadotropin-releasing hormone-agonist protocol: suggestions for a new principle to solve an old problem. Fertil Steril 2013;100:1270– 1276. Kallen B, Finnstrom O, Lindam A, Nilsson E, Nygren KG, Olausson PO. Blastocyst versus cleavage stage transfer in in vitro fertilization: differences in neonatal outcome? Fertil Steril 2010;94:1680 –1683. Kalra SK, Ratcliffe SJ, Barnhart KT, Coutifaris C. Extended embryo culture and an increased risk of preterm delivery. Obstet Gynecol 2012;120:69 – 75. Leushuis E, van der Steeg JW, Steures P, Bossuyt PM, Eijkemans MJ, van der Veen F, Mol BW, Hompes PG. Prediction models in reproductive medicine: a critical appraisal. Hum Reprod Update 2009;15:537– 552. Machtinger R, Racowsky C. Morphological systems of human embryo assessment and clinical evidence. Reprod Biomed Online 2013;26:210– 221. Magli MC, Gianaroli L, Ferraretti AP. Chromosomal abnormalities in embryos. Mol Cell Endocrinol 2001;183(Suppl 1):S29– S34. Montag M, Toth B, Strowitzki T. New approaches to embryo selection. Reprod Biomed Online 2013;27:539 –546. Moons KG, Donders AR, Steyerberg EW, Harrell FE. Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clin Epidemiol 2004;57:1262 – 1270.

Rhenman et al.

Which set of embryo variables is most predictive for live birth? A prospective study in 6252 single embryo transfers to construct an embryo score for the ranking and selection of embryos.

Which embryo score variables are most powerful for predicting live birth after single embryo transfer (SET) at the early cleavage stage?...
407KB Sizes 0 Downloads 5 Views