Estimation of the Malthusian parameter in an stochastic epidemic model using martingale methods.

Mathematical Biosciences 246 (2013) 272–279

Contents lists available at ScienceDirect

Mathematical Biosciences journal homepage: www.elsevier.com/locate/mbs

Estimation of the Malthusian parameter in an stochastic epidemic model using martingale methods David Lindenstrand, Åke Svensson ⇑ Department of Mathematics, Stockholm university, SE-10691 Stockholm, Sweden

a r t i c l e

i n f o

Article history: Received 26 March 2013 Received in revised form 18 September 2013 Accepted 14 October 2013 Available online 24 October 2013 Keywords: Malthusian parameter Stochastic epidemic model Growth rate Generation time distribution

a b s t r a c t Data, on the number of infected, gathered from a large epidemic outbreak can be used to estimate parameters related to the strength and speed of the spread. The Malthusian parameter, which determines the initial growth rate of the epidemic is often of crucial interest. Using a simple epidemic SEIR model with known generation time distribution, we define and analyze an estimate, based on martingale methods. We derive asymptotic properties of the estimate and compare them to the results from simulations of the epidemic. The estimate uses all the information contained in the epidemic curve, in contrast to estimates which only use data from the start of the outbreak. Ó 2013 Elsevier Inc. All rights reserved.

1. Introduction When considering possible effects of an epidemic outbreak of an infectious disease several characteristics are of interest. Traditionally the basic reproduction number, R0 , is considered important. It is defined as the mean number of persons an infected infects in a totally susceptible population. R0 is closely related to the proportion of a population that finally will be infected during an epidemic outbreak, and can be seen as a primary measure of the infectivity of the infectious agent considered. Recently the speed at which the epidemic grows has attracted considerable interest. Together with R0 the speed decides the possibilities to control spread of an infection. Experience indicates that the number of infected persons initially grows at a constant exponential rate, i.e. proportional to expðrtÞ, where t is the time the spread has been going on in the population. The increase will slow down as the number of infected and immune in the population grows. Theoretically the exponential behavior, in the start, is a consequence of that an epidemic, in the initial phase, is well approximated by a branching process. Branching processes have been studied for long and much is known of their behavior cf e.g. [1]. The parameter r is often referred to as the Malthusian parameter. A relation between R0 and r, which depends on the generation time distribution, is given by the Euler– Lotka equation (cf Section 3). A discussion of generation time distributions, the basic reproduction number and the Malthusian parameter in epidemic models can be found in [2]. ⇑ Corresponding author. Tel.: +46 8164569. E-mail address: [email protected] (Å. Svensson). 0025-5564/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.mbs.2013.10.002

The epidemic curve describes how the number of infected persons grows with time. The purpose of this paper is to suggest methods to derive estimates of both the basic reproduction number and the Malthusian parameter based on possibly partial observations of the epidemic curve. As regard estimates of r simple methods have been suggested that are restricted to using data from periods where the branching process approximation is applicable (see Section 2). In order to use the information contained in longer sections of the epidemic curve we will use assumptions on the generation time distribution. The analysis is based on a class of simple models defined in Section 3. We will consider spread in a closed population with n persons in which infections are transmitted according to homogeneous mixing, i.e., when a transmission occurs the new infected is a susceptible member of the population chosen uniformly at random. The epidemic is started by one infected individual and a person may only be infected once. We will preferably use martingale estimators. Becker, [3,4], gives a thorough introduction to martingale methods for estimation of infection rates. Becker also derives estimates of R0 , and their standard deviations, for the standard SIR epidemic model. In particular he studies models where the infectious period is assumed to be exponentially distributed. This assumption makes it possible to derive estimates that depend on the number of removed persons. He also considers models with other types of mixing. In Section 4 we assume that each infected person is infectious during a fixed time interval. We derive the generation time distribution and find a relation between the basic reproduction number and the Malthusian parameter using the Euler–Lotka equation. With this assumption it is possible, knowing the epidemic curve,

273

D. Lindenstrand, Å. Svensson / Mathematical Biosciences 246 (2013) 272–279

to calculate, at each time, the number of infectious persons. Using this we can define estimators of R0 and r and derive their asymptotic properties. Using theoretical results and simulations we illustrate, what is gained by using this martingale estimator. In this special case we can also derive, as an alternative, Maximum Likelihood estimators. The simple model with a fixed infectious period is used to illustrate the methods. In Section 5 more complicated generation time distributions are considered and it is illustrated how the estimation method can be generalized.

2. Estimates of the growth rate based on the initial phase of the epidemic As pointed out in the introduction the number of infections grows initially proportional to expðrtÞ. Eventually the progress slows down, when a substantial proportion of the population is infected and therefore immune. An obvious, and often used idea, (see e.g. [5]), is to use observations from the initial part of the epidemic to estimate r. Denote the (counting) process of infected individuals by fN n ðtÞ; t P 0g, with N n ð0Þ ¼ 0 where the subindex n refers to the size of initially susceptible population. If we observe N n at times t1 and t 2 we have

~r simple ¼

logfNn ðt 2 Þ=Nn ðt 1 Þg t2 t1

ð2:1Þ

This estimate will degenerate if the branching process approximation is not appropriate in at least one of the observation times. If this is the case may e.g. be judged by a simple inspection of the epidemic curve. Another possibility is to use a statistical test to see if the progress is slowing down. Such a method is suggested in [6]. Methods of this kind have the advantage that they do not use any assumptions of anything else than the initial part of the epidemic curve. The alternative methods that are discussed in the remainder of this paper use data also from times when the growth of the epidemic has slowed down but require the use of assumptions on the generation time distribution.

gðtÞ ¼

PðX 6 t < X þ YÞ PðX þ Y > tÞ PðX > tÞ : ¼ EðYÞ EðYÞ

ð3:2Þ

(cf [2]). Let

hðrÞ ¼

Z

1

ert PðX < t 6 X þ YÞdt:

ð3:3Þ

0

The Euler–Lotka equation which has the Malthusian parameter, r, as a solution can be written as

1 ¼ R0

Z

1

ert gðtÞdt ¼ ahðrÞ:

ð3:4Þ

0

From Eq. (3.1) we observe that there is a one-to-one relation between a and the final size s. Since (3.4) implies that there is a one-to-one relation between a and r there is also a relation between r and the final size:

logð1 sÞ ¼ sEðYÞ=hðrÞ:

ð3:5Þ

4. Simple model To illustrate how estimates of a and r can be derived we will study an extremely simple model that allows us to highlight main points of the analysis. In Section 5 we will discuss how the methods can be extended to more complicated models. For the moment we will assume that there is no latent time and that the infectious time Y is nonrandom and covers one time unit. Let N n be as defined in Section 2. At time t an infectious individual spreads the disease at the rate an ðn N n ðtÞÞ ¼ að1 N n ðtÞ=nÞ where ð1 N n ðtÞ=nÞ is the proportion of susceptible individuals in the population. The number of infectious individuals at time t is IðtÞ ¼ ðN n ðtÞ N n ððt 1Þ _ 0Þ þ 1ð0;1 , where 1ð0;1 is the contribution of the initial infected. Thus the total infectious pressure in the population at time t is akn ðtÞ where

kn ðtÞ ¼ ð1 Nn ðtÞ=nÞIðtÞ; This model is a S (Susceptibles)-I (Infectious)-R (Removed) model with

3. Basic model and the Euler–Lotka equation We introduce one infectious individual in a closed, homogeneously mixing, population of n susceptible individuals. All individuals make contact with other particular individuals according to independent Poisson processes with rate an. Let Y be the length of the time interval during which an infected person is infectious. The infectious time may be proceeded by a latent time, with length X, during which the infection is not transmitted. In general both these times may be individually random. Each infected individual has a random number of potentially contagious contacts. We denote this number by n. Now, given the length of the infectious time, n is Poisson distributed with parameter aY. The basic reproduction number is given by R0 ¼ E½n ¼ aEðYÞ. A large epidemic outbreak is possible exactly when R0 > 1. Let sn denote the final proportion infected individuals after a large epidemic outbreak, i.e. sn ¼ N n ð1Þ=n. It is well known that sn converges in probability to s as n ! 1, where s is the solution to the equation (see e.g. [7])

1 s ¼ esR0 ;

The generation time distribution is defined by the density

ð3:1Þ

SðtÞ ¼ n Nn ðtÞ IðtÞ ¼ Nn ðtÞ Nn ððt 1Þ _ 0Þ þ 1ð0;1 RðtÞ ¼ Nn ððt 1Þ _ 0Þ þ 1ð1;1 : Observe that this epidemic process is completely specified by the process N n . It is possible to derive estimates of a and the Malthusian parameter observing only the process N n . This is the reason why we choose to investigate this model first. In this simple model R0 ¼ a and

hðrÞ ¼

Z

1

ert dt ¼ 0

1 er r

ð4:1Þ

According to Eq. (3.4),

R0 ¼ a ¼

r : 1 er

ð4:2Þ

We will first estimate a and then use the relation (4.2) to obtain an estimate of r. 4.1. Inference

or equivalently

logð1 sÞ ¼ Ro s

We will start by defining and analyzing the properties of an estimate of the contact intensity a. The discussion uses the

274


martingale property of the process defined as N n compensated by the integrated jump intensity. We will exploit that if the infectious time Y 1 then the number of individuals that are infectious at a certain time is a direct function of the process N n . To obtain reasonable precise estimates there has to be sufficient number of infections. We will therefore restrict the analysis to situations when a positive proportion of the population have been infected. We will say that an outbreak is large if it affects more that a positive proportion of the population, i.e. N n ð1Þ > n for some small > 0. Let F n;d ¼ fN n ð1Þ P nd; d 2 ð0; sÞg, where s is the final size in the limiting process, and let p be the outbreak probability in the limiting process. Furthermore, let En ¼{large outbreak in popup lation of size n}. Since N n ð1Þ=n ! s on the set En as n ! 1, it follows that limn!1 PððF n;d \ En ÞÞ ¼ limn!1 PðEn Þ ¼ p. In the following we restrict ourselves to the reduced sample spaces fF n;d g. This is no restriction in practise, since one would not hope to find a good estimate of any epidemic parameter if there were no large epidemic outbreak. Define the processes 1

ð4:3Þ

g n ðsÞðdNn ðsÞ akn ðsÞdsÞ;

ð4:4Þ

g n ðtÞ ¼ ð1 Nn ðtÞ=nÞ ; M n ðtÞ ¼

Z

t

large population. Here, and in the following, ) denotes converp gence in distribution and ! convergence in probability. Theorem 1. Let M n ðtÞ and T n;d be defined as above and 0 < d < s. Let

r2 ¼ a2

d

;

ð4:8Þ

pffiffiffi ~ n ðT n;d Þ aÞ ) Nð0; r2 Þ; nða

ð4:9Þ

2

ð1 dÞðlogð1 dÞÞ

then

on the sequence of sample spaces fF n;d g as n ! 1. If d ¼ s then the asymptotic variance equals 1=ðsð1 sÞÞ. Theorem 2. On the sequence of sample spaces fF n;d g, p

a~ ðT n;d Þ ! a as n ! 1. Proofs of Theorems 1 and 2 are given in appendix A. We obtain an estimate of r as the solution to the Euler–Lotka equation:

0

½Mn ðtÞ ¼

1

Z 0

a~ ðT n;d Þ ¼ ~ : hðr n ðT n;d ÞÞ

t

g 2n ðsÞdNðsÞ:

ð4:5Þ The Taylor expansion

The process fg n ðtÞg is bounded, left-continuous, and only depends on the N n -process before time t. Thus it is predictable and M n ðtÞ is a zero mean martingale with optional quadratic variation process ½Mn (cf [8]). ~ ðtÞ, based on observations of the epiA martingale estimator a demic curve up till time t, is

Rt

a~ ðtÞ ¼ R 0t

g n ðsÞdNn ðsÞ

ð4:6Þ

g ðsÞkn ðsÞds 0 n Rt

¼ Rt 0

0

ð1 Nn ðsÞ=nÞ1 dNn ðsÞ

ðNn ðsÞ Nn ððs 1Þ _ 0Þ þ 1ð0;1 Þds

ð4:7Þ

;

~ ðtÞ. which is motivated by the fact that Mn ðtÞ ¼ 0 for a ¼ a Rt Observe that the denominator in the estimate equals 0 IðsÞds, where IðsÞ is the number of infectious persons in the population at time s. The results obtained in the following of this section will hold if this integral is observed. However, in general it is not possible to find an exact expression for the number of infectious persons only from the epidemic curve. This problem is further discussed in Section 5, where possibilities to obtain estimates of IðsÞ are considered. Let T n ¼ minft > 0; N n ðtÞ N n ðt 1Þ þ Ið0;1 ¼ 0g, i.e. the first time there is no infectious person in the population. The epidemic spread stops at time T n . Furthermore, let T 0n;d ¼ minft > 0; N n ðtÞ= n > d; 0 < d < sg, i.e. the first time the proportion of infected individuals in the population reaches d. If this never happens, T 0n;d ¼ 1. Since T n and T 0n;d are stopping times it follows that T n;d ¼ minfT n ; T 0n;d g is a stopping time. We confine the analysis of a~ ðtÞ to times t ¼ T n;d , such that d 2 ð0; sÞ. For this stopping time nd 1 1X 1 ½M n ðT n;d Þ ¼ n n n j¼0 ð1 j=nÞ2

Z 0

nd

ðn xÞ2 dx ¼

d 1d

~ is asymptotically is a constant on fF n;d g. Theorem (1) states that a normally distributed with mean a if there is a large outbreak in a

0

1 1 ð~rn ðT n;d Þ rÞh ðrÞ 2 hð~r n ðT n;d ÞÞ hðrÞ h ðrÞ together with (3.4) yields

~rn ðT n;d Þ r

~ ðT n;d Þ aÞ ða

a2

1 : 0 h ðrÞ

ð4:10Þ

pffiffiffi Using Eq. (4.10), we find that nð~rðT n;d Þ rÞ is asymptotically normally distributed with mean 0 and variance

1 ajh0 ðrÞj

2

d 2

ð1 dÞðlogð1 dÞÞ

:

ð4:11Þ

4.2. Simulations 4.2.1. Martingale estimate To illustrate the theoretical results we use 100 simulations of the epidemic, each of which results in a large outbreak. Martingale estimates are calculated for different proportion infected in the population. For each proportion we have 100 data points to calculate estimates. We compare the means of these estimates with the theoretical values and the empirical standard deviations with standard deviations derived in Section 4.1. The asymptotic results are valid when the population size tends to infinity, and are therefore only approximations for finite population sizes. However, using a large population size, n ¼ 1000, when simulating the epidemic, we expect the estimates to be close to the theoretical values. In the simulations R0 ¼ a ¼ 1:5 which corresponds to r ¼ 0:874. The theoretical final size, s, is 0.58. The means estimates ~r for different proportion of infected are shown in Fig. 1. In Fig. 2 the empirical and theoretical standard deviations and illustrated. The estimated values are close to the theoretical values. Importantly, the standard deviation decreases the more of the epidemic curve we use to calculate ~r . The deviations of the means from the theoretical value are well within the limit of two standard deviations. It should be expected that the

275


NðtÞ

a~ ML ðtÞ ¼ R t

ð1 NðsÞ=nÞðNðsÞ Nððs 1Þ _ 0Þ þ 1ð0;1 Þds 0

Theoretical

1.1

ð4:12Þ

Estimate

This estimate is the same as a martingale estimate suggested by Becker [3] in a more general situation. We will compare the martingale and the ML-estimates with the simple estimate

Malthusian parameter

1

0.9

1.2 1.1

Theoretical

0.8

Martingale Maximum likelihood

1

0.6

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Proportion infected

Malthusian parameter

0.7

0.5 0.05

:

Simple

0.9 0.8 0.7 0.6 0.5

Fig. 1. Estimate of the Malthusian parameter (dashed line) together with the theoretical value, r ¼ 0:874 (solid line). The estimates are calculated from 100 simulations that results in a large outbreak.

0.4 0.3

0.35

0.2 0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

proportion infected Teoretical

0.3

Fig. 3. Martingale- (dashed line), Maximum likelihood- (stars) and simple estimates (dash dotted line) of the Malthusian parameter together with the theoretical approximation (solid line). The estimates are calculated from 100 simulations that results in a large outbreak.

Standard deviation

Estimated

0.25 0.6 0.55

0.2

Martingale

0.5

Maximum likelihood Simple

0.15

0.1 0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

proportion infected Fig. 2. Estimate of the standard deviations of the Malthusian parameter (dashed line) together with the theoretical values (solid line). The estimates are calculated from 100 simulations that results in a large outbreak.

Standard deviation

0.45 0.4 0.35 0.3 0.25 0.2 0.15

estimates for different proportions of infected are positively correlated. 4.2.2. Maximum likelihood and simple estimate For comparison we also investigate the ML-estimate and a simple estimate of the type described in Section 2. The ML estimate of ~ ML is derived using Eq. (9.2) in [7], which the contact intensity a yields

0.1 0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

proportion infected Fig. 4. Estimates of the standard deviations of Martingale- (dashed line), Maximum likelihood- (stars) and simple estimates (dash dotted line) of the Malthusian parameter together with the theoretical approximation (solid line). The estimates are calculated from 100 simulations that results in a large outbreak.

276


a~ simple ðtÞ ¼ logðNðtÞ=Nððt 1Þ _ 0ÞÞ:

ð4:13Þ

The three estimates are illustrated in Fig. 3. The ML estimator and the martingale estimator are close. The simple estimator is, as expected, heavily biased when the proportion of infected is high. As is shown in Fig. 4 the standard deviations of the ML- and martingale estimates are very close. If we have data of the whole epidemic outbreak, the comparison should be between the simple estimate calculated in the beginning of the epidemic, where it has a small bias, and the martingale estimate calculated using all data, where the former estimate has three times as large standard deviation in this particular example. 5. Random infectious and latent times We will now assume that the latent and infectious times are random for each infected individual and independent between individuals. For this more general model we can estimate the contact parameter a with

Rt

a~ ðtÞ ¼

0

ð1 Nn ðsÞ=nÞ1 dNn ðsÞ ; Rt IðsÞds 0

ð5:1Þ

where IðsÞ is the number of infectious individuals at time s. To calculate the denominator knowledge about the individual infectious periods is required. To estimate the Malthusian parameter, r, we can use the Eq. (3.4) which gives a one-to-one relation between a and r for the special model. In the simple model, considered in Section 4, an exact expresRt sion of 0 IðsÞds can be derived directly from the epidemic curve. With a more general generation time distribution we either have to have an observation of this integral or an estimate. If we have access to a precise observation the asymptotic properties of the estimates of a and r are the same as given in Theorems 1 and 2. Let TðtÞ be the, possibly random, time that an infected person has been infectious up till time t after the infection. If X is the duration of the latent time and Y the duration of the infectious period then

TðtÞ ¼ Y ^ ððt XÞ _ 0Þ:

ð5:2Þ

Asymptotically the estimates will be normal distributed with mean a. The asymptotic variance will, however, be larger due to the fact that ^JðT n;d Þ is a, possibly unprecise, estimate. An estimate ^ ðT n;d Þ is, cf appendix B, of the asymptotic variance of a 2

ðlogð1 dÞÞ

EðTðtÞÞ ¼ EððX þ YÞ ^ tÞ EðX ^ tÞ: A natural estimator of

Rt 0

IðsÞds is

t

Gðt sÞdNðsÞ þ EðYÞGðtÞ:

0

In appendix B, a sketch of a proof is given that

^JðT n;d Þ p !1; R T n;d IðsÞds 0 as n ! 1. We can thus use ^JðtÞ instead of gives the estimate

R T n;d

a^ ðT n;d ÞÞ ¼

0

ð1 Nn ðsÞ=nÞ1 dNn ðsÞ : ^JðT n;d ÞÞ

Rt 0

IðsÞds in Eq. (5.1). This

ð5:3Þ

T n;d

VðT n;d sÞdNðsÞ ;

ð5:4Þ

0

As an example we will use a model there both the latent and infectious times are assumed to be exponentially distributed. This is in seldom a realistic model but it is used here to illustrate how the estimation method works in a rather extreme situation. In contrast to the simple model, the exponential distributions gives a large variation in the individual infectivity. This will, of course, Rt make it difficult to find good estimates of 0 IðsÞds using only observations on the number of infected. If X and Y are independent and exponentially distributed with EðXÞ ¼ 1=l and EðYÞ ¼ 1=k, simple calculation yields that

hðrÞ ¼

l

; ðl þ rÞðk þ rÞ 1 k l ; EðTðtÞÞ ¼ 1 elt þ ekt k kl kl and

VarðTðtÞÞ ¼ VðtÞ " # 2 1 lk ðkelt lekt Þ lt kt ¼ 2 1 2ðe e Þ k ðk lÞ2 ðk lÞ2 þ 2tekt

where G is the cumulative distribution function related to the genRt eration time density, i.e. GðtÞ ¼ 0 gðsÞds. An alternative representation can be derived combining the expression (3.2) with the fact that if Z is a random variable with distribution function F Z then Rt EðZ ^ tÞ ¼ 0 ð1 F Z ðxÞÞdx. This gives

Z

5.1. A simulated example

EðTðtÞÞ ¼ EðYÞGðtÞ;

^JðtÞ ¼ EðYÞ

d a2 þ 1d n

where VðtÞ ¼ VarðTðtÞÞ. ^ ðT n;d Þ, and estimates Estimates of R0 are easily obtained as EðYÞa of the Malthusian parameter r can be obtained from the relation (3.4). The asymptotic properties follows from the approximation (4.10). Note that for a given generation time density g the estimate defined by (5.3) is proportional to 1=EðYÞ. Thus the estimate of R0 depends only on g (and not on EðYÞ. The same is true for the estimate of the Malthusian parameter.

We find that

Z

a2

l kðk lÞ

:

To illustrate the different estimates we have performed simulations of 100 epidemics with a large outbreak. The simulations are run in a population of size n ¼ 1000. We have assumed that the mean latency time is 1, the mean infectious time is 3, and R0 ¼ 1:5. This gives the parameter values a ¼ 0:5 and r ¼ 0:115. The estimates are calculated when half of the population is infected, i.e., d ¼ 0:5. Estimates of R0 are obtained by multiplying the estimate of a with EðYÞ. ^ In Table 1 the means of the estimates of R0 and r based on a from Eq. (5.1) are given together with the empirical standard deviation and the theoretical standard deviations given in Eqs. (4.8) and (4.11). Table 2 gives the corresponding data for estimates when R T n;d IðsÞds is substitued by ^JðT n;d Þ and the estimates are given by 0 Table 1 Simulated estimates based on observations of the number of infectious persons. Using (5.1) with d ¼ 0:5. Parameter

Mean of estimates

Empirical s.d.

Theoretical s.d.

R0 r

1.512 0.118

0.0567 0.0121

0.0492 0.0105

D. Lindenstrand, Å. Svensson / Mathematical Biosciences 246 (2013) 272–279 Table 2 Simulated estimates based on estimates of the number of infectious persons. Using (5.3) with d ¼ 0:5. Parameter

Mean of estimates

Empirical s.d.

Theoretical s.d.

R0 r

1.521 0.119

0.0741 0.0157

0.0678 0.0145

(5.3). The theoretical standard deviations are derived from Eq. (5.4). 6. Discussion The martingale estimates suggested in this paper use information from a large part of the epidemic outbreak, and can be calculated at any proportion infected. The analysis starts by considering a simple epidemic model with contact parameter a and constant infectious period of length one time unit. Since we restrict the infectious time to be exactly one time unit, we can derive the estimate of r using data of the number of infected. The estimate and its asymptotic properties are derived together with an approximation of its asymptotic standard deviation. Simulations verify that the martingale estimate becomes better the more data we use as opposed to the simple estimates which only uses the data from the start of the epidemic outbreak. In later sections the estimates are modified to be used in situations with more general generation time distributions. Also in this case asymptotic properties are given. The disadvantage is that the estimate requires that the generation time distribution is known. Knowledge of this distribution may come from observations on previous epidemic outbreaks or from studies of the progress of the infection in infected individuals. In the models we consider the generation time distribution can be derived from the simultaneous distributions of the latent and infectious times. Latent and infectious times differ between infections. For this reason it is important that the estimation method is general enough to take this into account. Data analyzed with an incorrect assumption of gðtÞ may cause a bias. There are some opportunities to check if an assumed generation time distribution is reasonable. It should, e.g., be expected that estimates calculated for different proportion of infected are stable, within the limits of random variation. Another possibility is to compare the estimate of r with a ’’non-parametric’’ estimate based on the initial phase of the epidemic which do not use any assumptions of the generation time distribution (cf Section (2)). Statistical properties of procedures based on such comparisons remain to be investigated. It is to be noted that if both the number of infected persons and the number of infectious persons are observed then it is possible to derive a consistent and asymptotically normal distributed estimate the parameter a, cf (5.1). If the mean infectious time is known this gives also a consistent estimate of R0 . The estimates discussed are based on that the epidemic curve is observed. Of course, it may be difficult to obtain observations on the times of all infections. However, often it may be possible to construct an approximation from partial observations or from surveys. The effect of using such approximations has to be considered in each special situation. Appendix A To prove Theorem 1 and Theorem 2 we need the following two lemmas. The definition of F n;d is as before F n;d ¼ fN n ð1Þ P nd; d 2 ð0; sÞg,

Mn ðT n;d Þ d pffiffiffi ) N 0; 1d n as n ! 1.

pffiffiffiffiffiffiffi Proof. Define X n ðtÞ ¼ M n ðtÞ psiffiffin, where i ¼ 1 and s is a real number. For the moment we regard s as a fixed number and supress Rt that X n depends on s. Let Z n ðtÞ ¼ 1 þ 0 X n ðyÞdXðyÞ. Then fX n ðtÞ; t P 0g is a martingale and thus trivially a local martingale. It follows that fZ n ðtÞ; t P 0g is a local martingale (see e.g. [10], Theorem 4.61). The explicit solution Z n ðtÞ, known as the stochastic exponential, is

Z n ðtÞ ¼ Z 1n ðtÞZ 2n ðtÞ 1 Z 1n ðtÞ ¼ exp X n ðtÞ ½X n ðtÞ 2 Y 1 2 ð1 þ DX n ðuÞÞ exp DX n ðuÞ þ ðDX n ðuÞÞ2 Z n ðtÞ ¼ 2 0 0, then

Z

u

nd X 1 ð1 Nn ðsÞ=nÞ dNn ðsÞ ¼ ð1=nÞ ð1 ðj 1Þ=nÞ 1

0

j¼1

¼

nd1 X

1=ðj nÞ ! logf1 dg

ZðtÞ ¼

Z

t

IðsÞds ^JðtÞ

0

¼

XZ u

t

T u ðt sÞ½dNn;u ðsÞ pu dNn ðsÞ þ AðtÞ: 0

Simple calculations yield that the process ZðtÞ has mean 0. Thus ^JðtÞ Rt is a natural estimator of 0 IðsÞds. Furthermore

EðZ 2 ðtÞÞÞ ¼ E

Z

t

Vðt sÞdNn ðsÞ þ VðtÞ;

ðB:1Þ

0

where VðtÞ ¼ VarðTðtÞÞ, and

CovðM n ðtÞ; ZðtÞÞ ¼ 0:

ðB:2Þ

j¼0

the lemma follows. Proof of Theorem 1 We have that

pffiffiffi M ðT Þ n ~ n ðT n;d Þ aÞ ¼ npffiffiffin;d R T nða n;d n IðsÞds 0

If the outbreak is large we will use the approximation ^JðT n;d Þ at the stopping time T n;d ¼ minft > 0; N n ðtÞ=n > dg. Heuristically the probability that the m’th infected has an u-profile contitional on T n;d ¼ t < 1 is pu . A more formal proof can be constructed using that the time changed processes N un ðK1 ðsÞÞ are independent Poisson processes with intensities pu , where K is the inverse of Rt KðtÞ ¼ a 0 ð1 Nn ðsÞ=nÞIðsÞds. The same calculations as above yield that

and Theorem 1 follows from Lemma A1 and A2 and Slutsky’s Theorem.

EðZðT nd Þ AðT nd ÞjNn ð1Þ P ndÞ ¼ 0;

Proof of Theorem 2

In the same way we may extend Eqs. (B.1) and (B.2) to hold conditional that N n ð1Þ P nd. The conditional event is asymptotically equivalent to the condition that there is a large outbreak. In Appendix A we proved that

a~ ðT n;d Þ a ¼

Mn ðT n;d Þ n R T n;d n IðsÞds 0

and Theorem 2 follows from Lemma A1 and A2.

This will be a sketch of proof that we can estimate the sum of infectious times up till the time t if we know the epidemic curve and the generation time distribution. Let X denote the length of the latent period and Y the length of the infectious period. The time an infected person has been infectious up till time t after the infection is TðtÞ ¼ Y ^ ððt XÞ _ 0Þ. We will call such a T an infectious profile. For simplicity, we restrict ourselves to the case that there is a finite number of infectious profiles. We will distinguish them by the sub-index u. Each infected individual is given such a profile randomly when infected. The probability to have an uprofile is pu . Let N un ðtÞ be the number of infected persons with the profile u up till time t. Then N un is a counting process with intensity P pu að1 N n ðtÞ=nÞIðtÞ, where N n ðtÞ ¼ u N un ðtÞ. The model imply that if there is a jump in the N n -process then it is a jump in the N un -process with probability pu . The time that an infective with u-profile is infective up till time a after the infection is by assumption a non-random function. If u0 is the profile of the initial infector then

0

t

IðsÞds ¼

Z

T n;d

p

IðsÞds!

logð1 dÞ

0

a

as n ! 1 if there was a large outbreak. Since AðT nd Þ=n ! 0 it follows that

Appendix B

Z

1 n

XZ u

t 0

T u ðt sÞdNn;u ðsÞ þ T u0 ðtÞ

^JðT n;d Þ p !1; R T n;d IðsÞds 0 as n ! 1 in case of a large outbreak. ~ ðT n;d ÞÞ, see (5.1), and a ^ ðT n;d Þ, see (5.3), are related The estimates a through the identity

^JðT Þ pffiffiffi pffiffiffi n ZðT n;d Þ ~ ðT n;d ÞÞ aÞ ¼ nða ^ ðT n;d ÞÞ aÞ R T n;d þ a R T pffiffiffi : nða n;d n;d n IðsÞds IðsÞds 0 0 We find that an estimate of the asymptotic variance of pffiffiffi ^ ðT n;d Þ aÞ is nða

a2 2

ðlogð1 dÞÞ

d a2 þ 1d n

Z

T n;d

VðT n;d sÞdNðsÞ :

0

References [1] P. Jagers, Branching Processes with Biological Applications, Wiley-Interscience, John Wiley & Sons, London, 1975. [2] Å Svensson, A note on generation times in epidemic models, Mathematical Biosciences 208 (1) (2007) 300.

D. Lindenstrand, Å. Svensson / Mathematical Biosciences 246 (2013) 272–279 [3] N.G. Becker, Analysis of Infectious Disease Data, Chapman and Hall, London, 1989. [4] N.G. Becker, Martingale methods for the analysis of epidemic data, Statistical Methods in Medical Research 2 (1) (1993) 93. [5] T. Britton, D. Lindenstrand, Epidemic modelling: aspects where stochasticity matters, Mathematical Biosciences 222 (2) (2009) 109. [6] J.M. Gran, B. Iversen, O. Hungnes, O.O. Aalen, Estimating influenza-related excess mortality and reproduction numbers for seasonal influenza in Norway, 1975–2004, Epidemiology and Infection (2010).

279

[7] H. Andersson, T. Britton, Stochastic Epidemic Models and their Statistical Analysis, Springer, New York, 2000. [8] P.E. Protter, Stochastic Integration and Differential Equations, Springer-Verlag, Berlin, 2005. [9] R.Sh. Liptser, A.N. Shiryayev, Theory of Martingales, Kluwer Academic Publishers Group, Dordrecht, 1989. [10] J. Jacod, A.N. Shiryaev, Limit Theorems for Stochastic Processes, SpringerVerlag, Berlin, 1987.

Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology.

Improved estimation of the initial number of susceptible individuals in the general stochastic epidemic model using penalized likelihood.

Tensor methods for parameter estimation and bifurcation analysis of stochastic reaction networks.

A comparison of Monte Carlo-based Bayesian parameter estimation methods for stochastic models of genetic networks.

Parameter estimation in stochastic chemical kinetic models using derivative free optimization and bootstrapping.

Fast Bayesian parameter estimation for stochastic logistic growth models.

On a general stochastic epidemic model.

On the stochastic SIS epidemic model in a periodic environment.

Vaccination Control in a Stochastic SVIR Epidemic Model.

Bayesian Estimation and Inference Using Stochastic Electronics.

Estimation of parameter uncertainty for an activated sludge model using Bayesian inference: a comparison with the frequentist method.

Variational Bayesian Parameter Estimation Techniques for the General Linear Model.

Parameter Estimation Using Divide-and-Conquer Methods for Differential Equation Models.

Fokker-Planck and Fortet Equation-Based Parameter Estimation for a Leaky Integrate-and-Fire Model with Sinusoidal and Stochastic Forcing.

Kinetic parameter estimation in N. europaea biofilms using a 2-D reactive transport model.

An inverse problem formulation for parameter estimation of a reaction-diffusion model of low grade gliomas.

An unbiased two-parameter estimation with prior information in linear regression model.

Stochastic compartmental modeling and parameter estimation with application to cancer treatment follow-up studies.

Automated parameter estimation for biological models using Bayesian statistical model checking.

An evolutionary computing approach for parameter estimation investigation of a model for cholera.

Model-based MR parameter mapping with sparsity constraints: parameter estimation and performance bounds.

A simulation model of AIDS in San Francisco: I. Model formulation and parameter estimation.

Identifiability and parameter estimation.

Correction: Comparison of Two New Robust Parameter Estimation Methods for the Power Function Distribution.