Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Q1 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Q5 Q4 53 54 55 56 57 58 59 60 Q3 61 62 63 64 65 66

Contents lists available at ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

On linear models and parameter identifiability in experimental biological systems Timothy O. Lamberton a, Nicholas D. Condon b, Jennifer L. Stow b, Nicholas A. Hamilton a,b,n a b

Division of Genomics & Computational Biology, Institute for Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia Division of Molecular Cell Biology, Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia

H I G H L I G H T S

   

Methods are developed to design optimal experiments to ensure model parameter identifiability. We give sufficient conditions to identify parameters in linear, diagonalizable ODE models of experiments. A framework for proving identifiability in linked models is demonstrated. Applications to experimentally common situations and measurement protocols are shown.

art ic l e i nf o

a b s t r a c t

Article history: Received 10 September 2013 Received in revised form 19 May 2014 Accepted 20 May 2014

A key problem in the biological sciences is to be able to reliably estimate model parameters from experimental data. This is the well-known problem of parameter identifiability. Here, methods are developed for biologists and other modelers to design optimal experiments to ensure parameter identifiability at a structural level. The main results of the paper are to provide a general methodology for extracting parameters of linear models from an experimentally measured scalar function – the transfer function – and a framework for the identifiability analysis of complex model structures using linked models. Linked models are composed by letting the output of one model become the input to another model which is then experimentally measured. The linked model framework is shown to be applicable to designing experiments to identify the measured sub-model and recover the input from the unmeasured sub-model, even in cases that the unmeasured sub-model is not identifiable. Applications for a set of common model features are demonstrated, and the results combined in an example application to a real-world experimental system. These applications emphasize the insight into answering “where to measure” and “which experimental scheme” questions provided by both the parameter extraction methodology and the linked model framework. The aim is to demonstrate the tools' usefulness in guiding experimental design to maximize parameter information obtained, based on the model structure. & 2014 Elsevier Ltd. All rights reserved.

Keywords: Modeling Experimental design Parameter identifiability Ordinary differential equations

1. Introduction and background Linear systems of ordinary differential equations (ODEs) are a widely applied tool used for modeling of biological systems to generate hypotheses for experimental testing. Linear models are used in fields such as molecular cell biology to investigate endocytic traffic and sorting (Ghosh et al., 1994; Sheff et al., 2002; Henry and Sheff, 2008) and phosphoinositide transformation in endocytosis n Corresponding author at: Division of Genomics & Computational Biology, Institute for Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia. Tel.: þ 61 7 3346 2033; fax: þ 61 7 3346 2101. E-mail address: [email protected] (N.A. Hamilton).

(Belward et al., 2011), and systems biology to study physiology (Cobelli and DiStefano, 1980), gene expression (Crampin, 2006) and regulatory networks (de Jong, 2002). The rapid advance of technologies, such as fluorescent microscope imaging technologies which allow imaging of the behavior of multiple proteins at multiple locations in live cells to be quantified and analyzed, have given access to large amounts of data which can be used for robust model development and validation (Hamilton, 2009), and has already lead to breakthroughs in understanding major diseases such as Parkinson's Disease (Follett et al., 2014). The rise in the use of mathematical modeling in biological fields brings with it pitfalls which are to be avoided. Often such models include large numbers of parameters which are impractical to measure individually, and must be inferred

http://dx.doi.org/10.1016/j.jtbi.2014.05.028 0022-5193/& 2014 Elsevier Ltd. All rights reserved.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85

2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

from their effect on the system, using data sets which can be incomplete and noisy (Ghosh et al., 1994; Sheff et al., 2002; Voit et al., 2006; Henry and Sheff, 2008; Berthoumieux et al., 2011). The problem of parameter estimation for linear and non-linear ODEbased models has been extensively studied from a variety of perspectives, for example, applied mathematics (Liu et al., 2011) and systems biology (Jaqaman and Danuser, 2006; Ashyraliyev et al., 2009). Complicating the process of parameter estimation is the idea of parameter identifiability, that is, whether different combinations of parameter values lead to indistinguishable model output, both in terms of inherent model structure (structural parameter identifiability) and due to experimental noise (practical parameter identifiability) (Bellman and Åströ m, 1970; Cobelli and DiStefano, 1980; Holmberg, 1982; Chou and Voit, 2009; Nikerel et al., 2009; Chen et al., 2010; Berthoumieux et al., 2012). Approaches to dealing with identifiability have been developed for a variety of modeling situations. The structural identifiability problem (defined precisely below) is more fundamental than the practical identifiability problem, in that it represents a best-case scenario for any practical identifiability analysis, and also more analytically tractable, with well-defined approaches for linear (Bellman and Åströ m, 1970; Cobelli and DiStefano, 1980) and many non-linear models (Pohjanpalo, 1978; Jaqaman and Danuser, 2006; Bellu et al., 2007; Nemcova, 2010; Chis et al., 2011). The problem of practical identifiability, which considers whether experimental noise will allow parameters that may be structurally identifiable to be resolved to a level of certainty, is generally approached by developing confidence intervals for estimated parameters and considering the parameter space of an objective function near solutions, either using quadratic approximations or numerical methods (Cobelli and DiStefano, 1980; Raue et al., 2009, Berthoumieux et al., 2012). Bayesian frameworks have also be used to represent parameter uncertainty and model sensitivity to parameter values (Liepe et al., 2013). Practical identifiability of biological systems is a difficult problem, as illustrated by a study (Gutenkunst et al., 2007) where it was found that biological systems show degrees of sensitivity distributed fairly evenly over multiple orders of magnitude to changes in different independent combinations of parameters, with highly sensitive parameters being difficult to constrain with even large amounts of data. There are intuitive gaps in the underlying causes of both structural and practical identifiability. Although practical identifiability is fundamentally a problem of experimental noise, understanding model sensitivity to parameter variation in most cases involves performing complex analytical and often numerical calculations. In the case of structural identifiability, many of the analytic approaches allow the straight-forward computation of the binary identifiability/non-identifiability of model parameters on a case-by-case basis; however it is often the case that these approaches miss a broad intuition of what underlying mechanics of models allows or does not allow the determination of the parameters. A related issue to parameter identifiability is experimental design optimization, that is, how to maximize information obtained from measurements. An effort has been made (Liepe et al., 2013) to address this problem using a Bayseian framework as a general computational method for optimizing experiments. An analytical approach to both parameter identifiability and experimental design problems could provide several advantages over computational methods, by providing fine-grained feedback on how parameters affect outputs under a wide range of conditions, and lead to intuitive rule-of-thumb guidance for experimental design. Here, theorems applicable to broad classes of biological structural parameter identifiability problems are proved. In Section 2.1,

new methods are developed which consider the underlying mechanics of models and how they are captured by experimental design, allowing the development of measurement schemes to efficiently extract parameter information. These methods are demonstrated with applications to a range of commonly used models (see figures) to design experimental schemes to optimize the identification of model parameters. In Section 2.2 a framework for constructing complex models by considering the outputs of components of simple model classes as inputs to other models to create linked models is outlined. By considering the problem of identifying the combined model, it is shown that under certain conditions the component models can effectively be distinguished, and structural identifiability results for the individual models when not linked also hold for the corresponding components of the combined model. It is also shown that even when components of the models are “hidden“ from measurement much can often be inferred about the hidden parts from the measurable components. This linked model framework is also applied to a range of commonly used model configurations, which are in turn used to analyze a realistic biological system. The specific focus of the paper is on the broad family of mathematical models, linear systems of ordinary differential equations (ODEs). Cobelli and DiStefano (1980) and Raue et al. (2011) give good overviews of modeling using linear systems of ODEs in the context of physiological systems and gene signaling networks respectively, fields where linear models have been widely and successfully applied. Following a similar framework, here we will consider models that have the following linear general form: d xðtÞ ¼ AxðtÞ dt

ð1Þ

for a model state vector x with n components. The initial state of the model is specified by the length n vector x(0) ¼b and the model structure by the constant n  n matrix A. The vector of m model observables is given by yðt i Þ ¼ Cxðt i Þ þ εi

ð2Þ

where εi is a length m vector of measurement noise assumed to be normally distributed. C is an m  n non-negative matrix with each row defining the combinations of model locations measured in a corresponding experimental data set. The model structure A and the initial state b are functions on the parameters θ of the model. A is dependent on a set of constant parameters k (rate constants) that denote the rates at which the model components interact and together with an independent initial state b make up the parameters θ. For example, in cell protein trafficking models (Ghosh et al., 1994; Sheff et al., 2002; Henry and Sheff, 2008) the model parameters consist of rate constants at which a protein of interest is trafficked between compartments (such as endosomes) in the cell, as well as initial concentrations of the protein in the compartments. In enzyme reaction models such as those used by Raue et al. (2011), the model parameters are the rates at which the presence of enzymes inhibit or accelerate the production of other enzymes, as well as initial concentrations. The model states represent the concentration of protein in the intracellular compartments, or the enzyme concentrations in a solution, for example within a cell cytoplasm, or an organism's blood stream. An often used experimental set-up (Bellman and Åströ m 1970; Cobelli and DiStefano, 1980; Henry and Sheff, 2008) is a pulse or tracer input at a single location, which is then measured over time at other locations. In terms of Eqs. (1) and (2) this corresponds to an initial state b¼bieTi for the pulse location i, where ei is a standard basis vector (1 in the ith component and zeros elsewhere), and bi denotes the size of the input. Another example set-up is a fill/release method, where a

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

model is connected to a fixed input and filled or saturated, and then allowed to evolve naturally from a steady state/saturation level corresponding to a vector b that may depend on the rate constants of a model (see Application 4). In general, the observation matrix C will depend on the experimental set-up, with the most common form having rows with a single non-zero entry corresponding to measurements at independent model locations, such as in the tracer experiment at location i mentioned above where the measurement of the system occurs at the same location as the input, corresponding to C ¼ eTi . It also is possible that multiple locations will be measured simultaneously. For example, an often used subcellular trafficking experimental scheme involves using a fluorescently-tagged marker, such as a protein, that is known to localize at specific locations within a cell, to be used as a mask for another fluorescentlytagged (of a different color) protein of interest. One can measure fluorescent intensity of both colors over time, and calculate out the total fluorescent intensity of the protein of interest that coincides with the marker fluorescence. In some cases, either by practical limitations or design, markers for different locations may use the same color fluorescence, and hence be indistinguishable from measurements of total fluorescence, meaning that measurements of coincidence would involve the summed fluorescent intensity of the protein of interest in all of the marker locations using the same color. The values of entries of C represent conversion factors between the observed/measured quantity (e.g. fluorescent intensity) and the quantity of interest (e.g. number/proportion/concentration of protein), and can be functions of the model parameters. For example, consider an experiment that measures the exit or loss from a cell of a protein that is known to exit via a defined pathway, by measuring the decline in fluorescence over time (assuming that outside the cell the protein concentration becomes highly dilute and has negligible fluorescence). For this experiment, measurements, and hence C, would depend on the exit rate of the pathway (Ghosh et al., 1994). An important assumption of parameter inference is that the model parameters can be identified from perfect experimental data. This assumption is known as structural (parameter) identifiability (or just identifiability in the following). A commonly-used method for determining the identifiability of linear models is the transfer function method (Bellman and Åströ m, 1970; Cobelli and DiStefano, 1980; Raue et al., 2011). The transfer function of a system is the Laplace transform of the system output (in the absence of noise) evolving from an initial state b

φðsÞ ¼ CLfxgðsÞ ¼ CðsI AÞ  1 b

ð3Þ

where L{x}(s) is the Laplace transform of x(t). The elements of Eq. (3) are scalar functions with coefficients that depend on the

3

parameters of the model θ and are theoretically obtainable from experimental time-series measurements as will be shown below. In this work the following definition is applied. Definition 1. A model is system identifiable if the transfer function given by Eq. (3) yields a system of equations that is solvable in terms of the model parameters θ. It should be noted that in the literature there are slight variations in definitions of identifiability. For example, Raue et al. (2011), Cobelli and DiStefano (1980) and others also define the identifiability of individual model parameters (i.e. is a particular parameter uniquely specified by a model output). This concept is often termed parameter identifiability, with system (or model) identifiability used when considering if all parameters of a model are identifiable. Raue et al. (2011) would consider a case where Eq. (3) has multiple (but finite) solutions to not be identifiable as each parameter does not have a unique value, but this work would consider such a case as identifiable. The definition used by Raue et al. (2011) corresponds to this work's definition of unique parameter identifiability: a parameter θα is uniquely identifiable if its solution (in terms of the system of equations obtained from Eq. (3)) is unique. The definitions applied in this work are consisted with Cobelli and DiStefano (1980), among others. For a comprehensive summary of methods for determining system identifiability see Cobelli and DiStefano (1980), Raue et al. (2011) and Chis et al. (2011). Most methods to determine identifiability of a model such as the transfer function method provide a test to check if a model is identifiable on a model-by-model basis. An example of this can be seen in the second example in the work of Bellman and Åströ m (1970)) (see Fig. 1 of the work, also Fig. 1 of Cobelli and DiStefano (1980)). Bellman and Åströ m (1970) show that an experimental set-up with a tracer introduced to a compartment, which is then measured, is not sufficient to identify the four parameters of the model. The model is similar to the single-loop feedback model discussed in this work (shown in Fig. 3) but with an extra pathway and a known initial state. Several questions can be raised about this example. Would additionally measuring at compartment 2 allow the parameters of the model to be identified? Would it give additional or redundant information? Additionally, would experimental schemes other than a single tracer input, such as fill/release or multiple tracers, instead allow the parameters to be narrowed down? This example helps to illustrate that using the transfer function method and other test methods to answer “where to measure” or “which input scheme” questions require a range of different choices of C and b to be individually tested. Further, to date, methods for parameter identifiability generally do not easily allow understanding of internal features of a model

Fig. 1. The reaction network model used by Belward et al. (2011) details the transformation of phosphoinositides attached to membrane surfaces of the cell which are critical for correct vesicle trafficking. This model can be considered as a linked model.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

4

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

that may lead to it being identifiable or not. Consider Fig. 1 from Belward et al. (2011). The compartments of the model refer to concentrations of different phosphoinositides on endosome membranes, and the links are one way conversions of phosphoinositides, with an associated conversion rate constant. This model exhibits multiple features, including a feedback loop between two compartments (PI45P22PI345P3) similar to the single-loop feedback model of Fig. 3, some single compartments (PI, PI34P2), and a chain of two compartments (PI3P-PI35P2). The two-compartment feedback loop feature shown by Bellman and Åströ m to be unidentifiable is similar to PI45P22PI345P3, however this previously found result cannot be used to simplify the analysis of the model in Fig. 1 when applying standard methods such as the transfer function method, which would require reapplying the method for the entire new model. In contrast to current methods, the framework and methods developed in this work can reveal, for example, where the important factors of the model can be best measured, to individually isolate parameters of interest. Additionally, using the methods developed here, identifiability results for simple models can be applied to combined models that contained the simple models as building blocks, allowing rapid determination of identifiability and experimental design in complex situations (see Application 7 which demonstrates this for the model in Fig. 1). The overarching aim is to be able to guide experiments via optimal choices of measurement locations and experimental set-up. Following this introduction, in Section 2.1, with a weak assumption applied to Eqs. (1) and (2) (model diagonalizability with distinct eigenvalues) it is shown that experimentally determined quantities can be related to elements of a diagonalized form of the transfer function Eq. (3), providing three varieties of equations for the parameters of the model: eigenvalue equations, residue relations, and equations for the initial state. The methods used take into account the form of the model to give a precise upper bound on the number of available equations. It is shown that a sufficient condition for the identifiability of the model is presented if the system of derived equations is solvable for the model parameters. The methods are demonstrated with applications to the following building block models, to derive equations and explicitly solve for the model parameters. Application 1 is the n-chain model, which is a simple yet practically applicable model form consisting of a unidirectional linked chain of compartments, as represented graphically in Fig. 2. The n-chain model often appears as part of intracellular protein dynamical models (Ghosh et al., 1994), where the compartments represent distinct protein populations on the cell surface or within the cell, and the links represent the transfer of the populations between compartments. Each link has an associated rate constant kα which determines the relative size of transfers. For example, the model shown in Ghosh et al. (see Fig. 8 of that work) depicts two distinct trafficking pathways for two proteins, transferrin (Tf) and low-density lipoprotein (LDL). As each protein takes a distinct pathway through the cell after entry, they can be separately modeled as n-chains. Here it will be demonstrated that measuring over time at the end of the chain (the nth compartment) allows

the identification of all the model parameters. As an example of Application 1, data for an explicit n ¼5 experiment is simulated, and the methods developed in Section 2.1 applied to determine the multiple possible values of the model parameters used to simulate the data. It is also shown that a prior knowledge of initial states can be used to uniquely determine the model parameters. In Application 2 the methods are applied to a single-loop feedback model as shown in Fig. 3. An example of this model can be seen in the transformation model used by Belward et al. (2011) shown in Fig. 1. It is shown that measurements over time of location 1 and the output of location 2 along the k2 pathway are enough to uniquely identify the model. How this can be achieved experimentally is also discussed. Another common type of model configuration considered in Application 3 is a diverging R-branch model, as shown in Fig. 4. An R¼ 2 configuration (i.e. two branches) of this model appears as a feature of the intracellular dynamical model used by Henry and Sheff (2008). Here it will be shown that measuring the output of location 1 along R  1 of the exit pathways allows the parameters of the model to be uniquely identified. Section 2.2 presents a methodology to combine identifiability results for individual models to interrogate the identifiability of more complex linked models. Linked models are defined as models formed when two independently parameterized models are connected such that the output of one model (the hidden component model) is input into the other model (the observable model). Methods are developed to investigate the information that can be extracted from measurements over time of the observable component model. As an example of the linked model methodology, consider the model from Belward et al. (2011) in Fig. 1. This model can be considered as a series of nested linked models: a single feedback loop (PI45P22PI345P3) linked to a single compartment (PI34P2) which can be considered as a 1-chain, which in turn is part of converging 2-branch linked model (see Fig. 6) with another single compartment/1-chain (PI) – both linked to a 2-chain (PI3P-PI35P2). Applying to linked models a similar methodology as that applied in Section 2.1 to individual models allows the derivation of eigenvalue equations, equations for the initial state, and residue relations for the linked model. It is also shown that under certain circumstances, the transfer function of the input to the observable model of a linked model can be recovered. As the input to the observable model is equivalent to the transfer function of the hidden model, once it is recovered, the

Fig. 3. Another common model configuration is a single-loop feedback model consisting of two interlinked compartments.

Fig. 2. An n-chain model is a unidirectional linked chain of n compartments.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

hidden model can be studied as a separate model using the methodology of Section 2.1 to determine identifiability. If the input transfer function is known, then it is shown that the initial state of the observable model can be identified. As with Section 2.1 methods, the linked model methodology is applied to a range of building block model configurations. Applications 4 and 5 consider linked models with observable components being the n-chain and single feedback loop models of Section 2.1 respectively, with an arbitrary hidden model providing input (the linked feedback loop model is represented in Fig. 8). In both cases, under the same measurement conditions of Applications 1 and 2, using the linked model methodology and identifiability results from Section 2.1 the models are shown satisfy the conditions required to recover the hidden model input, and an expression for the transfer function of the hidden arbitrary input in terms of experimentally deduced quantities obtained from measurements over time of the observable model of linked models is derived, as well as expressions for the initial state of the observable models. In addition, Application 4 (n-chain) gives an example with a specific hidden model: a fixed source/constant input, linked to the observable n-chain as shown in Fig. 5. The rate of flow into the n-chain model is denoted k0 . The fixed source linked model could correspond to the fill phase of the fill/release experimental scheme mentioned earlier, in which a cell is bathed in a media that contains a high concentration of a protein of interest that is internalized and trafficked along a specific pathway through the cell. For a large enough bath such that the internalized protein fraction over the bathing time is insignificant, the rate at which protein is internalized is limited by the amount of protein that can attach to the cell surface, and can be considered constant. It will also be shown that by considering the steady state reached in the saturate phase that the combination with a release phase this scheme can provide a relationship between the initial state of the release phase and the rate constants, increasing the number of available equations compared with a release scheme from an arbitrary state. The sixth application of the linked model methodology is to a converging R-branch model. In this model R independently parameterized arbitrary inputs (denoted A1 to AR in Fig. 6) are linked to an observable model, as shown in Fig. 6. It is shown that by considering the R inputs as a single input that the linked model methodology can be applied under certain circumstances to identify the component inputs. The last application is to the phosphoinositide transformation model of Belward et al. (2011) shown in Fig. 1, where results

determined for the building block models are used to determine the minimum number and locations of measurements that are needed to identify the model, without any further calculations. Finally, the Discussion highlights the results and applications to directing experimental design to maximize information extracted, by answering “where to measure” questions and allowing the investigation of different experimental schemes, as well as giving a framework for analyzing complex models in terms of more simple features or components. Future avenues for investigation are suggested, primarily relating to adapting the approaches discussed to account for experimental noise, model parameter sensitivity and fit uncertainty when considering identifiability (i.e. practical identifiability). The analytical approaches in this work could be adapted to develop a framework to understand experimental (including biological) noise and develop insights about its effect on relationships between model parameters and output. Appendix A presents proofs of the propositions and corollaries presented in the work.

2. Results 2.1. Extracting equations from the transfer function This section presents a methodology to extract information about model parameters and initial state from experimental measurements of the modeled system, in the form of eigenvalue equations, equations for the initial state, and residue relations. Also some general sufficient conditions for system identifiability are derived. The application of the methodology is demonstrated on a range of model systems. 2.1.1. Diagonalizable models The following considers a general family of models that are diagonalizable with distinct eigenvalues, as expressed by Assumption 1. This assumption of diagonalizability will be shown to be widely applicable to a range of biologically relevant linear models. Assumption 1. A model described by Eqs. (1) and (2) where A is diagonalizable with distinct eigenvalues. h iT The n eigenvalues of A are denoted λ ¼ λ1 λ2 … and the eigenvectors by the corresponding columns of an n  n matrix V. With Assumption 1, the output of a linear systems (1) and (2) is, ignoring noise, given by a superposition of exponentials. By curvefitting methods, estimates of the coefficients and exponents of the superposition can be obtained from experimental time-series data. Such a fit (to the data) would take the general form f j ðtÞ ¼ r j1 expðp1 tÞ þ ⋯ þr ji expðpi tÞ þ ⋯ þ r jn expðpn tÞ

Fig. 4. The diverging R-branch model, with R branches.

5

ð4Þ

The fit fj(t) is assumed to be the best approximation of the observable yj(t). Recall that by Eq. (2), yj(t) represents a superposition of the states x of the model, with coefficients determined by the jth row of C. It is assumed that time-series data of the observables represented by the elements of y of sufficient quality to fit curves that match exactly to the output signal of the model

Fig. 5. Fixed source/constant input model linked to an n-chain model.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 Q6110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

Fig. 6. Converging R-branch model, with R input models denoted A1 to AR linked to an observable model B.

Eq. (2) (i.e. ignoring noise). This is often not a realistic assumption in practice, but represents a best-case scenario that must be satisfied, as a system that is not identifiable under this assumption will not be in any experimental situation. Note that in general, issues of practical identifiability (see Section 1) will also need to be dealt with when applying the following theorems in real experimental situations. The Laplace transform of a fit fj is given by Lff gj ðsÞ ¼

r j1 r ji r jn þ⋯þ þ⋯ þ s  p1 s  pi s  pn

ð5Þ

and corresponds to the best approximation to the transfer function element φj(s). Borrowing some terminology from complex analysis, this work will refer to the poles pi and residues rji of the Laplace transform of the fit L{f}j (Ahlfors, 1966). By comparison with Eq. (4) it can be seen that the residues are the coefficients of the fitted exponential terms, and the poles are the exponent coefficients. With this equivalence in mind, in the following both Eq. (4) and the Laplace transform Eq. (5) are referred to as a fit. 2.1.2. A note on non-diagonalizable models Mathematically, a non-diagonalizable model will have an incomplete basis of eigenvectors and repeated eigenvalues. Although unlikely in general, this can occur in specific modeling situations, for example, if a closed system is given continuous input, then the total system will grow linearly. In cases such as this, it may be that the model uses unrealistic assumptions that need to be rethought, either by creating an exit path to open the system, or by switching to a more realistic non-linear model. If the model is known to be realistic, a direct solution to the original transfer function can be sought, without applying any of the theorems in this work. It seems also a possible direction for future work could be to generalize the method for non-diagonalizable models by using a Jordan normal form of A, which would apply for fits that do not have the form of Eq. (4). 2.1.3. Diagonalized form of the transfer function For a model defined by Eqs. (1) and (2), the m transfer function observables Eq. (5) will have at most n poles and n residues, where n is the number of model locations. This means that the transfer function method gives at most 2nm equations with which to deduce the model parameters. Efficient experimental design

including choices of measurement locations and schemes (i.e. choices of C and b) is important to avoid collecting redundant data and to collect the most informative data sets to allow the model parameters to be inferred. So what information does a particular choice of C and b tell us? To straight-forwardly connect experimental fits Eq. (5) to the parameter form of the transfer function Eq. (3), with Assumption 1 a diagonalized form of Eq. (3) can be considered. Given Assumption 1, the transfer function Eq. (3) can be written as

φðsÞ ¼ CVðsI  diagðλÞÞ  1 V  1 b:

ð6Þ

Recall that V  1 is the inverse matrix of V and has as rows the left eigenvectors of A. Equivalently, the matrix V  1 is given by the transpose of the eigenvectors V of the matrix AT. By writing Eq. (6) in component form it can be immediately seen that Eq. (3) follows the form of Eq. (5) under Assumption 1. It is possible to develop for Eq. (6) an analogous interpretation to Eq. (3). In place of C define Cn ¼ CV:

ð7Þ

The rows of Cn can be interpreted as indicating the model eigenstates (as opposed to the physical states) of the model sampled in the corresponding fit to φ. Similarly, in place of b define b ¼ V  1b n

ð8Þ

which can be interpreted as the initial profile (concentrations) within the model eigenstates. Replacing A is the diagonal matrix of the eigenvalues λ that represents the relative size or power of the eigenstates. Recall that the eigenstates V  1x of the model are linear combinations of the states of the model that exhibit independent self-contained dynamics. With Eqs. (7) and (8), Eq. (6) can be rewritten as

φðsÞ ¼ Cn ðsI  diagðλÞÞ  1 bn :

ð9Þ

Keeping in mind Assumption 1, connecting fits to experimental observations of the form of Eq. (5) to the parameters of the model can be accomplished via the following propositions and corollary. For considerations of readability, proofs of the propositions and corollary are provided in Appendix A.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 Q7 66

7

2.1.4. Eigenvalue equations By writing Eq. (9) in component form Proposition 1 can be deduced.

constants k. To derive expressions for the initial state b parameters, it is necessary to consider the relationship between the transfer function and fit residues. Consider the following proposition.

Proposition 1 – eigenvalue equations. For a model of the form given by Eqs. (1) and (2) and given Assumption 1, the ith eigenvalue λi is given by a pole pj of a fit of the form of Eq. (5) if the ith column of Cn ¼CV and the ith element of bn ¼V  1b are non-zero.

Proposition 2. For a model of the form of Eqs. (1) and (2) with Assumption 1, if all of the columns of Cn have at least one non-zero element then for the corresponding non-zero elements of bn a vector h iT r j1 1 r j2 2 … of residues corresponding top1 , p2 , …, can be

Let λi1 , λi2 , …, be the eigenvalues of a model where the conditions of Proposition 1 hold. We can express the model eigenvalue equations as h iT h iT λi1 λi2 … ¼ pπð1Þ pπ ð2Þ … ð10Þ for some arbitrary permutation π. The permutation π arises due to the ability to arbitrarily order the terms in the sums in both Eqs. (5) and (9). Hence it is not possible without a priori knowledge to know which of the terms of a fit correspond to which eigenstates of the model, and including the permutation π in Eq. (10) accounts for all possible alignments. Each additional eigenvalue equation obtained experimentally gives more information about the model parameters. The conditions of Proposition 1 can be interpreted as saying that to infer a particular eigenstate i of the model from experimental data, measured locations of at least some of the experiments conducted must exhibit behavior in this eigenstate (non-zero elements in the ith column of Cn), and that there must be some initial component of the eigenstate to begin with (non-zero ith element of bn). An experimental set-up such that the conditions of Proposition 1 hold for all eigenvalues λi will give the maximum number of eigenvalue equations obtainable from the system. If there are some λi for which the conditions of Proposition 1 are not satisfied, then this indicates that the experiment design could be revised in order to capture the full range of behavior of the system. It is also possible that some simple models could be identifiable from an incomplete set of eigenvalue equations. It is instructive to write Cn in component form C nij ¼ C i1 V 1j þ C i2 V 2j þ ⋯

ð11Þ

It can be seen from Eq. (11) that the jth column of Cn will in general be non-zero if measurements include a location with a corresponding non-zero element of the jth column of V. Recall that the columns of V show the relative size of the contribution of the corresponding eigenstates to each of the model locations, and note this confirms the interpretation above that, assuming the condition that the corresponding element of bn is non-zero is also satisfied, if the contribution of an eigenstate is non-zero for a measured location, then this eigenstate will be observed. (It is possible, but in general highly unlikely, for the combination of simultaneous measurements at different locations to exactly cancel the contribution of an eigenstate.) This can often be ensured by the design of the experimental initial setup. For example, in designing a pulse/injection type experiment, by writing bn in component form 1 1 b1 þ V i2 b2 þ ⋯ bij ¼ V i1 n

ð12Þ

it can be noted that a tracer/pulse in a single location i (i.e. bj ¼ 0, jai) will allow measurements of eigenstates corresponding to non-zero elements of the corresponding (ith) column of the inverse matrix V  1. A system of the form given by Eqs. (1) and (2) can produce at most n eigenvalue equations, as each transfer function element will have a subset of the same n possible poles. This can be seen immediately by writing Eq. (9) in component form. 2.1.5. Initial state The eigenvalue equations (Eq. (10)) inform us about the parameters of A, which as mentioned in the Introduction are the set of rate

formed, wherej1 , j2 , …, are the indices of any fits (possibly the same) of the form of Eq. (5) with non-zero residues corresponding to p1 , p2 , …. The initial state b can be expressed in terms of the vector of nonzero residues via 2 3 V 1i1 V 1i2 3 ⋯ 2r n n C C 6 j1 i1 7 j1 π ð1Þ j2 i2 6 76 7 V V 7 ð13Þ b¼6 6 C n2i1 C n2i2 ⋯ 74 r j2 π ð2Þ 5 4 j1 i1 5 j2 i2 ⋮ ⋮ where i1 , i2 , …, are the indices of non-zero elements of bn and π is the permutation defined by Eq. (10). Proposition 2 shows that if the experimental choice of measurement locations does not preclude the measurement of the model eigenstates (i.e. that the Cn has non-zero elements in every column) then the residues of fits to measured time-series can be related to the model eigenvectors V and initial state b. If the eigenvectors V are known then the components of b can be determined by Eq. (13). 2.1.6. Residue relations If the eigenvectors V are not deducible from the eigenvalue equations, additional information can be acquired in the form of residue relations between identical poles of different transfer function observables, as considered in the following proposition. Proposition 3 – residue relations. For a model of the form of Eqs. (1) and (2) and given Assumption 1, residues of an identical pole in two fits L{f}j, L{f}k, are related by C nkib r jπ ðbÞ ¼ C njib r kπ ðbÞ

ð14Þ

where ib is the index of the eigenvalue corresponding to pπ(b) as defined by Eq. (10). There are at most a number of equations for the elements of the eigenvectors V of the form of Eq. (14) equal to the number of non-zero elements of Cn minus the number of non-zero columns. In essence, Propositions 1–3 describe three different types of equations that are obtained when applying the transfer function method. Propositions 1 and 3 describe equations for the rate constants k (of which V and λ are functions) in terms of measurable quantities (residues and poles of fits). Proposition 2 relates the rate constants k to the initial state b and measurable quantities. By considering the above propositions, the number of available equations (which we earlier declared to be at most 2nm) can be identified. As mentioned in the Introduction, in a large number of cases the elements of the initial state b are independent parameters. In this case it follows from Proposition 2 that the solubility of the systems of equations for the rate constants k is the key determinant for the identifiability of the model, as summarized by the following Corollary. Corollary 1. For a model of the form of Eqs. (1) and (2) and given Assumption 1, and assuming that the parameters of the model are   k given by θ ¼ and all of the columns of Cn have at least one nonb zero element, the model is identifiable if the model eigenvalue equations Eq. (10) and residue relations Eq. (14) can be solved such that k is expressed only in terms of experimentally determined values (poles and residues of fits). The size of k cannot exceed the number of valid equations.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

If all or some of the initial states of the model b are known a priori, for instance, by experimental design or otherwise independently determined, or are also functions of the rate constants k, then the system of equations given by Eq. (13) can give additional information on the values of the parameters of the model, meaning that less information needs to be extracted via eigenvalue equations and residue relations. This is demonstrated in the example for Application 1 below, where multiple possible parameter combinations are solutions to the set of eigenvalue equations for different permutations π. Prior knowledge of the initial state b is incorporated by comparing the set of solutions to Eq. (13) for each of the finite set of possible parameter combinations to the known initial states, with the correct parameter combination being the one that gives matching initial state values. To further highlight the usage of methods described by the propositions and corollary above, they are demonstrated in the following examples.

2.1.7. Application 1 – n-chain The first application is to the n-chain model shown in Fig. 2, and defined in equation form by 2

k1 6 k 6 1 6 d xðtÞ ¼ 6 6 0 dt 6 ⋮ 4 0

0



0

 k2



0

k2



0

0



0

0 7 7 7 0 7 7xðtÞ 7 5  kn

⋱ kn  1 2

 k1 6 k 6 1 6 The parameters of A ¼ 6 6 0 6 ⋮ 4 0

3

0



0

 k2



0

k2

⋱ ⋱

0

0



3

0

0 7 7 7 0 7 7 for this 7 5  kn

f  k1 ;  k2 ; …g. Without loss of generality choose λ ¼  k. It is readily determined that the corresponding eigenvectors are given by the rows of 2

1

k1 6 6 k2  k1 6 6 k2 k1 6 V ¼ 6 ðk3  k1 Þðk2  k1 Þ 6 ⋮ 6  6 n  1 ki 4 ∏ i ¼ 1 ki þ 1  k1

 n1 ∏ i¼2

0



0

1



0

k2 k3  k2

⋱ ⋱

0



kn  1 kn  kn  1



ki ki þ 1  k2





eTn x:

ð17Þ

To confirm that Cn has no zero columns, it is straight-forwardly computed as the nth row of Eq. (16) using Eq. (7) " #   n  1  n1 ki ki kn  1 n T ∏ ∏ ⋯ 1 C ¼ en V ¼ kn  kn  1 i ¼ 1 ki þ 1  k1 i ¼ 2 ki þ 1  k2 ð18Þ hence one of the conditions of Proposition 1 holds for all eigenstates of the model. Furthermore, as Eq. (16) is lower triangular, it can be shown that V  1 is also, and hence for bn to have no zero elements (the second condition of Proposition 1), it is required that b1 a 0 for the arbitrary initial state (again, a consequence of the uni-directional system). Assuming this, it follows from Proposition 1 that all eigenvalues of the system can be obtained from the poles of the fit to the model observable. As the rate constants k are given by the eigenvalues of the system, the rate constants are hence identifiable by Eq. (10) and by Corollary 1 it follows that Eqs. (15) and (17) are identifiable. The indices of non-zero elements of bn are hence given by iα ¼ α, α ¼ 1, 2, …, n and as there is only one experimental fit, the indices j1 ¼j2 ¼… ¼1. Applying Eq. (13) gives with some computation 2

ð15Þ

kn  1 T model are the n rate constants k ¼ k1 … kn , ki 40, i¼ 1, 2, …, n. This example considers the release stage of an experimental fill/release set-up, with the model state evolving from some arbitrary  T described by the n parameters saturation state b ¼ b1 … bn   k . bi Z0, i¼1, 2, …, n. The model parameters are given by θ ¼ b By solving detðλi I  AÞ ¼ 0, the eigenvalues of A are easily shown to be the negative of the rate constants fλ1 ; λ2 ; …g ¼ 

location, i.e. C ¼eTn, the vector of model observables Eq. (2) is

0

3

07 7 7 07 7 7 7 7 7 15

ð16Þ

Recall that the rows of Eq. (16) correspond to the components of x, and note that the ith row shows non-zero elements in the first i columns (as the elements of k are non-zero), i.e. that the ith component of x shows behavior in the first i eigenstates of the system. This is illustrative of the uni-directional behavior of the chain, where only previous locations affect the behavior at a location. Considering Proposition 1, this suggests that to capture all of the modes of the system in an experiment, it is necessary to measure at the nth location to ensure Cn has non-zero elements in the last column. With a time-series measurement at the nth

3   n1 k i þ 1  k1 r 1π ð1Þ ∏ 6 7 ki 6 7 i¼1 6 7     6 n1 k 7 n1 k i þ 1  k1 i þ 1  k2 6 ∏ r 1π ð1Þ þ ∏ r 1π ð2Þ 7 6 7 k k 6i¼2 7 i i i¼2 6 7 6 7 ⋮ 6 7   b¼6 7 j n1 k i þ 1  kl 6 7 r ∏ ∑ 6 7 1π ðlÞ ki 6 7 l ¼ 1i ¼ j 6 7 6 7 ⋮ 6 7 6 7 n 4 5 ∑r i¼1

ð19Þ

1π ðiÞ

To illustrate the meaning of the permutation π, consider an arbitrary ordering of the poles of the fit to the model observable, for example in order of decreasing size: p1, p2, …, pn. A valid solution for the rate constants according to Eq. (10) is ki ¼  pi ;

1 r ir n:

ð20Þ

This corresponds to a choice for the permutation π defined by π (i)¼ i, 1r irn. With this choice, the initial state of the model Eq. (19) is 2 3   n1 p i þ 1  p1 r 11 ∏ 6 7 pi 7 i¼1 2 3 6 6 7     b1 6 n1 p 7 n1 p  p  p iþ1 1 iþ1 2 6 7 6 r 11 þ ∏ r 12 7 ∏ 7 6 b2 7 6 p p 6 7 i i i¼2 6 7 6i¼2 7 6 ⋮ 7 6 7 ⋮ 6 7 6 7   ð21Þ 6 7¼6 7 j n1 p 6 bj 7 6  pl 7 i þ 1 6 7 6 r ∏ ∑ 7 1l 6 ⋮ 7 6 pi 7 l ¼ 1i ¼ j 4 5 6 7 6 7 ⋮ bn 6 7 6 7 n 4 5 ∑ r 1i i¼1

As all of the expressions in Eqs. (20) and (21) are only functions of the experimentally determined fit parameters, they explicitly show that the parameters of Eqs. (15) and (17) are identifiable. However, the solution specified by Eqs. (20) and (21) is not unique, as can be seen from the fact that any permutation of the set of poles { p1, p2, …} will give an alternative valid solution for the rate constants k to Eq. (20). Obviously considering Fig. 2, these solutions correspond to different physical realizations of the

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

system. From Eq. (13) it can be seen that permuting the order of two poles pα , pβ (and hence permuting the values for the rate constants) requires that the corresponding residues r 1α , r 1β also be interchanged. Considering Eq. (21), no two initial states are functions of exactly the same set of residues, which implies that permuting them will change at least one of the initial state values. It follows that each of the solutions to the rate constants leads to a unique solution to Eqs. (15) and (17), and hence there are n! possible solutions. Note also that in general, the initial component bj can be written as j

bj ¼ ∑

l¼1



kn  kl kj



n1



i ¼ jþ1



 ki  kl r 1l ki

ð22Þ

From Eq. (22) it follows that selections of k that preserve the nth and jth elements, and consist of permutations of the j þ1 to n  elements and the 1 to j  1 elements, will preserve the value of bj. To see this, we can introduce a function g l ¼ ð∏ni ¼ j1þ 1 ðki kl Þ= ki Þr l , such that bj ¼ ∑jl ¼ 1 ðkn  kl Þ=kj  g l and note that permuting two of k elements jþ 1 to n  1 would leave all gl terms invariant. Similarly changing two of k elements 1 to j  1 will change the order of the summed terms in bj without changing the value of the sum. However, changing the nth or jth elements would change the ðkn  kl Þ=kj coefficient, and result in a different bj value. We can hence state that the number of possible values for bj is hence equivalent to a combinatorial problem: the number of ways to partition n items into four groups of size 1, 1, n  j 1 and j 1. ! n2 This is easily computed to be nðn  1Þ ¼ ðn!Þ= ðn  j  1Þ n  j 1 ! i !ðj  1Þ!. The binomial coefficient is maximal for , floorði=2Þ which implies that the maximum number of solutions occur for j¼floor(n/2).

2.1.8. Application 1 – example – simulation and solving of a chain with n ¼5 In this example the application of the propositions and corollary outlined above to a simulated experimental realization of Application 1 is demonstrated. The model used is an n¼5 component n-chain (i.e. a 5-chain) which might be used to represent an experimental fill/release scheme where a fluorescently-tagged protein is used as a marker for a specific trafficking pathway from the cell surface through a series of organelles within a cell. Experimentally, the cell would be bathed in a solution containing the protein for a fixed amount of time, and then washed and observed over time at the 5th compartment. To simulate this experiment, a set of data corresponding to a time-series measurement of the 5th compartment was

9

generated for the following model: 2 3  0:45 0 0 0 0 6 0:45  0:105 0 0 07 6 7 6 7 d 6 0 0:105  0:92 0 07 xðtÞ ¼ 6 ð23Þ 7xðtÞ dt 6 7 0 0:92  0:3 0 5 4 0 0 0 0 0:3 0  T 29:5649 19:7674 8:5273 0 with an initial state b ¼ 42:1404   and measurement location C ¼ 0 0 0 0 1 . The parameters of the model Eq. (23) were generated randomly, with the initial state given in units of fraction of the total population (of protein, enzyme etc.) and the rate constants given in min  1. Using Matlab, values for x were generated for simulated time points at intervals of 0.4 min, up to 40 min (101 time-points). A standard curve-fitting program in Matlab was then used to produce a fit a 5-term sum of exponentials function of the form of Eq. (4) to the generated time-series for x5 ðt i Þ. The generated fit was as follows: f 1 ðtÞ ¼ 86:8619  expð  0:3001  tÞ þ 10:2288  expð  0:9201  tÞ  146:7978  expð  0:1050  tÞ  50:2929 expð  0:4498  tÞ þ 100:001:

ð24Þ

The generated time-series data and fit are shown in Fig. 7, with the corresponding fitted constants are given in Table 1. It was shown in Application 1 that the poles of the fit correspond to the possible values of the rate constants of model Eq. (15). Applying a priori knowledge in this case can narrow the possible combinations of parameters by noting that the rate constant corresponding to p5 ¼ 0 must denote the end of the chain (i.e. k5 ¼0). The 5 left-hand columns of Table 2 show the permutations of the remaining pi's corresponding to the possible parameter configurations of the system (4! ¼24 in total). Using Eq. (13), possible solutions for the initial populations of the chain corresponding to the possible parameter configurations were generated, also shown in the 5 right-hand columns of Table 2. Once the solutions are generated, a priori knowledge can be used to eliminate physically unrealistic solutions. For example, it is known that the model species are the relative proportion of a total population (such as protein within localized compartments of a cell), thus negative fractions are forbidden in the model. In Table 2, of the 24 possible outcomes only 8 satisfy the condition of a nonnegative initial state. Of these physically-realizable solutions, 7 are highlighted in bold and the actual parameters used to generate the data highlighted in italic, showing that method recovers the real solution. Observing Table 2 it can be noted that the 5 right-hand columns (representing the initial populations in the 5 compartments) may be useful in guiding further experiments to uniquely identify the model parameters. The locations of the model that correspond to the columns that show the widest range of values can be measured to give the greatest chance of finding the true

Table 1 5-Term exponential fit coefficients (rj's) and exponents (pi's). As shown by Eq. (5) these correspond to the residues and poles of the transfer function observable.

Fig. 7. Simulated experimental time-series data and 5-term exponential fit of the form of Eqs. (4) for 5-chain model Eq. (23).



pj ¼

rj ¼

1 2 3 4 5

 0.300  0.920  0.105  0.450 0.000

86.86 10.23  146.80  50.30 100.00

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

Table 2 Possible solutions for the model initial populations. 7 of the 8 physically-realizable solutions are highlighted in bold and the actual parameters used to generate the data highlighted in italic. k1 0.920 0.920 0.920 0.920 0.920 0.920 0.450 0.450 0.450 0.450 0.450 0.450 0.300 0.300 0.300 0.300 0.300 0.300 0.105 0.105 0.105 0.105 0.105 0.105

k2 0.450 0.450 0.300 0.300 0.105 0.105 0.920 0.920 0.300 0.300 0.105 0.105 0.920 0.920 0.450 0.450 0.105 0.105 0.920 0.920 0.450 0.450 0.300 0.300

k3 0.300 0.105 0.450 0.105 0.450 0.300 0.300 0.105 0.920 0.105 0.920 0.300 0.450 0.105 0.920 0.105 0.920 0.450 0.450 0.300 0.920 0.300 0.920 0.450

k4 0.105 0.300 0.105 0.450 0.300 0.450 0.105 0.300 0.105 0.920 0.300 0.920 0.105 0.450 0.105 0.920 0.450 0.920 0.300 0.450 0.300 0.920 0.450 0.920

k5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

b1 171.52 171.52 171.52 171.52 171.52 171.52 42.110 42.110 42.110 42.110 42.110 42.110 36.196 36.196 36.196 36.196 36.196 36.196 64.809 64.809 64.809 64.809 64.809 64.809

b2  253.19  253.19  200.83  200.83  120.46  120.46  123.77  123.77  17.77  17.77 29.61 29.61  65.50  65.50  11.86  11.86 44.01 44.01  13.75  13.75 6.91 6.91 15.40 15.40

b3

b4

"

b5

V¼ 157.31 173.15 104.96 123.62 40.42 43.25 157.31 173.15 51.31 72.88 19.76 25.50 104.96 123.62 51.31 72.88 14.11 17.01 40.42 43.25 19.76 25.50 14.11 64.809

24.35 8.52 24.35 5.68 8.52 5.68 24.35 8.52 24.35 2.78 8.52 2.78 24.35 5.68 24.35 2.78 5.68 2.78 8.52 5.68 8.52 2.78 5.68 2.78

0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

[italic] ¼ correct solution to 0 d.p. [bold] ¼ physically realistic solutions.

parameter combination. For example, observe that the columns for b2 and b3 show 24 different possible pairs outcomes, compared to the 12 possible pairs of outcomes for b1 and b4. Hence performing experiments to measure the initial state at the 2nd and 3rd locations of the model would fully determine the correct solution for the rate constants, and hence the unique correct solution for the model parameters (rate constants and initial state). Once the unique solutions for the model parameters are determined, then the time courses for the protein concentration at all locations of the model are readily determined. Hence given a time-series measurement at the 5th compartment, together with the measurement of the initial state at two intermediate locations (2nd and 3rd compartments) we may fully determine the time-courses at all the model locations. 2.1.9. Application 2 – single feedback loop In the following example the methods of this section are applied to the feedback loop shown in Fig. 3. This configuration is seen for example in the first two compartments of the reaction network model used by Belward et al. (2011) shown in Fig. 1. The equations governing the single feedback loop model are " #  k1 k1 d x¼ x ð25Þ k1  ðk  1 þk2 Þ dt The parameters of the model are the 3 rate constants, k1 , k  1 , k2 , and the 2 initial state parameters b1 , b2 . The eigenvalues of " # k1  k1 A¼ are straight-forwardly shown to be k1  ðk  1 þ k2 Þ given by the roots of the quadratic equation s2 þ ðk1 þk  1 þ k2 Þs þ k1 k2 ¼ 0

straight-forward computation it can also be shown that the eigenvectors of A are given by the columns of

ð26Þ

Define the two roots  of Eq. T(26) as Q and R, and without loss of generality choose λ ¼ Q R . Here it can be noted that there are 3 parameters of the model Eq. (25) and only (at most) 2 eigenvalue equations, and hence the eigenvalue equations of the model Eq. (25) are not enough to identify the model parameters. By

k1

k1

Q þ k1

R þ k1

# ð27Þ

Considering Eq. (27), first it can be noted that assuming k  1 4 0 (i.e. the feedback loop exists – this condition also ensures Q a  k1 ; R a  k1 ), the elements V are non-zero, so that any (non-zero) choice of C will satisfy the first condition of Proposition 1 for Cn to have non-zero columns. It also follows that V  1 will also have no zero elements, and hence it is required that b must have at least one non-zero component in order to satisfy the second condition of Proposition 1 that bn have no zero components. This is easily achieved by any number of input schemes, such as a pulse or fill/release scheme. Note that Eq. (27) cannot be expressed solely in terms of the eigenvalues Q and R (this would require writing k1 and k  1 as functions of Q and R which would imply that the model is identifiable from the eigenvalue equations). This implies that, in order to apply Corollary 1, multiple independent time-series measurements of locations of x are needed to obtain residue relations. Noting that Eq. (25) has only two components (i.e. n ¼2), a choice of C that has rank 2 would (by definition) give model observables that are linearly independent combinations of the components of Eq. (25), and hence provide non-trivial residue relations via Eq. (14). An obvious option is to make independent measurements at both locations, corresponding to C ¼I2. However, to illustrate the fact that C can be dependent on the model parameters and to align with the linked model concepts described " # 1 0 in Section 2.2, this example will use C ¼ , signifying a 0 k2 measurement of the output of the second compartment (along the k2 path) as well as the internal behavior of the first compartment. How would this measurement of output be achieved in practice? Consider that measurements of a model can be broadly thought of as being the difference of two components: the input into the compartments measured and the output from the compartments. Arranging measurements such that either of these is zero (or negligible) will allow the other to be inferred. In other words, the output of a closed model with no inputs (above the initial state) can be inferred by the decrease in measurements of all locations of the model over time (as with the exit rate example explained in the text below Eq. (2)). Conversely, if the output of a model is known to input into another closed model (an observable model as defined in Section 2.2) then that input can be inferred from the increase in measurements of all locations of the observable model. The simplest illustration of this would involve the output of a model (such as the output of the single-loop feedback model along the k2 path) linking to a single compartment with no output. In this case, measuring at this compartment over time would give exactly the output of the single-loop model along the k2 path. With this choice it follows from Eq. (2) that: " y¼

1

0

0

k2

# x

ð28Þ

The proposed experiment will produce two fits of the form of Eq. (5), which will be denoted L{f}1 and L{f}2. With the proposed experiment implied by Eq. (28), by Proposition 1, Q and R are identifiable from Eq. (10). Initially choosing the permutation π such that π(1) ¼1, π(2)¼ 2, the earlier iT  T h choice of λ implies Q R ¼ p1 p2 . From Eq. (14) the

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

following relationships for the fits L{f}1 and L{f}2 can be written: " # " # k2 ðp1 þ k1 Þr 11 k  1 r 21 ¼ ð29Þ k2 ðp2 þ k1 Þr 12 k  1 r 22 which can be shown to have the solution for the parameters k  1 =k2 and k1 2 3 ðp2  p1 Þr 11 r 12 " # k  1 =k2 6 r 11 r 22 r 12 r 21 7 7 ð30Þ ¼6 4 p2 r 12 r 21 p1 r 11 r 22 5 k1 r 11 r 22 r 12 r 21 With the above choice of π it can be noted that Eq. (26) implies two equations: p1 þ p2 ¼ k1 k2 and p1 p2 ¼ k1 þ k  1 þ k2 , which are equivalent to the eigenvalue equations. Only one of these is needed to determine k2 k2 ¼

p1 p2 p1 p2 ðr 11 r 22 r 12 r 21 Þ ¼ p2 r 12 r 21  p1 r 11 r 22 k1

ð31Þ

and from Eq. (30) it follows that: k1 ¼

ðp2  p1 Þr 11 r 12 ðp p1 Þp1 p2 r 11 r 12 k2 ¼ 2 r 11 r 22  r 12 r 21 p2 r 12 r 21  p1 r 11 r 22

ð32Þ

Hence Eqs. (30)–(32) show that the parameters of A of Eqs. (25),(28) can be expressed in terms of experimentally determined quantities from measurements over time at location 1 and of the output along the k2 path. By Corollary 1 it follows that Eqs. (25) and (28) are identifiable with these measurements. Applying Eq. (13) with some computation it can be shown that the initial state can be written explicitly as " # r 11 þ r 12 b ¼ r21 þ r22 : ð33Þ

11

relationships required. The corresponding observables of the model are 2 3 k1 6 7 6 k2 7 7 y¼6 ð35Þ 6 ⋮ 7x: 4 5 kR By considering Cn ¼ C observe that each element of the transfer function will have an identical pole corresponding to the negative of the sum of the rate constants. From Eqs. (10) and (14) the residue relations of Eqs. (34) and (35) are 2 3 2 3 k1 r 21 k2 r 11 6 ⋮ 7 6 ⋮ 7 ð36Þ 4 5¼4 5 kR r 11 k1 r R1 which in combination with the remaining eigenvalue equation i p1 ¼  ∑Ri¼ 1 k1 , after some manipulation can be solved uniquely for the rate constants 2 3 2 3 r 11 k1 6 7 6 r 21 7 6 k2 7 7 6 7 ¼  p1 6 ð37Þ 6 7: 6 ⋮ 7 R 4 ⋮ 5 ∑ r 4 5 i ¼ 1 i1 r R1 kR Applying Corollary 1, the initial state b follows from Eq. (13): b¼

∑R r i1 r 11 ¼  i¼1 : k1 p1

ð38Þ

In conclusion, the diverging R-branch model is uniquely identifiable from measurements over time of the output of each of the model branches.

k2

The solution given by Eqs. (30)–(33) is unique, despite there being two choices for the permutation π. To see this, note that the other permutation, given by π(1) ¼2, π(2)¼ 1 corresponds to the transformation p12p2, rj12rj2, j¼1, 2. It is easily checked that Eqs. (30)–(33) are invariant under this transformation. 2.1.10. Application 3 – diverging R-branch The above methods may also be applied to the diverging R-branch model, as shown in Fig. 4. Mathematically, the ‘diverging branch point’ model is described by R dx ¼  ∑ ki x: dt i¼1

ð34Þ

The parameters of model  T Eq. (34) are given by the R rate constants k ¼ k1 … kR , and the initial state b. The eigenvalue of Eq. (34) is given by λ ¼  ∑Ri¼ 1 ki , with the corresponding eigenvector V ¼1. It is clear that the R rate constants cannot be determined from the eigenvalue, however, residue relations can be obtained by measuring the output along various branches (see text above Eq. (28) for a discussion of how this could be achieved experimentally). To individually determine the rate constants at least R  1 residue relations are needed, which would give R total equations for which to solve the R unknowns. The condition of non-zero columns of Cn of Proposition 1 is trivially satisfied by any non-zero C (i.e. any measurement). Noting also that V  1 ¼1, it follows that at least the initial state b 40 to satisfy that bn be non-zero, which is the second condition of Proposition 1. This condition can be satisfied for example by a fill/release scheme or a pulse introduced at the model location. To obtain the residue relations needed to determine the rate constants, an experiment measuring the output along the each of the R branches independently would give the R1 residue

2.2. Observing and identifying linked models This section extends the work of the previous section to complex models which contain as sub-components the simple models (n-chain, single-loop feedback, diverging R-branch) considered above. For example, the model used by Belward et al. (2011) (Fig. 1) has features that are recognizable as examples of these simple models, combined in a more complicated form. For the dual purposes of building an intuitive understanding of these complicated models and to develop computationally useful tools for dealing with them, this work introduces the concepts of linked, hidden and observable models. Essentially, two independently parameterized models are considered to be linked if the outputs of one model are included as inputs into the other, as demonstrated in Fig. 8 (a more rigorous definition is discussed below). If the first model (model A in Fig. 8) is unable to be directly measured it is considered hidden, the other model, on which experimentation is possible, is known as observable (model B in Fig. 8). As will be shown later, the hidden model does not need to be identifiable, however the structures of the hidden and observable models need to be defined (i.e. models must have explicitly defined A, b and C matrices). 2.2.1. Discussion of terms A linked model consists of two component models. In the following the subscripts H and O are applied to refer to the hidden and observable realizations of the corresponding variables as defined in Section 2.1. For example, in the following the governing equations of the hidden model of degree nH are denoted: d xH ðtÞ ¼ AH xH ðtÞ dt yH ðtÞ ¼ CH xH ðtÞ:

ð39Þ

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

12

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

of the linked model given that the hidden and observable models are themselves identifiable. In a large number of experimental situations it may be possible to measure at hidden model locations, breaking the assumption that hidden model locations are not directly measurable. In such cases, it can be seen that the hidden model behavior is completely independent of the observable model, and the analysis of Section 2.1 can be applied unaltered. The information obtained via observable model measurements can under certain circumstances (see Corollary 3) be used to recover the hidden model transfer function, and hence provide additional effective measurements of the hidden model which can be combined with direct measurements.

Fig. 8. The output of the hidden model A is input into an observable model B which can be measured. The objective is to see what can be inferred about model B and model A by measuring model B. It is assumed that the structure of the hidden model is defined, though not necessarily measureable.

2.2.2. Linked model transfer function Suppose that the linked model Eq. (41) satisfies Assumption 1. " # AH 0 The eigenvalues of AL ¼ can easily be shown to be given CH A0 by

λL ¼ with the hidden model initial state vector bH . The observable model of degree nO , without input from the hidden model, is governed by the equations d xO ðtÞ ¼ AO xO ðtÞ; dt yO ðtÞ ¼ CO xO ðtÞ

ð40Þ

with initial state bO . With (39) and (40) a linked model can be constructed by #" " # " # xH ðtÞ AH 0 d xH ðtÞ ¼ ; xO ðtÞ CH A0 dt xO ðtÞ yL ðtÞ ¼ CO xO ðtÞ:

ð41Þ

It can be seen that the input to the observable model from the hidden model is exactly yH . Here, the link matrix CH is an nO  nH matrix which can be a function of the parameters of AH , defined so that C Hαβ 4 0 if a link exists between the hidden model component xHβ and the observable component xOα , and C Hαβ ¼ 0 otherwise. Note that the observables yL ðtÞ of the linked model Eq. (41) consist of superpositions of the states of the observable model only, hence the term hidden for the component model Eq. (39). A formal definition of a linked model is as follows. Definition 2. A linked model is a model of with a combined form Eq. (41), with a component hidden model given by Eq. (39) and a component observable model Eq. (40). In the following, it is assumed that measurement opportunities are restricted to the locations of observable component of the model, as this work is focused on the problem of how the hidden model can be indirectly measured, via measuring locations of the observable model. Two interesting questions which are considered in the following are:

"

λH λO

# ð42Þ

where λH and λO are the eigenvalues of AH and AO respectively. It follows that if AL has distinct eigenvalues, then AO and AH will also have distinct eigenvalues and hence both component models Eqs. (39) and (40) satisfy Assumption 1. 2.2.3. A note on Assumption 1 for linked models By Eq. (42), a linked model composed of diagonalizable component models with distinct eigenvalues will also be diagonalizable if no component eigenvalues are the same. For the majority of cases this should hold, however, at most one component model can have a zero eigenvalue. With Assumption 1, it follows that time-series measurements of the system Eq. (41) will have the form corresponding to Eq. (4). It can be shown that the transfer function Eq. (6) for the linked model Eq. (41) is

φL ¼ CO ζðsI  diagðλH ÞÞ  1 bnH þ CnO ðsI diagðλO ÞÞ  1 ðbnO  VO 1 ζbnH Þ ð43Þ with ζ given by 2

2 n 3 C H11 6 7  16 n 6 ð λ I A Þ 4 C H21 5 ζ ¼ 4 H1 O ⋮

33 C nH1nH 6 77 ðλHnH I  AO Þ  1 4 C nH2nH 5 7 5 ⋮ 2



ð44Þ

where C nHαβ are the elements of CnH . One can interpret the rows of ζ as denoting the relative components of the hidden model eigenstates that can be observed from the corresponding observable model component. To see this, it can be instructive to derive the eigenvectors of AL corresponding to Eq. (42), given by the columns of " # VH 0 VL ¼ : ð45Þ ζ VO

 How is the identifiability of the observable model affected when considering it as part of a linked model?

 What can be inferred about a hidden model if measurements of the observable part only are possible? To help answer these questions, the following presents some propositions and corollaries on the information that can be determined about the hidden and observable components of a linked model, from measurements over time of the observable model. Also presented is a sufficient condition for the identifiability

It is illustrative to interpret Eq. (43) in 2 parts: terms with poles corresponding to the hidden model eigenvalues λH , and terms with poles corresponding to the observable model eigenvalues λO . The first term of Eq. (43), corresponding to the hidden model eigenvalues, is similar in form to Eq. (9), but with CnH replaced by n C^ ¼ CO ζ:

ð46Þ

This new variable represents the components of the hidden model eigenstates that are observed through time-series

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

measurement of the observable model. The second term, corresponding to the observable model eigenvalues, is also of similar form to Eq. (9), but with the initial components of the corresponding eigenstates of the linked model Eq. (41) being composed of a n contribution from the initial components bO of the observable model eigenstates and a contribution from the eigenstates of the n hidden model VO 1 ζbH . Define n n b^ ¼ VO 1 ðbO  ζbH Þ:

ð47Þ

With Eqs. (46) and (47) the diagonalized transfer function of the linked model can be expressed as n

n

φL ¼ C^ ðsI  diagðλH ÞÞ  1 bnH þ CnO ðsI  diagðλO ÞÞ  1 b^

ð48Þ

The coefficients expressed by Eqs. (46),(47) are important indicators of the information that can be obtained from a proposed experimental set-up, as will be seen in the following. As with Section 2.1, in the following a series of propositions and corollaries are presented that link the parameter form of the transfer function Eq. (48) with experimentally measured fits of the form of Eq. (5), beginning by considering experimental requirements for obtaining eigenvalue equations from the linked model. Proposition 4 – eigenvalue equations. I. For a linked model Eq. (41) and given Assumption 1, an eigenvalue λHi of the hidden component model Eq. (39) is given by a pole pα of a fit of the form of Eq. (5) to the linked model if the n n ith column of C^ and the ith element of bH are non-zero. II. An eigenvalue λOi of the observable model Eq. (40) is given by a pole pj of a fit of the form of Eq. (5) to the linked model if the jth n column of CnO and the jth element of b^ are non-zero. By comparing Proposition 4 to Proposition 1 with regards to the component models, we see the slightly changed condition in that n C^ replaces CnH as an important indicator of the measurability of n the hidden model eigenvalues. The condition on C^ (of non-zero columns) corresponds to whether or not the eigenstate dynamics of the hidden model Eq. (39) are transferred into the observed components of the linked model. It can be noted that in the case when the conditions of Proposition 1 hold for an eigenvalue of the observable model Eq. (40) then in general that eigenvalue will appear as a pole of a fit to the linked model transfer function, as for n all j C nOhj bOj will be non-zero for some h. Thus, in general, the eigenvalue equations of the observable model (considered independently using the methods of Section 2.1) will also appear as eigenvalue equations of the linked model. Note also that from Eq. (47), even if the initial state of the observable model is zero, the eigenstates of the observable model may still be measurable n (i.e. b^ a 0). For example, this might correspond to an experimental scheme where a pulse or tracer input is introduced to a hidden component model location. In a similar vein to the single model eigenvalue equations Eq. (10), an expression for the linked model eigenvalue equations can be derived. Let u be the number of eigenvalues λHi1 , λHi2 , …, of the hidden model, and v be the number of eigenvalues λOj1 , λOj2 , …, corresponding to the observable model, where the conditions of Proposition 4 hold. The linked model eigenvalue equations can be expressed as h iT λHi1 … λHiu λOj1 … λOjv h ¼ pπ ð1Þ



pπ ðuÞ

pπ ðu þ 1Þ



pπ ðu þ vÞ

iT

ð49Þ

for some arbitrary permutation π. As explained in the text following Eq. (10), the permutation π represents the ways of

13

ordering the terms of an experimental fit to correspond to the theoretical model eigenvalues.

2.2.4. Initial state As can be seen from Proposition 4, the eigenvalue equations of the linked model only provide information about the parameters of AH and AO , i.e. the hidden and observable model rate constants, kH and kO . As with the single-model in Section 2.1, for the linked model information relating to the initial states bH and bO can be obtained from fit residues, as shown in the following proposition. Proposition 5. I. For a linked model Eq. (41) and given Assumption 1, if all of the n columns of C^ have at least one non-zero element then for the h iT n non-zero elements of bH a vector r h1 π ð1Þ … r hu π ðuÞ of residues corresponding topπ ð1Þ , … pπ ðuÞ can be formed, where π is the permutation defined by Eq. (49), and h1 , … hu are the indices of any fit (possibly the same) of the form of Eq. (5) with non-zero residues corresponding pπ ð1Þ , … pπ ðuÞ . The initial state bH of the hidden model Eq. (39) can be expressed in terms of the vector of residues via 2V 3 V H1i1 3 ⋯ ^ nH1iu 2 r n C hu iu 7 h1 π ð1Þ 6 C^ h1 i1 6 76 7 V ⋮ 7 bH ¼ 6 ð50Þ 5 6 ^ nH2i1 ⋯ V^ nH2iu 74 4 C h 1 i1 C hu iu 5 r hu π ðuÞ ⋮ ⋮ n

where i1 , … iu are the indices of non-zero elements of bH . II. Additionally to I, if all of the columns of CnO have at least one non-zero element, then Proposition 2 holds for the linked model, and a vector of residues h iT r g1 π ðu þ 1Þ … r gv π ðu þ vÞ corresponding topπ ðu þ 1Þ , …, pπ ðu þ vÞ can be formed, where g 1 , … g v are the indices of any fits (possibly the same) of the form of Eq. (5) with non-zero residues corresponding pπ ðu þ 1Þ , … pπ ðu þ vÞ . The initial state bO of the observable model can be related to the u þ v residues of the fits Eq. (5) to the linked model via 2ζ 3 2 3 ζ 1i1 V O1j1 V 3 ⋯ ^ n1iu 2 r n ⋯ C nO1jv 2 r g π ðu þ 1Þ 3 C hu iu 7 h1 π ð1Þ 6 C^ h1 i1 6 C nOg1 j1 Ogv jv 7 1 6 76 76 7 6 7 ⋮ ζ 2i1 V O2jv 74 ζ 2iu 74 ⋮ 5 þ 6 V O2i1 bO ¼ 6 5 6 n 7 6 Cn ⋯ Cn 7 4 C^ h i ⋯ C^ nhu iu 5 r 4 Og1 j1 Ogv jv 5 r g v π ðu þ vÞ 1 1 hu π ðuÞ ⋮ ⋮ ⋮ ⋮ ð51Þ n

where j1 , …, jv , are the indices of non-zero elements of b^ . From the systems of equations given by Eqs. (50),(51) it can be seen that there are three matrices that are important in evaluating the initial state of the linked model: VH , VO and ζ. It follows that the initial state of the hidden and observable models can be determined from Eqs. (50) and (51) if kH and kO are known from solving eigenvalue equations or residue relations as discussed below. Alternatively, if the initial states are known a priori, for instance by experimental design or independent measurement, then the equations provide additional information that can be used to narrow down kH and kO . 2.2.5. Residue relations If the components of VH , VO and ζ are not deducible from the eigenvalue equations, residue relations for the linked model can be found in a similar manner to the single-model version in Section 2.1.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

14

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

Proposition 6 – residue relations. For a linked model Eq. (41) and given Assumption 1, residues of an identical pole in two fits L{f}g, L{f}h are related by n n C^ gib r hπ ðbÞ ¼ C^ hib r gπ ðbÞ ;

1 rb r u;

C nOgjc r hπ ðu þ cÞ ¼ C nOhjc r gπ ðu þ cÞ;

1 rc r v:

ð52Þ

where ib and jc are the indices of the eigenvalues corresponding to pπ(b) and pπ(u þ c) respectively as defined by Eq. (49). There are at most a number of equations for the elements of ζ of the form of the first n row of Eq. (52) equal to the number of non-zero elements of C^ minus the number of non-zero columns. Similarly, there are at most a number of equations for the elements of the eigenvectors VO of the observable model Eq. (40) equal to the number of non-zero elements of CnO minus the number of non-zero columns. Recall that the first of the two questions posed at the beginning of this section related to the effect on the identifiability of a model when considered as part of a linked model. One interesting consequence of Eqs. (49) and (52) is that the eigenvalue equations obtained for the observable model eigenvalues, and the corresponding residue relations are exactly equivalent to the single model eigenvalue equations and residue relations, if the observable model was considered by itself. This implies the following Corollary. Corollary 2. For a linked model Eq. (41) and given Assumption 1, if kO is identifiable from the eigenvalue equations and residue relations from the observable model Eq. (40) transfer function, then they are identifiable from the eigenvalue equations and residue relations obtained from the linked model transfer function. Note that identifiability results for the initial state of the observable model are not necessarily applicable, due to the dependence on ζ in Eq. (51). It can be noted that, for the case where the initial states bH , bO of the component models are themselves model parameters (and therefore independent of kH and kO ), if kH and kO are identifiable from the eigenvalue equations Eq. (49) and residue relations Eq. (52) from Proposition 6 then the conditions of Corollary 1 hold for the linked model. If this is true and the conditions of Proposition 5 also hold for the linked model, then the linked model is identifiable. 2.2.6. Hidden model transfer function As stated in Corollary 2, the observable component model eigenvalue equations and residue relations are unchanged when measuring the linked model. The second interesting question raised at the beginning of this section relates to what information about the hidden model can be obtained from measurements of the observable model. To begin to answer this, it can be noted that by Eq. (41) the elements of the input to the observable model are exactly the observables of the hidden model Eq. (39), and hence are linear combinations of the hidden model components, and have the form of sums of exponentials. In essence, the linked model can be thought of as introducing these sums of exponentials as input to (components of) the observable model. It can be shown that with a careful choice of (observable) locations to measure, the input to the observable model, i.e. the transfer function of the hidden model Eq. (39), can be recovered from the linked model transfer function. It follows that eigenvalue equations, equations for the initial state and residue relations can be obtained for the hidden model via the propositions described in Section 2.1. Consider the proposition and following corollary below. Proposition 7 – hidden model transfer function. For a linked model Eq. (41) and given Assumption 1, if (at least one element of) n n the ibth column of C^ and the ibth element of b are non-zero, then, H

n

denoting the indices of non-zero elements of the ibth column of C^ as h iT of the residues of the pole g 1 , g 2 …, a vector r g1 π ðbÞ r g2 π ðbÞ … pπ(b) corresponding to the ibth eigenvalue can be formed, where π is the permutation defined in Eq. (49). Denoting the indices non-zero elements of the ibth column of CnH as h1 , h2 , …, if the matrix 2 3 C O ðλHib I  AO Þg1 h11 C O ðλHib I  AO Þg1 h12 ⋯ 6 7 6 C ðλ I  A Þ  1 C ðλ I  A Þ  1 7 ð53Þ 4 O Hib 5 O g 2 h1 O Hib O g 2 h2 ⋮ is full rank, then the vector of residues can be related to the numerators of the elements of the transfer function of the hidden model Eq. (39) via 2

3 2  1 n C O λHib I  AO g h C nHh i bHi 1 1 6 n 1b n b7 6  1 6C 7 6 4 Hh2 ib bHib 5 ¼ 4 C O λHib I  AO g h 2 1 ⋮ ⋮

 1 C O λHib I  AO g h 1 2  1 C O λHib I  AO g h 2 2



3  12 7 7 5

r g1 π ðbÞ

3

6r 7 4 g2 π ðbÞ 5 ⋮

ð54Þ where C O ðλHi I  AO Þαβ are the elements of CO ðλHi I AO Þ 1

1

. Note that

n the ibth column of C^ must have at least as many non-zero elements as the corresponding column of CnH , in order for Eq. (53) to be full rank.

Corollary 3. For a linked model Eq. (41) and given Assumption 1, the input to the observable model (i.e. the hidden model Eq. (39) transfer function) can be recovered from the linked model transfer function if the eigenvalues λO and eigenvectors VO of the observable model Eq. (40) and the eigenvalues λH for the hidden model Eq. (39) are identifiable from the eigenvalue equations and residue relations of n the linked model transfer function, and for all non-zero columns of C^ the conditions of Proposition 7 hold. 2.3. A note on non-hidden models An important thing to note here is that no assumptions have been made about the identifiability of the hidden model from the recovered input to the observable model, hence Corollary 3 holds even if the hidden model Eq. (39) is not identifiable. If this is the case, and it is in fact possible to make measurements at additional locations of the hidden model (contravening an assumption of the above propositions), then, as mentioned earlier, any additional experiments can be combined with the elements of the input to the observable model recovered from measurements of the linked model to construct a combined transfer function for the hidden model. The methods of Section 2.1 can be applied to all the available data to attempt to identify the hidden model. Returning to the first of the questions at the beginning of this section, recall that as stated earlier the initial state of the observable model non-trivially has a dependence on ζ. Consider the following corollary. Corollary 4. For a linked model Eq. (41) for which the conditions of Corollary 3 hold and given Assumption 1, the initial state of the observable model is identifiable. The initial state can be evaluated by 2 n 2 n n 3 n 3 C H1i1 bHi1 C H1iu bHiu 6 7 6 7 bO ¼ ðλHi1 I  AO Þ  1 4 C nH2i bnHi 5 þ ⋯ þ ðλHiu I  AO Þ  1 4 C nH2iu bnHiu 5 1 1 2

V O1j1

6 C nOg1 j1 6 V þ6 6 C nO2i1 4 Og1 j1 ⋮

⋯ ⋯

V O1jv C nOgv jv

⋮ 3

⋮ 2

3

7 r g1 π ðu þ 1Þ 76 7 ⋮ V O2jv 74 5 7 C nOg j 5 v v r gv π ðu þ vÞ ⋮

ð55Þ

where i1,…,iu, j1,…,jv, g1,…,gv are as described in Proposition 5.

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

As with Corollary 3, no assumptions have been made about the identifiability of the hidden model from the hidden model transfer function, and hence as stated earlier it follows that the observable model may be identified even in cases where the hidden model is unidentifiable, or of unknown form. The following are some examples of applications to linked models of the above propositions and corollaries to determine the models' identifiability. 2.3.1. Application 4 – linked n-chain model As shown in Corollary 3 above, the linked model methodology can under certain circumstances allow the recovery of the input to an observable model. The following example considers a linked model consisting of an arbitrary hidden model of the form of Eq. (39) linked into the beginning of an n-chain model described by Eqs. (15) and (17). This is similar to the model shown in Fig. 5, with an arbitrary model input replacing the fixed source. In the following, the application of Corollary 3 is demonstrated to recover the input to the observable model (i.e. the hidden model transfer function). Recall that for the observable model, the n-chain model of Application 1, measurements at the end of the chain are sufficient to identify the model. It follows from Corollary 2 that the eigenvalues and eigenvectors of the observable model are identifiable from measurements of the linked model composed of the n-chain and hidden input model. Recall that the eigenvalues of the hidden model are also identifiable if the conditions n n of Proposition 4 – that C^ and bH have non-zero columns and elements respectively – hold for the eigenvalues of the hidden model. To compute ζ, we first compute the set of matrices 2 

λHb I  AO

1

1 λHb þ k1

6 6 k 6 ðλHb þ k1 Þð1λHb þ k2 Þ ¼6 6 ⋮ 6 1k 4 ∏ni ¼ 1 i ∏ni ¼ 1 ðλHb þ ki Þ

1

λHb þ k2



1k ∏ni ¼ 2 i

∏ni ¼ 2 ðλHb þ ki Þ

⋱ …

1

λHb þ kn

7 7 7 7 7 7 5

C nH11

λH1 þ k1 6 n 6 6 ðλ þkk1 CÞðH11 H1 1 λH1 þ k2 Þ ζ¼6 6 ⋮ 6 4 C n ∏n  1 ki H11

i ¼ 1

∏ni ¼ 1 ðλH1 þ ki Þ

C nH12

λH2 þ k1 k1 C nH12 ðλH2 þ k1 ÞðλH2 þ k2 Þ



n

1k C H12 ∏ni ¼ 1 i ∏ni ¼ 1 ð H2 þ ki Þ

λ



ð56Þ

3

7 7 ⋯7 7 7 7 5 …

By Eqs. (46) and (17) it follows that:  n n1  C H11 ∏ ki C nH12 ∏n  1 ki C^ ¼ eTn ζ ¼ ∏n ðλiH1¼ þ1 ki Þ ∏n ðλiH2¼ þ1 ki Þ ⋯ n

i ¼ 1

i ¼ 1

λ

∏n ð Hi þ ki Þ n r 1π ðbÞ C nH1ib bHib ¼ i ¼ 1 n  1b ∏i ¼ 1 ki

ð59Þ

with the corresponding poles given by Eq. (49). As Corollary 4 holds under the same conditions as Corollary 3, Eq. (55) can be applied, which after some computation gives the initial state of the observable n-chain model as 3 2 n  1 λHib þki þ 1 ∏ 7 6 2 3 ki 6i¼1 7 bO1 6  7 6 7 6 n  1 λHib þki þ 1 7 6 bO2 7 6 ∏ 7 6 7 7 6 ki 6 ⋮ 7 7 u 6i¼2 6 7 6 7 ⋮ 6 7¼ ∑ 6 7r 6 bOj 7 b ¼ 1 6 n  1  7 1π ðbÞ 6 7 6 7 λ þk Hi i þ 1 b 6 ⋮ 7 6 ∏ 7 4 5 6 7 ki 6 i¼j 7 6 7 bOn 4 5 ⋮ 2

1

3  ki þ 1  k1 r 1π ðu þ 1Þ 6 7 ki 6 7 i¼1 6 7    6 n  1k 7 n1 k  k k 1 2 i þ 1 i þ 1 6 ∏ 7 þ r r ∏ 6 1π ðu þ 1Þ 1π ðu þ 2Þ 7 ki ki 6i¼2 7 i¼2 6 7 6 7 ⋮ 6 7   þ6 7 n1n1 k  k 2 iþ1 6 7 r 1π ðu þ cÞ ∑ ∏ 6 7 k 6 7 i c ¼ 11 ¼ c 6 7 6 7 ⋮ 6 7 n 6 7 4 5 ∑ r 1π ðu þ jÞ 

n1



ð60Þ

Noting that in our example, all hidden model links are to the initial chain location, implying that CnH is a matrix with a single non-zero row, i.e. C nHαβ ¼ 0, for α 41. Applying Eq. (44) we find that ζ is given by 2

g 1 ¼ g 2 ¼ ⋯ ¼ 1, it follows that:

j¼1

3

0

15

ð57Þ

Hence we have shown that the observable component of the linked model is identifiable. So, what does it mean to have an experimentally determined measurement of the input to the observable model? This information is closely related to that provided by experimental fits to observations of the hidden model itself, in that it can provide the same eigenvalue equations, residue relations and equations for the initial state, i.e. once the input to the observable model is determined, the identifiability of the hidden model can be investigated using the methods described in Section 2.1. As noted earlier, if measuring at hidden model locations is possible, then conducting experiments to measure at these locations over time will contribute additional equations that can be combined with those inferred from measurements of the observable model via Eq. (54) and Eq. (49). Additionally, as demonstrated by Eq. (60), if the input to the observable model is known then it is possible to isolate the observable model component of the linked model behavior and determine the initial state of the observable model.

ð58Þ

Hence, with the assumptions that CnH has non-zero columns (i.e. C nHj1 a 0 for each j¼1,2,…) and no eigenvalues of the hidden n model equal any observable rate constants (Assumption 1), C^ will n be a single row of non-zero elements. Requiring also that bH have no zero components, it follows that for the linked model, the conditions of Proposition 4 hold for all of the hidden model eigenvalues λH . It also follows that the conditions of Corollary 3 hold for all columns of Eq. (58) (as there will be one non-zero n element of C^ for each non-zero element C nHj1 , j¼1,2,…), and hence the input to the observable model is recoverable. The residues of the input transfer function are found explicitly from Eq. (54). Noting that the single measuring location as defined by Eq. (17) implies a transfer function with only one element (and hence a single experimental fit), the index of which we denote h1 ¼

2.3.2. Example – constant input hidden model The linked model methodology can be used to investigate some commonly applied experimental schemes, as the following instance of Application 4 shows. An experimental scheme of interest involves applying a constant input to a model. This scheme often serves as the fill (or saturate) phase of a fill/release scheme was discussed in the Introduction. What can be learned from the fill phase? The fill scheme can be investigated by including a simple single-component fixed source model as the hidden model into Application 4, as shown in Fig. 5. The hidden fixed source is modeled as dxH ¼ 0; dt y H ¼ e1 k0 x H :

ð61Þ

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

16

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

The system Eq. (61) can be easily solved to get yH ¼ e1 k0 bH

ð62Þ

for an initial state bH . That CH ¼ k0 e1 indicates that the model output along the k0 path is mapped to the first component of Eqs. (15),(17). It can be seen that the system Eq. (62) has two parameters: the entrance rate k0 , and the initial (constant) state bH . To give physical examples of these parameters, consider the cell bathing experiment described in the Introduction. The fixed source location xH would correspond to the cell surface, with the initial state bH corresponding to the maximum amount of protein that can attach to the surface, which is constant as protein is instantly replaced as it is internalized. The entrance rate k0 would correspond to the rate that surface attached protein is internalized. Taking the Laplace transform of Eq. (62) it can be noted that the two parameters appear only as a product k0 bH in the model transfer function. Hence as defined the individual parameters cannot be individually determined and the fixed source model is not identifiable without a priori knowledge of either of the parameters. Note that from Eq. (61) it follows that λH ¼ 0 and V H ¼ 1. Hence for Assumption 1 to hold for the linked model it is required that none of the n-chain eigenvalues (i.e. the chain rate constants) are zero. If this is the case the observable n-chain component can still be identified, as shown above. To see how the constant input affects the initial states of the n-chain and to determine the relative size of the input, note that C H11 ¼ k0 and hence we can apply Eq. (60) to get

2.3.3. Steady state solution For an experimental situation where a fill/release scheme is used (on an open system), if the fill scheme is continued long enough, the system will saturate, i.e. reach a steady state as t-1. In the above example, the steady state can be shown to be 2 3 2 k0 3  k0 k1 6 7 6⋮7 7 xO ðt-1Þ ¼ AO 1 4 ⋮ 5 ¼ 6 ð65Þ 4 5 k0 0 kn If the source (input) is then removed from the observable model, then the release stage occurs as per the n-chain model as described by Application 1, with the initial state given by Eq. (65). It can be shown that the following additional constraints apply to the initial state of the release stage: 2 3 2 kn bOn 3 bO1 k1 6 7 6 kn bOn 7 7 6 bO2 7 6 6 k2 7 6 7 7 6 ⋮ 7 6 6 6 7 6 ⋮ 7 ð66Þ 6 7 ¼ 6 kn bOn 7 6 bOl 7 6 k 7 6 7 6 l 7 7 6 ⋮ 7 6 ⋮ 7 4 5 4 5 kn bOn bOn  1 k n1

By comparing to Eq. (19), the n  1 equations from Eq. (66) can be used to limit the possible values of the rate constants ki , for example by computing the values of Eqs. (19) and (66) for different permutations of the poles and residues of the model fit and finding the values for which they match. 2.4. Application 5 – linked single-loop feedback model

2

3   n1 k i þ 1  k1 kn r þ r ∏ 1π ð2Þ 6 7 k1 1π ð1Þ ki 7 i¼1 2 3 6 6 7     bO1 n1 k n  1 6 7  k k  k 1 2 i þ 1 i þ 1 k 7 n 6 7 6 r r r þ ∏ þ ∏ 1π ð2Þ 1π ð3Þ 7 k2 1π ð1Þ 6 bO2 7 6 ki ki 7 i¼2 i¼2 6 7 6 6 7 6 ⋮ 7 6 7 ⋮ 6 7 6 7   6 7¼6 j n1 k 7  k 6 bOj 7 6 iþ1 l kn 7 6 7 6 r 1π ðl þ 1Þ r þ ∑ ∏ 7 kj 1π ð1Þ 6 ⋮ 7 6 k 7 i l ¼ 1i ¼ j 4 5 6 7 6 7 ⋮ bOn 6 7 6 7 nþ1 4 5 ∑ r 1π ðiÞ i¼1

ð63Þ Additionally, applying Eq. (50) gives the numerator of the fixed source transfer function bH k0 ¼ kn r 1π ð1Þ

ð64Þ

Considering Eqs. (63) and (64) it can be seen that measuring over time during the saturate experimental scheme gives access to the same information as the release scheme, i.e. the chain rate constants and initial state, and additionally allows the transfer function of the fixed source model to be determined. Recall that the eigenvalue of the hidden model Eq. (62) is zero, and note that it follows that the pole corresponding to pα ¼0 of the fit to the linked model will correspond to the residue r 1π ð1Þ , which can be uniquely determined. It can be seen that, after taking the uniqueness of r 1π ð1Þ into account, the number of solutions for the parameters of the observable model are the same as with the release scheme described by Application 1. As with the example for Application 1, directed measurements of selected initial conditions can be done to determine the model parameters uniquely. We should reinforce that this assumes error-free experimental fits to the data. In practice, experimental noise may make it difficult to detect a unique pole corresponding to zero.

The following example considers a linked model consisting of an arbitrary hidden model of the form of Eq. (39) linked into a single feedback loop model described by Eqs. (25),(28). This combined configuration is depicted in Fig. 8. As with Application 4, the application of Corollary 3 is demonstrated to recover the input to the observable model (i.e. the hidden model transfer function). Recall that for the observable model, the single-loop feedback model of Application 2, measurements are made individually at location 1 of the loop and at the output of the location 2 as per Eq. (28). It was shown earlier that this is sufficient to identify the single-loop feedback model. It follows from Corollary 2 that the eigenvalues and eigenvectors of the observable model are identifiable from measurements of the linked model. Recall that the eigenvalues of the hidden model are also identifiable if the n n conditions of Proposition 4 – that C^ and bH have non-zero columns and elements respectively – hold for the eigenvalues of the hidden model. Applying Eq. (44) gives 2 3 ðλH1 þ k  1 þ k2 ÞC nH11 þ k  1 C nH21 ðλH2 þ k  1 þ k2 ÞC nH12 þ k  1 C nH22 ⋯ 2 2 6 λH1 þ ðk1 þ k  1 þ k2 ÞλH1 þ k1 k2 λH2 þ ðk1 þ k  1 þ k2 ÞλH2 þ k1 k2 7 7 ð67Þ ζ¼6 4 5 k1 C nH11 þ ðλH1 þ k1 ÞC nH21 k1 C nH12 þ ðλH2 þ k1 ÞC nH22 ⋯ 2 2 λH1 þ ðk1 þ k  1 þ k2 ÞλH1 þ k1 k2

λH2 þ ðk1 þ k  1 þ k2 ÞλH2 þ k1 k2

" n By Eq. (46) and (28) it follows that C^ ¼

1

0

0

k2

#

ζ, and hence,

n

with the assumptions that CH has non-zero columns (i.e. C nH1j a C nH2j a0) and that C nH1j a ððk  1 Þ=ðλHj þ k  1 þ k2 ÞÞC nH2j and C nH1j a ððλHj þ k1 Þ=ðk1 ÞÞC nH2j , which will be true in nearly all practical n situations, each column of C^ will have 2 non-zero elements. Requiring n also that bH have no zero components, it follows that for the linked model, the conditions of Proposition 4 hold for all of the hidden model eigenvalues λH . It also follows that the conditions of Corollary 3 hold for all columns of Eq. (67) (as each column of CnH will also have at most

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

2 non-zero elements). It then follows that the input to the observable model is recoverable. The residues of the input transfer function are found explicitly from Eq. (54) and are given by 3 2 n 3 2 n ðqπ ðbÞ þ k1 Þr 1π ðbÞ  kk21 r 2π ðbÞ C H1ib bHib 6 7

4 n 5¼4 ð68Þ 5 n q þk þk C H2ib bHib  k1 r 1π ðbÞ þ πðbÞ k2 1 2 r 2π ðbÞ with the corresponding poles given by Eq. (49). As Corollary 4 holds under the same conditions as Corollary 3, Eq. (55) can be applied, which after some computation gives the initial state of the observable single-loop feedback model nH þ 2

bO ¼ ∑

i¼1

"

r 1π ðiÞ

#

r 2π ðiÞ k2

:

ð69Þ

It follows that the observable component of the linked model is identifiable. Analysis of the n-chain and feedback loop building block models in Applications 4 and 5 tell us how these types of features can affect the identifiability of a linked model. We have shown that a n-chain model as observable components of a linked model, or as an intermediate model in a set of nested linked models will preserve identifiability of any further nested hidden models along the chain. We have also shown that any feedback loops with a single exit path will require at least one direct measurement in order to identify the model and any further hidden models.

2.4.1. Application 6 – converging R-branch The following example demonstrates a way in which the linked model approach is generalizable to complex networks. The example considers the case of multiple independent inputs to an observable model, and consists of R hidden inputs of the form Eq. (39), denoted x1H , …, xRH , which are linked to an observable model Eq. (40), as illustrated in Fig. 6. This is referred to in the following as the converging R-branch model, as a simple form of the model with a single link from each input to the same component of the output would look like the inverse of the diverging R-branch model shown in Fig. 4. The model can also deal with much more complex links from the input models as illustrated in Fig. 6, with the constraints that no links are directed from the observable model to any of the hidden models, and that the hidden models are not linked to each other (i.e. independent). The converging R-branch linked model is governed by the system of equations 3 2 1 AH x1H ðtÞ 6 x2 ðtÞ 7 6 6 6 H 7 6 0 7 d6 6 ⋮ 7¼6 7 6 ⋮ dt 6 6 xR ðtÞ 7 6 4 H 5 6 4 0 xO ðtÞ C1H 2

0



A2H

0 0

⋱ ARH

0 C2H



CRH

32

ð70Þ

1

xHþ

3

2

1 b 6 H 6 b2 6 H



A2H ⋱ 0

0

3

7 0 7 7; 7 5 ARH

ð71Þ

it can be seen Eq. (41) holds for the new variables, i.e. " þ # " þ AH d xH ðtÞ ¼ CHþ dt xO ðtÞ

0 A0

#"

xHþ ðtÞ xO ðtÞ

# ;

yL ðtÞ ¼ CO xO ðtÞ:

ð72Þ

þ xHþ ð0Þ ¼ bH .

and It follows that for a linked model Eq. (70) such that the conditions of Corollary 3 hold, the transfer function of the hidden model with the new variables Eq. (71) can be determined from Eq. (54). From Eq. (3) and Eq. (71) it follows that the input to the observable model is given by the sum of the inputs from the R hidden models, i.e. R

φðsÞ ¼ ∑ CiH ðsI AiH Þ  1 biH :

ð73Þ

i¼1

If the hidden models are identifiable from their individual input transfer functions, then they are identifiable from the summed input Eq. (73), as it can be easily seen that the terms of a fit to Eq. (73) can be partitioned into the individual inputs from the R hidden models. In other words, if the input to the component observable model of a linked model is composed of multiple independent inputs, and if the combined input can be determined from a measurement of the observable model, then the individual inputs can be determined. 2.4.2. Application 7 – phosphoinositide transformations in endocytosis model Finally, we present a case study of the power of the linked model framework to provide an intuitive understanding of complex models, based on the interaction of component models, by returning to the model from Belward et al. (2011)) shown in Fig. 1. We recall the features of the model as described in the Introduction and summarized as thus:

 a feedback loop between two compartments (PI45P22PI345P3),

 single compartments (PI, PI34P2), converging on and  a chain of two compartments (PI3P-PI35P2). Application 2 (single feedback loop) determined that for the first feature to be identified requires:

internal species measurement, and allowing an external measurement via k3 pathway.

R

 an initial value to the chain, and  a single measurement at the end of the chain. In addition to this, the linked model framework was used to deduce that:

3

x1H ðtÞ 7 6 2 7 7 h x 6 H ðtÞ 7 þ 7 þ 1 7; b ¼ 6 ¼6 7; CH ¼ CH 6 ⋮ 7 H 6 7 ⋮ 4 5 4 5 R xRH ðtÞ bH

0

Application 1 (linear chain) determined that to identify the other features requires:

with the initial states of the inputs denoted by bH , …, bH . Defining new variables 2

AHþ

A1H 6 6 0 ¼6 6 ⋮ 4 0

 the loop has non-zero initial components, and  two independent time-series measurements, with at least one

3 x1 ðtÞ 76 H x2 ðtÞ 7 0 7 7 76 6 H 7 7 6 ⋮ 76 ⋮ 7 76 R 7 7 0 7 54 xH ðtÞ 5 xO ðtÞ AO 0

2

17

C2H



i CRH ;

 Measurement of a linked linear chain allows the input to the chain to be determined (Application 4), and

 Multiple independent inputs to a model can be separately identified (Application 6).

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

18

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

We can use this information about the component features of the model to deduce some minimum identifiability requirements in terms of measurements for the model:

 Required measurement at PI35P2 (end of PI3P-PI35P2 chain),



which will allow determination of PI3P (initial chain location), and PI and PI34P2 (inputs to chain), and output from PI45P22PI345P3 feedback loop (input to PI34P2), and Required measurement at either PI45P2 or PI345P3 to resolve feedback loop.

We have hence demonstrated that once the properties of individual model components are understood, the linked model framework allows the information to be used to assess identifiability of complex systems, and to intuitively understand what makes a model identifiable or not.

system is determined. In this way it would be possible to estimate the variability in parameter estimates for given levels of experimental noise. The effects of noise and the degree to which they affect the uncertainty in parameter estimation will be highly specific to the modeled system, with some models being more robust than others. For example, noise might mean that relatively small contributions to model output cannot be reliably fitted, reducing the amount information obtained from an experiment. It is reasonable to assume that closer measurements (i.e. less intervening compartments) to a compartment of interest would result in reduced noise contribution when inferring the compartment behavior (and the associated model parameters). More broadly, an analytical study including model noise may provide a deeper intuition about the effects of noise and guide experimental design to minimize the propagation of measurement uncertainty to the inference of parameter values. Another possible direction for future work could be to generalize the method for non-diagonalizable models by using a Jordan normal model form.

3. Discussion Linear systems of ordinary differential equations are widely applied tools for modeling biological systems. The structural identifiability of a model is an analytical property that can affect the ability to correctly draw biological conclusions from experiments, and can be non-trivial to deduce. The first contribution of this paper is to present analytical methods using the diagonalized transfer function to extract parameter information from experimental data via eigenvalue equations, equations for the initial state and residue relations, and includes the determination of sufficient conditions for the structural identifiability of a model. The diagonalized transfer function methods are applied to investigate the experimental design required to identify some configurations commonly applied in models of biological systems: linear chains, feedback loops and diverging branch models. A simulated experiment and calculation for a 5-component linear chain was presented, demonstrating the extraction of the model parameters from the simulated data. The second contribution defines the concept of linked models – that is, models that are composed of a component hidden model linked to a component observable model – and shows that in combination with the diagonalized transfer function methods that the linked model framework can be used to extract parameter values from measurements of the observable components of linked models, and determines sufficient conditions for the recovery of the transfer function of component hidden models. The linked model framework in combination with the diagonalized transfer function methods was used to investigate the experimental design required to recover the transfer function of arbitrary sources linked to common model configurations: a linear chain, a feedback loop, and multiple arbitrary inputs to a converging branch point. The linked model was also applied to analyze the commonly used fill/release experimental scheme as a fixed source providing continuous input to a linear chain model, and a relationship between the saturated steady state and measurements over time of a following release scheme is shown. Finally, results for these building block models are combined to deduce the measurement requirements to identify a real experimental model. This example in particular, as well as the other examples in the work, highlight the linked model framework's usefulness (in combination with the diagonalized transfer function method applied to building blocks) in answering “where and how to measure” questions for complex systems, by guiding experimental design to maximize the structural parameter information obtained. With measuring an experimental system, noise is always an important consideration, and has led to the study of practical identifiability, as noted in the Introduction. While the systems we have described are assumed to be noise-free, the effects of noise on parameter estimation can be readily determined using numerical simulations, once the structural identifiability of the experimental

Acknowledgments This work was funded by National Health and Medical Research Council of Australia (NHMRC) Grants (to N.A.H. and J.L.S.) and by an NHMRC program grant and fellowship (J.L.S.).

Appendix A. Mathematical proofs

Proposition 1. Proof The rows of Eq. (9) in component form are n

Φj ðsÞ ¼ ∑

n

cnji bi

λ

i ¼ 1 s i

ð74Þ

Hence, if for some i there exists some j such that the product cnjibni is non-zero, then ith eigenvalue of the model appears as a pole of Eq. (5) (recalling the assumption that Lff j gðsÞ is the best approximation of Φj ðsÞ). Conversely, if all j and some i, cnji ¼0 or bni ¼0 then the ith term of Eq. (74) vanishes from all elements of the transfer function (and hence all fits). □ Proposition 2. Proof It can be seen from Eq. (5) and (74) that the permutation π defined by Eq. (10) describes how the terms of a fit are permuted to correspond with the terms of the transfer function element. Hence by permuting the constructed vector of residues and equating to the numerators of the terms of Eq. (74) it can be shown that 32 n 3 2 3 2 n n 3 2 n C j1 i1 bi1 bi1 C j1 i1 r j1 π ð1Þ 6 7 6 76 n 7 6r 7 n 7 n n ¼ ð75Þ 4 4 5 4 j2 π ð2Þ 5 ¼ 6 b C b C 4 j2 i2 i2 5 i2 5 j2 i2 ⋮ ⋮ ⋮ ⋱ If there exists elements of the columns of Cn that are non-zero, then from Proposition 1 it must be the case that the condition that n bj ¼ 0 does not hold for any eigenvalue λj that does not appear in the transfer function. From Eq. (8) it follows that: 2 32 n 3 bi1 V 1i1 V 1i2 ⋯ 7 6V 76 n ð76Þ b ¼ Vb ¼ 4 2i1 V 2i2 ⋯ 54 bni 5 2 ⋮ ⋮ Rearranging Eqs. (74) and (75)) for the vector of non-zero elements of bn and inserting into Eq. (76) shows Eq. (13). □

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 Q8 89 90 Q9 Q10 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

Proposition 3. Proof Note from Eq. (74) that the elements of bn are independent of the choice of measurement location. With the permutation π defined by Eq. (10), equating the residue of a fit Eq. (5) to the corresponding numerator of Eq. (74) gives n

r jπ ðbÞ ¼ C njib bib :

ð77Þ

Considering the residues of two fits j and k for a common pole, n eliminating the common variable bib leads to Eq. (14). Assuming n that b has no zero elements, the number of non-zero elements of Cn corresponds to the number of fit residues, with each column corresponding to a pole of the transfer function. If more than one residue for a pole is observed, each additional residue can be related to the first by Eq. (14). Hence, each column Cn with more than one non-zero element contributes a number of equations at most equal to the number of additional non-zero elements above n one. If there exists an element bα ¼ 0, then the corresponding n column of C will contribute no equations. □ Corollary 1. Proof If the rate constants k can be obtained from eigenvalue equations and residue relations, then it follows that V can be determined. Hence the parameters b are also obtainable by Proposition 2. By Definition 1 it follows that the model is identifiable. □ Proposition 4. Proof The rows of Eq. (48) can be expressed in component form as n n ^n nO C n b C^ b φLh ðsÞ ¼ ∑ hi Hi þ ∑ Ohj j i ¼ 1 s  λHi j ¼ 1 s  λOj

nH

ð78Þ

It follows that if for some term i of the first sum of Eq. (78) and n n some h that the product C^ hi bHi is non-zero then λHi will appear as a pole in the corresponding hth fit. Similarly, if for some j of the n second sum of Eq. (78) and some h the product C nOhj b^ j is non-zero then λOj will appear as a pole in the hth fit. Conversely, if all h and n n some i, C^ hi ¼ 0 or bHi ¼ 0 then the ith term of the first sum of Eq. (78), and hence λHi , vanishes from all elements of the transfer n function, and if for all h and some j, C nOhj ¼ 0 or b^ j ¼ 0, then λOi vanishes. □ Proposition 5. Proof It can be seen by comparing Eq. (5) and (78) that the permutation π defined by Eq. (49) describes how the terms of a fit of the form of Eq. (5) to the linked model are permuted to correspond with the terms of the transfer function element Eq. (78). Construct a vector h iT r h1 π ð1Þ … r hu π ðuÞ r h1 π ðu þ 1Þ … r hv π ðu þ vÞ of residues that correspond to all the eigenvalues for which the conditions of Proposition 4 hold. By equating to the numerators of the terms of Eq. (78) it follows that: 2 n 3 n 2 3 b C^ r h1 π ð1Þ 6 h1 i1 Hi1 7 6 7 6 7 ⋮ ⋮ 6 7 6 7 6 7 6 ^n n 7 6 r hu π ðuÞ 7 6 C hu iu bHiu 7 6 7 6 7 ð79Þ 6r 7¼6 n 7: 6 g1 π ðu þ 1Þ 7 6 C n b^ 7 Og j j 6 7 6 1 1 1 7 6 7 6 7 ⋮ ⋮ 4 5 6 7 4 n 5 r gv π ðu þ vÞ n ^ C Ogv jv bjv n If there exists elements of all the columns of C^ that are nonn zero, then it must be the case that bHα ¼ 0 for any eigenvalue λHα for which the conditions of Proposition 4 do not hold. Similarly, if

19

the observable model satisfies the condition of Proposition 2 of non-zero elements in the columns of CnO , then for any eigenvalue n λOα for which the conditions of Proposition 4 do not hold, b^ α ¼ 0. It can be shown that for the linked model " # " #" n # VH 0 bH bH ¼ : ð80Þ ζ VO b^ n bO

It follows that the initial state of the hidden model can be expressed as: 2 32 n 3 bHi1 V H1i1 ⋯ V H1iu 6 76 ⋮ 7 n 7 ð81Þ bH ¼ VH bH ¼ 4 V H2i1 ⋯ V H2iu 56 4 n 5 bHiu ⋮ ⋮ And the initial state of the observable model as 2 32 n 3 bHi1 ζ 1i1 ⋯ ζ 1iu n 6 76 ⋮ 7 n 7 bO ¼ ζbH þ VO b^ ¼ 4 ζ 2i1 ⋯ ζ 2iu 56 4 n 5 bHiu ⋮ ⋮ 2 32 ^ n 3 bj V O1j1 ⋯ V O1jv 6 17 6V 7 6 þ 4 O2j1 ⋯ V O2jv 54 ⋮ 7 5: n ⋮ ⋮ b^

ð82Þ

jv

Writing the first u elements of Eq. (79) as 32 n 3 3 2 ^n r h1 π ð1Þ bHi C h1 i1 76 1 7 6 ⋮ 7 6 76 ⋮ 7 ⋱ 4 5¼6 4 54 n 5 n r hu π ðuÞ bHiu C^ h i 2

ð83Þ

2 2

it can be seen that Eq. (50) follows by rearranging Eq. (81) for the n non-zero elements of bH and substituting into Eq. (83). From the u þ1 to u þv elements of Eq. (79) it follows that: 32 n 3 2 3 2 n C Og1 j1 r g1 π ðu þ 1Þ b^ j 6 76 1 7 6 7 6 7 6 7 ⋱ ⋮ ð84Þ 4 5¼4 54 ⋮n 5 ^b C nOgv jv r gv π ðu þ vÞ jv which can be rearranged for the non-zero elements of the vector n b^ and substituted into Eq. (82) along with the expression for the n non-zero elements of bH from Eq. (83) to give Eq. (51). □ Proposition 6. Proof n n It can be noted from Eq. (78) that the elements of bH and b^ are invariant of the measured locations of the model denoted by CO . With the permutation π defined by Eq. (49), by equating the residues of Eq. (5) to the corresponding numerator of Eq. (78) it follows that: n n r hπ ðbÞ ¼ C^ hib bHib

1 r b r u;

n r hðu þ cÞ ¼ C Ohjc b^ jc ;

1 r c rv:

n

ð85Þ

Considering the residues of two fits L{f}g and L{f}h, eliminating n the common variable (bHib for the residues corresponding to the n hidden model eigenvalues and b^ jc for the residues corresponding to the observable eigenvalues as determined by π) leads to n Eq. (52). The number of non-zero elements of C^ and CnO corresponds to the number of fit residues, with each column corresponding to a pole of the transfer function. If more than one residue for a pole is observed, each additional residue can be n related to the first by Eq. (52). Hence, each column of C^ and CnO with more than one non-zero element contributes a number of equations at most equal to the number of additional non-zero elements above one. □

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

T.O. Lamberton et al. / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56

Proposition 7. Proof The constructed vector of residues corresponding to an identical pole can be rewritten using Eq. (52) to get 2 mH 3 n 1 n 2 3 C ð λ I A Þ C b ∑ O O Hi l Hli Hi g b b b 7 1 6l¼1 r g1 π ðbÞ 6 7 7 6r 7 6 4 g2 π ðbÞ 5 ¼ 6 mH n 7 1 n 6 ∑ C O ðλHib I AO Þg l C Hlib bHib 7 2 4l¼1 5 ⋮ 2

⋮ C O ðλHib I  AO Þg1 h11

6 1 ¼6 4 C O ðλHib I  AO Þg2 h1 ⋮

C O ðλHib I  AO Þg1 h12 C O ðλHib I  AO Þg2 h12



32

n

C nHh1 ib bHib

3

7 76 n n 7 76 C 54 Hh2 ib bHib 5 ⋮ ð86Þ n

recalling that mH is the number of (non-zero) rows of CH . The system Eq. (86) can be formed such that the first matrix on the R.H.S. is square if the number of residues corresponding to the ibth eigenvalue of the component model Eq. (39) is equal or greater than the number of non-zero elements in the ibth column of CnH . If the matrix is invertible, then inverting Eq. (86) gives the corresponding residues of the component model transfer function via Eq. (54). □ Corollary 3. Proof If the eigenvalues λH of the hidden model Eq. (39) are known, then they give the possible values for the poles of the transfer function of the hidden model by Eq. (10). If the conditions of n Proposition 7 hold for each column of C^ , then the system of equations Eq. (54) can be constructed. If in addition the eigenvalues λO and eigenvectors VO of the observable model Eq. (40) are known, then the elements of the inverse matrix in Eq. (54) are known, and the transfer function residues corresponding to the eigenvalues λH can be evaluated. □ Corollary 4. Proof From Eq. (82) and substituting Eq. (84) we have 2 3 V O1j1 V 2 32 n 3 ⋯ C nO1jv 2 r g π ðu þ 1Þ 3 bHi1 ζ 1i1 ⋯ ζ 1iu 6 C nOg1 j1 Og v jv 7 1 6 76 6 76 ⋮ 7 7 7 þ 6 V O2i1 ⋮ V O2jv 74 bO ¼ 4 ζ 2i1 ⋯ ζ 2iu 56 5 4 n 5 6 Cn ⋯ Cn 7 4 Og 1 j1 Og v jv 5 r gv π ðu þ vÞ bHiu ⋮ ⋮ ⋮ ⋮ ð87Þ Substituting Eq. (44) and evaluating the first term in the R.H.S. gives Eq. (55). The assumptions of Corollary 3 are that λO , VO and λH are identifiable (and hence known), and hence Eq. (55) can be evaluated to find the observable model initial state bO . □ References Ahlfors, L.V., 1966. Complex Analysis: An Introduction to the Theory of Analytic Functions of One Complex Variable. McGraw-Hill, New York. Ashyraliyev, M., Fomekong-Nanfack, Y., Kaandorp, J.A., Blom, J.G., 2009. Systems biology: parameter estimation for biochemical models. FEBS J. 276 (4), 886–902.

57 Bellman, R., Åströ m, K.J., 1970. On structural identifiability. Math. Biosci. 7 (3–4), 329–339. 58 Bellu, G., Saccomani, M.P., Audoly, S., D'Angiò, L., 2007. DAISY: a new software tool 59 to test global identifiability of biological and physiological systems. Comput. 60 Methods Progr. Biomed. 88 (1), 52–61. Belward, J., Burrage, K., Teasdale, R.D., Hamilton, N.A., 2011. Linear models for 61 endocytic transformations from live cell imaging. ANZIAM J. 52, C156–C171. 62 Berthoumieux, S., Brilli, M., de Jong, H., Kahn, D., Cinquemani, E., 2011. Identifica63 tion of metabolic network models from incomplete high-throughput datasets. Bioinforma 27 (13), i186–195. 64 Berthoumieux, S., Brilli, M., Kahn, D., Jong, H., Cinquemani, E., 2012. On the 65 identifiability of metabolic network models. J. Math. Biol., 1–38. Q11 66 Chen, W.W., Niepel, M., Sorger, P.K., 2010. Classic and contemporary approaches to 67 modeling biochemical reactions. Genes Dev. 24 (17), 1861–1875. Chis, O.T., Banga, J.R., Balsa-Canto, E., 2011. Structural identifiability of systems 68 biology models: a critical comparison of methods. PLoS One 6 (11), e27755. 69 Chou, I.C., Voit, E.O., 2009. Recent developments in parameter estimation and 70 structure identification of biochemical and genomic systems. Math. Biosci. 219 (2), 57–83. 71 Cobelli, C., DiStefano 3rd, J.J., 1980. Parameter and structural identifiability concepts 72 and ambiguities: a critical review and analysis. Am. J. Physiol. 239 (1), R7–24. 73 Crampin, E.J., 2006. System identification challenges from systems biology. Syst. Identif. 14, 81–93. 74 de Jong, H., 2002. Modeling and simulation of genetic regulatory systems: a 75 literature review. J. Comput. Biol. 9 (1), 67–103. 76 Follett, J., Norwood, S.J., Hamilton, N.A., Mohan, M., Kovtun, O., Tay, S., Zhe, Y., 77 Wood, S.A., Mellick, G.D., Silburn, P.A., Collins, B.M., Bugarcic, A., Teasdale, R.D., 2014. The Vps35 D620N mutation linked to Parkinson's disease disrupts the 78 cargo sorting function of retromer. Traffic 15 (2), 230–244. 79 Ghosh, R.N., Gelman, D.L., Maxfield, F.R., 1994. Quantification of low density 80 lipoprotein and transferrin endocytic sorting HEp2 cells using confocal microscopy. J. Cell Sci. 107 (Pt. 8), 2177–2189. 81 Gutenkunst, R.N., Waterfall, J.J., Casey, F.P., Brown, K.S., Myers, C.R., Sethna, J.P., 82 2007. Universally sloppy parameter sensitivities in systems biology models. 83 PLoS Comput. Biol. 3 (10), 1871–1878. Hamilton, N., 2009. Quantification and its applications in fluorescent microscopy 84 imaging. Traffic 10 (8), 951–961. 85 Henry, L., Sheff, D.R., 2008. Rab8 regulates basolateral secretory, but not recycling, 86 traffic at the recycling endosome. Mol. Biol. Cell 19 (5), 2059–2068. 87 Holmberg, A., 1982. On the practical identifiability of microbial growth models incorporating Michaelis–Menten type nonlinearities. Math. Biosci. 62 (1), 88 23–43. 89 Jaqaman, K., Danuser, G., 2006. Linking data to models: data regression. Nat. Rev. 90 Mol. Cell Biol. 7 (11), 813–819. Liepe, J., Filippi, S., Komorowski, M., Stumpf, M.P., 2013. Maximizing the information 91 content of experiments in systems biology. PLoS Comput. Biol. 9 (1), e1002888. 92 Liu, F., Burrage, K., Hamilton, N.A., 2011. Some novel techniques of parameter Q2 93 estimation for dynamical models in biological systems. IMA J. Appl. Math., 1–26 (in press). 94 Nemcova, J., 2010. Structural identifiability of polynomial and rational systems. 95 Math. Biosci. 223 (2), 83–96. 96 Nikerel, I.E., van Winden, W.A., Verheijen, P.J., Heijnen, J.J., 2009. Model reduction 97 and a priori kinetic parameter identifiability analysis using metabolome time series for metabolic reaction networks with linlog kinetics. Metab. Eng. 11 (1), 98 20–30. 99 Pohjanpalo, H., 1978. System identifiability based on the power series expansion of 100 the solution. Math. Biosci. 41 (1–2), 21–33. Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmuller, U., Timmer, J., 101 2009. Structural and practical identifiability analysis of partially observed 102 dynamical models by exploiting the profile likelihood. Bioinforma 25 (15), 103 1923–1929. Raue, A., Kreutz, C., Maiwald, T., Klingmuller, U., Timmer, J., 2011. Addressing 104 parameter identifiability by model-based experimentation. IET Syst. Biol. 5 (2), 105 120–130. 106 Sheff, D.R., Kroschewski, R., Mellman, I., 2002. Actin dependence of polarized 107 receptor recycling in Madin–Darby canine kidney cell endosomes. Mol. Biol. Cell 13 (1), 262–275. 108 Voit, E.O., Almeida, J., Marino, S., Lall, R., Goel, G., Neves, A.R., Santos, H., 2006. 109 Regulation of glycolysis in Lactococcus lactis: an unfinished systems biological 110 case study. IEE Proc. Syst. Biol. 153 (4), 286–298. 111

Please cite this article as: Lamberton, T.O., et al., On linear models and parameter identifiability in experimental biological systems. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.028i

On linear models and parameter identifiability in experimental biological systems.

A key problem in the biological sciences is to be able to reliably estimate model parameters from experimental data. This is the well-known problem of...
1MB Sizes 0 Downloads 3 Views