
Seminars in Cell & Developmental Biology xxx (2014) xxx–xxx


Review


Information theory and signal transduction systems: From molecular information processing to network inference


Siobhan S. Mc Mahon¹, Aaron Sim¹, Sarah Filippi, Robert Johnson, Juliane Liepe, Dominic Smith, Michael P.H. Stumpf∗


Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK


Keywords: Signal processing; Mutual information; Noise

Abstract

Sensing and responding to the environment are two essential functions that all biological organisms need to master for survival and successful reproduction. Developmental processes are marshalled by a diverse set of signalling and control systems, ranging from systems with simple chemical inputs and outputs to complex molecular and cellular networks with non-linear dynamics. Information theory provides a powerful and convenient framework in which such systems can be studied; but it also provides the means to reconstruct the structure and dynamics of molecular interaction networks underlying physiological and developmental processes. Here we supply a brief description of its basic concepts and introduce some useful tools for systems and developmental biologists. Along with a brief but thorough theoretical primer, we demonstrate the wide applicability and biological application-specific nuances by way of different illustrative vignettes. In particular, we focus on the characterisation of biological information processing efficiency, examining cell-fate decision making processes, gene regulatory network reconstruction, and efficient signal transduction experimental design. © 2014 Published by Elsevier Ltd.

Contents

1. Introduction
2. A primer on information theory
   2.1. Uncertainty and information
   2.2. Mutual information and conditional mutual information
   2.3. Computing mutual information
3. Biological information processing
   3.1. Quantifying molecular information transmission
   3.2. Cell fate decision making processes
   3.3. The logic of sensing and responding
4. Information theoretical analysis of signal transduction systems
   4.1. Network reconstruction
   4.2. Experimental choice in signal transduction systems
5. Conclusion
Appendix A. Molecular motif equations and simulation parameters
References


∗ Corresponding author. Tel.: +44 2075945114. E-mail address: [email protected] (M.P.H. Stumpf).
1 These authors contributed equally to this work.
http://dx.doi.org/10.1016/j.semcdb.2014.06.011
1084-9521/© 2014 Published by Elsevier Ltd.



1. Introduction

Biological organisms — from microbes to multi-cellular organisms, including humans — are driven to survive and reproduce. The ability to sense the presence of sustenance, reproductive opportunities, and imminent danger is, as such, the primary physiological requirement across all domains and stages of life. In particular, in the context of developmental biology, an organism has to identify, in addition to environmental signals, different tissue types, nutrient needs, co-factor requirements, developmental states, and other physiological cues [1,2]. As a result, a necessary feature of life is the complex biological machinery which has evolved to process such information, translating incoming messages from the environment into appropriate responses and behaviours [3]. From this perspective, evolutionary change is driven by fitness advantages conferred on an organism possessing more efficient information processing capabilities, which allow it to respond to vital but potentially noisy signals in a timely and appropriate manner. Our understanding of biological information processing at molecular and cellular scales is closely coupled to our ability to quantify the efficiency of these signal transduction processes, and here we provide an overview of the relevant theoretical concepts and their scope for application in signal transduction and developmental biology.

Information theory (discussed in the next section) is the theoretical framework that provides the necessary mathematical tools for the analysis of biological information processing [4–6]. Originally formulated in an engineering context, its applications typically deal with making the transmission of information from sender to receiver more efficient across some communication channel (for instance, wires, fibre-optic cables, electromagnetic waves, etc.). In systems biology applications it is tempting to appropriate the engineering approach wholesale. However, the notion of a channel in biological signal transduction systems is less well defined, especially in developmental biology, as we will argue below (see Section 3.2). Here it suffices to say that the inputs and outputs can be of a very different nature. For example, the concentration of epidermal growth factor (EGF) could be the input, while switching cells into a proliferative mode is the intended output. More generally, the channel is the complex machinery that senses and transduces extracellular chemical concentrations, resulting in a transcriptional programme which determines the cell's fate. But whether cells proliferate or differentiate, for instance, depends not only on the presence of a single signal, but also on the presence of other molecules and on the temporal profile with which the stimulus is presented to the cells [7]. This introduces a level of nuance to biological information processing that is typically absent from traditional engineering applications.

But information theory has much more to offer to systems and developmental biologists than an alternative, if descriptive, view of known biological processes. It provides some of the tools that can be employed in order to gain more detailed insights into the information processing machinery [8–13]. Unlike conventional (or Pearson) correlation, information theoretic measures are sensitive to non-linear relationships between variables [14].
This sensitivity can be used to reverse engineer the structure of signalling and regulatory networks that control, for example, developmental processes at the molecular and cellular levels [15–17]. More recently, in addition to network inference (or reconstruction), information theoretic approaches have also gained prominence in experimental design, where they allow us to improve our knowledge, e.g. about signal transduction networks, in a well-defined iterative manner [18].

Below we provide an overview of these different aspects of information theory in developmental systems biology. We start from a basic outline of information theoretic concepts (aimed primarily at quantitative biologists) before introducing some key mathematical tools. We then discuss the practical uses that information theory can have in elucidating the mechanisms underlying biological systems by way of five illustrative vignettes. Methodologically these different topics are closely related; considering them together — here done for the first time, to the best of our knowledge — allows us to draw out some of the many facets that information theory has in a molecular and cellular context.


2. A primer on information theory


2.1. Uncertainty and information


The first successful theory of information was developed by C.E. Shannon at Bell Laboratories in 1948 [19]. Despite the specific nature of his original motivation (maximising the capacity of communication channels), the theory itself is so general that it has since been applied, in its original form, to a wide range of disciplines across science, engineering and economics [16,20–26]. Shannon recognised that whatever information is, one should, at least in theory, be able to measure it. In signals sent over a communication channel one quantifiable feature is the noise, which is measured, say, in terms of the proportion of incorrectly interpreted messages. A less distorted signal carries, in the intuitive sense, more information; conversely, a noisier or more distorted signal is associated with a loss of information. This link between uncertainty and information is the key insight behind Shannon's definition.

Mathematically, one deals with uncertainty in physical quantities using random variables. A random variable X is a variable that adopts a range of values in some sample space Ω_X with some probability distribution p_X. Examples in biology range from discrete counts of physical entities (e.g. cell signalling molecules) to continuous attributes (e.g. temperature, pH levels), all of which, upon repeated measurement, exhibit some degree of randomness about a mean.

We begin by quantifying the information contained in a single realisation x of some random variable X. Shannon posited three natural conditions that any such measure of information I(x) must satisfy. First, the quantity of information in the measurement x depends solely on the probability p_X(x), with small values of p_X(x) corresponding to large I(x). This is highly intuitive — unlikely and surprising events carry more information than common occurrences. This condition strips away from the concept of information all colloquial associations with the inherent meaning and other semantic aspects of the message or signal. Second, information must be a continuous function of p_X(x); it is only reasonable that a small change in p_X(x) should effect a correspondingly small change in I(x). Third, information obtained from independent realisations x_1, x_2 ∈ Ω_X should be additive, in the sense that I(x_1, x_2) = I(x_1) + I(x_2). Up to a constant of proportionality, the unique measure that satisfies these three conditions is the (negative) logarithm, i.e. I(x) = −log p_X(x).

Now the very nature of random variables compels one to consider the space of all possible realisations in Ω_X, via the expectation operator, rather than just a single measurement x. According to Shannon, therefore, information can be thought of as the expected (logarithmic) surprise from the measurement of a random variable. More formally, it is given by the entropy H(X), defined as the negative expectation of the logarithm of the probability distribution. For a continuous variable,

H(X) = -\int_{\Omega_X} p_X(x) \log p_X(x) \, dx.   (1)
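As a concrete, entirely illustrative companion to (1), the short Python sketch below evaluates the discrete analogue H(X) = −Σ p log p for a toy distribution and prints the closed-form differential entropy of a Gaussian for comparison; the function name and example numbers are ours, not taken from the text.

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy H = -sum(p * log p) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # by convention 0*log(0) = 0
    return -np.sum(p * np.log(p)) / np.log(base)

# A fair coin carries 1 bit per toss; a biased coin carries less.
print(entropy([0.5, 0.5]))             # 1.0
print(entropy([0.9, 0.1]))             # ~0.47

# For a continuous variable, (1) gives the differential entropy; for a
# Gaussian N(0, sigma^2) it has the closed form 0.5*log(2*pi*e*sigma^2) nats.
sigma = 2.0
print(0.5 * np.log(2 * np.pi * np.e * sigma**2))
```

Working in base 2 gives entropy in bits; natural logarithms give nats, the convention used for most of the continuous expressions below.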




2.2. Mutual information and conditional mutual information


So far we have described information theory in the simple context of a single random variable. Almost all of biology, however, is the study of the relationships between variables — input/output, signal/response, co-regulating genes, competing species, etc. The most commonly-used tool to characterise these links, namely Pearson’s correlation measure, unfortunately suffers from an inability to properly account for complex, yet frequently observed, non-linear associations. Information theory, however, has no such handicap and can be used to shed light on the overall dependency structures of such systems. The first step in the multivariate extension of information theory is straightforward. The measure of information here is the joint entropy H(X,Y, . . .), defined in terms of the joint probability distribution p(X,Y,...) . In the case of two variables, for example,

H(X, Y) = -\int_{\Omega_X} \int_{\Omega_Y} p_{X,Y}(x, y) \log p_{X,Y}(x, y) \, dx \, dy.   (2)

The information theoretic approach to quantifying dependencies, then, is to measure the departures from independence. Far from being a tautology, the avoidance of any explicit assumptions about the form of the dependencies (e.g. linear, sinusoidal, etc.) allows the approach to be completely general. Two random variables are independent if and only if their joint probability is the product of their marginals, i.e.

p_{X,Y}(x, y) \overset{\mathrm{ind.}}{=} p_X(x)\, p_Y(y) \quad \forall x \in \Omega_X,\ y \in \Omega_Y, \quad \text{where } p_X(x) = \int_{\Omega_Y} p_{X,Y}(x, y) \, dy.   (3)

The mutual information I(X;Y) between two random variables X, Y is then defined as the difference between the joint entropy and the joint entropy under the assumption of independence of X and Y [27]. It gives the extent to which knowledge of one variable provides information about the other. Explicitly, one writes

I(X; Y) = \int_{\Omega_X} \int_{\Omega_Y} p_{X,Y}(x, y) \log \frac{p_{X,Y}(x, y)}{p_X(x)\, p_Y(y)} \, dx \, dy = H(X) + H(Y) - H(X, Y).   (4)
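A minimal numerical illustration of (4), using invented toy distributions rather than anything from the text: for discrete variables the decomposition I(X;Y) = H(X) + H(Y) − H(X,Y) can be evaluated directly from the joint probability table.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))     # in bits

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a discrete joint distribution p_xy."""
    p_xy = np.asarray(p_xy, dtype=float)
    return entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)

# A noisy binary channel: X is the input, Y the output with a 10% flip rate.
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(mutual_information(p_xy))        # ~0.53 bits

# Independent variables factorise, so the mutual information vanishes.
print(mutual_information(np.outer([0.5, 0.5], [0.3, 0.7])))   # ~0.0
```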


It is easily shown that mutual information is non-negative, I(X;Y) ≥ 0, with equality only in the case of independence. The expression is also symmetric, I(X;Y) = I(Y;X), reflecting a non-directional association with no claims of causality. In cases where the random variables represent time-series trajectories, i.e. X = X(t), in addition to the mutual information between the trajectories one can also measure the rate at which information about one trajectory is gained from the other as the length of the trajectories is increased. The mutual information rate I_rate(X;Y) is defined as

I_{\mathrm{rate}}(X; Y) = \lim_{T \to \infty} \frac{d}{dT}\, I(X(t); Y(t)),   (5)

for trajectory length T. It has been shown that, for systems at stationarity, I_rate(X;Y) can be obtained via a transformation of the trajectories into Fourier space, where the T → ∞ limit is well defined in terms of the power spectrum [10]. In complex and highly interconnected biological systems, the dependency structure is seldom a disjoint union of simple pairwise associations. A common example is the case where the non-independence between two variables is entirely attributable to physical interactions with one or several common confounding variables; the apparent association between the two variables is then incidental rather than biological (see Section 4.1 for a concrete illustration in the context of gene regulatory networks). In information theory, one is able to uncover this additional structure using the conditional mutual information [28].


For random variables X, Y, and Z, the conditional mutual information I(X;Y|Z) is the expectation of the mutual information I(X;Y), given the value of Z, with respect to the marginal distribution p_Z, i.e.

I(X; Y \mid Z) = \mathbb{E}_Z\left[\, I(X; Y) \mid Z \,\right] = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z).   (6)

In words, it measures the extent to which knowledge of Y can give additional information about X in light of prior knowledge of Z. As in the case of mutual information, it is non-negative and symmetric under the exchange of X and Y.

2.3. Computing mutual information

Despite its wide applicability and many favourable properties, it is well known that estimating mutual information is not straightforward [29]. There are, however, a few comforting exceptions. In the special case where the joint distribution p_{X,Y} is Gaussian, there are exact, analytic expressions for mutual information in terms of the determinants of the correlation matrix R [17], i.e.

I(X; Y) = \frac{1}{2} \log \frac{|R_{xx}|\,|R_{yy}|}{|R|}.   (7)
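Expression (7) is straightforward to evaluate numerically. The sketch below does so for a bivariate Gaussian with correlation ρ, for which (7) reduces to −(1/2) log(1 − ρ²); the function name and the choice ρ = 0.8 are purely illustrative.

```python
import numpy as np

def gaussian_mi(R, dim_x):
    """Eq. (7): mutual information (in nats) of jointly Gaussian (X, Y), where
    R is the full correlation (or covariance) matrix and the first dim_x
    rows/columns correspond to X."""
    Rxx = R[:dim_x, :dim_x]
    Ryy = R[dim_x:, dim_x:]
    return 0.5 * np.log(np.linalg.det(Rxx) * np.linalg.det(Ryy) / np.linalg.det(R))

rho = 0.8
R = np.array([[1.0, rho],
              [rho, 1.0]])
print(gaussian_mi(R, dim_x=1))         # 0.5108...
print(-0.5 * np.log(1 - rho**2))       # the same value, from the scalar formula
```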

Similarly, for the mutual information rate one can obtain the exact expression in terms of the power spectra [10]. Less trivially, if the probability distributions in the integrand in (4) are not Gaussian but are known in closed form, estimating mutual information becomes an exercise in numerical integration, an example of which is given in Section 4.2. In this illustrative application, a modelled relationship between the two variables allows the joint and conditional distributions, p_{X,Y} and p_{X|Y} respectively, to be estimated via Monte Carlo sampling of the known marginal distribution p_Y (see Section 4.2 and, for full details, [18]). Despite the distributions being non-Gaussian, the analytic expressions in (7) can still provide lower bounds on the mutual information and mutual information rates where appropriate Gaussian models are chosen as approximations [30].

In studies of biological systems, and data exploration tasks in particular, one is almost never in possession of any prior knowledge of the probability distributions in (4); mutual information therefore needs to be estimated entirely from data. For discrete variables, not knowing the joint probability mass function is not considered pathological, as the probabilities are easily and optimally obtained via maximum likelihood estimation [31]. Biological variables are, however, typically continuous, and the challenge is to estimate the distribution functions in (4) from a finite, and often very small, number of empirical point measurements [32]. There have been numerous attempts at constructing better estimators, all of which differ to varying extents with regard to their inherent bias, unwanted tuning parameters, and computational complexity in higher dimensions; hence the efficiency of each estimator is context dependent. A summary of several methods is illustrated in Fig. 1, and a more thorough review can be found in [14,33].

Perhaps the simplest approach is to discretise the distribution using standard, fixed-width histograms [35], where the probability mass in each (possibly multi-dimensional) bin is directly proportional to its data occupancy. Although consistent, this systematically overestimates the mutual information and is highly sensitive to the number of bins and, by implication, the number of data points [36]. This estimator can be improved upon by dropping the restriction on fixed histogram widths. This is known as adaptive partitioning, of which there are several variants. One method involves an iterative, tree-like, equal partitioning of the data, which terminates when the support of each bin is sufficiently uniform (as determined via a χ² test statistic) [37].
A more recent attempt seeks to achieve an approximately uniform data occupancy across the bins [33]. In both instances, one aims to maximise the effective sample size of the available data sets by varying the width of each partition using local rather than global properties of the data. Another approach, as adopted in Section 3.1, employs kernel density estimation. This generalises the naive estimator by replacing rectangular bins with a kernel function, so as to render the estimator independent of the choice of origin or bin positioning [14,38]. For example, in the case of a Gaussian kernel, a one-dimensional distribution p_X(x) is approximated from a data set \{x_i\}_{i=1}^{N}, assuming a given bandwidth h, as

\hat{p}_X(x) = \frac{1}{N h \sqrt{2\pi}} \sum_{i=1}^{N} e^{-(x - x_i)^2 / 2h^2}.   (8)

The choice of smoothing parameter is crucial and can be made via a canonical rule of thumb [39] or by employing more statistically sophisticated methods such as the smoothed bootstrap and other plug-in techniques [40].

So far we have described plug-in methods of approximating mutual information based on consistent, global density estimates. Instead, one can determine mutual information by direct computation of the entropy terms in (4). The best-known instance is, perhaps, the k-nearest neighbour estimator. This provides entropy values from the statistics of the distances to the kth-nearest neighbour of every sample in the data set [34]. Here the dependence on the density is local and implicit (with an assumption of uniformity within each kth-nearest-neighbour ball), one consequence being that there is no guarantee that the estimated mutual information value is positive.

Fig. 1. Mutual information estimators. Illustration of the application of four commonly used mutual information estimators on non-linear, two-dimensional toy datasets. From the top left, the estimators used are: fixed-width binning, adaptive partitioning, kernel density estimation (KDE), and the k-nearest neighbours (kNN) estimator. The first three estimate the global probability distribution. The kNN estimator, with k = 3 in this case, estimates the entropy terms in (4) directly from the statistics of the distances d_i to the third-nearest neighbour of each point. By estimating the probability distribution of the distance d_i between the sample points and their kth nearest neighbour, one obtains estimates for the probability mass surrounding the sample points. Given sufficient data, this leads to local estimates of the probability distribution. Since the entropy H is simply the expected negative log-density, an estimate for the mutual information follows naturally [34].

3. Biological information processing

The conceptually simple foundations of information theory belie the true wealth of its possible applications. We attempt to convey this breadth, starting, in this section, with a few illustrative examples pertaining to biological information processing. These illustrations are 'conventional' in the sense that inputs, outputs, and channels are all biological objects from which measurements are extracted.

3.1. Quantifying molecular information transmission

Our first illustration is an instance of how to increase our understanding of biochemical processes within cells by quantifying the information processing efficiency of the associated molecular reactions. Specifically, we would like to gauge the relative impact of different types of noise in relation to the distortion of information between two molecular species in a signalling reaction. There are two sources of noise which account for the loss of fidelity in the information transfer, namely the extrinsic noise, representing cell-to-cell variability in cellular or environmental quantities such as protein degradation rates or temperature, and the intrinsic noise, reflecting the inherent stochastic or probabilistic nature of chemical reactions and transport processes at the molecular level [41]. In the simplest of setups, one could consider a signal transduction system where the input and output signals are counts of identical molecules and the noisy transmission channel is the physical transport of the molecules from one location in a cell to another, such as the shuttling of Hes1 protein from the cytoplasm to the nucleus in embryogenesis [42,43]. In more complete and complicated scenarios, one or several molecular species represent the input and another the output; the channel itself may be a tangled combination of transport processes and linear, branching and loop reactions [44].

Here we illustrate the potential of mutual information for characterising the effects of noise in signalling pathways by estimating its value between molecular species at opposite ends of three increasingly complex molecular signalling motifs, under the effects of different types of noise [45]. Details of the motifs are summarised in Fig. 2. The first motif we consider represents a basic input–output transcription regulation system with just two molecular species, for instance a transcription factor and its regulated gene with no additional interactions. Next, we introduce an additional species, to create a motif made up of a chain of elements with linear dependencies, which in a biological context could correspond to functionally critical linkers between protein domains [46]. The final motif represents a three-species system with a single, non-linear, feed forward loop (FFL) reaction, composed of two genes, one of which regulates the other, and both jointly modulating the transcription of a target gene [47]. Here the focus is on the so-called coherent type-1 FFL, which has been shown to be the most common type of coherent structure in both E. coli and S. cerevisiae [48]. Note that there are eight different structural types of FFL based on different combinations of activation and repression, each categorised as coherent or incoherent depending on whether the signs of the direct and indirect regulation paths are the same or opposite.

Using a stochastic differential equation (SDE) model, extrinsic noise is introduced via random Gaussian perturbations of the system parameters, while the intrinsic noise level is varied by tuning the size of the Brownian motion steps in the Euler–Maruyama approximation of the SDE (which corresponds to the size of the system). There are three scenarios: intrinsic-only, extrinsic-only, and both noise sources present. As shown in Fig. 2, we observe that the mutual information I(X;Z) is highest for the cases with extrinsic-only noise.
Information transmission is most affected, therefore, by the presence of intrinsic noise and, counter-intuitively, the loss in fidelity is not sufficiently mitigated by increasing signal strengths.
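The motif equations and parameters used for Fig. 2 are given in Appendix A and are not reproduced here. The sketch below instead uses an invented two-species input–output motif to illustrate the general recipe: Euler–Maruyama integration of an SDE, log-normal perturbation of rate parameters across cells for extrinsic noise, Brownian increments for intrinsic noise, and a plug-in estimate of I(X;Z) across the resulting cell population. All drift terms, parameter values and noise magnitudes are illustrative stand-ins rather than the model analysed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_motif(S=10.0, n_cells=2000, T=10.0, dt=0.01,
                   intrinsic=0.2, extrinsic=0.2):
    """Euler-Maruyama integration of a toy two-species input-output motif,
        dX = (S - d1*X) dt + intrinsic * dW1,
        dZ = (k*X - d2*Z) dt + intrinsic * dW2,
    with extrinsic noise modelled as a log-normal, cell-to-cell perturbation
    of the degradation rates d1 and d2."""
    d1 = 1.0 * np.exp(extrinsic * rng.standard_normal(n_cells))
    d2 = 1.0 * np.exp(extrinsic * rng.standard_normal(n_cells))
    k = 0.5
    X = np.zeros(n_cells)
    Z = np.zeros(n_cells)
    for _ in range(int(T / dt)):
        X += (S - d1 * X) * dt + intrinsic * np.sqrt(dt) * rng.standard_normal(n_cells)
        Z += (k * X - d2 * Z) * dt + intrinsic * np.sqrt(dt) * rng.standard_normal(n_cells)
    return X, Z

def mi_histogram(x, y, bins=20):
    """Crude plug-in estimate of the mutual information (nats) from samples."""
    p, _, _ = np.histogram2d(x, y, bins=bins)
    p = p / p.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))

for label, (intr, extr) in {"extrinsic only": (0.0, 0.2),
                            "intrinsic only": (0.5, 0.0),
                            "both": (0.5, 0.2)}.items():
    X, Z = simulate_motif(intrinsic=intr, extrinsic=extr)
    print(f"{label:15s}  I(X;Z) ~ {mi_histogram(X, Z):.2f} nats")
```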


Fig. 2. Distributions of species X, Y and Z in the simple input–output motif (A), the linear motif (B) and the feed forward loop motif (C), in the presence of extrinsic noise (purple), intrinsic noise (green) and both types of noise (blue) — all with comparable output variances — as well as in the absence of noise (red cross). For simplicity, all three motifs contain a single time-varying stimulus S(t) and molecules with fixed degradation rates. In each subplot (A–C) we show how the molecular species react to different input signals S = {1, 5, 10, 20}, increasing from top to bottom in each set of scatter plots. The bar charts represent the trend in mutual information, computed via KDE, between molecular species X and Z for the simple, linear and feed forward loop motifs in the presence of the different input signals mentioned above. The equations corresponding to each motif are detailed in Appendix A.


Furthermore, although this conclusion about the relative impact of the different types of noise cannot be generalised to all signalling pathways, the observed pattern is surprisingly similar across the three otherwise non-trivially different systems. The above analysis of different noise types can also be carried out for entire trajectories (including stationary states), inferred, for instance, from sets of longitudinal data. The linear noise approximation (LNA) [49–51] is often used to investigate information transmission in systems responding to both instantaneous and time-varying input signals [9,10,49]. Through a system-size expansion of the chemical master equation, the LNA provides the first two moments of each species in the system, and their autocorrelations. This Gaussian approximation then allows the mutual information to be approximated via (7). Other recent examples of information theoretic analyses of signal transduction systems (using Gaussian assumptions) use Langevin equations to explore the effect of noise intensity on stochastic resonance [52] and the small noise approximation to find optimal information transmission in gene regulatory networks, with particular application to Drosophila embryo development [11].

Fig. 3. (A) Signal transduction through the Akt pathway takes extra-cellular signals such as EGF and transduces them into the cell's interior. In addition to the signal of interest (in a given context), the elements of the signalling pathway are also subject to interactions with molecular signals emanating from other receptors or pathways, and in turn affect several other downstream molecules. Furthermore, signal transmission is dependent on the input strength, and low (green) and high (red) EGF concentrations will give rise to different cell fates. But even once this is known, non-linearity in the dynamics of signal transduction networks may mean that we cannot necessarily predict responses to intermediate signals (orange). (B) Depending on the signals arriving in the cell/cell nucleus, a transcriptional response will result in different cell fate decisions: e.g. proliferation, differentiation, apoptosis or stasis.


3.2. Cell fate decision making processes


The conceptual framework illustrated in the previous section can be applied to cell-fate decision making processes; in this case the extra-cellular signals, such as growth factors, hormones, etc., form the inputs, X, while the cell fates form the outputs, Y. Now, unlike before, the very nature of inputs and outputs differs quite considerably (see Fig. 3), and the "vocabulary" used at the inputs (here, continuous molecular abundances) and the outputs (here, discrete cell fates) will typically differ, i.e. the sample spaces Ω_X and Ω_Y are different. The mutual information I(X;Y) for the example in Fig. 3 is then given by

I(X; Y) = \int_{\Omega_X} \sum_{y \in \Omega_Y} p_{X,Y}(x, y) \log \frac{p_{X,Y}(x, y)}{p_X(x)\, p_Y(y)} \, dx = \int_{\Omega_X} \sum_{y \in \Omega_Y} p_{Y|X}(y|x)\, p_X(x) \log \frac{p_{Y|X}(y|x)}{p_Y(y)} \, dx.

Here the conditional probability p_{Y|X}(y|x) is a characteristic of the molecular pathway and p_Y(y) is the distribution over cell fates. In this way the mutual information measures the extent to which the uncertainty about cell fate Y is reduced when the input signal (such as ambient EGF concentration) X is known [7]. More informative here is the channel capacity C, with

C = \max_{p_X} \int_{\Omega_X} \sum_{y \in \Omega_Y} p_{Y|X}(y|x)\, p_X(x) \log \frac{p_{Y|X}(y|x)}{p_Y(y)} \, dx.
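When the channel has been discretised (or measured) as a conditional distribution p(y|x), the variational problem above can be solved numerically with the classical Blahut–Arimoto iteration. The sketch below is a generic textbook implementation, not something taken from the paper; the binary symmetric channel at the end is simply a sanity check with a known capacity.

```python
import numpy as np

def channel_capacity(p_y_given_x, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto iteration for C = max_{p_X} I(X;Y) of a discrete channel,
    where p_y_given_x[i, j] = p(y = j | x = i). Returns (capacity in bits,
    the capacity-achieving input distribution)."""
    W = np.asarray(p_y_given_x, dtype=float)
    p_x = np.full(W.shape[0], 1.0 / W.shape[0])        # start from a uniform input
    for _ in range(max_iter):
        p_y = p_x @ W                                  # induced output distribution
        logratio = np.log(W / p_y, where=W > 0, out=np.zeros_like(W))
        d = np.exp(np.sum(W * logratio, axis=1))       # exp of KL(W_i || p_y)
        p_x_new = p_x * d / np.sum(p_x * d)
        if np.max(np.abs(p_x_new - p_x)) < tol:
            p_x = p_x_new
            break
        p_x = p_x_new
    p_y = p_x @ W
    logratio = np.log(W / p_y, where=W > 0, out=np.zeros_like(W))
    C = np.sum(p_x[:, None] * W * logratio)
    return C / np.log(2), p_x

# Sanity check: a binary symmetric channel with a 10% error rate has
# capacity 1 - H2(0.1) = 0.531 bits, achieved by a uniform input.
C, p_opt = channel_capacity([[0.9, 0.1],
                             [0.1, 0.9]])
print(C, p_opt)
```

In a biological setting p(y|x) would itself have to be estimated, for example from single-cell dose–response measurements, before such a calculation becomes meaningful.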

Typically, for a multi-cellular organism p_Y is given by the physiological demands; for example, a haematopoietic stem cell will have to produce common lymphoid and myeloid progenitors in certain proportions within narrow ranges [53]. Then, given the molecular processes, p_{Y|X}, the channel capacity is maximised by optimising p_X, the distribution over inputs. This is a classical variational problem. However, cell fate is often subject to more than one input (i.e. X is really a vector of different inputs), and a "parliament of pathways" decides upon the cell fate. This makes the variational problem harder for the modeller, but gives evolution more scope for altering the different molecular processes that result in the mapping p_{Y|X}. In fact, both p_X, the distribution over inputs such as endocrine and paracrine signals, and p_{Y|X}, the molecular interaction network transducing and processing these extracellular signals, are more flexible here, and subject to evolutionary "tinkering" with molecular processes. This may seem like a disconnect between the mathematical and biological interpretations of the channel capacity, but it merely reflects the differences between engineering and biological applications of information theory. For any given molecular network giving rise to p_{Y|X}, we can identify an "optimal" input distribution, p_X, that maximises the channel capacity. In reality this may not be the distribution over physiological conditions, but it allows us to assess how much information can in principle flow across a signalling/gene regulatory network underlying cell fate decision making.

3.3. The logic of sensing and responding

The signal processing capabilities of all biological entities are under evolutionary pressures due to their life history, along with their environment and the closely related constraints imposed by their physiological needs (such as the availability of food).
A crucial aspect is the ability to translate environmental states into the cells' vocabulary, by mapping them onto the corresponding physiological or response states, which in an information processing context correspond to different outputs. For example, in E. coli, chemotaxis is guided by the flagella [54]; these can rotate in two ways, causing the bacterium either to swim in a straight line or to tumble randomly. Thus there are two physiological states, and the process of switching between the two types of motion depends on the external nutrient availability, which is quasi-continuous. The ability of the cell to respond correctly to the environmental signals here directly affects the individual cell's reproductive success [55]. The number of distinct states onto which external signals are mapped is a clear indicator of the complexity of the signal processing, and is typically highest in multi-cellular organisms. These in fact possess a range of post-translational modification scenarios, allowing cells to exhibit multiple stable states [56], but also increasing their computational demands. Here, the spatial structure of the eukaryotic cell alone will frequently suffice to generate multi-stable behaviour [57] (where each state can correspond to a separate physiological cell state); partitioning biochemical processes into separate compartments is, of course, widespread and may underlie both physiological as well as developmental cell-fate decision making processes.

Current understanding of many cell differentiation processes suggests that to go from the original stem cell population to a set of fully differentiated cells, a series of intermediate steps has to be negotiated. This would allow the cell to break down a challenging computational problem — such as going from a haematopoietic stem cell to a fully differentiated plasma cell [58] — into a sequence of much simpler, typically binary, decisions. A simple switch is easily generated at both the transcriptional and post-transcriptional levels [59], and a sequence of such switches could facilitate the progression towards increased differentiation. Depending on the cell type (or circumstances) such decisions can also affect the fitness of an organism, just as in bacteria. The overall importance of getting such decisions "right" is clearly greater in less differentiated cells than in fully differentiated cells. For this reason it will be interesting to see whether signal processing networks, or their ability to handle molecular noise, change along differentiation cascades.

4. Information theoretical analysis of signal transduction systems


In contrast to the preceding three examples where the focus is on measuring information, we now describe a slightly less conventional class of applications involving the reverse-engineering of signal transduction systems. We begin with an example in network inference and demonstrate the use of the conditional mutual information measure. The fifth and final example is an information theoretic approach to experimental design, where one or both entities in the mutual information measure are the parameters of a given biological model.


4.1. Network reconstruction


The state of a cell is governed by the complex regulation of the expression of its genes. This regulation occurs at many levels, ranging from chromatin remodelling to post-translational modifications [60]. In order to better understand gene regulation, a large and growing body of transcriptomic data can be used to infer interactions between genes; this has prompted the development of a number of gene regulatory network (GRN) reconstruction algorithms, including Graphical Gaussian models, Bayesian networks and relevance networks [61–64].


A complete statistical description of a GRN is equivalent to a full specification of the joint probability distribution p(g_1, \ldots, g_N) for the set of N genes and their corresponding steady-state expression levels \{g_i\}_{i=1}^{N}. This joint distribution can, in turn, be expressed in the canonical form

p(g_1, \ldots, g_N) = \frac{1}{Z} \exp\left[ -\sum_{i}^{N} \phi_i(g_i) - \sum_{i,j}^{N} \phi_{ij}(g_i, g_j) - \cdots \right],   (9)

where Z is the normalisation factor and \phi_i(g_i), \phi_{ij}(g_i, g_j), \ldots are the potential functions representing the (possibly zero) contributions of the respective indicated sets of co-regulating genes [65]. Although obtaining the precise definitions of these potentials represents a full characterisation of the pairwise and higher-order genetic interactions, the exercise typically involves computationally non-trivial constrained optimisation routines; furthermore, these definitions are not required if one is interested in discovering only the presence or absence of interactions. In this section, we describe the information theoretic approach to this simpler task of uncovering the non-zero potentials for the overall inference of GRNs.

Information theoretic approaches to GRN reconstruction have two major strengths. The first is that mutual information is able to capture non-linear associations between variables, a feature seen in expression data. The second is the data processing inequality (DPI), which states that for a noisy system X → Y → Z, knowledge of Z cannot give more information about X than Y can give about X [19,65]; the DPI has been demonstrated to distinguish two genes regulated by a third from a trio of co-regulating genes. In reconstructing simulated GRNs, the combined approach using mutual information and the DPI outperforms Bayesian and relevance network techniques in the precision and recall of direct regulatory links [65]. It has been suggested that the performance can be further improved upon, with fewer false removals of network edges from trios of connected genes, by incorporating conditional mutual information [65]. Employing an estimator for the conditional mutual information allows indirect associations to be distinguished from direct ones via pairwise testing of all detected three-way associations.

The improved performance of this full information theoretic approach can be demonstrated, for instance, via a reconstruction of a simulated GRN model based on the Raf pathway — a gold-standard network model of eleven genes (see Fig. 4) — and a comparison with the pathway's experimentally verified structure (network structure and simulated data taken from [61,63]). Building an undirected graphical model of the Raf pathway from a simulated set of non-linear gene association data with moderate additive noise, the full approach incorporating conditional mutual information is shown to be superior to the "DPI-only" approach irrespective of the choice of DPI tolerance threshold [66]. The DPI-only method can avoid false negatives (true edges omitted from the inferred network) only at the expense of many false positives (incorrectly included edges). Equally importantly, for this reconstruction the DPI is never able to remove all false positives without also missing almost half of the true interactions.

Larger sets of gene expression data will enable increasingly accurate reconstructions of GRNs, including interactions regulated by the joint expression of two or more genes. This carries huge potential for understanding the role of transcriptional programmes in cell differentiation and in the molecular basis of cell phenotypes.
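The pruning role of conditional mutual information can be illustrated with a toy 'fork' motif in which one gene drives two others that do not interact directly: the two targets share information, but very little remains once the driver is conditioned upon. The model, bin counts and sample sizes below are our own illustrative choices and are unrelated to the Raf benchmark; note that the simulated association is non-monotonic, so it would be largely invisible to Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

def H(*labels):
    """Entropy (nats) of one or more jointly discretised variables."""
    joint = np.column_stack(labels)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mi(x, y):
    return H(x) + H(y) - H(x, y)

def cmi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), cf. eq. (6)."""
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

# Toy regulatory fork: gene X drives both Y and Z, which never interact.
n = 20000
x = rng.normal(size=n)
y = np.tanh(2 * x) + 0.3 * rng.normal(size=n)       # non-linear regulation
z = x**2 + 0.3 * rng.normal(size=n)                 # non-monotonic regulation
# coarse discretisation into eight quantile bins per variable
q = lambda v: np.digitize(v, np.quantile(v, np.linspace(0, 1, 9)[1:-1]))
xq, yq, zq = q(x), q(y), q(z)

print(mi(yq, zq))       # clearly positive: Y and Z appear to be associated
print(cmi(yq, zq, xq))  # far smaller: most of the association is explained by X
```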


Fig. 4. Reconstruction of the Raf signalling network. The undirected graphs here are inferred using the following information theoretic methods: (A) the conditional mutual information algorithm [66]; (B–D) the algorithm using mutual information and the DPI only, at tolerance thresholds of 0.05, 0 and 0.2, respectively. Blue edges indicate correct interactions. Green edges represent false negatives (edges falsely removed by the DPI) and red edges represent false positives (edges incorrectly included by the algorithm). (E) The precision and recall at various levels of the tolerance threshold for the DPI-only method; the best precision–recall tradeoff for the DPI-only method was determined by observing the DPI tolerance threshold at which these curves intersect.

4.2. Experimental choice in signal transduction systems

In addition to providing an ideal framework for the analysis of a given data set, as demonstrated in the previous example, information theory can also guide the optimal collection of data. Building mechanistic models from experimental measurements is central to systems biology, but not all experiments are equally useful: while some may allow the constructed model to be calibrated to high precision, others add little or no knowledge to what is already known from previously collected data. Choosing the optimal experiment is the goal of experimental design [67]. In this section we describe how information theory can be used to identify the experiment that removes the most uncertainty from a given model and, by implication, maximises our understanding of the underlying biological system [18].

Mathematical models of developmental signalling systems typically depend on parameters whose values are not known to a high degree of certainty [68]. In the Bayesian framework, this initial uncertainty is represented by the prior distribution over the parameters. The expected data output from a well-designed experiment should allow one to infer the underlying model parameters with high confidence. More formally, given the prior and the corresponding output distribution, the optimal experiment is the one with the maximum mutual information between the parameters and the measurable outputs. In addition, mutual information can also be used to determine the experiment that would provide the best prediction of the system behaviour under a separate set of experimental conditions that, for some reason, cannot be implemented. These two applications are illustrated in the following using examples of developmental signal transduction system experiments. Here we define the optimal experiment simply as the one that maximises the mutual information; it is, however, also possible to take into account the cost of an experiment or the complexity of an experimental setup.
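A minimal sketch of this idea, using an invented one-parameter decay model rather than any system discussed in this review: the mutual information between the parameter and the simulated observable is estimated by Monte Carlo for several candidate measurement times, and the most informative design is the one with the largest value. The prior, noise level and candidate times are arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def design_mi(t_measure, n_outer=2000, n_inner=2000, sigma=0.1):
    """Monte Carlo estimate of I(theta; y) for the toy model
    y = exp(-theta * t) + N(0, sigma^2), with a log-normal prior on theta."""
    theta = rng.lognormal(mean=0.0, sigma=0.5, size=n_outer)            # prior draws
    y = np.exp(-theta * t_measure) + sigma * rng.standard_normal(n_outer)
    log_lik = norm.logpdf(y, loc=np.exp(-theta * t_measure), scale=sigma)
    # the marginal p(y) is approximated by averaging the likelihood
    # over a fresh set of prior draws
    theta_in = rng.lognormal(mean=0.0, sigma=0.5, size=n_inner)
    log_marg = np.array([
        np.log(np.mean(norm.pdf(yi, loc=np.exp(-theta_in * t_measure), scale=sigma)))
        for yi in y])
    return np.mean(log_lik - log_marg)                                  # in nats

# Score three candidate measurement times; the largest value identifies
# the most informative experiment under this prior and noise level.
for t in [0.1, 1.0, 5.0]:
    print(f"measure at t = {t}: I(theta; y) ~ {design_mi(t):.2f} nats")
```

This mirrors the Monte Carlo strategy for non-Gaussian distributions mentioned in Section 2.3.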

The first example concerns the Hes1 transcription factor, which plays a number of important roles, including in cell differentiation and the segmentation of vertebrate embryos. Hes1 transcriptional regulation can be modelled by a system of ordinary differential equations involving three species (Hes1 mRNA, Hes1 nuclear protein, and Hes1 cytosolic protein) and four parameters. The shuttling rate parameter — the rate at which Hes1 enters the nucleus — is of main interest, as it was shown to be the most sensitive parameter with regard to obtaining the oscillatory behaviour of the Hes1 protein [43]. Whether western blot protein measurements (P) or real-time PCR mRNA measurements (m) provide more information about the shuttling rate parameter can be determined by comparing the corresponding mutual information measures between the parameter and each measurement type. As shown in Fig. 5A,
