RESEARCH ARTICLE

Systems Biology Approach to Model the Life Cycle of Trypanosoma cruzi Alejandra Carrea, Luis Diambra* Centro Regional de Estudios Genómicos, Universidad Nacional de La Plata, La Plata, Argentina * [email protected]

Abstract

OPEN ACCESS

Due to recent advances in reprogramming cell phenotypes, many efforts have been dedicated to developing reverse engineering procedures for the identification of gene regulatory networks that emulate dynamical properties associated with the cell fates of a given biological system. In this work, we propose a systems biology approach for the reconstruction of the gene regulatory network underlying the dynamics of the Trypanosoma cruzi’s life cycle. By means of an optimisation procedure, we embedded the steady state maintenance, and the known phenotypic transitions between these steady states in response to environmental cues, into the dynamics of a gene network model. In the resulting network architecture we identified a small subnetwork, formed by seven interconnected nodes, that controls the parasite’s life cycle. The present approach could be useful for better understanding other single cell organisms with multiple developmental stages.

Citation: Carrea A, Diambra L (2016) Systems Biology Approach to Model the Life Cycle of Trypanosoma cruzi. PLoS ONE 11(1): e0146947. doi:10.1371/journal.pone.0146947 Editor: Luzia Helena Carvalho, Centro de Pesquisa Rene Rachou/Fundação Oswaldo Cruz (FiocruzMinas), BRAZIL Received: August 6, 2015 Accepted: December 22, 2015 Published: January 11, 2016 Copyright: © 2016 Carrea, Diambra. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability Statement: All relevant data are within the paper and its Supporting Information files. Funding: The authors have no support or funding to report. Competing Interests: The authors have declared that no competing interests exist.

Introduction One of the main aims in the post-genome era is to elucidate the complex webs of interacting genes and proteins underlying the establishment and maintenance of cell states. Consequently, many researchers have focused on developing quantitative frameworks to identify modules that govern the transitions between different phenotypes [1–3]. The gene regulatory network (GRN) approach is one of the most popular frameworks used today [4, 5]. This approach has been used to study key reprogramming genes and cell differentiation processes in stem cells from different points of view [6–8]. Mathematically, GRN models are dynamical systems whose states determine the gene-expression levels [4]. The structure of the network is defined as a graph whose nodes are associated with genes (or groups of genes), and whose edges represent the interactions between the nodes. The task of uncovering the GRN architecture from the cell states (gene-expression profiles) represents a very complex inverse problem that has become central in functional genomics [9]. The main drawbacks of this reverse engineering task are not only the large number of genes and the limited amount of data available, but also the nonlinear dynamics of regulations, the inherent experimental errors, the noisy readouts of expression levels, and many other unobserved factors that are part of the challenge [10]. Although emerging technologies offer new prospects for monitoring mRNA concentrations, researchers have focused on determining the architecture of simplified theoretical models [11].

PLOS ONE | DOI:10.1371/journal.pone.0146947 January 11, 2016

1 / 20

Reconstructing the Life Cycle of Trypanosoma cruzi

In this work, we have implemented a GRN approach to analyse transcriptional data of the steady states of the flagellated protozoan parasite Trypanosoma cruzi (T. cruzi). This trypanosomatid is the causative agent of Chagas disease, that affects about 7–8 million people worldwide causing about 12,000 deaths per year. Usually, the parasites are transmitted to humans and to other mammalian hosts mainly by contact with the faeces of infected blood-sucking triatomine bugs [12]. T. cruzi has several developmental stages both in insect vectors and in mammalian hosts (Fig 1a). Insects become infected by sucking blood from mammals with circulating parasites (trypomastigotes). In the midgut of the insect, trypomastigotes differentiate

Fig 1. The life cycle of T. cruzi. (a) Sketch illustrating the life cycle of the parasite. (b) Plots illustrating the transcriptional snapshots of the parasite’s four stages. After a dimensional reduction analysis of the microarray dataset, we have found that the four steady states can be represented by 339 variables. Each of these variables (cells in the 19 × 18 array) corresponds to the intra-cluster average of the log-transformed relative expression level of the genes that belong to the corresponding cluster. Since gene assignment to the clusters is the same for all states, the arrays can be directly compared with one another. doi:10.1371/journal.pone.0146947.g001

PLOS ONE | DOI:10.1371/journal.pone.0146947 January 11, 2016

2 / 20

Reconstructing the Life Cycle of Trypanosoma cruzi

into epimastigotes that replicate by binary fission. Then, epimastigotes differentiate into metacyclic trypomastigotes in the hindgut. This parasite form is released in the insects faeces and enters the mammalian host. Once in the vertebrate host, metacyclic trypomastigotes invade local cells and differentiate into amastigotes that replicate by binary fission. They subsequently transform into trypomastigotes inside the cells. By lysing the cells, trypomastigotes are released into the circulation. Thus, they spread via the bloodstream and infect new cells from distant tissues, mainly muscle and ganglion cells. There, trypomastigotes transform back into intracellular amastigotes which, in turn, undergo further cycles of intracellular multiplication. The cycle of transmission is completed when circulating trypomastigotes are taken up in blood meals by triatomine vectors [13, 14]. Even though there are potential vaccine candidates against T. cruzi infection, no vaccine is yet available [15]. Thus, the finding of novel therapeutic targets remains a significant challenge in the control of Chagas disease. Taking this into consideration, we have implemented a GRN approach to analyse transcriptional data of T. cruzi’s steady states [16]. The first analyses of networks related to T. cruzi were not truly GRN approaches but extracellular matrix (ECM) interactomes [17, 18]. These protein-interaction networks were built using the MiMI Cytoscape plugin. In these works some parasite surface molecules, which are known to modulate or interact with these host proteins, were pointed out. In the present framework, we have uncovered the underlying architecture network that supports the steady states associated with the four phenotypic stages of T. cruzi and the transitions between the parasite’s life cycle stages in response to environmental cues. We believe that this gene network model can clarify the signaling pathways, predict the response of cellular systems to multiple perturbations other than the ones used to derive the model, and determine the perturbation pattern for any desired response.

Materials and Methods Microarray data normalisation In this work we have used the microarray experiments of Minning et al. [16]. These data are publicly available in Gene Expression Omnibus (GEO) database (Accession no.: GSE14641). This series is the result of dye-swap experiments, out of which we selected the probe intensity signals of 12 microarrays (three biological replicates, and the four not-mixed parasite stages). These microarrays comprise 12,288 unique 70-mers designed against open reading frames in the annotated CL Brener reference genome sequence. They also contain 500 control oligonucleotides designed from Arabidopsis sequences. All of these oligonucleotides were printed in duplicate. Further details about probe preparation, microarray hybridisation, and data acquisition can be found in [16], while the description of the microarrays is available at http://pfgrc.tigr.org. The probe intensity signals from the microarrays were subjected to the following normalisation procedure. (i) The signal intensity of each probe was set at the average of the signal intensities associated with a pair of replicate spots. (ii) The signal intensity of a probe i was normalised against the average signal of control Arabidopsis probes in order to obtain a signal relative intensity within the slide. The average signal of control Arabidopsis probes is the arithmetic mean of a set of control probes with valid signals. Of course, this set is the same in all microarray experiments. This normalisation procedure has allowed us to integrate the expression data of all the microarrays. The signal relative intensity of the probe i recorded in one of a the biological replicates, j = 1, 2, 3, at one of the stages, α = 1, 2, 3, 4, was represented by yi j . (iii) a After within-slide replicates processing, we averaged the relative intensity yi j over all replicates P a belonging to the same stage, i.e., yai ¼ 1=3 j yi j . Probes without a valid relative signal in all

PLOS ONE | DOI:10.1371/journal.pone.0146947 January 11, 2016

3 / 20

Reconstructing the Life Cycle of Trypanosoma cruzi

three biological replicates were not considered in the subsequent analyses. As a result of this processing, we obtained the relative intensity of 8904 probes at parasite’s four different stages. We then considered the variable that describes the expression level of the probe i at stage α as y ai =h y ai ia . S1 Table lists all normalised expression levels, xia , used in the the quantity xia ¼ Loge ½ following analyses, with their corresponding oligo IDs.

Clustering procedure Instead of using each gene’s profile, many researchers have analysed the cell at a higher level of abstraction. One way to do this is by grouping redundant genes, i.e. by clustering co-expressed genes [19, 20], and using the average within each cluster as a variable. In order to group the genes by similar expression profiles, we have applied an agglomerative hierarchical clustering method; the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), though a more sophisticated clustering method like self-organizing map could be used for this task. The agglomerative process is stopped at a given number of clusters considered suitable for our dataset. Since the suitable number of clusters, Nc, is not known, it has to be computed beforehand. In order to do this, we repeated the clustering procedure for several Nc values, and computed the Davies-Bouldin index (DBI) as a measure of the clustering merit [20, 21]. The DBI is defined as: n o Nc max d  d X k j 1 ð1Þ ; E¼ Nc j¼1 jjck  cj jj P where dk ¼ Nk1 i jjxi  ck jj denotes the centroid intra-cluster distances of cluster k (Nk being the number of genes belonging to cluster k); and ||ck − cj|| is the distance between the cluster centroids. A low DBI value indicates a good cluster structure. It should be noted that increasing Nc without penalty will always reduce the resulting index. Then, the choice of Nc will intuitively strike a balance between the data compression and the accuracy of the dimensionality reduction. S1 Fig displays the DBI as a function of Nc for the gene-expression profile under study. It can be seen that the DBI does not suffer a significant reduction beyond Nc = 339. Thus, Nc = 339 was selected as the optimal number of clusters. The profiles of the 8,904 genes were grouped in 339 clusters, and the intra-cluster average of the expression level (i.e. xaj ¼ hxia ii2clusterj ) was used in all of the subsequent analyses. Fig 1b displays a 2D array of the resulting average levels after the dimension reduction process described above for the four stages of T. cruzi’s life cycle. The sets of genes belonging to each cluster are listed in S2 Table, and the intra-cluster averages of the expression levels, xaj , for each cluster (rows) at each of the parasite’s stages (columns) are listed in S3 Table.

Reverse engineering methods Gene network dynamics. In this work, we have implemented a discrete-time linear model [22–24] which has two advantages: it can take into account fluctuations, and its parameter estimation does not involve intensive computational steps [11]. In this model, the system’s state at time t is represented by an N-dimensional vector x(t), which represents the activity of the N nodes of the network. The temporal evolution of the gene network is governed by: X wi;j xj ðt Þ þ yi þ kmi þ i ðtÞ; xi ðt þ Dt Þ ¼ ð2Þ j

where wi,j are the elements of the weighted connectivity matrix W, θi is a constant bias term of gene i, and kmi determines the influence of the environmental cue μ on gene i. We have

PLOS ONE | DOI:10.1371/journal.pone.0146947 January 11, 2016

4 / 20

Reconstructing the Life Cycle of Trypanosoma cruzi

considered four different cues corresponding to unknown external differentiation signals. Thus, μ = 1, 2, 3, and 4 represent the external signals responsible for the transitions to the amastigote, epimastigote, metacyclic tryp., and trypomastigote stages, respectively. Finally, i(t) is a noise term assumed to be Gaussian with mean equal to 0. In order to simplify the notation for the parameter estimation procedure, we noticed that the bias term and the environmental cues can be included in an extended version of matrix W and of state vector x. Thus, the state of gene i is given by: xi ðt þ Dt Þ ¼ ðwi;1 ; wi;2 ; . . . ; wi;N ; yi ; kmi Þ  ðx1 ; x2 ; . . . ; xN ; 1; 1Þ þ i ;

ð3Þ

where μ corresponds to the acting environmental cue. This said, the same parameter estimation method can be applied whether the environmental cues are present or not. Singular value decomposition (SVD). Linear models serve as the basis of all continuous gene-network approaches currently available to model typical time-course gene-expression data sets (see [11] for a review). These data sets consist of M pairs of input-output states, represented by D = {X,Y}. Matrix X is the N × M gene-expression matrix at time t. The columns of matrix X labeled by index ν, xν, correspond to the experiments, while the rows indicate individual genes. The same is valid for the gene-expression matrix at time t+Δt, Y. For a given D, the linear model must map each gene-expression state to the consecutive state, i.e.: yn ¼ Wxn ;

n ¼ 1; . . . ; M:

ð4Þ

Therefore, in order to find the connectivity matrix, the predicted states from a given input state xν of the training set must be as close as possible to the output state yν. An alternative would be to minimise the cost function ∑ν kWxν − yνk. A particular solution with the smallest L2 norm is given in terms of the SVD of matrix XT (where superscript T denotes the transpose matrix), i.e. XT = U  S  VT, where U is a unitary M × N matrix of left eigenvectors, S is a diagonal N × N matrix containing the eigenvalues {s1, . . .,sN}, and V is a unitary N × N matrix of right eigenvectors [23, 25]. Thus, the solution with the smallest L2 norm represented by WL2 is given by: T WL2 ¼ Y  U  diagðs1 j ÞV :

ð5Þ

Without loss of generality, all sj elements whose value is different from 0 were listed at the end of diagonal matrix S, and the s1 j values in Eq (5) were considered to be 0 if sj = 0. SVD is mathematically related to the eigen decomposition (ED) and the principal component analysis (PCA). SVD can be understood as a generalisation of ED, but ED only applies on diagonalizable matrices, and fails if the matrix is not square or is singular as in our case. SVD and ED are related, in particular the column vectors of U are eigenvectors of X  XT, while the column vectors of V are eigenvectors of XT  X. The diagonal entries of S are the square roots of the eigenvalues of both X  XT and XT  X. The eigenvectors of XT  X also allow to compute the principal components in the standard PCA. PCA is a common tool for exploratory data analysis. It can provide a lower-dimensional picture by projecting the data onto the subspace with the higher variance [26]. We have exploited PCA analysis for a better visualisation of our results. However, since the focus of this manuscript is finding the connectivity matrix rather than obtaining a better representation of data sets, we have used SVD instead of ED or PCA because it is more suitable for that purpose. The smallest L2 norm solution cannot be unique. Assuming that xν are linearly independent, finding the unique solution requires that M  N. Unfortunately, the inverse problem in GRN involves M <

Systems Biology Approach to Model the Life Cycle of Trypanosoma cruzi.

Due to recent advances in reprogramming cell phenotypes, many efforts have been dedicated to developing reverse engineering procedures for the identif...
NAN Sizes 1 Downloads 9 Views