Chapter 13

Genome-Scale Models of Plant Metabolism

Margaret Simons, Ashish Misra, and Ganesh Sriram

Abstract

A genome-scale model (GSM) is an in silico metabolic model comprising hundreds or thousands of chemical reactions that constitute the metabolic inventory of a cell, tissue, or organism. A complete, accurate GSM, in conjunction with a simulation technique such as flux balance analysis (FBA), can be used to comprehensively predict cellular metabolic flux distributions for a given genotype and given environmental conditions. Apart from enabling a user to quantitatively visualize carbon flow through metabolic pathways, these flux predictions also facilitate the formulation of hypotheses about new network properties. By simulating the impacts of environmental stresses or genetic interventions on metabolism, GSMs can aid the formulation of nontrivial metabolic engineering strategies. GSMs for plants and other eukaryotes are significantly more complicated than those for prokaryotes because of their extensive compartmentalization and size. The reconstruction of a GSM involves creating an initial model, curating the model, and then rendering the model ready for FBA. Model reconstruction involves obtaining organism-specific reactions from the annotated genome sequence or from organism-specific databases. Model curation involves determining metabolite protonation status or charge, ensuring that reactions are stoichiometrically balanced, assigning reactions to appropriate subcellular compartments, deleting generic reactions or creating specific versions of them, linking dead-end metabolites, and filling pathway gaps to complete the model. Subsequently, the model requires the addition of transport, exchange, and biomass synthesis reactions to make it FBA-ready. This cycle of editing, refining, and curation has to be performed iteratively to obtain an accurate model. This chapter outlines the reconstruction and curation of GSMs with a focus on models of plant metabolism.
Key words Genome-scale metabolic models, Metabolic pathway databases, Curation, Compartmentalization, Intercompartmental transporters, The SuBliMinaL toolbox, COBRA toolbox, KEGG, MetaCyc

Margaret N. Simons, Ashish Misra, and Ganesh Sriram conceived the chapter. Margaret N. Simons wrote an initial draft of the chapter; Ashish Misra and Ganesh Sriram critically edited it; Ganesh Sriram prepared the final version. All authors approved the final version.

Ganesh Sriram (ed.), Plant Metabolism: Methods and Protocols, Methods in Molecular Biology, vol. 1083, DOI 10.1007/978-1-62703-661-0_13, © Springer Science+Business Media New York 2014

1  Introduction

A genome-scale model (GSM) of an organism is an in silico stoichiometric model that includes a large number of metabolic reactions from pathways known to operate in the cells of the organism [1]. Typically, a GSM is expected to contain the complete metabolic inventory of an organism. In practice, however, GSMs contain from several hundred to a few thousand reactions [2–4]. GSMs are reconstructed on the basis of information obtained from genome sequence annotations, gene and protein homology, biochemistry textbooks, the primary literature, and isotope labeling experiments [5–8], and are iteratively refined (e.g., refs. [5, 9, 10]). Quantitative analysis of GSMs facilitates the prediction or estimation of carbon traffic within myriad pathways in a cell, as well as the prediction of the outcomes of molecular or environmental perturbations that can affect metabolic responses [11, 12]. A frequently used technique to quantitatively analyze GSMs is flux balance analysis (FBA). FBA and associated methodologies employ mass balancing and linear or quadratic optimization [13, 14], based on a suitable objective function, to analyze a constrained GSM. Using a small amount of experimental data, such as growth rates and extracellular flux measurements, FBA determines a feasible distribution of fluxes for the reactions in a cellular metabolic network [13]. Such flux distributions identify active and inactive pathways in the cell during its growth or survival under different environmental conditions. FBA also enables the simulation of the effects of gene knockouts and overexpression on metabolism, thus pointing out strategic metabolic engineering targets. Another technique to analyze GSMs is elementary flux mode (EFM) analysis (see Chapter 14), which can delineate all possible metabolic pathways satisfying a given set of constraints, such as producing a particular metabolite starting from a particular carbon source. Although analyses such as FBA and EFM analysis can be performed on stoichiometric models of any size, implementing them on a GSM can enable the hypothesis or discovery of new network properties, because a GSM provides a holistic view of metabolic reactions [15].
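The mass balancing at the heart of FBA says that, at steady state, every internal metabolite is produced exactly as fast as it is consumed, i.e., S·v = 0 for the stoichiometric matrix S and flux vector v. The sketch below makes this concrete under stated assumptions: the three-reaction network, its reaction names (EX_A, R1, EX_B), and the flux values are hypothetical, and S is stored as a nested dictionary for readability, whereas toolboxes such as COBRA hold it as a sparse matrix.

```python
from typing import Dict

# Hypothetical toy network: uptake of A (EX_A), conversion A -> B (R1),
# and secretion of B (EX_B).
# S[metabolite][reaction] = stoichiometric coefficient (negative = consumed).
S: Dict[str, Dict[str, float]] = {
    "A[c]": {"EX_A": 1.0, "R1": -1.0},
    "B[c]": {"R1": 1.0, "EX_B": -1.0},
}


def mass_balance_residuals(S: Dict[str, Dict[str, float]],
                           v: Dict[str, float]) -> Dict[str, float]:
    """Return (S v)_i, the net production rate, for each metabolite i."""
    return {
        met: sum(coef * v.get(rxn, 0.0) for rxn, coef in row.items())
        for met, row in S.items()
    }


def is_steady_state(S: Dict[str, Dict[str, float]],
                    v: Dict[str, float], tol: float = 1e-9) -> bool:
    """True if no metabolite is accumulating or being depleted."""
    return all(abs(r) <= tol for r in mass_balance_residuals(S, v).values())


balanced = {"EX_A": 2.0, "R1": 2.0, "EX_B": 2.0}
unbalanced = {"EX_A": 2.0, "R1": 1.0, "EX_B": 1.0}  # A would accumulate

print(is_steady_state(S, balanced))            # True
print(mass_balance_residuals(S, unbalanced))   # {'A[c]': 1.0, 'B[c]': 0.0}
```

FBA searches only among flux vectors for which these residuals are all zero; additional constraints and an objective then single out one of them.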
The first GSM was published for Haemophilus influenzae in 1999 [3]. Since then, GSMs for several microorganisms have been published, including ones for Escherichia coli [2, 16–18] and Saccharomyces cerevisiae [11, 19–22]. The first plant GSM, for Arabidopsis thaliana, was published in 2009 [23]. This relatively late appearance is not surprising given the extensive nature of plant metabolism and the challenges involved in capturing its various features in a GSM (see below). A non-exhaustive list of published GSMs includes three GSMs for Arabidopsis thaliana [23–25], one for Zea mays [26], one generic GSM for C4 plants [27], one for the alga Chlamydomonas [28], one for rapeseed, and two for rice (see Note 7 and Table 1). Two reasons for the relative scarcity of plant GSMs are (1) extensive compartmentalization of reactions and pathways and (2) numerous and diverse secondary metabolic pathways. Compartmentalization, a prominent feature of plant metabolism, is necessary for plants to apportion cellular functions between subcellular organelles [29]. Often, some pathways and reactions are replicated in multiple compartments, with a different flux distribution in each compartment [30]. Intercompartmental transporter proteins allow selected metabolites to move from one compartment to another [31]. The compartmentalization of plant cells, along with the large number of primary and secondary metabolic reactions that occur within a plant cell, makes the reconstruction of a plant GSM challenging and time-consuming.

Table 1 Selected plant GSMs published at the time of writing

Model | Number of reactions | Number of metabolites | Compartments featured
AraGEM [23] | 1,567 | 1,748 | Cytoplasm, mitochondrion, plastid, peroxisome, vacuole
Arabidopsis (Poolman) [24] | 1,406 | 1,253 | Does not distinguish between cellular compartments
Arabidopsis (Radrich) [25] | 2,315 | 2,328 | Does not distinguish between cellular compartments
Maize iRS1563 [26] | 1,985 | 1,825 | Cytoplasm, mitochondrion, plastid, peroxisome, vacuole, extracellular space
C. reinhardtii [28] | 2,190 | 1,068 | Cytosol, chloroplast, mitochondrion, glyoxysome, nucleus, Golgi apparatus, thylakoid, flagellum, eyespot
Rapeseed (see Note 7) | 313 | 262 | Cytosol, chloroplast, mitochondrion
Rice (see Note 7) | 1,736 | 1,484 | Cytosol, chloroplast, mitochondrion
Rice (see Note 7) | 326 | 371 | Cytosol, plastid, mitochondrion

The reconstruction of a GSM requires the collection and processing of a substantial amount of information on the reactions that occur within an organism. This process is schematically depicted in Fig. 1 and explained in Subheading 3. First, the largest available set of reactions occurring in an organism is usually determined from databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the MetaCyc/BioCyc family of databases. Both of these databases host individual collections of reactions for several organisms. However, each database has its own convention for naming reactions and metabolites, as well as its own advantages and disadvantages. Choosing the proper database for an organism is an essential first step in creating the GSM. Following this, several curation steps are required to ensure that the GSM accurately models the organism's metabolic pathways. Various toolboxes can assist a user in creating and editing a GSM as well as in ensuring that the correct reactions are included in it. The initial model can be created manually, by entering reactions obtained from the databases into Excel, or by using the SuBliMinaL Toolbox. Toolboxes such as the SuBliMinaL Toolbox and rBioNet (an add-on to COBRA) can help with curating and fixing incorrect reactions or gaps within the model, thus enhancing the accuracy of the model.

[Fig. 1 flowchart: Reactions from KEGG or BioCyc → Preliminary Model → Metabolite Protonation, Reaction Balancing, Expansion of Generic Reactions, Elimination of Dead-End Metabolites, Compartmentalization of Reactions → Curated Model → Transport Reactions, Exchange Reactions, Biomass Synthesis Reaction → Completed Model]

Fig. 1 Workflow of GSM reconstruction. Many curation steps between the preliminary and final models may need to be iteratively repeated to obtain a GSM that accurately simulates the metabolic behavior of the modeled plant

GSM reconstruction involves the elimination of some reactions that are included in databases but are not desired in the stoichiometric model. For instance, metabolic databases or initial GSMs may contain dead-end metabolites, which are only produced or only consumed (but not both) by reactions in the database. Often, such metabolites occur only once in the database or the initial model. Because metabolites cannot build up or be depleted within a cell at steady state, a dead-end metabolite indicates a gap in the model that should be eliminated, preferably by filling it. GSM reconstruction also requires the addition of reactions that are missing from databases but certainly occur in the cell, e.g., transport, exchange, and biomass synthesis reactions. Transport reactions are essential to link different compartments within the cell. Exchange reactions allow metabolites to move from inside the cell to outside the cell or vice versa. Furthermore, databases contain generic reactions that feature nonspecific metabolites (e.g., "a fatty acid" instead of "stearic acid"). These metabolites and reactions need to be expanded so that they are specific to the organism. Finally, one or more experimentally determined biomass synthesis reactions that account for the proportions of all metabolites contributing to the biomass of the plant cells or tissues of interest [32, 33] should be incorporated into the GSM.


2  Materials

2.1  Databases

Inventories of reactions to be included in GSMs are obtained from metabolic databases. The reactions specific to an organism may be collectively obtained from a database; however, it is useful to view and edit specific reactions in the database during manual curation of the model.

2.1.1   KEGG

KEGG (http://www.genome.jp/kegg/kegg2.html) is a collection of known biochemical pathways and reactions occurring within a wide variety of organisms [34]. KEGG includes smaller databases for individual organisms, including several plants.

2.1.2  MetaCyc/BioCyc

MetaCyc and BioCyc (http://metacyc.org and http://biocyc.org) are collections of pathway and genome databases that provide an electronic reference for the genomes and metabolic pathways of sequenced organisms. Individual metabolic pathways or the entire metabolic map of an organism can be viewed on the BioCyc website [35]. Several databases such as Gramene (http://www.gramene.org/pathway) collaborate with BioCyc and contain databases for various plants, including Arabidopsis, maize, rice, poplar, and sorghum.

2.2  KEGGtranslator

KEGGtranslator (http://www.ra.cs.uni-tuebingen.de/software/KEGGtranslator/index.htm) is an application that can visualize files in KEGG Markup Language (KGML) format and convert them into a variety of output formats [36]. KGML is an XML file format that is specific to the KEGG database. KEGGtranslator requires Java and can convert KGML files into formats that are easier to work with, such as the Systems Biology Markup Language (SBML).

2.3  The SuBliMinaL Toolbox

The SuBliMinaL Toolbox (http://www.mcisb.org/resources/subliminal) provides an integrated interface to perform common tasks that are essential during the creation and editing of GSMs [37]. The toolbox can generate draft reconstructions, determine the protonation state of a metabolite, balance the mass and charge of reactions, and format the reconstruction so that it can be used in third-party analysis packages. Both the KEGG and BioCyc databases are compatible with the SuBliMinaL Toolbox.

2.4  Pathway Tools

Pathway Tools (http://bioinformatics.ai.sri.com/ptools; also see Chapter 10) is software for creating and visualizing organism-specific databases [38]. The reactions from these databases can be exported in SBML format. Pathway Tools also creates a metabolic map representing the reactions and pathways from the GSM.


2.5  Spreadsheet Program

A spreadsheet program such as Microsoft Excel (http://office.microsoft.com/en-us/excel) is useful for manual curation of GSMs. The large amount of information in the model, including the reactions and metabolites, can be viewed in an organized manner, making it convenient to edit the model manually.

2.6  MATLAB

MathWorks MATLAB (http://www.mathworks.com/products/matlab) is a widely used program for solving engineering and technical computing problems, especially those involving extensive matrix manipulations and elaborate algorithms. MATLAB implements algorithms, analyzes and visualizes data, and computes numerical results. During the creation of a GSM, MATLAB can be employed to encode the GSM in SBML, the format required by the other toolboxes. COBRA, the main toolbox used for GSMs, runs on MATLAB.

2.7   COBRA

The Constraint-Based Reconstruction and Analysis (COBRA; http://opencobra.sourceforge.net) toolbox uses MATLAB as its programming environment to edit, repair, and run FBA on metabolic models, including GSMs. COBRA's capabilities are normally invoked through the MATLAB command line. Once a metabolic model is FBA-ready, COBRA can be used to implement methods such as FBA to determine a feasible solution space of the model and then apply measurements and physicochemical constraints to reduce the solution space. An optimization algorithm and one or more objective functions are required to isolate a particular flux distribution among the many possible ones in the solution space [39, 40]. COBRA can use built-in linear optimization solvers such as the GNU Linear Programming Kit (GLPK; http://www.gnu.org/software/glpk), which are usually adequate for FBA-type analyses. However, more complex problems may require external solvers such as GUROBI (http://www.gurobi.com), CPLEX (http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/index.html), and TOMLAB (http://tomopt.com/tomlab). An add-on for the COBRA toolbox, rBioNet (http://opencobra.sourceforge.net), makes COBRA (version 1.3 or higher) commands available through a user-friendly interface [41]. It assembles and monitors the model reconstruction process to reduce the human error that may occur during manual editing of spreadsheets. However, because all commands in rBioNet are already contained within COBRA and can be invoked through the MATLAB command line, this program is optional and needed only if the user wants a graphical interface.
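For readers who want the optimization problem behind these solvers written out, the standard FBA linear program is given below. This is the textbook formulation rather than anything specific to COBRA: S is the stoichiometric matrix (metabolites × reactions), v the flux vector, c a weight vector that usually selects a single objective such as the biomass synthesis reaction, and lb and ub the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\mathsf{T}}\mathbf{v} \\
\text{subject to} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{lb} \le \mathbf{v} \le \mathbf{ub}.
\end{aligned}
```

Measured fluxes are commonly imposed by setting $lb_j = ub_j = v_j^{\text{meas}}$ (or a narrow interval around it), irreversible reactions receive $lb_j = 0$, and an uptake is blocked by setting the bounds of its exchange reaction to zero.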

2.8  MetNet Online

MetNet Online (http://metnetonline.org) can identify the compartment of reactions from the BioCyc database in specific organisms, including Arabidopsis, Chlamydomonas, soybean, and Vitis [42]. MetNet Online allows export, through Excel, of information on the compartmentalization of reactions or metabolites for individual pathways. Although MetNet Online provides a good foundation for determining the compartmentalization of metabolic reactions, it has inaccuracies. For example, in the Arabidopsis-specific pathways on MetNet Online, the pentose phosphate pathway is shown (as of this writing) to occur only in the cytosol. This contrasts with experimental evidence for the presence of many of this pathway's enzymes in both the plastid and the cytosol in Arabidopsis [30].

3  Methods

3.1  Preliminary Model

A preliminary GSM can be created by using the SuBliMinaL Toolbox or manually in Excel. Depending on the database selected and the completeness of the reactions contained therein, some of the following steps may be skipped. For example, using the KEGG database obviates some curation steps. The manual creation of the preliminary model requires the addition of transport, exchange, and biomass equations as well as incorporation of the compartmentalization of each reaction. The SuBliMinaL Toolbox, a very powerful tool in GSM reconstruction due to its ability to automate several steps, can add transport and exchange reactions, include biomass equations as well as suggest intracellular compartmentalization.

3.1.1  Choosing a Database

Choosing a database from which to obtain reactions is critical in GSM development. Criteria that should be considered while selecting a database include the completeness of reactions, the specificity of the reactions to the organism, and the number of generic reactions present. Selecting a database with a near-complete set of reactions from the organism of interest will decrease the number of dead-end metabolites, causing fewer issues during editing of the model. Additionally, it is important to include only the reactions that are known to occur within the organism of interest; for instance, some databases include reactions from plants that are similar, but not identical, to the plant of interest. Finally, choosing a database with fewer generic reactions will save time during model curation, because generic reactions must either be deleted or replaced with variants that include metabolites specific to the plant of interest (see Subheading 3.2.5 and Note 1).

3.1.2  Creating a Preliminary Model with the SuBliMinaL Toolbox

The SuBliMinaL Toolbox consists of a set of modules that can generate a draft model and curate it by obtaining and integrating information from KEGG, MetaCyc/BioCyc, or a combination of databases. The "KEGG-extract" module can extract files from KEGG for a desired organism, given the NCBI taxonomy ID of the organism. Following this, the "merge" module can create a draft SBML model reconstruction. The "MetaCyc-extract" module creates a model from MetaCyc/BioCyc. This module requires the user to have access to the BioCyc family of databases (freely available for academic users). Given access, this module can reconstruct an SBML model for an organism based on its NCBI taxonomy ID.

3.1.3  Obtaining Reactions from KEGG

KEGGtranslator can be used to convert an organism’s KEGG pathway map to reactions needed for a GSM. Once the reactions are extracted in KGML format, they can be exported into SBML format using KEGGtranslator. COBRA can convert a set of reactions in SBML format to Excel format, which permits easy reading and editing.
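KGML stores each reaction as a `reaction` element whose `substrate` and `product` children name KEGG compound IDs. KEGGtranslator itself is a Java tool; purely to show what the extraction involves, the sketch below uses Python's standard `xml.etree.ElementTree` to pull reactions out of a small hand-written KGML-style fragment. The pathway, reaction, and compound IDs in the fragment are illustrative and should be checked against KEGG before use. The fragment, like the KGML substrate and product elements it imitates, carries no stoichiometric coefficients, so coefficients have to be verified separately.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of a KGML file (illustrative IDs,
# not a real KEGG download).
KGML = """<pathway name="path:ath00400" org="ath" number="00400" title="example">
  <reaction id="1" name="rn:R00986" type="irreversible">
    <substrate id="2" name="cpd:C00251"/>
    <substrate id="3" name="cpd:C00064"/>
    <product id="4" name="cpd:C00108"/>
    <product id="5" name="cpd:C00022"/>
    <product id="6" name="cpd:C00025"/>
  </reaction>
</pathway>"""


def strip_prefix(kegg_id: str) -> str:
    """'cpd:C00251' -> 'C00251'; 'rn:R00986' -> 'R00986'."""
    return kegg_id.split(":", 1)[1]


def kgml_reactions(xml_text: str):
    """Yield (reaction_ids, substrates, products, reversible) from KGML text."""
    root = ET.fromstring(xml_text)
    for rxn in root.iter("reaction"):
        # The name attribute may list several space-separated reaction IDs.
        ids = [strip_prefix(n) for n in rxn.get("name", "").split()]
        subs = [strip_prefix(s.get("name")) for s in rxn.findall("substrate")]
        prods = [strip_prefix(p.get("name")) for p in rxn.findall("product")]
        yield ids, subs, prods, rxn.get("type") == "reversible"


for ids, subs, prods, rev in kgml_reactions(KGML):
    arrow = "<=>" if rev else "->"
    print(ids[0], " + ".join(subs), arrow, " + ".join(prods))
# R00986 C00251 + C00064 -> C00108 + C00022 + C00025
```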

3.1.4  Obtaining Reactions from MetaCyc/BioCyc

Reactions from the MetaCyc/BioCyc databases are best extracted by using Pathway Tools. This program exports reactions from BioCyc database into SBML format, which can then be converted into Excel format by using COBRA.
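The SBML files produced by Pathway Tools or KEGGtranslator list each reaction with its reactant and product species. COBRA performs the SBML-to-spreadsheet conversion itself; only to illustrate what that step involves, the sketch below reads a minimal hand-written SBML Level 2 fragment with Python's standard library and writes one CSV row per reaction. The species and reaction IDs are hypothetical, the `{*}` namespace wildcard needs Python 3.8 or later, and a production pipeline would more likely rely on libSBML than on raw XML parsing.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Minimal hand-written SBML Level 2 fragment (hypothetical IDs).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfReactions>
      <reaction id="R_TRPS" name="tryptophan synthase" reversible="false">
        <listOfReactants>
          <speciesReference species="M_indole_p"/>
          <speciesReference species="M_L_serine_p"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="M_L_tryptophan_p"/>
          <speciesReference species="M_H2O_p"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def side(rxn: ET.Element, tag: str) -> str:
    """Format one side of a reaction, e.g. '2 M_a + M_b'."""
    terms = []
    for ref in rxn.findall(f"{{*}}{tag}/{{*}}speciesReference"):
        coef = float(ref.get("stoichiometry", "1"))  # SBML L2 default is 1
        species = ref.get("species")
        terms.append(species if coef == 1 else f"{coef:g} {species}")
    return " + ".join(terms)


def sbml_to_rows(sbml_text: str):
    """Return one dict per reaction with id, name, and equation string."""
    root = ET.fromstring(sbml_text)
    rows = []
    for rxn in root.findall(".//{*}reaction"):
        # SBML Level 2 treats a missing 'reversible' attribute as true.
        arrow = "<=>" if rxn.get("reversible", "true") == "true" else "->"
        equation = f"{side(rxn, 'listOfReactants')} {arrow} {side(rxn, 'listOfProducts')}"
        rows.append({"id": rxn.get("id"), "name": rxn.get("name", ""),
                     "equation": equation})
    return rows


buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "equation"])
writer.writeheader()
writer.writerows(sbml_to_rows(SBML))
print(buf.getvalue())
```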

3.1.5  Creating the Initial Model with Excel

When creating an initial model using only Excel, reactions should first be obtained from the chosen database (see Subheadings 3.1.3 and 3.1.4) and then entered into an Excel workbook. For further curation with COBRA, the workbook must contain two sheets (tabs), one named "Reactions" and another named "Metabolites." The headers of each sheet must be set up as shown in Fig. 2 (also see Note 2). Because manual model reconstruction is time-consuming, creating the initial model manually in Excel should be a last resort, used only when the SuBliMinaL Toolbox cannot be used.
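Whatever its source, each row of a Reactions sheet ultimately holds an equation string that must be turned into stoichiometric coefficients. The parser below is a minimal sketch for a hypothetical text convention modeled on the equations in Table 2: metabolites carry a bracketed compartment suffix such as [p], a coefficient (if any) precedes its metabolite and is separated from it by a space, terms are joined by " + ", and "->" or "<=>" marks an irreversible or reversible reaction. It is not COBRA's own parser, and metabolite IDs containing spaces are not supported.

```python
import re
from typing import Dict, Tuple

# Optional numeric coefficient, whitespace, then an ID ending in [compartment].
TERM = re.compile(r"^(?:(\d+(?:\.\d+)?)\s+)?(\S+\[[a-z]+\])$")


def parse_side(text: str, sign: float, stoich: Dict[str, float]) -> None:
    """Add the terms of one reaction side to stoich with the given sign."""
    for term in filter(None, (t.strip() for t in text.split(" + "))):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse term {term!r}")
        coef = float(match.group(1) or 1)
        met = match.group(2)
        stoich[met] = stoich.get(met, 0.0) + sign * coef


def parse_reaction(equation: str) -> Tuple[Dict[str, float], bool]:
    """Return ({metabolite: coefficient}, reversible); blank sides allowed."""
    reversible = "<=>" in equation
    arrow = "<=>" if reversible else "->"
    if arrow not in equation:
        raise ValueError(f"no reaction arrow in {equation!r}")
    lhs, rhs = equation.split(arrow)  # more than one arrow raises ValueError
    stoich: Dict[str, float] = {}
    parse_side(lhs, -1.0, stoich)     # reactants are consumed
    parse_side(rhs, +1.0, stoich)     # products are produced
    return stoich, reversible


print(parse_reaction("indole[p] + L_serine[p] -> H2O[p] + L_tryptopha[p]"))
print(parse_reaction("<=> H2O[c]"))  # reversible exchange, blank reactant side
```

Note that names such as 1_o_car_1_deo_5_pho_p[p] begin with a digit but are not mistaken for coefficients, because the pattern requires whitespace between a coefficient and its metabolite.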

3.2  Curation of the Model

Once the preliminary model is created, a significant amount of curation and editing is needed before the model will accurately represent the organism chosen. First, the protonated state of metabolites must be determined on the basis of intracellular or intra-organellar pH (see Note 3), and metabolites must be accordingly protonated or deprotonated. Generic reactions must be curated to include specific metabolites or deleted. Dead-end metabolites must be eliminated by using gap-filling techniques [43]. Furthermore, the balancing of all reactions is critical to ensure there is no inappropriate cycling within the cell. Despite all these steps, a manually curated model may still require testing (via FBA) and subsequent iterative curation to ensure that it does not simulate unrealistic or nonsensical situations. For instance, if a model contains a reaction in which the number of atoms of a ­certain element (e.g., sulfur) is unbalanced, FBA-type algorithms will perceive this reaction as being able to generate that sulfur from nothing. Therefore, an “optimal” flux distribution returned by FBA may leverage such an unbalanced reaction in a metabolic cycle to simulate growth in the absence of sulfur, thus simulating a realistically impossible scenario.
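The sulfur example can be reproduced in a few lines. In the hypothetical network below, R1 and R2 form a cycle that keeps A and B at steady state, but R2 has sulfur only on its product side, so every turn of the cycle creates one sulfur atom that the exchange reaction EX_D exports. The sketch computes each reaction's elemental imbalance (atoms produced minus atoms consumed) and sums it over a flux distribution, skipping exchange reactions, which are unbalanced by design. A nonzero total for any element flags exactly the kind of flux distribution an FBA solver could exploit. All metabolites, formulas, and reaction names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict

# Hypothetical metabolites and their elemental compositions.
FORMULAS: Dict[str, Counter] = {
    "A[c]": Counter({"C": 2, "H": 4}),
    "B[c]": Counter({"C": 2, "H": 4}),
    "D[c]": Counter({"S": 1}),
}
# Stoichiometry: negative = consumed, positive = produced.
REACTIONS: Dict[str, Dict[str, int]] = {
    "R1": {"A[c]": -1, "B[c]": 1},             # balanced
    "R2": {"B[c]": -1, "A[c]": 1, "D[c]": 1},  # sulfur appears from nothing
    "EX_D": {"D[c]": -1},                      # exchange: exports D
}


def imbalance(stoich: Dict[str, int]) -> Dict[str, int]:
    """Atoms produced minus atoms consumed by one unit of flux."""
    net: Counter = Counter()
    for met, coef in stoich.items():
        for element, n in FORMULAS[met].items():
            net[element] += coef * n
    return {e: n for e, n in net.items() if n != 0}


def net_creation(fluxes: Dict[str, float]) -> Dict[str, float]:
    """Elements created inside the network by a flux vector."""
    total: Counter = Counter()
    for rxn, v in fluxes.items():
        if rxn.startswith("EX_"):
            continue  # exchange reactions cross the model boundary on purpose
        for element, n in imbalance(REACTIONS[rxn]).items():
            total[element] += v * n
    return {e: n for e, n in total.items() if n != 0}


v = {"R1": 1.0, "R2": 1.0, "EX_D": 1.0}  # a steady-state flux cycle
print(imbalance(REACTIONS["R2"]))  # {'S': 1}
print(net_creation(v))             # {'S': 1.0}
```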


Fig. 2 Excel model, with Reactions and Metabolites tabs (sheets). The Reactions tab (top) shows an example each of a transport, exchange, biomass synthesis, and database reaction. The Metabolites tab (bottom) includes metabolites from the cytosol (c), peroxisome (x), mitochondrion (m), and plastid (p). COBRA requires the tab names and headers to be formatted exactly as shown here

3.2.1  Curating the Model for Easier Reading

Once the model is created, some curation may be required to render it in a form that is easy to read and edit. This step is optional, but it can save time while reorganizing the model. It mainly consists of editing the names of the metabolites so that they are clear and concise. Metabolite names exported from BioCyc are normally very long, making the metabolite and reaction descriptions very difficult to read.

3.2.2  Identifying Reaction Compartmentalization

Plant GSMs are more sophisticated than their bacterial or mammalian counterparts because of the extensive compartmentalization of metabolites and reactions. Compartments or organelles such as plastids (including chloroplasts, amyloplasts, and leucoplasts), mitochondria, and peroxisomes provide functionally specialized aqueous spaces within the cell [44]. Because they are enclosed by lipid bilayers, organelles are impermeable to many hydrophilic and charged molecules. Some pathways are distributed across multiple compartments. For example, the conversion of lipids to sugars in germinating oilseeds via β-oxidation and the glyoxylate cycle is known to involve reactions occurring in the peroxisome, cytosol, and mitochondria. Single-celled C4 photosynthesis is possibly orchestrated between chloroplasts, cytosol, and mitochondria [45]. Certain pathways are known to occur in more than one compartment. For instance, glycolysis and the pentose phosphate pathway often operate in both the cytosol and the plastid [30]; unraveling the compartmentalization and the extent of duplication of these two pathways has been a focus of metabolic flux analysis for more than a decade (e.g., refs. 46–49). This compartmentalized nature of plant cells presents a major challenge in reconstructing GSMs, because pathways need to be correctly assigned to compartments for accurate simulation of metabolic scenarios. Compartments usually featured in plant GSMs include the cytoplasm, peroxisome, plastid, mitochondrion, and vacuole. The Golgi apparatus and periplasmic space may also be included.
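When a pathway such as glycolysis or the pentose phosphate pathway is assigned to more than one compartment, each compartment needs its own copy of the reactions, with every metabolite tagged by compartment. The sketch below makes such copies from a compartment-free reaction, using the [c]/[p] metabolite suffixes of Fig. 2 and a reaction-name suffix in the style of the _p names in Table 2. The reaction ID PGI and the metabolite names are hypothetical, and the resulting copies still need intercompartmental transport reactions (Subheading 3.3.1) before metabolites can move between them.

```python
from typing import Dict, Iterable


def compartmentalize(rxn_id: str, stoich: Dict[str, float],
                     compartments: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Return one copy of a compartment-free reaction per compartment.

    Reaction IDs get a '_<comp>' suffix; metabolites get a '[<comp>]' suffix.
    """
    copies: Dict[str, Dict[str, float]] = {}
    for comp in compartments:
        copies[f"{rxn_id}_{comp}"] = {
            f"{met}[{comp}]": coef for met, coef in stoich.items()
        }
    return copies


# Hypothetical compartment-free entry for glucose-6-phosphate isomerase,
# which plants express as both cytosolic and plastidic isoforms.
pgi = {"glucose_6_phosphate": -1.0, "fructose_6_phosphate": 1.0}

for rid, s in compartmentalize("PGI", pgi, ["c", "p"]).items():
    print(rid, s)
```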


A preliminary assignment of metabolic pathways to compartments may be made by studying plant biochemistry textbooks and refined by consulting databases such as BioCyc and other protein localization databases. However, the most up-to-date information on compartmentalization is obtained from the literature as well as from various experimental and in silico techniques. Experimental techniques toward this goal include cell fractionation, immunohistochemistry, proteome analysis, and in vivo imaging. For a comprehensive discussion of some computational techniques for this purpose, see Chapter 12. Furthermore, MetNet Online (Subheading 2.8) determines the compartmentalization information for each reaction and exports it. Table 2 summarizes how we manually assigned compartments to reactions in an illustrative pathway (the tryptophan synthesis pathway in poplar) by considering available evidence and predictions from various databases and tools, including the Arabidopsis peroxisome (AraPerox) database [50], the Arabidopsis chloroplast (AT_Chloro) database [51], the Arabidopsis mitochondrial protein database (AMPDB) [52], the literature and textbooks, and the TargetP [53] and WoLF PSORT [54] algorithms.

Table 2 Assigning reactions to compartments

Reaction | Stoichiometric equation | AT gene ID | 1 | 2 | 3 | 4 | Final
Rn_Igpsyn_p | H[p] + 1_o_car_1_deo_5_pho_p[p] → ind_3_gly_pho[p] + H2O[p] + CO2[p] | AT5G48220 | p | p | p | m, c | p
Rn_Praisom_p | N_5_pho_ant[p] → 1_o_car_1_deo_5_pho_p[p] | AT1G07780 | p | p | p | | p
Rn0_2382_p | indole[p] + L_serine[p] → H2O[p] + L_tryptopha[p] | AT5G38530 | p | p | p | – | p
Rn_Anthransyn_p | chorismate[p] + L_glutamine[p] → H[p] + L_glutamate[p] + pyruvate[p] + anthranila[p] | AT2G28880 | p | – | p | p | p
Rn_Prtrans_p | anthranila[p] + 5_pho_1_pyr[p] → diphosphate[p] + N_5_pho_ant[p] | AT5G17990 | p | p | p | p | p

We manually assigned compartments to reactions in the tryptophan synthesis pathway in poplar by considering available evidence and predictions from various databases and tools. The columns, in order, are: reaction name; stoichiometric equation (some abbreviated metabolite names may appear cryptic); the Arabidopsis gene ortholog whose product catalyzes the reaction; compartment obtained from the SUBA database (1); compartment obtained from combined analysis of the Arabidopsis peroxisome (AraPerox) database [50], Arabidopsis chloroplast (AT_Chloro) database [51], and Arabidopsis mitochondrial protein database (AMPDB) [52] (2); compartment obtained from the literature or textbooks (3); compartment predicted by the TargetP [53] and WoLF PSORT [54] algorithms (4); and the final compartmental assignment (Final), which we arrived at by carefully evaluating all the above information. Compartments are abbreviated as cytosol (c), plastid (p), mitochondria (m), peroxisome (x), and undetermined (–)

3.2.3  Metabolite Protonation

The protonation state of each metabolite in a GSM, i.e., the addition or removal of protons according to the intracellular or intra-organellar pH, must be determined. Typically, Excel workbooks containing GSMs use a "charged formula" for the protonated form of the metabolite. The protonation status of functional groups is determined by their pKa values [5] and the pH of the compartment containing the metabolite (see Note 3). Plant cells are generally at a pH of about 7.2, although some compartments, such as vacuoles, may be at a significantly different pH. The SuBliMinaL Toolbox can protonate all metabolites according to a given pH via its "protonate" command, and it can export the charged formulas in either the KEGG or the BioCyc format.

3.2.4  Balancing Reactions

The SuBliMinaL Toolbox and COBRA can examine the protonated molecular formulas of metabolites to indicate which reactions are not balanced with respect to charge, mass, or both. The "balance" command in the SuBliMinaL Toolbox automates charge and mass balancing: it detects unbalanced reactions and, where possible, corrects them by employing mixed integer linear programming [37] (see Note 4).
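At its core, a balance check is bookkeeping: for each element and for charge, sum products minus reactants. The sketch below does this for the tryptophan synthase reaction of Table 2 (written here with the full name L_tryptophan), using charged formulas assumed for a compartment near pH 7.2 (neutral indole and the zwitterionic amino acids), and then for a deliberately broken copy that omits water. It only detects imbalances; it does not attempt the mixed-integer corrections that the SuBliMinaL "balance" command performs, and its formula parser handles only simple formulas without parentheses or hydrates.

```python
import re
from collections import Counter
from typing import Dict, Tuple

ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

# metabolite: (charged formula, charge), assumed for a compartment near pH 7.2
METS: Dict[str, Tuple[str, int]] = {
    "indole[p]": ("C8H7N", 0),
    "L_serine[p]": ("C3H7NO3", 0),        # zwitterion, net charge 0
    "L_tryptophan[p]": ("C11H12N2O2", 0),  # zwitterion, net charge 0
    "H2O[p]": ("H2O", 0),
}


def parse_formula(formula: str) -> Counter:
    """'C11H12N2O2' -> Counter({'H': 12, 'C': 11, 'N': 2, 'O': 2})."""
    counts: Counter = Counter()
    for element, n in ELEMENT.findall(formula):
        counts[element] += int(n) if n else 1
    return counts


def check_balance(stoich: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Return (element imbalance, charge imbalance); ({}, 0) means balanced."""
    atoms: Counter = Counter()
    charge = 0
    for met, coef in stoich.items():
        formula, z = METS[met]
        for element, n in parse_formula(formula).items():
            atoms[element] += coef * n
        charge += coef * z
    return {e: n for e, n in atoms.items() if n != 0}, charge


trps = {"indole[p]": -1, "L_serine[p]": -1, "L_tryptophan[p]": 1, "H2O[p]": 1}
no_water = {"indole[p]": -1, "L_serine[p]": -1, "L_tryptophan[p]": 1}
print(check_balance(trps))      # ({}, 0)
print(check_balance(no_water))  # ({'H': -2, 'O': -1}, 0)
```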

3.2.5  Generic Reactions

Metabolic databases frequently contain generic reactions, i.e., reactions containing nonspecific metabolites (e.g., fatty acid, long-chain alcohol, protein, DNA, RNA, electron acceptor). Such reactions should not be included in a GSM without curation. It is important to create particular versions of these reactions containing metabolites specific to the organism. However, if this information is not known or not fully known, the generic reaction needs to be eliminated (retaining any particular versions of it), because leaving it in will prevent the application of FBA to the GSM. For instance, the β-oxidation cycle for fatty acid degradation is a "spiral" pathway that begins with a 2,3,4-saturated fatty acyl-CoA containing n (typically 12–30) carbon atoms. One turn of the spiral converts this compound to a fatty acyl-CoA containing n − 2 carbon atoms, which passes through several more turns to ultimately yield acetoacetyl-CoA, which has four carbon atoms. MetaCyc depicts only one turn of this spiral pathway (Fig. 3a). Curation of this cycle involves (1) obtaining an experimental fatty acid and/or triglyceride profile for the organism and (2) replacing the generic fatty acid degradation cycle with particular versions featuring each fatty acid experimentally known to be present in the organism (Fig. 3b shows an example for decanoic acid degradation). This process may significantly expand the inventory of reactions in the GSM.
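Expanding a generic spiral pathway becomes mechanical once the chain lengths are known from the fatty acid profile. The sketch below generates the specific reactions of Fig. 3b for an even-numbered saturated acyl-CoA, following the acyl-CoA → trans-2-enoyl-CoA → (S)-3-hydroxyacyl-CoA → 3-oxoacyl-CoA sequence and a thiolase step that releases acetyl-CoA. To stay short it omits the other cofactors (O2 and H2O2, H2O, NAD+ and NADH, H+), which a real model must include, and its name table covers only chain lengths 4 to 10; both simplifications are assumptions of this illustration.

```python
from typing import List, Tuple

# Name stems by carbon number; extend for longer chains (e.g. 12: "dodec").
STEM = {10: "dec", 8: "oct", 6: "hex", 4: "but"}


def names(n: int) -> Tuple[str, str, str, str]:
    """Acyl-, enoyl-, hydroxyacyl-, and oxoacyl-CoA names for chain length n."""
    s = STEM[n]
    oxo = "acetoacetyl-CoA" if n == 4 else f"3-oxo{s}anoyl-CoA"
    return (f"{s}anoyl-CoA", f"trans-{s}-2-enoyl-CoA",
            f"(S)-3-hydroxy{s}anoyl-CoA", oxo)


def expand_beta_oxidation(n_start: int) -> List[str]:
    """Specific reactions replacing the generic one-turn template."""
    rxns: List[str] = []
    n = n_start
    while n >= 4:
        acyl, enoyl, hydroxy, oxo = names(n)
        rxns += [f"{acyl} -> {enoyl}",      # acyl-CoA oxidase
                 f"{enoyl} -> {hydroxy}",   # enoyl-CoA hydratase
                 f"{hydroxy} -> {oxo}"]     # 3-hydroxyacyl-CoA dehydrogenase
        if n > 4:  # thiolase: shorten the chain by two carbons
            rxns.append(f"{oxo} + CoA -> {names(n - 2)[0]} + acetyl-CoA")
        n -= 2
    return rxns


for r in expand_beta_oxidation(10):
    print(r)
```

For decanoyl-CoA this yields 15 reactions, ending at acetoacetyl-CoA as in Fig. 3b.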

3.2.6  Dead-End Metabolites and Gap-Filling

Every metabolite in a stoichiometric model should participate in at least two reactions in the model, at least one that produces it and one that consumes it (a reversible reaction can do both); otherwise, it is a dead-end metabolite. COBRA includes a command, "detectDeadEnds," to identify dead-end metabolites so that they can, if necessary, be deleted from a stoichiometric model. Additionally, the "gapFind" command in COBRA finds the gaps in a model, and the "growthExpMatch" command uses optimization to suggest candidate reactions to fill these gaps [40] (see Note 5).
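The dead-end definition translates directly into code: a metabolite is a dead end if no reaction can produce it or no reaction can consume it, where a reversible reaction counts as both. The sketch below applies that single-pass test to a hypothetical model; it illustrates the idea and is not a reimplementation of COBRA's detectDeadEnds. Because deleting a dead-end metabolite and its reactions can expose new dead ends, the check is normally repeated until nothing changes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Tuple[Dict[str, float], bool]]  # rxn_id: (stoich, reversible)


def dead_ends(model: Model) -> List[str]:
    """Metabolites that no reaction can produce or no reaction can consume."""
    can_produce: Dict[str, bool] = defaultdict(bool)
    can_consume: Dict[str, bool] = defaultdict(bool)
    for stoich, reversible in model.values():
        for met, coef in stoich.items():
            if coef > 0 or reversible:
                can_produce[met] = True
            if coef < 0 or reversible:
                can_consume[met] = True
    mets = set(can_produce) | set(can_consume)
    return sorted(m for m in mets if not (can_produce[m] and can_consume[m]))


# Hypothetical model: C is never consumed and E is never produced.
model: Model = {
    "EX_A": ({"A[c]": 1.0}, True),               # reversible exchange
    "R1": ({"A[c]": -1.0, "B[c]": 1.0}, False),
    "R2": ({"B[c]": -1.0, "C[c]": 1.0}, False),
    "R3": ({"E[c]": -1.0, "B[c]": 1.0}, False),
}
print(dead_ends(model))  # ['C[c]', 'E[c]']
```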


[Fig. 3 (a) generic cycle: a 2,3,4-saturated fatty acyl(n)-CoA → a trans-2-enoyl-CoA → a (3S)-3-hydroxyacyl-CoA → a 3-oxoacyl-CoA → a fatty acyl(n−2)-CoA. (b) specific spiral: decanoyl-CoA → octanoyl-CoA → hexanoyl-CoA → butanoyl-CoA → acetoacetyl-CoA, via the corresponding trans-2-enoyl-CoA, (S)-3-hydroxyacyl-CoA, and 3-oxoacyl-CoA intermediates]

Fig. 3 Curation of generic reactions: the β-oxidation cycle of saturated fatty acids. (a) Databases often show generic reactions featuring a single turn of a spiral pathway and nonspecific metabolites. (b) Curation involves replacing the generic reactions with ones featuring metabolites specific to the plant of interest. Here, the generic “a 2,3,4-saturated fatty acyl-CoA” (a) is replaced with decanoyl-CoA, octanoyl-CoA, etc. (b) to convert the generic pathway to a specific, spiral β-oxidation pathway

3.3  Rendering the Model FBA-Ready

Even after a preliminary model has been curated as described above, it usually requires additional reactions before FBA can be performed on it. These include intercompartmental transport reactions, extracellular exchange reactions, and one or more biomass synthesis reactions specific to the organism.

3.3.1  Intercompartmental Transport Reactions

In cells, several metabolites travel between contiguous compartments, either by diffusion or by the action of a transporter protein. This transport depends on the pH, concentration, and charge gradients across the membrane separating the two compartments, the concentration of transporter proteins, and the distribution of binding sites [55]. Therefore, a model featuring a metabolite in two or more compartments will often need to include transport reactions that carry the metabolite between these compartments. However, metabolic databases generally do not include transport reactions. The intercompartmental transport of many metabolites is accompanied by the counter-exchange of another metabolite; therefore, the introduction of a transport reaction also requires the introduction of a compatible co-substrate [56]. An important point to note is that certain metabolites, despite being present in more than one compartment, do not travel between the compartments. This is due to two non-exclusive reasons: (1) their chemical properties do not enable them to cross membranes, or (2) there may be no intercompartmental transporter proteins for these metabolites [31, 56, 57]. It is desirable to limit intercompartmental transport reactions to those metabolites with experimental evidence for movement from one compartment to another. Available evidence for metabolite transporters comes from proteomic analysis [58], transcriptomic analysis [59], full genome sequencing, and forward and reverse genetic screens. Once the compartment housing each reaction is determined, the compartment corresponding to each metabolite can be generated with COBRA. Transport reactions with one-to-one stoichiometries and co-substrates, if applicable, can then be added, either with the SuBliMinaL Toolbox or manually in Excel.

3.3.2  Exchange Reactions

Exchange reactions allow metabolites to enter or leave the cell. These metabolites include water; sources of carbon, nitrogen, sulfur, or phosphorus; gases; compounds present in the medium or liquid surrounding the cell or tissue of interest; and biomass components and cellular products. Exchange reactions can be added with the SuBliMinaL Toolbox or in Excel. Exchange reactions for metabolite entry into a cell are written with a blank reactant side, whereas those for metabolite exit are written with a blank product side. Metabolites that both enter and leave cells should be modeled by reversible exchange reactions of the type “ ⇔ H2O[c]”. Care should be taken not to include an excessive number of exchange reactions in a model, because each unnecessary exchange reaction lets the optimization algorithm import or export a metabolite that the real cell or tissue cannot, which can lead to unrealistic flux predictions.
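As an illustration of the transport and exchange reactions described in Subheadings 3.3.1 and 3.3.2, the Python sketch below generates them as equation strings. It is not the SuBliMinaL Toolbox or COBRA; the helper names (`transport_reactions`, `exchange_reaction`), the compartment codes (`c` cytosol, `p` plastid, `m` mitochondrion), the `<=>`/`->` arrow notation, and the default bound of 1000 are assumptions chosen for illustration. A transport reaction is created only for a metabolite/compartment pair listed in an evidence set, and an optional counter-exchanged partner turns a one-to-one transporter into an antiporter.

```python
"""Sketch: build intercompartmental transport and exchange reactions
as equation strings (hypothetical helpers, not a toolbox API)."""

DEFAULT_BOUND = 1000.0  # assumed "unconstrained" flux bound; adjust per model


def transport_reactions(met_compartments, evidence, antiport_partners=None):
    """Create reversible one-to-one transport reactions.

    met_compartments: metabolite -> set of compartment codes it occurs in
    evidence: set of (metabolite, comp_a, comp_b) with experimental support
    antiport_partners: optional (metabolite, comp_a, comp_b) -> co-substrate
        that is counter-exchanged (it moves from comp_b to comp_a)
    Returns reaction_id -> equation string.
    """
    antiport_partners = antiport_partners or {}
    reactions = {}
    for met, a, b in sorted(evidence):
        comps = met_compartments.get(met, set())
        if a not in comps or b not in comps:
            continue  # metabolite is not assigned to both compartments
        lhs, rhs = [f"{met}[{a}]"], [f"{met}[{b}]"]
        partner = antiport_partners.get((met, a, b))
        if partner is not None:
            partner_comps = met_compartments.get(partner, set())
            if a not in partner_comps or b not in partner_comps:
                raise ValueError(f"{partner} must occur in both {a} and {b}")
            lhs.append(f"{partner}[{b}]")
            rhs.append(f"{partner}[{a}]")
        reactions[f"T_{met}_{a}_{b}"] = " + ".join(lhs) + " <=> " + " + ".join(rhs)
    return reactions


def exchange_reaction(met, comp, direction):
    """Return (reaction_id, equation, lower_bound, upper_bound).

    direction: "in" (entry only, blank reactant side),
               "out" (exit only, blank product side) or "both" (reversible).
    """
    species = f"{met}[{comp}]"
    rid = f"EX_{met}_{comp}"
    if direction == "in":
        return rid, f" -> {species}", 0.0, DEFAULT_BOUND
    if direction == "out":
        return rid, f"{species} -> ", 0.0, DEFAULT_BOUND
    if direction == "both":
        return rid, f" <=> {species}", -DEFAULT_BOUND, DEFAULT_BOUND
    raise ValueError(f"unknown direction: {direction!r}")
```

Taking the chloroplast triose phosphate/phosphate translocator as an example, `transport_reactions({"dhap": {"c", "p"}, "pi": {"c", "p"}, "nadh": {"c", "m"}}, {("dhap", "p", "c")}, {("dhap", "p", "c"): "pi"})` yields a single antiport, `dhap[p] + pi[c] <=> dhap[c] + pi[p]`. NADH, although assigned to two compartments, receives no transport reaction because no evidence is listed for it in this example.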

3.3.3  Biomass Synthesis Reaction

A biomass synthesis reaction reflects the contributions of metabolites in a GSM to cellular or tissue biomass. This reaction should be constructed on the basis of experimentally determined biomass composition, which generally includes proteins and proteinogenic amino acids, nucleotides and nucleic acids, lipids, lipogenic fatty acids and glycerol, carbohydrates (including starch, cellulose, and soluble sugars), and various soluble metabolites [32]. The stoichiometric coefficient of each metabolite in the biomass equation should account both for its contribution to a biomass component (e.g., the amino acid composition of protein) and for the proportion of that component in the biomass. An illustration of this procedure for a small model is available in [32], and additional discussion is available in [33].
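The arithmetic behind a biomass synthesis reaction can be sketched as follows: a measured mass fraction (g per g dry weight, gDW) divided by a molar mass (g/mol) gives mol/gDW, which is scaled to mmol/gDW, the amount basis commonly used for FBA fluxes. The helper names and the two-component composition below are hypothetical, not data from [32]. For polymer units (amino acids in protein, glucose in starch) the sketch assumes residue masses, i.e., monomer mass minus the water released on polymerization. A complete biomass equation would typically also include energetic costs such as ATP for polymerization, which are omitted here.

```python
"""Sketch: turn a measured biomass composition into coefficients for a
biomass synthesis reaction (hypothetical helper names and data)."""


def biomass_coefficients(mass_fractions, molar_masses):
    """mass_fractions: metabolite -> g per g dry weight (gDW)
    molar_masses: metabolite -> g/mol (residue mass for polymer units)
    Returns metabolite -> mmol/gDW."""
    total = sum(mass_fractions.values())
    if total > 1.0 + 1e-6:
        raise ValueError(f"mass fractions sum to {total:.3f} g/gDW, i.e. > 1")
    # (g/gDW) / (g/mol) = mol/gDW; multiplying by 1000 gives mmol/gDW
    return {met: 1000.0 * w / molar_masses[met] for met, w in mass_fractions.items()}


def biomass_equation(coefficients, product="biomass"):
    """Render coefficients as an irreversible equation consuming the precursors."""
    lhs = " + ".join(f"{c:.4g} {met}" for met, c in sorted(coefficients.items()))
    return f"{lhs} -> {product}"


# Hypothetical composition: 5% alanine (in protein), 30% glucose units (in starch)
fractions = {"ala[c]": 0.05, "glc[c]": 0.30}
masses = {"ala[c]": 71.08, "glc[c]": 162.14}  # residue masses, g/mol
coeffs = biomass_coefficients(fractions, masses)
print(biomass_equation(coeffs))  # 0.7034 ala[c] + 1.85 glc[c] -> biomass
```

Keeping the composition and molar masses as separate inputs mirrors the two contributions named above: the component proportions in the biomass and each metabolite's share of its component.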

3.4  Running FBA Simulations on the Model

A completed GSM can be analyzed by FBA to determine the values of individual fluxes corresponding to specific biological scenarios. This is necessary to predict metabolic behavior as well as to test and improve the GSM (see Note 6). To select a single solution from among the many candidates that satisfy the GSM and its constraints, FBA optimizes one or more objective functions. Growth rate maximization has been demonstrated to be a good objective function for fast-growing microbes, and its validity is understandable from an evolutionary perspective. In plant metabolism, this objective function may be applicable to scenarios such as rapidly dividing meristematic cells or germinating embryos. Williams et al. [12] used the sum of all fluxes as an objective function for Arabidopsis cell suspensions, arguing that minimizing this sum is equivalent to satisfying the stoichiometric constraints imposed by the GSM with minimal enzyme activity. Indeed, with this objective function, several predicted fluxes were consistent with flux estimates from ¹³C isotope labeling [12]. For most other plant cells and tissues, it may be necessary to evaluate candidate objective functions by examining their ability to simulate anticipated metabolic behaviors.

FBA on a GSM is ideally performed with the COBRA Toolbox, by loading an Excel or SBML version of the model that specifies an objective function and constraints. One or more fluxes can be designated as components of the objective function, as explained in Note 2. The COBRA command “optimizeCbModel” performs FBA and returns both a flux solution and the value of the objective function for this solution. The flux solution can easily be copied into Excel to inspect the fluxes of individual reactions.
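To make the FBA step concrete, the sketch below solves a hypothetical five-reaction network in Python with SciPy's `linprog` (HiGHS method) rather than with the COBRA Toolbox's `optimizeCbModel`; it assumes NumPy and SciPy are installed. Fluxes must satisfy the steady-state condition S·v = 0 within their bounds. Maximizing the biomass drain gives the growth value, but in this toy network the optimum is not unique, since carbon can pass from A to B directly or through C in any proportion. A second linear program then fixes growth at its optimum and minimizes the sum of fluxes. This step is in the spirit of the objective used by Williams et al. [12], not their implementation, and it selects the shorter route.

```python
"""Sketch: FBA on a hypothetical toy network with SciPy instead of COBRA."""
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical):
#   R1: -> A        (uptake, capped at 10 flux units)
#   R2: A -> B      (direct route)
#   R3: A -> C, R4: C -> B   (two-step detour)
#   R5: B ->        (biomass drain, the objective)
reactions = ["R1_uptake", "R2_A_to_B", "R3_A_to_C", "R4_C_to_B", "R5_biomass"]
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
BIOMASS = reactions.index("R5_biomass")


def fba(S, bounds, objective):
    """Maximize v[objective] subject to S v = 0 and lb <= v <= ub."""
    c = np.zeros(S.shape[1])
    c[objective] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, -res.fun


def min_total_flux(S, bounds, objective, optimum, tol=1e-9):
    """Among (near-)optimal solutions, return the one with the smallest flux sum.
    Summing v is linear here only because every flux is bounded below by 0."""
    fixed = list(bounds)
    fixed[objective] = (optimum - tol, optimum)
    res = linprog(np.ones(S.shape[1]), A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=fixed, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


v, growth = fba(S, bounds, BIOMASS)
v_min = min_total_flux(S, bounds, BIOMASS, growth)
print(f"maximal biomass flux: {growth:.3f}")
for name, flux in zip(reactions, v_min):
    print(f"{name:12s} {flux:8.3f}")
```

Summing fluxes directly is a valid linear objective only because every reaction in this toy network is irreversible; with reversible reactions, one would typically split each into forward and backward parts (or add auxiliary variables) so that absolute fluxes are minimized.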

4  Notes

1. Reaction names exported from the BioCyc database are generally very long and therefore difficult to view and read in tables. The KEGG database uses coded metabolite identifiers such as C00001 (water), which are compact but not self-explanatory. The KEGG database sometimes includes reactions that are specific to a similar organism, on the assumption that the reactions are the same in the organism being modeled. However, this assumption may not always be valid.

2. In the Excel rendition of a model, the column headers in the reactions tab must include, for each reaction, the following information in the order listed: (1) reaction name, (2) reaction description, (3) reaction stoichiometric equation, (4) gene–reaction association, (5) gene(s), (6) proteins, (7) subsystem, (8) reversibility, (9) flux lower bound, (10) flux upper bound, (11) the objective reaction whose flux should be optimized in FBA, (12) confidence score, (13) EC number, (14) notes, and (15) reference(s). The columns that must be populated for each reaction are (1) reaction name, (3) reaction stoichiometric equation, (8) reversibility, (9) flux lower bound, (10) flux upper bound, and (11) the objective reaction. The column headers in the metabolites tab must include the following metabolite properties in the order listed: (1) name, (2) description, (3) neutral (protonated) formula, (4) charged (non-protonated) formula, (5) charge, (6) compartment, (7) KEGG ID, (8) PubChem ID, (9) ChEBI ID, (10) structure in InChI format, and (11) structure in SMILES format. The columns that must be populated for each metabolite are (1) name and (5) charge. See Note 4 for a situation in which protonated formulas must be specified.

3. The Henderson–Hasselbalch equation relates the protonation state of a weak acid–conjugate base pair to pH. For the dissociation of a weak acid into a proton and its conjugate base,

$$\mathrm{HA} \overset{K_a}{\rightleftharpoons} \mathrm{H}^{+} + \mathrm{A}^{-} \tag{1}$$

for which $K_a$ is the dissociation (equilibrium) constant, we have

$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A}^{-}]}{[\mathrm{HA}]} \tag{2}$$

4. Tools such as the SuBliMinaL Toolbox will balance a reaction only if protonated formulas are listed for all metabolites participating in the reaction. The protonated formulas can, however, be obtained a priori using this toolbox, as explained in Subheading 3.2.3.

5. Adding reactions to fill gaps as suggested by COBRA may create new dead-end metabolites and therefore a new gap. Users may therefore need to perform gap-filling iteratively, until every metabolite can be both produced and consumed by different reactions (which requires it to occur in at least two reactions in the model).

6. Running FBA for metabolic scenarios whose flux distributions are predictable a priori is a very useful method to test and iteratively improve the GSM.

7. Some plant GSMs listed in Table 1 appeared when this book went into press; they are reported in [60–62].
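The column requirements in Note 2 lend themselves to a simple automated check before a spreadsheet is loaded. In the Python sketch below, the column keys are hypothetical stand-ins for the 15 reaction columns listed in Note 2 (the exact header strings a given COBRA version expects may differ), and the rule that an irreversible reaction should not have a negative lower bound is an assumed consistency convention rather than a requirement stated in this chapter.

```python
"""Sketch: sanity-check one reaction row before loading a model (Note 2)."""

# Hypothetical keys standing in for the 15 reaction columns of Note 2, in order
REACTION_COLUMNS = [
    "name", "description", "equation", "gene_reaction_association", "genes",
    "proteins", "subsystem", "reversible", "lower_bound", "upper_bound",
    "objective", "confidence_score", "ec_number", "notes", "references",
]
# Columns Note 2 says must be populated for every reaction
REQUIRED = ["name", "equation", "reversible", "lower_bound", "upper_bound", "objective"]


def check_reaction_row(row):
    """Return a list of problems found in one reaction row (a dict)."""
    problems = [f"unknown column {key!r}" for key in row if key not in REACTION_COLUMNS]
    problems += [f"missing {key}" for key in REQUIRED if row.get(key) in (None, "")]
    if problems:
        return problems
    lb, ub = float(row["lower_bound"]), float(row["upper_bound"])
    if lb > ub:
        problems.append("lower bound exceeds upper bound")
    # Assumed convention: an irreversible reaction should not run backwards
    if int(row["reversible"]) == 0 and lb < 0:
        problems.append("irreversible reaction has a negative lower bound")
    return problems


row = {"name": "R1", "equation": "glc[c] -> g6p[c]", "reversible": 0,
       "lower_bound": -1000, "upper_bound": 1000, "objective": 0}
print(check_reaction_row(row))  # ['irreversible reaction has a negative lower bound']
```

Note that an objective coefficient of 0 counts as populated; only empty cells are reported as missing.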
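The Henderson–Hasselbalch equation of Note 3 can be applied directly to decide which protonation state of a metabolite to use in a given compartment. Rearranging Eq. 2 gives [A−]/[HA] = 10^(pH − pKa), so the deprotonated fraction is 10^(pH − pKa)/(1 + 10^(pH − pKa)). The Python sketch assumes a monoprotic acid; the pKa of acetic acid (about 4.76) is a textbook value, while the compartment pH values are illustrative placeholders to be replaced with values for the tissue of interest.

```python
"""Sketch: Henderson-Hasselbalch (Note 3) for a monoprotic weak acid HA."""


def fraction_deprotonated(pH, pKa):
    """Fraction of the acid present as A- at a given pH.

    From pH = pKa + log10([A-]/[HA]):  [A-]/[HA] = 10**(pH - pKa).
    """
    ratio = 10.0 ** (pH - pKa)
    return ratio / (1.0 + ratio)


def predominant_form(pH, pKa):
    """Return "A-" if the conjugate base dominates at this pH, else "HA"."""
    return "A-" if fraction_deprotonated(pH, pKa) > 0.5 else "HA"


ACETIC_ACID_PKA = 4.76
# Illustrative compartment pH values (assumptions, not measurements)
compartment_pH = {"cytosol": 7.2, "vacuole": 5.5}
for comp, pH in compartment_pH.items():
    f = fraction_deprotonated(pH, ACETIC_ACID_PKA)
    print(f"{comp}: {f:.3f} as acetate -> use {predominant_form(pH, ACETIC_ACID_PKA)}")
```

Polyprotic metabolites such as phosphorylated sugars or citrate have several pKa values and would need one such term per dissociation step.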
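For Note 5, dead-end metabolites can be detected from the stoichiometry alone. The Python sketch below flags a metabolite when no reaction can produce it, no reaction can consume it, or it occurs in only one reaction (a single reaction cannot both supply and drain it at steady state, even if reversible). The dictionary layout and the toy reactions are hypothetical stand-ins for a model. The check is a single pass, so, as Note 5 warns, filling one gap may expose new dead ends and the check should be repeated. It also does not detect reactions blocked for reasons beyond this local test, which would require flux variability analysis.

```python
"""Sketch: detect dead-end metabolites from reaction stoichiometry (Note 5)."""
from collections import Counter


def dead_end_metabolites(reactions):
    """reactions: reaction_id -> (stoichiometry, reversible), where stoichiometry
    maps metabolite -> coefficient (negative = consumed, positive = produced).
    Returns a sorted list of metabolites that cannot carry steady-state flux
    because they are never produced, never consumed, or occur in one reaction.
    """
    produced, consumed, n_reactions = set(), set(), Counter()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            if coeff == 0:
                continue
            n_reactions[met] += 1
            # A reversible reaction can run either way, so it both makes and uses met
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    return sorted(
        met for met in n_reactions
        if met not in produced or met not in consumed or n_reactions[met] < 2
    )


toy = {
    "EX_glc": ({"glc[c]": 1.0}, False),               # uptake: blank reactant side
    "R1": ({"glc[c]": -1.0, "g6p[c]": 1.0}, False),
    "R2": ({"g6p[c]": -1.0, "f6p[c]": 1.0}, True),    # f6p[c] occurs only here
    "R3": ({"g6p[c]": -1.0, "x[c]": 1.0}, False),     # x[c] is never consumed
}
print(dead_end_metabolites(toy))  # ['f6p[c]', 'x[c]']
```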

Acknowledgments

This work was funded by the US National Science Foundation (Award IOS-0922650).

References

1. Milne C, Eddy J, Raju R, Ardekani S, Kim P-J, Senger R, Jin Y-S, Blaschek H, Price N (2011) Metabolic network reconstruction and genome-scale model of butanol-producing strain Clostridium beijerinckii NCIMB 8052. BMC Syst Biol 5:130
2. Reed JL, Vo TD, Schilling CH, Palsson BO (2003) An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol 4:R54
3. Edwards JS, Palsson BO (1999) Systems properties of the Haemophilus influenzae Rd metabolic genotype. J Biol Chem 274:17410–17416
4. Durot M, Bourguignon P-Y, Schachter V (2009) Genome-scale models of bacterial metabolism: reconstruction and applications. FEMS Microbiol Rev 33:164–190
5. Thiele I, Palsson BO (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5:93–121
6. Seaver SMD, Henry CS, Hanson AD (2012) Frontiers in metabolic reconstruction and modeling of plant genomes. J Exp Bot 63:2247–2258
7. Kim TY, Sohn SB, Kim YB, Kim WJ, Lee SY (2012) Recent advances in reconstruction and applications of genome-scale metabolic models. Curr Opin Biotechnol 23:617–623
8. Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BØ (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci U S A 104:1777–1782
9. Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL (2010) High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol 28:977–982
10. Feist AM, Herrgård MJ, Thiele I, Reed JL, Palsson BØ (2009) Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol 7:129–143
11. Mo ML, Palsson BØ, Herrgård MJ (2009) Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol 3:37
12. Williams TCR, Poolman MG, Howden AJM, Schwarzlander M, Fell DA, Ratcliffe RG, Sweetlove LJ (2010) A genome-scale metabolic model accurately predicts fluxes in central carbon metabolism under stress conditions. Plant Physiol 154:311–323
13. Orth JD, Thiele I, Palsson BO (2010) What is flux balance analysis? Nat Biotechnol 28:245–248
14. Segrè D, Vitkup D, Church GM (2002) Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A 99:15112–15117
15. Oberhardt MA, Palsson BO, Papin JA (2009) Applications of genome-scale metabolic reconstructions. Mol Syst Biol 5:320
16. Reed JL, Palsson BØ (2003) Thirteen years of building constraint-based in silico models of Escherichia coli. J Bacteriol 185:2692–2699
17. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ (2007) A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol 3:121
18. Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, Palsson BØ (2011) A comprehensive genome-scale reconstruction of Escherichia coli metabolism—2011. Mol Syst Biol 7:535
19. Duarte NC, Herrgård MJ, Palsson BØ (2004) Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res 14:1298–1309
20. Nookaew I, Jewett MC, Meechai A, Thammarongtham C, Laoteng K, Cheevadhanarak S, Nielsen J, Bhumiratana S (2008) The genome-scale metabolic model iIN800 of Saccharomyces cerevisiae and its validation: a scaffold to query lipid metabolism. BMC Syst Biol 2:71
21. Schellenberger J, Park JO, Conrad TM, Palsson BØ (2010) BiGG: a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 11:213
22. Förster J, Famili I, Fu P, Palsson BØ, Nielsen J (2003) Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res 13:244–253
23. De Oliveira Dal’Molin CG, Quek L-E, Palfreyman RW, Brumbley SM, Nielsen LK (2009) AraGEM – a genome-scale reconstruction of the primary metabolic network in Arabidopsis thaliana. Plant Physiol. doi:10.1104/pp.109.148817
24. Poolman MG, Miguet L, Sweetlove LJ, Fell DA (2009) A genome-scale metabolic model of Arabidopsis thaliana and some of its properties. Plant Physiol 151:1570–1581
25. Radrich K, Tsuruoka Y, Dobson P, Gevorgyan A, Swainston N, Baart G, Schwartz J-M (2010) Integration of metabolic databases for the reconstruction of genome-scale metabolic networks. BMC Syst Biol 4:114
26. Saha R, Suthers PF, Maranas CD (2011) Zea mays iRS1563: a comprehensive genome-scale metabolic reconstruction of maize metabolism. PLoS One 6:e21784
27. De Oliveira Dal’Molin CG, Quek L-E, Palfreyman RW, Brumbley SM, Nielsen LK (2010) C4GEM, a genome-scale metabolic model to study C4 plant metabolism. Plant Physiol 154:1871–1885
28. Chang RL, Ghamsari L, Manichaikul A, Hom EFY, Balaji S, Fu W, Shen Y, Hao T, Palsson BO, Salehi-Ashtiani K et al (2011) Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism. Mol Syst Biol 7:518
29. Lunn JE (2007) Compartmentation in plant metabolism. J Exp Bot 58:35–47
30. Kruger NJ, von Schaewen A (2003) The oxidative pentose phosphate pathway: structure and organisation. Curr Opin Plant Biol 6:236–246
31. Linka N, Weber APM (2010) Intracellular metabolite transporters in plants. Mol Plant 3:21–53
32. Sriram G, Gonzalez-Rivera O, Shanks JV (2006) Determination of biomass composition of Catharanthus roseus hairy roots for metabolic flux analysis. Biotechnol Prog 22:1659–1663
33. Senger RS (2010) Biofuel production improvement with genome-scale models: the role of cell composition. Biotechnol J 5:671–685
34. Kanehisa M, Goto S, Kawashima S, Nakaya A (2002) The KEGG databases at GenomeNet. Nucleic Acids Res 30:42–46
35. Karp PD, Ouzounis CA, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahrén D, Tsoka S, Darzentas N, Kunin V, López-Bigas N (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 33:6083–6089
36. Wrzodek C, Dräger A, Zell A (2011) KEGGtranslator: visualizing and converting the KEGG PATHWAY database to various formats. Bioinformatics 27:2314–2315
37. Swainston N, Smallbone K, Mendes P, Kell D, Paton N (2011) The SuBliMinaL Toolbox: automating steps in the reconstruction of metabolic networks. J Integr Bioinform 8(2):186
38. Karp PD, Paley SM, Krummenacker M, Latendresse M, Dale JM, Lee TJ, Kaipa P, Gilham F, Spaulding A, Popescu L (2010) Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief Bioinform 11:40–79
39. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ (2007) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc 2:727–738
40. Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S et al (2011) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc 6:1290–1307
41. Thorleifsson SG, Thiele I (2011) rBioNet: a COBRA toolbox extension for reconstructing high-quality biochemical networks. Bioinformatics 27:2009–2010
42. Wurtele ES, Li L, Berleant D, Cook D, Dickerson JA, Ding J, Hofmann H, Lawrence M, Lee E, Li J (2007) MetNet: systems biology tools for Arabidopsis. In: Wurtele ES, Nikolau BJ (eds) Concepts in plant metabolomics. Springer, Heidelberg, pp 145–157
43. Green ML, Karp PD (2004) A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 5:76
44. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P (2002) Molecular biology of the cell. Garland Science, New York
45. Edwards GE, Franceschi VR, Voznesenskaya EV (2004) Single-cell C4 photosynthesis versus the dual-cell (Kranz) paradigm. Annu Rev Plant Biol 55:173–196
46. Roscher A, Kruger NJ, Ratcliffe RG (2000) Strategies for metabolic flux analysis in plants using isotope labelling. J Biotechnol 77:81–102
47. Sriram G, Fulton DB, Iyer VV, Peterson JM, Zhou R, Westgate ME, Spalding MH, Shanks JV (2004) Quantification of compartmented metabolic fluxes in developing soybean embryos by employing biosynthetically directed fractional 13C labeling, two-dimensional [13C, 1H] nuclear magnetic resonance, and comprehensive isotopomer balancing. Plant Physiol 136:3043–3057
48. Masakapalli SK, Lay PL, Huddleston JE, Pollock NL, Kruger NJ, Ratcliffe RG (2010) Subcellular flux analysis of central metabolism in a heterotrophic Arabidopsis thaliana cell suspension using steady-state stable isotope labeling. Plant Physiol 152:602–619
49. Allen DK, Laclair RW, Ohlrogge JB, Shachar-Hill Y (2012) Isotope labelling of Rubisco subunits provides in vivo information on subcellular biosynthesis and exchange of amino acids between compartments. Plant Cell Environ 35:1232–1244
50. Reumann S, Ma C, Lemke S, Babujee L (2004) AraPerox. A database of putative Arabidopsis proteins from plant peroxisomes. Plant Physiol 136:2587–2608
51. Ferro M, Brugière S, Salvi D, Seigneurin-Berny D, Court M, Moyet L, Ramus C, Miras S, Mellal M, Gall SL et al (2010) AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins. Mol Cell Proteomics 9:1063–1084
52. Heazlewood JL, Millar AH (2005) AMPDB: the Arabidopsis mitochondrial protein database. Nucleic Acids Res 33:D605–D610
53. Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2:953–971
54. Horton P, Park K-J, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35:W585–W587
55. Hettema EH, Tabak HF (2000) Transport of fatty acids and metabolites across the peroxisomal membrane. Biochim Biophys Acta 1486:18–27
56. Weber AP, Fischer K (2007) Making the connections – the crucial role of metabolite transporters at the interface between chloroplast and cytosol. FEBS Lett 581:2215–2222
57. Weber APM (2004) Solute transporters as connecting elements between cytosol and plastid stroma. Curr Opin Plant Biol 7:247–253
58. Bräutigam A, Weber AP (2009) Proteomic analysis of the proplastid envelope membrane provides novel insights into small molecule and protein transport across proplastid membranes. Mol Plant 2:1247–1261
59. Weber AP, von Caemmerer S (2010) Plastid transport and metabolism of C3 and C4 plants—comparative analysis and possible biotechnological exploitation. Curr Opin Plant Biol 13:256–264
60. Pilalis E, Chatziioannou A, Thomasset B et al (2011) An in silico compartmentalized metabolic model of Brassica napus enables the systemic study of regulatory aspects of plant central metabolism. Biotechnol Bioeng 108:1673–1682
61. Poolman MG, Kundu S, Shaw R et al (2013) Responses to light intensity in a genome-scale model of rice metabolism. Plant Physiol 162:1060–1072
62. Lakshmanan M, Zhang Z, Mohanty B et al (2013) Elucidating the rice cells metabolism under flooding and drought stresses using flux-based modelling and analysis. Plant Physiol 162:2140–2150
