Accepted Manuscript

Evolution of catalytic microenvironment governs substrate and product diversity in trichodiene synthase and other terpene fold enzymes

Indu Kumari, Mushtaq Ahmed, Yusuf Akhter

PII: S0300-9084(17)30245-6
DOI: 10.1016/j.biochi.2017.10.003
Reference: BIOCHI 5284
To appear in: Biochimie
Received Date: 10 July 2017
Accepted Date: 5 October 2017
Please cite this article as: I. Kumari, M. Ahmed, Y. Akhter, Evolution of catalytic microenvironment governs substrate and product diversity in trichodiene synthase and other terpene fold enzymes, Biochimie (2017), doi: 10.1016/j.biochi.2017.10.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
[Figure with panels (a), (b) and (c); labelled ranges: 40-52, 102-124, 169-175, 247-252, 311-320, 504-534, 564-571, 653-660, 667-680, 726-733, 746-750]
Evolution of catalytic microenvironment governs substrate and product diversity in trichodiene synthase and other terpene fold enzymes

Indu Kumari1, Mushtaq Ahmed1 and Yusuf Akhter2*

1 School of Earth and Environmental Sciences, Central University of Himachal Pradesh, Kangra, Himachal Pradesh-176206, India
2 School of Life Sciences, Central University of Himachal Pradesh, Kangra, Himachal Pradesh-176206, India

*Correspondence: E-mails: [email protected], [email protected]

Running title: Substrate and product diversity in terpene fold enzymes
Abstract

Trichodiene synthase, a terpene fold enzyme, catalyzes the first reaction in the biosynthesis of trichodermin, an economically important secondary metabolite. Sequence search analysis revealed that proteins containing the terpene fold are present in bacteria, fungi and plants. The terpene fold protein from Selaginella moellendorffii, a lycophyte, appeared at the interface of the microbes and plants on the evolutionary scale. The amino acid residues around the catalytic pocket determine the size of the substrate as well as the product molecules. We observed that the overall molecular evolution of the catalytic pockets dictates the choice of substrates and products of these proteins. We further observed that the N-terminus of multi-domain terpene fold proteins may assist in the interactions with the pyrophosphate part of the substrates. Phylogenetic analysis revealed that the enzymes cluster into groups according to the domains present in addition to the catalytic domain. Using normal mode analyses, we also observed inter-domain 'puckering forceps' type motions in the multi-domain proteins, which were further correlated with their functions. The evolutionary clustering of these proteins was also influenced by the presence or absence of cofactor-interacting motifs. These results may be used to modify or enhance the functions of these enzymes using protein engineering methods.

Abbreviations: EAS, 5-epi-aristolochene synthase; PS, Pentalenene synthase; EIZS, Epi-isozizaene synthase; CS, 1,8-cineole synthase; DMAPP, Dimethylallyl diphosphate; FPP, Farnesyl pyrophosphate; GPP, Geranyl pyrophosphate; GPPS, Geranyl pyrophosphate synthase; δCS, (+)-δ-cadinene synthase; HexPPS, Hexaprenyl pyrophosphate synthase; IPP, Isopentenyl pyrophosphate; ISPS, Isoprene synthase; OPPS, Octaprenyl pyrophosphate synthase; PPS, Polyprenyl synthase; BPPS, Bornyl pyrophosphate synthase; SS, Squalene synthase; TaxS, Taxadiene synthase; TS, Trichodiene synthase

Keywords: Terpene fold; Trichodiene synthase; Catalytic site evolution; Single domain proteins; Multi-domain proteins
1. Introduction

All terpene molecules are derived from the linear C5 precursors isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). Rearrangement and cyclization of these linear reactants give rise to structurally diverse terpene molecules [1-4]. Plant species have been reported to produce several terpene synthases [5]; however, only a few microbial enzymes have been functionally characterized [6-13]. Genomic studies have provided the fundamental information that led to the discovery of several sesquiterpene synthases from actinomycetes and cyanobacteria [14-17], though only three of them, namely trichodiene synthase (TS), aristolochene synthase and presilphiperfolan-8-ol synthase [4, 7, 9, 13, 18-21], have been characterized so far. TS belongs to the family of terpene synthases, which are pivotal enzymes in the biosynthesis of trichothecenes secreted by different fungal genera [22]. Many terpene compounds are also reported to exhibit biological properties such as anti-microbial activity [23]. Terpene compounds obtained from liverworts have also been reported to deter harmful insects [24, 25]. The diterpene synthase genes of the lycophyte Selaginella moellendorffii have been shown to be phylogenetically closely related to other plant homologues. The mono- and sesquiterpene synthase genes of plants appeared to be evolutionarily related to microbial terpene synthases [25]. Similarly, recent studies on the mono- and sesquiterpene synthase genes of Marchantia polymorpha showed that these genes are distantly related to terpene synthase genes in fungi and bacteria but unrelated to those previously described from land plants. However, the functional diterpene synthase genes of M. polymorpha are closely related to other terpene synthases found in vascular plants [25, 26]. These genes might have functionally diversified into mono- and sesquiterpene synthases during the course of evolution [25, 26].

Studies have shown the diversity among terpene synthases in different species of bacteria, fungi, lower plants and higher plants [2, 3, 5]. However, no report so far has fully characterized the TS-related enzymes of these different domains of life at the level of protein structure. The present investigation is an attempt in this direction. This protein structure based study shows that TS-related proteins carrying the terpene fold are present in bacteria, fungi, lower plants and higher plants. Further, terpene fold containing proteins are required for the synthesis of diverse terpenoid derivatives, which are involved in the biosynthesis of hormones, vitamins, pigments (carotenoids), quinones, membrane lipids, essential oils and antioxidants, and which give fragrance, flavor and medicinal properties to compounds obtained from different organisms, including plants and microbes [5]. The products of terpene synthase enzymes have been extensively explored as prospective targets for the production of advanced biofuels, agricultural chemicals, medicines, industrial chemicals, flavors and fragrances [27, 28]. Therefore, a comparative structural analysis was carried out to characterize the microenvironment of the terpene fold in the catalytic site, which leads to the structural and functional diversity of these compounds. Different regions of the protein were observed to be conserved, and the functional domains that contribute to product diversity in several terpene synthase enzymes are present in diverse organisms. In addition to the conserved catalytic motifs, the amino acid residues present in the binding pocket of the enzyme also play an important role in the reaction. The orientation of these residues determines the size of the substrate and the product. We analyzed inter-domain motions and correlated fluctuations of multi-domain proteins in the context of their catalytic functions. We also observed gene fusion and the generation of diversity during the molecular evolution of the terpene fold proteins.
2. Methods

The overall methodology used to study the comparative sequence and structural differences in the terpene fold containing proteins is given in Fig. 1. The details are given in the following subsections.
2.1. Sequence retrieval

The protein sequences of TS enzymes were obtained from the NCBI protein database using the BLAST algorithm in blastp mode with default parameters against the non-redundant database, so that homologous sequences could be obtained from the most diverse organisms. The 16 protein sequences obtained were selected as broad representatives of the following groups of organisms: fungi (beneficial and pathogenic), bacteria, and lower and higher plants. These sequences were analyzed further.
2.2. Multiple sequence alignments and motif analysis

Clustal X and MultAlin were used to obtain the best possible alignment of the protein sequences [29, 30]. Pairwise alignment was carried out in MultAlin using the conventional dynamic programming method. The secondary structures of the proteins were visualized along with the conserved residues and the multiple sequence alignment (MSA) using ESPript, Clustal X and MultAlin [29-31]. A Gibbs sampling algorithm was used to obtain the sequence logos of the two motifs in TS proteins [32]. The WebLogo tool was used to construct the sequence consensus, which indicates the conserved pattern of amino acids at specific positions [33].
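The idea behind a sequence logo can be illustrated with a short calculation of per-column information content. The sketch below is only an illustration under stated assumptions: the three aligned strings are hypothetical toy data rather than TS sequences, a 20-letter amino acid alphabet is assumed, gaps are ignored, and the small-sample correction applied by logo tools is omitted.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # Shannon entropy of the observed residue frequencies in this column.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # A fully conserved column reaches the maximum, log2(alphabet_size).
    return math.log2(alphabet_size) - entropy


def alignment_information(aligned_seqs):
    """Per-column information content for equal-length aligned sequences."""
    return [column_information(col) for col in zip(*aligned_seqs)]


# Hypothetical toy alignment (not taken from the study).
toy_alignment = ["DDVID", "DDTLD", "DDAVE"]
for position, bits in enumerate(alignment_information(toy_alignment), start=1):
    print(position, round(bits, 2))
```

Columns with high values correspond to the tall letter stacks of a logo; variable columns score close to zero.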
2.3. Phylogenetic analysis

The sequences were aligned and phylogenetic trees were constructed in MEGA5 with 1000 bootstrap replicates [34]. The maximum likelihood method was used to construct the phylogenetic tree of the TS proteins, as it is suited to the analysis of sequences of diverse origins. The maximum parsimony method was used to construct the phylogenetic tree of the terpene fold containing proteins, which have similar catalytic domains and comparatively higher sequence homology than the former set of sequences [34, 35]. The phylogenetic trees were viewed using the program FigTree [36].
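Bootstrap support rests on resampling alignment columns with replacement and rebuilding the tree for each replicate. The following sketch shows only the resampling step, under the assumption of a gap-free toy alignment; the sequences, the function names and the seed are hypothetical, and no claim is made about how MEGA5 implements the procedure internally.

```python
import random


def bootstrap_alignment(aligned_seqs, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_cols = len(aligned_seqs[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are applied to every sequence, so each
    # resampled column is an intact column of the original alignment.
    return ["".join(seq[i] for i in picks) for seq in aligned_seqs]


def bootstrap_replicates(aligned_seqs, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments reproducibly."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aligned_seqs, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment (not taken from the study).
toy = ["DDVIDA", "DDTLDA", "DDAVEG"]
replicates = bootstrap_replicates(toy, n_replicates=1000)
print(len(replicates), replicates[0])
```

Each replicate would then be passed to the tree-building method, and the support for a clade is the fraction of replicate trees in which that clade appears.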
2.4. Comparative structural analysis of terpene synthase enzymes

The TS protein was used as a query on the DALI server, which returned 14 terpene fold containing proteins [37]. The coordinate files of these proteins were downloaded from the PDB [38]. Comparative tertiary structure analysis was carried out using the Rapid Alignment of Proteins in terms of Domains (RAPIDO) server. RAPIDO aligns the structures of diverse protein molecules and identifies conserved domains, which may be present at different positions in the analyzed sequences. Variations among amino acid residues are taken into account through weighting functions based on the refined B-values [39]. Result files were analyzed in PyMOL [40]. For proteins lacking a ligand, substrates were docked using AutoDock Vina-2.0 and MGLTools [41, 42]. The resulting complexes were energy minimized with GROMACS to obtain the most stable conformation [43, 44]. Contact mapping analysis was carried out using LigPlus [45]. The binding pocket volumes around the ligands were defined within a radius of 10 Å, so as to include the residues involved in non-bonded interactions. This analysis was carried out using POVME 2.0.1 [44, 46]. The volumes of substrates and products were measured using the 'measure volume and area' tool of UCSF Chimera [47].
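The 10 Å criterion used to define the pocket environment can be sketched as a simple distance filter. The code below is a minimal illustration, not the POVME procedure: the coordinates, residue numbers and the helper name `residues_near_ligand` are hypothetical, and a residue is counted as soon as any one of its atoms lies within the radius of any ligand atom.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, radius=10.0):
    """Return sorted residue ids having at least one atom within `radius`
    angstroms of any ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples
    """
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # residue already accepted through another atom
        if any(math.dist(xyz, lig) <= radius for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)


# Hypothetical toy coordinates in angstroms (not taken from any PDB entry).
toy_protein = [
    (1, (0.0, 0.0, 0.0)),
    (2, (5.0, 0.0, 0.0)),
    (2, (40.0, 0.0, 0.0)),
    (3, (30.0, 0.0, 0.0)),
]
toy_ligand = [(2.0, 0.0, 0.0)]
print(residues_near_ligand(toy_protein, toy_ligand))
```

In practice the atom lists would be read from the energy-minimized complex, and the selected residues would be the ones inspected in the contact maps.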
Normal mode analysis of single and multi-domain proteins

Global motions of the flexible regions of single and multi-domain proteins were analysed using normal mode analysis (NMA). NMA was carried out using the Anisotropic Network Model web server, which is based on the elastic network model, with default settings [48, 49]. All atoms of the proteins were included, and a total of 10 modes were calculated for each protein with a cut-off value of 10 Å. The elastic network model represents the system at the residue level, so a macromolecule is represented as a graph in the form of a network. The normal mode calculation is based on the harmonic approximation of the potential energy function around a minimum energy conformation. TS and TaxS were analyzed as representatives of single-domain and multi-domain proteins, respectively.
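The elastic network calculation described above can be sketched in a few lines. This is a generic anisotropic network model written with NumPy, not the code of the web server used in the study: the uniform spring constant `gamma`, the one-node-per-point representation and the random toy coordinates are assumptions made for illustration only.

```python
import numpy as np


def anm_hessian(coords, cutoff=10.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic network model."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # nodes farther apart than the cut-off are not connected
            block = -gamma * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements are minus the sum of off-diagonal blocks.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def anm_modes(coords, n_modes=10, cutoff=10.0):
    """Return the n_modes lowest non-trivial eigenvalues and eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    # The first six modes (near-zero eigenvalues) are rigid-body
    # translations and rotations and are skipped.
    return eigvals[6:6 + n_modes], eigvecs[:, 6:6 + n_modes]


# Hypothetical toy "structure": 30 random nodes in a 10 angstrom cube.
rng = np.random.default_rng(0)
toy_coords = rng.uniform(0.0, 10.0, size=(30, 3))
values, vectors = anm_modes(toy_coords, n_modes=10)
print(values.shape, vectors.shape)
```

The lowest non-trivial modes returned here correspond to the most collective motions, which is where inter-domain movements of a multi-domain protein would be expected to show up.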
3. Results

3.1. Trichodiene synthase and other terpene fold proteins across different kingdoms of life

The metal binding and catalytic domain of the TS protein is well characterized in Fusarium sporotrichioides [50]. TS protein homologues were obtained from different organisms using blastp. Bacteria, lower plants and higher plants were found to have proteins that show sequence similarity with the TS protein. MSA across various fungal genera revealed that the catalytic region of the TS protein (the metal binding motif and the pyrophosphate binding motif) is conserved (Fig. 3). Secondary structure analysis revealed that the protein sequences obtained from bacteria and plants have secondary structures similar to those observed across different genera of fungi. Further, phylogenetic analysis showed that terpene fold carrying proteins similar to the TS protein clustered into fungal, bacterial, lycophyte and other plant groups based on their similarity and co-evolution (Fig. 4). The terpene fold protein of S. moellendorffii clustered between those of the bacteria and the plants, which suggests that the protein present in this lycophyte might represent an evolutionary intermediate between the microbes and the plants.
3.2. Distribution of substrate diversity of terpene fold

The terpene fold present in the TS protein, which possesses the metal binding domain and the catalytic site, is reported in all the enzymes involved in the biosynthesis of terpene molecules in different organisms. As retrieved from the DALI server, these organisms range from bacteria and fungi to plants and humans. The enzymes are 5-epi-aristolochene synthase (EAS) from tobacco (PDB ID: 5IK0) [51], pentalenene synthase (PS) from Streptomyces UC5319 (PDB ID: 1PS1) [52], epi-isozizaene synthase (EIZS) from Streptomyces coelicolor (PDB ID: 3LG5) [53], 1,8-cineole synthase (CS) from Salvia (PDB ID: 2J5C) [54], geranyl pyrophosphate synthase (GPPS) from Mentha piperita (PDB ID: 3KRF) [55], (+)-δ-cadinene synthase (δCS) from Gossypium arboreum (PDB ID: 3G4F) [56], hexaprenyl pyrophosphate synthase (HexPPS) from Sulfolobus solfataricus (PDB ID: 2AZJ) [57], isoprene synthase (ISPS) from grey poplar leaves (PDB ID: 3N0F) [58], octaprenyl pyrophosphate synthase (OPPS) from Thermus thermophilus (PDB ID: 1WLO) [59], polyprenyl synthase (PPS) from Caulobacter crescentus (PDB ID: 3OYR) [60], bornyl pyrophosphate synthase (BPPS) from Salvia officinalis (PDB ID: 1N21) [61], squalene synthase (SS) from Homo sapiens (PDB ID: 1EZF) [62] and taxadiene synthase (TaxS) from Taxus brevifolia (PDB ID: 3P5R) [63]. Although these proteins differ in the length of their amino acid sequences, the amino acid residues involved in catalysis are reported to be conserved [63, 64]. The metal binding domains DDXX(X)D/E and "NSE/DTE" [N/DDXXS/TXX(K/R)E] were found in many of the terpene synthases. However, some enzymes lack one of these two domains. The NSE/DTE domain was not observed in δCS, CS, OPPS, PPS, HexPPS and TaxS. The metal binding domain DD(X)nD is present in GPPS; however, it is not involved in the catalytic activity of the enzyme [55]. The amino acid residues interacting with the pyrophosphate moiety were observed to be of the same chemical nature, which leads to the formation of the product. Arginine, phenylalanine, tyrosine, valine, tryptophan and isoleucine were the amino acid residues commonly observed in the catalytic sites of these enzymes. The pattern of amino acid residues in the binding pocket provides the molecular basis for the ability of these enzymes to produce products of diverse chain lengths. Apart from the residues mentioned above, a few more amino acid residues are also known to play an important role in catalysis. The comparative analysis of TS with GPPS, EIZS, HexPPS, PS, OPPS and SS revealed similar core structures, except for some additional folds (Supplementary Fig. 2). EAS, CS, δCS, ISPS and TaxS, however, showed an extended structural fold that might assist the enzyme in carrying out its additionally evolved functions (Fig. 4).
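The two metal binding motifs quoted in this section can be written as regular expressions and searched for in a protein sequence. The sketch below takes the motif definitions literally from the text, DDXX(X)D/E and [N/D]DXX[S/T]XX[K/R]E with X standing for any residue; the sequence is a hypothetical toy string and the helper name `find_motifs` is an assumption made for illustration.

```python
import re

# Motif patterns transcribed from the text; "." stands for any residue (X).
MOTIFS = {
    "DDXX(X)D/E": re.compile(r"DD.{2,3}[DE]"),
    "NSE/DTE": re.compile(r"[ND]D.{2}[ST].{2}[KR]E"),
}


def find_motifs(sequence):
    """Return {motif name: [(1-based start position, matched text), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # finditer reports non-overlapping matches from left to right.
        hits[name] = [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]
    return hits


# Hypothetical toy sequence (not a real terpene synthase).
toy_sequence = "MAAADDLLDGGGNDAASGGKELL"
for motif, found in find_motifs(toy_sequence).items():
    print(motif, found)
```

Running such a scan over each retrieved sequence would give a quick presence/absence table of the two motifs, which is the kind of information used here to compare the enzymes.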
3.3. Evolutionary relationship of different enzymes containing the terpene molecular fold

The phylogenetic analysis showed that the proteins are closely related. However, their sequences cluster according to their structure, their function and the presence or absence of functional domains additional to the core catalytic domain. The enzymes involved in the cyclization of isoprenyl compounds, i.e. OPPS, PPS, δCS, CS and HexPPS, tended to cluster in one group. Bacterial and fungal sesquiterpene cyclases are single-domain enzymes that adopt the class I terpene synthase fold, and these enzymes, namely PS, SS, FPPS, EIZS and TS, clustered in one group. The rest of the proteins clustered into a separate group of proteins with two or three domains, namely TaxS, CS, EAS, ISPS and BPPS. Comparison of TaxS with other terpene cyclases revealed that the cyclase architecture is modular in nature and may consist of one, two or three domains.
3.4. Single domain enzymes

The terpene synthases that have a single domain for enzyme catalysis showed structural similarity with homologous enzymes (Supplementary Fig. 1). These include PS, SS, FPPS, EIZS and TS. The structural comparison is elaborated in the following subsections.
3.4.1. Squalene synthase

The comparative structural analysis of the TS and SS proteins showed that they share similarities at the catalytic site. Arginine is a common amino acid residue involved in catalysis in both enzymes. The amino acid residues of SS that interact with the pyrophosphate moiety are F54, V69, Y73, R77, M150, M154, Y171, V175, L183, L211, Q212, R218, Q219, Y276, F288, C289 and P292 (Supplementary Fig. 2). In the TS protein, however, arginine and phenylalanine are the two amino acid residues involved in these interactions. The amino acid residues present within a radius of 10 Å of the substrate binding site of SS are alanine, leucine, glycine and valine, whose smaller side chains may provide enough space to accommodate the two molecules of farnesyl pyrophosphate (FPP) used during the reaction (Supplementary Fig. 3). In the TS protein, however, arginine, phenylalanine and glutamate, which have longer side chains, were observed in the same pocket, making it smaller than that of the SS enzyme and leading to the cyclization of only a single molecule of FPP.
3.4.2. Polyprenyl transferases

HexPPS, OPPS and PPS belong to the polyprenyl transferases. The substrates of all three proteins are the same, while the end products are different (Table 1). However, there are extended α-helices in TS, which were observed near the first metal binding domain and at the N-terminal region (Supplementary Fig. 1). Since these helices surround the active site, they might assist in pyrophosphate catalysis. The presence of amino acid residues with smaller side chains, such as A76 before the DDXXD motif, might help the enzyme in catalyzing the substrates (Supplementary Fig. 2). The amino acid residues that are important for the catalysis of pyrophosphate include A76, P114, R116, W136 and L164 (Supplementary Fig. 2). The amino acid residues that interact with pyrophosphate in OPPS are A76, R91, K93, D205 and D208 (Supplementary Fig. 2), whereas those interacting with pyrophosphate in PPS were K56, P60, H88, L103, R105, L224 and K236. This shows that amino acids of different types are present in the binding pockets. Site-directed mutagenesis of alanine to tyrosine has already been reported to alter the length of the product molecules in OPPS and HexPPS [58]. The contact mapping analysis showed that the substrate FPP interacts with alanine, threonine, aspartate and histidine in the binding pocket, whereas arginine, glutamine and phenylalanine line the binding site of isopentenyl pyrophosphate (IPP) in the same pocket (Supplementary Fig. 4). Since HexPPS, OPPS and PPS accommodate both substrates in the same binding pocket, the pocket should be large enough to hold both of them simultaneously. The cleft volume of HexPPS is reported to be 1400 Å³ [65]. The binding pocket volumes of OPPS and PPS were observed to be 1779 Å³ and 2004 Å³, respectively. All three enzymes thus have binding pocket volumes larger than that of the TS enzyme (1175 Å³). This observation correlates with the sizes of the products of these enzymes (C15 for TS; C30 for HexPPS; C40 for OPPS; and C40 or C50 for PPS). The larger pockets may facilitate catalysis and the swift release of these product molecules from the respective catalytic pockets.
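The stated correlation between pocket volume and product size can be checked with a small calculation on the numbers quoted in this subsection. The sketch below uses only those four data points and treats the PPS product as either C40 or C50, as given in the text; the use of the Pearson coefficient and the function names are assumptions made for illustration, and four points are far too few for any statistical claim.

```python
import math

# Values quoted in the text: pocket volume in cubic angstroms and
# carbon count of the product. PPS is reported as C40 or C50.
pocket_volume = {"TS": 1175, "HexPPS": 1400, "OPPS": 1779, "PPS": 2004}
product_carbons = {"TS": 15, "HexPPS": 30, "OPPS": 40}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlation_for(pps_carbons):
    """Correlation of pocket volume with product size for a given PPS product."""
    carbons = dict(product_carbons, PPS=pps_carbons)
    names = sorted(pocket_volume)
    return pearson([pocket_volume[n] for n in names], [carbons[n] for n in names])


for pps in (40, 50):
    print("PPS product C%d:" % pps, round(correlation_for(pps), 3))
```

Under either assumption for PPS the coefficient is strongly positive, in line with the trend described above.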
3.4.3. Pentalenene synthase

The catalysis requires a hydrophobic environment in the binding pockets of both proteins, TS and PS. However, the amino acid residues present in the binding pocket vary. The aromatic residues F57, F76, F77 and W308 and the hydrophobic residues L53, V79, T182 and V301 line the binding cavity of PS. Arginine, however, is common to both proteins; it is positively charged and may stabilize the pyrophosphate group of the substrate. The amino acid residues of PS that showed non-bonded interactions with pyrophosphate are R44, L53, L82, D117, R173, V179, K225, R230 and H309 (Supplementary Fig. 5). The comparative binding pocket analysis of PS, TS and SS showed that FPP binds at different positions in these enzymes; in TS, for example, it showed interactions with D100, N225 and R304 (Supplementary Fig. 5). When FPP was docked at the site homologous to that of the PS protein and the resulting complex was energy minimized, it showed hydrogen bond interactions with residues D345, E423 and E497; however, the actual binding site reported for PS is different, as discussed above (Supplementary Figs. 2 and 5). This stereospecific, exclusive binding of FPP in the PS, SS and TS binding pockets leads to the formation of different types of product molecules (Table 1).
3.4.4. Geranyl pyrophosphate synthase

GPPS has a functional large subunit (chains A and D) and a small subunit (chains B and C), which may have a role in catalysis. The large subunit (chains A and D) shows structural resemblance to TS, which is a homodimer. GPPS converts dimethylallyl diphosphate (DMAPP) (C5) and IPP (C5) into geranyl pyrophosphate (GPP), whereas the TS protein cyclizes FPP (C15) into trichodiene (C15). The different substrates and products are accommodated by changes in the local environment of the binding pockets (Supplementary Fig. 5). In GPPS, pyrophosphate shows hydrogen bond interactions with C87, M88, S107, V160, C161, R293, D294 and N295. In TS, fewer residues were observed to interact with pyrophosphate: R182, K232, R303, F304 and R305 (Supplementary Fig. 5). The amino acid residues phenylalanine and glutamine (with larger side chains), which showed non-bonded interactions, may help GPPS to accommodate two substrates of smaller size (Supplementary Fig. 6). Further, binding pocket volume analysis showed that the binding pocket of GPPS is 973 Å³, while that of TS is 1175 Å³ (Table 2). We infer that the smaller binding pocket of GPPS accommodates the smaller substrate molecules. This provides evidence that both enzymes have the same molecular fold but form product molecules of different sizes (Table 1).
3.4.5. Epi-isozizaene synthase

The structural comparison of EIZS with TS showed similarity around the binding pocket. The negatively charged D99 has previously been reported to be responsible for metal coordination in both EIZS and TS [9, 53]. The comparative structural alignment revealed that the amino acid residues presented a similar pattern in the three-dimensional structures in both cases (Supplementary Fig. 5). The interactions of the pyrophosphate moiety with amino acid residues of the binding pocket lead to the formation of the enzyme-specific product. The amino acid residues interacting with pyrophosphate are R194, F196, I249, N240, S244, K247, R338 and Y339 (Supplementary Fig. 5). Comparative binding pocket analysis showed that the interacting amino acid residues phenylalanine and arginine were in closer proximity in the binding pocket of EIZS than in that of the TS protein, which may lead to the formation of a different end product (Supplementary Figs. 3 and 6; Table 1). This was further supported by the binding pocket volume calculation, which showed that the binding pocket of EIZS (527 Å³) is much more compact than that of TS (1175 Å³) (Table 2). This stark difference in the size of the binding pockets leads to the formation of different products by these enzymes.
3.5. Multi-domain proteins

Many of the terpene fold proteins have been reported to carry multiple domains [65]. However, these additional domains are not involved in the classical catalytic activity. A comparative structural analysis of the TS protein with these multi-domain proteins was carried out (Fig. 4). The results for the multi-domain proteins, namely δCS, CS, EAS, ISPS, BPPS and TaxS, are discussed in the following subsections.
3.5.1. δ-Cadinene synthase

δCS is a terpene cyclase [56]. The metal binding motif NSE/DTE is absent in δCS; however, it contains another aspartate-rich sequence that interacts with the metal ion. The pyrophosphate in δCS showed hydrogen bond interactions with the amino acid residues Y382, E385, R448 and R270 (Supplementary Fig. 5). The δCS binding pocket is lined by valine and leucine, which support the interaction of the enzyme with the substrate molecule (Supplementary Fig. 7). The products of the two reactions differ relatively little in volume. However, the binding pocket volume of δCS (1464 Å³) is larger than that of the TS protein (1175 Å³) (Table 2). We therefore observed a considerable difference in the binding pocket volumes of these two enzymes, which may lead to the formation of two different end products.
3.5.2. 1,8-Cineole synthase

The amino acid residues W317, I337, T342, Y420, S445, I451, L485 and Y564 are reported to be conserved in both enzymes and may help to stabilize the carbocationic intermediates of the reaction [9, 51, 52]. Amino acid residues of similar nature have also been reported to take part in catalysis in the TS and CS proteins [9, 51, 52]. In CS, other amino acid residues were observed to form hydrogen bonds with the pyrophosphate moiety of the substrate, namely I337, S445, I451, T342, T343 and L485 (Supplementary Fig. 8). The amino acid residues with longer side chains, namely leucine and tryptophan, that are present in the binding pocket but showed no hydrogen bond formation are known to help the substrate to fit in the binding pocket, leading to the formation of 1,8-cineole (Supplementary Fig. 7). The volume of cineole (154.6 Å³) was observed to be smaller than that of trichodiene (200 Å³), the product of the TS protein. This correlated with the comparative binding pocket volume analysis, which showed that the binding pocket volume of CS (857 Å³) is smaller than that of the TS protein (1175 Å³) (Table 2). This might be the reason for the formation of different products by these enzymes.
3.5.3. 5-Epi-aristolochene synthase

Residues 264-266 and 521-534, located in the loops of EAS, were observed to contribute to catalysis, since they occur near the catalytic site of the protein (Supplementary Fig. 8). In TS, arginine and tyrosine were reported to be involved in the catalysis of FPP [67]. In our contact mapping analysis of EAS, however, different residues were observed to form hydrogen bond interactions with the pyrophosphate: G526, T401, T402, T403, T445, T447, Y527 and W273. In addition, other amino acid residues are present in the binding pocket, and their stereoselectivity leads to the formation of different products (trichodiene in TS and epi-aristolochene in EAS). The contact mapping analysis revealed that tryptophan and tyrosine showed non-bonded interactions and form an aromatic box in the active site, leading to the formation of epi-aristolochene (Supplementary Fig. 9). This aromatic box gives rise to cation-π interactions with the metal ions, which provide a favourable environment for the catalysis of the substrate. The binding pocket volume of EAS (1387 Å³) was also observed to be larger than that of the TS protein (1175 Å³) (Table 2), while the calculated volumes of the product molecules are 206 Å³ (epi-aristolochene) and 200 Å³ (trichodiene), respectively. These results correlate well with the measured binding pocket volumes, and it seems that a bigger pocket is required for the release of a bigger product molecule.
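The comparison of pocket and product volumes made for CS, TS and EAS can be restated as a small tabulation. The sketch below uses only the six volumes quoted in Sections 3.5.2 and 3.5.3; the dictionary layout, the ratio column and the function name `rank_by` are illustrative assumptions rather than part of the original analysis.

```python
# Volumes quoted in the text, in cubic angstroms.
volumes = {
    "TS": {"pocket": 1175.0, "product": 200.0},   # product: trichodiene
    "CS": {"pocket": 857.0, "product": 154.6},    # product: 1,8-cineole
    "EAS": {"pocket": 1387.0, "product": 206.0},  # product: epi-aristolochene
}


def rank_by(key):
    """Enzyme names ordered from smallest to largest value of `key`."""
    return sorted(volumes, key=lambda name: volumes[name][key])


pocket_order = rank_by("pocket")
product_order = rank_by("product")
print("by pocket volume: ", pocket_order)
print("by product volume:", product_order)

# Pocket-to-product volume ratio for each enzyme.
for name in pocket_order:
    ratio = volumes[name]["pocket"] / volumes[name]["product"]
    print(name, round(ratio, 2))
```

For these three enzymes the ordering by pocket volume matches the ordering by product volume, which is the relationship the text describes.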
3.5.4. Isoprene synthase
The comparative structural analysis of ISPS and TS showed that the terpene fold is common to both
proteins, but the metal binding motif and the catalytic site region vary between the structures. The
pyrophosphate moiety was observed to interact with F338, V341, F485, R486 and N489 (Supplementary
Fig. 8). The structural presentation shows that the amino acid residues involved in catalysis are
present in the same sphere. However, the orientation and stereochemistry of the amino acid residues,
together with interatomic hydrogen and non-bonded interactions, lead to the product specificity of
these enzymes. This was supported by the contact mapping, which revealed that serine, phenylalanine
and arginine residues (with relatively longer side chains) were interacting through non-bonded
interactions. The long side chains of these residues make the binding pocket compact for the binding
of DMAPP (C5), which is smaller than the substrate of the TS enzyme [FPP (C15)] (Table 1). This showed
that the amino acid residues lining the binding pocket may influence the size of the end product of
the enzymes (Supplementary Fig. 9). The binding pocket volume of ISPS was 792 Å³, which is smaller
than that of TS (1175 Å³) (Table 2). This provides indicative evidence of how pockets of the same
catalytic fold may handle substrates and products of different sizes.
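The volume comparison made in Sections 3.5.3 and 3.5.4 can be tabulated with a short sketch. The numbers are the pocket and product volumes quoted in the text; the dictionary names and the product-to-pocket ratio are illustrative choices and not a metric defined by the authors.

```python
# Binding pocket volumes (Å^3) quoted in the text for three terpene fold enzymes
pocket_volume = {"TS": 1175.0, "EAS": 1387.0, "ISPS": 792.0}

# Product volumes (Å^3) quoted in the text: trichodiene (TS), epi-aristolochene (EAS)
product_volume = {"TS": 200.0, "EAS": 206.0}

def rank_by_pocket(volumes):
    """Enzyme names ordered from the largest to the smallest pocket."""
    return sorted(volumes, key=volumes.get, reverse=True)

def product_to_pocket_ratio(enzyme):
    """Fraction of the pocket volume taken up by the product (illustrative metric)."""
    return product_volume[enzyme] / pocket_volume[enzyme]

ranking = rank_by_pocket(pocket_volume)
print("Pockets, largest first:", ranking)
for enzyme in product_volume:
    print(enzyme, round(product_to_pocket_ratio(enzyme), 3))
```

The ordering simply restates the trend reported above: the enzyme releasing the larger sesquiterpene product has the larger pocket, and the C5-handling ISPS has the smallest.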
3.5.5. Bornyl pyrophosphate synthase
The comparative structural analysis of BPPS and TS showed that the N-terminus of BPPS is elongated
and that the catalytic site residues which interact with the pyrophosphate moiety are present towards
the C-terminus of the protein. It is known that the N-terminus of BPPS is stabilized by inter-domain
hydrogen bonds formed between the NH1 atom of the side chain of R56 and an O atom of the carboxyl
group of D355, and between the O atom of the OH group of the aromatic ring of Y60 and the OD1 atom of
the side chain of D352 [61]. These interactions may stabilize the pyrophosphate during catalysis, as
these residues are present in the loop region and point towards the active site of the protein.
Arginine was the common amino acid residue reported to carry out the reaction in both enzymes
[9, 61]. Other amino acid residues of BPPS involved in hydrogen bond interactions with the
pyrophosphate moiety were W323, I344, V452, K511 and F578 (Supplementary Fig. 8). Leucine,
phenylalanine, tryptophan and isoleucine residues occur in the binding pocket of BPPS, which can
accommodate GPP (C10), a smaller substrate than the FPP of the TS protein (Supplementary Fig. 10). It
is reported that the active site cavity of BPPS is 222 Å³, which is smaller than that of the TS
protein (324 Å³) [60]. This difference may therefore form the basis for the catalysis of a larger
substrate (FPP; volume: 339 Å³) by TS in comparison to the smaller substrate of BPPS (GPP; volume:
263 Å³).
3.5.6. Taxadiene synthase
It is reported that the catalytically active site of TaxS is present at the C-terminus of the protein
and that the N-terminal domain contains a double α-barrel which does not have the DXDD motif [63]. The
terpene fold and catalytic mechanism are the same as those of TS. However, the amino acid residues
observed to form hydrogen bonds with the pyrophosphate moiety in TaxS were S587, D613, Y688, Y684,
Q709, C719 and C830. These interacting residues determine the length of the end product in the
enzyme (Supplementary Fig. 11). The extra α-helical domain present at the N-terminus lies 17 Å from
the catalytic site (Supplementary Fig. 11) [63]. It is reported that intra-hydrogen bonds of these
α-helices may help in stabilizing the pyrophosphate catalysis [63]. It is also reported that arginine
residues present tandemly in the loop regions may lead to conformational changes that help in the
catalysis of geranylgeranyl diphosphate (GGPP) (C20) [63] (Fig. 5). It is documented that arginine
residues play an important role in the catalysis of pyrophosphate in BPPS and TS [63]. It would be
interesting to investigate the role of the additional helical region, which is an α-α barrel and
contains arginine residues in tandem repeats. The contact mapping showed that the amino acid residues
occurring in the binding pocket are aromatic and positively charged residues, which provide a
favourable environment for pyrophosphate catalysis. Valine, serine and glycine residues, which have
small side chains, allow the protein to bind GGPP (C20) (Supplementary Fig. 10). This different
composition of the binding pocket of the terpene fold leads to the formation of different products.
The kinetic parameters (kcat/Km) of OPPS and CS were reported as 0.005 s⁻¹ and 0.049 min⁻¹,
respectively [54, 59]. It was observed that CS may have higher catalytic efficiency than OPPS, which
may be attributed to the additional helical regions of the multi-domain protein.
It was observed that the terpene fold was conserved across the different enzymes. The differences lie
at the level of the amino acid residues interacting with the pyrophosphate moiety and the residues
lining the binding pocket. The proteins were clustered in the phylogenetic tree according to their
domain diversity (Fig. 3). The proteins with multiple domains clustered together, and the proteins
having a single domain occurred in a separate clade. The phylogenetic analysis also showed that the
clustering was governed by the size of the end products. It is clear from the phylogenetic analysis
that these different enzymes are related on the evolutionary scale.
Additional domains of multi-domain proteins may aid in catalytic efficiency
NMA of single and multi-domain proteins showed that the additional domains may help in improving
the catalytic efficiency of the enzymes and may also help in the regulation of their activities.
Cumulative movements of the additional domains showed a 'puckering forceps'-like converging movement
along a central perpendicular axis (amino acid residues 746-750), which could bring the functionally
important residues of the binding pocket into close proximity for interaction. As explained above,
some regions of the additional domains may interact with helices present in the catalytic domain and
may further increase the catalysis. However, in single domain proteins, it was observed that the
residues adjacent to the catalytic site were in the flexible regions. The amino acid residues which
showed more flexibility are shown in red colour in Fig. 6.
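To make the idea of per-residue flexibility from NMA concrete, the following is a minimal sketch of an anisotropic network model of the kind described in the elastic network literature cited in the reference list [48]. It is not the pipeline used in this study: the 15 Å cutoff, the uniform spring constant and the idealized ten-residue helix are assumptions made only for illustration.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N elastic network Hessian for N nodes (e.g. C-alpha atoms)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue  # nodes farther apart than the cutoff are not connected
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements are minus the sum of the off-diagonal ones
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian

def anm_fluctuations(coords, cutoff=15.0, gamma=1.0):
    """Relative mean-square fluctuation of each node from all non-rigid-body modes."""
    eigvals, eigvecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    # The six lowest modes are rigid-body translations and rotations; skip them
    vals, vecs = eigvals[6:], eigvecs[:, 6:]
    per_coordinate = (vecs ** 2 / vals).sum(axis=1)
    return per_coordinate.reshape(-1, 3).sum(axis=1)

# Hypothetical input: ten C-alpha positions on an idealized alpha-helix
angles = np.deg2rad(100.0 * np.arange(10))
helix = np.column_stack([2.3 * np.cos(angles), 2.3 * np.sin(angles), 1.5 * np.arange(10)])

flex = anm_fluctuations(helix)
print(np.round(flex / flex.max(), 2))
```

With coordinates taken from a real structure, the residues with the largest values would correspond to the flexible regions, and the displacement vectors of the lowest non-trivial modes could be inspected for converging domain motions of the kind described above.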
4. Discussion
Phylogenetic analysis provided evidence of the occurrence of the TS protein across different kingdoms
of life. However, TS proteins are well characterized in fungi [50]. The homologous protein sequences
obtained after pBLAST were found in bacteria, lycophytes and plants. It may be inferred that this
protein has diverged and evolved with additional non-catalytic domains. The phylogenetic analysis has
shown that the TS protein has evolved over time according to its role in the biosynthetic pathways of
secondary metabolites among different genera of fungi (Fig. 3). The TS protein sequences from
different genera of pathogenic fungi, namely Fusarium sporotrichioides, F. graminearum, F. asiaticum,
Beauveria bassiana and Stachybotrys echinata, were clustered in one group. However, the genera of
beneficial fungi that are reported as bio-control agents formed a separate cluster adjacent to the
pathogenic fungi (Fig. 2). Although the biological basis of this is difficult to explain, the end
products synthesized in the same biosynthetic pathway have been reported to have different types of
activities. The sequence analysis revealed that homologous proteins are present in different
organisms including S. moellendorffii, Anabaena variabilis, Sciscionella and Arabidopsis thaliana.
However, the protein sequences of Nectria haematococca, Capronia epimyces, Aspergillus oryzae and
Ophiocordyceps sinensis were clustered in different groups, as these are aristolochene synthases
containing the same terpene fold as that of the TS protein [4]. This may be the reason why the other
fungal genera which showed homology to TS proteins were clustered in the group of Aspergillus oryzae.
It is reported that in S. moellendorffii and Marchantia polymorpha the expressed diterpene synthase
genes are closely related to those of plants, whereas the expressed mono- and sesquiterpene synthase
genes in these plants are closely related to those of microbes [25, 26]. Similarly, the homologous
protein sequences obtained from S. moellendorffii belong to terpene synthases that are present at the
interface of those of fungi and plants. However, the proteins obtained from bacteria showed
similarity to those of plants. TS has shown homology to AS from Aspergillus oryzae [21]. Further, the
comparative structural analysis was carried out to observe the presence of the terpene fold in
different organisms which carry out similar catalytic reactions. However, the substrate and product
lengths and the amino acid residues involved in the reactions varied between organisms. It has been
reported that the metal binding structural motif is conserved in these enzymes, which have a distinct
terpene fold in different organisms [67]. Mg²⁺ ions are reported to act as a co-factor in all the
proteins containing the terpene fold, helping them to recognize the substrate pyrophosphate group by
metal coordination [54, 60, 61, 66]. Metal binding was also shown to regulate the binding pocket
sizes. It is reported that the complexation of pyrophosphate with Mg²⁺ leads to conformational
changes that make the binding pocket sequestered and closed, as reported in the case of BPPS [61].
Further, these metal ions interact with pyrophosphate and the amino acid residues of the active site,
which results in proper orientation of the substrate for cyclization and product formation in the
terpene fold containing proteins [66]. It is reported that the N-termini of BPPS and EAS cap their
respective active sites, whereas TS requires neither an N-terminal domain nor the N-terminus for
active site closure. TS contains a D101–R304 regulatory motif, which was earlier referred to as a
molecular switch that triggers active site closure. In this molecular switch, the D101 residue has
been reported to interact with Mg²⁺ [51, 61]. This shows that the metal ion has a role in preparing
the optimal binding pocket and in domain reorganisation to facilitate the overall catalysis.
These enzymes can be broadly classified into single domain and multi-domain proteins on the basis of
their tertiary structures. The single domain proteins are clustered together in one group in the
phylogenetic tree, and the proteins containing two or three domains are clustered into another group
(Fig. 3). Since the pyrophosphate carries negative charge density, it was observed to interact with
positively charged residues. It was also observed that in all analyzed protein sequences, arginine
interacts with pyrophosphate. The aromatic residues also play an important role in the enzyme
catalysis of all the analyzed proteins, as these promote the catalysis through cation-pi
interactions. The proteins which do not have the NSE/DTE metal binding motif were observed in both
clusters. It is evident that the terpene fold is involved in the catalysis of diverse types of
chemical reactions governed by a similar mechanism. This property of the terpene fold can be
attributed to the diversity evolved in the local environment of the catalytic pockets. The contact
mapping analysis of all the enzymes showed that the side chains of the residues present in the
binding pocket determine the diverse sizes of substrates and products. A similar study on plant
terpene synthases showed that altering residues of the binding pocket may change the end products
[68]. It is reported that the side chain of the fifth amino acid before the first DDXXD motif in
HexPPS, GPPS and FPPS decides the length of the product [65]. Therefore, the stereochemistry of the
amino acid residues present in the binding pocket and the depth of the binding pocket determine the
size of the end products in these enzymes. This observation was further strengthened by the binding
pocket volume analysis. TS, δ-CS, EIZS and EAS convert a substrate of the same size into different
products of the same size. It was observed that the volume of the binding pocket and the volume of
the product molecules show a correlation in the context of the diverse behavior of terpene fold
enzymes.
The comparative volume analysis of binding pockets and substrates/products indicated that the
promiscuous behaviour of the terpene fold may be attributed to the sizes of the substrates and
products and to the side chain lengths of the amino acid residues lining the binding pocket, which
affect the occupancy and movement of ligand molecules in the binding site. It is generally accepted
that the enzymes containing the terpene fold may have a common origin, but the evolutionary events
which led to their functional diversity are yet to be worked out. The comparison of the kinetic
efficiency of OPPS with that of CS showed that CS has higher catalytic efficiency than OPPS [54, 59].
This better efficiency of the multi-domain enzymes may be attributed to the extra α-α barrel
structures present at the N-terminus. The structural arrangement indicated that these may be in an
evolving phase, as the barrel present in this case is not well developed. Therefore, it would be
interesting to explore how the tandem arginine repeats reported in the loop of the α-α barrel could
regulate the catalytic site, which is situated 17 Å away, using in silico approaches for analyzing
larger protein domain movements and dynamics, such as normal mode analysis, as described earlier
[69]. Furthermore, intra-hydrogen bond formation between tandem arginine residues present in the
domains other than the catalytic domain may assist in the catalysis. It was observed that the
instances at which the arginine residues are involved in intra-hydrogen bonding lead to a decrease in
the binding pocket volume. For example, in the case of the TaxS protein, the binding pocket volume
observed before the simulation was 339 Å³, while at the instance where the arginine residues showed
intra-hydrogen bonding, the calculated binding pocket volume was 289 Å³. This compaction of the
binding pocket as a result of intra-hydrogen bonding may lead to better catalysis in the multi-domain
proteins (unpublished results). To extend this observation, it will be interesting to comparatively
analyse representatives from all terpene fold containing multi-domain protein classes in this
context. Further, it is well established that multi-domain proteins utilize inter-domain motions such
as domain swinging, stretching, twisting and motion coupling to improve their functionality. This may
also aid in their versatile modes of regulation under the variable physiological conditions within
diverse organisms. We attempted to observe whether the additional domains of these terpene family
multi-domain proteins communicate with the catalytic domains. In the current study, we observed that
the cumulative motions of the additional domains follow a circular pattern around a perpendicular
axis present roughly in the middle of these proteins. Therefore, the overall structure of these
proteins may become more compact. This compactness of the catalytic domain brings the functional
residues into close proximity and may increase the catalysis. It could be concluded from NMA that
these coupled motions may increase the catalytic efficiency of multi-domain proteins. Further, it
would be interesting to investigate the role of this domain and the possible reason why this protein
carries this reasonably large region, which is yet to be characterized functionally. These proteins
are involved in the production of economically important secondary metabolites that may be beneficial
for sustainable agriculture and biomedical applications [70]. Therefore, by comparing the sequences
and structures of these enzymes, we may trace their molecular evolution. The changes that lead to
such catalytic diversity may be helpful in devising novel recombinant designer enzymes for producing
diverse secondary metabolite molecules of agricultural and biomedical interest at industrial scales.
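The extent of the TaxS pocket compaction mentioned in the Discussion can be expressed as a relative change. The sketch below uses only the two volumes quoted in the text (339 Å³ before the simulation and 289 Å³ at the instance of intra-hydrogen bonding); the function name is an illustrative choice.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative means shrinkage)."""
    return 100.0 * (after - before) / before

initial_volume = 339.0    # Å^3, TaxS binding pocket before the simulation
compacted_volume = 289.0  # Å^3, at the instance of arginine intra-hydrogen bonding

change = percent_change(initial_volume, compacted_volume)
print(f"Pocket volume change: {change:.1f}%")
```

On these figures the pocket shrinks by roughly 15% of its initial volume, which gives a sense of scale for the compaction discussed above.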
Conflict of interest
The authors declare no conflict of interest.
Acknowledgements:
The University Grants Commission, Govt. of India (UGC) is acknowledged for providing financial support
in the form of a fellowship to IK. Research in the MA lab is supported by UGC. Research in the YA lab
is supported by extramural research funds from UGC, the Indian Council of Medical Research and the
Science and Engineering Research Board, DST, Govt. of India. We thank the Central University of
Himachal Pradesh and the Bioinformatics Resources & Applications Facility, Centre for Development of
Advanced Computing, Pune, for providing the computational infrastructure used for carrying out this
work.
References
1. Davis EM, Croteau R. Cyclization enzymes in the biosynthesis of monoterpenes, sesquiterpenes, and diterpenes. In: Leeper FJ, Vederas JC, editors. Biosynthesis. Springer Berlin Heidelberg, 2000. p 53-95.
2. Christianson DW. Structural biology and chemistry of the terpenoid cyclases. Chem Rev 2006;106:3412-3442.
3. Christianson DW. Unearthing the roots of the terpenome. Curr Opin Chem Biol 2008;12:141-150.
4. Agger S, Gallego FL, Dannert CS. Diversity of sesquiterpene synthases in the basidiomycete Coprinus cinereus. Mol Microbio 2009;72:1181-1195.
5. Tholl D. Terpene synthases and the regulation, diversity and biological roles of terpene metabolism. Curr Opin Plant Biol 2006;9:297-304.
6. Kawaide H, Imai R, Sassa T, Kamiya Y. ent-Kaurene synthase from the fungus Phaeosphaeria sp. L487: cDNA isolation, characterization, and bacterial expression of a bifunctional diterpene cyclase in fungal gibberellin biosynthesis. J Biol Chem 1997;272:21706-21712.
7. Caruthers JM, Kang I, Rynkiewicz MJ, Cane DE, Christianson DW. Crystal structure determination of aristolochene synthase from the blue cheese mold, Penicillium roqueforti. J Biol Chem 2000;275:25533-25539.
8. Dairi T, Hamano Y, Kuzuyama T, Itoh N, Furihata K, et al. Eubacterial diterpene cyclase genes essential for production of the isoprenoid antibiotic terpentecin. J Bacteriol 2001;183:6085-6094.
9. Rynkiewicz MJ, Cane DE, Christianson DW. Structure of trichodiene synthase from Fusarium sporotrichioides provides mechanistic inferences on the terpene cyclization cascade. PNAS 2001;98:13543-13548.
10. Toyomasu T, Nakaminami K, Toshima H, Mie T, Watanabe K, et al. Cloning of a gene cluster responsible for the biosynthesis of diterpene aphidicolin, a specific inhibitor of DNA polymerase alpha. Biosci Biotechnol Biochem 2004;68:146-152.
11. Toyomasu T, Tsukahara M, Kaneko A, Niida R, Mitsuhashi W, et al. Fusicoccins are biosynthesized by an unusual chimera diterpene synthase in fungi. PNAS USA 2007;104:3084-3088.
12. Shishova EY, Di Costanzo L, Cane DE, Christianson DW. X-ray crystal structure of aristolochene synthase from Aspergillus terreus and evolution of templates for the cyclization of farnesyl diphosphate. Biochem 2007;46:1941-1951.
13. Pinedo C, Wang CM, Pradier JM, Dalmais B, Choquer M, et al. Sesquiterpene synthase from the botrydial biosynthetic gene cluster of the phytopathogen Botrytis cinerea. ACS Chem Biol 2008;3:791-801.
14. Cane DE, Watt RM. Expression and mechanistic analysis of a germacradienol synthase from Streptomyces coelicolor implicated in geosmin biosynthesis. PNAS 2003;100:1547-1551.
15. Cane DE, He X, Kobayashi S, Omura S, Ikeda H. Geosmin biosynthesis in Streptomyces avermitilis. Molecular cloning, expression, and mechanistic study of the germacradienol/geosmin synthase. J Antibiot (Tokyo) 2006;59:471-479.
16. Agger SA, Lopez-Gallego F, Hoye TR, Schmidt-Dannert C. Identification of sesquiterpene synthases from Nostoc punctiforme PCC 73102 and Nostoc sp. strain PCC 7120. J Bacteriol 2008;190:6084-6096.
17. Giglio S, Jiang J, Saint CP, Cane DE, Monis PT. Isolation and characterization of the gene associated with geosmin production in cyanobacteria. Environ Sci Technol 2008;42:8027-8032.
18. Hohn TM, Beremand PD. Isolation and nucleotide sequence of a sesquiterpene cyclase gene from the trichothecene-producing fungus Fusarium sporotrichioides. Gene 1989;79:131-138.
19. Hohn TM, Plattner RD. Purification and characterization of the sesquiterpene cyclase aristolochene synthase from Penicillium roqueforti. Arch Biochem Biophys 1989;272:137-143.
20. Cane DE, Shim JH, Xue Q, Fitzsimons BC, Hohn TM. Trichodiene synthase. Identification of active site residues by site-directed mutagenesis. Biochem 1995;34:2480-2488.
21. Cane DE, Kang I. Aristolochene synthase: purification, molecular cloning, high-level expression in Escherichia coli and characterization of the Aspergillus terreus cyclase. Arch Biochem Biophys 2000;376:354-364.
22. Li G, Köllner TG, Yin Y, Jiang Y, Chen H, et al. Nonseed plant Selaginella moellendorffii has both seed plant and microbial types of terpene synthases. PNAS 2012;109:14711-14715.
23. Gahtori D, Chaturvedi P. Antifungal and antibacterial potential of methanol and chloroform extracts of Marchantia polymorpha L. Arch Phytopatho Plant Prot 2011;44:726-731.
24. Asakawa Y. Bryophytes: Chemical diversity, synthesis and biotechnology. A review. Flavour Fragr J 2011;26:318-320.
25. Kumar S, Chase K, Xun Z, Ayla N, Sibongile M, et al. Molecular Diversity of Terpene Synthases in the Liverwort Marchantia polymorpha. The Plant Cell 2016;28:2632-2650.
26. Trapp SC, Croteau RB. Genomic organization of plant terpene synthases and molecular evolutionary implications. Genetics 2001;158:811-832.
27. Pontin M, Bottini R, Luis BJ, Piccoli P. Allium sativum produces terpenes with fungistatic properties in response to infection with Sclerotium cepivorum. Phytochemistry 2015;115:152-160.
28. McAndrew RP, Peralta-Yahya PP, DeGiovanni A, Pereira JH, Hadi MZ, Keasling JD, Adams PD. Structure of a three-domain sesquiterpene synthase: a prospective target for advanced biofuels production. Structure 2011;19:1876-1884.
29. Corpet F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 1988;16:10881-10890.
30. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997;25:4876-4882.
31. Gouet P, Courcelle E, Stuart DI. ESPript: analysis of multiple sequence alignments in PostScript. Bioinfo 1999;15:305-308.
32. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 1993;262:208.
33. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res 2004;14:1188-1190.
34. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 1993;10:512-526.
35. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011;28:2731-2739.
36. Rambaut A. FigTree [Internet]. 2007. Available from: http://tree.bio.ed.ac.uk/software/figtree/
37. Holm L, Rosenström P. Dali server: conservation mapping in 3D. Nucleic Acids Res 2010;38(suppl 2):W545-W549.
38. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. The protein data bank. Nucleic Acids Res 2000;28:235-242.
39. Mosca R, Schneider TR. RAPIDO: a web server for the alignment of protein structures in the presence of conformational changes. Nucleic Acids Res 2008;36:W42-W46.
40. DeLano WL. PyMOL. DeLano Scientific, San Carlos, CA. 2002;700.
41. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comp Chem 2009;30:2785-2791.
42. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comp Chem 2010;31:455-461.
43. Berendsen HJ, van der Spoel D, van Drunen R. GROMACS: a message passing parallel molecular dynamics implementation. Comput Phys Commun 1995;91:43-56.
44. Kumari I, Chaudhary N, Sandhu P, Ahmed M, Akhter Y. Structural and mechanistic analysis of engineered trichodiene synthase enzymes from Trichoderma harzianum: towards higher catalytic activities empowering sustainable agriculture. J Biomol Struc Dyn 2015;34:1176-89.
45. Laskowski RA, Swindells MB. LigPlot+: Multiple ligand-protein interaction diagrams for drug discovery. J Chem Info Mod 2011;51:2778-2786.
46. Ben Nasr N, Guillemain H, Lagarde N, Zagury JF, Montes M. Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query. J Chem Info Mod 2013;53:293-311.
47. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. UCSF Chimera: a visualization system for exploratory research and analysis. J Compu Chem 2004;25:1605-1612.
48. Atilgan AR, Durrell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 2001;80:505-515.
49. Chunyan X, Tobi D, Bahar I. Computational prediction of allosteric structural changes by a simple mechanical model: application to hemoglobin T to R transition. J Mol Biol 2003;333:153-168.
50. Tijerino A, Cardoza RE, Moraga J, Malmierca MG, Vicente F, et al. Overexpression of the trichodiene synthase gene tri5 increases trichodermin production and antimicrobial activity in Trichoderma brevicompactum. Fung Gen Biol 2011;48:285-296.
51. Starks CM, Back K, Chappell J, Noel JP. Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Sci 1997;277:1815-1820.
52. Lesburg CA, Zhai G, Cane DE, Christianson DW. Crystal structure of pentalenene synthase: Mechanistic insights on terpenoid cyclization reactions in biology. Sci 1997;277:1820-1824.
53. Aaron JA, Lin X, Cane DE, Christianson DW. Structure of epi-Isozizaene synthase from Streptomyces coelicolor A3(2), a Platform for new terpenoid cyclization templates. Biochem 2010;49:1787-1797.
54. Kampranis SC, Ioannidis D, Purvis A, Mahrez W, Ninga E, et al. Rational conversion of substrate and product specificity in a Salvia monoterpene synthase: structural insights into the evolution of terpene synthase function. The Plant Cell 2007;19:1994-2005.
55. Chang TH, Hsieh FL, Ko TP, Teng KH, Liang PH, et al. Structure of a heterotetrameric geranyl Pyrophosphate synthase from mint (Mentha piperita) reveals intersubunit regulation. The Plant Cell 2010;22:454-467.
56. Gennadios HA, Gonzalez V, Costanzo LD, Li A, Yu F, et al. Crystal structure of (+)-δ-cadinene synthase from Gossypium arboreum and evolutionary divergence of metal binding motifs for catalysis. Biochem 2009;48:6175-6183.
57. Sun HY, Ko TP, Kuo CJ, Guo RT, Chou CC, et al. Homodimeric hexaprenyl pyrophosphate synthase from the thermoacidophilic crenarchaeon Sulfolobus solfataricus displays asymmetric subunit structures. J Bacteriol 2005;187:8137-8148.
58. Köksal M, Zimmer I, Schnitzler JP, Christianson DW. Structure of isoprene synthase illuminates the chemical mechanism of teragram atmospheric carbon emission. J Mol Bio 2010;402:363-373.
59. Guo RT, Kuo CJ, Ko TP, Chou CC, Liang PH, et al. A molecular ruler for chain elongation catalyzed by octaprenyl pyrophosphate synthase and its structure-based engineering to produce unprecedented long chain trans-prenyl products. Biochem 2004;43:7678-7686.
60. Wallrapp FH, Pan JJ, Ramamoorthy G, Almonacid DE, Hillerich BS, et al. Prediction of function for the polyprenyl transferase subgroup in the isoprenoid synthase superfamily. PNAS 2013;110:E1196-E1202.
61. Whittington DA, Wise ML, Urbansky M, Coates RM, Croteau RB, et al. Bornyl diphosphate synthase: Structure and strategy for carbocation manipulation by a terpenoid cyclase. PNAS 2002;99:15375-15380.
62. Pandit J, Danley DE, Schulte GJ, Mazzalupo S, Pauly TA, et al. Crystal structure of human squalene synthase. J Biol Chem 2000;275:30610-30617.
63. Köksal M, Jin Y, Coates RM, Croteau R, Christianson DW. Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Nature 2011;469:116-120.
64. Yu F, Li M, Xu C, Sun B, Zhou H, et al. Crystal structure and enantioselectivity of terpene cyclization in SAM-dependent methyltransferase TleD. Biochem J 2016;473:4385-4397.
65. Sasaki D, Fujihashi M, Okuyama N, Kobayashi Y, Noike M, et al. Crystal structure of heterodimeric hexaprenyl diphosphate synthase from Micrococcus luteus BP 26 reveals that the small subunit is directly involved in the product chain length regulation. J Biol Chem 2011;286:3729-3740.
66. Vedula LS, Cane DE, Christianson DW. Role of Arginine-304 in the diphosphate-triggered active site closure mechanism of trichodiene synthase. Biochem 2005;44:12719-12727.
67. Greenhagen B, Chappell J. Molecular scaffolds for chemical wizardry: learning nature's rules for terpene cyclases. PNAS 2001;98:13479-13481.
68. Greenhagen BT, O’Maille PE, Noel JP, Chappell J. Identifying and manipulating structural determinates linking catalytic specificities in terpene synthases. PNAS 2006;103:9826-9831.
69. Kumari I, Ahmed M, Akhter Y. Deciphering the protein translation inhibition and coping mechanism of trichothecene toxin in resistant fungi. The Intl J Biochem Cell Biol 2016a;78:370-376. doi: 10.1016/j.biocel.2016.08.002.
70. Kumari I, Ahmed M, Akhter Y. Multifaceted impact of trichothecene metabolites on plant-microbe interactions and human health. App Microbio Biotech 2016b;100:5759-5771. doi: 10.1007/s00253-016-7599-0.
Figure legends:
Fig. 1 The overall methodology used for comparative sequence and structural analysis of terpene fold containing proteins
This work can be divided into two parts, viz. sequence-based and structure-based analysis. For the comparative sequence-based analysis, protein BLAST (BLASTp) was used to retrieve homologous sequences from diverse taxa of all kingdoms, with trichodiene synthase as the query sequence against the non-redundant database. The catalytic domain was analyzed among these sequences by multiple sequence alignment, and a phylogenetic tree was constructed using MEGA5. The terpene fold containing proteins were obtained using the DALI server and compared structurally using the RAPIDO server. To understand the possible reasons for reactant/product diversity in the terpene fold, contact mapping analysis and volume calculations of the binding pocket were carried out. Phylogenetic analysis of the proteins containing the terpene fold was done using the maximum parsimony method.
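The legend names maximum parsimony as the criterion for the terpene fold tree. As a rough illustration of what that criterion scores, the sketch below applies Fitch's small-parsimony algorithm to a single alignment column on two alternative four-taxon trees. The taxon names, residues and tree shapes are hypothetical and are not taken from the study, and the software settings used by the authors are not reproduced here.

```python
def fitch(tree, states):
    """Fitch small parsimony for one site on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping leaf name -> observed residue at this site
    Returns (candidate ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed
        return common, left_cost + right_cost
    # Children disagree: one substitution is counted at this node
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical taxa and residues at one alignment column (illustration only)
site = {"fungus_A": "D", "fungus_B": "D", "plant_A": "E", "plant_B": "E"}
tree_grouped = (("fungus_A", "fungus_B"), ("plant_A", "plant_B"))
tree_mixed = (("fungus_A", "plant_A"), ("fungus_B", "plant_B"))

score_grouped = fitch(tree_grouped, site)[1]
score_mixed = fitch(tree_mixed, site)[1]
print(score_grouped, score_mixed)
```

Under this criterion the tree that needs fewer changes summed over all sites is preferred; a real analysis would also search over tree topologies rather than compare two fixed ones.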
Fig. 2 MSA demonstrates conserved motifs of the TS enzymes across different genera of fungi
(a) MSA of different TS proteins from fungi showed that the metal binding regions and the catalytic regions which interact with the pyrophosphate moiety are conserved. The pattern of conservation is depicted by the colored WebLogo diagrams. (b) The cartoon structure of the TS protein is shown in cyan blue and the terpene fold is shown in blue. The metal binding motifs are shown in red and the pyrophosphate binding motif is highlighted in green. The metal ions are shown as magenta spheres and pyrophosphate is shown as sticks in the binding pocket of the TS protein.
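Sequence logos summarise conservation as per-column information content. The following is a minimal sketch of that quantity for a protein alignment, computed as log2(20) minus the Shannon entropy of the column. It assumes a 20-letter alphabet, ignores gaps, and omits the small-sample correction that logo tools usually apply. The aligned fragments are hypothetical and are not sequences from the study.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits of one alignment column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Hypothetical aligned fragments (illustration only)
alignment = ["DDSKD", "DDARD", "DDSLD", "DDTKD"]
columns = list(zip(*alignment))
scores = [column_information(col) for col in columns]

for position, score in enumerate(scores, start=1):
    print(position, round(score, 2))
```

Fully conserved columns reach the maximum of log2(20) bits, while variable columns score lower, which is what the letter heights in a logo reflect.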
Fig. 3 Phylogenetic analysis of TS protein in fungi and terpene domain enzymes across the kingdoms of life
(a) Phylogenetic analysis of TS protein showed that it is present across diverse fungal genera. TS proteins of plant growth promoting fungi (PGPF) and pathogenic fungi form separate clusters. Proteins homologous to TS in other fungal genera, excluding the PGPF and pathogenic fungi, contain the same terpene fold but have different enzymatic activities and form a separate cluster. The terpene synthase from the lycophyte lies at the interface of microbes and plants. (b) The catalytic residues reported to be involved in reaction catalysis in TS of Fusarium sporotrichioides are conserved among different organisms, as represented in the MSA. (c) Phylogenetic analysis of TS protein with other enzymes containing the terpene fold also resulted in two clusters based on their structural diversity and catalytic domain. Green indicates the proteins with two or more domains and cyan blue indicates the proteins which carry a single domain. The proteins which do not have the second metal binding domain are indicated by brackets in the tree. Abbreviations used for the proteins: 5-epi-aristolochene synthase (EAS), pentalenene synthase (PS), epi-isozizaene synthase (EIZS), geranyl pyrophosphate synthase (GPPS), (+)-δ-cadinene synthase (δCS), hexaprenyl pyrophosphate synthase (HexPPS), isoprene synthase (ISPS), octaprenyl pyrophosphate synthase (OPPS), polyprenyl synthase (PPS), trichodiene synthase (TS), bornyl pyrophosphate synthase (Bornyl Ppi Synthase), farnesyl pyrophosphate synthase (FPPS), squalene synthase (SS) and taxadiene synthase (TaxS).
Fig. 4 Comparative structural analysis of multi-domain terpene synthase proteins
TS protein is shown in cyan blue and the other proteins are shown in green. The helical region presented in magenta is absent in TS protein. It was observed that the catalytic domain is conserved in all the proteins. However, the metal binding and pyrophosphate interacting residues occur at different positions in these proteins. The additional helical structures in the multi-domain proteins have not shown any direct role in catalytic activity, but there are indications of regulatory roles. For instance, it is reported that in TaxS the N-terminal region of the protein forms intramolecular hydrogen bonds that stabilize this non-catalytic domain. Arginine residues, which are reported to play an important role in substrate catalysis in all TS proteins, are present in this region. Therefore, it may be considered that the N-terminal domain stabilizes the catalysis of pyrophosphate and might help to enhance the catalytic efficiency [63].
Fig. 5 α-α barrel in the N-terminal region of TaxS protein
(a) It is reported that the amino acid residues of the N-terminal region form intramolecular hydrogen bonds that may stabilize the interaction of pyrophosphate [63]. It was observed that the loops of the N-terminal domain are 17 Å away from the catalytic site. The N-terminal region contains flexible loops which may aid the movement of the α-α barrel domain and may consequently have a regulatory role in catalysis. (b) The N-terminal region of the protein is arranged in an α-α barrel like fold. It is reported that arginine residues are present in tandem in this domain. Arginine is reported to play an important role in the catalysis of terpene synthases [63]. Therefore, it will be interesting to study the role of this region in future.
Fig. 6 Normal mode analyses of single and multi-domain terpene fold proteins
(a) NMA of the single domain protein showed that the amino acid residues around the catalytic site are flexible in nature. These may help in the catalysis of the substrate. (b) NMA of multi-domain proteins showed that the flexible regions in the proteins lead to ‘puckering forceps’ like movements, which may increase the compactness of the protein and its catalytic efficiency. (c) The multi-domain protein is shown in cartoon representation. The green coloured region contains the catalytic sites, highlighted in red and blue. Additional domains other than the catalytic domain are highlighted in magenta. The coupled motion of different domains is depicted by blue arrows.
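The legend reports residue flexibility from normal mode analysis but this section does not state which NMA implementation was used. As a stand-in, the sketch below uses a Gaussian network model, one common coarse-grained form of NMA, to show how per-residue flexibility can be read from a contact network. The 7 Å cutoff and the straight-chain coordinates are assumptions chosen for illustration, not values from the study.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuations from a Gaussian network model.

    coords : sequence of (x, y, z) positions in Å, one node per residue
    cutoff : distance in Å below which two nodes are connected (assumed)
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contact = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff matrix: -1 for each contact, contact count on the diagonal
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # The pseudo-inverse skips the zero eigenvalue (uniform shift of all nodes)
    return np.diag(np.linalg.pinv(kirchhoff))


# Hypothetical straight chain of 10 nodes spaced 3.8 Å apart (illustration only)
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
fluct = gnm_fluctuations(coords)
print(np.round(fluct, 3))
```

In this toy chain the loosely connected ends fluctuate more than the middle, which mirrors the general expectation that sparsely packed loops and termini appear as flexible regions in such analyses.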
Tables
Table 1 Terpene fold catalyses similar reactions to yield diverse products
Sr. No. | Enzyme | Reactant (size) | Product (size)
1. | Trichodiene synthase | Farnesyl pyrophosphate (C15) | Trichodiene (C15)
2. | Squalene synthase | 2 Farnesyl pyrophosphate (C15) | Squalene (C30)
3. | Pentalenene synthase | Farnesyl pyrophosphate (C15) | Pentalenene (C15)
4. | Geranyl pyrophosphate synthase | Dimethylallyl diphosphate (C5) and Isopentenyl pyrophosphate (C5) | Geranyl pyrophosphate (C10)
5. | Octaprenyl pyrophosphate synthase | Farnesyl diphosphate (C15) and Isopentenyl pyrophosphate (C5) | Octaprenyl pyrophosphate (C40)
6. | Hexaprenyl pyrophosphate synthase | Farnesyl diphosphate (C15) and Isopentenyl pyrophosphate (C5) | Hexaprenyl pyrophosphate (C30)
7. | Polyprenyl synthase | Farnesyl diphosphate (C15) and Isopentenyl pyrophosphate (C5) | Decaprenyl diphosphate (C40)
8. | Epi-isozizaene synthase | Farnesyl pyrophosphate (C15) | Epi-isozizaene (C15)
9. | 5-Epi-aristolochene synthase | Farnesyl pyrophosphate (C15) | 5-Epi-aristolochene (C15)
10. | 1,8-cineole synthase | Geranyl diphosphate (C10) | 1,8-cineole (C10)
11. | δ-Cadinene synthase | Farnesyl pyrophosphate (C15) | δ-Cadinene (C15)
12. | Isoprene synthase | Dimethylallyl pyrophosphate (C5) | Isoprene (C5)
13. | Bornyl pyrophosphate synthase | Geranyl pyrophosphate (C10) | Bornyl pyrophosphate (C10)
14. | Taxadiene synthase | Geranylgeranyl diphosphate (C20) | Taxadiene (C20)
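For the chain-elongating enzymes in Table 1, the product size follows from adding C5 isopentenyl units to the allylic starter. The short sketch below checks that arithmetic for three of the listed rows, using the carbon counts as they appear in the table. The helper name is hypothetical and is introduced only for this illustration.

```python
C5 = 5  # carbons contributed by one isopentenyl pyrophosphate unit


def ipp_units_needed(starter_carbons, product_carbons):
    """Number of C5 additions that take the starter to the product size."""
    extra = product_carbons - starter_carbons
    if extra < 0 or extra % C5 != 0:
        raise ValueError("product size is not starter + n x C5")
    return extra // C5


# (enzyme, starter size, product size) as listed in Table 1
elongation_rows = [
    ("Geranyl pyrophosphate synthase", 5, 10),
    ("Hexaprenyl pyrophosphate synthase", 15, 30),
    ("Octaprenyl pyrophosphate synthase", 15, 40),
]
units = {name: ipp_units_needed(s, p) for name, s, p in elongation_rows}

for name, n in units.items():
    print(name, n)
```

The cyclases in the table, by contrast, keep the carbon count of their substrate, so the same helper returns zero additions for rows such as trichodiene synthase (C15 to C15).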
Table 2 Volume of the substrate binding pocket of the terpene fold enzymes
Sr. No. | Protein | Volume (Å³)
1. | Trichodiene synthase | 1175
2. | Epi-isozizaene synthase | 527
3. | Geranyl pyrophosphate synthase | 973
4. | δ-Cadinene synthase | 1464
5. | 1,8-cineole synthase | 857
6. | 5-Epi-aristolochene synthase | 1387
7. | Isoprene synthase | 792
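One way to read Table 2 alongside Table 1 is to group the pocket volumes by the carbon count of the substrate. The sketch below does this for the enzymes with a single listed substrate; geranyl pyrophosphate synthase is left out because it binds two C5 substrates. The grouping and the spread ratio are illustrative choices, not an analysis reported by the authors.

```python
from collections import defaultdict

# (protein, pocket volume in Å^3 from Table 2, substrate carbons from Table 1)
rows = [
    ("Trichodiene synthase", 1175, 15),
    ("Epi-isozizaene synthase", 527, 15),
    ("δ-Cadinene synthase", 1464, 15),
    ("5-Epi-aristolochene synthase", 1387, 15),
    ("1,8-cineole synthase", 857, 10),
    ("Isoprene synthase", 792, 5),
]

by_size = defaultdict(list)
for name, volume, carbons in rows:
    by_size[carbons].append(volume)

# Smallest and largest pocket volume for each substrate size
summary = {carbons: (min(v), max(v)) for carbons, v in by_size.items()}
c15_min, c15_max = summary[15]
c15_spread = c15_max / c15_min

for carbons in sorted(summary):
    print(f"C{carbons}: {summary[carbons]}")
print(round(c15_spread, 2))
```

Among the four C15-utilising enzymes the listed volumes differ by more than twofold, so pocket volume by itself does not track substrate size in these data.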
Fig1 [Flowchart. Sequence based: protein BLAST of trichodiene synthase with NR-sequence database; conservation pattern of the catalytic site across different organisms; phylogenetic tree of trichodiene synthase among different organisms. Structure based: terpene fold identification in different proteins using DALI server; comparative structural analysis using RAPIDO server; binding pocket analysis to explore the reaction diversity of terpene fold; phylogenetic tree on the basis of terpene fold.]
Fig2 [Panels (a) and (b); labels: Mg2+, Pyrophosphate]
Fig3 [Panels (a), (b) and (c); label: Fungi]
Fig4 [Structures shown: 5-epi-aristolochene synthase, 1,8-cineole synthase, (+)-δ-cadinene synthase, bornyl pyrophosphate synthase, isoprene synthase, taxadiene synthase]
Fig5 [Panels (a) and (b); distance labels: 17 Å, 17.3 Å]
Fig6 [Panels (a), (b) and (c); labelled residue ranges: 40-52, 102-124, 169-175, 247-252, 311-320, 504-534, 564-571, 653-660, 667-680, 726-733, 746-750]
Highlights
1. Phylogenetic analysis of terpene fold showed evolution at domain levels
2. Terpene fold sequence from lycophyte Selaginella is between microbes and plants
3. Amino acid side chains in catalytic pocket determine substrate/product diversity
4. Multi-domain enzymes contain additional α-α barrel which may regulate the catalysis
5. ‘Puckering forceps’ kind of regulatory motion was observed in multi-domain enzymes
Author’s agreement and ethical statement
All the authors have jointly worked on the manuscript and agree to its publication. No part of the manuscript has been published previously or is currently under consideration for publication. The acknowledgements contain complete information on the funding we received, and we have no financial conflicts of interest to declare. There are no ethical issues involved in this work.
On behalf of all authors,
Yusuf Akhter, PhD
Corresponding author