Accepted Manuscript

Evolution of catalytic microenvironment governs substrate and product diversity in trichodiene synthase and other terpene fold enzymes

Indu Kumari, Mushtaq Ahmed, Yusuf Akhter

PII: S0300-9084(17)30245-6
DOI: 10.1016/j.biochi.2017.10.003
Reference: BIOCHI 5284
To appear in: Biochimie
Received Date: 10 July 2017
Accepted Date: 5 October 2017

Please cite this article as: I. Kumari, M. Ahmed, Y. Akhter, Evolution of catalytic microenvironment governs substrate and product diversity in trichodiene synthase and other terpene fold enzymes, Biochimie (2017), doi: 10.1016/j.biochi.2017.10.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Evolution of catalytic microenvironment governs substrate and product diversity in trichodiene synthase and other terpene fold enzymes

Indu Kumari¹, Mushtaq Ahmed¹ and Yusuf Akhter²*

¹ School of Earth and Environmental Sciences, Central University of Himachal Pradesh, Kangra, Himachal Pradesh-176206, India

² School of Life Sciences, Central University of Himachal Pradesh, Kangra, Himachal Pradesh-176206, India

*Correspondence:

E-mails: [email protected], [email protected]

Running title: Substrate and product diversity in terpene fold enzymes

Abstract

Trichodiene synthase, a terpene fold enzyme, catalyzes the first reaction in the biosynthesis of trichodermin, an economically important secondary metabolite. Sequence search analysis revealed that proteins containing the terpene fold are present in bacteria, fungi and plants. The terpene fold protein from Selaginella moellendorffii, a lycophyte, appeared at the interface of the microbes and plants on the evolutionary scale. The amino acid residues present around the catalytic pocket determine the size of the substrate as well as the product molecules. It has been observed that the overall molecular evolution of the catalytic pockets dictates the choice of substrates/products of the proteins. It was further observed that the N-terminus of multi-domain terpene fold proteins may assist in the interactions with the pyrophosphate part of the substrates. The phylogenetic analysis of these proteins revealed that the enzymes are clustered into groups based on the domains present in addition to the catalytic domains. We have also observed inter-domain ‘puckering forceps’ type motions in the multi-domain proteins using normal mode analyses, which were further correlated with their functions. The evolutionary clustering of these proteins was also influenced by the presence/absence of cofactor interacting motifs. These results may be used to modify/enhance the functions of these enzymes using protein engineering methods.

Abbreviations: EAS, 5-epi-aristolochene synthase; PS, Pentalenene synthase; EIZS, Epi-isozizaene synthase; CS, 1,8-cineole synthase; DMAPP, Dimethylallyl diphosphate; FPP, Farnesyl pyrophosphate; GPP, Geranyl pyrophosphate; GPPS, Geranyl pyrophosphate synthase; δCS, (+)-δ-cadinene synthase; HexPPS, Hexaprenyl pyrophosphate synthase; IPP, Isopentenyl pyrophosphate; ISPS, Isoprene synthase; OPPS, Octaprenyl pyrophosphate synthase; PPS, Polyprenyl synthase; BPPS, Bornyl pyrophosphate synthase; SS, Squalene synthase; TaxS, Taxadiene synthase; TS, Trichodiene synthase

Keywords: Terpene fold; Trichodiene synthase; Catalytic site evolution; Single domain proteins; Multi-domain proteins

1. Introduction

All terpene molecules are derived from linear C5 allyl precursors, i.e. isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). Rearrangement and cyclization of the linear reactants give rise to structurally diverse terpene molecules [1-4]. Plant species have been reported to produce several terpene synthases [5]; however, only a few microbial enzymes have been functionally characterized [6-13]. Genomic studies have provided the fundamental information that led to the discovery of several sesquiterpene synthases from actinomycetes and cyanobacteria [14-17], though only three of them, namely trichodiene synthase (TS), aristolochene synthase and presilphiperfolan-8-ol synthase [4, 7, 9, 13, 18-21], have been characterized so far. TS belongs to the family of terpene synthases, which are pivotal enzymes involved in the biosynthesis of trichothecenes secreted by different fungal genera [22]. Many terpene compounds are also reported to exhibit various biological properties such as anti-microbial activities [23]. Terpenes obtained from liverworts have also been reported to deter harmful insects [24, 25]. It has been shown that the diterpene synthase genes in the lycophyte Selaginella moellendorffii are phylogenetically closely related to other plant homologues. The mono- and sesquiterpene synthase genes of plants appeared to be evolutionarily related to microbial terpene synthases [25]. Similarly, recent studies on the mono- and sesquiterpene synthase genes of Marchantia polymorpha showed that these genes are distantly related to terpene synthase genes in fungi and bacteria but are unrelated to those previously described from land plants. However, the functional diterpene synthase genes present in M. polymorpha are closely related to other terpene synthases found in vascular plants [25, 26]. These genes might have functionally diversified to mono- and sesquiterpene synthases during the course of evolution [25, 26].

Studies have shown the diversity among terpene synthases in different species of bacteria, fungi, lower plants and higher plants [2, 3, 5]. However, no report so far has completely deciphered the TS-related enzymes in these different domains of life at the level of protein structure. The present investigation is an attempt in this direction. This protein structure based study shows that TS-related terpene fold carrying proteins are present in bacteria, fungi, lower plants and higher plants. Further, terpene fold containing proteins are required for the synthesis of diverse terpenoid derivatives, which are involved in the biosynthesis of hormones, vitamins, pigments (carotenoids), quinones, membrane lipids, essential oils and antioxidants, and which give fragrance, flavor and medicinal properties to the diverse compounds obtained from different organisms, including plants and microbes [5]. The products of terpene synthase enzymes have been extensively explored as prospective targets for exploitation in advanced biofuel production, agricultural chemicals, medicines, industrial chemicals, flavors and fragrances [27, 28]. Therefore, a comparative structural analysis was carried out to characterize the microenvironment of the terpene fold in the catalytic site of the protein, which leads to the structural and functional diversity of these compounds. It was observed that different regions of the protein are conserved and that the functional domains which specifically contribute to the product diversity in several terpene synthase enzymes are present in diverse organisms. In addition to the conserved catalytic motifs of the protein, the amino acid residues present in the binding pocket of the enzyme also play an important role in the reaction. The orientation of these residues determines the size of the substrate and product. We have analyzed inter-domain motions and correlated fluctuations of multi-domain proteins in the context of their catalytic functions. We have also observed gene fusion and diversity generation during the course of molecular evolution of the terpene fold proteins.

2. Methods

The overall methodology used to study the comparative sequence and structural differences in the terpene fold containing proteins is given in Fig. 1. The details are given in the following subsections.

2.1. Sequence retrieval

The protein sequences for TS enzymes were obtained from the NCBI protein database using the BLAST algorithm in blastp mode with default parameters against the non-redundant database, so that homologous sequences could be obtained from the most diverse organisms. The 16 protein sequences obtained were selected as broad representatives of the groups of organisms, namely fungi (beneficial and pathogenic), bacteria, and lower and higher plants, and were analyzed further.

2.2. Multiple sequence alignments and motif analysis

Clustal X and MultAlin were used to obtain the best possible alignment between the protein sequences [29, 30]. The pairwise alignment was carried out by the conventional dynamic programming method in MultAlin, which is both accurate and precise. The secondary structures of the proteins were visualized along with the conserved residues and the multiple sequence alignment (MSA) using ESPript, Clustal X and MultAlin [29-31]. A Gibbs sampling algorithm was used to obtain the sequence logos of the two motifs in TS proteins [32]. The WebLogo tool was used to construct the sequence consensus, which indicates the conserved pattern of amino acids at specific positions [33].
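The conserved pattern that a sequence logo displays can be quantified column by column. The sketch below computes the information content, in bits, of a single alignment column from its residue frequencies. It assumes a 20-letter amino acid alphabet and leaves out the small-sample correction that logo tools usually apply; the function name and the toy columns are illustrative and are not taken from the TS alignment.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Information = maximum possible entropy minus observed entropy.
    return math.log2(alphabet_size) - entropy

# Toy columns (hypothetical): fully conserved versus split between D and E.
conserved = column_information("DDDDDD")
mixed = column_information("DDDEEE")
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.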

2.3. Phylogenetic analysis

The sequences were aligned and the phylogenetic trees were constructed with 1000 bootstrap replicates using MEGA5 [34]. The maximum likelihood method was used to construct the phylogenetic tree of TS proteins, as it is suited to the analysis of sequences of diverse origins. The maximum parsimony method was used to construct the phylogenetic tree of terpene fold containing proteins, which have similar catalytic domains with comparatively higher sequence homology than the former set of sequences [34, 35]. The phylogenetic trees were viewed using the program FigTree [36].
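Bootstrap support in phylogenetics rests on resampling the alignment columns with replacement and rebuilding the tree for each replicate. The sketch below shows only the resampling step, under the assumption of a simple list-of-strings alignment; the function name and the toy alignment are hypothetical, and the tree building itself (done in MEGA5 in this study) is not reproduced.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

# Hypothetical three-sequence toy alignment.
toy_alignment = ["DDVLFD", "DDALYD", "DDVMFE"]
rng = random.Random(0)
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```

Each replicate keeps the alignment dimensions and contains only columns present in the original alignment; the fraction of replicate trees that recover a given clade is then reported as its bootstrap value.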

2.4. Comparative structural analysis of terpene synthase enzymes

The TS protein was used as a query on the DALI server, which returned 14 terpene fold containing proteins [37]. Files containing the coordinates of these proteins were downloaded from the PDB [38]. Comparative tertiary structure analysis was carried out using the rapid alignment of proteins in terms of domains (RAPIDO) server. RAPIDO aligns the structures of diverse protein molecules and identifies the conserved domains, which may be present at different positions in the analyzed sequence. The variations among amino acid residues are accounted for by weighting functions based on the refined B-values [39]. Result files were analyzed with PyMOL [40]. Substrates were docked into the proteins that lacked a ligand using AutoDock Vina-2.0 and MGLTools [41, 42]. The complexes obtained were energy minimized with GROMACS to achieve the most stable conformation [43, 44]. The contact mapping analysis was carried out using LigPlot+ [45]. The binding pocket volumes around the ligands were defined within a radius of 10 Å so as to include the residues involved in non-bonded interactions. This analysis was carried out using POVME_2_0_1 [44, 46]. The volumes of the substrates and products were analyzed using the ‘measure volume and area’ tool of UCSF Chimera [47].
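The 10 Å criterion used to define the pocket can be illustrated with a plain distance filter. The sketch below selects residues that have at least one atom within a cutoff of any ligand atom. It is a minimal illustration, not the procedure implemented in POVME or LigPlot+; the data layout, function name, residue labels and coordinates are all hypothetical.

```python
import math

def residues_within(protein_atoms, ligand_coords, cutoff=10.0):
    """Return labels of residues with at least one atom within `cutoff` (Å) of the ligand."""
    selected = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add(residue)
    return selected

# Hypothetical coordinates (Å); labels are for illustration only.
protein_atoms = [
    ("ARG1", (3.0, 4.0, 0.0)),    # 5 Å from the ligand atom
    ("PHE2", (0.0, 0.0, 10.0)),   # exactly 10 Å, still inside the cutoff
    ("GLU3", (12.0, 0.0, 0.0)),   # 12 Å, outside the cutoff
]
ligand_coords = [(0.0, 0.0, 0.0)]
pocket = residues_within(protein_atoms, ligand_coords)
```

With real structures the atom list would come from a PDB parser; the filter itself stays the same.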

2.5. Normal mode analysis of single and multi-domain proteins

Global motions of the flexible regions of the single and multi-domain proteins were analysed using normal mode analysis (NMA). NMA was carried out using the Anisotropic Network Model web server, which is based on the elastic network model, with default settings [48, 49]. All atoms of the proteins were taken for the analysis of a total of 10 modes, with a cut-off value of 10 Å for the mode calculation. The elastic network model represents the system at the residue level, so the macromolecule is represented as a network (graph). The normal mode calculation is based on the harmonic approximation of the potential energy function around a minimum energy conformation. TS and TaxS were analyzed using NMA as representatives of single domain and multi-domain proteins, respectively.
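The elastic network idea can be made concrete by building the Hessian of an anisotropic network model: every pair of nodes closer than the cutoff is joined by a harmonic spring, and the normal modes are the eigenvectors of the resulting matrix. The sketch below follows the standard textbook formulation with a uniform spring constant and is not the code of the web server used here; the function name, the spring constant `gamma` and the four toy coordinates are assumptions made for illustration.

```python
def anm_hessian(coords, cutoff=10.0, gamma=1.0):
    """Build the 3N x 3N anisotropic network model Hessian as nested lists."""
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [coords[j][k] - coords[i][k] for k in range(3)]
            dist_sq = sum(d * d for d in delta)
            if dist_sq == 0.0 or dist_sq > cutoff * cutoff:
                continue  # no spring between coincident or distant nodes
            for a in range(3):
                for b in range(3):
                    value = -gamma * delta[a] * delta[b] / dist_sq
                    # Off-diagonal blocks for the pair (i, j) and (j, i).
                    hessian[3 * i + a][3 * j + b] += value
                    hessian[3 * j + a][3 * i + b] += value
                    # Diagonal blocks are minus the sum of the off-diagonal blocks.
                    hessian[3 * i + a][3 * i + b] -= value
                    hessian[3 * j + a][3 * j + b] -= value
    return hessian

# Hypothetical node positions (Å); the last node lies beyond the cutoff.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (20.0, 20.0, 20.0)]
hessian = anm_hessian(toy_coords, cutoff=10.0)
```

The matrix is symmetric and each row of blocks sums to zero, which reflects the fact that a rigid translation of the whole structure costs no energy. Diagonalizing it with a numerical library would give the modes; the lowest non-zero modes describe the global motions discussed in the text.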

3. Results

3.1. Trichodiene synthase and other terpene fold proteins across different kingdoms of life

The metal binding and catalytic domains of the TS protein are well characterized in Fusarium sporotrichioides [50]. TS protein homologues were obtained from different organisms using blastp. It was observed that bacteria, lower plants and higher plants have proteins that show sequence similarity with the TS protein. MSA across various fungal genera revealed that the catalytic region (metal binding motif and pyrophosphate binding motif) of the TS protein is conserved (Fig. 3). Secondary structure analysis revealed that the protein sequences obtained from bacteria and plants have secondary structures similar to those observed across different genera of fungi. Further, phylogenetic analysis showed that terpene fold carrying proteins similar to the TS protein were clustered into fungal, bacterial, lycophyte and other plant groups based on their similarity and co-evolution (Fig. 4). The S. moellendorffii terpene fold protein clustered between those of the bacteria and the plants, which suggests that the protein present in this lycophyte might represent an evolutionary intermediate form between the microbes and the plants.

3.2. Distribution of substrate diversity of terpene fold

The terpene fold present in the TS protein, which possesses the metal binding domain and the catalytic site, is reported in all the enzymes involved in the biosynthesis of terpene molecules in different organisms. As retrieved from the DALI server, these organisms range from bacteria and fungi to plants and humans. The enzymes are 5-epi-aristolochene synthase (EAS) from tobacco (PDB ID: 5IK0) [51], pentalenene synthase (PS) from Streptomyces UC5319 (PDB ID: 1PS1) [52], epi-isozizaene synthase (EIZS) from Streptomyces coelicolor (PDB ID: 3LG5) [53], 1,8-cineole synthase (CS) from Salvia (PDB ID: 2J5C) [54], geranyl pyrophosphate synthase (GPPS) from Mentha piperita (PDB ID: 3KRF) [55], (+)-δ-cadinene synthase (δCS) from Gossypium arboreum (PDB ID: 3G4F) [56], hexaprenyl pyrophosphate synthase (HexPPS) from Sulfolobus solfataricus (PDB ID: 2AZJ) [57], isoprene synthase (ISPS) from grey poplar leaves (PDB ID: 3N0F) [58], octaprenyl pyrophosphate synthase (OPPS) from Thermus thermophilus (PDB ID: 1WLO) [59], polyprenyl synthase (PPS) from Caulobacter crescentus (PDB ID: 3OYR) [60], bornyl pyrophosphate synthase (BPPS) from Salvia officinalis (PDB ID: 1N21) [61], squalene synthase (SS) from Homo sapiens (PDB ID: 1EZF) [62] and taxadiene synthase (TaxS) from Taxus brevifolia (PDB ID: 3P5R) [63]. Although these proteins differ in the length of their amino acid sequences, the amino acid residues involved in catalysis are reported to be conserved [63, 64]. The metal binding domains DDXX(X)D/E and “NSE/DTE” [N/DDXXS/TXX(K/R)E] were found in many of the terpene synthases. However, some enzymes lack one of these two domains. The NSE/DTE domain was not observed in δCS, CS, OPPS, PPS and HexPPS, nor was it found in TaxS. The metal binding domain DD(X)nD is present in GPPS; however, it is not involved in the catalytic activity of the enzyme [55]. It was observed that the amino acid residues interacting with the pyrophosphate moiety, which lead to the formation of the product, were of the same chemical nature. Arginine, phenylalanine, tyrosine, valine, tryptophan and isoleucine were the commonly observed amino acid residues in the catalytic sites of these enzymes. The pattern of amino acid residues present in the binding pocket provides the molecular basis for the ability of these enzymes to produce products of diverse chain lengths. Apart from the residues mentioned above, a few more amino acid residues are also known to play an important role in catalysis. The comparative analysis of TS with GPPS, EIZS, HexPPS, PS, OPPS and SS revealed similar core structures except for some additional folds (Supplementary Fig. 2). However, EAS, CS, δCS, ISPS and TaxS showed an extended structural fold that might assist these enzymes in carrying out their additionally evolved functions (Fig. 4).
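The two metal binding motifs named above can be located in a protein sequence with regular expressions. The sketch below transcribes the patterns as they are written in the text, reading X as any residue and a slash as an alternative at that position; this transcription is an interpretation, and the test fragment is a made-up sequence rather than a real terpene synthase.

```python
import re

# Regex transcriptions of the motif patterns as written in the text (X = any residue).
MOTIFS = {
    "DDXX(X)D/E": re.compile(r"DD.{2,3}[DE]"),
    "NSE/DTE": re.compile(r"[ND]D..[ST]..[KR]E"),
}

def find_motifs(sequence):
    """Return {motif name: [(1-based start, matched segment), ...]}."""
    sequence = sequence.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical fragment, not a real terpene synthase sequence.
hits = find_motifs("MKTADDVLFDAGGWPNDLMSFYKEQQ")
```

An enzyme that lacks one of the motifs simply returns an empty list for that entry, which is the kind of presence/absence pattern compared across the enzymes in this section.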

3.3. Evolutionary relationship of different enzymes containing the terpene molecular fold

The phylogenetic analysis showed that the proteins are closely related. However, their sequences cluster according to their structure, function and the presence/absence of additional functional domains other than the core catalytic domain. The enzymes involved in the cyclization of isoprenyl compounds, i.e. OPPS, PPS, δCS, CS and HexPPS, tended to cluster in one group. Bacterial and fungal sesquiterpene cyclases are single-domain enzymes that adopt the class I terpene synthase fold, and such enzymes, namely PS, SS, FPPS, EIZS and TS, were clustered in one group. The rest of the proteins clustered into a separate group containing proteins with two or three domains, namely TaxS, CS, EAS, ISPS and BPPS. Comparison of TaxS with other terpene cyclases revealed that the cyclase architecture is modular in nature and may consist of one, two or three domains.

3.4. Single domain enzymes

It was observed that the terpene synthases which have a single domain for enzyme catalysis showed structural similarity with homologous enzymes (Supplementary Fig. 1). These include PS, SS, FPPS, EIZS and TS. The structural comparison is elaborated in the following subsections.

3.4.1. Squalene synthase

The comparative structural analysis of the TS and SS proteins showed that they share similarities at the catalytic site. Arginine is the common amino acid residue involved in catalysis in both enzymes. The amino acid residues of SS which interact with the pyrophosphate moiety are F54, V69, Y73, R77, M150, M154, Y171, V175, L183, L211, Q212, R218, Q219, Y276, F288, C289 and P292 (Supplementary Fig. 2). However, in the TS protein, arginine and phenylalanine are the two amino acid residues involved in the interactions. The amino acid residues present within a radius of 10 Å of the substrate binding site of SS are alanine, leucine, glycine and valine, whose smaller side chains may provide enough space to accommodate the two molecules of farnesyl pyrophosphate (FPP) that are used during the reaction (Supplementary Fig. 3). In the TS protein, however, arginine, phenylalanine and glutamate, which have longer side chains, were observed in the same pocket, making it smaller than that of the SS enzyme and leading to the cyclization of only a single molecule of FPP.

3.4.2. Polyprenyl transferases

HexPPS, OPPS and PPS are enzymes that belong to the polyprenyl transferases. The substrates of all three proteins are the same, while the end products are different (Table 1). However, there are extended α-helices in TS, which were observed near the first metal binding domain and at the N-terminal region (Supplementary Fig. 1). Since these helices surround the active site, they might assist in the pyrophosphate catalysis. The presence of amino acid residues with smaller side chains, such as A76, before the DDXXD motif might help the enzyme in catalyzing the substrates (Supplementary Fig. 2). The amino acid residues that are important for the catalysis of pyrophosphate include A76, P114, R116, W136 and L164 (Supplementary Fig. 2). The amino acid residues that interact with pyrophosphate in OPPS are A76, R91, K93, D205 and D208 (Supplementary Fig. 2), whereas those interacting with pyrophosphate in PPS were K56, P60, H88, L103, R105, L224 and K236. This reflects that amino acids of different types are present in the binding pockets. It has already been reported that site-directed mutagenesis of alanine to tyrosine may alter the length of the product molecules in OPPS and HexPPS [58]. The contact mapping analysis showed that the substrate FPP interacts with alanine, threonine, aspartate and histidine in the binding pocket, whereas arginine, glutamine and phenylalanine line the binding site of isopentenyl pyrophosphate (IPP) in the same pocket (Supplementary Fig. 4). Since HexPPS, OPPS and PPS accommodate both substrates in the same binding pocket, it should be large enough to accommodate both of them simultaneously. The cleft volume of HexPPS is reported to be 1400 Å³ [65]. The binding pocket volumes of OPPS and PPS were observed to be 1779 Å³ and 2004 Å³, respectively. All three enzymes thus have binding pocket volumes larger than that of the TS enzyme (1175 Å³). This observation correlates with the sizes of the products of these enzymes (C15 for TS; C30 for HexPPS; C40 for OPPS; and C40 or C50 for PPS). The larger pockets may facilitate catalysis and the swift release of these product molecules from the respective catalytic pockets.
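The relationship between pocket volume and product chain length stated above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this subsection; the PPS product is given as C40 or C50, so the lower value is used here as an assumption, and the HexPPS figure is the literature cleft volume rather than a value from the same calculation. The function and variable names are illustrative.

```python
# Values as stated in the text: (product carbons, pocket volume in Å^3).
# PPS is listed as C40 or C50; the lower bound is used here as an assumption.
POCKETS = {
    "TS": (15, 1175.0),
    "HexPPS": (30, 1400.0),   # reported cleft volume [65]
    "OPPS": (40, 1779.0),
    "PPS": (40, 2004.0),
}

def relative_volumes(pockets, reference="TS"):
    """Pocket volume of each enzyme as a multiple of the reference enzyme."""
    ref_volume = pockets[reference][1]
    return {name: round(vol / ref_volume, 2) for name, (_, vol) in pockets.items()}

def volumes_follow_product_size(pockets):
    """True if pocket volume never decreases as product carbon count increases."""
    ordered = sorted(pockets.values())  # sorts by carbons, then by volume for ties
    volumes = [vol for _, vol in ordered]
    return all(a <= b for a, b in zip(volumes, volumes[1:]))

ratios = relative_volumes(POCKETS)
monotonic = volumes_follow_product_size(POCKETS)
```

On these four values the pockets are roughly 1.2, 1.5 and 1.7 times the TS pocket, and the ordering by volume matches the ordering by product chain length.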

3.4.3. Pentalenene synthase

The catalysis requires a hydrophobic environment in the binding pockets of both proteins, TS and PS. However, the amino acid residues present in the binding pockets vary. The aromatic amino acid residues F57, F76, F77 and W308 and the hydrophobic residues L53, V79, T182 and V301 line the binding cavity of PS. Arginine, which is positively charged and may stabilize the pyrophosphate group of the substrate, is the common residue in both proteins. The amino acid residues of PS which showed non-bonded interactions with pyrophosphate are R44, L53, L82, D117, R173, V179, K225, R230 and H309 (Supplementary Fig. 5). The comparative binding pocket analysis of PS, TS and SS showed that FPP binds at different positions in these enzymes; in TS, for example, it showed interactions with D100, N225 and R304 (Supplementary Fig. 5). When FPP was docked at the site homologous to that of the PS protein and the resulting complex was energy minimized, it showed hydrogen bond interactions with residues D345, E423 and E497; however, the actual binding site reported for PS is different, as discussed above (Supplementary Fig. 2 and 5). This stereospecific, exclusive binding of FPP in the PS, SS and TS binding pockets leads to the formation of different types of product molecules (Table 1).

3.4.4. Geranyl pyrophosphate synthase

GPPS has a functional large subunit (chains A and D) and a small subunit (chains B and C), which may have a role in catalysis. The large subunit (chains A and D) shows structural resemblance to TS, which is a homodimer. GPPS converts dimethylallyl diphosphate (DMAPP) (C5) and IPP (C5) into geranyl pyrophosphate (GPP), whereas the TS protein cyclizes FPP (C15) into trichodiene (C15). The different substrates and products are accommodated by changes in the local environment of the binding pockets (Supplementary Fig. 5). Pyrophosphate shows hydrogen bond interactions with C87, M88, S107, V160, C161, R293, D294 and N295 in GPPS. In TS, however, fewer residues were observed to interact with pyrophosphate, namely R182, K232, R303, F304 and R305 (Supplementary Fig. 5). It was observed that the amino acid residues phenylalanine and glutamine (with larger side chains), which showed non-bonded interactions, may help GPPS to accommodate two substrates of smaller size (Supplementary Fig. 6). Further, binding pocket volume analysis showed that the pocket of GPPS is 973 Å³, while that of TS is 1175 Å³ (Table 2). It is inferred that the binding pocket volume is smaller in GPPS in order to accommodate the smaller substrate molecules. This provides evidence that both enzymes have the same molecular fold but form product molecules of different sizes (Table 1).

3.4.5. Epi-isozizaene synthase

The structural comparison of EIZS with TS showed similarity around the binding pocket. It was reported earlier that the negatively charged D99 is responsible for metal coordination in both EIZS and TS [9, 53]. The comparative structural alignment revealed that the amino acid residues present a similar pattern in the three-dimensional structures in both cases (Supplementary Fig. 5). The interactions of the pyrophosphate moiety with the amino acid residues of the binding pocket lead to the formation of the enzyme-specific product. The amino acid residues interacting with pyrophosphate are R194, F196, I249, N240, S244, K247, R338 and Y339 (Supplementary Fig. 5). Comparative binding pocket analysis showed that the interacting amino acid residues, namely phenylalanine and arginine, were oriented in closer proximity in the binding pocket of EIZS than in that of the TS protein, which may lead to the formation of a different end product (Supplementary Fig. 3; Supplementary Fig. 6) (Table 1). This was further supported by the binding pocket volume calculation, which showed that the binding pocket of EIZS (527 Å³) is much more compact than that of TS (1175 Å³) (Table 2). This stark difference in the size of the binding pockets leads to the formation of different products by these enzymes.

3.5. Multi-domain proteins

Many of the terpene fold proteins have been reported to carry multiple domains [65]. However, these additional domains are not involved in the classical catalytic activity. The comparative structural analysis of the TS protein with these multi-domain proteins was carried out (Fig. 4). The results for the multi-domain proteins, namely δCS, CS, EAS, ISPS, BPPS and TaxS, are discussed in the subsections below.

3.5.1. δ-Cadinene synthase

δCS is a terpene cyclase [56]. The metal binding motif NSE/DTE is absent in δCS; however, it contains another aspartate-rich sequence which interacts with the metal ion. The pyrophosphate present in δCS showed hydrogen bond interactions with some of the amino acid residues, namely Y382, E385, R448 and R270 (Supplementary Fig. 5). It was observed that the δCS binding pocket is lined by valine and leucine, which support the interaction of the enzyme with the substrate molecule (Supplementary Fig. 7). The products of the two reactions showed a relatively small difference in their volumes. However, the binding pocket volume of δCS (1464 Å³) is larger than that of the TS protein (1175 Å³) (Table 2). Therefore, we observed a reasonable difference in the binding pocket volumes of these two enzymes, which may lead to the formation of two different end products.

3.5.2. 1,8-Cineole synthase

It is reported that the amino acid residues W317, I337, T342, Y420, S445, I451, L485 and Y564 are conserved in both enzymes and may help to stabilize the carbocationic intermediates of the reaction [9, 51, 52]. Amino acid residues of a similar nature have been reported to take part in catalysis in the TS and CS proteins [9, 51, 52]. Other amino acid residues of CS were observed to form hydrogen bonds with the substrate. These include I337, S445, I451, T342, T343 and L485, which showed bonded interactions with the pyrophosphate moiety (Supplementary Fig. 8). The amino acid residues with longer side chains, namely leucine and tryptophan, which are present in the binding pocket but showed no hydrogen bond formation, are known to help the substrate fit in the binding pocket, leading to the formation of 1,8-cineole (Supplementary Fig. 7). It was observed that the volume of cineole (154.6 Å³) is less than that of trichodiene (200 Å³), the product of the TS protein. This correlated with the comparative binding pocket volume analysis, which showed that the binding pocket volume of CS (857 Å³) is less than that of the TS protein (1175 Å³) (Table 2). This might be the reason for the formation of different products by these enzymes.

3.5.3. 5-Epi-Aristolochene synthase

Residues 264-266 and 521-534, which are located in loops of EAS, were observed to contribute to catalysis of the substrate, since they lie near the catalytic site of the protein (Supplementary Fig. 8). In TS, arginine and tyrosine were reported to be involved in the catalysis of FPP [67]. In our contact mapping analysis of EAS, however, a different set of residues was observed to form hydrogen bonds with the pyrophosphate: G526, T401, T402, T403, T445, T447, Y527 and W273. Other amino acid residues are also present in the binding pocket, and their stereoselectivity leads to the formation of different products (trichodiene in TS and epi-aristolochene in EAS). The contact mapping analysis revealed that tryptophan and tyrosine residues show non-bonded interactions that form an aromatic box in the active site and lead to the formation of epi-aristolochene (Supplementary Fig. 9). This aromatic box engages in cation-pi interactions with the metal ions, which provide a favourable environment for catalysis of the substrate. The binding pocket of EAS (1387 Å³) is larger than that of TS (1175 Å³) (Table 2), and the calculated volumes of the product molecules are 206 Å³ (epi-aristolochene) and 200 Å³ (trichodiene), respectively. These results therefore correlate well with the measured binding pocket volumes, and it seems that a bigger pocket is required for the release of a bigger product molecule.
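The comparison made above can be checked with a few lines of Python. This is only an illustrative sketch: the volumes are the ones quoted in the text for CS, TS and EAS, while the function names and the pocket-to-product ratio are assumptions introduced here for illustration and are not descriptors used in the study.

```python
# Volumes in cubic angstroms, as quoted in the text (Sections 3.5.2 and 3.5.3, Table 2).
VOLUMES = {
    "CS": {"pocket": 857.0, "product": 154.6},    # product: 1,8-cineole
    "TS": {"pocket": 1175.0, "product": 200.0},   # product: trichodiene
    "EAS": {"pocket": 1387.0, "product": 206.0},  # product: epi-aristolochene
}


def rank_by(key):
    """Return enzyme names sorted from the smallest to the largest value of `key`."""
    return sorted(VOLUMES, key=lambda name: VOLUMES[name][key])


def pocket_to_product_ratio(name):
    """Pocket volume divided by product volume (hypothetical descriptor)."""
    entry = VOLUMES[name]
    return entry["pocket"] / entry["product"]


if __name__ == "__main__":
    print("ranked by pocket volume: ", rank_by("pocket"))
    print("ranked by product volume:", rank_by("product"))
    for name in VOLUMES:
        print(f"{name}: pocket/product = {pocket_to_product_ratio(name):.2f}")
```

For these three enzymes the two rankings coincide, which is the correlation described in the text. Three data points are of course too few to support more than a qualitative statement.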

3.5.4. Isoprene synthase

The comparative structural analysis of ISPS and TS showed that the terpene fold is common to both proteins, but the metal binding motif and the catalytic site region differ between the structures. The pyrophosphate moiety was observed to interact with F338, V341, F485, R486 and N489 (Supplementary Fig. 8). The structural representation makes clear that the residues involved in catalysis lie within the same sphere. However, the orientation and stereochemistry of these residues, together with their hydrogen bond and non-bonded interactions, lead to the product specificity of these enzymes. This was supported by the contact mapping, which revealed that serine, phenylalanine and arginine residues (with relatively longer side chains) were interacting through non-bonded interactions. The long side chains of these residues make the binding pocket compact for the binding of DMAPP (C5), which is smaller than FPP (C15), the substrate of TS (Table 1). This indicates that the residues lining the binding pocket may influence the size of the end product of the enzymes (Supplementary Fig. 9). The binding pocket of ISPS (792 Å³) is smaller than that of TS (1175 Å³) (Table 2), which gives an indication of how pockets of the same catalytic fold may handle substrates and products of different sizes.
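The kind of contact mapping referred to above can be sketched as a simple distance-based classification. The following is a minimal illustration and not the procedure used in this study: the cutoffs, the coordinates and the residue A100 are hypothetical, and real tools also apply angle and atom-type criteria.

```python
import math

# Hypothetical cutoffs in angstroms (assumptions, not values from the paper).
HBOND_CUTOFF = 3.5      # polar atom to polar atom
NONBONDED_CUTOFF = 4.5  # any other heavy-atom pair
POLAR_ELEMENTS = {"N", "O"}

# Made-up coordinates for illustration only.
TOY_LIGAND = [("O", (0.0, 0.0, 0.0)), ("P", (1.5, 0.0, 0.0))]
TOY_PROTEIN = [
    ("R486", "N", (0.0, 2.9, 0.0)),     # close polar atom
    ("F338", "C", (0.0, 0.0, 4.0)),     # close apolar atom
    ("A100", "C", (10.0, 10.0, 10.0)),  # hypothetical distant residue
]


def classify_contacts(protein_atoms, ligand_atoms):
    """Map each contacting residue to 'hydrogen bond' or 'non-bonded'.

    protein_atoms: list of (residue_label, element, (x, y, z))
    ligand_atoms:  list of (element, (x, y, z))
    A hydrogen bond label takes precedence over a non-bonded label.
    """
    contacts = {}
    for residue, element, xyz in protein_atoms:
        for ligand_element, ligand_xyz in ligand_atoms:
            distance = math.dist(xyz, ligand_xyz)
            both_polar = element in POLAR_ELEMENTS and ligand_element in POLAR_ELEMENTS
            if both_polar and distance <= HBOND_CUTOFF:
                contacts[residue] = "hydrogen bond"
            elif distance <= NONBONDED_CUTOFF:
                contacts.setdefault(residue, "non-bonded")
    return contacts


if __name__ == "__main__":
    print(classify_contacts(TOY_PROTEIN, TOY_LIGAND))
```

In this toy input the arginine nitrogen is classified as a hydrogen bond partner and the phenylalanine carbon as a non-bonded contact, while the distant residue is not reported.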

3.5.5. Bornyl pyrophosphate synthase

The comparative structural analysis of BPPS and TS showed that the N-terminus of BPPS is elongated and that the catalytic site residues which interact with the pyrophosphate moiety lie towards the C-terminus of the protein. The N-terminus of BPPS is known to be stabilized by inter-domain hydrogen bonds: one between the NH1 atom of the side chain of R56 and the O atom of the carboxyl group of D355, and another between the O atom of the OH group on the aromatic ring of Y60 and the OD1 atom of the side chain of D352 [61]. These interactions may stabilize the pyrophosphate during catalysis, as these residues are present in the loop region and point towards the active site of the protein. Arginine was the common residue reported to carry out the reaction in both enzymes [9, 61]. The other residues of BPPS involved in hydrogen bond interactions with the pyrophosphate moiety were W323, I344, V452, K511 and F578 (Supplementary Fig. 8). Leucine, phenylalanine, tryptophan and isoleucine residues occur in the binding pocket of BPPS, which can accommodate GPP (C10), a smaller substrate than the FPP used by TS (Supplementary Fig. 10). The active site cavity of BPPS is reported to be 222 Å³, which is smaller than that of TS (324 Å³) [60]. This difference may explain why TS catalyses the larger substrate (FPP; volume: 339 Å³), whereas BPPS catalyses the smaller one (GPP; volume: 263 Å³).

3.5.6. Taxadiene synthase

The catalytically active site of TaxS is reported to lie at the C-terminus of the protein, whereas the N-terminal domain contains a double α-barrel that lacks the DXDD motif [63]. The terpene fold and the catalytic mechanism are the same as those of TS. However, the residues observed to form hydrogen bonds with the pyrophosphate moiety in TaxS were S587, D613, Y688, Y684, Q709, C719 and C830. These interacting residues determine the length of the end product of the enzyme (Supplementary Fig. 11). The additional α-helical domain at the N-terminus is 17 Å away from the catalytic site (Supplementary Fig. 11) [63]. Intramolecular hydrogen bonds of these α-helices are reported to help stabilize the pyrophosphate during catalysis [63]. Arginine residues arranged in tandem in the loop regions are reported to induce conformational changes that help in the catalysis of geranylgeranyl diphosphate (GGPP; C20) [63] (Fig. 5), and arginine residues are documented to play an important role in the catalysis of pyrophosphate in BPPS and TS [63]. It would be interesting to investigate the role of the additional helical region, which is an α-α barrel and contains arginine residues in tandem repeats. The contact mapping showed that the residues in the binding pocket are aromatic and positively charged, which provides a favourable environment for pyrophosphate catalysis. Valine, serine and glycine residues, which have small side chains, allow the protein to bind GGPP (C20) (Supplementary Fig. 10). This distinct composition of the binding pocket within the terpene fold leads to the formation of different products. The kinetic parameter kcat/Km of OPPS and CS was reported as 0.005 s⁻¹ and 0.049 min⁻¹, respectively [54, 59]. It was observed that CS may have a higher catalytic efficiency than OPPS, which may be attributed to the additional helical regions of the multi-domain protein.

The terpene fold was observed to be conserved across the different enzymes. The differences lie in the residues that interact with the pyrophosphate moiety and in the residues lining the binding pocket. The proteins clustered in the phylogenetic tree according to their domain diversity (Fig. 3): proteins with multiple domains clustered together, whereas proteins with a single domain occurred in a separate clade. The phylogenetic analysis also showed that the clustering was governed by the size of the end products. It is clear from the phylogenetic analysis that these different enzymes are related on the evolutionary scale.

Additional domains of multi-domain proteins may aid in catalytic efficiency

NMA of single and multi-domain proteins showed that the additional domains may help improve the catalytic efficiency of the enzymes and may also help regulate their activities. The cumulative movements of the additional domains showed a ‘puckering forceps’-like converging movement along a central perpendicular axis (amino acid residues 746-750), which could bring the functionally important residues of the binding pocket into close proximity for interaction. As explained above, some regions of the additional domains may interact with helices present in the catalytic domain and may further enhance catalysis. In single domain proteins, however, the residues adjacent to the catalytic site were observed to lie in flexible regions. The residues that showed the greatest flexibility are shown in red in Fig. 6.

4. Discussion

The phylogenetic analysis provided evidence for the occurrence of the TS protein across different kingdoms of life, although TS proteins are well characterized only in fungi [50]. The homologous protein sequences obtained after pBLAST were found in bacteria, lycophytes and plants. It may be inferred that this protein has diverged and evolved with additional non-catalytic domains. The phylogenetic analysis showed that TS proteins have evolved over time according to their roles in the biosynthetic pathways of secondary metabolites among different genera of fungi (Fig. 3). The TS protein sequences from different genera of pathogenic fungi, namely Fusarium sporotrichioides, F. graminearum, F. asiaticum, Beauveria bassiana and Stachybotrys echinata, clustered in one group. The genera of beneficial fungi that are reported as bio-control agents, however, formed a separate cluster adjacent to the pathogenic fungi (Fig. 2). Although the biological basis of this separation is difficult to explain, the end products synthesized by the same biosynthetic pathway in these organisms have been reported to have different types of activities.

The sequence analysis revealed that homologous proteins are present in different organisms, including S. moellendorffii, Anabaena variabilis, Sciscionella and Arabidopsis thaliana. However, the protein sequences of Nectria haematococca, Capronia epimyces, Aspergillus oryzae and Ophiocordyceps sinensis clustered in different groups, as these are aristolochene synthases containing the same terpene fold as the TS protein [4]. This may be the reason why the other fungal genera that showed homology to TS proteins clustered in the group of Aspergillus oryzae. It is reported that in S. moellendorffii and Marchantia polymorpha the expressed diterpene synthase genes are closely related to those of plants, whereas the expressed mono- and sesquiterpene synthase genes are closely related to those of microbes [25, 26]. Similarly, the homologous protein sequences obtained from S. moellendorffii belong to terpene synthases that lie at the interface between those of fungi and plants. The proteins obtained from bacteria, however, showed similarity to those of plants. TS has shown homology to AS from Aspergillus oryzae [21]. Further, the comparative structural analysis was carried out to observe the presence of the terpene fold in different organisms which carry out similar catalytic reactions.

However, the substrate and product lengths and the amino acid residues involved in the reactions varied among the organisms. The metal binding structural motif has been reported to be conserved in these enzymes, which have a distinct terpene fold in different organisms [67]. Mg²⁺ ions are reported to act as a co-factor in all proteins containing the terpene fold, helping them to recognize the pyrophosphate group of the substrate by metal coordination [54, 60, 61, 66]. Metal binding was also shown to regulate the size of the binding pocket. The complexation of pyrophosphate with Mg²⁺ is reported to cause conformational changes that make the binding pocket sequestered and closed, as described for BPPS [61]. Further, these metal ions interact with the pyrophosphate and with the amino acid residues of the active site, which results in the proper orientation of the substrate for cyclization and product formation in the terpene fold containing proteins [66]. The N-termini of BPPS and EAS are reported to cap their respective active sites, whereas TS requires neither an N-terminal domain nor the N-terminus for active site closure. TS contains a D101–R304 regulatory motif, earlier referred to as a molecular switch that triggers active site closure; in this switch, the D101 residue has been reported to interact with Mg²⁺ [51, 61]. This shows that the metal ion has a role in preparing the optimal binding pocket and in the reorganisation of domains to facilitate the overall catalysis.

These enzymes can be broadly classified into single domain and multi-domain proteins on the basis of their tertiary structures. The single domain proteins cluster together in one group of the phylogenetic tree, and the proteins containing two or three domains cluster in another group (Fig. 3). Since the pyrophosphate carries negative charge density, it was observed to interact with positively charged residues. In all the analyzed protein sequences, arginine interacts with the pyrophosphate. The aromatic residues also play an important role in catalysis in all the analyzed proteins, as they promote catalysis through cation-pi interactions. The proteins that lack the NSE/DTE metal binding domain were observed in both clusters. It is evident that the terpene fold is involved in the catalysis of diverse types of chemical reactions governed by a similar mechanism. This property of the terpene fold can be attributed to the diversity that has evolved in the local environment of the catalytic pockets. The contact mapping analysis of all the enzymes showed that the side chains of the residues present in the binding pocket determine the diverse sizes of substrates and products. A similar study on plant terpene synthases showed that altering residues of the binding pocket may change the end products [68]. It is reported that the side chain of the fifth amino acid before the first DDXXD motif in HexPPS, GPPS and FPPS decides the length of the product [65]. Therefore, the stereochemistry of the residues present in the binding pocket and the depth of the binding pocket determine the size of the end products in these enzymes. This observation was further strengthened by the binding pocket volume analysis. The TS, δ-CS, EIZS and EAS enzymes convert a substrate of the same size into different products of the same size. It was observed that the volume of the binding pocket and the volume of the product molecules show a correlation in the context of the diverse behavior of terpene fold enzymes.
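The chain-length rule cited above from [65] refers to a position defined relative to a sequence motif, which is straightforward to locate programmatically. The sketch below assumes that "fifth amino acid before the motif" means five positions upstream of the first aspartate of the first DDXXD match; the function name and the toy sequence are hypothetical and do not correspond to any real enzyme.

```python
import re

# DDXXD motif, where X is any residue (one-letter amino acid codes).
DDXXD = re.compile(r"DD..D")

# Made-up sequence for illustration only.
TOY_SEQUENCE = "MKTAYLFSLIHDDIMDGS"


def residue_before_first_motif(sequence, offset=5):
    """Return (1-based position, residue) `offset` places before the first DDXXD.

    Returns None when the motif is absent or lies too close to the
    N-terminus for the requested offset.
    """
    match = DDXXD.search(sequence)
    if match is None or match.start() < offset:
        return None
    index = match.start() - offset  # 0-based index of the upstream residue
    return index + 1, sequence[index]


if __name__ == "__main__":
    print(residue_before_first_motif(TOY_SEQUENCE))
```

Applied to aligned sequences of chain-length-determining enzymes, a helper of this kind would allow the size of the residue at this position to be tabulated against the product length.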

The comparative volume analysis of binding pockets and substrates/products clearly indicated that the promiscuous behaviour of the terpene fold may be attributed to the size of the substrates and products and to the side chain lengths of the residues lining the binding pocket, which affect the occupancy and movement of ligand molecules in the binding site. Enzymes containing the terpene fold are thought to share a common origin, but the evolutionary events that led to their functional diversity are yet to be worked out. The comparative analysis of the kinetic efficiency of OPPS and CS showed that CS has a higher catalytic efficiency than OPPS [54, 59]. This better efficiency of the multi-domain enzymes may be attributed to the extra α-α barrel structures present at the N-terminus. The structural arrangement indicated that these may be in an evolving phase, as the barrel present in this case is not well developed. It would therefore be interesting to explore how the tandem arginine repeats reported in the loop of the α-α barrel could regulate the catalytic site, which is situated 17 Å away, using in silico approaches for analyzing larger protein domain movements and dynamics, such as normal mode analysis, as described earlier [69].

Furthermore, intramolecular hydrogen bond formation between tandem arginine residues present in domains other than the catalytic domain may assist in catalysis. It was observed that the instances at which the arginine residues are involved in intramolecular hydrogen bonding lead to a decrease in the binding pocket volume. For example, in the case of TaxS, the binding pocket volume observed before the simulation was 339 Å³, whereas at the instance where the arginine residues showed intramolecular hydrogen bonding the calculated binding pocket volume was 289 Å³, a decrease of about 15%. This compaction of the binding pocket as a result of intramolecular hydrogen bonding may lead to better catalysis in the multi-domain proteins (unpublished results). To extend this observation, it will be interesting to comparatively analyse representatives of all terpene fold containing multi-domain protein classes in this context.

Further, it is well established that multi-domain proteins utilize inter-domain motions such as domain swinging, stretching, twisting and motion coupling to improve their functionality. These motions may also aid in their versatile modes of regulation under the variable physiological conditions found within diverse organisms. We attempted to observe whether the additional domains of these terpene family multi-domain proteins communicate with the catalytic domains. In the current study we observed that the cumulative motions of the additional domains follow a circular pattern around a perpendicular axis located roughly in the middle of these proteins. The overall structure of these proteins may therefore become more compact. This compactness of the catalytic domain brings the functional residues into close proximity and may increase catalysis. It could be concluded from NMA that these coupled motions may increase the catalytic efficiency of multi-domain proteins. Further, it would be interesting to investigate the role of this domain and the possible reason why this protein carries this reasonably large region, which is yet to be characterized functionally.

These proteins are involved in the production of economically important secondary metabolites that may be beneficial for sustainable agriculture and biomedical applications [70]. By comparing the sequences and structures of these enzymes, we may trace their molecular evolution. The changes that lead to such catalytic diversity may be helpful in devising novel recombinant designer enzymes for producing diverse secondary metabolites of agricultural and biomedical interest at industrial scales.

Conflict of interest

The authors declare no conflict of interest.

Acknowledgements:

The University Grants Commission, Govt. of India (UGC), is acknowledged for providing financial support in the form of a fellowship to IK. Research in the MA lab is supported by UGC. Research in the YA lab is supported by extramural research funds from UGC, the Indian Council of Medical Research and the Science and Engineering Research Board, DST, Govt. of India. We thank the Central University of Himachal Pradesh and the Bioinformatics Resources & Applications Facility, Centre for Development in Advanced Computing, Pune, for providing the computational infrastructure used for carrying out this work.

References

1. Davis EM, Croteau R. Cyclization enzymes in the biosynthesis of monoterpenes, sesquiterpenes, and diterpenes. In: Leeper FJ, Vederas JC, editors. Biosynthesis. Springer Berlin Heidelberg, 2000. p 53–95.

2. Christianson DW. Structural biology and chemistry of the terpenoid cyclases. Chem Rev 2006;106:3412–3442.

3. Christianson DW. Unearthing the roots of the terpenome. Curr Opin Chem Biol 2008;12:141–150.

4. Agger S, Lopez-Gallego F, Schmidt-Dannert C. Diversity of sesquiterpene synthases in the basidiomycete Coprinus cinereus. Mol Microbiol 2009;72:1181–1195.

5. Tholl D. Terpene synthases and the regulation, diversity and biological roles of terpene metabolism. Curr Opin Plant Biol 2006;9:297–304.

6. Kawaide H, Imai R, Sassa T, Kamiya Y. ent-Kaurene synthase from the fungus Phaeosphaeria sp. L487: cDNA isolation, characterization, and bacterial expression of a bifunctional diterpene cyclase in fungal gibberellin biosynthesis. J Biol Chem 1997;272:21706–21712.

7. Caruthers JM, Kang I, Rynkiewicz MJ, Cane DE, Christianson DW. Crystal structure determination of aristolochene synthase from the blue cheese mold, Penicillium roqueforti. J Biol Chem 2000;275:25533–25539.

8. Dairi T, Hamano Y, Kuzuyama T, Itoh N, Furihata K, et al. Eubacterial diterpene cyclase genes essential for production of the isoprenoid antibiotic terpentecin. J Bacteriol 2001;183:6085–6094.

9. Rynkiewicz MJ, Cane DE, Christianson DW. Structure of trichodiene synthase from Fusarium sporotrichioides provides mechanistic inferences on the terpene cyclization cascade. PNAS 2001;98:13543–13548.

10. Toyomasu T, Nakaminami K, Toshima H, Mie T, Watanabe K, et al. Cloning of a gene cluster responsible for the biosynthesis of diterpene aphidicolin, a specific inhibitor of DNA polymerase alpha. Biosci Biotechnol Biochem 2004;68:146–152.

11. Toyomasu T, Tsukahara M, Kaneko A, Niida R, Mitsuhashi W, et al. Fusicoccins are biosynthesized by an unusual chimera diterpene synthase in fungi. PNAS 2007;104:3084–3088.

12. Shishova EY, Di Costanzo L, Cane DE, Christianson DW. X-ray crystal structure of aristolochene synthase from Aspergillus terreus and evolution of templates for the cyclization of farnesyl diphosphate. Biochemistry 2007;46:1941–1951.

13. Pinedo C, Wang CM, Pradier JM, Dalmais B, Choquer M, et al. Sesquiterpene synthase from the botrydial biosynthetic gene cluster of the phytopathogen Botrytis cinerea. ACS Chem Biol 2008;3:791–801.

14. Cane DE, Watt RM. Expression and mechanistic analysis of a germacradienol synthase from Streptomyces coelicolor implicated in geosmin biosynthesis. PNAS 2003;100:1547–1551.

15. Cane DE, He X, Kobayashi S, Omura S, Ikeda H. Geosmin biosynthesis in Streptomyces avermitilis. Molecular cloning, expression, and mechanistic study of the germacradienol/geosmin synthase. J Antibiot (Tokyo) 2006;59:471–479.

16. Agger SA, Lopez-Gallego F, Hoye TR, Schmidt-Dannert C. Identification of sesquiterpene synthases from Nostoc punctiforme PCC 73102 and Nostoc sp. strain PCC 7120. J Bacteriol 2008;190:6084–6096.

17. Giglio S, Jiang J, Saint CP, Cane DE, Monis PT. Isolation and characterization of the gene associated with geosmin production in cyanobacteria. Environ Sci Technol 2008;42:8027–8032.

18. Hohn TM, Beremand PD. Isolation and nucleotide sequence of a sesquiterpene cyclase gene from the trichothecene-producing fungus Fusarium sporotrichioides. Gene 1989;79:131–138.

19. Hohn TM, Plattner RD. Purification and characterization of the sesquiterpene cyclase aristolochene synthase from Penicillium roqueforti. Arch Biochem Biophys 1989;272:137–143.

20. Cane DE, Shim JH, Xue Q, Fitzsimons BC, Hohn TM. Trichodiene synthase: identification of active site residues by site-directed mutagenesis. Biochemistry 1995;34:2480–2488.

21. Cane DE, Kang I. Aristolochene synthase: purification, molecular cloning, high-level expression in Escherichia coli and characterization of the Aspergillus terreus cyclase. Arch Biochem Biophys 2000;376:354–364.

22. Li G, Köllner TG, Yin Y, Jiang Y, Chen H, et al. Nonseed plant Selaginella moellendorffii has both seed plant and microbial types of terpene synthases. PNAS 2012;109:14711–14715.

23. Gahtori D, Chaturvedi P. Antifungal and antibacterial potential of methanol and chloroform extracts of Marchantia polymorpha L. Arch Phytopathol Plant Prot 2011;44:726–731.

24. Asakawa Y. Bryophytes: Chemical diversity, synthesis and biotechnology. A review. Flavour Fragr J 2011;26:318–320.

25. Kumar S, Chase K, Xun Z, Ayla N, Sibongile M, et al. Molecular diversity of terpene synthases in the liverwort Marchantia polymorpha. Plant Cell 2016;28:2632–2650.

26. Trapp SC, Croteau RB. Genomic organization of plant terpene synthases and molecular evolutionary implications. Genetics 2001;158:811–832.

27. Pontin M, Bottini R, Luis BJ, Piccoli P. Allium sativum produces terpenes with fungistatic properties in response to infection with Sclerotium cepivorum. Phytochemistry 2015;115:152–160.

28. McAndrew RP, Peralta-Yahya PP, DeGiovanni A, Pereira JH, Hadi MZ, Keasling JD, Adams PD. Structure of a three-domain sesquiterpene synthase: a prospective target for advanced biofuels production. Structure 2011;19:1876–1884.

29. Corpet F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 1988;16:10881–10890.

30. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997;25:4876–4882.

31. Gouet P, Courcelle E, Stuart DI. ESPript: analysis of multiple sequence alignments in PostScript. Bioinformatics 1999;15:305–308.

32. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 1993;262:208–214.

33. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res 2004;14:1188–1190.

34. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 1993;10:512–526.

35. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011;28:2731–2739.

36. Rambaut A. Fig Tree from A. Rambaut [Internet]. 2007. Available from: http://tree.bio.ed.ac.uk/software/figtree/

37. Holm L, Rosenström P. Dali server: conservation mapping in 3D. Nucleic Acids Res 2010;38(suppl 2):W545–W549.

38. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. The protein data bank. Nucleic Acids Res 2000;28:235–242.

39. Mosca R, Schneider TR. RAPIDO: a web server for the alignment of protein structures in the presence of conformational changes. Nucleic Acids Res 2008;36:W42–W46.

40. DeLano WL. PyMOL. DeLano Scientific, San Carlos, CA. 2002;700.

41. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 2009;30:2785–2791.

42. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comput Chem 2010;31:455–461.

43. Berendsen HJ, van der Spoel D, van Drunen R. GROMACS: a message passing parallel molecular dynamics implementation. Comput Phys Commun 1995;91:43–56.

44. Kumari I, Chaudhary N, Sandhu P, Ahmed M, Akhter Y. Structural and mechanistic analysis of engineered trichodiene synthase enzymes from Trichoderma harzianum: towards higher catalytic activities empowering sustainable agriculture. J Biomol Struct Dyn 2015;34:1176–1189.

45. Laskowski RA, Swindells MB. LigPlot+: Multiple ligand–protein interaction diagrams for drug discovery. J Chem Inf Model 2011;51:2778–2786.

46. Ben Nasr N, Guillemain H, Lagarde N, Zagury JF, Montes M. Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query. J Chem Inf Model 2013;53:293–311.

47. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. UCSF Chimera: a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–1612.

48. Atilgan AR, Durrell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 2001;80:505–515.

49. Chunyan X, Tobi D, Bahar I. Computational prediction of allosteric structural changes by a simple mechanical model: application to hemoglobin T to R transition. J Mol Biol 2003;333:153–168.

50. Tijerino A, Cardoza RE, Moraga J, Malmierca MG, Vicente F, et al. Overexpression of the trichodiene synthase gene tri5 increases trichodermin production and antimicrobial activity in Trichoderma brevicompactum. Fungal Genet Biol 2011;48:285–296.

51. Starks CM, Back K, Chappell J, Noel JP. Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science 1997;277:1815–1820.

52. Lesburg CA, Zhai G, Cane DE, Christianson DW. Crystal structure of pentalenene synthase: Mechanistic insights on terpenoid cyclization reactions in biology. Science 1997;277:1820–1824.

53. Aaron JA, Lin X, Cane DE, Christianson DW. Structure of epi-isozizaene synthase from Streptomyces coelicolor A3(2), a platform for new terpenoid cyclization templates. Biochemistry 2010;49:1787–1797.

54. Kampranis SC, Ioannidis D, Purvis A, Mahrez W, Ninga E, et al. Rational conversion of substrate and product specificity in a Salvia monoterpene synthase: structural insights into the evolution of terpene synthase function. Plant Cell 2007;19:1994–2005.

55. Chang TH, Hsieh FL, Ko TP, Teng KH, Liang PH, et al. Structure of a heterotetrameric geranyl pyrophosphate synthase from mint (Mentha piperita) reveals intersubunit regulation. Plant Cell 2010;22:454–467.

56. Gennadios HA, Gonzalez V, Di Costanzo L, Li A, Yu F, et al. Crystal structure of (+)-δ-cadinene synthase from Gossypium arboreum and evolutionary divergence of metal binding motifs for catalysis. Biochemistry 2009;48:6175–6183.

57. Sun HY, Ko TP, Kuo CJ, Guo RT, Chou CC, et al. Homodimeric hexaprenyl pyrophosphate synthase from the thermoacidophilic crenarchaeon Sulfolobus solfataricus displays asymmetric subunit structures. J Bacteriol 2005;187:8137–8148.

58. Köksal M, Zimmer I, Schnitzler JP, Christianson DW. Structure of isoprene synthase illuminates the chemical mechanism of teragram atmospheric carbon emission. J Mol Biol 2010;402:363–373.

59. Guo RT, Kuo CJ, Ko TP, Chou CC, Liang PH, et al. A molecular ruler for chain elongation catalyzed by octaprenyl pyrophosphate synthase and its structure-based engineering to produce unprecedented long chain trans-prenyl products. Biochemistry 2004;43:7678–7686.

60. Wallrapp FH, Pan JJ, Ramamoorthy G, Almonacid DE, Hillerich BS, et al. Prediction of function for the polyprenyl transferase subgroup in the isoprenoid synthase superfamily. PNAS 2013;110:E1196–E1202.

61. Whittington DA, Wise ML, Urbansky M, Coates RM, Croteau RB, et al. Bornyl diphosphate synthase: Structure and strategy for carbocation manipulation by a terpenoid cyclase. PNAS 2002;99:15375–15380.

62. Pandit J, Danley DE, Schulte GJ, Mazzalupo S, Pauly TA, et al. Crystal structure of human squalene synthase. J Biol Chem 2000;275:30610–30617.

63. Köksal M, Jin Y, Coates RM, Croteau R, Christianson DW. Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Nature 2011;469:116–120.

64. Yu F, Li M, Xu C, Sun B, Zhou H, et al. Crystal structure and enantioselectivity of terpene cyclization in SAM-dependent methyltransferase TleD. Biochem J 2016;473:4385–4397.

65. Sasaki D, Fujihashi M, Okuyama N, Kobayashi Y, Noike M, et al. Crystal structure of heterodimeric hexaprenyl diphosphate synthase from Micrococcus luteus BP 26 reveals that the small subunit is directly involved in the product chain length regulation. J Biol Chem 2011;286:3729–3740.

66. Vedula LS, Cane DE, Christianson DW. Role of Arginine-304 in the diphosphate-triggered active site closure mechanism of trichodiene synthase. Biochemistry 2005;44:12719–12727.

67. Greenhagen B, Chappell J. Molecular scaffolds for chemical wizardry: learning nature's rules for terpene cyclases. PNAS 2001;98:13479–13481.

27

ACCEPTED MANUSCRIPT 1 2 3

68. Greenhagen BT, O’Maille PE, Noel JP, Chappell J. Identifying and manipulating structural determinates linking catalytic specificities in terpene synthases. PNAS 2006;103:9826-9831.

69. Kumari I, Ahmed M, Akhter Y. Deciphering the protein translation inhibition and coping mechanism of trichothecene toxin in resistant fungi. Int J Biochem Cell Biol 2016a;78:370-376. doi: 10.1016/j.biocel.2016.08.002.

70. Kumari I, Ahmed M, Akhter Y. Multifaceted impact of trichothecene metabolites on plant-microbe interactions and human health. Appl Microbiol Biotechnol 2016b;100:5759-5771. doi: 10.1007/s00253-016-7599-0.

Figure legends:

Fig. 1 The overall methodology used for comparative sequence and structural analysis of terpene fold containing proteins

This work can be divided into two parts, viz. sequence-based and structure-based analysis. For the comparative sequence-based analysis, pBLAST was used to obtain homologous sequences from diverse taxa of all kingdoms, with trichodiene synthase as the query sequence against the non-redundant database. The catalytic domain was analyzed among these sequences by multiple sequence alignment, and a phylogenetic tree was constructed using MEGA5. The terpene fold containing proteins were obtained using the DALI server and compared structurally using the RAPIDO server. To understand what leads to reactant/product diversity in the terpene fold, contact mapping analysis and volume calculations of the binding pocket were carried out. Phylogenetic analysis of the proteins containing the terpene fold was done using the maximum parsimony method.

Fig. 2 MSA demonstrates conserved motifs of the TS enzymes across different genera of fungi

(a) MSA of different TS proteins from fungi showed that the metal binding regions and the catalytic regions which interact with the pyrophosphate moiety are conserved. The pattern of conservation is depicted by the colored weblogo diagrams. (b) The cartoon structure of TS protein is shown in cyan blue color and the terpene fold is shown in blue color. The metal binding motifs are shown in red color and the pyrophosphate binding motif is highlighted in green color. The metal ions are shown as magenta spheres and pyrophosphate is shown as sticks in the binding pocket of the TS protein.

Fig. 3 Phylogenetic analysis of TS protein in fungi and terpene domain enzymes across the kingdoms of life

(a) Phylogenetic analysis of TS protein showed that it is present across diverse fungal genera. TS proteins of plant growth promoting fungi (PGPF) and pathogenic fungi form separate clusters, while the proteins homologous to TS in other fungal genera, which contain the same terpene fold but have different enzymatic activities, form a separate cluster. Terpene synthase from lycophyte was present at the interface of microbes and plants. (b) The catalytic residues reported to be involved in reaction catalysis in TS of Fusarium sporotrichioides are conserved among different organisms, as represented in the MSA. (c) Phylogenetic analysis of TS protein with other enzymes containing the terpene fold also resulted in two clusters based on their structural diversity and catalytic domain. Green colour indicates proteins with two or more domains and cyan blue colour indicates proteins which carry a single domain. The proteins which do not have a second metal binding domain are indicated by brackets in the tree. Abbreviations used for the proteins: 5-epi-aristolochene synthase (EAS), pentalenene synthase (PS), epi-isozizaene synthase (EIZS), geranyl pyrophosphate synthase (GPPS), (+)-δ-cadinene synthase (δCS), hexaprenyl pyrophosphate synthase (HexPPS), isoprene synthase (ISPS), octaprenyl pyrophosphate synthase (OPPS), polyprenyl synthase (PPS), trichodiene synthase (TS), bornyl pyrophosphate synthase (Bornyl Ppi Synthase), farnesyl pyrophosphate synthase (FPPS), squalene synthase (SS) and taxadiene synthase (TaxS).

Fig. 4 Comparative structural analysis of multi-domain terpene synthase proteins

TS protein is shown in cyan blue colour and other proteins are shown in green colour, while the helical region presented in magenta colour is absent in TS protein. It was observed that the catalytic domain is conserved in all the proteins. However, the metal binding and pyrophosphate interacting residues occur at different positions in these proteins. The additional helical structures in the multi-domain proteins have not shown any direct role in catalytic activity, but there are indications of regulatory roles. For instance, it is reported for TaxS that the N-terminal region of the protein forms intramolecular hydrogen bonds that stabilize this non-catalytic domain. Arginine residues, which are reported to play an important role in substrate catalysis in all TS proteins, are present in this region. Therefore, it may be considered that the N-terminal domain stabilizes the catalysis of pyrophosphate and might help enhance the catalytic efficiency [63].

Fig. 5 α-α barrel in the N-terminal region of TaxS protein

(a) It is reported that the amino acid residues of the N-terminal region form intramolecular hydrogen bonds that may stabilize the interaction of pyrophosphate [63]. It was observed that the loops of the N-terminal domain are 17 Å away from the catalytic site. The N-terminal region contains flexible loops which may aid the movement of the α-α barrel domain and may thereby have a regulatory role in catalysis. (b) The N-terminal region of the protein is arranged in an α-α barrel like fold. It is reported that arginine residues are present in tandem in this domain, and arginine is reported to play an important role in the catalysis of terpene synthases [63]. Therefore, it will be interesting to study the role of this region in future.

Fig. 6 Normal mode analyses of single and multi-domain terpene fold proteins

(a) NMA of the single domain protein showed that the amino acid residues around the catalytic site are flexible, which may help in the catalysis of the substrate. (b) NMA of the multi-domain proteins showed that the flexible regions lead to ‘puckering forceps’ like movements, which may increase the compactness of the protein and its catalytic efficiency. (c) The multi-domain protein is shown as a cartoon. The green coloured region contains catalytic sites highlighted in red and blue colour. Additional domains other than the catalytic domain are highlighted in magenta colour. The coupled motion of different domains is depicted by blue arrows.

Tables

Table 1 Terpene fold enzymes catalyse similar reactions to yield diverse products

Sr. No.  Enzyme                              Reactant (size)                                                     Product (size)
1.       Trichodiene synthase                Farnesyl pyrophosphate (C15)                                        Trichodiene (C15)
2.       Squalene synthase                   2 Farnesyl pyrophosphate (C15)                                      Squalene (C30)
3.       Pentalenene synthase                Farnesyl pyrophosphate (C15)                                        Pentalenene (C15)
4.       Geranyl pyrophosphate synthase      Dimethylallyl diphosphate (C5) and isopentenyl pyrophosphate (C5)   Geranyl pyrophosphate (C10)
5.       Octaprenyl pyrophosphate synthase   Farnesyl diphosphate (C15) and isopentenyl pyrophosphate (C5)       Octaprenyl pyrophosphate (C40)
6.       Hexaprenyl pyrophosphate synthase   Farnesyl diphosphate (C15) and isopentenyl pyrophosphate (C5)       Hexaprenyl pyrophosphate (C30)
7.       Polyprenyl synthase                 Farnesyl diphosphate (C15) and isopentenyl pyrophosphate (C5)       Decaprenyl diphosphate (C40)
8.       Epi-isozizaene synthase             Farnesyl pyrophosphate (C15)                                        Epi-isozizaene (C15)
9.       5-Epi-aristolochene synthase        Farnesyl pyrophosphate (C15)                                        5-Epi-aristolochene (C15)
10.      1,8-cineole synthase                Geranyl diphosphate (C10)                                           1,8-cineole (C10)
11.      δ-Cadinene synthase                 Farnesyl pyrophosphate (C15)                                        δ-Cadinene (C15)
12.      Isoprene synthase                   Dimethylallyl pyrophosphate (C5)                                    Isoprene (C5)
13.      Bornyl pyrophosphate synthase       Geranyl pyrophosphate (C10)                                         Bornyl pyrophosphate (C10)
14.      Taxadiene synthase                  Geranylgeranyl diphosphate (C20)                                    Taxadiene (C20)
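The chain-elongation rows of Table 1 follow simple carbon bookkeeping: a primer is extended by C5 isopentenyl pyrophosphate (IPP) units until the product length is reached. The sketch below only illustrates that arithmetic using the carbon counts listed in the table; the function name `ipp_units_needed` and the constants are hypothetical helpers introduced here, not part of the reported analysis.

```python
# Carbon counts are taken from Table 1; all names here are illustrative assumptions.
FPP_CARBONS = 15  # farnesyl pyrophosphate primer (C15)
IPP_CARBONS = 5   # isopentenyl pyrophosphate extender unit (C5)


def ipp_units_needed(product_carbons, primer_carbons=FPP_CARBONS):
    """Return how many C5 IPP units must be added to the primer to reach the product."""
    extra = product_carbons - primer_carbons
    if extra < 0 or extra % IPP_CARBONS != 0:
        raise ValueError("product length is not primer + n * C5")
    return extra // IPP_CARBONS


# Products listed in Table 1 for the FPP-primed prenyltransferases.
products = {"Hexaprenyl pyrophosphate": 30, "Octaprenyl pyrophosphate": 40}
for name, carbons in products.items():
    print(f"{name} (C{carbons}): FPP + {ipp_units_needed(carbons)} IPP")

# Geranyl pyrophosphate (C10) is built on a C5 dimethylallyl primer instead of FPP.
print("Geranyl pyrophosphate (C10): DMAPP +", ipp_units_needed(10, primer_carbons=5), "IPP")
```

On this bookkeeping, hexaprenyl pyrophosphate corresponds to FPP plus three IPP units and octaprenyl pyrophosphate to FPP plus five, whereas the cyclases in the table (for example trichodiene synthase) leave the carbon count of the substrate unchanged.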

Table 2 Volume of the substrate binding pocket of the terpene fold enzymes

Sr. No.  Protein                          Volume (Å³)
1.       Trichodiene synthase             1175
2.       Epi-isozizaene synthase          527
3.       Geranyl pyrophosphate synthase   973
4.       δ-Cadinene synthase              1464
5.       1,8-cineole synthase             857
6.       5-Epi-aristolochene synthase     1387
7.       Isoprene synthase                792
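A reader who wants to set the pocket volumes of Table 2 against the substrate sizes of Table 1 can group the volumes by substrate carbon count. The sketch below does only that. The function `volume_range_by_substrate` and the choice to list geranyl pyrophosphate synthase under its C5 substrates are assumptions made for illustration and are not part of the reported analysis.

```python
from collections import defaultdict

# (protein, substrate carbons from Table 1, pocket volume in Å^3 from Table 2)
POCKETS = [
    ("Trichodiene synthase", 15, 1175),
    ("Epi-isozizaene synthase", 15, 527),
    ("Geranyl pyrophosphate synthase", 5, 973),  # assumption: grouped by its C5 substrates
    ("δ-Cadinene synthase", 15, 1464),
    ("1,8-cineole synthase", 10, 857),
    ("5-Epi-aristolochene synthase", 15, 1387),
    ("Isoprene synthase", 5, 792),
]


def volume_range_by_substrate(rows):
    """Map substrate carbon count to the (smallest, largest) pocket volume."""
    grouped = defaultdict(list)
    for _name, carbons, volume in rows:
        grouped[carbons].append(volume)
    return {c: (min(v), max(v)) for c, v in sorted(grouped.items())}


for carbons, (low, high) in volume_range_by_substrate(POCKETS).items():
    print(f"C{carbons} substrates: {low}-{high} Å^3")
```

For the enzymes that act on the C15 substrate farnesyl pyrophosphate, the tabulated volumes span 527 to 1464 Å³.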

Fig1 [Flow chart. Sequence based: pBLAST of trichodiene synthase with NR-sequence database; conservation pattern of the catalytic site across different organisms; phylogenetic tree of trichodiene synthase among different organisms. Structure based: terpene fold identification in different proteins using DALI server; comparative structural analysis using RAPIDO server; binding pocket analysis to explore the reaction diversity of terpene fold; phylogenetic tree on the basis of terpene fold.]

Fig2 [Panels (a) and (b); structure labels: Mg2+, Pyrophosphate.]

Fig3 [Panels (a), (b) and (c); tree label: Fungi.]

Fig4 [Panel titles: 5-epi-aristolochene synthase; 1,8-cineole synthase; (+)-δ-cadinene synthase; bornyl pyrophosphate synthase; isoprene synthase; taxadiene synthase.]

Fig5 [Panels (a) and (b); distance labels: 17 Å, 17.3 Å.]

Fig6 [Panels (a), (b) and (c); residue range labels: 40-52, 102-124, 169-175, 247-252, 311-320, 504-534, 564-571, 653-660, 667-680, 726-733, 746-750.]

Highlights

1. Phylogenetic analysis of terpene fold showed evolution at domain levels

2. Terpene fold sequence from lycophyte Selaginella is between microbes and plants

3. Amino acid side chains in catalytic pocket determine substrates/products diversity

4. Multi-domain enzymes contain additional α-α barrel which may regulate the catalysis

5. ‘Puckering forceps’ kind of regulatory motion was observed in multi-domains

Author’s agreement and ethical statement

All the authors have jointly worked on the manuscript and agree to its publication. No part of the manuscript has been published previously or is currently under consideration for publication elsewhere. The acknowledgements contain complete information on the funding we received, and we have no financial conflicts of interest to declare. There are no ethical issues involved in this work.

On behalf of all authors,

Yusuf Akhter, PhD

Corresponding author
