MBE Advance Access published July 24, 2014

Dual targeted proteins tend to be more evolutionarily conserved

Irit Kisslov1,*, Adi Naamati1,*, Nitzan Shakarchy1, and Ophry Pines1,2

1Department of Microbiology Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel; 2CREATE-NUS-HUJ Cellular & Molecular Mechanisms of Inflammation Program, National University of Singapore, Singapore 138602

*These authors contributed equally to this study. Corresponding author Ophry Pines, Department of Microbiology Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University, Jerusalem 91120, Israel, Tel 972-2-6757203; Fax 9722-6757260; E-mail: [email protected]

© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Downloaded from http://mbe.oxfordjournals.org/ at Memorial Univ. of Newfoundland on August 4, 2014

Running head: Dual targeted proteins are more conserved

Abstract

In eukaryotic cells, identical proteins can be located in more than one subcellular compartment, a phenomenon termed dual targeting. We hypothesized that dual-targeted proteins should be more evolutionarily conserved than exclusively mitochondrial proteins, owing to the separate selective pressures exerted by the different compartments to maintain the functions associated with the protein sequences. We employed codon usage bias, gene loss propensity, phylogenetic relationships, conservation analysis at the DNA level and gene expression to test our hypothesis. Our findings indicate that, indeed, dual-targeted proteins are significantly more conserved than their exclusively targeted counterparts. We then used this trait of gene conservation, together with previously identified traits of dual targeted proteins [such as protein net charge and mitochondrial targeting sequence (MTS) strength], in order to i) create, for the first time (owing to the addition of conservation parameters), a tool for the prediction of dual targeted mitochondrial proteins based on protein and mRNA sequences, and ii) show that molecular mechanisms involving one versus two translation products are not correlated with specific dual targeting parameters. Finally, we discuss what evolutionary pressure maintains protein dual targeting in eukaryotes and conclude, as we initially hypothesized, that it is the discrete functions of these proteins in the different subcellular compartments, regardless of their dual targeting mechanism.

Introduction

It is well documented that in eukaryotic cells, molecules of one protein can be located in several subcellular locations, a phenomenon termed dual targeting, dual localization, or dual distribution. These identical or nearly identical forms of the protein localized to different subcellular compartments are termed echoforms or echoproteins (to distinguish them from isoforms/isoproteins) (Yogev and Pines 2011). The high abundance of dual targeting was recently supported by an independent experimental screen of 320 mitochondrial gene products [α-complementation (Ben-Menachem et al. 2011)]. In fact, we have recently estimated that one third of the mitochondrial proteome is dual targeted and that these dual targeted mitochondrial proteins differ from exclusively mitochondrial proteins in a number of traits (Dinur-Mills, Tal, and Pines 2008; Ben-Menachem et al. 2011). These distinctive properties of dual localized proteins include a lower probability of mitochondrial localization (according to MitoProtII prediction; Claros and Vincens 1996), a lower protein net charge, and an enrichment for proteins with weaker mitochondrial targeting sequences. Nevertheless, before this study, we had not succeeded in developing a search tool for mitochondrial dual targeted proteins.

Protein dual targeting can be achieved by a variety of molecular mechanisms, all described in depth in reviews on this topic (Karniely and Pines 2005; Avadhani 2011; Yogev and Pines 2011; Duchene and Giege 2012; Carrie and Small 2013; Carrie and Whelan 2013) and briefly summarized below. Dual targeting mechanisms can be divided into two types according to the number of translation products involved. Dual targeting by two translation products can occur when multiple mRNAs are derived from a single gene. This can be achieved either by alternative transcription initiation or by alternative mRNA splicing, in which the region coding for a targeting sequence is removed. One mRNA can also give rise to several proteins by translation initiation from a downstream in-frame start codon. In all these cases, two translation products (one containing and one lacking the targeting signal) are made and are targeted to different cellular locations. Dual targeting of a single translation product may be achieved, for example, by an ambiguous targeting sequence that can be recognized by more than one organelle. Similarly, two (or more) targeting signals on a single polypeptide can provide a mechanism of dual targeting; here the balance of echoprotein amounts between the different organelles is determined by the affinity of each signal for its target. Single translation products harboring one or more specific targeting signals can be dual targeted when one of the signals is inaccessible under certain conditions due to protein folding, modification or binding of another protein. Another unique dual targeting mechanism is "reverse translocation", in which all molecules are first targeted to mitochondria, begin their translocation and are processed by the mitochondrial processing peptidase (MPP), but then a subpopulation of the molecules moves back to the cytosol. There are several additional possible single-translation-product mechanisms, which are described elsewhere (Karniely and Pines 2005; Yogev and Pines 2011).
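The two-translation-product route via alternative translation initiation depends on a second in-frame start codon downstream of the annotated one. As a minimal sketch (a hypothetical helper, not the authors' pipeline), the code below scans a coding sequence codon by codon, assuming it begins at the annotated ATG, and reports the protein positions of downstream in-frame ATG codons that lie before the first in-frame stop. It does not test whether the shorter product actually lacks a functional MTS.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def downstream_inframe_starts(cds: str,
                              min_pos: int = 2,
                              max_pos: Optional[int] = None) -> List[int]:
    """Return 1-based protein positions of in-frame ATG codons after codon 1.

    Hypothetical helper: assumes `cds` starts at the annotated start codon.
    Scanning stops at the first in-frame stop codon, because a start codon
    beyond it could not yield an N-terminally truncated echoform.
    """
    cds = cds.upper()
    positions = []
    # Step through codons 2, 3, ... (nucleotide offsets 3, 6, ...).
    for offset in range(3, len(cds) - 2, 3):
        codon = cds[offset:offset + 3]
        if codon in STOP_CODONS:
            break
        position = offset // 3 + 1
        in_window = position >= min_pos and (max_pos is None or position <= max_pos)
        if codon == "ATG" and in_window:
            positions.append(position)
    return positions


# Toy sequence (not a real gene): Met-Ala-Met-Lys-Pro-Met-stop.
print(downstream_inframe_starts("ATGGCTATGAAACCCATGTAA"))             # [3, 6]
print(downstream_inframe_starts("ATGGCTATGAAACCCATGTAA", max_pos=5))  # [3]
```

To restrict the search to a window such as the positions 8-70 reported in the Results for the two-translation-product genes, one would pass `min_pos=8, max_pos=70`.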

We have previously pondered (Ben-Menachem et al. 2011) why dual targeting of mitochondrial proteins is so abundant. Is this due to leakiness of the targeting process, or to the two echoforms always having distinct functions? While leakiness of the targeting process could explain certain cases of dual targeting, there are a number of arguments against leakiness being a general principle of this phenomenon: 1) dual targeting is achieved by distinct but precise molecular mechanisms, as described above; 2) two thirds of the yeast mitochondrial proteome is exclusively localized to mitochondria, demonstrating that protein targeting is not an intrinsically leaky process; 3) dual localized mitochondrial proteins constitute a subgroup of mitochondrial proteins with distinctive properties, suggesting a nonrandom determination rather than a leaky process; and 4) targeting and translocation of proteins into mitochondria is very efficient, not leaky, as one deduces from the fact that unprocessed precursor proteins are not detected in cells. In this study we find that dual-targeted mitochondrial proteins are more evolutionarily conserved than exclusive or non-mitochondrial proteins. This finding that dual targeted proteins are under greater evolutionary pressure than exclusively targeted proteins supports our notion that in most cases dual targeting is associated with dual function. Our hypothesis is that, in order to maintain protein function and sufficiently high levels of a protein in two subcellular locations simultaneously, dual-targeted proteins are more conserved and have higher expression levels. Our findings led us to develop a search tool for dual targeted proteins.

Results

Dual-targeted proteins are more conserved than exclusive mitochondrial proteins

Dn/Ds ratio. One very common way to estimate conservation at the DNA level and evaluate evolutionary pressure is the Dn/Ds ratio (Nei and Gojobori 1986), the ratio of nonsynonymous substitutions per nonsynonymous site (Dn) to synonymous substitutions per synonymous site (Ds). A low Dn/Ds ratio indicates high conservation at the protein level. Initially we looked at the Dn/Ds distribution of mitochondrial reference set proteins taken from the MITOP2 database (Andreoli et al. 2004) and found significantly lower Dn/Ds values for dual targeted than for exclusively mitochondrial proteins (not shown). However, the simple Dn/Ds measure is based on the assumption that synonymous changes are selectively neutral, whereas synonymous substitutions are in fact subject to selection in favor of preferred codons (Hirsh, Fraser, and Wall 2005). Hirsh et al. suggested a yeast-specific adjustment of the Ds values to account for this effect, which we adopted. This correction (Dn/Ds') is based on the Codon Adaptation Index (CAI), which measures the use of rare codons and is correlated with protein expression levels. We analyzed all mitochondrial proteins (Dinur-Mills, Tal, and Pines 2008; Ben-Menachem et al. 2011) and found that 316 dual-targeted proteins are significantly more conserved (have a lower Dn/Ds' ratio) than 482 exclusively mitochondrial proteins (Table 1). As a control, we analyzed randomly selected groups of non-mitochondrial yeast proteins using the same approach. Only 41-43 of 1000 samples showed a Dn/Ds' trend similar to that found for the dual versus non-dual mitochondrial proteins, thereby supporting our statistical analysis (Table 1). As an additional control, we compared non-mitochondrial proteins with mitochondrial proteins using random samples of matching group size. As shown in Fig. 1, mitochondrial proteins are more conserved than non-mitochondrial proteins, while the dual-targeted subgroup of the mitochondrial proteins is the most conserved: all 1000 samples of the same size had higher Dn/Ds' values and higher medians than the dual targeted and the exclusive mitochondrial proteins, respectively (mean Kolmogorov-Smirnov p-value 0.928 and 0.816; mean Mann-Whitney p-value 0.988 and 0.891). Supplementary Fig. S1 depicts the distribution of Dn/Ds' as a box plot.

Above we used only S. cerevisiae strains to calculate the nonsynonymous and synonymous rates of substitution. Since the conservation of genes within the 29 S. cerevisiae strains is high (average pairwise sequence identity 97.5%), it is safe to assume that a protein is either dual targeted or exclusive in all strains. Nevertheless, because substitutions are in principle changes between species, we employed PAML (see Methods) with four species as an outgroup: S. paradoxus, S. bayanus, and S. mikatae. PAML successfully processed 784 of 798 genes, with an average pairwise sequence identity of 83.5%. We find that dual-targeted proteins are on average significantly more conserved than exclusive mitochondrial proteins (Table 1). These results using the outgroup are consistent with our previous findings using the 29 S. cerevisiae strains.

Expression levels and Codon Adaptation Index (CAI). The Codon Adaptation Index (CAI) referred to above measures the use of rare codons as a way to estimate the translation rate of a protein, which is itself subject to evolutionary selection. Since CAI has been correlated with other parameters, including evolutionary conservation and elevated expression (Sharp and Li 1987; Pal, Papp, and Hurst 2001), we tested whether dual-targeted and exclusive mitochondrial proteins differ in their CAI values. Dual targeted proteins have significantly higher CAI values than exclusive mitochondrial proteins, and significantly different medians (Table 2). In contrast, and as a control, random groups of non-mitochondrial proteins of similar size exhibited essentially no difference (Table 2). We also found that dual-targeted proteins have higher CAI values than non-mitochondrial proteins in 962 of 1000 samples (Kolmogorov-Smirnov mean p-value 0.010).

Both high expression levels and evolutionary conservation of genes are correlated with CAI (Hirsh, Fraser, and Wall 2005). We therefore tested the expression levels of mRNAs encoding mitochondrial proteins. First we looked at the absolute number of transcripts of particular genes in yeast cells, as published by Siwiak and Zielenkiewicz (parameter x in their paper) (Siwiak and Zielenkiewicz 2010). 271 dual-targeted proteins were found to have more transcripts than 436 exclusive mitochondrial proteins (p-value 0.0078, one-sided Kolmogorov-Smirnov test). Furthermore, we examined the transcript level data of Klockow et al. (Klockow et al. 2008), in which expression in wild type S. cerevisiae was examined at different glucose levels. We found that dual-targeted proteins indeed have higher transcript levels than exclusive mitochondrial proteins (p-value 0.015, one-sided Kolmogorov-Smirnov test; 295 dual-targeted proteins, 423 exclusive mitochondrial proteins). As discussed below, we hypothesize that the conservation of dual targeted proteins emanates from their dual function. The higher expression levels of dual targeted proteins support this notion, since there is a clear correlation between expression levels and the importance of function (e.g., Genome Res. Dec 2003; 13(12): 2686–2690). Thus, in our case, higher expression levels are required to maintain the proteins at functional levels in two subcellular locations simultaneously.

Number of homologs and the Propensity for Gene Loss (PGL). The analysis above is based on sequence alignment. Similar sequences of close species and of evolutionarily distant species are treated the same, although conservation clearly means something different in the two scenarios. To approach this question we analyzed two parameters: (1) the number of homologs each gene has and (2) the propensity for gene loss (PGL) (Krylov et al. 2003). It is important to emphasize that in the subsequent analyses, an organism with several paralogs was counted only once, as having an ortholog. Data on eukaryotic homologous proteins, extracted from Homologene (http://www.ncbi.nlm.nih.gov/homologene), revealed that dual targeted proteins have more homologs than exclusive mitochondrial proteins (8 versus 7, Table 3). Moreover, non-mitochondrial proteins have fewer homologs than mitochondrial proteins (median 4.5, mean 7.5, Table 3). The PGL (Krylov et al. 2003) is based on estimating the time during which a gene could have been lost, but was not, compared with the total time available; low values mean high conservation. The evolutionary tree topology and branch lengths were extracted from iTOL (Letunic and Bork 2011) for 15 of the 20 organisms in Homologene. PGL was then calculated as illustrated in Supplementary Fig. S2: each gene was assigned a binary vector of gene absence or presence in the organisms; the vectors were then projected onto the tree, and PGL values were calculated according to the tree topology and branch lengths. We found that dual targeted proteins have a lower median PGL than exclusive mitochondrial proteins, providing evidence that they are indeed more conserved over time (Table 3). The control data for non-mitochondrial proteins show that they have higher PGL values than mitochondrial proteins (Table 3).
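The PGL procedure above (presence/absence vectors projected onto a tree with branch lengths) can be illustrated with a simplified sketch. It assumes a rooted tree, a gene present at the root, and Dollo parsimony (a lost gene is never regained); under these assumptions it scores PGL as the total length of branches on which a loss occurred divided by the total length of branches on which the gene existed and so could have been lost. This formulation is our illustrative simplification, not necessarily the exact one of Krylov et al. (2003), and `Node`, `pgl` and the toy tree are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Rooted tree node; `length` is the branch length to the parent."""
    name: str
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def pgl(root: Node, present: set[str]) -> float:
    """Simplified propensity for gene loss (illustrative, Dollo-style).

    Assumes the gene existed at the root and was never regained, so a branch
    leading to a clade in which no species carries the gene holds one loss.
    Returns (length of loss branches) / (length of branches on which the
    gene existed and could have been lost); 0 means the gene was never lost.
    """
    lost = 0.0
    available = 0.0

    def visit(node: Node) -> bool:
        """Return True if any leaf below `node` carries the gene."""
        nonlocal lost, available
        if not node.children:
            return node.name in present
        flags = [visit(child) for child in node.children]
        node_has_gene = any(flags)
        for child, child_has_gene in zip(node.children, flags):
            if child_has_gene:
                available += child.length   # gene kept along this branch
            elif node_has_gene:
                available += child.length   # gene present at the top of the branch
                lost += child.length        # and lost somewhere along it
            # otherwise the gene was already lost higher up: branch not counted
        return node_has_gene

    if not visit(root):
        raise ValueError("gene absent from every species in the tree")
    return lost / available if available else 0.0


# Toy tree ((A:1,B:1):1,(C:2,D:2):1) with made-up branch lengths.
tree = Node("root", 0.0, [
    Node("X", 1.0, [Node("A", 1.0), Node("B", 1.0)]),
    Node("Y", 1.0, [Node("C", 2.0), Node("D", 2.0)]),
])
print(pgl(tree, {"A", "B", "C", "D"}))  # 0.0
print(pgl(tree, {"A", "B", "C"}))       # 0.25
```

A gene retained in every species scores 0, consistent with the statement that low PGL values mean high conservation.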

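The CAI values compared in the "Expression levels and Codon Adaptation Index" subsection follow Sharp and Li (1987): each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of w over its codons, skipping Met, Trp and stop codons. The sketch below assumes a toy reference set and a pseudocount of 0.5 for codons absent from it; the function names are hypothetical and this is not the software used in the study.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = {"".join(codon): aa
                for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq: str):
    """Yield successive in-frame codons of a sequence (upper-cased)."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        yield seq[i:i + 3]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = codon count / count of the most used synonymous codon.

    Met, Trp (single codon) and stop codons are excluded, as in CAI.
    Codons unseen in the reference set get `pseudocount` (an assumption).
    """
    counts = Counter(c for seq in reference_seqs for c in codons(seq))
    w = {}
    for aa in set(GENETIC_CODE.values()) - {"*", "M", "W"}:
        synonyms = [c for c, a in GENETIC_CODE.items() if a == aa]
        best = max(counts[c] for c in synonyms)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in synonyms:
            w[c] = max(counts[c], pseudocount) / best
    return w


def cai(seq: str, w) -> float:
    """Geometric mean of w over the scorable codons of `seq`."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (not real yeast data): GCT is the preferred Ala codon.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("ATGGCTGCT", w))         # 1.0
print(round(cai("GCTGCC", w), 4))  # 0.5774
```

A gene built only from preferred codons reaches the maximum CAI of 1, and each rarer codon pulls the geometric mean down.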
Development of a prediction tool for dual targeting. Support vector machines (SVMs) are a machine-learning approach for classifying data points into one of two possible categories (Vapnik 1995). The SVM is given a set of training examples, each marked as belonging to one of the two categories. It then builds a classification model that separates the two categories with the widest possible margin while penalizing misclassified training examples; the balance between these two goals is set by the C parameter. The trained SVM can then assign new examples to one category or the other. In our case we provided the SVM with a group of dual and exclusive mitochondrial proteins together with a set of parameters related to dual targeting, which were found to differ significantly between the two groups. The input vector for the SVM is composed of seven parameters that were found to be distinctive. Four were derived from the protein and targeting (MTS) sequences and were identified in previous studies (Dinur-Mills, Tal, and Pines 2008): (i) probability of being targeted to mitochondria (MitoProtII score), (ii) total net charge of the protein, (iii) the hydrophobic moment (μHd), used as a measure of the helical amphiphilicity of the targeting sequence, and (iv) the number of positively charged residues within the N-terminal targeting sequence. To these four parameters we added three parameters referring to the conservation of dual targeted proteins, as analyzed in the previous sections: (v) CAI, (vi) PGL, and (vii) the number of organisms with a homologous gene (as found in Homologene). The Dn/Ds' value relies on the availability of two sequences for comparison, as well as on their similarity, and in extreme cases (very high or very low identity) results in null values. Thus, so that our SVM would generalize, Dn/Ds' values were not included. Having only seven potential features allowed us to select an SVM model by a "grid search", an exhaustive search for the best SVM parameters. We tried 154 SVM types built from three types of kernel function (linear, polynomial and RBF) and eleven values of the C parameter. Each type was evaluated by 5-fold cross validation, repeated 10 times as described by Yahalom et al. (Yahalom et al. 2011). Performance measures were averaged over each 5-fold run and then over the 10 repeats to obtain a single value for each SVM type. As only one third of the proteins are dual targeted, while two thirds are exclusive mitochondrial proteins, our data are unbalanced. Therefore, we used the MCC (Matthews Correlation Coefficient, Equation 1) as the primary quality measure and accuracy (Equation 2) as a secondary measure. Every predictor has correct classifications for both groups (true positives, TP, and true negatives, TN) as well as errors (false positives, FP, and false negatives, FN). The same accuracy can be achieved by many combinations of TP and TN. However, a predictor that poorly predicts one class (small TP or TN) is clearly biased and less informative; in such cases the MCC reflects this bias and takes a negative value or one very close to zero. The MCC, in contrast to accuracy, is therefore often used as a quality measure of binary classifiers on unbalanced datasets.

(Eq. 1) $$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

(Eq. 2) $$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

The training set was the same as in the first section of the article: 798 mitochondrial proteins, of which 316 are dual targeted and 482 exclusively mitochondrial. The model with the best average MCC was linear with a C parameter value of 8 (MCC 0.32, accuracy 66.6%), but only one of the 50 models of this type converged; the others had still not converged after the maximum number of iterations (15,000 by default in Matlab). Since this contradicts the idea of using repeats to select a good general model, other models were used for further performance evaluation. The runner-up in average MCC was an RBF type with a C parameter of 2^-5 and a gamma parameter of 8, with an average MCC of 0.26 and an average accuracy of 66.07%. We then used accuracy as a secondary measure to choose one SVM of the 50 of this type. The most accurate SVM of this type had an MCC of 0.38 and an accuracy of 73.8%. We also chose the SVM model with the best accuracy out of all 7700 runs, which was also the model with the best MCC (accuracy 75%, MCC 0.414). This second-degree polynomial SVM has a C parameter of 2^-5; the average MCC and average accuracy of its type are 0.24 and 65.2%, respectively. To examine the contribution of each parameter to the prediction, we removed one parameter at a time by replacing its values with zeros, and applied the SVM models to our 798 proteins. None of the seven runs performed better than the original run (in which all seven parameters were incorporated) in terms of accuracy or MCC. Supplementary Table 3 lists the accuracy, MCC, NPV and PPV values (negative and positive predictive values, respectively). Accordingly, we used all seven parameters to generate our SVMs.

We evaluated the two chosen SVMs by comparing them to two predictors: YLoc (Briesemeister, Rahnenfuhrer, and Kohlbacher 2010) and PSortII (Nakai and Horton 1999). These predictors output the probability of a protein being located in each of several locations (YLoc, four locations; PSortII, eleven locations). A probability for a specified location l is defined as high if it is higher than the average probability for location l in the tested dataset. We used three cutoffs to label a protein as dual targeted: (i) the mitochondrial and cytoplasmic locations have the highest probabilities for this protein; (ii) mitochondria and cytoplasm both have a non-zero probability for this protein; and (iii) the mitochondrial location is high and at least one of the other locations is high. First we evaluated our performance on the original dataset of 798 proteins. Our SVMs' predictions had 67.2% and 68.7% accuracy and MCC values of 0.28 and 0.32, respectively (Supplementary Table 4). This outperforms all six classifications based on YLoc and PSortII, in both accuracy and MCC (accuracy between 38.5% and 57.5%, and negative MCC values between -0.20 and -0.02). In addition, we examined our SVMs and the YLoc and PSortII predictors on a different mitochondrial proteome and dual targeting dataset: the UNIPROT database designates 709 yeast proteins as mitochondrial (based on experimental location data), of which 107 are not part of our training set. We used UNIPROT localization to classify these into 51 dual targeted and 56 exclusively mitochondrial proteins. For this purpose we treated the UNIPROT mitochondrion and mitochondrion-related locations (such as mitochondrion membrane, and mitochondrion inner and outer membrane) together as a single mitochondrial location. Our SVM models performed better than YLoc and PSortII: we obtained a best accuracy of 70.1% and an MCC of 0.4, whereas the best performance of YLoc or PSortII was an accuracy of 50.5% with an MCC of -0.09 (Supplementary Table 4).

Experimental identification of possible types of distribution mechanisms

Above we have established evolutionary conservation as a trait enriched in mitochondrial dual targeted proteins. One interpretation of this finding is that the evolutionary driving force for the dual localization of these proteins is the separate functions of the echoforms in each subcellular compartment. There are more than ten different possible mechanisms by which dual targeting can occur [see Introduction; reviewed in (Karniely and Pines 2005; Yogev and Pines 2011)]. If dual function is in fact the selective pressure for dual targeting, one would predict that the mechanism by which the proteins are targeted should not matter. In other words, we would not expect any correlation between the dual targeting mechanism and the parameters used to predict dual targeting. To examine this question, we set out to identify proteins with a mechanism of two translation products, or with a mechanism of a single translation product that nevertheless harbors the information for targeting to two locations. We focused on proteins that were suggested to be dual targeted by the α-complementation assay (Ben-Menachem et al. 2011), to allow easier downstream experimental analysis based on the α-fragment tag. This assay is extremely sensitive, allowing detection of minute amounts of protein in a specific subcellular compartment, as in cases of "eclipsed distribution", in which one of the echoforms has barely detectable amounts of its molecules in one of the compartments (Regev-Rudzki et al. 2005; Regev-Rudzki and Pines 2007). The precise choice of the proteins for the analysis is described in the Supplementary Material.

Two types of experiments were designed to test for dual targeting of one or two translation products: i) Can the protein be detected (using anti-α antiserum) in the two compartments, mitochondria and cytosol, by subcellular fractionation? and ii) Can two forms of the protein be detected when import and subsequent processing in mitochondria are blocked with a mitochondrial membrane potential uncoupler (e.g., CCCP)? The interpretation is based on the detection of two protein sizes, the longer being the unprocessed mitochondrial precursor and the shorter, lacking the MTS, being the cytosolic echoform. The first five proteins in Fig. 2 (OSM1, PRD1, GPD2, GLO4 and MGE1) appear to be distributed by a two-translation-product mechanism; all of them have two translation products according to metabolic labeling experiments (Fig. 2A, right panels; arrows indicate precursor and mature forms). OSM1, PRD1, GPD2 and GLO4 can also be detected in both mitochondria and cytosol by subcellular fractionation, whereas MGE1 cannot, and shows eclipsed distribution according to the α-complementation screen (Ben-Menachem et al. 2011). These five genes indeed encode a second, downstream in-frame methionine codon (between positions 8 and 70 of the protein), and the data (Fig. 2A) indicate a general two-translation-product mechanism. A second subgroup of proteins, 10 of the 27, can be clearly detected in two compartments by subcellular fractionation, with a significant portion of the molecules in the cytosol (Fig. 2B, left panels). Metabolic labeling experiments for these proteins detect only a single band in SDS-PAGE upon blockage of mitochondrial import and subsequent processing by MPP (Fig. 2B, right panels). These results indicate a single-translation-product mechanism. The full description of the identification of the distribution mechanisms is provided in the Supplementary Material (including reference to Fig. 2C), while its significance is considered in the Discussion.
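Equations 1 and 2, used above to rank the SVM types, can be computed directly from a confusion matrix. The sketch below (hypothetical function names; the convention of returning 0 when a marginal sum is zero is a common choice, not one stated in the text) also illustrates the argument for preferring MCC on unbalanced data: two made-up predictors on the 316 dual / 482 exclusive training set reach the same accuracy, but only the one that actually detects dual targeted proteins gets a non-zero MCC.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN); True marks the positive (dual targeted) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Equation 2: fraction of correct calls."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Equation 1; returns 0.0 when any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Two hypothetical predictors on 316 dual / 482 exclusive proteins.
always_exclusive = (0, 482, 0, 316)   # never predicts "dual"
split_guess = (158, 324, 158, 158)    # finds half the duals, some false alarms
for counts in (always_exclusive, split_guess):
    print(round(accuracy(*counts), 3), round(mcc(*counts), 3))
# 0.604 0.0
# 0.604 0.172
```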

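The three cutoffs used to turn YLoc and PSortII location probabilities into dual targeting calls can be sketched as follows. The location labels, the reading of cutoff (i) as "mitochondrion and cytoplasm are the two top-ranked locations", and the function names are assumptions for illustration; "high" follows the text, meaning above the mean probability for that location across the tested dataset.

```python
from statistics import mean

MITO, CYTO = "mitochondrion", "cytoplasm"  # hypothetical location labels


def location_thresholds(preds):
    """Mean probability per location over the whole tested dataset."""
    locations = {loc for probs in preds.values() for loc in probs}
    return {loc: mean([probs.get(loc, 0.0) for probs in preds.values()])
            for loc in locations}


def dual_calls(preds):
    """Apply cutoffs (i)-(iii) to each protein; returns a tuple of booleans."""
    thresholds = location_thresholds(preds)
    calls = {}
    for protein, probs in preds.items():
        ranked = sorted(probs, key=probs.get, reverse=True)
        # (i) mitochondria and cytoplasm are the two most probable locations
        rule_i = set(ranked[:2]) == {MITO, CYTO}
        # (ii) both mitochondria and cytoplasm have a non-zero probability
        rule_ii = probs.get(MITO, 0.0) > 0 and probs.get(CYTO, 0.0) > 0
        # (iii) mitochondria is "high" and at least one other location is "high"
        high = {loc for loc, p in probs.items() if p > thresholds[loc]}
        rule_iii = MITO in high and len(high - {MITO}) > 0
        calls[protein] = (rule_i, rule_ii, rule_iii)
    return calls


# Made-up predictor output for three proteins.
preds = {
    "P1": {MITO: 0.6, CYTO: 0.3, "nucleus": 0.1},
    "P2": {MITO: 0.9, CYTO: 0.0, "nucleus": 0.1},
    "P3": {MITO: 0.1, CYTO: 0.2, "nucleus": 0.7},
}
print(dual_calls(preds))
```

Because the "high" threshold is a dataset mean, cutoff (iii) depends on which proteins are scored together, whereas cutoffs (i) and (ii) look at each protein in isolation.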
Discussion

Recent studies have established dual targeting as a highly abundant phenomenon (Ast et al.; Carrie, Giraud, and Whelan 2009; Avadhani 2011; Avadhani et al. 2011; Ben-Menachem et al. 2011; Raza 2011; Yogev and Pines 2011; Duchene and Giege 2012; Carrie and Small 2013; Carrie and Whelan 2013). We have previously shown that dual targeted mitochondrial proteins are enriched for specific traits, such as protein net charge and MTS strength, suggesting a nonrandom mechanistic determination and not just a leaky process. In this study, we have identified features enriched in dual targeted mitochondrial proteins that indicate their increased conservation in evolution. These parameters include a low Dn/Ds (ratio of nonsynonymous to synonymous substitution rates), high transcript levels, a high CAI (Codon Adaptation Index), a high number of homologs, and a low PGL (Propensity for Gene Loss), all emphasizing the relative importance of these proteins. We hypothesized that dual-targeted proteins should be more evolutionarily conserved than exclusive mitochondrial proteins, due to separate selective pressures for the presence of the protein in the different compartments. Indeed, we demonstrate that for all parameters, dual targeted proteins are on average significantly more conserved than exclusive mitochondrial proteins, clearly supporting our hypothesis. These results provide an intriguing connection between protein localization, function and evolution, and are consistent with general findings showing that conservation is correlated with functionality (Suzuki and Saitou 2011; Zhang and Broughton 2013).

We have previously discussed why dual targeting is so abundant (Ben-Menachem et al. 2011). One possibility, which emanates from the discussion above, is that dual targeting of a single protein to two compartments relieves the requirement for two genes encoding two different proteins. In other words, echoforms have distinct functions in their respective compartments, and therefore dual targeting is under purifying selection. Our findings that dual localized proteins are more evolutionarily conserved certainly support this notion. There are many examples of dual functions of dual targeted proteins: in yeast, Mod5 is required for modification of tRNAs in the nucleus and mitochondria (Gillman et al. 1991); aconitase and malate dehydrogenase are required for the TCA cycle and the glyoxylate shunt in the mitochondria and cytosol, respectively (Small and McAlister-Henn 1997; Regev-Rudzki et al. 2005); and p450 enzymes are required in the mitochondria and ER (Avadhani et al. 2011). Glutathione S-transferases are present in several compartments as part of an adaptive response against the toxicity of endogenous and exogenous metabolites (Raza 2011), and the transcription factor STAT3 was shown to be localized to mitochondria and the nucleus, where it is associated with Ras-dependent oncogenic transformation (Gough et al. 2009; Erlich et al. 2014). In plants, numerous proteins whose dual functions are known are distributed between mitochondria and chloroplasts (Carrie, Giraud, and Whelan 2009; Carrie and Small 2013). While there are numerous reports of distinct functions of echoforms in their respective compartments, this cannot presently be claimed for all dual localized proteins. A scenario that we and others have previously considered (Martin 2010; Ben-Menachem et al. 2011) is that some echoforms may not have a second function. One possibility is that there is a general evolutionary advantage to a pool of dual localized proteins in eukaryotic cells. For example, echoforms could be evolutionary substrates for the acquisition of novel activities on a single polypeptide (moonlighting), localized to more than one compartment, without initially requiring gene duplication. The "price" for this would not be high; all that would be needed is to allow a small percentage of protein molecules to reside in, or "explore", a different compartment of the cell. However, in such a case we would not expect the function-driven double selective pressure on the protein sequence that we in fact observe. Hence, our interpretation of the evolutionary conservation of dual targeted mitochondrial proteins is that most dual targeted proteins have functional echoforms in each compartment. Only many more single-gene studies, including deciphering the molecular mechanisms of dual targeting, will allow full support of this interpretation.

Our previous efforts to create a predictive tool for dual targeting of mitochondrial proteins based on protein sequence features alone were unsuccessful in distinguishing dual targeted mitochondrial proteins. In the second part of this study we generated an SVM using four parameters based on the sequence of proteins and their MTSs, which were found to be distinctive in previous studies [features i-iv (Dinur-Mills, Tal, and Pines 2008; Ben-Menachem et al. 2011)], as well as new features of protein conservation (v-vii): (i) probability of being targeted to mitochondria (MitoProtII score), (ii) total net charge of the protein, (iii) the hydrophobic moment of the MTS (μHd), (iv) the number of positively charged residues within the MTS, (v) Codon Adaptation Index, CAI, (vi) Propensity for Gene Loss, PGL, and (vii) the number of organisms with a homologous gene (as found in Homologene). We achieved accuracies of 69-70% and MCCs of 0.33-0.40. We evaluated our SVMs by comparing them to two predictors, YLoc and PSortII, using our mitochondrial proteome and a validation dataset (107 proteins). Our SVM predictions had accuracy and MCC values better than all the classifications based on YLoc and PSortII. In the future, identification of additional traits that distinguish between dual solely on the coding sequence.

In the third part of this study we identified a general dual targeting mechanism type, involving one or two translation products, for about 55.6% (15/27) of the proteins examined. These proteins, which are dual targeted according to our α-complementation screen, were analyzed by Western blotting and labeling experiments. For about 18.5% (5/27) we detected two translation products, while 37% (10/27) yielded a single translation product. Our findings may provide an initial approach to the question of whether dual targeting mechanisms are reflected in the choice of parameters used to predict dual targeting. The 15 proteins from this study that were categorized as encoding one or two translation products were added to 19 proteins from our dual targeted reference set (Dinur-Mills, Tal, and Pines 2008) to create a list of 34 proteins, of which 18 and 16 are proposed to encode one or two translation products, respectively. A heat map showing the distribution of the various parameter values between these two mechanisms does not reveal any distinctive pattern (Supplementary Fig. S3). In other words, there does not appear to be an enrichment of parameter values in one group ("one translation product") versus the other ("two translation products"). For each parameter, the differences between the values associated with each mechanism were not significant (Kolmogorov-Smirnov test). Consequently, we argue that our current parameters, which are used to define the two groups, ensue from the function and final location of the proteins and not from the mechanism by which they are targeted. A good example is fumarase (Yogev, Naamati, and Pines 2011), a highly conserved enzyme at the level of primary sequence (~60% identity) and function, which is nevertheless dual targeted by different mechanisms in the yeast Saccharomyces cerevisiae (single translation product), in human (two translation products) and in the plant Arabidopsis thaliana (two genes).

Downloaded from http://mbe.oxfordjournals.org/ at Memorial Univ. of Newfoundland on August 4, 2014

and exclusive mitochondrial proteins may further improve our prediction capabilities based
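Accuracy and MCC both summarize a binary confusion matrix. The minimal sketch below is not the authors' SVM pipeline. It computes both measures from hypothetical counts and treats "dual targeted" as the positive class; the function name and the counts are illustrative assumptions.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, Matthews correlation coefficient) from confusion-matrix counts.

    Assumption for illustration: "positive" = dual targeted,
    "negative" = exclusively mitochondrial.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC = 0 when any row or column sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Hypothetical counts for 100 proteins (not results from this study).
acc, mcc = accuracy_and_mcc(tp=40, tn=30, fp=10, fn=20)
print(f"accuracy = {acc:.2f}, MCC = {mcc:.2f}")  # accuracy = 0.70, MCC = 0.41
```

MCC ranges from -1 to 1, and a random predictor scores about 0. An accuracy near 70% can therefore correspond to a modest MCC of around 0.4, especially when the two classes differ in size.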

Materials and Methods

Saccharomyces cerevisiae strain sequences were downloaded separately for each gene from the sequence alignment pages of the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org). SGD contains 6607 nuclear-encoded ORFs: 801 mitochondrial, according to our previous studies (Ben-Menachem et al. 2011), and 5806 non-mitochondrial. From the latter we omitted uncharacterized and dubious ORFs. We also omitted ORFs with a sequence from only one strain (1 mitochondrial and 24 non-mitochondrial) and 2 mitochondrial ORFs that are aliases of each other (YLR154W-A and YLR154W-C, see Supplementary Table 1). Our final analysis included 798 mitochondrial proteins and 4272 non-mitochondrial proteins. Sequences of the MIT yeast species S. cerevisiae, S. paradoxus, S. bayanus and S. mikatae were downloaded from SGD (http://downloads.yeastgenome.org/genomics/alignment/MIT_Spar_Sbay_Smik_Scer/).

Dn/Ds' calculations. EMBOSS and ClustalW2 (Rice, Longden, and Bleasby 2000; Larkin et al. 2007) were used in three steps. First, the DNA sequences were translated. Second, a multiple alignment of the protein sequences was created. Third, the DNA sequences were rewritten accordingly, so that they were codon-aligned to each other. For the S. cerevisiae strains, pairwise Dn/Ds values were calculated by applying the Matlab dnds function, with default parameters, to each pair in turn. For the yeast species, pairwise Dn/Ds values were calculated with PAML (Yang 2007) using model 0 and runmode 0. Dn/Ds' is Dn/Ds with a correction based on Hirsh et al. (Hirsh, Fraser, and Wall 2005), calculated as Ds' = Ds - m × CAI. Here m = -2.02 (as in Hirsh et al.) and CAI is the CAI value of the second sequence in the pair. Each gene was assigned the average of all its pairwise Dn/Ds' values as its single representative Dn/Ds' value.

Codon Adaptation Index (CAI) values were calculated using EMBOSS with the default codon frequency table of S. cerevisiae (Sharp and Li 1987). Each gene received the average CAI value of all the strains' sequences.

Eukaryotic homologs were extracted from the Homologene database (build 65, http://www.ncbi.nlm.nih.gov/homologene). The number of homologous genes is the number of organisms, out of 19, that had at least one verified homologous protein (i.e., one not assigned as a predicted protein). Organisms with more than one homologous protein for a specific gene (paralogs) were counted only once for the calculation of the "number of orthologs" and of PGL. Prokaryotic homologs were identified with blastp (Altschul et al. 1990; Camacho et al. 2009), limited to homologs of taxonomy id 2 (Bacteria). Only proteins with an e-value < 0.001 were considered homologs. The propensity for gene loss (PGL) calculation was adopted from Krylov et al. (Krylov et al. 2003) and implemented in a Matlab script. Phylogenetic tree topology and branch lengths were taken from iTOL (Letunic and Bork 2011). Only 15 of the 20 organisms were used, because the other organisms were not found in the iTOL tree of life (Supplementary Table 2). S. cerevisiae was used as the tree root for this analysis.

Transcript levels were taken from Klockow et al. (Klockow et al. 2008) and downloaded from SGD. Values for genes that appeared twice were averaged. The median of the 7 samples was then taken as the single value per gene and used to represent transcript levels. The absolute number of gene transcripts in a yeast cell was taken from Siwiak and Zielenkiewicz (parameter x in their paper) (Siwiak and Zielenkiewicz 2010).

Statistical analysis. As control groups, we drew 1000 random samples of non-mitochondrial proteins, each divided into two disjoint sets of the appropriate sizes. The distributions of two groups of the same size (mitochondrial and non-mitochondrial) were compared to each other to validate that the tested trait is unique to mitochondrial proteins (dual targeted or exclusively mitochondrial). In addition, all statistical tests were conducted on the disjoint sets of non-mitochondrial proteins as if they were mitochondrial. This shows that the statistical significance is due to the values of the tested trait rather than to the sizes of the protein groups. Two distributions were compared by two tests:
- a one-sided Kolmogorov-Smirnov test (Matlab kstest2 function), which determines whether the CDF (cumulative distribution function) of the first group is larger than the CDF of the second group;
- a Mann-Whitney test (Matlab ranksum function), which determines whether the medians of two groups are significantly different from one another.
All tests, including the random sampling, were performed using Matlab.

Strains and Plasmids. S. cerevisiae strain BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) and all the plasmids used in this work have been described previously (Ben-Menachem et al. 2011).

Subcellular Fractionation. Induced yeast cultures (in galactose medium) were grown to an absorbance (A) of 1.5 at 600 nm. Spheroplasts were prepared in the presence of Zymolyase-20T (MP Biomedicals, Irvine, CA), and mitochondria were isolated as described previously (Knox et al. 1998). The quality of subcellular fractionation was monitored by Western blot analysis using anti-Hsp60 antibody (mitochondrial marker), anti-hexokinase 1 (anti-HxK1) antibody (cytosolic marker) and anti-Kar2 (ER marker).

Metabolic labeling. Induced cultures (galactose medium) were harvested, labeled with 10 μCi/ml [35S]methionine and incubated for 30 min at 30°C. Where indicated, carbonyl cyanide m-chlorophenyl hydrazone (CCCP) was added to a final concentration of 20 μM before labeling. Labeling was stopped by adding 10 mM sodium azide. Labeled cells were collected by centrifugation and resuspended in Tris/EDTA buffer, pH 8.0, containing 1 mM phenylmethylsulfonyl fluoride. Cells were broken with glass beads and centrifuged to obtain the supernatant fraction. Supernatants were denatured by boiling in 1% SDS, immunoprecipitated with anti-α-fragment rabbit antiserum and protein A-Sepharose (Amersham Biosciences), and then analyzed by SDS-PAGE and autoradiography.

Acknowledgments
We thank Ora Schueler-Furman and Yoav Smith for their help and discussions throughout this study. We thank Hanah Margalit and Sigal Ben Yehuda for critical reading of our manuscript. This research was supported by grants from the Israel Science Foundation (ISF), the Israel Cancer Research Fund (ICRF), the USA–Israel Binational Science Foundation (BSF), and the CREATE project of the National Research Foundation of Singapore. Adi Naamati was supported by the Ori Foundation Fellowship.

References
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J Mol Biol 215:403-410.

Andreoli, C., H. Prokisch, K. Hortnagel, J. C. Mueller, M. Munsterkotter, C. Scharfe, and T. Meitinger. 2004. MitoP2, an integrated database on mitochondrial proteins in yeast and man. Nucleic Acids Res 32:D459-462.
Ast, J., A. C. Stiebler, J. Freitag, and M. Bolker. Dual targeting of peroxisomal proteins. Front Physiol 4:297.
Avadhani, N. G. 2011. Targeting of the same proteins to multiple subcellular destinations: mechanisms and physiological implications. FEBS J 278:4217.
Avadhani, N. G., M. C. Sangar, S. Bansal, and P. Bajpai. 2011. Bimodal targeting of cytochrome P450s to endoplasmic reticulum and mitochondria: the concept of chimeric signals. FEBS J 278:4218-4229.
Ben-Menachem, R., M. Tal, T. Shadur, and O. Pines. 2011. A third of the yeast mitochondrial proteome is dual localized: a question of evolution. Proteomics 11:4468-4476.
Briesemeister, S., J. Rahnenfuhrer, and O. Kohlbacher. 2010. YLoc--an interpretable web server for predicting subcellular localization. Nucleic Acids Res 38:W497-502.
Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, and T. L. Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421.
Carrie, C., E. Giraud, and J. Whelan. 2009. Protein transport in organelles: Dual targeting of proteins to mitochondria and chloroplasts. FEBS J 276:1187-1195.
Carrie, C., and I. Small. 2013. A reevaluation of dual-targeting of proteins to mitochondria and chloroplasts. Biochim Biophys Acta 1833:253-259.
Carrie, C., and J. Whelan. 2013. Widespread dual targeting of proteins in land plants: when, where, how and why. Plant Signal Behav 8.
Claros, M. G., and P. Vincens. 1996. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur J Biochem 241:779-786.
Dinur-Mills, M., M. Tal, and O. Pines. 2008. Dual targeted mitochondrial proteins are characterized by lower MTS parameters and total net charge. PLoS One 3:e2161.
Duchene, A. M., and P. Giege. 2012. Dual localized mitochondrial and nuclear proteins as gene expression regulators in plants? Front Plant Sci 3:221.
Erlich, T. H., Z. Yagil, G. Kay, A. Peretz, H. Migalovich-Sheikhet, S. Tshori, H. Nechushtan, F. Levi-Schaffer, A. Saada, and E. Razin. 2014. Mitochondrial STAT3 plays a major role in IgE-antigen-mediated mast cell exocytosis. J Allergy Clin Immunol.
Gillman, E. C., L. B. Slusher, N. C. Martin, and A. K. Hopper. 1991. MOD5 translation initiation sites determine N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA. Mol Cell Biol 11:2382-2390.
Gough, D. J., A. Corlett, K. Schlessinger, J. Wegrzyn, A. C. Larner, and D. E. Levy. 2009. Mitochondrial STAT3 supports Ras-dependent oncogenic transformation. Science 324:1713-1716.
Hirsh, A. E., H. B. Fraser, and D. P. Wall. 2005. Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol 22:174-177.
Karniely, S., and O. Pines. 2005. Single translation--dual destination: mechanisms of dual protein targeting in eukaryotes. EMBO Rep 6:420-425.
Klockow, C., F. Stahl, T. Scheper, and B. Hitzmann. 2008. In vivo regulation of glucose transporter genes at glucose concentrations between 0 and 500 mg/L in a wild type of Saccharomyces cerevisiae. J Biotechnol 135:161-167.
Knox, C., E. Sass, W. Neupert, and O. Pines. 1998. Import into mitochondria, folding and retrograde movement of fumarase in yeast. J Biol Chem 273:25587-25593.
Krylov, D. M., Y. I. Wolf, I. B. Rogozin, and E. V. Koonin. 2003. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res 13:2229-2235.
Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson, and D. G. Higgins. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947-2948.
Letunic, I., and P. Bork. 2011. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res 39:W475-478.
Martin, W. 2010. Evolutionary origins of metabolic compartmentalization in eukaryotes. Philos Trans R Soc Lond B Biol Sci 365:847-855.
Nakai, K., and P. Horton. 1999. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24:34-36.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3:418-426.
Pal, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927-931.
Raza, H. 2011. Dual localization of glutathione S-transferase in the cytosol and mitochondria: implications in oxidative stress, toxicity and disease. FEBS J 278:4243-4251.
Regev-Rudzki, N., S. Karniely, N. N. Ben-Haim, and O. Pines. 2005. Yeast aconitase in two locations and two metabolic pathways: seeing small amounts is believing. Mol Biol Cell 16:4163-4171.
Regev-Rudzki, N., and O. Pines. 2007. Eclipsed distribution: a phenomenon of dual targeting of protein and its significance. Bioessays 29:772-782.
Rice, P., I. Longden, and A. Bleasby. 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16:276-277.
Sharp, P. M., and W. H. Li. 1987. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281-1295.
Siwiak, M., and P. Zielenkiewicz. 2010. A comprehensive, quantitative, and genome-wide model of translation. PLoS Comput Biol 6:e1000865.
Small, W. C., and L. McAlister-Henn. 1997. Metabolic effects of altering redundant targeting signals for yeast mitochondrial malate dehydrogenase. Arch Biochem Biophys 344:53-60.
Suzuki, R., and N. Saitou. 2011. Exploration for functional nucleotide sequence candidates within coding regions of mammalian genes. DNA Res 18:177-187.
Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer, Berlin.
Yahalom, R., D. Reshef, A. Wiener, S. Frankel, N. Kalisman, B. Lerner, and C. Keasar. 2011. Structure-based identification of catalytic residues. Proteins 79:1952-1963.
Yang, Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586-1591.
Yogev, O., A. Naamati, and O. Pines. 2011. Fumarase: a paradigm of dual targeting and dual localized functions. FEBS J 278:4230-4242.
Yogev, O., and O. Pines. 2011. Dual targeting of mitochondrial proteins: mechanism, regulation and function. Biochim Biophys Acta 1808:1012-1020.
Zhang, F., and R. E. Broughton. 2013. Mitochondrial-nuclear interactions: compensatory evolution or variable functional constraint among vertebrate oxidative phosphorylation genes? Genome Biol Evol 5:1781-1791.

Table 1: Dn/Ds' of mitochondrial dual-targeted and exclusive proteins, and of non-mitochondrial proteins

Parameter | Protein group | Group size | Median | Mean | Kolmogorov-Smirnov 1-tail p-value | Mann-Whitney 1-tail p-value
Dn/Ds' ratio (S. cerevisiae strains) | Mitochondrial dual targeted | 316 | 0.010 | 0.012 | 0.043 | 0.041
Dn/Ds' ratio (S. cerevisiae strains) | Mitochondrial exclusive | 482 | 0.022 | 0.024 (c) | |
Dn/Ds' ratio (S. cerevisiae strains) | Non-mitochondrial randomly selected | 316 and 482 | 0.013 (a) | 0.032 (a) | 0.519 (a, b) | 0.510 (a, b)
Dn/Ds' ratio (four species) | Mitochondrial dual targeted | 310 | 0.055 | 0.07 | 0.00073 | 0.005
Dn/Ds' ratio (four species) | Mitochondrial exclusive | 474 | 0.068 | 0.08 | |
Dn/Ds' ratio (four species) | Non-mitochondrial randomly selected | 310 and 474 | 0.05 (a) | 0.06 (a) | 0.517 (a, d) | 0.509 (a, d)

a) Average of 1000 random samples.
b) Of the 1000 random samples, 41 had a Kolmogorov-Smirnov p-value below 0.043 and 39 had a Mann-Whitney p-value below 0.041.
c) Three proteins with a value of infinity were removed.
d) None had a p-value below the original values (0.00073, 0.005).
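Each Dn/Ds' value in Table 1 starts from a set of pairwise Dn, Ds and CAI values for one gene. The correction described in Materials and Methods, Ds' = Ds - m × CAI with m = -2.02, is applied to each pair using the CAI of the second sequence. The resulting ratios are then averaged over all pairs. The minimal Python sketch below shows only this bookkeeping. It assumes the pairwise Dn, Ds and CAI values were already computed (in the study, by Matlab dnds or PAML, which are not reproduced here), and the function names are hypothetical.

```python
import math

M_HIRSH = -2.02  # slope m quoted in Materials and Methods (after Hirsh et al. 2005)


def corrected_ds(ds: float, cai_second: float, m: float = M_HIRSH) -> float:
    """Ds' = Ds - m * CAI; with m = -2.02 this adds 2.02 * CAI to Ds."""
    return ds - m * cai_second


def gene_dnds_prime(pairs: list[tuple[float, float, float]]) -> float:
    """Average Dn/Ds' over all pairwise comparisons of one gene.

    Each tuple is (Dn, Ds, CAI of the second sequence in the pair).
    In this sketch a zero Ds' gives an infinite ratio (math.inf).
    """
    ratios = []
    for dn, ds, cai_second in pairs:
        ds_prime = corrected_ds(ds, cai_second)
        ratios.append(dn / ds_prime if ds_prime != 0 else math.inf)
    return sum(ratios) / len(ratios)


# Hypothetical pairwise values for one gene (not data from this study).
example = [(0.02, 0.30, 0.20), (0.03, 0.40, 0.25)]
print(round(gene_dnds_prime(example), 4))  # → 0.0308
```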

Table 2: Comparison of mean CAI values between mitochondrial dual targeted proteins, mitochondrial exclusive proteins and non-mitochondrial proteins

Parameter | Protein group | Group size | Median | Mean | Kolmogorov-Smirnov 1-tail p-value | Mann-Whitney 1-tail p-value
CAI | Mitochondrial dual targeted | 316 | 0.250 | 0.297 | 3.41e-6 | 5.97e-7
CAI | Mitochondrial exclusive | 482 | 0.230 | 0.247 | |
CAI | Non-mitochondrial randomly selected | 316 and 482 | 0.236 (a) | 0.272 (a) | 0.515 (a, b) | 0.507 (a, b)

a) Average of 1000 samples.
b) None had a p-value below the original values (3.41e-6, 5.97e-7).
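Table 2 summarizes CAI values. CAI (Sharp and Li 1987) is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w. The value w is the codon's count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. The values in this study were computed with EMBOSS and its default S. cerevisiae table; the stand-alone Python sketch below only illustrates the definition. The toy reference sequence is an assumption, and so is the pseudocount of 0.5, which avoids log(0) for codons absent from the reference. Met and Trp codons, which have no synonymous alternatives, are excluded along with stop codons.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
EXCLUDED = set("MW*")  # single-codon amino acids and stop codons


def codons(seq: str) -> list[str]:
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs: list[str], pseudocount: float = 0.5) -> dict[str, float]:
    """w(codon) = count(codon) / count(most used synonymous codon), with a pseudocount."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    w = {}
    for aa in set(CODE.values()) - EXCLUDED:
        family = [c for c, a in CODE.items() if a == aa]
        adjusted = {c: counts[c] + pseudocount for c in family}
        best = max(adjusted.values())
        for c in family:
            w[c] = adjusted[c] / best
    return w


def cai(gene: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons that have synonymous alternatives."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy "highly expressed" reference (hypothetical): Lys encoded as AAA twice, AAG once.
w = relative_adaptiveness(["AAAAAAAAG"])
# ATG (Met) is skipped; w(AAA) = 1.0 and w(AAG) = 0.6, so CAI = sqrt(0.6).
print(round(cai("ATGAAAAAG", w), 3))  # → 0.775
```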

Table 3: Comparison of the number of organisms with a homologous protein and of PGL values between mitochondrial dual targeted proteins, mitochondrial exclusive proteins and non-mitochondrial proteins

Parameter | Protein group | Group size | Median | Mean | Kolmogorov-Smirnov 1-tail p-value | Mann-Whitney 1-tail p-value
Number of organisms with homologous protein | Mitochondrial dual targeted | 316 | 8 | 8.75 | 0.025 | 0.005
Number of organisms with homologous protein | Mitochondrial exclusive | 482 | 7 | 7.66 | |
Number of organisms with homologous protein | Non-mitochondrial randomly selected | 316 | 4.46 (a) | 7.53 (a) | 0.647 (a) | 0.517 (a)
Number of organisms with homologous protein | Non-mitochondrial randomly selected | 482 | 4.51 (a) | 7.55 (a) | |
PGL (b) | Mitochondrial dual targeted | 316 | 0.241 | 0.288 | 0.057 | 0.006
PGL (b) | Mitochondrial exclusive | 482 | 0.262 | 0.320 | |
PGL (b) | Non-mitochondrial randomly selected | 316 | 0.359 (a) | 0.421 (a) | 0.621 (a) | 0.519 (a)
PGL (b) | Non-mitochondrial randomly selected | 482 | 0.358 (a) | 0.417 (a) | |

a) Average of 1000 samples.
b) Maximum value 0.587.
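Table 3 reports two parameters derived from homolog presence or absence. The Python sketch below shows, under simplifying assumptions, how both could be computed. The first is the number of organisms with a verified homolog, with paralogs counted once. The second is a PGL-like score. For PGL the sketch assumes three things: the gene was present at the root, a lost gene cannot be regained (a Dollo-style reconstruction), and the score is the total length of branches on which a loss is inferred divided by the total branch length of the tree. This is a simplified reading of Krylov et al. (2003), not the authors' Matlab script, and the toy tree and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; `length` is the length of the branch leading to this node."""
    name: str
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def leaf_names(node: Node) -> set[str]:
    if not node.children:
        return {node.name}
    return set().union(*(leaf_names(child) for child in node.children))


def n_organisms_with_homolog(hits: list[tuple[str, bool]]) -> int:
    """Count organisms with at least one verified homolog; paralogs count once."""
    return len({organism for organism, verified in hits if verified})


def pgl(root: Node, present: set[str]) -> float:
    """Length of branches where the gene is inferred lost / total branch length."""
    total = 0.0
    lost = 0.0

    def walk(node: Node, parent_has_gene: bool) -> None:
        nonlocal total, lost
        has_gene = bool(leaf_names(node) & present)
        total += node.length
        if parent_has_gene and not has_gene:
            lost += node.length  # loss placed on the branch leading to this clade
        for child in node.children:
            walk(child, has_gene)

    walk(root, parent_has_gene=True)
    return lost / total if total else 0.0


# Hypothetical four-species tree ((A,B),(C,D)) with unit branch lengths.
tree = Node("root", 0.0, [
    Node("AB", 1.0, [Node("A", 1.0), Node("B", 1.0)]),
    Node("CD", 1.0, [Node("C", 1.0), Node("D", 1.0)]),
])
print(pgl(tree, present={"A"}))  # losses on the branches to B and to clade CD: 2 / 6
print(n_organisms_with_homolog([("A", True), ("A", True), ("C", False)]))  # → 1
```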

Figure legends

Fig. 1 Cumulative distribution function (CDF) of Dn/Ds'. The CDF of Dn/Ds' for dual targeted mitochondrial proteins (red broken line, top) is significantly different from those for exclusive mitochondrial proteins (red continuous line, middle) and non-mitochondrial proteins (blue continuous line, bottom). The blue continuous line (bottom) is the CDF of Dn/Ds' of 1000 random samples of 476 non-mitochondrial proteins. It overlaps and hides a blue dashed line, which is the CDF of 1000 random samples of 316 non-mitochondrial proteins.

Fig. 2 Examination of dual targeting mechanisms. Yeast strains expressing the indicated proteins, fused at their C terminus to the α-fragment of bacterial β-galactosidase, were subjected to subcellular fractionation (left panels) and metabolic labeling (right panels), in the presence or absence of mitochondrial protein import. Equivalent portions of the subcellular fractions, total (T), cytosol (C) and mitochondria (M), were analyzed by Western blotting using α-fragment antiserum. Metabolic labeling was performed in the absence (-) or presence (+) of CCCP, followed by immunoprecipitation using the α-fragment antiserum. The panels group the proteins by mechanism: (A, top left) proteins suggested to have a two translation product mechanism, (B, bottom left) a single translation product mechanism and (C, right) an undefined mechanism. Control antibodies: anti-Hsp60 (mitochondrial marker), anti-hexokinase 1 (HxK1) (cytosolic marker) and anti-Kar2 (ER marker).
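Fig. 1 and the p-values in Tables 1-3 rest on one-sided comparisons of two distributions, plus the random-split control described under Statistical analysis in Materials and Methods. The study used Matlab (kstest2, ranksum); the standard-library Python sketch below illustrates the same ideas with large-sample approximations. The KS part computes the one-sided statistic D+ = max(F1 - F2), with p ≈ exp(-2·n1·n2/(n1+n2)·D+²). The Mann-Whitney part uses a normal approximation without tie correction. These approximations may not match Matlab's p-values exactly, and all function names and data are hypothetical.

```python
import math
import random
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF of `sample`, returned as a function of t."""
    xs = sorted(sample)
    return lambda t: bisect_right(xs, t) / len(xs)


def ks_one_sided(x1, x2):
    """D+ = max_t [F1(t) - F2(t)] and its asymptotic p-value (H1: F1 lies above F2)."""
    f1, f2 = ecdf(x1), ecdf(x2)
    # Both ECDFs are step functions, so the maximum is reached at an observed value.
    d_plus = max(0.0, max(f1(t) - f2(t) for t in set(x1) | set(x2)))
    n_eff = len(x1) * len(x2) / (len(x1) + len(x2))
    return d_plus, math.exp(-2.0 * n_eff * d_plus ** 2)


def mann_whitney_one_sided(x1, x2):
    """U counts pairs with x1 < x2 (ties count 0.5); normal-approximation p-value
    for H1: values in x1 tend to be smaller. No tie or continuity correction."""
    n1, n2 = len(x1), len(x2)
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x1 for b in x2)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def random_split_control(pool, n1, n2, observed_p, n_iter=1000, seed=0):
    """Draw disjoint random groups of sizes n1 and n2 from `pool` (e.g. values of
    non-mitochondrial proteins) and count KS p-values below `observed_p`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        drawn = rng.sample(pool, n1 + n2)  # sampling without replacement -> disjoint groups
        _, p = ks_one_sided(drawn[:n1], drawn[n1:])
        hits += p < observed_p
    return hits


# Hypothetical values: the first group tends to be smaller, so its CDF lies higher.
group1 = [0.01, 0.02, 0.02, 0.03, 0.05]
group2 = [0.03, 0.04, 0.06, 0.07, 0.09]
print(ks_one_sided(group1, group2))
print(mann_whitney_one_sided(group1, group2))
rng = random.Random(1)
pool = [rng.random() for _ in range(200)]
print(random_split_control(pool, 20, 30, observed_p=0.05, n_iter=200))
```

The last call mirrors the footnotes to Tables 1 and 2. Values of unrelated proteins rarely reach a small observed p-value, so the count of random splits falling below it should stay low.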
