Zhang et al. Genome Biology (2015) 16:202 DOI 10.1186/s13059-015-0772-4
RESEARCH
Open Access
New genes drive the evolution of gene interaction networks in the human and mouse genomes
Wenyu Zhang1,2, Patrick Landback3, Andrea R. Gschwend2, Bairong Shen1,4* and Manyuan Long2,3*
Abstract
Background: The origin of new genes with novel functions creates genetic and phenotypic diversity in organisms. To acquire functional roles, new genes must integrate into ancestral gene-gene interaction (GGI) networks. The mechanisms by which new genes are integrated into ancestral networks, and their evolutionary significance, are yet to be characterized. Herein, we present a study investigating the rates and patterns of new gene-driven evolution of GGI networks in the human and mouse genomes.
Results: We examine the network topological and functional evolution of new genes that originated at various stages in the human and mouse lineages by constructing and analyzing three different GGI datasets. We find that a large number of new genes were integrated into GGI networks throughout vertebrate evolution. These genes experienced a gradual integration process into GGI networks, starting on the network periphery, gradually becoming highly connected hubs, and acquiring pleiotropic and essential functions. We identify a few human lineage-specific hub genes that have evolved brain development-related functions. Finally, we explore the possible underlying mechanisms driving GGI network evolution and the observed patterns of the new gene integration process.
Conclusions: Our results unveil a remarkable network topological integration process of new genes: over 5000 new genes were integrated into the ancestral GGI networks of human and mouse; new genes gradually acquire an increasing number of gene partners; some human-specific genes evolved into hubs with critical phenotypic effects. Our data provide new conceptual insights into the evolution of genetic networks.
Background
New genes provide important genetic novelties responsible for biological diversity in organisms [1], and are often the genetic basis for lineage- or species-specific components in important biological processes and structures [2, 3]. As biological characteristics mostly emerge through complicated interactions among a cell’s components [4], new genes will inevitably be integrated into and reshape ancestral gene-gene interaction (GGI) networks to acquire their corresponding biological roles. Recently, several case studies have shown that individual new genes can participate in local ancestral GGI networks and acquire important functions in fruit fly [5, 6], budding yeast [7], and plants [8, 9]. Consequently, it is intriguing to ask how new genes are topologically and functionally incorporated into, and subsequently change, ancestral GGI networks at the genome-wide scale. Thanks to the accumulation of GGI data brought by the development of high-throughput technologies, a couple of attempts have been made to address this issue. By examining the evolution of new genes in the protein-protein interaction networks of the yeast Saccharomyces cerevisiae, Capra et al. [10] found that novel genes are less integrated in cellular networks than duplicated genes, that genes prefer to interact with other genes of similar age and origin, and that new genes participated in the network modules for synthesis of important metabolites. Using a different network data source, another research group showed a similar integration process of new genes in yeast [11]. Popadin et al. [12] recently analyzed a co-expression network with previous data on gene ages in vertebrates [2, 13] and observed a difference between young and old genes in their integration into the networks. These works encouraged us to explore a potential quantitative correlation between the continuous evolutionary process of new genes and the degree to which they are integrated into, and subsequently rewire, various ancestral gene networks in vertebrates, for which evolutionarily well-resolved divergence times, interesting phenotypic data, and the rich datasets of recently evolved genes that we identified [2, 13] are available.
In the present report, we investigated evolutionary patterns of GGI networks driven by new genes originating throughout various stages in the lineages toward human and mouse. Taking advantage of a well-resolved gene dating dataset [2, 13] and rich, independent GGI datasets, we explored in detail the integration process of new genes into GGI networks reconstructed with four different data sources in both human and mouse. We then focused on the functional evolution of new genes in the human genome, and explored how new genes acquire critical functions, that is, pleiotropic functions, essential functions, and brain development-related functions, in terms of GGI network integration. Finally, we examined and discussed the mechanisms driving the evolution of GGI networks and giving rise to the integration patterns of newly originated genes.

* Correspondence: [email protected]; [email protected]
1 Center for Systems Biology, Soochow University, Suzhou, Jiangsu 215006, China
2 Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
Full list of author information is available at the end of the article

© 2015 Zhang et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Results and discussion
The integration of new genes into GGI networks is a gradual evolutionary process
A technical challenge in examining the role of new genes in the evolution of gene networks is to detect reliable GGI networks at a global scale. Considering the current development and evaluation of methods and data that reveal GGIs, we constructed and analyzed three different types of data in an attempt to identify robust GGI networks (see Methods): the human protein-protein interactions (hPPIs), the human gene co-expression (hGC) networks, and the mouse protein-protein interactions (mPPIs). The second line of data that we used to investigate the correlation between new gene evolution, which we investigated extensively in previous work, and the evolution of GGI networks as revealed by the above three databases consists of the best-resolved vertebrate divergence times, supported by paleontology, organismal evolutionary analysis, and molecular evolution, together with the most reliably resolved phylogenetic tree of vertebrates, built over decades of extensive studies on vertebrate species [2, 13]. These data provided excellent estimates for the ages of new genes, comprising those generated by DNA-based duplication, RNA-based duplication, and de novo origination during vertebrate evolution in the lineages toward human and mouse, as we identified previously by comparative genomic analysis.
First, we investigated the correlation between the ages of genes and their topological characteristics in the GGI networks described in the four databases we constructed. Remarkably, all these types of GGI network data revealed highly similar rates and patterns of new gene integration into the networks. Therefore, we focus on human for presentation and discussion of the results while introducing the relevant findings in the mouse genome. We first analyzed the human protein-protein interactions (hPPIs) network by exploiting and modifying an integrative experimental protein interactions dataset [14] (with a confidence score threshold of 0.68, see Methods). The reconstructed human PPI network revealed an approximately scale-free topological structure [15] with a degree exponent of 1.49 that defines a power-law distribution of connectivity (or degrees) (Additional file 1: Figure S1 and Additional file 2: Table S1). We then labeled each node in the PPI network with the age of its gene (equivalent to its encoded protein), determined by an age index for the genes that originated in every period of evolution along the well-resolved phylogeny of vertebrates (Fig. 1a and b); these ages were retrieved from a widely used database [2, 13] (see Methods). Analysis of the above PPI network indicated a significant and strong correlation (Polynomial regression test, R² = 0.8834, Fig. 2a) between the ages of genes and their connectivity (or degree, that is, number of interacting partners) in the PPI network, revealing a gradual evolutionary process in which new genes are integrated into the PPI network, which echoes the evolutionary process of new gene structures [16]. This finding suggests that throughout vertebrate evolution there was a non-robust and rapid process, unexpected by conventional thought, in which new genes were integrated into the GGI networks. During the 370 million years (MY, branch 1–12, Fig. 1a) that we examined, we observed that 5,710 new genes were integrated into the GGI networks. Furthermore, this process showed an evolutionarily significant pattern: at a young age, new genes started to be integrated into networks by forming new and sparsely connected branches; however, with the elapse of evolutionary time, as genes grew older, they acquired more interacting links. To avoid possible bias created by the confidence score threshold chosen for the reconstruction of the human PPI network, we reanalyzed a new human PPI network using a more stringent cutoff (with a minimum confidence score of 0.77, see Methods and Additional file 2: Table S1) and found the same evolutionary pattern (Polynomial regression test, R² = 0.7909, Fig. 2b). The connectivity-based conclusion is further supported by the analysis of another statistical parameter describing the network centrality of genes, that is, Betweenness, which
[Figure 1. Panel a: phylogenetic tree of vertebrates toward human, with a time axis in millions of years from present (0 to 450 Myr) and branch groups labeled Branch 0, Branch 1, Branch 2, Branch 3-4, Branch 5-7, and Branch 8-12. Species shown: Human, Chimp, Orangutan, Rhesus, Marmoset, Mouse, Guinea Pig, Platypus, Chicken, Lizard, Dog, Cow, Armadillo, Tenrec, Opossum, Frog, Fugu, Zebrafish. Number of genes originating at each branch, as listed in the panel (branch: genes): 12: 389; 11: 447; 10: 392; 9: 286; 8: 314; 7: 130; 6: 336; 5: 1214; 4: 945; 3: 1018; 2: 1393; 1: 1013; 0: 12058. Panel b: sub-graph of the human PPI network, with genes marked by origin from before the vertebrate split (Branch 0), through the split of Branch 1, to the split of human (Branch 12).]
Fig. 1 Schematic diagram showing the network integration of new genes originating from various phylogenetic branches toward human. a Phylogenetic tree of vertebrates toward human together with branches and divergence times in millions of years from present (myr). The number of genes originating at each phylogenetic branch is also listed. b A sub-graph of the human PPI network showing the incorporation of new genes with different origination times
measures the importance of a node in connecting all the other nodes (Polynomial regression test, R² = 0.9021, Fig. 2c). Based on a human PPI network reconstructed from a different, manually curated experimental resource (see Methods and Additional file 3: Figure S2A), that is, the Human Protein Reference Database (HPRD) [17], the same conclusion was drawn as described above (Additional file 3: Figure S2B). For a more rigorous analysis of independent GGI data types, we analyzed another human GGI network, the gene co-expression (hGC) network (see Methods and Additional file 3: Figure S2C and D), which reflects the correlations of gene expression profiles across a series of human tissues [18]. Mapping the topological positions of new human genes onto the GC network revealed a similar correlation between the ages and connectivity of genes (Polynomial regression test, R² = 0.6527, Fig. 2d), revealing the same evolutionary trend of new genes starting with low connectivity and evolving to be highly connected hubs. Additionally, we explored the evolutionary patterns of the human PPI network based on another gene age dataset [19] (Additional file 4: Figure S3A), which
[Figure 2. Four scatter plots with branch numbers (0 to 12) labeling the data points and divergence time (million years ago; axis ticks at 500, 50, and 5) on the x-axis. Panel a: average connectivity in PPI network, R² = 0.8834. Panel b: average connectivity in PPI network (more stringent threshold), R² = 0.7909. Panel c: average Betweenness in PPI network (log-10), R² = 0.9021. Panel d: average connectivity in GC network, R² = 0.6527.]
Fig. 2 GGI network topological patterns of human genes related to their divergence times. a Distribution of PPI network connectivity (numbers of interactions) for genes from different phylogenetic branches. b Distribution of genes from different phylogenetic branches from another PPI network connectivity reconstructed with a more stringent threshold. c Distribution of average Betweenness (log10-based) within each gene group in the PPI network. d Distribution of GC (gene co-expression) network connectivity for genes from different phylogenetic branches. The error bars show the standard error of the mean for each group of genes, and the dash line indicates the polynomial regression correlation between network centralities (that is, Connectivity, Betweenness) of genes and their divergence times. Numbers near each data point are phylogenetic branch assignments for each group of genes. The divergence time of each gene age group was assigned as the middle time point for each branch and the oldest branch (branch 0) is arbitrarily set as 500 myr
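The regression summarized in Fig. 2 can be illustrated with a minimal sketch. It assumes, without confirmation from the text, that the polynomial is of degree 2 and that the predictor is log10 of the divergence time (the axis ticks at 500, 50, and 5 suggest a logarithmic axis). The function names `fit_quadratic`, `r_squared`, and `fit_centrality_vs_age` are hypothetical and are not taken from the authors' pipeline.

```python
import math

def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    a = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef

def r_squared(xs, ys, coef):
    """Coefficient of determination of the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def fit_centrality_vs_age(age_myr, mean_centrality):
    """Fit group-mean centrality against log10(divergence time).

    Assumption: degree-2 polynomial on a log10 time axis.
    Returns (coefficients, R^2).
    """
    xs = [math.log10(t) for t in age_myr]
    coef = fit_quadratic(xs, mean_centrality)
    return coef, r_squared(xs, mean_centrality, coef)
```

Each input pair would be one gene age group: the branch midpoint in myr (500 for branch 0, following the caption) and the group's mean connectivity or Betweenness.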
estimated gene ages in the human genome based on an independent, long-distance phylogenetic distribution. The same evolutionary pattern of new genes was observed (Additional file 4: Figure S3B), which further demonstrates that our conclusion is independent of the gene age dating dataset. Thus, different GGI data, that is, PPI and GC data, and different gene age dating data all supported the same conclusions as reported above. Furthermore, we applied a similar protocol to the analysis of mouse GGI networks reconstructed from mouse PPI data (mPPIs) by integrating most of the experimental interaction datasets available online (Additional file 5: Table S2). The integrative analysis of mouse gene age information [13] (Additional file 6: Figure S4A) and PPI topological data (Additional file 6: Figure S4B) led to the same conclusion (Polynomial regression test, R² = 0.6232, Additional file 6: Figure S4C) as that reached in the human GGI network analyses. These data suggest that the gradual integration of new genes into GGI networks is an evolutionary process shared by the mammalian lineages of primates and rodents.
Given the observation that the acquisition of genetic interactions is a time-dependent gradual procedure, we further investigated whether this process occurred at a constant rate. Our result showed that new genes could establish linking partners at a high rate (interactions acquired per million years) in the initial stage of their origination. After that, the rate dramatically declined, and finally plateaued (Fig. 3a and b), suggesting that the acquisition of biological roles of new genes is a rapid process during early evolution, but as the genes age, the function spectrum is diversified at a much lower rate. Taking advantage of the high coverage of the human PPI data (Additional file 2: Table S1), we subsequently focused on the analysis of both topological and functional evolution patterns of new genes based on our first constructed human PPI network. To better visualize the integration process, we mapped the genes in the mammalian GGI networks based on their connectivity, where highly connected genes made up the core of the human PPI network and genes with low connectivity were located on the
[Figure 3. Two scatter plots with branch numbers labeling the data points and divergence time (million years ago; axis ticks at 500, 50, and 5) on the x-axis. Panel a: rate of evolving interactions in the human PPI network (interactions / myr). Panel b: rate of evolving interactions in the mouse PPI network (interactions / myr). Fitted values shown in the panels: R² = 0.891 and R² = 0.8939.]
Fig. 3 Average rate of evolving linking partners (interactions / myr) for genes from different phylogenetic branches based on the human PPI network (a) and mouse PPI network (b). The dashed line indicates the power regression correlation between evolution rates of interactions for genes and their divergence times. Numbers near each data point are phylogenetic branch assignments for each group of genes. The divergence time of each gene age group is assigned as the middle time point for each branch, and the oldest branch (branch 0) is arbitrarily set as 500 myr
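A minimal sketch of the rate calculation and power regression behind Fig. 3 follows. It assumes that the rate for a gene age group is its mean connectivity divided by its assigned divergence time, and that the power regression y = a * x**b is fitted by ordinary least squares in log-log space; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
import math

def interaction_rates(mean_degree, age_myr):
    """Interactions acquired per million years for each gene age group.

    Assumption: rate = mean connectivity / assigned divergence time.
    """
    return [d / t for d, t in zip(mean_degree, age_myr)]

def power_fit(xs, ys):
    """Fit y = a * x**b by linear least squares on (ln x, ln y).

    All xs and ys must be strictly positive. Returns (a, b).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b
```

Under these assumptions, a negative exponent b corresponds to the reported pattern of a high initial rate that declines and then plateaus as genes age.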
network periphery (Fig. 4), which revealed a clear correlation between gene age and location in the mammalian GGI networks. Surprisingly, a small fraction of young genes were found to have evolved into the network core, whereas the majority of recently originating genes, especially primate-specific genes (branch 8–12, Fig. 1a), are located in the exterior regions of the network. As the ages of genes increase, they tend to appear more frequently in the more densely connected core of the network.
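[Figure 4. Three-dimensional bar chart: phylogenetic branches 0 to 12 on the x-axis; five connectivity categories on the y-axis, from the periphery (very low connectivity) through low, moderate, and high connectivity to the core (very high connectivity); fraction of genes (0 to 0.8) on the z-axis.]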
Fig. 4 PPI network locations of genes in relation to their divergence times. The network locations of genes are classified into five distinct layers according to the rank of degree centralities. Specifically, genes that have the top 20 % of degree centralities are assigned to the network core (genes with very high connectivity), and those with the bottom 20 % of degree centralities in the network periphery (genes with very low connectivity). The same rule is applied for the assignment of the middle three network layers. The x-axis shows the phylogenetic branches for each group of genes, and y-axis indicates the categorization of genes according to the above specifications, and z-axis exhibits the percentage of genes within each age group located in the corresponding categories
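The layer assignment described in the Fig. 4 caption can be sketched as follows. This is an illustration under stated assumptions: degree is counted as the number of distinct partners in an undirected edge list, and ties in degree are broken by gene identifier, which the caption does not specify. The names `degrees`, `assign_layers`, and `LAYERS` are hypothetical.

```python
from collections import defaultdict

# Five layers, ordered from network periphery to network core.
LAYERS = ["very low", "low", "moderate", "high", "very high"]

def degrees(edges):
    """Degree (number of distinct partners) of each gene in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {gene: len(p) for gene, p in partners.items()}

def assign_layers(degree):
    """Split genes into five equal-sized rank bins by degree.

    The bottom 20 % of ranks map to "very low" (periphery) and the
    top 20 % to "very high" (core). Tie-breaking by gene identifier
    is an assumption made only to keep the sketch deterministic.
    """
    ranked = sorted(degree, key=lambda g: (degree[g], g))
    n = len(ranked)
    return {g: LAYERS[min(4, i * 5 // n)] for i, g in enumerate(ranked)}
```

The percentage plotted on the z-axis would then be, for each age group, the share of its genes that fall into each of the five layers.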
Zhang et al. Genome Biology (2015) 16:202
Page 6 of 14
New genes gradually acquire pleiotropic and essential function roles
As most biological characteristics arise from the complex interactions between a cell’s numerous components [4], the integration of new genes into the GGI network might indicate the emergence of novel functions for these new genes. Furthermore, the gradual evolution of more interactions in GGI networks might signal the process of new genes acquiring pleiotropic functions. This hypothesis is indirectly supported by the strong correlation between the connectivity of genes and their divergence times (Fig. 2a) and by a strong linear correlation between the connectivity of genes and their expression breadths at both the RNA expression level (Pearson linear correlation test, R² = 0.9384, Fig. 5a) and the protein expression level (Pearson linear correlation test, R² = 0.9457, Fig. 5b). Together, these observations hint that new genes gradually evolve broader expression patterns, and therefore acquire pleiotropic functions, as they gradually evolve more linking partners (Fig. 2a), because genes with more linking partners tend to have broader expression patterns (Fig. 5a and b). To verify this hypothesis directly, we further computed and compared the tissue expression patterns of genes from different phylogenetic branches. Our results showed that genes gradually evolved broader tissue expression patterns at the mRNA expression level based on RNA-seq data [20] (Polynomial regression correlation test, R² = 0.96538, Fig. 5c), which indicates the acquisition of stronger pleiotropic functions. Although one might dispute the role of mRNA as the performer of biological functions, our analysis of protein expression profiling
[Figure 5. Four plots of expression breadth. Panel a: average number of tissues with expression on RNA level vs. PPI network connectivity (1 to >=8), R² = 0.9384. Panel b: average number of tissues with expression on protein level vs. PPI network connectivity (1 to >=8), R² = 0.9457. Panel c: average number of tissues with expression on RNA level vs. divergence time (million years ago), R² = 0.9654. Panel d: average number of tissues with expression on protein level vs. divergence time (million years ago), R² = 0.8004. Branch numbers (0 to 12) label the data points in panels c and d.]
Fig. 5 Expression breadths of genes in regards to their PPI network connectivity and divergence times. a Average number of tissues with expression of genes with various PPI network connectivity based on RNA-seq expression level data. b Average number of tissues with expression of genes with various PPI network connectivity based on protein expression level data. The error bars show the standard error of the mean for each group of genes, and the solid line indicates the linear regression correlation between network connectivity of genes and their expression breadths. c Average number of tissues with expression of genes from different phylogenetic branches based on RNA-seq expression level data. d Average number of tissues with expression of genes from different phylogenetic branches based on protein expression level data. The dash line indicates the polynomial regression correlation between divergence times of genes and their expression breadths. Branch assignment is labeled near each data point. The age assignment for each branch follows Fig. 1
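As a rough illustration of the quantities in Fig. 5, the sketch below counts the tissues in which a gene is expressed and computes the squared Pearson correlation between two series. The expression cutoff of 1.0 is a placeholder assumption, since the threshold used to call a gene expressed is not given in this section, and both function names are hypothetical.

```python
def expression_breadth(expression_by_tissue, cutoff=1.0):
    """Number of tissues in which a gene's expression reaches the cutoff.

    The default cutoff is an illustrative placeholder, not the
    threshold used in the study.
    """
    return sum(1 for level in expression_by_tissue.values() if level >= cutoff)

def pearson_r2(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)
```

For panels a and b, the two series would be the connectivity bins and the mean breadth of the genes in each bin.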
data [20] drew the same conclusion (Polynomial regression test, R² = 0.80038, Fig. 5d). In line with the network topological integration process of new genes (Figs. 2a and 4), our results showed a gradual process by which new genes evolve pleiotropic functional roles, as reflected by their tissue expression patterns. These findings also suggest functional constraints on newly originated genes [21], which usually show very narrow and specific expression patterns [22], such as testis expression [23]. One critical feature of scale-free networks is the existence of hub nodes, or highly connected nodes [24]. Hub nodes are essential components in various networks [25], and are subject to concentrated evolutionary forces that shape the network structures to result in essential functions [3, 26]. To explore the contribution of new genes in reshaping the GGI network, we investigated the percentage distributions of hub genes (with interaction degrees no smaller than 6) originating across different phylogenetic branches in the human PPI network. The data revealed a strong correlation between gene ages and fractions of hub genes (Polynomial regression correlation test, R² = 0.8016, Fig. 6a). In particular, we found a high proportion of hub genes (16 %) arising in the most recently originated human-specific branch (Branch 12, Fig. 1a), and this number gradually increased with gene age, peaking at around 53 % for the earliest originating genes (Branch 0, genes arising before the split of vertebrates, Fig. 1a). This phenomenon indicates a gradual process of new genes evolving to be network hubs and reshaping the original gene interaction networks. It has been reported that there is a relationship between gene topological features and biological functions [26, 27]. More specifically, genes with high network connectivity tend to be functionally essential [26] (Fig. 6b). Given the above observation that new genes gradually evolve many interactions to become network hubs, it is reasonable to infer that the acquisition of functional essentiality by new genes in the human genome may follow a step-wise evolutionary process. Through the meticulous collection and analysis of sources of human gene essentiality data (Additional file 7: Table S3, see Methods), we explored the relationship between gene essentiality and origination time (Fig. 6c). Unexpectedly, a proportion of newly originated genes, especially genes that arose after branch 6 (approximately 80 million years ago), have evolved essential functions, although more genes originating from older periods are functionally essential, and the fraction of essential genes increases with the elapse of evolutionary time. Together with the aforementioned observations from the network topology, our analysis demonstrated a clear trend that new human genes gradually evolve to be topologically central and functionally essential, and acquire the capability to reshape the GGI networks.

Human-specific hub genes have potential brain development functions
The remarkable development of the brain in primate-lineage species, especially in human, is a decisive hallmark differentiating them from other organisms [28]. Recent studies have reported important roles of new genes in the evolution of important human brain-related traits. For example, an excess of young (that is, primate-specific) genes in the human genome was found to be recruited in early human brain development [2]; SRGAP2 potentially strengthens neuron connections in the brain [29, 30]; and CHRFAM7A functions in skin and brain [31, 32]. We further investigated how the young human genes with evidence of functioning in brain development relate to their topological structures in the GGI networks. Through integrative analysis of the brain expression pattern data of these young genes [2] and their network topological features based on human PPI network data, we found no significant bias in the percentages of hub genes (with a minimum interaction degree of 6) among three different brain expression categories of young genes (Fisher’s exact test, Fetus vs. Adult: P value = 0.435, Adult vs. Unbiased: P value = 0.3323, Fig. 7). In other words, young genes with diverse network connectivity contribute equally during both early and late stages of human brain development. More intriguingly, we found four human lineage-specific hub genes (genes that originated in the human lineage after its divergence and thus exist only in the human genome) with clear evidence of expression in the human brain (Additional file 8: Table S4). As the literature offered no direct clue about the functions of these four genes in brain development, we conducted a ‘guilt by connection’ study, manually curating earlier studies for reported evidence that their direct linking partners have roles in brain function (Additional file 9: Table S5).
For instance, CCT4 (gene id: 10575), a subunit of chaperonin containing TCP1, was reported to be involved in the development of a brain malfunction disorder, Alzheimer’s disease [33], and it was also shown to be a direct interacting partner of one of the young hub genes, FAM86B2 (gene id: 653333, Fig. 8). Collectively, we found that 62.5 % (10 of 16) and 53.3 % (8 of 15) of the first-layer linking partners of the two hub genes that were fetus brain biased were confirmed to be involved in brain development (Fig. 8 and Additional file 9: Table S5). For the other two, unbiased hub genes, 24.4 % (10 out of 41) and 50 % (3 out of 6) were proven to function in brain development in previous literature
[Figure 6. Three plots. Panel a: fraction of hub genes in PPI network vs. divergence time (million years ago), with branch numbers (0 to 12) labeling the data points, R² = 0.8016. Panel b: fraction of functionally essential genes vs. PPI network connectivity (1 to >=8). Panel c: fraction of functionally essential genes vs. divergence time (million years ago), with branch numbers labeling the data points. Fitted values shown in panels b and c: R² = 0.665 and R² = 0.9712.]
Fig. 6 Fraction of topologically and functionally essential genes for gene groups from different divergence times. a Fraction of hub genes in PPI network within gene groups of different divergence times. Hub genes are defined as genes with network connectivity greater than median level (interaction degree >= 6). Branch assignment is labeled near each data point. The age assignment for each branch follows Fig. 1. The dashed line indicates the polynomial regression correlation between divergence times of genes and the fractions of hub genes. b Fraction of essential genes in regards to their PPI network connectivity. The solid line indicates the linear regression correlation between PPI network connectivity of genes and the fractions of essential genes within each gene group. c Fraction of essential genes in PPI network within gene groups from different divergence times. The dashed line indicates the polynomial regression correlation between divergence times of genes and the fractions of essential genes
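The hub fraction plotted in Fig. 6a can be sketched as below. The hub cutoff (interaction degree >= 6) is taken from the caption; the input layout (two dictionaries keyed by gene identifier) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

HUB_MIN_DEGREE = 6  # hub definition stated in the Fig. 6 caption

def hub_fraction_by_branch(degree, branch):
    """Fraction of hub genes within each phylogenetic branch.

    degree: gene -> number of interacting partners in the PPI network
    branch: gene -> phylogenetic branch (0 to 12) at which it originated
    Genes without a branch assignment are skipped (an assumption).
    """
    total = defaultdict(int)
    hubs = defaultdict(int)
    for gene, d in degree.items():
        if gene not in branch:
            continue
        b = branch[gene]
        total[b] += 1
        if d >= HUB_MIN_DEGREE:
            hubs[b] += 1
    return {b: hubs[b] / total[b] for b in total}
```

The same grouping, with a set of essential genes in place of the degree cutoff, would give the per-branch fraction of essential genes shown in panel c.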
(Fig. 8 and Additional file 9: Table S5). As genes with similar functions tend to be within the same network cluster [34], this evidence suggests that these four human lineage-specific hub genes could also have associated functions in human brain development.

Multiple mechanisms drive the evolution of human GGI network
The most significant property of complex networks, including biological networks, is the power-law degree distribution [24] (Additional file 1: Figure S1), or so-called scale-free feature. The preferential attachment mechanism of the classic Barabasi-Albert (BA) model [35], which holds that newly originated genes tend to interact with well-connected nodes, has also been applied to account for the scale-free feature of biological networks [36]. However, the biggest challenge for this model is a distinctive characteristic of biological networks: duplication is the dominant source of network evolution [37]. Therefore, another, biologically motivated model, the duplication-divergence model, was proposed [38, 39], which accounts for both gene duplication and the subsequent loss of inherited interactions. However, the acquisition of new links, other than inherited interactions, was not considered in this model. To address this issue from an evolutionary perspective, we defined primate-specific genes (branch 8–12 as shown in Fig. 1a) as young genes, and genes that originated before
[Figure 7. Bar chart: fraction of young genes (0 to 100 %), split into young hub genes and young non-hub genes, for three brain expression categories (fetus brain biased, adult brain biased, unbiased). P values shown: 0.435 and 0.3323.]
Fig. 7 Comparison of PPI network topologies for young genes with diverse brain expression patterns. This figure shows the percentage distribution of young hub genes and young non-hub genes within different categories of brain expression patterns. The statistical significance difference was calculated using Fisher’s exact test
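For readers unfamiliar with the test named in the Fig. 7 caption, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 table, for example hub vs. non-hub counts in two brain expression categories. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether the authors used this convention or a particular statistics package is not stated, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Rows could be two brain expression categories and columns hub vs.
    non-hub counts. Sums the hypergeometric probabilities of every table
    with the same margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(prob(k) for k in range(lowest, highest + 1)
                  if prob(k) <= p_observed * tolerance)
    return min(1.0, p_value)
```

A large P value, such as the 0.435 and 0.3323 reported in the figure, indicates no detectable difference in hub proportion between the two categories being compared.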
Fig. 8 Human lineage-specific hub genes and their first-level linking partners. This figure illustrates two fetus brain biased human lineage-specific hub genes (top) and two unbiased human lineage-specific hub genes (bottom) and their direct interacting partners from the human PPI network. Genes biased in fetus brain (blue), adult brain (red), and unbiased (orange) between fetus and adult brain are marked. Genes (in square circles) outlined in the green dashed rectangle have been reported to have some brain development-related functions in previous literature
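The 'guilt by connection' percentages can be reproduced from counts with a short sketch. It assumes an undirected edge list and a set of partner genes with reported brain-related functions obtained by manual curation; the gene labels in any example use are placeholders, and the function name is hypothetical.

```python
def annotated_partner_fraction(hub, edges, annotated):
    """Fraction of a hub gene's first-layer partners found in `annotated`.

    hub: gene identifier of the hub gene
    edges: iterable of undirected (gene_a, gene_b) interactions
    annotated: set of genes with reported brain-related functions
    """
    partners = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        if a == hub:
            partners.add(b)
        elif b == hub:
            partners.add(a)
    if not partners:
        return 0.0
    return len(partners & annotated) / len(partners)
```

With the counts given in the text, the fractions are 10/16 = 62.5 % and 8/15 = 53.3 % for the two fetus brain biased hub genes, and 10/41 = 24.4 % and 3/6 = 50 % for the two unbiased hub genes.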
this time period as old genes. Among these young genes, 95 % were created by duplication-based mechanisms (either DNA-level duplication or RNA-level duplication) (Additional file 10: Figure S5), which is in line with the classic argument that duplication is the dominant source of evolution [37]. Consequently, these young genes inherited on average 27 % of their linking partners from their parental genes (Fig. 9a), which is significantly greater (18 times) than that of random gene pairs (Fig. 9b). This finding indicates that new genes inherit interacting partners from their parental copies [5]. We further explored the pattern by which young genes establish new linking partners, after removing the interactions shared with their parental genes. In contrast to the pattern in yeast [10], we found that young genes tend to prefer genes with high topological centralities as new linking partners (Chi-square tests, Degree: P value