Zhang et al. Genome Biology (2015) 16:202 DOI 10.1186/s13059-015-0772-4

RESEARCH

Open Access

New genes drive the evolution of gene interaction networks in the human and mouse genomes

Wenyu Zhang1,2, Patrick Landback3, Andrea R. Gschwend2, Bairong Shen1,4* and Manyuan Long2,3*

Abstract

Background: The origin of new genes with novel functions creates genetic and phenotypic diversity in organisms. To acquire functional roles, new genes must integrate into ancestral gene-gene interaction (GGI) networks. The mechanisms by which new genes are integrated into ancestral networks, and their evolutionary significance, are yet to be characterized. Herein, we present a study investigating the rates and patterns of new gene-driven evolution of GGI networks in the human and mouse genomes.

Results: We examine the network topological and functional evolution of new genes that originated at various stages in the human and mouse lineages by constructing and analyzing three different GGI datasets. We find that a large number of new genes were integrated into GGI networks throughout vertebrate evolution. These genes experienced a gradual integration process into GGI networks, starting on the network periphery, gradually becoming highly connected hubs, and acquiring pleiotropic and essential functions. We identify a few human lineage-specific hub genes that have evolved brain development-related functions. Finally, we explore the possible underlying mechanisms driving GGI network evolution and the observed patterns of the new gene integration process.

Conclusions: Our results unveil a remarkable network topological integration process of new genes: over 5000 new genes were integrated into the ancestral GGI networks of human and mouse; new genes gradually acquire an increasing number of gene partners; some human-specific genes evolved into hub structures with critical phenotypic effects. Our data offer new conceptual insights into the evolution of genetic networks.

Background

New genes provide important genetic novelties responsible for biological diversity in organisms [1], and are often the genetic basis for lineage- or species-specific components in important biological processes and structures [2, 3]. As biological characteristics mostly emerge through complicated interactions among a cell’s components [4], new genes will inevitably be integrated into and reshape ancestral gene-gene interaction (GGI) networks to acquire their corresponding biological roles. Recently, several case studies have shown that individual new genes can participate in local ancestral GGI networks and acquire important functions in fruit fly [5, 6], budding yeast [7], and plants [8, 9]. Consequently, it is intriguing to ask how new genes are topologically and functionally incorporated into, and subsequently change, ancestral GGI networks on a genome-wide scale.

Thanks to the accumulation of GGI data brought by the development of high-throughput technologies, a couple of attempts have been made to address this issue. By examining the evolution of new genes in the protein-protein interaction networks of the yeast Saccharomyces cerevisiae, Capra et al. [10] found that novel genes are less integrated in cellular networks than duplicated genes, that genes prefer to interact with other genes of similar age and origin, and that new genes participated in the network modules for synthesis of important metabolites. Using a different network data source, another research group showed a similar integration process of new genes in yeast [11]. Popadin et al. [12] recently analyzed a co-expression network with previous data on gene ages in vertebrates [2, 13] and observed a difference between young and old genes in their integration into the networks. These works encourage us to further explore a potential quantitative correlation between the continuous evolutionary process of new genes and the degree to which they are integrated into, and subsequently rewire, various ancestral gene networks in vertebrates, which offer evolutionarily well-resolved divergence times and interesting phenotypic data, together with the rich datasets of recently evolved genes we identified [2, 13].

In the present report, we investigated evolutionary patterns of GGI networks driven by new genes originating throughout various stages in the lineages toward human and mouse. Taking advantage of a well-resolved gene dating dataset [2, 13] and the rich and independent GGI datasets, we explored in detail the integration process of new genes into GGI networks reconstructed with four different data sources in both human and mouse. Next, we focused on the functional evolution of new genes in the human genome, and explored how new genes acquire critical functions, that is, pleiotropic functions, essential functions, and brain development-relevant functions, in terms of GGI network integration. Finally, we examined and discussed the mechanisms driving the evolution of GGI networks and producing the integration patterns of newly originated genes.

* Correspondence: [email protected]; [email protected]
1 Center for Systems Biology, Soochow University, Suzhou, Jiangsu 215006, China
2 Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
Full list of author information is available at the end of the article

© 2015 Zhang et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Results and discussion

The integration of new genes into GGI networks is a gradual evolutionary process

A technical challenge in examining the role of new genes in the evolution of gene networks is to detect reliable GGI networks at a global scale. Considering the current development and evaluation of methods and data that reveal GGIs, we constructed and analyzed three different types of data in an attempt to identify robust GGI networks (see Methods): the human protein-protein interactions (hPPIs), the human gene co-expression (hGC) networks, and the mouse protein-protein interactions (mPPIs). The second line of data, which we used to investigate the correlation between new gene evolution, as we extensively investigated previously, and the evolution of GGI networks as revealed by the above three databases, comprises the best-resolved vertebrate divergence times, supported by paleontology, organismal evolutionary analysis, and molecular evolution, and the most reliably resolved phylogenetic tree of vertebrates, established over decades of extensive studies on vertebrate species [2, 13]. These data provided excellent estimates for the ages of new genes, comprising those generated by DNA-based duplication, RNA-based duplication, and de novo origination during vertebrate evolution in the lineages toward human and mouse, as we identified previously through comparative genomic analysis.


First, we investigated the correlation between the ages of genes and their topological characteristics in the GGI networks described in the four databases we constructed. Remarkably, all these types of GGI network data revealed highly similar rates and patterns of new gene integration into the networks. Therefore, we focus on human for presentation and discussion of the results while introducing the relevant findings in the mouse genome. We first analyzed the human protein-protein interaction (hPPI) network by exploiting and modifying an integrative experimental protein interaction dataset [14] (with a confidence score threshold of 0.68, see Methods). The reconstructed human PPI network revealed an approximately scale-free topological structure [15] with a degree exponent of 1.49 that defines a power-law distribution of connectivity (or degree) (Additional file 1: Figure S1 and Additional file 2: Table S1). We then labeled each node in the PPI network with the age of its gene (equivalent to its encoded protein), determined by an age index for the genes that originated in each period of evolution along the well-resolved phylogeny of vertebrates (Fig. 1a and b) and retrieved from a widely used database [2, 13] (see Methods). Analysis of this PPI network indicated a significant and strong correlation (Polynomial regression test, R² = 0.8834, Fig. 2a) between the ages of genes and their connectivity (or degree, that is, number of interacting partners) in the PPI network, revealing a gradual evolutionary process in which new genes are integrated into the PPI network, which echoes the evolutionary process of new gene structures [16]. This finding suggests that throughout vertebrate evolution there was a non-robust and rapid process, unexpected by conventional thought, in which new genes were integrated into the GGI networks. During the 370 million years (MY, branch 1–12, Fig. 1a) we examined, we observed that 5,710 new genes were integrated into the GGI networks. Furthermore, this process showed an evolutionarily significant pattern: new genes started, at a young age, to be integrated into networks to form new and less connected branches; however, with the elapse of evolutionary time, as genes grew older, they acquired more interacting links. To avoid possible bias created by the confidence score threshold chosen for the reconstruction of the human PPI network, we reanalyzed a new human PPI network using a more stringent cutoff (minimum confidence score of 0.77, see Methods and Additional file 2: Table S1) and found the same evolutionary pattern (Polynomial regression test, R² = 0.7909, Fig. 2b). The connectivity-based conclusion is further supported by the analysis of another statistical parameter describing the network centrality of genes, that is, Betweenness, which

[Fig. 1, panel A: phylogenetic tree of vertebrates. Time axis: millions of years from present (Myr), 450 to 0. Species shown: Human, Chimp, Orangutan, Rhesus, Marmoset, Mouse, GuineaPig, Platypus, Chicken, Lizard, Dog, Cow, Armadillo, Tenrec, Opossum, Frog, Fugu, Zebrafish. Number of genes originating at each branch:]

Branch    # of genes
12        389
11        447
10        392
9         286
8         314
7         130
6         336
5         1214
4         945
3         1018
2         1393
1         1013
0         12058

[Fig. 1, panel B: sub-graph of the human PPI network with genes grouped by origin, from genes originating before the vertebrate split (Branch 0) and genes originating at the split of Branch 1, through genes originating at the split of human (Branch 12).]

Fig. 1 Schematic diagram showing the network integration of new genes originating from various phylogenetic branches toward human. a Phylogenetic tree of vertebrates toward human, together with branches and divergence times in millions of years from present (myr). The number of genes originating at each phylogenetic branch is also listed. b A sub-graph of the human PPI network showing the incorporation of new genes from different originating times

measured the importance of one node in connecting all the other nodes (Polynomial regression test, R² = 0.9021, Fig. 2c). Based on a human PPI network reconstructed from a different manually curated experimental resource (see Methods and Additional file 3: Figure S2A), that is, the Human Protein Reference Database (HPRD) [17], the same conclusion was drawn (Additional file 3: Figure S2B). For a more rigorous analysis of independent GGI data types, we analyzed another human GGI network, the gene co-expression (hGC) network (see Methods and Additional file 3: Figure S2C and D), which reflects the correlations of gene expression profiles across a series of human tissues [18]. Mapping the topological positions of new human genes onto the GC network revealed a similar correlation between the ages and connectivity of genes (Polynomial regression test, R² = 0.6527, Fig. 2d), showing the same evolutionary trend of new genes starting with low connectivity and evolving into highly connected hubs. Additionally, we explored the evolutionary patterns of the human PPI network based on another gene age dataset [19] (Additional file 4: Figure S3A), which

[Fig. 2, four scatter plots, each against divergence time (million years ago; axis ticks at 500, 50 and 5), with data points labeled by phylogenetic branch (0–12). Panel A: average connectivity in PPI network, R² = 0.8834. Panel B: average connectivity in PPI network (more stringent threshold), R² = 0.7909. Panel C: average Betweenness in PPI network (log10), R² = 0.9021. Panel D: average connectivity in GC network, R² = 0.6527.]

Fig. 2 GGI network topological patterns of human genes related to their divergence times. a Distribution of PPI network connectivity (numbers of interactions) for genes from different phylogenetic branches. b Distribution of PPI network connectivity for genes from different phylogenetic branches, based on another PPI network reconstructed with a more stringent threshold. c Distribution of average Betweenness (log10-based) within each gene group in the PPI network. d Distribution of GC (gene co-expression) network connectivity for genes from different phylogenetic branches. The error bars show the standard error of the mean for each group of genes, and the dashed line indicates the polynomial regression correlation between network centralities (that is, Connectivity, Betweenness) of genes and their divergence times. Numbers near each data point are phylogenetic branch assignments for each group of genes. The divergence time of each gene age group was assigned as the middle time point for each branch, and the oldest branch (branch 0) is arbitrarily set as 500 myr

estimated gene ages in the human genome based on an independent, long-distance phylogenetic distribution. The same evolutionary pattern of new genes was observed (Additional file 4: Figure S3B), which further demonstrates that our conclusion is independent of the gene age dating dataset. Thus, different GGI data, that is, PPI and GC data, and different gene age dating data all supported the conclusions reported above. Furthermore, we applied a similar protocol to the analysis of mouse GGI networks reconstructed from mouse PPI data (mPPIs), by integrating most of the experimental interaction datasets available online (Additional file 5: Table S2). The integrative analysis of mouse gene age information [13] (Additional file 6: Figure S4A) and PPI topological data (Additional file 6: Figure S4B) led to the same conclusion (Polynomial regression test, R² = 0.6232, Additional file 6: Figure S4C) as the human GGI network analyses. These data suggest that the gradual integration of new genes into GGI networks is an evolutionary process shared by the mammalian lineages of primates and rodents.
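The age-connectivity analysis described above can be illustrated with a minimal sketch. It assumes an undirected edge list and a gene-to-branch mapping as inputs, counts only genes that are present in the network, and fits a quadratic polynomial. The polynomial order, the treatment of genes absent from the network, and the function names are assumptions made for illustration; this section does not specify them, and the code is not the authors' pipeline.

```python
from collections import defaultdict


def mean_degree_by_branch(edges, branch_of):
    """Average number of interaction partners per phylogenetic branch.

    edges:     iterable of (gene_a, gene_b) pairs (undirected).
    branch_of: dict mapping gene -> branch index (0 = oldest).
    Genes without any interaction are skipped (an assumption).
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    by_branch = defaultdict(list)
    for gene, branch in branch_of.items():
        if gene in partners:
            by_branch[branch].append(len(partners[gene]))
    return {br: sum(d) / len(d) for br, d in by_branch.items()}


def polyfit_r2(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations.

    Returns (coefficients ordered from constant term upward, R^2).
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j), b[i] = sum y*x^i
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    pred = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coef, 1 - ss_res / ss_tot
```

In use, the per-branch means from `mean_degree_by_branch` would be paired with the branch divergence times (possibly log-transformed, which is again an assumption) and passed to `polyfit_r2` to obtain an R² comparable in spirit to those reported in Fig. 2.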

Given the observation that the acquisition of genetic interactions is a time-dependent gradual process, we further investigated whether this process occurred at a constant rate. Our results showed that new genes could establish linking partners at a high rate (interactions acquired per million years) in the initial stage after their origination. After that, the rate declined dramatically and finally plateaued (Fig. 3a and b), suggesting that the acquisition of biological roles by new genes is a rapid process during early evolution, but that, as the genes age, the functional spectrum diversifies at a much lower rate. Taking advantage of the high coverage of the human PPI data (Additional file 2: Table S1), we subsequently focused on the analysis of both topological and functional evolution patterns of new genes based on our first constructed human PPI network. To better visualize the integration process, we mapped the genes in the mammalian GGI networks based on their connectivity, where highly connected genes made up the core of the human PPI network and genes with low connectivity were located on the

[Fig. 3, two scatter plots against divergence time (million years ago; axis ticks at 500, 50 and 5), with data points labeled by phylogenetic branch. Panel A: rate of evolving interactions in the human PPI network (interactions / myr). Panel B: rate of evolving interactions in the mouse PPI network (interactions / myr). The fitted curves are annotated R² = 0.891 and R² = 0.8939.]

Fig. 3 Average rate of evolving linking partners (interactions / myr) for genes from different phylogenetic branches based on the human PPI network (a) and mouse PPI network (b). The dashed line indicates the power regression correlation between evolution rates of interactions for genes and their divergence times. Numbers near each data point are phylogenetic branch assignments for each group of genes. The divergence time of each gene age group is assigned as the middle time point for each branch, and the oldest branch (branch 0) is arbitrarily set as 500 myr

network periphery (Fig. 4), which revealed a clear correlation between gene age and location in the mammalian GGI networks. Surprisingly, a small fraction of young genes were found to have evolved into the network core, whereas the majority of recently originated genes, especially primate-specific genes (branch 8–12, Fig. 1a), are located in the exterior regions of the network. As the ages of genes increase, they tend to appear more frequently in the more densely connected core of the network.
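The rate analysis summarized in Fig. 3 can be sketched as follows. The sketch assumes that the rate for a gene group is its average connectivity divided by the divergence time assigned to its branch; the text gives only the unit (interactions per million years), so this definition is an assumption. The power regression named in the Fig. 3 caption is implemented here as an ordinary least-squares fit on log-transformed values, and the reported R² is therefore computed in log space. Function names are illustrative.

```python
import math


def interaction_rate(mean_degree, age_myr):
    """Interactions acquired per million years (assumed definition)."""
    return mean_degree / age_myr


def power_fit(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b, r2), where r2 is the squared Pearson correlation
    between log x and log y. All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                       # exponent of the power law
    a = math.exp(my - b * mx)           # prefactor
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2
```

A negative fitted exponent `b` would correspond to the pattern described in the text: high rates for recently originated genes and a decline toward a plateau for older ones.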

[Fig. 4, three-dimensional bar chart. x-axis: phylogenetic branches 0–12. y-axis: connectivity category, from very low connectivity (periphery) through low, moderate and high connectivity to very high connectivity (core). z-axis: fraction of genes, 0 to 0.8.]

Fig. 4 PPI network locations of genes in relation to their divergence times. The network locations of genes are classified into five distinct layers according to the rank of degree centralities. Specifically, genes that have the top 20 % of degree centralities are assigned to the network core (genes with very high connectivity), and those with the bottom 20 % of degree centralities are assigned to the network periphery (genes with very low connectivity). The same rule is applied for the assignment of the middle three network layers. The x-axis shows the phylogenetic branches for each group of genes, the y-axis indicates the categorization of genes according to the above specifications, and the z-axis shows the percentage of genes within each age group located in the corresponding categories
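The five-layer classification described in the Fig. 4 caption can be sketched as below. Genes are ranked by degree and split into five equal-sized groups, from periphery (layer 0) to core (layer 4). How ties in degree are handled is not stated in the text; breaking ties by gene name is an assumption made here only to keep the ranking deterministic, and the function names are illustrative.

```python
from collections import defaultdict

# Layer labels, ordered from network periphery to network core
LAYERS = ["very low", "low", "moderate", "high", "very high"]


def assign_layers(degree):
    """Map each gene to a layer index 0..4 by rank of degree.

    degree: dict mapping gene -> number of interaction partners.
    The bottom 20 % of ranks go to layer 0, the top 20 % to layer 4.
    """
    ranked = sorted(degree, key=lambda g: (degree[g], g))
    n = len(ranked)
    return {g: rank * 5 // n for rank, g in enumerate(ranked)}


def layer_fractions(layer_of, branch_of):
    """Fraction of genes of each branch that fall into each layer."""
    counts = defaultdict(lambda: [0] * 5)
    for gene, layer in layer_of.items():
        if gene in branch_of:
            counts[branch_of[gene]][layer] += 1
    return {br: [c / sum(row) for c in row] for br, row in counts.items()}
```

The dictionary returned by `layer_fractions` corresponds to the quantity plotted on the z-axis of Fig. 4: for each branch, the share of its genes in each of the five connectivity categories.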


New genes gradually acquire pleiotropic and essential function roles

As most biological characteristics arise from the complex interactions between a cell’s numerous components [4], the integration of new genes into the GGI network might indicate the emergence of novel functions for these new genes. Furthermore, the gradual evolution of more interactions in GGI networks might signal the process of new genes acquiring pleiotropic functions. This hypothesis is indirectly supported by the strong correlation between the connectivity of genes and their divergence times (Fig. 2a) and by a strong linear correlation between the connectivity of genes and their expression breadths at both the RNA expression level (Pearson linear correlation test, R² = 0.9384, Fig. 5a) and the protein expression level (Pearson linear correlation test, R² = 0.9457, Fig. 5b). This hints that new genes gradually evolve broader expression patterns, and therefore acquire pleiotropic functions, as they gradually evolve more linking partners (Fig. 2a), since genes with more linking partners tend to have broader expression patterns (Fig. 5a and b). To verify this hypothesis directly, we further computed and compared the tissue expression patterns for genes along different phylogenetic branches. Our results showed that genes gradually evolved broader tissue expression patterns at the mRNA expression level based on RNA-seq data [20] (Polynomial regression correlation test, R² = 0.96538, Fig. 5c), which indicates the acquisition of stronger pleiotropic functions. Although one might question the role of mRNA as the performer of biological functions, our analysis of protein expression profiling

[Fig. 5, four plots. Panel A: average number of tissues with expression at the RNA level against PPI network connectivity (bins 1 to 7 and >=8), R² = 0.9384. Panel B: average number of tissues with expression at the protein level against PPI network connectivity (bins 1 to 7 and >=8), R² = 0.9457. Panel C: average number of tissues with expression at the RNA level against divergence time (million years ago; axis ticks at 500, 50 and 5), R² = 0.9654. Panel D: average number of tissues with expression at the protein level against divergence time (million years ago), R² = 0.8004. In panels C and D, data points are labeled by phylogenetic branch (0–12).]

Fig. 5 Expression breadths of genes with regard to their PPI network connectivity and divergence times. a Average number of tissues with expression of genes with various PPI network connectivity based on RNA-seq expression level data. b Average number of tissues with expression of genes with various PPI network connectivity based on protein expression level data. The error bars show the standard error of the mean for each group of genes, and the solid line indicates the linear regression correlation between network connectivity of genes and their expression breadths. c Average number of tissues with expression of genes from different phylogenetic branches based on RNA-seq expression level data. d Average number of tissues with expression of genes from different phylogenetic branches based on protein expression level data. The dashed line indicates the polynomial regression correlation between divergence times of genes and their expression breadths. Branch assignment is labeled near each data point. The age assignment for each branch follows Fig. 1


data [20] drew the same conclusion (Polynomial regression test, R² = 0.80038, Fig. 5d). In line with the network topological integration process of new genes (Figs. 2a and 4), our results showed a gradual process by which new genes evolve pleiotropic functional roles, as reflected by their tissue expression patterns. These findings also suggest functional constraints on newly originated genes [21], as they usually show very narrow and specific expression patterns [22], such as testis expression [23].

One critical feature of scale-free networks is the existence of hub nodes, or highly connected nodes [24]. Hub nodes are essential components in various networks [25], and are subject to concentrated evolutionary forces that shape the network structures to result in essential functions [3, 26]. To explore the contribution of new genes in reshaping the GGI network, we investigated the percentage distributions of hub genes (with interaction degrees no smaller than 6) originating across different phylogenetic branches in the human PPI network. The data revealed a strong correlation between gene ages and fractions of hub genes (Polynomial regression correlation test, R² = 0.8016, Fig. 6a). In particular, we found a high proportion of hub genes (16 %) arising in the most recently originated human-specific branch (Branch 12, Fig. 1a), and this proportion gradually increased with gene age, peaking at around 53 % for the earliest originating genes (Branch 0, genes arising before the split of vertebrates, Fig. 1a). This phenomenon indicates a gradual process in which new genes evolve into network hubs and reshape the original gene interaction networks.

It has been reported that there is a relationship between gene topological features and biological functions [26, 27]. More specifically, genes with high network connectivity tend to be functionally essential [26] (Fig. 6b). Given the above observation that new genes gradually evolve many interactions to become network hubs, it is reasonable to infer that the acquisition of functional essentiality by new genes in the human genome may follow a step-wise evolutionary process. Through the meticulous collection and analysis of sources of human gene essentiality data (Additional file 7: Table S3, see Methods), we explored the relationship between gene essentiality and origination time (Fig. 6c). Unexpectedly, a proportion of newly originated genes, especially genes that arose after branch 6 (approximately 80 million years ago), have evolved essential functions, although more genes originating from older periods are functionally essential, and the fraction of essential genes increases with the elapse of evolutionary time. Together with the aforementioned observations from the network topology, our analysis demonstrated a clear trend that new human genes gradually evolve to be topologically central and


functionally essential, and acquire the capability to reshape the GGI networks.

Human-specific hub genes are found to have potential brain development functions

The remarkable development of the brain in primate-lineage species, especially in human, is a decisive hallmark differentiating them from other organisms [28]. Recent studies have reported important roles of new genes in the evolution of important human brain-related traits. For example, an excess of young (that is, primate-specific) genes in the human genome was found to be recruited in early human brain development [2]; SRGAP2 has been implicated in strengthening brain neuron connections [29, 30]; and CHRFAM7A has been implicated in skin and brain functions [31, 32]. We further investigated how the young human genes that have evidence for functioning in brain development relate to their topological structures in the GGI networks. Through integrative analysis of the brain expression pattern data of these young genes [2] and their network topological features based on human PPI network data, we found no significant bias in the percentages of hub genes (with minimum interaction degrees of 6) among three different brain expression categories of young genes (Fisher’s exact test, Fetus vs. Adult: P value = 0.435, Adult vs. Unbiased: P value = 0.3323, Fig. 7). In other words, young genes with diverse network connectivity contribute equally during both early and late stages of human brain development. More intriguingly, we found four human lineage-specific hub genes (genes that originated only in the human lineage since its divergence and thus exist only in the human genome) with clear evidence of expression in the human brain (Additional file 8: Table S4). As there was no direct clue in the literature about the functions of these four genes in brain development, we conducted a ‘guilt by connection’ study, investigating the reported evidence for roles in brain function of their direct linking partners by manual curation of earlier studies (Additional file 9: Table S5). For instance, CCT4, a subunit of chaperonin containing TCP1, was reported to be involved in the development of a brain malfunction disorder, Alzheimer’s disease [33], and CCT4 (gene id: 10575) was also shown to be a direct interacting partner of one of the young hub genes, FAM86B2 (gene id: 653333, Fig. 8). Collectively, we found that 62.5 % (10 of 16) and 53.3 % (8 of 15) of the first-layer linking partners of the two fetus brain-biased hub genes were confirmed to be involved in brain development (Fig. 8 and Additional file 9: Table S5). For the other two, unbiased hub genes, 24.4 % (10 out of 41) and 50 % (3 out of 6) were proven to function in brain development in previous literature

[Fig. 6, three plots. Panel A: fraction of hub genes in PPI network against divergence time (million years ago; axis ticks at 500, 50 and 5), data points labeled by phylogenetic branch (0–12), R² = 0.8016. Panel B: fraction of functionally essential genes against PPI network connectivity (bins 1 to 7 and >=8). Panel C: fraction of functionally essential genes against divergence time (million years ago), data points labeled by phylogenetic branch (0–12). The fitted lines in panels B and C are annotated R² = 0.665 and R² = 0.9712.]

Fig. 6 Fraction of topologically and functionally essential genes for gene groups from different divergence times. a Fraction of hub genes in PPI network within gene groups of different divergence times. Hub genes are defined as genes with network connectivity greater than median level (Interaction degree >= 6). Branch assignment is labeled near each data point. The age assignment for each branch follows Fig. 1. The dashed line indicates the polynomial regression correlation between divergence times of genes and the fractions of hub genes. b Fraction of essential genes with regard to their PPI network connectivity. The solid line indicates the linear regression correlation between PPI network connectivity of genes and the fractions of essential genes within each gene group. c Fraction of essential genes in PPI network within gene groups from different divergence times. The dashed line indicates the polynomial regression correlation between divergence times of genes and the fractions of essential genes
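The hub-gene fractions plotted in Fig. 6a follow directly from the definition given in the caption (interaction degree >= 6). A minimal sketch, assuming a precomputed degree table and a gene-to-branch mapping as inputs (the function and variable names are illustrative, not taken from the authors' code):

```python
from collections import defaultdict

HUB_MIN_DEGREE = 6  # hub threshold stated in the text and in the Fig. 6 caption


def hub_fraction_by_branch(degree, branch_of, min_degree=HUB_MIN_DEGREE):
    """Fraction of genes per branch whose degree is at least min_degree.

    degree:    dict mapping gene -> number of interaction partners.
    branch_of: dict mapping gene -> phylogenetic branch index.
    Genes without a branch assignment are skipped.
    """
    total = defaultdict(int)
    hubs = defaultdict(int)
    for gene, d in degree.items():
        branch = branch_of.get(gene)
        if branch is None:
            continue
        total[branch] += 1
        if d >= min_degree:
            hubs[branch] += 1
    return {branch: hubs[branch] / total[branch] for branch in total}
```

Replacing the degree test with a lookup in a set of essential genes would give the analogous per-branch fraction of essential genes shown in Fig. 6c.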

(Fig. 8 and Additional file 9: Table S5). As genes with similar functions tend to be within the same network cluster [34], this evidence suggests that these four human lineage-specific hub genes could also have associated functions in human brain development.

Multiple mechanisms drive the evolution of human GGI network

The most significant property of complex networks, including biological networks, is the power-law degree distribution [24] (Additional file 1: Figure S1), the so-called scale-free feature. Following the classic Barabasi-Albert (BA) model [35], the preferential attachment model, which claims that newly originated genes tend to interact with well-connected nodes, was also applied to account for the scale-free feature of biological networks [36]. However, the biggest challenge for this model is a distinctive characteristic of biological networks: duplication is the dominant source of network evolution [37]. Therefore, another, biologically motivated model, called the duplication-divergence model, was proposed [38, 39], which accounts for both gene duplication and the subsequent loss of inherited interactions. However, the acquisition of new links, other than inherited interactions, was not considered in this model. To address this issue from an evolutionary perspective, we defined primate-specific genes (branch 8–12 as shown in Fig. 1a) as young genes, and genes that originated before

[Fig. 7, bar chart. y-axis: fraction of young genes (0 to 100). x-axis categories: fetus brain biased, adult brain biased, unbiased. Bars: young hub genes and young non-hub genes. Annotated P values: 0.435 and 0.3323.]

Fig. 7 Comparison of PPI network topologies for young genes with diverse brain expression patterns. This figure shows the percentage distribution of young hub genes and young non-hub genes within different categories of brain expression patterns. The statistical significance difference was calculated using Fisher’s exact test
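The comparison behind Fig. 7 uses Fisher’s exact test on 2x2 tables of hub versus non-hub counts for two brain expression categories. The sketch below implements the two-sided test from the hypergeometric distribution, summing the probabilities of all tables that are no more likely than the observed one. The underlying counts are not given in this section, so any numbers passed to the function are the reader's own; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    For example, a and b could be hub and non-hub counts in one category,
    and c and d the corresponding counts in another category.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at most as probable as the observed one;
    # the small relative tolerance guards against floating-point ties
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

A P value above the chosen significance level, as reported for both comparisons in Fig. 7, indicates no detectable difference in the proportion of hub genes between the two categories.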

Fig. 8 Human lineage-specific hub genes and their first-level linking partners. This figure illustrates two fetus brain biased human lineage-specific hub genes (top) and two unbiased human lineage-specific hub genes (bottom) and their direct interacting partners from the human PPI network. Genes biased in fetus brain (blue), adult brain (red), and unbiased (orange) between fetus and adult brain are marked. Genes (in square circles) outlined in the green dashed rectangle have been reported to have some brain development-related functions in previous literature


this time period as old genes. Among these young genes, 95 % were created by duplication-based mechanisms (either DNA-level or RNA-level duplication) (Additional file 10: Figure S5), which is in line with the classic argument that duplication is the dominant source of evolution [37]. Consequently, these young genes inherited on average 27 % of linking partners from their parental genes (Fig. 9a), which is significantly greater (18 times) than that of random gene pairs (Fig. 9b). This finding indicates the inheritance of interacting partners by new genes from their parental copies [5]. We further explored the pattern by which young genes establish new linking partners, by removing the interactions shared with their parental genes. In contrast to the pattern in yeast [10], we found that the young genes tend to prefer as new linking partners the genes with high topological centralities (Chi-square tests, Degree: P value
