Proc. Natl. Acad. Sci. USA Vol. 89, pp. 6358-6362, July 1992 Evolution

Conservation of the organization of five tightly clustered genes over 600 million years of divergent evolution (gene cluster/gene organization/ribosomal protein L7a/surfeit locus/CpG-rich islands)

PAOLO COLOMBO*, JEFF YON, KENNETH GARSON, AND MIKE FRIED Eukaryotic Gene Organization and Expression Laboratory, Imperial Cancer Research Fund, P.O. Box 123, Lincoln's Inn Fields, London WC2A 3PX, United Kingdom

Communicated by Walter Bodmer, April 3, 1992 (received for review January 28, 1992)

ABSTRACT The organization of the mouse surfeit locus is unusual in that it contains six housekeeping genes (Surf-1 to Surf-6), which are unrelated by sequence homology, in the tightest mammalian gene cluster thus far described. A maximum of only 73 base pairs separates any two of the four well-characterized genes, and two of the genes overlap at their 3' ends. The direction of transcription of each of the five surfeit genes, Surf-1 to Surf-5, alternates with respect to that of its neighbor, suggesting cis-interaction or coregulation between the genes by mechanisms such as the sharing of regulatory elements and/or antisense regulation. The Surf-3 gene has been identified as encoding the ribosomal protein L7a (Rpl7a). We have used the high conservation of the Rpl7a gene to clone the chicken gene and surrounding genomic DNA. The tight clustering and juxtaposition of at least five of the surfeit genes (Surf-1 to Surf-5) and their associated CpG-rich islands have been found to be conserved over the 600 million years of divergent evolution that separates birds and mammals. This strongly suggests that the surfeit locus represents a different form of gene cluster in which gene organization may play both a positive and negative regulatory role in gene expression, possibly via cis-interactions between the closely spaced genes.

Sets of functionally related genes are often found clustered together in vertebrate genomes. In a number of instances these gene clusters are also found to be conserved through evolution. In some cases, the genes within a cluster share DNA sequence homology and probably arose from gene duplication. In other instances, the genes within the cluster do not share DNA sequence homology. An experiment initially designed to isolate transcriptional enhancers (1) led to the identification and characterization of a very compact gene cluster (2) called the surfeit locus. The mouse surfeit locus has a number of distinctive features and may represent an additional type of gene cluster. It contains at least six very tightly clustered housekeeping genes (Surf-1 to Surf-6), which are unrelated by sequence homology. In contrast to the tens to hundreds of kilobases (kb) that separate most adjacent mammalian genes, a maximum of only 73 base pairs (bp) separates any of the adjacent well-characterized surfeit genes (see Fig. 3B). The heterogeneous 5' ends of the Surf-1 and Surf-2 genes are separated by only 15-73 bp (3), and the 3' ends of the Surf-1 and Surf-3 genes are separated by only 70 bp (4). The 3' ends of the Surf-2 and Surf-4 genes overlap by 133 bp (5). The surfeit locus contains the tightest cluster of mammalian genes so far described. The 5' end of each gene is associated with a CpG-rich island, and the direction of transcription of each of the five characterized genes (Surf-1 to Surf-5) alternates with respect to that of its neighbor(s) (2). This topography is thus far unusual for a cluster of mammalian genes and suggests cis-interaction and/or coregulation between the surfeit genes by mechanisms such as the sharing of regulatory elements, antisense regulation, and/or promoter occlusion (2). The DNA sequence of the Surf-3 gene, which encodes the ribosomal protein L7a (Rpl7a), is highly conserved through evolution (6, 7). In mammals the Rpl7a gene, like other mammalian ribosomal protein genes, is a member of a multigene family containing multiple processed pseudogenes (15-30 copies) (8). However, there appears to be only one functional gene present in birds, amphibia, insects, and Schizosaccharomyces pombe (6) and two functional genes in Saccharomyces cerevisiae (7). The other surfeit genes are single copy and do not appear to encode ribosomal proteins (2).

To gain some insight into the importance and reason for the unusual organization of the surfeit gene cluster, we have analyzed the conservation and association of these genes in the chicken, which is separated from mammals by 600 million years of divergent evolution (9, 10). We used the evolutionary conservation of the Surf-3/Rpl7a gene first to isolate a chicken cDNA and eventually a genomic clone containing the single chicken Surf-3/Rpl7a gene.† Analysis of the cloned chicken gene and surrounding genomic DNA has revealed that the Surf-3/Rpl7a gene and its structure are very highly conserved between mammals and birds, and at least four of the other surfeit genes are present in the chicken genome and occur in a cluster of similar topographic organization to that of the mouse surfeit locus. These results suggest that the surfeit locus represents a third type of gene cluster in which the gene organization has biological significance.
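The topography of the mouse locus described in the introduction can be written down as a small data structure. The following is only an illustrative sketch: the gene order and relative orientations follow the text (divergent 5' ends for Surf-1/Surf-2, facing 3' ends for Surf-1/Surf-3 and Surf-2/Surf-4), but the absolute "+"/"-" strand labels, the `Gene` class, and the function names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    strand: str  # "+" = transcribed left to right; the absolute sign is an arbitrary convention


# Order and relative orientation as described in the text.
# Which strand is called "+" is an assumption made for this sketch.
MOUSE_SURFEIT: List[Gene] = [
    Gene("Surf-5", "-"),
    Gene("Surf-3", "+"),
    Gene("Surf-1", "-"),
    Gene("Surf-2", "+"),
    Gene("Surf-4", "-"),
]


def strands_alternate(genes: List[Gene]) -> bool:
    """True if every gene is transcribed in the opposite direction to its neighbor."""
    return all(a.strand != b.strand for a, b in zip(genes, genes[1:]))


def junction(left: Gene, right: Gene) -> str:
    """Name the gene ends that face each other between two adjacent genes."""
    if left.strand == "-" and right.strand == "+":
        return "5'-5' (divergent)"
    if left.strand == "+" and right.strand == "-":
        return "3'-3' (convergent)"
    return "5'-3' (tandem)"


for left, right in zip(MOUSE_SURFEIT, MOUSE_SURFEIT[1:]):
    print(f"{left.name} / {right.name}: {junction(left, right)}")
```

With alternating orientations, every junction is either a pair of facing 5' ends or a pair of facing 3' ends, which is why the text can speak of shared CpG-rich islands at 5' ends and of overlapping 3' ends.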

*Present address: Istituto di Biologia dello Sviluppo del Consiglio Nazionale delle Ricerche, Via Archirafi, 20, 90123 Palermo, Italy.
†The sequence reported in this paper has been deposited in the GenBank data base (accession no. X62640).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

A λ Charon 28 library containing BamHI-cleaved chicken genomic DNA fragments (a kind gift of T. Graf, European Molecular Biology Laboratory) and a λgt11 cDNA library derived from chicken liver (a kind gift of G. Ab, University of Groningen, The Netherlands) were screened under reduced-stringency hybridization conditions (4) by using a mouse full-length cDNA Rpl7a/Surf-3 probe (8). Restriction fragments containing Rpl7a/Surf-3 genomic and cDNA sequences were subcloned into Bluescript vectors (Stratagene) and sequenced by using T7 DNA polymerase (Sequenase) according to the manufacturer's (United States Biochemical) instructions. Sequence analysis was carried out using the Genetics Computer Group software (11). Restriction enzyme-digested chicken genomic DNA and restriction enzyme-digested DNA from the chicken λCH2 clone were fractionated on a 1% agarose gel and transferred to a Hybond-N membrane according to the manufacturer's instructions. Filters were hybridized in 4x standard saline citrate (SSC)/5x Denhardt's solution/0.5% SDS at 55°C, and washing was carried out at 55°C in 0.5x SSC/0.1% SDS. PCRs were carried out in a Dri-Block (Techne, Princeton, NJ) and consisted of 30 cycles using conditions previously described (12). The primers used for PCR of the λCH2 DNA to determine the distance and orientation of the chicken surfeit genes were derived from the coding regions of mouse surfeit genes as follows: Surf-1 (nucleotides 106-85) and Surf-2 (nucleotides 127-98) (3), Surf-1 (nucleotides 2956-2934) and Surf-3 (nucleotides 734-758) (8), and Surf-3 (nucleotides -27-1) and Surf-5 (nucleotides 107-90) (K. Garson, personal communication). The amplified PCR products were fractionated on 1% agarose gels, excised, and sequenced directly.
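The hybridization and wash buffers in Materials and Methods are given as multiples of SSC. As a rough aid to reading those conditions, the sketch below converts an SSC strength into salt concentrations. It assumes the commonly used recipe of 1x SSC = 150 mM NaCl plus 15 mM trisodium citrate, which is not stated in the paper, and it counts three sodium ions per citrate; the function name is hypothetical.

```python
# Assumed standard recipe (not given in the paper): 1x SSC = 150 mM NaCl + 15 mM trisodium citrate.
NACL_MM_PER_X = 150.0
CITRATE_MM_PER_X = 15.0


def ssc_composition(fold: float) -> dict:
    """Return NaCl, citrate, and total Na+ concentrations (mM) for an n-fold SSC buffer."""
    nacl = fold * NACL_MM_PER_X
    citrate = fold * CITRATE_MM_PER_X
    # Each trisodium citrate contributes three Na+ ions.
    sodium = nacl + 3.0 * citrate
    return {"NaCl_mM": nacl, "citrate_mM": citrate, "Na_total_mM": sodium}


hybridization = ssc_composition(4.0)  # 4x SSC, used for hybridization at 55°C
wash = ssc_composition(0.5)           # 0.5x SSC, used for washing at 55°C

print("hybridization:", hybridization)
print("wash:", wash)
```

Under this assumption the wash buffer has one-eighth the salt of the hybridization buffer at the same temperature, so the wash is the more stringent of the two steps.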

RESULTS

Organization and Structure of the Chicken Surf-3/Rpl7a Gene. A number of cDNAs were isolated from a chicken λgt11 library by using a mouse Surf-3/Rpl7a cDNA probe (8) after hybridization under conditions of slightly reduced stringency. All of the single plaques were analyzed by PCR using λgt11 primers, and the DNA sequence of the longest cDNA, PC4, was determined. A clone, λCH2, containing a 12-kb genomic insert, was isolated from a library of BamHI-cleaved chicken genomic DNA (in λ Charon 28) by using the PC4 chicken Rpl7a cDNA as a probe. Analysis of the PC4 cDNA and the corresponding regions of the cloned chicken genomic DNA allowed the sequence and organization of the chicken Surf-3/Rpl7a gene to be determined (Fig. 1). The chicken Rpl7a gene is composed of eight exons spanning 3.5 kb. The position of the introns is conserved among the mouse, human, and chicken genes although the size and sequence of the introns differ between the three genes (8, 13). At the DNA sequence level, within the putative coding regions, the chicken Surf-3/Rpl7a cDNA has 82% homology and 81% homology with the mouse and human Surf-3/Rpl7a cDNAs, respectively (8, 14). Both the chicken and mouse genes encode putative polypeptide chains of 266 amino acids. There are 10 differences (of which 6 are conservative changes) between the mouse and chicken amino acid sequences (96% homology) (8). The chicken, mouse, and human genes each contain short (about 50 bp) 3' untranslated regions. Only the chicken gene contains a canonical poly(A) addition site (AATAAA) (Fig. 1); both the mouse and the human genes contain variant putative poly(A) sites (8, 13, 14).

The Organization of the Surfeit Gene Cluster Is Conserved in the Chicken. As the Surf-3/Rpl7a gene is highly conserved between mammals and chicken, it was of interest to determine if homologues of the other mammalian surfeit locus genes were also conserved and tightly clustered in chicken as in mouse (2) and human (4). Mouse cDNA probes from Surf-1, Surf-2, Surf-3/Rpl7a, and Surf-5 were used to probe restriction enzyme-digested chicken λCH2 cloned DNA (Fig. 2A). The positions of the corresponding chicken surfeit genes in relation to the restriction map of the λCH2 clone are illustrated in Fig. 3A. It can be seen that the tightly clustered

intron 1
CGCCCTTTTTACTCTACTACCAAGATGCCGAAAGGAAAAAAGGCCAAGGGCAAGAAGGTGGCACCTGCC +45
METProLysGlyLysLysAlaLysGlyLysLysValAlaProAla

CCTGCTGTAGTCAAGAAGCAGGAGGCCAAGAAGGTTGTCAATCCCCTCTTTGAGAAGAGGCCCAAGAAC +114
ProAlaValValLysLysGlnGluAlaLysLysValValAsnProLeuPheGluLysArgProLysAsn

intron 2
TTTGGCATTGGACAGGATATCCAGCCCAAGCGTGATCTCACCCGCTTTGTGAAATGGCCCCGCTACATC +183
PheGlyIleGlyGlnAspIleGlnProLysArgAspLeuThrArgPheValLysTrpProArgTyrIle

AGGCTGCAGCGCCAGCGCTCCATTCTGTACAAGCGCTTGAAGGTGCCCCCTGCAATCAACCAGTTCAGT +252
ArgLeuGlnArgGlnArgSerIleLeuTyrLysArgLeuLysValProProAlaIleAsnGlnPheSer

intron 3
CAGGCTTTGGATCGCCAAACAGCCACGCAGCTTCTGAAGCTGGCACACAAATACAGGCCAGAAACTAAG +321
GlnAlaLeuAspArgGlnThrAlaThrGlnLeuLeuLysLeuAlaHisLysTyrArgProGluThrLys

CAAGAGAAGAAGCAGAGGCTGTTGGCTCGTGCTGAACAGAAAGCTGCAGGAAAGGGAGATACTCCAACT +390
GlnGluLysLysGlnArgLeuLeuAlaArgAlaGluGlnLysAlaAlaGlyLysGlyAspThrProThr

intron 4
AAGAGACCACCAGTCCTCCGGGCAGGTGTTAACACTGTCACAACTCTGGTAGAGAATAAGAAAGCTCAG +459
LysArgProProValLeuArgAlaGlyValAsnThrValThrThrLeuValGluAsnLysLysAlaGln

intron 5
CTTGTGGTGATTGCCCATGATGTAGACCCCATTGAGCTGGTGGTCTTCTTGCCAGCTCTGTGCCGCAAG +528
LeuValValIleAlaHisAspValAspProIleGluLeuValValPheLeuProAlaLeuCysArgLys

ATGGGAGTGCCATACTGCATCATCAAGAGCAAGGCCAGGCTGGGGCGACTGGTGCACAGGAAAACTTGT +597
MetGlyValProTyrCysIleIleLysSerLysAlaArgLeuGlyArgLeuValHisArgLysThrCys

intron 6
ACCTGTGTTGCTTTCACACAAGTTAACCCGGAGGATAAGGGTGCCCTTGCAAAGCTGGTGGAGGCTGTC +666
ThrCysValAlaPheThrGlnValAsnProGluAspLysGlyAlaLeuAlaLysLeuValGluAlaVal

intron 7
AAGACCAACTACAATGACAGATATGATGAGATCCGTCGTCACTGGGGCGGTAATGTCTTGGGTCCAAAA +735
LysThrAsnTyrAsnAspArgTyrAspGluIleArgArgHisTrpGlyGlyAsnValLeuGlyProLys

TCTGTGGCTCGCATTGCCAAGCTTGAAAAAGCAAAGGCTAAAGAACTGGCTACTAAGCTGGGCTAAAGT +804
SerValAlaArgIleAlaLysLeuGluLysAlaLysAlaLysGluLeuAlaThrLysLeuGly---

TGTACTGATTTGTACCGTGGTTTGTGTACATAAAAAAAATAAAGCTCTGGATT +857

FIG. 1. The DNA and amino acid sequence of the chicken Surf-3/Rpl7a gene. The translation product is shown below the DNA sequence. The 10 amino acid changes between the chicken and mouse (8) Surf-3/Rpl7a genes are indicated: the six conservative changes are denoted by double underlining, and the nonconservative changes are denoted by single underlining. The polyadenylylation signal is indicated by overlining. The positions of the seven introns are indicated.
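The relationship between the DNA rows and the translation rows of Fig. 1, the location of the AATAAA signal, and the quoted amino acid similarity can be checked with a short script. This is a minimal sketch, not the authors' analysis (they used the Genetics Computer Group software): it assumes the standard genetic code, uses only the first and last rows of the figure as input, and all function names are hypothetical.

```python
from itertools import product
from typing import Optional

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def find_polya_signal(seq: str, signal: str = "AATAAA") -> Optional[int]:
    """Return the 0-based offset of the first poly(A) signal, or None if absent."""
    offset = seq.find(signal)
    return None if offset == -1 else offset


def percent_identity(length: int, differences: int) -> float:
    """Percentage of positions that are identical between two aligned sequences."""
    return 100.0 * (length - differences) / length


# Coding part of the first row of Fig. 1 (the A of ATG is position +1).
FIRST_ROW_CODING = "ATGCCGAAAGGAAAAAAGGCCAAGGGCAAGAAGGTGGCACCTGCC"
# Last two rows of Fig. 1 (positions +736 to +857), which begin in frame.
LAST_ROWS = (
    "TCTGTGGCTCGCATTGCCAAGCTTGAAAAAGCAAAGGCTAAAGAACTGGCTACTAAGCTGGGCTAAAGT"
    "TGTACTGATTTGTACCGTGGTTTGTGTACATAAAAAAAATAAAGCTCTGGATT"
)

n_terminus = translate(FIRST_ROW_CODING)
c_terminus = translate(LAST_ROWS)

# The 3' untranslated region starts after the stop codon that ends the C terminus.
three_prime_utr = LAST_ROWS[3 * len(c_terminus) + 3:]
signal_offset = find_polya_signal(three_prime_utr)

# 10 amino acid differences over 266 residues, as stated in the text.
identity = percent_identity(266, 10)

print(n_terminus, c_terminus, len(three_prime_utr), signal_offset, round(identity, 1))
```

The one-letter output corresponds to the three-letter translation in the figure, and 256 identical residues out of 266 is consistent with the 96% figure quoted in the text.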

FIG. 2. Conservation of the clustering and organization of the chicken Surf-1, Surf-2, Surf-3, and Surf-5 genes. (A) DNA of the λCH2 clone digested with either EcoRI (E), HindIII (H), or BamHI (B) (indicated along the top) was fractionated on an agarose gel before being transferred to a Hybond-N membrane according to the manufacturer's instructions. The numbers on the left side of the gel indicate sizes in kb. The restriction map of λCH2 and the locations of the hybridizing regions are diagrammatically represented in Fig. 3A. (B) Fractionation of the PCR products generated from the DNA of the λCH2 clone on a 1% agarose gel using primers from the coding regions of the mouse surfeit genes. Lane 2, PCR of Surf-3 and Surf-5; lane 3, PCR of Surf-3 and Surf-1; lane 4, PCR of Surf-1 and Surf-2. Lanes 1 and 5 each contain 123-bp ladder size markers (BRL).

structure of the surfeit locus is conserved between chicken and mouse (compare A and B in Fig. 3). The juxtaposition of the chicken surfeit genes is the same as in the mouse (2), with Surf-3/Rpl7a lying between Surf-1 and Surf-5 and the Surf-2 gene located on the opposite side of Surf-1.

To estimate the distances between the different chicken surfeit genes, a number of PCRs (15) were performed on the λCH2 DNA by utilizing specific oligonucleotide primers derived from the coding regions of the adjacent mouse surfeit genes. The sizes of the PCR bands indicate maximum dis-

[Fig. 3A. Chicken surfeit locus: restriction map with sites NarI, SacI, SacII, SmaI, H3, and RI, CpG-rich islands, and genes Surf-1 to Surf-5.]
