Mammalian Genome (1991) 1:171-183

© Springer-Verlag New York Inc. 1991

The mouse Col2a-1 gene is highly conserved and is linked to Int-1 on Chromosome 15

Kathryn S.E. Cheah,¹ Patrick K.C. Au,¹ Elizabeth T. Lau,¹ Peter F.R. Little² and Lisa Stubbs³

¹Department of Biochemistry, Hong Kong University, Sassoon Road, Hong Kong; ²Department of Biochemistry, Imperial College of Science, Technology and Medicine, London, SW7 2AY, UK; ³Imperial Cancer Research Fund, Lincoln's Inn Fields, London WC2A 3PX, UK

Received December 7, 1990; accepted February 4, 1991

Abstract. Type II collagen is the major extracellular matrix component of cartilage, and correct expression of the α1(II) collagen gene is important for vertebrate skeletal development. To provide a basis for studying the control of type II collagen gene expression in embryogenesis and in mouse models of human connective tissue disease, the complete mouse Col2a-1 gene has been isolated in a single cosmid clone, cosMcol.2, and partially characterized. The gene is approximately 30 kb long. Compared with the human, rat, bovine and chicken equivalents, it is highly conserved in exon/intron structure and in nucleotide and amino acid sequence (>80% homology). A high degree of conservation was also found in the 5' flanking region of the rat, human and mouse α1(II) collagen genes, including several G + C and C + T rich direct repeat motifs. The sites of transcription start, the termination codon and polyadenylation have also been identified. In chicken, bovine and human, polyA attachment is at a single site; the mouse Col2a-1 gene, by contrast, uses two polyadenylation sites. Interspecies backcross analysis has also localized Col2a-1 to the central portion of mouse Chromosome (Chr) 15, approximately 8 centiMorgans (cM) proximal of Int-1 and 18 cM distal of Myc. Col2a-1 is therefore included in a linkage group that is conserved on human Chr 12q.

Introduction

Amongst the different connective tissues, cartilage plays a special role because it has to withstand high cyclical loads and absorb shocks. It does so by synthesizing a complex extracellular matrix. This matrix consists of at least five different collagens, types II, VI, IX, X and XI (Mayne and Irwin 1986; Grant et al. 1988), and hyaluronate-proteoglycan complexes (Rosenberg and Buckwalter 1986). Of these components, type II collagen is the most abundant protein in the extracellular matrix of cartilage, forming up to 60% of the protein content and 85% of the total collagen. The type II collagen molecule consists of three identical α chains, designated α1(II) (Miller 1971), which associate to form a triple helix. It is classified as a fibril-forming collagen because the molecules aggregate in the matrix to form fibrils (Miller and Gay 1987). The ability to form a triple helix and fibrils is due to the long uninterrupted sequence of a repeating amino acid triplet, Gly-X-Y, where X and Y are often proline and hydroxyproline, or alanine and lysine (Miller and Gay 1987; Kuhn 1987). Type II collagen is often thought of as a classical collagen because of its early discovery, characteristic structure and fibril-forming ability. Even so, little is known of the factors regulating the expression of the α1(II) collagen gene during development. It is, however, generally accepted that correct expression of type II collagen is important for chondrogenesis. It is also important for proper skeletal development, as cartilage forms the scaffold for the formation of endochondral bone (Veis 1984). Very few mutations have been identified in the human α1(II) collagen gene, COL2A1. As a result, little is known about the relationship between structure and function, which could be learned by studying the phenotypic effects of mutations in COL2A1. In Stickler syndrome, linkage analysis has implied an association with a defect in COL2A1 (Francomano et al. 1987). Mutations in the α1(II) collagen gene have also been demonstrated in a patient with achondrogenesis (Vissing et al. 1989) and in a case of spondyloepiphyseal dysplasia (Lee et al. 1989).

An important first prerequisite for studying the relationship between structure and function is the isolation and characterization of the genes. The complete human α1(II) collagen gene, and parts of the rat, bovine and chicken genes, have been isolated. However, for type II collagen, correlating structure with function by studying the phenotypic consequences of mutations in the gene has not been straightforward. Biopsy samples for biosynthetic studies are difficult to obtain, and cell culture model systems are lacking. Chondrocytes are phenotypically unstable in long-term culture: they tend to dedifferentiate, become fibroblastic and switch synthesis from type II to type I collagen (Benya and Brown 1986). The mouse provides a good model for the study of development and disease because of the wealth of basic knowledge of its genetics and embryogenesis. More recently, techniques have been developed in which specific mutations may be introduced into the mouse germ line by homologous recombination in pluripotential stem cell lines (Thomas et al. 1986; Schwartzberg et al. 1989). This technique provides a powerful tool for correlating structure with function by studying the phenotypic consequences of mutation in an animal model. To provide the basis for such studies, we have isolated the complete murine α1(II) collagen (Col2a-1) gene and partially determined its primary structure. We have identified the start site of transcription and potential polyadenylation sites. We also demonstrate that the α1(II) collagen gene is approximately 30 kb in size and is highly conserved among vertebrate species. In addition, we have used an interspecies backcross to localize the highly conserved Col2a-1 gene between the markers Myc and Int-1 in the central portion of mouse Chr 15.

The sequence data presented in this paper have been submitted to GenBank and have been assigned the accession numbers M63708, M63709 and M63710. Offprint requests to: K.S.E. Cheah

K.S.E. Cheah et al.: Mouse Col2a-1 is conserved and located on Chr 15

Materials and methods

Isolation and characterization of genomic clones containing the complete mouse α1(II) collagen gene A mouse genomic library (Waley et al. 1990) was constructed in the cosmid vector Lorist 2 using a partial Hind III digest. The library was screened by the hybridization methods described by Cross and Little (1986). The probe was a 9.8 kb EcoR I fragment from the triple-helical coding domain of the human gene (Cheah et al. 1985). Filters were finally washed in 1 × SSC, 0.1% SDS at 65°C. From this screening, one strongly hybridizing clone, cosMcol.2, was selected for further study because it hybridized to 4.8 and 2.4 kb EcoR I DNA fragments containing, respectively, the first and last exons of the human α1(II) collagen gene. CosMcol.2 was digested with a combination of restriction enzymes, and a restriction map was constructed (Fig. 1).

DNA sequencing Exons were sequenced on both strands by the dideoxy chain termination method (Sanger et al. 1977) on double-stranded DNA using Sequenase 2.0 (United States Biochemicals, USA), following the manufacturer's protocol. Longer contiguous stretches of DNA sequence were obtained using the exonuclease/mung bean nuclease method of generating overlapping clones (Henikoff 1987). Sequence comparison analyses were performed using the UWGCG computer programs of Devereux et al. (1984) and the LFASTA program of Pearson and Lipman (1988).

RNA isolation and RNase protection assays Total RNA was isolated from 16.5d post coitum (p.c.) mouse fetuses using the lithium chloride-urea method (Lovell-Badge 1987). 32P-labeled antisense RNA probes for RNase protection assays were generated as described by Melton et al. (1984) from two subclones, pEL111 and pPA120. pEL111 contained exon 1 and 5' flanking DNA in a 600 base pair (bp) Acc I-Sal I fragment subcloned in pGEM3 (Promega Biotec, USA). pPA120 contained part of exon 52 and the 3' flanking region of the gene in a 1.2 kb Rsa I-BamH I fragment subcloned in pBluescript KS- (Stratagene, USA). The labeled antisense RNA was hybridized with total RNA in 80% formamide, 40 mM PIPES pH 6.7, 400 mM NaCl, 1 mM EDTA at 45°C or 50°C for 18 h. Non-hybridized RNA was removed by digestion with RNase A (10 µg/ml) and RNase T1 (0.5 µg/ml) for 15 min at 37°C. The protected RNA fragments were then electrophoresed on 6% polyacrylamide/8 M urea gels as described (Lovell-Badge 1987). The sizes of the protected RNA fragments were determined relative to DNA molecular weight markers (pGEM3 digested with Hpa II).

Primer extension analysis Primer extension analysis was performed using a synthetic oligonucleotide complementary to part of the first exon of the mouse α1(II) collagen gene. The sequence of the oligonucleotide was 5' AGCACCAGCGACTGGGGAGC 3', complementary to nt +169 to +188 of the gene. The synthetic oligonucleotide (0.25 ng) was labeled with 32P-ATP (Amersham, UK) using polynucleotide kinase under standard conditions (Maniatis et al. 1982). The labeled oligonucleotide was mixed with 20 µg total RNA in 0.4 M NaCl, 40 mM PIPES pH 6.5 and 1 mM EDTA in a final volume of 10 µl. The mixture was boiled for 3 min and incubated for 16-18 h at 45°C. Primer extension buffer was added to a final concentration of 50 mM Tris-HCl pH 8.0, 10 mM DTT, 10 mM MgCl2, 50 mM KCl and 0.5 mM each of dATP, dCTP, dGTP and dTTP (Pharmacia-LKB). Reverse transcriptase (10-20 units, BRL, USA) and RNAsin (40 units, Promega Biotec) were added, and the reaction mixture was incubated at 43°C for 1 h in a final volume of 50 µl. RNase A (0.4 ng/ml) and EDTA (125 mM) were added, and the mixture was incubated for 15 min at 37°C. The mixture was diluted threefold with distilled H2O and extracted with an equal volume of phenol:chloroform (1:1). The nucleic acids were coprecipitated with 3 µg tRNA in 2.5 volumes of ethanol and electrophoresed on 5% polyacrylamide/6 M urea sequencing gels.

In situ hybridization Fetuses of CBA/N mice, 14.5d p.c., were dissected free of yolk sac and processed for in situ hybridization as described by Wilkinson et al. (1987). Single-stranded 35S-labeled RNA probes were generated from pEL107 (Fig. 1). This subclone contains the first exon of the mouse α1(II) collagen gene in an 860 bp Pst I-Sal I fragment in pBluescript KS. Sense and antisense RNA strands were generated using T3 and T7 RNA polymerase, respectively. K5 (Ilford, UK) emulsion was used for autoradiography. After 6 days' exposure to emulsion, slides were developed in D19 developer (Kodak) and stained for histology with haematoxylin (Ehrlich) and eosin. Sections were photographed using Kodak EPY-50 film under dark field illumination.
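The primer-extension design above can be checked by computer. The oligonucleotide should be the reverse complement of the sense strand at +169 to +188. The minimal Python sketch below makes one assumption: the hypothetical constant `EXON1_FROM_124` holds the mouse exon 1 sequence transcribed from Fig. 2. That sequence is taken to start at +124, the position the Results give for the first 5' GCCAGGC 3' repeat, and to end at the last base of exon 1. The helper names are illustrative only.

```python
# Sketch: locate the primer-extension oligonucleotide on the Col2a-1 sense strand.
# Assumption: EXON1_FROM_124 is the mouse sense-strand sequence read from Fig. 2,
# starting at +124 and ending at the last exon 1 base (numbering has no position 0).

COMPLEMENT = str.maketrans("ACGT", "TGCA")

EXON1_FROM_124 = (
    "GCCAGGCTCTGCCAGGCCTCGCGGTGAGCCATGATCCGCCTCGGGGCTCCCCAGTCGC"
    "TGGTGCTGCTGACGCTGCTCATCGCCGCGGTCCTACGGTGTCAGGGCCAGGATGCCC"
)
PRIMER = "AGCACCAGCGACTGGGGAGC"  # 5'->3', as given in the Methods


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_on_sense(primer: str, sense: str, first_pos: int) -> tuple[int, int]:
    """Return (start, end) sense-strand coordinates of the region the primer anneals to."""
    target = reverse_complement(primer)
    idx = sense.find(target)
    if idx < 0:
        raise ValueError("primer target not found in sense sequence")
    # Simple offset arithmetic is valid here because all coordinates are positive.
    start = first_pos + idx
    return start, start + len(target) - 1


start, end = locate_on_sense(PRIMER, EXON1_FROM_124, first_pos=124)
print(start, end)  # 169 188
print("expected extension product:", end, "nt")
```

Under the same assumption, a cDNA extended from this primer back to a cap at +1 would run from +188 to +1. Because the numbering skips 0, that product is 188 nt long.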

Fig. 1. Organization of the mouse Col2a-1 gene. Upper: Partial restriction map of cosmid cosMcol.2 compared with cosHcol.1 (Cheah et al. 1985), which contains the complete human COL2A1 gene. In both cosmids, the positions of the TATA box and exon 1 (small box), the termination codon and the polyadenylation signal are indicated. EcoR I (E) sites were completely mapped. Not all other restriction sites are marked, only those that show the positions of the subclones referred to in the text. Lower: Regions of cosMcol.2 sequenced and positions of subclones. The direction of sequencing is indicated by arrows. Numbered lined boxes represent exons. Long open boxes mark the positions of subclones used for RNase protection, primer extension, in situ hybridization and chromosome mapping. A = Acc I, B = BamH I, H = Hind III, P = Pst I, Pv = Pvu II, R = Rsa I, S = Sac I, Sm = Sma I, Sp = Spe I, St = Stu I, X = Xba I. Exon number convention: The fibrillar collagens have a basic gene structure consisting of approximately 52 exons (Vuorio and de Crombrugghe 1990; Chu and Prockop 1991). For the four genes whose primary gene structure is known [α1(I), α2(I), α1(II) and α1(III)], the number of exons coding for the main triple helix and carboxypropeptide domains of the protein is conserved.

The amino propeptide of fibrillar collagens contains both globular and triple helical domains. The number of exons coding for the amino propeptide varies between five and eight. The variation depends on the presence or absence of a cysteine-rich globular coding domain, or on a fusion of two exons coding for Gly-X-Y triplets (reviewed in Vuorio and de Crombrugghe 1990; Chu and Prockop 1991). The complete gene sequences of the fibrillar collagens have not all been determined. The convention, therefore, is to base exon numbering on the α1(I) and α2(I) collagen genes (Vuorio and de Crombrugghe 1990; Chu and Prockop 1991). On this basis, the numbering system assigns the N-propeptide to exons 1-6. Variation in the number of exons in this domain is accommodated by designating exons as A or B, for example exons 4A, 4B, 5A and 5B for COL2A1, and exon 4/5 for COL3A1. Exons 7-48 encode the triple helix; exon 49 encodes the end of the triple helix, the telopeptide and the start of the C-propeptide; exons 50-52 encode the rest of the C-propeptide and the 3' untranslated region (Vuorio and de Crombrugghe 1990; Chu and Prockop 1991). This convention is used to assign exon numbers to the mouse Col2a-1 gene.

Genetic mapping The interspecies backcross used in this study, between Mus spretus mice and animals of the Mus musculus domesticus strain C57BL/10, has been described (Stubbs et al. 1990; Van-der-Meer-de-Jong et al. 1990). Kidneys from backcross progeny were kindly provided by Dr. Brigid Hogan. DNA was prepared from frozen tissue as described by Herrmann and Frischauf (1987). Southern blotting and hybridization of the DNA were performed as previously described (Stubbs et al. 1990). Animals were scored for the presence or absence of Mus spretus alleles for each probe. Distances in cM are reported together with their standard errors (in parentheses).
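The Methods state that map distances are reported with standard errors, but they do not give the formula. A common choice for a backcross, assumed here, has two parts. The recombination fraction is r = k/n, where k is the number of recombinant animals among n scored. Its binomial standard error is sqrt(r(1 - r)/n). Both values are multiplied by 100 to give cM. The function name and the counts in the example below are hypothetical, not data from this study.

```python
import math


def backcross_distance(recombinants: int, n: int) -> tuple[float, float]:
    """Return (distance in cM, standard error in cM) for two loci scored in n backcross animals.

    Assumes r = recombinants / n and the binomial standard error sqrt(r * (1 - r) / n);
    no mapping-function correction is applied.
    """
    if n <= 0 or not 0 <= recombinants <= n:
        raise ValueError("need n > 0 and 0 <= recombinants <= n")
    r = recombinants / n
    se = math.sqrt(r * (1 - r) / n)
    return 100.0 * r, 100.0 * se


# Hypothetical counts, for illustration only (not data from this study):
cm, se = backcross_distance(recombinants=8, n=100)
print(f"{cm:.1f} cM ({se:.1f})")  # 8.0 cM (2.7)
```

Ordering the loci, for example by minimizing double recombinants, is a separate step that this sketch does not attempt.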

Molecular probes for genetic mapping

Polymorphic fragments were used to detect the Myc, Int-1, Gdc-1 and Col2a-1 loci in the interspecies backcross. Segregation of Myc was followed using a 970 bp Sma I-Xho I fragment including the first exon and 5' flanking sequences of the mouse Myc gene (S. Roberts and D. Bentley, unpublished data). The Myc probe detected a fragment of 2.5 kb in Taq I-digested DNA of C57BL/10 mice and a 3.5 kb fragment in Taq I digests of M. spretus DNA. The Int-1 locus was detected using a 2.4 kb cDNA clone, pMT70, obtained from the American Type Culture Collection. The pMT70 probe detected fragments of approximately 15 and 13 kb in Bgl II-digested DNA of C57BL/10 and M. spretus animals, respectively. Gdc-1 was detected with a 780 bp cDNA clone (Chu and Prockop 1991), which identifies Hinc II fragments of 2.2 and 6.5 kb in DNA of C57BL/10 and M. spretus mice, respectively. Col2a-1 was represented by pPA102, a 5.6 kb Hind III genomic fragment subcloned into pGEM3 (Fig. 1). The pPA102 probe detected fragments of 6.5 and 7.5 kb in BamH I-digested C57BL/10 and M. spretus DNA, respectively.

Results

Isolation of the complete mouse Col2a-1 gene

The cosmid clone cosMcol.2 was isolated from a mouse cosmid library as a candidate for the complete mouse Col2a-1 gene. It was chosen for its strong hybridization to DNA probes containing the first and last exons of the human COL2A1 gene. It also hybridized strongly to a portion of the gene coding for part of the triple helix of α1(II) collagen (Cheah et al. 1985). CosMcol.2 contained a 40 kb insert, and a restriction map was constructed as shown in Fig. 1. To confirm that cosMcol.2 contained the mouse Col2a-1 gene, and to determine the exact extent of the gene in that cosmid, parts of the cosmid at both ends and within the insert were sequenced as indicated in Fig. 1. CosMcol.2 was also compared with cosHcol.1, a cosmid containing the COL2A1 gene. The complete coding sequence of the human COL2A1 gene has been determined by DNA sequencing of genomic and cDNA clones (Cheah et al. 1985; Sangiorgi et al. 1985; Baldwin et al. 1989). An alternatively spliced exon has recently been identified in the N-propeptide coding region of the human α1(II) collagen gene (Ryan et al. 1990; Ryan and Sandell 1990). The coding information for COL2A1 is distributed over 54 exons. These can be assigned to specific numbered exons by comparison with the type I collagen genes and by the conservation of exon numbers in the triple helical and C-propeptide coding domains (see legend to Fig. 1). Figure 2 shows the DNA sequence of part of a fragment that hybridized strongly to the 5' end of the human α1(II) collagen gene. This fragment contained the 5' flanking region of the gene and the first exon. Exon 1 contains the 5' untranslated region, the translational start, the signal peptide and part of the amino-propeptide. This sequence analysis shows a TATA box and initiation codon approximately 10 kb from one end of cosMcol.2. The 255 bp 5' flanking region of the gene contained several repeat sequence motifs. It was GC-rich, with over 70% G + C residues overall and 80% G + C in the region up to 60 nt upstream of the TATA box. This region contained three GC boxes and two inverted GC boxes, the direct repeat sequences 5' CGGTTTG 3' and 5' TGGGCTC 3', and three possible SP1 binding sites (5' GGGCGG 3'). An SP1 binding sequence was also present on the complementary strand within the first intron, at +251, together with two direct repeats of 5' AGACGCT 3', at +560 and +599. No CAAT box was found, but an inverted CAAT box, 5' ATTGG 3', was present at -197. A similar inverted CAAT box has been found in the chicken α1(I) collagen gene promoter.
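The G + C content and SP1-site observations above amount to base counting and a motif search on both strands. The sketch below applies both to a single line of the mouse 5' flanking sequence as transcribed from Fig. 2. This line is held in the hypothetical constant `FLANK_LINE`. Its absolute coordinates cannot be recovered here, so hits are reported as 0-based offsets. An SP1 site on the complementary strand is detected as 5' CCGCCC 3' on the given strand.

```python
# Sketch: G+C content and SP1-site (5' GGGCGG 3') scan on both strands.
# Assumption: FLANK_LINE is one line of the mouse 5' flanking sequence transcribed
# from Fig. 2; offsets are 0-based positions within this string, not +/- coordinates.

SP1_SITE = "GGGCGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FLANK_LINE = "GCGACTGGCCTTGGCAGGTGTGGGCTCTGGTCCGGCCTGGGCGGGCTCCGGGGGCGGG"


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def find_all(seq: str, motif: str) -> list[int]:
    """0-based start offsets of motif in seq, overlapping matches included."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def sp1_sites(seq: str) -> list[tuple[int, str]]:
    """SP1 sites on the given strand ('+') and on the complementary strand ('-').

    A complementary-strand site shows up on the given strand as the reverse
    complement of GGGCGG (CCGCCC); all offsets refer to the given strand.
    """
    rc_site = SP1_SITE.translate(COMPLEMENT)[::-1]
    hits = [(i, "+") for i in find_all(seq, SP1_SITE)]
    hits += [(i, "-") for i in find_all(seq, rc_site)]
    return sorted(hits)


print(f"G+C: {gc_fraction(FLANK_LINE):.0%}")  # G+C: 79%
print(sp1_sites(FLANK_LINE))  # [(38, '+'), (51, '+')]
```

On this one line, the sketch finds two forward-strand GGGCGG sites and about 79% G + C, consistent with the GC-rich character described in the text.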
CAAT boxes have been found in the mouse α1(I) and α2(I) collagen gene promoters, and both CAAT and inverted CAAT boxes are present in the human COL1A1 gene. No CAAT box was found in the α1(III) collagen gene promoter (reviewed in Benson-Chanda 1989; Bornstein and Sage 1989). Several direct repeat sequences were present within the region encoding the 5' untranslated sequence, some of them close to the ATG codon. They were as follows:
- 5' GCCTCCT 3' at +33 and +52;
- four 5' CTCCT 3' repeats between +27 and +59;
- 5' GCTCTGCC 3' at +95 and +129;
- 5' GCCAGGC 3' at +124 and +134, which overlaps the 5' GCTCTGCC 3' repeat.

The direct repeat 5' CTGCCGGG 3' was found just preceding the cap site, at -13, and also within the leader, at +19. The function of these multiple repeat sequences is unknown. Because most of them are conserved in the three species compared, they may have a role in controlling the rates of transcription or translation. In the α1(I), α2(I) and α1(III) collagen genes an inverted repeat is found in the 5' untranslated region; no such inverted repeat was found here (reviewed in Bornstein and Sage 1989). The sequence flanking the ATG codon, 5' CGGTGAGCCATGA 3', conforms only loosely to the consensus 5' CCA/GCCATGG 3' observed by Kozak (1987). However, the bases at positions -1 to -3 relative to the ATG do conform to the consensus. A 2.2 kb Pvu II-BamH I restriction fragment was identified in cosMcol.2. It hybridized strongly to a 2.4 kb EcoR I fragment containing the most 3' exon (exon 52) of the human COL2A1 gene. A 1.1 kb Pst I fragment from this region of the cosmid was sequenced. This identified several exons with greater than 80% homology to the human COL2A1 gene (Fig. 3). Five exons coding for Gly-X-Y triplets within the triple helical region of α1(II) collagen were identified. As shown in Fig. 3, these corresponded to part of exon 42 and exons 43-46, encoding amino acids 962-1078. Exons 43-46 were 54, 108, 54 and 108 bp in size, respectively. Such sizes are characteristic of the genes coding for fibrillar collagens (Vuorio and de Crombrugghe 1990; Chu and Prockop 1991). Exon sizes were conserved across species. Intron sizes were less conserved, however, with the mouse Col2a-1 introns generally shorter than those of the human gene. The intron-exon junction sequences were generally conserved between mouse and human. The 5' splice junctions conform to the eukaryotic consensus (Shapiro and Senapathy 1987). Some deviations were observed in the 3' splice sequences; nevertheless, all were >70% homologous to the consensus. Two further exons were found 2.2 kb 3' of this fragment, approximately 1.2 kb from the Hind III cloning site of cosMcol.2. They contain coding information for part of the carboxyl propeptide, a translational stop codon and a 3' untranslated region with potential polyadenylation signals. These exons correspond to exons 51 (243 bp) and 52 (Fig. 4). The positions of these exons within cosMcol.2 are summarized in Fig. 1. From these data it may be concluded that cosMcol.2 contains the whole of the mouse Col2a-1 gene.
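The exon sizes quoted above (54 and 108 bp) correspond to whole numbers of Gly-X-Y triplets (6 and 12), with glycine at every third residue. The small check below tests this property. It assumes that each exon begins in codon phase 0. It also assumes that the hypothetical `EXONS` table holds the upper-case exon 43 and 44 sequences transcribed from Fig. 3.

```python
# Sketch: test an exon for the fibrillar-collagen hallmarks described in the text:
# a length that is a multiple of 9 bp (whole Gly-X-Y triplets) and a glycine codon
# (GGN) as the first codon of every triplet.
# Assumptions: exons start in codon phase 0; sequences transcribed from Fig. 3.

EXONS = {
    43: "GGCAGCCCTGGTGCTGATGGACCCCCTGGAAGAGATGGTGCAGCTGGAGTCAAG",
    44: (
        "GGAGATCGTGGTGAGACTGGAGCACTGGGTGCCCCTGGAGCTCCTGGGCCCCCAGG"
        "CTCTCCTGGTCCTGCTGGCCCAACTGGCAAACAAGGAGACAGAGGAGAGGCT"
    ),
}


def is_gly_x_y_exon(seq: str) -> bool:
    """True if seq is whole Gly-X-Y triplets whose first codon is always GGN (Gly)."""
    if len(seq) % 9 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Every third codon, starting with the first, must encode glycine.
    return all(codon.startswith("GG") for codon in codons[::3])


for number, seq in EXONS.items():
    print(number, len(seq), is_gly_x_y_exon(seq))
# 43 54 True
# 44 108 True
```

Only the GG prefix is tested, because every GGN codon encodes glycine. A full codon table is therefore unnecessary for this check.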
The gene is approximately 30 kb, very similar in size to the human COL2A1 gene which has been isolated and characterized previously (Cheah et al. 1985).
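The text states that the ATG context conforms only loosely to the Kozak consensus, agreeing at -1 to -3. This can be made explicit by scoring the context position by position. The sketch encodes the consensus quoted in the text, 5' CCA/GCCATGG 3', as `CCRCCATGG` in IUPAC notation (R = A or G). Positions follow the usual convention of +1 for the A of ATG. The function and constant names are illustrative.

```python
# Sketch: compare the sequence around an initiation codon with a Kozak-type consensus.
# Assumption: the consensus is the 5' CCA/GCCATGG 3' form quoted in the text, written
# with IUPAC codes; positions are relative to the A of ATG (+1), with no position 0.

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
KOZAK = "CCRCCATGG"
ATG_OFFSET = KOZAK.index("ATG")  # 5 consensus bases precede the ATG


def kozak_mismatches(context: str, consensus: str = KOZAK) -> list[int]:
    """Return positions (A of ATG = +1) where the context departs from the consensus."""
    atg = context.find("ATG")
    if atg < ATG_OFFSET or atg + len(consensus) - ATG_OFFSET > len(context):
        raise ValueError("context does not span the whole consensus window")
    mismatches = []
    for k, code in enumerate(consensus):
        base = context[atg - ATG_OFFSET + k]
        if base not in IUPAC[code]:
            rel = k - ATG_OFFSET
            mismatches.append(rel if rel < 0 else rel + 1)
    return mismatches


print(kozak_mismatches("CGGTGAGCCATGA"))  # [-5, -4, 4]
```

For 5' CGGTGAGCCATGA 3', the mismatches fall at -5, -4 and +4. Positions -3 (G, an allowed purine), -2 and -1 match, which is the pattern described in the text.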

Fig. 2. DNA sequence of the 5' end of Col2a-1 and comparison with the human and rat sequences. Sequence identity is indicated by dashes (-), and gaps introduced by the LFASTA program to optimize the alignment are represented by stops (...). Sequence motifs are indicated as follows: potential SP1 binding sites, underlined; inverted CCAAT and TATA boxes, in bold type and boxed; direct repeats, overlined, double underlined or double overlined. Exon sequences are in bold upper case; intron sequences are in lower case. The horizontal arrow indicates the position of the transcription start; vertical arrows indicate the potential signal peptide cleavage sites. Nucleotides are numbered with respect to the transcription start site (5' = -n; 3' = +n). The translated amino acid sequence, in the single-letter code, is shown above the DNA sequence. Rat and human sequences were taken from Kohno et al. (1985) and Nunez et al. (1986), respectively.

Identification of the transcription start and polyadenylation sites of Col2a-1

Two approaches were used to determine the position of the start of transcription of the Col2a-1 gene: RNase protection assays and primer extension assays (Fig. 5A and B). For the RNase protection assays, the antisense riboprobe pEL111 was used with RNA isolated from 16.5 day mouse fetuses. This riboprobe is complementary to the whole of exon 1 and to the 5' flanking region of the gene. Antisense pEL111 probes protected a 235 base RNA fragment, which places the transcription start at the sequence 5' AGAGCG 3' (Fig. 2). Labeled sense strand did not protect any RNA species (data not shown). To estimate the position of transcriptional initiation and the cap site more accurately, primer extension assays were performed with RNA isolated from 16.5d p.c. fetuses. These assays used an oligonucleotide complementary to the region coding for part of the signal peptide. The cDNA products synthesized by reverse transcriptase from this primer were compared with a sequencing ladder of the same region of the gene, primed with the same oligonucleotide. Figure 5B shows that transcription starts with the sequence 5' ACGCAG 3', 4 bp 5' to the start indicated by RNase protection. The start site of transcription is therefore probably located 153 bp upstream of the AUG codon and 23 bp from the TATA box, and exon 1 is 238 bp long (Fig. 5B).

Sequence analysis of the 3' end of the gene showed two canonical polyadenylation recognition signals, 5' AATAAA 3' and 5' ATTAAA 3' (Wickens 1990), at 183 bp and 392 bp from the termination codon. In addition, the sequence 5' ATTTTTTAAA 3' was present at 370 bp (Fig. 4). RNase protection assays were carried out to determine the size of the 3'-most exon and so distinguish which site was used. Figure 5C shows that at least two sites are used. Microheterogeneity in fragment sizes was observed. A total of five protected fragments were found, which fell into two classes. The first class comprised three fragments very close in size, of approximately 510, 488 and 477 bases. The second comprised a 313-315 base major fragment and a 294 base minor fragment. The microheterogeneous fragments of 488, 477 and 294 bases could result from partial degradation during the RNase protection assay, or could represent further processed mRNA species. The first polyadenylation signal, at 183 bp, was followed by the GT-rich sequence 5' GGTGTTCT 3'. The second and third polyA signals were followed (at 6 and 39 bp, respectively) by a stretch of 13 As and a GT-rich region. This region includes the sequence 5' GTGTT 3' and the direct repeat 5' TGTTTTGTT 3'. The GT-rich sequence conforms to the 5' YGTGTTYY 3' consensus required for polyadenylation (McLauchlan et al. 1985), suggesting that both signals should be functional. Because the probe used starts within exon 52 (see Fig. 4), the actual size of the exon is 541 or 350 bp, depending on which of the signals is used. The length of the 3' untranslated region is correspondingly 459 or 166 bases.
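The polyadenylation analysis combines a hexamer search with a check for a downstream GT-rich element. The sketch below assumes a 50-nt search window, which is an illustrative choice. It uses a synthetic demonstration sequence, because the Fig. 4 sequence is not reproduced here. Mismatches against the IUPAC consensus are counted rather than requiring an exact match.

```python
# Sketch: find AATAAA/ATTAAA hexamers and score the best downstream match to the
# GT-rich consensus 5' YGTGTTYY 3' (Y = C or T), allowing mismatches.
# Assumptions: the 50-nt window is illustrative, and the demo sequence is synthetic.

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG"}
HEXAMERS = ("AATAAA", "ATTAAA")
GT_CONSENSUS = "YGTGTTYY"


def mismatches(site: str, consensus: str) -> int:
    """Count positions where a base is not allowed by the IUPAC consensus code."""
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def polya_signals(seq: str, window: int = 50) -> list[tuple[int, str, int, int]]:
    """For each hexamer, return (hexamer_pos, hexamer, gt_pos, gt_mismatches), 0-based.

    gt_pos is the start of the best-scoring GT element beginning within `window` nt
    after the hexamer (first one wins on ties); if there is no room to score one,
    gt_pos is -1 and gt_mismatches is len(GT_CONSENSUS) + 1.
    """
    results = []
    width = len(GT_CONSENSUS)
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer not in HEXAMERS:
            continue
        best_pos, best_mm = -1, width + 1
        for j in range(i + 6, min(i + 6 + window, len(seq) - width + 1)):
            mm = mismatches(seq[j:j + width], GT_CONSENSUS)
            if mm < best_mm:
                best_pos, best_mm = j, mm
        results.append((i, hexamer, best_pos, best_mm))
    return results


print(mismatches("GGTGTTCT", GT_CONSENSUS))  # 1
demo = "CCCC" + "AATAAA" + "CCCCCCCC" + "GGTGTTCT" + "CCCC"  # synthetic sequence
print(polya_signals(demo))  # [(4, 'AATAAA', 18, 1)]
```

Scored this way, 5' GGTGTTCT 3' departs from 5' YGTGTTYY 3' only at its first base, where G is not a pyrimidine.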
The third signal sequence, 5' ATTTTTTA 3', lies just 5' (12 bp) of the 5' ATTAAA 3' polyadenylation signal in the mouse gene and conforms to the 5' UUUUUAU/C 3' consensus sequence important in regulating polyadenylation (Wickens 1990; Swimmer and Shenk 1985). However, a similar sequence was not associated with the 5' AATAAA 3' signal, which could account for the relatively lower abundance of mRNA arising from this polyadenylation site compared to the 5' AATAAA 3' site, and could reflect a difference in the efficiency of polyadenylation at this site (Fig. 5C).
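The upstream-element comparison above can be illustrated with a small scoring function. This is a sketch, not the authors' procedure: it writes the UUUUUAU/C consensus in DNA form (TTTTTAY), slides a window across a fixed stretch immediately 5' of a polyadenylation hexamer, and keeps the window with the fewest mismatches. Because a sequence described as conforming to a consensus need not match it exactly, a mismatch tolerance is exposed as a parameter; the 40 bp search `span` and the default tolerance of one mismatch are hypothetical choices, not values from the paper.

```python
# Consensus upstream element written as DNA (U -> T): UUUUUAU/C -> TTTTTAY.
UPSTREAM_CONSENSUS = "TTTTTAY"

# Bases allowed at each IUPAC code used here.
IUPAC_SETS = {"A": "A", "C": "C", "G": "G", "T": "T",
              "Y": "CT", "R": "AG", "N": "ACGT"}


def mismatches(window: str, consensus: str) -> int:
    """Count positions where `window` falls outside the IUPAC consensus."""
    return sum(base not in IUPAC_SETS[code]
               for base, code in zip(window, consensus))


def best_upstream_element(seq, hexamer_pos, consensus=UPSTREAM_CONSENSUS,
                          span=40, max_mm=1):
    """Search the `span` bp 5' of a hexamer for the closest consensus match.

    Returns (start, mismatches) for the best window lying entirely upstream
    of the hexamer, or None if no window has at most `max_mm` mismatches.
    """
    seq = seq.upper()
    k = len(consensus)
    start = max(0, hexamer_pos - span)
    best = None
    # The last window considered ends at the base just before the hexamer.
    for i in range(start, hexamer_pos - k + 1):
        mm = mismatches(seq[i:i + k], consensus)
        if mm <= max_mm and (best is None or mm < best[1]):
            best = (i, mm)
    return best
```

Running it at the position of each hexamer shows whether an element of this kind lies upstream of one signal but not the other, which is the contrast drawn in the text.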

CosMcol.2 hybridizes to cartilage in the developing mouse fetus

Type II collagen is expressed at very high levels at prechondrogenic and chondrogenic sites during development (Von der Mark, K. and Von der Mark, H. 1977; Swalla et al. 1988). Besides showing a high degree of sequence homology to the α1(II) collagen genes of other species, the mouse α1(II) collagen gene should also be shown to be expressed at high levels in developing cartilage. Figure 6 shows the pattern of in situ hybridization to sagittal sections of a 14.5-day mouse fetus, using an antisense RNA probe synthesized from pEL107, which contains exon 1 of Col2a-1. This probe hybridized to all chondrogenic sites, for example the chondrocranium, nasal septum, limb cartilage, vertebral column and ribs, a pattern of expression consistent with that expected for type II collagen. In addition, the α1(II) collagen gene was expressed in some non-chondrogenic tissues such as the surface ectoderm, calvaria and brain (Fig. 6). Control sense probes did not hybridize to any specific part of the sections (data not shown). A detailed description and analysis of the temporal pattern of expression of the α1(II) collagen gene during mouse development is given elsewhere (Cheah et al. 1991).

Fig. 3. Nucleotide and amino acid sequence of exons 43-46 of Col2a-1 and comparison with COL2A1. Representation of sequences and alignment is as described for Fig. 2. See legend, Fig. 1, for explanation of exon numbering. Amino acids are numbered from the start of translation as reported in the cDNA sequence by Baldwin et al. (1989). The human sequence was extracted from Cheah et al. (1985).

Chromosomal localization of the murine Col2a-1 gene

Restriction fragment length polymorphisms (RFLPs) between the C57BL/6 and DBA/2 inbred strains could not be detected in the well-conserved Col2a-1 region, even though fragments spanning the whole of cosMcol.2 were used to probe 18 different restriction enzyme digests. Fragments from cosMcol.2 did, however, detect differences between DNA of the Mus musculus domesticus strain C57BL/10 and the related species Mus spretus, allowing us to map Col2a-1 using an interspecies backcross (Robert et al. 1985). Polymorphic restriction fragments corresponding to the M. musculus domesticus and M. spretus Col2a-1 alleles are shown in Figure 7A. COL2A1 is located on human Chr 12q14.3 (Law et al. 1986) near a group of genes, including KRT-2, INT1, GPD-1 and HOX-3, whose counterparts are also linked in the mouse. The homologs of these four genes have been mapped to the central portion of mouse Chr 15 (Nadeau et al. 1990). To examine whether Col2a-1 is also included in this conserved linkage group, we traced the segregation of Col2a-1 sequences in 50 backcross animals. The distribution of Col2a-1 alleles among these animals was then compared with that of markers known to be localized on mouse Chr 15. Among these well-mapped markers were Int-1 and Gdc-1, the homologs of genes (INT-1 and GPD-1, respectively) mapping close to COL2A1 on human Chr 12. In addition, we traced the segregation of Myc, which is located nearer the centromere of mouse Chr 15 and is related to a human gene that maps to Chr 8 (Sakaguchi 1983). The RFLPs used to trace the M. musculus domesticus and M. spretus alleles for each locus are shown in Fig. 7A.
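Comparing allele distributions between two loci in a backcross reduces to counting recombinant animals. The sketch below assumes each backcross animal has been scored at each locus as either homozygous for the M. musculus domesticus allele or heterozygous domesticus/spretus; the single-letter codes 'D' and 'S' are hypothetical. It converts the recombinant count into a recombination fraction r, an approximate distance of 100 r cM (no mapping function, so reasonable only for short intervals) and a binomial standard error. With 50 animals, each recombinant adds 2 cM.

```python
from math import sqrt


def recombination_fraction(locus_a, locus_b):
    """Return (r, recombinants) for two loci typed in the same animals.

    Each argument is a sequence of per-animal genotype codes (hypothetical
    coding: 'D' = homozygous domesticus, 'S' = heterozygous
    domesticus/spretus). An animal whose codes differ between the two loci
    carries a chromosome recombinant in the interval between them.
    """
    if len(locus_a) != len(locus_b) or not locus_a:
        raise ValueError("need equal-length, non-empty genotype lists")
    recombinants = sum(a != b for a, b in zip(locus_a, locus_b))
    return recombinants / len(locus_a), recombinants


def map_distance(locus_a, locus_b):
    """Return (recombinants, n, distance_cM, standard_error_cM).

    Distance is 100 * r with no mapping-function correction; the standard
    error uses the binomial formula sqrt(r * (1 - r) / n).
    """
    r, k = recombination_fraction(locus_a, locus_b)
    n = len(locus_a)
    return k, n, 100 * r, 100 * sqrt(r * (1 - r) / n)
```

Applying this to every pair of loci, and choosing the locus order that minimizes the number of multiple crossovers, is the usual way such pairwise results are assembled into a linkage map.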

52

N

G

N

.

. .

.

.

V

. Q

.

.

M

.

.

T

.

.

F

.

.

L

.

.

. .

.

R

. .

.

L

. .

.

L

.

.

.

. .

.

. .

.

. .

.

. .

. .

.

. .

.

. .

.

. .

.

. .

.

.

.

M

M .

-

-

-

K

.

.

A

.

.

L

.

. .

.

L

. .

.

.

I

.

.

.

N

.

.

.

G

.

.

.

S

.

.

. .

.

.

N

.

.

.

D

.

V

E

M

I

I

I

R

-

-

-

.

G

.

E

Mouse

.

.

F

.

.

T

.

.

Y

.

T

.

A

.

V

V .

L

.

-

K

.

E .

D

.

. .

.

T

.

.

.

.

Bovine

Human

.

.

G

.

.

.

K

.

.

.

W

.

. . .

.

G

. .

.

K

.

. .

. .

. .

. .

.

. .

.

. .

.

. .

.

. .

. .

. .

.

....

G- ....

G ......

.....

C .....

G-A--T--T

C ........

C ........

.....

T ...........

T ..............

T ..............

C--A--

A ......

A ......

C-

C-

258C

238C

224C

208C

188C

168C

147C

Q

.

.

. .

.

.

E

.

.

.

F

.

.

.

G

.

.

.

V

.

.

. . D

.

. .

.

.

I

.

.

.

G

.

.

.

P

.

.

.

V

.

.

. . C

.

.

F

L

G .................

C .....

T--T

.....

A .................

---C

G ....

................

A ................

A ................ GA .........

C ............................

.....

....

CTTTCTAAGAGACCTGA ..............

.................

....--C

CTCT

..................... TCGGGACCCTCCGGCGCCGTC--C

Chicken

. ...................

. ...................

....................

A ........

A .....

A .....

CCTCCC-TC---GTGTATCA-CA-CA-G--T-CTAA--TA-C.ATA-AACAGA

G .......

........

-T--.

.........

TIA-GCTGCT--

T*

T~C--AT

G ..........

....

......

CC .....

...............

- .....

CAT--A-T

.........

C-T

C .....

G ......

CAGCC---.

CA ......

Human

....

TGATGTCACTCACACACACAAAA..GA.CCTGCCTCTCTTGTAAACTTGGGCACTTTGTG .......

GCA-TC-TA

......

A ....

C---A--G--A-

....

T-

. .... CAAC---

AG ......

A ....

CAGGATGCTGAAGTCACACTGCCTGGTTTGG

G-.

Mouse

T--TT--T---C-GAGAGTA

--TCT

.....

CT..GTCTGTGGACCTTACAATC Human

-T--A Mouse

..........

C-.

.......

.......

.......

Human

CT-TTGG

..-T--.

TCCGGTTGTATTTACTAGTCCTTGGTTCTATAAGGCATGCCCAAATA..TGGTCCCAGGA

.........

..- ......

Mouse

Chicken

Bovine

G-.

..- ....

AAAAAAAAA..TCAT.TGGAAAGGATATGGTGACTTGTGTTTTGTTCTTTGTTTTGTTCT

AG ..............

Human

..... Mouse

.---A

-N .....

Chicken

TG .... ....................................

Bovine

G ..................

CTATTCTGTGTCAAACACCTCTG~ATTTTTTAA~CATCAATTGAT~TT~CCAAAA ......................................................

.......

C--C-

G-

G---

Human

...........

................

......

.....

.....

......

.....

Mouse

..--.-GCGC

A--..--TG-G.T

Chicken

............

.........

-,,,-,-..----,---,---,

Bovine

T---G---C

AGAAGAC..CCCTACAGATGCTGGGCGCAGGGAC..TG.CT...GTCCTACACAATGGTG .......

Human

A--.

A--.

CC ......... A---ACCC---G

Mouse

T ....

T .........

GCAGGTGTGACGGCCCCCC-CCCCACA-A-GGAT-T-GCAA--GCAGGTATCG-G-AT--

A ........

......

chicken

........

A-T

...............

Bovlne

- ..........

...............

Human

GGTGTGAGTCAGACGCCCCCCGAGTGACT.GTTCCCAGCCCAGCC

...............

Mouse

CT ....

....

C--C-A---GC--A

C ...................

G---G---GGTTTCGT---GC-G-TGGA-GTGGG-ATCAATCTG

....

A---G

........

T-A

A ....

Chicken

T ....

TC .....

..................

G .....

ATTTATTGTCTTCCTGTAAGACCTCTGGGTCCAGGCGGAGACAGG.AACTATCT

G .....

........................

...........

- ..................

Bovine

--A

..CA---C--

.......

....

.........

.........

A

A

272C

ACTGGGCAGACTGC~TCTCGGTGTTCTATTT

Human C .......

T-C

T-T

....

- ......................................

Mouse

. ..........

..................... ..........

......

......

AGGGTTGTTGT-

.A--C

.A--T

.-C-ATTCATCC-A..C--T C-...----TT-A---CAC

C--G--C-CAG-ACAA---CTGC-CT--A-G--TGGCACGACCCCGCGCCCCT

T .......

T-T

Bovine

....

......

......

Human

GGA

---TT

---TT

CAGCCCGGACTGTGCTCCC

A-AA-AAAAAG-AA-G-ATC-A-C--AAT--CATAA-AG-AAAC-A

T ....

C ..............................

CAGGA.TCTGCACTGAATGGCTGACCTGACCTGATGATACCCAACCGTCC..TCCCCTCA T ....

STOP

GT .........

ATTTGTGTGTTTGTTTG-TGTTTGGTTGTT-TTTTTTGTTTC--TTTTTTTTTT-TTT-T

Mouse

Chicken

Bovine

Human

Mouse

Chicken

Bovine

Human

Mouse

Chicken

Bovine

GAAACAACACAATCCATTGCGAACCCAAAGGACCCAAACACTTTCCAACCGCAGTCACTC

G .....

C ...................................

C .................

...............

.....

........

........

Human

C--T

C--G

C--G

TGAACAGGAATTTGGTGTGGACATAGGGCCTGTCTGCTTCTTGTAAAACCCCCGAACCCT

E

.

.

.

Mouse

Chicken

Bovine

Human

Mouse

Mouse

Human

Bovine

Chicken

Fig. 4. Interspecies sequence comparisons of the 3' end of the α1(II) collagen gene. The sequences of exons 51 and 52 of Col2a-1 are compared with those of the human, bovine and chicken genes, extracted from Cheah et al. (1985), Sangiorgi et al. (1985) and Sandell et al. (1984), respectively. Representation of exon sequences and alignment is as described for Fig. 2. Amino acid residues are numbered from the beginning of the C-propeptide, as for the human gene, and indicated by the suffix C. Arrows indicate the positions of cleavage and polyadenylation; a second set of arrows indicates the cleavage positions that generated the fragments shown in Fig. 5C. Canonical polyadenylation sequences are boxed and the YGTGTTYY consensus sequence is underlined and in bold type; * marks the start of pPA120 and A marks the polyadenylation sites of the human, bovine and chicken α1(II) collagen mRNAs.
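Conservation in reference-anchored alignments such as this one is often summarized as percent identity. The sketch below computes it for one row against the mouse row. It assumes, hypothetically, that '.' marks a base identical to the reference and '-' marks a gap; the actual symbols are defined in the Fig. 2 legend, which is not reproduced here, so both are parameters. Gapped columns are excluded from the denominator, which is one common convention among several.

```python
def percent_identity(reference, aligned, identity_chars=".", gap_chars="-"):
    """Percent identity of an aligned row to the reference row.

    identity_chars: symbols meaning "same base as reference" (assumed '.').
    gap_chars: symbols meaning an alignment gap (assumed '-').
    Any other letter is read as that species' base at the column.
    """
    if len(reference) != len(aligned):
        raise ValueError("rows must have equal length")
    compared = identical = 0
    for ref, other in zip(reference.upper(), aligned.upper()):
        # Skip columns where either row is gapped.
        if ref in gap_chars or other in gap_chars:
            continue
        compared += 1
        if other in identity_chars or other == ref:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Whether gaps should instead be penalized depends on the comparison being made, so figures computed this way are not directly comparable to homology values obtained with other conventions.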

