Proc. Natl. Acad. Sci. USA Vol. 89, pp. 8443-8447, September 1992

Developmental Biology

Genomic structure and chromosomal mapping of the mouse N-cadherin gene (gene family/gene duplication/chromosomal location)

SEIJI MIYATANI*†, NEAL G. COPELAND‡, DEBRA J. GILBERT‡, NANCY A. JENKINS‡, AND MASATOSHI TAKEICHI*

*Department of Biophysics, Faculty of Science, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606, Japan; and ‡Mammalian Genetics Laboratory, ABL-Basic Research Program, NCI-Frederick Cancer Research and Development Center, Frederick, MD 21702

Communicated by James D. Ebert, May 26, 1992

ABSTRACT N-cadherin is a member of the cadherin cell-cell adhesion receptor family that includes P-, E-, and R-cadherin and liver cell adhesion molecule (L-CAM). In this study, we determined the structure of the mouse N-cadherin gene by analyzing overlapping genomic clones obtained from a mouse genomic library. This gene consists of 16 exons dispersed over >200 kilobases of genomic DNA. The large size of the N-cadherin gene, compared with its cDNA (4.3 kilobases), is ascribed to the fact that the first and second introns are 34.2 kilobases and >100 kilobases long, respectively. When the N-cadherin gene was compared with those of L-CAM and P-cadherin, the exon-intron boundaries were found to be fully conserved among them, except that the P-cadherin first exon includes the first and second exons of the other two genes. Also, the second intron, which is equivalent to the first intron in P-cadherin, is exceptionally large, and this structural feature is conserved in all of these genes. An interesting feature of the N-cadherin gene is that it has an extra 16th exon that is almost identical to the original 16th exon: 100% in the coding region and 99% in the 3' untranslated region at the nucleotide level. We also determined the chromosomal localization of the N-cadherin gene by interspecific backcross analysis and found that this gene is localized in the proximal region of mouse chromosome 18. The E- and P-cadherin genes are tightly linked and located on chromosome 8 in this species. Thus, N-cadherin is unlinked to these other cadherin loci.

Cadherins constitute a gene family of Ca2+-dependent cell-cell adhesion molecules (1). More than 15 members of this family have been identified; these include E-cadherin (uvomorulin) (2, 3), P-cadherin (4), N-cadherin (5, 6), liver cell adhesion molecule (L-CAM) (7), and K-CAM (8). These members share a similar primary structure that is divided into the extracellular, transmembrane, and cytoplasmic domains. Amino acid sequences are conserved to various extents among these members; the conservation is most obvious in the cytoplasmic domain as well as in some of the repeated sequences in the extracellular domain. Recent analyses of desmosomal glycoproteins show that desmoglein and desmocollins I and II are similar to cadherins, although their sequences are distinct from those of the original cadherins, in particular in the cytoplasmic domain (9-12). Also, Drosophila has cadherin-like proteins; for example, a protein encoded by fat has cadherin-like repetitive domains in the extracellular region, but its cytoplasmic domain is not identical to that of cadherins (13). Thus, cadherins constitute a supergene family from vertebrates to invertebrates. Different members of the cadherin family are expressed in different tissues or different regions of a tissue, and their expression patterns are very complex (14). Since each cadherin molecule tends to bind exclusively to an identical molecule in a homophilic fashion, cells are selectively connected with those expressing the same cadherins. This molecular family is therefore thought to be essential for the cell sorting mechanism.

To understand the molecular basis of differential expression of cadherins as well as the evolution of the cadherin family, it is essential to reveal the genomic structure and chromosomal localization of each cadherin gene. Recently, the organizations of the chicken L-CAM (15) and K-CAM and the mouse P-cadherin genes were determined and compared (8, 16), and the results of these studies demonstrated that the exon and intron patterns are highly conserved among these three genes, although their sizes are very distinct. Regarding chromosomal localization, the human E-cadherin gene has been mapped to chromosome 16 (17) and the human N-cadherin gene has been mapped to chromosome 18 (18). Thus, they are localized to different chromosomes. On the other hand, the E-cadherin gene is located on chromosome 8 in the mouse (19) and, interestingly, the P-cadherin gene locus is tightly linked to the E-cadherin locus (16). Also, the chicken L-CAM and K-CAM genes are tandemly located on the same chromosome (8).

In this study, we cloned the mouse N-cadherin gene and analyzed its genomic structure and chromosomal localization. The results showed that the N-cadherin gene consists of 16 exons, that its exon-intron boundaries are identical to those of the P-cadherin and L-CAM genes with one exception, and that this gene is localized to the proximal region of chromosome 18.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviation: L-CAM, liver cell adhesion molecule.
†To whom reprint requests should be addressed.

MATERIALS AND METHODS

Isolation and Characterization of Genomic Clones. Gene cloning, restriction mapping, and sequencing of DNA were done by standard methods (20) with some modifications. A mouse genomic library was constructed using Sau3AI partially digested DNA of the C57BL/6 strain and bacteriophage vector EMBL4 (16). Genomic clones that contain exons were isolated from this library using a full-length mouse N-cadherin cDNA (6) radiolabeled with a random primer labeling kit (Takara Shuzo, Kyoto), and the first and second intron regions were cloned by gene walking. Positive clones were used for restriction mapping, and the EcoRI or HindIII fragments of these clones were subcloned into the pUC18 vector for sequencing analysis. The locations of exons were determined by restriction mapping and Southern blot analysis using 300- to 500-base-pair (bp) fragments excised from the N-cadherin cDNA as probes. From the first to the seventh exon, the restriction fragments cloned into pUC18 that contained these exons were sequenced by the method of exonuclease III-mung bean nuclease deletion, followed by the modified dideoxy nucleotide chain-terminating reaction (Takara Shuzo, Kyoto), using double-strand templates. The other exons were directly sequenced using primers synthesized by a DNA synthesizer (Applied Biosystems). These primers were prepared by deducing both ends of the exons of the N-cadherin gene from the data of exon-intron boundaries conserved between the L-CAM and P-cadherin genes.

Interspecific Backcross Mapping. Interspecific backcross progeny were generated by mating (C57BL/6J × Mus spretus)F1 females and C57BL/6J males as described (21). A total of 205 N2 progeny were obtained; a random subset of these N2 mice was used to map the N-cadherin locus. DNA isolation, restriction enzyme digestion, agarose gel electrophoresis, Southern blot transfer, and hybridization were performed essentially as described (22). All blots were prepared with Zetabind nylon membrane (AMF Cuno). The N-cadherin probe MNC1OA was a 4327-bp mouse cDNA (6) that was labeled with [α-32P]dCTP using a nick-translation kit (Boehringer Mannheim). Washing was done to a final stringency of 120 mM NaCl/12 mM sodium citrate/9 mM NaH2PO4/0.1% SDS, 65°C. The N-cadherin probe detected fragments of >23.3, 6.1, 5.6, 4.8, 2.3, and 1.9 kilobases (kb) in Bgl I-digested C57BL/6J DNA and fragments of 8.2, 7.2, 5.6, 4.8, 2.6, and 2.3 kb in Bgl I-digested M. spretus DNA. Probes for the loci linked to Ncad, fibroblast growth factor a (Fgfa) and glucocorticoid receptor-1 (Grl-1), have been described (23). Recombination distances were calculated as described (24) using the computer program SPRETUS MADNESS. Gene order was determined by minimizing the number of recombination events required to explain the allele distribution patterns.
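The recombination distances reported in Results and Discussion can be reproduced with a short calculation. The paper cites ref. 24 rather than stating a formula, so the sketch below assumes the usual backcross estimate: the recombination fraction r = R/N, with a binomial standard error of sqrt(r(1 - r)/N), both expressed in centimorgans. The function name is illustrative, and the recombinant counts (26/188 and 2/190) are the ones given in the text.

```python
import math


def recombination_distance(recombinants, total):
    """Return (distance_cM, standard_error_cM) for one backcross interval.

    Assumes the simple binomial estimate r = R/N; this formula is an
    assumption, since the paper only cites ref. 24 for the method.
    """
    r = recombinants / total
    se = math.sqrt(r * (1.0 - r) / total)
    return 100.0 * r, 100.0 * se


# Recombinant / total counts quoted in Results and Discussion.
intervals = {"Ncad-Fgfa": (26, 188), "Fgfa-Grl-1": (2, 190)}

for name, (rec, n) in intervals.items():
    distance, se = recombination_distance(rec, n)
    print(f"{name}: {distance:.1f} +/- {se:.1f} cM")
```

Under this assumption the two intervals come out at 13.8 ± 2.5 and 1.1 ± 0.7 centimorgans, which agrees with the values reported in the text.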

RESULTS AND DISCUSSION

Genomic Organization of the Mouse N-Cadherin Gene. The mouse genomic library was screened using the mouse N-cadherin cDNA as a probe, and 46 independent clones were isolated and analyzed by digestion with restriction enzymes EcoRI, BamHI, HindIII, Kpn I, and Sma I (Fig. 1). These clones overlapped except that they did not contain the complete first and second introns. We therefore attempted to isolate genomic clones that cover the intron regions by gene walking, but we failed to obtain the entire second intron. Restriction mapping and sequence analysis showed that the mouse N-cadherin gene consists of 16 exons that spread over >200 kb (Fig. 1). The first intron is 34.2 kb and the second intron is >100 kb long. The sequences of the exon-intron boundaries are shown in Fig. 2. All of the splice donor and acceptor sites conformed to the GT/AG rule for nucleotides immediately flanking exon borders. We found by Southern blot analysis that the N-cadherin gene is a single copy in the mouse genome (data not shown).

[Fig. 1 map. Scale bar, 50 kbp. Genomic clones shown: MNG9, MNG7, MNG263, MNG123, MNG12, MNG11, MNG72, MNG28, MNG157, MNG258, MNG58, MNG19, MNG17, MNG16, MNG23, MNG20, MNG30, MNG22.]

FIG. 1. Genomic organization of the mouse N-cadherin gene. Sixteen exons encoding the mouse N-cadherin gene are shown as vertical bars. The genomic clones are aligned with respect to the genomic DNA. Size is indicated by a filled bar.

Existence of an Extra 16th Exon. During the screening of the genomic library, we obtained two different groups of genomic clones that hybridized with excised fragments derived from the 16th exon of the N-cadherin gene. These two groups, MNG22 and MNG25, were further analyzed to reveal their relationship. MNG22 and MNG25 gave completely different restriction maps for EcoRI, HindIII, BamHI, Kpn I, and Sma I, except for one HindIII site, which was located in the 3' untranslated region (Fig. 3a). The MNG22 clone overlapped with another genomic clone, MNG20, that contains the 15th exon, as revealed by restriction mapping and Southern blot analysis, whereas the MNG25 clone did not overlap with the MNG20 and MNG22 clones (data not shown). This result suggests that MNG22 contains the 16th exon equivalent to that in the N-cadherin cDNA, as confirmed by sequence analysis, and that MNG25 contains extra sequences homologous to the 16th exon, suggesting the existence of an extra 16th exon. To confirm this possibility, C57BL/6 mouse genomic DNA digested with EcoRI was subjected to Southern blot analysis using an excised fragment of the 16th exon as a probe. The results showed that two bands of 5.6 kb and 9 kb hybridized with the probe (Fig. 3b, lane 1). When MNG22 and MNG25 DNA digested with EcoRI were used for Southern blot analysis, the above probe reacted with 5.6-kb and 9-kb bands, respectively (Fig. 3b, lanes 2 and 3). These results suggest that both a 5.6-kb and a 9-kb fragment containing a 16th exon exist in the mouse genome. The nucleotide sequences of the region containing the 16th exon in these two clones were determined and compared (Fig. 4). It was found that the sequence of the 16th exon in MNG22 is completely identical to the corresponding region of the N-cadherin cDNA and shows striking similarity to that of MNG25. The extra 16th exon in MNG25 showed 100% similarity in the coding region and 99% similarity in the 3' untranslated region to those of the 16th exon in MNG22. This result suggests that this extra 16th exon was very recently duplicated in the mouse genome. In Northern blot analysis for N-cadherin mRNA, three bands, 5.3 kb, 4.3 kb, and 3.5 kb, are generally detected, of which the 4.3-kb band is the major component (6). It is possible that the extra 16th exon has a longer or shorter 3' untranslated region and is used for generating either of these messages instead of the 16th exon in MNG22 through alternative splicing. However, there is no direct evidence that this extra 16th exon is transcribed. More detailed analysis of this extra 16th exon is, therefore, necessary for understanding its function.

Comparisons with P-Cadherin and L-CAM Gene Structures. The structure of the mouse N-cadherin gene was compared with those of the mouse P-cadherin (16) and the chicken L-CAM (15) genes. The N-cadherin gene is >200 kb long and is the longest among these three genes; it is ~5 times longer than the 45-kb P-cadherin gene and 23 times longer than the 9-kb L-CAM gene. Interestingly, however, all exon-intron boundaries are conserved among these three genes, except that the first exon of the P-cadherin gene corresponds to the first and second exons of the other two genes (Fig. 5), a relationship we have already shown between P-cadherin and L-CAM (16). The above results suggest that DNA insertion or recombination occurred only within the introns in the process of gene duplication and conversion generating the cadherin gene family. The high conservation of exon-intron boundaries suggests that the exon pattern may be crucial for generating the functional domain structure of cadherins. For example, each exon unit might be equivalent to a functional unit in cadherin molecules. Cadherins contain repeated amino acid sequences in the extracellular domain, and these repeats could constitute different domains. However, introns split each of the repeating units at unique positions, as discussed previously (16). The relationships between the exon organization and the domain structure of cadherin molecules, therefore, still remain to be clarified. It should be noted that the overall pattern of introns is similar in these three cadherin genes irrespective of their size. The second intron of the N-cadherin gene spans >100 kb,

which covers half of this gene, and the second intron of the chicken L-CAM is 3.5 kb long, constituting over one-third of the 9-kb L-CAM gene. The first intron of mouse P-cadherin, which corresponds to the second intron of the other two genes, spans >23 kb and occupies half of the P-cadherin gene. Our recent results indicate that the first intron of the P-cadherin gene contains enhancer activities (M. Hatta and M.T., unpublished data). Therefore, the consistently large size of these introns may be important in providing cis-regulatory elements that play a central role in transcriptional regulation of these genes.

Chromosomal Localization of the N-Cadherin Gene. The mouse chromosomal location of the N-cadherin locus was determined by interspecific backcross analysis using progeny derived from matings of (C57BL/6J × M. spretus)F1 × C57BL/6J mice. This interspecific backcross mapping panel has been typed for >850 loci that are well distributed among all of the autosomes as well as the X chromosome (21). C57BL/6J and M. spretus DNAs were digested with several enzymes and analyzed by Southern blot hybridization for informative restriction fragment length polymorphisms using the Ncad probe. The N-cadherin locus was mapped by following the segregation of the 8.2- and 7.2-kb M. spretus-specific Bgl I restriction fragment length polymorphisms (see Materials and Methods). The two fragments cosegregated, and the mapping results indicated that N-cadherin (Ncad) is located in the proximal region of mouse chromosome 18 (Fig. 6). The loci linked to N-cadherin included Fgfa and Grl-1. Although

Exon  Size (bp)       Intron (kbp)  Donor                      Intron sequence              Acceptor
1     392             34.2          CAG (20 Gln)               gtaagaggag ..... tgtgttgcag  GCG (21 Ala)
2     112             >106.25       AAT G (57 Asn)             gtagtacctt ..... tcctttccag  TG AAA (58 Val, 59 Lys)
3     227             1.55          AAG (133 Lys)              gtatggtacc ..... taatgcaaag  GAA (134 Glu)
4     147             1.85          AGA (182 Arg)              gtaagagact ..... tcatcctcag  ATC (183 Ile)
5     156             5.9           CAC (234 His)              gtaagcttgc ..... tattagacag  TTG (235 Leu)
6     144             0.2           CCT (282 Pro)              gtaagtagac ..... cactccctag  GGG (283 Gly)
7     174             4.15          GAG (340 Glu)              gtgagcagac ..... gtctttttag  AAA (341 Lys)
8     138             4             ACT (386 Thr)              gtgagtacaa ..... tttctcgcag  TTC (387 Phe)
9     186             2.25          AAA (448 Lys)              gtaagtgtcc ..... tctttttcag  CCA (449 Pro)
10    254             1.8           ATC AG (532 Ile, 533 Arg)  gtatgaagca ..... cattttgcag  TAC (534 Tyr)
11    143             2.4           AAT G (580 Asn)            gtatgtaatc ..... ctccttttag  GA ATC (581 Gly, 582 Ile)
12    234             0.7           AAT G (658 Asn)            gtaagagcag ..... ttctgtgtag  GT GAT (659 Gly, 660 Asp)
13    234             2.1           CTG A (736 Leu)            gtgagtgttt ..... ccctatccag  TC CTT (737 Ile, 738 Leu)
14    140             17.6          CAG (783 Gln)              gtgagtggtg ..... atgcttgcag  GAC (784 Asp)
15    165             16            GAG (838 Glu)              gtacagaaag ..... cctaacccag  GGC (839 Gly)
16    1481 + poly(A)

FIG. 2. Exon-intron junctions of the mouse N-cadherin gene. Exon sequences are in uppercase letters and introns are in lowercase letters. The amino acids at each boundary are indicated with their numbers in the N-cadherin cDNA (6).
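Two statements in the text can be checked directly against Fig. 2: that every intron obeys the GT/AG rule, and that the 16 exons together account for the 4327-bp cDNA used as the probe in Materials and Methods. The sketch below does both. The exon sizes and intron ends are transcribed from the extracted figure, and the assignment of each value to an exon number is a reading of that figure rather than something the text states; all variable and function names are illustrative.

```python
# Exon sizes in bp for exons 1-16, as read from Fig. 2 (assumed assignment).
EXON_BP = [392, 112, 227, 147, 156, 144, 174, 138,
           186, 254, 143, 234, 234, 140, 165, 1481]

# First and last 10 nucleotides of introns 1-15, as read from Fig. 2.
INTRON_ENDS = [
    ("gtaagaggag", "tgtgttgcag"), ("gtagtacctt", "tcctttccag"),
    ("gtatggtacc", "taatgcaaag"), ("gtaagagact", "tcatcctcag"),
    ("gtaagcttgc", "tattagacag"), ("gtaagtagac", "cactccctag"),
    ("gtgagcagac", "gtctttttag"), ("gtgagtacaa", "tttctcgcag"),
    ("gtaagtgtcc", "tctttttcag"), ("gtatgaagca", "cattttgcag"),
    ("gtatgtaatc", "ctccttttag"), ("gtaagagcag", "ttctgtgtag"),
    ("gtgagtgttt", "ccctatccag"), ("gtgagtggtg", "atgcttgcag"),
    ("gtacagaaag", "cctaacccag"),
]


def follows_gt_ag(donor_side, acceptor_side):
    """True if the intron starts with GT and ends with AG."""
    return (donor_side.lower().startswith("gt")
            and acceptor_side.lower().endswith("ag"))


total_exon_bp = sum(EXON_BP)
all_gt_ag = all(follows_gt_ag(d, a) for d, a in INTRON_ENDS)

print("total exon length (bp):", total_exon_bp)
print("all introns follow GT/AG:", all_gt_ag)
```

With these values the exons sum to 4327 bp, the length given for the cDNA probe, and all 15 introns begin with gt and end with ag.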

FIG. 3. Restriction map and Southern blot analysis of genomic clones MNG22 and MNG25. (a) Open and filled boxes show the untranslated and coding regions of the 16th exon, respectively. E, EcoRI; B, BamHI; H, HindIII; K, Kpn I. mc-1 is a 600-bp Pvu II-HindIII fragment of the 3' untranslated region of the 16th exon in MNG22 that was used as the probe for the Southern blot analysis. (b) DNA isolated from C57BL/6 mice (lane 1), MNG22 DNA (lane 2), and MNG25 DNA (lane 3) was digested completely with EcoRI, fractionated by electrophoresis in a 1% agarose gel, and transferred to a nitrocellulose filter. To adjust the genome copy number, the amount of DNA in the gels was 10 µg per lane for the total DNA and 166 pg per lane for the genomic clones. The blotted filter was hybridized with the random primer-labeled mc-1 probe (see above). After hybridization, the filter was washed to a final stringency of 30 mM NaCl/3 mM sodium citrate/0.1% SDS at 65°C.
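The legend states that 166 pg of cloned DNA was loaded against 10 µg of total genomic DNA to adjust the genome copy number, without giving the calculation. A minimal sketch of that arithmetic follows. It assumes a haploid mouse genome of about 3 × 10^9 bp and a recombinant phage genome of about 50 kb; neither value appears in the paper, and they are chosen here only because together they reproduce the stated amount. The function and constant names are illustrative.

```python
def copy_matched_clone_mass_pg(genomic_mass_ug, clone_length_bp, genome_length_bp):
    """Mass of cloned DNA (pg) carrying as many copies of a single-copy
    sequence as the given mass of total genomic DNA (ug)."""
    genomic_mass_pg = genomic_mass_ug * 1e6  # 1 ug = 1e6 pg
    return genomic_mass_pg * clone_length_bp / genome_length_bp


# Assumed sizes, not taken from the paper.
ASSUMED_GENOME_BP = 3.0e9  # haploid mouse genome
ASSUMED_CLONE_BP = 5.0e4   # recombinant EMBL4 phage, vector plus insert

mass_pg = copy_matched_clone_mass_pg(10.0, ASSUMED_CLONE_BP, ASSUMED_GENOME_BP)
print(f"{mass_pg:.1f} pg of clone DNA per 10 ug of genomic DNA")
```

Under these assumptions the result is about 167 pg, close to the 166 pg stated in the legend. Loading copy-matched amounts is what allows the band intensities of the cloned fragments (lanes 2 and 3) to be compared with those of the genomic bands (lane 1).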


T G L K A A D N D P T A P P Y D S L L V F D Y E G TCATGTCTCTCCTAACCCAG GGCCTTAAAGCTGCTGACAACGACCCCACGGCGCCACCGTATGACTCCCTCTTAGTCTTTGACTACGAGGGC GGGAACTCTTCATTAATGAG GGCCTTAAAGCTGCTGACAACGACCCCACGGCGCCACCGTATGACTCCCTCTTAGTCTTTGACTACGAGGGC S G S T A G S L S S L N S S S S G G D Q D Y D Y L N D W G P R AGCGGCTCCACGGCTGGCTCCTTGAGCTCCCTCAACTCCTCCAGTAGCGGTGGGGACCAGGACTATGACTACCTGAATGACTGGGGACCCCGC

AGCGGCTCCACGGCTGGCTCCTTGAGCTCCCTCAACTCCTCCAGTAGCGGTGGGGACCAGGACTATGACTACCTGAATGACTGGGGACCCCGC F K K L A D M Y G G G D D * TTCAAGAAACTGGCGGACATGTACGGCGGTGGTGACGACTGAACGGCAGGACGGACTTGGCTTTTGGACAAGTATGAACAGTTTCACCTGATA

TTCAAGAAACTGGCGGACATGTACGGCGGTGGTGACGACTGAACGGCAGGACGGACTTGGCTTTTGGACAAGTATGAACAGTTTCACCTGATA

TTCCCAAAAAAAAGCATACAGAAGCTAGGCTTTAACTCTGTAGTCCACTAGCACCGTGCTTGCTGGAGGCTTTGGCGTAGGCTGCGAACCAGT TTCCCAAAAAAAAGCATACAGAAGCTAGGCTTTAACTCTGTAGTCCACTAGCACCGTGCTTGCTGGAGGCTTTGGCGTAGGCTGCGAACCAGT

TTGGGCTCCCAGGGAATATCAGTGATCCAATACTGTCTGGAAAACACCGAGCTCAGCTACACTTGAATTTTACAGTAAAGAAGCACTGGGATT TTGGGCTCCCAGGGAATATCAGTGATCCAATACTGTCTGGAAAACACCGAGCTCAGCTACACTTGAATTTTACAGTAAAGAAGCACTGGGATT TATGTGCCTTTTTGTACCTTTTTCAGATTGGAATTAGTTTTCTGTTTAAGGCTTTAATGGTACTGATTTCTGAAATGATAAGGAAAAGACAAA

TATGTGCCTTTTTGTACCTTTTTCAGATTGGAATTAGTTTTCTGTTTAAGGCTTTAATGGTACTGATTTCTGAAATGATAAGGAAAAGACAAA ATATTTTGTGGCGGGAGCAGAAAGTTAAATGTGATACGCTTCAACCCACTTTTGTTACAATGCATTTGCTTTTGTTAAGATACAGAACGAAAC

ATATTTTGTGGCGGGAGCAGAAAGTTAAATGTGATACTCTTCAACCCACTTTTGTTACAATGCATTTGCTTTTGTTAAGATACAGAACGAAAC

AACCAGATTAAAAAAAATTAACTCATGGAGTGATTTTGTTACCTTTGGGGTGGGGGGGATGAGACCACAAGATAGGAAAATGTACATTACTTC

AACCAGATTAAAAAAAATTAACTCATGGAGTGATTTTGTTACCTTTGGGGTGGGGGGGATGAGACCACAAGATAGGAAAATGTACATTACTTC TAGTTTTAGACTTTAGATTTTTTTTTTTCACTAAAATCTTAAAACTTACGCAGCTGGTTGCAGATAAAGGGAGTTTTCATATCACCAATTTGT

TAGTTTTAGACTTTAGA-TTTTTTTTTTCACTAAAATCTTAAAACTTACGCAGCTGGTTGCAGATAAAGGGAGTTTTCATATCACCAATTTGT AGCAAAATGAATTTTTTCATAAACTAGAATGTTAGACACATTTTGGTCTTAATCCATGTACACTTTTTTATTTTCTGTATTTTTTCCACCTCG AGCAAAATGAATTTTTTCATAAACTAGAATGTTAGACACATTTTGGTCTTAATCCATGTACACTTTTTTATTTTCTGTATTTTTTCCACCTCG

CTGTAAAAATGGTGTGTGTACATAATGTTTATCAGCATAGACTATGGAGGAGTGCAGAGAACTCGGAACATGTGTATGTATTATTTGGACTTT

CTGTAAAAATGGTGTGTGTACATAATGTTTATCAGCATAGACTATGGAGGAGTGCAGAGAACTCGGAACATGTGTATGTATTATTTGGACTTT GGATTCAGGTTTTTTGCATGTTAATATCTTTCGTTATGGGTAAAGTATTTACAAAACAAAGTGACATTTGATTCAACTGTTGAGCTGTAGTTA

GGATTCAGGTTTTTTGCATGTTAATATCTTTCGTTATGGGTAAAGTATTTACAAAACAAAGTGACATTTGATTCAACTGTTGAGCTGTAGTTA GAATACTCAATTTTTAATTTTTTAATTTTTTTTAAATTTTTTTATTTTCTTTTTGTTTGTTTCGTTTTGGGGAGGGGTAAAAGTTCTTAGCAC

GAATACTCAATTTTTAATTTTTTAATTTTTTTTAAATTTTTTTATTTTCTTTTTGTTTGTTTTGTTTTGGGGAGGGGTAAAAGTTCTTAGCAC

AATGTTTTACATAATTTGTACCAAAAAAATTACACAC---AAAAAAAAAAAAGAAAAGAAAAGAAAAGTGAAAGGGGTGGCCTGGTACTGGCA

AATGTTTTACATAATTTGTACCAAAAAAATTACACACACAAAAAAAAAAAAAGAAAAGAAAAGAAAAGTGAAAGGGGTGGCCTGGTACTGGCA

GCACTAGCAAGTGTGTGTTTTTAAAAAACAAAACAAACAAACAAAAAAATAAATAAAGAGGAAAGAAAAAAAAAGCTTTTAAACTG

GCACTAGCAAGTGTGTGTTTTTAAAAAACAAAACAAACAAACAAAAAAATAAATAAAGAGGAAAAGAAAAAAAAAAAAGCTTTTAAACTG GAGAGACTTCTGAAACAGCTTTGCGTCTGTGTTGTGTACCAGAATACAAACAATACACCTCTGACCCCAGCGTTCTGAATAAAAAGCTAATTT GAGAGACTTCTGAAACAGCTTTGCGTCTGTGTTGTGTACCAGAATACAAACAATACACCTCTGACCCCAGCGTTCTGAATAAAAAGCTAATTT

TGGATCTGGITGACTGGTTTGAACTTTTTT TGGAAAAAA AAAAAAAAAACTTACATTAA

FIG. 4. Comparison of the nucleotide sequences for the 16th exon and the extra 16th exon of the N-cadherin gene. Closed arrowheads indicate the exon-intron junctions. Identical nucleotides are shown with paired dots. The nucleotide sequence of the 16th exon is at the top; that of the extra 16th exon is at the bottom.
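The similarity figures quoted for the two 16th exons (100% in the coding region, 99% in the 3' untranslated region) are percent identities over aligned sequences. The sketch below shows that calculation on a 40-nucleotide excerpt from the 3' untranslated region in Fig. 4, chosen because it contains one of the few mismatches. The excerpt boundaries and the function name are illustrative, and a gap character in either sequence is simply counted as a non-identical position.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions carrying the same nucleotide.

    Both sequences must already be aligned to the same length; a gap
    character such as "-" never equals a base, so it counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# 40-nt excerpt from the 3' untranslated region shown in Fig. 4.
EXON16_EXCERPT = "AAATGTGATACGCTTCAACCCACTTTTGTTACAATGCATT"        # 16th exon
EXTRA_EXON16_EXCERPT = "AAATGTGATACTCTTCAACCCACTTTTGTTACAATGCATT"  # extra 16th exon

identity = percent_identity(EXON16_EXCERPT, EXTRA_EXON16_EXCERPT)
print(f"{identity:.1f}% identical over {len(EXON16_EXCERPT)} nt")
```

This short window gives 97.5% because it was picked around a mismatch; over the whole 3' untranslated region the differences are diluted to the 99% reported in the text.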

FIG. 5. Comparisons of the genomic structures of N-cadherin, P-cadherin, and L-CAM genes. (a) Common structure of cadherin cDNA. N and C represent the amino and carboxyl termini, respectively, and M represents the transmembrane domain. Dotted lines show the untranslated regions. The putative signal peptide, the precursor peptide, the repeated structures of the extracellular domain, and the transmembrane domain are shown by filled, hatched, shadowed, and stippled boxes, respectively. Size is indicated by a filled bar. (b) Comparisons of the locations of exons of the N-cadherin, P-cadherin, and L-CAM genes. Open and filled boxes show the untranslated and coding regions of these genes, respectively. Oblique lines indicate the positions of exon-intron junctions on the cDNA and also the corresponding exons located on these genes. Size is indicated by an open bar.

185 mice were analyzed for every marker and are shown in the segregation analysis (Fig. 6), up to 190 mice were typed for some markers. Each locus was analyzed in pairwise combinations for recombination frequencies using the additional data. The ratios of the total number of mice exhibiting recombinant chromosomes to the total number of mice

analyzed for each pair of loci and the most likely gene order are centromere, Ncad (26/188), Fgfa (2/190), Grl-1. The recombination frequencies (expressed as genetic distances in centimorgans ± the standard error) are Ncad (13.8 ± 2.5), Fgfa (1.1 ± 0.7), Grl-1.

The placement of N-cadherin in the proximal region of mouse chromosome 18 clearly indicates that this locus is unlinked to two other cadherins, E- and P-cadherin, that have been shown to be tightly linked to each other on chromosome 8 (16). The overall similarity in amino acid sequences between E- and P-cadherin is 58%, but the similarities between N- and E-cadherin and between N- and P-cadherin are 49% and 43%, respectively (6). Furthermore, N-cadherin contains unique sequences in the cytoplasmic domain that are not detected in E- and P-cadherin. These findings suggest that the N-cadherin gene evolved independently of the E- and P-cadherin gene group after their segregation. On the other hand, we recently found that another cadherin, R-cadherin, is 74% identical to N-cadherin, suggesting that R-cadherin is a close relative of N-cadherin. It would therefore be intriguing to determine their relationship in chromosomal localization in future studies. Finally, the human homolog of the N-cadherin locus has been placed on human chromosome 18 (18), thus defining a new region of homology between human chromosome 18 and the proximal region of mouse chromosome 18. The distal region of mouse chromosome 18 also shares a region of homology with human chromosome 18, but the proximal and distal regions of chromosome 18 homology are interrupted by a region of human 5q homology (data not shown and Fig. 6).

To summarize, the cadherin genes vary greatly in size, but the overall genomic structure seems to be highly conserved within this family, suggesting a functional or evolutionary importance of this pattern. It was also shown that cadherin genes constitute more than one linkage group. If all cadherin genes are mapped eventually, such information may be helpful in understanding not only evolutionary but also functional relationships among multiple cadherin molecules.

We thank Debbie Swing, Denise Angle, and Brian Cho for excellent technical assistance. This work was supported by research grants 02258102 and 02262216 from the Ministry of Education, Science and Culture of Japan to M.T. and, in part, by the National Cancer Institute, Department of Health and Human Services, under Contract NO1-CO-74101 with Advanced BioScience Laboratories.

1. Takeichi, M. (1991) Science 251, 1451-1455.
2. Nagafuchi, A., Shirayoshi, Y., Okazaki, K., Yasuda, K. & Takeichi, M. (1987) Nature (London) 329, 341-343.
3. Ringwald, M., Schuh, R., Vestweber, D., Eistetter, H., Lottspeich, F., Engel, J., Dölz, R., Jähnig, F., Epplen, J., Mayer, S., Müller, C. & Kemler, R. (1987) EMBO J. 6, 3647-3653.
4. Nose, A., Nagafuchi, A. & Takeichi, M. (1987) EMBO J. 6, 3655-3661.
5. Hatta, K., Nose, A., Nagafuchi, A. & Takeichi, M. (1988) J. Cell Biol. 106, 873-881.
6. Miyatani, S., Shimamura, K., Hatta, M., Nagafuchi, A., Nose, A., Matsunaga, M., Hatta, K. & Takeichi, M. (1989) Science 245, 631-635.
7. Gallin, W. J., Sorkin, B. C., Edelman, G. M. & Cunningham, B. A. (1987) Proc. Natl. Acad. Sci. USA 84, 2808-2812.
8. Sorkin, B. C., Gallin, W. J., Edelman, G. M. & Cunningham, B. A. (1991) Proc. Natl. Acad. Sci. USA 88, 11545-11549.
9. Koch, P. J., Walsh, M. J., Schmelz, M., Goldschmidt, M. D., Zimbelmann, R. & Franke, W. W. (1990) Eur. J. Cell Biol. 53, 1-12.
10. Holton, J. L., Kenny, T. P., Legan, P. K., Collins, J. E., Keen, J. N., Sharma, R. & Garrod, D. R. (1990) J. Cell Sci. 97, 239-246.
11. Collins, J. E., Legan, P. K., Kenny, T. P., MacGarvie, J., Holton, J. L. & Garrod, D. R. (1991) J. Cell Biol. 113, 381-391.
12. Parker, A. E., Wheeler, G. N., Arnemann, J., Pidsley, S. C., Ataliotis, P., Thomas, C. L., Rees, D. A., Magee, A. I. & Buxton, R. S. (1991) J. Biol. Chem. 266, 10438-10445.
13. Mahoney, P. A., Weber, U., Onofrechuk, P., Biessmann, H., Bryant, P. J. & Goodman, C. S. (1991) Cell 67, 853-868.
14. Takeichi, M. (1988) Development 102, 639-655.
15. Sorkin, B. C., Hemperly, J. J., Edelman, G. M. & Cunningham, B. A. (1988) Proc. Natl. Acad. Sci. USA 85, 7617-7621.
16. Hatta, M., Miyatani, S., Copeland, N. G., Gilbert, D. J., Jenkins, N. A. & Takeichi, M. (1991) Nucleic Acids Res. 19, 4437-4441.
17. Mansouri, A., Spurr, N., Goodfellow, P. N. & Kemler, R. (1988) Differentiation 38, 67-71.
18. Walsh, F. S., Barton, C. H., Putt, W., Moore, S. W., Kelsell, D., Spurr, N. & Goodfellow, P. N. (1990) J. Neurochem. 55, 805-812.
19. Eistetter, H. R., Adolph, S., Ringwald, M., Simon-Chazottes, D., Schuh, R., Guenet, J.-L. & Kemler, R. (1988) Proc. Natl. Acad. Sci. USA 85, 3489-3493.
20. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold Spring Harbor, NY).
21. Copeland, N. G. & Jenkins, N. A. (1991) Trends Genet. 7, 113-118.
22. Jenkins, N. A., Copeland, N. G., Taylor, B. A. & Lee, B. K. (1982) J. Virol. 43, 26-36.
23. Cox, R. D., Copeland, N. G., Jenkins, N. A. & Lehrach, H. (1991) Genomics 10, 375-384.
24. Green, E. L. (1981) in Genetics and Probability in Animal Breeding Experiments (Macmillan, New York), pp. 77-113.

[Fig. 6 data. Loci: Ncad, Fgfa, Grl-1. Numbers of offspring inheriting each type of chromosome: 89, 68, 11, 15, 1, 1. Linkage map of mouse chromosome 18: Ncad, 13.8 cM, Fgfa, 1.1 cM, Grl-1. Human map positions: Ncad, 18; Fgfa, 5q31.3-q33.2; Grl-1, 5q31-q32.]

FIG. 6. Position of the N-cadherin locus on mouse chromosome 18. Ncad was placed on mouse chromosome 18 by interspecific backcross analysis. The segregation patterns of Ncad and flanking genes in 185 backcross animals that were typed in common are shown at the top. For individual pairs of loci, >185 animals were typed (see text). Each column represents the chromosome identified in backcross progeny that was inherited from the (C57BL/6J × M. spretus)F1 parent. The shaded boxes represent the presence of a C57BL/6J allele and white boxes represent the presence of an M. spretus allele. The number of offspring inheriting each type of chromosome is listed at the bottom of each column. A partial chromosome 18 linkage map showing the location of N-cadherin in relation to linked genes is shown at the bottom. Recombination distances between loci are shown (in centimorgans) to the left of the chromosome, and the positions of Fgfa and Grl-1 in human chromosomes are shown to the right.
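The gene-order criterion described in Materials and Methods, minimizing the number of recombination events needed to explain the allele distribution patterns, can be illustrated with the six haplotype classes of Fig. 6 (89, 68, 11, 15, 1, and 1 offspring; 185 in total). The sketch below is an illustration under stated assumptions, not the SPRETUS MADNESS program. In particular, which allele pattern goes with which count is assumed here, because the shading of the figure does not survive in the text; swapping the counts within a complementary pair does not change the totals.

```python
from itertools import permutations

LOCI = ("Ncad", "Fgfa", "Grl-1")

# Alleles at (Ncad, Fgfa, Grl-1): "B" = C57BL/6J, "S" = M. spretus.
# The pattern-to-count assignment is an assumption (see lead-in).
HAPLOTYPE_COUNTS = {
    ("B", "B", "B"): 89, ("S", "S", "S"): 68,  # nonrecombinant
    ("B", "S", "S"): 11, ("S", "B", "B"): 15,  # Ncad differs from the rest
    ("B", "B", "S"): 1, ("S", "S", "B"): 1,    # Grl-1 differs from the rest
}


def crossovers_for_order(order, counts):
    """Total crossovers implied across all animals by a given locus order."""
    idx = [LOCI.index(locus) for locus in order]
    total = 0
    for haplotype, n in counts.items():
        # One crossover for each adjacent pair of loci with different alleles.
        switches = sum(haplotype[a] != haplotype[b] for a, b in zip(idx, idx[1:]))
        total += switches * n
    return total


def best_order(counts):
    """Return (order with the fewest crossovers, scores for all orders)."""
    scores = {order: crossovers_for_order(order, counts)
              for order in permutations(LOCI)}
    return min(scores, key=scores.get), scores


order, scores = best_order(HAPLOTYPE_COUNTS)
print(order, scores[order])
```

The order Ncad, Fgfa, Grl-1 requires 28 crossovers, against 30 with Grl-1 in the middle and 54 with Ncad in the middle, in agreement with the order given in the text. An order and its mirror image score the same, so placing the centromere relative to Ncad needs information beyond these three loci.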
