Chromosoma (1991) 100:360-370

CHROMOSOMA © Springer-Verlag 1991

CENP-B is a highly conserved mammalian centromere protein with homology to the helix-loop-helix family of proteins

Kevin F. Sullivan 1 and Charles A. Glass 2

Departments of 1 Molecular Biology, and 2 Molecular and Experimental Medicine, Research Institute of Scripps Clinic, 10666 N. Torrey Pines Road, La Jolla, CA 92037, USA

Received January 25, 1991
Accepted February 19, 1991 by J.B. Rattner

Abstract. CENP-B is a centromere associated protein originally identified in human cells as an 80 kDa autoantigen recognized by sera from patients with anti-centromere antibodies (ACA). Recent evidence indicates that CENP-B interacts with centromeric heterochromatin in human chromosomes and may bind to a specific subset of human alphoid satellite DNA. CENP-B has not been unambiguously identified in non-primates and could, in principle, be a primate-specific alphoid DNA binding protein. In this work, a human genomic DNA segment containing the CENP-B gene was isolated and subjected to DNA sequence analysis. In vitro expression identified the site for translation initiation of CENP-B, demonstrating that it is encoded by an intronless open reading frame (ORF) in human DNA. A homologous mouse gene was also isolated and characterized. It was found to possess a high degree of homology with the human gene, containing an intronless ORF coding for a 599 residue polypeptide with 96% sequence similarity to human CENP-B. 5' and 3' flanking and untranslated sequences were conserved at a level of 94.6% and 82.7%, respectively, suggesting that the regulatory properties of CENP-B may be conserved as well. CENP-B mRNA was detected in mouse cells and tissues and an immunoreactive nuclear protein identical in size to human CENP-B was detected in mouse 3T3 cells using human ACA. Analysis of the sequence of CENP-B revealed a segment of significant similarity to a DNA binding motif identified for the helix-loop-helix (HLH) family of DNA binding proteins. These data demonstrate that CENP-B is a highly conserved mammalian protein that may be a member of the HLH protein family and suggest that it plays a role in a conserved aspect of centromere structure or function.

Introduction

The centromere is the genetic locus that specifies the chromosomal component of the mitotic spindle apparatus. It serves at least three functions during mitosis: as the microtubule association site, as the primary site of sister chromatid attachment and as a receptor for signals generated at the metaphase-anaphase transition (Pluta et al. 1990). Work in a number of systems has led to an understanding of the centromere as a unique chromatin domain, built around specific DNA sequence elements, that mediates the assembly of a microtubule binding site at the surface of the chromosome (Clarke and Carbon 1980; Bloom and Carbon 1982; Ris and Witt 1981; Rattner and Bazett-Jones 1989). This site, the kinetochore, is visible as a trilaminar disk at the surface of the centromeres of cells of higher organisms and plays an active role in chromosome motility as a dynamic attachment site for microtubules and an organizer region for mechanochemical proteins (Pfarr et al. 1990; Steuer et al. 1990; Hyman and Mitchison 1990). Thus, the centromere/kinetochore domain of the genome plays an active and essential role in cell division.

As a structural component of the chromosome, the centromere consists of both cis-acting DNA sequences and associated protein components. Knowledge of DNA components of the centromere is based largely on characterization of centromeres in yeasts. Centromeres of Saccharomyces cerevisiae chromosomes have been shown to comprise short (ca. 125 bp) DNA sequences containing three distinct subelements (Centromere DNA Elements) that differ in their functional properties (for review see Bloom and Yeh 1989). Centromere sequences in Schizosaccharomyces pombe are quite different, consisting of a much larger region of DNA (40-100 kb) containing both conserved repetitive elements and chromosome specific sequences (Clarke 1990). In both species, the centromere regions contain uniquely configured chromatin (Bloom and Carbon 1982; Polizzi and Clarke 1991).

In vertebrates, the centromere is seen by microscopy as the primary constriction of mitotic chromosomes, a highly condensed heterochromatin domain located at the site of sister chromatid attachment. In situ hybridization has revealed the presence of highly repetitive DNA sequences at the centromeres of many higher eukaryotes, which may comprise as much as 20% of the genome (Pardue and Gall 1970; Manuelidis 1978; Singer 1982). In humans, the major centromere DNA component is alphoid satellite DNA, which consists of hierarchically organized tandem repeats of a basic 170 bp sequence, present in blocks estimated to be 0.5-10 Mb long (3,000-50,000 copies) at the centromeres of each chromosome (Tyler-Smith and Brown 1987; Willard et al. 1989). It is not clear whether satellite DNAs are related to components of the yeast centromeres or how they may be related to the segregation function of the centromere. It is surprising that centromere DNA organization is so highly variable during evolution in light of the functional conservation of the centromere. It seems likely that much of the centromere associated DNA in higher cells is present to establish a structural context for the action of sequences that confer the segregation function of the centromere.

An important step in the identification of centromere specific proteins was the discovery of human autoimmune disease sera that recognized antigens localized uniquely at the centromere (Moroi et al. 1980). Three human CENtromere Proteins, CENP-A (17 kDa), CENP-B (80 kDa) and CENP-C (140 kDa), were identified by characterizing antigens recognized by these sera (Earnshaw and Rothfield 1985; Valdivia and Brinkley 1985). A partial cDNA encoding human CENP-B was cloned by expression screening with a human autoantiserum and it was shown to be a highly acidic protein (Earnshaw et al. 1987), while CENP-A appears to be histone-like in its biochemical properties (Palmer and Margolis 1987). More recently, S. cerevisiae centromere DNA binding proteins have been identified by affinity techniques using CDE sequences and one of these, CBF1 or CP-1, has recently been cloned (Cai and Davis 1990; Baker and Masison 1990). Surprisingly, CBF1 is a member of the helix-loop-helix (HLH) family of proteins and has functional properties that suggest a role in regulation of genes involved in methionine metabolism in addition to a role as a centromere associated protein. Cytoplasmic dynein, a microtubule associated ATPase that drives minus end-directed movement along microtubules, has recently been localized in the centromeres of mammalian chromosomes in mitosis by indirect immunofluorescence (Pfarr et al. 1990; Steuer et al. 1990) and, similarly, p34 kinase has been shown to associate with the centromere during mitosis (J.B. Rattner, in press).

The human centromere antigen CENP-B is the most extensively characterized vertebrate centromere associated protein but its function has not yet been unambiguously established. It is a chromosome scaffold-associated protein concentrated in the dense heterochromatin underlying the kinetochore (Earnshaw et al. 1984; Cooke et al. 1990). It appears to interact, directly or indirectly, with a short DNA sequence present in certain human centromeric alphoid DNA repeats (Masumoto et al. 1989). This is consistent with the observation that the apparent abundance of CENP-B on different human chromosomes varies in direct correlation with the abundance of alphoid DNA (Earnshaw et al. 1987). These properties are consistent with a role in organizing the chromatin underlying the kinetochore. However, in vitro crosslinking data suggest that CENP-B may be available for interaction with non-chromosomal proteins, including tubulin (Balczon and Brinkley 1987).

It has been unclear whether CENP-B is a conserved centromere component or, like alphoid DNA, is a unique component of primate chromosomes. The primary tool for analysis of CENP-B has been human autoantisera that recognize multiple centromere proteins and often additional antigens, making identification of homologous antigens across species difficult. Several workers have failed to detect an 80 kDa antigen in rodents and other vertebrate species. A protein of 50 kDa was detected in rat tissues using affinity purified anti-CENP-B antibodies (Earnshaw and Rothfield 1985) and a protein of similar size, classified as CENP-D, has been detected in multiple species, including humans (Kingwell and Rattner 1987). Others have reported detection of an 80 kDa centromere antigen in non-human cells (Balczon and Brinkley 1987; Simmerly et al. 1990).

In this work the human gene encoding CENP-B was isolated and characterized and a homologous gene was isolated from the mouse genome. The mammalian genes encoding CENP-B are both intronless and are highly conserved in DNA and encoded protein sequence in the two species. Further, sequence analysis demonstrates a region of high similarity to the basic DNA binding motif of the HLH protein family at the amino-terminus of CENP-B. The data demonstrate that CENP-B is a highly conserved mammalian protein and suggest that CENP-B serves a conserved function in the chromosomes of vertebrates.

Offprint requests to: K.F. Sullivan

Materials and methods

Cells and antibodies. HeLa (human) or 3T3 (mouse) cells were maintained at 37° C in 10% CO2 in DMEM supplemented with 10% fetal calf serum (Life Technologies, Bethesda, Md.). Human anti-centromere autoantiserum M was obtained from a CREST patient identified at Scripps Clinic. It contains high titer (> 1:10,000) antibodies to CENP-A and CENP-B and lower titer antibodies to additional antigens. Monoclonal antibodies 1H2 and 2D7 (Earnshaw et al. 1987) were purified from ascites fluid by DEAE chromatography.

Nucleic acid analysis. Genomic DNA was extracted from peripheral blood lymphocytes (human) or liver (mouse), lambda phage DNA was prepared from CsCl purified phage and plasmids were prepared by alkaline lysis and phenol extraction using methods described by Maniatis et al. (1982). Cloning vectors were further purified by CsCl gradient centrifugation prior to use. Restriction mapping was performed with several enzymes according to the manufacturers' instructions (New England Biolabs, Mass.; Boehringer Mannheim, Ill.). RNA was extracted from exponentially growing cultures of HeLa or mouse 3T3 cells using the method of Chomczynski and Sacchi (1987). Oligonucleotides were prepared by the W.M. Keck Core DNA facility using a DNA synthesizer (Model 380B, Applied Biosystems, Foster City, Calif.). Nucleic acid electrophoresis and transfer to nitrocellulose were as described previously (Sullivan et al. 1986). Hybridization was carried out at 42° C in 50% formamide, 5 x SSC for 8-16 h with probes labeled with [32P]dCTP using the random primer method (Feinberg and Vogelstein 1984) and filters were washed at 51° C in 0.1 x SSC, 0.1% SDS. (1 x SSC is 0.15 M NaCl, 0.015 M sodium citrate.)
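The salt concentrations implied by the hybridization and wash conditions follow directly from the stated composition of 1 x SSC. The short Python sketch below works through that arithmetic; the function name `ssc_molarity` is introduced here for illustration only and is not part of the original methods.

```python
# Composition of 1 x SSC as stated in the text, in mol/l.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molarity of each SSC component at the given fold strength."""
    # Rounding only suppresses floating-point noise in the printed values.
    return {name: round(conc * fold, 6) for name, conc in SSC_1X.items()}


hybridization = ssc_molarity(5)    # 5 x SSC, used for hybridization
wash = ssc_molarity(0.1)           # 0.1 x SSC, used for the filter wash

print("5 x SSC:  ", hybridization)
print("0.1 x SSC:", wash)
```

The wash buffer therefore contains fifty-fold less salt than the hybridization buffer, which together with the higher temperature makes the wash the more stringent step.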


Gene isolation, subcloning and sequence analysis. Genomic libraries of human peripheral blood lymphocyte DNA cloned in lambda EMBL-3 (Clontech, Palo Alto, Calif.) or mouse (C57/B6) DNA cloned in Charon-4A (a gift of Nalin Kumar) were screened using a cDNA copy of human CENP-B. Approximately 10⁶ phage from each library were screened, resulting in each case in the isolation of a single class of recombinant phage with strong homology to the probe. Phage DNA was analyzed by restriction mapping. The human gene was isolated on a 14 kb insert containing a 5.8 kb HindIII fragment corresponding to the fragment detected on Southern blots of genomic DNA. This fragment was subcloned into Bluescript SK-plus (pBSK+) (Stratagene, La Jolla, Calif.), yielding the plasmid pHCG-5. A 1.95 kb SmaI fragment was isolated from pHCG-5 and subcloned into the EcoRV site of pBSK+, yielding plasmid pHS-1 that contains the CENP-B coding sequence oriented for sense strand transcription with T7 RNA polymerase. The mouse CENP-B gene was isolated on a 17 kb genomic DNA segment. EcoRI (pMR6, 6 kb) and HindIII (pMH3 and pMH5, 3 and 5 kb respectively) fragments corresponding to those detected in genomic DNA hybridizations were subcloned in the appropriate pBSK+ vectors. For sequence analysis, a 3.8 kb BglII-EcoRI fragment (pMBR6) containing the entire coding region was subcloned from pMR6. For DNA sequence determination, pHCG-5 was subjected to Bal31 nuclease deletions from flanking and internal restriction sites, as described previously (Sullivan et al. 1985). Fragments were subcloned into M13mp18 and DNA from single stranded phage was analyzed by the Sanger technique, using T7 DNA polymerase and commercial dideoxy nucleotide sequencing reagents (Pharmacia, Piscataway, N.J.). Determination of both strands was completed using synthetic oligonucleotides to sequence plasmid DNA (Chen and Seeberg 1985).
Similarly, for the mouse gene a Bal31 nuclease deletion set representing the 3.8 kb insert of pMBR6 was prepared in M13mp18. DNA sequence determination was as above or using an automated DNA sequencer (Model 370A, Applied Biosystems). Phage DNA for automated sequencing was prepared by a glass binding method based on that of Kristensen et al. (1987). Sequencing reactions were performed with T7 DNA polymerase using dye-conjugated primers and nucleotide solutions obtained from Applied Biosystems. DNA sequence 3' of the EcoRI site of pMBR6 and that required to complete the double stranded sequence was determined using synthetic oligonucleotides to sequence the appropriate plasmids. Sequences were analyzed with the aid of programs from the University of Wisconsin Genetics Computer Group (Devereux et al. 1984). GAP, COMPARE and DOTPLOT were used to analyze nucleotide and amino acid sequence homologies and PEPTIDESTRUCTURE and PLOTSTRUCTURE were used for protein sequence analysis. TFASTA and FASTA (Pearson and Lipman 1988) were used to search the GenBank and EMBL nucleotide sequence and the NBRF protein sequence databases. The SCAN program of the Protein Identification Resource system was used to probe the NBRF protein databases for short peptide sequences.

In vitro transcription and translation. Plasmids were linearized with appropriate restriction enzymes prior to in vitro transcription with T3 or T7 RNA polymerase (Promega, Madison, Wis.). RNA was purified by extraction and ethanol precipitation, dissolved in water and quantitated by electrophoresis on agarose-formaldehyde gels with RNA standards of known concentration (BRL, Bethesda, Md.). For in vitro translation 1 µg of RNA was incubated with rabbit reticulocyte lysate (Promega) and [35S]methionine (> 800 Ci/mmol; Amersham) according to the suppliers' instructions.

Protein analysis. Cells were harvested from exponentially growing cultures by trypsinization and centrifugation at 1,000 rpm.
Cell numbers were determined with a hemocytometer. Cell pellets were washed once with 20% fetal calf serum in PBS and twice with PBS. For total cell extracts, cells were solubilized at a concentration of 5 x 10⁷ cells/ml in 2 x SDS-PAGE (SDS-polyacrylamide gel electrophoresis) sample buffer (Laemmli 1970). Nuclei were prepared in low ionic strength buffer as described by Masumoto et al. (1989) and dissolved in 2 x sample buffer at a concentration of 10⁸ nuclei/ml for electrophoresis; the corresponding cytoplasmic fraction (low speed supernatant) was adjusted to 5 x 10⁷ cell equivalents/ml with concentrated sample buffer. All fractions were boiled for 5 min, vortexed vigorously and spun for 10 min in a microcentrifuge prior to loading on a gel. Protein SDS-PAGE was performed on 10% or 12.5% polyacrylamide gels (Laemmli 1970). For fluorography, gels were fixed for 30 min and then incubated for 30 min with a fluorographic reagent (Enhance; Amersham) prior to drying and exposure to film. For immunodetection, proteins were transferred to nitrocellulose (BA-85, Schleicher and Schuell, N.H.) with an electrophoretic transfer cell (Bio-Rad, Richmond, Calif.) using the Tris/Glycine/SDS buffer described in Harlow and Lane (1988). Following transfer, filters were incubated in blotting solution (3% non-fat dried milk in TBS-Tween: 100 mM Tris, pH 7.5, 50 mM NaCl, 0.1% Tween-20) for 1 h at room temperature or overnight at 4° C. Primary and secondary antibodies were applied in blotting solution and blots were washed vigorously five times for 5 min each in TBS-Tween after each antibody application. Secondary antibodies were alkaline phosphatase linked goat anti-human IgG (Organon Teknika) and signals were visualized using 5-bromo-4-chloro-3-indolyl-phosphate (BCIP) and nitro blue tetrazolium (NBT) reagents according to the manufacturer's instructions (Promega). Immunoprecipitations were performed with protein-A-linked Sepharose beads (Pharmacia) using the protocol described in Harlow and Lane (1988). Beads were collected by centrifugation in a microfuge for 3 min and washed three times with TBS prior to elution with SDS-PAGE sample buffer.

Results

A human lymphocyte genomic library constructed in lambda EMBL-3 was screened using a human CENP-B cDNA, resulting in the isolation of a phage containing a 14 kb human genomic segment with homology to the cDNA sequence. Restriction mapping of this segment indicated that it corresponded to the single copy CENP-B sequence detected by Southern analysis of human genomic DNA and that all detectable homology with the CENP-B cDNA was located within an approximately 3 kb region of the genomic clone (Fig. 1). A 5.8 kb HindIII


Fig. 1A, B. Structure of the human CENP-B locus. A Southern blot analysis of human genomic DNA digested with different restriction enzymes and probed with a CENP-B cDNA. Lanes represent BamHI (B), HindIII (H) and EcoRI (R) digests, respectively. Markers correspond to lambda HindIII and phi X174 HaeIII fragments. B The top panel shows a restriction map of the insert of lambda phage HCG-5 containing a 14 kb genomic insert. The bottom panel shows a detailed map of the 5.8 kb HindIII fragment containing the CENP-B gene. The coding region is indicated by the stippled box and the start and stop codons and polyadenylation signal sequence are indicated. Restriction sites are abbreviated as: A, ApaLI; B, BamHI; Bg, BglII; H, HindIII; K, KpnI; R, EcoRI; S, SmaI; Sf, SfiI; X, XhoI; Xb, XbaI
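The relationship between a restriction map and the fragment sizes seen on a Southern blot can be illustrated with a small in silico digest. The Python sketch below is an illustration only, not the mapping procedure used in this work: the function `fragment_lengths` and the toy sequence are hypothetical, and only the recognition sequences of BamHI, HindIII and EcoRI are taken as known.

```python
# Recognition sequences of the three enzymes used for the genomic digests.
# Each of these enzymes cuts the top strand after the first base of its site
# (G^GATCC, A^AGCTT, G^AATTC), hence the default cut offset of 1.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT", "EcoRI": "GAATTC"}


def fragment_lengths(seq, site, cut_offset=1):
    """Return fragment lengths from a complete digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment boundaries: start of the molecule, every cut, end of the molecule.
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 47 bp toy sequence carrying two HindIII sites.
toy = "C" * 10 + "AAGCTT" + "G" * 20 + "AAGCTT" + "C" * 5
print(fragment_lengths(toy, SITES["HindIII"]))
print(fragment_lengths(toy, SITES["EcoRI"]))
```

On a Southern blot only those fragments that overlap the probe are detected, so a single hybridizing band per digest is what a single copy locus is expected to give.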

[Fig. 2: nucleotide sequence of the 3,717 bp human CENP-B genomic segment, with the predicted amino acid sequence of the 599 residue ORF shown below the coding region]

Fig. 2. Sequence of the human CENP-B gene. The sequence of a 3.717 kb segment of the 5.8 kb genomic HindIII fragment cloned in pHCG-5 is shown with translation of the 599 residue open reading frame (ORF) corresponding to the CENP-B protein; the polyadenylation signal AATAA is underlined. Corrections to the nucleotide sequence result in three amino acid differences relative to the previously published cDNA sequence of CENP-B (Earnshaw et al. 1987) at positions 583 (M to R), 592 (L to V) and 593 (L to R). Actual nucleotide differences are detailed in the EMBL submission (EMBL accession no. X55039)
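The conceptual translation of an ORF shown in Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration using the standard genetic code, not the software used for the analysis reported here; the function `translate_orf` and the toy sequence are hypothetical, and the sketch assumes an input containing only the letters A, C, G and T.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna, start=None):
    """Translate from the first ATG (or a given start index) to the first stop codon."""
    dna = dna.upper()
    if start is None:
        start = dna.find("ATG")
    if start < 0:
        return ""  # no initiator codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon ends the ORF
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy sequence: two leading bases, a four codon ORF, a stop codon.
toy = "CCATGGGCCCCAAGTAAGG"
print(translate_orf(toy))
```

Applied to a complete coding sequence, the length of the returned string gives the number of residues in the ORF, which is how a figure such as 599 residues would be obtained from the genomic sequence.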

fragment containing this region was subcloned into Bluescript SK(+) to construct plasmid pHCG-5 for DNA sequencing and further analysis. DNA sequence was obtained from pHCG-5 using a combination of directed deletion subcloning and synthetic oligonucleotide primers, resulting in the determination of a contiguous sequence of 3,717 bp (Fig. 2). The sequence of a previously determined cDNA (Earnshaw et al. 1987) is represented in a single uninterrupted segment in the genomic DNA sequence. The genomic sequence extends an additional 965 bp in the 5' direction and 148 bp 3' of the polyadenylation site. An ATG codon in good initiator context is present 15 nucleotides 5' of the end of the longest cDNA isolated previously, defining an open reading frame (ORF) of 599 codons which predicts a polypeptide with a mass of 65.2 kDa. This is consistent with previous analysis of fragments of CENP-B expressed as fusion proteins in Escherichia coli which demonstrated that CENP-B migrates anomalously in SDS-PAGE with an apparent mass 15-25 kDa larger than predicted (Earnshaw et al. 1987). A second in-frame ATG is present 471 bp 5' of the proximal methionine codon in the genomic sequence. This codon is unlikely to be present on the 3 kb CENP-B mRNA (see below) and did not initiate protein synthesis in vitro (data not shown).

To determine whether the proximal 5' ATG corresponds to the translation initiation site for CENP-B, a 1.95 kb SmaI fragment was subcloned from pHCG-5 for analysis of the genomic ORF by in vitro transcription and translation. The plasmid pHS-1, containing 20 bp of DNA flanking the ATG codon and 162 nucleotides of 3' flanking sequences, was transcribed in both orientations and the resultant RNAs were translated in vitro in a reticulocyte lysate using [35S]methionine to label protein products. The sense (T7) transcript specifically directed the synthesis of a polypeptide migrating at 80 kDa by SDS-PAGE which was immunoprecipitated by ACA autoantisera (Fig. 3A, B). Specific inhibition of translation with an oligonucleotide complementary to codons 11-20 supports the assignment of the proximal ATG shown in Fig. 2 as the site of translation initiation in vitro (data not shown). The lower molecular weight products seen in Fig. 3 represent proteolytic or abortive translation products of CENP-B, since they are obtained in variable yields in different experiments and are inhibited by the anti-sense oligonucleotide. CENP-B translated in vitro was compared with HeLa cell CENP-B by co-electrophoresis and immunoblot analysis and found to be indistinguishable in electrophoretic migration (Fig. 3C, lane 2). These experiments strongly support the assignment of the proximal ATG as the correct initiator codon for CENP-B translation and demonstrate that CENP-B is encoded by an intronless ORF in the human genome. [Fig. 2 contains three corrections in the nucleotide sequence which alter the previously reported amino acid sequence (Earnshaw et al. 1987). Resequencing of the cDNA CENP-B4 confirmed the sequence shown here.]

Fig. 3A-C. Identification of the initiator methionine codon for CENP-B. Plasmid pHS-1 was transcribed in either direction in vitro and the resultant RNAs were translated in a rabbit reticulocyte lysate. A Following translation, samples were analyzed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE). Lane 1, molecular weight markers, 116, 94, 66 and 43 kDa. Lane 2, control translation with no RNA. Lane 3, T7 transcript corresponding to sense strand mRNA. Lane 4, T3 transcript corresponding to antisense RNA. B Translation products were immunoprecipitated and analyzed by electrophoresis. Lane 1, molecular weight markers as in A. Lane 2, in vitro translation products used for immunoprecipitation. Lane 3, control immunoprecipitation with normal human serum. Lane 4, immunoprecipitation with human antiserum M. The position of CENP-B at 80 kDa is indicated and proteolytic or aberrant translation termination products of CENP-B are noted by arrowheads. C In vitro translated CENP-B was mixed with total HeLa cell lysate and co-electrophoresed and CENP-B was detected by immunoblot with human antiserum M. Lane 1, 10 µl in vitro translated CENP-B. Lane 2, 5 µl in vitro CENP-B mixed with total HeLa cell lysate representing 5 x 10⁵ cells. Lane 3, total HeLa cell lysate, 1 x 10⁶ cells

The sequence shown in Fig. 2 begins 950 bp 5' of the initiator methionine codon and extends to a HindIII site 148 bp 3' of the site of polyadenylation observed in several CENP-B cDNA clones. No introns are present in the 3' untranslated region. The extremely high G + C

Fig. 4A, B. Structure of the mouse genomic CENP-B locus. A Southern blot analysis of mouse genomic DNA digested with different restriction enzymes and probed with a CENP-B cDNA. Lanes represent BamHI (B), HindIII (H) and EcoRI (R) digests, respectively. Markers correspond to lambda HindIII and phi X174 HaeIII fragments. B The top panel shows a restriction map of the insert of lambda phage M3 containing a 17 kb genomic insert. The bottom panel shows a detailed map of the segment used for sequence determination. The coding region is indicated by the stippled box and the start and stop codons and polyadenylation signal sequence are indicated. Restriction sites are abbreviated as: B, BamHI; Bg, BglII; C, ClaI; H, HindIII; K, KpnI; R, EcoRI; S, SacII; Sal, SalI; X, XhoI

content of the immediate 5' flanking sequences (> 90%) has hindered unambiguous determination of the site of transcription initiation, although it is likely to be within 150 bp of the initiator codon based on the observed size of the transcript and homology of this region with the mouse CENP-B gene (see below). The 5' flanking sequences contain several potential SP-1 binding sites within 300 bp immediately flanking the coding region and a highly A + T rich segment centered around 600 bp upstream of the coding region.

To determine whether CENP-B is present in a non-primate species, mouse genomic DNA was analyzed by hybridization with the human CENP-B coding region, identifying sequences with high homology to CENP-B (Fig. 4A). The corresponding genomic segment was isolated from a mouse genomic library (Fig. 4B) and the sequence of a 3.8 kb segment was determined (Fig. 5). A 2.8 kb segment of high homology with the human genomic sequence was identified that included a unique ORF encoding a 599 amino acid polypeptide. In vitro transcription and translation of this DNA segment resulted in the synthesis of a polypeptide that migrated at 80 kDa in SDS-PAGE, just as observed for the human translation product (Fig. 6A). The mouse, therefore, contains a CENP-B gene homologous in sequence and intronless structure to that found in the human. RNA blot analysis demonstrated that a single 3 kb mRNA transcript identical in size to human CENP-B mRNA was expressed in 3T3 cells (Fig. 6B) and a variety of mouse tissues (data not shown). Immunoblot analysis with human ACA demonstrated the presence of a crossreactive nuclear protein with an apparent mass of 80 kDa in 3T3 cells (Fig. 6C). The identity in apparent mass of this protein with both in vitro synthesized mouse

Fig. 5. Sequence of the mouse CENP-B gene. The 3.84 kb sequence determined from the genomic segment is shown with translation of the 599 residue ORF; the polyadenylation signal AATAAA is underlined. (EMBL accession no. X55038)

Fig. 6A-C. Identification of mouse CENP-B gene products. A Plasmid pMBR-6 containing the BglII-EcoRI fragment of the mouse CENP-B gene was linearized at the EcoRI site, transcribed with T3 RNA polymerase and the resulting RNA was translated in vitro. Proteins were analyzed by SDS-PAGE. Lane 1, molecular weight standards: 116, 94, 66 and 43 kDa; lane 2, no RNA control; lane 3, pMBR-6 sense strand RNA. B RNA blot analysis of total RNA probed with human CENP-B cDNA: lane 1, HeLa RNA; lane 2, mouse 3T3 RNA. C Immunoblot analysis of mouse 3T3 cell proteins using human anticentromere autoantiserum M. Lane 1, total cell extract; lane 2, cytoplasmic proteins; lane 3, nuclear proteins

CENP-B and human CENP-B, its cross reactivity with ACA antibodies and the expression of a CENP-B mRNA in these cells strongly suggest that it represents mouse CENP-B. Monoclonal antibodies raised against a human CENP-B fusion protein failed to cross react with this protein; however, these antibodies have been found to be specific for human CENP-B in a number of experiments (data not shown). An additional 50 kDa polypeptide was detected in purified nuclei in these experiments. This protein may correspond to the 50 kDa CENP-D antigen described by Kingwell and Rattner (1987). Its immunological relationship with CENP-B has not been examined.

The mouse and human CENP-B genes share 88% sequence identity within the coding region and a surprisingly high degree of identity in the 5' and 3' flanking sequences (Fig. 7A, B). The 3' untranslated regions show 82.7% identity with several short gaps in the alignment; this homology extends past the site of polyadenylation for 50 bp into the flanking genomic DNA. The 5' flanking regions extending 140 bp 5' of the ATG show 94.6%

identity with single gaps of 1 and 2 bp inserted in the mouse and human sequence, respectively. Beyond this region, there is no extensive sequence similarity in the 5' end, although the mouse gene possesses a region of high A + T composition similar to that seen in the human gene.

Comparison of the predicted sequences of the encoded CENP-B protein products reveals them to be highly conserved as well. The alignment shown in Fig. 7C shows 92.1% sequence identity between the two proteins, with 43 amino acid differences and 4 insertion/deletion changes. The gaps introduced in the alignment occur in the first acidic domain and are compensatory, resulting in slight differences in acidic residue distribution but a conserved length within this region. A total of 25 amino acid substitutions can be considered conservative, yielding a level of 96% sequence similarity between mouse and human CENP-B. These data demonstrate that CENP-B is a highly conserved mammalian gene, both in coding and noncoding sequence domains.

Fig. 7A-C. Comparison of human and mouse CENP-B DNA and protein sequences. DNA sequences were analyzed using a diagonal dot-matrix method. In panel A, the 300 bp flanking the initiator codons were compared; in panel B, the 3' flanking sequences distal to the stop codon were compared. In both panels, the vertical sequence is human CENP-B and the horizontal sequence is mouse CENP-B. Panel C shows the alignment of the predicted protein sequences. Note the gaps introduced in the acidic domain between residues 405-460

The majority of the differences between the two proteins are found in a segment of high predicted flexibility near the amino-terminus or within the acidic domains near the carboxyl-terminus, reinforcing the view of CENP-B as a multidomain protein. Based on the alignment shown in Fig. 7, secondary structure prediction and analysis of the primary structure with several physical indexes, CENP-B can be divided into four domains which may correspond to physical domains of the folded protein (Fig. 8). The amino-terminal domain I spans residues 1-140 and consists of a relatively basic domain of four predicted alpha helices interspersed with short turns and a predicted beta sheet at residues 60-65. This is separated from the central domain III by an extended segment of high predicted flexibility and turn content interspersed with short predicted beta sheets spanning residues 141-218 (domain II in Fig. 8). This region contains a proline rich sequence identified by Joly et al. (1989) as similar to the protease sensitive hinge region of MAP2. The central domain comprises residues 219-390 and contains a series of alternating beta sheets and alpha helices, bounded by a long hydrophobic helix at the carboxyl-end of the domain. This is separated

by a short flexible segment from the highly acidic carboxyl-terminal domain IV spanning residues 404-599.

A search of GENBANK/EMBL and NBRF databases revealed no extensive homology with other known proteins outside of the highly acidic segments of the carboxyl-terminus. Highly acidic amino acid segments are characteristic of a number of nuclear proteins and have been proposed to play a role in chromatin binding (Earnshaw 1987). Limited homology of the flexible domain II to elastin (Bressan et al. 1987) and the MAP2 hinge cited above was confined to a pattern of proline and small side chain amino acids but underscored the predicted flexibility of this segment. A more directed search for homology with known tubulin binding proteins failed to reveal regions of significant similarity. A sequence identical to the MAP binding motif proposed from analysis of tubulin peptides, EGEE, occurs several times in the acidic domain (Paschal et al. 1989; Brinkley 1990). This sequence is detected frequently in many unrelated proteins and its significance for non-tubulin proteins is not clear.

Direct sequence comparison with several classes of DNA binding proteins revealed a significant similarity with a basic amino acid motif that has been implicated in DNA binding of the HLH family of transcriptional regulators (Lassar et al. 1989; Voronova and Baltimore 1990; Murre et al. 1989b). A total of 7/7 of the conserved residues in this motif, as well as the overall length and spacing of the HLH DNA binding domain, are conserved in CENP-B (Fig. 8C). CENP-B possesses two predicted alpha helices flanking this site separated by an unstructured "loop" region, but these are not highly similar in amino acid sequence or hydrophobic moment characteristics with the helices of the HLH family. CENP-B shares only limited similarity with the amino-terminal portion of HLH helix 2.

Fig. 8A-C. Structure analysis of CENP-B and alignment with helix-loop-helix (HLH) protein domains. A Secondary structure analysis of mouse CENP-B. Garnier predictions for alpha helix (solid line) and beta sheet (dotted line) structures are shown at the top of the panel with Chou and Fasman beta turn potential (dashed line) plotted directly. The solid curve below shows the Kyte-Doolittle hydrophilicity calculated over a window of 9 residues and then smoothed by averaging over a window of 5 residues. B Proposed domain structure of CENP-B. Domains are represented by boxes (I, III and IV) or curved lines (II), spanning segments of the sequence as discussed in the text. The basic segment homologous to the HLH DNA binding element in domain I is hatched and the acidic stretches in domain IV are shaded. C Alignment of CENP-B with HLH proteins. Alignments of HLH proteins were made after Benezra et al. (1990) and Cai and Davis (1990), to optimize homology in the HLH domain. The loop region between helices 1 and 2 is not shown. Amino acid identities and residues conserved among most HLH proteins are shaded. Primary sources for sequences were: N-myc, Kohl et al. (1986); L-myc, DePinho et al. (1987); E12 and E47, Murre et al. (1989a); daughterless, Caudy et al. (1988); twist, Thisse et al. (1988); AS-C/T4 and AS-C/T5, Villares and Cabrera (1987); CBF1, Cai and Davis (1990)
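The hydrophilicity curve described in the legend to Fig. 8A (a 9-residue window followed by smoothing over 5 residues) can be sketched as follows. This is a minimal illustration and not the program used in this work: the function names `sliding_mean` and `hydrophilicity_profile` are hypothetical, hydrophilicity is assumed here to be the negated Kyte-Doolittle hydropathy index, and the example sequence is an arbitrary toy peptide rather than CENP-B.

```python
# Kyte-Doolittle hydropathy index (positive values are hydrophobic).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_mean(values, window):
    """Mean of every full window; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophilicity_profile(sequence, window=9, smooth=5):
    """Windowed hydrophilicity followed by a second averaging pass.

    Assumption: hydrophilicity is the negated hydropathy index.
    Residues outside the 20 standard amino acids raise KeyError.
    """
    raw = [-KD_HYDROPATHY[residue] for residue in sequence.upper()]
    return sliding_mean(sliding_mean(raw, window), smooth)


# Toy peptide (hypothetical): an acidic stretch followed by a hydrophobic one.
toy = "EEEGEEEDEEEEGEELLVVIIALLVVIIAL"
profile = hydrophilicity_profile(toy)
```

With these defaults a sequence of N residues yields N - 12 points, so the curve is undefined for the first and last few residues; a plot would normally place each point at the center of its window.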

Discussion

In this work, we have identified and characterized the genes encoding the centromere associated protein CENP-B in two mammals, human and mouse. Mammalian CENP-B is encoded by an intronless ORF that specifies a protein of 599 amino acids sharing 96% sequence similarity between the mouse and the human. This high degree of homology is also reflected in the 5' flanking and 3' untranslated regions of the mRNA detected in both species. CENP-B is thus a highly conserved mammalian gene and is not restricted to primates. We have also detected homologous sequences in the chicken genome, suggesting that CENP-B is present throughout the vertebrates (data not shown).

Previously, CENP-B has been unambiguously identified only in human chromosomes, based on immunoblot analysis and cDNA cloning. Most human anti-centromere sera recognize CENP-A (17 kDa) and CENP-B (80 kDa) in blots of human cell extracts or chromosomes while some recognize a third protein, CENP-C (140 kDa) (Earnshaw and Rothfield 1985). These sera appear to contain both unique and cross reactive antibodies, complicating the task of identifying homologous antigens in other species. Although some CREST sera recognize centromeres in diverse phyla, the apparent molecular masses of antigens detected in different species are not conserved (Mole-Bajer et al. 1990; Kingwell and Rattner 1987; Earnshaw and Rothfield 1985). In rodent species, a 50 kDa antigen has been detected with CREST sera and proposed to be the mouse CENP-B homolog by virtue of reactivity with affinity purified human anti-CENP-B antibodies (Earnshaw and Rothfield 1985), while other workers have detected an 80 kDa antigen in mouse and hamster cells (Balczon and Brinkley 1987; Simmerly et al. 1990). Based on the evidence presented here, it is clear that an 80 kDa nuclear protein homologous to human CENP-B is present in mouse cells.
Monoclonal antibodies raised against a fusion protein containing the carboxyl-terminal 25% of human CENP-B do not cross react with the mouse gene product and appear to be specific for human CENP-B, consistent with the sequence divergence seen in this segment of the protein (Fig. 7C). However, a polyclonal rabbit antiserum raised against the CENP-B fusion protein reacts weakly with the centromeres of mouse chromosomes by immunofluorescence, indicating that CENP-B is indeed localized at centromeres in the mouse (J.B. Rattner and A. Wong, personal communication). Thus, CENP-B represents a highly conserved centromere associated protein in vertebrates. It is not clear whether the 50 kDa nuclear protein detected in Fig. 6C represents the 50 kDa CENP-D protein described by Kingwell and Rattner (1987) or whether it may be related to CENP-B. However, it is unlikely to be a product of a closely related gene, since CENP-B

appears to be a single copy gene in both the human and the mouse (Figs. 1 and 4).

The structure of the CENP-B gene is unusual in that it is both intronless and very highly conserved at the nucleic acid level. The property of being intronless is shared by a number of genes with no apparent biological or regulatory similarity and the biological significance of being intronless is unclear. Evolutionary conservation of untranslated sequences is the exception rather than the rule and may be correlated with post-transcriptional mechanisms of gene regulation (e.g. Müllner and Kühn 1988; Caput et al. 1986). Analysis of the highly conserved 5' and 3' untranslated region or subsequences from these regions did not reveal any significant homologies with other genes or with known regulatory motifs. While the high degree of conservation of the untranslated regions is suggestive of a conserved regulatory mechanism(s) for CENP-B that acts at the level of RNA, a biological role for these sequences remains to be determined.

In human chromosomes, CENP-B is associated with centromeric heterochromatin underlying the kinetochore domain of the chromosome (Cooke et al. 1990) and interacts with a subelement of human alphoid satellite DNA, a 17 bp segment termed the CENP-B box (Masumoto et al. 1989). Inspection of the complete sequence of CENP-B reported here reveals a small amino acid segment at the amino-terminus (residues 3-15) that has homology to the DNA binding motif of the HLH family of proteins (Fig. 8C). Preliminary experiments indicate that domain I indeed possesses a sequence specific alphoid DNA binding activity (K.F.S., unpublished data). Recently, a yeast centromere-binding protein, CBF1, that recognizes CDE-I of the S. cerevisiae centromere has been characterized and the corresponding gene has been cloned (Cai and Davis 1990; Baker and Masison 1990).
CBF1 is a member of the HLH family of proteins and has properties consistent with a role in gene regulation as well as chromosome segregation, since deletion of CBF1 results in methionine auxotrophy. CENP-B shows only limited homology to the helical portions of the HLH protein family, restricted to the amino-terminal portion of helix 2. Functional dissection of E47 (Voronova and Baltimore 1990) and MyoD (Davis et al. 1990) indicates that the putative helical segments of HLH proteins are involved in protein-protein interactions that lead to protein dimer and heterodimer formation. It is possible that CENP-B represents a distant member of this family with altered protein-protein interactions. In preliminary experiments dimer formation has not been detected in CENP-B samples prepared by in vitro translation (K.F.S., unpublished observations).

Two other segments of the protein exhibit similarity with other known proteins: the extended region between residues 140-200 and the highly acidic segment found near the carboxyl-terminus. Joly et al. (1989) have reported a similarity between CENP-B and the sequence of a protease sensitive hinge domain of MAP2 and a search of the protein database identified similarity of the same segment with elastin (Bressan et al. 1987). In both proteins, the similarity is restricted to a pattern

of proline residues, PXXPXXPXXP, and not the sequence per se, a conclusion strengthened by divergence between mouse and human CENP-B in this region, which alters the pattern of proline residues while the region remains proline-rich. This segment lies at the amino-terminal end of a 60 residue segment of high predicted flexibility and low secondary structure potential and appears to define an extended flexible domain (Fig. 8; see also Pluta et al. 1990).

Brinkley (1990) has discussed a similarity found between CENP-B and tubulin in the acidic domain of CENP-B consisting of four repeats of the sequence EGEE. This sequence is found in the acidic carboxyl-termini of both alpha and beta tubulins and has been proposed to mediate the interaction of tubulin with microtubule associated proteins (Paschal et al. 1989). A scan of the NBRF protein sequence databases revealed a total of 210 occurrences of this sequence in over 90 proteins and protein families, suggesting that it may be distributed widely among apparently unrelated proteins. The significance of this sequence for proteins other than tubulin remains to be established, but it is possible that it represents an element for interaction of microtubule-associated proteins with non-tubulin structures.

The demonstration that CENP-B is highly conserved in vertebrates strongly suggests that it is associated with some conserved element of centromere structure. The apparent association of CENP-B with alphoid DNA suggests that CENP-B may serve a scaffolding function for the kinetochore, but also raises a conundrum as to how CENP-B function may be conserved. CENP-B abundance varies on different human chromosomes (Earnshaw et al.
1987) and analysis of dicentric chromosomes and partial centromere deletions indicates that the presence or abundance of CENP-B is not directly correlated with centromere function (Earnshaw et al. 1989; Pluta et al. 1990). Repetitive DNA at the centromere is highly variable in evolution, even among very closely related species (e.g. Wong et al. 1990). A sequence sharing 94% (16/17) identity with the alphoid CENP-B box sequence has been identified in the minor satellite DNA sequence of Mus musculus (Masumoto et al. 1989; Wong and Rattner 1988), but this sequence is not detectable in other mouse species or ancestral primate species (A. Wong and J.B. Rattner, personal communication). Variations in sequence of the putative targets for CENP-B binding may be compensated for by the small regions of protein sequence divergence in CENP-B by a co-evolutionary mechanism. Investigation of the distribution and function of CENP-B in chromosomes of different organisms will shed light on how centromere structure and function are maintained in the absence of centromere DNA sequence conservation.
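The database scan for EGEE discussed above amounts to counting motif occurrences across a collection of protein sequences. A minimal sketch follows, under stated assumptions: the function names `motif_positions` and `scan_database` and the toy sequences are hypothetical (they are not NBRF entries), and overlapping matches are counted, a choice the text does not specify.

```python
import re


def motif_positions(sequence, motif="EGEE"):
    """0-based start positions of every match, overlapping ones included."""
    # A zero-width lookahead lets the regex report overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def scan_database(sequences, motif="EGEE"):
    """Return (total occurrences, number of sequences with at least one hit)."""
    total = 0
    sequences_with_hit = 0
    for sequence in sequences.values():
        count = len(motif_positions(sequence, motif))
        total += count
        if count:
            sequences_with_hit += 1
    return total, sequences_with_hit


# Hypothetical toy entries used only to exercise the functions.
toy_db = {
    "toy_acidic": "EEEGEEEGEGEEEEE",
    "toy_basic": "KRRQLTFREK",
    "toy_tail": "DEGEEY",
}
total, proteins = scan_database(toy_db)
```

Applied to a real protein database, the two returned numbers correspond to the kind of occurrence and protein counts quoted above; a four-residue motif built from common acidic residues is expected to turn up by chance in many unrelated sequences.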

Acknowledgements. It is a pleasure to thank Bessie Huang for her generosity in sharing laboratory space and equipment and to acknowledge the technical assistance of Barb Grant and Mark Stapleton. The W.M. Keck Core DNA facility was supported by grants from the W.M. Keck Foundation, the N.S.F. and the Sam and Rose Stein Charitable fund. This work was supported by an Arthritis Foundation Investigator Award and an N.I.H. grant (GM36098) to K.F.S.

References

Baker RE, Masison DC (1990) Isolation of the gene encoding the Saccharomyces cerevisiae centromere-binding protein CP1. Mol Cell Biol 10:2458-2467
Balczon RD, Brinkley BR (1987) Tubulin interaction with kinetochore proteins: analysis by in vitro assembly and chemical cross-linking. J Cell Biol 105:855-862
Benezra R, Davis RL, Lockshon D, Turner DL, Weintraub H (1990) The protein Id: a negative regulator of helix-loop-helix DNA binding proteins. Cell 61:49-59
Bloom K, Carbon J (1982) Yeast centromere DNA is in a unique and highly ordered structure in chromosomes and small circular minichromosomes. Cell 29:305-317
Bloom K, Yeh E (1989) Centromeres and telomeres: structural elements of eukaryotic chromosomes. Curr Opin Cell Biol 1:526-532
Bressan GM, Argos P, Stanley KK (1987) Repeating structure of chick tropoelastin revealed by complementary DNA cloning. Biochemistry 26:1497-1503
Brinkley BR (1990) Centromeres and kinetochores: integrated domains on eukaryotic chromosomes. Curr Opin Cell Biol 2:446-452
Cai M, Davis RW (1990) Yeast centromere binding protein CBF1, of the helix-loop-helix protein family, is required for chromosome stability and methionine prototrophy. Cell 61:437-446
Caput D, Beutler B, Hartog K, Thayer R, Brown-Shimer S, Cerami A (1986) Identification of a common nucleotide sequence in the 3'-untranslated region of mRNA molecules specifying inflammatory mediators. Proc Natl Acad Sci USA 83:1670-1674
Caudy M, Vassin H, Brand M, Tuma R, Jan LY, Jan YN (1988) daughterless, a Drosophila gene essential for both neurogenesis and sex determination, has sequence similarities to myc and the achaete-scute complex. Cell 55:1061-1067
Chen EY, Seeburg PH (1985) Supercoil sequencing: a fast and simple method for sequencing plasmid DNA. DNA 4:165-170
Chomczynski P, Sacchi N (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156-159
Clarke L (1990) Centromeres of budding and fission yeasts. Trends Genet 6:150-154
Clarke L, Carbon J (1980) Isolation of a yeast centromere and construction of small circular chromosomes. Nature 287:504-509
Cooke CA, Bernat RL, Earnshaw WC (1990) CENP-B: a major human centromere protein located beneath the kinetochore. J Cell Biol 110:1475-1488
Davis RL, Cheng P-F, Lassar AB, Weintraub H (1990) The MyoD DNA binding domain contains a recognition code for muscle-specific gene activation. Cell 60:733-746
DePinho RA, Hatton KS, Tesfaye A, Yancopoulos GD, Alt FW (1987) The human myc gene family: structure and activity of L-myc and an L-myc pseudogene. Genes Dev 1:773-786
Devereux J, Haeberli P, Smithies O (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12:387-395
Earnshaw WC (1987) Anionic regions in nuclear proteins. J Cell Biol 105:1479-1482
Earnshaw WC, Rothfield N (1985) Identification of a family of human centromere proteins using autoimmune sera from patients with scleroderma. Chromosoma 91:313-321
Earnshaw WC, Halligan N, Cooke C, Rothfield N (1984) The kinetochore is part of the metaphase chromosome scaffold. J Cell Biol 98:352-357
Earnshaw WC, Sullivan KF, Machlin PS, Cooke CA, Kaiser DA, Pollard TD, Rothfield NF, Cleveland DW (1987) Molecular cloning of a cDNA for CENP-B, the major human centromere autoantigen. J Cell Biol 104:817-829
Earnshaw WC, Ratrie H, Stetten G (1989) Visualization of centromere proteins CENP-B and CENP-C on a stable dicentric chromosome in cytological spreads. Chromosoma 98:1-12
Feinberg A, Vogelstein B (1984) A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal Biochem 132:6-13
Harlow E, Lane D (1988) Antibodies. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Hyman AA, Mitchison TJ (1990) Modulation of microtubule stability by kinetochores in vitro. J Cell Biol 110:1607-1616
Joly JC, Flynn G, Purich DL (1989) The microtubule-binding fragment of microtubule-associated protein-2: location of the protease-accessible site and identification of an assembly-promoting peptide. J Cell Biol 109:2289-2294
Kingwell B, Rattner JB (1987) Mammalian kinetochore/centromere composition: a 50 kD antigen is present within the mammalian kinetochore/centromere. Chromosoma 95:403-407
Kohl NE, Legouy E, DePinho RA, Nisen PD, Smith RK, Gee CE, Alt FW (1986) Human N-myc is closely related in organization and nucleotide sequence to c-myc. Nature 319:73-77
Kristensen T, Voss H, Ansorge W (1987) A simple and rapid preparation of M13 sequencing templates for manual and automated dideoxy sequencing. Nucleic Acids Res 15:5507-5516
Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680-685
Lassar AB, Buskin JN, Lockshon D, Davis RL, Apone S, Hauschka SD, Weintraub H (1989) MyoD is a sequence specific DNA binding protein requiring a region of myc homology to bind to the muscle creatine kinase enhancer. Cell 58:823-831
Maniatis T, Fritsch EF, Sambrook J (1982) Molecular cloning. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Manuelidis L (1978) Chromosomal localization of complex and simple repeated human DNAs. Chromosoma 66:23-32
Masumoto H, Masukata H, Muro Y, Nozaki N, Okazaki T (1989) A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric satellite. J Cell Biol 109:1963-1973
Mole-Bajer J, Bajer AS, Zinkowski RP, Balczon RD, Brinkley BR (1990) Autoantibodies from a patient with scleroderma CREST recognized kinetochores of the higher plant Haemanthus. Proc Natl Acad Sci USA 87:3599-3603
Moroi Y, Peebles C, Fritzler MJ, Steigerwald J, Tan EM (1980) Autoantibody to centromere (kinetochore) in scleroderma sera. Proc Natl Acad Sci USA 77:1627-1631
Müllner EW, Kühn LC (1988) A stem-loop in the 3' untranslated region mediates iron-dependent regulation of transferrin receptor mRNA stability in the cytoplasm. Cell 53:815-825
Murre C, McCaw PS, Baltimore D (1989a) A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD, and myc proteins. Cell 56:777-783
Murre C, McCaw PS, Vaessin H, Caudy M, Jan LY, Jan YN, Cabrera C, Buskin DC, Lassar AB, Weintraub H, Baltimore D (1989b) Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58:537-544
Palmer DK, Margolis RL (1987) Kinetochore components recognized by human autoantibodies are present on mononucleosomes. J Cell Biol 104:805-815
Pardue ML, Gall JG (1970) Chromosomal localization of mouse satellite DNA. Science 168:1356-1358
Paschal BM, Obar RA, Vallee RB (1989) Interaction of brain cytoplasmic dynein and MAP2 with a common sequence at the C terminus of tubulin. Nature 342:569-572
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444-2448
Pfarr CM, Coue M, Grissom PM, Hays TS, Porter ME, McIntosh JR (1990) Cytoplasmic dynein is localized to kinetochores during mitosis. Nature 345:263-265
Pluta AF, Cooke CA, Earnshaw WC (1990) Structure of the human centromere at metaphase. Trends Biochem Sci 15:181-185
Polizzi C, Clarke L (1991) The chromatin structure of centromeres from fission yeast: differentiation of the central core that correlates with function. J Cell Biol 112:191-202
Rattner JB, Bazett-Jones DP (1989) Kinetochore structure: electron spectroscopic imaging of the kinetochore. J Cell Biol 108:1209-1218
Ris H, Witt PL (1981) Structure of the mammalian kinetochore. Chromosoma 82:153-170
Simmerly C, Balczon R, Brinkley BR, Schatten G (1990) Microinjected kinetochore antibodies interfere with chromosome movement in meiotic and mitotic mouse oocytes. J Cell Biol 111:1491-1504
Singer MF (1982) Highly repeated sequence in mammalian genomes. Int Rev Cytol 76:67-112
Steuer ER, Wordeman L, Schroer TA, Sheetz MP (1990) Localization of cytoplasmic dynein to mitotic spindles and kinetochores. Nature 345:266-268
Sullivan KF, Machlin PS, Ratrie H, Cleveland DW (1986) Sequence and expression of the chicken beta 3 tubulin gene. A vertebrate testis beta-tubulin isotype. J Biol Chem 261:13317-13322
Thisse B, Stoetzel C, Gorostiza-Thisse C, Perrin-Schmitt F (1988) Sequence of the twist gene and nuclear localization of its protein in endomesodermal cells of early Drosophila embryos. EMBO J 7:2175-2183
Tyler-Smith C, Brown WRA (1987) Structure of the major block of alphoid satellite DNA on the human Y chromosome. J Mol Biol 195:457-470
Valdivia MM, Brinkley BR (1985) Fractionation and initial characterization of the kinetochore from mammalian metaphase chromosome. J Cell Biol 101:1124-1134
Villares R, Cabrera CV (1987) The achaete-scute complex of D. melanogaster: conserved domains in a subset of genes required for neurogenesis and their homology to myc. Cell 50:415-424
Voronova A, Baltimore D (1990) Mutations that disrupt DNA binding and dimer formation in the E47 helix-loop-helix protein map to distinct domains. Proc Natl Acad Sci USA 87:4722-4726
Willard HF, Wevrick R, Warburton PE (1989) Human centromere structure: organization and potential role of alpha satellite DNA. Prog Clin Biol Res 318:9-18
Wong AKC, Rattner JB (1988) Sequence organization and cytological localization of the minor satellite of mouse. Nucleic Acids Res 16:11645-11661
Wong AKC, Biddle FG, Rattner JB (1990) The chromosomal distribution of the major and minor satellite is not conserved in the genus Mus. Chromosoma 99:190-195
