GENOMICS 13, 862-865 (1992)

SHORT COMMUNICATION

Sequence of the Human Factor VIII-Associated Gene Is Conserved in Mouse

BARBARA LEVINSON, JOHN R. BERMINGHAM, JR.,1 AIDA METZENBERG, SUSAN KENWRICK,2 VERNE CHAPMAN,* AND JANE GITSCHIER

Howard Hughes Medical Institute and Department of Medicine, University of California, San Francisco, California 94143; and *Roswell Park Cancer Institute, Buffalo, New York 14263

Received December 19, 1991; revised March 16, 1992

cDNA and genomic clones corresponding to the human factor VIII-associated gene (F8A) were isolated from mouse cDNA and F8A-enriched genomic libraries. The sequences of these clones revealed an intronless gene coding for 380 amino acids, with 85% identity to the predicted human sequence. The single murine gene copy is genetically linked to factor VIII, but appears to lie outside the factor VIII gene by physical mapping. Like the human gene, the mouse F8A gene is highly expressed in a wide variety of tissues. This evolutionary comparison has helped to clarify the derived amino acid sequence in the human and strongly supports the hypothesis that the F8A gene encodes a protein. © 1992 Academic Press, Inc.

Previously, we reported a transcribed gene contained within the human gene coding for blood coagulation factor VIII (7). This nested gene, which we refer to here as factor VIII-associated gene A (F8A), was discovered in conjunction with a CpG island located within intron 22. The function of this gene is unknown, but it is expressed abundantly in a variety of cell types. It is transcribed in the opposite direction from factor VIII and contains no intervening sequences. Two additional copies of F8A are located 5' to the factor VIII gene (6). The sequence has several possible coding regions in all three frames. Computer prediction of reading frame, by the method of Staden and McLachlan (10), indicated that a single frameshift at codon 230 would result in a composite open reading frame with the highest coding potential; the alternative reading frames showed low probability of coding. This adjusted sequence would yield a 365-amino-acid protein with no significant similarity to known proteins.

This report presents the isolation and sequence of cDNA and genomic clones for mouse F8A. Hybridization of Southern and Northern blots with the human F8A probe revealed an expressed murine homolog (7). In contrast to the arrangement of F8A genes in humans, there appears to be only a single copy of the gene in mice. As this feature makes it amenable to genetic manipulation, we have chosen to examine the corresponding gene in mice as a prelude to future functional studies.

Both genomic and cDNA clones for mouse F8A were isolated. The genomic DNA was cloned by preparing a library enriched for the 15-kb XbaI fragment that contains F8A. DNA from a female CBA mouse was digested with XbaI and fractionated on a 0.8% agarose gel, and the 15-kb region was excised and extracted using GeneClean (Bio 101). A library was generated in the λDash vector (Stratagene) and grown in the recombination-deficient host DL538. Twenty-six positive plaques were detected from 240,000 recombinant phage screened. cDNA clones were isolated from a mouse liver library purchased from Stratagene. F8A-containing clones from both libraries were detected by hybridization to a 900-bp EcoRI/SstI human fragment (probe A (7)). For the genomic library, the filters were hybridized in 6X SSC at 65°C and washed in 0.2X SSC, 0.1% SDS at 65°C, while filters from the cDNA library were hybridized in 30% formamide, 5X SSC, 10% dextran sulfate at 42°C and washed in 2X SSC, 0.5% SDS at 55°C. Subclones from the two libraries were prepared and sequenced.

Figure 1 presents the murine sequence and compares its predicted amino acid sequence with that of the human. Like the human gene, the murine gene is intronless. The mouse mRNA appears to be a few hundred bases shorter than the human mRNA by Northern analysis (7), a finding that is consistent with the 200-bp shorter 3' untranslated region. No attempt has been made to determine the site of transcription initiation in the mouse. The derived amino acid sequence from the longest murine open reading frame, initiated by a methionine codon, extends for 380 codons. Testcode computer analysis (4) revealed a region of high coding potential throughout the open reading frame. Computer-aided comparison indicated that the maximum homology between the mouse- and the human-derived amino acid sequences (85% identity) occurs if the frameshift in the human sequence is placed at codon 257 rather than at codon 230 as we previously reported (7). On the basis of this prediction, we reinvestigated the human sequence in this region and observed an additional cytosine at nucleotide position 767. The homology based on this revised human sequence is shown in Fig. 1. Thus the mouse F8A sequence allowed us to reevaluate the human DNA sequence, which had proven difficult to determine definitively due to its GC-richness and the absence of confirmatory amino acid sequence.

The deduced murine protein is 15 amino acids longer than the human protein, including two internal sets of 5 extra amino acids and 5 extra amino acids at the carboxy terminus. The extensive similarity of human and murine amino acid sequences between residues 1 and 221 and from residue 259 to the end of the open reading frame, joined by a divergent linker region, suggests that the putative F8A protein may be composed of two domains.

The murine gene is GC-rich like its human counterpart and includes sites for the rare-cutting restriction enzymes BssHII, EagI, NaeI, NarI, and NotI. These findings suggest an association with a CpG island, although we have not determined whether these restriction sites are unmethylated. The overall G+C content of the sequence shown in Fig. 1 is 60%, significantly lower than that of the human gene. The murine sequence corresponding to the open reading frame is 66% G+C, whereas the human is 74%. This difference is accounted for primarily by changes at the third position of codons. The 295-bp sequence 5' of the human open reading frame is 18 percentage points higher in GC content than the corresponding murine region (75% vs 57%). Although we are not sure whether these differences are significant, it will be interesting to see whether they persist as more human and murine GC-rich genes are sequenced and compared. We note that in two other X-linked, CpG island-associated genes, the corresponding regions 5' of the initiation codons are also more GC-rich in the human than in the mouse (10% for hypoxanthine-guanine phosphoribosyltransferase and 4% for phosphoglycerate kinase).

The mouse F8A gene, like its human homolog, is ubiquitously expressed, as shown in Fig. 2. A 1.7-kb F8A

The mouse F8A sequence data have been deposited with GenBank under Accession No. M83118.
1 Present address: School of Medicine, University of California, San Diego, CA 92093.
2 Present address: Department of Medicine, University of Cambridge, Cambridge, CB2 2QQ UK.
0888-7543/92 $5.00. Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

[Figure 1. Panel A: nucleotide sequence of the mouse F8A gene with the predicted amino acid sequence beneath the open reading frame. Panel B: alignment of the predicted mouse and human F8A amino acid sequences.]

FIG. 1. Sequence of the mouse F8A gene and comparison of its predicted amino acid sequence with that of human F8A. The nucleotide sequence of the murine F8A gene is shown in A. The sequence is a composite sequence derived from genomic DNA clones (nt 1-1440) and cDNA clones (nt 711-1766). The sequences in the region of overlap were found to be identical. The transcription start site was not determined. A polyadenylation signal (AATAAA) is present at nt 1748 and cDNA clones were polyadenylated at position 1766. The longest open reading frame predicts a polypeptide of 380 amino acids. In B, the predicted mouse and human proteins are aligned by the GAP comparison program (3). The mouse sequence shares 85% amino acid identity with the human sequence, revised from Levinson et al. (7) as described in the text.
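The 85% value reported for the mouse and human F8A proteins is an identity score over aligned positions. The following is a minimal sketch of how such a score can be computed from two sequences that have already been aligned. The function name `percent_identity`, the decision to skip columns containing a gap, and the toy sequences are assumptions made for illustration; the GAP program may count gapped positions differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap_chars: str = "-.") -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # assumption: gap columns are excluded from the score
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not the F8A sequences.
    toy_mouse = "MKTAYIAK-QR"
    toy_human = "MKTAHIAKGQR"
    print(f"{percent_identity(toy_mouse, toy_human):.1f}% identity")
```

Excluding gap columns makes the score insensitive to insertions such as the extra residues in the mouse protein; counting gaps as mismatches instead would lower the reported value.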

[Figure 2. Northern blot autoradiogram. A band is marked at 1.7 kb, and the positions of the 28S and 18S ribosomal RNAs are indicated.]

FIG. 2. Northern blot of RNA from mouse tissues probed with murine F8A. Poly(A)+ RNA was isolated from a variety of mouse tissues using the Invitrogen FastTrack kit. Two micrograms was loaded on a 1% agarose gel, blotted, and hybridized as described previously (7). The probe was a murine 769-bp F8A cDNA (nt 711-1480 in Fig. 1). The autoradiogram was exposed for 4 h and developed briefly to avoid overexposure.
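The G+C comparisons reported in the text (overall content, and the observation that the mouse-human difference arises mainly at third codon positions) can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names `gc_percent` and `third_position_gc_percent` and the toy open reading frame are hypothetical, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
def gc_percent(seq: str) -> float:
    """Percent of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def third_position_gc_percent(orf: str) -> float:
    """Percent G+C at the third position of each codon of an in-frame ORF."""
    if len(orf) < 3 or len(orf) % 3 != 0:
        raise ValueError("ORF length must be a non-zero multiple of 3")
    third_positions = orf[2::3]  # bases at indices 2, 5, 8, ...
    return gc_percent(third_positions)


if __name__ == "__main__":
    toy_orf = "ATGGCGGCCTGA"  # hypothetical ORF: ATG GCG GCC TGA
    print(f"overall G+C: {gc_percent(toy_orf):.1f}%")
    print(f"third-position G+C: {third_position_gc_percent(toy_orf):.1f}%")
```

Comparing the two values for orthologous open reading frames shows whether a difference in overall G+C is concentrated at third codon positions, where many substitutions do not change the encoded amino acid.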

TABLE. Sizes of Hybridizing Fragments from Pulsed-Field

Enzymes: BssHII, EagI, NaeI, NarI, NotI, SacII, SfiI

H:

250' !640,670/
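Locating sites for the rare-cutting enzymes named in the text and in the table is a simple pattern search over the sequence. The sketch below assumes the commonly catalogued recognition sequences for these enzymes, which are not given in the source; the dictionary `RARE_CUTTER_SITES`, the function `find_sites`, and the toy sequence are hypothetical names introduced for illustration. All seven sites are palindromic, so scanning one strand is sufficient.

```python
import re

# Assumed recognition sequences (not stated in the source); N = any base.
RARE_CUTTER_SITES = {
    "BssHII": "GCGCGC",
    "EagI": "CGGCCG",
    "NaeI": "GCCGGC",
    "NarI": "GGCGCC",
    "NotI": "GCGGCCGC",
    "SacII": "CCGCGG",
    "SfiI": "GGCCNNNNNGGCC",
}


def find_sites(seq: str, sites: dict = RARE_CUTTER_SITES) -> dict:
    """Return 1-based start positions of each recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        pattern = site.replace("N", "[ACGT]")
        # A lookahead reports overlapping occurrences as separate hits.
        hits[enzyme] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


if __name__ == "__main__":
    toy_seq = "AAGCGGCCGCTT"  # hypothetical sequence containing a NotI site
    for enzyme, positions in find_sites(toy_seq).items():
        print(enzyme, positions)
```

A NotI site always contains an EagI site, so the toy sequence yields a hit for both enzymes. Such a search only shows that a recognition sequence is present; whether the site is actually cut depends on its methylation state, which the authors note was not determined.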
