SHORT COMMUNICATION

Sequence of the Human Factor VIII-Associated Gene Is Conserved in Mouse

BARBARA LEVINSON, JOHN R. BERMINGHAM, JR.,1 AIDA METZENBERG, SUSAN KENWRICK,2 VERNE CHAPMAN,* AND JANE GITSCHIER

Howard Hughes Medical Institute and Department of Medicine, University of California, San Francisco, California 94143; and *Roswell Park Cancer Institute, Buffalo, New York 14263

Received December 19, 1991; revised March 16, 1992

GENOMICS 13, 862-865 (1992)

The mouse F8A sequence data have been deposited with GenBank under Accession No. M83118.
1 Present address: School of Medicine, University of California, San Diego, CA 92093.
2 Present address: Department of Medicine, University of Cambridge, Cambridge, CB2 2QQ UK.

0888-7543/92 $5.00. Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

cDNA and genomic clones corresponding to the human factor VIII-associated gene (F8A) were isolated from mouse cDNA and F8A-enriched genomic libraries. The sequences of these clones revealed an intronless gene coding for 380 amino acids, with 85% identity to the predicted human sequence. The single murine gene copy is genetically linked to factor VIII, but appears to lie outside the factor VIII gene by physical mapping. Like the human gene, the mouse F8A gene is highly expressed in a wide variety of tissues. This evolutionary comparison has helped to clarify the derived amino acid sequence in the human and strongly supports the hypothesis that the F8A gene encodes a protein. © 1992 Academic Press, Inc.

Previously, we reported a transcribed gene contained within the human gene coding for blood coagulation factor VIII (7). This nested gene, which we refer to here as factor VIII-associated gene A (F8A), was discovered in conjunction with a CpG island located within intron 22. The function of this gene is unknown, but it is expressed abundantly in a variety of cell types. It is transcribed in the opposite direction from factor VIII and contains no intervening sequences. Two additional copies of F8A are located 5′ to the factor VIII gene (6). The sequence has several possible coding regions in all three frames. Computer prediction of reading frame, by the method of Staden and McLachlan (10), indicated that a single frameshift at codon 230 would result in a composite open reading frame with the highest coding potential; the alternative reading frames showed low probability of coding. This adjusted sequence would yield a 365-amino-acid protein with no significant similarity to known proteins.

This report presents the isolation and sequence of cDNA and genomic clones for mouse F8A. Hybridization of Southern and Northern blots with the human F8A probe revealed an expressed murine homolog (7). In contrast to the arrangement of F8A genes in humans, there appears to be only a single copy of the gene in mice. As this feature makes it amenable to genetic manipulation, we have chosen to examine the corresponding gene in mice as a prelude to future functional studies.

Both genomic and cDNA clones for mouse F8A were isolated. The genomic DNA was cloned by preparing a library enriched for the 15-kb XbaI fragment that contains F8A. DNA from a female CBA mouse was digested with XbaI and fractionated on a 0.8% agarose gel, and the 15-kb region was excised and extracted using GeneClean (Bio 101). A library was generated in the λDash vector (Stratagene) and grown in the recombination-deficient host DL538. Twenty-six positive plaques were detected from 240,000 recombinant phage screened. cDNA clones were isolated from a mouse liver library purchased from Stratagene. F8A-containing clones from both libraries were detected by hybridization to a 900-bp EcoRI/SstI human fragment (probe A, (7)). For the genomic library, the filters were hybridized in 6× SSC at 65°C and washed in 0.2× SSC, 0.1% SDS at 65°C, while filters from the cDNA library were hybridized in 30% formamide, 5× SSC, 10% dextran sulfate at 42°C and washed in 2× SSC, 0.5% SDS at 55°C. Subclones of the two libraries were prepared and sequenced.

Figure 1 presents the murine sequence and compares its predicted amino acid sequence with that of the human. Like the human gene, the murine gene is intronless. The mouse mRNA appears to be a few hundred bases shorter than the human mRNA by Northern analysis (7), a finding that is consistent with the 200-bp shorter 3′ untranslated region. No attempt has been made to determine the site of transcription initiation in the mouse. The derived amino acid sequence from the longest murine open reading frame, initiated by a methionine codon, extends for 380 codons. Testcode computer analysis (4) revealed a region of high coding potential throughout the open reading frame. Computer-aided comparison indicated that the maximum homology between the mouse- and the human-derived amino acid sequences (85% identity) occurs if the frameshift in the human sequence is placed at codon 257 rather than at codon 230 as we previously reported (7). On the basis of this prediction, we reinvestigated the human sequence in this
region and observed an additional cytosine at nucleotide position 767. The homology based on this revised human sequence is shown in Fig. 1. Thus the mouse F8A sequence allowed us to reevaluate the human DNA se-
A
nt 1      CCCCTCCATTTGAGCACTCAAGCCCGACCCCCGCAGATTTAATTCCGTACGGGACTGTTTGGGTAGGCACCCAGCCCTTCTTCCTTGATTGGCTTTTATA
nt 101    GTCCTCCATATCCCTCCTTCAACAGCACTTTGCACATCACTGTGGCTACAGCAGACGCTGGGCGGCCTGAGGCGGAAGCGGAGGTGATG~TTGTGATCC
nt 201    CATTGTGGGCTAGGCGCAGCGTAGCGTTAGTCATAGCTGTTCCCCCTCAGCCCTCCCTCAACCGGAAGT~GGCGGCGCGGCTAGCCGGTCAACATGGCT
aa 1      MA
nt 301    GCGGGCTCTGCGTCCTCCTTGGGCGGCGGTGCCTGGCCAGGCTCAGAGGCTGGGGACTTCTTGGCACGCTATCGGCAGGTGTCCAACAAGCTC~GAAGC
aa 3      AGSASSLGGGAWPGSEAGDFLARYRQVSNKLKKR
nt 401    GCTTCTTGAGGAAGCCGAACGTGGCCGAGGCCGGGGAGCAGTTCGCCCAGCTAGCCCGGGAGCTGCGCGCCCAGGAGTGCCTGCCTTATGCTGCCTGGTG
aa 37     FLRKPNVAEAGEQFAQLARELRAQECLPYAAWC
nt 501    CCAGCTGGCTGTGGCGCGCTGCCAGCAGGCGCTCTTCCATGGGCCCGGGG~GCGCTGGCCCTCACAGAGGCGGCCCGACTTTTCCTGCGGCAGGAGTGC
aa 70     QLAVARCQQALFHGPGEALALTEAARLFLRQEC
nt 601    GACGCGCGCCAACGCTTGGGCTGTCCCGCCGCCTACGGGGAGCCTCTGCAGGCGGCCGCCAGCCTCGGCGACGCTGTGCGCTTGCACCTCGAGCTCGGCC
aa 103    DARQRLGCPAAYGEPLQAAASLGDAVRLHLELGQ
nt 701    AGCCTGCCGCCGCTGCTGCACTGTGCCTGGAACTAGCTAGCTGCTGCCCTTCGCGCAGTGGGCCAGCCGGCCGCTGCTGCCGGTCACTTTCAGCGGGCTGCCCA
aa 137    PAAAAALCLELAAALRAVGQPAAAAGHFQRAAQ
nt 801    GCGGCACCTGCCCCTGATGCCACTGGCCGCACTGCAAGCGCTTGGTGATGCTGCCTCCTGCCAGCTGTTGGCGCGCGACTACACAGGCGCCCTGGCGCTA
aa 170    RHLPLMPLAALQALGDAASCQLLARDYTGALAL
nt 901    TTTACACGCATGCAACGCCTGGCACGGGAGCATGGGGGCCACCCGGTACAGCAACTCGAGCTGCTGCCGCAGCCGCCTTCTGGGCCCCAGCCACCCCTGT
aa 203    FTRMQRLAREHGGHPVQQLELLPQPPSGPQPPLS
nt 1001   CGGGACCCCAGCCGAGACCTGTCTTGGGCTCGACCTTGCCCCTCCCGCAGCCCCCGGACCACGCCCCAGGCTCTGTTGCGCCTTCACCTGGCACACTCGG
aa 237    GPQPRPVLGSTLPLPQPPDHAPGSVAPSPGTLG
nt 1101   TGCCTTTGCTGACGTCCTAGTCAGGTGTGAGGTGTCCCGTGTACTGCTGCTGCTGCTCCTGC~CCACCACCTGCCAAGCTGCTGCCCGAGCATGCCCAG
aa 270    AFADVLVRCEVSRVLLLLLLQPPPAKLLPEHAQ
nt 1201   ACCCTGGAGAAGTACTCCTGGGAGGCTTTCGATGGCCATGGCCAGGATACCAGCGGCCAGCTTCCTGAGGAGCTGTTTCTGTTATTGCAGTCTCTGGTCA
aa 303    TLEKYSWEAFDGHGQDTSGQLPEELFLLLQSLVM
nt 1301   TGG~TGC~~AAGAA~GGA~A~TGAAGG~AT~AAGAAG~TG~AGGTGGAGATGTGG~~A~TG~TAA~~G~TGAG~AG~~CAC~T~~TCCAC~TCGTTCT
aa 337    AAQEKDTEGIKKLQVEMWPLLTAEQNHLLHLVL
nt 1401   GCAGGAAACCATCTCTCCCTCTGGACAGGGTGTCTGATAACATCTGATCTAGC
aa 370    QETISPSGQGV*
nt 1501   TTCTGTTTTTTTTTGTCTGTTTTTGTCTTGTTTTTCTTTTCATTTCTATTGCTGCCATGTTAATTTGGCTTTGTGTGCCTAGCAAGTTACTTA~TTAGT
nt 1601
nt 1701

B
mouse 1     MAAGSASSLGGGAWPGSEAGDFLARYRQVSNKLKKRFLRKPNVAEAGEQFAQLARELRAQECLPYAAWCQLAVARCQQALFHGPGEALALTEAARLFLRQ
human 1     MAAAAAGLGGGGAGPGPEAGDFLARYRLVSNKLKKRFLRKPNVAEAGEQFGQLGRELRAQECLPYAAWCQLAVARCQQALFHGPGEALALTEAARLFLRQ

mouse 101   ECDARQRLGCPAAYGEPLQAAAS.LGDAVRLHLELGQPAAAAALCLELAAALRAVGQPAAAAGHFQRAAQRHLPLMPLAALQALGDAASCQLLARDYTGA
human 101   ERDAPA.LVCPAAYGEPLQAAASALGAAVRLHLELGQPAAAAALCLELAAALRDLGQPAAAAGHFQRAAQLQLPQLPLAALQALGEAASCQLLARDYTGA

mouse 200   LALFTRMQRLAREHGGHPVQQLELLPQPPSGPQPPLSGPQPRPVLGSTLPLPQPPDHAPGSVAPSPGTLGAFADVLVRCEVSRVLLLLLLQPPPAKLLPE
human 200   LAVFTRMQRLAREHGSHPVQSL.....PPPPPPAPQPGPGATPALPAALLPPN.....SGSAAPSPAALGAFSDVLVRCEVSRVLLLLLLQPPPAKLLPE

mouse 300   HAQTLEKYSWEAFDGHGQDTSGQLPEELFLLLQSLVMAAQEKDTEGIKKLQVEMWPLLTAEQNHLLHLVLQETISPSGQGV*
human 290   HAQTLEKYSWEAFDSHGQESSGQLPEELFLLLQSLVMATHEKDTEAIKSLQVEMWPLLTAEQNHLLHLVLQETISP*.....
FIG. 1. Sequence of the mouse F8A gene and comparison of its predicted amino acid sequence with that of human F8A. The nucleotide sequence of the murine F8A gene is shown in A. The sequence is a composite sequence derived from genomic DNA clones (nt 1-1440) and cDNA clones (nt 711-1766). The sequences in the region of overlap were found to be identical. The transcription start site was not determined. A polyadenylation signal (AATAAA) is present at nt 1748 and cDNA clones were polyadenylated at position 1766. The longest open reading frame predicts a polypeptide of 380 amino acids. In B, the predicted mouse and human proteins are aligned by the GAP comparison program (3). The mouse sequence shares 85% amino acid identity with the human sequence, revised from Levinson et al. (7) as described in the text.
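As an illustration of how a percent-identity figure such as the 85% quoted for Fig. 1B can be derived from a gapped alignment, the following is a minimal Python sketch. It is not the GAP program. The function name `percent_identity` is hypothetical, and the choice of denominator (columns in which neither sequence has a gap) is an assumption, since the text does not state which convention was used.

```python
def percent_identity(row_a: str, row_b: str, gap: str = ".") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: only columns where neither row carries a gap symbol are
    counted in the denominator. Other conventions (for example, dividing
    by the length of the shorter sequence) give slightly different values.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")

    compared = 0   # columns with a residue in both rows
    identical = 0  # columns where the two residues are the same
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # First seven alignment columns of residues 101 onward in Fig. 1B
    # (mouse on the left, human on the right).
    print(percent_identity("ECDARQR", "ERDAPA."))
```

For this short stretch, three of the six ungapped columns match, so the function returns 50.0. Applying the same function to the full aligned rows would require the complete alignment, including the gap positions.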
quence, which had proven difficult to determine definitively due to its CG-richness and the absence of confirmatory amino acid sequence. The deduced murine protein is 15 amino acids longer than the human protein, including two internal sets of 5 extra amino acids and 5 extra amino acids at the carboxy terminus. The extensive similarity of human and murine amino acid sequences between residues 1 and 221 and from residue 259 to the end of the open reading frame, joined by a divergent linker region, suggests that the putative F8A protein may be composed of two domains.

The murine gene is GC-rich like its human counterpart and includes sites for the rare-cutting restriction enzymes BssHII, EagI, NaeI, NarI, and NotI. These findings suggest an association with a CpG island, although we have not determined whether these restriction sites are unmethylated. The overall G+C content of the sequence shown in Fig. 1 is 60%, significantly lower than that of the human gene. The murine sequence corresponding to the open reading frame is 66% G+C, whereas the human is 74%. This difference is accounted for primarily by changes at the third position of codons. The 295-bp sequence 5′ of the human open reading frame is 18% higher in GC content than the corresponding murine region (75% vs 57%). Although we are not sure whether these differences are significant, it will be interesting to see whether they persist as more human and murine GC-rich genes are sequenced and compared. We note that in two other X-linked, CpG island-associated genes, the corresponding regions 5′ of the initiation codons are also more GC-rich in the human than in the mouse (10% for hypoxanthine-guanine phosphoribosyltransferase and 4% for phosphoglycerate kinase).

The mouse F8A gene, like its human homolog, is ubiquitously expressed, as shown in Fig. 2. A 1.7-kb F8A
[Fig. 2 image: Northern blot; hybridizing band marked at 1.7 kb; 28S and 18S rRNA positions indicated]
FIG. 2. Northern blot of RNA from mouse tissues probed with murine F8A. Poly(A)+ RNA was isolated from a variety of mouse tissues using the Invitrogen FastTrack kit. Two micrograms was loaded on a 1% agarose gel, blotted, and hybridized as described previously (7). The probe was a murine 769-bp F8A cDNA (nt 711-1480 in Fig. 1). The autoradiogram was exposed for 4 h and developed briefly to avoid overexposure.
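The G+C percentages discussed in the text (overall, within the open reading frame, and at the third codon position) are simple base counts. The sketch below shows one way such values could be computed; it is an illustration under stated assumptions, not the procedure used in this study. The function names `gc_percent` and `codon_position_gc` are hypothetical, and characters other than A, C, G, and T (for example, unreadable positions) are skipped rather than guessed.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T).

    Assumption: any other character is ignored, so it contributes to
    neither the numerator nor the denominator.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def codon_position_gc(orf: str) -> tuple:
    """Percent G+C at codon positions 1, 2, and 3 of an open reading frame.

    Assumption: the string starts at the first base of a codon and each
    character occupies exactly one nucleotide position, so an unreadable
    character does not shift the frame. Trailing bases that do not
    complete a codon are dropped.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return tuple(gc_percent(orf[pos:usable:3]) for pos in range(3))


if __name__ == "__main__":
    toy_orf = "ATGGCC"  # toy two-codon example, not taken from Fig. 1
    print(gc_percent(toy_orf))
    print(codon_position_gc(toy_orf))
```

Comparing the third-position value between the mouse and human open reading frames is the kind of check that would show whether the overall difference is concentrated at the third position of codons, as stated in the text.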
TABLE
Sizes of Hybridizing Fragments from Pulsed-Field
Enzyme: BssHII, EagI, NaeI, NarI, NotI, SacII, SfiI
Human
250' !640,670/