DNA Sequence - J. DNA Sequencing and Mapping, Vol. 2, pp. 397-403. Reprints available directly from the publisher. Photocopying permitted by license only.
© 1992 Harwood Academic Publishers GmbH. Printed in the United Kingdom.
SHORT COMMUNICATION
The nucleotide sequence of the human transcription factor HTF4a cDNA
YI ZHANG and MINOU BINA
Department of Chemistry, Purdue University, West Lafayette, IN 47907-1393, USA
Mitochondrial DNA Downloaded from informahealthcare.com by ThULB Jena on 11/13/14 For personal use only.
EMBL/GenBank Accession No. M83233
A partial cDNA has previously been isolated that encodes HTF4, a new member of the basic helix-loop-helix family of DNA binding proteins. We have reconstructed a full cDNA, designated HTF4a cDNA, which spans the entire HTF4 coding region. The 5' end of the reconstructed cDNA includes an open reading frame encoding a 117-residue polypeptide. Three closely spaced ATG codons in the major open reading frame may serve as sites of initiation for HTF4a synthesis, yielding proteins that contain 682, 675, and 657 amino acid residues, respectively. Pairwise dot matrix analysis indicates that there are at least three distinct human genes that encode proteins related to Drosophila daughterless.
Several human cDNAs encode proteins containing a region homologous to the Drosophila daughterless protein Da (Murre et al., 1989a; Zhang et al., 1991a). The proteins include HTF4 (Zhang et al., 1991a), ITF2 (Henthorn et al., 1990b), and several related products (E12, E47/HE47, and ITF1) encoded by the human E2A gene (Kamps et al., 1990; Henthorn et al., 1990b; Zhang et al., 1991b). The DNA binding domains of the human and the Drosophila proteins map to their carboxy termini and contain a sequence motif known as bHLH, a basic region followed by a helix-loop-helix structure (reviewed by Cline, 1989; Jones, 1990; Olson, 1990). The Drosophila Da and the E2A gene products represent members of the class A family of the bHLH proteins (Murre et al., 1989b; reviewed by Olson, 1990). The class A proteins form heterodimers with members of class B, which includes several myogenic transcription factors and the products of the achaete-scute genes of Drosophila (Murre et al., 1989b). Thus, heterodimers of MyoD and E12 can induce myogenesis (Benezra et al., 1990; reviewed by Olson, 1990), and those formed between the achaete-scute products and Da specify the formation of the central nervous system of Drosophila (see, for example, Villares and Cabrera, 1987; Alonso and Cabrera, 1988; Caudy et al., 1988; reviewed by Cline, 1989). Comparative sequence analysis predicts that two other human proteins (ITF2 and HTF4) may also represent members of the class A group of the bHLH proteins (Zhang et al., 1991a).
In order to gain further insight into the structure of HTF4, we have determined the nucleotide sequence of several cDNA fragments and assembled a cDNA encoding HTF4a, a full-length protein corresponding to HTF4.
To isolate the HTF4a cDNA, we used a DNA segment unique to the HTF4 cDNA to probe approximately 1.2 × 10⁶ λ phage colonies of a HeLa λgt11 library employed in our previous analysis (Zhang et al., 1991a,b). We identified 13 positive clones. The cDNA inserts that were longer than 600 bp (Fig. 1) were isolated from purified recombinant phage and subsequently subcloned into the EcoRI site of the pBluescript KS+ plasmid (purchased from Stratagene). We determined the nucleotide sequence of selected regions of the cloned inserts in order to identify their related regions (Fig. 1). There is significant overlap among the isolated inserts (Fig. 1). For example, 543 bp of clone F15 (2 kb) overlaps with the original HTF4 cDNA. F15 includes two identical sequences (F1 and F17) and overlaps with a significant portion of several other clones (F6, F7, F8 and F11). Five clones (F1, F6, F8, F11, and F17) have the same 5' end, corresponding
Address correspondence to: Minou Bina, Purdue University, Department of Chemistry, W. Lafayette, IN 47907-1393, USA.
to nucleotide position 1872 (Fig. 1). Since this position represents an internal EcoRI site, we suspect that incomplete methylation of this site during the construction of the HeLa library produced a significant number of cloned partial cDNAs. The longest cDNA (F15) was selected for further analysis. We produced a series of nested deletions in F15 by exonuclease III digestions initiated from the SacI/BamHI or the ApaI/HindIII sites of the parental KS+ plasmid. The deletion mutants were amplified and sequenced by the dideoxy termination procedure (Sanger et al., 1977). To resolve the ambiguous regions, the reactions were supplemented with 7-deaza-dGTP.
In order to obtain the segment corresponding to the 5' region of the HTF4a cDNA, we synthesized two PCR primers (P1 and P2) that map to the 5' end of F15 (Fig. 1). P1 and P2 were used in PCR reactions in conjunction with two primers that map near the cloning site in the bacteriophage λ DNA to obtain the missing regions of the HTF4a cDNA. We cloned the PCR products and isolated two
Figure 1 Cloning strategy for HTF4a cDNA. The line marked probe designates a segment unique to the HTF4 cDNA that was used to screen the HeLa λgt11 expression library. The relative locations of the cDNA fragments isolated from independent phage colonies are shown below the HTF4 cDNA. P1 (nucleotides 1304-1322) and P2 (nucleotides 1330-1347) represent two primers that were used in conjunction with two λ primers to obtain two independent clones of PCR products (EN48 and EN15, respectively), corresponding to the 5' end of the HTF4a cDNA. The numbering system follows the convention shown in Fig. 2.
independent recombinant plasmids: EN15, 1347 base pairs; EN48, 1322 base pairs (Fig. 1). Nucleotide sequence and restriction map analysis showed that both isolates span identical sequences. We established the primary structure of both strands of the HTF4a cDNA by overlapping the nucleotide sequences of the isolated cDNA fragments, the F15 deletion mutants, and the PCR products (Fig. 2). The length (5108 base pairs) of the reconstructed cDNA agrees within experimental uncertainties with the size of a major band
(4.7 kb) observed in Northern blots containing mRNA prepared from HeLa and BJAB cells (data not shown). The reconstructed cDNA includes the entire HTF4a coding region (Fig. 2). The cDNA segment corresponding to the HTF4a mRNA leader includes an open reading frame encoding a 117-residue peptide (Fig. 2). The major open reading frame is between nucleotides 1120 and 3165 (Figs 1, 2). A termination codon preceding this main frame indicates that the complete coding region of HTF4a
Figure 2 The nucleotide sequence of the HTF4a cDNA (GenBank accession #M83233). The leader of the HTF4a cDNA contains an open reading frame encoding a 117-residue polypeptide. The longest open reading frame begins at nucleotide 1120 and corresponds to the predicted HTF4a amino acid sequence. The basic-Helix1-Loop-Helix2 segment defines the HTF4a DNA binding domain. CAS represents the class A specific region of the helix-loop-helix proteins (Zhang et al., 1991a,b). The nucleotide sequence of the 3' untranslated region of the HTF4a cDNA is not shown and can be obtained from GenBank.
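The reading-frame arithmetic behind the three candidate initiation codons can be checked with a short calculation. The sketch below is illustrative only: it uses the coordinates quoted in the text (ATG codons at nucleotides 1120, 1141, and 1195; major open reading frame ending at nucleotide 3165) and assumes that position 3165 is the last nucleotide of the final sense codon, so that the termination codon is not counted. The function and constant names are hypothetical.

```python
# Coordinates as reported in the text (1-based, inclusive).
ORF_START = 1120                      # first in-frame ATG of the major ORF
ORF_END = 3165                        # assumed: last base of the final sense codon
CANDIDATE_STARTS = (1120, 1141, 1195)  # the three in-frame ATG codons


def is_in_frame(position, reference=ORF_START):
    """True if `position` lies in the same reading frame as `reference`."""
    return (position - reference) % 3 == 0


def residue_count(start, end=ORF_END):
    """Number of codons (amino acid residues) from `start` to `end`, inclusive."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    for start in CANDIDATE_STARTS:
        print(start, is_in_frame(start), residue_count(start))
```

Under these assumptions the three start sites give 682, 675, and 657 residues, matching the chain lengths stated in the text.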
Figure 2 (continued).
cDNA has been identified. Three in-frame ATG codons (at nucleotide positions 1120, 1141, and 1195) can potentially serve as HTF4a translation initiation sites (Fig. 2). Initiation of protein synthesis at these sites would yield polypeptide chains containing 682, 675, and 657 amino acid residues, respectively.
The reconstruction of a cDNA encoding HTF4a allowed us to address the question of an evolutionary relationship among the bHLH proteins of class A: Da, HTF4a, ITF2, and E12 (used as a prototype of the E2A gene products). Dot-matrix graphic comparison programs provide a convenient method for visualizing regions of similarity between two DNA or protein sequences, especially when comparing sequences of different
Figure 3 Dot plot sequence comparison of HTF4a cDNA with selected members of the class A family of the bHLH proteins. A window of 20 and a stringency of 14 were used to compare the HTF4a cDNA sequence to that reported for (A) human ITF2 (Henthorn et al., 1990b), (B) human E12 (Kamps et al., 1990), and (C) Drosophila daughterless (Caudy et al., 1988; Cronmiller et al., 1988).
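A windowed dot-matrix comparison of the kind described in the caption can be sketched in a few lines. This is a minimal illustration, not the program used for Figure 3, which the text does not name. It assumes the common convention that a dot is placed at (i, j) when at least `stringency` of the `window` aligned positions starting at i and j are identical. The function name `dot_plot` and its parameters are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=20, stringency=14):
    """Return (i, j) pairs (0-based window starts) where at least `stringency`
    of the `window` aligned positions in seq_a and seq_b are identical."""
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Count identical positions within the two aligned windows.
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= stringency:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences; real cDNAs would be read from sequence files.
    first = "ACGTTGCAACGTTGCA"
    second = "ACGTTGCTACGATGCA"
    for i, j in dot_plot(first, second, window=8, stringency=6):
        print(i, j)
```

The plain double loop costs time proportional to the product of the two sequence lengths times the window size, which is adequate for short examples but slow for 5 kb cDNAs. Keeping a running match count along each diagonal would remove the window factor.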