DNA Sequence - J. DNA Sequencing and Mapping, Vol. 2, pp. 397-403
Reprints available directly from the publisher
Photocopying permitted by license only

© 1992 Harwood Academic Publishers GmbH
Printed in the United Kingdom

SHORT COMMUNICATION

The nucleotide sequence of the human transcription factor HTF4a cDNA

YI ZHANG and MINOU BINA

Department of Chemistry, Purdue University, West Lafayette, IN 47907-1393 USA

Mitochondrial DNA Downloaded from informahealthcare.com by ThULB Jena on 11/13/14 For personal use only.

EMBL GenBank Accession No. M83233

A partial cDNA has previously been isolated that encodes HTF4, a new member of the basic helix-loop-helix family of DNA binding proteins. We have reconstructed a full cDNA, designated HTF4a cDNA, which spans the entire HTF4 coding region. The 5' end of the reconstructed cDNA includes an open reading frame encoding a 117-residue polypeptide. Three closely spaced ATG codons in the major open reading frame may potentially serve as sites of initiation for HTF4a synthesis, to yield proteins that contain 682, 675, and 657 amino acid residues, respectively. Pairwise dot matrix analysis indicates that there are at least three distinct human genes which encode proteins related to the Drosophila daughterless.

Address correspondence to: Minou Bina, Purdue University, Department of Chemistry, W. Lafayette, IN 47907-1393 USA.

Several human cDNAs encode proteins containing a region homologous to the Drosophila daughterless protein Da (Murre et al., 1989a; Zhang et al., 1991a). The proteins include HTF4 (Zhang et al., 1991a), ITF2 (Henthorn et al., 1990b), and several related products (E12, E47/HE47, and ITF1) encoded by the human E2A gene (Kamps et al., 1990; Henthorn et al., 1990b; Zhang et al., 1991b). The DNA binding domains of the human and the Drosophila proteins map to their carboxy termini and contain a sequence motif known as bHLH, a basic region followed by a helix-loop-helix structure (reviewed by Cline, 1989; Jones, 1990; Olson, 1990). The Drosophila Da and the E2A gene products represent members of the class A family of the bHLH proteins (Murre et al., 1989b; reviewed by Olson, 1990). The class A proteins form heterodimers with members of class B, which includes several myogenic transcription factors and the products of the achaete-scute genes of Drosophila (Murre et al., 1989b). Thus, heterodimers of MyoD and E12 can induce myogenesis (Benezra et al., 1990; reviewed by Olson, 1990), and those formed between the achaete-scute products and Da specify the formation of the central nervous system of Drosophila (see, for example, Villares and Cabrera, 1987; Alonso and Cabrera, 1988; Caudy et al., 1988; reviewed by Cline, 1989). Comparative sequence analysis predicts that two other human proteins (ITF2 and HTF4) may also represent members of the class A group of the bHLH proteins (Zhang et al., 1991a).

In order to gain further insight into the structure of HTF4, we have determined the nucleotide sequence of several cDNA fragments and assembled a cDNA encoding HTF4a, a full-length protein corresponding to HTF4.

To isolate the HTF4a cDNA, we used a DNA segment unique to the HTF4 cDNA to probe approximately 1.2 × 10^6 λ phage colonies of a HeLa λgt11 library employed in our previous analysis (Zhang et al., 1991a,b). We identified 13 positive clones. The cDNA inserts that were longer than 600 bp (Fig. 1) were isolated from purified recombinant phage and subsequently subcloned into the EcoRI site of the pBluescript KS+ plasmid (purchased from Stratagene). We determined the nucleotide sequence of selected regions of the cloned inserts in order to identify their related regions (Fig. 1). There is a significant overlap in the isolated inserts (Fig. 1). For example, 543 bp of clone F15 (2 kb) overlaps with the original HTF4 cDNA. F15 includes two identical sequences (F1 and F17) and overlaps with a significant portion of several other clones (F6, F7, F8 and F11). Five clones (F1, F6, F8, F11, and F17) have the same 5' end, corresponding to nucleotide position 1872 (Fig. 1). Since this position represents an internal EcoRI site, we suspect that incomplete methylation of this site during the construction of the HeLa library produced a significant number of cloned partial cDNAs.

The longest cDNA (F15) was selected for further analysis. We produced a series of nested deletions in F15 by exonuclease III digestions initiated from the SacI/BamHI or the ApaI/HindIII sites of the parental KS+ plasmid. The deletion mutants were amplified and sequenced by the dideoxy termination procedure (Sanger et al., 1977). To resolve the ambiguous regions, the reactions were supplemented with 7-deaza-dGTP.

In order to obtain the segment corresponding to the 5' region of the HTF4a cDNA, we synthesized two PCR primers (P1 and P2) that map to the 5' end of F15 (Fig. 1). P1 and P2 were used in PCR reactions in conjunction with two primers that map near the cloning site in the bacteriophage λ DNA to obtain the missing regions of the HTF4a cDNA. We cloned the PCR products and isolated two


Figure 1 Cloning strategy for HTF4a cDNA. The line marked probe designates a segment unique to the HTF4 cDNA that was used to screen the HeLa λgt11 expression library. The relative locations of the cDNA fragments isolated from independent phage colonies are shown below the HTF4 cDNA. P1 (nucleotide 1304-1322) and P2 (nucleotide 1330-1347) represent two primers that were used in conjunction with two λ primers to obtain two independent clones of PCR products (EN48 and EN15, respectively), corresponding to the 5' end of the HTF4a cDNA. The numbering system follows the convention shown in Fig. 2.


independent recombinant plasmids: EN15, 1347 base pairs; EN48, 1322 base pairs (Fig. 1). Nucleotide sequence and restriction map analysis showed that both isolates span identical sequences. We established the primary structure of both strands of the HTF4a cDNA by overlapping the nucleotide sequences of the isolated cDNA fragments, the F15 deletion mutants, and the PCR products (Fig. 2). The length (5108 base pairs) of the reconstructed cDNA agrees within experimental uncertainties with the size of a major band (4.7 kb) observed in Northern blots containing mRNA prepared from HeLa and BJAB cells (data not shown). The reconstructed cDNA includes the entire HTF4a coding region (Fig. 2). The cDNA segment corresponding to the HTF4a mRNA leader includes an open reading frame encoding a 117-residue peptide (Fig. 2). The major open reading frame is between nucleotides 1120 and 3165 (Figs 1, 2). A termination codon preceding this main frame indicates that the complete coding region of HTF4a

Figure 2 The nucleotide sequence of the HTF4a cDNA (GenBank accession #M83233). The leader of the HTF4a cDNA contains an open reading frame encoding a 117-residue polypeptide. The longest open reading frame begins at nucleotide 1120 and corresponds to the predicted HTF4a amino acid sequence. The basic Helix1-Loop-Helix2 segment defines the HTF4a DNA binding domain. CAS represents the class A specific region of the helix-loop-helix proteins (Zhang et al., 1991a,b). The nucleotide sequence of the 3' untranslated region of the HTF4a cDNA is not shown and can be obtained from GenBank.
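The coordinates reported in the text (a major open reading frame spanning nucleotides 1120 to 3165, with candidate ATG initiation codons at 1120, 1141, and 1195) can be checked against the stated polypeptide lengths with simple arithmetic. The sketch below assumes, as the reported residue counts imply, that nucleotide 3165 is the last base of the final sense codon, so that the stop codon lies outside the interval. The function name `residues_from_start` is illustrative only.

```python
# Coordinates as reported in the text (1-based, inclusive).
ORF_END = 3165                      # assumed: last base of the final sense codon
ATG_POSITIONS = [1120, 1141, 1195]  # the three candidate initiation codons


def residues_from_start(start, end=ORF_END):
    """Return the number of codons between start and end, inclusive.

    Raises ValueError if the interval is not a whole number of codons,
    i.e. if `start` is not in frame with `end`.
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"position {start} is not in frame with {end}")
    return length // 3


lengths = {pos: residues_from_start(pos) for pos in ATG_POSITIONS}
for pos, n in lengths.items():
    print(f"ATG at {pos}: {n} residues")
# ATG at 1120: 682 residues
# ATG at 1141: 675 residues
# ATG at 1195: 657 residues
```

All three positions are in frame with one another, and the computed lengths match the 682, 675, and 657 residues quoted for the alternative initiation sites.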


Figure 2 (continued).

cDNA has been identified. Three in-frame ATG codons (at nucleotide positions 1120, 1141, and 1195) can potentially serve as HTF4a translation initiation sites (Fig. 2). Initiation of protein synthesis at these sites would yield polypeptide chains containing 682, 675, and 657 amino acid residues, respectively.

The reconstruction of a cDNA encoding HTF4a allowed us to address the question of an evolutionary relationship among the bHLH proteins of class A: Da, HTF4a, ITF2, and E12 (used as a prototype of the E2A gene products). Dot-matrix graphic comparison programs provide a convenient method for visualizing regions of similarity between two DNA or protein sequences, especially when comparing sequences of different

Figure 3 Dot plot sequence comparison of HTF4a cDNA with selected members of the class A family of the bHLH proteins. A window of 20 and a stringency of 14 were used to compare the HTF4a cDNA sequence to that reported for (A) human ITF2 (Henthorn et al., 1990b), (B) human E12 (Kamps et al., 1990), and (C) Drosophila daughterless (Caudy et al., 1988; Cronmiller et al., 1988).
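The window and stringency criterion named in the caption can be made concrete with a short example. This is a minimal sketch of a generic dot-matrix comparison, not the program the authors used: a dot is recorded at (i, j) when at least `stringency` of the `window` aligned positions starting at i in one sequence and at j in the other are identical. The function name `dot_matrix` and the toy sequences are assumptions made for illustration.

```python
def dot_matrix(seq_a, seq_b, window=20, stringency=14):
    """Return the (i, j) window start positions (0-based) that score a dot.

    A dot is scored when at least `stringency` of the `window` aligned
    characters in seq_a[i:i+window] and seq_b[j:j+window] are identical.
    """
    if not 0 < stringency <= window:
        raise ValueError("stringency must be between 1 and window")
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= stringency:
                dots.append((i, j))
    return dots


# Toy sequences (hypothetical, not taken from the HTF4a cDNA).
a = "ACGTACGTTAGCCGATAGGCTAACGT"
b = "ACGTACGTTAGCCGATAGGCTAACGT"
dots = dot_matrix(a, b)
print(dots[:3])
```

Comparing a sequence with itself always places dots along the main diagonal; related but diverged sequences give shorter diagonal runs wherever 14 or more of 20 positions agree. The nested loops make this version quadratic in sequence length, so it is meant only to illustrate the criterion, not to compare full-length cDNAs efficiently.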

