Striking homology of the 'variable' N-terminal as well as the 'conserved core' domains of the mouse and human TATA-factors (TFIID).

Nucleic Acids Research, Vol. 19, No. 14 3861 -3865

Striking homology of the 'variable' N-terminal as well as the 'conserved core' domains of the mouse and human TATA-factors (TFIID) Taka-aki Tamura' *, Kohsuke Sumita1 2, Ichiroh Fujino1 2, Atsuo Aoyamal +, Masami Horikoshi3, Alexander Hoffmann3, Robert G.Roeder3, Masami Muramatsu4 and Katsuhiko Mikoshibal 3 'Division of Behavior and Neurobiology, National Institute for Basic Biology, Myodaiji-cho, Okazaki 444, 2Division of Macromolecular Function, Institute for Protein Research, Osaka University, 3-2 Yamada-oka, Suita 565, Japan, 3Laboratory of Biochemistry and Molecular Biology, The Rockefeller University, New York, NY 10021, USA and 4Department of Biochemistry, Faculty of Medicine, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113, Japan Received May 12, 1991; Revised and Accepted July 1, 1991

ABSTRACT A complementary DNA (cDNA) encoding a mouse TFIID (milD) was isolated from mouse brain cDNA libraries. The 316 amino acid sequence deduced from cDNA sequences revealed the presence of an amino-terminal region enriched in serine, threonine, and proline (STPcluster), an uninterrupted stretch of 13 glutamine residues (Q-run), a second STP-cluster, and a conserved carboxy-terminal region. Amino acid sequences of the first STP-cluster and the conserved carboxy-terminal region were identical to those of the human TFIID (hilD). However, the 0-run was considerably shorter than that in hilD and sequences in the second STP-cluster diverged from those of the hIlD. The murine TFIID transcript is expressed as a 2 kilobase poly(A)+ RNA in the mouse brain. Southern blot analysis identified a single gene copy per haploid mouse genome.

INTRODUCTION The general transcription factors for RNA polymerase II identified in human cells have been designated TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIG (1, and references therein). Human TFIID (hID) binds to the TATA box element (2, 3) and initiates the ordered assembly of RNA polymerase II and the other general factors into a functional preinitiation complex (4, 5). The demonstration of a yeast TFIID (yIID) that was functionally interchangeable with the hIHD (6-8) led to the total purification of this factor (9) and the cloning of corresponding cDNA (10-14). This was followed by the cloning of TFIID cDNAs from fission yeast (15, 16), plants (17), Drosophila (18, 19), and human (20-22). Sequence comparisons revealed a highly

DDBJ accession no. D01034

conserved 180 residue C-terminal domain, which earlier mutational studies had shown to be necessary and sufficient for basal level transcription by the yllD (23). In contrast, these analyses failed to reveal significant sequence homologies in the N-terminal domains. Given previous indications that hIHD is a target for the action of at least some regulatory proteins (24-27), as well as functional comparisons of natural and recombinant TFIID species, these results led to the speculation that the variable N-terminal regions might be involved in species-specific interactions with regulatory factors (20, 21, 28). However, the ability of several mammalian activators to enhance promoter activity via yeast TFIID in HeLa cell-derived systems (22, 29) may indicate that at least some interactions with regulatory factors might proceed via the conserved core domain and suggest alternate or accessory roles for the unique N-termini of hID. To further investigate structurefunction and evolutionary relationships in mammalian TFIIDs, we have isolated murine TFIID cDNA and demonstrated striking sequence conservations between human and murine TFIIDs.

MATERIALS AND METHODS Cloning and Sequencing of the Mouse TFIID cDNA Two cDNA libraries in lambda-gtl I were made from cerebellum poly(A)+RNA of ICR mice by priming with oligo(dT) and with a random hexamer (30). Libraries were screened as previously described (31) using a 32P-labeled DNA probe corresponding to nucleotide positions 576-971 (encoding amino acid residues 160-291) in a hIlD cDNA (20). Nucleotide sequences were determined by the dideoxy chain termination method in each direction.

To whom correspondence should be addressed at Department of Biochemistry, Saitama Medical School, Moroyama-machi, Saitama 350-04, Japan + Present address: Biotechnology Research Laboratory, Tosoh Co. Ltd, 2743-1 Hayakawa, Ayasa, Kanagawa, Japan

*

3862 Nucleic Acids Research, Vol. 19, No. 14

a

a

~~452238-2b

1

COFA*rVM u-sermwo core T.M.-.,!Zl: 316

mouse

420

Mat

O CCA Pro CCT

17

161

M

209 49

OlG

65

Gin

Ila

Ala Oiy Thr Oly Gin

Gin

Gin

G

nl CCC ALa

97

Thr Pro C ACA

ac

449

Lau Tyr Pro gar CT

CCA QTTCTT

V

497

AnT OTA TCT ACC OTO

193

669 209

737

225

Yakl

633

Bar

ATTCT

BOA

AT OTA CCG

Mu kly Ise

CAG

Mat

Arc

Gin

Lan

Ala

OCTMa

Ci

Mrg

Lys

Ap

Pro

Ber Bar T0r

hIm OAO

ATC MAA CCC AGAATT Il Lym Pro Mre hI. A ACA GOT OCTAMA TT 929

681

289

L

977

ATC

Thsr TAC

Tyr

Lym

ATC CAB

he

OTT Ala

C Gin

Gin

nO

ro

GOC

C

Ba

TCC ABA Bar Arg flC CCA

Pro

nGly

MC ATO OTO

Gin Ann

AOC TOT C8a CAG CTO ACC CAC CAG

BAT

Yal Giy Bar

Mat OTO

L

Yal

OT

Gin

Lon

AOA OCA

hIs

AAT BAG

Lys Yal Mr g Al&aGi ATC TTA AAG GOA TIC AGO

Ile Tr ProIle Lou LyGyPh

Phi

ha

o

Arg Lys

Bar

Yak

TAT

OM

Tyr

MG ACC

CA

AOT

8

CTT GMA GOC Mgc nu 01y Lan Val TT Lon Him ABA CCA OAATTA OAA ATC e CT m Pro Bliu Lon PProGl0 Lau lTryr Mrg OTn CTC CTrATT TCTGOA MA OTn Lou 01O Lys

Ala

CCC

MT

C

Pi

GAA

Ly Leu

ALa Mrg Val Val Olin

AG

nC

Pha

C CCC ATA AGO CTG

TTC AOT AUC TAT Pha

BAC

Gin

Thr

Yal

OCA TTT OM Ala

P

Amp 5km

Mt Yak

MC

Gin

4gaoM

1153 1216 1279 1342 1405

TCTGTGCTGCTACTTnOOCOGCACTOCCATTATTTATATTrABATTTAAACACTOCTOTO OTGAOTTOGTTMAGOACAGAACTTAAGTGTTCMCCACCTOTACMnTGBAC'TTTC ATTMTCTTTCCCACACAAGCCATTTTTATATTTCTACCATMAAGTMAAATCTrrTTr MAAGTGTTOTTrCTArrrGTGTTAMCTAAGGAGTTATrrTTTO CCAGATACATTCCGCC

1531 1594

GBATACTOGAAAAOTCCCCCTCTGCACTGAATCACCCTGCAOCACTACTOTGAOTnGCnC

TTCCCAOTATTGCAGoACTGTAATGTATTATCAMCMTGGCTOTACATACIr-CIn TCTTCAGAGTCTCTGCACMAA COCAGCTTOTMAnAnGTTAAT1TTrGoATMAATGATAC CTTGTMGTCATGTOATCATACTGTCAAGATTTATTTTAGATATAAA

Fig. 1. Structure of the mouseT

AAAAAA

ED cDNA. a; Schematic representation of three

independent overlappingclones isolated from mouse braincDNA libraries. b; Nucleotide and predicted amino acid sequences of the mIUD.

.. .. .. .

human

Shown

--

--

-

--

--

-

[LEEBGR9GOOQQQQOQQQQOQGOQQGQOOOQQQQOOBQOOAVA-AAA-VOQSTSQQATQGTSGOAPQLFHSOTLTT --;S-92

i

Fig. 2. Comparisons of mouse and human TFIID amino acid sequences. a; Schematic representation of the mouse and human (20) TFIIDs. Amino acid positions of the STP-l, Q-run, STP-2 and conserved C-terminal core are shown. Residues 71 to 95 show 80% sequence identity betweenmIHD hIID. and b;Arnino acid sequences around the Q-run. Amino acids are indicated by the single letter code. Short bars indicated deleted residues while asterisks indicate identities between the two sequences.

is

exposed to Kodak XAR films. A mouse myelin basic protein (MBP) gene fragment from nucleotides -1318 to +223 relative to the transcription start site (33) was also used as a probe for checking the DNA digestion and for calibration of the copy number. For Northern blot analysis (30), 2 Mg of poly(A)+RNA from the mouse brain was analyzed using a mIlD probe (8x 108 cpm/yg) extending from positions 369-675. RESULTS

Ann

TTOTCTOCCAT

ACA

1090

1468

------

OTA

OTTCTCCTOCCTTCCCTATCCACGTGTTrTAMAACCAOTCAOTTTTGGTACCACToATGoT ACAGTTOOTGAGGACACTCAGTTACAGGTGoCAoCATGMOTGACACTGTOTOTCCTACTGCA

1027

STPaQQQQQQQ0-run -AVAS,V,OQSSQQTQUGQPQHS9LT STP-2 LLEELREQ mowse ............ --------AVATAAASVQQSTSOQPTQGASGQTPQLFHSQTLTT G Q--------OQ-QgQ IOOEEQQR

G

AT Ala Lanl

hae

TICTTA e

Al&

LysPPn

CM

Gin

GAC

Giu

Ala

CT7

Pro

.inAEtz.mwgm:

AG

GAG

Gln Lau ATT CTA MO ACC Amp Lau Lym Tar hIs

Val

MAT Cfl GOC TOT MAA CTT

Pro TT r r Arg Th TCT GOA MAA ATO OTG TOC ACA GOA 0CC MAO AOT GM Bar Ol1 Lys HotVal Cys Thr Gly Ala Lys Bar CTA GCA OCA AOA MA TAT OCT AGA OTn BTO CABO AG

hae

Yal

278

305

GAG

Lys Lau ALn Lan Gir hae Yal Th Bar r M T CCC OCA CTT COT OCA AGA MAT OCT GM TAT MAOA CGA MAl Loa Mg Al& Arg An Ala. Bu Tyr Ass Pro Lym Arg OTC ATC ATO AGA ATA AGA GAO CCA COO ACA ACT OCO nTT

241 Val LysTT 257

Yal

-_

mar

PMn

AC

129

641

Ala

CMATTCTC

401 118

177

GAO

Thr Ala Ala Ala Bar &CA OlCAG GOT 0CC TCA GOC CAG Yak

human

AC

TCA ACA TCT CAG CMA AC CC CA Gly Ala Bar GlyGin Tar Pro Bar ar mar Gin Gin pro Tar C CAT TCT CM ACT CTO ACC ACT GCA CCA nTO CCA 00C C Thr Lan Thr Thr Ala Pro Lan Pro Oly war Gin Lan Pha Him Bar ACC CCT ATC ACT CCT ACC CCC no TAC CCT TCA CCAAAACT CCT ATO Gin

493

AATOATT

Gin

CM

305

353

145

TA Iaae Gin

257

545

TC

TT

LOU Thr Pro Gln Pro Il Ann Tsr C CAAA1a ATT TT O GMA GAB CAA C AGA CAB CAG CAG C CTC TCT Lan Bar hl Lau O lu Gin Gin Gin Ar Gin Gin Gin Uin Gn CAG CAB CM CM CAG CAB BCA BTA OCA &CT OCA OCA OCC TCA OTA Pro Tyr

33

161

AATAGACTOACTGAOOOAGAAAOOBA

C CCTT CCA CCT TAT OCT CAB GOC AGC Occ TCC Ol L.u Ala ar Aap Gin Amn Aan Bar L.u Pro Pro Tyr Ma T C CAO OC OCC ATO ACT OCT GOA ATT CCC lATC Pro Pro Not Not Gin Gly Not mar rro Gy Bar C TAC GOC ACA BOA AC T CA CAG ATT CAG MC C COT CAOT

M ATG GAC CAG

113

HM33

CTAABAAGATATTCAAGAOATBCTCTAOGO

T

65

It a a

680%:

2S-2 42 1

1

11--.:.-.:.-.:.:.:,:. .:.

725 23S- I

the combined

sequence of three overlapping clones, containing a 948 bp ORF encoding the 316 amino acid mIHD. Straight and wave lines show the Q-run and the pro-serthr or its derivativemotifs, respectively. The conserved C-terminal core is specified by the region between the two arrows.

Southem and Northern Blot Analyses

Southern blot analysis under stringent hybridization conditions carried out by the standard method withn minor modifications (32). Briefly, 12 jAg amounts of the mouse DNA were digested with BamHI and HindIll, electrophoresed on a 0.7% agarose gel, blotted onto a nylon membrane, and probed with random was

a

primer-labeled mouse TFUID (mIHD) cDNA fragment (8 x Io0 cpm/,ug) that extended from nucleotide positions 486 to 1006 and covered the conserved C-terminal region. The hybridization solution contained 106 cpm/ml probe, 5 x Denhardt's solution, 100 zg/ml denatured herring testis DNA, xSSC, 50 mM sodium phosphate (pH 6.5), 0.1% SDS and 50% deionized 5

formamide. The blot was incubated with the probe at 42°C for 16 hr, washed in 0.1 x SSC, 0.1% SDS at room temperature and

The TFIID species encoded by the several cDNAs reported thus far show strong sequence conservations over 180 residues in their respective C-termini (conserved C-terminal core). Assuming sequence conservation at both the protein and DNA level, we employed a hIlD cDNA-derived probe encoding 132 residues of the conserved core to screen an oligo(dT)- and random hexamer-primed mouse brain cDNA libraries. The first screens

2 positive clones from the (1 107 phage each) yielded and 6 from the random hexamer-primed library oligo(dT)-primed library. These clones (including 23S-1 and H3, Figure la) contained sequences related to the conserved C-terminal core of lacked the expected N-terminal sequences. The mIlD hIlD, but from nucleotide positions 369 to 675 (below) and the sequence from nucleotide positions 1 to 183 were then used sequence hIID as to screen the random library, x

which probes hexamer-primed yielded one positive clone (23S-2b) carrying a part of the conserved C-terminal core and preceding N-terminal sequences. Overlapping clones 23S-2b, 23S-1 and H3 (Fig. la) were then used to construct a 1652 base pair cDNA which contained one long open reading frame (ORF) begining at position 65 (Fig. Ib).

The ORF encoded a 316 amino acid polypeptide which was similar in size to the 335 residue (20). The overall amino acid sequence showed 92% identity with the hbID sequence, while the C-terminal 221 amino acid sequence was identical with the

hIID

region of the hMID (Fig. Ib, 2a). Nucleotide corresponding between and cDNAs revealed a high

mIID hIID

(90%) sequences sequence identity in the coding regions. Sequence similarities were weak in the short 5'- and proximal 3'-non-coding regions (Fig. 3a), but stronger in the more distal 3'-untranslated region

(downstream from 1200) (Fig. 3b).

Analysis of the mIlD amino acid sequence revealed motifs observed in other site-specific DNA binding proteins (Figs. lb

Nucleic Acids Research, Vol. 19, No. 14 3863

a

b

conserved C-terminal core

931 1014 991

TAGTTGTCTGCCATGTTCTCCTGCCTTCCC TA-ATGGCTCTCATGTACCCTTGCCTCCCC

1051 ATCCACGTGTTTTTTAAAAC ---- CAGTCAGTTTTGGTACC ---- ACTGATGGTACAGT 1133 TTCTTTTTTTTTTTTTAAACAAATCAGTTTGTTTTGGTACCTTTAAATGGTGGTGTTGT

1102 TGGTGA-GGACACTCAGTTACA---GGTGGCAGCA--TGAAGTGACACTGTGTGT-CCT 1193 AGAAGATGGATGTTGAGTTGCAGGGTGTGGCACCAGGTGATGCCCTTCTGTAAGTGCCC 1155 CTGCAGGAT-ACTGGAAAGGTC--CCCCTCTGCACTGAAATCACCCTGCAGCACTACTG 1253 CCGCGGGATGCCGGGAAGGGGCATTATTTGTGCACTGAGA- -ACACCGCAGCGTGACTG 1212 GAGTTGCTTGCTCTGTGCTGCTACTTGGGCGGCACTG-CCATTTATTTATATTTAGATT 1311 GAGTTGCTCATACCGTGCTGCTA--TCGGCAGCGCTGCCCATTTATTTATATGTAGA-T

1271 TAAACACTGCTGTTGGTGATTGTTGGTTTAAGGGACAGAACTTTAAGTGTTCAAGCCAC 1368 AAAACACTGCTGTTGACAAGT--TGGTTTGA-GGCCA-AACTTTAAGTGTTAAAGCCGC

1331 TGTACAA----TTGGACTTTTCATTTTAA--TCTTTCCCACACAAGC-CAGTTTTTATA 1424 TCTATAATTGATTGGCTTTTTAATTTTAATGTTTTTCCCCATGAACCACAGTTTTTATA

1384 TTCTACCATAAAAGTAAAAATCTTTTTTAAAAGTGTTGTTTTTCTAGTTTGTAACTCTT 1484 TTCTACCAGAAAAGGAAAAATCTTTTTTAAAAGTGTTGTTTTTCTAATTTATAACTCCT

1444 GGAGTTATTTTTGTGCCAGATACATTCCGCCTTCCCAGTATTGCAGGACTGAATAGTTG 1544 GGGGTTATTTCTGTGCCAGACACATTCCACCTCTTCAGTATTGCAGGACAGTATATATG

1504 ATTAATCAAAA-CAATGGCTGTACATA--CTTTTCTTTCTTCAGAGT-CTCTGCACAA1604 GTTAATGAAAATGAATGGCTGTACATATTTTTTTCTTTCTTCAGAGTACTCTGTACAAT 1559 AACGCAGCTTGTAAATTGTTAGATTTTTGTTATAAATGATACCTTGTAAGTCATGTGAT 1664 AATGCAGTTTATAAAAAAAAAAAAAAAAAAAAA

Fig. 3. Comparison of nucleotide sequences between mouse and human TFIIDs. a; Harr plot analysis of the mouse and human (20) TFIID cDNA sequences. One dot represents an identical 9 nucleotides out of 10. Start points of the coding region (65), Q-run (236), STP-2 (275), conserved C-terminal core (470) and the terminus of the coding region (1013) are indicated by arrowheads. b; Nucleotide sequence comparison of the 3'-flanking region between mouse (upper) and human (lower) TFIID cDNAs. Sequences were aligned to have a maximum homology between two cDNAs, including deletion and insertions. The conserved C-terminal core and matching nucleotides are indicated by a shaded area and asterisks, respectively.

and 2b). A stretch of 13 uninterrupted glutamine residues (Qrun) was found between residues 58-70. Regions enriched in ser(S), thr(T), and pro(P) were detected on each side of the Qrun and designated STP-1 and STP-2. The overall contents of ser, thr, and pro residues in STP- 1 and STP-2 were 35% and 50%, respectively. The region from residues 119- 133 contained an imperfect tripeptide repeat, either pro-ser-thr or a derivative motif (20). In comparing these regions with those of hlID (Fig. 2a, 2b), we found that the N-terminal 57 amino acids (the first 5 amino acids and the STP-1 region) were identical between both TFIIDs. However, the mIlD Q-run (13 residues) was considerably shorter than the 34 (20) or 38 (21, 22) residue counterparts in MID. Sequence deviations between mIlD and hIlD were most evident in the beginning of STP-2; the mIID sequence of residues 71 -95 had 80% identity with the corresponding region of hIID and included two single residue insertions. As described above, the nucleotide sequence of the two cDNAs also diverged in this region. In a Southern analysis, mouse DNA was digested with BamHI or Hindlll and probed with the conserved region (nucleotide residue 486-to- 1006) of the mIlD cDNA. This analysis (Fig. 4a) revealed single strong signals of 3.9 (BamHI) or 5.5 (HindIII) kilobase pairs, respectively. A control with a mouse myelin basic protein (MBP) probe of similar specific activity also identified single strong bands whose intensities were comparable to those observed with the mIID probe. Since a single MBP gene is present per haploid mouse genome (34), these data indicate that the TFIID gene analyzed here exists as a single copy in the haploid mouse genome. It may be noteworthy that the Southern

A TFIID B H

%a

M

M

-

23

-

-

9.4

-

-6.6

-

4.4

-

-

B

MBP B H

to

V

Conserved structural motifs within the N-terminal domain of TFIID tau from Xenopus, mouse and human.

Striking conservation of TFIID in Schizosaccharomyces pombe and Saccharomyces cerevisiae.

Maternal imprinting of the mouse Snrpn gene and conserved linkage homology with the human Prader-Willi syndrome region.

Functional differences between yeast and human TFIID are localized to the highly conserved region.

Highly conserved core domain and unique N terminus with presumptive regulatory motifs in a human TATA factor (TFIID).

A conserved element in the leader mediates post-meiotic translation as well as cytoplasmic polyadenylation of a Drosophila spermatocyte mRNA.

Engineered Autonomous Human Variable Domains.

Hyperreactivity as well as mouse killing is induced by electrical stimulation of the lateral hypothalamus in the rat.

Helminth infection alters mood and short-term memory as well as levels of neurotransmitters and cytokines in the mouse hippocampus.

Allergen-Specific Immunotherapy Alters the Frequency, as well as the FcR and CLR Expression Profiles of Human Dendritic Cell Subsets.

Large-Scale Screening of Preferred Interactions of Human Src Homology-3 (SH3) Domains Using Native Target Proteins as Affinity Ligands.

Molecular analysis of the outer surface protein A (OspA) of Borrelia burgdorferi for conserved and variable antibody binding domains.

Determinants of the assembly and function of antibody variable domains.

Pheomelanin as well as eumelanin is present in human epidermis.

Similarity between the DNA-binding domains of IHF protein and TFIID protein.

Evidence that Linkage Group IV as well as linkage Group X of the mouse are in chromosome 10.

Variety--the spice of science as well as of life. The disadvantages of specialization.

Pregnancy: Study the mother's DNA as well.

Dehydration-The Pharynx is Affected As Well.

Subjective well-being and human welfare around the world as reflected in the Gallup World Poll.

The conserved carboxy-terminal domain of Saccharomyces cerevisiae TFIID is sufficient to support normal cell growth.

Glutathione influences the proliferation as well as the extent of mitochondrial activation in rat splenocytes.

The efficient method for simultaneous monitoring of the culturable as well as nonculturable airborne microorganisms.

gag as well as myc sequences contribute to the transforming phenotype of the avian retrovirus FH3.