Cell, Vol. 61, 1179-1186, June 29, 1990, Copyright © 1990 by Cell Press
Isolation and Characterization of the Drosophila Gene Encoding the TATA Box Binding Protein, TFIID Timothy Hoey, Brian David Dynlacht, M. Gregory Peterson, B. Franklin Pugh, and Robert Tjian Howard Hughes Medical Institute Department of Molecular and Cell Biology University of California Berkeley, California 94720
Summary
To investigate the biochemical mechanisms involved in interactions between regulatory factors and the general transcription complex, we have cloned, expressed, and characterized the Drosophila gene encoding the TATA binding protein, dTFIID. Comparison of the protein sequences of the Drosophila and yeast TATA binding proteins reveals a bipartite organization consisting of a highly conserved, basic carboxy-terminal domain and a nonconserved amino-terminal region rich in Gln, Gly, Ser, and Met residues. Purified dTFIID protein binds specifically to the TATA sequence and activates basal-level transcription, and the conserved carboxy-terminal half of the molecule is sufficient for both activities. Partially purified TFIID from Drosophila cells mediates activation by the transcription factor Sp1. In contrast, purified dTFIID expressed from the cloned gene is unable to support Sp1-dependent activation, suggesting that other factors may be required to mediate interactions between upstream activators like Sp1 and the TATA binding protein.
Introduction
Differential spatial and temporal patterns of gene expression are frequently controlled at the level of transcription by the action of sequence-specific regulatory factors (reviewed in Mitchell and Tjian, 1989; Scott et al., 1989). Although the DNA binding specificities and the protein domains required for transcription activation of several factors are well characterized, the specific mechanisms involved in how these proteins function to activate transcription are largely unknown. Several different classes of activation domains have been described; for example, many eukaryotic trans-activators contain regions rich in acidic residues which are involved in activation (Ptashne, 1988), while other activating domains are rich in glutamine or proline residues (Courey and Tjian, 1988; Mermod et al., 1989). A major unresolved question concerns the target for the transcriptional activating domains. 
Presumably, activating regions work through contacting some component of the general transcription apparatus common to RNA polymerase II promoters. There are at least five general factors, TFIIA, TFIIB, TFIID, TFIIE, and TFIIF, required for basal-level transcription by RNA polymerase II (Buratowski et al., 1989; Reinberg et al., 1987; Reinberg and Roeder, 1987). The binding
of TFIID to the TATA box has been shown to be the first step in the assembly of an active transcription complex and to be required for the subsequent assembly of the other factors (Buratowski et al., 1989; Van Dyke et al., 1988). Sequence-specific transcription factors are thought to function by promoting or inhibiting the formation of an active transcription complex at the promoter. In principle, regulatory proteins could act by facilitating the entry of any of the general factors to the complex or by modulating the activity of the complex after assembly. TFIID has been proposed to be a direct target for the action of transcriptional regulatory factors based on DNA binding studies with partially purified TFIID from HeLa cells. For example, a cooperative interaction between the human transcription factor USF and TFIID when simultaneously bound to DNA has been reported (Sawadogo and Roeder, 1985b). More recently, the regulatory proteins GAL4 and ATF were found to alter the interaction of TFIID with promoter sequences (Horikoshi et al., 1988a, 1988b). Efforts to study the regulatory interactions between gene-specific factors and the TATA binding protein in greater detail have been hampered by the inability to purify TFIID to homogeneity from higher eukaryotes. However, the yeast TFIID protein (yTFIID, also known as BTF1Y) has been purified, and the gene encoding this general transcription factor was cloned by several different groups (Buratowski et al., 1988; Cavallini et al., 1989; Hahn et al., 1989b; Horikoshi et al., 1989; Schmidt et al., 1989). Sequence analysis of the cloned yeast gene indicates that yTFIID is a 27 kd protein without a high degree of homology to any previously characterized protein. Initial characterization of yTFIID indicated that the protein binds specifically to TATA sequences and is able to substitute for the TFIID fraction from human cells to allow at least basal-level transcription. 
Therefore, the functional domains of TFIID required for binding DNA and for assembly of an active initiation complex have been evolutionarily conserved between yeast and humans. Our initial experiments have used Sp1 as a model for studying eukaryotic regulatory proteins and how they interact with the general transcription factors. Sp1 is one of several transcription factors that utilize glutamine-rich activation domains (Courey and Tjian, 1988; Driever et al., 1989; Tanaka and Herr, 1990). Many aspects of transcriptional regulation are similar in yeast and higher eukaryotes; however, some eukaryotic transcription factors display different activation properties in yeast and higher eukaryotic cells. For example, the glutamine-rich activation domains of Sp1 that function in Drosophila or mammalian cells cannot function in yeast (G. Gill, E. Pascal, and R. T., unpublished data). Similarly, glutamine-rich domains of the Drosophila bicoid protein that are required for transcriptional activation in the embryo are not able to function in yeast (Driever et al., 1989). Reconstituted in vitro transcription experiments indicate that partially purified TFIID fractions from Drosophila or mammalian cells are able to mediate upstream activation by Sp1, but TFIID
40 GGAGCCACAG GTACAACGAC GAACGCGGTT GCATTACATT ATTATTATTA 80 CGCTCTGTGG GT/\TGCA/\TT
AAAACCCAAA GTTGAGCCCG TTCAGCAATC TTGAATTGAG TCACTGAAAA
150 180 210 ATTCACCGGA GTCCACAATA AACCATCTGT AAGATGGACC AAATGCTAAG CCCCAACTTC TCGATTCCGA 220 250 280 GCATCGGAAC GCCGCTCCAC CAGATGGAAG CGGACCAGCA GATAGTGGCC AATCCTGTGT ACCATCCTCC 290 320 350 GGCTGTATCG CAGCCGGATT CGTTGATGCC GGCACCCGGT TCCAGTTCCG TGCAGCACCA GCAGCAGCAA
CAGCAGTCGG ACGCCAGTGG GGGATCAGGT CTCTTTGGCC ACGAACCATC GCTCCCGCTG GCGCACAAAC
AAATGCAGAG TTA -A
490 GCAGCAGCAA CAGCAGCTCC AGAGTCAGK
GCCCGGCGGC GGTGGGAGCA CTCCGCAGTC CATGATGCAG CCGCAGACGC CGCAGAGCAT GATGGCCCAC
ATGATGCCCA TGAGTGAGCG GAGTGTGGGC GGTTCGGGGG CCGGAGGTGG CGGAGATGCC CTGAGCAACA
GA C CC AA G C G AT GGGCCCC T CC A CGCCGA TGACACCAGC CACACCAGGT TCCGC TGAT C CCGGTATTGT
710 740 770 GCCACAACTT CAGAACATCG TGTCCACGGT TAATCTGTGC TGCAAACTGG ACCTCAAGAA AATAGCATTG
CATGCGAGAA ACGCCGAGTA CAATCCTAAG CGATTTGCGG CTGTGATTAT GCGAATCCGA GAGCCCCGGA
850 TA TTTTCAGC TLC&Q&A
920 950 980 GGCAGCGAGA AAGTATGCGC GCATCATTCA AAAGCTCGGT TTTCCTGCAA AGTTCCTCGA CTTTAAGATT 990 1020 1050 CAAAACATGG TCGGCTCCTG CGATGTCAAG TTCCCCATAC GCTTGGAAGG CCTGGTGCTG PCCCATTGCA
880 910 TGGTGTGCACAGGGGCAAAG AGTGAGGACG ACTCCAGACT
1120 AC TC GAATCGTGCT
CCTCATCTTC GTGTCCGGAA AGGTGGTGCT CACTGGAGCA AAGGTGCGGC AGGAGATCTA CGATGCCTTC
1200 GPCAAGATAT TCCCCATTTT
AAAAAAGTTC AAGAAGCAGT CATAAATAGG ATAGCGCTTT ATTAGVCTG
1210 1300 1330 TACGTGTACG TTTTAAGGTC GGTAGTTCTG GAAGTCTGAT CATATGAGTG GGAGCAGCCT GGCGAGCAGC
1340 TCCGATCGAG AATAAACCAC AAAGTAATTT
Figure 1. Cloning of the Drosophila
1370 AGCTGTAAAA AAAAAAAAAA AAAAAA
(A) Strategy for amplifying a Drosophila DNA fragment homologous to the yeast TFIID gene. The region of the yTFIID gene containing the 240 amino acid open reading frame is represented by the horizontal bar. The locations of the different primers are indicated by horizontal arrows; numbers indicate the amino acid residue of the yTFIID protein corresponding to the 5' end of each of the primers. Primer sequences were chosen to reflect the Drosophila codon preferences (Streck et al., 1986). Reactions containing primers 165 and 191 or 165 and 200 produced an amplified product of the predicted size (indicated by asterisks).
(B) Nucleotide sequence of a dTFIID cDNA. The underlined region denotes an open reading frame, starting at nucleotide 174, specifying a protein of 353 amino acids.
(C) Low stringency Southern blot of Drosophila genomic DNA. Two micrograms of DNA was digested with either EcoRI (lane 1) or HindIII (lane 2), and hybridized with a 700 bp probe from the 3' half of the cDNA. The probe hybridizes to a 2.5 kb EcoRI fragment and a 6.5 kb HindIII fragment. The blot hybridized at high stringency appears identical to the low stringency hybridization shown here.
(D) Northern analysis of dTFIID expression. The gene is expressed as a single mRNA species of approximately 1.6 kb in Schneider cells (lane 3). The same transcript is expressed at a constant level throughout embryogenesis (data not shown).
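The coordinates quoted in (B) can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the original analysis: the function names are hypothetical, the toy sequence is invented, the 353 codons are assumed to exclude the stop codon, and the figure of roughly 110 daltons per residue is a common rule of thumb rather than a measured value.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_positions(seq):
    """Return the 1-based position of every ATG in seq, in any reading frame."""
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def sense_codons(seq, start):
    """Count codons from the ATG at 1-based `start` up to, but not including,
    the first in-frame stop codon. Returns None if no in-frame stop is found."""
    i = start - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    count = 0
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
        i += 3
    return None


def last_sense_nucleotide(start, n_codons):
    """1-based position of the last nucleotide of the last sense codon
    (assumption: n_codons does not include the stop codon)."""
    return start + 3 * n_codons - 1


def approx_mass_kd(n_residues, daltons_per_residue=110):
    """Rough protein mass in kd, using an assumed average residue mass."""
    return n_residues * daltons_per_residue / 1000


# Invented toy transcript: a short upstream reading frame, then a longer one.
toy = "GGATGCCCTAAGATGGCTGCTAAATGACC"
for pos in atg_positions(toy):
    print(pos, sense_codons(toy, pos))

print(last_sense_nucleotide(174, 353))   # last sense nucleotide of the long frame
print(round(approx_mass_kd(353), 1))     # rough mass of a 353 residue protein
```

Under these assumptions, a 353 codon reading frame beginning at nucleotide 174 would end at nucleotide 1232, with the stop codon at 1233-1235, and the rough mass estimate of about 38.8 kd is in line with the 38 kd protein described in the text.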
from yeast cannot productively interact with Sp1 (Pugh and Tjian, 1990). Therefore, we reasoned that it would be necessary to isolate TFIID from Drosophila or mammalian cells to obtain a suitable target for Sp1-dependent activation. To facilitate the study of interactions between sequence-specific transcription factors and the general initiation complex, we have cloned the Drosophila gene encoding TFIID. This has allowed us to express the gene in Escherichia coli and to purify the TATA binding protein for biochemical experiments. In this report, we compare the primary structure of the TFIID protein from Drosophila and yeast and examine the biochemical activities of the protein in three assays: DNA binding, basal-level transcription initiation, and Sp1-dependent activation. The accompanying paper (Pugh and Tjian, 1990) extends these
studies and provides evidence that the amino-terminal region of the dTFIID protein may interact with novel transcription factors required for mediating activation by Sp1.
Results
Isolation of a dTFIID cDNA
Our strategy for isolating a Drosophila cDNA encoding TFIID employed the polymerase chain reaction (PCR) method. The amino acid sequence of the yeast TFIID clone was used to design degenerate oligonucleotide primers based on Drosophila codon usage (Streck et al., 1986). Ten primers, five in each direction, were tested in different pairwise combinations using Drosophila genomic DNA as a template (Figure 1A). Two different combinations of primers yielded amplified products of the expected sizes
Cloning of Drosophila 1181
Figure 2. Comparison of the Amino Acid Sequences of Drosophila and Yeast TFIID
71 ~fohepslpla~kqmqsyqpsasyqqqqqqqqlqsqapogggstpqsmmqpqtpqsmmahmmpmsersv 1 madeerlkefkeankivfdpntrqvw
D 139 ggsoagoggdalsnihqtmgPstpm~pgsadpGIVPqLONIVsTVnLcCkLDLKkiALHARNA~Y Y 27 enqnrdgtkpattfqseedikratpesekdtsatsGIVPtLaNIVaTVtLgCrLDLKtvALHARNAEY con GIVP-LONIV-TV-L-C-LDLK-ALHARNAEY
DDSrLAaRKYARIIOK1GFpAKFlDFKIONmVG DDSkLAsRKYARIIOKiGFaAKFtDFKIONiVG DDS-LA-R K-GF-AKF-DFKION-VG
I1 275 SCDVKFPIRLEGLvltHcnFS &&&vRqEIYdAFdkI li 163 SCDVKFPIRLEGLafsHgtFS ,%$%$qReEIYqAFeaI con SCDVKFPIRLEGL---H--FSSYEPELFPGLIYRMV-'P-IVLCIFVSGK-VLTGAK-R-EIY-AF~-I D 343 fPiLkkFkKqs Y 231 yPvLseFrKm -P.L--F.K.. CC"
based on the yeast sequence, and one of these fragments was used as a probe to isolate homologous clones from a Drosophila embryonic cDNA library. The complete nucleotide sequence of the longest cDNA clone is shown in Figure 1B. Primer extension analysis (data not shown) indicates that the transcription start site corresponds to the 5' end of this cDNA. The dTFIID gene is unusual in that translation does not appear to initiate at the first methionine codon in the transcript (Kozak, 1984). The first ATG codon in the dTFIID cDNA, at nucleotide 83, specifies a very short open reading frame containing only 13 codons. The second methionine codon, at nucleotide 174, defines a long open reading frame capable of encoding a 353 amino acid, 38 kd protein that shares extensive homology with the yeast TFIID protein. The third ATG in the cDNA is in the same reading frame four codons downstream (Figure 1B); initiation of translation at the third methionine codon would result in a 350 amino acid protein. Western blot analysis of Drosophila embryo nuclear extracts is consistent with translation initiation of the endogenous protein at either the second or third ATG in the dTFIID transcript (T. H., unpublished data).
The TATA Binding Protein Is Encoded by a Single Gene in Drosophila
It has been reported that there are different types of functionally distinct TATA elements in eukaryotes (Chen and Struhl, 1988; Simon et al., 1988), suggesting that there could be multiple TATA binding proteins. To address this possibility, we investigated whether there are other Drosophila genes that are similar to the dTFIID cDNA we had isolated. Genomic DNA blots were hybridized at high and low stringency (McGinnis et al., 1984) with a probe from the 3' half of the dTFIID cDNA containing the entire part of the coding region that is conserved between yeast and Drosophila (see below). 
At both high and low stringency this probe hybridizes to a single band (Figure 1C), indicating that there is only one Drosophila gene that shares homology to yTFIID and suggesting that Drosophila has only one TATA binding protein. The dTFIID gene is expressed in Schneider cells as a single mRNA species of approximately 1.6 kb as determined by RNA blot analysis (Figure 1D). The abundance and size of the transcript are constant throughout all stages of embryogenesis, and the same transcript is present in Drosophila Kc cells (data not shown). The dTFIID gene was localized to map position 57F8-10 by in situ hybridization to polytene chromosomes (data not shown). There is no previously identified Drosophila gene that corresponds to this chromosomal locus (Lindsley and Zimm, 1987).
Amino acids that are identical in Drosophila (D) and yeast (Y) are denoted by capital letters and also indicated on the line labeled "con." Within the conserved region the yeast and Drosophila proteins possess a 34 amino acid motif that is present twice in each protein, indicated by shading. The dTFIID amino-terminal sequence that is not conserved with yTFIID is rich in the amino acids Gln, Gly, Ser, Pro, and Met. The positions for translational initiation of the two versions of dTFIID expressed in E. coli in this study are indicated by arrows. The first three residues are underlined because it is not certain if these amino acids are present in the endogenous protein.
Comparison of the Drosophila and Yeast TATA Binding Proteins
A comparison of the predicted amino acid sequences of the Drosophila and yeast TFIID proteins reveals a striking pattern of conserved and variable regions (Figure 2). The carboxy-terminal 180 residues of the two proteins are 80% identical. Furthermore, if conservative amino acid substitutions are taken into account, they share 87% homology within the conserved region. In the conserved region the yeast and Drosophila proteins possess a 34 amino acid motif that is present twice in each protein (Figure 2; Cavallini et al., 1989; Hoeijmakers, 1990; Nagai, 1990). The reiterated motif overlaps the weak homology that the TFIID proteins share with prokaryotic σ factors (Hahn et al., 1989b; Horikoshi et al., 1989). The yeast and Drosophila proteins do not share any significant homology at their amino termini, and the dTFIID protein is longer, containing approximately 170 residues upstream of the conserved region while yTFIID has only 60. The dTFIID amino-terminal region has several short repeats of the amino acids Gln, Gly, Ser, and Met.
Expression and Purification of Full-Length and Truncated Versions of dTFIID Protein
The dTFIID protein was overproduced in E. coli using a T7 promoter expression vector (Rosenberg et al., 1987; Studier and Moffat, 1986). To test the function of the unique amino-terminal domain, two versions of the protein were expressed: a 38 kd protein whose initiator methionine is at amino acid 4, and a truncated, 21 kd version initiated at methionine residue 163 (Figure 2). 
The smaller protein, referred to as dTFIID-191C, contains the carboxy-terminal 191 residues of the protein and begins 11 residues upstream of the conserved region between yeast and Drosophila. After SDS-polyacrylamide gel electrophoresis, each of the dTFIID proteins can be seen as a unique band in the crude extract when compared with a control extract prepared from induced cells that contain the expression vector with no insert (Figure 3A, lanes 2, 3, and 5). The two versions of the dTFIID protein were purified to apparent homogeneity by chromatographic fractionation on heparin-Sepharose columns (Figure 3A, lanes 4 and 6).
Figure 3. Expression of dTFIID, and DNA Binding Activity of the Protein
(A) The dTFIID protein was expressed in E. coli using the T7 promoter expression system (Studier and Moffat, 1986). Lane 1, molecular weight standards. Lane 2, DEAE flowthrough fraction of a control extract prepared from induced cells transformed with the expression vector with no insert. Lane 3, DEAE flowthrough fraction of an extract containing dTFIID protein. The induced protein can be seen as a unique species migrating at the predicted size, approximately 38 kd. Lane 4, dTFIID protein after purification on a heparin-Sepharose column. Lane 5, DEAE flowthrough fraction of an extract containing the dTFIID-191C protein; the expressed protein migrates at approximately 20 kd. Lane 6, purified dTFIID-191C protein after fractionation on a heparin-Sepharose column.
(B) DNAase I footprint analysis of dTFIID binding. A radioactively labeled promoter fragment containing the MLP TATA box was incubated with increasing amounts of the H.4 fractions derived from induced cells containing the expression vector pAR3038 with no insert (lanes 2-4) and from extract containing full-length dTFIID (lanes 6-8); no-protein lanes are 1, 5, and 9. In each case the highest amount of protein is 300 ng of the H.4 fraction, and there is a 3-fold difference in protein concentration between each step in the titration. The H.4 fraction from the control extract does not exhibit any TATA box binding activity. The locations of the fragments on the gel relative to the transcription start site are indicated to the right.
(C) Binding reactions containing partially purified fractions of the control pAR3038 extract (lane 11), dTFIID (lane 12), dTFIID-191C (lane 13), and no protein (lanes 10 and 14). In these experiments, 800 ng of bacterial protein prepared from extracts of cells (prepared as in Santoro et al., 1988) expressing the indicated expression plasmid was used in each binding reaction. Lane 15 shows a Maxam-Gilbert purine ladder. The dTFIID and dTFIID-191C proteins each protect ~20 bp centered around the TATA box from DNAase I digestion.
Biochemical Activities of the dTFIID Protein
The dTFIID protein was tested for its ability to bind to a TATA sequence using a DNAase I protection assay (Figure 3B). This DNA binding assay was performed using a fragment that contains the adenovirus major late promoter (MLP) TATA box and protein extracts fractionated on a heparin column and eluted at 0.4 M KCl (H.4 fractions). The dTFIID protein binds specifically to the TATA box (Figure 3B, lanes 6-8) and protects approximately 20 bp of sequence from DNAase I digestion, centered around the TATA element. An H.4 fraction derived from the control bacterial extract did not produce a footprint over the TATA box, indicating that the binding activity is the result of the
induced protein. The dTFIID-191C protein also binds specifically to the TATA sequence (Figure 3C, lane 13) with similar affinity to dTFIID, demonstrating that the conserved carboxy-terminal region of the protein is sufficient for DNA binding. The activities of the dTFIID and dTFIID-191C proteins in promoting basal-level transcription in the absence of any upstream activators were tested by an in vitro transcription assay using partially purified general initiation factors TFIIA, TFIIB, and TFIIE/F/pol II isolated from HeLa cells (Pugh and Tjian, 1990; Reinberg et al., 1987). The DEAE flowthrough fraction containing the bacterially expressed dTFIID protein is capable of substituting for the human TFIID fraction in promoting accurate transcription initiation from the adenovirus MLP (Figure 4, lane 1), while a DEAE flowthrough fraction from a control E. coli extract has no activity (lane 2) relative to no TFIID added (lane 3). Purified dTFIID protein is also fully active in substituting for the human TFIID fraction in directing basal-level transcription (lane 4). In addition, purified dTFIID-191C protein is similar in its basal transcriptional activation properties to the full-length protein, indicating that the
Figure 5. Purified dTFIID Protein Does Not Independently Support an Sp1 Response
Figure 4. dTFIID Protein Expressed from the Cloned Gene Activates Basal-Level Transcription
The template for the transcription reactions was pML(C2AT), containing the adenovirus MLP (Sawadogo and Roeder, 1985a). Specific initiation results in a transcript of 377 nucleotides. Reactions were performed using fractions from a HeLa cell nuclear extract that contain the different general factors required for pol II transcription. The reactions differed in the source of TFIID: lane 1, 300 ng of the DEAE flowthrough fraction containing dTFIID; lane 2, 300 ng of the DEAE flowthrough from the negative control extract; lane 3, no TFIID added; lane 4, 5 ng of purified dTFIID (heparin-Sepharose 0.6 M KCl eluate); lane 5, 5 ng of purified dTFIID-191C (heparin-Sepharose 0.6 M KCl eluate); lane 6, same as lane 4 except that plasmid p(C2AT) (Sawadogo and Roeder, 1985a), lacking the MLP sequences, was used as the template; lane 7, 0.3 µl of a fraction containing partially purified TFIID from Drosophila embryo extracts (provided by J. Kadonaga); lane 8, 1.2 µg of a fraction containing partially purified TFIID from HeLa cells.
carboxy-terminal 191 residues are sufficient to participate in formation of an active transcription complex (Figure 4, lane 5).
Purified dTFIID Is Not Able to Substitute for the Endogenous TFIID Fraction in Supporting an Sp1 Response
In addition to functioning in basal-level transcription, TFIID and the other general factors are required for mediating trans-activation by upstream factors. To test the possibility that dTFIID is a direct target involved in mediating trans-activation, we used the transcription factor Sp1 as a model. The template for these reactions is G6TI (Pugh and Tjian, 1990), a promoter that contains six Sp1 binding sites and the MLP TATA box. The endogenous, partially purified TFIID fraction from Drosophila embryos can mediate Sp1-dependent activation in reconstituted transcription assays (Figure 5, lanes 1 and 2). In contrast, the purified, bacterially expressed dTFIID protein is not able to substitute for the endogenous fraction in supporting Sp1-dependent activation (Figure 5, lanes 3 and 4).
Discussion
Function of the Conserved Region of dTFIID
We have shown that the Drosophila TFIID protein has a bipartite structural organization consisting of a highly conserved carboxy-terminal domain and a nonconserved
The template for the transcription reactions was G6TI, a promoter with six Sp1 binding sites and a TATA box. Reactions in lanes 1 and 2 contained Drosophila TFIID partially purified from embryos ("fxn."); in lanes 3 and 4, 15 ng of bacterially expressed dTFIID was added. All reactions contained the other general initiation factors from HeLa cells. Purified Sp1 (300 ng) was added to reactions in lanes 2 and 4. The endogenous Drosophila TFIID fraction is able to mediate activation by Sp1 (lanes 1 and 2). The purified, bacterially expressed dTFIID protein cannot substitute for the endogenous fraction for Sp1-dependent activation (lanes 3 and 4).
amino-terminal domain. Purified protein expressed from the cloned Drosophila gene possesses both biochemical activities that are characteristic of TFIID: sequence-specific DNA binding to the TATA element and the ability to activate transcription from a TATA box-containing promoter. The conserved region of the Drosophila protein is sufficient for both of these functions, suggesting that the domains of the protein required for contacting DNA and for assembly of an initiation complex are located within the carboxy-terminal half of the molecule. The carboxy-terminal portion of dTFIID is rich in basic amino acids but does not share a high degree of sequence similarity with any of the previously characterized DNA binding motifs. The carboxy-terminal regions of both the Drosophila and yeast proteins contain a 34 amino acid motif that is present twice in each sequence, and it has been proposed that both of these domains might be required for DNA sequence recognition (Nagai, 1990). The dTFIID protein can bind to a TATA box that has the canonical sequence TATAAAA. We have also found that purified dTFIID protein can bind other, nonconsensus Drosophila TATA box sequences (B. D. D. and T. H., unpublished data). The ability of the protein to recognize degenerate sequences is similar to the DNA binding activity that has been reported for yTFIID (Hahn et al., 1989a) and is consistent with our finding that Drosophila apparently has only one gene for the TATA binding factor. The conserved carboxy-terminal region of Drosophila TATA binding protein must also contain the domains involved in protein-protein interactions with the other general initiation factors for assembly of an active transcription complex at the promoter. The general initiation factors TFIIA and TFIIB are possible candidates for directly interacting with the TATA box binding protein. TFIIA and TFIIB do not have any DNA binding activity on their own, but can
interact with the TFIID-DNA complex, presumably through protein-protein interactions with the TATA box binding factor (Buratowski et al., 1989). Perhaps the high degree of sequence homology in the carboxy-terminal regions of the Drosophila and yeast TFIID proteins reflects the conservation of domains required for protein-protein interactions with the other members of the initiation complex.
Function of the Variable Amino-Terminal Region of dTFIID
In contrast to the highly conserved carboxy-terminal domain, the amino-terminal half of the dTFIID protein shares no homology with yeast TFIID. The 60 residue variable region of yTFIID is rich in charged amino acids, containing 23 basic or acidic residues. In contrast, the larger, variable region of the Drosophila protein has very few charged residues; instead, it has an unusual sequence characterized by many short regions composed of repeated amino acids. There are two glutamine repeats and other short repeats of glycine, serine, and methionine. The fact that the Drosophila protein contains a large region not present in the yeast protein suggests that this region is responsible for differences in biological activity of the two proteins. Indeed, the variable amino-terminal region in dTFIID may be required for mediating trans-activation by Sp1 (Pugh and Tjian, 1990). We have noticed that some preparations of purified dTFIID-191C exhibit a slightly reduced affinity for the MLP TATA sequence. We are not certain whether this reflects an intrinsic property of the truncated protein or a more general instability of this protein. It is therefore not clear if the amino-terminal domain plays a role in DNA binding, or if removal of the amino-terminal half of the protein has an indirect destabilizing effect on the activity of the DNA binding domain in the carboxy-terminal region. 
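The two kinds of sequence statistics used in this comparison, percent identity over an aligned region and the number of charged residues in a segment, are simple to compute. The following is a minimal sketch under stated assumptions and does not reproduce the calculation actually performed for this study: the function names are hypothetical, the input peptides are invented toy strings rather than the TFIID sequences, the aligned segments are assumed to be ungapped and of equal, nonzero length, and histidine is assumed not to count as a basic residue.

```python
def percent_identity(a, b):
    """Percent of positions carrying the same residue in two aligned,
    ungapped segments of equal length (case is ignored)."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def charged_residues(seq):
    """Count acidic (D, E) and basic (K, R) residues.
    Assumption: histidine is not counted as charged."""
    return sum(1 for residue in seq.upper() if residue in "DEKR")


# Invented toy peptides, used only to exercise the functions.
toy_a = "SCDVKFPIRLEG"
toy_b = "SCDVKYPIRAEG"
print(percent_identity(toy_a, toy_b))   # 10 of 12 positions match
print(charged_residues("MADEERLK"))     # D, E, E, R, K
```

Counting conservative substitutions as matches, as in the 87% figure quoted earlier, would additionally require a substitution grouping, which is not specified in the text and is therefore left out of the sketch.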
Interactions between Sequence-Specific Transcription Factors and the TATA Binding Protein
How do upstream activators communicate with the general transcription apparatus? Since a partially purified TFIID fraction derived from Drosophila embryos directs Sp1-dependent activation, our expectation was that purified dTFIID protein expressed from the cloned gene would also function in this assay. Surprisingly, the purified, bacterially expressed dTFIID is not able to mediate upstream activation by Sp1. This difference between the endogenous fraction and the purified protein suggests two possible explanations. One possibility is that the bacterially expressed protein is missing a posttranslational modification that is required for mediating upstream activation. A second possibility is that the purified dTFIID protein is fully functional but other factors that copurify with the TATA binding protein present in the endogenous fraction are necessary for a productive interaction with Sp1. The results presented in this study do not allow us to distinguish between these possibilities; however, the experiments presented in the accompanying paper provide evidence in favor of the latter hypothesis (Pugh and Tjian, 1990). Direct interactions between the TATA binding protein
and transcriptional activators have been suggested by studies with human TFIID (Hai et al., 1988; Horikoshi et al., 1988a, 1988b; Sawadogo and Roeder, 1985b). Our results, on the other hand, indicate that the TATA binding protein may not be a direct target for sequence-specific transcription factors. Since the human TFIID fractions used in the previous studies contained many other proteins, it is not clear if these interactions require intermediary factors. In fact, the behavior of partially purified TFIID fractions in DNAase I protection assays is consistent with the model (see Pugh and Tjian, 1990) that the endogenous TATA binding protein present in the TFIID fraction is associated with other proteins. The TFIID fraction partially purified from human or Drosophila cells typically protects approximately 80 bp of promoter sequences from DNAase I digestion (Horikoshi et al., 1988a, 1988b; Parker and Topol, 1984; Sawadogo and Roeder, 1985b; Van Dyke et al., 1988). In contrast, the purified yeast and Drosophila TATA binding proteins each produce essentially identical footprints of approximately 20 bp centered around the TATA box. Sp1, a protein with glutamine-rich activation domains, apparently cannot independently interact with the TATA binding protein to stimulate transcription. It will be of interest to determine if this is a common feature of glutamine-rich activators and to test other transcription factors with different kinds of activation domains to see if they function by a similar mechanism. The availability of purified Drosophila TATA binding protein, expressed from the cloned gene, will allow a closer examination of protein-protein interactions involved in transcriptional regulation.
Experimental Procedures
Cloning of the dTFIID cDNA
Each PCR reaction included 0.1 µg of Drosophila genomic DNA, 2.5 U of Taq polymerase (Perkin-Elmer Cetus), 100 µM deoxynucleotides, 1.5 µM each oligonucleotide primer, 1.5 mM MgCl2, 50 mM KCl, and 10 mM Tris (pH 6.4) in a reaction volume of 50 µl. The reaction mixtures were overlaid with mineral oil, incubated at 94°C for 2 min, and taken through 30 cycles of 40 s at 94°C, 30 s at 40°C, and 1 min at 72°C. Products of the expected sizes were subcloned into M13 and sequenced by the dideoxy method. The sequences of the primers that yielded amplified fragments homologous to yTFIID are as follows: primer 165, 5'-GAC/TGTG/CAAGTTCCCCCTTATCCG-3'; primer 191, 5'-GGGAAC/GAGCTCGIAITGGCTCGTA-3'; primer 200, 5'-GGCTTG/CACCATG/ACGGTAG/AAT-3'. The 107 bp fragment produced by primers 165 and 200 was subcloned into M13; the fragment was labeled by synthesizing the complementary strand with [32P]dCTP and Klenow polymerase. The labeled fragment was gel purified and used to screen 10^6 recombinant phage from a Drosophila 3-12 hr embryo cDNA library (Poole et al., 1985) as previously described (Kadonaga et al., 1987). Forty cross-hybridizing clones were selected from the primary screen, and ten of these were randomly selected for a secondary screen. Two independently isolated cDNA clones of 1.4 kb and 1.3 kb were subcloned into pBS-SK (Stratagene), a nested set of deletions in each orientation was constructed using exonuclease III, and the cDNAs were sequenced by the dideoxy method using Sequenase (United States Biochemical). The two clones were found to be identical in sequence except that one clone is 69 bp longer at the 5' end. The sequence of the longer cDNA is shown in Figure 1B.
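The slash notation in the primer sequences marks positions at which more than one base was included. As a sketch of how that notation expands into individual oligonucleotides, the snippet below assumes that "X/Y" means "either X or Y at this single position"; the function names are hypothetical, the input is assumed to be well formed (no leading or trailing slash), and the example string is primer 200 as transcribed in this text, so the specific variants should be treated as illustrative.

```python
from itertools import product


def parse_degenerate(primer):
    """Split a slash-notation primer into a list of per-position base options.
    Assumption: 'G/C' denotes one position that may be G or C."""
    positions = []
    i = 0
    while i < len(primer):
        options = [primer[i]]
        i += 1
        # Collect any further alternatives written as '/X' after this base.
        while i < len(primer) and primer[i] == "/":
            options.append(primer[i + 1])
            i += 2
        positions.append(options)
    return positions


def expand(primer):
    """Return every non-degenerate sequence encoded by the primer."""
    return ["".join(bases) for bases in product(*parse_degenerate(primer))]


primer_200 = "GGCTTG/CACCATG/ACGGTAG/AAT"  # as transcribed in this text
variants = expand(primer_200)
print(len(parse_degenerate(primer_200)))  # number of positions in the primer
print(len(variants))                      # number of distinct oligonucleotides
for v in variants:
    print(v)
```

Read this way, primer 200 has 20 positions, three of which are two-fold degenerate, giving 8 distinct oligonucleotides in the mixture.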

Southern and Northern Blots
For the Southern blots, 2 µg of genomic DNA was digested, electrophoresed on a 0.6% agarose gel, and transferred to GeneScreen. High
and low stringency hybridizations were carried out exactly as previously described (McGinnis et al., 1984). The probe was an ApaI fragment from nucleotide 649 through the S’ end of the dTFIID cDNA. The probe was labeled by random hexamer priming (Amersham). For the Northern blots, 20 µg of total RNA was electrophoresed on a 1.2% agarose-formaldehyde gel, transferred to GeneScreen, and hybridized with the same probe used for the library screen. The conditions for hybridization and washing were exactly as described (Hauser et al., 1985).

Expression and Purification of dTFIID
Site-directed mutagenesis was performed to create NdeI restriction sites in the dTFIID gene at positions corresponding to the methionine codons at amino acids 4 and 163 (see Figure 2). The NdeI fragments were cloned into the T7 expression vector pAA (Rosenberg et al., 1987) (the dTFIID gene contains an endogenous NdeI site in the 3′ untranslated region). Expression was induced in the cell line BL21(DE3) (Studier and Moffat, 1986). Three cultures were processed in parallel: cells containing pAR3038, cells containing pARdTFIID, and cells containing pARdTFIID-191C. The cells were grown at 37°C in 400 ml of 2× YT medium, 0.4% glucose, 200 µg/ml ampicillin to an OD600 of 0.5, induced with 0.4 mM IPTG, and grown for an additional 30 min at 30°C. The cells were harvested and lysed by treatment with lysozyme and sonication in 0.3 M KCl-HEMGN buffer (25 mM HEPES [pH 7.6], 0.1 mM EDTA [pH 8.0], 12.5 mM MgCl2, 10% glycerol, 0.1% NP40, 1 mM DTT, 0.1 mM PMSF, and 0.1 mM sodium metabisulfite). The lysates were centrifuged at 15,000 rpm for 10 min to remove insoluble material. The soluble fractions of the extracts were applied to 2.5 ml DEAE-Sepharose (Pharmacia) columns equilibrated with 0.3 M KCl-HEMGN. The flowthrough fractions from the DEAE columns were used for transcription experiments (see Figure 4).
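
The flowthrough leaves the DEAE column in 0.3 M KCl-HEMGN, and lowering the salt concentration of such a fraction is a simple mass-balance (C1V1 = C2V2) calculation. The sketch below is illustrative only: it assumes the diluent is HEMGN buffer containing no KCl, which the text does not specify, and the function name and the 10 ml sample volume are hypothetical.

```python
def diluent_volume_ml(sample_ml: float, start_molar: float,
                      target_molar: float, diluent_molar: float = 0.0) -> float:
    """Volume of diluent needed to bring a sample to target_molar salt.

    Mass balance: sample_ml * start + v * diluent = (sample_ml + v) * target
    """
    if not diluent_molar < target_molar <= start_molar:
        raise ValueError("target must lie between diluent and starting concentration")
    return sample_ml * (start_molar - target_molar) / (target_molar - diluent_molar)


# Hypothetical example: 10 ml of 0.3 M KCl flowthrough brought to 0.1 M KCl
# with KCl-free buffer (assumed diluent).
volume = diluent_volume_ml(10.0, 0.3, 0.1)
print(f"Add about {volume:.1f} ml of KCl-free buffer")
```

Under these assumptions, a threefold reduction in salt requires two volumes of diluent per volume of sample, which in turn sets the load volume for whatever column follows.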
The flowthroughs from the DEAE columns were diluted to 0.1 M KCl-HEMGN and loaded onto 5 ml heparin-Sepharose columns. The columns were washed with 0.1 M KCl-HEMGN and then successively eluted with 0.2, 0.4, and 1.0 M KCl-HEMGN. Eluted fractions with a conductivity equal to 0.8 M KCl-HEMGN are shown in Figure 3A and were used for the transcription experiments.

DNA Binding
The fragment used for the footprint reactions was labeled by polynucleotide kinase on the noncoding strand at the XhoI site at +60 in pSGsTI (Pugh and Tjian, 1990); the fragment extends upstream to a HpaI site at around -205. Binding reactions were performed in a volume of 50 µl for 45 min at room temperature and contained 50 mM KCl, 32.5 mM HEPES (pH 7.6), 0.05 mM EDTA, 6.25 mM MgCl2, 5% glycerol, 0.05% NP40, 2% polyvinyl alcohol, 100 ng of poly(dG-dC)·poly(dG-dC), and approximately 1 ng of labeled DNA. The samples were digested with DNAase I for 1 min at room temperature by addition of 50 µl of 10 mM MgCl2, 5 mM CaCl2 and 2.5 µg of DNAase I.

In Vitro Transcription
The general transcription factors TFIIA, TFIIB, TFIID, and TFIIE/F/pol II were isolated from HeLa cell nuclear extracts as described elsewhere (Pugh and Tjian, 1990; Reinberg et al., 1987) by fractionation on phosphocellulose, DEAE-Sepharose, and S-300 Sephacryl columns. All the reactions contained 2.6 µg of TFIIA (S-300), 1.8 µg of TFIIB (S-300), and 2.3 µg of the TFIIE/F/pol II (S-300). Transcription reactions were performed in a volume of 20 µl for 60 min at 30°C and contained 200 ng of template DNA, 50 µM ATP, 50 µM CTP, 10 µM UTP, 5 µCi of [α-32P]UTP, 20 mM HEPES-KOH (pH 7.9), 1.5% polyvinyl alcohol, 100 mM potassium glutamate, 50 mM KCl, 10 mM (NH4)2SO4, and 1 mM DTT. Reactions were stopped by addition of 80 µl of 3.125 M ammonium acetate, 125 µg/ml tRNA. The reaction products were extracted with phenol-chloroform, ethanol precipitated, and electrophoresed on a 6% polyacrylamide gel.

Acknowledgments

B. D.
Dynlacht and T. Hoey contributed equally to the work reported in this paper. We thank Jim Kadonaga for providing his TFIID fraction from Drosophila embryos, Laura Attardi for RNA samples, Larry Kauvar for the cDNA library, and Todd Laverty for performing the chromosome in situ hybridizations. We acknowledge Trevor Williams for pointing out the duplicated amino acid motif in the Drosophila and yeast TFIID proteins. We also thank Grace Gill, Richard Turner, and Steve Bell for their comments on the manuscript. This work was funded in part by a grant to R. T. from the National Institutes of Health. T. H. is supported by an American Cancer Society postdoctoral fellowship, and M. G. P. and B. F. P. are funded by postdoctoral fellowships from the Leukemia Society of America.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC Section 1734 solely to indicate this fact.

Received March 15, 1990; revised April 27, 1990.

References

Buratowski, S., Hahn, S., Sharp, P. A., and Guarente, L. (1988). Function of a yeast TATA element binding protein in a mammalian transcription system. Nature 334, 37-42.
Buratowski, S., Hahn, S., Guarente, L., and Sharp, P. A. (1989). Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 56, 549-561.
Cavallini, B., Faus, I., Matthes, H., Chipoulet, J. M., Winsor, S., Egly, J. M., and Chambon, P. (1989). Cloning of the gene encoding the yeast protein BTF1Y, which can substitute for the human TATA box-binding factor. Proc. Natl. Acad. Sci. USA 86, 9803-9807.
Chen, W., and Struhl, K. (1988). Saturation mutagenesis of a yeast his3 “TATA” element: genetic evidence for a specific TATA-binding protein. Proc. Natl. Acad. Sci. USA 85, 2691-2695.
Courey, A. J., and Tjian, R. (1988). Analysis of Sp1 in vivo reveals multiple transcriptional domains, including a novel glutamine-rich activation motif. Cell 55, 887-898.
Driever, W., Ma, J., Nüsslein-Volhard, C., and Ptashne, M. (1989). Rescue of bicoid mutant Drosophila embryos by Bicoid fusion proteins containing heterologous activating sequences. Nature 342, 149-154.
Hahn, S., Buratowski, S., Sharp, P. A., and Guarente, L. (1989a). Yeast TATA-binding protein TFIID binds to TATA elements with both consensus and nonconsensus DNA sequences. Proc. Natl. Acad. Sci. USA 86, 5718-5722.
Hahn, S., Buratowski, S., Sharp, P. A., and Guarente, L. (1989b). Isolation of the gene encoding the yeast TATA binding protein TFIID: a gene identical to the SPT15 suppressor of Ty element insertions. Cell 58, 1173-1181.
Hai, T., Horikoshi, M., Roeder, R. G., and Green, M. R. (1988). Analysis of the role of the transcription factor ATF in the assembly of a functional preinitiation complex. Cell 54, 1043-1051.
Hauser, C. A., Joyner, A. L., Klein, R. D., Learned, T. K., Martin, G. R., and Tjian, R. (1985). Expression of homologous homeo-box-containing genes in differentiated human teratocarcinoma cells and mouse embryos. Cell 43, 19-28.
Hoeijmakers, J. H. J. (1990). Cryptic initiation sequence revealed. Nature 343, 417-418.
Horikoshi, M., Carey, M. F., Kakidani, H., and Roeder, R. G. (1988a). Mechanism of action of a yeast activator: direct effect of GAL4 derivatives on mammalian TFIID-promoter interactions. Cell 54, 665-669.
Horikoshi, M., Hai, T., Lin, Y.-S., Green, M. R., and Roeder, R. G. (1988b). Transcription factor ATF interacts with the TATA factor to facilitate establishment of a preinitiation complex. Cell 54, 1033-1042.
Horikoshi, M., Wang, C. K., Fujii, H., Cromlish, J. A., Weil, P. A., and Roeder, R. G. (1989). Cloning and structure of a yeast gene encoding a general transcription initiation factor TFIID that binds to the TATA box. Nature 341, 299-303.
Kadonaga, J. T., Carner, K. R., Masiarz, F. R., and Tjian, R. (1987). Isolation of cDNA encoding transcription factor Sp1 and functional analysis of the DNA binding domain. Cell 51, 1079-1090.
Kozak, M. (1984). Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucl. Acids Res. 12, 857-872.
Lindsley, D. L., and Zimm, G. (1987). The genome of Drosophila melanogaster. Part 2: lethals; maps. Dros. Inf. Serv. 64, 1-158.
McGinnis, W., Levine, M. S., Hafen, E., Kuroiwa, A., and Gehring, W. J. (1984). A conserved DNA sequence in homeotic genes of the Drosophila Antennapedia and bithorax complexes. Nature 308, 428-433.
Mermod, N., O’Neill, E. A., Kelley, T. J., and Tjian, R. (1989). The proline-rich transcriptional activator of CTF/NF-I is distinct from the replication and DNA binding domain. Cell 58, 741-753.
Mitchell, P. J., and Tjian, R. (1989). Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371-378.
Nagai, K. (1990). Cryptic initiation sequence revealed. Nature 343, 418.
Parker, C. S., and Topol, J. (1984). A Drosophila RNA polymerase II transcription factor contains a promoter-region-specific DNA-binding activity. Cell 36, 357-369.
Poole, S. J., Kauvar, L. M., Drees, B., and Kornberg, T. (1985). The engrailed locus of Drosophila: structural analysis of an embryonic transcript. Cell 40, 37-43.
Ptashne, M. (1988). How eukaryotic transcriptional activators work. Nature 335, 683-689.
Pugh, B. F., and Tjian, R. (1990). Mechanism of transcriptional activation by Sp1: evidence for coactivators. Cell 61, this issue.
Reinberg, D., and Roeder, R. G. (1987). Factors involved in specific transcription by mammalian RNA polymerase II. J. Biol. Chem. 262, 3310-3321.
Reinberg, D., Horikoshi, M., and Roeder, R. G. (1987). Factors involved in specific transcription in mammalian RNA polymerase II. J. Biol. Chem. 262, 3322-3330.
Rosenberg, A. H., Lade, B. N., Chui, D. S., Lin, S. W., Dunn, J. J., and Studier, F. W. (1987). Vectors for selective expression of cloned DNAs by T7 RNA polymerase. Gene 56, 125-135.
Santoro, C., Mermod, N., Andrews, P. C., and Tjian, R. (1988). A family of human CCAAT-box-binding proteins active in transcription and DNA replication: cloning and expression of multiple cDNAs. Nature 334, 218-224.
Sawadogo, M., and Roeder, R. G. (1985a). Factors involved in specific transcription by human RNA polymerase II: analysis by a rapid and quantitative in vitro assay. Proc. Natl. Acad. Sci. USA 82, 4394-4398.
Sawadogo, M., and Roeder, R. G. (1985b). Interaction of a gene-specific transcription factor with the adenovirus major late promoter upstream of the TATA box region. Cell 43, 165-175.
Schmidt, M. C., Kao, C. C., Pei, R., and Berk, A. J. (1989). Yeast TATA-box transcription factor gene. Proc. Natl. Acad. Sci. USA 86, 7785-7789.
Scott, M. P., Tamkun, J. W., and Hartzell, G. W. (1989). The structure and function of the homeodomain. BBA Rev. Cancer 989, 25-48.
Simon, M. C., Fisch, T. M., Benecke, B. J., Nevins, J. R., and Heintz, N. (1988). Definition of multiple, functionally distinct TATA elements, one of which is a direct target in the hsp70 promoter for E1A regulation. Cell 52, 723-729.
Streck, R. D., MacGaffey, J. E., and Beckendorf, S. E. (1986). The structure of hobo transposable elements and their insertion sites. EMBO J. 5, 3615-3623.
Studier, F. W., and Moffat, B. A. (1986). Use of bacteriophage T7 RNA polymerase to direct high level expression of cloned genes. J. Mol. Biol. 189, 113-130.
Tanaka, M., and Herr, W. (1990). Differential transcriptional activation by Oct-1 and Oct-2: interdependent activation domains induce Oct-2 phosphorylation. Cell 60, 375-386.
Van Dyke, M. W., Roeder, R. G., and Sawadogo, M. (1988). Physical analysis of transcription preinitiation complex assembly on a class II gene promoter. Science 241, 1335-1338.