Cell, Vol. 63, 455-456, November 2, 1990, Copyright © 1990 by Cell Press
Letter to the Editor
The fork head Domain: A Novel DNA Binding Motif of Eukaryotic Transcription Factors?

Genetic and subsequent molecular analysis of pattern formation in the early Drosophila embryo has identified a number of developmental control genes that encode transcriptional regulators (reviewed by Ingham, 1988). Some of these genes, however, do not fall into any of the known categories of transcription factors. One such gene is fork head (fkh), a region-specific homeotic gene that promotes terminal, as opposed to segmental, development (Jürgens and Weigel, 1988; Weigel et al., 1989a). fkh encodes a nuclear protein, consistent with the proposal that fkh exerts its function as a transcriptional regulator of subordinate genes. Several such genes whose expression is dependent on fkh activity have been identified (Weigel et al., 1989b). The sequence of the Fkh protein, however, was not similar to the sequence of any known protein (Weigel et al., 1989a).

The rat gene encoding the hepatocyte-enriched transcription factor HNF-3A has recently been cloned and sequenced (Lai et al., 1990). The HNF-3A protein is 488 amino acids long, and its DNA binding domain has been mapped to a 165 amino acid stretch. Comparison of the sequences of HNF-3A and Fkh has revealed a striking similarity between the two proteins (Figures 1 and 2). Starting at position 91 of HNF-3A and 125 of Fkh, 129 out of 178 amino acids are identical (73% identity; Figure 1). Within this region of sequence similarity, two subregions with 48% (region A) and 86% (region B) amino acid identity can be distinguished. Allowing for conservative substitutions, the degree of sequence similarity in region B, which spans 110 amino acids, increases to 92% (Figure 1B). In addition to the large regions A and B, we note a third, small stretch of sequence identity, the hexapeptide HPFSIN (region C), at a more carboxy-terminal position in both proteins (392-397 in HNF-3A and 450-455 in Fkh).

[Figure 1: aligned amino acid sequences of HNF-3A and Fkh. (A) Region A, HNF-3A 91-152 and Fkh 125-195: identical, 34/71 aa = 48%; identical + conserved, 36/71 aa = 51%. (B) Region B, HNF-3A 161-270 and Fkh 201-310: identical, 95/110 aa = 86%; identical + conserved, 101/110 aa = 92%.]
Figure 1. Regions of Homology Shared by HNF-3A and Fkh
The HNF-3A sequence is from Lai et al. (1990); Fkh protein sequence is from Weigel et al. (1989a). (A) Similarity region A. (B) Similarity region B. Asterisks indicate identities, dots conserved exchanges. In (A), gaps have been introduced for maximal alignment.

Lai et al. (1990) have delimited the DNA binding domain of HNF-3A to a region spanning amino acids 124 to 288, which fully encompasses region B (Figure 2). Deletion of amino acids 251 to 288, which includes 19 amino acids of region B, abolishes the DNA binding activity of HNF-3A. These results suggest that the highly conserved region B is the DNA binding domain of HNF-3A and, by extension, of Fkh, which is probably also a transcription factor.

Our study of amorphic fkh mutations (Weigel et al., 1989a) supports the functional analysis of HNF-3A (Figure 2). The molecular analysis of four complete loss-of-function mutations revealed that three result in truncated versions of the wild-type protein that completely or partially lack region B (Figure 2). The fourth mutation is an in-frame deletion of six amino acids between positions 232 and 237 within region B. Based on the results obtained for HNF-3A (Lai et al., 1990), all four mutant proteins would be predicted to lack DNA binding activity.

Several features of domain A indicate that it is probably not involved directly in DNA binding. Removal of about half of this region in HNF-3A does not affect DNA binding (Lai et al., 1990). Region A is highly hydrophobic and, except for the carboxy-terminal end, contains no conserved basic amino acids, which are often important for protein-DNA interaction. Most significantly, optimal alignment between Fkh and HNF-3A is possible only after in-
HNF-3A
DNA binding