Plant Molecular Biology 6: 253-263, 1986 © 1986 Martinus Nijhoff Publishers, Dordrecht - Printed in the Netherlands

Nucleotide sequences of two corn histone H3 genes. Genomic organization of the corn histone H3 and H4 genes

Nicole Chaubet, Gabriel Philipps, Marie-Edith Chaboute, Martine Ehling & Claude Gigot
Laboratoire de Virologie, Institut de Biologie Moléculaire et Cellulaire du C.N.R.S., 15 rue Descartes, 67000 Strasbourg, France

Keywords: codon usage, genomic organization, histone gene, Zea mays

Summary

Two histone H3 genes have been cloned from a λgtWES·λB corn genomic library. Their nucleotide sequences show 96% homology, and both encode the same protein. This protein differs from its counterparts in wheat and pea by one amino acid substitution. The 5'-flanking regions of the two corn H3 genes contain the classical histone-gene-specific consensus sequences and share several regions of extensive nucleotide homology. A conserved octanucleotide, 5'-CGCGGATC-3', occurs approximately 200 nucleotides upstream of the initiation codon ATG. This octanucleotide was found in all 7 plant histone genes sequenced so far. Codon usage is characterized by a very high frequency of C (67%) and G (28%) at the third position of the codons. Codons ending in A (1%) and T (4%) are practically excluded. Comparison of Southern blots of EcoRI-, EcoRV- and BamHI-digested genomic DNA suggests that the corn H3 and H4 genes are not closely associated. The H3 genes are present in 60 to 80 copies and the H4 genes in 100 to 120 copies per diploid genome.

Introduction

Although the histone genes have been cloned and extensively studied in a wide variety of organisms over the last ten years (reviews in 7, 9), little is known about these genes in the plant kingdom. In the past two years, the histone H3 and H4 genes have been cloned from a wheat genomic library. The nucleotide sequences of two H4 variants (19, 20) and one H3 gene (21) have been established. Among 12 recombinant clones that hybridized with the sea urchin H4 gene used as a probe, only 3 also hybridized with the sea urchin H3 gene. This result was combined with a comparison of autoradiograms of different genomic DNA restriction patterns blot-hybridized with H3 or H4 genes. Together, they suggest that in wheat not all the H3 and H4 genes are closely linked. In rice, on the other hand, the genes encoding histones H2A, H2B and H4 were shown to be located on a 6.45 kb DNA fragment (22). A partial nucleotide sequence of the gene encoding H2A was established (23).

We have recently cloned and sequenced 2 genes encoding the corn histone H4 (Philipps et al., submitted). Both genes encode the same protein, and their sequence homologies are restricted to the coding regions. In this publication, we describe the molecular cloning and present the nucleotide sequences of two genes encoding the corn histone H3. We also study the copy number and genomic organization of both the H3 and H4 genes. Finally, we discuss the particular codon usage of these genes in comparison with coding sequences of histone and nonhistone genes from other organisms.

Materials and methods

DNA purification
The DNA was purified from 5- to 6-day-old corn plantlets (Zea mays, cultivar INRA 258) as previously described (Philipps et al., submitted), using a technique derived from (26).

Molecular cloning
To construct a corn genomic library, the EcoRI fragments of partially digested corn genomic DNA were ligated to the EcoRI arms of the phage λgtWES·λB. Cells of E. coli strain 2701:671 (rk− mk− recA− Su2 Su3) were infected with the in vitro packaged DNA concatemers. The recombinant phage plaques were then screened by in situ hybridization (2). The probe was a HinfI fragment containing part of the H3 coding sequence (nucleotides 154 to 390). It was subcloned from clone h22, which carries the genes encoding the 5 major histones of the sea urchin Psammechinus miliaris (15). The recombinant plasmid pCh22 was kindly provided by Prof. Birnstiel.

DNA analysis, transfer, hybridization and sequencing
The DNA was analysed by electrophoresis on 0.8 to 2% agarose gels. When required, the DNA fragments were transferred onto Gene Screen Plus membranes and hybridized according to Southern (17). The hybridization solution contained 1% SDS, 1 M NaCl, 10% dextran sulfate, 50% deionized formamide and 100 µg of sonicated, heat-denatured E. coli DNA. The membranes were incubated for 24 to 48 h at 42 °C in the presence of ³²P nick-translated DNA probe. DNA fragments were extracted from agarose gels by the NaI technique (24).

For sequencing, the purified DNA fragments were dephosphorylated with calf intestinal alkaline phosphatase and 5'-end-labelled with T4 polynucleotide kinase and [γ-³²P]ATP. They were then isolated by 6% polyacrylamide gel electrophoresis. The end-labelled DNA fragments were sequenced by the chemical degradation technique (9).

Results

Molecular cloning and nucleotide sequences of 2 corn histone H3 genes

Molecular cloning and sequencing strategy
A corn genomic library was constructed and screened with the ³²P-labelled sea urchin H3 gene, as indicated in Materials and methods. After 3 rounds of screening, four positive clones were plaque-purified out of 5 × 10⁴ recombinant phages. None of the 4 clones hybridized with the sea urchin H4 gene. Two of the H3-positive clones, H3λC2 and H3λC4, were further characterized; they contain inserted DNA fragments of 8.8 and 10.5 kb, respectively. Both inserts were recloned into the EcoRI site of the plasmid pBR322, generating the recombinant plasmids H3pC2 and H3pC4. The restriction maps of these two plasmids were established. The H3 coding regions were localized by hybridization with the sea urchin H3 gene (Figs. 1a and 2a). Several fragments were successively subcloned in pUC9 and sequenced following the strategies described in Figs. 1 and 2. This established the complete nucleotide sequence of the coding and flanking regions of both genes.

Nucleotide sequences
The nucleotide sequences of H3C2 and H3C4, including the 5'- and 3'-flanking regions, are presented in Fig. 3.

The coding regions. The primary structures of the two histone H3 proteins deduced from the nucleotide sequences are identical. They differ from their counterparts in wheat (21) and pea (13) by a single substitution at position 90, where serine is replaced by alanine. Among the 411 nucleotides of the corn H3 coding regions, only 18 substitutions were found (96% nucleotide homology). The sequence homology is 92% with the wheat H3 gene (21). It is only 79% with the sea urchin H3 gene that was used as a probe, which differs from corn H3 by 5 amino acid substitutions (15).

The 5'-flanking regions. The TATA-box-like sequence 5'-TACAAA-3' is present at positions -113 and -121 upstream of the initiation codon ATG in H3C2 and H3C4, respectively. GCTCC pentamers lie immediately upstream of the TATA boxes: two in H3C2, at -8 and -15 relative to the TATA box, and one in H3C4, at -13. These pentamers are very similar to the GATCC pentamers found in other histone genes (7). Similar sequences were found at approximately the same position in the 2 corn H4 genes (Philipps et al., submitted).
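The two coding-region observations above can be checked mechanically. First, 18 mismatches over 411 aligned positions give 393/411 ≈ 95.6%, which rounds to 96% identity. Second, synonymous third-position differences leave the deduced protein unchanged. The Python sketch below is illustrative, not the authors' software. It assumes gap-free aligned coding sequences and the standard genetic code. It also follows the usual histone convention of not counting the initiator Met, so that residue 90 is the 91st amino acid translated. The test sequence is the start of the H3C2 coding region as printed in Fig. 3. The "variant" is hypothetical, made by a single synonymous change.

```python
from itertools import product

# Standard genetic code: amino acids for all 64 codons in TCAG order
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def clean(seq: str) -> str:
    """Upper-case a sequence and drop the spaces used to display triplets."""
    return seq.replace(" ", "").upper()


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = clean(cds)
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in two gap-free aligned sequences."""
    a, b = clean(seq_a), clean(seq_b)
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def histone_residue(protein: str, position: int) -> str:
    """Residue at a histone position. The initiator Met is not counted,
    so residue 1 is protein[1] when the translation starts with M."""
    return protein[position]


if __name__ == "__main__":
    h3c2_start = ("ATG GCC CGC ACG AAG CAG ACG GCG CGC AAG "
                  "TCG ACG GGC GGC AAG GCG CCC CGC AAG CAG")
    # Hypothetical variant: GCC -> GCG at codon 2 (both encode Ala).
    variant = h3c2_start.replace("ATG GCC", "ATG GCG", 1)
    print(translate(h3c2_start))                        # MARTKQTARKSTGGKAPRKQ
    print(translate(variant) == translate(h3c2_start))  # True
    print(round(percent_identity(h3c2_start, variant), 1))  # 98.3
    print(round(100.0 * (411 - 18) / 411, 1))           # 95.6
```

Building the codon table from the compact 64-letter string avoids typing 64 dictionary entries by hand. A full-length comparison would simply pass the two complete 411-nucleotide coding regions.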


Fig. 1. Restriction map (a, b, b') and sequencing strategy (c, c') of the cloned corn genomic DNA fragment H3C2, recloned at the EcoRI site of pBR322. Two PstI fragments of H3pC2 were subcloned in the PstI site of the polylinker region of pUC9, generating H3pUC2-1 and H3pUC2-2 (b). These inserts were partially sequenced (c). To establish the complete nucleotide sequence, the EcoRI-BamHI DNA fragment was subcloned in pUC9, generating H3pUC2-3 (b'). H3pUC2-3 was sequenced using the restriction sites indicated in (b') and the sequencing strategy shown in (c'). The boxes (open or filled) indicate the H3 coding sequences. A = AvaI (2A = two AvaI sites not precisely localized between the arrows); B = BamHI; Bg = BglII; E = EcoRI; H = HindIII; P = PstI; S = SalI. Open circles (○) indicate that restriction sites of the pUC9 polylinker region were used for ³²P end-labelling.


Fig. 2. Restriction map (a, b, b') and sequencing strategy (c, c') of the cloned corn genomic DNA fragment H3C4, recloned at the EcoRI site of pBR322. Two HindIII-PstI fragments were subcloned into the polylinker region of pUC9, generating H3pUC4-1 and H3pUC4-2 (b), which were partially sequenced (c). A HindIII-BamHI fragment carrying the complete H3 coding sequence was subcloned in pUC9, generating H3pUC4-3 (b'). H3pUC4-3 was sequenced using the restriction sites indicated in (b') and the sequencing strategy described in (c'). Symbols and lettering for the restriction endonucleases are the same as in Fig. 1.


[Fig. 3 sequence panel: aligned nucleotide sequences of H3C2 and H3C4, numbered from about -450 to +700 relative to the ATG. Deduced amino acid sequence shared by both genes, in one-letter code, 136 codons from the initiator Met:
MARTKQTARK STGGKAPRKQ LATKAARKSA PATGGVKKPH RFRPGTVALR EIRKYQKSTE LLIRKLPFQR LVREIAQDFK TDLRFQSSAV AALQEAAEAY LVGLFEDTNL CAIHAKRVTI MPKDIQLARR IRGERA]

Fig. 3. Nucleotide sequences of the two corn histone H3 genes H3C2 (upper line) and H3C4 (lower line). The coding sequences are represented as triplets and numbered from the initiation codon ATG. In the coding region of H3C4 only the substituted bases are indicated. The nucleotides of the 5'-flanking region are numbered negatively from the initiation codon ATG. The TATA boxes are boxed, the GATCC sequences are indicated with brackets, and the CAAT box with dotted lines. The conserved CGCGGATC sequences are underlined. The AGGA sequences are indicated by a wavy line.
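The negative numbering used for the 5'-flanking regions can be reproduced with a simple scan for the conserved octanucleotide. The helper below is hypothetical and is not the UWGCG software used for the alignments in this work. It assumes that the input string ends at the nucleotide immediately 5' of the ATG, so its last base is -1 and there is no position 0. It also assumes that a reported position refers to the motif's 5'-most base. A mismatch allowance lets near-matches such as 5'-CCCGGATC-3' be picked up. Only the given strand is scanned.

```python
def find_motif_upstream(upstream: str, motif: str = "CGCGGATC",
                        max_mismatches: int = 0):
    """Scan a 5'-flanking sequence for a motif (given strand only).

    `upstream` must end with the nucleotide immediately 5' of the ATG, so
    its last base is position -1 (there is no position 0). Each hit is a
    tuple (position of the motif's 5'-most base, matched window,
    number of mismatches).
    """
    upstream, motif = upstream.upper(), motif.upper()
    n, k = len(upstream), len(motif)
    hits = []
    for i in range(n - k + 1):
        window = upstream[i:i + k]
        # Hamming distance: substitutions only, no insertions or deletions.
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i - n, window, mismatches))
    return hits


if __name__ == "__main__":
    # Synthetic example, not a real corn sequence.
    flank = "TTT" + "CGCGGATC" + "A" * 200
    print(find_motif_upstream(flank))   # [(-208, 'CGCGGATC', 0)]
```

Because the comparison counts substitutions only, variants that carry an extra base (such as 5'-CGCGGCATC-3') would need a gapped alignment instead.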

Sequences resembling the CAAT box, which is classically seen in eukaryotic class II genes, are located far upstream of the TATA box. In H3C2, the sequence 5'-CAATGCCACT-3' lies 67 nucleotides upstream of the TATA box. In H3C4, a similar motif, 5'-CATCTCCATC-3', lies 85 nucleotides upstream of it. These sequences differ from the canonical CAAT box generally found in animal genes. Similar sequences were found in the 5'-flanking regions of the wheat H4 gene (19), as well as in several, but not all, nuclear genes of higher plants (1). Whether they play the role of a typical CAAT box remains to be determined.

We analysed by computer the sequence homologies between the 5'-flanking regions of the plant histone genes sequenced so far. In addition to the classical consensus sequences mentioned above, this revealed a highly conserved octanucleotide: 5'-CGCGGATC-3'. This sequence is present in H3C2 and H3C4 at nucleotides -229 and -234, respectively. This octanucleotide is present in all 7 plant histone genes sequenced so far. Except in corn H4C14, it is located between nucleotides -190 and -240 (Table 1). No functional role has yet been assigned to this highly conserved sequence.

Apart from these consensus sequences, the 5'-flanking regions of the two corn H3 genes share significant sequence homologies, as shown by computer alignment of the two regions (Fig. 4a). One region, between nucleotides -180 and -350, shows 85% sequence homology. Such extensive resemblance was not found between the 5'-flanking regions of the two corn H4 genes (Fig. 4b). This suggests that the two corn H3 genes described here diverged more recently than the H4 genes did.

Table 1. Homology block and proposed consensus sequence of 8 nucleotides found in the 5'-flanking region of all the plant histone genes sequenced so far. The position of this octanucleotide is numbered negatively with respect to the initiation codon ATG.

Gene              Position       Sequence          Reference
Corn H3C2         -229           5'-CGCGGATC-3'    This publication
Corn H3C4         -234           5'-CGCGGATC-3'    This publication
Corn H4C7         -207           5'-CGCGGCATC-3'   Philipps et al., submitted
Corn H4C14        -157           5'-CGCGGATC-3'    Philipps et al., submitted
Wheat H4TH011     -191           5'-CGCGGCATC-3'   (19)
Wheat H4TH091     -191           5'-CGCGGATC-3'    (20)
Wheat H3          -210           5'-CCCGGATC-3'    (21)
Wheat H3          -223           5'-CGCGGCATT-3'   (21)
Consensus         -190 to -235   5'-CGCGGATC-3'

Fig. 4. Sequence homologies in the 5'-flanking regions between the 2 corn H3 genes H3C2 and H3C4 (a), and between the 2 corn H4 genes H4C7 and H4C14 (b) (Philipps et al., submitted). The nucleotide alignments were obtained using the 'GAP' computer program of the University of Wisconsin Genetics Computer Group (UWGCG).

The 3'-flanking regions. Only limited sequence homology was found between the 3' noncoding regions of the two corn H3 genes. Within the 150 nucleotides immediately downstream of the termination codons, A is largely underrepresented. This region contains stretches of T as well as several inverted repeats. None of them, however, corresponds to the typical T-hyphenated inverted repeat thought to be required for producing the 3' termini of the mRNA in most animal histone genes (3). Only one short sequence should be mentioned: 5'-AGAA-3' at position +55 in H3C2 and 5'-AGGA-3' at +57 in H3C4, with respect to the termination codon. These tetranucleotides show some analogy with the purine-rich sequences generally located 10 to 20 nucleotides downstream of the end of animal histone mRNAs (3).

Codon utilization in corn H3 and H4 genes

The nonrandom use of synonymous codons in the coding sequences of most eukaryotic genes is now well documented (6, 8). Biased codon usage is particularly striking in the corn histone H3 and H4 genes. Table 2 lists the 478 codons of the 2 H3 and 2 H4 genes of corn. It compares them with the codons used in the corresponding genes from wheat (19-21), chicken (4, 18) and the sea urchin Psammechinus miliaris, which was used as probe (15; S. Jallat, personal communication). To make interspecies comparisons easier, Table 3a gives the frequency of each of the 4 bases at the third codon position in the H3 and H4 genes of these organisms.

These tables show that in the corn genes the codons ending with A and T are largely underrepresented, and those ending with AA and TA are completely absent (Table 2). Among the 95% of codons ending with C or G, those ending with C are overrepresented (67%) relative to the overall base composition of the genes. Strikingly biased codon usage also occurs in the H3 and H4 genes of wheat (19-21). It is found to a lesser extent in those of chicken (4, 18), Xenopus (11) and man (25). It also occurs in some nonhistone protein genes, such as the gene encoding the corn storage protein glutelin-2 (14) (Table 3b). In contrast, A and T are not underrepresented at degenerate positions in the H3 and H4 genes of sea urchin (15) or of the yeast Saccharomyces cerevisiae (16). The same holds for the coding sequences of the majority of plant and animal nuclear genes (Table 3b).

Table 2. Codon usage in the coding regions of the histone H3 and H4 genes in corn (2 H3 and 2 H4 genes), wheat (1 H3 and 2 H4), chicken and the sea urchin Psammechinus miliaris (1 H3 and 1 H4). Columns: Corn; Wheat (19, 20, 21); Chick (4, 18); S. urchin (15). Rows: the 64 codons grouped by amino acid, including termination codons.
[Individual codon counts are not legible in this copy.]

Table 3. Percentage of each nucleotide occurring in the third position of the codons in H3 and H4 genes (a) and in some other protein genes (b) of different organisms.

                        A       T       C       G       C+G
(a) Histone H3-H4 genes
Corn                    1       4       67      28      95
Wheat (19, 20, 21)      1.5     1.5     63      34      97
Chick (4, 18)           3.5     10      47.5    39      86.5
S. urch. (15)           21.5    17      36      25.5    61.5
(b) Nonhistone protein genes
Glut. (14)              9       12.5    33      44.5    77.5
Zein (5)                30.5    24      24      21.5    45
Plant (8)               30      29.5    22.5    18      40.5
Anim. (8)               14      21      38      27      67

Studies of dinucleotide frequency and codon usage in eukaryotic genes have revealed a marked underuse, if not quasi-exclusion, of the dinucleotide CpG compared with GpC (12). It was therefore of interest to examine these two dinucleotides in sequences with such a large excess of G and C. Irrespective of their position in the codons, the CpG and GpC doublets occur in the corn H3 and H4 genes at approximately the same frequency (12.7% and 13.8%, respectively, of all dinucleotides). Codon position matters, however. The arginine codons used in the corn H3 and H4 genes (Table 2) show that CpG is not excluded from positions 1-2 of the codons: the quartet CGT, CGA, CGC and CGG is used more frequently than the duet AGA and AGG. Table 4 shows the frequency of CpG and GpC at positions 2-3 of the codons of several genes. In all of the genes listed, GpC is used more frequently than CpG. However, CpG is used significantly more often in some genes than in others. These are the corn, wheat and chicken H3 and H4 genes and the corn glutelin-2 gene. The lower-CpG group comprises the sea urchin H3 and H4 genes, the gene encoding zein and the majority of plant and animal genes (Table 4). A comparison of Tables 3 and 4 suggests that CpG is used more frequently in those genes that underuse A and T at the third codon position.

Table 4. Frequency of CpG and GpC at positions 2-3 of the codons in H3 and H4 (a) and nonhistone protein (b) genes. Values are percentages of the total number of codons with C (for CpG) or G (for GpC) in the second position.

                        CpG     GpC
(a) Histone H3-H4 genes
Corn                    40      68
Wheat                   23      72
Chick                   44      63
S. urch.                2       35
(b) Nonhistone protein genes
Glut.                   52      75
Zein                    9       39
Plant                   7.5     28
Anim.                   8       39
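The quantities in Tables 3 and 4 reduce to simple counts over in-frame codons. The Python sketch below illustrates the definitions given in the table legends; it is not the authors' procedure. The text does not state whether termination codons were counted. However, the 478 codons of Table 2 equal 2 × 136 H3 codons (from the initiator Met) plus 2 × 103 H4 codons (102 mature residues plus Met). This suggests that stop codons were excluded, so the sketch drops them by default. To pool several genes as the tables do, concatenate their codon lists before computing the percentages.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str, drop_stops: bool = True) -> list[str]:
    """Split an in-frame coding sequence into codons, optionally without stops."""
    cds = cds.replace(" ", "").upper()
    triplets = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if drop_stops:
        triplets = [c for c in triplets if c not in STOP_CODONS]
    return triplets


def third_position_composition(codon_list: list[str]) -> dict[str, float]:
    """Percentage of A, T, C and G at the third codon position (Table 3 style)."""
    counts = Counter(c[2] for c in codon_list)
    total = len(codon_list)
    comp = {base: 100.0 * counts[base] / total for base in "ATCG"}
    comp["C+G"] = comp["C"] + comp["G"]
    return comp


def cpg_gpc_at_positions_2_3(codon_list: list[str]) -> tuple[float, float]:
    """Table 4 style percentages.

    CpG: share of codons with C at position 2 that have G at position 3.
    GpC: share of codons with G at position 2 that have C at position 3.
    """
    c_second = [c for c in codon_list if c[1] == "C"]
    g_second = [c for c in codon_list if c[1] == "G"]
    cpg = 100.0 * sum(c[2] == "G" for c in c_second) / len(c_second) if c_second else 0.0
    gpc = 100.0 * sum(c[2] == "C" for c in g_second) / len(g_second) if g_second else 0.0
    return cpg, gpc


if __name__ == "__main__":
    # First 20 codons of H3C2 as printed in Fig. 3.
    fragment = ("ATG GCC CGC ACG AAG CAG ACG GCG CGC AAG "
                "TCG ACG GGC GGC AAG GCG CCC CGC AAG CAG")
    cl = codons(fragment)
    print(third_position_composition(cl))  # C: 35.0, G: 65.0, A and T: 0.0
    print(cpg_gpc_at_positions_2_3(cl))    # (75.0, 100.0)
```

On this short fragment the third positions are already exclusively C or G, in line with the whole-gene values reported for corn in Table 3.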

Genomic organization and reiteration of the H3 and H4 genes in the corn genome

Genomic organization
EcoRI-digested corn genomic DNA was Southern-hybridized with ³²P-labelled sea urchin H3 or H4 genes. Both probes gave complex autoradiographic patterns with many different labelled bands (Fig. 5). No band hybridized with both H3 and H4, showing that the two genes are never located on the same EcoRI-generated DNA fragment. To obtain more information about possible clustering of the H3 and H4 genes in corn, the genomic DNA was also digested with EcoRV or BamHI. These genomic blots were likewise hybridized with ³²P-labelled sea urchin H3 or H4 genes. The high-molecular-weight DNA fragments are poorly resolved. Nevertheless, pairwise comparison of the autoradiograms showed few if any common bands (Fig. 5). These results very strongly suggest that in corn the H3 and H4 genes are not clustered, as they are in many other organisms (review in 10). Most probably they exist as dispersed elements in the genome.

Reiteration
EcoRI-digested corn nuclear DNA was coelectrophoresed on an agarose gel with different known amounts of recombinant plasmid DNA fragments. These fragments contained the coding regions of either the corn H3 or the corn H4 gene. To minimize quantitative errors due to differences in transfer efficiency, two differently sized reference DNAs were used for each gene. The transferred genomic and reference DNAs were hybridized with the ³²P-labelled coding sequences of the corn H3 or H4 gene. The number of copies was then established by densitometer scanning of the autoradiograms. The densitometer tracings used for these estimates are shown in Fig. 6a for H3 and Fig. 6b for H4. These evaluations showed that H3 is reiterated 60 to 80 times and H4 100 to 120 times per diploid corn genome.

Fig. 5. Blot hybridization of corn genomic DNA with the sea urchin H3 and H4 genes used as probes. Corn genomic DNA was digested with EcoRI, EcoRV or BamHI. Two sets of each digest were coelectrophoresed on a 0.8% agarose gel. One set was blot-hybridized with the ³²P-labelled sea urchin H3 gene (lanes 3) and the second with the sea urchin H4 gene (lanes 4). Size references are a mixture of HindIII and HindIII + EcoRI fragments of λDNA.

Fig. 6. Estimation of the copy number of H3 and H4 genes in the corn genome. Ten µg (estimated colorimetrically) of EcoRI-digested corn genomic DNA were run on a 0.8% agarose gel. Known amounts of 2 differently sized DNA standards containing either the corn H3 or the corn H4 gene were run in parallel. The H3 standards were H3pUC2-3 linearized with EcoRI (3.9 kb) and H3C2 (8.8 kb) excised from H3pC2 with EcoRI. The H4 standards were H4pC14 (7.4 kb) linearized with BamHI and H4C14 (3.1 kb) excised from H4pC14 with EcoRI. The probes were ³²P-labelled corn H3 and H4 gene fragments isolated from H3C2 and H4C14, respectively. The H3 probe was a 251 bp PstI-SalI fragment. The H4 probe was a 328 bp HinfI-BanI fragment including 16 noncoding bp. The copy number of each gene was estimated by scanning densitometry of the autoradiograms. In the densitometer tracings presented here, the DNA standards (dotted lines) correspond to 10 copies of the H3 gene (a) and 40 copies of the H4 gene (b) per diploid corn genome. Size references are a mixture of HindIII and HindIII + EcoRI digested λDNA.

Discussion

Our knowledge of plant histone genes is restricted to 3 plants of the same family, the Gramineae. Complete nucleotide sequences have been established for 7 genes: 3 in wheat (2 H4 genes and 1 H3 gene; 19-21) and 4 in corn (2 H4 genes and 2 H3 genes). In addition, a partial sequence of an H2A gene of rice has recently been published (23). From these data the following conclusions emerge:
1) Among the classical 5'-flanking sequences, no typical motif could be established for the CAAT box. Different sequences, such as CACT or CATC, were often found associated as doublets, analogous to those described in other plant genes (1). The regulatory role of these sequences remains to be determined.

Consider next the histone-gene-specific GATCC motif (7). Comparison of the 7 wheat and corn histone genes shows that the pentamer 5'-GC(~)(~C-3' emerges as the consensus sequence in plant histone genes. One or two copies of this pentamer are located 5 to 20 nucleotides upstream of the TATA box in every gene. Also intriguing is the octanucleotide 5'-CGCGGATC-3', present approximately 200 nucleotides upstream of the initiating ATG in all of the plant histone genes sequenced so far. No analogous sequence occurs in the 5'-flanking regions of animal histone genes or of plant and animal nuclear genes. This suggests that the octanucleotide is specific to plant histone genes. More information on the structure and expression of plant histone genes is needed to establish the universality of this sequence and to determine its possible regulatory function.
2) The codon usage of both wheat and corn histone genes differs significantly from that of the majority of animal histone genes and plant nuclear genes. It is characterized by the quasi-exclusion of A and T from the third position of the codons. This was also found, to a lesser extent, in the histone H3 and H4 genes of chicken (4, 18) and in the gene encoding glutelin-2 (14), a corn storage protein. In contrast, the gene encoding zein, another corn storage protein, does not show this feature (5). Notably, genes with low amounts of A and T at the third codon position do not show the classical quasi-exclusion of the CpG doublet (compare Tables 3 and 4). Such differences in codon usage between genes of the same plant are intriguing; in corn they distinguish histone H3 and H4 and glutelin-2 from zein. It will be very interesting to see whether this organism shows a correlation between particular codon usage and rate of gene expression, as has been proposed elsewhere (6).
More sequences of plant histone genes are needed to determine whether the strikingly biased codon usage of the wheat and corn histone H3 and H4 genes is restricted to the histone genes of the Gramineae. The partial sequence of the rice histone H2A gene (23) is too short to permit any further conclusion.
3) Several arguments suggest that the H3 and H4 genes are dispersed in the corn genome:
(i) None of the cloned DNA fragments reacted with both H3 and H4 probes, and none hybridized with the sea urchin H2B gene used as a probe (data not shown).
(ii) Southern blots of corn genomic DNA restricted with 3 different endonucleases show very few (if any) DNA fragments hybridizing with both H3 and H4.
(iii) The 2 corn H3 genes show extensive sequence homologies in their 5'-flanking noncoding regions, whereas the H4 genes do not, suggesting that the H3 and H4 genes have evolved independently.
(iv) The H3 and H4 genes have significantly different copy numbers.
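The copy numbers compared in point (iv) come from scaling the total hybridization signal of a genomic lane against plasmid standards. Each standard's loaded amount corresponds to a known number of copies per diploid genome. The Fig. 6 legend gives 10 copy-equivalents for the H3 standards and 40 for the H4 standards. The Python sketch below shows that arithmetic under stated assumptions. The function names, the averaging of the two standards and the assumption of a linear film response belong to this sketch only. The diploid genome size is left as a parameter because the text does not give one.

```python
def copies_from_standards(genomic_signal: float,
                          standards: list[tuple[float, float]]) -> float:
    """Estimate gene copies per diploid genome from densitometry.

    `genomic_signal` is the integrated signal summed over all hybridizing
    bands of the genomic lane. Each standard is a pair
    (integrated band signal, copy equivalents per diploid genome).
    Assumes signal is proportional to the amount of hybridizing DNA;
    the estimates obtained from each standard are averaged.
    """
    if not standards:
        raise ValueError("at least one standard is required")
    estimates = [genomic_signal * copies / signal for signal, copies in standards]
    return sum(estimates) / len(estimates)


def standard_mass_ug(genomic_load_ug: float, copies: float,
                     standard_bp: int, diploid_genome_bp: float) -> float:
    """Mass (µg) of a standard fragment carrying as many gene copies as
    `copies` per diploid genome would give in `genomic_load_ug` of genomic
    DNA (same mass per base pair assumed for both DNAs)."""
    return genomic_load_ug * copies * standard_bp / diploid_genome_bp


if __name__ == "__main__":
    # Hypothetical signals: the genomic lane is 7 times as intense as
    # either 10-copy-equivalent standard.
    print(copies_from_standards(7.0, [(1.0, 10.0), (1.0, 10.0)]))  # 70.0
```

Two standards of different size per gene, as used in the Reiteration experiment, make it possible to spot size-dependent transfer losses. A large disagreement between the two per-standard estimates would signal a transfer problem rather than a true copy-number difference.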

As mentioned before, in wheat the H3 and H4 genes are partially linked (21), whereas in rice the H2A, H2B and H4 genes are located on a single 6.45 kb DNA fragment. Similar topological variability is found in the organization of histone genes in the animal kingdom. Clearly, additional work will be required before universal conclusions can be drawn on the structure and genomic organization of plant histone genes. In particular, information is almost completely lacking on (i) the genes encoding H1, H2A and H2B and (ii) the organization of histone genes in dicotyledons.

Acknowledgements

We wish to thank Prof. L. Hirth for stimulating discussions and for his constant interest in this work. We also thank Prof. M. L. Birnstiel for the gift of the plasmids pCh22 and pHae 181.


References
1. Anderson OD, Litts JC, Gautier MF, Greene FC: Nucleic acid sequence and chromosome assignment of a wheat storage protein gene. Nucleic Acids Res 12:8129-8144, 1984.
2. Benton WD, Davis RW: Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196:180-182, 1977.
3. Birnstiel ML, Busslinger M, Strub K: Transcription termination and 3' processing: the end is in site. Cell 41:349-359, 1985.
4. Engel JD, Sugarman BJ, Dodgson JB: A chicken histone H3 gene contains intervening sequences. Nature 297:434-436, 1982.
5. Geraghty DE, Messing J, Rubenstein I: Sequence analysis and comparison of cDNAs of the zein multigene family. EMBO J:1329-1335, 1982.
6. Grantham R, Gauthier C, Gouy M, Jacobzone M, Mercier R: Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 9:r43-r74, 1981.
7. Hentschel C, Birnstiel ML: The organization and expression of histone gene families. Cell 25:301-313, 1981.
8. Lycett GW, Delauney AJ, Croy RRD: Are plant genes different? FEBS Lett 153:43-46, 1983.
9. Maxam AM, Gilbert W: Sequencing end-labeled DNA with base-specific chemical cleavage. Methods Enzymol 65:499-560, 1980.
10. Maxson R, Cohn R, Kedes L, Mohun T: Expression and organization of histone genes. Annu Rev Genet 17:239-277, 1983.
11. Moorman AFM, De Boer PAJ, De Laaf RTM, Van Dongen WHAM, Destree OHJ: Primary structure of the histone H3 and H4 genes and their flanking sequences in a minor histone gene cluster of Xenopus laevis. FEBS Lett 136:45-52, 1981.
12. Nussinov R: Some rules in the ordering of nucleotides in the DNA. Nucleic Acids Res 8:4545-4562, 1980.
13. Patthy L, Smith EL, Johnson J: Histone III: V. The amino acid sequence of pea embryo histone III. J Biol Chem 248:6834-6840, 1973.
14. Prat S, Cortadas J, Puigdomenech P, Palau J: Nucleic acid (cDNA) and amino acid sequences of the maize endosperm protein glutelin-2. Nucleic Acids Res 13:1493-1504, 1985.
15. Schaffner W, Kunz G, Daetwyler H, Telford J, Smith HO, Birnstiel ML: Genes and spacers of cloned sea urchin histone DNA analysed by sequencing. Cell 14:655-671, 1978.
16. Smith MM, Andresson OS: DNA sequences of yeast H3 and H4 histone genes from two non-allelic gene sets encode identical H3 and H4 proteins. J Mol Biol 169:663-690, 1983.
17. Southern EM: Detection of specific sequences among DNA fragments separated by electrophoresis. J Mol Biol 98:503-517, 1975.
18. Sugarman BJ, Dodgson JB, Engel JD: Genomic organization, DNA sequence and expression of chicken embryonic histone genes. J Biol Chem 258:9005-9016, 1983.
19. Tabata T, Sasaki K, Iwabuchi M: The structural organization and DNA sequence of a wheat histone H4 gene. Nucleic Acids Res 11:5865-5875, 1983.
20. Tabata T, Iwabuchi M: Molecular cloning and nucleotide sequence of a variant wheat histone H4 gene. Gene 31:285-289, 1984.
21. Tabata T, Fukasawa M, Iwabuchi M: Nucleotide sequence and genomic organization of a wheat histone H3 gene. Mol Gen Genet 196:397-400, 1984.
22. Thomas G, Padayatty JD: Organization and bidirectional transcription of H2A, H2B and H4 histone genes in rice embryos. Nature 306:82-84, 1983.
23. Thomas G, Padayatty JD: Restriction map and partial sequence of a rice DNA fragment carrying histone genes H2A, H2B and H4. Ind J Biochem Biophys 21:1-6, 1984.
24. Vogelstein B, Gillespie D: Preparative and analytical purification of DNA from agarose. Proc Natl Acad Sci USA 76:615-619, 1979.
25. Zhong R, Roeder RG, Heintz N: The primary structure and expression of four cloned human histone genes. Nucleic Acids Res 11:7409-7425, 1983.
26. Zimmerman JL, Goldberg RB: DNA sequence organization in the genome of Nicotiana tabacum. Chromosoma 59:227-252, 1977.

Received 2 October 1985; in revised form 12 December 1985; accepted 16 December 1985.
