Plant Molecular Biology 8: 179-191 (1987) © Martinus Nijhoff Publishers, Dordrecht - Printed in the Netherlands


Genomic organization and nucleotide sequences of two histone H3 and two histone H4 genes of Arabidopsis thaliana

Marie-Edith Chaboute, Nicole Chaubet, Gabriel Philipps, Martine Ehling & Claude Gigot

Laboratoire de Virologie, Institut de Biologie Moléculaire et Cellulaire du C.N.R.S., 15 rue Descartes, 67000 Strasbourg, France

Keywords: Arabidopsis thaliana, codon usage, genomic organization, histone gene

Summary

Two histone H3 and two histone H4 genes have been cloned from a λgtWES.λB Arabidopsis thaliana gene library. From their nucleotide sequences and from studies on their genomic organization, the following conclusions can be drawn: 1) The nucleotide sequences of the two H3 coding regions show only 85% homology, but encode the same protein. The Arabidopsis H3 has the same amino acid sequence as its counterpart in corn, but differs from that of pea and wheat by replacement in position 90 of a serine by an alanine. The two H4 coding regions have 97% sequence homology and encode the same protein, identical to the sequence of their counterpart in pea, corn and one H4 variant in wheat. 2) The 5'-flanking regions of the 4 genes contain the classical histone-gene-specific consensus sequences, except H3A725 which lacks the GATCC-like pentamer. The conserved octanucleotide 5'-CGCGGATC-3', which was previously found in the 5'-flanking sequences of corn and wheat H3 and H4 genes, is also present in all four genes described here, approximately 200 to 250 nucleotides upstream from the initiation ATG. The 5'-flanking regions of the H4 genes display extensive sequence homology, whereas those of the H3 genes do not. 3) The 3'-flanking regions do not possess the classical histone-gene-specific T-hyphenated dyad symmetry motif. 4) Each H3 and H4 gene exists as 5 to 7 copies per haploid genome.

Introduction

Although the organization, structure and expression of the histone genes are well documented in the animal kingdom (reviews in 8 and 11), our knowledge in the plant kingdom is restricted to three cereals: wheat (18-20), rice (21, 22) and corn (5, 14). It was shown that in the wheat and corn genomes the genes encoding histone H3 and H4 were present at approximately 60 to 100 copies and that these two genes were not (all) closely linked. In rice, on the contrary, one set of H2A, H2B and H4 genes was found to be associated on a 6.45 kb DNA fragment. All these plants have large nuclear genomes with high amounts of dispersed repetitive sequences (6).

In this publication, we present the first studies on the histone genes of a dicotyledon: Arabidopsis thaliana. This small weed of the Cruciferae family, with a generation time of 4-5 weeks, has been used in many genetic studies. Its nuclear genome is known to be the smallest among higher plants, contains only 10 to 14% repetitive DNA (9) and is composed predominantly of single-copy sequences (15). Here we describe the molecular cloning and complete nucleotide sequences of two histone H3 and two histone H4 genes of Arabidopsis thaliana. The codon usage within the 4 coding sequences is discussed as compared with that in the histone genes of other organisms. The copy numbers of both H3 and H4 genes have also been determined.

Materials and methods

DNA purification

The DNA was extracted from 4-week-old Arabidopsis thaliana plants (wild type collected in Strasbourg), using a technique derived from that of (24). Frozen leaves were ground in a mortar in the presence of liquid nitrogen and the powder suspended in 3 volumes of extraction buffer (Tris-HCl 200 mM pH 9.0, EDTA 200 mM, SDS 4%) preheated to 60 °C. After two extractions with one volume of phenol-chloroform-isoamyl alcohol (25:24:1) followed by two extractions with chloroform-isoamyl alcohol (24:1), the aqueous supernatant was treated with ether and the DNA precipitated overnight at -20 °C with two volumes of distilled ethanol in the presence of 0.2 M NaCl. The precipitate was washed with 70% ethanol, vacuum dried and suspended in sterile distilled water. After a 1-hour treatment at 37 °C with preheated RNase A (200 µg/ml), the suspension was adjusted to 0.5% SDS and treated with 200 µg/ml of proteinase K overnight at 37 °C. The DNA was reextracted twice with one volume of phenol-chloroform-isoamyl alcohol (25:24:1) and twice with chloroform-isoamyl alcohol, and precipitated overnight at -20 °C with 2 volumes of distilled ethanol.

Construction of an Arabidopsis gene library and molecular cloning

The purified Arabidopsis DNA was partially digested with EcoRI and the fragments were ligated to the EcoRI arms of the phage λgtWES.λB used as vector. The DNA concatemers generated by ligation were packaged in vitro using 'packaging extracts' prepared from the E. coli lysogenic strains BHB 2688 and BHB 2690 as described in (2) with slight modifications (B. Hohn, personal communication). Bacteria of the strain 2701 were infected and the resulting phage plaques were screened by the in situ hybridization technique (3). As probes we used the HinfI fragment containing part of the H3 coding sequence (nucleotide 154 to 390), subcloned from the clone h22 carrying the genes encoding the five major histones of the sea urchin Psammechinus miliaris, and the coding region of the sea urchin histone H4 gene isolated from the recombinant plasmid pHae 181. Both recombinant plasmids pCh22 and pHae 181 were kindly provided by Prof. Birnstiel. The hybridization solution contained 2 × SSC, 1 × Denhardt's solution, 0.05 M phosphate buffer pH 6.5, 50% deionized formamide and 100 µg/ml of sonicated denatured E. coli DNA. The filters were soaked in this solution for 6 h at 37 °C before adding the radioactive probe. Hybridization was carried out for 24 to 48 h. The filters were then washed twice for 30 min at 37 °C in 2 × SSC containing 0.1% SDS.

DNA analysis, transfer and hybridization

DNA was analysed by electrophoresis on 0.8 to 2% agarose gels. The fragments were transferred onto Gene Screen Plus membrane and hybridized according to Southern (17). The hybridization solution contained 1% SDS, 1 M NaCl, 10% dextran sulfate and 50% deionized formamide. After 6 h prehybridization at 42 °C, the heat-denatured radioactively labelled H4 probe was added along with 100 µg/ml sonicated heat-denatured E. coli DNA. Hybridization was carried out for 24 to 48 h at 42 °C. The membrane was then washed for 5 min in 2 × SSC at room temperature, twice for 30 min in 2 × SSC containing 1% SDS at 42 °C and finally once for 10 min in 0.1 × SSC at room temperature.

DNA sequencing

The DNA fragments to be sequenced were extracted from agarose gels by the NaI technique described in (23). After digestion with the appropriate restriction nuclease(s) and dephosphorylation with calf intestinal alkaline phosphatase, the fragments were 5'-end labelled with [γ-32P]ATP and T4 polynucleotide kinase and isolated by 6% polyacrylamide gel electrophoresis. The end-labelled DNA was sequenced by the chemical degradation technique (10).

Results

Molecular cloning and nucleotide sequences of two H3 and two H4 genes

Molecular cloning

EcoRI-digested genomic DNA of Arabidopsis thaliana was analysed on a 0.8% agarose gel and transferred onto Gene Screen Plus membrane according to Southern (17). The genomic blots were hybridized with 32P-labelled sea urchin H3 or H4 coding sequences. The hybridization patterns presented in Fig. 1 show 4 bands of 4.5, 6.0, 8.5 and 9.75 kb hybridizing with H3 and 3 bands of 3.0, 3.25 and 5.5 kb hybridizing with H4. In order to study the structure of some of the H3 and H4 genes, an Arabidopsis genomic library was constructed, using the EcoRI arms of the bacteriophage λgtWES.λB DNA as a vector. 1.6 × 10^5 recombinant phages were screened with the 32P-labelled sea urchin H3 and H4 genes used as probes. 31 H3 and 3 H4 positive clones were detected and plaque purified after three rounds of screening. The DNA fragments inserted in 6 of the 31 H3 positive recombinant clones were shown to belong to two classes of 9.75 and 8.55 kb in size, and all three H4 positive clones had a DNA fragment of 5.5 kb. Two H3 positive DNA fragments, H3A713 (9.75 kb) and H3A725 (8.55 kb), and two H4 positive DNA fragments, H4A748 and H4A777, were subcloned into pBR322. After their localization on the different fragments, the coding regions as well as the 5'- and 3'-flanking regions were sequenced using the strategies presented in Figs. 2 to 5. The restriction maps of H3A713 and H3A725 are very different, whereas those of H4A748 and H4A777 show extensive similarities. The nucleotide sequences of the two H3 and the two H4 genes are presented in Figs. 6 and 7, respectively.

H3 and H4 coding regions

Fig. 1. Hybridization patterns of two Southern blots of 2 µg each of EcoRI-digested Arabidopsis genomic DNA electrophoresed on a 0.8% agarose gel. The hybridization probes were the sea urchin Psammechinus miliaris H3 (nucleotide 154 to 390) and H4 (nucleotide 9 to 286) coding regions 32P-labelled by nick-translation. The arrows indicate the DNA fragments which were cloned and partially sequenced.

The amino acid sequences of the two histone H3 proteins deduced from the nucleotide sequences presented in Fig. 6 are the same. They differ from the sea urchin P. miliaris H3 by 5 substitutions: in position 41 tyrosine is replaced by phenylalanine, arginine by lysine in 53, glutamic acid by aspartic acid in 81, methionine by alanine in 90 and serine by alanine in 96. It should be mentioned that the sequence of H3 of Arabidopsis is the same as that of corn (5), but differs from that of wheat (20) and pea (13) by substitution at position 90 of serine by alanine. At the nucleotide level, there are 61 substitutions between the two H3 coding regions, leading to a sequence homology of only 85%. Such a divergence between two genes of the same organism encoding the same protein is noteworthy, as we have shown recently that in corn, two H3 genes present 96% sequence homology (5). In contrast to the divergence existing at the sequence level between the two H3 genes, the H4 genes show 97% homology. These two genes encode the same protein (Fig. 7) which is identical to its counterpart in pea (13), corn (14) and one variant of the wheat H4 (18). It differs from the sea urchin P. miliaris H4 by substitution of valine by

Fig. 2. Restriction map (a, b) and sequencing strategy (c) of the cloned Arabidopsis DNA fragment H3A713, recloned at the EcoRI site of pBR322. The SalI-AvaI fragment of 2.1 kb containing the H3 coding region was subcloned into pBR322, generating the subclone H3pA713.2 which was partially sequenced using the restriction sites indicated in (b) and following the strategy schematized in (c). The white or black boxes indicate the coding sequences. A = AvaI; Bg = BglII; E = EcoRI; H = HindIII; R = EcoRV; X = XbaI.

Fig. 3. Restriction map (a, b) and sequencing strategy (c) of the cloned Arabidopsis DNA fragment H3A725, recloned at the EcoRI site of pBR322. The PstI-BamHI fragment of 4.25 kb containing the H3 coding region was subcloned into the polylinker region of pUC9, generating the plasmid H3pUA725 which was partially sequenced using the restriction sites shown in (b) and following the sequencing strategy schematized in (c). Symbols and lettering are the same as in Fig. 2. B = BamHI; P = PstI.

isoleucine in position 60, cysteine by threonine in 73 and lysine by arginine in 77. Up to now, the nucleotide sequences of 5 H3 genes and 6 H4 genes from three different plants (two cereals and one dicot) have been established. It is interesting to compare the sequence homologies existing between these genes encoding such highly conserved proteins. In Table 1 we present the

Fig. 4. Restriction map (a, b) and sequencing strategy (c) of the cloned Arabidopsis DNA fragment H4A748, recloned at the EcoRI site of pBR322. The EcoRI-BamHI fragment of 3.0 kb containing the H4 coding region was subcloned into pBR322, generating the plasmid H4pA748.2 which was partially sequenced using the restriction sites shown in (b) and following the sequencing strategy schematized in (c). Symbols and lettering are the same as in Figs. 2 and 3.

Fig. 5. Restriction map (a, b) and sequencing strategy (c) of the cloned Arabidopsis fragment H4A777, recloned at the EcoRI site of pBR322. The EcoRI-BamHI fragment of 3.0 kb containing the H4 coding region was subcloned into pBR322, generating the plasmid H4pA777.2 which was partially sequenced using the restriction sites shown in (b) and following the sequencing strategy schematized in (c). Symbols and lettering are the same as in Figs. 2-4.

percentage of homology existing between the genes encoding H3 and H4 in corn, Arabidopsis and wheat as compared with the corresponding genes in the sea urchin P. miliaris (16). Several points emerge from these comparisons: (i) the sequence homology between two genes encoding the same protein in the same plant is very high (92-97%) except between the two H3 genes

Fig. 6. Nucleotide sequences of the two Arabidopsis histone H3 genes H3A713 (upper line) and H3A725 (lower line). The coding sequences are represented as triplets and numbered beginning from the initiation codon ATG. In the coding region of H3A725 only the substituted bases are indicated. The nucleotides of the 5'-flanking region are numbered negatively from the initiation codon ATG. The TATA-boxes are boxed, the GATCC-like sequences indicated with brackets, and the putative CAAT-boxes with dotted lines. The conserved CGCGGATC octanucleotides are underlined.

Fig. 7. Nucleotide sequences of the two Arabidopsis histone H4 genes H4A748 (upper line) and H4A777 (lower line). The coding sequences are represented as triplets and numbered beginning from the initiation codon ATG. In the coding region of H4A777 only the substituted bases are indicated. The nucleotides of the 5'-flanking region are numbered negatively from the initiation codon ATG. The TATA-boxes are boxed, the GATCC-like sequences indicated with brackets, and the putative CAAT-boxes with dotted lines. The conserved CGCGGATC octanucleotides are underlined.
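The motifs annotated in Figs. 6 and 7 (TATA-box, conserved CGCGGATC octanucleotide) can be located in a flanking sequence with a simple exact-match scan. The Python sketch below is illustrative only: the function names `find_motif` and `distances_to_landmark` are hypothetical, and the toy flanking sequence is invented for the example and is not taken from either figure.

```python
import re


def find_motif(seq, motif):
    """Return the 0-based start of every exact (possibly overlapping) match."""
    seq, motif = seq.upper(), motif.upper()
    # A zero-width lookahead lets overlapping occurrences be reported too.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def distances_to_landmark(seq, motif, landmark):
    """For each motif hit, distance in nucleotides from the motif start to the
    start of the first landmark (e.g. TATA) located downstream of that hit."""
    seq = seq.upper()
    distances = []
    for start in find_motif(seq, motif):
        pos = seq.find(landmark.upper(), start + len(motif))
        if pos != -1:
            distances.append(pos - start)
    return distances


# Toy 5'-flanking sequence (invented): octanucleotide, filler, TATA-box, ATG.
toy_flank = "TTACGCGGATCACT" + "GCAGC" * 4 + "TATAAATA" + "GG" + "ATG"

octamer_hits = find_motif(toy_flank, "CGCGGATC")
octamer_to_tata = distances_to_landmark(toy_flank, "CGCGGATC", "TATA")
```

Applied to a real flanking sequence, the same scan would give the spacing between the octanucleotide and the TATA-box; only exact matches are found, so degenerate motifs such as the GATCC-like pentamers would need a looser pattern.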

Table 1. Percentage of sequence homology between the coding regions of the genes encoding H3 and H4 in several species. In the case of the H3 genes of Arabidopsis, (a) and (b) refer to H3A713 and H3A725, respectively.

Species compared            H3                H4
Arabidopsis/Arabidopsis     85                97
Arabidopsis/Corn            75 (a), 72 (b)    77
Arabidopsis/Wheat           75 (a), 72 (b)    79
Arabidopsis/Sea urchin      77 (a), 73 (b)    76
Corn/Corn                   96                92
Corn/Wheat                  92                93
Corn/Sea urchin             79                78

of Arabidopsis (85%), (ii) the same high level of sequence homology exists when comparing the H3 and H4 genes of corn and wheat, (iii) surprisingly, there is not more sequence divergence between plants (corn, wheat or Arabidopsis) and sea urchin than between the cereals (corn or wheat) and Arabidopsis, although the H3 and H4 from sea urchin and plants differ by several amino acids. The divergence results mainly from different codon usages.

Codon usage in the Arabidopsis H3 and H4 genes

Several studies of the codon usage in eukaryotic genes have shown that synonymous codons are not used randomly (7). We have recently shown, for example, that the codon usage in H3 and H4 genes of corn is strikingly biased, A and T being largely underused at codon position 3 (5). In the 4 genes presented here there is no striking discrepancy between the occurrence of each of the 4 bases in position 3 of the codons and the overall base composition of the genes. The detailed codon usage is presented in Table 2 as compared with that of corn (5) and P. miliaris (16). Although A is not greatly under-represented at position 3 of the codons as compared with the other three bases (Table 2), the codons GTA (Val) and ATA (Ile) are absent from both H3 and H4 genes, and TTA, CTA (Leu), TCA (Ser), CCA (Pro), ACA (Thr) and GAA (Glu) from H4 genes only. On the contrary, there is an overuse of GGA (Gly) in all four genes. Codons ending with G are represented at a normal rate except in the quartet encoding glycine, from which GGG is completely absent. The codon sextet encoding arginine is of particular interest: among 64 residues, 40 are encoded by the duet AGA, AGG but only 24 by the quartet CGT, CGC, CGA and CGG, CGG being never used and CGC only once.

It thus appears that in the Arabidopsis H3 and H4 genes, the codon usage is biased at the level of each amino acid by a preferential use of some codons. The same situation occurs in the sea urchin H3 and H4 genes, but the codons preferentially used are different from those in Arabidopsis. On the contrary, in the corn H3 and H4 genes the codon usage is quite differently biased, A and T being almost excluded from the third position of all codons (5). These two different types of biased codon usage finally lead to the same divergence of 20 to 25% between the three compared organisms (Table 1).

It was shown that in the majority of viral and eukaryotic genes, including the H3 and H4 genes of sea urchin (16), there is a striking overuse of the dinucleotide GpC over CpG in positions 2-3 of the codons (12). This particularity was not found in the H3 and H4 genes of corn (5) nor in the Arabidopsis H3 and H4 genes, in which GpC is used 12 times while CpG is used 19 times in positions 2-3 of the codons.
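The three kinds of tallies discussed above (codon counts, base composition at codon position 3, and GpC versus CpG at codon positions 2-3) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build Table 2: the function names are hypothetical and the toy coding sequence is invented, not taken from Figs. 6 or 7.

```python
from collections import Counter


def codons(cds):
    """Split a coding sequence into successive triplets."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def codon_usage(cds):
    """Number of occurrences of each codon."""
    return Counter(codons(cds))


def third_position_bases(cds):
    """Base composition at codon position 3."""
    return Counter(c[2] for c in codons(cds))


def dinucleotide_2_3(cds, dinuc):
    """Number of codons whose bases 2 and 3 spell `dinuc` (e.g. 'GC' or 'CG')."""
    return sum(1 for c in codons(cds) if c[1:] == dinuc.upper())


# Toy sequence (invented): Met-Ala-Arg-Thr-Lys-Arg-Gly-stop
toy_cds = "ATGGCGAGAACGAAGCGTGGATAA"

usage = codon_usage(toy_cds)
pos3 = third_position_bases(toy_cds)
cpg = dinucleotide_2_3(toy_cds, "CG")
gpc = dinucleotide_2_3(toy_cds, "GC")

# Arginine split as in the text: duet (AGA, AGG) versus quartet (CGN).
arg_duet = usage["AGA"] + usage["AGG"]
arg_quartet = sum(usage[c] for c in ("CGT", "CGC", "CGA", "CGG"))
```

Run on the four real coding sequences, the duet and quartet totals would correspond to the 40 and 24 arginine codons reported above (64 residues in all).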

5'-flanking regions

The 5'-flanking regions of the four genes presented here are characterized by an A-T rich nucleotide composition (about 65%). They contain the classical consensus sequences generally found upstream from the coding regions of the histone genes (see Figs. 6 and 7). The TATA-like sequences are located approximately 100 nucleotides before the initiation ATG. In the majority of histone genes sequenced up to now, there are one or two pentamers similar to GATCC, located 10 to 20 nucleotides before the TATA. Such a motif is absent from H3A725, but a similar sequence GCTTC was found in H3A713, 7 nucleotides before the TATA. In both H4A748 and H4A777 there are two such motifs, namely GTCTT and GACTC, located 9 and 17 nucleotides upstream, respectively, from the TATA-box. In each of the four genes, sequences showing analogies with the CAAT-boxes found in the wheat H4 genes (18-19) as well as in several nuclear genes of higher plants (1) are located 25-30 and 70-90

Table 2. Codon usage in the coding regions of the Arabidopsis histone H3 and H4 genes, as compared with that in corn (2 H3 and 2 H4 genes) and Psammechinus miliaris (1 H3 and 2 H4).

Phe TTT TTC
Leu TTA TTG CTT CTC CTA CTG
Ile ATT ATC ATA
Met ATG
Val GTT GTC GTA GTG
Ser TCT TCC TCA TCG
Pro CCT CCC CCA CCG
Thr ACT ACC ACA ACG
Ala GCT GCC GCA GCG

Ara H3

1 9 1 3 9 7 1 3 4 10 2 4 6

Ara H4

Corn

S. urch

2 2

3 11

5 2 7

1 1 27

2 3 11

11 5 23

2 4 9

4

6 2 3 4 3 8 1 12 3 2 6 3 4 2 1 1 3 3 1 2 8 5 1 5 12 8 -

18

2 1 1 2 3 2 4 3 5 10 3 2 17 8 5 10

3

2 2

10 2 4 1 5 1 9

8 6

4 1 23

7 4 1 2

10 5 21 1 27

nucleotides before the TATA. Whether or not these sequences play a role in the expression of these histone genes remains to be established. We recently mentioned the existence in the 5'-flanking regions of the corn (5, 14) and wheat (18-20) histone H3 and H4 genes of a highly conserved octanucleotide 5'-CGCGGATC-3', located 100 to 120 nucleotides upstream from the TATA. This sequence is also present in all four sequences presented in this paper, 91 to 140 nucleotides before the TATA. Experiments are in progress to determine whether this motif plays a role in the regulation of the expression of the plant histone genes. The computer alignments of the nucleotide sequences of the 5'-flanking regions of the H3 genes presented in Fig. 8a show only very limited sequence homologies. This result is not surprising, as

Tyr TAT TAC
Term TAA TAG
His CAT CAC
Gln CAA CAG
Asn AAT AAC
Lys AAA AAG
Asp GAT GAC
Glu GAA GAG
Cys TGT TGC
Term TGA
Trp TGG
Arg CGT CGC CGA CGG
Ser AGT AGC
Arg AGA AGG
Gly GGT GGC GGA GGG

Ara H3

Ara H4

1 3 1 1 1 3 3 13 2 10 18 4 4 9 5

8 2 4 4 4 3 17 4 2 8

-

-

2 8 1 1 1 5 12 12 1 13 -

12 2 2 6 10 13 3 18 -

Corn

S. urch

20

1 6 1 1 2 2 3 7

1

2

5

1 5 19 3 3 2 10

12 1 8

48 2 12 22

-

2 3 5 43 6 2 1 9 3 32 4 9

1 8 9 7 1 3 6 2 6 9 7 3

the two genes show significant divergence even between their coding regions. On the contrary, only minor nucleotide changes were found between the 5'-flanking regions of the H4 genes (Fig. 8b), thus confirming, at the nucleotide level, the similarities existing between the restriction maps of H4A748 and H4A777 (Figs. 4 and 5). Such a high level of sequence homology suggests that these two genes most probably result from a recent duplication.
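Homology percentages of the kind quoted for the coding and flanking regions can be recomputed from any pairwise alignment. The Python sketch below is illustrative only: the function name `percent_identity`, the convention of skipping gapped columns, and the toy aligned sequences are assumptions and do not reproduce the scoring of the GAP program. The arithmetic check at the end assumes an H3 coding region of 408 nucleotides (136 codons, stop codon excluded), which is not stated in the text.

```python
def percent_identity(aligned_a, aligned_b, gap_chars=".-"):
    """Percentage of identical bases over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # assumed convention: columns containing a gap are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (invented); '.' marks a gap, as in the alignments of Fig. 8.
a = "ACGCGGATC.GTGA"
b = "ACGCGGATCAGTTA"
identity = percent_identity(a, b)

# Arithmetic check for the two H3 coding regions: 61 substitutions over an
# ASSUMED length of 408 nucleotides (136 codons without the stop codon).
h3_identity = 100.0 * (408 - 61) / 408
```

Under that assumed length, 61 substitutions correspond to about 85% identity, in line with the value reported for the two H3 coding regions.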

3'-flanking regions

The nucleotide sequences of these regions are characterized by a low content of C + G (30 to 38%) and a very high percentage of T (over 40%) present as short stretches in all four genes presented here. The computer alignments showed that no significant homologies exist between the H3

188 ACAC^^^C'g~ . . . . .

,i,lrIllll

~Ai^~A'i'A'r

TTA~T

CC,^^^G^^ GGC

tlII1111ttJlli~lJlllilIililllPi11,1

A C AC A A A ~ A T G T A A A T T T G G T G A T A T r T A T G G T C G A A A G A A G G

C

A^TAC C C A ' I ' i " G T A ~ C J%,^IATCAATA'i'~LI~^TACGAT/~I~CTI'GAT^A'i" Illl

Illlllll{llll}lll

{llllllllllllll{llllllllll{

AATA. C CATIb"TATGTrC CAATATCAATATC AATAC G ATAA CTTGATAAT Ifllllllllllllltllllll

Illlll{l{lllllll

ACTAACATATGAT~ATTGI'I-I'~%CCAGTATCAATAT..

IIIIIIIII

ATTAAGCTA

[Fig. 8, panels a and b: nucleotide alignments of the 5'-flanking regions]

Fig. 8. Sequence homologies in the 5'-flanking regions between the two H3 genes H3A713 and H3A725 (a) and between the two H4 genes H4A748 and H4A777 (b). The nucleotide alignments were obtained using the GAP computer program of the University of Wisconsin Genetics Computer Group (UWGCG).

3'-flanking regions (data not shown). On the other hand, the H4 genes exhibit extensive sequence homology, except for a fragment of 42 nucleotides in H4A777 (+364 to +406) which is absent from H4A748. These comparisons are in good agreement with the divergence existing between the two H3 genes and the high level of homology found between the H4 genes.

In the animal histone genes the sequences downstream from the termination codon are characterized by the existence of a GC-rich T-hyphenated palindromic structure. This very typical structure was shown to play a predominant role in the 3' processing of the histone mRNAs (4). Although several inverted repeats were found in these regions of the Arabidopsis H3 and H4 genes described here, none of them resembles the aforesaid typical structure.
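A search for T-hyphenated dyad symmetries of this kind can be sketched in a few lines of Python. This is an illustration only, not the procedure used in this work: the function names, the arm length of 6 nucleotides, the maximum spacer of 4 nucleotides and the test sequence (built around the stem-loop motif commonly reported at the 3' end of animal histone genes) are all assumptions.

```python
# Illustrative sketch: scan a sequence for inverted repeats (dyad symmetries)
# whose two arms are separated by a short spacer. All names and default
# parameters are hypothetical choices, not taken from the paper.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement; unknown bases are mapped to N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def find_hyphenated_dyads(seq, stem=6, max_loop=4):
    """Return (start, left_arm, spacer, right_arm) for every perfect inverted
    repeat with arms of length `stem` separated by 0..max_loop nucleotides."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, left, seq[i + stem:j], target))
    return hits


if __name__ == "__main__":
    # Assumed example sequence, not one of the Arabidopsis 3'-flanking regions.
    example = "AACGGCTCTTTTCAGAGCCACCA"
    for hit in find_hyphenated_dyads(example):
        print(hit)
```

The spacer is returned with each hit so that T-rich hyphens and GC-rich arms can be judged by inspection; a stricter filter on base composition could be added if needed.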

Copy number and genomic organization of H3 and H4 genes

Copy number

In order to estimate the reiteration of H3 and H4 genes in the Arabidopsis genome, EcoRI-digested genomic DNA was coelectrophoresed with different known amounts of the cloned genomic DNA fragments carrying either the H3 or H4 genes. The Southern blots were hybridized with the 32P-labelled coding regions of the Arabidopsis H3 or H4 genes. In order to minimize errors due to the sequence divergence existing between H3A713 and H3A725, two estimations were done, one using H3A713 and the other using H3A725 as reference and probe. The two autoradiograms are shown in Figs. 9a and 9b. It is clear from the comparison of the two autoradiograms that using only one of the two genes as reference and probe would have led to an underestimation of the total copy number of the H3 genes. The densitometer scans of the H3A713 and H3A725 autoradiograms gave an estimation of 5 and 4 copies per haploid genome, respectively. As we do not know what sequence homologies exist between each of the two probes and the two smaller DNA fragments of 6.0 and 4.5 kb hybridizing with H3, these figures may be underestimates. We propose a value of 5 to 7 copies of the H3 genes per Arabidopsis haploid genome.

The same copy number was deduced from the densitometer scanning of the H4 autoradiograms presented in Fig. 9c.
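The arithmetic behind such a reconstruction experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculation actually performed here: the haploid genome size of 7 x 10^7 bp, the 7 kb fragment length and the densitometer readings are hypothetical values chosen for the example, and the function names are invented.

```python
# Illustrative sketch of a copy-number reconstruction calculation.
# Genome size, fragment length and signal values below are assumed examples.


def standard_mass_ng(copies, fragment_bp, genome_bp, genomic_dna_ug):
    """Mass (ng) of a cloned fragment equivalent to `copies` copies per
    haploid genome in `genomic_dna_ug` micrograms of genomic DNA."""
    return copies * (fragment_bp / genome_bp) * genomic_dna_ug * 1000.0


def estimate_copies(genomic_signal, standard_signals):
    """Estimate copies per haploid genome from densitometer readings.

    `standard_signals` maps copy equivalents to measured signal. A line
    through the origin (signal = k * copies) is fitted by least squares and
    the genomic signal is converted with the fitted slope k."""
    numerator = sum(c * s for c, s in standard_signals.items())
    denominator = sum(c * c for c in standard_signals)
    k = numerator / denominator
    return genomic_signal / k


if __name__ == "__main__":
    # Assumed: 7 kb cloned fragment, 7e7 bp haploid genome, 2 ug genomic DNA.
    print(standard_mass_ng(1, 7000, 7e7, 2.0), "ng per copy equivalent")
    # Assumed densitometer readings for the 1- and 2-copy standards.
    print(estimate_copies(50.0, {1: 10.0, 2: 20.0}), "copies")
```

A probe that matches only part of the gene family hybridizes less efficiently to the diverged members, so the value returned by such a calculation is a lower bound, in line with the caution expressed above for the H3 genes.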

Genomic organization

The genomic blots of both histone H3 and H4 genes, showing multiple hybridization bands of different sizes, demonstrate that in Arabidopsis these genes are not associated in a repeated unit, as was shown to be the case in several animal genomes (8, 11). In fact, the sequence divergence existing between the two H3 genes reveals a completely different environment and thus suggests that these genes have undergone an independent evolution. These arguments strongly support the hypothesis that in the Arabidopsis genome, the H3 and H4 genes most probably exist as dispersed elements.

Discussion

Fig. 9. Estimation of the copy number of H3 and H4 genes in the genome of Arabidopsis. 2 µg (estimated colorimetrically) of EcoRI-digested Arabidopsis genomic DNA were run on a 0.8% agarose gel in parallel with known amounts of 3 different DNA standards containing either the H3 or the H4 gene of Arabidopsis. The standards were the purified H3A713 (Fig. 9a), H3A725 (Fig. 9b) and H4A748 (Fig. 9c). Each gel was blot hybridized with the 32P-labelled coding region of the corresponding standard. (a) Lanes 1 and 2 contain 2 and 1 copies respectively of H3A713. Lane 3: 2 µg of EcoRI-digested Arabidopsis DNA. (b) Lane 1: 2 µg of EcoRI-digested Arabidopsis DNA. Lanes 2 and 3: 1 and 2 copies of H3A725 respectively. (c) 2 copies (lane 1) and 1 copy (lane 2) of H4A748. Lane 3: 2 µg of EcoRI-digested Arabidopsis DNA. The copy number of each gene was estimated by scanning densitometry of the autoradiograms.

Up to now, our knowledge concerning the plant histone genes was restricted to three cereals: corn (5, 14), wheat (18-20) and rice (21, 22). As the H3 and H4 genes of Arabidopsis thaliana presented here are the first histone genes of a dicot to be cloned and sequenced, several interesting comparisons can be made with the structure and organization of their counterparts in the monocots corn and wheat.

1) The general organization of the consensus sequences in the 5'-flanking regions of the 4 genes presented here is very similar to that found in the H3 and H4 genes of the cereals mentioned above. The TATA-boxes are located approximately 100 to 110 nucleotides upstream from the initiation ATG. The classical histone-gene-specific GATCC-like pentamers were found in three of the genes, 10 to 20 nucleotides before the TATA. From the studies of the 5'-flanking sequences of 11 plant histone genes previously described (5, 14, 18-20) the pentamer G c ( T ) ( C ) c emerged as the plant-specific sequence of this consensus motif (5). In both H4 genes there exists, in addition to a GTCTT motif (9 nucleotides before the TATA), a second pentamer GACTC (17 nucleotides before the TATA) which is much more similar to that generally found in the animal histone genes (8). It should be mentioned that H3A725 lacks this consensus sequence and it will be interesting to determine if this absence affects the expression of this particular gene. From the analysis of the 5'-flanking regions of the H3 and H4 genes of wheat and corn, a highly conserved octanucleotide CGCGGATC emerged. This sequence, which was systematically found 200 to 250 nucleotides upstream from the initiation ATG, is also present at approximately the same position in all four genes presented here, as well as in two other corn H3 and H4 genes which we have sequenced recently (data not shown). The presence of this octanucleotide in all eleven of these genes sequenced up to now strongly suggests that it is specific for plant histone genes. Studies are in progress to determine (i) whether it is restricted to the H3 and H4 genes or can also be found in the 5'-flanking regions of the H2A, H2B and H1 genes, and (ii) whether this sequence has any regulatory function in the expression of histone genes in plants.

2) In the Arabidopsis H3 and H4 genes, the coding regions do not show the very strikingly biased codon usage found in corn (5) and wheat (18-20).

3) The 3'-flanking regions of the majority of histone genes contain the classical T-hyphenated GC-rich inverted repeat, which in animals was shown to be essential for a correct positioning of the histone mRNA 3'-end. Analogous palindromic structures were described in the corn H4 histone genes (14) but are absent from the Arabidopsis genes described here. One may imagine that there exists in Arabidopsis, and perhaps in the dicots in general, a particular mechanism of histone mRNA processing.

4) As far as the organization of the H3 and H4 genes in the Arabidopsis genome is concerned, the following observations emerge: (i) Each gene exists in a small number of copies. (ii) H3 and H4 were never found to be associated on the same genomic DNA fragment. (iii) The H4 genes show extensive sequence homologies and very similar environments, whereas the H3 genes are very different even in their coding sequences, thus suggesting independent evolution. The small size of the Arabidopsis genome should allow a more accurate mapping of these genes with respect to one another. If it appears that these genes, as well as those encoding H2A and H2B, exist as dispersed elements, it will be of great interest to determine if their expression is independently regulated or if there exists some mechanism of concerted expression.
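The positional argument made in point 1 for the octanucleotide CGCGGATC can be checked on any 5'-flanking sequence with a short script. The sketch below is illustrative only: the function names are invented, the flanking sequence is assumed to end immediately before the A of the initiation ATG, and the test sequence is synthetic rather than one of the genes described here.

```python
# Illustrative sketch: locate a conserved motif in a 5'-flanking sequence and
# report its distance upstream from the initiation ATG. The input is assumed
# to end on the nucleotide just before the A of the ATG.


def motif_distances_upstream(flank, motif="CGCGGATC"):
    """Return, for each occurrence of `motif`, the number of nucleotides
    between its first base and the A of the ATG (first base inclusive)."""
    flank = flank.upper()
    motif = motif.upper()
    distances = []
    pos = flank.find(motif)
    while pos != -1:
        distances.append(len(flank) - pos)
        pos = flank.find(motif, pos + 1)
    return distances


def in_window(distances, lo=200, hi=250):
    """Keep only the occurrences lying lo..hi nucleotides upstream."""
    return [d for d in distances if lo <= d <= hi]


if __name__ == "__main__":
    # Synthetic 230-nucleotide flank with the motif starting at index 10.
    synthetic_flank = "T" * 10 + "CGCGGATC" + "A" * 212
    hits = motif_distances_upstream(synthetic_flank)
    print(hits, in_window(hits))
```

The same function can be called with other motifs, for example the pentamers discussed above, by passing a different `motif` argument and a narrower window.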

Acknowledgements

We wish to thank Prof. L. Hirth for stimulating discussions and for his constant interest in this work. We also thank Prof. M. L. Birnstiel for the gift of the plasmids pCh22 and pHae 181.

References

1. Anderson OD, Litts JC, Gautier MF, Greene FC: Nucleic acid sequence and chromosome assignment of a wheat storage protein gene. Nucleic Acids Res 12:8129-8144, 1984.
2. Becker A, Gold M: Isolation of the bacteriophage lambda A-gene protein. Proc Natl Acad Sci USA 72:581-585, 1975.
3. Benton WD, Davis RW: Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196:180-182, 1977.
4. Birnstiel ML, Busslinger M, Strub K: Transcription termination and 3' processing: the end is in site. Cell 41:349-359, 1985.
5. Chaubet N, Philipps G, Chaboute ME, Ehling M, Gigot C: Nucleotide sequences of two corn histone H3 genes. Genomic organization of the corn histone H3 and H4 genes. Plant Mol Biol 6:253-263, 1986.
6. Flavell R: The molecular characterization and organization of plant chromosomal sequences. Annu Rev Plant Physiol 31:568-596, 1980.
7. Grantham R, Gauthier C, Gouy M, Jacobzone M, Mercier R: Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 9:r43-r74, 1981.
8. Hentschel C, Birnstiel ML: The organization and expression of histone gene families. Cell 25:301-313, 1981.
9. Leutwiler LS, Hough-Evans BR, Meyerowitz EM: The DNA of Arabidopsis thaliana. Mol Gen Genet 194:15-23, 1984.
10. Maxam AM, Gilbert W: Sequencing end-labeled DNA with base-specific chemical cleavage. Methods Enzymol 65:499-560, 1980.
11. Maxson R, Cohn R, Kedes L, Mohun T: Expression and organization of histone genes. Annu Rev Genet 17:239-277, 1983.
12. Nussinov R: Some rules in the ordering of nucleotides in the DNA. Nucleic Acids Res 8:4545-4562, 1980.
13. Patthy L, Smith EL, Johnson J: Histone III: V. The amino acid sequence of pea embryo histone III. J Biol Chem 248:6834-6840, 1973.
14. Philipps G, Chaubet N, Chaboute ME, Ehling M, Gigot C: Genomic organization and nucleotide sequences of two corn histone H4 genes. Gene 42:225-229, 1986.
15. Pruitt RE, Meyerowitz EM: Characterization of the genome of Arabidopsis thaliana. J Mol Biol 187:169-183, 1986.
16. Schaffner W, Kunz G, Daetwyler H, Telford J, Smith HO, Birnstiel ML: Genes and spacers of cloned sea urchin histone DNA analysed by sequencing. Cell 14:655-671, 1978.
17. Southern EM: Detection of specific sequences among DNA fragments separated by electrophoresis. J Mol Biol 98:503-517, 1975.
18. Tabata T, Sasaki K, Iwabuchi M: The structural organization and DNA sequence of a wheat histone H4 gene. Nucleic Acids Res 11:5865-5875, 1983.
19. Tabata T, Iwabuchi M: Molecular cloning and nucleotide sequence of a variant wheat histone H4 gene. Gene 31:285-289, 1984.
20. Tabata T, Fukasawa M, Iwabuchi M: Nucleotide sequence and genomic organization of a wheat histone H3 gene. Mol Gen Genet 196:397-400, 1984.
21. Thomas G, Padayatty JD: Organization and bidirectional transcription of H2A, H2B and H4 histone genes in rice embryos. Nature 306:82-84, 1983.
22. Thomas G, Padayatty JD: Restriction map and partial sequence of a rice DNA fragment carrying histone genes H2A, H2B and H4. Ind J Biochem Biophys 21:~-6, 1984.
23. Vogelstein B, Gillespie D: Preparative and analytical purification of DNA from agarose. Proc Natl Acad Sci USA 76:615-619, 1979.
24. Zimmerman JL, Goldberg RB: DNA sequence organization in the genome of Nicotiana tabacum. Chromosoma 59:227-252, 1977.

Received 18 July 1986; in revised form 24 September 1986; accepted 8 October 1986.
