Plant Molecular Biology 12: 245-256, 1989 © 1989 Kluwer Academic Publishers. Printed in Belgium


Characterization of the kafirin gene family from sorghum reveals extensive homology with zein from maize

Richard T. DeRose 1, Din-Pow Ma 2, In-Sook Kwon 1, Seyed E. Hasnain 3, Rodney C. Klassy 1 and Timothy C. Hall 1*

1 Department of Biology, Texas A&M University, College Station, TX 77843-3258, USA (*author for correspondence); 2 Department of Biochemistry, Mississippi State University, Drawer BB, Starkville, MS 39762, USA; 3 National Institute of Immunology, JNU Complex, Jit Singh Marg, New Delhi 110 067, India

Received 23 August 1988; accepted 13 October 1988

Key words: kafirin, prolamin, seed storage protein, sorghum genomic library, zein

Abstract

Electrophoretic analysis of translation products of polyadenylated RNA isolated from mid-maturation sorghum seed in the presence of [35S]met, [3H]leu, or [3H]val revealed two major proteins of 24 kDa and 21 kDa. These products were not detected when [3H]lys was supplied as the radioactive substrate. Under similar electrophoretic conditions, kafirin (a major seed storage prolamin of sorghum) migrated as two bands of 22 kDa and 19 kDa. Sequence analysis of two cDNA clones (pSK8 and pSKR2) from sorghum seed mRNA revealed them to be highly homologous with each other and with the 22 kDa zeins from maize, suggesting that they represented kafirin cDNAs. Compared with pSKR2, pSK8 had an insertion of 24 nucleotides and a deletion of 24 nucleotides, so that the coding regions were nearly identical in length. The deduced amino acid sequences for these cDNA clones reveal that kafirin, like zein, is rich in glutamine and nonpolar amino acids, but contains no lysine. Both kafirin and zein have a 21 amino acid signal peptide exhibiting 80% homology and eight copies of a repetitive amino acid block in the C-terminal domain with the consensus FLLALNPLALANPAAYLQQQQ. The kafirin cDNAs were used as probes to screen a sorghum genomic library; one genomic clone (λGK.1) was sequenced and found to be very similar (97.8%) to the pSK8 cDNA clone. Clone λGK.1 contains features typical of a functional gene in that the intronless open reading frame encoding 268 amino acids is flanked at the 5' end by sequences corresponding to the CAAT and TATA promoter boxes (positioned at about -60 and -30 bp, respectively, from the transcriptional initiation site), and at the 3' end by a consensus polyadenylation signal. In common with zein genomic clones, kafirin clones contain a 15 base pair consensus sequence centered at position -320 relative to the transcriptional initiation site. Under similar hybridization conditions, genomic reconstruction analysis using an oligonucleotide probe indicated the presence of fewer than 20 copies of kafirin per haploid sorghum genome compared with approximately 140 copies of zein per haploid maize genome.

Introduction

Sorghum (Sorghum bicolor) ranks 7th in farm value among US crops, and is widely used throughout the world for grain and forage in climates that are too hot and arid for maize. However, many features of sorghum suggest it is closely related to maize [27]. Despite the economic importance of sorghum, little detailed information about its seed proteins is presently available. It has, however, been shown that prolamins predominate in sorghum grain, which has an overall amino acid content that is low in lysine and tryptophan [27]. The major prolamin isolated from sorghum seed is called kafirin.

Intensive analysis of maize seed proteins, especially the zein fraction (the major prolamin in maize seed) [19], has been undertaken in the past decade. From such work it has become clear that zein is encoded by a multigene family. The term zein has been applied to several molecular species: the 22 kDa and 19 kDa forms (zein-1) have highly similar sequences notable for the absence of lysine residues [51]; the 27 kDa, 15 kDa and 10 kDa forms (zein-2) contain much higher proportions of methionine and cysteine and have little in common with the zein-1 fraction beyond their solubility in alcohol. Different research groups have provided widely varying estimates for the copy number of zein genes, but a generally accepted number appears to be above 25 for the 22 kDa form and 55 for the 19 kDa form [52].

We wished to learn if the kafirin fraction of sorghum is encoded by genes similar to those described for the 22 kDa and 19 kDa zeins, and also to investigate the complexity and regulation of the kafirin gene family. Such studies are important in understanding the molecular basis of sorghum seed protein expression and allow comparisons to be made between the biology of kafirin and zein proteins. Homologies between kafirin and zein will more precisely define structurally conserved domains in these gene families. Data presented here should provide information useful for improvement of the nutritive and processing qualities of sorghum seed.

Materials and methods

Isolation and translation of sorghum polyadenylated RNA

Total RNA was prepared from mid-maturation sorghum seeds as described by Sengupta-Gopalan et al. [45], and polyadenylated RNA was obtained by poly(U) Sephadex chromatography [39]. The wheat germ extract (S23) used for translation in vitro was prepared according to Marcu and Dudock [32]. The translation products were separated by electrophoresis through an 18% SDS-polyacrylamide gel [25] and visualized by fluorography [7].

cDNA cloning

Clone pSK8 was isolated from a cDNA library of mid-maturation seed mRNA constructed by the S1 method of Maniatis et al. [31]. This clone, and others isolated from this library, lacked 5' untranslated regions, and a second library was constructed using an adaptation of the methods of Okayama and Berg [40] and Watson and Jackson [50]: the mRNA-cDNA hybrid obtained by reverse transcription was treated with RNase H, Escherichia coli DNA polymerase I and DNA ligase in the presence of deoxyribonucleoside triphosphates. The double-stranded cDNAs were filled in with T4 DNA polymerase, blunt-end ligated to phosphorylated Eco RI linkers, and cloned into the Eco RI site of the λgt10 vector [20]. cDNA libraries were screened with either a 23 base oligonucleotide (32P-TTGTTGTTGTAGGTATGTAGCGG) that represented the sequence of the repetitive amino acid block, or with the pSK8 insert labeled with 32P by nick translation. The positive clones were used for DNA mini-preps [31], and cDNA inserts were purified and subcloned into M13 bacteriophage for DNA sequencing.
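As a quick consistency check on the probe, the sketch below (not part of the original methods) forms the reverse complement of the 23 base oligonucleotide quoted above and translates it with the standard genetic code. The helper names and the reading-frame offset of 2 are assumptions made for illustration; the frame was chosen because it yields a peptide resembling the C-terminal end of the repetitive block.

```python
# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# 23 base probe as given in the text (it is complementary to the mRNA).
PROBE = "TTGTTGTTGTAGGTATGTAGCGG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def translate(seq, offset=0):
    """Translate complete codons of seq, starting at the given offset."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)
    )


sense = reverse_complement(PROBE)
print(sense)                       # CCGCTACATACCTACAACAACAA
print(translate(sense, offset=2))  # ATYLQQQ (offset 2 is an assumed frame)
```

The translated heptapeptide can be compared by eye with the "AAYLQQQQ" tail of the repetitive block.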

Construction of genomic libraries

Total sorghum (Sorghum bicolor, cv. RTx-430) DNA was isolated from etiolated seedlings 7-9 days after planting. Fresh tissue was frozen in liquid nitrogen, ground to a fine powder, and extracted for 1 h at 55 °C with 1% hexadecyltrimethylammonium bromide, 0.7 M NaCl, 50 mM Tris-HCl (pH 8.0), 10 mM EDTA, 100 mM sodium metabisulfite, 1% polyvinylpyrrolidone-40 and 20 µg/ml proteinase K. The resulting lysate was extracted twice with an equal volume of chloroform:octanol (24:1), adjusted to 2.5 M in ammonium acetate, and nucleic acids were precipitated by addition of one volume of isopropanol. DNA was purified by isopycnic banding through two sequential CsCl density gradients. The resulting sorghum DNA was routinely greater than 60 kb in length. Eco RI digestion of sorghum DNA was conducted as described by Sun and Slightom [48] and 10-23 kb fragments were purified by centrifugation through a 10-40% sucrose gradient [31]. Genomic libraries were constructed in bacteriophage λ Charon 35 [29]. Ligation and screening of subsequent libraries were essentially as described by Murray et al. [38]. Recombinant phage containing kafirin genes were identified by hybridization with both cDNA clones (pSK8 and pSKR2) and with the 23 base synthetic oligonucleotide described above.

DNA sequencing

The cDNA and genomic clones were sequenced by the dideoxynucleotide chain termination method [43] after ligation into M13mp18 or M13mp19 and transfection into E. coli JM101 [31]. The single-stranded recombinant phage DNAs were concentrated with polyethylene glycol, extracted with phenol, and sequenced using a universal primer [1]. A sequential series of overlapping clones representing both orientations was produced by hybridization of single-stranded recombinant M13 DNA with complementary 22- or 29-mer oligonucleotides as described [9]. The complete sequence of contiguous regions was assembled from these overlapping clones and analyzed using programs from the University of Wisconsin Genetics Computer Group [10].

Primer extension analysis

The transcriptional start site on genomic clone λGK.1 was determined by primer extension [15]. Total mid-maturation RNA (30 µg) was mixed with 10^4 cpm of a 29-mer (32P-GCTCACTGAAAGAGCAAGGAGCACAAGGA) complementary to the 5' region of kafirin mRNA, boiled for 3 minutes, then annealed at 65 °C for 90 minutes. The duplex was extended using AMV reverse transcriptase in the presence of four dNTPs and RNase inhibitor (RNasin, Promega Biotech). The extended products were ethanol precipitated and resolved on a 6% polyacrylamide sequencing gel in parallel with a ladder made from dideoxynucleotide sequencing reactions, using the 29-mer as a sequencing primer, of a related single-stranded recombinant M13 that contained the 5' region of λGK.1 (see Fig. 5).

Genomic reconstruction

Prior to DNA:DNA hybridization analysis of the kafirin gene family, total sorghum DNA (1 µg) was digested to completion with the restriction enzymes specified in the text, fractionated on a 0.7% agarose gel by electrophoresis, and transferred onto nylon membranes (GeneScreen Plus, New England Nuclear) by an alkaline blotting procedure [42]. The pSKR2 cDNA insert was coelectrophoresed in mixtures that represented 10, 20, and 40 copies per haploid genome equivalent. The probe during these hybridizations was a mixture containing nick-translated pSK8 and pSKR2 cDNA inserts.

For slot-blot hybridization, sorghum nuclear DNA (1 µg) and maize DNA (Zea mays cv. B-73; 2.87 µg) were depurinated with HCl, followed by strand cleavage and denaturation with NaOH. DNA samples were blotted onto nylon membranes under alkaline conditions [42] using a slot-blot manifold (Schleicher & Schuell). Purified cDNA insert from pSK8 was treated identically and applied to the filter in mixtures with 1 µg carrier lambda DNA that contained the indicated copies per haploid genome equivalent. Filters were probed with the synthetic 23 base oligonucleotide (above) that was 32P-labeled by T4 polynucleotide kinase in the presence of γ-32P-ATP (> 10^9 cpm/µg).

The gene copy number determined from the reconstruction blots was based upon a haploid genome content of 1.74 pg for sorghum (cv. RTx-430) [28] and 5.0 pg for maize (cv. B-73) [28]. One microgram of nuclear sorghum DNA or 0.653 pg of purified SK8 DNA (908 bp) contains the equivalent number of molecules as a single copy gene (3.08 × 10^6 molecules). The gene copy numbers were estimated by densitometric comparison of the hybridization signals in genomic DNA to those obtained from the reconstruction standards.
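The densitometric comparison against reconstruction standards can be sketched as a simple standard curve. The text does not say how the comparison was computed, so the least-squares line below is an assumed approach, and the densitometer readings are hypothetical values invented for illustration, not data from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_copies(signal, standards):
    """Convert a band signal to copies per haploid genome.

    standards is a list of (copies, signal) pairs taken from the
    reconstruction lanes; a linear signal response is assumed.
    """
    copies = [c for c, _ in standards]
    signals = [s for _, s in standards]
    slope, intercept = fit_line(copies, signals)
    return (signal - intercept) / slope


# Hypothetical densitometer readings (arbitrary units) for the
# 10, 20 and 40 copy reconstruction standards.
standards = [(10, 0.52), (20, 1.01), (40, 2.03)]
print(round(estimate_copies(1.0, standards), 1))
```

A linear fit is only reasonable while the film response is unsaturated; readings above the highest standard would need additional standards rather than extrapolation.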

Results

In vitro translation of polyadenylated RNA

[35S]met-labeled translation products of polyadenylated RNA isolated from mid-maturation sorghum seeds contained two major bands that migrated during electrophoresis as expected for molecules of 24 kDa and 21 kDa. These values are approximately 2 kDa larger than were obtained for native kafirin, consistent with the presence of a signal sequence on each molecular species. These two major products were also observed when [3H]leu or [3H]val was used as label (Fig. 1, lanes d, e, h, i, j, and k), but were not present among the translation products when [3H]lys was used as the labeled substrate (Fig. 1, lanes f and g). These observations are in agreement with the belief that kafirin was a major component of the translation products and, consequently, that the polyadenylated RNA extracted from mid-maturation seeds was enriched in kafirin mRNA.

Structural analysis of kafirin cDNA clones

Fig. 1. SDS-PAGE analysis of in vitro translation (wheat germ extract) products of poly(A) mRNA isolated from mid-maturation sorghum seed. (a) translation products of brome mosaic virus (BMV) mRNA in the presence of 35S-methionine; (b & c) products from sorghum poly(A) RNA in the presence of 35S-methionine; (d, e, h & i) products from sorghum poly(A) RNA in the presence of 3H-leucine; (f & g) products from sorghum poly(A) RNA in the presence of 3H-lysine; (j & k) products from sorghum poly(A) RNA in the presence of 3H-valine; and (l) products from brome mosaic virus mRNA in the presence of 3H-lysine. Numbers indicate size in kilodaltons.

Fig. 2. Nucleotide comparisons between kafirin cDNA (pSK8 and pSKR2) and genomic (λGK.1 and λGK.4) clones. The complete nucleotide sequence for λGK.1 is shown; only variable nucleotides are shown for pSK8, pSKR2, and λGK.4. Nucleotide +1 denotes the translational start site for λGK.1. Translational initiation (ATG) codons are indicated by +1 and polyadenylation addition sites for the cDNA clones are denoted by an asterisk. Dots represent missing nucleotides or gaps introduced to maximize sequence homology. (Sequence alignment not reproduced.)

Fig. 3. Comparison of the deduced amino acid sequences from cDNA clones pSKR2 and pSK8 with that from zein cDNA clone pCM1. The signal peptide cleavage site is denoted by an arrowhead and the termination codon is indicated by an asterisk. Dots indicate gaps introduced to maximize sequence alignment. Repetitive amino acid blocks are enclosed in boxes, and the consensus sequence for these repetitive blocks is shown for both kafirin and zein. Consensus sequences: kafirin, FLLALNPLALANPAAYLQQQQ; zein, FLLALNQLAVANPAAYLQQQQ (Spena et al., 1982); zein, LLPFNQLAVLNPAAYLQQQQ (Geraghty et al., 1982). (Sequence alignment not reproduced.)

Clone pSK8, isolated from the sorghum cDNA library constructed by the S1 nuclease procedure (see Materials and methods), was sequenced and found to be 80% homologous with the zein cDNA clone pCM1, a member of the zein B49 subfamily [37, 46]. This comparison (not shown) indicated that pSK8 lacked only the first 6 nucleotides of the coding region, and contained the complete 3' untranslated sequence up to the poly(A) tail. The pSK8 insert was used as a hybridization probe to screen a second cDNA library constructed by an RNase H procedure (see Materials and methods). Clone pSKR2, isolated from this library, contained 55 bp of 5' untranslated sequence, an open reading frame encoding 268 amino acids and 88 bp of 3' untranslated sequence (Fig. 2). Allowing for an average poly(A) tail of 200 nt, the polyadenylated form of mRNA corresponding to pSKR2 would be about 1150 nt. Hybridization analysis between 32P-labeled pSKR2 and polyadenylated RNA from sorghum seed revealed an RNA band of approximately 1200 nt, indicating the length of pSKR2 to be close to that expected for a full-length cDNA clone (data not shown).

The deduced amino acid sequences for the two kafirin clones suggest that they encode the 22 kDa kafirin species; they show considerable (70%) homology with the zein sequence derived from pCM1 (Fig. 3). All three sequences are rich in nonpolar amino acids, contain no lysine, and have a similar 21 amino acid signal peptide sequence (consensus: MATKIFSLLMLLALSASAATA), which may target these proteins to the endoplasmic reticulum, which produces protein bodies within the endosperm of developing grain. The most notable difference among the three clones is that pSK8 contains (with reference to pSKR2) an insertion of 24 nucleotides between positions 272-295 and a deletion of 24 nucleotides at position 428. Interestingly, the insertions and deletions in pSK8, relative to pSKR2, retain the presence of repetitive structures in the primary amino acid sequence (see below).
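The transcript length quoted for pSKR2 (about 1150 nt) can be reproduced with simple arithmetic from the figures given in the text. The sketch below is an illustration only; it assumes that the stop codon is counted separately from the 268-codon open reading frame and is not part of the 88 bp 3' untranslated sequence. The function name is hypothetical.

```python
def transcript_length(utr5_bp, codons, utr3_bp, poly_a_nt=0, stop_codon=True):
    """Sum the parts of a transcript, in nucleotides.

    Assumption: the stop codon (3 nt) is counted separately from both
    the open reading frame and the 3' untranslated region.
    """
    coding = 3 * codons + (3 if stop_codon else 0)
    return utr5_bp + coding + utr3_bp + poly_a_nt


# Values reported for pSKR2: 55 bp 5' UTR, 268 codons, 88 bp 3' UTR.
cdna_nt = transcript_length(55, 268, 88)
mrna_nt = transcript_length(55, 268, 88, poly_a_nt=200)
print(cdna_nt, mrna_nt)  # 950 1150
```

Under these assumptions the estimate agrees with the stated value and lies close to the RNA band of approximately 1200 nt observed by hybridization.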

Comparison of the derived kafirin and zein protein sequences

Previous N-terminal analyses of the 22 kDa kafirin protein [4] have revealed the amino acid sequence FI P C PI I NHE-vvIAQySLAQNAIAAQFLPAL-. This sequence closely matches the amino acid sequence deduced from the cDNA data, starting at amino acid position 22, and provides confirmation that the first 21 amino acids encoded by the cDNA clones represent a signal peptide in precursor kafirin that is cleaved during maturation to yield the form found in mature seed endosperm. A structural model has been proposed [2] for the 22 and 19 kDa zeins, based on the presence of nine homologous repeat units in their primary amino acid sequences [46] and on their circular dichroic spectra [2]. Dot matrix plot analysis of the deduced amino acid sequences for pSK8 and pSKR2 reveals (data not shown) eight repetitive structures towards the C-terminus of kafirin with a consensus sequence: FLLALNPLALANPAAYLQQQQ (Fig. 3). This is similar to the two consensus blocks proposed by Spena [46] and Geraghty et al. [13] for zein (also shown in Fig. 3). Conservation of these blocks in zein and kafirin suggests that these proteins arose from a common ancestral gene; it is likely that the repetitive blocks were generated by multiple duplications.
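The similarity between the repeat consensus blocks can be made concrete with a position-by-position identity count. The sketch below compares the kafirin consensus with the 21-residue zein consensus attributed to Spena et al., both taken as printed in Fig. 3. It is an ungapped comparison written for illustration, not the dot matrix method used in the study, and the function name is hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical residues between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (gaps not handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Consensus repeat blocks as printed in Fig. 3.
KAFIRIN = "FLLALNPLALANPAAYLQQQQ"
ZEIN_SPENA = "FLLALNQLAVANPAAYLQQQQ"

print(round(percent_identity(KAFIRIN, ZEIN_SPENA), 1))  # 90.5
```

The two blocks differ at two of 21 positions (P/Q and L/V). The 20-residue block attributed to Geraghty et al. would need a gapped alignment and is therefore not handled by this function.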

Fig. 4. The nucleotide and deduced amino acid sequences of kafirin clone λGK.1. TATA and CAAT boxes are designated by sequences within boxes; the major transcriptional start site (see Fig. 5) is denoted by an adenosine residue with a caret (^) above it. Amino acids are designated by their one-letter symbols and the mature N-terminus is marked by an arrowhead. The polyadenylation signal is denoted by checked squares. Putative locations for N-linked glycosylation are marked by solid squares. The single underline represents the 105 base direct repeat sequence within λGK.4, and the double underline indicates the 8 bp consensus sequence hypothesized to confer endosperm specificity [8, 23, 30, 35]. Lower case letters represent the translational termination codon. (Nucleotide sequence not reproduced.)


Isolation and characterization of a genomic clone encoding kafirin

A sorghum genomic library was constructed in bacteriophage λ vector Charon 35 [29] and several clones encoding kafirin were isolated using pSKR2, pSK8, and the synthetic oligonucleotide as probes. One clone, λGK.1, appears to contain an entire kafirin gene; the complete nucleotide and derived amino acid sequence of a 2.15 kb Acc I fragment (λGK.1) of this clone is given in Fig. 4. Regions homologous with the TATA and CAAT promoter regions of plant genes [11] are located close to the -30 and -60 positions, relative to the putative transcriptional initiation site revealed by S1 mapping and primer extension analysis (see below and Fig. 5). Additional upstream sequences analogous to the zein "-330" sequences [23, 30] are also present. As found for all zein genes characterized thus far, λGK.1 contains no intervening sequences. It closely corresponds with the pSK8 cDNA sequence (97.8%) and has a contiguous open reading frame encoding 268 amino acids. The consensus polyadenylation sequence for the B49 subfamily of zein genes, AATAAT, occurs 60 bp downstream from the translation stop signal. The predicted molecular weight of the polypeptide encoded by λGK.1 is 26.9 kDa (excluding a predicted 2.2 kDa signal peptide). This discrepancy between apparent and predicted molecular weights has also been described with respect to zein genes [14, 21]. As observed for many eukaryotic genes, the 1 kb 5' and 400 bp of 3' untranslated regions surrounding λGK.1 are highly A-T-rich (~63%) compared with the coding region, where the A-T proportion is only 50%.

A 2.2 kb Bgl II-Dra I fragment derived from a second kafirin genomic clone, λGK.4, has been completely sequenced and found to be homologous (83%) with cDNA clone pSKR2. This genomic clone also has all the hallmarks of a functional gene; however, a termination codon interrupts the reading frame at codon 175. Homologies between λGK.1 and λGK.4 are 85% within the 5' flanking 1 kb, 81% within the coding region, and 82% within the 3' untranscribed regions. An interesting feature within the 5' flanking region of λGK.4 is the presence, between positions -287 and -78 relative to the cap site, of two copies of a 105 bp direct repeat (87% homologous) which immediately follow one another. Clone λGK.1 contains only one copy of this direct repeat within its 5' flanking DNA.

Fig. 5. Identification of the transcriptional start site of kafirin genomic clone λGK.1 by primer extension analysis. Location of the major extension product is marked by a large arrowhead and the minor products by a small arrowhead; lanes marked A, C, G, T indicate dideoxy sequence reactions of λGK.1; and the lane marked (+) denotes the presence of 30 µg total seed RNA in the primer extension reaction.
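The observation that a termination codon interrupts the λGK.4 reading frame at codon 175 corresponds to a simple in-frame scan. Because the λGK.4 sequence is not reproduced here, the sketch below runs on a short hypothetical sequence; the function name and the toy input are illustrative assumptions, not material from the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 1-based codon number of the first in-frame stop codon.

    cds is read from its first base in frame 0; returns None if no
    stop codon is found among the complete codons.
    """
    cds = cds.upper()
    for number, start in enumerate(range(0, len(cds) - 2, 3), start=1):
        if cds[start:start + 3] in STOP_CODONS:
            return number
    return None


# Hypothetical toy sequence: ATG GCT ACC AAG TAG ATA TTC
toy = "ATGGCTACCAAGTAGATATTC"
print(first_in_frame_stop(toy))  # 5
```

Applied to a full coding region, a result smaller than the expected codon count (268 for the kafirin clones described here) would flag a premature stop of the kind reported for λGK.4.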

S1 mapping and primer extension analysis

Since λGK.1 possesses all the elements of a functional gene, S1 mapping was undertaken to determine the putative transcriptional start site. One major and two minor bands resistant to mung bean nuclease digestion were observed approximately 30-32 bp downstream from the TATA box (data not shown). To complement this finding, primer extension was used to locate the 5' end of kafirin mRNA precisely. As shown in Fig. 5, four extended products were resolved on a 6% polyacrylamide-7 M urea gel, all of which correspond to the T lane of the dideoxy sequencing ladder of the M13 clone described in Materials and methods. These results indicate that one major and three minor initiation sites are located at 68, 69, 66 and 65 nucleotides, respectively, upstream from the translation start site. Hence, although it is not a full-length clone, pSKR2 appears to lack only some 12 nt at its 5' end.

Fig. 6. Genomic reconstruction analysis of the kafirin and zein gene families. Panel A: 10, 20, 40, 80 and 160 copies of pSKR2 per haploid genome equivalent (per 1 µg of sorghum DNA) and sorghum nuclear DNA which was digested with the indicated restriction endonucleases were probed during Southern blot hybridizations with 10s cpm of nick-translated pSKR2 (3 × 10^8 cpm/µg). Panel B: reconstruction analysis of sorghum and maize nuclear DNA. Lanes bracketed by "S" contain 1 µg of sorghum DNA (1.74 pg per haploid genome) [28] and those bracketed by "M" contain 2.87 µg of maize DNA (5.0 pg per haploid genome) [28]. The hybridization probe was a 23 base synthetic oligonucleotide, which encoded a repetitive amino acid block (nucleotides 499-522 of pSK8). Numbers indicate the amount (in copies per haploid genome equivalent) of purified pSK8 insert DNA which was applied to the filter. Copy numbers were determined by comparisons between densitometer tracings of test lanes (containing sorghum or maize DNA) with those of lanes containing a known copy number.


Estimation of gene copy number by genomic reconstruction

The number of genes encoding 22 kDa kafirins was determined by digestion of 1 µg of sorghum genomic DNA with Hind III or Bgl II and Dra I, followed by electrophoresis and transfer onto a nylon filter (Fig. 6A). Hybridization was conducted under moderately stringent conditions (Tm - 25 °C). The pSKR2 cDNA insert was used for gene reconstruction on the basis of the haploid genome content of sorghum being 1.74 pg [28], and 0.653 pg of purified pSK8 DNA insert being equal to one gene copy equivalent per 1 µg of sorghum DNA. Comparisons between the densitometer measurements of pSKR2 and genomic DNA bands indicate that the copy numbers for the 3.0 kb and 3.5 kb Hind III bands are 14 and 6, respectively, and that the total copy number of the 2.2 kb Dra I/Bgl II digestion fragment was 20. The larger hybridizing band in the reconstruction of panel A represents contaminating vector DNA which cross-hybridizes under these conditions.

In order to determine the total copy number of 22 kDa and 19 kDa kafirins, as well as that for the entire zein-1 family, slot blot analysis of total sorghum and maize DNA was performed using a 23 base oligonucleotide as the probe (see Methods; Fig. 6B). This analysis confirmed that there are approximately 20 kafirin genes encoding the 22 kDa and 19 kDa kafirin proteins, and some 110 genes encoding the zein-1 family. It should be stressed that the existence of kafirin gene families which do not cross-hybridize with these sequences cannot be excluded; therefore, 20 gene copies per haploid genome represents the minimum number of kafirin genes.

Potential sites for glycosylation in kafirin

Analysis of the predicted amino acid sequences of cDNA clones pSK8 and pSKR2 indicated that 22 kDa kafirins potentially fall into two classes: glycosylated and nonglycosylated. Clones pSK8 and λGK.1 both contain two sequences characteristic of N-linked glycosylation sites (amino acids 149 and 192), whereas pSKR2 and λGK.4 do not contain these sequences. Preliminary studies suggested that at least some portion of the kafirin fraction bound specific lectins (data not shown). Additionally, Burr and Burr [6] suggested that a single glucose residue is associated with the zein fraction from maize seed. We considered the possibility that some kafirin (and perhaps zein) molecules are transiently glycosylated between the site of synthesis and final deposition as a mature protein. In order to determine if kafirin possesses N-linked oligosaccharides, kafirin samples were sent to Thomas Rademacher's laboratory at Oxford University. Samples of native kafirin were subjected to hydrazinolysis, reduction with NaB3H4 and Bio-Gel P-4 chromatography as described by Ashford et al. [3]. These studies indicated that kafirin protein is not associated with any sugar moiety (D. Ashford, personal communication).

Discussion

Comparison of GK.1, GK.4 and zein sequences

Overall, our findings show that extensive homologies exist between kafirin and zein proteins within the prolamin fractions of corn and sorghum seed, analogous to the homologies found for phaseolin of French bean and β-conglycinin of soybean [11]. Sequence conservation observed in the several zein genes sequenced thus far has been interpreted as identifying regions of structural importance [2, 21, 24, 33, 34, 41, 44, 49]. Homologies observed between the zein and kafirin sequences (Figs. 2 and 3) should provide even more rigorous definition of structurally important regions in these proteins. In particular, the multiple repeats thought to yield a rod-like structure for zein [2, 41] are conserved in kafirin.

The copy number for kafirin genes in sorghum was found to be lower than that for zein genes in corn (20 vs. 110; see Fig. 6). If all, or the majority of, zein genes are expressed, nutritional enhancement of corn seed by approaches such as the insertion of lysine residues into zein coding regions will be complicated by dilution arising from the expression of competing endogenous zein genes. It appears that this difficulty is alleviated, but not obviated, in the case of sorghum.

Clone λGK.4 was found to contain a termination codon, and although this could be a cloning error, it probably represents a pseudogene. The existence of a 105 bp duplication within the 5' flanking region of clone λGK.4 that is not present in λGK.1 raises some interesting questions relating to the effects that this sequence may have upon regulation of kafirin expression. One possibility is that the duplication of the sequences may induce a downregulation of transcription, as do the negative regulatory elements within the promoter of the pea ribulose-1,5-bisphosphate carboxylase [22]. It will therefore be important to determine the effects that this duplication has upon kafirin expression in both transgenic dicot and monocot tissues.

Evidence suggesting that GK.1 is an expressed gene

We have examined features of kafirin mRNA, predicted from the cDNA sequence, that may hinder expression at the translational level. As described by Spena et al. [47] for phaseolin, zein, and several other plant mRNAs, a hairpin structure can be formed between the 5' and 3' untranslated regions of kafirin clones λGK.1 and pSK8 (e.g. ΔG = -205 kJ for λGK.1). Since both kafirin and phaseolin are efficiently translated in vitro (see Fig. 1 and [17]), it appears unlikely that this putative secondary structure inhibits translation either in vitro or in vivo. Nevertheless, it should be borne in mind that the wheat germ translational system is of monocot origin, and it is conceivable, if unlikely, that dicot ribosomes have some additional requirement for efficient translation.

S1 and primer extension analysis of λGK.1 (Fig. 5) and λGK.4 (not shown) indicated that the transcriptional start site lies immediately upstream of the coding region. This finding detracts from the possibility that an intron (or introns) lies within its 5' untranslated region, as has been suggested for zein [5]. Northern hybridization analysis (not shown) also provided no evidence for transcription from distantly duplicated promoter regions, such as those found 1 kb apart for zein clone PML1 [26]. Repeat CAAT and TATA boxes were found adjacent to the -30 and -70 regions (Fig. 4), as has been observed for phaseolin [18] and other seed genes [19].

Considerable interest exists in identifying features of genes, mRNAs, and proteins responsible for their tissue-specific expression and accumulation. Analysis of transcriptional expression of low molecular weight glutenin genes from wheat in transgenic tobacco has suggested that the sequence TGTAAAGG, located approximately 330 bp upstream of the transcriptional start site, is especially important for endosperm-specific gene expression [8, 23, 30, 35]. This sequence is found twice, four helical turns apart, approximately 300 bp from the transcriptional start site within genomic clone λGK.1. A third similar sequence (TCGAAAAG) is found 32 bp downstream (approximately three helical turns) from the last TGTAAAGG sequence.
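The spacing of the TGTAAAGG-like elements described in this section can be checked with a short motif scan. The sketch below is illustrative only: the sequence is a made-up toy string, not the λGK.1 flanking sequence, the helical repeat of 10.5 bp per turn is an assumed value for B-form DNA, distances are measured start to start, and the function names are hypothetical.

```python
import re

BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA (not stated in the paper)


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every exact (possibly overlapping) match."""
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def spacing_in_turns(starts: list[int]) -> list[float]:
    """Start-to-start distances between consecutive sites, in helical turns."""
    return [round((b - a) / BP_PER_TURN, 2) for a, b in zip(starts, starts[1:])]


# Toy sequence (hypothetical): two TGTAAAGG sites 42 bp apart, then
# TCGAAAAG starting 32 bp downstream of the second site.
toy = ("GC" * 5 + "TGTAAAGG" + "AT" * 17 + "TGTAAAGG"
       + "CA" * 12 + "TCGAAAAG" + "GC" * 5)

box_starts = find_motif(toy, "TGTAAAGG")
third_start = find_motif(toy, "TCGAAAAG")[0]

print("TGTAAAGG starts:", box_starts)
print("spacing (turns):", spacing_in_turns(box_starts))
print("third site offset (turns):",
      round((third_start - box_starts[-1]) / BP_PER_TURN, 2))
```

With a 10.5 bp repeat, 42 bp corresponds to four turns and 32 bp to about three, which is the sense in which the text describes the sites as lying on roughly the same face of the helix.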

Expression of genomic sequences of monocot origin in dicot plants

Relatively few monocot gene families have been characterized thus far, and genes from only five families have been transferred to transgenic dicot plants: (i) zein genes [16, 36, 44]; (ii) maize Adh-1 gene [12]; (iii) wheat high and low molecular weight glutenin genes [8]; and (iv) the barley B hordein gene [35]. The promoters of several of these genes retain their "normal" temporal and tissue specificity of transcription in transgenic dicot plants [8, 35, 44], but none of these genes has been shown to produce detectable amounts of protein in transgenic dicot tissue. A recent study which investigated the function of a genomic zein gene within transgenic tobacco plants [44] suggests that the lack of detectable protein synthesis is due to the weak activity of the zein promoter in dicot tissue, not the rapid degradation of newly synthesized protein. This tends to support the idea that each of the zein genes (approximately 50-150 copies) [52] may contribute only a small portion of protein to the zein pool. As transformation and regeneration of cereal crops is now becoming feasible, the lower genomic copy number of kafirin genes (approximately 20) relative to zein may indicate that a low number of modified kafirin gene sequences (e.g. high lysine) will have a more pronounced effect upon grain quality of sorghum than of maize.

We believe this study presents the first molecular characterization of the kafirin gene family. Striking sequence homology to the zein genes of maize has been demonstrated, indicating that sorghum and maize have evolved from a common ancestral plant. Homology between kafirin and zein peptides indicates that these two proteins are most likely processed and deposited within protein bodies in a similar manner. Therefore, analysis of shared gene sequences and peptide structure between kafirin and zein should lead to a more complete understanding of prolamin structure and function within sorghum and maize grain.

Acknowledgements

We thank Michael Murray of the Agrigenetics Advanced Science Co., Madison, WI, and Howard Hershey of the DuPont Co., Wilmington, DE, for their advice on genomic library production. We gratefully acknowledge David Ashford for performing carbohydrate analyses on native kafirin and Dr. Fred Miller, Texas A&M Soil and Crop Sciences Department, for providing sorghum seed. This research was supported by grants from the Texas Advanced Technology Research Program and Rhône-Poulenc Associates. R.C.K. is a USDA predoctoral fellowship recipient.

References

1. Anderson S, Gait MJ, Mayol L, Young IG: A short primer for sequencing DNA cloned in the single-stranded phage vector M13mp2. Nucl Acids Res 8:1731-1743 (1980).
2. Argos P, Pedersen K, Marks MD, Larkins BA: A structural model for maize zein proteins. J Biol Chem 257:9984-9990 (1982).
3. Ashford D, Dwek RA, Welply JK, Amatayakul S, Homans SW, Lis H, Taylor GN, Sharon N, Rademacher TW: The β1-2-D-xylose and α1-3-L-fucose substituted N-linked oligosaccharides from Erythrina cristagalli lectin. Isolation, characterization and comparison with other legume lectins. Eur J Biochem 166:311-320 (1987).
4. Bietz JA: Cereal prolamin evolution and homology revealed by sequence analysis. Biochem Genet 20:1039-1053 (1982).
5. Brown JWS, Wandelt CH, Feix G, Neuhaus G, Schweiger HG: The upstream regions of zein genes: sequence analysis and expression in the unicellular alga Acetabularia. Eur J Cell Biol 42:83-88 (1986).
6. Burr FA, Burr B: Molecular basis of zein protein synthesis in maize endosperm. In: Rubenstein I, Phillips RL, Green CE, Gengenbach BG (eds) The Plant Seed: Development, Preservation, and Germination, pp. 27-48. Academic Press, New York (1979).
7. Chamberlain JP: Fluorographic detection of radioactivity in polyacrylamide gels with the water soluble fluor, sodium salicylate. Anal Biochem 98:132-135 (1979).
8. Colot V, Robert LS, Kavanagh TA, Bevan MW, Thompson RD: Localization of sequences in wheat endosperm protein genes which confer tissue-specific expression in tobacco. EMBO J 6:3559-3564 (1987).
9. Dale RMK, McClure BA, Houchins JP: A rapid single-stranded cloning strategy for producing a series of overlapping clones for use in DNA sequencing: application to sequencing the corn mitochondrial 18S rDNA. Plasmid 13:31-40 (1985).
10. Devereux J, Haeberli P, Smithies O: A comprehensive set of sequence analysis programs for the VAX. Nucl Acids Res 12:387-395 (1984).
11. Doyle JJ, Schuler MA, Godette WD, Zenger V, Beachy RN, Slightom JL: The glycosylated seed storage proteins of Glycine max and Phaseolus vulgaris. J Biol Chem 261:9228-9238 (1986).
12. Ellis JG, Llewellyn DJ, Dennis ES, Peacock WJ: Maize Adh-1 promoter sequences control anaerobic regulation: addition of upstream promoter elements from constitutive genes is necessary for expression in tobacco. EMBO J 6:11-16 (1987).
13. Geraghty DE, Messing J, Rubenstein I: Sequence analysis and comparison of cDNAs of the zein multigene family. EMBO J 1:1329-1335 (1982).
14. Geraghty DE, Peifer MA, Rubenstein I, Messing J: The primary structure of a plant storage protein: zein. Nucl Acids Res 9:5163-5174 (1981).
15. Golden SS, Brusslan J, Haselkorn R: Expression of a family of psbA genes encoding a photosystem II polypeptide in the cyanobacterium Anacystis nidulans R2. EMBO J 5:2789-2798 (1986).
16. Goldsbrough PB, Gelvin SB, Larkins BA: Expression of maize zein genes in transformed sunflower cells. Mol Gen Genet 202:374-381 (1986).
17. Hall TC, Ma Y, Buchbinder BU, Pyne JW, Sun SM, Bliss FA: Messenger RNA for G1 protein of french bean seeds: cell-free translation and product characterization. Proc Natl Acad Sci USA 75:3196-3200 (1978).
18. Hall TC, Reichert NA, Sengupta-Gopalan C, Cramer JH, Lea K, Barker RF, Slightom JL, Klassy R, Kemp JD: In: van Vloten-Doting L, Groot GPS, Hall TC (eds) Organization and Expression of the Plant Genome, pp. 517-529. Plenum Press, New York (1985).
19. Heidecker G, Messing J: Structural analysis of plant genes. Ann Rev Plant Physiol 37:439-466 (1986).
20. Hoyt MA, Knight DM, Das A, Miller HI, Echols H: Control of phage λ development by stability and synthesis of cII protein: role of the viral cIII and host hflA, himA and himD genes. Cell 31:565-573 (1982).
21. Hu N-T, Peifer MA, Heidecker G, Messing J, Rubenstein I: Primary structure of a genomic zein sequence of maize. EMBO J 1:1337-1342 (1982).
22. Kuhlemeier C, Fluhr R, Green PJ, Chua N-H: Sequences in the pea rbcS-3A gene have homology to constitutive mammalian enhancers but function as negative regulatory elements. Genes and Development 1:247-255 (1987).
23. Kreis M, Shewry PR, Forde BG, Forde J, Miflin BJ: Structure and evolution of seed storage proteins and their genes with particular reference to those of wheat, barley and rye. Oxford Surveys of Plant Mol and Cell Biol 2:253-317 (1985).
24. Kriz AL, Boston RS, Larkins BA: Structural and transcriptional analysis of DNA sequences flanking genes that encode 19 kilodalton zeins. Mol Gen Genet 207:90-98 (1987).
25. Laemmli UK: Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680-685 (1970).
26. Langridge P, Feix G: A zein gene of maize is transcribed from two widely separated promoter regions. Cell 34:1015-1022 (1983).
27. Lasztity R: The Chemistry of Cereal Proteins. CRC Press, Boca Raton, FL (1983).
28. Laurie DA, Bennett MD: Nuclear DNA content in the genera Zea and Sorghum. Intergeneric, interspecific and intraspecific variation. Heredity 55:307-313 (1985).
29. Loenen WAM, Blattner FR: Lambda Charon vectors (Ch32, 33, 34 and 35) adapted for DNA cloning in recombination-deficient hosts. Gene 26:171-179 (1983).
30. Maier U-G, Brown JWS, Toloczyki C, Feix G: Binding of a nuclear factor to a consensus sequence in the 5' flanking region of zein genes from maize. EMBO J 6:17-22 (1987).
31. Maniatis T, Fritsch EF, Sambrook J: Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1982).
32. Marcu K, Dudock B: Characterization of a highly efficient protein synthesizing system derived from commercial wheat germ. Nucl Acids Res 1:1385-1397 (1974).
33. Marks MD, Lindell JS, Larkins BA: Quantitative analysis of the accumulation of zein mRNA during maize endosperm development. J Biol Chem 260:16445-16450 (1985).
34. Marks MD, Lindell JS, Larkins BA: Nucleotide sequence analysis of zein mRNAs from maize endosperm. J Biol Chem 260:16451-16459 (1985).
35. Marris C, Gallois J, Kreis M: The 5' flanking region of a barley B hordein gene controls tissue and development specific CAT expression in tobacco plants. Plant Mol Biol 10:359-366 (1988).
36. Matzke MA, Susani M, Binns AN, Lewis ED, Rubenstein I, Matzke AJM: Transcription of a zein gene introduced into sunflower using a Ti-plasmid vector. EMBO J 3:1525-1531 (1984).
37. Messing J, Geraghty D, Heidecker G, Hu N-T, Kridl J, Rubenstein I: Plant gene structure. Genetic Engineering of Plants 26:211-227 (1983).
38. Murray MG, Kennard WC, Drong RF, Slightom JL: Use of a recombination-deficient phage lambda system to construct wheat genomic libraries. Gene 30:237-240 (1984).
39. Murray MG, Peters DL, Thompson WF: Ancient and repeated sequences in the pea and mung bean genomes and implications for genome evolution. J Mol Evol 17:31-42 (1981).
40. Okayama H, Berg P: High-efficiency cloning of full-length cDNA. Mol Cell Biol 2:161-170 (1982).
41. Pedersen K, Devereux J, Wilson DR, Sheldon E, Larkins BA: Cloning and sequence analysis reveal structural variation among related zein genes in maize. Cell 29:1015-1026 (1982).
42. Reed KC, Mann DA: Rapid transfer of DNA from agarose gels to nylon membranes. Nucl Acids Res 13:7207-7221 (1985).
43. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467 (1977).
44. Schernthaner JP, Matzke MA, Matzke AJM: Endosperm-specific activity of a zein gene promoter in transgenic tobacco plants. EMBO J 7:1249-1255 (1988).
45. Sengupta-Gopalan C, Reichert NA, Barker RF, Hall TC, Kemp JD: Developmentally regulated expression of the bean β-phaseolin gene in tobacco seed. Proc Natl Acad Sci USA 82:3320-3324 (1985).
46. Spena A, Viotti A, Pirrotta V: A homologous repetitive block structure underlies the heterogeneity of heavy and light chain zein genes. EMBO J 1:1589-1594 (1982).
47. Spena A, Viotti A, Pirrotta V: Two adjacent genomic zein sequences: structural organization and tissue-specific restriction pattern. J Mol Biol 169:799-811 (1983).
48. Sung MT, Slightom JL: Methods for preparation of plant nucleic acids optimally suited for restriction endonuclease digesting and cloning. The construction of jack bean and soybean phage libraries in charon 4A. In: Panapoulos NJ (ed) Genetic Engineering in Plant Sciences, pp. 39-61. Praeger, New York (1981).
49. Viotti A, Cairo G, Vitale A, Sala E: Each zein gene can produce polypeptides of different lengths. EMBO J 4:1103-1110 (1985).
50. Watson CJ, Jackson JF: An alternative procedure for the synthesis of double-stranded cDNA for cloning in phage and plasmid vectors. In: Glover DM (ed) DNA Cloning: A Practical Approach. Vol. 1, pp. 79-88. IRL Press, Oxford/Washington DC.
51. Wilson GH: Structure and evolution of seed storage proteins and their genes with particular reference to those of wheat, barley and rye. In: Gottschalk W, Mueller HP (eds) Seed Proteins: Biochemistry, Genetics, Nutritive Value. Martinus Nijhoff, The Hague/Boston/London (1983).
52. Wilson DR, Larkins BA: Zein gene organization in maize and related grasses. J Mol Evol 20:330-340 (1984).
