Res. Microbiol. 1991, 142, 905-912

© INSTITUT PASTEUR/ELSEVIER

Paris 1991

The project of sequencing the entire Bacillus subtilis genome

F. Kunst (1) and K. Devine (2)

(1) Unité de Biochimie Microbienne, Centre National de la Recherche Scientifique, URA 1300, Institut Pasteur, 25, rue du Docteur Roux, 75724 Paris Cedex 15, and (2) University of Dublin, Trinity College, Department of Genetics, Lincoln Place Gate, Dublin 2

SUMMARY

The results obtained during the first year of the project involving the sequencing of the Bacillus subtilis genome are presented. Different gene libraries have been constructed using a yeast artificial chromosome vector and bacteriophage vectors, lambda FixII and φ105J124. A total of 300 kbp have been cloned using the lambda FixII vector, 68 kbp of which have been fully sequenced. Several open reading frames showing homologies with genes of other organisms were found. Two genes, previously unknown in this organism, have been identified.

Key-words: Bacillus subtilis, Genome, Sequencing, Yeast artificial chromosome and bacteriophage vectors; Review.

Why sequence the Bacillus subtilis genome?

It is beyond doubt that knowledge of the complete sequence of the genome of any organism would radically alter, and greatly expedite, the way in which scientific research is carried out on that organism. Since present technology and resources preclude the complete sequencing of all currently investigated organisms, it is important to choose the organisms for complete genome sequencing projects judiciously, so that the project yields maximum scientific and practical benefit and remains a realistic goal. Bacillus subtilis fulfills these criteria. It has a comparatively small genome (4,200 kb); the genome shows a high level of stability (no demonstrated polymorphism of the B. subtilis 168 strain); genetic analysis is highly advanced and has led to the construction of a very detailed genetic map (second only to that of Escherichia coli).

At the first level of analysis, the DNA sequence will lead to a complete catalogue of putative protein sequences. These are likely to fall into one of 3 categories: (1) those whose functions are known, (2) those which show similarities with proteins identified in other organisms and which may have similar though not necessarily identical functions in B. subtilis, and (3) those, probably the majority, whose function is unknown at present. The ease with which B. subtilis can be genetically manipulated means that it is particularly well suited to the analysis of the latter two groups of proteins by reverse genetics.

It will be of great scientific interest to compare the sequence of the B. subtilis genome with that of E. coli (the sequencing of which is being undertaken by American and Japanese groups). These organisms diverged about 1.2 billion years ago (Hori et al., 1979). It will be interesting to examine the conservation of "core" genes responsible for general metabolism. Of special interest will be the analysis of genes involved in cellular processes specific to one particular organism. For example, B. subtilis has evolved specialized responses, such as competence and sporulation, to adverse environmental conditions. Comparisons should thus lead to insights into genome structure and evolution.


A major challenge is the elucidation of the complex regulatory networks whose function is to coordinate cellular processes. In addition to being of fundamental scientific interest, such knowledge will greatly facilitate the manipulation of industrially important Bacillus species, e.g. B. licheniformis, B. natto, B. amyloliquefaciens, B. brevis and B. polymyxa, whose genetics is less well developed than that of B. subtilis. A knowledge of the corresponding regulatory systems in B. subtilis should lead to a great improvement in the expression of enzymes and metabolites in the industrially important organisms.

tify overlapping lambda phages in the assigned DNA region by screening the bank with DNA probes for genes which had already been sequenced. 4) The sequencing strategies, methods and computer analysis were to be optimized.

Construction of libraries of cloned DNA fragments

Two gene banks of B. subtilis DNA were constructed using lambda FixII, one at Trinity College by K. Devine and one at the Pasteur Institute by F. Kunst. Lambda FixII DNA, digested with XhoI and partially filled in with Klenow polymerase, producing 5'-TC protruding ends, was obtained from a commercial supplier (Stratagene, La Jolla, USA). Partial Sau3A digestion and a partial fill-in reaction were performed on B. subtilis chromosomal DNA, producing fragments with 5'-GA ends, which were then separated by agarose gel electrophoresis. Fragments in the size range of 9-20 kbp were electroeluted and purified. This DNA was ligated in vitro to lambda FixII DNA and packaged, creating a bank of 1 × 10^4 to 5 × 10^4 independent recombinant phages per µg of vector DNA. Given the number of phages obtained, both banks should be representative of the total B. subtilis genome. The first results obtained with these banks are described below.

The group of S.D. Ehrlich constructed a collection of large segments of the B. subtilis genome in yeast artificial chromosomes (YAC). The following considerations led to this choice: (1) large DNA segments (> 50 kbp) can be cloned in YAC, so that a representative collection requires a relatively low number of clones; and (2) certain B. subtilis genes which are toxic for E. coli are not necessarily toxic for yeast, since the likelihood of their expression in this host is low.
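The claim that a bank is "representative" can be made concrete with a rough estimate. The sketch below is not taken from the paper: it assumes the standard Clarke and Carbon estimate N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size, a target probability P of 0.99, an insert of 15 kbp for the lambda banks (a value inside the 9-20 kbp range quoted above) and 50 kbp for YAC (the lower bound quoted above). The function name `clones_needed` is hypothetical.

```python
import math

def clones_needed(genome_kb, insert_kb, probability=0.99):
    """Clarke-Carbon estimate of the number of independent clones needed
    so that any given locus is present with the requested probability.
    Assumes random, equally sized inserts (an idealization)."""
    fraction = insert_kb / genome_kb  # share of the genome carried by one clone
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))

GENOME_KB = 4200  # genome size quoted in the text

# Assumed insert sizes: 15 kbp for lambda FixII, 50 kbp for YAC.
lambda_clones = clones_needed(GENOME_KB, 15)
yac_clones = clones_needed(GENOME_KB, 50)

print(f"lambda FixII (15 kbp inserts): {lambda_clones} clones")
print(f"YAC (50 kbp inserts):          {yac_clones} clones")
```

Under these assumptions the lambda banks need on the order of a thousand independent clones, well below the 10^4 or more recombinants obtained per µg of vector, and a YAC collection needs several times fewer clones than a lambda bank.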

Initiation of the project

The initial project was conceived by J. Hoch (Scripps Clinic, La Jolla, USA) as a common undertaking involving 5 American groups and 5 European groups. The American groups have a grant application pending. Therefore, the project is for the moment limited to the following European groups, which obtained support from the E.E.C.: (1) R. Dedonder, G. Rapoport, A. Danchin (Pasteur Institute, France), (2) S.D. Ehrlich (INRA, Jouy-en-Josas, France), (3) A. Galizzi (University of Pavia, Italy), (4) J. Errington (University of Oxford, United Kingdom), and (5) K. Devine (Trinity College, Dublin, Ireland). The group of D. Karamata (Lausanne, Switzerland) is participating in this project without being supported by the E.E.C. The group of S. Bron and G. Venema (University of Groningen, The Netherlands) is planning to join the project at a later stage.

The participants agreed to use the following initial strategy: (1) the strain chosen for the project was B. subtilis 168, which is commonly used in most of the laboratories working with Bacillus, and (2) regions of the chromosome were assigned to each participating group; each region (between 200 and 400 kbp) began and ended at a gene which had already been cloned and, if possible, sequenced.

To clone large DNA segments it is necessary to minimize DNA breakage. All the biochemical steps necessary for cloning (DNA preparation, partial cleavage with EcoRI in the presence of the cognate methylase and ligation to the vector) were therefore performed in agarose blocks. A highly transformable yeast strain, SX4-6A (from B. Dujon, Pasteur Institute, Paris), was used as the cloning host. The cloning procedure was derived from that described by Burke et al. (1987). A total of 800 yeast clones carrying B. subtilis DNA segments were obtained, as ascertained by hybridizing a B. subtilis total DNA probe with yeast chromosomes separated by pulsed-field electrophoresis. Some 163 YAC had inserts of > 90 kb, which

The goals to be achieved were as follows. 1) The construction of a NotI or SfiI restriction map of the B. subtilis chromosome. These maps are now available (Piggot, unpublished data; Ventra and Weiss, 1989). 2) The construction of genomic banks from strain 168; two independent banks of recombinant lambda phages were to be constructed using the lambda FixII vector (Stratagene, La Jolla, USA); alternative vectors, pYAC and the bacteriophage φ105, were to be tested. 3) The ordering of the DNA inserts in the lambda libraries; each group had to iden-

ORF = open reading frame.
PCR = polymerase chain reaction.
YAC = yeast artificial chromosome.


the φ105 genome (fig. 1). Each cloned DNA segment ended in a polylinker, providing a choice of restriction endonucleases for cloning. Between these restriction sites and one of the phage DNA segments was a selectable chloramphenicol resistance determinant. The ligation reaction generated some molecules in which a fragment of target DNA was flanked by fragments of phage DNA in the correct orientation (partial fill-in reactions were used to minimize the proportion of non-recombinant molecules). Selection for chloramphenicol-resistant transformants in a strain containing the φ105J124 prophage generated recombinants that had replaced the region of dispensable phage DNA with a segment of φ105 DNA. The library of recombinants constituted by the collection of transformants could be stored as frozen cells, or amplified by prophage induction and screened for specific recombinants using standard methods. The problem of the poor transformability of φ105 lysogens was overcome by incorporating an ind mutation (Garro and Law, 1974). The poor inducibility of ind mutants was avoided by the use of an ind cts double mutant, which is thermo-inducible.
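Why partial fill-in reduces non-recombinant molecules can be illustrated with the end sequences given for the lambda FixII banks (5'-TC on the XhoI-cut vector, 5'-GA on the Sau3A-cut insert). The sketch below is an illustration, not the authors' protocol: the nucleotide sets supplied to each reaction (dTTP + dCTP for XhoI ends, dGTP + dATP for Sau3A ends) are an assumption chosen to reproduce the stated ends, and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def partial_fill_in(overhang, dntps):
    """Shorten a 5' overhang (written 5'->3') by filling in from the
    recessed 3' end, for as long as the required nucleotide is supplied."""
    remaining = len(overhang)
    # The polymerase copies the overhang starting at the base next to the duplex.
    while remaining > 0 and COMPLEMENT[overhang[remaining - 1]] in dntps:
        remaining -= 1
    return overhang[:remaining]

def can_anneal(end_a, end_b):
    """Two 5' overhangs can base-pair if one is the reverse complement of the other."""
    return end_a == reverse_complement(end_b)

# XhoI (C^TCGAG) leaves 5'-TCGA; Sau3A (^GATC) leaves 5'-GATC.
vector_end = partial_fill_in("TCGA", {"T", "C"})  # assumed dTTP + dCTP
insert_end = partial_fill_in("GATC", {"G", "A"})  # assumed dGTP + dATP

print(vector_end, insert_end)               # resulting overhangs
print(can_anneal(vector_end, insert_end))   # vector joins insert
print(can_anneal(vector_end, vector_end))   # vector cannot self-ligate
print(can_anneal(insert_end, insert_end))   # inserts cannot join each other
```

The untreated four-base overhangs are self-complementary, so vector arms and inserts could each ligate to themselves; after the partial fill-in only vector-insert junctions remain possible.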

is 9 genome equivalents and highly representative with a probability of > 99 %. Comparison of the restriction maps of several randomly chosen YAC carrying a total of about 1,2130 kb of B. subtilis DNA (25 % of the genome) with that of the B. subtilis chromosome suggests that there were no rearrangements of the bacterial DNA in yeast. Future efforts will be directed towards ordering the collection using probes provided by colleagues.

J. Errington's group previously developed an efficient strategy for cloning in B. subtilis using vectors based on bacteriophage φ105. The most recent improvements involved the development of a system in which defective phage vectors capable of incorporating inserts of at least 11 kbp were generated. The system was based on the prophage transformation method of Kawamura et al. (1979) with several improvements. The flanking DNA segments used to direct insertion into the prophage were provided by specific cloned DNA segments in E. coli. The inserts in plasmids pSG521 and pSG523 corresponded to flanking regions of a 6.9-kbp dispensable DNA fragment in

[Figure 1: diagram showing plasmids pSG521 and pSG523 with their polylinker sites, the insert DNA, ligation in vitro, the φ105J124 prophage with its non-essential DNA, transformation of B. subtilis lysogenic for φ105J124 ind cts-52, selection for chloramphenicol resistance, and the resulting recombinant prophage.]

Fig. 1. Cloning in phage φ105J124 by prophage transformation. Plasmids pSG521 and pSG523 are shown linearized at one of their polylinker restriction sites. Ligation to target DNA generates some molecules of the composition shown. The ligation mixture is transformed into a B. subtilis strain lysogenic for phage φ105J124. Chloramphenicol-resistant transformants could arise by a double-crossover event involving the regions of homology between the plasmid DNA segments and the prophage, as shown by the broken lines and crosses. φ105J124 represents one example of the recombinant prophages that would be generated.


Table I. Collection of B. subtilis genome segments in YAC.

Probe               Map position (°)   No. of hybridizing clones
dnaG                0                  4
abrB                3                  5
spoVG               6                  6
spo0H               11                 11
spc                 12                 4
gerD                16                 11
lipA                22                 4
amyR                25                 1
urn;                26                 3
srfA                30                 13
dal                 38                 4
xfl                 48                 2
par                 55                 3
sspB                65                 8
comK                80                 7
add                 86                 6
arg                 102                3
.ors                118                1
sspD                121                3
spoVE               133                2
spoIIG              135                4
flaA                152                2
recE                160                2
glnA                167                1
citB                173                4
gltA                180                2
odhA/B              181                2
sspC                182                4
thyB                195                2
recG                200                1
trpC                203                2
spoVH               211                4
spo0A               217                3
spoIIIA             220                3
dnaE                224                2
spoIIIC             227                2
sacC                233                2
spo0B               241                6
sdhC                252                3
sdhB                252                3
dnaB                256                1
polA                257                2
rpsD                263                6
men                 273                5
gig                 275                3
degQ                279                2
dtG                 289                4
gerA                289                4
sacB                296                7
degU                306                2
tag                 310                9
rodC                314                6
spoIID              316                6
sacA                330                3
epr                 333                4
hut                 335                3
thrs2               344                9
gnt                 347                3
gyrA/B              355                6

Total (59 probes)                      240
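The totals of Table I can be checked directly from its two numeric columns. The sketch below is a plain consistency check, not part of the paper: it assumes the map positions are degrees on a circular 360° map, and the function name `largest_gap` is hypothetical. It also reports the widest interval between adjacent probes, which indicates where the probe set is sparsest.

```python
# Map positions and hybridizing-clone counts, in the row order of Table I.
POSITIONS = [
    0, 3, 6, 11, 12, 16, 22, 25, 26, 30, 38, 48, 55, 65, 80, 86, 102, 118,
    121, 133, 135, 152, 160, 167, 173, 180, 181, 182, 195, 200, 203, 211,
    217, 220, 224, 227, 233, 241, 252, 252, 256, 257, 263, 273, 275, 279,
    289, 289, 296, 306, 310, 314, 316, 330, 333, 335, 344, 347, 355,
]
CLONES = [
    4, 5, 6, 11, 4, 11, 4, 1, 3, 13, 4, 2, 3, 8, 7, 6, 3, 1,
    3, 2, 4, 2, 2, 1, 4, 2, 2, 4, 2, 1, 2, 4,
    3, 3, 2, 2, 2, 6, 3, 3, 1, 2, 6, 5, 3, 2,
    4, 4, 7, 2, 9, 6, 6, 3, 4, 3, 9, 3, 6,
]

def largest_gap(positions, circumference=360):
    """Widest interval between adjacent probes on a circular map."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Close the circle: from the last probe back round to the first.
    gaps.append(ordered[0] + circumference - ordered[-1])
    return max(gaps)

n_probes = len(POSITIONS)
total_clones = sum(CLONES)
print(n_probes, total_clones, largest_gap(POSITIONS))
```

The counts add up to the totals printed in the table (59 probes, 240 hybridizing clones); a YAC that hybridizes with more than one probe is counted once per probe, so this total is not the number of distinct YAC.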

Ordering of cloned DNA fragments

Ordering of the YAC

Hybridization with known DNA segments (provided by several colleagues) led S.D. Ehrlich's group to identify 134 YAC, listed in table I, which cover about 80 % of the genome. To order the YAC in contigs, probes homologous to YAC ends were generated by inverse PCR (polymerase chain reaction; Ochman et al., 1988) and hybridized to DNA from YAC-carrying clones. The contigs generated until now (represented as filled boxes, fig. 2) cover about 40 % of the genome. Future work will be directed towards completion of the ordered YAC collection (which will be made available to the scientific community) and towards testing of YAC-based rapid sequencing strategies.
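The grouping step behind contig building can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by the group: each positive hybridization is treated as a link between the YAC that supplied the end probe and the YAC that hybridized with it, and linked clones are merged with a union-find structure. The clone names and the function `build_contigs` are hypothetical, and ordering clones within a contig is not attempted.

```python
from collections import defaultdict

def build_contigs(hybridizations):
    """Group clones into contigs.

    hybridizations: iterable of (probe_source_clone, hybridizing_clone) pairs.
    Returns a sorted list of sorted clone-name lists, one list per contig.
    """
    parent = {}

    def find(clone):
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for source, target in hybridizations:
        union(source, target)

    groups = defaultdict(list)
    for clone in list(parent):
        groups[find(clone)].append(clone)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical hybridization results: an end probe from the first clone
# of each pair lights up the second clone.
links = [
    ("YAC-1", "YAC-2"),
    ("YAC-2", "YAC-3"),
    ("YAC-7", "YAC-8"),
    ("YAC-9", "YAC-9"),  # probe hybridizes only with its own clone
]
contigs = build_contigs(links)
for contig in contigs:
    print(contig)
```

A clone whose end probes detect no other clone stays in a contig of its own, which is how gaps in the ordered collection would show up.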

[Figure 2: circular map of the B. subtilis 168 chromosome showing the markers dnaG, dal, glnA, trpC and epr.]

Fig. 2. Contigs corresponding to characterized DNA segments cloned in YAC.

[Figure 3: map positions from 314° to 333°; genetic markers including divII, spoIID, narA, spo0F, sacP and thiC; recombinant phages lambda narA, lambda sacT, lambda sacS, lambda thiC, lambda H and a lambda phage obtained by the jumping method.]

Fig. 3. Map of the chromosomal region located between the markers gerB (314°) and tyrT (about 335°). The order of the markers was as indicated by Piggot (1989) except for the modified position of the spoIID marker. Recombinant phages containing DNA segments of this region are also shown. The star indicates that lambda narA also contains spoIID.

Recombinant phage libraries

The group at the Pasteur Institute was assigned the 315-335° region of the map and used this region to test the quality of its recombinant lambda phage library and to order the corresponding phages. This region contains the following markers, in this supposed order: gerB-rodC-divII-glyC-narA-spo0F-bac-sacA-sacP-sacT, followed by a region whose orientation was shown to be ambiguous by their work: either thiC-epr-sacX-sacY-tyrT or tyrT-sacY-sacX-epr-thiC (fig. 3). The sacS locus, comprising the sacX and sacY genes, was previously cloned using plasmid vector pMK4 (Débarbouillé et al., 1987). As part of this project, the thiC and narA markers were also cloned using this plasmid vector. Radioactive probes of these recombinant plasmids were used to identify and purify the recombinant phages containing the corresponding regions: lambda-sacS, lambda-sacT, lambda-thiC and lambda-narA.

In an attempt to bridge the gap between lambda-sacT and lambda-sacS, a 1-kbp HindIII fragment of lambda-sacT DNA at the end of the DNA insert opposite the sacT marker was used as a probe to screen the lambda bank, leading to the isolation of a recombinant phage, lambda H, now under study. The gerB gene is currently being sequenced by D. Smith (Birmingham, UK), who will provide this gene for use as a probe. The spo0F and spoIID genes were previously sequenced (Trach et al., 1985; Lopez-Diaz et al., 1986). A DNA fragment of spoIID was amplified by PCR and used to identify a lambda phage hybridizing with this probe.

Finally, a chromosome-jumping method was developed using an integrative plasmid vector (Glaser et al., in preparation). This method enabled us to recover a DNA probe in the spo0F-sacA region at a distance of about 50 kbp from the sacA marker, and the recombinant phage hybridizing with this probe was isolated. The group at the Pasteur Institute now possesses 9 recombinant phages with an average insert size of about 15 kbp. Taking into account that some of the inserts overlap, they have so far cloned more than 100 kbp of the allotted DNA region.

The group at Trinity College (Dublin) concentrated on two regions within its allotted segment of the genome (100-130°): the arginine biosynthetic operon at 102° and the PBS-X region at 112°. The genomic lambda bank constructed in this laboratory was screened for clones from the arginine and PBS-X regions of the chromosome. These clones were restriction mapped and checked for structural integrity.

The group at the University of Pavia was assigned the sector of the B. subtilis chromosome between markers polC and dnaA (150°-165°). First, they started to order lambda clones from 2 overlapping clones corresponding to the outG locus. Using traditional walking techniques, they isolated a set of 9 overlapping clones, corresponding to 80 kbp of B. subtilis chromosomal DNA. None of the isolated clones contained dnaA or polC. They then cloned the flaA locus. From the classical map, the flaA locus appeared to be part of their allotted segment between the dnaA and polC markers. They revised the genetic map and the order is now dnaA, polC, flaA, with flaA at 13 transductional map units from polC. They isolated


This observation was supported by the recent discovery of a second reading frame homologous to tyrosyl-tRNA synthetase in B. subtilis close to rpsD (263°) (Grundy and Henkin, 1990). The gene corresponding to ORF-XX was found to encode a protein showing homology with galactokinase and it complemented a galactokinase-deficient mutation of E. coli. The ORF-XI-encoded putative protein showed homology with: (1) the B. brevis tycA gene coding for one of the enzymes involved in the biosynthesis of the antibiotic tyrocidine, which is synthesized through a multienzyme thiotemplate mechanism; (2) the peptide synthetase performing the first step of penicillin biosynthesis in Penicillium chrysogenum (D. J. Smith et al., 1990); (3) the coumarate CoA ligase. ORF-XXIII and -XXIV may encode aa3-type terminal oxidase subunits 1 and 3, respectively, since they showed strong homologies with cytochrome oxidases from mitochondria and from the bacterium Paracoccus denitrificans. Finally, ORF-XXVI may encode a protein which presents homology with the B. subtilis SpoVE protein, which is involved in sporulation, and with the E. coli FtsW and RodA proteins, which are involved in cell division and cell wall biosynthesis, respectively.

A. Galizzi's group (University of Pavia) has extended the sequence of the outG locus. At present, they have obtained the complete sequence of 10,137 contiguous bp. Preliminary analysis of the sequence data indicated the presence of a peculiar structural organization. At the nucleotide level, 3 regions of

a lambda phage of 8.3 kbp corresponding to part of the flaA locus which had been sequenced (next section).

Using lambda EMBL3 and lambda EMBL4 libraries, D. Karamata's group identified 6 overlapping recombinant phages yielding 70 kb of continuous chromosomal DNA including the gtaB, gtaA and degU genes (Young et al., 1989). One end of this segment included the gerB marker (314°) and the corresponding phage was sent to D. Smith (Birmingham, United Kingdom), who is presently sequencing it. The other end went beyond the hag gene (307°).

J. Errington isolated 2 recombinant φ105 phages complementing spoVD (133°) and spoVF (148°) mutations.

Sequencing

At the Pasteur Institute, the B. subtilis DNA inserts of the 2 lambda recombinant phages, lambda sacS and lambda sacT, were sequenced, which represents more than 34 kbp. Twenty-nine open reading frames (ORF) were identified in this segment (fig. 4). ORF-VII was shown to correspond to a gene, now called tyrT, encoding a tyrosyl-tRNA synthetase. This gene was found to code for an authentic tyrosyl-tRNA synthetase, as shown by its ability to complement a thermosensitive tyrS mutation of E. coli. However, disruption of this gene did not lead to any recognizable phenotype (Glaser et al., 1991).
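The text does not say how the open reading frames were detected. As a rough illustration of what an ORF scan involves, the sketch below searches all six reading frames for stretches running from a start codon to the next in-frame stop codon. Its assumptions are simplifications: only ATG is treated as a start codon (B. subtilis also uses others), the standard stop codons are used, and the minimum length of 100 codons is an arbitrary threshold. The function `find_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # simplification: alternative start codons are ignored

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=100):
    """Scan six reading frames for ORFs.

    Returns (strand, start, end, n_codons) tuples. Coordinates are 0-based,
    end-exclusive (stop codon included) and refer to the scanned strand;
    n_codons excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs

# Toy sequence: one short ORF (ATG + 5 codons + TAA) in forward frame 2.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))
```

On real sequence data a scan of this kind yields only candidates; the homology comparisons and complementation tests described in the text are what assign a function to an ORF.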
