IJSEM Papers in Press. Published August 20, 2014 as doi:10.1099/ijs.0.068205-0
1
Genome-based reclassification of Bacillus cibi as a latter heterotypic synonym of
2
Bacillus indicus and emended description of Bacillus indicus.
3
Running Title: Reclassification of Bacillus cibi as Bacillus indicus.
4
Contents category: New Taxa (Firmicutes)
5
Samantha J. Stropko†, Shannon E. Pipes† and Jeffrey D. Newman
6
Correspondence: Jeffrey D. Newman,
[email protected] ,
7 8
phone: 570-321-4386, fax: 570-321-4073 Biology Department, Lycoming College, Williamsport PA 17701 USA
9 10
†
11
The Whole Genome Shotgun projects of Bacillus indicus LMG 22858T and B. cibi DSM
12
16189T have been deposited at DDBJ/EMBL/GenBank under the accessions
13
JGVU00000000 and JNVC00000000, respectively. The versions described in this paper
14
are versions JGVU02000000 and JNVC02000000.
15
The online version of this paper includes the following supplemental data: Fig. S1. Dot
16
plot of B.indicus LMG 22858T vs B.cibi DSM 16189T. Table S1. B. indicus LMG 22058T
17
genes not present in B. cibi DSM 16189T. Table S2. B. cibi DSM 16189T genes not
18
present in B. indicus LMG 22058T. Fig. S2. Pigmentation of B.indicus and B.cibi.
19
These authors contributed equally to this work.
20 21
Abstract While characterizing a related strain, it was noted that there was little difference
22
between the 16S rRNA sequences of Bacillus indicus LMG 22858T and Bacillus cibi
23
DSM 16189T. Phenotypic characterization revealed differences only in the utilization of
24
mannose and galactose and slight variation in pigmentation. Whole genome shotgun
25
sequencing and comparative genomics were used to calculate established
26
phylogenomic metrics and explain phenotypic differences. The full, genome-derived
27
16S rRNA sequences were 99.74% similar. The average nucleotide identity (ANI) of the
28
two strains was 98.0%, the average amino acid identity (AAI) was 98.3%, and the
29
estimated DNA-DNA hybridization (DDH) determined by the genome-genome distance
30
calculator (GGDC) was 80.3%. These values are higher than the species thresholds for
31
these metrics, which are 95%, 95%, and 70% respectively, suggesting that these two
32
strains should be classified as members of the same species.
33
34
Bacillus indicus Sd/3T was isolated from arsenic-contaminated aquifer sand and
35
described by Suresh et al. (2004). Bacillus cibi JG-30T was isolated from the Korean
36
fermented seafood dish jeotgal, and published in 2005 (Yoon et al.). While Yoon et al.
37
used information about reference strains and cited Suresh et al., 2004, a comparison to
38
the B. indicus Sd/3T 16S rRNA was not reported, and B. indicus Sd/3T was not used as
39
a reference strain despite tight clustering of the original DNA sequences on the
40
neighbor-joining tree (Fig. 1).
41
While investigating environmental unknown organisms isolated from a freshwater
42
creek in the undergraduate microbiology course at Lycoming College (Newman, 2000;
43
Strahan et al., 2011; Kirk et al., 2013), strain SJS was found to be closely related to B.
44
indicus and B. cibi. The type strains B. indicus LMG 22858T and B. cibi DSM 16189T
45
were obtained from BCCM and DSMZ, respectively, characterized phenotypically and
46
their genomes were sequenced.
47
Genomic DNA was isolated from B. indicus LMG 22858T and B. cibi DSM 16189T
48
using the Qiagen DNeasy Blood and Tissue Kit according to the manufacturer’s
49
instructions for Gram positive bacteria. Libraries were prepared and then sequenced on
50
an Illumina MiSeq (V3 2x300 base) by the Indiana University Center for Genome
51
Studies as part of a Genome Consortium for Active Teaching NextGen Sequencing
52
Group (GCAT-SEEK) shared run (Buonaccorsi et al., 2011, 2014). Sequencing reads
53
were filtered (median phred score>20), trimmed (phred score >16), and assembled
54
using the paired-end de novo assembly option in NextGENe V2.3.4.2 (SoftGenetics).
55
The assembled genomes were uploaded to the Rapid Annotation with Subsystem
56
Technology web service (Aziz et al., 2008; Overbeek et al., 2014) for analysis, guided
57
contig reordering, and assembly improvement.
58
Approximately 0.9 million reads from the B. indicus LMG 22858T genome were
59
assembled into 22 contigs with a total genome length of 4,129,127 bp, an average
60
coverage of 52x and a G+C content of 44.4 mol%, significantly higher than the 41.2%
61
reported by Suresh et al., (2004). Automated annotation identified 4285 potential protein
62
coding sequences and 83 RNAs.
63 64
Approximately 1.4 million B. cibi DSM 16189T reads were assembled into 24 contigs with a total genome length of 4,072,175 bp, an average coverage of 71x and a
65
G+C content of 44.4 mol%, in good agreement with the 45% reported by Yoon et al.,
66
2005). Automated annotation identified 4253 potential protein coding sequences and 85
67
RNAs. Contigs were ordered and oriented to maximize synteny of the 3964 shared
68
coding sequences (Supplementary Figure S1).
69
The 16S rRNA sequences obtained from the sequenced genomes were
70
compared to the partial sequences reported at the time of initial publication using the
71
pairwise alignment (Myers and Miller, 1988) feature implemented on the EzTaxon-e
72
web-based service (Kim et al., 2012). The Bacillus indicus LMG 22858T genome-
73
derived 16S rRNA sequence was identical to the Bacillus indicus Sd/3T 16s rRNA partial
74
sequence (AJ583158) (Suresh et al., 2004) except for 2 differences and an insertion
75
within the first 14 bases of the partial rRNA sequence, which apparently did not have
76
the amplification primer sequences trimmed. Because the genome-derived sequence is
77
present within a large contig as part of a complete rRNA operon with the 23S and 5S
78
rRNA genes, it is likely to be the correct sequence. There was one difference between
79
the B. cibi DSM 16189T genome-derived 16S rRNA sequence and the B. cibi JG-30T
80
partial sequence (AY550276) (Yoon et al., 2005). The two genome-derived sequences
81
differed at four of 1533 bases for a similarity of 99.74%. The four differences
82
corresponded to two pairs of substitutions that maintained base-pairing within stems.
83
Several phylogenomic metrics have been developed to measure genome
84
similarity for taxonomic determinations. The traditional metric of DNA-DNA hybridization
85
(DDH) (Tindall et al., 2010) has been criticized due to technical difficulty, lack of
86
reproducibility, and inability to incorporate into databases. Meier-Kolthoff et al. (2013)
87
developed the Genome-Genome Distance Calculator (GGDC) to estimate the DNA-
88
DNA hybridization value based on high scoring segment pairs (HSPs) from genome
89
sequence comparisons. The estimated DDH between the B. cibi DSM 16189T and B.
90
indicus LMG 22858T genomes was 80.3% (95% confidence interval 78.3-82.3%), which
91
exceeds the 70% DDH species boundary.
92
Average Nucleotide Identity (ANI) reflects the similarity of 1 kb sequence
93
fragments using the algorithm described by Goris et al., (2007). The implementation on
94
the ExGenome Web service (http://www.ezbiocloud.net/ezgenome/ani) (Kim et al.,
95
2012) yielded a value of 97.8%, while the Kostas Lab implementation (http://enve-
96
omics.ce.gatech.edu/ani/ ) calculated the value as 98.24%. In both cases, this is higher
97
than the 94-96% threshold that correlates to the demarcation of separate species
98
(Konstantinidis & Tiedje, 2005a, Richter & Rosselló-Móra, 2009).
99
Average Amino Acid Identity (AAI) reflects the similarity of orthologous proteins.
100
Because amino acid sequences change more slowly than nucleotide sequences and
101
have greater complexity, AAI values are more sensitive over greater evolutionary
102
distances, yet can still be used to distinguish organisms at the species level as well
103
(Konstantinidis & Tiedje 2005b). Based on the bidirectional best hits (3964 cds)
104
identified by the sequence-based comparison tool of the SEED viewer (Overbeek et al.,
105
2005), AAI was calculated as 98.33% with the web-based NewmanLab AAI calculator
106
(http://lycofs01.lycoming.edu/~newman/aai/). Again, this is greater than the 95%
107
threshold that correlates to separate species.
108
During phenotypic characterization as described by Kirk et al. (2013), several
109
discrepancies with published results were noted. Motility during wet mounts was
110
observed for both strains, which each have more than 80 genes for motility and
111
chemotaxis, however, B. indicus was originally described as non-motile (Suresh et al.,
112
2004). Both organisms grew at 4%, 6%, and 8% NaCl on TSA plates and at 4% and 8%
113
NaCl in Biolog GenIII plates, though B. indicus was originally reported as negative for
114
growth under these moderately high NaCl concentrations. Growth at 8% NaCl and
115
motility were also observed for both strains by Khaneja et al. (2010). Neither organism
116
grew at 44oC, in contrast to the positive growth result reported by Yoon et al. (2005) at
117
this temperature. Khaneja et al. (2010) also observed that both strains show resistance
118
to arsenic and arsenate, though B. cibi DSM 16189T exhibited slightly lower maximum
119
tolerated concentrations.
120
While examining the utilization of carbon sources on the Biolog GenIII plates,
121
both organisms yielded a strong positive response in the negative control well when
122
using Biolog Inoculating Fluid A, suggesting that they contained stored carbon sources.
123
Consistent with this hypothesis, both organisms contained a glycogen synthesis and
124
utilization operon. A weaker response in the negative control well was observed with
125
inoculating fluid B which allowed the assessment of exogenous carbohydrate utilization
126
abilities. The primary nutritional differences between the two strains were that B. indicus
127
LMG 22858T utilized mannose, but not galactose, while B. cibi DSM 16189T utilized
128
galactose but not mannose. Examination of the genome sequences revealed the
129
existence of a 55 kb region with a mannose utilization operon that was present in B.
130
indicus LMG 22858T but absent in B. cibi DSM 16189T(Supplemental Table S1). A likely
131
explanation for the lack of galactose utilization by B. indicus LMG 22858T is a 1 base
132
deletion near the 3’ end of the Galactose-1-phosphate uridylyltransferase gene that
133
results in an apparently non-functional in-frame fusion with the adjacent aldose-1-
134
epimerase gene. While these organisms shared approximately 93% of their genomes,
135
B. indicus LMG 22858T contained 321 unique coding sequences, including a 34 kb
136
prophage (Table S1). B. cibi DSM 16189T contained 282 unique coding sequences
137
(Table S2).
138
All other standard phenotypes tested and listed in the emended species
139
description below were identical to each other and to previously reported results except
140
for β-glucosidase activity on the API ZYM test strip (Biomerieux) which was present in
141
B. indicus LMG 22858T but not in B. cibi DSM 16189T.
142
Fatty acid profiles were generated by Gas-Liquid Chromatography (MIDI
143
Sherlock version 6.1, method RTSBA6) using cells grown on tryptic soy broth agar
144
(TSBA) for 24 hr at 30oC and extracted using the standard method (Sasser, 1990). The
145
fatty acid composition (Table 1) was essentially identical and the most abundant fatty
146
acids were iso-C15:0 , and anteiso-C15:0.
147
While colonies of B. indicus LMG 22858T and B. cibi DSM 16189T grown on
148
TSBA at 30oC for 24 hr appear similar, the pigmentation of B. cibi DSM 16189T colonies
149
is darker by 72 hr (Fig S2a). To extract the pigments, 5-10 mg of cell mass was
150
scraped from a plate, spread around the inside of a 1.5 mL microcentrifuge tube, which
151
was incubated in the shaker-incubator at 250 rpm and 28oC with 1 mL acetone for 1 hr.
152
The acetone extract was decanted into a 2 mL sample vial and analyzed on an Agilent
153
1200 HPLC with an Agilent Zorbax Eclipse XDB C18 column (4.6 mm x 250 mm) at
154
40oC and a diode array detector. Separation was achieved using 50 mM NaH2PO4 (pH
155
4.5) as Solvent A; methanol containing 0.l% v/v glacial acetic acid as Solvent B; a flow
156
rate of 1 ml min-1; and the following gradient: 0-5 minutes at 0%; increase to 75%
157
solvent B at 10 min; increase to 100% solvent B at 30 min; remain at 100% solvent B
158
until 45 min. The elution profiles revealed a variety of compounds with different spectra
159
that absorbed light in the visible range (Fig. S2b-c). The optimal wavelength to detect all
160
of the peaks was 452 nm (Fig. 2a). There were three clusters of peaks that could be
161
distinguished based on absorbance spectrum (Fig. 2b); the 29-32 min peaks had
162
spectra consistent with 4,4’-diaponeurosporene, 36-40 min peaks had spectra
163
consistent with 4,4’-diaponeurosporenoate, and the 33-36 min peaks had spectra
164
consistent with Staphyloxanthin. Analysis of the two strains’ genomes revealed several
165
operons with homologs of the Staphylococcus aureus crtOPQMN genes that are
166
responsible for the synthesis of the golden pigment staphyloxanthin (Pelz et al., 2005).
167
Some of the different peaks in each region are likely to be different geometric isomers
168
(Melendez-Martinez et al., 2013). Both organisms contain all of the genes necessary to
169
produce staphyloxanthin, but B. indicus LMG 22858T contains less of this pigment (Fig.
170
2b), which is apparent in the more yellow, less golden color (Fig. S2a).
171
Based on the minor phenotypic and genomic differences, it is proposed that B.
172
cibi Yoon et al., 2005 should be considered as a later heterotypic synonym of B. indicus
173
Suresh et al. 2004.
174 175
Emended description of Bacillus indicus Suresh et al., 2004)
176
The description is the same as that given by Suresh et al. (2004) except for the
177
following traits. Cells are motile. Growth is observed on R2A and tryptic soy agars, but
178
not on phenylethanol or mannitol-salt agars. Yellow-golden non-diffusible carotenoid-
179
type pigments are produced that are consistent with staphyloxanthin and biosynthetic
180
precursors. The pH range for growth is 6–9, and cells grow at up to 8% NaCl. Cells
181
exhibit oxidase, catalase, amylase, and caseinase activities, but do not liquefy gelatin.
182
On the API ZYM panel, positive for C4 esterase, C8 esterase lipase, leucine
183
arylamidase (weak), α-chymotrypsin, acid phosphatase (weak), naphthol-AS-BI-
184
phosphohydrolase, β-galactosidase, and α-glucosidase, and negative for alkaline
185
phosphatase, C14 lipase, valine and cystine arylamidase, trypsin, α-galactosidase, β-
186
glucuronidase, and N-acetyl-β-glucosaminidase, α-mannosidase, and α-fucosidase, but
187
β-glucosidase varies.
188
189 190
When using inoculating fluid B and the Biolog GenIII plate (27oC), positive for utilization
191
of dextrin, D-maltose, D-trehalose, D-cellobiose, gentiobiose, sucrose, D-turanose,
192
stachyose, D-raffinose, D-melibiose, β-methyl-D-glucoside, D-salicin
193
N-acetyl-D-glucosamine, α-D-glucose, D-fructose, glycerol, gelatin, glycyl-L-proline, L-
194
alanine, L-aspartic acid, L-glutamic acid, L-histidine, L-pyroglutamic acid, L-serine,
195
pectin, D-gluconic acid, D-glucuronic acid, methyl pyruvate, D-lactic acid methyl ester,
196
L-lactic acid, α-keto-glutaric acid, L-malic acid, Tween-40, β-hydroxy-D,L-butyric acid
197
acetoacetic acid, and acetic acid; utilization of D-mannose and galactose varies;
198
negative for the remaining utilization tests. Also on the GenIII plate, resistant to pH 6,
199
NaCl at 1, 4 and 8%, 1% Na-lactate, LiCl, K-tellurite, aztreonam, and Na-butyrate;
200
sensitive to the remaining inhibitory conditions. The DNA G+C content is 44.4 mol%.
201
The Type strain is Sd/3T (=MTCC 4374T=DSM 15820T), the former Type strain of B.cibi,
202
JG-30T (=KCTC 3880T=DSM 16189T), is a second strain of B. indicus.
203 204
Acknowledgements
205
This work has been funded by a Lycoming College Professional Development Grant to
206
JDN and a Summer Student Research Grant to SJS. GCAT-SEEK has been supported
207
by NSF Award # DBI-1248096: RCN-UBE - GCAT-SEEK: The Genome Consortium for
208
Active Undergraduate Research and Teaching Using Next-Generation Sequencing.
209
References
210
Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A., Formsma,
211
K., Gerdes, S., Glass, E.M. & other authors (2008). The RAST Server: rapid
212
annotations using subsystems technology. BMC Genomics 9, 75.
213
Buonaccorsi, V. P., Boyle, M. D., Grove, D., Praul, C., Sakk, E., Stuart, A., Tobin,
214
T., Hosler, J., Carney, S. L. & other authors (2011). GCAT-SEEKquence: Genome
215
Consortium for Active Teaching of Undergraduates through Increased Faculty Access to
216
Next-Generation Sequencing Data. CBE Life Sci Educ 10, 342-5.
217
Buonaccorsi, V.P., Peterson, M., Lamendella, R., Newman, J.D., Trun, N., Tobin,
218
T., Aguilar, A., Hunt, A., Praul,C. & other authors (2014). Vision and Change
219
through the Genome Consortium on Active Teaching using Next-Generation
220
Sequencing (GCAT-SEEK). CBE Life Sci Educ 13, 1-2.
221
Felsenstein J. (1985). Confidence limits on phylogenies: An approach using the
222
bootstrap. Evolution 39, 783-791.
223
Goris, J., Konstantinidis, K.T., Klappenbach, J.A., Coenye, T., Vandamme, P. &
224
Tiedje, J.M. (2007). DNA–DNA hybridization values and their relationship to whole-
225
genome sequence similarities. Int J Syst Evol Microbiol 57, 81–91.
226
Kim, O.S., Cho, Y.J., Lee, K., Yoon, S.H., Kim, M., Na, H., Park, S.C., Jeon, Y.S.,
227
Lee, J.H. & other authors (2012). Introducing EzTaxon-e: a prokaryotic 16S rRNA
228
Gene sequence database with phylotypes that represent uncultured species. Int J Syst
229
Evol Microbiol 62, 716–721.
230
Khaneja, R., Perez-Fons, L., Fakhry, S., Baccigalupi, L., Steiger, S., To, E.,
231
Sandmann, G., Dong, T.C., Ricca, E. & other authors (2010). Carotenoids found in
232
Bacillus. J of Applied Microbiol 108, 1889–1902.
233
Kimura M. (1980). A simple method for estimating evolutionary rate of base
234
substitutions through comparative studies of nucleotide sequences. Journal of
235
Molecular Evolution 16, 111-120.
236
Kirk, K.E., Hoffman, J.A., Smith, K.A., Strahan, B.L., Failor, K.C., Krebs, J.E., Gale,
237
A.N., Do, T.D., Sontag, T.C. & other authors (2013). Chryseobacterium angstadtii sp.
238
nov., isolated from a newt tank. Int J Syst Evol Microbiol 63, 4777-4783.
239
Konstantinidis, K.T. & Tiedje, J.M. (2005a). Genomic insights that advance the
240
species definition for prokaryotes. Proc Natl Acad Sci USA 102, 2567-2572.
241
Konstantinidis, K.T. & Tiedje, J.M. (2005b). Towards a Genome-Based Taxonomy for
242
Prokaryotes. J Bacteriol 187(18), 6258.
243
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P. & Göker, M. (2013). Genome
244
sequence-based species delimitation with confidence intervals and improved distance
245
functions. BMC Bioinformatics 14, 60.
246
Melendez-Martinez, A.J., Stinco, C.M., Liu, C. & Wang, X.D. (2013). A simple HPLC
247
method for the comprehensive analysis of cis/trans (Z/E) geometrical isomers of
248
carotenoids for nutritional studies. Food Chem 138(2–3), 1341–1350.
249
Myers, E. W. & Miller, W. (1988). Optimal alignments in linear space. Comput Appl
250
Biosci 4, 11-17.
251
Newman J. D. (2000). Molecular phylogeny in the undergraduate microbiology
252
laboratory. Focus on Microbiol Educ 6, 3–4.
253
Overbeek, R., Begley, T., Butler, R.M., Choudhuri, J.V., Chuang, H.Y., Cohoon, M.,
254
de Crecy-Lagard, V., Diaz, N. & other authors. (2005). The subsystems approach to
255
genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids
256
Res 33, 5691-5702.
257
Overbeek, R., Olson, R., Pusch, G.D., Olsen, G.J., Davis, J.J., Disz, T., Edwards,
258
R.A., Gerdes, S., Parrello, B. & other authors (2014) The SEED and the Rapid
259
Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids
260
Research 42 (D1): D206-D214. doi: 10.1093/nar/gkt1226.
261
Pelz, A., Wieland, K.P., Putzbach, K., Hentschel, P., Albert, K. & Götz, F. (2005).
262
Structure and Biosynthesis of Staphyloxanthin from Staphylococcus aureus. J Biol
263
Chem 280, 32493-32498.
264
Richter, M. & Rosselló-Móra, R. (2009). Shifting the genomic gold standard for the
265
prokaryotic species definition. Proc Natl Acad Sci USA 106, 19126-19131.
266
Saitou, N. & Nei, M. (1987). The neighbor-joining method: A new method for
267
reconstructing phylogenetic trees. Mol Biol Evol 4, 406-425.
268
Sasser, M. (1990). Identification of bacteria by gas chromatography of cellular fatty
269
acids. Technical Note no. 101. Newark, DE: MIDI Inc.
270
Strahan, B.L., Failor, K.C., Batties, A.M., Hayes, P.S., Cicconi, K.M., Mason, C.T. &
271
Newman, J.D (2011). Chryseobacterium piperi sp. nov., isolated from a freshwater
272
creek. Int J Syst Evol Microbiol 61, 2162-2166.
273
Suresh, K., Prabagaran, S. R., Sengupta, S. & Shivaji, S. (2004). Bacillus indicus sp.
274
nov., an arsenic-resistant bacterium isolated from an aquifer in West Bengal, India. Int J
275
Syst Evol Microbiol 54, 1369–1375.
276
Tamura K., Stecher G., Peterson D., Filipski A. & Kumar S. (2013). MEGA6:
277
Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30, 2725-2729.
278
Tindall B.J., Roselló-Móra, R., Busse, H.-J., Ludwig, W. & Kämpfer, P. (2010).
279
Notes on the characterization of prokaryote strains for taxonomic purposes. Int J Syst
280
Evol Microbiol 60, 249-266.
281
Yoon, J. H., Lee, C. H. & Oh, T. K. (2005). Bacillus cibi sp. nov., isolated from jeotgal,
282
a traditional Korean fermented seafood. Int J Syst Evol Microbiol 55, 733–756.
283
284
Table 1. Fatty acid composition (%) of B. indicus LMG 22858T and B. cibi DSM
285
16189T grown on TSBA at 30oC for 24 hr.
286 287
Strains: 1, B. indicus LMG 22858T; 2, B. cibi DSM 16189T. Fatty acids amounting to