AEM Accepts, published online ahead of print on 29 August 2014 Appl. Environ. Microbiol. doi:10.1128/AEM.01525-14 Copyright © 2014, American Society for Microbiology. All Rights Reserved.
High diversity of tailed phages, eukaryotic viruses and virophage-like elements in the metaviromes of Antarctic soils
Olivier Zablocki (a), Lonnie van Zyl (b), Evelien M. Adriaenssens (a), Enrico Rubagotti (a), Marla Tuffin (b), Craig Cary (c), and Don Cowan (a)#
Centre for Microbial Ecology and Genomics, University of Pretoria, Pretoria, South Africa (a); Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville, South Africa (b); School of Biological Sciences, University of Waikato, Hamilton, New Zealand (c)
Running Head: Metaviromics of Antarctic soil communities
#Address correspondence to Don A. Cowan, [email protected]
Abstract
The metaviromes of two distinct Antarctic hyperarid desert soil communities have been characterized. Hypolithic communities, cyanobacteria-dominated assemblages situated on the ventral surfaces of quartz pebbles embedded in the desert pavement, showed higher virus diversity than surface soils, which correlated with previous bacterial community studies. Prokaryotic viruses (i.e. phages) represented the largest viral component (particularly Mycobacterium phages) in both habitats, with an identical hierarchical sequence abundance of tailed phage families (Siphoviridae > Myoviridae > Podoviridae). No archaeal viruses were found. Unexpectedly, cyanophages were poorly represented in both metaviromes and were phylogenetically distant from currently characterized cyanophages. Putative phage genomes were assembled and showed a high level of unaffiliated genes, mostly from hypolithic viruses. Moreover, unusual gene arrangements were observed, in which eukaryotic and prokaryotic virus-derived genes were found within identical genome segments. Phycodnaviridae and Mimiviridae viruses were the second most abundant taxa and more numerous within open soil. Novel virophage-like sequences (within the Sputnik clade) were identified. These findings highlight a high virus diversity and novel species discovery potential within Antarctic hyperarid soils and may serve as a starting point for future studies targeting specific viral groups.
Introduction
Antarctica is the coldest, driest place on Earth (1). Exposed soil areas comprise approximately 0.4% of the continent’s surface and are mainly located in coastal areas, particularly on the Antarctic Peninsula and in the McMurdo Dry Valleys (2). These mineral soils are exposed to a range of “extreme” abiotic factors, including very low temperatures, high soil salinity, low water availability and nutrient levels, high levels of UV radiation and strong, cold (katabatic) winds descending from glaciers or mountain tops. Due to these conditions, the most morphologically distinct soil communities (i.e. type I, II and III hypoliths) are associated with lithic surfaces (3, 4). Hypolithic communities, occurring on the ventral surfaces of translucent quartz rocks, have been shown to be mostly composed of phototrophic cyanobacterial species (5). These photoautotroph-dominated communities have been attributed crucial roles within the Antarctic soil ecosystem, such as primary productivity and nitrogen input (6, 7). While the composition of these communities is now reasonably well understood (8-12), the associated viruses, with their potential to influence microbial population dynamics and nutrient cycling via viral lysis (13), have yet to be characterised.
No comprehensive analyses of the collective viral genomic content (i.e. the metavirome) of Antarctic soils have yet been published, with the limited number of reported Antarctic viral metagenomic studies focusing on aquatic systems (14, 15, 16) and Antarctic megafauna such as seals (17) and penguins (18). Metaviromic surveys of saline meromictic lakes have shown a high diversity of virus-like particles (mostly phages) and several virophages (15, 16). To date, the few studies of viruses in Antarctic soils have all focused on classical phage isolation and lytic induction experiments from culturable bacterial species (19, 20, 21). Here we report a comprehensive characterization of virus diversity using a metagenomic approach in Antarctic desert soils, with a focus on the dsDNA virus composition of two common microhabitats: open surface soils and hypolithic communities.
Materials and Methods
Sampling location
Samples were collected from the Miers Valley, Ross Dependency, in Eastern Antarctica (GPS coordinates: 78° 05.6’ S, 163° 48.6’ E) during the austral summer period of 2011. For the open soil sample, 1.5 kg of surface soil (0-2 cm depth) was collected from an approx. 1 m² area at a single location. The hypolith sample consisted of 0.5 kg of hypolith scrapings gathered aseptically from a collection of cyanobacterial-type hypoliths (n > 50) from an area of approx. 50 m². The open soil sample was recovered from within this area. Samples were transferred and stored in sterile Whirl-Paks (Nasco, product number B01445WA) at below 0°C in the field and during transport, and at -80°C in the laboratory.
Sample processing, DNA extraction and sequencing
Both samples were processed similarly to (22). Both the open soil sample and the bulked hypolithic sample were suspended in 3 L of de-ionized water and shaken vigorously. Solids were allowed to settle and the supernatant was decanted. The process was repeated and both supernatants were mixed. The supernatant was centrifuged at 1593 ×g for 10 min (Beckman, JA10 rotor), decanted and passed through a 0.22 μm filter (Millipore, Stericup 500 ml, 0.22 μm, Cat. no. SCGPU05RE). Virus particles were collected from the filtrate by centrifugation in a Beckman JA20 rotor at 43,667 ×g for 6 hours, in autoclaved 30 ml Nalgene PPCO tubes (cat. No. 3119-0030). The 6 L were spun down by discarding the supernatant from each 30 ml tube (x 8 in a JA20) after a round of centrifugation, then adding another 30 ml of the extract to the tube. The individual pellets were resuspended in 3 ml successively: the first pellet was resuspended in 3 ml TE buffer and the liquid was then transferred to the next tube, the pellet resuspended properly, then transferred to the next tube and so on until all pellets were resuspended. The pellets were treated with DNAseI (EN0521, Fermentas) and RNAseA (EN0531, Fermentas) to a final concentration of 0.1 µg/ml at 37°C for 1 hour. The presence of bacterial DNA was checked by amplifying the 16S rRNA gene (primers E9F and U1510R; 23, 24) as follows: 1 µl of genomic DNA was mixed with 2.5 µl of each primer (10mM), 2.5 µl of 2 µM dNTPs, 2.5 µl of 10X DreamTaq buffer (ThermoFisher Scientific, MA, USA), 1 µl BSA 10 mg/ml, 0.125 µl DreamTaq polymerase (ThermoFisher Scientific, MA, USA) and milliQ water to a total volume of 25 µl. PCR was conducted under the following thermal regime: 95°C for 5 min; 30 cycles of 95°C for 30 sec, 52°C for 30 sec and 72°C for 85 sec; and a final 10 min at 72°C. The virus suspension was treated with Proteinase K (Fermentas) at a final concentration of 1 µg/ml at 55°C for 2 hours. SDS (70 µl, 20%) was added and the mixture was incubated at 37°C for 1 hour. Nucleic acids were purified by performing two rounds of phenol:chloroform:isoamyl alcohol (25:24:1) extraction followed by chloroform:isoamyl alcohol (24:1) phase separation. DNA was precipitated by the addition of one tenth volume sodium acetate (3 M, pH 5.2) and two volumes 100% ethanol and left overnight at 4°C. Samples were centrifuged at 29,000 ×g for 10 min to pellet the DNA, which was resuspended in 30 µl of TE buffer. The DNA was further cleaned using the Qiagen Gel Extraction kit (Qiaex II, cat. no. 20021). 10 ng of each sample was then used to perform Phi29 amplification (GE Healthcare GenomiPhi HY DNA amplification kit CAT# 25-6600-20) using the manufacturer's recommendations. Library preparation included a 10% phiX V3 spike as per manufacturer’s instructions (Preparation guide, Part #15031942, May 2012 revision) with the Illumina Nextera XT library prep kit/MiSeq Reagent kit V2. The amplified DNA was sequenced (2×250 bp reads, ~250 bp average insert size) on the Illumina MiSeq Sequencer platform located at the University of the Western Cape, Cape Town, South Africa. These sequence data have been submitted to the DDBJ/EMBL/GenBank databases (sequence read archive, SRA) under study accession SRP038018 (hypolith library) and SRP035457 (open soil library).
Sequence data analysis
Sequence reads were curated for quality control and adapter trimmed using CLC Genomics version 6.0.1 (CLC, Denmark), using the default parameters. Unpaired reads were aligned against each other using Bowtie under default parameters. De novo assembly for each read dataset was performed with both CLC Genomics and DNASTAR Lasergene SeqMan 5 assembler suite using the default parameters. Reads and contigs were uploaded to the MetaVir (25) version 2 server (http://metavir-meb.univ-bpclermont.fr/) and MG-RAST (26; http://metagenomics.anl.gov/) for virus diversity estimations (data available from these webservers). Taxonomic composition by MetaVir was computed from a BLAST comparison with the RefSeq complete viral genomes protein sequences database from NCBI (release of 2013-05-01) using BLASTp with a threshold of 10^-5 on the e-value. Assembled reads were searched for open reading frames and compared to the RefSeq complete viral database (through the MetaVir pipeline) and MG-RAST, which include annotations using, for functional and organism assignment: GenBank, IMG, KEGG, PATRIC, RefSeq, SEED, SwissProt, trEMBL and eggNOG. The subset of affiliated contigs (i.e. those with predicted genes having a database match) generated by CLC Genomics was compared to the contigs generated by LaserGene using BLASTx under standard parameters. For the assignment of functional hierarchy, COG, KO and NOG databases were used. Guanine-cytosine (GC) content was determined by importing .fasta files into BioEdit (27). The presence of tRNAs in annotated contigs was assessed with the tRNAscan-SE software accessible through http://lowelab.ucsc.edu/tRNAscan-SE/ (28). For prediction of phage lifestyle and host Gram stain reaction, whole genome protein sequences of candidate phage genomes were submitted to the online version of PHACTS (http://www.phantome.org/PHACTS/; 29). Aligned marker genes showing sufficient homology (>150bp, MetaVir) against the contigs were recovered and phylogenetic analysis was performed using MEGA5 (http://www.megasoftware.net/). Rooted dendrograms were inferred using the maximum likelihood method with a bootstrap test of 1000 pseudo-replicates. Phylogenetic analysis for virophage sequences was performed independently from MetaVir. Metavirome virophage amino acid sequences, as well as 9 virophage MCP sequences obtained from the NCBI GenBank database, were aligned with the online version of MAFFT version 7 (http://mafft.cbrc.jp/alignment/software/). Tree construction was conducted as outlined in Zhou et al., 2013 (30).
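The e-value threshold used for taxonomic assignment can be illustrated with a minimal Python sketch. It assumes that hits are available as standard 12-column tabular BLAST output and that hits at or below the threshold are retained; both are assumptions, since MetaVir runs the comparison server-side and the text does not describe its internal format. The function name and the query and subject identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # threshold on the e-value quoted in the text


def best_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Keep, for each query, the hit with the lowest e-value at or below the cutoff.

    Assumes the standard 12-column tabular layout, in which column 1 is the
    query ID, column 2 the subject ID and column 11 the e-value.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue  # hit is not significant at this threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: ORF identifiers and subject names are illustrative only.
example = [
    "orf_1\tphage_A\t45.0\t200\t110\t2\t1\t200\t5\t204\t3e-40\t160",
    "orf_1\tphage_B\t30.0\t150\t105\t4\t1\t150\t9\t158\t2e-07\t55",
    "orf_2\tvirus_C\t25.0\t80\t60\t3\t1\t80\t2\t81\t0.5\t28",
]
hits = best_hits(example)
```

In this example only `orf_1` receives an affiliation; `orf_2` would be counted among the predicted genes with no significant database match.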
Results
Viral diversity estimations
The presence of bacterial contamination was deemed negligible in both metavirome libraries (using the 16S gene fragment), as no discernible bands of amplified products were obtained. MiSeq reads ranged from 236-241 bp, with an overall higher G+C content within reads obtained from the open soil library. Sequencing metadata, assembly metrics and BLASTp searches are summarized in Table 1. BLASTx comparison of contig datasets from both habitat libraries (generated by two separate assemblers) revealed that 99.41% (hypolith library) and 99.5% (open soil library) of affiliated contigs were shared between the two assembled read datasets. Contigs from CLC Genomics were used for the remainder of the analysis. Aligned against each other, libraries contained 66.01% of reads that were unique to each habitat, while 33.99% were shared (a read aligned at least once). In both read datasets, bacteria were the most represented hits (80.7-94.5%). However, these estimations varied depending on the metagenomic platform used and whether reads or contigs were submitted. BLASTp searches from MetaVir using contigs produced significantly more virus-related hits compared to MG-RAST. For example, 1.9% of the open soil contigs were predicted to be viral in origin in MG-RAST, while MetaVir with the same dataset predicted 18.8%. The same was true for the hypolith library, where MG-RAST predicted 12.8% for viruses, while MetaVir predicted 19.2%. Archaea, Eukaryota and “other” represented the smallest fraction while viruses (particularly in hypolith) were second in terms of contig affiliations (Figure 1). Total diversity (α-diversity) was computed by MG-RAST using normalized values, since an unequal distribution of reads between open soil and hypolith libraries was obtained. The hypolith library was three-fold more diverse compared to the open soil library (1058.4 vs. 344.5 species). In contrast, the open soil showed higher taxonomic abundance (species evenness, γ-diversity) compared to the hypolith (15663 vs. 11480; cf. Table 2). Proteobacteria and Firmicutes were the most abundant in both libraries, with viruses in hypolith the third most abundant organisms. Rarefaction curves generated by MG-RAST (Supplementary Figure 1) showed the hypolith library was sampled more comprehensively compared to the open soil. Nineteen virus families were identified by MetaVir (Table 3), in which prokaryotic viruses were the most abundant in both habitats (OS = 76.0%; HY = 82.3%). Identified phages were dominated by the order Caudovirales in the following abundance ranking (identical for both habitats): Siphoviridae > Myoviridae > Podoviridae. The next most highly represented virus families were Mimiviridae and Phycodnaviridae, both more numerous in the open soil sample. Viral parasites of large dsDNA viruses, i.e. virophages (31), were exclusively identified in the open soil habitat. Signatures from Adenoviridae, Bicaudaviridae, Hytrosaviridae, Retroviridae and Rudiviridae were found in low numbers in the hypolith habitat only. Both habitats contained 13.5-15.1% of sequences identified as unclassified viruses.
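As a rough illustration of what an α-diversity figure expressed in "species" can mean, the sketch below computes an effective number of species as the exponential of the Shannon index. This assumes the estimator follows the MG-RAST convention of reporting α-diversity as the antilog of Shannon diversity; the text does not state the formula, and the abundance counts and function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one non-zero count is required")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def effective_species(counts):
    """Antilog of the Shannon index: the effective number of species."""
    return math.exp(shannon_index(counts))


# Hypothetical species-level abundance counts, for illustration only.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

even_value = effective_species(even_community)
skewed_value = effective_species(skewed_community)
```

With four equally abundant taxa the value is 4; as one taxon becomes dominant the value falls toward 1, so a measure of this kind reflects evenness as well as richness.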
Due to the lack of universal markers for viruses (such as the 16S rDNA marker used for bacteria or the 18S for eukaryotes), markers targeting virus families/species were used instead to improve taxonomic affiliation of the annotated ORFs, from both assembled reads (contigs) and reads alone. Sequences with significant homology to reference markers are shown in Supplementary Table 1. The large terminase subunit (TerL) marker, required for packaging initiation in members of the Caudovirales (32), was the most common match in both habitats. This was consistent with the taxonomic affiliations of contigs in the virus families shown in Table 3. Non-bacterial viruses (Paramecium bursaria chlorella virus and Emiliania huxleyi virus, family: Phycodnaviridae and invertebrate viruses, family: Ascoviridae), identified with the major capsid protein (MCP) and DNA polymerase family B (PolB) gene markers, were found exclusively within the open soil community. For virophage-related sequences, 5 candidate ORFs were submitted to a tBLASTn query and showed closest similarity to the Zamilon (33) and Sputnik (31) virophages, isolated from soil and aquatic environments, respectively. We attempted to determine the phylogenetic relationship of these sequences, as amongst the very few currently recognised virophages, one has been isolated from Organic Lake, Antarctica. Amongst these ORFs, a partial MCP sequence (344 amino acids long) was identified and aligned with other known virophage MCP sequences (30). The MCP tree in Supplementary Figure 2 shows an identical clustering pattern to that of (30) and indicates that the virophage sequence from open soil (abbreviated Miers Valley Soil Virophage, MVSV) belonged to the Sputnik virophage group (cluster 1), and was not more closely related to its Antarctic counterpart. Its position within the tree suggests that MVSV shares a genetically distant ancestor with Sputnik and Zamilon virophages. No reads with significant homology to the psbA gene (a marine cyanophage photosynthesis-related gene) were identified. However, other cyanophage sequences were detected within the G20 and PhoH phylogenies from the hypolith dataset alone (Figure 2), present as highly divergent sequences at the root of cyanophage sequence clusters. A summary of marker-identified phage species for each marker is shown in Supplementary Table 2.
Functional composition of hypoliths and open soil
The hypolith dataset was highly uncharacterized, with 81.3% of predicted proteins having no significant homologs, compared to 58.5% in the open soil. 26 functional categories were assigned to both libraries (Figure 3), each sub-divided into distinct sub-systems. Apart from the phage category, functional abundance in all categories was greater in open soil. Highest abundance variations between both biotopes included several metabolic pathways involving phosphorus, nitrogen, aromatic compounds and iron metabolism. Dormancy and sporulation-related functions were also notably higher in the open soil. A similar trend was found for stress-related functions, including oxidative, osmotic and acid stress. However, desiccation stress-related protein functions were found almost exclusively in the hypolith library. Virus-specific functional components were retrieved manually from the MetaVir server, counted and classified into several virus component categories, shown in Supplementary Table 3. Both habitat samples contained genes encoding numerous virus structural proteins (portal, tape measure, and capsid) and enzymes (terminases, DNA/RNA polymerases, helicases and lysins), consistent with an abundance of tailed phage related components.
Due to the large number of assembled contigs, a subset of 26 was selected for further analysis (Supplementary Table 4) based on a combination of criteria: size (≥ 10,000 bp), percentage of annotated ORFs within a contig (11-100%) and predicted circularity of the putative genome. On average, the percentage of homologous genes from public databases was 40.2 ± 21.2% in open soil and 31.9 ± 12.5% in the hypolith sample. Average G+C content in open soil and hypolith contigs was 55.6 ± 7.3% and 44.3 ± 3.2%, respectively. Phage genomes were submitted to PHACTS (29) for lifestyle (temperate or lytic) and host Gram reaction prediction. As a general trend for both habitats, putative temperate phages dominated (61.5%) while predicted host range was 88.5% Gram negative. These predictions are supported by a recent study (34), which reported that Gram negative Proteobacteria were the dominant phylum in hypolithic and open soil habitats within the Dry Valleys.
For the open soil habitat alone, contigs contained genes from two virus families infecting algae (Phycodnaviridae-like) and amoeba (Mimiviridae-like), positioned between phage-related genes. The largest contig (AntarOS_1, 177 571 bp) contained one gene from Acanthamoeba polyphaga mimivirus and one from Paramecium bursaria chlorella virus A1 while the rest of the genes were phage related. Several core genes (35) from the nucleocytoplasmic large DNA viruses (NCLDVs) were identified in the open soil (also to a lesser extent in the hypolith library) contig dataset. These included Topoisomerase II, RNA polymerase subunit 2, guanylyltransferase, RuvC, dUTPase 2, thymidylate kinase, MutT/NUDIX motif and Ankyrin repeats. A hybrid gene arrangement from different viruses was found in another contig, AntarOS_17 (Figure 4). This 24 870 bp contig was divided into 30 predicted ORFs, 21 of which showed significant homology to virus genes in the RefSeq database (detailed BLAST results for individual ORFs can be found in Supplementary Table 5). Of the 30 predicted ORFs, from gene 11 to gene 30 but excluding gene 18, 67% showed significant homology at the amino acid level with a single microalgae-infecting virus species (unclassified Tetraselmis viridis virus S1, NC_020869.1). Genes 4-8, 10 and 18 showed similarity to several phages infecting Burkholderia, Rhodobacter and Azospirillum species. The genome size was too short to be considered phycodnavirus-like (36), and the contig possessed phage genes that were in the usual functional synteny towards phage head maturation (such as TerS, TerL and capsid protein, ORF 4-8). A tBLASTn search using the protein sequence of the TerL gene against the NCBI nr database showed highest similarity (34-35% identity, e-value 6×10^-76) to various Streptococcus phi phages including SsUD1, m46.1 and D12. To verify that this contig did not result from read misassembly (i.e. was not chimeric), two sets of primers were designed to amplify fragments from the overlapping region between ORF 10 and ORF 11, which, based on the contig annotations, appeared to delineate two gene sets from different viruses. Amplicons of expected sizes were obtained and sequenced bi-directionally by Sanger technology. The sequenced fragments aligned with their respective regions, which indicated that this contig region was correctly assembled.
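The contig selection criteria given earlier in this section (size, proportion of annotated ORFs and predicted circularity) can be written as a simple filter. The sketch below is an illustration under stated assumptions: the text does not say how the three criteria were combined, so circularity is treated as an optional requirement, and the `Contig` class, its field names, the function names and the second example record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length_bp: int
    n_orfs: int
    n_annotated_orfs: int
    circular: bool  # predicted circularity of the putative genome


def annotated_pct(contig):
    """Percentage of predicted ORFs that have a database match."""
    if contig.n_orfs == 0:
        return 0.0
    return 100.0 * contig.n_annotated_orfs / contig.n_orfs


def is_candidate(contig, min_length_bp=10_000, min_annotated_pct=11.0,
                 require_circular=False):
    """Apply the size and annotation thresholds quoted in the text.

    require_circular is an assumption: the text lists circularity as a
    criterion without saying whether it was mandatory.
    """
    if contig.length_bp < min_length_bp:
        return False
    if annotated_pct(contig) < min_annotated_pct:
        return False
    if require_circular and not contig.circular:
        return False
    return True


contigs = [
    # Length and ORF counts as reported for AntarOS_17; the circular flag is hypothetical.
    Contig("AntarOS_17", 24_870, 30, 21, circular=False),
    # Entirely hypothetical short contig.
    Contig("example_short", 4_200, 6, 5, circular=True),
]
selected = [c.name for c in contigs if is_candidate(c)]
```

With the default thresholds the short contig is rejected on length alone, whereas the AntarOS_17 figures (21 of 30 ORFs annotated, i.e. 70%) pass both the size and annotation thresholds.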
Phage-host associations
As both habitats showed a high diversity of phage-related sequences, taxonomic affiliations of the reads (marker gene-independent) were categorized according to host and relative sequence abundances in both habitat samples (Figure 5). Phage sequences were most closely affiliated with host species spanning 5 bacterial phyla: Firmicutes (7 bacterial genera), Proteobacteria (8 bacterial genera), Cyanobacteria (3 bacterial genera), Bacteroidetes (Flavobacterium) and Actinobacteria (4 bacterial genera). By comparing the bacterial OTU distributions in the same soil environments generated by (34), we attempted to correlate presence/absence of bacterial OTUs based on the phage sequences obtained. Additionally, we included in our comparison 454 sequencing-based soil metagenomic data from (37), which surveyed moraine soil collected from the margins of a permanent melt water pond located at Mars Oasis on Alexander Island, west of the Antarctic Peninsula. Based on identified phage species, only Firmicutes were found in both soil habitats in this study but not in the 16S/TRFLP bacterial data from (34). This discrepancy between phage and bacterial data was also observed for hypoliths in hot desert soils (Adriaenssens, personal communication). The other major bacterial phyla (Proteobacteria, Cyanobacteria, Bacteroidetes and Actinobacteria) were present in both the metavirome data and 16S/TRFLP sequence data. However, bacterial genera indirectly identified by their phages from this study were all found in the Pearce et al. survey (37).
At the level of individual phages, Lactococcus and Mycobacterium phage sequences were most common in the hypolith sample (> 10%), whereas in open soil the largest fraction (> 6%) was composed of Bacillus, Pseudomonas and Mycobacterium phage sequences. Few phage host species could be linked to the 16S/TRFLP data, but at the phylum level, Proteobacteria and Actinobacteria were present in both datasets. Caulobacter and Flavobacterium were found both in this study (identified by their phages) and in the 16S/TRFLP data. However, the top 10 virus fraction obtained by Pearce et al. was similar to this study, where Mycobacterium phages ranked first (in the case of the hypolith sample), but also included Pseudomonas, Enterobacteria, Flavobacterium and Synechococcus phages.
Discussion
Unlike aquatic ecosystems, which have received considerable attention since the advent of viral metagenomics (38), the virus diversity of many soil habitats has not been extensively characterized. In studies of Antarctic continental microbiology, only fresh water lake metaviromes have been reported to date (14, 15). Recent phylogenetic studies of Antarctic Dry Valley soils have shown that hypolithic communities represent the most biodiverse and complex biological assemblages in this hyperarid soil biome (3). Diversity estimations generated from our data (including viruses) suggest a similar pattern (three-fold higher diversity compared to open soil). Furthermore, read libraries shared a mere 33% overlap, indicating that hypolithic communities are a distinct soil habitat from their surroundings. This uniqueness, coupled with their higher microbial diversity, may qualify hypolithic communities as biodiversity “micro hotspots” in this hyperarid desert. A large fraction of ORFs (62.5-84.5%) from both soil habitat samples had no significant homologs in public sequence databases, as also observed in another published soil metavirome (39). Rarefaction curves indicated that the open soil biotope has not been sampled sufficiently, and therefore a greater sequencing depth would be advisable in future metagenome experiments for this habitat.
Taxonomic/functional affiliation of predicted ORFs and gene marker analyses (e.g. using TerL) were consistent with the conclusion that tailed bacteriophages were the primary virus component in both soil habitats. Furthermore, an identical family-specific hierarchical abundance was observed for both habitats (Siphoviridae > Myoviridae > Podoviridae) but with a higher sequence diversity in hypolith communities. Compared to other virus groups, phage sequences were predictably over-represented in both metaviromes, given that the biotic component in these soils is dominated by prokaryotes (9). However, we note that our nucleic acid extraction method would exclude RNA viruses (either single or double stranded) and we therefore do not claim that our data reflect the complete viral diversity in these soil habitats. As a sampling bias, viruses in a prophage state may constitute a large (and unsurveyed) proportion of the dsDNA phage diversity, given that it has been reported (19) that many soil-borne bacteria appear to contain prophages, including those from Antarctic soils. Conversely, our data suggest that ~61% of phage assemblages are temperate.
Very few archaeal virus signatures were found in either soil habitat, consistent with previous prokaryote diversity studies (9, 34). Unexpectedly, cyanophages were poorly represented in the hypolith sample (in terms of sequence abundance and diversity). Given the dominance of cyanobacteria in Type I hypoliths (10, 12), it was reasonably predicted that cyanophages would represent a major clade. The apparent success of cyanobacteria as dominant elements of the hypolithic community might possibly be linked to the low abundance of associated viruses (where the phage infection of other bacterial groups such as Mycobacteria, Bacillus, Flavobacterium and Pseudomonads was higher and their host populations under tighter predation control). However, this is in contradiction to the general understanding that the most abundant phage groups in any given environment reflect the abundance of microbial community members found in that environment (40). While it is thus possible that cyanophages genuinely represent a minor component of phage diversity, we suggest that this result is an artefact of the substantial under-representation of soil-associated cyanophage genome sequences in public databases, further accentuated by the fact that most characterized cyanophages are of aquatic origin (41). To our knowledge, only a small number of cyanophages of soil origin have been described to date (42), in a study that also reported a high phylogenetic distance from “common” marine cyanophages, emphasizing the fact that little is known about these cyanophages. In support of this, cyanophage communities in paddy field soils have been shown to be different from those in freshwater, marine water and even paddy floodwater, identifying unique G20 subclusters specific to soil derived cyanophages (42). In the current study, marker gene analysis successfully identified several metavirome sequences at the root of cyanophage clusters, suggesting that these represent novel phage phylotypes with a high genetic distance from currently characterized cyanophages. The high abundance of phages infecting certain bacterial genera such as Mycobacterium, Lactococcus, Bacillus and Pseudomonas (~6-10% of all identified phage sequences) in both Antarctic desert soil habitats has been reported by Pearce et al. (2012).
351
Sequences with close homology to large dsDNA eukaryotic virus families such as
352
Mimiviridae and Phycodnaviridae-like genomic elements were found as the second largest
353
virus component (0.88% - 4.33%) in both habitats (excluding the unclassifiable virus fraction
354
(13.5%-15.1%)). Mimivirus -related sequences were unexpected, as a 0.22µm filter size
355
should have excluded large virus particles (~0.7 micron (43)), as well as repeated
centrifugation steps. Phycodnaviruses, at ~0.16 ± 0.06 microns (44), would be expected to be
recovered in the filtrate. However, detection of MCP components from a novel Sputnik-like
virophage, a parasite of large dsDNA viruses (31), provided further indirect evidence for the
presence of mimivirus-like populations in the open soil habitat. In addition, the identified
virophage sequence was more closely related to geographically distant isolates (France and
Tunisia) than to the other virophage isolate from Organic Lake in Antarctica. A recent
study (37) showed sequences belonging to host genera (Paramecium, Chlorella and
Acanthamoeba) and to their associated viruses (Chlorovirus and Mimivirus) in Antarctic
moraine soil. La Scola et al. (45) first demonstrated the presence of Mimiviruses in soil
(previously only isolated from aquatic habitats). The present metavirome sequences,
combined with pyrosequencing data of metagenomic libraries from Pearce et al. (37), provide
additional evidence for the presence of Mimivirus-like genome elements in Antarctic soils.
Further sampling to isolate virophages from Antarctic soils would provide further
insight into the ecology and function of these infectious agents, given that their
contributions to the regulation of viral populations are starting to become apparent in
other habitats (33). The unusual gene configuration observed within contig AntarOS_17
(where phage and eukaryotic virus genes were predicted) was confirmed by PCR on the original
DNA sample, therefore ruling out misassembly of the reads for this region. Most likely, the
apparent mixture was caused by misannotation of the predicted ORFs, owing to a lack of closer
homologs in databases. While read misassembly is still a possibility in other generated
contigs, confidence in assembly accuracy was high, as the Illumina sequencing control used in
both runs, phiX174 (a ~5000 bp ssDNA virus), was re-assembled almost completely (99.7%) and
correctly annotated by the MetaVir pipeline.
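A recovery figure such as the one quoted for the phiX174 control can be read as the fraction of reference positions covered by at least one assembled contig. The Python sketch below illustrates that calculation under stated assumptions: the function name `covered_fraction`, the 0-based half-open interval convention and the example coordinates are hypothetical and are not taken from this study. The reference length of 5,386 nt is the published phiX174 genome length.

```python
def covered_fraction(intervals, ref_length):
    """Return the fraction of a reference covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open,
    giving where each contig aligns on the reference (hypothetical input).
    """
    covered = 0
    current_end = 0  # right edge of the region counted so far
    for start, end in sorted(intervals):
        start = max(start, current_end, 0)  # skip positions already counted
        end = min(end, ref_length)          # clip to the reference
        if end > start:
            covered += end - start
            current_end = end
    return covered / ref_length


# Illustrative coordinates only: two overlapping contigs leaving a 16 nt gap.
PHIX_LENGTH = 5386
example = [(0, 3000), (2500, 5370)]
print(f"{100 * covered_fraction(example, PHIX_LENGTH):.1f}% of the reference covered")
```

Sorting the intervals first means overlapping contigs are counted only once, so the result cannot exceed 100%.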

Virus families representing less than 0.5% of sequence abundance (Table 3) included those
infecting Diptera, arthropods and other invertebrates, and were mostly found in the open soil
habitat. As these hosts have been shown to occur on the Antarctic Peninsula (46, 47), these
sequences may represent an additional pool of uncharacterized viruses within the Antarctic
invertebrate fauna.

A positive correlation between phage genera from this study and their associated hosts
identified in other bacterial diversity studies was established (34, 37, 48, 49). Previous
comparisons of hypolith and open soil community diversity (α-diversity) showed a higher
degree of diversity in hypoliths than in open soil (34); the same was demonstrated to be true
for their associated viruses.

This study represents an initial broad survey of virus diversity in Antarctic hyperarid desert
soils, and has demonstrated that these local virus assemblages are highly diverse and largely
uncharacterized. Because few homologous sequences are available in present databases,
the generation of additional metagenomic sequence data is not likely to yield usable
information. This emphasises the need for more “traditional” studies, performed in parallel on
identical sample sources. These include morphological data from microscopy, lytic induction
(e.g. with mitomycin C) of raw soil and Sanger sequencing of clones targeting specific virus
families. Unfortunately, a large fraction will most likely remain uncharacterizable in vitro,
as the majority of their hosts (bacteria in particular) remain unculturable. Larger eukaryotic
viruses infecting algae, amoebae and invertebrates have not previously been characterized in
this environment, and our data demonstrate that these viruses represent an unknown virus
population that awaits characterization. Such data would further advance our understanding
of the trophic structure and function of communities inhabiting this cold, hyperarid desert
biome.

Acknowledgements

The authors gratefully acknowledge financial support from the National Research
Foundation (SANAP), the University of Waikato NZTABS program, Antarctica New
Zealand and the University of Pretoria Genomics Research Institute. The authors also wish to
thank the Centre for High Performance Computing (CHPC), an initiative supported by the
Department of Science and Technology of South Africa. E. M. Adriaenssens holds a
Vice-Chancellor’s fellowship of the University of Pretoria and O. Zablocki is supported by the
NRF-DST doctoral Innovation fund. Opinions expressed and conclusions arrived at are those of
the authors and are not necessarily to be attributed to the NRF. The authors declare no
conflict of interest.

References

1. Oerlemans J, Fortuin JPF. 1990. Parameterization of the annual surface temperature and mass balance of Antarctica. Ann. Glaciol. 14: 78-84.
2. Bockheim JG, McLeod M. 2008. Soil distribution in the McMurdo dry valleys, Antarctica. Geoderma 144: 43-49.
3. Cary SC, McDonald IR, Barrett JE, Cowan DA. 2010. On the rocks: the microbiology of Antarctic Dry Valley soils. Nature Rev. Microbiol. 8(2): 129-138.
4. Wierzchos J, de los Ríos A, Ascaso C. 2013. Microorganisms in desert rocks: the edge of life on Earth. Int. Microbiol. 15(4): 172-182.
5. Wood SA, Rueckert A, Cowan DA, Cary SC. 2008. Sources of edaphic cyanobacterial diversity in the Dry Valleys of Eastern Antarctica. ISME J. 2(3): 308-320.
6. Tracy CR, Streten-Joyce C, Dalton R, Nussear KE, Gibb KS, Christian KA. 2010. Microclimate and limits to photosynthesis in a diverse community of hypolithic cyanobacteria in northern Australia. Environ. Microbiol. 12(3): 592-607.
7. Cowan DA, Sohm JA, Makhalanyane TP, Capone DG, Green TGA, Cary SC, Tuffin IM. 2011. Hypolithic communities: important nitrogen sources in Antarctic desert soils. Environ. Microbiol. Rep. 3(5): 581-586.
8. Smith MC, Bowman JP, Scott FJ, Line ME. 2000. Sublithic bacteria associated with Antarctic quartz stones. Antarct. Sci. 12(2): 177-184.
9. Pointing SB, Chan Y, Lacap DC, Lau MC, Jurgens JA, Farrell RL. 2009. Highly specialized microbial diversity in hyper-arid polar desert. Proc. Natl. Acad. Sci. U. S. A. 106(47): 19964-19969.
10. Cowan DA, Khan N, Pointing S, Cary SC. 2010. Diverse hypolithic refuge communities in Antarctic Dry Valleys. Antarct. Sci. 22: 714-720.
11. Cowan DA, Pointing SB, Stevens MI, Cary SC, Stomeo F, Tuffin IM. 2011. Distribution and abiotic influences on hypolithic microbial communities in an Antarctic Dry Valley. Polar Biol. 34(2): 307-311.
12. Khan N, Tuffin M, Stafford W, Cary C, Lacap DC, Pointing SB, Cowan DA. 2011. Hypolithic microbial communities of quartz rocks from Miers Valley, McMurdo Dry Valleys, Antarctica. Polar Biol. 34(11): 1657-1668.
13. Laybourn-Parry J. 2009. No place too cold. Science 324(5934): 1521-1522.
14. Säwström C, Lisle J, Anesio AM, Priscu JC, Laybourn-Parry J. 2008. Bacteriophage in polar inland waters. Extremophiles 12(2): 167-175.
15. López-Bueno A, Tamames J, Velázquez D, Moya A, Quesada A, Alcamí A. 2009. High diversity of the viral community from an Antarctic lake. Science 326(5954): 858-861.
16. Yau S, Lauro FM, DeMaere MZ, Brown MV, Raftery MJ, Andrews-Pfannkoch C, Lewis M, Hoffman JM, Gibson JA, Cavicchioli R. 2011. Virophage control of Antarctic algal host–virus dynamics. Proc. Natl. Acad. Sci. U. S. A. 108(15): 6163-6168.
17. Kennedy S, Kuiken T, Jepson PD, Deaville R, Forsyth M, Barrett T, Wilson S. 2000. Mass die-off of Caspian seals caused by canine distemper virus. Emerg. Infect. Dis. 6(6): 637.
18. Wallensten A, Munster VJ, Osterhaus ADME, Waldenström J, Bonnedahl J, Broman T, Olsen B. 2006. Mounting evidence for the presence of influenza A virus in the avifauna of the Antarctic region. Antarct. Sci. 18(3): 353.
19. Williamson KE, Radosevich M, Smith DW, Wommack KE. 2007. Incidence of lysogeny within temperate and extreme soil environments. Environ. Microbiol. 9(10): 2563-2574.
20. Meiring TL, Tuffin IM, Cary C, Cowan DA. 2012. Genome sequence of temperate bacteriophage Psymv2 from Antarctic Dry Valley soil isolate Psychrobacter sp. MV2. Extremophiles 16(5): 715-726.
21. Swanson MM, Reavy B, Makarova KS, Cock PJ, Hopkins DW, Torrance L, Taliansky M. 2012. Novel bacteriophages containing a genome of another bacteriophage within their genomes. PLoS One 7(7): e40683.
22. Adriaenssens EM, Van Zyl L, De Maayer P, Rubagotti E, Rybicki E, Tuffin M, Cowan DA. 2014. Metagenomic analysis of the viral community in Namib Desert hypoliths. Environ. Microbiol. DOI: 10.1111/1462-2920.12528
23. Hansen MC, Tolker-Nielsen T, Givskov M, Molin S. 1998. Biased 16S rDNA PCR amplification caused by interference from DNA flanking the template region. FEMS Microbiol. Ecol. 26(2): 141-149.
24. Reysenbach AL, Pace NR, Robb FT, Place AR. 1995. Archaea: a laboratory manual–thermophiles. Cold Spring Harb. Protoc. 16: 101-107.
25. Roux S, Faubladier M, Mahul A, Paulhe N, Bernard A, Debroas D, Enault F. 2011. Metavir: a web server dedicated to virome analysis. Bioinformatics 27(21): 3074-3075.
26. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. 2008. The metagenomics RAST server–a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9(1): 386.
27. Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41: 95-98.
28. Schattner P, Brooks AN, Lowe TM. 2005. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33: W686-W689.
29. McNair K, Bailey BA, Edwards RA. 2012. PHACTS, a computational approach to classifying the lifestyle of phages. Bioinformatics 28(5): 614-618.
30. Zhou J, Zhang W, Yan S, Xiao J, Zhang Y, Li B, Pan Y, Wang Y. 2013. Diversity of virophages in metagenomic data sets. J. Virol. 87(8): 4225-4236.
31. La Scola B, Desnues C, Pagnier I, Robert C, Barrassi L, Fournous G, Raoult D. 2008. The virophage as a unique parasite of the giant mimivirus. Nature 455(7209): 100-104.
32. Black LW. 1995. DNA packaging and cutting by phage terminases: control in phage T4 by a synaptic mechanism. Bioessays 17(12): 1025-1030.
33. Gaia M, Benamar S, Boughalmi M, Pagnier I, Croce O, Colson P, Raoult D, La Scola B. 2014. Zamilon, a novel virophage with Mimiviridae host specificity. PLoS One 9(4): e94923.
34. Makhalanyane TP, Valverde A, Birkeland NK, Cary SC, Tuffin IM, Cowan DA. 2013. Evidence for successional development in Antarctic hypolithic bacterial communities. ISME J. 7(11): 2080-2090.
35. Iyer LM, Aravind L, Koonin EV. 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75(23): 11720-11734.
36. Van Etten JL, Graves MV, Müller DG, Boland W, Delaroque N. 2002. Phycodnaviridae–large DNA algal viruses. Arch. Virol. 147(8): 1479-1516.
37. Pearce DA, Newsham KK, Thorne MA, Calvo-Bado L, Krsek M, Laskaris P, Hodson A, Wellington EM. 2012. Metagenomic analysis of a southern maritime Antarctic soil. Front. Microbiol. 3.
38. Srinivasiah S, Bhavsar J, Thapar K, Liles M, Schoenfeld T, Wommack KE. 2008. Phages across the biosphere: contrasts of viruses in soil and aquatic environments. Res. Microbiol. 159(5): 349-357.
39. Fierer N, Breitbart M, Nulton J, Salamon P, Lozupone C, Jones R, Robeson M, Edwards RA, Felts B, Rayhawk S, Knight R, Rohwer F, Jackson RB. 2007. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, fungi, and viruses in soil. Appl. Environ. Microbiol. 73(21): 7059-7066.
40. Breitbart M, Rohwer F. 2005. Here a virus, there a virus, everywhere the same virus? Trends Microbiol. 13(6): 278-284.
41. Hurst CJ. 2011. Studies in viral ecology, vol. 1. New Jersey, USA: Wiley-Blackwell.
42. Wang G, Asakawa S, Kimura M. 2011. Spatial and temporal changes of cyanophage communities in paddy field soils as revealed by the capsid assembly protein gene g20. FEMS Microbiol. Ecol. 76(2): 352-359.
43. Claverie JM, Grzela R, Lartigue A, Bernadac A, Nitsche S, Vacelet J, Abergel C. 2009. Mimivirus and Mimiviridae: giant viruses with an increasing number of potential hosts, including corals and sponges. J. Invertebr. Pathol. 101(3): 172-180.
44. Dunigan DD, Fitzgerald LA, Van Etten JL. 2006. Phycodnaviruses: a peek at genetic diversity. Virus Res. 117(1): 119-132.
45. La Scola B, Campocasso A, N’Dong R, Fournous G, Barrassi L, Flaudrops C, Raoult D. 2010. Tentative characterization of new environmental giant viruses by MALDI-TOF mass spectrometry. Intervirology 53(5): 344-353.
46. Convey P, Gibson JA, Hillenbrand CD, Hodgson DA, Pugh PJ, Smellie JL, Stevens MI. 2008. Antarctic terrestrial life–challenging the history of the frozen continent? Biol. Rev. Camb. Philos. Soc. 83(2): 103-117.
47. Treonis AM, Wall DH, Virginia RA. 1999. Invertebrate biodiversity in Antarctic Dry Valley soils and sediments. Ecosystems 2(6): 482-492.
48. Friedmann EI. 1993. Antarctic Microbiology. Wiley-Liss, New York, p 634.
49. Smith JJ, Tow LA, Stafford W, Cary C, Cowan DA. 2006. Bacterial diversity in three different Antarctic cold desert mineral soils. Microb. Ecol. 51(4): 413-421.

FIGURES

Figure 1: Taxonomic affiliation of the predicted ORFs in both habitat libraries, assigned by MG-RAST.

Figure 2: Selected cyanophage sub-tree phylogenies from G20 (A) and PhoH (B) marker genes based on
protein alignments retrieved from the MetaVir 2.0 analysis server. Metagenomic read selection and tree
construction methods are outlined in Roux et al. (25).

Figure 3: Functional assignment of predicted ORFs compared in both soil habitats (functional annotation
performed by MG-RAST using a 60% similarity cut-off).
[Bar chart, "Functional Abundance". x-axis: normalized abundance ratio (0 to 1.2). y-axis: functional
categories. Series: open soil and hypolith. Categories shown: Virulence, Disease and Defense; Sulfur
Metabolism; Stress Response; Secondary Metabolism; Respiration; Regulation and Cell signaling; RNA
Metabolism; Protein Metabolism; Potassium metabolism; Photosynthesis; Phosphorus Metabolism; Phages,
Prophages, Transposable elements, Plasmids; Nucleosides and Nucleotides; Nitrogen Metabolism; Motility
and Chemotaxis; Metabolism of Aromatic Compounds; Membrane Transport; Iron acquisition and metabolism;
Fatty Acids, Lipids, and Isoprenoids; Dormancy and Sporulation; DNA Metabolism; Cofactors, Vitamins,
Prosthetic Groups, Pigments; Clustering-based subsystems; Cell Wall and Capsule; Cell Division and Cell
Cycle; Carbohydrates; Amino Acids and Derivatives.]

Figure 4: Predicted genome organization of phage AntarOS_17, assembled from open soil reads. Arrows represent
open reading frames (ORFs) and their orientation. Purple indicates ORFs without taxonomic affiliations or ORFs
that were predicted without a known function (ORFan). Red indicates ORFs that were more closely related to
bacteriophage genes and green indicates ORFs more closely related to Tetraselmis viridis virus S1, a phycodnavirus.
Numbers below the continuous black line (whole contig length) represent the nucleotide position at a given point.
Numbers on top of each ORF arrow denote gene numbers. HP denotes a hypothetical protein.

Figure 5: Relative abundance of identified phage isolate sequences based on predicted ORFs identified by MetaVir
(e-value cut-off: 10⁻⁵) against the RefSeq database. Phage isolates were clustered and counted per their bacterial
hosts in their respective habitats (hypolith: dark grey; open soil: light grey). Black lines below a subset of
phage-infected bacterial species indicate the bacterial host phylum.

TABLES

Table 1: NGS metadata, including assembly, annotation, and diversity statistics produced by CLC
Genomics and MG-RAST.

Metric                                            Open soil library    Hypolith library
Pre-QC nbr. of reads                              1,622,598            3,771,948
Post-QC nbr. of reads                             1,597,524            3,729,606
Average read length (post-QC)                     236.95               241.25
Mean G+C %                                        52±12                47±8
Nbr. (%) of reads not assembled into contigs      111,385 (6.97)       274,272 (7.35)
Nbr. of contigs generated                         22,237               53,695
Minimum contig length (nt)                        200                  200
Maximum contig length (nt)                        177,571              50,044
N25/N50/N75 (nt)                                  865/446/267          945/553/354
% unknown proteins                                58.5                 81.3
% annotated proteins                              39.2                 14.5
α-diversity (total, not limited to viruses)       344.508              1058.398
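
For readers unfamiliar with the N25/N50/N75 entry in Table 1, the sketch below shows one common definition of such statistics: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembled bases. The function name `nx` and the toy contig lengths are hypothetical. Whether CLC Genomics uses exactly this definition is assumed here, not confirmed.

```python
def nx(contig_lengths, x):
    """Length L such that contigs of length >= L hold at least x% of all bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    threshold = sum(lengths) * x / 100
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length


toy = [1000, 800, 600, 400, 200]  # hypothetical contig lengths (nt)
print(nx(toy, 25), nx(toy, 50), nx(toy, 75))
```

Under this definition N25 is never smaller than N50, and N50 is never smaller than N75, which matches the ordering of the values reported in the table.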

Table 2: Relative abundance of the most represented phyla in both biotopes identified by MG-RAST.
OS: open soil; HY: hypolith.

Top phyla                  Relative abundance      % of total abundance
                           OS        HY            OS        HY
Proteobacteria             9493      3492          60.6      30.4
Firmicutes                 2722      2647          17.4      23
Viruses                    302       1469          1.9       12.8
Actinobacteria             521       1039          3.3       9
Bacteroidetes              1133      876           7.2       7.6
Cyanobacteria              173       443           1.1       3.9
Chloroflexi                60        121           0.4       1.1
Verrucomicrobia            165       110           1.1       1
Chordata                   91        109           0.6       0.9
Unclassified Eukaryotes    85        106           0.5       0.9
Planctomycetes             103       87            0.7       0.8

Table 3: Taxonomic abundance of viral ORFs (BLASTp with an e-value threshold of 10⁻⁵)
identified by MetaVir in both Antarctic biotopes. Values are % of hits, with the number of sequence hits in parentheses.

Virus family              Host                                 Hypolith (%/sequence hits)   Open soil (%/sequence hits)
Order Caudovirales
  Myoviridae              Bacteria, Archaea                    20.9 (1305)                  26.1 (415)
  Podoviridae             Bacteria                             9.04 (565)                   11.0 (175)
  Siphoviridae            Bacteria, Archaea                    52.6 (3287)                  38.3 (610)
Order Herpesvirales
  Herpesviridae           Vertebrates                          0.11 (7)                     0.13 (2)
Families not assigned into an order¹
  Adenoviridae            Vertebrates                          0.02 (1)                     0.00 (0)
  Ascoviridae             Invertebrates                        0.03 (2)                     0.37 (6)
  Asfarviridae            Swine, arthropod borne               0.03 (2)                     0.06 (1)
  Baculoviridae           Invertebrates                        0.08 (5)                     0.25 (4)
  Bicaudaviridae          Archaea                              0.05 (3)                     0.00 (0)
  Hytrosaviridae          Diptera (flies)                      0.02 (1)                     0.00 (0)
  Inoviridae              Bacteria                             0.16 (10)                    0.06 (1)
  Iridoviridae            Amphibians, fishes, invertebrates    0.11 (7)                     0.44 (7)
  Microviridae            Bacteria                             0.30 (19)                    0.50 (8)
  Mimiviridae             Amoeba                               0.88 (55)                    2.32 (37)
  Phycodnaviridae         Algae                                1.74 (109)                   4.33 (69)
  Polydnaviridae          Parasitoid wasps                     0.02 (1)                     0.00 (0)
  Poxviridae              Humans, arthropods, vertebrates      0.06 (4)                     0.56 (9)
  Retroviridae            Vertebrates                          0.02 (1)                     0.00 (0)
  Rudiviridae             Thermophilic Archaea                 0.03 (2)                     0.00 (0)
Viruses not assigned into families¹
  Unclassified viruses    N/A                                  13.5 (844)                   15.1 (241)
  Sputnik virophage       Mimivirus-infected amoeba            0.00                         0.37 (6)

¹ Virus families not yet assigned into an order as per the ICTV 2012 release.
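
The percentages in Table 3 relate to the hit counts in a simple way, which the following Python sketch illustrates for the open soil tailed phage families. It is an illustration only: the function name `relative_abundance` is hypothetical, and the denominator is assumed to be the sum of the open soil hits listed in the table (1,591). The totals used by MetaVir may include hits not listed, so recomputed values can differ from the table in the last digit.

```python
def relative_abundance(hits):
    """Convert a {taxon: hit count} mapping into percentages of the total."""
    total = sum(hits.values())
    if total == 0:
        raise ValueError("no hits to normalise")
    return {taxon: 100 * count / total for taxon, count in hits.items()}


# Open soil hit counts for the tailed phage families, copied from Table 3.
# Assumption: every remaining row is pooled into "Other"
# (1591 listed hits in total, minus the 1200 Caudovirales hits).
open_soil = {"Myoviridae": 415, "Podoviridae": 175, "Siphoviridae": 610, "Other": 391}
for family, pct in relative_abundance(open_soil).items():
    print(f"{family}: {pct:.1f}%")
```

With these inputs the three tailed phage families come out within 0.1 percentage points of the tabulated open soil values.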