Accepted Article

Received Date : 30-Jan-2014 Revised Date : 24-Apr-2014 Accepted Date : 07-May-2014 Article type : Resource Article

DNA barcoding of Murinae (Rodentia: Muridae) and Arvicolinae (Rodentia: Cricetidae) distributed in China

Jing Li1, Xin Zheng1, Yansen Cai2,3, Xiuyue Zhang1, Min Yang2, Bisong Yue1*, Jing Li2*

1 Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610065, China.

2 Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, 610064, China.

3 Department of Medical Biology and Genetics, Luzhou Medical College, Luzhou, 646000, China.

Corresponding author: Jing Li*, Bisong Yue* Corresponding address: College of Life Sciences, Sichuan University, South section I, Yihuan Road, Chengdu 610064, Sichuan Province, P. R. China Corresponding author’s Email and Fax: Email: [email protected], [email protected]; Fax: +86 28 85414886; Tel: +86 28 13808067169,+86 28 13688135316

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/1755-0998.12279 This article is protected by copyright. All rights reserved.

Keywords: DNA barcoding; COI; Murinae; Arvicolinae; species identification

Running title: DNA barcoding of Murinae and Arvicolinae

Abstract

Identification of rodents is very difficult, mainly due to high morphological similarity and controversial taxonomy. In this study, mitochondrial cytochrome oxidase subunit I (COI) was used as a DNA barcode to identify the Murinae and Arvicolinae species distributed in China, and to facilitate systematic studies of Rodentia. In total, 242 sequences (31 species, 11 genera) from Murinae and 130 sequences (23 species, 6 genera) from Arvicolinae were investigated, of which 90 were newly sequenced in this study. Genetic distance, threshold, tree-based, online BLAST and BLOG methods were employed to analyze the datasets. There was no obvious barcode gap. The average K2P distance within species and genera was 2.10% and 12.61% in Murinae, and 2.86% and 11.80% in Arvicolinae, respectively. The optimal threshold was 5.62% for Murinae and 3.34% for Arvicolinae. All phylogenetic trees exhibited similar topology and could distinguish 90.32% of surveyed species in Murinae and 82.60% in Arvicolinae with high support values. BLAST analyses yielded similar results, with identification success rates of 92.15% and 93.85% for Murinae and Arvicolinae, respectively. BLOG successfully authenticated 100% of detected species except Leopoldamys edwardsi based on the latest taxonomic revision. Our results support the species status of the recently recognized Micromys erythrotis, Eothenomys tarquinius, and E. hintoni, and confirm the important roles of comprehensive taxonomy and accurate morphological identification in DNA barcoding studies. We believe that, when proper analytic methods are applied or combined, DNA barcoding could serve as an accurate and effective species identification approach for Murinae and Arvicolinae based on a proper taxonomic framework.

Introduction

Rodentia is one of the most diverse and ubiquitous mammalian orders, comprising approximately 42% of mammalian species (Musser & Carleton 2005). Rodents are typically characterized by large population sizes, play important roles in food chains, ecological equilibrium, and seed dispersal, and have major impacts on agriculture and forest production, as well as on disease transmission (Jędrzejewski et al. 1995; Mills & Childs 1998; DeMattia et al. 2004; Valone & Schutzenhofer 2007; Jones et al. 2008; Meerburg et al. 2009; Jacob et al. 2014). Compared with other mammalian taxa, Rodentia remains poorly understood in many research fields. To facilitate further study of rodents, accurate species identification is critical. However, identifying rodent species is a most challenging task, especially for species in the subfamilies Murinae and Arvicolinae.

As a center of rodent biodiversity, China hosts nearly 200 species of rodents, including most representatives of Murinae (Rodentia: Muridae) and many species of Arvicolinae (Rodentia: Cricetidae) (Wang et al. 2003; Musser & Carleton 2005). A large proportion of these rodents correspond to recently diverged species or to sibling species (Conroy & Cook 1999; Jaarola et al. 2004; Jing et al. 2007; Fink et al. 2010; Rowe et al. 2011; Nicolas et al. 2012), some of which are endemic to China. Due to rapid radiation and high intraspecific and interspecific biodiversity, the identification of Murinae and Arvicolinae has been quite difficult (Denys et al. 2003; Robins et al. 2007; Rowe et al. 2011) and the morphological phylogenetics of many taxa remains controversial (Achmadi et al. 2013). On the one hand, there is high similarity in morphology between different species. For example, Niviventer confucianus and N. fulvescens (Lunde et al. 2009), and Leopoldamys edwardsi and L. herberti (Balakirev et al. 2013) have been misidentified even by experts (Pagès et al. 2010). On the other hand, morphological divergence within many species is unusually wide (Chaval et al. 2010; Pagès et al. 2010; Lu et al. 2012). In addition, a shortage of taxonomists, complex identification criteria, and time-consuming processes have sometimes made morphological methods impractical for accurate species authentication of rodents. Therefore, it is important to develop an accurate and effective molecular identification approach for Murinae and Arvicolinae to assist morphological identification.

As a powerful tool for rapid species assignment and cryptic species discovery in various taxa (Hebert et al. 2004; Hajibabaei et al. 2006; Hubert et al. 2008; Reid et al. 2011; Nagy et al. 2012; Lees et al. 2014), DNA barcoding can make this task easier. It is a molecular technique that analyzes genetic divergence across short molecular markers, such as a fragment of the mitochondrial cytochrome oxidase subunit I (COI) gene in animals (Hebert et al. 2003a,b), to facilitate the study of systematics, biological diversity, biological conservation, biogeography, and ecology. However, only a few studies have been conducted on DNA barcoding of the Rodentia (Borisenko et al. 2007; Nicolas et al. 2012; Lu et al. 2012; Müller et al. 2013). To our knowledge, this study is the first large-scale and systematic report on DNA barcoding of the Murinae and Arvicolinae species distributed in China.

According to previous reports, there are 49 species of Murinae and 53 species of Arvicolinae in China (Musser & Carleton 2005; Abramov et al. 2009; Liu et al. 2012). We included 31 Murinae species and 23 Arvicolinae species in this study, many of which are also distributed in Southeast Asia. We used DNA barcodes based on the COI gene to identify species in Murinae and Arvicolinae, in order to provide an accurate and efficient molecular method for species identification, and to facilitate systematic studies of rodents. Multiple DNA barcoding analytic approaches were employed to gain more accurate and reliable evidence.

Materials and methods

Sample collection

Ninety-three individuals of Rodentia were live-trapped in the field at different localities in Sichuan Province, China, from January 2008 to December 2011. After being euthanized with CO2, all specimens were identified based on external morphology (color of pelage, head and body length, tail length, hind foot length, and ear length) and dental features by taxonomic experts from the Nature Museum in the College of Life Sciences at Sichuan University. These specimens represented 15 putative species in Murinae and Arvicolinae (Musser & Carleton 2005). Muscle or liver tissue was collected and preserved in 95% ethanol at -20 °C for molecular studies. All samples were obtained in accordance with Chinese regulations for the implementation of protection of terrestrial wild animals (State Council Decree [1992] No. 13). All field studies and lab work followed the Guidelines for Care and Use of Laboratory Animals and were approved by the Ethics Committee of Sichuan University (Chengdu, China).

DNA extraction, amplification and sequencing

Genomic DNA was extracted using a standard proteinase K/phenol method with minor modifications (Sambrook & Russell 2001). DNA quality was verified by 1.0% agarose gel electrophoresis. The COI gene was amplified using the primer pair VF1 5'-TTC TCA ACC AAC CAC AAA GAC ATT GG-3' and VR1 5'-TAG ACT TCT GGG TGG CCA AAG AAT CA-3', or the degenerate primer pair VF1d 5'-TTC TCA ACC AAC CAC AAR GAY ATY GG-3' and VR1d 5'-TAG ACT TCT GGG TGG CCR AAR AAY CA-3' (Ivanova et al. 2006), yielding fragments of approximately 650 bp. Each PCR mixture (25 μL) contained 10 ng template DNA, 2 μL 10× PCR buffer (MgCl2 free, TaKaRa), 0.2 μM each primer, 1.5 mM MgCl2, 0.2 mM each dNTP and 1 U Taq polymerase (TaKaRa). Amplifications were performed on an S1000 thermal cycler (Bio-Rad) using an initial denaturation at 94 °C for 5 min, followed by 34 cycles of 94 °C for 45 s, 51 °C for 45 s and 72 °C for 45 s, and a final extension at 72 °C for 8 min. Amplicons were purified using a DNA gel extraction kit (OMEGA) and bidirectionally sequenced by Invitrogen Trading Guangzhou Co. Ltd. The contigs were manually inspected with CExpress 9.1. Sequences were assembled and translated into amino acid sequences in MEGA 5.2 (Tamura et al. 2011) to verify that they were free of stop codons and gaps. All sequences were submitted to GenBank (accession numbers KF999083-KF999172), and sequences, specimens and collection details were submitted to BOLD (accession numbers RBOLD001-14 - RBOLD090-14) (http://www.barcodinglife.org) (Ratnasingham & Hebert 2007) under the project "Barcoding of Murinae and Arvicolinae in southwestern China".
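The stop-codon and gap check described above can be illustrated with a minimal Python sketch. It is not the MEGA procedure used in the study. The function name `has_stop_or_gap` and the `frame` parameter are hypothetical, and the sketch assumes that COI is read with the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA and AGG act as stop codons while TGA encodes tryptophan.

```python
# Stop codons under the vertebrate mitochondrial genetic code
# (assumption: this is the appropriate code for rodent COI).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def has_stop_or_gap(seq, frame=0):
    """Return True if the sequence contains an alignment gap ('-') or an
    in-frame stop codon when read from offset `frame` (0, 1 or 2).

    `frame` is a hypothetical parameter; the reading frame of a barcode
    fragment must be established separately.
    """
    seq = seq.upper()
    if "-" in seq:
        return True
    coding = seq[frame:]
    # Step through complete codons only; a trailing partial codon is ignored.
    return any(
        coding[i:i + 3] in VERT_MITO_STOPS
        for i in range(0, len(coding) - 2, 3)
    )
```

A sequence flagged by such a check would need closer inspection as a possible pseudogene or sequencing error, which is the purpose of the translation step in the text.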

In addition to the original data, we downloaded COI sequences of Murinae and Arvicolinae from GenBank and included them in this study.

Data analysis

Five DNA barcoding analytic approaches were employed to analyze the sequences after pre-processing. These methods are described below.

The genetic distance method was performed first. K2P (Kimura 1980) pairwise distances were calculated using MEGA 5.2 (Tamura et al. 2011). Nucleotide gaps were treated as pairwise deletions, and substitutions included both transitions and transversions. Genetic distances within species, genera and subfamilies were calculated, and frequency distribution histograms of all conspecific and heterospecific pairwise distances were constructed to look for barcoding gaps (Meyer & Paulay 2005). This method, along with a tree-based method, has been recommended by BOLD (Ratnasingham & Hebert 2007).

We applied a threshold-based method to investigate species boundaries, following the approach of Nicolas et al. (2012). The optimal threshold "t" is used to describe the species boundary. When an intraspecific distance was less than "t" or an interspecific distance was greater than "t", the specimens were considered correctly identified. Otherwise, the comparisons were classified as "false positive" and "false negative", respectively, as described by Nicolas et al. (2012). Thus, a scenario in which two specimens belonging to the same species are classified as two different species was treated as a "false positive", while a scenario in which two specimens from different species are classified within the same species was treated as a "false negative". In order to determine an optimal threshold, the cumulative number of "false positives" and "false negatives" was plotted for each of the two subfamilies by varying the threshold from 0 to the maximum interspecific distance, increasing "t" in steps of 0.05%. The optimal threshold was obtained by minimizing the sum of errors.

The tree-based method was used based on the assumption that samples of distinct species would form discrete clusters in a DNA barcode tree. Neighbor joining (NJ) trees (Saitou & Nei 1987), maximum parsimony (MP) trees (Edwards & Sforza 1963), Bayesian (BI) trees (Huelsenbeck & Ronquist 2001), and maximum likelihood (ML) trees (Guindon & Gascuel 2003) were constructed to gauge the robustness of our results. We mainly focused on whether individuals from the same species clustered together, rather than on the evolutionary relationships between species. Tscherskia triton and Meriones meridianus were selected as outgroups for Murinae and Arvicolinae, respectively. NJ trees based on K2P distances were constructed in MEGA 5.2 (Tamura et al. 2011) using 1,000 bootstrap replicates to assess branch support. Gaps and missing data were treated as pairwise deletions, substitutions included transitions and transversions, and a uniform substitution rate was used. MP trees were obtained using PAUP*4b10 (Swofford 2003) with a heuristic search using 1,000 random sequence addition replicates. Node support was assessed using 1,000 bootstrap replicates. BI analyses were performed with MrBayes v 3.1.2 (Ronquist & Huelsenbeck 2003). The TrN+I+G model was selected for both Murinae and Arvicolinae by Modeltest 3.7 (Posada & Crandall 1998) based on the AIC criterion. Four independent MCMC chains were run simultaneously for 3,000,000 generations, sampling one tree every 100 generations. The first 7,500 trees were discarded as burn-in. Bayesian posterior probabilities were used to assess the robustness of trees. ML trees were generated by online PhyML 3.0 (http://www.atgc-montpellier.fr/phyml/) (Guindon et al. 2010) with the following main parameters: substitution model (HKY85), transition/transversion ratio (estimated), number of substitution rate categories (6), starting tree (BIONJ), type of tree improvement (SPR & NNI), number of random starting trees (5). The proportion of invariable sites and the Gamma distribution shape parameter were set according to the best model obtained. Robustness of ML trees was assessed by 100 bootstrap replicates.

Online nucleotide BLAST searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi) were carried out as a similarity-based method (Altschul et al. 1997). The parameter "program selection" was set to "highly similar sequences". By querying every data sequence against the GenBank nucleotide database, the most similar sequence (called the "best hit") with the highest score was obtained based on pairwise alignments. The query sequence was assigned to the species associated with its best hit(s), whose coverage had to be greater than 90% and identity greater than 95%. If there was no such best hit, or if the best hits contained more than one species, the assignment of the query sequence could not be determined. Because the dataset included GenBank sequences, best hits excluded the query sequence itself.

Lastly, BLOG (Barcoding with LOGic) (Bertolazzi et al. 2009) was employed as a character-based identification method with the program BLOG 2.0 (Weitschek et al. 2013). A training set of sequences with species assignments and a testing set of sequences in FASTA format were input, and the program then created classification logic formulas for the training set and applied them to the testing set (Weitschek et al. 2013). The training file consisted of approximately 65% of the data sequences, with the remainder forming the test file. Input parameters for feature selection were as follows: a maximum of 30 features chosen ("BETA=30"), a maximum of 500 iterations ("GRASPITER = 500"), and a maximum analysis time of 5 minutes ("GRASPSECS=300"). The logic formula having the lowest false positive rate on the reference data set was taken as the identification. The outputs contained logic formulas and classification statistics. For species that could not be successfully classified, a second analysis was conducted by consulting the latest taxonomic revision of Eothenomys and Micromys, as well as the results of genetic distance and phylogenetic trees, in order to explore proper classification.

For the species that were difficult to classify in molecular analyses, such as species in the genera Eothenomys and Micromys, specimens were re-examined by experts including Dr. Shaoying Liu from the Sichuan Academy of Forestry, who followed the protocol described by Liu et al. (2007) and checked the specimens against voucher specimens at the Nature Museum of the Sichuan Academy of Forestry. We also carefully re-examined GenBank sequences by reviewing their sources and relevant references to confirm the results.

Results

Amplification and sequencing

A total of 90 individuals were successfully amplified and sequenced, representing 12 putative species in five genera from Murinae and three species in two genera from Arvicolinae (Table 1). Three sequences were not obtained due to either low DNA quality or amplification failure. No stop codons or indels were found, suggesting that no pseudogenes were present among those sequences. Additionally, COI sequences of Murinae and Arvicolinae were downloaded from GenBank (as available in July 2013); these were mainly sampled from China, Southeast Asia and Russia. In total, the Murinae dataset comprised 242 sequences belonging to 31 species in 11 genera, with 79 sequences newly determined and 163 from GenBank (Table S1a); the Arvicolinae dataset contained 130 sequences belonging to 23 species in 6 genera, with 11 sequences newly determined and 119 from GenBank (Table S1b).

The aligned sequences were trimmed to 645 bp for Murinae and 648 bp for Arvicolinae, corresponding to positions 58 to 702 (Murinae) or 58 to 705 (Arvicolinae) at the 5' end of the Mus musculus COI gene. The Murinae alignment consisted of 392 conserved sites and 253 variable sites, and the Arvicolinae alignment consisted of 410 conserved sites and 238 variable sites. The mean nucleotide frequencies of the two datasets were A: 0.288, C: 0.250, G: 0.165, and T: 0.297 for Murinae and A: 0.273, C: 0.280, G: 0.173, and T: 0.274 for Arvicolinae.

Sample size per species ranged from 1 to 20, with an average of 7 individuals, and approximately 89% of the species were represented by two or more specimens. Singletons, such as Rattus losea and Myodes regulus, were not discarded, since in most cases a single representative of a species can still provide a benchmark for that species and increase the reliability of the overall DNA barcoding analysis (Rach et al. 2008; Lim et al. 2011).

Genetic distance

The average level of genetic divergence in the COI gene is summarized in Table 2. The mean K2P pairwise distances within species, genera and subfamilies were 2.10%, 12.61% and 18.08%, respectively, for Murinae, and 2.86%, 11.80% and 16.74% for Arvicolinae. Genetic divergence increased with taxonomic level, as expected. Mean congeneric species distance was approximately six and four times the mean conspecific distance in Murinae and Arvicolinae, respectively. The divergence among confamilial taxa was slightly higher than that among congenerics in both data sets. Frequency distribution histograms of all K2P pairwise distances are shown in Fig. 1. Although intraspecific distances were generally lower than interspecific distances, there was no barcoding gap in Murinae or Arvicolinae. Unexpectedly high conspecific distances and low heterospecific distances are shown in Table 3. Very high conspecific distances were found in R. tanezumi, R. andamanensis, Micromys minutus, E. chinensis, E. custos and Caryomys eva. Very low interspecific distances were found for two species pairs: R. tanezumi-R. andamanensis and Niviventer langbianis-N. tenaster.
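For readers unfamiliar with the K2P model, the distances reported above follow Kimura's (1980) two-parameter formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below is only an illustration under the pairwise-deletion setting described in the Methods. The function name `k2p_distance` is hypothetical and the code is not the MEGA implementation.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion). Returns the distance as a proportion, not a percent.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguity codes
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n
    q = transversions / n
    # math.log raises ValueError if the sequences are too divergent
    # for the model (non-positive argument).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Multiplying the result by 100 gives percentages comparable in form to those in Table 2; mean intraspecific or interspecific values are then averages over the relevant sequence pairs.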

Optimal thresholds

The distribution of false positives and false negatives is shown in Fig. 2. The optimal threshold for Murinae ranged from 5.57% to 5.68%, with a minimum error count of 135 (10.46%) false positives plus 97 (0.35%) false negatives (Fig. 2a). False positives involved individuals of the following species, with the mean intraspecific distance of these individuals indicated in parentheses: N. confucianus (6.24%), R. tanezumi (10.21%), R. andamanensis (9.14%), Leopoldamys edwardsi (9.17%), Maxomys surifer (5.85%) and M. minutus (10.93%). Most false negatives occurred in sibling species, involving some individuals of the following species pairs, with the mean interspecific distance of these individuals in parentheses: N. andersoni-N. excelsior (5.54%), N. langbianis-N. tenaster (2.07%), R. nitidus-R. norvegicus (5.47%), R. tanezumi-R. andamanensis (1.39%) and R. tanezumi-R. rattus (3.55%).

The optimal threshold for Arvicolinae was 3.34%. The minimum number of total errors was 123 (23.34%) false positives and 15 (0.19%) false negatives (Fig. 2b). The distribution of cumulative errors in Arvicolinae was similar to that in Murinae. False positives involved individuals of E. chinensis (9.87%), E. custos (10.30%), C. eva (7.03%), E. eleusis (4.49%), E. melanogaster (3.50%), and Microtus arvalis (3.51%). False negatives included E. eleusis-E. cachinus (2.41%), E. miletus-E. cachinus (1.62%), and Myodes rufocanus-M. regulus (3.33%).
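The threshold search behind these numbers can be sketched in a few lines of Python. This is a simplified illustration of the procedure described in the Methods, not the code used in the study. The function name `optimal_threshold` is hypothetical, distances are assumed to be supplied in percent, and distances exactly equal to the threshold are counted as errors, a boundary choice that the text does not specify.

```python
def optimal_threshold(intra, inter, step=0.05):
    """Sweep a distance threshold t and minimise false positives + negatives.

    intra: K2P distances (percent) between conspecific pairs
    inter: K2P distances (percent) between heterospecific pairs
    step:  increment of t in percent (0.05 as in the text)

    Returns (minimum error count, lowest t, highest t) over all grid values
    of t that reach the minimum. The two t values bound a single interval
    only if the minimising values are contiguous.
    """
    n_steps = int(round(max(inter) / step))
    best_err, best_ts = None, []
    for i in range(n_steps + 1):
        t = i * step
        false_pos = sum(d >= t for d in intra)  # conspecifics split apart
        false_neg = sum(d <= t for d in inter)  # distinct species lumped
        err = false_pos + false_neg
        if best_err is None or err < best_err:
            best_err, best_ts = err, [t]
        elif err == best_err:
            best_ts.append(t)
    return best_err, best_ts[0], best_ts[-1]
```

Reporting both ends of the minimising range mirrors the way the Murinae optimum is given above as an interval (5.57% to 5.68%) rather than a single value.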

Tree-based identification

Tree-based methods provided high resolution in species discrimination. All phylogenetic trees (NJ, MP, BI, and ML) exhibited similar topologies and differed only in the support values for some nodes. These trees showed that 97.52% of samples from Murinae and 88.46% from Arvicolinae formed cohesive groups with bootstrap values or posterior probabilities above 90%, and successfully classified most false positives and false negatives obtained from the threshold method, except for three species from Murinae and four species from Arvicolinae. Even species pairs with relatively low interspecific variation, such as N. andersoni-N. excelsior (5.67%) and R. tanezumi-R. rattus (5.22%), were well recovered in phylogenetic analyses. The resultant NJ trees are shown in Fig. 3.
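The criterion that conspecific individuals "form a cohesive group" can be made concrete as a monophyly test on a rooted tree. The Python sketch below is only an illustration: it represents a tree as nested tuples of leaf names, ignores branch support, and uses hypothetical names (`leaf_set`, `clades`, `monophyletic_species`) that do not come from any of the phylogenetic programs cited here.

```python
def leaf_set(node):
    """Return the set of leaf names below a node (a leaf is a string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """Yield the leaf set of every clade in the tree, including single leaves."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def monophyletic_species(tree, species_of):
    """Map each species to True if its specimens form exactly one clade.

    tree:       rooted tree as nested tuples, e.g. (("a1", "a2"), "b1")
    species_of: dict mapping leaf name -> species name
    Singletons are trivially monophyletic under this definition.
    """
    all_clades = set(clades(tree))
    members = {}
    for leaf, species in species_of.items():
        members.setdefault(species, set()).add(leaf)
    return {sp: frozenset(ls) in all_clades for sp, ls in members.items()}
```

In this toy representation, a species whose specimens fall into two separate clades, as described for some taxa in this section, would be reported as `False`. A fuller analysis would also require the supporting node to exceed a chosen bootstrap or posterior probability cutoff.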

One puzzling cluster in Murinae was the Rattus complex shown in Fig. 3a. Three R. tanezumi individuals clustered with two R. andamanensis individuals, while the majority of R. tanezumi and R. andamanensis formed distinct and cohesive clades. As a consequence, R. tanezumi individuals fell within two different clades: specimens from China (Sichuan), Japan and Indonesia formed a monospecific clade, while three other specimens from China (Guizhou and Yunnan) clustered with two R. andamanensis from Cambodia and Thailand. Correspondingly, R. andamanensis showed a similar division, with all samples from Vietnam forming a monospecific clade. The other puzzling cluster occurred in the Niviventer complex (Fig. 3a), where the only individual of N. tenaster fell in the cluster of N. langbianis with a 2.07% mean interspecific distance (ranging from 0.31% to 3.20%).

In Arvicolinae, both E. chinensis and E. custos resolved into paraphyletic groups. E. chinensis from Sichuan segregated into two different monospecific clades: E. chinensis I was sister to E. proditor, while E. chinensis II was sister to one clade of E. custos (I). Similarly, E. custos formed two clades, each of which clustered with heterospecific samples from the same location. Namely, E. custos I from Sichuan clustered with E. chinensis II, and E. custos II from Yunnan was sister to E. wardi from Yunnan, forming the basal clade of Eothenomys. Another puzzle occurred in the Eothenomys complex (Fig. 3b), where there was an unclear separation among three sympatric species: E. cachinus, E. eleusis and E. miletus.

M. minutus, L. edwardsi and C. eva had deep internal splits in their monophyletic clusters, with >90% support values and correspondingly large K2P distances between the internal clusters. K2P pairwise distances of the species with paraphyletic topology or deep internal splits are shown in Table 3c. In addition, the clades of N. confucianus (6.12%), N. fulvescens (5.14%), Apodemus chevrieri (4.88%), M. surifer (4.54%) and A. draco (3.13%) also contained clear but shallow subclusters with relatively large mean K2P genetic distances between those subclusters (as indicated in parentheses above). These subclusters corresponded to the sampling locations, suggesting differentiation of geographically separated populations (Fig. 4). Both phylogeny and genetic distance were in agreement with the differentiation of separated geographic populations.

BLAST results

Within the Murinae, 223 out of 242 similarity searches succeeded in identifying conspecific sequences, a 92.15% success rate, and 122 out of 130 searches within the Arvicolinae were correctly identified, a similar 93.85% success rate. Twenty-four species from Murinae and 18 species from Arvicolinae were completely identified. The unclassified species and samples are listed in Table 4. One scenario for these unclassified samples was "no best hit" due to low identity (< 95%). This was mainly caused by a lack of conspecific sequences in the GenBank database, as for M. erythrotis, R. pyctoris, and M. regulus. A second scenario was that the best hits contained two or more species, and a third scenario was that query sequences were assigned to a species different from the putative one. Of note, several analyzed individuals were found to share the same haplotype as samples from different species (Table 4, labeled with asterisks), which was likely caused by morphological misidentification or evolutionary events, such as incomplete lineage sorting or introgression.
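The assignment rule behind these success rates can be written as a small decision function. The Python sketch below is a hedged illustration rather than the workflow used in the study, which relied on the online BLAST interface. The function name `assign_species` and the dictionary keys (`accession`, `species`, `score`, `coverage`, `identity`) are hypothetical. It also assumes that the top-scoring hits are selected first and the coverage and identity cutoffs are applied afterwards, an order the text does not state explicitly.

```python
def assign_species(hits, query_accession=None,
                   min_coverage=90.0, min_identity=95.0):
    """Assign a query to a species from its BLAST hits, or return None.

    hits: list of dicts with the (hypothetical) keys 'accession', 'species',
          'score', 'coverage' (percent) and 'identity' (percent).
    query_accession: accession of the query itself, excluded from the hits
          when the query sequence is already in the database.
    """
    candidates = [h for h in hits if h["accession"] != query_accession]
    if not candidates:
        return None
    top_score = max(h["score"] for h in candidates)
    best_hits = [
        h for h in candidates
        if h["score"] == top_score
        and h["coverage"] > min_coverage
        and h["identity"] > min_identity
    ]
    species = {h["species"] for h in best_hits}
    # No qualifying best hit, or best hits spanning several species:
    # the assignment is undetermined.
    return species.pop() if len(species) == 1 else None
```

The two `None` outcomes correspond to the first two scenarios described above ("no best hit" and best hits containing two or more species). The third scenario, assignment to a species other than the putative one, would be detected by comparing the returned name with the morphological identification.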

Performance of BLOG

The correct classification rate of the testing data was 92.22% for Murinae and 66.67% for Arvicolinae, as assessed by BLOG analysis. Wrongly classified and unclassified elements included R. tanezumi, R. andamanensis and L. edwardsi from Murinae and C. eva, E. chinensis and E. custos from Arvicolinae. BLOG could not identify most paraphyletic species, and these problematic species were consistent with the results of distance-based methods and phylogenetic methods. The low correct classification rate within Arvicolinae was attributed to the large number of specimens in E. chinensis and E. custos. The logic formulas are listed in Table S2.

To explore the classification of these unclassified species, a second analysis was carried out. Taking the results of genetic distance, DNA barcode trees and BLAST into account, we referred to the latest taxonomic revision to explore whether they could be classified as taxonomic units by unique diagnostic formulas. The outputs showed that all specimens could be correctly classified based on the latest taxonomic revision of Eothenomys and Micromys, as well as the results of genetic distance and DNA barcode trees, and those clusters did have distinct classification formulas (Table 5), which were different from those of the remaining conspecific individuals. In the case of L. edwardsi, which formed a cluster of only one individual, we did no further analysis.

Discussion

Species identification of Murinae and Arvicolinae based on DNA barcoding

In this study, our results provide initial confirmation, based on a large-scale sample, that rodents of Murinae and Arvicolinae in China can be accurately and efficiently identified by DNA barcoding. Murinae and Arvicolinae were difficult to identify due to remarkable biodiversity and special evolutionary characteristics, such as rapid adaptive radiation and short divergence time (Conroy & Cook 1999; Steppan et al. 2004; 2005; Galewski et al. 2006; Robovský et al. 2008), which were also reflected in the wide distribution of interspecific distances, relatively large intraspecific distances, and large optimal thresholds (Murinae: ~5.62%, Arvicolinae: 3.34%) found in this study. This is especially common in Rattus, Niviventer, Microtus, and Eothenomys (Luo et al. 2004; Jing et al. 2007; Fink et al. 2010; Rowe et al. 2011). Despite these factors, our study indicated that DNA barcoding could correctly identify more than 90% of the samples with proper approaches, such as phylogenetic analyses, online nucleotide BLAST and BLOG. Most importantly, the closely related species that were prone to morphological misidentification, such as N. confucianus and N. fulvescens, as well as N. andersoni and N. confucianus, could be accurately identified by DNA barcoding. Compared with Arvicolinae, Murinae yielded better results, likely owing to more extensive sampling.

Further, DNA barcoding could even identify different geographic populations in this study (Fig. 4). When sampling positions were taken into consideration, geographic and genetic distances showed great congruence. Indeed, conspecific individuals from different locations often exhibited larger genetic distances and grouped into distinct clusters. Geographic population differentiation was thought to be the main reason for the relatively large intraspecific distances in some of the studied rodents (Table 3a). Additionally, we found a larger optimal threshold for Murinae from Southeast Asia (~5.62%) than that reported for a previously examined African tribe, which had an optimal threshold of 3.76% (Nicolas et al. 2012), indicating more pronounced population differentiation in Murinae from Southeast Asia than in the African tribe. This was in agreement with the viewpoint that Southeast Asia is a hotspot of not only interspecific but also intraspecific biodiversity (Pagès et al. 2011).

Taxonomic revision and DNA barcoding identification of rodents Although identification of Murinae and Arvicolinae was successful, it is worth noting that DNA barcoding could not improve much on the identification of taxa without proper taxonomic classification, even with multiple analytic methods. This has been widely recognized by previous DNA barcoding studies (Will et al. 2004; DeSalle et al. 2005; Goldstein et al. 2011) as well as our work. Thus, with the help of the latest taxonomic revision and sample reexamination, the confusion within E. chinensis, E. custos and M. minutus was resolved well. E. chinensis and E. custos displayed large intraspecific distances exceeding the mean intraspecific distance and the optimum threshold of Arvicolinae. Moreover, they were both resolved as paraphylies in trees, owning different diagnostic formulas from the other, which suggests they might contain cryptic species. After reexamining and confirming the identification of our specimens, we further referred to the source of GenBank sequences in Eothenomys, most of which were generated from Liu et al. (2012). In their study, the authors recognized the two clades of E. chinensis as distinct species, named E. chinensis and E. tarquinius, and the two clades of E. custos as E. custos and E. hintoni based on strong molecular and morphological evidence. GenBank sequences HM165289, HM165320, HM165322, and HM165324 labeled as E. chinensis (E. chinensis II in Fig. 3b) were actually E. tarquinius, and HM165282-HM165283, HM165332-HM165333, HM165335-HM165338

(E. custos I in Fig. 3b) were actually E. hintoni. Our results not only show that DNA barcoding can correctly classify these closely related species even when GenBank sequences are incorrectly labeled, but also reveal the importance of comprehensive taxonomy in DNA barcoding studies.

Morphological identification of sibling species in Rodentia is so difficult that even experts make mistakes. In our analysis, M. minutus formed two monophyletic clusters, which were respectively sampled from Sichuan and China. Both the mean K2P distance (10.93%) and the phylogenetic topology suggested that the Sichuan group likely represented a distinct taxon. We further investigated the taxonomic revision of Micromys by Yasuda et al. (2005) and Abramov et al. (2009), which suggested a new species in the genus Micromys, named M. erythrotis. After carefully re-examining our specimens using morphological methods and taking the geographic distribution into consideration, we found that the specimens initially identified as M. minutus were actually M. erythrotis, which was congruent with our DNA barcoding analyses. In a further BLOG analysis, the two taxa each obtained unique classification formulas.

With the latest taxonomic revision, the average intraspecific and interspecific genetic distances were 2.02% and 12.60% in Murinae, and 1.24% and 11.62% in Arvicolinae, respectively. Although the optimal thresholds changed little, the cumulative error in Arvicolinae decreased dramatically, to 27 false positives and 15 false negatives. This study confirmed the species status of E. tarquinius, E. hintoni, and M. erythrotis from the viewpoint of DNA barcoding. These results agree with the view of DeSalle et al. (2005) and Goldstein & DeSalle (2011) that thorough taxonomy and accurate morphological identification are fundamental for DNA barcoding studies, and that in return, DNA barcoding could

complement morphological identification and, together with other information, facilitate systematic studies.

Ambiguous identification in this study

After accounting for the misidentification of two R. tanezumi individuals (JQ793910 and JQ793911), which were shown to be R. andamanensis by Lu et al. (2012), we found that R. andamanensis was resolved in paraphyletic clades with typical congeneric divergence. Combined with the large mean intraspecific distance (3.84%) and the ambiguous identification in BLAST and BLOG, this indicates that R. andamanensis might contain cryptic species or ESUs. Our results reflect the poor taxonomy of Rattus. The paraphyly and high conspecific genetic divergence of R. tanezumi and R. andamanensis were to some extent expected, since many previous studies have shown that some species of Rattus are resolved as paraphyletic and that this genus is in need of thorough taxonomic study (Musser & Carleton 2005; Robins et al. 2007; Pagès et al. 2010). This complexity might also be due to evolutionary factors such as incomplete lineage sorting and introgression, which are common in Rattus (Rowe et al. 2011; Lack et al. 2012; Pagès et al. 2013). Resolving this ambiguous identification will require more sampling and an integrated study of morphology, karyology, systematics and ecology, as well as other molecular markers.

Another ambiguous identification in Murinae was L. edwardsi, which showed a large mean inter-cluster distance (9.17%) and deep internal splits. This was mainly due to one of the eight L. edwardsi individuals (JF444996), collected from China. This individual was identified as L. neilli with 99% identity in BLAST analyses. According to Balakirev et al.

(2013), L. neilli is a junior synonym of L. herberti, and L. herberti and L. edwardsi are difficult to distinguish on the basis of morphological characters. Therefore, the unexpected results observed in our analysis may have been caused by misidentification of individual JF444996. To test this inference, we downloaded all available COI sequences of L. neilli from GenBank and reconstructed phylogenetic trees with all Murinae sequences. JF444996 nested within L. neilli with a 96% support value. However, these results still cannot exclude the possibility of incomplete lineage sorting or introgression; resolving this would require further studies of morphology, biogeography and taxonomy.

The relatively large intraspecific distance (2.84%), the deep internal splits in the phylogenetic trees and the BLOG outputs suggest that there were subspecies in the studied C. eva. According to Thomas (1911), the two subspecies of C. eva, namely C. e. alcinous and C. e. eva, are distributed only in northern Sichuan and outside northern Sichuan, respectively. However, the two assemblages of C. eva in our DNA barcode trees were not consistent with this classification: one assemblage corresponded to northern Sichuan and the other to northern Sichuan and southern Gansu. Our results indicate that the classification and geographic coverage of the subspecies of C. eva need to be re-examined with more types of information and more samples.

Different performance of DNA barcoding analytic approaches

Species identification success rates varied across the different barcoding analytic approaches. When the revised taxonomy was adopted, the performance of all analytic methods improved, with BLOG and phylogenetic analyses outperforming the other methods. Even

though intraspecific and interspecific genetic distances overlapped in our study, the genetic distance method provided significant and fundamental genetic information for subsequent analyses. We strongly agree with Zhang & Hanner (2011) that genetic distance is an excellent shorthand for building a first species partition hypothesis. Similarly, it is meaningful to explore the optimal threshold. A threshold is a statistical expression of the general species boundary, so the optimal threshold allows the assignment of a specimen to be assessed quickly from its genetic distance.

The online BLASTn performed better than the local BLASTn used by Velzen et al. (2012). Their correct identification rates with local BLASTn were 86.37% for empirical data and 85.61% for simulated data of recently diverged species, whereas the rate was more than 93% in our study. This could be attributed to the comprehensive GenBank nucleotide database, in which most tested species are represented.

Classification failures in the first BLOG analysis emphasized the importance of an accurate taxonomic framework. When we adopted the novel taxonomic revision in the second analysis, BLOG successfully identified all testing sequences except R. andamanensis and L. edwardsi. However, we noticed that M. erythrotis was not distinguished from M. minutus in the first BLOG analysis, which suggests that BLOG was not as useful as expected in discovering cryptic species. Furthermore, the tree-based method was a powerful analytic method not only for species identification but also for discovering new taxa, thanks to its graphic outputs.
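The idea of an optimal threshold discussed in this section can be illustrated with a short Python sketch. It assumes a cumulative-error definition in the sense of Meyer & Paulay (2005): a false negative is a conspecific pair whose distance exceeds the threshold, a false positive is a heterospecific pair whose distance is at or below it, and the optimal threshold minimizes their sum. The function names, the candidate-threshold grid (zero plus every observed distance) and the toy distances are illustrative assumptions, not the procedure or data of this study.

```python
def cumulative_errors(intra, inter, threshold):
    """Return (false_positives, false_negatives) at a given threshold.

    intra: distances between conspecific pairs
    inter: distances between heterospecific pairs
    """
    false_negatives = sum(1 for d in intra if d > threshold)
    false_positives = sum(1 for d in inter if d <= threshold)
    return false_positives, false_negatives


def optimal_threshold(intra, inter):
    """Return (threshold, total_error, false_positives, false_negatives).

    Candidate thresholds are 0 and every observed distance; on ties the
    smallest candidate is kept.
    """
    candidates = sorted({0.0} | set(intra) | set(inter))
    best = None
    for t in candidates:
        fp, fn = cumulative_errors(intra, inter, t)
        total = fp + fn
        if best is None or total < best[1]:
            best = (t, total, fp, fn)
    return best


if __name__ == "__main__":
    # Toy K2P distances in percent (invented for illustration only).
    intra = [0.5, 1.0, 2.0, 6.0]
    inter = [3.0, 8.0, 10.0, 12.0]
    print(optimal_threshold(intra, inter))
```

When the intraspecific and interspecific distributions overlap, as they do in this study, no threshold brings the cumulative error to zero, which is why the threshold method is best combined with the tree-based and character-based approaches.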

Conclusions

Our study confirmed that DNA barcoding based on the COI gene can efficiently and accurately identify most Murinae and Arvicolinae distributed in China, and can thereby facilitate the systematic study of Rodentia. Based on the latest taxonomic revision, the tree-based method, online BLAST and BLOG successfully distinguished 90% to 100% of the investigated samples. Relatively high intraspecific divergence, large optimal thresholds and clear internal splits in the trees demonstrated significant population differentiation within species of the studied Murinae. With more comprehensive sampling and multiple DNA barcoding analytical methods, we confirmed the recently recognized species E. tarquinius, E. hintoni and M. erythrotis and suggested potential cryptic species or subspecies in R. andamanensis and C. eva. Furthermore, our study confirms the important role of thorough taxonomy and accurate morphological identification in DNA barcoding studies. Including closely related species and different conspecific populations is expected to provide more robust evidence for the utility of DNA barcoding in rodent species identification and the discovery of cryptic species.

Acknowledgements

This study was supported by the National Science and Technology Support Project of China (2012BAC01B06), and was partly funded by the Sichuan Youth Science and Technology Foundation (2011JQ0022) and the Project Sponsored by the Scientific Research Foundation

for the Returned Overseas Scholars, State Education Ministry (20111568-8-3). We are grateful to Dr. Cong Guo, Dr. Jianghong Ran, Dr. Feiyun Tu and Guo Cai for providing samples. We thank Shaoying Liu and other taxonomists of the Sichuan Academy of Forestry for their identification of samples, Quekun Peng for figure preparation, and Dr. Johann Bergholz and Prof. Timothy Moermond at Sichuan University and Dr. Devaughn Fraser at UCLA for English revision.

References

Abramov AV, Meschersky IG, Rozhnov VV (2009) On the taxonomic status of the harvest mouse Micromys minutus (Rodentia: Muridae) from Vietnam. Zootaxa, 2199, 58-68.
Achmadi AS, Esselstyn JA, Rowe KC et al. (2013) Phylogeny, diversity, and biogeography of Southeast Asian spiny rats (Maxomys). Journal of Mammalogy, 94, 1412-1423.
Altschul SF, Madden TL, Schäffer AA et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389-3402.
Balakirev AE, Abramov AV, Rozhnov VV (2013) Revision of the genus Leopoldamys (Rodentia, Muridae) as inferred from morphological and molecular data, with a special emphasis on the species composition in continental Indochina. Zootaxa, 3640, 521-549.
Bertolazzi P, Felici G, Weitschek E (2009) Learning to classify species with barcodes. BMC Bioinformatics, 10, S7.
Borisenko AV, Lim BK, Ivanova NV et al. (2008) DNA barcoding in surveys of small mammal communities: a field study in Suriname. Molecular Ecology Resources, 8, 471-479.
Chaval Y, Dobigny G, Michaux J et al. (2010) A multi-approach survey as the most reliable tool to accurately assess biodiversity: an example of Thai murine rodents. Kasetsart Journal-Natural Science, 44, 590-603.
Conroy CJ, Cook JA (1999) MtDNA evidence for repeated pulses of speciation within arvicoline and murid rodents. Journal of Mammalian Evolution, 6, 221-245.
DeMattia EA, Curran LM, Rathcke BJ (2004) Effects of small rodents and large mammals on Neotropical seeds. Ecology, 85, 2161-2170.
Denys CE, Lecompte E, Granjon L et al. (2003) Integrative systematics: the importance of combining techniques for increasing knowledge of African Murinae. In Rats, Mice and People: Rodent Biology and Management (eds Singleton GR, Hinds LA, Krebs CJ, Spratt DM). pp. 531-535. Australian Centre for International Agricultural Research Monograph No. 96, Canberra, Australia.
DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species delimitation and DNA barcoding. Philosophical Transactions of the Royal Society B: Biological Sciences, 360, 1905-1916.
Edwards AWF, Sforza CLL (1963) The reconstruction of evolution. Heredity, 18.
Fink S, Fischer MC, Excoffier L et al. (2010) Genomic scans support repetitive continental colonization events during the rapid radiation of voles (Rodentia: Microtus): the utility of AFLPs versus mitochondrial and nuclear sequence markers. Systematic Biology, 59, 548-572.
Galewski T, Tilak M, Sanchez S et al. (2006) The evolutionary radiation of Arvicolinae rodents (voles and lemmings): relative contribution of nuclear and mitochondrial DNA phylogenies. BMC Evolutionary Biology, 6, 80.
Goldstein PZ, DeSalle R (2011) Integrating DNA barcode data and taxonomic practice: determination, discovery, and description. Bioessays, 33, 135-147.
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696-704.

Guindon S, Dufayard JF, Lefort V et al. (2010) New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 59, 307-321.
Hajibabaei M, Janzen DH, Burns JM et al. (2006) DNA barcodes distinguish species of tropical Lepidoptera. Proceedings of the National Academy of Sciences, 103, 968-971.
Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003a) Biological identifications through DNA barcodes. Proceedings of the Royal Society B: Biological Sciences, 270, 313-321.
Hebert PDN, Ratnasingham S, deWaard JR (2003b) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proceedings of the Royal Society B: Biological Sciences, 270, S96-S99.
Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds through DNA Barcodes. PLoS Biology, 2, e312.
Hubert N, Hanner R, Holm E et al. (2008) Identifying Canadian freshwater fishes through DNA barcodes. PLoS ONE, 3, e2490.
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17, 754-755.
Ivanova NV, DeWaard JR, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. Molecular Ecology Notes, 6, 998-1002.
Jaarola M, Martínková N, Gündüz İ et al. (2004) Molecular phylogeny of the speciose vole genus Microtus (Arvicolinae, Rodentia) inferred from mitochondrial DNA sequences. Molecular Phylogenetics and Evolution, 33, 647-663.
Jacob J, Manson P, Barfknecht R et al. (2014) Common Vole (Microtus Arvalis) Ecology and Management: Implications For Risk Assessment of Plant Protection Products. Pest Management Science, DOI: 10.1002/ps.3695.
Jędrzejewski W, Jędrzejewska B, Szymura L (1995) Weasel population response, home range, and predation on rodents in a deciduous forest in Poland. Ecology, 179-195.
Jing M, Yu HT, Wu SH et al. (2007) Phylogenetic relationships in genus Niviventer (Rodentia: Muridae) in China inferred from complete mitochondrial cytochrome b gene. Molecular Phylogenetics and Evolution, 44, 521-529.
Jones KE, Patel NG, Levy MA et al. (2008) Global trends in emerging infectious diseases. Nature, 451, 990-994.
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111-120.
Tamura K, Peterson D, Peterson N et al. (2011) MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution, 28, 2731-2739.
Lack JB, Greene DU, Conroy CJ et al. (2012) Invasion facilitates hybridization with introgression in the Rattus rattus species. Molecular Ecology, 21, 3545-3561.
Lees DC, Kawahara AY, Rougerie R et al. (2014) DNA barcoding reveals a largely unknown fauna of Gracillariidae leaf-mining moths in the Neotropics. Molecular Ecology Resources, 14, 286-296.
Lim GS, Balke M, Meier R (2011) Determining species boundaries in a world full of rarity: singletons, species delimitation methods. Systematic Biology, 61, 165-169.
Liu S, Sun Z, Zeng Z et al. (2007) A New Vole (Cricetidae: Arvicolinae: Proedromys) from the Liangshan Mountains of Sichuan Province, China. Journal of Mammalogy, 88, 1170-1178.
Liu S, Liu Y, Guo P et al. (2012) Phylogeny of oriental voles (Rodentia: Muridae: Arvicolinae): molecular and morphological evidence. Zoological Science, 29, 610-622.
Lu L, Chesters D, Zhang W et al. (2012) Small mammal investigation in spotted fever focus with DNA-barcoding and taxonomic implications on rodents species from Hainan of China. PLoS ONE, 7, e43479.
Lunde D, Smith AT, Hoffmann RS (2009) Rodentia. In A Guide to the Mammals of China (eds Smith AT, Xie Y, Gemma F, Wang S). pp. 95-184. Hunan Education Press, Changsha, China.
Luo J, Yang D, Suzuki H et al. (2004) Molecular phylogeny and biogeography of Oriental voles: genus Eothenomys (Muridae, Mammalia). Molecular Phylogenetics and Evolution, 33, 349-362.
Meerburg BG, Singleton GR, Leirs H (2009) "The Year of the Rat ends: time to fight hunger!". Pest Management Science, 65, 351-352.
Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biology, 3, e422.
Mills JN, Childs JE (1998) Ecologic studies of rodent reservoirs: their relevance for human health. Emerging Infectious Diseases, 4, 529.
Müller L, Gonçalves GL, Cordeiro-Estrela P et al. (2013) DNA Barcoding of Sigmodontine Rodents: Identifying Wildlife Reservoirs of Zoonoses. PLoS ONE, 8, e80282.
Musser GG, Carleton MD (2005) Superfamily Muroidea. In: Mammal species of the World a taxonomic and geographic reference (eds Wilson DE, Reeder DM). pp. 894-1531. 3rd ed. Johns Hopkins University Press, Baltimore, Maryland.
Nagy ZT, Sonet G, Glaw F et al. (2012) First large-scale DNA barcoding assessment of reptiles in the biodiversity hotspot of Madagascar, based on newly designed COI primers. PLoS ONE, 7, e34506.
Nicolas V, Schaeffer B, Missoup AD et al. (2012) Assessment of three mitochondrial genes (16S, Cytb, COI) for identifying species in the Praomyini Tribe (Rodentia: Muridae). PLoS ONE, 7, e36586.

Padial JM, Miralles A, De la Riva I, Vences M (2010) The integrative future of taxonomy. Frontiers in Zoology, 7, 1-14.
Pagès M, Chaval Y, Herbreteau V et al. (2010) Revisiting the taxonomy of the Rattini tribe: a phylogeny-based delimitation of species boundaries. BMC Evolutionary Biology, 10, 184.
Pagès M, Latinne A, Johan M (2011) Inter-and intraspecific genetic biodiversity in South East Asian rodents: new insights for their conservation. In Biodiversity Hotspots (eds Zachos FE, Habel JC). pp. 363-382. Springer Berlin Heidelberg, Berlin, Germany.
Pagès M, Bazin E, Galan M et al. (2013) Cytonuclear discordance among Southeast Asian black rats (Rattus rattus complex). Molecular Ecology, 22, 1019-1034.
Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics, 14, 817-818.
Rach J, DeSalle R, Sarkar IN et al. (2008) Character-based DNA barcoding allows discrimination of genera, species and populations in Odonata. Proceedings of the Royal Society B: Biological Sciences, 275, 237-247.
Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes, 7, 355-364.
Reid BN, Le M, McCord WP et al. (2011) Comparing and combining distance-based and character-based approaches for barcoding turtles. Molecular Ecology Resources, 11, 956-967.
Robins JH, Hingston M, Matisoo-Smith E et al. (2007) Identifying Rattus species using mitochondrial DNA. Molecular Ecology Notes, 7, 717-729.
Robovský J, Řičánková V, Zrzavý J (2008) Phylogeny of Arvicolinae (Mammalia, Cricetidae): utility of morphological and molecular data sets in a recently radiating clade. Zoologica Scripta, 37, 571-590.
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572-1574.
Rowe KC, Aplin KP, Baverstock PR et al. (2011) Recent and rapid speciation with limited morphological disparity in the genus Rattus. Systematic Biology, 60, 188-203.
Rubinoff D (2006) DNA barcoding evolves into the familiar. Conservation Biology, 20, 1548-1549.
Saitou N, Nei M (1987) The neighbour-joining method: a new method for reconstructing evolutionary trees. Molecular Biology and Evolution, 4, 406-425.
Sambrook J, Russell DW (2001) Preparation and separation of eukaryotic genomic DNA. In Molecular Cloning, A Laboratory Manual (eds Fritsch EF, Maniatis T). pp. 461-470. 3rd ed. Cold Spring Harbor Laboratory Press, New York, USA.
Steppan SJ, Adkins RM, Anderson J (2004) Phylogeny and divergence-date estimates of rapid radiations in muroid rodents based on multiple nuclear genes. Systematic Biology, 53, 533-553.
Steppan SJ, Adkins RM, Spinks PQ et al. (2005) Multigene phylogeny of the Old World mice, Murinae, reveals distinct geographic lineages and the declining utility of mitochondrial genes compared to nuclear genes. Molecular Phylogenetics and Evolution, 37, 370-388.
Swofford DL (2003) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.
Thomas O (1911) Mammals collected in the provinces of Kan-su and Sze-chewan, western China, by Mr. Malcom Anderson, for the Duke of Bedford's exploration of eastern Asia. Abstracts of the Proceedings of the Zoological Society of London, 90, 3-5.
Valone TJ, Schutzenhofer MR (2007) Reduced rodent biodiversity destabilizes plant populations. Ecology, 88, 26-31.
Velzen RV, Weitschek E, Felici G et al. (2012) DNA Barcoding of Recently Diverged Species: Relative Performance of Matching Methods. PLoS ONE, 7, e30490.
Wang YX (2003) Rodentia. In A complete checklist of mammal species and subspecies in China: a taxonomic and geographic reference. pp. 137-223. China Forestry Publishing House, Beijing, China.
Weitschek E, Velzen R, Felici G et al. (2013) BLOG 2.0: a software system for character-based species classification with DNA Barcode sequences. What it does, how to use it. Molecular Ecology Resources, 13, 1043-1046.
Will KW, Rubinoff D (2004) Myth of the molecule: DNA barcodes for species cannot replace morphology for identification and classification. Cladistics, 20, 47-55.
Yasuda SP, Vogel P, Tsuchiya K et al. (2005) Phylogeographic patterning of mtDNA in the widely distributed harvest mouse (Micromys minutus) suggests dramatic cycles of range contraction and expansion during the mid-to late Pleistocene. Canadian Journal of Zoology, 83, 1411-1420.
Zhang JB, Hanner R (2011) DNA barcoding is a useful tool for the identification of marine fishes from Japan. Biochemical Systematics and Ecology, 39, 31-42.

Contributions

Bisong Yue and Jing Li* initiated and designed the project. Jing Li carried out the study and wrote the manuscript. Jing Li* contributed to final editing of the paper. Min Yang participated in sample collection and generated the figures. Xin Zheng, Xiuyue Zhang and Yansen Cai participated in the laboratory processes. (Jing Li* refers to the corresponding author.)

Data Accessibility

DNA sequences: GenBank accessions and sampling information are shown in Table S1. BOLD project "Barcoding Murinae and Arvicolinae in southwestern China", accessions RBOLD001-14 to RBOLD090-14. Phylogenetic data and the final DNA sequence assembly are available in TreeBASE (study ID S15265).

Figure Legends

Fig. 1 Frequency distribution histograms of all K2P pairwise distances for the COI gene of (a) Murinae and (b) Arvicolinae, exhibiting no barcoding gap.

Fig. 2 Distributions of cumulative errors based on K2P pairwise distance of COI in (a) Murinae and (b) Arvicolinae. Optimal thresholds are indicated.

Fig. 3 The COI neighbor-joining tree based on K2P distances for (a) Murinae and (b) Arvicolinae. Most bold-labeled taxa are displayed in detail at right. Species in grey shadow contain different geographic populations. Nodes with BP less than 75% were collapsed. Numbers refer to nodal support values inferred from NJ bootstrap. Nodes for particular species with multiple specimens are diagrammed as vertical lines or triangles, with the vertical heights representing sample sizes (indicated in parentheses) and the horizontal depth indicating the genetic divergence.

Fig. 4 Rodents from different geographic populations could be distinguished by phylogenetic trees based on DNA barcoding.

Table Legends

Table 1 Original samples used in this study.

Table 2 K2P pairwise distances of the COI gene within different taxonomic levels of the investigated Murinae and Arvicolinae.

Table 3 Unexpected genetic divergence of the COI gene in Murinae and Arvicolinae. (a) Relatively high intraspecific K2P pairwise distances. (b) Relatively low interspecific K2P pairwise distances. (* indicates species containing different geographic populations.) (c) K2P pairwise distances of taxa with paraphyly or deep internal splits in DNA barcode trees.

Table 4 Ambiguous identifications in the online nucleotide BLAST analyses. Underlined accessions indicate failures caused by the absence of a highest-scored-hit (identity
