J Med Primatol doi:10.1111/jmp.12130

ORIGINAL ARTICLE

Genetic diversity and population structure of long-tailed macaque (Macaca fascicularis) populations in Peninsular Malaysia

Sonia Nikzad1, Soon Guan Tan1, Christina Yong Seok Yien2, Jillian Ng3, Noorjahan Banu Alitheen1, Razib Khan4, Jeffrine J. Rovie-Ryan5, Alireza Valdiani6, Parastoo Khajeaian1 & Sree Kanthaswamy3

1 Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Malaysia
2 Department of Biology, Faculty of Science, Universiti Putra Malaysia, Serdang, Malaysia
3 California National Primate Research Center, University of California, Davis, CA, USA
4 Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA, USA
5 Ex-Situ Conservation Division, Department of Wildlife and National Parks (DWNP), Wildlife Genetic Resource Bank (WGRB) Laboratory, Kuala Lumpur, Malaysia
6 Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Malaysia

Keywords
crab-eating macaques – microsatellites – non-human primates – population genetics – short tandem repeats

Correspondence
Soon Guan Tan, Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia. Tel.: +603 89468098; fax: +603 89467510; e-mail: [email protected]

Accepted April 23, 2014.

Abstract

Background The genetic diversity and structure of long-tailed macaques (Macaca fascicularis), a non-human primate species widely used in biomedical research, have not been thoroughly characterized in Peninsular Malaysia.

Methods Wild populations of long-tailed macaques at thirteen sites representing six states were sampled and analyzed with 18 STR markers.

Results The Sunggala and Penang Island populations showed the highest genetic diversity estimates, while the Jerejak Island population was the most genetically discrete due to isolation from the mainland shelf. Concordant with pairwise FST estimates, STRUCTURE analyses of the seven PCA-correlated clusters revealed low to moderate differentiation among the sampling sites. No association between geographic and genetic distances exists, suggesting that the study sites, including island study sites, are genetically if not geographically contiguous.

Conclusions The genetic structure and composition of long-tailed macaque populations require further scrutiny to develop this species as an important animal model in biomedical research.

Introduction

The long-tailed macaque (Macaca fascicularis), which is also known as cynomolgus macaque and crab-eating macaque, is found in Brunei, Cambodia, Indonesia, Malaysia (including the peninsula as well as Sabah and Sarawak in East Malaysia), the Philippines, Singapore, southern Thailand, southern Vietnam, and Nicobar, India [13]. [Correction added on 26 June 2014, after first online publication: The first sentence of the Introduction was edited to include Nicobar, India as one of the places where the long-tailed macaque (Macaca fascicularis) can be found.]

J Med Primatol 43 (2014) 433–444 © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd

Long-tailed macaques are desirable for use in medical and scientific research [39] and are primarily used in biotechnological and biomedical research on infectious diseases such as AIDS [50], influenza [22], tuberculosis [5], and measles [37], and on neurological diseases, including Alzheimer’s [47] and Parkinson’s [9] diseases. The genetic composition of study subjects is one of several factors known to influence experimental outcomes [16]. An animal’s genetic background reflects the genetic variation and structure of the population from which it is derived. Consequently, animals from different source populations can respond differently to the same experimental treatment [7, 44]. Therefore, identifying and quantifying genetic variation and

M. fascicularis genetic diversity

structure of these source populations are important for selecting the most appropriate individuals for particular research studies. The analysis of the genetic variation of long-tailed macaques is necessary for the efficient use of the species in biomedical studies [23]. Evaluation of genetic differentiation at the intra- and inter-population levels of long-tailed macaques is also important for the conservation of genetic resources and inbreeding management. Several studies have found that M. fascicularis of different ancestries differ in the genetic variability of their mtDNA, Y chromosome, and autosomal markers [2, 3, 40, 42, 43]. Despite these efforts, the genetic variation of natural populations of the long-tailed macaque still needs to be characterized on much larger genomic and geographic scales. Combining observational data from wild animals with molecular markers used to infer genetic relationships between individuals has provided insights into mating systems, reproductive strategies, dispersal patterns, genetic relatedness, and the influence of kinship on social behavior [8, 29, 40]. In brief, analysis of the genetic structure of populations affords insight into the evolutionary consequences for different social groups of animals [26].

Large numbers of chromosomally unlinked and highly polymorphic short tandem repeats (STRs) [48] can be analyzed from blood, feces, hair, and other biological sources from wild populations. As such, these markers have become one of the most widely used choices for population genetic studies because they can generate high-precision data for estimating genetic variation and structure and deviations from panmixia (random mating), and for assessing parentage and kinship [17–19, 36]. STR markers isolated in one species can be cross-amplified in related species with a variable degree of success [11]. Kikuchi et al. [21] selected 148 STR markers from the human genome database to develop a panel of STRs applicable to genome-wide screening of long-tailed macaques. After cross-species amplification, 66 of the 148 markers showed polymorphism and single-gene inheritance. Moreover, Higashino et al. [15] isolated 671 STR markers from established BAC (bacterial artificial chromosome) clones to investigate the level of polymorphism in long-tailed macaques; 499 of them were found to be polymorphic. Nevertheless, STR markers have rarely been used to study the genetic variation of the various subspecies of long-tailed macaques [19]. Several studies have assessed the molecular phylogenetic relationships among macaque species or populations throughout relatively large geographic areas of Southeast Asia; Blancher et al. [2] confirmed the mixed

Nikzad et al.

origin (Indonesian/continental) hypothesized for Mauritius long-tailed macaques and a population bottleneck inferred from mtDNA polymorphism. Chu et al. [6] studied sequences of the mitochondrial DNA control region to provide genetic evidence for the evolution and dispersal scenario of the three closely related macaque species, M. mulatta, M. cyclopis, and M. fuscata, in eastern Asia. Smith et al. [40], analyzing mtDNA nucleotide diversity, revealed that long-tailed macaques from Malaysia and Indonesia were far more genetically diverse, while those of Mauritius were less diverse, than any other population studied. In addition, Kawamoto et al. [20], Tosi and Coke [42], and Stevison and Kohn [41] have employed molecular phylogenies to determine the geographic backgrounds of non-endemic human-introduced macaque populations. Schillaci et al. [38] indicated the likelihood of monophyly for the long-tailed macaques from Singapore relative to other Southeast Asian populations for five different mitochondrial gene fragments. In line with the aforementioned efforts, Li et al. [23] attempted to determine the genetic variation of M. f. fascicularis and M. f. aurea monkeys from Indonesia and Cambodia using 26 microsatellite markers. Nevertheless, very few studies have focused on the molecular relationships of natural/wild populations of macaque species in Peninsular Malaysia. As such, the present study employed 18 polymorphic STRs from several sources [18, 19] to assess the genetic variation and structure among the 13 sampling sites of wild long-tailed macaques from Peninsular Malaysia.
Methods

The investigation detailed in this manuscript complied with the protocols approved by the Institutional Animal Care and Use Committee (IACUC), University of California, Davis, USA as adopted by the PREDICT Programme in Malaysia under which the Department is working collaboratively with the EcoHealth Alliance, the Ministry of Health Malaysia, and the Veterinary Services Department, Malaysia. In total, 200 DNA samples were extracted from long-tailed macaque liver tissues using the BIONEER AccuPrep® Genomic DNA Extraction Kit (BIONEER, Seoul, Korea) and the QIAGEN DNeasy Blood & Tissue Kit (QIAGEN, Valencia, CA, USA). Sampling was conducted opportunistically by the Department of Wildlife and National Parks (DWNP), Malaysia in 2011 on trapped animals that invaded the human settlement areas as part of the DWNP’s Wildlife Disease Surveillance Programme (WDSP). Figures 1 and 2 present the locations from which the macaque samples were obtained (mapped using the R package ‘RgoogleMaps’ version 1.2.0.3 [25]), while


Fig. 1 The sampling locations of the long-tailed macaque populations across Peninsular Malaysia.

Fig. 2 Penang mainland and Island sampling locations (http://www.pinmaps.net).

Table 1 shows the names of the states, locations, and number of samples from each location. The sampling was conducted according to all the rules and regulations of the relevant authorities in Malaysia. All samples were

kept at the Wildlife Genetic Resource Bank (WGRB) Laboratory, DWNP. This research adhered to the American Society of Primatologists’ principles for the ethical treatment of primates.


Table 1 Sampling location and sample size of Macaca fascicularis

Location                                                    State              Sample size
Damansara                                                   Selangor           17
Subang                                                      Selangor           15
Jeram                                                       Selangor           11
Puncak Alam                                                 Selangor           15
Tanjung Sepat                                               Selangor           17
(Kuantan and Pekan)                                         Pahang             17
(Manong and Pantai Remis)                                   Perak              17
Port Dickson                                                Negeri Sembilan    20
Sunggala                                                    Negeri Sembilan    17
Bukit Jering                                                Kelantan           16
(Sg. Kecil, Kuala Juru, Permatang Kriang, Ladang Byram)     Penang (mainland)  10
Jerejak Island                                              Penang             13
(Tanjung Tokong, Tanjung Bunga, Bukit Gambir)               Penang (island)    15

A panel of 18 STR markers with high expected heterozygosities was selected from the studies by Kanthaswamy et al. [19], Satkoski et al. [36], Kikuchi et al. [21], and Rogers et al. [33] (Table 2). The STR loci were amplified in 10 µl reactions using a Biometra TProfessional thermal cycler (Biometra, Goettingen, Germany). Each PCR reaction contained 1.0 µl of template DNA (25–30 ng), 2 µl of PCR buffer (5X Green GoTaq® Flexi Buffer; Promega, Madison, WI, USA), 1.5 µl of MgCl2 (25 mM solution), 0.25 µl of dNTP Mix (10 mM each), 0.5 µl of each primer (10 mM forward and reverse), 0.06 µl of GoTaq® DNA Polymerase (5 U/µl), and 4.2 µl of sterile deionized distilled water (ddH2O). The cycling profile for the 18 STR markers was as follows: an initial denaturing step of 5 minutes at 94°C, followed by 29 cycles of 30 s at 94°C, 75 s at the appropriate primer annealing temperature (Table 2), and 15 s at 72°C, with a final extension of 7 minutes at 72°C. The amplified DNA fragments were electrophoresed using 4% MetaPhor Agarose gels and 8% polyacrylamide gels (PAGE) for better resolution. Fragment sizes were visualized and assigned using the UVIdoc gel documentation system and program (UVITEC, Cambridge, UK). For dimeric and tetrameric loci, allele sizes were rounded to integers two or four units apart, respectively. The data were examined for probable scoring errors due to stutter or large allele dropout, and incidences of null alleles were examined using the program MICRO-CHECKER version 2.2 [45]. Genetic variation was summarized as the observed (na) and effective (ne) numbers of alleles per locus and the observed (HO) and expected (HE) heterozygosities, which were determined using GENEPOP version 4.2 [32] per locus and ARLEQUIN version 3.5.1.3 [10] across all 18

STRs for each study population. GENEPOP 4.2 [28] was used to test for adherence to Hardy–Weinberg equilibrium (HWE) and for linkage disequilibrium (LD) among pairs of loci in each population at the 0.05 significance level using Fisher’s exact tests [32], with unbiased P-values generated by the Markov Chain Monte Carlo (MCMC) approach [14] with 5000 iterations per batch. The polymorphism information content (PIC) value [4], which shows the informativeness of markers, was estimated for each locus using the POWERMARKER software version 3.25 [24]. F-statistics [FIS (inbreeding coefficient) and FST (population subdivision coefficient)] were estimated using FSTAT 2.9.3 [12] to test for nonrandom mating in the Malaysian long-tailed macaque populations, and the 95% confidence intervals (CI) for each estimate were obtained by bootstrapping across loci with 1000 permutations of the data [12, 49]. Positive F-statistics indicate an excess of homozygosity caused by inbreeding (FIS) and/or genetic subdivision (FST) [51]. Analysis of molecular variance (AMOVA) implemented in Arlequin [10] was performed to quantify the extent of population differentiation and the distribution of genetic variation within and among the 13 regional sites of long-tailed macaques. The same software was used to estimate the pairwise FST [49]. The significance of the pairwise FST estimates was assessed against a probability distribution created from permutation tests (N = 1000) with Bonferroni corrections for multiple comparisons. A population-specific FST, the mean pairwise FST estimate produced by comparing a specific population with all others, was computed for each population as well. Values of pairwise FST were examined for linear correlation with values of gene diversity (expected heterozygosity) assumed under the neutral hypothesis [32].
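The per-locus summary statistics named above can be illustrated with a minimal sketch. It is not the GENEPOP, ARLEQUIN, POWERMARKER, or FSTAT implementation: the function name `locus_summary` and the toy genotypes are hypothetical, HE is assumed to be Nei's sample-size-corrected gene diversity, PIC follows the Botstein et al. formula, and FIS is approximated by the simple ratio 1 − HO/HE rather than by the estimator used in FSTAT.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one STR locus from diploid genotypes given as (allele_a, allele_b)."""
    n = len(genotypes)                                   # number of individuals
    counts = Counter(a for g in genotypes for a in g)    # allele counts over 2n gene copies
    freqs = [c / (2 * n) for c in counts.values()]       # allele frequencies
    hom = sum(p * p for p in freqs)                      # expected homozygosity
    ho = sum(1 for a, b in genotypes if a != b) / n      # observed heterozygosity
    he = (2 * n / (2 * n - 1)) * (1.0 - hom)             # assumed: Nei's unbiased HE
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "na": len(counts),        # observed number of alleles
        "ne": 1.0 / hom,          # effective number of alleles
        "HO": ho,
        "HE": he,
        "PIC": 1.0 - hom - cross,
        "FIS": 1.0 - ho / he,     # simple ratio; undefined for a monomorphic locus
    }


# Hypothetical genotypes (allele sizes in bp) for four animals at one locus.
toy = [(100, 104), (100, 100), (104, 108), (108, 108)]
summary = locus_summary(toy)
print(summary)
```

With real data the function would be applied per locus and per sampling site, and the results averaged as in Tables 3 and 4; a positive FIS from this ratio corresponds to the heterozygote deficit discussed in the text.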
We used a principal components analysis (PCA) and a Bayesian clustering analysis to characterize the population structure among the 13 different long-tailed macaque sampling sites. The PCA was performed using SMARTPCA contained in the EIGENSOFT version 5.0 program [30], which employs a key refinement to the PCA-based algorithm for improved resolution of population genetic stratification based on the Tracy–Widom (TW) theory. This program generated the number of genetic clusters among the sampling sites based on each animal’s coordinates along axes of variation. To determine whether the long-tailed macaques’ nuclear genetic variation at the 18 STRs complied with a geographic pattern, the Bayesian analysis was implemented with the program STRUCTURE 2.3.4 [31] to classify individuals discretely among the K PCA-correlated long-tailed macaque genetic clusters or subgroups. According to Patterson et al. [30], PCA methods may

Macaque chromosome

1

2

3

3

4

5

5

6

7

8

8

9

9

11

15

15

16

17

Loci

D1s548

D3s1768

D7s1826

D7s794

D6s501

D4s1626

D4s2365

D5s1457


D15s823

D8s1106

D8s1466

D10s1432

271j8

AGAT007

D9s921

D9s934

270e8

D13s765

179–247

119–155

178–246

176–216

156–196

172–216

146–178

124–152

121–173

320–380

96–140

276–320

160–228

152–204

116–156

116–152

176–204

184–216

Size (bp) F: GAACTCATTGGCAAAAGGAA R: GCCTCTTTGTTGCAGTGATT F: GGTTGCTGCCAAAGATTAGA R: CACTGTGATTTGCTGTTGGA F: CATCCATCTATCTCTGTAATCTCTC R: GTTCTTTATTTAACACACCTGTCTCAATCC F: GCCAATTCTCCTAACAAATCC R: GTTCTTTATGCCCATGTGTTAGGGTT F: GCTGGAAACTGATAAGGGCT R: GCCACCCTGGCTAAGTTACT F: TACACTTGAACAAAGTAAGGATGC R: AAAGGAAAAGGAATGGGATG F: AGTAATTCTTCAACTGCATCACC R: ATGCCAAGGATGGTGAGTTA F: TAGGTTCTGGGCATGTCTGT R: TGCTTGGCACACTTCAGG F: GGCTTTGCATCCAGAATTTA R: GTTCTTCACTTCCAACATTGAGGGTC F: TTGTTTACCCCTGCATCACT R: GTTCTTTTCTCAGAATTGCTCATAGTGC F: ATGTTTTTCAGTTCTAATGCACA R: GTTCTTCAATGCTCTGTATAGAGCTACACC F: CAGTGGACACTAAACACAATCC R: TAGATTATCTAAATGGTGGATTTCC F: CGTGCAGATGGTAATCTTTT R: GTTCTTCCCCATTGTAGCTAGCACAGT F: CAATGTTAACTGACTGCATTGCTG R: TGGAAGCTACAATTCAAGATGAGA F: CCTGGAGAATCTTGTGATGC R: GTTCTTTCTTTCATGTTGGCTCCTGT F: TTTCCTAGTAGCTCAAGTAAAGAGG R: GTTCTTAGACTTGGACTGAATTACACTGC F: GTCTGTTCTGTTCATTGATAT R: GTTCTTGGTCTGGAAAACCCCCATTAAT F: GTATGATTTATTTCAGGTTTGCA R: GTTCTTTTGAAACTTAGAGACAGCTTGC

Primer sequence 5′→3′

Table 2 The characteristics of 18 STR loci used in the present study

G09003

AC144142

G08751

55

59

55

55

57

– G08735

59

55

55

55

54

55

55

55

55

52

55

55

55

Annealing temperature (°C)

AC142914

G08816

G09586

G09378

G07912

G08431

G08329

G08368

G08551

G08607

G08622

G08287

G07827

GenBank accession

Kanthaswamy et al., [19]

Satkoski et al., 2008

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Satkoski et al., 2008

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Rogers et al., [33]

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Kikuchi et al., [21]

Satkoski et al., 2008

Kanthaswamy et al., [19]

Kanthaswamy et al., [19]

Reference


represent an ideal default for estimating the number of genetic clusters for implementation in the STRUCTURE program. STRUCTURE employs a Markov Chain Monte Carlo (MCMC) approach to calculate L(K), the posterior probability that the data correspond to the hypothesis of K a priori assigned genetically distinctive groups, to evaluate the fractional association of each animal in each group, and by that token, detect the presence of genetic admixture among these individuals. Simulations were performed with 5 × 10^5 iterations after a burn-in period of 10^5 using prior population information (admixture with LOCPRIOR). The STRUCTURE runs were repeated 10 times with each set of assumptions to ensure that group assignments with the highest probabilities were identified.

Results

Eighteen loci were utilized for the genetic studies. All the amplified bands had sizes within or very near the previously reported size ranges (see Table 2). The studied loci provided robust and reliable amplifications of polymorphic bands and exhibited high levels of genetic heterogeneity. When data from the different sampling sites were pooled, none of the loci presented any evidence for null alleles at statistically significant levels (P-value < 0.05). All 18 markers were also statistically unlinked (P-value > 0.05) after data from the different sites were pooled. Estimates of allele number (na) and of observed (HO) and expected (HE) heterozygosities averaged over the study sites for each of the 18 loci are presented in Table 3. The HO values across all the loci ranged from 0.33 (D1s548) to 0.68 (D7s1826) with a mean value of 0.50. The HE values across all the loci ranged from 0.71 (D1s548) to 0.89 (D15s823) with a mean value of 0.80. A high PIC value depends on the number and frequency distribution of the alleles. Botstein et al. [4] considered markers with PIC values greater than 0.500, such as those used in this study, as highly informative.
The statistical significance of 2-locus LD among the 18 STR loci was tested using the exact test; LD P-values were obtained for the 153 locus pairs in each population. At the adjusted P-value for the 5% nominal level (0.00032), no population showed any significant LD between the locus pairs that were located on the same chromosome (D4s1626 & D4s2365, D7s1826 & D7s794, D8s1106 & D8s1466, D9s934 & D9s921, and D10s1432 & 271j8). Exact tests of HWE (P-value < 0.05 after Bonferroni correction) showed that the following numbers of the 18 loci deviated significantly from HWE in each population: Damansara (Selangor) = 10; Subang (Selangor) = 7;


Table 3 Per locus estimates of allele numbers (na), observed (HO), expected (HE) heterozygosity, and polymorphic information content (PIC) for each of the 18 STRs. Standard errors are shown in parentheses

Locus      na     HO              HE              PIC
D1s548     5      0.326 (0.067)   0.705 (0.051)   0.764
D3s1768    9      0.639 (0.042)   0.862 (0.064)   0.920
D7s1826    9      0.677 (0.082)   0.862 (0.071)   0.870
D7s794     7      0.536 (0.052)   0.790 (0.086)   0.835
D6s501     7      0.420 (0.033)   0.784 (0.086)   0.905
D4s1626    7      0.493 (0.035)   0.806 (0.053)   0.889
D4s2365    8      0.364 (0.032)   0.822 (0.058)   0.881
D5s1457    5      0.413 (0.087)   0.714 (0.075)   0.816
D15s823    11     0.664 (0.087)   0.887 (0.043)   0.913
D8s1106    7      0.514 (0.066)   0.790 (0.078)   0.866
D8s1466    5      0.406 (0.085)   0.732 (0.043)   0.783
D10s1432   6      0.496 (0.072)   0.792 (0.081)   0.812
271j8      8      0.579 (0.074)   0.860 (0.047)   0.884
AGAT007    7      0.362 (0.065)   0.777 (0.055)   0.827
D9s921     7      0.514 (0.033)   0.821 (0.049)   0.861
D9s934     8      0.537 (0.025)   0.845 (0.094)   0.875
270e8      7      0.578 (0.044)   0.806 (0.037)   0.856
D13s765    7      0.423 (0.042)   0.802 (0.038)   0.862
Mean       7.22   0.497 (0.056)   0.803 (0.061)   0.857

Jeram (Selangor) = 5; Puncak Alam (Selangor) = 11; Tanjung Sepat (Selangor) = 8; Pahang = 9; Perak = 11; Port Dickson (Negeri Sembilan) = 9; Sunggala (Negeri Sembilan) = 12; Bukit Jering (Kelantan) = 5; mainland Penang (Penang) = 4; Jerejak Island (Penang) = 8; and Penang Island (Penang) = 13. The average number of STR alleles (na) and the average observed (HO) and expected (HE) heterozygosities in each population are presented in Table 4. The Sunggala (Negeri Sembilan) population showed the highest average number of alleles per locus (na = 8.3) and the Jerejak Island (Penang) population the lowest (na = 5.8), while the other populations presented 6.3–7.8 alleles per locus. The lowest average HE was found on Jerejak Island (0.76), which also had the lowest number of alleles. Estimates of HO varied from 0.40 (Penang Island, Penang) to 0.54 (Subang, Selangor and Bukit Jering, Kelantan), while HE ranged from 0.76 (Jerejak Island) to 0.83 (Sunggala). All pairwise FST and FIS estimates, presented in Table 4, were statistically significant at the 0.05 level of probability (significance test results not shown). Genetic differentiation was lowest (pairwise FST = 0.033–0.038) among mainland Penang, Jerejak Island, and Penang Island populations, while the greatest divergence (pairwise FST = 0.106–0.109) was observed among the Damansara, Perak, Bukit Jering, and Jerejak Island populations.
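As a worked illustration of the population-specific FST defined in the Methods (the mean of a population's pairwise FST estimates against all others), the sketch below averages the twelve pairwise values reported for Damansara in Table 4. The function and variable names are hypothetical; only the numbers come from the table.

```python
# Pairwise FST estimates between Damansara and the other 12 sites (Table 4).
damansara_pairwise_fst = {
    "Subang": 0.084,
    "Jeram": 0.083,
    "Puncak Alam": 0.072,
    "Tanjung Sepat": 0.069,
    "Pahang": 0.071,
    "Perak": 0.109,
    "Port Dickson": 0.080,
    "Sunggala": 0.067,
    "Bukit Jering": 0.108,
    "Mainland Penang": 0.069,
    "Jerejak Island": 0.096,
    "Penang Island": 0.068,
}


def population_specific_fst(pairwise):
    """Mean pairwise FST of one population against all other populations."""
    return sum(pairwise.values()) / len(pairwise)


damansara_fst = population_specific_fst(damansara_pairwise_fst)
print(round(damansara_fst, 3))  # 0.081
```

The result agrees with the population-specific FST of 0.081 listed for Damansara in Table 4.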

Table 4 Mean allele numbers (na), observed (HO), and expected heterozygosity (HE) averaged across 18 STRs for each population. Pairwise FST estimates are listed for each pair of populations, with the geographic distances between populations (in km) in parentheses. All F-statistics estimates were significantly different from zero (95% CI), while all the P-values for pairwise FST estimate were significant (P-value < 0.05)

Population        na    HO      HE      Population-specific FIS   Population-specific FST
Damansara         7.6   0.484   0.789   0.395                     0.081
Subang            7.6   0.541   0.816   0.345                     0.080
Jeram             6.3   0.530   0.799   0.347                     0.068
Puncak Alam       7.2   0.463   0.804   0.433                     0.070
Tanjung Sepat     7.2   0.513   0.800   0.366                     0.067
Pahang            7.4   0.539   0.801   0.334                     0.062
Perak             7.5   0.471   0.795   0.415                     0.083
Port Dickson      7.8   0.533   0.819   0.354                     0.078
Sunggala          8.3   0.471   0.830   0.441                     0.063
Bukit Jering      6.7   0.542   0.782   0.314                     0.084
Mainland Penang   6.5   0.517   0.818   0.381                     0.057
Jerejak Island    5.8   0.457   0.758   0.407                     0.086
Penang Island     7.3   0.396   0.829   0.531                     0.061

Pairwise FST (distance in km)

Damansara vs: Subang 0.084 (5.69); Jeram 0.083 (23.74); Puncak Alam 0.072 (19.22); Tanjung Sepat 0.069 (56.13); Pahang 0.071 (206.20); Perak 0.109 (176.61); Port Dickson 0.080 (73.92); Sunggala 0.067 (81.44); Bukit Jering 0.108 (258.11); Mainland Penang 0.069 (260.76); Jerejak Island 0.096 (276.57); Penang Island 0.068 (286.95)
Subang vs: Jeram 0.086 (22.10); Puncak Alam 0.073 (17.74); Tanjung Sepat 0.080 (51.93); Pahang 0.079 (211.35); Perak 0.102 (177.58); Port Dickson 0.084 (71.54); Sunggala 0.076 (79.26); Bukit Jering 0.081 (262.72); Mainland Penang 0.052 (262.52); Jerejak Island 0.085 (278.14); Penang Island 0.076 (288.48)
Jeram vs: Puncak Alam 0.045 (4.52); Tanjung Sepat 0.046 (67.00); Pahang 0.045 (224.62); Perak 0.042 (157.08); Port Dickson 0.096 (90.95); Sunggala 0.054 (98.94); Bukit Jering 0.093 (253.96); Mainland Penang 0.059 (243.33); Jerejak Island 0.105 (285.55); Penang Island 0.057 (268.80)
Puncak Alam vs: Tanjung Sepat 0.059 (64.40); Pahang 0.062 (221.04); Perak 0.074 (160.71); Port Dickson 0.091 (87.46); Sunggala 0.062 (95.40); Bukit Jering 0.084 (254.59); Mainland Penang 0.067 (246.57); Jerejak Island 0.093 (261.91); Penang Island 0.062 (272.19)
Tanjung Sepat vs: Pahang 0.042 (234.89); Perak 0.080 (222.18); Port Dickson 0.072 (31.09); Sunggala 0.055 (38.89); Bukit Jering 0.078 (314.12); Mainland Penang 0.060 (309.73); Jerejak Island 0.095 (324.57); Penang Island 0.066 (334.71)
Pahang vs: Perak 0.075 (307.20); Port Dickson 0.062 (220.76); Sunggala 0.052 (220.39); Bukit Jering 0.086 (241.64); Mainland Penang 0.049 (353.93); Jerejak Island 0.075 (372.18); Penang Island 0.048 (382.01)
Perak vs: Port Dickson 0.105 (248.00); Sunggala 0.078 (256.01); Bukit Jering 0.101 (182.45); Mainland Penang 0.060 (90.08); Jerejak Island 0.106 (103.16); Penang Island 0.067 (113.01)
Port Dickson vs: Sunggala 0.053 (8.18); Bukit Jering 0.083 (325.69); Mainland Penang 0.068 (333.92); Jerejak Island 0.083 (349.36); Penang Island 0.061 (359.65)
Sunggala vs: Bukit Jering 0.061 (331.46); Mainland Penang 0.064 (341.74); Jerejak Island 0.086 (357.25); Penang Island 0.045 (367.55)
Bukit Jering vs: Mainland Penang 0.066 (161.88); Jerejak Island 0.098 (176.34); Penang Island 0.065 (182.74)
Mainland Penang vs: Jerejak Island 0.033 (18.29); Penang Island 0.038 (28.53)
Jerejak Island vs: Penang Island 0.078 (10.57)


The population-specific FIS estimates were significantly high (FIS > 0.31) across all the populations of which the greatest value belonged to Penang Island (FIS = 0.53). Population-specific (average pairwise) FST values varied from 0.057 (mainland Penang) to 0.086 (Jerejak Island) with the other populations ranging between 0.061 and 0.084, reflecting the lowest and highest partially differentiated populations of mainland Penang and Jerejak Island from all others, respectively. Patterson et al.’s refined PCA [30] revealed seven genetic clusters with 95% CI (Fig. 3A). While some long-tailed macaques differentiated along the second linear discriminant (LD2) cluster, the position of most of the other animals cannot be as easily discriminated geographically. Despite the appearance of the genetic


clusters as a conglomerate in PCA space, several Port Dickson and Sunggala individuals were separated from the rest of the study animals along the first linear discriminant (LD1). The application of PCA-based prior group information assisted the Bayesian clustering formation of the STRUCTURE program (Fig. 3B). As indicated by their mixed colors, STRUCTURE analysis suggests that the Jeram and Puncak Alam sites in Selangor consist of populations that are more genetically heterogeneous compared with the rest of the study sites. The Puncak Alam population with a HE estimate of 0.80 exhibited the most genetically admixed individuals with almost equal proportions of admixture from four different genetic clusters.


Fig. 3 (A) The PCA identified seven distinct genetic clusters as the highest level of structure (i.e., K = 7). The first four PCA axes explain approximately 75%, 18%, 4%, and 3% of the genetic variation among the long-tailed macaques, respectively. As the first two axes explain over 90% of the variance in this population, this two-dimensional solution is adequate. (B) Clustering assignment based on the Bayesian approach under an admixture model provided by the STRUCTURE software, shown for the most likely value of K (K = 7) based on the PCA. Each individual is represented by a single column that is divided into segments whose size and color correspond to the proportion of the animal’s genome assigned to a specific cluster; each color represents a cluster. Study sites are separated by black lines. The numbers on the x-axis correspond to the sampling sites of long-tailed macaques. The y-axis shows the probability of assignment of an individual to each population cluster. The numbers on the x-axis are ascribed as follows: (i) Damansara, (ii) Subang, (iii) Jeram, (iv) Puncak Alam, (v) Tanjung Sepat, (vi) Pahang, (vii) Perak, (viii) Port Dickson, (ix) Sunggala, (x) Bukit Jering, (xi) Penang mainland, (xii) Jerejak Island, and (xiii) Penang Island.


Damansara, Tanjung Sepat, Perak, and Jerejak Island (Penang) each formed a homogenous group, respectively. Jerejak Island, which is the most homogenous and discrete population in this study, formed the fifth cluster (green), which also comprises half of the Penang Island individuals, thus showing a small but notable separation between the two islands. STRUCTURE also revealed a greater genetic affinity (green) between the Penang, Damansara, Subang, and Pahang populations than expected. To a larger extent, the Pahang individuals were affined with the Tanjung Sepat in Selangor (yellow). Essentially, two additional genetic subdivisions (dark blue and orange) were inferred among the Sunggala (Negeri Sembilan) animals; while approximately half of its population clustered with the Port Dickson animals (dark blue), the rest of the Sunggala’s animals grouped with the Bukit Jering (Kelantan) and half of Penang Island individuals to form the orange cluster. The highest estimates of HE (0.83) were recorded in Sunggala and Penang Island probably because each of these sites contained animals belonging to two separate genetically homogenous clusters. A Pearson’s correlation coefficient to assess the relationship between FST and geographic distances (Table 4) implied that isolation by distance is absent (P-value > 0.05), that is, there is no association between genetic distance and geographic distance. Yet, the results of the AMOVA (Table 5) showed that the majority (93%) of variation is within each of the 13 study sites. The population-specific FST (range = 0.06–0.09; mean = 0.07), which indicates a reasonably low to moderate differentiation among the subpopulations, is also concordant with the estimate of 7% genetic divergence that is attributable to the moderate variation among populations based on the AMOVA. 
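The isolation-by-distance check described above can be sketched as a Pearson correlation between pairwise FST and geographic distance. The example uses only the twelve Damansara pairs from Table 4 as illustrative input, so it does not reproduce the study's result over all 78 pairs; the function name `pearson_r` is hypothetical. Because pairwise entries in a distance matrix are not independent, significance in such analyses is commonly assessed with a permutation (Mantel) test, which this sketch omits.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Damansara versus the other 12 sites (Table 4): distance in km and pairwise FST.
distance_km = [5.69, 23.74, 19.22, 56.13, 206.20, 176.61,
               73.92, 81.44, 258.11, 260.76, 276.57, 286.95]
pairwise_fst = [0.084, 0.083, 0.072, 0.069, 0.071, 0.109,
                0.080, 0.067, 0.108, 0.069, 0.096, 0.068]

r = pearson_r(distance_km, pairwise_fst)
print(f"r = {r:.3f}")
```

A correlation near zero, or one that is not significant under permutation, would be consistent with the absence of isolation by distance reported in the text.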
Concordant with the FST estimates, the clustering patterns from the STRUCTURE analysis also confirmed the lack of genetic structure among the tested sites as well as a lack of significant change in allele frequencies across the distributional range of the long-tailed macaque populations in Peninsular Malaysia.

Table 5 AMOVA results for all populations

Source of variation    Degrees of freedom    Variation %
Among populations      12                    7.35
Within populations     199                   92.65
Total                  211                   100
FST                    0.074

Discussion

We succeeded in amplifying 18 STR loci in 200 long-tailed macaque samples using the polymorphic loci reported in previous studies of the same species [18, 19]. All of the loci employed in this study provided robust amplifications of polymorphic bands and were highly informative. The number of alleles and the observed and expected heterozygosity estimates were slightly lower than those reported in other studies [18, 19, 21]. Estimates of FIS calculated in this study are much greater than those in other populations of long-tailed macaques in Southeast Asia [18]. The much greater values of HE compared with those of HO are reflected in the high FIS values, which are often associated with lower observed heterozygosity for populations of a single species [29]. As none of the 18 loci analyzed showed significant and systematic effects from null alleles across populations, the high values of FIS and excess homozygosity among animals in this study probably stemmed from non-random mating within the social units that were sampled and from remote inbreeding at the local levels [18]. Furthermore, the positive FIS values could also be ascribable to our opportunistic sampling near human settlements, which may have favored related animals from low-ranking matrilines marginalized from the main social group and thus more likely to be trapped. On the other hand, our sampling strategy may have introduced a bias in favor of related higher-ranked animals, which can readily explore group boundaries unchallenged and are thus more likely to stray into dangerous human zones. Patterson et al. [30] argued that estimating K with STRUCTURE is statistically unsound; therefore, we used their TW-enhanced PCA to first identify the different genetic clusters of long-tailed macaque individuals for the STRUCTURE run. The PCA detected a maximum value of K = 7 genetic clusters, and the PCA plot in Fig. 3A confirmed the close genetic relationships among most of the geographically disparate sites.
Based on this outcome, the STRUCTURE analysis was subsequently performed under the assumption that seven genetically discrete clusters of long-tailed macaque individuals exist among the examined sites. The admixed structure in the mainland Penang, Penang Island, and Jerejak Island populations demonstrated in the STRUCTURE analysis (Fig. 3A) was also supported by the pairwise FST test. This cluster, along with the Tanjung Sepat – Pahang, the Port Dickson – Sunggala, the Penang – Pahang – Selangor (Damansara and Subang), and the Sunggala – Penang Island – Bukit Jering clusters, suggests that these populations not only experience the combined effects of genetic drift and gene flow (both natural and human-assisted), but also have only recently diverged. These patterns of STRUCTURE-based assignments and pairwise FST estimates imply that population differentiation has occurred at variable rates that are not proportional to geographic distances.

Genetic drift has played a significant role in shaping these populations’ genetic composition, but the larger effective population sizes of long-tailed macaques on the mainland have minimized Wahlund’s effect [46], that is, the reduction of heterozygosity across populations caused by substructure among populations. For instance, Bukit Jering in northern Peninsular Malaysia is surrounded by tall mountainous areas, but exhibits the highest population-specific FST value (0.084) among all mainland populations. The high level of genetic homogeneity within the Jerejak Island population reflects its restricted population size and its relative genetic isolation from the rest of the study sites. In spite of significant geographic barriers, the Jerejak Island, mainland Penang, and Penang Island populations exhibit low to moderate genetic divergence. These islands were connected to the mainland during the Pleistocene glacial periods and became separated from it as sea levels rose [1]. Using Y-chromosomal data, Rovie-Ryan et al. [35] estimated that the Penang Island–mainland Penang split occurred about 0.7–0.4 mya, and based on mtDNA evidence, Rovie-Ryan et al. [34] reported that the mainland Penang and Penang Island long-tailed macaque populations still share haplotype Hap4.

Human interference may also have contributed to the current long-tailed macaque population genetic structure observed in this study. The individual assignment test in STRUCTURE revealed an affinity between the Penang and Selangor populations. This genetic similarity between these geographically distant sites probably arose about 40–50 years ago, when local traders transported long-tailed macaques from Penang into Selangor for export [27]. Based on the AMOVA, partial genetic differentiation among the mainland study sites corresponded with their geographic proximity.
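The Wahlund effect mentioned above can be illustrated with a small numerical sketch. It assumes a single biallelic locus (the STR loci used in this study are multiallelic) and two equally sized subpopulations that are each in Hardy–Weinberg equilibrium; the allele frequencies and the function names are hypothetical and chosen only for illustration.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus under Hardy-Weinberg."""
    return 2.0 * p * (1.0 - p)

def wahlund(p1, p2):
    """Pool two equally sized subpopulations, each in Hardy-Weinberg equilibrium.

    Returns (h_obs, h_exp, f):
      h_obs: heterozygote frequency actually present in the pooled sample
      h_exp: heterozygosity expected if the pool were one random-mating unit
      f:     apparent heterozygote deficit of the pool, (h_exp - h_obs) / h_exp
    """
    h_obs = (expected_het(p1) + expected_het(p2)) / 2.0
    p_bar = (p1 + p2) / 2.0
    h_exp = expected_het(p_bar)
    f = (h_exp - h_obs) / h_exp
    return h_obs, h_exp, f

# Hypothetical allele frequencies in two subpopulations; not study data.
h_obs, h_exp, f = wahlund(0.2, 0.8)
print(f"pooled HO={h_obs:.2f} HE={h_exp:.2f} apparent deficit={f:.2f}")
```

When the two subpopulations share the same allele frequency, the deficit vanishes, so pooling only produces an apparent excess of homozygotes when the pooled sample is genuinely substructured.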
As such, the rapid growth of human populations, urbanization, and conversion of rain forests to other forms of land use may have forced the relocation of many groups of long-tailed macaques. This may have led to admixture among some of the mainland populations and resulted in a more geographically complex distribution with substantial overlap among different genetic clusters.

In contrast to that of rhesus macaques (Macaca mulatta), another biomedically relevant non-human primate species, the genetic variation of Malaysian long-tailed macaques is largely uncharacterized. In order to develop this long-tailed macaque population as a non-human primate model for biomedical research, this study attempted to provide a better understanding of this population’s genetic structure and genetic diversity. Although there are some insular differences and more noticeable structure among sites closest to the mainland, there is evidence from this study that the combined effects of genetic drift, inbreeding, and natural and human-assisted gene flow could have contributed to the lack of clinal variation among the Peninsular Malaysian long-tailed macaque populations. Hence, based on this study, one Peninsular Malaysian long-tailed macaque is as good as another for purposes of biomedical research. Future work should include mtDNA analysis to compare peninsular haplotypes with other mainland and insular Southeast Asian and Mauritian haplotypes, for further examination of possible founder effects and genetic structure.

Acknowledgments
We are grateful to the anonymous reviewers of JMP for their insightful comments, which have helped to improve this manuscript. This project was funded by the Ministry of Higher Education Malaysia through FRGS project no. 01-12-10-973FR. Our special thanks go to the Department of Wildlife and National Parks, Malaysia, for providing the samples and some of the chemicals used in this study. Our appreciation also goes to Dr. Zainal Zahari Zainuddin, the DWNP Outbreak Response Team (ORT), EcoHealth Alliance, and the PREDICT Program (USAID).

References
1 Asadpour R, Lim H, Alashloo MM, Shekafti S, Moussavi S: Application of THEOS imagery to study Chlorophyll-a at the Strait of Penang Island, Malaysia. Paper presented at the 2nd International Conference on Environmental Engineering and Applications, Shanghai, China, 2011.
2 Blancher A, Bonhomme M, Crouau-Roy B, Terao K, Kitano T, Saitou N: Mitochondrial DNA sequence phylogeny of 4 populations of the widely distributed cynomolgus macaque (Macaca fascicularis fascicularis). J Hered 2008; 99:254–64.
3 Bonhomme M, Blancher A, Cuartero S, Chikhi L, Crouau-Roy B: Origin and number of founders in an introduced insular primate: estimation from nuclear genetic data. Mol Ecol 2008; 17:1009–19.


4 Botstein D, White RL, Skolnick M, Davis RW: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 1980; 32:314–31.
5 Capuano SV 3rd, Croix DA, Pawar S, Zinovik A, Myers A, Lin PL, Bissel S, Fuhrman C, Klein E, Flynn JL: Experimental Mycobacterium tuberculosis infection of cynomolgus macaques closely resembles the various manifestations of human M. tuberculosis infection. Infect Immun 2003; 71:5831–44.
6 Chu JH, Lin YS, Wu HY: Evolution and dispersal of three closely related macaque species, Macaca mulatta, M. cyclopis, and M. fuscata, in the eastern Asia. Mol Phylogenet Evol 2007; 43:418–29.
7 Degenhardt JD, de Candia P, Chabot A, Schwartz S, Henderson L, Ling B, Hunter M, Jiang Z, Palermo RE, Katze M, Eichler EE, Ventura M, Rogers J, Marx P, Gilad Y, Bustamante CD: Copy number variation of CCL3-like genes affects rate of progression to simian-AIDS in Rhesus Macaques (Macaca mulatta). PLoS Genet 2009; 5:e1000346.
8 Di Fiore A: Molecular genetic approaches to the study of primate behavior, social organization, and reproduction. Am J Phys Anthropol 2003; 122(Suppl 37):62–99.
9 Emborg ME: Nonhuman primate models of Parkinson’s disease. ILAR J 2007; 48:339–55.
10 Excoffier L, Laval G, Schneider S: Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 2005; 1:47–50.
11 Godavarthi S, Jayaraman A, Gaur A: Cross-species amplification of human microsatellite markers in pig-tailed and stump-tailed macaques. J Genet 2011; 90:e6–9.
12 Goudet J: FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3). 2001; http://www.unil.ch/izea/software/fstat.htm.
13 Groves CP: Primate Taxonomy. Washington, DC: Smithsonian Institution Press, 2001.
14 Guo SW, Thompson EA: Performing the exact test of Hardy–Weinberg proportion for multiple alleles. Biometrics 1992; 48:361–72.
15 Higashino A, Osada N, Suto Y, Hirata M, Kameoka Y, Takahashi I, Terao K: Development of an integrative database with 499 novel microsatellite markers for Macaca fascicularis. BMC Genet 2009; 10:24.
16 Howard BR: Control of variability. ILAR J 2002; 43:194–201.
17 Kanthaswamy S, Kou A, Smith DG: Population genetic statistics from rhesus macaques (Macaca mulatta) in three different housing configurations at the California National Primate Research Center. J Am Assoc Lab Anim Sci 2010; 49:598–609.
18 Kanthaswamy S, Ng J, Satkoski Trask J, George DA, Kou AJ, Hoffman LN, Doherty TB, Houghton P, Smith DG: The genetic composition of populations of cynomolgus macaques (Macaca fascicularis) used in biomedical research. J Med Primatol 2013; 42:120–31.
19 Kanthaswamy S, Satkoski J, George D, Kou A, Erickson BJ, Smith DG: Interspecies hybridization and the stratification of nuclear genetic variation of rhesus (Macaca mulatta) and long-tailed macaques (Macaca fascicularis). Int J Primatol 2008; 29:1295–311.
20 Kawamoto Y, Shotake T, Nozawa K, Kawamoto S, Tomari K, Kawai S, Shirai K, Morimitsu Y, Takagi N, Akaza H, Fujii H, Hagihara K, Aizawa K, Akachi S, Oi T, Hayaishi S: Postglacial population expansion of Japanese macaques (Macaca fuscata) inferred from mitochondrial DNA phylogeography. Primates 2007; 48:27–40.


21 Kikuchi T, Hara M, Terao K: Development of a microsatellite marker set applicable to genome-wide screening of cynomolgus monkeys (Macaca fascicularis). Primates 2007; 48:140–6.
22 Kuiken T, Rimmelzwaan GF, Van Amerongen G, Osterhaus AD: Pathology of human influenza A (H5N1) virus infection in cynomolgus macaques (Macaca fascicularis). Vet Pathol 2003; 40:304–10.
23 Li A, Sun Z, Zeng L, Li R, Kong D, Zhao Y, Bai J, Zhao S, Shang S, Shi Y: Microsatellite variation in two subspecies of cynomolgus monkeys (Macaca fascicularis). Am J Primatol 2012; 74:561–8.
24 Liu K, Muse SV: PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics 2005; 21:2128–9.
25 Loecher M: RgoogleMaps: Overlays on Google map Tiles in R. Berlin, Germany: R package version 1.2.0.3: Berlin School of Economics and Law (BSEL), 2012.
26 Lukas D, Reynolds V, Boesch C, Vigilant L: To what extent does living in a group mean living with kin? Mol Ecol 2005; 14:2181–96.
27 Muda H: Perdagangan Primat di Malaysia Barat. Kuala Lumpur: Jabatan Perlindungan Hidupan Liar Semenanjung Malaysia, 1982.
28 Nei M: Molecular Evolutionary Genetics. New York, NY: Columbia University Press, 1987.
29 Nsubuga AM, Robbins MM, Boesch C, Vigilant L: Patterns of paternity and group fission in wild multimale mountain gorilla groups. Am J Phys Anthropol 2008; 135:263–74.
30 Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet 2006; 2:e190.
31 Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics 2000; 155:945–59.
32 Raymond M, Rousset F: GENEPOP (Version 1.2): population genetics software for exact tests and ecumenicism. J Hered 1995; 86:248–9.


33 Rogers J, Bergstrom M, Garcia R 4th, Kaplan J, Arya A, Novakowski L, Johnson Z, Vinson A, Shelledy W: A panel of 20 highly variable microsatellite polymorphisms in rhesus macaques (Macaca mulatta) selected for pedigree or population genetic analysis. Am J Primatol 2005; 67:377–83.
34 Rovie-Ryan J, Abdullah M, Sitam F, Abidin Z, Tan S: Genetic diversity of Macaca fascicularis (Cercopithecidae) from Penang, Malaysia as inferred from mitochondrial control region segment. J Indonesia Nat Hist 2014; 2:10–21.
35 Rovie-Ryan J, Abdullah M, Sitam F, Abidin Z, Tan S: Y-chromosomal gene flow of Macaca fascicularis (Cercopithecidae) between the insular and mainland peninsula of Penang state, Malaysia. J Sci Technol Tropics 2013; 9:113–26.
36 Satkoski J, George D, Smith DG, Kanthaswamy S: Genetic characterization of wild and captive rhesus macaques in China. J Med Primatol 2008; 37:67–80.
37 Sato H, Kobune F, Ami Y, Yoneda M, Kai C: Immune responses against measles virus in cynomolgus monkeys. Comp Immunol Microbiol Infect Dis 2008; 31:25–35.
38 Schillaci M, Saravia S, Lee B-H, Matheson C: Preliminary report on mitochondrial DNA variation in Macaca fascicularis from Singapore. Raffles Bull Zool 2011; 59:101–8.
39 Shidler S: Macaques. Madison: Wisconsin National Primate Research Center, University of Wisconsin, 2007.
40 Smith DG, McDonough JW, George DA: Mitochondrial DNA variation within and among regional populations of longtail macaques (Macaca fascicularis) in relation to other species of the fascicularis group of macaques. Am J Primatol 2007; 69:182–98.
41 Stevison LS, Kohn MH: Determining genetic background in captive stocks of cynomolgus macaques (Macaca fascicularis). J Med Primatol 2008; 37:311–7.
42 Tosi AJ, Coke CS: Comparative phylogenetics offer new insights into the biogeographic history of Macaca fascicularis and the origin of the Mauritian macaques. Mol Phylogenet Evol 2007; 42:498–504.
43 Tosi AJ, Morales JC, Melnick DJ: Paternal, maternal, and biparental molecular markers provide unique windows onto the evolutionary history of macaque monkeys. Evolution 2003; 57:1419–35.
44 Trichel AM, Rajakumar PA, Murphey-Corb M: Species-specific variation in SIV disease progression between Chinese and Indian subspecies of rhesus macaque. J Med Primatol 2002; 31:171–8.

45 Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P: Micro-checker: software for identifying and correcting genotyping errors in microsatellite data. Mol Ecol Notes 2004; 4:535–8.
46 Wahlund S: Zusammensetzung von Populationen und Korrelationserscheinungen vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas 1928; 11:65–106.
47 Wang CY, Finstad CL, Walfield AM, Sia C, Sokoll KK, Chang TY, Fang XD, Hung CH, Hutter-Paier B, Windisch M: Site-specific UBITh amyloid-beta vaccine for immunotherapy of Alzheimer’s disease. Vaccine 2007; 25:3041–52.
48 Weber JL, May PE: Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet 1989; 44:388–96.
49 Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution 1984; 38:1358–70.
50 Wiseman RW, O’Connor DH: Major histocompatibility complex-defined macaques in transplantation research. Transplant Rev 2007; 21:17–25.
51 Wright S: Evolution and the Genetics of Populations, Vol. 4: Variability Within and Among Natural Populations. Chicago, IL: University of Chicago Press, 1978.
