Journal of Microbiological Methods 97 (2014) 34–43


Journal of Microbiological Methods journal homepage: www.elsevier.com/locate/jmicmeth

A new microarray platform for whole-genome expression profiling of Mycobacterium tuberculosis

Balaji Venkataraman a, Madavan Vasudevan b, Amita Gupta a,⁎

a Department of Microbiology, University of Delhi South Campus, Benito Juarez Road, New Delhi 110021, India
b Bionivid Technology [P] Ltd, 401-4 AB Cross, 1st Main, Kasturi Nagar, East of NGEF, Bangalore 560043, India

Article info

Article history: Received 28 October 2013; Received in revised form 12 December 2013; Accepted 12 December 2013; Available online 21 December 2013

Keywords: M. tuberculosis; Microarray; RNA amplification; Whole genome expression profiling

Abstract

Microarrays have allowed gene expression profiling to progress from the gene level to the genome level, and oligonucleotide microarrays have become the platform of choice for large-scale, targeted gene expression studies. cDNA arrays and spotted oligonucleotide arrays have gradually given way to in situ synthesized oligonucleotide-based DNA microarrays for whole-genome expression profiling. With the identification of new coding and regulatory sequences, it is imperative that microarrays be updated to enable more complete expression profiling of genomes. We report here a new in situ synthesized oligonucleotide-based microarray platform for Mycobacterium tuberculosis that has been updated for the latest genome information and incorporates hitherto unannotated genes with described biological functions. This microarray has greater coverage of mycobacterial genes than any other array reported to date. We have also evaluated different labeled-target preparation methods and hybridization conditions for this new microarray to obtain high-quality data and reproducible results. The new design has been rigorously validated for its specificity and performance using samples isolated from mycobacteria grown under different environmental conditions. Further, the quality of the generated data has been compared with published data and is superior to that obtained using spotted oligonucleotide microarrays. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

The high-throughput determination of gene expression and comparative quantification is central to most biological studies today. DNA microarray technology has made it possible to perform gene expression quantification for thousands of genes in a single experiment. The technology has evolved from low-density spotted cDNA arrays to medium-density spotted oligonucleotide arrays to high-density in situ synthesized oligonucleotide arrays. The processes involved in labeled-target preparation, hybridization and array washing have also improved considerably to reduce the noise while increasing the specificity of hybridization and sensitivity of detection. Microarrays have proved invaluable in disease management by providing information on pathway modulation in response to drug treatment, identifying biomarkers for diagnosis of disease, allowing for prognosis of infections and cancers, screening for drug resistance, and understanding host responses for vaccine development (Chengalvala et al., 2007; Herwig and Lehrach, 2006; Perreten et al., 2005; Sanchez-Carbayo, 2003; Tree et al., 2006). Infection biology has advanced tremendously with the availability of expression profiles for pathogens, such as Mycobacterium tuberculosis, from varied niches in the host and with the identification of genes crucial to pathogen survival (Rohde et al., 2012; Waddell and Butcher, 2007).

⁎ Corresponding author. Tel.: +91 11 24114172. E-mail address: [email protected] (A. Gupta). 0167-7012/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.mimet.2013.12.009

Despite remarkable improvements in microarray technology and its diverse applications, the lack of correlation between the microarray data produced by different laboratories and different platforms raises concerns about the reliability of the obtained data in a biological context and its usability for downstream experiments (Kendall et al., 2004; Yauk and Berndt, 2007). Several studies have compared the performance of different microarray platforms in quantifying the expression level of genes, by focusing on aspects such as reproducibility, accuracy, statistical significance, and technical and biological variability (DeRisi et al., 1996). The performance of microarrays has also been compared with that of newer technologies, such as high-throughput transcriptome sequencing (RNA-Seq). While RNA-Seq offers the advantages of being genome annotation-independent and avoiding any bias that may be introduced during hybridization on microarrays, it poses significant wet-lab challenges in terms of input RNA quality and quantity, lengthy library preparation protocols and technical expertise (Kogenaru et al., 2012). Additionally, there are limited solutions for downstream analysis of sequencing data. In contrast, the sample processing and analysis pipelines for microarrays are robust and well defined. Consequently, microarrays continue to be the preferred platform for transcript profiling of well-annotated genomes and for studies with large sample sizes. The success of any microarray project lies in a good array design, a robust labeled-target preparation method, and stringent hybridization and washing conditions to ensure high signal-to-noise ratios and reproducible data quality. Genome coverage, probe characteristics such as GC-percentage, melting temperature (Tm), cross-hybridization potential,

B. Venkataraman et al. / Journal of Microbiological Methods 97 (2014) 34–43

sequence specificity, probe multiplicity and duplication are all important considerations when designing an array. Protocols for RNA amplification and labeling should also be optimized for every microarray platform to obtain labeled targets with high specific activity that represent the entire transcriptome with the least amount of priming bias and high reproducibility. Single-dye labeling-based target preparation has been shown to give results in greater concordance with real-time PCR and RNA-Seq than two-dye labeling (Git et al., 2010). Furthermore, hybridization and washing conditions should be standardized for every genome to avoid biases in target annealing and to eliminate non-specific interactions. In this study, we describe a new in situ synthesized oligonucleotide microarray for M. tuberculosis H37Rv. The array incorporates specific probes for all of the annotated genes of Mycobacteria, as well as probes for 20 unannotated genes. These genes comprise toxin–antitoxin loci in M. tuberculosis and have been shown to have biological functions in M. tuberculosis (Arcus et al., 2005; Gupta, 2009; Pandey and Gerdes, 2005; Ramage et al., 2009). We have employed two different methods, incorporating different priming strategies, for the amplification of mycobacterial RNA and the Cy3 labeling of amplified products. Two methods were used because priming methodologies can create significant bias in the representation of the mRNA population and therefore need to be evaluated. The effect of formamide on the stringency of the hybridization of labeled targets to the array was also studied for the GC-rich mycobacterial genome. Furthermore, data obtained using this new platform were validated using real-time PCR and were compared with published data obtained for spotted oligonucleotide microarrays of M. tuberculosis.

2. Methods

2.1. Designing of the microarray

Information about M. tuberculosis microarrays was obtained from various databases, including MTBreg (http://www.doe-mbi.ucla.edu/Services/MTBreg/), EBI Array Express (http://www.ebi.ac.uk/arrayexpress/), NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) and Pubmed (http://www.ncbi.nlm.nih.gov/pubmed/). Information pertaining to the chronology of submitted data, type of microarray synthesis, genome coverage, probe density and number of samples processed was collected to characterize existing M. tuberculosis microarrays. M. tuberculosis H37Rv transcript sequences from Tuberculist Release 25 (http://tuberculist.epfl.ch/) were downloaded in FASTA format. Sequences for genes hitherto unannotated but shown in the literature to encode proteins were also obtained from the M. tuberculosis H37Rv genome sequence. Agilent e-array web design software (https://earray.chem.agilent.com/earray/) was used to create a custom array design to represent the whole M. tuberculosis H37Rv genome. FASTA sequences for all transcripts were uploaded in e-array, and probes of ≤60-mer length were designed according to the base composition methodology, cross-hybridization potential, GC percent and Tm (melting temperature). At least one specific probe was designed per gene, and several genes were represented by multiple probes. Sixty-seven genes with duplicated sequences in the genome were excluded from the query. All probes were cross-checked against Tuberculist (http://tuberculist.epfl.ch/) using the BLAST alignment tool to ensure specificity.

2.2. In situ microarray synthesis

Microarrays were synthesized using the proprietary non-contact SurePrint Ink-jet technology (Agilent Technologies), in which oligo monomers are deposited uniformly onto specially prepared 1 × 3 inch glass slides. The process involves the in situ synthesis of the oligonucleotide probes, base by base, from digital sequence files directly onto the glass slide using phosphoramidite synthesis chemistry (Hughes et al., 2001; Nakaya et al., 2007). The precise inkjet process results in


consistent spot uniformity and traceability, leading to high sensitivity and specificity.

2.3. Biological material

M. tuberculosis H37Rv was grown in Middlebrook 7H9 (Difco, Becton-Dickinson and Co., USA) supplemented with 0.05% Tween 80 and 10% albumin dextrose catalase (ADC, Difco, Becton-Dickinson and Co., USA) at 37 °C and 200 rpm to an OD600 of 0.4–0.5. The culture was treated with isoniazid (isonicotinylhydrazine, INH) (Sigma Aldrich, USA) at 1 μg/ml. Control cultures were grown without the drug treatment. Cells were harvested after 0, 6, 24 and 72 h of growth in either the presence or absence of the drug.

2.4. RNA extraction and amplification

RNA was isolated using RNeasy columns (Qiagen Inc., USA) and previously described methods (Balaji et al., 2013). Purity and integrity of RNA were assessed by microfluidics-based capillary electrophoresis using the RNA 6000 Nano kit on the Agilent 2100 Bioanalyzer (Agilent Technologies Inc., USA). All RNA samples had RIN values greater than 9.0.

The first priming method (PolyA-dT) involved the polyadenylation of mycobacterial RNA, followed by oligo-dT-based amplification and labeling to produce cRNA. Polyadenylation was performed using a Poly (A) polymerase tailing kit (Epicenter Biotechnologies, USA), according to the manufacturer's instructions. Briefly, 2 μg from each RNA sample, in duplicate, was treated with polyA polymerase at 37 °C for 20 min. Poly A tailed RNA was purified using RNeasy columns (QIAGEN Inc., USA) and quantified using a Nanodrop 2000c spectrophotometer (Thermo Scientific, USA). A total of 25 ng of Poly A tailed RNA was amplified and labeled using the oligo-dT-based T7 promoter primers included in the Low Input Quick Amp labeling kit (Agilent Technologies, G4140-90040). The Agilent spike-in controls included in the One-Color RNA Spike-In Kit (Agilent Technologies, 5188–5282) were simultaneously labeled and amplified with the RNA samples.

The second priming method (WT) involved random-priming based amplification and labeling of mycobacterial RNA. For this, 25 ng of the native, non-adenylated total RNA was amplified and labeled using the random nucleotide-based T7 promoter primers included in the Low Input Quick Amp WT labeling kit (Agilent Technologies, G4140-90042). The labeled cRNA was purified using RNeasy columns (QIAGEN Inc., USA). cRNA yields and specific activities were measured using a Nanodrop 2000c spectrophotometer (Thermo Scientific, USA). The cRNA profiles were also obtained by microcapillary electrophoresis using an RNA 6000 Nano kit on the Agilent 2100 Bioanalyzer (Agilent Technologies Inc., USA).

2.5. Microarray hybridization, washing and scanning

For all arrays, the Gene Expression Hybridization Kit (Agilent Technologies, 5188–5279) and associated protocols (G4140-90040) were used. To evaluate the effect of formamide on the annealing specificity of the GC-rich targets, the purified cRNA (600 ng) was fragmented with fragmentation blocking mix in the presence or absence of formamide (7%) by incubating samples at 60 °C for 30 min. Microarrays were prepared in an Agilent Technologies Hybridization Chamber according to the manufacturer's instructions (G2534-90001). Once loaded into the hybridization chamber, the samples were placed in the hybridization oven (Agilent Technologies, G2505-80085) and incubated for 17 h at 65 °C while rotating at setting 10. Following hybridization, the samples were washed according to the procedure described by Agilent Technologies. Microarray slides were scanned at a resolution of 5 μm using an Agilent Microarray scanner (G2565CA) with scan control software as per the manufacturer's instructions (G2505-90020). The settings were as follows: Agilent HD_GX_1colour (61 × 21.6 mm), TIFF 20 bit, and Photomultiplier tube (PMT) gain 100%. Scanned image


Table 1. Unannotated genes of M. tuberculosis unique to our microarray.

| S. no. | GI | Rv number | No. of replicate probes | No. of multiple probes |
|---|---|---|---|---|
| 1 | 71575–71823 | Rv0064A | 3 | 2 |
| 2 | c333342–333160 | Rv0277A | 1 | 1 |
| 3 | c547077–547355 | Rv0456A | 3 | 3 |
| 4 | c547345–547515 | Rv0456B | 3 | 1 |
| 5 | 710782–711009 | Rv0616A | 3 | 2 |
| 6 | 1073325–1073543 | Rv0959A | 3 | 2 |
| 7 | c2205558–2205277 | Rv1962A | 3 | 2 |
| 8 | c2226131–2225841 | Rv1982A | 3 | 2 |
| 9 | c2234644–2234919 | Rv1991A | 3 | 2 |
| 10 | 2320829–2321059 | Rv2063 | 3 | 2 |
| 11 | 2321055–2321462 | Rv2063A | 3 | 3 |
| 12 | c2402508–2402720 | Rv2142A | 3 | 2 |
| 13 | c2505735–2506153 | Rv2231A | 3 | 5 |
| 14 | c2506381–2506208 | Rv2231B | 3 | 2 |
| 15 | c2547085–2546840 | Rv2274A | 3 | 2 |
| 16 | 57116999 | Rv2530A | 3 | 2 |
| 17 | 57117004 | Rv2601A | 3 | 2 |
| 18 | 3110734–3110507 | Rv2801A | 3 | 2 |
| 19 | 3174744–3174989 | Rv2862A | 3 | 2 |
| 20 | c4140463–4140239 | Rv3697A | 3 | 2 |

[Fig. 1, panels A and B (image not reproduced): x-axis "Year"; pie-chart sections 23%, 53% and 24%.]

displays were analyzed by quantifying the pixel density of each hybridization spot using the Agilent Feature Extraction software (v10.5). Local background signals were subtracted from the data automatically by this software. Hybridizations were performed in duplicate.

2.6. Microarray data analysis

Raw data from the Feature Extraction software were obtained in .txt (tab-delimited) format and were normalized using GeneSpring GX v 12.0. Normalization was performed by taking the 50th percentile for each sample. A baseline correction was applied to the median of all samples. The expected versus observed measurements of the hybridization control probes were compared for hybridization quality control for each sample. Hybridization quality control for replicate samples was performed by measuring the correlation coefficient, performing a principal component analysis (PCA), creating scatter plots and creating an unsupervised condition tree of samples using the Pearson uncentered algorithm with average linkage rule. The normalized data from the microarray gene expression experiment have been submitted to NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and can be queried via GEO series accession number GSE45811 with the microarray platform accession number GPL16972. Alternate hybridization methods were compared by performing differential expression analysis of samples processed using the PolyA-dT method versus the WT method using a Volcano plot. Probes that showed a 2-fold or higher change with a p value < 0.05 (unpaired Student t-test) were considered to be differentially expressed. Similar comparisons were also made between samples treated with and without formamide.

[Fig. 1 (image not reproduced): panels A and C plot "No. of Experiments"; density classes: Low Density, Medium Density, High Density, Very High Density; probe synthesis technologies: In situ oligonucleotide, Spotted DNA/cDNA, Spotted oligonucleotide.]

Fig. 1. Review of existing Mycobacterium tuberculosis H37Rv microarrays. (A) Chronological distribution of published M. tuberculosis microarray designs. Each bar shows the number of microarray experiments performed per year for the different microarray probe synthesis technologies. (B) The pie chart shows the distribution of different microarray types for M. tuberculosis. The relative size of each section corresponds to the number of experiments performed using different microarray probe synthesis technologies. (C) Classification of M. tuberculosis microarrays by probe density and probe synthesis technology.

2.7. Quantitative real-time PCR analysis

Real-time PCR was carried out as per the recommended MIQE guidelines (Bustin et al., 2010) and previously described methods (Balaji et al., 2013). The reactions were performed in a final volume of 10 μl, each containing 0.3 μM of each oligonucleotide, 1 μl of cDNA (1 ng), and the Power SYBR Green Master Mix (Life Technologies Corporation, USA). cDNA was obtained from DNA-free intact RNA (RIN ≥ 9) using the random priming based Superscript III First-strand synthesis system

[Fig. 2 (image not reproduced): y-axis "Fold Change" (−6.00 to 6.00); x-axis "Time" (6 Hrs, 24 Hrs, 72 Hrs); genes plotted: Rv0456B, Rv2274A, Rv0959A, Rv2063, Rv2142A, Rv2862A, Rv2063A, Rv3697A.]

Fig. 2. Fold change analysis for the twenty unannotated genes unique to our array platform. Eight genes showed time-based changes in expression in M. tuberculosis across 6, 24 and 72 h of growth in culture. Asterisks denote significant fold change with a p value < 0.05.


Table 2. Probe distribution for M. tuberculosis array design.

| Group | Type | No. of targets | No. of probes | Purpose |
|---|---|---|---|---|
| 1 a | All features | 4073 | 15744 | Comprehensive coverage |
| 2 b | M. tuberculosis genes | 4016 | 15208 | Transcriptional profiling |
| 3 | Genes with multiple probes (replicated) | 664 | 5284 | Determine inter-probe variance |
| 4 | Genes with one unique probe (replicated) | 3285 | 9857 | Determine intra-probe variance |
| 5 | Genes with one unique probe (without replicate) | 67 | 67 | Host specificity |
| 6 | Positive & negative control | 57 | 536 | Normalization/manufacturer requirement |

a Group 1 = Sum of Group 2 and 6.
b Group 2 = Sum of Group 3, 4 and 5.
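The two footnotes of Table 2 describe simple sums, so the table can be checked mechanically. The following is a minimal sketch, assuming the counts are transcribed from Table 2 into two dictionaries keyed by group number; the helper name `footnotes_hold` is hypothetical and not part of the original work.

```python
# Counts transcribed from Table 2, keyed by group number.
targets = {1: 4073, 2: 4016, 3: 664, 4: 3285, 5: 67, 6: 57}
probes = {1: 15744, 2: 15208, 3: 5284, 4: 9857, 5: 67, 6: 536}


def footnotes_hold(counts):
    """Check the two footnote relations of Table 2 for one column.

    Footnote a: Group 1 = Group 2 + Group 6.
    Footnote b: Group 2 = Group 3 + Group 4 + Group 5.
    """
    rule_a = counts[1] == counts[2] + counts[6]
    rule_b = counts[2] == counts[3] + counts[4] + counts[5]
    return rule_a and rule_b


if __name__ == "__main__":
    print("targets consistent:", footnotes_hold(targets))
    print("probes consistent:", footnotes_hold(probes))
```

Both columns satisfy the footnote relations with the values printed in the table.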

(Invitrogen, Life Technologies Corporation, USA) as per the manufacturer's instructions. Control reactions were set up without the addition of reverse transcriptase (minus RT control). The specificity of the reverse transcription and the PCR reactions was monitored by using minus RT (NAC), minus primer, and minus template controls (NTC). Dissociation curves were generated for each set of primers to check their specificity and to confirm the presence of a unique PCR product. Results were analyzed using the 16S rRNA as a control for normalizing the amount of cDNA. Relative quantification based on the ΔΔCt method was used. Normalization: $\Delta C_t = C_t(\text{sample}) - C_t(\text{16S rRNA})$, with the 16S rRNA measured at the corresponding DNA concentration. The difference between the $\Delta C_t$ of the treated and control samples is referred to as $\Delta\Delta C_t$. The fold change was calculated as $2^{-\Delta\Delta C_t}$. The efficiencies of all primer sets for all the genes for relative quantification were in the same range.
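The ΔΔCt calculation described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the Ct values in the example are hypothetical, and the formula assumes comparable, near-100% amplification efficiency for the target gene and the 16S rRNA reference.

```python
def delta_ct(ct_gene, ct_reference):
    """Normalize a target gene Ct to the reference (here 16S rRNA) Ct."""
    return ct_gene - ct_reference


def fold_change(ct_gene_treated, ct_ref_treated, ct_gene_control, ct_ref_control):
    """Relative expression (treated vs. control) by the delta-delta-Ct method.

    Assumes a doubling of product per cycle for both target and reference.
    """
    ddct = delta_ct(ct_gene_treated, ct_ref_treated) - delta_ct(
        ct_gene_control, ct_ref_control
    )
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values, chosen only to illustrate the arithmetic:
    # treated delta-Ct = 22 - 10 = 12, control delta-Ct = 24 - 10 = 14,
    # delta-delta-Ct = -2, so the gene is 4-fold up in the treated sample.
    print(fold_change(22.0, 10.0, 24.0, 10.0))
```

A negative ΔΔCt therefore corresponds to induction in the treated sample, and a positive ΔΔCt to repression.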

3. Results and discussion

3.1. Review of existing M. tuberculosis H37Rv microarrays

The first report of the use of microarray technology and the first submission of a microarray design for M. tuberculosis were in 2003; however, the majority of design submissions for M. tuberculosis have occurred since 2006 (Fig. 1A). A comprehensive analysis of the existing 58 microarray designs for M. tuberculosis reported in TB databases revealed that more than 50% of these designs were synthesized as spotted oligonucleotide arrays, while the remainder were shared between in situ synthesized oligonucleotide arrays and spotted cDNA arrays (Fig. 1B). Spotted cDNA microarray designs were used only in low- to medium-density formats (Fig. 1C). The vast majority of the spotted oligonucleotide microarray designs were in the low-density format, while in situ synthesized oligonucleotide microarray designs consisted of both medium- and low-density formats (Dufva, 2005). Due to the advancement in fabrication technology and the wide availability of custom arrays, in situ synthesized oligonucleotide arrays have become the preferred microarray platform for whole genome expression profiling (Postier et al., 2008; Rosu et al., 2013).

3.2. A new in situ synthesized oligonucleotide microarray for M. tuberculosis

A new microarray was designed for M. tuberculosis H37Rv. The probes were designed as ≤60-mer in length and covered 99% of all genes (4016 of 4062 genes) annotated in the M. tuberculosis H37Rv genome. Probes for 20 genes of M. tuberculosis that were hitherto unannotated but have been described as having biological functions were incorporated into our array (Table 1). These genes are not present on any whole-genome array of M. tuberculosis reported to date, thus making our array design more complete for future analyses of the M. tuberculosis transcriptome. Growth-dependent changes in expression levels were observed for several of these genes (Fig. 2), indicating that these genes are actively expressed in M. tuberculosis and need to be part of future expression studies to understand their role in mycobacteria. These genes are part of toxin–antitoxin loci of M. tuberculosis and could be components of stress response pathways (Gupta, 2009). Our microarray comprises 15,744 probes (including the control features) for a total of 4073 targets. Of these, 15,208 probes represent 4016 mycobacterial genes, and 664 of these genes have multiple probes replicated on the array, ultimately accounting for 5284 spots. In addition, 3285 genes have one unique probe replicated three times on the array, and 67 genes have one unique probe present as a single spot on the array. A detailed summary of the microarray design is provided in Table 2.

[Fig. 3 (images not reproduced): panels "Array 1" and "Array 2".]

Fig. 3. Images of replicate arrays.

[Fig. 4A (image not reproduced): pie chart with sections 97% and 3%; legend classes ≤0.25 and ≥0.26.]

3.3. Testing and validation of the new microarray design

RNA isolated from M. tuberculosis H37Rv was amplified, labeled with Cy3 and then processed on microarray slides, as described in the Methods section. Replicates of the array showed similar fluorescence profiles, a high signal-to-noise ratio, and consistent spot morphology (Fig. 3). Another measure of overall array quality was the average signal-to-noise ratio (S/N), defined as $\mathrm{S/N} = (g\,\text{signal} - g\,\text{background signal}) / (g\,\text{background SD})$, where g denotes the green (Cy3) channel. Generally, S/N ratios greater than 10 indicate high-quality arrays (Gill et al., 2002). The average S/N for our arrays was 2800, with a range of 16–680,000 for all probes. The quality control check for the probes indicated that all of the newly designed probes had a Tm between 78 and 93 °C, a GC content of 40–82% and lengths of 44–60 bases. The probe sequences were checked for sequence dust, vector masking/repeat masking, dust masking, and low complexity masking. None of the probes were outliers of the above rules, and the 5120 uniquely designed probes were thus determined to be highly specific.
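The signal-to-noise definition used for array quality in Section 3.3, (signal − background signal) / background SD, can be written out as a short sketch. The function names and the example intensities below are hypothetical; only the formula and the S/N > 10 rule of thumb come from the text.

```python
def signal_to_noise(signal, background, background_sd):
    """S/N = (signal - background signal) / background SD for one feature."""
    if background_sd <= 0:
        raise ValueError("background SD must be positive")
    return (signal - background) / background_sd


def is_high_quality(snr, threshold=10.0):
    """Apply the rule of thumb that S/N above 10 indicates a good feature."""
    return snr > threshold


if __name__ == "__main__":
    # Hypothetical green-channel intensities for a single feature.
    snr = signal_to_noise(signal=1450.0, background=50.0, background_sd=5.0)
    print(snr, is_high_quality(snr))
```

Averaging this quantity over all features of an array gives the per-array figure that the text compares against the threshold.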

[Fig. 4B and C (images not reproduced): y-axis "Log Signal Intensity" (0 to 10); genes shown: Rv0030, Rv0038, Rv0041 and Rv0046c.]

3.4. Probe reproducibility

Our array contained three replicates of the same probe (primary probe) for 3947 genes and four replicates of the same probe for two other genes. The 5043 probe sequences specific to all 3949 genes were thus replicated in a total of 15,131 spots on the array. We determined variation in signal intensities for probes with identical sequences by comparisons with the same target gene that was present in different locations on the array. This allowed us to determine whether the location of the probe within the microarray had any effect on reported signal intensity. As shown in Fig. 4A, 97% of the primary probes exhibited excellent reproducibility. Only 3% of all primary probes showed variation among replicates. Fig. 4B provides an example of the reproducibility of primary probes contained on the microarray. All replicates of a primary probe for Rv0030, Rv0038, Rv0041 and Rv0046c returned similar log signal intensity values, with a CV of 2%. The average coefficient of variation (CV) of the total replicate probes was 2%, which is considered to be an exceptional quality of reproducibility (Shi et al., 2006). The excellent overall consistency of the primary probes was indicative of high-quality probe synthesis, hybridization and slide washing, thus resulting in minimal spot-to-spot variation in these arrays.

3.5. Alternative probe reproducibility

We next examined the variance associated with probes of alternative sequences directed against the same target gene. Different probes

Fig. 4. Probe reproducibility analysis. (A) A pie chart showing the overall distribution of variance among primary probes. (B) Three replicates of a probe of identical sequence directed against the same target gene were present on the array. The plot shows the log of signal intensity for all the replicates of each probe sequence for four different target genes. (C) Four probes with different sequences for a single target gene were present as three replicates each on our array. The average signal intensity for the three replicates of each probe is shown for each bar. The four bars for each gene depict the four alternative probes for a single target. Data for the four different probe sequences for the four genes are shown.

designed against the same gene can have different affinities for the target sequence, thus raising issues of probe-to-probe variation and the accuracy of the results. Our array contained 2–8 probes with different sequences for a single target gene spotted as three replicates each. This was performed for 664 genes, accounting for 5625 features on the array, and allowed us to study alternative probe reproducibility. As shown in Fig. 4C, alternative probes of the genes Rv0030, Rv0038, Rv0041, and Rv0046c returned similar log signal intensity values, with an average CV of 1.92%. Similar variation was observed for other genes. Inter-array variation was checked for replicate samples. Excellent reproducibility was obtained between the replicates for low- and high-intensity signals, indicating minimal inter-array variation and reproducible target preparation and hybridization (Fig. 5A and B). These results show that our platform has good accuracy for reporting biological results.
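The coefficient of variation used in the reproducibility analyses above can be computed per probe group as in the sketch below. The replicate values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def average_cv(replicates_by_probe):
    """Average the per-probe CV over a mapping of probe name -> replicates."""
    cvs = [coefficient_of_variation(v) for v in replicates_by_probe.values()]
    return sum(cvs) / len(cvs)


if __name__ == "__main__":
    # Hypothetical log signal intensities for three replicate spots per probe.
    replicates = {
        "probe_1": [8.0, 8.2, 8.1],
        "probe_2": [6.0, 6.0, 6.0],
    }
    for name, values in replicates.items():
        print(name, round(coefficient_of_variation(values), 2))
    print("average CV:", round(average_cv(replicates), 2))
```

The same function applies to alternative probes of one gene if the values passed in are the per-probe averages rather than replicate spots.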

[Fig. 5 (image not reproduced): panels A and B, replicates 1 and 2; panel C, "Processed Signal Vs Concentration", x-axis "Log (Concentration)" (−0.20 to 6.80), y-axis "Log (Green Processed signal)" (−0.12 to 5.38).]

Fig. 5. (A and B) Heat maps showing the gene expression from replicate arrays. (C) Dynamic range and linearity of the spike-in controls. Data representing the green signal for each spike-in transcript are plotted against the log of the relative concentration for one microarray. The line shown on the plot represents the linear range based on a parametric curve fit through the data.

3.6. Quality control of the microarray

Positive controls, which comprised the Spike-in RNA controls from Agilent Technologies, were processed with our sample. These Spike-in controls are a mixture of 10 in vitro synthesized, polyadenylated transcripts derived from the Adenovirus E1A gene that were premixed at concentrations that span six logs and differ by one-log or half-log increments. Data representing the log of the green processed signal for each Spike-in transcript are plotted against the log of the relative concentration in Fig. 5C. Because the graph is plotted on a log scale, the error bars can be evaluated as CVs. As evident from the plot, the spike-in controls showed good linearity in our array with minimal CV. The Quality Control evaluation metrics showed good representations of each metric, thus indicating specific hybridization (data not shown).
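Linearity of the spike-in response can be summarized by fitting a straight line to log signal versus log concentration. The sketch below uses ordinary least squares as a simple stand-in for the parametric curve fit mentioned in the Fig. 5 caption; it is not the Feature Extraction algorithm, and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical spike-in data: log10 relative concentration vs. log10 signal.
    log_conc = [1.0, 2.0, 3.0, 4.0]
    log_signal = [0.5, 1.5, 2.5, 3.5]
    slope, intercept, r2 = fit_line(log_conc, log_signal)
    print(slope, intercept, r2)
```

A slope close to 1 with an r² close to 1 over the tested range is what "good linearity" would look like in this simplified view.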

3.7. Effect of priming methodologies on the quality and reliability of measurements

Two different protocols (PolyA-dT and WT) for RNA amplification and Cy3 labeled target preparation were compared using the new M. tuberculosis microarray. Total RNA samples isolated from the early exponential stage (A600 ~ 0.4) cultures of M. tuberculosis grown under normal conditions and following treatment with isoniazid (INH; 1 μg/ml for 6 h), a frontline TB drug, were used for this study. The first priming method (PolyA-dT) involved the initial polyadenylation of mycobacterial RNA followed by oligo-dT based amplification and labeling to produce cRNA. The second priming method (WT) involved random-priming based amplification and labeling of mycobacterial RNA to produce cRNA. The specific activity and yield of labeled cRNA


obtained by both protocols were comparable. The PolyA-dT based amplification protocol produced longer cRNA than did the WT-based amplification process (data not shown). This result was expected because the PolyA-dT method primes from the polyA tail at the 3′ end, while the WT method primes randomly in the transcript. Microarray data obtained from each method were evaluated for reproducibility between replicates using the correlation coefficient, principal component analysis, hybridization control signals, and an unsupervised condition tree. For both within and between M. tuberculosis growth stages, a significant correlation was observed between arrays for samples processed using each method. Fig. 6A and B show a high level of similarity between gene expression profiles from the 0- and 6-hour samples. However, samples processed using the PolyA-dT method had a different profile from those processed using the WT method. The detectability of expressed genes in samples processed by each of the two protocols was examined using p values. The cumulative relative frequency of the p values (for any value x, the fraction of p values less than or equal to x) was plotted for each protocol (Fig. 6C). There was no discernible deviation in the relative frequencies between the PolyA-dT and WT methods, indicating a similar sensitivity of measurement for

both methods. Almost all probe sets detected with the PolyA-dT protocol were also detected (at the same p value cutoff) with the WT protocol.

3.8. Comparison of the ability to detect differentially expressed genes using the PolyA-dT and WT protocols

The microarray data were analyzed for differential expression of genes in response to INH treatment. Regardless of protocol, more probe sets were detected in the INH-treated sample than in the untreated sample. Almost all probe sets detected with the WT protocol were also detected with the PolyA-dT protocol. Unsupervised hierarchical clustering of control and treated samples clearly showed a similarity between the samples processed using the same protocol (Fig. 7A). An analysis of differentially expressed genes by varying fold change (≥1.5, ≥2, ≥4, ≥8) and p value (≤0.05, ≤0.1 and >0.1) between treated and untreated samples processed by the PolyA-dT and WT protocols was also performed. The results showed that the majority of the differentially expressed genes identified under the PolyA-dT protocol were also differentially expressed under the WT protocol (Fig. 7B and

[Fig. 6 image: panels A and B show PolyA-dT and WT samples at 0 h and 6 h; panel C plots relative frequency (y-axis, 0 to 1.2) against p value (x-axis, 0 to 1) for the PolyA-dT and WT protocols.]

Fig. 6. (A) Principal component analysis of gene expression profiles obtained from samples collected at 0 and 6 h after the exponential phase. (B) The hierarchical condition tree analysis, performed using the Pearson uncentered correlation and average linkage rule. The branches of the condition tree are colored to discriminate the two subclusters with the largest distance, which correspond to the two different priming methods. The PolyA-dT method is in green, and the WT method is in black. The lower color bar indicates distinct time-points: 0 h (red) and 6 h (blue). (C) Empirical cumulative distributions of the detection p values: each p value (x-axis) is plotted against the relative frequency (y-axis) of values less than or equal to it. Plots show samples processed using the PolyA-dT and WT protocols. The gap between curves in the plots depicts the variation between the two methods.
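The relative-frequency curves in panel C can be reproduced from any list of detection p values. The following is a minimal sketch in plain Python, assuming hypothetical p values for the two protocols; the function names and the input numbers are illustrative and are not taken from the study.

```python
from bisect import bisect_right


def relative_frequency(p_values, x):
    """Fraction of p values that are less than or equal to x."""
    ordered = sorted(p_values)
    return bisect_right(ordered, x) / len(ordered)


def ecdf_curve(p_values, grid):
    """Relative frequency evaluated at every point of the grid."""
    ordered = sorted(p_values)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def max_gap(p_a, p_b, grid):
    """Largest vertical gap between two empirical curves on the grid."""
    curve_a = ecdf_curve(p_a, grid)
    curve_b = ecdf_curve(p_b, grid)
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))


# Hypothetical detection p values for the two protocols (illustrative only).
polya_dt = [0.001, 0.004, 0.02, 0.03, 0.2, 0.6]
wt = [0.002, 0.005, 0.01, 0.04, 0.3, 0.7]
grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

print(ecdf_curve(polya_dt, grid))
print(max_gap(polya_dt, wt, grid))
```

A small maximum gap between the two curves corresponds to the similar sensitivity described in the text.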

B. Venkataraman et al. / Journal of Microbiological Methods 97 (2014) 34–43


Table 3
Number of differentially expressed genes detected using each protocol (PolyA-dT and WT) as a function of the minimum fold change and p value.

Fold change   p value   PolyA-dT (a)   WT (b)   Both (c)
1.5           <0.05     714            574      474
1.5           <0.1      715            577      474
1.5           >0.1      5              8        475
2             <0.05     178            145      126
2             <0.1      178            145      126
2             >0.1      0              2        0
4             <0.05     32             34       29
4             <0.1      32             34       29
4             >0.1      0              1        0
8             <0.05     13             11       12
8             <0.1      13             11       12
8             >0.1      0              0        0

(a) Number of genes differentially expressed using the PolyA-dT protocol.
(b) Number of genes differentially expressed using the WT protocol.
(c) Number of genes differentially expressed by both protocols.
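Counts of the kind tabulated in Table 3 come from filtering per-gene results at a fold-change and p value cut-off and then intersecting the gene lists from the two protocols. The sketch below shows one way to do this in plain Python. The gene identifiers and values are hypothetical, and treating fold changes below 1 as downregulation (scored by their reciprocal) is an assumption of this sketch rather than a detail given in the text.

```python
def differentially_expressed(results, min_fold, max_p):
    """Return the gene IDs that pass the fold-change and p value cut-offs.

    `results` maps gene ID -> (fold_change, p_value). Fold changes are
    positive ratios; values below 1 denote downregulation and are scored
    by their reciprocal (an assumption of this sketch).
    """
    selected = set()
    for gene, (fold, p_value) in results.items():
        magnitude = fold if fold >= 1 else 1 / fold
        if magnitude >= min_fold and p_value <= max_p:
            selected.add(gene)
    return selected


# Hypothetical treated/untreated results for each protocol (illustrative only).
polya_results = {
    "geneA": (2.5, 0.01),
    "geneB": (1.6, 0.04),
    "geneC": (0.2, 0.03),
    "geneD": (9.0, 0.20),
}
wt_results = {
    "geneA": (2.1, 0.02),
    "geneB": (1.2, 0.04),
    "geneC": (0.4, 0.01),
    "geneD": (8.5, 0.04),
}

for min_fold in (1.5, 2, 4, 8):
    polya = differentially_expressed(polya_results, min_fold, 0.05)
    wt = differentially_expressed(wt_results, min_fold, 0.05)
    # Columns: cut-off, PolyA-dT count, WT count, count shared by both.
    print(min_fold, len(polya), len(wt), len(polya & wt))
```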

protocols. Nineteen genes were detected only by the WT protocol and 65 genes only by the PolyA-dT protocol.

3.9. Comparison of functional gene categories identified as differentially expressed by the PolyA-dT and WT methods

[Fig. 7 image: panels A and B show PolyA-dT and WT samples, control and INH treated; panel C is a Venn diagram of PolyA-dT (210 entities) and WT (164 entities).]

Tables 4 and 5 show the functional gene categories that the two methods identified as differentially expressed. Although the number of genes identified as differentially expressed in each category differed between the two methods, the functional pathways identified as being affected by the isoniazid treatment were the same for both methods. This indicates that both protocols were equally efficient at determining the expression profiles for the mycobacterial genome under the given conditions. Among the genes that were identified as differentially expressed by at least one protocol (PolyA-dT or WT) with a p value of ≤0.05 and a fold change greater than 2, a few genes were validated using qRT-PCR. The qRT-PCR showed changes in expression levels similar to those observed with the PolyA-dT and WT methods (data not shown), substantiating the reproducibility and accuracy of the microarray. The random-priming WT method, being easier, could be the method of choice for processing mycobacterial samples for microarray analysis.

3.10. Effect of formamide on hybridization stringency

Fig. 7. (A) The hierarchical condition tree analysis, performed using the Pearson uncentered correlation and average linkage rule. The PolyA-dT method is shown in red and the WT method in orange. This is summarized in the color bar underneath the cluster diagram. The lower color bar indicates two conditions: control (black) and 6 h (yellow). (B) The unsupervised hierarchical clustering using the Pearson uncentered algorithm with the average linkage rule for the PolyA-dT and WT methods. (C) Venn diagram representation of genes identified as differentially regulated with respect to the control (treated/untreated) using the PolyA-dT and WT methods with a fold change of 2.0 and false discovery rate of p < 0.05.
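The condition trees in Figs. 6B and 7A use the Pearson uncentered correlation with the average linkage rule. The uncentered Pearson correlation is the cosine of the angle between two expression profiles, and average linkage merges the two clusters with the smallest mean pairwise distance. The following plain-Python sketch illustrates both steps on hypothetical profiles; it is not the software used in the study, and the sample names and values are invented for illustration.

```python
from math import sqrt


def uncentered_pearson(x, y):
    """Uncentered Pearson correlation (cosine similarity) of two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def average_linkage(profiles):
    """Agglomerate samples by average linkage on distance 1 - correlation.

    `profiles` maps sample name -> list of expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - uncentered_pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, ca in enumerate(clusters):
            for cb in clusters[i + 1:]:
                # Mean of all pairwise sample distances between two clusters.
                total = sum(dist[frozenset((a, b))] for a in ca for b in cb)
                d = total / (len(ca) * len(cb))
                if best is None or d < best[2]:
                    best = (ca, cb, d)
        ca, cb, d = best
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca + cb]
        merges.append((ca, cb, d))
    return merges


# Hypothetical log-scale expression profiles (illustrative only).
profiles = {
    "PolyA_0h": [1.0, 2.0, 3.0, -1.0],
    "PolyA_6h": [1.1, 2.1, 2.9, -0.9],
    "WT_0h": [-1.0, 2.0, 0.5, 3.0],
    "WT_6h": [-0.9, 2.1, 0.4, 3.1],
}

for cluster_a, cluster_b, distance in average_linkage(profiles):
    print(cluster_a, cluster_b, round(distance, 4))
```

In this toy input the two time points of each protocol merge first and the protocols join last, which mirrors the grouping by priming method described in the captions.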

Table 3). At lower fold changes for each p value cut-off, more differentially expressed probe sets were discovered with the PolyA-dT protocol than with the WT protocol (compare columns 3 and 4 in Table 3). At higher fold change values, the differentially expressed probe sets were similar for the two protocols. Fig. 7C displays a Venn diagram comparing protocol performance for the criterion fold change of 2 and the p value of ≤ 0.05. Using these criteria, 145 genes were found to be differentially expressed by both

The effect of formamide during the hybridization of labeled cRNA on the array was studied for RNA samples that were obtained from the exponential phase (OD600 ~0.4) culture of M. tuberculosis grown for 24 and 72 h and processed using the WT method. Since the mycobacterial genome is GC-rich, the presence of formamide during hybridization should increase the stringency of annealing of the labeled sample to the probes, thereby improving specificity and reducing background noise. An analysis of data obtained for hybridizations of the same samples in the absence and presence of formamide showed that there was a very high level of correlation between replicates, independent of formamide. Hierarchical condition tree analysis showed high levels of similarity between samples processed with and without formamide (Supplementary Fig. S1A). Of the differentially detected genes, 90% were common to samples processed with and without formamide (Supplementary Fig. S1B), irrespective of time points (p value ≤0.05 and fold change of 2 and above). This result indicates that formamide, at the concentrations tested, did not significantly alter the hybridization stringency.

3.11. Comparison of data obtained using the in situ synthesized oligonucleotide-based microarray versus a spotted oligonucleotide-based microarray

The data obtained for RNA samples isolated after the 6-hour treatment with isoniazid at a concentration of 1 μg/ml and processed using


Table 4
Genes differentially expressed using the PolyA-dT protocol classified into M. tuberculosis functional gene categories. Columns give the number of genes detected using PolyA-dT with p value < 0.05 and the number of genes detected using WT with p value < 0.05, < 0.1 and > 0.1.

Functional category                                               PolyA-dT   WT         WT         WT
                                                                  p < 0.05   p < 0.05   p < 0.1    p > 0.1
Degradation (I.A)                                                 5          5          5          0
Energy metabolism (I.B)                                           19         18         18         1
Central intermediary metabolism (I.C)                             2          1          1          1
Amino acid biosynthesis (I.D)                                     1          1          1          0
Purines, pyrimidines, nucleosides and nucleotide (I.F)            2          2          2          0
Biosynthesis of cofactors, prosthetic groups and carriers (I.G)   1          1          1          0
Lipid biosynthesis (I.H)                                          12         12         12         0
Polyketide & non-ribosomal peptide synthesis (I.I)                2          2          2          0
Broad regulatory functions (I.J)                                  10         10         10         0
Synthesis and modification of macromolecules (II.A)               3          3          3          0
Degradation of macromolecules (II.B)                              6          6          6          0
Cell envelope (II.C)                                              15         15         15         0
Transport/binding proteins (III.A)                                5          5          5          0
Chaperones/heat shock (III.B)                                     1          1          1          0
Protein and peptide secretion (III.D)                             1          0          0          1
Detoxification (III.F)                                            3          3          3          0
Virulence (IV.A)                                                  3          3          3          0
IS elements, repeat sequences, phage (IV.B)                       3          2          2          1
PE and PPE families (IV.C)                                        9          9          9          0
Miscellaneous transferases (IV.H)                                 1          1          1          0

our in situ synthesized oligonucleotide-based microarray were compared with the data reported for similarly treated samples processed using a spotted oligonucleotide microarray (Karakousis et al., 2008). Of the 3862 genes analyzed, 12 genes (11 up- and 1 downregulated) were reported to be differentially expressed by Karakousis et al., while 139 (87 up-, 58 downregulated) and 177 (110 up-, 67 downregulated) genes were identified as differentially expressed (p value ≤ 0.05 and fold change of 2) by our WT- and PolyA-dT-based arrays, respectively (Fig. 8). The single downregulated gene and eight of the eleven upregulated genes identified by Karakousis et al. were also identified by our platform. The higher number of differentially expressed genes obtained with our microarray might result from differences between the microarray platforms with respect to probe sequences and fabrication procedure. Our array comprises oligonucleotides synthesized on the slide, which results in more consistent spot uniformity and traceability, leading to higher sensitivity and specificity compared with spotting of the probe oligonucleotides on the glass slide. Further, the RNA processing methodology and the sensitivity of the microarray scanner can also make a difference in the signals

obtained. Nonetheless, the trend for the up- or downregulation of differentially expressed genes was similar for all three data sets and was validated by real-time PCR (Table 6).

4. Conclusions

The new M. tuberculosis microarray described here is a more complete representation of the genome than any other array design reported to date. It has been updated to include probes for genes that were unannotated but were shown to have biological functions. The new design has been rigorously validated for its specificity and performance. This microarray allows for the generation of high-quality data, which are in concordance with previously published data. Further, the sensitivity of this new microarray platform is superior to that of the older microarrays developed using spotted oligonucleotide technology. Using optimized methods for sample preparation, labeling and hybridization, this new microarray enables accurate gene expression profiling of M. tuberculosis for biological studies.

Table 5
Genes observed to be differentially expressed using the WT protocol classified into M. tuberculosis functional gene categories. Columns give the number of genes detected using WT with p value < 0.05 and the number of genes detected using PolyA-dT with p value < 0.05, < 0.1 and > 0.1.

Functional category                                               WT         PolyA-dT   PolyA-dT   PolyA-dT
                                                                  p < 0.05   p < 0.05   p < 0.1    p > 0.1
Degradation (I.A)                                                 4          4          4          0
Energy metabolism (I.B)                                           17         17         17         0
Central intermediary metabolism (I.C)                             1          1          1          0
Amino acid biosynthesis (I.D)                                     2          2          2          0
Purines, pyrimidines, nucleosides and nucleotide (I.F)            3          3          3          0
Biosynthesis of cofactors, prosthetic groups and carriers (I.G)   1          1          1          0
Lipid biosynthesis (I.H)                                          11         11         11         0
Polyketide & non-ribosomal peptide synthesis (I.I)                2          2          2          0
Broad regulatory functions (I.J)                                  4          4          4          0
Synthesis and modification of macromolecules (II.A)               4          4          4          0
Degradation of macromolecules (II.B)                              5          5          5          0
Cell envelope (II.C)                                              13         13         13         0
Transport/binding proteins (III.A)                                4          4          4          0
Chaperones/heat shock (III.B)                                     2          2          2          0
Protein and peptide secretion (III.D)                             0          0          0          0
Detoxification (III.F)                                            2          2          2          0
Virulence (IV.A)                                                  2          2          2          0
IS elements, repeated sequences and phage (IV.B)                  2          2          2          0
PE and PPE families (IV.C)                                        4          4          4          0
Miscellaneous transferases (IV.H)                                 1          1          1          0


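[Fig. 8 image: three-set Venn diagram, counts given as upregulated/downregulated entities. Set totals: PolyA-dT 110/67, WT 87/58, Karakousis 11/1. Region counts: PolyA-dT only 28/22; WT only 6/13; PolyA-dT and WT only 74/44; PolyA-dT and Karakousis only 1/0; all three 7/1; WT and Karakousis only 0/0; Karakousis only 3/0.]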

Fig. 8. Venn diagram representation of transcripts observed to be differentially expressed following INH treatment measured by the PolyA-dT and WT methods and according to published data of Karakousis et al.
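The region counts of a three-way Venn diagram such as Fig. 8 follow directly from set operations on the three gene lists. A minimal plain-Python sketch, assuming hypothetical gene identifiers (`g1` to `g6`) in place of the real lists:

```python
def venn3_regions(a, b, c):
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical gene lists for three data sets (illustrative only).
polya_dt = {"g1", "g2", "g3", "g4"}
wt = {"g2", "g3", "g5"}
published = {"g3", "g4", "g6"}

regions = venn3_regions(polya_dt, wt, published)
for name, size in regions.items():
    print(name, size)
```

Because the seven regions are disjoint, their sizes add up to the size of the union of the three lists, which gives a quick consistency check on any such diagram.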

Table 6
Comparison of the fold change obtained for differentially expressed genes by Karakousis et al. with that obtained using the new array platform with the PolyA-dT and WT protocols.

                      Karakousis et al.       New array platform
ORF       Gene name   Rep1    Rep2    Rep3    WT (b)   PolyA-dT (b)   qRT-PCR (b)
Rv3139    fadE24      8.13    4.82    7.14    2.39     2.44           1.68
Rv2245    kasA        3.26    4.53    13.91   10.51    10.48          9.85
Rv0341    iniB        4.41    13.36   16.45   14.24    19.51          12.77
Rv2246    kasB        1.9     ND (a)  17.39   9.1      9.68           7.62
Rv2846c   efpA        3.09    ND (a)  2.76    11.16    11.52          7.92
Rv2244    acpM        2.35    4.11    7.19    8.71     11.06          7.11
Rv1955    higB        1.06    0.6     1.87    1.29     1.42           1.18

(a) Not detected.
(b) Fold change with a p value of <0.05 in INH treated against untreated samples in triplicates.
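The agreement between the array and qRT-PCR fold changes in Table 6 can be checked numerically. The sketch below computes a Pearson correlation on the tabulated values and checks that every gene changes in the same direction on all three measurements. Pearson correlation is a choice made for this illustration; the text does not state which concordance measure, if any, the authors used.

```python
from math import sqrt


def pearson(x, y):
    """Centered Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Fold changes from Table 6 (new array platform columns), in table row order.
genes = ["fadE24", "kasA", "iniB", "kasB", "efpA", "acpM", "higB"]
wt = [2.39, 10.51, 14.24, 9.1, 11.16, 8.71, 1.29]
polya_dt = [2.44, 10.48, 19.51, 9.68, 11.52, 11.06, 1.42]
qrt_pcr = [1.68, 9.85, 12.77, 7.62, 7.92, 7.11, 1.18]

print("WT vs qRT-PCR:", pearson(wt, qrt_pcr))
print("PolyA-dT vs qRT-PCR:", pearson(polya_dt, qrt_pcr))

# A fold change above 1 means higher expression in the INH-treated sample.
same_direction = all(
    (w > 1) == (p > 1) == (q > 1) for w, p, q in zip(wt, polya_dt, qrt_pcr)
)
print("Same direction for every gene:", same_direction)
```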

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.mimet.2013.12.009.

Acknowledgments

The authors gratefully acknowledge Professor V. K. Chaudhary for critical evaluation of the manuscript. This work was supported by a grant from the Department of Biotechnology, Government of India.

References

Arcus, V.L., Rainey, P.B., Turner, S.J., 2005. The PIN-domain toxin–antitoxin array in mycobacteria. Trends Microbiol. 13, 360–365.
Balaji, V., Gupta, N., Gupta, A., 2013. A robust and efficient method for the isolation of DNA-free, pure and intact RNA from Mycobacterium tuberculosis. J. Microbiol. Methods 93, 198–202.
Bustin, S.A., Beaulieu, J.F., Huggett, J., Jaggi, R., Kibenge, F.S., Olsvik, P.A., Penning, L.C., Toegel, S., 2010. MIQE precis: practical implementation of minimum standard guidelines for fluorescence-based quantitative real-time PCR experiments. BMC Mol. Biol. 11, 74.


Chengalvala, M.V., Chennathukuzhi, V.M., Johnston, D.S., Stevis, P.E., Kopf, G.S., 2007. Gene expression profiling and its practice in drug development. Curr. Genomics 8, 262–270.
DeRisi, J., Penland, L., Brown, P.O., Bittner, M.L., Meltzer, P.S., Ray, M., Chen, Y., Su, Y.A., Trent, J.M., 1996. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14, 457–460.
Dufva, M., 2005. Fabrication of high quality microarrays. Biomol. Eng. 22, 173–184.
Gill, R.T., Katsoulakis, E., Schmitt, W., Taroncher-Oldenburg, G., Misra, J., Stephanopoulos, G., 2002. Genome-wide dynamic transcriptional profiling of the light-to-dark transition in Synechocystis sp. strain PCC 6803. J. Bacteriol. 184, 3671–3681.
Git, A., Dvinge, H., Salmon-Divon, M., Osborne, M., Kutter, C., Hadfield, J., Bertone, P., Caldas, C., 2010. Systematic comparison of microarray profiling, real-time PCR, and next-generation sequencing technologies for measuring differential microRNA expression. RNA 16, 991–1006.
Gupta, A., 2009. Killing activity and rescue function of genome-wide toxin–antitoxin loci of Mycobacterium tuberculosis. FEMS Microbiol. Lett. 290, 45–53.
Herwig, R., Lehrach, H., 2006. Expression profiling of drug response—from genes to pathways. Dialogues Clin. Neurosci. 8, 283–293.
Hughes, T.R., Mao, M., Jones, A.R., Burchard, J., Marton, M.J., Shannon, K.W., Lefkowitz, S.M., Ziman, M., Schelter, J.M., Meyer, M.R., Kobayashi, S., Davis, C., Dai, H., He, Y.D., Stephaniants, S.B., Cavet, G., Walker, W.L., West, A., Coffey, E., Shoemaker, D.D., Stoughton, R., Blanchard, A.P., Friend, S.H., Linsley, P.S., 2001. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347.
Karakousis, P.C., Williams, E.P., Bishai, W.R., 2008. Altered expression of isoniazid-regulated genes in drug-treated dormant Mycobacterium tuberculosis. J. Antimicrob. Chemother. 61, 323–331.
Kendall, S.L., Rison, S.C., Movahedzadeh, F., Frita, R., Stoker, N.G., 2004. What do microarrays really tell us about M. tuberculosis? Trends Microbiol. 12, 537–544.
Kogenaru, S., Qing, Y., Guo, Y., Wang, N., 2012. RNA-seq and microarray complement each other in transcriptome profiling. BMC Genomics 13, 629.
Nakaya, H., Reis, E., Verjovski-Almeida, S., 2007. Concepts on microarray design for genome and transcriptome analyses. In: Buzdin, A., Lukyanov, S. (Eds.), Nucleic acids hybridization modern applications. Springer, Netherlands, pp. 265–307.
Pandey, D.P., Gerdes, K., 2005. Toxin–antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes. Nucleic Acids Res. 33, 966–976.
Perreten, V., Vorlet-Fawer, L., Slickers, P., Ehricht, R., Kuhnert, P., Frey, J., 2005. Microarray-based detection of 90 antibiotic resistance genes of gram-positive bacteria. J. Clin. Microbiol. 43, 2291–2302.
Postier, B., Didonato Jr., R., Nevin, K.P., Liu, A., Frank, B., Lovley, D., Methe, B.A., 2008. Benefits of in-situ synthesized microarrays for analysis of gene expression in understudied microorganisms. J. Microbiol. Methods 74, 26–32.
Ramage, H.R., Connolly, L.E., Cox, J.S., 2009. Comprehensive functional analysis of Mycobacterium tuberculosis toxin–antitoxin systems: implications for pathogenesis, stress responses, and evolution. PLoS Genet. 5, e1000767.
Rohde, K.H., Veiga, D.F., Caldwell, S., Balazsi, G., Russell, D.G., 2012. Linking the transcriptional profiles and the physiological states of Mycobacterium tuberculosis during an extended intracellular infection. PLoS Pathog. 8, e1002769.
Rosu, V., Bandino, E., Cossu, A., 2013. Unraveling the transcriptional regulatory networks associated to mycobacterial cell wall defective form induction by glycine and lysozyme treatment. Microbiol. Res. 168, 153–164.
Sanchez-Carbayo, M., 2003. Use of high-throughput DNA microarrays to identify biomarkers for bladder cancer. Clin. Chem. 49, 23–31.
Shi, L., Reid, L.H., Jones, W.D., Shippy, R., Warrington, J.A., Baker, S.C., Collins, P.J., de Longueville, F., Kawasaki, E.S., Lee, K.Y., Luo, Y., Sun, Y.A., Willey, J.C., Setterquist, R.A., Fischer, G.M., Tong, W., Dragan, Y.P., Dix, D.J., Frueh, F.W., Goodsaid, F.M., Herman, D., Jensen, R.V., Johnson, C.D., Lobenhofer, E.K., Puri, R.K., Schrf, U., Thierry-Mieg, J., Wang, C., Wilson, M., Wolber, P.K., Zhang, L., Amur, S., Bao, W., Barbacioru, C.C., Lucas, A.B., Bertholet, V., Boysen, C., Bromley, B., Brown, D., Brunner, A., Canales, R., Cao, X.M., Cebula, T.A., Chen, J.J., Cheng, J., Chu, T.M., Chudin, E., Corson, J., Corton, J.C., Croner, L.J., Davies, C., Davison, T.S., Delenstarr, G., Deng, X., Dorris, D., Eklund, A.C., Fan, X.H., Fang, H., Fulmer-Smentek, S., Fuscoe, J.C., Gallagher, K., Ge, W., Guo, L., Guo, X., Hager, J., Haje, P.K., Han, J., Han, T., Harbottle, H.C., Harris, S.C., Hatchwell, E., Hauser, C.A., Hester, S., Hong, H., Hurban, P., Jackson, S.A., Ji, H., Knight, C.R., Kuo, W.P., LeClerc, J.E., Levy, S., Li, Q.Z., Liu, C., Liu, Y., Lombardi, M.J., Ma, Y., Magnuson, S.R., Maqsodi, B., McDaniel, T., Mei, N., Myklebost, O., Ning, B., Novoradovskaya, N., Orr, M.S., Osborn, T.W., Papallo, A., Patterson, T.A., Perkins, R.G., Peters, E.H., Peterson, R., Philips, K.L., Pine, P.S., Pusztai, L., Qian, F., Ren, H., Rosen, M., Rosenzweig, B.A., Samaha, R.R., Schena, M., Schroth, G.P., Shchegrova, S., Smith, D.D., Staedtler, F., Su, Z., Sun, H., Szallasi, Z., Tezak, Z., Thierry-Mieg, D., Thompson, K.L., Tikhonova, I., Turpaz, Y., Vallanat, B., Van, C., Walker, S.J., Wang, S.J., Wang, Y., Wolfinger, R., Wong, A., Wu, J., Xiao, C., Xie, Q., Xu, J., Yang, W., Zhang, L., Zhong, S., Zong, Y., Slikker Jr., W., 2006. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161.
Tree, J.A., Elmore, M.J., Javed, S., Williams, A., Marsh, P.D., 2006. Development of a guinea pig immune response-related microarray and its use to define the host response following Mycobacterium bovis BCG vaccination. Infect. Immun. 74, 1436–1441.
Waddell, S.J., Butcher, P.D., 2007. Microarray analysis of whole genome expression of intracellular Mycobacterium tuberculosis. Curr. Mol. Med. 7, 287–296.
Yauk, C.L., Berndt, M.L., 2007. Review of the literature examining the correlation among DNA microarray technologies. Environ. Mol. Mutagen. 48, 380–394.
