Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎


Contents lists available at ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Decision trees for the analysis of genes involved in Alzheimer's disease pathology

Sonia L. Mestizo Gutiérrez a, Marisol Herrera Rivero b, Nicandro Cruz Ramírez c, Elena Hernández d, Gonzalo E. Aranda-Abreu d,*

a Doctorado en Investigaciones Cerebrales, Universidad Veracruzana, Av. Luis Castelazo Ayala S/N, Xalapa, Veracruz 91190, Mexico
b Doctorado en Ciencias Biomédicas, Universidad Veracruzana, Av. Luis Castelazo Ayala S/N, Xalapa, Veracruz, Mexico
c Departamento de Inteligencia Artificial, Universidad Veracruzana, Sebastián Camacho 5, Centro, Xalapa, Veracruz 91000, Mexico
d Centro de Investigaciones Cerebrales, Cuerpo Académico de Neuroquímica, Universidad Veracruzana, Av. Luis Castelazo Ayala S/N, Xalapa, Veracruz, Mexico

Highlights

• We used decision trees to classify different stages of Alzheimer's disease (AD).
• We examined expression levels in a public dataset of 31 individuals.
• We analyzed 69 genes previously reported in a meta-analysis and 7 additional genes.
• We used Mini-Mental State Examination scores and the number of neurofibrillary tangles.
• We found that the expression level of tau also plays an important role in incipient AD.

Article info

Article history: Received 26 February 2014; received in revised form 22 April 2014; accepted 1 May 2014.

Keywords: Gene expression; Microarray; MMSE; MAPT

Abstract

Background: Alzheimer's disease (AD) is characterized by a gradual loss of memory, orientation, judgement and language. There is still no cure for this disorder. AD pathogenesis remains largely unknown and its underlying molecular mechanisms are not yet fully understood. Several studies have shown that the abnormal accumulation of beta-amyloid and tau proteins occurs 10 to 20 years before the onset of symptoms, so it is extremely important to identify changes in the brain before the first symptoms appear.

Methods: We used decision trees to classify 31 individuals (9 healthy controls and 22 AD patients in three different stages of disease) according to the expression of 69 genes previously reported in a meta-analysis, plus the expression levels of APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT. We also included in our analysis the MMSE (Mini-Mental State Examination) scores and the number of NFT (neurofibrillary tangles).

Results: The results allowed us to generate a model of classification values for the different stages of AD severity according to MMSE scores, and to identify an expression level of the tau protein that may determine the onset (incipient stage) of AD.

Discussion: We used decision trees to model the different stages of AD (severe, moderate, incipient and control) based on gene expression levels from the meta-analysis plus MMSE and NFT scores. Both classifiers reported MMSE as the most informative variable; however, we found that the tau protein also plays an important role in the onset of AD.

© 2014 Published by Elsevier Ltd.

* Corresponding author. E-mail address: [email protected] (G.E. Aranda-Abreu).

http://dx.doi.org/10.1016/j.jtbi.2014.05.002
0022-5193/© 2014 Published by Elsevier Ltd.

Please cite this article as: Mestizo Gutiérrez, S.L., et al., Decision trees for the analysis of genes involved in Alzheimer's disease pathology. J. Theor. Biol. (2014), http://dx.doi.org/10.1016/j.jtbi.2014.05.002

1. Introduction

Alzheimer's disease (AD) is a neurodegenerative disorder characterized by a gradual loss of cognitive abilities such as memory, spatial-temporal orientation, judgement and language, which decline slowly over an average period of 8 to 12 years. Cognitive decline is the result of neuronal damage and death due to the formation of neurofibrillary tangles (intracellular) and neuritic plaques (extracellular) in the brain of these patients, mainly constituted by tau protein and amyloid peptides, respectively. There is a familial form of AD due to autosomal dominant inheritance of chromosomal alterations (Herrera et al., 2010), although the most common form of the disease presents a less marked genetic background. The main risk factor for this sporadic form of AD remains advanced age, but the genetic predisposition

to develop the disease is being uncovered with the discovery of an increasing number of genes associated with the onset, severity and/or progression of AD. The lack of a cure for AD and its rising prevalence worldwide have driven scientists around the globe to join efforts to discover accessible methods for early diagnosis and new, more efficient therapeutic targets.

Microarrays have emerged as a powerful tool to generate gene expression data and compare the relative RNA abundance between biological samples. Microarrays allow the parallel monitoring of gene expression (Ricciarelli et al., 2004) and the study of molecular pathways affected by disease.

Machine learning techniques have been successfully applied to microarray analyses in order to establish genetic associations with AD (Ricciarelli et al., 2004; Moscato et al., 2005; Benuskova and Kasabov, 2008; Miller et al., 2008; Kong et al., 2009; Bringay et al., 2010). For example, the National Research Council Canada used data mining to investigate microarray patterns in AD compared to healthy controls; this study identified 67 genes, including 17 genes previously associated with AD (Walker et al., 2004). Some methods for selection and classification, such as support vector machines (SVMs), neural networks, Bayesian networks, decision trees, bagging, boosting and random forests, have proven useful to identify differences in microarray expression data (Hong et al., 2006; Pirooznia et al., 2008). More recently, a number of gene expression markers for AD have been identified using several machine learning techniques, such as information gain, random forests (based on a collection of decision trees), genetic algorithms and SVMs.
After testing six classifiers (naive Bayes, C4.5 decision tree, nearest neighbor, random forest, SVM + Gaussian kernel and SVM + linear kernel), the authors used SVM + linear kernel as the most accurate classifier, and random forest and SVM + Gaussian kernel to evaluate the quality of classification; this study concluded that genetic algorithms, together with SVMs, can identify sets of genes that classify tissues better when used in combination (Scheubert et al., 2012).

Previous gene expression profiling studies have identified various functional categories of genes implicated in AD pathophysiology, among which mitochondrial dysfunction, aberrant intracellular calcium signaling and inflammation in several body tissues and brain regions have probably attracted the greatest interest. A major increase in abnormal expression has been found during progression from moderate to severe AD, corroborating the importance of early treatment (Cooper-Knock et al., 2012).

Most studies for biomarker discovery focus on case-control approaches, comparing healthy individuals and AD patients regardless of disease severity; here, in contrast, we aimed to generate a model for different stages of AD severity: incipient, moderate and severe. For this purpose, we used decision trees to classify the expression data of the 69 genes identified by a meta-analysis of 5 microarray datasets, plus APP, APOE, NCSTN, BACE1, PSEN1, PSEN2 and MAPT, in a public dataset of 31 individuals (22 AD in different stages of severity and 9 healthy controls).

2. Materials and methods

2.1. Decision trees

Decision trees represent nested decisions that serve to classify data. Applying a decision tree to the data yields rules that allow their classification. A tree is represented by a set of nodes, leaves and branches.
The root node is the attribute from which the classification process starts; the internal nodes correspond to questions about particular attributes of the problem. The branches coming out of each of these nodes are labeled with the possible values of the attribute. End nodes, or leaf nodes, correspond to a decision, which coincides with one of the values of the class variable of the problem to be solved.

An algorithm for the generation of decision trees consists of two stages: the first is the induction of the tree and the second is classification. The construction of the tree starts by generating the root node, selecting a test attribute and dividing the training set into two or more subsets; for each partition a new node is generated, and so forth. In the second stage, each new object is classified by traversing the tree from the root to a leaf node, which determines the class membership of the object.

Two of the most commonly used classification algorithms are ID3 and C4.5. The ID3 algorithm chooses the best attribute using a heuristic called information gain: it determines which attribute provides the most information about the class and builds the tree with that attribute as the first division. The most informative attribute is the one whose split leaves the lowest entropy, that is, the one with the highest information gain. Entropy is a measure of average uncertainty and is calculated from the probability of occurrence of each event (Quinlan, 1990). If the target attribute can take c different values, the entropy of a set of samples S relative to this classification is defined as:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where p_i is the proportion of S that belongs to class i. Information gain is complementary to entropy: the higher the entropy that remains after a split, the less information the attribute provides. Given a set of samples S and an attribute A, the information gain is defined as:

$$\mathrm{Information\ gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$$

where Values(A) is the set of all possible values of attribute A and S_v is the subset of S for which attribute A has value v. We used the C4.5 algorithm, an extension of ID3 that can handle continuous and missing values. It is computationally fast and its decision rules are simple and legible.

2.2. Evaluation method: stratified k-fold cross-validation

We use the definition of cross-validation given by Kohavi (1995). The dataset D is split at random into k mutually exclusive subsets (the folds) D_1, D_2, …, D_k of approximately equal size. The classifier is trained and tested k times: for each i ∈ {1, 2, …, k} it is trained on D \ D_i and tested on D_i. The cross-validation accuracy estimate is the total number of correct classifications divided by the sample size (the total number of instances in D). Thus, the k-fold cross-validation estimate is:

$$\mathrm{acc}_{cv} = \frac{1}{n} \sum_{(v_i, y_i) \in D} \delta\big(I(D \setminus D_{(i)}, v_i),\, y_i\big)$$

where I(D \ D_(i), v_i) denotes the label assigned to the unlabelled instance v_i by the classifier that inducer I builds on dataset D \ D_(i), D_(i) is the fold that contains instance v_i, y_i is the class of instance v_i, n is the size of the complete dataset, and δ(i, j) is a function where δ(i, j) = 1 if i = j and 0 if i ≠ j. In stratified k-fold cross-validation, the folds contain approximately the same proportion of classes as the complete dataset D (Ameca et al.).

We evaluate the performance of the classifiers using the following measures:

(a) Accuracy: the number of correct classifications divided by the size of the corresponding test set.
(b) Sensitivity: the ability to correctly identify patients who are at a certain stage of AD.
(c) Specificity: the ability to correctly identify those individuals without AD.
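The evaluation procedure of Section 2.2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the WEKA implementation used in the study: the function names (`stratified_folds`, `cv_accuracy`, `sensitivity_specificity`), the round-robin fold assignment and the majority-class placeholder inducer are all hypothetical choices made for the example. Only the class sizes (9 controls, 7 incipient, 8 moderate, 7 severe) come from the dataset description.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign instance indices to k folds, dealing each class out in turn."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    pos = 0  # running position keeps fold sizes within one of each other
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for idx in members:
            folds[pos % k].append(idx)
            pos += 1
    return folds

def cv_accuracy(features, labels, k, train, predict):
    """acc_cv: correct test predictions summed over all folds, divided by n."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i]
                       for i in test_idx)
    return correct / len(labels)

def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`.

    Assumes both the positive class and the rest occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Placeholder inducer (an assumption for the example): always predict the
# most frequent class of the training folds.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

# Class sizes taken from the dataset description; features are not needed
# by the placeholder inducer, so they are left empty.
labels = ["control"] * 9 + ["incipient"] * 7 + ["moderate"] * 8 + ["severe"] * 7
features = [None] * len(labels)

folds = stratified_folds(labels, k=10)
acc = cv_accuracy(features, labels, 10, train_majority, predict_majority)
print([len(f) for f in folds], round(acc, 3))
```

With 31 instances and k = 10 each fold holds three or four instances, so every per-fold accuracy is coarse; pooling the correct classifications over all folds, as the formula does, avoids averaging such coarse values.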

2.3. Implementation details

2.3.1. Dataset

We used the public human GDS810 array dataset (Affymetrix GeneChip HG-U133A) obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (Gene Expression Omnibus, 2013). The GDS810 microarray dataset, published by Blalock et al. (2004), assessed gene expression changes in the hippocampal CA1 subfield of 31 subjects: 9 controls and 22 AD subjects of varying severity (7 incipient, 8 moderate and 7 severe), with a total of 22,283 genes. The dataset also includes Mini-Mental State Examination (MMSE) scores and neurofibrillary tangle (NFT) counts.

2.3.2. Gene selection

We were interested in working with a group of 69 genes reported in a meta-analysis by Bossers (2009), which used 5 different AD studies from 2004 to 2009 (Table 1) that analysed gene expression in hippocampal and frontal brain regions of AD individuals at different disease stages. Of the genes identified by the meta-analysis, 36 were significantly up-regulated and 33 were down-regulated in end-stage AD compared to controls. We were also interested in including some of the main genes coding for proteins that directly participate in hallmark AD processes: APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT (Herrera et al., 2010).

Table 1
Datasets included in the meta-analysis we used for gene selection (used with permission of Dr. Koen Bossers).

| Dataset | Brain area | Subjects | Array platform | # Signif. genes |
|---|---|---|---|---|
| Bossers (2009) | Frontal medial gyrus | 49 subjects: 7 patients per Braak stage for tangles (Braak I-VI), and 7 subjects without neurofibrillary tangle or plaque pathology | Agilent 44k Whole human genome (44,000 probes) | 992 |
| Blalock et al. (2004) | Hippocampus | 9 controls (average Braak score for tangles: II), 22 AD patients (average Braak score for tangles: V, varying MMSE score) | Affymetrix Human Genome U133A GeneChip (>22,000 genes) | 1495 |
| Emilsson et al. (2006) | Frontal cortex, Brodmann areas 8 and 9 | 61 AD samples (31 females/30 males) and 53 samples from individuals without psychiatric disorder (27 females/26 males), pooled design | cDNA microarrays: "UU arrays" (7762 cDNA clones) and "RIT arrays" (20,000 cDNA clones) | 2907 |
| Parachikova et al. (2007) | Hippocampus and prefrontal cortex | 10 AD patients (amyloid load 2.7–13.5%, MMSE 17–22, Braak IV-V) and 14 non-demented controls (amyloid load 0–13.5%, MMSE 25–30, Braak I-V) | Affymetrix U95Av2 GeneChip (>12,000 genes) | 113 |
| Xu et al. (2006) | Hippocampus | 4 APOE4/4 AD, 5 APOE3/4, 3 APOE3/3 AD and 3 control samples | Affymetrix Human Genome U133A GeneChip (>22,000 genes) | 133 |

2.3.3. Decision trees

We tested two disease classifiers with different sets of genes, using the J48 algorithm, an implementation of the C4.5 algorithm in the Waikato Environment for Knowledge Analysis (WEKA) (Hall et al., 2009; Waikato Environment for Knowledge Analysis, 2013), with 10-fold cross-validation. The first set of genes tested consisted of those found to differ significantly by ANOVA, and the second set consisted of the AD-related genes we added to the list: APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT. Expression data as well as MMSE and NFT scores were included.

2.3.4. Statistical analysis

Differentially expressed genes were investigated among the total of 76 genes by one-way ANOVA, using IBM SPSS Statistics version 20 and the Multi Experiment Viewer (MeV v. 4.8), a free tool for the exploration of genomic data (Saeed et al., 2006).

3. Results

The analysis of variance showed statistically significant differences with p < 0.01 in only 4 of the analysed genes: MTHFD1, CASC3, CLIP3 and APP (Table 2).

Table 2
Significant results from the one-way ANOVA.

| Group / gene | MTHFD1 | CASC3 | CLIP3 | APP |
|---|---|---|---|---|
| Control, mean | 911.21 | 1372.3 | 4320.4 | 5218.62 |
| Control, SD | 227.76 | 345.75 | 764.55 | 1136.51 |
| Incipient, mean | 867.7 | 1646.74 | 3117.44 | 5780.91 |
| Incipient, SD | 227.61 | 485.11 | 644.71 | 443.67 |
| Moderate, mean | 721.69 | 2256.36 | 3896.83 | 6351.08 |
| Moderate, SD | 95.69 | 724.96 | 945.53 | 717.71 |
| Severe, mean | 553.57 | 2198.71 | 2845.0 | 4619.01 |
| Severe, SD | 116.12 | 561.72 | 493.72 | 1203.09 |
| F ratio | 6.1262 | 5.1177 | 5.3421 | 4.7231 |

We used these four genes as a first gene set to test their power to classify disease stage of severity with the J48 algorithm. This analysis resulted in the correct classification of 26 (83%) out of 31 samples: 6/9 controls, 5/7 incipient AD, 8/8 moderate AD and 7/7 severe AD. MMSE score was the most informative variable in this gene set (Fig. 1). The algorithm provided MMSE score cut-off values for each stage of disease: normal >25, incipient 19–25, moderate 12–18 and severe <12, which resemble those commonly used in clinical practice to classify an individual's cognitive state.

For the second classifier we tested, using APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT as the gene set, 24 (77%) out of 31 samples were correctly classified: 5/9 controls, 4/7 incipient AD, 8/8 moderate AD and 7/7 severe AD. As shown in Fig. 2, here we also found MMSE to be the most informative variable; importantly, however, MAPT expression was also relevant to differentiate between control and incipient AD samples. This suggests that when the MMSE score is above 25 points, hippocampal MAPT expression can help differentiate healthy individuals from those in very early stages of AD, and it provides a cut-off value: an expression value above 540.9 classifies an individual as incipient AD (Fig. 2).

4. Discussion

We tested two classifiers for AD severity through decision trees, using gene expression, MMSE and NFT scores. Because we had pre-selected 76 genes of interest, we first tested for expression differences between disease stages in order to include only the most significant genes in one classifier. MTHFD1, CASC3, CLIP3 and APP showed the most significant differences between AD stages of severity and normal cognition. The methylenetetrahydrofolate dehydrogenase 1 (MTHFD1) gene encodes a protein with three enzymatic activities: 5,10-methylenetetrahydrofolate dehydrogenase, 5,10-methenyltetrahydrofolate cyclohydrolase and 10-formyltetrahydrofolate synthetase. These catalyse sequential reactions in substrate


MMSE <= 18
|   MMSE <= 11: Severe (7.0)
|   MMSE > 11: Moderate (8.0)
MMSE > 18
|   MMSE <= 25: Incipient (5.0)
|   MMSE > 25: Control (11.0/2.0)

Fig. 1. Decision tree for the significant genes (MTHFD1, CASC3, CLIP3 and APP). MMSE scores model the stages of AD.
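As a worked instance of the entropy and information gain definitions of Section 2.1, the root split of Fig. 1 (MMSE <= 18) can be scored from class counts alone. The sketch below rests on two assumptions: the leaf annotations follow the usual J48 convention of instances reaching the leaf followed by the number misclassified there, and the two misclassified instances at the Control leaf are incipient cases, which is inferred from the dataset composition (9 controls, 7 incipient, 8 moderate, 7 severe) rather than stated in the text. The helper names are hypothetical, and C4.5 itself ranks splits by gain ratio, so this is not the exact score J48 computes.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent)
    remainder = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder

# Class counts in the order (control, incipient, moderate, severe).
parent = [9, 7, 8, 7]       # the complete dataset, 31 subjects
mmse_le_18 = [0, 0, 8, 7]   # left branch: Moderate (8.0) and Severe (7.0)
mmse_gt_18 = [9, 7, 0, 0]   # right branch: inferred, see the assumptions above

root_gain = information_gain(parent, [mmse_le_18, mmse_gt_18])
print(f"Entropy(S) = {entropy(parent):.3f} bits")
print(f"Information gain(S, MMSE <= 18) = {root_gain:.3f} bits")
```

Under these assumptions the split removes roughly one bit of uncertainty, as expected for a test that divides four nearly balanced classes into two nearly balanced pairs.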

MMSE <= 18
|   MMSE <= 11: Severe (7.0)
|   MMSE > 11: Moderate (8.0)
MMSE > 18
|   MMSE <= 25: Incipient (5.0)
|   MMSE > 25
|   |   MAPT <= 540.9: Control (8.0)
|   |   MAPT > 540.9: Incipient (3.0/1.0)

Fig. 2. Decision tree for the main AD-related genes (APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT). MMSE scores model different stages of AD but, between controls and the incipient stage of disease, MAPT expression level becomes an important variable to better classify and differentiate one stage from the other.
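The tree in Fig. 2 reads directly as nested rules. The following sketch transcribes those rules into a function; the name `classify_stage` and the optional `mapt` argument are hypothetical, and falling back to the Control leaf of Fig. 1 when no MAPT value is supplied is a choice made for the example. The thresholds were learned from 31 samples and MAPT is in the expression units of the GDS810 dataset, so the function illustrates the model and is not a diagnostic rule.

```python
from typing import Optional

def classify_stage(mmse: float, mapt: Optional[float] = None) -> str:
    """Apply the decision rules of Fig. 2 to one subject.

    mmse: Mini-Mental State Examination score.
    mapt: hippocampal MAPT expression value; if omitted, the MMSE > 25
          branch ends in Control, as in Fig. 1.
    """
    if mmse <= 18:
        return "Severe" if mmse <= 11 else "Moderate"
    if mmse <= 25:
        return "Incipient"
    # MMSE > 25: the tree in Fig. 2 consults MAPT expression.
    if mapt is not None and mapt > 540.9:
        return "Incipient"
    return "Control"

# Hypothetical subjects, for illustration only.
for mmse, mapt in [(10, None), (15, None), (22, None), (28, 500.0), (28, 600.0)]:
    print(mmse, mapt, classify_stage(mmse, mapt))
```

Writing the tree this way makes explicit that MAPT is consulted only for subjects whose MMSE score is above 25.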

molecules for de novo synthesis of methionine, thymidylate and purines in the cytoplasmic folate-mediated one-carbon metabolism, which impacts DNA synthesis and stability as well as gene expression (MacFarlane et al., 2010). The cancer susceptibility candidate 3 (CASC3), also known as metastatic lymph node 51 (MLN51), is a component of a splicing-dependent multi-protein exon junction complex (EJC) and is over-expressed in several types of cancer (Degot et al., 2002). The EJC is involved in mRNA splicing, transport and processing, as well as in the regulation of translation (Baguet et al., 2007). CAP-GLY domain containing linker protein 3 (CLIP3, RSNL1 or CLIPR-59) is involved in trans-Golgi network (TGN)-endosome dynamics (Perez et al., 2002); it also plays a role in T cell apoptosis (Sorice et al., 2010) and functions as a scaffold protein in adipocyte glucose transport, mediating the membrane localization of phosphorylated protein kinase B (phospho-Akt) (Ding and Du, 2009). Finally, the gene coding for the amyloid precursor protein (APP) is one of the main genes playing crucial roles in the pathophysiology of AD. Since the discovery of APP and its involvement in AD, a number of physiological roles have been attributed to this protein, some considered fragment- or isoform-specific; nonetheless, its true physiological functions are not yet clearly understood. APP functions as a cell surface receptor and participates in neurite outgrowth, transcription regulation, cell adhesion, apoptosis and many other processes (Zhang et al., 2012).

Even though the expression of these genes differed significantly in our study, these data appeared to provide less information than MMSE with regard to disease severity status. The results we

obtained with our first classifier modelled AD severity using MMSE score as the most important variable, providing cut-off values for each disease stage that are consistent with the values commonly used in practice.

Another crucial gene in AD pathology is the one coding for the microtubule-associated protein tau (MAPT), which participates in microtubule assembly and stability, as well as in axonal microtubule-plasma membrane interactions. Tau functions are regulated through differentially expressed isoforms and phosphorylation. Although our understanding of the pathological molecular mechanisms of tau has greatly increased in recent years, the key initial steps leading to tau hyperphosphorylation in AD remain unclear (Ittner et al., 2011). Our second classifier, built with the main AD-related genes, also reported MMSE score as the most informative variable; nevertheless, here we found that MAPT expression is also an important element to differentiate between normal cognition and mild impairment when the MMSE score falls in the "normal" range. MAPT overexpression induces an increase in transcript levels of pro-inflammatory markers (Wang, D.B. et al., 2010), antagonizes Aβ-potentiated apoptosis (Wang, Z.F. et al., 2010), changes cell shape and alters cell growth and organelle distribution (Ebneth et al., 1998), and leads to tau hyperphosphorylation and redistribution with increased gliosis and vacuolization (Adams et al., 2009), amongst other abnormalities that can contribute to neurodegeneration. Here, we provide a tentative MAPT expression level that may represent the mRNA abundance required to initiate abnormal cellular processes associated with tau


overexpression. As future work, we are considering the study of datasets from living tissue of patients with AD, in order to identify the expression levels of candidate genes at the beginning of the disease or even at the first symptoms.

5. Conclusions

A diverse number of variables, such as gene expression, cognitive state and pathological measures, have been used to generate disease classifiers through several algorithms, but most studies focus on a case-control approach that provides little or no information about the values capable of correctly classifying different stages of disease severity. We tested two decision tree classifiers for AD severity, both using MMSE and NFT scores but each with a differently selected set of genes. Because MMSE was the most important element in both classifiers, we observed that, even when a set of genes differs significantly in expression between groups, expression levels may not provide more valuable information than clinical variables. Nonetheless, we found that MAPT expression levels may have the potential to discern between normal cognition and incipient AD when the MMSE scores are considered normal.

In conclusion, we successfully modelled different stages of AD severity with overall accuracies of 83% and 77% for the first and second classifier, respectively, using a relatively small number of variables. Although the sample size of this study is a limitation, we show that MAPT expression has the potential to be an important variable to include in the early diagnosis of AD.

Contribution of authors

Sonia L. Mestizo Gutiérrez: studied the decision trees.
Marisol Herrera Rivero: contributed by editing the article.
Nicandro Cruz Ramírez: contributed to the analysis of results and approval of the article.
Elena Hernández Aguilar: contributed to the analysis of results and approval of the article.
Gonzalo E. Aranda-Abreu: contributed to the analysis of results and approval of the article.
References

Adams, S.J., Crook, R.J., Deture, M., Randle, S.J., Innes, A.E., Yu, X.Z., Lin, W.L., Dugger, B.N., McBride, M., Hutton, M., Dickson, D.W., McGowan, E., 2009. Overexpression of wild-type murine tau results in progressive tauopathy and neurodegeneration. Am. J. Pathol. 175 (4), 1598–1609.
Ameca, M., Cruz, N., Mezura, E. Assessment of Bayesian network classifiers as tools for discriminating breast cancer pre-diagnosis based on three diagnostic methods. Advances in Artificial Intelligence, Lecture Notes in Computer Science 7629, 419–431.
Baguet, A., Degot, S., Cougot, N., Bertrand, E., Chenard, M.P., Wendling, C., Kessler, P., Le Hir, H., Rio, M.C., Tomasetto, C., 2007. The exon-junction-complex-component metastatic lymph node 51 functions in stress-granule assembly. J. Cell Sci. 120, 2774–2784.
Benuskova, L., Kasabov, N., 2008. Modeling dynamics using computational neurogenetic approach. Cogn. Neurodyn. 2, 319–334.
Blalock, E.M., Geddes, J.W., Chen, K.C., Porter, N.M., Markesbery, W.R., Landfield, P.W., 2004. Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. Proc. Nat. Acad. Sci. U.S.A. 101, 2173–2178.
Bossers, K., 2009. Spot the Difference: Microarray Analysis of Gene Expression Changes in Alzheimer's and Parkinson's Disease. Ph.D. Thesis. Off page, Amsterdam.
Bringay, S., Mathieu, R., Maguelonne, T., Pascal, P., Ronza, R., 2010. Discovering novelty in sequential patterns: application for analysis of microarray data on Alzheimer disease. Stud. Health Technol. Inform. 160 (2), 1314–1318.
Cooper-Knock, J., Kirby, J., Ferraiuolo, L., Heath, P.R., Rattray, M., Shaw, P.J., 2012. Gene expression profiling in human neurodegenerative disease. Nat. Rev. Neurol. 8 (9), 518–530.
Degot, S., Régnier, C.H., Wendling, C., Chenard, M.P., Rio, M.C., Tomasetto, C., 2002. Metastatic lymph node 51, a novel nucleo-cytoplasmic protein overexpressed in breast cancer. Oncogene 21, 4422–4434.
Ding, J., Du, K., 2009. ClipR-59 interacts with Akt and regulates Akt cellular compartmentalization. Mol. Cell Biol. 29 (6), 1459–1471.
Ebneth, A., Godemann, R., Stamer, K., Illenberger, S., Trinczek, B., Mandelkow, E., 1998. Overexpression of tau protein inhibits kinesin-dependent trafficking of vesicles, mitochondria, and endoplasmic reticulum: implications for Alzheimer's disease. J. Cell Biol. 143, 777–794.
Gene Expression Omnibus [Last accessed: 09-14-2013]. Available at: 〈http://www.ncbi.nlm.nih.gov/geo〉.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I., 2009. The WEKA data mining software: an update. SIGKDD Explorations 11 (1), 10–18.
Herrera, M., Hernández, M., Manzo, J., Aranda, G., 2010. Enfermedad de Alzheimer: inmunidad y diagnóstico. Rev. Neurol. 51, 153–164.
Hong, H., Jiuyong, L., Ashley, P., Hua, W., Grant, D., 2006. A comparative study of classification methods for microarray data analysis. In: Christen, P., Kennedy, P., Li, J., Simoff, S., Williams, G. (Eds.), Proceedings of the Fifth Australasian Conference on Data Mining and Analytics; Australia. Australian Computer Society, Inc, pp. 33–37.
Ittner, A., Ke, Y., van Eersel, J., Gladbach, A., Götz, J., Ittner, L.M., 2011. Brief update on different roles of tau in neurodegeneration. IUBMB Life 63 (7), 495–502.
Kohavi, R., 1995. A Study of Cross-validation and Bootstrap for Accuracy Estimation and Model Selection. Morgan Kaufmann, pp. 1137–1143.
Kong, W., Mou, X., Liu, Q., Chen, Z., Vanderburg, C., Rogers, J., Huang, X., 2009. Independent component analysis of Alzheimer's DNA microarray gene expression data. Mol. Neurodegener. 4, 5.
MacFarlane, A.J., Perry, C.A., McEntee, M.F., Lin, D.M., Stover, P.J., 2010. Mthfd1 is a modifier of chemically induced intestinal carcinogenesis. Carcinogenesis 32 (3), 427–433.
Miller, J., Oldham, M., Geschwind, D., 2008. A system level analysis of transcriptional changes in Alzheimer's disease and normal aging. J. Neurosci. 28 (6), 1410–1420.
Moscato, P., Berreta, R., Hourani, M., Mendes, A., Cota, C., 2005. Genes related with Alzheimer's disease: a comparison of evolutionary search, statistical and integer programming approaches. In: Rothlauf, F. (Ed.), Applications of Evolutionary Computing, vol. 3449. Springer, Berlin Heidelberg, pp. 84–94.
Perez, F., Pernet-Gallay, K., Nizak, C., Goodson, H.V., Kreis, T.E., Goud, B., 2002. CLIPR-59, a new trans-Golgi/TGN cytoplasmic linker protein belonging to the CLIP-170 family. J. Cell Biol. 156, 631–642.
Pirooznia, M., Yang, J., Yang, M., Deng, M.Y., 2008. A comparative study of different machine learning methods in microarray gene expression data. BMC Genomics Suppl. I, S13.
Quinlan, J., 1990. Induction of decision trees. In: Shavlik, J.W., Dietterich, T.G. (Eds.), Readings in Machine Learning. Morgan Kaufmann.
Ricciarelli, R., D'Abramo, C., Massone, S., Marinari, U., Pronzato, M., Tabaton, M., 2004. Microarray analysis in Alzheimer's disease and normal aging. IUBMB Life 56, 349–354.
Saeed, A., Bhagabati, N.K., Braisted, J.C., Liang, W., Sharov, V., Howe, E.A., Li, J., Thiagarajan, M., White, J.A., Quackenbush, J., 2006. TM4 microarray software suite. Methods Enzymol. 411, 134–193.
Scheubert, L., Lustrek, M., Schmidt, R., Repsilber, D., Fuellen, G., 2012. Tissue-based Alzheimer gene expression markers: comparison of multiple machine learning approaches and investigation of redundancy in small biomarker sets. BMC Bioinformatics 13, 266.
Sorice, M., Matarrese, P., Manganelli, V., Tinari, A., Giammarioli, A.M., Mattei, V., Misasi, R., Garofalo, T., Malorni, W., 2010. Role of GD3-CLIPR-59 association in lymphoblastoid T cell apoptosis triggered by CD95/Fas. PLoS One 5 (1), e8567.
Waikato Environment for Knowledge Analysis [Last accessed: 09-20-2013]. Available at: 〈http://www.cs.waikato.ac.nz/ml/weka/〉.
Walker, P., Smith, B., Liu, Q.Y., Famili, A.F., Valdés, J.J., Liu, Z., Lach, B., 2004. Data mining of gene expression changes in Alzheimer brain. Artif. Intell. Med. 31 (2), 137–154.
Wang, D.B., Dayton, R.D., Zweig, R.M., Klein, R.L., 2010. Transcriptome analysis of a tau overexpression model in rats implicates an early pro-inflammatory response. Exp. Neurol. 224 (1), 197–206.
Wang, Z.F., Yin, J., Zhang, Y., Zhu, L.Q., Tian, Q., Wang, X.C., Li, H.L., Wang, J.Z., 2010. Overexpression of tau proteins antagonizes amyloid-beta-potentiated apoptosis through mitochondria-caspase-3 pathway in N2a cells. J. Alzheimers Dis. 20 (1), 145–157.
Zhang, H., Ma, Q., Zhang, Y.W., Xu, H., 2012. Proteolytic processing of Alzheimer's β-amyloid precursor protein. J. Neurochem. 120 (Suppl. 1), 9–21.
