news & views NANOTOXICOLOGY

Seeing the trees for the forest

Meta-analysis of the literature on quantum dot toxicity using a machine-learning tool helps reveal hidden relationships between material properties and toxicity.

Elizabeth A. Casman and Jeremy M. Gernand

Cadmium-containing semiconductor quantum dots (QDs) can be tailored and tuned for specific applications in imaging, medicine, electronics, and other fields. They can vary in size, coating, functionalization, chemical composition, and surface charge, and as applications proliferate, so does the number of unique QD types. There are hundreds of in vitro studies on the toxicity of various QDs, performed with different cell lines and under different conditions. Given this variety, it has been difficult to generalize across studies to determine which QD properties lead to the observed toxic effects. Writing in Nature Nanotechnology, Igor Medintz, Yoram Cohen, Rong Liu and colleagues at the US Naval Research Laboratory, the University of California, Los Angeles and Sotera Defense Solutions report a meta-analysis of the in vitro QD toxicity literature, which shows that the properties best correlated with toxicity are QD diameter, surface ligand type, surface modification, and shell composition1.

The numerical method used in the meta-analysis is called a random forest algorithm. It has been used previously to identify and rank the characteristics responsible for carbon nanotube pulmonary toxicity2,3 and in the development of quantitative structure–activity relationships for chemical toxicity, especially when traditional regression methods are not adequate4,5. What is new, and very welcome, in the study by Medintz and co-workers is the way they attempt to retrieve the nonlinear and complex relationships among variables that typically get lost in the calculation of a random forest.

A random forest is an ensemble of a large number of classification or regression trees (also known as decision trees). Regression tree algorithms sort observations. The word observation in this case means a unique toxicological exposure and its result. For example, a study testing three doses of a substance at two different pHs and two different exposure times would contain twelve observations (3 × 2 × 2).
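To make the counting rule concrete, here is a minimal Python sketch that enumerates the observations in a hypothetical study of that design. The dose, pH and time values are placeholders invented for illustration, not data from any QD study; only the design itself (three doses × two pHs × two exposure times) comes from the text.

```python
from itertools import product

# Hypothetical experimental factors: the values are placeholders; only the
# number of levels per factor (3, 2, 2) comes from the example in the text.
doses_nM = [1.0, 10.0, 100.0]
ph_values = [5.5, 7.4]
exposure_times_h = [4, 24]

# Each unique combination of exposure conditions is one observation
# (in a real dataset the measured toxicity result would be attached to it).
observations = [
    {"dose_nM": d, "pH": p, "time_h": t}
    for d, p, t in product(doses_nM, ph_values, exposure_times_h)
]

print(len(observations))  # 12, i.e. 3 * 2 * 2
```

Because every factor level is crossed with every other, the number of observations is the product of the numbers of levels, which is why large multi-factor studies contribute many observations to a meta-analysis.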
Regression trees split a dataset into subsets of observations along ‘branches’ and eventually into

[Figure 1: schematic regression tree. Yes/no questions ('Category a?' at the root, then 'Category b?' to 'Category g?') route each observation to one of eight leaves, 'Prediction 1' to 'Prediction 8'.]

Figure 1 | Schematic of a categorical regression tree composed of ‘branches’ and ‘leaves’. A regression tree algorithm searches for the value of the independent variable that will divide the observations into two subsets whose predictions (values of the dependent variable) are most un-alike. Here the ‘root’ node is labelled ‘Category a?’, which is to be read “Does this observation belong in category a?”, where category a is one of the mutually exclusive categories of the variable A. For example, if A is particle diameter, its categories might be diameter  20 nm. If ‘a’ is the category, diameter 
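The split search described in the caption can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used by Medintz and co-workers: it asks only one-vs-rest questions of the form "Does this observation belong in category c?", as in Figure 1, and scores each candidate split with the standard CART criterion of reduction in squared error. For a binary split that reduction equals n_yes·n_no/n·(mean_yes − mean_no)², so it favours splits whose two subsets have the most dissimilar mean predictions. The variable names, categories and viability values are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: categorical properties plus a measured response
# (e.g. percent cell viability). All values are invented for illustration.
observations = [
    {"diameter": "small",  "ligand": "MPA", "viability": 40.0},
    {"diameter": "small",  "ligand": "PEG", "viability": 55.0},
    {"diameter": "medium", "ligand": "MPA", "viability": 70.0},
    {"diameter": "medium", "ligand": "PEG", "viability": 85.0},
    {"diameter": "large",  "ligand": "MPA", "viability": 80.0},
    {"diameter": "large",  "ligand": "PEG", "viability": 95.0},
]

def split_score(yes, no):
    """Reduction in squared error from splitting into 'yes' and 'no' subsets.

    Equals n_yes * n_no / n * (mean_yes - mean_no) ** 2, so larger values
    mean the two subsets' predictions (their means) are more unlike.
    """
    n = len(yes) + len(no)
    return len(yes) * len(no) / n * (mean(yes) - mean(no)) ** 2

def best_split(rows, variables, target):
    """Try every question 'Does this observation belong in category c of v?'."""
    best = None
    for var in variables:
        for cat in sorted({r[var] for r in rows}):
            yes = [r[target] for r in rows if r[var] == cat]
            no = [r[target] for r in rows if r[var] != cat]
            if not yes or not no:
                continue  # a split must produce two non-empty subsets
            score = split_score(yes, no)
            if best is None or score > best[0]:
                best = (score, var, cat)
    return best

def grow_tree(rows, variables, target, min_size=2):
    """Recursively split until a subset is too small or cannot be improved."""
    split = best_split(rows, variables, target) if len(rows) >= min_size else None
    if split is None or split[0] == 0:
        # Leaf: the prediction is the mean response of its observations.
        return {"prediction": mean(r[target] for r in rows)}
    _, var, cat = split
    return {
        "question": (var, cat),
        "yes": grow_tree([r for r in rows if r[var] == cat], variables, target, min_size),
        "no": grow_tree([r for r in rows if r[var] != cat], variables, target, min_size),
    }

def predict(tree, obs):
    """Follow the yes/no answers from the root down to a leaf."""
    while "prediction" not in tree:
        var, cat = tree["question"]
        tree = tree["yes"] if obs[var] == cat else tree["no"]
    return tree["prediction"]

tree = grow_tree(observations, ["diameter", "ligand"], "viability")
print(tree["question"])  # ('diameter', 'small')
print(predict(tree, {"diameter": "small", "ligand": "MPA"}))  # 40.0
```

A random forest repeats this tree-growing procedure on many bootstrap resamples of the observations, considering only a random subset of the variables at each split, and averages the predictions of all the trees.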
