A Comparative Study of Nonlinear Machine Learning for the "In Silico" Depiction of Tyrosinase Inhibitory Activity from Molecular Structure

Authors: Huong Le Thi Thu; Marrero Ponce Yovani*; Casanola Martin Gerardo M; Casas Cardoso Gladys; del Carmen Chavez Maria; Garcia Maria M; Morell Carlos; Torrens Francisco; Abad Concepcion
Source: Molecular Informatics, 2011, 30(6-7): 527-537.
DOI: 10.1002/minf.201100021

Abstract

In the present report, support vector machine (SVM), artificial neural network (ANN), Bayesian network (BN), and k-nearest neighbor (k-NN) techniques are, for the first time, applied and compared on two "in-house" data sets to describe tyrosinase inhibitory activity from molecular structure. Data I, used for the identification of tyrosinase inhibitors (TIs), includes 701 active and 728 inactive compounds. Data II consists of active chemicals for potency estimation of TIs. The 2D TOMOCOMD-CARDD atom-based quadratic indices are used as molecular descriptors. The derived models show rather encouraging results, with areas under the Receiver Operating Characteristic curve (AURC) in the test set above 0.943 and 0.846 for Data I and Data II, respectively. Multiple comparison tests are carried out to compare the performance of the models and reveal the improvement of machine learning (ML) techniques with respect to statistical ones (see Chemometr. Intell. Lab. Syst. 2010, 104, 249). In some cases, these improvements are statistically significant. The tests also demonstrate that k-NN, despite being a rather simple approach, performs best on both data sets. The obtained results suggest that ML-based models could help to improve virtual screening procedures, and that the confluence of these different techniques can increase the practicality of data mining of chemical databases for the discovery of novel TIs as possible depigmenting agents.
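
The evaluation idea in the abstract (score compounds with a classifier, then summarize test-set ranking quality as the area under the ROC curve) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the two-dimensional Gaussian "descriptors" are synthetic stand-ins rather than TOMOCOMD-CARDD quadratic indices, the helper names `knn_scores`, `roc_auc`, and `make_toy_data` are hypothetical, and the plain Euclidean k-NN is not claimed to match the authors' configuration.

```python
import math
import random


def knn_scores(train_X, train_y, test_X, k=3):
    """Score each test point by the fraction of actives (label 1)
    among its k nearest training neighbors (Euclidean distance)."""
    scores = []
    for x in test_X:
        # Indices of the k training points closest to x.
        nearest = sorted(
            range(len(train_X)), key=lambda i: math.dist(x, train_X[i])
        )[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen active is scored above a randomly chosen inactive
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def make_toy_data(n_per_class, seed):
    """Synthetic two-descriptor data (an assumption, not the paper's data):
    inactives centered at 0, actives centered at 2."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    train_X, train_y = make_toy_data(50, seed=0)
    test_X, test_y = make_toy_data(25, seed=1)
    auc = roc_auc(test_y, knn_scores(train_X, train_y, test_X, k=5))
    print(f"toy k-NN test-set AUC: {auc:.3f}")
```

The pairwise formulation of the AUC is quadratic in the number of compounds, which is acceptable for a toy example; a rank-based computation would be preferable for data sets of the size described above.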

  • Publication date: 2011-6