Emerging chemical patterns applied to prediction of P-glycoprotein inhibitors

Authors: Pan Xianchao; Chao Li; Tan Wen; Yang Li; Podraza Roman; Mei Hu*
Source: Chemometrics and Intelligent Laboratory Systems, 2014, 137: 140-145.
DOI:10.1016/j.chemolab.2014.06.017

Abstract

Recently, emerging chemical patterns (ECPs) have been proposed as a powerful tool for compound classification in cheminformatics. However, the predictive power and applicability of the ECP approach have remained largely unexplored. Herein, the effects of sample size, data quality, and class imbalance on the prediction performance of ECP were systematically investigated using a dataset consisting of 666 P-glycoprotein (P-gp) inhibitors and 609 non-inhibitors. The results showed that ECP classification can achieve high sensitivity and modest specificity, depending on the size or positive-to-negative ratio of the training set. With a training set of only 3 positive and 3 negative samples, a predictive ECP model was obtained with a sensitivity greater than 0.95 on 418 test samples. In addition, the prediction performance of an ECP model was strongly influenced by the quality of the training samples. Taken together, these results make the ECP approach an attractive methodology for the virtual screening of lead compounds, especially when few positive samples are available.
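To make the classification and evaluation steps concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each compound has already been encoded as a set of discretized descriptor items (item names such as `logP=high` are hypothetical). It mines simplified "jumping" emerging patterns, i.e. itemsets that occur in at least one sample of one class and in no sample of the other, up to a hypothetical size limit `max_len`. It then scores a test compound by summing the class supports of the patterns it contains and reports sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP). The toy data are invented for illustration and mirror only the 3-positive/3-negative setting mentioned above; they are not taken from the P-gp dataset.

```python
from itertools import combinations


def jumping_patterns(target, other, max_len=2):
    """Itemsets of size <= max_len found in some `target` sample but in no `other` sample."""
    found = set()
    for sample in target:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(sample), k):
                itemset = frozenset(combo)
                # Keep the itemset only if no sample of the other class contains it.
                if not any(itemset <= s for s in other):
                    found.add(itemset)
    return found


def support(itemset, samples):
    """Fraction of samples that contain every item of `itemset`."""
    return sum(itemset <= s for s in samples) / len(samples)


def train(pos, neg, max_len=2):
    """Map each class's jumping patterns to their support within that class."""
    pos_patterns = {p: support(p, pos) for p in jumping_patterns(pos, neg, max_len)}
    neg_patterns = {p: support(p, neg) for p in jumping_patterns(neg, pos, max_len)}
    return pos_patterns, neg_patterns


def predict(model, sample):
    """Return 1 (inhibitor) or 0 (non-inhibitor) by comparing summed supports of matched patterns."""
    pos_patterns, neg_patterns = model
    pos_score = sum(w for p, w in pos_patterns.items() if p <= sample)
    neg_score = sum(w for p, w in neg_patterns.items() if p <= sample)
    # Assumption: ties (including no matched pattern at all) go to the positive class.
    return 1 if pos_score >= neg_score else 0


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP). Labels: 1 positive, 0 negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 3 positive and 3 negative training compounds as descriptor-item sets.
pos = [
    frozenset({"logP=high", "MW=high", "HBD=low"}),
    frozenset({"logP=high", "MW=mid", "HBD=low"}),
    frozenset({"logP=high", "MW=high", "HBD=mid"}),
]
neg = [
    frozenset({"logP=low", "MW=low", "HBD=high"}),
    frozenset({"logP=low", "MW=mid", "HBD=high"}),
    frozenset({"logP=mid", "MW=low", "HBD=mid"}),
]
test = [
    frozenset({"logP=high", "MW=mid", "HBD=mid"}),
    frozenset({"logP=low", "MW=low", "HBD=low"}),
]
y_test = [1, 0]

model = train(pos, neg)
preds = [predict(model, s) for s in test]
print(preds)  # [1, 0]
print(sensitivity_specificity(y_test, preds))  # (1.0, 1.0)
```

Breaking ties toward the positive class is a choice made for this sketch that favors sensitivity; it is not a detail reported in the abstract. A fuller treatment would typically also normalize the class scores, since the two classes can yield very different numbers of patterns.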