Abstract

Support vector machines (SVMs) are among the most popular machine learning methods for compound classification and other chemoinformatics tasks, such as the prediction of ligand-target pairs or compound activity profiles. Depending on the application, different SVM strategies can be used. For example, in potency-directed virtual screening, linear combinations of multiple SVM models have been shown to enrich database selection sets with potent compounds relative to individual models. An open question concerning SVM linear combinations (SVM-LCs) is how best to weight the individual models relative to one another. Typically, the linear weights are set subjectively. Herein, preferred weighting factors for SVM-LCs were determined systematically. To this end, weights were treated as meta-parameters and optimized by machine learning to enrich data set rankings with highly active compounds. The meta-parameter approach was applied to 10 screening data sets and found to further improve SVM performance over other SVM-LCs and support vector regression (SVR) models. The results show that optimal weights depend on data set characteristics and the chosen molecular representations. In addition, individual models often do not contribute to the performance of SVM-LCs. Taken together, these findings emphasize the need for systematic meta-parameter estimation.
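The idea of treating linear weights as meta-parameters can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not specify the optimization method, so a plain exhaustive grid search stands in for it here. The per-model scores, the potency labels, the weight grid, and all function names (`combine_scores`, `top_k_hits`, `grid_search_weights`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine_scores(score_lists, weights):
    """Return the weighted sum of per-model scores for each compound."""
    return [
        sum(w * s for w, s in zip(weights, per_compound))
        for per_compound in zip(*score_lists)
    ]


def top_k_hits(scores, is_potent, k):
    """Count highly active compounds among the k top-ranked compounds."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(is_potent[i] for i in ranked[:k])


def grid_search_weights(score_lists, is_potent, k, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on the grid and keep the first best one.

    Assumption: exhaustive grid search is a simple stand-in for the
    optimization step; the all-zero weight vector is skipped because it
    produces no ranking.
    """
    best_weights, best_hits = None, -1
    for weights in product(grid, repeat=len(score_lists)):
        if not any(weights):
            continue
        hits = top_k_hits(combine_scores(score_lists, weights), is_potent, k)
        if hits > best_hits:
            best_weights, best_hits = weights, hits
    return best_weights, best_hits


# Hypothetical toy data: scores from two models for six compounds.
model_a = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3]
model_b = [0.1, 0.9, 0.8, 0.2, 0.3, 0.4]
is_potent = [1, 1, 0, 0, 1, 0]  # 1 marks a highly active compound

best_weights, best_hits = grid_search_weights([model_a, model_b], is_potent, k=3)
print(best_weights, best_hits)
```

On this toy data the search assigns the second model a weight of zero, which mirrors the observation that individual models may not contribute to a combination. In practice the weights would be tuned on training data and evaluated on held-out compounds rather than on the same set.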

  • Publication date: 2015-2
