An evaluation of experimental design in QSAR modelling utilizing the k-medoid clustering

Brandmaier Stefan*; Tetko Igor V; Oberg Tomas

doi:10.1002/cem.2459

Abstract

A reliable selection of a representative subset of chemical compounds has been reported to be crucial for numerous tasks in computational chemistry and chemoinformatics. We investigated the usability of an approach based on the k-medoid algorithm for this task, in particular for experimental design and for the split between training and validation sets. We therefore compared the performance of models derived from such a selection with that of models derived using several other approaches, such as space-filling design and D-optimal design. We validated the performance on four datasets with different endpoints, representing toxicity, physicochemical properties and others. Compared with the models derived from the compounds selected by the other examined approaches, those derived with the k-medoid selection show a high reliability for experimental design, as their performance was consistently among the best for all examined datasets. Of the models derived with all examined approaches, those derived with the k-medoid approach were the only ones that showed a significantly improved performance compared with a random selection for all datasets, over the whole examined range of selected compounds, and for each dimensionality of the search space.
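To illustrate the general idea of a k-medoid based selection, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state the descriptors, distance metric, or algorithm variant used, so the Euclidean distance, the simple alternating (assign, then update) scheme, the function name `k_medoid_select`, and the random descriptor matrix are all assumptions made for illustration. The selected medoids are actual compounds, which is what makes them usable as a training set.

```python
import numpy as np

def k_medoid_select(X, k, n_iter=100, seed=0):
    """Return sorted row indices of k medoids of X (hypothetical helper).

    Uses a simple alternating k-medoids scheme with Euclidean distance;
    both choices are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all compounds in descriptor space.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each compound to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        # Each medoid stays in its own cluster, so no cluster is ever empty.
        labels[medoids] = np.arange(k)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            # New medoid: the member with the smallest summed distance
            # to all other members of the same cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids no longer change
        medoids = new_medoids
    return np.sort(medoids)

# Illustrative data: 50 hypothetical compounds with 3 descriptors each.
X = np.random.default_rng(1).normal(size=(50, 3))
train_idx = k_medoid_select(X, k=10)                       # compounds to model on
valid_idx = np.setdiff1d(np.arange(len(X)), train_idx)     # remaining compounds
```

The alternating scheme only finds a local optimum that depends on the random start, so in practice one might repeat it with several seeds and keep the selection with the lowest total distance of compounds to their medoids.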

Publication date: 2012-10

