Do Not Hesitate to Use Tversky-and Other Hints for Successful Active Analogue Searches with Feature Count Descriptors

Authors: Horvath Dragos*; Marcou Gilles; Varnek Alexandre
Source: Journal of Chemical Information and Modeling, 2013, 53(7): 1543-1562.
DOI: 10.1021/ci400106g

Abstract

This study is an exhaustive analysis of neighborhood behavior over a large, coherent data set (ChEMBL target/ligand pairs of known K_i, for 165 targets with >50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success as measured by the ascertained optimality index, a weighted compromise between the purity and the retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighting of query vs candidate features (represented as integer-valued ISIDA colored fragment/pharmacophore triplet count descriptor vectors). Nearly three-quarters of a million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are, by far, the most successful and the least failure-prone when compared with a set of nine other dissimilarity scores. These include the classical Tanimoto score, which failed to defend its privileged status in practical SVS applications. Tversky performance does not depend significantly on the tuning of its bias parameter alpha: both initial "guesses" of alpha = 0.9 and 0.7 were more successful than Tanimoto (which, in turn, outperformed Euclidean distance). Tversky was eventually tested in exhaustive similarity searching within the library of 1.6 M commercial + bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html, where it compared favorably to Tanimoto in terms of "scaffold hopping" propensity. Therefore, it should be used at least as often as, and perhaps in parallel with, Tanimoto in SVS. Analysis with respect to query subclasses highlighted relationships between query complexity (simply expressed in terms of pharmacophore pattern counts) and/or target nature on the one hand and SVS success likelihood on the other. SVS runs using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric). Yet, they are best handled by "pro-query" Tversky scores at alpha > 0.5.
Among simpler queries, one may distinguish between "growable" queries (allowing for active analogs with additional features) and a few "conservative" queries that do not allow any growth. The latter (typically bioactive amine transporter ligands) form the specific application domain of "pro-candidate" biased Tversky scores at alpha < 0.5.
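
The abstract does not give the scoring formula, so the following is only a minimal sketch of how an asymmetric Tversky score might be computed on integer count vectors. It assumes a common min/max generalization of Tversky similarity to counts, and it assumes that the candidate-side weight is 1 - alpha, so that alpha > 0.5 penalizes missing query features more heavily ("pro-query") and alpha < 0.5 penalizes extra candidate features more heavily ("pro-candidate"). The function names `tversky_counts` and `tanimoto_counts` are hypothetical, and the exact definition used in the paper may differ.

```python
from typing import Sequence


def _split_counts(query: Sequence[int], candidate: Sequence[int]):
    """Return (shared, query_only, candidate_only) feature counts."""
    if len(query) != len(candidate):
        raise ValueError("descriptor vectors must have the same length")
    shared = sum(min(q, c) for q, c in zip(query, candidate))
    query_only = sum(max(q - c, 0) for q, c in zip(query, candidate))
    candidate_only = sum(max(c - q, 0) for q, c in zip(query, candidate))
    return shared, query_only, candidate_only


def tversky_counts(query: Sequence[int], candidate: Sequence[int],
                   alpha: float = 0.9) -> float:
    """Tversky similarity on count vectors (assumed form, not the paper's).

    alpha weights features present only in the query; (1 - alpha) weights
    features present only in the candidate. This coupling is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    shared, query_only, candidate_only = _split_counts(query, candidate)
    denom = shared + alpha * query_only + (1.0 - alpha) * candidate_only
    # Convention chosen here: an empty denominator yields zero similarity.
    return shared / denom if denom > 0 else 0.0


def tanimoto_counts(query: Sequence[int], candidate: Sequence[int]) -> float:
    """Symmetric Tanimoto on count vectors, for comparison."""
    shared, query_only, candidate_only = _split_counts(query, candidate)
    denom = shared + query_only + candidate_only
    return shared / denom if denom > 0 else 0.0


if __name__ == "__main__":
    query = [2, 1, 0, 3]       # hypothetical query descriptor counts
    grown = [2, 1, 4, 3]       # candidate keeps all query features, adds more
    for a in (0.9, 0.7, 0.5, 0.1):
        print(f"alpha={a}: {tversky_counts(query, grown, a):.4f}")
    print(f"Tanimoto: {tanimoto_counts(query, grown):.4f}")
```

In this toy case the candidate contains every query feature plus four extra counts. A pro-query score (alpha = 0.9) barely penalizes the additions, whereas a pro-candidate score (alpha = 0.1) or the symmetric Tanimoto score ranks the same candidate much lower, which illustrates the "growable" vs "conservative" distinction drawn above. Note that under the assumed coupling, alpha = 0.5 corresponds to the Dice coefficient rather than to Tanimoto.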

  • Publication date: 2013-7