A scalable algorithm for molecular property estimation in high dimensional scaffold-based libraries

Authors: Izmailov Sofia; Feng XiaoJiang; Li Genyuan; Rabitz Herschel*
Source: Journal of Mathematical Chemistry, 2012, 50(7): 1765-1790.
DOI: 10.1007/s10910-012-0005-y

Abstract

An algorithm is presented for the estimation of molecular properties over a library built around a scaffold, which has $N$ sites for functionalization with $M_i$ moieties at the $i$th scaffold site, corresponding to a library of $\prod_{i=1}^{N} M_i$ molecules. The algorithm relies on a series of operations involving (i) synthesis and property measurement of a minimal number $T$ of randomly sampled members of the library, (ii) expression of the observed property in terms of a high-dimensional model representation (HDMR) of the moiety $\to$ property map, (iii) optimization of the ordered sequence of moieties on each site to regularize the HDMR map, and (iv) interpolation using the map to estimate the properties of as yet unsynthesized compounds. The set of operations is performed iteratively, aiming to reach convergence of the predictive HDMR map with as few synthesized samples as possible. Through simulation, the number $T$ of required random molecular samples is shown to scale very favorably, with $T \ll \prod_{i=1}^{N} M_i$, for cases up to $N = 20$ and $M_i = 20$. For example, high estimation quality was attained for simulated libraries with $T \sim 5{,}000$ sampled compounds for a library of $20^{12}$ members and $T \sim 12{,}500$ sampled compounds for a library of $20^{20}$ members. The algorithm is based on the assumption that a systematic pattern exists in the moiety $\to$ property map, provided that the moieties are optimally ordered on the scaffold sites within the context of HDMR. The overall procedure is referred to as the substituent reordering HDMR algorithm (SR-HDMR). The technique was also successfully tested with laboratory data for estimating $^{13}$C-NMR shifts in a tri-substituted benzene library and for lac operon repression binding.
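Steps (i) to (iv) can be illustrated with a deliberately simplified sketch. It is not the authors' SR-HDMR implementation. It assumes a purely additive synthetic property, keeps only the zeroth- and first-order HDMR terms, and replaces the paper's ordering optimization with a simple sort of the moieties by their estimated first-order contribution. All function names, the library size (N = 8, M = 8), and the sample count T = 4000 are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def fit_first_order(samples):
    """Estimate f0 and first-order components from (moieties, value) pairs.

    components[i][m] is the mean deviation from f0 over the samples that
    carry moiety m at site i (a sample-mean estimate, assumed here).
    """
    n_sites = len(samples[0][0])
    f0 = sum(y for _, y in samples) / len(samples)
    sums = [defaultdict(float) for _ in range(n_sites)]
    counts = [defaultdict(int) for _ in range(n_sites)]
    for x, y in samples:
        for i, m in enumerate(x):
            sums[i][m] += y - f0
            counts[i][m] += 1
    comps = [{m: sums[i][m] / counts[i][m] for m in sums[i]}
             for i in range(n_sites)]
    return f0, comps

def reorder(comps):
    """Heuristic stand-in for step (iii): sort moieties by component value."""
    return [sorted(c, key=c.get) for c in comps]

def predict(f0, comps, x):
    """Step (iv): first-order estimate; unseen moieties contribute zero."""
    return f0 + sum(comps[i].get(m, 0.0) for i, m in enumerate(x))

# Synthetic library: N sites, M moieties per site, additive hidden property.
N, M, T = 8, 8, 4000
rng = random.Random(0)
contrib = [[rng.uniform(-1.0, 1.0) for _ in range(M)] for _ in range(N)]

def true_property(x):
    return sum(contrib[i][m] for i, m in enumerate(x))

def random_molecule():
    return tuple(rng.randrange(M) for _ in range(N))

# Step (i): "measure" T randomly sampled members of the M**N library.
samples = []
for _ in range(T):
    x = random_molecule()
    samples.append((x, true_property(x)))

# Steps (ii) and (iii): fit the truncated expansion, then order the moieties.
f0, comps = fit_first_order(samples)
order = reorder(comps)

# Step (iv): estimate properties of molecules drawn independently of the fit.
test_set = [random_molecule() for _ in range(500)]
mae = sum(abs(predict(f0, comps, x) - true_property(x))
          for x in test_set) / len(test_set)
print(f"library size {M**N}, sampled {T}, mean abs error {mae:.3f}")
```

In this toy setting the sampled set is a small fraction of the 8^8 library, and the sorted ordering makes each site's first-order component monotone in the moiety index. The abstract's reordering step serves a related purpose, regularizing the map so that interpolation is meaningful, although the actual optimization criterion is not given in this record.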

  • Publication date: 2012-8