Abstract

Approaches that can predict the biological activity or properties of a chemical compound are an important application of machine learning. In this paper, we introduce a new kernel function for measuring the similarity between chemical compounds and for learning their related properties and activities. The method is based on local atom pair environments, which can be computed rapidly by using the topological all-shortest-paths matrix and the geometrical distance matrix of a molecular graph as lookup tables. The local atom pair environments are stored in prefix search trees, so-called tries, for efficient comparison. The kernel can be computed either as an optimal assignment kernel or as a corresponding convolution kernel over all local atom similarities. We implemented the Tanimoto kernel, the min kernel, the minmax kernel and the dot product kernel as local kernels, which are computed recursively by traversing the tries.
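The kernel construction described above can be sketched in Python as follows. This is a minimal illustrative sketch, not the authors' implementation, and it rests on several assumptions. Each atom's local environment is taken to be the multiset of hypothetical descriptors (label_i, label_j, d_ij) for all partner atoms j within a hypothetical `radius`, with d_ij read from the topological all-shortest-paths matrix. The geometrical distance matrix is omitted. The Tanimoto, min, minmax and dot product kernels use their standard count-vector forms. The optimal assignment is found by brute force over permutations rather than by an efficient assignment solver such as the Hungarian algorithm. All names (`TrieNode`, `environments`, `local_kernel`, `convolution_kernel`, `optimal_assignment_kernel`) are hypothetical.

```python
from collections import deque
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple


class TrieNode:
    """Prefix-tree node; `count` = number of descriptors ending exactly here."""
    def __init__(self) -> None:
        self.children: Dict[object, "TrieNode"] = {}
        self.count = 0


def shortest_path_matrix(n: int, bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Topological all-shortest-paths matrix via BFS from every atom (-1 = unreachable)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[-1] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] < 0:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def atom_pair_trie(i: int, labels: Sequence[str], dist: List[List[int]], radius: int) -> TrieNode:
    """Hypothetical local environment of atom i: one path (label_i, label_j, d_ij) per partner j."""
    root = TrieNode()
    for j, label_j in enumerate(labels):
        d = dist[i][j]  # the distance matrix serves as a lookup table; no graph search here
        if j == i or d < 0 or d > radius:
            continue
        node = root
        for symbol in (labels[i], label_j, d):
            node = node.children.setdefault(symbol, TrieNode())
        node.count += 1
    return root


def environments(labels: Sequence[str], bonds: Sequence[Tuple[int, int]], radius: int = 3) -> List[TrieNode]:
    dist = shortest_path_matrix(len(labels), bonds)
    return [atom_pair_trie(i, labels, dist, radius) for i in range(len(labels))]


def _walk(a: Optional[TrieNode], b: Optional[TrieNode], acc: Dict[str, int]) -> None:
    """Recursively traverse the union of two tries, accumulating count statistics."""
    ca = a.count if a else 0
    cb = b.count if b else 0
    acc["dot"] += ca * cb
    acc["min"] += min(ca, cb)
    acc["max"] += max(ca, cb)
    acc["aa"] += ca * ca
    acc["bb"] += cb * cb
    for key in set(a.children if a else ()) | set(b.children if b else ()):
        _walk(a.children.get(key) if a else None, b.children.get(key) if b else None, acc)


def local_kernel(a: TrieNode, b: TrieNode, kind: str = "minmax") -> float:
    acc = dict.fromkeys(("dot", "min", "max", "aa", "bb"), 0)
    _walk(a, b, acc)
    if kind == "dot":
        return float(acc["dot"])
    if kind == "min":
        return float(acc["min"])
    if kind == "minmax":
        return acc["min"] / acc["max"] if acc["max"] else 0.0
    if kind == "tanimoto":
        denom = acc["aa"] + acc["bb"] - acc["dot"]
        return acc["dot"] / denom if denom else 0.0
    raise ValueError(f"unknown local kernel: {kind}")


def convolution_kernel(envs_a: Sequence[TrieNode], envs_b: Sequence[TrieNode], kind: str = "minmax") -> float:
    """Sum of local similarities over all atom pairs (i, j)."""
    return sum(local_kernel(x, y, kind) for x in envs_a for y in envs_b)


def optimal_assignment_kernel(envs_a: Sequence[TrieNode], envs_b: Sequence[TrieNode], kind: str = "minmax") -> float:
    """Best one-to-one atom matching; brute force, so only usable for tiny molecules."""
    if len(envs_a) > len(envs_b):
        envs_a, envs_b = envs_b, envs_a
    sim = [[local_kernel(x, y, kind) for y in envs_b] for x in envs_a]
    return max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(len(envs_b)), len(envs_a))
    )


# Hydrogen-suppressed graphs: ethanol C-C-O and methanol C-O.
ethanol = environments(["C", "C", "O"], [(0, 1), (1, 2)])
methanol = environments(["C", "O"], [(0, 1)])
print(optimal_assignment_kernel(methanol, ethanol))  # 1.0
```

Storing the environments in tries lets shared descriptor prefixes (same central atom label, same partner label) be visited once during comparison. The dot and min kernels only depend on paths that both tries share. The minmax and Tanimoto kernels also need per-trie totals, so this sketch walks the union of both tries.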
We tested the approach on eight structure-activity and structure-property benchmark data sets of molecules from the literature. The models were trained with ε-support vector regression and support vector classification. In a direct comparison, the local atom pair kernels proved at least competitive with state-of-the-art kernels in seven out of eight cases. A comparison against literature

  • Publication date: 2010-12
  • Institution: Shanghai Center for Bioinformation Technology