Analysis and Study of Molecule Data Sets Using Snowflake Diagrams of Weighted Maximum Common Subgraph Trees

Authors: Cerruela Garcia Gonzalo; Luque Ruiz Irene; Angel Gomez Nieto Miguel
Source: Journal of Chemical Information and Modeling, 2011, 51(6): 1216-1232.
DOI: 10.1021/ci100484z

Abstract

Isomorphism measures based on the maximum common subgraph (MCS) calculation are widely used in computational chemistry for classifying, screening, and predicting properties and biological activity within chemical databases. This paper describes the development of a weighted hierarchical structure based on the MCS. Furthermore, a 2D representation model is proposed as a suitable tool for the study and preliminary analysis of molecule data sets. The development process of the weighted MCS tree is open to the use of different approaches. By incorporating different molecular descriptors and similarity and distance measures into the weighted MCS tree, one can observe the relationships between the molecular property or activity and the variables used to build and display the tree. In addition, the representation model based on snowflake diagrams displays those relationships and reveals any existing degeneration, which helps to detect possible outliers that could arise during the development of predictive models and to extract new variables that can be used in building quantitative structure-activity relationship models.
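
As an illustration of what an MCS-based similarity measure can look like, the sketch below computes a Tanimoto-style coefficient from the sizes of two molecular graphs and of their MCS. The abstract does not state which similarity or distance measures the authors use, so the formula choice and the function names `mcs_tanimoto` and `mcs_distance` are assumptions made for this example only. Sizes are taken as given (for instance atom or bond counts); computing the MCS itself is outside the scope of the sketch.

```python
def mcs_tanimoto(size_a: int, size_b: int, size_mcs: int) -> float:
    """Tanimoto-style similarity from graph sizes (hypothetical helper).

    size_a, size_b: sizes of the two molecular graphs (e.g. atom counts).
    size_mcs: size of their maximum common subgraph, measured the same way.
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("graph sizes must be positive")
    if size_mcs < 0 or size_mcs > min(size_a, size_b):
        raise ValueError("MCS size must lie between 0 and the smaller graph size")
    # Shared part divided by the size of the union of both graphs.
    return size_mcs / (size_a + size_b - size_mcs)


def mcs_distance(size_a: int, size_b: int, size_mcs: int) -> float:
    """Distance derived from the similarity above (assumed: 1 - similarity)."""
    return 1.0 - mcs_tanimoto(size_a, size_b, size_mcs)


if __name__ == "__main__":
    # Two molecules with 10 and 8 atoms sharing a 6-atom common subgraph.
    print(mcs_tanimoto(10, 8, 6))
    print(mcs_distance(10, 8, 6))
```

A value of this kind could serve as an edge or node weight in a hierarchical structure built on the MCS, but how the paper actually assigns weights is not specified in the abstract.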

  • Publication date: 2011-6