摘要

A quantitative structure-activity relationship was developed to predict the efficacy of carbon adsorption as a control technology for endocrine-disrupting compounds, pharmaceuticals, and components of personal care products, as a tool for water quality professionals to protect public health. Here, we expand previous work to investigate a broad spectrum of molecular descriptors including subdivided surface areas, adjacency and distance matrix descriptors, electrostatic partial charges, potential energy descriptors, conformation-dependent charge descriptors, and Transferable Atom Equivalent (TAE) descriptors that characterize the regional electronic properties of molecules. We compare the efficacy of linear (Partial Least Squares) and non-linear (Support Vector Machine) machine learning methods to describe a broad chemical space and produce a user-friendly model. We employ cross-validation, y-scrambling, and external validation for quality control. The recommended Support Vector Machine model trained on 95 compounds having 23 descriptors offered a good balance between good performance statistics, low error, and low probability of over-fitting while describing a wide range of chemical features. The cross-validated model using a log-uptake (q(e)) response calculated at an aqueous equilibrium concentration (C-e) of 1 M described the training dataset with an r(2) of 0.932, had a cross-validated r(2) of 0.833, and an average residual of 0.14 log units.

  • 出版日期2016