Development of a new regression analysis method using independent component analysis

Authors: Kaneko Hiromasa; Arakawa Masamoto; Funatsu Kimito*
Source: Journal of Chemical Information and Modeling, 2008, 48(3): 534-541.
DOI: 10.1021/ci700245f

Abstract

In this paper, independent component analysis (ICA) and regression analysis are combined to extract significant components. ICA is a method that extracts mutually independent components from explanatory variables. The relationship between the independent components and an objective variable is then modeled by the least-squares method. This method is named ICA-MLR (MLR = multiple linear regression). We verified the superiority of ICA-MLR over partial least squares (PLS) with simulation data and applied the method to a quantitative structure-property relationship analysis of aqueous solubility. We constructed models between aqueous solubility and 173 molecular descriptors. PLS and genetic algorithm PLS models were constructed for comparison with ICA-MLR. The R², Q², and R²_pred values of the PLS model are 0.836, 0.819, and 0.848, respectively. The corresponding values of the ICA-MLR model are 0.937, 0.868, and 0.894. ICA-MLR thus achieved higher predictive accuracy than PLS: it could extract effective components from the explanatory variables and construct a regression model with high predictive accuracy. In addition, the regression coefficients b_ICA-MLR indicate the magnitude of the contribution of each descriptor in the analysis of aqueous solubility.
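The two-step procedure described in the abstract (ICA on the explanatory variables, then least squares on the extracted components) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's FastICA as the ICA algorithm, uses synthetic data invented for illustration, and fixes the number of components at 2 by hand. The back-mapping of the coefficients to the original variables is one plausible reading of b_ICA-MLR, not a definition taken from the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not from the paper): two independent
# non-Gaussian sources mixed linearly into five correlated variables.
n_samples = 300
S_true = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
A = rng.normal(size=(2, 5))  # mixing matrix
X = S_true @ A + 0.01 * rng.normal(size=(n_samples, 5))
y = 2.0 * S_true[:, 0] - 1.0 * S_true[:, 1] + 0.05 * rng.normal(size=n_samples)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Step 1: extract independent components from the explanatory variables.
# FastICA is an assumed choice; the abstract does not name the algorithm.
ica = FastICA(n_components=2, random_state=0, max_iter=1000)
S_train = ica.fit_transform(X_train)
S_test = ica.transform(X_test)

# Step 2: ordinary least squares between the components and the objective.
mlr = LinearRegression().fit(S_train, y_train)

r2_train = r2_score(y_train, mlr.predict(S_train))
r2_test = r2_score(y_test, mlr.predict(S_test))

# Express the model in terms of the original variables. Since the
# components are S = (X - mean) @ W.T with W = ica.components_, the
# coefficient vector on the centered X is b = W.T @ c.
b = ica.components_.T @ mlr.coef_
y_test_from_b = (X_test - ica.mean_) @ b + mlr.intercept_

print("R2 (train):", round(r2_train, 3))
print("R2 (external test):", round(r2_test, 3))
print("coefficients on original variables:", np.round(b, 3))
```

In practice the number of components would be chosen by cross-validation rather than fixed in advance, and the descriptors would typically be autoscaled before ICA; both steps are omitted here to keep the sketch short.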

  • Publication date: 2008-3