Abstract

An improved QSPR method based on the support vector machine (SVM), combining rational sample data selection with genetic algorithm (GA)-controlled optimization of the training parameters, was developed to study the standard Gibbs free energy of formation of 78 acyclic alkanes. The SVMs were trained with the standard regression algorithm based on quadratic programming theory, and the Gaussian radial basis kernel function (RBKF) was employed in the training process. Eight well-known topological indices were used as structural descriptors for each alkane molecule and were treated as the potential input variables for the proposed QSPR models. By simultaneously optimizing, via GA, the epsilon parameter of the insensitive loss function, the penalty factor C, the parameter a of the RBKF, and the input variable representations, a novel QSPR approach combining GA and SVM was proposed to improve the prediction results for the independent external test samples. With the independent external test samples selected randomly prior to QSPR model development, an improved SVM-based predictive modeling method was obtained by rationally selecting the training and internal test data sets with the sphere exclusion algorithm and by optimizing the SVM training parameters with the proposed GA method.

For comparison, partial least squares (PLS) regression was also applied to the experimental data set as an alternative QSPR modeling tool. Moreover, to verify the improved modeling method more generally, two mathematically simulated QSPR data sets were built to confirm its validity.
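
To illustrate the kind of rational data selection that sphere exclusion performs, the following is a minimal sketch of one common variant, not necessarily the one used in this work. The function name `sphere_exclusion`, the Euclidean distance, the radius value, and the toy two-dimensional descriptors are all assumptions made for illustration; the actual study uses topological indices as descriptors.

```python
import math


def sphere_exclusion(points, radius):
    """Split sample indices into (selected, excluded).

    Illustrative variant: take the first remaining sample as a sphere
    center, keep it as "selected", and move every remaining sample lying
    within `radius` of that center to "excluded". Repeat until no
    samples remain.
    """
    remaining = list(range(len(points)))
    selected, excluded = [], []
    while remaining:
        center = remaining.pop(0)
        selected.append(center)
        still_remaining = []
        for idx in remaining:
            # Euclidean distance in descriptor space (an assumption).
            if math.dist(points[center], points[idx]) <= radius:
                excluded.append(idx)
            else:
                still_remaining.append(idx)
        remaining = still_remaining
    return selected, excluded


# Hypothetical two-dimensional descriptors for five samples.
descriptors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (2.0, 0.0)]

# Sphere centers could serve as training samples, and their close
# neighbours as internal test samples.
train_idx, internal_test_idx = sphere_exclusion(descriptors, radius=0.2)
print(train_idx, internal_test_idx)
```

In this sketch each internal test sample lies close to at least one training sample in descriptor space, which is the usual motivation for preferring such a split over a purely random one.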