摘要

In order to define an applicability domain for quantitative structure-activity relationship modeling, a combinational strategy of model disturbance and outlier comparison is developed. An indicator named model disturbance index was defined to estimate the prediction error. Moreover, the information of the outliers in the training set was used to filter the unreliable samples in the test set based on "structural similarity". Chromatography retention indices data were used to investigate this approach. The relationship between model disturbance index and prediction error can be found. Also, the comparison between the outlier set and the test set could provide additional information about which unknown samples should be paid more attentions. A novel technique based on model population analysis was used to evaluate the validity of applicability domain. Finally, three commonly used methods, i.e. Leverage, descriptor range-based and model perturbation method, were compared with the proposed approach.