Abstract

Understanding and simulating the folding behavior of simple two-state proteins is one of the most important challenges in biology. In this work, a large pool of protein descriptors was developed, including amino acid descriptors, topological descriptors, and charged partial surface area descriptors. A heuristic method was employed to select the significant features. As a result, three descriptors were chosen: total contact distance, unfolding entropy change, and total charge weighted partial negative surface area. Total contact distance proved to be the most relevant factor controlling the folding behavior. Based on these descriptors, a support vector machine method was applied to build a predictive model. Compared with the statistical results of other works and methods, the support vector machine method exhibited the best overall performance. Our results demonstrate that native-state topology is the major determinant of the folding rates of two-state proteins and that the support vector machine method is a powerful tool for building predictive models. In turn, the method and the knowledge gained can be used to develop in silico predictive models that simulate the protein folding process.
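
As an illustration of the topological descriptor highlighted above, the following minimal sketch computes a total contact distance from residue coordinates. It assumes the commonly used definition, in which the sequence separations |i - j| of all contacting residue pairs are summed and divided by the square of the chain length. The function name, the 5.0 distance cutoff, the minimum sequence separation of 2, and the toy coordinates are all illustrative assumptions and are not taken from this work.

```python
import math


def total_contact_distance(coords, cutoff=5.0, min_separation=2):
    """Sum |i - j| over contacting residue pairs, divided by n^2.

    coords: one (x, y, z) point per residue, in sequence order.
    cutoff and min_separation are illustrative defaults (assumptions).
    """
    n = len(coords)
    total = 0
    for i in range(n):
        # Skip pairs closer in sequence than min_separation.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += j - i
    return total / (n * n)


# Hypothetical six-residue hairpin: two antiparallel strands 4 units apart.
hairpin = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 4, 0), (4, 4, 0), (0, 4, 0)]
# Hypothetical fully extended chain with the same residue spacing.
extended = [(4 * k, 0, 0) for k in range(6)]

tcd_hairpin = total_contact_distance(hairpin)
tcd_extended = total_contact_distance(extended)
print(tcd_hairpin, tcd_extended)
```

In this toy case the hairpin forms two long-range contacts (residues 0 and 5, residues 1 and 4), whereas the extended chain forms none, so the descriptor separates the two topologies. A value computed this way could serve as one input feature to a regression model such as a support vector machine.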