Abstract

Loops are essential structural elements of protein molecules and play important roles in protein function, e.g., in signaling and ligand recognition. Unlike the well-defined secondary structures (helices and sheets), however, loop regions are structurally variable, and some of them cannot even be resolved by ordinary experimental methods. For these reasons, computer-aided prediction of loop structures has become a hotspot in bioinformatics and biophysics, and a variety of algorithms have been developed for this purpose. So far, however, the prediction of long loops remains a challenge. Among the commonly used algorithms, LEAP achieves the highest accuracy on long-loop prediction. Our investigation of LEAP on a test data set reveals that the final loop structure it predicts is almost entirely determined by the initial sampling of the loop backbone conformation. If all backbone conformations in the initial sampling are far from the real (native) conformation, the final predicted structure is also far from the native conformation, and the prediction accuracy cannot be improved noticeably simply by increasing the computation time. In the original LEAP, the initial sampling is based on a rough distribution of the backbone torsion angles (the Ramachandran plot, R-plot), which does not take the sequence of the loop region into account. As a result, many of the sampled conformations are likely to be far from the native conformation. This raises an open question: can the initial sampling be made more targeted toward the native conformation? In this paper, we propose introducing position-specific amino-acid sequence information into the initial sampling of the backbone conformation, which may generate more targeted initial decoys.
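To make the sequence-independent initial sampling concrete, the following is a minimal Python sketch of drawing one (phi, psi) pair per loop residue from a single coarse R-plot. It is not LEAP's code: the 30° bin width, the toy two-basin weights in `toy_rplot`, and the function names are hypothetical and chosen only for illustration. Every residue is sampled from the same distribution, so amino-acid identity plays no role.

```python
import math
import random

BIN = 30            # hypothetical bin width in degrees
NBINS = 360 // BIN  # 12 bins per axis


def toy_rplot():
    """Toy, sequence-independent R-plot as {(phi_bin, psi_bin): weight}.
    The two Gaussian basins (near alpha-helix and beta-strand angles)
    and their widths are made-up illustrative values."""
    plot = {}
    for i in range(NBINS):
        for j in range(NBINS):
            phi = -180 + (i + 0.5) * BIN
            psi = -180 + (j + 0.5) * BIN
            w = 0.01  # small floor so every bin keeps some probability
            w += math.exp(-((phi + 60) ** 2 + (psi + 45) ** 2) / (2 * 25 ** 2))
            w += math.exp(-((phi + 120) ** 2 + (psi - 130) ** 2) / (2 * 30 ** 2))
            plot[(i, j)] = w
    return plot


def sample_torsions(plot, n_residues, rng):
    """Draw one (phi, psi) pair per loop residue from the same R-plot,
    regardless of amino-acid identity."""
    bins = list(plot)
    weights = [plot[b] for b in bins]
    angles = []
    for _ in range(n_residues):
        i, j = rng.choices(bins, weights=weights, k=1)[0]
        # Pick a uniform point inside the chosen bin.
        phi = -180 + (i + rng.random()) * BIN
        psi = -180 + (j + rng.random()) * BIN
        angles.append((phi, psi))
    return angles


if __name__ == "__main__":
    decoy = sample_torsions(toy_rplot(), 12, random.Random(0))
    for phi, psi in decoy:
        print(f"{phi:7.1f} {psi:7.1f}")
```

Because the same table is used at every position, a 12-residue decoy drawn this way can only reflect the overall backbone preferences, which is the limitation the paragraph above describes.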
We use SPINE X, an algorithm for protein secondary structure prediction, to generate rough but reasonable sequence-dependent estimates of the backbone torsion angles of each residue in the loop. We then combine these estimates with the original R-plot to construct a new R-plot for each residue in the loop, and perform the initial sampling according to the new R-plots. We applied this refined algorithm to a test set of loops extracted from single-chain proteins in CASP10 and found that the median/mean RMSDs decrease by about 0.12 Å/0.13 Å, 0.25 Å/0.27 Å, and 0.47 Å/0.27 Å for loops of length 10, 11, and 12, respectively. Compared with the original LEAP algorithm, the refined algorithm almost doubles the probability of obtaining a more accurate prediction. The logic of our approach is not limited to LEAP and can be extended to other algorithms that also depend significantly on the initial sampling.
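The per-residue R-plot construction can be sketched in Python as follows, under explicit assumptions. The abstract does not give the exact rule for combining the SPINE X estimates with the original R-plot, so this sketch assumes a simple mixture: each residue's plot is `(1 - lam)` times the normalized generic R-plot plus `lam` times a Gaussian kernel on wrapped angular distances, centred on that residue's predicted (phi, psi). The names `residue_rplot` and `sample_loop`, the parameters `lam` and `sigma`, and the 30° bins are hypothetical; `predictions` merely stands in for SPINE X output and is passed in as a list of angle pairs.

```python
import math
import random

BIN = 30            # hypothetical bin width in degrees
NBINS = 360 // BIN  # 12 bins per axis


def bin_center(k):
    """Centre (degrees) of bin k on either torsion axis."""
    return -180 + (k + 0.5) * BIN


def angle_diff(a, b):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180) % 360 - 180


def normalise(plot):
    total = sum(plot.values())
    return {b: w / total for b, w in plot.items()}


def residue_rplot(generic, predicted, lam=0.5, sigma=30.0):
    """Hypothetical per-residue R-plot: mixture of the generic R-plot and
    a Gaussian kernel (on wrapped angle differences) centred on the
    predicted (phi, psi). lam and sigma are assumed tuning parameters."""
    phi0, psi0 = predicted
    kernel = {}
    for (i, j) in generic:
        dphi = angle_diff(bin_center(i), phi0)
        dpsi = angle_diff(bin_center(j), psi0)
        kernel[(i, j)] = math.exp(-(dphi ** 2 + dpsi ** 2) / (2 * sigma ** 2))
    g, k = normalise(generic), normalise(kernel)
    return {b: (1 - lam) * g[b] + lam * k[b] for b in generic}


def sample_loop(generic, predictions, rng, lam=0.5, sigma=30.0):
    """Draw one (phi, psi) per residue, each from its own R-plot."""
    angles = []
    for pred in predictions:
        plot = residue_rplot(generic, pred, lam, sigma)
        bins = list(plot)
        i, j = rng.choices(bins, weights=[plot[b] for b in bins], k=1)[0]
        # Uniform point inside the chosen bin.
        angles.append((-180 + (i + rng.random()) * BIN,
                       -180 + (j + rng.random()) * BIN))
    return angles


if __name__ == "__main__":
    uniform = {(i, j): 1.0 for i in range(NBINS) for j in range(NBINS)}
    preds = [(-75.0, -45.0), (-120.0, 130.0)]  # stand-in predicted angles
    print(sample_loop(uniform, preds, random.Random(1)))
```

A mixture is used here because it keeps every region of the generic R-plot reachable, so a poor torsion-angle prediction narrows the sampling without ruling out the native basin; multiplying the two distributions instead would concentrate the samples more aggressively. Setting `lam=0` recovers the sequence-independent sampling.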

Full text