A Tabu Search Approach for the NMR Protein Structure-Based Assignment Problem

Authors: Cavuslar Gizem*; Catay Bulent; Apaydin Mehmet Serkan
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012, 9(6): 1621-1628.
DOI: 10.1109/TCBB.2012.122

Abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is an experimental technique that exploits the magnetic properties of specific nuclei and enables the study of proteins in solution. The key bottleneck of NMR studies is mapping the NMR peaks to their corresponding nuclei, known as the assignment problem. Structure-Based Assignment (SBA) is an approach to solving this computationally challenging problem by using prior information about the protein obtained from a homologous structure. NVR-BIP used the Nuclear Vector Replacement (NVR) framework to model SBA as a binary integer programming problem. In this paper, we prove that this problem is NP-hard and propose a tabu search (TS) algorithm (NVR-TS) equipped with a guided perturbation mechanism to solve it efficiently. NVR-TS uses a quadratic penalty relaxation of NVR-BIP in which violations of the Nuclear Overhauser Effect constraints are penalized in the objective function. Experimental results indicate that our algorithm finds the optimal solution on NVR-BIP's data set, which consists of seven proteins with 25 templates (31 to 126 residues). Furthermore, it achieves relatively high assignment accuracies on two additional large proteins, MBP and EIN (348 and 243 residues, respectively), which NVR-BIP failed to solve.

Abbreviations used: NMR, Nuclear Magnetic Resonance; NOE, Nuclear Overhauser Effect; RDC, Residual Dipolar Coupling; PDB, Protein Data Bank; SBA, Structure-Based Assignments; NVR, Nuclear Vector Replacement; BIP, Binary Integer Programming; TS, Tabu Search; QAP, Quadratic Assignment Problem; ff2, the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein); SPG, Streptococcal Protein G; hSRI, Human Set2-Rpb1 Interacting Domain; MBP, Maltose Binding Protein; EIN, Amino Terminal Domain of Enzyme I from Escherichia coli; EM, expectation maximization.
The executable and the input files are available for download at http://people.sabanciuniv.edu/catay/NVR-TS/NVR-TS.html.
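
To make the idea of a penalty relaxation combined with tabu search more concrete, the following is a minimal sketch and not the NVR-TS implementation. It assumes a toy setting in which each peak is assigned to exactly one residue, a hypothetical `score` matrix gives the cost of each peak-residue pairing, and a pair of peaks linked by an NOE counts as violated when their assigned residues are farther apart than a hypothetical `threshold`. All names (`penalized_cost`, `tabu_search`, `score`, `dist`, `threshold`, `penalty`, `tenure`) are illustrative assumptions, the neighborhood is a plain swap of two assignments, and the guided perturbation mechanism mentioned in the abstract is omitted.

```python
import itertools
import random


def penalized_cost(assign, score, noe_pairs, dist, threshold, penalty):
    """Assignment cost plus a penalty for every violated NOE pair.

    assign[p] is the residue assigned to peak p. The penalty term couples
    two assignments at a time, which is what makes the relaxed objective
    quadratic in the assignment variables (assumed toy formulation).
    """
    linear = sum(score[p][r] for p, r in enumerate(assign))
    violations = sum(
        1 for p, q in noe_pairs if dist[assign[p]][assign[q]] > threshold
    )
    return linear + penalty * violations


def tabu_search(score, noe_pairs, dist, threshold, penalty=10.0,
                iterations=200, tenure=5, seed=0):
    """Generic swap-based tabu search over one-to-one assignments."""
    rng = random.Random(seed)
    n = len(score)
    current = list(range(n))
    rng.shuffle(current)

    def cost(a):
        return penalized_cost(a, score, noe_pairs, dist, threshold, penalty)

    best, best_cost = current[:], cost(current)
    tabu = {}  # swap (i, j) -> last iteration at which it is still tabu

    for it in range(iterations):
        candidate, candidate_cost, move = None, float("inf"), None
        for i, j in itertools.combinations(range(n), 2):
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            c = cost(neighbor)
            is_tabu = tabu.get((i, j), -1) >= it
            # Aspiration: a tabu swap is allowed only if it beats the best.
            if is_tabu and c >= best_cost:
                continue
            if c < candidate_cost:
                candidate, candidate_cost, move = neighbor, c, (i, j)
        if candidate is None:
            break  # every swap is tabu and none improves the best
        # Move to the best admissible neighbor, even if it is worse.
        current = candidate
        tabu[move] = it + tenure
        if candidate_cost < best_cost:
            best, best_cost = candidate[:], candidate_cost
    return best, best_cost
```

In this sketch, infeasible assignments stay in the search space and are merely made expensive through `penalty`, so the search can pass through them on the way to better solutions. How the penalty weight, tabu tenure, and neighborhood are actually chosen in NVR-TS is not stated in the abstract.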

  • Publication date: 2012-12