Imbalance Data Processing Strategy for Protein Interaction Sites Prediction

Wang, Bing; Mei, Changqing; Wang, Yuanyuan; Zhou, Yuming; Cheng, Mu-Tian; Zheng, Chun-Hou; Wang, Lei; Zhang, Jun; Chen, Peng; Xiong, Yan

doi:10.1109/TCBB.2019.2953908

Abstract: Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites helps researchers understand life activities and is therefore useful for drug design. However, the number of experimentally determined protein interaction sites is far smaller than the number of protein sites in protein-protein interactions or protein complexes. As a result, the negative and positive samples are usually imbalanced, which is common but biases the prediction of protein interaction sites by computational approaches. In this work, we present three imbalanced-data processing strategies to reconstruct the original dataset, and then extract protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalanced-data processing strategies clearly reduce the prediction bias and therefore improve the prediction performance for protein interaction sites. The experimental results show that our prediction models achieve better prediction performance, such as a prediction accuracy of 0.758 or an F-measure of 0.737, which demonstrates the effectiveness of our method.
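
The abstract does not name the three strategies, so the following is only a minimal sketch of one common way to rebalance a dataset (random undersampling of the majority class), together with the F-measure used as an evaluation metric. The function names `random_undersample` and `f_measure`, the 0/1 label encoding, and the toy 20/80 class split are assumptions made for illustration; they are not taken from the paper.

```python
import random


def random_undersample(labels, seed=0):
    """Return sorted indices of a balanced subset.

    Keeps every minority-class sample and an equally sized random
    sample of the majority class. Labels are assumed to be 0 or 1.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 20 interface (1) and 80 non-interface (0) residues.
labels = [1] * 20 + [0] * 80

# A predictor that always answers "non-interface" looks accurate
# on the imbalanced set but never finds a single interface residue.
all_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, all_negative)) / len(labels)
biased_f = f_measure(labels, all_negative)

# Rebalance before training: equal numbers of both classes remain.
balanced_idx = random_undersample(labels, seed=42)
balanced_labels = [labels[i] for i in balanced_idx]
```

On the toy data, the always-negative predictor has high accuracy but an F-measure of zero, which is the kind of prediction bias the abstract refers to; after undersampling, both classes are equally represented in the training subset.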

Output ID:
22031244462
Category:
Journal article
Keywords:
Proteins; Feature extraction; Amino acids; Data mining; Surface treatment; Support vector machines; Entropy; Protein interaction sites; imbalanced data; conservative features; prediction performance; prediction bias
Authors:
Wang, Bing^*; Mei, Changqing; Wang, Yuanyuan; Zhou, Yuming; Cheng, Mu-Tian; Zheng, Chun-Hou; Wang, Lei; Zhang, Jun; Chen, Peng; Xiong, Yan
Journal Name:
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Status:
Published
Publication Date/Period:
2021-5
Issue No.:
3
Volume No.:
18
Pages:
985-994
DOI:
10.1109/TCBB.2019.2953908
Times Cited:
21
Funding:
National Natural Science Foundation of China [61472282, 61672035, 61872004, 61873221, 61672447]; Anhui Province Funds for Excellent Youth Scholars in Colleges [gxyqZD2016068]; Co-Innovation Center for Information Supply & Assurance Technology in AHU [ADXXBZ201705]; Anhui Scientific Research Foundation for Returned Scholars; Natural Science Foundation of Hunan Province [2018JJ4058, 2017JJ5036]
Last update date:
2024/04/23 17:23:21