Abstract

Aptamers have shown great potential for research, clinical, and industrial applications. A critical step toward realizing these applications is obtaining high-affinity aptamers specific to the targets of interest. To facilitate the selection of aptamers generated by the systematic evolution of ligands by exponential enrichment (SELEX) process, we propose Apta-LoopEnc, a novel nucleic acid sequence encoding strategy that extracts secondary structural features of candidate sequences by analyzing the fine substructures in their loop regions. Because the unique loop structures of aptamers determine their interaction with targets, directly encoding the central loop structures captures properties related to aptamer binding affinity. In addition, the nucleotide composition of each sequence is used as a descriptor in Apta-LoopEnc to further reduce the similarity between the descriptions of different sequences. The feasibility of Apta-LoopEnc for sequence encoding is demonstrated in a study identifying high-affinity aptamers against human hepatocellular carcinoma cells. The results indicate that Apta-LoopEnc significantly improves the performance of different pattern recognition models. Using an Apta-LoopEnc-based support vector machine (SVM) to predict a set of newly designed candidate sequences beyond SELEX further demonstrates the potential of the proposed sequence encoding and prediction strategy to aid high-performance aptamer design and optimization computationally in an easy, time-saving, and cost-effective way, thereby promoting the development of aptamer-related studies and applications.
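
As a rough illustration of the idea of combining loop-region features with nucleotide composition, the following Python sketch builds a small feature vector from a sequence and its secondary structure. It is not the Apta-LoopEnc encoding itself, whose details are not given here. It assumes the secondary structure is already available in dot-bracket notation from an external folding tool, and the function names (`nucleotide_composition`, `hairpin_loops`, `encode`) and the choice of features are hypothetical.

```python
import re
from collections import Counter


def nucleotide_composition(seq):
    """Return the fractions of A, C, G, T in seq (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    if not seq:
        return [0.0, 0.0, 0.0, 0.0]
    counts = Counter(seq)
    return [counts[base] / len(seq) for base in "ACGT"]


def hairpin_loops(seq, dot_bracket):
    """Return the bases of each hairpin loop.

    A hairpin loop is taken here to be a run of unpaired positions ('.')
    enclosed directly by an opening '(' and a closing ')'.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return [seq[m.start(1):m.end(1)]
            for m in re.finditer(r"\((\.+)\)", dot_bracket)]


def encode(seq, dot_bracket):
    """Illustrative feature vector (an assumption, not the published encoding):
    whole-sequence composition, number of hairpin loops, length of the
    longest loop, and composition of that loop."""
    loops = hairpin_loops(seq, dot_bracket)
    longest = max(loops, key=len, default="")
    return (nucleotide_composition(seq)
            + [len(loops), len(longest)]
            + nucleotide_composition(longest))


if __name__ == "__main__":
    # Toy stem-loop: three G-C pairs closing a four-base loop.
    print(encode("GGGAAAACCC", "(((....)))"))
```

Vectors of this kind could then be passed to a standard classifier such as an SVM. The features actually used by Apta-LoopEnc may differ substantially from this sketch.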