Abstract

Finding discriminative motifs has recently received much attention in biomedicine because such motifs characterize what distinguishes two different classes of sequences. In biomedical applications, labeled sequences are commonly very limited in number, while a large number of unlabeled sequences is usually available. Current discriminative motif finding methods are powerful and effective on large labeled datasets, but they do not perform well on small labeled datasets. In this paper, we present a semi-supervised ensemble method for finding discriminative motifs. It is based on the SLUPC algorithm, a separate-and-conquer search method that discovers motifs of the type 'discriminative one occurrence per sequence'. The proposed method, named E-SLUPC (Ensemble SLUPC), uses SLUPC to search for discriminative motifs in an extended labeled dataset that contains the labeled data together with unlabeled data carrying predicted labels. Strongly discriminative and frequent motifs characterizing two outcome classes of hepatitis C virus treatment (sustained viral response and non-sustained viral response) were detected and analyzed. Furthermore, the experimental evaluation shows that our method performs considerably well in the common medical research setting where labeled data are difficult to obtain.

  • Publication date: 2013