Abstract

Distinguishing mutated cancer-causing driver genes from genes that carry only passenger mutations is crucial for developing cancer diagnostics and therapeutics, and many previous studies have sought to identify cancer driver genes from the somatic mutation data of specific cancer types. However, many driver genes are overlooked when only the mutation data of a single cancer type are investigated, which complicates the understanding of tumorigenesis. According to recent studies, cancers arising in different organs share many genomic mutations, and some driver genes that are mutated at low frequency in patients with one cancer type may show considerable mutation frequencies across patients with multiple cancer types. Taking into account both the similarities between the mutation profiles of different cancer types and gene interaction network information, we propose a novel unsupervised learning model based on matrix tri-factorization that learns these similarities from pairwise constraints in order to detect driver genes from pan-cancer data. In evaluations on known benchmark genes, our model outperforms existing matrix factorization-based methods that do not consider pairwise similarities between cancer types. Furthermore, our model substantially improves detection performance compared with existing methods (area under the precision-recall curve = 9.1% for the Vogelstein genes). Moreover, our model identifies several driver genes that have been reported in recently published studies, showing its potential for nominating driver gene candidates for further wet-lab experimental verification.
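The abstract does not state the model's equations. As a hedged illustration only, the sketch below shows a generic graph-regularized non-negative matrix tri-factorization (NMTF) of a gene × cancer-type mutation matrix X ≈ F S Gᵀ, in which a gene interaction network `A` and a cancer-type similarity matrix `W` enter as graph-Laplacian penalties on F and G. It is not the authors' model: here `W` is assumed to be a fixed input rather than learned from pairwise constraints, and the objective, the multiplicative update rules, and the names `nmtf_graph_reg`, `lam_g` and `lam_c` are assumptions chosen for illustration. The step that ranks candidate driver genes from the factors is omitted because the abstract does not specify it.

```python
import numpy as np


def _degree(M):
    # Diagonal degree matrix of a symmetric, non-negative adjacency matrix.
    return np.diag(M.sum(axis=1))


def objective(X, F, S, G, A, W, lam_g, lam_c):
    """Reconstruction error plus two graph-Laplacian smoothness penalties."""
    R = X - F @ S @ G.T
    L_gene = _degree(A) - A        # Laplacian of the gene interaction network
    L_cancer = _degree(W) - W      # Laplacian of the cancer-type similarity graph
    return (np.sum(R ** 2)
            + lam_g * np.trace(F.T @ L_gene @ F)
            + lam_c * np.trace(G.T @ L_cancer @ G))


def nmtf_graph_reg(X, A, W, k1, k2, lam_g=0.1, lam_c=0.1,
                   n_iter=200, seed=0, eps=1e-10):
    """Hypothetical sketch: factorize X (genes x cancer types) as F @ S @ G.T.

    A: gene-gene adjacency (n x n); W: cancer-cancer similarity (m x m).
    Both are assumed symmetric and non-negative; W is a fixed input here,
    NOT learned from pairwise constraints as in the described model.
    Returns F, S, G and the objective recorded before and after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    D_A, D_W = _degree(A), _degree(W)
    history = [objective(X, F, S, G, A, W, lam_g, lam_c)]
    for _ in range(n_iter):
        # Multiplicative updates: each gradient is split into its positive and
        # negative parts, so non-negative factors stay non-negative.
        SGt = S @ G.T
        F *= (X @ SGt.T + lam_g * A @ F) / (F @ SGt @ SGt.T + lam_g * D_A @ F + eps)
        FS = F @ S
        G *= (X.T @ FS + lam_c * W @ G) / (G @ FS.T @ FS + lam_c * D_W @ G + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        history.append(objective(X, F, S, G, A, W, lam_g, lam_c))
    return F, S, G, history


if __name__ == "__main__":
    # Hypothetical toy data: 30 genes, 6 cancer types.
    rng = np.random.default_rng(1)
    X = rng.random((30, 6))
    A = np.triu((rng.random((30, 30)) < 0.1).astype(float), 1)
    A = A + A.T                                  # symmetric gene network
    W = rng.random((6, 6))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)                     # no self-similarity edges
    F, S, G, hist = nmtf_graph_reg(X, A, W, k1=4, k2=3, n_iter=100)
    print(f"objective: {hist[0]:.3f} -> {hist[-1]:.3f}")
```

Multiplicative updates are used here because they keep all three factors non-negative without a projection step; a model that learns `W` from pairwise constraints would need an additional update for `W` that this sketch does not attempt.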