摘要

With advances in microarray technology, many biomarkers selection approaches have been proposed for cancer diagnosis. Marker sets are selected by scoring genes for how well they can discriminate between different classes of diseases [1-4] or are ranked by significance analysis without reference to classification tasks. However there is a pressing need for methods integrating biological priori knowledge in the gene selection process. In this study, we proposed to identify genes primarily in terms of diagnostic outcome relevance. As gene expression is a combination effect, with the help of SVD, the microarray data is decomposed, the eigenvectors correspond to the biological effect of clinical outcomes are identified. Genes which play important roles in determining this biological effect are detected. Therefore, genes are essentially identified in terms of the strength of association with clinical outcomes and the relationship of genes and clinical outcomes is analyzed. Monte Carlo simulations are then used to fine tune the selected gene set in terms of classification accuracy. The approach was tested on four public data sets. Comparative studies show that the selected genes achieved higher classification accuracies. Graphical analysis visualizes that they have close relationship with the cancer class. Statistical simulation shows that the gene set found by the proposed method is also less variable and comparatively invariant to external influences. The biological relevance of the selected genes is further discussed and validated with the literature study and analysis of biological databases.

全文