摘要

There have been hundreds of genes demonstrated to be associated with lung squamous cell carcinoma (LSCC), presenting various degrees of association with this disease. In the present study, gene vectors were investigated as genetic biomarkers for the diagnosis and personalized treatment of LSCC. A LSCC genetic database (LSCC_GD) was developed through literature-associated data analysis, where 260 LSCC target genes were curated. Subsequently, numerous associations between these genes and LSCC were studied. Following this, a sparse representation-based variable selection (SRVS) was employed for gene selection from two LSCC gene expression datasets, followed by a case/control classification. Results were compared using analysis of variance (ANOVA)-based gene selection approaches. Using SRVS, a gene vector was selected from each dataset, resulting in significantly higher classification accuracy (CR), compared with randomly selected genes (For datasets GSE18842 and GSE1987, CR=100 and 100% and permutation P=5.0x10(-4) and 1.8x10(-3), respectively). The SRVS method outperformed ANOVA in terms of the classification ratio. The results indicated that, for a given dataset, there may be a gene vector from the 260 curated LSCC genes that possesses significant prediction power. SRVS is effective in identifying the optimum gene subset target for personalized treatment.