A Model-Based Joint Identification of Differentially Expressed Genes and Phenotype-Associated Genes

作者:Cho Samuel Sunghwan; Kim Yongkang; Yoon Joon; Seo Minseok; Shin Su kyung; Kwon Eun Young; Kim Sung Eun; Bae Yun Jung; Lee Seungyeoun; Sung Mi Kyung; Choi Myung Sook; Park Taesung*
来源:PLos One, 2016, 11(3): e0149086.
DOI:10.1371/journal.pone.0149086

摘要

Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs) among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest such as survival time are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs). However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach we used was a naive approach that detects DEGs and PAGs separately and then identifies the genes in an intersection of the list of PAGs and DEGs. The second approach we considered was a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method identifies genes simultaneously that are DEGs and PAGs. This method uses standard regression models but adopts different null hypothesis from ordinary regression models, which allows us to perform joint identification in one-step. The proposed model-based methods were evaluated using experimental data and simulation studies. The proposed methods were used to analyze a microarray experiment in which the main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. Model-based approaches provided a larger number of genes, which are both DEGs and PAGs, than other methods. Simulation studies showed that they have more power than other methods. Through analysis of data from experimental microarrays and simulation studies, the proposed model-based approach was shown to provide a more powerful result than the naive approach and the hierarchical approach. Since our approach is model-based, it is very flexible and can easily handle different types of covariates.

  • 出版日期2016-3-10