Abstract

Recently, the enormous number of databases generated in almost every aspect of life has created a great demand for powerful new tools that turn data into useful information, which has encouraged researchers to explore and develop new machine learning ideas and methods. Mixture models are one such technique and have received considerable attention because they can handle multidimensional data efficiently and effectively. In this paper, we present a solution to two challenging issues: modeling non-Gaussian data and determining the set of relevant features in the data. Non-Gaussian data, which are common in computer vision, image processing, medical, and bioinformatics applications, are modeled by a generative infinite Gamma mixture model. The Gamma distribution is chosen for its ability to handle long-tailed distributions, which lets it approximate data with outliers well. The proposed model, which can be viewed as a Dirichlet process mixture of Gamma distributions, also addresses the feature selection problem by determining a set of relevant features for each data cluster, which improves interpretability and generalization. We then propose an efficient algorithm that learns the parameters of this infinite model by estimating all posterior quantities of interest with Markov chain Monte Carlo (MCMC) simulations. As a result, our algorithm performs model selection, parameter learning, and feature selection simultaneously, in a single step, for the Gamma mixture model. Furthermore, we show how the model can be used in two challenging applications, medical image classification and gene expression classification, and compare it with other popular models from the literature.
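To make the idea of a "Dirichlet process mixture of Gamma distributions with per-cluster feature relevance" more concrete, the sketch below simulates only the generative side of such a model. It does not reproduce the authors' MCMC learning algorithm. Cluster assignments follow the Chinese restaurant process, a standard representation of the Dirichlet process with concentration `alpha`. The feature-relevance mechanism is an assumption: each feature of a new cluster is relevant with a hypothetical probability `relevance` and then gets cluster-specific Gamma parameters; otherwise it falls back to a shared "background" Gamma. The function name `sample_dp_gamma_mixture`, the background parameters, and the parameter ranges are all illustrative choices, not taken from the paper.

```python
import random


def sample_dp_gamma_mixture(n, dim, alpha=1.0, relevance=0.7, seed=0):
    """Hypothetical generative sketch of a DP mixture of Gamma distributions.

    Returns (labels, data): labels[i] is the cluster of point i and data[i]
    is a list of `dim` positive Gamma-distributed feature values.
    """
    rng = random.Random(seed)
    # Assumed shared (shape, scale) used by every irrelevant feature.
    background = [(2.0, 1.0)] * dim
    params = []  # params[k][l] = (shape, scale) of cluster k, feature l
    counts = []  # counts[k] = number of points assigned to cluster k
    labels, data = [], []
    for i in range(n):
        # Chinese restaurant process: join cluster k with probability
        # counts[k] / (i + alpha), open a new one with alpha / (i + alpha).
        u = rng.uniform(0.0, i + alpha)
        k, acc = len(counts), 0.0
        for j, c in enumerate(counts):
            acc += c
            if u < acc:
                k = j
                break
        if k == len(counts):
            # New cluster: each feature is relevant with probability `relevance`.
            feats = []
            for l in range(dim):
                if rng.random() < relevance:
                    # Illustrative cluster-specific (shape, scale) ranges.
                    feats.append((rng.uniform(1.0, 10.0), rng.uniform(0.5, 3.0)))
                else:
                    feats.append(background[l])
            params.append(feats)
            counts.append(0)
        counts[k] += 1
        labels.append(k)
        # random.gammavariate(alpha, beta) treats beta as the scale parameter.
        data.append([rng.gammavariate(a, b) for a, b in params[k]])
    return labels, data


labels, data = sample_dp_gamma_mixture(200, 3, alpha=1.0, seed=1)
print(len(set(labels)), "clusters generated for", len(data), "points")
```

The number of clusters is not fixed in advance; it grows with the data at a rate controlled by `alpha`, which is what allows an infinite mixture to perform model selection. An MCMC learner would run this process in reverse, repeatedly resampling cluster labels, Gamma parameters, and relevance indicators given the observed data.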

  • Publication date: 2015-1