摘要

With high-dimensional data emerging in various domains, sparse logistic regression models have gained much interest of researchers. Variable selection plays a key role in both improving the prediction accuracy and enhancing the interpretability of built models. Bayesian variable selection approaches enjoy many advantages such as high selection accuracy, easily incorporating many kinds of prior knowledge and so on. Because Bayesian methods generally make inference from the posterior distribution with Markov Chain Monte Carlo (MCMC) techniques, however, they become intractable in high-dimensional situations due to the large searching space. To address this issue, a novel variational Bayesian method for variable selection in high-dimensional logistic regression models is presented. The proposed method is based on the indicator model in which each covariate is equipped with a binary latent variable indicating whether it is important. The Bernoulli-type prior is adopted for the latent indicator variable. As for the specification of the hyperparameter in the Bernoulli prior, we provide two schemes to determine its optimal value so that the novel model can achieve sparsity adaptively. To identify important variables and make predictions, one efficient variational Bayesian approach is employed to make inference from the posterior distribution. The experiments conducted with both synthetic and some publicly available data show that the new method outperforms or is very competitive with some other popular counterparts.