Abstract

Variable selection plays a central role in building forecasting models from high-dimensional data. However, selecting important variables efficiently and accurately from a large pool of candidates poses a critical challenge to researchers across scientific fields, including machine learning, genetics, medicine, and finance. In this paper, a novel approach to sparse estimation is proposed that combines the advantages of the square root loss function and a nonconvex penalty to obtain an interpretable model with high forecasting accuracy. In particular, the square root loss function allows the regularization parameter to be chosen without reference to the noise level, which is notoriously difficult to estimate as the number of variables increases; the nonconvex penalty is shown to be superior to convex penalties in terms of selection consistency, especially when the number of variables exceeds the sample size. On the computational side, a fast and simple-to-implement algorithm is developed with a theoretical guarantee of convergence. Furthermore, an accelerated gradient method is employed to speed up convergence, and the proposed algorithm is shown to scale well to high-dimensional data. Simulation studies with diverse sample sizes, dimensions, correlation coefficients, and noise levels, together with real data examples focusing on the inbred mouse microarray gene selection problem, demonstrate the efficiency and efficacy of the proposed approach compared with existing competitors.
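
For concreteness, the estimator described above takes the general form of a penalized square root loss; the following is a sketch, since the abstract does not fix the specific nonconvex penalty, and $p_\lambda$ here stands for a generic choice such as SCAD or MCP:
$$
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p} \; \frac{\lVert y - X\beta \rVert_2}{\sqrt{n}} \;+\; \sum_{j=1}^{p} p_\lambda\!\left(|\beta_j|\right),
$$
where $y \in \mathbb{R}^n$ is the response, $X \in \mathbb{R}^{n \times p}$ is the design matrix with $p$ possibly exceeding $n$, and $\lambda > 0$ is the regularization parameter. Because the square root loss is self-normalizing with respect to the residual scale, $\lambda$ can be tuned in a pivotal manner, without first estimating the noise level $\sigma$.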