Abstract

Variable screening is widely used in ultrahigh-dimensional data analysis. A typical procedure ranks the predictor variables by their marginal correlation with the response variable and then screens out those that are weakly correlated or uncorrelated with it. Although such approaches have proven effective, the performance of most of them depends on a pre-determined threshold for the number of selected predictor variables, usually an integer multiple of [n/log(n)], where n is the sample size. To circumvent this issue, we propose a novel data-driven variable screening procedure that determines the threshold automatically. In our proposal, the predictor variables are ranked by the p-values of modified independence tests, with smaller p-values indicating stronger correlation with the response. Extensive simulation studies and a real genetic data example indicate that our procedure compares favorably with its existing counterparts.
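
For orientation, the conventional fixed-threshold screening that motivates this work can be sketched as follows. This is only an illustration under stated assumptions, not the data-driven procedure proposed here: it assumes absolute Pearson correlation as the marginal measure, reads [n/log(n)] as the integer part of n/log(n), and uses a hypothetical function name (`marginal_correlation_screening`), a hypothetical `multiple` parameter, and simulated data.

```python
import math
import numpy as np


def marginal_correlation_screening(X, y, multiple=1):
    """Keep the d predictors with the largest absolute marginal correlation.

    d = multiple * floor(n / log(n)) is the pre-determined threshold
    (assumption: [.] in the text denotes the integer part).
    """
    n, p = X.shape
    d = min(p, multiple * int(n / math.log(n)))

    # Center the columns of X and the response.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Absolute Pearson correlation of each predictor with the response.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns get correlation 0 instead of a division by zero.
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)

    # Rank predictors from most to least correlated and keep the top d.
    order = np.argsort(-corr)
    return order[:d], corr


# Simulated example (assumption): only the first two predictors matter.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

selected, corr = marginal_correlation_screening(X, y)
print(len(selected))
```

The number of retained predictors here depends only on n and the chosen multiple, not on the data, which is the limitation that a data-driven threshold is meant to address.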