Abstract

Feature screening and variable selection are fundamental in the analysis of ultrahigh-dimensional data, which are being collected at relatively low cost in diverse scientific fields. Distance correlation-based sure independence screening (DC-SIS) has been proposed to perform feature screening for such data. DC-SIS possesses the sure screening property and filters out unimportant predictors in a model-free manner. Like all independence screening methods, however, it fails to detect truly important predictors that are marginally independent of the response because of correlations among the predictors. Moreover, when many irrelevant predictors are highly correlated with some strongly active predictors, independence screening may miss other active predictors whose marginal signals are relatively weak. To improve the performance of DC-SIS, we introduce an effective iterative procedure based on distance correlation, designed to detect all truly important predictors, and potentially interactions, in both linear and nonlinear models. Because it is built on distance correlation, the proposed iterative method retains the favourable model-free and robust properties. We further illustrate its excellent finite-sample performance through comprehensive simulation studies and an empirical analysis of the rat eye expression data set.
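To make the marginal screening step concrete, the following is a minimal sketch of how DC-SIS ranks predictors: each predictor is scored by its squared sample distance correlation with the response, computed from double-centred pairwise distance matrices, and the top d predictors are kept. It illustrates only the marginal (non-iterative) step that the proposed procedure builds on; the iterative algorithm itself is not specified here and is not reproduced. The function names (`distance_correlation_sq`, `dc_sis`) and the model size `d` are hypothetical choices for illustration, and the implementation is deliberately naive (O(n²) memory per predictor).

```python
import numpy as np


def _double_centred_distances(v):
    """Pairwise Euclidean distance matrix of the rows of v, double-centred."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)  # 1-D input -> shape (n, 1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    # Subtract column means and row means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation_sq(x, y):
    """Squared sample distance correlation (V-statistic form); lies in [0, 1]."""
    a = _double_centred_distances(x)
    b = _double_centred_distances(y)
    dcov2 = (a * b).mean()                             # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())   # product of distance std devs
    return 0.0 if denom == 0 else float(dcov2 / denom)


def dc_sis(X, y, d):
    """Hypothetical helper: score every column of X against y by squared
    distance correlation and return the indices of the d largest scores."""
    scores = np.array([distance_correlation_sq(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:d]  # indices sorted by descending score
    return top, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 500
    X = rng.standard_normal((n, p))
    # X[:, 0] enters only through its square, so its Pearson correlation with y
    # is near zero, yet distance correlation can still detect the dependence.
    y = X[:, 0] ** 2 + 2 * X[:, 1] + 0.5 * rng.standard_normal(n)
    top, scores = dc_sis(X, y, d=10)
    print(top)
```

The usage block prints the indices of the ten highest-ranked predictors; one would hope to see columns 0 and 1 among them. The design choice of squared distance correlation as the ranking utility is what makes the screen model-free: no regression model for y given X is assumed. Predictors that are marginally independent of y but jointly important receive near-zero scores here, which is exactly the failure mode an iterative procedure is meant to address.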