Abstract

We consider the equivalent problems of estimating the residual variance, the proportion of explained variance, and the signal strength in a high-dimensional linear regression model with Gaussian random design. Our aim is to understand how not knowing the sparsity of the vector of regression coefficients, and not knowing the distribution of the design, affects the minimax estimation rates of the proportion of explained variance. Depending on the sparsity k of the vector of regression coefficients, optimal estimators of this quantity either rely on estimating the vector of regression coefficients or are based on U-type statistics. In the important situation where k is unknown, we build an adaptive procedure whose convergence rate simultaneously achieves the minimax risk over all k, up to a logarithmic loss that we prove to be unavoidable. Finally, knowledge of the design distribution is shown to play a critical role: when the distribution of the design is unknown, consistent estimation of the explained variance is possible only in much narrower regimes than when the design distribution is known.

  • Publication date: 2018-11