Abstract

In this paper we present a quasi-optimal sample set for ordinary least squares (OLS) regression. The quasi-optimal set is designed so that, for a given number of samples, it delivers a regression result as close as possible to the one obtained from a (much) larger set of candidate samples. The quasi-optimal set is determined by maximizing a quantity that measures the mutual column orthogonality and the determinant of the model matrix. This procedure is nonadaptive, in the sense that it does not depend on the sample data. This is useful in practice, as it allows one to determine where to sample the underlying system before the potentially expensive data collection procedure begins. In addition to presenting the theoretical motivation for the quasi-optimal set, we also present its efficient implementation via a greedy algorithm, along with several numerical examples that demonstrate its efficacy. Since the quasi-optimal set allows one to obtain a near-optimal regression result for any affordable number of samples, it notably outperforms other standard choices of sample sets. An immediate application of the quasi-optimal set is stochastic collocation for uncertainty quantification, where data collection usually requires expensive PDE solutions. It is demonstrated that OLS on the quasi-optimal set can deliver very competitive results compared to the generalized polynomial chaos Galerkin method, and yet it remains fully nonintrusive and is much more flexible for practical computations.
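To make the selection idea concrete, the following is a minimal illustrative sketch, not the paper's implementation. The abstract does not state the exact criterion, so the function `s_value` below is an assumption: it uses sqrt(det(AᵀA)) divided by the product of the column norms, raised to the power 1/p, which equals 1 for mutually orthogonal columns and 0 for a rank-deficient matrix. The naive greedy loop, the Legendre basis, the candidate count, and the names `s_value` and `greedy_select` are likewise hypothetical choices made for illustration. Note that the selection uses only the candidate model matrix and never the response data, which is what makes it nonadaptive.

```python
import numpy as np

def s_value(A):
    """Assumed criterion (not taken from the abstract):
    (sqrt(det(A^T A)) / product of column norms) ** (1/p)."""
    p = A.shape[1]
    norms = np.linalg.norm(A, axis=0)
    if np.any(norms == 0):
        return 0.0
    sign, logdet = np.linalg.slogdet(A.T @ A)
    if sign <= 0:  # singular (or numerically singular) Gram matrix
        return 0.0
    return float(np.exp((0.5 * logdet - np.log(norms).sum()) / p))

def greedy_select(candidates, m):
    """Greedily pick m row indices of the candidate model matrix.

    While fewer than p rows are selected, only the first k columns are
    used so that the criterion is not trivially zero (an assumption of
    this sketch).
    """
    n, p = candidates.shape
    chosen = []
    for k in range(1, m + 1):
        q = min(k, p)
        best, best_val = None, -1.0
        for i in range(n):
            if i in chosen:
                continue
            val = s_value(candidates[chosen + [i], :q])
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return chosen

# Hypothetical setup: 200 random candidate points on [-1, 1] and a
# degree-4 Legendre basis (p = 5 columns); no response data is needed.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
V = np.polynomial.legendre.legvander(x, 4)  # candidate model matrix
idx = greedy_select(V, 10)                  # indices of the 10 chosen samples
print(sorted(x[idx]))
print("criterion on selected subset:", s_value(V[idx]))
```

Once the indices are chosen, the (possibly expensive) system would be evaluated only at `x[idx]`, and an ordinary least squares fit, for example with `np.linalg.lstsq`, would be run on `V[idx]`. This brute-force loop recomputes a determinant for every candidate at every step; an efficient implementation would update the criterion incrementally instead.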

  • Publication date: 2016