A Framework for Unbiased Model Selection Based on Boosting

Hofner Benjamin<sup>*</sup>; Hothorn Torsten; Kneib Thomas; Schmid Matthias

doi:10.1198/jcgs.2011.09220

Abstract

Variable and model selection are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference even if the categorical covariate is noninformative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference of the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for bias correction based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations. Supplemental materials including an application to forest health models, additional simulation results, additional theorems, and proofs for the theorems are available online.
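The selection bias described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it looks only at the first boosting iteration with squared-error loss, both covariates are noninformative, the penalized base-learner is a ridge fit on centered dummy variables, and degrees of freedom are measured as trace(2S − SᵀS) for the smoother matrix S. The df definition, all function names, and all simulation settings are illustrative choices that the abstract does not specify.

```python
import numpy as np

def df_ridge(X, lam):
    # Degrees of freedom of a ridge smoother S = X (X'X + lam I)^-1 X',
    # measured as trace(2S - S'S) (an assumed definition for this sketch).
    d = np.clip(np.linalg.eigvalsh(X.T @ X), 0.0, None)
    s = d / (d + lam)                     # shrinkage factor per eigen-direction
    return float(np.sum(2 * s - s ** 2))

def lambda_for_df(X, target_df, lo=1e-8, hi=1e8, iters=80):
    # Bisection on the log scale; df_ridge decreases as lam grows.
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if df_ridge(X, mid) > target_df:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

def rss_after_fit(X, y, lam):
    # Residual sum of squares after one (penalized) least squares fit.
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    r = y - X @ beta
    return float(r @ r)

def categorical_selection_rate(equal_df, n_rep=200, n=100, n_cat=10, seed=0):
    # Fraction of replications in which the categorical base-learner wins
    # the first boosting step although neither covariate is informative.
    rng = np.random.default_rng(seed)
    picked = 0
    for _ in range(n_rep):
        x = rng.normal(size=n)                 # continuous covariate
        g = rng.integers(0, n_cat, size=n)     # categorical covariate
        y = rng.normal(size=n)                 # response unrelated to both
        y = y - y.mean()
        Xc = (x - x.mean()).reshape(-1, 1)     # linear base-learner, 1 df
        Z = np.eye(n_cat)[g]
        Z = Z - Z.mean(axis=0)                 # centered dummy coding
        # Tiny ridge (1e-8) stands in for the unpenalized least squares fit.
        lam = lambda_for_df(Z, 1.0) if equal_df else 1e-8
        picked += rss_after_fit(Z, y, lam) < rss_after_fit(Xc, y, 0.0)
    return picked / n_rep

if __name__ == "__main__":
    print("unpenalized categorical base-learner:", categorical_selection_rate(False))
    print("categorical base-learner with df = 1:", categorical_selection_rate(True))
```

In this setup the unpenalized categorical base-learner should be chosen in nearly every replication, while matching its degrees of freedom to the linear base-learner should reduce that preference. Matching the df equalizes the expected reduction in residual sum of squares, not its whole distribution, so the rate need not reach exactly one half.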

Publication date: 2011-12
Institution: Hebei Medical University
