A note on dichotomization of continuous response variable in the presence of contamination and model misspecification

Shentu Yue; Xie Minge*

doi:10.1002/sim.3966

Abstract

The purpose of this note is to raise awareness of the complexities involved in the practice of dichotomization. It is well known that standard regression models are effective tools for analyzing Gaussian-type response variables, and researchers are often told that dichotomizing is a 'bad idea' when continuous measurements are available. Through special cases, however, we demonstrate that there is another side to the story when the response variable is contaminated. Although dichotomization causes a loss of information, it can also reduce the input of contamination. When this reduction outweighs the information loss, an analysis based on the dichotomized response can sometimes give better results. We derive formulas for the bias and variance of binary regression estimators under a contamination model with unknown additive errors, and compare them with both the least squares and the robust M-estimators from the corresponding linear regression analysis of the continuous responses. As a case study, we examine in detail the setting in which the observed response is contaminated by an error whose mean and variance are proportional to the mean and variance of the uncontaminated true response, and we obtain conditions under which dichotomization is preferred. A simulation study based on a real data setting supports the theoretical developments.
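To make the setup concrete, the Python sketch below simulates a simple linear model, adds an additive contamination error to the response, and then fits both a least squares line to the continuous response and a binary regression to the response dichotomized at a cutoff. Every modeling choice here is an assumption for illustration, not the authors' specification: the contamination error is taken to be normal with mean `lam` times the sample mean and variance `kappa` times the sample variance of the true response, the cutoff `c` is arbitrary, and a logistic (logit) regression fitted by Newton-Raphson stands in for a generic binary regression. The function names `simulate`, `ols_slope`, `logistic_fit`, and `dichotomize` are hypothetical.

```python
import math
import random
import statistics


def simulate(n, a, b, sigma, lam, kappa, seed=0):
    """Hypothetical setup: true response Y = a + b*X + N(0, sigma^2);
    observed Y* = Y + e, with e ~ N(lam * mean(Y), kappa * var(Y))."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0.0, sigma) for xi in x]
    mu, var = statistics.fmean(y), statistics.pvariance(y)
    # Assumed contamination: mean and variance proportional to those of Y.
    y_obs = [yi + rng.gauss(lam * mu, math.sqrt(kappa * var)) for yi in y]
    return x, y, y_obs


def ols_slope(x, y):
    """Least squares slope of y on x (continuous-response analysis)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_fit(x, z, iters=25):
    """Binary regression of z in {0, 1} on x by Newton-Raphson (logit link)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += zi - p            # score for the intercept
            g1 += (zi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add the inverse information matrix times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def dichotomize(y, c):
    """Indicator that the response exceeds the cutoff c."""
    return [1 if yi > c else 0 for yi in y]


if __name__ == "__main__":
    x, y, y_obs = simulate(n=2000, a=0.0, b=2.0, sigma=1.0, lam=0.2, kappa=0.5)
    c = 1.0  # arbitrary cutoff chosen for illustration
    z, z_obs = dichotomize(y, c), dichotomize(y_obs, c)
    changed = sum(zi != zo for zi, zo in zip(z, z_obs)) / len(z)
    print("OLS slope, clean vs contaminated:", ols_slope(x, y), ols_slope(x, y_obs))
    print("logit slope, clean vs contaminated:",
          logistic_fit(x, z)[1], logistic_fit(x, z_obs)[1])
    print("fraction of binary labels changed by contamination:", changed)
```

The least squares and binary regression slopes estimate different parameters, so a fair comparison of bias and variance would require repeating the simulation many times and putting the estimates on a common scale; this sketch only shows where the contamination enters each analysis and does not reproduce the paper's results.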

Publication date: 2010-09-20
