Embedded feature selection accounting for unknown data heterogeneity

Lu, Meng*

doi:10.1016/j.eswa.2018.11.006

Abstract

Data heterogeneity, caused by unknown or unwanted factors introduced during data collection, is one of the major challenges in modern data analysis. When traditional feature selection methods, which simply assume that samples are independent and identically distributed, are applied to such data, the estimated variable effects can be spurious. Some existing statistical models evaluate the significance of each variable more accurately by estimating the unknown factors and including them as covariates, but they are filter methods and therefore suffer from variable redundancy and lack predictive power. We therefore propose an embedded feature selection method, formulated from a sparse learning perspective, that adjusts for unknown heterogeneity. We assess it by the multi-class classification performance obtained with the selected features. Benefiting from the effective adjustment for unknown heterogeneity and its model selection strategy, our method consistently outperformed several conventional embedded methods and existing statistical models in experiments on synthetic data and three real-world benchmark data sets.
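The abstract describes the approach only at a high level. As a rough illustration of the general idea (estimate hidden factors, then let them act as covariates inside a sparse, embedded multi-class model), here is a minimal sketch. Every modelling choice in it is an assumption and not the paper's method: the hidden factors are estimated from an SVD of label-adjusted residuals, and selection uses a row-sparse (l2,1 / group-lasso) multi-class least-squares model fitted by proximal gradient descent, with the estimated factors as unpenalised covariates. The function names `estimate_latent_factors` and `group_lasso_select` and the parameters `r`, `lam_ratio` and `n_iter` are hypothetical.

```python
import numpy as np


def estimate_latent_factors(X, Y, r):
    """Estimate r hidden factors from the variation in X not explained by the labels.

    Assumption (not the paper's estimator): regress each centred feature on the
    centred one-hot labels and keep the top-r left singular vectors of the residuals.
    """
    B, *_ = np.linalg.lstsq(Y, X, rcond=None)  # per-feature label effects
    R = X - Y @ B                              # residuals carry the unwanted variation
    U, _, _ = np.linalg.svd(R, full_matrices=False)
    return U[:, :r]                            # orthonormal n x r factor estimates


def group_lasso_select(X, y, n_classes, r=1, lam_ratio=0.3, n_iter=500):
    """Hypothetical illustration: row-sparse multi-class regression adjusted for hidden factors."""
    n = X.shape[0]
    Y = np.eye(n_classes)[y]                   # one-hot labels, n x K
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)      # centring absorbs the intercept
    Z = estimate_latent_factors(Xc, Yc, r)
    # With a least-squares loss, unpenalised covariates Z are equivalent to
    # projecting Z out of both X and Y (Frisch-Waugh-Lovell).
    Xt = Xc - Z @ (Z.T @ Xc)
    Yt = Yc - Z @ (Z.T @ Yc)
    # Smallest penalty for which W = 0 is optimal; lam is set relative to it.
    lam_max = np.linalg.norm(Xt.T @ Yt, axis=1).max() / n
    lam = lam_ratio * lam_max
    step = n / np.linalg.norm(Xt, 2) ** 2      # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(n_iter):                    # proximal gradient (ISTA)
        V = W - step * (Xt.T @ (Xt @ W - Yt) / n)
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        # Group soft-thresholding: a feature's whole row of class weights is kept or zeroed.
        W = V * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return np.flatnonzero(np.linalg.norm(W, axis=1) > 0), W


# Toy data (synthetic, for illustration only): features 0..K-1 each mark one class,
# and one hidden batch-like factor shifts every feature.
rng = np.random.default_rng(0)
n, p, K = 240, 40, 3
y = np.arange(n) % K
X = rng.normal(size=(n, p))
X[np.arange(n), y] += 3.0
z = rng.normal(size=n)
X += 2.0 * np.outer(z, rng.normal(size=p))
selected, W = group_lasso_select(X, y, K, r=1)
print(selected)
```

On this toy data one would expect the three class-marking features to be among those retained. Because the penalty acts on whole rows of `W`, a feature is either dropped for all classes or kept, which is what makes the sparse model itself perform the selection (the embedded setting) rather than scoring features one at a time as a filter would.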

Publication date: 2019-4-1
Affiliation: Tianjin University

