Abstract

Data heterogeneity, caused by unknown or unwanted factors introduced during data collection, is a major challenge in modern data analysis. Traditional feature selection methods, which simply assume that data samples are independent and identically distributed, yield spurious estimates of variable effects when applied to such data. Although some existing statistical models can evaluate the significance of each variable more accurately by estimating the unknown factors and including them as covariates, they are filter methods and therefore suffer from variable redundancy and limited predictive ability. We therefore propose an embedded feature selection method, formulated from a sparse learning perspective, that can adjust for unknown heterogeneity. We assess it by the classification performance achieved with the selected features on multi-class classification problems. Experimental results on synthetic data and three real-world benchmark data sets show that, benefiting from its effective adjustment for unknown heterogeneity and its model selection strategy, our method consistently outperforms several conventional embedded methods and existing statistical models.