Abstract

Analyzing a proteomics dataset that contains a large number of independent variables (biomarkers), few response variables, and many missing values can be very challenging. The authors tackle the problem by first exploring different imputation techniques to treat the missing values and then investigating multiple selection techniques to pick the best set of biomarkers for predicting the disease status of patients whose status is unknown. They conclude their analysis by cross-validating the different combinations of imputation and selection techniques (using the set of patients of known disease status) in order to find the best-performing combination for the supplied dataset.

  • Publication date: 2011-6