Abstract

A novel variable selection method named stability and variable permutation (SVP) is proposed, based on the evolutionary principles of 'intraspecific competition' and 'survival of the fittest'. In SVP, variables are selected in an iterative and competitive manner. In each iteration, Monte Carlo sampling (MCS) is run in the sample space to compute variable stability and in the variable space to perform variable permutation analysis. Variables are divided into elite variables and normal variables according to their stability by adaptive reweighted sampling (ARS). Then, in combination with variable permutation analysis, an exponentially decreasing function (EDF) is employed to select important variables from the normal variables. The elite variables and the important variables together form a new variable subset for the next iteration. After the selection iterations terminate, a number of sub-models are generated by Monte Carlo cross validation (MCCV) for each variable subset. The optimal variable subset is taken to be the one with the minimum mean and a relatively low standard deviation of the root mean square error of MCCV. The performance of SVP is evaluated on three near-infrared (NIR) datasets: a corn oil dataset, a diesel fuel total aromatics dataset and a wheat protein dataset. Compared with moving window PLS (MWPLS), Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS), stability competitive adaptive reweighted sampling (SCARS), variable permutation population analysis (VPPA) and genetic algorithm (GA), SVP shows better prediction results on these datasets.
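The final step, scoring each candidate variable subset by MCCV and keeping the one with the lowest mean error, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `mccv_rmse` and `pick_subset`, the synthetic data, the 80/20 split and the number of splits are all assumptions made for illustration. Ordinary least squares is used in place of PLS to keep the example self-contained, and the selection rule is simplified to the minimum mean RMSE, with the standard deviation reported but not used in the decision.

```python
import numpy as np

def mccv_rmse(X, y, columns, n_splits=100, test_fraction=0.2, seed=0):
    """Mean and standard deviation of RMSE over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = max(1, int(round(test_fraction * n)))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Assumption: OLS with an intercept stands in for the PLS sub-model.
        A_train = np.column_stack([np.ones(train.size), X[np.ix_(train, columns)]])
        A_test = np.column_stack([np.ones(test.size), X[np.ix_(test, columns)]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        residual = y[test] - A_test @ coef
        errors.append(np.sqrt(np.mean(residual ** 2)))
    return float(np.mean(errors)), float(np.std(errors))

def pick_subset(X, y, subsets, **kwargs):
    """Score every candidate subset and return the index with the lowest mean RMSE."""
    scores = [mccv_rmse(X, y, cols, **kwargs) for cols in subsets]
    best = min(range(len(subsets)), key=lambda i: scores[i][0])
    return best, scores

# Synthetic data (hypothetical): only variables 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = 2.0 * X[:, 0] - X[:, 3] + 0.05 * rng.normal(size=60)

subsets = [[0, 3], [1, 2], [0, 3, 5]]
best, scores = pick_subset(X, y, subsets, n_splits=50)
for cols, (mean_rmse, sd_rmse) in zip(subsets, scores):
    print(cols, round(mean_rmse, 4), round(sd_rmse, 4))
print("selected subset:", subsets[best])
```

In practice the standard deviation would also enter the choice, for example by preferring, among subsets with near-minimal mean RMSE, the one whose RMSE varies least across splits.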