Abstract

In applications such as feature-level sensor fusion, selecting an optimal number of sensors can reduce maintenance costs and allow compact online databases to be built for future use. Sensor selection can be reduced to selecting an optimal set of groups of features during model selection. This is more complex than selecting individual features, which is itself recognized as a key aspect of statistical model identification. This work proposes a new algorithm that uses a Bayesian framework to select groups of features in regression and classification. The hierarchical Bayesian formulation introduces grouping for the parameters of a generalized linear model, and the model hyper-parameters are estimated using an empirical Bayes procedure. A novel aspect of the algorithm is its ability to simultaneously perform feature selection within groups, which reduces over-fitting of the data. Further, the parameters obtained from the algorithm can be used to rank the selected sensors. The performance of the algorithm is first tested on a synthetic regression example. It is then applied to fault detection in diesel engines (30,000 data records from 43 sensors, 8 classes), where misclassification rates are compared for varying numbers of sensors.
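To make the idea of group-level selection with empirical Bayes more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a plain linear-Gaussian regression (rather than a general GLM), a zero-mean Gaussian prior with one shared precision per group of features (one group standing in for one sensor), and EM updates for the hyper-parameters. It omits the within-group feature selection described above. The function name `group_ard_regression`, the synthetic data, and the use of the learned group precisions as a ranking criterion are all illustrative assumptions.

```python
import numpy as np

def group_ard_regression(X, y, groups, n_iter=50):
    """EM (empirical Bayes) for linear regression with one prior precision per group.

    Assumed model (illustrative): y = X w + noise, noise ~ N(0, 1/beta),
    and w_j ~ N(0, 1/alpha_g) for every feature j in group g.
    """
    n, _ = X.shape
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    alpha = np.ones(len(group_ids))  # per-group prior precisions (hyper-parameters)
    beta = 1.0                       # noise precision (hyper-parameter)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # E-step: Gaussian posterior over the weights given current hyper-parameters.
        A = np.diag(alpha[np.searchsorted(group_ids, groups)])
        Sigma = np.linalg.inv(beta * XtX + A)
        mu = beta * Sigma @ Xty
        # M-step: re-estimate each group precision from the posterior second moment.
        for k, g in enumerate(group_ids):
            idx = np.flatnonzero(groups == g)
            second_moment = mu[idx] @ mu[idx] + np.trace(Sigma[np.ix_(idx, idx)])
            alpha[k] = len(idx) / second_moment
        # M-step: re-estimate the noise precision.
        resid = y - X @ mu
        beta = n / (resid @ resid + np.trace(XtX @ Sigma))
    return mu, alpha, beta

# Synthetic example: 3 hypothetical sensors with 3 features each; sensor 1 is irrelevant.
rng = np.random.default_rng(0)
n = 200
groups = np.repeat([0, 1, 2], 3)
X = rng.standard_normal((n, 9))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0, 1.5, 0.0, -1.0])
y = X @ w_true + 0.1 * rng.standard_normal(n)

mu, alpha, beta = group_ard_regression(X, y, groups)
ranking = np.argsort(alpha)  # smaller precision = weaker shrinkage = more relevant group
print("group precisions:", alpha)
print("groups ranked from most to least relevant:", ranking)
```

In this sketch a large learned precision shrinks a whole group of weights toward zero, so sorting groups by precision gives one plausible rank order. Note that the zero weight inside group 2 is not pruned here, because all features in a group share a single precision.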

  • Publication date: 2010-1