A BIC Based Initial Training Set Selection Algorithm for Active Learning and Its Application in Audio Detection

Leng, Yan*; Qi, Guang-hui; Xu, Xin-yan

Abstract

Building a classification or detection system requires large amounts of labeled samples. Because manual labeling is tedious and time-consuming, researchers have proposed active learning. Selecting the initial training set is the first step of an active learning process, yet it has received little study. Most active learning algorithms select the initial training samples by random sampling or by algorithms such as sampling by clustering (SBC). Both kinds of method become ineffective for detecting small-probability events, because they sometimes select no samples, or too few samples, of those events. To solve this problem, this paper proposes an initial training set selection algorithm based on the Bayesian information criterion (BIC). The BIC-based algorithm first clusters the whole training set, then uses BIC to judge the status of each cluster, and finally adopts a different selection strategy for clusters of different status. Experimental results on two real data sets show that, compared with random sampling and SBC, the proposed BIC-based algorithm efficiently solves the detection problem for small-probability events. At the same time, it has clear advantages in detecting events that are not of small probability.
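The abstract gives only the outline of the method (cluster, judge each cluster with BIC, select per cluster status), so the following is a minimal sketch of that outline under stated assumptions, not the authors' implementation. Every specific here is an assumption made for illustration: plain k-means as the clustering step, diagonal Gaussians as the cluster model, a one-versus-two-component BIC comparison as the "status" test, and the rule of taking one sample from a homogeneous cluster and one from each half of a mixed cluster. The function names (`kmeans`, `split_if_mixed`, `select_initial_set`) are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers


def gaussian_loglik(X, min_var=1e-6):
    """Log-likelihood of X under a diagonal Gaussian fitted to X itself."""
    var = X.var(axis=0) + min_var
    sq = ((X - X.mean(axis=0)) ** 2 / var).sum()
    return -0.5 * (X.size * np.log(2 * np.pi) + len(X) * np.log(var).sum() + sq)


def split_if_mixed(X):
    """Assumed status test: return 2-way split labels if BIC prefers two
    Gaussians over one (lower BIC is better), otherwise None."""
    n, d = X.shape
    if n < 4:
        return None
    labels, _ = kmeans(X, 2)
    sizes = np.bincount(labels, minlength=2)
    if sizes.min() < 2:
        return None
    bic_one = -2 * gaussian_loglik(X) + 2 * d * np.log(n)
    # Two-component log-likelihood includes the mixing proportions.
    ll_two = sum(
        gaussian_loglik(X[labels == j]) + sizes[j] * np.log(sizes[j] / n)
        for j in range(2)
    )
    bic_two = -2 * ll_two + (4 * d + 1) * np.log(n)
    return labels if bic_two < bic_one else None


def nearest_to_mean(X, idx):
    """Index (into X) of the member of idx closest to the mean of X[idx]."""
    pts = X[idx]
    return int(idx[((pts - pts.mean(axis=0)) ** 2).sum(axis=1).argmin()])


def select_initial_set(X, k):
    """Pick 1 sample per homogeneous cluster, 2 per mixed cluster (assumed rule)."""
    labels, _ = kmeans(X, k)
    chosen = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        sub = split_if_mixed(X[idx])
        if sub is None:
            chosen.append(nearest_to_mean(X, idx))
        else:
            chosen.extend(nearest_to_mean(X, idx[sub == s]) for s in range(2))
    return sorted(chosen)
```

Calling `select_initial_set(X, k)` on an `(n_samples, n_features)` float array returns the row indices to send for manual labeling. The point of the status test in this sketch is that a cluster hiding a second, smaller group contributes an extra sample, which is one plausible way to avoid missing small-probability events; the paper's actual criteria and strategies may differ.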

Publication date: 2013-6
Affiliations: Shandong Normal University; Shandong Jiaotong University


