Abstract

This work focuses on multiple instance learning (MIL) with sparse positive bags, a setting we call sparse MIL. A structural representation is presented that encodes both instances and bags. This representation leads to a non-i.i.d. MIL algorithm, miStruct, which uses a structural similarity to compare bags. Furthermore, MIL with this representation is shown to be equivalent to a document classification problem. Document classification faces the same difficulty: only a few paragraphs or words are useful in revealing the category of a document. Building on the TF-IDF representation, which performs very well empirically in document classification, we propose the miDoc method. The proposed methods achieve significantly higher accuracy and AUC (area under the ROC curve) than state-of-the-art methods on a large number of sparse MIL problems, and the document classification analogy explains their efficacy in sparse MIL.
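To illustrate the document classification analogy, the sketch below treats each bag as a "document" and each instance as a "word", then builds a TF-IDF vector per bag. The abstract does not say how instances are turned into words. Here we assume a precomputed codebook (for example, cluster centers) and nearest-centroid assignment, and we use one common smoothed IDF variant. The names `nearest_word` and `bags_to_tfidf` are hypothetical, and this is a sketch of the general idea, not the miDoc implementation.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(instance: Vector, codebook: List[Vector]) -> int:
    """Map an instance to the index of its nearest codebook entry (squared Euclidean).

    Assumption: instances become "words" via a precomputed, non-empty codebook.
    """
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(instance, codebook[k])),
    )


def bags_to_tfidf(bags: List[List[Vector]], codebook: List[Vector]) -> List[List[float]]:
    """Turn each bag (a list of instances) into an L2-normalized TF-IDF vector."""
    n_bags = len(bags)
    vocab = len(codebook)
    # Term counts: how often each word occurs in each bag ("document").
    counts = [Counter(nearest_word(x, codebook) for x in bag) for bag in bags]
    # Document frequency: number of bags that contain each word at least once.
    df = [sum(1 for c in counts if c[k] > 0) for k in range(vocab)]
    # Smoothed IDF (one common variant, assumed here): log((1 + N) / (1 + df)) + 1.
    # Words found in few bags, such as rare positive instances, get larger weights.
    idf = [math.log((1 + n_bags) / (1 + df[k])) + 1.0 for k in range(vocab)]
    vectors = []
    for bag, c in zip(bags, counts):
        size = max(len(bag), 1)  # guard against empty bags
        tf = [c[k] / size for k in range(vocab)]
        v = [t * w for t, w in zip(tf, idf)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors
```

With a two-word codebook, a bag that mostly contains the common word plus one instance of a rarer word gets a vector in which the rare word carries extra weight through its IDF. This mirrors how a few informative words can reveal a document's category. Any standard vector classifier could then be trained on these bag vectors.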