Abstract

This paper proposes a novel method that uses cloud model theory to automatically derive a distinctive weight for each term in the query, improving the retrieval performance of BM25. The strong performance of BM25 is mainly attributable to its sub-linear Term Frequency (TF) normalization formula. Research on cloud model theory shows that the uncertainty in how a query term is distributed across documents makes a distinctive contribution to relevance evaluation. This paper therefore introduces cloud model theory into BM25-based information retrieval systems so that the intrinsic regularities of term distribution are taken into account. Using the Digital Characteristics of the cloud model, we can reduce noise in the query and automatically assign each query term a distinctive weight that revises BM25. Furthermore, the modified cloud model can be adapted to BM25F, a variant of BM25. Experiments on the NTCIR-5 (the 5th NII Test Collection for IR Systems) document collection for SLIR (Single Language IR) show that our method achieves an effective improvement over standard BM25.
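The abstract does not give the weighting formula, so the following Python sketch is only an illustration under stated assumptions. It shows three pieces: BM25's sub-linear TF normalization, the standard backward cloud generator (without certainty degrees) that estimates the cloud model's Digital Characteristics (expectation Ex, entropy En, hyper-entropy He) from a term's per-document frequencies, and a per-term weight that scales each term's BM25 contribution. The function names, the defaults k1=1.2 and b=0.75, the toy corpus, and especially `cloud_weight` are hypothetical choices, not the paper's method.

```python
import math
from statistics import mean

def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25's sub-linear TF normalization; approaches k1 + 1 as tf grows."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)

def bm25_idf(df, n_docs):
    """Robertson-Sparck Jones IDF, with +1 inside the log so it stays positive."""
    return math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))

def backward_cloud(samples):
    """Backward cloud generator (no certainty degrees): estimate Ex, En, He."""
    n = len(samples)
    ex = mean(samples)
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in samples) / n
    s2 = sum((x - ex) ** 2 for x in samples) / (n - 1)  # sample variance
    he = math.sqrt(max(s2 - en ** 2, 0.0))  # clamp: estimate can dip below 0
    return ex, en, he

def cloud_weight(ex, en, he):
    """HYPOTHETICAL query-term weight; the paper's formula is not given here.
    Terms whose frequencies are more dispersed across documents
    (larger En relative to Ex) receive a larger boost. He is unused."""
    return 1.0 + en / ex if ex > 0 else 1.0

def score(query, doc, docs, weights):
    """Weighted BM25: each term's BM25 contribution is scaled by its weight."""
    avg_len = mean(len(d) for d in docs)
    total = 0.0
    for t in query:
        tf = doc.count(t)
        if tf == 0:
            continue
        df = sum(1 for d in docs if t in d)
        total += (weights.get(t, 1.0) * bm25_idf(df, len(docs))
                  * bm25_tf(tf, len(doc), avg_len))
    return total

# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    "cloud model uncertainty cloud".split(),
    "bm25 ranking model".split(),
    "cloud computing service".split(),
]
query = ["cloud", "model"]

# Sample each query term's TF in every document (zeros included) as cloud drops.
weights = {t: cloud_weight(*backward_cloud([d.count(t) for d in docs]))
           for t in query}
ranking = sorted(range(len(docs)),
                 key=lambda i: score(query, docs[i], docs, weights),
                 reverse=True)
print(weights, ranking)
```

Because the weight multiplies each term's contribution rather than its raw TF, BM25's saturation behaviour is left intact; how the paper carries its weights over to the per-field scoring of BM25F is not specified in the abstract.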
