摘要

In this paper we present a method that can be used to attain specific objectives in a typical prior art search process. The objectives are first to assist patent searchers in understanding the underlying technical concepts of a patent by identifying relevant international patent classification (IPC) codes and second to help them conduct a filtered search based on automatically selected IPCs. We view the automated selection of IPCs as a collection selection problem from the domain of distributed information retrieval (DIR) that can be addressed using existing DIR methods, which we extend and adapt for the patent domain. Our work exploits the intellectually assigned classifications codes that are used to categorize patents and to facilitate patent searches. In our method, manually assigned IPC codes of patent documents are used to cluster, distribute and index patents through hundreds or thousands of sub-collections. We propose a new multilayer collection selection method that effectively suggests classification codes exploiting the hierarchical classification schemes such as IPC/CPC. The new method in addition to utilizing the topical relevance of IPCs at a particular level of interest exploits the topical relevance of their ancestors in the IPC hierarchy and aggregates those multiple estimations of relevance to a single estimation. Experimental results on the CLEF-IP 2011 dataset show that the proposed approach outperforms state-of-art methods from the DIR domain not only in identifying relevant IPC codes but also in retrieving relevant patent documents given a patent query.

  • 出版日期2015-12