A Generic Discriminative Model for Information Extraction

Authors: M Sathya; V Prasanna Venkatesan; G Sureshkumar
Source: International Journal of Soft Computing, 2012.

Abstract

Information extraction is the automatic extraction of facts from text, including the detection of named entities, entity relations and events. Conventional approaches to information extraction try to find syntactic patterns based on deep processing of text, such as partial or full parsing. These solutions face a problem: the deeper the analysis, the less accurate its output, and the errors it introduces cannot be recovered from. Lower-level processing, by contrast, is more accurate and also provides useful information. However, within the framework of conventional approaches, this kind of information cannot be incorporated efficiently. This study describes a novel supervised approach based on kernel methods to address these issues. In this approach, customized kernels are used to match the syntactic structures produced by different preprocessing phases. Using the properties of kernels, the individual kernels are combined into a composite kernel that integrates and extends all of this information. The composite kernels can be used with various classifiers, such as Nearest Neighbor or Support Vector Machines (SVM). Each level of syntactic information can contribute to Information Extraction (IE) tasks, and low-level information can help to recover from errors in deep processing.
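
The idea of combining per-level kernels into one composite kernel and handing it to a Nearest Neighbor classifier can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the kernels, so the bag-of-items kernels, the `(tokens, dependency_path)` example representation, the equal weights and the toy data are all assumptions. It relies only on the standard facts that a non-negative weighted sum of kernels is itself a kernel, and that a kernel induces a squared distance k(a,a) - 2k(a,b) + k(b,b).

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (tokens, dependency_path) for one candidate relation.
Example = Tuple[Sequence[str], Sequence[str]]
Kernel = Callable[[Example, Example], float]


def bag_kernel(xs: Sequence[str], ys: Sequence[str]) -> float:
    """Inner product of bag-of-items count vectors, which is a valid kernel."""
    cx, cy = Counter(xs), Counter(ys)
    return float(sum(cx[item] * cy[item] for item in cx))


def token_kernel(a: Example, b: Example) -> float:
    """Low-level kernel (assumed): compares the token sequences only."""
    return bag_kernel(a[0], b[0])


def path_kernel(a: Example, b: Example) -> float:
    """Deeper-level kernel (assumed): compares the parser-produced path labels."""
    return bag_kernel(a[1], b[1])


def composite(kernels: List[Kernel], weights: List[float]) -> Kernel:
    """Non-negative weighted sum of kernels; the sum is again a kernel."""
    def k(a: Example, b: Example) -> float:
        return sum(w * kern(a, b) for w, kern in zip(weights, kernels))
    return k


def kernel_distance_sq(k: Kernel, a: Example, b: Example) -> float:
    """Squared distance in the feature space induced by kernel k."""
    return k(a, a) - 2.0 * k(a, b) + k(b, b)


def nearest_neighbor(k: Kernel, train: List[Tuple[Example, str]], query: Example) -> str:
    """Return the label of the training example closest to the query under k."""
    closest = min(train, key=lambda item: kernel_distance_sq(k, item[0], query))
    return closest[1]


# Toy data, invented for illustration only.
train = [
    ((["acme", "hired", "smith"], ["nsubj", "dobj"]), "EMPLOYMENT"),
    ((["smith", "lives", "in", "paris"], ["nsubj", "prep_in"]), "LOCATED"),
]
query = (["globex", "hired", "jones"], ["nsubj", "dobj"])

k = composite([token_kernel, path_kernel], [1.0, 1.0])
print(nearest_neighbor(k, train, query))
```

Because the composite is a plain sum, a noisy path from the parser lowers only one term of the similarity; the token-level term still contributes, which is one way to picture how low-level information could offset errors from deeper processing.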

  • Publication date: 2012
