Abstract

Table extraction is usually complemented by table annotation to uncover the hidden semantics of a document or book. These semantics are determined by identifying a type for each column, finding the relationships between columns, if any, and identifying the entities in each cell. Although such approaches have been applied to small documents and web pages, they have not been extended to the extraction and annotation of tables in books. This paper focuses on detecting, locating, and annotating entities in book tables. More specifically, it contributes algorithms for identifying and locating tables in books and for annotating table entities using the online knowledge source DBpedia Spotlight. Entities missing from DBpedia Spotlight are then annotated using Google Snippets. The combined results were found to give higher accuracy and better performance than using DBpedia alone. The approach complements existing table annotation approaches, as it makes it possible to discover and annotate entities that are not present in the catalogue. We tested our scheme on Computer Science books and obtained promising results in terms of accuracy and performance.

  • Publication date: 2018-7