Abstract

Keyword search has been recognised as a viable alternative for information search in semi-structured and structured data sources. Current state-of-the-art keyword-search techniques over relational databases do not take advantage of the correlative meta-information included in structured and semi-structured data sources, and so leave relevant answers out. These techniques are also limited by scalability, performance and precision issues that become evident when they are applied to large datasets. Based on an in-depth analysis of issues related to indexing and ranking semi-structured and structured information, we propose a new keyword-search algorithm that takes into account the semantic information extracted from the schemas of the structured and semi-structured data sources and combines it with the textual relevance obtained by a common text retrieval approach. The algorithm is implemented in a keyword-based search engine called KESOSASD (Keyword Search Over Semi-structured and Structured Data), where it improves precision and response time. Our approach models the semi-structured and structured information as graphs and makes use of a Virtual Document Structure Aware Inverted Index (VDSAII). This index is created from a set of logical structures called Virtual Documents, which capture and exploit the implicit structural relationships (semantics) depicted in the schemas of the structured and semi-structured data sources. Extensive experiments were conducted to demonstrate that KESOSASD outperforms existing approaches in terms of search efficiency and accuracy. Moreover, KESOSASD is prepared to scale out and manage large databases without degrading its effectiveness.
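
As a rough illustration of the virtual-document idea, the following toy sketch joins tuples along a foreign key into one logical document and indexes each term together with the schema field it came from. It is not the KESOSASD implementation: the tables, the field names, the grouping rule and the functions build_virtual_documents, build_index and search are all assumptions made for the example, and ranking is left out entirely.

```python
from collections import defaultdict

# Hypothetical toy data: two relational tables linked by a foreign key.
authors = {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}}
papers = [
    {"id": 10, "title": "Notes on the analytical engine", "author_id": 1},
    {"id": 11, "title": "Computing machinery and intelligence", "author_id": 2},
]

def build_virtual_documents(papers, authors):
    """Join each paper with its author along the foreign key (assumed grouping rule)."""
    docs = {}
    for paper in papers:
        author = authors[paper["author_id"]]
        docs[paper["id"]] = {
            "paper.title": paper["title"],
            "author.name": author["name"],
        }
    return docs

def build_index(docs):
    """Map each term to {virtual document id: set of schema fields containing it}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[term][doc_id].add(field)
    return index

def search(index, query):
    """Return the sorted ids of virtual documents that contain every query keyword."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index[t]) if t in index else set() for t in terms]
    return sorted(set.intersection(*postings))

virtual_docs = build_virtual_documents(papers, authors)
index = build_index(virtual_docs)
```

In this sketch, a query such as search(index, "turing intelligence") matches the virtual document for paper 11 even though the two keywords live in different tables, which is the kind of answer a purely tuple-level index would leave out.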

  • Publication date: 2014-12-20