摘要

The ontology matching process has become a vital part of the (semantic) web, enabling interoperability among heterogeneous data. To enable interoperability, similar entity pairs across heterogeneous data are discovered using a static set of matchers consisting of linguistic, structural and/or instance matchers that discover similar entities. Numerous sets of matchers exist in the literature; however, none of the matcher sets are capable of achieving good results across all data. In addition, it is both tedious and painstaking for domain experts to select the best set of matchers for the given data to be matched. In this paper, we propose two bootstrapping-based approaches, Bottom-up and Top-down, to automatically select the best set of matchers for the given ontologies to be matched. The selection is processed, based on the characteristics of the ontologies which are quantified by a set of quality metrics. Two new structural quality metrics, the Concept External Structural Richness (CESR) and the Concept Internal Structural Richness (CISR), have also been proposed to better quantify the structural characteristics of the ontology. The best set of matchers is chosen using the sets of patterns learned through the proposed Bottom-up and Top-down bootstrapping approaches. The proposed metrics and the patterns constructed using these approaches are evaluated using the COMA matching tool with existing benchmark ontologies (Benchmark, Conference and Benchmark2 tracks of the OAEI 2011). The proposed Bottom-up based patterns, along with the two proposed quality metrics, achieved better effectiveness (F-measure) in selecting the best set of matchers in comparison with the static set of matching, supervised ML algorithms and the existing automatic matching. Specifically, the proposed Bottom-up patterns achieve a 14.6% Average Gain/Task and a significant improvement of 129% in comparison with the existing KNN model's Average Gain/Task.

  • 出版日期2017-12