Efficient Distributed Query Processing on Large Scale RDF Graph Data

Wang, Xin; Xu, Qiang; Chai, Le-Le; Yang, Ya-Jun<sup>*</sup>; Chai, Yun-Peng

doi:10.13328/j.cnki.jos.005696

Abstract

Knowledge graphs are the main representation form of intelligent data. With the development of knowledge graphs, more and more intelligent data has been released in the form of the Resource Description Framework (RDF). The semantics of SPARQL are known to correspond to graph homomorphism, which is an NP-complete problem. Efficiently answering SPARQL queries in parallel over big RDF graphs has therefore been widely recognized as a challenging problem. Some existing work uses the MapReduce computational model to process big RDF graphs. However, these approaches decompose a SPARQL query into a set of query clauses without considering the semantics and graph structure embedded in the RDF graph, which leads to too many MapReduce iterations. This study first decomposes the SPARQL query graph into a set of stars, using the semantic and structural information embedded in RDF graphs as heuristics, so that matching takes fewer MapReduce iterations. A matching order for these stars is also given to reduce the intermediate results produced across MapReduce iterations. During the matching phase, each round of MapReduce adds one star through a join operation. Extensive experiments are carried out on both the synthetic dataset WatDiv and the real-world dataset DBpedia. The experimental results demonstrate that the proposed star-decomposition-based method answers SPARQL BGP queries efficiently, outperforming SHARD and S2X by one order of magnitude. Finally, the experiments show that the optimization algorithms improve on the basic algorithm by 49.63% and 78.71% across the synthetic and real datasets.
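To make the idea of star decomposition concrete, the following is a minimal sketch, not the authors' algorithm. The abstract does not spell out the semantic and structural heuristics, so the sketch swaps in a simple greedy rule as an assumption: repeatedly pick the vertex incident to the most unassigned triple patterns and make it the center of the next star. The function name `star_decompose` and the example query are hypothetical.

```python
from collections import defaultdict


def star_decompose(triple_patterns):
    """Greedily split a basic graph pattern (BGP) into stars.

    Each triple pattern is a (subject, predicate, object) tuple. A star is
    returned as (center, patterns), where every pattern touches the center.
    Greedy rule (an assumption, not taken from the paper): the next center
    is the vertex incident to the most still-unassigned patterns, with ties
    broken by vertex name so the result is deterministic.
    """
    remaining = list(triple_patterns)
    stars = []
    while remaining:
        # Map each vertex to the unassigned patterns that touch it.
        incident = defaultdict(list)
        for tp in remaining:
            s, _, o = tp
            incident[s].append(tp)
            if o != s:  # avoid counting a self-loop twice
                incident[o].append(tp)
        center = min(incident, key=lambda v: (-len(incident[v]), v))
        chosen = incident[center]
        stars.append((center, chosen))
        remaining = [tp for tp in remaining if tp not in chosen]
    return stars


# Hypothetical query with five triple patterns.
query = [
    ("?x", "follows", "?y"),
    ("?x", "likes", "?p"),
    ("?x", "name", "?n"),
    ("?y", "likes", "?p"),
    ("?y", "age", "?a"),
]

for center, patterns in star_decompose(query):
    print(center, patterns)
```

Under this rule the five patterns collapse into two stars, centered on `?x` and `?y`. If each MapReduce round adds one star, as the abstract describes, this query is handled as two stars rather than five separate query clauses.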

Publication date: 2019-3-1
Affiliations: Tianjin University; Renmin University of China

Last updated: 2023-04-23 15:37