A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce

Authors: Ahn Jinhyun; Im Dong Hyuk*; Lee Taewhi; Kim Hong Gee
Source: Journal of Supercomputing, 2017, 73(2): 810-836.
DOI: 10.1007/s11227-016-1803-y

Abstract

A massive amount of Extensible Markup Language (XML) data from various areas is available on the Web. Answering structural queries against XML data is important, as it is the core of information retrieval systems for XML data. Labeling schemes have been suggested for rapid query processing of massive XML data. Interval-based, prefix-based, and prime number labeling schemes exist. Of these, the prime number labeling scheme has the advantage that queries can be processed with arithmetic operations. Recently, the repetitive prime number labeling scheme was proposed; by using prime numbers repetitively, this scheme produces smaller labels than conventional prime number labeling. However, no parallel algorithm exists for the repetitive prime number labeling scheme, so it is difficult to apply to massive XML data. In this paper, a dynamic and parallel XML labeling algorithm that works with MapReduce is proposed, specifically for the repetitive prime number labeling scheme. Two optimization techniques are devised: label assignment order adjustment, which further reduces the label size, and upper tree compression, which reduces the memory requirements during the labeling process. Experiments on real-world XML data confirmed that the techniques are more effective than previous work.
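
To illustrate what "query processing by arithmetic operations" can mean, here is a minimal sketch of conventional prime number labeling as it is commonly described: each node's label is its parent's label multiplied by a prime not used elsewhere, so an ancestor-descendant test reduces to a divisibility check. This is not the repetitive scheme or the MapReduce algorithm of the paper; the function names, the dictionary-based tree, and the choice of label 1 for the root are assumptions made only for this example.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def label_tree(tree, root):
    """Label each node with parent's label times a fresh prime.

    `tree` maps a node name to its list of children (hypothetical format).
    The root gets label 1 (an assumption of this sketch).
    """
    gen = primes()
    labels = {root: 1}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            labels[child] = labels[node] * next(gen)
            stack.append(child)
    return labels


def is_ancestor(labels, a, d):
    """True if `a` is a proper ancestor of `d`: a's label divides d's label."""
    return a != d and labels[d] % labels[a] == 0


# A tiny XML-like tree: <book><title/><chapter><section/></chapter></book>
tree = {"book": ["title", "chapter"], "chapter": ["section"]}
labels = label_tree(tree, "book")
print(labels)
print(is_ancestor(labels, "chapter", "section"))
print(is_ancestor(labels, "title", "section"))
```

Because every node consumes a distinct prime, labels grow quickly with document size, which is the label-size problem that reusing primes is meant to reduce.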

  • Publication date: 2017-02