High-performance XML modeling of parallel queries based on MapReduce framework

Song, Kunfang*; Lu, Hongwei

doi:10.1007/s10586-016-0628-z

Abstract

As data grow at an incredible rate, cloud computing technologies have become critical to the advancement of research. MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. Traditional parallel XML parsing and indexing approaches are inadequate for processing large-scale XML datasets on clusters; therefore, we propose an approach that exploits data parallelism in XML processing using MapReduce in Hadoop. Our solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, we introduce an SDN labeling algorithm and a distributed hierarchical index built on distributed hash tables (DHTs). More importantly, we design an advanced two-phase MapReduce solution that efficiently addresses labeling, indexing, and query processing on big XML data. The first MapReduce phase applies filtering, labeling, and index-building techniques: each DataNode labels elements in a map function, and a reduce function merges the results and builds the indexes. In the second phase, local XML queries on multiple partitions are executed in parallel using the index-table-enabled B-SLCA algorithm. Our experimental results demonstrate the efficiency and effectiveness of the proposed parallel XML processing approach based on the MapReduce framework.
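The two-phase pipeline described in the abstract can be sketched in-process, without Hadoop, to show how labeling, index merging, and per-partition keyword queries fit together. This is a minimal sketch under stated assumptions, not the authors' implementation: Dewey-style labels stand in for the SDN labels (which the abstract does not define), a plain dictionary stands in for the DHT-based distributed index, and SLCA (smallest lowest common ancestor) results are computed by brute force rather than with the index-table-enabled B-SLCA. The function names `map_label`, `reduce_index`, and `slca` are hypothetical.

```python
"""Two-phase MapReduce-style XML keyword query, simulated in one process.

Assumptions (not from the paper): Dewey labels replace SDN labels, a dict
replaces the DHT index, and SLCA is brute-forced instead of using B-SLCA.
"""
from collections import defaultdict
import xml.etree.ElementTree as ET


def map_label(partition_id, xml_text):
    """Phase 1 map: emit (keyword, Dewey label) for each word of element text.

    The partition root gets label (partition_id,); the i-th child of a node
    labeled L gets L + (i,). Tail text is ignored for simplicity.
    """
    root = ET.fromstring(xml_text)
    stack = [(root, (partition_id,))]
    while stack:
        elem, label = stack.pop()
        for word in (elem.text or "").split():
            yield word.lower(), label
        for i, child in enumerate(elem):
            stack.append((child, label + (i,)))


def reduce_index(pairs):
    """Phase 1 reduce: merge (keyword, label) pairs into sorted posting lists."""
    index = defaultdict(list)
    for word, label in pairs:
        index[word].append(label)
    return {word: sorted(labels) for word, labels in index.items()}


def lca(a, b):
    """Lowest common ancestor of two Dewey labels = their longest common prefix."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def is_ancestor(a, b):
    """True if label a is a proper ancestor of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(index, keywords):
    """Phase 2: SLCA query evaluated locally on each partition, results merged."""
    lists = [index.get(k.lower(), []) for k in keywords]
    partitions = {label[0] for label in lists[0]} if lists else set()
    results = []
    for pid in sorted(partitions):
        local = [[l for l in postings if l[0] == pid] for postings in lists]
        if not all(local):
            continue  # some keyword is absent from this partition
        # LCAs of every combination of one match per keyword.
        cands = set(local[0])
        for postings in local[1:]:
            cands = {lca(c, p) for c in cands for p in postings}
        # Keep only candidates with no descendant among the candidates.
        results.extend(c for c in cands if not any(is_ancestor(c, d) for d in cands))
    return sorted(results)


if __name__ == "__main__":
    parts = {
        0: "<lib><book><title>XML</title><author>Song</author></book>"
           "<book><title>Hadoop</title></book></lib>",
        1: "<lib><book><title>XML Hadoop</title></book></lib>",
    }
    # Phase 1: map every partition, then reduce into one keyword index.
    index = reduce_index(kv for pid, text in parts.items() for kv in map_label(pid, text))
    # Phase 2: keyword query run per partition.
    print(slca(index, ["xml", "hadoop"]))  # → [(0,), (1, 0, 0)]
```

In partition 0 the keywords `xml` and `hadoop` occur under different `book` elements, so their SLCA is the partition root `(0,)`; in partition 1 both occur in the same `title` element, labeled `(1, 0, 0)`. The pairwise LCA enumeration grows with the product of the posting-list sizes, which is exactly the cost that a merge-based algorithm over sorted posting lists, such as the B-SLCA variant named in the abstract, is meant to avoid.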

Publication date: 2016-12
Affiliation: Huazhong University of Science and Technology