摘要

Big data is playing a more and more important role in every area such as medical health, internet finance, culture and education etc. How to process these big data efficiently is a huge challenge. MapReduce is a good parallel programming language to process big data. However, it has lots of shortcomings. For example, it cannot process complex computing. It cannot suit real-time computing. In order to overcome these shortcomings of MapReduce and its variants, in this paper, we propose a Semantic++ MapReduce parallel programming model. This study includes the following parts. (1) Semantic++ MapReduce parallel programming model. It includes physical framework of semantic++ MapReduce parallel programming model and logic framework of semantic++ MapReduce parallel programming model;(2) Semantic++ extraction and management method for big data;(3) Semantic++ MapReduce parallel programming computing framework. It includes semantic++ map, semantic++ reduce and semantic++ shuffle;(4) Semantic++ MapReduce for multi-data centers. It includes basic framework of semantic++ MapReduce for multi-data centers and semantic++ MapReduce application framework for multi-data centers;(5) A Case Study of semantic++ MapReduce across multi-data centers.

全文