D-Ocean: an unstructured data management system for data ocean environment

Authors: Zhuang, Yueting; Wang, Yaoguang; Shao, Jian*; Chen, Ling; Lu, Weiming; Sun, Jianling; Wei, Baogang; Wu, Jiangqin
Source: Frontiers of Computer Science, 2016, 10(2): 353-369.
DOI: 10.1007/s11704-015-5045-6

Abstract

Together with the big data movement, many organizations collect their own big data and build distinctive applications. In order to provide smart services upon big data, massive, variable data should be well linked and organized to form a Data Ocean, which especially emphasizes the deep exploration of the relationships among unstructured data to support smart services. Currently, almost all of these applications have to deal with unstructured data by integrating various analysis and search techniques upon massive storage and processing infrastructure at the application level, which greatly increases the difficulty and cost of application development. This paper presents D-Ocean, an unstructured data management system for the data ocean environment. D-Ocean has an open and scalable architecture, which consists of a core platform, pluggable components, and auxiliary tools. It exploits a unified storage framework to store data in different kinds of data stores, integrates batch and incremental processing mechanisms to process unstructured data, and provides a combined search engine to conduct compound queries. Furthermore, a process modeling approach called RAISE is proposed to support the whole process of Repository, Analysis, Index, Search, and Environment modeling, which can greatly simplify application development. Experiments and production use cases demonstrate the efficiency and usability of D-Ocean.