Abstract
The Hadoop Distributed File System (HDFS) is widely deployed in large-scale data storage facilities and performs efficiently when managing very large files. However, it performs poorly when handling a large number of small files, mainly because of its master-slave architecture: access requests for many small files place a heavy burden on the NameNode, the master node of Hadoop. In previous studies, Dong focused on file correlation and Chandrasekar S. proposed a general prefetching method, but neither gave a concrete approach to recording file correlations; both assumed that files within one merged block have higher correlation. In this paper, we propose a new way to record file correlations based on Chandrasekar's EHDFS. From the recorded data, an optimal file request chain is derived; this chain represents the most correlated file order. According to this order, the blocks that contain small files can be reconstructed. Our theoretical analysis shows that the reconstructed blocks achieve higher prefetching efficiency and significantly reduce the number of requests sent to the Hadoop NameNode.
- Publication date: 2014
- Affiliation: Shanghai Jiao Tong University