
Research on General File Storage Scheme Based on Hadoop


DOI: 10.23977/CNCI2020077

Author(s)

Huimin Liu and Huijie Liu

Corresponding Author

Huimin Liu

ABSTRACT

HDFS is Hadoop's underlying distributed file storage system and is designed for large files. When it serves a large number of small files, it suffers from low read/write throughput and an excessive NameNode memory load. This paper therefore optimizes HDFS storage for files of different sizes. First, historical access logs are analyzed and inter-file correlations are mined with the Apriori algorithm to build a correlation probability model; on that basis, a correlation-driven directed-graph merging algorithm is proposed. To counter the slower access that merging small files can cause, a prefetching strategy and a heat-based LRU replacement strategy are introduced. Experimental results show that the scheme effectively reduces NameNode metadata volume, improves memory utilization, and improves file read and write performance.
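
The abstract names a "heat-based LRU" replacement strategy but gives no implementation details. The following is a minimal sketch of one plausible reading: an in-memory cache whose eviction victim is the entry with the lowest combined frequency-and-recency "heat" score rather than pure LRU age. The class name, the DECAY constant, and the exponential-decay heat formula are illustrative assumptions, not the authors' design.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a heat-aware LRU cache: each cached small file carries a
 *  "heat" score mixing access frequency with recency, and eviction
 *  removes the coldest entry instead of the plain least-recently-used one. */
public class HeatLruCache<K, V> {
    private static final double DECAY = 0.8;   // assumed recency decay factor

    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Double> heat = new HashMap<>();
    private final Map<K, Long> lastAccess = new HashMap<>();
    private long clock = 0;                     // logical access counter

    public HeatLruCache(int capacity) { this.capacity = capacity; }

    public synchronized V get(K key) {
        V v = values.get(key);
        if (v != null) touch(key);
        return v;
    }

    public synchronized void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) evictColdest();
        values.put(key, value);
        touch(key);
    }

    /** Raise the entry's heat: decay the old score by elapsed logical time,
     *  then add one unit for the current access. */
    private void touch(K key) {
        clock++;
        long last = lastAccess.getOrDefault(key, clock);
        double aged = heat.getOrDefault(key, 0.0) * Math.pow(DECAY, clock - last);
        heat.put(key, aged + 1.0);
        lastAccess.put(key, clock);
    }

    /** Evict the entry whose decayed heat is lowest (both cold and stale). */
    private void evictColdest() {
        K coldest = null;
        double min = Double.MAX_VALUE;
        for (Map.Entry<K, Double> e : heat.entrySet()) {
            double current = e.getValue()
                    * Math.pow(DECAY, clock - lastAccess.get(e.getKey()));
            if (current < min) { min = current; coldest = e.getKey(); }
        }
        if (coldest != null) {
            values.remove(coldest);
            heat.remove(coldest);
            lastAccess.remove(coldest);
        }
    }
}
```

Under this scheme a file that was read often but not recently can still outlive a file read once just now, which is the usual motivation for weighting LRU by popularity when caching merged small files.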

KEYWORDS

Merging algorithm; correlation; general file storage; cache replacement strategy
