Most storage clusters mix legacy devices with newly purchased ones, and these devices differ considerably in storage performance. Under HDFS's default rack-aware storage policy, frequently accessed data may end up on low-performance nodes while rarely accessed data occupies high-performance nodes, which both lengthens cluster response time and lowers resource utilization. To address these problems, this paper proposes a tiered storage scheduling mechanism. Building on the HDFS rack-aware scheduling policy, it first divides nodes into high-configuration and low-configuration nodes according to inherent hardware characteristics such as CPU, memory size, disk size, and disk I/O. It then builds a node performance evaluation model from dynamic factors such as CPU usage, memory usage, network bandwidth usage, and disk usage, and defines three performance levels. Scheduling decisions combine node configuration, performance level, and network location. While the cluster is running, the distribution of data blocks is also adjusted dynamically according to data access frequency. Experimental results show that the proposed tiered storage scheduling mechanism improves data access efficiency and optimizes cluster performance in heterogeneous HDFS clusters.
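As a rough illustration of how such a mechanism could combine static node configuration, a three-level performance rating, and network location, the sketch below ranks candidate nodes for a data block. It is only a minimal sketch under stated assumptions, not the method evaluated in the paper: the weights in `WEIGHTS`, the level thresholds `low` and `high`, the `Node` fields, and the tie-breaking order in `rank_targets` are all hypothetical, since the abstract gives neither the coefficients of the evaluation model nor the exact scheduling rule.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    high_config: bool  # static hardware class, assumed to be decided offline
    cpu: float         # utilizations, each in [0, 1]
    mem: float
    net: float
    disk: float


# Hypothetical weights; the abstract does not state the model's coefficients.
WEIGHTS = {"cpu": 0.3, "mem": 0.2, "net": 0.2, "disk": 0.3}


def load_score(node):
    """Weighted utilization in [0, 1]; lower means more headroom."""
    return sum(w * getattr(node, k) for k, w in WEIGHTS.items())


def performance_level(node, low=0.4, high=0.7):
    """Map the load score to one of three levels: 0 (best), 1, 2 (worst).

    The thresholds are illustrative assumptions.
    """
    score = load_score(node)
    if score < low:
        return 0
    if score < high:
        return 1
    return 2


def rank_targets(nodes, hot, writer_rack):
    """Order candidate nodes for one block, best candidate first.

    Hot (frequently accessed) blocks prefer high-configuration nodes and
    cold blocks prefer low-configuration ones. Ties break on performance
    level, then on being in the writer's rack, then on raw load score.
    """
    def key(n):
        config_mismatch = n.high_config != hot
        return (config_mismatch, performance_level(n),
                n.rack != writer_rack, load_score(n))

    return sorted(nodes, key=key)


nodes = [
    Node("a", "r1", True, 0.2, 0.2, 0.2, 0.2),
    Node("b", "r2", True, 0.9, 0.9, 0.9, 0.9),
    Node("c", "r1", False, 0.5, 0.5, 0.5, 0.5),
    Node("d", "r2", False, 0.1, 0.1, 0.1, 0.1),
]
hot_order = [n.name for n in rank_targets(nodes, hot=True, writer_rack="r1")]
cold_order = [n.name for n in rank_targets(nodes, hot=False, writer_rack="r1")]
```

In this toy ranking a hot block is offered the high-configuration nodes first and a cold block the low-configuration ones. A real placement policy would additionally have to keep the rack-spreading guarantees of HDFS replica placement, which this sketch reduces to a single same-rack tie-break.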