WO2020206704A1 - Energy-saving storage method with hot and cold zone division based on cluster node load state prediction - Google Patents

Energy-saving storage method with hot and cold zone division based on cluster node load state prediction

Publication number
WO2020206704A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
cold
load
cluster
nodes
Prior art date
Application number
PCT/CN2019/082592
Other languages
English (en)
French (fr)
Inventor
倪丽娜
韩庆亮
禹继国
张金泉
Original Assignee
山东科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东科技大学
Publication of WO2020206704A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4893 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues taking into account power or heat criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5019 Workload prediction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the invention belongs to the field of information technology processing, and in particular relates to a cold and hot zone division energy-saving storage method based on cluster node load state prediction.
  • each data block has multiple backups.
  • the access frequency of data blocks gradually decreases, and the computing load of nodes also gradually decreases. This leads to some computing nodes running at a lower load, resulting in energy waste.
  • the data blocks stored by the data nodes will become cold data in a considerable proportion of the time, and these data will occupy the storage resources of the computing nodes. Therefore, it is necessary to migrate multiple backups of data blocks and sleep some nodes to save energy.
  • the present invention proposes a cold and hot zone partitioning energy-saving storage method based on the prediction of the load status of cluster nodes.
  • the design is reasonable, overcomes the shortcomings of the prior art, and has good effects.
  • An energy-saving storage method for partitioning cold and hot regions based on cluster node load state prediction including the following steps:
  • Step 1: Based on the LSTM prediction model, predict the load of each node within period T, and sort the list Load_List by load from high to low.
  • Step 2: Using the preset load threshold Threshold, divide the nodes in Load_List into a hot zone Nset_hot and a cold zone Nset_cold. Assume the hot zone has m nodes and the cold zone has n-m nodes, with m ≤ n.
  • Step 3: Compare the resource amount Resource of the hot zone with the demand Requirement.
  • Step 4: Add the nodes of the buffer queue to the hot zone one by one until Resource ≥ Requirement in the hot zone or all nodes in the buffer queue have joined the hot zone. If Resource ≥ Requirement is still not satisfied, wake the nodes of the sleep queue one by one and add them to the hot zone until Resource ≥ Requirement is satisfied, then perform load balancing.
  • Step 5: Store newly created data blocks on the nodes of the hot zone Nset_hot, using the cluster's default data block placement strategy.
  • Step 6: Compute the data block access frequency list of the hot zone nodes and compare the data block access frequency in node i with the coldness threshold θ.
  • Step 7: Compute the data block access frequency list of the buffer nodes in the cold zone Nset_cold and compare the data block access frequency in node j with the sleep threshold λ. A node j that satisfies the sleep condition is put to sleep; otherwise node j remains in the cold zone buffer node queue.
  • Step 8: Update the prediction model and predict the node load for the next period T.
  • This application first introduces the idea of the method, then describes its execution steps and its pseudocode implementation. It next analyzes the energy-saving effect of the strategy theoretically, and then presents the experimental process, including the experimental environment, the choice of data set, and the data block access format. Finally, it analyzes the experimental results: a quantitative comparison with the energy consumption of a Hadoop cluster without hot and cold zone division verifies that the energy-saving storage strategy of this application has a certain practical energy-saving effect.
  • Under the default data block placement strategy of a traditional Hadoop cluster, the cluster load is relatively balanced, and the load rate of the cluster is about 32%.
  • With the HES-Storage strategy, most of the workload in the hot zone lies between 40% and 60%, and the average load rate of hot zone nodes is about 50.1%. Compared with the load of about 32% in a traditional cluster, this makes better use of the resources of the cluster nodes.
  • 20%-30% of the hosts sleep during different periods of time, which reduces the total energy consumption of the cluster.
  • The invention effectively increases the average load of nodes in the hot zone, significantly reduces the total energy consumption of the cluster, and is of far-reaching significance for improving energy utilization, reducing data center operating costs, and building clean, green, energy-saving, and environmentally friendly data centers.
  • Figure 1 is a schematic diagram of the node distribution of the cold and hot zone division strategy.
  • Figure 2 is a schematic diagram of the number of nodes in the hotspot area.
  • Figure 3 is a schematic diagram of energy consumption per unit time in a 10-hour interval.
  • the data block access of Hadoop cluster is generally regular. This application obtains a general description of the law by consulting the literature. This application designs a storage strategy for the data blocks of the Hadoop cluster according to the data access law.
  • Access to a data block generally occurs within a short period of time after it is uploaded. As time goes by, the data block will become cold and the probability of being accessed will decrease. Verifying the rationality of this law is of great help to the design of storage strategies for data blocks.
  • the default copy storage mechanism of the Hadoop cluster will store its copies in three locations.
  • Location 2 and Location 1 are in the same rack but different nodes
  • Location 3 is stored on a node in a different rack from Location 2 .
  • the data block storage of the Hadoop cluster mainly considers its reliability to ensure that when a node where a data block is placed fails, the backup of the data block can be taken from other nodes in the cluster, which also ensures the security of data storage. But this strategy is not conducive to reducing energy consumption.
  • To verify the data block access pattern, this application uses the Graylog log analysis tool, calling its API, to analyze one month of access logs for the English pages of Wikipedia.
  • According to Graylog's statistics on the log files, there are 14226 files in total, of which 10799 were accessed fewer than 6 times during the statistical period, accounting for 75.9103% of the files.
  • The total number of accesses to all files in the cluster is 1769884, and files accessed fewer than 6 times account for 212352 accesses, or 11.9981% of the total. In other words, about 76% of the files receive only about 12% of all accesses. Further statistics show that files accessed only once make up 63.2863% of the total, while the most frequently accessed file was accessed 24964 times. This analysis shows that file access has a clear time-dependent tendency.
  • This application proposes HES-Storage (Hadoop Energy-Saving Storage), an energy-saving storage strategy based on node state prediction.
  • Using a model trained on cluster node data, the cluster is divided into a cold zone and a hot zone.
  • Many data centers currently use static division: the cluster is divided into hot and cold zones when it is deployed.
  • This method is simple to manage, but its problems cannot be ignored. Because the computing demand placed on the cluster changes constantly, the cluster load changes constantly too. When the load rises and the hot zone nodes are insufficient, cold zone nodes must take part, which involves migrating a large number of file blocks and wastes a great deal of energy; when the load is small, the computing capacity of the hot zone nodes is wasted.
  • According to the data access pattern, data blocks with high access frequency are stored on active nodes, and nodes with low access frequency are first placed in the pending-sleep queue. A data block coldness threshold θ is also set: if the data blocks on a node meet the coldness threshold θ, the node is put to sleep.
  • This application mainly considers the following aspects when designing a storage strategy:
  • Nodes with high load in historical state generally store more data blocks that access hotspots, and more data related to computing tasks will be backed up.
  • the general principle of backup is to migrate nodes with fewer data blocks to nodes with more data blocks and maintain high load on these nodes, which can reduce the cost of migration energy consumption to a certain extent.
  • Based on the above analysis, this application divides the cluster into a hot zone and a cold zone, adopts different data block storage strategies in the two zones, and maintains a buffer queue and a sleep queue in the cold zone.
  • A buffer node is a node in the cold zone whose load is very low but that has not been put to sleep.
  • The purpose of the buffer area is to hold data blocks from the hot zone that have reached the coldness threshold, because these blocks have only recently cooled down. According to the data block access pattern, they may be accessed again, so the buffer area is set up to maintain good service performance.
  • A. Hot zone: the main contributor to the energy consumption of the cluster; it also stores the data that the cluster accesses frequently.
  • When the computing demand of the cluster increases and the hot zone nodes become overloaded, service quality drops. The nodes in the buffer queue are then added to the hot zone first, and the relationship between the total cluster resource amount and the task request amount is recalculated. If the requirement is still not met, nodes in the sleep queue are woken, which dynamically adjusts the number of active nodes in the cluster.
  • New data blocks are stored in the hot zone by default.
  • The storage strategy uses the default storage mechanism of the Hadoop cluster, because that mechanism distributes data blocks across the cluster in a relatively balanced way, following the idea of load balancing, which helps make full use of the computing capacity of the hot zone nodes.
  • B. Cold zone: this zone maintains a sleep queue and a buffer queue.
  • The sleep queue holds data that has become cold, which is very unlikely to be accessed again. The buffer queue is built from the prediction model: the node load states for the future period T are predicted and the nodes are ordered by predicted load from high to low.
  • Nodes with low predicted load are placed in the buffer queue. These nodes are standby storage nodes for cold data.
  • Nset = {node_1, node_2, ..., node_n}
  • A data block is denoted block_i.
  • Migrating a data block from the hot zone to a cold zone buffer node requires that the coldness threshold, denoted θ, be met; putting a buffer node to sleep requires that the sleep threshold, denoted λ, be met.
  • the energy-saving storage process is based on the load prediction results of the predictive model.
  • the cluster is classified into different areas according to the load threshold Threshold.
  • The hot zone uses the default storage strategy, and the cold zone uses centralized storage. The access frequency of data blocks in the hot zone is computed.
  • Data blocks that meet the coldness threshold are migrated to a buffer node; the set of data block access frequencies on each buffer node is computed, and buffer nodes that meet the sleep threshold are put to sleep; the relationship between the cluster resource amount and the demand is updated; the prediction model is updated.
  • the data in the hot and cold areas adopt different storage methods.
  • the "centralized" storage method is adopted, because the probability of access to the cold data becomes lower and lower over time.
  • Because the data stored in this zone is migrated from hot zone nodes, the earlier a node goes to sleep, the lower the probability that its data is accessed. That is, the earlier a data block enters the cold zone, the earlier its host enters the sleep state.
  • the data that enters the cold area in a certain adjacent time period will be stored on the same data node until the storage limit of the node is reached.
  • the data blocks stored in adjacent nodes of the queue move into the cold area at close times. That is, the migration time of the data blocks stored in the node Slave i and Slave i+1 is closer than that of Slave i and Slave i+2 .
  • the advantage of this is that the data blocks with small cooling time difference can be stored together.
  • the data in the hot zone adopts a "balanced" distribution.
  • the data is distributed as evenly as possible among the nodes in the hot zone to provide better access services.
  • The HES-Storage data storage strategy needs to count the energy consumed by cluster nodes from the start of service until they go to sleep, with separate totals for hot zone nodes and cold zone nodes.
  • The hot zone nodes are the main contributor to the energy consumption of the cluster, but the buffer queue nodes in the cold zone must also be counted.
  • The experimental scenario simulates users accessing web pages, and the algorithm of this application targets data partition storage only in this scenario.
  • To verify the effectiveness of this data block placement strategy, the input format of data block requests is designed first; see Table 1.1:
  • the data set uses the access log data of the English page of Wikipedia from October 1, 2018 to October 14, 2018, which can be downloaded on the official website of cloudsuite.
  • the format of access is shown in Table 1.1.
  • These data sets are mainly used to simulate the access conditions of real users, and to view the load changes of the entire cluster and the output results of the sleeping queue nodes.
  • Finally, the energy consumption results are quantified with the energy model and compared with the cluster energy consumption measured without hot and cold zone division.
  • The experiment is meant to verify the energy-saving effect of the storage strategy proposed in this application, so the choice of data set and the data block access format were designed accordingly. The main steps of the experiment are as follows:
  • The data must first be preprocessed. An Agent probe installed in the underlying data collection layer supplies CPU and memory data in real time; these data are used to train the LSTM prediction model, and the nodes are classified with the preset Threshold.
  • The access frequency thresholds are set from the analysis results. They mainly comprise the threshold θ for migrating data blocks from the hot zone to the cold zone buffer queue, and the sleep threshold λ that the set of data block frequencies on a buffer node must satisfy.
  • FIG. 2 is a schematic diagram of the number of nodes in the cluster hot zone during the simulation experiment.
  • the time period when the number of active nodes in the cluster is 8 is the longest, but peaks may occur at certain times, such as 169H and 170H.
  • the reason for this situation may be that the resource amount of the cluster cannot meet the demand. For example, in case of centralized access, the cluster needs more node resources. Another reason is that when you need to access the data of the dormant node, you need to activate the dormant node to join the active node, and there will be a certain probability of continuously accessing the data of the dormant node, so that more nodes are activated to join the cluster.
  • the energy consumption value is calculated according to the formula (2) energy consumption calculation formula.
  • the total energy consumption value of the cluster from t 0 to t 1 denoted by E, is obtained by integrating the power of the node P(u(t)), as shown in formula (2):
  • the active nodes of the cluster can achieve the purpose of adjusting the scale of the cluster according to the number of visits and the changing law of the visit hotspot, thereby reducing the energy consumption of the cluster.
  • the overall trend of the cluster size adjustment process is relatively gentle, but at some points in time, there will be continuous changes in the cluster size. The reasons for this phenomenon may be as follows:
  • File access shows a "long tail effect": requests reach data blocks with very low access frequency, so the nodes holding them find it hard to enter the sleep state, or must be woken after sleeping to serve the data.
  • This application proposes an energy-saving storage algorithm that divides the cluster into hot and cold zones based on cluster node load state prediction.
  • The main basis is that cluster data access follows a general pattern. Following that pattern, the data are stored in separate hot and cold zones, so that some of the nodes holding cold data, which is accessed with very low probability, can be put to sleep.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

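The invention discloses an energy-saving storage method that divides a cluster into hot and cold zones based on cluster node load state prediction, and belongs to the field of information technology processing. This application first introduces the idea of the method, then describes its execution steps and its pseudocode implementation. It next analyzes the energy-saving effect of the strategy theoretically, and then presents the experimental process, including the experimental environment, the choice of data set, and the data block access format. Finally, it analyzes the experimental results: a quantitative comparison with the energy consumption of a Hadoop cluster without hot and cold zone division verifies that the energy-saving storage strategy of this application has a certain practical energy-saving effect.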

Description

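Energy-saving storage method with hot and cold zone division based on cluster node load state prediction
Technical Field
The invention belongs to the field of information technology processing, and specifically relates to an energy-saving storage method with hot and cold zone division based on cluster node load state prediction.
Background
The development of cloud computing has steadily increased the scale and number of data centers, and with it the energy cost of data center clusters. In a Hadoop cluster, each data block has multiple replicas to ensure data reliability. Over time, the access frequency of data blocks gradually falls, and the computing load of the nodes falls with it. As a result, some computing nodes run at low load and waste energy. At the same time, the data blocks stored on data nodes are cold data for a considerable proportion of the time, and these data occupy the storage resources of the computing nodes. It is therefore necessary to migrate the replicas of data blocks and put some nodes to sleep to save energy.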
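Summary of the Invention
In view of the above technical problems in the prior art, the present invention proposes an energy-saving storage method with hot and cold zone division based on cluster node load state prediction, which is reasonably designed, overcomes the shortcomings of the prior art, and achieves good results.
To achieve the above object, the present invention adopts the following technical solution:
An energy-saving storage method with hot and cold zone division based on cluster node load state prediction, comprising the following steps: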
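Step 1: Based on the LSTM prediction model, predict the load of each node within period T, and sort the list Load_List by load from high to low:
Figure PCTCN2019082592-appb-000001
where node_i denotes the i-th node, i = 1, 2, …, m;
Figure PCTCN2019082592-appb-000002
denotes the load of node_i within period T; node_j denotes the j-th node, j = m+1, m+2, …, n;
Figure PCTCN2019082592-appb-000003
denotes the load of node_j within period T;
Step 2: Using the preset load threshold Threshold, divide the nodes in the load list Load_List into a hot zone Nset_hot and a cold zone Nset_cold. Assume the hot zone has m nodes and the cold zone has n-m nodes, with m ≤ n, that is:
Nset_hot = {node_1, node_2, …, node_i, …, node_m}, where i = 1, 2, …, m;
Nset_cold = {node_{m+1}, node_{m+2}, …, node_j, …, node_n}, where j = m+1, m+2, …, n;
Step 3: Compare the resource amount Resource of the hot zone with the demand Requirement;
if Resource < Requirement, go to Step 4;
if Resource ≥ Requirement, go to Step 5;
Step 4: Add the nodes of the buffer queue to the hot zone one by one until Resource ≥ Requirement in the hot zone or all nodes in the buffer queue have joined the hot zone; if Resource ≥ Requirement is still not satisfied, wake the nodes of the sleep queue one by one and add them to the hot zone until Resource ≥ Requirement is satisfied, then perform load balancing;
Step 5: Store newly created data blocks on the nodes of the hot zone Nset_hot, using the cluster's default data block placement strategy;
Step 6: Compute the data block access frequency list of the hot zone nodes
Figure PCTCN2019082592-appb-000004
and compare the data block access frequency in node_i
Figure PCTCN2019082592-appb-000005
with the coldness threshold θ;
if the result is
Figure PCTCN2019082592-appb-000006
the data block set is migrated to the cold zone buffer queue;
if the result is
Figure PCTCN2019082592-appb-000007
the data block set remains in the hot zone;
Step 7: Compute the data block access frequency list of the buffer nodes in the cold zone Nset_cold
Figure PCTCN2019082592-appb-000008
and compare the data block access frequency in node_j
Figure PCTCN2019082592-appb-000009
with the sleep threshold λ;
if the result is
Figure PCTCN2019082592-appb-000010
node_j is put to sleep;
if the result is
Figure PCTCN2019082592-appb-000011
node_j remains in the cold zone buffer node queue;
Step 8: Update the prediction model and predict the node load for the next period T.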
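Steps 1 to 4 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the per-node `capacity` map, and the choice to treat a load equal to the threshold as hot are assumptions made for illustration, and the predicted loads are taken as given rather than produced by an LSTM.

```python
from collections import deque


def partition_by_load(predicted_load, threshold):
    """Steps 1-2: sort nodes by predicted load (high to low) and split at the threshold.

    Assumption: a node whose load equals the threshold counts as hot.
    """
    load_list = sorted(predicted_load.items(), key=lambda kv: kv[1], reverse=True)
    hot = [node for node, load in load_list if load >= threshold]
    cold = [node for node, load in load_list if load < threshold]
    return hot, cold


def scale_up_hot_zone(hot, buffer_queue, sleep_queue, capacity, requirement):
    """Steps 3-4: add buffer nodes first, then wake sleeping nodes,
    until Resource >= Requirement or both queues are empty."""
    resource = sum(capacity[n] for n in hot)
    for queue in (buffer_queue, sleep_queue):
        while resource < requirement and queue:
            node = queue.popleft()
            hot.append(node)
            resource += capacity[node]
    return resource


predicted = {"node1": 0.62, "node2": 0.15, "node3": 0.48, "node4": 0.05}
hot, cold = partition_by_load(predicted, threshold=0.40)
# hot is ["node1", "node3"]; cold is ["node2", "node4"]
```

Buffer nodes are tried before sleeping nodes because they are still running, so adding them avoids the cost of waking a host; load balancing after the scale-up is left out of the sketch.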
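Beneficial technical effects of the invention:
This application first introduces the idea of the method, then describes its execution steps and its pseudocode implementation. It next analyzes the energy-saving effect of the strategy theoretically, and then presents the experimental process, including the experimental environment, the choice of data set, and the data block access format. Finally, it analyzes the experimental results: a quantitative comparison with the energy consumption of a Hadoop cluster without hot and cold zone division verifies that the energy-saving storage strategy of this application has a certain practical energy-saving effect.
Under the default data block placement strategy of a traditional Hadoop cluster, the cluster load is relatively balanced, with a load rate of about 32%. With the HES-Storage strategy of this application, most of the workload in the hot zone lies between 40% and 60%, and the average load rate of hot zone nodes is about 50.1%. Compared with the load of about 32% in a traditional cluster, this makes better use of the resources of the cluster nodes. In addition, 20% to 30% of the hosts sleep during different periods of time, which reduces the total energy consumption of the cluster.
The invention effectively increases the average load of nodes in the hot zone, significantly reduces the total energy consumption of the cluster, and is of far-reaching significance for improving energy utilization, reducing data center operating costs, and building clean, green, energy-saving, and environmentally friendly data centers.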
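Brief Description of the Drawings
Figure 1 is a schematic diagram of the node distribution under the hot and cold zone division strategy.
Figure 2 is a schematic diagram of the number of nodes in the hot zone.
Figure 3 is a schematic diagram of the energy consumption per unit time over a 10-hour interval.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific embodiments:
Data block access in a Hadoop cluster follows a general pattern. This application obtained a general statement of that pattern from the literature and designed a storage strategy for Hadoop cluster data blocks according to it.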
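1. Design rationale for data block storage
Access to a data block generally occurs within a short time after it is uploaded. As time passes, the data block becomes cold and the probability that it is accessed falls. Verifying that this pattern holds is of great help in designing a storage strategy for data blocks.
1.1 Analysis of data block storage in Hadoop clusters
(1) Analysis of the replica storage mechanism
For a data block Data, the default replica storage mechanism of a Hadoop cluster stores its replicas in three locations. The replica set of the data block is written data = {file_1, file_2, file_3}, and the set of their locations is location = {Location_1, Location_2, Location_3}, where Location_1 is a node selected at random by the cluster, Location_2 is a different node in the same rack as Location_1, and Location_3 is on a node in a different rack from Location_2.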
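The three-location rule as stated in this paragraph can be sketched in a few lines of Python. The sketch follows the description given here, not the Hadoop source code; the `racks` mapping, the function name, and the restriction of Location_1 to racks with at least two nodes (so that Location_2 always exists) are assumptions for illustration.

```python
import random


def place_replicas(racks, rng=random):
    """Pick three replica locations following the rule described in the text.

    racks: dict mapping a rack id to the list of node names in that rack.
    Assumes at least one rack with two or more nodes and at least one other
    non-empty rack.
    """
    # Location_1: a randomly chosen node (restricted to racks that can also host Location_2).
    eligible = [r for r, nodes in racks.items() if len(nodes) >= 2]
    rack1 = rng.choice(eligible)
    loc1 = rng.choice(racks[rack1])
    # Location_2: same rack as Location_1, different node.
    loc2 = rng.choice([n for n in racks[rack1] if n != loc1])
    # Location_3: a node in a different rack from Location_2.
    other_racks = [r for r in racks if r != rack1 and racks[r]]
    loc3 = rng.choice(racks[rng.choice(other_racks)])
    return loc1, loc2, loc3


example = place_replicas({"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]})
```

Passing a seeded `random.Random` instance as `rng` makes the placement reproducible, which is convenient when simulating the strategy.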
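Data block storage in a Hadoop cluster is designed mainly for reliability: when a node holding a data block fails, a replica of the block can be fetched from another node in the cluster, which also keeps the stored data safe. This strategy, however, does not help reduce energy consumption.
(2) Analysis of the data access pattern
To verify the data block access pattern, this application uses the Graylog log analysis tool, calling its API, to analyze one month of access logs for the English pages of Wikipedia.
According to Graylog's statistics on the log files, there are 14226 files in total, of which 10799 were accessed fewer than 6 times during the statistical period, accounting for 75.9103% of the files. The total number of accesses to all files in the cluster is 1769884, and files accessed fewer than 6 times account for 212352 accesses, or 11.9981% of the total. In other words, about 76% of the files receive only about 12% of all accesses. Further statistics show that files accessed only once make up 63.2863% of the total, while the most frequently accessed file was accessed 24964 times. This analysis shows that file access has a clear time-dependent tendency.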
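The two percentages quoted above follow directly from the reported counts. The short Python check below recomputes them; the variable names are illustrative and only the four counts come from the text.

```python
total_files = 14226        # files in the log set
rare_files = 10799         # files accessed fewer than 6 times
total_accesses = 1769884   # accesses to all files
rare_accesses = 212352     # accesses to the rarely accessed files

file_share = 100 * rare_files / total_files
access_share = 100 * rare_accesses / total_accesses
print(f"{file_share:.4f}% of files receive {access_share:.4f}% of accesses")
```

The result matches the figures of about 76% of files and about 12% of accesses given in the text.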
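1.2 Design of the hot and cold zone division strategy
This application proposes HES-Storage (Hadoop Energy-Saving Storage), an energy-saving storage strategy based on node state prediction. Using a model trained on cluster node data, the cluster is divided into a cold zone and a hot zone. Many data centers currently use static division: the cluster is divided into hot and cold zones when it is deployed. This method is simple to manage, but its problems cannot be ignored. Because the computing demand placed on the cluster changes constantly, the cluster load changes constantly too. When the load rises and the hot zone nodes are insufficient, cold zone nodes must take part, which involves migrating a large number of file blocks and wastes a great deal of energy; when the load is small, the computing capacity of the hot zone nodes is wasted.
According to the data access pattern, data blocks with high access frequency are stored on active nodes, and nodes with low access frequency are first placed in the pending-sleep queue. A data block coldness threshold θ is also set: if the data blocks on a node meet the coldness threshold θ, the node is put to sleep.
This application mainly considers the following aspects when designing the storage strategy:
(1) Denote a node in the cluster as Slave_i. By training on historical data, a prediction of the load of Slave_i within the future period T is produced. When the predicted load is below the preset load value, Slave_i is classified as a low-load node, is no longer assigned new computing tasks, and is added to the buffer queue, where it also serves as a standby destination node for migrated cold data.
(2) Nodes with high load in the historical state generally store more frequently accessed data blocks, that is, more replicas of data related to computing tasks. The general principle for replicas is to migrate from nodes holding few data blocks to nodes holding many and to keep the latter under high load, which reduces the energy cost of migration to some extent.
Based on the above analysis, this application divides the cluster into a hot zone and a cold zone, adopts different data block storage strategies in the two zones, and maintains a buffer queue and a sleep queue in the cold zone. A buffer node is a node in the cold zone whose load is very low but that has not been put to sleep. The purpose of the buffer area is to hold data blocks from the hot zone that have reached the coldness threshold, because these blocks have only recently cooled down. According to the data block access pattern, they may be accessed again, so the buffer area is set up to maintain good service performance.
Figure 1 is a schematic diagram of the node distribution under the hot and cold zone division strategy.
A. Hot zone: the main contributor to the energy consumption of the cluster; it also stores the data that the cluster accesses frequently. When the computing demand of the cluster increases and the hot zone nodes become overloaded, service quality drops. The nodes in the buffer queue are then added to the hot zone first, and the relationship between the total cluster resource amount and the task request amount is recalculated. If the requirement is still not met, nodes in the sleep queue are woken, which dynamically adjusts the number of active nodes in the cluster. New data blocks are stored in the hot zone by default. The storage strategy uses the default storage mechanism of the Hadoop cluster, because that mechanism distributes data blocks across the cluster in a relatively balanced way, following the idea of load balancing, which helps make full use of the computing capacity of the hot zone nodes.
B. Cold zone: this zone maintains a sleep queue and a buffer queue. The sleep queue holds data that has become cold, which is very unlikely to be accessed again. The buffer queue is built from the prediction model: the node load states for the future period T are predicted, the nodes are ordered by predicted load from high to low, and the nodes with low predicted load are placed in the buffer queue. These nodes are standby storage nodes for cold data. A node moves from the buffer queue to the sleep queue only when the sleep condition is met, that is, when the access frequency of every data block on the buffer node satisfies the sleep threshold λ; the node is then put to sleep.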
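2. Description of the data storage strategy
Definition 2.1: Let the number of cluster nodes be n, so that the node set is Nset = {node_1, node_2, ..., node_n}. The hot zone is a subset Nset_hot of Nset, written:
Figure PCTCN2019082592-appb-000012
where k_i (i = 1, 2, …, m) ∈ [1, n]. The remaining nodes of Nset form the cold zone Nset_cold, written:
Figure PCTCN2019082592-appb-000013
Figure PCTCN2019082592-appb-000014
where k_j (j = m+1, m+2, …, n) ∈ [1, n].
Definition 2.2: A data block is denoted block_i. Migrating the data block from the hot zone to a cold zone buffer node requires that the coldness threshold, denoted θ, be met; putting a buffer node to sleep requires that the sleep threshold, denoted λ, be met.
(1) Description of the algorithm steps:
The energy-saving storage process starts from the load prediction results of the prediction model and classifies the cluster nodes into zones according to the load threshold Threshold. The hot zone uses the default storage strategy, and the cold zone uses centralized storage. The access frequency of data blocks in the hot zone is computed, and blocks that meet the coldness threshold are migrated to a buffer node. The set of data block access frequencies on each buffer node is computed, and buffer nodes that meet the sleep threshold are put to sleep. The relationship between the cluster resource amount and the demand is then updated, and the prediction model is updated.
The detailed algorithm is given in Algorithm 1-1:
Figure PCTCN2019082592-appb-000015
Figure PCTCN2019082592-appb-000016
(2) Algorithm pseudocode
Figure PCTCN2019082592-appb-000017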
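Algorithm 1-1 and its pseudocode survive here only as image placeholders, so the following Python sketch is a reconstruction of the two threshold checks from the prose, not the authors' pseudocode. The function names and the use of a strict "less than" comparison are assumptions; the exact conditions are in the missing figures.

```python
def cold_blocks(block_freq, theta):
    """Hot zone check: return the blocks on a hot node whose access frequency
    is below the coldness threshold theta (candidates for the buffer queue).

    Assumption: 'meets the coldness threshold' means frequency < theta.
    """
    return [block for block, freq in block_freq.items() if freq < theta]


def can_sleep(block_freq, lam):
    """Cold zone check: a buffer node may sleep only if every block it holds
    has an access frequency below the sleep threshold lam.

    Assumption: strict comparison; an empty node is allowed to sleep.
    """
    return all(freq < lam for freq in block_freq.values())


hot_node = {"b1": 9, "b2": 1, "b3": 0}
to_migrate = cold_blocks(hot_node, theta=2)   # ["b2", "b3"]
```

The asymmetry is deliberate in the text: migration is decided per data block, while sleeping is decided per node and requires all of its blocks to qualify.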
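3. Energy-saving analysis of the algorithm
Over time, some hot data becomes cold data. These "cooled" data must be migrated to the buffer nodes of the cold zone according to a defined strategy. If the cold data stored on a buffer queue node in the cold zone satisfies the sleep threshold λ, the buffer node is put to sleep.
(1) Data block storage in the cold zone
Data in the hot and cold zones are stored differently. The cold zone uses "centralized" storage, because the probability that cold data is accessed keeps falling over time. With this storage method, because the data stored in this zone is migrated from hot zone nodes, the earlier a node goes to sleep, the lower the probability that its data is accessed. That is, the earlier a data block enters the cold zone, the earlier its host enters the sleep state. In addition, data that enter the cold zone within the same adjacent time period are stored on the same data node until that node's storage limit is reached.
For example, for the nodes in the cold zone queue Queue_Cold_Area, the data blocks stored on adjacent nodes of the queue entered the cold zone at similar times. That is, the migration times of the data blocks stored on Slave_i and Slave_{i+1} are closer to each other than those of Slave_i and Slave_{i+2}. The advantage is that data blocks that cooled at similar times are stored together; when the access frequencies of the block list stored on a node are computed, the sleep threshold λ is met more easily than with scattered or random placement, which gives a better energy-saving effect.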
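The "centralized" placement described above amounts to filling cold zone nodes one after another in arrival order. A minimal Python sketch, assuming node capacity is measured as a number of blocks (the text only speaks of a storage limit) and using an illustrative function name:

```python
def place_cold_blocks(blocks_in_arrival_order, node_capacity):
    """Fill cold zone nodes sequentially so that blocks that cooled at
    similar times end up on the same or adjacent nodes.

    node_capacity: maximum number of blocks per node (assumed >= 1).
    Returns a list of per-node block lists, ordered like Queue_Cold_Area.
    """
    nodes = [[]]
    for block in blocks_in_arrival_order:
        if len(nodes[-1]) >= node_capacity:
            nodes.append([])  # current node is full, move on to the next one
        nodes[-1].append(block)
    return nodes


layout = place_cold_blocks(["b1", "b2", "b3", "b4", "b5"], node_capacity=2)
# layout is [["b1", "b2"], ["b3", "b4"], ["b5"]]
```

Because the earliest-filled nodes hold the oldest cold data, they are the first to satisfy the sleep threshold λ as a whole.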
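(2) Data block storage in the hot zone
Data in the hot zone follow a "balanced" distribution: according to the cluster's default load balancing strategy, the data are distributed as evenly as possible among the hot zone nodes to provide better access service.
The HES-Storage data storage strategy needs to count the energy consumed by cluster nodes from the start of service until they go to sleep, with separate totals for hot zone nodes and cold zone nodes. The hot zone nodes are the main contributor to the energy consumption of the cluster, but the buffer queue nodes in the cold zone must also be counted.
The total energy consumption of the cluster can therefore be computed with formula (1):
Figure PCTCN2019082592-appb-000018
The core idea of the algorithm in this application is to store data with different access frequencies under different strategies, according to the data access pattern. If the energy consumption of the cold zone buffer queue is taken into account and data in the hot zone are moved to the cold zone as they cool, the number of active cluster nodes can be adjusted dynamically according to the predicted load, which reduces the energy consumption of the cluster.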
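4. Experimental results and analysis
4.1 Experimental design
The experimental scenario simulates users accessing web pages, and the algorithm of this application targets data partition storage only in this scenario. To verify the effectiveness of this data block placement strategy, the input format of data block requests is designed first; see Table 1.1:
Table 1.1 Format of read and write requests for data blocks
Figure PCTCN2019082592-appb-000019
The data set consists of the access logs of the English pages of Wikipedia from October 1, 2018 to October 14, 2018, which can be downloaded from the official cloudsuite website. The access format is shown in Table 1.1. These data are used mainly to simulate the access behavior of real users and to observe the load changes of the whole cluster and the output of the sleep queue nodes. Finally, the energy consumption results are quantified with the energy model and compared with the cluster energy consumption measured without hot and cold zone division.
4.2 Experimental process
The experiment is meant to verify the energy-saving effect of the storage strategy proposed in this application, so the choice of data set and the data block access format were designed accordingly. The main steps of the experiment are as follows:
(1) Train on historical data. The data must first be preprocessed. An Agent probe installed in the underlying data collection layer supplies CPU and memory data in real time; these data are used to train the LSTM prediction model, and the nodes are classified with the preset Threshold.
(2) Set the access frequency thresholds from the analysis results. They mainly comprise the threshold θ for migrating data blocks from the hot zone to the cold zone buffer queue, and the sleep threshold λ that the set of data block frequencies on a buffer node must satisfy.
(3) Issue access requests continuously, passing parameters as the interface requires, while the underlying layer monitors the resource utilization of all cluster nodes, their active state, their sleep periods, and so on. The detailed metrics are listed in Table 1.2:
Table 1.2 Key metrics of cluster energy consumption
Figure PCTCN2019082592-appb-000020
Figure PCTCN2019082592-appb-000021
These system metrics are collected and stored in the database on the SERVER side, and are also written to the cluster's run log for the next stage of data analysis.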
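Step (1) above trains an LSTM on CPU and memory samples, but the text does not say how the samples are turned into training examples. One common preparation, shown here purely as an assumption, is a sliding window: each example pairs the last `window` samples with the value to predict next. The function name, the window length, and the choice of the next CPU percentage as the prediction target are all hypothetical.

```python
def make_windows(samples, window):
    """Turn a time-ordered list of (cpu_percent, mem_percent) samples into
    (history, target) pairs for a sequence model.

    Assumption: the target is the CPU percentage of the sample that
    immediately follows each window.
    """
    pairs = []
    for i in range(len(samples) - window):
        history = samples[i:i + window]
        target = samples[i + window][0]
        pairs.append((history, target))
    return pairs


samples = [(10, 20), (30, 40), (50, 60), (70, 80)]
pairs = make_windows(samples, window=2)
# pairs[0] is ([(10, 20), (30, 40)], 50)
```

The resulting pairs could then be fed to any LSTM implementation; the model itself is outside the scope of this sketch.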
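4.3 Results and analysis
Two steps in the experiment are critical, namely the timing of the two data block migrations. First, the data block frequency set of the hot zone is computed and compared with the threshold θ, and the qualifying data block sets are migrated to the cold zone. This threshold strongly affects the actual energy saving of the algorithm, and it is also well suited to the scenario of simulated user web page access. Second, when deciding whether a cold zone buffer node should sleep, the access frequency of every data block on the node must be checked; the node may be put to sleep only when all data blocks in its stored set satisfy the sleep threshold λ. The results are as follows:
(1) Changes in the number of active cluster nodes
Most of the cluster's energy consumption comes from the hot zone nodes and depends mainly on two factors: the number of hot zone nodes and their average load. The number and load of hot zone nodes therefore have an important influence on the energy-saving effect. Figure 2 shows how the number of hot zone nodes changed during the simulation experiment.
The following conclusions can be drawn from Figure 2:
1) The cluster has 8 active nodes for the longest share of the time, but peaks occur at certain times, such as 169H and 170H. One possible reason is that the cluster's resources cannot meet the demand, for example during a burst of concentrated access, when the cluster needs the resources of more nodes. Another is that accessing data on a sleeping node requires waking that node and adding it to the active nodes; with some probability, data on sleeping nodes are accessed repeatedly, so that more nodes are woken and added to the cluster.
2) Under the default data block placement strategy of a traditional Hadoop cluster, the cluster load is relatively balanced, with a load rate of about 32%. With the HES-Storage strategy, most of the workload in the hot zone lies between 40% and 60%, and the average load rate of hot zone nodes is about 50.1%. Compared with the load of about 32% in a traditional cluster, this makes better use of the resources of the cluster nodes. In addition, 20% to 30% of the hosts sleep during different periods of time, which reduces the total energy consumption of the cluster.
(2) Analysis of cluster energy consumption per unit time
The energy consumption is computed with the energy formula (2). The total energy consumption of the cluster from t_0 to t_1, denoted E, is obtained by integrating the node power P(u(t)), as shown in formula (2):
Figure PCTCN2019082592-appb-000022
Because the cluster runs for a long time, energy consumption per unit time is compared here. Substituting the system metrics Per_node_CPUpercent, Per_node_MEMpercent, Per_node_aliveTime, and so on into the calculation gives an estimate of the energy consumption. Figure 3 shows the energy consumption per unit time over a 10-hour interval.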
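Formula (2) is described in the text as the integral of node power over time, which can be written E = ∫ P(u(t)) dt from t_0 to t_1. A minimal Python sketch of how such an estimate could be computed from sampled utilization follows. The linear power model and its idle and peak wattages are hypothetical placeholders, not values from the patent, and the trapezoidal rule is one possible numerical choice.

```python
def power(u, p_idle=100.0, p_max=200.0):
    """Hypothetical linear power model: watts drawn at utilization u in [0, 1].

    p_idle and p_max are illustrative values, not taken from the source.
    """
    return p_idle + (p_max - p_idle) * u


def energy(times, utilizations, power_fn=power):
    """Approximate E = integral of P(u(t)) dt over the sampled interval
    with the trapezoidal rule. times are in seconds, the result in joules."""
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        p_prev = power_fn(utilizations[k - 1])
        p_curr = power_fn(utilizations[k])
        total += 0.5 * (p_prev + p_curr) * dt
    return total


# One node at a steady 50% utilization for 10 seconds.
e = energy([0, 10], [0.5, 0.5])
```

Summing `energy` over the hot zone nodes and the cold zone buffer nodes gives a cluster total in the spirit of the separate hot and cold accounting described in the energy-saving analysis; sleeping nodes contribute nothing while asleep under this simplified model.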
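The experimental results show that the set of active cluster nodes can be resized according to the access volume and the shifting access hotspots, which reduces cluster energy consumption. Figure 2 shows that the overall trend of cluster resizing is fairly smooth, but at some points in time the cluster size changes continuously. Possible reasons are as follows:
1) File access shows a "long tail effect": requests reach data blocks with very low access frequency, so the nodes holding them find it hard to enter the sleep state, or must be woken after sleeping to serve the data.
2) When file access reaches a peak, the average cluster load increases, but the cluster size does not grow immediately. Only in the next time period, based on the average load of the cluster nodes, is it predicted that the load rate of many nodes exceeds the threshold. In that case the buffer queue nodes are added to the cluster first, and if the predicted average load rate is still not met, sleeping nodes are woken. This introduces a time lag that affects access performance to some extent.
3) Figure 3 shows that, after data blocks are stored with the HES-Storage strategy, the overall energy consumption of the cluster is lower than with the default storage method of the Hadoop cluster. The number of cluster nodes is adjusted according to the cluster load, and nodes that reach the threshold are put to sleep, which lowers the total energy consumption of the cluster and achieves the design goal of the storage strategy in this application.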
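5. Summary
This application proposes an energy-saving storage algorithm that divides the cluster into hot and cold zones based on cluster node load state prediction. The main basis is that cluster data access follows a general pattern. Following that pattern, the data are stored in separate hot and cold zones, so that some of the nodes holding cold data, which is accessed with very low probability, can be put to sleep. This application first introduces the idea of the method, then describes its execution steps and its pseudocode implementation. It next analyzes the energy-saving effect of the strategy theoretically, and then presents the experimental process, including the experimental environment, the choice of data set, and the data block access format. Finally, it analyzes the experimental results: a quantitative comparison with the energy consumption of a Hadoop cluster without hot and cold zone division verifies that the energy-saving storage strategy of this application has a certain practical energy-saving effect.
Of course, the above description does not limit the invention, and the invention is not restricted to the above examples. Changes, modifications, additions, or substitutions made by those skilled in the art within the essential scope of the invention also fall within its scope of protection.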

Claims (1)

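  1. An energy-saving storage method with hot and cold zone division based on cluster node load state prediction, characterized by comprising the following steps:
    Step 1: Based on the LSTM prediction model, predict the load of each node within period T, and sort the list Load_List by load from high to low:
    Figure PCTCN2019082592-appb-100001
    where node_i denotes the i-th node, i = 1, 2, …, m;
    Figure PCTCN2019082592-appb-100002
    denotes the load of node_i within period T; node_j denotes the j-th node, j = m+1, m+2, …, n;
    Figure PCTCN2019082592-appb-100003
    denotes the load of node_j within period T;
    Step 2: Using the preset load threshold Threshold, divide the nodes in the load list Load_List into a hot zone Nset_hot and a cold zone Nset_cold. Assume the hot zone has m nodes and the cold zone has n-m nodes, with m ≤ n, that is:
    Nset_hot = {node_1, node_2, …, node_i, …, node_m}, where i = 1, 2, …, m;
    Nset_cold = {node_{m+1}, node_{m+2}, …, node_j, …, node_n}, where j = m+1, m+2, …, n;
    Step 3: Compare the resource amount Resource of the hot zone with the demand Requirement;
    if Resource < Requirement, go to Step 4;
    if Resource ≥ Requirement, go to Step 5;
    Step 4: Add the nodes of the buffer queue to the hot zone one by one until Resource ≥ Requirement in the hot zone or all nodes in the buffer queue have joined the hot zone; if Resource ≥ Requirement is still not satisfied, wake the nodes of the sleep queue one by one and add them to the hot zone until Resource ≥ Requirement is satisfied, then perform load balancing;
    Step 5: Store newly created data blocks on the nodes of the hot zone Nset_hot, using the cluster's default data block placement strategy;
    Step 6: Compute the data block access frequency list of the hot zone nodes
    Figure PCTCN2019082592-appb-100004
    and compare the data block access frequency in node_i
    Figure PCTCN2019082592-appb-100005
    with the coldness threshold θ;
    if the result is
    Figure PCTCN2019082592-appb-100006
    the data block set is migrated to the cold zone buffer queue;
    if the result is
    Figure PCTCN2019082592-appb-100007
    the data block set remains in the hot zone;
    Step 7: Compute the data block access frequency list of the buffer nodes in the cold zone Nset_cold
    Figure PCTCN2019082592-appb-100008
    and compare the data block access frequency in node_j
    Figure PCTCN2019082592-appb-100009
    with the sleep threshold λ;
    if the result is
    Figure PCTCN2019082592-appb-100010
    node_j is put to sleep;
    if the result is
    Figure PCTCN2019082592-appb-100011
    node_j remains in the cold zone buffer node queue;
    Step 8: Update the prediction model and predict the node load for the next period T.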
PCT/CN2019/082592 2019-04-10 2019-04-15 Energy-saving storage method with hot and cold zone division based on cluster node load state prediction WO2020206704A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910282581.5A CN110096350B (zh) 2019-04-10 2019-04-10 Energy-saving storage method with cold/hot region division based on cluster node load state prediction
CN201910282581.5 2019-04-10

Publications (1)

Publication Number Publication Date
WO2020206704A1 true WO2020206704A1 (zh) 2020-10-15

Family

ID=67444583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082592 WO2020206704A1 (zh) 2019-04-10 2019-04-15 Energy-saving storage method with cold/hot region division based on cluster node load state prediction

Country Status (2)

Country Link
CN (1) CN110096350B (zh)
WO (1) WO2020206704A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3968173A4 (en) * 2019-09-25 2022-05-11 Huawei Cloud Computing Technologies Co., Ltd. METHOD AND APPARATUS FOR MANAGING DATA IN A PARTITION TABLE, MANAGEMENT NODE AND STORAGE MEDIA
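CN111045598B (zh) * 2019-10-10 2023-08-15 深圳市金泰克半导体有限公司 Data storage method and apparatus
CN113407620B (zh) * 2020-03-17 2023-04-21 北京信息科技大学 Data block placement method and system based on a heterogeneous Hadoop cluster environment
CN113570004B (zh) * 2021-09-24 2022-01-07 西南交通大学 Riding hotspot area prediction method, apparatus, device, and readable storage medium
CN113778692B (zh) * 2021-11-10 2022-03-08 腾讯科技(深圳)有限公司 Data processing method and apparatus, computer device, and storage medium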

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
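CN104573119A (zh) * 2015-02-05 2015-04-29 重庆大学 Energy-saving-oriented Hadoop distributed file system storage strategy in cloud computing
CN108810140A (zh) * 2018-06-12 2018-11-13 湘潭大学 Hierarchical storage method based on dynamic threshold adjustment in a cloud storage system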

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593452B (zh) * 2013-11-21 2017-06-13 北京科技大学 Data-intensive cost optimization method based on the MapReduce mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN HAO: "Research on Energy-conserving Strategies of File Storage Based on Cluster Scale Adjustment", CHINESE MASTER'S THESES FULL-TEXT DATABASE, 1 April 2017 (2017-04-01), pages 1 - 67, XP055741519 *
YANG LIU ; CHASE Q. WU ; MENG WANG ; AIQIN HOU ; YONGQIANG WANG: "On A Dynamic Data Placement Strategy for Heterogeneous Hadoop Clusters", 2018 INTERNATIONAL SYMPOSIUM ON NETWORKS, COMPUTERS AND COMMUNICATIONS (ISNCC), 21 June 2018 (2018-06-21), pages 1 - 7, XP033442711, DOI: 10.1109/ISNCC.2018.8530970 *

Also Published As

Publication number Publication date
CN110096350A (zh) 2019-08-06
CN110096350B (zh) 2020-05-05

Similar Documents

Publication Publication Date Title
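WO2020206704A1 (zh) Energy-saving storage method with cold/hot region division based on cluster node load state prediction
WO2020206705A1 (zh) Job scheduling method based on cluster node load state prediction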
Bitirgen et al. Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach
Pinheiro et al. Energy conservation techniques for disk array-based servers
US8010764B2 (en) Method and system for decreasing power consumption in memory arrays having usage-driven power management
US7752470B2 (en) Method and system for power management including device controller-based device use evaluation and power-state control
CN103795781B (zh) Distributed caching method based on file prediction
Liao et al. Energy-efficient algorithms for distributed storage system based on block storage structure reconfiguration
Karakoyunlu et al. Exploiting user metadata for energy-aware node allocation in a cloud storage system
CN104765572B (zh) Energy-saving virtual storage server system and scheduling method thereof
WO2021051441A1 (zh) Energy-saving system for a Hadoop cluster
Chen et al. Power and thermal-aware virtual machine scheduling optimization in cloud data center
Tian et al. Modeling and analyzing power management policies in server farms using stochastic petri nets
Wang et al. An Improved Memory Cache Management Study Based on Spark.
Chou et al. Exploiting replication for energy-aware scheduling in disk storage systems
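CN110308991B (zh) Data center energy-saving optimization method and system based on random tasks
WO2022104500A9 (zh) Load control method and apparatus, computer device, and storage medium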
Li et al. SLA‐Aware and Energy‐Efficient VM Consolidation in Cloud Data Centers Using Host State 3rd‐Order Markov Chain Model
CN115292030A (zh) Physical memory adaptation method and system in a cloud computing environment
Tai et al. SLA-aware data migration in a shared hybrid storage cluster
CN110968180B (zh) Method and system for reducing GPU power consumption by reducing data transmission
Wu et al. Overview of typical application energy efficiency optimization in high-performance data centers
You et al. Anticipation-based green data classification strategy in cloud storage system
WO2024001994A1 (zh) Energy-saving management method and apparatus, computing device, and computer-readable storage medium
Huixi et al. A combination of host overloading detection and virtual machine selection in cloud server consolidation based on learning method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19924391; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19924391; Country of ref document: EP; Kind code of ref document: A1)