CN111859703B

CN111859703B - A heat-aware data center energy-saving data replica placement method

Info

Publication number: CN111859703B
Application number: CN202010748759.3A
Authority: CN
Inventors: 邓玉辉; 范志峰
Original assignee: Jinan University
Current assignee: Jinan University
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2022-05-10
Anticipated expiration: 2040-07-30
Also published as: CN111859703A

Abstract

The invention discloses a heat-aware energy-saving data replica placement method for data centers. Disks in a storage data center consume a large amount of energy while providing online access services. To address this, the invention designs a replica placement strategy that jointly considers the cooling temperature and cooling energy consumption of the storage data center and the online state and energy consumption of its disks. Based on the airflow organization of the storage data center, the method accounts for heat recirculation, partitions replicas into hot and cold, and predicts replica access, so that hot replicas are preferentially aggregated on nodes that contribute less to heat recirculation. This avoids a large drop in the cooling supply temperature when nodes are turned on, and effectively reduces both the cooling energy consumption and the total energy consumption of the storage data center.

Description

A data center energy-saving data replica placement method based on heat perception

Technical Field

The invention relates to the technical field of replica placement in data centers, and in particular to a heat-aware energy-saving data replica placement method for data centers.

Background

With the exponential growth of data, data centers, as the carriers of that data, are multiplying rapidly, and the number and scale of machine rooms keep expanding. At the same time, the power consumption and carbon emissions of data centers are growing geometrically, and their energy consumption problem is increasingly prominent. For storage data centers, how to effectively reduce energy consumption without affecting data access services urgently needs to be solved.

Existing energy-saving methods for storage data centers mainly save energy by adjusting disk arrays, planning how disks are combined, and adjusting the layout of data sets. However, their energy savings are limited, the local hot spots they create cause substantial cooling energy consumption, and the disk-array computations involved are relatively complex. In current research, few studies use an airflow organization model to reduce energy consumption in replica placement for storage data centers.

Summary of the Invention

The purpose of the invention is to address the high operating energy consumption and low energy efficiency of storage data centers by proposing a heat-aware energy-saving data replica placement method that can effectively reduce the total energy consumption of a storage data center.

The purpose of the invention can be achieved by the following technical solution:

A heat-aware energy-saving data replica placement method for data centers, comprising the following steps:

S1. Construct an energy consumption model for the storage data center according to the airflow organization characteristics and heat recirculation of the data center;

S2. With the lowest total energy consumption as the objective, generate a heat-aware disk sequence DS and divide DS into an active replica area and a redundant replica area, where a replica is a replica of a data set, and a data set is a single data block or a collection of multiple data blocks;

S3. For the disks contained in the active replica area and in the redundant replica area respectively, use the energy consumption model of the storage data center again, with the lowest total energy consumption as the objective, to generate an optimized disk sequence DS_new;

S4. Define and initialize the replica table ReplicaTable, then use ReplicaTable to manage replicas, placing the replicas sequentially in the active replica area and the redundant replica area in the order of the optimized disk sequence DS_new;

S5. Collect replica access statistics over multiple periods, classify replicas as hot or cold, migrate and place the hot and cold replicas accordingly, and update ReplicaTable at the same time.

Further, step S1, constructing an energy consumption model for the storage data center according to the airflow organization characteristics and heat recirculation of the data center, is implemented as follows:

Each chassis on each rack of the storage data center is treated as a node, so the storage data center is divided into a number of nodes. Each node contains several disks, and the disks in a node share one node power supply. According to the airflow organization characteristics and heat recirculation of the data center, a node disk energy consumption model is obtained as a function of the number of active disks in the node. The data center cooling energy consumption model and the total energy consumption model are then obtained from the node heat recirculation coefficient matrix and the node disk energy consumption.

Further, the heat recirculation coefficient matrix between nodes is calculated according to the airflow organization characteristics and heat recirculation of the data center.

Further, in the node disk energy consumption model, disks are classified by service state as off, sleeping, or active. Each state has a different energy consumption, and the disks in their various states together make up the node disk energy consumption. In addition, Requests(s) is defined as a data access request for data set s received by the storage data center, where s is the number of the data set the request asks to access; the disk holding the replica of data set s must then be in the active state.

Further, the process in step S2 of generating the heat-aware disk sequence DS with the lowest total energy consumption as the objective, and dividing DS into an active replica area and a redundant replica area, is as follows:

S201. Using the energy consumption model, with minimum total energy consumption as the objective and following a greedy strategy, traverse all nodes, select the node and disk number of the first disk to be turned on and assigned replicas, and record that disk number in the disk sequence DS. Disk numbers are selected in ascending order, and disk numbers already in DS are not traversed again;

S202. Using the energy consumption model, with the objective of minimizing total energy consumption, select the node and disk number of the next disk to be turned on and assigned replicas, and record that disk number in DS;

S203. Keeping the disk numbers already recorded in DS fixed, repeat step S202 until the complete disk sequence DS is obtained;

S204. Divide DS into an active replica area and a redundant replica area according to the ratio of active replicas to redundant replicas.
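The greedy selection in S201 to S203 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `greedy_disk_sequence`, the representation of a disk as a `(node, disk_index)` pair, and the `toy_energy` cost function with its weights are all assumptions introduced here; in practice the energy callable would be the total energy consumption model of step S1.

```python
from typing import Callable, List, Tuple

def greedy_disk_sequence(
    n: int, m: int, energy: Callable[[List[int]], float]
) -> List[Tuple[int, int]]:
    """Order all n*m disks greedily by the total energy their activation causes.

    n: number of nodes, m: disks per node,
    energy: maps the vector Y (active disks per node) to a total energy value.
    """
    active = [0] * n                      # Y: active disks per node so far
    sequence: List[Tuple[int, int]] = []  # DS as (node, disk index) pairs
    while len(sequence) < n * m:
        best_node, best_cost = -1, float("inf")
        for i in range(n):
            if active[i] == m:            # node i has no unused disk left
                continue
            active[i] += 1                # tentatively turn on one more disk
            cost = energy(active)
            active[i] -= 1                # undo the tentative activation
            if cost < best_cost:
                best_node, best_cost = i, cost
        # Disks inside a node are taken in ascending order of disk index.
        sequence.append((best_node, active[best_node]))
        active[best_node] += 1            # fix the choice and continue
    return sequence

def toy_energy(y: List[int]) -> float:
    # Hypothetical stand-in for the S1 model: a fixed cost of 10 for every
    # node that is on, plus a per-disk cost that differs between nodes.
    weights = [3.0, 1.0]
    return sum(10.0 + w * k for w, k in zip(weights, y) if k > 0)

ds = greedy_disk_sequence(n=2, m=2, energy=toy_energy)
print(ds)  # [(1, 0), (1, 1), (0, 0), (0, 1)]
```

With this toy cost, both disks of the cheaper node are selected before the other node is turned on, which mirrors the aggregation behaviour the text describes.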

Further, in step S204, according to the configured ratio of the number of active replicas, active, to the number of redundant replicas, redundant, the first ⌈…⌉ disks of the disk sequence DS form the active replica area and the remaining disks form the redundant replica area, where d is the total number of disks in DS and ⌈ ⌉ denotes rounding up. The expression inside ⌈ ⌉ is not reproduced in this text.
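A possible reading of the partition in S204 is sketched below. Because the expression inside ⌈ ⌉ is missing from this text, the cut point used here, ⌈d · active / (active + redundant)⌉, is an assumption inferred from the stated ratio, and the function name `split_zones` is hypothetical.

```python
from typing import List, Sequence, Tuple

def split_zones(ds: Sequence[int], active: int, redundant: int) -> Tuple[List[int], List[int]]:
    """Split the disk sequence DS into (active area, redundant area).

    Assumed cut point: ceil(d * active / (active + redundant)), d = len(ds).
    """
    d = len(ds)
    total = active + redundant
    cut = (d * active + total - 1) // total  # integer ceiling, no float error
    return list(ds[:cut]), list(ds[cut:])

active_area, redundant_area = split_zones(list(range(10)), active=1, redundant=2)
print(active_area)     # [0, 1, 2, 3]
print(redundant_area)  # [4, 5, 6, 7, 8, 9]
```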

Further, step S3 proceeds as follows:

S301. Define the set of disk numbers contained in the active replica area as S_Active, and the set of disk numbers contained in the redundant replica area as S_Redundant;

S302. Using the energy consumption model, with the objective of minimizing total energy consumption, traverse the set S_Active, select the first active disk numbers to be turned on and assigned replicas, and record them in the active replica area of the optimized disk sequence DS_new, where active is the number of active replicas;

S303. Using the energy consumption model, with the objective of minimizing total energy consumption, traverse the set S_Redundant, then select redundant disk numbers to be turned on and assigned replicas, and record them in the redundant replica area of the optimized disk sequence DS_new, where redundant is the number of redundant replicas;

S304. Keeping the disk numbers already recorded in DS_new fixed, repeat steps S302 and S303 until the complete disk sequence DS_new is obtained.

Further, the process of defining and initializing the replica table ReplicaTable in step S4 is as follows:

First, initialize ReplicaTable. In response to data access requests, create active − 1 active replicas and redundant redundant replicas of the data, where active is the number of active replicas and redundant is the number of redundant replicas. Following the order of disk numbers in the disk sequence DS_new, place the active replicas in the active replica area and the redundant replicas in the redundant replica area, and write each replica's location into ReplicaTable as follows:

$$ReplicaTable_{s,k} = j$$

where s is the data set number, k indicates the k-th replica of that data set, and j is the number of the disk holding that replica; that is, $ReplicaTable_{s,k} = j$ means that the k-th replica of data set s is stored on disk j.
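One way to picture the replica table is as a mapping from (s, k) to j. The sketch below assumes a plain Python dictionary; the helper names `record_replica` and `disks_for_request` are hypothetical and only illustrate how Requests(s) could be resolved to the disks that must be active.

```python
from typing import Dict, List, Tuple

# Key: (data set number s, replica index k); value: disk number j.
ReplicaTable = Dict[Tuple[int, int], int]

def record_replica(table: ReplicaTable, s: int, k: int, j: int) -> None:
    """Write ReplicaTable[s, k] = j."""
    table[(s, k)] = j

def disks_for_request(table: ReplicaTable, s: int) -> List[int]:
    """Disks holding any replica of data set s, as needed to serve Requests(s)."""
    return sorted(j for (data_set, _k), j in table.items() if data_set == s)

table: ReplicaTable = {}
record_replica(table, s=1, k=1, j=4)
record_replica(table, s=1, k=2, j=9)
record_replica(table, s=2, k=1, j=4)
print(disks_for_request(table, 1))  # [4, 9]
```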

Further, step S5 proceeds as follows:

S501. Record and collect replica access statistics over multiple periods, where the statistics include the access frequency and access time of each data replica;

S502. Compare the access frequencies of the data sets in each period and sort them in descending order. The data sets ranked in the top 20% are defined as hot data, and their replicas as hot replicas; the remaining data sets are cold data, and their replicas are cold replicas. The access frequency of a data set in the current period is the sum of the access frequencies of all its replicas in that period;

S503. Use double exponential smoothing to predict the data set access frequencies for the next D periods, and reclassify the hot and cold replicas;

S504. Migrate the newly defined hot replicas in the active replica area of the optimized disk sequence DS_new to the first 20% of the disks in that area, and migrate the newly defined cold replicas in that area to its remaining disks;

S505. Migrate the newly defined hot replicas in the redundant replica area of the optimized disk sequence DS_new to the first 20% of the disks in that area, and migrate the newly defined cold replicas in that area to its remaining disks. This completes the migration and placement of hot and cold replicas; ReplicaTable is updated at the same time.
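The prediction and classification in S502 and S503 can be sketched as below. The text names double exponential smoothing but does not give its parameters, so this sketch assumes Brown's single-parameter form, a smoothing factor `alpha` strictly between 0 and 1, initialization from the first observation, and rounding the 20% cutoff up; the function names and sample histories are hypothetical.

```python
from typing import Dict, List, Set

def forecast_frequency(history: List[float], alpha: float, horizon: int) -> float:
    """Brown's double exponential smoothing forecast, `horizon` periods ahead."""
    s1 = s2 = history[0]                      # assumed initialization
    for x in history[1:]:
        s1 = alpha * x + (1 - alpha) * s1     # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2    # second smoothing pass
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level + trend * horizon

def hot_data_sets(freq: Dict[int, float], hot_percent: int = 20) -> Set[int]:
    """Data sets ranked in the top hot_percent by frequency (cutoff rounded up)."""
    ranked = sorted(freq, key=lambda s: freq[s], reverse=True)
    n_hot = (len(ranked) * hot_percent + 99) // 100
    return set(ranked[:n_hot])

# Per-period access frequency of each data set (sum over all its replicas).
histories = {1: [1.0, 2.0, 3.0, 4.0], 2: [5.0, 5.0, 5.0], 3: [9.0, 6.0, 3.0]}
predicted = {s: forecast_frequency(h, alpha=0.5, horizon=1) for s, h in histories.items()}
hot = hot_data_sets(predicted)
print(predicted)  # {1: 4.5, 2: 5.0, 3: 2.25}
print(hot)        # {2}
```

The text predicts D future periods but does not say how the D forecasts are combined into one ranking, so the example ranks on a single horizon only.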

Further, the energy consumption model, established with the objective of minimizing total energy consumption, is specified by the following relations:

$$Y = f(ReplicaTable, Requests(s))$$

$$P_{node} = P_{using} Y + P_{idle} \lambda$$

$$COP = 0.0068 t_{sup}^2 + 0.0008 t_{sup} + 0.4580$$

$$t_{sup} = \min(t_{critical} - D P_{node})$$

Here the storage data center is assumed to have n nodes, and the i-th component of $P_{node}$ is the energy consumed by the i-th node. $\lambda_i$ is a binary variable (0 or 1) indicating whether the i-th node is active, with 0 meaning no and 1 meaning yes, and $\lambda$ is the vector of the $\lambda_i$. Y is the vector of the numbers of active disks in each node, determined by the data access requests Requests(s) and the replica table ReplicaTable. $P_{using}$ is the additional energy consumed each time one more disk in a node becomes active, and $P_{idle}$ is the total energy consumed by a node when all of its disks are sleeping. COP is the coefficient of performance of the cooling equipment, and $t_{sup}$ is the supply temperature of the cooling equipment. The temperature threshold $t_{critical}$ is the temperature below which a node's disks must stay in order to provide data access services. The heat recirculation matrix D relates the coefficients of mutual thermal influence between nodes to the node energy consumption.
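The relations above can be combined into a small numerical sketch. Two parts are assumptions rather than statements from the text: that a node counts as active ($\lambda_i = 1$) exactly when it has at least one active disk, and that total energy equals node energy plus node energy divided by COP, a form commonly used with this kind of COP model. The function names and sample values are hypothetical.

```python
from typing import List

def cop(t_sup: float) -> float:
    """Coefficient of performance of the cooling equipment (polynomial from the text)."""
    return 0.0068 * t_sup ** 2 + 0.0008 * t_sup + 0.4580

def node_power(y: List[int], lam: List[int], p_using: float, p_idle: float) -> List[float]:
    """P_node = P_using * Y + P_idle * lambda, evaluated per node."""
    return [p_using * y_i + p_idle * l_i for y_i, l_i in zip(y, lam)]

def supply_temperature(d: List[List[float]], p_node: List[float], t_critical: float) -> float:
    """t_sup = min_i (t_critical - (D P_node)_i)."""
    rise = [sum(d_ij * p_j for d_ij, p_j in zip(row, p_node)) for row in d]
    return min(t_critical - r for r in rise)

def total_energy(y: List[int], d: List[List[float]], p_using: float,
                 p_idle: float, t_critical: float) -> float:
    lam = [1 if y_i > 0 else 0 for y_i in y]   # assumption: active iff any disk is active
    p = node_power(y, lam, p_using, p_idle)
    t_sup = supply_temperature(d, p, t_critical)
    it_energy = sum(p)
    return it_energy * (1 + 1 / cop(t_sup))    # assumption: cooling = node energy / COP

D = [[0.10, 0.00], [0.05, 0.10]]               # hypothetical recirculation coefficients
print(node_power([2, 0], [1, 0], p_using=5.0, p_idle=10.0))  # [20.0, 0.0]
print(total_energy([2, 0], D, p_using=5.0, p_idle=10.0, t_critical=30.0))
```

A callable such as `lambda y: total_energy(y, D, 5.0, 10.0, 30.0)` is the kind of objective the greedy disk ordering would evaluate for each candidate activation.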

Compared with the prior art, the invention has the following advantages and effects:

(1) The invention models the data center with an airflow organization model, which accounts jointly for storage energy consumption, cooling temperature, and cooling energy consumption, and makes full use of heat recirculation characteristics to reduce data center energy consumption.

(2) The invention follows a greedy strategy with low time complexity, so it can be used while an online storage data center is running.

(3) The invention predicts access requests, which allows hot and cold data replicas to be classified more accurately, and further reduces data center energy consumption through replica migration and placement.

Description of Drawings

Fig. 1 is a flowchart of the heat-aware disk sequence calculation method proposed by the invention;

Fig. 2 is a schematic diagram of the active replica area and the redundant replica area of the invention;

Fig. 3 is a flowchart of the optimized heat-aware disk sequence calculation method proposed by the invention.

Detailed Description

To make the purposes, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.

Embodiment

This embodiment discloses a heat-aware energy-saving data replica placement method for data centers, comprising the following steps:

S1. Construct an energy consumption model for the storage data center according to the airflow organization characteristics and heat recirculation of the data center.

Each chassis on each rack of the storage data center is treated as a node, so the storage data center is divided into a number of nodes. Each node contains several disks, and the disks in a node share one node power supply. The heat recirculation coefficient matrix between nodes is calculated according to the airflow organization characteristics and heat recirculation of the data center.

Disks are classified by service state as off, sleeping, or active. Each state has a different energy consumption, and the disks in their various states together make up the node disk energy consumption. Requests(s) is defined as a data access request for data set s received by the storage data center, where s is the number of the data set the request asks to access; the disk holding the replica of data set s must then be in the active state.

From the node heat recirculation coefficient matrix and the node disk energy consumption model, which depends on the number of active disks in each node, the data center cooling energy consumption model and the total energy consumption model can be obtained.

Assume the storage data center has n nodes, and let the i-th component of $P_{node}$ be the energy consumed by the i-th node. $\lambda_i$ is a binary variable (0 or 1) indicating whether the i-th node is active, with 0 meaning no and 1 meaning yes, and $\lambda$ is the vector of the $\lambda_i$. Y is the vector of the numbers of active disks in each node, determined as a function of the data access requests and the replica table. $P_{using}$ is the additional energy consumed each time one more disk in a node becomes active, and $P_{idle}$ is the total energy consumed by a node when all of its disks are sleeping. COP is the coefficient of performance of the cooling equipment, and $t_{sup}$ is the supply temperature of the cooling equipment. The temperature threshold $t_{critical}$ is the temperature below which a node's disks must stay in order to provide data access services. The heat recirculation matrix D relates the coefficients of mutual thermal influence between nodes to the node energy consumption. With the objective of minimizing energy consumption, the total energy consumption model is then expressed by the following relations:

$$Y = f(ReplicaTable, Requests(s))$$

$$P_{node} = P_{using} Y + P_{idle} \lambda$$

$$t_{sup} = \min(t_{critical} - D P_{node})$$
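As a worked illustration of why the supply temperature matters, the COP polynomial stated in the summary of the invention can be evaluated at two supply temperatures. The sample temperatures (20 and 25, assumed to be in degrees Celsius) and the relation that cooling energy equals node energy divided by COP are assumptions made for this illustration; the text does not state them.

```latex
\begin{align*}
COP(20) &= 0.0068 \cdot 20^2 + 0.0008 \cdot 20 + 0.4580 = 3.194 \\
COP(25) &= 0.0068 \cdot 25^2 + 0.0008 \cdot 25 + 0.4580 = 4.728 \\
\frac{1}{COP(20)} &\approx 0.313, \qquad \frac{1}{COP(25)} \approx 0.212
\end{align*}
```

Under these assumptions, each unit of node energy would cost about 0.31 units of cooling energy at the lower supply temperature and about 0.21 at the higher one, which is consistent with the method preferring disks on nodes whose activation forces the smallest reduction of $t_{sup}$.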

S2. With the lowest total energy consumption as the objective, generate the heat-aware disk sequence DS, and divide DS into an active replica area and a redundant replica area.

As shown in Fig. 1, the flow for generating the heat-aware disk sequence is as follows:

(1) Set the number of nodes n and the number of disks m contained in each node; the node number i is traversed starting from 1;

(2) If the number of active disks Y_i of node i is less than the number of disks m in the node, increment Y_i by 1; otherwise increment the node number i by 1 and repeat step (2);

(3) Establish the data center total energy consumption model and compute the total energy consumption under the current disk allocation. If the total energy consumption is the minimum, record disk number j of node i in the disk sequence DS, with disk numbers selected in ascending order; otherwise decrement Y_i by 1, increment the node number i by 1, and go to step (2);

(4) Determine whether the disk sequence DS is complete. If so, output DS; otherwise go to step (1).

Fig. 2 is a schematic diagram of the division into the active replica area and the redundant replica area.

The disk sequence DS is divided into an active replica area and a redundant replica area according to the ratio of active replicas to redundant replicas. According to the configured ratio of the number of active replicas, active, to the number of redundant replicas, redundant, the first disks of DS, up to a count whose expression is not reproduced in this text, form the active replica area, and the remaining disks form the redundant replica area.

S3. For the disks contained in the active replica area and in the redundant replica area respectively, use the storage data center energy consumption model again, with the lowest total energy consumption as the objective, to generate the optimized disk sequence DS_new.

Fig. 3 shows the flow for generating the optimized heat-aware disk sequence:

(1) Take the disk sequence DS obtained in step S2, and define the set of disk numbers in the active replica area as S_Active and the set of disk numbers in the redundant replica area as S_Redundant; the disk index p over S_Active is traversed starting from 1;

(2) Determine whether the disk number at index p is already active. If so, increment p by 1 and repeat step (2); otherwise establish the data center total energy consumption model;

(3) Compute the total energy consumption under the current disk allocation. If it is the minimum, record the disk number at index p in the active replica area of the optimized disk sequence DS_new, increment p by 1, and go to step (2);

(4) Determine whether the number of disks switched to active in this round equals active. If so, start traversing S_Redundant with disk index q from 1; otherwise increment p by 1 and go to step (2);

(5) Determine whether the disk number at index q is already active. If so, increment q by 1 and repeat step (5); otherwise establish the data center total energy consumption model;

(6) Compute the total energy consumption under the current disk allocation. If it is the minimum, record the disk number at index q in the redundant replica area of the optimized disk sequence DS_new, increment q by 1, and go to step (5);

(7) Determine whether the number of disks switched to active in this round equals redundant. If so, go to step (8); otherwise increment q by 1 and go to step (5);

(8) If the optimized disk sequence DS_new is complete, output DS_new; otherwise go to step (2).

S4. Define and initialize the replica table ReplicaTable, then use ReplicaTable to manage replicas, placing the replicas sequentially in the active replica area and the redundant replica area in the order of the optimized disk sequence DS_new. The process of defining and initializing ReplicaTable is as follows:

Initialize ReplicaTable. In response to data access requests, create active − 1 active replicas and redundant redundant replicas of the data. Following the order of disk numbers in DS_new, place the active replicas in the active replica area and the redundant replicas in the redundant replica area, and write each replica's location into the replica table.

The replica location is written into the replica table as follows:

$$ReplicaTable_{s,k} = j$$

S5. Collect replica access statistics over multiple periods, classify replicas as hot or cold, migrate and place the hot and cold replicas accordingly, and update ReplicaTable at the same time. The process is as follows:

(1) Record and collect replica access statistics over multiple periods, including the access frequency and access time of each data replica;

(2) Compare the access frequencies of the data sets in each period and sort them in descending order. The data sets ranked in the top 20% are defined as hot data, and their replicas as hot replicas; the remaining data sets are cold data, and their replicas are cold replicas;

The access frequency of a data set in the current period is the sum of the access frequencies of all its replicas in that period.

(3) Use double exponential smoothing to predict the data set access frequencies for the next D periods, and reclassify the hot and cold replicas;

(4) Migrate the newly defined hot replicas in the active replica area of the optimized disk sequence DS_new to the first 20% of the disks in that area, and migrate the newly defined cold replicas in that area to its remaining disks;

(5) Migrate the newly defined hot replicas in the redundant replica area of the optimized disk sequence DS_new to the first 20% of the disks in that area, and migrate the newly defined cold replicas in that area to its remaining disks. This completes the migration and placement of hot and cold replicas; ReplicaTable is updated at the same time.
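The migration in steps (4) and (5) can be sketched for a single area as follows. The text fixes only the split (hot replicas on the first 20% of the area's disks, cold replicas on the rest); the round-robin spreading within each group, the rounding of the 20% cutoff, the assumption that the area is non-empty, and the name `plan_area_migration` are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

def plan_area_migration(
    area_disks: List[int],
    replicas: List[Tuple[int, int]],
    hot: Set[int],
    hot_percent: int = 20,
) -> Dict[Tuple[int, int], int]:
    """Map each replica (s, k) in one area to its target disk number.

    area_disks: non-empty list of disk numbers in DS_new order.
    hot: data set numbers currently classified as hot.
    """
    n_hot = max(1, (len(area_disks) * hot_percent + 99) // 100)
    hot_disks = area_disks[:n_hot]
    cold_disks = area_disks[n_hot:] or hot_disks  # tiny area: share the disks
    plan: Dict[Tuple[int, int], int] = {}
    hot_i = cold_i = 0
    for s, k in replicas:
        if s in hot:
            plan[(s, k)] = hot_disks[hot_i % len(hot_disks)]
            hot_i += 1
        else:
            plan[(s, k)] = cold_disks[cold_i % len(cold_disks)]
            cold_i += 1
    return plan

plan = plan_area_migration([7, 3, 9, 1, 4], [(1, 1), (2, 1), (3, 1)], hot={2})
print(plan)  # {(1, 1): 3, (2, 1): 7, (3, 1): 9}
```

The returned mapping has the same shape as the replica table, so applying the plan amounts to moving each replica and overwriting its ReplicaTable entry.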

In summary, this embodiment models the data center with an airflow organization model, which accounts jointly for storage energy consumption, cooling temperature, and cooling energy consumption, and makes full use of heat recirculation characteristics to reduce data center energy consumption. The heat-aware disk sequence is generated with a greedy strategy of low time complexity, so the method can be used while an online storage data center is running. Predicting access requests allows hot and cold data replicas to be classified more accurately, and replica migration and placement further reduce data center energy consumption.

The above embodiment is a preferred embodiment of the invention, but the embodiments of the invention are not limited to it. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the invention is an equivalent replacement and is included within the protection scope of the invention.

Claims

1. A data center energy-saving data copy placement method based on heat perception, characterized by comprising the following steps:

S1, constructing an energy consumption model for the storage data center according to the airflow organization characteristics and the heat recirculation of the data center;

S2, generating a heat-aware disk sequence DS with the lowest total energy consumption as the target, and dividing the disk sequence DS into an active copy area and a redundant copy area, wherein a copy is a data set copy, and a data set is a single data block or a set of a plurality of data blocks;

S3, using the energy consumption model of the storage data center again, with the lowest total energy consumption as the target, generating an optimized disk sequence DS_new for the disks contained in the active copy area and in the redundant copy area respectively; the process is as follows:

S301, defining the set of disk numbers contained in the active copy area as S_Active, and the set of disk numbers contained in the redundant copy area as S_Redundant;

S302, establishing the energy consumption model and, with the aim of minimizing total energy consumption, traversing the set S_Active, selecting the first active disk numbers to be turned on and to hold copies, and recording them in the active copy area of the optimized disk sequence DS_new, wherein active is the number of active copies;

S303, establishing the energy consumption model and, with the aim of minimizing total energy consumption, traversing the set S_Redundant, selecting the next redundant disk numbers to be turned on and to hold copies, and recording them in the optimized disk sequence DS_new, wherein redundant is the number of redundant copies;

S304, fixing the disk numbers already recorded in the disk sequence DS_new, and repeating steps S302 and S303 until a complete disk sequence DS_new is obtained;

S4, defining and initializing the copy table ReplicaTable, then managing the copies by means of the copy table ReplicaTable, and placing the copies in order into the active copy area and the redundant copy area respectively according to the optimized disk sequence DS_new;

S5, collecting statistics on copy accesses over a plurality of periods, dividing the copies into hot and cold copies, migrating and placing the hot and cold copies, and updating the copy table ReplicaTable.

2. The data center energy-saving data copy placement method based on heat perception according to claim 1, wherein in step S1 the energy consumption model for the storage data center is constructed according to the airflow organization characteristics and the heat recirculation of the data center as follows:

each chassis on each rack of the storage data center is regarded as a node, so that the storage data center is divided into a plurality of nodes; each node contains a plurality of disks, and the disks in each node share the node power supply; according to the airflow organization characteristics and the heat recirculation of the data center, a node disk energy consumption model determined by the number of disks in the active state in each node is obtained; and a data center cooling energy consumption model and a total energy consumption model are obtained from the inter-node heat circulation coefficient matrix and the node disk energy consumption.

3. The data center energy-saving data copy placement method based on heat perception according to claim 1, wherein a heat circulation coefficient matrix between nodes is calculated according to the airflow organization characteristics and the heat recirculation of the data center.

4. The data center energy-saving data copy placement method based on heat perception according to claim 2, wherein, in the node disk energy consumption model determined by the number of disks in different states in a node, the disks are classified by service state into an off state, a dormant state and an active state; each state produces a different energy consumption, and the disks in the different states together make up the node disk energy consumption; in addition, Requsets(s) is defined as the data access requests for data set s received by the storage data center, where s is the number of the data set that the requests ask to access, and the disk holding a copy of data set s then needs to be in the active state.

5. The data center energy-saving data copy placement method based on heat perception according to claim 1, wherein in step S2 the heat-aware disk sequence DS is generated with the lowest total energy consumption as the target and divided into an active copy area and a redundant copy area as follows:

S201, following the idea of a greedy algorithm, establishing the energy consumption model and traversing all nodes with the aim of minimizing total energy consumption, selecting a node and a disk number to be turned on and to hold a copy, and recording the disk number in the disk sequence DS, wherein disk numbers are selected in ascending order and disk numbers already in the disk sequence DS are not traversed again;

S202, establishing the energy consumption model and, with the aim of minimizing total energy consumption, selecting the next node and disk number to be turned on and to hold a copy, and recording the disk number in the disk sequence DS;

S203, fixing the disk numbers already recorded in the disk sequence DS, and repeating step S202 until a complete disk sequence DS is obtained;

S204, dividing the disk sequence DS into an active copy area and a redundant copy area according to the ratio of active copies to redundant copies.
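The greedy construction in steps S201 to S203 can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method itself: `greedy_disk_sequence` and `toy_total_energy` are hypothetical names, disks are identified by a (node, local disk index) pair, and the toy cost function with a per-node cooling penalty merely stands in for the airflow-based energy consumption model.

```python
def greedy_disk_sequence(disks_per_node, total_energy):
    """Build a disk sequence by always turning on the disk that adds the least energy.

    disks_per_node -- number of disks in each node
    total_energy   -- callable mapping a list of per-node active disk counts
                      to a total energy value (stand-in for the patent's model)
    """
    active = [0] * len(disks_per_node)
    sequence = []
    while len(sequence) < sum(disks_per_node):
        best_node, best_cost = None, None
        for node, capacity in enumerate(disks_per_node):
            if active[node] == capacity:
                continue  # every disk of this node is already in the sequence
            active[node] += 1            # tentatively turn on the next disk
            cost = total_energy(active)
            active[node] -= 1
            if best_cost is None or cost < best_cost:
                best_node, best_cost = node, cost
        # Within a node, disks are taken in ascending order of their local index.
        sequence.append((best_node, active[best_node]))
        active[best_node] += 1           # fix the choice and continue
    return sequence


def toy_total_energy(active, p_using=8.0, p_idle=40.0, penalty=(1.0, 1.5)):
    """Hypothetical cost: a node draws power only when on, scaled by a cooling penalty."""
    return sum((p_idle + p_using * y) * penalty[i]
               for i, y in enumerate(active) if y > 0)
```

With two nodes of two disks each, `greedy_disk_sequence([2, 2], toy_total_energy)` fills the node with the smaller penalty first, which mirrors the intent of gathering copies on nodes that contribute less to heat recirculation.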

6. The method as claimed in claim 5, wherein in step S204, according to the set ratio of the number of active copies active to the number of redundant copies redundant, the first $\lceil d \cdot \frac{active}{active + redundant} \rceil$ disks of the disk sequence DS constitute the active copy area and the remaining disks constitute the redundant copy area, where d is the total number of disks in the disk sequence DS and $\lceil \cdot \rceil$ denotes rounding up.
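The division of the disk sequence can be illustrated with a short sketch. The split point used here, the ceiling of d · active / (active + redundant), is an assumption inferred from the stated ratio, the total disk count d, and the rounding-up note; the function name `split_disk_sequence` is hypothetical.

```python
def split_disk_sequence(ds, active, redundant):
    """Split DS into (active copy area, redundant copy area).

    ds        -- the full disk sequence, in generation order
    active    -- number of active copies per data set
    redundant -- number of redundant copies per data set
    """
    total = active + redundant
    # Integer ceiling of d * active / (active + redundant); assumed split point.
    n_active = (len(ds) * active + total - 1) // total
    return ds[:n_active], ds[n_active:]
```

For example, with 10 disks and a 1:2 ratio of active to redundant copies, the first 4 disks would form the active copy area and the remaining 6 the redundant copy area.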

7. The data center energy-saving data copy placement method based on heat perception according to claim 1, wherein the process of defining and initializing the copy table ReplicaTable in step S4 is as follows:

first, the copy table ReplicaTable is initialized; according to the access requests for the data, active-1 active copies and redundant redundant copies of the data are backed up, where active is the number of active copies and redundant is the number of redundant copies; according to the disk sequence DS_new, the active copies are placed in the active copy area and the redundant copies are placed in the redundant copy area, and the copy positions are written into the copy table ReplicaTable in the following form:

$ReplicaTable_{s,k} = j$

wherein s is the number of the data set, k denotes the k-th copy of the data set, and j is the number of the disk where the copy is located; $ReplicaTable_{s,k} = j$ indicates that the k-th copy of data set s is stored on disk j.
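One possible in-memory form of the copy table is a dictionary keyed by (s, k). The sketch below is an assumption-laden illustration: the name `init_replica_table`, the 1-based copy index, numbering active copies before redundant ones, and wrapping around when an area has fewer disks than copies are not specified by the claim.

```python
def init_replica_table(data_sets, active_area, redundant_area, active, redundant):
    """Return {(s, k): j}: the k-th copy of data set s is stored on disk j.

    active_area / redundant_area -- disk numbers of each area, in DS_new order
    active / redundant           -- copies of each kind kept per data set
    """
    table = {}
    a_pos = r_pos = 0
    for s in data_sets:
        for k in range(1, active + redundant + 1):
            if k <= active:
                # Active copies go to the active copy area, in sequence order.
                table[(s, k)] = active_area[a_pos % len(active_area)]
                a_pos += 1
            else:
                # Remaining copies go to the redundant copy area.
                table[(s, k)] = redundant_area[r_pos % len(redundant_area)]
                r_pos += 1
    return table
```

A lookup such as `table[(s, k)]` then returns the disk number j that must be active to serve a request for that copy.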

8. The data center energy-saving data copy placement method based on heat perception according to claim 1, wherein step S5 proceeds as follows:

S501, recording and collecting statistics on copy accesses over a plurality of periods, wherein the copy access statistics comprise the access frequency and access time of the data copies;

S502, comparing the access frequency of each data set in each period and sorting the data sets in descending order of access frequency; the data sets ranked in the top 20% are defined as hot data, and their copies as hot copies; the remaining data sets are cold data, and their copies are cold copies, wherein the access count of a data set in the current period is the sum of the access counts of all copies of that data set in the current period;

S503, predicting the data set access frequency of the next D periods by double exponential smoothing, and redefining the hot and cold copies;

S504, migrating the newly defined hot copies contained in the active copy area of the optimized disk sequence DS_new to the first 20% of the disks in that area, and migrating the newly defined cold copies contained in the active copy area of DS_new to the remaining disks in that area;

S505, migrating the newly defined hot copies contained in the redundant copy area of the optimized disk sequence DS_new to the first 20% of the disks in that area, and migrating the newly defined cold copies contained in the redundant copy area of DS_new to the remaining disks in that area, thereby completing the migration and placement of the hot and cold copies, and updating the copy table ReplicaTable.
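The prediction and reclassification in steps S502 and S503 can be sketched as below. The claim names double exponential smoothing without giving its form; this sketch assumes Brown's linear variant with smoothing factor `alpha` and both smoothed series initialized to the first observation. The function names, the default `alpha=0.5`, and ranking data sets by the sum of their forecasts are illustrative assumptions.

```python
def forecast_double_exponential(history, horizon, alpha=0.5):
    """Forecast the next `horizon` periods with Brown's double exponential smoothing."""
    s1 = s2 = float(history[0])          # assumed initialization
    for x in history[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2   # second smoothing pass
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return [level + trend * m for m in range(1, horizon + 1)]


def classify_hot(histories, horizon, hot_percent=20, alpha=0.5):
    """Return the data sets whose predicted access frequency ranks in the top hot_percent %.

    histories -- {data_set: [access count per past period, ...]}
    """
    predicted = {s: sum(forecast_double_exponential(h, horizon, alpha))
                 for s, h in histories.items()}
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    n_hot = max(1, (len(ranked) * hot_percent + 99) // 100)  # integer ceiling
    return set(ranked[:n_hot])
```

The resulting set of hot data sets would then drive the migration of steps S504 and S505. Note that a strongly falling series can yield negative forecasts, so a production version might clip them at zero.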

9. The method for placing the energy-saving data copy of the data center based on the heat perception as claimed in claim 1, wherein the energy consumption model is established with the goal of minimizing the total energy consumption as follows:

$$Y = f(ReplicaTable, Requsets(s))$$

wherein,

$$P_{node} = P_{using} Y + P_{idle} \lambda$$

$$COP = 0.0068\, t_{sup}^{2} + 0.0008\, t_{sup} + 0.4580$$

$$t_{sup} = \min(t_{critical} - D P_{node})$$

wherein the storage data center is assumed to have n nodes; $P_{node,i}$ denotes the energy consumption generated by the i-th node; $\lambda_i$ is a binary variable taking the value 0 or 1 that indicates whether the i-th node is in the active state (0 for no, 1 for yes); $Y$ is the vector consisting of the number of disks in the active state in each node, and is determined by the data access requests Requsets(s) and the copy table ReplicaTable; $P_{node}$ is the vector formed by the $P_{node,i}$; $P_{using}$ is the additional energy consumption generated by each additional disk of a single node that is switched to the active state; $P_{idle}$ is the sum of the energy consumption of all disks of a node in the dormant state; $\lambda$ is the vector formed by the $\lambda_i$; $COP$ is the coefficient of performance of the cooling equipment; $t_{sup}$ is the supply temperature of the cooling equipment; the temperature warning value $t_{critical}$ indicates that node disks must be below this temperature to provide data access service; and the heat circulation matrix $D$ represents the relationship between the inter-node thermal interaction coefficients and node energy consumption.
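The relations above can be evaluated numerically as in the following sketch. The node power, supply temperature, and COP expressions follow the formulas of this claim; the final step, which takes cooling power as storage power divided by COP and adds it to storage power, is an assumption based on the usual definition of the coefficient of performance, because the total-energy expression itself is not reproduced here. All function names and sample values are hypothetical.

```python
def node_power(y, lam, p_using, p_idle):
    """P_node = P_using * Y + P_idle * lambda, evaluated per node."""
    return [p_using * yi + p_idle * li for yi, li in zip(y, lam)]


def supply_temperature(p_node, d_matrix, t_critical):
    """t_sup = min(t_critical - D * P_node), the minimum taken over all nodes."""
    rises = [sum(dij * pj for dij, pj in zip(row, p_node)) for row in d_matrix]
    return min(t_critical - rise for rise in rises)


def cop(t_sup):
    """Coefficient of performance of the cooling equipment at supply temperature t_sup."""
    return 0.0068 * t_sup ** 2 + 0.0008 * t_sup + 0.4580


def total_power(y, lam, p_using, p_idle, d_matrix, t_critical):
    """Storage power plus cooling power (cooling = storage / COP is an assumption)."""
    p_node = node_power(y, lam, p_using, p_idle)
    storage = sum(p_node)
    t_sup = supply_temperature(p_node, d_matrix, t_critical)
    return storage + storage / cop(t_sup)
```

Because a higher supply temperature gives a larger COP, placements that keep the temperature rise D·P_node small allow warmer supply air and therefore less cooling power for the same storage load.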