[go: up one dir, main page]

CN111400026B - A distributed load balancing method based on master-slave backup technology - Google Patents

A distributed load balancing method based on master-slave backup technology Download PDF

Info

Publication number
CN111400026B
CN111400026B CN201911119106.2A CN201911119106A CN111400026B CN 111400026 B CN111400026 B CN 111400026B CN 201911119106 A CN201911119106 A CN 201911119106A CN 111400026 B CN111400026 B CN 111400026B
Authority
CN
China
Prior art keywords
task
tasks
node
nodes
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911119106.2A
Other languages
Chinese (zh)
Other versions
CN111400026A (en
Inventor
谢在鹏
李博文
张基
朱晓瑞
徐媛媛
叶保留
毛莺池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201911119106.2A priority Critical patent/CN111400026B/en
Publication of CN111400026A publication Critical patent/CN111400026A/en
Application granted granted Critical
Publication of CN111400026B publication Critical patent/CN111400026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12Simultaneous equations, e.g. systems of linear equations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Multi Processors (AREA)

Abstract

本发明公开了一种主从备份技术的分布式负载均衡方法,包括步骤如下:S1划分节点集合;S2划分任务集合;S3分发子任务;S4记录执行时间;S5计算执行效率;S6求解分配方案;通过将任务分配问题建模为线性规划问题,通过对抽象模型的分析,实现了对任务分配可行域的求解。同时结合实际问题,提出一种带备份的任务分配和动态调整方法,该方法旨在提供一种能在有备份的分布式集群上实现按效能的任务分配,同时尽可能降低节点之间的显在性能关联性。能够实现在提高主节点的任务量时,尽量不提高备份节点的额外负担,进而可以缓解节点间存在的性能差异对整体任务运行时间的影响。The invention discloses a distributed load balancing method of a master-slave backup technology, comprising the following steps: S1 divides node sets; S2 divides task sets; S3 distributes subtasks; S4 records execution time; S5 calculates execution efficiency; S6 solves distribution schemes ; By modeling the task allocation problem as a linear programming problem, and through the analysis of the abstract model, the solution to the task allocation feasible domain is realized. At the same time, combined with practical problems, a task allocation and dynamic adjustment method with backup is proposed. in performance relevance. It can be realized that when increasing the task load of the master node, the additional burden of the backup node should not be increased as far as possible, thereby alleviating the impact of performance differences between nodes on the overall task running time.

Description

一种基于主从备份技术的分布式负载均衡方法A distributed load balancing method based on master-slave backup technology

技术领域technical field

本发明属于计算机通信领域,尤其涉及一种基于主从备份技术的分布式负载均衡方法。The invention belongs to the field of computer communication, in particular to a distributed load balancing method based on master-slave backup technology.

背景技术Background technique

在分布式系统中,容忍进程失败的关键方法是把多个同样的进程放到一个组中,当一个信息发送到组本身进行处理时,所有的成员都接受并处理它。通过这种方式,如果组中的一个进程失败,其他的一些进程可以接管它。在需要容错的情况下,通常使用进程复制的方法,当主进程崩溃时,备份进程替代当前主进程的任务;在需要提高效能的情况下,通常使用复制和缓存扩展以及冗余编码来复制主进程形成进程组,由于冗余的存在可以使得任务更快回收(基于MDS码的任务冗余部分快速回收),可以使得任务所需通讯量减少(基于CDC编码的通信负载降低)。In a distributed system, the key way to tolerate process failure is to put multiple identical processes into a group. When a message is sent to the group itself for processing, all members accept and process it. This way, if one process in the group fails, some other process can take over for it. In the case of fault tolerance, the method of process replication is usually used. When the main process crashes, the backup process replaces the task of the current main process; in the case of needing to improve performance, replication and cache expansion and redundant coding are usually used to copy the main process. Forming a process group, due to the existence of redundancy, tasks can be recycled faster (fast recovery of redundant parts of tasks based on MDS codes), which can reduce the amount of communication required by tasks (reduction of communication load based on CDC codes).

通常的进程备份方案将主节点上运行的主进程完全复制形成备份进程,并放置到备份节点上。由于节点之间存在全备份关系,主节点和备份节点之间需要保证强一致性,增加主节点的任务量同时也会相应的增加备份节点的任务量。当组中主节点与其备份节点存在性能差异时,提高主节点的任务量可能会导致主节点和备份节点之间存在不可忽视的运行时间差异,影响备份的实时性。In a common process backup scheme, the master process running on the master node is completely copied to form a backup process, and placed on the backup node. Since there is a full backup relationship between nodes, strong consistency needs to be ensured between the master node and the backup node. Increasing the task load of the master node will also increase the task load of the backup node accordingly. When there is a performance difference between the primary node and its backup node in the group, increasing the task load of the primary node may cause a non-negligible difference in running time between the primary node and the backup node, affecting the real-time performance of the backup.

发明内容Contents of the invention

发明目的:为了将上述任务分配问题建模为线性规划问题,通过对抽象模型的分析,实现对任务分配可行域的求解,同时结合实际问题,提出一种带备份的任务分配和动态调整方法。Purpose of the invention: In order to model the above task allocation problem as a linear programming problem, through the analysis of the abstract model, realize the solution to the task allocation feasible domain, and combine with practical problems, propose a task allocation and dynamic adjustment method with backup.

技术方案:本发明提供一种基于主从备份技术的分布式负载均衡方法,包括如下具体步骤:Technical solution: the present invention provides a distributed load balancing method based on master-slave backup technology, including the following specific steps:

(1)根据构建的有节点分布式集群,划分节点集合;(1) According to the constructed distributed cluster with nodes, divide the node set;

(2)在集群上运行设置备份等级的任务,将同一批次所有任务划分任务集合;(2) Run the task of setting the backup level on the cluster, and divide all tasks of the same batch into task sets;

(3)依据步骤(1)和(2)得到的节点集合和任务集合,将任务集合包含的任务分发至对应节点集合;(3) According to the node set and task set obtained in steps (1) and (2), distribute the tasks contained in the task set to the corresponding node set;

(4)待所有节点完成其收到的任务,收集并记录执行时间;(4) After all nodes complete the tasks they receive, collect and record the execution time;

(5)获取每个节点的计算效率,归一化计算执行效率;(5) Obtain the computing efficiency of each node, and normalize the computing execution efficiency;

(6)通过已知常量建立方程组,变形转换为线性规划问题,进而求解分配方案。(6) Establish a system of equations through known constants, transform it into a linear programming problem, and then solve the allocation scheme.

进一步地,步骤(1)中所述划分节点集合方法包括:Further, the method for dividing node sets described in step (1) includes:

(1.1)构建一个n个节点分布式集群为

Figure BDA0002274924800000021
(1.1) Construct a distributed cluster of n nodes as
Figure BDA0002274924800000021

(1.2)对n个节点进行排列组合,取任意r个节点组成一组子集合

Figure BDA0002274924800000022
将全部的可能的组合
Figure BDA0002274924800000023
组成集合
Figure BDA0002274924800000024
集合
Figure BDA0002274924800000025
Figure BDA0002274924800000026
个元素,
Figure BDA0002274924800000027
初始化
Figure BDA0002274924800000028
(1.2) Arrange and combine n nodes, and take any r nodes to form a set of subsets
Figure BDA0002274924800000022
combine all possible combinations
Figure BDA0002274924800000023
Form a set
Figure BDA0002274924800000024
gather
Figure BDA0002274924800000025
have
Figure BDA0002274924800000026
elements,
Figure BDA0002274924800000027
initialization
Figure BDA0002274924800000028

进一步地,步骤(2)中所述划分任务集合方法包括:Further, the method for dividing task sets described in step (2) includes:

(2.1)在步骤(1)构建的集群上运行F个均等任务量的任务,同时设置备份等级为r;(2.1) Run F tasks with an equal amount of tasks on the cluster built in step (1), and set the backup level to r at the same time;

(2.2)整个任务集合F划分为小批次,每一批次执行一次任务分配与运行统计,系统运行时段记作t,批次记作Fj,每个批次Fj的计算时间记作Δt;每个批次Fj划分为更小批量的小任务集合,记作

Figure BDA0002274924800000029
Figure BDA00022749248000000210
的大小记作
Figure BDA00022749248000000211
(2.2) The entire task set F is divided into small batches, each batch executes task allocation and operation statistics once, the system running period is denoted as t, the batch is denoted as F j , and the calculation time of each batch F j is denoted as Δt; each batch F j is divided into smaller batches of small task sets, denoted as
Figure BDA0002274924800000029
Figure BDA00022749248000000210
the size of
Figure BDA00022749248000000211

(2.3)将同一个批次的所有任务Fj划分为

Figure BDA00022749248000000212
个任务集,在t时刻,以比例
Figure BDA00022749248000000213
划分批次任务Fj
Figure BDA00022749248000000214
个任务集合
Figure BDA00022749248000000215
集合
Figure BDA00022749248000000216
Figure BDA00022749248000000217
类似,集合
Figure BDA00022749248000000218
也有
Figure BDA00022749248000000219
个元素。(2.3) Divide all tasks F j of the same batch into
Figure BDA00022749248000000212
task set, at time t, in proportion
Figure BDA00022749248000000213
Divide the batch task F j as
Figure BDA00022749248000000214
set of tasks
Figure BDA00022749248000000215
gather
Figure BDA00022749248000000216
and
Figure BDA00022749248000000217
similar, set
Figure BDA00022749248000000218
also have
Figure BDA00022749248000000219
elements.

进一步地,步骤(3)中所述分发子任务方法包括:Further, the method for distributing subtasks described in step (3) includes:

(3.1)依据步骤(1)和(2),获取两个有

Figure BDA00022749248000000220
个元素的集合
Figure BDA00022749248000000221
Figure BDA00022749248000000222
每次从两个集合中取元素
Figure BDA00022749248000000223
Figure BDA00022749248000000224
其中
Figure BDA00022749248000000225
是一个任务集合,
Figure BDA00022749248000000226
是一组节点组成的集合;(3.1) According to steps (1) and (2), two valid
Figure BDA00022749248000000220
collection of elements
Figure BDA00022749248000000221
and
Figure BDA00022749248000000222
Take elements from two collections at a time
Figure BDA00022749248000000223
and
Figure BDA00022749248000000224
in
Figure BDA00022749248000000225
is a set of tasks,
Figure BDA00022749248000000226
is a set of nodes;

(3.2)依次将

Figure BDA00022749248000000227
包含的任务发送到
Figure BDA00022749248000000228
表示的节点上,重复上述过程直到每一个任务集
Figure BDA00022749248000000229
中的任务都被发送给了对应的
Figure BDA00022749248000000230
中所有的节点上;同时每一个节点上存在的待处理任务个数一共为
Figure BDA00022749248000000231
每个任务集被拷贝并发送到r个不同的节点上。(3.2) in turn
Figure BDA00022749248000000227
Contains tasks sent to
Figure BDA00022749248000000228
On the node represented by , repeat the above process until each task set
Figure BDA00022749248000000229
The tasks in are sent to the corresponding
Figure BDA00022749248000000230
On all the nodes in ; at the same time, the total number of tasks to be processed on each node is
Figure BDA00022749248000000231
Each task set is copied and sent to r different nodes.

进一步地,步骤(4)中记录执行时间过程还包括:Further, the recording execution time process in step (4) also includes:

(4.1)等待所有节点完成其收到的任务,执行冗余校验或其它任务。收集每个节点Nq的任务执行时间,记作Δtq(4.1) Wait for all nodes to complete the tasks they received, perform redundancy check or other tasks. Collect the task execution time of each node N q , denoted as Δt q ;

(4.2)将当前时刻记作t,节点Nq的当前计算效率记作λq(t),节点Nq上拥有的任务集

Figure BDA00022749248000000232
的大小记作
Figure BDA00022749248000000233
(4.2) Denote the current moment as t, the current computing efficiency of node N q as λ q (t), and the task set owned by node N q
Figure BDA00022749248000000232
the size of
Figure BDA00022749248000000233

(4.3)采用公式(1)进行计算。(4.3) Use formula (1) to calculate.

Figure BDA00022749248000000234
Figure BDA00022749248000000234

每个节点独立计算自己的计算效率λq(t),并将其发送给其他所有节点。Each node independently calculates its own computational efficiency λ q (t) and sends it to all other nodes.

进一步地,步骤(5)中计算执行效率方法包括:Further, the method for calculating execution efficiency in step (5) includes:

(5.1)获取到每个节点所有的λq(t),使用公式(2)对λq(t)进行归一化;(5.1) Obtain all λ q (t) of each node, and use formula (2) to normalize λ q (t);

Figure BDA0002274924800000031
Figure BDA0002274924800000031

定义

Figure BDA0002274924800000032
和e(t):
Figure BDA0002274924800000033
definition
Figure BDA0002274924800000032
and e(t):
Figure BDA0002274924800000033

e(t)=(e0(t),e1(t),…,en-1(t));e(t)=(e 0 (t), e 1 (t),..., e n-1 (t));

(5.2)将分发矩阵记作

Figure BDA0002274924800000034
令A为系数矩阵,
Figure BDA0002274924800000035
为变量,e(t)为常数项,可得非齐次线性方程组(3)。(5.2) Write the distribution matrix as
Figure BDA0002274924800000034
Let A be the coefficient matrix,
Figure BDA0002274924800000035
is a variable, and e(t) is a constant item, the non-homogeneous linear equations (3) can be obtained.

Figure BDA0002274924800000036
Figure BDA0002274924800000036

其中,分发矩阵的行标代表节点下标,列标代表任务集下标,元素aq,i=1代表节点q拥有任务集i,元素aq,i=0代表节点q不拥有任务集iAmong them, the row index of the distribution matrix represents the node index, and the column index represents the task set index. The element a q, i = 1 means that the node q owns the task set i, and the element a q, i = 0 means that the node q does not own the task set i

进一步地,步骤(6)中求解分配方案具体包括:Further, solving the allocation scheme in step (6) specifically includes:

(7.1)当行数等于列数时,

Figure BDA0002274924800000037
方程组有唯一解
Figure BDA0002274924800000038
(7.1) When the number of rows is equal to the number of columns,
Figure BDA0002274924800000037
system of equations has a unique solution
Figure BDA0002274924800000038

(7.2)当方程组的行数大于列数时,

Figure BDA0002274924800000039
方程组无唯一解,可以变形为一个线性规划问题,过程如下:(7.2) When the number of rows of the equation system is greater than the number of columns,
Figure BDA0002274924800000039
The system of equations has no unique solution, so it can be transformed into a linear programming problem. The process is as follows:

(7.2.1)首先定义矩阵A的形式,其左半部分为

Figure BDA00022749248000000310
右半部分为
Figure BDA00022749248000000311
即:(7.2.1) First define the form of matrix A, the left half of which is
Figure BDA00022749248000000310
the right half is
Figure BDA00022749248000000311
Right now:

Figure BDA00022749248000000312
Figure BDA00022749248000000312

同理定义

Figure BDA00022749248000000313
的上半部分为
Figure BDA00022749248000000314
下半部分为
Figure BDA00022749248000000315
Same definition
Figure BDA00022749248000000313
The upper half of
Figure BDA00022749248000000314
The second half is
Figure BDA00022749248000000315

上述方程组改写为(公式4):The above equations are rewritten as (Formula 4):

Figure BDA00022749248000000316
Figure BDA00022749248000000316

求解得:Solved:

Figure BDA00022749248000000317
Figure BDA00022749248000000317

Figure BDA0002274924800000041
的每个维度都大于0,取
Figure BDA0002274924800000042
为自变量,将其转换为线性规划问题,如公式(6),其中
Figure BDA0002274924800000043
为向量
Figure BDA0002274924800000044
的第i个元素:like
Figure BDA0002274924800000041
Each dimension of is greater than 0, take
Figure BDA0002274924800000042
As an independent variable, convert it into a linear programming problem, such as formula (6), where
Figure BDA0002274924800000043
as a vector
Figure BDA0002274924800000044
The i-th element of :

Figure BDA0002274924800000045
Figure BDA0002274924800000045

Figure BDA0002274924800000046
Figure BDA0002274924800000046

求解得

Figure BDA0002274924800000047
Solved
Figure BDA0002274924800000047

(7.2.2)向量

Figure BDA0002274924800000048
中得每一个元素
Figure BDA0002274924800000049
Figure BDA00022749248000000410
对应一个任务集占总任务数目的比例,向量
Figure BDA00022749248000000411
对应的一组值即可作为一组分配方案。使用该分配方案依照比例划分每一批次的任务Fj,令t=t+1,进入下一个时间步,并执行步骤(2),重复任务分配与计算性能估计,直到所有任务执行完毕。(7.2.2) vector
Figure BDA0002274924800000048
get every element
Figure BDA0002274924800000049
Figure BDA00022749248000000410
Corresponding to the ratio of a task set to the total number of tasks, the vector
Figure BDA00022749248000000411
The corresponding set of values can be used as a set of allocation schemes. Use this allocation scheme to divide each batch of tasks F j in proportion, set t=t+1, enter the next time step, and perform step (2), repeat task allocation and calculation performance estimation until all tasks are executed.

有益效果:本发明与现有技术相比,其显著优点是:(1)提供一种能在有备份的分布式集群上实现按效能的任务分配,同时尽可能降低节点之间的显在性能关联性;(2)能够实现在提高主节点的任务量时,尽量不提高备份节点的额外负担,进而可以缓解节点间存在的性能差异对整体任务运行时间的影响。Beneficial effects: Compared with the prior art, the present invention has the remarkable advantages of: (1) providing a distributed cluster that can realize performance-based task allocation while reducing the apparent performance between nodes as much as possible; Relevance; (2) It can realize that when increasing the task load of the master node, the extra burden of the backup node should not be increased as far as possible, and then the impact of the performance difference between nodes on the overall task running time can be alleviated.

附图说明Description of drawings

图1一种基于主从备份技术的分布式负载均衡方法流程图;Fig. 1 is a flow chart of a distributed load balancing method based on master-slave backup technology;

图2可行域样例。Figure 2 Feasible region example.

具体实施方式:Detailed ways:

下面结合说明书附图对本发明的具体实施方式进行说明。The specific implementation manners of the present invention will be described below in conjunction with the accompanying drawings.

如图1具体描述实施例1:设一个拥有3个节点的分布式集群

Figure BDA00022749248000000412
Figure BDA00022749248000000413
如果要在
Figure BDA00022749248000000414
上执行4次循环计算,每次执行120个均等任务量的任务(任务之间无关联性,可以乱序执行且不相互影响)。设置每个任务在本集群上的冗余备份数目r=2(即每个任务要同时执行两次)。下文将依照上述步骤描述本方法的具体实施方式。Example 1 is described in detail in Figure 1: Set up a distributed cluster with 3 nodes
Figure BDA00022749248000000412
Figure BDA00022749248000000413
if you want to be in
Figure BDA00022749248000000414
Execute 4 loop calculations on the above, and each time execute 120 tasks with an equal amount of tasks (there is no correlation between tasks, they can be executed out of order and do not affect each other). The number of redundant backups of each task on the cluster is set to r=2 (that is, each task must be executed twice at the same time). The specific implementation of the method will be described below according to the above steps.

步骤1:对3个节点进行排列组合,得组合结果

Figure BDA00022749248000000415
Figure BDA00022749248000000416
初始化每批次任务集大小
Figure BDA00022749248000000417
Step 1: Arrange and combine the 3 nodes to get the combination result
Figure BDA00022749248000000415
Figure BDA00022749248000000416
Initialize the task set size for each batch
Figure BDA00022749248000000417

步骤2:按照每批次任务集大小

Figure BDA00022749248000000418
将每批次循环的任务进行划分,则一批次120个子任务将划分为三个集合
Figure BDA00022749248000000419
Figure BDA00022749248000000420
Step 2: According to the size of each batch of task sets
Figure BDA00022749248000000418
Divide the tasks of each batch cycle, then a batch of 120 subtasks will be divided into three sets
Figure BDA00022749248000000419
and
Figure BDA00022749248000000420

步骤3:依照步骤1和2,得到了两个有3个元素的集合

Figure BDA00022749248000000421
Figure BDA00022749248000000422
每次从两个集合中取元素
Figure BDA0002274924800000051
Figure BDA0002274924800000052
其中
Figure BDA0002274924800000053
是一个任务集合,
Figure BDA0002274924800000054
是一组节点组成的集合。依次将
Figure BDA0002274924800000055
包含的任务发送到
Figure BDA0002274924800000056
表示的节点上,即每一个子集合
Figure BDA0002274924800000057
中的所有节点均接收到同样的任务集
Figure BDA0002274924800000058
即将
Figure BDA0002274924800000059
发送给节点{N0,N1},将
Figure BDA00022749248000000510
发送给节点{N0,N2},将
Figure BDA00022749248000000511
发送给节点{N1,N2}。此时,每一个节点上存在的待处理任务个数一共为μ=2,每个任务集被拷贝并发送到2个不同的节点上,然后进入步骤4。Step 3: According to steps 1 and 2, two sets with 3 elements are obtained
Figure BDA00022749248000000421
and
Figure BDA00022749248000000422
Take elements from two collections at a time
Figure BDA0002274924800000051
and
Figure BDA0002274924800000052
in
Figure BDA0002274924800000053
is a set of tasks,
Figure BDA0002274924800000054
It is a set of nodes. in turn
Figure BDA0002274924800000055
Contains tasks sent to
Figure BDA0002274924800000056
On the node represented, that is, each sub-set
Figure BDA0002274924800000057
All nodes in receive the same set of tasks
Figure BDA0002274924800000058
about to
Figure BDA0002274924800000059
Send to node {N 0 ,N 1 }, will
Figure BDA00022749248000000510
Send to node {N 0 ,N 2 }, will
Figure BDA00022749248000000511
Send to node {N 1 , N 2 }. At this time, the total number of tasks to be processed on each node is μ=2, and each task set is copied and sent to two different nodes, and then enter step 4.

步骤4:等待所有节点完成其收到的任务。每个节点Nq独立记录其所有子任务的计算效率并发送给集群中的其他所有节点。以节点N0为例,将其总的运行时间记作Δt0,当前时刻为t,将节点N0上拥有的任务集{P1,P2}的大小记作

Figure BDA00022749248000000512
计算λ0(t)。Step 4: Wait for all nodes to complete the tasks they received. Each node N q independently records the computational efficiency of all its subtasks and sends them to all other nodes in the cluster. Taking node N 0 as an example, its total running time is recorded as Δt 0 , the current moment is t, and the size of the task set {P 1 , P 2 } owned by node N 0 is recorded as
Figure BDA00022749248000000512
Calculate λ 0 (t).

Figure BDA00022749248000000513
Figure BDA00022749248000000513

并发送λ0(t)给节点N1,N2。进入步骤5。And send λ 0 (t) to nodes N 1 , N 2 . Go to step 5.

步骤5:在每个节点Nq上执行如下操作:从其他节点获取所有的λq(t),对λq(t)进行归一化,得Step 5: Perform the following operations on each node N q : Get all λ q (t) from other nodes, normalize λ q (t), and get

Figure BDA00022749248000000514
Figure BDA00022749248000000514

定义

Figure BDA00022749248000000515
和e(t)为如下形式:
Figure BDA00022749248000000516
Figure BDA00022749248000000517
e(t)=(e0(t),e1(t),e2(t))。可得非齐次线性方程组:definition
Figure BDA00022749248000000515
and e(t) are of the form:
Figure BDA00022749248000000516
Figure BDA00022749248000000517
e(t)=(e 0 (t), e 1 (t), e 2 (t)). Inhomogeneous linear equations can be obtained:

Figure BDA00022749248000000518
Figure BDA00022749248000000518

其中:in:

Figure BDA00022749248000000519
Figure BDA00022749248000000519

步骤6:求解非齐次线性方程组,解

Figure BDA00022749248000000520
作为下一次迭代的任务分配方案,继续执行步骤2,直到完成所有批次的迭代。Step 6: Solve the inhomogeneous system of linear equations, solving
Figure BDA00022749248000000520
As the task allocation scheme for the next iteration, continue to perform step 2 until all batches of iterations are completed.

假设某一次迭代中,节点N0,N1,N2上的任务执行时间分别为Δt0=16.7secs,Δt1=11.1secs,Δt2=14.3secs。可得其计算效率分别为λ0(t)=0.0399,λ1(t)=0.0600,λ2(t)=0.0466,归一化之后可得e0(t)=0.545,e1(t)=0.818,e2(t)=0.636那么上述的非齐次线性方程组的方程表示形式就可以写作:Assume that in a certain iteration, the execution time of tasks on nodes N 0 , N 1 , and N 2 are Δt 0 =16.7 secs, Δt 1 =11.1 secs, and Δt 2 =14.3 secs, respectively. The calculation efficiency can be obtained as λ 0 (t) = 0.0399, λ 1 (t) = 0.0600, λ 2 (t) = 0.0466, after normalization, e 0 (t) = 0.545, e 1 (t) =0.818, e 2 (t)=0.636 Then the equation expression form of the above-mentioned non-homogeneous linear equation system can be written as:

Figure BDA0002274924800000061
Figure BDA0002274924800000061

可以解出:can be solved:

Figure BDA0002274924800000062
Figure BDA0002274924800000062

那么下一次进行任务划分时,其每个子任务集合的大小

Figure BDA0002274924800000063
Figure BDA0002274924800000064
因此每个节点上的任务总量变为66,98,76,与其计算性能估值0.0399,0.0600,0.0466相对应,可以认为其达到了近似最大资源利用率。Then the next time the task is divided, the size of each subtask set
Figure BDA0002274924800000063
Figure BDA0002274924800000064
Therefore, the total number of tasks on each node becomes 66, 98, and 76, corresponding to its computing performance estimates of 0.0399, 0.0600, and 0.0466, which can be considered to have reached the approximate maximum resource utilization.

实施例2:设一个拥有4个节点的分布式集群

Figure BDA0002274924800000065
如果要在
Figure BDA0002274924800000066
上执行4次循环计算,每次执行120个均等任务量的任务(任务之间无关联性,可以乱序执行且不相互影响)。设置每个任务在本集群上的冗余备份数目r=2(即每个任务要同时执行两次)。下文将依照上述步骤描述本方法的具体实施方式。Example 2: Set up a distributed cluster with 4 nodes
Figure BDA0002274924800000065
if you want to be in
Figure BDA0002274924800000066
Execute 4 loop calculations on the above, and each time execute 120 tasks with an equal amount of tasks (there is no correlation between tasks, they can be executed out of order and do not affect each other). The number of redundant backups of each task on the cluster is set to r=2 (that is, each task must be executed twice at the same time). The specific implementation of the method will be described below according to the above steps.

步骤1:对4个节点进行排列组合,总共有

Figure BDA0002274924800000067
种组合方案,记作D1,D2,…,D6。D1={N1,N2},D2={N1,N3},…,D6={N3,N4}。组合结果记作
Figure BDA0002274924800000068
Figure BDA0002274924800000069
初始化
Figure BDA00022749248000000610
Step 1: Arrange and combine the 4 nodes, there are a total of
Figure BDA0002274924800000067
A combination scheme, denoted as D 1 , D 2 ,..., D 6 . D 1 ={N 1 ,N 2 }, D 2 ={N 1 ,N 3 },...,D 6 ={N 3 ,N 4 }. The combined result is recorded as
Figure BDA0002274924800000068
Figure BDA0002274924800000069
initialization
Figure BDA00022749248000000610

步骤2:按照每批次任务集大小

Figure BDA00022749248000000611
将每批次循环的任务进行划分,则一批次120个子任务将划分为6个集合
Figure BDA00022749248000000612
Figure BDA00022749248000000613
Step 2: According to the size of each batch of task sets
Figure BDA00022749248000000611
Divide the tasks of each batch cycle, then a batch of 120 subtasks will be divided into 6 sets
Figure BDA00022749248000000612
and
Figure BDA00022749248000000613

步骤3:依照步骤1和2,得到了两个有6个元素的集合

Figure BDA00022749248000000614
Figure BDA00022749248000000615
每次从两个集合中取元素
Figure BDA00022749248000000616
Figure BDA00022749248000000617
其中
Figure BDA00022749248000000618
是一个任务集合,
Figure BDA00022749248000000619
是一组节点组成的集合。依次将
Figure BDA00022749248000000620
包含的任务发送到
Figure BDA00022749248000000621
表示的节点上,即每一个子集合
Figure BDA00022749248000000622
中的所有节点均接收到同样的任务集
Figure BDA00022749248000000623
即将
Figure BDA00022749248000000624
发送给节点{N0,N1},将
Figure BDA00022749248000000625
发送给节点{N0,N2},将
Figure BDA00022749248000000626
发送给节点{N0,N3},以此类推。此时,每一个节点上存在的待处理任务个数一共为μ=3,每个任务集被拷贝并发送到2个不同的节点上,然后进入步骤4。Step 3: According to steps 1 and 2, two sets with 6 elements are obtained
Figure BDA00022749248000000614
and
Figure BDA00022749248000000615
Take elements from two collections at a time
Figure BDA00022749248000000616
and
Figure BDA00022749248000000617
in
Figure BDA00022749248000000618
is a set of tasks,
Figure BDA00022749248000000619
It is a set of nodes. in turn
Figure BDA00022749248000000620
Contains tasks sent to
Figure BDA00022749248000000621
On the node represented, that is, each sub-set
Figure BDA00022749248000000622
All nodes in receive the same set of tasks
Figure BDA00022749248000000623
about to
Figure BDA00022749248000000624
Send to node {N 0 ,N 1 }, will
Figure BDA00022749248000000625
Send to node {N 0 ,N 2 }, will
Figure BDA00022749248000000626
Send to node {N 0 ,N 3 }, and so on. At this time, the total number of tasks to be processed on each node is μ=3, and each task set is copied and sent to two different nodes, and then step 4 is entered.

步骤4:等待所有节点完成其收到的任务。每个节点Nq独立记录其所有子任务的计算效率并发送给集群中的其他所有节点。以节点N0为例,将其总的运行时间记作Δt0,当前时刻为t,将节点N0上拥有的任务集{P1,P2,P3}的大小记作

Figure BDA0002274924800000071
计算λ0(t)。Step 4: Wait for all nodes to complete the tasks they received. Each node N q independently records the computational efficiency of all its subtasks and sends them to all other nodes in the cluster. Taking node N 0 as an example, its total running time is recorded as Δt 0 , and the current moment is t, and the size of the task set {P 1 , P 2 , P 3 } owned by node N 0 is recorded as
Figure BDA0002274924800000071
Calculate λ 0 (t).

Figure BDA0002274924800000072
Figure BDA0002274924800000072

并发送λ0(t)给节点N1,N2,N3。进入步骤5。And send λ 0 (t) to nodes N 1 , N 2 , N 3 . Go to step 5.

步骤5:在每个节点Nq上执行如下操作:从其他节点获取所有的λq(t),对λq(t)进行归一化,得Step 5: Perform the following operations on each node N q : Get all λ q (t) from other nodes, normalize λ q (t), and get

Figure BDA0002274924800000073
Figure BDA0002274924800000073

定义

Figure BDA0002274924800000074
和e(t)为如下形式:
Figure BDA0002274924800000075
e(t)=(e0(t),e1(t),...,e5(t))。definition
Figure BDA0002274924800000074
and e(t) are of the form:
Figure BDA0002274924800000075
e(t)=(e 0 (t), e 1 (t), . . . , e 5 (t)).

可得非齐次线性方程组:Inhomogeneous linear equations can be obtained:

Figure BDA0002274924800000076
Figure BDA0002274924800000076

其中:in:

Figure BDA0002274924800000077
Figure BDA0002274924800000077

步骤6:上述非齐次线性方程组稀疏矩阵不为非奇异矩阵,无法求解非齐次线性方程组,可以变形为一个线性规划问题,过程如下:Step 6: The sparse matrix of the above-mentioned non-homogeneous linear equations is not a non-singular matrix and cannot solve the non-homogeneous linear equations. It can be transformed into a linear programming problem. The process is as follows:

首先矩阵A可以写作一个方阵和一个普通矩阵拼接的形式,称其左半部分为Al,右半部分为Ar,即:First, the matrix A can be written in the form of a square matrix and an ordinary matrix spliced, and its left half is called A l , and the right half is called A r , that is:

Figure BDA0002274924800000081
Figure BDA0002274924800000081

同样的称

Figure BDA0002274924800000082
的上半部分为
Figure BDA0002274924800000083
下半部分为
Figure BDA0002274924800000084
the same name
Figure BDA0002274924800000082
The upper half of
Figure BDA0002274924800000083
The second half is
Figure BDA0002274924800000084

那么上述方程组可以改写为(4):Then the above equations can be rewritten as (4):

Figure BDA0002274924800000085
Figure BDA0002274924800000085

可以解出:can be solved:

Figure BDA0002274924800000086
Figure BDA0002274924800000086

Figure BDA0002274924800000087
的每个维度都大于0,取
Figure BDA0002274924800000088
为自变量,可以将其转换为线性规划问题,如公式(6)。make
Figure BDA0002274924800000087
Each dimension of is greater than 0, take
Figure BDA0002274924800000088
As an independent variable, it can be transformed into a linear programming problem, such as formula (6).

Figure BDA0002274924800000089
Figure BDA0002274924800000089

Figure BDA00022749248000000810
Figure BDA00022749248000000810

求得

Figure BDA00022749248000000811
后,带入公式(6)进而求解
Figure BDA00022749248000000812
求得
Figure BDA00022749248000000813
向量
Figure BDA00022749248000000814
中得每一个元素
Figure BDA00022749248000000815
Figure BDA00022749248000000816
对应一个任务集占总任务数目的比例,那么向量
Figure BDA00022749248000000817
对应的一组值即可作为一组分配方案。使用该分配方案依照比例划分每一批次的任务Fj,令t=t+1,进入下一个时间步,并执行步骤2,重复任务分配与计算性能估计,直到所有任务执行完毕。obtain
Figure BDA00022749248000000811
After that, it is brought into the formula (6) to solve
Figure BDA00022749248000000812
obtain
Figure BDA00022749248000000813
vector
Figure BDA00022749248000000814
get every element
Figure BDA00022749248000000815
Figure BDA00022749248000000816
Corresponding to the proportion of a task set to the total number of tasks, then the vector
Figure BDA00022749248000000817
The corresponding set of values can be used as a set of allocation schemes. Use this allocation scheme to divide each batch of tasks F j according to the proportion, set t=t+1, enter the next time step, and perform step 2, repeat task allocation and calculation performance estimation until all tasks are executed.

假设某一次迭代中,节点N0,N1,N2,N3上的任务执行时间分别为Δt0=16.7secs,Δt1=11.1secs,Δt2=14.3secs,Δt3=11.1secs。可得其计算效率分别为:Assume that in a certain iteration, the task execution times on nodes N 0 , N 1 , N 2 , and N 3 are Δt 0 =16.7 secs, Δt 1 =11.1 secs, Δt 2 =14.3 secs, and Δt 3 =11.1 secs, respectively. The computational efficiencies can be obtained as follows:

λ0(t)=0.0299,λ1(t)=0.0450,λ2(t)=0.0349,λ3(t)=0.0450λ 0 (t) = 0.0299, λ 1 (t) = 0.0450, λ 2 (t) = 0.0349, λ 3 (t) = 0.0450

归一化之后可得:After normalization, we can get:

e0(t)=0.3862,e1(t)=0.5814,e2(t)=0.4510,e3(t)=0.5814e 0 (t)=0.3862, e 1 (t)=0.5814, e 2 (t)=0.4510, e 3 (t)=0.5814

那么上述的非齐次线性方程组的方程表示形式就可以写作:Then the equation representation of the above non-homogeneous linear equations can be written as:

Figure BDA0002274924800000091
Figure BDA0002274924800000091

即:Right now:

Figure BDA0002274924800000092
Figure BDA0002274924800000092

系数矩阵:Coefficient matrix:

Figure BDA0002274924800000093
Figure BDA0002274924800000093

Figure BDA0002274924800000094
Figure BDA0002274924800000094

对其求逆得:Inverting it yields:

Figure BDA0002274924800000095
Figure BDA0002274924800000095

Figure BDA0002274924800000096
得:beg
Figure BDA0002274924800000096
have to:

Figure BDA0002274924800000097
Figure BDA0002274924800000097

Figure BDA0002274924800000098
得:beg
Figure BDA0002274924800000098
have to:

Figure BDA0002274924800000099
Figure BDA0002274924800000099

令:make:

Figure BDA0002274924800000101
Figure BDA0002274924800000101

可得:Available:

Figure BDA0002274924800000102
Figure BDA0002274924800000102

上述约束在第一象限内有可行域,如图2所示。The above constraints have a feasible region in the first quadrant, as shown in Figure 2.

在上述可行域中求:Find in the above feasible domain:

Figure BDA0002274924800000103
Figure BDA0002274924800000103

取一组可行解:Take a set of feasible solutions:

Figure BDA0002274924800000104
Figure BDA0002274924800000104

继续解出

Figure BDA0002274924800000105
Continue to solve
Figure BDA0002274924800000105

Figure BDA0002274924800000106
Figure BDA0002274924800000106

那么下一次进行任务划分时,其每个子任务集合的大小

Figure BDA0002274924800000107
Figure BDA0002274924800000108
因此每个节点上的任务总量变为46,70,54,70,与其计算性能估值0.0299,0.0450,0.039,0.0450相对应,可以认为其达到了近似最大资源利用率。Then the next time the task is divided, the size of each subtask set
Figure BDA0002274924800000107
Figure BDA0002274924800000108
Therefore, the total number of tasks on each node becomes 46, 70, 54, and 70, corresponding to its computing performance estimates of 0.0299, 0.0450, 0.039, and 0.0450, which can be considered to have reached the approximate maximum resource utilization.

Claims (5)

1.一种基于主从备份技术的分布式负载均衡方法,其特征在于,包括如下步骤:1. A distributed load balancing method based on master-slave backup technology, characterized in that, comprising the steps: (1)根据构建的有节点分布式集群,划分节点集合;(1) According to the constructed distributed cluster with nodes, divide the node set; (2)在集群上运行设置备份等级的任务,将同一批次所有任务划分任务集合;(2) Run the task of setting the backup level on the cluster, and divide all tasks of the same batch into task sets; (3)依据步骤(1)和(2)得到的节点集合和任务集合,将任务集合包含的任务分发至对应节点集合;(3) According to the node set and task set obtained in steps (1) and (2), distribute the tasks contained in the task set to the corresponding node set; (4)待所有节点完成其收到的任务,收集并记录执行时间;(4) After all nodes complete the tasks they receive, collect and record the execution time; (5)获取每个节点的计算效率,归一化计算执行效率;(5) Obtain the computing efficiency of each node, and normalize the computing execution efficiency; (6)通过已知常量建立方程组,变形转换为线性规划问题,进而求解分配方案;(6) Establish a system of equations through known constants, transform it into a linear programming problem, and then solve the allocation scheme; 其中,步骤(1)中所述划分节点集合方法包括:Wherein, the method for dividing the set of nodes described in step (1) includes: (1.1)构建一个n个节点分布式集群为
Figure FDA0003908848000000011
(1.1) Construct a distributed cluster of n nodes as
Figure FDA0003908848000000011
(1.2)对n个节点进行排列组合,取r个节点组成一组子集合
Figure FDA0003908848000000012
Figure FDA0003908848000000013
将全部的可能的组合
Figure FDA0003908848000000014
组成集合
Figure FDA0003908848000000015
集合
Figure FDA0003908848000000016
Figure FDA0003908848000000017
个元素,
Figure FDA0003908848000000018
初始化
Figure FDA0003908848000000019
(1.2) Arrange and combine n nodes, and take r nodes to form a set of subsets
Figure FDA0003908848000000012
Figure FDA0003908848000000013
combine all possible combinations
Figure FDA0003908848000000014
Form a set
Figure FDA0003908848000000015
gather
Figure FDA0003908848000000016
have
Figure FDA0003908848000000017
elements,
Figure FDA0003908848000000018
initialization
Figure FDA0003908848000000019
步骤(6)中求解分配方案具体包括:In step (6), solving the allocation scheme specifically includes: (6.1)当行数等于列数时,
Figure FDA00039088480000000110
方程组有唯一解
Figure FDA00039088480000000111
(6.1) When the number of rows is equal to the number of columns,
Figure FDA00039088480000000110
system of equations has a unique solution
Figure FDA00039088480000000111
(6.2)当方程组的行数大于列数时,
Figure FDA00039088480000000112
方程组无唯一解,可以变形为一个线性规划问题,过程如下:
(6.2) When the number of rows of the equation system is greater than the number of columns,
Figure FDA00039088480000000112
The system of equations has no unique solution, so it can be transformed into a linear programming problem. The process is as follows:
(6.2.1)定义矩阵A的形式,其左半部分为
Figure FDA00039088480000000113
右半部分为
Figure FDA00039088480000000114
即:
(6.2.1) Define the form of matrix A, whose left half is
Figure FDA00039088480000000113
the right half is
Figure FDA00039088480000000114
Right now:
Figure FDA00039088480000000115
Figure FDA00039088480000000115
同理定义向量
Figure FDA00039088480000000116
的上半部分为
Figure FDA00039088480000000117
下半部分为
Figure FDA00039088480000000118
Define the vector in the same way
Figure FDA00039088480000000116
The upper half of
Figure FDA00039088480000000117
The second half is
Figure FDA00039088480000000118
上述方程组改写为(公式4):The above equations are rewritten as (Formula 4):
Figure FDA00039088480000000119
Figure FDA00039088480000000119
求解得:Solved:
Figure FDA0003908848000000021
Figure FDA0003908848000000021
Figure FDA0003908848000000022
的每个维度都大于0,取
Figure FDA0003908848000000023
为自变量,将其转换为线性规划问题,如公式(6),其中
Figure FDA0003908848000000024
为向量
Figure FDA0003908848000000025
的第i个元素:
like
Figure FDA0003908848000000022
Each dimension of is greater than 0, take
Figure FDA0003908848000000023
As an independent variable, convert it into a linear programming problem, such as formula (6), where
Figure FDA0003908848000000024
as a vector
Figure FDA0003908848000000025
The i-th element of :
Figure FDA0003908848000000026
Figure FDA0003908848000000026
Figure FDA0003908848000000027
Figure FDA0003908848000000027
求解得
Figure FDA0003908848000000028
Solved
Figure FDA0003908848000000028
(6.2.2)向量
Figure FDA0003908848000000029
中得每一个元素
Figure FDA00039088480000000210
Figure FDA00039088480000000211
Figure FDA00039088480000000212
对应一个任务集占总任务数目的比例,向量
Figure FDA00039088480000000213
对应的一组值即可作为一组分配方案,使用该分配方案依照比例划分每一批次的任务Fj,令t=t+1,进入下一个时间步,并执行步骤(2),重复任务分配与计算性能估计,直到所有任务执行完毕。
(6.2.2) vector
Figure FDA0003908848000000029
get every element
Figure FDA00039088480000000210
Figure FDA00039088480000000211
Figure FDA00039088480000000212
Corresponding to the ratio of a task set to the total number of tasks, the vector
Figure FDA00039088480000000213
The corresponding set of values can be used as a set of allocation schemes, use this allocation scheme to divide each batch of tasks F j according to the proportion, set t=t+1, enter the next time step, and perform step (2), repeat Task allocation and computing performance estimation until all tasks are executed.
2.根据权利要求1所述的一种基于主从备份技术的分布式负载均衡方法,其特征在于,步骤(2)中所述划分任务集合方法包括:2. a kind of distributed load balancing method based on master-slave backup technology according to claim 1, is characterized in that, described in the step (2) task collection method comprises: (2.1)在步骤(1)构建的集群上运行F个均等任务量的任务,同时设置备份等级为r;(2.1) Run F tasks with an equal amount of tasks on the cluster constructed in step (1), and set the backup level to r at the same time; (2.2)整个任务集合F划分为小批次,每一批次执行一次任务分配与运行统计,系统运行时段记作t,每个批次的任务Fj的计算时间记作Δt;每个批次的任务Fi划分为更小批量的小任务集合,记作
Figure FDA00039088480000000214
Figure FDA00039088480000000215
的大小记作
Figure FDA00039088480000000216
(2.2) The entire task set F is divided into small batches, each batch executes task allocation and operation statistics, the system running period is recorded as t, and the calculation time of each batch of task F j is recorded as Δt; each batch The second task F i is divided into smaller batches of small task sets, denoted as
Figure FDA00039088480000000214
Figure FDA00039088480000000215
the size of
Figure FDA00039088480000000216
(2.3)将同一个批次的所有任务Fj划分为
Figure FDA00039088480000000217
个任务集,在t时刻,以比例
Figure FDA00039088480000000218
划分批次任务Fj
Figure FDA00039088480000000219
个任务集合
Figure FDA00039088480000000220
集合
Figure FDA00039088480000000221
Figure FDA00039088480000000222
类似,集合
Figure FDA00039088480000000223
也有
Figure FDA00039088480000000224
个元素。
(2.3) Divide all tasks F j of the same batch into
Figure FDA00039088480000000217
task set, at time t, in proportion
Figure FDA00039088480000000218
Divide the batch task F j as
Figure FDA00039088480000000219
set of tasks
Figure FDA00039088480000000220
gather
Figure FDA00039088480000000221
and
Figure FDA00039088480000000222
similar, set
Figure FDA00039088480000000223
also have
Figure FDA00039088480000000224
elements.
3.根据权利要求2所述的一种基于主从备份技术的分布式负载均衡方法,其特征在于,步骤(3)中分发子任务方法包括:3. a kind of distributed load balancing method based on master-slave backup technology according to claim 2, is characterized in that, in the step (3), the method for distributing subtasks comprises: (3.1)获取两个有
Figure FDA00039088480000000225
个元素的集合
Figure FDA00039088480000000226
Figure FDA00039088480000000227
每次从两个集合中取元素
Figure FDA00039088480000000228
Figure FDA00039088480000000229
其中
Figure FDA00039088480000000230
是一个任务集合,
Figure FDA00039088480000000231
是一组节点组成的集合;
(3.1) Get two
Figure FDA00039088480000000225
collection of elements
Figure FDA00039088480000000226
and
Figure FDA00039088480000000227
Take elements from two collections at a time
Figure FDA00039088480000000228
and
Figure FDA00039088480000000229
in
Figure FDA00039088480000000230
is a set of tasks,
Figure FDA00039088480000000231
is a set of nodes;
(3.2)依次将
Figure FDA00039088480000000232
包含的任务发送到
Figure FDA00039088480000000233
表示的节点上,直到每一个任务集
Figure FDA00039088480000000234
中的任务都被发送给了对应的
Figure FDA00039088480000000235
中所有的节点上;同时每一个节点上存在的待处理任务个数一共为
Figure FDA00039088480000000236
每个任务集被拷贝并发送到r个不同的节点上。
(3.2) in turn
Figure FDA00039088480000000232
Contains tasks sent to
Figure FDA00039088480000000233
on the nodes represented, until each task set
Figure FDA00039088480000000234
The tasks in are sent to the corresponding
Figure FDA00039088480000000235
On all the nodes in ; at the same time, the total number of tasks to be processed on each node is
Figure FDA00039088480000000236
Each task set is copied and sent to r different nodes.
4.根据权利要求3所述的一种基于主从备份技术的分布式负载均衡方法,其特征在于,步骤(4)中记录执行时间过程还包括:4. a kind of distributed load balancing method based on master-slave backup technology according to claim 3, is characterized in that, in step (4), record execution time process also comprises: (4.1)等待所有节点完成其收到的任务,执行冗余校验或其它任务,收集每个节点Nq的任务执行时间,记作Δtq(4.1) Wait for all nodes to complete the tasks they received, perform redundancy checks or other tasks, collect the task execution time of each node N q , and record it as Δt q ; (4.2)将当前时刻记作t,节点Nq的当前计算效率记作λq(t),节点Nq上拥有的任务集
Figure FDA0003908848000000039
的大小记作
Figure FDA0003908848000000031
(4.2) Denote the current moment as t, the current computing efficiency of node N q as λ q (t), and the task set owned by node N q
Figure FDA0003908848000000039
the size of
Figure FDA0003908848000000031
(4.3)采用公式(1)进行计算:(4.3) Use formula (1) to calculate:
Figure FDA0003908848000000032
Figure FDA0003908848000000032
每个节点独立计算自己的计算效率λq(t),并将其发送给其他所有节点。Each node independently calculates its own computational efficiency λ q (t) and sends it to all other nodes.
5.根据权利要求4所述的一种基于主从备份技术的分布式负载均衡方法,其特征在于,步骤(5)中计算执行效率方法包括:5. a kind of distributed load balancing method based on master-slave backup technology according to claim 4, is characterized in that, calculates execution efficiency method in step (5) and comprises: (5.1)获取到每个节点所有的λq(t),使用公式(2)对λq(t)进行归一化;(5.1) Obtain all λ q (t) of each node, and use formula (2) to normalize λ q (t);
Figure FDA0003908848000000033
Figure FDA0003908848000000033
定义
Figure FDA0003908848000000034
和e(t):
Figure FDA0003908848000000035
e(t)=(e0(t),e1(t),…,en-1(t));
definition
Figure FDA0003908848000000034
and e(t):
Figure FDA0003908848000000035
e(t)=(e 0 (t), e 1 (t),..., e n-1 (t));
(5.2)将分发矩阵记作A,
Figure FDA0003908848000000036
令A为系数矩阵,
Figure FDA0003908848000000037
为变量,e(t)为常数项,可得非齐次线性方程组(3):
(5.2) Denote the distribution matrix as A,
Figure FDA0003908848000000036
Let A be the coefficient matrix,
Figure FDA0003908848000000037
is a variable, and e(t) is a constant item, the non-homogeneous linear equation system (3) can be obtained:
Figure FDA0003908848000000038
Figure FDA0003908848000000038
其中,分发矩阵的行标代表节点下标,列标代表任务集下标,元素aq,i=1代表节点q拥有任务集i,元素aq,i=0代表节点q不拥有任务集i。Among them, the row index of the distribution matrix represents the node index, and the column index represents the task set index. The element a q, i = 1 means that the node q owns the task set i, and the element a q, i = 0 means that the node q does not own the task set i .
CN201911119106.2A 2019-11-15 2019-11-15 A distributed load balancing method based on master-slave backup technology Active CN111400026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911119106.2A CN111400026B (en) 2019-11-15 2019-11-15 A distributed load balancing method based on master-slave backup technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911119106.2A CN111400026B (en) 2019-11-15 2019-11-15 A distributed load balancing method based on master-slave backup technology

Publications (2)

Publication Number Publication Date
CN111400026A CN111400026A (en) 2020-07-10
CN111400026B true CN111400026B (en) 2023-02-28

Family

ID=71433924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911119106.2A Active CN111400026B (en) 2019-11-15 2019-11-15 A distributed load balancing method based on master-slave backup technology

Country Status (1)

Country Link
CN (1) CN111400026B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858721B (en) * 2020-08-03 2023-07-21 南京大学 A Distributed Computing Method Based on Priority Coding
CN113505021B (en) * 2021-05-26 2023-07-18 南京大学 Fault-tolerant method and system based on multi-master node master-slave distributed architecture
CN115730730A (en) * 2022-11-28 2023-03-03 西安电子科技大学 A scheduling method, system and device for production line operations
CN119149893B (en) * 2024-11-14 2025-03-25 南方电网科学研究院有限责任公司 Acceleration method for batch matrix multiplication running on computing platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176850A (en) * 2013-04-10 2013-06-26 国家电网公司 A load-balancing task allocation method for power system network clusters
CN104283948A (en) * 2014-09-26 2015-01-14 东软集团股份有限公司 Server cluster system and load balancing implementation method thereof
CN105302649A (en) * 2015-12-03 2016-02-03 中国联合网络通信集团有限公司 Disaster recovery backup method and system
US20160179642A1 (en) * 2014-12-19 2016-06-23 Futurewei Technologies, Inc. Replicated database distribution for workload balancing after cluster reconfiguration
CN110190991A (en) * 2019-05-21 2019-08-30 华中科技大学 A kind of fault-tolerance approach of distributed stream processing system under more application scenarios

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176850A (en) * 2013-04-10 2013-06-26 国家电网公司 A load-balancing task allocation method for power system network clusters
CN104283948A (en) * 2014-09-26 2015-01-14 东软集团股份有限公司 Server cluster system and load balancing implementation method thereof
US20160179642A1 (en) * 2014-12-19 2016-06-23 Futurewei Technologies, Inc. Replicated database distribution for workload balancing after cluster reconfiguration
CN105302649A (en) * 2015-12-03 2016-02-03 中国联合网络通信集团有限公司 Disaster recovery backup method and system
CN110190991A (en) * 2019-05-21 2019-08-30 华中科技大学 A kind of fault-tolerance approach of distributed stream processing system under more application scenarios

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SDN期末作业;bokerr;《https://www.cnblogs.com/bokers/p/8343502.html》;20180124;第1-8页 *

Also Published As

Publication number Publication date
CN111400026A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111400026B (en) A distributed load balancing method based on master-slave backup technology
Chahal et al. A hitchhiker’s guide on distributed training of deep neural networks
KR102316670B1 (en) computational accelerator
CN103810061B (en) A kind of High Availabitity cloud storage method
CN104714850B (en) A kind of isomery based on OPENCL calculates equalization methods jointly
CN102662639A (en) Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN108228970B (en) Explicit asynchronous long parallel computing method for structural dynamics analysis
CN105373517A (en) Spark-based distributed matrix inversion parallel operation method
CN119536816B (en) Method for operating DeePMD-kit model in Shenwei supercomputer
CN111428192A (en) Method and system for optimizing sparse matrix-vector multiplication for high-performance computing frameworks
CN116302574B (en) Concurrent processing method based on MapReduce
Huang et al. Real-time contingency analysis on massively parallel architectures with compensation method
Tang et al. An efficient in-memory checkpoint method and its practice on fault-tolerant HPL
CN110362780A (en) A kind of big data tensor canonical decomposition calculation method based on Shen prestige many-core processor
Huai et al. DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems
CN109033733B (en) Parallel method for solving temperature field by using finite element region decomposition improved SSORPCG
CN120670107A (en) Heterogeneous computing thread block optimal scheduling method and system based on dynamic topology mapping
CN107273339A (en) A kind of task processing method and device
Agullo et al. On the resilience of parallel sparse hybrid solvers
Jia et al. Parallel reduction to hessenberg form with algorithm-based fault tolerance
CN115361115A (en) Quantum circuit generation method and system
CN116167304B (en) Reservoir numerical simulation GMRES optimization method and system based on Shenwei framework
CN109241620B (en) An Improved SSORPCG Parallel Solution Method for Slope Stress Field Based on Finite Element Area Decomposition
WO2025161462A1 (en) Storage and compute decoupling system, data processing method and storage medium
CN119337605A (en) A GPU direct read and write method for large-scale HDF5 data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant