CN111400026B - A distributed load balancing method based on master-slave backup technology - Google Patents
A distributed load balancing method based on master-slave backup technology Download PDFInfo
- Publication number
- CN111400026B CN111400026B CN201911119106.2A CN201911119106A CN111400026B CN 111400026 B CN111400026 B CN 111400026B CN 201911119106 A CN201911119106 A CN 201911119106A CN 111400026 B CN111400026 B CN 111400026B
- Authority
- CN
- China
- Prior art keywords
- task
- tasks
- node
- nodes
- master
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Operations Research (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Multi Processors (AREA)
Abstract
本发明公开了一种主从备份技术的分布式负载均衡方法,包括步骤如下:S1划分节点集合;S2划分任务集合;S3分发子任务;S4记录执行时间;S5计算执行效率;S6求解分配方案;通过将任务分配问题建模为线性规划问题,通过对抽象模型的分析,实现了对任务分配可行域的求解。同时结合实际问题,提出一种带备份的任务分配和动态调整方法,该方法旨在提供一种能在有备份的分布式集群上实现按效能的任务分配,同时尽可能降低节点之间的显在性能关联性。能够实现在提高主节点的任务量时,尽量不提高备份节点的额外负担,进而可以缓解节点间存在的性能差异对整体任务运行时间的影响。The invention discloses a distributed load balancing method of a master-slave backup technology, comprising the following steps: S1 divides node sets; S2 divides task sets; S3 distributes subtasks; S4 records execution time; S5 calculates execution efficiency; S6 solves distribution schemes ; By modeling the task allocation problem as a linear programming problem, and through the analysis of the abstract model, the solution to the task allocation feasible domain is realized. At the same time, combined with practical problems, a task allocation and dynamic adjustment method with backup is proposed. in performance relevance. It can be realized that when increasing the task load of the master node, the additional burden of the backup node should not be increased as far as possible, thereby alleviating the impact of performance differences between nodes on the overall task running time.
Description
技术领域technical field
本发明属于计算机通信领域,尤其涉及一种基于主从备份技术的分布式负载均衡方法。The invention belongs to the field of computer communication, in particular to a distributed load balancing method based on master-slave backup technology.
背景技术Background technique
在分布式系统中,容忍进程失败的关键方法是把多个同样的进程放到一个组中,当一个信息发送到组本身进行处理时,所有的成员都接受并处理它。通过这种方式,如果组中的一个进程失败,其他的一些进程可以接管它。在需要容错的情况下,通常使用进程复制的方法,当主进程崩溃时,备份进程替代当前主进程的任务;在需要提高效能的情况下,通常使用复制和缓存扩展以及冗余编码来复制主进程形成进程组,由于冗余的存在可以使得任务更快回收(基于MDS码的任务冗余部分快速回收),可以使得任务所需通讯量减少(基于CDC编码的通信负载降低)。In a distributed system, the key way to tolerate process failure is to put multiple identical processes into a group. When a message is sent to the group itself for processing, all members accept and process it. This way, if one process in the group fails, some other process can take over for it. In the case of fault tolerance, the method of process replication is usually used. When the main process crashes, the backup process replaces the task of the current main process; in the case of needing to improve performance, replication and cache expansion and redundant coding are usually used to copy the main process. Forming a process group, due to the existence of redundancy, tasks can be recycled faster (fast recovery of redundant parts of tasks based on MDS codes), which can reduce the amount of communication required by tasks (reduction of communication load based on CDC codes).
通常的进程备份方案将主节点上运行的主进程完全复制形成备份进程,并放置到备份节点上。由于节点之间存在全备份关系,主节点和备份节点之间需要保证强一致性,增加主节点的任务量同时也会相应的增加备份节点的任务量。当组中主节点与其备份节点存在性能差异时,提高主节点的任务量可能会导致主节点和备份节点之间存在不可忽视的运行时间差异,影响备份的实时性。In a common process backup scheme, the master process running on the master node is completely copied to form a backup process, and placed on the backup node. Since there is a full backup relationship between nodes, strong consistency needs to be ensured between the master node and the backup node. Increasing the task load of the master node will also increase the task load of the backup node accordingly. When there is a performance difference between the primary node and its backup node in the group, increasing the task load of the primary node may cause a non-negligible difference in running time between the primary node and the backup node, affecting the real-time performance of the backup.
发明内容Contents of the invention
发明目的:为了将上述任务分配问题建模为线性规划问题,通过对抽象模型的分析,实现对任务分配可行域的求解,同时结合实际问题,提出一种带备份的任务分配和动态调整方法。Purpose of the invention: In order to model the above task allocation problem as a linear programming problem, through the analysis of the abstract model, realize the solution to the task allocation feasible domain, and combine with practical problems, propose a task allocation and dynamic adjustment method with backup.
技术方案:本发明提供一种基于主从备份技术的分布式负载均衡方法,包括如下具体步骤:Technical solution: the present invention provides a distributed load balancing method based on master-slave backup technology, including the following specific steps:
(1)根据构建的有节点分布式集群,划分节点集合;(1) According to the constructed distributed cluster with nodes, divide the node set;
(2)在集群上运行设置备份等级的任务,将同一批次所有任务划分任务集合;(2) Run the task of setting the backup level on the cluster, and divide all tasks of the same batch into task sets;
(3)依据步骤(1)和(2)得到的节点集合和任务集合,将任务集合包含的任务分发至对应节点集合;(3) According to the node set and task set obtained in steps (1) and (2), distribute the tasks contained in the task set to the corresponding node set;
(4)待所有节点完成其收到的任务,收集并记录执行时间;(4) After all nodes complete the tasks they receive, collect and record the execution time;
(5)获取每个节点的计算效率,归一化计算执行效率;(5) Obtain the computing efficiency of each node, and normalize the computing execution efficiency;
(6)通过已知常量建立方程组,变形转换为线性规划问题,进而求解分配方案。(6) Establish a system of equations through known constants, transform it into a linear programming problem, and then solve the allocation scheme.
进一步地,步骤(1)中所述划分节点集合方法包括:Further, the method for dividing node sets described in step (1) includes:
(1.1)构建一个n个节点分布式集群为 (1.1) Construct a distributed cluster of n nodes as
(1.2)对n个节点进行排列组合,取任意r个节点组成一组子集合将全部的可能的组合组成集合集合有个元素,初始化 (1.2) Arrange and combine n nodes, and take any r nodes to form a set of subsets combine all possible combinations Form a set gather have elements, initialization
进一步地,步骤(2)中所述划分任务集合方法包括:Further, the method for dividing task sets described in step (2) includes:
(2.1)在步骤(1)构建的集群上运行F个均等任务量的任务,同时设置备份等级为r;(2.1) Run F tasks with an equal amount of tasks on the cluster built in step (1), and set the backup level to r at the same time;
(2.2)整个任务集合F划分为小批次,每一批次执行一次任务分配与运行统计,系统运行时段记作t,批次记作Fj,每个批次Fj的计算时间记作Δt;每个批次Fj划分为更小批量的小任务集合,记作 的大小记作 (2.2) The entire task set F is divided into small batches, each batch executes task allocation and operation statistics once, the system running period is denoted as t, the batch is denoted as F j , and the calculation time of each batch F j is denoted as Δt; each batch F j is divided into smaller batches of small task sets, denoted as the size of
(2.3)将同一个批次的所有任务Fj划分为个任务集,在t时刻,以比例划分批次任务Fj为个任务集合集合与类似,集合也有个元素。(2.3) Divide all tasks F j of the same batch into task set, at time t, in proportion Divide the batch task F j as set of tasks gather and similar, set also have elements.
进一步地,步骤(3)中所述分发子任务方法包括:Further, the method for distributing subtasks described in step (3) includes:
(3.1)依据步骤(1)和(2),获取两个有个元素的集合和每次从两个集合中取元素和其中是一个任务集合,是一组节点组成的集合;(3.1) According to steps (1) and (2), two valid collection of elements and Take elements from two collections at a time and in is a set of tasks, is a set of nodes;
(3.2)依次将包含的任务发送到表示的节点上,重复上述过程直到每一个任务集中的任务都被发送给了对应的中所有的节点上;同时每一个节点上存在的待处理任务个数一共为每个任务集被拷贝并发送到r个不同的节点上。(3.2) in turn Contains tasks sent to On the node represented by , repeat the above process until each task set The tasks in are sent to the corresponding On all the nodes in ; at the same time, the total number of tasks to be processed on each node is Each task set is copied and sent to r different nodes.
进一步地,步骤(4)中记录执行时间过程还包括:Further, the recording execution time process in step (4) also includes:
(4.1)等待所有节点完成其收到的任务,执行冗余校验或其它任务。收集每个节点Nq的任务执行时间,记作Δtq;(4.1) Wait for all nodes to complete the tasks they received, perform redundancy check or other tasks. Collect the task execution time of each node N q , denoted as Δt q ;
(4.2)将当前时刻记作t,节点Nq的当前计算效率记作λq(t),节点Nq上拥有的任务集的大小记作 (4.2) Denote the current moment as t, the current computing efficiency of node N q as λ q (t), and the task set owned by node N q the size of
(4.3)采用公式(1)进行计算。(4.3) Use formula (1) to calculate.
每个节点独立计算自己的计算效率λq(t),并将其发送给其他所有节点。Each node independently calculates its own computational efficiency λ q (t) and sends it to all other nodes.
进一步地,步骤(5)中计算执行效率方法包括:Further, the method for calculating execution efficiency in step (5) includes:
(5.1)获取到每个节点所有的λq(t),使用公式(2)对λq(t)进行归一化;(5.1) Obtain all λ q (t) of each node, and use formula (2) to normalize λ q (t);
定义和e(t): definition and e(t):
e(t)=(e0(t),e1(t),…,en-1(t));e(t)=(e 0 (t), e 1 (t),..., e n-1 (t));
(5.2)将分发矩阵记作令A为系数矩阵,为变量,e(t)为常数项,可得非齐次线性方程组(3)。(5.2) Write the distribution matrix as Let A be the coefficient matrix, is a variable, and e(t) is a constant item, the non-homogeneous linear equations (3) can be obtained.
其中,分发矩阵的行标代表节点下标,列标代表任务集下标,元素aq,i=1代表节点q拥有任务集i,元素aq,i=0代表节点q不拥有任务集iAmong them, the row index of the distribution matrix represents the node index, and the column index represents the task set index. The element a q, i = 1 means that the node q owns the task set i, and the element a q, i = 0 means that the node q does not own the task set i
进一步地,步骤(6)中求解分配方案具体包括:Further, solving the allocation scheme in step (6) specifically includes:
(7.1)当行数等于列数时,方程组有唯一解 (7.1) When the number of rows is equal to the number of columns, system of equations has a unique solution
(7.2)当方程组的行数大于列数时,方程组无唯一解,可以变形为一个线性规划问题,过程如下:(7.2) When the number of rows of the equation system is greater than the number of columns, The system of equations has no unique solution, so it can be transformed into a linear programming problem. The process is as follows:
(7.2.1)首先定义矩阵A的形式,其左半部分为右半部分为即:(7.2.1) First define the form of matrix A, the left half of which is the right half is Right now:
同理定义的上半部分为下半部分为 Same definition The upper half of The second half is
上述方程组改写为(公式4):The above equations are rewritten as (Formula 4):
求解得:Solved:
若的每个维度都大于0,取为自变量,将其转换为线性规划问题,如公式(6),其中为向量的第i个元素:like Each dimension of is greater than 0, take As an independent variable, convert it into a linear programming problem, such as formula (6), where as a vector The i-th element of :
求解得 Solved
(7.2.2)向量中得每一个元素 对应一个任务集占总任务数目的比例,向量对应的一组值即可作为一组分配方案。使用该分配方案依照比例划分每一批次的任务Fj,令t=t+1,进入下一个时间步,并执行步骤(2),重复任务分配与计算性能估计,直到所有任务执行完毕。(7.2.2) vector get every element Corresponding to the ratio of a task set to the total number of tasks, the vector The corresponding set of values can be used as a set of allocation schemes. Use this allocation scheme to divide each batch of tasks F j in proportion, set t=t+1, enter the next time step, and perform step (2), repeat task allocation and calculation performance estimation until all tasks are executed.
有益效果:本发明与现有技术相比,其显著优点是:(1)提供一种能在有备份的分布式集群上实现按效能的任务分配,同时尽可能降低节点之间的显在性能关联性;(2)能够实现在提高主节点的任务量时,尽量不提高备份节点的额外负担,进而可以缓解节点间存在的性能差异对整体任务运行时间的影响。Beneficial effects: Compared with the prior art, the present invention has the remarkable advantages of: (1) providing a distributed cluster that can realize performance-based task allocation while reducing the apparent performance between nodes as much as possible; Relevance; (2) It can realize that when increasing the task load of the master node, the extra burden of the backup node should not be increased as far as possible, and then the impact of the performance difference between nodes on the overall task running time can be alleviated.
附图说明Description of drawings
图1一种基于主从备份技术的分布式负载均衡方法流程图;Fig. 1 is a flow chart of a distributed load balancing method based on master-slave backup technology;
图2可行域样例。Figure 2 Feasible region example.
具体实施方式:Detailed ways:
下面结合说明书附图对本发明的具体实施方式进行说明。The specific implementation manners of the present invention will be described below in conjunction with the accompanying drawings.
如图1具体描述实施例1:设一个拥有3个节点的分布式集群 如果要在上执行4次循环计算,每次执行120个均等任务量的任务(任务之间无关联性,可以乱序执行且不相互影响)。设置每个任务在本集群上的冗余备份数目r=2(即每个任务要同时执行两次)。下文将依照上述步骤描述本方法的具体实施方式。Example 1 is described in detail in Figure 1: Set up a distributed cluster with 3 nodes if you want to be in Execute 4 loop calculations on the above, and each time execute 120 tasks with an equal amount of tasks (there is no correlation between tasks, they can be executed out of order and do not affect each other). The number of redundant backups of each task on the cluster is set to r=2 (that is, each task must be executed twice at the same time). The specific implementation of the method will be described below according to the above steps.
步骤1:对3个节点进行排列组合,得组合结果 初始化每批次任务集大小 Step 1: Arrange and combine the 3 nodes to get the combination result Initialize the task set size for each batch
步骤2:按照每批次任务集大小将每批次循环的任务进行划分,则一批次120个子任务将划分为三个集合且 Step 2: According to the size of each batch of task sets Divide the tasks of each batch cycle, then a batch of 120 subtasks will be divided into three sets and
步骤3:依照步骤1和2,得到了两个有3个元素的集合和每次从两个集合中取元素和其中是一个任务集合,是一组节点组成的集合。依次将包含的任务发送到表示的节点上,即每一个子集合中的所有节点均接收到同样的任务集即将发送给节点{N0,N1},将发送给节点{N0,N2},将发送给节点{N1,N2}。此时,每一个节点上存在的待处理任务个数一共为μ=2,每个任务集被拷贝并发送到2个不同的节点上,然后进入步骤4。Step 3: According to
步骤4:等待所有节点完成其收到的任务。每个节点Nq独立记录其所有子任务的计算效率并发送给集群中的其他所有节点。以节点N0为例,将其总的运行时间记作Δt0,当前时刻为t,将节点N0上拥有的任务集{P1,P2}的大小记作计算λ0(t)。Step 4: Wait for all nodes to complete the tasks they received. Each node N q independently records the computational efficiency of all its subtasks and sends them to all other nodes in the cluster. Taking node N 0 as an example, its total running time is recorded as Δt 0 , the current moment is t, and the size of the task set {P 1 , P 2 } owned by node N 0 is recorded as Calculate λ 0 (t).
并发送λ0(t)给节点N1,N2。进入步骤5。And send λ 0 (t) to nodes N 1 , N 2 . Go to step 5.
步骤5:在每个节点Nq上执行如下操作:从其他节点获取所有的λq(t),对λq(t)进行归一化,得Step 5: Perform the following operations on each node N q : Get all λ q (t) from other nodes, normalize λ q (t), and get
定义和e(t)为如下形式: e(t)=(e0(t),e1(t),e2(t))。可得非齐次线性方程组:definition and e(t) are of the form: e(t)=(e 0 (t), e 1 (t), e 2 (t)). Inhomogeneous linear equations can be obtained:
其中:in:
步骤6:求解非齐次线性方程组,解作为下一次迭代的任务分配方案,继续执行步骤2,直到完成所有批次的迭代。Step 6: Solve the inhomogeneous system of linear equations, solving As the task allocation scheme for the next iteration, continue to perform step 2 until all batches of iterations are completed.
假设某一次迭代中,节点N0,N1,N2上的任务执行时间分别为Δt0=16.7secs,Δt1=11.1secs,Δt2=14.3secs。可得其计算效率分别为λ0(t)=0.0399,λ1(t)=0.0600,λ2(t)=0.0466,归一化之后可得e0(t)=0.545,e1(t)=0.818,e2(t)=0.636那么上述的非齐次线性方程组的方程表示形式就可以写作:Assume that in a certain iteration, the execution time of tasks on nodes N 0 , N 1 , and N 2 are Δt 0 =16.7 secs, Δt 1 =11.1 secs, and Δt 2 =14.3 secs, respectively. The calculation efficiency can be obtained as λ 0 (t) = 0.0399, λ 1 (t) = 0.0600, λ 2 (t) = 0.0466, after normalization, e 0 (t) = 0.545, e 1 (t) =0.818, e 2 (t)=0.636 Then the equation expression form of the above-mentioned non-homogeneous linear equation system can be written as:
可以解出:can be solved:
那么下一次进行任务划分时,其每个子任务集合的大小 因此每个节点上的任务总量变为66,98,76,与其计算性能估值0.0399,0.0600,0.0466相对应,可以认为其达到了近似最大资源利用率。Then the next time the task is divided, the size of each subtask set Therefore, the total number of tasks on each node becomes 66, 98, and 76, corresponding to its computing performance estimates of 0.0399, 0.0600, and 0.0466, which can be considered to have reached the approximate maximum resource utilization.
实施例2:设一个拥有4个节点的分布式集群如果要在上执行4次循环计算,每次执行120个均等任务量的任务(任务之间无关联性,可以乱序执行且不相互影响)。设置每个任务在本集群上的冗余备份数目r=2(即每个任务要同时执行两次)。下文将依照上述步骤描述本方法的具体实施方式。Example 2: Set up a distributed cluster with 4 nodes if you want to be in Execute 4 loop calculations on the above, and each time execute 120 tasks with an equal amount of tasks (there is no correlation between tasks, they can be executed out of order and do not affect each other). The number of redundant backups of each task on the cluster is set to r=2 (that is, each task must be executed twice at the same time). The specific implementation of the method will be described below according to the above steps.
步骤1:对4个节点进行排列组合,总共有种组合方案,记作D1,D2,…,D6。D1={N1,N2},D2={N1,N3},…,D6={N3,N4}。组合结果记作 初始化 Step 1: Arrange and combine the 4 nodes, there are a total of A combination scheme, denoted as D 1 , D 2 ,..., D 6 . D 1 ={N 1 ,N 2 }, D 2 ={N 1 ,N 3 },...,D 6 ={N 3 ,N 4 }. The combined result is recorded as initialization
步骤2:按照每批次任务集大小将每批次循环的任务进行划分,则一批次120个子任务将划分为6个集合且 Step 2: According to the size of each batch of task sets Divide the tasks of each batch cycle, then a batch of 120 subtasks will be divided into 6 sets and
步骤3:依照步骤1和2,得到了两个有6个元素的集合和每次从两个集合中取元素和其中是一个任务集合,是一组节点组成的集合。依次将包含的任务发送到表示的节点上,即每一个子集合中的所有节点均接收到同样的任务集即将发送给节点{N0,N1},将发送给节点{N0,N2},将发送给节点{N0,N3},以此类推。此时,每一个节点上存在的待处理任务个数一共为μ=3,每个任务集被拷贝并发送到2个不同的节点上,然后进入步骤4。Step 3: According to
步骤4:等待所有节点完成其收到的任务。每个节点Nq独立记录其所有子任务的计算效率并发送给集群中的其他所有节点。以节点N0为例,将其总的运行时间记作Δt0,当前时刻为t,将节点N0上拥有的任务集{P1,P2,P3}的大小记作计算λ0(t)。Step 4: Wait for all nodes to complete the tasks they received. Each node N q independently records the computational efficiency of all its subtasks and sends them to all other nodes in the cluster. Taking node N 0 as an example, its total running time is recorded as Δt 0 , and the current moment is t, and the size of the task set {P 1 , P 2 , P 3 } owned by node N 0 is recorded as Calculate λ 0 (t).
并发送λ0(t)给节点N1,N2,N3。进入步骤5。And send λ 0 (t) to nodes N 1 , N 2 , N 3 . Go to step 5.
步骤5:在每个节点Nq上执行如下操作:从其他节点获取所有的λq(t),对λq(t)进行归一化,得Step 5: Perform the following operations on each node N q : Get all λ q (t) from other nodes, normalize λ q (t), and get
定义和e(t)为如下形式:e(t)=(e0(t),e1(t),...,e5(t))。definition and e(t) are of the form: e(t)=(e 0 (t), e 1 (t), . . . , e 5 (t)).
可得非齐次线性方程组:Inhomogeneous linear equations can be obtained:
其中:in:
步骤6:上述非齐次线性方程组稀疏矩阵不为非奇异矩阵,无法求解非齐次线性方程组,可以变形为一个线性规划问题,过程如下:Step 6: The sparse matrix of the above-mentioned non-homogeneous linear equations is not a non-singular matrix and cannot solve the non-homogeneous linear equations. It can be transformed into a linear programming problem. The process is as follows:
首先矩阵A可以写作一个方阵和一个普通矩阵拼接的形式,称其左半部分为Al,右半部分为Ar,即:First, the matrix A can be written in the form of a square matrix and an ordinary matrix spliced, and its left half is called A l , and the right half is called A r , that is:
同样的称的上半部分为下半部分为 the same name The upper half of The second half is
那么上述方程组可以改写为(4):Then the above equations can be rewritten as (4):
可以解出:can be solved:
令的每个维度都大于0,取为自变量,可以将其转换为线性规划问题,如公式(6)。make Each dimension of is greater than 0, take As an independent variable, it can be transformed into a linear programming problem, such as formula (6).
求得后,带入公式(6)进而求解求得向量中得每一个元素 对应一个任务集占总任务数目的比例,那么向量对应的一组值即可作为一组分配方案。使用该分配方案依照比例划分每一批次的任务Fj,令t=t+1,进入下一个时间步,并执行步骤2,重复任务分配与计算性能估计,直到所有任务执行完毕。obtain After that, it is brought into the formula (6) to solve obtain vector get every element Corresponding to the proportion of a task set to the total number of tasks, then the vector The corresponding set of values can be used as a set of allocation schemes. Use this allocation scheme to divide each batch of tasks F j according to the proportion, set t=t+1, enter the next time step, and perform step 2, repeat task allocation and calculation performance estimation until all tasks are executed.
假设某一次迭代中,节点N0,N1,N2,N3上的任务执行时间分别为Δt0=16.7secs,Δt1=11.1secs,Δt2=14.3secs,Δt3=11.1secs。可得其计算效率分别为:Assume that in a certain iteration, the task execution times on nodes N 0 , N 1 , N 2 , and N 3 are Δt 0 =16.7 secs, Δt 1 =11.1 secs, Δt 2 =14.3 secs, and Δt 3 =11.1 secs, respectively. The computational efficiencies can be obtained as follows:
λ0(t)=0.0299,λ1(t)=0.0450,λ2(t)=0.0349,λ3(t)=0.0450λ 0 (t) = 0.0299, λ 1 (t) = 0.0450, λ 2 (t) = 0.0349, λ 3 (t) = 0.0450
归一化之后可得:After normalization, we can get:
e0(t)=0.3862,e1(t)=0.5814,e2(t)=0.4510,e3(t)=0.5814e 0 (t)=0.3862, e 1 (t)=0.5814, e 2 (t)=0.4510, e 3 (t)=0.5814
那么上述的非齐次线性方程组的方程表示形式就可以写作:Then the equation representation of the above non-homogeneous linear equations can be written as:
即:Right now:
系数矩阵:Coefficient matrix:
对其求逆得:Inverting it yields:
求得:beg have to:
求得:beg have to:
令:make:
可得:Available:
上述约束在第一象限内有可行域,如图2所示。The above constraints have a feasible region in the first quadrant, as shown in Figure 2.
在上述可行域中求:Find in the above feasible domain:
取一组可行解:Take a set of feasible solutions:
继续解出 Continue to solve
那么下一次进行任务划分时,其每个子任务集合的大小 因此每个节点上的任务总量变为46,70,54,70,与其计算性能估值0.0299,0.0450,0.039,0.0450相对应,可以认为其达到了近似最大资源利用率。Then the next time the task is divided, the size of each subtask set Therefore, the total number of tasks on each node becomes 46, 70, 54, and 70, corresponding to its computing performance estimates of 0.0299, 0.0450, 0.039, and 0.0450, which can be considered to have reached the approximate maximum resource utilization.
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911119106.2A CN111400026B (en) | 2019-11-15 | 2019-11-15 | A distributed load balancing method based on master-slave backup technology |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911119106.2A CN111400026B (en) | 2019-11-15 | 2019-11-15 | A distributed load balancing method based on master-slave backup technology |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111400026A CN111400026A (en) | 2020-07-10 |
| CN111400026B true CN111400026B (en) | 2023-02-28 |
Family
ID=71433924
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201911119106.2A Active CN111400026B (en) | 2019-11-15 | 2019-11-15 | A distributed load balancing method based on master-slave backup technology |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111400026B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111858721B (en) * | 2020-08-03 | 2023-07-21 | 南京大学 | A Distributed Computing Method Based on Priority Coding |
| CN113505021B (en) * | 2021-05-26 | 2023-07-18 | 南京大学 | Fault-tolerant method and system based on multi-master node master-slave distributed architecture |
| CN115730730A (en) * | 2022-11-28 | 2023-03-03 | 西安电子科技大学 | A scheduling method, system and device for production line operations |
| CN119149893B (en) * | 2024-11-14 | 2025-03-25 | 南方电网科学研究院有限责任公司 | Acceleration method for batch matrix multiplication running on computing platform |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103176850A (en) * | 2013-04-10 | 2013-06-26 | 国家电网公司 | A load-balancing task allocation method for power system network clusters |
| CN104283948A (en) * | 2014-09-26 | 2015-01-14 | 东软集团股份有限公司 | Server cluster system and load balancing implementation method thereof |
| CN105302649A (en) * | 2015-12-03 | 2016-02-03 | 中国联合网络通信集团有限公司 | Disaster recovery backup method and system |
| US20160179642A1 (en) * | 2014-12-19 | 2016-06-23 | Futurewei Technologies, Inc. | Replicated database distribution for workload balancing after cluster reconfiguration |
| CN110190991A (en) * | 2019-05-21 | 2019-08-30 | 华中科技大学 | A kind of fault-tolerance approach of distributed stream processing system under more application scenarios |
-
2019
- 2019-11-15 CN CN201911119106.2A patent/CN111400026B/en active Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103176850A (en) * | 2013-04-10 | 2013-06-26 | 国家电网公司 | A load-balancing task allocation method for power system network clusters |
| CN104283948A (en) * | 2014-09-26 | 2015-01-14 | 东软集团股份有限公司 | Server cluster system and load balancing implementation method thereof |
| US20160179642A1 (en) * | 2014-12-19 | 2016-06-23 | Futurewei Technologies, Inc. | Replicated database distribution for workload balancing after cluster reconfiguration |
| CN105302649A (en) * | 2015-12-03 | 2016-02-03 | 中国联合网络通信集团有限公司 | Disaster recovery backup method and system |
| CN110190991A (en) * | 2019-05-21 | 2019-08-30 | 华中科技大学 | A kind of fault-tolerance approach of distributed stream processing system under more application scenarios |
Non-Patent Citations (1)
| Title |
|---|
| SDN期末作业;bokerr;《https://www.cnblogs.com/bokers/p/8343502.html》;20180124;第1-8页 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111400026A (en) | 2020-07-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111400026B (en) | A distributed load balancing method based on master-slave backup technology | |
| Chahal et al. | A hitchhiker’s guide on distributed training of deep neural networks | |
| KR102316670B1 (en) | computational accelerator | |
| CN103810061B (en) | A kind of High Availabitity cloud storage method | |
| CN104714850B (en) | A kind of isomery based on OPENCL calculates equalization methods jointly | |
| CN102662639A (en) | Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method | |
| CN108228970B (en) | Explicit asynchronous long parallel computing method for structural dynamics analysis | |
| CN105373517A (en) | Spark-based distributed matrix inversion parallel operation method | |
| CN119536816B (en) | Method for operating DeePMD-kit model in Shenwei supercomputer | |
| CN111428192A (en) | Method and system for optimizing sparse matrix-vector multiplication for high-performance computing frameworks | |
| CN116302574B (en) | Concurrent processing method based on MapReduce | |
| Huang et al. | Real-time contingency analysis on massively parallel architectures with compensation method | |
| Tang et al. | An efficient in-memory checkpoint method and its practice on fault-tolerant HPL | |
| CN110362780A (en) | A kind of big data tensor canonical decomposition calculation method based on Shen prestige many-core processor | |
| Huai et al. | DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems | |
| CN109033733B (en) | Parallel method for solving temperature field by using finite element region decomposition improved SSORPCG | |
| CN120670107A (en) | Heterogeneous computing thread block optimal scheduling method and system based on dynamic topology mapping | |
| CN107273339A (en) | A kind of task processing method and device | |
| Agullo et al. | On the resilience of parallel sparse hybrid solvers | |
| Jia et al. | Parallel reduction to hessenberg form with algorithm-based fault tolerance | |
| CN115361115A (en) | Quantum circuit generation method and system | |
| CN116167304B (en) | Reservoir numerical simulation GMRES optimization method and system based on Shenwei framework | |
| CN109241620B (en) | An Improved SSORPCG Parallel Solution Method for Slope Stress Field Based on Finite Element Area Decomposition | |
| WO2025161462A1 (en) | Storage and compute decoupling system, data processing method and storage medium | |
| CN119337605A (en) | A GPU direct read and write method for large-scale HDF5 data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |