
CN111858058A - SGD load balancing method, device and storage medium based on parallel computing - Google Patents

SGD load balancing method, device and storage medium based on parallel computing

Info

Publication number
CN111858058A
CN111858058A (application number CN202010723846.3A)
Authority
CN
China
Prior art keywords
nodes
node
load balancing
model
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010723846.3A
Other languages
Chinese (zh)
Inventor
王彪
王亚强
刘魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Cheng Xin High Tech Information Technology Co ltd
Chengdu University of Information Technology
Original Assignee
Chengdu Cheng Xin High Tech Information Technology Co ltd
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Cheng Xin High Tech Information Technology Co ltd, Chengdu University of Information Technology filed Critical Chengdu Cheng Xin High Tech Information Technology Co ltd
Priority to CN202010723846.3A
Publication of CN111858058A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/54: Interprogram communication
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses an SGD load balancing method based on parallel computing, comprising the following steps: distributed parallel GPU computing is implemented with a design that combines model parallelism and data parallelism; a semaphore mechanism provides synchronous communication between the master node and the child nodes, and the optimizer in each child container updates the weights with a stochastic gradient descent algorithm. The master node builds a minimum spanning tree using the errors in the child nodes' control tables as weights, identifies the key nodes in the graph, sequentially reclaims the non-key nodes, and reallocates their hardware resources. The method enables multiple model replicas to process different subsets of the training samples simultaneously, periodically merges the replicas interactively, and optimizes the distributed algorithm. The invention provides a new architectural approach to load-balanced computing that improves model development efficiency and reduces development cost; the algorithm adapts well to data scale while dynamically managing asynchronous communication between the child containers.

Description

SGD load balancing method, device and storage medium based on parallel computing

Technical Field

The present invention relates to the field of machine learning, and in particular to an SGD load balancing method, device and storage medium based on parallel computing.

Background Art

Artificial intelligence has already demonstrated major advantages in many fields. Machine learning is a key part of artificial intelligence: by modeling and training on massive amounts of data, it helps people make decisions.

However, with the rise of big data, data sets keep growing, and the storage and computing capacity of a single machine can no longer meet the demands of massive data. Distributed machine learning emerged to meet this need, and using it to speed up model convergence has become the mainstream approach in industry. Two approaches to distributed machine learning are common today: model parallelism and data parallelism.

However, current parallel computing is limited by the bucket (straggler) effect: the next step often cannot proceed until the slowest node finishes its computation. Having multiple model replicas process different subsets of the training samples simultaneously, and periodically merging the replicas' results interactively to achieve computational efficiency on large-scale data, is technically demanding.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art and to provide an SGD load balancing method, device and storage medium based on parallel computing that combines the model-parallel and data-parallel modes. Compared with the prior art, the present invention effectively enables multiple model replicas to process different subsets of the training samples at the same time, periodically merges the results of each replica interactively, and optimizes the distributed algorithm.

The purpose of the present invention is achieved through the following technical solutions:

The SGD load balancing method based on parallel computing includes the following steps:

Step 1: Build a parallel GPU computing architecture. Combining the model-parallel and data-parallel modes, construct a one-way connected graph, circulate the models between the graph nodes periodically so that the models cover the data set, and preferentially allocate hardware devices to the graph nodes.

Step 2: Dynamically manage node hardware resources. Use a semaphore mechanism for synchronous communication between the master node and the child nodes, and let the optimizer in each child container update the weights with a stochastic gradient descent algorithm.

Specifically, building the parallel GPU computing architecture in step 1 includes the following sub-steps:

S101: Configure a management node (Manager), create N containers deployed on different machines, denoted as nodes (Node), and create a node control table on each child node that records the node ID, the node's data set, and the current batch error.

S102: Establish connections between the child nodes to form a one-way connected graph, build a neural network in each child node, and set a time slice T for one period.
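One minimal way to represent such a one-way connected graph in code is a directed adjacency map. The ring topology below is an assumption for illustration only (the patent does not fix a topology); it is chosen so that circulating the model edge by edge visits every node's data shard within N periods:

```python
def ring_graph(n):
    """Directed adjacency map for N nodes. A ring is one example of a
    one-way connected graph: each node has a single downstream neighbour,
    so a model circulated along the edges covers all N data shards in N
    periods. The actual topology is not specified by the patent."""
    return {i: [(i + 1) % n] for i in range(n)}

g = ring_graph(4)
# g == {0: [1], 1: [2], 2: [3], 3: [0]}
```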

S103: Divide the data samples evenly into N parts and feed them to the nodes in order. Train on the different nodes with the SGD algorithm; each data sample yields a local gradient value through forward propagation and back propagation, and the gradient is updated.
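A minimal sketch of S103, substituting a toy linear model with squared-error loss for the patent's (unspecified) neural network; `shard` and `sgd_step` are illustrative names, not from the source:

```python
import numpy as np

def shard(samples, n_nodes):
    """S103: split the training samples evenly into N parts, one per node."""
    return np.array_split(samples, n_nodes)

def sgd_step(w, x, y, lr=0.01):
    """One local SGD update on one node: forward propagation, local
    gradient via back propagation, then the weight update. A toy linear
    stand-in for the per-node training the patent describes."""
    pred = x @ w                        # forward propagation
    grad = x.T @ (pred - y) / len(y)    # local gradient (back propagation)
    return w - lr * grad

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(80, 3)), rng.normal(size=80)
shards = shard(np.arange(80), 4)        # 4 nodes, 20 sample indices each
w = np.zeros(3)
for idx in shards:                      # each node updates on its own shard
    w = sgd_step(w, X[idx], Y[idx])
```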

S104: In each training period, traverse the graph level by level, record an unbiased estimate of the model error, and write the error value into the node control table.

Specifically, the traversal of the graph in sub-step S104 includes: encapsulating the parameters output by an upper-layer node, such as the weights and biases, into an NN object for transmission; after the current node receives the NN object from the upper-layer node, training with the NN object as a hidden layer; and, if the current node has several upper-layer nodes, merging the NN objects received from them and using the mean of the NN objects as the hidden layer for training.
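The merging rule for multiple upstream NN objects can be sketched as an element-wise mean of the received parameters. `NNObject` is a hypothetical container, since the patent does not define the transmission format:

```python
import numpy as np

class NNObject:
    """Hypothetical container for the weights and biases a node
    transmits downstream during the graph traversal."""
    def __init__(self, weights, biases):
        self.weights = np.asarray(weights, dtype=float)
        self.biases = np.asarray(biases, dtype=float)

def merge(nn_objects):
    """Element-wise mean of the parameters received from multiple
    upper-layer nodes; the result serves as the hidden layer."""
    return NNObject(
        np.mean([o.weights for o in nn_objects], axis=0),
        np.mean([o.biases for o in nn_objects], axis=0),
    )

a = NNObject([[1.0, 2.0]], [0.0])
b = NNObject([[3.0, 4.0]], [2.0])
m = merge([a, b])
# m.weights == [[2.0, 3.0]], m.biases == [1.0]
```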

Specifically, dynamically managing node hardware resources in step 2 includes the following sub-steps:

S201: In each period, the master node queries the node control table, constructs a minimum spanning tree using the errors in the node control table as weights, and sorts the weights in the minimum spanning tree.

S202: When the training model is about to converge, the master node sorts the nodes by weight according to each period's minimum spanning tree in the node control table and sends a synchronization signal to the key nodes.

S203: The master node reclaims, in order, the tasks of the nodes in the one-way connected graph that have not received a synchronization signal, and allocates those nodes' hardware resources to the adjacent key nodes to speed up their computation, until all nodes have completed the training task.
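S201's minimum spanning tree can be built with a standard algorithm such as Kruskal's. The sketch below assumes the control-table errors have been attached to the graph's edges as weights; the patent does not spell out the edge set, so the example edges are made up:

```python
def kruskal_mst(n, edges):
    """Minimum spanning tree over an n-node graph via Kruskal's algorithm.
    `edges` is a list of (weight, u, v) tuples, where each weight is an
    error taken from the node control table (S201). Returns the chosen
    edges in ascending weight order."""
    parent = list(range(n))
    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):     # sort edges by error weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # keep the edge if it joins two trees
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

# toy 4-node graph; error weights are hypothetical control-table values
edges = [(0.9, 0, 1), (0.2, 1, 2), (0.5, 2, 3), (0.7, 0, 3), (0.4, 0, 2)]
tree = kruskal_mst(4, edges)
# tree == [(0.2, 1, 2), (0.4, 0, 2), (0.5, 2, 3)]
```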

A computing device includes a memory in which computer-executable instructions are stored, and a processor that implements the steps of the above load balancing method when executing the computer program.

A computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the above load balancing method.

Beneficial effects of the present invention: the present invention proposes a new architectural approach to load-balanced computing that improves model development efficiency and reduces development cost, gives the algorithm good adaptability to data scale, and dynamically manages asynchronous communication between the child containers.

Brief Description of the Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of the parallel computing architecture of the present invention.

FIG. 3 is a schematic diagram of the present invention using a semaphore mechanism to dynamically manage node hardware resources.

Detailed Description of the Embodiments

For a clearer understanding of the technical features, objects and effects of the present invention, specific embodiments of the present invention are now described with reference to the accompanying drawings.

In this embodiment, as shown in FIG. 1, the SGD load balancing method based on parallel computing mainly includes the following steps:

Step 1: Build a parallel GPU computing architecture. Combining the model-parallel and data-parallel modes, construct a one-way connected graph, circulate the models between the graph nodes periodically so that the models cover the data set, and preferentially allocate hardware devices to the graph nodes.

Step 2: Dynamically manage node hardware resources. Use a semaphore mechanism for synchronous communication between the master node and the child nodes, and let the optimizer in each child container update the weights with a stochastic gradient descent algorithm.

In this embodiment, as shown in FIG. 2, the present invention provides a structural diagram of the SGD load balancing method based on parallel computing. The specific implementation process is as follows. First, configure a management node (Manager) and create N containers deployed on different machines, denoted as nodes (Node); create a node control table on each child node to record the node ID, the node's data set, and the current batch error. Establish connections between the child nodes to form a one-way connected graph (the graph nodes are GPU hardware devices), build a neural network in each child node, and set a time slice T for one period. Divide the data samples evenly into N parts, feed them to the nodes in order, and train on the different nodes with the SGD algorithm; each data sample yields a local gradient value through forward propagation and back propagation, and the gradient is updated. In each training period, traverse the graph level by level, record an unbiased estimate of the model error, and write the error value into the node control table. During the traversal, adjacent nodes must exchange their weights and biases; because the neural network is complex and has many parameters, the parameters are encapsulated into an NN object for transmission. After a node receives the NN object from its upper-layer node, it trains with the NN object as a hidden layer. If a node has several upper-layer nodes, the NN objects received from them are merged, and the mean of the NN objects serves as the hidden layer for training. The models are circulated periodically so that every model runs on all of the data.

With the architecture described in step 1, after training for a while the error of some nodes decreases very slowly; they need a very long training time to converge, which severely hurts training efficiency and also produces a large amount of useless computation, wasting hardware resources. The present invention therefore introduces a semaphore mechanism for synchronous communication between the master node and the child nodes, and dynamically manages the nodes' hardware resources.

In this embodiment, FIG. 3 illustrates how the present invention uses a semaphore mechanism to dynamically manage node hardware resources. The specific process is as follows. In each period, the master node queries the node control table, constructs a minimum spanning tree using the errors in the node control table as weights, and sorts the weights in the minimum spanning tree. After a certain number of training periods (when the model is about to converge), the master node sorts the nodes by weight according to each period's minimum spanning tree in the node control table and sends a synchronization signal to the key nodes. The master node then reclaims, in order, the tasks of the nodes that have not received a synchronization signal and allocates their hardware resources to the adjacent key nodes, speeding up the adjacent nodes' computation and improving the efficiency of the whole model.
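The master-to-key-node signalling described above can be sketched with counting semaphores. Below is a minimal single-process analogue using Python threads, with `sync[i]` standing in for the per-node synchronization signal; a real deployment would signal across containers, and the node/key-node assignments here are made up for illustration:

```python
import threading

N = 4
sync = [threading.Semaphore(0) for _ in range(N)]  # master -> child signals
done = threading.Semaphore(0)                      # child -> master completion
finished = []

def child(i, is_key):
    if is_key:
        sync[i].acquire()   # a key node waits for the master's sync signal (S202)
        # ...continue training with the reclaimed hardware resources...
    # a non-key node stops here; its task is handed back to the master (S203)
    finished.append(i)
    done.release()

def master(key_nodes):
    for i in key_nodes:
        sync[i].release()   # send the synchronization signal to each key node
    for _ in range(N):
        done.acquire()      # wait until every node has finished

threads = [threading.Thread(target=child, args=(i, i in {0, 2})) for i in range(N)]
for t in threads:
    t.start()
master({0, 2})              # nodes 0 and 2 are the (hypothetical) key nodes
for t in threads:
    t.join()
```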

The architectural approach adopted by the present invention effectively reduces the loss value, improves model development efficiency, reduces development cost, and adapts well to data scale.

In addition, the present invention also provides a computing device and a computer-readable storage medium. The computing device includes a memory in which computer-executable instructions are stored, and a processor that, when executing the computer program, carries out all the processes and steps of the load balancing method in the embodiment. The computer-readable storage medium stores a computer program that, when executed by a processor, implements all the methods and steps of the above load balancing method.

The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments, which, together with the description, merely illustrate its principles; various changes and improvements may be made without departing from the spirit and scope of the present invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims (6)

1. The SGD load balancing method based on parallel computing is characterized by comprising the following steps of:
step 1: constructing a parallel GPU computing architecture, constructing a one-way connected graph by adopting a mode of combining a model parallel mode and a data parallel mode, periodically carrying out model circulation among graph nodes, enabling the models to cover the data set, and preferentially distributing hardware equipment for the graph nodes;
step 2: dynamically managing node hardware resources, realizing synchronous communication between the main node and the sub-nodes by adopting a semaphore mechanism, and updating the weights by adopting a stochastic gradient descent algorithm in the optimizer in the sub-container.
2. The SGD load balancing method based on parallel computing according to claim 1, wherein the building of the parallel GPU computing architecture in step 1 specifically includes the following sub-steps:
s101, configuring a management Node Manager, creating N containers to be deployed on different machines, marking as Node nodes, creating a Node control table on a child Node, and recording a Node ID, a Node data set and a current batch error;
s102, establishing connection among the sub-nodes to form a one-way connection graph, building a neural network in the sub-nodes, and setting a time slice T of one period;
s103, evenly dividing the data samples into N parts, sending them into the nodes in sequence, training on different nodes by using an SGD algorithm, obtaining a local gradient value for each part of the data samples through forward propagation and backward propagation, and updating the gradient;
s104, traversing according to the hierarchy of the graph in each training period, recording the unbiased estimate of the model error, and recording the error value in the node control table.
3. The SGD load balancing method according to claim 2, wherein the traversal process of the graph in the sub-step S104 specifically includes: packing parameters such as the weights and biases output by an upper node into an NN object for transmission; after the current node receives the NN object transmitted by the upper node, training with the NN object as a hidden layer; and if the current node has a plurality of upper nodes, merging the NN objects transmitted from the upper nodes and taking the mean value of the NN objects as a hidden layer for training.
4. The SGD load balancing method based on parallel computing according to claim 1, wherein the step 2 of dynamically managing hardware resources of nodes specifically comprises the following sub-steps:
s201, in each period, inquiring a node control table through a main node, constructing a minimum spanning tree by taking an error in the node control table as a weight, and sequencing the weights in the minimum spanning tree;
s202, when the training model is about to converge, the main node sorts the nodes by weight according to the minimum spanning tree of each period in the node control table, and sends a synchronization signal to the key nodes;
and S203, the main node sequentially recovers the tasks of the nodes which have not received the synchronization signal in the one-way connected graph, and distributes the hardware resources of those nodes to the adjacent key nodes to accelerate their calculation speed, until all the nodes finish the training tasks.
5. A computing device, comprising
A memory having computer-executable instructions stored therein;
a processor for implementing the steps of the load balancing method according to any one of claims 1 to 4 when executing the computer program.
6. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the load balancing method according to any one of claims 1 to 4.
CN202010723846.3A 2020-07-24 2020-07-24 SGD load balancing method, device and storage medium based on parallel computing Pending CN111858058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723846.3A CN111858058A (en) 2020-07-24 2020-07-24 SGD load balancing method, device and storage medium based on parallel computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723846.3A CN111858058A (en) 2020-07-24 2020-07-24 SGD load balancing method, device and storage medium based on parallel computing

Publications (1)

Publication Number Publication Date
CN111858058A (en) 2020-10-30

Family

ID=72950115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723846.3A Pending CN111858058A (en) 2020-07-24 2020-07-24 SGD load balancing method, device and storage medium based on parallel computing

Country Status (1)

Country Link
CN (1) CN111858058A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339351A (en) * 2016-08-30 2017-01-18 浪潮(北京)电子信息产业有限公司 SGD (Stochastic Gradient Descent) algorithm optimization system and method
CN108304918A (en) * 2018-01-18 2018-07-20 中兴飞流信息科技有限公司 A kind of the parameter exchange method and system of the deep learning of data parallel
CN108921196A (en) * 2018-06-01 2018-11-30 南京邮电大学 A kind of semantic segmentation method for improving full convolutional neural networks
CN110678843A (en) * 2017-04-17 2020-01-10 微软技术许可有限责任公司 Dynamically partitioning workloads in deep neural network modules to reduce power consumption
CN110795228A (en) * 2018-08-03 2020-02-14 伊姆西Ip控股有限责任公司 Adaptive batch dataset partitioning for distributed deep learning using accelerator mixture sets
CN111178486A (en) * 2019-11-27 2020-05-19 湖州师范学院 An Asynchronous Parallel Search Method for Hyperparameters Based on Population Evolution
WO2020102526A1 (en) * 2018-11-14 2020-05-22 North Carolina State University Deep neural network with compositional grammatical architectures
US20200175422A1 (en) * 2018-11-29 2020-06-04 International Business Machines Corporation Asynchronous gradient weight compression


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG DANING et al.: "Weighted parallel SGD for distributed unbalanced-workload training system", Journal of Parallel and Distributed Computing *
LU Shuxia et al.: "Weighted zeroth-order stochastic gradient descent algorithm with variance reduction", Journal of Hebei University (Natural Science Edition) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598118A (en) * 2021-03-03 2021-04-02 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN112598118B (en) * 2021-03-03 2021-06-25 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN114035937A (en) * 2021-10-15 2022-02-11 北京潞晨科技有限公司 Distributed training and reasoning method, system, equipment and readable storage medium based on artificial intelligence
CN114167828A (en) * 2021-12-03 2022-03-11 润电能源科学技术有限公司 External hanging control method of DCS controller and related device

Similar Documents

Publication Publication Date Title
CN109032671B (en) Distributed deep learning method and system based on data parallel strategy
CN110533183B (en) A Task Placement Method for Heterogeneous Network Awareness in Pipelined Distributed Deep Learning
CN114756383B (en) Distributed computing method, system, equipment and storage medium
Rashidi et al. Astra-sim: Enabling sw/hw co-design exploration for distributed dl training platforms
US20220129302A1 (en) Data processing system and method for heterogeneous architecture
US11481627B2 (en) Distributed learning of composite machine learning models
CN111858058A (en) SGD load balancing method, device and storage medium based on parallel computing
Van Tendeloo et al. PythonPDEVS: a distributed Parallel DEVS simulator
CN107578094A (en) The method that the distributed training of neutral net is realized based on parameter server and FPGA
Sun et al. Gradientflow: Optimizing network performance for large-scale distributed dnn training
CN110347636B (en) Data execution body and data processing method thereof
Zhan et al. Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking
CN107807983A (en) A kind of parallel processing framework and design method for supporting extensive Dynamic Graph data query
WO2025112979A1 (en) Parallel strategy optimal selection method, and neural network solver training method and apparatus
CN111241301A (en) Knowledge graph representation learning-oriented distributed framework construction method
CN118644225B (en) A substation operation and maintenance decision-making method based on multi-agent reinforcement learning
Liu et al. Aedfl: efficient asynchronous decentralized federated learning with heterogeneous devices
Addanki et al. Placeto: Efficient progressive device placement optimization
WO2025081828A1 (en) Training model distribution method and apparatus, and computer device and storage medium
CN110868461B (en) Data distribution method facing heterogeneous bandwidth between nodes in Gaia cluster
Zhang et al. The optimization of model parallelization strategies for multi-GPU training
CN119166298A (en) Heterogeneous intelligent computing power optimization management and scheduling system to accelerate large model training tasks
CN106201985B (en) A kind of distributed parallel load flow calculation system development approach based on PQ method
Wang et al. A coordinated two-stages virtual network embedding algorithm based on reinforcement learning
CN114358859A (en) Graph-based large-scale embedding model training method and system for click-through rate prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
Effective date of abandoning: 20221209