CN111858058A - SGD load balancing method, device and storage medium based on parallel computing - Google Patents
- Publication number
- CN111858058A (application number CN202010723846.3A)
- Authority
- CN
- China
- Prior art keywords
- nodes
- node
- load balancing
- model
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/54—Interprogram communication
- G06N3/045—Combinations of networks
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multi Processors (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of machine learning, and in particular to a parallel-computing-based SGD load balancing method, device and storage medium.
Background
Artificial intelligence has already demonstrated major advantages in many fields. Machine learning is a key part of artificial intelligence: by modeling and training on massive amounts of data, it helps people make decisions.
With the rise of big data, however, datasets have grown so large that the storage and computing capacity of a single machine can no longer meet the demand. Distributed machine learning emerged in response, and using it to accelerate model convergence has become the mainstream approach in industry. Two approaches to distributed machine learning are common today: model parallelism and data parallelism.
Current parallel computing is nevertheless limited by the barrel (straggler) effect: the next step of the computation often cannot begin until the slowest node has finished. Having multiple model replicas process different subsets of the training samples simultaneously, and periodically merging the replicas' results, delivers computational efficiency on large-scale data but is technically demanding.
Summary of the Invention
The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a parallel-computing-based SGD load balancing method, device and storage medium that combine model parallelism with data parallelism. Compared with the prior art, the present invention enables multiple model replicas to process different subsets of the training samples simultaneously, periodically merges the results of the replicas, and thereby optimizes the distributed algorithm.
The purpose of the present invention is achieved through the following technical solutions:
The parallel-computing-based SGD load balancing method includes the following steps:
Step 1: Build a parallel GPU computing architecture. Combining model parallelism with data parallelism, construct a one-way connected graph, circulate the models periodically between the graph nodes so that every model covers the full data set, and assign hardware devices to the graph nodes on merit.
Step 2: Dynamically manage node hardware resources. A semaphore mechanism implements synchronized communication between the master node and the child nodes, and in each child container the optimizer updates the weights with the stochastic gradient descent (SGD) algorithm.
Specifically, building the parallel GPU computing architecture in Step 1 includes the following sub-steps:
S101: Configure a management node (Manager) and create N containers deployed on different machines, each denoted a Node; on each child node create a node control table recording the node ID, the node's data set, and the current batch error.
S102: Establish connections between the child nodes to form a one-way connected graph, build a neural network in each child node, and set the time slice T of one cycle.
S103: Divide the data samples evenly into N shards and feed them to the nodes in order. Train with the SGD algorithm on the different nodes: each shard yields a local gradient value through forward and backward propagation, and the gradient is used to update the model.
S104: In each training cycle, traverse the graph level by level, compute an unbiased estimate of the model error, and record the error value in the node control table.
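As a hedged illustration of sub-steps S103 and S104 (a toy sketch, not the patent's implementation), the code below splits a synthetic data set into N equal shards, runs local SGD on each "node", and records each node's batch error in a node control table. The linear model, the learning rate, and all names (`shards`, `control_table`) are assumptions introduced for illustration:

```python
import numpy as np

N = 4                      # number of worker nodes/containers
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))              # synthetic samples
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=400)

shards = np.array_split(np.arange(400), N)  # S103: N equal shards
control_table = {}                          # node ID -> current batch error

for node_id, idx in enumerate(shards):
    w = np.zeros(8)                         # this node's local model copy
    for _ in range(200):
        i = rng.integers(len(idx))          # one random sample (SGD)
        xi, yi = X[idx[i]], y[idx[i]]
        grad = 2 * (xi @ w - yi) * xi       # gradient of the squared error
        w -= 0.01 * grad                    # S103: local gradient update
    batch_err = float(np.mean((X[idx] @ w - y[idx]) ** 2))
    control_table[node_id] = batch_err      # S104: record in control table
```

After one cycle, `control_table` holds one error per node, which is exactly the quantity S201 below consumes when ranking nodes.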
Specifically, the graph traversal in sub-step S104 proceeds as follows: the weights, biases and other parameters output by an upper-layer node are encapsulated into a single NN object for transmission; after the current node receives the NN object from its upper-layer node, it trains using the NN object as a hidden layer; if the current node has multiple upper-layer nodes, it merges the NN objects they send and trains using their mean as the hidden layer.
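The NN-object exchange can be sketched as follows. The `NNObject` class and the `merge_nn_objects` helper are hypothetical names for the packing and mean-merging described above, not identifiers from the patent:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class NNObject:
    """Parameters of one hidden layer, packed for transmission."""
    weights: np.ndarray   # hidden-layer weight matrix
    biases: np.ndarray    # hidden-layer bias vector

def merge_nn_objects(objs):
    """Element-wise mean of the NN objects sent by multiple upper-layer nodes."""
    w = np.mean([o.weights for o in objs], axis=0)
    b = np.mean([o.biases for o in objs], axis=0)
    return NNObject(weights=w, biases=b)

# Two upstream nodes send their parameters; the current node averages them
# and would use the result as its hidden layer.
upstream = [
    NNObject(np.full((2, 2), 1.0), np.array([0.0, 2.0])),
    NNObject(np.full((2, 2), 3.0), np.array([2.0, 4.0])),
]
merged = merge_nn_objects(upstream)
# merged.weights is all 2.0; merged.biases is [1.0, 3.0]
```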
Specifically, the dynamic management of node hardware resources in Step 2 includes the following sub-steps:
S201: In each cycle, the master node queries the node control table, builds a minimum spanning tree using the errors in the table as weights, and sorts the weights of the tree.
S202: When the training model is about to converge, the master node ranks the nodes by weight according to each cycle's minimum spanning tree in the node control table and sends a synchronization signal to the key nodes.
S203: The master node reclaims, in order, the tasks of the nodes in the one-way connected graph that did not receive a synchronization signal, and allocates those nodes' hardware resources to adjacent key nodes to accelerate their computation, until every node has completed its training task.
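A minimal sketch of S201–S203, with several loudly labeled simplifications: the minimum spanning tree is reduced to a direct sort of the control-table errors, node adjacency is taken to be a ring, and the "lower-error half are key nodes" threshold is an assumption; the patent does not specify these details:

```python
# node ID -> current batch error (as kept in the node control table)
control_table = {0: 0.8, 1: 0.1, 2: 0.5, 3: 0.05}
resources = {node: 1 for node in control_table}   # 1 GPU share per node

# S201-S202 (simplified): rank nodes by error; the lower-error half are
# the key nodes that receive the synchronization signal.
ranked = sorted(control_table, key=control_table.get)
key_nodes = set(ranked[: len(ranked) // 2])

# S203: reclaim each non-key node's task in order and hand its hardware
# share to the nearest key node on an assumed ring 0 -> 1 -> 2 -> 3 -> 0.
for node in ranked[len(ranked) // 2:]:
    neighbor = (node + 1) % len(control_table)
    while neighbor not in key_nodes:              # walk to nearest key node
        neighbor = (neighbor + 1) % len(control_table)
    resources[neighbor] += resources.pop(node)    # reallocate GPU share
```

With these toy errors, nodes 1 and 3 are key nodes and end up with the reclaimed hardware shares of nodes 0 and 2.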
A computing device comprising a memory storing computer-executable instructions, and a processor that implements the steps of the above load balancing method when executing the computer program.
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above load balancing method.
Beneficial effects of the present invention: the invention proposes a new architectural approach to load-balanced computing that improves model development efficiency, reduces development cost, adapts well to varying data scales, and dynamically manages asynchronous communication between the child containers.
Description of Drawings
Figure 1 is a flow chart of the method of the present invention.
Figure 2 is a schematic diagram of the parallel computing architecture of the present invention.
Figure 3 is a schematic diagram of the semaphore mechanism used by the present invention to dynamically manage node hardware resources.
Detailed Description
For a clearer understanding of the technical features, objects and effects of the present invention, specific embodiments are now described with reference to the accompanying drawings.
In this embodiment, as shown in Figure 1, the parallel-computing-based SGD load balancing method mainly includes the following steps:
Step 1: Build a parallel GPU computing architecture. Combining model parallelism with data parallelism, construct a one-way connected graph, circulate the models periodically between the graph nodes so that every model covers the full data set, and assign hardware devices to the graph nodes on merit.
Step 2: Dynamically manage node hardware resources. A semaphore mechanism implements synchronized communication between the master node and the child nodes, and in each child container the optimizer updates the weights with the stochastic gradient descent (SGD) algorithm.
In this embodiment, as shown in Figure 2, the structure of the parallel-computing-based SGD load balancing method is implemented as follows. First, a management node (Manager) is configured, and N containers are created and deployed on different machines, each denoted a Node; a node control table is created on each child node to record the node ID, the node's data set, and the current batch error. Connections are established between the child nodes to form a one-way connected graph (the graph nodes are GPU hardware devices), a neural network is built in each child node, and the time slice T of one cycle is set. The data samples are divided evenly into N shards and fed to the nodes in order; the SGD algorithm trains on the different nodes, each shard yielding a local gradient value through forward and backward propagation that is used to update the model. In each training cycle, an unbiased estimate of the model error is computed during a level-by-level traversal of the graph and recorded in the node control table. During the traversal, adjacent nodes must exchange weights and biases; because the neural network is complex and has many parameters, the parameters are encapsulated into a single NN object for transmission. After a node receives the NN object from its upper-layer node, it trains using the NN object as a hidden layer. If a node has multiple upper-layer nodes, the NN objects they send are merged and their mean is used as the hidden layer for training. The models are circulated periodically so that every model runs on all of the data.
With the architecture described in Step 1, after training for a period of time the error of some nodes decreases very slowly; those nodes need a very long training time to converge, which severely hurts training efficiency and produces a large amount of wasted computation and hardware resources. The present invention therefore introduces a semaphore mechanism for synchronized communication between the master node and the child nodes, dynamically managing the nodes' hardware resources.
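The master/child synchronization via semaphores might look like the following toy sketch (an illustration of the mechanism using Python's `threading.Semaphore`, not the patent's implementation; all names are assumptions):

```python
import threading

N = 4
done = threading.Semaphore(0)        # children release; master acquires
next_cycle = threading.Semaphore(0)  # master releases; children acquire
results = []

def child(node_id):
    results.append(node_id)          # stand-in for one training batch
    done.release()                   # tell the master this node finished
    next_cycle.acquire()             # block until the master's sync signal

threads = [threading.Thread(target=child, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

for _ in range(N):
    done.acquire()                   # master waits for all N children
for _ in range(N):
    next_cycle.release()             # master signals the next cycle
for t in threads:
    t.join()
```

The two counting semaphores give the master a barrier at the end of each cycle, which is the point at which it can consult the node control table and reassign resources.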
In this embodiment, Figure 3 illustrates the semaphore mechanism used to dynamically manage node hardware resources. The implementation proceeds as follows: in each cycle, the master node queries the node control table, builds a minimum spanning tree using the errors in the table as weights, and sorts the tree's weights. After a certain number of training cycles (when the model is about to converge), the master node ranks the nodes by weight according to each cycle's minimum spanning tree and sends a synchronization signal to the key nodes. The master node then reclaims, in order, the tasks of the nodes that did not receive a synchronization signal and allocates their hardware resources to adjacent key nodes, accelerating those nodes' computation and improving the efficiency of the whole model.
The architectural approach adopted by the present invention effectively reduces the loss value, improves model development efficiency, reduces development cost, and adapts well to varying data scales.
In addition, the present invention provides a computing device and a computer-readable storage medium. The computing device includes a memory storing computer-executable instructions and a processor that, when executing the computer program, carries out all the processes and steps of the load balancing method in the embodiment. The computer-readable storage medium stores a computer program that, when executed by a processor, implements all the methods and steps of the above load balancing method.
The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments, which, together with the description, merely illustrate its principles; various changes and improvements may be made without departing from the spirit and scope of the present invention, and all such changes and improvements fall within the scope of the claimed invention, which is defined by the appended claims and their equivalents.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010723846.3A CN111858058A (en) | 2020-07-24 | 2020-07-24 | SGD load balancing method, device and storage medium based on parallel computing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111858058A true CN111858058A (en) | 2020-10-30 |
Family
ID=72950115
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010723846.3A Pending CN111858058A (en) | 2020-07-24 | 2020-07-24 | SGD load balancing method, device and storage medium based on parallel computing |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111858058A (en) |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106339351A (en) * | 2016-08-30 | 2017-01-18 | 浪潮(北京)电子信息产业有限公司 | SGD (Stochastic Gradient Descent) algorithm optimization system and method |
| CN108304918A (en) * | 2018-01-18 | 2018-07-20 | 中兴飞流信息科技有限公司 | A kind of the parameter exchange method and system of the deep learning of data parallel |
| CN108921196A (en) * | 2018-06-01 | 2018-11-30 | 南京邮电大学 | A kind of semantic segmentation method for improving full convolutional neural networks |
| CN110678843A (en) * | 2017-04-17 | 2020-01-10 | 微软技术许可有限责任公司 | Dynamically partitioning workloads in deep neural network modules to reduce power consumption |
| CN110795228A (en) * | 2018-08-03 | 2020-02-14 | 伊姆西Ip控股有限责任公司 | Adaptive batch dataset partitioning for distributed deep learning using accelerator mixture sets |
| CN111178486A (en) * | 2019-11-27 | 2020-05-19 | 湖州师范学院 | An Asynchronous Parallel Search Method for Hyperparameters Based on Population Evolution |
| WO2020102526A1 (en) * | 2018-11-14 | 2020-05-22 | North Carolina State University | Deep neural network with compositional grammatical architectures |
| US20200175422A1 (en) * | 2018-11-29 | 2020-06-04 | International Business Machines Corporation | Asynchronous gradient weight compression |
- 2020-07-24: application CN202010723846.3A filed in China (CN); publication CN111858058A, status Pending
Non-Patent Citations (2)
| Title |
|---|
| CHENG DANING 等: "Weighted parallel SGD for distributed unbalanced-workload training system", 《JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING》 * |
| 鲁淑霞 等: "带有方差减小的加权零阶随机梯度下降算法", 《河北大学学报(自然科学版)》 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112598118A (en) * | 2021-03-03 | 2021-04-02 | 成都晓多科技有限公司 | Method, device, storage medium and equipment for processing abnormal labeling in supervised learning |
| CN112598118B (en) * | 2021-03-03 | 2021-06-25 | 成都晓多科技有限公司 | Method, device, storage medium and equipment for processing abnormal labeling in supervised learning |
| CN114035937A (en) * | 2021-10-15 | 2022-02-11 | 北京潞晨科技有限公司 | Distributed training and reasoning method, system, equipment and readable storage medium based on artificial intelligence |
| CN114167828A (en) * | 2021-12-03 | 2022-03-11 | 润电能源科学技术有限公司 | External hanging control method of DCS controller and related device |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109032671B (en) | Distributed deep learning method and system based on data parallel strategy | |
| CN110533183B (en) | A Task Placement Method for Heterogeneous Network Awareness in Pipelined Distributed Deep Learning | |
| CN114756383B (en) | Distributed computing method, system, equipment and storage medium | |
| Rashidi et al. | Astra-sim: Enabling sw/hw co-design exploration for distributed dl training platforms | |
| US20220129302A1 (en) | Data processing system and method for heterogeneous architecture | |
| US11481627B2 (en) | Distributed learning of composite machine learning models | |
| CN111858058A (en) | SGD load balancing method, device and storage medium based on parallel computing | |
| Van Tendeloo et al. | PythonPDEVS: a distributed Parallel DEVS simulator | |
| CN107578094A (en) | The method that the distributed training of neutral net is realized based on parameter server and FPGA | |
| Sun et al. | Gradientflow: Optimizing network performance for large-scale distributed dnn training | |
| CN110347636B (en) | Data execution body and data processing method thereof | |
| Zhan et al. | Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking | |
| CN107807983A (en) | A kind of parallel processing framework and design method for supporting extensive Dynamic Graph data query | |
| WO2025112979A1 (en) | Parallel strategy optimal selection method, and neural network solver training method and apparatus | |
| CN111241301A (en) | Knowledge graph representation learning-oriented distributed framework construction method | |
| CN118644225B (en) | A substation operation and maintenance decision-making method based on multi-agent reinforcement learning | |
| Liu et al. | Aedfl: efficient asynchronous decentralized federated learning with heterogeneous devices | |
| Addanki et al. | Placeto: Efficient progressive device placement optimization | |
| WO2025081828A1 (en) | Training model distribution method and apparatus, and computer device and storage medium | |
| CN110868461B (en) | Data distribution method facing heterogeneous bandwidth between nodes in Gaia cluster | |
| Zhang et al. | The optimization of model parallelization strategies for multi-GPU training | |
| CN119166298A (en) | Heterogeneous intelligent computing power optimization management and scheduling system to accelerate large model training tasks | |
| CN106201985B (en) | A kind of distributed parallel load flow calculation system development approach based on PQ method | |
| Wang et al. | A coordinated two-stages virtual network embedding algorithm based on reinforcement learning | |
| CN114358859A (en) | Graph-based large-scale embedding model training method and system for click-through rate prediction |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| AD01 | Patent right deemed abandoned | | Effective date of abandoning: 20221209 |