
CN105302526A - Data processing system and method - Google Patents

Data processing system and method

Info

Publication number
CN105302526A
Authority
CN
China
Prior art keywords
master node
node
slave
data
slave nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510680669.4A
Other languages
Chinese (zh)
Other versions
CN105302526B (en)
Inventor
张清 (Zhang Qing)
沈铂 (Shen Bo)
王娅娟 (Wang Yajuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510680669.4A
Publication of CN105302526A
Application granted
Publication of CN105302526B
Status: Active
Anticipated expiration

Landscapes

  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a data processing system comprising a master node and a plurality of slave nodes. The master node reads to-be-processed data in batches; after each read it distributes the data to the slave nodes, updates the network according to the weights returned by each slave node, sends the updated network parameters to each slave node, and then reads the next batch of to-be-processed data. Each slave node performs a forward-backward computation on the data received from the master node to obtain weights, which it returns to the master node. By adopting this master-slave computation mode, the scheme shortens the processing time of deep learning applications and improves computational efficiency.

Description

A data processing system and method

Technical Field

The present invention relates to the field of computers, and in particular to a data processing method and system.

Background Art

In 2006, Geoffrey Hinton, a professor at the University of Toronto and a leading figure in machine learning, and his student Ruslan Salakhutdinov published a paper in the top journal Science that set off a wave of deep learning in both academia and industry. Since 2006, deep learning has continued to gain momentum in academia, with Stanford University, New York University, and the University of Montreal becoming major centers of deep learning research. In 2010, the U.S. Department of Defense's DARPA program funded deep learning projects for the first time, with participants including Stanford University, New York University, and NEC Laboratories America. One important piece of support for deep learning is that the brain's nervous system does have a rich hierarchical structure; the most famous example is the Hubel-Wiesel model, which won the Nobel Prize in Physiology or Medicine for revealing the mechanism of the visual nervous system. Beyond the bionics perspective, theoretical research on deep learning is still in its infancy, but it has already shown great power in applications. Since 2011, speech recognition researchers at Microsoft Research and Google have used DNN technology to cut speech recognition error rates by 20% to 30%, the biggest breakthrough in the field in more than a decade. In 2012, DNN technology achieved striking results in image recognition, reducing the error rate on the ImageNet evaluation from 26% to 15%. In the same year, DNNs were also applied to a pharmaceutical drug activity prediction problem and achieved the world's best result, an achievement reported by The New York Times.

Today, Google, Microsoft, Baidu, and other well-known high-tech companies that hold big data are racing to invest resources and seize the technical high ground of deep learning, precisely because they have seen that in the big-data era, more complex and more powerful deep models can profoundly reveal the complex and rich information carried in massive data and make more accurate predictions about future or unknown events.

Current deep learning applications include speech recognition, image recognition, natural language processing, and click-through-rate (CTR) estimation for search advertising. These applications involve enormous amounts of computation and require large-scale computing; GPU-based high-performance computing can further improve application processing efficiency, so designing a deep learning system around GPUs is a good choice.

Summary of the Invention

The present invention provides a data processing system and method that improve computational efficiency.

To solve the above technical problem, the present invention provides a data processing system comprising one master node and a plurality of slave nodes.

The master node is configured to read the to-be-processed data in batches; after each read it distributes the to-be-processed data to the slave nodes, updates the network according to the weights returned by each slave node, sends the updated network parameters to each slave node, and then reads the next batch of to-be-processed data.

Each slave node is configured to perform a forward-backward computation on the data received from the master node to obtain weights, and to return the weights to the master node.
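
The patent does not specify how the master combines the weights returned by the slaves or what update rule it applies. The following is a minimal sketch, assuming the returned "weights" behave like gradients, that the master averages them, and that the network is a flat parameter vector; the function name `master_update`, the learning rate, and the sizes are all illustrative, not part of the patent.

```python
import numpy as np

def master_update(params, slave_weights, lr=0.01):
    """Combine the weights returned by the slaves and update the network.

    Assumption: an averaged, gradient-descent-style step. The patent only
    states that the master updates the network from the returned weights.
    """
    avg = np.mean(slave_weights, axis=0)  # aggregate across slave nodes
    return params - lr * avg              # hypothetical update rule

# Example: a 10-parameter network and 4 slave nodes
params = np.zeros(10)
returned = [np.random.randn(10) for _ in range(4)]
params = master_update(params, returned)
```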

Preferably,

the master node includes two CPUs and one GPU;

each slave node includes two CPUs and two GPUs;

the master node and the slave nodes form a hybrid cluster system with a heterogeneous CPU-GPU architecture.

Preferably,

the system further includes parallel distributed Lustre storage;

reading the to-be-processed data in batches by the master node specifically means:

the master node reads data in parallel from the Lustre storage.

Preferably,

the Lustre storage supports multi-process or multi-thread parallel reads and writes.

Preferably,

the master node and the slave nodes receive/send data using remote direct memory access (RDMA).

Preferably,

each slave node is configured with one IB (InfiniBand) network card, and the master node and the slave nodes are interconnected through an IB network;

within the master node and each node, the CPUs and GPUs communicate via the PCIe 3.0 standard.

Preferably,

the number of slave nodes is not greater than 8.

The present invention also provides a data processing method, applied in the system according to any one of claims 1 to 7, the method comprising:

Step S1: the master node reads the to-be-processed data and distributes it to the slave nodes;

Step S2: the master node receives the weights returned by the slave nodes;

Step S3: the master node updates the network according to the weights returned by the slave nodes, and sends the updated network parameters to the slave nodes;

Step S4: after sending the updated network, the master node checks whether there is still data to be processed; if so, it returns to S1.
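
A minimal sketch of this S1-S4 control flow on the master follows. The helpers `read_next_batch`, `send_data`, `recv_weight`, and `send_params` are hypothetical stand-ins for the Lustre reads and IB/RDMA transfers described elsewhere in the patent, and `master_update` is the aggregation sketch above.

```python
def run_master(storage, slaves, params):
    batch = storage.read_next_batch()        # S0/S1: read a batch from storage
    while batch is not None:                 # S4: continue while data remains
        for s in slaves:                     # S1: distribute data to slaves
            s.send_data(batch)
        weights = [s.recv_weight() for s in slaves]  # S2: gather weights
        params = master_update(params, weights)      # S3: update the network
        for s in slaves:                     # S3: push updated parameters
            s.send_params(params)
        batch = storage.read_next_batch()    # read the next batch
    return params
```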

Preferably,

after step S1 and before step S2, the method further includes:

Step S11: the GPUs of the slave nodes perform a forward-backward computation on the data received from the master node according to the network parameters to obtain weights;

and step S2 includes:

the GPU of the master node receiving the weights sent by the slave nodes.
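
To make step S11 concrete, here is a toy forward-backward pass for a single linear layer under a squared-error loss. The real system runs Caffe layers on NVIDIA GPUs; plain NumPy on the CPU is used only to show what "compute forward-backward, then return a weight" means, and the layer, loss, and sizes are assumptions.

```python
import numpy as np

def forward_backward(W, x, y):
    """One forward-backward pass for a linear layer (illustrative only)."""
    pred = x @ W                 # forward pass
    err = pred - y               # error under a squared loss
    grad = x.T @ err / len(x)    # backward pass: gradient w.r.t. W
    return grad                  # the "weight" sent back to the master

W = np.zeros((4, 1))
x, y = np.random.randn(32, 4), np.random.randn(32, 1)
grad = forward_backward(W, x, y)
```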

Preferably,

before step S1, the method further includes:

Step S0: the master node reads data in parallel from the parallel distributed Lustre storage.
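
Step S0 relies on Lustre being a parallel file system that serves concurrent readers. The sketch below shows one way the master could issue multi-threaded reads; the shard paths and file layout are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def read_shard(path):
    """Read one data shard; Lustre handles concurrent reads in parallel."""
    with open(path, "rb") as f:
        return f.read()

def parallel_read(shard_paths, workers=8):
    """Issue multi-threaded reads against the parallel file system."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_shard, shard_paths))

# Hypothetical shard layout on the Lustre mount
batches = parallel_read([f"/lustre/cifar10/shard_{i}.bin" for i in range(8)])
```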

The above scheme adopts a master-slave computation mode, which reduces the processing time of deep learning applications and improves computational efficiency.

Brief Description of the Drawings

Fig. 1 is a schematic structural diagram of the data processing system in Embodiment 1;

Fig. 2 is a flowchart of the data processing method in Embodiment 1;

Fig. 3 is a schematic structural diagram of the data processing system in Embodiment 2;

Fig. 4 is a schematic diagram of the connection between the Lustre storage and the master node in Embodiment 2;

Fig. 5 is a schematic diagram of the logical relationships in the data processing system in Embodiment 2.

Detailed Description

To make the purpose, technical solution, and advantages of the present application clearer, the embodiments of the application are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other arbitrarily.

Embodiment 1

As shown in Fig. 1, the present invention provides a data processing system comprising one master node 11 and a plurality of slave nodes 12.

The master node 11 is configured to read the to-be-processed data in batches; after each read it distributes the to-be-processed data to the slave nodes, updates the network according to the weights returned by each slave node, sends the updated network parameters to each slave node, and then reads the next batch of to-be-processed data.

Each slave node 12 is configured to perform a forward-backward computation on the data received from the master node to obtain weights, and to return the weights to the master node.

In this embodiment, the number of slave nodes may be set to no more than 8. The master node 11 includes two CPUs 121 and one GPU 122; each slave node 12 includes two CPUs 121 and two GPUs 122; the master node and the slave nodes form a hybrid cluster system with a heterogeneous CPU-GPU architecture.

Preferably,

the system further includes parallel distributed Lustre storage 13. The Lustre storage 13 supports multi-process or multi-thread parallel reads and writes, and the master node 11 reads data in parallel from the Lustre storage.

The master node and the slave nodes receive/send data using remote direct memory access (RDMA).

Preferably,

each slave node 12 is configured with one IB network card, and the master node 11 and the slave nodes 12 are interconnected through an IB network; within the master node 11 and each node, the CPUs and GPUs communicate via the PCIe 3.0 standard.

As shown in Fig. 2, the present invention also provides a data processing method, applied in the system according to any one of claims 1 to 7, the method comprising:

Step S1: the master node reads the to-be-processed data and distributes it to the slave nodes;

Step S2: the master node receives the weights returned by the slave nodes;

specifically, the GPU of the master node receives the weights sent by each slave node.

Step S3: the master node updates the network according to the weights returned by the slave nodes, and sends the updated network parameters to the slave nodes.

Step S4: after sending the updated network, the master node checks whether there is still data to be processed; if so, it returns to S1.

Preferably,

after step S1 and before step S2, the method further includes:

Step S11: the GPU of each slave node performs a forward-backward computation on the data received from the master node according to the network parameters to obtain weights.

Preferably,

before step S1, the method further includes:

Step S0: the master node reads data in parallel from the parallel distributed Lustre storage.

Embodiment 2

The technical solution of the present invention is further described below in conjunction with a specific scenario.

As shown in Fig. 3, the data processing system of this embodiment can run the deep learning application Caffe, tested with the CIFAR-10 dataset, and can specifically be implemented with the following architecture:

1. The data processing system can adopt a hybrid cluster system with a heterogeneous CPU+GPU architecture, operating in master-slave mode; the computing nodes of the whole system are divided into 1 master node and 8 slave nodes. In line with the characteristics of deep learning algorithms, parameter update computation, data reading and distribution, and network update computation are performed by the master node, while the time-consuming forward-backward computation is performed by the slave nodes.

The master node and slave nodes of this embodiment are described in detail below:

a) Master node

Within the master node, the CPUs and GPU compute cooperatively, with CPU-GPU communication over the PCIe 3.0 standard. The master node has 2 CPUs and 1 NVIDIA K40 GPU (which supports PCIe 3.0); there is 1 master node. The master node is configured with 2 IB network cards and is interconnected with the storage and the slave nodes through the IB network.

b) Slave node

Within each slave node, the CPUs and GPUs compute cooperatively, with CPU-GPU communication over the PCIe 3.0 standard. Each slave node has 2 CPUs and 2 NVIDIA K40 GPUs (which support PCIe 3.0), with both GPUs plugged into CPU0's slots. There are 8 slave nodes. Each slave node is configured with 1 IB network card and is interconnected with the master node through the IB network.

2. As shown in Fig. 4, the technical solution of this embodiment also requires parallel distributed Lustre storage, which supports multi-process or multi-thread parallel reads and writes with high parallel bandwidth and low latency; the Lustre storage and the master node are interconnected through the IB network.

3. Network design: the data processing system uses Mellanox 56 Gb/s IB high-speed networking to achieve high-speed interconnection of the parallel storage, the master node, and the slave nodes.

As shown in Fig. 5, the working logic of the system components is designed as follows:

(1) The master node reads CIFAR-10 data in parallel from the parallel Lustre storage;

(2) The master node distributes the data to the 8 slave nodes;

(3) The 2 GPUs of each slave node perform the forward-backward computation and transfer the computed weights directly to the GPU of the master node via RDMA;

(4) After receiving the new weights, the master node performs its computation on the GPU, updates the network, and then sends the new network to the slave nodes via RDMA.

The above steps are executed iteratively until all data processing is complete; the logical relationship is shown in Fig. 5.
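
As a rough sketch of how this iteration could be expressed in code: MPI over an InfiniBand fabric typically runs on RDMA transports underneath, so an mpi4py skeleton is used below. Rank 0 plays the master and the remaining ranks play the slaves; the random `grad` is a placeholder for the per-GPU forward-backward computation, and the averaging update rule is an assumption, since the patent only says the master updates the network from the returned weights. Launch with at least two ranks, e.g. `mpirun -np 9 python train.py` for one master and eight slaves.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n_slaves = comm.Get_size() - 1            # rank 0 is the master
params = np.zeros(10, dtype=np.float64)   # toy network parameters

for step in range(100):                   # iterate over data batches
    comm.Bcast(params, root=0)            # master sends the current network
    if rank == 0:
        grad = np.zeros_like(params)      # master contributes nothing
    else:
        grad = np.random.randn(*params.shape)  # placeholder forward-backward
    total = np.zeros_like(params)
    comm.Reduce(grad, total, op=MPI.SUM, root=0)  # slaves return weights
    if rank == 0:
        params -= 0.01 * total / n_slaves  # assumed averaging update
```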

The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. Those of ordinary skill in the art will understand that all or some of the steps in the above method can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Optionally, all or some of the steps of the above embodiments can also be implemented with one or more integrated circuits; correspondingly, each module in the above embodiments can be implemented in the form of hardware or as a software function module. The present application is not limited to any particular combination of hardware and software.

Claims (10)

1. A data processing system, characterized in that the system comprises one master node and a plurality of slave nodes; the master node is configured to read to-be-processed data in batches, and, after each read, to distribute the to-be-processed data to the slave nodes, update the network according to the weights returned by each slave node, send the updated network parameters to each slave node, and then read the next batch of to-be-processed data; each slave node is configured to perform a forward-backward computation on the data received from the master node to obtain weights and to return them to the master node.

2. The system according to claim 1, characterized in that: the master node includes two CPUs and one GPU; each slave node includes two CPUs and two GPUs; and the master node and the slave nodes form a hybrid cluster system with a heterogeneous CPU-GPU architecture.

3. The system according to claim 2, characterized in that the system further includes parallel distributed Lustre storage, and reading the to-be-processed data in batches by the master node specifically means: the master node reads data in parallel from the Lustre storage.

4. The system according to claim 3, characterized in that: the Lustre storage supports multi-process or multi-thread parallel reads and writes.

5. The system according to claim 4, characterized in that: the master node and the slave nodes receive/send data using remote direct memory access (RDMA).

6. The system according to any one of claims 1 to 5, characterized in that: each slave node is configured with one IB network card, and the master node and the slave nodes are interconnected through an IB network; within the master node and each node, the CPUs and GPUs communicate via the PCIe 3.0 standard.

7. The system according to any one of claims 1 to 5, characterized in that: the number of slave nodes is not greater than 8.

8. A data processing method, applied in the system according to any one of claims 1 to 7, characterized in that the method comprises: step S1: the master node reads the to-be-processed data and distributes it to the slave nodes; step S2: the master node receives the weights returned by the slave nodes; step S3: the master node updates the network according to the weights returned by the slave nodes and sends the updated network parameters to the slave nodes; step S4: after sending the updated network, the master node checks whether there is still data to be processed and, if so, returns to S1.

9. The method according to claim 8, characterized in that: after step S1 and before step S2, the method further includes: step S11: the GPUs of the slave nodes perform a forward-backward computation on the data received from the master node according to the network parameters to obtain weights; and step S2 includes: the GPU of the master node receiving the weights sent by the slave nodes.

10. The method according to any one of claims 8 to 9, characterized in that: before step S1, the method further includes: step S0: the master node reads data in parallel from the parallel distributed Lustre storage.
CN201510680669.4A 2015-10-19 2015-10-19 A kind of data processing system and method Active CN105302526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510680669.4A CN105302526B (en) 2015-10-19 2015-10-19 A kind of data processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510680669.4A CN105302526B (en) 2015-10-19 2015-10-19 A kind of data processing system and method

Publications (2)

Publication Number Publication Date
CN105302526A true CN105302526A (en) 2016-02-03
CN105302526B CN105302526B (en) 2019-03-01

Family

ID=55199830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510680669.4A Active CN105302526B (en) 2015-10-19 2015-10-19 A kind of data processing system and method

Country Status (1)

Country Link
CN (1) CN105302526B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956659A (en) * 2016-05-11 2016-09-21 北京比特大陆科技有限公司 Data processing device, data processing system and server
CN107463448A (en) * 2017-09-28 2017-12-12 郑州云海信息技术有限公司 A kind of deep learning weight renewing method and system
CN109166074A (en) * 2018-08-06 2019-01-08 联想(北京)有限公司 computing system
CN110019093A (en) * 2017-12-28 2019-07-16 中国移动通信集团安徽有限公司 Method for writing data, device, equipment and medium
CN113626368A (en) * 2021-06-30 2021-11-09 苏州浪潮智能科技有限公司 Artificial intelligence data processing method and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526634B1 (en) * 2005-12-19 2009-04-28 Nvidia Corporation Counter-based delay of dependent thread group execution
CN102929718A (en) * 2012-09-17 2013-02-13 江苏九章计算机科技有限公司 Distributed GPU (graphics processing unit) computer system based on task scheduling
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526634B1 (en) * 2005-12-19 2009-04-28 Nvidia Corporation Counter-based delay of dependent thread group execution
CN102929718A (en) * 2012-09-17 2013-02-13 江苏九章计算机科技有限公司 Distributed GPU (graphics processing unit) computer system based on task scheduling
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shanshan Zhang et al.: "Asynchronous Stochastic Gradient Descent for DNN Training", IEEE International Conference on Acoustics *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956659A (en) * 2016-05-11 2016-09-21 北京比特大陆科技有限公司 Data processing device, data processing system and server
CN105956659B (en) * 2016-05-11 2019-11-22 北京比特大陆科技有限公司 Data processing device and system, server
CN107463448A (en) * 2017-09-28 2017-12-12 郑州云海信息技术有限公司 A kind of deep learning weight renewing method and system
CN110019093A (en) * 2017-12-28 2019-07-16 中国移动通信集团安徽有限公司 Method for writing data, device, equipment and medium
CN109166074A (en) * 2018-08-06 2019-01-08 联想(北京)有限公司 computing system
CN113626368A (en) * 2021-06-30 2021-11-09 苏州浪潮智能科技有限公司 Artificial intelligence data processing method and related device
CN113626368B (en) * 2021-06-30 2023-07-25 苏州浪潮智能科技有限公司 A data processing method and related device for artificial intelligence

Also Published As

Publication number Publication date
CN105302526B (en) 2019-03-01

Similar Documents

Publication Publication Date Title
JP6974270B2 (en) Intelligent high bandwidth memory system and logic die for it
CN105224502A (en) A kind of degree of depth learning method based on GPU and system
CN106297774B (en) A kind of the distributed parallel training method and system of neural network acoustic model
CN113366501B (en) Split network acceleration architecture
CN106951926B (en) Deep learning method and device of hybrid architecture
CN105227669A (en) A kind of aggregated structure system of CPU and the GPU mixing towards degree of depth study
CN105302526A (en) Data processing system and method
CN105956659B (en) Data processing device and system, server
JP2022058328A (en) Distributed model training equipment and methods, electronics, storage media, and computer programs
US8677299B1 (en) Latch clustering with proximity to local clock buffers
CN107346351A (en) For designing FPGA method and system based on the hardware requirement defined in source code
CN111191784A (en) Transposed sparse matrix multiplied by dense matrix for neural network training
CN110059793B (en) Stepwise modification of generative adversarial neural networks
WO2025001229A1 (en) Computing system, model training method and apparatus, and product
Lawande et al. Novo‐G#: a multidimensional torus‐based reconfigurable cluster for molecular dynamics
CN117312215B (en) Server system, job execution method, device, equipment and medium
CN116187464A (en) Blind quantum computing processing method and device and electronic equipment
CN110888824B (en) Multilevel memory hierarchy
CN117785490B (en) A training architecture, method, system and server for a graph neural network model
Wang et al. Enabling efficient large-scale deep learning training with cache coherent disaggregated memory systems
US8458634B2 (en) Latch clustering with proximity to local clock buffers
US20240311668A1 (en) Optimizing quantum computing circuit state partitions for simulation
CN205983537U (en) Data processing device and system, server
CN105718991B (en) Cell Array Computing System
US20240311667A1 (en) Simulating quantum computing circuits using sparse state partitioning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant