
CN118503191B - High-power storage module and communication method - Google Patents



Publication number
CN118503191B
Authority
CN
China
Prior art keywords
data
pcie
circuit board
network
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410964538.8A
Other languages
Chinese (zh)
Other versions
CN118503191A (en
Inventor
李修录
尹善腾
朱小聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Axd Anxinda Memory Technology Co ltd
Original Assignee
Axd Anxinda Memory Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Axd Anxinda Memory Technology Co ltd
Priority to CN202410964538.8A
Publication of CN118503191A
Application granted
Publication of CN118503191B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Semiconductor Memories (AREA)
  • Power Sources (AREA)

Abstract

The application relates to the technical field of semiconductor manufacturing and provides a high-computing-power storage module and a communication method. The module comprises a plurality of circuit boards that form a network; each circuit board comprises a bus and a plurality of chips; each chip includes a processing unit and a storage unit, both of which are connected to the bus. Because the circuit boards form a network, the computing power of the storage module is increased, which makes the module particularly suitable for application scenarios requiring high computing power.

Description

High-Computing-Power Storage Module and Communication Method

Technical Field

The present invention relates to the technical field of semiconductor manufacturing, and in particular to a high-computing-power storage module and a communication method.

Background Art

Storage devices currently offer relatively little computing power. In application scenarios where the storage device must support high-speed computing directly, they struggle with computation-heavy tasks such as AI model training. As technology develops, the computing power demanded of storage devices keeps rising, and current storage devices cannot keep up with that demand.

Summary of the Invention

To address the shortcomings of the prior art, the present application provides a high-computing-power storage module and a communication method. The design increases the computing power of the storage module and is particularly suited to application scenarios that demand high computing power.

To solve the above problem, the present invention provides the following technical solutions:

In a first aspect, an embodiment of the present application provides a high-computing-power storage module comprising a plurality of circuit boards, the plurality of circuit boards forming a network;

each of the circuit boards includes a bus and a plurality of chips;

each of the chips includes a processing unit and a storage unit, and the processing unit and the storage unit in each chip are connected to the bus.

In some embodiments, the processing unit and the storage unit communicate with each other via the UCIe protocol, each chip communicates with the PCIe bus on its circuit board via the UCIe protocol, and the PCIe buses of the circuit boards are interconnected to form a first network.

In some embodiments, the chips on each circuit board communicate with the PCIe bus on that circuit board via the PCIe protocol, and the PCIe buses of the circuit boards are interconnected to form a second network.

In some embodiments, the plurality of circuit boards form a tree network through a switch;

the bus of the first-level circuit board in the tree network is connected to the switch;

the switch is connected to an external device.

In some embodiments, the tree network includes N levels of circuit boards, and each M-th level circuit board is connected to one (M+1)-th level circuit board, where N and M are positive integers and M is less than or equal to N minus one.

In some embodiments, the circuit board is a PCIe 5.0 circuit board.

In some embodiments, the chip further includes at least one cache unit, the processing unit is connected to the cache unit, and the cache unit is a non-volatile memory;

the processing unit in each chip completes its computing tasks using only the data in the cache unit and the storage unit of the chip in which it is located.

In a second aspect, an embodiment of the present application provides a communication method based on the high-computing-power storage module of the first aspect, the method comprising:

receiving data sent by an external device;

determining a data tag of the data;

processing the data according to the data tag of the data.

In some embodiments, processing the data according to the data tag of the data includes:

when the data tag is a comprehensive operation tag among the operation tags, splitting the data into a plurality of sub-data and allocating each sub-data to a processing unit in a different chip;

when the data tag is an independent operation tag among the operation tags, selecting one of the idle processing units as the target processing unit and allocating the data to the target processing unit.

In some embodiments, processing the data according to the data tag of the data includes:

when the data tag is a storage tag, determining a target storage unit according to the storage tag;

storing the data in the target storage unit.

The present application provides a high-computing-power storage module and a communication method. The high-computing-power storage module comprises a plurality of circuit boards that form a network; each circuit board includes a bus and a plurality of chips; each chip includes a processing unit and a storage unit, both of which are connected to the bus. Because the circuit boards form a network, the computing power of the storage module is increased, which makes the module particularly suitable for application scenarios that demand high computing power.

Brief Description of the Drawings

FIG. 1 is a first schematic structural diagram of the high-computing-power storage module provided in an embodiment of the present application.

FIG. 2 is a schematic structural diagram of a circuit board using PCIe communication, provided in an embodiment of the present application.

FIG. 3 is a schematic structural diagram of a circuit board using UCIe communication, provided in an embodiment of the present application.

FIG. 4 is a second schematic structural diagram of the high-computing-power storage module provided in an embodiment of the present application.

FIG. 5 is a third schematic structural diagram of the high-computing-power storage module provided in an embodiment of the present application.

FIG. 6 is a schematic flowchart of a communication method provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the present application.

In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality of" means two or more, unless otherwise expressly and specifically defined.

Refer to FIG. 1, which is a first schematic structural diagram of the high-computing-power storage module provided in an embodiment of the present application. As shown in FIG. 1, the high-computing-power storage module 1 includes a plurality of circuit boards 10, and the circuit boards form a network.

In some embodiments, the circuit board 10 may be a printed circuit board (Printed Circuit Board, PCB), a flexible printed circuit (Flexible Printed Circuit, FPC), or the like; the present application places no limitation on this.

In some embodiments, the circuit board is a PCIe 5.0 circuit board.

Specifically, PCI Express (Peripheral Component Interconnect Express, PCIe for short) is a high-speed serial computer expansion bus standard. It is the bus most commonly used in high-speed computing systems today, and it supports high-speed data transfer from a starting address in the memory of one device to a starting address in the memory of another device.

Specifically, PCIe 5.0 provides a data transfer rate of up to 32 GT/s per lane, twice that of PCIe 4.0. The higher rate translates directly into higher throughput and significantly better overall system performance. A 16-lane configuration can carry about 128 GB/s (gigabytes per second) counting both directions, so PCIe 5.0 provides a fast, reliable data path that can handle large volumes of data.
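The bandwidth figure above can be checked with a short calculation. The sketch below is illustrative only: it assumes the 128b/130b line encoding used by PCIe generations 3.0 through 5.0, which the source does not mention, and the function name is hypothetical.

```python
def pcie_throughput_gb_per_s(rate_gt_per_s, lanes, payload_bits=128, coded_bits=130):
    """Usable one-direction throughput in GB/s for a PCIe link.

    Assumption: 128b/130b encoding, so 128 of every 130 transferred bits are payload.
    """
    raw_gbit_per_s = rate_gt_per_s * lanes
    return raw_gbit_per_s * payload_bits / coded_bits / 8


# Raw figure quoted in the text: 32 GT/s x 16 lanes, both directions, no overhead.
raw_both_directions = 2 * 32 * 16 / 8

# Usable figure for a single direction once encoding overhead is subtracted.
usable_one_direction = pcie_throughput_gb_per_s(32, 16)

print(f"raw, both directions:  {raw_both_directions:.1f} GB/s")
print(f"usable, one direction: {usable_one_direction:.2f} GB/s")
```

Under these assumptions the 128 GB/s value corresponds to the raw rate summed over both directions; the usable rate in one direction is a little under half of that.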

Specifically, a PCIe 5.0 circuit board is backward compatible with PCIe 4.0 devices and PCIe 3.0 devices.

Optionally, to reach such a high data rate, PCIe 5.0 places stricter requirements on circuit board performance: the board must have good dielectric constant (Dk) characteristics, a low dissipation factor (Df), and good thermal conductivity to cope with the heat generated by high-speed signal transmission.

Optionally, PCIe 5.0 has demanding signal integrity requirements at high frequencies: the channel loss of a PCIe 5.0 circuit board should be less than 36 dB at 16 GHz to preserve signal quality and integrity during transmission.
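To give a sense of scale for the 36 dB limit, the sketch below converts a loss in decibels into the fraction of signal amplitude that remains. It assumes the figure is a voltage-domain insertion loss, so the standard 20·log10 relation applies; the function name is hypothetical.

```python
def remaining_amplitude_fraction(loss_db: float) -> float:
    """Fraction of the transmitted voltage amplitude left after `loss_db` of loss.

    Assumption: loss_db is a voltage-domain insertion loss (20 * log10 ratio).
    """
    return 10 ** (-loss_db / 20)


fraction_at_limit = remaining_amplitude_fraction(36.0)
print(f"{fraction_at_limit:.4f} of the amplitude remains at 36 dB of loss")
```

At the limit, less than 2% of the original amplitude reaches the receiver, which is why board material properties such as Dk and Df matter at this data rate.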

In some embodiments, the circuit board is a PCIe circuit board, which may have the structure shown in FIG. 2 or FIG. 3.

Refer to FIG. 2, which is a schematic structural diagram of a circuit board using PCIe communication, provided in an embodiment of the present application. As shown in FIG. 2, the circuit board 10 includes a PCIe bus 11 and a plurality of chips 12.

In some embodiments, each chip 12 includes a processing unit 121 and a storage unit 122, and the processing unit 121 and the storage unit 122 in each chip 12 are connected to the bus 11.

In some embodiments, the processing unit 121 and the storage unit 122 in each chip 12 on a circuit board 10 communicate with each other via the PCIe protocol, each chip 12 on a circuit board 10 communicates with the PCIe bus 11 on that circuit board via the PCIe protocol, and the PCIe buses 11 of the circuit boards 10 are interconnected to form a second network.

Optionally, the processing unit 121 is a central processing unit (Central Processing Unit, CPU) or a microcontroller unit (Microcontroller Unit, MCU).

Optionally, the storage unit 122 is a flash memory.

In some embodiments, the chip 12 further includes at least one cache unit 123, the processing unit 121 is connected to the cache unit 123, and the cache unit 123 is a non-volatile memory; the processing unit 121 in each chip 12 completes its computing tasks using only the data in the cache unit 123 and the storage unit 122 of the chip 12 in which it is located.

Optionally, the cache unit 123 is a double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM).

FIG. 3 is a schematic structural diagram of a circuit board using UCIe communication, provided in an embodiment of the present application. As shown in FIG. 3, the circuit board 10 is a PCIe circuit board and includes a PCIe bus 11 and a plurality of chips 12.

The processing unit 121 and the storage unit 122 in each chip 12 on a circuit board 10 communicate with each other via the UCIe protocol, each chip 12 communicates with the PCIe bus 11 on the circuit board 10 via the UCIe protocol, and the PCIe buses 11 of the circuit boards are interconnected to form a first network.

In FIG. 3, the UCIe protocol acts as a bridge: the chips 12 first communicate internally using the UCIe standard, and then communicate more broadly with other components in the system (such as a CPU, memory, or peripherals) through the PCIe bus integrated on the circuit board 10.

Refer to FIG. 4, which is a second schematic structural diagram of a high-computing-power storage module provided in an embodiment of the present application. As shown in FIG. 4, the plurality of PCIe circuit boards 10 in the high-computing-power storage module 1 form a tree-like PCIe network through a PCIe switch 20.

In some embodiments, the bus (not shown in the figure) of the first-level PCIe circuit board in the tree-like PCIe network is connected to the PCIe switch 20, and the PCIe switch 20 is connected to an external device 100.

In some embodiments, the tree-like PCIe network includes N levels of PCIe circuit boards, and each M-th level PCIe circuit board is connected to one (M+1)-th level PCIe circuit board, where N and M are positive integers and M is less than or equal to N minus one.

By way of example, N equals 2 in FIG. 4.

Specifically, with the PCIe switch, a lower-level PCIe circuit board receives the data sent by the external device 100 through the bus of the upper-level PCIe circuit board, but the data does not pass through the processing unit or storage unit of the upper-level board. The transmission path may look long, but its effect on transmission time is small. This preserves the transmission speed of the tree-like PCIe network, and the data sent by the external device 100 can be allocated quickly to the corresponding processing unit or storage unit.
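The tree arrangement described above can be modelled with a small data structure. This is a minimal sketch, not the applicant's implementation: the `Board` class, the `bus_path` helper, and the board names are all hypothetical, and the model only records which buses a transfer crosses on its way from the switch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Board:
    """Hypothetical model of one PCIe circuit board in the tree."""
    name: str
    parent: Optional["Board"] = field(default=None, repr=False, compare=False)
    children: List["Board"] = field(default_factory=list, repr=False, compare=False)

    def attach(self, child: "Board") -> "Board":
        """Connect `child` one level below this board and return it."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def level(self) -> int:
        """1 for the board wired to the switch, 2 for its children, and so on."""
        return 1 if self.parent is None else self.parent.level + 1


def bus_path(target: Board) -> List[str]:
    """Hops from the switch to `target`: only buses, never a processing or storage unit."""
    hops = []
    node = target
    while node is not None:
        hops.append(f"{node.name}.bus")
        node = node.parent
    return ["switch"] + hops[::-1]


first_level = Board("board1")
second_level = first_level.attach(Board("board2"))
print(second_level.level, bus_path(second_level))
```

With N equal to 2, as in FIG. 4, a transfer to the second-level board crosses the switch and two buses and touches no compute or storage element of the first-level board.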

Refer to FIG. 5, which is a third schematic structural diagram of a high-computing-power storage module provided in an embodiment of the present application. As shown in FIG. 5, the plurality of PCIe circuit boards 10 form a chain-like PCIe network, and the PCIe circuit board 10 at the head of the chain is connected to the external device 100.

In some embodiments, the PCIe buses of the PCIe circuit boards 10 in the chain-like PCIe network are interconnected, so that the head PCIe circuit board 10 can pass data on to the tail PCIe circuit board 10.

In some embodiments, because the chain-like PCIe network has no switch, each chip further includes a management unit. The management unit is connected to the PCIe bus, and the processing unit and the storage unit are connected to the management unit. The management unit receives data from the PCIe bus and controls the processing unit and the storage unit according to that data.

In this way, the management unit takes over data allocation from the PCIe switch, and computing power and storage space are integrated.
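The role of the management unit can be sketched as a small dispatcher. Everything here is an assumption made for illustration: the source does not define a packet format, so the `tag`, `key`, and `payload` fields are hypothetical, and summing the payload merely stands in for a real computing task.

```python
class ManagementUnit:
    """Hypothetical per-chip dispatcher that stands in for the PCIe switch."""

    def __init__(self):
        self.storage = {}   # stands in for the chip's storage unit
        self.results = []   # outputs produced by the chip's processing unit

    def on_bus_data(self, packet: dict) -> str:
        """Inspect data arriving from the PCIe bus and drive the matching unit."""
        if packet["tag"] == "storage":
            self.storage[packet["key"]] = packet["payload"]
            return "stored"
        if packet["tag"] == "operation":
            self.results.append(sum(packet["payload"]))  # placeholder computation
            return "computed"
        raise ValueError(f"unknown tag: {packet['tag']!r}")


unit = ManagementUnit()
print(unit.on_bus_data({"tag": "storage", "key": "blk-1", "payload": b"abc"}))
print(unit.on_bus_data({"tag": "operation", "payload": [1, 2, 3]}))
```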

The chain-like PCIe network has simple connections and needs no PCIe switch; when the number of PCIe circuit boards is small, it can still meet high computing power requirements.

Optionally, a chain-like PCIe network is used when the number of PCIe circuit boards is less than or equal to three, and a tree-like PCIe network is used when the number is greater than three.
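The selection rule above reduces to a single comparison. A minimal sketch follows; the function name and the string labels are hypothetical.

```python
def choose_topology(board_count: int) -> str:
    """Pick the network layout from the number of PCIe circuit boards."""
    if board_count < 1:
        raise ValueError("at least one circuit board is required")
    # Three boards or fewer: a switchless chain; more than three: a tree behind a switch.
    return "chain" if board_count <= 3 else "tree"


for count in (2, 3, 4, 8):
    print(count, choose_topology(count))
```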

Further, the PCIe protocol is full-duplex: data can travel in both directions at the same time, so both parties can send and receive simultaneously. To support this, each PCIe bus includes two independent communication channels, one for sending and one for receiving, so that the two data streams do not interfere with each other.

Optionally, the two independent communication channels together form one lane, and a PCIe bus may include different numbers of lanes, such as 1, 2, 4, 8, or 16.

Further, the PCIe protocol supports hot plugging, so while the PCIe network is communicating with the external device, PCIe circuit boards can be unplugged or inserted at any time to form PCIe networks with different numbers of boards.

Refer to FIG. 6, which is a schematic flowchart of a communication method for a high-computing-power storage module provided in an embodiment of the present application. The method is based on the high-computing-power storage module of the above embodiments, with the module connected to an external device. As shown in FIG. 6, the method 100 includes steps 110 to 130.

Step 110: receive data sent by the external device.

In some embodiments, the data sent by the external device may take the form of instructions, data to be written, computing tasks, and so on.

Step 120: determine the data tag of the data.

In some embodiments, data tags include operation tags and storage tags.

In some embodiments, operation tags include comprehensive operation tags and independent operation tags.

Step 130: process the data according to the data tag of the data.

In some embodiments, step 130 includes the following steps.

(1) When the data tag is a comprehensive operation tag among the operation tags, split the data into a plurality of sub-data and allocate each sub-data to a processing unit in a different chip.

(2) When the data tag is an independent operation tag among the operation tags, select one of the idle processing units as the target processing unit and allocate the data to the target processing unit.

In this way, the processing units can process data in parallel, making the fullest use of their computing power.
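The two operation-tag branches can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method itself: the source does not say how the data is split or how the idle unit is chosen, so the even split, the choice of the first idle unit, the tag strings, and the unit names are all hypothetical.

```python
from typing import Dict, List, Sequence


def split_evenly(data: Sequence, parts: int) -> List[list]:
    """Split `data` into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(data[start:end]))
        start = end
    return chunks


def dispatch_operation(tag: str, data: Sequence,
                       units: List[str], idle: List[str]) -> Dict[str, list]:
    """Map processing-unit names to the data each one should work on."""
    if tag == "comprehensive":
        # One sub-data per processing unit, each unit on a different chip.
        return dict(zip(units, split_evenly(data, len(units))))
    if tag == "independent":
        if not idle:
            raise RuntimeError("no idle processing unit available")
        # Assumption: take the first idle unit as the target processing unit.
        return {idle[0]: list(data)}
    raise ValueError(f"not an operation tag: {tag!r}")


print(dispatch_operation("comprehensive", [1, 2, 3, 4, 5], ["pu0", "pu1"], []))
print(dispatch_operation("independent", [1, 2, 3], ["pu0", "pu1"], ["pu1"]))
```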

In some embodiments, step 130 includes the following steps.

(1) When the data tag is a storage tag, determine the target storage unit according to the storage tag.

(2) Store the data in the target storage unit.

In this way, data can be stored in the storage units of different chips, which enlarges the available storage space.
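The storage branch can be sketched in the same style. The source does not specify what a storage tag contains, so the sketch assumes, purely for illustration, that the tag is a (board index, chip index) pair naming the target storage unit; `store_by_tag` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

StorageTag = Tuple[int, int]  # assumption: (board index, chip index)


def store_by_tag(storage_units: Dict[StorageTag, List[bytes]],
                 storage_tag: StorageTag, data: bytes) -> None:
    """Resolve the target storage unit from the tag, then store the data there."""
    if storage_tag not in storage_units:
        raise KeyError(f"no storage unit for tag {storage_tag}")
    storage_units[storage_tag].append(data)


# Two boards with one chip each, so the module exposes two storage units.
units: Dict[StorageTag, List[bytes]] = {(0, 0): [], (1, 0): []}
store_by_tag(units, (1, 0), b"payload")
print(units)
```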

In some embodiments, step 130 includes processing the data according to both the data tag of the data and the structure of the network.

Specifically, processing the data according to the data tag of the data and the structure of the network includes:

(1) if the data tag is a comprehensive operation tag among the operation tags and the network has a tree structure, splitting the data into a plurality of sub-data;

(2) copying each sub-data to generate a replica sub-data;

(3) allocating to each processing unit one sub-data and one replica sub-data that is a copy of a different sub-data;

(4) after the operation results of all processing units have been received, aggregating the first operation results, computed from the sub-data, and the second operation results, computed from the replica sub-data;

(5) when the first operation result and the second operation result are the same, sending the final operation result to the external device, where the final operation result is the same as the first operation result and the second operation result.

In this approach, a tree structure contains a relatively large number of circuit boards, so each sub-data obtained by splitting the data usually involves little computation, and having one processing unit compute two sub-data does not drag down the computing speed. After the results are aggregated, they are also checked against each other, which helps ensure the safety and reliability of the computation.
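Steps (1) to (5) above can be sketched as a redundancy check. The pairing rule is an assumption: the source only requires that each processing unit receive a replica of a sub-data other than its own, and the sketch picks the next unit's sub-data in rotation. The function name and the use of `sum` as the computing task are hypothetical.

```python
from typing import Callable, List, Sequence


def run_with_replicas(chunks: List[Sequence[int]],
                      compute: Callable[[Sequence[int]], int]) -> List[int]:
    """Compute every sub-data twice, on two different units, and compare the results."""
    n = len(chunks)
    if n < 2:
        raise ValueError("need at least two processing units for cross-checking")
    first = [None] * n   # results computed from the original sub-data
    second = [None] * n  # results computed from the replica sub-data
    for unit in range(n):
        own = unit
        replica = (unit + 1) % n  # assumption: rotate so the replica differs from `own`
        first[own] = compute(chunks[own])
        second[replica] = compute(list(chunks[replica]))  # list(...) makes the copy
    if first != second:
        raise RuntimeError("first and second operation results disagree")
    return first  # final result, identical to both sets of results


print(run_with_replicas([[1, 2], [3], [4, 5, 6]], sum))
```

The check only reports a mismatch; what the module does when the two results differ is not described in the source.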

In some embodiments, the external device connected to the high-computing-power storage module and/or the cache unit of each chip records a code for the data stored in each storage unit of each circuit board, which enables hot swapping. After a circuit board is unplugged, the module waits for a new circuit board to be inserted; once it is inserted, the data is restored to the storage units of the new circuit board according to the codes of the data that was stored in each storage unit of the unplugged circuit board.

In some embodiments, a dedicated code storage area is set aside in the cache unit of the first-level PCIe circuit board of the tree network to store the codes of the data stored in each storage unit.
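The recovery flow can be sketched as a lookup over the recorded codes. The source does not say where the restored data comes from, so the sketch assumes a hypothetical `fetch` callback that can supply the data for a given code (for example, from the external device); the catalog layout, slot names, and codes are also illustrative.

```python
from typing import Callable, Dict, List


def restore_board(catalog: Dict[str, List[str]], slot: str,
                  fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Rebuild the contents of the board in `slot` from its recorded data codes."""
    return {code: fetch(code) for code in catalog.get(slot, [])}


# Code storage area: which data codes each board slot was holding.
catalog = {"board2": ["blk-7", "blk-9"]}
# Stand-in for wherever the data can be re-read from (an assumption).
backup = {"blk-7": b"alpha", "blk-9": b"beta"}

restored = restore_board(catalog, "board2", backup.__getitem__)
print(restored)
```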

In summary, the present application provides a high-computing-power storage module and a communication method. The high-computing-power storage module comprises a plurality of circuit boards that form a network; each circuit board includes a bus and a plurality of chips; each chip includes a processing unit and a storage unit, both of which are connected to the bus. Because the circuit boards form a network, the computing power of the storage module is increased, which makes the module particularly suitable for application scenarios that demand high computing power.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, and some of their technical features can be replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (3)

1. A communication method for a high-computing-power storage module, characterized in that the method is applied to a high-computing-power storage module, the high-computing-power storage module comprising: a plurality of circuit boards, the plurality of circuit boards forming a network;
each circuit board comprises a bus and a plurality of chips;
each chip comprises a processing unit and a storage unit, the processing unit and the storage unit in each chip being connected to the bus;
the circuit boards are PCIe circuit boards; N and M are positive integers, N is greater than or equal to 2, and M is less than or equal to N minus 1;
when the number of PCIe circuit boards is greater than 3, the plurality of PCIe circuit boards form a tree-like PCIe network through a PCIe switch; the bus of each first-level PCIe circuit board in the tree-like PCIe network is connected to the PCIe switch; the PCIe switch is connected to an external device;
the tree-like PCIe network comprises N levels of PCIe circuit boards, each M-th-level PCIe circuit board being connected to one (M+1)-th-level PCIe circuit board;
each chip further comprises at least one cache unit, the processing unit is connected to the cache unit, and the cache unit is a non-volatile memory;
the processing unit in each chip completes its computing tasks using only the data in the cache unit and the storage unit of the chip in which it is located;
the method comprises:
receiving data sent by the external device;
determining the data tag of the data;
processing the data according to the data tag of the data;
wherein processing the data according to the data tag of the data comprises:
when the data tag of the data is a coordinated operation tag among the operation tags and the network structure of the network is a tree structure, splitting the data into a plurality of sub-data;
copying each sub-data to generate one replicated sub-data;
allocating to each processing unit one sub-data and one replicated sub-data that differs from that sub-data;
after the operation results of all processing units have been received, aggregating the first operation results of the sub-data and the second operation results of the replicated sub-data;
when the first operation result and the second operation result are the same, sending a final operation result to the external device;
the method further comprises:
setting aside a code name storage area in the cache unit of the first-level PCIe circuit board of the tree network to store the code names of the data stored in each storage unit of the PCIe circuit boards; after a PCIe circuit board is removed, waiting for a new circuit board to be inserted; and after the new circuit board is inserted, restoring the data to the storage units of the new PCIe circuit board according to the code names, stored in any first-level PCIe circuit board, of the data stored in each storage unit of the removed PCIe circuit board.

2. The communication method according to claim 1, characterized in that processing the data according to the data tag of the data comprises:
when the data tag is an independent operation tag among the operation tags, determining one of the idle processing units as a target processing unit, and allocating the data to the target processing unit.

3. The communication method according to claim 1, characterized in that processing the data according to the data tag of the data comprises:
when the data tag is a storage tag, determining a target storage unit according to the storage tag;
storing the data in the target storage unit.
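
The tag-based handling recited in claims 1 to 3 can be sketched as a small dispatcher. This is a hedged illustration rather than the claimed implementation: the tag strings (`"coordinated"`, `"independent"`, `"storage:<unit>"`), the function names, and the round-robin pairing of each processing unit with the replica of its neighbor's sub-data are all assumptions. The claims also do not state what happens when the two results differ, so the sketch simply returns `None` in that case.

```python
def split_data(data, parts):
    """Split a list into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def coordinated_compute(data, n_units, func, combine):
    """Compute every sub-data twice, on two different units, and compare."""
    if n_units < 2:
        raise ValueError("need at least two processing units")
    subs = split_data(data, n_units)
    replicas = [list(s) for s in subs]  # one replicated sub-data per sub-data
    first, second = {}, {}
    for unit in range(n_units):
        # Unit `unit` gets sub-data `unit` plus the replica of a DIFFERENT
        # sub-data (here: its neighbor's), so no unit checks its own work.
        other = (unit + 1) % n_units
        first[unit] = func(subs[unit])
        second[other] = func(replicas[other])
    if any(first[i] != second[i] for i in range(n_units)):
        return None  # mismatch handling is not specified; assumption
    return combine(first[i] for i in range(n_units))


def dispatch(tag, data, n_units, idle_units, storage_units):
    """Route incoming data according to its (illustrative) tag."""
    kind, _, target = tag.partition(":")
    if kind == "coordinated":
        return coordinated_compute(data, n_units, sum, sum)
    if kind == "independent":
        return ("assigned", idle_units[0])  # any idle unit may be the target
    if kind == "storage":
        storage_units[target] = data
        return ("stored", target)
    raise ValueError(f"unknown tag: {tag}")


units = {}
total = dispatch("coordinated", list(range(10)), 4, [], units)
assigned = dispatch("independent", [1, 2], 4, ["unit-3"], units)
stored = dispatch("storage:unit-0", [7, 8], 4, [], units)
```

Pairing each unit with a neighbor's replica means a fault in a single processing unit shows up as a disagreement between two independently computed results instead of passing silently.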

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410964538.8A CN118503191B (en) 2024-07-18 2024-07-18 High-power storage module and communication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410964538.8A CN118503191B (en) 2024-07-18 2024-07-18 High-power storage module and communication method

Publications (2)

Publication Number Publication Date
CN118503191A (en) 2024-08-16
CN118503191B (en) 2024-10-29

Family

ID=92231723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410964538.8A Active CN118503191B (en) 2024-07-18 2024-07-18 High-power storage module and communication method

Country Status (1)

Country Link
CN (1) CN118503191B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112201A (en) * 2023-07-31 2023-11-24 中国电信股份有限公司技术创新中心 Hardware resource scheduling method, device, computer equipment and storage medium
CN118260235A (en) * 2024-04-22 2024-06-28 原粒(北京)半导体技术有限公司 Force calculation acceleration card, design method and force calculation server

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360323B (en) * 2021-07-02 2024-11-29 西安紫光国芯半导体股份有限公司 Many-core computing circuit, stacked chip and fault-tolerant control method
CN115221111A (en) * 2022-07-13 2022-10-21 中国电信股份有限公司 Processor data operation method, processor, device and storage medium
CN218446658U (en) * 2022-07-25 2023-02-03 深圳矽递科技股份有限公司 Board card assembly with high calculation power
CN118034780A (en) * 2024-02-06 2024-05-14 电子科技大学 Nonvolatile multi-core heterogeneous integrated memory internal computing acceleration system
CN118295960B (en) * 2024-06-03 2024-09-03 芯方舟(上海)集成电路有限公司 Force calculating chip, design method and manufacturing method thereof and force calculating chip system

Also Published As

Publication number Publication date
CN118503191A (en) 2024-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant