CN109002411B - Method and system for automatically configuring GPU expansion box, and GPU expansion box that can be automatically configured - Google Patents
- Publication number: CN109002411B
- Application number: CN201810824831.9A
- Authority: CN (China)
- Prior art keywords: expansion box, gpu expansion, gpu, bmc, host
- Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4004—Coupling between buses
- G06F13/4009—Coupling between buses with data restructuring
- G06F13/4018—Coupling between buses with data restructuring with data-width conversion
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/18—Packaging or power distribution
- G06F1/183—Internal mounting support structures, e.g. for printed circuit boards, internal connecting means
- G06F1/185—Mounting of expansion boards
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Power Engineering (AREA)
- Human Computer Interaction (AREA)
- Information Transfer Systems (AREA)
Abstract
The present application discloses a method and system for automatically configuring a GPU expansion box, and an automatically configurable GPU expansion box. The method includes: establishing a mapping relationship between the connection topology of a GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box; detecting, by the BMC of the host or the BMC of the GPU expansion box, the connection topology of the current GPU expansion box using I2C signals; and configuring the PCIE switch chip in the current GPU expansion box according to the detected connection topology and the mapping relationship. The system includes a mapping relationship establishment module, a detection module, and a configuration module. The automatically configurable GPU expansion box contains its own BMC; the host and the GPU expansion box are communicatively connected by cables, and an I2C bus is provided on the cables between the host and the GPU expansion box and between GPU expansion boxes at successive levels. The present application spares the user from configuring the GPU expansion box manually, greatly improves the accuracy and efficiency of configuration, and helps improve computer performance.
Description
Technical Field
The present application relates to the technical field of server system design, and in particular to a method and system for automatically configuring a GPU expansion box, and to an automatically configurable GPU expansion box.
Background Art
With the rapid development of big data, cloud computing, and artificial intelligence, systems place ever higher demands on server computing performance. Owing to its advantages in data computation, the GPU (Graphics Processing Unit) is increasingly widely used in servers. To add more GPUs, multiple GPUs are usually integrated into one expansion box, forming a GPU expansion box; this pools GPU resources and makes them easier to schedule.
A GPU expansion box typically uses a PCIE switch chip to expand the limited number of PCIE (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) interfaces on the CPU into multiple PCIE interfaces, so that more GPUs can be connected, completing GPU expansion and GPU resource pooling. The way the GPUs are connected to the CPU through PCIE interfaces is the connection topology of the GPU expansion box. In practice, different workloads call for different connection topologies, so the topology must be changed when the workload changes.
Current GPU expansion boxes use a fixed connection topology. When a different workload requires a different topology, the user must first change the physical cabling of the chassis as required and then configure the GPU expansion box manually to realize the new topology.
Because this manual configuration is required every time the connection topology changes, the procedure is cumbersome and error-prone, configuration accuracy and efficiency are low, and computer performance suffers as a result.
Summary of the Invention
The present application provides a method and system for automatically configuring a GPU expansion box, and an automatically configurable GPU expansion box, to solve the prior-art problems of low configuration accuracy and low configuration efficiency of GPU expansion boxes.
To solve the above technical problems, the embodiments of the present application disclose the following technical solutions:
A method for automatically configuring a GPU expansion box, wherein the GPU expansion box is provided with a PCIE switch chip having a plurality of ports: a first port for connecting to a host or an upper-level GPU expansion box, a second port that is reserved as a spare or used to connect a next-level GPU expansion box or another host, and the remaining ports for connecting the GPUs in the current GPU expansion box; characterized in that the method comprises:
establishing a mapping relationship between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box, where the connection topology of the GPU expansion box includes a direct connection mode, a cascade mode, and an uplink mode;
detecting, by the BMC (Baseboard Management Controller) of the host or the BMC of the GPU expansion box, the connection topology of the current GPU expansion box using I2C signals; and
configuring, by the BMC of the host or the BMC of the GPU expansion box, the PCIE switch chip in the current GPU expansion box according to the connection topology of the current GPU expansion box and the mapping relationship.
Optionally, before the BMC of the host or the BMC of the GPU expansion box detects the connection topology of the current GPU expansion box using I2C signals, the method further includes:
establishing a communication connection between the host BMC and the GPU expansion box BMC by adding I2C signals to the cable between the host and the GPU expansion box; and
establishing a communication connection between the BMC of the upper-level GPU expansion box and the BMC of the next-level GPU expansion box by adding I2C signals to the cable between the two boxes.
Optionally, the mapping relationship is as follows:
the first connection topology of the GPU expansion box matches the first configuration, where the first connection topology is the direct connection mode and the first configuration is: the first port on the PCIE switch chip is connected to a host, and the second port is connected to no host or GPU expansion box;
the second connection topology of the GPU expansion box matches the second configuration, where the second connection topology is the first level of the cascade mode and the second configuration is: the first port on the PCIE switch chip is connected to a host, and the second port is connected to the next-level GPU expansion box;
the third connection topology of the GPU expansion box matches the third configuration, where the third connection topology is the Nth level of the cascade mode (N ≥ 2, N a natural number) and the third configuration is: the first port on the PCIE switch chip is connected to the upper-level GPU expansion box;
the fourth connection topology of the GPU expansion box matches the fourth configuration, where the fourth connection topology is the uplink mode and the fourth configuration is: the first port on the PCIE switch chip is connected to one host, and the second port is connected to another host.
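The four topology-to-configuration pairs above amount to a lookup table. The following is a minimal Python sketch of that table, not the patent's implementation; the `Topology`, `Peer`, and `PortConfig` names and the string labels are hypothetical, and `None` marks a port the patent leaves unconstrained (the second port in the third configuration).

```python
from enum import Enum
from typing import NamedTuple, Optional


class Topology(Enum):
    """Connection topologies named in the mapping relationship (hypothetical labels)."""
    DIRECT = "direct connection mode"
    CASCADE_FIRST = "cascade mode, first level"
    CASCADE_NTH = "cascade mode, Nth level (N >= 2)"
    UPLINK = "uplink mode"


class Peer(Enum):
    """What a PCIE switch port is cabled to."""
    NOTHING = "no host or GPU expansion box"
    HOST = "host"
    BOX = "GPU expansion box"


class PortConfig(NamedTuple):
    name: str
    first_port: Peer
    second_port: Optional[Peer]  # None: not constrained by this configuration


# Mapping relationship: one PCIE switch configuration per connection topology.
MAPPING = {
    Topology.DIRECT: PortConfig("first configuration", Peer.HOST, Peer.NOTHING),
    Topology.CASCADE_FIRST: PortConfig("second configuration", Peer.HOST, Peer.BOX),
    Topology.CASCADE_NTH: PortConfig("third configuration", Peer.BOX, None),
    Topology.UPLINK: PortConfig("fourth configuration", Peer.HOST, Peer.HOST),
}

# Example: look up the configuration required for the uplink mode.
print(MAPPING[Topology.UPLINK].name)  # fourth configuration
```

Keeping the table as data rather than scattered conditionals mirrors the patent's idea of establishing the mapping once and consulting it after each detection.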
Optionally, when the connection topology of the GPU expansion box is the direct connection mode or the uplink mode, detecting the connection topology of the current GPU expansion box by the BMC of the host or the BMC of the GPU expansion box using I2C signals includes:
the BMC of the host scans the first port and the second port of the PCIE switch chip in the GPU expansion box using I2C signals and determines whether the BMC of a host or the BMC of a GPU expansion box is detected; or
the BMC of the GPU expansion box scans the first port and the second port of the PCIE switch chip in that GPU expansion box using I2C signals and determines whether the BMC of a host or the BMC of a GPU expansion box is detected.
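The scan above only needs to report, for the first and second ports, whether a host BMC, a box BMC, or nothing answered. The sketch below assumes a hypothetical `probe(port)` callable that performs the I2C query over the cable on that port and returns an identity string prefixed with `host:` or `box:`; both the callable and the prefixes are illustrative assumptions, and `fake_probe` merely stands in for real bus traffic.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical: queries the BMC at the far end of the cable on the given port over
# I2C and returns its identity string, or None if nothing answers.
Probe = Callable[[str], Optional[str]]


def classify(identity: Optional[str]) -> str:
    """Reduce a probe result to 'none', 'host' or 'box' (prefixes are assumed)."""
    if identity is None:
        return "none"
    if identity.startswith("host:"):
        return "host"
    if identity.startswith("box:"):
        return "box"
    raise ValueError(f"unrecognized BMC identity: {identity!r}")


def scan_first_and_second_port(probe: Probe) -> Tuple[str, str]:
    """Scan only the first and second ports; the remaining ports carry GPUs."""
    return classify(probe("first")), classify(probe("second"))


def fake_probe(answers: Dict[str, str]) -> Probe:
    """Stand-in for real I2C traffic, for illustration only."""
    return lambda port: answers.get(port)


# Direct connection mode: only the first port reaches a host BMC.
print(scan_first_and_second_port(fake_probe({"first": "host:ServerA"})))
# ('host', 'none')
# Dual-uplink mode: both ports reach host BMCs.
print(scan_first_and_second_port(fake_probe({"first": "host:ServerA", "second": "host:ServerB"})))
# ('host', 'host')
```

Injecting the probe keeps the scan logic independent of any particular I2C driver, so the same function could run on the host BMC or on the box BMC.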
Optionally, when the connection topology of the GPU expansion box is the cascade mode, detecting the connection topology of the current GPU expansion box by the BMC of the host using I2C signals includes:
the BMC of the host scans the first port and the second port of the PCIE switch chip in the first-level GPU expansion box using I2C signals and determines whether the BMC of the host or the BMC of the first-level GPU expansion box is detected; and
the BMC of the upper-level GPU expansion box scans the first port and the second port of the PCIE switch chip in the next-level GPU expansion box using I2C signals and determines whether the BMC of the next-level GPU expansion box is detected.
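In the host-driven cascade case, the host BMC finds the first-level box and each upper-level box BMC then reports the next-level box it detected, so the whole chain can be walked level by level. A minimal sketch, assuming the per-box scan results are already collected in a dictionary; `next_level_seen` and the `max_levels` safety bound are hypothetical, not taken from the source.

```python
from typing import Dict, List, Optional


def discover_cascade(
    first_level: Optional[str],
    next_level_seen: Dict[str, Optional[str]],
    max_levels: int = 16,
) -> List[str]:
    """Return the cascade as an ordered list; index 0 is the first-level box.

    first_level: the box BMC found when the host BMC scanned the first-level box.
    next_level_seen: for each box, the next-level box BMC its own BMC detected
        (None if the scan found nothing).
    max_levels: hypothetical bound that guards against mis-cabled loops.
    """
    chain: List[str] = []
    current = first_level
    while current is not None:
        if current in chain:
            raise RuntimeError(f"cabling loop detected at {current}")
        if len(chain) >= max_levels:
            raise RuntimeError("cascade deeper than max_levels")
        chain.append(current)
        current = next_level_seen.get(current)
    return chain


chain = discover_cascade("GPU box0", {"GPU box0": "GPU box1", "GPU box1": None})
for level, box in enumerate(chain, start=1):
    print(level, box)
# 1 GPU box0
# 2 GPU box1
```

The position in the returned list gives each box's cascade level, which is what the mapping relationship needs to distinguish the first level from level N ≥ 2.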
Optionally, when the connection topology of the GPU expansion box is the cascade mode, detecting the connection topology of the current GPU expansion box by the BMC of the GPU expansion box using I2C signals includes:
the BMC of the first-level GPU expansion box scans the first port and the second port of the PCIE switch chip in the first-level GPU expansion box using I2C signals and determines whether the BMC of the host or the BMC of the first-level GPU expansion box is detected; and
the BMC of the upper-level GPU expansion box scans the first port and the second port of the PCIE switch chip in the next-level GPU expansion box using I2C signals and determines whether the BMC of the next-level GPU expansion box is detected.
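When the box BMCs perform the detection themselves, each box can work out its own cascade level from what its first port sees: a host BMC means level 1, and an upper-level box BMC means one level below that box. The sketch below illustrates this under an assumed data model (`upstream` maps each box to the marker `HOST` or to the name of its upper-level box); the source does not prescribe how the levels are computed.

```python
from typing import Dict

HOST = "host"  # hypothetical marker: the first port detected a host BMC


def assign_levels(upstream: Dict[str, str]) -> Dict[str, int]:
    """Derive each box's cascade level from what its own first port detected.

    upstream maps each box to HOST or to the name of the upper-level box whose
    BMC answered on its first port (an assumed data model).
    """
    levels: Dict[str, int] = {}

    def level_of(box: str, visiting: frozenset) -> int:
        if box in levels:
            return levels[box]
        if box in visiting:
            raise ValueError(f"cycle in cascade at {box}")
        peer = upstream[box]
        level = 1 if peer == HOST else level_of(peer, visiting | {box}) + 1
        levels[box] = level
        return level

    for box in upstream:
        level_of(box, frozenset())
    return levels


print(assign_levels({"GPU box2": "GPU box1", "GPU box1": "GPU box0", "GPU box0": HOST}))
# {'GPU box0': 1, 'GPU box1': 2, 'GPU box2': 3}
```

A level-1 box then takes the first or second configuration depending on its second port, and any box at level 2 or deeper takes the third configuration.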
Optionally, configuring the PCIE switch chip in the GPU expansion box according to the connection topology of the current GPU expansion box and the mapping relationship includes:
if, on the PCIE switch chip of the current GPU expansion box, only the first port detects the BMC of a host and the second port is connected to no host or GPU expansion box, setting the PCIE switch chip of the current GPU expansion box to the first configuration;
if, on the PCIE switch chip of the current GPU expansion box, the first port detects the BMC of a host and the second port is connected to a next-level GPU expansion box, setting the PCIE switch chip of the current GPU expansion box to the second configuration;
if, on the PCIE switch chip of the current GPU expansion box, only the first port detects the BMC of an upper-level GPU expansion box, setting the PCIE switch chip of the current GPU expansion box to the third configuration;
if, on the PCIE switch chip of the current GPU expansion box, the first port detects the BMC of one host and the second port detects the BMC of another host, setting the PCIE switch chip of the current GPU expansion box to the fourth configuration.
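The four rules can be read as a single decision function over one box's scan result. This is a hedged sketch, not the patent's code: it assumes that an Nth-level box may still have a further box on its second port (the third configuration constrains only the first port), and it rejects any other combination instead of guessing a configuration.

```python
def select_configuration(first_port: str, second_port: str) -> str:
    """Apply the four configuration rules to one box's scan result.

    Each argument is what the BMC found on that port: "host" (a host BMC),
    "box" (a GPU expansion box BMC) or "none".
    """
    if first_port == "host" and second_port == "none":
        return "first configuration"   # direct connection mode
    if first_port == "host" and second_port == "box":
        return "second configuration"  # first level of a cascade
    if first_port == "host" and second_port == "host":
        return "fourth configuration"  # uplink mode with two hosts
    if first_port == "box" and second_port in ("none", "box"):
        return "third configuration"   # Nth level of a cascade, N >= 2
    raise ValueError(f"unsupported port state: ({first_port!r}, {second_port!r})")


print(select_configuration("host", "box"))  # second configuration
```

Raising on an unrecognized state (for example, nothing on the first port) leaves the switch untouched rather than applying a configuration that matches no known topology.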
A system for automatically configuring a GPU expansion box, wherein the GPU expansion box is provided with a PCIE switch chip having a plurality of ports: a first port for connecting to a host or an upper-level GPU expansion box, a second port that is reserved as a spare or used to connect a next-level GPU expansion box or another host, and the remaining ports for connecting the GPUs in the current GPU expansion box; the system includes:
a mapping relationship establishment module, configured to establish the mapping relationship between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box;
a detection module, configured to detect the connection topology of the current GPU expansion box using I2C signals; and
a configuration module, configured to configure the PCIE switch chip in the GPU expansion box according to the connection topology of the current GPU expansion box and the mapping relationship.
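The three modules can be sketched as small cooperating classes. In this illustration the I2C scan and the step that actually programs the PCIE switch chip are injected as callables (`scan` and `apply`), because the source does not specify either interface; all class names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

PortState = Tuple[str, str]  # (first port, second port): "host", "box" or "none"


class MappingModule:
    """Holds the topology-to-configuration mapping relationship."""

    def __init__(self) -> None:
        self.table: Dict[PortState, str] = {
            ("host", "none"): "first configuration",
            ("host", "box"): "second configuration",
            ("box", "none"): "third configuration",
            ("box", "box"): "third configuration",
            ("host", "host"): "fourth configuration",
        }


class DetectionModule:
    """Wraps an I2C scan of the first and second ports (injected, hypothetical)."""

    def __init__(self, scan: Callable[[], PortState]) -> None:
        self.scan = scan

    def detect(self) -> PortState:
        return self.scan()


class ConfigurationModule:
    """Looks up the detected state and applies the matching configuration."""

    def __init__(self, mapping: MappingModule, apply: Callable[[str], None]) -> None:
        self.mapping = mapping
        self.apply = apply  # hypothetical hook that programs the PCIE switch chip

    def configure(self, state: PortState) -> str:
        config = self.mapping.table.get(state)
        if config is None:
            raise ValueError(f"no configuration mapped for {state}")
        self.apply(config)
        return config


class AutoConfigSystem:
    """Wires the three modules together, as each BMC would host them."""

    def __init__(self, scan: Callable[[], PortState], apply: Callable[[str], None]) -> None:
        self.mapping = MappingModule()
        self.detection = DetectionModule(scan)
        self.configuration = ConfigurationModule(self.mapping, apply)

    def run_once(self) -> str:
        return self.configuration.configure(self.detection.detect())


applied: List[str] = []
system = AutoConfigSystem(scan=lambda: ("host", "box"), apply=applied.append)
print(system.run_once(), applied)
# second configuration ['second configuration']
```

Because the scan and apply hooks are injected, the same system object could run on either the host BMC or the box BMC, matching the statement that both BMCs carry the system.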
Optionally, the mapping relationship is as follows:
the first connection topology of the GPU expansion box matches the first configuration, where the first connection topology is the direct connection mode and the first configuration is: the first port on the PCIE switch chip is connected to a host, and the second port is connected to no host or GPU expansion box;
the second connection topology of the GPU expansion box matches the second configuration, where the second connection topology is the first level of the cascade mode and the second configuration is: the first port on the PCIE switch chip is connected to a host, and the second port is connected to the next-level GPU expansion box;
the third connection topology of the GPU expansion box matches the third configuration, where the third connection topology is the Nth level of the cascade mode (N ≥ 2, N a natural number) and the third configuration is: the first port on the PCIE switch chip is connected to the upper-level GPU expansion box;
the fourth connection topology of the GPU expansion box matches the fourth configuration, where the fourth connection topology is the uplink mode and the fourth configuration is: the first port on the PCIE switch chip is connected to one host, and the second port is connected to another host.
An automatically configurable GPU expansion box, wherein the GPU expansion box is provided with a PCIE switch chip having a plurality of ports: a first port for connecting to a host or an upper-level GPU expansion box, a second port that is reserved as a spare or used to connect a next-level GPU expansion box or another host, and the remaining ports for connecting the GPUs in the current GPU expansion box;
the host is provided with a host BMC, configured to monitor the GPU expansion box and the host and to configure the PCIE switch chip in the GPU expansion box according to the connection topology of the GPU expansion box;
the GPU expansion box is further provided with a GPU expansion box BMC, configured to monitor the GPU expansion box and the host and to configure the PCIE switch chip in the GPU expansion box according to the connection topology of the GPU expansion box;
the host and the GPU expansion box are communicatively connected by a cable, and an I2C bus is provided on the cables between the host and the GPU expansion box and between GPU expansion boxes at successive levels; the I2C bus establishes the communication connection between the host BMC and the GPU expansion box BMC, and between the BMC of an upper-level GPU expansion box and the BMC of the next-level GPU expansion box; and
both the host BMC and the GPU expansion box BMC are provided with the above system for automatically configuring a GPU expansion box.
The technical solutions provided by the embodiments of the present application may have the following beneficial effects:
The present application provides a method for automatically configuring a GPU expansion box. The method first establishes a mapping relationship between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box; the BMC of the host or the BMC of the GPU expansion box then detects the connection topology of the current GPU expansion box using I2C signals; finally, the PCIE switch chip in the current GPU expansion box is configured according to the detected connection topology and the mapping relationship. In other words, the application first establishes a one-to-one correspondence between connection topologies and configurations of the PCIE switch chip in the GPU expansion box, and then has the BMC chips in the GPU expansion box and in the host drive I2C signals to scan the ports of the GPU expansion box, thereby monitoring the computing system formed by the GPU expansion box and the host in real time. When the connection topology of the system changes, the BMC configures the expansion box according to the actual connection topology, completing the reconfiguration of the entire computing system. Because the BMC uses I2C signals to monitor the system in real time and performs the reconfiguration automatically, the user no longer needs to configure the GPU expansion box manually; configuration accuracy and efficiency are therefore greatly improved, which helps improve computer performance.
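The real-time monitoring described above can be pictured as a loop that compares each new scan with the last applied topology and reconfigures only on a change. This is a sketch under assumptions: `samples` stands in for periodic I2C scans (the source gives no polling interval), and reconfiguring only when the state changes is a design choice of this example rather than something the source specifies.

```python
from typing import Callable, Hashable, Iterable

_NO_STATE = object()  # sentinel: nothing has been applied yet


def monitor(samples: Iterable[Hashable], reconfigure: Callable[[Hashable], None]) -> int:
    """Reconfigure only when the detected topology differs from the last applied one.

    samples stands in for periodic I2C scans; a real BMC would poll on a timer.
    Returns the number of reconfigurations performed.
    """
    last = _NO_STATE
    count = 0
    for state in samples:
        if state != last:
            reconfigure(state)
            last = state
            count += 1
    return count


log = []
scans = [("host", "none"), ("host", "none"), ("host", "box"), ("host", "box"), ("host", "none")]
print(monitor(scans, log.append))  # 3
print(log)  # [('host', 'none'), ('host', 'box'), ('host', 'none')]
```

Reacting only to changes avoids reprogramming the switch on every poll while still following a re-cabling (here, a second box being attached and later removed).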
The present application also provides a system for automatically configuring a GPU expansion box, which mainly includes a mapping relationship establishment module, a detection module, and a configuration module. The mapping relationship establishment module establishes the mapping relationship between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box; the detection module then detects the connection topology of the current GPU expansion box using I2C signals; finally, the configuration module configures the PCIE switch chip in the GPU expansion box according to the detection result of the detection module and the mapping relationship established by the mapping relationship establishment module, that is, according to the current connection topology of the GPU expansion box. With these three modules, the BMC can be fully used to monitor the computing system in real time and reconfigure it automatically, sparing the user from configuring the GPU expansion box manually; configuration accuracy and efficiency are therefore greatly improved, which helps improve computer performance.
The present application also provides an automatically configurable GPU expansion box. A BMC is provided in the GPU expansion box and another in the host, and both BMCs are provided with the above system for automatically configuring a GPU expansion box, which includes a mapping relationship establishment module, a detection module, and a configuration module. The I2C bus in the GPU expansion box enables communication between the GPU expansion box BMC and the host BMC, so that the BMCs can monitor the computing system in real time and reconfigure it automatically. The user therefore need not configure the GPU expansion box manually, configuration accuracy and efficiency are greatly improved, and computer performance benefits.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present application.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a method for automatically configuring a GPU expansion box according to an embodiment of the present application;
FIG. 2 is a connection topology diagram of the GPU expansion box in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a system for automatically configuring a GPU expansion box according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an automatically configurable GPU expansion box according to an embodiment of the present application.
Detailed Description of the Embodiments
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present application.
For a better understanding of the present application, its embodiments are explained in detail below with reference to the drawings.
Embodiment 1
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for automatically configuring a GPU expansion box according to an embodiment of the present application. As shown in FIG. 1, the method in this embodiment mainly includes the following steps:
S1: Establish a mapping relationship between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box. The connection topology of the GPU expansion box is mainly one of three types: the direct connection mode, the cascade mode, and the uplink mode.
The GPU expansion box in this embodiment is provided with a PCIE switch chip having a plurality of ports: the first port is used to connect to the host or the upper-level GPU expansion box; the second port is reserved as a spare or used to connect the next-level GPU expansion box or another host; and the remaining ports are used to connect the GPUs in the current GPU expansion box.
When using a GPU expansion box, the user changes the connection topology according to actual workload requirements, which requires switching between the various modes. Usually the user first re-cables the physical connections to switch the connection topology of the GPU expansion box, and then reconfigures the PCIE switch chip according to the new topology. This embodiment instead lets the BMC configure the PCIE switch chip automatically.
The three types of connection topology are shown in FIG. 2. In this embodiment, the GPU expansion box and the host form a computing system, which usually has one of three types of connection topology; these are what this embodiment calls the connection topology of the GPU expansion box. In the direct connection mode, a host ServerA is connected to one GPU expansion box through CPU port Pa0. In the cascade mode, host ServerA is connected through CPU port Pa0 to port P0 of the first-level GPU expansion box GPU box0; GPU box0 is connected through another port, P1, to port P0 of the next-level GPU expansion box GPU box1; and so on, so the cascade mode can include multiple levels of GPU expansion boxes. The uplink mode can have dual or multiple uplinks; taking the dual-uplink mode as an example, host ServerA is connected through CPU port Pa0 to port P0 of the GPU expansion box, and host ServerB is connected through CPU port Pb0 to the other port P1 of the GPU expansion box.
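The three FIG. 2 topologies can be written as cable lists and run through the mapping to see which configuration each box should receive. This sketch uses the port and node names from the description (ServerA, ServerB, Pa0, Pb0, P0, P1, GPU box0, GPU box1); the name "GPU box" for the uplink box and the rule that host names start with "Server" are assumptions made for illustration.

```python
from typing import List, Tuple

# (node_a, port_a, node_b, port_b): one cable between two ports.
Link = Tuple[str, str, str, str]


def port_peers(links: List[Link], box: str) -> Tuple[str, str]:
    """Return what the box's P0 (first port) and P1 (second port) are cabled to.

    Assumes, for illustration, that host names start with "Server" as in FIG. 2.
    """
    peers = {"P0": "none", "P1": "none"}
    for a, port_a, b, port_b in links:
        for node, port, other in ((a, port_a, b), (b, port_b, a)):
            if node == box and port in peers:
                peers[port] = "host" if other.startswith("Server") else "box"
    return peers["P0"], peers["P1"]


def configuration_for(p0: str, p1: str) -> str:
    """Map a (P0, P1) state onto the four configurations."""
    if p0 == "host":
        return {"none": "first", "box": "second", "host": "fourth"}[p1] + " configuration"
    if p0 == "box":
        return "third configuration"
    raise ValueError(f"first port not connected upstream: {(p0, p1)}")


# The three topologies of FIG. 2, written as cable lists.
DIRECT: List[Link] = [("ServerA", "Pa0", "GPU box0", "P0")]
CASCADE: List[Link] = [
    ("ServerA", "Pa0", "GPU box0", "P0"),
    ("GPU box0", "P1", "GPU box1", "P0"),
]
DUAL_UPLINK: List[Link] = [
    ("ServerA", "Pa0", "GPU box", "P0"),
    ("ServerB", "Pb0", "GPU box", "P1"),
]

for name, links, boxes in [
    ("direct", DIRECT, ["GPU box0"]),
    ("cascade", CASCADE, ["GPU box0", "GPU box1"]),
    ("dual uplink", DUAL_UPLINK, ["GPU box"]),
]:
    for box in boxes:
        print(name, box, configuration_for(*port_peers(links, box)))
# direct GPU box0 first configuration
# cascade GPU box0 second configuration
# cascade GPU box1 third configuration
# dual uplink GPU box fourth configuration
```

Starting from the cabling rather than from a scan result shows the intended end state for each topology, which is what the BMC's I2C detection is meant to recover automatically.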
In this embodiment, the mapping relationship between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the GPU expansion box is as follows:
the first connection topology of the GPU expansion box matches the first configuration, where the first connection topology is the direct connection mode and the first configuration is: the first port on the PCIE switch chip is connected to a host, and the second port is connected to no host or GPU expansion box;
the second connection topology of the GPU expansion box matches the second configuration, where the second connection topology is the first level of the cascade mode and the second configuration is: the first port on the PCIE switch chip is connected to a host, and the second port is connected to the next-level GPU expansion box;
the third connection topology of the GPU expansion box matches the third configuration, where the third connection topology is the Nth level of the cascade mode (N ≥ 2, N a natural number) and the third configuration is: the first port on the PCIE switch chip is connected to the upper-level GPU expansion box;
the fourth connection topology of the GPU expansion box matches the fourth configuration, where the fourth connection topology is the uplink mode and the fourth configuration is: the first port on the PCIE switch chip is connected to one host, and the second port is connected to another host.
Further, before step S1, this embodiment also includes step S01: add I2C signals to the cable between the host and the GPU expansion box to establish a communication link between the host BMC and the GPU expansion box BMC;
S02: add I2C signals to the cable between an upper-level GPU expansion box and the next-level GPU expansion box to establish a communication link between the BMC of the upper-level box and the BMC of the next-level box.
Step S01 routes an I2C bus over the cable between the host and the GPU expansion box, so the host BMC and the box BMC can communicate over I2C. Step S02 routes an I2C bus over the cable between an upper-level box and the next-level box, so the BMCs of adjacent boxes can communicate over I2C.
Continuing with Figure 1, once the mapping has been established, step S2 is performed: the host BMC or the GPU expansion box BMC uses I2C signals to detect the current connection topology of the GPU expansion box.
Because the host BMC and the box BMC are already linked over I2C, either BMC can detect the current connection topology. How the host BMC or box BMC performs this detection depends on the topology, giving the following three cases:
S21: When the GPU expansion box is in direct connection mode or uplink mode, the host BMC or the box BMC detects the current connection topology as follows:
S211: The host BMC uses I2C signals to scan the first and second ports of the PCIE switch chip in the GPU expansion box and determines whether a host BMC or a GPU expansion box BMC is detected.
S212: The GPU expansion box BMC uses I2C signals to scan the first and second ports of the PCIE switch chip in the box and determines whether a host BMC or a GPU expansion box BMC is detected.
In this embodiment, an I2C scan is two-way: if the initiator can reach the scanned device at all, the two can communicate in both directions. For example, if the host detects the GPU expansion box, the host and the box can communicate, and both can then obtain the topology of the whole computing system, including the position of the initiator's own chassis within that topology.
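The two-way property can be modelled by letting a successful probe update both ends. This is a simulation sketch only: real BMC firmware would probe a device on the cable's I2C sideband, whereas here a hypothetical `wiring` dictionary stands in for the physical cables, and `Bmc` and `scan_box_ports` are illustrative names. Because the link is symmetric, the same function covers S211 (host-initiated) and S212 (box-initiated).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bmc:
    name: str
    kind: str  # "host" or "box"
    # Links this BMC has learned: local port -> (peer name, peer kind).
    links: dict[str, tuple[str, str]] = field(default_factory=dict)


def scan_box_ports(
    box: str,
    wiring: dict[tuple[str, str], tuple[str, str]],
    bmcs: dict[str, Bmc],
    ports: tuple[str, ...] = ("P0", "P1"),
) -> dict[str, Optional[str]]:
    """Probe each listed switch port of `box` over its I2C sideband.

    `wiring` maps a cabled (box, port) to the (peer chassis, peer port) at the
    other end. Returns port -> kind of BMC found ("host"/"box") or None.
    """
    found: dict[str, Optional[str]] = {}
    for port in ports:
        peer = wiring.get((box, port))
        if peer is None:
            found[port] = None  # nothing answered on this port
            continue
        peer_name, peer_port = peer
        found[port] = bmcs[peer_name].kind
        # A successful scan means two-way communication, so both ends
        # record the link no matter which BMC initiated the scan.
        bmcs[box].links[port] = (peer_name, bmcs[peer_name].kind)
        bmcs[peer_name].links[peer_port] = (box, bmcs[box].kind)
    return found


bmcs = {n: Bmc(n, k) for n, k in [("ServerA", "host"), ("ServerB", "host"), ("GPU box0", "box")]}
wiring = {("GPU box0", "P0"): ("ServerA", "Pa0"), ("GPU box0", "P1"): ("ServerB", "Pb0")}
print(scan_box_ports("GPU box0", wiring, bmcs))  # {'P0': 'host', 'P1': 'host'}
print(bmcs["ServerA"].links)  # {'Pa0': ('GPU box0', 'box')}
```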
S22: When the GPU expansion boxes are in cascade mode, the host BMC detects the current connection topology as follows:
S221: The host BMC uses I2C signals to scan the first and second ports of the PCIE switch chip in the first-level GPU expansion box and determines whether a host BMC or the first-level box BMC is detected.
S222: Each upper-level box BMC uses I2C signals to scan the first and second ports of the PCIE switch chip in the next-level box and determines whether the next-level box BMC is detected.
In cascade mode, scanning starts at the host BMC and proceeds down one level at a time: the host scans the first-level GPU expansion box, the first-level box scans the next level, and so on.
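The level-by-level descent in S221 and S222 is a walk down the chain. A minimal sketch, assuming the port names of Figure 2 (host port Pa0, box upstream port P0, downstream port P1) and a hypothetical in-memory `wiring` map in place of real I2C probes; `symmetric` and `walk_cascade` are illustrative names.

```python
def symmetric(cables):
    """Build a two-way port map from (chassis, port, chassis, port) tuples."""
    wiring = {}
    for a, pa, b, pb in cables:
        wiring[(a, pa)] = (b, pb)
        wiring[(b, pb)] = (a, pa)
    return wiring


def walk_cascade(host, wiring, host_port="Pa0", up="P0", down="P1"):
    """Return the cascaded boxes in order, first level first.

    The host BMC scans the first-level box; every box found then scans the
    box attached to its downstream port, one level at a time.
    """
    chain = []
    peer = wiring.get((host, host_port))  # host BMC scans level 1
    while peer is not None and peer[1] == up and peer[0] not in chain:
        box = peer[0]
        chain.append(box)
        peer = wiring.get((box, down))  # this box's BMC scans the next level
    return chain


cables = [
    ("ServerA", "Pa0", "GPU box0", "P0"),
    ("GPU box0", "P1", "GPU box1", "P0"),
    ("GPU box1", "P1", "GPU box2", "P0"),
]
chain = walk_cascade("ServerA", symmetric(cables))
levels = {box: i + 1 for i, box in enumerate(chain)}
print(levels)  # {'GPU box0': 1, 'GPU box1': 2, 'GPU box2': 3}
```

The position in the list gives each box's level: level 1 corresponds to the second configuration, and levels 2 and above to the third. The walk stops when a downstream port leads to something other than a box's P0, for example a second host in uplink mode.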
S23: When the GPU expansion boxes are in cascade mode, the box BMCs detect the current connection topology as follows:
S231: The BMC of the first-level GPU expansion box uses I2C signals to scan the first and second ports of the PCIE switch chip in its own box and determines whether a host BMC or the first-level box BMC is detected;
S232: Each upper-level box BMC uses I2C signals to scan the first and second ports of the PCIE switch chip in the next-level box and determines whether the next-level box BMC is detected.
Steps S21 to S23 show that, by programming the BMC chip according to this method, the box BMC or the host BMC can identify the connection topology of the GPU expansion box automatically. Once the current topology has been identified, the box BMC or host BMC can configure the PCIE switch correctly for that topology, completing the reconfiguration of the computing system.
After the BMC has obtained the current topology over I2C, step S3 is performed: the host BMC or the box BMC configures the PCIE switch chip in the current GPU expansion box according to the box's current connection topology and the mapping.
Specifically, step S3 comprises the following:
S31: If only the first port of the current box's PCIE switch chip detects a host BMC, and the second port is connected to no host or GPU expansion box, set the current box's PCIE switch chip to the first configuration;
S32: If the first port of the current box's PCIE switch chip detects a host BMC, and the second port is connected to a next-level GPU expansion box, set the chip to the second configuration;
S33: If only the first port of the current box's PCIE switch chip detects the BMC of an upper-level GPU expansion box, set the chip to the third configuration;
S34: If the first port of the current box's PCIE switch chip detects the BMC of one host, and the second port detects the BMC of another host, set the chip to the fourth configuration.
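Rules S31 to S34 reduce to a function of what the two ports detected. A sketch under stated assumptions: port results are encoded as the hypothetical strings "host", "box", or None, the name `choose_config` is illustrative, and returning None for an unmatched combination is not specified in the source.

```python
from __future__ import annotations

from typing import Optional


def choose_config(p0: Optional[str], p1: Optional[str]) -> Optional[int]:
    """Map what the first (p0) and second (p1) switch ports detected to a configuration.

    Each argument is "host", "box", or None (nothing detected on that port).
    """
    if p0 == "host" and p1 is None:
        return 1  # S31: direct connection
    if p0 == "host" and p1 == "box":
        return 2  # S32: first level of a cascade
    if p0 == "host" and p1 == "host":
        return 4  # S34: dual uplink
    if p0 == "box":
        # S33: level N >= 2. The second port is not checked here, following
        # the mapping, which constrains only the first port (an interpretation).
        return 3
    return None  # no rule matches; leave the switch unchanged (assumption)


for ports in [("host", None), ("host", "box"), ("box", None), ("box", "box"), ("host", "host")]:
    print(ports, "->", choose_config(*ports))
```

Treating any second-port result as acceptable for the third configuration lets a mid-chain box, whose P1 feeds the next level, still be classified; a stricter reading of "only the first port" in S33 would require `p1 is None` instead.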
Further, in this embodiment the reconfigured PCIE switch configuration can also be displayed; for example, the current topology can be shown on the web page of the box BMC or the host BMC.
The following takes direct connection mode, dual uplink mode, and the first level of cascade mode as examples of how the host BMC or the box BMC uses I2C signals to detect the current connection topology of the GPU expansion box:
1) The box BMC scans the box's P0 and P1 ports over I2C and determines whether a host BMC or a GPU expansion box BMC is detected.
2) If only P0 detects a host BMC, set the PCIE switch to the first configuration.
3) If only P0 detects an expansion box BMC, set the PCIE switch to the third configuration.
4) If both P0 and P1 detect a host BMC, set the PCIE switch to the fourth configuration.
5) Once configuration succeeds, display the actual topology on the BMC web interface and light an LED to tell the user that reconfiguration is complete.
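Steps 1) to 5) can be sketched as a single routine on the box BMC. The hardware actions are passed in as hypothetical callables (`scan`, `apply_config`, `publish_topology`, `set_led`), since the source names no firmware interface; only the three cases listed in the example are handled, and skipping the report when the switch write fails is an assumption.

```python
from typing import Callable, Optional


def reconfigure_box(
    scan: Callable[[str], Optional[str]],
    apply_config: Callable[[int], bool],
    publish_topology: Callable[[str], None],
    set_led: Callable[[bool], None],
) -> Optional[int]:
    """Box-BMC routine following steps 1) to 5) of the worked example."""
    # Step 1: probe P0 and P1 over the I2C sideband.
    p0, p1 = scan("P0"), scan("P1")
    # Steps 2 to 4: the three cases the example lists.
    if p0 == "host" and p1 is None:
        config = 1
    elif p0 == "box" and p1 is None:
        config = 3
    elif p0 == "host" and p1 == "host":
        config = 4
    else:
        return None  # case not covered by the example; switch left as is
    # Step 5: only after the switch accepts the setting, report and light the LED.
    if not apply_config(config):
        return None
    publish_topology(f"P0={p0}, P1={p1}: configuration {config}")
    set_led(True)
    return config


events = []
detected = {"P0": "host", "P1": "host"}  # a dual-uplink box
result = reconfigure_box(
    detected.get,
    lambda cfg: events.append(("apply", cfg)) is None,  # pretend the write succeeded
    lambda text: events.append(("web", text)),
    lambda on: events.append(("led", on)),
)
print(result, events)
```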
In summary, with the method of this embodiment the user only needs to change the cabling to suit the application scenario and does not need to reconfigure the computing system by hand; all configuration is completed automatically in software, which greatly speeds up reconfiguration and reduces the impact of human error.
Embodiment 2
Building on the embodiments shown in Figures 1 and 2, refer to Figure 3, a schematic structural diagram of a system for automatically configuring a GPU expansion box according to an embodiment of the present application. As Figure 3 shows, the system in this embodiment consists mainly of three parts: a mapping establishment module, a detection module, and a configuration module. The mapping establishment module establishes the mapping between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the box; the detection module uses I2C signals to detect the current connection topology of the box; and the configuration module configures the PCIE switch chip in the box according to the current connection topology and the mapping.
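The three-module split separates what is fixed (the mapping) from what is sensed (the topology) and what is acted on (the switch). A minimal sketch, assuming the I2C probe and the switch-programming step are injected as callables; the class names mirror the module names above, but their methods, the topology labels, and the `Probe` type are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

Probe = Callable[[str], Optional[str]]  # port name -> "host", "box", or None


class MappingModule:
    """Builds the topology -> PCIE switch configuration table."""

    def build(self) -> dict[str, int]:
        return {"direct": 1, "cascade-first": 2, "cascade-nth": 3, "uplink": 4}


class DetectionModule:
    """Names the current topology from the I2C probe results."""

    def __init__(self, probe: Probe) -> None:
        self.probe = probe

    def detect(self) -> Optional[str]:
        p0, p1 = self.probe("P0"), self.probe("P1")
        if p0 == "box":
            return "cascade-nth"
        if p0 == "host":
            # None on the second port means nothing is attached there.
            return {None: "direct", "box": "cascade-first", "host": "uplink"}.get(p1)
        return None


class ConfigurationModule:
    """Looks the topology up in the mapping and programs the switch."""

    def __init__(self, mapping: dict[str, int], program: Callable[[int], None]) -> None:
        self.mapping = mapping
        self.program = program

    def configure(self, topology: Optional[str]) -> Optional[int]:
        config = self.mapping.get(topology) if topology is not None else None
        if config is not None:
            self.program(config)
        return config


programmed: list[int] = []
mapping = MappingModule().build()
detector = DetectionModule({"P0": "host", "P1": "box"}.get)
configurator = ConfigurationModule(mapping, programmed.append)
print(configurator.configure(detector.detect()), programmed)  # 2 [2]
```

Injecting the probe and the programming callback keeps the same three classes usable whether they run in the host BMC or in the box BMC.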
The mapping established by the mapping establishment module is as follows: the first connection topology, direct connection mode, matches the first configuration, in which the first port of the PCIE switch chip connects to a host and the second port connects to no host or GPU expansion box; the second connection topology, the first level of cascade mode, matches the second configuration, in which the first port connects to a host and the second port connects to the next-level GPU expansion box; the third connection topology, level N of cascade mode (N ≥ 2, N a natural number), matches the third configuration, in which the first port connects to the upper-level GPU expansion box; and the fourth connection topology, uplink mode, matches the fourth configuration, in which the first port connects to one host and the second port connects to another host.
Further, the system in this embodiment also includes an I2C signal adding module, which adds I2C signals to the cable between the host and the GPU expansion box to establish a communication link between the host BMC and the box BMC, and adds I2C signals to the cable between an upper-level box and the next-level box to establish a communication link between the BMCs of the two boxes.
Further, in this embodiment the I2C signal adding module is an I2C bus.
In this embodiment, the detection module includes a first, a second, a third, and a fourth detection unit. The first detection unit runs in the host BMC: when the GPU expansion box is in direct connection mode or uplink mode, it uses I2C signals to scan the first and second ports of the PCIE switch chip in the box and determines whether a host BMC or a GPU expansion box BMC is detected.
The second detection unit runs in the GPU expansion box BMC: when the box is in direct connection mode or uplink mode, it uses I2C signals to scan the first and second ports of the PCIE switch chip in the box and determines whether a host BMC or a GPU expansion box BMC is detected.
The third detection unit applies when the boxes are in cascade mode. In the host BMC, it uses I2C signals to scan the first and second ports of the PCIE switch chip in the first-level box and determines whether a host BMC or the first-level box BMC is detected; and in each upper-level box BMC, it uses I2C signals to scan the first and second ports of the PCIE switch chip in the next-level box and determines whether the next-level box BMC is detected.
The fourth detection unit also applies in cascade mode. In the first-level box BMC, it uses I2C signals to scan the first and second ports of the PCIE switch chip in the first-level box and determines whether a host BMC or the first-level box BMC is detected; and in each upper-level box BMC, it uses I2C signals to scan the first and second ports of the PCIE switch chip in the next-level box and determines whether the next-level box BMC is detected.
Further, in this embodiment the configuration module includes a first, a second, a third, and a fourth setting unit. The first setting unit sets the current box's PCIE switch chip to the first configuration when only the first port of the chip detects a host BMC and the second port is connected to no host or GPU expansion box. The second setting unit sets the chip to the second configuration when the first port detects a host BMC and the second port is connected to a next-level GPU expansion box. The third setting unit sets the chip to the third configuration when only the first port detects the BMC of an upper-level GPU expansion box. The fourth setting unit sets the chip to the fourth configuration when the first port detects the BMC of one host and the second port detects the BMC of another host.
The working principle and method of the system in this embodiment are described in detail in the embodiments shown in Figures 1 and 2; the embodiments can be read together, and the details are not repeated here.
Embodiment 3
Building on the embodiments shown in Figures 1, 2, and 3, refer to Figure 4, a schematic structural diagram of an automatically configurable GPU expansion box according to an embodiment of the present application. As Figure 4 shows, the GPU expansion box in this embodiment contains a PCIE switch chip with multiple ports: the first port connects to a host or an upper-level GPU expansion box; the second port is kept spare, connects to a next-level box, or connects to another host; and the remaining ports connect to the GPUs in the box. The box also contains its own BMC, which monitors the box and the host and configures the PCIE switch chip in the box according to the box's connection topology. The host in this embodiment likewise contains a host BMC, which monitors the box and the host and configures the PCIE switch chip in the box according to the box's connection topology. The host and the box communicate over a cable, and I2C buses run on the cables between the host and the box and between GPU expansion boxes at successive levels.
The I2C bus between the host and the GPU expansion box establishes the communication link between the host BMC and the box BMC, and the I2C bus between cascaded boxes establishes the communication link between the BMC of each upper-level box and the BMC of the next-level box.
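The port layout described for Figure 4 can be captured in a small helper. This is an illustrative sketch only: the port count, the 0-based numbering, the role strings, and the name `port_roles` are assumptions, since the source does not say how many ports the switch has or how they are numbered.

```python
from __future__ import annotations


def port_roles(num_ports: int, second_port_use: str) -> dict[int, str]:
    """Assign roles to the ports of a box's PCIE switch chip.

    Port 0 is the first port (host or upper-level box), port 1 the second
    port, and every remaining port serves one GPU inside the box.
    """
    allowed = {"spare", "next-level box", "another host"}
    if second_port_use not in allowed:
        raise ValueError(f"second port must be one of {sorted(allowed)}")
    if num_ports < 3:
        raise ValueError("need two link ports plus at least one GPU port")
    roles = {0: "host or upper-level box", 1: second_port_use}
    for port in range(2, num_ports):
        roles[port] = f"GPU {port - 2}"
    return roles


print(port_roles(6, "spare"))
```

The three allowed uses of the second port correspond to the direct (spare), cascade (next-level box), and uplink (another host) topologies.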
In this embodiment, both the host BMC and the GPU expansion box BMC run the system for automatically configuring the GPU expansion box, which consists of three parts: a mapping establishment module, a detection module, and a configuration module. The mapping establishment module establishes the mapping between the connection topology of the GPU expansion box and the configuration of the PCIE switch chip in the box; the detection module uses I2C signals to detect the current connection topology of the box; and the configuration module configures the PCIE switch chip in the box according to the current connection topology and the mapping.
The mapping established by the mapping establishment module includes the following: the first connection topology, direct connection mode, matches the first configuration, in which the first port of the PCIE switch chip connects to a host and the second port connects to no host or GPU expansion box; the second connection topology, the first level of cascade mode, matches the second configuration, in which the first port connects to a host and the second port connects to the next-level GPU expansion box; the third connection topology, level N of cascade mode (N ≥ 2, N a natural number), matches the third configuration, in which the first port connects to the upper-level GPU expansion box; and the fourth connection topology, uplink mode, matches the fourth configuration, in which the first port connects to one host and the second port connects to another host.
For parts not described in detail in this embodiment, refer to the embodiments shown in Figures 1, 2, and 3; the three can be read together, and the details are not repeated here.
The foregoing describes only specific embodiments of the present application, to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810824831.9A CN109002411B (en) | 2018-07-24 | 2018-07-24 | Method and system for automatically configuring GPU expansion box, and GPU expansion box that can be automatically configured |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109002411A | 2018-12-14 |
| CN109002411B | 2021-04-27 |
Family ID: 64597737
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810824831.9A Active CN109002411B (en) | 2018-07-24 | 2018-07-24 | Method and system for automatically configuring GPU expansion box, and GPU expansion box that can be automatically configured |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109002411B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110377556A (en) * | 2019-06-26 | 2019-10-25 | 苏州浪潮智能科技有限公司 | The adaptive device and method of common calculation module and Heterogeneous Computing module based on Retimer |
| CN111352787B (en) * | 2020-03-13 | 2023-08-18 | 浪潮商用机器有限公司 | GPU topology connection detection method, device, equipment and storage medium |
| CN111538693A (en) * | 2020-04-27 | 2020-08-14 | 中国科学院自动化研究所 | PCIE bus expansion system and method |
| CN112306947A (en) * | 2020-11-05 | 2021-02-02 | 山东云海国创云计算装备产业创新中心有限公司 | Topology switching method, device and equipment |
| CN112651162A (en) * | 2021-01-07 | 2021-04-13 | 中天恒星(上海)科技有限公司 | Method for automatically configuring GPU expansion box |
| CN113194048B (en) * | 2021-04-16 | 2023-05-26 | 山东英信计算机技术有限公司 | Device for dynamically switching CPU and GPU topology and use method |
| CN115827518A (en) * | 2022-11-23 | 2023-03-21 | 苏州浪潮智能科技有限公司 | External device management method, device, device and storage medium |
| CN116260725B (en) * | 2023-01-31 | 2026-01-30 | 苏州元脑智能科技有限公司 | A server bandwidth allocation method, apparatus, electronic device, and storage medium. |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104202194A (en) * | 2014-09-10 | 2014-12-10 | 华为技术有限公司 | Configuration method and device of PCIe (peripheral component interface express) topology |
| CN107102964A (en) * | 2017-05-19 | 2017-08-29 | 郑州云海信息技术有限公司 | A kind of method that GPU cluster expansion is carried out using high-speed connector |
| CN107632953A (en) * | 2017-09-14 | 2018-01-26 | 郑州云海信息技术有限公司 | A kind of GPU casees PCIE extends interconnection topology device |
| CN108173735A (en) * | 2018-01-17 | 2018-06-15 | 郑州云海信息技术有限公司 | A kind of GPU Box server cascaded communication method, apparatus and system |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9947070B2 (en) * | 2016-09-08 | 2018-04-17 | Dell Products L.P. | GPU that passes PCIe via displayport for routing to a USB type-C connector |
- 2018-07-24: application CN201810824831.9A filed in CN; published as CN109002411B (status: Active)
Also Published As
| Publication number | Publication date |
|---|---|
| CN109002411A (en) | 2018-12-14 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |