CN108762818A

CN108762818A - A kind of optimization design server and maintaining method

Info

Publication number: CN108762818A
Application number: CN201810538595.4A
Authority: CN
Inventors: 李艳; 白云峰
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2018-05-30
Filing date: 2018-05-30
Publication date: 2018-11-06

Abstract

The invention discloses an optimally designed server and a maintenance method. The optimizedly designed server is arranged in layers according to the functional modules of the traditional server system and the GPU server system, and the number of layers is multi-layered to realize the uninterrupted operation of the whole machine. It can be stretched and stretched freely, and one side of each layer except the top layer is equipped with a tank chain for pulling the traditional server system or GPU server system, which ensures the coherence of the various layers and avoids the When the components are lost, a fixed buckle is provided on the other side of each layer, which strengthens the structural firmness of the server after the optimized design.

Description

An optimized design server and maintenance method

技术领域technical field

本发明涉及人工智能服务器领域，尤其是涉及一种优化设计服务器及维护方法。The invention relates to the field of artificial intelligence servers, in particular to an optimized design server and a maintenance method.

背景技术Background technique

人工智能在当前世界的在线服务领域发挥着越来越重要的作用，在谷歌、Facebook、微软和百度这些公司内部，GPU在“深度学习”领域发挥着巨大的作用。GPU是一个大规模的SIMD集合，拥有大量的向量运算系统，在进行重复功能的流计算时，可以平行处理大量琐碎信息，只需要定好功能数据不断往里进就可以了。CPU的强大在于面对控制密集型运算时非常优秀，但是对于数据密集型的运算，比如屏幕中每个pixel应该显示什么颜色，相对来说就受限于CPU的运行机制压力会很大。。Artificial intelligence is playing an increasingly important role in the online services of the current world. Within companies such as Google, Facebook, Microsoft, and Baidu, GPUs play a huge role in the field of "deep learning". GPU is a large-scale SIMD collection with a large number of vector computing systems. When performing stream computing with repetitive functions, it can process a large amount of trivial information in parallel. It only needs to set the function data and keep entering it. The power of the CPU is that it is excellent in the face of control-intensive operations, but for data-intensive operations, such as what color should be displayed by each pixel on the screen, it is relatively limited by the operating mechanism of the CPU and the pressure will be great. .

随着人工智能的蓬勃发展，GPU被大量广泛使用，对于大规模数据中心，已经开始大量部署池化类的服务器，这类传统服务器在结构上面不支持GPU的热运维，一旦服务器需要检修或者GPU升级换代，需要整体将服务器进行断电进行相关操作，无法像更换硬盘、风扇、电源那样便捷操作，这会导致整个服务停止运作，给整个系统带来极大的挑战，因此急需设计和开发一种可热运维和易升级换代GPU的服务器满足市场需求，满足整个服务。With the vigorous development of artificial intelligence, GPUs are widely used in large numbers. For large-scale data centers, a large number of pooled servers have begun to be deployed. This type of traditional server structure does not support the thermal operation and maintenance of GPUs. Once the server needs to be repaired or To upgrade the GPU, it is necessary to power off the server as a whole for related operations, which is not as convenient as replacing the hard disk, fan, and power supply. This will cause the entire service to stop operating and bring great challenges to the entire system. Therefore, design and development are urgently needed A server that can be hot-operated and easily upgraded with a GPU meets market demand and meets the entire service.

发明内容Contents of the invention

为解决上述问题，本发明提供了一种优化设计服务器,优化设计后的服务器GPU支持热运维，可实现单一部件的检修，该维护方法方便快捷。In order to solve the above problems, the present invention provides an optimally designed server. The optimally designed server GPU supports thermal operation and maintenance, and can realize the maintenance of a single component. The maintenance method is convenient and quick.

基于此，本发明的技术方案为：Based on this, technical scheme of the present invention is:

一种优化设计服务器，包括传统服务器系统、GPU服务器系统、坦克链、电线和固定扣，所述传统服务器系统和GPU服务器系统按照功能模块分层设置、且分层数量为多层，除顶层外各所述层的第一侧均设有用于抽拉传统服务器系统或GPU服务器系统的坦克链，所述电线穿插于坦克链中，在各所述层的第二侧设有固定扣。An optimally designed server, including a traditional server system, a GPU server system, tank chains, wires and fixing buckles, the traditional server system and the GPU server system are arranged in layers according to functional modules, and the number of layers is multi-layered, except for the top layer Tank chains for pulling traditional server systems or GPU server systems are provided on the first side of each of the layers, the wires are inserted into the tank chains, and fixed buckles are provided on the second sides of each of the layers.

进一步的，所述传统服务器系统包括CPU处理器摸块、存储模块、网络模块、电源和风扇。Further, the traditional server system includes a CPU processor module, a storage module, a network module, a power supply and a fan.

进一步的，所述GPU服务器系统内设有多个GPU处理器模块。Further, the GPU server system is provided with multiple GPU processor modules.

进一步的，所述GPU处理器模块数量为6个。Further, the number of GPU processor modules is 6.

进一步的，所述传统服务器系统和GPU服务器系统均独立设置于不同的层内。Further, the traditional server system and the GPU server system are independently arranged in different layers.

进一步的，所述电线为供电电线和数据业务走线。Further, the wires are power supply wires and data service wires.

另外，本发明还提供了一种优化设计服务器的维护方法，利用上述优化设计的服务器，当所述服务器进行升级换代或运维更换时，In addition, the present invention also provides a maintenance method for an optimally designed server. Using the above-mentioned optimally designed server, when the server is upgraded or replaced by operation and maintenance,

a.进入传统服务器系统并disable掉相对应的GPU服务器系统；a. Enter the traditional server system and disable the corresponding GPU server system;

b.解除固定扣将需更换或运维的GPU服务器系统拉出，待成功更换后推回更换之后的GPU服务器系统；b. Release the fixing buckle and pull out the GPU server system that needs to be replaced or operated and maintained, and push back the replaced GPU server system after successful replacement;

c.安装固定扣，再次进入传统服务器系统重新更新驱动并识别相对应的GPU服务器系统，则该服务器进入正常业务模式。c. Install the fixing buckle, re-enter the traditional server system to re-update the driver and identify the corresponding GPU server system, then the server will enter the normal business mode.

实施本发明实施例，具有以下有益效果：Implementing the embodiment of the present invention has the following beneficial effects:

1、本发明经优化设计的服务器，将传统服务器系统和GPU服务器系统按照功能模块分层设置，且分层数量为多层，实现整机不间断运行情况下可以自由伸缩拉伸，并除顶层外各所述层的一侧均设有用于抽拉传统服务器系统或GPU服务器系统的坦克链，保证了各个层之间的整理连贯性，避免在抽拉层时，零部件的丢失，并在各所述层的另一侧设有固定扣，加强了优化设计之后服务器的结构牢固性。1. In the optimized server of the present invention, the traditional server system and the GPU server system are layered according to functional modules, and the number of layers is multi-layered, so that the whole machine can be freely stretched and stretched under the condition of uninterrupted operation, and the top layer is removed. One side of each outer layer is equipped with a tank chain for pulling the traditional server system or GPU server system, which ensures the coherence of each layer, avoids the loss of parts when pulling the layer, and The other side of each layer is provided with a fixed buckle, which strengthens the structural firmness of the server after the optimized design.

2、本发明的GPU服务器系统内设有多个GPU处理器模块，优选于6个GPU处理器，进行人工智能、深度学习、神经网络推演时，可以大大提升工作效率，其能效可以比肩多台传统服务器。2. The GPU server system of the present invention is equipped with a plurality of GPU processor modules, preferably 6 GPU processors. When performing artificial intelligence, deep learning, and neural network deduction, the work efficiency can be greatly improved, and its energy efficiency can be comparable to multiple traditional server.

3、将所述传统服务器系统和GPU服务器系统均独立设置于不同的层内，独立设置方便后期维护；另外电线为供电电线和数据业务走线，并通过穿插在坦克链与传统服务器系统或GPU服务器系统连接，实现服务器CPU对GPU资源的访问。3. The traditional server system and the GPU server system are independently installed in different layers, which is convenient for later maintenance; in addition, the wires are used for power supply wires and data service lines, and are interspersed between the tank chain and the traditional server system or GPU The server system is connected to realize the access of the server CPU to the GPU resources.

4、利用本发明的服务器进行维护的方法清晰合理，在对该服务器进行运维或迭代升级，实现该服务器的不间断运行，避免了数据丢失。4. The maintenance method of the server of the present invention is clear and reasonable, and the server is operated and maintained or iteratively upgraded to realize uninterrupted operation of the server and avoid data loss.

附图说明Description of drawings

图1为本实施例所述优化设计服务器的整体结构示意图。FIG. 1 is a schematic diagram of the overall structure of the optimization design server described in this embodiment.

图2为本实施例所述利用上述优化设计服务器进行维护方法的流程图。FIG. 2 is a flow chart of the maintenance method by using the optimization design server described in this embodiment.

附图标记说明：Explanation of reference signs:

其中，1、传统服务器系统，2、GPU服务器系统，3、坦克链，4，固定扣。Among them, 1. Traditional server system, 2. GPU server system, 3. Tank chain, 4. Fixed buckle.

具体实施方式Detailed ways

下面将结合本发明中的附图和实施例，对本发明的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明的保护范围。The technical solutions of the present invention will be clearly and completely described below in conjunction with the drawings and embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

结合图1所示，本发明实施例所述的一种优化设计服务器，包括传统服务器系统1、GPU服务器系统2、坦克链3、所述电线和固定扣4，传统服务器系统1和GPU服务器系统2按照功能模块分层设置、且分层数量为多层，除顶层外各所述层的第一侧均设有用于抽拉传统服务器系统1或GPU服务器系统2的坦克链3，所述电线穿插于坦克链3中，在各所述层的第二侧设有固定扣4。本发明实施例的经优化设计的服务器，将传统服务器系统1和GPU服务器系统2按照功能模块分层设置，且分层数量为多层，具体每一层为传统服务器系统1或GPU服务器系统2均不特定，实现整机不间断运行情况下可以自由伸缩拉伸，并除顶层外各所述层的一侧均设有用于抽拉传统服务器系统1或GPU服务器系统2的坦克链3，保证了各个层之间的整理连贯性，避免在抽拉层时，零部件的丢失，并在各所述层的另一侧设有固定扣4，加强了优化设计之后服务器的结构牢固性。As shown in Fig. 1, a kind of optimized design server described in the embodiment of the present invention includes a traditional server system 1, a GPU server system 2, a tank chain 3, the electric wire and a fixed buckle 4, a traditional server system 1 and a GPU server system 2. According to the layered setting of functional modules, and the number of layers is multi-layered, the first side of each layer except the top layer is provided with a tank chain 3 for pulling the traditional server system 1 or GPU server system 2, and the wires Inserted in the tank chain 3, a fixed buckle 4 is provided on the second side of each layer. In the optimally designed server of the embodiment of the present invention, the traditional server system 1 and the GPU server system 2 are layered according to functional modules, and the number of layers is multi-layered, and each layer is specifically a traditional server system 1 or a GPU server system 2 None of them are specific, and they can be freely stretched and stretched under the condition of uninterrupted operation of the whole machine, and one side of each layer except the top layer is equipped with a tank chain 3 for pulling the traditional server system 1 or GPU server system 2, ensuring The arrangement continuity between each layer is improved, and the loss of components is avoided when the layer is pulled, and the other side of each layer is provided with a fixed buckle 4, which strengthens the structural firmness of the server after the optimized design.

其中，传统服务器系统1包括CPU处理器摸块、存储模块、网络模块、电源和风扇，GPU服务器系统2内设有多个GPU处理器模块，优选于6个GPU处理器，进行人工智能、深度学习、神经网络推演时，可以大大提升工作效率，其能效可以比肩多台传统服务器。传统服务器系统1和GPU服务器系统2均独立设置于不同的层内，独立设置方便后期维护；另外所述电线为供电电线和数据业务走线，并通过穿插在坦克链3与传统服务器系统1或GPU服务器系统2连接，实现服务器CPU对GPU资源的访问。Wherein, traditional server system 1 comprises CPU processor module, storage module, network module, power supply and fan, GPU server system 2 is provided with a plurality of GPU processor modules, preferably 6 GPU processors, for artificial intelligence, depth During learning and neural network deduction, work efficiency can be greatly improved, and its energy efficiency can be comparable to that of multiple traditional servers. Both the traditional server system 1 and the GPU server system 2 are independently set in different layers, and the independent setting is convenient for later maintenance; in addition, the wires are power supply wires and data service lines, and are interspersed between the tank chain 3 and the traditional server system 1 or The GPU server system 2 is connected to realize the access of the server CPU to the GPU resource.

另外，在除顶层外各所述层的一侧采用坦克链3，坦克链3起到适合于使用在往复运动的场合，能够对内置的所述电线起到牵引和保护作用；坦克链3每节由左右链板和上下盖板组成，每节都能打开且链节之间转动自如，便于安装和维修，装拆方便、不必穿线，打开盖板后即可把电缆方便的安装在内部，大大提升了走线的便捷程度；坦克链3具有较高的压力和抗拉负荷，良好的韧性、高弹性和耐磨性，阻燃，高低温时性能稳定，可以使用在室外，运行速度稳定。In addition, a tank chain 3 is used on one side of each layer except the top layer, and the tank chain 3 is suitable for use in reciprocating motions, and can pull and protect the built-in electric wires; each tank chain 3 The link is composed of left and right chain plates and upper and lower cover plates. Each link can be opened and the chain links can rotate freely, which is convenient for installation and maintenance. Greatly improves the convenience of wiring; the tank chain 3 has high pressure and tensile load, good toughness, high elasticity and wear resistance, flame retardant, stable performance at high and low temperatures, can be used outdoors, and runs at a stable speed .

结合图2所示，本发明还提供了一种优化设计服务器的维护方法，利用上述优化设计的服务器，当所述服务器进行升级换代或运维更换时，As shown in FIG. 2, the present invention also provides a maintenance method for an optimally designed server. Using the above-mentioned optimally designed server, when the server is upgraded or replaced by operation and maintenance,

a.进入传统服务器系统1并disable掉相对应的GPU服务器系统2；a. Enter the traditional server system 1 and disable the corresponding GPU server system 2;

b.解除固定扣4将需更换或运维的GPU服务器系统2拉出，待成功更换后推回更换之后的GPU服务器系统2；b. Release the fixing buckle 4 to pull out the GPU server system 2 that needs to be replaced or maintained, and push back the replaced GPU server system 2 after successful replacement;

c.安装固定扣，再次进入传统服务器系统1重新更新驱动并识别相对应的GPU服务器系统2，则该服务器进入正常业务模式。c. Install the fixing buckle, re-enter the traditional server system 1 to re-update the driver and identify the corresponding GPU server system 2, then the server enters the normal business mode.

利用本发明的服务器进行维护的方法清晰合理，在对该服务器进行运维或迭代升级，实现该服务器的不间断运行，避免了数据丢失。The maintenance method of the server of the present invention is clear and reasonable, and the server is operated and maintained or iteratively upgraded to realize uninterrupted operation of the server and avoid data loss.

以上所述仅是本发明的优选实施方式，应当指出对于本技术领域的普通技术人员来说，在不脱离本发明技术原理的前提下，还可以做出若干改进和替换，这些改进和替换也应视为本发明的保护范围。The above is only a preferred embodiment of the present invention, it should be pointed out that for those of ordinary skill in the art, without departing from the technical principle of the present invention, some improvements and substitutions can also be made, and these improvements and substitutions are also possible. It should be regarded as the protection scope of the present invention.

Claims

1. an optimized design server, is characterized in that, comprises traditional server system, GPU server system, tank chain, electric wire and fixed buckle, described traditional server system and GPU server system are arranged according to functional module layering, and layered quantity is Multi-layer, except the top layer, the first side of each layer is provided with a tank chain for pulling the traditional server system or GPU server system, the wires are inserted in the tank chain, and the second side of each layer is provided with With fixed buckle.

2. The optimized design server according to claim 1, wherein said traditional server system comprises a CPU processor module, a storage module, a network module, a power supply and a fan.

3. The optimization design server according to claim 1, wherein a plurality of GPU processor modules are arranged in the GPU server system.

4. The optimized design server according to claim 3, wherein the number of said GPU processor modules is six.

5. The optimized design server according to claim 1, wherein the traditional server system and the GPU server system are independently arranged in different layers.

6. The optimized design server according to claim 1, wherein the electric wires are power supply electric wires and data service wirings.

7. A maintenance method for an optimally designed server, using the optimally designed server according to any one of claims 1 to 6, characterized in that, when the server is upgraded or replaced by operation and maintenance,

a. Enter the traditional server system and disable the corresponding GPU server system;

b. Unfasten the buckle and pull out the GPU server system that needs to be upgraded or maintained, and push it back to the replaced GPU server system after successful replacement;

c. Install the fixing buckle, re-enter the traditional server system to re-update the driver and identify the corresponding GPU server system, then the server will enter the normal business mode.