CN105912431A - Reboot testing method of server, server, control device and system - Google Patents
- Publication number
- CN105912431A (application number CN201610202489.XA)
- Authority
- CN
- China
- Prior art keywords
- server
- count
- file
- controller
- ispci
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2273—Test methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Test And Diagnosis Of Digital Computers (AREA)
Abstract
The invention provides a server reboot test method, a server, a controller and a system. The method comprises: establishing communication between the server and the controller through a switch; the server receiving a power-on request sent by the controller and booting; determining whether an Ispci-tmp file exists; if so, reading the device information, writing it to an Ispci-$count file, and comparing the Ispci-tmp file with the Ispci-$count file for consistency; otherwise, generating an Ispci-tmp file from the server's device information; when the Ispci-tmp file and the Ispci-$count file are consistent, sending a boot-complete message; creating a gpu.txt file and a server.txt file; and receiving a shutdown request sent by the controller and shutting down. This automates server stability testing.
Description
Technical field
The invention relates to the technical field of server applications, and in particular to a server reboot test method, a server, a controller and a system.
Background
As cloud computing services continue to develop, the stability requirements placed on servers keep rising. The reboot test is currently an important method of testing server stability.
The existing reboot test works as follows: a reboot script is installed on each server node; each node is manually connected to power and manually switched on; the reboot script runs and checks whether the boot process is normal; the node is then shut down, and each node must be manually disconnected from power in turn. In other words, the existing reboot test can only be completed with manual involvement, and server stability testing cannot run automatically.
Summary of the invention
Embodiments of the present invention provide a server reboot test method, a server, a controller and a system that automate server stability testing.
In the server reboot test method, communication between the server and the controller is established through a switch; the method further comprises:
when the server receives a power-on request sent by the controller, booting;
the server determining whether an Ispci-tmp file exists on the server; if so, reading the device information of the server, writing the device information to an Ispci-$count file, and comparing the Ispci-tmp file with the Ispci-$count file for consistency; otherwise, generating an Ispci-tmp file from the device information of the server;
when the Ispci-tmp file and the Ispci-$count file are consistent, sending a boot-complete message to the controller;
creating a gpu.txt file and a server.txt file;
receiving a shutdown request sent by the controller and shutting down.
Preferably, establishing communication between the server and the controller through the switch comprises:
the server connecting to the switch through an OS network and a BMC network;
the controller connecting to the switch through the OS network.
Preferably, the method further comprises: setting a first counter count in the server;
after booting, the method further comprises: the server determining whether a count file exists on the server; if so, incrementing the first counter count by 1 and storing it in the count file; otherwise, starting the first counter count, incrementing it by 1, generating the count file, and writing the first counter count into the server's startup items.
Preferably, the server is a Pcie-Switch server comprising a resource server and a server end, wherein a retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable;
the method further comprises: setting a startup sequence;
the booting comprises: starting the resource server and the server end in order according to the set startup sequence.
A server reboot test method applied to a controller, in which a second counter count and a detection threshold are set in the controller, further comprises:
M1: the controller initializing the second counter count;
M2: receiving the boot-complete message sent by the server and determining whether the second counter count is less than the detection threshold; if so, checking whether the gpu.txt file and the server.txt file exist on the server; if they do, calling the server's shutdown function to shut the server down;
M3: sending a power-on request to the server, calling the server's power-on function to boot the server, incrementing the second counter count by 1, and executing M2.
Preferably, the method further comprises: clearing the operating system logs on the server.
Preferably, the server is a Pcie-Switch server comprising a resource server and a server end, wherein a retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable;
controlling the server to shut down comprises: shutting down the server end and then the resource server, in that order;
controlling the server to boot comprises: booting the resource server and then the server end, in that order.
A server for use with any of the server reboot test methods described above communicates with an external controller through an external switch and comprises: a switch unit, a first judging unit, a read/write unit and a generating unit, wherein
the switch unit is configured to boot the server and trigger the first judging unit when a power-on request sent by the external controller is received, and to shut the server down when a shutdown request sent by the external controller is received;
the first judging unit is configured to, when triggered by the switch unit, determine whether an Ispci-tmp file exists; if so, to trigger the read/write unit and compare the Ispci-tmp file with the Ispci-$count file for consistency; otherwise, to trigger the generating unit;
the read/write unit is configured to read the device information of the server, write it to the Ispci-$count file, and, when the Ispci-tmp file and the Ispci-$count file are consistent, send a boot-complete message to the external controller and create the gpu.txt file and the server.txt file;
the generating unit is configured to generate the Ispci-tmp file from the device information.
Preferably, the server is connected to the external switch through an OS network and a BMC network.
Preferably, the server further comprises a second judging unit and a first counter, wherein
the second judging unit is configured to determine whether a count file exists; if so, to trigger the first counter; otherwise, to start the first counter, generate the count file, and write the first counter into the server's startup items;
the first counter is configured to count the boots performed by the switch unit: each time the switch unit boots the server, it increments count by 1 and stores the boot count in the count file.
Preferably, the server is a Pcie-Switch server comprising a resource server and a server end, wherein a retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable.
A controller for use with any of the server reboot test methods described above comprises: a setting unit, a second counter, a detection unit and a call control unit, wherein
the setting unit is configured to set the detection threshold;
the detection unit is configured to determine whether the count of the second counter is less than the detection threshold set by the setting unit; if so, to check whether the gpu.txt file and the server.txt file exist on the external server; and if they do, to trigger the call control unit;
the call control unit is configured to, when triggered by the detection unit, call the external server's shutdown function to shut the server down, send a power-on request to the external server, call the external server's power-on function to boot the server, and increment the count of the second counter by 1.
A server reboot test system comprises: at least one server as described above, a switch, and a controller as described above, wherein
the at least one server and the controller are each connected to the switch.
Embodiments of the present invention provide a server reboot test method, a server, a controller and a system. In the method, communication between the server and the controller is established through a switch; when the server receives a power-on request sent by the controller, it boots; the server determines whether an Ispci-tmp file exists on the server; if so, it reads the device information of the server, writes the device information to an Ispci-$count file, and compares the Ispci-tmp file with the Ispci-$count file for consistency; otherwise, it generates an Ispci-tmp file from the device information of the server; when the Ispci-tmp file and the Ispci-$count file are consistent, it sends a boot-complete message to the controller; it creates a gpu.txt file and a server.txt file; and it receives a shutdown request sent by the controller and shuts down. With this method, the server can determine whether it booted normally by checking whether the files exist and comparing them for consistency. In addition, both booting and shutting down the server are carried out automatically under the control of the controller, without manual involvement, which automates server stability testing.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a server reboot test method provided by one embodiment of the present invention;
FIG. 2 is a flowchart of a server reboot test method provided by another embodiment of the present invention;
FIG. 3 is a flowchart of a server reboot test method provided by yet another embodiment of the present invention;
FIG. 4 is a schematic diagram of the startup/shutdown sequence of the Pcie-Switch server provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server provided by one embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a controller provided by one embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server reboot test system provided by one embodiment of the present invention.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a server reboot test method, which may include the following steps:
Step 101: establish communication between the server and the controller through a switch;
Step 102: when the server receives a power-on request sent by the controller, boot;
Step 103: the server determines whether an Ispci-tmp file exists on the server; if so, go to step 104; otherwise, go to step 105;
Step 104: read the device information of the server, write it to an Ispci-$count file, and compare the Ispci-tmp file with the Ispci-$count file for consistency; if they are consistent, go to step 106; otherwise, go to step 107;
Step 105: generate an Ispci-tmp file from the device information of the server;
Step 106: send a boot-complete message to the controller, create a gpu.txt file and a server.txt file, and go to step 108;
Step 107: report an error and end the current flow;
Step 108: receive a shutdown request sent by the controller and shut down.
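Steps 103 to 107 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's script: the function name `check_devices`, the return labels, and the idea of passing the device listing in as a string (for instance, captured from a PCI listing command) are assumptions made here. The text does not say what follows step 105, so the sketch simply returns after recording the baseline.

```python
import tempfile
from pathlib import Path


def check_devices(workdir: Path, count: int, device_info: str) -> str:
    """Return 'baseline', 'ok' or 'mismatch' for one boot (labels are hypothetical)."""
    baseline = workdir / "Ispci-tmp"
    if not baseline.exists():
        # Step 105: no baseline yet, so record the current device list.
        baseline.write_text(device_info)
        return "baseline"
    # Step 104: record this boot's device list and compare it with the baseline.
    snapshot = workdir / f"Ispci-{count}"
    snapshot.write_text(device_info)
    if snapshot.read_text() != baseline.read_text():
        return "mismatch"  # Step 107: the caller reports an error and stops.
    # Step 106: marker files that the controller later looks for.
    (workdir / "gpu.txt").touch()
    (workdir / "server.txt").touch()
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for boot, listing in enumerate(["GPU1\nGPU2\n", "GPU1\nGPU2\n", "GPU1\n"], 1):
            print(boot, check_devices(Path(tmp), boot, listing))
```

Sending the boot-complete message to the controller is left out, because the text does not specify the transport.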
Communication between the server and the controller is established through a switch; when the server receives a power-on request sent by the controller, it boots; the server determines whether an Ispci-tmp file exists on the server; if so, it reads the device information of the server, writes the device information to an Ispci-$count file, and compares the Ispci-tmp file with the Ispci-$count file for consistency; otherwise, it generates an Ispci-tmp file from the device information of the server; when the Ispci-tmp file and the Ispci-$count file are consistent, it sends a boot-complete message to the controller; it creates a gpu.txt file and a server.txt file; and it receives a shutdown request sent by the controller and shuts down. With this method, the server can determine whether it booted normally by checking whether the files exist and comparing them for consistency. In addition, both booting and shutting down the server are carried out automatically under the control of the controller, without manual involvement, which automates server stability testing.
In one embodiment of the present invention, to ensure communication between the server and the controller, step 101 is implemented as follows: the server connects to the switch through an OS network and a BMC network; the controller connects to the switch through the OS network.
In one embodiment of the present invention, to count the number of server boots, the method further comprises: setting a first counter count in the server. After step 102, the method further comprises: the server determines whether a count file exists on the server; if so, it increments the first counter count by 1 and stores it in the count file; otherwise, it starts the first counter count, increments it by 1, generates the count file, and writes the first counter count into the server's startup items. Writing the counter into the server's startup items ensures that the boot count is accurate.
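The count-file logic can be illustrated with a short Python sketch. The function name `bump_boot_count` and the one-integer file format are assumptions; registering the counter as a startup item depends on the operating system and is only noted in a comment.

```python
import tempfile
from pathlib import Path


def bump_boot_count(count_file: Path) -> int:
    """Increment the persisted boot counter and return its new value."""
    if count_file.exists():
        # A count file already exists: read it and add one.
        count = int(count_file.read_text().strip()) + 1
    else:
        # First boot: start the counter at 1 and create the file.
        # The text also registers the counter as a startup item here (not shown).
        count = 1
    count_file.write_text(f"{count}\n")
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "count"
        print([bump_boot_count(path) for _ in range(3)])
```

Because the value lives in a file, it survives each power cycle, which is what lets it serve as the `$count` suffix of the Ispci-$count file.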
In one embodiment of the present invention, the server is a Pcie-Switch server comprising a resource server and a server end, wherein a retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable. The method further comprises setting a startup sequence, and the booting comprises starting the resource server and the server end in order according to the set startup sequence, which ensures that the Pcie-Switch server boots normally and automatically.
As shown in FIG. 2, an embodiment of the present invention provides a server reboot test method applied to a controller, which may include the following steps:
Step 201: set a second counter count and a detection threshold in the controller;
Step 202: the controller initializes the second counter count;
Step 203: receive the boot-complete message sent by the server and determine whether the second counter count is less than the detection threshold; if so, go to step 204; otherwise, go to step 205;
Step 204: check whether the gpu.txt file and the server.txt file exist on the server; if so, go to step 206; otherwise, go to step 207;
Step 205: shut down the server, stop controlling the server, and end the current flow;
Step 206: call the server's shutdown function to shut the server down, and go to step 208;
Step 207: wait for a certain time and return to step 203;
Step 208: send a power-on request to the server, call the server's power-on function to boot the server, increment the second counter count by 1, and go to step 203.
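The controller loop of steps 202 to 208 might look like the following Python sketch. The method names `has_marker_files`, `power_off` and `power_on` are hypothetical stand-ins for whatever remote calls a real controller would use, and `FakeServer` exists only to make the sketch runnable; it assumes the server is already powered on when the test starts.

```python
import time


class FakeServer:
    """Hypothetical stand-in for a remote server, used only for this sketch."""

    def __init__(self):
        self.boots = 1        # assume one power-on before the test starts
        self.polls = 0
        self.powered = True

    def has_marker_files(self) -> bool:
        # Pretend gpu.txt and server.txt appear on every second poll.
        self.polls += 1
        return self.polls % 2 == 0

    def power_off(self) -> None:
        self.powered = False

    def power_on(self) -> None:
        self.powered = True
        self.boots += 1


def run_reboot_test(server, threshold: int, poll_delay: float = 5.0) -> int:
    count = 0                               # step 202: initialize the counter
    while count < threshold:                # step 203: compare with the threshold
        if not server.has_marker_files():   # step 204: marker files present?
            time.sleep(poll_delay)          # step 207: wait, then check again
            continue
        server.power_off()                  # step 206: shut the server down
        server.power_on()                   # step 208: boot it again
        count += 1
    server.power_off()                      # step 205: final shutdown, stop the test
    return count


if __name__ == "__main__":
    print(run_reboot_test(FakeServer(), threshold=3, poll_delay=0))
```

A real controller would also wait for the boot-complete message before checking the marker files; the sketch folds both into the polling call.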
In one embodiment of the present invention, to prevent operations already performed on the server from affecting the boot, the method further comprises: clearing the operating system logs on the server.
In one embodiment of the present invention, the server is a Pcie-Switch server comprising a resource server and a server end, wherein a retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable. Controlling the server to shut down comprises shutting down the server end and then the resource server, in that order; controlling the server to boot comprises booting the resource server and then the server end, in that order. This ensures that the Pcie-Switch server boots normally and automates its stability testing.
To make the purpose, technical solutions and advantages of the present invention clearer, the method is described in further detail below in terms of the interaction between the server and the controller.
As shown in FIG. 3, yet another embodiment of the present invention provides a server reboot test method, which may include the following steps:
Step 301: establish communication between the server and the controller through a switch;
In this step, the server connects to the switch through an OS network and a BMC network, and the controller connects to the switch through the OS network. When the server is a Pcie-Switch server, it comprises a resource server and a server end: the resource server may contain multiple GPUs, a retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable. Both the resource server and the server end connect to the switch through the OS network and the BMC network.
Step 302: set a first counter count in the server, and set a second counter count and a detection threshold in the controller;
In this step, when the server is a Pcie-Switch server, the first counter count may be set on the resource server.
Step 303: the controller initializes the second counter count and clears the operating system logs on the server;
In this step, the operating system logs on the server are cleared so that earlier operations on the server do not affect the stability test.
Step 304: the controller sends a power-on request to the server and calls the server's power-on function to boot the server;
In this step, when the server is a non-hot-swappable server such as a Pcie-Switch server, a startup sequence can additionally be set and the server started according to it. FIG. 4 shows the startup/shutdown sequence that this embodiment sets for the Pcie-Switch server. Because the Pcie-Switch server is not hot-swappable, the resource server containing the GPUs must be started first, and the server end is started only after the resource server has finished booting. During shutdown, the server end is shut down first and the resource server afterwards. This avoids server crashes caused by startup sequencing problems.
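The ordering rule described for the Pcie-Switch server (resource server first on boot, server end first on shutdown) can be captured as a tiny Python helper. The names `BOOT_ORDER` and `power_sequence` and the string labels are illustrative assumptions.

```python
from typing import List

# Boot order from the text: the resource server (with the GPUs) comes up first.
BOOT_ORDER = ["resource_server", "server_end"]


def power_sequence(action: str) -> List[str]:
    """Return the order in which the two parts should be powered on or off."""
    if action == "boot":
        return list(BOOT_ORDER)
    if action == "shutdown":
        # Shutdown is the reverse of boot: server end first, resource server last.
        return list(reversed(BOOT_ORDER))
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(power_sequence("boot"))
    print(power_sequence("shutdown"))
```

Deriving the shutdown order from the boot order keeps the two from drifting apart if more components are added later.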
Step 305: the server determines whether a count file exists on the server; if so, go to step 306; otherwise, go to step 307;
Step 306: increment the first counter count by 1, store it in the count file, and go to step 308;
Step 307: start the first counter count, increment it by 1, generate the count file, and write the first counter count into the server's startup items;
In steps 305 to 307, the server itself counts its boots; the counter does this automatically, without manual involvement.
Step 308: the server determines whether an Ispci-tmp file exists on the server; if so, go to step 309; otherwise, go to step 310;
Step 309: read the device information of the server, write it to an Ispci-$count file, and compare the Ispci-tmp file with the Ispci-$count file for consistency; if they are consistent, go to step 311; otherwise, go to step 312;
Step 310: generate an Ispci-tmp file from the device information of the server;
Steps 308 to 310 collect and compare the information of the devices in the server; comparing the device information determines whether the server has booted completely. For example, if a server has GPU1 and GPU2, the Ispci-tmp file contains the information of both GPU1 and GPU2; if the Ispci-$count file contains only the information of GPU1, the two files are inconsistent, which indicates that the server did not boot completely.
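The GPU1/GPU2 example can be reproduced with a small Python helper. Note that the method described here only checks whether the two files are consistent; listing which devices are missing is an extension added for illustration, and the name `missing_devices` and the one-device-per-line format are assumptions.

```python
from typing import List


def missing_devices(baseline: str, snapshot: str) -> List[str]:
    """Return baseline device lines that do not appear in the snapshot."""
    base = [line for line in baseline.splitlines() if line.strip()]
    seen = {line for line in snapshot.splitlines() if line.strip()}
    return [line for line in base if line not in seen]


if __name__ == "__main__":
    tmp_listing = "GPU1\nGPU2\n"    # contents of Ispci-tmp in the example
    count_listing = "GPU1\n"        # contents of Ispci-$count in the example
    print(missing_devices(tmp_listing, count_listing))
```

An empty result together with equal file contents corresponds to the "consistent" branch of step 309.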
步骤311:发送启动完成信息给控制器,创建gpu.txt文件和server.txt文件,并执行步骤313;Step 311: Send the startup completion information to the controller, create a gpu.txt file and a server.txt file, and execute step 313;
步骤312:提示错误信息,并结束当前流程;Step 312: Prompt an error message, and end the current process;
步骤313:控制器接收服务器发送的启动完成信息,判断第二计数器count的计数是否小于检测阈值,如果是,则执行步骤314;否则,执行步骤315;Step 313: The controller receives the startup completion information sent by the server, and judges whether the count of the second counter count is less than the detection threshold, and if so, executes step 314; otherwise, executes step 315;
例如:设置检测阈值为1000,第二计数器count的计数为服务器启动次数,则当服务器启动次数小于1000时,第二计数器count的计数小于1000。For example, if the detection threshold is set to 1000, and the count of the second counter count is the number of server startups, then when the number of server startups is less than 1000, the count of the second counter count is less than 1000.
步骤314:检测服务器中是否存在gpu.txt文件和server.txt文件,如果是,则执行步骤316;否则,执行步骤317;Step 314: Detect whether there are gpu.txt files and server.txt files in the server, if yes, then execute step 316; otherwise, execute step 317;
在该步骤中,首先需要检测服务器是否连接到交换机,即控制器能够通过交换机连接到服务器,由于前面提及当服务器启动完成后,将创建gpu.txt文件和server.txt文件,则通过控制器检测服务器中是否存在gpu.txt文件和server.txt文件,来进一步确定服务器已经启动完成。In this step, it is first necessary to detect whether the server is connected to the switch, that is, the controller can connect to the server through the switch. As mentioned above, when the server is started, the gpu.txt file and the server.txt file will be created. Detect whether the gpu.txt file and server.txt file exist in the server to further confirm that the server has been started.
Step 315: Shut down the server, exit control of the server, and end the current process;
Step 316: Call the server's shutdown function to shut down the server, and execute step 304;
In this step, the second counter count is incremented by 1. For non-hot-swappable servers such as Pcie-Switch servers, the shutdown of the server end and the shutdown of the resource server can be controlled in sequence according to the timing set above.
Step 317: Wait for a certain time, and return to step 313.
If the gpu.txt file and the server.txt file are not detected, the server may not have finished booting yet. The controller can therefore wait for a certain time, such as 5 s, and then receive the startup completion information sent by the server again.
As shown in FIG. 5, an embodiment of the present invention provides a server to which any of the server reboot testing methods described above is applied. The server communicates with an external controller through an external switch and includes: a switch unit 501, a first judging unit 502, a reading and writing unit 503 and a generating unit 504, wherein,
The switch unit 501 is configured to power on and boot when it receives a power-on request sent by the external controller, and to trigger the first judging unit 502; and to perform a shutdown operation when it receives a shutdown request sent by the external controller;
The first judging unit 502 is configured to, when triggered by the switch unit 501, determine whether an Ispci-tmp file exists; if so, to trigger the reading and writing unit 503 and compare whether the Ispci-tmp file and the Ispci-$count file are consistent; otherwise, to trigger the generating unit 504;
The reading and writing unit 503 is configured to read the device information in the server, write the device information into the Ispci-$count file, and, when the Ispci-tmp file and the Ispci-$count file are consistent, send the startup completion information to the external controller and create the gpu.txt file and the server.txt file;
The generating unit 504 is configured to generate the Ispci-tmp file from the information of each device.
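The cooperation of units 502 to 504 at boot time could look roughly like the sketch below. It makes several assumptions that the source does not state: the function name `on_boot`, the returned status strings, and the idea that the device information is passed in as a list of text lines are all hypothetical. Sending the startup completion information to the controller is not modeled.

```python
from pathlib import Path


def on_boot(workdir: Path, device_lines: list[str], count: int) -> str:
    """Sketch of the server-side check performed after each boot."""
    baseline = workdir / "Ispci-tmp"
    current = "\n".join(device_lines) + "\n"

    # First judging unit 502: does the baseline file exist?
    if not baseline.exists():
        # Generating unit 504: record the device information as the baseline.
        baseline.write_text(current)
        return "baseline-created"

    # Reading and writing unit 503: write the snapshot for this boot.
    snapshot = workdir / f"Ispci-{count}"
    snapshot.write_text(current)

    # Compare the baseline with the snapshot.
    if snapshot.read_text() != baseline.read_text():
        return "mismatch"

    # Files are consistent: create the marker files the controller looks for.
    (workdir / "gpu.txt").touch()
    (workdir / "server.txt").touch()
    return "boot-complete"
```

Keeping one snapshot file per boot count means that, after a long run, the snapshot of any failed cycle remains available for inspection.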
In another embodiment of the present invention, the server is connected to the external switch through the OS network and the BMC network.
In yet another embodiment of the present invention, the above server further includes a second judging unit and a first counter (not shown in the figure), wherein,
The second judging unit is configured to determine whether a count file exists; if so, to trigger the first counter; otherwise, to start the first counter, generate the count file, and add the first counter to the server's boot startup items;
The first counter is configured to count the number of times the switch unit 501 powers on. Each time the switch unit 501 powers on, the first counter performs count+1 and stores the number of startups in the count file.
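A persistent boot counter of this kind can be sketched as below. The function name `record_boot` is hypothetical, and the initial value written on the first run (0) is an assumption, since the source does not say what the count file contains when it is first generated. Registering the counter in the server's boot startup items depends on the operating system and is not modeled.

```python
from pathlib import Path


def record_boot(count_file: Path) -> int:
    """Update the boot count stored in count_file and return the new value."""
    if not count_file.exists():
        # Second judging unit: no count file yet, so create it.
        # Assumption: the counter starts at 0 on the first run.
        count = 0
    else:
        # First counter: count + 1 on every subsequent boot.
        count = int(count_file.read_text().strip()) + 1
    count_file.write_text(f"{count}\n")
    return count
```

Because the value lives in a file rather than in memory, it survives each power cycle, which is what allows the snapshot for a given boot to be named after it.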
In another embodiment of the present invention, the above server is a Pcie-Switch server, which includes a resource server and a server end. A retimer card is inserted in the server end, and the server end is connected to the resource server through the retimer card and a MiniSASHD cable.
As shown in FIG. 6, an embodiment of the present invention provides a controller to which any of the server reboot testing methods described above is applied. The controller includes: a setting unit 601, a second counter 602, a detection unit 603 and a call control unit 604, wherein,
The setting unit 601 is configured to set a detection threshold;
The detection unit 603 is configured to determine whether the count of the second counter 602 is less than the detection threshold set by the setting unit 601; if so, to check whether the gpu.txt file and the server.txt file exist on the external server; and, if they do, to trigger the call control unit 604;
The call control unit 604 is configured to, when triggered by the detection unit 603, call the shutdown function of the external server to shut it down, send a power-on request to the external server, call the power-on function of the external server to boot it, and increment the count of the second counter 602 by 1.
Because the information exchange and execution processes among the units in the above devices are based on the same concept as the method embodiments of the present invention, refer to the description of the method embodiments for details; they are not repeated here.
As shown in FIG. 7, an embodiment of the present invention provides a server reboot test system, including: at least one server 701 of any of the kinds described above, a switch 702 and a controller 703, wherein,
The at least one server 701 and the controller 703 are each connected to the switch 702.
According to the above solutions, the server reboot testing method, server, controller and system provided by the embodiments of the present invention have at least the following beneficial effects:
1. Communication between the server and the controller is established through the switch. When the server receives a power-on request sent by the controller, it powers on and boots. The server determines whether an Ispci-tmp file exists on it; if so, it reads the device information in the server, writes the device information into the Ispci-$count file, and compares whether the Ispci-tmp file and the Ispci-$count file are consistent; otherwise, it generates the Ispci-tmp file from the device information in the server. When the Ispci-tmp file and the Ispci-$count file are consistent, the server sends the startup completion information to the controller and creates the gpu.txt file and the server.txt file. The server then receives the shutdown request sent by the controller and shuts down. With this method, the server can determine whether it has booted normally by checking whether the files exist and comparing them for consistency. In addition, both the startup and the shutdown of the server are performed automatically under the control of the controller, without manual intervention, which automates the server stability test.
2. The server is connected to the switch through the OS network and the BMC network, and the controller is connected to the switch through the OS network. This enables the controller to power the server on and off automatically, which ensures that the server stability test is automated. In addition, a startup sequence is set, and the resource server and the server end of the Pcie-Switch server are started in order according to that sequence, so that the stability test of a non-hot-swappable Pcie-Switch server can also be automated.
3. After the server finishes booting, it sends the startup completion information to the controller and creates the gpu.txt file and the server.txt file. Even after receiving the startup completion information, the controller still checks whether the gpu.txt file and the server.txt file exist on the server, which ensures that server startup is determined accurately.
4. A detection threshold is set, and the controller calls the power-on/shutdown functions to power the server on or off only when the count of the second counter count in the controller is less than the detection threshold, which prevents the stability test from entering an endless loop. In addition, the controller clears the operating system logs on the server, which prevents operations other than power-on and shutdown from affecting the server's stability and further improves the accuracy of the server stability test.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the direction of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above are only preferred embodiments of the present invention. They are intended only to illustrate the technical solutions of the present invention and not to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610202489.XA CN105912431A (en) | 2016-04-01 | 2016-04-01 | Reboot testing method of server, server, control device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105912431A true CN105912431A (en) | 2016-08-31 |
Family
ID=56745210
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610202489.XA Pending CN105912431A (en) | 2016-04-01 | 2016-04-01 | Reboot testing method of server, server, control device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105912431A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649014A (en) * | 2016-12-28 | 2017-05-10 | 郑州云海信息技术有限公司 | Automatic testing method of calculating type server which supports multiple GPUs |
CN108958995A (en) * | 2018-05-21 | 2018-12-07 | 郑州云海信息技术有限公司 | A kind of method and system of whole machine cabinet server stability test |
CN116244113A (en) * | 2023-02-22 | 2023-06-09 | 安芯网盾(北京)科技有限公司 | System downtime obstacle avoidance and restoration method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7818621B2 (en) * | 2007-01-11 | 2010-10-19 | International Business Machines Corporation | Data center boot order control |
CN104375910A (en) * | 2014-11-24 | 2015-02-25 | 浪潮电子信息产业股份有限公司 | Automatic power-on and power-off test method |
CN104536875A (en) * | 2015-01-16 | 2015-04-22 | 浪潮电子信息产业股份有限公司 | IPMI-based method for carrying out automatic restart test on server |
CN104899120A (en) * | 2015-05-27 | 2015-09-09 | 浪潮电子信息产业股份有限公司 | Server stability testing method based on BMC (baseboard management controller) startup and shutdown functions |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649014A (en) * | 2016-12-28 | 2017-05-10 | 郑州云海信息技术有限公司 | Automatic testing method of calculating type server which supports multiple GPUs |
CN108958995A (en) * | 2018-05-21 | 2018-12-07 | 郑州云海信息技术有限公司 | A kind of method and system of whole machine cabinet server stability test |
CN116244113A (en) * | 2023-02-22 | 2023-06-09 | 安芯网盾(北京)科技有限公司 | System downtime obstacle avoidance and restoration method and device |
CN116244113B (en) * | 2023-02-22 | 2023-12-19 | 安芯网盾(北京)科技有限公司 | System downtime obstacle avoidance and restoration method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103778038B (en) | Method and system for verifying cloud test and remote monitoring integrated circuit device | |
CN111552486B (en) | SSD firmware burning method and related components | |
CN108897646B (en) | A BIOS chip switching method and baseboard management controller | |
CN111966380A (en) | BMC (baseboard management controller) firmware upgrading method, system, terminal and storage medium | |
CN104615523A (en) | Fatigue test method for BMC management module based on IPMI protocol | |
CN105068900A (en) | Testing method for remote control server cold reboot | |
WO2024007510A1 (en) | Server management method, apparatus and system, and electronic device and readable storage medium | |
CN103605591A (en) | Method and device for controlling memory initialization of terminal system | |
CN112596750B (en) | Application testing method and device, electronic equipment and computer readable storage medium | |
CN110162435A (en) | A kind of server PXE starting test method, system, terminal and storage medium | |
CN110673867A (en) | CPLD online upgrade method, device and system | |
CN119473744B (en) | Link test method, electronic device, storage medium, product and computing device | |
CN115951949A (en) | Method, device and computing device for recovering configuration parameters of BIOS | |
CN105302726A (en) | Test method and device | |
CN105912431A (en) | Reboot testing method of server, server, control device and system | |
CN114780316A (en) | Memory test method, device and system | |
CN107329914A (en) | It is a kind of that the out of order method and device of hard disk is detected based on linux system | |
CN111352662B (en) | A server startup sequence control method, system, terminal and storage medium | |
CN105468123A (en) | Rack management controller, power management program update system and method | |
CN106603343A (en) | A method for testing stability of servers in batch | |
CN105718324A (en) | Simulation test method, device and equipment for abnormal power failure in mobile terminal upgrade process | |
CN111399871B (en) | System updating method, device, equipment and medium of HBA card | |
CN112817883A (en) | Method, device and system for adapting interface platform and computer readable storage medium | |
WO2017096889A1 (en) | Method and device for upgrading and downgrading system | |
CN115562900B (en) | AMD server system installation power-off processing method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160831 |