CN104021054A - Server fault visual detecting and processing method and system and programmable chip - Google Patents

Info

Publication number
CN104021054A
Authority
CN
China
Prior art keywords
fault
bmc
information
programmable chip
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410258508.1A
Other languages
Chinese (zh)
Inventor
郑天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410258508.1A
Publication of CN104021054A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, a system and a programmable chip for visual detection and processing of server faults. In the method, the programmable chip receives fault information sent by a BMC in a server; the programmable chip sends the fault information to an OLED display for display, determines the fault level from the fault information, and feeds back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to that policy. The method, system and programmable chip reduce the load on the BMC, allow faults to be located quickly, and support multi-node systems and redundant systems well. In addition, the programmable chip can determine the level of a fault and feed back a corresponding processing policy that drives the BMC to handle the fault.

Description

Server fault visual detection and processing method, system and programmable chip

Technical field

The invention relates to the field of computer application technology, and in particular to a method, a system and a programmable chip for visual detection and processing of server faults.

Background

High reliability is an important characteristic of a server, which makes fault detection an essential server function. At present, server fault detection mainly combines LED indicators with a fault manual: the user first reads the state of the LEDs and then consults the fault manual to locate the specific fault. The drawback of this approach is that a fault manual usually contains a large amount of information, so the lookup takes the user a long time and faults cannot be located quickly.

Second, in the traditional fault detection method the BMC controls the LED driver directly. As shown in Fig. 1, the Baseboard Management Controller (BMC) controls the LED driver directly over SMB (short for SMBus), so that the LED matrix can show the running state of the server at any time. To improve human-machine interaction, the content shown for fault detection is now expected to be richer and more timely. Meeting that expectation in this way, however, consumes a considerable share of the resources of the BMC, whose computing power is limited, and the display may even fail to update in time. Moreover, the traditional method only suits single-node, non-redundant systems, because the BMC of each node can only control its own corresponding LED matrix; the architecture is not suitable for multi-node systems or redundant systems.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method, a system and a programmable chip for visual detection and processing of server faults that not only reduce the load on the BMC but also allow faults to be located quickly.

To solve the above technical problem, the present invention provides a method for visual detection and processing of server faults, comprising:

a programmable chip receives fault information sent by the baseboard management controller (BMC) in a server;

the programmable chip sends the fault information to an OLED display for display, determines a fault level from the fault information, and feeds back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy.

Further, the method also comprises:

the programmable chip receives working status information of the server sent by the BMC;

the programmable chip sends the working status information to the OLED display for display.

Further, the programmable chip communicates with the BMC using the Intelligent Platform Management Interface (IPMI) protocol; the programmable chip receives the fault information sent by the BMC over an I2C bus and receives the working status information sent by the BMC over a universal asynchronous receiver/transmitter (UART) interface; the programmable chip feeds back the corresponding processing policy to the BMC over the UART interface;

wherein the programmable chip is a Programmable System on Chip (PSOC) series chip.

Further, the fault information comprises location information and error information;

the method also comprises: the BMC locates the fault by communicating with an FPGA and sends the fault information to the programmable chip in data-packet format; the location information identifies the hardware in which the fault occurred, and the error information is the error report of that faulty hardware.

Further, the method also comprises: setting a corresponding threshold and processing policy for each fault level,

wherein determining the fault level from the fault information and feeding back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy, comprises:

if the error information reaches the threshold corresponding to a fault level, determining that the fault is of that fault level, looking up the corresponding processing policy, and sending the processing policy to the BMC, wherein the processing policy also includes the location information and the fault level.

Further, there are one or more BMCs.

To solve the above technical problem, the present invention also provides a programmable chip for visual detection and processing of server faults, comprising:

a receiving module, configured to receive fault information sent by a BMC;

a display control module, configured to send the fault information to an OLED display for display;

a fault control module, configured to determine a fault level from the fault information and feed back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy.

Further, the programmable chip communicates with the BMC using the IPMI protocol;

the receiving module is further configured to receive the fault information over the I2C bus;

the display control module is further configured to send the fault information to the OLED display over a Serial Peripheral Interface (SPI);

the fault control module is further configured to feed back the corresponding processing policy to the BMC over a UART interface.

Further, the fault information comprises location information and error information; the location information is the position of the faulty hardware, and the error information is the error report of the faulty hardware.

The fault control module is further configured to set a corresponding threshold and processing policy for each fault level;

wherein the fault control module determining the fault level from the fault information and feeding back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy, comprises:

if the error information reaches the threshold corresponding to a fault level, determining that the fault is of that fault level, looking up the corresponding processing policy, and sending the processing policy to the BMC, wherein the processing policy also includes the location information and the fault level.

To solve the above technical problem, the present invention also provides a system for visual detection and processing of server faults, comprising: one or more BMCs, the programmable chip described above, and an OLED display.

Compared with the prior art, the method, system and programmable chip for visual detection and processing of server faults provided by at least one embodiment of the present invention add an OLED (organic light-emitting diode) display to the server and place a programmable chip between the OLED display and the BMC; the user can then locate the fault point quickly from the fault information shown on the OLED. In another embodiment, a PSOC (Programmable System On Chip) series chip is used, whose inputs can connect to one or more BMCs and whose output goes to a single OLED display, so that the BMC controls the PSOC and the PSOC drives the OLED. This not only reduces the load on the BMC but also gives the customer richer, real-time information about how the server is running, and it supports multi-node systems and redundant systems well. In another embodiment, the programmable chip can determine the level of a fault and feed back a corresponding processing policy that drives the BMC to handle the fault accordingly.

Brief description of the drawings

Fig. 1 is a schematic diagram of fault detection in the prior art;

Fig. 2 is a structural diagram of the server fault visual detection and processing device in the embodiment;

Fig. 3 is a structural diagram of the programmable chip in the embodiment;

Fig. 4 is a flowchart of the server fault visual detection and processing method in the embodiment.

Detailed description

To make the purpose, technical solution and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. Note that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another arbitrarily.

Embodiment:

As shown in Fig. 2, this embodiment provides a system for visual detection and processing of server faults, comprising a BMC, a programmable chip and an OLED display, wherein:

The BMC is connected to the programmable chip and communicates with it through the Intelligent Platform Management Interface (IPMI). The BMC sends the working status information of the server over a Universal Asynchronous Receiver/Transmitter (UART) interface and sends fault information to the programmable chip over the I2C bus; the programmable chip feeds back the fault handling policy to the BMC over the UART interface;
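The link assignment in this embodiment can be summarized as a small lookup table. The following is only an illustrative sketch: the names `Link`, `Message`, `ROUTES` and `link_for` are hypothetical, and no real bus driver is modelled.

```python
from enum import Enum


class Link(Enum):
    I2C = "I2C"
    UART = "UART"
    SPI = "SPI"


class Message(Enum):
    FAULT_INFO = "fault information"        # BMC -> programmable chip
    STATUS_INFO = "working status"          # BMC -> programmable chip
    POLICY = "fault handling policy"        # programmable chip -> BMC
    DISPLAY_DATA = "display data"           # programmable chip -> OLED display


# (sender, receiver, link) for each kind of message, as described in the embodiment.
ROUTES = {
    Message.FAULT_INFO: ("BMC", "programmable chip", Link.I2C),
    Message.STATUS_INFO: ("BMC", "programmable chip", Link.UART),
    Message.POLICY: ("programmable chip", "BMC", Link.UART),
    Message.DISPLAY_DATA: ("programmable chip", "OLED display", Link.SPI),
}


def link_for(message: Message) -> Link:
    """Return the physical link that carries the given kind of message."""
    return ROUTES[message][2]
```

Keeping the mapping in one table makes it easy to see that fault traffic and status traffic arrive on different links, while policy feedback shares the UART link with status traffic.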

The BMC obtains the working status information or fault information of the server by communicating with an FPGA;

The programmable chip is connected to the OLED display and sends the fault information to the OLED display over a Serial Peripheral Interface (SPI).

The programmable chip may be a PSOC series chip, for example a Cypress PSOC4 series chip produced by Cypress.

Preferably, the system includes one or more BMCs. By connecting to several BMCs and driving the OLED display to show their fault information, a single programmable chip can support multi-node systems and redundant systems. For example, an eight-way server (eight CPUs) can have two working modes, an eight-way mode and a dual four-way mode. In the eight-way mode one BMC is active; in the dual four-way mode two BMCs are active. The number of BMCs is determined by the working mode.

As shown in Fig. 3, this embodiment provides a programmable chip for visual detection and processing of server faults, comprising:

a receiving module, configured to receive fault information sent by a BMC;

a display control module, configured to send the fault information to an OLED display for display;

a fault control module, configured to determine a fault level from the fault information and feed back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy.

There are one or more BMCs. The display control module sends the fault information of the one or more BMCs to the OLED display for display; the fault control module feeds the processing policy corresponding to a BMC back to the server to which that BMC belongs.

The programmable chip communicates with the BMC using the IPMI protocol;

the receiving module is further configured to receive the fault information over the I2C bus;

the display control module is further configured to send the fault information to the OLED display over a Serial Peripheral Interface (SPI);

the fault control module is further configured to feed back the corresponding processing policy to the BMC over a UART interface.

The fault information comprises location information and error information; the location information is the position of the faulty hardware, and the error information is the error report of the faulty hardware. For example, if the temperature on memory board No. 1 is too high, board No. 1 is the location information and the temperature is the error information. The BMC locates the fault by communicating with a Field Programmable Gate Array (FPGA) and sends the fault information to the programmable chip in data-packet format; for example, the fault information may take the form CPU0 error temperature65°;
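To make the split between location information and error information concrete, the sketch below parses a packet shaped like the single example given in the text. The field layout, the regular expression and the names `FaultInfo` and `parse_fault` are assumptions for illustration; the patent does not define the packet format beyond that one example.

```python
import re
from dataclasses import dataclass


@dataclass
class FaultInfo:
    location: str   # location information, e.g. "CPU0"
    metric: str     # what the error report is about, e.g. "temperature"
    value: float    # reported reading, e.g. 65.0


# Assumed layout: "<location> error <metric><value>°" (degree sign optional).
_PACKET = re.compile(
    r"^(?P<location>\w+)\s*error\s+(?P<metric>[A-Za-z]+)\s*(?P<value>\d+(?:\.\d+)?)°?$"
)


def parse_fault(packet: str) -> FaultInfo:
    """Split a fault packet into location information and error information."""
    match = _PACKET.match(packet.strip())
    if match is None:
        raise ValueError(f"unrecognized fault packet: {packet!r}")
    return FaultInfo(match["location"], match["metric"], float(match["value"]))
```

For instance, `parse_fault("CPU0 error temperature65°")` separates the location `CPU0` from the temperature reading, which is the part a threshold comparison would use.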

The fault control module is further configured to set a corresponding threshold and processing policy for each fault level;

wherein the fault control module determining the fault level from the fault information and feeding back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy, comprises:

if the error information reaches the threshold corresponding to a fault level, determining that the fault is of that fault level, looking up the corresponding processing policy, and sending the processing policy to the BMC, wherein the processing policy also includes the location information and the fault level.

For example, three fault levels are set: warning, severe warning and unrecoverable fault. Taking the CPU as an example, the thresholds for the three levels are set to 70°, 80° and 90° respectively; that is, a CPU temperature of 70° in the error information is a warning, 80° is a severe warning, and 90° is an unrecoverable fault. The processing policy for a warning or a severe warning can be to check whether the server fan is present and running, while the processing policy for an unrecoverable fault can be to shut down directly. After receiving the processing policy from the programmable chip, the BMC processes the server according to that policy.
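The threshold lookup for the CPU temperature example can be sketched as follows. This is a minimal illustration rather than the chip firmware: the table name, the function name `build_policy` and the dictionary layout are hypothetical, and the sketch assumes that direct shutdown belongs to the unrecoverable level, as in the application example where an unrecoverable fault leads to shutdown.

```python
from typing import Optional

# (threshold, fault level, action), highest threshold first.
# The values follow the CPU temperature example in the text.
CPU_TEMPERATURE_LEVELS = [
    (90.0, "unrecoverable fault", "shut down"),
    (80.0, "severe warning", "check that the fan is present and running"),
    (70.0, "warning", "check that the fan is present and running"),
]


def build_policy(location: str, temperature: float) -> Optional[dict]:
    """Return the policy for the highest level whose threshold is reached.

    The policy carries the location information and the fault level along
    with the action. None means no threshold was reached.
    """
    for threshold, level, action in CPU_TEMPERATURE_LEVELS:
        if temperature >= threshold:
            return {"location": location, "level": level, "action": action}
    return None
```

Checking the thresholds from highest to lowest ensures that a reading of 90° is reported as unrecoverable rather than as a warning, even though it also exceeds the two lower thresholds.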

In addition, the fault levels may also include a general warning. A general warning is first sent to the OLED for display, and the OLED also shows the options available to the user for handling the fault. The user selects an option with the keys, and only then does the programmable chip send the selected handling method to the BMC as the processing policy for the general warning.

The programmable chip is a Programmable System on Chip (PSOC) series chip.

As shown in Fig. 4, this embodiment provides a method for visual detection and processing of server faults, comprising the following steps:

S101: the programmable chip receives the fault information sent by the BMC;

The BMC monitors in real time whether the server has failed and, when it detects a fault, sends the fault information to the programmable chip;

The fault information comprises location information and error information; for example, if the temperature on memory board No. 1 is too high, board No. 1 is the location information and the temperature is the error information. The location information identifies the faulty hardware, for example the CPU, a hard disk, a fan, a memory board (MRB) or a power supply unit (PSU). The error information is the error report of the faulty hardware, for example CPU voltage and temperature errors, disk errors or fan errors. The BMC locates the fault by communicating with the FPGA and then sends the fault information to the PSOC in data-packet format. For example, the fault information may take the form CPU0 error temperature65°.

Preferably, one or more BMCs are connected to the programmable chip. By connecting to several BMCs and driving the OLED display to show their fault information, a single programmable chip can support multi-node systems and redundant systems.

S102: the programmable chip sends the fault information to the OLED display for display, determines the fault level from the fault information, and feeds back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy.

The method also comprises: setting a corresponding threshold and processing policy for each fault level,

wherein determining the fault level from the fault information and feeding back a corresponding processing policy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing policy, comprises:

if the error information reaches the threshold corresponding to a fault level, determining that the fault is of that fault level, looking up the corresponding processing policy, and sending the processing policy to the BMC, wherein the processing policy also includes the location information and the fault level.

For example, three fault levels are set: warning, severe warning and unrecoverable fault. Taking the CPU as an example, the thresholds for the three levels are set to 70°, 80° and 90° respectively; that is, a CPU temperature of 70° in the error information is a warning, 80° is a severe warning, and 90° is an unrecoverable fault. The processing policy for a warning or a severe warning can be to check whether the server fan is present and running, while the processing policy for an unrecoverable fault can be to shut down directly. After receiving the processing policy from the programmable chip, the BMC processes the server according to that policy.

In addition, the fault levels may also include a general warning. A general warning is first sent to the OLED for display, and the OLED also shows the options available to the user for handling the fault. The user selects an option with the keys, and only then does the programmable chip send the selected handling method to the BMC as the processing policy for the general warning.
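The general-warning flow differs from the other levels in that the policy is chosen by the user rather than looked up. A minimal sketch of that ordering follows; the function name and the three callbacks (`show`, `read_key`, `send_to_bmc`) are hypothetical stand-ins for the OLED, the keys and the link to the BMC.

```python
from typing import Callable, Sequence


def handle_general_warning(
    fault: str,
    options: Sequence[str],
    show: Callable[[str, Sequence[str]], None],
    read_key: Callable[[], int],
    send_to_bmc: Callable[[dict], None],
) -> dict:
    """Show the fault and its options, wait for the user's choice, then notify the BMC."""
    show(fault, options)                 # the OLED shows the fault and the selectable options
    index = read_key()                   # the user picks an option with the keys
    if not 0 <= index < len(options):
        raise ValueError(f"no option at index {index}")
    policy = {"fault": fault, "level": "general warning", "action": options[index]}
    send_to_bmc(policy)                  # only now is a policy sent to the BMC
    return policy
```

The point of the sketch is the ordering: nothing is sent to the BMC until the user has made a selection.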

Preferably, there are one or more BMCs. The programmable chip receives the fault information sent by each BMC and sends the fault information of the one or more BMCs to the OLED display for display; it feeds the processing policy corresponding to a BMC back to the server to which that BMC belongs.
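With several BMCs attached, the chip has to remember which BMC reported each fault so that the policy returns to the right node. The sketch below illustrates that bookkeeping only; the class `FaultRouter`, the integer BMC identifiers and the in-memory outbox are assumptions, not part of the patent.

```python
from collections import defaultdict


class FaultRouter:
    """Collects faults from several BMCs for one display and queues policies per BMC."""

    def __init__(self) -> None:
        self.display_lines = []            # everything shown on the single OLED display
        self.outbox = defaultdict(list)    # policies waiting to be sent, keyed by BMC id

    def on_fault(self, bmc_id: int, fault: str, policy: dict) -> None:
        # All faults share one display, so each line is tagged with its source BMC.
        self.display_lines.append(f"BMC{bmc_id}: {fault}")
        # The policy goes back only to the BMC that reported the fault.
        self.outbox[bmc_id].append(policy)
```

In a dual four-way configuration, for example, two BMC identifiers would be in use and each would receive only its own policies.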

In addition, when the BMC detects no fault, that is, when the server is working normally, the method also comprises:

The BMC sends the working status information of the server to the programmable chip; the programmable chip sends the working status information to the OLED display for display.

The working status information shown on the OLED display includes: server hardware configuration (for example operating system version, CPU model, memory size and hard disk size), network configuration parameters (for example IP address, gateway and subnet mask), BMC firmware version, OLED firmware version, a user-defined string, CPU, hard disk and memory utilization while the server is running, the customer service telephone number, and so on.

In this embodiment a PSOC chip produced by Cypress is preferred. A PSOC chip is a PSOC system microcontroller: around a proprietary MCU (microcontroller unit) core it integrates PSOC blocks, an array of configurable analog and digital peripherals, and it uses the programmable interconnect array inside the chip to configure the on-chip analog and digital block resources, thereby realizing a programmable system on chip. The PSOC is placed between the BMC and the OLED, and the code needed to control the OLED display is programmed into the PSOC.

The programmable chip communicates with the BMC using the Intelligent Platform Management Interface (IPMI) protocol. The programmable chip receives the fault information sent by the BMC over the I2C bus, a two-wire serial bus developed by Philips. The programmable chip receives the working status information sent by the BMC over a universal asynchronous receiver/transmitter (UART) interface, and it feeds back the corresponding processing policy to the BMC over the UART interface.

In an application example, the programmable chip is a Cypress PSOC4 chip. The PSOC4 chip communicates with the BMC over the UART interface to obtain the normal working status and performance information of the server and to feed the corresponding fault handling policy back to the BMC; it communicates with the BMC over the I2C bus to obtain the fault information when the server fails.

The PSOC4 passes the parsed data over SPI into the buffer of the OLED display, and the OLED display renders the data in the buffer.

When the server has no fault, the PSOC4 chip communicates with the BMC using the IPMI protocol and sends the server hardware configuration, network parameters, utilization and other information it obtains to the OLED display for display.

When the server fails, the BMC sends the fault information to the PSOC4 chip over the I2C bus, and the PSOC4 chip sends it to the OLED display. The OLED display leaves the normal server display and flashes the fault information as an alarm for two seconds. When the alarm finishes, the display returns to the normal interface and shows an alarm flag at the upper left of the screen. From the alarm flag the user can tell whether a fault exists and whether it has been handled. If the fault has not been handled, the user can use the keys to apply simple handling to the located fault source, for example resetting the control chip of the fault source. To view the fault information, the user must use the keys to enter the fault information menu item, which displays it.
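The display behaviour on a fault can be modelled as a small state machine. The sketch below is an assumption-laden illustration: the class `OledState`, its method names and the use of caller-supplied timestamps in seconds are hypothetical; only the two-second flash and the alarm flag after returning to the normal interface come from the text.

```python
class OledState:
    """Tracks which screen is shown and whether the alarm flag is visible."""

    FLASH_SECONDS = 2.0  # duration of the flashing alarm, per the text

    def __init__(self) -> None:
        self.screen = "normal"
        self.alarm_flag = False
        self._flash_until = 0.0

    def on_fault(self, now: float) -> None:
        # Leave the normal display and start flashing the fault information.
        self.screen = "fault_flash"
        self._flash_until = now + self.FLASH_SECONDS

    def tick(self, now: float) -> None:
        # Once the flash period is over, return to normal and show the alarm flag.
        if self.screen == "fault_flash" and now >= self._flash_until:
            self.screen = "normal"
            self.alarm_flag = True

    def on_fault_handled(self) -> None:
        # Clearing the flag tells the user that the fault has been handled.
        self.alarm_flag = False
```

Passing the current time in explicitly keeps the sketch free of real timers, so the same logic could be driven by a firmware tick or by a test.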

同时,PSOC4芯片根据所述故障信息判断故障级别,根据所述故障级别反馈相应的处理策略至所述BMC,以驱动BMC根据所述处理策略对服务器进行相应处理。例如,PSOC4芯片判断发生了不可恢复故障,PSOC4芯片直接反馈关机的处理策略给BMC以驱动BMC对服务器进行关机处理。At the same time, the PSOC4 chip judges the fault level according to the fault information, and feeds back a corresponding processing strategy to the BMC according to the fault level, so as to drive the BMC to process the server accordingly according to the processing strategy. For example, the PSOC4 chip judges that an unrecoverable fault has occurred, and the PSOC4 chip directly feeds back a shutdown processing strategy to the BMC to drive the BMC to shutdown the server.

从上述实施例可以看出,相对于现有技术,上述实施例中提供的的至少一个实施例提供的服务器故障可视化侦测及处理方法、装置及可编程芯片,在服务器上增加OLED显示器,在OLED显示器与BMC之间加上可编程芯片,此外,用户通过OLED显示的故障信息,可以快速定位到故障点;在另一个实施例中,采用PSOC系列芯片,输入可以连接一个或多个BMC,输出到一个OLED显示器上,以实现由BMC控制PSOC驱动OLED显示的模式,不仅能够减少BMS的占用率,同时能够让客户了解更丰富的服务器的即时运行情况,并且对多节点系统和冗余系统能够很好地支持。在另一个实施例中,可编程芯片能够判断发生的故障级别,并且反馈相应的处理策略来驱使BMC对故障进行相应处理。It can be seen from the above embodiments that, compared to the prior art, at least one embodiment provided in the above embodiments provides a server failure visual detection and processing method, device, and programmable chip, adding an OLED display to the server, and adding an OLED display to the server. A programmable chip is added between the OLED display and the BMC. In addition, the user can quickly locate the fault point through the fault information displayed on the OLED; in another embodiment, the PSOC series chip is used, and the input can be connected to one or more BMCs. Output to an OLED display to realize the mode that the BMC controls the PSOC to drive the OLED display, which can not only reduce the occupancy rate of the BMS, but also allow customers to know more about the real-time operation of the server, and support multi-node systems and redundant systems able to support well. In another embodiment, the programmable chip can determine the level of the fault, and feed back a corresponding processing strategy to drive the BMC to handle the fault accordingly.

Those of ordinary skill in the art will understand that all or some of the steps of the above method can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in hardware or as a software function module. The present invention is not limited to any specific combination of hardware and software.

The above describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Many other embodiments are possible in light of this disclosure, and those skilled in the art can make corresponding changes and variations without departing from the spirit and essence of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A server fault visual detection and processing method, comprising:
a programmable chip receiving fault information sent by a baseboard management controller (BMC) in a server;
the programmable chip sending the fault information to an OLED display for display, determining a fault level according to the fault information, and feeding back a corresponding processing strategy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing strategy.

2. The method of claim 1, further comprising:
the programmable chip receiving working status information of the server sent by the BMC;
the programmable chip sending the working status information to the OLED display for display.

3. The method of claim 2, wherein:
the programmable chip communicates with the BMC using the Intelligent Platform Management Interface (IPMI) protocol; the programmable chip receives the fault information sent by the BMC over an I2C bus and receives the working status information sent by the BMC over a universal asynchronous receiver/transmitter (UART) interface; the programmable chip feeds back the corresponding processing strategy to the BMC over the UART interface;
wherein the programmable chip is a programmable system-on-chip (PSOC) series chip.

4. The method of claim 1, wherein:
the fault information includes positioning information and error information;
the method further comprises: the BMC locating the fault by communicating with an FPGA and sending the fault information to the programmable chip in data packet format; the positioning information is information on the hardware that has failed, and the error information is the error information of the failed hardware.

5. The method of claim 4, wherein:
the method further comprises setting a corresponding threshold and processing strategy for each fault level;
the determining of the fault level according to the fault information and the feeding back of the corresponding processing strategy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing strategy, comprises:
if the error information reaches the threshold corresponding to a fault level, determining that the fault that has occurred is of that fault level, looking up the corresponding processing strategy, and sending the processing strategy to the BMC, wherein the processing strategy further includes the positioning information and the fault level.

6. The method of any one of claims 1 to 5, wherein:
there are one or more BMCs.

7. A programmable chip for visual detection and processing of server faults, comprising:
a receiving module, configured to receive fault information sent by a BMC;
a display control module, configured to send the fault information to an OLED display for display;
a fault control module, configured to determine a fault level according to the fault information and to feed back a corresponding processing strategy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing strategy.

8. The programmable chip of claim 7, wherein:
the programmable chip communicates with the BMC using the IPMI protocol;
the receiving module is further configured to receive the fault information over an I2C bus;
the display control module is further configured to send the fault information to the OLED display over a serial peripheral interface (SPI);
the fault control module is further configured to feed back the corresponding processing strategy to the BMC over a UART interface.

9. The programmable chip of claim 8, wherein:
the fault information includes positioning information and error information; the positioning information is the location information of the hardware that has failed, and the error information is the error information of the failed hardware;
the fault control module is further configured to set a corresponding threshold and processing strategy for each fault level;
the fault control module being configured to determine the fault level according to the fault information and to feed back the corresponding processing strategy to the BMC according to the fault level, so as to drive the BMC to process the server according to the processing strategy, comprises:
if the error information reaches the threshold corresponding to a fault level, determining that the fault that has occurred is of that fault level, looking up the corresponding processing strategy, and sending the processing strategy to the BMC, wherein the processing strategy further includes the positioning information and the fault level.

10. A server fault visual detection and processing system, comprising: one or more BMCs, the programmable chip according to claims 7 to 9, and an OLED display.
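The threshold-based classification in claims 5 and 9 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the claims do not say how error information is encoded, so it is assumed here to be a single integer (`error_value`); the threshold numbers, action strings, and the names `FaultPacket`, `Strategy`, `LEVEL_TABLE`, and `classify` are hypothetical. The claims also do not say what happens when several thresholds are reached, so the sketch assumes the most severe matching level wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultPacket:
    location: str      # positioning information: which hardware failed
    error_value: int   # error information, assumed here to be a numeric value


@dataclass(frozen=True)
class Strategy:
    action: str        # what the BMC is asked to do
    location: str      # positioning information carried in the strategy
    level: int         # fault level carried in the strategy


# Hypothetical (level, threshold, action) rows, ordered from most to least severe.
LEVEL_TABLE = [
    (3, 100, "shutdown"),
    (2, 10, "reset_component"),
    (1, 1, "log_only"),
]


def classify(packet: FaultPacket) -> Optional[Strategy]:
    """Pick the most severe level whose threshold the error value reaches."""
    for level, threshold, action in LEVEL_TABLE:
        if packet.error_value >= threshold:
            return Strategy(action, packet.location, level)
    return None  # below every threshold: no strategy is fed back
```

For example, `classify(FaultPacket("DIMM_A1", 150))` yields a strategy with action `"shutdown"`, level 3, and the original location, matching the requirement that the strategy sent to the BMC also carries the positioning information and the fault level.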

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410258508.1A CN104021054A (en) 2014-06-11 2014-06-11 Server fault visual detecting and processing method and system and programmable chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410258508.1A CN104021054A (en) 2014-06-11 2014-06-11 Server fault visual detecting and processing method and system and programmable chip

Publications (1)

Publication Number Publication Date
CN104021054A true CN104021054A (en) 2014-09-03

Family

ID=51437822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410258508.1A Pending CN104021054A (en) 2014-06-11 2014-06-11 Server fault visual detecting and processing method and system and programmable chip

Country Status (1)

Country Link
CN (1) CN104021054A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503886A (en) * 2014-12-03 2015-04-08 浪潮集团有限公司 Design method for driving OLED by utilizing PSOC
CN104598346A (en) * 2015-02-15 2015-05-06 浪潮电子信息产业股份有限公司 Monitoring management device and method for rapid fault location in server system
CN105446657A (en) * 2015-11-11 2016-03-30 浪潮电子信息产业股份有限公司 Method for monitoring RAID card
CN105893196A (en) * 2016-04-05 2016-08-24 浪潮电子信息产业股份有限公司 Server debugging auxiliary tool and system
CN106407059A (en) * 2016-09-28 2017-02-15 郑州云海信息技术有限公司 Server node testing system and method
CN106557392A (en) * 2015-09-29 2017-04-05 鸿富锦精密工业(深圳)有限公司 Server failure detection means and method
CN106710512A (en) * 2017-02-28 2017-05-24 郑州云海信息技术有限公司 Display device and display method
CN106844162A (en) * 2017-02-25 2017-06-13 郑州云海信息技术有限公司 Storage server cabinet management system and method based on BMC
CN107193701A (en) * 2017-06-06 2017-09-22 郑州云海信息技术有限公司 Server master board and method for diagnosing faults with fault diagnosis functions
CN107203458A (en) * 2017-05-23 2017-09-26 郑州云海信息技术有限公司 A kind of server state information display device and method
CN107294786A (en) * 2017-07-13 2017-10-24 郑州云海信息技术有限公司 A kind of failure information processing method and device
CN107608925A (en) * 2017-10-09 2018-01-19 郑州云海信息技术有限公司 A kind of Server Extension card information acquisition methods and device
CN107621988A (en) * 2017-09-06 2018-01-23 郑州云海信息技术有限公司 A method and system for locating downtime faults in DC testing
CN107729169A (en) * 2017-09-25 2018-02-23 郑州云海信息技术有限公司 A kind of long range positioning method and apparatus of four components server node respective disc position
CN108369501A (en) * 2015-12-15 2018-08-03 华为技术有限公司 In real-time system the room and time perceptual organization of component be isolated
CN108429643A (en) * 2018-02-28 2018-08-21 郑州云海信息技术有限公司 Method, device and equipment for server fault management
CN109062771A (en) * 2018-07-11 2018-12-21 郑州云海信息技术有限公司 A kind of server monitoring OLED module redundancy approach, device, equipment and storage medium
CN111142643A (en) * 2019-12-25 2020-05-12 浪潮商用机器有限公司 A method, device and system for modifying power supply strategy of a power chip
CN111767184A (en) * 2020-09-01 2020-10-13 苏州浪潮智能科技有限公司 A kind of fault diagnosis method, device, electronic equipment and storage medium
CN114020561A (en) * 2021-10-22 2022-02-08 苏州浪潮智能科技有限公司 Fault reporting method, system, device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1496435A1 (en) * 2003-07-11 2005-01-12 Yogitech Spa Dependable microcontroller, method for designing a dependable microcontroller and computer program product therefor
US20060294207A1 (en) * 2005-06-09 2006-12-28 International Business Machines Corporation Apparatus and method for autonomically adjusting configuration parameters for a server when a different server fails
CN102082781A (en) * 2009-11-27 2011-06-01 宏正自动科技股份有限公司 Server management system and method thereof
CN102609350A (en) * 2012-02-15 2012-07-25 浪潮电子信息产业股份有限公司 Server memory failure alarm method
CN103425545A (en) * 2013-08-20 2013-12-04 浪潮电子信息产业股份有限公司 System fault tolerance method for multiprocessor server
CN103744774A (en) * 2014-01-23 2014-04-23 浪潮电子信息产业股份有限公司 Server fault visualizing and rapid diagnosing method

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503886A (en) * 2014-12-03 2015-04-08 浪潮集团有限公司 Design method for driving OLED by utilizing PSOC
CN104598346A (en) * 2015-02-15 2015-05-06 浪潮电子信息产业股份有限公司 Monitoring management device and method for rapid fault location in server system
CN106557392A (en) * 2015-09-29 2017-04-05 鸿富锦精密工业(深圳)有限公司 Server failure detection means and method
CN105446657A (en) * 2015-11-11 2016-03-30 浪潮电子信息产业股份有限公司 Method for monitoring RAID card
CN105446657B (en) * 2015-11-11 2018-06-19 浪潮电子信息产业股份有限公司 Method for monitoring RAID card
CN108369501A (en) * 2015-12-15 2018-08-03 华为技术有限公司 In real-time system the room and time perceptual organization of component be isolated
CN105893196A (en) * 2016-04-05 2016-08-24 浪潮电子信息产业股份有限公司 Server debugging auxiliary tool and system
CN106407059A (en) * 2016-09-28 2017-02-15 郑州云海信息技术有限公司 Server node testing system and method
CN106844162A (en) * 2017-02-25 2017-06-13 郑州云海信息技术有限公司 Storage server cabinet management system and method based on BMC
CN106710512A (en) * 2017-02-28 2017-05-24 郑州云海信息技术有限公司 Display device and display method
CN107203458A (en) * 2017-05-23 2017-09-26 郑州云海信息技术有限公司 A kind of server state information display device and method
CN107193701A (en) * 2017-06-06 2017-09-22 郑州云海信息技术有限公司 Server master board and method for diagnosing faults with fault diagnosis functions
CN107294786A (en) * 2017-07-13 2017-10-24 郑州云海信息技术有限公司 A kind of failure information processing method and device
CN107621988A (en) * 2017-09-06 2018-01-23 郑州云海信息技术有限公司 A method and system for locating downtime faults in DC testing
CN107729169A (en) * 2017-09-25 2018-02-23 郑州云海信息技术有限公司 A kind of long range positioning method and apparatus of four components server node respective disc position
CN107608925A (en) * 2017-10-09 2018-01-19 郑州云海信息技术有限公司 A kind of Server Extension card information acquisition methods and device
CN108429643A (en) * 2018-02-28 2018-08-21 郑州云海信息技术有限公司 Method, device and equipment for server fault management
CN109062771A (en) * 2018-07-11 2018-12-21 郑州云海信息技术有限公司 A kind of server monitoring OLED module redundancy approach, device, equipment and storage medium
CN111142643A (en) * 2019-12-25 2020-05-12 浪潮商用机器有限公司 A method, device and system for modifying power supply strategy of a power chip
CN111142643B (en) * 2019-12-25 2021-07-16 浪潮商用机器有限公司 A method, device and system for modifying power supply strategy of a power chip
CN111767184A (en) * 2020-09-01 2020-10-13 苏州浪潮智能科技有限公司 A kind of fault diagnosis method, device, electronic equipment and storage medium
CN114020561A (en) * 2021-10-22 2022-02-08 苏州浪潮智能科技有限公司 Fault reporting method, system, device, computer equipment and storage medium
CN114020561B (en) * 2021-10-22 2024-05-24 苏州浪潮智能科技有限公司 Fault reporting method, system, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104021054A (en) Server fault visual detecting and processing method and system and programmable chip
US8156253B2 (en) Computer system, device sharing method, and device sharing program
CN110457164A (en) Device management method, device and server
CN105224362A (en) Host computer carries out the method and system of program upgrade to slave computer
CN105808407B (en) Method for managing devices, device and device management controller
CN114003445B (en) BMC I2C monitoring function test method, system, terminal and storage medium
CN105095001A (en) Virtual machine exception recovery method under distributed environment
WO2015035574A1 (en) Failure processing method, computer system, and apparatus
CN114218004B (en) Fault processing method and system of Kubernetes cluster physical node based on BMC
CN105183600A (en) Device and method for remotely positioning hard disk fault
CN109032867A (en) A kind of method for diagnosing faults, device and equipment
CN105183575A (en) Processor fault diagnosis method, device and system
CN106547660A (en) Baseboard management controller state detecting system and method
CN116483613A (en) Processing method and device for faulty memory stick, electronic equipment and storage medium
CN101820359A (en) Fault processing method and equipment for network equipment
CN103561089B (en) Virtual machine desktop log-in, Apparatus and system
CN103986588B (en) Remote control method for computer system and computer device
CN100375440C (en) Network connection backup system
CN105717820B (en) A kind of redundancy backup detection method of AUV
CN203241747U (en) A Redundant Design IoT System
US7921327B2 (en) System and method for recovery from uncorrectable bus errors in a teamed NIC configuration
CN100550771C (en) Realize the method and system of long-distance loading monoboard fastener
CN104950880B (en) Industrial control equipment debugging system and method
CN115599617B (en) Bus detection method, device, server and electronic equipment
US8520566B2 (en) Network connection method with auto-negotiation mechanism, network apparatus having auto-negotiation mechanism and network connection method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140903
