CN113434324A - Abnormal information acquisition method, system, device and storage medium - Google Patents
- Publication number
- CN113434324A (application CN202110728525.7A)
- Authority
- CN
- China
- Prior art keywords
- slave
- host
- machine
- shared memory
- abnormal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4282—Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention provides a method, system, device and storage medium for acquiring exception information. The method includes: when an exception occurs in a slave, the slave acquires exception information through a hook function in its kernel program; the slave writes the exception information into memory shared between the host and the slave; and the slave sends an interrupt signal to the host, the host being configured to read the exception information from the shared memory corresponding to the slave after receiving the interrupt signal. With the present invention, when a slave fails, the exception information can be acquired through the hook function in the kernel program, written into the shared memory, and announced to the host by an interrupt signal; upon receiving the interrupt signal, the host can read the exception information from the shared memory. This solves the prior-art problem that exception information cannot be acquired and recorded when a kernel error occurs.
Description
Technical Field
The present invention relates to the technical field of data processing, and in particular to a method, system, device and storage medium for acquiring exception information.
Background
PCIe (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard. It uses high-speed, serial, point-to-point, dual-channel, high-bandwidth transmission; each connected device is allocated dedicated channel bandwidth and does not share bus bandwidth. It mainly supports functions such as active power management, error reporting, end-to-end reliable transmission, hot plugging and quality-of-service monitoring. In applications, devices can be divided into hosts and slaves. After a kernel error (kernel panic) occurs on a slave, the exception information can generally be obtained only from the slave's console; there is no other channel for obtaining it.
When a kernel error occurs on a slave, the kernel cannot determine what caused the error, so it cannot operate the storage device and therefore cannot save the exception information. To capture the slave's kernel error information, the slave's console must be observed in real time. However, a slave restarts once a kernel error occurs, so even if the console is being observed, the exception information cannot be captured if the restart goes unnoticed.
Summary of the Invention
In view of the problems in the prior art, the purpose of the present invention is to provide a method, system, device and storage medium for acquiring exception information that solve the prior-art problem that exception information cannot be acquired and recorded when a kernel error occurs.
An embodiment of the present invention provides a method for acquiring exception information, including the following steps:
when an exception occurs in a slave, the slave acquires exception information through a hook function in its kernel program;
the slave writes the exception information into memory shared between the host and the slave;
the slave sends an interrupt signal to the host, the host being configured to read the exception information from the shared memory corresponding to the slave after receiving the interrupt signal sent by the slave.
With the exception information acquisition method of the present invention, when a slave fails, the exception information can be acquired through the hook function in the kernel program, written into the shared memory, and announced to the host by an interrupt signal. Upon receiving the interrupt signal, the host can read the exception information from the shared memory. This solves the prior-art problem that exception information cannot be acquired and recorded when a kernel error occurs.
In some embodiments, before the slave writes the exception information into the memory shared between the host and the slave, the method further includes the following steps:
the host allocates a section of its local memory for the slave as the shared memory and writes the physical address of that memory into a register of the slave;
the slave obtains, through its own register, the physical address of the shared memory allocated by the host and establishes a mapping between that shared memory and its local space; or,
the slave allocates a section of its local memory as the shared memory and establishes a mapping between the shared memory and the host's read address.
In some embodiments, before the slave sends the interrupt signal to the host, the method further includes the following step:
the host assigns an MSI interrupt number to the slave.
In some embodiments, after the slave sends the interrupt signal to the host, the method further includes the following steps:
the host receives the interrupt signal sent by the slave;
the host reads the exception information from the shared memory corresponding to the slave;
the host writes the exception information it has read into the host's exception log.
In some embodiments, after the host receives the interrupt signal sent by the slave, the method further includes the following steps:
the host analyzes the type of the received interrupt signal;
if the type of the interrupt signal is the kernel error type, the host reads the exception information from the shared memory corresponding to the slave.
In some embodiments, the host writing the exception information it has read into the host's exception log includes the following steps:
the host adds the identifier of the slave to the exception information;
the host starts a log collection process at the host-side kernel error level and writes the exception information into the host's exception log.
In some embodiments, before the slave acquires the exception information through the hook function in the kernel program when an exception occurs, the method further includes the following step:
the slave registers the hook function in its operating system, the hook function being configured to acquire the slave's exception information when an exception occurs in the slave.
An embodiment of the present invention further provides an exception information acquisition system, applied to the exception information acquisition method described above. The system includes a host and a slave, which are configured to perform the following steps:
when an exception occurs in the slave, the slave acquires exception information through a hook function in its kernel program;
the slave writes the exception information into memory shared between the host and the slave;
the slave sends an interrupt signal to the host;
after receiving the interrupt signal sent by the slave, the host reads the exception information from the shared memory corresponding to the slave.
With the exception information acquisition system of the present invention, when a slave fails, the slave can acquire the exception information through the hook function in the kernel program, write it into the shared memory, and send an interrupt signal to the host. Upon receiving the interrupt signal, the host can read the exception information from the shared memory. This solves the prior-art problem that exception information cannot be acquired and recorded when a kernel error occurs.
An embodiment of the present invention further provides an exception information acquisition device, including:
a processor;
a memory storing instructions executable by the processor;
wherein the processor is configured to perform the steps of the exception information acquisition method by executing the executable instructions.
With the exception information acquisition device provided by the present invention, the processor performs the exception information acquisition method when executing the executable instructions, thereby obtaining the beneficial effects of that method.
An embodiment of the present invention further provides a computer-readable storage medium for storing a program which, when executed by a processor, implements the steps of the exception information acquisition method.
With the computer-readable storage medium provided by the present invention, the stored program implements the steps of the exception information acquisition method when executed, thereby obtaining the beneficial effects of that method.
Brief Description of the Drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings.
FIG. 1 is a flowchart of a slave reporting exception information in the exception information acquisition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a host obtaining exception information in the exception information acquisition method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the exception information acquisition system according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the exception information acquisition device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the computer storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, so their repeated description is omitted.
As shown in FIG. 1, in one embodiment the present invention provides an exception information acquisition method that can be used to obtain the exception information of a PCIe slave when an exception occurs. The method includes the following steps:
S100: when an exception occurs in the slave, the slave acquires exception information through a hook function in its kernel program;
S200: the slave writes the exception information into memory shared between the host and the slave;
S300: the slave sends an interrupt signal to the host, the host being configured to read the exception information from the shared memory corresponding to the slave after receiving the interrupt signal sent by the slave. The exception information includes the current state of the slave's operating system and other exception parameters that need to be recorded.
With the exception information acquisition method of the present invention, when a slave fails, the exception information is acquired through the hook function in the kernel program in step S100, written into the shared memory in host memory in step S200, and announced to the host by an interrupt signal in step S300. Upon receiving the interrupt signal, the host can read the exception information from the shared memory. This solves the prior-art problem that exception information cannot be acquired and recorded when a kernel error occurs.
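Steps S100 to S300 can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not kernel code and not the patented implementation: the `Host` and `Slave` classes, the `panic_hook` name and the record layout (a 4-byte length header followed by UTF-8 text) are all hypothetical, a `bytearray` stands in for the shared memory, and a direct method call stands in for the interrupt.

```python
import struct

# Hypothetical record layout: 4-byte little-endian length, then UTF-8 text.
HEADER = struct.Struct("<I")


class Host:
    """Stand-in for the host: owns the shared region and an interrupt entry point."""

    def __init__(self, size):
        self.shared = bytearray(size)  # models the shared memory region
        self.received = []

    def on_interrupt(self):
        # Host side: read the length header, then exactly that many payload bytes.
        (length,) = HEADER.unpack_from(self.shared, 0)
        payload = bytes(self.shared[HEADER.size:HEADER.size + length])
        self.received.append(payload.decode("utf-8", errors="replace"))


class Slave:
    """Stand-in for the slave: its hook runs when an exception occurs."""

    def __init__(self, host):
        self.host = host

    def panic_hook(self, info):
        # S100: the hook receives the exception information.
        data = info.encode("utf-8")
        room = len(self.host.shared) - HEADER.size
        data = data[:room]  # truncate rather than overflow the region
        # S200: write the payload, then the header that describes it.
        self.host.shared[HEADER.size:HEADER.size + len(data)] = data
        HEADER.pack_into(self.host.shared, 0, len(data))
        # S300: signal the host (a real slave would raise an interrupt here).
        self.host.on_interrupt()


host = Host(size=64)
Slave(host).panic_hook("kernel panic: null pointer dereference")
print(host.received[0])
```

Writing the length header after the payload is a design choice of this sketch: a reader that sees the header can then rely on the payload already being in place.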
In this embodiment, before step S200, in which the slave writes the exception information into the memory shared between the host and the slave, the method further includes the following steps:
The host allocates a section of its local memory for the slave as the shared memory and writes the physical address of that memory into a register of the slave. The size of the memory can be chosen as needed, for example 4 MB or 5 MB. The host and slaves may be in a one-to-one or one-to-many relationship; when one host corresponds to multiple slaves, the memory sizes the host allocates to the individual slaves may be the same or different;
The slave obtains, through its own register, the physical address of the shared memory allocated by the host and establishes a mapping between that shared memory and its local space. Here, the memory allocated by the host is memory space on the host side, and the local space is the address space on the slave side. The slave uses the PCIe BAR outbound settings to map the host-allocated shared memory into its local space; this is an outbound mapping established by the slave.
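The address arithmetic behind such an outbound mapping can be sketched as a simple window translation. This is an illustration only: the `OutboundWindow` class, the addresses and the 4 MB size are hypothetical values chosen for the example, and a real slave would program its PCIe controller rather than compute addresses in software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundWindow:
    local_base: int  # start of the window in the slave's local address space
    host_base: int   # host physical address read from the slave's register
    size: int        # window length in bytes

    def to_host(self, local_addr: int) -> int:
        """Translate a slave-local address into the host physical address it maps to."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError("address outside the outbound window")
        return self.host_base + offset


# Hypothetical values: the host wrote 0x8000_0000 into the slave's register.
slave_register = 0x8000_0000
window = OutboundWindow(
    local_base=0x2000_0000,
    host_base=slave_register,
    size=4 * 1024 * 1024,
)
print(hex(window.to_host(0x2000_0010)))
```

With this window in place, a slave write to a local address inside the window lands at the same offset inside the host-allocated region.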
In the above embodiment, the shared memory is on the host side. In another embodiment, the shared memory may instead be on the slave side, which helps reduce the storage burden on the host when there are many slaves. Specifically, before step S200, in which the slave writes the exception information into the memory shared between the host and the slave, the method further includes: the slave allocates a section of its local memory as the shared memory and establishes a mapping between the shared memory and the host's read address. This is an inbound mapping established by the slave.
In this embodiment, before step S300, in which the slave sends the interrupt signal to the host, the method further includes the following step:
The host assigns an MSI (Message Signaled Interrupt) interrupt number to the slave and establishes a mapping between the MSI interrupt number and the slave.
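The mapping between MSI interrupt numbers and slaves can be pictured as a lookup table kept by the host. The sketch below is a simplified model: `MsiTable`, the slave names and the starting vector number 32 are hypothetical, and on a real system the operating system's interrupt subsystem allocates the vectors.

```python
class MsiTable:
    """Host-side table that hands out interrupt numbers and remembers their owners."""

    def __init__(self, first_vector=0):
        self._next = first_vector
        self._slave_by_vector = {}

    def assign(self, slave_id):
        # Allocate the next free number and record which slave owns it.
        vector = self._next
        self._next += 1
        self._slave_by_vector[vector] = slave_id
        return vector

    def slave_for(self, vector):
        # Used when an interrupt arrives: which slave raised it?
        return self._slave_by_vector[vector]


table = MsiTable(first_vector=32)
v0 = table.assign("slave-0")
v1 = table.assign("slave-1")
print(v1, table.slave_for(v1))
```

Because each slave has its own number, the host can tell from the interrupt alone which shared memory region to read.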
As shown in FIG. 2, in this embodiment, after step S300, in which the slave sends the interrupt signal to the host, the method further includes the following steps:
S400: the host receives the interrupt signal sent by the slave;
S500: the host reads the exception information from the shared memory corresponding to the slave;
S600: the host writes the exception information it has read into the host's exception log.
Therefore, when a slave fails, the host receives the interrupt signal sent by the slave in step S400 and thereby learns that the slave has an exception. In step S500 the host reads the slave's exception information from the shared memory it allocated locally for that slave, and in step S600 it writes that information into the host's exception log. Afterwards, checking the host's log system is enough to determine whether an exception occurred on the slave, and the cause can be analyzed from the slave's exception information.
In this embodiment, a slave exception includes, for example, a kernel error (kernel panic) on the slave. In alternative embodiments, the exception information acquisition method of the present invention can also be applied to other types of slave exception and is not limited to kernel errors: when the slave detects another type of exception, acquires the exception information and raises an interrupt, the host can likewise obtain the slave's exception information.
This embodiment may also be restricted so that only slave exceptions of the kernel error type require the host to collect the exception log. Specifically, in this embodiment, after step S400, in which the host receives the interrupt signal sent by the slave, the method further includes the following steps:
the host analyzes the type of the received interrupt signal;
if the type of the interrupt signal is the kernel error type, the host reads the exception information from the shared memory corresponding to the slave.
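The type check can be sketched as a small dispatch function. The source does not say how the type is encoded, so the `IrqType` values and the idea that a type code arrives together with the interrupt are assumptions made purely for illustration.

```python
from enum import IntEnum


class IrqType(IntEnum):
    # Hypothetical type codes; the source does not define an encoding.
    KERNEL_PANIC = 1
    OTHER = 2


def handle_interrupt(irq_type, read_shared_memory):
    """Read the shared memory only for kernel-error interrupts; ignore the rest."""
    if irq_type == IrqType.KERNEL_PANIC:
        return read_shared_memory()
    return None


reads = []


def read_region():
    # Stand-in for the host reading the slave's shared memory.
    reads.append("read")
    return "kernel panic: stack trace ..."


print(handle_interrupt(IrqType.KERNEL_PANIC, read_region))
print(handle_interrupt(IrqType.OTHER, read_region))
```

Passing the reader as a callable keeps the filter separate from the memory access, so the shared region is touched only when the type matches.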
In this embodiment, step S600, in which the host writes the exception information it has read into the host's exception log, includes the following steps:
The host adds the identifier of the slave to the exception information, so that when the exception information is later viewed on the host side, it is clear which slave had the exception;
The host starts a log collection process (syslog/rsyslog) at the host-side kernel error level and stores the exception information in the host's log. The slave's exception information can then be obtained later simply by viewing the host's log, which shows whether the slave ever had an exception and, if so, allows the cause of that exception to be analyzed from the exception information.
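One way to picture the tagging and the log level is to build a syslog-style record by hand. This sketch assumes that "kernel error level" corresponds to the syslog kernel facility (0) combined with the error severity (3), which is an interpretation rather than something the source states; the `format_record` function and the bracketed slave tag are hypothetical.

```python
LOG_KERN = 0  # syslog facility code for kernel messages
LOG_ERR = 3   # syslog severity code for error conditions


def format_record(slave_id, info):
    """Prefix the exception text with a syslog priority and the slave's identifier."""
    priority = LOG_KERN * 8 + LOG_ERR  # syslog PRI = facility * 8 + severity
    return f"<{priority}>[{slave_id}] {info}"


record = format_record("slave-1", "kernel panic: null pointer dereference")
print(record)
```

Putting the slave identifier at the front of every record makes it easy to filter the host log for a single slave.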
In this embodiment, before step S100, in which the slave acquires the exception information through the hook function in the kernel program when an exception occurs, the method further includes the following step:
The slave registers the hook function in its operating system, the hook function being configured to acquire the slave's exception information when an exception occurs in the slave.
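Hook registration can be modeled as a chain of callbacks that the system invokes when an exception occurs. The source does not name a specific mechanism; on Linux, a panic notifier registered on `panic_notifier_list` with `atomic_notifier_chain_register` would be one plausible choice. The Python below only simulates the idea, and `NotifierChain`, `collect_exception_info` and the priority value are hypothetical.

```python
class NotifierChain:
    """Minimal callback chain: higher-priority hooks run first."""

    def __init__(self):
        self._entries = []  # list of (priority, callback) pairs

    def register(self, callback, priority=0):
        self._entries.append((priority, callback))
        # Stable sort: hooks with equal priority keep their registration order.
        self._entries.sort(key=lambda entry: -entry[0])

    def call(self, event):
        # Invoke every registered hook with the event and collect the results.
        return [callback(event) for _, callback in self._entries]


captured = []


def collect_exception_info(event):
    # The hook gathers whatever should be reported to the host.
    captured.append({"reason": event, "state": "panic"})
    return "collected"


panic_chain = NotifierChain()
panic_chain.register(collect_exception_info, priority=10)
results = panic_chain.call("Oops: kernel BUG")
print(results, captured)
```

Registering the hook ahead of time matters because nothing can be set up once the exception has already occurred.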
As shown in FIG. 3, an embodiment of the present invention further provides an exception information acquisition system, applied to the exception information acquisition method described above. The system includes a host M100 and a slave M200, which are configured to perform the following steps:
when an exception occurs in the slave M200, the slave M200 acquires exception information through a hook function in its kernel program;
the slave M200 writes the exception information into memory shared between the host M100 and the slave M200;
the slave M200 sends an interrupt signal to the host M100;
after receiving the interrupt signal sent by the slave M200, the host M100 reads the exception information from the shared memory corresponding to the slave M200.
With the exception information acquisition system of the present invention, when a slave fails, the slave can acquire the exception information through the hook function in the kernel program, write it into the shared memory, and send an interrupt signal to the host. Upon receiving the interrupt signal, the host can read the exception information from the shared memory. This solves the prior-art problem that exception information cannot be acquired and recorded when a kernel error occurs.
Further, in this embodiment, after the host M100 reads the exception information from the shared memory corresponding to the slave M200, the host M100 also writes the exception information into the host's exception log. Specifically, this includes: the host M100 adds the identifier of the slave M200 to the exception information, so that when the exception information is later viewed on the host side, it is clear which slave had the exception; and the host M100 starts a log collection process (syslog/rsyslog) at the host-side kernel error level and stores the exception information in the host's log. The exception information of the slave M200 can then be obtained later simply by viewing the log of the host M100, which shows whether the slave M200 ever had an exception and, if so, allows the cause of that exception to be analyzed from the exception information.
In this embodiment, the host M100 is further configured to allocate a section of shared memory locally for the slave M200 and to write the physical address of the shared memory into a register of the slave M200. The size of the shared memory can be chosen as needed, for example 4 MB or 5 MB. The host and the slaves M200 may be in a one-to-one or one-to-many relationship; when one host corresponds to multiple slaves M200, the memory sizes the host allocates to the individual slaves M200 may be the same or different.
The slave M200 is further configured to obtain, through its own register, the physical address of the shared memory allocated by the host M100 and to establish a mapping between that shared memory and its local space. Here, the shared memory allocated by the host M100 is memory space on the host side, and the local space is the address space on the slave side. The slave M200 uses the PCIe BAR outbound settings to map the shared memory allocated by the host M100 into its local space; this is an outbound mapping established by the slave.
In another embodiment, the shared memory may be on the slave side, which helps reduce the storage burden on the host when there are many slaves. Specifically, the slave M200 is further configured to allocate a section of its local memory as the shared memory and to establish a mapping between the shared memory and the host's read address; this is an inbound mapping established by the slave.
An embodiment of the present invention further provides an exception information acquisition device, including a processor and a memory storing instructions executable by the processor, wherein the processor is configured to perform the steps of the exception information acquisition method by executing the executable instructions.
Those skilled in the art will appreciate that aspects of the present invention may be implemented as a system, method or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to here as a "circuit", "module" or "system".
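An electronic device 600 according to this embodiment of the present invention is described below with reference to FIG. 4. The electronic device 600 shown in FIG. 4 is only an example and should not impose any limitation on the function or scope of use of embodiments of the present invention.
As shown in FIG. 4, the electronic device 600 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting different system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit stores program code that can be executed by the processing unit 610, causing the processing unit 610 to perform the steps according to various exemplary embodiments of the present invention described in the exception information acquisition method section of this specification. For example, the processing unit 610 may perform the steps shown in FIG. 1.
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 6201 and/or a cache storage unit 6202, and may further include a read-only memory (ROM) unit 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structure, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 600 may also communicate with one or more external devices 700 (such as a keyboard, pointing device or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router or modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650. The electronic device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 660. The network adapter 660 may communicate with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.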
With the exception information acquisition device provided by the present invention, the processor performs the exception information acquisition method when executing the executable instructions, thereby obtaining the beneficial effects of that method.
An embodiment of the present invention further provides a computer-readable storage medium for storing a program which, when executed by a processor, implements the steps of the exception information acquisition method. In some possible implementations, aspects of the present invention may also be implemented as a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the exception information acquisition method section of this specification.
参考图5所示,描述了根据本发明的实施方式的用于实现上述方法的程序产品800,其可以采用便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在终端设备,例如个人电脑上运行。然而,本发明的程序产品不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。5, a
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code carried in it. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable storage medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a readable storage medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the foregoing.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or cluster. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
With the computer-readable storage medium provided by the present invention, the stored program implements the steps of the abnormal information acquisition method when executed, and the beneficial effects of that method are thereby obtained.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. A person of ordinary skill in the technical field of the present invention may make a number of simple deductions or substitutions without departing from the concept of the present invention, and all of these should be regarded as falling within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110728525.7A CN113434324A (en) | 2021-06-29 | 2021-06-29 | Abnormal information acquisition method, system, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113434324A (en) | 2021-09-24 |
Family
ID=77757698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110728525.7A (Pending), CN113434324A (en) | Abnormal information acquisition method, system, device and storage medium | 2021-06-29 | 2021-06-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113434324A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115309617A (en) * | 2022-08-08 | 2022-11-08 | 科东(广州)软件科技有限公司 | Desktop display method of background operating system information, heterogeneous system and storage medium |
CN115981892A (en) * | 2023-01-03 | 2023-04-18 | 哲库科技(上海)有限公司 | Log reading method and device, electronic equipment and storage medium |
CN116185978A (en) * | 2023-02-10 | 2023-05-30 | 苏州浪潮智能科技有限公司 | Log processing method, device, electronic device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0512235A (en) * | 1991-07-08 | 1993-01-22 | Tokyo Electric Co Ltd | Electronics |
JP2006338445A (en) * | 2005-06-03 | 2006-12-14 | Matsushita Electric Ind Co Ltd | Abnormal information storage device |
CN102469474A (en) * | 2010-11-15 | 2012-05-23 | 中兴通讯股份有限公司 | Method and device for processing abnormal information of communication equipment |
CN105204977A (en) * | 2014-06-30 | 2015-12-30 | 中兴通讯股份有限公司 | System exception capturing method, main system, shadow system and intelligent equipment |
CN111274059A (en) * | 2020-01-21 | 2020-06-12 | 浙江大华技术股份有限公司 | Software exception handling method and device for slave equipment |
CN111694684A (en) * | 2019-03-15 | 2020-09-22 | 百度在线网络技术(北京)有限公司 | Abnormal construction method and device of storage equipment, electronic equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210924 |