WO2016000298A1

WO2016000298A1 - System exception capturing method, main system, shadow system and intelligent device

Info

Publication number: WO2016000298A1
Application number: PCT/CN2014/084439
Authority: WO
Inventors: 蒋彪 (Jiang Biao)
Original assignee: 中兴通讯股份有限公司 (ZTE Corporation)
Priority date: 2014-06-30
Filing date: 2014-08-14
Publication date: 2016-01-07
Also published as: CN105204977A

Abstract

Provided are a system exception capturing method, a main system, a shadow system and an intelligent device. On the main system side, the method comprises: the main system starting, on a second hardware resource in a hardware environment, a shadow system used for performing exception detection on the main system, the second hardware resource being different from the first hardware resource on which the main system operates in the same hardware environment; the main system dynamically saving its own operation state information in a shared memory, so that when the shadow system detects that an exception has occurred in the main system, it can acquire the main system's operation state information from the shared memory; and the main system saving its own physical memory address in the shared memory, so that when the shadow system detects that an exception has occurred in the main system, it can access the main system's physical memory via that address and acquire information about the main system's use of physical memory. With this solution, exception information can still be captured even when the operating system (i.e. the main system) crashes.

Description

TECHNICAL FIELD

The present invention relates to the field of computer operating system technologies, and in particular to a method for capturing system exceptions, a main system, a shadow system and a smart device.

BACKGROUND

With the rapid development of computer software and hardware technology, the hardware environments and service programs of operating systems are becoming increasingly complex. In practice, systems frequently crash. Typical symptoms are: the keyboard and mouse are unresponsive, the machine cannot be pinged, the display cannot be lit or shows no exception information, and the system log records no useful fault information. At this point the environment may have stopped responding entirely and cannot be operated. Analyzing and locating such problems has long been a major difficulty in the industry.

Existing operating systems offer some means of locating crashes. For example, the Linux kdump mechanism can capture software exceptions in the operating system kernel, the Linux nmi_watchdog mechanism can capture kernel deadlocks in interrupt context, and the Linux watchdog mechanism can catch kernel scheduling exceptions. However, none of them can capture useful information for the following kinds of crash:

1. A CPU (Central Processing Unit) hardware failure causes the operating system to hang. In this case the CPU hardware itself hangs, taking the operating system running on it down directly, so no useful information can be recorded.

2. A memory hardware failure causes the operating system to hang. In this case the operating system hangs directly, so no useful information can be recorded.

3. A PCI (Peripheral Component Interconnect) device hardware or firmware failure hangs the PCI bus, which eventually hangs the operating system. In this case no useful information can be recorded.

4. A hard disk hardware or firmware failure causes the operating system to hang. In this case system I/O (input/output) hangs because of the disk failure, so logs cannot be written.

5. The system is overloaded, for example by running out of memory, and the operating system hangs. In this case the operating system cannot perform the operations needed to record exception information.

6. High-priority tasks continuously occupy the CPU, so lower-priority tasks can never be scheduled, and the operating system eventually hangs. In this case only the high-priority tasks run; the lower-priority processes responsible for recording exception information are never scheduled, so no useful information can be recorded.

7. A deadlock occurs during softirq (soft interrupt) processing, preventing other tasks from being scheduled and eventually hanging the operating system. In this case the processes responsible for recording exception information cannot be scheduled, so no useful information can be recorded.

An obvious response to the above problems is to configure a dedicated monitoring device that captures exception information from the monitored device in real time; a crash of the monitored device's system then does not affect exception capture by the monitoring device. However, this solution requires additional monitoring equipment and is therefore not well suited.

SUMMARY OF THE INVENTION

The technical problem to be solved by the embodiments of the present invention is to provide a system exception capturing method, a main system, a shadow system and a smart device, which can run a main system and a shadow system independently in one hardware environment, so that after the main system crashes, the shadow system can still capture exception information about the main system.

To solve the above technical problem, an embodiment of the present invention provides a system exception capturing method applied to a main system, including: the main system starts, on a second hardware resource in a hardware environment, a shadow system for performing exception detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs in the hardware environment; the main system dynamically saves its running state information in a shared memory, so that when the shadow system detects an exception in the main system, it obtains the main system's running state information from the shared memory; and the main system saves its physical memory address in the shared memory, so that when the shadow system detects an exception in the main system, it can access the main system's physical memory through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.

The capturing method applied to the main system further includes: the main system monitors the shadow system for exceptions, and when the main system detects an exception in the shadow system, it resets the shadow system.

Starting, on the second hardware resource of the hardware environment, the shadow system for performing exception detection on the main system includes: the main system loads the system kernel of the shadow system into the shadow system's physical memory; the main system configures the boot parameters of the shadow system's kernel according to the information of the second hardware resource; and the main system makes the CPU assigned to the shadow system jump to the shadow system's physical memory, so that this CPU runs the system kernel and starts the shadow system.

The capturing method applied to the main system further includes: the main system saves, in the shared memory, information about those first hardware resources that support heartbeat packet detection, so that the shadow system can identify them from the shared memory and establish a heartbeat packet detection mechanism with them, thereby monitoring the main system for exceptions.

The capturing method applied to the main system further includes: the main system counts user-mode process activity through a soft watchdog and saves the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for exceptions according to how the count in the shared memory is updated.

In addition, another embodiment of the present invention further provides a system exception capturing method applied to a shadow system, where the shadow system runs on a second hardware resource of a hardware environment that is different from the first hardware resource on which the main system runs in that hardware environment. The capturing method includes: the shadow system performs exception detection on the main system; when the shadow system detects an exception in the main system, it obtains the main system's physical memory address and running state information from a shared memory, where the physical memory address and running state information in the shared memory were saved by the main system; the shadow system accesses the main system's physical memory according to that physical memory address and obtains information about the physical memory used by the main system; and the shadow system records the information about the physical memory used by the main system and the main system's running state information.
The step in which the shadow system accesses the main system's physical memory according to the main system's physical memory address and obtains information about the physical memory used by the main system includes: the shadow system loads a query kernel for obtaining the main system's physical memory information; the shadow system configures the boot parameters of the query kernel according to the acquired physical memory address of the main system; and the shadow system runs the query kernel, accesses the main system's physical memory, and obtains information about the physical memory used by the main system.

The shared memory further includes information about the first hardware resource saved by the main system. Performing exception detection on the main system by the shadow system includes: the shadow system obtains, from the shared memory, information about the first hardware resources that support heartbeat packet detection and thereby identifies those resources; the shadow system then establishes a heartbeat packet detection mechanism with them, thereby monitoring the main system for exceptions.

The shared memory further includes the count of user-mode process activity kept by the main system through the soft watchdog; performing exception detection on the main system by the shadow system includes: the shadow system monitors the main system for exceptions according to how the count in the shared memory is updated.
In addition, another embodiment of the present invention further provides a main system running on a first hardware resource of a hardware environment, comprising: a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system for performing exception detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs in the hardware environment; a first saving module, configured to dynamically save the main system's running state information in a shared memory, so that when the shadow system detects an exception in the main system, it obtains the main system's running state information from the shared memory; and a second saving module, configured to save the main system's physical memory address in the shared memory, so that when the shadow system detects an exception in the main system, it can access the main system's physical memory through the physical memory address in the shared memory and obtain information about the physical memory used by the main system. The shared memory further includes running state information of the shadow system; the main system further includes: a shadow system monitoring module, configured to monitor the shadow system for exceptions; and a reset module, configured to reset the shadow system when the shadow system monitoring module detects an exception in the shadow system.
The startup module includes: a first loading submodule, configured to load the kernel of the shadow system into the shadow system's physical memory; a first configuration submodule, configured to configure the boot parameters of the shadow system's kernel according to the information of the second hardware resource; and a first running submodule, configured to make the CPU allocated to the shadow system jump to the shadow system's physical memory, so that this CPU runs the system kernel and starts the shadow system. The main system further includes a third saving module, configured to save, in the shared memory, information about the first hardware resources that support heartbeat packet detection, so that the shadow system can identify those resources from the shared memory and establish a heartbeat packet detection mechanism with them, thereby monitoring the main system for exceptions. The main system further includes: a watchdog module, configured to count user-mode process activity through a soft watchdog; and a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for exceptions according to how the count in the shared memory is updated.
In addition, another embodiment of the present invention further provides a shadow system running on a second hardware resource of a hardware environment, the second hardware resource being different from the first hardware resource occupied by the main system in the hardware environment. The shadow system includes: a main system monitoring module, configured to perform exception detection on the main system; a first acquiring module, configured to acquire the main system's physical memory address and running state information from a shared memory when the main system monitoring module detects an exception in the main system, where the physical memory address and running state information in the shared memory were saved by the main system; a second acquiring module, configured to access the main system's physical memory according to the physical memory address acquired by the first acquiring module and obtain information about the physical memory used by the main system; and a recording module, configured to record the information about the physical memory used by the main system and the main system's running state information. The second acquiring module includes: a second loading submodule, configured to load a query kernel for obtaining information about the physical memory used by the main system; a second configuration submodule, configured to configure the boot parameters of the query kernel according to the acquired physical memory address of the main system; and a second running submodule, configured to run the query kernel, access the main system's physical memory, and obtain information about the physical memory used by the main system. The shared memory includes information, saved by the main system, about the first hardware resources that support heartbeat packet detection.
The main system monitoring module includes: a third acquiring submodule, configured to acquire, from the shared memory, the information about the first hardware resources that support heartbeat packet detection and thereby identify those resources; and a first monitoring submodule, configured to establish a heartbeat packet detection mechanism with the first hardware resources that support heartbeat packet detection, thereby monitoring the main system for exceptions. The shared memory further includes the count of user-mode process activity kept by the main system through the soft watchdog; the main system monitoring module includes a second monitoring submodule, configured to monitor the main system for exceptions according to how the count in the shared memory is updated.

In addition, another embodiment of the present invention further provides a smart device, including the above main system and the above shadow system.

The above technical solutions of the embodiments of the present invention have the following advantages: the main system and the shadow system can run independently in one hardware environment, and after the main system crashes, the shadow system can still capture exception information about it. This is of great significance for improving the maintainability of the main system.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram of the steps of the system exception capturing method implemented in the main system according to the present invention; FIG. 2 is a schematic diagram of the steps of the system exception capturing method implemented in the shadow system according to the present invention; FIG. 3 to FIG. 5 are schematic diagrams of the steps of the exception monitoring mechanisms of Embodiments 2 to 4, respectively; FIG. 6 is a schematic structural diagram of the main system of the present invention; FIG. 7 is a schematic structural diagram of a main system of the present invention; FIG. 8 is a schematic diagram of the steps by which the smart device of the present invention starts the main system; FIG. 9 is a schematic diagram of the steps by which the main system of the smart device of the present invention boots the shadow system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.

The present invention provides a system exception capturing method in which a shadow system is added alongside the original operating system (i.e. the main system of the present invention) specifically to monitor the main system for exceptions.
The shadow system and the main system run on different hardware resources in the same hardware environment, so the shadow system keeps operating when the main system fails. The capturing method of the present invention is therefore significant for improving the maintainability of the main system.

As shown in FIG. 1, the system exception capturing method applied to the main system includes: Step 11: the main system starts, on a second hardware resource of a hardware environment, a shadow system for performing exception detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs in the hardware environment; Step 12: the main system dynamically saves its running state information in a shared memory, so that when the shadow system detects an exception in the main system, it obtains the main system's running state information from the shared memory; Step 13: the main system saves its physical memory address in the shared memory, so that when the main system is abnormal, the shadow system can access the main system's physical memory through that address in the shared memory and obtain information about the physical memory used by the main system.

The hardware environment may be a PC, a tablet (PAD) or a mobile phone. Running the main system and the shadow system on different hardware resources may mean, for example, running them on different CPU cores and in different regions of physical memory. This ensures that the operation of the shadow system does not depend on the main system.

As shown in FIG. 2, the system exception capturing method applied to the shadow system includes: Step 21: the shadow system performs exception detection on the main system; Step 22: when the shadow system detects an exception in the main system, it obtains the main system's physical memory address and running state information from a shared memory, where the physical memory address and running state information in the shared memory were saved by the main system; Step 23: the shadow system accesses the main system's physical memory according to the main system's physical memory address and obtains information about the physical memory used by the main system; Step 24: the shadow system records the information about the physical memory used by the main system and the main system's running state information.

From the methods shown in FIG. 1 and FIG. 2, it can be seen that the entire solution of the present invention can be implemented in software on the existing hardware, so no separate hardware needs to be configured for support and deployment is very convenient. Moreover, even after the main system crashes, the shadow system can still capture exception information about it, which is more comprehensive than the exception information the main system itself can capture afterwards. The information about the main system's physical memory obtained by the shadow system is a snapshot of the main system at the time of the exception; together with the main system's dynamic running state information, it can be used to locate the cause of the exception.

In addition, to make the shadow system's monitoring of the main system more reliable, the main system can in turn monitor the shadow system on the basis of the above scheme. When the shadow system is abnormal, the main system resets it to ensure that it keeps working normally. That is, the above system exception capturing method applied to the main system further includes: Step 14: the main system monitors the shadow system for exceptions; Step 15: when the main system detects an exception in the shadow system, it resets the shadow system.
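The shared-memory contract behind steps 12, 13 and 22 can be made concrete with a minimal Python sketch, assuming the shared region is modeled as a bytearray. The byte layout (a magic marker, CPU usage, memory usage and a list of (start, size) physical memory ranges), the constant MAGIC and the names MainSystemState, save_state and load_state are all hypothetical; the description above does not specify a layout.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical little-endian layout of the shared region:
#   magic (u32) | cpu_usage_pct (u32) | mem_used_kb (u64) | n_ranges (u32)
#   followed by n_ranges records of (start u64, size u64)
HEADER = struct.Struct("<IIQI")
RANGE = struct.Struct("<QQ")
MAGIC = 0x5348444F  # arbitrary marker chosen for this sketch


@dataclass
class MainSystemState:
    cpu_usage_pct: int
    mem_used_kb: int
    phys_ranges: List[Tuple[int, int]] = field(default_factory=list)


def save_state(shm: bytearray, state: MainSystemState) -> None:
    """Main-system side (steps 12 and 13): write running state and physical ranges."""
    needed = HEADER.size + RANGE.size * len(state.phys_ranges)
    if needed > len(shm):
        raise ValueError("shared memory region too small")
    HEADER.pack_into(shm, 0, MAGIC, state.cpu_usage_pct,
                     state.mem_used_kb, len(state.phys_ranges))
    for i, (start, size) in enumerate(state.phys_ranges):
        RANGE.pack_into(shm, HEADER.size + i * RANGE.size, start, size)


def load_state(shm: bytes) -> MainSystemState:
    """Shadow-system side (step 22): read back what the main system last saved."""
    magic, cpu, mem, n = HEADER.unpack_from(shm, 0)
    if magic != MAGIC:
        raise ValueError("main system has not initialised the shared region")
    ranges = [RANGE.unpack_from(shm, HEADER.size + i * RANGE.size) for i in range(n)]
    return MainSystemState(cpu, mem, ranges)
```

A fixed, self-describing layout is used because the shadow system reads the region after the main system may already have stopped, so it cannot ask the main system how to interpret the bytes.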
It should be noted that, under the solution of the present invention, the shadow system can also reset the main system after determining that the main system is abnormal, thereby restoring normal operation of the main system.

In addition, a practical implementation of step 11 faces the following technical problem: the main system must load the shadow system onto the specified second hardware resource, and the shadow system must not use the hardware resources allocated to the main system. To this end, the present invention provides an implementation in which step 11 specifically includes: Step 111: the main system loads the system kernel of the shadow system into the shadow system's physical memory; Step 112: the main system configures the boot parameters of the shadow system's kernel according to the information of the second hardware resource; Step 113: the main system makes the CPU allocated to the shadow system jump to the shadow system's physical memory, so that this CPU runs the system kernel and starts the shadow system.

When the main system is to run on the first hardware resource, the first and second hardware resources may first be partitioned on the original main system, and the main system is then reinitialized so that it runs on the first hardware resource. The principle is the same as steps 111 to 113 and is not repeated here.
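The resource partitioning and step 112 can be illustrated with a minimal Python sketch. It assumes the shadow system takes the highest-numbered CPUs and the top of RAM, and it encodes the second hardware resource as a boot-parameter string whose keys (cpus=, mem_range=, shm=) are hypothetical placeholders rather than real kernel options. Steps 111 and 113 (copying the kernel image and redirecting the CPU) need firmware or kernel support and are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HardwareResource:
    cpus: Tuple[int, ...]  # logical CPU ids
    mem_start: int         # physical base address of the RAM slice
    mem_size: int          # size of the slice in bytes


def partition(total_cpus: int, mem_base: int, mem_size: int,
              shadow_cpus: int, shadow_mem: int) -> Tuple[HardwareResource, HardwareResource]:
    """Split one machine into the first (main) and second (shadow) resources.
    Assumption: the shadow system takes the highest CPUs and the top of RAM."""
    if not (0 < shadow_cpus < total_cpus) or not (0 < shadow_mem < mem_size):
        raise ValueError("the shadow share must be a non-empty strict subset")
    main = HardwareResource(tuple(range(total_cpus - shadow_cpus)),
                            mem_base, mem_size - shadow_mem)
    shadow = HardwareResource(tuple(range(total_cpus - shadow_cpus, total_cpus)),
                              mem_base + mem_size - shadow_mem, shadow_mem)
    return main, shadow


def shadow_boot_params(res: HardwareResource, shm_addr: int) -> str:
    """Step 112: encode the second hardware resource as boot parameters.
    The keys cpus=, mem_range= and shm= are hypothetical placeholders."""
    cpus = ",".join(str(c) for c in res.cpus)
    return f"cpus={cpus} mem_range={res.mem_size:#x}@{res.mem_start:#x} shm={shm_addr:#x}"
```

Because the two slices are disjoint by construction, the shadow kernel booted with these parameters has no reason to touch the main system's CPUs or RAM.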
Similarly, step 23 is implemented using the same kernel-loading technique and specifically includes: Step 231: the shadow system loads a query kernel for obtaining information about the main system's physical memory; Step 232: the shadow system configures the boot parameters of the query kernel according to the acquired physical memory address of the main system; Step 233: the shadow system runs the query kernel, accesses the main system's physical memory, and obtains information about the physical memory used by the main system. Steps 231 to 233 are described in detail below with reference to an embodiment.
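Step 232 can be sketched in Python as follows, assuming the main system's physical memory is described as (start, size) ranges read from the shared memory. The sketch restricts the query kernel's own RAM to the shadow system's slice and hands the main system's ranges over as read-only dump targets, refusing any range that overlaps the shadow slice. The parameter names mem_range= and dump_ranges= are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (physical start, size in bytes)


def overlaps(a: Range, b: Range) -> bool:
    """True if the half-open intervals [start, start + size) intersect."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def query_kernel_params(shadow_ram: Range, main_ranges: List[Range]) -> str:
    """Step 232: build the query kernel's boot parameters from the main
    system's physical memory ranges. The query kernel may only use the
    shadow's own RAM; the main ranges are passed as targets to be read."""
    for r in main_ranges:
        if overlaps(r, shadow_ram):
            raise ValueError(f"main range {r[0]:#x}+{r[1]:#x} overlaps shadow RAM")
    targets = ",".join(f"{start:#x}-{start + size - 1:#x}"
                       for start, size in sorted(main_ranges))
    return f"mem_range={shadow_ram[1]:#x}@{shadow_ram[0]:#x} dump_ranges={targets}"
```

The overlap check reflects the point of steps 231 to 233: if the query kernel allocated memory inside the main system's ranges, it would overwrite the very snapshot it is meant to capture.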

<Embodiment 1> Step A11: the shadow system reads, from the shared memory, information about the physical CPUs used by the main system; this information was stored in the shared memory by the main system. Step A12: the shadow system sends an interrupt request to all CPUs used by the main system to stop the main system. Step A13: if the main system can still respond to the interrupt request, it stops running, stops accessing memory, and synchronizes state between its CPUs; if the main system cannot respond (it has already hung), this step is skipped. Step A14: the shadow system reads, from the shared memory, the physical memory address previously saved by the main system. Step A15: the shadow system uses the main system's physical memory address as a boot parameter of the query kernel and then loads the query kernel, which gives the shadow system access to the physical memory used by the main system. Step A16: the shadow system reads the information about the physical memory used by the main system and dumps it. The dump method can be chosen according to the second hardware resources owned by the shadow system: if the shadow system has its own hard disk, the dump file can be saved to that disk; if it has its own network card, the dump file can be uploaded over the network to a specified location. In Embodiment 1, the shadow system stops the main system before accessing its physical memory, which keeps the captured physical memory information consistent and makes the access process more stable.
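Steps A11 to A16 can be simulated with the minimal Python sketch below. The shared-memory keys main_cpus and main_phys_ranges, the callbacks stop_cpu and read_phys (stand-ins for the stop interrupt and for the query kernel's access to physical memory) and the ShadowResources flags are all hypothetical; only the order of operations and the disk-then-network choice of dump target follow the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ShadowResources:
    has_disk: bool  # shadow system owns a separate hard disk
    has_nic: bool   # shadow system owns a separate network card


def capture_main_system(shared: Dict,
                        stop_cpu: Callable[[int], bool],
                        read_phys: Callable[[int, int], bytes],
                        res: ShadowResources) -> Tuple[str, Dict[int, bytes], List[int]]:
    """Simulated Embodiment 1. stop_cpu(cpu) returns True if that main-system
    CPU acknowledged the stop interrupt; read_phys(addr, size) models reading
    the main system's physical memory through the query kernel."""
    cpus = shared["main_cpus"]                   # A11: CPUs recorded by the main system
    hung = [c for c in cpus if not stop_cpu(c)]  # A12/A13: non-responders are skipped
    ranges = shared["main_phys_ranges"]          # A14: addresses saved earlier
    image = {start: read_phys(start, size)       # A15/A16: read via the query kernel
             for start, size in ranges}
    if res.has_disk:                             # A16: prefer a local disk ...
        target = "disk"
    elif res.has_nic:                            # ... otherwise upload over the network
        target = "network"
    else:
        raise RuntimeError("shadow system owns no dump device")
    return target, image, hung
```

Recording which CPUs failed to acknowledge the stop request is itself useful diagnostic output: a CPU that cannot take an interrupt points toward the hardware hang cases listed in the background section.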
In addition, if the main system crashed because the PCI bus hung, the shadow system, which shares the same PCI bus with the main system, may be unable to perform PCI bus operations, and exception information then cannot be captured. To address this, in another preferred solution the shadow system initializes the PCI bus before loading the query kernel, and uses the query kernel after the PCI bus has been initialized.

The way the main system is monitored for exceptions is described in detail below. Specifically, the above system exception capturing method applied to the main system further includes: Step 16: the main system saves, in the shared memory, information about the first hardware resources that support heartbeat packet detection. Correspondingly, in step 21, the shadow system obtains this information from the shared memory, identifies the first hardware resources that support heartbeat packet detection, and establishes a heartbeat packet detection mechanism with them, thereby monitoring the main system for exceptions. The steps related to heartbeat packet detection are described below with reference to specific embodiments.

<Embodiment 2> In Embodiment 2, the shadow system monitors the main system for exceptions based on the state of the network card (NIC) resources occupied by the main system, including the steps shown in FIG. 3: Step B11: after starting, the shadow system obtains the main system's NIC information from the shared memory and periodically sends heartbeat request packets to the specified NIC of the main system through the NIC resources assigned to the shadow system. Because both systems share the same hardware environment, if the main system has NIC resources, NIC resources can also be allocated to the shadow system. Step B12: the main system receives the heartbeat request packet from the shadow system through its own NIC and replies with a heartbeat response packet. Step B13: if the shadow system detects that a specified number of heartbeat response packets have been lost within a specified time, it determines that the main system is abnormal.
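The decision rule of step B13 ("a specified number of responses lost within a specified time") is a sliding-window count. The minimal Python sketch below implements only that rule; sending and receiving the packets through the NIC is not modeled, and the class name HeartbeatMonitor and its parameters window and max_lost are hypothetical.

```python
from collections import deque


class HeartbeatMonitor:
    """Declares the main system abnormal when at least `max_lost` heartbeat
    requests sent within the last `window` seconds got no response."""

    def __init__(self, window: float, max_lost: int):
        self.window = window
        self.max_lost = max_lost
        self.lost = deque()  # send times of unanswered requests, oldest first

    def record(self, sent_at: float, answered: bool) -> bool:
        """Record the outcome of one request; return True if the main
        system should now be considered abnormal."""
        if not answered:
            self.lost.append(sent_at)
        # Forget losses that have slid out of the observation window.
        while self.lost and self.lost[0] <= sent_at - self.window:
            self.lost.popleft()
        return len(self.lost) >= self.max_lost
```

Counting losses inside a window, rather than reacting to a single missing reply, keeps an occasional dropped packet from being reported as a main system failure.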

<Embodiment 3> In Embodiment 3, the shadow system monitors the main system for exceptions based on the state of the CPU resources occupied by the main system, specifically including the steps shown in FIG. 4: Step C11: after starting, the shadow system reads from the shared memory the information about the CPU resources used by the main system. Step C12: the shadow system periodically sends an inter-core interrupt to the CPUs used by the main system as a heartbeat request; this step applies to multi-core CPU chips, on which the shadow system and the main system run on different CPU cores. Step C13: after receiving the inter-core interrupt that the shadow system sent through the shared memory, the main system returns an inter-core interrupt response, i.e. a heartbeat response, to the CPU used by the shadow system. Note that after handling the received inter-core interrupt the CPU continues normal operation; the interrupt handling is fast and does not affect normal services. Step C14: if the shadow system does not receive the main system's inter-core interrupt response within a specified time, it determines that the main system is abnormal.

In summary, in Embodiment 2 and Embodiment 3, the shadow system obtains through the shared memory the first hardware resources used by the main system that support heartbeat packet detection, and thereby establishes an exception detection mechanism. In the same way, the main system can in turn monitor the shadow system for exceptions; this is not repeated here. In addition, the shadow system can also establish exception monitoring based on the running processes of the main system.
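Steps C12 to C14 amount to tracking one outstanding inter-core interrupt per main-system CPU and flagging any CPU whose oldest unanswered interrupt exceeds a timeout. The minimal Python sketch below models only that bookkeeping; raising a real inter-processor interrupt requires kernel code, and the class IpiHeartbeat and its methods are hypothetical.

```python
from typing import Dict, List


class IpiHeartbeat:
    """Bookkeeping for inter-core-interrupt heartbeats sent by the shadow system."""

    def __init__(self, main_cpus: List[int], timeout: float):
        self.cpus = list(main_cpus)           # C11: CPUs read from shared memory
        self.timeout = timeout
        self.pending: Dict[int, float] = {}   # cpu -> time of oldest unanswered IPI

    def send_all(self, now: float) -> None:
        # C12: a real system would raise an IPI on each CPU here; the sketch
        # only records the send time and keeps any older unanswered request.
        for cpu in self.cpus:
            self.pending.setdefault(cpu, now)

    def on_response(self, cpu: int) -> None:
        # C13: the main-system CPU answered its interrupt.
        self.pending.pop(cpu, None)

    def unresponsive(self, now: float) -> List[int]:
        # C14: CPUs whose oldest unanswered IPI is older than the timeout.
        return sorted(c for c, t in self.pending.items() if now - t > self.timeout)
```

Keeping the oldest send time (setdefault) rather than the latest means repeated heartbeats cannot keep resetting the clock on a CPU that never answers.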
Specifically, the above system exception capturing method applied to the main system further includes: Step 17: the main system counts user-mode process activity through a soft watchdog; Step 18: the main system saves the watchdog count in the shared memory in real time. Correspondingly, in step 21 above, the shadow system determines whether the main system has deadlocked according to how the watchdog count in the shared memory is updated. This exception monitoring mechanism is described below with reference to a specific embodiment.
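Steps 17 and 18, together with the corresponding check in step 21, can be sketched in Python with the shared memory modeled as a bytearray holding a 32-bit feed counter. The offset FEED_OFFSET, the function feed_dog and the class DogFeedChecker are hypothetical; the counter form follows the remark in Embodiment 4 that the dog-feed flag can be a simple count.

```python
import struct

FEED_OFFSET = 0               # hypothetical location of the feed counter
COUNTER = struct.Struct("<I")  # 32-bit little-endian counter


def feed_dog(shm: bytearray) -> None:
    """Main system (steps 17/18): called whenever the monitored user-mode
    process feeds the dog; bump the counter in shared memory, wrapping at 2**32."""
    (count,) = COUNTER.unpack_from(shm, FEED_OFFSET)
    COUNTER.pack_into(shm, FEED_OFFSET, (count + 1) & 0xFFFFFFFF)


class DogFeedChecker:
    """Shadow system (step 21): poll the counter and report the main system
    as abnormal once it has not changed for longer than `limit` seconds."""

    def __init__(self, limit: float):
        self.limit = limit
        self.last_value = None
        self.last_change = None

    def poll(self, shm: bytes, now: float) -> bool:
        (value,) = COUNTER.unpack_from(shm, FEED_OFFSET)
        if value != self.last_value:
            self.last_value, self.last_change = value, now
            return False
        return now - self.last_change > self.limit  # True -> main system abnormal
```

The shadow system compares values rather than trusting timestamps written by the main system, so a main system whose clock or scheduler has stalled cannot make itself look healthy.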

<Embodiment 4> In Embodiment 4, execution of a user-mode process serves as the condition for feeding the watchdog, specifically including the steps shown in FIG. 5: Step D11: the main system starts the watchdog driver. Step D12: the main system feeds the watchdog as user-mode processes execute. Step D13: the watchdog driver simulated by the main system receives the feed request from the user-mode process and writes a dog-feed flag to a designated area of the shared memory (the flag can be implemented as a simple count). Step D14: the shadow system periodically queries the dog-feed flag in the designated area of the shared memory. Step D15: if the shadow system determines from the shared memory that the main system's dog-feed flag has not been updated within a limited time, it considers the main system abnormal. In Embodiment 4, as a preferred solution, a high-priority user-mode process can be used as the feeding condition to reduce runtime resource consumption. Similarly, the main system can monitor the shadow system for exceptions on the principle of Embodiment 4. In addition, the main system's running state information may further include its current CPU usage, memory usage and system log.

In summary, the system exception capturing method of the present invention has the following advantages:

1) It is easy to deploy, does not rely on additional hardware, and can be used in environments without a serial port.

2) Fault information can still be captured for analysis and localization under various abnormal conditions of the main system.

3) After the main system becomes abnormal, the method provides technical support for automatic recovery.

In addition, the present invention also provides a main system, running on a first hardware resource of a hardware environment, which, as shown in FIG. 6, comprises: a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system for performing anomaly detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs; a first saving module, configured to dynamically save the running state information of the main system in a shared memory, so that when the shadow system detects an anomaly of the main system, it obtains the running state information of the main system from the shared memory; and a second saving module, configured to save the physical memory address of the main system in the shared memory, so that when the main system is abnormal, the shadow system can access the physical memory of the main system through the physical memory address in the shared memory and obtain information about the physical memory used by the main system. In addition, to make the shadow system's monitoring of the main system more stable, the main system can in turn monitor the shadow system on the basis of the above scheme: when the shadow system is abnormal, the main system resets it so that the shadow system keeps working normally. That is, the main system includes: a shadow system monitoring module, configured to perform anomaly monitoring on the shadow system; and a reset module, configured to reset the shadow system when the shadow system monitoring module detects that the shadow system is abnormal.
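The main system's supervision of the shadow system can reuse the same staleness idea in the opposite direction. This Python sketch assumes that the shadow system publishes a progress counter the main system can read (the source does not say what the main system observes) and models the reset module as a callback; `ShadowSupervisor` and both callbacks are hypothetical.

```python
from typing import Callable, Optional


class ShadowSupervisor:
    """Main-system side: monitors the shadow system and resets it on anomaly (sketch)."""

    def __init__(self, read_shadow_counter: Callable[[], int],
                 reset_shadow: Callable[[], None], max_stale_checks: int) -> None:
        self.read_shadow_counter = read_shadow_counter  # assumed progress counter of the shadow
        self.reset_shadow = reset_shadow                # hypothetical hook into the reset module
        self.max_stale_checks = max_stale_checks
        self.last: Optional[int] = read_shadow_counter()
        self.stale_checks = 0

    def check(self) -> bool:
        """Run one monitoring pass; return True if the shadow system was reset."""
        current = self.read_shadow_counter()
        if current != self.last:
            # The shadow system made progress since the previous pass.
            self.last = current
            self.stale_checks = 0
            return False
        self.stale_checks += 1
        if self.stale_checks < self.max_stale_checks:
            return False
        # The monitoring module reports an anomaly: hand over to the reset module.
        self.reset_shadow()
        self.stale_checks = 0
        self.last = None  # a restarted shadow system will publish a fresh counter
        return True
```

Requiring several consecutive stale passes before resetting tolerates a shadow system that is merely slow, at the cost of a longer detection delay.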
In the foregoing technical solution, the startup module includes: a first loading submodule, configured to load a kernel of the shadow system into the physical memory of the shadow system; a first configuration submodule, configured to configure boot parameters of the system kernel of the shadow system based on the information of the second hardware resource; and a first running submodule, configured to make the CPU allocated to the shadow system jump to the physical memory of the shadow system, so that this CPU runs the system kernel to start the shadow system. In addition, on the basis of the foregoing embodiment, the main system further includes a third saving module, configured to save, in the shared memory, information of the first hardware resource that supports heartbeat detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat detection mechanism with it, thereby monitoring the main system for anomalies. In addition, on the basis of the foregoing embodiment, the main system further includes: a watchdog module, configured to count user-state processes through a soft watchdog; and a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for anomalies according to how the count in the shared memory is updated. Clearly, the main system of this embodiment corresponds to the method for capturing system anomalies applied to the main system of the present invention, and the technical effects achievable by the method can also be achieved by the main system of this embodiment.
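The first loading and first configuration submodules of the startup module can be illustrated with a simulated physical memory. In this Python sketch, `HardwareResource`, `load_kernel`, `build_boot_params`, and the `shadow_*=` parameter names are all hypothetical; a real kernel defines its own boot parameters. The first running submodule (making a CPU jump to the loaded image) cannot be expressed in portable user-space code and is omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HardwareResource:
    """Hypothetical description of the second hardware resource given to the shadow system."""
    cpus: Tuple[int, ...]  # CPU indices reserved for the shadow system
    mem_base: int          # physical start address of the shadow system's memory
    mem_size: int          # size of that memory in bytes
    shm_base: int          # physical start address of the shared memory
    shm_size: int          # size of the shared memory in bytes


def load_kernel(phys_mem: bytearray, res: HardwareResource, image: bytes) -> int:
    """First loading submodule: copy the kernel image into the shadow system's memory."""
    if res.mem_base + res.mem_size > len(phys_mem):
        raise ValueError("shadow memory range lies outside physical memory")
    if len(image) > res.mem_size:
        raise ValueError("kernel image does not fit in the shadow system's memory")
    phys_mem[res.mem_base:res.mem_base + len(image)] = image
    return res.mem_base  # assumed entry point: the first byte of the loaded image


def build_boot_params(res: HardwareResource) -> str:
    """First configuration submodule: encode the resource as boot parameters (names invented)."""
    cpus = ",".join(str(c) for c in res.cpus)
    return (f"shadow_cpus={cpus} "
            f"shadow_mem={res.mem_size:#x}@{res.mem_base:#x} "
            f"shadow_shm={res.shm_size:#x}@{res.shm_base:#x}")
```

For a shadow system given CPUs 2 and 3, 32 bytes at address 0x10, and 16 bytes of shared memory at 0x30, `build_boot_params` yields `shadow_cpus=2,3 shadow_mem=0x20@0x10 shadow_shm=0x10@0x30`.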
In addition, the present invention further provides a shadow system, running on a second hardware resource in a hardware environment, where the second hardware resource is different from a first hardware resource occupied by the main system in the hardware environment. The shadow system includes: a main system monitoring module, configured to perform anomaly detection on the main system; a first obtaining module, configured to obtain, from a shared memory, the physical memory address and the running state information of the main system when the main system monitoring module detects that the main system is abnormal, the physical memory address and running state information in the shared memory having been saved by the main system; a second obtaining module, configured to access the physical memory of the main system according to the physical memory address obtained by the first obtaining module, and obtain information of the physical memory used by the main system; and a recording module, configured to record the information of the physical memory used by the main system and the running state information of the main system. The second obtaining module includes: a second loading submodule, configured to load a query kernel for obtaining information of the physical memory used by the main system; a second configuration submodule, configured to configure boot parameters of the query kernel according to the obtained physical memory address of the main system; and a second running submodule, configured to run the query kernel, access the physical memory of the main system, and obtain information about the physical memory used by the main system.
Specifically, the shared memory includes information, saved by the main system, of the first hardware resource that supports heartbeat detection. The main system monitoring module includes: a third obtaining submodule, configured to obtain from the shared memory the information of the first hardware resource that supports heartbeat detection and thereby determine that resource; and a first monitoring submodule, configured to establish a heartbeat detection mechanism with the first hardware resource that supports heartbeat detection, thereby monitoring the main system for anomalies. As described above, the shadow system can obtain, through the shared memory, the first hardware resource of the main system that supports heartbeat detection, and thereby establish an anomaly detection mechanism. In the same way, the main system can in turn monitor the shadow system for anomalies; this is not repeated here. Specifically, the shared memory further includes the count kept by the main system's soft watchdog for user-state processes; the main system monitoring module includes a second monitoring submodule, configured to monitor the main system for anomalies according to how the count in the shared memory is updated.
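On the shadow side, the second obtaining module and the recording module amount to reading the physical ranges the main system advertised and saving them together with the running-state log. In this Python sketch, physical memory is a `bytes` snapshot, the ranges are `(base, size)` pairs assumed to come from the shared memory, and the `main_mem=` parameter and all function names are hypothetical; a real query kernel would map physical pages rather than slice a buffer.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (physical base address, size in bytes)


def query_kernel_params(main_ranges: List[Range]) -> str:
    """Second configuration submodule: hand the main system's ranges to the query kernel."""
    return " ".join(f"main_mem={size:#x}@{base:#x}" for base, size in main_ranges)


def read_main_memory(phys_mem: bytes, main_ranges: List[Range]) -> Dict[int, bytes]:
    """Second running submodule: copy every physical range used by the main system."""
    dump: Dict[int, bytes] = {}
    for base, size in main_ranges:
        if base < 0 or base + size > len(phys_mem):
            raise ValueError(f"range {base:#x}+{size:#x} lies outside physical memory")
        dump[base] = bytes(phys_mem[base:base + size])
    return dump


def record(dump: Dict[int, bytes], running_state: List[str]) -> Dict[str, object]:
    """Recording module: keep the memory contents together with the running-state log."""
    return {"memory": dump, "running_state": list(running_state)}
```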

Obviously, the shadow system of this embodiment corresponds to the method for capturing system anomalies applied to the shadow system of the present invention, and the technical effects achievable by the method can also be achieved by the shadow system of this embodiment. In addition, the present invention also provides a smart device, comprising the main system and the shadow system provided by the present invention. The smart device can be a PC, a PAD, or a mobile phone. As shown in FIG. 8, the process by which the smart device starts the main system includes the following steps: Step E11: the hardware is powered on, and the BIOS performs a hardware self-test and scan. Step E12: the main system is started only on the hardware resources allocated to it. Step E13: during startup, the main system writes into the shared memory the information of the first hardware resource that will be used for heartbeat detection (such as physical memory occupancy information and CPU hardware information). Step E14: during startup, the main system selects a physical memory area of a certain size as the shared memory for communication between the main system and the shadow system. Step E15: when the main system is initialized, the shared memory is reserved: the operating system and service programs do not obtain it through the conventional memory allocation mechanism, but can access it through a dedicated interface when needed. As shown in FIG. 9, the process by which the main system starts the shadow system includes the following steps: Step F11: the main system loads the kernel image of the shadow system into the physical memory area allocated to the shadow system.
Step F12: the main system passes the information of the second hardware resource allocated to the shadow system to the shadow system's kernel as boot parameters, and makes the CPU allocated to the shadow system jump to the physical memory where the kernel is loaded; the shadow system then starts. Step F13: the shadow system starts and initializes the second hardware resource allocated to it. Step F14: the shadow system loads a query kernel, used for collecting information on the physical memory used by the main system, into a designated area of the shared memory. In addition, in the embodiment of the present invention, the shared memory is divided into areas by purpose, specifically: a static information area, used to save information that the main system and the shadow system do not update after startup; this area has a fixed size and does not scroll. It is further divided into three parts: the first part saves the information on the first hardware resource occupied by the main system and the physical memory address used by the main system; the second part stores the image of the query kernel; the third part stores the dog-feed flag information described above. A dynamic information area, used to save the running state information of the main system, such as current CPU usage, memory usage, and system logs; this area has a fixed size and scrolls dynamically: when it is full, newly written information wraps around and overwrites the beginning of the area. The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and these should also be considered within the scope of protection of the present invention.
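The two shared-memory areas described above map naturally onto a fixed record and a byte ring buffer. The Python sketch below is illustrative only: `StaticArea`, `DynamicArea`, and their field names are hypothetical, the areas live in ordinary process memory rather than a reserved physical region, and records are raw bytes with no framing, so the oldest record may be partly overwritten and appear truncated to a reader.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StaticArea:
    """Static information area: written after startup, fixed size, never scrolls."""
    main_resources: Dict[str, object] = field(default_factory=dict)  # part 1: resources and memory address
    query_kernel_image: bytes = b""  # part 2: image of the query kernel
    feed_count: int = 0              # part 3: dog-feed flag, kept as a count


class DynamicArea:
    """Dynamic information area: fixed size; when full, new data wraps to the start."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.buf = bytearray(size)
        self.pos = 0          # offset of the next byte to write
        self.wrapped = False  # True once the oldest data has started to be overwritten

    def write(self, data: bytes) -> None:
        for byte in data:
            self.buf[self.pos] = byte
            self.pos += 1
            if self.pos == len(self.buf):
                self.pos = 0
                self.wrapped = True

    def snapshot(self) -> bytes:
        """Contents in oldest-to-newest order, as the shadow system would read them."""
        if not self.wrapped:
            return bytes(self.buf[:self.pos])
        return bytes(self.buf[self.pos:] + self.buf[:self.pos])
```

For example, writing `b"cpu=50;"` and then `b"mem=30;"` into an 8-byte area leaves `b";mem=30;"` in `snapshot()`: the newer record overwrote all but the last byte of the older one.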
Industrial Applicability: The technical solution provided by the present invention can be applied to capturing system anomalies. With this solution, the main system and the shadow system run independently in one hardware environment, and after the main system crashes, the shadow system can still capture the main system's anomaly information. This is of great significance for improving the maintainability of the main system.

Claims

1. A method for capturing system anomalies, applied to a main system, comprising:

The main system starts, on a second hardware resource of a hardware environment, a shadow system for performing anomaly detection on the main system; the second hardware resource is different from the first hardware resource on which the main system runs in the hardware environment;

The main system dynamically saves its running state information in a shared memory, so that the shadow system obtains the running state information of the main system from the shared memory when an anomaly of the main system is detected; and the main system saves its own physical memory address in the shared memory, so that when the main system is abnormal, the shadow system can access the physical memory of the main system through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.

2. The capturing method according to claim 1, further comprising: the main system performing anomaly monitoring on the shadow system;

When the main system detects an anomaly of the shadow system, the main system resets the shadow system.

3. The capturing method according to claim 1, wherein the step of the main system starting, on the second hardware resource of the hardware environment, a shadow system for performing anomaly detection on the main system comprises: the main system loading the kernel of the shadow system into the physical memory of the shadow system;

The main system configuring the boot parameters of the system kernel of the shadow system according to the information of the second hardware resource; and the main system making the CPU allocated to the shadow system jump to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system.

4. The capturing method according to claim 1, further comprising: the main system saving, in the shared memory, information of the first hardware resource that supports heartbeat message detection, so that the shadow system can determine, according to the shared memory, the first hardware resource that supports heartbeat message detection and establish a heartbeat detection mechanism with that resource, so as to implement anomaly monitoring of the main system.

5. The capturing method according to claim 1, further comprising:

The main system counts user-state processes through a soft watchdog; and the main system saves the watchdog count in the shared memory in real time, thereby enabling the shadow system to perform anomaly monitoring on the main system according to how the count in the shared memory is updated.

6. A method for capturing a system anomaly, applied to a shadow system, wherein the shadow system runs on a second hardware resource in a hardware environment, and the second hardware resource is different from a first hardware resource on which a main system runs in the hardware environment; the method comprising: the shadow system performing anomaly detection on the main system;

When the shadow system detects that the main system is abnormal, the shadow system obtains the physical memory address of the main system and the running state information of the main system from a shared memory, wherein the physical memory address and the running state information in the shared memory are saved by the main system; the shadow system accesses the physical memory of the main system according to the physical memory address of the main system, and obtains information about the physical memory used by the main system;

The shadow system records the information about the physical memory used by the main system and the running state information of the main system.

7. The capturing method according to claim 6, wherein the step of the shadow system accessing the physical memory of the main system according to the physical memory address of the main system and obtaining information of the physical memory used by the main system comprises: the shadow system loading a query kernel for obtaining information of the physical memory used by the main system; the shadow system configuring the boot parameters of the query kernel according to the obtained physical memory address of the main system; and the shadow system running the query kernel, accessing the physical memory of the main system, and obtaining information of the physical memory used by the main system.

8. The capturing method according to claim 6, wherein the shared memory further includes information of the first hardware resource saved by the main system; and the step of the shadow system performing anomaly detection on the main system comprises:

The shadow system obtaining, from the shared memory, the information of the first hardware resource that supports heartbeat message detection, and determining the first hardware resource that supports heartbeat message detection;

The shadow system establishing a heartbeat detection mechanism with the first hardware resource that supports heartbeat message detection, thereby implementing anomaly monitoring of the main system.

9. The capturing method according to claim 6, wherein the shared memory further comprises a count of user-state processes kept by the main system through a soft watchdog;

The step of the shadow system performing anomaly detection on the main system comprises: the shadow system performing anomaly monitoring on the main system according to how the count in the shared memory is updated.

10. A main system, running on a first hardware resource of a hardware environment, comprising: a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system for performing anomaly detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs in the hardware environment; a first saving module, configured to dynamically save the running state information of the main system in a shared memory, so that when the shadow system detects that the main system is abnormal, it obtains the running state information of the main system from the shared memory;

A second saving module, configured to save the physical memory address of the main system in the shared memory, so that when the main system is detected to be abnormal, the shadow system can access the physical memory of the main system through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.

11. The main system according to claim 10, wherein the shared memory further comprises running state information of the shadow system; the main system comprising:

A shadow system monitoring module, configured to perform anomaly monitoring on the shadow system;

A reset module, configured to reset the shadow system when the shadow system monitoring module detects that the shadow system is abnormal.

12. The main system according to claim 10, wherein the startup module comprises: a first loading submodule, configured to load a kernel of the shadow system into the physical memory of the shadow system; a first configuration submodule, configured to configure boot parameters of the system kernel of the shadow system according to the information of the second hardware resource;

A first running submodule, configured to make the CPU assigned to the shadow system jump to the physical memory of the shadow system, thereby causing the CPU assigned to the shadow system to run the system kernel to start the shadow system.

13. The main system according to claim 10, further comprising: a third saving module, configured to save, in the shared memory, information of the first hardware resource that supports heartbeat message detection, so that the shadow system can determine, according to the shared memory, the first hardware resource that supports heartbeat message detection and establish a heartbeat detection mechanism with that resource, so as to implement anomaly monitoring of the main system.

14. The main system according to claim 10, further comprising: a watchdog module, configured to count user-state processes through a soft watchdog;

A fourth saving module, configured to save the watchdog count in the shared memory in real time, thereby enabling the shadow system to perform anomaly monitoring on the main system according to how the count in the shared memory is updated.

15. A shadow system, running on a second hardware resource in a hardware environment, the second hardware resource being different from a first hardware resource occupied by a main system in the hardware environment; the shadow system comprising:

A main system monitoring module, configured to perform anomaly detection on the main system;

A first obtaining module, configured to obtain, when the main system monitoring module detects that the main system is abnormal, the physical memory address of the main system and the running state information of the main system from a shared memory, wherein the physical memory address and the running state information in the shared memory are saved by the main system;

A second obtaining module, configured to access the physical memory of the main system according to the physical memory address of the main system obtained by the first obtaining module, and obtain information of the physical memory used by the main system;

A recording module, configured to record the information of the physical memory used by the main system and the running state information of the main system.

16. The shadow system according to claim 15, wherein the second obtaining module comprises: a second loading submodule, configured to load a query kernel for obtaining information of the physical memory used by the main system;

A second configuration submodule, configured to configure boot parameters of the query kernel according to the obtained physical memory address of the main system;

A second running submodule, configured to run the query kernel, access the physical memory of the main system, and obtain information about the physical memory used by the main system.

17. The shadow system according to claim 15, wherein the shared memory comprises information, saved by the main system, of a first hardware resource that supports heartbeat message detection; the main system monitoring module comprises:

A third obtaining submodule, configured to obtain, from the shared memory, the information of the first hardware resource that supports heartbeat message detection, and determine the first hardware resource that supports heartbeat message detection;

A first monitoring submodule, configured to establish a heartbeat detection mechanism with the first hardware resource that supports heartbeat message detection, thereby implementing anomaly monitoring of the main system.

18. The shadow system according to claim 15, wherein the shared memory further comprises a count of user-state processes kept by the main system through a soft watchdog;

The main system monitoring module comprises:

A second monitoring submodule, configured to perform anomaly monitoring on the main system according to how the count in the shared memory is updated.

19. A smart device, comprising: the main system according to any one of claims 10 to 14 and the shadow system according to any one of claims 15 to 18.