WO2016000298A1 - System exception capturing method, main system, shadow system and intelligent device - Google Patents

System exception capturing method, main system, shadow system and intelligent device

Info

Publication number
WO2016000298A1
Authority
WO
WIPO (PCT)
Prior art keywords
shadow
main system
main
physical memory
information
Application number
PCT/CN2014/084439
Other languages
French (fr)
Chinese (zh)
Inventor
蒋彪
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2016000298A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment

Definitions

  • the present invention relates to the field of computer operating system technologies, and in particular, to a method for capturing system anomalies, a main system, a shadow system, and a smart device.
  • BACKGROUND With the rapid development of computer software and hardware technology, the hardware environments and business programs that operating systems run are becoming more and more complicated. In practice, systems often crash, with symptoms such as the following: the keyboard and mouse are unresponsive, the machine does not answer ping, the display cannot be lit or cannot show any abnormality information, and the system log records no valid fault information. At this point the environment may stop responding entirely and cannot be operated. Analyzing and locating such problems has long been a major difficulty in the industry.
  • A memory hardware failure causes the operating system to hang directly, so that valid information cannot be recorded.
  • A PCI (Peripheral Component Interconnect) firmware failure causes the PCI bus to hang, eventually causing the operating system to hang. In this case, valid information cannot be recorded.
  • A hard disk hardware or firmware failure causes the operating system to hang.
  • The system is overloaded, for example by running out of memory, and the operating system hangs. In this case, the operating system cannot perform the operations needed to record exception information.
  • High-priority tasks continuously occupy the CPU, so that other low-priority tasks cannot be scheduled, eventually causing the operating system to hang. In this case, the system can only schedule high-priority tasks, and the low-priority process that records abnormality information cannot be scheduled, so valid information cannot be recorded.
  • A deadlock occurs during soft interrupt processing, so that other tasks cannot be scheduled, eventually causing the operating system to hang. In this case, because the process that records abnormality information cannot be scheduled, valid information cannot be recorded.
  • this solution is not suitable due to the additional configuration of monitoring equipment.
  • an embodiment of the present invention provides a method for capturing a system abnormality, which is applied to a main system and includes: the main system starts, on a second hardware resource of a hardware environment, a shadow system for performing abnormality detection on the main system;
  • the second hardware resource is different from the first hardware resource on which the main system runs in the hardware environment;
  • the main system dynamically saves its running state information in a shared memory, so that the shadow system, upon detecting a main system abnormality, obtains the running state information of the main system from the shared memory;
  • the main system saves its physical memory address in the shared memory, so that when the shadow system detects a main system abnormality it can access the physical memory of the main system through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.
  • the capturing method applied to the main system further includes: the main system performs abnormal monitoring on the shadow system; when the main system detects the shadow system abnormality, the main system resets the shadow system.
  • the main system starting, on the second hardware resource of the hardware environment, a shadow system for performing abnormality detection on the main system includes: the main system loads the kernel of the shadow system into the physical memory of the shadow system; the main system configures the startup parameters of the system kernel of the shadow system according to the information of the second hardware resource; the main system jumps the CPU allocated to the shadow system to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system.
  • the capturing method applied to the primary system further includes: the primary system saves, in the shared memory, information about the first hardware resource that supports heartbeat packet detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat packet detection mechanism with it, thereby monitoring the primary system for abnormalities.
  • the capturing method applied to the main system further includes: the main system counts user-state processes by means of a soft watchdog; the main system saves the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for abnormalities according to the update status of the count in the shared memory.
  • another embodiment of the present invention further provides a method for capturing a system abnormality, which is applied to a shadow system, where the shadow system runs on a second hardware resource of a hardware environment, and the second hardware resource is different from the first hardware resource on which the main system runs in the hardware environment;
  • the capturing method includes: the shadow system performs abnormality detection on the main system; when the shadow system detects a main system abnormality, it obtains the physical memory address of the main system and the running state information of the main system from a shared memory, where the physical memory address and running state information in the shared memory are saved by the main system; the shadow system accesses the physical memory of the main system according to the physical memory address of the main system and obtains information about the physical memory used by the main system;
  • the shadow system records the information about the physical memory used by the main system and the running state information of the main system.
  • the shadow system accesses the physical memory of the main system according to the physical memory address of the main system, and the information about the physical memory used by the main system includes:
  • the shadow system loads a query kernel for obtaining physical memory information of the main system;
  • the shadow system configures the boot parameters of the query kernel according to the obtained physical memory address of the primary system;
  • the shadow system runs the query kernel, accesses the physical memory of the primary system, and obtains information about the physical memory used by the primary system.
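  • the shared memory further includes information, saved by the primary system, about the first hardware resource that supports heartbeat packet detection.
  • the abnormality detection of the primary system by the shadow system includes: the shadow system obtains from the shared memory the information about the first hardware resource that supports heartbeat packet detection, determines that resource, and establishes a heartbeat packet detection mechanism with it, thereby monitoring the primary system for abnormalities.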
  • the shared memory further includes: a counting of the user state process by the main system through the soft watchdog; the abnormal detection of the main system by the shadow system includes: the shadow system is based on the update status of the count in the shared memory to the main system Perform abnormal monitoring.
  • another embodiment of the present invention further provides a main system, running on a first hardware resource of a hardware environment, comprising: a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system for performing abnormality detection on the main system, where the second hardware resource is different from the first hardware resource on which the main system runs in the hardware environment; a first saving module, configured to dynamically save the running state information of the main system in a shared memory, so that the shadow system, upon detecting a main system abnormality, obtains the running state information of the main system from the shared memory; and a second saving module, configured to save the physical memory address of the main system in the shared memory.
  • the shared memory further includes running state information of the shadow system;
  • the main system includes: a shadow system monitoring module, configured to perform abnormal monitoring on the shadow system;
  • the reset module is configured to reset the shadow system when the first monitoring module detects that the shadow system is abnormal.
  • the startup module includes: a first loading submodule configured to load a kernel of a shadow system into a physical memory of the shadow system; and a first configuration submodule configured to configure a system of the shadow system according to information of the second hardware resource The boot parameter of the kernel; the first running submodule is configured to jump the CPU allocated to the shadow system to the physical memory of the shadow system, so that the CPU assigned to the shadow system runs the system kernel to start the shadow system.
  • the main system further includes: a third saving module, configured to save, in the shared memory, information about the first hardware resource that supports heartbeat packet detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat packet detection mechanism with it, thereby monitoring the main system for abnormalities.
  • the main system further includes: a watchdog module, configured to count user-state processes by means of a soft watchdog; and a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for abnormalities according to the update status of the count in the shared memory.
  • another embodiment of the present invention further provides a shadow system, running on a second hardware resource of a hardware environment, where the second hardware resource is different from the first hardware resource occupied by the main system in the hardware environment;
  • the shadow system includes: a main system monitoring module, configured to perform abnormality detection on the main system; a first acquiring module, configured to acquire the physical memory address of the main system and the running state information of the main system from a shared memory when the main system monitoring module detects that the main system is abnormal, where the physical memory address and the running state information in the shared memory are saved by the main system; a second obtaining module, configured to access the physical memory of the main system according to the physical memory address acquired by the first acquiring module and obtain information about the physical memory used by the main system; and a recording module, configured to record the information about the physical memory used by the main system and the running state information of the main system.
  • the second obtaining module includes: a second loading submodule, configured to load a query kernel for obtaining information about the physical memory used by the main system; a second configuration submodule, configured to configure the boot parameters of the query kernel according to the obtained physical memory address of the main system; and a second running submodule, configured to run the query kernel, access the physical memory of the primary system, and obtain information about the physical memory used by the primary system.
  • the shared memory includes information, saved by the primary system, about the first hardware resource that supports heartbeat packet detection.
  • the primary system monitoring module includes: a third acquiring submodule, configured to obtain from the shared memory the information about the first hardware resource that supports heartbeat packet detection and to determine that resource; and a first monitoring submodule, configured to establish a heartbeat packet detection mechanism with the first hardware resource that supports heartbeat packet detection, thereby monitoring the primary system for abnormalities.
  • the shared memory further includes a count of user-state processes kept by the main system through the soft watchdog; the main system monitoring module includes: a second monitoring submodule, configured to monitor the main system for abnormalities according to the update status of the count in the shared memory.
  • FIG. 1 is a schematic diagram showing steps of implementing a system abnormality capturing method in a main system according to the present invention
  • FIG. 2 is a schematic diagram showing steps of implementing a system abnormality capturing method in a shadow system according to the present invention
  • FIG. 6 is a schematic structural diagram of a main system of the present invention
  • FIG. 7 is a schematic structural view of a main system of the present invention
  • FIG. 9 is a schematic diagram showing the steps of the main system booting shadow system of the smart device of the present invention.
  • the invention provides a method for capturing a system abnormality in which a shadow system is added to the original operating system (i.e., the main system of the present invention) and is dedicated to monitoring the main system for abnormalities.
  • the shadow system and the main system run on different hardware resources in the same hardware environment, so the operation of the shadow system is not affected when an abnormality occurs in the main system. The capture method of the present invention therefore significantly enhances the maintainability of the main system. As shown in FIG.
  • a method for capturing a system abnormality applied to a primary system includes: Step 11: The primary system starts, on a second hardware resource of a hardware environment, a shadow system for performing abnormality detection on the primary system; the second hardware resource is different from the first hardware resource on which the primary system runs in the hardware environment; Step 12: The primary system dynamically saves its running state information in a shared memory, so that the shadow system, upon detecting a primary system abnormality, obtains the running status information of the primary system from the shared memory.
  • the main system saves the physical memory address of the main system in the shared memory, so that when the main system is abnormal, the shadow system can access the physical memory of the main system through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.
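The save-and-read exchange through shared memory can be illustrated with a minimal sketch. The record layout, the `MAGIC` marker, and the field names below are assumptions made for illustration only; the patent does not specify any format. A `bytearray` stands in for the shared memory region.

```python
import struct

# Hypothetical record layout (not from the patent): magic marker,
# physical memory base, physical memory size, CPU usage %, memory usage %.
RECORD = struct.Struct("<IQQHH")
MAGIC = 0x5348444D  # arbitrary illustrative marker

def save_state(shared, phys_base, phys_size, cpu_pct, mem_pct):
    """Main system side: publish its memory range and running state."""
    RECORD.pack_into(shared, 0, MAGIC, phys_base, phys_size, cpu_pct, mem_pct)

def load_state(shared):
    """Shadow system side: read the record after detecting an abnormality."""
    magic, base, size, cpu, mem = RECORD.unpack_from(shared, 0)
    if magic != MAGIC:
        raise ValueError("shared memory record not initialised")
    return {"phys_base": base, "phys_size": size, "cpu_pct": cpu, "mem_pct": mem}

# A bytearray stands in for the shared memory region in this sketch.
shared = bytearray(RECORD.size)
save_state(shared, 0x1000_0000, 0x4000_0000, 37, 62)
state = load_state(shared)
```

Because the main system rewrites the record while it is healthy, the shadow system reads whatever was last written before the hang; the marker check only guards against reading a region that was never initialised.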
  • the hardware environment may be a PC, a PAD, or a mobile phone.
  • Running the main system and the shadow system on different hardware resources may mean, for example, using different CPU cores and different areas of physical memory for each. This ensures that the operation of the shadow system does not depend on the main system.
  • the method for capturing a system abnormality applied to a shadow system includes: Step 21: The shadow system performs abnormality detection on the main system; Step 22: When the shadow system detects a main system abnormality, it obtains the physical memory address of the main system and the running status information of the main system from a shared memory.
  • the shadow system accesses the physical memory of the main system according to the physical memory address of the main system and obtains the information of the physical memory used by the main system;
  • Step 24, the shadow system records the information of the physical memory used by the main system and the running status information of the main system;
  • the shadow system can still capture the abnormal information of the main system, which is more comprehensive than the abnormal information that the main system can capture afterwards.
  • the information about the physical memory used by the main system, as obtained by the shadow system, is a snapshot of the main system at the time of the abnormality.
  • the dynamic state information of the main system can be used to locate the abnormality.
  • the shadow system can be monitored by the main system on the basis of the above scheme. When the shadow system is abnormal, the main system resets the shadow system to ensure that the shadow system works normally.
  • the method further includes: Step 14: The main system performs abnormality monitoring on the shadow system; Step 15: When the main system detects the shadow system abnormality, the main system resets the shadow system .
  • the shadow system can also reset the main system after determining the abnormality of the main system, thereby ensuring the normal operation of the main system.
  • the following technical problems are faced: The primary system loads the shadow system on the specified second hardware resource, and the shadow system cannot use the hardware resources allocated to the primary system.
  • step 11 specifically includes: Step 111, the main system loads the kernel of the shadow system into the physical memory of the shadow system; Step 112, the main system configures the startup parameters of the system kernel of the shadow system according to the information of the second hardware resource; Step 113, the main system jumps the CPU allocated to the shadow system to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system.
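Steps 111 to 113 can be modelled in a few lines. This is a simplified simulation under stated assumptions, not the patent's implementation: `HardwareResource`, the parameter string format (`cpus=`, `mem_base=`, `mem_size=`), and the use of a `bytearray` as physical memory are all hypothetical, and the CPU jump of step 113 is only reported rather than performed.

```python
from dataclasses import dataclass

@dataclass
class HardwareResource:
    """Hypothetical description of the second hardware resource."""
    cpu_ids: list
    mem_base: int
    mem_size: int

def load_kernel(memory, resource, image):
    """Step 111: copy the shadow kernel image into the shadow memory region."""
    if len(image) > resource.mem_size:
        raise ValueError("kernel image larger than shadow memory region")
    memory[resource.mem_base:resource.mem_base + len(image)] = image
    return resource.mem_base  # entry address in this simplified model

def build_boot_params(resource):
    """Step 112: describe only the second hardware resource to the kernel.
    The key names are illustrative, not real kernel parameters."""
    cpus = ",".join(str(c) for c in resource.cpu_ids)
    return f"cpus={cpus} mem_base={resource.mem_base:#x} mem_size={resource.mem_size:#x}"

def start_shadow(memory, resource, image):
    entry = load_kernel(memory, resource, image)
    params = build_boot_params(resource)
    # Step 113 would redirect the shadow CPUs to `entry`; here it is only reported.
    return {"cpu_ids": resource.cpu_ids, "entry": entry, "params": params}

memory = bytearray(64)
resource = HardwareResource(cpu_ids=[2, 3], mem_base=16, mem_size=32)
boot = start_shadow(memory, resource, b"KERNEL")
```

Restricting the boot parameters to the second hardware resource is what keeps the shadow system from touching resources that belong to the main system.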
  • the first hardware resource is used to run the main system, the first hardware resource and the second hardware resource may be first divided on the original main system. The primary system is then reinitialized to cause the primary system to run on the first hardware resource.
  • Step 231: The shadow system loads a query kernel for obtaining physical memory information of the primary system; Step 232: The shadow system configures the startup parameters of the query kernel according to the acquired physical memory address of the primary system; Step 233: The shadow system runs the query kernel, accesses the physical memory of the primary system, and obtains information about the physical memory used by the primary system. Steps 231 to 233 are described in detail below in conjunction with an embodiment.
  • the shadow system obtains the physical CPU information used by the main system from the shared memory.
  • the physical CPU information is stored in the shared memory by the main system.
  • the shadow system sends an interrupt request to all CPUs used by the main system to stop the main system from running.
  • Step A13, if the main system can still respond to the interrupt request at this time, the main system stops running, stops its memory access operations, and synchronizes the state between the CPUs. If the main system cannot respond to the interrupt request at this time (it has already hung), this step is skipped.
  • the shadow system reads the physical memory address previously saved by the main system from the shared memory.
  • the shadow system uses the physical memory address of the primary system as a startup parameter of the query kernel, and then loads the query kernel. In this way, the shadow system can access the physical memory used by the primary system.
  • the shadow system reads the information of the physical memory used by the main system, and performs dumping.
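The capture sequence described above (stop the main system's CPUs on a best-effort basis, read the saved physical memory address, then copy the memory) might look like the following sketch. The dictionary keys and the `send_stop_interrupt` callback are hypothetical, and a direct slice of a `bytearray` replaces the query kernel that the patent uses to reach the main system's memory.

```python
def capture_main_memory(shared, physical_memory, send_stop_interrupt):
    """Shadow system side: stop the main system if possible, then snapshot it.

    `send_stop_interrupt(cpu)` is a hypothetical callback that returns True
    when the CPU acknowledges the request; a hung CPU simply returns False
    and the capture proceeds anyway.
    """
    acked = [cpu for cpu in shared["cpu_ids"] if send_stop_interrupt(cpu)]
    base = shared["phys_base"]
    size = shared["phys_size"]
    snapshot = bytes(physical_memory[base:base + size])
    return acked, snapshot

# Bytes 0-7 belong to the shadow system, bytes 8-15 to the main system.
ram = bytearray(b"shadow--MAINDATA")
shared = {"cpu_ids": [0, 1], "phys_base": 8, "phys_size": 8}
acked, snapshot = capture_main_memory(shared, ram, lambda cpu: cpu == 0)
```

In the example CPU 1 never acknowledges, which models a hung core; the snapshot is still taken.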
  • the dump mode can be determined according to the second hardware resource owned by the shadow system. If the shadow system has a separate hard disk, the dump file can be saved on the hard disk; if the shadow system has a separate network card, the dump file can also be uploaded over the network to a specified location.
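Choosing the dump channel from the resources the shadow system owns can be sketched as below. The two callables are hypothetical stand-ins for a disk writer and a network uploader; the patent does not define such interfaces.

```python
def dump_snapshot(snapshot, disk_writer=None, network_uploader=None):
    """Send the snapshot to every dump channel the shadow system owns.

    Both callables are illustrative placeholders: pass `disk_writer` when the
    shadow system has its own hard disk and `network_uploader` when it has
    its own network card.
    """
    used = []
    if disk_writer is not None:
        disk_writer(snapshot)
        used.append("disk")
    if network_uploader is not None:
        network_uploader(snapshot)
        used.append("network")
    if not used:
        raise RuntimeError("shadow system has no dump channel")
    return used

saved = []
channels = dump_snapshot(b"MAINDATA", disk_writer=saved.append)
```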
  • the shadow system terminates the operation of the main system before accessing the physical memory of the main system, which ensures the consistency of the obtained information about the physical memory used by the main system and improves the stability of the access process.
  • the shadow system can initialize the PCI bus before loading the query kernel. After the PCI bus is initialized, the query kernel can be used. The method of abnormal monitoring of the main system is described in detail below.
  • the method further includes: Step 16: The main system saves information about the first hardware resource that supports heartbeat packet detection in the shared memory; in step 21, the shadow system obtains from the shared memory the information about the first hardware resource that supports heartbeat packet detection and determines that resource. It then establishes a heartbeat packet detection mechanism with the first hardware resource that supports heartbeat packet detection, thereby monitoring the main system for abnormalities.
  • the steps related to detecting the heartbeat message are introduced in the following with reference to specific embodiments.
  • the shadow system performs abnormality monitoring based on the NIC resources occupied by the main system, including the steps shown in FIG. 3: Step B11, after the shadow system is started, it obtains the NIC information of the main system from the shared memory and periodically sends heartbeat request packets to the specified NIC of the main system through the NIC resources assigned to the shadow system. Because both systems share the same hardware environment, if the main system has NIC resources, NIC resources can also be allocated to the shadow system. Step B12, the main system receives the heartbeat request packet sent by the shadow system through its own network card and replies with the corresponding heartbeat response packet. Step B13, if the shadow system detects that a specified number of heartbeat response packets have been lost within a specified time, it determines that the main system is abnormal.
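The decision rule in step B13 (a specified number of lost responses within a specified time) can be expressed as a sliding-window counter. This is one possible reading of the rule; the class name, parameters, and the use of caller-supplied timestamps are assumptions for illustration.

```python
from collections import deque

class HeartbeatMonitor:
    """Judge the main system abnormal once `max_lost` heartbeat responses
    have been lost within the last `window_s` seconds (illustrative names)."""

    def __init__(self, max_lost, window_s):
        self.max_lost = max_lost
        self.window_s = window_s
        self._lost = deque()  # timestamps of lost responses

    def record(self, now, response_received):
        """Record one heartbeat round; return True if the main system is
        judged abnormal."""
        if not response_received:
            self._lost.append(now)
        # Forget losses that fall outside the observation window.
        while self._lost and now - self._lost[0] > self.window_s:
            self._lost.popleft()
        return len(self._lost) >= self.max_lost

monitor = HeartbeatMonitor(max_lost=3, window_s=10.0)
results = [monitor.record(t, ok) for t, ok in [(0, True), (1, False), (2, False), (3, False)]]
```

Requiring several losses inside a window, rather than reacting to a single missed packet, avoids declaring an abnormality because of one dropped frame.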
  • the shadow system detects abnormalities based on the state of the CPU resources occupied by the main system, which specifically includes the steps shown in FIG. 4: Step C11, after being started, the shadow system reads from the shared memory the CPU resource information used by the main system.
  • Step C12 The shadow system periodically sends an inter-core interrupt to the CPU used by the main system as a heartbeat request message; this step is applicable to the chip of the multi-core CPU, and the shadow system and the main system can run on different CPUs.
  • Step C13 After receiving the inter-core interrupt sent by the shadow system through the shared memory, the main system returns the response of the inter-core interrupt to the CPU used by the shadow system, that is, returns the heartbeat response message.
  • Step C14 If the shadow system does not receive the response of the inter-core interrupt sent by the primary system within the specified time, the primary system is determined to be abnormal.
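Steps C12 to C14 amount to a request/acknowledge exchange with a deadline. The sketch below models only the timing logic; `IpiHeartbeat` and its methods are hypothetical names, and real inter-core interrupts are replaced by plain method calls with caller-supplied timestamps.

```python
class IpiHeartbeat:
    """Track the oldest unanswered inter-core heartbeat request."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._sent_at = None  # time of the oldest request still awaiting a response

    def send(self, now):
        """Step C12: the shadow system issues a heartbeat request."""
        if self._sent_at is None:
            self._sent_at = now

    def ack(self):
        """Step C13: the main system's response arrived."""
        self._sent_at = None

    def main_abnormal(self, now):
        """Step C14: abnormal if a request has gone unanswered too long."""
        return self._sent_at is not None and now - self._sent_at > self.timeout_s

hb = IpiHeartbeat(timeout_s=2.0)
hb.send(0.0)
hb.ack()
healthy = hb.main_abnormal(5.0)
hb.send(5.0)
hb.send(6.0)
hung = hb.main_abnormal(7.5)
```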
  • the shadow system can obtain the first physical resource used by the primary system to support the detection of the heartbeat packet through the shared memory, thereby establishing an abnormality detecting mechanism.
  • the main system can in turn monitor the shadow system for abnormalities in the same way, which is not described again here.
  • the shadow system can also establish abnormality monitoring according to the running processes of the main system.
  • the method further includes: Step 17, the main system counts user-state processes by using a soft watchdog; Step 18, the main system saves the watchdog count in the shared memory in real time.
  • the shadow system determines whether the main system is deadlocked according to the update status of the watchdog count in the shared memory.
  • the execution of a user-state process serves as the dog-feeding condition, which specifically includes the steps shown in FIG. 5:
  • Step D11, the main system starts the watchdog driver.
  • Step D12, the main system feeds the dog according to the executed user-state process.
  • Step D13, the simulated watchdog driver of the main system receives the dog-feed request from the user-state process and writes the dog-feed flag to the designated area in the shared memory (the flag can be implemented as a simple count).
  • Step D14, the shadow system periodically queries the dog-feed flag in the designated area of the shared memory.
  • Step D15, if the shadow system determines, according to the shared memory, that the dog-feed flag of the main system has not been updated within a limited time, the main system is considered abnormal.
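Steps D13 to D15 can be sketched with the dog-feed flag implemented as a simple count, as the text suggests. The dictionary key `feed_count`, the `FeedWatcher` class, and the timestamps are illustrative assumptions; a plain dictionary stands in for the designated shared memory area.

```python
def feed(shared):
    """Main system side (step D13): bump the dog-feed count in shared memory."""
    shared["feed_count"] = shared.get("feed_count", 0) + 1

class FeedWatcher:
    """Shadow system side (steps D14 and D15): detect a stalled count."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_count = None
        self._last_change = None

    def poll(self, now, count):
        """Return True if the count has not changed for longer than timeout_s."""
        if count != self._last_count:
            self._last_count = count
            self._last_change = now
            return False
        return now - self._last_change > self.timeout_s

shared = {}
watcher = FeedWatcher(timeout_s=5.0)
feed(shared)
first = watcher.poll(0.0, shared["feed_count"])
feed(shared)
second = watcher.poll(3.0, shared["feed_count"])
# The main system stops feeding from here on.
third = watcher.poll(7.0, shared["feed_count"])
fourth = watcher.poll(9.0, shared["feed_count"])
```

The shadow system only compares successive readings, so it needs no cooperation from the main system beyond the periodic write.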
  • a high-priority user state process can be used as a feeding condition to reduce resource running consumption.
  • the main system can also monitor the shadow system for abnormalities according to the principle of the fourth embodiment.
  • the running status information of the main system may further include: a current CPU usage of the main system, a memory usage, and a system log.
  • the present invention also provides a main system, running on a first hardware resource of a hardware environment, as shown in FIG.
  • a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system for performing abnormality detection on the main system, where the second hardware resource is different from the first hardware resource on which the main system runs in the hardware environment; a first saving module, configured to dynamically save the running state information of the main system in a shared memory, so that the shadow system, upon detecting a main system abnormality, obtains the running state information of the main system from the shared memory; and a second saving module, configured to save the physical memory address of the main system in the shared memory.
  • the shadow system can access the physical memory of the primary system through the physical memory address in the shared memory and obtain information about the physical memory used by the primary system when the primary system is abnormal.
  • the shadow system can be monitored by the main system on the basis of the above scheme.
  • When the shadow system is abnormal, the main system resets the shadow system to ensure that the shadow system works normally. That is, the main system includes: a shadow system monitoring module configured to monitor the shadow system for abnormalities; and a reset module configured to reset the shadow system when the shadow system monitoring module detects that the shadow system is abnormal.
  • the startup module includes: a first loading submodule, configured to load a kernel of a shadow system into a physical memory of the shadow system; a first configuration submodule, configured to configure the boot parameters of the system kernel of the shadow system according to the information of the second hardware resource; and a first running submodule, configured to jump the CPU allocated to the shadow system to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system.
  • the main system further includes: a third saving module, configured to save, in the shared memory, information about the first hardware resource that supports heartbeat packet detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat packet detection mechanism with it, thereby monitoring the primary system for abnormalities.
  • the main system further includes: a watchdog module, configured to count user-state processes by means of a soft watchdog; and a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for abnormalities according to the update status of the count in the shared memory.
  • the main system of the present embodiment corresponds to the method for capturing a system abnormality applied to the main system of the present invention, and the technical effects achievable by the method can likewise be achieved by the main system of the present embodiment.
  • the present invention further provides a shadow system, running on a second hardware resource of a hardware environment, where the second hardware resource is different from the first hardware resource occupied by the main system in the hardware environment;
  • the shadow system includes:
  • the main system monitoring module is configured to perform an abnormality detection on the main system;
  • the first obtaining module is configured to obtain the physical memory address of the main system and the running state information of the main system from a shared memory when the main system monitoring module detects that the main system is abnormal, where the physical memory address and the running state information in the shared memory are saved by the main system;
  • the second obtaining module is configured to access the physical memory of the main system according to the physical memory address obtained by the first obtaining module, and to obtain information about the physical memory used by the main system;
  • the recording module is set to record the information of the physical memory used by the main system and the running status information of the main system.
  • the second obtaining module includes: a second loading submodule, configured to load a query kernel for obtaining information about the physical memory used by the main system; a second configuration submodule, configured to configure the boot parameters of the query kernel according to the obtained physical memory address of the main system; and a second running submodule, configured to run the query kernel, access the physical memory of the primary system, and obtain information about the physical memory used by the primary system.
  • the shared memory includes information, saved by the main system, on the first hardware resource that supports heartbeat-message detection.
  • the main system monitoring module includes: a third acquisition submodule, configured to obtain from the shared memory the information on the first hardware resource that supports heartbeat-message detection, and to determine that first hardware resource; and a first monitoring submodule, configured to establish a heartbeat-message detection mechanism with the first hardware resource that supports heartbeat-message detection, thereby monitoring the main system for anomalies.
  • through the shared memory, the shadow system can identify the first hardware resource that the main system uses to support heartbeat-message detection, and can thereby establish an anomaly detection mechanism.
  • the main system can in turn monitor the shadow system for anomalies in the same way, which is not repeated here.
  • the shared memory further includes: the count that the main system keeps of its user-mode processes by means of the soft watchdog;
  • the main system monitoring module includes: a second monitoring submodule, configured to monitor the main system for anomalies according to how the count in the shared memory is updated.
  • the shadow system of this embodiment corresponds to the system anomaly capture method of the present invention as applied to the shadow system; the technical effects achievable by that method can likewise be achieved by the shadow system of this embodiment.
  • the present invention also provides a smart device, including: a main system and a shadow system provided by the present invention.
  • the smart device can be a PC, a PAD, or a mobile phone.
  • the process by which the smart device starts the main system includes the following steps: Step E11, the hardware is powered on, and the BIOS performs hardware self-test and scanning.
  • Step E12, the main system is started only on the hardware resources allocated to the main system.
  • Step E13, during startup of the main system, the detected information on the first hardware resource used for heartbeat messages (such as physical memory location occupancy information and CPU hardware information) is written into the shared memory.
  • Step E14, during startup of the main system, a physical memory area of a certain size is selected as the shared memory for communication between the main system and the shadow system.
  • Step E15, when the main system is initialized, the shared memory is reserved: the operating system and business programs cannot obtain it through conventional memory allocation, but it can be accessed through a dedicated interface when needed.
  • the process by which the main system starts the shadow system includes the following steps: Step F11, the main system loads the kernel image of the shadow system into the physical memory area allocated to the shadow system.
  • Step F12, the main system passes the information on the second hardware resource allocated to the shadow system to the shadow system's kernel as boot parameters, and makes the CPU allocated to the shadow system jump to the physical memory where that kernel is loaded, whereupon the shadow system starts.
  • Step F13, the shadow system starts and initializes the second hardware resource allocated to it.
  • Step F14, the shadow system loads the query kernel, used to collect information on the main system's use of physical memory, into a designated area of the shared memory.
  • the shared memory is divided into areas according to use, specifically: a static information area, used to save information that the main system and the shadow system do not update after startup. This area is fixed in size and does not scroll.
  • the static information area is divided into three parts: the first part saves the information on the first hardware resource occupied by the main system and the physical memory address used by the main system.
  • the second part stores the image of the query kernel.
  • the third part stores the watchdog marking information described above.
  • the dynamic information area is used to save the running state information of the main system, such as current CPU usage, memory usage, and system logs.
  • the dynamic information area is fixed in size and scrolls dynamically: when the area is full, newly written information wraps around and overwrites the area from its beginning.
  • the above technical solution provided by the present invention can be applied to the capture of system anomalies.
  • the main system and the shadow system can run independently within one hardware environment, and after the main system crashes the shadow system can still capture the main system's exception information, which is of great significance for enhancing the maintainability of the main system.
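The scrolling behaviour of the dynamic information area described above (fixed size, new writes overwriting from the beginning once the area is full) can be illustrated with a small ring buffer. This is only a sketch under assumptions: the class name `DynamicArea`, its methods, and the byte-oriented layout are hypothetical and are not taken from the patent.

```python
class DynamicArea:
    """Fixed-size scrolling region: once full, new bytes overwrite the oldest."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.buf = bytearray(size)
        self.write_pos = 0       # offset of the next byte to write
        self.total_written = 0   # number of bytes written so far

    def write(self, data: bytes) -> None:
        """Append data, wrapping to the start of the area when the end is reached."""
        for byte in data:
            self.buf[self.write_pos] = byte
            self.write_pos = (self.write_pos + 1) % self.size
        self.total_written += len(data)

    def snapshot(self) -> bytes:
        """Return the bytes currently retained, oldest first."""
        if self.total_written < self.size:
            return bytes(self.buf[: self.write_pos])
        return bytes(self.buf[self.write_pos:] + self.buf[: self.write_pos])
```

A reader on the shadow side would only need the write position to reconstruct the most recent state information in order, which is why the sketch tracks it explicitly.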

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Provided are a system exception capturing method, a main system, a shadow system and an intelligent device. The method at a main system side comprises: a main system starting a shadow system used for performing exception detection on the main system on a second hardware resource in a hardware environment, the second hardware resource being different from a first hardware resource for the operation of the main system in the hardware environment; the main system dynamically saving its own operation state information in a shared memory, so that when it is monitored that an exception occurs in the main system, the shadow system acquires the operation state information about the main system from the shared memory; the main system saving its own physical memory address in the shared memory, so that when it is monitored that an exception occurs in the main system, the shadow system can access a physical memory of the main system via the physical memory address in the shared memory and acquires information about the use of the physical memory by the main system. In the solution of the present invention, in the case where an operating system (i.e. a main system) breaks down, exception information thereabout can also be captured.

Description

System anomaly capture method, main system, shadow system and smart device
TECHNICAL FIELD
The present invention relates to the field of computer operating system technologies, and in particular to a method for capturing system anomalies, a main system, a shadow system, and a smart device.
BACKGROUND
With the rapid development of computer software and hardware, the hardware environments and business programs that operating systems run are becoming increasingly complex. In practice, systems often hang. Possible symptoms include: the keyboard and mouse do not respond, the machine cannot be pinged, the display cannot be lit or shows no exception information, and the system log records no useful fault information. At that point the environment may have lost all responsiveness and cannot be operated. Analyzing and locating such problems has long been a major difficulty in the industry.
Existing operating systems offer some means of locating hang problems: on Linux, kdump can capture kernel software exceptions, nmi_watchdog can capture interrupt deadlocks in the kernel, and watchdog can capture kernel scheduling anomalies. However, no useful information can be captured for hangs with the following causes:
1. A CPU (Central Processing Unit) hardware failure hangs the operating system. The CPU hardware hangs outright, so the operating system running on that CPU hangs with it and no useful information can be recorded.
2. A memory hardware failure hangs the operating system. The memory hardware failure hangs the operating system outright, so no useful information can be recorded.
3. A hardware or firmware failure in a PCI (Peripheral Component Interconnect) device hangs the PCI bus and ultimately the operating system. In this case no useful information can be recorded.
4. A hard disk hardware or firmware failure hangs the operating system. The disk failure hangs system I/O (input/output), so logs cannot be written.
5. Excessive system load, such as memory exhaustion, hangs the operating system. In this case the operating system cannot carry out the operations needed to record exception information.
6. A high-priority task monopolizes the CPU so that lower-priority tasks cannot be scheduled, and the operating system eventually hangs. The system can only schedule the high-priority task, while the lower-priority processes that record exception information never run, so no useful information can be recorded.
7. A deadlock occurs during soft interrupt handling so that other tasks cannot be scheduled, and the operating system eventually hangs. Because the processes that record exception information cannot be scheduled, no useful information can be recorded.
To address the above problems, an obvious idea is to configure a dedicated monitoring device that captures the monitored device's exception information in real time, so that a hang of the monitored device's system does not affect exception capture by the monitoring device.
However, because it requires additional monitoring equipment, this solution lacks general applicability.
SUMMARY OF THE INVENTION
The technical problem to be solved by embodiments of the present invention is to provide a system anomaly capture method, a main system, a shadow system, and a smart device that allow a main system and a shadow system to run independently within one hardware environment, so that after the main system crashes the shadow system can still capture the main system's exception information.
To solve the above technical problem, an embodiment of the present invention provides a system anomaly capture method applied to a main system, including:
the main system starts, on a second hardware resource of a hardware environment, a shadow system used to perform anomaly detection on the main system, the second hardware resource being different from the first hardware resource of the hardware environment on which the main system runs;
the main system dynamically saves its own running state information in a shared memory, so that the shadow system obtains the running state information from the shared memory when it detects that the main system is abnormal;
the main system saves its own physical memory address in the shared memory, so that the shadow system, on detecting that the main system is abnormal, can access the main system's physical memory through that address and obtain information on the main system's use of physical memory.
The capture method applied to the main system further includes: the main system monitors the shadow system for anomalies; when the main system detects that the shadow system is abnormal, the main system resets the shadow system.
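The requirement that the second hardware resource differ from the first can be pictured as a disjointness check. The following is a minimal sketch, assuming a hardware resource is modelled as a set of CPU core ids plus a tuple of physical address ranges; `HardwareResource`, `ranges_overlap`, and `is_disjoint` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareResource:
    cpus: frozenset      # CPU core ids owned by this system
    mem_ranges: tuple    # (start, end) physical address ranges, end exclusive


def ranges_overlap(a, b) -> bool:
    """True if half-open ranges a and b share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def is_disjoint(first: HardwareResource, second: HardwareResource) -> bool:
    """True if the two resources share no CPU core and no physical memory."""
    if first.cpus & second.cpus:
        return False
    return not any(
        ranges_overlap(a, b)
        for a in first.mem_ranges
        for b in second.mem_ranges
    )
```

The shared memory region is deliberately left out of this model, since both systems are meant to reach it.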
Starting, by the main system on the second hardware resource of the hardware environment, a shadow system used to perform anomaly detection on the main system includes: the main system loads the kernel of the shadow system into the physical memory of the shadow system; the main system configures the boot parameters of the shadow system's kernel according to the information on the second hardware resource; the main system makes the CPU allocated to the shadow system jump to the shadow system's physical memory, so that this CPU runs the kernel and the shadow system starts.
The capture method applied to the main system further includes: the main system saves, in the shared memory, information on the first hardware resource that supports heartbeat-message detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat-message detection mechanism with it, thereby monitoring the main system for anomalies.
The capture method applied to the main system further includes: the main system counts user-mode processes by means of a soft watchdog; the main system saves the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for anomalies according to how the count in the shared memory is updated.
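The watchdog-count mechanism just described can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: the shared memory is reduced to a single counter, and the names `SharedCounter`, `main_system_tick`, `CountMonitor`, and the `max_stale_polls` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SharedCounter:
    watchdog_count: int = 0  # written only by the main system


def main_system_tick(shm: SharedCounter) -> None:
    """Main-system side: the soft watchdog advances the shared count."""
    shm.watchdog_count += 1


class CountMonitor:
    """Shadow-system side: report an anomaly when the count stops changing."""

    def __init__(self, max_stale_polls: int) -> None:
        self.max_stale_polls = max_stale_polls
        self.last_seen = None   # count observed at the previous change
        self.stale_polls = 0    # consecutive polls without a change

    def poll(self, shm: SharedCounter) -> bool:
        """Return True once the count has been unchanged for too many polls."""
        if shm.watchdog_count != self.last_seen:
            self.last_seen = shm.watchdog_count
            self.stale_polls = 0
        else:
            self.stale_polls += 1
        return self.stale_polls >= self.max_stale_polls
```

Tolerating a few stale polls before declaring an anomaly is a design choice of this sketch; it avoids false alarms when the two systems poll and update at slightly different rates.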
In addition, another embodiment of the present invention provides a system anomaly capture method applied to a shadow system, where the shadow system runs on a second hardware resource of a hardware environment, and the second hardware resource is different from the first hardware resource of the hardware environment on which the main system runs. The capture method includes: the shadow system performs anomaly detection on the main system; when the shadow system detects that the main system is abnormal, it obtains the main system's physical memory address and running state information from a shared memory, both of which were saved there by the main system; the shadow system accesses the main system's physical memory according to that address and obtains information on the main system's use of physical memory; the shadow system records the information on the main system's use of physical memory together with the main system's running state information.
Accessing the main system's physical memory according to its physical memory address and obtaining information on the main system's use of physical memory includes: the shadow system loads a query kernel used to obtain the main system's physical memory information; the shadow system configures the boot parameters of the query kernel according to the obtained physical memory address of the main system; the shadow system runs the query kernel, accesses the main system's physical memory, and obtains information on the main system's use of physical memory.
The shared memory further includes information on the first hardware resource, saved by the main system.
Anomaly detection on the main system by the shadow system includes: the shadow system obtains, from the shared memory, the information on the first hardware resource that supports heartbeat-message detection, and determines that resource; the shadow system establishes a heartbeat-message detection mechanism with that resource, thereby monitoring the main system for anomalies.
The shared memory further includes the count that the main system keeps of its user-mode processes by means of a soft watchdog; anomaly detection on the main system by the shadow system then includes: the shadow system monitors the main system for anomalies according to how the count in the shared memory is updated.
In addition, another embodiment of the present invention provides a main system running on a first hardware resource of a hardware environment, including: a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system used to perform anomaly detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs; a first saving module, configured to dynamically save the main system's running state information in a shared memory, so that the shadow system obtains it from the shared memory when it detects that the main system is abnormal; a second saving module, configured to save the main system's physical memory address in the shared memory, so that the shadow system, on detecting that the main system is abnormal, can access the main system's physical memory through that address and obtain information on the main system's use of physical memory.
The shared memory further includes running state information of the shadow system; the main system includes: a shadow system monitoring module, configured to monitor the shadow system for anomalies; and a reset module, configured to reset the shadow system when the shadow system monitoring module detects that the shadow system is abnormal.
The startup module includes: a first loading submodule, configured to load the kernel of the shadow system into the physical memory of the shadow system; a first configuration submodule, configured to configure the boot parameters of the shadow system's kernel according to the information on the second hardware resource; and a first running submodule, configured to make the CPU allocated to the shadow system jump to the shadow system's physical memory, so that this CPU runs the kernel and the shadow system starts.
The main system further includes: a third saving module, configured to save, in the shared memory, information on the first hardware resource that supports heartbeat-message detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat-message detection mechanism with it, thereby monitoring the main system for anomalies.
The main system further includes: a watchdog module, configured to count user-mode processes by means of a soft watchdog; and a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for anomalies according to how the count in the shared memory is updated.
In addition, another embodiment of the present invention provides a shadow system running on a second hardware resource of a hardware environment, where the second hardware resource is different from the first hardware resource that the main system occupies in the hardware environment. The shadow system includes: a main system monitoring module, configured to perform anomaly detection on the main system; a first acquisition module, configured to obtain the main system's physical memory address and running state information from a shared memory when the main system monitoring module detects that the main system is abnormal, both having been saved in the shared memory by the main system; a second acquisition module, configured to access the main system's physical memory according to the physical memory address obtained by the first acquisition module, and to obtain information on the main system's use of physical memory; and a recording module, configured to record the information on the main system's use of physical memory and the main system's running state information.
The second acquisition module includes: a second loading submodule, configured to load a query kernel used to obtain information on the main system's use of physical memory; a second configuration submodule, configured to configure the boot parameters of the query kernel according to the obtained physical memory address of the main system; and a second running submodule, configured to run the query kernel, access the main system's physical memory, and obtain information on the main system's use of physical memory.
The shared memory includes information, saved by the main system, on the first hardware resource that supports heartbeat-message detection.
The main system monitoring module includes: a third acquisition submodule, configured to obtain from the shared memory the information on the first hardware resource that supports heartbeat-message detection, and to determine that resource; and a first monitoring submodule, configured to establish a heartbeat-message detection mechanism with the first hardware resource that supports heartbeat-message detection, thereby monitoring the main system for anomalies.
The shared memory further includes the count that the main system keeps of its user-mode processes by means of a soft watchdog; the main system monitoring module includes: a second monitoring submodule, configured to monitor the main system for anomalies according to how the count in the shared memory is updated.
In addition, another embodiment of the present invention provides a smart device including the above main system and the above shadow system.
The beneficial effects of the above technical solutions of the embodiments of the present invention are as follows: the solution allows the main system and the shadow system to run independently within one hardware environment, and after the main system crashes the shadow system can still capture the main system's exception information. This is of great significance for enhancing the maintainability of the main system.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the steps of the system anomaly capture method of the present invention as implemented on the main system;
FIG. 2 is a schematic diagram of the steps of the system anomaly capture method of the present invention as implemented on the shadow system;
FIGS. 3 to 5 are schematic diagrams of the steps of different embodiments in which the system anomaly capture method of the present invention determines that the main system is abnormal;
FIG. 6 is a schematic structural diagram of the main system of the present invention;
FIG. 7 is a schematic structural diagram of the shadow system of the present invention;
FIG. 8 is a schematic diagram of the steps by which the smart device of the present invention starts the main system;
FIG. 9 is a schematic diagram of the steps by which the main system of the smart device of the present invention starts the shadow system.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description follows with reference to the accompanying drawings and specific embodiments.
To address the difficulty in the prior art of recording exception information after a system crash, the present invention provides a system anomaly capture method that introduces, alongside the original operating system (the main system herein), a shadow system dedicated to monitoring anomalies of the main system. The shadow system and the main system run on different hardware resources of the same hardware environment, so a crash of the main system does not affect the operation of the shadow system. The capture method of the present invention is therefore of great significance for enhancing the maintainability of the main system.
As shown in FIG. 1, the system anomaly capture method applied to the main system includes:
Step 11, the main system starts, on a second hardware resource of a hardware environment, a shadow system used to perform anomaly detection on the main system; the second hardware resource is different from the first hardware resource of the hardware environment on which the main system runs;
Step 12, the main system dynamically saves its own running state information in a shared memory, so that the shadow system obtains the running state information from the shared memory when it detects that the main system is abnormal;
Step 13, the main system saves its own physical memory address in the shared memory, so that the shadow system, on detecting that the main system is abnormal, can access the main system's physical memory through that address and obtain information on the main system's use of physical memory.
The hardware environment may be a PC, a PAD, or a mobile phone. Running the main system and the shadow system on different hardware resources may mean running them on different CPU cores, different regions of physical memory, and so on, which ensures that the operation of the shadow system does not depend on the main system.
As shown in FIG. 2, the system anomaly capture method applied to the shadow system includes:
Step 21, the shadow system performs anomaly detection on the main system;
Step 22, when the shadow system detects that the main system is abnormal, it obtains the main system's physical memory address and running state information from a shared memory, both of which were saved there by the main system;
Step 23, the shadow system accesses the main system's physical memory according to that address and obtains information on the main system's use of physical memory;
Step 24, the shadow system records the information on the main system's use of physical memory and the main system's running state information.
As the capture methods of FIG. 1 and FIG. 2 show, the entire solution can be implemented in software within a single hardware environment, so no separate hardware needs to be provisioned to support it and deployment is very convenient. Moreover, when the main system has crashed, the shadow system can still capture the main system's exception information, and this is more complete than whatever the main system could capture by itself after crashing. The information on the main system's use of physical memory obtained by the shadow system is in effect a snapshot of the main system at the time of the anomaly; combined with the dynamically saved running state information, it makes it possible to locate the anomaly.
In addition, to make the shadow system's monitoring of the main system more reliable, the main system can in turn monitor the shadow system: when the shadow system becomes abnormal, the main system resets it, thereby keeping the shadow system working normally.
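Steps 21 to 24 above can be sketched against a simulated physical memory. This is a simplified illustration under assumptions: physical memory is a `bytearray`, the anomaly check is passed in as a callable, and the memory is read directly instead of through a query kernel. `SharedMemory`, its fields, and `capture` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    main_mem_start: int = 0   # saved by the main system
    main_mem_len: int = 0     # saved by the main system
    run_state: list = field(default_factory=list)  # saved by the main system


def capture(shm, physical_memory, main_is_abnormal):
    """Shadow side of steps 21-24; returns a record, or None if the main system is healthy."""
    if not main_is_abnormal():                                # step 21: anomaly detection
        return None
    start, length = shm.main_mem_start, shm.main_mem_len      # step 22: read shared memory
    state = list(shm.run_state)
    snapshot = bytes(physical_memory[start:start + length])   # step 23: read main memory
    return {"memory_snapshot": snapshot, "run_state": state}  # step 24: record both
```

Copying the state and the memory contents before returning mirrors the idea that the record is a snapshot taken at the time of the anomaly.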
That is, in the above method for capturing the system abnormality applied to the main system, the method further includes: Step 14: The main system performs abnormality monitoring on the shadow system; Step 15: When the main system detects the shadow system abnormality, the main system resets the shadow system . Of course, it should be noted that, according to the solution of the present invention, the shadow system can also reset the main system after determining the abnormality of the main system, thereby ensuring the normal operation of the main system. In addition, in the actual implementation of step 11, the following technical problems are faced: The primary system loads the shadow system on the specified second hardware resource, and the shadow system cannot use the hardware resources allocated to the primary system. To this end, the present invention provides an implementation, that is, step 11 specifically includes: Step 111, the main system loads the kernel of the shadow system into the physical memory of the shadow system; Step 112, the main system configures according to the information of the second hardware resource The startup parameter of the system kernel of the shadow system; Step 113, the main system jumps the CPU allocated to the shadow system to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system. When the first hardware resource is used to run the main system, the first hardware resource and the second hardware resource may be first divided on the original main system. The primary system is then reinitialized to cause the primary system to run on the first hardware resource. The specific principle is the same as the above steps 111 to 113, and will not be repeated herein. 
Similarly, in the implementation of step 23, the kernel technology is also applied, which specifically includes: Step 231: The shadow system loads a query kernel for obtaining physical memory information of the primary system; Step 232, the primary system of the shadow system root acquired The physical memory address configuration queries the startup parameters of the kernel; Step 233, the shadow system runs the query kernel, accesses the physical memory of the primary system, and obtains information about the physical memory used by the primary system. Steps 231 to 233 are described in detail below in conjunction with an embodiment.
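The hand-off through the shared memory (steps 12 and 13 on the main-system side, steps 22 to 24 on the shadow-system side) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the field layout `ADDR_FMT`, the function names, and the use of a `bytearray` to stand in for the shared memory and a `bytes` object to stand in for physical memory are all hypothetical and do not come from the patent.

```python
import struct

# Hypothetical layout (an assumption, not from the patent): the shared memory
# starts with two little-endian 64-bit fields, the start and the length of the
# main system's physical memory range, followed by free-form running state.
ADDR_FMT = "<QQ"
ADDR_SIZE = struct.calcsize(ADDR_FMT)


def main_save_phys_range(shm: bytearray, start: int, length: int) -> None:
    """Step 13: the main system records its physical memory range."""
    struct.pack_into(ADDR_FMT, shm, 0, start, length)


def main_save_state(shm: bytearray, text: str) -> None:
    """Step 12: the main system saves (a piece of) its running state."""
    data = text.encode()
    if ADDR_SIZE + len(data) > len(shm):
        raise ValueError("running state does not fit in the shared memory")
    shm[ADDR_SIZE:ADDR_SIZE + len(data)] = data


def shadow_capture(shm: bytearray, phys_mem: bytes) -> dict:
    """Steps 22 to 24: read the saved range, snapshot that memory, record it."""
    start, length = struct.unpack_from(ADDR_FMT, shm, 0)   # step 22
    snapshot = phys_mem[start:start + length]              # step 23
    state = bytes(shm[ADDR_SIZE:]).rstrip(b"\x00")         # step 22
    return {"range": (start, length), "snapshot": snapshot, "state": state}  # step 24


if __name__ == "__main__":
    shm = bytearray(64)
    main_save_phys_range(shm, 16, 8)
    main_save_state(shm, "cpu=93%")
    print(shadow_capture(shm, bytes(range(256))))
```

The point of the sketch is that the shadow system needs nothing from the main system at capture time: everything it reads was placed in the shared memory beforehand.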
<Embodiment 1>
Step A11: The shadow system reads from the shared memory the information about the physical CPUs used by the main system. This information was saved in the shared memory by the main system.
Step A12: The shadow system sends an interrupt request to every CPU used by the main system in order to stop them.
Step A13: If the main system can still respond to the interrupt request, it stops running, stops accessing memory, and synchronizes state across its CPUs. If the main system cannot respond to the interrupt request (it has already hung), this step is skipped.
Step A14: The shadow system reads from the shared memory the physical memory address previously saved by the main system.
Step A15: The shadow system passes the main system's physical memory address to the query kernel as a startup parameter and then loads the query kernel. The shadow system can then access the physical memory used by the main system.
Step A16: The shadow system reads the information in the physical memory used by the main system and dumps it. The dump method depends on the second hardware resource owned by the shadow system: if the shadow system has its own hard disk, the dump file can be saved to that disk; if it has its own network card, the dump file can also be uploaded over the network to a specified location.
In Embodiment 1, the shadow system preferably stops the main system before accessing its physical memory, which keeps the captured physical-memory information consistent and makes the access more stable. In addition, if the main system crashed because the PCI bus hung, the shadow system, which shares the same PCI bus, may be unable to perform PCI-related operations and therefore unable to capture the abnormality information. As a further preferred option, the shadow system can therefore initialize the PCI bus before loading the query kernel; once the PCI bus has been initialized, the query kernel can be used.
The way the main system is monitored for abnormalities is described in detail below. Specifically, the method for capturing a system abnormality applied to the main system further includes:
Step 16: The main system saves in the shared memory the information about the first hardware resource that supports heartbeat message detection.
Correspondingly, in step 21 above, the shadow system obtains from the shared memory the information about the first hardware resource that supports heartbeat message detection and determines which first hardware resource that is. It then establishes a heartbeat message detection mechanism with that resource, thereby monitoring the main system for abnormalities. The heartbeat detection steps are described below with reference to specific embodiments.
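Before turning to the heartbeat embodiments, the capture sequence of Embodiment 1 (steps A12 to A16) can be summarized as an ordered plan. The sketch below only builds a list of descriptive actions; it does not touch real hardware. The class `ShadowResources`, both function names, and the choice to prefer a disk over the network when both are available are assumptions made for illustration, since the text only says that either destination may be used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShadowResources:
    """Hypothetical description of what the shadow system owns."""
    has_disk: bool = False
    has_nic: bool = False


def choose_dump_target(res: ShadowResources) -> str:
    # Assumption: prefer the local disk; fall back to a network upload.
    if res.has_disk:
        return "disk"
    if res.has_nic:
        return "network"
    return "none"


def capture_plan(main_cpus: List[int], main_responsive: bool,
                 res: ShadowResources, reinit_pci: bool = False) -> List[str]:
    """Return the actions of steps A12 to A16 in order."""
    plan = [f"send stop interrupt to CPU {c}" for c in main_cpus]      # A12
    if main_responsive:                                                # A13
        plan.append("main system halts and synchronizes CPU state")
    if reinit_pci:                                                     # optional
        plan.append("initialize PCI bus")
    plan.append("read main-system physical memory address from shared memory")  # A14
    plan.append("load query kernel with that address as startup parameter")     # A15
    plan.append(f"dump main-system memory to {choose_dump_target(res)}")        # A16
    return plan


if __name__ == "__main__":
    for action in capture_plan([0, 1], False, ShadowResources(has_nic=True), True):
        print(action)
```

Note that step A13 drops out of the plan when the main system has already hung, which is exactly the case the shadow system exists to handle.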
<Embodiment 2>
In Embodiment 2, the shadow system performs abnormality monitoring through the network card resources occupied by the main system, including the steps shown in FIG. 3:
Step B11: After the shadow system starts, it obtains the main system's network card information from the shared memory and periodically sends heartbeat request messages to the designated network card of the main system through the network card resource allocated to the shadow system. Because both systems share the same hardware environment, if the main system has network card resources, the shadow system can be allocated network card resources as well.
Step B12: The main system receives the heartbeat request message sent by the shadow system through its own network card and replies with the corresponding heartbeat response message.
Step B13: If the shadow system detects that a specified number of heartbeat response messages have been lost within a specified time, it determines that the main system is abnormal.
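The decision rule of step B13 (a specified number of lost heartbeat responses within a specified time) can be sketched as a sliding-window counter. The class name `HeartbeatMonitor` and its parameters `max_missed` and `window_s` are hypothetical; the patent does not prescribe how the losses are counted, and sending and receiving the messages over the network card is left out entirely.

```python
from collections import deque


class HeartbeatMonitor:
    """Shadow-side bookkeeping for lost heartbeat responses (illustrative only)."""

    def __init__(self, max_missed: int, window_s: float):
        self.max_missed = max_missed   # "specified number" of lost responses
        self.window_s = window_s       # "specified time" in seconds
        self._missed = deque()         # timestamps of lost responses

    def record(self, now: float, replied: bool) -> bool:
        """Record one heartbeat round; return True if the main system
        should be considered abnormal."""
        if not replied:
            self._missed.append(now)
        # Forget losses that fell out of the time window.
        while self._missed and now - self._missed[0] > self.window_s:
            self._missed.popleft()
        return len(self._missed) >= self.max_missed


if __name__ == "__main__":
    mon = HeartbeatMonitor(max_missed=3, window_s=10.0)
    for t, ok in [(0, True), (1, False), (2, False), (3, False)]:
        print(t, mon.record(float(t), ok))
```

A window-based count tolerates an occasional dropped message, so a single lost response does not trigger a capture. The same bookkeeping would apply if the heartbeat were carried by another resource.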
<Embodiment 3>
In Embodiment 3, the shadow system captures abnormalities through the CPU resources occupied by the main system, specifically including the steps shown in FIG. 4:
Step C11: After the shadow system starts, it reads from the shared memory the information about the CPU resources used by the main system.
Step C12: The shadow system periodically sends an inter-core interrupt to the CPUs used by the main system as the heartbeat request message. This step applies to multi-core CPU chips, where the shadow system and the main system can run on different CPUs.
Step C13: After the main system receives, via the shared memory, the inter-core interrupt sent by the shadow system, it replies to the CPU used by the shadow system with an inter-core interrupt response, i.e., the heartbeat response message. It should be noted that the CPU continues to run normally after handling the received inter-core interrupt; the interrupt is handled quickly and does not affect normal services.
Step C14: If the shadow system does not receive the main system's inter-core interrupt response within a specified time, it determines that the main system is abnormal.
In summary, in Embodiments 2 and 3 the shadow system learns through the shared memory which first hardware resource used by the main system supports heartbeat message detection, and establishes the abnormality detection mechanism on that basis. Likewise, the main system can monitor the shadow system for abnormalities in the reverse direction, which is not repeated here.
In addition, the shadow system can also perform abnormality monitoring based on the processes running on the main system. Specifically, the method for capturing a system abnormality applied to the main system further includes:
Step 17: The main system counts user-mode processes by means of a soft watchdog.
Step 18: The main system saves the watchdog count in the shared memory in real time.
Correspondingly, in step 21 above, the shadow system determines whether the main system is deadlocked according to whether the watchdog count in the shared memory is being updated. This monitoring mechanism is described below with reference to a specific embodiment.
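The watchdog-based check (steps 17 and 18, and the corresponding determination in step 21) can be sketched as a counter in shared memory that one side increments and the other side polls. The 32-bit counter format `COUNT_FMT`, the function `feed`, and the class `FeedWatcher` are hypothetical names chosen for this sketch, and a `bytearray` stands in for the shared memory.

```python
import struct

COUNT_FMT = "<I"  # assumption: one little-endian 32-bit counter


def feed(shm: bytearray, offset: int = 0) -> None:
    """Main-system side: bump the watchdog count in the shared memory."""
    (count,) = struct.unpack_from(COUNT_FMT, shm, offset)
    struct.pack_into(COUNT_FMT, shm, offset, (count + 1) & 0xFFFFFFFF)


class FeedWatcher:
    """Shadow-system side: detect a count that has stopped changing."""

    def __init__(self, shm: bytearray, timeout_s: float,
                 offset: int = 0, now: float = 0.0):
        self.shm = shm
        self.timeout_s = timeout_s
        self.offset = offset
        (self.last_count,) = struct.unpack_from(COUNT_FMT, shm, offset)
        self.last_change = now

    def poll(self, now: float) -> bool:
        """Return True if the count has not changed for longer than timeout_s."""
        (count,) = struct.unpack_from(COUNT_FMT, self.shm, self.offset)
        if count != self.last_count:
            self.last_count, self.last_change = count, now
            return False
        return now - self.last_change > self.timeout_s


if __name__ == "__main__":
    shm = bytearray(4)
    watcher = FeedWatcher(shm, timeout_s=3.0)
    feed(shm)
    print(watcher.poll(1.0), watcher.poll(4.5))
```

Comparing the count for change, rather than for growth, keeps the check valid when the counter wraps around.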
<Embodiment 4>
In Embodiment 4, the execution of user-mode processes serves as the condition for feeding the watchdog, specifically including the steps shown in FIG. 5:
Step D11: The main system starts the watchdog driver.
Step D12: The main system feeds the watchdog according to the user-mode processes being executed.
Step D13: After the watchdog driver emulated by the main system receives a feed request from a user-mode process, it writes a feed mark into a designated area of the shared memory (the mark can be implemented as a simple counter).
Step D14: The shadow system periodically queries the feed mark in the designated area of the shared memory.
Step D15: If the shadow system determines from the shared memory that the main system's feed mark has not been updated within a limited time, it considers the main system abnormal.
In Embodiment 4, high-priority user-mode processes may preferably serve as the feeding condition, to reduce resource consumption. Likewise, the main system can monitor the shadow system for abnormalities on the same principle.
In addition, the running state information of the main system may further include the main system's current CPU usage, memory usage, system logs, and so on. In summary, the method for capturing a system abnormality of the present invention has the following advantages:
1) Deployment is more convenient: it does not depend on additional hardware and can be used in environments without a serial port.
2) Fault information can be captured for analysis and fault location even under many kinds of main-system abnormality.
3) It provides technical support for automatic recovery after the main system becomes abnormal.
In addition, the present invention further provides a main system, running on a first hardware resource of a hardware environment, which, as shown in FIG. 6, includes:
a startup module, configured to start, on a second hardware resource of a hardware environment, a shadow system for performing abnormality detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs in the hardware environment;
a first saving module, configured to dynamically save the running state information of the main system in a shared memory, so that the shadow system obtains the running state information from the shared memory when it detects that the main system is abnormal;
a second saving module, configured to save the physical memory address of the main system in the shared memory, so that, when the shadow system detects that the main system is abnormal, it can access the main system's physical memory through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.
In addition, to make the monitoring more robust, the main system can in turn monitor the shadow system: when the shadow system becomes abnormal, the main system resets it so that the shadow system keeps working normally. That is, the main system further includes:
a shadow system monitoring module, configured to perform abnormality monitoring on the shadow system;
a reset module, configured to reset the shadow system when the shadow system monitoring module detects that the shadow system is abnormal.
On the basis of the above solution, the startup module includes:
a first loading submodule, configured to load the system kernel of the shadow system into the shadow system's physical memory;
a first configuration submodule, configured to configure the startup parameters of the shadow system's kernel according to the information about the second hardware resource;
a first running submodule, configured to make the CPU allocated to the shadow system jump to the shadow system's physical memory, so that this CPU runs the system kernel and starts the shadow system.
In addition, on the basis of the above embodiment, the main system further includes:
a third saving module, configured to save in the shared memory the information about the first hardware resource that supports heartbeat message detection, so that the shadow system can determine that resource from the shared memory and establish a heartbeat message detection mechanism with it, thereby monitoring the main system for abnormalities.
In addition, on the basis of the above embodiment, the main system further includes:
a watchdog module, configured to count user-mode processes by means of a soft watchdog;
a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can perform abnormality monitoring on the main system according to whether the count in the shared memory is being updated.
Clearly, the main system of this embodiment corresponds to the method for capturing a system abnormality applied to the main system, and the technical effects achievable by that method can likewise be achieved by the main system of this embodiment.
In addition, the present invention further provides a shadow system, running on a second hardware resource of a hardware environment, the second hardware resource being different from the first hardware resource of the hardware environment occupied by the main system. As shown in FIG. 7, the shadow system includes:
a main system monitoring module, configured to perform abnormality detection on the main system;
a first obtaining module, configured to obtain, when the main system monitoring module detects that the main system is abnormal, the main system's physical memory address and running state information from a shared memory, both of which were saved in the shared memory by the main system;
a second obtaining module, configured to access the main system's physical memory according to the physical memory address obtained by the first obtaining module and obtain information about the physical memory used by the main system;
a recording module, configured to record the information about the physical memory used by the main system together with the main system's running state information.
The second obtaining module includes:
a second loading submodule, configured to load a query kernel used to obtain information about the physical memory used by the main system;
a second configuration submodule, configured to configure the startup parameters of the query kernel according to the acquired physical memory address of the main system;
a second running submodule, configured to run the query kernel, access the main system's physical memory, and obtain information about the physical memory used by the main system.
Specifically, the shared memory includes information, saved by the main system, about the first hardware resource that supports heartbeat message detection; the main system monitoring module includes:
a third obtaining submodule, configured to obtain from the shared memory the information about the first hardware resource that supports heartbeat message detection and to determine which first hardware resource that is;
a first monitoring submodule, configured to establish a heartbeat message detection mechanism with the first hardware resource that supports heartbeat message detection, thereby monitoring the main system for abnormalities.
In the above description, the shadow system learns through the shared memory which first hardware resource used by the main system supports heartbeat message detection, and establishes the abnormality detection mechanism on that basis. Likewise, the main system can monitor the shadow system for abnormalities in the reverse direction, which is not repeated here.
Specifically, the shared memory further includes the count kept by the main system's soft watchdog for its user-mode processes; the main system monitoring module includes:
a second monitoring submodule, configured to perform abnormality monitoring on the main system according to whether the count in the shared memory is being updated.
Clearly, the shadow system of this embodiment corresponds to the method for capturing a system abnormality applied to the shadow system, and the technical effects achievable by that method can likewise be achieved by the shadow system of this embodiment.
In addition, the present invention further provides a smart device, including the main system and the shadow system provided by the present invention. The smart device may be a PC, a PAD, a mobile phone, or the like.
As shown in FIG. 8, the process by which the smart device starts the main system includes the following steps:
Step E11: The hardware powers on, and the BIOS performs the hardware self-test and scan.
Step E12: The main system is started using only the hardware resources allocated to it.
Step E13: During startup, the main system writes into the shared memory the information about the first hardware resource it uses that supports heartbeat message detection (for example, physical memory location and occupancy information, and CPU hardware information).
Step E14: During startup, the main system selects a physical memory region of a certain size to serve as the shared memory for communication between the main system and the shadow system.
Step E15: When the main system initializes, it reserves the shared memory so that the operating system and business programs cannot use it through conventional memory allocation, although it can still be accessed through a special interface when needed.
As shown in FIG. 9, the process by which the main system starts the shadow system includes the following steps:
Step F11: The main system loads the kernel image of the shadow system into the physical memory region allocated to the shadow system.
Step F12: The main system passes the information about the second hardware resource allocated to the shadow system to the shadow system's kernel as startup parameters, and makes the CPU allocated to the shadow system jump to the physical memory where that kernel was loaded; the shadow system then starts.
Step F13: The shadow system starts and initializes the second hardware resource allocated to it.
Step F14: The shadow system loads the search kernel, which collects information about the physical memory used by the main system, into a designated area of the shared memory.
In addition, in the embodiments of the present invention, the shared memory is divided into regions by purpose, specifically:
A static information area, which holds information that is not updated again after the main system and the shadow system have started. This area has a fixed size and does not wrap around. It is further divided into three parts: the first part stores the information about the first hardware resource occupied by the main system and the physical memory address used by the main system; the second part stores the image of the search kernel; the third part stores the feed mark information described above.
A dynamic information area, which holds the main system's dynamic running state information, such as current CPU usage, memory usage, and system logs. This area has a fixed size and wraps around: once it is full, newly written information overwrites the content starting from the beginning of the area.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make further improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the scope of protection of the present invention.
Industrial Applicability
The above technical solution provided by the present invention can be applied to capturing system abnormalities. With it, the main system and the shadow system can run independently within one hardware environment, and the shadow system can still capture the main system's abnormality information after the main system has crashed. This is of great significance for improving the maintainability of the main system.
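The wrap-around behavior of the dynamic information area described above (fixed size, with new data overwriting the start of the area once it is full) can be sketched as a simple circular byte buffer. The class name `DynamicArea` and its byte-at-a-time write are assumptions made for clarity; the patent does not specify record framing or how a reader finds the oldest entry.

```python
class DynamicArea:
    """Fixed-size region that wraps around when full (illustrative only)."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.buf = bytearray(size)
        self.pos = 0  # next write position

    def write(self, data: bytes) -> None:
        """Append data, overwriting from the start of the area once it is full."""
        for byte in data:
            self.buf[self.pos] = byte
            self.pos = (self.pos + 1) % len(self.buf)


if __name__ == "__main__":
    area = DynamicArea(8)
    area.write(b"ABCDEF")
    area.write(b"1234")
    print(bytes(area.buf), area.pos)
```

Because the size never changes, the area can be reserved once at startup, and the most recent state is always present when the shadow system reads it after a crash.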

Claims

1. A method for capturing a system abnormality, applied to a main system, comprising:
the main system starting, on a second hardware resource of a hardware environment, a shadow system for performing abnormality detection on the main system, the second hardware resource being different from the first hardware resource on which the main system runs in the hardware environment;
the main system dynamically saving its running state information in a shared memory, so that the shadow system obtains the running state information of the main system from the shared memory when it detects that the main system is abnormal;
the main system saving its physical memory address in the shared memory, so that, when the shadow system detects that the main system is abnormal, it can access the physical memory of the main system through the physical memory address in the shared memory and obtain information about the physical memory used by the main system.
2. The capture method according to claim 1, further comprising:
the main system performing abnormality monitoring on the shadow system;
when the main system detects that the shadow system is abnormal, the main system resetting the shadow system.
3. The capture method according to claim 1, wherein the main system starting, on the second hardware resource of the hardware environment, a shadow system for performing abnormality detection on the main system comprises:
the main system loading the system kernel of the shadow system into the physical memory of the shadow system;
the main system configuring the startup parameters of the system kernel of the shadow system according to the information about the second hardware resource;
the main system making the CPU allocated to the shadow system jump to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system.
4. The capture method according to claim 1, further comprising:
the main system saving, in the shared memory, information about the first hardware resource that supports heartbeat message detection, so that the shadow system can determine from the shared memory the first hardware resource that supports heartbeat message detection and establish a heartbeat message detection mechanism with that resource, thereby monitoring the main system for abnormalities.
5. The capture method according to claim 1, further comprising:
the main system counting user-mode processes by means of a soft watchdog;
the main system saving the watchdog count in the shared memory in real time, so that the shadow system can perform abnormality monitoring on the main system according to the update status of the count in the shared memory.
6. A method for capturing a system abnormality, applied to a shadow system, the shadow system running on a second hardware resource of a hardware environment, the second hardware resource being different from the first hardware resource on which a main system runs in the hardware environment, the method comprising:
the shadow system performing abnormality detection on the main system;
when the shadow system detects that the main system is abnormal, obtaining the physical memory address of the main system and the running state information of the main system from a shared memory, wherein the physical memory address and the running state information in the shared memory were saved by the main system;
the shadow system accessing the physical memory of the main system according to the physical memory address of the main system and obtaining information about the physical memory used by the main system;
the shadow system recording the information about the physical memory used by the main system and the running state information of the main system.
7. The capture method according to claim 6, wherein the step of the shadow system accessing the physical memory of the main system according to the physical memory address of the main system and obtaining information about the physical memory used by the main system comprises:
the shadow system loading a query kernel used to obtain the physical memory information of the main system;
the shadow system configuring the startup parameters of the query kernel according to the acquired physical memory address of the main system;
the shadow system running the query kernel, accessing the physical memory of the main system, and obtaining information about the physical memory used by the main system.
8. The capturing method according to claim 6, wherein the shared memory further includes information on the first physical resource saved by the main system, and the step of the shadow system performing anomaly detection on the main system comprises:
the shadow system obtaining, from the shared memory, information on the first hardware resource that supports heartbeat message detection, and determining the first hardware resource that supports heartbeat message detection; and
the shadow system establishing a heartbeat message detection mechanism with the first hardware resource that supports heartbeat message detection, thereby monitoring the main system for anomalies.
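The heartbeat mechanism of claim 8 might look like the following minimal sketch. The `Resource` record, the `interval_s` and `max_missed` thresholds, and the explicit `now` timestamps are assumptions made for illustration; the application does not fix a timeout policy.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical description of a first hardware resource in shared memory."""
    name: str
    supports_heartbeat: bool


def pick_heartbeat_resource(resources):
    """Return the first resource that supports heartbeat detection, or None."""
    for resource in resources:
        if resource.supports_heartbeat:
            return resource
    return None


class HeartbeatMonitor:
    """Flags the main system as abnormal after too many missed heartbeats."""

    def __init__(self, interval_s, max_missed=3, start=0.0):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.last_seen = start

    def on_heartbeat(self, now):
        # Called whenever a heartbeat message arrives from the main system.
        self.last_seen = now

    def is_abnormal(self, now):
        # Abnormal once the silence exceeds max_missed heartbeat intervals.
        return (now - self.last_seen) > self.interval_s * self.max_missed


if __name__ == "__main__":
    chosen = pick_heartbeat_resource([Resource("cpu0", False), Resource("nic0", True)])
    monitor = HeartbeatMonitor(interval_s=1.0)
    monitor.on_heartbeat(now=1.0)
    print(chosen.name, monitor.is_abnormal(now=4.5))
```

Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock.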
9. The capturing method according to claim 6, wherein the shared memory further includes a count maintained by the main system, through a soft watchdog, for its user-mode processes; and
the step of the shadow system performing anomaly detection on the main system comprises: the shadow system monitoring the main system for anomalies according to how the count in the shared memory is updated.
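One way to read "according to how the count is updated" in claim 9 is that the shadow system polls the count and treats a value that stops changing as a sign of trouble. The sketch below assumes that interpretation; `CounterWatcher` and the `max_stale_polls` threshold are hypothetical.

```python
class CounterWatcher:
    """Detects a stalled soft-watchdog count by polling its value."""

    def __init__(self, max_stale_polls=3):
        self.max_stale_polls = max_stale_polls
        self.last_value = None
        self.stale_polls = 0

    def poll(self, value):
        """Feed the current count; return True if the main system looks abnormal."""
        if value != self.last_value:
            # The count moved, so the user-mode processes are still being scheduled.
            self.last_value = value
            self.stale_polls = 0
        else:
            self.stale_polls += 1
        return self.stale_polls >= self.max_stale_polls


if __name__ == "__main__":
    watcher = CounterWatcher(max_stale_polls=2)
    for count in [1, 2, 2, 2, 3]:
        print(count, watcher.poll(count))
```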
10. A main system, running on a first hardware resource of a hardware environment, comprising:
a startup module, configured to start, on a second hardware resource of the hardware environment, a shadow system for performing anomaly detection on the main system, wherein the second hardware resource is different from the first hardware resource of the hardware environment on which the main system runs;
a first saving module, configured to dynamically save running state information of the main system in a shared memory, so that the shadow system, on detecting that the main system is abnormal, obtains the running state information of the main system from the shared memory; and
a second saving module, configured to save a physical memory address of the main system in the shared memory, so that the shadow system, on detecting that the main system is abnormal, can access the physical memory of the main system through the physical memory address in the shared memory and obtain information on the main system's use of physical memory.
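The two saving modules of claim 10 can be pictured as writing a small record into a byte buffer that the shadow system later parses. The layout below (a magic number, address, size, and a JSON-encoded state payload) is purely an assumption for illustration; the application does not define a shared memory format.

```python
import json
import struct

# Hypothetical layout: magic (u32), physical address (u64), size (u64),
# payload length (u32), followed by the running state as UTF-8 JSON.
HEADER = struct.Struct("<IQQI")
MAGIC = 0x53484457


def save(buf, phys_addr, phys_size, run_state):
    """Main-system side: write the address and running state into shared memory."""
    payload = json.dumps(run_state, sort_keys=True).encode("utf-8")
    if HEADER.size + len(payload) > len(buf):
        raise ValueError("shared memory region too small")
    HEADER.pack_into(buf, 0, MAGIC, phys_addr, phys_size, len(payload))
    buf[HEADER.size:HEADER.size + len(payload)] = payload


def load(buf):
    """Shadow-system side: read back the address, size, and running state."""
    magic, phys_addr, phys_size, length = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("shared memory not initialised")
    payload = bytes(buf[HEADER.size:HEADER.size + length])
    return phys_addr, phys_size, json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    region = bytearray(256)  # stand-in for the real shared memory region
    save(region, 0x1000, 0x2000, {"load": 3})
    print(load(region))
```

A real implementation would also need to guard against the shadow system reading a half-written record, which this sketch does not attempt.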
11. The main system according to claim 10, wherein the shared memory further includes running state information of the shadow system, and the main system comprises:
a shadow system monitoring module, configured to monitor the shadow system for anomalies; and
a reset module, configured to reset the shadow system when the first monitoring module detects that the shadow system is abnormal.
12. The main system according to claim 10, wherein the startup module comprises:
a first loading submodule, configured to load a kernel of the shadow system into the physical memory of the shadow system;
a first configuration submodule, configured to configure boot parameters of the system kernel of the shadow system according to information on the second hardware resource; and
a first running submodule, configured to make the CPU allocated to the shadow system jump to the physical memory of the shadow system, so that the CPU allocated to the shadow system runs the system kernel to start the shadow system.
13. The main system according to claim 10, further comprising:
a third saving module, configured to save, in the shared memory, information on the first hardware resource that supports heartbeat message detection, so that the shadow system can determine, according to the shared memory, the first hardware resource that supports heartbeat message detection and establish a heartbeat message detection mechanism with that first hardware resource, to monitor the main system for anomalies.
14. The main system according to claim 10, further comprising:
a watchdog module, configured to count user-mode processes through a soft watchdog; and
a fourth saving module, configured to save the watchdog count in the shared memory in real time, so that the shadow system can monitor the main system for anomalies according to how the count in the shared memory is updated.
15. A shadow system, running on a second hardware resource of a hardware environment, the second hardware resource being different from a first hardware resource of the hardware environment occupied by a main system, the shadow system comprising:
a main system monitoring module, configured to perform anomaly detection on the main system;
a first acquisition module, configured to obtain, when the main system monitoring module detects that the main system is abnormal, a physical memory address of the main system and running state information of the main system from a shared memory, wherein the physical memory address and the running state information in the shared memory are saved by the main system;
a second acquisition module, configured to access the physical memory of the main system according to the physical memory address of the main system obtained by the first acquisition module, to obtain information on the main system's use of physical memory; and
a recording module, configured to record the information on the main system's use of physical memory and the running state information of the main system.
16. The shadow system according to claim 15, wherein the second acquisition module comprises:
a second loading submodule, configured to load a query kernel used to obtain the information on the main system's use of physical memory;
a second configuration submodule, configured to configure boot parameters of the query kernel according to the obtained physical memory address of the main system; and
a second running submodule, configured to run the query kernel and access the physical memory of the main system, to obtain the information on the main system's use of physical memory.
17. The shadow system according to claim 15, wherein the shared memory includes information, saved by the main system, on the first hardware resource that supports heartbeat message detection, and the main system monitoring module comprises:
a third acquisition submodule, configured to obtain, from the shared memory, the information on the first hardware resource that supports heartbeat message detection, and to determine the first hardware resource that supports heartbeat message detection; and
a first monitoring submodule, configured to establish a heartbeat message detection mechanism with the first hardware resource that supports heartbeat message detection, thereby monitoring the main system for anomalies.
18. The shadow system according to claim 15, wherein the shared memory further includes a count maintained by the main system, through a soft watchdog, for its user-mode processes; and
the main system monitoring module comprises:
a second monitoring submodule, configured to monitor the main system for anomalies according to how the count in the shared memory is updated.
19. A smart device, comprising: the main system according to any one of claims 10 to 14 and the shadow system according to any one of claims 15 to 18.
PCT/CN2014/084439 2014-06-30 2014-08-14 System exception capturing method, main system, shadow system and intelligent device WO2016000298A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410307724.0A CN105204977A (en) 2014-06-30 2014-06-30 System exception capturing method, main system, shadow system and intelligent equipment
CN201410307724.0 2014-06-30

Publications (1)

Publication Number Publication Date
WO2016000298A1 (en) 2016-01-07

Family

ID=54952671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/084439 WO2016000298A1 (en) 2014-06-30 2014-08-14 System exception capturing method, main system, shadow system and intelligent device

Country Status (2)

Country Link
CN (1) CN105204977A (en)
WO (1) WO2016000298A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153453A (en) * 2016-03-04 2017-09-12 中兴通讯股份有限公司 A kind of linux system reset processing method and device
CN107783853A (en) * 2016-08-26 2018-03-09 中兴通讯股份有限公司 A kind of method and device that abnormal information is collected in the os starting stage
CN107276789B (en) * 2017-05-19 2020-12-01 太仓鸿羽智能科技有限公司 Log uploading method and device and computer readable storage medium
CN107729128B (en) * 2017-07-24 2019-09-24 深圳壹账通智能科技有限公司 Application data restoration methods, device, computer equipment and storage medium
CN110309008B (en) * 2018-03-20 2023-06-20 浙江宇视科技有限公司 Method and system for storing memory data in case of system abnormality
CN111338914A (en) * 2020-02-10 2020-06-26 华为技术有限公司 Fault notification method and related equipment
CN111745650B (en) * 2020-06-15 2021-10-15 哈工大机器人(合肥)国际创新研究院 Operation method of robot operation system and control method of robot
CN111858116B (en) 2020-06-19 2024-02-13 浪潮电子信息产业股份有限公司 Information recording method, device, equipment and readable storage medium
CN113434324A (en) * 2021-06-29 2021-09-24 苏州科达科技股份有限公司 Abnormal information acquisition method, system, device and storage medium
CN114691406B (en) * 2022-03-29 2025-09-05 深圳市广和通无线股份有限公司 Peripheral device interaction method, peripheral device, main controller and storage medium
CN115269264A (en) * 2022-06-14 2022-11-01 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Control system and method for acquiring main system memory data based on auxiliary system
CN115883823A (en) * 2022-11-10 2023-03-31 深圳创维-Rgb电子有限公司 Television anomaly analysis method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1553716A (en) * 2003-06-04 2004-12-08 中兴通讯股份有限公司 Clustering system for utilizing sharing internal memory in mobile communiation system and realizing method thereof
WO2005084040A1 (en) * 2004-02-27 2005-09-09 Utstarcom (China) Co., Ltd. A method and a system of double engines sharing memory
CN101055538A (en) * 2006-04-12 2007-10-17 国际商业机器公司 System and method for application fault tolerance and recovery
CN101800730A (en) * 2009-02-09 2010-08-11 国际商业机器公司 Safety enhanced virtual machine communication method and virtual machine system
CN102141931A (en) * 2011-03-15 2011-08-03 华为技术有限公司 Virtual machine establishing method, virtual machine monitor and virtual machine system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100472471C (en) * 2006-02-22 2009-03-25 联想(北京)有限公司 A system and method for acquiring computer operating system fault site information
CN101452420B (en) * 2008-12-30 2013-01-09 中兴通讯股份有限公司 Embedded software abnormal monitoring and handling arrangement and method thereof
CN102073572B (en) * 2009-11-24 2015-10-21 中兴通讯股份有限公司 For method for supervising and the system of polycaryon processor

Also Published As

Publication number Publication date
CN105204977A (en) 2015-12-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14896750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14896750

Country of ref document: EP

Kind code of ref document: A1