CN115061776A - Processing method of virtual machine exception, electronic device and storage medium - Google Patents
- Publication number
- CN115061776A (application number CN202210614391.0A)
- Authority
- CN
- China
- Prior art keywords
- virtual machine
- processing unit
- central processing
- preset
- time threshold
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45575—Starting, stopping, suspending or resuming virtual machine instances
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45591—Monitoring or debugging support
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of cloud computing, and in particular to a method for processing a virtual machine exception, an electronic device, and a storage medium.
Background
As the cloud computing market booms, cloud computing technology is attracting growing attention. Virtualization technology offers significant advantages, such as improving the utilization of network resources and effectively protecting the network environment, and has become an indispensable foundation for the development of cloud computing technology.
When a virtual machine executes certain microcode, the event window is closed, and during that time the central processing unit (CPU) cannot respond to certain events. These events include ordinary interrupts, non-maskable interrupts (NMI), and system management interrupts (SMI). If an exception occurs while the virtual machine is executing microcode, the CPU on which it runs may be unable to respond to any event for a long time, which causes that CPU to hang. The hung CPU also cannot respond to notification messages sent by other CPUs, so the other CPUs keep sending messages to it, and eventually the entire host machine hangs.
When a virtual machine hangs while executing certain microcode, the traditional solution is to add handling instructions for that microcode so that the virtual machine avoids the problem when it executes the same microcode again. However, this solution lags significantly behind the problem: while the fix is being prepared, the virtual machine may still be unusable.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide a method for processing a virtual machine exception, an electronic device, and a storage medium, so as to shorten the time required to handle a virtual machine exception and to improve the reliability and robustness of the host machine.
To achieve the above purpose, an embodiment of the present invention provides a method for processing a virtual machine exception, including: a CPU reads a preset time threshold stored in a host machine and monitors, in real time, the time for which the CPU runs in non-root mode; when the running time reaches the preset time threshold, the CPU generates a preset virtual machine exit process, which triggers the virtual machine to suspend operation; and the host machine collects relevant information of the virtual machine and reclaims the resources occupied by the virtual machine, where the resources occupied by the virtual machine include the CPU resources occupied by the virtual machine.
An embodiment of the present invention further provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above method for processing a virtual machine exception.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above method for processing a virtual machine exception.
In the embodiments of the present invention, the CPU first reads the preset time threshold stored in the host machine and monitors, in real time, the time for which it runs in non-root mode, which amounts to monitoring the running time of the virtual machine in real time. When that running time reaches the preset time threshold, the virtual machine has run too long without exiting normally and may have hung, so the CPU generates the preset virtual machine exit process to suspend the virtual machine. Compared with the traditional approach of diagnosing a virtual machine exception after it occurs and then handling it case by case, this detects abnormal operation sooner and shortens the time needed to handle the exception. The host machine then collects the relevant information of the virtual machine and reclaims the resources it occupied, so that the CPU occupied by the virtual machine can again respond to communication messages from the other CPUs on the same host machine. This avoids serious problems, such as a host machine hang caused by other CPUs repeatedly sending communication requests to the abnormal CPU, and effectively improves the reliability and robustness of the host machine.
Brief Description of the Drawings
One or more embodiments are illustrated by the figures in the corresponding drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures are not drawn to scale.
FIG. 1 is a schematic flowchart of a method for processing a virtual machine exception according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of virtual machine operation according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the drawings. Those of ordinary skill in the art will appreciate that many technical details are set forth in the embodiments to help the reader better understand the present application. The technical solutions claimed in the present application can nevertheless be realized without these technical details and with various changes and modifications based on the following embodiments.
For virtual machine hangs caused by executing certain microcode, the traditional solution is that, once the defect is discovered, operations staff add corresponding handling instructions for that microcode. When the virtual machine later executes the microcode, it performs the handling actions that the operations staff added and thereby avoids hanging. This way of solving virtual machine hangs clearly lags behind the problem: the whole fix may take a long time, during which normal use of the virtual machine is still affected.
One embodiment of the present invention relates to a method for processing a virtual machine exception. In this embodiment, the CPU reads the preset time threshold stored in the host machine and monitors, in real time, the time for which the CPU runs in non-root mode; when the running time reaches the preset time threshold, the CPU generates a preset virtual machine exit process, which triggers the virtual machine to suspend operation; and the host machine collects relevant information of the virtual machine and reclaims the resources occupied by the virtual machine, where the resources occupied by the virtual machine include the CPU resources occupied by the virtual machine.
The implementation details of the method in this embodiment are described below. They are provided only to aid understanding and are not required to implement the solution. The specific flow is shown in FIG. 1 and may include the following steps:
Step 101: the CPU reads the preset time threshold stored in the host machine and monitors, in real time, the time for which the CPU runs in non-root mode.
Generally, the CPU runs in non-root mode while the virtual machine is running and exits non-root mode when the virtual machine stops running. Monitoring in real time how long the CPU runs in non-root mode is therefore equivalent to monitoring the running time of the virtual machine in real time, which allows timely intervention when the virtual machine runs for too long.
The CPU reading the preset time threshold stored in the host machine may specifically be: when switching to non-root mode, the CPU reads the preset time threshold, stored in the host machine, that corresponds to the virtual machine currently running. When multiple virtual machines exist on the host machine, the CPU can thus bound the running time of each virtual machine with that virtual machine's own preset time threshold.
It is also worth mentioning that different preset time thresholds can be set in advance for different virtual machines running on the host machine, so that abnormal operation is judged independently for each virtual machine against its own threshold.
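The per-VM lookup described above can be sketched as follows. This is a minimal illustration that assumes the host keeps thresholds in a mapping keyed by a VM identifier; `ThresholdStore`, `set_threshold`, `on_enter_non_root`, and the tick unit are hypothetical names and are not taken from the patent.

```python
class ThresholdStore:
    """Hypothetical host-side table: one preset time threshold per virtual machine."""

    def __init__(self) -> None:
        self._thresholds: dict[str, int] = {}

    def set_threshold(self, vm_id: str, threshold_ticks: int) -> None:
        self._thresholds[vm_id] = threshold_ticks

    def on_enter_non_root(self, vm_id: str) -> int:
        # Called when the CPU switches to non-root mode to run `vm_id`;
        # returns the threshold that bounds this VM's run time.
        return self._thresholds[vm_id]


store = ThresholdStore()
store.set_threshold("vm-a", 1_000)
store.set_threshold("vm-b", 50_000)
```

Because each lookup is keyed by the VM about to run, changing the threshold of one VM leaves the others untouched.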
In one example, before the CPU reads the preset time threshold stored in the host machine, the method may further include: the host machine receives, through a time configuration interface in a user-mode program, the preset time threshold set by the user via that program. In this example, the user can set the virtual machine's preset time threshold through the host machine's user-mode program, so that the threshold matches the actual operating requirements of the virtual machine.
The host machine receiving the preset time threshold through the time configuration interface may specifically include: first checking the preset time threshold set by the user and determining whether it is greater than 0; and, if it is greater than 0, storing the preset time threshold.
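The check above is small enough to show directly. The following is a sketch only: the function name `accept_threshold` and the use of a plain dictionary as the host-side store are assumptions made for illustration.

```python
def accept_threshold(store: dict[str, int], vm_id: str, value: int) -> bool:
    """Store `value` for `vm_id` only when it is greater than 0."""
    if value <= 0:
        # Rejected: nothing is stored for a non-positive threshold.
        return False
    store[vm_id] = value
    return True
```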
It is worth mentioning that, to better implement the method provided by the present invention and to handle the case in which the CPU cannot exit non-root mode normally when the virtual machine runs abnormally, the method may further include, before the CPU reads the preset time threshold stored in the host machine: detecting whether the CPU has a mechanism for periodically exiting non-root mode. If the CPU has such a mechanism, the host machine stores the preset time threshold for the CPU to read.
In a specific embodiment in which the virtual machine is a KVM virtual machine, after the KVM kernel module receives the preset time threshold, it checks whether the current CPU supports periodically exiting non-root mode. If not, it reports an error directly. If so, it writes the value of the preset time threshold into the virtual machine control structure (VMCS) that corresponds to the KVM virtual machine on the host machine (specifically, the value may be written into the NOTIFY_WINDOW field of the VMCS) and then activates the CPU's periodic exit function.
Because each virtual machine's preset time threshold is stored in its own VMCS, exceptions are judged independently for each virtual machine. A hang in any one of several virtual machines does not affect the normal operation of the others.
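The KVM flow above (check support, fail fast, otherwise write the threshold and enable the feature) can be modeled in a few lines. This is a toy simulation, not kernel code: the `Vmcs` class is a stand-in that does not reflect the real VMCS layout, and `configure_notify_window`, `NotifyExitUnsupported`, and the `cpu_supports_periodic_exit` flag are hypothetical. Only the field name `NOTIFY_WINDOW` comes from the text.

```python
class NotifyExitUnsupported(RuntimeError):
    """Raised when the CPU cannot periodically force an exit from non-root mode."""


class Vmcs:
    """Toy stand-in for one VM's control structure; not a real VMCS layout."""

    def __init__(self) -> None:
        self.fields: dict[str, int] = {}
        self.periodic_exit_enabled = False


def configure_notify_window(vmcs: Vmcs, threshold: int,
                            cpu_supports_periodic_exit: bool) -> None:
    if not cpu_supports_periodic_exit:
        # The text says the module reports an error directly in this case.
        raise NotifyExitUnsupported("CPU lacks periodic non-root exit support")
    vmcs.fields["NOTIFY_WINDOW"] = threshold  # per-VM storage of the threshold
    vmcs.periodic_exit_enabled = True         # activate the periodic exit function
```

Keeping one `Vmcs` object per VM mirrors the independence argument: configuring or hanging one VM never touches another VM's fields.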
In addition, to prevent a virtual machine from being wrongly reclaimed while running normally because the preset time threshold was set too small, a minimum value for the preset time threshold may also be set in advance. After receiving the preset time threshold set by the user through the user-mode program, the method may further include: if the received threshold is smaller than the preset minimum, the host machine sends the user a reminder to set it again, or updates the preset time threshold to the minimum value.
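Both options for handling a too-small threshold can be sketched in one helper. The constant `MIN_THRESHOLD`, its value, and the `clamp` switch are assumptions for illustration; the patent gives no concrete minimum and does not say how the reminder is delivered, so an exception stands in for it here.

```python
MIN_THRESHOLD = 100  # hypothetical minimum, in arbitrary ticks


def apply_minimum(requested: int, clamp: bool = True) -> int:
    """Return the threshold to store, enforcing the preset minimum."""
    if requested >= MIN_THRESHOLD:
        return requested
    if clamp:
        # Second option in the text: raise the value to the minimum.
        return MIN_THRESHOLD
    # First option in the text: ask the user to set the value again.
    raise ValueError(f"threshold {requested} is below the minimum {MIN_THRESHOLD}")
```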
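Step 102: when the running time reaches the preset time threshold, the CPU generates the preset virtual machine exit process, which triggers the virtual machine to suspend operation.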
Generally, a CPU exits non-root mode on its own after running in it for some time. If the time the CPU has spent in non-root mode reaches the preset time threshold, the virtual machine has run too long without exiting normally and may have hung. In this step, the CPU therefore generates the preset virtual machine exit process, which stops the virtual machine promptly so that the subsequent steps can be carried out.
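The decision in this step can be modeled with abstract ticks. In the sketch below, `run_until_exit` and the labels `"normal"` and `"preset"` are hypothetical; the model assumes that reaching the threshold takes precedence when both events would fall on the same tick.

```python
from typing import Optional


def run_until_exit(voluntary_exit_at: Optional[int], threshold: int) -> tuple[str, int]:
    """Model one stay in non-root mode, measured in abstract ticks.

    `voluntary_exit_at` is the tick at which the guest would exit on its own,
    or None if it never does (a hung guest).
    """
    if voluntary_exit_at is not None and voluntary_exit_at < threshold:
        return ("normal", voluntary_exit_at)
    # The run time reached the threshold first: the preset exit is generated.
    return ("preset", threshold)
```

A guest that exits at tick 3 with a threshold of 10 yields a normal exit, while a guest that never exits is stopped at tick 10.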
It is worth mentioning that, after the preset virtual machine exit process is generated, the method may further include: the CPU generates virtual machine exception alarm information on the host machine. The alarm reminds the host machine's operations staff to protect the host machine promptly, ensuring that it keeps running safely when a virtual machine hangs.
In addition, to shorten the virtual machine's downtime, the method may further include, after the resources occupied by the virtual machine are reclaimed: the CPU triggers a restart of the virtual machine. The restart referred to here may be an automatic restart with the original configuration, according to the user's needs.
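Step 103: the host machine collects relevant information of the virtual machine and reclaims the resources occupied by the virtual machine, where the resources occupied by the virtual machine include the CPU resources occupied by the virtual machine.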
Because the preset virtual machine exit process differs from an ordinary virtual machine exit process, the kernel module that controls the virtual machine on the host machine catches the event when the CPU generates it. So that the resources occupied by the virtual machine can be released and reused, the host machine reclaims them. Besides CPU resources, the resources occupied by the virtual machine may also include memory, disks, and other passthrough devices.
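Reclamation amounts to returning everything the VM held to the host's free pool. The bookkeeping below is a hypothetical sketch: `VmResources`, `HostPool`, and their fields are illustrative names that cover the four resource kinds named in the text and do not correspond to any real hypervisor interface.

```python
from dataclasses import dataclass, field


@dataclass
class VmResources:
    """Resources held by one VM (hypothetical bookkeeping, not a real API)."""
    cpus: set[int] = field(default_factory=set)
    memory_mb: int = 0
    disks: set[str] = field(default_factory=set)
    passthrough_devices: set[str] = field(default_factory=set)


@dataclass
class HostPool:
    """Resources currently free on the host."""
    cpus: set[int] = field(default_factory=set)
    memory_mb: int = 0
    disks: set[str] = field(default_factory=set)
    passthrough_devices: set[str] = field(default_factory=set)

    def reclaim(self, vm: VmResources) -> None:
        # Return everything the VM held to the free pool...
        self.cpus |= vm.cpus
        self.memory_mb += vm.memory_mb
        self.disks |= vm.disks
        self.passthrough_devices |= vm.passthrough_devices
        # ...then clear the VM's own record so nothing is counted twice.
        vm.cpus.clear()
        vm.memory_mb = 0
        vm.disks.clear()
        vm.passthrough_devices.clear()
```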
The relevant information of the virtual machine in this step may include the software version of the virtual machine, the virtual machine control structure (VMCS), and virtual CPU (VCPU) information. This embodiment does not restrict which information the host machine collects; it may include any other information related to the virtual machine and to running it.
In addition, after the relevant information of the virtual machine is collected, the information files can be packaged and stored at a fixed location on the host machine so that technicians can review them easily.
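Packaging the collected information can be sketched with the Python standard library. The function name `pack_vm_info`, the JSON encoding, and the archive naming scheme are assumptions; the patent only says that the files are packaged and stored at a fixed location.

```python
import io
import json
import tarfile
from pathlib import Path


def pack_vm_info(info: dict[str, object], dest_dir: Path, vm_id: str) -> Path:
    """Write each entry of `info` as a JSON member of <dest_dir>/<vm_id>.tar.gz."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{vm_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for name, value in info.items():
            payload = json.dumps(value, indent=2).encode("utf-8")
            member = tarfile.TarInfo(name=f"{name}.json")
            member.size = len(payload)  # size must be set before addfile()
            tar.addfile(member, io.BytesIO(payload))
    return archive
```

A caller might pass entries such as `"software_version"`, `"vmcs"`, and `"vcpu"`, matching the kinds of information listed above.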
In this embodiment, the CPU first reads the preset time threshold stored in the host machine and monitors, in real time, the time for which it runs in non-root mode, which amounts to monitoring the running time of the virtual machine in real time. When that running time reaches the preset time threshold, the virtual machine has run too long without exiting normally and may have hung, so the CPU generates the preset virtual machine exit process to suspend the virtual machine. Compared with the traditional approach of diagnosing a virtual machine exception after it occurs and then handling it case by case, this detects abnormal operation sooner and shortens the time needed to handle the exception. The host machine then collects the relevant information of the virtual machine and reclaims the resources it occupied, so that the CPU occupied by the virtual machine can again respond to communication messages from the other CPUs on the same host machine. This avoids serious problems, such as a host machine hang caused by other CPUs repeatedly sending communication requests to the abnormal CPU, and effectively improves the reliability and robustness of the host machine.
Another embodiment of the present invention relates to a method for processing a virtual machine exception. The main difference from the previous embodiment is that, in this embodiment, when the virtual machine suspends operation, a judgment program is also used to determine whether the virtual machine exit process is the preset virtual machine exit process.
The technical details disclosed in the previous embodiment remain valid in this embodiment and are not repeated here.
The implementation details of the method in this embodiment are described below. They are provided only to aid understanding and are not required to implement the solution. The flow of virtual machine operation in this embodiment may be as shown in FIG. 2.
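Step 201: the virtual machine starts running. The virtual machine in this embodiment is configured in advance with a corresponding preset time threshold that limits its running time.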
Step 202: the timer starts, and the running time of the virtual machine is monitored in real time through the timer's elapsed time.
In this embodiment, a timer is used to monitor the running time of the virtual machine in real time. Because that monitoring can be achieved by monitoring how long the CPU runs in non-root mode, this step may specifically be: when the CPU switches to non-root mode, it triggers the timer to start, and the timer's elapsed time is used to monitor, in real time, how long the CPU has been running in non-root mode.
Step 203: the virtual machine suspends operation.
Step 204: determine whether the virtual machine exit process is the preset virtual machine exit process.
A virtual machine exit process is generated in many situations and suspends the virtual machine. Normal operations, such as accessing privileged instructions or virtual hardware, may also generate one. Reclaiming a virtual machine's resources after a normal exit would waste processing resources and disrupt the virtual machine's subsequent operation. To ensure that the exception handling steps run only when the preset virtual machine exit process occurs, it is therefore necessary, after the virtual machine is suspended, to determine whether the current exit process is the preset one.
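Step 205: the host machine collects relevant information of the virtual machine and reclaims the resources occupied by the virtual machine.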
When the virtual machine exit process is the preset one, the host machine collects the relevant information of the virtual machine and reclaims the resources it occupied, so that those resources are released. The CPU occupied by the virtual machine can then again respond to communication messages from the other CPUs on the same host machine. This avoids serious problems, such as a host machine hang caused by other CPUs repeatedly sending communication requests to the abnormal CPU, and effectively improves the reliability and robustness of the host machine.
Step 206: reset the timer to zero.
When the virtual machine exit process is not the preset one, no exception occurred in the virtual machine during this period, and no exception handling is needed. Instead, the timer is restored, that is, its elapsed time is reset to zero, so that it starts counting afresh the next time the processor enters non-root mode. This avoids a timing error that would wrongly trigger the preset virtual machine exit process and disrupt the normal operation of the virtual machine.
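The branch taken after the virtual machine is suspended (steps 204 to 206) can be sketched as a small dispatcher. The names `VmRun`, `handle_exit`, and `PRESET_EXIT`, and the string labels for exit reasons, are hypothetical; a real implementation would inspect the exit reason that the hardware reports.

```python
from dataclasses import dataclass, field

PRESET_EXIT = "preset"  # hypothetical label for the preset exit


@dataclass
class VmRun:
    timer_ticks: int = 0
    reclaimed: bool = False
    collected: list[str] = field(default_factory=list)


def handle_exit(run: VmRun, reason: str) -> str:
    """Steps 204 to 206: branch on the exit reason after the VM is suspended."""
    if reason == PRESET_EXIT:
        # Step 205: collect information, then reclaim the VM's resources.
        run.collected.append("vm-info")
        run.reclaimed = True
        return "reclaimed"
    # Step 206: a normal exit, so only the timer is reset.
    run.timer_ticks = 0
    return "timer-reset"
```

Any reason other than the preset one leaves the VM intact and only clears the timer, which is why normal exits do not disturb the VM.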
As can be seen, implementing the method provided by the present invention does not change the flow of steps when the virtual machine runs normally. Tools and management software related to running the virtual machine therefore need no modification, which gives the electronic device that implements the method good compatibility. Moreover, because the preset virtual machine exit process is generated by the CPU, it does not interfere with the virtual machine's runtime performance.
In addition, the steps of the above method are loosely coupled, so the steps that follow the CPU's generation of the preset virtual machine exit process can be flexibly customized to the actual needs of different scenarios.
Besides the virtual machine hangs caused by microcode exceptions mentioned above, the method provided by the present invention also applies to other hangs, such as those caused by other virtual machines occupying CPU resources for a long time. The method requires no modification of the virtual machine system and can improve the availability and robustness of the cloud platform without the user noticing.
The division of the above methods into steps is only for clarity of description. In an implementation, steps may be merged into one step, or a step may be split into several, and as long as the same logical relationship is preserved, the result falls within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or flow without changing its core design also falls within the protection scope of this patent.
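Another embodiment of the present invention relates to a server which, as shown in FIG. 3, comprises at least one processor 301 and a memory 302 communicatively connected to the at least one processor 301. The memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301 so that the at least one processor 301 can perform the above method for processing a virtual machine exception.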
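The memory 302 and the processor 301 are connected by a bus. The bus may include any number of interconnected buses and bridges, and it connects the various circuits of the one or more processors 301 and the memory 302 together. The bus may also connect various other circuits, such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 301 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 301.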
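The processor 301 is responsible for managing the bus and general processing, and may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory 302 may be used to store data used by the processor 301 when performing operations.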
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above embodiments of the method for processing a virtual machine exception.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes a number of instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to perform all or some of the steps of the methods in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are provided to enable those of ordinary skill in the art to implement and use the present invention. Those of ordinary skill in the art may make various modifications or changes to the above embodiments without departing from the inventive concept of the present invention. The protection scope of the present invention is therefore not limited by the above embodiments, but should accord with the broadest scope of the innovative features recited in the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210614391.0A CN115061776A (en) | 2022-05-31 | 2022-05-31 | Processing method of virtual machine exception, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115061776A (en) | 2022-09-16 |
Family
ID=83198087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210614391.0A Pending CN115061776A (en) | 2022-05-31 | 2022-05-31 | Processing method of virtual machine exception, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115061776A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160117190A1 (en) * | 2014-10-28 | 2016-04-28 | Intel Corporation | Virtual Processor Direct Interrupt Delivery Mechanism |
CN105993004A (en) * | 2014-07-21 | 2016-10-05 | 上海兆芯集成电路有限公司 | Address translation cache that supports simultaneous invalidation of common context entries |
CN114489941A (en) * | 2022-01-19 | 2022-05-13 | 上海交通大学 | Virtual machine management method and system running in host mode user mode |
Non-Patent Citations (1)
Title |
---|
Huang Xiao et al.: "Secure and efficient kernel monitoring model based on hardware virtualization", Journal of Software (《软件学报》), vol. 2, no. 27, 29 February 2016 (2016-02-29), pages 481 - 494 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3893114B1 (en) | Fault processing method, related device, and computer storage medium | |
CN114328102B (en) | Equipment status monitoring method, device, equipment and computer-readable storage medium | |
US10585755B2 (en) | Electronic apparatus and method for restarting a central processing unit (CPU) in response to detecting an abnormality | |
US11526411B2 (en) | System and method for improving detection and capture of a host system catastrophic failure | |
JP3910554B2 (en) | Method, computer program, and data processing system for handling errors or events in a logical partition data processing system | |
US7318171B2 (en) | Policy-based response to system errors occurring during OS runtime | |
US20240126593A1 (en) | User-mode interrupt request processing method and apparatus | |
US7877643B2 (en) | Method, system, and product for providing extended error handling capability in host bridges | |
CN113821257B (en) | Method and device for inquiring information of processor kernel call stack | |
CN111026573A (en) | Watchdog system of multi-core processing system and control method | |
CN109062718B (en) | Server and data processing method | |
US8230446B2 (en) | Providing a computing system with real-time capabilities | |
CN114490276A (en) | Peripheral equipment abnormity monitoring method, device and system and storage medium | |
US5983359A (en) | Processor fault recovering method for information processing system | |
US20230214245A1 (en) | Online Migration Method and System for Bare Metal Server | |
US20160292108A1 (en) | Information processing device, control program for information processing device, and control method for information processing device | |
CN114115703A (en) | Bare metal server online migration method and system | |
CN114356708A (en) | A device fault monitoring method, device, device and readable storage medium | |
CN118708396A (en) | Error information processing method, device, medium and program product | |
US7260752B2 (en) | Method and apparatus for responding to critical abstracted platform events in a data processing system | |
CN1581079B (en) | Automatic restarting method and system for down of network server | |
CN115576734B (en) | A multi-core heterogeneous log storage method and system | |
CN115061776A (en) | Processing method of virtual machine exception, electronic device and storage medium | |
CN116627702A (en) | Method and device for restarting virtual machine in downtime | |
EP2691853B1 (en) | Supervisor system resuming control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||