CN104102528A

CN104102528A - Quick performance restoration method of virtual machine monitor

Info

Publication number: CN104102528A
Application number: CN201410334442.XA
Authority: CN
Inventors: Xu Jian (徐建); Fei Wei (费薇); Li Tao (李涛); Zhang Hong (张宏); Zhang Kun (张琨); Zhong Yi (衷宜); Chen Long (陈龙)
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2014-07-14
Filing date: 2014-07-14
Publication date: 2014-10-15

Abstract

The invention provides a fast performance recovery method for a virtual machine monitor (VMM), comprising the following steps: step 1, memory-based VM state passivation; step 2, VMM micro-restart; step 3, VM activation. The method quickly clears the internal state of the VMM, reduces the impact on other VMs of frequent disk access and heavy bandwidth consumption, and prevents the restarted system from crashing again because of inconsistent data.

Description

Fast performance recovery method for a virtual machine monitor

Technical field

The invention belongs to the technical field of performance recovery, and in particular relates to a fast performance recovery method for a virtual machine monitor.

Background

Software aging (software decay) refers to the progressive state degradation or performance degradation of a software system that runs continuously for a long time, eventually leading to a system crash. It typically manifests as slow leakage of system resources, unreleased file locks and data corruption. Over time, erroneous system states accumulate, causing performance degradation or sudden failure.

In recent years, to meet the demands of complex Internet-oriented applications for large-scale computing power, massive data processing and information services, academia and industry have turned to virtualization technology to make resources in heterogeneous environments shareable, manageable and cooperative, and to support the large-scale deployment, migration, operation and maintenance of applications. Virtualization introduces a virtual layer between software and hardware that shields the dynamics, distribution and heterogeneity of the hardware platform, supports the sharing and multiplexing of hardware resources, provides each ordinary user with an independent, isolated computing environment, and gives administrators centralized management of hardware and software resources. Referring to Figure 1, the virtual machine monitor (VMM) is the middle-layer software that implements virtualization; commonly used VMMs include Xen and VMware. The independent execution environment that the VMM provides to applications is called a virtual machine (VM); for example, a Linux system and its applications running in VMware constitute a VM. The operating system inside a VM is called the guest operating system (GOS), and the VMM is called the host of the VM. Each VMM can host multiple VMs, which are isolated from one another. However, all software inevitably experiences software aging while it runs, and the VMM is no exception. Because the VMM hosts the VMs, performing performance recovery on the VMM loses the state of the VMs, which causes the services they provide to fail and lengthens the downtime.

To cope with system performance degradation, or even failure, caused by software aging, researchers have proposed many methods for restoring system performance, such as Restart, Microreboot, Checkpointing, Rio, RootHammer, Otherworld and Recovery Box, which improve system performance and increase system reliability and availability.

Restart (B. Randell, System Structure for Software Fault Tolerance, IEEE Transactions on Software Engineering, SE-1(2): 220-232, 1975) is the simplest and most effective means of performance recovery and is commonly applied to restarting operating systems and processes. Operating system restart is the most common way to resolve operating system runtime problems. However, it involves shutting down the running system and reloading the operating system kernel, which takes considerable time and terminates running application processes unexpectedly, causing data loss. In the present invention, the VMM plays the role that the operating system plays in a traditional computing system, so applying restart directly to the VMM would lose the state of the VMs.

Microreboot (G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox, Microreboot: A Technique for Cheap Recovery, 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA, pp. 31-44, 2004) is a recursive restart technique. Its core idea is to restart fine-grained software components first; if performance is not restored, coarser-grained components are restarted, and so on, until performance is restored. The present invention considers only the VMM as needing recovery and assumes that the VMs it hosts are robust, so recursive restart is unnecessary; however, the VM state must not be lost when the VMM restarts.

With Restart and Microreboot, the system returns from a failed state to a healthy state after restarting. However, because a large amount of data and state is lost, many applications and services are interrupted and performance drops sharply.

Checkpointing and the Rio file cache were proposed to prevent the state of hosted components from being lost while the software component hosting them is recovered. When Checkpointing (S. Feldman and C. Brown, IGOR: A System for Program Debugging via Reversible Execution, Proceedings of the Workshop on Parallel and Distributed Debugging, pp. 112-123, 1989) is applied to recovery, the state of the hosted components is saved before the component being recovered is restarted and is reloaded afterwards, so that no component state is lost. The Rio file cache (P. Chen, W. Ng, S. Chandra, C. Aycock, G. Rajamani, and D. Lowell, The Rio File Cache: Surviving Operating System Crashes, Proc. Int'l Conf. ASPLOS, 1996, pp. 74-83) preserves the old file cache: when the system crashes, Rio writes the dirty file cache to disk and prevents the restart from modifying or corrupting any file. Rio depends on the operating system and the hardware, and some of its operations cannot be supported by the hardware, whereas the method of the present invention depends only on the VMM and is hardware-independent. In addition, Rio targets system reliability and does little to improve system performance after a restart.

Both Checkpointing and the Rio file cache save component state to external storage, which incurs extra time overhead, so researchers have considered memory-based state preservation, such as Otherworld (M. Le, A. Gallagher, and Y. Tamir, Challenges and Opportunities with Fault Injection in Virtualized Systems, First International Workshop on Virtualization Performance: Analysis, Characterization and Tools, Austin, TX, April 2008). Otherworld lets the Linux kernel recover from a failed state while preserving the state of running processes. The new kernel boots in a reserved memory region, so the contents of the failed system's memory are preserved. In Otherworld, a thread is restored by creating a new process descriptor and copying the original values out of the old memory region; however, the repaired kernel components must pass many complex data structures, which is very likely to corrupt the kernel. The present invention also preserves VM state in memory, but it makes full use of the memory already allocated to the original VMs: the P2M mapping table makes it easy to identify modified memory pages and thus narrow the set of pages that must be preserved, and very little additional memory is required.

Recovery Box (M. Baker and M. Sullivan, The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment, Proceedings of the Summer USENIX Conference, pp. 31-44, 1992) saves only the state of the operating system and applications, keeping it in unused memory and reloading it after recovery. However, it also requires hardware support so that memory contents survive the restart. The method of the present invention depends only on the VMM, is hardware-independent, and can save the complete VM state.

Summary of the invention

The object of the present invention is to provide a fast performance recovery method for a virtual machine monitor that combines VM passivation and activation with VMM micro-restart.

The technical solution that achieves this object is a fast performance recovery method for a virtual machine monitor, comprising the following steps:

Step 1, memory-based VM state passivation, specifically:

Before VMM performance recovery is performed, all state of each VM in the VMM is saved to a reserved memory block;

Step 2, VMM micro-restart, specifically:

The heap information, static data segment and memory page information of the VMM before the restart are saved, the original kernel image is loaded to restart the system, and the consistency of state before and after the restart is ensured;

Step 3, VM activation, specifically:

A new VM is created according to the configuration information of each VM, and all state of that VM saved in the reserved memory block is reloaded into the new VM.

Compared with the prior art, the present invention has the following significant advantages: (1) the state of each VM, in particular its used memory pages, is passivated before VMM performance recovery, which shortens the time spent reading files from the file system to refill memory pages when the VM is rebuilt, and thus avoids the impact on other VMs of frequent disk access consuming large amounts of bandwidth; (2) micro-restart both clears the internal state of the VMM quickly and restores system performance rapidly by integrating the state of the old VMM and VMs, greatly reducing space and time costs; (3) after the system restarts, the cause of each inconsistency is identified and a targeted mechanism resolves it, so the restarted system does not crash again because of inconsistent data, which safeguards the recovery success rate.

The present invention is described in further detail below with reference to the accompanying drawings.

Description of drawings

Figure 1 is a scenario diagram of a single-server virtualization system;

Figure 2 is a flow chart of the fast virtual machine performance recovery method of the present invention;

Figure 3 is a process diagram of VM state passivation;

Figure 4 is a flow chart of VMM micro-restart;

Figure 5 is a memory layout diagram of the VMM.

Detailed description of embodiments

Referring to Figure 2, the present invention provides a fast virtual machine performance recovery method comprising the following steps:

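Step 1, memory-based VM state passivation, specifically:

Before VMM performance recovery is performed, all state of each VM in the VMM is saved to a reserved memory block;

Step 2, VMM micro-restart, specifically:

The heap information, static data segment and memory page information of the VMM before the restart are saved, the original kernel image is loaded to restart the system, and the consistency of state before and after the restart is ensured;

Step 3, VM activation, specifically:

A new VM is created according to the configuration information of each VM, and all state of that VM saved in the reserved memory heap is reloaded into the new VM.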

Referring to Figure 3, step 1 proceeds as follows:

Step 1.1, obtain, through the VM state management interface provided by the VMM, the number n of VMs hosted by the VMM, the configuration information of each VM and the state of each VM. Let VM_LIST denote the list of VMs hosted by the VMM, VM_LIST = {VM_1, VM_2, ..., VM_n}. For any VM_i (1 ≤ i ≤ n), the state space is S = {s_1, s_2, s_3}, where s_1 denotes the running state, s_2 the sleeping state and s_3 the shut-down state. The configuration information of a VM includes the number of virtual CPUs, the number of virtual network adapters, the virtual memory size and the virtual disk size. If VM_i is in s_1 or s_2, the VMM reserves a corresponding memory heap for VM_i and allocates additional memory units to store the configuration information and execution state of VM_i; if VM_i is in s_3, it is denoted $VM_r^{s_3}$, r ∈ [1, n];
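The classification in step 1.1 can be sketched in Python as a simple filter. This is an illustrative model only: VMState, VMConfig, VM and plan_reservations are hypothetical names, and a real VMM would obtain the list and the states through its own VM state management interface rather than from a Python list.

```python
from dataclasses import dataclass
from enum import Enum


class VMState(Enum):
    RUNNING = "s1"   # s_1: running
    SLEEPING = "s2"  # s_2: sleeping
    SHUTDOWN = "s3"  # s_3: shut down


@dataclass
class VMConfig:
    vcpus: int    # number of virtual CPUs
    vnics: int    # number of virtual network adapters
    mem_mb: int   # virtual memory size
    disk_gb: int  # virtual disk size


@dataclass
class VM:
    name: str
    state: VMState
    config: VMConfig


def plan_reservations(vm_list):
    """Split VM_LIST into VMs that get a reserved memory heap (s_1 or s_2)
    and VMs in s_3, which need no reservation."""
    reserve, shut_down = [], []
    for vm in vm_list:
        if vm.state in (VMState.RUNNING, VMState.SLEEPING):
            reserve.append(vm.name)
        else:
            shut_down.append(vm.name)
    return reserve, shut_down
```

Only VMs in s_1 or s_2 carry live state worth preserving, which is why the VMs in s_3 are merely recorded.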

Step 1.2, select any VM_j from the running VMs $VM_p^{s_1}$ and the sleeping VMs $VM_q^{s_2}$, where p, q ∈ [1, n], i.e. $VM_j \in \{VM_p^{s_1}, VM_q^{s_2}\}$; initialize j = 1;

Step 1.3, the VMM sends a suspend instruction to VM_j. Once VM_j receives it, the suspend handler of VM_j calls the detach interface of the front-end driver to unload all virtual devices loaded on VM_j. Each VM has a front-end driver that manages all of its virtual devices, such as virtual disks and virtual network adapters. Once all virtual devices are unloaded, the suspend handler of VM_j issues a suspend hypercall to the VMM;
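The ordering constraint in step 1.3 (detach every virtual device first, then issue the suspend hypercall) can be modeled as below. FrontendDriver, GuestSuspendHandler and the hypercall callback are hypothetical stand-ins for illustration; they are not the API of Xen or of any other VMM.

```python
class FrontendDriver:
    """Hypothetical front-end driver managing a VM's virtual devices."""

    def __init__(self, devices):
        self.devices = list(devices)

    def detach(self):
        # Unload every virtual device currently loaded on the VM.
        detached, self.devices = self.devices, []
        return detached


class GuestSuspendHandler:
    def __init__(self, frontend, hypercall):
        self.frontend = frontend
        self.hypercall = hypercall  # callable standing in for the suspend hypercall

    def on_suspend(self, vm_name):
        detached = self.frontend.detach()
        if self.frontend.devices:
            raise RuntimeError("virtual devices still attached; hypercall not issued")
        # Only after all virtual devices are unloaded is the VMM notified.
        self.hypercall("suspend", vm_name)
        return detached
```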

Step 1.4, the VMM reserves a corresponding memory block for VM_j, which issued the suspend hypercall, to hold its P2M mapping table and cached memory pages. The P2M mapping table, managed by the VMM, maps the physical addresses managed by the VM to the machine addresses managed by the VMM. Each memory page managed by a VM is called a physical memory page and has a unique number called its PFN (physical frame number); each memory page managed by the VMM is called a machine memory page and also has a unique number, called its MFN (machine frame number). Each entry of the P2M mapping table is therefore a PFN-to-MFN mapping. Whenever VM_j allocates a memory page, the PFN-to-MFN mapping of that page is stored in the P2M mapping table. Cached memory pages are the pages in use by the VM and can be computed from the P2M mapping table: traverse the table, and the PFN in each entry is the PFN of a cached memory page. These cached pages must be preserved while the VMM is being recovered. Because the machine memory pages backing the P2M mapping table and the cached memory pages are already managed by the VMM, saving these two parts of the VM state requires no additional memory, and VM_j is then shut down directly;
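The traversal in step 1.4 can be sketched by modeling one VM's P2M mapping table as a Python dict from PFN to MFN. pages_to_preserve is a hypothetical helper; the sketch records frame numbers only, reflecting the point that these pages are already owned by the VMM and need no copy.

```python
def pages_to_preserve(p2m):
    """Walk one VM's P2M mapping table (dict: PFN -> MFN).

    Every entry corresponds to a page the VM has allocated, so every PFN in
    the table is a cached memory page. The matching MFNs are the machine
    frames the VMM must not reuse while it restarts. Only frame numbers are
    collected; page contents stay where they are.
    """
    cached_pfns = sorted(p2m)            # PFNs of the cached memory pages
    keep_mfns = frozenset(p2m.values())  # machine frames to keep intact
    return cached_pfns, keep_mfns


# Example: a VM with three allocated pages.
pfns, mfns = pages_to_preserve({0: 100, 1: 205, 7: 300})
```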

Step 1.5, query the state of VM_j again through the VM state management interface provided by the VMM. If VM_j is not shut down, re-execute steps 1.3-1.4; if it is shut down, set j ← j+1 and repeat steps 1.3-1.4 for the next VM, until the VM state management interface reports every VM_j as shut down. Step 1 then ends.
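Steps 1.2 to 1.5 amount to a retry loop over the running and sleeping VMs. The sketch below assumes two hypothetical callbacks, get_state (querying the VM state management interface) and suspend_and_save (steps 1.3-1.4 for one VM). The max_retries bound is an added assumption so that the sketch cannot loop forever; the method as described simply re-executes steps 1.3-1.4 until the VM reports the shut-down state.

```python
def passivate_all(vm_names, get_state, suspend_and_save, max_retries=3):
    """vm_names: VMs in state s1 or s2 (step 1.2). Python indexes from 0,
    so index j here corresponds to VM_{j+1} in the text."""
    j = 0
    while j < len(vm_names):
        name = vm_names[j]
        for _ in range(max_retries):
            suspend_and_save(name)       # steps 1.3-1.4
            if get_state(name) == "s3":  # step 1.5: VM is shut down
                break
        else:
            raise RuntimeError(f"{name} still not shut down after {max_retries} attempts")
        j += 1                           # j <- j + 1
    return j                             # number of VMs passivated
```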

Referring to Figure 4, step 2 proceeds as follows:

Step 2.1, preserve the state of the VMM. The memory layout of the VMM is shown in Figure 5. The data preserved during the VMM micro-restart comprise the static data segment of the VMM, the VMM heap information and the VMM machine memory page information. The static data segment contains key information needed while the new VMM boots, such as interrupt descriptors, a pointer to the VMM heap memory structure, and pointers to the VM memory structures and the VM list. The specific method is as follows:

Step 2.1.1, modify the VMM linker so that reserved memory is allocated to the bss segment, extending it until the added space can hold the data of both the bss segment and the static data segment of the pre-restart VMM; then copy that data into the extended bss segment of the new VMM;

Step 2.1.2, modify the VMM startup routine bootup so that, before the heap of the new VMM is allocated, unused memory pages are identified by traversing the heap of the pre-restart VMM and are appended to the tail of the heap. Pages that were already in use therefore stay unchanged across the restart, and the VM state held in the VMM heap is not lost;
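Step 2.1 can be illustrated with a byte-level model. stash_into_extended_bss mimics step 2.1.1 by checking that the space added to the new bss segment is large enough and copying the old bss and static data segment into it; rebuild_free_list mimics step 2.1.2 by queuing only the unused pages of the old heap at the tail of the new free list. Both functions are hypothetical and stand in for linker and boot-code changes that this document does not spell out.

```python
from collections import deque


def stash_into_extended_bss(bss_size, bss_used, old_bss, old_static):
    """Step 2.1.1 model: bytes [bss_used, bss_size) are the reserved extension.
    Returns the new bss image and the offsets of the two saved regions."""
    spare = bss_size - bss_used
    needed = len(old_bss) + len(old_static)
    if needed > spare:
        raise ValueError(f"extended bss too small: need {needed} bytes, have {spare}")
    bss = bytearray(bss_size)
    start, mid, end = bss_used, bss_used + len(old_bss), bss_used + needed
    bss[start:mid] = old_bss      # copy of the pre-restart bss segment
    bss[mid:end] = old_static     # copy of the pre-restart static data segment
    return bss, (start, mid, end)


def rebuild_free_list(old_heap_pages, used_pages):
    """Step 2.1.2 model: traverse the pre-restart heap and append only unused
    pages to the tail of the new free list, so pages holding VM state are
    never handed out again and keep their contents across the restart."""
    free_list = deque()
    for page in old_heap_pages:
        if page not in used_pages:
            free_list.append(page)
    return free_list
```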

Step 2.2, to ensure that the state of the VMM and of each VM_j is preserved, a fast VMM loading method is built on the kexec mechanism provided by the Linux kernel; it avoids losing memory contents to a hardware reset during the system restart. Specifically, a fast VMM loading method, xexec, implements the kexec mechanism inside the VMM. It loads the original kernel image of the VMM and the kernel of the privileged virtual machine VM*, and allocates initial memory and disk space to VM*. The privileged virtual machine VM* and the VMM jointly manage the other virtual machines VM_1, VM_2, ..., VM_n hosted by the VMM;
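A kexec-style reboot preserves the old memory only if the new kernel images are placed where they cannot overwrite it. The document does not say how xexec chooses a load address, so the following is a hypothetical sketch of one possible policy: pick the first contiguous run of machine frames that contains no preserved frame.

```python
def find_load_region(total_frames, preserved, needed):
    """Return the first frame of a run of `needed` consecutive frames, none of
    which is in `preserved`, or None if no such run exists."""
    if needed <= 0:
        raise ValueError("needed must be positive")
    run_start, run_len = 0, 0
    for frame in range(total_frames):
        if frame in preserved:
            # A preserved frame breaks the run; start counting after it.
            run_start, run_len = frame + 1, 0
            continue
        run_len += 1
        if run_len == needed:
            return run_start
    return None
```

In practice the preserved set would combine the MFNs collected during passivation with the heap pages kept in step 2.1.2.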

Step 2.3, restart the VMM and complete its initialization, then copy the static data saved in the extended bss segment into the new static data segment. These data include the privileged virtual machine pointer, the free memory allocation table pointer, the P2M mapping table pointer, the timer event object pointer, the domain list pointer, the hash table pointer, the system time variable and the IRQ descriptors;
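The copy in step 2.3 can be sketched with a dataclass whose fields mirror the items listed above. The field names, the extra boot_cpus field and restore_static are hypothetical; the sketch illustrates that only the listed fields are taken from the saved copy, while anything else keeps the value produced by the new VMM's own initialization.

```python
from dataclasses import dataclass, replace


@dataclass
class StaticData:
    privileged_vm: int = 0     # privileged virtual machine pointer
    free_alloc_table: int = 0  # free memory allocation table pointer
    p2m_table: int = 0         # P2M mapping table pointer
    timer_events: int = 0      # timer event object pointer
    domain_list: int = 0       # domain list pointer
    hash_table: int = 0        # hash table pointer
    system_time_ns: int = 0    # system time variable
    irq_desc: int = 0          # IRQ descriptors
    boot_cpus: int = 0         # hypothetical field set fresh at boot, not restored


RESTORED_FIELDS = ("privileged_vm", "free_alloc_table", "p2m_table", "timer_events",
                   "domain_list", "hash_table", "system_time_ns", "irq_desc")


def restore_static(saved, fresh):
    """Return a copy of `fresh` with the listed fields taken from `saved`."""
    return replace(fresh, **{name: getattr(saved, name) for name in RESTORED_FIELDS})
```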

Step 2.4, eliminate the state inconsistencies between the old and new VMMs, between the VMM and the VMs, and between the VMM and the hardware:

State inconsistencies between the old and new VMMs are caused by key data structures left partially updated by the VMM failure, by unreleased locks and by memory leaks. Each cause is handled by a targeted measure: partially updated data structures are rebuilt in the new VMM from the log recorded before the VMM failed; all locks and signals are tracked and reinitialized; and leaked memory is reclaimed by the system after the VMM is restored;

State inconsistencies between the VMM and the VMs are caused by non-monotonic system time, partially executed hypercalls and undelivered virtual interrupts. Each cause is handled by a targeted measure: for non-monotonic system time, the VMM time structure is saved before the VMM crashes and restored after the VMM restarts but before the VMs run, so that the system time stays consistent with the time of the VMs; for partially executed hypercalls, the execution information of each hypercall is obtained from the log and incomplete hypercalls are re-executed; for undelivered virtual interrupts, a timeout mechanism lets a timeout handler resend the virtual interrupt request.

Inconsistencies between the VMM and the hardware are resolved by resetting the I/O controller or by handling all pending interrupts.

For the specific methods of eliminating state inconsistencies between the old and new VMMs, between the VMM and the VMs, and between the VMM and the hardware, see the following papers:
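Two of the measures in step 2.4 can be sketched in Python. reconcile_vmm models the old/new VMM case (replaying a log, reinitializing locks, reclaiming leaked frames), and VmmClock models keeping system time monotonic across the restart. All names are hypothetical, and the log format is an assumption: each entry is a full-value write, so replaying an entry that was already applied is harmless. The hypercall, virtual interrupt and hardware measures are not modeled.

```python
def reconcile_vmm(structs, redo_log, locks, allocated, referenced):
    """Repair old/new VMM inconsistencies; returns the set of leaked frames."""
    # Partially updated structures: re-apply every logged write.
    for struct_name, key, value in redo_log:
        structs[struct_name][key] = value
    # Unreleased locks: reinitialize every tracked lock to the unlocked state.
    for name in locks:
        locks[name] = False
    # Memory leaks: frames allocated but referenced by nothing are reclaimed.
    return allocated - referenced


class VmmClock:
    """System time that never goes backwards across a VMM restart."""

    def __init__(self, hw_now):
        self._hw_now = hw_now  # callable returning a raw counter in ns
        self._offset = 0

    def now(self):
        return self._hw_now() + self._offset

    def save(self):
        # Called before the VMM goes down.
        return self.now()

    def restore(self, saved_ns):
        # Called after restart, before any VM runs: if the raw counter is
        # behind the saved time, shift the offset so time resumes from it.
        if self.now() < saved_ns:
            self._offset += saved_ns - self.now()
```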

Candea G, Kawamoto S, Fujiki Y, et al. Microreboot: A Technique for Cheap Recovery[C]//OSDI. 2004, 4: 31-44.

Chen P M, Ng W T, Chandra S, et al. The Rio file cache: Surviving operating system crashes[M]. ACM, 1996.

Bailey K, Ceze L, Gribble S D, et al. Operating system implications of fast, cheap, non-volatile memory[C]//Proceedings of the 13th USENIX conference on Hot topics in operating systems. USENIX Association, 2011: 2-2.

Kourai K. Cachemind: Fast performance recovery using a virtual machine monitor[C]//Dependable Systems and Networks Workshops (DSN-W), 2010 International Conference on. IEEE, 2010: 86-92.

Park E, Egger B, Lee J. Fast and space-efficient virtual machine checkpointing[C]//ACM SIGPLAN Notices. ACM, 2011, 46(7): 75-86.

Step 3 proceeds as follows:

Step 3.1, traverse VM_LIST, select any VM_l, and initialize the variable l = 1, 1 ≤ l ≤ n;

Step 3.2, if VM_l is in the running or sleeping state, allocate to VM_l the memory space specified by its configuration information, according to the configuration information of VM_l and the P2M mapping table, and go to step 3.3; if VM_l is shut down, go to step 3.5;

Step 3.3, load the virtual devices according to the configuration information and establish communication with the front-end driver;

Step 3.4, start VM_l; the VMM updates the P2M mapping table and sets the execution state of VM_l;

Step 3.5, l ← l+1; if l ≤ n, go to step 3.2; otherwise, step 3 ends.
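Steps 3.1 to 3.5 can be sketched as a loop over the saved VM list. The tuple layout and the three callbacks (allocate_memory, load_devices, start_vm) are hypothetical stand-ins for VMM operations. Iterating over the whole list visits each of the n saved VMs exactly once.

```python
def activate_all(vm_list, allocate_memory, load_devices, start_vm):
    """vm_list: (name, state, config, p2m) tuples saved during passivation.
    Returns the names of the VMs that were restarted, in order."""
    started = []
    for name, state, config, p2m in vm_list:   # steps 3.1 and 3.5
        if state not in ("s1", "s2"):           # step 3.2: shut-down VM, skip
            continue
        allocate_memory(name, config, p2m)      # step 3.2: memory per config and P2M
        load_devices(name, config)              # step 3.3: devices and front-end link
        start_vm(name, state)                   # step 3.4: start, restore exec state
        started.append(name)
    return started
```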

Claims

1. A fast performance recovery method for a virtual machine monitor, characterized in that the steps are as follows:

Step 1, memory-based VM state passivation, specifically:

Before VMM performance recovery is performed, all state of each VM in the VMM is saved to a reserved memory block;

Step 2, VMM micro-restart, specifically:

The heap information, static data segment and memory page information of the VMM before the restart are saved, the original kernel image is loaded to restart the system, and the consistency of state before and after the restart is ensured;

Step 3, VM activation, specifically:

A new VM is created according to the configuration information of each VM, and all state of that VM kept in the reserved heap of the VMM is reloaded into the new VM.

2. The fast performance recovery method for a virtual machine monitor according to claim 1, characterized in that the detailed process of step 1 is as follows:

Step 1.1, obtain, through the VM state management interface provided by the VMM, the number n of VMs hosted by the VMM, the configuration information of each VM and the state of each VM, denoted VM_LIST = {VM_1, VM_2, ..., VM_n}, where VM_LIST is the list of VMs hosted by the VMM; reserve a corresponding memory block for each running VM $VM_p^{s_1}$ and each sleeping VM $VM_q^{s_2}$ to store its configuration information and execution state, where p, q ∈ [1, n];

Step 1.2, select any VM_j from the running VMs $VM_p^{s_1}$ and the sleeping VMs $VM_q^{s_2}$, i.e. $VM_j \in \{VM_p^{s_1}, VM_q^{s_2}\}$, and initialize the variable j = 1;

Step 1.3, the VMM sends a suspend instruction to VM_j; the detach interface of VM_j unloads all loaded virtual devices, and once they are unloaded, the suspend handler of VM_j issues a suspend hypercall to the VMM;

Step 1.4, the VMM preserves the P2M mapping table and cached memory pages of VM_j, which issued the suspend hypercall, and shuts down VM_j;

Step 1.5, query the state of VM_j again through the VM state management interface provided by the VMM; if VM_j is not shut down, re-execute steps 1.3-1.4; if it is shut down, set j ← j+1 and repeat steps 1.3-1.4, until the VM state management interface reports every VM_j as shut down.

3. The fast performance recovery method for a virtual machine monitor according to claim 1, characterized in that the detailed process of step 2 is as follows:

Step 2.1, save the heap information, static data segment and memory page information of the VMM before the restart, specifically:

Step 2.1.1, modify the VMM linker to extend the bss segment of the VMM, and copy the data of the bss segment and the static data segment of the pre-restart VMM into the extended bss segment of the new VMM;

Step 2.1.2, modify the VMM startup routine bootup to identify unused pages in the heap of the pre-restart VMM and append them to the tail of the heap;

Step 2.2, using a fast VMM loading method based on the kexec mechanism, load the original kernel image of the VMM and the kernel of the privileged virtual machine VM*, and allocate initial memory and disk space to VM*;

Step 2.3, restart the VMM and complete its initialization, and copy the data of the static data segment held in the extended bss segment into the new static data segment after the restart;

Step 2.4, eliminate the state inconsistencies between the old and new VMMs, between the VMM and the VMs, and between the VMM and the hardware.

4. The fast performance recovery method for a virtual machine monitor according to claim 1, characterized in that the detailed process of step 3 is as follows:

Step 3.1, traverse VM_LIST, select any VM_l, and initialize the variable l = 1, 1 ≤ l ≤ n;

Step 3.2, if VM_l is in the running or sleeping state, allocate to VM_l the memory space specified by its configuration information, according to the configuration information of VM_l and the P2M mapping table, and go to step 3.3; if VM_l is shut down, go to step 3.5;

Step 3.3, load the virtual devices according to the configuration information and establish communication with the front-end driver;

Step 3.4, start VM_l; the VMM updates the P2M mapping table and sets the execution state of VM_l;

Step 3.5, l ← l+1; if l ≤ n, go to step 3.2; otherwise, step 3 ends.