HK1243785B - Virtual disk storage techniques - Google Patents
- Publication number
- HK1243785B
- Authority
- HK
- Hong Kong
- Prior art keywords
- virtual disk
- file
- extent
- virtual
- extension
Description
Background Art
Storage virtualization technology allows logical storage to be separated from physical storage. One exemplary use case for storage virtualization is within a virtual machine. A layer of virtualization software (typically called a hypervisor or virtual machine monitor) is installed on a computer system and controls how virtual machines interact with the physical hardware. Because guest operating systems are typically coded to exercise exclusive control over the physical hardware, the virtualization software can be configured to subdivide the resources of the physical hardware and emulate the presence of physical hardware within the virtual machines. Another use case for storage virtualization is within a computer system configured to implement a storage array. In this case, physical computer systems or virtual machines can connect to the storage array using a protocol such as iSCSI.
A storage handling module can be used to emulate storage for either a virtual or a physical machine. For example, the storage handling module can handle storage IO tasks issued by a virtual or physical machine by reading from and writing to one or more virtual disk files, which can be used to describe (i.e., store) the extents of a virtual disk (i.e., contiguous areas of storage such as blocks). Likewise, the storage handling program can respond to write requests by writing bit patterns of virtual disk data to the one or more virtual disk files, and respond to read requests by reading the bit patterns stored in the one or more virtual disk files.
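The read and write handling described above can be illustrated with a minimal Python sketch. The class name `VirtualDiskFileBackend`, its methods, and the zero-fill behavior for unwritten regions are assumptions made for illustration; they do not come from the patent text.

```python
import io


class VirtualDiskFileBackend:
    """Hypothetical storage handler that serves IO from one backing file."""

    def __init__(self, backing, size):
        self.backing = backing  # any seekable binary file object
        self.size = size        # size of the emulated disk in bytes

    def write(self, offset, data):
        # Respond to a write request by storing the bit pattern in the file.
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside virtual disk")
        self.backing.seek(offset)
        self.backing.write(data)

    def read(self, offset, length):
        # Respond to a read request by returning the stored bit pattern.
        if offset < 0 or offset + length > self.size:
            raise ValueError("read outside virtual disk")
        self.backing.seek(offset)
        data = self.backing.read(length)
        # Assumption: regions never written read back as zeros.
        return data.ljust(length, b"\x00")


disk = VirtualDiskFileBackend(io.BytesIO(), size=4096)
disk.write(512, b"abc")
print(disk.read(512, 3))
```

An in-memory `io.BytesIO` stands in for the virtual disk file here; a file opened in binary read/write mode could be passed instead.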
Summary of the Invention
This document describes techniques for storing virtual disk data in one or more virtual disk files. In an exemplary configuration, a virtual disk extent can be associated with state information that indicates whether the virtual disk extent is described by a virtual disk file. In certain circumstances, the space used to describe a virtual disk extent can be reclaimed, and the state information can be used to determine how subsequent read and/or write operations directed to the virtual disk extent are handled. The reclaimed space (e.g., an extent built from one or more ranges) can be used to describe the same or another virtual disk extent. In addition to the foregoing, other techniques are described in the claims, the detailed description, and the figures.
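The idea of per-extent state information and reclaimed space can be sketched as follows. This is a simplified illustration with only two states; the names `ExtentState`, `ExtentTable`, `allocate`, and `reclaim`, and the free-list policy, are hypothetical and are not taken from the patent.

```python
from enum import Enum


class ExtentState(Enum):
    MAPPED = "mapped"      # extent is described by a section of the file
    UNMAPPED = "unmapped"  # extent is not described by the file


class ExtentTable:
    """Hypothetical table tracking state for each virtual disk extent."""

    def __init__(self, extent_count, extent_size):
        self.extent_size = extent_size
        self.state = [ExtentState.UNMAPPED] * extent_count
        self.file_offset = [None] * extent_count
        self.free_sections = []  # file offsets of reclaimed sections
        self.next_offset = 0     # end of the virtual disk file

    def allocate(self, index):
        """Back an extent with file space, reusing reclaimed space first."""
        if self.state[index] is ExtentState.MAPPED:
            return self.file_offset[index]
        if self.free_sections:
            offset = self.free_sections.pop()
        else:
            offset = self.next_offset
            self.next_offset += self.extent_size
        self.state[index] = ExtentState.MAPPED
        self.file_offset[index] = offset
        return offset

    def reclaim(self, index):
        """Release the file space that describes an extent."""
        if self.state[index] is ExtentState.MAPPED:
            self.free_sections.append(self.file_offset[index])
            self.file_offset[index] = None
            self.state[index] = ExtentState.UNMAPPED
```

In this sketch, a later `allocate` call for any extent may receive the section released by `reclaim`, which mirrors the statement that reclaimed space can describe the same or another virtual disk extent.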
Those skilled in the art will appreciate that one or more of the various aspects of the present disclosure may include, but are not limited to, circuitry and/or programming for effecting the aspects referenced herein; the circuitry and/or programming can be virtually any combination of hardware, software, and/or firmware configured to effect the aspects referenced herein, depending on the design choices of the system designer.
The foregoing is a summary and therefore necessarily contains simplifications, generalizations, and omissions of detail. Those skilled in the art will appreciate that the summary is illustrative only and is not intended to be limiting in any way.
Brief Description of the Drawings
FIG. 1 depicts a high-level block diagram of a computer system.
FIG. 2 depicts a high-level block diagram of an exemplary architecture for a virtualization software program.
FIG. 3 depicts a high-level block diagram of an alternative architecture for a virtualization software program.
FIG. 4 depicts a lower-level block diagram of a computer system configured to implement a virtual disk.
FIG. 5A depicts a lower-level block diagram of a computer system configured to implement a virtual disk.
FIG. 5B depicts a lower-level block diagram of a computer system configured to implement a virtual disk.
FIG. 6 depicts a high-level block diagram of a differencing disk.
FIG. 7 depicts a high-level example of the relationship between a virtual disk and a virtual disk file.
FIG. 8 depicts a high-level example of the relationship between a virtual disk and a virtual disk file.
FIG. 9 depicts a high-level example of the relationship between a virtual disk and a virtual disk file.
FIG. 10 depicts a high-level example of the relationship between a virtual disk and a virtual disk file.
FIG. 11 depicts an operational flow that may be embodied in a computer-readable storage medium and/or performed by a computer system.
FIG. 12 depicts additional operations that may be performed in conjunction with those illustrated in FIG. 11.
FIG. 13 depicts additional operations that may be performed in conjunction with those illustrated in FIG. 12.
FIG. 14 depicts an operational flow that may be embodied in a computer-readable storage medium and/or performed by a computer system.
FIG. 15 depicts additional operations that may be performed in conjunction with those illustrated in FIG. 14.
FIG. 16 depicts an operational flow that may be embodied in a computer-readable storage medium and/or performed by a computer system.
FIG. 17 depicts additional operations that may be performed in conjunction with those illustrated in FIG. 16.
Detailed Description
The disclosed subject matter may use one or more computer systems. FIG. 1 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the disclosed subject matter may be implemented.
The term "circuitry," as used throughout this document, can include hardware components such as hardware interrupt controllers, hard drives, network adapters, graphics processors, hardware-based video/audio codecs, and the firmware used to operate such hardware. The term "circuitry" can also include microprocessors, application-specific integrated circuits, and processors configured by firmware and/or software, e.g., the cores of a multi-core general-purpose processing unit that fetch and execute instructions. A processor can be configured by instructions loaded from memory (e.g., RAM, ROM), firmware, and/or mass storage, embodying logic operable to configure the processor to perform functions. In an example embodiment where the circuitry includes a combination of hardware and software, an implementer may write source code embodying the logic, which is subsequently compiled into machine-readable code that can be executed by the hardware. Since those skilled in the art will appreciate that the state of the art has evolved to the point where there is little difference between hardware-implemented and software-implemented functions, the choice of hardware versus software to effect the functions described herein is merely a design choice. In other words, since a software process can be transformed into an equivalent hardware structure, and a hardware structure can itself be transformed into an equivalent software process, the choice of a hardware implementation versus a software implementation is left to the implementer.
Referring now to FIG. 1, an exemplary computer system 100 is depicted. Computer system 100 can include a processor 102, e.g., an execution core. While one processor 102 is illustrated, in other embodiments computer system 100 may have multiple processors, e.g., multiple execution cores per processor substrate and/or multiple processor substrates that could each have multiple execution cores. As shown, various computer-readable storage media 110 can be interconnected by one or more system buses that couple various system components to processor 102. The system buses may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. In example embodiments, computer-readable storage media 110 can include, for example, random access memory (RAM) 104, storage device 106 (e.g., an electromechanical hard drive, a solid-state hard drive, etc.), firmware 108 (e.g., flash RAM or ROM), and removable storage devices 118 such as CD-ROMs, floppy disks, DVDs, flash drives, external storage devices, etc. Those skilled in the art will appreciate that other types of computer-readable storage media can be used, such as magnetic cassettes, flash memory cards, and/or digital video disks.
Computer-readable storage media 110 can provide non-volatile and volatile storage of processor-executable instructions 122, data structures, program modules, and other data for computer system 100. A basic input/output system (BIOS) 120, containing the basic routines that help transfer information between elements within computer system 100 (e.g., during startup), can be stored in firmware 108. A number of programs, including an operating system and/or application programs, may be stored on firmware 108, storage device 106, RAM 104, and/or removable storage devices 118, and executed by processor 102. In an exemplary embodiment, computer-readable storage media 110 can store virtual disk parser 404, which is described in more detail in the following paragraphs and can be executed by processor 102, thereby transforming computer system 100 into a computer system configured for a specific purpose, i.e., a computer system configured according to the techniques described in this document.
Computer system 100 can receive commands and information through input devices 116, which can include, but are not limited to, a keyboard and a pointing device. Other input devices may include a microphone, joystick, game pad, scanner, or the like. These and other input devices are often connected to processor 102 through a serial port interface that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port, or universal serial bus (USB). A display or other type of display device can also be connected to the system bus via an interface, such as a video adapter, which can be part of, or connected to, graphics processor unit 112. In addition to the display, computers typically include other peripheral output devices, such as speakers and printers (not shown). The exemplary system of FIG. 1 can also include a host adapter, a Small Computer System Interface (SCSI) bus, and an external storage device connected to the SCSI bus.
Computer system 100 can operate in a networked environment using logical connections to one or more remote computers. A remote computer may be another computer, a server, a router, a network PC, a peer device, or other common network node, and typically can include many or all of the elements described above relative to computer system 100.
When used in a LAN or WAN networking environment, computer system 100 can be connected to the LAN or WAN through network interface card (NIC) 114. NIC 114, which may be internal or external, can be connected to the system bus. In a networked environment, program modules depicted relative to computer system 100, or portions thereof, may be stored in a remote memory storage device. It will be appreciated that the network connections described here are exemplary and that other means of establishing a communications link between the computers may be used. Moreover, while it is envisioned that numerous embodiments of the present disclosure are particularly well suited for computerized systems, nothing in this document is intended to limit the disclosure to such embodiments.
Turning to FIG. 2, illustrated is an exemplary virtualization platform that can be used to generate virtual machines. In this embodiment, microkernel hypervisor 202 can be configured to control and arbitrate access to the hardware of computer system 200. Microkernel hypervisor 202 can generate execution environments called partitions, such as child partition 1 through child partition N (where N is an integer greater than 1). Here, a child partition is the basic unit of isolation supported by microkernel hypervisor 202. Microkernel hypervisor 202 can isolate processes in one partition from accessing another partition's resources. In particular, microkernel hypervisor 202 can isolate kernel-mode code of a guest operating system from accessing another partition's resources as well as user-mode processes. Each child partition can be mapped to a set of hardware resources, e.g., memory, devices, processor cycles, etc., that is under the control of microkernel hypervisor 202. In embodiments, microkernel hypervisor 202 can be a stand-alone software product, a part of an operating system, embedded within the firmware of a motherboard, a specialized integrated circuit, or a combination thereof.
Microkernel hypervisor 202 can enforce partitioning by restricting a guest operating system's view of the memory in a physical computer system. When microkernel hypervisor 202 instantiates a virtual machine, it can allocate pages of system physical memory (SPM), e.g., fixed-length blocks of memory with starting and ending addresses, to the virtual machine as guest physical memory (GPM). Here, the guest's restricted view of system memory is controlled by microkernel hypervisor 202. The term guest physical memory is a shorthand way of describing a page of memory from the viewpoint of a virtual machine, and the term system physical memory is a shorthand way of describing a page of memory from the viewpoint of the physical system. Thus, a page of memory allocated to a virtual machine will have a guest physical address (the address used by the virtual machine) and a system physical address (the actual address of the page).
A guest operating system can virtualize guest physical memory. Virtual memory is a management technique that allows an operating system to overcommit memory and to give an application sole access to a logically contiguous working memory. In a virtualized environment, a guest operating system can use one or more page tables, called guest page tables in this context, to translate virtual addresses, known as guest virtual addresses, into guest physical addresses. In this example, a memory address may have a guest virtual address, a guest physical address, and a system physical address.
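The three kinds of address named above can be related with a short Python sketch of a two-stage lookup. The 4 KiB page size, the dictionary-based page tables, and the function names are assumptions chosen for illustration; real page tables are multi-level hardware structures.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(address, page_table):
    """Map an address through a page-number-to-page-number table."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical mappings: the guest OS owns the first table,
# the hypervisor owns the second.
guest_page_table = {0: 5}   # guest virtual page -> guest physical page
hypervisor_map = {5: 42}    # guest physical page -> system physical page


def guest_virtual_to_system_physical(gva):
    gpa = translate(gva, guest_page_table)  # done by the guest OS
    spa = translate(gpa, hypervisor_map)    # controlled by the hypervisor
    return gpa, spa


gpa, spa = guest_virtual_to_system_physical(0x10)
print(hex(gpa), hex(spa))
```

The guest sees only the first mapping, which is how the hypervisor keeps the guest's view of system memory restricted.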
In the depicted example, the parent partition component, which can also be thought of as similar to domain 0 of Xen's open source hypervisor, can include host environment 204. Host environment 204 can be an operating system (or a set of configuration utilities) and can be configured to provide resources to guest operating systems executing in child partitions 1-N by using virtualization service providers 228 (VSPs). VSPs 228, which are typically referred to as back-end drivers in the open source community, can be used to multiplex the interfaces to the hardware resources by way of virtualization service clients (VSCs), which are typically referred to as front-end drivers in the open source community or as paravirtualized devices. As shown, the virtualization service clients execute within the context of the guest operating systems. However, these drivers differ from the rest of the drivers in the guest in that they communicate with host environment 204 via VSPs instead of communicating with hardware or emulated hardware. In an exemplary embodiment, the path used by virtualization service providers 228 to communicate with virtualization service clients 216 and 218 can be thought of as the enlightened IO path.
As shown, emulators 234 (e.g., virtualized IDE devices, virtualized video adapters, virtualized NICs, etc.) can be configured to run within host environment 204 and are attached to emulated hardware resources, e.g., IO ports, guest physical address ranges, virtual VRAM, emulated ROM ranges, etc., that are available to guest operating systems 220 and 222. For example, when a guest OS touches a guest virtual address mapped to a guest physical address where a register of a device would be for a memory-mapped device, microkernel hypervisor 202 can intercept the request and pass the values the guest attempted to write to an associated emulator. Here, the emulated hardware resources in this example can be thought of as where a virtual device is located in guest physical address space. The use of emulators in this way can be considered the emulation path. The emulation path is inefficient compared to the enlightened IO path because it requires more CPU time to emulate devices than it does to pass messages between VSPs and VSCs. For example, the several actions on memory mapped to registers that are required to write a buffer to disk via the emulation path may be reduced to a single message passed from a VSC to a VSP in the enlightened IO path, because the drivers in the VM are designed to access IO services provided by the virtualization system rather than to access hardware.
Each child partition can include one or more virtual processors (230 and 232) on which guest operating systems (220 and 222) can manage and schedule threads to execute. Generally, a virtual processor is a set of executable instructions and associated state information that provides a representation of a physical processor with a specific architecture. For example, one virtual machine may have a virtual processor with the characteristics of an Intel x86 processor, whereas another virtual processor may have the characteristics of a PowerPC processor. The virtual processors in this example can be mapped to processors of the computer system such that the instructions that effectuate the virtual processors are directly executed by physical processors. Thus, in an embodiment including multiple processors, virtual processors can be executed by some processors at the same time that, for example, other processors execute hypervisor instructions. The combination of virtual processors and memory in a partition can be considered a virtual machine.
Guest operating systems (220 and 222) can be any operating system, such as operating systems from Microsoft®, Apple®, the open source community, etc. The guest operating systems can include user/kernel modes of operation and can have kernels that include schedulers, memory managers, etc. Generally speaking, kernel mode can include an execution mode in a processor that grants access to at least privileged processor instructions. Each guest operating system can have an associated file system that can have applications stored on it, such as terminal servers, e-commerce servers, email servers, etc., as well as the guest operating system itself. The guest operating systems can schedule threads to execute on the virtual processors, and instances of such applications can be effectuated.
Referring now to FIG. 3, it illustrates an alternative virtualization platform to the one described above in FIG. 2. FIG. 3 depicts components similar to those of FIG. 2; however, in this example embodiment hypervisor 302 can include a microkernel component and components similar to those in host environment 204 of FIG. 2, such as virtualization service providers 228 and device drivers 224, while management operating system 304 may contain, for example, configuration utilities used to configure hypervisor 302. In this architecture, hypervisor 302 can perform the same or similar functions as microkernel hypervisor 202 of FIG. 2; however, in this architecture hypervisor 302 effectuates the enlightened IO path and includes the drivers for the physical hardware of the computer system. Hypervisor 302 of FIG. 3 can be a stand-alone software product, a part of an operating system, or embedded within the firmware of the motherboard, or a portion of hypervisor 302 can be effectuated by specialized integrated circuits.
Turning now to FIG. 4, it depicts computer system 400 and illustrates a high-level block diagram of components that can be used to effect the techniques described in this document. Briefly, computer system 400 can include components similar to those described above with respect to FIG. 1 through FIG. 3. FIG. 4 shows virtualization system 420, which can be thought of as a high-level representation of the virtualization platform illustrated by FIG. 2 or FIG. 3. For example, virtualization system 420 can be thought of as a high-level representation of the combination of features provided by microkernel hypervisor 202 and host environment 204. Alternatively, virtualization system 420 can be thought of as a high-level representation of hypervisor 302 and management OS 304. Thus, use of the term "virtualization system 420" throughout this document signals that the virtual disk techniques described in the following paragraphs can be implemented within any type of virtualization software layer or in any type of virtualization platform.
Virtualization system 420 can include offload provider engine 422. Briefly, offload provider engine 422 can be configured to service offload read and offload write requests (sometimes called proxy read and proxy write) issued by, for example, application 424. An offload read request is a request to create a token that represents the data that would have been read had the offload read been a normal read. An offload write is a request to write the data represented by a token to a destination location. In one use case, an offload read followed by an offload write can be used to copy data from one location to another (e.g., from computer system 400 to a destination computer system within a domain) by using a token that represents the data, which avoids moving the data through local RAM. For example, suppose that computer system 400 and a destination computer system (not shown) can access a common data repository and that a request to copy data from the computer system to the destination is received. Instead of copying the data to the destination, application 424 can issue a request to offload provider engine 422 to issue a token that represents the data as it existed at the time the token was associated with the data. The token can be sent to the destination and used by a program running on the destination to obtain the data from the common data repository and write the data to the destination. Copy offload techniques are described in more detail in co-pending U.S. patent application Ser. No. 12/888,433, entitled "Offload Reads and Writes," and U.S. patent application Ser. No. 12/938,383, entitled "Virtualization and Offload Reads and Writes," the contents of which are incorporated herein by reference in their entirety to the extent they are consistent with the techniques described in this document.
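The token flow described above can be sketched in Python as a toy model. The class `OffloadProvider`, its method names, and the use of a `bytearray` as the common data repository are hypothetical. For simplicity the sketch snapshots bytes when the token is created; this toy copy is not how a real offload provider would avoid moving data through local RAM.

```python
import secrets


class OffloadProvider:
    """Hypothetical provider that issues tokens representing data."""

    def __init__(self, repository):
        self.repository = repository  # bytearray standing in for shared storage
        self.tokens = {}

    def offload_read(self, offset, length):
        # Create a token representing the data as it exists right now.
        token = secrets.token_hex(16)
        self.tokens[token] = bytes(self.repository[offset:offset + length])
        return token

    def offload_write(self, token, destination, dest_offset):
        # Write the data the token represents to the destination location.
        data = self.tokens[token]
        destination[dest_offset:dest_offset + len(data)] = data
        return len(data)


repo = bytearray(b"hello world")
provider = OffloadProvider(repo)
token = provider.offload_read(0, 5)
repo[0:5] = b"HELLO"              # later change does not affect the token
dest = bytearray(8)
provider.offload_write(token, dest, 2)
print(dest)
```

Only the token travels between the requester and the destination in this model; the data itself is resolved by the provider when the offload write is serviced.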
Virtual disk parser 404, which can be a module of executable instructions in a specific example embodiment, can be used to instantiate virtual disks from virtual disk files and to handle storage IO on behalf of a virtual machine. As shown, virtual disk parser 404 can open one or more virtual disk files, such as virtual disk file 406, and generate virtual disk 402.
Virtual disk parser 404 can obtain virtual disk file 406 from storage device 106 via virtualization system file system 408. Briefly, virtualization system file system 408 represents a software module that organizes the computer files and data of virtualization system 420, such as virtual disk file 406. Virtualization system file system 408 can store this data in an array of fixed-size physical extents, i.e., contiguous areas of storage on a physical storage device. In a specific example, an extent can be a cluster, which is a sequence of bytes having a set length. Exemplary cluster sizes are typically a power of 2 between 512 bytes and 64 kilobytes. In a specific configuration, the cluster size can be 4 kilobytes.
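The cluster sizing rule above lends itself to a small arithmetic sketch. The helper names `is_valid_cluster_size` and `clusters_for_range` are hypothetical; the 4-kilobyte default follows the specific configuration mentioned in the text.

```python
CLUSTER_SIZE = 4 * 1024  # 4 kilobytes, as in the specific configuration


def is_valid_cluster_size(size):
    """True for a power of 2 between 512 bytes and 64 kilobytes."""
    return 512 <= size <= 64 * 1024 and size & (size - 1) == 0


def clusters_for_range(offset, length, cluster_size=CLUSTER_SIZE):
    """Return the cluster indexes touched by a byte range of a file."""
    if length <= 0:
        return range(0)
    first = offset // cluster_size
    last = (offset + length - 1) // cluster_size
    return range(first, last + 1)


print(list(clusters_for_range(4000, 200)))
```

A 200-byte range starting at byte 4000 straddles the boundary at byte 4096, so it touches two clusters even though it is far smaller than one cluster.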
When a request to open virtual disk file 406 is received, virtualization system file system 408 determines where the file is located on disk and issues an IO task to the disk device driver to read the data from one or more physical extents of the disk. The IO task issued by file system 408 specifies a disk offset and length that describe the location of the persistent copy of virtual disk file 406 on storage device 106. Due to the semantics of how storage devices operate, a write IO task can be buffered in one or more levels of volatile cache, represented by cache 454, until the circuitry of storage device 106 decides to access the location on persistent storage unit 460 (e.g., a platter, a flash memory cell, etc.) and write the buffered bit pattern indicative of the new contents of the persistent copy of virtual disk file 406 to persistent storage unit 460.
Virtual disk parser 404 can obtain the bit pattern indicative of virtual disk file 406 and expose the payload (e.g., user data) in virtual disk file 406 as a disk including a plurality of virtual disk extents. In an embodiment, these virtual disk extents can be fixed-size blocks, 512 kilobytes up to 64 megabytes in size, that are partitioned into a plurality of sectors; however, in another embodiment the virtual disk extents could be variable-sized extents. In an exemplary configuration, prior to booting guest operating system 412, resources related to an emulated or enlightened storage controller and emulated or enlightened aspects of a virtual disk are set up such that an emulated storage controller with memory-mapped registers is effected within the guest physical address space of virtual machine 410. Boot code can run and boot guest operating system 412. Virtualization system 420 can detect an attempt to access this region of guest physical address space and return a result that causes guest operating system 412 to determine that a storage device is attached to the emulated storage controller. In response, guest operating system 412 can load a driver (either a paravirtualized driver or a regular driver) and use the driver to issue storage IO requests to the detected storage device. Virtualization system 420 can route the storage IO requests to virtual disk parser 404.
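For the fixed-size case, the relationship between a byte offset on the virtual disk, its virtual disk extent, and the sector within that extent can be sketched as follows. The 2-megabyte extent size and 512-byte sector size are assumed values chosen for illustration (the extent size lies within the 512-kilobyte to 64-megabyte range given above), and `locate` is a hypothetical helper.

```python
SECTOR_SIZE = 512               # assumed sector size
EXTENT_SIZE = 2 * 1024 * 1024   # assumed fixed extent (block) size


def locate(virtual_offset, extent_size=EXTENT_SIZE, sector_size=SECTOR_SIZE):
    """Split a virtual disk byte offset into extent, sector, and byte parts."""
    extent_index, offset_in_extent = divmod(virtual_offset, extent_size)
    sector_in_extent, offset_in_sector = divmod(offset_in_extent, sector_size)
    return extent_index, sector_in_extent, offset_in_sector


print(locate(2 * 1024 * 1024 + 513))
```

With these assumed sizes each extent holds 4096 sectors, and a parser could use the extent index to look up where, or whether, that extent is stored in the virtual disk file.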
After guest operating system 412 is running, it can issue IO tasks to virtual disk 402 via file system 414, which is similar to virtualization system file system 408 in that it organizes the computer files and data of guest operating system 412 and of the applications installed on guest operating system 412. Guest operating system 412 can interact with virtual disk 402 in a manner similar to how an operating system interacts with a physical storage device, and the IO tasks are eventually routed to virtual disk parser 404. Virtual disk parser 404 can include logic for determining how to respond to the IO tasks in a way that emulates a physical storage device. For example, virtual disk parser 404 can read data from virtual disk file 406 and write data to virtual disk file 406. The data written to virtual disk file 406 is in turn routed through virtualization system file system 408 and committed to the permanent copy of virtual disk file 406 stored on or in permanent storage unit 460.
Referring briefly to FIG. 5A, it illustrates an alternative architecture for implementing the techniques described herein. As shown in FIG. 5A, virtual disk parser 404 can also be implemented in an operating system 502, such as an operating system offered by Microsoft®. In this example, virtual disk parser 404 can be configured to run on a storage server 500, which can include components similar to those of computer system 100 of FIG. 1. Storage server 500 can include an array of physical storage devices 510 and can be configured to make storage available to servers such that the storage appears to be locally attached to operating system 508. Virtual disk parser 404 can operate as described with respect to FIG. 4; the difference is that in this configuration the read/write IO tasks issued by file system 414 can be sent to virtual disk parser 404 over a network connection.
简要参照图5B,它示例了用于实施本文中描述的技术的又一架构。图5B与图5A的类似之处在于在操作系统502中实施虚拟盘解析器404以及计算机系统512可以包括与图1的计算机系统100类似的组件。然而此实例中的差异是该图示例了环回附接虚拟盘402。可以在虚拟盘402中存储包括诸如应用424的应用的文件系统414,以及可以在计算机系统文件系统514中存储虚拟盘文件406。Referring briefly to FIG. 5B , yet another architecture for implementing the techniques described herein is illustrated. FIG. 5B is similar to FIG. 5A in that virtual disk parser 404 is implemented in operating system 502 and computer system 512 may include components similar to computer system 100 of FIG. 1 . However, the difference in this example is that the figure illustrates a loopback-attached virtual disk 402. A file system 414, including applications such as application 424, may be stored in virtual disk 402, and virtual disk files 406 may be stored in computer system file system 514.
Turning attention now to virtual disk 402: while it can be effected by a single virtual disk file, in other configurations a group of differencing virtual disk files can be used to effect virtual disk 402. FIG. 6 illustrates an exemplary chain of virtual disk files that virtual disk parser 404 can use to effect virtual disk 402 as a differencing disk. Generally, a differencing virtual disk file represents the current state of a virtual disk as a set of extents modified relative to a parent image. The parent image can be another differencing virtual disk file or a base virtual disk file.
在示范性配置中,可以把父虚拟盘文件与子虚拟盘文件之间的链接存储在所述子(child)内。特别地,所述子可以包括父(parent)的标识和描述父的地点的值。当起动虚拟机时,虚拟盘解析器404可以接收描述链中最后虚拟盘文件的信息,即,虚拟盘文件612是包括虚拟盘文件612、610、606、以及600的链中的最后一个,以及打开此文件。此文件可以包括其父(即,虚拟盘文件610)的标识和去往它的路径。虚拟盘解析器404可以定位和打开所述父等诸如此类直到定位和打开基本虚拟盘文件为止。In an exemplary configuration, a link between a parent virtual disk file and a child virtual disk file can be stored within the child. Specifically, the child can include the identifier of the parent and a value describing the parent's location. When starting a virtual machine, the virtual disk parser 404 can receive information describing the last virtual disk file in the chain, i.e., virtual disk file 612 being the last in the chain including virtual disk files 612, 610, 606, and 600, and open this file. This file can include the identifier of its parent (i.e., virtual disk file 610) and the path to it. The virtual disk parser 404 can locate and open the parent, and so on, until the base virtual disk file is located and opened.
Virtual disk parser 404 can use information that indicates whether data is present in, or stored in, a parent virtual disk file. Typically, the last virtual disk file in the chain is opened for read/modify access, while the other virtual disk files are opened read-only. Writes are therefore typically made to the last virtual disk file in the chain. Read operations are likewise directed first to the last virtual disk file in the chain; when information about where the data is located has not been cached, virtual disk parser 404 logically searches the virtual disk files in order from last to base until the data is found. In a specific example, a block allocation table (not shown) of a virtual disk file (e.g., virtual disk file 612) can include state information indicating whether a virtual disk extent is defined by a section of that virtual disk file or whether the virtual disk extent is transparent (e.g., defined by a different virtual disk file further along the chain). In one implementation, virtual disk parser 404 can determine whether the virtual disk extent is transparent and, if so, access the block allocation table of the next virtual disk file in the chain (e.g., virtual disk file 610), and so on, until the virtual disk file in the chain that defines the data is located.
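The last-to-base search described above can be sketched as follows. This is only an illustration under stated assumptions: the class `VirtualDiskFile`, the field `bat`, and the function `find_defining_file` are hypothetical names, and an extent with no table entry is treated as transparent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ExtentState(Enum):
    MAPPED = "mapped"            # data is stored in this virtual disk file
    TRANSPARENT = "transparent"  # data is defined further along the chain


@dataclass
class VirtualDiskFile:
    name: str
    parent: Optional["VirtualDiskFile"] = None
    # Hypothetical block allocation table: extent index -> ExtentState.
    bat: dict = field(default_factory=dict)


def find_defining_file(last, extent):
    """Walk from the last file toward the base until a file defines the extent."""
    current = last
    while current is not None:
        # Assumption: a missing entry means the extent is transparent here.
        state = current.bat.get(extent, ExtentState.TRANSPARENT)
        if state is not ExtentState.TRANSPARENT:
            return current
        current = current.parent
    return None  # no file in the chain defines this extent


# A three-file chain named after files in FIG. 6; the table contents are made up.
base = VirtualDiskFile("600", bat={0: ExtentState.MAPPED, 1: ExtentState.MAPPED})
child = VirtualDiskFile("610", parent=base, bat={1: ExtentState.MAPPED})
last = VirtualDiskFile("612", parent=child, bat={0: ExtentState.TRANSPARENT})
```

With these tables, a read of extent 0 falls through to the base file, while a read of extent 1 stops at the intermediate file.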
Referring now to FIG. 7, it illustrates virtual disk 402 described at least in part by virtual disk file 702, which can be similar to any writable/modifiable virtual disk file described in FIG. 6 (e.g., virtual disk file 602, 608, or 612), or can be a single virtual disk file. As shown, virtual disk 402 can include N storage extents (where N is an integer greater than 0); in this specific example virtual disk 402 includes 10 extents. Virtual disk 402 is illustrated as containing the bit patterns of different files and data of guest operating system 412, which are differentiated by the different patterns within the virtual disk extents.
Because virtual disk 402 is not a physical storage device, the underlying payload data of the virtual disk extents can be "described by" (i.e., stored in) different sections within virtual disk file 702. For example, virtual disk extent 1 is described by the section defined by virtual disk file offset 0, or the first offset that can be used to store payload data. Allocation table 416, which can be held in random access memory while computer system 400 is in operation, can be maintained in any section of virtual disk file 702 and can span multiple sections. In short, allocation table 416 can include information that links virtual disk extents to sections of virtual disk file 702. For example, allocation table 416 can store information that defines the virtual disk file byte offset of the section of virtual disk file 702 where the data is stored. The arrows represent the relationships stored in allocation table 416.
As described in more detail in the following paragraphs, allocation table 416 can also include state information; however, this configuration is exemplary. In alternative configurations this information can be stored in a different section of virtual disk file 702 and loaded into RAM 104. Allocation table 416 can include an entry for each virtual disk extent; state information indicating what state each extent is in; and a file offset (not illustrated) indicating where in virtual disk file 702 each virtual disk extent is described. In an alternative embodiment, an extent can also be defined by multiple table entries that are already mapped and contiguous (in file offset). In that configuration, reads and writes that cross block boundaries can be handled as a single read or write to virtual disk file 702, provided the block payloads are contiguous in the file. In a specific example, virtual disk parser 404 can also store information indicating what type of bit pattern is stored in each unused section of the virtual disk file, i.e., a free space map. In addition, the free space map can be used by virtual disk parser 404 to determine which sectors of virtual disk file 406 are in use and which are free. The free space map in this example can be configured to track the free space in the file that is non-zero. In an exemplary embodiment, when a non-zero portion of free space is used to describe a portion of virtual disk 402 that must be zero, or that must not disclose information from other virtual disk offsets, the free space is overwritten with zeros or with a non-information-disclosing pattern (typically zeros), respectively. Virtual disk parser 404 can use this information to determine which section of the virtual disk file to allocate to a virtual disk extent. For example, if a virtual disk extent in the zero state is written to, virtual disk parser 404 can allocate a section that already contains zeros to back the virtual disk extent.
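As a rough illustration of the allocation table and free space map described above, the following sketch translates a virtual disk byte offset into a virtual disk file byte offset and picks an already-zeroed free section. The 2 MiB extent size, the dictionary layouts, and the function names are assumptions made for the example, not details taken from the description.

```python
# Assumed fixed extent size (2 MiB), within the 512 KB to 64 MB range mentioned earlier.
EXTENT_SIZE = 2 * 1024 * 1024

# Hypothetical allocation table: extent index -> (state, file byte offset or None).
allocation_table = {
    0: ("mapped", 0),
    1: ("mapped", 3 * EXTENT_SIZE),
    2: ("zero", None),
}

# Hypothetical free space map: file byte offset of an unused section -> True if it holds zeros.
free_space_map = {
    1 * EXTENT_SIZE: False,  # still contains stale data
    2 * EXTENT_SIZE: True,   # already zeroed
}


def to_file_offset(virtual_offset):
    """Translate a virtual disk byte offset into a virtual disk file byte offset."""
    extent, within = divmod(virtual_offset, EXTENT_SIZE)
    state, file_offset = allocation_table[extent]
    if state != "mapped":
        return None  # the extent is not backed by a section of the file
    return file_offset + within


def pick_zeroed_section():
    """Return the lowest free section that already contains zeros, if any."""
    for offset, is_zero in sorted(free_space_map.items()):
        if is_zero:
            return offset
    return None
```

Choosing an already-zeroed section for an extent in the zero state avoids a separate zeroing write before the section is put into use.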
随着访客操作系统412或操作系统508运行,它将会生成数据和文件并向虚拟盘402发出盘写入以存储数据。当虚拟盘文件702不具有任何额外未使用空间时,虚拟盘解析器404可以扩展文件的末端和使用新空间来描述虚拟盘扩展。访客操作系统412或操作系统508可以使用、删除、以及重新使用虚拟盘402的区段;然而,由于虚拟盘解析器404仅代表文件系统414存储数据,所以虚拟盘解析器404会无法确定访客操作系统412是否仍正使用虚拟盘文件的区段。结果是,虚拟盘解析器404可以持有虚拟盘文件702中的分配空间以描述文件系统414不再使用的虚拟盘扩展。这样的结果是虚拟盘文件702的尺寸会增长直到它达到虚拟盘402的尺寸为止。As the guest operating system 412 or operating system 508 runs, it will generate data and files and issue disk writes to the virtual disk 402 to store data. When the virtual disk file 702 does not have any additional unused space, the virtual disk parser 404 can extend the end of the file and use the new space to describe the virtual disk extents. The guest operating system 412 or operating system 508 can use, delete, and reuse segments of the virtual disk 402; however, because the virtual disk parser 404 only stores data on behalf of the file system 414, the virtual disk parser 404 may not be able to determine whether the guest operating system 412 is still using segments of the virtual disk file. As a result, the virtual disk parser 404 may hold allocated space in the virtual disk file 702 to describe virtual disk extents that are no longer in use by the file system 414. As a result, the size of the virtual disk file 702 will grow until it reaches the size of the virtual disk 402.
在示范性实施例中,虚拟盘解析器404可以被配置成收回虚拟盘文件的未使用区段以及可选地重新使用它们。这样,需要扩展虚拟盘文件的频率被减小,且虚拟盘文件的总体尺寸被减小。在实例实施例中,当文件系统告知它不再使用虚拟盘扩展时,虚拟盘解析器404可以从虚拟盘文件释放(即,解链接)虚拟盘扩展并使虚拟盘扩展与描述应当如何对待对虚拟盘扩展的读取操作的信息相关联。可以随后重新使用虚拟盘文件的区段以描述同样的或另一虚拟盘扩展。In an exemplary embodiment, virtual disk parser 404 can be configured to reclaim unused segments of a virtual disk file and optionally reuse them. This reduces the frequency with which the virtual disk file needs to be expanded, and reduces the overall size of the virtual disk file. In an exemplary embodiment, when the file system informs it that a virtual disk extent is no longer in use, virtual disk parser 404 can release (i.e., unlink) the virtual disk extent from the virtual disk file and associate the virtual disk extent with information describing how read operations on the virtual disk extent should be treated. The segments of the virtual disk file can subsequently be reused to describe the same or another virtual disk extent.
In an exemplary configuration, virtual disk parser 404 can use TRIM, UNMAP, and/or WRITE SAME of zero commands issued by the file system to determine when a virtual disk extent can be released from virtual disk file 406. TRIM commands can be issued by guest operating system 412 or operating system 508. For example, as guest operating system 412 or operating system 508 runs, file system 414 can determine that some sectors are no longer needed and issue a TRIM command. Alternatively or additionally, virtual disk parser 404 can be configured to request that file system 414 issue TRIM commands at predetermined intervals or when predetermined criteria are met (e.g., when virtual machine 410 is instantiated, when virtual machine 410 is shut down, during periods of light use, etc.).
In short, a TRIM command is used to inform a data storage device which sectors are no longer considered in use so that the data storage device can optionally discard the data stored in them. File system 414 can use one type of TRIM command, called a free space TRIM command, to signal that sectors are no longer in use by file system 414; the other type, called a standard TRIM command, does not signal this. The difference between the two types of TRIM command is that when a sector is the subject of a free space TRIM, file system 414 provides security for the sector by preventing user space applications and the like from reading from it. The fact that file system 414 secures access to sectors that have been trimmed in this way can be used to increase the ability to allocate virtual disk file space efficiently. This specific aspect is described in more detail in the following paragraphs.
In an exemplary configuration, virtual disk parser 404 can be configured to perform a reclaim operation when a TRIM command completely covers a virtual disk extent. Put another way, virtual disk parser 404 can unlink a virtual disk extent from the virtual disk file in response to receiving a TRIM command that defines a range of virtual disk sectors identifying all of the sectors in the virtual disk extent. In the same or an alternative embodiment, when a TRIM command is received that covers a portion of a virtual disk extent, virtual disk parser 404 can determine what portion of the virtual disk file corresponds to the trimmed sectors and send a TRIM command for that portion of the virtual disk file to storage device 106. In this example, the underlying file system (e.g., virtualization system file system 408, storage server file system 504, or computer system file system 514) can translate the offsets of the TRIM command and send the translated offsets to storage device 106, reclaim the space directly via internal data structure updates, or clear the data from cache.
In the same or another embodiment, when a TRIM command is received that covers a portion of a virtual disk extent, virtual disk parser 404 can be configured to store information indicating which sectors have been the subject of a TRIM command and whether the TRIM command was a free space TRIM. If the remainder of the virtual disk extent is subsequently trimmed, virtual disk parser 404 can release the virtual disk extent from the virtual disk file.
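The distinction between a TRIM that fully covers an extent and one that covers only part of it can be sketched as below. The function name `split_trim` and the sector-based interface are hypothetical; the sketch only classifies the trimmed range and leaves the unlinking and bookkeeping to the caller.

```python
def split_trim(first_sector, sector_count, sectors_per_extent):
    """Classify a TRIM range by extent.

    Returns (full, partial): `full` lists extent indices whose sectors are all
    covered; `partial` maps an extent index to the (start, end) sector offsets
    covered inside that extent, with `end` exclusive.
    """
    if sector_count <= 0:
        return [], {}
    end = first_sector + sector_count  # exclusive end of the trimmed range
    full, partial = [], {}
    extent = first_sector // sectors_per_extent
    while extent * sectors_per_extent < end:
        ext_start = extent * sectors_per_extent
        ext_end = ext_start + sectors_per_extent
        lo = max(first_sector, ext_start)
        hi = min(end, ext_end)
        if lo == ext_start and hi == ext_end:
            full.append(extent)       # candidate for release from the file
        else:
            partial[extent] = (lo - ext_start, hi - ext_start)  # remember trimmed sectors
        extent += 1
    return full, partial
```

For instance, with 8 sectors per extent (an arbitrary value chosen for the example), a TRIM of 20 sectors starting at sector 28 fully covers extents 4 and 5 and covers only the second half of extent 3.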
When a virtual disk extent is released, virtual disk parser 404 can associate the virtual disk extent with state information that describes how read operations directed to the virtual disk extent can be handled. Table 1 illustrates exemplary state information that virtual disk parser 404 can associate with virtual disk extents and use to optimize the reclamation of the virtual disk file. The ability to reclaim virtual disk extents can be accomplished in one example by using two states (described and not described); however, because the bit patterns stored in virtual disk file 702 are typically not erased when data is deleted, additional states can be used to determine when the space selected to describe a virtual disk extent needs to be cleared before it can be reused, or whether it can be reused without overwriting the data previously stored in it. One reason data is not erased upon deletion is that erasing it costs processor cycles; and because some storage devices are configured to perform write operations on a per-block basis, it is more efficient to erase data at the time it is overwritten with new data. The following states are exemplary, and the disclosure is not limited to using the states defined by the following table.
Table 1.
Referring to Table 1 in conjunction with FIG. 7, the first state listed is the "mapped" state, which indicates that the virtual disk extent is described by a section of virtual disk file 702. For example, virtual disk extent 0 is an example of a virtual disk extent illustrated as being in the "mapped" state.
继续表1的描述,虚拟盘扩展可以与表明虚拟盘扩展是“透明”(即,通过不同虚拟盘文件描述虚拟盘扩展)的状态信息相关联。在虚拟盘解析器404接收到对处于透明状态的虚拟盘扩展的读取操作的情况下,虚拟盘解析器404可以参考不同虚拟盘文件并检查它的分配表以确定如何响应读取。在虚拟盘解析器404接收到对虚拟盘扩展的写入的情况下,虚拟盘解析器404可以把虚拟盘扩展从“透明”状态转变为“映射”状态。Continuing with the description of Table 1, a virtual disk extent may be associated with state information indicating that the virtual disk extent is "transparent" (i.e., the virtual disk extent is described by a different virtual disk file). If virtual disk parser 404 receives a read operation for a virtual disk extent in the transparent state, virtual disk parser 404 may refer to the different virtual disk file and check its allocation table to determine how to respond to the read. If virtual disk parser 404 receives a write to the virtual disk extent, virtual disk parser 404 may transition the virtual disk extent from the "transparent" state to the "mapped" state.
Continuing the description of Table 1 in conjunction with FIG. 7, a virtual disk extent can also be associated with the "unmapped" state. In this example, the virtual disk extent is not described by virtual disk file 702, nor is it described by any other virtual disk file in the chain. The unmapped state can be used to describe a virtual disk extent that was subject to a TRIM command that did not indicate that file system 414 would secure access to the virtual disk extent. Put another way, the TRIM command used to transition the virtual disk extent to this state was a standard TRIM command. When a virtual disk extent is in the unmapped state and an IO task indicating a read of the extent is received, virtual disk parser 404 can respond with zeros, a zero token, ones, a token representing all ones, or a non-information-disclosing bit pattern (e.g., all zeros, all ones, or a randomly generated pattern of ones and zeros). In this example, if a section of virtual disk file 702 is allocated to back a virtual disk extent in this state, virtual disk parser 404 can write a non-information-disclosing bit pattern to the section of virtual disk file 702 before allocating it, or select a section that already contains a non-information-disclosing bit pattern to describe the virtual disk extent. Virtual disk extent 6 of FIG. 7 is indicated as being in the unmapped state.
In an embodiment, the data that defined an unmapped or uninitialized extent can be retained, and the unmapped or uninitialized state can include two sub-states: anchored, meaning the data is still present within virtual disk file 702, and unanchored, meaning the data may or may not have been retained. Where these sub-states are used, virtual disk parser 404 can transition an unmapped but anchored extent to mapped by allocating the section or sections that store the data, without zeroing them. Similarly, although virtual disk parser 404 is configured to treat uninitialized extents as if they were unmapped for at least a portion of virtual disk 402, virtual disk parser 404 can avoid zeroing an uninitialized but anchored extent during its transition to mapped by allocating the section or sections that store the data, without zeroing them.
表1额外描述“零”状态。在此实例中,未通过虚拟盘文件702描述虚拟盘扩展也未通过链中的任何其它虚拟盘文件描述它;然而,需要把虚拟盘扩展读取成全零。在此实例中,可以使用零状态来描述经受了任一类型修剪命令的虚拟盘扩展或描述程序写入了全零的虚拟盘扩展。例如,假设删除工具程序向虚拟盘扩展4写入了全零以保证完全覆盖了它先前存储的数据。在虚拟盘扩展处于零状态、以及接收到表明对扩展读取的IO任务的情况下,虚拟盘解析器404可以通过零或零令牌(在卸载读取操作中)响应。在写入针对此状态的虚拟盘扩展的情况下,虚拟盘解析器404可以使虚拟盘文件702的区段为零并使用它来描述虚拟盘扩展或选择已经是零的虚拟盘文件702的区段,以及分配它以支持虚拟盘扩展。在此实施例中,可以使用数据结构或虚拟盘文件702跟踪为零空间。可以在打开虚拟盘文件702时、在关闭虚拟盘文件702时等周期性地更新数据结构。从处于未映射或未初始化状态的扩展进行读取可以可选地使虚拟盘解析器404在虚拟盘解析器404被配置成提供处于未映射或未初始化状态的扩展的扇区稳定性的配置中把所述扩展转变为零状态。Table 1 additionally describes a "zero" state. In this example, the virtual disk extent is not described by virtual disk file 702 or any other virtual disk file in the chain; however, the virtual disk extent needs to be read as all zeros. In this example, the zero state can be used to describe a virtual disk extent that has been subjected to any type of TRIM command or to describe a virtual disk extent to which a program has written all zeros. For example, suppose a delete utility program writes all zeros to virtual disk extent 4 to ensure that its previously stored data is completely overwritten. If a virtual disk extent is in the zero state and an I/O task indicating a read of the extent is received, virtual disk parser 404 can respond with a zero or zero token (in an offload read operation). When writing to a virtual disk extent in this state, virtual disk parser 404 can zero a section of virtual disk file 702 and use it to describe the virtual disk extent, or select a section of virtual disk file 702 that is already zeroed and allocate it to support the virtual disk extent. In this embodiment, a data structure or virtual disk file 702 can be used to track zero space. The data structure may be updated periodically when the virtual disk file 702 is opened, when the virtual disk file 702 is closed, etc. 
Reading from an extent that is in an unmapped or uninitialized state may optionally cause the virtual disk parser 404 to transition the extent to a zero state in configurations where the virtual disk parser 404 is configured to provide sector stability for extents that are in an unmapped or uninitialized state.
Table 1 also describes a state called the "uninitialized" state. The uninitialized state indicates that the virtual disk extent is not described by virtual disk file 702 and that file system 414 is securing access to the virtual disk extent. That is, file system 414 is configured to prevent user applications from reading sectors within this virtual disk extent. In this example, the uninitialized state can be used to describe a virtual disk extent that was subject to a free space TRIM command. When a virtual disk extent is in the uninitialized state and an IO task indicating a read of the extent is received, virtual disk parser 404 can respond with any data (i.e., a bit pattern from almost anywhere else in virtual disk file 702, zeros, ones, a non-information-disclosing bit pattern, etc.), because virtual disk parser 404 is not providing security for the virtual disk extent beyond the requirement that only virtual disk payload data and non-security-impacting metadata may be exposed to virtual disk clients. When a write directed to a virtual disk extent in this state is received, virtual disk parser 404 can simply allocate a section of virtual disk file 702 without having to alter any data that may be stored within the section. Consequently, this state is the most advantageous, because space can be allocated within the virtual disk file without clearing it beforehand. Virtual disk extent 5 of FIG. 7 is indicated as being in the uninitialized state, and virtual disk file 702 is not backing the virtual disk extent.
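The five states discussed above can be summarized in a small sketch. The enum `State` and the helpers `read_policy` and `must_scrub_before_backing` are hypothetical names chosen for illustration; the policy strings paraphrase the behavior described in the text rather than any actual interface.

```python
from enum import Enum


class State(Enum):
    MAPPED = "mapped"
    TRANSPARENT = "transparent"
    UNMAPPED = "unmapped"
    ZERO = "zero"
    UNINITIALIZED = "uninitialized"


def read_policy(state):
    """Summarize how a read of an extent in the given state can be answered."""
    return {
        State.MAPPED: "read the section of the virtual disk file",
        State.TRANSPARENT: "consult the next file in the chain",
        State.UNMAPPED: "return a non-information-disclosing pattern",
        State.ZERO: "return zeros",
        State.UNINITIALIZED: "return any data",
    }[state]


def must_scrub_before_backing(state):
    """True if a section must hold zeros or a safe pattern before it backs the extent."""
    if state in (State.UNMAPPED, State.ZERO):
        return True
    if state is State.UNINITIALIZED:
        return False  # the file system already prevents user reads of these sectors
    # Mapped and transparent extents are outside the scope of this sketch.
    raise ValueError(f"not an unbacked state: {state}")
```

The only state that lets a section be reused as-is is the uninitialized state, which is why the text calls it the most advantageous.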
Once state information is associated with each virtual disk extent, virtual disk parser 404 can be configured to provide additional information to an administrator or the like about how virtual disk 402 is laid out. In an example embodiment, virtual disk parser 404 can be configured to respond to offset queries that include certain parameters, based on the state information. For example, a user can issue a query that iterates through virtual disk 402 starting at a given byte offset and locates ranges that satisfy specific criteria such as "mapped," "unmapped," "transparent," etc. In addition, the user can select how "deep" the query should go in order to take differencing virtual disk files into account. For example, and referring to FIG. 7, a user could set a depth of 2 and execute a query. In response, virtual disk parser 404 will execute the query on the last two virtual disk files in the chain (e.g., virtual disk files 610 and 612). Specific queries can include queries to obtain the next non-transparent range, the next non-zero range, the next defined range, the next initialized range, etc. In short, a query for the next defined range can be configured to return the next range that contains defined data (e.g., sectors in the mapped or zero state), with transparent sectors resolving to the state of the parent virtual disk file for those sectors. A query for the next initialized range can return the next range that contains data in a state other than the uninitialized state, again with transparent sectors resolving to the state of the parent virtual disk file for those sectors.
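A depth-limited range query of the kind described above might look like the following sketch. It is a simplification under stated assumptions: it works on extent indices rather than byte offsets, represents each file's table as a plain dictionary, and uses the hypothetical names `resolve_state` and `next_range`.

```python
def resolve_state(chain, extent, depth):
    """Resolve an extent's state using at most `depth` files.

    `chain[0]` is the last file in the chain; each item maps an extent index
    to a state string. A missing entry is assumed to mean "transparent".
    """
    for table in chain[:depth]:
        state = table.get(extent, "transparent")
        if state != "transparent":
            return state
    return "transparent"


def next_range(chain, start_extent, extent_count, depth, wanted):
    """Return (first, end) of the next run of extents whose resolved state is in
    `wanted`, with `end` exclusive, or None if no such run exists."""
    first = None
    for extent in range(start_extent, extent_count):
        hit = resolve_state(chain, extent, depth) in wanted
        if hit and first is None:
            first = extent
        elif not hit and first is not None:
            return (first, extent)
    return (first, extent_count) if first is not None else None


# Made-up tables: the last file first, then its parent.
chain = [
    {0: "zero", 1: "transparent", 2: "mapped"},
    {1: "mapped", 3: "uninitialized"},
]
DEFINED = {"mapped", "zero"}  # states treated as "defined data" in this sketch
```

With a depth of 2, the next defined range starting at extent 0 spans extents 0 through 2, because extent 1 resolves to the parent's mapped state; with a depth of 1 the range ends after extent 0.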
现在转到图8,它示例了虚拟盘解析器404响应于向虚拟盘402保存的文件或其它数据、可以将虚拟盘扩展如何从一个状态向另一个转变的具体实例。例如,假设用户使用虚拟机410内的数据库管理程序并创建数据库。用户可以把数据库保存在文件中,且文件系统414可以确定在虚拟盘402上何处保存文件802。文件系统414可以发出用以把文件802写入到例如落在虚拟盘扩展3-5内的扇区的一个或更多个盘写入。在此实例中,虚拟盘扩展3被“映射”,且虚拟盘解析器404可以把文件802的第一个部分写入到通过分配表416识别的区段。Turning now to FIG. 8 , which illustrates a specific example of how virtual disk parser 404 may transition a virtual disk extent from one state to another in response to a file or other data being saved to virtual disk 402. For example, assume a user uses a database manager within virtual machine 410 and creates a database. The user may save the database in a file, and file system 414 may determine where to save file 802 on virtual disk 402. File system 414 may issue one or more disk writes to write file 802 to sectors falling within virtual disk extents 3-5, for example. In this example, virtual disk extent 3 is "mapped," and virtual disk parser 404 may write the first portion of file 802 to the sectors identified by allocation table 416.
Virtual disk extents 4 and 5, on the other hand, are in the "zero" and "uninitialized" states, respectively. In this example, virtual disk parser 404 can select an unused section of virtual disk file 702 to back virtual disk extent 4 and determine that virtual disk extent 4 is in the zero state. In response to this determination, virtual disk parser 404 can zero the section that is going to be used to describe virtual disk extent 4, or locate a section that is already all zeros. After the process of locating a zeroed section or zeroing the section is complete, virtual disk parser 404 can generate information identifying the virtual disk file byte offset of the first byte of the section, which indicates where virtual disk extent 4 is described in virtual disk file 702, and store that information in allocation table 416. Virtual disk parser 404 can then change the state information associated with virtual disk extent 4 to indicate that it is "mapped." The portion of the write directed to extent 4 can then be written to the located section.
可替选地,对于涵盖当前处于零状态中虚拟盘的整个扩展的写入的一部分,可以选取虚拟盘文件的定位区段,可以向该区段发出写入的部分,在该写入完成后,虚拟盘解析器404可以随后改变与虚拟盘扩展相关联的状态信息以表明该扩展是“映射”的。可替选地,对于只涵盖当前处于零状态的虚拟盘扩展一部分的写入的一部分,可以选取虚拟盘文件的定位区段,可以向该区段发出该写入的部分,可以向区段的剩余部分发出为零写入,在写入完成后,虚拟盘解析器404可以随后改变与虚拟盘扩展相关联的状态信息以表明扩展是“映射”的。本领域技术人员将会认识到,可以使用刷新或直写式(write-through)写入(如,迫使单元访问(force-unit-access)写入)强制写入的给定排序。Alternatively, for a portion of a write covering an entire extent of a virtual disk currently in a zeroed state, a positioned segment of the virtual disk file may be selected, the portion of the write may be issued to that segment, and after the write is complete, virtual disk parser 404 may subsequently change the state information associated with the virtual disk extent to indicate that the extent is "mapped." Alternatively, for a portion of a write covering only a portion of a virtual disk extent currently in a zeroed state, a positioned segment of the virtual disk file may be selected, the portion of the write may be issued to that segment, a zeroed write may be issued to the remainder of the segment, and after the write is complete, virtual disk parser 404 may subsequently change the state information associated with the virtual disk extent to indicate that the extent is "mapped." Those skilled in the art will recognize that a given ordering of writes may be enforced using flushes or write-through writes (e.g., force-unit-access writes).
Similarly, virtual disk parser 404 can select an unused section of virtual disk file 702 to back virtual disk extent 5 and determine, by consulting allocation table 416, that virtual disk extent 5 is in the uninitialized state. In response to this determination, virtual disk parser 404 can allocate the section to describe virtual disk extent 5 without modifying the contents of the selected section. Virtual disk parser 404 can generate information that identifies the virtual disk file byte offset of the first byte of the section, which indicates where virtual disk extent 5 is described within virtual disk file 702, and store the file byte offset of the section in allocation table 416. Virtual disk parser 404 can then change the state information associated with virtual disk extent 5 to indicate that it is "mapped."
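By way of illustration only, the allocate-on-write behavior described for the zero and uninitialized states can be sketched as follows. The names `State`, `EXTENT_SIZE`, and `allocate_on_write`, the dictionary layout of the table, and the list of free offsets are hypothetical and are not taken from the specification; the sketch only shows that a section is cleared before use for a zero extent and left untouched for an uninitialized one.

```python
from enum import Enum


class State(Enum):
    MAPPED = "mapped"
    ZERO = "zero"
    UNINITIALIZED = "uninitialized"


EXTENT_SIZE = 8  # bytes per extent; a tiny value assumed for illustration


def allocate_on_write(file_bytes, table, free_offsets, extent):
    """Back `extent` with an unused section and mark the extent as mapped.

    file_bytes   -- bytearray standing in for the virtual disk file
    table        -- dict: extent -> {"state": State, "offset": int or None}
    free_offsets -- list of byte offsets of unused sections (hypothetical)
    """
    entry = table[extent]
    if entry["state"] is State.MAPPED:
        return entry["offset"]  # already backed by a section
    offset = free_offsets.pop(0)
    if entry["state"] is State.ZERO:
        # Zero state: stale bytes must never be readable, so clear the section.
        file_bytes[offset:offset + EXTENT_SIZE] = bytes(EXTENT_SIZE)
    # Uninitialized state: the section is used as-is, with no clearing write.
    entry["offset"] = offset
    entry["state"] = State.MAPPED
    return offset
```

In this sketch the saving for the uninitialized case is simply the skipped clearing write; a real parser would also have to order the table update against the data write, for example with flushes or write-through writes.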
FIG. 9 illustrates another specific example of how virtual disk parser 404 can transition virtual disk extents from one state to another in response to a delete operation on file 802 and an operation that zeroes the contents of virtual disk extent 7. For example, a user may have deleted file 802, and file system 414 may have issued a trim command. In this example, virtual disk parser 404 can receive a trim command that includes a range of virtual disk sectors that fully covers virtual disk extent 4 and partially covers virtual disk extents 3 and 5. In response to a determination that virtual disk extent 4 was fully trimmed, virtual disk parser 404 can be configured to remove the link from allocation table 416 and transition virtual disk extent 4 to a state that indicates that virtual disk file 702 is not backing this virtual disk extent. As shown by the allocation table entry for virtual disk extent 4, the state to which virtual disk parser 404 transitions the virtual disk extent depends on which states virtual disk parser 404 is configured to use and on whether file system 414 issued a free space trim command or a standard trim command. For example, virtual disk parser 404 can be configured to use two states, mapped and zero, to describe virtual disk extents. Alternatively, virtual disk parser 404 can be configured to use three states: mapped, zero, and unmapped. Alternatively, virtual disk parser 404 can be configured to use four states: mapped, zero, unmapped, and uninitialized. The distinction between unmapped and uninitialized corresponds to the distinction between a standard trim and a free space trim. If the parser is configured not to use the uninitialized state, a free space trim is treated as a normal trim. As shown by the figure, the portions of file 802 are still stored in virtual disk file 702, because clearing them from virtual disk file 702 would be inefficient.
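The dependence of the target state on the parser configuration and on the kind of trim can be sketched as below. This is a minimal illustration, not the claimed implementation: the function name `state_after_full_trim` and the use of plain strings for states are hypothetical, and the fallback to the zero state in a two-state configuration is an assumption based on the statement elsewhere in the description that a standard trim can leave an extent "unmapped or zero."

```python
def state_after_full_trim(configured_states, free_space_trim):
    """Choose the state for an extent whose whole sector range was trimmed.

    configured_states -- set of state names the parser is configured to use
    free_space_trim   -- True for a free space trim, False for a standard trim
    """
    if free_space_trim and "uninitialized" in configured_states:
        return "uninitialized"
    # Without the uninitialized state, a free space trim is handled like a
    # standard trim.
    if "unmapped" in configured_states:
        return "unmapped"
    # Assumed fallback for the two-state (mapped, zero) configuration.
    return "zero"
```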
Because the trim partially covers virtual disk extent 5, virtual disk parser 404 can handle this extent in one of several ways. In one configuration, virtual disk parser 404 can leave extent 5 in the mapped state. In this configuration, virtual disk parser 404 transitions an extent only when trim information is received for the entire extent. Alternatively, virtual disk parser 404 can track trim information that partially covers an extent in the hope of receiving further trim information indicating that the space describing the extent can be freed.
Similarly, the trim also partially covers a virtual disk extent. In this example, virtual disk parser 404 can leave it in the mapped state and can also be configured to send trim information describing the portion of virtual disk file 702 that is no longer in use to the underlying file system (e.g., virtualization system file system 408, storage server file system 504, or computer system file system 514).
In addition to the deletion of file 802, FIG. 9 shows an example in which a virtual disk extent is zeroed. Virtual disk parser 404 can scan IO tasks issued by file system 414 and detect one indicating that the entire range of virtual disk extent 7 is zeroed. In response to this determination, virtual disk parser 404 can be configured to remove the link from allocation table 416 and transition virtual disk extent 7 to the zero state. As shown by the figure, the previous contents of virtual disk extent 7 are still stored in virtual disk file 702.
Turning to FIG. 10, it illustrates virtual disk 402 described at least in part by a group of virtual disk files 1002, 1004, and 1006 (which can be similar to the chain of virtual disk files defined by virtual disk files 608, 604, and 600). In this exemplary embodiment, the data that represents virtual disk 402 is broken up across multiple virtual disk files. In this exemplary embodiment, when virtual disk parser 404 attempts to read virtual disk extents 1 and 2, virtual disk parser 404 can access the allocation table of virtual disk file 1002 and determine that these extents are transparent. Next, virtual disk parser 404 can access the allocation table of virtual disk file 1004 and determine that these extents are transparent. Finally, virtual disk parser 404 can access the allocation table of grandparent virtual disk file 1006 and determine that these virtual disk extents are defined.
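The walk down a chain of virtual disk files can be sketched as follows, under stated assumptions: each allocation table is modeled as a hypothetical dictionary from extent number to a state string, the chain is ordered from the newest file to the oldest, and an extent missing from a table is treated as transparent. None of these representation choices comes from the specification.

```python
def resolve_extent(chain, extent):
    """Return (depth, state) for the first file in `chain` that defines `extent`.

    chain -- list of allocation tables, child first and oldest ancestor last;
             each table is a dict: extent -> state string (assumed layout)
    Returns None when the extent is transparent in every file of the chain.
    """
    for depth, table in enumerate(chain):
        state = table.get(extent, "transparent")
        if state != "transparent":
            return depth, state
    return None
```

For the example in the text, a chain whose first two tables mark extents 1 and 2 as transparent and whose third table marks them as mapped resolves both extents at depth 2, the grandparent file.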
The following are a series of flowcharts depicting operational procedures. For ease of understanding, the flowcharts are organized such that the initial flowcharts present implementations from an overall "big picture" viewpoint, and subsequent flowcharts provide further additions and/or details, illustrated in dashed lines. Furthermore, those skilled in the art will appreciate that the operational procedures depicted by dashed lines are considered optional.
Referring now to FIG. 11, it illustrates an operational procedure for reclaiming space within a virtual disk file, including operations 1100, 1102, 1104, and 1106. Operation 1100 begins the operational procedure, and operation 1102 shows instantiating (1102) a virtual disk (402) that includes a virtual disk extent, the virtual disk extent being dissociated from the virtual disk file. Turning briefly to FIG. 4, FIG. 5A, or FIG. 5B, virtual disk 402 can be instantiated by virtual disk parser 404 (e.g., executable instructions and associated instance data), which exposes the data stored within one or more virtual disk files as a logical hard drive that can be configured to handle read/write operations from file system 414 by emulating the behavior of a hard drive. Virtual disk file 406 (which can be one or more files, as illustrated in FIG. 6) can store what is typically found on a physical hard drive, i.e., disk partitions, file systems, etc. Turning to FIG. 7, virtual disk 402 is shown as including a plurality of extents, some of which are dissociated from any section of virtual disk file 702.
In a specific example, suppose the extents are blocks. In this example, allocation table 416, which can be loaded into random access memory from one or more sections of virtual disk file 702, can be used to store information that links the blocks of virtual disk 402 to extent-sized (e.g., block-sized) sections of virtual disk file 702. Allocation table 416 can also store state information for each virtual disk block of virtual disk 402. Virtual blocks that potentially include non-zero data can be associated with state information that indicates that the block is in the mapped state. That is, a section of virtual disk file 702 has been allocated to describe the block of virtual disk 402 (i.e., to store the data of that block). Virtual disk blocks 0-3 and 7 are examples of blocks in this state. As shown by the figure, virtual disk blocks 4, 5, 6, 8, and 9 can be valid virtual disk blocks; however, these virtual disk blocks do not have any space allocated within virtual disk file 702. Because file system 414 can write to these blocks, in an exemplary embodiment these virtual disk blocks can be associated with information that virtual disk parser 404 can use to determine how to respond to read and/or write operations directed to them.
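As an illustration of how such a table links virtual disk blocks to sections of the file, the sketch below translates a byte offset on the virtual disk into a byte offset in the virtual disk file. The tuple layout `(state, section_offset)`, the function name, and the block size used in the usage note are assumptions made for the example only.

```python
def file_offset_for(table, block_size, disk_offset):
    """Translate a virtual disk byte offset into a virtual disk file byte offset.

    table -- dict: block number -> (state, section_offset), where
             section_offset is the file byte offset of the section's first
             byte, or None when no section is allocated (assumed layout)
    Returns None when the block has no space allocated in the file.
    """
    block, within = divmod(disk_offset, block_size)
    state, section_offset = table[block]
    if state != "mapped":
        return None  # the parser answers from state information instead
    return section_offset + within
```

With a block size of 1024 and block 0 mapped to file offset 4096, disk offset 100 translates to file offset 4196, whereas any offset inside a block in the zero state yields `None`.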
Referring briefly back to FIG. 11, operation 1104 shows that the computer system can additionally include circuitry for allocating, based on state information associated with the virtual disk (402), a section of the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) to describe the virtual disk extent without overwriting a preexisting bit pattern within the section of the virtual disk file. For example, and returning to FIG. 8, virtual disk parser 404 can receive an IO task to write to a portion of the virtual disk extent. In response to receipt of the write IO task, virtual disk parser 404 can check allocation table 416, determine that no space within virtual disk file 702 has been allocated to describe the virtual disk extent, and allocate a section of virtual disk file 406 to back the virtual disk extent. Consequently, the data that file system 414 writes to the virtual disk extent will be stored by virtual disk parser 404 in the section of virtual disk file 702.
In this example, based on the state information in allocation table 416, virtual disk parser 404 does not overwrite any data already stored in the section of virtual disk file 702 (by writing all zeros, all ones, or any other non-information-disclosing bit pattern) before using it to describe the virtual disk extent. In an exemplary configuration, the state information can indicate that file system 414 is securing access to this virtual disk extent because the virtual disk extent is covered by file system free space. In a specific example, the state information can indicate that the virtual disk extent is in the "uninitialized" state. Allocating the virtual disk extent without clearing it provides the added benefit of saving the processor cycles and IO tasks that would otherwise be spent overwriting the section of virtual disk file 702.
In a specific example of operation 1104, and turning to FIG. 7, suppose the extents are blocks and file system 414 sends an IO task to virtual disk 402 to write a bit pattern indicative of file 802 to virtual disk blocks 3-5. In response to receipt of such an IO task, virtual disk parser 404 can determine that virtual disk block 5 is not backed by any section of virtual disk file 406 and that it is uninitialized. In response to this determination, virtual disk parser 404 can be configured to allocate a section of virtual disk file 702 to describe virtual disk block 5 and write a portion of the bit pattern indicative of file 802 therein, without overwriting the data previously stored in the part of the section that the IO task does not cover.
Turning again to FIG. 11, operation 1106 shows that the computer system can additionally include circuitry configured to modify (1106) the state information associated with the virtual disk extent to indicate that the virtual disk extent is described by the virtual disk file. For example, and turning back to FIG. 8, virtual disk parser 404 can modify (e.g., overwrite in memory) the state information associated with virtual disk extent 5 to reflect that virtual disk file 702 is describing the virtual disk extent. In one configuration, the write and the modification of the state information can occur concurrently. For example, virtual disk parser 404 can store information in allocation table 416 indicating that virtual disk extent 5 is "mapped." Consequently, subsequent read operations directed to sectors of virtual disk extent 5 will be handled by virtual disk parser 404 by returning the bit pattern stored at the byte offset identified in allocation table 416. Virtual disk parser 404 can concurrently write data (e.g., the bit pattern associated with the write operation that triggered this procedure) to the section of virtual disk file 702 allocated to describe the virtual disk extent and issue an IO task to virtualization system file system 408, storage server file system 504, or computer system file system 514 to write the bit pattern to the section of virtual disk file 702. At some point in time, e.g., before completion of a subsequently issued flush command, the bit pattern will be persisted in persistent storage unit 460.
Turning now to FIG. 12, it shows additional operations that can be performed in conjunction with those illustrated by FIG. 11. Turning to operation 1208, it shows that the computer system can include circuitry for responding to an offset query command with information that identifies sectors of the virtual disk that are non-zero, sectors of the virtual disk that are in a non-transparent state, sectors of the virtual disk that are in a mapped state, and/or sectors of the virtual disk that are in an initialized state. For example, virtual disk parser 404 can be configured to receive a command to generate information about virtual disk 402, such as, given a starting byte offset, the next byte offset on the virtual disk that is in a non-transparent state (i.e., a state other than transparent), a mapped state (i.e., sectors of virtual disk 402 that include data in virtual disk file 406), a defined state (i.e., sectors of virtual disk 402 that are mapped or zero), and/or an initialized state (i.e., a state other than uninitialized). The command can be depth limited in that only a specified number of virtual disk files are examined, and any ranges that remain transparent after the specified number of virtual disk files have been examined are reported back to the requestor in addition to the ranges indicated by the state query (regardless of which state query was requested). In response to receipt of such a command, virtual disk parser 404 can start at the initial byte offset on virtual disk 402 and build up a response range or set of ranges until the range associated with the command has been examined, and then return the requested information.
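A simplified form of the offset query can be sketched as below. It is a sketch under assumptions: it covers a single virtual disk file only (no depth limiting), the per-extent states are held in a hypothetical list of strings, and the query kind is passed as a predicate supplied by the caller rather than as a command code.

```python
def next_offset_in_state(states, extent_size, start, wanted):
    """Return the first byte offset >= start whose extent satisfies `wanted`.

    states      -- list of state names indexed by extent number (assumed)
    extent_size -- size of one extent in bytes
    wanted      -- predicate on a state name, e.g. lambda s: s == "mapped"
    Returns None when no extent at or after `start` matches.
    """
    first = start // extent_size
    for index in range(first, len(states)):
        if wanted(states[index]):
            # If `start` already lies inside a matching extent, answer `start`.
            return max(start, index * extent_size)
    return None
```

The "defined" query from the text would be expressed here as `lambda s: s in ("mapped", "zero")`, and the "initialized" query as `lambda s: s != "uninitialized"`.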
Continuing with the description of FIG. 12, operation 1210 shows sending a request to a file system (414) controlling the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) to issue at least one command selected from a group of commands including a trim command, an unmap command, a write same of zero command, and a zero token offload write command. Referring back to FIG. 4, FIG. 5A, or FIG. 5B, virtual disk parser 404 can be configured to issue a request to file system 414. The request in this example can be for file system 414 to issue a trim command. For example, virtual disk parser 404 can issue one or more requests to file system 414 periodically, shortly after instantiation of virtual disk 402, and/or before virtual machine 410 is shut down, hibernated, etc. In response to such a request, file system 414 can determine which sectors of virtual disk 402 it is no longer using and send virtual disk parser 404 one or more trim commands identifying these unused sectors. Virtual disk parser 404 can thereby receive trim information, such as a list of the sector ranges that file system 414 is no longer using and whether file system 414 is preventing reads from those ranges of sectors in order to secure access to them. Virtual disk parser 404 can receive the information and transition the virtual disk extents covered by the ranges into states in which space within virtual disk file 702 can be reclaimed.
Continuing with the description of FIG. 12, operation 1212 shows that the computer system can include circuitry for determining, in response to receipt of a request to trim a portion of a second virtual disk extent, the portion of the virtual disk file that corresponds to the portion of the second virtual disk extent; and circuitry for sending a trim command for the determined portion of the virtual disk file to a file system configured to store the virtual disk file on a storage device. For example, and referring to FIG. 8, file system 414 can issue a trim command that identifies a portion of a virtual disk extent; for instance, the trim command may identify only a range of sectors that corresponds to part of the sectors that form one or more virtual disk blocks. In a specific example, suppose file system 414 trims the space used to store file 802. As such, the trim command may identify only a portion of the sectors that make up virtual disk extent 3. In this example, virtual disk parser 404 can determine that the range of sectors covers a subsection of the virtual disk extent and use the mapping information in allocation table 416 to determine the portion of virtual disk file 702 that corresponds to the trimmed sectors of the virtual disk extent. Virtual disk parser 404 can issue a request to virtualization system file system 408 or storage server file system 504 to trim the portion of virtual disk file 702 that corresponds to the trimmed sectors of the virtual disk extent. Virtualization system file system 408 or storage server file system 504 can be configured to use and benefit from the trim command by trimming part of the sectors backing virtual disk file 406, flushing data from caches, clearing internal buffers, and so on.
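The translation of a partial trim into a byte range of the virtual disk file can be sketched as follows. This is an illustrative calculation only: the function and parameter names are hypothetical, and the sector and extent sizes in the usage note are assumed values rather than figures from the specification.

```python
def file_range_for_trim(section_offset, extent_index, extent_sectors,
                        sector_size, first_sector, sector_count):
    """Map the part of a trim that falls inside one extent to a file byte range.

    section_offset -- file byte offset of the section backing the extent
    extent_index   -- number of the virtual disk extent being examined
    extent_sectors -- sectors per extent; sector_size -- bytes per sector
    first_sector, sector_count -- trimmed range in virtual disk sectors
    Returns (file_byte_offset, byte_length), or None if there is no overlap.
    """
    extent_start = extent_index * extent_sectors
    extent_end = extent_start + extent_sectors
    lo = max(first_sector, extent_start)
    hi = min(first_sector + sector_count, extent_end)
    if lo >= hi:
        return None
    return (section_offset + (lo - extent_start) * sector_size,
            (hi - lo) * sector_size)
```

Assuming 8 sectors of 512 bytes per extent and extent 3 backed by the section at file offset 65536, a trim starting at sector 28 overlaps the last four sectors of extent 3 and yields the file range `(67584, 2048)`, which is what would be forwarded to the underlying file system.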
Alternatively, virtual disk parser 404 can store information indicating that a portion of the virtual disk extent was trimmed, along with information indicating whether it was a free space trim. As guest operating system 412 or operating system 508 runs, it may eventually zero or trim the remainder of the virtual disk extent. In response to this occurring, virtual disk parser 404 can determine to transition the virtual disk extent into a state in which it is not described by virtual disk file 702 and select the state based on how the different portions of the virtual disk extent were trimmed or zeroed. When the different portions of the virtual disk extent could be placed in different non-described states, virtual disk parser 404 can be configured to transition the virtual disk extent to the most restrictive of them, where the zero state is the most restrictive, the uninitialized state is the least restrictive, and the unmapped state is somewhere in between. For example, if a first portion was zeroed and the remainder is uninitialized, virtual disk parser 404 can transition the entire virtual disk extent to the zero state.
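The "most restrictive state" rule can be sketched as a simple ranking. The numeric ranks and the names `RESTRICTIVENESS` and `merged_state` are hypothetical; only the ordering (uninitialized, then unmapped, then zero) is taken from the text.

```python
# Higher rank means more restrictive; the ordering follows the description.
RESTRICTIVENESS = {"uninitialized": 0, "unmapped": 1, "zero": 2}


def merged_state(part_states):
    """Return the single state for an extent whose parts are in `part_states`.

    part_states -- non-empty iterable of non-described state names
    """
    return max(part_states, key=lambda state: RESTRICTIVENESS[state])
```

For the example in the text, `merged_state(["zero", "uninitialized"])` selects the zero state for the whole extent.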
Continuing with the description of FIG. 12, operation 1214 illustrates that computer system 400 can additionally include circuitry configured to, in response to receipt of a request to trim a range of sectors that covers the virtual disk extent, dissociate the virtual disk extent from the section of the virtual disk file and modify the state information associated with the virtual disk extent to indicate that the virtual disk extent has no associated space in the virtual disk file. For example, and turning to FIG. 9, virtual disk parser 404 can remove the link in allocation table 416 that ties the virtual disk extent to a section of virtual disk file 702. This operation has the effect of dissociating the virtual disk extent from virtual disk file 702. In addition to removing the link, virtual disk parser 404 can modify the state information associated with the virtual disk extent to indicate that the extent has no associated space within virtual disk file 702, i.e., virtual disk parser 404 can place the virtual disk extent into the unmapped, uninitialized, or zero state.
Virtual disk parser 404 can remove the link and update the state information in response to receipt of a request to trim or zero the sectors of the virtual disk extent. For example, a request to trim or zero sectors can be received that identifies a range of byte offsets that may cover one or more virtual disk extents. In response to receipt of such an IO task, virtual disk parser 404 can determine that the request covers the sectors of the virtual disk extent and perform the aforementioned operations for removing the link and updating the state information.
In a specific example, suppose the IO task indicates that the trim is a free space trim. For example, a user may have deleted file 802, which was stored as a bit pattern on virtual disk extents 3-5, and file system 414 may indicate that it no longer uses the space. In response to receipt of the free space trim command, virtual disk parser 404 can access allocation table 416 and determine that file system 414 has trimmed part of extents 3 and 5 and all of extent 4. In this example, virtual disk parser 404 can remove the link that maps virtual disk extent 4 to virtual disk file 702 and modify the state information associated with virtual disk extent 4 to indicate that the virtual disk extent is uninitialized. This section of virtual disk file 702 can now be reused to back other virtual disk extents. In addition, virtual disk parser 404 can determine that virtual disk extents 3 and 5 are the subject of a partial trim command. In this example, virtual disk parser 404 can use allocation table 416 to discover the virtual disk file byte offsets of the portions of virtual disk file 702 that describe the trimmed portions of virtual disk extents 3 and 5, and issue a trim command describing those virtual disk file byte offsets to virtualization system file system 408, storage server file system 504, or computer system file system 514.
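The first step of the handling described above, deciding which extents a trim range covers completely and which only partially, can be sketched as follows. The function name and the extent size used in the usage note are assumptions for illustration.

```python
def classify_trim(first_sector, sector_count, extent_sectors):
    """Split the extents touched by a trim into fully and partially covered.

    first_sector, sector_count -- trimmed range in virtual disk sectors
    extent_sectors             -- sectors per virtual disk extent
    Returns (fully_covered, partially_covered) as lists of extent numbers.
    """
    full, partial = [], []
    if sector_count <= 0:
        return full, partial
    end = first_sector + sector_count  # exclusive end of the trimmed range
    first_extent = first_sector // extent_sectors
    last_extent = (end - 1) // extent_sectors
    for extent in range(first_extent, last_extent + 1):
        start = extent * extent_sectors
        if first_sector <= start and start + extent_sectors <= end:
            full.append(extent)  # candidate for dissociation from the file
        else:
            partial.append(extent)  # stays mapped; the trim may be forwarded
    return full, partial
```

Assuming 8 sectors per extent, a trim of 16 sectors starting at sector 28 returns `([4], [3, 5])`, matching the example in which extent 4 is dissociated while extents 3 and 5 are only partially trimmed.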
In another specific example, suppose the IO task issued by file system 414 indicates that file 802 was zeroed. For example, file 802 could be a database file storing sensitive information such as credit card numbers, and an administrator decided to zero out the contents of the file by writing all zeros to it, i.e., by issuing write commands with an all-zero buffer that write zeros over the data existing in file 802. In response to receipt of such an IO task, virtual disk parser 404 can be configured to determine that virtual disk extent 4 was zeroed and that this extent can be reclaimed. In this example, virtual disk parser 404 can remove the link that maps virtual disk extent 4 to virtual disk file 702 and modify the state information associated with virtual disk extent 4 to indicate that the virtual disk extent is zero. This section of virtual disk file 702 can now be reused to back other virtual disk extents, and virtual disk parser 404 can respond to subsequent read operations directed to virtual disk extent 4 by replying with all zeros.
In another specific example, a user may write bulk zeros to initialize the state of file 802 rather than to overwrite data stored in it. In this example, the extent can be transitioned to the zero state using a command such as a trim command in the case where virtual disk parser 404 reports that trimmed sections read as zero, an unmap command in the case where virtual disk parser 404 reports that unmapped regions are zero, a write same of zero command, and/or a zero token offload write command.
In a specific example, suppose the IO task indicates that the trim is a standard trim. For example, a user may have deleted file 802, which was stored as a bit pattern on virtual disk extents 3-5; however, the trim command does not indicate whether file system 414 is using the space. In response to receipt of the standard trim command, virtual disk parser 404 can access allocation table 416 and determine that file system 414 has trimmed part of extents 3 and 5 and all of extent 4. In this example, virtual disk parser 404 can remove the link that maps virtual disk extent 4 to virtual disk file 702 and modify the state information associated with virtual disk extent 4 to indicate that the virtual disk extent is unmapped or zero. This section of virtual disk file 702 can now be reused to describe other virtual disk extents. In addition, virtual disk parser 404 can determine that virtual disk extents 3 and 5 are the subject of a partial trim command. In this example, virtual disk parser 404 can use allocation table 416 to discover the virtual disk file byte offsets of the portions of virtual disk file 702 that describe the trimmed portions of virtual disk extents 3 and 5, and issue a trim command specifying those virtual disk file byte offsets (typically in the form of ranges) to virtualization system file system 408.
Referring now to FIG. 13, it illustrates additional operations that can be performed in addition to operation 1214 of FIG. 12. Operation 1316 illustrates that the computer system can include circuitry for receiving a request to write data to the virtual disk extent; circuitry for zeroing an unused section of the virtual disk file based on state information associated with the virtual disk extent, the state information indicating that the virtual disk extent was zeroed; and circuitry for allocating the unused section of the virtual disk file to describe the virtual disk extent. In the context of FIG. 9, virtual disk parser 404 can receive a request to write data to a virtual disk extent (e.g., virtual disk extent 4 of FIG. 9, which in this example is associated with state information indicating that the virtual disk extent was zeroed). For example, when virtual disk extent 4 was dissociated, virtual disk parser 404 may have determined that the virtual disk extent was zeroed, i.e., an application wrote all zeros to file 602 using an offload write of a well-known zero token.
In response to determining that the virtual disk extent is in the zero state, virtual disk parser 404 can identify an unused section of virtual disk file 702 (i.e., a section that is not actively being used to describe a virtual disk extent and is not actively being used to store any allocated metadata) and use that section to back the virtual disk extent. The virtual disk parser further guarantees that any read from a not-yet-written sector of the newly allocated extent reads as all zeros. Virtual disk parser 404 can write the payload of the IO write task to the section; update the state information to indicate that the virtual disk extent is mapped; and update the information in allocation table 416 to describe the virtual disk file byte offset that identifies the start of the section used to store virtual disk extent 4. Virtual disk parser 404 can also create a log entry which guarantees that, if the system fails and restarts before the write is flushed, the not-yet-written sectors of the newly allocated extent still read as all zeros and the written sectors of the newly allocated extent read as either all zeros or the written data. After the first subsequent flush command, virtual disk parser 404 guarantees that, following a system failure that occurs after the flush completes, reads from previously written sectors of the newly allocated extent return the written data and reads from not-yet-written sectors of the newly allocated extent return zeros.
Continuing with the description of FIG. 13, operation 1318 shows that the computer system can include circuitry for receiving a request to write to a virtual disk extent; and circuitry for allocating, based on state information associated with the virtual disk extent, an unused section of the virtual disk file to describe the virtual disk extent without modifying the contents of the unused section, the state information indicating that the file system is securing access to the virtual disk extent. Referring again to the context of FIG. 9, virtual disk parser 404 can receive an IO task to write data to a virtual disk extent (e.g., virtual disk extent 4 of FIG. 9, which in this example is associated with state information indicating that file system 414 is providing security for the virtual disk extent). In response to detecting this state information, virtual disk parser 404 can identify an unused section of virtual disk file 702; write the payload of the IO task to the section; update the state information to indicate that the virtual disk extent is mapped; and update the information in allocation table 416 to describe the virtual disk file byte offset that identifies the start of the section used to store virtual disk extent 4.
假设在此实例中所述扩展是块且IO任务的载荷只涵盖虚拟盘块中扇区的一部分。具体地,虚拟盘块可以是512千字节,所述写入可以涵盖虚拟盘块的前500个扇区。在此实例中,虚拟盘解析器404可以在不擦除剩余524个扇区中存储的数据的情况下在虚拟盘文件702分配区段的前500个扇区中写入数据。因而,如果检查了此区段则将会得到前500个扇区包括载荷而剩余524个扇区包括先前向虚拟盘文件702的区段写入了无论什么位模式。在此实例中,虚拟盘解析器404可以使用此区段而不清除它,因为文件系统414被配置成拒绝对在文件系统自由空间中的扇区的读取操作。由于将会防止应用读取虚拟盘块的剩余524个区段,所以它可以包含先前在虚拟盘中存储的任何数据。Assume in this example that the extent is a block and the I/O task's payload only covers a portion of the sectors in the virtual disk block. Specifically, the virtual disk block may be 512 kilobytes, and the write may cover the first 500 sectors of the virtual disk block. In this example, virtual disk parser 404 can write data to the first 500 sectors of the allocated section of virtual disk file 702 without erasing the data stored in the remaining 524 sectors. Therefore, if this section is examined, the first 500 sectors will contain the payload, while the remaining 524 sectors will contain whatever bit pattern was previously written to the section of virtual disk file 702. In this example, virtual disk parser 404 can use this section without clearing it because file system 414 is configured to reject read operations on sectors in file system free space. Since the application will be prevented from reading the remaining 524 sectors of the virtual disk block, it may contain any data previously stored on the virtual disk.
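The sector counts in this example follow from simple arithmetic, sketched below in Python. The 512-byte sector size is an assumption; the text gives only the 512-kilobyte block size and the 500 and 524 sector counts, which are consistent with it. The helper name `untouched_sectors` is hypothetical.

```python
def untouched_sectors(block_bytes, written_sectors, sector_size=512):
    """Return (sectors per block, sectors left untouched by the write).

    sector_size defaults to 512 bytes, which is an assumption; the text
    does not state the sector size explicitly.
    """
    sectors_per_block = block_bytes // sector_size
    if not 0 <= written_sectors <= sectors_per_block:
        raise ValueError("write does not fit inside the block")
    return sectors_per_block, sectors_per_block - written_sectors


# A 512 KB block written with a 500-sector payload.
total, remaining = untouched_sectors(512 * 1024, 500)
```

Here `total` is 1024 and `remaining` is 524, the number of sectors that keep whatever bit pattern the section of the virtual disk file held before it was allocated.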
Turning now to operation 1320 of FIG. 13, it shows that the computer system can be configured to include circuitry for receiving a request to write to a virtual disk extent; circuitry for logically overwriting, based on state information associated with the virtual disk extent, an unused section of the virtual disk file with a non-information-disclosing bit pattern, the state information indicating that the file system is not securing access to the virtual disk extent; and circuitry for allocating the overwritten section of the virtual disk file to describe the virtual disk extent. Referring again to the context of FIG. 9, virtual disk parser 404 can receive a request to write data to a virtual disk extent that, in this example, is associated with state information indicating that file system 414 is not securing access to the virtual disk extent. For example, virtual disk parser 404 may have freed the virtual disk extent in response to receiving a standard trim command, and may have stored state information in allocation table 416 indicating that the virtual disk extent is unmapped (i.e., not backed by space in virtual disk file 702).
响应于确定虚拟盘扩展未映射,虚拟盘解析器404可以识别要使用的虚拟盘文件702的未使用区段以描述虚拟扩展和向该区段在逻辑上写入非信息公开位模式以保证对虚拟盘扩展的读取不无意中泄漏任何信息。在优选实施方式中,非信息公开位模式可以是全零或先前存储的数据。在使区段为零或向该区段在逻辑上写入一些其它非信息公开位模式(诸如先前存储数据)之后,虚拟盘解析器404可以把IO任务的载荷在逻辑上写入到区段;更新状态信息以表明虚拟盘扩展被映射;以及更新分配表416中的信息以描述虚拟盘文件字节偏移量,该偏移量识别用来存储虚拟盘扩展的区段的开始。In response to determining that the virtual disk extent is unmapped, virtual disk parser 404 may identify an unused section of virtual disk file 702 to be used to describe the virtual extent and logically write a non-information-disclosing bit pattern to the section to ensure that reading the virtual disk extent does not inadvertently leak any information. In a preferred embodiment, the non-information-disclosing bit pattern may be all zeros or previously stored data. After zeroing the section or logically writing some other non-information-disclosing bit pattern (such as previously stored data) to the section, virtual disk parser 404 may logically write the payload of the IO task to the section; update status information to indicate that the virtual disk extent is mapped; and update information in allocation table 416 to describe the virtual disk file byte offset that identifies the beginning of the section used to store the virtual disk extent.
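The three write paths described for operations 1316, 1318, and 1320 differ only in what happens to the newly allocated section before the payload is written. The following Python sketch illustrates that difference under stated assumptions: the class and state names are hypothetical, extents are shrunk to 8 bytes so the result is easy to inspect, all zeros is used as the non-information-disclosing pattern, and the logging and flush guarantees described above are omitted.

```python
from enum import Enum, auto


class ExtentState(Enum):
    MAPPED = auto()
    ZERO = auto()       # extent was intentionally zeroed (operation 1316)
    FREE = auto()       # file system secures access to the extent (1318)
    UNMAPPED = auto()   # file system does not secure access (1320)


EXTENT_SIZE = 8  # bytes per extent; tiny on purpose for the demonstration


class VirtualDiskFile:
    def __init__(self, sections):
        # 0xAA bytes stand in for stale data left by earlier extents.
        self.data = bytearray(b"\xAA" * (sections * EXTENT_SIZE))
        self.free_sections = list(range(sections))
        self.table = {}  # extent number -> (state, file byte offset or None)

    def write(self, extent, offset, payload):
        if offset < 0 or offset + len(payload) > EXTENT_SIZE:
            raise ValueError("write does not fit inside the extent")
        state, file_off = self.table.get(extent, (ExtentState.UNMAPPED, None))
        if state is not ExtentState.MAPPED:
            file_off = self.free_sections.pop(0) * EXTENT_SIZE
            if state in (ExtentState.ZERO, ExtentState.UNMAPPED):
                # Clear the section so unwritten sectors read back as zeros.
                self.data[file_off:file_off + EXTENT_SIZE] = bytes(EXTENT_SIZE)
            # FREE: the section is used as-is; stale bytes are left in place.
            self.table[extent] = (ExtentState.MAPPED, file_off)
        start = file_off + offset
        self.data[start:start + len(payload)] = payload
```

Skipping the clear in the `FREE` case is the saving the text describes: the parser relies on the file system to refuse reads of its own free space, so the stale bytes are never exposed.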
继续图13的描述,操作1322示出了计算机系统可以包括被配置成基于表明使虚拟盘扩展为零的状态信息响应于与虚拟盘扩展相关联的卸载读取请求的接收向请求方发送表示零的令牌的电路。例如、以及参照图4,卸载提供器引擎422(例如,被配置成服务于卸载读取和卸载写入命令的电路)可以响应于请求方发出的卸载读取请求向请求方(例如,应用424)发送表示零的令牌。可以使用卸载读取请求通过生成和向请求方发送令牌从一个地点向另一个高效复制数据,令牌表示请求数据而非把数据复制到请求方的存储器中并随后把数据发送给目的地。卸载读取和卸载写入命令可以用来当目的地点辨识源地点生成的令牌时取得副本卸载以及可以把令牌表示的数据在逻辑上写入到目的地。在源生成的公知零令牌的情形中,目的地不需要访问底层存储,例如,存储装置106,其在此具体实施中可以是SAN目标。在此实例中,卸载读取请求可以是对具有一个或更多个虚拟盘扩展中存储的数据的一个或更多个文件执行卸载读取操作,其中之一与表明使虚拟盘扩展为零的状态信息相关联。在此实例中,可以通过生成公知零令牌值和向请求方返回该公知零令牌来服务于卸载读取请求。Continuing with the description of FIG. 13 , operation 1322 illustrates that the computer system may include circuitry configured to send a token representing zero to a requestor in response to receiving an offload read request associated with a virtual disk extent based on status information indicating zeroing the virtual disk extent. For example, and referring to FIG. 4 , offload provider engine 422 (e.g., circuitry configured to service offload read and offload write commands) may send a token representing zero to a requestor (e.g., application 424) in response to an offload read request issued by the requestor. Offload read requests can be used to efficiently copy data from one location to another by generating and sending a token to the requestor, the token representing the requested data, rather than copying the data to the requestor's memory and then sending the data to the destination. Offload read and offload write commands can be used to achieve copy offload when the destination recognizes the token generated by the source and can logically write the data represented by the token to the destination. In the case of a well-known zero token generated by the source, the destination does not need to access underlying storage, such as storage device 106, which in this embodiment may be a SAN target. 
In this example, the offload read request can be to perform an offload read operation on one or more files having data stored in one or more virtual disk extents, one of which is associated with state information indicating that the virtual disk extent is to be zeroed. In this example, the offload read request can be serviced by generating a well-known zero token value and returning the well-known zero token to the requester.
可以把卸载读取请求发送给卸载提供器引擎422。卸载提供器引擎422可以接收请求和向虚拟盘解析器404发送针对虚拟盘扩展中存储的数据的消息。虚拟盘解析器404可以接收请求,读取虚拟盘扩展的状态信息,以及在此具体实例中确定该状态信息表明使此虚拟盘扩展为零。虚拟盘解析器404可以向卸载提供器引擎422回送表明虚拟盘扩展是全零的消息,卸载提供器引擎422可以生成表明请求数据是全零(例如,描述虚拟盘块的扇区的范围是全零)的公知令牌值,以及把公知零令牌发送给请求方。The offload read request may be sent to the offload provider engine 422. The offload provider engine 422 may receive the request and send a message to the virtual disk parser 404 for the data stored in the virtual disk extent. The virtual disk parser 404 may receive the request, read the status information for the virtual disk extent, and determine, in this specific example, that the status information indicates that the virtual disk extent is to be zeroed. The virtual disk parser 404 may send a message back to the offload provider engine 422 indicating that the virtual disk extent is all zeros. The offload provider engine 422 may generate a well-known token value indicating that the requested data is all zeros (e.g., a range of sectors describing a virtual disk block is all zeros) and send the well-known zero token to the requester.
在具体实例中,可以向SAN转发卸载请求而非通过计算机系统400、存储业务500、或者计算机系统512处理。在此实例中,SAN可以生成令牌并把它返回给虚拟盘解析器404,其可以随后把零令牌发送给请求方。在又一实例中,当卸载提供器引擎422接收到表明虚拟盘扩展是全零的消息时,卸载提供器引擎422可以生成公知零令牌,其实际上通过把数据识别成与任何其它零数据等同并共享与公知零令牌相关联的区域、实现把请求的零数据在逻辑上复制到与令牌相关联的单独区域中。在卸载提供器引擎422随后接收到指定先前向请求方发送的令牌的卸载写入的情况下,卸载提供器引擎422可以把数据从与令牌相关联的区域在逻辑上复制到请求方指定的偏移量处。In a specific example, the offload request can be forwarded to the SAN rather than being processed by computer system 400, storage service 500, or computer system 512. In this example, the SAN can generate a token and return it to virtual disk parser 404, which can then send a zero token to the requester. In another example, when offload provider engine 422 receives a message indicating that the virtual disk extent is all zeros, offload provider engine 422 can generate a well-known zero token that effectively logically copies the requested zero data to a separate area associated with the token by identifying the data as equivalent to any other zero data and sharing the area associated with the well-known zero token. In the event that offload provider engine 422 subsequently receives an offload write specifying the token previously sent to the requester, offload provider engine 422 can logically copy the data from the area associated with the token to the offset specified by the requester.
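The token exchange can be sketched in Python as follows. This is a toy model, not the offload provider engine itself: the class `OffloadProvider`, the string state labels, and the 16-byte value used for `ZERO_TOKEN` are all hypothetical, and real tokens would carry more structure than an opaque byte string.

```python
import secrets

ZERO_TOKEN = b"\x00" * 16  # hypothetical value for the well-known zero token


class OffloadProvider:
    def __init__(self, extent_states, extent_data):
        self.extent_states = extent_states  # extent -> "zero" or "mapped"
        self.extent_data = extent_data      # extent -> bytes (mapped only)
        self.tokens = {}                    # issued token -> represented data

    def offload_read(self, extent):
        """Return a token that represents the extent's data."""
        if self.extent_states.get(extent) == "zero":
            # No storage access needed: every zero extent shares one token.
            return ZERO_TOKEN
        token = secrets.token_bytes(16)
        self.tokens[token] = self.extent_data[extent]
        return token

    def offload_write(self, token, dest_extent):
        """Logically copy the data a token represents into dest_extent."""
        if token == ZERO_TOKEN:
            # Record the zero state instead of writing zeros anywhere.
            self.extent_states[dest_extent] = "zero"
            self.extent_data.pop(dest_extent, None)
            return
        self.extent_data[dest_extent] = self.tokens[token]
        self.extent_states[dest_extent] = "mapped"
```

In this model an offload read of a zeroed extent followed by an offload write of the returned token moves no payload bytes at all; only state information changes at the destination.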
现在转到图14,它示例了用于收回虚拟盘文件空间的操作流程,包括操作1400、1402、1404、以及1406。如图所示,操作1400开始操作流程,操作1402示出了计算机系统可以包括用于接收表明不再使用虚拟盘扩展的一部分的信号的电路,虚拟盘扩展是虚拟盘(402)的一部分,虚拟盘(402)存储在虚拟盘文件中。例如、以及转到图4,虚拟盘解析器404可以被配置成实例化虚拟盘402。文件系统414可以向虚拟盘解析器404发送表明它不再正使用虚拟盘402的一部分(即,虚拟盘扩展扇区的范围)的信号。在具体实例中,信号可以是修剪命令。在具体实例中,虚拟盘解析器404接收的信号可以识别定义不再正使用的扇区的范围(其可以是虚拟盘扩展的第一个部分)的字节偏移值。Turning now to FIG. 14 , an operational flow for reclaiming virtual disk file space is illustrated, including operations 1400 , 1402 , 1404 , and 1406 . As shown, operation 1400 begins the operational flow, and operation 1402 illustrates that a computer system may include circuitry for receiving a signal indicating that a portion of a virtual disk extent is no longer in use, the virtual disk extent being a portion of a virtual disk ( 402 ) stored in a virtual disk file. For example, and turning to FIG. 4 , virtual disk parser 404 may be configured to instantiate virtual disk 402. File system 414 may send a signal to virtual disk parser 404 indicating that it is no longer using a portion of virtual disk 402 (i.e., a range of sectors of a virtual disk extent). In a specific example, the signal may be a TRIM command. In a specific example, the signal received by virtual disk parser 404 may identify a byte offset value defining a range of sectors that are no longer in use (which may be the first portion of a virtual disk extent).
继续图14的描述,操作1404示出了计算机系统还可以包括被配置成识别描述虚拟盘扩展部分的虚拟盘文件(406,600,602,604,606,608,610,612,702,1002)的一部分的电路。返回参照图7,虚拟盘解析器404可以接收信号和识别例如虚拟盘扩展0的第一部分的虚拟盘字节偏移值。响应于信号的接收,虚拟盘解析器404可以检查分配表416以确定与信号相关联的虚拟盘字节偏移值对应的虚拟盘文件702的部分。Continuing with the description of FIG. 14 , operation 1404 illustrates that the computer system may further include circuitry configured to identify a portion of a virtual disk file ( 406 , 600 , 602 , 604 , 606 , 608 , 610 , 612 , 702 , 1002 ) that describes a virtual disk extent. Referring back to FIG. 7 , virtual disk parser 404 may receive a signal and identify a virtual disk byte offset value for, for example, the first portion of virtual disk extent 0. In response to receiving the signal, virtual disk parser 404 may check allocation table 416 to determine the portion of virtual disk file 702 corresponding to the virtual disk byte offset value associated with the signal.
Turning now to operation 1406 of FIG. 14, it shows that the computer system can include circuitry for sending, to a file system configured to store the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) on a storage device, a request to trim the identified portion of the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002). For example, and referring again to FIG. 7, virtual disk parser 404 can determine that the signal identifies less than an entire virtual disk extent. For example, the signal can indicate a range of sectors that does not include all sectors of the virtual disk extent. In response to this determination, virtual disk parser 404 can issue, to the file system hosting virtual disk file 702 (e.g., virtualization system file system 408), a request to trim the part of virtual disk file 702 that corresponds to the trimmed portion of the virtual disk extent. Virtualization system file system 408 can be configured to use and benefit from the trim command by trimming virtual disk file 406, flushing data from cache, clearing internal buffers, sending the trim to the disk that stores the file system data, and so on.
在具体实例中,虚拟盘解析器404可以被配置成响应于确定用以修剪虚拟盘文件一部分的请求未涵盖整个扩展、向底层文件系统发出修剪命令。例如,假设信号识别出不再使用虚拟盘扩展的前600个扇区且虚拟盘解析器404可以确定虚拟盘扩展的前600个扇区小于构建虚拟盘扩展的1024个扇区。响应于此确定,虚拟盘解析器404可以访问分配表416,以及确定描述了描述虚拟盘扩展的虚拟盘文件702区段前600个扇区的虚拟盘文件字节偏移量、和向主管虚拟盘文件702的文件系统发送用以修剪虚拟盘文件702此部分的请求。In a specific example, virtual disk parser 404 can be configured to issue a trim command to the underlying file system in response to determining that a request to trim a portion of a virtual disk file does not encompass the entire extent. For example, assume that a signal identifies that the first 600 sectors of a virtual disk extent are no longer in use and virtual disk parser 404 can determine that the first 600 sectors of the virtual disk extent are less than the 1024 sectors that constitute the virtual disk extent. In response to this determination, virtual disk parser 404 can access allocation table 416 and determine the virtual disk file byte offset that describes the first 600 sectors of the virtual disk file 702 section that describes the virtual disk extent, and send a request to the file system hosting virtual disk file 702 to trim this portion of virtual disk file 702.
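The translation from a partially trimmed extent to a byte range in the virtual disk file can be sketched in Python. The function name `file_trim_range`, the dictionary used as the allocation table, and the 512-byte sector size are assumptions; the 1024-sector extent comes from the example above.

```python
SECTOR_SIZE = 512           # bytes per sector; an assumption
SECTORS_PER_EXTENT = 1024   # matches the 1024-sector extent in the example


def file_trim_range(allocation_table, extent, first_sector, sector_count):
    """Map a trimmed sector range inside one extent to a file byte range.

    allocation_table maps an extent number to the byte offset in the
    virtual disk file where the section backing that extent starts.
    Returns (file byte offset, length in bytes), or None when the whole
    extent is trimmed and should be unmapped rather than forwarded.
    """
    if first_sector < 0 or sector_count <= 0:
        raise ValueError("empty or negative sector range")
    if first_sector + sector_count > SECTORS_PER_EXTENT:
        raise ValueError("range extends past the end of the extent")
    if sector_count == SECTORS_PER_EXTENT:
        return None
    section_start = allocation_table[extent]
    return (section_start + first_sector * SECTOR_SIZE,
            sector_count * SECTOR_SIZE)
```

For an extent whose section starts 2 MB into the file, trimming its first 600 sectors yields the range starting at byte 2097152 with a length of 307200 bytes, which is what would be passed to the file system hosting the virtual disk file.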
现在转到图15,它示例了可以结合图14描绘的那些来执行的额外操作。现在转到操作1508,它示出了计算机系统可以额外包括:用于基于表明使虚拟盘扩展为零的状态信息、响应于与虚拟盘扩展相关联的卸载读取请求的接收、向请求方发送(1508)表示零的令牌的电路。例如、以及参照图4,卸载提供器引擎422(例如,被配置成服务于卸载读取和卸载写入命令的电路)可以响应于请求方发出的卸载读取请求向请求方(例如,应用424)发送表示零的令牌。可以使用卸载读取请求通过生成和向请求方发送令牌从一个地点向另一个高效复制数据,令牌表示请求数据而非把数据复制到请求方的存储器中、随后把数据发送给目的地。卸载读取和卸载写入命令可以用来在目的地点辨识源地点生成的令牌时实现副本卸载,以及可以把令牌表示的数据在逻辑上写入到目的地。在源生成的公知零令牌的情形中,目的地不需要访问底层存储,例如,存储装置106,其在此具体实施方式中可以是SAN目标。在此实例中,卸载读取请求可以是对具有一个或更多个虚拟盘扩展中存储的数据的一个或更多个文件执行卸载读取操作,其中之一与表明使虚拟盘扩展为零的状态信息相关联。在此实例中,可以通过生成公知零令牌值和向请求方返回该公知零令牌来服务于卸载读取请求。Turning now to FIG. 15 , additional operations that may be performed in conjunction with those depicted in FIG. 14 are illustrated. Turning now to operation 1508 , the computer system may additionally include circuitry for sending ( 1508 ) a token representing zero to a requestor in response to receiving an offload read request associated with the virtual disk extent, based on state information indicating zeroing the virtual disk extent. For example, and referring back to FIG. 4 , the offload provider engine 422 (e.g., circuitry configured to service offload read and offload write commands) may send a token representing zero to a requestor (e.g., application 424) in response to an offload read request issued by the requestor. Offload read requests can be used to efficiently copy data from one location to another by generating and sending a token to the requestor, the token representing the requested data, rather than copying the data to the requestor's memory and then sending the data to the destination. The offload read and offload write commands can be used to implement copy offload when the destination recognizes the token generated by the source and can logically write the data represented by the token to the destination. In the case of a well-known zero token generated by the source, the destination does not need to access the underlying storage, for example, storage device 106, which in this embodiment may be a SAN target. 
In this example, the offload read request may be to perform an offload read operation on one or more files having data stored in one or more virtual disk extents, one of which is associated with state information indicating that the virtual disk extent is zeroed. In this example, the offload read request may be serviced by generating a well-known zero token value and returning the well-known zero token to the requester.
Continuing with the description of FIG. 15, operation 1510 shows that the computer system can include circuitry for selecting (1510) a subgroup from a group of virtual disk files; and circuitry for generating (1510) information that identifies the sectors of the subgroup that include data and the sectors of the subgroup that are transparent. In an exemplary embodiment, virtual disk 402 can be instantiated from multiple virtual disk files. Put another way, virtual disk 402 can be formed from M virtual disk files (where M is an integer greater than 1). In this exemplary embodiment, virtual disk parser 404 can be configured to receive a request from, for example, an administrator to determine the next byte offset on virtual disk 402, starting at a given byte offset, that is associated with a sector defined within a subgroup of the virtual disk files. For example, and referring to FIG. 10, virtual disk parser 404 can receive a request for the next defined byte offset starting at the virtual disk offset that corresponds to the first sector of virtual disk extent 2, together with information indicating that the subgroup includes virtual disk file 1002 and virtual disk file 1004. In this example, virtual disk parser 404 can scan through the subgroup and determine that the next defined byte offset is the sector that corresponds to the start of virtual disk extent 3. Because, in this example, the data in virtual disk extent 2 is backed by a section of virtual disk file 1006, it falls outside the search and is not returned as defined.
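The search for the next defined offset within a subgroup can be sketched in Python. This is a simplification: it tracks backing at extent granularity rather than per sector, the function name `next_defined_offset` is hypothetical, the 512 KB extent size is assumed, and the sample mapping of extents to files in the usage note is invented for illustration rather than read from FIG. 10.

```python
EXTENT_SIZE = 512 * 1024  # bytes per extent; an assumption


def next_defined_offset(backing, subgroup, start_offset):
    """Find the next virtual disk byte offset defined within a subgroup.

    backing maps an extent number to the identifier of the virtual disk
    file that backs it. subgroup is the set of file identifiers to search.
    Returns the first byte offset at or after start_offset that is backed
    by a file in the subgroup, or None if there is none.
    """
    extent = start_offset // EXTENT_SIZE
    last = max(backing, default=-1)
    while extent <= last:
        if backing.get(extent) in subgroup:
            # The starting extent may be entered partway through.
            return max(start_offset, extent * EXTENT_SIZE)
        extent += 1
    return None
```

With a hypothetical mapping `{0: 1002, 1: 1004, 2: 1006, 3: 1002}` and the subgroup `{1002, 1004}`, a search starting at the first byte of extent 2 skips that extent, because it is backed by file 1006, and returns the start of extent 3.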
Continuing with the description of FIG. 15, operation 1512 shows that the computer system can include circuitry configured to, in response to a determination that the virtual disk extent has been zeroed, detach the virtual disk extent from the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) and modify the state information associated with the virtual disk extent to indicate that the virtual disk extent has been zeroed. For example, and turning to FIG. 7, in an embodiment virtual disk parser 404 can determine that a virtual disk extent has been zeroed. For example, virtual disk parser 404 can receive a request to write the data represented by the well-known zero token to a virtual disk extent (e.g., virtual disk extent 7). Virtual disk parser 404 can determine from the data structure associated with the request that the request covers the entire virtual disk extent, i.e., the byte offset values start at the first sector of extent 7 and end at the last sector of extent 7. In response to this determination, and instead of writing zeros to the corresponding section of virtual disk file 702, virtual disk parser 404 can be configured to remove the link that maps virtual disk extent 7 to the section of virtual disk file 702 used to describe it, and to associate the virtual disk extent with information indicating that the virtual disk extent is all zeros. For example, virtual disk parser 404 can write eight bytes of information in allocation table 416 indicating that the virtual disk extent contains all zeros. The net result of this operation is that the section of virtual disk file 702 can be reused to store data for other virtual disk extents, and the virtual disk extent will read as if it contains all zeros even though no part of the virtual disk file describes the extent bit for bit.
Continuing with the description of FIG. 15, operation 1514 shows that the computer system can additionally include circuitry configured to, in response to a determination that the file system considers the virtual disk extent to be free space, detach the virtual disk extent from the virtual disk file and modify the state information associated with the virtual disk extent to indicate that the virtual disk extent is free space. For example, and turning again to FIG. 7, virtual disk parser 404 can determine that file system 414 has associated the virtual disk extent with information indicating that it is free space (i.e., space not used by file system 414). For example, virtual disk parser 404 can receive a signal from file system 414 indicating a range of sectors that covers a virtual disk extent (e.g., virtual disk extent 3), together with information indicating that the sectors are considered free space. In response to receiving such a signal, virtual disk parser 404 can be configured to remove the information that links the virtual disk extent to a section of virtual disk file 702. The result of this operation is that the section of virtual disk file 702 can be reused to store data for other virtual disk extents. Virtual disk parser 404 can additionally associate the virtual disk extent with information indicating that the virtual disk extent contains arbitrary data (i.e., data previously stored in any part of the virtual disk, all zeros, or all ones). As a result, read operations directed to this virtual disk extent can be handled by returning arbitrary data previously stored in the virtual disk. In addition, where virtual disk parser 404 is configured to allow it, the arbitrary data can optionally change each time a read operation is received.
Continuing with the description of FIG. 15, operation 1516 shows that the computer system can additionally include circuitry configured to, in response to a determination that the virtual disk extent has been trimmed, detach the extent from the virtual disk file and modify the state information associated with the virtual disk extent to indicate that the virtual disk extent contains a non-information-disclosing bit pattern. For example, and turning again to FIG. 7, virtual disk parser 404 can determine that file system 414 has trimmed the range of sectors that makes up a virtual disk extent. In response to this determination, virtual disk parser 404 can remove the information in allocation table 416 that links the virtual disk extent to a section of virtual disk file 702. The result of this operation is that the section of virtual disk file 702 can be reused to store data for other virtual disk extents. Virtual disk parser 404 can additionally associate the virtual disk extent with information indicating that it contains a non-information-disclosing bit pattern (e.g., all zeros, all ones, or a randomly generated bit pattern). As a result, read operations directed to this virtual disk extent can be handled by returning the non-information-disclosing bit pattern. In a particular preferred embodiment, the non-information-disclosing bit pattern can be all zeros. This nevertheless differs from the zero state described above, because the zero state can be used to represent meaningful zeros (i.e., the case where the virtual disk extent was zeroed intentionally).
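Operations 1512, 1514, and 1516 all detach an extent from the virtual disk file and differ in the state recorded afterwards, which in turn decides what a later read may return. The Python sketch below illustrates this under stated assumptions: the names `State`, `AllocationTable`, and `read_extent` are hypothetical, and all zeros is chosen as the non-information-disclosing pattern.

```python
from enum import Enum, auto


class State(Enum):
    MAPPED = auto()
    ZERO = auto()       # meaningful zeros (operation 1512)
    FREE = auto()       # file system free space, arbitrary data (1514)
    UNMAPPED = auto()   # trimmed, non-information-disclosing (1516)


class AllocationTable:
    def __init__(self):
        self.entries = {}        # extent -> (State, file byte offset or None)
        self.free_sections = []  # file byte offsets available for reuse

    def detach(self, extent, new_state):
        """Unlink an extent from its section and record its new state."""
        _, file_offset = self.entries[extent]
        if file_offset is not None:
            self.free_sections.append(file_offset)  # section can be reused
        self.entries[extent] = (new_state, None)


def read_extent(state, size, stored=b"", stale=b""):
    """Return the bytes a read of an extent in the given state may see."""
    if state is State.MAPPED:
        return stored[:size]
    if state is State.FREE:
        # Arbitrary data is allowed, including stale virtual disk contents.
        return (stale + bytes(size))[:size]
    # ZERO and UNMAPPED both read as zeros in this sketch; only the
    # recorded state tells intentional zeros from trimmed space.
    return bytes(size)
```

The sketch makes the closing distinction concrete: a `ZERO` extent and an `UNMAPPED` extent return the same bytes here, but only the former asserts that the zeros are the extent's actual contents.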
参考操作1518,它示出了计算机系统可以额外包括被配置成向控制虚拟盘的文件系统(414)发送(1518)用以发出修剪命令的请求的电路。返回参照图7,虚拟盘解析器404可以被配置成发出文件系统414发出一个或更多个修剪命令的请求。在示范性配置中,虚拟盘解析器404可以被配置成周期性地发送这种请求或基于预定准则发送这种请求,例如,当VM410开始时或在要关断VM之前不久。响应于这种请求,文件系统414可以向虚拟盘解析器404发出识别虚拟盘402的未使用扇区的一个或更多个修剪命令。虚拟盘解析器404可以随后接收来自修剪命令的修剪信息,如,文件系统414不再使用的扇区范围和可选地表明是否认为修剪扇区为自由空间的信息。虚拟盘解析器404可以接收信息和使用它来更新分配表416中存储的状态信息以及可能收回虚拟盘文件702的未使用区段。Referring to operation 1518, it is shown that the computer system may additionally include circuitry configured to send (1518) a request to the file system (414) controlling the virtual disk to issue a trim command. Referring back to FIG. 7, the virtual disk parser 404 may be configured to issue a request to the file system 414 to issue one or more trim commands. In an exemplary configuration, the virtual disk parser 404 may be configured to send such a request periodically or based on predetermined criteria, for example, when the VM 410 starts or shortly before shutting down the VM. In response to such a request, the file system 414 may issue one or more trim commands to the virtual disk parser 404 to identify unused sectors of the virtual disk 402. The virtual disk parser 404 may then receive trim information from the trim command, such as the sector range that the file system 414 no longer uses and, optionally, information indicating whether the trimmed sectors are considered free space. The virtual disk parser 404 may receive the information and use it to update state information stored in the allocation table 416 and possibly reclaim unused sections of the virtual disk file 702.
现在转到图16,它示例了用于存储虚拟机数据的操作流程。操作流程通过操作1600开始并转变为描述如下情况的操作1602:该情况下,计算机系统可以包括用于执行(1602)包括虚拟机内文件系统的访客操作系统(220,222,412,518)的电路。例如、以及参照图4,虚拟化系统420(其可以是图3的管理程序302或主机环境204执行的功能和图2的微核管理程序202的组合)可以实例化虚拟机410并在其内运行访客操作系统(如,访客操作系统412)。在此实例中,访客操作系统412可以包括文件系统414,其可以是组织和控制用于访客操作系统412的数据的可执行指令。Turning now to FIG. 16 , an operational flow for storing virtual machine data is illustrated. The operational flow begins at operation 1600 and transitions to operation 1602 , which describes a scenario in which a computer system may include circuitry for executing ( 1602 ) a guest operating system ( 220 , 222 , 412 , 518 ) including a file system within a virtual machine. For example, and with reference to FIG. 4 , virtualization system 420 (which may be a combination of the functions performed by hypervisor 302 or host environment 204 of FIG. 3 and microkernel hypervisor 202 of FIG. 2 ) may instantiate virtual machine 410 and run a guest operating system (e.g., guest operating system 412) therein. In this example, guest operating system 412 may include file system 414 , which may be executable instructions for organizing and controlling data for guest operating system 412 .
Continuing with the description of FIG. 16, operation 1604 shows that the computer system may include circuitry for exposing (1604) a virtual storage device (402) to a guest operating system (220, 222, 412, 508), the virtual storage device (402) including a virtual disk extent that is detached from the virtual disk file. Turning back to FIG. 4, virtualization system 420 may expose virtual disk 402 to guest operating system 412. For example, virtual disk parser 404 may communicate with a storage virtualization service provider that is operable to communicate with a storage virtualization service client running within guest operating system 410. In a specific example, the storage virtualization service client may be a driver installed within guest operating system 412 that signals to the guest that it can communicate with a storage device. In this example, IO tasks sent by file system 414 are sent first to the storage virtualization service client and then to the storage virtualization service provider via a communication channel (e.g., a region of memory and a cross-partition notification facility). Virtual disk 402 may be composed of one or more virtual disk files 406 that are opened by virtual disk parser 404 and used to store the data of virtual disk 402. In a specific example, virtual disk 402 may be described at least in part by virtual disk file 702 of FIG. 7. In another specific example, and turning to FIG. 10, virtual disk 402 may be described by a group of virtual disk files (1002-1006). In either case, and returning to FIG. 4, virtual disk 402 may include a plurality of virtual disk extents, and one of the virtual disk extents may be detached, i.e., not described bit-for-bit by any space within its associated virtual disk file.
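The relationship between a virtual disk, its extents, and the backing file can be pictured with a small sketch. The Python below is illustrative only: the names `ExtentState`, `ExtentEntry`, and `describe` are hypothetical, and the state labels are assumed names for the states this description discusses, not an on-disk format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExtentState(Enum):
    """Assumed labels for extent states discussed in this description."""
    MAPPED = "mapped"  # described bit-for-bit by a section of the file
    ZERO = "zero"      # detached; the extent is known to be all zeros
    FREE = "free"      # detached; reads may be answered with arbitrary data


@dataclass
class ExtentEntry:
    """One hypothetical allocation-table entry for a virtual disk extent."""
    state: ExtentState
    file_offset: Optional[int] = None  # byte offset of the section, if mapped


def describe(entry: ExtentEntry) -> str:
    """Report whether an extent is backed by space in the virtual disk file."""
    if entry.state is ExtentState.MAPPED and entry.file_offset is not None:
        return f"mapped at byte {entry.file_offset}"
    return f"detached ({entry.state.value})"


# A virtual disk with three extents, one of which is detached free space.
table = [
    ExtentEntry(ExtentState.MAPPED, file_offset=0),
    ExtentEntry(ExtentState.FREE),
    ExtentEntry(ExtentState.MAPPED, file_offset=2 * 1024 * 1024),
]
summary = [describe(entry) for entry in table]
```

In this sketch the detached extent consumes no space in the file; only its state is recorded.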
Continuing with the description of FIG. 16, operation 1606 shows that the computer system may include circuitry for receiving (1606) a request to write data to the virtual disk extent. Turning back to FIG. 7, virtual disk parser 404 may receive a request to write data to a virtual disk extent that has no associated space within virtual disk file 702. For example, an IO task may be received that specifies an offset value indicating the address of a virtual disk sector within the virtual disk extent.
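As a rough illustration of how an offset in an IO task identifies an extent, the sketch below converts a virtual disk sector address into an extent index and a byte offset within that extent. The sector size, the extent size, and the function name `locate` are assumptions made for the example; the description does not fix these values.

```python
from typing import Tuple

SECTOR_SIZE = 512                                # bytes per sector (assumed)
EXTENT_SIZE = 2 * 1024 * 1024                    # bytes per extent (assumed)
SECTORS_PER_EXTENT = EXTENT_SIZE // SECTOR_SIZE  # 4096 sectors per extent


def locate(sector_address: int) -> Tuple[int, int]:
    """Return (extent index, byte offset within that extent)."""
    if sector_address < 0:
        raise ValueError("sector address must be non-negative")
    extent_index, sector_in_extent = divmod(sector_address, SECTORS_PER_EXTENT)
    return extent_index, sector_in_extent * SECTOR_SIZE


# The fourth sector of extent 5.
extent_index, byte_offset = locate(5 * SECTORS_PER_EXTENT + 3)
```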
Turning back to FIG. 16, operation 1608 shows that the computer system may optionally include circuitry for determining (1608) that state information associated with the virtual disk extent indicates that the virtual disk extent is free space. In response to receiving the IO task, virtual disk parser 404 may access allocation table 416 and read the state information associated with the virtual disk extent. In this example, the virtual disk extent may be associated with information indicating that it is free space (i.e., file system 414 is not using the virtual disk extent, and read operations directed to it may be answered with arbitrary data).
Referring to FIG. 16, operation 1610 shows that the computer system may optionally include circuitry for allocating (1610) a section of the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) to describe the virtual disk extent without overwriting a preexisting bit pattern within the section of the virtual disk file. For example, and returning to FIG. 7, in response to receiving the write IO task, virtual disk parser 404 may locate a section of virtual disk file 702 that is not in use and allocate it to store the data of the virtual disk extent. For example, virtual disk parser 404 may write information in allocation table 416 that links the virtual disk extent to the byte offset of the allocated section of virtual disk file 702.
In this example, because the state information indicates that file system 414 has identified virtual disk extent 5 as free space, virtual disk parser 404 does not overwrite (by writing all zeros, all ones, or any other non-information-disclosing bit pattern) whatever bit pattern is already stored in the section of virtual disk file 702 (e.g., data from some deleted file and/or arbitrary data) before using the section to describe the virtual disk extent. This provides the added benefit of saving the processor cycles and IO tasks that would otherwise be spent overwriting the section.
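The saving described above can be made concrete with a toy model. In the sketch below, `VirtualDiskFile`, `allocate`, and the four-byte section size are hypothetical, and the rule that any state other than free space triggers a scrub is an assumption of the example rather than something this passage states.

```python
from typing import Dict, List

SECTION_SIZE = 4  # bytes per section; tiny so the example is easy to follow


class VirtualDiskFile:
    """Toy stand-in for a virtual disk file made of fixed-size sections."""

    def __init__(self, contents: bytes) -> None:
        self.data = bytearray(contents)
        self.bytes_scrubbed = 0  # bytes spent overwriting old bit patterns

    def scrub(self, section: int) -> None:
        """Overwrite one section with zeros and count the work done."""
        start = section * SECTION_SIZE
        self.data[start:start + SECTION_SIZE] = bytes(SECTION_SIZE)
        self.bytes_scrubbed += SECTION_SIZE


def allocate(vdf: VirtualDiskFile, table: Dict[int, int],
             unused: List[int], extent: int, state: str) -> int:
    """Link `extent` to an unused section; skip the scrub for free space."""
    section = unused.pop(0)
    if state != "free":
        # Assumption of this sketch: other states need a clean section.
        vdf.scrub(section)
    table[extent] = section * SECTION_SIZE  # byte offset of the section
    return section


vdf = VirtualDiskFile(b"OLD!junk")
table: Dict[int, int] = {}
unused = [0, 1]
allocate(vdf, table, unused, extent=5, state="free")
after_free = bytes(vdf.data)       # old bit pattern left in place
scrubbed_after_free = vdf.bytes_scrubbed
allocate(vdf, table, unused, extent=7, state="zero")
after_zero = bytes(vdf.data)
```

Allocating for the free-space extent touches no bytes of the file, while the second allocation pays for a full section overwrite.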
Referring to operation 1612 of FIG. 16, it shows that the computer system may optionally include circuitry for modifying (1612) the state information associated with the virtual disk extent to indicate that the virtual disk extent is mapped to the allocated section of the virtual disk file. For example, and turning back to FIG. 7, virtual disk parser 404 may modify (e.g., overwrite in memory) the state information associated with the virtual disk extent to indicate that it is mapped. As a result, subsequent read operations directed to sectors of the virtual disk extent will be handled by virtual disk parser 404 by returning the bit pattern stored in the corresponding portion of the allocated section.
Turning now to operation 1614 of FIG. 16, it shows storing (1614) the data to the allocated section of the virtual disk file. Turning back to FIG. 6, virtual disk parser 404 may write the data (i.e., a bit pattern) into virtual disk file 702. An IO task indicative of the write to virtual disk file 702 may be issued to virtualization system file system 408, and the change may eventually be persisted by persistent storage unit 460.
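Marking the extent as mapped and then storing the data can be sketched end to end as follows. The helper names `write_extent` and `read_extent`, the dictionary-based table, and the use of an in-memory `io.BytesIO` object in place of a real virtual disk file are all assumptions made for illustration.

```python
import io
from typing import Dict


def write_extent(backing: io.BytesIO, table: Dict[int, dict], extent: int,
                 section_offset: int, offset_in_extent: int,
                 data: bytes) -> None:
    """Mark the extent as mapped, then store the data in its section."""
    table[extent] = {"state": "mapped", "file_offset": section_offset}
    backing.seek(section_offset + offset_in_extent)
    backing.write(data)


def read_extent(backing: io.BytesIO, table: Dict[int, dict], extent: int,
                offset_in_extent: int, length: int) -> bytes:
    """Serve a read from the mapped section of the backing file."""
    entry = table[extent]
    if entry["state"] != "mapped":
        raise LookupError("extent is not backed by the virtual disk file")
    backing.seek(entry["file_offset"] + offset_in_extent)
    return backing.read(length)


# Two 8-byte sections that still hold contents from earlier use.
backing = io.BytesIO(b"stale---" * 2)
table = {5: {"state": "free", "file_offset": None}}
write_extent(backing, table, extent=5, section_offset=8,
             offset_in_extent=2, data=b"NEW")
result = read_extent(backing, table, extent=5, offset_in_extent=0, length=8)
```

The bytes of the section that the write did not touch still hold their earlier contents, which is acceptable here only because the extent was free space before the write.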
Turning now to FIG. 17, it shows additional operations that may be performed in conjunction with those illustrated in FIG. 16. Turning attention to operation 1716, it shows that the computer system may optionally include circuitry for, in response to a determination that the virtual disk extent has been zeroed, detaching the virtual disk extent from the virtual disk file and modifying the state information associated with the virtual disk extent to indicate that the virtual disk extent has been zeroed. For example, and turning to FIG. 6, in an embodiment virtual disk parser 404 may determine that the virtual disk extent has been zeroed. For example, virtual disk parser 404 may receive an offload write request to write data represented by a well-known zero token to a virtual disk extent (e.g., virtual disk extent 7). Virtual disk parser 404 may determine from a data structure associated with the request that the request covers the entire virtual disk extent, i.e., the byte offset values may begin at the first sector of virtual disk extent 7 and end at its last sector. In response to this determination, and instead of writing zeros to the corresponding section of virtual disk file 702, virtual disk parser 404 may be configured to remove the link, stored in allocation table 416, from the virtual disk extent to the section of virtual disk file 702, and to associate the virtual disk extent with information indicating that the virtual disk extent is all zeros.
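The whole-extent check and the resulting detach can be sketched as below. The names `covers_whole_extent` and `handle_zero_write`, the eight-sector extent, and the string returned for the partial-coverage case are hypothetical; the partial case is outside what this passage describes and is only stubbed.

```python
from typing import Dict

SECTORS_PER_EXTENT = 8  # assumed; tiny so the arithmetic is easy to check


def covers_whole_extent(first_sector: int, sector_count: int,
                        extent: int) -> bool:
    """True if the request spans the extent's first through last sector."""
    start = extent * SECTORS_PER_EXTENT
    end = start + SECTORS_PER_EXTENT  # exclusive
    return first_sector <= start and first_sector + sector_count >= end


def handle_zero_write(table: Dict[int, dict], first_sector: int,
                      sector_count: int, extent: int) -> str:
    """Detach the extent rather than writing zeros when fully covered."""
    if covers_whole_extent(first_sector, sector_count, extent):
        # Drop the link to the file section and record the zero state.
        table[extent] = {"state": "zero", "file_offset": None}
        return "detached"
    return "partial"  # stub: handling not covered by this sketch


table = {7: {"state": "mapped", "file_offset": 4096}}
full = handle_zero_write(table, first_sector=56, sector_count=8, extent=7)

other = {7: {"state": "mapped", "file_offset": 4096}}
partial = handle_zero_write(other, first_sector=57, sector_count=7, extent=7)
```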
Continuing with the description of FIG. 17, operation 1718 shows that the computer system may optionally include circuitry for, in response to receiving a signal from the file system (414) identifying the virtual disk extent as free space, detaching (1718) the virtual disk extent from the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) and modifying the state information associated with the virtual disk extent to indicate that the virtual disk extent contains arbitrary data. For example, and turning again to FIG. 7, virtual disk parser 404 may determine that file system 414 has associated the virtual disk extent with information indicating that it is free space (i.e., space not used by file system 414). For example, virtual disk parser 404 may receive a signal from file system 414 that indicates a range of sectors covering the virtual disk extent (e.g., virtual disk extent 3), together with information indicating that the sectors are free space. In response to this determination, virtual disk parser 404 may be configured to remove the information in allocation table 416 that links the virtual disk extent to a section of virtual disk file 702, and to associate the virtual disk extent with information indicating that arbitrary data (i.e., data previously stored in any part of the virtual disk, all zeros, or all ones) may be returned in response to receiving a read IO task.
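Because the free-space signal names a sector range rather than an extent, the parser has to work out which extents the range covers completely. A minimal sketch of that step follows; `fully_covered_extents`, `mark_free`, and the eight-sector extent are assumptions for illustration, as is the choice to leave partially covered extents untouched.

```python
from typing import Dict, List

SECTORS_PER_EXTENT = 8  # assumed; tiny so the arithmetic is easy to check


def fully_covered_extents(first_sector: int, sector_count: int) -> List[int]:
    """List the extents that lie entirely inside the signalled range."""
    end_sector = first_sector + sector_count               # exclusive
    first_extent = -(-first_sector // SECTORS_PER_EXTENT)  # round up
    end_extent = end_sector // SECTORS_PER_EXTENT          # round down
    return list(range(first_extent, end_extent))


def mark_free(table: Dict[int, dict], first_sector: int,
              sector_count: int) -> List[int]:
    """Detach every fully covered extent and record it as free space."""
    detached = []
    for extent in fully_covered_extents(first_sector, sector_count):
        if extent in table:
            table[extent] = {"state": "free", "file_offset": None}
            detached.append(extent)
    return detached


table = {
    2: {"state": "mapped", "file_offset": 0},
    3: {"state": "mapped", "file_offset": 64},
    4: {"state": "mapped", "file_offset": 128},
}
# Sectors 20..39 cover extents 3 and 4 fully and extent 2 only in part.
detached = mark_free(table, first_sector=20, sector_count=20)
```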
Operation 1720 of FIG. 17 shows that computer system 400 may optionally include circuitry for, in response to receiving a request to trim all sectors of the virtual disk extent, detaching (1720) the virtual disk extent from the virtual disk file (406, 600, 602, 604, 606, 608, 610, 612, 702, 1002) and modifying the state information associated with the virtual disk extent to indicate that the virtual disk extent contains a non-information-disclosing bit pattern. For example, and turning again to FIG. 7, virtual disk parser 404 may determine that the sectors making up the virtual disk extent have been trimmed. For example, virtual disk parser 404 may receive a trim command from file system 414 that indicates a range of sectors covering the virtual disk extent. In response to receiving this signal, virtual disk parser 404 may be configured to remove the information in allocation table 416 that links the virtual disk extent to a section of virtual disk file 702, and to associate the virtual disk extent with information indicating that the virtual disk extent contains a non-information-disclosing bit pattern.
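The detached states differ mainly in what a later read may return. The sketch below contrasts them; the function name `read_detached`, the state labels, and the particular patterns chosen (zeros for the zero state, all ones for the non-information-disclosing state, stale bytes padded with zeros for free space) are assumptions picked from the options the description allows.

```python
def read_detached(state: str, length: int, stale: bytes = b"") -> bytes:
    """Answer a read of a detached extent according to its state."""
    if state == "zero":
        return bytes(length)        # the extent is defined to be all zeros
    if state == "non_disclosing":
        return b"\xff" * length     # all ones; all zeros would also qualify
    if state == "free":
        # Arbitrary data is permitted, so stale bytes may be reused.
        return (stale + bytes(length))[:length]
    raise ValueError(f"extent state {state!r} is not a detached state")


zero_read = read_detached("zero", 4)
trimmed_read = read_detached("non_disclosing", 4)
free_read = read_detached("free", 4, stale=b"ab")
```

Only the free-space state may expose previously stored data, which is why a trim is recorded with the stricter non-information-disclosing state in this sketch.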
The foregoing detailed description has set forth various embodiments of the systems and/or processes by way of examples and/or operational diagrams. Insofar as such block diagrams and/or examples contain one or more functions and/or operations, those skilled in the art will understand that each function and/or operation within such block diagrams or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof.
While particular aspects of the subject matter described herein have been shown and described, it will be apparent to those skilled in the art that, based on the teachings herein, changes and modifications may be made without departing from the subject matter described herein and its broader aspects. The appended claims are therefore intended to encompass within their scope all such changes and modifications as fall within the true spirit and scope of the subject matter described herein.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/046,617 US9146765B2 (en) | 2011-03-11 | 2011-03-11 | Virtual disk storage techniques |
| US13/046,617 | 2011-03-11 |
Related Parent Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| HK13103627.0A Addition HK1176145B (en) | 2011-03-11 | 2013-03-22 | Virtual disk storage techniques |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| HK13103627.0A Division HK1176145B (en) | 2011-03-11 | 2013-03-22 | Virtual disk storage techniques |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1243785A1 (en) | 2018-07-20 |
| HK1243785B (en) | 2021-05-14 |