
WO2024183559A1 - Data sharing method, device, system and storage medium - Google Patents

Data sharing method, device, system and storage medium

Info

Publication number
WO2024183559A1
WO2024183559A1 (PCT/CN2024/078720)
Authority
WO
WIPO (PCT)
Prior art keywords
data
shared
slot
shared data
virtual machines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2024/078720
Other languages
English (en)
French (fr)
Inventor
向凌风
陆庆达
吴结生
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou AliCloud Feitian Information Technology Co Ltd
Original Assignee
Hangzhou AliCloud Feitian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou AliCloud Feitian Information Technology Co Ltd filed Critical Hangzhou AliCloud Feitian Information Technology Co Ltd
Publication of WO2024183559A1
Current legal status: Ceased

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/176Support for shared access to files; File sharing support
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45579I/O management, e.g. providing access to device drivers or storage
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45583Memory management, e.g. access or allocation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of cloud storage technology, and in particular to a data sharing method, device, system and storage medium.
  • Metadata is system data that describes files, such as access rights, file owners, and file data distribution information. To operate on a file, a user needs to access its metadata, determine the file's location or related attributes, and then complete the corresponding file operation.
  • When file metadata is shared among virtual machines (VMs), the VMs need to obtain the metadata from remote servers.
  • When VMs modify metadata, the modifications need to be written back to the remote server and then synchronized to other VMs through the remote server. This ensures metadata consistency but reduces metadata access performance. How to ensure the consistency of shared data between VMs while maintaining shared data access performance is a technical problem that urgently needs to be solved.
  • Multiple aspects of the present application provide a data sharing method, device, system and storage medium, which are used to shorten the access path of shared data and improve the access performance of shared data while ensuring the consistency of shared data between VMs.
  • an embodiment of the present application provides a host, on which a target process module and multiple virtual machines having a data sharing relationship are deployed. The host includes a memory and a processor; the memory is used to store the program codes corresponding to the target process module and the multiple virtual machines; the processor is coupled to the memory and is used to execute the program code corresponding to the target process module, so as to implement the steps in the data sharing method provided in the first aspect of the present application.
  • an embodiment of the present application provides a data sharing system, comprising a target process module and multiple virtual machines with a data sharing relationship deployed on the same host; the target process module is used to implement the steps executed by the target process module in the method provided in the first aspect, and the multiple virtual machines are used to implement the steps executed by the virtual machines in the method provided in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the processor is enabled to implement the steps in the data sharing method provided in the first aspect of the present application.
  • In these embodiments, a shared memory window is allocated on the host to the virtual machines that have a data sharing relationship, and the shared data between the multiple virtual machines is stored in this local shared memory window. This realizes local access to the shared data, shortens the access path of the shared data, and improves the access performance of the shared data.
  • Furthermore, the shared memory window has a read-only attribute for these virtual machines: a virtual machine can read from, but not write to, the shared memory window. A target process module is added to the host, and this module is solely responsible for writing the shared data, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window at the same time. Because the window is shared, any modification of the shared data is immediately observable by each virtual machine, so consistency of the shared data is achieved for all virtual machines without synchronizing it between different VMs through network transmission, which further improves the access performance of the shared data.
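The single-writer, multiple-reader arrangement described above can be sketched with a file-backed memory mapping. This is an illustrative sketch only, not the embodiment's implementation; the window size, backing file, and byte contents are assumptions.

```python
import mmap
import os
import tempfile

WINDOW_SIZE = 4096  # assumed window size for the sketch

# The "target process module" creates and owns the backing region,
# and holds the only writable mapping.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, WINDOW_SIZE)
writer = mmap.mmap(fd, WINDOW_SIZE, access=mmap.ACCESS_WRITE)

# A "virtual machine" maps the same region read-only.
reader_fd = os.open(path, os.O_RDONLY)
reader = mmap.mmap(reader_fd, WINDOW_SIZE, access=mmap.ACCESS_READ)

# A write by the single writer is immediately visible to the reader,
# with no network synchronization between the two mappings.
writer[0:5] = b"inode"
observed = bytes(reader[0:5])
assert observed == b"inode"

writer.close(); reader.close()
os.close(fd); os.close(reader_fd); os.remove(path)
```

Because both mappings reference the same underlying pages, the reader observes the writer's update directly, which is the property the embodiments rely on for consistency without network transmission.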
  • FIG. 1 is a schematic diagram of the structure of a data sharing system provided in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a usage mode in which a shared memory window provided in an embodiment of the present application is divided into multiple memory areas;
  • FIG. 3 is a schematic diagram showing a memory area divided into a plurality of slot groups and a usage method of the slots according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a usage of a memory area including slot groups, slots, and a bitmap area provided in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the structure of a data sharing system for file metadata provided by an embodiment of the present application;
  • FIG. 6 is a flow chart of a data sharing method provided in an embodiment of the present application;
  • FIG. 7a is a schematic flow chart of another data sharing method provided in an embodiment of the present application;
  • FIG. 7b is a schematic flow chart of another data sharing method provided in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of the structure of a data sharing device provided in an embodiment of the present application;
  • FIG. 9 is a schematic diagram of the structure of another data sharing device provided in an embodiment of the present application;
  • FIG. 10 is a schematic diagram of the structure of a host provided in an embodiment of the present application.
  • The user information involved (including but not limited to user device information and user personal information) and the data involved (including but not limited to data used for analysis, stored data, and displayed data) are all information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entrances are provided for users to choose to authorize or refuse.
  • cloud service providers provide tenants with various cloud computing resources. Tenants can purchase or deploy their own virtual machines on the cloud computing resources provided by cloud service providers, and use virtual machines to host their own applications or services.
  • the various data generated by these applications or services during operation are stored in different storage nodes. From the perspective of virtual machines, data sharing may be required, but the existing data is scattered on different storage nodes, and the access path to shared data is long. In addition, in order to ensure the consistency of shared data, different virtual machines need to synchronize shared data, resulting in poor access performance of shared data.
  • Cloud computing scenarios face the technical problem of how to balance the access performance and consistency of shared data.
  • In view of this, the embodiments of the present application provide a solution with the host as the unit.
  • a shared memory window is allocated to these virtual machines on the host, and the shared data between the multiple virtual machines is stored through the local shared memory window, so as to realize local access to the shared data, shorten the access path of the shared data, and improve the access performance of the shared data;
  • Furthermore, the shared memory window has a read-only attribute for these virtual machines; that is, a virtual machine can read from, but not write to, the shared memory window, and a target process module is added to the host.
  • The target process module is solely responsible for writing the shared data, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window at the same time. Moreover, based on the shared memory window, any modification of the shared data can be immediately observed by each virtual machine, so consistency of the shared data is achieved for each virtual machine. Furthermore, the shared data does not need to be synchronized between different VMs through network transmission, which further improves the access performance of the shared data.
  • the efficiency of subsequent operations based on shared data can be improved.
  • For example, when the shared data is file metadata, shortening the access path of the file metadata and increasing its access speed helps improve the efficiency of file operations based on the file metadata and reduce the latency of those file operations.
  • FIG1 is a schematic diagram of the structure of a data sharing system 100 provided in an embodiment of the present application.
  • the data sharing system 100 includes: a target process module 11 deployed on the same host 10 and multiple virtual machines 12 having a data sharing relationship.
  • the host 10 may be one or more, and in FIG1 , one host 10 is taken as an example for illustration.
  • at least some of the hosts 10 are deployed with a target process module 11 and multiple virtual machines 12 having a data sharing relationship.
  • the embodiment of the present application focuses on the host 10 deployed with the target process module 11 and multiple virtual machines 12 having a data sharing relationship.
  • the host 10 of this embodiment may be any physical device with storage, computing and communication capabilities, such as a server device such as a traditional server, a server array or a server cluster, or various terminal devices such as a mobile phone, a laptop computer, a desktop computer, or network devices such as various base stations and gateway devices.
  • the data sharing relationship between virtual machines 12 means that data can be shared between virtual machines 12.
  • virtual machines 12 can use the same data, and these data that can be used by virtual machines 12 are referred to as shared data.
  • virtual machines 12 with data sharing relationship may be distributed on the same host 10, or may be distributed on different hosts 10, and this is not limited.
  • MPI: standard Message Passing Interface
  • Whether a data sharing relationship is formed between virtual machines can be flexibly determined according to application requirements.
  • For example, the same tenant can deploy multiple virtual machines that provide the same service, so a data sharing relationship is formed between the multiple virtual machines 12 deployed on the same host 10 and belonging to the same tenant.
  • For different tenants, the virtual machines they deploy can be distributed on different hosts, and these virtual machines distributed on different hosts can also form a data sharing relationship, which is not the focus of the embodiments of the present application.
  • Data sharing between different tenants may also satisfy user access control policies, for example when tenant A's data is authorized to another tenant B. In this case, a data sharing relationship can also be formed between virtual machines 12 across tenants.
  • the number of data sharing relationships existing on the host 10 is not limited, nor is the number of virtual machines 12 having a data sharing relationship.
  • the specific number may depend on the physical resources and processing capabilities of the host 10.
  • there may be one or more groups of virtual machines 12 having a data sharing relationship and different groups of virtual machines 12 have different data sharing relationships.
  • multiple virtual machines 12 of the first tenant and multiple virtual machines 12 of the second tenant may exist on the same host 10 at the same time, and each tenant's virtual machine forms a data sharing relationship.
  • the number of virtual machines 12 having the same data sharing relationship may be 2, 3, 5, or 10, etc., and there is no limitation on this.
  • the number of data sharing relationships existing on different hosts 10 and the number of virtual machines 12 having a data sharing relationship may be the same or different, and there is no limitation on this.
  • a shared memory window is allocated to these virtual machines on the host 10, and the shared data between the multiple virtual machines is stored through the local shared memory window, so as to realize local access to the shared data, shorten the access path of the shared data, and improve the access performance of the shared data.
  • To this end, a target process module 11 is added to the host 10, and the target process module 11 maintains and manages the shared memory window.
  • The target process module 11 can be a process running on the host 10, such as a daemon process, but is not limited thereto. The target process module 11 can allocate, from the physical space of the host 10, a shared memory window for the multiple virtual machines 12 having the same data sharing relationship, so as to store the shared data between the multiple virtual machines 12.
  • the read and write attributes of the shared memory window are constrained during the shared memory window allocation process.
  • the shared memory window has a read-only attribute for multiple virtual machines 12, that is, for multiple virtual machines 12, only read operations can be performed on the shared memory window, but no write operations can be performed on the shared memory window.
  • the shared memory window implements the constraint of only reading but not writing for the virtual machine 12. Since the virtual machine cannot write to the shared memory window, the problem of shared data errors caused by multiple virtual machines 12 writing to the shared memory window at the same time is solved.
  • the shared data between virtual machines 12 is dynamically changing, and when new shared data is generated, it needs to be written into the shared memory window so as to be shared among multiple virtual machines.
  • the target process module 11 is responsible for writing the shared data into the shared memory window, so that the new shared data can be written into the shared memory window in time, and at the same time, the error problem of shared data caused by multiple virtual machines 12 writing to the shared memory window at the same time can be solved.
  • the target process module 11 writes the new shared data into the shared memory window so that multiple virtual machines 12 can share the new shared data.
  • the shared data that the virtual machine 12 needs to read from the shared memory window is recorded as the first shared data.
  • For different virtual machines 12, the first shared data that needs to be read may be different. Correspondingly, the shared data that needs to be written into the shared memory window is recorded as the second shared data.
  • The generating end of the second shared data can provide the second shared data to the target process module 11, for example by means of, but not limited to, a Remote Procedure Call (RPC), so that the target process module 11 can write the second shared data into the shared memory window.
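The hand-off described above can be sketched as follows, with a plain function standing in for the RPC endpoint of the target process module; the function name, window layout, and payload are hypothetical illustrations, not from the embodiments.

```python
# A bytearray stands in for the shared memory window; only the
# target process module's entry point ever writes to it.
window = bytearray(4096)
next_offset = 0  # simple bump allocator for the sketch

def rpc_write_shared_data(payload: bytes):
    """Stand-in for the RPC endpoint of the target process module:
    the single writer appends the second shared data to the window."""
    global next_offset
    start = next_offset
    window[start:start + len(payload)] = payload
    next_offset += len(payload)
    return start, len(payload)

# The generating end (e.g. the host file system in the scenario of
# FIG. 5) submits new shared data via the "RPC".
offset, length = rpc_write_shared_data(b"new-file-metadata")

# Any virtual machine can now read it locally from the window.
assert bytes(window[offset:offset + length]) == b"new-file-metadata"
```

In a real deployment the generating end and the target process module would be separate processes connected by an actual RPC transport; the essential point illustrated here is that only the target process module writes to the window.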
  • the generating end of the second shared data will be different according to different application scenarios. Taking the application scenario shown in Figure 5 as an example, the second shared data is specifically the second file metadata, and its corresponding generating end is the host file system.
  • The order in which the virtual machine 12 reads the first shared data from the shared memory window and the target process module 11 writes the second shared data into the shared memory window is not limited. If the virtual machine 12 needs the first shared data first, it can read the first shared data from the shared memory window first; then, when the second shared data is generated, the target process module 11 writes the second shared data into the shared memory window.
  • the generation of the second shared data may be related to the process or result of executing the corresponding operation according to the first shared data, or may be unrelated to the process and result of executing the corresponding operation according to the first shared data.
  • the target process module 11 writes the second shared data into the shared memory window first; then, when the virtual machine needs the first shared data, it can read the first shared data from the shared memory window; the first shared data may be the second shared data written in advance, or may be other shared data in the shared memory window except the second shared data.
  • the virtual machine 12 reading the first shared data from the shared memory window and the target process module 11 writing the second shared data into the shared memory window may also be executed in parallel.
  • the shared data between virtual machines 12 with data sharing relationships will be different. Accordingly, the scenarios and methods in which the virtual machine 12 needs the first shared data, the scenarios and methods in which the second shared data is generated, and the corresponding operations performed according to the first shared data will also be different, and no limitation is made to this.
  • the shared data between each virtual machine 12 can be file metadata;
  • File metadata is data used to describe the attributes of a file, such as, but not limited to, the type of the file, the owner of the file, the size of the file, and the time of the last modification of the file. Accordingly, when the virtual machine opens, deletes, moves or renames a file, it needs the metadata of that file, and can initiate a file metadata read operation to request the file metadata from the shared memory window.
  • the corresponding file access operation can be performed according to the read file metadata, such as opening, deleting, moving or renaming the corresponding file.
  • the target process module 11 perceives the file metadata write operation and writes the new file metadata into the shared memory window.
  • a solution is provided based on the host.
  • a shared memory window is allocated to these virtual machines on the host.
  • the shared data between the multiple virtual machines is stored through the local shared memory window, so as to realize local access to the shared data, shorten the access path of the shared data, and improve the access performance of the shared data.
  • Furthermore, the shared memory window has a read-only attribute for these virtual machines; that is, a virtual machine can read from, but not write to, the shared memory window, and a target process module is added to the host.
  • The target process module is solely responsible for writing the shared data, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window at the same time. Moreover, based on the shared memory window, any modification of the shared data can be immediately observed by each virtual machine, so consistency of the shared data is achieved for each virtual machine. Furthermore, the shared data does not need to be synchronized between different VMs through network transmission, which further improves the access performance of the shared data.
  • the implementation method of the target process module 11 to allocate shared memory windows for multiple virtual machines 12 with a data sharing relationship is not limited.
  • the allocation of shared memory windows includes the application of shared memory windows and the exposure of shared memory windows.
  • the target process module 11 can call a memory allocation command to apply for a shared memory window to the operating system (OS) of the host 10.
  • Specifically, the target process module 11 can, in response to an instruction to start the virtual machines, determine that a shared memory window needs to be applied for on behalf of these virtual machines with a data sharing relationship. It then accesses the OS of the host 10 through standard kernel access interfaces, such as the application programming interface (API) of the Portable Operating System Interface (POSIX), and applies for a shared memory window from the relevant unit of the OS; the relevant unit of the OS allocates, from the physical space of the host 10, a shared memory window with a read-only attribute for the multiple virtual machines 12.
  • The API provided by the OS can set the access rights of the shared memory window to read-only.
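As a minimal illustration of this read-only constraint (assuming a system where Python's `mmap` honors the requested access mode; the file and sizes are hypothetical), a mapping opened read-only rejects write attempts, analogous to the read-only attribute the OS sets on the shared memory window for the VMs:

```python
import mmap, os, tempfile

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

# Map the region read-only, as a VM would see the shared memory window.
ro = mmap.mmap(fd, 4096, access=mmap.ACCESS_READ)
write_rejected = False
try:
    ro[0:4] = b"data"          # a VM attempting to write
except TypeError:
    write_rejected = True      # the read-only mapping refuses the write
assert write_rejected

ro.close(); os.close(fd); os.remove(path)
```

The enforcement happens at the mapping level, so no cooperation from the reader is required: a misbehaving reader simply cannot corrupt the window.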
  • the size of the shared memory window is not limited, and can be a fixed size, or the size of the shared memory window can be determined by the target process module 11.
  • a virtualization management module that provides virtualization services for multiple virtual machines 12 is also deployed on the host 10, which is not shown in Figure 1.
  • the virtualization management module is responsible for the creation of the virtual machine 12.
  • When the virtual machine 12 is created, the hardware resources required for its virtualization are reserved, and a physical address space of a certain size is reserved.
  • The physical address space is mounted on the virtual machine as a peripheral component interconnect (PCI) device, for example exposed through a base address register (BAR) of that device.
  • the reserved physical address space is used for the virtual machine to cache shared data.
  • When the target process module 11 applies for the shared memory window, it also provides the physical address of the shared memory window to the virtualization management module. When the multiple virtual machines are started, the virtualization management module maps the physical address space reserved by each of the multiple virtual machines to the shared memory window according to that physical address, so that each virtual machine can directly access the shared memory window and read the required first shared data from it, thereby completing the exposure of the shared memory window.
  • A very small space is reserved in the header area of the reserved physical address space to store the description information of the shared memory window. This description information describes which shared data is stored in the shared memory window and the position offsets of that shared data within the window; for example, the range from address offset D1 to address offset D2 in the shared memory window holds one piece of readable shared data, where D1 and D2 represent address offsets in the shared memory window. Based on this, the virtual machine can read the corresponding shared data from the shared memory window.
  • the description information of the shared memory window may include that the address offset from A1 (such as 0x0000 0000 0000) to A3 (such as 0x0000 0000 3F00 0000) in the shared memory window is the inode data, the address offset from B1 to B3 is the inode victim cache data, and the address offset from C1 to C3 is the directory entry data.
  • The above description information may also specifically include that the address offset from A1 to A2 in the shared memory window is the readable data in the inode data, and the address offset from A2 to A3 is the bitmap area in the inode data; the same applies to the inode victim cache data and the directory entry (dentry) data.
  • A1, A2, A3, B1, B3, C1 and C3 represent address offsets in the shared memory space; for example, 0x0000 0000 0000 and 0x0000 0000 3F00 0000 are address offsets represented as hexadecimal 64-bit integers, and are examples of address offset A1 and address offset A3, respectively.
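A possible encoding of such description information, assuming it is stored as little-endian 64-bit offsets in the window's header area, can be sketched as follows. The format string and the B/C example values are illustrative assumptions; only the A1/A3 values come from the example above.

```python
import struct

# Six unsigned 64-bit offsets: (A1, A3) inode data, (B1, B3) inode
# victim cache data, (C1, C3) directory entry data.
HEADER_FMT = "<6Q"

def pack_header(a1, a3, b1, b3, c1, c3) -> bytes:
    """Serialize the description information for the header area."""
    return struct.pack(HEADER_FMT, a1, a3, b1, b3, c1, c3)

def unpack_header(raw: bytes):
    """Parse the description information back into offset pairs."""
    return struct.unpack(HEADER_FMT, raw)

header = pack_header(0x0000_0000_0000, 0x0000_0000_3F00_0000,  # A1, A3
                     0x4000_0000, 0x5F00_0000,                 # B1, B3 (assumed)
                     0x6000_0000, 0x7F00_0000)                 # C1, C3 (assumed)

assert unpack_header(header) == (0x0, 0x3F00_0000, 0x4000_0000,
                                 0x5F00_0000, 0x6000_0000, 0x7F00_0000)
assert len(header) == 48  # 6 offsets x 8 bytes each
```

A VM that maps the window can parse this fixed-size header first and then address each metadata region by its offsets.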
  • The implementation of the virtualization management module is not limited; for example, it can be, but is not limited to, a virtual machine monitor (hypervisor), or a combination of the Quick Emulator (QEMU) and a kernel-based virtual machine (KVM) within the hypervisor.
  • QEMU is responsible for the creation of the virtual machine and related work
  • KVM is responsible for the mapping of the reserved physical address space to the shared memory window.
  • the use of the shared memory window is not limited.
  • The shared data between multiple virtual machines with a data sharing relationship has multiple types. Based on this, the shared memory window can be divided into multiple memory areas according to the type of the shared data, with different memory areas used to store different types of shared data. FIG. 2 is a schematic diagram of the state of the shared memory window divided into multiple memory areas. Taking the shared data as file metadata as an example, and assuming that the file metadata includes three types of metadata, namely index node (inode) data, victim cache data of the index node, and directory entry data, the shared memory window can be divided into three memory areas, recorded as memory area A, memory area B, and memory area C, as shown in FIG. 2.
  • Memory area A is used to store the index node data;
  • memory area B is used to store the victim cache data of the index node;
  • memory area C is used to store the directory entry data.
  • the index node data is a data structure that represents the corresponding file status.
  • the index node data records the file name, file version number, file size, file access rights, file disk location, when it is created/modified, whether it is a file or a directory, etc.
  • the index node data is indexed by the index node number (inode number).
  • Directory entry data is a kernel data structure that is responsible for mapping file names to corresponding inode numbers.
  • Directory entry data records the correspondence between file names and inode numbers, as well as the inode number of the previous level (referred to as the parent inode number). Based on directory entry data, file operations can be based on the inode number of the file, rather than relying on the storage path of the file.
  • the victim cache data of the index node refers to the data stored in the victim cache.
  • the victim cache is a relatively small cache space used to store the replaced index node data when the index node data must be replaced from its memory area. In this way, the virtual machine can check whether the latest version of the file data is cached locally through the victim cache data of the index node and the index node data in the memory area.
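The division into typed memory areas can be sketched as a small routing table from metadata type to a (start, end) offset range, mirroring memory areas A, B and C of FIG. 2; the boundary values below are assumed for illustration and are not from the embodiments.

```python
# Assumed offset ranges for the three memory areas of FIG. 2.
AREAS = {
    "inode":  (0x0000_0000, 0x3F00_0000),  # memory area A: index node data
    "victim": (0x3F00_0000, 0x7E00_0000),  # memory area B: inode victim cache data
    "dentry": (0x7E00_0000, 0xBD00_0000),  # memory area C: directory entry data
}

def area_for(shared_data_type: str) -> tuple:
    """Select the memory area adapted to the type of the shared data."""
    return AREAS[shared_data_type]

assert area_for("victim") == (0x3F00_0000, 0x7E00_0000)
```

Keeping one area per type means the reader only needs the type from the description information to narrow its search to a single contiguous region.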
  • When the virtual machine 12 needs to read first shared data, the description information corresponding to the first shared data can be determined according to the shared data read request, and this description information includes the type and the identification information of the first shared data. According to the type of the first shared data, the first memory area adapted to that type is determined from the multiple memory areas; according to the identification information of the first shared data, the first shared data is read from the first memory area. Depending on the type of the first shared data, the first memory area will also be different.
  • when the type of the first shared data is index node data, rather than directory entry data or the victim cache data of the index node, the identification information of the first shared data is the inode number in the index node data.
  • the first memory area is the memory area A shown in FIG2.
  • when the type of the first shared data is the victim cache data of the index node, rather than directory entry data or index node data, the identification information of the first shared data is the inode number in the victim cache data of the index node. Accordingly, the first memory area is the memory area B shown in FIG. 2.
  • when the type of the first shared data is directory entry data, rather than index node data or the victim cache data of the index node, the identification information of the first shared data is the inode number in the directory entry data. Accordingly, the first memory area is the memory area C shown in FIG. 2.
  • the description information corresponding to the second shared data can be determined according to the shared data write request, and the description information includes the type and identification information of the second shared data; according to the type of the second shared data, a second memory area adapted to the type of the second shared data is determined from multiple memory areas; the second shared data is written into the second memory area and the second shared data is read. Depending on the type of the second shared data, the second memory area will also be different.
  • when the type of the second shared data is index node data, not directory entry data or the victim cache data of the index node, the identification information of the second shared data is the inode number in the index node data. Accordingly, the second memory area is the memory area A shown in FIG. 2.
  • when the type of the second shared data is the victim cache data of the index node, not directory entry data or index node data, the identification information of the second shared data is the inode number in the victim cache data of the index node. Accordingly, the second memory area is the memory area B shown in FIG. 2.
  • when the type of the second shared data is directory entry data, not index node data or the victim cache data of the index node, the identification information of the second shared data is the inode number in the directory entry data. Accordingly, the second memory area is the memory area C shown in FIG. 2.
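The type-to-area dispatch described above can be sketched in C as follows. This is an illustrative sketch only; all type and function names are assumptions, not from this application (area A: inode data, area B: inode victim cache data, area C: directory entry data).

```c
#include <assert.h>

/* Illustrative mapping of each shared-data type to its memory area:
 * area A holds index node data, area B holds the victim cache data of
 * the index node, area C holds directory entry data. */

typedef enum { INODE_DATA, INODE_VICTIM_CACHE, DENTRY_DATA } shared_type_t;
typedef enum { AREA_A, AREA_B, AREA_C } area_id_t;

area_id_t area_for_type(shared_type_t t)
{
    switch (t) {
    case INODE_DATA:         return AREA_A; /* index node data             */
    case INODE_VICTIM_CACHE: return AREA_B; /* victim cache of index nodes */
    default:                 return AREA_C; /* directory entry data        */
    }
}
```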
  • this embodiment does not limit the size of multiple memory areas and the size relationship between them.
  • the size of memory area A can be configured to be, for example, 140 bytes
  • the size of memory area B can be configured to be, for example, 15 bytes
  • the size of memory area C can be configured to be, for example, 50 bytes, etc.
  • since the size of each memory area is limited, a memory area will eventually fill up as shared data is continuously generated; this embodiment therefore allows replacement operations on the shared data in each memory area.
  • multithreading can be used to write to the same memory area in parallel.
  • an error will occur if the same position is written at the same time.
  • to avoid such conflicts, each memory area is divided into multiple slot sets.
  • parallel writing is allowed between different slot sets, that is, write operations can be performed on multiple slot sets at the same time. Since write operations are performed in different slot sets, write errors caused by writing to the same position at the same time can be avoided.
  • each slot set includes multiple slots (slots), and each slot is a small storage space in the memory area, which is used to store a shared data of the corresponding type of the memory area to which it belongs.
  • the number of slot sets contained in the memory area is not limited, nor is the number of slots contained in the slot set limited, which can be flexibly determined according to application requirements.
  • in FIG. 3, taking any memory area as an example, the memory area is divided into a plurality of slot groups, namely set_0 to set_N, and each slot group includes 4 slots for illustration, where N is a natural number greater than or equal to 1.
  • a hash method can be used to disperse the shared data in the memory area into different slot groups. Furthermore, a hash calculation can be performed on the identification information of each shared data, the shared data with the same hash result can be stored in the same slot group, and a mapping relationship between the slot group and the hash result can be established. Based on this, when the virtual machine needs to read the first shared data from the shared memory window, it can determine the first slot in which the first shared data is stored in the shared memory window, and read the first shared data from the first slot.
  • the first slot is a slot in any slot group in the first memory area, and the first memory area is a memory area adapted to the type of the first shared data.
  • the description information corresponding to the first shared data can be determined according to the shared data read request, and the description information includes the type and identification information of the first shared data; according to the type of the first shared data, the first memory area is determined from multiple memory areas; then, according to the hash result of the identification information of the first shared data, the first slot group to which the first shared data belongs in the first memory area is determined; according to the identification information of the shared data stored in each slot in the first slot group, the first slot where the first shared data is located is determined from the first slot group. Specifically, the identification information of the first shared data can be compared with the identification information of the shared data stored in each slot in the first slot group, so as to determine the first slot where the first shared data is located.
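As a hedged sketch of the lookup just described (hash the identification information to pick a slot group, then compare stored identifiers within the group), the following C fragment uses illustrative sizes, names, and a Knuth multiplicative hash; none of these specifics come from the application itself.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define SLOT_GROUPS     8   /* set_0 .. set_7 (illustrative)      */
#define SLOTS_PER_GROUP 4   /* 4 slots per group, as in FIG. 3    */

typedef struct {
    uint64_t id;            /* identification info (inode number); 0 = free */
    char     payload[56];   /* one shared data of the area's type           */
} slot_t;

typedef struct { slot_t slots[SLOTS_PER_GROUP]; } slot_group_t;
typedef struct { slot_group_t groups[SLOT_GROUPS]; } memory_area_t;

/* Hash the identification information to select a slot group. */
static uint32_t group_of(uint64_t id)
{
    return (uint32_t)((id * 2654435761u) % SLOT_GROUPS);
}

/* Read path: find the slot holding `id`, or NULL if not cached. */
slot_t *find_slot(memory_area_t *area, uint64_t id)
{
    slot_group_t *g = &area->groups[group_of(id)];
    for (int i = 0; i < SLOTS_PER_GROUP; i++)
        if (g->slots[i].id == id)
            return &g->slots[i];
    return NULL;
}

/* Write path: store `id` in the first free slot of its group. */
slot_t *store_slot(memory_area_t *area, uint64_t id, const char *data)
{
    slot_group_t *g = &area->groups[group_of(id)];
    for (int i = 0; i < SLOTS_PER_GROUP; i++) {
        if (g->slots[i].id == 0) {
            g->slots[i].id = id;
            strncpy(g->slots[i].payload, data, sizeof g->slots[i].payload - 1);
            return &g->slots[i];
        }
    }
    return NULL; /* group full: slot reclamation would run here */
}
```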
  • the second slot for storing the second shared data can be determined from the shared memory window; and the second shared data can be written into the second slot.
  • the second slot is a slot in any slot group in the second memory area, and the second memory area is a memory area adapted to the type of the second shared data.
  • the description information corresponding to the second shared data can be determined according to the shared data write request, and the description information includes the type and identification information of the second shared data; according to the type of the second shared data, the second memory area adapted to the type of the second shared data is determined from multiple memory areas; according to the hash result of the identification information of the second shared data, the second slot group to which the second shared data belongs in the second memory area is determined; according to the access information of each slot in the second slot group, the second slot for storing the second shared data is determined from the second slot group.
  • an idle slot can be selected from the second slot group as the second slot to store the second shared data; when all the slots in the second slot group are occupied, the target process module 11 can replace the shared data in a certain slot in an old data replacement (evicting older versions) manner to achieve slot recycling, and use the recycled slot as the second slot to store the second shared data.
  • the old data replacement method adopted by the target process module 11 is not limited.
  • the access frequency and access time of the shared data on each slot can be recorded, and the least recently accessed replacement method can be adopted to replace the shared data with the least recently accessed data; or, the storage time of the shared data on each slot can be counted, and the shared data with the earliest storage time can be replaced.
  • the embodiment of the present application provides a simple and efficient old data replacement method to achieve slot recycling, which may be referred to as reclaiming slots using a clock algorithm.
  • each memory area also includes a bitmap area, which includes bit groups corresponding to each slot group in the memory area to which it belongs, and each bit group is used to record the access information of whether each slot in the corresponding slot group has been accessed.
  • One bit corresponds to one slot, and each bit is a small storage space in a shared memory window.
  • each bit can take two values, which are recorded as a first value and a second value.
  • the first value indicates that the slot corresponding to the bit has been accessed, that is, the slot has been written with shared data (that is, it has been occupied), and the second value indicates that the slot corresponding to the bit has not been accessed or can be replaced, that is, the slot has not been written with shared data (that is, it is in an idle state).
  • the values of the first value and the second value are not limited. As shown in FIG. 4, the first value can be 1 and the second value can be 0, that is, the slot corresponding to the bit with a value of 1 has been occupied, and the slot corresponding to the bit with a value of 0 has not been accessed or can be replaced.
  • the target process module 11 can traverse the access information of each slot recorded in the bit group corresponding to the second slot group. For each bit traversed, if its value is the first value, the bit is set to the second value to indicate that the shared data in the corresponding slot can be replaced when the bit is traversed again, and traversal continues with the subsequent bits until a target bit with the second value is encountered for the first time; the slot corresponding to the target bit is used as the second slot for storing the second shared data.
  • in a special case, all slots in the second slot group have been accessed, that is, the value of each bit in the bit group corresponding to the second slot group is the first value, such as 1. This requires traversing from the first bit to the last bit and then starting again from the beginning; since the value of each bit was set to the second value, such as 0, during the first pass, the shared data in the first slot can now be replaced, and the first slot can be used as the second slot to store the second shared data.
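The clock-style sweep described above can be sketched as a second-chance loop over the bit group. Field and function names here are illustrative assumptions; one byte per bit is used for clarity rather than a packed bitmap.

```c
#include <stdint.h>
#include <assert.h>

#define SLOTS_PER_GROUP 4

typedef struct {
    uint8_t bit[SLOTS_PER_GROUP]; /* 1 = accessed/occupied, 0 = reclaimable */
    int     hand;                 /* position where the last sweep stopped  */
} clock_group_t;

/* Returns the index of the slot to reuse for the new shared data:
 * bits set to the first value (1) are cleared (second chance); the first
 * bit found already holding the second value (0) yields the target slot. */
int clock_reclaim(clock_group_t *g)
{
    for (;;) {
        int i = g->hand;
        g->hand = (g->hand + 1) % SLOTS_PER_GROUP;
        if (g->bit[i] == 0) {
            g->bit[i] = 1;   /* slot is reused, so mark it accessed again */
            return i;
        }
        g->bit[i] = 0;       /* clear: replaceable if traversed again */
    }
}
```

In the special case where every bit is 1, the sweep clears the whole group and wraps around, so the first slot is reclaimed, matching the description above.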
  • the method of storing shared data in the shared memory window is not limited, and any method that can successfully store shared data is applicable to the embodiment of the present application.
  • the following structure is used when storing shared data in the shared memory window:
  • for each shared data, two data storage units and one version number storage unit are allocated.
  • the two data storage units are used to store the same shared data
  • the version number storage unit is used to store a temporary version number.
  • the temporary version number has parity (even/odd), and the parity of the temporary version number is used to identify the data storage unit in the two data storage units that is currently effective for read operations and write operations respectively; wherein, when it is necessary to read shared data from the shared memory window, it is necessary to read from the "data storage unit effective for read operations", and when it is necessary to write shared data to the shared memory window, it is necessary to write to the "data storage unit effective for write operations" to support simultaneous read and write operations of the same shared data.
  • the parity of the temporary version number can change with the occurrence of write operations in the slot to ensure that the latest written shared data can be effective for read operations after each write operation is completed, so that the virtual machine can read the latest shared data.
  • a storage location for storing the first shared data can be determined, which storage location includes two data storage units and a version number storage unit; a temporary version number is read from the version number storage unit, and the first data storage unit for the read operation is determined based on the parity of the temporary version number read; the intermediate shared data is read from the first data storage unit, and after the reading is completed, the temporary version number read is compared with the temporary version number in the version number storage unit contained in the storage location; if the two are the same, the intermediate shared data is used as the first shared data; if the two are inconsistent, the operation of reading the first shared data from the first slot is re-executed according to the aforementioned steps.
  • a storage location for storing the second shared data can be determined, which includes two data storage units and a version number storage unit; a temporary version number is read from the version number storage unit, and according to the parity of the read temporary version number, the second data storage unit effective for the write operation in the two data storage units included in the storage location is determined; the second shared data is written to the second data storage unit, and after the writing is completed, the parity of the temporary version number is changed to change the second data storage unit to be effective for the read operation.
  • the specific value of the temporary version number is not of concern; only its parity is of concern.
  • the method of using two data storage units and one version number storage unit to store shared data does not rely on the division of the shared memory window; traversal or polling can be used to determine the storage location storing the first shared data or the storage location for storing the second shared data, and this is not limited here.
  • the method of storing shared data using two data storage units and one version number storage unit can be combined with the slots divided by the shared memory window.
  • each slot includes two data storage units and one version number storage unit.
  • the two data storage units are used to store the same shared data.
  • the version number storage unit is used to store a temporary version number.
  • the temporary version number has parity, and the parity of the temporary version number is used to identify the data storage unit in the two data storage units that is currently effective for read operations and write operations respectively.
  • when the virtual machine reads the first shared data from the shared memory window, it can determine the first slot in the manner described above, and then read the first shared data from the first slot in the following manner: read the temporary version number from the first slot, and determine the first data storage unit of the two data storage units contained in the first slot that is effective for the read operation based on the parity of the read temporary version number; read the intermediate shared data from the first data storage unit, and after the reading is completed, compare the read temporary version number with the temporary version number in the first slot; if the two are the same, it means that the target process module 11 has not modified the first shared data during the reading, so the intermediate shared data can be used as the first shared data; if the two are inconsistent, it means that the target process module 11 has modified the first shared data during the reading, and the read intermediate shared data is not the latest version, so the operation of reading the first shared data from the first slot must be re-executed according to the previous steps.
  • the second slot can be determined in the manner described above; then the second shared data is written into the second slot in the following manner, specifically: based on the parity of the temporary version number in the second slot, the second data storage unit that is effective for write operations among the two data storage units included in the second slot is determined; the second shared data is written into the second data storage unit, and after the writing is completed, the parity of the temporary version number is changed to change the second data storage unit to be effective for read operations.
  • the method of changing the parity of the temporary version number is not limited.
  • for example, after each write is completed, the temporary version number can be increased by 1, thereby continuously changing its parity.
  • two values can also be pre-set, an odd number, such as 1, and an even number, such as 2, and then, after each write is completed, the temporary version number is switched between these two values, thereby continuously changing the parity of the temporary version number.
  • the correspondence between the parity of the temporary version number and which of the two data storage units is effective for write operations and which for read operations is not limited.
  • the two data storage units are recorded as data storage unit D1 and data storage unit D2.
  • for example, when the temporary version number is an odd number, the shared data in the data storage unit D1 is effective for write operations, and the shared data in the data storage unit D2 is effective for read operations; when the temporary version number becomes an even number, the shared data in the data storage unit D1 is effective for read operations, and the shared data in the data storage unit D2 is effective for write operations.
  • alternatively, when the temporary version number is an even number, the shared data in the data storage unit D1 is effective for write operations, and the shared data in the data storage unit D2 is effective for read operations; when the temporary version number becomes an odd number, the shared data in the data storage unit D1 is effective for read operations, and the shared data in the data storage unit D2 is effective for write operations.
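The two-unit, parity-versioned storage structure described above resembles a seqlock with double buffering. Below is a simplified single-writer C sketch under assumed names; a real cross-VM implementation would additionally need memory barriers and atomic accesses, which are omitted here for clarity.

```c
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint64_t version;   /* temporary version number; its parity selects
                           which unit is currently effective for reads   */
    uint64_t data[2];   /* two data storage units for the same datum     */
} vslot_t;

/* Writer (the single writer, i.e. the target process module):
 * write into the unit effective for writes, then flip parity by
 * incrementing the version so the new data becomes effective for reads. */
void vslot_write(vslot_t *s, uint64_t value)
{
    s->data[1 - (s->version & 1)] = value; /* write-effective unit */
    s->version++;                          /* parity flip: publish */
}

/* Reader (any virtual machine): read the read-effective unit, then
 * re-check the version; if it changed mid-read, retry. */
uint64_t vslot_read(const vslot_t *s)
{
    for (;;) {
        uint64_t v = s->version;       /* remember temporary version  */
        uint64_t d = s->data[v & 1];   /* read-effective unit         */
        if (v == s->version)           /* unchanged: d is consistent  */
            return d;
        /* version moved during the read: re-execute the read */
    }
}
```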
  • the specific implementation method of sharing data between virtual machines 12 is not limited.
  • the virtual machine 12 can be used as a user of shared data, and data sharing can be performed between these virtual machines 12; in other application scenarios, upper-layer applications can be deployed in the operating environment provided by the virtual machine 12, and these upper-layer applications deployed in the virtual machine 12 can be used as users of shared data, and data sharing can be performed between these upper-layer applications.
  • the virtual machine 12 also includes a data sharing management module 121, which is responsible for responding to the shared data read request initiated by the upper-layer application deployed in the virtual machine 12, reading the first shared data from the shared memory window, and performing corresponding operations according to the first shared data.
  • the data sharing management module 121 performs corresponding operations according to the first shared data specifically as follows: returning the first shared data to the upper-layer application, so that the upper-layer application can process the first shared data or perform subsequent operations according to the first shared data.
  • the implementation method of the upper-layer application is not limited, for example, it can be various video applications, mail applications, instant messaging applications, image processing applications, etc.
  • the data sharing solution provided in the embodiment of the present application is applicable to various application scenarios.
  • the following takes file metadata sharing in a cloud computing scenario as an example to describe in detail the technical solution of the embodiment of the present application.
  • FIG5 is a schematic diagram of the structure of a file metadata sharing system 200 provided in an embodiment of the present application.
  • the system 200 includes: a target process module 21 deployed on the same host 20, a host file system 22, and multiple virtual machines 23 with a data sharing relationship, and a client file system 24 is deployed in the multiple virtual machines 23.
  • multiple hosts 20 are used as an example for illustration.
  • the virtual machine 23, as a running environment, can carry various upper-layer applications of the tenant. These upper-layer applications will generate data during the running process, and will also access data generated by other upper-layer applications or other data.
  • various data are organized and managed in the form of a file system.
  • a file system is deployed in the virtual machine 23, which is recorded as a client file system 24.
  • the host 20 has its own file system, which is recorded as a host file system 22.
  • the host file system 22 cooperates with the client file system 24 in each virtual machine 23 to provide file-related services for the upper-layer applications in the virtual machine 23, such as creating files, deleting files, opening files, reading files, renaming files, etc.
  • the file metadata sharing system 200 also includes: multiple storage nodes 25, which are distributed in different locations and are responsible for storing the files in the system 200.
  • the client file system 24 obtains the corresponding file metadata and initiates a file access request to the host file system 22 based on the obtained file metadata.
  • the host file system 22 performs file access operations on the corresponding storage node 25 according to the file metadata provided by the client file system 24, such as creating a file, deleting a file, opening a file, reading a file, renaming a file, etc.
  • the file metadata includes relevant information required to perform the file access operation, such as the file name, the location of the file on the disk, the type of the file, etc. For the description of the file metadata, please refer to the above embodiment, which will not be repeated here.
  • the file metadata is shared between the virtual machines 23 having a data sharing relationship.
  • a shared memory window is allocated to these virtual machines 23 on the host 20, and the shared file metadata between the multiple virtual machines 23 is stored through the local shared memory window, so as to realize local access to the file metadata, shorten the access path of the file metadata, and improve the access performance of the file metadata.
  • a target process module 21 is deployed on the host 20.
  • the target process module 21 may be a daemon process on the host 20, and is used to allocate a shared memory window for multiple virtual machines 23 from the physical space of the host 20.
  • the shared memory window is used to store file metadata shared between multiple virtual machines 23, is visible to all multiple virtual machines 23, and has a read-only attribute for multiple virtual machines 23.
  • the shared memory window is readable and writable to the target process module 21.
  • when an upper-layer application in any virtual machine 23 needs to perform a file access operation, it can initiate a file metadata read operation to the client file system 24 in the virtual machine 23 where it is located.
  • the client file system 24 responds to the file metadata read operation initiated by the upper-layer application in the virtual machine 23 to which it belongs, reads the first file metadata from the shared memory window and returns it to the upper-layer application, so that the upper-layer application can perform the file access operation through the host file system 22 according to the first file metadata.
  • the first file metadata refers to the file metadata that needs to be read from the shared memory window, which can be any file metadata.
  • when the upper-layer application needs to query relevant file metadata, it can also initiate a file metadata read operation to the client file system 24 in the virtual machine 23 where it is located; this is not limited to the case of performing a file access operation.
  • the file metadata read operation can be, but is not limited to: a request to obtain a file attribute value, a request to find a file directory, and a request to open a file.
  • the shared memory window is transparent.
  • the upper-level application initiates a request to obtain file attribute values, search for file directories, or open files, the upper-level application is unaware of the client file system 24 performing file metadata-related read operations from the shared memory window. For the upper-level application, it is as if these operations are completed in the kernel of the virtual machine.
  • the metadata of the accessed file may change. For example, when the name of a file is modified, the file name in the file metadata will change. For another example, when a file is modified, the most recent modification time of the file in the file metadata will change. These changes will generate new file metadata. In order for each virtual machine 23 to perceive the latest file metadata, it is necessary to write the new file metadata into the shared memory window. For the convenience of description, the newly generated file metadata is referred to as the second file metadata. In this embodiment, the host file system 22 can determine whether to generate new second file metadata according to the result of the file access operation during the execution of the file access operation.
  • if so, the second file metadata is generated according to the execution result of the file access operation, and a file metadata write operation is initiated to the target process module 21; the target process module 21 responds to the file metadata write operation initiated by the host file system 22, and writes the second file metadata into the shared memory window so that multiple virtual machines can share the second file metadata.
  • the host file system 22 can provide the second file metadata to the target process module 21.
  • the second file metadata can be provided to the target process module 21 by way of but not limited to: Remote Procedure Call (RPC), so that the target process module 21 can write the second file metadata into the shared memory window.
  • the target process module 21 and the shared memory window provide file metadata services between the client file system 24 and the host file system 22, so that file metadata can be shared across virtual machines on the same host, and short-read and long-write of file metadata can be realized.
  • short-read refers to the virtual machine 23 directly reading the required first file metadata from the shared memory window without passing through the target process module 21
  • long-write refers to the writing of the second file metadata into the shared memory window through the target process module 21, rather than the host file system 22 directly writing into the shared memory window.
  • the long-write method can avoid data errors or security risks caused by multiple virtual machines performing write operations on the shared memory space at the same time.
  • the target process module can access the host file system using an interface supported by the host file system, such as the POSIX API, to ensure compatibility between the target process module and the host file system. Based on this, the target process module can call a memory allocation command to inform the host file system to allocate memory space for it, and the target process module exposes the allocated memory space as a shared memory window to each virtual machine. Specifically, the target process module provides the physical address of the shared memory window to the virtualization management module; when multiple virtual machines are started, the virtualization management module maps the physical address space reserved by each of the multiple virtual machines to the shared memory window according to the physical address of the shared memory window, thereby completing the exposure of the shared memory window.
  • the physical address spaces reserved by each virtual machine can be the same or different, that is, for different virtual machines, the same shared memory window can be mapped to different private memory addresses (i.e., reserved physical address spaces).
  • the reserved physical address space can also include description information of the shared memory window, so that the virtual machine 23 can read file metadata from the shared memory window according to the description information. The relevant description of the description information of the shared memory window can be found in the aforementioned embodiment, which will not be repeated here.
  • the target process module may perform a zeroing operation on the shared memory window to provide a basis for subsequently writing file metadata into the shared memory window.
  • the shared memory window is managed and maintained from the following dimensions, as shown below:
  • the shared memory window can be divided into three memory areas, such as memory area A, memory area B, and memory area C shown in FIG2.
  • Memory area A is used to store inode data
  • memory area B is used to store inode victim cache data
  • memory area C is used to store dentry data.
  • a storage structure for storing file metadata in a memory area: that is, for each file metadata, two data storage units and one version number storage unit are used to store it; the two data storage units are used to store the same file metadata, the version number storage unit is used to store a temporary version number, the temporary version number has parity (even/odd), and the parity of the temporary version number is used to identify the data storage unit in the two data storage units that is currently effective for read operations and write operations, respectively.
  • This embodiment also exemplarily provides the data structures of the index node data, the victim cache data, and the directory entry data, as well as the storage structure of each type of data when stored in the memory area.
  • the schematic definitions of these data structures and storage structures are as follows:
  • each data structure is padded to align with the size of the cache line. If the size of the file metadata is smaller than the cache line size, it is aligned up to a power of 2, which helps improve the read and write speed when accessing the file metadata in the shared memory window. In addition, whether reading file metadata from the shared memory window or writing file metadata to the shared memory window, the data structure of each file metadata and the storage structure of each file metadata in the memory area are followed.
  • each slot group includes multiple slots for storing different file metadata.
  • the file metadata can be used as an element in a slot.
  • Each element can find the slot group to which it belongs through its hash fingerprint. For example, for the two types of file metadata, inode data and inode victim cache data, their inode numbers can be hashed and mapped to the corresponding slot group according to the hash result; for directory entry data, their parent inode number and file name can be hashed together and mapped to the corresponding slot group according to the hash result.
  • An example of slot groups and slots is shown in Figure 3.
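  • A hedged sketch of the hash-fingerprint mapping described above (the hash function, key encoding, and group count here are assumptions, not from the source):

```python
import hashlib

NUM_SLOT_GROUPS = 1024  # assumed group count per memory area

def group_for_inode(inode_number):
    # inode data and inode victim cache data: hash the inode number
    digest = hashlib.blake2b(str(inode_number).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SLOT_GROUPS

def group_for_dentry(parent_inode, file_name):
    # directory entry data: hash parent inode number and file name together
    key = f"{parent_inode}\x00{file_name}".encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SLOT_GROUPS

# The mapping is deterministic, so every client file system locates the
# same slot group for the same piece of metadata.
assert group_for_inode(42) == group_for_inode(42)
assert 0 <= group_for_dentry(1, "etc") < NUM_SLOT_GROUPS
```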
  • the file metadata in the shared memory window supports atomic updates, so multiple client file systems can simultaneously retrieve the required file metadata from the shared memory window in a non-blocking manner.
  • this section describes the reading and writing process of the file metadata.
  • Read operation of file metadata: according to the file metadata read request, determine the description information corresponding to the first file metadata to be read. For example, if the first file metadata is inode data, its description information may include the inode number; if it is inode victim cache data, its description information may include the inode number; if it is directory entry data, its description information may include the parent inode number and the file name.
  • Hash the description information corresponding to the first file metadata, and determine the slot group where the first file metadata is located based on the hash result, assuming it is the first slot group. Determine the first slot where the first file metadata is located based on the identification information of the shared data stored in each slot in the first slot group. Read the temporary version number from the first slot, and based on its parity determine the first data storage unit, among the two data storage units contained in the first slot, that is effective for the read operation. Read the intermediate metadata from the first data storage unit, and after the reading is completed, compare the previously read temporary version number with the temporary version number now in the first slot. If the two are the same, the target process module 21 has not modified the first file metadata during the read, so the intermediate metadata can be used as the first file metadata. If the two are inconsistent, the target process module 21 has modified the first file metadata during the read, the intermediate metadata is not the latest version of the data, and it is necessary to re-execute the operation of reading the first file metadata from the first slot.
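  • The read protocol above resembles a seqlock reader; a minimal sketch follows (the slot layout, names, and even-version-reads-unit-0 convention are assumptions for illustration):

```python
class Slot:
    def __init__(self, units, version=0):
        self.units = units      # [unit0, unit1], same metadata duplicated
        self.version = version  # temporary version number

def read_metadata(slot):
    """Non-blocking read: retry if the version changed mid-read, which
    means the target process module modified the slot concurrently."""
    while True:
        v_before = slot.version
        data = slot.units[v_before % 2]  # unit effective for reads
        if slot.version == v_before:      # version unchanged: consistent
            return data
        # version moved: a write raced with this read; try again

assert read_metadata(Slot(["old", "new"], version=1)) == "new"
```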
  • Write operation of file metadata: according to the file metadata write request, determine the description information corresponding to the second file metadata to be written. For example, if the second file metadata is inode data, its description information may include the inode number; if it is inode victim cache data, its description information may include the inode number; if it is directory entry data, its description information may include the parent inode number and the file name.
  • Hash the description information corresponding to the second file metadata, and determine the slot group where the second file metadata is located according to the hash result, assuming it is the second slot group. According to the access information of each slot in the second slot group, determine the second slot for storing the second file metadata from the second slot group. According to the parity of the temporary version number in the second slot, determine the second data storage unit, among the two data storage units contained in the second slot, that is effective for the write operation. Write the second file metadata into the second data storage unit, and after the writing is completed, change the parity of the temporary version number so that the second data storage unit becomes effective for read operations.
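  • The single-writer update above can be sketched as follows (all names are hypothetical; the parity convention — even version means unit 0 is read-effective — is an assumption):

```python
class Slot:
    def __init__(self):
        self.units = [None, None]
        self.version = 0  # even: unit 0 read-effective, unit 1 write-effective

def write_metadata(slot, data):
    """Only the target process module writes: fill the write-effective
    unit, then flip the version's parity so readers atomically switch
    to the freshly written copy."""
    slot.units[(slot.version + 1) % 2] = data
    slot.version += 1  # parity change makes the new unit read-effective

slot = Slot()
write_metadata(slot, "dentry:/var/log")
assert slot.units[slot.version % 2] == "dentry:/var/log"
```

  • Note that the previously read-effective unit is left intact until the parity flips, which is why concurrent readers never observe a half-written copy.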
  • each memory area includes a bitmap area in addition to a plurality of slot groups.
  • the bitmap area includes a bit group corresponding to each slot group in the memory area to which it belongs.
  • Each bit group is used to record access information of whether each slot in the corresponding slot group has been accessed.
  • One bit corresponds to one slot, and each bit occupies a small storage space in the shared memory window. On this basis, slots can be recycled according to the clock algorithm: when determining the second slot, the access information of each slot recorded in the bit group corresponding to the second slot group can be traversed.
  • For each bit position traversed, if the value of the bit position is the first value, the bit position is set to the second value to indicate that the file metadata in the slot corresponding to the bit position can be replaced when the bit position is traversed again; traversal then continues over the subsequent bit positions until a target bit position whose value is the second value is reached for the first time, and the slot corresponding to the target bit position is used as the second slot for storing the second file metadata.
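  • A hedged sketch of the clock-style slot selection above, using a plain list to stand in for a slot group's bit group (function and variable names are assumptions):

```python
def select_victim(bits, hand=0):
    """Scan the bit group: a bit holding the first value (1, accessed)
    is given a second chance and cleared to the second value (0); the
    first second-value bit reached identifies the slot to reuse.
    Returns (slot_index, new_hand position)."""
    n = len(bits)
    while True:
        i = hand % n
        if bits[i] == 1:      # recently accessed: clear and move on
            bits[i] = 0
            hand += 1
        else:                 # replaceable: reuse this slot
            return i, hand + 1

bits = [1, 1, 0, 1]
idx, hand = select_victim(bits)
assert idx == 2               # first bit already holding the second value
assert bits[:2] == [0, 0]     # earlier bits were given a second chance
```

  • Even if every bit holds the first value, the scan terminates: the first pass clears the bits it visits, so the second pass finds a replaceable slot.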
  • the file metadata in the shared memory window is responsible for serving most non-write requests, including requests to obtain file attribute values, requests to search for file directories, and requests to open files.
  • the request to obtain file attribute values can be implemented by calling the getattr function
  • the request to search for file subdirectories can be implemented by calling the lookup function
  • the request to open files can be implemented by calling the open function.
  • getattr function: used to obtain the attributes of files, directories, or folders.
  • lookup function: according to the d_name member of the dentry of the current path, searches the parent directory file of the current directory (represented by an inode) for the inode number of the current directory.
  • open function: used to open or create a file. The getattr function, the lookup function, and the open function are existing functions in traditional file systems and are not further described here.
  • Opening a file is a special case that requires all three functions mentioned above; it is a combination of the three operations.
  • the client file system first searches the file path to confirm whether the file exists, and reads the relevant attributes of the file through the shared memory window to determine whether the user has access rights. After that, the client file system assigns a file descriptor to the upper-layer application and returns it.
  • the upper-layer application can perform the file open operation based on the file descriptor.
  • Lookup: iterate along the file path and search for file metadata in the shared memory window, using the file name and parent inode number to confirm the directory to which the file belongs. If the directory to which the file belongs cannot be found, "not found" is returned, and the request is sent to the target process module (such as a daemon process) to help retrieve the directory to which the file belongs.
  • Getattr: when searching along a file directory, look up the inode data under the path in the shared memory window. If the inode data is not found, the request is sent to the target process module (such as a daemon process) to help retrieve it.
  • the system 200 further includes: a master node 26, which is used to store and maintain the various file metadata of each file, such as inode data, directory entry data, and victim cache data.
  • the complete file metadata is stored on the master node 26.
  • the master node 26 can be a remote server. When the client file system cannot query or retrieve the required file metadata from the shared memory window, it can request the target process module to obtain the required file metadata from the master node 26.
  • the target process module is also used to: when the first file metadata is not stored in the shared memory window, obtain the first file metadata from the master node 26 and write the first file metadata into the shared memory window, so that multiple virtual machines can read the first file metadata; and when the second file metadata is written into the shared memory window, synchronize the second file metadata to the master node, so that the master node saves the second file metadata.
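  • The cooperation between the shared memory window, the target process module, and the master node described above amounts to a read-through / write-through cache. A simplified sketch, with plain dicts standing in for the window and the master node's metadata store (all names are hypothetical):

```python
def read_with_fallback(window, master, key):
    """Read path: serve from the shared memory window when possible;
    on a miss, the target process module fetches the metadata from the
    master node and publishes it so every virtual machine can read it."""
    if key in window:
        return window[key]  # local short-path read
    data = master[key]      # fetch from the (possibly remote) master node
    window[key] = data      # publish into the shared memory window
    return data

def write_through(window, master, key, data):
    """Write path, performed only by the target process module: update
    the window and synchronize the new metadata to the master node."""
    window[key] = data
    master[key] = data

window, master = {}, {"inode:7": {"size": 4096}}
assert read_with_fallback(window, master, "inode:7") == {"size": 4096}
assert "inode:7" in window  # subsequent reads for this key stay local
```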
  • a shared memory window is allocated to cache the file metadata that each virtual machine needs to share, and the virtual machines and the target process module cooperate to achieve local access to the file metadata through short reads and long writes, shortening the access path of the file metadata and improving its access performance;
  • the shared memory window has a read-only attribute for these virtual machines, that is, the virtual machine can read but not write to the shared memory window, and a target process module is added to the host.
  • the target process module is uniformly responsible for writing the file metadata, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window at the same time.
  • any modification of the file metadata can be immediately observed by each virtual machine, so that the consistency of the file metadata can be achieved for each virtual machine, and there is no need for synchronization through network transmission, which can further improve the access performance of the file metadata.
  • FIG6 is a flow chart of a data sharing method provided in an embodiment of the present application.
  • the method is applied to a data sharing system, which includes a target process module deployed on the same host and multiple virtual machines having a data sharing relationship.
  • the method includes:
  • Step 601: the target process module allocates a shared memory window from the physical space of the host, where the shared memory window is used to store shared data between multiple virtual machines deployed on the host and has a read-only attribute for the multiple virtual machines.
  • Step 602: the multiple virtual machines read the required first shared data from the shared memory window, and perform corresponding operations according to the first shared data;
  • Step 603: the target process module writes the second shared data into the shared memory window, so that the multiple virtual machines can share the second shared data.
  • The execution order of step 602 and step 603 is not limited in the embodiment of the present application.
  • the two can be executed sequentially or in parallel, depending on the generation of the second shared data and the virtual machine's need for the first shared data.
  • the target process module allocates a shared memory window from the physical space of the host, including: the target process module may apply for a shared memory window in the physical space of the host, and provide the physical address of the shared memory window to the virtualization management module on the host, so that the virtualization management module can map the physical address space reserved by each of the multiple virtual machines to the shared memory window. Further optionally, the target process module may apply for the shared memory window from the physical space of the host in response to an instruction to start the multiple virtual machines, that is, when the virtual machines are started.
  • the shared memory window is divided into multiple memory areas, and different memory areas are used to store different types of shared data; each memory area includes multiple slot groups, each slot group includes multiple slots, and different slots are used to store different shared data.
  • each slot includes two data storage units and a version number storage unit, the two data storage units are used to store the same shared data, the version number storage unit is used to store a temporary version number, the parity of the temporary version number is used to identify the data storage unit of the two data storage units that is currently effective for read operations and write operations respectively, and the parity of the temporary version number changes with the occurrence of a write operation in the slot.
  • multiple virtual machines read required first shared data from a shared memory window, including: when the first shared data is needed, determining a first slot storing the first shared data from the shared memory window; reading the first shared data from the first slot; the first slot is a slot in any slot group in a first memory area, and the first memory area is a memory area adapted to the type of the first shared data.
  • determining the first slot storing the first shared data from the shared memory window includes: determining first description information corresponding to the first shared data according to a shared data read request, the first description information including a type and identification information of the first shared data; determining the first memory area from multiple memory areas according to the type of the first shared data; determining a first slot group to which the first shared data belongs in the first memory area according to a hash result of the identification information of the first shared data; and determining the first slot where the first shared data is located from the first slot group according to the identification information of the shared data stored in each slot in the first slot group.
  • reading the first shared data from the first slot includes: reading a temporary version number from the first slot, and determining, based on the parity of the read temporary version number, a first data storage unit of the two data storage units contained in the first slot that is effective for the read operation; reading the intermediate shared data from the first data storage unit, and after the reading is completed, comparing the read temporary version number with the temporary version number in the first slot; if the two are the same, using the intermediate shared data as the first shared data; if the two are inconsistent, re-executing the operation of reading the first shared data from the first slot.
  • the target process module writes the second shared data into the shared memory window, including: when the second shared data is generated between multiple virtual machines, determining a second slot for storing the second shared data from the shared memory window; writing the second shared data into the second slot; the second slot is a slot in any slot group in the second memory area, and the second memory area is a memory area adapted to the type of the second shared data.
  • determining a second slot for storing the second shared data from the shared memory window includes: determining second description information corresponding to the second shared data according to a shared data write request, the second description information including a type and identification information of the second shared data; determining a second memory area from a plurality of memory areas according to the type of the second shared data; determining a second slot group to which the second shared data belongs in the second memory area according to a hash result of the identification information of the second shared data; and determining the second slot for storing the second shared data from the second slot group according to access information of each slot in the second slot group.
  • any memory area also includes a bitmap area, the bitmap area includes bit groups corresponding to each slot group in the memory area to which it belongs, and a bit group is used to record access information of whether each slot in the slot group corresponding to it has been accessed.
  • the second slot for storing the second shared data is determined from the second slot group, including: traversing the access information corresponding to each slot recorded in the bit group corresponding to the second slot group, for the traversed bit position, if the value of the bit position is the first value, the bit position is set to the second value, and the subsequent bit positions are traversed until the target bit position with the value of the second value is traversed for the first time, and the slot corresponding to the target bit position is used as the second slot; the first value indicates that the slot corresponding to the bit position has been accessed, and the second value indicates that the slot corresponding to the bit position has not been accessed or can be replaced.
  • writing the second shared data into the second slot includes: determining, based on the parity of the temporary version number in the second slot, a second data storage unit among the two data storage units included in the second slot that is effective for write operations; writing the second shared data into the second data storage unit, and after the writing is completed, changing the parity of the temporary version number to change the second data storage unit to be effective for read operations.
  • FIG7a is a flow chart of another data sharing method provided in an embodiment of the present application.
  • the method is applied to a data sharing system, which includes a target process module deployed on the same host and multiple virtual machines having a data sharing relationship.
  • the method of this embodiment is described from the perspective of the target process module, as shown in FIG7a, the method includes:
  • For the detailed implementation of step 701a and step 702a, please refer to the corresponding steps in the embodiment shown in FIG6, which will not be described in detail here.
  • FIG7b is a flow chart of another data sharing method provided in an embodiment of the present application.
  • the method is applied to a data sharing system, which includes a target process module deployed on the same host and multiple virtual machines having a data sharing relationship.
  • the method of this embodiment is described from the perspective of any virtual machine, as shown in FIG7b, the method includes:
  • For the detailed implementation of step 701b and step 702b, please refer to the corresponding steps in the embodiment shown in Figure 6, which will not be repeated here.
  • the detailed implementation and beneficial effects of each step in the above method embodiments have been described in detail in the above embodiments, and will not be elaborated here.
  • FIG8 is a schematic diagram of the structure of a data sharing device provided in an embodiment of the present application.
  • the device can be implemented as a target process module in the above embodiment, but is not limited thereto. As shown in FIG8 , the device includes:
  • An allocation module 81 configured to allocate a shared memory window from the physical space of the host, wherein the shared memory window is used to store shared data between multiple virtual machines and has a read-only attribute for multiple virtual machines so that the multiple virtual machines can read the required first shared data from the shared memory window;
  • the writing module 82 is used to write the second shared data into the shared memory window so that multiple virtual machines can share the second shared data.
  • the allocation module 81 is specifically used to: apply for a shared memory window from the physical space of the host in response to an instruction to start multiple virtual machines; and provide the physical address of the shared memory window to the virtualization management module on the host, so that the virtualization management module can map the physical address space reserved for each of the multiple virtual machines to the shared memory window.
  • the shared memory window is divided into multiple memory areas, and different memory areas are used to store different types of shared data; a memory area includes multiple slot groups, a slot group includes multiple slots, and different slots are used to store different shared data.
  • the writing module 82 is specifically used for: when second shared data is generated between multiple virtual machines, determining a second slot for storing the second shared data from the shared memory window; writing the second shared data into the second slot; the second slot is a slot in any slot group in the second memory area, and the second memory area is a memory area adapted to the type of the second shared data.
  • the write module 82 is specifically used to: determine second description information corresponding to the second shared data according to a shared data write request, the second description information including a type and identification information of the second shared data; determine the second memory area from multiple memory areas according to the type of the second shared data; determine the second slot group to which the second shared data belongs in the second memory area according to a hash result of the identification information of the second shared data; and determine the second slot for storing the second shared data from the second slot group according to the access information of each slot in the second slot group.
  • any memory area also includes a bitmap area, and the bitmap area includes bit groups corresponding to each slot group in the memory area to which it belongs, and a bit group is used to record access information of whether each slot in the slot group corresponding to it has been accessed.
  • the writing module 82 is specifically used to: traverse the access information corresponding to each slot recorded in the bit group corresponding to the second slot group, and for the traversed bit position, if the value of the bit position is the first value, set the bit position to the second value, and continue to traverse the subsequent bits until the target bit position with the second value is traversed for the first time, and the slot corresponding to the target bit position is used as the second slot; the first value indicates that the slot corresponding to the bit position has been accessed, and the second value indicates that the slot corresponding to the bit position has not been accessed or can be replaced.
  • when writing the second shared data into the second slot, the writing module 82 is specifically configured to: determine, according to the parity of the temporary version number in the second slot, the second data storage unit, of the two data storage units included in the second slot, that is effective for write operations; write the second shared data into the second data storage unit, and after the writing is completed, change the parity of the temporary version number so that the second data storage unit becomes effective for read operations.
  • FIG9 is a schematic diagram of the structure of another data sharing device provided in an embodiment of the present application.
  • the device can be implemented as a shared data management module or a client file system in the virtual machine in the above embodiment, but is not limited thereto.
  • the device includes:
  • a reading module 91 is used to read the required first shared data from the shared memory window; wherein the shared memory window is used to store shared data between multiple virtual machines and has a read-only attribute for multiple virtual machines, and the shared data in the shared memory window is written by the target process module; an execution module 92 is used to perform corresponding operations according to the first shared data.
  • the shared memory window is divided into multiple memory areas, and different memory areas are used to store different types of shared data; a memory area includes multiple slot groups, a slot group includes multiple slots, and different slots are used to store different shared data.
  • the reading module 91 is specifically used to: when the first shared data is needed, determine the first slot storing the first shared data from the shared memory window; read the first shared data from the first slot; the first slot is a slot in any slot group in the first memory area, and the first memory area is a memory area adapted to the type of the first shared data.
  • the reading module 91 is specifically used to: determine first description information corresponding to the first shared data according to a shared data read request, the first description information including the type and identification information of the first shared data; determine the first memory area from multiple memory areas according to the type of the first shared data; determine the first slot group to which the first shared data belongs in the first memory area according to a hash result of the identification information of the first shared data; and determine the first slot where the first shared data is located from the first slot group according to the identification information of the shared data stored in each slot in the first slot group.
  • when reading the first shared data from the first slot, the reading module 91 is specifically used to: read the temporary version number from the first slot, and determine, based on the parity of the read temporary version number, the first data storage unit, of the two data storage units contained in the first slot, that is effective for the read operation; read the intermediate shared data from the first data storage unit, and after the reading is completed, compare the read temporary version number with the temporary version number in the first slot; if the two are the same, use the intermediate shared data as the first shared data; if the two are inconsistent, re-execute the operation of reading the first shared data from the first slot.
  • the embodiment of the present application also provides a data sharing device, which includes: an allocation module, which is used to allocate a shared memory window from the physical space of the host, the shared memory window is used to store shared data between multiple virtual machines, and has a read-only attribute for multiple virtual machines; a reading module, which is used to read the required first shared data from the shared memory window; an execution module, which is used to perform corresponding operations according to the first shared data; and a writing module, which is used to write the second shared data into the shared memory window, so that multiple virtual machines can share the second shared data.
  • Fig. 10 is a schematic diagram of the structure of a host provided by an embodiment of the present application. As shown in Fig. 10, a target process module and multiple virtual machines with a data sharing relationship are deployed on the host, and the host includes: a memory 1001 and a processor 1002, and the processor 1002 is coupled to the memory 1001.
  • the memory 1001 is used to store the target process module and the program codes corresponding to the multiple virtual machines, and can be configured to store various other data to support the operation on the host. Examples of such data include instructions, messages, pictures, videos, etc. for any application or method operating on the host.
  • the processor 1002 executes the program code corresponding to the target process module in the memory 1001 to: allocate a shared memory window from the physical space of the host, the shared memory window is used to store shared data between multiple virtual machines, and has a read-only attribute for multiple virtual machines, so that multiple virtual machines can read the required first shared data from the shared memory window; write the second shared data into the shared memory window, so that multiple virtual machines can share the second shared data.
  • when allocating a shared memory window from the physical space of the host, the processor 1002 is specifically used to: apply for a shared memory window from the physical space of the host in response to an instruction to start multiple virtual machines; and provide the physical address of the shared memory window to the virtualization management module on the host, so that the virtualization management module can map the physical address space reserved for each of the multiple virtual machines to the shared memory window.
  • the shared memory window is divided into multiple memory areas, and different memory areas are used to store different types of shared data; a memory area includes multiple slot groups, a slot group includes multiple slots, and different slots are used to store different shared data.
  • when the processor 1002 writes the second shared data into the shared memory window, it is specifically used to: determine a second slot for storing the second shared data from the shared memory window when the second shared data is generated between multiple virtual machines; write the second shared data into the second slot; the second slot is a slot in any slot group in the second memory area, and the second memory area is a memory area adapted to the type of the second shared data.
  • when the processor 1002 determines the second slot from the shared memory window, it is specifically used to: determine second description information corresponding to the second shared data based on a shared data write request, the second description information including the type and identification information of the second shared data; determine the second memory area from multiple memory areas based on the type of the second shared data; determine the second slot group to which the second shared data belongs in the second memory area based on a hash result of the identification information of the second shared data; and determine the second slot for storing the second shared data from the second slot group based on the access information of each slot in the second slot group.
  • any memory area further includes a bitmap area, the bitmap area includes bit groups corresponding to each slot group in the memory area to which it belongs, and one bit group is used to record access information of whether each slot in the slot group corresponding to it has been accessed.
  • when the processor 1002 determines the second slot from the second slot group, it is specifically used to: traverse the access information corresponding to each slot recorded in the bit group corresponding to the second slot group, and for the traversed bit position, if the value of the bit position is the first value, set the bit position to the second value, and continue to traverse subsequent bit positions until the target bit position whose value is the second value is reached for the first time, and use the slot corresponding to the target bit position as the second slot; the first value indicates that the slot corresponding to the bit position has been accessed, and the second value indicates that the slot corresponding to the bit position has not been accessed or can be replaced.
  • when the processor 1002 writes the second shared data into the second slot, it is specifically used to: determine, based on the parity of the temporary version number in the second slot, the second data storage unit, among the two data storage units included in the second slot, that is effective for write operations; write the second shared data into the second data storage unit, and after the writing is completed, change the parity of the temporary version number so that the second data storage unit becomes effective for read operations.
  • the processor 1002 executes the program code corresponding to the virtual machine in the memory 1001 to: read the required first shared data from the shared memory window, and perform corresponding operations based on the first shared data; wherein the shared memory window is used to store shared data between multiple virtual machines, and has a read-only attribute for multiple virtual machines, and the shared data in the shared memory window is written by the target process module.
  • when the processor 1002 reads the required first shared data from the shared memory window, it is specifically used to: determine, when the first shared data is needed, the first slot storing the first shared data from the shared memory window; and read the first shared data from the first slot. The first slot is a slot in any slot group in the first memory area, and the first memory area is a memory area adapted to the type of the first shared data.
  • when the processor 1002 determines the first slot from the shared memory window, it is specifically used to: determine first description information corresponding to the first shared data according to a shared data read request, the first description information including the type and identification information of the first shared data; determine the first memory area from the multiple memory areas according to the type of the first shared data; determine the first slot group to which the first shared data belongs in the first memory area according to a hash result of the identification information of the first shared data; and determine the first slot where the first shared data is located from the first slot group according to the identification information of the shared data stored in each slot in the first slot group.
  • when the processor 1002 reads the first shared data from the first slot, it is specifically used to: read a temporary version number from the first slot, and determine, based on the parity of the read temporary version number, the first data storage unit that is effective for read operations among the two data storage units included in the first slot; read the intermediate shared data from the first data storage unit, and after the reading is completed, compare the read temporary version number with the temporary version number in the first slot; if the two are the same, use the intermediate shared data as the first shared data; if the two are inconsistent, re-execute the operation of reading the first shared data from the first slot.
  • on the one hand, the processor 1002 executes the program code corresponding to the target process module in the memory 1001 so as to: allocate a shared memory window from the physical space of the host, the shared memory window being used to store shared data between multiple virtual machines and having a read-only attribute for the multiple virtual machines, and write the second shared data into the shared memory window so that the multiple virtual machines can share the second shared data; on the other hand, the processor 1002 executes the program code corresponding to the virtual machine in the memory 1001 so as to: read the first shared data required by the virtual machine from the shared memory window, and perform corresponding operations according to the first shared data. The detailed implementation of each operation can be found in the aforementioned embodiments and will not be repeated here.
  • the host also includes: communication component 1003, display 1004, power supply component 1005, audio component 1006 and other components.
  • FIG. 10 only schematically shows some components, which does not mean that the host includes only the components shown in FIG. 10.
  • the components in the dotted boxes in FIG. 10 are optional rather than mandatory components, and the specific components depend on the product form of the host.
  • the host of this embodiment can be implemented as a terminal device such as a desktop computer, a laptop computer, a smartphone, or an IoT device, or as a server-side device such as a conventional server, a cloud server, or a server array.
  • if the host of this embodiment is implemented as a terminal device such as a desktop computer, a laptop computer, or a smartphone, it may include the components in the dotted boxes in FIG. 10; if it is implemented as a server-side device such as a conventional server, a cloud server, or a server array, it may not include the components in the dotted boxes in FIG. 10.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program.
  • when the computer program is executed by a processor, the processor is enabled to implement each step in the above-mentioned method embodiments.
  • An embodiment of the present application also provides a computer program product, including a computer program/instructions.
  • when the computer program/instructions are executed by a processor, the processor is enabled to implement the steps in the above-mentioned method embodiments.
  • the above-mentioned memory can be implemented by any type of volatile or non-volatile storage device or a combination of them, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, disk or optical disk.
  • the above-mentioned communication component is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination thereof.
  • the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
  • the communication component also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the above-mentioned display includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
  • the power supply assembly provides power to various components of the device where the power supply assembly is located.
  • the power supply assembly may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device where the power supply assembly is located.
  • the above-mentioned audio component can be configured to output and/or input audio signals.
  • the audio component includes a microphone (Microphone, MIC), and when the device where the audio component is located is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal can be further stored in a memory or sent via a communication component.
  • the audio component also includes a speaker for outputting audio signals.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-readable storage media (including but not limited to disk storage, compact disc read-only memory (CD-ROM), optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • a computing device includes one or more processors (Central Processing Unit, CPU), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media that can be used to store information by any method or technology. Information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this application provide a data sharing method, device, system, and storage medium. In the embodiments of this application, a shared memory window is allocated on the host for virtual machines that have a data sharing relationship, and the shared data between the multiple virtual machines is stored in the local shared memory window, enabling local access to the shared data, shortening its access path, and improving its access performance. In addition, the shared memory window is readable but not writable for the virtual machines, a target process module is added on the host, and all writing of shared data is handled uniformly by the target process module, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window at the same time. Moreover, any modification of the shared data can be observed immediately by each virtual machine, so consistency of the shared data is achieved for all virtual machines without synchronization over the network, further improving the access performance of the shared data.

Description

Data sharing method, device, system, and storage medium
This application claims priority to Chinese patent application No. 202310219577.0, entitled "Data sharing method, device, system and storage medium" and filed with the China National Intellectual Property Administration on March 3, 2023, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud storage technology, and in particular to a data sharing method, device, system, and storage medium.
Background
In cloud computing scenarios, to reduce the risk of data loss, data is distributed across different storage nodes for storage and managed through a file system to provide reliable and efficient data access. In a file system, data is divided into files and metadata. Metadata is system data that describes a file, such as its access permissions, its owner, and the distribution information of the file data. To operate on a file, a user needs to access its metadata to determine the file's location or related attributes and then complete the corresponding file operation.
In cloud computing scenarios, virtual machines (VMs) share some data, such as file metadata. This metadata is stored on a remote server, and a VM must fetch it from the remote server. When a VM modifies the metadata, the modification must be written back to the remote server and then synchronized to the other VMs through the remote server. This guarantees metadata consistency but degrades metadata access performance. How to guarantee both the consistency of shared data between VMs and its access performance is a technical problem that urgently needs to be solved.
Summary
Various aspects of this application provide a data sharing method, device, system, and storage medium, so as to shorten the access path of shared data and improve its access performance while guaranteeing the consistency of the shared data between VMs.
In a first aspect, an embodiment of this application provides a data sharing method applied to a data sharing system. The data sharing system includes a target process module and multiple virtual machines having a data sharing relationship that are deployed on the same host. The method includes: the target process module allocates a shared memory window from the physical space of the host, where the shared memory window is used to store shared data between the multiple virtual machines and has a read-only attribute for the multiple virtual machines; the multiple virtual machines read required first shared data from the shared memory window and perform corresponding operations according to the first shared data; and the target process module writes second shared data into the shared memory window so that the multiple virtual machines can share the second shared data.
In a second aspect, an embodiment of this application provides a host on which a target process module and multiple virtual machines having a data sharing relationship are deployed. The host includes a memory and a processor. The memory is used to store program code corresponding to the target process module and the multiple virtual machines; the processor, coupled to the memory, is used to execute the program code corresponding to the target process module so as to implement the steps in the data sharing method provided in the first aspect of this application.
In a third aspect, an embodiment of this application provides a data sharing system, including a target process module and multiple virtual machines having a data sharing relationship that are deployed on the same host. The target process module is used to implement the steps performed by the target process module in the method provided in the first aspect, and the multiple virtual machines are used to implement the steps performed by the virtual machines in the method provided in the first aspect.
In a fourth aspect, an embodiment of this application further provides a file metadata sharing system, including a target process module, a host file system, and multiple virtual machines having a data sharing relationship that are deployed on the same host, where a guest file system is deployed in the multiple virtual machines. The target process module is used to allocate a shared memory window from the physical space of the host, where the shared memory window is used to store file metadata shared between the multiple virtual machines and has a read-only attribute for the multiple virtual machines. The guest file system is used to respond to a file metadata read operation initiated by an upper-layer application in the virtual machine to which it belongs, read first file metadata from the shared memory window, and return it to the upper-layer application. The target process module is further used to respond to a file metadata write operation initiated by the host file system and write second file metadata into the shared memory window so that the multiple virtual machines can share the second file metadata.
In a fifth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the processor is enabled to implement the steps in the data sharing method provided in the first aspect of this application.
In the embodiments of this application, for multiple virtual machines deployed on the same host that have a data sharing relationship, a shared memory window is allocated on the host for these virtual machines, and the shared data between the multiple virtual machines is stored in the local shared memory window, enabling local access to the shared data, shortening its access path, and improving its access performance. In addition, the shared memory window has a read-only attribute for these virtual machines — the virtual machines can read but not write the shared memory window — and a target process module is added on the host so that all writing of shared data is handled uniformly by the target process module, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window at the same time. Furthermore, based on the shared memory window, any modification of the shared data can be observed immediately by each virtual machine, so consistency of the shared data is achieved for all virtual machines; and since this consistency requires no synchronization between different VMs over the network, the access performance of the shared data is further improved.
Brief Description of the Drawings
The drawings described here are used to provide further understanding of this application and constitute a part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation of it. In the drawings:
FIG. 1 is a schematic structural diagram of a data sharing system provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a usage mode in which the shared memory window is divided into multiple memory areas, provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a usage mode in which a memory area is divided into multiple slot groups and slots, provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a usage mode in which a memory area contains slot groups, slots, and a bitmap area, provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a data sharing system for file metadata provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of a data sharing method provided by an embodiment of this application;
FIG. 7a is a schematic flowchart of another data sharing method provided by an embodiment of this application;
FIG. 7b is a schematic flowchart of yet another data sharing method provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a data sharing apparatus provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of another data sharing apparatus provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a host provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be described clearly and completely below in conjunction with specific embodiments of this application and the corresponding drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
It should be noted that the user information (including but not limited to user device information, personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, with corresponding operation entrances provided for users to choose to authorize or refuse.
In cloud computing scenarios, a cloud service provider offers various cloud computing resources to tenants. A tenant can purchase or deploy its own virtual machines on these resources and use them to carry its applications or services, and the various data produced by these applications or services at runtime is distributed across different storage nodes. From the perspective of the virtual machines, data sharing may be needed, but since the existing data is scattered across different storage nodes, the access path of the shared data is long; moreover, to guarantee the consistency of the shared data, different virtual machines also need to synchronize the shared data, leading to poor access performance. Cloud computing scenarios thus face the technical problem of how to balance the access performance and the consistency of shared data.
To address the above technical problem, the embodiments of this application provide a solution on a per-host basis: for multiple virtual machines deployed on the same host that have a data sharing relationship, a shared memory window is allocated on the host for these virtual machines, and the shared data between them is stored in the local shared memory window, enabling local access to the shared data, shortening its access path, and improving its access performance. At the same time, the shared memory window has a read-only attribute for these virtual machines — they can read but not write it — and a target process module is added on the host so that writing of shared data is handled uniformly by the target process module, which avoids the data errors that could be caused by multiple virtual machines writing to the shared memory window simultaneously. Based on the shared memory window, any modification of the shared data can be observed immediately by each virtual machine, so consistency of the shared data is achieved for all virtual machines; furthermore, this consistency requires no synchronization between different VMs over the network, further improving the access performance of the shared data.
Further, on the basis of improved access performance of the shared data, the efficiency of subsequent operations based on the shared data can be improved and their latency reduced. In particular, when the shared data is file metadata, shortening the access path and increasing the access speed of the file metadata helps improve the efficiency and reduce the latency of file operations based on that metadata.
The technical solutions provided by the embodiments of this application are described in detail below in conjunction with the drawings.
FIG. 1 is a schematic structural diagram of a data sharing system 100 provided by an embodiment of this application. As shown in FIG. 1, the data sharing system 100 includes a target process module 11 and multiple virtual machines 12 having a data sharing relationship that are deployed on the same host 10. In this embodiment, there may be one or more hosts 10; FIG. 1 illustrates the case of a single host 10. When there are multiple hosts 10, at least some of them are deployed with a target process module 11 and multiple virtual machines 12 having a data sharing relationship. The embodiments of this application focus on a host 10 deployed with a target process module 11 and multiple virtual machines 12 having a data sharing relationship. The host 10 of this embodiment may be any physical device with storage, computing, and communication capabilities, for example a server device such as a conventional server, a server array, or a server cluster, a terminal device such as a mobile phone, a laptop computer, or a desktop computer, or a network device such as a base station or a gateway.
In this embodiment, a data sharing relationship between virtual machines 12 means that the virtual machines 12 can share data with each other; simply put, the virtual machines 12 can use the same data, and the data that can be used jointly by the virtual machines 12 is referred to as shared data for short. From the perspective of the data sharing relationship, virtual machines 12 having such a relationship may be distributed on the same host 10 or on different hosts 10, which is not limited. This embodiment focuses on the data sharing process between virtual machines deployed on the same host 10 that have a data sharing relationship; data sharing between virtual machines across hosts is not addressed here — for example, virtual machines on different hosts may use a standard message passing interface (MPI) communication mechanism to operate on each other's data via message passing or remote memory access. For virtual machines 12 deployed on the same host 10, whether a data sharing relationship is formed between them can be flexibly determined according to application requirements. In some application scenarios, a tenant may deploy multiple virtual machines that provide the same service, in which case a data sharing relationship is formed between the multiple virtual machines 12 deployed on the same host 10 and belonging to that tenant. Of course, from the tenant's perspective, its virtual machines may be distributed on different hosts, and a data sharing relationship may also be formed between those virtual machines distributed on different hosts; this is not addressed here. In other application scenarios, different tenants may satisfy a user access control policy — for example, tenant A's data is authorized to another tenant B — in which case a data sharing relationship can also be formed between virtual machines 12 across tenants.
This embodiment limits neither the number of data sharing relationships that exist on a host 10 nor the number of virtual machines 12 having a data sharing relationship; these depend on the physical resources and processing capabilities of the host 10. For example, one or more groups of virtual machines 12 with data sharing relationships may exist on the same host 10, with different groups having different data sharing relationships; for instance, multiple virtual machines 12 of a first tenant and multiple virtual machines 12 of a second tenant may exist on the same host 10 at the same time, each tenant's virtual machines forming one data sharing relationship. The number of virtual machines 12 having the same data sharing relationship may be 2, 3, 5, 10, and so on, which is not limited either. In addition, the number of data sharing relationships and the number of virtual machines 12 having a data sharing relationship may be the same or different on different hosts 10, neither of which is limited.
In this embodiment, as shown in FIG. 1, taking the same host 10 as the unit, for multiple virtual machines 12 deployed on the same host 10 that have the same data sharing relationship, a shared memory window is allocated on that host 10 for these virtual machines, and the shared data between the multiple virtual machines is stored in the local shared memory window, enabling local access to the shared data, shortening its access path, and improving its access performance. Further, a target process module 11 is added on the host 10, and the target process module 11 maintains and manages the shared memory window. In this embodiment, the target process module 11 may be a process running on the host 10, for example a daemon process, but is not limited thereto. The target process module 11 may allocate the shared memory window from the physical space of the host 10 for the multiple virtual machines 12 having the same data sharing relationship, to store the shared data between the multiple virtual machines 12.
Further, to avoid errors in the shared data caused by multiple virtual machines 12 writing to the shared memory window simultaneously, the read/write attributes of the shared memory window are constrained during allocation: the shared memory window has a read-only attribute for the multiple virtual machines 12. That is, the multiple virtual machines 12 can only perform read operations on the shared memory window and cannot perform write operations on it. Simply put, the shared memory window enforces a read-only constraint on the virtual machines 12; since the virtual machines cannot write to the shared memory window, the problem of shared-data errors caused by simultaneous writes from multiple virtual machines 12 is solved.
However, the shared data between the virtual machines 12 changes dynamically. When new shared data is produced, it needs to be written into the shared memory window so that it can be shared among the multiple virtual machines. In the embodiments of this application, the target process module 11 is responsible for writing shared data into the shared memory window. This ensures that new shared data can be written into the shared memory window in time, while also solving the problem of shared-data errors caused by multiple virtual machines 12 writing to the shared memory window simultaneously.
Based on the above, when any virtual machine 12 needs shared data in the shared memory window, it can read the required shared data directly from the shared memory window and perform corresponding operations according to the shared data it reads. In addition, when new shared data that needs to be written into the shared memory window is produced among the multiple virtual machines, the target process module 11 writes that new shared data into the shared memory window so that the multiple virtual machines 12 can share it. For ease of description and distinction, the shared data that a virtual machine 12 needs to read from the shared memory window is denoted first shared data — the first shared data needed may differ between virtual machines 12 — and the shared data that needs to be written into the shared memory window is denoted second shared data. The producer of the second shared data can provide it to the target process module 11, for example by way of, but not limited to, a remote procedure call (RPC), so that the target process module 11 writes the second shared data into the shared memory window. The producer of the second shared data differs across application scenarios; taking the scenario shown in FIG. 5 as an example, the second shared data is specifically second file metadata, and the corresponding producer is the host file system.
It should be noted that the embodiments of this application do not limit the order in which a virtual machine 12 reads the first shared data from the shared memory window and the target process module 11 writes the second shared data into it. If the virtual machine 12 needs the first shared data first, it can read the first shared data from the shared memory window first; then, when the second shared data is produced, the target process module 11 writes the second shared data into the shared memory window. The production of the second shared data may or may not be related to the process or result of performing the corresponding operation according to the first shared data. If the second shared data is produced first, the target process module 11 writes it into the shared memory window first; then, when a virtual machine needs the first shared data, it can read it from the shared memory window, where the first shared data may be the previously written second shared data or other shared data in the shared memory window besides the second shared data. Of course, the read of the first shared data by the virtual machine 12 and the write of the second shared data by the target process module 11 may also be performed in parallel.
In addition, the shared data between virtual machines 12 having a data sharing relationship differs across application scenarios; accordingly, the scenarios and manners in which a virtual machine 12 needs the first shared data, the scenarios and manners in which the second shared data is produced, and the corresponding operations performed according to the first shared data also differ, none of which is limited. For example, taking a file management scenario in a virtualized environment, the shared data between the virtual machines 12 may be file metadata. File metadata is data describing the attributes of a file, including but not limited to the file's type, owner, size, last modification time, and so on. Accordingly, when a virtual machine opens, deletes, moves, or renames a file, it needs that file's metadata and can initiate a file metadata read operation to request reading the metadata from the shared memory window. If the metadata is read successfully, the corresponding file access operation — for example opening, deleting, moving, or renaming the file — can be performed according to it. Correspondingly, during the file access operation, the file's attribute information changes as the file is accessed, so new file metadata is produced; a file metadata write operation can then be initiated, and the target process module 11 perceives this write operation and writes the new file metadata into the shared memory window.
Whatever the shared-data application scenario, in this embodiment the solution is provided on a per-host basis: for multiple virtual machines deployed on the same host with a data sharing relationship, a shared memory window is allocated on the host for these virtual machines, and the shared data between them is stored in the local shared memory window, enabling local access, shortening the access path, and improving access performance. At the same time, the shared memory window has a read-only attribute for these virtual machines — they can read but not write it — and a target process module is added on the host so that writing of shared data is handled uniformly by the target process module, avoiding the data errors that could be caused by simultaneous writes from multiple virtual machines. Based on the shared memory window, any modification of the shared data can be observed immediately by each virtual machine, achieving shared-data consistency for all virtual machines; and since this consistency requires no network synchronization between different VMs, the access performance of the shared data is further improved.
The embodiments of this application do not limit how the target process module 11 allocates the shared memory window for the multiple virtual machines 12 having a data sharing relationship. In this embodiment, allocation of the shared memory window includes requesting the shared memory window and exposing it. Regarding requesting the shared memory window: the target process module 11 can invoke a memory allocation command to request the shared memory window from the operating system (OS) of the host 10. For example, as a normally running process, the target process module 11 can respond to an instruction to start virtual machines, determine that a shared memory window needs to be requested for these virtual machines having a data sharing relationship, and then access the OS of the host 10 through standard kernel access interfaces, for example the application programming interfaces (APIs) of the Portable Operating System Interface (POSIX), to request the shared memory window from the relevant unit of the OS, which allocates, from the physical space of the host 10, a shared memory window with a read-only attribute for the multiple virtual machines 12. That is, the API provided by the OS can set the access permission of the shared memory window to read-only; a write operation in a shared memory window whose access permission is read-only will raise an error and cause the virtual machine to crash, and this error mechanism constrains the virtual machines to read-only access of the shared memory window. This embodiment does not limit the size of the shared memory window; it may be a fixed size or may be determined by the target process module 11.
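As a minimal sketch of the fault-on-write enforcement described above — an illustration under stated assumptions, not the patent's implementation, which allocates the window through OS interfaces and exposes it to the VMs via a PCI BAR — a region mapped with read-only access can be read freely, while any write attempt raises an error:

```python
import mmap
import tempfile

def map_window_read_only(path, size):
    """Map an existing backing file as a read-only 'shared memory window'."""
    f = open(path, "rb")
    # ACCESS_READ makes every write attempt raise an error, mirroring the
    # read-only attribute enforced for the virtual machines.
    return mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)

def try_write(window):
    """Return True if a write into the read-only window is rejected."""
    try:
        window[0:1] = b"X"
        return False
    except (TypeError, ValueError):
        return True

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as backing:
        backing.write(b"\x00" * 4096)  # the daemon zeroes the window after allocation
        backing.flush()
        window = map_window_read_only(backing.name, 4096)
        print(window[:4])        # reading is allowed
        print(try_write(window)) # writing is rejected
```

The file-backed mapping here only stands in for the host physical memory the daemon would actually obtain; the point of the sketch is the read/write asymmetry, not the allocation mechanism.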
Regarding exposing the shared memory window: a virtualization management module that provides virtualization services for the multiple virtual machines 12 is also deployed on the host 10 (not shown in FIG. 1). The virtualization management module is responsible for creating the virtual machines 12; during creation it virtualizes the hardware resources needed by a virtual machine 12 and reserves a physical address space of a certain size, which is mounted on the virtual machine as a Peripheral Component Interconnect (PCI) device. To allow the virtual machine to access this physical address space after startup, a Base Address Register (BAR) is added in the PCI device configuration space, and the address of the physical address space is set in the BAR. In this embodiment, the reserved physical address space is used for the virtual machine to cache shared data. When the target process module 11 has obtained the shared memory window, it also provides the physical address of the shared memory window to the virtualization management module; when the multiple virtual machines start, the virtualization management module maps each virtual machine's reserved physical address space to the shared memory window according to the physical address of the shared memory window, so that the virtual machines can access the shared memory window directly and read the required first shared data from it, completing the exposure of the shared memory window. Further, a small space is reserved in the header area of the reserved physical address space to store description information of the shared memory window. This description information describes which shared data is stored in the shared memory window and the position offsets of that shared data within it — for example, that the range from address offset D1 to address offset D2 in the shared memory window is a certain readable piece of shared data — so that the virtual machines can read the corresponding shared data from the shared memory window, where D1 and D2 denote address offsets in the shared memory window.
Taking the file metadata mentioned below as an example of the shared data, where the file metadata includes index node (inode) data, inode victim cache data, and directory entry (dentry) data, the description information of the shared memory window may indicate that the range from address offset A1 (e.g., 0x0000 0000 0000 0000) to A3 (e.g., 0x0000 0000 3F00 0000) in the shared memory window holds inode data, the range from address offset B1 to B3 holds inode victim cache data, and the range from address offset C1 to C3 holds dentry data. Further, when the inode data includes slot groups and a bitmap area, the description information may additionally indicate that the range from address offset A1 to A2 holds the readable part of the inode data and the range from A2 to A3 holds the bitmap area of the inode data, and likewise for the inode victim cache data and the dentry data. Here A1, A2, A3, B1, B3, C1, C3, and so on denote address offsets in the shared memory space; for example, 0x0000 0000 0000 0000 and 0x0000 0000 3F00 0000 are address offsets expressed as 64-bit integers in hexadecimal, given as examples of offsets. Descriptions of the types of file metadata, slot groups, bitmap areas, and so on can be found in the following embodiments and are not detailed here.
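The header-resident description information can be viewed as a small table mapping each metadata type to an offset range in the window. The following sketch is purely illustrative — the type names and offset values are assumptions in the spirit of A1/A2/A3 above, not values taken from the patent:

```python
# Illustrative window descriptor: metadata type -> (start_offset, end_offset).
WINDOW_DESC = {
    "inode":        (0x0000_0000, 0x3F00_0000),
    "inode_victim": (0x3F00_0000, 0x4400_0000),
    "dentry":       (0x4400_0000, 0x5000_0000),
}

def region_for(kind):
    """Resolve the offset range holding the given metadata type."""
    try:
        return WINDOW_DESC[kind]
    except KeyError:
        raise ValueError(f"unknown shared-data type: {kind}")

def contains(kind, offset):
    """Check whether an offset falls inside the region for this type."""
    start, end = region_for(kind)
    return start <= offset < end
```

A guest reading the window would consult such a table first, then address the matching memory area directly.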
The embodiments of this application do not limit the implementation of the virtualization management module; it may be, for example but not limited to, a hypervisor, or a combination of the Quick EMUlator (QEMU) and the Kernel-based Virtual Machine (KVM) within a hypervisor, where QEMU is responsible for creating virtual machines and related work, and KVM is responsible for mapping the reserved physical address space to the shared memory window.
This embodiment does not limit how the shared memory window is used. In an optional embodiment, the shared data between the multiple virtual machines having a data sharing relationship is of multiple types; on this basis, the shared memory window can be divided into multiple memory areas by shared-data type, with different memory areas storing different types of shared data. FIG. 2 is a schematic diagram of the shared memory window divided into multiple memory areas. Taking file metadata as the shared data, and assuming the file metadata includes three types of metadata — inode data, inode victim cache data, and dentry data — the shared memory window can be divided into three memory areas, denoted memory area A, memory area B, and memory area C, as shown in FIG. 2, where memory area A stores inode data, memory area B stores inode victim cache data, and memory area C stores dentry data. Inode data is a data structure representing the state of the corresponding file; it records the file name, file version number, file size, the file's access permissions, the file's location on disk, when it was created/modified, whether it is a file or a directory, and so on, and inode data is indexed by the index node number (inode number). Dentry data is a kernel data structure responsible for mapping a file name to the corresponding inode number; it records the correspondence between file names and inode numbers, as well as the inode number of the level above it (the parent inode number for short). Based on dentry data, file operations can rely on a file's inode number instead of depending on the file's storage path. Inode victim cache data is data stored in a victim cache, a relatively small cache space used to hold inode data that has been evicted when it must be replaced out of its memory area; in this way a virtual machine can use the inode victim cache data together with the inode data in the memory area to check whether the latest version of the file data is cached locally.
When the shared memory window is divided into multiple memory areas and a virtual machine needs to read the first shared data from the shared memory window, it can determine, according to the shared data read request, the description information corresponding to the first shared data, including the type and identification information of the first shared data; determine, according to the type of the first shared data, the first memory area adapted to that type from the multiple memory areas; and read the first shared data from the first memory area according to the identification information of the first shared data. The first memory area differs with the type of the first shared data. If the first shared data is inode data, its type is inode data rather than dentry data or inode victim cache data, its identification information is the inode number in the inode data, and accordingly the first memory area is memory area A shown in FIG. 2. If the first shared data is inode victim cache data, its type is inode victim cache data rather than dentry data or inode data, its identification information is the inode number in the inode victim cache data, and accordingly the first memory area is memory area C shown in FIG. 2. If the first shared data is dentry data, its type is dentry data rather than inode data or inode victim cache data, its identification information is the inode number in the dentry data, and accordingly the first memory area is memory area B shown in FIG. 2.
Correspondingly, when the shared memory window is divided into multiple memory areas and the target process module 11 writes the second shared data into the shared memory window, it can determine, according to the shared data write request, the description information corresponding to the second shared data, including the type and identification information of the second shared data; determine, according to the type of the second shared data, the second memory area adapted to that type from the multiple memory areas; and write the second shared data into the second memory area. The second memory area differs with the type of the second shared data. If the second shared data is inode data, its type is inode data rather than dentry data or inode victim cache data, its identification information is the inode number in the inode data, and the second memory area is memory area A shown in FIG. 2. If the second shared data is inode victim cache data, its type is inode victim cache data rather than dentry data or inode data, its identification information is the inode number in the inode victim cache data, and the second memory area is memory area C shown in FIG. 2. If the second shared data is dentry data, its type is dentry data rather than inode data or inode victim cache data, its identification information is the inode number in the dentry data, and the second memory area is memory area B shown in FIG. 2.
Further optionally, this embodiment does not limit the sizes of the multiple memory areas or the size relationships between them. Taking memory areas A, B, and C shown in FIG. 2 as an example, considering that inode data is relatively plentiful, memory area A may be configured to be, for example, 140 bytes, memory area B for example 15 bytes, and memory area C for example 50 bytes. Considering that the size of each memory area is limited, the memory areas will overflow as shared data keeps being produced; this embodiment therefore allows replacement operations on the shared data in each memory area.
Further optionally, when the target process module 11 performs write operations in each memory area, multiple threads can be used to write to the same memory area in parallel to improve write efficiency. However, with parallel writes, errors occur if the same location is written simultaneously. To avoid errors in parallel writes, in this embodiment each memory area is further divided into multiple slot sets; different slot sets store different shared data, and parallel writes are allowed between different slot sets — that is, multiple slot sets can be written at the same time. Since the writes go to different slot sets, write errors caused by simultaneous writes to the same location are avoided. Further, each slot set includes multiple slots, each slot being a small piece of storage space in the memory area used to store one item of shared data of the type corresponding to the memory area it belongs to. This embodiment limits neither the number of slot sets in a memory area nor the number of slots in a slot set, which can be flexibly determined according to application requirements. As shown in FIG. 3, taking any memory area as an example, it is divided into multiple slot sets set_0 to set_N, each slot set containing 4 slots for illustration, where N is a natural number greater than or equal to 1.
In this embodiment, when a memory area is divided into multiple slot sets, hashing can be used to distribute the shared data in the memory area across the different slot sets. Further, hash computation can be performed on the identification information of each item of shared data, shared data with the same hash result is stored in the same slot set, and a mapping relationship between slot sets and hash results is established. On this basis, when a virtual machine needs to read the first shared data from the shared memory window, it can determine from the shared memory window the first slot storing the first shared data and read the first shared data from the first slot, where the first slot is a slot in any slot set in the first memory area, and the first memory area is the memory area adapted to the type of the first shared data. Specifically, it can determine, according to the shared data read request, the description information corresponding to the first shared data, including the type and identification information of the first shared data; determine the first memory area from the multiple memory areas according to the type of the first shared data; then determine, according to the hash result of the identification information of the first shared data, the first slot set to which the first shared data belongs in the first memory area; and determine the first slot where the first shared data is located from the first slot set according to the identification information of the shared data stored in each slot of the first slot set. Specifically, the identification information of the first shared data can be compared with the identification information of the shared data stored in each slot of the first slot set to determine the first slot where the first shared data is located.
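The lookup described above can be sketched as follows. The set count, the 4-slot set size, and the use of Python's built-in hash are illustrative assumptions — the patent fixes none of these concrete choices:

```python
NUM_SLOT_SETS = 1024   # illustrative; the embodiment does not fix this number
SLOTS_PER_SET = 4      # matches the 4-slot sets illustrated in FIG. 3

def slot_set_index(ident):
    """Hash the identification info (e.g. an inode number) to a slot set."""
    return hash(ident) % NUM_SLOT_SETS

def find_slot(slot_sets, ident):
    """Locate the slot holding `ident` by comparing stored identifiers."""
    candidate = slot_sets[slot_set_index(ident)]
    for slot in candidate:
        if slot is not None and slot["id"] == ident:
            return slot
    return None  # not present in the window

# usage sketch: store one entry and look it up again
slot_sets = [[None] * SLOTS_PER_SET for _ in range(NUM_SLOT_SETS)]
entry = {"id": 42, "data": "inode #42 metadata"}
slot_sets[slot_set_index(42)][0] = entry
```

Only the slots of one hashed set are scanned, so the comparison cost stays bounded by the (small) set size regardless of how much metadata the window holds.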
Correspondingly, when the target process module 11 writes the second shared data into the shared memory window, it can determine from the shared memory window the second slot for storing the second shared data and write the second shared data into the second slot, where the second slot is a slot in any slot set in the second memory area, and the second memory area is the memory area adapted to the type of the second shared data. Specifically, it can determine, according to the shared data write request, the description information corresponding to the second shared data, including the type and identification information of the second shared data; determine, according to the type of the second shared data, the second memory area adapted to that type from the multiple memory areas; determine, according to the hash result of the identification information of the second shared data, the second slot set to which the second shared data belongs in the second memory area; and determine, according to the access information of each slot in the second slot set, the second slot for storing the second shared data. Preferably, a free slot can be selected from the second slot set as the second slot to store the second shared data; when all slots in the second slot set are occupied, the target process module 11 can replace the shared data in some slot by a certain older-version eviction (evicting older version) scheme to reclaim the slot, and use the reclaimed slot as the second slot to store the second shared data.
This embodiment does not limit the eviction scheme adopted by the target process module 11. For example, the access frequency and access time of the shared data in each slot can be recorded and a least-recently-accessed replacement scheme adopted, replacing the least recently accessed shared data; alternatively, the storage time of the shared data in each slot can be tracked and the shared data with the earliest storage time replaced. In addition, the embodiments of this application provide a simple and efficient eviction scheme for reclaiming slots, which may be called clock-algorithm slot reclamation (clock algorithm reclaiming slots). As shown in FIG. 4, in addition to multiple slot sets, each memory area also includes a bitmap area; the bitmap area includes bit groups corresponding to the slot sets in the memory area it belongs to, each bit group records access information on whether each slot in the corresponding slot set has been accessed, one bit corresponds to one slot, and each bit is a small storage space in the shared memory window. In this embodiment, each bit can take two values, denoted the first value and the second value. The first value indicates that the slot corresponding to the bit has been accessed, that is, shared data has already been written into the slot (it is occupied); the second value indicates that the slot corresponding to the bit has not been accessed or can be replaced, that is, no shared data has yet been written into the slot (it is free). This embodiment does not limit the specific values: as shown in FIG. 4, the first value may be 1 and the second value 0, so that the slot corresponding to a bit of 1 is occupied, and the slot corresponding to a bit of 0 has not been accessed or can be replaced.
Based on the above, once the second slot set is determined, the target process module 11 can traverse the access information of each slot recorded in the bit group corresponding to the second slot set. For each traversed bit, if the value of the bit is the first value, the bit is set to the second value — indicating that the shared data in the corresponding slot can be replaced the next time this bit is traversed — and traversal continues with subsequent bits until the first target bit whose value is the second value is reached; the slot corresponding to that target bit is used as the second slot to store the second shared data. In a special case, all slots in the second slot set have been accessed, so all bits in the bit group corresponding to the second slot set hold the first value, for example 1. The traversal then proceeds from the beginning to the last bit and restarts from the beginning; since the bits were set to the second value, for example 0, during the previous pass, the shared data in the first slot can be replaced, and the first slot can then serve as the second slot to store the second shared data.
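The clock-style traversal described above can be modeled with a short sketch, where a plain Python list stands in for one slot set's bit group in the bitmap area (an illustrative model, not the in-window implementation):

```python
def reclaim_slot(bits):
    """Clock-algorithm scan over one slot set's bit group.

    1 (the first value) marks an accessed/occupied slot; 0 (the second value)
    marks a slot that is unused or may be replaced.  Each visited 1 is cleared
    to 0 so that its slot becomes replaceable on the next pass; the scan stops
    at the first 0 and returns that slot's index as the "second slot".
    """
    n = len(bits)
    i = 0
    while True:
        if bits[i] == 1:
            bits[i] = 0          # replaceable the next time we come around
            i = (i + 1) % n      # keep scanning; wraps past the last bit
        else:
            return i             # first bit holding the second value
```

In the all-ones special case the loop clears every bit, wraps to index 0, and returns the first slot — exactly the behavior described in the paragraph above.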
The embodiments of this application do not limit the way shared data is stored in the shared memory window; any way that can successfully store shared data is applicable. In an optional embodiment, to make it possible to read and write the same data in the shared memory window at the same time, the following structure is adopted when storing shared data in the shared memory window:
For each item of shared data, two data storage units and one version number storage unit are allocated. The two data storage units are used to store the same shared data, and the version number storage unit is used to store a temporary version number that has parity (even/odd); the parity of the temporary version number identifies which of the two data storage units is currently effective for read operations and which for write operations. When shared data needs to be read from the shared memory window, it must be read from the data storage unit effective for read operations; when shared data needs to be written into the shared memory window, it must be written into the data storage unit effective for write operations, thereby supporting simultaneous read and write of the same shared data. In addition, the parity of the temporary version number changes as write operations occur in the slot, ensuring that after each write operation completes, the most recently written shared data becomes effective for read operations, so that the virtual machines can read the latest shared data.
When a virtual machine reads the first shared data from the shared memory window, it can first determine the storage location storing the first shared data, which includes two data storage units and one version number storage unit; read the temporary version number from the version number storage unit and determine, according to the parity of the read temporary version number, the first data storage unit effective for read operations among the two data storage units at that storage location; read the intermediate shared data from the first data storage unit and, after the reading is completed, compare the read temporary version number with the temporary version number in the version number storage unit at that storage location; if the two are the same, use the intermediate shared data as the first shared data; if the two are inconsistent, re-execute the read operation according to the preceding steps. Correspondingly, when the target process module writes the second shared data into the shared memory, it can first determine the storage location for storing the second shared data, which includes two data storage units and one version number storage unit; read the temporary version number from the version number storage unit and determine, according to its parity, the second data storage unit effective for write operations among the two data storage units at that storage location; write the second shared data into the second data storage unit and, after the writing is completed, change the parity of the temporary version number so that the second data storage unit becomes effective for read operations. The embodiments of this application are not concerned with the value of the temporary version number but with its parity.
It should be noted that this way of storing shared data with two data storage units and one version number storage unit need not depend on the division of the shared memory window; traversal, polling, or similar means can then be used to determine the storage location storing the first shared data or the storage location for storing the second shared data, which is not limited. In a preferred embodiment, the way of storing shared data with two data storage units and one version number storage unit can be combined with the slots divided out of the shared memory window. In this combined solution, each slot includes two data storage units and one version number storage unit; the two data storage units store the same shared data, the version number storage unit stores a temporary version number that has parity, and the parity of the temporary version number identifies which of the two data storage units is currently effective for read operations and which for write operations.
Based on the above, when a virtual machine reads the first shared data from the shared memory window, it can determine the first slot in the manner described above and then read the first shared data from the first slot as follows: read the temporary version number from the first slot and determine, according to the parity of the read temporary version number, the first data storage unit effective for read operations among the two data storage units included in the first slot; read the intermediate shared data from the first data storage unit and, after the reading is completed, compare the read temporary version number with the temporary version number in the first slot. If the two are the same, the target process module 11 did not modify the first shared data during the read, so the intermediate shared data can be used as the first shared data; if the two are inconsistent, the target process module 11 modified the first shared data during the read and the intermediate shared data that was read is not the latest version, so the operation of reading the first shared data from the first slot must be re-executed according to the preceding steps.
Correspondingly, when the target process module writes the second shared data into the shared memory, it can determine the second slot in the manner described above and then write the second shared data into the second slot as follows: determine, according to the parity of the temporary version number in the second slot, the second data storage unit effective for write operations among the two data storage units included in the second slot; write the second shared data into the second data storage unit and, after the writing is completed, change the parity of the temporary version number so that the second data storage unit becomes effective for read operations.
This embodiment does not limit the way the parity of the temporary version number is changed. In an optional embodiment, the temporary version number can be incremented by 1 after each write completes, continually changing its parity. In another optional embodiment, two values can be preset — one odd, for example 1, and one even, for example 2 — and the temporary version number is switched between these two values after each write completes, continually changing its parity.
In addition, this embodiment does not limit the correspondence between the parity of the temporary version number and which of the two data storage units is effective for write operations and which for read operations. For ease of description, the two data storage units are denoted data storage unit D1 and data storage unit D2. In an optional embodiment, when the temporary version number is odd, the shared data in data storage unit D1 is effective for write operations and the shared data in data storage unit D2 is effective for read operations; when the temporary version number becomes even, the shared data in data storage unit D1 is effective for read operations and the shared data in data storage unit D2 is effective for write operations. Alternatively, in another optional embodiment, when the temporary version number is even, the shared data in data storage unit D1 is effective for write operations and the shared data in data storage unit D2 is effective for read operations; when the temporary version number becomes odd, the shared data in data storage unit D1 is effective for read operations and the shared data in data storage unit D2 is effective for write operations.
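The parity scheme can be condensed into a small single-threaded model. It adopts one of the two conventions above (the unit at index `ver % 2` is read-effective) purely as an illustration; real cross-VM memory ordering and atomicity are outside the scope of this sketch:

```python
class Slot:
    """One slot: two data storage units plus a temporary version number.

    Convention used here: the unit at index ver % 2 is effective for reads,
    the other unit for writes.  A completed write bumps the version, flipping
    its parity and redirecting readers to the freshly written unit.
    """

    def __init__(self):
        self.ver = 0
        self.units = [None, None]

    def write(self, value):
        # Performed only by the target process module.
        self.units[(self.ver + 1) % 2] = value
        self.ver += 1            # parity flip: the new unit is now read-effective

    def read(self):
        # Performed by any virtual machine, lock-free.
        while True:
            v = self.ver                 # snapshot the temporary version number
            data = self.units[v % 2]     # unit effective for reads
            if v == self.ver:            # unchanged -> no write raced the read
                return data
            # version moved: a concurrent write happened, so retry

# usage sketch
slot = Slot()
slot.write("metadata v1")
slot.write("metadata v2")
```

Readers never block writers and vice versa; an interrupted read simply retries, which is the non-blocking retrieval behavior the embodiments describe.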
The embodiments of this application do not limit the specific implementation of data sharing between the virtual machines 12. In some application scenarios, the virtual machines 12 themselves serve as the users of the shared data, and data sharing takes place between these virtual machines 12; in other application scenarios, upper-layer applications can be deployed in the runtime environment provided by the virtual machines 12, these upper-layer applications serve as the users of the shared data, and data sharing takes place between these upper-layer applications. In the scenario of sharing data between upper-layer applications deployed in the virtual machines 12, as shown in FIG. 1, a virtual machine 12 includes, in addition to the deployed upper-layer applications, a data sharing management module 121, which is responsible for responding to a shared data read request initiated by an upper-layer application deployed in the virtual machine 12, reading the first shared data from the shared memory window, and performing corresponding operations according to the first shared data. In an optional embodiment, the data sharing management module 121 performing corresponding operations according to the first shared data is specifically: returning the first shared data to the upper-layer application, so that the upper-layer application can process the first shared data or perform subsequent operations according to it. This embodiment does not limit the implementation of the upper-layer applications, which may be, for example, various video applications, email applications, instant messaging applications, image processing applications, and so on.
The data sharing solution provided by the embodiments of this application is applicable to various application scenarios. Taking file metadata sharing in a cloud computing scenario as an example, the technical solution of the embodiments of this application is described in detail below.
FIG. 5 is a schematic structural diagram of a file metadata sharing system 200 provided by an embodiment of this application. The system 200 includes a target process module 21, a host file system, and multiple virtual machines 23 having a data sharing relationship that are deployed on the same host 20, with a guest file system 24 deployed in the multiple virtual machines 23. FIG. 5 illustrates the case of multiple hosts 20. For descriptions of the data sharing relationship between the virtual machines 23 and related matters, refer to the embodiment shown in FIG. 1, which will not be repeated here.
In this embodiment, a virtual machine 23, as a runtime environment, can carry a tenant's various upper-layer applications; these upper-layer applications produce data at runtime and also access data produced by other upper-layer applications or other data. In this embodiment, the various data is organized and managed by means of file systems. A file system, denoted guest file system 24, is deployed in the virtual machine 23; correspondingly, the host 20 has its own file system, denoted host file system 22, which cooperates with the guest file system 24 in each virtual machine 23 to provide file-related services for the upper-layer applications in the virtual machines 23, such as creating, deleting, opening, reading, and renaming files. Further, as shown in FIG. 5, the file metadata sharing system 200 also includes multiple storage nodes 25, which are distributed in different locations and responsible for storing the files in the system 200.
When an upper-layer application in a virtual machine 23 needs to access a file, it can initiate a file access operation to the guest file system 24; the guest file system 24 obtains the corresponding file metadata and initiates a file access request to the host file system 22 based on the obtained file metadata, and the host file system 22 performs the file access operation on the corresponding storage node 25 according to the file metadata provided by the guest file system 24, for example creating, deleting, opening, reading, or renaming a file. The file metadata includes the information required to perform the file access operation, such as the file name, the location of the file on disk, and the file type. For descriptions of file metadata, refer to the above embodiments, which will not be repeated here.
In this embodiment, file metadata is shared between the virtual machines 23 having a data sharing relationship. To shorten the access path of the virtual machines 23 to the file metadata and at the same time solve the consistency problem of the file metadata among the virtual machines 23, in this embodiment, taking the same host 20 as the unit, for multiple virtual machines 23 deployed on the same host 20 that need to share file metadata, a shared memory window is allocated on the host 20 for these virtual machines 23, and the file metadata shared between the multiple virtual machines 23 is stored in the local shared memory window, enabling local access to the file metadata, shortening its access path, and improving its access performance.
In this embodiment, to be able to allocate a shared memory window for the virtual machines 23, a target process module 21 is deployed on the host 20. The target process module 21 may be a daemon process on the host 20, used to allocate a shared memory window from the physical space of the host 20 for the multiple virtual machines 23. The shared memory window is used to store the file metadata shared between the multiple virtual machines 23, is fully visible to the multiple virtual machines 23, and has a read-only attribute for them. Of course, the shared memory window is readable and writable for the target process module 21.
When an upper-layer application in any virtual machine 23 needs to perform a file access operation, it can initiate a file metadata read operation to the guest file system 24 in its virtual machine 23; in response to the file metadata read operation initiated by the upper-layer application in its virtual machine 23, the guest file system 24 reads first file metadata from the shared memory window and returns it to the upper-layer application, so that the upper-layer application can perform the file access operation through the host file system 22 according to the first file metadata. The first file metadata refers to the file metadata that needs to be read from the shared memory window and may be any file metadata. It should be noted that an upper-layer application may also initiate a file metadata read operation to the guest file system 24 in its virtual machine 23 when it needs to query related file metadata, not only when performing file access operations. The file metadata read operation may be, but is not limited to: a request to get file attribute values, a request to look up a file directory, or a request to open a file. To the upper-layer applications in the virtual machines, the shared memory window is transparent: for example, when an upper-layer application initiates a request to get file attribute values, look up a file directory, or open a file, it is unaware that the guest file system 24 performs metadata-related read operations on the shared memory window; to the upper-layer application it appears as if these operations were completed in the kernel of the virtual machine.
During a file access operation, the metadata of the accessed file may change. For example, when a file is renamed, the file name in the file metadata changes; as another example, when a file is modified, the latest modification time in the file metadata changes. These changes produce new file metadata, and for each virtual machine 23 to be able to perceive the latest file metadata, the new file metadata needs to be written into the shared memory window. For ease of description, the newly produced file metadata is called second file metadata. In this embodiment, during the file access operation, the host file system 22 can determine, according to the result of the file access operation, whether new second file metadata is generated; if so, it generates the second file metadata according to the execution result of the file access operation and initiates a file metadata write operation to the target process module 21. In response to the file metadata write operation initiated by the host file system 22, the target process module 21 writes the second file metadata into the shared memory window so that the multiple virtual machines can share the second file metadata. The host file system 22 can provide the second file metadata to the target process module 21, optionally by way of, but not limited to, a remote procedure call (RPC), so that the target process module 21 writes it into the shared memory window.
In this embodiment, the target process module 21 and the shared memory window provide a file metadata service between the guest file systems 24 and the host file system 22, so that file metadata sharing is achieved across virtual machines on the same host, and short-read and long-write of file metadata are realized. Short-read means that a virtual machine 23 reads the required first file metadata directly from the shared memory window without going through the target process module 21; long-write means that second file metadata must be written into the shared memory window through the target process module 21, rather than directly by the host file system 22. The long-write approach can avoid the data errors or security risks caused by multiple virtual machines writing to the shared memory space simultaneously. For example, if a virtual machine is attacked and were still allowed to write file metadata directly into the shared memory window, its malicious writes could adversely affect the other virtual machines; the long-write approach provided by this embodiment solves this problem.
In this embodiment, the target process module, as a normal process, can access the host file system using an interface supported by the host file system, for example the POSIX API, to ensure compatibility between the target process module and the host file system. On this basis, the target process module can invoke a memory allocation command to have the host file system allocate memory space for it, and the target process module exposes the allocated memory space to each virtual machine as the shared memory window. Specifically, the target process module provides the physical address of the shared memory window to the virtualization management module; when the multiple virtual machines start, the virtualization management module maps each virtual machine's reserved physical address space to the shared memory window according to the physical address of the shared memory window, completing the exposure of the shared memory window. It should be noted that the reserved physical address spaces of different virtual machines may be the same or different; that is, for different virtual machines, the same shared memory window may be mapped to different private memory addresses (i.e., reserved physical address spaces). Further, the reserved physical address space may also include the description information of the shared memory window, so that a virtual machine 23 can read file metadata from the shared memory window according to this description information; for related explanations of the description information of the shared memory window, refer to the above embodiments, which will not be repeated here.
In an optional embodiment, after the shared memory window is allocated, the target process module can perform a zeroing operation on it, providing a basis for subsequently writing file metadata into the shared memory window.
In this embodiment, the shared memory window is managed and maintained along the following dimensions:
1. Shared data layout:
In this part, according to the types of file metadata — assuming the file metadata includes three types of metadata, namely index node (inode) data, inode victim cache data, and directory entry (dentry) data — the shared memory window is divided into three memory areas, such as memory area A, memory area B, and memory area C shown in FIG. 2. Memory area A stores inode data, memory area B stores inode victim cache data, and memory area C stores dentry data. For detailed descriptions of the three types of file metadata, refer to the above embodiments, which will not be repeated here.
This embodiment provides a storage structure for storing file metadata in a memory area: for each item of file metadata, two data storage units and one version number storage unit are used to store it. The two data storage units store the same file metadata, and the version number storage unit stores a temporary version number that has parity (even/odd); the parity of the temporary version number identifies which of the two data storage units is currently effective for read operations and which for write operations.
This embodiment also gives, by way of example, the data structures of the inode data, victim cache data, and dentry data, as well as the storage structure of each kind of data within a memory area. Schematic definitions of these data structures and storage structures are as follows:

It should be noted that, in the definitions of the above storage structures, the temporary version number ver_ differs from the file version number f_ver; the parity of the temporary version number is meaningful, and the initial value of the temporary version number may differ in each storage structure. In addition, to facilitate maintaining these data structures, each data structure is padded to align with the cache line size. If the size of a piece of file metadata is smaller than the cache line size, it is aligned to a power of two, which helps improve the read/write speed when reading file metadata from the shared memory window. Moreover, whether file metadata is read from or written to the shared memory window, the above data structures of the file metadata and their storage structures within the memory areas are followed.
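Since the schematic definitions themselves are not reproduced above, the following is only a hypothetical illustration of the stated layout rules — a temporary version number plus two data storage units per slot, with each record padded to a cache line and the slot padded to a power of two. Every field name and size below is an assumption, not the patent's actual definition:

```python
import ctypes

CACHE_LINE = 64  # bytes; a common x86 cache-line size

class InodeData(ctypes.Structure):
    """Hypothetical inode record padded to exactly one cache line (64 bytes)."""
    _fields_ = [
        ("ino",   ctypes.c_uint64),      # inode number
        ("f_ver", ctypes.c_uint64),      # file version number (distinct from ver_)
        ("size",  ctypes.c_uint64),
        ("mode",  ctypes.c_uint32),
        ("_pad",  ctypes.c_uint8 * 36),  # 28 used bytes -> pad to 64
    ]

class InodeSlot(ctypes.Structure):
    """One slot: temporary version number plus two double-buffered units,
    padded from 136 bytes up to the next power of two (256 bytes)."""
    _fields_ = [
        ("ver_",  ctypes.c_uint64),      # temporary version number; parity matters
        ("units", InodeData * 2),        # the two data storage units
        ("_pad",  ctypes.c_uint8 * 120), # 8 + 128 = 136 -> pad to 256
    ]
```

Padding each record to the cache line and each slot to a power of two keeps slot addresses computable by shifts and avoids records straddling cache lines, which is the read/write-speed benefit the paragraph above refers to.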
2、共享内存窗口的划分:
为了充分利用共享内存窗口的内存空间,图2所示的内存区域A-B均被划分为一系列的槽位组(slot set),每个槽位组包括多个槽位(slot),用于存储不同的文件元数据,文件元数据可作为槽位中的元素(element)。每个元素都能通过其哈希指纹找到其所属的槽位组,例如,对于索引节点数据和索引节点的牺牲缓存数据这两类文件元数据,可以对其inode编号进行哈希,根据哈希结果映射到对应的槽位组中;对于目录项数据,可以将其父inode编号与文件名一起哈希,根据哈希结果映射到对应的槽位组中。关于槽位组和槽位的示例可参见图3所示。
3、文件元数据的访问(Accessing the metadata):
在本实施例中,共享内存窗口中的文件元数据具有原子更新功能,因此多个客户文件系统可以以非阻塞方式同时在共享内存窗口中检索所需的文件元数据。基于上述文件元数据的数据结构、在内存区域中的存储结构以及上述槽位组和槽位,这部分对文件元数据的读写过程进行说明。
文件元数据的读操作:根据文件元数据读取请求,确定待读取的第一文件元数据对应的描述信息,例如,如果第一文件元数据是索引节点数据,其描述信息可以包括inode编号,如果第一文件元数据是索引节点的牺牲缓存数据,其描述信息可以包括inode编号,如果第一文件元数据是目录项数据,其描述信息可以包括父inode编号与文件名。对第一文件元数据对应的描述信息进行哈希,根据哈希结果确定第一文件元数据所在的槽位组,假设是第一槽位组;根据第一槽位组中各槽位上存储的共享数据的标识信息,确定第一文件元数据所在的第一槽位;从第一槽位中读取临时版本号,根据所读取的临时版本号的奇偶性,确定第一槽位包含的两个数据存储单元中对读操作生效的第一数据存储单元;从第一数据存储单元中读取中间态元数据,在读取完成后,将所读取的临时版本号与第一槽位中的临时版本号进行比对;若两者相同,说明在读取第一文件元数据过程中目标进程模块21未对第一文件元数据进行过修改,所以可以将该中间态元数据作为第一文件元数据;若两者不一致,说明在读取第一文件元数据过程中目标进程模块21对第一文件元数据进行过修改,所读取的中间态元数据不是最新版本的数据,因此需要按照前面的步骤重新执行从第一槽位中读取第一文件元数据的操作。
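上述"读取版本号、读取单元、再比对版本号,不一致则重试"的读流程可以用如下Python草图表示。其中Slot的字段名以及"偶数版本号时单元0对读生效"的奇偶性约定均为本文假设:

```python
import copy

class Slot:
    """示意性的槽位:两个数据存储单元 + 一个临时版本号(字段名为假设)。"""
    def __init__(self, data=None):
        self.ver = 0               # 临时版本号,此处假设初始为偶数
        self.cells = [data, data]  # 两个数据存储单元,存储同一文件元数据

def read_slot(slot: Slot):
    """读操作:按奇偶性选取对读生效的单元,读完比对版本号,不一致则重试。"""
    while True:
        v = slot.ver
        cell = v & 1                           # 假设:偶数->单元0对读生效,奇数->单元1
        data = copy.deepcopy(slot.cells[cell])  # 读取中间态元数据
        if slot.ver == v:                      # 版本号未变:读取期间未被修改
            return data
        # 版本号已变:目标进程模块在读取过程中写入过,重新执行读取
```

该读取过程不加锁,多个客户文件系统可以同时以非阻塞方式检索元数据。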
文件元数据的写操作:根据文件元数据写入请求,确定待写入的第二文件元数据对应的描述信息,例如,如果第二文件元数据是索引节点数据,其描述信息可以包括inode编号,如果第二文件元数据是索引节点的牺牲缓存数据,其描述信息可以包括inode编号,如果第二文件元数据是目录项数据,其描述信息可以包括父inode编号与文件名。对第二文件元数据对应的描述信息进行哈希,根据哈希结果确定第二文件元数据所在的槽位组,假设是第二槽位组;根据第二槽位组中各槽位的访问信息,从第二槽位组中确定用于存储第二文件元数据的第二槽位;根据第二槽位中的临时版本号,确定第二槽位包含的两个数据存储单元中对写操作生效的第二数据存储单元;将第二文件元数据写入第二数据存储单元,并在写入完成后,更改临时版本号的奇偶性,以将第二数据存储单元变更为对读操作生效。
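与读操作对应,写操作"先写入对写生效的单元、再翻转奇偶性"的过程可用如下Python草图表示,奇偶性约定与前述读操作草图一致(均为本文假设):

```python
class Slot:
    """示意性的槽位(字段名与奇偶性约定均为假设)。"""
    def __init__(self, data=None):
        self.ver = 0               # 假设:偶数 -> 单元0对读生效
        self.cells = [data, data]  # 两个数据存储单元

def write_slot(slot: Slot, new_data):
    """写操作:先写入对写操作生效(当前不被读取)的单元,再翻转奇偶性。"""
    write_cell = 1 - (slot.ver & 1)    # 对写操作生效的第二数据存储单元
    slot.cells[write_cell] = new_data  # 写入期间,读者仍读到另一单元中的旧数据
    slot.ver += 1                      # 写入完成后更改奇偶性,新数据变为对读生效
```

翻转奇偶性后,新读者按新的奇偶性读到刚写入的单元;恰好处于读取过程中的读者则会因版本号变化而按前述读流程重试,从而不会读到撕裂的数据。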
4、删除旧版本(Evicting older version):
如图4所示,每个内存区域除了包括多个槽位组之外,还包括位图区,该位图区包括与其所属内存区域中各槽位组对应的比特组,每个比特组用于记录对应槽位组中各槽位是否被访问过的访问信息,一个比特位对应一个槽位,每个比特位都是共享内存窗口中的一个小存储空间。基于此,可以按照时钟算法回收槽位,即在确定第二槽位时,可以遍历第二槽位组对应的比特组中记录的各槽位的访问信息,对于遍历到的每个比特位,若该比特位的取值为第一值,将该比特位置为第二值,以表示再次遍历到该比特位时该比特位对应槽位中的文件元数据可被替换,并继续遍历后续比特位,直至首次遍历到取值为第二值的目标比特位,将目标比特位对应的槽位作为第二槽位,用于存储第二文件元数据。详细实施过程和说明可参见前述实施例,在此不再赘述。
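该时钟算法的回收过程可以用如下Python草图表示。此处假设第一值为1(表示被访问过)、第二值为0(表示未被访问过或可被替换),函数名与参数均为本文虚构:

```python
def pick_slot(bits, hand=0):
    """时钟算法选槽位:清除沿途的"已访问"位,返回首个取值为第二值的槽位下标。
    bits为某槽位组对应的比特组,hand为本次遍历的起始位置。"""
    n = len(bits)
    i = hand
    for _ in range(2 * n):      # 最多遍历两圈必能找到目标比特位
        if bits[i] == 1:
            bits[i] = 0         # 置为第二值:再次遍历到时该槽位可被替换
            i = (i + 1) % n
        else:
            return i            # 目标比特位对应的槽位即作为第二槽位
    return hand
```

被访问过的槽位因此获得一次"第二次机会",长期未被访问的槽位会优先被回收。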
5、以较短的元数据路径服务文件请求:
在本实施例中,共享内存窗口中的文件元数据负责服务大多数非写请求,包括获取文件属性值的请求,查找文件目录的请求以及打开文件的请求。其中,获取文件属性值的请求可以调用getattr函数实现,查找文件子目录的请求可以调用lookup函数实现,打开文件的请求可以调用open函数实现。getattr函数:用于获取文件、目录或文件夹的属性。lookup函数:根据当前路径的dentry的d_name成员在当前目录的父目录文件(用inode表示)里查找当前目录的inode编号,查找到当前目录的inode编号后,根据此inode编号获得此inode编号对应的索引节点数据,然后将目录项和inode编号绑定,以得到文件的目录。open函数:用于打开或创建文件。其中,getattr函数、lookup函数和open函数是传统文件系统中的已有函数,对此不做过多说明。
Open:打开文件是一个特殊的例子,同时需要上述三个函数,是三个操作的结合。客户文件系统首先搜索文件路径以确认文件是否存在,并通过共享内存窗口读取文件的相关属性以确定用户是否具有访问权限。之后,客户文件系统为上层应用分配一个文件描述符并返回,上层应用可以根据该文件描述符执行文件打开操作。
Lookup:表示需要沿文件路径迭代,并搜索共享内存窗口中的文件元数据,通过检查文件名称和父inode编号来确认文件所属的目录。如果找不到文件所属的目录,将返回"未找到",并通过将请求发送到目标进程模块(如守护进程)来帮助检索文件所属的目录。
Getattr:在沿文件目录搜索时,在共享内存窗口中搜索该路径下的索引节点数据。如果找不到索引节点数据,则通过将请求发送到目标进程模块(如守护进程)来检索索引节点数据。
基于此,如图5所示,该系统200还包括:主节点26,用于存储和维护各文件的各种文件元数据,例如索引节点数据、目录项数据以及牺牲缓存数据。主节点26上存储完整的文件元数据。可选地,主节点26可以是一远程的服务器。其中,在客户文件系统无法从共享内存窗口中查询或检索到所需的文件元数据时,可以请求目标进程模块从主节点26为其获取所需文件元数据。基于此,目标进程模块还用于:在共享内存窗口中未存储第一文件元数据的情况下,从主节点26获取第一文件元数据,将第一文件元数据写入共享内存窗口中,以供多个虚拟机读取第一文件元数据;以及在向共享内存窗口中写入第二文件元数据的情况下,向主节点同步第二文件元数据,以使主节点保存第二文件元数据。
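上述"窗口未命中则从主节点拉取并回填、写入时向主节点同步"的路径可以用如下Python草图表示。这里用dict分别模拟共享内存窗口与主节点,函数名均为本文虚构,仅用于示意控制流:

```python
def lookup_metadata(window: dict, master: dict, key):
    """查询路径示意:共享内存窗口未命中时,由目标进程模块从主节点拉取并回填。"""
    data = window.get(key)      # 短读:直接查共享内存窗口
    if data is None:
        data = master[key]      # 目标进程模块从主节点26获取第一文件元数据
        window[key] = data      # 长写:由目标进程模块写入共享内存窗口
    return data

def update_metadata(window: dict, master: dict, key, data):
    """写入路径示意:写入共享内存窗口的同时向主节点同步,使主节点保存最新元数据。"""
    window[key] = data
    master[key] = data
```

回填之后,同一主机上其它虚拟机对该元数据的后续读取即可直接命中共享内存窗口。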
在本实施例中,针对同一主机上的虚拟机,通过为其分配共享内存窗口来缓存各虚拟机需要共享的文件元数据,并且可由虚拟机和目标进程模块相互配合通过短读和长写的方式,实现文件元数据的本地访问,缩短文件元数据的访问路径,提高文件元数据的访问性能;另外,共享内存窗口对这些虚拟机具有只读属性,即虚拟机对共享内存窗口可读不可写,并在主机上增设目标进程模块,文件元数据的写入由目标进程模块统一负责,可避免多个虚拟机同时对共享内存窗口进行写操作可能造成的数据错误问题,而且基于共享内存窗口,文件元数据的任何修改可被各虚拟机立即观察到,对各虚拟机来说能够做到文件元数据的一致性,而且无需经过网络传输进行同步,可进一步提高文件元数据的访问性能。
图6为本申请实施例提供的一种数据共享方法的流程示意图。该方法应用于数据共享系统,数据共享系统包括部署在同一主机上的目标进程模块和具有数据共享关系的多个虚拟机,关于该系统的详细描述可参见前述实施例,在此不再赘述。如图6所示,该方法包括:
601、目标进程模块从主机的物理空间中分配共享内存窗口,所述共享内存窗口用于存储部署在主机上的多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性;
602、多个虚拟机从共享内存窗口中读取所需的第一共享数据,并根据第一共享数据执行相应操作;
603、目标进程模块将第二共享数据写入所述共享内存窗口中,以供所述多个虚拟机共享所述第二共享数据。
在此说明,本申请实施例中并不限定步骤602和步骤603的执行顺序,两者可以顺序执行,也可以并行执行,具体视第二共享数据的产生情况以及虚拟机需要第一共享数据的情况而定。
在一可选实施例中,目标进程模块从主机的物理空间中分配共享内存窗口,包括:从主机的物理空间中申请共享内存窗口;将共享内存窗口的物理地址提供给主机上的虚拟机管理模块,以供虚拟机管理模块将多个虚拟机各自预留的物理地址空间映射为共享内存窗口。进一步可选地,目标进程模块可以响应启动多个虚拟机的指令,即在虚拟机启动时,从主机的物理空间中申请共享内存窗口。
在一可选实施例中,上述共享内存窗口被划分为多个内存区域,不同内存区域用于存储不同类型的共享数据;每个内存区域中包括多个槽位组,每个槽位组包括多个槽位,不同槽位用于存储不同的共享数据。
进一步可选地,每个槽位包括两个数据存储单元和一个版本号存储单元,所述两个数据存储单元用于存储同一共享数据,所述版本号存储单元用于存储临时版本号,所述临时版本号的奇偶性用于标识所述两个数据存储单元中当前对读操作和写操作分别生效的数据存储单元,且所述临时版本号的奇偶性随着所述槽位中写操作的发生而变化。
在一可选实施例中,多个虚拟机从共享内存窗口中读取所需的第一共享数据,包括:在需要第一共享数据的情况下,从共享内存窗口中确定存储有第一共享数据的第一槽位;从第一槽位中读取第一共享数据;第一槽位是第一内存区域中任一槽位组中的槽位,第一内存区域是与第一共享数据的类型适配的内存区域。
进一步可选地,从共享内存窗口中确定存储有第一共享数据的第一槽位,包括:根据共享数据读取请求,确定第一共享数据对应的第一描述信息,第一描述信息中包括第一共享数据的类型和标识信息;根据第一共享数据的类型,从多个内存区域中确定第一内存区域;根据第一共享数据的标识信息的哈希结果,确定第一共享数据在第一内存区域中所属的第一槽位组;根据第一槽位组中各槽位上存储的共享数据的标识信息,从第一槽位组中确定第一共享数据所在的第一槽位。
进一步可选地,从第一槽位中读取第一共享数据,包括:从第一槽位中读取临时版本号,根据所读取的临时版本号的奇偶性,确定第一槽位包含的两个数据存储单元中对读操作生效的第一数据存储单元;从第一数据存储单元中读取中间态共享数据,在读取完成后,将所读取的临时版本号与第一槽位中的临时版本号进行比对;若两者相同,将中间态共享数据作为第一共享数据;若两者不一致,重新执行从第一槽位中读取第一共享数据的操作。
相应地,基于上述,目标进程模块将第二共享数据写入共享内存窗口中,包括:在多个虚拟机之间产生第二共享数据的情况下,从共享内存窗口中确定用于存储第二共享数据的第二槽位;将第二共享数据写入第二槽位中;第二槽位是第二内存区域中任一槽位组中的槽位,第二内存区域是与第二共享数据的类型适配的内存区域。
进一步可选地,从共享内存窗口中确定用于存储第二共享数据的第二槽位,包括:根据共享数据写入请求,确定第二共享数据对应的第二描述信息,第二描述信息包括第二共享数据的类型和标识信息;根据第二共享数据的类型,从多个内存区域中确定第二内存区域;根据第二共享数据的标识信息的哈希结果,确定第二共享数据在第二内存区域中所属的第二槽位组;根据第二槽位组中各槽位的访问信息,从第二槽位组中确定用于存储第二 共享数据的第二槽位。
在一可选实施例中,任一内存区域还包括位图区,位图区包括与其所属内存区域中各槽位组对应的比特组,一个比特组用于记录与其对应槽位组中各槽位是否被访问过的访问信息。相应地,根据第二槽位组中各槽位的访问信息,从第二槽位组中确定用于存储第二共享数据的第二槽位,包括:遍历与第二槽位组对应比特组中记录的各槽位对应的访问信息,对于遍历到的比特位,若该比特位的取值为第一值,将该比特位置为第二值,并继续遍历后续比特位,直至首次遍历到取值为第二值的目标比特位,将目标比特位对应的槽位作为第二槽位;第一值表示对应比特位的槽位被访问过,第二值表示对应比特位的槽位未被访问过或可被替换。
在一可选实施例中,将第二共享数据写入第二槽位中,包括:根据第二槽位中临时版本号的奇偶性,确定第二槽位包含的两个数据存储单元中对写操作生效的第二数据存储单元;将第二共享数据写入第二数据存储单元,并在写入完成后,更改临时版本号的奇偶性,以将第二数据存储单元变更为对读操作生效。
图7a为本申请实施例提供的另一种数据共享方法的流程示意图。该方法应用于数据共享系统,数据共享系统包括部署在同一主机上的目标进程模块和具有数据共享关系的多个虚拟机,关于该系统的详细描述可参见前述实施例,在此不再赘述。本实施例的方法是从目标进程模块的角度进行的描述,如图7a所示,该方法包括:
701a、从主机的物理空间中分配共享内存窗口,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,以供多个虚拟机从共享内存窗口中读取所需的第一共享数据;
702a、将第二共享数据写入共享内存窗口中,以供多个虚拟机共享第二共享数据。
关于步骤701a和步骤702a的详细实现,可参见图6所示实施例中的相应步骤,在此不再赘述。
图7b为本申请实施例提供的又一种数据共享方法的流程示意图。该方法应用于数据共享系统,数据共享系统包括部署在同一主机上的目标进程模块和具有数据共享关系的多个虚拟机,关于该系统的详细描述可参见前述实施例,在此不再赘述。本实施例的方法是从任一虚拟机的角度进行的描述,如图7b所示,该方法包括:
701b、从共享内存窗口中读取所需的第一共享数据;其中,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,共享内存窗口中的共享数据是由目标进程模块写入的;
702b、根据第一共享数据执行相应操作。
关于步骤701b和步骤702b的详细实现,可参见图6所示实施例中的相应步骤,在此不再赘述。关于上述各方法实施例中各步骤的详细实施方式以及有益效果已经在前述实施例中进行了详细描述,此处将不做详细阐述说明。
需要说明的是,在上述实施例及附图中描述的一些流程中,包含了按照特定顺序出现的多个操作,但是应该清楚了解,这些操作可以不按照其在本文中出现的顺序来执行或并行执行,操作的序号如701a、702a等,仅仅是用于区分开各个不同的操作,序号本身不代表任何的执行顺序。另外,这些流程可以包括更多或更少的操作,并且这些操作可以按顺序执行或并行执行。需要说明的是,本文中的"第一"、"第二"等描述,是用于区分不同的消息、设备、模块等,不代表先后顺序,也不限定"第一"和"第二"是不同的类型。
图8为本申请实施例提供的一种数据共享装置的结构示意图。该装置可作为上述实施例中的目标进程模块实现,但并不限于此。如图8所示,该装置包括:
分配模块81,用于从主机的物理空间中分配共享内存窗口,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,以供多个虚拟机从共享内存窗口中读取所需的第一共享数据;
写入模块82,用于将第二共享数据写入共享内存窗口中,以供多个虚拟机共享第二共享数据。
在一可选实施例中,分配模块81具体用于:响应启动多个虚拟机的指令,从主机的物理空间中申请共享内存窗口;将共享内存窗口的物理地址提供给主机上的虚拟机管理模块,以供虚拟机管理模块将多个虚拟机各自预留的物理地址空间映射为共享内存窗口。
在一可选实施例中,上述共享内存窗口被划分为多个内存区域,不同内存区域用于存储不同类型的共享数据;一个内存区域中包括多个槽位组,一个槽位组包括多个槽位,不同槽位用于存储不同的共享数据。
基于上述,写入模块82具体用于:在多个虚拟机之间产生第二共享数据的情况下,从共享内存窗口中确定用于存储第二共享数据的第二槽位;将第二共享数据写入第二槽位中;第二槽位是第二内存区域中任一槽位组中的槽位,第二内存区域是与第二共享数据的类型适配的内存区域。
进一步可选地,写入模块82在确定第二槽位时,具体用于:根据共享数据写入请求,确定第二共享数据对应的第二描述信息,第二描述信息包括第二共享数据的类型和标识信息;根据第二共享数据的类型,从多个内存区域中确定第二内存区域;根据第二共享数据的标识信息的哈希结果,确定第二共享数据在第二内存区域中所属的第二槽位组;根据第二槽位组中各槽位的访问信息,从第二槽位组中确定用于存储第二共享数据的第二槽位。
在一可选实施例中,任一内存区域还包括位图区,位图区包括与其所属内存区域中各槽位组对应的比特组,一个比特组用于记录与其对应槽位组中各槽位是否被访问过的访问信息。相应地,写入模块82在确定第二槽位时,具体用于:遍历与第二槽位组对应比特组中记录的各槽位对应的访问信息,对于遍历到的比特位,若该比特位的取值为第一值,将该比特位置为第二值,并继续遍历后续比特位,直至首次遍历到取值为第二值的目标比特位,将目标比特位对应的槽位作为第二槽位;第一值表示对应比特位的槽位被访问过,第二值表示对应比特位的槽位未被访问过或可被替换。
在一可选实施例中,写入模块82在将第二共享数据写入第二槽位中时,具体用于:根据第二槽位中临时版本号的奇偶性,确定第二槽位包含的两个数据存储单元中对写操作生效的第二数据存储单元;将第二共享数据写入第二数据存储单元,并在写入完成后,更改临时版本号的奇偶性,以将第二数据存储单元变更为对读操作生效。
图9为本申请实施例提供的另一种数据共享装置的结构示意图。该装置可作为上述实施例中的虚拟机中的共享数据管理模块或客户文件系统实现,但并不限于此。如图9所示,该装置包括:
读取模块91,用于从共享内存窗口中读取所需的第一共享数据;其中,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,共享内存窗口中的共享数据是由目标进程模块写入的;执行模块92,用于根据第一共享数据执行相应操作。
在一可选实施例中,共享内存窗口被划分为多个内存区域,不同内存区域用于存储不同类型的共享数据;一个内存区域中包括多个槽位组,一个槽位组包括多个槽位,不同槽位用于存储不同的共享数据。
在一可选实施例中,读取模块91具体用于:在需要第一共享数据的情况下,从共享内存窗口中确定存储有第一共享数据的第一槽位;从第一槽位中读取第一共享数据;第一槽位是第一内存区域中任一槽位组中的槽位,第一内存区域是与第一共享数据的类型适配的内存区域。
进一步可选地,读取模块91在从共享内存窗口中确定第一槽位时,具体用于:根据共享数据读取请求,确定第一共享数据对应的第一描述信息,第一描述信息中包括第一共享数据的类型和标识信息;根据第一共享数据的类型,从多个内存区域中确定第一内存区域;根据第一共享数据的标识信息的哈希结果,确定第一共享数据在第一内存区域中所属的第一槽位组;根据第一槽位组中各槽位上存储的共享数据的标识信息,从第一槽位组中确定第一共享数据所在的第一槽位。
进一步可选地,读取模块91在从第一槽位中读取第一共享数据时,具体用于:从第一槽位中读取临时版本号,根据所读取的临时版本号的奇偶性,确定第一槽位包含的两个数据存储单元中对读操作生效的第一数据存储单元;从第一数据存储单元中读取中间态共享数据,在读取完成后,将所读取的临时版本号与第一槽位中的临时版本号进行比对;若两者相同,将中间态共享数据作为第一共享数据;若两者不一致,重新执行从第一槽位中读取第一共享数据的操作。
除上述数据共享装置之外,本申请实施例还提供一种数据共享装置,该数据共享装置包括:分配模块,用于从主机的物理空间中分配共享内存窗口,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性;读取模块,用于从共享内存窗口中读取所需的第一共享数据;执行模块,用于根据第一共享数据执行相应操作;以及写入模块,用于将第二共享数据写入共享内存窗口中,以供多个虚拟机共享第二共享数据。关于本实施例中各功能模块的详细描述可参见图8和图9所示实施例中相应功能模块的描述,在此不再赘述。
关于上述各装置实施例中各功能模块的详细说明以及有益效果已经在前述实施例中进行了详细描述,此处将不做详细阐述说明。
图10为本申请实施例提供的一种主机的结构示意图。如图10所示,该主机上部署有目标进程模块和具有数据共享关系的多个虚拟机,该主机包括:存储器1001和处理器1002,处理器1002与存储器1001耦合。
存储器1001,用于存储目标进程模块和多个虚拟机对应的程序代码,并可被配置为存储其它各种数据以支持在主机上的操作。这些数据的示例包括用于在主机上操作的任何应用程序或方法的指令,消息,图片,视频等。
在一些实施例中,处理器1002执行存储器1001中目标进程模块对应的程序代码,以用于:从主机的物理空间中分配共享内存窗口,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,以供多个虚拟机从共享内存窗口中读取所需的第一共享数据;将第二共享数据写入共享内存窗口中,以供多个虚拟机共享第二共享数据。
在一可选实施例中,处理器1002在从主机的物理空间中分配共享内存窗口时,具体用于:响应启动多个虚拟机的指令,从主机的物理空间中申请共享内存窗口;将共享内存窗口的物理地址提供给主机上的虚拟机管理模块,以供虚拟机管理模块将多个虚拟机各自预留的物理地址空间映射为共享内存窗口。
在一可选实施例中,共享内存窗口被划分为多个内存区域,不同内存区域用于存储不同类型的共享数据;一个内存区域中包括多个槽位组,一个槽位组包括多个槽位,不同槽位用于存储不同的共享数据。
在一可选实施例中,处理器1002在将第二共享数据写入共享内存窗口中时,具体用于:在多个虚拟机之间产生第二共享数据的情况下,从共享内存窗口中确定用于存储第二共享数据的第二槽位;将第二共享数据写入第二槽位中;第二槽位是第二内存区域中任一槽位组中的槽位,第二内存区域是与第二共享数据的类型适配的内存区域。
在一可选实施例中,处理器1002在从共享内存窗口中确定第二槽位时,具体用于:根据共享数据写入请求,确定第二共享数据对应的第二描述信息,第二描述信息包括第二共享数据的类型和标识信息;根据第二共享数据的类型,从多个内存区域中确定第二内存区域;根据第二共享数据的标识信息的哈希结果,确定第二共享数据在第二内存区域中所属的第二槽位组;根据第二槽位组中各槽位的访问信息,从第二槽位组中确定用于存储第二共享数据的第二槽位。
在一可选实施例中,任一内存区域还包括位图区,位图区包括与其所属内存区域中各槽位组对应的比特组,一个比特组用于记录与其对应槽位组中各槽位是否被访问过的访问信息。相应地,处理器1002在从第二槽位组中确定第二槽位时,具体用于:遍历与第二槽位组对应比特组中记录的各槽位对应的访问信息,对于遍历到的比特位,若该比特位的取值为第一值,将该比特位置为第二值,并继续遍历后续比特位,直至首次遍历到取值为第二值的目标比特位,将目标比特位对应的槽位作为第二槽位;第一值表示对应比特位的槽位被访问过,第二值表示对应比特位的槽位未被访问过或可被替换。
在一可选实施例中,处理器1002在将第二共享数据写入第二槽位中时,具体用于:根据第二槽位中临时版本号的奇偶性,确定第二槽位包含的两个数据存储单元中对写操作生效的第二数据存储单元;将第二共享数据写入第二数据存储单元,并在写入完成后,更改临时版本号的奇偶性,以将第二数据存储单元变更为对读操作生效。
在另一些实施例中,处理器1002执行存储器1001中虚拟机对应的程序代码,以用于:从共享内存窗口中读取所需的第一共享数据,并根据第一共享数据执行相应操作;其中,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,共享内存窗口中的共享数据是由目标进程模块写入的。
在一可选实施例中,处理器1002在从共享内存窗口中读取所需的第一共享数据时,具体用于:在需要第一共享数据的情况下,从共享内存窗口中确定存储有第一共享数据的第一槽位;从第一槽位中读取第一共享数据;第一槽位是第一内存区域中任一槽位组中的槽位,第一内存区域是与第一共享数据的类型适配的内存区域。
在一可选实施例中,处理器1002在从共享内存窗口中确定第一槽位时,具体用于:根据共享数据读取请求,确定第一共享数据对应的第一描述信息,第一描述信息中包括第一共享数据的类型和标识信息;根据第一共享数据的类型,从多个内存区域中确定第一内存区域;根据第一共享数据的标识信息的哈希结果,确定第一共享数据在第一内存区域中所属的第一槽位组;根据第一槽位组中各槽位上存储的共享数据的标识信息,从第一槽位组中确定第一共享数据所在的第一槽位。
在一可选实施例中,处理器1002在从第一槽位中读取第一共享数据时,具体用于:从第一槽位中读取临时版本号,根据所读取的临时版本号的奇偶性,确定第一槽位包含的两个数据存储单元中对读操作生效的第一数据存储单元;从第一数据存储单元中读取中间态共享数据,在读取完成后,将所读取的临时版本号与第一槽位中的临时版本号进行比对;若两者相同,将中间态共享数据作为第一共享数据;若两者不一致,重新执行从第一槽位中读取第一共享数据的操作。
在又一些实施例中,处理器1002一方面执行存储器1001中目标进程模块对应的程序代码,以用于:从主机的物理空间中分配共享内存窗口,共享内存窗口用于存储多个虚拟机之间的共享数据,且面向多个虚拟机具有只读属性,以及将第二共享数据写入共享内存窗口中,以供多个虚拟机共享第二共享数据;另一方面执行存储器1001中虚拟机对应的程序代码,以用于:从共享内存窗口中读取虚拟机所需的第一共享数据,并根据第一共享数据执行相应操作。关于各操作的详细实现可参见前述实施例,在此不再赘述。
进一步,如图10所示,该主机还包括:通信组件1003、显示器1004、电源组件1005、音频组件1006等其它组件。图10中仅示意性给出部分组件,并不意味着主机只包括图10所示组件。另外,图10中虚线框内的组件为可选组件,而非必选组件,具体可视主机的产品形态而定。本实施例的主机可以实现为台式电脑、笔记本电脑、智能手机或IOT设备等终端设备,也可以是常规服务器、云服务器或服务器阵列等服务端设备。若本实施例的主机实现为台式电脑、笔记本电脑、智能手机等终端设备,可以包含图10中虚线框内的组件;若本实施例的主机实现为常规服务器、云服务器或服务器阵列等服务端设备,则可以不包含图10中虚线框内的组件。
相应地,本申请实施例还提供一种存储有计算机程序的计算机可读存储介质,当计算机程序被处理器执行时,致使处理器能够实现上述各方法实施例中的各步骤。
本申请实施例还提供一种计算机程序产品,包括计算机程序/指令,当计算机程序/指令被处理器执行时,致使处理器能够实现上述各方法实施例中的步骤。
上述存储器可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(Static Random-Access Memory,SRAM),电可擦除可编程只读存储器(Electrically Erasable Programmable Read Only Memory,EEPROM),可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM),可编程只读存储器(Programmable Read-Only Memory,PROM),只读存储器(Read-Only Memory,ROM),磁存储器,快闪存储器,磁盘或光盘。
上述通信组件被配置为便于通信组件所在设备和其他设备之间有线或无线方式的通信。通信组件所在设备可以接入基于通信标准的无线网络,如WiFi,2G、3G、4G/LTE、5G等移动通信网络,或它们的组合。在一个示例性实施例中,通信组件经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,通信组件还包括近场通信(Near Field Communication,NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(Radio Frequency Identification,RFID)技术,红外数据协会(Infrared Data Association,IrDA)技术,超宽带(Ultra Wide Band,UWB)技术,蓝牙(BlueTooth,BT)技术和其他技术来实现。
上述显示器包括屏幕,其屏幕可以包括液晶显示器(Liquid Crystal Display,LCD)和触摸面板(TouchPanel,TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与触摸或滑动操作相关的持续时间和压力。
上述电源组件,为电源组件所在设备的各种组件提供电力。电源组件可以包括电源管理系统,一个或多个电源,及其他与为电源组件所在设备生成、管理和分配电力相关联的组件。
上述音频组件,可被配置为输出和/或输入音频信号。例如,音频组件包括一个麦克风(Microphone,MIC),当音频组件所在设备处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器或经由通信组件发送。在一些实施例中,音频组件还包括一个扬声器,用于输出音频信号。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可读存储介质(包括但不限于磁盘存储器、只读光盘(Compact Disc Read-Only Memory,CD-ROM)、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(Central Processing Unit,CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(Random Access Memory,RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(Phase-change Random Access Memory,PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(Dynamic Random Access Memory,DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(Digital Video Disc,DVD)或其他光学存储、磁盒式磁带,磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory  media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。
以上仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (15)

  1. 一种数据共享方法,其特征在于,应用于数据共享系统,所述数据共享系统包括部署在同一主机上的目标进程模块和具有数据共享关系的多个虚拟机,所述方法包括:
    所述目标进程模块从所述主机的物理空间中分配共享内存窗口,所述共享内存窗口用于存储所述多个虚拟机之间的共享数据,且面向所述多个虚拟机具有只读属性;
    所述多个虚拟机从所述共享内存窗口中读取所需的第一共享数据,并根据所述第一共享数据执行相应操作;
    所述目标进程模块将第二共享数据写入所述共享内存窗口中,以供所述多个虚拟机共享所述第二共享数据。
  2. 根据权利要求1所述的方法,其特征在于,所述目标进程模块从所述主机的物理空间中分配共享内存窗口,包括:
    从所述主机的物理空间中申请共享内存窗口,并将所述共享内存窗口的物理地址提供给所述主机上的虚拟机管理模块,以供所述虚拟机管理模块将所述多个虚拟机各自预留的物理地址空间映射为所述共享内存窗口。
  3. 根据权利要求1所述的方法,其特征在于,所述共享内存窗口被划分为多个内存区域,不同内存区域用于存储不同类型的共享数据;一个内存区域中包括多个槽位组,一个槽位组包括多个槽位,不同槽位用于存储不同的共享数据。
  4. 根据权利要求3所述的方法,其特征在于,一个槽位包括两个数据存储单元和一个版本号存储单元,所述两个数据存储单元用于存储同一共享数据,所述版本号存储单元用于存储临时版本号,所述临时版本号的奇偶性用于标识所述两个数据存储单元中当前对读操作和写操作分别生效的数据存储单元,且所述临时版本号的奇偶性随着所述槽位中写操作的发生而变化。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述多个虚拟机从所述共享内存窗口中读取所需的第一共享数据,包括:
    在需要第一共享数据的情况下,从所述共享内存窗口中确定存储有所述第一共享数据的第一槽位;
    从所述第一槽位中读取所述第一共享数据;所述第一槽位是第一内存区域中任一槽位组中的槽位,所述第一内存区域是与所述第一共享数据的类型适配的内存区域。
  6. 根据权利要求5所述的方法,其特征在于,从所述共享内存窗口中确定存储有所述第一共享数据的第一槽位,包括:
    根据共享数据读取请求,确定所述第一共享数据对应的第一描述信息,所述第一描述信息中包括所述第一共享数据的类型和标识信息;
    根据所述第一共享数据的类型,从所述多个内存区域中确定所述第一内存区域;
    根据所述第一共享数据的标识信息的哈希结果,确定所述第一共享数据在所述第一内存区域中所属的第一槽位组;
    根据所述第一槽位组中各槽位上存储的共享数据的标识信息,从所述第一槽位组中确定所述第一共享数据所在的第一槽位。
  7. 根据权利要求5所述的方法,其特征在于,从所述第一槽位中读取所述第一共享数据,包括:
    从所述第一槽位中读取临时版本号,根据所读取的临时版本号的奇偶性,确定所述第一槽位包含的两个数据存储单元中对读操作生效的第一数据存储单元;
    从所述第一数据存储单元中读取中间态共享数据,在读取完成后,将所读取的临时版本号与所述第一槽位中的临时版本号进行比对;
    若两者相同,将所述中间态共享数据作为所述第一共享数据;若两者不一致,重新执行从所述第一槽位中读取所述第一共享数据的操作。
  8. 根据权利要求1-4任一项所述的方法,其特征在于,所述目标进程模块将第二共享数据写入所述共享内存窗口中,包括:
    在所述多个虚拟机之间产生第二共享数据的情况下,从所述共享内存窗口中确定用于存储所述第二共享数据的第二槽位;
    将所述第二共享数据写入所述第二槽位中;所述第二槽位是第二内存区域中任一槽位组中的槽位,所述第二内存区域是与所述第二共享数据的类型适配的内存区域。
  9. 根据权利要求8所述的方法,其特征在于,从所述共享内存窗口中确定用于存储所述第二共享数据的第二槽位,包括:
    根据共享数据写入请求,确定所述第二共享数据对应的第二描述信息,所述第二描述信息包括所述第二共享数据的类型和标识信息;
    根据所述第二共享数据的类型,从所述多个内存区域中确定所述第二内存区域;
    根据所述第二共享数据的标识信息的哈希结果,确定所述第二共享数据在所述第二内存区域中所属的第二槽位组;
    根据所述第二槽位组中各槽位的访问信息,从所述第二槽位组中确定用于存储所述第二共享数据的第二槽位。
  10. 根据权利要求9所述的方法,其特征在于,任一内存区域还包括位图区,所述位图区包括与其所属内存区域中各槽位组对应的比特组,一个比特组用于记录与其对应槽位组中各槽位是否被访问过的访问信息;
    根据所述第二槽位组中各槽位的访问信息,从所述第二槽位组中确定用于存储所述第二共享数据的第二槽位,包括:
    遍历与所述第二槽位组对应比特组中记录的各槽位对应的访问信息,对于遍历到的比特位,若该比特位的取值为第一值,将该比特位置为第二值,并继续遍历后续比特位,直至首次遍历到取值为所述第二值的目标比特位,将所述目标比特位对应的槽位作为所述第二槽位;所述第一值表示对应比特位的槽位被访问过,所述第二值表示对应比特位的槽位未被访问过或可被替换。
  11. 根据权利要求8所述的方法,其特征在于,将所述第二共享数据写入所述第二槽位中,包括:
    根据所述第二槽位中临时版本号的奇偶性,确定所述第二槽位包含的两个数据存储单元中对写操作生效的第二数据存储单元;
    将所述第二共享数据写入所述第二数据存储单元,并在写入完成后,更改所述临时版本号的奇偶性,以将所述第二数据存储单元变更为对读操作生效。
  12. 一种主机,其特征在于,所述主机上部署有目标进程模块和具有数据共享关系的多个虚拟机,所述主机包括:存储器和处理器;
    所述存储器,用于存储所述目标进程模块和所述多个虚拟机对应的程序代码;所述处理器与所述存储器耦合,用于执行所述目标进程模块对应的程序代码,以用于执行权利要求1-11中任一项所述方法中的步骤。
  13. 一种数据共享系统,其特征在于,包括:部署在同一主机上的目标进程模块和具有数据共享关系的多个虚拟机;所述目标进程模块,用于实现权利要求1-11任一项所述方法中所述目标进程模块执行的步骤,所述多个虚拟机,用于实现权利要求1-11任一项所述方法中所述虚拟机执行的步骤。
  14. 一种文件元数据共享系统,其特征在于,包括:部署在同一主机上的目标进程模块、主机文件系统以及具有数据共享关系的多个虚拟机,所述多个虚拟机中部署有客户文件系统;
    所述目标进程模块,用于从所述主机的物理空间中分配共享内存窗口,所述共享内存窗口用于存储所述多个虚拟机之间共享的文件元数据,且面向所述多个虚拟机具有只读属性;
    所述客户文件系统,用于响应其所属虚拟机中上层应用发起的文件元数据读取操作,从所述共享内存窗口中读取第一文件元数据并返回给所述上层应用;
    所述目标进程模块还用于:响应所述主机文件系统发起的文件元数据写入操作,将第二文件元数据写入所述共享内存窗口中,以供所述多个虚拟机共享所述第二文件元数据。
  15. 一种存储有计算机程序的计算机可读存储介质,其特征在于,当所述计算机程序被处理器执行时,致使所述处理器能够实现权利要求1-11中任一项所述方法中的步骤。
PCT/CN2024/078720 2023-03-03 2024-02-27 数据共享方法、设备、系统及存储介质 Ceased WO2024183559A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310219577.0A CN116401004A (zh) 2023-03-03 2023-03-03 数据共享方法、设备、系统及存储介质
CN202310219577.0 2023-03-03

Publications (1)

Publication Number Publication Date
WO2024183559A1

Family


Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/078720 Ceased WO2024183559A1 (zh) 2023-03-03 2024-02-27 数据共享方法、设备、系统及存储介质

Country Status (2)

Country Link
CN (1) CN116401004A (zh)
WO (1) WO2024183559A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118860992A (zh) * 2024-09-24 2024-10-29 阿里云计算有限公司 数据和容器文件读取方法、设备、系统、介质及程序产品
CN120723846A (zh) * 2025-08-20 2025-09-30 苏州元脑智能科技有限公司 数据共享方法、装置、电子设备及存储介质

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116401004A (zh) * 2023-03-03 2023-07-07 阿里巴巴(中国)有限公司 数据共享方法、设备、系统及存储介质
WO2025020014A1 (zh) * 2023-07-21 2025-01-30 华为技术有限公司 一种控制方法、网卡系统、主机系统及芯片系统
CN117056298B (zh) * 2023-08-24 2024-07-23 深圳市海成智联科技有限公司 基于共享内存的跨服务数据通信系统
CN117009114B (zh) * 2023-10-07 2024-05-28 联通(广东)产业互联网有限公司 一种数据共享方法、装置、电子设备及存储介质
CN117131136B (zh) * 2023-10-26 2024-01-19 新唐信通(浙江)科技有限公司 一种研发数据共享方法、系统、设备及存储介质
CN117311237B (zh) * 2023-11-07 2025-02-07 湖南行必达网联科技有限公司 一种电动汽车动力域控制器、数据通信方法及域控系统
CN117421141B (zh) * 2023-11-08 2025-10-24 厦门四信通信科技有限公司 一种进程间数据参数共享方法、装置、设备及介质
CN118093230B (zh) * 2024-04-22 2024-08-06 深圳华锐分布式技术股份有限公司 基于共享内存的跨进程通信方法、装置、设备及存储介质
CN119127394B (zh) * 2024-08-27 2025-03-14 北京天融信网络安全技术有限公司 主备虚拟机会话同步方法、设备及存储介质
CN120371537B (zh) * 2025-06-25 2025-09-05 浪潮电子信息产业股份有限公司 一种跨主机的内存共享方法、系统、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693162A (zh) * 2011-12-29 2012-09-26 中国科学技术大学苏州研究院 基于共享内存和核间中断的多核平台上多个虚拟机之间进程通信方法
CN104331327A (zh) * 2014-12-02 2015-02-04 山东乾云启创信息科技有限公司 大规模虚拟化环境中任务调度的优化方法及优化系统
US9146763B1 (en) * 2012-03-28 2015-09-29 Google Inc. Measuring virtual machine metrics
CN106612306A (zh) * 2015-10-22 2017-05-03 中兴通讯股份有限公司 虚拟机的数据共享方法及装置
CN111679921A (zh) * 2020-06-09 2020-09-18 Oppo广东移动通信有限公司 内存共享方法、内存共享装置及终端设备
CN116401004A (zh) * 2023-03-03 2023-07-07 阿里巴巴(中国)有限公司 数据共享方法、设备、系统及存储介质



Also Published As

Publication number Publication date
CN116401004A (zh) 2023-07-07

Similar Documents

Publication Publication Date Title
WO2024183559A1 (zh) 数据共享方法、设备、系统及存储介质
CN113296696B (zh) 一种数据的访问方法、计算设备及存储介质
US9742860B2 (en) Bi-temporal key value cache system
CN103412822B (zh) 操作非易失性内存和数据操作的方法和相关装置
US11803517B2 (en) File system for anonymous write
CN107408132B (zh) 跨越多个类型的存储器移动分层数据对象的方法和系统
WO2024230779A1 (zh) 一种文件访问方法、系统、电子设备及机器可读存储介质
WO2019223377A1 (zh) 文件处理方法、装置、设备及存储介质
CN103078898B (zh) 文件系统、接口服务装置和数据存储服务提供方法
CN120123305B (zh) 一种多主机共享文件系统的方法、产品、设备及存储介质
US11500822B2 (en) Virtualized append-only interface
CN118120212A (zh) 一种文件去重方法、装置和设备
US9336232B1 (en) Native file access
US12483544B2 (en) Method and system for performing authentication and object discovery for on-premises cloud service providers
CN113590309B (zh) 一种数据处理方法、装置、设备及存储介质
US20220222234A1 (en) Systems and methods for multiplexing data of an underlying index
CN119493519A (zh) 数据管理方法及其装置
CN116467270A (zh) 数据管理系统、数据更新方法及装置
US11507402B2 (en) Virtualized append-only storage device
US11853319B1 (en) Caching updates appended to an immutable log for handling reads to the immutable log
US12229022B2 (en) Method and system for generating incremental approximation backups of limited access cloud data
US12306792B2 (en) Managing access to file based backups based on storage units and workload use
US12386713B2 (en) Managing use of a shared virtual disk for accessing data in file based backups by multiple virtual machines
US12271270B2 (en) Enabling user-based instant access from file based backups
CN117708072B (zh) 文件复制方法、终端设备及芯片系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24766281

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE