Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a FUSE-based file system data migration method, system, device and medium.
The object of the invention is achieved by the following technical solution:
A file system data migration system based on FUSE comprises a data protection file system module, a FUSE file system module, a detection and feedback module and a system configuration and maintenance management module;
The data protection file system module comprises a metadata sub-module and a data snapshot sub-module, wherein the metadata sub-module is used for data mapping and related metadata management, and the data snapshot sub-module is used for data backup, unique ID management and verification management;
The FUSE file system module is used for carrying out mixing, compression and transmission on the acquired data through a file system kernel mode interface and a file system user mode interface, and comprises a file system kernel mode interface submodule and a file system user mode interface submodule;
The detection and feedback module is used for recording file operation and statistical data flow and comprises a file operation recording submodule and a data flow statistical submodule;
The system configuration and maintenance management module is used for managing system configuration and daily maintenance tasks and comprises a file system mounting and dismounting sub-module, a remote storage fault short-circuit switch sub-module and a log management sub-module.
Preferably, the data map includes a local Super block data map, dentry data map, inode data map, and the related metadata management includes, but is not limited to, namespace management, storage path management, file attribute management, quota management, and file check management.
Preferably, the local Super block data map is used to manage global information of the file system, including but not limited to the block size and the number of inodes;
The Dentry data mapping is used for maintaining the corresponding relation between the file name and the inode so as to quickly locate the file;
The Inode data map is used to store metadata information for the file including, but not limited to, file size, creation time.
Preferably, the data backup comprises snapshot creation, idempotent assurance and consistency assurance;
the unique ID management distributes unique identification for each file operation and prevents the data from being repeatedly written in the related backup operation;
The check management comprises data check and interface check, wherein the data check is used for carrying out data check after the snapshot is created, so that the data in the snapshot is consistent with the original data, and the interface check is used for relevant interface check, so that the input parameters and the execution result are correct and consistent when the backup operation is called.
Preferably, the snapshot creation is used for creating a snapshot of the data of each operation, where the snapshot contains the state of the data at a specific point in time, so that the data can be restored to its previous state when it is lost or damaged; the idempotent guarantee is used for designing the snapshot operation to be idempotent, so that the result remains the same after the operation is executed repeatedly; and the consistency guarantee is used for ensuring the consistency of the data during the snapshot operation.
Preferably, the kernel mode interface submodule of the file system processes data through a kernel mode interface, directly intercepts data in a VFS layer, realizes kernel mode data reading and writing, and maintains kernel mode data cache;
the file system user mode interface sub-module realizes file system operation in a user mode, and realizes user mode data reading and writing through interaction between FUSE and a kernel mode interface.
The remote storage fault short-circuit switch submodule is used for monitoring the remote storage state and providing a short-circuit switch to quickly switch a storage path to standby storage when the remote storage fails;
The log management submodule comprises log grade, switch configuration and log data rolling deletion, wherein the log data rolling deletion is carried out according to file time, size and the like, and the log comprises system fault log records and data operation log records and is used for recording log information in the running process of a system.
A file system data migration method based on FUSE is applied to the file system data migration system based on FUSE, and comprises the following steps:
When an application is mounted on FUSE and executes a system call, the operation of the system call is routed to the FUSE driver through the VFS, wherein FUSE comprises a kernel module and a FUSE daemon;
The FUSE driver creates a FUSE request structure and places the request in a request queue, and the process that issued the operation is blocked;
The FUSE daemon reads the FUSE request from the kernel queue in the kernel module through the /dev/fuse device and submits the operation to an underlying file system, wherein the underlying file system includes EXT4 and F2FS;
After the underlying file system has processed the request, the FUSE daemon writes the reply back to the /dev/fuse device, and the FUSE driver marks the request as completed and wakes up the user process.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the FUSE-based file system data migration system described above when executing the computer program.
A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to implement the FUSE-based file system data migration system described above.
The beneficial effects of the invention are as follows:
(1) By introducing FUSE, non-invasive kernel interaction is achieved: the file system is implemented without intruding on the kernel, file system data migration is performed in user space, and incremental backup is performed directly at the underlying layer using the file system, without modifying kernel code. Switching between user mode and kernel mode during data reading and transmission is reduced, which effectively improves data migration efficiency.
(2) Snapshot creation ensures that data can be restored to a previous state when it is lost or damaged; idempotency ensures that repeated execution yields the same result as a single execution; and the consistency guarantee ensures data consistency during snapshot operations, which solves the problem of data not being updated in time.
(3) By designing a buffer area, data is sent in batches once a certain amount has accumulated or a certain time has elapsed, so that unnecessary memory allocation and copying are avoided as far as possible. This meets the requirement of backing up local data to a remote end, reduces the system call overhead caused by frequently sending many small pieces of data, and greatly improves the data sending speed.
Detailed Description
To further explain the technical means adopted by the invention to achieve its intended purpose and the resulting effects, the specific implementation, structure, features and effects of the invention are described in detail below with reference to the accompanying drawings and the preferred embodiments.
Referring to fig. 1, a file system data migration system based on FUSE includes a data protection file system module, a FUSE file system module, a detection and feedback module, and a system configuration and maintenance management module;
The data protection file system module comprises a metadata sub-module and a data snapshot sub-module, wherein the metadata sub-module is used for data mapping and related metadata management, and the data snapshot sub-module is used for data backup, unique ID management and verification management;
The FUSE file system module is used for carrying out mixing, compression and transmission on the acquired data through a file system kernel mode interface and a file system user mode interface, and comprises a file system kernel mode interface submodule and a file system user mode interface submodule;
The detection and feedback module is used for recording file operation and statistical data flow and comprises a file operation recording submodule and a data flow statistical submodule;
The system configuration and maintenance management module is used for managing system configuration and daily maintenance tasks and comprises a file system mounting and dismounting sub-module, a remote storage fault short-circuit switch sub-module and a log management sub-module.
In this embodiment, the data map includes a local Super block data map, dentry data map, inode data map, where the related metadata management includes, but is not limited to, namespace management, storage path management, file attribute management, quota management, and file check management;
the local Super block data map is used for managing global information of a file system, wherein the global information comprises, but is not limited to, the block size and the number of inodes;
The Dentry data mapping is used for maintaining the corresponding relation between the file name and the inode so as to quickly locate the file;
the Inode data map is used for storing metadata information of a file, and the metadata information comprises, but is not limited to, file size and creation time;
It should be noted that the local Super block, dentry and inode are all file organization structures of the Linux VFS.
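The three mappings described above can be illustrated with a minimal Python sketch. It is not the embodiment's implementation: the class names `SuperBlock`, `Inode` and `MetadataMap`, the use of inode number 0 as the root parent, and the in-memory dictionaries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SuperBlock:
    """Global file system information (block size, number of inodes)."""
    block_size: int
    inode_count: int = 0


@dataclass
class Inode:
    """Per-file metadata (file size, creation time)."""
    ino: int
    size: int
    ctime: float


@dataclass
class MetadataMap:
    """Hypothetical container tying the three mappings together."""
    super_block: SuperBlock
    inodes: Dict[int, Inode] = field(default_factory=dict)
    # dentry map: (parent inode number, file name) -> inode number
    dentries: Dict[Tuple[int, str], int] = field(default_factory=dict)

    def create(self, parent_ino: int, name: str, size: int, ctime: float) -> Inode:
        # Allocate the next inode number and update the global count.
        ino = self.super_block.inode_count + 1
        self.super_block.inode_count = ino
        inode = Inode(ino=ino, size=size, ctime=ctime)
        self.inodes[ino] = inode
        self.dentries[(parent_ino, name)] = ino
        return inode

    def lookup(self, parent_ino: int, name: str) -> Optional[Inode]:
        # Resolve a name through the dentry map, then fetch its inode.
        ino = self.dentries.get((parent_ino, name))
        return self.inodes.get(ino) if ino is not None else None
```

Keeping the dentry map separate from the inode map means a name lookup is a single dictionary access, which corresponds to the quick file location attributed to the Dentry data mapping.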
In this embodiment, the data backup includes snapshot creation, idempotent guarantee and consistency guarantee. The snapshot creation is used to create a snapshot of the data of each operation; the snapshot contains the state of the data at a specific point in time, so that the data can be restored to its previous state when it is lost or damaged. The idempotent guarantee is used to design the snapshot operation to be idempotent, so that the result is the same no matter how many times the operation is executed and repeated execution produces no side effects. The consistency guarantee is used to ensure the consistency of the data during the snapshot operation, that is, the data state reflected by the snapshot is consistent, with no partially updated or non-updated data;
the unique ID management distributes unique identification for each file operation and prevents the data from being repeatedly written in the related backup operation;
Specifically, a 64-bit ID is adopted, and the unique ID is looked up through an index. The index is organized as a tree structure and includes the parent directory ID and the file ID; backup files are indexed and searched through this index. The unique ID ensures that a backup operation does not write data multiple times because of system errors or repeated requests, thereby avoiding duplicated or inconsistent data.
The check management comprises data check and interface check, wherein the data check is used for carrying out data check by a system after the snapshot is created, ensuring that the data in the snapshot is consistent with the original data and is not damaged or tampered, and the interface check is used for relevant interface check so as to ensure that the input parameters and the execution result are correct and consistent when the backup operation is invoked.
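The interplay of snapshot creation, the unique operation ID and the two checks can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: the class `SnapshotStore`, its method names and the choice of SHA-256 as the checksum are hypothetical and are not specified by the embodiment.

```python
import hashlib
from typing import Dict, Tuple


class SnapshotStore:
    """Hypothetical in-memory snapshot store keyed by a 64-bit operation ID."""

    def __init__(self) -> None:
        # operation ID -> (snapshot bytes, SHA-256 digest of those bytes)
        self._snapshots: Dict[int, Tuple[bytes, str]] = {}
        self.write_count = 0

    def create_snapshot(self, op_id: int, data: bytes) -> str:
        # Interface check: reject IDs that do not fit in 64 bits.
        if not 0 <= op_id < 2 ** 64:
            raise ValueError("op_id must fit in 64 bits")
        # Idempotency: a repeated request with the same ID writes nothing new.
        if op_id in self._snapshots:
            return self._snapshots[op_id][1]
        digest = hashlib.sha256(data).hexdigest()
        self._snapshots[op_id] = (bytes(data), digest)
        self.write_count += 1
        return digest

    def verify(self, op_id: int, original: bytes) -> bool:
        # Data check: the snapshot must match the original data and its digest.
        snapshot, digest = self._snapshots[op_id]
        return snapshot == original and hashlib.sha256(snapshot).hexdigest() == digest

    def restore(self, op_id: int) -> bytes:
        # Return the data state captured when the snapshot was created.
        return self._snapshots[op_id][0]
```

Calling `create_snapshot` twice with the same `op_id` leaves `write_count` at 1, which is the behavior the unique ID is meant to guarantee when a request is retried.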
In this embodiment, the kernel mode interface submodule of the file system processes data through a kernel mode interface, directly intercepts data in the VFS layer, realizes kernel mode data reading and writing, and maintains kernel mode data cache;
the file system user mode interface sub-module realizes file system operation in a user mode, and realizes user mode data reading and writing through interaction between FUSE and a kernel mode interface.
It should be noted that mixing caches data in memory to speed up read operations, compression reduces the volume of data to be transmitted and improves transmission efficiency, and transmission delivers the data to the target storage location.
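The three stages just described (caching in memory, compression, transmission) can be chained as in the sketch below. The class `MigrationPipeline`, the `send` callback and the use of zlib are assumptions chosen for illustration; the embodiment does not name a compression algorithm or a transport.

```python
import zlib
from typing import Callable, Dict


class MigrationPipeline:
    """Hypothetical cache -> compress -> transmit pipeline."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._cache: Dict[str, bytes] = {}   # in-memory cache to speed up reads
        self._send = send                    # stand-in for the real transport

    def write(self, path: str, data: bytes) -> int:
        self._cache[path] = data             # stage 1: cache in memory
        payload = zlib.compress(data)        # stage 2: shrink the payload
        self._send(payload)                  # stage 3: deliver to the target
        return len(payload)

    def read(self, path: str) -> bytes:
        # Served from memory; raises KeyError if the path was never written.
        return self._cache[path]
```

For example, passing `sent.append` as the `send` callback collects the compressed payloads in a list, and `zlib.decompress` on the receiving side recovers the original bytes.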
In this embodiment, the file operation recording submodule is used to record detailed information of file operations, including but not limited to reading, writing, and modifying.
The data flow statistics sub-module is used for counting flow information of data, in particular, reading flow and writing flow of the data.
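A minimal sketch of the two submodules, assuming a simple in-memory record list and per-operation byte counters; the names `OpRecord` and `OperationMonitor` are hypothetical.

```python
import time
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OpRecord:
    """One recorded file operation."""
    timestamp: float
    op: str        # e.g. "read", "write", "modify"
    path: str
    nbytes: int


class OperationMonitor:
    """Hypothetical recorder for file operations and read/write traffic."""

    def __init__(self) -> None:
        self.records: List[OpRecord] = []
        self.traffic: Counter = Counter()   # bytes per operation type

    def record(self, op: str, path: str, nbytes: int = 0) -> None:
        self.records.append(OpRecord(time.time(), op, path, nbytes))
        # Only read and write operations contribute to traffic statistics.
        if op in ("read", "write"):
            self.traffic[op] += nbytes
```

After recording a 100-byte write and a 40-byte read, `traffic["write"]` is 100 and `traffic["read"]` is 40, while `records` keeps the detailed history of every operation.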
In this embodiment, the file system mounting and dismounting submodule is used for providing a mounting and dismounting function of the data protection file system;
the remote storage fault short-circuit switch submodule is used for monitoring a remote storage state, and providing a short-circuit switch to rapidly switch a storage path to standby storage when the remote storage fails;
The log management submodule comprises log grade, switch configuration and log data rolling deletion, wherein the log data rolling deletion is carried out according to file time, size and the like, and the log comprises system fault log records and data operation log records and is used for recording log information in the running process of a system.
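The behavior of the remote storage fault short-circuit switch can be sketched as below. The class `StorageSwitch`, the injected health probe and the manual `reset` are assumptions; the embodiment does not state how the remote state is probed or how the switch is restored.

```python
from typing import Callable


class StorageSwitch:
    """Hypothetical short-circuit switch between remote and standby storage."""

    def __init__(self, primary: str, standby: str,
                 is_healthy: Callable[[str], bool]) -> None:
        self.primary = primary
        self.standby = standby
        self._is_healthy = is_healthy     # assumed probe of the remote storage
        self.short_circuited = False

    def current_path(self) -> str:
        # Trip the switch the first time the remote storage is seen failing.
        if not self.short_circuited and not self._is_healthy(self.primary):
            self.short_circuited = True
        return self.standby if self.short_circuited else self.primary

    def reset(self) -> None:
        # Assumed manual step: route to the remote storage again.
        self.short_circuited = False
```

Once tripped, the switch stays on the standby path without probing the failed remote storage again, so callers are not delayed by repeated failing health checks; this latching behavior is a design choice of the sketch.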
A file system data migration method based on FUSE includes:
When an application is mounted on FUSE and executes a system call, the operation of the system call is routed to the FUSE driver through the VFS, wherein FUSE comprises a kernel module and a user space daemon (FUSE daemon);
The FUSE driver creates a FUSE request structure and places the request in a request queue, and the process that issued the operation is blocked;
The FUSE daemon reads the FUSE request from the kernel queue in the kernel module through the /dev/fuse device and submits the operation to an underlying file system, wherein the underlying file system includes EXT4 and F2FS;
After the underlying file system has processed the request, the FUSE daemon writes the reply back to the /dev/fuse device, and the FUSE driver marks the request as completed and wakes up the user process.
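The four steps above can be modeled in user space with a queue and an event. The sketch below is only a simulation of the control flow, not real FUSE code: `driver_submit`, `daemon_loop`, the `FuseRequest` class and the "STOP" opcode are hypothetical stand-ins for the kernel driver, the daemon and the request structure.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FuseRequest:
    """Stand-in for the request structure created by the driver."""
    opcode: str
    payload: Any
    reply: Any = None
    done: threading.Event = field(default_factory=threading.Event)


# Stand-in for the kernel request queue exposed through /dev/fuse.
request_queue: queue.Queue = queue.Queue()


def driver_submit(opcode: str, payload: Any) -> Any:
    """Driver side: enqueue the request and block the calling process."""
    req = FuseRequest(opcode, payload)
    request_queue.put(req)
    req.done.wait()            # the caller sleeps until the daemon replies
    return req.reply


def daemon_loop(backend: Callable[[str, Any], Any]) -> None:
    """Daemon side: read requests, call the underlying file system, reply."""
    while True:
        req = request_queue.get()
        if req.opcode == "STOP":   # hypothetical shutdown request
            req.done.set()
            break
        req.reply = backend(req.opcode, req.payload)
        req.done.set()             # mark as completed and wake the caller
```

Running `daemon_loop` in a background thread with a backend function and then calling `driver_submit("WRITE", data)` from the main thread reproduces the block-then-wake sequence of the method.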
In this embodiment, mounting the application on FUSE includes mounting the specified fuse_path to the /dev/fuse device via a mount function.
In this embodiment, when the FUSE daemon reads data through the /dev/fuse device, it uses the splice mechanism supported by FUSE to perform zero-copy transfers, where splice allows user space to move data between two kernel memory buffers without copying the data into user space.
Specifically, data in the form of a buffer containing a file descriptor is processed by the write_buf() method of the FUSE daemon, which saves one memory allocation and one copy.
In this embodiment, requests are read in multithreaded mode through fuse_session_loop_mt, which processes requests more efficiently than a single thread.
Specifically, the FUSE daemon processes file operation requests based on the libfuse library. In multithreaded mode, the FUSE daemon starts with one thread; if more than two requests are pending in the kernel queue, additional threads are spawned automatically, and by default up to 10 threads are supported to process requests simultaneously.
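The on-demand growth of worker threads up to a fixed limit can be illustrated with Python's standard thread pool, which also creates workers lazily up to `max_workers`. This is an analogy only and does not call libfuse; `handle_request`, `serve` and the "fuse-worker" thread name prefix are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple

MAX_WORKERS = 10   # mirrors the default limit of 10 threads described above


def handle_request(request_id: int) -> Tuple[int, str]:
    """Stand-in for processing one request; reports which thread served it."""
    return request_id, threading.current_thread().name


def serve(request_ids: Iterable[int]) -> List[Tuple[int, str]]:
    # Workers are created on demand as requests arrive, never more than 10.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS,
                            thread_name_prefix="fuse-worker") as pool:
        return list(pool.map(handle_request, request_ids))
```

Serving 50 requests with `serve(range(50))` returns the results in request order, handled by at most 10 distinct worker threads.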
In this embodiment, for data to be protected, non-invasive kernel interaction is implemented through FUSE, and file system data migration is performed in user space without modifying kernel codes;
The method comprises the steps of intercepting data in a VFS file system layer, and mixing, compressing and transmitting the acquired data through a kernel mode interface of the file system and a user mode interface of the file system;
It should be noted that, when the kernel module is loaded, it is registered as the FUSE file system driver of the Linux virtual file system. Because the main implementation code of the file system resides in user space, the kernel does not need to be recompiled, and a custom file system can be implemented without intruding on the kernel.
Specifically, a user-defined file system is implemented in user space through FUSE, and remote calls between the client and the server are handled through an RPC interface, thereby realizing remote backup and recovery of data. The metadata of FUSE backup files is stored in key-value form: the file metadata key is a compressed code generated from the parent directory and the file ID, and the file metadata value includes file attribute information and file block information. The file metadata keys are organized and stored in the index tree of the FUSE backup file system; the file metadata value of a backup is retrieved by indexing on the file metadata key, and the actual file data is stored according to the file block information.
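The key-value layout of the backup metadata can be sketched as follows. The embodiment does not disclose the exact key encoding or the index tree implementation, so this sketch assumes a 16-byte key built from two big-endian 64-bit integers and substitutes a sorted key list for the tree; `make_key` and `MetadataIndex` are hypothetical names.

```python
import bisect
import struct
from typing import Dict, List, Optional


def make_key(parent_id: int, file_id: int) -> bytes:
    """Assumed encoding: parent directory ID then file ID, big-endian 64-bit."""
    return struct.pack(">QQ", parent_id, file_id)


class MetadataIndex:
    """Hypothetical ordered key-value index standing in for the index tree."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []            # kept sorted for range scans
        self._values: Dict[bytes, dict] = {}

    def put(self, parent_id: int, file_id: int, attrs: dict, blocks: list) -> None:
        key = make_key(parent_id, file_id)
        if key not in self._values:
            bisect.insort(self._keys, key)
        # The value holds file attribute information and file block information.
        self._values[key] = {"attrs": attrs, "blocks": blocks}

    def get(self, parent_id: int, file_id: int) -> Optional[dict]:
        return self._values.get(make_key(parent_id, file_id))

    def list_dir(self, parent_id: int) -> List[int]:
        # Big-endian keys sort by parent ID first, so one range covers a directory.
        lo = bisect.bisect_left(self._keys, make_key(parent_id, 0))
        hi = bisect.bisect_right(self._keys, make_key(parent_id, 2 ** 64 - 1))
        return [struct.unpack(">QQ", k)[1] for k in self._keys[lo:hi]]
```

Placing the parent directory ID first in the key keeps all entries of one directory adjacent in key order, so listing a directory becomes a contiguous range scan.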
In order to reduce the speed gap between the processor and the memory, a cache is generally used to increase the speed of request response and data transmission; that is, data is first stored in a buffer area and then accessed and transmitted.
In this embodiment, in order to avoid the system call overhead caused by frequently sending many small pieces of data, a buffer module is designed. Its buffer area avoids unnecessary memory allocation and copying as far as possible; the buffer area is a character queue and includes a sender buffer and a receiver buffer;
the sender buffer appends data at the tail of the queue, and takes data out from the head of the queue to send to the socket;
the receiver buffer takes data out from the head of the queue, and appends data received from the socket at the tail of the queue.
Specifically, the buffer module stores data in the buffer area before transmission, and sends it in a batch once a certain amount has accumulated or a certain time has elapsed, which greatly improves the data transmission speed.
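The sender-side buffering described above can be sketched as a queue that is flushed when either a size threshold or a time threshold is reached. The class `SendBuffer`, its thresholds, the `send` callback standing in for the socket and the injectable `clock` are assumptions made for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class SendBuffer:
    """Hypothetical sender buffer that batches small writes before sending."""

    def __init__(self, send: Callable[[bytes], None], max_bytes: int = 4096,
                 max_delay: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._queue: Deque[bytes] = deque()
        self._size = 0
        self._send = send                  # stand-in for writing to the socket
        self._max_bytes = max_bytes
        self._max_delay = max_delay
        self._clock = clock
        self._first_at: Optional[float] = None   # arrival time of oldest chunk

    def append(self, data: bytes) -> None:
        if not self._queue:
            self._first_at = self._clock()
        self._queue.append(data)           # new data goes to the tail
        self._size += len(data)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if enough data has accumulated or enough time has passed."""
        if not self._queue:
            return False
        expired = self._clock() - self._first_at >= self._max_delay
        if self._size >= self._max_bytes or expired:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        chunks = []
        while self._queue:
            chunks.append(self._queue.popleft())   # data leaves from the head
        self._size = 0
        if chunks:
            self._send(b"".join(chunks))   # one send call for the whole batch
```

With `max_bytes=8`, appending `b"abc"` sends nothing, and appending `b"defgh"` afterwards triggers a single send of `b"abcdefgh"`. A periodic caller can invoke `maybe_flush()` so that data older than `max_delay` is sent even when the size threshold is never reached.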
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The present invention is not limited in any way by the preferred embodiments described above. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the disclosure above to make slight changes or modifications resulting in equivalent embodiments; any simple modifications, equivalent variations and alterations made to the above embodiments remain within the scope of the technical solution of the present invention.