
CN117453643B - File caching method, device, terminal and medium based on distributed file system - Google Patents


Info

Publication number
CN117453643B
CN117453643B (application CN202311777662.5A)
Authority
CN
China
Prior art keywords
file
information
client
request
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311777662.5A
Other languages
Chinese (zh)
Other versions
CN117453643A (en)
Inventor
孟军
郑华
裴来广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baike Data Technology Shenzhen Co ltd
Original Assignee
Baike Data Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baike Data Technology Shenzhen Co ltd filed Critical Baike Data Technology Shenzhen Co ltd
Priority to CN202311777662.5A
Publication of CN117453643A
Application granted
Publication of CN117453643B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • G06F16/134Distributed indices
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file caching method, device, terminal and medium based on a distributed file system, wherein the method comprises the following steps: receiving first request information of a first client, and determining file flag bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file flag bit information to the metadata server; determining a cache position of the target file at the first client based on the file flag bit information; and receiving second request information of a second client, and guiding the second client to read the target file from the cache position based on the second request information. The invention can realize efficient reading of files cached at a client and solve the problem of data lag caused by network delay, so as to realize file interaction among terminals.

Description

File caching method, device, terminal and medium based on distributed file system
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a file caching method, device, terminal, and medium based on a distributed file system.
Background
At present, most network connections cannot match the speed of a local hard disk, so network delay occurs. To reduce the influence of network delay on file storage, a cache is generally adopted. A storage system has two key metrics: bandwidth and IOPS (input/output operations per second, i.e., the number of requests that can be handled in one second). In practical applications, when a file is cached, it resides in the cache of a client and is not yet stored in the cluster; if another client wants to read the file, it does not know the file's location, which hinders reading of the file and reduces the data interaction efficiency between clients.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
In view of the above deficiencies of the prior art, the present invention provides a file caching method, device, terminal and medium based on a distributed file system, aiming to solve the problems in the prior art that cached files are inconvenient to read and data interaction efficiency between clients is low.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a file caching method based on a distributed file system, where the method includes:
receiving first request information of a first client, and determining file flag bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file flag bit information to the metadata server;
determining a cache position of the target file at the first client based on the file flag bit information;
and receiving second request information of a second client, and guiding the second client to read the target file from the cache position based on the second request information.
In one implementation manner, the determining the file flag bit information corresponding to the first request information includes:
determining a first request intention of the first client based on the first request information;
if the first request intention is a notification intention, acquiring index information corresponding to the notification intention in the first request information based on the notification intention, and obtaining file flag bit information based on the index information, wherein the index information is used for pointing to the target file.
In one implementation, the determining, based on the first request information, a first request intention of the first client includes:
analyzing the first request information to obtain analysis content;
and screening the analysis content for file header information, wherein if the header information exists in the analysis content, the first request intention is a notification intention, and the notification intention is used for notifying the metadata server of the header information.
In one implementation manner, if the first request intention is a notification intention, the acquiring index information corresponding to the notification intention in the first request information based on the notification intention and obtaining the file flag bit information based on the index information includes:
acquiring file storage directory information in the first client, wherein the file storage directory information is used for reflecting the mapping relation between the position codes of file storage and file header information;
matching the file storage directory information with the file header information corresponding to the notification intention, and determining position coding information pointed to by the file header information, wherein the position coding information is used for reflecting the position of the target file at the first client;
and generating the index information based on the position coding information, and taking the index information as the file flag bit information.
In one implementation manner, the determining, based on the file flag bit information, a cache location of the target file at the first client includes:
determining file storage path information corresponding to the file flag bit information based on the file flag bit information;
and determining a final storage node in the file storage path information based on the file storage path information, and determining the cache position based on the final storage node, wherein each storage node in the first client corresponds to a unique cache position.
In one implementation, the directing the second client to read the target file from the cache location based on the second request information includes:
determining a second request intention corresponding to the second request information based on the second request information;
and if the second request intention is a reading intention, acquiring all files in the cache position, determining the target file based on file header information received by the metadata server, and reading the target file.
In one implementation, the method further includes:
if the number of files cached in the first client reaches a threshold, acquiring an empty object file in the cluster;
and controlling the first client to cover the file to be cached on the empty object file in the cluster so as to finish file caching.
In a second aspect, an embodiment of the present invention further provides a file caching apparatus based on a distributed file system, where the apparatus includes:
the flag bit determining module is used for receiving first request information of a first client and determining file flag bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file flag bit information to the metadata server;
the cache position determining module is used for determining the cache position of the target file at the first client based on the file flag bit information;
and the file reading module is used for receiving second request information of a second client and guiding the second client to read the target file from the cache position based on the second request information.
In a third aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a memory, a processor, and a file caching program based on a distributed file system stored in the memory and capable of running on the processor, and when the processor executes the file caching program based on the distributed file system, the processor implements the steps of the file caching method based on the distributed file system according to any one of the above schemes.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a file caching program based on a distributed file system, where the file caching program based on the distributed file system implements the steps of the file caching method based on the distributed file system according to any one of the above schemes when the file caching program based on the distributed file system is executed by a processor.
The beneficial effects are that: compared with the prior art, the invention provides a file caching method based on a distributed file system, which first receives first request information of a first client and determines file flag bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file flag bit information to the metadata server. Then, based on the file flag bit information, the cache position of the target file at the first client is determined. Finally, second request information of a second client is received, and the second client is guided to read the target file from the cache position based on the second request information. The invention can realize efficient reading of files cached at the client and solve the problem of data lag caused by network delay, so as to realize file interaction among terminals.
Drawings
Fig. 1 is a flowchart of a specific implementation of a file caching method based on a distributed file system according to an embodiment of the present invention.
Fig. 2 is a functional schematic diagram of a file caching apparatus based on a distributed file system according to an embodiment of the present invention.
Fig. 3 is a schematic block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment provides a file caching method based on a distributed file system, which can realize efficient data interaction among all clients and effectively solve the problem that file information cannot be updated in time due to network delay. In a specific application, the embodiment may first receive first request information of a first client, and determine file flag bit information corresponding to the first request information, where the first client is used to cache a target file, and the first request information is used to request a metadata server in a cluster, so as to send the file flag bit information to the metadata server. And then, based on the file zone bit information, determining the cache position of the target file at the first client. And finally, receiving second request information of a second client, and guiding the second client to read the target file from the cache position based on the second request information.
The file caching method based on the distributed file system can be applied to a terminal, and the terminal can be an intelligent product terminal such as a computer and an intelligent television. In addition, the terminal in this embodiment may also be a virtual terminal such as a cloud server. In specific application, as shown in fig. 1, the file caching method based on the distributed file system of the present embodiment may include the following steps:
step S100, receiving first request information of a first client, and determining file zone bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file zone bit information to the metadata server.
In this embodiment, the first client and the second client each interact with the cluster to form a distributed file system (Distributed File System, cephFS). The first client is used for caching a target file, wherein the target file is a file which needs to be read by the second client later. The first client and the second client in this embodiment may be the same or different clients, and after the target file is cached in the first client, the first client may send the first request information to the cluster at this time, so as to notify the metadata server in the cluster that the target file is cached in the first client. In order to accurately call and read the target file in the subsequent steps, the metadata server in the cluster may determine file flag bit information corresponding to the first request information after receiving the first request information, where the file flag bit information is used to reflect a location where the first client caches the target file.
In one implementation, step S100 in this embodiment specifically includes the following steps:
step S101, determining a first request intention of the first client based on the first request information;
step S102, if the first request intention is a notification intention, acquiring index information corresponding to the notification intention in the first request information based on the notification intention, and obtaining the file zone bit information based on the index information, wherein the index information is used for guiding the target file.
Specifically, after the first client sends the first request information to the cluster, the metadata server in the cluster may analyze the first request information and determine the corresponding first request intention, so as to perform the corresponding control more accurately according to the determined intention. The metadata server in this embodiment receives the first request information and may parse it to obtain analysis content. Next, the embodiment identifies whether the analysis content contains header information reflecting the target file; specifically, the presence of header information in the analysis content may be determined by a text recognition or character recognition method. Because the header information is a pointer to the file and its unique identifier, if header information is found after screening the analysis content, the first request information carries header information that is received by the metadata server, and the goal of the first request information is to notify the metadata server of this header information; it can therefore be determined that the first request intention corresponding to the first request information is a notification intention.
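The intention determination described above can be sketched as follows. This is a minimal illustrative example, not the patent's actual protocol: the request encoding, the field name `file_header`, and the fallback to a read intention are all assumptions.

```python
import json

def parse_request(raw: bytes) -> dict:
    """Parse raw request bytes into analysis content (here: a dict of fields)."""
    return json.loads(raw.decode("utf-8"))

def determine_intention(request: dict) -> str:
    """Screen the analysis content for file header information: a request that
    carries header information is treated as a notification intention;
    otherwise it is assumed (for this sketch) to be a read intention."""
    if request.get("file_header"):
        return "notify"
    return "read"

# A first client notifying the metadata server that it cached a file.
raw = json.dumps({"client_id": "client-1",
                  "file_header": "hdr-0042"}).encode("utf-8")
print(determine_intention(parse_request(raw)))  # notify
```

A request without the header field would yield `"read"` under the same assumed screening rule.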
Further, when it is determined that the first request intention is a notification intention, the present embodiment may determine index information corresponding to the notification intention in the first request information based on the notification intention. In this embodiment, the index information is used to point to the target file, i.e., to indicate the location of the target file to the metadata server. In one implementation manner, the embodiment first obtains file storage directory information in the first client, where the file storage directory information is used to reflect the mapping relationship between the position codes of file storage and file header information. Since the first request information carries file header information, the file storage directory information is matched with the file header information corresponding to the notification intention, and the position coding information pointed to by the file header information can be determined, where the position coding information is used to reflect the position of the target file at the first client. The present embodiment can generate index information based on the position coding information, where the index information points the target file corresponding to the header information to the position corresponding to the position coding information. The embodiment can use the index information as the file flag bit information, so that the metadata server can directly determine the position of the target file based on the index information.
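The mapping from notified header information to file flag bit information might look like the following sketch, where the directory structure, position-code format, and function names are illustrative assumptions rather than the patent's data layout.

```python
# File storage directory information: mapping between file header information
# and the position codes of file storage at the first client (assumed format).
storage_directory = {"hdr-0042": "node-3/block-07"}

def build_flag_bit_info(file_header: str) -> dict:
    """Match the notified header against the storage directory, determine the
    position coding information it points to, and wrap the result as index
    information, which serves as the file flag bit information."""
    position_code = storage_directory.get(file_header)
    if position_code is None:
        raise KeyError(f"no cached file for header {file_header!r}")
    return {"file_header": file_header, "position_code": position_code}

print(build_flag_bit_info("hdr-0042"))
```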
Step S200, determining a cache position of the target file at the first client based on the file flag bit information.
After the metadata server obtains the file flag bit information, the cache position of the target file at the first client can be accurately determined. The cache position in this embodiment may be a cache area or a cache node, which is used to cache the data needed for interaction in the cluster.
In one implementation, the method in this embodiment includes the following steps when determining the cache location:
step S201, determining file storage path information corresponding to the file zone bit information based on the file zone bit information;
step S202, determining a final storage node in the file storage path information based on the file storage path information, and determining the cache position based on the final storage node, wherein each storage node in the first client corresponds to a unique cache position.
In this embodiment, since the file flag bit information is generated from the index information, which is in turn generated from the position coding information of the target file, the position coding information of the target file can be obtained from the file flag bit information. Because the position coding information reflects the code of the first client's storage location for the target file, the embodiment can determine the corresponding file storage path information based on the position coding information, and can determine the final storage node in the file storage path information, that is, the final location of the target file in the first client. Each storage node in the first client in this embodiment corresponds to a unique cache position; thus, the present embodiment can determine the cache position based on the final storage node.
In one implementation manner, the first client in this embodiment may divide the cache space into a plurality of storage blocks in advance, each storage block being provided with coding information and used for storing one target file; when a target file is called or read, the freed storage block may be reused to store a new target file. Because each storage block has unique coding information and stores one target file, once the file storage path is obtained it points directly to the position of the corresponding storage block, which is the cache position, so the efficiency of determining the cache position can be improved.
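The resolution of a cache position from the flag bit information can be sketched as below; the path separator, the `cache:` prefix, and the block naming are illustrative assumptions.

```python
def cache_position_from_flag(flag_bit_info: dict) -> str:
    """Treat the position code as a storage path: the last path component is
    the final storage node, and each storage node maps to a unique cache
    position (here modeled as one storage block per node)."""
    path = flag_bit_info["position_code"]   # e.g. "node-3/block-07"
    final_node = path.split("/")[-1]        # final storage node in the path
    return f"cache:{final_node}"            # unique cache position

print(cache_position_from_flag({"position_code": "node-3/block-07"}))
# cache:block-07
```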
And step S300, receiving second request information of a second client, and guiding the second client to read the target file from the cache position based on the second request information.
When the second client needs to read the target file cached in the first client, the second client may send second request information to the metadata server, and since the metadata server has already obtained the cache position of the target file in the first client, the metadata server in this embodiment may guide the second client to read the target file from the cache position after receiving the second request information. Therefore, the second client can read the target file more conveniently, and file data interaction can be realized even if the target file is not cached in the cluster.
In one implementation, when the second client reads the target file, the method includes the following steps:
step 301, determining a second request intention corresponding to the second request information based on the second request information;
step S302, if the second request is intended to be read, all files in the cache position are obtained, the target file is determined based on the file header information received by the metadata server, and the target file is read.
After receiving the second request information, the metadata server first determines the corresponding second request intention. When determining the second request intention, the embodiment may also obtain the corresponding analysis content by analyzing the second request information, then check whether the analysis content contains a corresponding specific character, and if so, determine the second request intention. In this embodiment, if it is determined that the second request intention is a read intention, all files stored in the cache position may be acquired. The target file is then determined based on the file header information received by the metadata server, and the target file is read. Therefore, even if the target file is only cached in the first client and not in the cluster due to network delay, the second client of this embodiment can still accurately and efficiently read the target file from the first client, thereby meeting the cluster data interaction requirement.
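The read flow above can be sketched as follows: fetch everything at the cache position and pick out the target file by the header the metadata server previously received. The in-memory structures are illustrative assumptions.

```python
# Files held at one cache position of the first client, keyed by file header.
cached_files = {
    "hdr-0042": b"target file contents",
    "hdr-0099": b"other cached file",
}

def read_target(cache_position_files: dict, notified_header: str) -> bytes:
    """Acquire all files at the cache position and determine the target file
    based on the header information the metadata server received."""
    for header, content in cache_position_files.items():
        if header == notified_header:
            return content
    raise FileNotFoundError(notified_header)

print(read_target(cached_files, "hdr-0042"))
```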
Furthermore, in another implementation, the cache capacity of the first client is limited; if the number of files cached in the first client reaches a threshold, the present embodiment may transfer files into the cluster. Specifically, if the number of cached files in the first client reaches the threshold, an empty object file in the cluster is obtained, where an empty object file is an empty file without any content. These empty object files occupy positions in the cluster but serve no purpose, so the embodiment may control the first client to cover the file to be cached onto an empty object file in the cluster to complete file caching.
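The threshold-triggered overwrite of empty object files might be modeled like this; the cluster representation, threshold value, and object names are illustrative assumptions.

```python
CACHE_THRESHOLD = 2  # assumed cache capacity of the first client

client_cache = {"hdr-0042": b"a", "hdr-0099": b"b"}   # cache already full
cluster = {"empty-obj-1": b"", "obj-2": b"payload"}   # cluster objects

def cache_file(header: str, content: bytes) -> str:
    """Cache locally while there is room; once the threshold is reached,
    cover an empty object file in the cluster with the file to be cached."""
    if len(client_cache) < CACHE_THRESHOLD:
        client_cache[header] = content
        return "client"
    for name, data in cluster.items():
        if data == b"":                  # an empty file without any content
            cluster[name] = content      # cover it with the new file
            return name
    raise RuntimeError("no empty object file available")

print(cache_file("hdr-0123", b"new file"))  # empty-obj-1
```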
In summary, the present embodiment may first receive first request information of a first client, and determine file flag bit information corresponding to the first request information, where the first client is configured to cache a target file, and the first request information is configured to request a metadata server in a cluster to send the file flag bit information to the metadata server. Then, based on the file flag bit information, the cache position of the target file at the first client is determined. Finally, second request information of a second client is received, and the second client is guided to read the target file from the cache position based on the second request information. The embodiment can realize efficient reading of files cached at the client and solve the problem of data lag caused by network delay, so as to realize file interaction among terminals.
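The three steps summarized above can be combined into one end-to-end sketch of the metadata server's role. All class and field names here are illustrative assumptions, not the patent's or CephFS's actual APIs.

```python
class MetadataServer:
    """Toy metadata server: records where a client cached a file (step S100),
    resolves the cache position (S200), and directs readers there (S300)."""

    def __init__(self):
        # file header -> (caching client, position code) flag bit information
        self.flag_bits = {}

    def handle_notify(self, client_id: str, header: str, position_code: str):
        """Step S100: the first client notifies where it cached the file."""
        self.flag_bits[header] = (client_id, position_code)

    def handle_read(self, header: str) -> dict:
        """Steps S200/S300: resolve the final storage node and guide the
        second client to read from that cache position."""
        client_id, position_code = self.flag_bits[header]
        final_node = position_code.split("/")[-1]
        return {"client": client_id, "cache_position": final_node}

mds = MetadataServer()
mds.handle_notify("client-1", "hdr-0042", "node-3/block-07")
print(mds.handle_read("hdr-0042"))
# {'client': 'client-1', 'cache_position': 'block-07'}
```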
Based on the above embodiment, the present invention further provides a file caching apparatus based on a distributed file system, as shown in fig. 2, where the apparatus includes: the system comprises a flag bit determining module 10, a cache position determining module 20 and a file reading module 30. Specifically, the flag determining module 10 is configured to receive first request information of a first client, and determine file flag information corresponding to the first request information, where the first client is configured to cache a target file, and the first request information is configured to request a metadata server in a cluster, so as to send the file flag information to the metadata server. The cache location determining module 20 is configured to determine a cache location of the target file at the first client based on the file flag bit information. The file reading module 30 is configured to receive second request information of a second client, and direct the second client to read the target file from the cache location based on the second request information.
In one implementation, the flag bit determination module 10 includes:
a first request intention determining unit configured to determine a first request intention of the first client based on the first request information;
the file flag determining unit is used for acquiring index information corresponding to the notification intention in the first request information based on the notification intention if the first request intention is a notification intention, and obtaining the file flag bit information based on the index information, wherein the index information is used for pointing to the target file.
In one implementation, the first request intention determining unit includes:
the request analysis subunit is used for analyzing the first request information to obtain analysis content;
and the intention analysis subunit is used for screening the analysis content for header information; if the header information exists in the analysis content, the first request intention is determined to be a notification intention, and the notification intention is used for notifying the metadata server of the header information.
In one implementation, the file flag determining unit includes:
a catalog acquisition subunit, configured to acquire file storage catalog information in the first client, where the file storage catalog information is used to reflect a mapping relationship between a position code of file storage and file header information;
the position coding subunit is used for matching the file storage directory information with the file header information corresponding to the notification intention and determining position coding information pointed to by the file header information, wherein the position coding information is used for reflecting the position of the target file at the first client;
and the flag determining subunit is used for generating the index information based on the position coding information and taking the index information as the file flag bit information.
In one implementation, the cache location determining module 20 includes:
the storage path determining unit is used for determining file storage path information corresponding to the file zone bit information based on the file zone bit information;
and the storage node determining unit is used for determining a final storage node in the file storage path information based on the file storage path information and determining the cache position based on the final storage node, wherein each storage node in the first client corresponds to a unique cache position.
In one implementation, the file reading module 30 includes:
a second request intention determining unit, configured to determine, based on the second request information, a second request intention corresponding to the second request information;
and the target file reading unit is used for acquiring all files in the cache position if the second request intention is a read intention, determining the target file based on the file header information received by the metadata server, and reading the target file.
In one implementation, the apparatus further comprises:
the empty object file determining unit is used for acquiring an empty object file in the cluster if the number of files cached in the first client reaches a threshold;
and the file covering unit is used for controlling the first client to cover the file to be cached on the empty object file in the cluster so as to finish file caching.
The working principle of each module in the file caching device based on the distributed file system in this embodiment is the same as the principle of each step in the above method embodiment, and will not be described herein again.
Based on the above embodiment, the present invention also provides a terminal, and a schematic block diagram of the terminal may be shown in fig. 3. The terminal may include one or more processors 100 (only one shown in fig. 3), a memory 101, and a computer program 102 stored in the memory 101 and executable on the one or more processors 100, such as a file caching program based on a distributed file system. The one or more processors 100, when executing the computer program 102, may implement the various steps in an embodiment of a distributed file system based file caching method. Alternatively, the functions of the modules/units in the embodiments of the file caching apparatus based on a distributed file system may be implemented by one or more processors 100 when executing computer program 102, which is not limited herein.
In one embodiment, the processor 100 may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In one embodiment, the memory 101 may be an internal storage unit of the electronic device, such as a hard disk or an internal memory of the electronic device. The memory 101 may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) provided on the electronic device. Further, the memory 101 may include both an internal storage unit and an external storage device of the electronic device. The memory 101 is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
It will be appreciated by those skilled in the art that the functional block diagram shown in fig. 3 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not limit the terminal to which the present inventive arrangements may be applied; a specific terminal may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A file caching method based on a distributed file system, the method comprising:
receiving first request information of a first client, and determining file flag bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file flag bit information to the metadata server;
determining a cache position of the target file at the first client based on the file flag bit information;
receiving second request information of a second client, and guiding the second client to read the target file from the cache position based on the second request information;
the determining the file flag bit information corresponding to the first request information includes:
determining a first request intention of the first client based on the first request information;
if the first request intention is a notification intention, acquiring index information corresponding to the notification intention in the first request information based on the notification intention, and acquiring the file flag bit information based on the index information, wherein the index information is used to point to the target file;
the determining, based on the first request information, a first request intention of the first client includes:
analyzing the first request information to obtain analysis content;
screening the header information of the analysis content, and if the header information exists in the analysis content, determining that the first request intention is a notification intention, wherein the notification intention is used for notifying the metadata server of the header information;
the acquiring, if the first request intention is a notification intention, index information corresponding to the notification intention in the first request information based on the notification intention, and acquiring the file flag bit information based on the index information includes:
acquiring file storage directory information in the first client, wherein the file storage directory information is used for reflecting the mapping relationship between the position codes of file storage and file header information;
matching the file storage directory information with the file header information corresponding to the notification intention, and determining position coding information pointed to by the file header information, wherein the position coding information is used for reflecting the position of the target file at the first client;
generating the index information based on the position coding information, and taking the index information as the file flag bit information;
the determining, based on the file flag bit information, a cache position of the target file at the first client includes:
determining file storage path information corresponding to the file flag bit information based on the file flag bit information;
determining a final storage node in the file storage path information based on the file storage path information, and determining the cache position based on the final storage node, wherein each storage node in the first client corresponds to a unique cache position;
or,
the determining, based on the file flag bit information, a cache position of the target file at the first client further includes:
dividing, in advance, the cache space in the first client into a plurality of storage blocks, setting coding information for each storage block, and storing one target file in each storage block, wherein each storage block has unique coding information, and when a target file is called or read, the freed storage block continues to store a new target file;
after a file storage path is obtained, determining the position of the corresponding storage block based on the file storage path to obtain the cache position;
the directing the second client to read the target file from the cache location based on the second request information includes:
determining a second request intention corresponding to the second request information based on the second request information;
if the second request intention is a reading intention, acquiring all files in the cache position, determining the target file based on file header information received by the metadata server, and reading the target file;
the method further comprises:
if the number of cached files in the first client reaches a threshold, acquiring an empty object file in the cluster;
and controlling the first client to cover the file to be cached on the empty object file in the cluster so as to finish file caching.
2. A file caching apparatus based on a distributed file system, the apparatus comprising:
the flag bit determining module is used for receiving first request information of a first client and determining file flag bit information corresponding to the first request information, wherein the first client is used for caching a target file, and the first request information is used for requesting a metadata server in a cluster so as to send the file flag bit information to the metadata server;
the cache position determining module is used for determining the cache position of the target file at the first client based on the file flag bit information;
the file reading module is used for receiving second request information of a second client and guiding the second client to read the target file from the cache position based on the second request information;
the flag bit determining module includes:
a first request intention determining unit configured to determine a first request intention of the first client based on the first request information;
a file flag determining unit, configured to, if the first request intention is a notification intention, obtain index information corresponding to the notification intention in the first request information based on the notification intention, and obtain the file flag bit information based on the index information, where the index information is used to point to the target file;
the first request intention determining unit includes:
the request analysis subunit is used for analyzing the first request information to obtain analysis content;
an intention analysis subunit, configured to screen the parsed content for header information, and if header information exists in the parsed content, determine that the first request intention is a notification intention, where the notification intention is used to notify the metadata server of the header information;
the file flag determining unit includes:
a directory acquisition subunit, configured to acquire file storage directory information in the first client, where the file storage directory information is used for reflecting the mapping relationship between the position codes of file storage and file header information;
a position coding subunit, configured to match the file storage directory information with the file header information corresponding to the notification intention and determine position coding information pointed to by the file header information, where the position coding information is used for reflecting the position of the target file at the first client;
a flag determining subunit, configured to generate the index information based on the position coding information and use the index information as the file flag bit information;
the cache position determining module includes:
a storage path determining unit, configured to determine file storage path information corresponding to the file flag bit information based on the file flag bit information;
a storage node determining unit, configured to determine a final storage node in the file storage path information based on the file storage path information, and determine the cache position based on the final storage node, where each storage node in the first client corresponds to a unique cache position;
or,
the cache position determining module is further configured to:
divide, in advance, the cache space in the first client into a plurality of storage blocks, set coding information for each storage block, and store one target file in each storage block, where each storage block has unique coding information, and when a target file is called or read, the freed storage block continues to store a new target file;
after a file storage path is obtained, determine the position of the corresponding storage block based on the file storage path to obtain the cache position;
the file reading module comprises:
a second request intention determining unit, configured to determine, based on the second request information, a second request intention corresponding to the second request information;
a target file reading unit, configured to, if the second request intention is a reading intention, obtain all files in the cache position, determine the target file based on the file header information received by the metadata server, and read the target file;
the apparatus further comprises:
an empty object file determining unit, configured to obtain an empty object file in the cluster if the number of cached files in the first client reaches a threshold;
and the file covering unit is used for controlling the first client to cover the file to be cached on the empty object file in the cluster so as to finish file caching.
3. A terminal comprising a memory, a processor and a distributed file system based file caching program stored in the memory and executable on the processor, the processor implementing the steps of the distributed file system based file caching method of claim 1 when executing the distributed file system based file caching program.
4. A computer readable storage medium, wherein a file caching program based on a distributed file system is stored on the computer readable storage medium, and when the file caching program based on the distributed file system is executed by a processor, the steps of the file caching method based on the distributed file system according to claim 1 are implemented.
CN202311777662.5A 2023-12-22 2023-12-22 File caching method, device, terminal and medium based on distributed file system Active CN117453643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311777662.5A CN117453643B (en) 2023-12-22 2023-12-22 File caching method, device, terminal and medium based on distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311777662.5A CN117453643B (en) 2023-12-22 2023-12-22 File caching method, device, terminal and medium based on distributed file system

Publications (2)

Publication Number Publication Date
CN117453643A CN117453643A (en) 2024-01-26
CN117453643B (en) 2024-04-02

Family

ID=89591408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311777662.5A Active CN117453643B (en) 2023-12-22 2023-12-22 File caching method, device, terminal and medium based on distributed file system

Country Status (1)

Country Link
CN (1) CN117453643B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118972452A (en) * 2024-08-22 2024-11-15 Beijing Volcano Engine Technology Co., Ltd. File access method, medium, device and program product of distributed cache system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103179185A (en) * 2012-12-25 2013-06-26 中国科学院计算技术研究所 Method and system for creating files in client cache of distributed file system
JP2017123040A (en) * 2016-01-07 2017-07-13 日本電気株式会社 Server device, distribution file system, distribution file system control method, and program
CN116991800A (en) * 2023-07-26 2023-11-03 中国电信股份有限公司技术创新中心 File acquisition system, method, device, computer equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8321389B2 (en) * 2009-01-08 2012-11-27 International Business Machines Corporation Method, apparatus and computer program product for maintaining file system client directory caches with parallel directory writes


Non-Patent Citations (1)

Title
Client-side metadata cache model for the BlueWhale distributed file system; Huang Hua; Zhang Jiangang; Xu Lu; Computer Science; 2005-09-25 (No. 09); pp. 245-247, 250 *

Also Published As

Publication number Publication date
CN117453643A (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN109582899B (en) Page loading method and device, computer equipment and storage medium
CN109240946B (en) Multi-level caching method of data and terminal equipment
US20220121495A1 (en) Memory reclamation method, electronic device and storage medium
CN113419824A (en) Data processing method, device, system and computer storage medium
US10366026B1 (en) Random access to decompressed blocks
CN112214420B (en) Data caching method, storage control device and storage equipment
CN115964319B (en) Data processing method for remote direct memory access and related products
CN117453643B (en) File caching method, device, terminal and medium based on distributed file system
CN113010455B (en) Data processing method and device and electronic equipment
CN112632008B (en) Data slicing transmission method and device and computer equipment
CN107015978B (en) Webpage resource processing method and device
CN110022341B (en) Data transmission method and related equipment
CN113411364A (en) Resource acquisition method and device and server
CN119311229A (en) Data writing control method, system, terminal and medium
CN108287793B (en) Response message buffering method and server
CN112214178B (en) Storage system, data reading method and data writing method
CN117472799B (en) Elastic allocation method, device, terminal and storage medium for cache resources
CN109284260B (en) Big data file reading method and device, computer equipment and storage medium
CN109800184B (en) Caching method, system, device and storable medium for small block input
US10015012B2 (en) Precalculating hashes to support data distribution
CN112671918B (en) Binary system-based distributed data downloading method, device, equipment and medium
CN113726838B (en) File transmission method, device, equipment and storage medium
CN114116656A (en) Data processing method and related device
CN114218170A (en) File reading method and device
US10798464B1 (en) Streaming delivery of client-executable code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant