
CN113986118B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN113986118B
CN113986118B · Application CN202111140712.XA
Authority
CN
China
Prior art keywords
data
queue
flushing
disk
storage medium
Prior art date
Legal status (the legal status is an assumption and is not a legal conclusion)
Active
Application number
CN202111140712.XA
Other languages
Chinese (zh)
Other versions
CN113986118A (en)
Inventor
余思明
Current Assignee (the listed assignee may be inaccurate)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (the priority date is an assumption and is not a legal conclusion)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN202111140712.XA
Publication of CN113986118A
Application granted
Publication of CN113986118B
Legal status: Active

Classifications

    • G06F3/061 — Improving I/O performance
    • G06F3/0638 — Organizing or formatting or addressing of data
    • G06F3/0671 — In-line storage system
    (all under G06F — electric digital data processing; G06F3/06 — digital input from, or digital output to, record carriers; G06F3/0601 — interfaces specially adapted for storage systems)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of data storage technologies, and in particular, to a data processing method and apparatus. The method is applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, and includes: receiving a target data write request and determining whether first data associated with the target data exists in the first storage medium; if the first data exists, merging the target data with the first data to obtain second data; determining whether the data length of the second data meets a preset requirement; if it does, adding the second data to a first flush queue, the first flush queue being a high-priority flush queue; and if it does not, adding the second data to a second flush queue, the second flush queue being a low-priority flush queue.

Description

Data processing method and device
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a data processing method and apparatus.
Background
In a storage array, an SSD is commonly used as a write buffer in front of the HDDs to improve overall system performance. However, because the SSD's capacity is always smaller than the HDD's, continuous writes eventually exhaust its space. At that point, previously written dirty data (data written to the SSD but not yet to the HDD is called "dirty data"; otherwise it is "clean data") must be written to the HDD, a process called flushing. Flushed data becomes clean and can be evicted to free space for new writes. The flushing algorithm decides which data to write to the HDD, typically selecting data that has not been accessed recently; this makes more efficient use of the SSD's data blocks and improves the system as a whole.
The SSD provides write acceleration to the outside and must continuously accept user write requests, steadily consuming space. After long operation, when space nears exhaustion, the write performance the SSD can offer depends on how fast dirty data can be flushed to the HDD. Current flushing strategies aim to flush the coldest dirty data cached on the SSD to the HDD as quickly as possible, converting it to clean data and freeing space to continue serving data access. Such an algorithm maximizes the data hit rate, but it ignores the performance characteristics of the HDD: the flush workload may have poor affinity to the HDD, dirty data is written out too slowly, and external performance suffers.
Disclosure of Invention
The present application provides a data processing method and apparatus to solve the prior-art problem of low flushing efficiency caused by the flush workload's poor affinity to the HDD.
In a first aspect, the present application provides a data processing method applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, the method including:
receiving a target data write request and determining whether first data associated with the target data exists in the first storage medium;
if the first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data;
determining whether the data length of the second data meets a preset requirement;
if the data length of the second data meets the preset requirement, adding the second data to a first flush queue, wherein the first flush queue is a high-priority flush queue;
and if the data length of the second data does not meet the preset requirement, adding the second data to a second flush queue, wherein the second flush queue is a low-priority flush queue.
Optionally, the step of determining whether first data associated with the target data exists in the first storage medium includes:
determining whether the data space written by the target data overlaps or is contiguous with the data space of cached data, and if so, determining that the cached data is the first data associated with the target data.
Optionally, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets the preset requirement.
Optionally, the step of adding the second data to the first flush queue includes:
adding the second data to the tail of the first flush queue;
and the step of adding the second data to the second flush queue includes:
adding the second data to the tail of the second flush queue.
Optionally, the method further includes:
when a flush instruction is received, determining whether to-be-flushed data is cached in the first flush queue;
if to-be-flushed data is cached in the first flush queue, writing some or all of the to-be-flushed data, taken from the head of the first flush queue, to the second storage medium;
if no to-be-flushed data is stored in the first flush queue, determining whether to-be-flushed data is cached in the second flush queue;
and if to-be-flushed data is cached in the second flush queue, writing some or all of the to-be-flushed data, taken from the head of the second flush queue, to the second storage medium.
In a second aspect, the present application provides a data processing apparatus applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, the apparatus comprising:
A receiving unit configured to receive a target data write request;
a first judging unit configured to judge whether first data associated with the target data exists in the first storage medium;
wherein if the first judging unit determines that the first data associated with the target data exists in the first storage medium, the target data and the first data are combined to obtain second data;
a second judging unit, configured to determine whether the data length of the second data meets a preset requirement;
wherein if the second judging unit determines that the data length of the second data meets the preset requirement, the second data is added to a first flush queue, the first flush queue being a high-priority flush queue;
and if the second judging unit determines that the data length of the second data does not meet the preset requirement, the second data is added to a second flush queue, the second flush queue being a low-priority flush queue.
Optionally, when determining whether the first data associated with the target data exists in the first storage medium, the first judging unit is specifically configured to:
determine whether the data space written by the target data overlaps or is contiguous with the data space of cached data, and if so, determine that the cached data is the first data associated with the target data.
Optionally, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets the preset requirement.
Optionally, when adding the second data to the first flush queue, the second judging unit is specifically configured to:
add the second data to the tail of the first flush queue;
and when adding the second data to the second flush queue, the second judging unit is specifically configured to:
add the second data to the tail of the second flush queue.
Optionally, the apparatus further includes:
a flushing unit, configured to determine, when a flush instruction is received, whether to-be-flushed data is cached in the first flush queue; if to-be-flushed data is cached in the first flush queue, write some or all of it, taken from the head of the first flush queue, to the second storage medium; if no to-be-flushed data is stored in the first flush queue, determine whether to-be-flushed data is cached in the second flush queue; and if to-be-flushed data is cached in the second flush queue, write some or all of it, taken from the head of the second flush queue, to the second storage medium.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including:
a memory for storing program instructions;
a processor for invoking program instructions stored in said memory, performing the steps of the method according to any of the first aspects above in accordance with the obtained program instructions.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the steps of the method according to any one of the first aspects.
As can be seen from the foregoing, the data processing method according to the embodiments of the present application is applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, and includes: receiving a target data write request and determining whether first data associated with the target data exists in the first storage medium; if so, merging the target data and the first data to obtain second data; determining whether the data length of the second data meets a preset requirement; if it does, adding the second data to a first flush queue, the first flush queue being a high-priority flush queue; and if it does not, adding the second data to a second flush queue, the second flush queue being a low-priority flush queue.
With the data processing method provided by the embodiments of the present application, the association between dirty data items is checked, several short dirty data items are merged into one long dirty data item, and long items are given a higher flush priority. Prioritizing by dirty-data size makes the flush workload issued to the HDD more HDD-friendly, so the SSD used for cache acceleration can provide better external write performance.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a detailed flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 is a detailed flowchart of another data processing method according to an embodiment of the present application;
FIG. 3 is a detailed flowchart of a disk flushing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations including one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Furthermore, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
As an example, FIG. 1 shows a detailed flowchart of a data processing method according to an embodiment of the present application. The method is applied to a storage server including a first storage medium composed of an SSD and a second storage medium composed of an HDD, and includes the following steps:
Step 100: a target data write request is received and it is determined whether first data associated with the target data exists in the first storage medium.
In the embodiment of the present application, a preferred implementation of determining whether the first data associated with the target data exists in the first storage medium is as follows:
determine whether the data space written by the target data overlaps or is contiguous with the data space of cached data; if so, the cached data is the first data associated with the target data.
For example, assuming that data 1 is already stored in the SSD with a data space of 0-15, if the currently written target data has a data space of 13-16, the two data spaces overlap (at 13-15), so data 1 is determined to be associated with the target data.
Assuming that data 2 is already stored in the SSD with a data space of 0-15, if the currently written target data has a data space of 16-20, the two data spaces are linearly contiguous (0-15 followed by 16-20), so data 2 is determined to be associated with the target data.
Assuming that data 3 is already stored in the SSD with a data space of 0-15, if the currently written target data has a data space of 17-20, the two data spaces neither overlap nor are contiguous (address 16 is missing), so data 3 is determined not to be associated with the target data.
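The three association checks above reduce to a simple interval test. The following sketch reproduces them; the function name and the inclusive-range tuple representation are illustrative, not from the patent:

```python
def is_associated(cached, target):
    """Return True if two inclusive address ranges overlap or are directly adjacent."""
    c_start, c_end = cached
    t_start, t_end = target
    # Overlap or adjacency: allow the ranges to touch at a boundary (e.g. 15/16).
    return t_start <= c_end + 1 and c_start <= t_end + 1

# The three examples from the text:
print(is_associated((0, 15), (13, 16)))  # True:  overlap at 13-15
print(is_associated((0, 15), (16, 20)))  # True:  linearly contiguous at 15/16
print(is_associated((0, 15), (17, 20)))  # False: address 16 is missing
```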
Step 110: and if the first data associated with the target data exists in the first storage medium, combining the target data and the first data to obtain second data.
As described above, suppose data 1 is already stored in the SSD with a data space of 0-15, the currently written target data has a data space of 13-16, and data 1 is associated with the target data. After data 1 is merged with the target data, the data space of the resulting second data is 0-16.
Similarly, suppose data 2 is already stored in the SSD with a data space of 0-15, the currently written target data has a data space of 16-20, and data 2 is associated with the target data. After data 2 is merged with the target data, the data space of the resulting second data is 0-20.
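Merging an associated pair is then just taking the union of the two inclusive ranges. A minimal sketch (names are illustrative):

```python
def merge_ranges(first_data, target_data):
    """Merge two associated (overlapping or adjacent) inclusive ranges into one."""
    return (min(first_data[0], target_data[0]),
            max(first_data[1], target_data[1]))

# The two merge examples from the text:
print(merge_ranges((0, 15), (13, 16)))  # (0, 16)
print(merge_ranges((0, 15), (16, 20)))  # (0, 20)
```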
Step 120: judging whether the data length of the second data meets the preset requirement.
In the embodiment of the application, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets the preset requirement.
That is, whether the data length of the second data meets the preset requirement can be determined by comparing it with the set threshold: if the length is greater than or equal to the threshold, the requirement is met; if it is less than the threshold, the requirement is not met.
Step 130: if the data length of the second data meets the preset requirement, add the second data to the first flush queue, wherein the first flush queue is a high-priority flush queue.
Step 140: if the data length of the second data does not meet the preset requirement, add the second data to the second flush queue, wherein the second flush queue is a low-priority flush queue.
In the embodiment of the application, the first flush queue is a high-priority flush queue and the second flush queue is a low-priority flush queue; that is, the flush priority of the first flush queue is higher than that of the second flush queue.
That is, when a flush operation is performed, if there is to-be-flushed data in the high-priority first flush queue, that data is processed first. The to-be-flushed data in the second flush queue is processed only when the first flush queue has no to-be-flushed data.
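At enqueue time, steps 120-140 amount to a length comparison against the threshold. A minimal sketch using two deques; the default threshold value of 16 is an assumed example, since the patent only speaks of a "set threshold":

```python
from collections import deque

def enqueue(second_data, length, high_queue, low_queue, threshold=16):
    """Append merged dirty data to the tail of the flush queue chosen by its length."""
    if length >= threshold:
        high_queue.append(second_data)   # first (high-priority) flush queue
    else:
        low_queue.append(second_data)    # second (low-priority) flush queue

hq, lq = deque(), deque()
enqueue((0, 19), 20, hq, lq)   # long item -> tail of the high-priority queue
enqueue((32, 35), 4, hq, lq)   # short item -> tail of the low-priority queue
```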
The data processing method provided by the embodiment of the present application is described in detail below with reference to a specific application scenario. FIG. 2 shows a detailed flowchart of another data processing method according to an embodiment of the present application: receive a user write request; determine whether the request is a write-through; if so, write it directly to the HDD; if not, check whether there is cached data associated with the write (i.e., first data whose data space overlaps or is linearly contiguous with that of the target data); if there is, merge the data to obtain second data (dirty data); then determine whether the data length of the dirty data (the associated first data merged with the target data) is greater than or equal to the set threshold; if so, hang the dirty data on the high-priority flush queue and write it to the SSD; if not, hang it on the low-priority flush queue and write it to the SSD.
Further, FIG. 3 shows a detailed flowchart of a disk flushing method according to an embodiment of the present application. When a flush instruction is received, flushing starts: determine whether the high-priority queue is empty; if not, take dirty data from its head and check whether the amount of dirty data taken has reached the single-flush threshold; if so, flush the collected dirty data to the HDD, otherwise continue taking dirty data from the head of the high-priority queue. If the high-priority queue is empty, determine whether the low-priority queue is empty; if not, take dirty data from its head, and when the amount of dirty data reaches the single-flush threshold, flush the collected dirty data to the HDD. Of course, if both the high-priority and low-priority queues are empty, no dirty data needs to be flushed to the HDD.
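The flush procedure of FIG. 3 can be sketched as follows. Queue entries are assumed to be (range, length) pairs and the single-flush threshold is a parameter; both representations are illustrative, not from the patent:

```python
from collections import deque

def flush(high_queue, low_queue, single_flush_threshold):
    """Collect dirty data for one flush pass: drain from the head of the
    high-priority queue first, and fall back to the low-priority queue
    only when the high-priority queue is empty."""
    queue = high_queue if high_queue else low_queue
    batch, amount = [], 0
    while queue and amount < single_flush_threshold:
        entry = queue.popleft()      # head of the chosen queue
        batch.append(entry)
        amount += entry[1]           # accumulate the dirty-data amount
    return batch                     # to be written to the HDD
```

An empty return value means neither queue holds dirty data, matching the "nothing to flush" branch of FIG. 3.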
For example, assume the currently managed dirty data space (i.e., the address spaces in the SSD holding written data) is ([0-15], [56-63], [72-127], [144-159]) and the high-priority threshold is set to 16.
If the data space of the target data written this time is 32-35, it neither overlaps nor is contiguous with the data spaces of the other written data, so it forms an independent data item of length 4. Since 4 is less than the high-priority threshold, the data is placed at the tail of the low-priority flush queue.
If the data space of the target data written this time is 16-19, the first data associated with it is 0-15, and the second data obtained after merging is 0-19, with a data length of 20. Since 20 is greater than the high-priority threshold, the second data must be placed at the tail of the high-priority flush queue (the original 0-15 is removed from the high-priority queue, and the merged data is hung at its tail).
If the data space of the target data written this time is 52-55, the first data associated with it is 56-63, and the second data obtained after merging is 52-63, with a data length of 12. Since 12 is less than the high-priority threshold, the second data must be placed at the tail of the low-priority flush queue (the original 56-63 is removed from the low-priority queue, and the merged data is hung at its tail).
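Putting the pieces together, the three worked cases above can be checked with a small end-to-end sketch. The flat list stands in for the per-queue bookkeeping, and all names are illustrative:

```python
def classify_write(dirty_ranges, target, threshold=16):
    """Merge the target write with any associated cached ranges, record the
    result, and return the merged range plus the queue it should join."""
    for r in list(dirty_ranges):
        # Overlap-or-adjacency test on inclusive ranges.
        if target[0] <= r[1] + 1 and r[0] <= target[1] + 1:
            dirty_ranges.remove(r)  # take the old entry off its queue
            target = (min(r[0], target[0]), max(r[1], target[1]))
    dirty_ranges.append(target)     # merged data goes to the chosen queue's tail
    length = target[1] - target[0] + 1
    return target, "high" if length >= threshold else "low"

ranges = [(0, 15), (56, 63), (72, 127), (144, 159)]
print(classify_write(ranges, (32, 35)))  # ((32, 35), 'low'):  length 4
print(classify_write(ranges, (16, 19)))  # ((0, 19), 'high'):  length 20
print(classify_write(ranges, (52, 55)))  # ((52, 63), 'low'):  length 12
```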
Referring to fig. 4, a schematic structural diagram of a data processing apparatus according to an embodiment of the present application is provided, where the apparatus is applied to a storage server, and the storage server includes a first storage medium including an SSD and a second storage medium including an HDD, and the apparatus includes:
A receiving unit 40 for receiving a target data write request;
a first judging unit 41 for judging whether or not there is first data associated with the target data in the first storage medium;
wherein if the first judging unit 41 determines that the first data associated with the target data exists in the first storage medium, the target data and the first data are combined to obtain second data;
A second judging unit 42, configured to judge whether a data length of the second data meets a preset requirement;
wherein if the second judging unit 42 determines that the data length of the second data meets the preset requirement, the second data is added to a first flush queue, the first flush queue being a high-priority flush queue;
and if the second judging unit 42 determines that the data length of the second data does not meet the preset requirement, the second data is added to a second flush queue, the second flush queue being a low-priority flush queue.
Optionally, when determining whether the first data associated with the target data exists in the first storage medium, the first judging unit 41 is specifically configured to:
determine whether the data space written by the target data overlaps or is contiguous with the data space of cached data, and if so, determine that the cached data is the first data associated with the target data.
Optionally, if the data length of the second data is greater than or equal to a set threshold, it is determined that the data length of the second data meets the preset requirement.
Optionally, when adding the second data to the first flush queue, the second judging unit 42 is specifically configured to:
add the second data to the tail of the first flush queue;
and when adding the second data to the second flush queue, the second judging unit 42 is specifically configured to:
add the second data to the tail of the second flush queue.
Optionally, the apparatus further includes:
a flushing unit, configured to determine, when a flush instruction is received, whether to-be-flushed data is cached in the first flush queue; if to-be-flushed data is cached in the first flush queue, write some or all of it, taken from the head of the first flush queue, to the second storage medium; if no to-be-flushed data is stored in the first flush queue, determine whether to-be-flushed data is cached in the second flush queue; and if to-be-flushed data is cached in the second flush queue, write some or all of it, taken from the head of the second flush queue, to the second storage medium.
The above units may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when a unit is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU), or another processor that can invoke the program code. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Further, in the data processing apparatus provided by the embodiment of the present application, from a hardware level, a schematic hardware architecture of the data processing apparatus may be shown in fig. 5, where the data processing apparatus may include: a memory 50 and a processor 51,
The memory 50 is used to store program instructions; the processor 51 calls the program instructions stored in the memory 50 and executes the above-described method embodiments according to the obtained program instructions. The specific implementation and technical effects are similar to those of the method embodiments and are not repeated here.
Optionally, the present application also provides a storage server comprising at least one processing element (or chip) for performing the above-described method embodiments.
Alternatively, the application also provides a program product, such as a computer-readable storage medium, having stored thereon computer-executable instructions for causing a computer to perform the above-described method embodiments.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions or data. For example, a machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid-state disk, any type of storage disc (e.g., an optical disc or DVD), a similar storage medium, or a combination thereof.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Moreover, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description covers only preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within its scope of protection.

Claims (8)

1. A data processing method, applied to a storage server comprising a first storage medium composed of an SSD and a second storage medium composed of an HDD, the method comprising:
receiving a target data write request, and judging whether first data associated with the target data exists in the first storage medium;
if first data associated with the target data exists in the first storage medium, merging the target data and the first data to obtain second data;
judging whether the data length of the second data meets a preset requirement;
if the data length of the second data meets the preset requirement, adding the second data to a first flush queue, wherein the first flush queue is a high-priority flush queue;
if the data length of the second data does not meet the preset requirement, adding the second data to a second flush queue, wherein the second flush queue is a low-priority flush queue;
wherein the step of judging whether first data associated with the target data exists in the first storage medium comprises:
judging whether the data space written by the target data overlaps with or is contiguous to the data space of cached data, and if so, determining that the cached data is the first data associated with the target data.
2. The method of claim 1, wherein the data length of the second data is determined to meet the preset requirement if it is greater than or equal to a set threshold.
3. The method of claim 1, wherein the step of adding the second data to the first flush queue comprises:
adding the second data to the tail of the first flush queue;
and the step of adding the second data to the second flush queue comprises:
adding the second data to the tail of the second flush queue.
4. The method of claim 3, further comprising:
when a flush instruction is received, judging whether data to be flushed is cached in the first flush queue;
if data to be flushed is cached in the first flush queue, storing part or all of the data to be flushed into the second storage medium, starting from the head of the first flush queue;
if no data to be flushed is cached in the first flush queue, judging whether data to be flushed is cached in the second flush queue;
and if data to be flushed is cached in the second flush queue, storing part or all of the data to be flushed into the second storage medium, starting from the head of the second flush queue.
5. A data processing apparatus, applied to a storage server comprising a first storage medium composed of an SSD and a second storage medium composed of an HDD, the apparatus comprising:
a receiving unit, configured to receive a target data write request;
a first judging unit, configured to judge whether first data associated with the target data exists in the first storage medium, wherein if the first judging unit judges that first data associated with the target data exists in the first storage medium, the target data and the first data are merged to obtain second data;
a second judging unit, configured to judge whether the data length of the second data meets a preset requirement;
wherein if the second judging unit judges that the data length of the second data meets the preset requirement, the second data is added to a first flush queue, the first flush queue being a high-priority flush queue;
if the second judging unit judges that the data length of the second data does not meet the preset requirement, the second data is added to a second flush queue, the second flush queue being a low-priority flush queue;
and when judging whether first data associated with the target data exists in the first storage medium, the first judging unit is specifically configured to:
judge whether the data space written by the target data overlaps with or is contiguous to the data space of cached data, and if so, determine that the cached data is the first data associated with the target data.
6. The apparatus of claim 5, wherein the data length of the second data is determined to meet the preset requirement if it is greater than or equal to a set threshold.
7. The apparatus of claim 5, wherein, when adding the second data to the first flush queue, the second judging unit is specifically configured to:
add the second data to the tail of the first flush queue;
and when adding the second data to the second flush queue, the second judging unit is specifically configured to:
add the second data to the tail of the second flush queue.
8. The apparatus of claim 7, further comprising:
a flushing unit, configured to: when a flush instruction is received, judge whether data to be flushed is cached in the first flush queue; if data to be flushed is cached in the first flush queue, store part or all of the data to be flushed into the second storage medium, starting from the head of the first flush queue; if no data to be flushed is cached in the first flush queue, judge whether data to be flushed is cached in the second flush queue; and if data to be flushed is cached in the second flush queue, store part or all of the data to be flushed into the second storage medium, starting from the head of the second flush queue.
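The claimed write path and flush order can be sketched in a few dozen lines. This is only an illustrative reading of the claims, not the patented implementation: all names (`Extent`, `TieredWriteCache`, `write_to_hdd`) and the 64 KiB length threshold are assumptions introduced here for the example.

```python
from collections import deque

LENGTH_THRESHOLD = 64 * 1024  # assumed "preset requirement": 64 KiB

class Extent:
    """A buffered write covering the byte range [offset, offset + len(data))."""
    def __init__(self, offset, data):
        self.offset = offset
        self.data = bytearray(data)

    @property
    def end(self):
        return self.offset + len(self.data)

    def related_to(self, other):
        # Overlapping or contiguous address ranges count as "associated".
        return self.offset <= other.end and other.offset <= self.end

    def merge(self, newer):
        # Combine two associated extents; the newer write wins on overlap.
        start = min(self.offset, newer.offset)
        buf = bytearray(max(self.end, newer.end) - start)
        buf[self.offset - start:self.end - start] = self.data
        buf[newer.offset - start:newer.end - start] = newer.data
        return Extent(start, buf)

class TieredWriteCache:
    def __init__(self):
        self.high_queue = deque()  # first (high-priority) flush queue
        self.low_queue = deque()   # second (low-priority) flush queue

    def write(self, offset, data):
        target = Extent(offset, data)
        # Look in both queues for cached "first data" associated with the write.
        for q in (self.high_queue, self.low_queue):
            for cached in q:
                if cached.related_to(target):
                    q.remove(cached)
                    target = cached.merge(target)  # the merged "second data"
                    break
            else:
                continue
            break
        # Enqueue by merged length: long runs are flushed first, which
        # favours sequential writes on the HDD tier.
        if len(target.data) >= LENGTH_THRESHOLD:
            self.high_queue.append(target)  # tail of high-priority queue
        else:
            self.low_queue.append(target)   # tail of low-priority queue

    def flush(self, write_to_hdd, batch=1):
        """On a flush instruction, drain the high-priority queue from its
        head first; only when it is empty take from the low-priority queue."""
        flushed = 0
        while flushed < batch:
            if self.high_queue:
                extent = self.high_queue.popleft()
            elif self.low_queue:
                extent = self.low_queue.popleft()
            else:
                break
            write_to_hdd(extent)
            flushed += 1
        return flushed
```

In this reading, a small write that lands adjacent to an already-cached extent is merged with it, and the merged extent is re-queued according to its new length, so accumulating contiguous small writes eventually promotes them to the high-priority queue.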
CN202111140712.XA 2021-09-28 2021-09-28 Data processing method and device Active CN113986118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111140712.XA CN113986118B (en) 2021-09-28 2021-09-28 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111140712.XA CN113986118B (en) 2021-09-28 2021-09-28 Data processing method and device

Publications (2)

Publication Number Publication Date
CN113986118A (en) 2022-01-28
CN113986118B true CN113986118B (en) 2024-06-07

Family

ID=79736985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111140712.XA Active CN113986118B (en) 2021-09-28 2021-09-28 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113986118B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119322590B (en) * 2024-10-12 2025-08-26 无锡众星微系统技术有限公司 Data storage method, device, computer equipment, storage medium and program product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461936A (en) * 2014-11-28 2015-03-25 华为技术有限公司 Cached data disk brushing method and device
CN105095112A (en) * 2015-07-20 2015-11-25 华为技术有限公司 Method and device for controlling caches to write and readable storage medium of non-volatile computer
CN106293500A (en) * 2015-06-23 2017-01-04 中兴通讯股份有限公司 A kind of write operation control method, Apparatus and system
CN111090398A (en) * 2019-12-13 2020-05-01 北京浪潮数据技术有限公司 Garbage recycling method, device and equipment for solid state disk and readable storage medium
CN111949392A (en) * 2020-08-27 2020-11-17 苏州浪潮智能科技有限公司 A cache task queue scheduling method, system, terminal and storage medium
CN112306904A (en) * 2020-11-20 2021-02-02 新华三大数据技术有限公司 Cache data disk refreshing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7194589B2 (en) * 2003-08-22 2007-03-20 Oracle International Corporation Reducing disk IO by full-cache write-merging
JP4366298B2 (en) * 2004-12-02 2009-11-18 富士通株式会社 Storage device, control method thereof, and program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461936A (en) * 2014-11-28 2015-03-25 华为技术有限公司 Cached data disk brushing method and device
CN106293500A (en) * 2015-06-23 2017-01-04 中兴通讯股份有限公司 A kind of write operation control method, Apparatus and system
CN105095112A (en) * 2015-07-20 2015-11-25 华为技术有限公司 Method and device for controlling caches to write and readable storage medium of non-volatile computer
CN111090398A (en) * 2019-12-13 2020-05-01 北京浪潮数据技术有限公司 Garbage recycling method, device and equipment for solid state disk and readable storage medium
CN111949392A (en) * 2020-08-27 2020-11-17 苏州浪潮智能科技有限公司 A cache task queue scheduling method, system, terminal and storage medium
CN112306904A (en) * 2020-11-20 2021-02-02 新华三大数据技术有限公司 Cache data disk refreshing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Software Design of a High-Speed Data Acquisition System in the Windows Environment; Chen Jianguo, Yan Dekun; Microcomputer Information; 2006-06-15 (No. 18); full text *

Also Published As

Publication number Publication date
CN113986118A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
CN111324427B (en) Task scheduling method and device based on DSP
JP5943095B2 (en) Data migration for composite non-volatile storage
KR101372964B1 (en) Managing memory pages
US20170371807A1 (en) Cache data determining method and apparatus
JP2008512785A5 (en)
CN104281535B (en) A kind for the treatment of method and apparatus of mapping table in internal memory
CN107122130B (en) Data deduplication method and device
CN108874309B (en) Method and device for managing physical blocks in solid state disk
CN114265670B (en) Memory block sorting method, medium and computing device
US10482021B2 (en) Priority-based storage and access of compressed memory lines in memory in a processor-based system
CN116822657B (en) Method and device for accelerating model training, storage medium and electronic equipment
WO2014157244A1 (en) Storage control device, storage control method, and storage control program
CN113986118B (en) Data processing method and device
CN109343796B (en) Data processing method and device
CN112800057A (en) Fingerprint table management method and device
CN102169464B (en) Caching method and device used for non-volatile memory, and intelligent card
CN112256206B (en) IO processing method and device
CN112181315B (en) Data disk refreshing method and device
CN113194118B (en) Sequential flow identification method and device
CN107122170B (en) Large-capacity storage method and device for data array
CN102073539B (en) Queue request processing method and device
CN110825652B (en) Method, device and equipment for eliminating cache data on disk block
CN108984152B (en) Data processing method, system and computer readable storage medium
CN115454889B (en) Storage access scheduling method, system and chip
CN111143351A (en) IMSI data management method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant