
CN109960470B - Data processing method and device and leader node - Google Patents

Data processing method and device and leader node

Info

Publication number
CN109960470B
Authority
CN
China
Prior art keywords
objects
homing
parameter
group
cache pool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910244981.7A
Other languages
Chinese (zh)
Other versions
CN109960470A (en)
Inventor
杨潇
金朴堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd
Original Assignee
New H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd
Priority to CN201910244981.7A
Publication of CN109960470A
Application granted
Publication of CN109960470B

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0631Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Image Generation (AREA)

Abstract

The invention provides a data processing method, a data processing device, and a leader node. A cache pool of a distributed storage system is configured with a plurality of placement groups, each corresponding to a plurality of objects stored in the cache pool. The method comprises: if the used capacity of the cache pool reaches a preset flush capacity threshold, determining a first number of objects to be flushed in the cache pool; determining, for each placement group, a second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group, the second number being positively correlated with the number of objects corresponding to the placement group; and selecting the second number of objects from the objects corresponding to each placement group and performing a flush operation on them. The invention can reduce the flush frequency, reduce the impact on data read-write performance, prevent objects from occupying cache pool space for long periods, and improve the utilization of cache pool space.

Description

Data processing method and device and leader node
Technical Field
The invention relates to the technical field of data storage, and in particular to a data processing method, a data processing apparatus, and a leader node.
Background
Ceph is a distributed storage system that provides a software-defined, unified storage solution covering block storage, object storage, and file storage. It features excellent performance, high reliability, and high scalability, and can externally provide massive, undifferentiated, unified distributed storage services.
Ceph adopts a tiered storage architecture. Expensive hard disks with high storage speed, such as Solid State Drives (SSDs), are deployed at the front end to form a cache pool; cheaper hard disks with relatively slow storage speed, such as Hard Disk Drives (HDDs), are deployed at the back end to form a storage pool.
User data is first stored in the cache pool. When the used capacity of the cache pool reaches the flush capacity threshold, dirty data in the cache pool (data not yet written to the back-end storage pool) is written to the back-end storage pool; this process is called flushing. Clean data remaining in the cache pool after a flush (dirty data becomes clean data once written to the back-end storage pool) can be deleted to release cache pool storage space; this process is called eviction.
Currently, flushing in Ceph is implemented mainly on the basis of PGs (Placement Groups).
A PG is a logical unit in Ceph. When user data arrives at the distributed storage system, it is first divided into a number of objects (Objects), and the objects are mapped to PGs; one PG may correspond to multiple objects. Each PG is in turn mapped to an actual storage unit, an OSD (Object Storage Device). Thus, after a flush, the objects corresponding to the user data are written to OSDs.
The cache pool of Ceph may be preconfigured with a certain number of PGs. Before flushing, the flush capacity threshold of the cache pool divided by the number of PGs is taken as the flush capacity threshold of each PG. When the capacity of the objects corresponding to a PG reaches that PG's flush capacity threshold, a flush is performed.
In actual operation, it is difficult to keep the number of objects corresponding to each PG balanced. Some PGs may correspond to many objects for long periods, which causes frequent flushing and degrades data read-write performance. Other PGs hold few objects for long periods, so their objects cannot be flushed and evicted in time and keep occupying cache pool space.
That is, in the prior art, frequent flushing degrades data read-write performance and cache pool space is used unreasonably, as the sketch below illustrates.
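For contrast with the method of this patent, here is a minimal sketch of the prior-art per-PG trigger described above, assuming the pool threshold is split evenly across PGs; the function name and units are illustrative, not from any Ceph API.

```python
def prior_art_pg_should_flush(pg_used_capacity_m, pool_threshold_m, num_pgs):
    """Prior art: each PG flushes independently once the capacity of its
    objects reaches its share, pool_threshold / num_pgs, regardless of
    how the other PGs or the pool as a whole are loaded."""
    return pg_used_capacity_m >= pool_threshold_m / num_pgs

# With a 4000M pool threshold and 5 PGs, each PG's share is 800M: a PG
# holding 900M keeps triggering flushes while one holding 100M never
# flushes, so its objects sit in the cache pool indefinitely.
assert prior_art_pg_should_flush(900, 4000, 5)
assert not prior_art_pg_should_flush(100, 4000, 5)
```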
Disclosure of Invention
The invention provides a data processing method, a data processing apparatus, and a leader node, aiming to solve the problems that existing flush operations degrade data read-write performance and that cache pool space is used unreasonably, so as to improve data read-write performance and cache pool space utilization.
To this end, the invention provides the following technical solutions:
In a first aspect, the invention provides a data processing method applied to a leader node in a distributed storage system, where a cache pool of the distributed storage system is configured with a plurality of placement groups and each placement group corresponds to a plurality of objects stored in the cache pool. The method includes:
if the used capacity of the cache pool reaches a preset flush capacity threshold, determining a first number of objects to be flushed in the cache pool;
determining, for each placement group, a second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group, where the second number is positively correlated with the number of objects corresponding to the placement group;
and selecting the second number of objects from the objects corresponding to each placement group and performing a flush operation.
Optionally, the determining the first number of objects to be flushed in the cache pool includes:
obtaining a preset first number;
or,
obtaining a preset to-be-flushed capacity and an object size;
and determining the first number of objects to be flushed in the cache pool according to the to-be-flushed capacity and the object size.
Optionally, the determining, for each placement group, the second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group includes:
performing the following operations for each placement group:
determining the percentage of the number of objects corresponding to the placement group in the total number of objects in the cache pool as a first parameter of the placement group;
determining a second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter.
Optionally, the determining the second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group includes:
obtaining the sum of the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and taking the quotient of the sum and the number of objects corresponding to the placement group as the second parameter of the placement group.
Optionally, the determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter includes:
obtaining the product of the first number, the first parameter, and the second parameter;
and taking the minimum of the product and the number of objects corresponding to the placement group as the second number of objects to be flushed corresponding to the placement group.
In a second aspect, the invention provides a data processing apparatus applied to a leader node in a distributed storage system, where a cache pool of the distributed storage system is configured with a plurality of placement groups and each placement group corresponds to a plurality of objects stored in the cache pool. The apparatus includes:
a first number determining unit, configured to determine a first number of objects to be flushed in the cache pool if the used capacity of the cache pool reaches a preset flush capacity threshold;
a second number determining unit, configured to determine, for each placement group, a second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group, where the second number is positively correlated with the number of objects corresponding to the placement group;
and an object screening unit, configured to select the second number of objects from the objects corresponding to each placement group and perform a flush operation.
Optionally, the first number determining unit determining the first number of objects to be flushed in the cache pool includes:
obtaining a preset first number;
or,
obtaining a preset to-be-flushed capacity and an object size;
and determining the first number of objects to be flushed in the cache pool according to the to-be-flushed capacity and the object size.
Optionally, the second number determining unit determining, for each placement group, the second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group includes:
performing the following operations for each placement group:
determining the percentage of the number of objects corresponding to the placement group in the total number of objects in the cache pool as a first parameter of the placement group;
determining a second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter.
Optionally, the second number determining unit determining the second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group includes:
obtaining the sum of the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and taking the quotient of the sum and the number of objects corresponding to the placement group as the second parameter of the placement group.
Optionally, the second number determining unit determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter includes:
obtaining the product of the first number, the first parameter, and the second parameter;
and taking the minimum of the product and the number of objects corresponding to the placement group as the second number of objects to be flushed corresponding to the placement group.
In a third aspect, the invention provides a leader node comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement the flush method described above.
In a fourth aspect, the invention provides a machine-readable storage medium having stored therein machine-executable instructions that, when executed by a processor, implement the flush method described above.
It can be seen from the above description that in the present invention the leader node determines whether a flush is needed according to the usage of the cache pool as a whole, which reduces the flush frequency and thereby the impact on data read-write performance. Further, a certain number of objects are flushed for each placement group, which prevents objects from occupying cache pool space for long periods and improves the utilization of cache pool space.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flow chart illustrating a method of data processing according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the implementation of step 101 according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an implementation of step 102 according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of a leader node according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the embodiments of the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used to describe various information in embodiments of the present invention, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the embodiments of the present invention, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
The embodiment of the invention provides a data processing method. In this method, the leader node determines whether a flush is needed according to the usage of the whole cache pool, which reduces the flush frequency and thereby the impact on data read-write performance. Further, the embodiment of the invention flushes a certain number of objects for each placement group, which prevents objects from occupying cache pool space for long periods and improves the utilization of cache pool space.
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings and specific implementations:
Referring to Fig. 1, a flowchart of a data processing method according to an embodiment of the present invention is shown. The flow applies to a leader node (Leader) in a distributed storage system (e.g., Ceph).
A distributed storage system employing tiered storage typically includes a cache pool at the front end and a storage pool at the back end. User data is first stored in the cache pool and, after a flush operation is performed, written from the cache pool to the back-end storage pool.
As shown in Fig. 1, the data processing flow may include the following steps:
Step 101: if the used capacity of the cache pool reaches a preset flush capacity threshold, determine a first number of objects to be flushed in the cache pool.
The leader node may monitor the usage of the cache pool. If the monitored used capacity of the cache pool reaches the preset flush capacity threshold, the cache pool needs to perform a flush operation to release cache pool space.
In a distributed storage system, user data is stored in the form of objects (Objects). Therefore, this step determines the number of objects in the cache pool on which a flush operation needs to be performed (objects to be flushed for short).
The process of determining the first number of objects to be flushed in the cache pool is described later and is not detailed here.
"First number" is used here only for ease of distinction and is not intended to be limiting.
Step 102: for each placement group, determine a second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group.
In a distributed storage system, objects stored in the cache pool are managed in units of placement groups (PGs). Each cache pool is preconfigured with a certain number of placement groups. When user data is written into the cache pool in the form of objects, the leader node establishes a mapping between objects and placement groups: each object corresponds to one placement group, but one placement group may correspond to multiple objects.
After the first number of objects to be flushed in the cache pool is determined in step 101, this first number needs to be split across the placement groups; that is, the second number of objects to be flushed corresponding to each placement group is determined. The second number is positively correlated with the number of objects corresponding to the placement group: the more objects a placement group corresponds to, the more objects to be flushed it is assigned.
"Second number" is likewise used only for distinction and is not intended to be limiting.
Optionally, the sum of the second numbers over all placement groups is not less than the first number, so that the flush requirement of the whole cache pool is met.
The process of determining the second number of objects to be flushed for a placement group is described later and is not detailed here.
Step 103: select the second number of objects from the objects corresponding to each placement group and perform a flush operation.
In the actual selection, objects that have not been accessed recently (cold objects) may be chosen. A flush operation is performed on the selected objects; that is, dirty data in the selected objects is written to the back-end storage pool.
Dirty data becomes clean data after being flushed. When the used capacity of the cache pool reaches a preset eviction capacity threshold, a certain amount of clean data is selected and an eviction operation is performed to release cache pool space.
This completes the flow shown in Fig. 1.
As can be seen from the flow shown in Fig. 1, in the embodiment of the present invention the leader node determines whether to flush according to the usage of the cache pool, which reduces the flush frequency and thereby the impact on data read-write performance. Further, a certain number of objects are flushed for each placement group, which prevents objects from occupying cache pool space for long periods and improves the utilization of cache pool space.
The process of determining the first number of objects to be flushed in the cache pool in step 101 is described below.
In one embodiment, the leader node may directly obtain a preset first number.
In another embodiment, referring to Fig. 2, an implementation flow of step 101 according to an embodiment of the present invention is shown.
As shown in Fig. 2, the flow may include the following steps:
Step 201: obtain a preset to-be-flushed capacity and the object size.
In the embodiment of the present invention, the to-be-flushed capacity, i.e., how much capacity needs to be released when the used capacity of the cache pool reaches the flush capacity threshold (for example, 1000M), may be preset to ensure the processing performance of the distributed storage system.
In addition, when the distributed storage system splits user data, it splits it into objects of uniform size. That is, all objects in the distributed storage system have the same size, e.g., 4M.
Step 202: determine the first number of objects to be flushed in the cache pool according to the obtained to-be-flushed capacity and object size.
Specifically, the quotient of the to-be-flushed capacity and the object size is taken as the first number.
For example, if the to-be-flushed capacity is 1000M and the object size is 4M, the number of objects in the cache pool on which a flush operation needs to be performed is 1000/4 = 250.
This completes the flow shown in Fig. 2.
Through the flow shown in Fig. 2, the total number of objects in the cache pool on which a flush operation needs to be performed can be determined, as the following sketch illustrates.
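A minimal sketch of steps 201-202, assuming capacities are given in the same unit (M) as the text; the function name and parameters are illustrative, not from the patent:

```python
def first_number_of_objects(to_be_flushed_capacity_m: int, object_size_m: int) -> int:
    """Steps 201-202: the first number is the quotient of the preset
    to-be-flushed capacity and the uniform object size."""
    if object_size_m <= 0:
        raise ValueError("object size must be positive")
    return to_be_flushed_capacity_m // object_size_m

# Example from the text: releasing 1000M with 4M objects means
# 1000 / 4 = 250 objects must be flushed.
assert first_number_of_objects(1000, 4) == 250
```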
The process of determining the second number of objects to be flushed for each placement group in step 102 is described below. The flow shown in Fig. 3 is performed for each placement group.
As shown in Fig. 3, the flow may include the following steps:
Step 301: determine the percentage of the number of objects corresponding to a placement group in the total number of objects in the cache pool as the first parameter of that placement group.
"First parameter" is named only for convenience of distinction and is not intended to be limiting.
For example, if the number of objects corresponding to the current placement group is 100 and the total number of objects in the cache pool is 1000, the first parameter of the placement group is 100/1000 = 10%.
Step 302: determine the second parameter of the placement group based on the number of cold objects corresponding to it and the number of objects corresponding to it.
Cold objects are objects with a relatively low access frequency, usually determined by the number of accesses per unit time.
In one embodiment, the sum of the number of cold objects corresponding to the placement group and the number of objects corresponding to it may be obtained, and the quotient of that sum and the number of objects corresponding to the placement group is taken as the second parameter of the placement group. "Second parameter" is likewise named only for distinction and is not intended to be limiting.
For example, if the number of cold objects corresponding to a placement group is 20 and the number of objects corresponding to it is 100, the second parameter of the placement group is (20+100)/100 = 1.2.
This step measures the proportion of cold objects in the placement group.
Step 303: determine the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter.
In one embodiment, the product of the first number, the first parameter, and the second parameter may be obtained, and the minimum of the product and the number of objects corresponding to the placement group is taken as the second number of objects to be flushed for the placement group.
For example, if the first number of objects to be flushed in the cache pool is 250, the number of objects corresponding to the current placement group is 100, the first parameter is 10%, and the second parameter is 1.2, then the number of objects to be flushed for this placement group is min{250 × 10% × 1.2, 100} = 30. That is, 30 of the objects corresponding to the current placement group need to be flushed to release cache pool space.
It can be seen that in the embodiment of the present invention, placement groups with more objects and more cold objects release more cache pool space, so the use of cache pool space is more reasonable.
This completes the flow shown in Fig. 3.
Through the flow shown in Fig. 3, the second number of objects to be flushed for each placement group is determined reasonably, making the use of the cache pool more optimal.
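To make the Fig. 3 computation concrete, here is a minimal Python sketch under stated assumptions: per-PG object counts and cold-object counts are already known, and because the patent does not fix a rounding rule while the worked example later in the text rounds 57.5 up to 58, ceiling division is used. All names are illustrative, not from the patent.

```python
def second_numbers(first_number, object_counts, cold_counts):
    """Sketch of the Fig. 3 flow for every placement group.

    object_counts: {pg: number of objects corresponding to the PG}
    cold_counts:   {pg: number of cold objects corresponding to the PG}
    Returns {pg: second number of objects to be flushed for the PG}.
    """
    total = sum(object_counts.values())  # total objects in the cache pool
    quotas = {}
    for pg, count in object_counts.items():
        # Step 301: first parameter W = count / total.
        # Step 302: second parameter V = (cold + count) / count.
        # Step 303: min(N * W * V, count). N*W*V simplifies to
        # N * (cold + count) / total; integer ceiling division
        # (-(-a // b)) avoids floating-point rounding surprises.
        product = -(-first_number * (cold_counts[pg] + count) // total)
        quotas[pg] = min(product, count)
    return quotas

# Reproduces the worked example below (Tables 1 and 2) with N = 250:
counts = {"PG1": 100, "PG2": 200, "PG3": 300, "PG4": 200, "PG5": 200}
cold = {"PG1": 20, "PG2": 30, "PG3": 60, "PG4": 10, "PG5": 40}
assert second_numbers(250, counts, cold) == {
    "PG1": 30, "PG2": 58, "PG3": 90, "PG4": 53, "PG5": 60,
}
```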
The method provided by the present invention is described below through a specific example.
Take a cache pool in Ceph as an example and denote it Pool.
The preset to-be-flushed capacity of Pool is 1000M. Pool is preconfigured with 5 placement groups (PGs), denoted PG1–PG5. The mapping between each placement group (PG) and the objects (Object) currently in Pool is shown in Table 1.
Placement group Object
PG1 Object1~Object100
PG2 Object101~Object300
PG3 Object301~Object600
PG4 Object601~Object800
PG5 Object801~Object1000
TABLE 1
As can be seen from Table 1, the total number of objects currently in Pool is 1000. PG1 corresponds to 100 objects, PG2 to 200, PG3 to 300, PG4 to 200, and PG5 to 200. The size of each Object is 4M.
If the leader node determines that the used capacity of Pool (1000 × 4 = 4000M) reaches the flush capacity threshold, the number of objects to be flushed in Pool, denoted N, is determined from the preset to-be-flushed capacity (1000M): N = 1000/4 = 250. That is, 250 objects in Pool need to be flushed.
The leader node obtains the number of objects corresponding to PG1 (100) and the total number of objects in Pool (1000), and determines that the first parameter of PG1 is W1 = 100/1000 = 10%. Similarly, the first parameter of PG2 is W2 = 200/1000 = 20%; of PG3, W3 = 300/1000 = 30%; of PG4, W4 = 200/1000 = 20%; and of PG5, W5 = 200/1000 = 20%.
Table 2 shows the number of cold objects among the objects corresponding to each PG in this example.
Placement group Number of cold objects
PG1 20
PG2 30
PG3 60
PG4 10
PG5 40
TABLE 2
From Table 2, the leader node can compute the second parameter of each PG. Taking PG1 as an example, given the number of cold objects corresponding to PG1 (20) and the number of objects corresponding to PG1 (100), the second parameter of PG1 is V1 = (20+100)/100 = 1.2. Similarly, the second parameter of PG2 is V2 = (30+200)/200 = 1.15; of PG3, V3 = (60+300)/300 = 1.2; of PG4, V4 = (10+200)/200 = 1.05; and of PG5, V5 = (40+200)/200 = 1.2.
The number of objects to be flushed for each PG is then determined from the number N of objects to be flushed in Pool and the first parameter W and second parameter V of each PG. Taking PG1 as an example: with N = 250, W1 = 10%, and V1 = 1.2, we compute N × W1 × V1 = 250 × 10% × 1.2 = 30, which is smaller than the number of objects corresponding to PG1 (100), so the number of objects to be flushed for PG1 is determined to be N1 = 30.
Similarly, with W2 = 20% and V2 = 1.15, N × W2 × V2 = 250 × 20% × 1.15 ≈ 58, smaller than the number of objects corresponding to PG2 (200), so N2 = 58.
With W3 = 30% and V3 = 1.2, N × W3 × V3 = 250 × 30% × 1.2 = 90, smaller than the number of objects corresponding to PG3 (300), so N3 = 90.
With W4 = 20% and V4 = 1.05, N × W4 × V4 = 250 × 20% × 1.05 ≈ 53, smaller than the number of objects corresponding to PG4 (200), so N4 = 53.
With W5 = 20% and V5 = 1.2, N × W5 × V5 = 250 × 20% × 1.2 = 60, smaller than the number of objects corresponding to PG5 (200), so N5 = 60.
In one embodiment, each PG in the cache pool may maintain its corresponding objects as an Object queue. That is, each PG corresponds to one Object queue, which records the identifiers of the objects mapped to that PG. When an object is accessed, its identifier is moved to the end of the corresponding Object queue. The object identifiers in the queue are thus ordered by how recently the objects were accessed: an identifier at the front of the queue indicates that the corresponding object has not been accessed recently or is accessed infrequently.
After the number of objects to be flushed for a PG is determined, the leader node queries the Object queue corresponding to the PG, obtains that many identifiers from the front of the queue, and performs a flush operation on the corresponding objects.
Taking PG1 as an example, the Object queue of PG1 (denoted Q1) records the identifiers of Object1–Object100, arranged front to back as Object1–Object100. When the number of objects to be flushed for PG1 is determined to be N1 = 30, Q1 is queried, the identifiers of the 30 objects at the front of Q1 are obtained, and the corresponding objects (Object1–Object30) are located by these identifiers and flushed.
The flush processing of the other PGs is the same and is not repeated here; a sketch of the queue mechanism follows.
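Below is a minimal sketch of the Object-queue mechanism just described, assuming a simple double-ended queue whose front holds the coldest identifiers; the class and method names are illustrative, not Ceph APIs.

```python
from collections import deque

class PgObjectQueue:
    """Hypothetical per-PG Object queue: accessed identifiers move to
    the back, so the front holds objects not accessed recently."""

    def __init__(self, object_ids):
        self.queue = deque(object_ids)

    def on_access(self, object_id):
        # an accessed object's identifier is moved to the end of the queue
        self.queue.remove(object_id)
        self.queue.append(object_id)

    def flush_candidates(self, n):
        # the n identifiers at the front of the queue are the coldest
        return [self.queue[i] for i in range(min(n, len(self.queue)))]

# Q1 records Object1..Object100 front to back; with N1 = 30, the
# leader node selects Object1..Object30 for flushing.
q1 = PgObjectQueue(f"Object{i}" for i in range(1, 101))
candidates = q1.flush_candidates(30)
assert candidates[0] == "Object1" and candidates[-1] == "Object30"
```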
As can be seen from the above, the total number of objects on which a flush operation is performed across the PGs in Pool is N1+N2+N3+N4+N5 = 30+58+90+53+60 = 291, which is greater than the number of objects to be flushed in Pool (250) and therefore satisfies the flush requirement.
As can also be seen, a certain number of the objects corresponding to each PG are flushed, and cold objects (those at the front of the queue) are flushed preferentially, so the use of the cache pool is more reasonable.
An object becomes a clean object after being flushed, and an eviction operation must then be performed to release cache pool space.
Specifically, the leader node pushes the identifier of each flushed clean object onto an object stack, with one object stack per PG. When the used capacity of Pool reaches the eviction capacity threshold, a certain number of clean-object identifiers are taken from the bottom of each PG's object stack (the identifier of the object flushed earliest is at the bottom), and the clean objects corresponding to these identifiers are deleted from Pool to release cache space.
In one embodiment, the eviction capacity threshold may be equal to the flush capacity threshold, and the number of objects to be flushed for a PG may be equal to the number of objects to be evicted for that PG; that is, all the flushed objects are eventually evicted, as the sketch below shows.
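A matching sketch of the eviction side under the same assumptions: flushed clean-object identifiers are pushed onto a per-PG stack, and eviction removes identifiers from the bottom, earliest flushed first. A plain list stands in for the stack; all names are illustrative.

```python
class PgCleanStack:
    """Hypothetical per-PG stack of clean-object identifiers:
    push on flush, evict from the bottom (earliest flushed first)."""

    def __init__(self):
        self.stack = []

    def on_flushed(self, object_id):
        self.stack.append(object_id)  # newest flushed object on top

    def evict(self, n):
        victims, self.stack = self.stack[:n], self.stack[n:]
        return victims  # the caller deletes these objects from the pool

pg1 = PgCleanStack()
for i in range(1, 31):
    pg1.on_flushed(f"Object{i}")
# Evicting as many objects as were flushed (thresholds equal) empties
# the stack, starting with the earliest-flushed Object1.
victims = pg1.evict(30)
assert victims[0] == "Object1" and not pg1.stack
```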
This completes the description of the present embodiment.
The method provided by the embodiment of the invention has been described above; the apparatus provided by the embodiment of the invention is described below.
Fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. The apparatus includes a first number determining unit 401, a second number determining unit 402, and an object screening unit 403, wherein:
the first number determining unit 401 is configured to determine a first number of objects to be flushed in the cache pool if the used capacity of the cache pool reaches a preset flush capacity threshold;
the second number determining unit 402 is configured to determine, for each placement group, a second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group, where the second number is positively correlated with the number of objects corresponding to the placement group;
and the object screening unit 403 is configured to select the second number of objects from the objects corresponding to each placement group and perform a flush operation.
In one embodiment, the first number determining unit 401 determining the first number of objects to be flushed in the cache pool includes:
obtaining a preset first number;
or,
obtaining a preset to-be-flushed capacity and an object size;
and determining the first number of objects to be flushed in the cache pool according to the to-be-flushed capacity and the object size.
In one embodiment, the second number determining unit 402 determining, for each placement group, the second number of objects to be flushed according to the first number and the number of objects corresponding to that placement group includes:
performing the following operations for each placement group:
determining the percentage of the number of objects corresponding to the placement group in the total number of objects in the cache pool as a first parameter of the placement group;
determining a second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter.
In one embodiment, the second number determining unit 402 determining the second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group includes:
obtaining the sum of the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and taking the quotient of the sum and the number of objects corresponding to the placement group as the second parameter of the placement group.
In one embodiment, the second number determining unit 402 determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter includes:
obtaining the product of the first number, the first parameter, and the second parameter;
and taking the minimum of the product and the number of objects corresponding to the placement group as the second number of objects to be flushed corresponding to the placement group.
This completes the description of the apparatus shown in Fig. 4.
In the embodiment of the invention, the leader node determines whether a flush is needed according to the usage of the cache pool, which reduces the flush frequency and thereby the impact on data read-write performance. Further, a certain number of objects are flushed for each placement group, which prevents objects from occupying cache pool space for long periods and improves the utilization of cache pool space.
The leader node provided by the embodiment of the invention is described below.
Referring to Fig. 5, a hardware structure diagram of a leader node according to an embodiment of the present invention is shown. The leader node may include a processor 501 and a machine-readable storage medium 502 storing machine-executable instructions. The processor 501 and the machine-readable storage medium 502 may communicate via a system bus 503. By reading and executing the machine-executable instructions in the machine-readable storage medium 502 corresponding to the flush logic, the processor 501 can perform the flush method described above.
The machine-readable storage medium 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium 502 may include at least one of the following storage media: volatile memory, non-volatile memory, or another type of storage medium. The volatile memory may be Random Access Memory (RAM); the non-volatile memory may be flash memory, a storage drive (e.g., a hard disk drive), a solid state disk, or a storage disc (e.g., a compact disc or DVD).
Embodiments of the invention also provide a machine-readable storage medium, such as the machine-readable storage medium 502 in Fig. 5, comprising machine-executable instructions that can be executed by the processor 501 in the leader node to implement the flush method described above.
This completes the description of the leader node shown in Fig. 5.
The above description is only of preferred embodiments of the present invention and is not intended to limit the present invention. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the embodiments of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A data processing method applied to a leader node in a distributed storage system, wherein a cache pool of the distributed storage system is configured with a plurality of placement groups, each placement group corresponding to a plurality of objects stored in the cache pool, the method comprising:
if the used capacity of the cache pool reaches a preset flush capacity threshold, determining a first number of objects to be flushed in the cache pool;
performing the following operations for each placement group: determining the percentage of the number of objects corresponding to the placement group in the total number of objects in the cache pool as a first parameter of the placement group; determining a second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group; and determining a second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter, wherein the second number is positively correlated with the number of objects corresponding to the placement group;
and selecting the second number of objects from the objects corresponding to each placement group and performing a flush operation.
2. The method of claim 1, wherein the determining the first number of objects to be flushed in the cache pool comprises:
obtaining a preset first number;
or,
obtaining a preset to-be-flushed capacity and an object size;
and determining the first number of objects to be flushed in the cache pool according to the to-be-flushed capacity and the object size.
3. The method of claim 1, wherein the determining the second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group comprises:
obtaining the sum of the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and taking the quotient of the sum and the number of objects corresponding to the placement group as the second parameter of the placement group.
4. The method according to claim 1 or 3, wherein the determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter comprises:
obtaining the product of the first number, the first parameter, and the second parameter;
and taking the minimum of the product and the number of objects corresponding to the placement group as the second number of objects to be flushed corresponding to the placement group.
5. A data processing apparatus applied to a leader node in a distributed storage system, wherein a cache pool of the distributed storage system is configured with a plurality of placement groups, each placement group corresponding to a plurality of objects stored in the cache pool, the apparatus comprising:
a first number determining unit, configured to determine a first number of objects to be flushed in the cache pool if the used capacity of the cache pool reaches a preset flush capacity threshold;
a second number determining unit, configured to perform the following operations for each placement group: determining the percentage of the number of objects corresponding to the placement group in the total number of objects in the cache pool as a first parameter of the placement group; determining a second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group; and determining a second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter, wherein the second number is positively correlated with the number of objects corresponding to the placement group;
and an object screening unit, configured to select the second number of objects from the objects corresponding to each placement group and perform a flush operation.
6. The apparatus of claim 5, wherein the first number determining unit determining the first number of objects to be flushed in the cache pool comprises:
obtaining a preset first number;
or,
obtaining a preset to-be-flushed capacity and an object size;
and determining the first number of objects to be flushed in the cache pool according to the to-be-flushed capacity and the object size.
7. The apparatus of claim 5, wherein the second number determining unit determining the second parameter of the placement group according to the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group comprises:
obtaining the sum of the number of cold objects corresponding to the placement group and the number of objects corresponding to the placement group;
and taking the quotient of the sum and the number of objects corresponding to the placement group as the second parameter of the placement group.
8. The apparatus according to claim 5 or 7, wherein the second number determining unit determining the second number of objects to be flushed corresponding to the placement group according to the first number, the first parameter, and the second parameter comprises:
obtaining the product of the first number, the first parameter, and the second parameter;
and taking the minimum of the product and the number of objects corresponding to the placement group as the second number of objects to be flushed corresponding to the placement group.
9. A leader node, comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to carry out the method steps of any one of claims 1-4.
10. A machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, carry out the method steps of any one of claims 1-4.
CN201910244981.7A 2019-03-28 2019-03-28 Data processing method and device and leader node Active CN109960470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910244981.7A CN109960470B (en) 2019-03-28 2019-03-28 Data processing method and device and leader node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910244981.7A CN109960470B (en) 2019-03-28 2019-03-28 Data processing method and device and leader node

Publications (2)

Publication Number Publication Date
CN109960470A CN109960470A (en) 2019-07-02
CN109960470B true CN109960470B (en) 2022-07-29

Family

ID=67025229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910244981.7A Active CN109960470B (en) 2019-03-28 2019-03-28 Data processing method and device and leader node

Country Status (1)

Country Link
CN (1) CN109960470B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112684972B (en) * 2019-10-17 2022-12-27 中国移动通信集团浙江有限公司 Data storage method and device based on distributed block storage
CN112181315B (en) * 2020-10-30 2022-08-30 新华三大数据技术有限公司 Data disk refreshing method and device
CN119718167A (en) * 2023-09-27 2025-03-28 浙江宇视科技有限公司 Data writing method, device and equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778255A (en) * 2014-02-25 2014-05-07 深圳市中博科创信息技术有限公司 Distributed file system and data distribution method thereof
CN107391039A (en) * 2017-07-27 2017-11-24 郑州云海信息技术有限公司 A kind of data object storage method and device
CN107704212A (en) * 2017-10-31 2018-02-16 紫光华山信息技术有限公司 A kind of data processing method and device
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108509153A (en) * 2018-03-23 2018-09-07 新华三技术有限公司 OSD selection methods, data write-in and read method, monitor and server cluster
CN109002259A (en) * 2018-06-28 2018-12-14 郑州云海信息技术有限公司 One kind putting in order hard disk distribution method, system, device and storage medium belonging to group
CN109343798A (en) * 2018-09-25 2019-02-15 郑州云海信息技术有限公司 Method, device and medium for adjusting master PG balance in distributed storage system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8548942B2 (en) * 2006-10-04 2013-10-01 Salesforce.Com, Inc. Methods and systems for recursive saving of hierarchical objects to a database
JP5302582B2 (en) * 2008-07-09 2013-10-02 株式会社日立製作所 Storage system and method for changing storage capacity related to device specified by host device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778255A (en) * 2014-02-25 2014-05-07 深圳市中博科创信息技术有限公司 Distributed file system and data distribution method thereof
CN107391039A (en) * 2017-07-27 2017-11-24 郑州云海信息技术有限公司 A kind of data object storage method and device
CN107704212A (en) * 2017-10-31 2018-02-16 紫光华山信息技术有限公司 A kind of data processing method and device
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108509153A (en) * 2018-03-23 2018-09-07 新华三技术有限公司 OSD selection methods, data write-in and read method, monitor and server cluster
CN109002259A (en) * 2018-06-28 2018-12-14 郑州云海信息技术有限公司 One kind putting in order hard disk distribution method, system, device and storage medium belonging to group
CN109343798A (en) * 2018-09-25 2019-02-15 郑州云海信息技术有限公司 Method, device and medium for adjusting master PG balance in distributed storage system

Also Published As

Publication number Publication date
CN109960470A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
CN109960470B (en) Data processing method and device and leader node
US8892812B2 (en) Flash memory device and data writing method for a flash memory
CN106547476B (en) Method and apparatus for data storage system
KR101996072B1 (en) Wear management for flash memory devices
CN111209253B (en) Performance improving method and device for distributed storage device and distributed storage device
CN109684099B (en) Message processing method and device
CN112181303B (en) Data storage method, device, computer equipment and storage medium
US20080235433A1 (en) Hybrid density memory storage device and control method thereof
CN106970765B (en) Data storage method and device
JP4910064B2 (en) Storage control device, storage device, and data movement control method
CN110347338B (en) Hybrid memory data exchange processing method, system and readable storage medium
CN104699422A (en) Determination method and determination device of cache data
CN116466879B (en) CXL memory module, memory data replacement method and computer system
CN112631953A (en) TRIM method and device for solid state disk data, electronic equipment and storage medium
CN100481024C (en) Information recording medium
CN110134645B (en) File system storage management method, file system storage reading method, file system storage management device and file system storage reading device
CN115373607A (en) Data storage method and device
CN107092443B (en) Data migration method and device
CN113778341A (en) Distributed storage method and device for remote sensing data and remote sensing data reading method
CN110688065A (en) Storage space management method, system, electronic equipment and storage medium
CN114063905B (en) Log storage method, log storage device, storage device and storage medium
CN109582235B (en) Management metadata storage method and device
WO2008136563A1 (en) Method of storing meta-data and system for storing meta-data
US9552292B2 (en) Apparatus and method for allocating virtual memory addresses to continuous physical addresses
US20030097523A1 (en) External storage device within a computer network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant