
CN109710456B - Data recovery method and device - Google Patents

Info

Publication number
CN109710456B
CN109710456B (application CN201811501810.XA)
Authority
CN
China
Prior art keywords
target
osd
ceph cluster
copies
data
Prior art date
Legal status
Active
Application number
CN201811501810.XA
Other languages
Chinese (zh)
Other versions
CN109710456A
Inventor
金朴堃
杨潇
Current Assignee
Hangzhou H3C Technologies Co Ltd
Original Assignee
Hangzhou H3C Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou H3C Technologies Co Ltd filed Critical Hangzhou H3C Technologies Co Ltd
Priority to CN201811501810.XA
Publication of CN109710456A
Application granted
Publication of CN109710456B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

The application provides a data recovery method and device. When the object storage device (OSD) topology of a Ceph cluster changes, the method includes: determining a target placement group (PG) to undergo data recovery; detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum number of copies, and detecting the load state of the Ceph cluster; and if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state, delaying recovery of the data to be recovered in the target PG. The method prevents data recovery from increasing the load on OSD devices to the point where the Ceph cluster's handling of client services is affected.

Description

Data recovery method and device
Technical Field
The present application relates to the field of storage, and in particular, to a data recovery method and apparatus.
Background
With the rapid development of cloud computing, big data, and the Internet of Things, data volumes are growing explosively, and traditional data storage technologies can no longer meet current needs; Ceph, a distributed storage system, was developed in response.
Ceph is a distributed storage technology that integrates object, block, and file storage services and offers high reliability, high automation, and high scalability.
In a Ceph cluster, when the topology of the OSDs (Object Storage Devices) changes, the data in the PGs (Placement Groups) corresponding to the affected OSDs needs to be recovered. For example, when an OSD in the Ceph cluster becomes abnormal, the data in the PGs corresponding to that OSD needs to be recovered onto normal OSDs. However, when the Ceph cluster is processing a very large number of service IOs (Input/Output requests) sent by clients, performing data recovery at the same time increases the cluster's workload; once the workload exceeds a certain limit, the Ceph cluster suspends part of the service IOs sent by clients, which seriously affects the clients' service processing.
Disclosure of Invention
In view of this, the present application provides a data recovery method and apparatus, to prevent data recovery from increasing the load on OSD devices to the point where the Ceph cluster's handling of client services is affected.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the present application, a data recovery method is provided, where the method is applied to monitors in a Ceph cluster of a distributed storage system, and when OSD topology of object storage devices of the Ceph cluster changes, the method includes:
determining a target placement group (PG) to be subjected to data recovery;
detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum number of copies, and detecting the load state of the Ceph cluster;
and if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state, delaying recovery of the data to be recovered in the target PG.
Optionally, the method further includes:
and if the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.
Optionally, detecting the busy/idle state of the Ceph cluster includes:
detecting whether the current value of a cluster load parameter, which reflects the current load state of the Ceph cluster, is greater than a first preset value;
if so, determining that the Ceph cluster is in a busy state;
if not, further detecting the current value of a node load parameter reflecting the current load state of each OSD in the Ceph cluster; if there is an OSD whose node load parameter current value is greater than a second preset value, determining that the Ceph cluster is in a busy state; and if the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determining that the Ceph cluster is in a non-busy state.
Optionally, after determining the target placement group PG to be subjected to data recovery, the method includes:
starting a preset timer;
the detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detecting the load state of the Ceph cluster, includes:
detecting whether the timer has timed out;
if the timer has timed out, detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detecting the load state of the Ceph cluster;
and the delaying recovery of the data in the target PG, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state, includes:
if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state, returning to the step of detecting whether the timer has timed out.
Optionally, recovering the data to be recovered in the target PG includes:
starting a preset second timer when recovery of the data to be recovered in the target PG begins;
when the second timer times out, detecting whether all the data to be recovered in the target PG has been recovered;
and if not, stopping recovery of the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer has timed out.
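The interplay of the two timers described above can be sketched as follows. This is a hypothetical Python illustration only: the function name, intervals, and callable-based interface are assumptions for clarity, not part of the patent.

```python
import time

def recover_with_backoff(enough_copies, cluster_busy, recover_next,
                         check_interval=0.01, recovery_window=0.05):
    """Sketch of the two-timer flow: the first timer paces re-checks, the
    second timer bounds each recovery round. All three arguments are
    callables standing in for monitor queries (hypothetical interface)."""
    while True:
        time.sleep(check_interval)                  # first timer expires
        if enough_copies() and cluster_busy():
            continue                                # delay: re-arm the first timer
        deadline = time.time() + recovery_window    # start the second timer
        while time.time() < deadline:
            if not recover_next():                  # falsy once nothing is left
                return True                         # target PG fully recovered
        # second timer expired with data remaining: loop back to the first timer
```

In this sketch, a busy cluster with enough normal copies simply re-enters the wait loop, so recovery traffic is only generated inside bounded windows while the cluster is idle.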
Optionally, the cluster load parameter includes: the ratio of the current number of service IOs of the Ceph cluster to the current total number of IOs of the Ceph cluster;
and the node load parameters include: hard disk utilization and IOPS (input/output operations per second).
Optionally, before determining the target PG to be subjected to data recovery, the method further includes:
calculating an OSD group corresponding to each PG in the Ceph cluster;
the determining of the target PG to be subjected to data recovery includes:
and for each PG, if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, determining the PG as a target PG to be subjected to data recovery.
According to a second aspect of the present application, a data recovery apparatus is provided. The apparatus is applied to a monitor in a Ceph cluster of a distributed storage system; when the OSD topology of the object storage devices of the Ceph cluster changes, the apparatus includes:
a determining unit, configured to determine a target placement group (PG) to be subjected to data recovery;
a detection unit, configured to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum number of copies, and to detect the load state of the Ceph cluster;
and a delay unit, configured to delay recovery of the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state.
Optionally, the apparatus further comprises:
and a recovery unit, configured to recover the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in a non-busy state.
Optionally, the detection unit is configured to detect whether the current value of a cluster load parameter, which reflects the current load state of the Ceph cluster, is greater than a first preset value; if so, determine that the Ceph cluster is in a busy state; if not, further detect the current value of a node load parameter reflecting the current load state of each OSD in the Ceph cluster; if there is an OSD whose node load parameter current value is greater than a second preset value, determine that the Ceph cluster is in a busy state; and if the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in a non-busy state.
Optionally, the apparatus further comprises:
a starting unit, configured to start a preset timer;
the detection unit is specifically configured to detect whether the timer has timed out, and if the timer has timed out, to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies and to detect the load state of the Ceph cluster;
and the delay unit is specifically configured to return to the step of detecting whether the timer has timed out if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state.
Optionally, the recovery unit is configured to start a preset second timer when recovery of the data to be recovered in the target PG begins; when the second timer times out, to detect whether all the data to be recovered in the target PG has been recovered; and if not, to stop recovery of the unrecovered data in the target PG, close the second timer, and return to the step of detecting whether the timer has timed out.
Optionally, the cluster load parameter includes: the ratio of the current number of service IOs of the Ceph cluster to the current total number of IOs of the Ceph cluster;
and the node load parameters include: hard disk utilization and IOPS (input/output operations per second).
Optionally, the apparatus further comprises:
a computing unit, configured to compute the OSD group corresponding to each PG in the Ceph cluster;
and the determining unit is specifically configured to, for each PG, determine the PG as a target PG to be subjected to data recovery if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
As can be seen from the above description, on one hand, when the OSD topology in the Ceph cluster changes, the present application does not immediately recover the data to be recovered in the target PG, but first determines the load state of the Ceph cluster; when the Ceph cluster is determined to be busy, recovery of the data in the target PG is delayed. This effectively prevents data recovery from increasing the load on OSD devices and thereby disrupting the Ceph cluster's handling of client services.
On the other hand, the present application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum number of copies, to ensure that even while recovery of the data in the target PG is delayed, enough normal OSD copies remain to serve read-write traffic for the target PG.
Drawings
Fig. 1 is a diagram illustrating a networking architecture of a Ceph according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating a method of data recovery in accordance with an exemplary embodiment of the present application;
FIG. 3 is a flow chart illustrating another method of data recovery according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram of a data recovery device shown in an exemplary embodiment of the present application;
fig. 5 is a hardware block diagram of a monitor according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
Referring to fig. 1, fig. 1 is a diagram illustrating a networking architecture based on Ceph storage according to an exemplary embodiment of the present application.
The networking comprises a Ceph cluster and a client.
The client, also referred to as a client of the Ceph cluster, is mainly used for interacting with the Ceph cluster, so that the Ceph cluster can process read-write services of the client.
Specifically, the Ceph cluster may receive a service IO (such as a read IO, a write IO, and the like) sent by the client, and the Ceph cluster executes a read-write service of the client based on the service IO sent by the client.
For example, the Ceph cluster receives a write IO sent by a client, and the Ceph cluster can write data carried by the write IO locally. When the Ceph cluster receives the read IO sent by the client, the Ceph cluster can read data according to the read IO and return the read data to the client.
The above Ceph cluster may include: a monitor, a plurality of OSDs. Of course, the Ceph cluster may also include other devices such as a metadata server, and the Ceph cluster is only illustrated by way of example and is not specifically limited to the devices included in the Ceph cluster.
The monitor is mainly used for managing each device in the Ceph cluster. The monitor may be a single physical device or a cluster of multiple physical devices, and is only exemplary and not limited in particular.
The OSD is mainly used for storing data, for example, data is written after a write IO request is received, data is read after a read IO request is received, and the like. The OSD is typically a hard disk on each physical server in the Ceph cluster. Here, the OSD function and the device form of the OSD are merely exemplary and are not particularly limited.
Several concepts involved in the Ceph cluster are presented below.
1)PG
A PG, also called a placement group, is the smallest unit of data recovery and change in a Ceph cluster. A PG is a logical concept: it is a logical set containing a group of data, and the data it contains is stored on the OSD group corresponding to the PG.
The OSD group corresponding to a PG comprises a plurality of OSDs. The data in the PG is replicated and stored on each OSD in that group.
For example, if the OSD group corresponding to PG1 is [1,2,3], the group includes 3 OSDs, namely OSD1, OSD2, and OSD3.
When data is written to PG1, it may be written to OSD1, and OSD1 then synchronizes the data to OSD2 and OSD3, so that each OSD holds a copy of the data in PG1.
2) OSD copy corresponding to PG
The OSDs in the OSD group corresponding to a PG are called the OSD copies corresponding to that PG, and their number is the PG's number of OSD copies.
Again taking the OSD group [1,2,3] corresponding to PG1 as an example: OSD1, OSD2, and OSD3 are all OSD copies corresponding to PG1, and the number of OSD copies of PG1 is 3.
3) Service IO and recovery IO
The IOs present in a Ceph cluster may include: service IO and recovery IO.
Service IO: IO from a client, mainly used to instruct the Ceph cluster to carry out the client's read-write services.
The service IO may include: read IO from a client, write IO from a client, etc.
For example, the Ceph cluster receives a write IO sent by a client, and the Ceph cluster can write data carried by the write IO locally. When the Ceph cluster receives the read IO sent by the client, the Ceph cluster can read data according to the read IO and return the read data to the client.
Recovery IO: when data in a PG is recovered, recovery IO is generated within the Ceph cluster, and it mainly drives the recovery of the PG's data.
4) Data recovery in PG
When the OSD topology in the Ceph cluster changes, the monitor may calculate an OSD group corresponding to each PG in the cluster, and then, for each PG, if the current OSD group corresponding to the PG is different from the calculated OSD group, determine the PG as the PG to be subjected to data recovery.
The recovery of the data in the PG to be subjected to data recovery refers to: and restoring the data in the PG to the OSD in the OSD group corresponding to the calculated PG.
The existing data recovery method is as follows: when the monitor detects that an OSD in the Ceph cluster has failed, it immediately recovers the data in the PGs corresponding to the failed OSD onto normal OSDs. During this recovery, recovery IOs carrying the data to be recovered are generated to drive the process; after a normal OSD receives a recovery IO, it writes the PG data carried in the recovery IO locally.
However, if the normal OSD is already processing a large amount of service IO from clients, processing recovery IO as well increases its workload, which can degrade its performance and seriously affect the processing of the clients' service IO.
In view of this, the present application proposes a data recovery method: after the monitor determines a target PG to be recovered, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies and the Ceph cluster is currently in a busy state, recovery of the data in the target PG to the target OSD is delayed.
On one hand, when the OSD topology in the Ceph cluster changes, the present application does not immediately recover the data in the target PG but first judges the busy/idle state of the Ceph cluster, delaying recovery while the cluster is busy; this effectively prevents data recovery from increasing the load on OSD devices and thereby disrupting the handling of client services.
On the other hand, the present application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum number of copies, to ensure that even while recovery of the data in the target PG is delayed, enough normal OSD copies remain to serve read-write traffic for the target PG.
In summary, when performing data recovery, the present application both avoids disrupting client service processing, by delaying recovery of the target PG's data while the Ceph cluster is busy, and guarantees that read-write services for the target PG remain unaffected while recovery is delayed.
Referring to fig. 2, fig. 2 is a flowchart illustrating a data recovery method according to an exemplary embodiment of the present application. The method can be applied to a monitor of a Ceph cluster, and when the OSD topology of the Ceph cluster changes, the following steps can be executed.
Step 201: determining a target PG to be subjected to data recovery;
in the Ceph cluster, when the OSD in the Ceph cluster is offline due to network abnormality, or the OSD is newly added in the Ceph cluster, or the OSD fails, the OSD topology in the Ceph cluster is changed.
When the monitor detects that the OSD topology in the Ceph cluster changes, the monitor may determine the target PG to be restored.
First, how the monitor detects whether the OSD topology in the Ceph cluster has changed is described.
In implementation, the monitor can detect whether the OSD topology in the Ceph cluster has changed by detecting whether the OSD cluster map has changed.
Specifically, the monitor stores an OSD cluster map that records the current OSD topology of the Ceph cluster. If the monitor observes that the OSD cluster map has changed, it determines that the OSD topology has changed; if the map has not changed, it determines that the topology has not changed.
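The map-comparison check can be sketched minimally as follows; the `{osd_id: state}` dictionary representation of the cluster map is an assumption for illustration only.

```python
def topology_changed(old_map, new_map):
    """Sketch of the monitor's check: any difference between the stored OSD
    cluster map and the current one is treated as an OSD topology change,
    triggering recomputation of recovery targets."""
    return old_map != new_map
```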
Next, how the monitor determines the target PG to be recovered when the OSD topology in the Ceph cluster changes is described.
In implementation, when the monitor determines that the OSD topology in the Ceph cluster has changed, it may calculate the OSD group corresponding to each PG using the CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The calculated OSD groups indicate the mapping between each PG and its OSD group after data recovery.
Then, for each PG, the monitor may obtain the calculated OSD group corresponding to the PG and the OSD group currently corresponding to the PG.
Each PG in the Ceph cluster is configured with two sets, an up set and an acting set. The up set records the OSD group currently corresponding to the PG, and the acting set records the OSD group calculated for the PG by the CRUSH algorithm.
When obtaining these, the monitor may read the PG's current OSD group from its up set, and the CRUSH-calculated OSD group from its acting set.
After the calculated OSD group corresponding to the PG and the OSD group corresponding to the current PG are obtained, the monitor can detect whether the calculated OSD group corresponding to the PG is consistent with the OSD group corresponding to the current PG.
If the calculated OSD group corresponding to the PG is not consistent with the OSD group corresponding to the current PG, determining the PG as a target PG to be subjected to data recovery;
and if the calculated OSD group corresponding to the PG is consistent with the OSD group corresponding to the current PG, determining that the PG is not the PG to be subjected to data recovery.
For example, again taking PG1, suppose OSD group [1,2,3] is recorded in the up set corresponding to PG1 and OSD group [4,2,3] in the acting set corresponding to PG1.
The monitor obtains PG1's current OSD group [1,2,3] from the up set and the CRUSH-calculated OSD group [4,2,3] from the acting set. Since the two groups are inconsistent, the monitor determines that PG1 is a target PG to be recovered.
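The per-PG comparison can be sketched as below (hypothetical Python; the dict-of-groups interface is assumed). Note that the up/acting naming follows this description; upstream Ceph documentation conventionally uses "up" for the CRUSH-calculated set and "acting" for the currently serving set.

```python
def find_target_pgs(current_groups, calculated_groups):
    """A PG whose current OSD group differs from its CRUSH-calculated OSD
    group is a target PG to be subjected to data recovery."""
    return [pg for pg in current_groups
            if current_groups[pg] != calculated_groups.get(pg)]
```

For the example above, comparing `{"pg1": [1, 2, 3]}` against `{"pg1": [4, 2, 3]}` flags PG1 as a recovery target.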
The target PG described herein may be one PG or a plurality of PGs, and the number of target PGs is not specifically limited here.
Step 202: and detecting whether the number of the current corresponding normal OSD copies of the target PG is larger than or equal to a preset minimum copy number or not and detecting the load state of the Ceph cluster.
Step 202 will be described in detail below with respect to both the triggering mechanism of step 202 and the specific implementation of step 202.
1. Step 202 trigger mechanism
After determining the target PG to be subjected to data recovery, the monitor may start a preset timer (referred to herein as the first timer).
The monitor may then detect whether the first timer has timed out.
If the first timer has not timed out, the monitor continues to wait for it to time out.
If the first timer has timed out, step 202 is triggered; that is, the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detects the busy/idle state of the Ceph cluster.
2. The specific implementation of step 202:
1) The monitor may detect whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies.
In implementation, the monitor may first determine the number of normal OSD copies corresponding to the target PG.
Specifically, the monitor may take the OSDs common to the target PG's current OSD group and its calculated OSD group as the normal OSD copies corresponding to the target PG, and then count them.
For example, still taking PG1 as the target PG, assume that PG1's current OSD group is [1,2,3] and its calculated OSD group is [4,2,3].
The OSDs common to the two groups are OSD2 and OSD3, so OSD2 and OSD3 are the normal OSD copies corresponding to PG1, and the number of normal OSD copies is 2.
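The copy-counting step above amounts to a set intersection; a minimal sketch (function name is an assumption):

```python
def normal_osd_copies(current_group, calculated_group):
    """OSDs present in both the current and the calculated OSD group are the
    PG's normal OSD copies; order within the groups does not matter."""
    return sorted(set(current_group) & set(calculated_group))
```

For PG1 above, `normal_osd_copies([1, 2, 3], [4, 2, 3])` yields `[2, 3]`, a copy count of 2.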
Then, the monitor can detect whether the number of the determined normal OSD copies corresponding to the target PG is greater than or equal to a preset minimum number of copies.
It should be noted that the minimum number of copies is set by the user according to the smallest number of OSD copies the user can tolerate.
For example, in an existing Ceph cluster, to ensure data reliability, the user configures a number of copies N for the cluster based on storage space, write latency, and similar considerations; usually N ≥ 3.
In addition to the existing number of copies N, the present application introduces a minimum number of copies M, which the user can set according to the value of N.
For example, the user may set M anywhere in the interval [1, N], i.e., 1 ≤ M ≤ N. This way of setting the minimum number of copies is merely exemplary and is not specifically limited.
The purpose of setting a minimum number of copies and checking whether the number of normal OSD copies corresponding to the target PG meets it is to guarantee that, even if recovery of the data in the target PG is delayed, enough normal OSD copies remain to serve read-write traffic for the target PG.
Specifically, suppose OSD1 has failed, the target PG's current OSD group is [1,2,3], and its calculated OSD group is [4,2,3]; the data in the target PG then needs to be recovered onto OSD4.
While that data is being recovered onto OSD4, the Ceph cluster still receives service IO from clients for the target PG; because OSD1 has failed and the recovery has not completed, this service IO must be handled by the normal OSD copies (i.e., OSD2 or OSD3).
For example, if the service IO is a read IO, then because OSD1 has failed and its data has not yet been recovered onto OSD4, the data must be read from a normal OSD copy corresponding to the target PG, such as OSD2 or OSD3.
If all OSD copies corresponding to the target PG are abnormal, or the number of normal OSD copies is smaller than the user-preset threshold, the service IO cannot be processed.
For these reasons, the present application detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies, ensuring that enough normal OSD copies remain to serve read-write traffic for the target PG even while recovery of its data is delayed.
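The core decision combining the copy check and the busy check can be written in one line; a sketch (names are assumptions):

```python
def should_delay_recovery(num_normal_copies, min_copies, cluster_busy):
    # Recovery is delayed only when both conditions hold: reads and writes
    # for the target PG can still be served (enough normal copies), and the
    # cluster is currently busy. Otherwise recovery proceeds immediately.
    return num_normal_copies >= min_copies and cluster_busy
```

With PG1's two normal copies and M = 2, a busy cluster delays recovery; the moment either condition fails, recovery starts.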
2) Detecting the Ceph cluster busy-idle state
The first method is as follows:
step 2021: the monitor may detect whether a current value of a cluster load parameter of the Ceph cluster is greater than a first preset value.
The cluster load parameter is used to reflect a load state of the whole cluster, for example, the cluster load parameter may be a ratio of the current service IO number of the Ceph cluster to the current total IO number of the Ceph cluster.
The first preset value may be set by a user according to actual conditions, and is not specifically limited herein.
In implementation, the monitor may count the number A of all current IOs of the Ceph cluster and the number B of business IOs among them.
The monitor may then calculate the ratio C = B/A, which is the current value of the cluster load parameter of the Ceph cluster.
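The computation of C can be sketched as follows (a hedged illustration; the counting of IOs inside a real monitor is more involved, and the function name is hypothetical):

```python
def cluster_load_value(total_io: int, business_io: int) -> float:
    """Current value C of the cluster load parameter: the ratio of the
    number B of business IOs to the number A of all IOs in the cluster."""
    if total_io == 0:
        return 0.0  # an idle cluster carries no load
    return business_io / total_io

# e.g. 80 business IOs out of 100 total IOs gives C = 0.8
print(cluster_load_value(100, 80))  # 0.8
```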
Step 2022: and if the current value of the cluster load parameter of the Ceph cluster is larger than a first preset value, determining that the Ceph cluster is in a busy state.
Step 2023: and if the current value of the cluster load parameter of the Ceph cluster is less than or equal to the first preset value, further detecting the node load parameter of each OSD in the Ceph cluster.
The node load parameter of each OSD is used for representing the load state of the OSD; the larger the current value of the node load parameter of the OSD is, the more IO carried by the OSD is, the more the OSD is busy.
The node load parameters may include: the hard disk usage of the OSD, and the IOPS (Input/Output Operations Per Second) of the OSD. When the node load parameter is the hard disk utilization of the OSD, the second preset value is a preset value related to the hard disk utilization, and when the node load parameter is the IOPS of the OSD, the second preset value is a preset value related to the IOPS.
Step 2024: if the current value of the node load parameter is larger than the OSD with the second preset value, the monitor can determine that the Cpeh cluster is in a busy state.
Step 2025: and if the current values of the node load parameters of all the OSD in the Ceph cluster are less than or equal to a second preset value, the monitor can determine that the Ceph cluster is in a non-busy state.
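Steps 2021 to 2025 amount to a two-stage check, which can be sketched as follows (hypothetical names; the two preset values would come from user configuration):

```python
from typing import Iterable

def is_cluster_busy(cluster_load: float,
                    osd_node_loads: Iterable[float],
                    first_preset: float,
                    second_preset: float) -> bool:
    """Mode one: first check the cluster-level load parameter (step 2021);
    only if it does not exceed the first preset value fall back to the
    per-OSD node load parameters (steps 2023-2025)."""
    if cluster_load > first_preset:
        return True  # step 2022: the whole cluster is already busy
    # steps 2024/2025: busy if any single OSD exceeds the second preset value
    return any(load > second_preset for load in osd_node_loads)

print(is_cluster_busy(0.9, [0.1, 0.2], 0.8, 0.7))  # True: cluster-level busy
print(is_cluster_busy(0.5, [0.1, 0.9], 0.8, 0.7))  # True: one busy OSD
print(is_cluster_busy(0.5, [0.1, 0.2], 0.8, 0.7))  # False: non-busy
```

Mode two below simply skips the first stage and checks only the per-OSD loads.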
The second method comprises the following steps:
the monitor detects the node load parameter of each OSD in the Ceph cluster. If there is an OSD whose current node load parameter value is greater than the second preset value, the monitor may determine that the Ceph cluster is in a busy state; if the current node load parameter values of all OSDs in the Ceph cluster are less than or equal to the second preset value, the monitor may determine that the Ceph cluster is in a non-busy state.
The first mode has the following advantages:
On one hand, the busy/idle state of the Ceph cluster is determined from both the cluster load parameter of the Ceph cluster and the node load parameters of the OSD nodes in the Ceph cluster, so the state of the Ceph cluster is reflected both at the cluster level and at the node level, giving a more comprehensive picture of its busy/idle state.
On the other hand, compared with the second mode, an existing Ceph cluster already detects and records its overall cluster load parameter, which the monitor can read directly, whereas the node load parameters must be obtained from each OSD node individually. Reading the cluster load parameter of the Ceph cluster is therefore more convenient than obtaining the node load parameters of each OSD.
Therefore, with the first mode, once the cluster load parameter of the Ceph cluster is determined to be greater than the first preset value, the Ceph cluster can be determined to be in a busy state without obtaining the node load parameters of each OSD, which greatly improves the speed of determining the busy/idle state of the Ceph cluster.
It should be noted that the present application does not specifically limit the order of "detecting whether the number of normal OSD copies corresponding to the target PG is greater than or equal to a preset minimum number of copies" and "detecting the busy/idle state of the Ceph cluster".
Step 203: and if the number of the normal OSD copies corresponding to the target PG is larger than or equal to the preset minimum number of copies and the Ceph cluster is in a busy state, delaying the recovery of the data in the target PG.
In this embodiment of the present application, if the current Ceph cluster is in a busy state, the monitor does not immediately restore the data in the target PG to the calculated OSDs corresponding to the target PG; instead, it periodically detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies and detects the busy/idle state of the Ceph cluster, and the data in the target PG is not restored until the number of normal copies corresponding to the target PG is less than the minimum number of copies, or the current Ceph cluster is in a non-busy state.
In implementation, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies and the Ceph cluster is in a busy state, the flow returns to the step of "detecting whether the first timer is overtime" in step 202: if the first timer is overtime, step 203 continues to be executed; if the first timer is not overtime, the monitor waits for it to become overtime. Recovery of the data in the target PG does not start until the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in a non-busy state.
For example, if the number of the normal OSD copies corresponding to the target PG is greater than or equal to a preset minimum number of copies and the Ceph cluster is in a busy state, checking whether a first timer is overtime.
And if the first timer is not overtime, waiting for the first timer to be overtime. If the first timer is overtime, detecting whether the number of the normal OSD copies corresponding to the target PG is larger than or equal to the preset minimum number of the copies, and detecting the state of the current Ceph cluster. And if the number of the normal OSD copies corresponding to the target PG is detected to be larger than or equal to the preset minimum number of the copies and the current Ceph cluster is in a busy state, checking whether the first timer is overtime or not.
And if the first timer is not overtime, waiting for the first timer to be overtime. If the first timer is overtime, detecting whether the number of the normal OSD copies corresponding to the target PG is larger than or equal to the preset minimum number of the copies, detecting the state of the current Ceph cluster, and repeating the steps until the number of the normal copies corresponding to the target PG is smaller than the minimum number of the copies, or detecting that the current Ceph cluster is in a non-busy state, and starting to restore the data in the target PG.
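The polling loop driven by the first timer can be sketched as below (illustrative only; `interval` stands in for the first timer's period, and the two predicates for the checks performed in step 202):

```python
import time

def delay_until_recovery_allowed(copies_at_least_min, cluster_busy,
                                 interval: float = 1.0) -> None:
    """Re-check on every expiry of the 'first timer': recovery stays
    delayed while the target PG still has at least the minimum number
    of normal copies AND the Ceph cluster is busy."""
    while copies_at_least_min() and cluster_busy():
        time.sleep(interval)  # wait for the first timer to be overtime

# Illustrative use with stub predicates: the cluster reports non-busy
# on the third check, at which point the delay loop exits.
state = {"checks": 0}
def fake_cluster_busy() -> bool:
    state["checks"] += 1
    return state["checks"] < 3

delay_until_recovery_allowed(lambda: True, fake_cluster_busy, interval=0.01)
print(state["checks"])  # 3
```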
Step 204: and if the number of the normal OSD copies corresponding to the target PG is less than the minimum number of the copies, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.
In implementation, when recovery of the data to be recovered in the target PG starts, a preset second timer is started;
and when the second timer is overtime, detecting whether all data to be recovered in the target PG are completely recovered.
And if all the data to be recovered in the target PG are recovered, ending the data recovery process.
If all the data to be recovered in the target PG are not recovered, stopping recovering the data which is not recovered in the target PG, closing the second timer, and returning to the step of detecting the first timer in the step 202.
If the first timer is overtime, detecting whether the number of the normal OSD copies corresponding to the target PG at present is larger than or equal to a preset minimum number of copies and detecting the load state of the Ceph cluster, and if the number of the normal OSD copies corresponding to the target PG is smaller than the minimum number of copies or the Ceph cluster is in a non-busy state, starting to recover unrecovered data in the target PG and starting a second timer.
And when the second timer is overtime, detecting whether all data to be recovered in the target PG are completely recovered. And if all the data to be recovered in the target PG are completely recovered, ending the process. And if the recovery of all the data to be recovered in the target PG is not finished, stopping recovering the data which is not recovered currently in the target PG, closing the second timer, and returning to the step of detecting whether the first timer is overtime until all the data to be recovered in the target PG are finished recovering.
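The interplay of the second timer with interrupted recovery can be sketched as follows (a simplified model: each loop iteration stands for one period of the second timer, and the hypothetical `recovery_allowed` predicate for the combined checks of step 202):

```python
def recover_in_slices(total_objects: int, slice_size: int,
                      recovery_allowed) -> int:
    """Recover data slice by slice; when the second timer expires with
    data still unrecovered and recovery is no longer allowed (e.g. the
    cluster has become busy again), recovery stops and resumes later."""
    recovered = 0
    while recovered < total_objects:
        if not recovery_allowed():
            break  # stop recovering, close the second timer, wait again
        recovered += min(slice_size, total_objects - recovered)
    return recovered

# With recovery always allowed, all 10 objects finish in slices of 4.
print(recover_in_slices(10, 4, lambda: True))  # 10
```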
As can be seen from the above description, on one hand, when the OSD topology in the Ceph cluster changes, the data in the target PG to be recovered is not immediately recovered; instead, the busy/idle state of the current Ceph cluster is first determined, and when the Ceph cluster is determined to be in a busy state, recovery of the data in the target PG is delayed. This effectively prevents the increased OSD device load caused by data recovery from affecting the processing of client services of the Ceph cluster.
On the other hand, the method and the device also compare the number of the normal OSD copies corresponding to the target PG with the preset minimum copy number so as to ensure that enough normal OSD copies can be used for ensuring that the read-write service aiming at the target PG is processed even if the recovery of the data in the target PG is delayed.
In a third aspect, the data recovery method provided by the present application is compatible with existing Ceph clusters, for example with the Crush algorithm of the Ceph cluster, and therefore has good compatibility.
Referring to fig. 3, fig. 3 is a flowchart illustrating another data recovery method according to an exemplary embodiment of the present application. The method can be applied to monitors in a Ceph cluster.
Step 301: and when the OSD topology in the Ceph cluster is changed, determining a target PG to be subjected to data recovery.
Step 302: a first timer is started.
Step 303: detecting whether the first timer is overtime.
If the first timer has not timed out, step 304 may be performed.
If the first timer times out, step 305 is executed.
Step 304: the monitor may wait for the first timer to time out.
Step 305: the monitor can detect whether the current number of the corresponding normal OSD copies of the target PG is larger than the preset minimum number of the copies.
If the current number of the corresponding normal OSD copies of the target PG is greater than or equal to the preset minimum number of the copies, step 306 is executed.
If the current number of the corresponding normal OSD copies of the target PG is smaller than the preset minimum number of the OSD copies, step 308 is executed.
Step 306: the monitor may detect whether a current value of a cluster load parameter of the Ceph cluster is greater than a first preset value.
And if the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value, returning to the step 303.
If the current value of the cluster load parameter of the Ceph cluster is less than or equal to the first preset value, step 307 is executed.
Step 307: the monitor can detect whether the current value of the node load parameter of each OSD in the Ceph cluster is larger than a second preset value.
If the current values of the node load parameters of each OSD in the Ceph cluster are all less than or equal to the second preset value, step 308 is executed.
If there is an OSD in the Ceph cluster whose current node load parameter value is greater than the second preset value, the process returns to step 303.
Step 308: the monitor starts recovering the data of the target PG and starts a second timer.
Step 309: after the second timer expires, the monitor may detect whether all data to be recovered in the target PG completes recovery.
If all the data to be recovered in the target PG are recovered, step 311 is executed.
If there is unrecovered data in the target PG, step 310 is executed.
Step 310: the monitor may stop recovering data that is not currently recovered in the target PG and turn off the second timer.
After step 310 is performed, step 303 is returned to.
Step 311: the monitor may end the recovery of the data in the target PG.
The embodiment of the application also provides a data recovery device corresponding to the data recovery method.
Referring to fig. 4, fig. 4 is a block diagram illustrating a data recovery apparatus according to an exemplary embodiment of the present application, which may be applied to a monitor, and may include the following elements.
A determining unit 401, configured to determine a target homing group PG to be subjected to data recovery;
a detecting unit 402, configured to detect whether the number of the current corresponding normal OSD copies of the target PG is greater than or equal to a preset minimum number of copies, and detect a load state of the Ceph cluster;
a delaying unit 403, configured to delay recovering data to be subjected to data recovery in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and a load state in the Ceph cluster is a busy state.
Optionally, the apparatus further comprises:
a recovering unit 404, configured to recover the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is smaller than the minimum number of copies or the Ceph cluster is in a non-busy state.
Optionally, the detecting unit 402 is configured to detect whether the current value of a cluster load parameter reflecting the current load state of the Ceph cluster is greater than a first preset value; if yes, determine that the Ceph cluster is in a busy state; if not, further detect the current value of the node load parameter reflecting the current load state of each OSD in the Ceph cluster; if there is an OSD whose current node load parameter value is greater than a second preset value, determine that the Ceph cluster is in a busy state; and if the current node load parameter values of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in a non-busy state.
Optionally, the apparatus further comprises:
a starting unit 405, configured to start a preset timer;
the detecting unit 402 is specifically configured to detect whether the timer is overtime; if the timer is overtime, detecting whether the number of the current corresponding normal OSD copies of the target PG is larger than or equal to a preset minimum copy number and detecting the load state of the Ceph cluster;
the delay unit 403 is specifically configured to return to the step of detecting whether the timer is overtime if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state.
Optionally, the recovering unit 404 is configured to start a preset second timer when recovering the data to be recovered in the target PG, specifically when starting to recover the data to be recovered in the target PG; when the second timer is overtime, detecting whether all data to be recovered in the target PG are completely recovered or not; if not, stopping recovering the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer is overtime.
Optionally, the cluster load parameter includes: the ratio of the current service IO number of the Ceph cluster to the current all IO numbers of the Ceph cluster;
the node load parameters include: hard disk utilization rate and IOPS (read/write operations per second).
Optionally, the apparatus further comprises:
a calculating unit 406, configured to calculate an OSD group corresponding to each PG in the Ceph cluster;
the determining unit 401 is specifically configured to, for each PG, determine that the PG is a target PG to be subjected to data recovery if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
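The comparison performed by the calculating unit 406 and the determining unit 401 can be sketched as follows (hypothetical data structures; in a real Ceph cluster the OSD groups would come from the Crush calculation):

```python
def find_target_pgs(current_map: dict, calculated_map: dict) -> list:
    """A PG is a target PG for data recovery when its newly calculated
    OSD group differs from the OSD group it currently corresponds to."""
    return [pg for pg, osds in calculated_map.items()
            if current_map.get(pg) != osds]

# OSD1 failed: PG "1.a" is remapped from [1, 2, 3] to [4, 2, 3] and
# becomes a target PG; PG "1.b" keeps its OSD group and is skipped.
current = {"1.a": [1, 2, 3], "1.b": [5, 6, 7]}
calculated = {"1.a": [4, 2, 3], "1.b": [5, 6, 7]}
print(find_target_pgs(current, calculated))  # ['1.a']
```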
Correspondingly, the application also provides a hardware structure diagram corresponding to the device shown in fig. 4.
The monitor described in the present application may be a physical monitor, that is, one physical server, or may be a virtual monitor virtualized by a plurality of physical servers. When the monitor is a physical monitor, the hardware configuration of the monitor may be as shown in fig. 5.
Referring to fig. 5, fig. 5 is a hardware structure diagram of a monitor according to an exemplary embodiment of the present application.
The monitor includes: a communication interface 501, a processor 502, a machine-readable storage medium 503, and a bus 504; wherein the communication interface 501, the processor 502 and the machine-readable storage medium 503 are in communication with each other via a bus 504. The processor 502 may perform the data recovery methods described above by reading and executing machine-executable instructions in the machine-readable storage medium 503 corresponding to the data recovery control logic.
The machine-readable storage medium 503 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: volatile memory, non-volatile memory, or a similar storage medium. In particular, the machine-readable storage medium 503 may be a RAM (Random Access Memory), a flash memory, a storage drive (e.g., a hard disk drive), a solid state disk, any type of storage disk (e.g., a compact disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (14)

1. A data recovery method is applied to monitors in a Ceph cluster of a distributed storage system, and when OSD topology of object storage devices of the Ceph cluster changes, the method comprises the following steps:
determining a target homing group PG to be subjected to data recovery;
detecting whether the number of the current corresponding normal OSD copies of the target PG is larger than or equal to a preset minimum copy number or not and detecting the load state of the Ceph cluster;
and if the number of the normal OSD copies corresponding to the target PG is more than or equal to the minimum number of the copies and the load state in the Ceph cluster is a busy state, delaying the recovery of the data to be subjected to data recovery in the target PG.
2. The method of claim 1, further comprising:
and if the number of the normal OSD copies corresponding to the target PG is less than the minimum number of the copies, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.
3. The method of claim 1, wherein the detecting the Ceph cluster load status comprises:
detecting whether the current value of the cluster load parameter reflecting the current load state of the Ceph cluster is greater than a first preset value;
if yes, determining that the Ceph cluster is in a busy state;
if not, further detecting the current value of the node load parameter reflecting the current load state of each OSD in the Ceph cluster; if there is an OSD whose current node load parameter value is greater than a second preset value, determining that the Ceph cluster is in a busy state; and if the current node load parameter values of all OSDs in the Ceph cluster are less than or equal to the second preset value, determining that the Ceph cluster is in a non-busy state.
4. The method according to claim 2, wherein after said determining the target homing group PG to be data restored, the method comprises:
starting a preset timer;
the detecting whether the number of the normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum number of the copies and detecting the load state of the Ceph cluster includes:
detecting whether the timer is overtime;
if the timer is overtime, detecting whether the number of the current corresponding normal OSD copies of the target PG is larger than or equal to a preset minimum copy number and detecting the load state of the Ceph cluster;
if the number of the normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of the copies and the Ceph cluster is in a busy state, delaying recovery of data in the target PG, including:
and if the number of the normal OSD copies corresponding to the target PG is more than or equal to the minimum number of the copies and the Ceph cluster is in a busy state, returning to the step of detecting whether the timer is overtime.
5. The method of claim 4, wherein the restoring the data to be restored in the target PG comprises:
starting a preset second timer when data to be recovered in the target PG starts to be recovered;
when the second timer is overtime, detecting whether all data to be recovered in the target PG are completely recovered or not;
if not, stopping recovering the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer is overtime.
6. The method of claim 3,
the cluster load parameters include: the ratio of the current service IO number of the Ceph cluster to the current all IO numbers of the Ceph cluster;
the node load parameters include: hard disk utilization rate and IOPS (read/write operations per second).
7. The method of claim 1, wherein prior to the determining the target PG to be subjected to data recovery, the method further comprises:
calculating an OSD group corresponding to each PG in the Ceph cluster;
the determining of the target PG to be subjected to data recovery includes:
and for each PG, if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, determining the PG as a target PG to be subjected to data recovery.
8. A data recovery device is applied to a monitor in a Ceph cluster of a distributed storage system, and when OSD topology of object storage devices of the Ceph cluster changes, the device comprises:
the determining unit is used for determining a target homing group PG to be subjected to data recovery;
the detection unit is used for detecting whether the number of the current corresponding normal OSD copies of the target PG is larger than or equal to a preset minimum number of copies and detecting the load state of the Ceph cluster;
and the delay unit is used for delaying the recovery of the data to be subjected to data recovery in the target PG if the number of the normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of the copies and the load state in the Ceph cluster is a busy state.
9. The apparatus of claim 8, further comprising:
and the recovery unit is used for recovering the data to be recovered in the target PG if the number of the normal OSD copies corresponding to the target PG is less than the minimum number of the copies or the Ceph cluster is in a non-busy state.
10. The apparatus according to claim 8, wherein the detecting unit is configured to detect whether the current value of a cluster load parameter reflecting the current load state of the Ceph cluster is greater than a first preset value; if yes, determine that the Ceph cluster is in a busy state; if not, further detect the current value of the node load parameter reflecting the current load state of each OSD in the Ceph cluster; if there is an OSD whose current node load parameter value is greater than a second preset value, determine that the Ceph cluster is in a busy state; and if the current node load parameter values of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in a non-busy state.
11. The apparatus of claim 9, further comprising:
the starting unit is used for starting a preset timer;
the detection unit is specifically configured to detect whether the timer is overtime; if the timer is overtime, detecting whether the number of the current corresponding normal OSD copies of the target PG is larger than or equal to a preset minimum copy number and detecting the load state of the Ceph cluster;
the delay unit is specifically configured to return to the step of detecting whether the timer is overtime if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state.
12. The apparatus according to claim 11, wherein the recovery unit, when recovering the data to be recovered in the target PG, is specifically configured to start a preset second timer when starting to recover the data to be recovered in the target PG; when the second timer is overtime, detecting whether all data to be recovered in the target PG are completely recovered or not; if not, stopping recovering the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer is overtime.
13. The apparatus of claim 10, wherein the cluster load parameter comprises: the ratio of the current service IO number of the Ceph cluster to the current all IO numbers of the Ceph cluster;
the node load parameters include: hard disk utilization rate and IOPS (read/write operations per second).
14. The apparatus of claim 8, further comprising:
the computing unit is used for computing OSD groups corresponding to the PGs in the Ceph cluster;
the determining unit is specifically configured to, for each PG, determine that the PG is a target PG to be subjected to data recovery if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
CN201811501810.XA 2018-12-10 2018-12-10 Data recovery method and device Active CN109710456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811501810.XA CN109710456B (en) 2018-12-10 2018-12-10 Data recovery method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811501810.XA CN109710456B (en) 2018-12-10 2018-12-10 Data recovery method and device

Publications (2)

Publication Number Publication Date
CN109710456A CN109710456A (en) 2019-05-03
CN109710456B true CN109710456B (en) 2021-03-23

Family

ID=66255518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811501810.XA Active CN109710456B (en) 2018-12-10 2018-12-10 Data recovery method and device

Country Status (1)

Country Link
CN (1) CN109710456B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000487B (en) * 2020-08-14 2022-07-08 浪潮电子信息产业股份有限公司 Scrub pressure adjusting method, device and medium
CN113672435A (en) * 2021-07-09 2021-11-19 济南浪潮数据技术有限公司 A data recovery method, device, equipment and storage medium
CN114237520B (en) * 2022-02-28 2022-07-08 广东睿江云计算股份有限公司 Ceph cluster data balancing method and system
CN116302673B (en) * 2023-05-26 2023-08-22 四川省华存智谷科技有限责任公司 Method for improving data recovery rate of Ceph storage system
CN117851132B (en) * 2024-03-07 2024-05-07 四川省华存智谷科技有限责任公司 Data recovery optimization method for distributed object storage

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331253A (en) * 2014-10-30 2015-02-04 浪潮电子信息产业股份有限公司 Calculation method for object migration in object storage system
CN106951559A (en) * 2017-03-31 2017-07-14 联想(北京)有限公司 Data reconstruction method and electronic equipment in distributed file system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 A kind of fault handling method and device
CN107729536A (en) * 2017-10-31 2018-02-23 新华三技术有限公司 A kind of date storage method and device
CN107817950A (en) * 2017-10-31 2018-03-20 新华三技术有限公司 A kind of data processing method and device
CN107948334A (en) * 2018-01-09 2018-04-20 无锡华云数据技术服务有限公司 Data processing method based on distributed memory system
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium
CN108509153A (en) * 2018-03-23 2018-09-07 新华三技术有限公司 OSD selection methods, data write-in and read method, monitor and server cluster
CN108958970A (en) * 2018-05-29 2018-12-07 新华三技术有限公司 A kind of data reconstruction method, server and computer-readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10003649B2 (en) * 2015-05-07 2018-06-19 Dell Products Lp Systems and methods to improve read/write performance in object storage applications

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331253A (en) * 2014-10-30 2015-02-04 浪潮电子信息产业股份有限公司 Calculation method for object migration in object storage system
CN106951559A (en) * 2017-03-31 2017-07-14 联想(北京)有限公司 Data reconstruction method and electronic equipment in distributed file system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 Fault handling method and device
CN107729536A (en) * 2017-10-31 2018-02-23 新华三技术有限公司 Data storage method and device
CN107817950A (en) * 2017-10-31 2018-03-20 新华三技术有限公司 Data processing method and device
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD selection method, data writing method, device and storage system
CN107948334A (en) * 2018-01-09 2018-04-20 无锡华云数据技术服务有限公司 Data processing method based on distributed storage system
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Data storage method, device and storage medium
CN108509153A (en) * 2018-03-23 2018-09-07 新华三技术有限公司 OSD selection method, data writing and reading method, monitor and server cluster
CN108958970A (en) * 2018-05-29 2018-12-07 新华三技术有限公司 Data reconstruction method, server and computer-readable medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research and Performance Testing of the Ceph Distributed File System; Li Xiang; China Masters' Theses Full-text Database, Information Science and Technology; 2014-10-15; Vol. 2014, No. 10; full text *
SSD-based Journal Mechanism Optimization in the Ceph Storage Engine; Tang Hao; China Masters' Theses Full-text Database, Information Science and Technology; 2017-11-15; Vol. 2017, No. 11; full text *

Also Published As

Publication number Publication date
CN109710456A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109710456B (en) Data recovery method and device
US7325159B2 (en) Method and system for data recovery in a continuous data protection system
US9146684B2 (en) Storage architecture for server flash and storage array operation
US9547552B2 (en) Data tracking for efficient recovery of a storage array
US7774646B2 (en) Surviving storage system takeover by replaying operations in an operations log mirror
WO2018153251A1 (en) Method for processing snapshots and distributed block storage system
US12050778B2 (en) Data restoration method and related device
US7490103B2 (en) Method and system for backing up data
JP2006221623A (en) Detection and recovery of dropped write in storage device
CN109656895B (en) Distributed storage system, data writing method, device and storage medium
CN103516736A (en) Data recovery method of distributed cache system and a data recovery device of distributed cache system
US8949524B2 (en) Saving log data using a disk system as primary cache and a tape library as secondary cache
CN110442646B (en) Write performance optimization system and method for master end of ceph data synchronization module
US11947413B2 (en) Dynamic system log preprocessing
WO2020133473A1 (en) Data backup method, apparatus and system
WO2014075586A1 (en) Method and device for automatically recovering storage of jbod array
CN114077517A (en) Data processing method, device and system
CN114063883A (en) Method for storing data, electronic device and computer program product
US9619330B2 (en) Protecting volatile data of a storage device in response to a state reset
US20160070491A1 (en) Information processor, computer-readable recording medium in which input/output control program is recorded, and method for controlling input/output
US10001826B2 (en) Power management mechanism for data storage environment
CN108170375B (en) Overrun protection method and device in distributed storage system
WO2017074561A1 (en) Storage volume device and method for increasing write speed for data streams while providing data protection
US20160026537A1 (en) Storage system
US8356230B2 (en) Apparatus to manage data stability and methods of storing and recovering data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant