CN112394873B - Data management method, system, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112394873B
Authority
CN
China
Prior art keywords
data block
file
inter
data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910740429.7A
Other languages
Chinese (zh)
Other versions
CN112394873A (en)
Inventor
周玉坤
付忞
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN201910740429.7A
Publication of CN112394873A
Application granted
Publication of CN112394873B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0617 Improving the reliability of storage systems in relation to availability
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data management method, a system, an electronic device and a computer-readable storage medium. The method comprises the following steps: when a write request for a first target file is received, dividing the first target file into data blocks and generating metadata of the first target file; determining, according to the metadata, whether a data block in the first target file is an inter-file high-reference data block, where an inter-file high-reference data block is an inter-file repeated data block whose inter-file reference count is greater than or equal to a threshold; if the data block is an inter-file high-reference data block, performing redundancy management on the data block by using a copy strategy; and if the data block is not an inter-file high-reference data block, performing redundancy management on the data block by using an erasure coding strategy. The data management method provided by the application therefore ensures higher data availability at lower storage overhead, and avoids data loss or inaccessibility caused by storage device failure.

Description

Data management method, system, electronic equipment and storage medium
Technical Field
The present application relates to the field of storage technology, and more particularly, to a data management method, system, and electronic device and a computer readable storage medium.
Background
Data deduplication has been widely applied as a system-level compression technology in backup systems, primary storage systems, virtual machines, and cloud storage systems. Because storage systems inevitably face uncorrectable disk errors and latent sector errors, data availability is one of the important safety indicators of a storage system. Compared with a storage system that does not deduplicate, data deduplication reduces storage overhead but also inevitably compromises data availability: after deduplication the logical layout and the physical layout of a file are no longer consistent, and the same data block is referenced by different files, so the loss of one physical block can cause much more severe data loss in the secondary storage system. Improving the data availability of data deduplication systems is therefore a serious challenge.
At present, deduplicated data is generally protected by either an erasure code policy or a copy policy. Erasure-code-based policies scale poorly in a storage system and incur additional I/O overhead, while copy-based policies increase storage overhead.
Therefore, how to meet the requirements of availability and low storage overhead in a data deduplication storage system is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a data management method, a system, an electronic device and a computer readable storage medium, which meet the requirements of availability and low storage overhead in a data deduplication storage system.
In order to achieve the above object, the present application provides a data management method, including:
When a write request of a first target file is received, performing blocking processing on the first target file, and generating metadata of the first target file;
Determining whether the data block in the first target file is an inter-file high-reference data block according to the metadata; the inter-file high reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value;
If the data block is the inter-file high-reference data block, performing redundancy management on the data block by using a copy strategy;
And if the data block is a non-inter-file high-reference data block, performing redundancy management on the data block by using an erasure coding strategy.
Wherein the determining, according to the metadata, whether the data block in the first target file is an inter-file high-reference data block includes:
Judging whether the data block in the first target file is an inter-file repeated data block or not;
If the data block is an inter-file repeated data block, judging that the data block is an inter-file high-reference data block when the inter-file reference count of the data block is greater than or equal to a threshold value, and judging that the data block is an inter-file low-reference data block when the inter-file reference count is smaller than the threshold value;
and if the data block is a non-inter-file repeated data block, judging the data block as the non-inter-file high-reference data block.
Wherein the judging whether the data block in the first target file is an inter-file repeated data block includes:
judging whether the fingerprint of the data block matches the fingerprint sequence in the metadata of the first target file;
If the fingerprint matches the fingerprint sequence in the metadata of the first target file, the data block is an intra-file repeated data block;
If the fingerprint does not match the fingerprint sequence in the metadata of the first target file, judging whether the fingerprint hits the fingerprint sequences of other files; if yes, the data block is an inter-file repeated data block; if not, the data block is a non-repeated data block.
The redundancy management of the data block by using the copy policy includes:
And when the inter-file reference count meets a preset condition, increasing the number of copies of the data block. The preset conditions comprise a first preset condition and a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold value, the second preset condition is that the inter-file reference count, the target reference count and the current copy number meet a preset relation, the target reference count is the reference count of the data block when the copy number is increased from the target copy number to the current copy number, and the target copy number is the current copy number minus one.
Wherein the redundancy management of the data block using the erasure coding strategy includes:
adding the data block into a target container;
dividing the target container into k data objects when the target container is filled, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
and storing the k data objects and the m check objects into k+m nodes.
Wherein the method further includes:
when a second target file reading request is received, reading the coding object of each data block in the second target file from at least k nodes according to metadata information of erasure code stripes;
and decoding the coded object to obtain each data block in the second target file so as to respond to the read request.
To achieve the above object, the present application provides a data management system comprising:
the generating module is used for carrying out blocking processing on the first target file when receiving a write request of the first target file and generating metadata of the first target file;
The determining module is used for determining whether the data block in the first target file is an inter-file high-reference data block or not according to the metadata; the inter-file high reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value; if yes, starting the workflow of the first management module; if not, starting the workflow of the second management module;
The first management module is used for carrying out redundancy management on the data blocks by utilizing a copy strategy;
and the second management module is used for carrying out redundancy management on the data block by utilizing an erasure code strategy.
To achieve the above object, the present application provides an electronic device including:
A memory for storing a computer program;
and a processor for implementing the steps of the data management method as described above when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data management method as described above.
According to the scheme, the data management method provided by the application comprises the following steps: when a write request of a first target file is received, performing blocking processing on the first target file, and generating metadata of the first target file; determining whether the data block in the first target file is an inter-file high-reference data block according to the metadata; the inter-file high reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value; if yes, redundancy management is carried out on the data blocks by using a copy strategy; if not, the data block is subjected to redundancy management by using an erasure coding strategy.
According to the data management method provided by the application, inter-file repeated data blocks whose inter-file reference count is greater than or equal to the threshold value, namely inter-file high-reference data blocks, are managed redundantly with the copy strategy, which ensures higher data availability for those blocks, while all other data blocks are managed redundantly with the erasure coding strategy, which reduces storage overhead. The data management method provided by the application therefore ensures higher data availability at lower storage overhead, and avoids data loss or inaccessibility caused by storage device failure. The application also discloses a data management system, an electronic device and a computer-readable storage medium, which achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
FIG. 1 is an architecture diagram of a data management method according to the present application;
FIG. 2 is a flow chart illustrating a method of data management according to an exemplary embodiment;
FIG. 3 is a detailed flowchart of step S103 in FIG. 2;
FIG. 4 is a detailed flowchart of step S104 in FIG. 2;
FIG. 5 is a flow chart illustrating another data management method according to an exemplary embodiment;
FIG. 6 is a block diagram of a data management system, shown in accordance with an exemplary embodiment;
FIG. 7 is a block diagram of an electronic device, according to an exemplary embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
FIG. 1 shows an architecture diagram of the data management method provided by the application, which comprises a data deduplication module, a duplicate data detection module and a data management module. The input of the data deduplication module is file content, and its output is metadata information. The duplicate data detection module takes the file metadata information as input and outputs the state of each data block. The data block states are: non-duplicate data block, intra-file duplicate data block, inter-file high-reference data block, and inter-file low-reference data block. The input of the data management module is a data block state, and its output is a data redundancy management strategy, either a copy strategy or an erasure code strategy.
The embodiment of the application discloses a data management method which meets the requirements of availability and low storage overhead in a data deduplication storage system.
Referring to fig. 2, a flowchart of a data management method is shown according to an exemplary embodiment, as shown in fig. 2, including:
S101: when a write request of a first target file is received, performing blocking processing on the first target file, and generating metadata of the first target file;
The execution body of this embodiment can be a processor of a data deduplication storage system, that is, a storage system that supports data deduplication and can thereby reduce storage overhead. The system segments the input data or file into data blocks, calculates a hash value of each data block with a hash function (e.g., SHA-1) to serve as the fingerprint of that data block, and maintains a fingerprint index table in memory. The fingerprint index table records a one-to-one mapping between the fingerprint of each data block and its physical location.
In a specific implementation, when a write request of a first target file is received, the first target file is segmented into data blocks, and metadata is generated, wherein the metadata mainly comprises file names, paths, the number of the data blocks, fingerprint sequences of all the data blocks, the length of the data blocks and the like.
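The blocking and fingerprinting in S101 can be sketched as follows. This is only an illustration under stated assumptions: the fixed block size of 4096 bytes, the `FileMetadata` class and the `chunk_and_fingerprint` function are hypothetical names that do not appear in the application, and the application does not say whether fixed-size or content-defined blocking is used. SHA-1 is used because the description names it as an example hash function.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # assumed fixed block size; the source does not specify one


@dataclass
class FileMetadata:
    """Metadata fields listed in the description (field names are illustrative)."""
    name: str
    path: str
    block_count: int = 0
    fingerprints: list = field(default_factory=list)   # fingerprint sequence
    block_lengths: list = field(default_factory=list)  # length of each block


def chunk_and_fingerprint(name, path, data, chunk_size=CHUNK_SIZE):
    """Split `data` into blocks and build the metadata of the file."""
    meta = FileMetadata(name=name, path=path)
    blocks = []
    for offset in range(0, len(data), chunk_size):
        block = data[offset:offset + chunk_size]
        blocks.append(block)
        # The SHA-1 digest of the block content serves as its fingerprint.
        meta.fingerprints.append(hashlib.sha1(block).hexdigest())
        meta.block_lengths.append(len(block))
    meta.block_count = len(blocks)
    return meta, blocks
```

Because the fingerprint depends only on block content, two blocks with identical content in different files produce the same fingerprint, which is what the later duplicate detection relies on.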
S102: determining whether the data block in the first target file is an inter-file high-reference data block according to the metadata; if yes, go to S103; if not, entering S104;
The inter-file high reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value;
In this step, the data block state of each data block in the first target file needs to be determined. The states are non-repeated data block, intra-file repeated data block, and inter-file repeated data block. A non-repeated data block occurs exactly once in the system; an intra-file repeated data block occurs more than once within the file that contains it; and an inter-file repeated data block has an inter-file reference count greater than 1, that is, several different files contain it. Inter-file repeated data blocks are further divided into inter-file high-reference data blocks and inter-file low-reference data blocks. The process proceeds to S103 for inter-file high-reference data blocks, and otherwise proceeds to S104.
The inter-file high-reference data block is an inter-file repeated data block whose inter-file reference count is greater than or equal to a threshold. The inter-file reference count of a data block is the number of files that reference the data block; for an input data block, if the data block is an inter-file repeated data block, its inter-file reference count is increased by 1. When the inter-file reference count is greater than or equal to the threshold, the data block is an inter-file high-reference data block. That is, the step may comprise: judging whether the data block in the first target file is an inter-file repeated data block; if yes, determining the inter-file reference count of the data block; judging whether the inter-file reference count is greater than or equal to the threshold value; if yes, the data block is an inter-file high-reference data block; if not, the data block is an inter-file low-reference data block.
The step of judging whether the data block in the first target file is an inter-file repeated data block may include: judging whether the fingerprint of the data block matches the fingerprint sequence in the metadata of the first target file; if it matches, the data block is an intra-file repeated data block; if not, judging whether the fingerprint hits the fingerprint sequences of other files; if yes, the data block is an inter-file repeated data block; if not, the data block is a non-repeated data block.
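The decision sequence of S102 can be sketched as a small classifier. The names `BlockState`, `classify_block` and `uses_copy_strategy` are hypothetical, and the two lookup structures (a per-file set of fingerprints already seen, and a global map from fingerprint to inter-file reference count) are assumptions made for illustration; the application only states that fingerprints are compared against the fingerprint sequences of the current file and of other files.

```python
from enum import Enum


class BlockState(Enum):
    NON_DUPLICATE = "non-duplicate"
    INTRA_FILE_DUPLICATE = "intra-file duplicate"
    INTER_FILE_LOW = "inter-file low-reference"
    INTER_FILE_HIGH = "inter-file high-reference"


def classify_block(fp, seen_in_file, inter_file_refs, threshold):
    """Classify one block of the file being written.

    seen_in_file    -- fingerprints of earlier blocks of the same file
    inter_file_refs -- assumed global map: fingerprint -> number of files
                       that reference the block
    """
    if fp in seen_in_file:
        return BlockState.INTRA_FILE_DUPLICATE
    seen_in_file.add(fp)
    if fp in inter_file_refs:
        inter_file_refs[fp] += 1          # one more file references the block
        if inter_file_refs[fp] >= threshold:
            return BlockState.INTER_FILE_HIGH
        return BlockState.INTER_FILE_LOW
    inter_file_refs[fp] = 1               # first file to contain this block
    return BlockState.NON_DUPLICATE


def uses_copy_strategy(state):
    """Only inter-file high-reference blocks go to S103; the rest go to S104."""
    return state is BlockState.INTER_FILE_HIGH
```

The intra-file check comes first, so a block that repeats inside one file does not inflate the inter-file reference count, which counts files rather than occurrences.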
S103: redundancy management is carried out on the data blocks by using a copy strategy;
In this step, the copy strategy is used for redundancy management of inter-file high-reference data blocks, which ensures high availability of that data.
Preferably, redundancy management can use a dynamic copy strategy, that is, this step may include: when the inter-file reference count meets a preset condition, increasing the number of copies of the data block. A person skilled in the art can set the preset condition so that the number of copies of each data block is adjusted dynamically: whenever the inter-file reference count meets the preset condition, the number of copies is increased, and the increment can be set to 1.
S104: and carrying out redundancy management on the data block by using an erasure code strategy.
In the step, for the non-repeated data blocks and the low-reference data blocks between the files, the erasure coding strategy is utilized to carry out redundancy management, so that the overall storage overhead of the system is reduced.
According to the data management method provided by the embodiment of the application, inter-file repeated data blocks whose inter-file reference count is greater than or equal to the threshold value, namely inter-file high-reference data blocks, are managed redundantly with the copy strategy, which ensures higher data availability for those blocks, while all other data blocks are managed redundantly with the erasure code strategy, which reduces storage overhead. The data management method provided by the embodiment of the application therefore ensures higher data availability at lower storage overhead, and avoids data loss or inaccessibility caused by storage device failure.
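The storage trade-off between the two strategies can be made concrete with a short calculation. The function names and the example parameters (3 copies, k = 4, m = 2) are illustrative assumptions and are not taken from the application; the erasure code figure assumes that any k of the k + m encoding objects are enough to recover the data, as the description states for the read path.

```python
def replication_overhead(copies):
    """Raw bytes stored per logical byte when keeping `copies` full copies."""
    return float(copies)


def erasure_overhead(k, m):
    """Raw bytes stored per logical byte for k data objects + m check objects."""
    return (k + m) / k


# Illustrative comparison: both layouts survive the loss of any two nodes.
three_copies = replication_overhead(3)   # 3 nodes, 3.0x raw storage
ec_4_plus_2 = erasure_overhead(4, 2)     # 6 nodes, 1.5x raw storage
```

Under these assumed parameters the erasure-coded layout stores half as many raw bytes as three-way copying, which is why the method reserves copies for the small set of highly referenced blocks.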
The following describes in detail the process of redundancy management using the copy policy, as shown in fig. 3, step S103 in the above embodiment may include:
S31: judging whether the reference count between files is equal to the threshold value; if yes, go to S33; if not, entering S32;
S32: determining the current copy number of the data block, and judging whether the reference count between files and the current copy number meet a preset relational expression or not; if yes, go to S33;
Wherein, the preset relational expression is specifically:
$T - T_r = a \cdot r + b$;
Wherein $T$ is the reference count between the files, $r$ is the current copy number, $T_r$ is the reference count of the data block when the copy number is increased from $r-1$ to $r$, and $a$ and $b$ are preset parameters.
S33: increasing the number of copies of the data block by 1;
In this embodiment, the preset conditions include a first preset condition or a second preset condition, where the first preset condition is that the inter-file reference count is equal to the threshold, and the second preset condition is that the inter-file reference count is greater than the threshold and the inter-file reference count $T$, the target reference count $T_r$, and the current copy number $r$ satisfy the preset relational expression. If either condition is satisfied, the number of copies of the data block is increased by 1. The copies can be distributed across different nodes and written into the reserved space of those nodes, and the metadata information of the copies can be updated in a copy table.
Preferably, before step S31, the method may further include: and judging whether the current copy number of the data block exceeds the maximum copy number, if so, deleting the data block.
Therefore, for the inter-file high-reference data block, the embodiment dynamically adds the number of copies according to the increase of the inter-file reference count, so that the expansibility and the data availability of the system are improved.
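Steps S31 to S33 can be sketched as follows. The names `ReplicaState`, `should_add_copy` and `on_reference` are hypothetical, the starting copy count of 1 is an assumption, and the parameter values used in the usage note are chosen only for illustration. The maximum-copy check mentioned before S31 is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    copies: int = 1                  # r: current number of copies (assumed start)
    refs_at_last_increase: int = 0   # T_r: reference count when r-1 became r


def should_add_copy(T, state, threshold, a, b):
    """Return True when S31 or S32 says the copy count should grow by 1."""
    if T == threshold:                                   # S31: first condition
        return True
    if T > threshold:                                    # S32: T - T_r = a*r + b
        return T - state.refs_at_last_increase == a * state.copies + b
    return False


def on_reference(T, state, threshold, a, b):
    """Apply S33 if required and record T_r for the new copy count."""
    if should_add_copy(T, state, threshold, a, b):
        state.copies += 1
        state.refs_at_last_increase = T
    return state.copies
```

With the assumed values threshold = 3, a = 2 and b = 0, the copy count rises to 2 when T reaches 3, to 3 when T reaches 7, and to 4 when T reaches 13, so each additional copy requires a larger growth in references than the previous one.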
The following describes in detail the process of redundancy management using erasure coding strategy, as shown in fig. 4, step S104 in the first embodiment may include:
S41: adding the data block into a target container;
In this embodiment, for non-inter-file high reference data blocks, i.e., non-duplicate data blocks and inter-file low reference data blocks, they are aggregated and written to the container.
S42: dividing the target container into k data objects when the target container is filled, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
In this step, when the container is filled, the single container is divided into k data objects and m check objects are generated; the data objects and the check objects are collectively referred to as encoding objects. When a data block is read, the complete data block can be obtained by reading at least k encoding objects.
S43: and storing the k data objects and the m check objects into k+m nodes.
In this step, different encoded objects are stored in different nodes, i.e. k data objects and m check objects are stored in k+m nodes, respectively.
Therefore, for the non-inter-file high-reference data blocks, the erasure coding strategy is adopted to store the coding objects into a plurality of devices, so that the overall storage overhead of the system is reduced.
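Steps S41 to S43 can be sketched with a deliberately simplified code. The application does not name a specific erasure code; the sketch below swaps in a single XOR check object (m = 1) as a stand-in, whereas a practical system would typically use a code that supports arbitrary m. The `Container` class, the 16-byte capacity and the function names are hypothetical, and padding of partially filled containers is omitted.

```python
CONTAINER_SIZE = 16   # assumed tiny capacity in bytes, for illustration only


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Container:
    def __init__(self, capacity=CONTAINER_SIZE):
        self.capacity = capacity
        self.buffer = bytearray()

    def add(self, block):
        """S41: append a block; return True once the container is filled."""
        self.buffer += block
        return len(self.buffer) >= self.capacity


def encode_container(container, k):
    """S42: split a full container into k data objects plus one XOR check object."""
    if len(container) % k != 0:
        raise ValueError("container size must be a multiple of k")
    size = len(container) // k
    data_objects = [container[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for obj in data_objects:
        parity = xor_bytes(parity, obj)
    return data_objects, [parity]


def place_on_nodes(data_objects, check_objects):
    """S43: assign each encoding object to its own node (ids 0..k+m-1)."""
    return {node: obj for node, obj in enumerate(data_objects + check_objects)}
```

Aggregating many small blocks into one container before encoding means the check objects are computed once per container rather than once per block.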
The following describes in detail the reading flow in the data management method provided by the present application, and specifically, as shown in fig. 5, the method includes:
S201: when a second target file reading request is received, reading the coding object of each data block in the second target file from at least k nodes according to metadata information of erasure code stripes;
in this embodiment, when a second target file read request is received, metadata of the second target file is searched to obtain a file path, a file name and a fingerprint sequence. And creating a file according to the file path and the file name. For each data block in the metadata, the encoding object is read from at least k nodes according to metadata information of the erasure code stripes.
The metadata information of the erasure code stripe here records the node storing the encoding object of each data block in the second target file. In the process of writing the file, the whole container is divided into k data objects, so that when each data block is read, at least k nodes are selected from nodes corresponding to each data block to read data, and then the complete data block can be obtained.
S202: and decoding the coded object to obtain each data block in the second target file so as to respond to the read request.
In the step, decoding operation is carried out on the coded object to obtain complete data blocks, all the data blocks are written into the new file, and the new file is returned to respond to the read request.
If one of the selected nodes is a faulty node, the data of the faulty node can be reconstructed by using the erasure code. If some data blocks cannot be reconstructed, the copy table is searched to determine whether other nodes hold copies; if so, the copy address is obtained and the copy is read to recover the data of the faulty node.
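The read and recovery path can be sketched under the same simplification of a single XOR check object (m = 1), which is an assumption and not the code specified by the application. The function names are hypothetical, and the copy table is modelled as a plain dictionary keyed by an illustrative stripe identifier; the application only says that the copy table records where copies are stored.

```python
def xor_all(objects):
    """Bytewise XOR of a list of equal-length byte strings."""
    out = bytes(len(objects[0]))
    for obj in objects:
        out = bytes(x ^ y for x, y in zip(out, obj))
    return out


def read_stripe(nodes, k, copy_table=None, stripe_id=None):
    """Return the container bytes of one stripe.

    nodes      -- list of k+1 encoding objects (k data objects, then one XOR
                  check object); None marks a failed node
    copy_table -- optional map: stripe_id -> full copy kept on other nodes
    """
    missing = [i for i, obj in enumerate(nodes) if obj is None]
    if not missing:
        return b"".join(nodes[:k])
    if len(missing) == 1:
        # Any k surviving objects suffice: XOR them to rebuild the lost one.
        survivors = [obj for obj in nodes if obj is not None]
        restored = list(nodes)
        restored[missing[0]] = xor_all(survivors)
        return b"".join(restored[:k])
    if copy_table and stripe_id in copy_table:
        return copy_table[stripe_id]      # fall back to a stored copy
    raise IOError("stripe cannot be reconstructed and no copy is recorded")
```

The fallback order mirrors the description: decode from the erasure code first, and consult the copy table only when too many encoding objects are lost.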
The following describes a data management system according to an embodiment of the present application, and the data management system described below and the data management method described above may be referred to each other.
Referring to fig. 6, a block diagram of a data management system is shown according to an exemplary embodiment, as shown in fig. 6, including:
The generating module 601 is configured to perform a blocking process on a first target file when a write request of the first target file is received, and generate metadata of the first target file;
a determining module 602, configured to determine, according to the metadata, whether a data block in the first target file is an inter-file high-reference data block; the inter-file high reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value; if yes, starting the workflow of the first management module; if not, starting the workflow of the second management module;
The first management module 603 is configured to perform redundancy management on the data block by using a copy policy;
the second management module 604 is configured to perform redundancy management on the data block using an erasure coding policy.
According to the data management system provided by the embodiment of the application, inter-file repeated data blocks whose inter-file reference count is greater than or equal to the threshold value, namely inter-file high-reference data blocks, are managed redundantly with the copy strategy, which ensures higher data availability for those blocks, while all other data blocks are managed redundantly with the erasure code strategy, which reduces storage overhead. The data management system provided by the embodiment of the application therefore ensures higher data availability at lower storage overhead, and avoids data loss or inaccessibility caused by storage device failure.
Based on the above embodiment, as a preferred implementation manner, the determining module 602 includes:
a first judging unit, configured to judge whether the data block in the first target file is an inter-file repeated data block; if yes, starting the workflow of the first determining unit; if not, starting the workflow of the second determining unit;
a first determining unit, configured to determine that the data block is an inter-file high-reference data block when the inter-file reference count of the data block is greater than or equal to a threshold value, and to determine that the data block is an inter-file low-reference data block when the inter-file reference count is less than the threshold value;
a second determining unit, configured to determine that the data block is a non-inter-file high-reference data block.
On the basis of the above embodiment, as a preferred embodiment, the first judging unit includes:
a first judging subunit, configured to judge whether the fingerprint of the data block hits a fingerprint sequence in the metadata of the first target file; if so, the data block is an intra-file repeated data block; if not, the workflow of the second judging subunit is started;
a second judging subunit, configured to judge whether the fingerprint hits the fingerprint sequences of other files; if so, the data block is an inter-file repeated data block; if not, the data block is a non-repeated data block.
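The two-stage fingerprint lookup and the subsequent threshold comparison can be sketched as below. This is an illustrative sketch only: the use of SHA-256 as the fingerprint function, the function names, and the shape of the index (a set of fingerprints for the current file and a dictionary mapping fingerprints to inter-file reference counts for other files) are assumptions, not details taken from the application.

```python
import hashlib
from typing import Dict, Set


def fingerprint(block: bytes) -> str:
    """Assumed fingerprint function: SHA-256 of the block contents."""
    return hashlib.sha256(block).hexdigest()


def classify_block(block: bytes,
                   current_file_fps: Set[str],
                   other_files_ref_counts: Dict[str, int],
                   threshold: int) -> str:
    """Classify a block following the judging order described above."""
    fp = fingerprint(block)
    # First judging subunit: does the fingerprint hit the current file?
    if fp in current_file_fps:
        return "intra-file repeated"
    # Second judging subunit: does the fingerprint hit any other file?
    if fp in other_files_ref_counts:
        # First determining unit: compare the count with the threshold.
        if other_files_ref_counts[fp] >= threshold:
            return "inter-file high-reference"
        return "inter-file low-reference"
    return "non-repeated"
```

Only the "inter-file high-reference" result would lead to the copy strategy; the other three results count as non-inter-file high-reference data blocks.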
On the basis of the above embodiment, as a preferred implementation manner, the first management module 603 is specifically a module that increases the number of copies of the data block when the inter-file reference count meets a preset condition.
On the basis of the above embodiment, as a preferred implementation manner, the preset condition includes a first preset condition or a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold value, the second preset condition is that the inter-file reference count, the target reference count and the current copy number meet a preset relation, the target reference count is the reference count of the data block when the copy number is increased from the target copy number to the current copy number, and the target copy number is the current copy number minus one.
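The check for increasing the number of copies can be sketched as follows. Because the concrete preset relation is not reproduced in this text, the sketch takes the relation as a caller-supplied function; `should_add_copy` and `placeholder_relation` are hypothetical names, and `placeholder_relation` is purely illustrative and is not the relation defined in the application.

```python
from typing import Callable

# relation(inter_file_ref_count, target_ref_count, current_copies) -> bool
Relation = Callable[[int, int, int], bool]


def should_add_copy(inter_file_ref_count: int,
                    threshold: int,
                    target_ref_count: int,
                    current_copies: int,
                    relation: Relation) -> bool:
    """Return True when either preset condition holds."""
    # First preset condition: the count has just reached the threshold.
    if inter_file_ref_count == threshold:
        return True
    # Second preset condition: the three quantities satisfy the preset
    # relation, which is supplied by the caller in this sketch.
    return relation(inter_file_ref_count, target_ref_count, current_copies)


def placeholder_relation(inter_file_ref_count: int,
                         target_ref_count: int,
                         current_copies: int) -> bool:
    """Illustrative stand-in only: add a copy once the reference count has
    doubled since the previous copy was added. NOT the patent's formula."""
    return inter_file_ref_count >= 2 * target_ref_count
```

Passing the relation as a parameter keeps the two known parts of the logic (the threshold test and the three inputs of the relation) separate from the part that this text does not specify.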
Based on the above embodiment, as a preferred implementation manner, the second management module 604 includes:
An adding unit, configured to add the data block to a target container;
the generating unit is used for dividing the target container into k data objects when the target container is filled, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
and the storage unit is used for storing the k data objects and the m check objects into k+m nodes.
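The container-to-stripe step can be sketched as below. As a simplifying assumption, the sketch fixes m = 1 and uses a single XOR check object as a stand-in for a general erasure code; a real system with m > 1 would typically use a code such as Reed-Solomon. The function names and the zero-byte padding are also assumptions.

```python
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_container(container: bytes, k: int) -> List[bytes]:
    """Split a filled container into k data objects and append one XOR
    check object (m = 1 in this sketch)."""
    size = -(-len(container) // k)            # ceiling division
    padded = container.ljust(size * k, b"\0")  # pad so all objects match
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    check = data[0]
    for obj in data[1:]:
        check = xor_bytes(check, obj)
    return data + [check]


def place_on_nodes(objects: List[bytes]) -> Dict[int, bytes]:
    """Store each of the k+m objects on its own node (node ids 0..k+m-1)."""
    return {node_id: obj for node_id, obj in enumerate(objects)}
```

With k = 4, an 8-byte container yields four 2-byte data objects plus one 2-byte check object, placed on five distinct nodes.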
On the basis of the above embodiment, as a preferred implementation manner, the method further includes:
a reading module, configured to, upon receiving a read request for a second target file, read the coded objects of each data block in the second target file from at least k nodes according to the metadata information of the erasure code stripe;
a decoding module, configured to decode the coded objects to obtain each data block in the second target file, so as to respond to the read request.
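The read path can be sketched as follows, again under the simplifying assumption of a single XOR check object (m = 1), so that any k of the k+1 objects suffice. The layout (data objects at indices 0 to k-1, check object at index k) and the `original_len` field, assumed to come from the stripe metadata, are hypothetical.

```python
from typing import Dict


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def decode_stripe(available: Dict[int, bytes], k: int,
                  original_len: int) -> bytes:
    """Rebuild the container from at least k of the k+1 stored objects.

    `available` maps object index -> object bytes, as read from the nodes
    that responded.
    """
    if len(available) < k:
        raise ValueError("need objects from at least k nodes")
    missing = [i for i in range(k) if i not in available]
    if len(missing) > 1:
        raise ValueError("XOR check object can repair only one data object")
    objects = dict(available)
    if missing:
        # The lost data object equals the XOR of all surviving objects
        # (the remaining data objects plus the check object).
        rebuilt = None
        for obj in available.values():
            rebuilt = obj if rebuilt is None else xor_bytes(rebuilt, obj)
        objects[missing[0]] = rebuilt
    # Concatenate the data objects and drop the padding.
    return b"".join(objects[i] for i in range(k))[:original_len]
```

If all k data objects are readable, no decoding work is needed and the check object is simply ignored.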
With respect to the system of the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be repeated here.
The present application also provides an electronic device. Referring to fig. 7, which is a block diagram of an electronic device 700 provided in an embodiment of the present application, the electronic device 700 may include a processor 11 and a memory 12, and may further include one or more of a multimedia component 13, an input/output (I/O) interface 14, and a communication component 15.
The processor 11 is configured to control the overall operation of the electronic device 700 so as to perform all or part of the steps of the data management method described above. The memory 12 is used to store various types of data to support operation of the electronic device 700; such data may include, for example, instructions for any application or method operating on the electronic device 700, as well as application-related data such as contact data, messages sent and received, pictures, audio, and video. The memory 12 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 13 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 12 or transmitted through the communication component 15. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 14 provides an interface between the processor 11 and other interface modules, which may be a keyboard, a mouse, buttons, and so on. These buttons may be virtual buttons or physical buttons. The communication component 15 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G or 4G, or a combination of one or more thereof; accordingly, the communication component 15 may comprise a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the data management method described above.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data management method described above is also provided. For example, the computer readable storage medium may be the memory 12 described above including program instructions executable by the processor 11 of the electronic device 700 to perform the data management method described above.
In this description, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; for parts that are the same or similar among the embodiments, reference may be made to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points may be found in the description of the method. It should be noted that those skilled in the art may make various modifications and adaptations to the application without departing from its principles, and these modifications and adaptations also fall within the scope of protection of the claims of the application.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (8)

1. A method of data management, comprising:
partitioning a first target file, and generating metadata of the first target file;
Determining whether the data block in the first target file is an inter-file high-reference data block according to the metadata; the inter-file high-reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value, and the inter-file reference count of the data block represents the number of files which reference the data block;
If the data block is the inter-file high-reference data block, performing redundancy management on the data block by using a copy strategy;
if the data block is a non-inter-file high-reference data block, performing redundancy management on the data block by using an erasure coding strategy;
the redundancy management of the data block by using the copy policy includes:
when the inter-file reference count meets a preset condition, increasing the number of copies of the data block; the preset condition comprises a first preset condition or a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold value; the second preset condition is that the inter-file reference count, the target reference count and the current copy number of the data block satisfy a preset relation, wherein the target reference count is the reference count of the data block at the time the copy number was increased from the target copy number to the current copy number, and the target copy number is the current copy number minus one; the preset relation is specified by a formula [formula not recoverable from the source text] whose terms are the inter-file reference count, the current copy number, the reference count of the data block at the time the number of copies was increased from the target copy number to the current copy number, and two preset parameters.
2. The method according to claim 1, wherein determining whether the data block in the first target file is an inter-file high-reference data block according to the metadata comprises:
Judging whether the data block in the first target file is an inter-file repeated data block or not;
if the data block is an inter-file repeated data block, determining that the data block is an inter-file high-reference data block when the inter-file reference count of the data block is greater than or equal to the threshold value, and determining that the data block is an inter-file low-reference data block when the inter-file reference count is less than the threshold value;
and if the data block is a non-inter-file repeated data block, judging the data block as the non-inter-file high-reference data block.
3. The method of claim 2, wherein determining whether the data block in the first target file is an inter-file duplicate data block comprises:
judging whether the fingerprint of the data block is matched with a fingerprint sequence in metadata of the first target file or not;
if the fingerprint matches a fingerprint sequence in the metadata of the first target file, the data block is an intra-file repeated data block;
if the fingerprint does not match a fingerprint sequence in the metadata of the first target file, judging whether the fingerprint hits the fingerprint sequences of other files; if so, the data block is an inter-file repeated data block; if not, the data block is a non-repeated data block.
4. The method of claim 1, wherein the redundancy management of the data block using an erasure coding strategy comprises:
adding the data block into a target container;
dividing the target container into k data objects when the target container is filled, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
and storing the k data objects and the m check objects into k+m nodes.
5. The data management method according to any one of claims 1 to 4, characterized by further comprising:
when a read request for a second target file is received, reading the coded objects of each data block in the second target file from at least k nodes according to metadata information of the erasure code stripe;
and decoding the coded object to obtain each data block in the second target file so as to respond to the read request.
6. A data management system, comprising:
the generating module is used for carrying out blocking processing on the first target file when receiving a write request of the first target file and generating metadata of the first target file;
The determining module is used for determining whether the data block in the first target file is an inter-file high-reference data block or not according to the metadata; the inter-file high-reference data block is an inter-file repeated data block with the inter-file reference count being greater than or equal to a threshold value, and the inter-file reference count of the data block represents the number of files which reference the data block; if yes, starting the workflow of the first management module; if not, starting the workflow of the second management module;
The first management module is used for carrying out redundancy management on the data blocks by utilizing a copy strategy;
the second management module is used for performing redundancy management on the data block by using an erasure code strategy;
the first management module is specifically configured to: when the inter-file reference count meets a preset condition, increase the number of copies of the data block; the preset condition comprises a first preset condition or a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold value; the second preset condition is that the inter-file reference count, the target reference count and the current copy number of the data block satisfy a preset relation, wherein the target reference count is the reference count of the data block at the time the copy number was increased from the target copy number to the current copy number, and the target copy number is the current copy number minus one; the preset relation is specified by a formula [formula not recoverable from the source text] whose terms are the inter-file reference count, the current copy number, the reference count of the data block at the time the number of copies was increased from the target copy number to the current copy number, and two preset parameters.
7. An electronic device, comprising:
A memory for storing a computer program;
A processor for implementing the steps of the data management method according to any one of claims 1 to 5 when executing said computer program.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data management method according to any of claims 1 to 5.
CN201910740429.7A 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium Active CN112394873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910740429.7A CN112394873B (en) 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112394873A CN112394873A (en) 2021-02-23
CN112394873B true CN112394873B (en) 2024-05-24

Family

ID=74602267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910740429.7A Active CN112394873B (en) 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112394873B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398006B (en) * 2021-12-24 2024-11-05 中国电信股份有限公司 A distributed storage mode control method, device, equipment and storage medium
CN119883142B (en) * 2025-03-28 2025-06-27 苏州元脑智能科技有限公司 Data management method, device, medium and product of distributed storage system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN102567218A (en) * 2010-12-17 2012-07-11 微软公司 Garbage collection and hotspots relief for a data deduplication chunk store
US8250035B1 (en) * 2008-09-30 2012-08-21 Emc Corporation Methods and apparatus for creating a branch file in a file system
CN103118133A (en) * 2013-02-28 2013-05-22 浙江大学 Mixed cloud storage method based on file access frequency
CN104917788A (en) * 2014-03-11 2015-09-16 中国移动通信集团公司 Data storage method and apparatus
CN109144417A (en) * 2018-08-16 2019-01-04 广州杰赛科技股份有限公司 A kind of cloud storage method, system and equipment
CN109522151A (en) * 2017-09-15 2019-03-26 北京京东尚科信息技术有限公司 Method and device for data redundancy storage


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
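Data distribution and hybrid redundancy methods in multi-cloud storage; Yuan Xuemei; Master's Theses Electronic Journal; Chapters 2 and 5 *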



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant