CN107977168B - Data dispersed storage system based on cloud storage

Info

Publication number: CN107977168B
Application number: CN201711351926.5A
Authority: CN
Inventors: 黄仁高
Original assignee: Anhui Changtai Information Security Service Co ltd
Current assignee: Anhui Shuan System Integration Co.,Ltd.
Priority date: 2017-12-15
Filing date: 2017-12-15
Publication date: 2021-01-01
Anticipated expiration: 2037-12-15
Also published as: CN107977168A

Abstract

The invention discloses a data dispersed storage system based on cloud storage and relates to the technical field of data storage. The system includes a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device, and a third cloud storage device. The data deduplication module receives the data information uploaded by the client and deduplicates it; the processor performs data layout on the deduplicated data information and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively. By processing the data, the invention obtains the deduplicated data information and the data reference rate, and judges the attribute of each piece of data information according to the data reference rate. Storing data information with a replication-based data distribution strategy reduces the operation steps required of the processor; adopting an erasure-code-based data layout strategy gives higher storage efficiency.

Description

Data dispersed storage system based on cloud storage

Technical Field

The invention belongs to the technical field of data storage, and particularly relates to a data dispersed storage system based on cloud storage.

Background

With the explosive growth of data, how a data storage system can efficiently query and write massive data has become a research focus in the field of data storage. A data storage system comprises the various storage devices that hold programs and data, the control units, the devices that manage information scheduling, and the processing algorithms. As more and more data is stored, the storage space of such systems keeps growing, and the requirements on their processing performance keep rising.

At present, the main data storage mode is to set up a large database dedicated to storing mass data. Although a large database can meet the high storage capacity requirement of mass data, the efficiency of querying and writing particular data in it is greatly reduced, so data processing efficiency is sacrificed. A data dispersed storage system based on cloud storage is therefore provided to solve these problems.

Disclosure of Invention

The invention aims to provide a data dispersed storage system based on cloud storage. Through the arrangement of a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device, it solves the problems of low data processing efficiency and poor safety performance that existing storage of mass data exhibits under high storage capacity.

In order to solve the technical problems, the invention is realized by the following technical scheme:

The invention relates to a data dispersed storage system based on cloud storage, which comprises a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device. The client uploads the data information to be stored to the data deduplication module. The data deduplication module receives the data information uploaded by the client, removes duplicates from it through data deduplication, and transmits the deduplicated data information to the processor. The processor is electrically connected with the data reference monitoring device, and the data reference monitoring device is electrically connected with the data deduplication module. The data reference monitoring device automatically acquires the reference rate of the data information and transmits it to the processor. The processor receives the deduplicated data information transmitted by the data deduplication module and the data information reference rate transmitted by the data reference monitoring device, performs data layout on the deduplicated data information, and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively.

Further, the deduplication comprises the following steps:

SS 01: automatically retrieving all data and dividing it into blocks;

SS 02: automatically identifying duplicate data information within the data information by adopting a block-level data chunking detection technique;

SS 03: deleting the duplicate data information while keeping a single copy of it, and replacing the other duplicate copies with pointers to that single copy;

SS 04: obtaining the deduplicated data information.
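The deduplication steps can be illustrated with a minimal Python sketch. It assumes fixed-length blocks and uses a SHA-256 digest as the pointer to the single kept copy; the names `BLOCK_SIZE`, `deduplicate` and `restore`, the block length, and the choice of hash are illustrative assumptions that do not come from the patent text.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block length in bytes, kept tiny for demonstration


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-length blocks and keep one copy of each distinct block.

    Returns (store, pointers):
      store    maps a block digest to the single kept copy of that block
      pointers lists, in original order, the digest each block resolves to
    """
    store = {}
    pointers = []
    for offset in range(0, len(data), block_size):      # SS 01: partition into blocks
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()      # SS 02: detect duplicates by digest
        store.setdefault(digest, block)                 # SS 03: keep only the first copy
        pointers.append(digest)                         # SS 03: other copies become pointers
    return store, pointers                              # SS 04: deduplicated result


def restore(store, pointers) -> bytes:
    """Rebuild the original byte string by following the pointers."""
    return b"".join(store[digest] for digest in pointers)
```

In this sketch a 16-byte input made of the blocks `ABCD`, `ABCD`, `EFGH`, `ABCD` is reduced to two stored blocks and four pointers, and `restore` reproduces the original bytes.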

Further, the data layout comprises the following steps:

S1: the processor automatically acquires, through the data reference monitoring device, the data information reference rate obtained by the data deduplication module while the data information is deduplicated;

S2: the processor compares the data information reference rate with a preset reference rate value, and stores the data information by adopting a replication-based data distribution strategy when the data information reference rate is greater than the preset value;

S3: when the data information reference rate is less than or equal to the preset reference rate value, an erasure-code-based data layout strategy is adopted;

S4: the data information is stored into the first cloud storage device, the second cloud storage device and the third cloud storage device.
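The comparison in steps S2 and S3 amounts to a single threshold test. The following Python sketch shows that decision only; the threshold value `0.5` and the names `Layout`, `REFERENCE_RATE_THRESHOLD` and `choose_layout` are hypothetical, since the patent does not state a concrete preset value.

```python
from enum import Enum


class Layout(Enum):
    REPLICATION = "replication"      # replication-based data distribution strategy
    ERASURE_CODE = "erasure_code"    # erasure-code-based data layout strategy


REFERENCE_RATE_THRESHOLD = 0.5  # hypothetical preset value, not given in the source


def choose_layout(reference_rate: float,
                  threshold: float = REFERENCE_RATE_THRESHOLD) -> Layout:
    """Pick a layout strategy from the reference rate (steps S2 and S3)."""
    if reference_rate > threshold:       # S2: strictly greater than the preset value
        return Layout.REPLICATION
    return Layout.ERASURE_CODE           # S3: less than or equal to the preset value
```

Note that a reference rate exactly equal to the preset value falls into the erasure-code branch, matching the "less than or equal to" wording of step S3.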

Further, the block-level data chunking detection technique adopts fixed-length data chunking.

The invention has the following beneficial effects:

1. Through the arrangement of the data deduplication module, data can be processed to obtain deduplicated data information, and the data reference rate is obtained through the data deduplication module. The data reference rate is used to judge whether a piece of data information is hot-spot information or general information. Hot-spot information, which users access frequently, is stored with a replication-based data distribution strategy, which reduces the operation steps required of the processor; general information can be stored with an erasure-code-based data layout strategy, which gives higher storage efficiency.

2. Through the arrangement of the plurality of cloud storage devices, information is stored safely to the greatest extent, loss of data stored on a cloud disk is avoided, and users' information security is protected to the greatest extent; the invention is simple, effective and easy to use.
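The storage-efficiency trade-off between the two strategies can be made concrete with a short calculation. The sketch below assumes full replication on all three devices and a (k = 2, m = 1) erasure code, meaning two data fragments plus one parity fragment; these parameters and the function names are illustrative assumptions, as the patent does not specify them.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when every device holds a full copy."""
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with k data and m parity fragments."""
    return (k + m) / k


# Assumed parameters: three full copies versus a 2+1 erasure code on three devices.
three_copies = replication_overhead(3)   # 3.0
two_plus_one = erasure_overhead(2, 1)    # 1.5
```

Under these assumptions replication stores three times the user data but can serve a read from any one device without decoding, while the erasure code stores 1.5 times the user data at the cost of extra computation when a fragment has to be reconstructed.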

Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a data dispersed storage system based on cloud storage according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to Fig. 1, the present invention is a data dispersed storage system based on cloud storage, including a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device, and a third cloud storage device. The client uploads the data information to be stored to the data deduplication module. The data deduplication module receives the data information uploaded by the client, removes duplicates from it through data deduplication, and transmits the deduplicated data information to the processor. The processor is electrically connected with the data reference monitoring device, and the data reference monitoring device is electrically connected with the data deduplication module. The data reference monitoring device automatically acquires the reference rate of the data information and transmits it to the processor. The processor receives the deduplicated data information transmitted by the data deduplication module and the data information reference rate transmitted by the data reference monitoring device, performs data layout on the deduplicated data information, and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively.

The data deduplication comprises the following steps:

SS 01: automatically retrieving all data and dividing it into blocks;

SS 02: automatically identifying duplicate data information within the data information by adopting a block-level data chunking detection technique;

SS 03: deleting the duplicate data information while keeping a single copy of it, and replacing the other duplicate copies with pointers to that single copy;

SS 04: obtaining the deduplicated data information.
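The patent does not define how the reference rate is computed during deduplication. One possible reading, shown here purely as an assumption, is the share of all block pointers that resolve to a given kept copy. The function name `reference_rates` and this definition are hypothetical.

```python
from collections import Counter


def reference_rates(pointers):
    """Return, for each kept block, the fraction of all pointers that reference it.

    `pointers` is the ordered list of block identifiers left after deduplication,
    one entry per original block. This definition of "reference rate" is an
    assumption made for illustration; the source does not specify a formula.
    """
    counts = Counter(pointers)
    total = len(pointers)
    return {block_id: n / total for block_id, n in counts.items()}
```

For example, with pointers `["a", "b", "a", "a"]` the block `a` receives a rate of 0.75 and the block `b` a rate of 0.25 under this reading.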

The data layout comprises the following steps:

S1: the processor automatically acquires, through the data reference monitoring device, the data information reference rate obtained by the data deduplication module while the data information is deduplicated;

S2: the processor compares the data information reference rate with a preset reference rate value, and stores the data information by adopting a replication-based data distribution strategy when the data information reference rate is greater than the preset value;

S3: when the data information reference rate is less than or equal to the preset reference rate value, an erasure-code-based data layout strategy is adopted;

S4: the data information is stored into the first cloud storage device, the second cloud storage device and the third cloud storage device.
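As an illustration of an erasure-code-based layout over three devices, the sketch below uses the simplest such code: two data fragments plus one XOR parity fragment, which tolerates the loss of any single fragment. The patent does not name a specific erasure code, so the single-parity scheme and the names `encode`, `decode` and `xor_bytes` are assumptions made for this example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes):
    """Split data into two halves and add an XOR parity fragment.

    Returns three fragments, one per cloud storage device. Odd-length input
    is padded with one zero byte; the caller keeps the original length.
    """
    if len(data) % 2:
        data += b"\x00"
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return [d0, d1, xor_bytes(d0, d1)]


def decode(fragments, length: int) -> bytes:
    """Rebuild the data from three fragments, at most one of which is None."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("single-parity code can recover only one lost fragment")
    d0, d1, parity = fragments
    if d0 is None:
        d0 = xor_bytes(d1, parity)   # d0 = d1 XOR parity
    elif d1 is None:
        d1 = xor_bytes(d0, parity)   # d1 = d0 XOR parity
    return (d0 + d1)[:length]        # drop any padding byte
```

In this sketch each device holds half the data size, so the three devices together store 1.5 times the original, and losing any one device still allows `decode` to return the original bytes.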

The block-level data chunking detection technique adopts fixed-length data chunking.

In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A data dispersed storage system based on cloud storage, characterized by comprising a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device;

the client uploads data information required to be stored to the data deduplication module; the data deduplication module receives the data information uploaded by the client, deduplicates the data information through data deduplication, and transmits the deduplicated data information to the processor;

the processor is electrically connected with the data reference monitoring device, and the data reference monitoring device is electrically connected with the data deduplication module; the data reference monitoring device is used for automatically acquiring the reference rate of the data information and transmitting the reference rate to the processor;

the processor receives the deduplicated data information transmitted by the data deduplication module; the processor receives the data information reference rate transmitted by the data reference monitoring device, performs data layout on the deduplicated data information, and stores the data information into the first cloud storage device, the second cloud storage device and the third cloud storage device respectively;

the data de-duplication comprises the following steps:

SS 01: automatically retrieving all data and partitioning;

SS 02: automatically identifying duplicate data information within the data information by adopting a block-level data chunking detection technique;

SS 03: deleting the repeated data information, keeping a single copy of the repeated data information, and replacing other repeated copies by using a pointer pointing to the single copy;

SS 04: obtaining the deduplicated data information;

the data layout comprises the following steps:

S1: the processor automatically acquires, through the data reference monitoring device, the data information reference rate obtained by the data deduplication module while the data information is deduplicated;

S2: the processor compares the data information reference rate with a preset reference rate value, and stores the data information by adopting a replication-based data distribution strategy when the data information reference rate is greater than the preset value;

S3: when the data information reference rate is less than or equal to the preset reference rate value, an erasure-code-based data layout strategy is adopted;

S4: the data information is stored into the first cloud storage device, the second cloud storage device and the third cloud storage device.

2. The data dispersed storage system based on cloud storage according to claim 1, wherein the block-level data chunking detection technique adopts fixed-length data chunking.