
CN107977168B - Data dispersed storage system based on cloud storage - Google Patents

Data dispersed storage system based on cloud storage

Info

Publication number
CN107977168B
CN107977168B CN201711351926.5A CN201711351926A CN107977168B CN 107977168 B CN107977168 B CN 107977168B CN 201711351926 A CN201711351926 A CN 201711351926A CN 107977168 B CN107977168 B CN 107977168B
Authority
CN
China
Prior art keywords
data
data information
cloud storage
storage device
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711351926.5A
Other languages
Chinese (zh)
Other versions
CN107977168A (en)
Inventor
黄仁高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Shuan System Integration Co.,Ltd.
Original Assignee
Anhui Changtai Information Security Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Changtai Information Security Service Co ltd
Priority to CN201711351926.5A
Publication of CN107977168A
Application granted
Publication of CN107977168B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data dispersed storage system based on cloud storage and relates to the technical field of data storage. The system comprises a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device. The data deduplication module receives the data information uploaded by the client and deduplicates it through data deduplication; the processor performs data layout on the deduplicated data information and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively. The invention processes the data to obtain deduplicated data information and a data reference rate, and judges the attribute of each piece of data information from its reference rate. Storing data information with a replication-based data distribution strategy reduces the operation steps required of the processor, and a data layout strategy based on erasure codes achieves higher storage efficiency.

Figure 201711351926

Description

Data dispersed storage system based on cloud storage
Technical Field
The invention belongs to the technical field of data storage, and particularly relates to a data dispersed storage system based on cloud storage.
Background
With the explosive growth of data, how a data storage system can efficiently query and write massive data has become a research focus in the field of data storage. A data storage system comprises various storage devices for storing programs and data, a control unit, a device for managing information scheduling, and processing algorithms. As more and more data is stored, the storage space of storage systems keeps growing, and the processing performance demanded of them keeps rising.
At present, the main data storage approach is to set up a large database dedicated to storing mass data. Although a large database can meet the high storage capacity that mass data requires, the efficiency of querying and writing particular data in it drops sharply, so data processing efficiency is sacrificed. A data dispersed storage system based on cloud storage is therefore provided to solve these problems.
Disclosure of Invention
The invention aims to provide a data dispersed storage system based on cloud storage which, through the arrangement of a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device, solves the problems of low data processing efficiency and poor security that existing mass data storage suffers under high storage capacity.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The invention relates to a data dispersed storage system based on cloud storage, which comprises a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device; the client uploads the data information to be stored to the data deduplication module; the data deduplication module receives the data information uploaded by the client, deduplicates it through data deduplication, and transmits the deduplicated data information to the processor; the processor is electrically connected with the data reference monitoring device, and the data reference monitoring device is electrically connected with the data deduplication module; the data reference monitoring device automatically acquires the reference rate of the data information and transmits it to the processor; the processor receives the deduplicated data information transmitted by the data deduplication module; the processor receives the reference rate of the data information transmitted by the data reference monitoring device, performs data layout on the deduplicated data information, and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively.
Further, the data deduplication comprises the following steps: SS01: automatically retrieving all the data and partitioning it into blocks; SS02: automatically identifying duplicate data information within the data information using a block-level data blocking monitoring technique; SS03: deleting the duplicate data information, keeping a single copy of it, and replacing the other duplicate copies with pointers to that single copy; SS04: obtaining the deduplicated data information.
Further, the data layout comprises the following steps: S1: the processor automatically acquires, through the data reference monitoring device, the reference rate of the data information obtained by the data deduplication module during deduplication; S2: the processor compares the reference rate of the data information with a preset reference rate value, and when the reference rate is greater than the preset value, stores the data information using a replication-based data distribution strategy; S3: when the reference rate of the data information is less than or equal to the preset value, a data layout strategy based on erasure codes is used; S4: the data information is stored in the first cloud storage device, the second cloud storage device and the third cloud storage device.
Further, the block-level data blocking monitoring technique uses fixed-length data blocking.
The invention has the following beneficial effects:
1. Through the data deduplication module, data is processed to obtain deduplicated data information, and the data reference rate is obtained through the same module. The reference rate is used to judge whether a piece of data information is hotspot information or general information. Hotspot information, which users access frequently, is stored using a replication-based data distribution strategy, which reduces the operation steps required of the processor; general information can be stored using a data layout strategy based on erasure codes, which achieves higher storage efficiency.
2. Through the plurality of cloud storage devices, information is stored securely to the greatest extent, loss of data stored on a cloud disk is avoided, and users' information security is protected to the greatest extent; the invention is simple, effective and easy to use.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
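The storage-efficiency difference between the two strategies named in the first beneficial effect can be illustrated with a short calculation. This is only a sketch: the patent does not state how many replicas are kept or which erasure code is used, so the values of three full copies and a (2 data + 1 parity) code across the three cloud storage devices, and both function names, are assumptions made for illustration.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when `copies` full copies are kept."""
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte for a code with k data and m parity shards."""
    return (k + m) / k


# Assumed configuration: one shard or copy per cloud storage device.
print(replication_overhead(3))  # 3.0
print(erasure_overhead(2, 1))   # 1.5
```

Under these assumed parameters replication stores twice as many raw bytes as the erasure code, but a replica can be read back directly without any decoding step, which is consistent with the patent reserving replication for frequently accessed data.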
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a data dispersed storage system based on cloud storage according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention is a data dispersed storage system based on cloud storage, including a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device, and a third cloud storage device; the client uploads the data information to be stored to the data deduplication module; the data deduplication module receives the data information uploaded by the client, deduplicates it through data deduplication, and transmits the deduplicated data information to the processor; the processor is electrically connected with the data reference monitoring device, and the data reference monitoring device is electrically connected with the data deduplication module; the data reference monitoring device automatically acquires the reference rate of the data information and transmits it to the processor; the processor receives the deduplicated data information transmitted by the data deduplication module; the processor receives the reference rate of the data information transmitted by the data reference monitoring device, performs data layout on the deduplicated data information, and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively.
The data deduplication comprises the following steps: SS01: automatically retrieving all the data and partitioning it into blocks; SS02: automatically identifying duplicate data information within the data information using a block-level data blocking monitoring technique; SS03: deleting the duplicate data information, keeping a single copy of it, and replacing the other duplicate copies with pointers to that single copy; SS04: obtaining the deduplicated data information.
The data layout comprises the following steps: S1: the processor automatically acquires, through the data reference monitoring device, the reference rate of the data information obtained by the data deduplication module during deduplication; S2: the processor compares the reference rate of the data information with a preset reference rate value, and when the reference rate is greater than the preset value, stores the data information using a replication-based data distribution strategy; S3: when the reference rate of the data information is less than or equal to the preset value, a data layout strategy based on erasure codes is used; S4: the data information is stored in the first cloud storage device, the second cloud storage device and the third cloud storage device.
The block-level data blocking monitoring technique uses fixed-length data blocking.
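Steps SS01 to SS04 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the patent does not say how duplicate blocks are recognized, so the use of a SHA-256 fingerprint, the 4-byte block size, and the names `deduplicate` and `restore` are all assumptions.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4):
    """Return (store, pointers) for `data`.

    store    maps a block fingerprint to the single kept copy of that block.
    pointers lists, in original order, the fingerprint each block refers to.
    """
    store = {}
    pointers = []
    for offset in range(0, len(data), block_size):       # SS01: partition
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()  # SS02: identify duplicates
        if fingerprint not in store:                     # SS03: keep a single copy
            store[fingerprint] = block
        pointers.append(fingerprint)                     # SS03: pointer to that copy
    return store, pointers                               # SS04: deduplicated data


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


data = b"AAAABBBBAAAACCCCAAAA"
store, pointers = deduplicate(data)
print(len(store), len(pointers))  # 3 5
assert restore(store, pointers) == data
```

Five blocks are reduced to three stored blocks. Counting how often each fingerprint appears in `pointers` would be one possible way to derive a per-block reference figure during deduplication, although the patent does not define how the reference rate is computed.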
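The decision in steps S2 to S4 can be sketched as below. The patent names the two strategies but not their parameters, so this example assumes three full copies for replication and a simple XOR parity code (two data shards plus one parity shard) as the erasure code; the function names and the threshold values in the usage lines are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def layout(block: bytes, reference_rate: float, threshold: float):
    """Return (strategy, shards) with one shard per cloud storage device."""
    if reference_rate > threshold:             # S2: hot data, replicate
        return "replication", [block, block, block]
    # S3: otherwise erasure coding (assumed: 2 data shards + 1 XOR parity shard)
    half = (len(block) + 1) // 2
    first = block[:half]
    second = block[half:].ljust(half, b"\0")   # pad so both halves have equal length
    return "erasure", [first, second, xor_bytes(first, second)]


def recover_erasure(shards, length: int) -> bytes:
    """Rebuild the block when at most one of the three shards is None."""
    first, second, parity = shards
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return (first + second)[:length]           # drop the padding


strategy, shards = layout(b"hello", reference_rate=0.1, threshold=0.5)
print(strategy)                                               # erasure
print(recover_erasure([None, shards[1], shards[2]], 5))       # b'hello'
```

With this assumed code, the loss of any single storage device can be tolerated in both branches, while the erasure branch stores roughly half as many bytes as the replication branch at the cost of an XOR pass on write and on degraded reads.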
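Fixed-length blocking is simple to compute, but its block boundaries depend on byte offsets. The short sketch below, with a hypothetical helper name and a block size chosen only for illustration, shows how a single byte inserted at the front of otherwise identical data shifts every boundary.

```python
def fixed_length_blocks(data: bytes, block_size: int):
    """Split data into consecutive blocks of block_size bytes (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


original = b"ABCDABCDABCD"
shifted = b"x" + original  # same content with one byte inserted at the front

print(fixed_length_blocks(original, 4))  # [b'ABCD', b'ABCD', b'ABCD']
print(fixed_length_blocks(shifted, 4))   # [b'xABC', b'DABC', b'DABC', b'D']
```

None of the blocks of the shifted data match a block of the original, so a deduplicator based on fixed-length blocks would find no duplicates between the two inputs. This is a general property of fixed-length blocking and is not something the patent itself discusses.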
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (2)

1. A data dispersed storage system based on cloud storage, characterized by comprising a client, a data deduplication module, a processor, a data reference monitoring device, a first cloud storage device, a second cloud storage device and a third cloud storage device;
the client uploads the data information to be stored to the data deduplication module; the data deduplication module receives the data information uploaded by the client, deduplicates it through data deduplication, and transmits the deduplicated data information to the processor;
the processor is electrically connected with the data reference monitoring device, and the data reference monitoring device is electrically connected with the data deduplication module; the data reference monitoring device automatically acquires the reference rate of the data information and transmits it to the processor;
the processor receives the deduplicated data information transmitted by the data deduplication module; the processor receives the reference rate of the data information transmitted by the data reference monitoring device, performs data layout on the deduplicated data information, and stores it in the first cloud storage device, the second cloud storage device and the third cloud storage device respectively;
the data deduplication comprises the following steps:
SS01: automatically retrieving all the data and partitioning it into blocks;
SS02: automatically identifying duplicate data information within the data information using a block-level data blocking monitoring technique;
SS03: deleting the duplicate data information, keeping a single copy of it, and replacing the other duplicate copies with pointers to that single copy;
SS04: obtaining the deduplicated data information;
the data layout comprises the following steps:
S1: the processor automatically acquires, through the data reference monitoring device, the reference rate of the data information obtained by the data deduplication module during deduplication;
S2: the processor compares the reference rate of the data information with a preset reference rate value, and when the reference rate is greater than the preset value, stores the data information using a replication-based data distribution strategy;
S3: when the reference rate of the data information is less than or equal to the preset value, a data layout strategy based on erasure codes is used;
S4: the data information is stored in the first cloud storage device, the second cloud storage device and the third cloud storage device.
2. The data dispersed storage system based on cloud storage according to claim 1, wherein the block-level data blocking monitoring technique uses fixed-length data blocking.
CN201711351926.5A 2017-12-15 2017-12-15 Data dispersed storage system based on cloud storage Active CN107977168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711351926.5A CN107977168B (en) 2017-12-15 2017-12-15 Data dispersed storage system based on cloud storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711351926.5A CN107977168B (en) 2017-12-15 2017-12-15 Data dispersed storage system based on cloud storage

Publications (2)

Publication Number Publication Date
CN107977168A CN107977168A (en) 2018-05-01
CN107977168B CN107977168B (en) 2021-01-01

Family

ID=62006457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711351926.5A Active CN107977168B (en) 2017-12-15 2017-12-15 Data dispersed storage system based on cloud storage

Country Status (1)

Country Link
CN (1) CN107977168B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491591A (en) * 2018-09-17 2019-03-19 广东工业大学 A kind of information diffusion method suitable for cloudy storage system
CN110618968A (en) * 2019-08-13 2019-12-27 数字视觉云(北京)科技发展有限公司 Media asset storage system based on ipfs

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678158A (en) * 2013-12-26 2014-03-26 中国科学院信息工程研究所 Optimization method and system for data layout
CN104932841A (en) * 2015-06-17 2015-09-23 南京邮电大学 Saving type duplicated data deleting method in cloud storage system
CN106020722A (en) * 2016-05-19 2016-10-12 浪潮(北京)电子信息产业有限公司 Method, device and system for deduplication of repeated data of cloud storage system
CN107463334A (en) * 2016-06-03 2017-12-12 三星电子株式会社 System and method for providing expansible and contractile memory overload configuration

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8082228B2 (en) * 2008-10-31 2011-12-20 Netapp, Inc. Remote office duplication

Also Published As

Publication number Publication date
CN107977168A (en) 2018-05-01

Similar Documents

Publication Publication Date Title
US12067256B2 (en) Storage space optimization in a system with varying data redundancy schemes
US12366958B2 (en) System and method for granular deduplication
CN104932841B (en) Economizing type data de-duplication method in a kind of cloud storage system
US9141633B1 (en) Special markers to optimize access control list (ACL) data for deduplication
US8949208B1 (en) System and method for bulk data movement between storage tiers
US9715434B1 (en) System and method for estimating storage space needed to store data migrated from a source storage to a target storage
CN102667709B (en) Systems and methods for providing long-term storage of data
CN103635900B (en) Time-based data partitioning
US8909605B1 (en) Method and system for accelerating data movement using change information concerning difference between current and previous data movements
US10339112B1 (en) Restoring data in deduplicated storage
US11372576B2 (en) Data processing apparatus, non-transitory computer-readable storage medium, and data processing method
CN102576321B (en) Performance storage system in fast photographic system for capacity optimizing memory system performance improvement
Mao et al. Leveraging data deduplication to improve the performance of primary storage systems in the cloud
KR101533340B1 (en) A method of data replication using data access frequency and erasure codes in cloud storage system
WO2014125582A1 (en) Storage device and data management method
US10229127B1 (en) Method and system for locality based cache flushing for file system namespace in a deduplicating storage system
CN103562914A (en) Resource efficient scale-out file systems
CN105095027A (en) Data backup method and apparatus
CN106020722A (en) Method, device and system for deduplication of repeated data of cloud storage system
CN103049508B (en) A kind of data processing method and device
US10649682B1 (en) Focused sanitization process for deduplicated storage systems
CN108090125A (en) A kind of data de-duplication method and device of non-query formulation
CN107977168B (en) Data dispersed storage system based on cloud storage
CN107066503A (en) The method and device of magnanimity metadata burst distribution
CN102722450B (en) Storage method for redundancy deletion block device based on location-sensitive hash

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 230000 floors 4-5, building A1, Zhongguancun collaborative innovation Zhihui Park, the intersection of Nanfeihe road and Lanzhou Road, Baohe Economic Development Zone, Hefei, Anhui Province

Patentee after: Anhui Changtai Technology Co.,Ltd.

Address before: 210-d16, building A3, Hefei Innovation Industrial Park, No. 800, Wangjiang West Road, high tech Zone, Hefei City, Anhui Province 230000

Patentee before: ANHUI CHANGTAI INFORMATION SECURITY SERVICE Co.,Ltd.

CP03 Change of name, title or address

Address after: 230000 floors 4-5, building A1, Zhongguancun collaborative innovation Zhihui Park, the intersection of Nanfeihe road and Lanzhou Road, Baohe Economic Development Zone, Hefei, Anhui Province

Patentee after: Anhui Shuan System Integration Co.,Ltd.

Country or region after: China

Address before: 230000 floors 4-5, building A1, Zhongguancun collaborative innovation Zhihui Park, the intersection of Nanfeihe road and Lanzhou Road, Baohe Economic Development Zone, Hefei, Anhui Province

Patentee before: Anhui Changtai Technology Co.,Ltd.

Country or region before: China
