IN2012KO01022A - - Google Patents

Info

Publication number
IN2012KO01022A
Authority
IN
India
Prior art keywords
chunks
data
cdc
level
size
Prior art date
Application number
Inventor
Subhra CHAKRABORTY Rajat
Kishore DIDDI Bhanu
Original Assignee
Indian Inst Technology Kharagpur
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Indian Inst Technology Kharagpur
Priority to IN1022KO2012 (IN2012KO01022A)
Priority to US13/885,395 (US9311323B2)
Priority to CN201280076874.4A (CN104813310A)
Priority to PCT/IB2012/055688 (WO2014037767A1)
Publication of IN2012KO01022A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G06F16/1752 De-duplication implemented within the file system, e.g. based on file segments based on file chunks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Technologies are presented for data deduplication that operates at relatively high throughput and uses relatively less storage space than conventional techniques. Building upon content-dependent chunking (CDC) using Rabin fingerprints, data may be fingerprinted and stored in variable-size chunks. In some examples, data may be chunked on multiple levels, for example two levels, with variable-size large chunks in the first level and fixed-size sub-chunks in the second level, so that sub-chunks common to two or more data chunks do not escape deduplication. For example, at a first level, a CDC algorithm may be employed to fingerprint and chunk data in content-dependent (variable) sizes, and at a second level the CDC chunks may be sliced into small fixed-size chunks. The sliced CDC chunks may then be used for deduplication.
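
The two-level scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: the function names (cdc_chunks, sub_chunks, deduplicate, restore), the window, mask and size constants, the use of a simple polynomial rolling hash as a stand-in for a true Rabin fingerprint, and the use of SHA-256 to index sub-chunks.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the patent.
WINDOW = 16           # bytes covered by the rolling hash
BASE = 257            # polynomial base of the rolling hash
MOD = (1 << 31) - 1   # modulus of the rolling hash
MASK = 0x3F           # cut where the low 6 hash bits are zero
MIN_CHUNK = 32        # smallest level-1 chunk, in bytes
MAX_CHUNK = 256       # largest level-1 chunk, in bytes
SUB_CHUNK = 16        # fixed level-2 sub-chunk size, in bytes


def cdc_chunks(data: bytes) -> list:
    """Level 1: cut data into variable-size chunks at content-defined points.

    A polynomial rolling hash over the last WINDOW bytes stands in for a
    Rabin fingerprint (an assumption of this sketch).
    """
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        pos = i - start               # offset of this byte in the current chunk
        if pos >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # take in the new byte
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0                     # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks


def sub_chunks(chunk: bytes, size: int = SUB_CHUNK) -> list:
    """Level 2: slice one CDC chunk into fixed-size sub-chunks."""
    return [chunk[i:i + size] for i in range(0, len(chunk), size)]


def deduplicate(data: bytes, store: dict) -> list:
    """Store each distinct sub-chunk once; return the keys that rebuild data."""
    recipe = []
    for chunk in cdc_chunks(data):
        for sub in sub_chunks(chunk):
            key = hashlib.sha256(sub).digest()  # SHA-256 index is an assumption
            store.setdefault(key, sub)          # keep only the first copy
            recipe.append(key)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original bytes from a recipe and the sub-chunk store."""
    return b"".join(store[key] for key in recipe)
```

Calling deduplicate(data, store) with an initially empty dictionary fills the store with unique sub-chunks and returns a recipe, and restore(recipe, store) returns the original bytes. Resetting the hash at each cut keeps every boundary decision dependent only on the bytes of the current chunk, which is one simple way to get content-dependent boundaries; a production design would tune the window, mask and size limits to the workload.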
IN1022KO2012 2012-09-05 2012-10-18 IN2012KO01022A (en)

Priority Applications (4)

Application Number   Publication           Priority Date   Filing Date   Title
IN1022KO2012         IN2012KO01022A (en)   2012-09-05      2012-10-18
US13/885,395         US9311323B2 (en)      2012-09-05      2012-10-18    Multi-level inline data deduplication
CN201280076874.4A    CN104813310A (en)     2012-09-05      2012-10-18    Multi-level inline data deduplication
PCT/IB2012/055688    WO2014037767A1 (en)   2012-09-05      2012-10-18    Multi-level inline data deduplication

Applications Claiming Priority (1)

Application Number   Publication           Priority Date   Filing Date   Title
IN1022KO2012         IN2012KO01022A (en)   2012-09-05      2012-10-18

Publications (1)

Publication Number Publication Date
IN2012KO01022A (en) 2015-06-05

Family

ID=50236597

Family Applications (1)

Application Number   Publication           Priority Date   Filing Date   Title
IN1022KO2012         IN2012KO01022A (en)   2012-09-05      2012-10-18

Country Status (4)

Country Link
US (1) US9311323B2 (en)
CN (1) CN104813310A (en)
IN (1) IN2012KO01022A (en)
WO (1) WO2014037767A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9424285B1 (en) * 2012-12-12 2016-08-23 Netapp, Inc. Content-based sampling for deduplication estimation
US9465808B1 (en) * 2012-12-15 2016-10-11 Veritas Technologies Llc Deduplication featuring variable-size duplicate data detection and fixed-size data segment sharing
EP3026585A4 (en) * 2014-02-14 2017-04-05 Huawei Technologies Co., Ltd. Server-based method for searching for data flow break point, and server
CN105446964B (en) * 2014-05-30 2019-04-26 国际商业机器公司 The method and device of data de-duplication for file
US9449012B2 (en) * 2014-05-30 2016-09-20 Apple Inc. Cloud library de-duplication
GB2542619A (en) * 2015-09-28 2017-03-29 Fujitsu Ltd A similarity module, a local computer, a server of a data hosting service and associated methods
US10997119B2 (en) * 2015-10-23 2021-05-04 Nutanix, Inc. Reduced size extent identification
CN105808169A (en) * 2016-03-14 2016-07-27 联想(北京)有限公司 Data deduplication method, apparatus and system
US10235396B2 (en) 2016-08-29 2019-03-19 International Business Machines Corporation Workload optimized data deduplication using ghost fingerprints
JP6841024B2 (en) * 2016-12-09 2021-03-10 富士通株式会社 Data processing equipment, data processing programs and data processing methods
US10621144B2 (en) 2017-03-23 2020-04-14 International Business Machines Corporation Parallel deduplication using automatic chunk sizing
US10325021B2 (en) * 2017-06-19 2019-06-18 GM Global Technology Operations LLC Phrase extraction text analysis method and system
US10747729B2 (en) 2017-09-01 2020-08-18 Microsoft Technology Licensing, Llc Device specific chunked hash size tuning
US10289335B2 (en) * 2017-09-12 2019-05-14 International Business Machines Corporation Tape drive library integrated memory deduplication
US10372681B2 (en) 2017-09-12 2019-08-06 International Business Machines Corporation Tape drive memory deduplication
US10678778B1 (en) * 2017-10-19 2020-06-09 EMC IP Holding Company LLC Date deduplication acceleration
CN108427538B (en) * 2018-03-15 2021-06-04 深信服科技股份有限公司 Storage data compression method and device of full flash memory array and readable storage medium
US11079954B2 (en) * 2018-08-21 2021-08-03 Samsung Electronics Co., Ltd. Embedded reference counter and special data pattern auto-detect
US10248646B1 (en) 2018-08-22 2019-04-02 Cognigo Research Ltd. Token matching in large document corpora
CN111291770B (en) * 2018-12-06 2023-07-25 华为技术有限公司 Parameter configuration method and device
JP7295422B2 (en) * 2019-09-10 2023-06-21 富士通株式会社 Information processing device and information processing program
US11119995B2 (en) 2019-12-18 2021-09-14 Ndata, Inc. Systems and methods for sketch computation
WO2021127245A1 (en) * 2019-12-18 2021-06-24 Ndata, Inc. Systems and methods for sketch computation
US10938961B1 (en) 2019-12-18 2021-03-02 Ndata, Inc. Systems and methods for data deduplication by generating similarity metrics using sketch computation
WO2022135658A1 (en) * 2020-12-21 2022-06-30 Huawei Technologies Co., Ltd. Method and system of storing data to data storage for variable size deduplication
US20230221864A1 (en) * 2022-01-10 2023-07-13 Vmware, Inc. Efficient inline block-level deduplication using a bloom filter and a small in-memory deduplication hash table

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829355B2 (en) * 2001-03-05 2004-12-07 The United States Of America As Represented By The National Security Agency Device for and method of one-way cryptographic hashing
US7836387B1 (en) 2005-04-29 2010-11-16 Oracle America, Inc. System and method for protecting data across protection domain boundaries
US8527482B2 (en) * 2008-06-06 2013-09-03 Chrysalis Storage, Llc Method for reducing redundancy between two or more datasets
US8161255B2 (en) * 2009-01-06 2012-04-17 International Business Machines Corporation Optimized simultaneous storing of data into deduplicated and non-deduplicated storage pools
US8321648B2 (en) 2009-10-26 2012-11-27 Netapp, Inc Use of similarity hash to route data for improved deduplication in a storage server cluster
US9401967B2 (en) * 2010-06-09 2016-07-26 Brocade Communications Systems, Inc. Inline wire speed deduplication system
US20120053970A1 (en) * 2010-08-25 2012-03-01 International Business Machines Corporation Systems and methods for dynamic composition of business processes
CA2809224C (en) * 2010-08-31 2016-05-17 Nec Corporation Storage system
US20120089579A1 (en) * 2010-10-08 2012-04-12 Sandeep Ranade Compression pipeline for storing data in a storage cloud
CN102082575A (en) * 2010-12-14 2011-06-01 江苏格物信息科技有限公司 Method for removing repeated data based on pre-blocking and sliding window
WO2012112121A1 (en) * 2011-02-17 2012-08-23 Jitcomm Networks Pte Ltd Parallel data partitioning
CN102253820B (en) * 2011-06-16 2013-03-20 华中科技大学 Stream type repetitive data detection method

Also Published As

Publication number Publication date
WO2014037767A1 (en) 2014-03-13
US20140114934A1 (en) 2014-04-24
CN104813310A (en) 2015-07-29
US9311323B2 (en) 2016-04-12

Similar Documents

Publication Publication Date Title
IN2012KO01022A (en)
Martín-Fernández et al. Model-based replacement of rounded zeros in compositional data: classical and robust approaches
WO2014105447A3 (en) Backup user interface
EP3876141A4 (en) Object detection method, related device and computer storage medium
EP3780482A4 (en) Quantum key distribution method, device and storage medium
WO2014001568A3 (en) Method and apparatus for realizing a dynamically typed file or object system enabling a user to perform calculations over the fields associated with the files or objects in the system
WO2012125314A3 (en) Backup and restore strategies for data deduplication
GB201302917D0 (en) Hybrid backup and restore of very large file system using metadata image backup and traditional backup
WO2014150277A3 (en) Methods and systems for providing secure transactions
WO2013019869A3 (en) Data fingerpringting for copy accuracy assurance
WO2012092212A3 (en) Using index partitioning and reconciliation for data deduplication
WO2013169997A3 (en) Systems and methods for distributed storage
WO2010019596A3 (en) Scalable deduplication of stored data
WO2014159781A3 (en) Caching content addressable data chunks for storage virtualization
GB2508325A (en) Scalable deduplication system with small blocks
WO2012083267A3 (en) Garbage collection and hotspots relief for a data deduplication chunk store
EP3401798A4 (en) Push information rough selection sorting method, device and computer storage medium
WO2019006454A8 (en) Methods, systems, and media for controlling append-only file rewrites
WO2013187901A3 (en) Data deduplication management
GB201307395D0 (en) Systems and methods for storing and verifying security information
WO2019228574A3 (en) Log-structured storage systems
IN2014DN06811A (en)
CA2839078C (en) Virtual storage system and methods of copying electronic documents into the virtual storage system
WO2015001058A3 (en) Method and device for de-blending seismic data using source signature
WO2013068530A3 (en) Logically and end-user-specific physically storing an electronic file