IN2012KO01022A - Google Patents
Info
- Publication number
- IN2012KO01022A
- Authority
- IN
- India
- Prior art keywords
- chunks
- data
- cdc
- level
- size
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0613—Improving I/O performance in relation to throughput
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0641—De-duplication techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Technologies are presented for data deduplication that operates at relatively high throughput and uses less storage space than conventional techniques. Building upon content-dependent chunking (CDC) using Rabin fingerprints, data may be fingerprinted and stored in variable-size chunks. In some examples, data may be chunked on multiple levels, for example two levels: variable-size large chunks in the first level and fixed-size sub-chunks in the second level, so that sub-chunks common to two or more data chunks are also deduplicated. For example, at a first level a CDC algorithm may be employed to fingerprint and chunk data in content-dependent (variable) sizes, and at a second level the CDC chunks may be sliced into small fixed-size chunks. The sliced CDC chunks may then be used for deduplication.
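As a rough illustration of the two-level scheme described in the abstract, the Python sketch below chunks data by content at the first level, slices each chunk into fixed-size sub-chunks at the second level, and deduplicates the sub-chunks by hash. It is not the patented implementation: a simple polynomial rolling hash stands in for Rabin fingerprints, SHA-256 is an assumed choice of sub-chunk identifier, and every name and constant (`cdc_chunks`, `sub_chunks`, `deduplicate`, `restore`, `WINDOW`, `MASK`, `MIN_CHUNK`, `MAX_CHUNK`, `SUB_SIZE`) is hypothetical and chosen only for readability.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the patent.
WINDOW = 16            # bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut when the low 6 hash bits are zero
MIN_CHUNK = 32         # never cut before this many bytes (>= WINDOW)
MAX_CHUNK = 256        # always cut at this many bytes
SUB_SIZE = 16          # fixed size of second-level sub-chunks
BASE = 257
MOD = (1 << 31) - 1


def cdc_chunks(data: bytes):
    """Level 1: yield variable-size chunks with content-dependent boundaries.

    A polynomial rolling hash over the last WINDOW bytes is used here as a
    stand-in for a Rabin fingerprint.
    """
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start, h = 0, 0
    for i, b in enumerate(data):
        length = i - start + 1
        if length > WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        at_boundary = length >= MIN_CHUNK and (h & MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]  # trailing bytes form the last chunk


def sub_chunks(chunk: bytes, size: int = SUB_SIZE):
    """Level 2: slice one CDC chunk into fixed-size sub-chunks."""
    return [chunk[i:i + size] for i in range(0, len(chunk), size)]


def deduplicate(data: bytes, store: dict):
    """Store each distinct sub-chunk once; return the list of keys (a recipe)."""
    recipe = []
    for chunk in cdc_chunks(data):
        for sub in sub_chunks(chunk):
            key = hashlib.sha256(sub).digest()
            store.setdefault(key, sub)  # keep only the first copy
            recipe.append(key)
    return recipe


def restore(recipe, store: dict) -> bytes:
    """Rebuild the original data from a recipe and the sub-chunk store."""
    return b"".join(store[key] for key in recipe)
```

In this sketch, calling `deduplicate(data, store)` with a shared `store` dictionary for several inputs keeps one copy of every sub-chunk they have in common, and `restore(recipe, store)` reassembles any single input from its recipe. Because sub-chunks are sliced from the start of each content-defined chunk, an insertion early in a file shifts only the sub-chunks of the chunks it touches rather than every fixed-size block after it.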
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN1022KO2012 IN2012KO01022A (en) | 2012-09-05 | 2012-10-18 | |
US13/885,395 US9311323B2 (en) | 2012-09-05 | 2012-10-18 | Multi-level inline data deduplication |
CN201280076874.4A CN104813310A (en) | 2012-09-05 | 2012-10-18 | Multi-level inline data deduplication |
PCT/IB2012/055688 WO2014037767A1 (en) | 2012-09-05 | 2012-10-18 | Multi-level inline data deduplication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN1022KO2012 IN2012KO01022A (en) | 2012-09-05 | 2012-10-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
IN2012KO01022A (en) | 2015-06-05 |
Family
ID=50236597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
IN1022KO2012 IN2012KO01022A (en) | | 2012-09-05 | 2012-10-18 |
Country Status (4)
Country | Link |
---|---|
US (1) | US9311323B2 (en) |
CN (1) | CN104813310A (en) |
IN (1) | IN2012KO01022A (en) |
WO (1) | WO2014037767A1 (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9424285B1 (en) * | 2012-12-12 | 2016-08-23 | Netapp, Inc. | Content-based sampling for deduplication estimation |
US9465808B1 (en) * | 2012-12-15 | 2016-10-11 | Veritas Technologies Llc | Deduplication featuring variable-size duplicate data detection and fixed-size data segment sharing |
EP3026585A4 (en) * | 2014-02-14 | 2017-04-05 | Huawei Technologies Co., Ltd. | Server-based method for searching for data flow break point, and server |
CN105446964B (en) * | 2014-05-30 | 2019-04-26 | 国际商业机器公司 | The method and device of data de-duplication for file |
US9449012B2 (en) * | 2014-05-30 | 2016-09-20 | Apple Inc. | Cloud library de-duplication |
GB2542619A (en) * | 2015-09-28 | 2017-03-29 | Fujitsu Ltd | A similarity module, a local computer, a server of a data hosting service and associated methods |
US10997119B2 (en) * | 2015-10-23 | 2021-05-04 | Nutanix, Inc. | Reduced size extent identification |
CN105808169A (en) * | 2016-03-14 | 2016-07-27 | 联想(北京)有限公司 | Data deduplication method, apparatus and system |
US10235396B2 (en) | 2016-08-29 | 2019-03-19 | International Business Machines Corporation | Workload optimized data deduplication using ghost fingerprints |
JP6841024B2 (en) * | 2016-12-09 | 2021-03-10 | 富士通株式会社 | Data processing equipment, data processing programs and data processing methods |
US10621144B2 (en) | 2017-03-23 | 2020-04-14 | International Business Machines Corporation | Parallel deduplication using automatic chunk sizing |
US10325021B2 (en) * | 2017-06-19 | 2019-06-18 | GM Global Technology Operations LLC | Phrase extraction text analysis method and system |
US10747729B2 (en) | 2017-09-01 | 2020-08-18 | Microsoft Technology Licensing, Llc | Device specific chunked hash size tuning |
US10289335B2 (en) * | 2017-09-12 | 2019-05-14 | International Business Machines Corporation | Tape drive library integrated memory deduplication |
US10372681B2 (en) | 2017-09-12 | 2019-08-06 | International Business Machines Corporation | Tape drive memory deduplication |
US10678778B1 (en) * | 2017-10-19 | 2020-06-09 | EMC IP Holding Company LLC | Date deduplication acceleration |
CN108427538B (en) * | 2018-03-15 | 2021-06-04 | 深信服科技股份有限公司 | Storage data compression method and device of full flash memory array and readable storage medium |
US11079954B2 (en) * | 2018-08-21 | 2021-08-03 | Samsung Electronics Co., Ltd. | Embedded reference counter and special data pattern auto-detect |
US10248646B1 (en) | 2018-08-22 | 2019-04-02 | Cognigo Research Ltd. | Token matching in large document corpora |
CN111291770B (en) * | 2018-12-06 | 2023-07-25 | 华为技术有限公司 | Parameter configuration method and device |
JP7295422B2 (en) * | 2019-09-10 | 2023-06-21 | 富士通株式会社 | Information processing device and information processing program |
US11119995B2 (en) | 2019-12-18 | 2021-09-14 | Ndata, Inc. | Systems and methods for sketch computation |
WO2021127245A1 (en) * | 2019-12-18 | 2021-06-24 | Ndata, Inc. | Systems and methods for sketch computation |
US10938961B1 (en) | 2019-12-18 | 2021-03-02 | Ndata, Inc. | Systems and methods for data deduplication by generating similarity metrics using sketch computation |
WO2022135658A1 (en) * | 2020-12-21 | 2022-06-30 | Huawei Technologies Co., Ltd. | Method and system of storing data to data storage for variable size deduplication |
US20230221864A1 (en) * | 2022-01-10 | 2023-07-13 | Vmware, Inc. | Efficient inline block-level deduplication using a bloom filter and a small in-memory deduplication hash table |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6829355B2 (en) * | 2001-03-05 | 2004-12-07 | The United States Of America As Represented By The National Security Agency | Device for and method of one-way cryptographic hashing |
US7836387B1 (en) | 2005-04-29 | 2010-11-16 | Oracle America, Inc. | System and method for protecting data across protection domain boundaries |
US8527482B2 (en) * | 2008-06-06 | 2013-09-03 | Chrysalis Storage, Llc | Method for reducing redundancy between two or more datasets |
US8161255B2 (en) * | 2009-01-06 | 2012-04-17 | International Business Machines Corporation | Optimized simultaneous storing of data into deduplicated and non-deduplicated storage pools |
US8321648B2 (en) | 2009-10-26 | 2012-11-27 | Netapp, Inc | Use of similarity hash to route data for improved deduplication in a storage server cluster |
US9401967B2 (en) * | 2010-06-09 | 2016-07-26 | Brocade Communications Systems, Inc. | Inline wire speed deduplication system |
US20120053970A1 (en) * | 2010-08-25 | 2012-03-01 | International Business Machines Corporation | Systems and methods for dynamic composition of business processes |
CA2809224C (en) * | 2010-08-31 | 2016-05-17 | Nec Corporation | Storage system |
US20120089579A1 (en) * | 2010-10-08 | 2012-04-12 | Sandeep Ranade | Compression pipeline for storing data in a storage cloud |
CN102082575A (en) * | 2010-12-14 | 2011-06-01 | 江苏格物信息科技有限公司 | Method for removing repeated data based on pre-blocking and sliding window |
WO2012112121A1 (en) * | 2011-02-17 | 2012-08-23 | Jitcomm Networks Pte Ltd | Parallel data partitioning |
CN102253820B (en) * | 2011-06-16 | 2013-03-20 | 华中科技大学 | Stream type repetitive data detection method |
2012
- 2012-10-18: US application US13/885,395, published as US9311323B2 (status: not active, Expired - Fee Related)
- 2012-10-18: IN application IN1022KO2012, published as IN2012KO01022A (status: unknown)
- 2012-10-18: CN application CN201280076874.4A, published as CN104813310A (status: active, Pending)
- 2012-10-18: WO application PCT/IB2012/055688, published as WO2014037767A1 (status: active, Application Filing)
Also Published As
Publication number | Publication date |
---|---|
WO2014037767A1 (en) | 2014-03-13 |
US20140114934A1 (en) | 2014-04-24 |
CN104813310A (en) | 2015-07-29 |
US9311323B2 (en) | 2016-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
IN2012KO01022A (en) | | |
Martín-Fernández et al. | | Model-based replacement of rounded zeros in compositional data: classical and robust approaches |
WO2014105447A3 (en) | | Backup user interface |
EP3876141A4 (en) | | Object detection method, related device and computer storage medium |
EP3780482A4 (en) | | Quantum key distribution method, device and storage medium |
WO2014001568A3 (en) | | Method and apparatus for realizing a dynamically typed file or object system enabling a user to perform calculations over the fields associated with the files or objects in the system |
WO2012125314A3 (en) | | Backup and restore strategies for data deduplication |
GB201302917D0 (en) | | Hybrid backup and restore of very large file system using metadata image backup and traditional backup |
WO2014150277A3 (en) | | Methods and systems for providing secure transactions |
WO2013019869A3 (en) | | Data fingerpringting for copy accuracy assurance |
WO2012092212A3 (en) | | Using index partitioning and reconciliation for data deduplication |
WO2013169997A3 (en) | | Systems and methods for distributed storage |
WO2010019596A3 (en) | | Scalable deduplication of stored data |
WO2014159781A3 (en) | | Caching content addressable data chunks for storage virtualization |
GB2508325A (en) | | Scalable deduplication system with small blocks |
WO2012083267A3 (en) | | Garbage collection and hotspots relief for a data deduplication chunk store |
EP3401798A4 (en) | | Push information rough selection sorting method, device and computer storage medium |
WO2019006454A8 (en) | | Methods, systems, and media for controlling append-only file rewrites |
WO2013187901A3 (en) | | Data deduplication management |
GB201307395D0 (en) | | Systems and methods for storing and verifying security information |
WO2019228574A3 (en) | | Log-structured storage systems |
IN2014DN06811A (en) | | |
CA2839078C (en) | | Virtual storage system and methods of copying electronic documents into the virtual storage system |
WO2015001058A3 (en) | | Method and device for de-blending seismic data using source signature |
WO2013068530A3 (en) | | Logically and end-user-specific physically storing an electronic file |