EP3430515B1 - Data management and security for distributed storage systems - Google Patents
Data management and security for distributed storage systems
- Publication number
- EP3430515B1 (EP17718631.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- file
- data
- metadata
- segment
- files
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0056—Systems characterized by the type of code used
- H04L1/0057—Block codes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
- G06F11/1088—Reconstruction on already foreseen single or plurality of spare disks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0253—Garbage collection, i.e. reclamation of unreferenced memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
- G06F12/0848—Partitioned cache, e.g. separate instruction and operand caches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1408—Protection against unauthorised use of memory or access to memory by using cryptography
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1458—Protection against unauthorised use of memory or access to memory by checking the subject access rights
- G06F12/1466—Key-lock mechanism
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
- G06F16/1752—De-duplication implemented within the file system, e.g. based on file segments based on file chunks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/13—Linear codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/13—Linear codes
- H03M13/15—Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
- H03M13/151—Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes using error location or error correction polynomials
- H03M13/1515—Reed-Solomon codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/29—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
- H03M13/2906—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes using block codes
- H03M13/2921—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes using block codes wherein error correction coding involves a diagonal direction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/61—Aspects and characteristics of methods and arrangements for error correction or error detection, not provided for otherwise
- H03M13/615—Use of computational or mathematical techniques
- H03M13/616—Matrix operations, especially for generator matrices or check matrices, e.g. column or row permutations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
- H04L9/0863—Generation of secret information including derivation or calculation of cryptographic keys or passwords involving passwords or one-time passwords
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0894—Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3226—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1052—Security improvement
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/28—Using a specific disk cache architecture
- G06F2212/282—Partitioned cache
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/70—Details relating to dynamic memory management
- G06F2212/702—Conservative garbage collection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Definitions
- The application described herein relates generally to distributed storage systems and, more particularly, to techniques for data protection against failures in distributed storage systems.
- a distributed storage system may require many hardware devices, which often results in component failures that require recovery operations. Moreover, components in a distributed storage system may become unavailable, such as due to poor network connectivity or performance, without necessarily completely failing.
- redundancy measures are often introduced to protect data against storage node failures and outages, or other impediments. Such measures can include distributing data with redundancy over a set of independent storage nodes.
- Replication is often used in distributed storage systems to provide fast access to data.
- Triple replication can suffer from very low storage efficiency which, as used herein, generally refers to a ratio of an amount of original data to an amount of actually stored data, i.e., data with redundancy.
- Error-correcting coding provides an opportunity to store data with a relatively high storage efficiency, while simultaneously maintaining an acceptable level of tolerance against storage node failure.
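The storage-efficiency comparison above can be made concrete with a small calculation. This is a minimal sketch: the function name and the (12, 10) code parameters are illustrative assumptions, not values taken from the patent.

```python
from fractions import Fraction

def storage_efficiency(original_units: int, stored_units: int) -> Fraction:
    """Ratio of original data to actually stored data (data plus redundancy)."""
    if original_units <= 0 or stored_units < original_units:
        raise ValueError("stored data must be at least as large as the original")
    return Fraction(original_units, stored_units)

# Triple replication: every unit of data is stored three times.
replication = storage_efficiency(1, 3)

# Hypothetical erasure code: 10 information chunks are stored as 12 codeword chunks.
erasure = storage_efficiency(10, 12)

print(f"triple replication: {float(replication):.3f}")
print(f"(12, 10) erasure code: {float(erasure):.3f}")
```

Under these assumed parameters the erasure code stores 5/6 of a byte of original data per stored byte, against 1/3 for triple replication.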
- MDS: maximum distance separable
- LDC: locally decodable codes
- GCC: generalized concatenated codes
- RAID: redundant arrays of independent disks
- the method begins by a dispersed storage (DS) processing module receiving a request to copy a data object in a dispersed storage network (DSN). The method continues with the DS processing module identifying one or more sets of at least a decode threshold number of slice names for one or more sets of encoded data slices of the data object and generating one or more sets of at least a decode threshold of new slice names.
- DS: dispersed storage
- the method continues with the DS processing module sending the one or more sets of at least a decode threshold of new slice names to storage nodes of the DSN and instructing the storage nodes to link the one or more sets of at least a decode threshold of new slice names to the one or more sets of encoded data slices thereby producing a non-replicated copy of the data object.
- the present application includes a method for distributing data of a plurality of files over a plurality of respective remote storage nodes. This includes splitting into segments, by at least one processor configured to execute code stored in non-transitory processor readable media, the data of the plurality of files. Each segment is encoded, by the at least one processor, into a number of codeword chunks, wherein none of the codeword chunks contains any of the segments. Each codeword chunk is packaged with encoding parameters and identifiers into at least one package, and the at least one processor generates metadata for at least one file of the plurality of files and metadata for related segments of the at least one file.
- the metadata for the at least one file contains information to reconstruct the at least one file from the segments, and metadata for the related segments contains information for reconstructing the related segments from corresponding packages.
- the at least one processor encodes the metadata into at least one package, wherein the encoding corresponds to a respective security level and a protection against storage node failure.
- the at least one processor computes availability coefficients for the remote storage nodes using statistical data, wherein each availability coefficient characterizes predicted average download speed for a respective remote storage node.
- the at least one processor further assigns a plurality of packages to remote storage nodes, wherein the step of assigning uses the availability coefficients to minimize data retrieval latency and optimize workload distribution.
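One way to picture the availability coefficients and the assignment step is the sketch below. It assumes that the coefficient is simply the mean of past download-speed samples and that the chunks of one segment go to distinct nodes with the lowest projected transfer time; both choices, and all names, are hypothetical and are not stated in the patent.

```python
from statistics import mean

def availability_coefficients(history):
    """Predicted average download speed per node (MB/s), from past measurements."""
    return {node: mean(samples) for node, samples in history.items() if samples}

def assign_segment(chunk_ids, chunk_size_mb, coefficients, load):
    """Place each chunk of one segment on a distinct node, lowest projected time first.

    `load` maps node -> seconds of transfer already assigned and is updated in place.
    """
    if len(chunk_ids) > len(coefficients):
        raise ValueError("more chunks than storage nodes")

    def projected_time(node):
        return load[node] + chunk_size_mb / coefficients[node]

    chosen = sorted(coefficients, key=projected_time)[:len(chunk_ids)]
    placement = {}
    for chunk, node in zip(chunk_ids, chosen):
        load[node] += chunk_size_mb / coefficients[node]
        placement[chunk] = node
    return placement

# Hypothetical measurement history (MB/s) for four storage nodes.
history = {
    "node-a": [40, 60],
    "node-b": [10, 10],
    "node-c": [25, 35],
    "node-d": [20, 20],
}
coefficients = availability_coefficients(history)
load = {node: 0.0 for node in coefficients}
placement = assign_segment(["s1-c0", "s1-c1", "s1-c2"], 60, coefficients, load)
print(placement)
```

Because `load` carries over between calls, later segments drift toward nodes that have received less work, which is a crude form of workload balancing.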
- Each of the packages is transmitted to at least one respective storage node, and at least one of the plurality of files is retrieved as a function of iteratively accessing and retrieving the packages of metadata and file data.
- the present application includes that the step of splitting into segments provides data within a respective segment that comprises a part of one individual file or several files.
- the present application includes aggregating a plurality of files for a segment as a function of minimizing a difference between segment size and a total size of embedded files, and a likelihood of joint retrieval of embedded files.
- the present application includes that the step of encoding each segment includes deduplication as a function of hash-based features of the file.
- the present application includes that the step of encoding each segment includes encryption, wherein at least one segment is encrypted entirely with an individual encryption key.
- the present application includes that the encryption key is generated as a function of data being encrypted.
- the present application includes that each of a plurality of respective individual encryption keys is encrypted with a respective key encryption key and distributed over a respective storage node, wherein each respective key encryption key is generated using a password-based key derivation function.
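The two key-generation ideas above can be sketched with the Python standard library. The assumptions here are loud ones: SHA-256 stands in for the unspecified function of the data, PBKDF2-HMAC-SHA256 stands in for the unspecified password-based key derivation function, and the iteration count and salt length are illustrative. Wrapping the segment key with the key encryption key is omitted because it would need a cipher library.

```python
import hashlib
import os

def content_derived_key(segment: bytes) -> bytes:
    """Derive a 256-bit segment key from the data itself (assumed function: SHA-256)."""
    return hashlib.sha256(segment).digest()

def derive_kek(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 256-bit key encryption key from a password and random salt (assumed KDF: PBKDF2)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)

segment = b"example segment payload"
segment_key = content_derived_key(segment)

salt = os.urandom(16)  # random data stored alongside the wrapped key
kek = derive_kek("hypothetical vault password", salt)

print(segment_key.hex())
print(len(kek))
```

A key derived from the data is identical for identical segments, so equal segments encrypt to equal ciphertexts under a deterministic cipher, which keeps deduplication possible after encryption.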
- the present application includes that the step of encoding each segment includes encryption, wherein at least one segment is partitioned into pieces, wherein each piece is separately encrypted, and further wherein a number of encryption keys per segment ranges from one to the number of pieces.
- the present application includes that the step of encoding each segment comprises erasure coding of mixing degree S, wherein codeword chunks are produced from information chunks using a linear block error correction code, and mixing degree S requires at least S codeword chunks to reconstruct any information chunk.
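A toy example may help with the idea of a mixing degree. The sketch below is not the patent's code: it is a hypothetical non-systematic (4, 2) linear code over the prime field GF(257), in which every codeword chunk is a combination of both information chunks, so no chunk contains the data in the clear and two chunks are needed to recover either half. For this particular two-chunk construction, a single codeword chunk is consistent with every possible value of each information symbol.

```python
P = 257  # prime modulus (assumption), chosen so that every byte value 0..255 is a field element

def encode(info_a, info_b, n):
    """Chunk i holds a + b * x_i (mod P) for each symbol pair, with x_i = i + 1."""
    return [[(a + b * x) % P for a, b in zip(info_a, info_b)] for x in range(1, n + 1)]

def decode(x1, chunk1, x2, chunk2):
    """Recover both information chunks from any two codeword chunks."""
    inv = pow((x1 - x2) % P, -1, P)  # modular inverse, Python 3.8+
    info_b = [((c1 - c2) * inv) % P for c1, c2 in zip(chunk1, chunk2)]
    info_a = [(c1 - b * x1) % P for c1, b in zip(chunk1, info_b)]
    return info_a, info_b

def candidates_for_a(x, symbol):
    """All values of `a` consistent with one observed symbol at evaluation point x."""
    inv_x = pow(x, -1, P)
    return {a for a in range(P) if 0 <= ((symbol - a) * inv_x) % P < P}

segment = b"distributed!"
half = len(segment) // 2
info_a, info_b = list(segment[:half]), list(segment[half:])

chunks = encode(info_a, info_b, n=4)
rec_a, rec_b = decode(2, chunks[1], 4, chunks[3])  # chunks at x = 2 and x = 4
recovered = bytes(rec_a + rec_b)
print(recovered)
```

With more than two information chunks, the choice of evaluation points matters for how many codeword chunks are needed to pin down a single information chunk, so this sketch should not be read as a general construction.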
- the present application includes that respective erasure coding techniques are used for data segment encoding and metadata encoding, such that metadata is protected from at least storage node failure.
- the present application includes that the step of assigning packages to remote storage nodes minimizes retrieval latency for a group of related segments.
- the present application further includes a step of computing at least one relevance coefficient as a function of information representing an employed erasure correction coding scheme and significance of the respective codeword position for data retrieval.
- the present application includes that metadata for a file and metadata for related segments is divided into two parts, in which one part is individually packed in packages and another part is appended to packages containing respective encoded data segments.
- the present application includes arranging temporary storage of file data within a local cache by: operating over compound blocks of data; dividing memory space into regions with compound blocks of equal size; employing a file structure to optimize file arrangement within the local cache; and performing garbage collection to arrange free compound blocks.
- the present application includes that arranging temporary storage of file data within a local cache further includes cache optimization employing information representing a file structure.
- the present application includes that cache optimization is simplified by classifying files based on a respective plurality of categories of access patterns, and employing a respective cache management strategy for similarly categorized files.
- the present application includes that the step of retrieving the packages of metadata and file data comprises: accessing, by the one or more processors, file metadata references within a local cache or within remote storage nodes; receiving, by the one or more processors, a plurality of packages from remote storage nodes by metadata references, each of the packages containing file metadata; receiving, by the one or more processors, a plurality of other packages containing encoded file segments from storage nodes by data references, wherein the encoded file segments are obtained at least partly from file metadata; and reconstructing, by the one or more processors, file data from the packages as a function of metadata representing parameters associated with an encoding scheme and file splitting scheme.
- the present application includes that file retrieval speed is enhanced by caching metadata from a plurality of client side files.
- the present application includes a system for distributing data of a plurality of files over a plurality of respective remote storage nodes, the system comprising: one or more processors in electronic communication with non-transitory processor readable storage media, one or more software modules comprising executable instructions stored in the storage media, wherein the one or more software modules are executable by the one or more processors and include: a fragmentation module that configures the one or more processors to split into segments the data of the plurality of files; an encoding module that configures the one or more processors to encode each segment into a number of codeword chunks, wherein none of the codeword chunks contains any of the segments and to package each codeword chunk with at least one encoding parameter and identifier into at least one package; a configuration module that configures the one or more processors to generate metadata for at least one file of the plurality of files and metadata for related segments of the at least one file, wherein the metadata for the at least one file contains information to reconstruct the at least one file from the segments,
- KEKs: key encryption keys
- KEKs can be generated using a password-based key derivation function, in which a password for a vault is employed together with random data.
- KEKs are stored on the client side. In case of system failure, such as a client side crash, a copy of a KEK may be retrieved from storage nodes using a single password.
- erasure coding helps to tolerate storage node outages, while high storage efficiency is provided by selected construction of error-correction code, such as shown and described in greater detail herein.
- Code parameters: e.g., length, dimension, minimum distance
- Code length should not exceed the number of storage nodes specified in the vault configuration, and the number of tolerated storage node failures is equal to the minimum distance decreased by one.
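The two constraints on code parameters can be checked mechanically. The sketch below uses hypothetical names; the only fact added beyond the text is the standard property that an MDS code of length n and dimension k has minimum distance n - k + 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodeParams:
    length: int        # n: number of codeword chunks per segment
    dimension: int     # k: number of information chunks per segment
    min_distance: int  # d: minimum distance of the code

    def tolerated_failures(self) -> int:
        """Number of storage node failures the code tolerates: d - 1."""
        return self.min_distance - 1

    def fits_vault(self, storage_nodes: int) -> bool:
        """Code length must not exceed the number of nodes in the vault configuration."""
        return self.length <= storage_nodes

def mds_params(n: int, k: int) -> CodeParams:
    """Parameters of an MDS code, for which d = n - k + 1."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return CodeParams(length=n, dimension=k, min_distance=n - k + 1)

code = mds_params(12, 8)
print(code.tolerated_failures())
print(code.fits_vault(storage_nodes=12), code.fits_vault(storage_nodes=10))
```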
- Storage efficiency can be enhanced by flexible deduplication.
- deduplication can be performed for not just files, but also for small parts or pieces of files.
- The present application accounts for an appropriate tradeoff between deduplication complexity and storage efficiency, which can be selected by a client. Further, optional compression can be applied to data, depending on respective client preferences. Latency for data retrieval and repair operations can be further minimized by network load balancing techniques such as shown and described herein.
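Deduplication at the level of file pieces can be sketched as follows. This is an assumption-laden illustration: pieces are fixed-size, SHA-256 is the hash, and the function names are hypothetical. The piece size plays the role of the complexity versus efficiency knob, since smaller pieces find more duplicates but require more hashing and bookkeeping.

```python
import hashlib

def deduplicate(files: dict[str, bytes], piece_size: int):
    """Store each distinct piece once; keep a per-file recipe of piece digests."""
    store = {}    # digest -> piece bytes, stored only once
    recipes = {}  # file name -> ordered list of digests
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), piece_size):
            piece = data[offset:offset + piece_size]
            digest = hashlib.sha256(piece).hexdigest()
            store.setdefault(digest, piece)
            digests.append(digest)
        recipes[name] = digests
    return store, recipes

def rebuild(name: str, store, recipes) -> bytes:
    """Reassemble a file from its recipe."""
    return b"".join(store[digest] for digest in recipes[name])

files = {
    "a.bin": b"AAAA" + b"BBBB" + b"CCCC",
    "b.bin": b"BBBB" + b"CCCC" + b"DDDD",
}
store, recipes = deduplicate(files, piece_size=4)
print(len(store), "unique pieces for", sum(len(r) for r in recipes.values()), "references")
```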
- The present disclosure relates to distributed secure data storage and transmission for use in various contexts including, for example, streaming and other applications.
- the dispersed storage of data, including in particular streaming media data, on cloud servers is one particularly useful application, while similarly applicable to configurations in which data may be stored on multiple storage devices which may be connected by any possible communications technology such as local area and/or wide area networks.
- this includes storage of media content, including without limitation video or audio content, that can be made available for streaming through the Internet.
- the disclosed improvements in speed and security, and greater utilization of available storage resources can enable higher streaming rates.
- the vast amount of storage space required for storage of video, audio and other metadata can further benefit from increased availability and utilization of existing resources and infrastructure, in accordance with respective implementations disclosed herein.
- data that are stored within a distributed storage system can be classified into several categories, and different coding techniques can be applied to different data categories.
- erasure coding techniques maximizing storage efficiency can be applied to a plurality of files containing original data.
- for highly utilized metadata, techniques can be selected and applied to minimize access latency.
- high speed data retrieval is possible as a function of reconstructing data from different subsets of storage nodes. In case a number of available storage nodes is not less than a pre-defined threshold, data recovery is possible.
- a distributed storage system of the present application is an object-level one, in which files with corresponding metadata are abstracted as objects. Further, small files can be aggregated into one single object to reduce the number of objects to be transmitted to storage nodes, and to reduce the amount of metadata. Objects can be partitioned into segments, and each segment can be further encoded. Thus, a number of encoded chunks are produced from each segment. Encoded chunks together with corresponding metadata are encapsulated in packages, which are transferred to storage nodes. Client data is securely stored within the encoded chunks by utilizing encryption and erasure coding with a pre-defined degree of data mixing.
- no amount of client data may be reconstructed from any set of encoded chunks, provided cardinality of the set is lower than the mixing degree.
- sizes of segments and encoded chunks can be selected as a function of respective client statistics, including statistics on read and write requests.
- the size of a respective encoded chunk can be determined by the size of a related segment and the number of storage nodes and/or the length of selected error-correction code.
- a distributed storage system includes processing system devices configured to distribute and/or access client data quickly and efficiently over a set of storage nodes.
- Processing system devices can include one or several server clusters, in which each server cluster is configured with or as a file system server and a number of processing servers.
- a specially designed object-based file system can be included and deployed within each server cluster.
- File system servers of the server clusters can operate to maintain identical instances of the object-based file system. More particularly, a frequently used part of an object-based file system may be maintained within the processing system, while an entire object-based file system can be packed in a plurality of encoded chunks, encapsulated into packages and, thereafter, distributed over a set of storage nodes.
- Object search speed is, accordingly, enhanced as a result of selection of an appropriate tree data structure or a directed graph.
- An example object-based file system of the present application operates over large data blocks, referred to as compound blocks. Compound blocks significantly reduce the amount of metadata, the number of operations performed by the object-based file system and the number of objects transmitted to storage nodes.
- a merging of NAS technology and object storage is provided, wherein files are also configured as objects, each having a unique ID. This provides the ability for files to be accessed from any application, from any geographic location and from any public or private storage provider, with simple HTTPS protocols, regardless of the same object being filed in a sub-folder on the NAS file system. This further provides enterprise applications with a multi-vendor storage solution that has all benefits of object storage.
- Implementations of the present application allow for mixing of storage nodes from multiple vendors, and provide functionality for users to select any respective ones of storage providers, including on-site and off-site, and to switch between storage providers at will.
- block and file system storage is configured to meet the needs of an increasingly distributed and cloud-enabled computing ecosystem.
- block-based storage blocks on disks are accessed via low-level storage protocols, such as SCSI commands, with little overhead and/or no additional abstraction layers. This provides an extremely fast way to access data on disks, and various high-level tasks, such as multi-user access, sharing, locking and security, can be deferred to operating systems.
- erasure codec has been developed for implementing secure cloud NAS storage with a relatively simple file system.
- the codec configures an erasure correcting code from component codes of smaller lengths.
- a library of component codes that includes optimal maximum distance separable (MDS) codes (such as Reed-Solomon) and codes with low encoding/decoding complexity (such as optimal binary linear codes) can be provided, and the structure of the erasure code can be optimized to the user's preferences.
- This structure provides erasure coding with flexible parameters, such as to enable users to manage storage efficiency, data protection against failures, network traffic and CPU utilization.
- the erasure codec of the present application distributes network traffic, in conjunction with load balancing.
- the number of component codes within the configured erasure correcting code of the present application can depend on a number of available storage nodes, which can further be determined by a data vault's respective structure.
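- As a point of reference for the erasure coding discussed above, the following is a minimal sketch of the simplest possible erasure code: `k` data chunks plus one XOR parity chunk, tolerating the loss of any single chunk. It is not the codec of the present application (which is built from component codes and also mixes data); the function names are hypothetical, and this simple code leaves the original bytes visible in the data chunks.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segment: bytes, k: int) -> List[bytes]:
    """Split a segment into k zero-padded chunks and append one parity chunk."""
    if k <= 0:
        raise ValueError("k must be positive")
    size = -(-len(segment) // k)  # ceiling division
    padded = segment.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Return the k data chunks, rebuilding at most one missing chunk (None)."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code repairs only one erasure")
    repaired = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity chunk
```

A production code such as Reed-Solomon generalizes this idea to tolerate several simultaneous erasures.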
- erasure codec includes an improved performance algorithm for data processing by maximizing input/output operations per second ("IOPS") ratio by using concurrency and parallel processing. This can reduce latency and avoid operational limitations within datacenters. Moreover, configurations of the present application can obtain significantly high levels of security, such as to protect customer data within public or private cloud premises from unauthorized access and theft, by mixing and hiding data as a function of the erasure codec.
- a degree of data mixture can be selected according to user preference. The mixture degree can be the smallest number of storage nodes that need to be accessed in order to reconstruct a chosen amount of original user data. Higher mixture degrees can correspond to higher levels of data protection, such as to preclude unauthorized access, and to provide higher data retrieval complexity.
- Fig. 1 is a schematic block diagram illustrating a distributed storage system interacting with client applications, in accordance with an example implementation of the present application.
- Original data 106 e.g., files, produced by client applications 109
- Original data 106 is distributed over a set of storage nodes 103, and original data 106 is available to client applications 109 upon request. Any system producing and receiving data on the client side can be considered as an instance of a client application 109.
- processing system 101 located on the client side.
- processing system 101 can include one or several server clusters 107, in which original data 106 are transformed into encoded chunks 108, and vice-versa.
- a server cluster 107 can include a file system server and one or more processing servers, although a server cluster may include just an individual server.
- Storage nodes 103 can operate independently from each other, and can be physically located in different areas.
- Processing system 101 ensures data integrity, security, protection against failures, compression and deduplication.
- configuration of processing system 101 is specified by configuration metadata 104 maintained within highly protected storage 102. System configuration may be adjusted via administrator application 110.
- Fig. 2 is a schematic block diagram representing logical components of an example processing system, and arranged to transform original data into packages with encapsulated encoded chunks, and vice-versa, as well as to organize fast, reliable and secure data transmission.
- Fig. 2 illustrates an example logical architecture, as opposed to the example physical architecture illustrated by Fig. 1 .
- processing system 201 includes a number of modules, wherein each module is responsible for a particular functionality.
- program modules include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- program modules can be located in both local and remote computer system storage media including memory storage devices. Accordingly, modules can be configured to communicate with and transfer data to each other.
- one or more client applications 202 can operate on application level 210.
- modules can be divided into two categories.
- a first category can include modules operating on particular levels.
- administrator module 215 operates on application level 210, and can be responsible for providing relevant information to a system administrator regarding tasks being performed, configuration information, monitoring information and statistics on the system more generally, as well as for receiving administrator's orders.
- Original data can be received by gateway module 203 operating within access level 211, where gateway module 203 supports different protocols, e.g., network file system (NFS) protocol, server message block (SMB) protocol, internet small computer system interface (iSCSI) protocol, representational state transfer (REST) or RESTful Web services.
- Gateway module 203 can provide opportunity for almost any arbitrary database or application on the client side to access the processing system 201. Moreover, gateway module 203 can enable communication between processing system 201 and storage nodes via the network (e.g., using hypertext transfer protocol secure (HTTPS)).
- gateway module 203 provides for connectivity between data processing level 212 and object storage level 214. Transformation of original data into encoded chunks can be performed within data processing level 212.
- two modules operate in data processing level 212: file system module 204 and coding module 205.
- Coding module 205 can be configured to perform compression, encryption and erasure coding, while file system module 204 can be configured to keep track of correspondence between original data objects and packages with encoded chunks located on storage nodes.
- Load balancing module 206, operating in network level 213, can be configured to regulate (for example, to minimize) traffic between processing system 201 and each storage node. Load balancing module 206 can perform bandwidth analysis and use results therefrom to optimize mapping between a set of encoded chunks and a set of storage nodes, i.e., to optimize distribution of packages over storage nodes.
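- One way to picture the mapping of packages to storage nodes is a greedy assignment by estimated transfer time. The sketch below is an assumption-laden illustration, not the load balancing scheme of the present application: `bandwidth` (bytes per second per node) and `pending` (bytes already queued per node) are hypothetical inputs, and the packages of one segment are placed on distinct nodes.

```python
from typing import Dict, List


def assign_packages(num_packages: int, package_size: int,
                    bandwidth: Dict[str, float],
                    pending: Dict[str, int]) -> List[str]:
    """Pick distinct nodes with the lowest estimated completion time."""
    if num_packages > len(bandwidth):
        raise ValueError("not enough storage nodes for distinct placement")

    def eta(node: str) -> float:
        # Time to drain the queue plus the new package at the measured bandwidth.
        return (pending.get(node, 0) + package_size) / bandwidth[node]

    chosen = sorted(bandwidth, key=eta)[:num_packages]
    for node in chosen:
        pending[node] = pending.get(node, 0) + package_size
    return chosen
```

Because `pending` is updated in place, repeated calls for related segments balance the load jointly across those segments.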
- a second category of modules can include modules that are configured to affect or arrange functioning of other modules.
- configuration module 207 is operable to customize other modules according to configuration metadata.
- Control module 208 includes instructions that, when executed by one or more devices within processing system 101, schedule tasks for other modules and regulate resource consumption, e.g., memory and CPU.
- Monitoring module 209 can be configured to include instructions that, when executed by one or more devices within processing system 101, track activities being performed within the processing system 101 and its environment, and generate event alerts, as appropriate.
- Modules can be distributed over a server cluster, i.e., file system server and processing servers.
- file system module 204, configuration module 207 and gateway module 203 are deployed over file system server.
- Coding module 205 and load balancing module 206 are deployed over processing servers.
- Control module 208 and monitoring module 209 are deployed over both file system server and processing servers.
- the present application configures one or more processing devices to partition objects into segments, and each segment can be further encoded into a number of chunks, which can be transferred to storage nodes.
- This structure significantly simplifies storage implementation processes, without compromising data security, integrity, protection and storage performance.
- information about data is encrypted at the client and stored securely within packages with encapsulated encoded chunks that are dispersed across storage nodes.
- for a plurality of application servers and data vaults, a process is implemented in a virtual machine instance that includes operations such as encryption, compression, deduplication and protection and, moreover, slicing the information into respective chunks and objects.
- the erasure codec generates various types of encoded chunks, which are spread across all the storage nodes and deployed for a vault installation.
- Metadata can be encoded in a way that is only visible and retrievable by the authorized data owner. This is implemented by abstracting erasure-coded metadata and NAS metadata, which is thereafter dispersed between different storage nodes.
- a package can be configured to contain an encoded chunk together with related metadata: a storage node configuration; a vault configuration; a link to an active vault snapshot; and a current state of data blocks used for the snapshot.
- the result is a simple NAS solution with all advantages of erasure-coded object storage, such as security, unlimited scalability, speed and data resiliency and without a requirement for use of RAID systems to provide data resiliency, and write or replicate multiple copies to different geographical locations to ensure availability during component failures.
- the systems and methods shown and described herein provide for data protection, while including a relatively modest overhead (e.g., 40%-60% overhead), as opposed to a significantly larger overhead (e.g., 300%-600% overhead) in traditional NAS systems.
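- The overhead figures above can be related to code parameters with simple arithmetic. The sketch below assumes that overhead means extra stored bytes relative to the original data, that an erasure code stores `n` chunks for every `k` data chunks, and that the comparison point is plain replication; the parameter values are illustrative and not taken from the present application.

```python
def erasure_overhead_percent(n: int, k: int) -> float:
    """Extra storage, in percent, for a code with n total and k data chunks."""
    return 100 * (n - k) / k


def replication_overhead_percent(copies: int) -> float:
    """Extra storage, in percent, when every byte is stored `copies` times."""
    return 100 * (copies - 1)


print(erasure_overhead_percent(14, 10))   # 10 data chunks + 4 redundant chunks
print(replication_overhead_percent(4))    # four full copies
```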
- packages that are generated from original data are connected by shared base information, as well as by connectivity to one or more neighboring packages through metadata.
- the packages can be uploaded to geographically distributed storage nodes of the user's choosing, and contain links to vault snapshots, as well as a current state of data blocks used for the snapshots. This provides significantly enhanced security and gives the vault a high tolerance for node failure.
- the present application supports the ability to reconstruct all data, even in the event of data loss on the client side. Simply by creating a new vault with account details, all data will become instantly accessible. This can be further made possible as a function of the intelligent indexing and caching data prior to data uploading to remote storage nodes, as well as data pre-fetching prior to receiving read requests.
- a default size of a compound block can be 4MB. These larger blocks ensure near Tier-1 performance on top of S3-type storage nodes.
- data blocks can be categorized, for example, as "hot," "warm" and "cold." Rating indexes can be managed for NAS blocks, and these rating indexes can be further employed to identify a category of a corresponding compound block. In this way, frequently used warm and hot categories of data can be handled locally (in memory and stored in locally attached SSD), while also being dispersed in the cloud. Furthermore, the cached part of the file system is regularly snapshotted, sliced, turned into packages with encoded chunks, and then distributed over storage nodes. If a cache includes several independent storage devices, e.g., several SSDs, then replication or erasure coding can be employed within the cache to enhance data protection. An example process 500 is illustrated in Fig. 5 .
- a virtual appliance provides a distributed, scalable, fault-tolerant and highly available storage system, which allows organizations to combine geographically distributed storage resources of different providers into a single namespace that is transparent, for example, on UNIX and WINDOWS operating systems.
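- The categorization by rating index can be sketched as follows. The thresholds, the interpretation of a rating as an access frequency, and the rule that a compound block takes the category of its most active NAS block are all assumptions made for illustration; the present application does not specify them.

```python
from typing import Iterable


def categorize(rating: float, warm_threshold: float = 1.0,
               hot_threshold: float = 10.0) -> str:
    """Map a rating index to a category using hypothetical thresholds."""
    if rating >= hot_threshold:
        return "hot"
    if rating >= warm_threshold:
        return "warm"
    return "cold"


def compound_block_category(nas_block_ratings: Iterable[float]) -> str:
    """Category of a compound block, taken from its highest-rated NAS block."""
    return categorize(max(nas_block_ratings, default=0.0))
```

With such a rule, compound blocks rated "hot" or "warm" would be kept in the local cache, while "cold" blocks would reside only on the storage nodes.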
- an instance can be provisioned as a virtual appliance or docker container, which can run under a virtualization framework, such as VMware, Xen or OpenStack.
- it can be easily packaged on a hardware appliance.
- Example processes and components 700 are illustrated in Fig. 7 , and include an object storage layer, splitter, iSCSI, network file system (e.g., NFS), common internet file system ("CIFS"), mounted file system (for example, ext4 or btrfs), block storage, cache, and public and private cloud connectors.
- an object storage layer ensures consistent integration with public and private storage nodes.
- the object storage of the present application is significantly more scalable than traditional file system storage, at least in part because it is significantly simpler. For example, instead of organizing files in a directory hierarchy, object storage systems store files in a flat organization of containers, and unique identifiers are employed to retrieve them.
- Data splitting can be configured to perform three major operations on a stored data object: data slicing and mixing; high level encryption (for example, using AES-128, AES-192 or AES-256); and data encoding against failures with an efficient and flexible algorithm.
- Data encoding is configured to work in such a way that the produced encoded chunks do not contain any sequence of bytes from the original data object, even with the encryption option, for example, in the administrator application 110, being set to disabled.
- packages with encoded chunks can be anonymously stored within a set of storage nodes.
- transformed data blocks are transmitted to different storage nodes in parallel, ensuring efficient utilization of available network bandwidth, which results in high data transfer speed. This strategy makes data interception virtually impossible.
- vault snapshots, data blocks and packages with encoded chunks, described in greater detail herein, form a graph of related data objects.
- An example map 900 showing storage nodes located around the world is illustrated in Fig. 9 .
- a fast, key-value pair-based graph database is used to access various information about the state of the system. This includes, for example, finding the latest valid vault snapshot, the closest snapshot for rollback, and data blocks that may need repair.
- a full system configuration can be replicated to a subset of storage nodes on a regular basis. This ensures that data can survive an underlying virtual machine (VM) server outage, and that the system state can also be restored if the VM data is destroyed.
- Vault snapshots can include the following metadata: a list of the data blocks used; checksums for verifying data blocks integrity; a base snapshot image; blocks delta overlaid over base vault snapshot; and a link to previous vault snapshot used.
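- The snapshot metadata listed above, and the search for the closest snapshot for rollback, can be sketched with a small record type. Field names, the `created_at` timestamp and the use of SHA-256 checksums are assumptions for illustration; the sketch follows the "link to previous vault snapshot" to walk back in time.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VaultSnapshot:
    snapshot_id: str
    created_at: float                      # hypothetical creation timestamp
    block_ids: List[str]                   # data blocks used
    checksums: Dict[str, str]              # block id -> hex digest
    base_snapshot: Optional[str] = None    # base snapshot image
    delta_blocks: List[str] = field(default_factory=list)  # delta over base
    previous: Optional[str] = None         # link to previous snapshot


def verify_block(snapshot: VaultSnapshot, block_id: str, data: bytes) -> bool:
    """Check a data block against the checksum recorded in the snapshot."""
    return hashlib.sha256(data).hexdigest() == snapshot.checksums.get(block_id)


def closest_for_rollback(snapshots: Dict[str, VaultSnapshot], latest_id: str,
                         target_time: float) -> Optional[VaultSnapshot]:
    """Follow `previous` links until a snapshot not newer than target_time."""
    current = snapshots.get(latest_id)
    while current is not None and current.created_at > target_time:
        current = (snapshots.get(current.previous)
                   if current.previous is not None else None)
    return current
```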
- the full range of NFS security can be supported.
- regarding vault options, a range of vault types can be configured to support different application configurations.
- vaults can be created and configured for file storage, archiving and/or deep archiving.
- Vaults can further be optimized for running block-based databases, imaging (e.g., video) and image storage applications.
- a primary storage vault is provided for a high performance file system. With vault content cached locally, this option is ideal for database applications. Files can be stored in a virtual folder, and managed in the background.
- the primary storage vault supports automatic snapshot management, wherein snapshots are created much faster than backups, and each snapshot is consistent with the source content at the moment of its creation. The frequency of snapshots can be defined, and snapshots can be split and dispersed to different datacenters in the cloud, such as shown and described herein. Thus, data are protected and backed up frequently, without the performance of applications being negatively affected.
- a high performance cloud storage file system is provided with virtually unlimited storage capacity. This option is ideal for web servers requiring large storage capacity for images and videos, and fast performance.
- Data can be stored across multiple cloud centers, and be managed by a single file system that can be accessed almost instantaneously from other members of the cluster located in other geographical regions. For example, data can be stored in multiple vault clusters, using a MICROSOFT AZURE data center in Ireland, an AWS data center in Virginia and an on-premises data center in Singapore.
- an archive vault option provides long term storage of data that is compressed and deduplicated.
- the data can be compressed automatically, which is useful in cases when low storage costs are desired and moderate retrieval speeds are tolerable.
- another archive vault offers lower storage cost compared to other archive vault options.
- This option may be ideal for data that are rarely retrieved, and data retrieval times are less important.
- Such an option may be implemented using AMAZON GLACIER cloud storage, and provides long term storage of data that is compressed and deduplicated.
- WINDOWS file sharing via CIFS protocol provides file sharing with WINDOWS servers and WINDOWS clients, including WINDOWS 7, WINDOWS 8, WINDOWS XP and other WINDOWS-compatible devices. A virtually unlimited number of file shares is supported.
- Performance of the system can scale linearly with a number of storage nodes in the system. Accordingly, adding a new storage node will increase the available capacity and improve the overall performance of the system. The system will automatically move some data to the newly added storage node, because it balances space usage across all connected nodes. Removing a storage node is as straightforward as adding a node. The use of multi-vendor storage nodes allows the system to parallelize operations across vendors, which further contributes to its throughput.
- the teachings herein provide benefits of secret sharing schemes to storage by combining information dispersal with high level encryption. This preserves data confidentiality and integrity in the event of any of the packages with encoded chunks being compromised.
- the methods of data coding ensure that information can only be deciphered if all the information is known. This eliminates the need for key management while ensuring high levels of key security and reliability.
- Data can be packaged with AES-256/SHA-256 encryption which is validated for use in the most security conscious environments.
- Object metadata can include original data metadata and system metadata, in which original data metadata is provided by client applications together with related files, and system metadata can be generated by object-based distributed storage system application(s).
- original data metadata does not have to depend on object-based distributed storage system in general (i.e., processing system or cloud storage).
- original data metadata can include file attributes such as file type, time of last change, file ownership and access mode, e.g., read, write, execute permissions, as well as other metadata provided by client application together with the file containing original data.
- Original data metadata can be encoded and encapsulated into packages, together with original data.
- system metadata of an object is usable to manage an object within the distributed storage system, so it is particularly relevant from within the system.
- System metadata can include identifiers, cloud location information, erasure coding scheme, encryption keys, internal flags and timestamps. Additional system metadata can be specified depending on the requirements to the distributed storage system.
- identifiers e.g., numeric IDs and HASH values, are usable to identify objects and their versions.
- Cloud location information can be represented by an ordered list of data segments, in which each segment is given by a list of packages with encoded chunks (e.g., indices and locations). An index of a package can depend on a load balancing scheme.
- a location of a package can be provided by a reference, thereby providing an opportunity to download a package with encoded chunks from cloud storage.
- Information regarding a respective erasure coding scheme is usable to reconstruct data segments from encoded chunks.
- secure storage of encryption keys can be provided by using key encryption keys (KEKs), in which KEKs depend on password or, alternatively, by distribution over storage nodes using secret sharing.
- Internal flags show various options that are enabled for an object, e.g., encryption, compression, access type, and caching policy.
- timestamps identify a time of object creation, modification and deletion. Timestamps are useful to track a relevant version of an object.
- this list of metadata is exemplary, and can be supplemented or detailed.
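- The option of protecting encryption keys "by distribution over storage nodes using secret sharing" can be illustrated with the simplest such scheme, an n-of-n XOR split, in which every share is required to rebuild the key. This is a sketch only: the present application does not state which secret sharing scheme is used, and a threshold scheme (for example, Shamir's) would be needed to tolerate unavailable nodes. Function names are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n: int) -> List[bytes]:
    """Split a key into n shares; any n-1 shares reveal nothing about it."""
    if n < 2:
        raise ValueError("need at least two shares")
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # The last share is the key XORed with all random shares.
    return shares + [reduce(_xor, shares, key)]


def combine_shares(shares: List[bytes]) -> bytes:
    """Recover the key by XORing all n shares together."""
    return reduce(_xor, shares)
```

Each share would be stored on a different storage node, so that no single node holds usable key material.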
- a distributed storage system for a particular client can be specified by configuration metadata that include: vault configuration; erasure encoding scheme; encryption scheme; compression scheme; deduplication scheme; access control information; flags showing enabled options; and reference for file system root.
- a client has access to his/her storage space configured in a vault or a number of independent vaults.
- Respective coding techniques, i.e., encryption, erasure coding and compression, as well as a namespace, can be specified for respective vaults.
- Each vault can be logically divided into volumes, and each volume may be created to store a particular type of data or for a particular user/group of users.
- access rights and data segment sizes can be specified for a volume.
- Fig. 10 is a schematic illustration of example data and metadata transfer upon receiving a WRITE request from a client application.
- a client request is received by one of server clusters 1006, more particularly, by the gateway module.
- each server cluster 1006 can include a file system server and a number of processing servers.
- a gateway module is located within file system server.
- a WRITE request with a piece of file 1001 is transferred via communication network to gateway module.
- a network protocol employed for data transferring can depend on a client application, e.g., protocols SMB, NFS, CIFS, iSCSI and RESTful API.
- Upon receiving WRITE request 1001, server cluster 1006 performs coding of file segments into encoded chunks 1002 and generates object metadata. Then, at step 2, server cluster 1006 initiates a PUT request for each package with an encoded chunk produced from the piece of file. At step 3, a wait occurs for acknowledgements 1003 upon successful placement of packages.
- Object metadata utilized by the object-based distributed storage system, can be distributed over storage nodes 1008 and partially cached within file system servers.
- the leading server cluster with the leading file system server 1007 is selected.
- The leading server cluster is temporarily assigned using a consensus algorithm, e.g., Raft.
- metadata 1004 is transmitted to the leading file system server 1007, which retransmits it to other file system servers at step 5.
- metadata 1004 is distributed over the set of server clusters.
- the leading file system server 1007 waits for acknowledgements, in order to guarantee data integrity. If some server cluster is unavailable at a given moment, then the leading server cluster monitors the status of this server cluster and arranges transfer of metadata 1004 as soon as possible.
- the server cluster connected with the client application 1005 receives an acknowledgement from the leading server cluster. Then, at step 8, an acknowledgement of the successfully performed WRITE operation is sent to the client application 1005.
- Fig. 11 is a schematic block diagram illustrating example data processing and metadata generation in the case of a WRITE request.
- Gateway module of the processing system receives a WRITE request with a piece of file 1101, in which the file is specified by a unique identifier (ID), while an original data piece within the file is specified by offset indicating beginning of the piece and length of the piece.
- File attributes can be treated as original data metadata and encapsulated into packages together with data segment.
- file attributes 1109 are copied at step 1102. Segmentation of the file piece is performed at step 1103 (illustrated in additional detail in Fig. 12 ). Obtained parts of a file are employed to update an existing data segment or stored as new data segments.
- each new/updated data segment is encoded into a set of data chunks, in which encoding procedure includes deduplication, compression, encryption and erasure coding.
- compression and deduplication can be optional.
- encryption can be optional for data of low importance.
- HASH value 1110 is computed for each data segment, wherein a cryptographic hash function is employed, e.g., BLAKE2, BLAKE, SHA-3, SHA-2, SHA-1, MD5.
- Encryption keys 1111 may be based on HASH values 1110 or generated independently from content using random data. HASH values 1110 and encryption keys 1111 are considered to be a part of system metadata, since knowledge thereof is required to reconstruct original data.
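- The two key options described above (a key based on the segment's HASH value, or a key generated independently from random data) can be sketched with the Python standard library. The use of BLAKE2b, the 32-byte sizes and the hypothetical `vault_secret` that is mixed into the content-derived key are assumptions for illustration only.

```python
import hashlib
import secrets


def segment_hash(segment: bytes) -> bytes:
    """HASH value of a data segment (BLAKE2b, 32-byte digest)."""
    return hashlib.blake2b(segment, digest_size=32).digest()


def content_derived_key(seg_hash: bytes, vault_secret: bytes) -> bytes:
    """Key based on the HASH value, keyed with a hypothetical vault secret."""
    return hashlib.blake2b(seg_hash, key=vault_secret, digest_size=32).digest()


def random_key() -> bytes:
    """Key generated independently from content."""
    return secrets.token_bytes(32)
```

With a content-derived key, identical segments encrypt identically, which keeps deduplication possible; a random key avoids that linkage at the cost of deduplication across encrypted segments.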
- Packages are assigned to storage nodes at step 1105, in which network load is jointly balanced for packages with encoded chunks, produced from several related data segments (illustrated in additional detail in Fig. 17 ).
- packages are transferred from the processing center to storage nodes, where storage nodes send back to the processing center acknowledgments upon saving of packages.
- a data reference (DR) can be generated for each transferred package, in which the DR is an address of package within a storage node. Given DR for a package and permission, the package may be accessed within a storage node.
- a list of DRs 1107 is appended to system metadata of file piece, thereby providing complete object metadata that is obtainable at step 1108.
- object metadata is encoded to guarantee security and reliability.
- object metadata can be encoded in the same way as a data segment, as well as just encrypted and protected against failures using erasure coding or replication.
- object metadata is transferred to storage nodes, and acknowledgements are received by the processing system 101 thereafter.
- Access to metadata, distributed over storage nodes, can be provided using generated metadata references (MDRs).
- an MDR has the same general meaning for metadata as DR for data.
- metadata is spread over server clusters and tree/graph structure of the object-based file system is updated.
- an acknowledgement 1115 is sent to the client application.
- Fig. 12 is a schematic block diagram illustrating example building of data segments from an individual file or from a group of files combined into a logical file. Segmentation of a file 1201 can be performed depending on the file size. More particularly, if a file size is above a pre-defined threshold 1202, then it can be individually packed into a number of encoded chunks. Such files are referred to herein, generally, as large files, while files with a size lower than the threshold are referred to as small files. For example, a value of such a threshold may be less than the segment size. If a file 1201 is a large file then, at step 1203, the file 1201 is partitioned into a number of data segments of specified size. For a small file, an attempt to pack several files into one data segment is made.
- the system checks whether the present file 1201 may be combined with already accumulated files or with the next small files.
- the file 1201 is converted into a data segment at step 1205.
- the file 1201 is embedded into a current logical file, where the logical file is a container for small files.
- the size of a logical file is defined in one or more implementations, to be equal to the size of a respective data segment.
- a logical file can be treated as a large file, while in other scenarios it is treated as a set of small files.
- Logical files are built at step 1206 using two principles: packing related (dependent) small files together, and decreasing wasted storage space. For example, it is preferable to pack together files that are located in the same folder.
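- The segmentation logic of Fig. 12 can be sketched as follows. The sketch assumes in-memory byte strings, treats "related" files as files in the same folder, and uses a simple sequential packing; the function names and the packing heuristic are hypothetical and are not the procedure of the present application.

```python
import posixpath
from typing import Dict, List


def split_large_file(data: bytes, segment_size: int) -> List[bytes]:
    """Partition a large file into data segments of the specified size."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def pack_small_files(files: Dict[str, bytes], segment_size: int) -> List[List[str]]:
    """Group small files into logical files of at most segment_size bytes.

    Files are visited folder by folder so that related files land together.
    """
    logical_files: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in sorted(files, key=lambda p: (posixpath.dirname(p), p)):
        size = len(files[path])
        if size > segment_size:
            raise ValueError(f"{path} is not a small file")
        if current and used + size > segment_size:
            logical_files.append(current)   # close the full logical file
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        logical_files.append(current)
    return logical_files
```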
- Fig. 13 is a schematic block diagram illustrating example metadata processing and data reconstruction, in the case of READ request.
- the processing system receives READ request for a piece of file 1301, in which a file is specified by file ID, a piece within the file is specified by offset, indicating beginning and length of the piece, as in the case of WRITE request illustrated in Fig. 11 .
- the file root MDR is identified by the given file ID in a tree/graph structure of the object-based file system, in which a file is represented by a node containing a file root MDR.
- the tree/graph structure of the object-based file system can be partially cached within the processing system.
- object metadata includes information regarding an object location, such as a list of data segments produced from the object (file) and an ordered list of DRs 1310 for each data segment.
- packages with encoded chunks, related to the required file piece 1301 are independently retrieved from the set of storage nodes. These packages may be retrieved in any order.
- original data segments are recovered by coding module from encoded chunks.
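- How a READ request given by offset and length maps onto data segments can be sketched as below, assuming fixed-size segments (the last one possibly shorter) that have already been recovered from encoded chunks. The function names are hypothetical.

```python
from typing import List


def segments_for_range(offset: int, length: int, segment_size: int) -> range:
    """Indices of the data segments that cover [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    return range(first, last + 1)


def read_piece(segments: List[bytes], offset: int, length: int,
               segment_size: int) -> bytes:
    """Assemble the requested piece from the recovered segments."""
    indices = segments_for_range(offset, length, segment_size)
    joined = b"".join(segments[i] for i in indices)
    start = offset - indices.start * segment_size
    return joined[start:start + length]
```

Only the packages belonging to the returned segment indices need to be fetched from storage nodes.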
- Fig. 14 is a schematic block diagram illustrating example data and metadata removal in the case of a DELETE request.
- This diagram corresponds to an example case when an option for deduplication is enabled. In such a case, several links to the same object are possible within the object-based file system, e.g., as induced by the existence of logical files.
- a file system server identifies the file root MDR, at step 1402. Observe that only one file root MDR preferably corresponds to a file ID.
- Two implementations of deduplication are considered herein. In the case of direct deduplication, all unreferenced objects are deleted instantly, while according to the second approach periodic garbage collection is employed as a background process in order to delete all unreferenced objects.
- file root MDR 1403 is removed upon a DELETE request, at step 1410, since each package with encoded chunk is referenced only once.
- MDRs with a list of DRs 1406, related to the file root MDR 1403, are recovered as a part of object metadata at the step 1405.
- the list of DRs 1406 is further employed at step 1407 in order to find MDRs of all objects, which use the same packages with encoded chunks as the file to be deleted.
- Packages utilized only by the file with ID 1401 are deleted.
- metadata corresponding to deleted files is removed at step 1409.
- file root MDR for the file with ID 1401 is also deleted from the object-based file system, at step 1410.
- a list of deleted objects can be maintained within journal logs for possible subsequent garbage collection and for statistics needs.
- MDR, DRs and packages with encoded chunks related to the file are simply removed upon DELETE request. This corresponds to file deletion operation given by steps: 1402, 1405, 1408, 1409 and 1410.
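- The deletion flow with direct deduplication can be sketched as follows. This is an illustration under assumptions: the metadata table is modeled as a plain dictionary from file ID to its list of DRs, and the names `delete_file` and `objects` are hypothetical.

```python
from typing import Dict, List, Set

def delete_file(file_id: str, objects: Dict[str, List[str]]) -> Set[str]:
    """Remove a file's metadata and return the DRs whose packages may be deleted."""
    # Steps 1402 and 1405: locate the file's metadata and recover its list of DRs.
    own_drs = set(objects.pop(file_id))
    # Step 1407: find DRs that are still used by other objects.
    still_used = {dr for drs in objects.values() for dr in drs}
    # Only packages referenced by no other object are safe to remove.
    return own_drs - still_used

objects = {"a": ["dr1", "dr2"], "b": ["dr2", "dr3"]}
removable = delete_file("a", objects)
```

- Here `dr2` survives because file `b` still references it, which is the situation deduplication creates.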
- Fig. 15 is a schematic block diagram illustrating example removal of unreferenced objects from the system in background mode.
- This process may be considered as garbage collection, in which garbage is represented by unreferenced objects stored in the processing system 101 and the set of storage nodes.
- Garbage collection can be implemented as a background process, i.e., a periodic search for unreferenced objects, followed by their removal, is performed without interrupting request processing.
- a DELETE request can be executed with much smaller latency than in the case of direct deduplication.
- An example garbage collection process occurs as follows.
- A search for unreferenced DRs is performed, where an unreferenced DR is a DR that is not listed in the metadata of any object.
- Packages with DRs specified by the obtained list of unreferenced DRs 1502 are deleted from storage nodes.
- unreferenced DRs 1502 are deleted at step 1504.
- Garbage collection activities can be scheduled depending on system workload.
- network resources, memory resources and computational resources are utilized for garbage collection in periods of low workload.
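- A minimal sketch of one garbage collection pass, assuming that stored packages are modeled as a dictionary keyed by DR and that object metadata is a dictionary from object ID to its list of DRs. The names `find_unreferenced` and `sweep` are hypothetical.

```python
from typing import Dict, List, Set

def find_unreferenced(stored: Set[str], objects: Dict[str, List[str]]) -> Set[str]:
    """A DR is unreferenced if it is not listed in the metadata of any object."""
    referenced = {dr for drs in objects.values() for dr in drs}
    return stored - referenced

def sweep(packages: Dict[str, bytes], objects: Dict[str, List[str]]) -> Set[str]:
    """Delete packages of unreferenced DRs and return the DRs that were removed."""
    garbage = find_unreferenced(set(packages), objects)
    for dr in garbage:
        del packages[dr]
    return garbage

packages = {"dr1": b"x", "dr2": b"y", "dr3": b"z"}
removed = sweep(packages, {"file": ["dr2"]})
```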
- Fig. 16 is a schematic block diagram illustrating example encoding of a data segment into a number of chunks. Steps shown at Fig. 16 are executed by a coding module, which is responsible for data encoding and decoding. Data segment 1601 of pre-defined size is received as input argument. Data segment 1601 is treated by the coding module as unstructured data, so only size of the data segment 1601 is relevant and no assumptions about content of the data segment 1601 are made. Integrity check is made upon data segment retrieval. Optional deduplication may be performed at step 1603. Different levels of deduplication are possible, e.g., segment-level deduplication, which is performed by comparison of HASH values for data segments of the same size, stored within the distributed storage system.
- Optional compression may be performed at step 1604. If a compression option is enabled, then total or selective compression is performed. A compression transformation is applied to each data segment in the case of total compression. In the case of selective compression the compression transformation is applied at first to a piece of data segment. If a reasonable degree of compression is achieved for the piece, then the compression transformation is applied to the whole data segment. A flag showing whether compression was actually applied is stored within packages with encoded chunks. Compression can be performed prior to encryption, to obtain a fair degree of compression.
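- Selective compression can be sketched as follows. The use of `zlib`, the probe size of 4096 bytes and the 0.9 ratio threshold are assumptions chosen for illustration; the description does not name a compression algorithm or a threshold.

```python
import os
import zlib
from typing import Tuple

def compress_segment(segment: bytes, probe_size: int = 4096,
                     max_ratio: float = 0.9) -> Tuple[bool, bytes]:
    """Try compression on a piece first; compress the whole segment only if it pays off.

    Returns (flag, payload), where flag records whether compression was applied.
    """
    probe = segment[:probe_size]
    if probe and len(zlib.compress(probe)) <= max_ratio * len(probe):
        return True, zlib.compress(segment)
    return False, segment

text_segment = b"abc" * 10000
random_segment = os.urandom(30000)
text_flag, text_out = compress_segment(text_segment)
random_flag, random_out = compress_segment(random_segment)
```

- The returned flag plays the role of the flag stored within packages, so that a reader knows whether to decompress.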
- An encryption step 1605 is mandatory for data segments with a high secrecy degree and optional for others. By enabling or disabling encryption for different secrecy degrees, a tradeoff between security and computational complexity may be arranged. If encryption 1605 is enabled, then a segment can be encrypted as a whole, or it can be divided into several parts, each of which is encrypted separately. The former strategy is referred to herein, generally, as segment level encryption, while the latter strategy is referred to as chunk level encryption. The segment level encryption strategy allows only full segment READ/WRITE requests, so if partial READ/WRITE requests are required by the processing system 101, the chunk level strategy is selected. A strategy is identified at step 1606.
- each segment level encryption strategy has one or several individual encryption keys, so possibility of a malicious adversary accessing all data using one stolen key is eliminated.
- Erasure coding can be applied to protect data against storage node failures and provide fast data retrieval even if some storage nodes are unavailable, e.g., due to outage. Observe that erasure coding provides higher storage efficiency compared to replication. For example, in the case of triple replication two faults can be tolerated and storage efficiency is 33%. The same fault tolerance can be easily achieved with Reed-Solomon code of length 10 and dimension 8, providing storage efficiency 80%.
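- The comparison above can be checked with a short calculation. It relies on the fact that a Reed-Solomon code is maximum distance separable, so a code of length N and dimension K tolerates N - K erasures; the function names are hypothetical.

```python
from typing import Tuple

def replication(copies: int) -> Tuple[int, float]:
    """Return (tolerated faults, storage efficiency) for plain replication."""
    return copies - 1, 1 / copies

def mds_code(n: int, k: int) -> Tuple[int, float]:
    """Return (tolerated faults, storage efficiency) for an MDS code of length n, dimension k."""
    return n - k, k / n

rep_faults, rep_eff = replication(3)   # triple replication
rs_faults, rs_eff = mds_code(10, 8)    # Reed-Solomon code of length 10 and dimension 8
```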
- erasure coding of obtained K chunks into N codeword chunks is applied at step 1610, in which an error-correction code of dimension K and length N ≥ K is used.
- a relative size of codeword chunks can be the same as the size of information chunks.
- each chunk is considered as a vector of symbols and i'th symbols of K information chunks are erasure coded into i'th symbols of N codeword chunks, in which symbol size is defined by error-correction code parameters.
- computations for symbols with different indices are performed in parallel, e.g., using vectorization.
- AES advanced encryption standard
- Individual encryption keys for segments are encrypted with a key encryption key (KEK) generated using a password-based key derivation function (PBKDF2), which takes the password for the corresponding vault and a salt (random data) of, e.g., 32 bytes.
- the length of encryption key may be different, e.g., 128, 192 or 256 bits.
- encryption is performed iteratively, where the number of rounds is set sufficiently high in order to provide desirable level of security. The number of rounds is also encoded within a package.
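- A minimal sketch of KEK derivation using the Python standard library. The choice of HMAC-SHA256 as the underlying function, the default iteration count, and the reading of the number of rounds as the PBKDF2 iteration count are assumptions; wrapping the per-segment keys with AES under the KEK is not shown because it needs a cipher library outside the standard library.

```python
import hashlib
import os

def derive_kek(vault_password: str, salt: bytes,
               rounds: int = 200_000, key_bits: int = 256) -> bytes:
    """Derive a key encryption key from the vault password with PBKDF2."""
    if key_bits not in (128, 192, 256):
        raise ValueError("key length must be 128, 192 or 256 bits")
    return hashlib.pbkdf2_hmac("sha256", vault_password.encode("utf-8"),
                               salt, rounds, dklen=key_bits // 8)

salt = os.urandom(32)                      # 32 bytes of random data
kek = derive_kek("vault password", salt)   # 256-bit KEK
```

- The salt and the number of rounds must be stored alongside the encrypted data, since both are needed to derive the same KEK again.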
- The encryption strategy can depend on a client's preferences. In the case of the segment level encryption strategy, the smallest amount of redundancy is introduced. However, this strategy allows only full segment READ and WRITE requests, since data may be encrypted and decrypted only by segments. If partial READ and WRITE requests are needed, then the chunk level encryption strategy can be employed. Observe that the latter strategy allows data to be read and written by chunks. The chunk level strategy with K individual keys provides a higher security level; however, it also introduces the highest redundancy.
- Upon execution of all steps represented in Fig. 16, one obtains an ordered list of encoded chunks 1611, where the local index of a package with an encoded chunk corresponds to the chunk position within a codeword of the employed error-correction code.
- Information about the encoding methods applied to the data segment, e.g., the number of encryption rounds and the erasure coding scheme, is included within related packages.
- erasure coding module operates with chunks of data. Each chunk can include a number of elements; this number is the same for all chunks. Operations performed on chunks can be parallelized, since the same computations should be performed for all elements of a chunk.
- An erasure coding scheme can be specified by a generator matrix G of the selected (N, K, D) error-correction code, where N is code length, K is code dimension and D is minimum distance.
- N codeword chunks can be obtained as result of multiplication of K information chunks by generator matrix G.
- K information chunks may be reconstructed from any subset of codeword chunks of cardinality at least N - D + 1.
- MDS: maximum distance separable, e.g., a Reed-Solomon code.
- codeword chunks can be divided into two groups: K mainstream chunks and N - K standby chunks.
- mainstream chunks are given by K codeword chunks, which provide low-complexity recovering of K information chunks.
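- The symbol-wise encoding and the recovery property can be illustrated with the simplest MDS code, a single-parity code with N = K + 1 and D = 2. This is a stand-in chosen for brevity, not the Reed-Solomon code mentioned in the description; the function names are hypothetical. The K information chunks serve as mainstream chunks and the parity chunk as the single standby chunk.

```python
from functools import reduce
from typing import Dict, List

def xor_chunks(chunks: List[bytes]) -> bytes:
    """XOR the i'th symbols (bytes) of all chunks into the i'th symbol of the result."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def encode(info: List[bytes]) -> List[bytes]:
    """K information chunks -> N = K + 1 codeword chunks (systematic, parity last)."""
    return list(info) + [xor_chunks(info)]

def decode(available: Dict[int, bytes], k: int) -> List[bytes]:
    """Recover the K information chunks from any N - D + 1 = K codeword chunks."""
    missing = [i for i in range(k) if i not in available]
    if not missing:
        return [available[i] for i in range(k)]      # mainstream chunks: no computation
    if len(missing) > 1 or k not in available:
        raise ValueError("too many erasures for a single-parity code")
    recovered = xor_chunks([available[i] for i in range(k + 1) if i in available])
    return [available.get(i, recovered) for i in range(k)]

info = [b"ab", b"cd", b"ef"]
codeword = encode(info)
restored = decode({0: codeword[0], 2: codeword[2], 3: codeword[3]}, k=3)
```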
- Fig. 17 is a schematic block diagram illustrating example network load balancing for transmission of a group of packages with encoded chunks produced from related segments.
- Network load balancing is optimized to reduce latency for READ and WRITE operations.
- Load balancing module 1716 constructs a mapping of packages with encoded chunks, produced from f segments, to storage nodes 1706. More particularly, a set of packages with encoded chunks produced from the i'th data segment is mapped to a set of storage nodes, in which 1 ≤ i ≤ f. Data segments are referred to as related if simultaneous READ requests for them are predicted.
- related data segments can be data segments produced from the same file (for large files) or from files located in the same folder (for small files).
- The index of data segment i is set to zero, and the amount of data being transferred, referred to as the traffic prediction, is also set to zero.
- steps 1704 and 1705 are alternately performed, while i ≤ f 1717, i.e., until all of the segments are processed.
- a mapping for i'th data segment is selected in such a way that weighted load is approximately the same for all storage nodes.
- M i,g is different.
- A weighted load for the g'th storage node is given by L_g/a[g], where a[g] is the availability coefficient for the g'th storage node.
- Availability coefficients for storage nodes 1702 and relevance coefficients for codeword positions 1703 are provided by the monitoring module and the configuration module; these coefficients are periodically updated.
- traffic prediction is updated according to the mapping for i'th data segment, i.e., counters of packages for storage nodes are increased.
- Computation of availability coefficients for storage nodes 1709 is based on system statistics 1707.
- System statistics is accumulated by monitoring module, which is distributed across the whole processing system. For example, average time between READ request sending moment (to a storage node) and package receiving moment is estimated. Similarly, for WRITE request average time between package sending moment (to a storage node) and acknowledge receiving moment is estimated. These time estimations are referred to herein, generally, as latency estimations for different requests 1711. Distribution of the number of requests over time 1712 is employed to identify groups of almost simultaneous requests for which network load should be optimized in the first place. Amount of transmitted data is measured in the case of traffic distribution 1713 analysis.
- The list of statistics utilized for computation of availability coefficients is not limited to 1711, 1712 and 1713, i.e., other available statistics may also be utilized.
- Computation of relevance coefficients for codeword positions 1710 is based on system statistics 1707 and configuration metadata 1708, provided by monitoring module and configuration module, respectively.
- Configuration metadata 1708 can be represented by erasure coding scheme 1715. This is usable to identify codeword positions, which are accessed in the first place upon a READ request.
- Relevance coefficients 1703 can be jointly optimized for different request types using probabilities of different requests 1714. More particularly, probabilities of different requests 1714 are employed as weighted coefficients in linear combination of relevance coefficients optimized for different requests.
- the network load can be balanced.
- there is also a reliability requirement such that none of storage nodes may receive more than one element of each codeword.
- Initialization of load balancer can include computation of relevance coefficients for codeword elements, in which codewords belong to a pre-selected code and computations are based on the analysis of pre-selected encoding scheme.
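- One way to realize such a mapping is a greedy assignment, sketched below. The sketch assumes that the load L_g of a node is the sum of relevance coefficients of the codeword positions mapped to it, which the description does not state explicitly; the function name `balance` and its parameters are hypothetical.

```python
from typing import List

def balance(f: int, relevance: List[float], availability: List[float]) -> List[List[int]]:
    """Map the N packages of each of f segments to distinct storage nodes.

    Returns, per segment, a list giving the node index for each codeword position.
    """
    n, nodes = len(relevance), len(availability)
    if n > nodes:
        raise ValueError("need at least one storage node per codeword position")
    load = [0.0] * nodes                 # traffic prediction, initially zero
    mapping: List[List[int]] = []
    for _ in range(f):
        assigned = [-1] * n
        used = set()                     # reliability: one codeword element per node
        for pos in sorted(range(n), key=lambda p: -relevance[p]):
            g = min((g for g in range(nodes) if g not in used),
                    key=lambda g: (load[g] + relevance[pos]) / availability[g])
            assigned[pos] = g
            used.add(g)
            load[g] += relevance[pos]    # update the traffic prediction
        mapping.append(assigned)
    return mapping

mapping = balance(4, relevance=[1.0, 1.0, 0.2], availability=[1.0, 1.0, 2.0, 1.0])
```

- Placing the most relevant positions first tends to keep the weighted loads L_g/a[g] close to each other, while the `used` set enforces that no node receives more than one element of a codeword.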
- Fig. 18 is a schematic illustration of an example server cluster cache and its respective environment.
- a server cluster 1801 can include a file system server (FSS) 1802 and a number of processing servers (PS) 1803.
- a cache located within the server cluster 1801 is considered as intermediate storage between a set of storage nodes and client applications.
- A server cluster cache is further referred to as a cache.
- a cache can be divided into a metadata cache 1805 and an object cache 1804, in which a metadata cache 1805 is usable to store file system metadata and an object cache 1804 is used to store objects, e.g., data segments.
- Metadata cache 1805 contains the latest version of a part of file system metadata.
- a full version of file system metadata is stored within storage nodes, and this full version is periodically updated using partial version from the metadata cache 1805. Different parts of file system metadata are transferred from storage nodes to metadata cache 1805 on demand.
- FIG. 19 illustrates components of an example cache located within each server cluster.
- a cache of a server cluster 1901 may comprise several storage devices 1907 and, more particularly, random-access memory (RAM) 1905 and a number of solid-state drives (SSD) 1906.
- Storage devices 1907 can be managed by a controller 1904.
- Cache controller 1904 can provide the following functionality: memory allocation, reading and writing by data blocks, block status management, free space management, garbage collection initiation.
- Request analysis and statistical data processing can be performed by analyzer 1902.
- Garbage collector 1903 uses information provided by analyzer to select blocks to be deleted, thereby organizing free space for new blocks.
- Cache for objects and cache for file system metadata are described separately, due to differing logical structures and functionality.
- An object cache can be employed as a data buffer.
- Data can be transferred in portions: in segments between storage nodes and the cache, and in sequences of small blocks between the cache and client applications. These small blocks are further referred to as r/w blocks, and their size depends on client applications.
- r/w blocks are produced by a file system, designed for block-level storage, so r/w block size is 4KB-512KB.
- the segment size corresponds to the block size for object-level storage. In the case of object-level storage large blocks are desired, such as 1MB-128MB. Large blocks can be referred to herein as compound blocks, since they are obtained from contiguous r/w blocks.
- File systems designed for block-level storage are referred to as block-based file systems.
- File systems designed for object-level storage are referred to as object-based file systems.
- Data within a cache may be modified by small r/w blocks, while data stored in the cloud, i.e., distributed over storage nodes, may be modified by compound blocks.
- Maximum access speed can be achieved if objects are kept in a cache as single pieces. In this case, throughput also increases because of reduced amount of metadata, and the size of other file system data structures decreases.
- the system operates with compound blocks, and a number of different sizes for compound blocks may be specified.
- a compound block size for an object can be selected depending on an object size and an object access pattern. In the case of dominating linear access pattern, large compound blocks may be more efficient, while in the case of dominating random access pattern, smaller compound blocks may be more practical.
- Analysis of operations over files produced by client applications can be performed. For example, file extensions are clustered into a number of categories, depending on the dominating access pattern, as a result of the analysis.
- access pattern is also useful for selection of prefetching strategy for a file, where access pattern is utilized to predict a set of compound blocks to be accessed with high probability in the near future.
- analysis of file sizes is performed. For example, distribution of the number of files with different sizes may be analyzed, as well as distribution of the total number of bytes within files with different sizes. The number of categories can be specified by a client or selected by the system automatically. In a simple case, only one compound block size is selected, so that all objects are divided into compound blocks of the same size.
- An obtained table of file extensions with associated compound block sizes can be kept as a part of system configuration metadata, and may be reconfigured upon an administrator's request. The choice of compound block size depends not only on file extension, but also on other file attributes, such as size.
- Fig. 20 shows an example logical structure of server cluster cache for objects according to the present invention.
- Cache logical structure comprises three levels: storage device level 2003 operating over bytes 2007, block-based file system level 2002 operating over r/w blocks 2006, e.g., 4KB-512KB, and object-based file system level 2001 operating over compound blocks 2005, e.g., 1MB-128MB, and regions 2004.
- Memory space is divided into regions 2004.
- Each region 2004 comprises compound blocks 2005 of selected size, where all compound blocks 2005 within a region 2004 have the same size. Observe that segment size for an object, is limited by the largest compound block size. Variety of compound block sizes provides opportunity to keep big objects in contiguous space, while preventing small objects from consuming too much space.
- Each compound block has a unique identifier consisting of region identifier and local identifier, where local identifier specifies compound block inside region. Local identifiers are used to track statuses of compound blocks within region, as well as to access data.
- each region has significance map 2011, where significance of a compound block depends on the last access time and statistics on files stored within the system. Compound blocks of high significance are treated as hot blocks, while blocks of low significance are treated as cold blocks.
- Map 2011 may be implemented together with bitmaps 2009 and 2010 as a status map, or separately from bitmaps.
- Each region 2004 has region head 2008 containing region parameters together with region summary, where region summary shows the number of free compound blocks within the region, the number of dirty blocks within the region and other statistics on the region.
- a file is stored within one region.
- a region may contain one file or several files.
- Region size is selected in order to provide fast memory access, easy memory management and minimize the total amount of metadata, while avoiding excessive segmentation of memory space.
- a new region with specified compound block size is initialized when required. Files are assigned to regions to provide compact data placement, i.e., to minimize the number of regions being employed. If region contains only free blocks, then this region is marked as free one, and it may be initialized for another compound block size.
- CBMD compound block metadata
- The CBMD concept is similar to the inode data structure in UNIX-style file systems.
- CBMD contains attributes of related file segment.
- File segment data 2013 is stored next to CBMD, so there is no need to store compound block location explicitly. If all file data segments are stored in a region, then this region also contains all necessary metadata to reconstruct the file. Thus, file may be retrieved even if file system metadata is unavailable.
- File identifier is also stored as a part of CBMD.
- CBMD may also contain free bit, dirty bit and status value, being updated each time when compound block is accessed.
- Compound block size can be selected to balance the following requirements: minimization of wasted memory space, minimization of the number of compound blocks to be transmitted, implementation simplicity.
- the first and the second requirements can be satisfied by using diversity of compound block sizes.
- this approach with diversity of sizes can hardly be efficiently implemented, and the probability of block selection mismatch increases (in the case of sequential READ requests).
- typical number of different compound block sizes recommended by the system is 1-4.
- File creation includes generation of corresponding metadata and metadata spreading over server clusters, more particularly, file system servers.
- File metadata includes file attributes, identifiers (e.g., file version), cloud location information (e.g., list of segments), file coding settings (compression settings, encryption settings, erasure coding scheme, etc.), file storage settings (e.g., flags for partial updates, intermediate storage settings) and file statistics (e.g., timestamps).
- File caching parameters include maximum number of compound blocks allocated for a file and a parallel transferring threshold.
- R/w block size may be also considered as a parameter, where r/w block size indicates granularity of data writes and reads.
- The parallel transferring threshold is equal to the number of recently updated (hot) compound blocks which must be sent only sequentially; other compound blocks of a given file may be sent in parallel. In the case of a file opening event, a fixed number of compound blocks is allocated. Compound blocks may be transferred to storage nodes by request or by timeout.
- Fig. 21 is a schematic block diagram illustrating example memory allocation for an object.
- the amount of allocated memory is given by data segment size 2101 supplemented by metadata.
- size of compound block is selected from a pre-defined set.
- compound block size equal to specified data segment size 2101 is selected.
- the nearest (fast accessible) region with free compound blocks of required size is selected. Recall that information on free blocks in a region is a part of region summary located in region head. So, there is no need to scan free CB bitmap. Then a free compound block within selected region is occupied at step 2105 and its status is changed at step 2106.
- Location of compound block 2107 is given by the address of selected region and address of the compound block within region.
- the first strategy consists in a selection of compound block size t-times smaller than the data segment size 2101, where t is as small as possible and t such blocks are available.
- The second strategy consists in using the garbage collector to arrange free compound blocks of data segment size 2101. However, the probability of free compound block absence is very low, since the processing system performs monitoring of free blocks and garbage collection on a regular basis.
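- The allocation logic, including the first fallback strategy, can be sketched as follows. The `Region` class, the assumption that all t blocks come from a single region, and the ordering of `regions` from nearest to farthest are illustrative choices not fixed by the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Region:
    region_id: int
    block_size: int    # compound block size in bytes
    free_blocks: int   # read from the region summary in the region head

def allocate(segment_size: int, regions: List[Region]) -> Optional[Tuple[int, int]]:
    """Return (region_id, t): t blocks, each t-times smaller than the segment, or None."""
    candidates = []
    for region in regions:                       # assumed ordered nearest first
        if region.block_size <= segment_size and segment_size % region.block_size == 0:
            t = segment_size // region.block_size
            if region.free_blocks >= t:
                candidates.append((t, region))
    if not candidates:
        return None                              # second strategy: run garbage collection
    t, region = min(candidates, key=lambda c: c[0])   # t = 1 is an exact-size block
    region.free_blocks -= t                      # occupy blocks and update the summary
    return region.region_id, t

MB = 1 << 20
regions = [Region(0, 1 * MB, 5), Region(1, 4 * MB, 0), Region(2, 2 * MB, 3)]
first = allocate(4 * MB, regions)
second = allocate(4 * MB, regions)
third = allocate(4 * MB, regions)
```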
- Fig. 22 is a schematic block diagram illustrating an example removal of obsolete objects from the server cluster cache for objects in order to arrange free space for new objects. This procedure is referred to as garbage collection.
- Garbage collection comprises three processes: free space monitoring, relocation process and removal process.
- the problem of garbage identification reduces to estimating probability for each compound block to be accessed. Compound blocks marked as garbage are later deleted.
- The free space monitoring process, arranged as a background process, estimates the number of free compound blocks within each region and analyses the segmentation level. The segmentation level of a region depends on the location of free compound blocks. More particularly, if free compound blocks are located in a contiguous manner, then the segmentation level is low; if almost all free compound blocks are separated by utilized compound blocks, then the segmentation level is high. Depending on segmentation level estimates, relocation processes may be started.
- The monitoring process either uses the region summary located within the region head, or scans the free compound block bitmap and updates the region summary. Region summary information is utilized if it was recently updated; otherwise the free compound block bitmap is used. The number of free compound blocks in all regions is analyzed depending on compound block size. If the number of free compound blocks of a particular size in initialized regions is lower than a pre-defined threshold, and there is no memory space to arrange a new region, then the removal process is started for regions with compound blocks of the specified size.
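- The description does not give a formula for the segmentation level. One possible measure, shown here purely as an assumption, is the number of contiguous runs of free blocks divided by the number of free blocks, computed from the free compound block bitmap.

```python
from typing import List

def segmentation_level(free_bitmap: List[bool]) -> float:
    """Return a value in (0, 1]: low when free blocks are contiguous, 1.0 when all are isolated.

    Returns 0.0 for a region without free blocks.
    """
    free = sum(free_bitmap)
    if free == 0:
        return 0.0
    # A run starts at a free block whose predecessor is absent or utilized.
    runs = sum(1 for i, bit in enumerate(free_bitmap)
               if bit and (i == 0 or not free_bitmap[i - 1]))
    return runs / free

contiguous = segmentation_level([True, True, True, False, False])
scattered = segmentation_level([True, False, True, False, True])
```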
- Removal process for a specified compound block size 2201 proceeds as follows. At first regions with compound blocks of size 2201 are identified at step 2203. Further steps correspond to separate processing of these regions. Thus, these regions may be processed sequentially or in parallel. In step 2204 information on the last access time for each utilized compound block within region is accessed. At step 2205 significance of each compound block is estimated. In general case significances of related compound blocks are jointly computed, since these significances depend on distribution of the last access time (for these related compound blocks) and system statistics 2208. Typically related compound blocks are given by compound blocks produced from the same file. Significance computation method is employed with different parameters for files of different types. Choice of parameters for a file type mostly depends on data access pattern 2209 dominating for this file type. Other statistics 2210 may be also utilized.
- such parameters for significance computation method are selected, that joint computations for related compound blocks may be reduced to independent computations for each compound block.
- Computed significances are written into compound block significance map. Steps 2204 and 2205 may be skipped if compound blocks significance map was recently updated. At step 2206 less significant compound blocks are removed from server cluster cache for objects. Observe that compound blocks marked as dirty may not be deleted.
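- A minimal sketch of the removal step for the simplified case in which significances are computed independently per compound block. The exponential decay of significance with time since last access and the `half_life` parameter are assumptions; the description only states that significance depends on the last access time and on system statistics.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    local_id: int
    last_access: float   # timestamp of the last access, in seconds
    dirty: bool = False  # dirty blocks hold data not yet sent to storage nodes

def significance(block: Block, now: float, half_life: float = 300.0) -> float:
    """Hot blocks score near 1.0; the score halves every half_life seconds without access."""
    return 0.5 ** ((now - block.last_access) / half_life)

def select_for_removal(blocks: List[Block], now: float, count: int) -> List[int]:
    """Pick the `count` least significant blocks, never touching dirty ones."""
    clean = [b for b in blocks if not b.dirty]
    clean.sort(key=lambda b: significance(b, now))
    return [b.local_id for b in clean[:count]]

blocks = [Block(0, 100.0), Block(1, 10.0, dirty=True), Block(2, 50.0), Block(3, 90.0)]
victims = select_for_removal(blocks, now=200.0, count=2)
```

- Block 1 is the coldest but is skipped because it is dirty, in line with the rule that dirty compound blocks may not be deleted.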
- Metadata can be cached within server clusters.
- distributed storage system may comprise one or several server clusters. There is one file system server containing metadata per each server cluster. File system servers store identical copies of metadata.
- Fig. 23 illustrates an example of file representation within the distributed storage system.
- Original file data is represented by chunks of encoded data segment of the file 2315, while original file metadata is given by file attributes 2303 and it is also partially kept within package metadata 2314.
- Other entities are classified as system metadata, which is relevant only within the object-based distributed storage system.
- File data and metadata are distributed over storage nodes and may be requested using references.
- a reference can include prefix, type, HASH and identifier.
- The first part of a reference, i.e., the prefix, identifies a particular instance of the distributed storage system, a vault within the system and a volume within the vault.
- the second part of reference is type of the content stored by this reference.
- type shows whether content is represented by data or metadata, e.g., file root metadata or segment metadata.
- Type also shows whether content is related to a logical file, comprising a number of small files, or to a typical file.
- the third part of reference is HASH of the content stored by this reference.
- the fourth part of reference is any identifier, which is able to guarantee reference uniqueness. This identifier may be randomly generated.
- HASH value is employed for deduplication and for integrity check.
- corresponding references contain the same HASH and almost certainly the same prefix and type.
- content comparison is performed, and in the case of coincidence, one reference is replaced by the second one and content, related to the first reference, is removed from the system.
- deduplication is performed separately for each volume.
- A file root metadata reference does not depend on file content, i.e., it does not contain a HASH.
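- The four-part reference can be modeled as a small record, as sketched below. SHA-256, the UUID-based identifier and the type labels (`data`, `file_root_md`) are hypothetical choices; the description does not name a hash function or an encoding for the reference.

```python
import hashlib
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Reference:
    prefix: str        # instance / vault / volume
    type: str          # e.g. "data" or "file_root_md" (hypothetical labels)
    content_hash: str  # empty for file root metadata references
    identifier: str    # random part that guarantees uniqueness

def make_reference(prefix: str, ref_type: str, content: bytes) -> Reference:
    digest = "" if ref_type == "file_root_md" else hashlib.sha256(content).hexdigest()
    return Reference(prefix, ref_type, digest, uuid.uuid4().hex)

def may_be_duplicates(a: Reference, b: Reference) -> bool:
    """Same volume, type and HASH: candidates that still need a content comparison."""
    return bool(a.content_hash) and \
        (a.prefix, a.type, a.content_hash) == (b.prefix, b.type, b.content_hash)

r1 = make_reference("sys1/vaultA/vol0", "data", b"same bytes")
r2 = make_reference("sys1/vaultA/vol0", "data", b"same bytes")
root = make_reference("sys1/vaultA/vol0", "file_root_md", b"")
```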
- DR data reference
- MDR metadata reference
- DR is a reference to encoded original data
- MDR is a reference to metadata containing DRs and/or MDRs.
- MDRs and DRs of a file are arranged into a tree, referred to as the file reference tree.
- file reference tree contains three levels, where the first level is represented by file root MDR 2301, the second level is represented by segment MDRs 2306 and the third level is represented by DRs 2312.
- the number of levels may be different, while leaves are always given by DRs.
- In the case of large files the number of levels may be increased, while in the case of small files it may be decreased.
- File root MDR is a special MDR, providing opportunity to iteratively retrieve all file data and metadata from storage nodes.
- File root MDR is unique for each file.
- File root metadata 2302 accessible by file root MDR 2301, includes file attributes 2303, file size 2311 and a list of segments, where each segment within the list is specified by its index 2304, i.e., position within the file, subset of storage nodes (SSN) 2305 and segment MDR 2306.
- indices 2304 are required to recover a file from data segments.
- Segment metadata 2307 may be transferred using corresponding segment MDR 2306 from any storage node belonging to SSN 2305. If segment metadata 2307 is t-times replicated, then corresponding SSN 2305 includes t storage nodes.
- Segment metadata 2307 can include segment size 2308, segment HASH 2309, encoding parameters 2310 and a list of packages with encoded chunks produced from the segment.
- a location of a package is specified by DR 2312 and SSN, as in the case of segment metadata location.
- SSN for a package typically consists of one storage node, since data segment is erasure coded and no additional replication is required.
- Index shows a local position of a package with encoded chunk, i.e., the position of encoded chunk within codeword of employed error-correcting code.
- The erasure coding scheme (e.g., the error-correcting code), the encryption method and the compression method are specified in encoding parameters 2310.
- Package 2313, accessible by corresponding DR 2312, includes metadata 2314 and a chunk of encoded data segment of the file 2315.
- metadata 2314 may include metadata for the related chunk 2315, as well as metadata for the file in general.
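- Iterative retrieval through a three-level file reference tree can be sketched as follows. The storage nodes are modeled as a single dictionary keyed by reference, and the field names `segments` and `packages` are hypothetical.

```python
from typing import Dict, List

def collect_drs(root_mdr: str, store: Dict[str, dict]) -> List[str]:
    """Walk file root MDR -> segment MDRs -> DRs and return DRs in file order."""
    root_metadata = store[root_mdr]
    drs: List[str] = []
    # Segment indices give the position within the file, so sort by index first.
    for _index, segment_mdr in sorted(root_metadata["segments"]):
        drs.extend(store[segment_mdr]["packages"])   # ordered by codeword position
    return drs

store = {
    "root": {"size": 8, "segments": [(1, "seg1"), (0, "seg0")]},
    "seg0": {"packages": ["dr0", "dr1", "dr2"]},
    "seg1": {"packages": ["dr3", "dr4", "dr5"]},
}
ordered = collect_drs("root", store)
```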
- Fig. 24 shows an example of logical file metadata.
- Logical file root MDR 2401 is needed for iterative retrieval of logical file together with corresponding metadata and references from storage nodes.
- Distributed storage system operates with logical file root metadata 2402 in the same way as with file root metadata, represented at Fig. 23.
- Logical file root metadata 2402, stored under reference 2401 can include common metadata and separate metadata for each embedded file.
- Common metadata comprises attributes of the logical file 2403, being similar to attributes of a typical file, size of the logical file 2404, small files embedding scheme 2405 and segment MDR 2406.
- Structure of segment metadata, stored under segment MDR 2406, is represented at Fig. 23.
- Logical file is always packed within one segment, so 2402 contains only one segment MDR.
- The size of logical file 2404 should not exceed the size of the corresponding segment. If a data piece appended to one of the embedded files makes the logical file no longer fit in the data segment, then the initial logical file may be rearranged into two logical/typical files.
- Each embedded file is represented within logical file root metadata 2402 by file ID 2407, file status 2408, file offset 2409 and file metadata 2410.
- File ID 2407 helps to retrieve data for a particular embedded file.
- File status 2408 shows whether an embedded file is active or deleted.
- File offset 2409 shows location of embedded file data within data segment.
- File metadata 2410 is the metadata of a particular embedded file; this metadata is independent of the scheme. If a logical file is rearranged, then file metadata 2410 is simply copied into the new logical file. There are two main reasons for logical file rearrangement: garbage collection (for embedded files with status "deleted") and logical file size exceeding segment size.
- File metadata 2410 can include file attributes 2411, file size 2412, file HASH 2413 and encoding parameters 2414.
- An embedded file may be compressed and/or encrypted prior to being combined with other files, wherein this initial encoding is described by encoding parameters 2414. This means that all steps shown in Fig. 16, except erasure coding 1610, may be individually applied to each embedded file, and then the same steps (with other parameters) are applied to the data segment produced from the logical file.
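The bookkeeping described for Fig. 24 can be illustrated with a minimal Python sketch. It assumes a simplified model in which embedded files are laid out contiguously in one segment; the class and method names (`EmbeddedFile`, `LogicalFileRoot`, `append`, `rearrange`) are hypothetical and do not appear in the specification, and only the metadata side of rearrangement is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EmbeddedFile:
    file_id: str   # file ID 2407
    status: str    # file status 2408: "active" or "deleted"
    offset: int    # file offset 2409 inside the data segment
    size: int      # file size 2412


@dataclass
class LogicalFileRoot:
    segment_size: int  # capacity of the single segment (assumed fixed)
    files: Dict[str, EmbeddedFile] = field(default_factory=dict)

    def used(self) -> int:
        # Deleted files still occupy space until the logical file is rearranged.
        return sum(f.size for f in self.files.values())

    def append(self, file_id: str, size: int) -> bool:
        """Embed a new file; return False if the segment would overflow."""
        offset = self.used()
        if offset + size > self.segment_size:
            return False
        self.files[file_id] = EmbeddedFile(file_id, "active", offset, size)
        return True

    def delete(self, file_id: str) -> None:
        self.files[file_id].status = "deleted"

    def locate(self, file_id: str) -> Optional[slice]:
        """Byte range of an active embedded file within the segment."""
        f = self.files.get(file_id)
        if f is None or f.status != "active":
            return None
        return slice(f.offset, f.offset + f.size)

    def rearrange(self) -> "LogicalFileRoot":
        """Garbage collection: keep active files only, with fresh offsets."""
        fresh = LogicalFileRoot(self.segment_size)
        for f in self.files.values():
            if f.status == "active":
                fresh.append(f.file_id, f.size)
        return fresh
```

In this sketch a failed `append` signals that the logical file would exceed the segment, which is one of the two rearrangement triggers named above; the other trigger, garbage collection, is modeled by `rearrange`.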
- The object-based file system keeps pairs, each consisting of a file ID and a file root MDR. Pairs are organized into logical tables, and the system can operate on them as on key-value pairs.
- Logical tables are arranged as a B+ tree or another structure designed for efficient data retrieval in a block-oriented storage context. The frequently used part of the logical tables is cached on the client side, where it is arranged as a data structure for maintaining key-value pairs under conditions of high insert volume, e.g., a log-structured merge (LSM) tree.
- Logical tables distributed over storage nodes can be updated by timeout or if the amount of changes exceeds a threshold (i.e., in the case of high workload).
- Logical tables can be partitioned into segments, which are encoded into chunks encapsulated into a package and distributed over storage nodes. Erasure coding or replication is employed to guarantee protection against storage nodes failures and to provide high speed retrieval even in the case of storage node outages. Partition scheme for logical tables is also packed in encoded chunks encapsulated in packages. Segments of logical tables are retrieved on demand and cached on the client side. Partition scheme is optimized in order to minimize the number of packages being transferred from storage nodes, that is to maximize probability that simultaneously needed parts of the tree are packed together.
- Fig. 25 is a schematic block diagram illustrating an example selection of a data representation structure for file system management within server cluster cache for metadata.
- A data structure can be selected based on analysis of a particular distributed storage system. Distribution of the number of requests over time 2505, being a part of system statistics 2501, can be employed by strategy advisor 2503 to identify predominant operations and operations performed with high frequency over some periods of time. Traffic distribution over time 2506 represents the amount of data processed by the system when performing various operations. Thus, distribution of the number of requests over time 2505 characterizes the intensity of various operations, while traffic distribution over time 2506 characterizes their significance.
- System infrastructure 2502 is also of great importance, so hardware specification 2507 is also utilized by strategy advisor 2503.
- LSM tree is an appropriate structure, since it is designed specially for the case of intensive insertions.
- Search over an LSM tree may not be as fast as over a B+ tree, so lower latency for read operations is provided by B+ tree 2510.
- A more general data structure than a tree may be needed to meet some functional requirements; for these cases directed graph 2511, e.g., with cycles, is employed.
- Leveled LSM tree with its performance characteristics stands between B+ tree and size-tiered LSM tree.
- Selected strategy 2512 is utilized by object-based file system 2514. Efficiency estimates 2513 for the strategy 2512 are provided to be compared with statistical data, which will be further obtained during system lifetime according to selected strategy 2512.
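The selection logic attributed to strategy advisor 2503 can be sketched as a simple rule. This is only an illustration under stated assumptions: the function name `select_structure`, the request counters and the threshold `heavy_ratio` are hypothetical, and a real advisor would also weigh traffic distribution 2506 and hardware specification 2507.

```python
def select_structure(read_requests: int,
                     write_requests: int,
                     needs_general_graph: bool = False,
                     heavy_ratio: float = 2.0) -> str:
    """Pick a metadata structure from request counts (illustrative heuristic)."""
    if needs_general_graph:
        # Functional requirements that a tree cannot satisfy.
        return "directed graph"
    if write_requests >= heavy_ratio * read_requests:
        # Intensive insertions favor an LSM tree.
        return "size-tiered LSM tree"
    if read_requests >= heavy_ratio * write_requests:
        # Read latency matters most, so prefer the B+ tree.
        return "B+ tree"
    # Mixed workload: the leveled LSM tree sits between the two.
    return "leveled LSM tree"
```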
- Systems and methods to encode data, retrieve and repair data, as well as to distribute data over storage nodes are provided. Proposed methods are optimized for the needs of cloud storage systems, e.g., security is provided via information dispersal. High storage efficiency can be provided by the selected construction of error correcting code. Low latency is guaranteed by network load balancing, low complexity encoding and small I/O overhead in the case of repair.
- Fig. 26 shows modules of an example system 2601, arranged to execute data encoding and decoding using an error-correcting code.
- the system 2601 comprises five modules, arranged to execute data processing, which are managed by a control module 2603 according to system configuration provided by configuration module 2602.
- Configuration module 2602 keeps specification of the employed error-correcting code together with encoding and decoding settings.
- Specification of an error-correcting code includes at least code parameters; if an error-correcting code is based on a number of component codes, then the code specification also comprises specifications of component codes and their composition scheme.
- Fragmentation module 2604 performs partition of original data into segments. Each segment is processed in an individual manner. Segments may have different sizes or the same size; segment size depends on the system requirements.
- Encoding module 2605 performs encoding of segments with error-correcting code. Chunks of encoded data are assigned to storage nodes by load balancing module 2606 to provide low latency data retrieval.
- Retrieval module 2607 performs reconstruction of original data from a sufficient number of encoded chunks, downloaded from storage nodes. In the case of storage node failure an old storage node is replaced by a new one, and repairing module 2608 is employed to reconstruct data within a new storage node.
- Fig. 27 is a schematic block diagram illustrating an example interrelationship of modules of a system, arranged to execute error-correction coding, and environment of the system.
- Original data 2702 is encoded and distributed over storage nodes 2703 by the system 2701, as well as retrieved from storage nodes 2703 on demand.
- Original data 2702 received by the system 2701, is partitioned into segments by fragmentation module 2704. Then a number of encoded chunks are generated for each segment by encoding module 2705.
- Load balancing module 2706 generates mapping of encoded chunks to storage nodes, where mapping is optimized to reduce average latency for segment retrieval operation. Encoded chunks are further transmitted to storage nodes 2703 according to the mapping. Storage nodes 2703 work independently from each other and may be located in different areas. Observe that mapping may be generated by load balancing module 2706 in parallel with data encoding by module 2705, or mapping generation may be combined with transmission of encoded chunks to storage nodes 2703.
- original data 2702 is reconstructed from encoded chunks distributed over storage nodes 2703 as follows.
- Load balancing module 2706 selects several storage nodes, from which encoded chunks are downloaded for a particular segment.
- Retrieval module 2707 performs reconstruction of a segment from a sufficient number of encoded chunks.
- MDS maximum distance separable
- repairing module 2708 is employed to reconstruct data within a new storage node. Repairing module 2708 is able to reconstruct lost encoded chunks from a sufficient number of available encoded chunks, produced from the same segment.
- Fig. 28 shows a flow diagram of example steps executed within encoding module 3801. The case of full block encoding is considered. Prior to actual encoding, a block of original data 3802 is divided into K chunks and encoding results in a codeword 3803, consisting of N encoded chunks, where K is dimension of an error-correcting code and N is its length. Chunks of block 3802 and encoded chunks of codeword 3803 have the same size. During encoding each chunk is considered as a sequence of elements and i 'th elements of K chunks of block 3802 are encoded into i 'th elements of N chunks of codeword, where element size is defined by error-correcting code parameters.
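The chunking step that precedes full block encoding can be sketched as follows. The sketch assumes byte-sized elements and zero padding of the last chunk, neither of which is stated in the text; `split_block` and `element_columns` are hypothetical helper names.

```python
from typing import List, Tuple


def split_block(block: bytes, k: int) -> List[bytes]:
    """Divide a block into K chunks of equal size, zero-padding the tail."""
    chunk_size = -(-len(block) // k)  # ceiling division
    padded = block.ljust(chunk_size * k, b"\0")
    return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]


def element_columns(chunks: List[bytes]) -> List[Tuple[int, ...]]:
    """Group the i'th elements of all chunks; each group is encoded together."""
    return list(zip(*chunks))
```

Each tuple returned by `element_columns` holds the K elements that would be encoded into the i'th elements of the N encoded chunks.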
- An error-correcting code may be specified by any of its $K \times N$ generator matrices. Security is guaranteed by encoding with a generator matrix $G$ such that any chunk of original data can be reconstructed only from at least $s$ encoded chunks, where $s$ is referred to as the mixing degree. Thus, even if up to $s - 1$ storage nodes are compromised, a malicious adversary will not be able to recover original data.
- $G = M G^{(\mathrm{syst})}$, where:
- $M$ is a $K \times K$ non-singular matrix;
- $G^{(\mathrm{syst})}$ is a generator matrix for systematic encoding, i.e., $K$ columns of matrix $G^{(\mathrm{syst})}$ constitute an identity matrix; indices of these columns are further referred to as systematic positions;
- the mixing degree of matrix $M$ is at least $s$;
- matrix $M$ is further referred to as the mixing matrix.
- The codeword $c$ comprises $K$ elements of the mixed vector $x^{(\mathrm{mix})}$, further referred to as mainstream elements, while the other $N - K$ codeword elements are computed at step 3805; these $N - K$ elements are further referred to as standby elements.
- Multiplication of the mixed vector $x^{(\mathrm{mix})}$ by the $K \times (N - K)$ submatrix $R$ of generator matrix $G^{(\mathrm{syst})}$ is performed, wherein submatrix $R$ can include the $N - K$ columns at non-systematic positions.
- Codeword elements are classified into mainstream elements and standby elements in order to arrange low complexity retrieval of original data. More particularly, mixing matrix $M$ is optimized to ensure that $K$ original data elements may be reconstructed from $K$ mainstream elements with low computational complexity. Moreover, if mixing degree $s < K$ and partial retrieval is supported by the system, then mixing matrix $M$ is further optimized to ensure that individual original data elements may be reconstructed from the smallest possible number of mainstream elements. Observe that this number cannot be lower than $s$. Thus, mainstream elements are requested from storage nodes with priority.
- Original data elements are reconstructed from mainstream elements as the result of multiplication by matrix $M^{-1}$, where $M^{-1}$ is the inverse matrix of $M$.
- The number of zeros and ones within matrices $M$ and $M^{-1}$ is maximized.
- The number of non-zero elements in each column of matrix $M^{-1}$ must be at least $s$, since matrix $M$ has mixing degree $s$.
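Encoding with a mixing matrix and retrieval from mainstream elements can be illustrated with a toy example. The sketch below assumes arithmetic in the prime field GF(7), a (5, 3) code with systematic positions 0 to 2, and small hand-picked matrices `M` and `R`; none of these values come from the specification, and the helper names are hypothetical.

```python
from typing import List

P = 7  # assumed field size for the toy example

# Assumed 3 x 3 mixing matrix and its inverse over GF(7).
M = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
M_INV = [[4, 3, 4],
         [4, 4, 3],
         [3, 4, 4]]
# Assumed K x (N - K) submatrix R holding the non-systematic columns.
R = [[1, 1],
     [1, 2],
     [1, 3]]


def vec_mat(v: List[int], m: List[List[int]]) -> List[int]:
    """Row vector times matrix over GF(P)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) % P
            for j in range(len(m[0]))]


def encode(x: List[int]) -> List[int]:
    """Mix the data, then append standby elements computed from R."""
    x_mix = vec_mat(x, M)           # mainstream elements
    standby = vec_mat(x_mix, R)     # N - K standby elements
    return x_mix + standby


def retrieve(mainstream: List[int]) -> List[int]:
    """Reconstruct original data from the K mainstream elements."""
    return vec_mat(mainstream, M_INV)
```

Every column of `M_INV` here has three non-zero entries, so in this toy setting each original element depends on all three mainstream elements.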
- Permutation matrix $P^{(\mathrm{left})}$ defines a permutation within the original data vector $x$.
- Permutation matrix $P^{(\mathrm{right})}$ defines a permutation within the mixed data vector $x^{(\mathrm{mix})}$.
- The number of codeword elements to be updated is upper bounded by the sum of the number of standby elements and the dimension of matrix $S_i$.
- The block-diagonal structure of matrices $S$ and $S^{-1}$ ensures low computational complexity for encoding and retrieval of original data.
- Fig. 29 is a schematic block diagram illustrating design of an error-correcting code according to an example implementation of the present application. Specifications of the error-correcting code and corresponding mixing matrix are further utilized by other modules of the system.
- Configuration module 2901 receives code dimension K and code length N ⁇ K as input arguments 2902.
- Code C comprises a number of component codes, wherein component codes are classified into two categories: outer codes and inner codes. Lengths of outer codes are divisible by n , while inner codes have the same length t .
- The structure of code $C$ is such that decoding in code $C$ reduces to decoding in its component codes. For example, any single erasure may be recovered in a code of length $t$, so values of no more than $t - 1$ elements are required. Particular values of $n$ and $t$ may be received as input arguments 2902 together with $N$ and $K$, or selected by the configuration module 2901 at step 2904 together with length multipliers $b_0, \ldots, b_{h-1}$, where $h$ is the number of outer codes. Lengths of the $h$ outer codes are given by $n b_0, \ldots, n b_{h-1}$.
- Observe that only code $C_0^{\mathrm{inner}}$ will be employed for encoding, while the whole set of inner codes is utilized for data retrieval and repair.
- Code $C$ is specified by its generator matrix $G^{(\mathrm{init})}$ obtained from matrices $C_0^{\mathrm{inner}}$ and $G_i^{\mathrm{outer}}$, $0 \le i < u$, as follows.
- At step 2908, systematic generator matrix $G^{(\mathrm{syst})}$ is obtained from $G^{(\mathrm{inner})}$. More particularly, linear operations over rows of the $K \times N$ matrix $G^{(\mathrm{inner})}$ are employed in order to obtain a $K \times N$ matrix $G^{(\mathrm{syst})}$ comprising a $K \times K$ identity matrix. According to one implementation, Gaussian elimination is utilized.
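The row reduction mentioned at step 2908 can be sketched as Gauss-Jordan elimination over a prime field. The field GF(p), the function name `to_systematic` and the left-to-right choice of pivot columns are assumptions made for illustration; the specification does not fix the field or the pivoting order.

```python
from typing import List, Tuple


def to_systematic(g: List[List[int]], p: int) -> Tuple[List[List[int]], List[int]]:
    """Row-reduce a K x N matrix over GF(p); return it with its systematic positions."""
    rows = [[x % p for x in row] for row in g]
    k, n = len(rows), len(rows[0])
    positions: List[int] = []
    r = 0
    for col in range(n):
        if r == k:
            break
        pivot = next((i for i in range(r, k) if rows[i][col]), None)
        if pivot is None:
            continue  # no pivot in this column, try the next one
        rows[r], rows[pivot] = rows[pivot], rows[r]
        inv = pow(rows[r][col], -1, p)  # modular inverse (Python 3.8+)
        rows[r] = [x * inv % p for x in rows[r]]
        for i in range(k):
            if i != r and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[r])]
        positions.append(col)
        r += 1
    if r < k:
        raise ValueError("matrix is not of full rank")
    return rows, positions
```

The returned column indices play the role of the systematic positions: the reduced matrix restricted to them is the identity.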
- Generator matrix G specifies encoding scheme for designed error-correcting code C of dimension K and length N .
- Obtained generator matrix G comprising mixing matrix M as submatrix, is further utilized by encoding module 2912, retrieval module 2913 and load balancing module 2914.
- Generator matrix G ( init ) together with generator matrices for inner and outer codes are utilized by repairing module 2911, wherein repairing module 2911 performs recovering of codeword elements by decoding in code C assuming that generator matrix G ( init ) was used for encoding. All code specifications 2910 are also stored within configuration module.
- The minimum distance of inner codes is at least 2, so inner code $C_0^{\mathrm{inner}}$ is able to recover at least one erasure. From the structure of matrix $G^{(\mathrm{init})}$ it can be seen that a codeword $c$ of the code $C$ can include $n$ codewords of the $(t, u < t, w_0 \ge 2)$ code $C_0^{\mathrm{inner}}$, so any element $c_i$ can be expressed as a linear combination of at most $t - 1$ other elements $c_j$, $0 \le j < N$, $0 \le i < N$. Thus, any erased codeword element may be locally decoded using at most $t - 1$ other codeword elements, provided that their values are available.
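Local decoding of one erased element can be illustrated with the simplest code of minimum distance 2. The sketch assumes a single parity check code over bytes (XOR parity) as a stand-in for an inner code; the actual inner code of the specification is not given here, and `add_parity` and `repair_one` are hypothetical names.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def add_parity(data: List[int]) -> List[int]:
    """Build a codeword of length t = len(data) + 1 with XOR parity."""
    return data + [reduce(xor, data, 0)]


def repair_one(codeword: List[Optional[int]]) -> List[int]:
    """Recover at most one erased element (None) from the other t - 1."""
    erased = [i for i, v in enumerate(codeword) if v is None]
    if len(erased) > 1:
        raise ValueError("a distance-2 code recovers only one erasure")
    fixed = list(codeword)
    if erased:
        # The erased element equals the XOR of all remaining elements.
        fixed[erased[0]] = reduce(xor, (v for v in codeword if v is not None), 0)
    return fixed
```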
- An $a \times b$ matrix is referred to as a matrix in systematic form when it contains an $a \times a$ identity matrix as a submatrix, $a \le b$, and the set of column indices containing this submatrix is referred to as a set of systematic positions.
- Let $a_0^{\mathrm{inner}}$ be some set of systematic positions for the code $C_0^{\mathrm{inner}}$.
- $G_0^{\mathrm{inner,\,syst}} = L^{(\mathrm{inner})} \cdot G_0^{\mathrm{inner}}$, where
- $L^{(\mathrm{inner})}$ is a non-singular matrix.
- Fig. 30 shows a flow diagram of steps executed within encoding module in the case of system supporting both full block WRITE and part block WRITE requests.
- encoding module 3001 selects encoding strategy. If a full block is received, the only possible strategy is full block encoding, comprising data mixing step 3005 and redundancy computation step 3006. An output of step 3006 is a whole codeword 3003. If a part of block is received, then choice of strategy 3007 is made depending on the number of original data elements within partial block and their positions. More particularly, a strategy at step 3007 is selected to minimize network traffic, i.e., the number of encoded chunks being transferred between storage nodes and client side.
- If the full block encoding strategy appears to be more efficient, then missing elements of the partial block are reconstructed at step 3008. If the partial block encoding strategy is preferable, then at step 3009 the difference between the received partial block and the initial one is encoded and the initial codeword is updated according to the encoding result, wherein the initial codeword is downloaded from storage nodes and the initial block of original data is reconstructed from the initial codeword. An output of step 3009 is a partial codeword 3003.
- Fig. 31 shows a flow diagram of steps executed to update a few elements within a block of original data, wherein partial encoding 3101 is employed in order to update codeword elements.
- Encoding 3101 of partial block of original data 3102 results in partial codeword 3103.
- Obsolete elements of the block 3107 $x^{(\mathrm{old})}$ may be recovered on demand; however, their values are usually pre-fetched by the system.
- The structure of the generator matrix $G$ of the selected error-correcting code is such that if vector $x^{(\mathrm{XOR})}$ contains only a few non-zero elements, then the obtained vector $c^{(\mathrm{XOR})}$ also contains only a few non-zero elements.
- Partial block encoding is employed only to update a few elements within a block, so original data difference vector x ( XOR ) always contains only a few non-zero elements. Thus, the number of codeword elements to be updated and further transmitted to storage nodes is small.
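The difference-based update can be sketched for a binary linear code, where linearity gives new codeword = old codeword XOR encoding of the data difference. The 3 x 5 generator matrix below and the byte-sized elements are assumptions chosen for brevity (no mixing matrix is modeled); the function names are hypothetical.

```python
from typing import Dict, List

# Assumed sparse binary generator matrix of a toy (5, 3) code.
G = [[1, 0, 0, 1, 0],
     [0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1]]


def encode(x: List[int]) -> List[int]:
    """Encode byte elements; addition is XOR, matrix entries are 0 or 1."""
    out = []
    for j in range(len(G[0])):
        acc = 0
        for i, xi in enumerate(x):
            if G[i][j]:
                acc ^= xi
        out.append(acc)
    return out


def codeword_delta(x_old: List[int], x_new: List[int]) -> Dict[int, int]:
    """Encode the data difference; keep only positions that change."""
    x_xor = [a ^ b for a, b in zip(x_old, x_new)]
    c_xor = encode(x_xor)
    return {j: d for j, d in enumerate(c_xor) if d}


def apply_delta(c_old: List[int], delta: Dict[int, int]) -> List[int]:
    """Update the stored codeword with the sparse difference."""
    return [c ^ delta.get(j, 0) for j, c in enumerate(c_old)]
```

Only the positions present in the returned dictionary would need to be transmitted to storage nodes, which is why a sparse generator matrix keeps partial updates cheap.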
- Fig. 32 is a schematic block diagram illustrating initialization of load balancing module and steps performed to map encoded data to storage nodes.
- Load balancing module 3201 can include two components, wherein the first component performs initialization 3202 and the second component computes mapping 3203.
- Initialization of load balancing module 3201 consists in computing relevance coefficients 3205 for the N positions of codeword elements, wherein codewords belong to a pre-selected (N, K) error-correcting code and the computations are based on analysis of the pre-selected encoding scheme 3204 for this code.
- Initialization component 3202 receives encoding scheme 3204, e.g., generator matrix G of the pre-selected code, as input argument.
- $P^{(\mathrm{part\ WRITE})} = 0$ and the relative average number of WRITE requests is the same for all codeword elements.
- The $K \times K$ matrix $M$ can include columns of matrix $G$ corresponding to mainstream elements, i.e., $G_{-,i}$ with $i \in A$.
- $\mathrm{wt}(M^{-1}[-, m_i])$ is the number of non-zero elements within the $m_i$'th column of the inverse of matrix $M$.
- $(1 - P^{(\mathrm{repair})})$ is the probability of a READ operation for the purpose of information retrieval when storage nodes corresponding to elements of the mainstream group are available.
- The output of initialization component 3202 is given by relevance coefficients $\rho(i)$ 3205 for codeword elements $c_i$, $0 \le i < N$, which are passed to mapping component 3203.
- Mapping component 3203 receives a number $\lambda$ of related codewords 3209 and transmission schedule 3211 as input arguments, while relevance coefficients 3205 and availability coefficients for storage nodes 3210 are received as input parameters. Thus, mapping component 3203 executes computations each time a number of related codewords 3209 and transmission schedule 3211 are received. Transmission schedule 3211 is optional; by default it is equal to zero. Two codewords are referred to as related if a READ request is predicted to address both of them simultaneously, e.g., codewords produced from the same file. Availability coefficients for storage nodes 3210 are based on bandwidth information and traffic prediction, e.g., average latency estimation in the case of WRITE and READ requests.
- Mapping matrix $\Phi$ is optimized.
- A $\lambda \times N$ matrix is referred to as a mapping matrix if each of its rows is given by some permutation of the vector $(0, 1, \ldots, N - 1)$.
- The traffic prediction for the $i$'th storage node is given by $\theta(i) + \tau(i)$, where $\tau(i)$ is the initial network traffic prediction for the $i$'th storage node.
- Optimization of mapping matrix $\Phi$ consists in selecting permutations such that $\frac{\theta(0) + \tau(0)}{a_0} \approx \cdots \approx \frac{\theta(n-1) + \tau(n-1)}{a_{n-1}}$, where $a_i$ is the availability coefficient for the $i$'th storage node.
- The optimized $\lambda \times N$ mapping matrix $\Phi$ specifies the mapping of codeword elements to storage nodes, wherein the set of codeword elements assigned to the $g$'th storage node is given by elements $c^{(j)}_{\Phi_{j,g}}$, $0 \le j < \lambda$, where $c^{(j)}_{t}$ is the $t$'th element of the $j$'th codeword.
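One way to approximate the balancing goal described above is a greedy assignment. The following sketch is an assumption-laden illustration, not the optimization of the specification: it assumes as many storage nodes as codeword positions, treats the relevance coefficient of a position as the traffic it adds to a node, and uses the hypothetical function name `build_mapping`.

```python
from typing import List, Tuple


def build_mapping(relevance: List[int],
                  availability: List[float],
                  initial_traffic: List[int],
                  num_codewords: int) -> Tuple[List[List[int]], List[int]]:
    """Greedy sketch: give the most relevant positions to the least loaded nodes."""
    n = len(relevance)
    load = list(initial_traffic)
    by_relevance = sorted(range(n), key=lambda i: -relevance[i])
    mapping: List[List[int]] = []
    for _ in range(num_codewords):
        # Rank nodes by predicted traffic relative to their availability.
        nodes = sorted(range(n), key=lambda g: load[g] / availability[g])
        row = [0] * n
        for pos, node in zip(by_relevance, nodes):
            row[node] = pos              # node `node` stores element `pos`
            load[node] += relevance[pos]
        mapping.append(row)              # each row is a permutation of 0..N-1
    return mapping, load
```

Each row of the result is a permutation of the codeword positions, so the output has the shape of a mapping matrix; the greedy rule only approximates equalized load-to-availability ratios.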
- Fig. 33 shows flow diagram of example steps executed within repairing module for reconstruction of erased elements of encoded data.
- repairing module 3301 is employed to reconstruct data within a new storage node.
- repairing module 3301 is also employed, when required element of original data cannot be reconstructed from elements of mainstream group, since some of them are unavailable due to storage node outage. In both cases these unavailable codeword elements are referred as erased elements.
- Repairing module 3301 receives erasure configuration 3302 as input argument. Repairing module 3301 computes values of erased elements 3303 and adjusted erasure configuration 3304 as follows. At step 3305 a repair schedule is constructed. Within step 3305 all operations are performed over erasure configurations, thus actual values of elements are not required. The repair schedule is designed to minimize the number of codeword elements to be transmitted from storage nodes. The repair schedule comprises a specification of operations to be performed over non-erased codeword elements in order to obtain values of erased codeword elements. The repair schedule also contains a list of positions of the employed non-erased elements. Codeword elements are requested from storage nodes according to this list at step 3306. If all requested codeword elements are received 3307, then values of erased codeword elements are computed according to the repair schedule at step 3308.
- erasure configuration is adjusted by appending to it positions of requested but not received codeword elements. Adjusted erasure configuration 3310 is employed to design a new schedule at step 3305.
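The request-and-retry loop of Fig. 33 can be sketched as follows. The callables `design_schedule` and `fetch` are hypothetical interfaces, and the demonstration uses a trivial repetition code (every element equal) purely as an assumed stand-in so that the loop can be exercised end to end.

```python
from typing import Callable, Dict, List, Set, Tuple

Compute = Callable[[Dict[int, int]], Dict[int, int]]


def repair(erased: Set[int],
           design_schedule: Callable[[Set[int]], Tuple[List[int], Compute]],
           fetch: Callable[[List[int]], Dict[int, int]]
           ) -> Tuple[Dict[int, int], Set[int]]:
    """Design a schedule, request elements, and redesign if some are missing."""
    erased = set(erased)
    while True:
        positions, compute = design_schedule(erased)
        received = fetch(positions)
        missing = set(positions) - set(received)
        if not missing:
            # Values of erased elements plus the adjusted erasure configuration.
            return compute(received), erased
        erased |= missing  # append requested-but-not-received positions


# Toy stand-in: a length-4 repetition code, any surviving element suffices.
N = 4


def repetition_schedule(erased: Set[int]) -> Tuple[List[int], Compute]:
    alive = [i for i in range(N) if i not in erased]
    if not alive:
        raise RuntimeError("erasure configuration is unrecoverable")
    source = alive[0]
    return [source], lambda got: {e: got[source] for e in sorted(erased)}
```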
- Fig. 34 shows flow diagram of attempts to recover codeword elements using different strategies in accordance with one or more implementations.
- one of three repair strategies is selected. These strategies are referred as single-stage repair, multi-stage repair and massive repair. Choice of strategy depends on the number of erasures and structure of erasure configuration.
- Single-stage repair has the lowest complexity and it is the most commonly used one.
- Multi-stage repair includes single-stage repair as the first step. Massive repair is used only in the case of single-stage and multi-stage repair failures, since its complexity is higher than the complexity of multi-stage repair.
- Output of repair schedule design 3401 is given by repair schedule 3403, constructed to recover erasure configuration 3402.
- First, an attempt to recover erased elements within the single-stage repair strategy is made at step 3404. If a repair schedule was successfully designed at step 3404 (check 3405), then this schedule is returned as output repair schedule 3403. If single-stage repair failed to recover all erasures, then an attempt to recover them within multi-stage repair is made at step 3406. Upon success, the multi-stage repair schedule is returned as output repair schedule 3403; otherwise repair schedule 3403 is designed according to massive repair strategy 3408.
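The escalation between strategies can be expressed as an ordered fallback chain. In the sketch below the designer functions and their erasure limits (1, 3 and 5) are placeholders invented for illustration; only the ordering of single-stage, multi-stage and massive repair follows the text.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Designer = Callable[[Set[int]], Optional[List[int]]]


def design_repair_schedule(erasures: Set[int],
                           strategies: Sequence[Tuple[str, Designer]]
                           ) -> Tuple[str, List[int]]:
    """Try strategies from cheapest to most complex; return the first success."""
    for name, design in strategies:
        schedule = design(erasures)
        if schedule is not None:
            return name, schedule
    raise RuntimeError("erasure configuration is unrecoverable")


def limited(max_erasures: int) -> Designer:
    """Placeholder designer that succeeds up to a fixed number of erasures."""
    def design(erasures: Set[int]) -> Optional[List[int]]:
        return sorted(erasures) if len(erasures) <= max_erasures else None
    return design


STRATEGIES = [("single-stage", limited(1)),
              ("multi-stage", limited(3)),
              ("massive", limited(5))]
```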
- Fig. 35 shows flow diagram of repairing steps corresponding to single-stage repair strategy.
- Single-stage repair schedule module 3501 receives erasure configuration 3502 as input argument. Any erasure configuration containing up to $w_0 - 1$ erasures is recoverable by single-stage repair, where $w_0$ is the minimum distance of inner code $C_0^{\mathrm{inner}}$. Erasure configurations containing more than $(t - u)n$ erasures cannot be recovered with single-stage repair, so such erasure configurations are passed to the multi-stage repair schedule module. If the number of erasures is between $w_0$ and $(t - u)n$, then it is possible that this erasure configuration may be recovered by single-stage repair.
- Recall that a codeword $c$ of the error-correcting code $C$ can include $n$ codewords $c[T_j]$ of the $(t, u, w_0)$ code $C_0^{\mathrm{inner}}$, $0 \le j < n$, and that the code $C_0^{\mathrm{inner}}$ is able to correct up to $w_0 - 1$ erasures.
- Single-stage repair recovers erased elements within each codeword $c[T_j]$ independently.
- the single stage repair schedule module 3501 declares success and returns repair schedule 3503.
- Single-stage repair schedule 3503 is such that for each erased element $c_g$, $g \in T_j$, it comprises a representation of $c_g$ as a linear combination of non-erased elements of $c[T_j]$. If there are some unrecoverable erasures, single-stage repair schedule module 3501 declares failure and passes erasure configuration 3502 to the multi-stage repair module.
- At step 3605, reconstruction of the first $b_i$ information elements for codewords of the code $C_i^{\mathrm{inner}}$ is performed as follows.
- Each of n codewords of an inner code is processed individually.
- Successfully expressed elements $y_g$ are marked as recovered, i.e., $\hat{e}_g \leftarrow 0$; otherwise they are marked as erased, i.e., $\hat{e}_g \leftarrow 1$.
- T j ⁇ A i outer b i .
- Each element $y_g$ is marked as recovered, $\hat{e}_g \leftarrow 0$.
- At step 3609, erased elements within codewords of the inner code $C_i^{\mathrm{inner}}$ are repaired.
- Each of n codewords of an inner code is processed individually.
- Successfully expressed elements $c_g$ are marked as recovered, i.e., $e_g \leftarrow 0$.
- Massive repair schedule module receives erasure configuration as input argument. The error correction capability of massive repair is limited only by the code construction. Thus, any erasure configuration containing fewer than $D$ erasures is recoverable by massive repair, where $D$ is the minimum distance of error-correcting code $C$. If the number of erasures is between $D$ and $N - K$, then it is possible that this erasure configuration may be recovered by massive repair. Observe that in the case of MDS codes the value of $N - K$ is equal to $D - 1$; however, error-correcting code $C$ does not belong to MDS codes. Massive repair can be implemented in several ways.
- Repair using parity checks is performed as follows. In order to repair $g$ codeword elements, a group of $g$ parity checks is selected such that the $i$'th erased codeword element participates only in the $i$'th parity check of the group, and the total number of different codeword elements participating in the parity checks of the group is minimized.
- the second condition corresponds to minimization of the number of READ requests.
- A set of non-erased, highly available codeword elements is selected such that the corresponding columns of the generator matrix are linearly independent. Thus, latency is minimized. Observe that the number of READ requests is equal to $K$.
- A matrix is constructed from the columns of the generator matrix corresponding to the selected elements and the $g$ columns corresponding to the elements being repaired.
- Gaussian elimination is applied to the rows of the constructed matrix in order to obtain a matrix in systematic form, wherein systematic positions correspond to the selected elements. Values of the elements being repaired are obtained by multiplying the sequence of selected elements by the non-systematic part of the reduced matrix.
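The generator-matrix variant of massive repair can be sketched over a prime field. The sketch assumes GF(p) arithmetic and takes the first K linearly independent available columns, with the order of available positions standing in for an availability ranking; the function name `massive_repair` and the toy (5, 2) code in the usage note are illustrative assumptions.

```python
from typing import Dict, List, Optional


def massive_repair(G: List[List[int]],
                   codeword: List[Optional[int]],
                   erased: List[int],
                   p: int) -> Dict[int, int]:
    """Recover erased elements from K available ones via row reduction over GF(p)."""
    k = len(G)
    available = [j for j in range(len(G[0])) if j not in erased]
    cols = available + list(erased)
    rows = [[G[i][j] % p for j in cols] for i in range(k)]
    chosen: List[int] = []          # indices (into cols) of selected elements
    r = 0
    for c in range(len(available)):  # pivots only on available columns
        if r == k:
            break
        pivot = next((i for i in range(r, k) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        inv = pow(rows[r][c], -1, p)
        rows[r] = [x * inv % p for x in rows[r]]
        for i in range(k):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[r])]
        chosen.append(c)
        r += 1
    if r < k:
        raise ValueError("too many erasures for this code")
    repaired = {}
    for idx, e in enumerate(erased):
        col = len(available) + idx   # non-systematic column of erased element e
        repaired[e] = sum(codeword[cols[chosen[i]]] * rows[i][col]
                          for i in range(k)) % p
    return repaired
```

For example, with the assumed generator matrix `[[1, 1, 1, 1, 1], [0, 1, 2, 3, 4]]` over GF(5), positions 0 and 3 of the codeword `[2, 0, 3, 1, 4]` can be recovered from the two elements at positions 1 and 2, matching the observation that K READ requests suffice.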
Claims (15)
- A method for distributing data of a plurality of files over a plurality of respective remote storage nodes (1709), the method comprising: splitting into segments (1103), by one or more processors configured to execute code stored in non-transitory processor-readable media, the data of the plurality of files; encoding (1104), by the one or more processors, each segment into a number of codeword chunks, wherein none of the codeword chunks contains any of the segments; packaging (1104) each codeword chunk with at least one encoding parameter and identifier into at least one package; generating (1108), by the one or more processors, metadata (1004) for at least one file of the plurality of files and metadata for related segments of the at least one file, wherein the metadata for the at least one file contains information to reconstruct the at least one file from the segments, and the metadata for the related segments contains information to reconstruct the related segments from corresponding packages; encoding (1112), by the one or more processors, the metadata into at least one package, wherein the encoding corresponds to a respective security level and a protection against storage node failure; computing availability coefficients for the remote storage nodes (1709) using statistical data, wherein each availability coefficient characterizes a predicted average download speed for a respective storage node (1709); assigning (1105), by the one or more processors, a plurality of packages to remote storage nodes, wherein the step of assigning uses the availability coefficients to minimize data retrieval latency and to optimize workload distribution; transmitting (1106), by the one or more processors, each of the packages to at least one respective remote storage node (1709); and retrieving (1305), by the one or more processors, at least one of the plurality of files as a function of iteratively accessing and retrieving the packages of metadata and file data.
- Verfahren nach Anspruch 1, wobei der Schritt des Spaltens in Segmente Daten innerhalb eines jeweiligen Segments bereitstellt, die einen Teil einer einzelnen Datei oder mehrere Dateien umfassen, und das Verfahren weiter Aggregieren einer Vielzahl von Dateien für ein Segment als eine Funktion des Minimierens eines Unterschieds zwischen Segmentgröße und einer Gesamtgröße eingebetteter Dateien, und einer Wahrscheinlichkeit gemeinsamen Abrufens von eingebetteten Dateien umfasst.
- Verfahren nach Anspruch 1, wobei der Schritt des Codierens jedes Segments eine Deduplizierung als eine Funktion von hashbasierten Merkmalen der Datei einschließt.
- Verfahren nach Anspruch 1, wobei der Schritt des Codierens jedes Segments ein Verschlüsseln (1609) einschließt, wobei mindestens ein Segment vollständig mit einem individuellen Verschlüsselungsschlüssel (1111) verschlüsselt wird, wobei der Verschlüsselungsschlüssel (1111) als eine Funktion von Daten, die verschlüsselt werden, erzeugt wird.
- Verfahren nach Anspruch 4, wobei jeder einer Vielzahl von jeweiligen individuellen Verschlüsselungsschlüsseln mit einem jeweiligen Schlüsselverschlüsselungsschlüssel verschlüsselt wird und über einen jeweiligen Speicherknoten verteilt wird, wobei jeder jeweilige Schlüsselverschlüsselungsschlüssel unter Verwendung einer passwortbasierten Schlüsselableitungsfunktion erzeugt wird.
- Verfahren nach Anspruch 1, wobei der Schritt des Codierens jedes Segments eine Verschlüsselung (1609) einschließt, wobei mindestens ein Segment in Teilstücke aufgeteilt wird, wobei jedes Teilstück getrennt verschlüsselt wird, und wobei weiter eine Anzahl von Verschlüsselungsschlüsseln pro Segment von eins bis zu der Anzahl von Teilstücken reicht.
- Verfahren nach Anspruch 1, wobei der Schritt des Codierens jedes Segments eine Löschcodierung (1610) vom Mischgrad S umfasst, wobei Codewortstücke aus Informationsstücken unter Verwendung einer linearen Blockfehlerkorrekturfunktion produziert werden, und Mischgrad S mindestens S Codewortstücke erfordert, um ein jegliches Informationsstück zu rekonstruieren, wobei jeweilige Löschcodierungstechniken für Datensegmentcodierung und Metadatencodierung derart verwendet werden, dass Metadaten mindestens vor Speicherknotenausfall geschützt werden.
- The method of claim 1, wherein the step of assigning packages to remote storage nodes minimizes retrieval latency for a group of related segments.
- The method of claim 1, further comprising:
computing at least one relevance coefficient as a function of information representing an employed erasure coding scheme and the significance of the respective codeword position for data retrieval.
- The method of claim 1, wherein metadata for a file and metadata for related segments are divided into two parts, of which one part is packed individually into packages and another part is appended to packages containing respective encoded data segments.
- The method of claim 1, further comprising arranging temporary storage of file data within a local cache by: operating over compound blocks of data; dividing memory space into regions with compound blocks of equal size; employing a file structure to optimize file arrangement within the local cache; and performing garbage collection to arrange free compound blocks.
- The method of claim 11, wherein arranging temporary storage of file data within a local cache includes cache optimization employing information representing a file structure, wherein cache optimization is simplified by classifying files based on a respective plurality of categories of access patterns, and employing a respective cache management strategy for similarly categorized files.
- The method of claim 1, wherein the step of retrieving packages of metadata and file data comprises: accessing, by the one or more processors, file metadata references within a local cache or within remote storage nodes; receiving, by the one or more processors, a plurality of packages from remote storage nodes by metadata references, wherein each of the packages contains file metadata; receiving, by the one or more processors, a plurality of other packages containing encoded file segments from storage nodes by data references, wherein the encoded file segments are obtained at least partially from file metadata; reconstructing, by the one or more processors, file data from the packages as a function of metadata representing parameters associated with an encoding scheme and a file splitting scheme.
- The method of claim 13, wherein file retrieval speed is improved by caching metadata from a plurality of files on the client side.
- A system (2601) for distributing data of a plurality of files over a plurality of respective remote storage nodes, the system (2601) comprising: one or more processors in electronic communication with non-transitory processor-readable storage media; one or more software modules comprising executable instructions stored in the storage media, wherein the one or more software modules are executable by the one or more processors and include: a fragmentation module (2604) that configures the one or more processors to split the data of the plurality of files into segments; an encoding module (2605) that configures the one or more processors to encode each segment into a number of codeword chunks, wherein none of the codeword chunks contains any of the segments, and to pack each codeword chunk with at least one encoding parameter and identifier into at least one package; a configuration module (2602) that configures the one or more processors to generate metadata for at least one file of the plurality of files and metadata for related segments of the at least one file, wherein the metadata for the at least one file contains information to reconstruct the at least one file from the segments, and metadata for the related segments contains information for reconstructing the related segments from corresponding packages; wherein the metadata is encoded into at least one package, wherein the encoding corresponds to a respective security level and a protection against storage node failure; a load balancing module (2606) that configures the one or more processors to assign a plurality of packages to remote storage nodes using availability coefficients computed for the remote storage nodes using statistical data, wherein each availability coefficient characterizes predicted average download speed for a respective remote storage node, and wherein further the load balancing module minimizes data retrieval latency and optimizes workload distribution; a control module that configures the one or more processors to transmit each of the packages to at least one respective remote storage node and to retrieve at least one of the plurality of files as a function of iteratively accessing and retrieving the packages of metadata and file data.
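The key-handling steps recited in the dependent claims (hash-based deduplication, an encryption key generated as a function of the data being encrypted, and a key encryption key produced by a password-based key derivation function) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the function names, the choice of SHA-256 and PBKDF2-HMAC-SHA256, the salt length, the iteration count, and in particular the XOR "wrap", which is only a stand-in for a real key-wrapping cipher.

```python
import hashlib
import os


def dedup_id(segment: bytes) -> str:
    """Hash-based fingerprint: identical segments map to the same identifier."""
    return hashlib.sha256(segment).hexdigest()


def content_derived_key(segment: bytes) -> bytes:
    """256-bit key generated as a function of the data to be encrypted."""
    # The domain-separation prefix is an assumption of this sketch.
    return hashlib.sha256(b"segment-key|" + segment).digest()


def key_encryption_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Key encryption key from a password-based KDF (PBKDF2-HMAC-SHA256 assumed)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


def toy_wrap(key: bytes, kek: bytes) -> bytes:
    """Placeholder for key wrapping: XOR of two equal-length byte strings.

    Illustration only; a real system would use an authenticated cipher.
    Applying the function twice with the same kek returns the original key.
    """
    if len(key) != len(kek):
        raise ValueError("key and key encryption key must have equal length")
    return bytes(a ^ b for a, b in zip(key, kek))


def store_segments(segments, password):
    """Deduplicate segments and wrap one content-derived key per unique segment.

    Returns (unique, wrapped): unique maps fingerprint -> segment bytes,
    wrapped maps fingerprint -> (salt, wrapped key).
    """
    unique = {}
    wrapped = {}
    for seg in segments:
        fid = dedup_id(seg)
        if fid in unique:
            continue  # duplicate segment: nothing new to store
        salt = os.urandom(16)
        kek = key_encryption_key(password, salt)
        unique[fid] = seg
        wrapped[fid] = (salt, toy_wrap(content_derived_key(seg), kek))
    return unique, wrapped


if __name__ == "__main__":
    unique, wrapped = store_segments([b"alpha", b"beta", b"alpha"], "example-password")
    print(len(unique), "unique segments stored")
```

Because the segment key depends only on the segment content, two identical segments yield the same key and the same fingerprint, which is what lets deduplication and per-segment encryption coexist in this sketch. The claims do not specify these particular primitives.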
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201662308223P | 2016-03-15 | 2016-03-15 | |
| US201662332002P | 2016-05-05 | 2016-05-05 | |
| US201662349145P | 2016-06-13 | 2016-06-13 | |
| US201662434421P | 2016-12-15 | 2016-12-15 | |
| PCT/US2017/022593 WO2017161050A2 (en) | 2016-03-15 | 2017-03-15 | Distributed storage system data management and security |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP3430515A2 (de) | 2019-01-23 |
| EP3430515B1 (de) | 2021-09-22 |
Family
ID=58579252
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP17718631.9A (Active; EP3430515B1, de) | Distributed storage system data management and security | 2016-03-15 | 2017-03-15 |
Country Status (6)
| Country | Link |
|---|---|
| US (4) | US10735137B2 (de) |
| EP (1) | EP3430515B1 (de) |
| ES (1) | ES2899933T3 (de) |
| IL (1) | IL261816A (de) |
| MX (2) | MX2018011241A (de) |
| WO (1) | WO2017161050A2 (de) |
Families Citing this family (163)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9847918B2 (en) * | 2014-08-12 | 2017-12-19 | Microsoft Technology Licensing, Llc | Distributed workload reassignment following communication failure |
| US9626245B2 (en) | 2015-02-20 | 2017-04-18 | Netapp, Inc. | Policy based hierarchical data protection |
| US10558538B2 (en) | 2017-11-22 | 2020-02-11 | Netapp, Inc. | Erasure coding repair availability |
| US10795935B2 (en) | 2016-02-05 | 2020-10-06 | Sas Institute Inc. | Automated generation of job flow definitions |
| US10642896B2 (en) | 2016-02-05 | 2020-05-05 | Sas Institute Inc. | Handling of data sets during execution of task routines of multiple languages |
| US10650046B2 (en) | 2016-02-05 | 2020-05-12 | Sas Institute Inc. | Many task computing with distributed file system |
| US10650045B2 (en) | 2016-02-05 | 2020-05-12 | Sas Institute Inc. | Staged training of neural networks for improved time series prediction performance |
| US10931402B2 (en) | 2016-03-15 | 2021-02-23 | Cloud Storage, Inc. | Distributed storage system data management and security |
| MX2018011241A (es) | 2016-03-15 | 2018-11-22 | Datomia Res Labs Ou | Administracion y seguridad de datos del sistema de almacenamiento distribuido. |
| US10380360B2 (en) * | 2016-03-30 | 2019-08-13 | PhazrlO Inc. | Secured file sharing system |
| US10547681B2 (en) * | 2016-06-30 | 2020-01-28 | Purdue Research Foundation | Functional caching in erasure coded storage |
| US10169152B2 (en) | 2016-09-12 | 2019-01-01 | International Business Machines Corporation | Resilient data storage and retrieval |
| US10268538B2 (en) * | 2016-11-28 | 2019-04-23 | Alibaba Group Holding Limited | Efficient and enhanced distributed storage clusters |
| US10552585B2 (en) * | 2016-12-14 | 2020-02-04 | Microsoft Technology Licensing, Llc | Encoding optimization for obfuscated media |
| USD898059S1 (en) | 2017-02-06 | 2020-10-06 | Sas Institute Inc. | Display screen or portion thereof with graphical user interface |
| US10795760B2 (en) | 2017-03-20 | 2020-10-06 | Samsung Electronics Co., Ltd. | Key value SSD |
| US11275762B2 (en) | 2017-03-20 | 2022-03-15 | Samsung Electronics Co., Ltd. | System and method for hybrid data reliability for object storage devices |
| US10915498B2 (en) * | 2017-03-30 | 2021-02-09 | International Business Machines Corporation | Dynamically managing a high speed storage tier of a data storage system |
| USD898060S1 (en) | 2017-06-05 | 2020-10-06 | Sas Institute Inc. | Display screen or portion thereof with graphical user interface |
| US11625303B2 (en) * | 2017-06-23 | 2023-04-11 | Netapp, Inc. | Automatic incremental repair of granular filesystem objects |
| US11829583B2 (en) * | 2017-07-07 | 2023-11-28 | Open Text Sa Ulc | Systems and methods for content sharing through external systems |
| WO2019008748A1 (ja) * | 2017-07-07 | 2019-01-10 | 株式会社Asj | データ処理システムおよびこれを用いた分散データシステム |
| US12184781B2 (en) | 2017-07-10 | 2024-12-31 | Burstiq, Inc. | Systems and methods for accessing digital assets in a blockchain using owner consent contracts |
| US11238164B2 (en) * | 2017-07-10 | 2022-02-01 | Burstiq, Inc. | Secure adaptive data storage platform |
| US10805284B2 (en) * | 2017-07-12 | 2020-10-13 | Logmein, Inc. | Federated login for password vault |
| US10761743B1 (en) | 2017-07-17 | 2020-09-01 | EMC IP Holding Company LLC | Establishing data reliability groups within a geographically distributed data storage environment |
| US20190026841A1 (en) * | 2017-07-19 | 2019-01-24 | Sony Corporation | Distribution and access management of individual media content using code embedded within media content |
| US11477280B1 (en) * | 2017-07-26 | 2022-10-18 | Pure Storage, Inc. | Integrating cloud storage services |
| US10552072B1 (en) * | 2017-07-31 | 2020-02-04 | EMC IP Holding Company LLC | Managing file system namespace in network attached storage (NAS) cluster |
| US10560528B2 (en) * | 2017-08-29 | 2020-02-11 | Western Digital Technologies, Inc. | Cloud-based management of access to a data storage system on a local network |
| US10664574B1 (en) * | 2017-09-15 | 2020-05-26 | Architecture Technology Corporation | Distributed data storage and sharing in a peer-to-peer network |
| US10949302B2 (en) * | 2017-10-09 | 2021-03-16 | PhazrIO Inc. | Erasure-coding-based efficient data storage and retrieval |
| US10282129B1 (en) * | 2017-10-24 | 2019-05-07 | Bottomline Technologies (De), Inc. | Tenant aware, variable length, deduplication of stored data |
| US12191889B2 (en) * | 2017-10-30 | 2025-01-07 | Atombeam Technologies Inc | Data compression with signature-based intrusion detection |
| US12003256B2 (en) * | 2017-10-30 | 2024-06-04 | AtomBeam Technologies Inc. | System and method for data compression with intrusion detection |
| US10417088B2 (en) * | 2017-11-09 | 2019-09-17 | International Business Machines Corporation | Data protection techniques for a non-volatile memory array |
| KR102065958B1 (ko) * | 2017-11-13 | 2020-02-11 | 유한회사 이노릭스 | 파일 전송 방법 및 이를 수행하는 시스템 |
| US10904230B2 (en) * | 2017-11-29 | 2021-01-26 | Vmware, Inc. | Distributed encryption |
| KR102061345B1 (ko) * | 2017-12-18 | 2019-12-31 | 경희대학교 산학협력단 | 강화 학습 기반 암호화 및 복호화 수행 방법 및 이를 수행하는 클라이언트, 서버 시스템 |
| US11550811B2 (en) * | 2017-12-22 | 2023-01-10 | Scripps Networks Interactive, Inc. | Cloud hybrid application storage management (CHASM) system |
| US20190238323A1 (en) * | 2018-01-31 | 2019-08-01 | Nutanix, Inc. | Key managers for distributed computing systems using key sharing techniques |
| US10938557B2 (en) * | 2018-03-02 | 2021-03-02 | International Business Machines Corporation | Distributed ledger for generating and verifying random sequence |
| WO2019183547A1 (en) * | 2018-03-22 | 2019-09-26 | Datomia Research Labs Ou | Distributed storage system data management and security |
| WO2019183958A1 (zh) * | 2018-03-30 | 2019-10-03 | 华为技术有限公司 | 数据写入方法、客户端服务器和系统 |
| JP3230238U (ja) | 2018-04-10 | 2021-01-14 | ブラック ゴールド コイン インコーポレイテッドBlack Gold Coin, Inc. | 電子データを安全に格納するシステム |
| US10474368B1 (en) | 2018-04-24 | 2019-11-12 | Western Digital Technologies, Inc | Fast read operation utilizing reduced storage of metadata in a distributed encoded storage system |
| US10749958B2 (en) | 2018-04-24 | 2020-08-18 | Western Digital Technologies, Inc. | Reduced storage of metadata in a distributed encoded storage system |
| US11042661B2 (en) | 2018-06-08 | 2021-06-22 | Weka.IO Ltd. | Encryption for a distributed filesystem |
| US10802719B2 (en) * | 2018-06-11 | 2020-10-13 | Wipro Limited | Method and system for data compression and data storage optimization |
| US11281577B1 (en) * | 2018-06-19 | 2022-03-22 | Pure Storage, Inc. | Garbage collection tuning for low drive wear |
| US10664461B2 (en) | 2018-06-29 | 2020-05-26 | Cohesity, Inc. | Large content file optimization |
| US11074135B2 (en) * | 2018-06-29 | 2021-07-27 | Cohesity, Inc. | Large content file optimization |
| US10976949B1 (en) * | 2018-07-10 | 2021-04-13 | Amazon Technologies, Inc. | Archiving of streaming data |
| CN109067733B (zh) | 2018-07-27 | 2021-01-05 | 成都华为技术有限公司 | 发送数据的方法和装置,以及接收数据的方法和装置 |
| US10762051B1 (en) * | 2018-07-30 | 2020-09-01 | Amazon Technologies, Inc. | Reducing hash collisions in large scale data deduplication |
| US11474907B2 (en) | 2018-08-01 | 2022-10-18 | International Business Machines Corporation | Apparatus, method, and program product for cluster configuration data backup and recovery |
| US10621123B2 (en) * | 2018-08-02 | 2020-04-14 | EMC IP Holding Company LLC | Managing storage system performance |
| US10965315B2 (en) * | 2018-08-09 | 2021-03-30 | Andrew Kamal | Data compression method |
| US11347653B2 (en) | 2018-08-31 | 2022-05-31 | Nyriad, Inc. | Persistent storage device management |
| CN110874185B (zh) * | 2018-09-04 | 2021-12-17 | 杭州海康威视系统技术有限公司 | 一种数据存储方法及存储装置 |
| US11349655B2 (en) * | 2018-10-05 | 2022-05-31 | Oracle International Corporation | System and method for a distributed keystore |
| US10956046B2 (en) | 2018-10-06 | 2021-03-23 | International Business Machines Corporation | Dynamic I/O load balancing for zHyperLink |
| CN111061744B (zh) * | 2018-10-17 | 2023-08-01 | 百度在线网络技术(北京)有限公司 | 图数据的更新方法、装置、计算机设备及存储介质 |
| US11165574B2 (en) | 2018-10-18 | 2021-11-02 | Here Global B.V. | Secure map data storage using encoding by algorithm pool |
| CN111104057B (zh) * | 2018-10-25 | 2022-03-29 | 华为技术有限公司 | 存储系统中的节点扩容方法和存储系统 |
| US11436203B2 (en) | 2018-11-02 | 2022-09-06 | EMC IP Holding Company LLC | Scaling out geographically diverse storage |
| US11620261B2 (en) * | 2018-12-07 | 2023-04-04 | Vmware, Inc. | Writing data to an LSM tree file structure using consistent cache staging |
| US11232039B2 (en) * | 2018-12-10 | 2022-01-25 | Advanced Micro Devices, Inc. | Cache for storing regions of data |
| US10860726B2 (en) | 2018-12-12 | 2020-12-08 | American Express Travel Related | Peer-to-peer confidential document exchange |
| CN109815292A (zh) * | 2019-01-03 | 2019-05-28 | 广州中软信息技术有限公司 | 一种基于异步消息机制的涉税数据采集系统 |
| US10897402B2 (en) * | 2019-01-08 | 2021-01-19 | Hewlett Packard Enterprise Development Lp | Statistics increment for multiple publishers |
| WO2020160142A1 (en) * | 2019-01-29 | 2020-08-06 | ClineHair Commercial Endeavors | Encoding and storage node repairing method for minimum storage regenerating codes for distributed storage systems |
| US11748197B2 (en) * | 2019-01-31 | 2023-09-05 | Qatar Foundation For Education, Science And Community Development | Data storage methods and systems |
| US11531647B2 (en) | 2019-01-31 | 2022-12-20 | Qatar Foundation For Education, Science And Community Development | Data storage methods and systems |
| CN109885552B (zh) * | 2019-02-18 | 2023-08-18 | 天固信息安全系统(深圳)有限责任公司 | 分布式文件系统的元数据动态管理方法及分布式文件系统 |
| CN109947587B (zh) * | 2019-02-20 | 2022-09-27 | 长安大学 | 非均匀故障保护的分组修复码构造方法及故障修复方法 |
| US11513700B2 (en) * | 2019-03-04 | 2022-11-29 | International Business Machines Corporation | Split-n and composable splits in a dispersed lockless concurrent index |
| CN111722787B (zh) * | 2019-03-22 | 2021-12-03 | 华为技术有限公司 | 一种分块方法及其装置 |
| US20200327025A1 (en) * | 2019-04-10 | 2020-10-15 | Alibaba Group Holding Limited | Methods, systems, and non-transitory computer readable media for operating a data storage system |
| US11748004B2 (en) | 2019-05-03 | 2023-09-05 | EMC IP Holding Company LLC | Data replication using active and passive data storage modes |
| KR102898240B1 (ko) * | 2019-05-22 | 2025-12-10 | 묘타, 인크. | 보안, 복원, 및 제어가 강화된 분산된 데이터 스토리지를 위한 방법 및 시스템 |
| US11496457B2 (en) | 2019-06-10 | 2022-11-08 | Microsoft Technology Licensing, Llc | Partial pattern recognition in a stream of symbols |
| US11178135B2 (en) | 2019-06-10 | 2021-11-16 | Microsoft Technology Licensing, Llc | Partial pattern recognition in a stream of symbols |
| US10866699B1 (en) | 2019-06-10 | 2020-12-15 | Microsoft Technology Licensing, Llc | User interface for authentication with random noise symbols |
| US11258783B2 (en) | 2019-06-10 | 2022-02-22 | Microsoft Technology Licensing, Llc | Authentication with random noise symbols and pattern recognition |
| US11240227B2 (en) | 2019-06-10 | 2022-02-01 | Microsoft Technology Licensing, Llc | Partial pattern recognition in a stream of symbols |
| US11736472B2 (en) | 2019-06-10 | 2023-08-22 | Microsoft Technology Licensing, Llc | Authentication with well-distributed random noise symbols |
| US12155646B2 (en) | 2019-06-10 | 2024-11-26 | Microsoft Technology Licensing, Llc | Authentication with random noise symbols and pattern recognition |
| US11514149B2 (en) | 2019-06-10 | 2022-11-29 | Microsoft Technology Licensing, Llc | Pattern matching for authentication with random noise symbols and pattern recognition |
| US11513898B2 (en) * | 2019-06-19 | 2022-11-29 | Regents Of The University Of Minnesota | Exact repair regenerating codes for distributed storage systems |
| US11314593B2 (en) | 2019-06-25 | 2022-04-26 | Western Digital Technologies, Inc. | Storage node processing of data functions using overlapping symbols |
| US11055018B2 (en) | 2019-06-25 | 2021-07-06 | Western Digital Technologies, Inc. | Parallel storage node processing of data functions |
| US11281531B2 (en) * | 2019-06-25 | 2022-03-22 | Western Digital Technologies, Inc. | Serial storage node processing of data functions |
| US10990324B2 (en) | 2019-06-25 | 2021-04-27 | Western Digital Technologies, Inc. | Storage node processing of predefined data functions |
| CN110275793B (zh) * | 2019-06-27 | 2023-04-07 | 咪咕文化科技有限公司 | 一种用于MongoDB数据分片集群的检测方法及设备 |
| CN110474876B (zh) * | 2019-07-15 | 2020-10-16 | 湖南遥昇通信技术有限公司 | 一种数据编码解码方法、装置、设备以及存储介质 |
| US11394551B2 (en) | 2019-07-17 | 2022-07-19 | Microsoft Technology Licensing, Llc | Secure authentication using puncturing |
| US11449399B2 (en) | 2019-07-30 | 2022-09-20 | EMC IP Holding Company LLC | Mitigating real node failure of a doubly mapped redundant array of independent nodes |
| US11775193B2 (en) | 2019-08-01 | 2023-10-03 | Dell Products L.P. | System and method for indirect data classification in a storage system operations |
| US11133962B2 (en) | 2019-08-03 | 2021-09-28 | Microsoft Technology Licensing, Llc | Device synchronization with noise symbols and pattern recognition |
| CN113544635B (zh) | 2019-09-09 | 2025-03-14 | 华为云计算技术有限公司 | 存储系统中数据处理方法、装置以及存储系统 |
| US11675739B1 (en) * | 2019-09-23 | 2023-06-13 | Datex Inc. | Distributed data storage using hierarchically arranged metadata |
| US11449248B2 (en) * | 2019-09-26 | 2022-09-20 | EMC IP Holding Company LLC | Mapped redundant array of independent data storage regions |
| US11290259B2 (en) * | 2019-10-28 | 2022-03-29 | Gregory Tichy | Data distribution platform |
| US11128440B2 (en) * | 2019-10-29 | 2021-09-21 | Samsung Sds Co., Ltd. | Blockchain based file management system and method thereof |
| CN112749178A (zh) * | 2019-10-31 | 2021-05-04 | 华为技术有限公司 | 一种保证数据一致性的方法及相关设备 |
| US11435910B2 (en) | 2019-10-31 | 2022-09-06 | EMC IP Holding Company LLC | Heterogeneous mapped redundant array of independent nodes for data storage |
| US11435957B2 (en) | 2019-11-27 | 2022-09-06 | EMC IP Holding Company LLC | Selective instantiation of a storage service for a doubly mapped redundant array of independent nodes |
| US11741060B2 (en) * | 2019-11-27 | 2023-08-29 | Veritas Technologies Llc | Methods and systems for scalable deduplication |
| CN111400302B (zh) * | 2019-11-28 | 2023-09-19 | 杭州海康威视系统技术有限公司 | 连续存储数据的修改方法、装置和系统 |
| CN112882647B (zh) * | 2019-11-29 | 2024-06-18 | 伊姆西Ip控股有限责任公司 | 存储和访问数据的方法、电子设备和计算机程序产品 |
| US11210002B2 (en) | 2020-01-29 | 2021-12-28 | Samsung Electronics Co., Ltd. | Offloaded device-driven erasure coding |
| US11507279B2 (en) * | 2020-02-04 | 2022-11-22 | EMC IP Holding Company LLC | Data storage migration in replicated environment |
| US11150986B2 (en) * | 2020-02-26 | 2021-10-19 | Alibaba Group Holding Limited | Efficient compaction on log-structured distributed file system using erasure coding for resource consumption reduction |
| US11928084B2 (en) | 2020-02-28 | 2024-03-12 | Nebulon, Inc. | Metadata store in multiple reusable append logs |
| US11507308B2 (en) | 2020-03-30 | 2022-11-22 | EMC IP Holding Company LLC | Disk access event control for mapped nodes supported by a real cluster storage system |
| US11175989B1 (en) * | 2020-04-24 | 2021-11-16 | Netapp, Inc. | Pooling blocks for erasure coding write groups |
| US11651096B2 (en) | 2020-08-24 | 2023-05-16 | Burstiq, Inc. | Systems and methods for accessing digital assets in a blockchain using global consent contracts |
| CN114327239B (zh) * | 2020-09-27 | 2024-08-20 | 伊姆西Ip控股有限责任公司 | 存储和访问数据的方法、电子设备和计算机程序产品 |
| AU2021254561A1 (en) * | 2021-10-19 | 2023-05-04 | Neo Nebula Pty Ltd | A device, method and system for the secure storage of data in a distributed manner |
| US11693983B2 (en) * | 2020-10-28 | 2023-07-04 | EMC IP Holding Company LLC | Data protection via commutative erasure coding in a geographically diverse data storage system |
| US12250291B2 (en) * | 2020-11-10 | 2025-03-11 | Evernorth Strategic Development, Inc. | Encrypted database systems including homomorphic encryption |
| US11743241B2 (en) | 2020-12-30 | 2023-08-29 | International Business Machines Corporation | Secure data movement |
| US12481796B2 (en) | 2020-12-30 | 2025-11-25 | International Business Machines Corporation | Secure memory sharing |
| US11847141B2 (en) | 2021-01-19 | 2023-12-19 | EMC IP Holding Company LLC | Mapped redundant array of independent nodes employing mapped reliability groups for data storage |
| US11625174B2 (en) | 2021-01-20 | 2023-04-11 | EMC IP Holding Company LLC | Parity allocation for a virtual redundant array of independent disks |
| US11606277B2 (en) * | 2021-02-10 | 2023-03-14 | Cohesity, Inc. | Reducing the impact of network latency during a restore operation |
| US20220318227A1 (en) * | 2021-03-30 | 2022-10-06 | Dropbox, Inc. | Content management system for a distributed key-value database |
| CN112732203B (zh) * | 2021-03-31 | 2021-06-22 | 中南大学 | 一种再生码构造方法、文件重构方法及节点修复方法 |
| CN112988764B (zh) * | 2021-05-14 | 2022-05-10 | 北京百度网讯科技有限公司 | 数据存储方法、装置、设备和存储介质 |
| CN113391946B (zh) * | 2021-05-25 | 2022-06-17 | 杭州电子科技大学 | 一种分布式存储中的纠删码的编解码方法 |
| WO2022246644A1 (en) * | 2021-05-25 | 2022-12-01 | Citrix Systems, Inc. | Data transfer across storage tiers |
| US11354191B1 (en) | 2021-05-28 | 2022-06-07 | EMC IP Holding Company LLC | Erasure coding in a large geographically diverse data storage system |
| US11449234B1 (en) | 2021-05-28 | 2022-09-20 | EMC IP Holding Company LLC | Efficient data access operations via a mapping layer instance for a doubly mapped redundant array of independent nodes |
| CN113472691A (zh) * | 2021-06-16 | 2021-10-01 | 安阳师范学院 | 一种基于消息队列和纠删码的海量时序数据异地归档方法 |
| US12413243B2 (en) | 2021-08-10 | 2025-09-09 | Samsung Electronics Co., Ltd. | Systems, methods, and apparatus for dividing and compressing data |
| US12498869B2 (en) | 2021-08-10 | 2025-12-16 | Samsung Electronics Co., Ltd. | Systems, methods, and apparatus for hierarchical aggregation for computational storage |
| US12074962B2 (en) | 2021-08-10 | 2024-08-27 | Samsung Electronics Co., Ltd. | Systems, methods, and apparatus for dividing and encrypting data |
| US11991293B2 (en) | 2021-08-17 | 2024-05-21 | International Business Machines Corporation | Authorized secure data movement |
| US12265507B2 (en) * | 2021-09-01 | 2025-04-01 | International Business Machines Corporation | Contextually irrelevant file segmentation |
| CN113806316B (zh) * | 2021-09-15 | 2022-06-21 | 星环众志科技(北京)有限公司 | 一种文件同步方法、设备及存储介质 |
| US11936717B2 (en) * | 2021-11-16 | 2024-03-19 | Netflix, Inc. | Scalable media file transfer |
| IT202100031022A1 (it) * | 2021-12-10 | 2023-06-10 | Foolfarm S P A | Metodo per suddividere, distribuire e memorizzare un dato associato ad un soggetto in una pluralità di memorie distribuite di una rete di telecomunicazioni e relativo sistema elettronico di suddivisione, distribuzione e memorizzazione del dato |
| TWI764856B (zh) * | 2021-12-13 | 2022-05-11 | 慧榮科技股份有限公司 | 記憶體控制器與資料處理方法 |
| CN116521668A (zh) | 2022-01-21 | 2023-08-01 | 戴尔产品有限公司 | 用于数据存储的方法、设备和计算机程序产品 |
| CN114723444A (zh) * | 2022-01-21 | 2022-07-08 | 佛山赛思禅科技有限公司 | 一种用于并行投票共识的数据分片方法 |
| US20230315695A1 (en) * | 2022-03-31 | 2023-10-05 | Netapp Inc. | Byte-addressable journal hosted using block storage device |
| CN116975104A (zh) * | 2022-04-22 | 2023-10-31 | 戴尔产品有限公司 | 用于查找数据的方法、电子设备和计算机程序产品 |
| CN114896099B (zh) * | 2022-04-29 | 2023-04-25 | 中国人民解放军93216部队 | 用于泛在存储系统的网络环境自适应编码方法及系统 |
| US12505230B2 (en) * | 2022-05-14 | 2025-12-23 | Dell Products L.P. | Fragment and shuffle erasure coding technique |
| CN117251489A (zh) | 2022-06-10 | 2023-12-19 | 戴尔产品有限公司 | 用于跨区域查询数据的方法、电子设备和计算机程序产品 |
| US11797493B1 (en) * | 2022-07-13 | 2023-10-24 | Code Willing, Inc. | Clustered file system for distributed data storage and access |
| US12531732B2 (en) | 2022-09-29 | 2026-01-20 | Advanced Micro Devices, Inc. | Method and apparatus for storing keys |
| US12399864B2 (en) * | 2023-03-20 | 2025-08-26 | Joseph Vu PHAM | System for permanently storing and controlling a file and associated metadata on the blockchain |
| CN116737723B (zh) * | 2023-06-15 | 2026-01-02 | 北京火山引擎科技有限公司 | 数据存储方法、装置、电子设备及存储介质 |
| CN116860564B (zh) * | 2023-09-05 | 2023-11-21 | 山东智拓大数据有限公司 | 一种云服务器数据管理方法及其数据管理装置 |
| US20250199711A1 (en) * | 2023-12-18 | 2025-06-19 | Western Digital Technologies, Inc. | Peer-to-peer file sharing using consistent hashing for distributing data among storage nodes |
| US12531843B2 (en) | 2023-12-18 | 2026-01-20 | Western Digital Technologies, Inc. | Secure peer-to-peer file sharing using distributed ownership data |
| CN118409714B (zh) * | 2024-07-01 | 2024-11-01 | 杭州海康威视系统技术有限公司 | 一种数据存储方法、装置、设备以及存储介质 |
| CN119067666B (zh) * | 2024-11-07 | 2025-01-24 | 贵州省黔云集中招标采购服务有限公司 | 一种面向交易云平台的信息加密方法、系统及介质 |
| CN119828969B (zh) * | 2024-12-23 | 2026-01-06 | 华中科技大学 | 一种基于智能ssd的kv分离lsm树存储索引加速方法及装置 |
| CN120104579B (zh) * | 2025-05-07 | 2025-08-22 | 华云升达(北京)气象科技有限责任公司 | 基于机器学习的气象元数据存储方法及存储系统 |
| CN120315894B (zh) * | 2025-06-12 | 2025-08-15 | 浪潮电子信息产业股份有限公司 | 一种数据处理系统、方法、设备、介质及程序产品 |
Family Cites Families (48)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6665308B1 (en) | 1995-08-25 | 2003-12-16 | Terayon Communication Systems, Inc. | Apparatus and method for equalization in distributed digital data transmission systems |
| US6307868B1 (en) | 1995-08-25 | 2001-10-23 | Terayon Communication Systems, Inc. | Apparatus and method for SCDMA digital data transmission using orthogonal codes and a head end modem with no tracking loops |
| US7010532B1 (en) | 1997-12-31 | 2006-03-07 | International Business Machines Corporation | Low overhead methods and apparatus for shared access storage devices |
| US6952737B1 (en) | 2000-03-03 | 2005-10-04 | Intel Corporation | Method and apparatus for accessing remote storage in a distributed storage cluster architecture |
| US7272613B2 (en) | 2000-10-26 | 2007-09-18 | Intel Corporation | Method and system for managing distributed content and related metadata |
| CA2331474A1 (en) | 2001-01-19 | 2002-07-19 | Stergios V. Anastasiadis | Stride-based disk space allocation scheme |
| US7240236B2 (en) | 2004-03-23 | 2007-07-03 | Archivas, Inc. | Fixed content distributed data storage using permutation ring encoding |
| JP2007018563A (ja) | 2005-07-05 | 2007-01-25 | Toshiba Corp | 情報記憶媒体、情報記録方法及び装置、情報再生方法及び装置 |
| US7574579B2 (en) | 2005-09-30 | 2009-08-11 | Cleversafe, Inc. | Metadata management system for an information dispersed storage system |
| US8285878B2 (en) | 2007-10-09 | 2012-10-09 | Cleversafe, Inc. | Block based access to a dispersed data storage network |
| US8694668B2 (en) | 2005-09-30 | 2014-04-08 | Cleversafe, Inc. | Streaming media software interface to a dispersed data storage network |
| CN101485204A (zh) | 2006-06-29 | 2009-07-15 | 皇家飞利浦电子股份有限公司 | 一种对数据进行纠错编码和纠错解码的方法及装置 |
| US7698242B2 (en) | 2006-08-16 | 2010-04-13 | Fisher-Rosemount Systems, Inc. | Systems and methods to maintain process control systems using information retrieved from a database storing general-type information and specific-type information |
| US8296812B1 (en) | 2006-09-01 | 2012-10-23 | Vudu, Inc. | Streaming video using erasure encoding |
| US8442989B2 (en) | 2006-09-05 | 2013-05-14 | Thomson Licensing | Method for assigning multimedia data to distributed storage devices |
| US8655939B2 (en) | 2007-01-05 | 2014-02-18 | Digital Doors, Inc. | Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor |
| WO2009032711A1 (en) | 2007-08-29 | 2009-03-12 | Nirvanix, Inc. | Policy-based file management for a storage delivery network |
| US9106630B2 (en) | 2008-02-01 | 2015-08-11 | Mandiant, Llc | Method and system for collaboration during an event |
| EP2342661A4 (de) | 2008-09-16 | 2013-02-20 | File System Labs Llc | Matrixbasierte fehlerkorrektur- und löschcodeverfahren sowie vorrichtung und anwendung dafür |
| US7840680B2 (en) | 2008-10-15 | 2010-11-23 | Patentvc Ltd. | Methods and systems for broadcast-like effect using fractional-storage servers |
| US8504847B2 (en) | 2009-04-20 | 2013-08-06 | Cleversafe, Inc. | Securing data in a dispersed storage network using shared secret slices |
| US8132073B1 (en) * | 2009-06-30 | 2012-03-06 | Emc Corporation | Distributed storage system with enhanced security |
| US20110107182A1 (en) | 2009-10-30 | 2011-05-05 | Cleversafe, Inc. | Dispersed storage unit solicitation method and apparatus |
| US8352831B2 (en) | 2009-12-29 | 2013-01-08 | Cleversafe, Inc. | Digital content distribution utilizing dispersed storage |
| US10216647B2 (en) | 2010-02-27 | 2019-02-26 | International Business Machines Corporation | Compacting dispersed storage space |
| US8782227B2 (en) * | 2010-06-22 | 2014-07-15 | Cleversafe, Inc. | Identifying and correcting an undesired condition of a dispersed storage network access request |
| US8473778B2 (en) | 2010-09-08 | 2013-06-25 | Microsoft Corporation | Erasure coding immutable data |
| US8935493B1 (en) * | 2011-06-30 | 2015-01-13 | Emc Corporation | Performing data storage optimizations across multiple data storage systems |
| US8607122B2 (en) * | 2011-11-01 | 2013-12-10 | Cleversafe, Inc. | Accessing a large data object in a dispersed storage network |
| US8627066B2 (en) | 2011-11-03 | 2014-01-07 | Cleversafe, Inc. | Processing a dispersed storage network access request utilizing certificate chain validation information |
| US8656257B1 (en) | 2012-01-11 | 2014-02-18 | Pmc-Sierra Us, Inc. | Nonvolatile memory controller with concatenated error correction codes |
| US8799746B2 (en) | 2012-06-13 | 2014-08-05 | Caringo, Inc. | Erasure coding and replication in storage clusters |
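| WO2014005279A1 (zh) | 2012-07-03 | 2014-01-09 | Peking University Shenzhen Graduate School | Method and device for constructing a distributed storage code capable of exact regeneration |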
| US9304859B2 (en) | 2012-12-29 | 2016-04-05 | Emc Corporation | Polar codes for efficient encoding and decoding in redundant disk arrays |
| US9535802B2 (en) * | 2013-01-31 | 2017-01-03 | Technion Research & Development Foundation Limited | Management and recovery of distributed storage of replicas |
| RU2013128346A (ru) | 2013-06-20 | 2014-12-27 | EMC Corporation | Data coding for a data storage system based on generalized concatenated codes |
| US9241044B2 (en) | 2013-08-28 | 2016-01-19 | Hola Networks, Ltd. | System and method for improving internet communication by using intermediate nodes |
| JP2016534471A (ja) | 2013-10-18 | 2016-11-04 | Hitachi Data Systems Engineering Uk Limited | Target-driven independent data integrity and redundancy recovery in a shared-nothing distributed storage system |
| US9648100B2 (en) | 2014-03-05 | 2017-05-09 | Commvault Systems, Inc. | Cross-system storage management for transferring data across autonomous information management systems |
| MX364334B (es) | 2014-05-13 | 2019-04-23 | Cloud Crowding Corp | Distributed secure storage and transmission of streaming media content data |
| US9684594B2 (en) | 2014-07-16 | 2017-06-20 | ClearSky Data | Write back coordination node for cache latency correction |
| US10043211B2 (en) | 2014-09-08 | 2018-08-07 | Leeo, Inc. | Identifying fault conditions in combinations of components |
| KR101618269B1 (ko) | 2015-05-29 | 2016-05-04 | Industry-Academic Cooperation Foundation, Yonsei University | Encoding method and apparatus for data loss recovery in a distributed storage system |
| US10003357B2 (en) * | 2015-08-28 | 2018-06-19 | Qualcomm Incorporated | Systems and methods for verification of code resiliency for data storage |
| CN105335160B (zh) | 2015-11-10 | 2018-12-28 | Hohai University | JSF-based agile development method for web-side components |
| MX2018011241A (es) | 2016-03-15 | 2018-11-22 | Datomia Res Labs Ou | Distributed storage system data management and security |
| US10387248B2 (en) | 2016-03-29 | 2019-08-20 | International Business Machines Corporation | Allocating data for storage by utilizing a location-based hierarchy in a dispersed storage network |
| US10216740B2 (en) | 2016-03-31 | 2019-02-26 | Acronis International Gmbh | System and method for fast parallel data processing in distributed storage systems |
2017
- 2017-03-15: MX application MX2018011241A, published as MX2018011241A; status: unknown
- 2017-03-15: ES application ES17718631T, published as ES2899933T3; status: Active
- 2017-03-15: WO application PCT/US2017/022593, published as WO2017161050A2; status: Ceased
- 2017-03-15: US application US15/460,093, published as US10735137B2; status: Expired - Fee Related
- 2017-03-15: US application US15/460,119, published as US10608784B2; status: Active
- 2017-03-15: EP application EP17718631.9A, published as EP3430515B1; status: Active

2018
- 2018-09-14: MX application MX2022014374A, published as MX2022014374A; status: unknown
- 2018-09-16: IL application IL261816A, published as IL261816A; status: unknown

2020
- 2020-08-03: US application US16/983,323, published as US20210021371A1; status: Abandoned

2022
- 2022-03-29: US application US17/707,456, published as US20220368457A1; status: Abandoned

Also Published As
| Publication number | Publication date |
|---|---|
| MX2022014374A (es) | 2022-12-15 |
| US20220368457A1 (en) | 2022-11-17 |
| US20170272100A1 (en) | 2017-09-21 |
| MX2018011241A (es) | 2018-11-22 |
| US20170272209A1 (en) | 2017-09-21 |
| US10735137B2 (en) | 2020-08-04 |
| ES2899933T3 (es) | 2022-03-15 |
| US10608784B2 (en) | 2020-03-31 |
| WO2017161050A3 (en) | 2017-11-30 |
| IL261816A (en) | 2018-10-31 |
| US20210021371A1 (en) | 2021-01-21 |
| EP3430515A2 (de) | 2019-01-23 |
| WO2017161050A2 (en) | 2017-09-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20220368457A1 (en) | Distributed Storage System Data Management And Security | |
| US11777646B2 (en) | Distributed storage system data management and security | |
| CN113994626B (zh) | Distributed data storage method and system with enhanced security, resilience and control | |
| US10613776B2 (en) | Appyling multiple hash functions to generate multiple masked keys in a secure slice implementation | |
| US10846411B2 (en) | Distributed database systems and methods with encrypted storage engines | |
| US8190662B2 (en) | Virtualized data storage vaults on a dispersed data storage network | |
| US8171101B2 (en) | Smart access to a dispersed data storage network | |
| US10180912B1 (en) | Techniques and systems for data segregation in redundancy coded data storage systems | |
| US9891829B2 (en) | Storage of data with verification in a dispersed storage network | |
| US10693640B2 (en) | Use of key metadata during write and read operations in a dispersed storage network memory | |
| WO2007120437A2 (en) | Metadata management system for an information dispersed storage system | |
| GB2463078A (en) | Data storage and transmission using parity data | |
| CN111858149B (zh) | System belonging to a cluster, method for backup, and machine-readable medium | |
| US20250348215A1 (en) | Secure storage of data via a block-based distributed computer system | |
| US11782789B2 (en) | Encoding data and associated metadata in a storage network | |
| WO2019183547A1 (en) | Distributed storage system data management and security | |
| Wei et al. | Expanstor: Multiple cloud storage with dynamic data distribution | |
| US10360391B2 (en) | Verifiable keyed all-or-nothing transform | |
| Akintoye et al. | Lightweight Cloud Storage Systems: Analysis and Performance Evaluation | |
| Chaitanya et al. | Middleware for a re-configurable distributed archival store based on secret sharing | |
| Jun | Research and Implement of a Security iSCSI Based on SSL | |
| Gangathade et al. | Review on Secure System With Improved Reliability Using Distributed Deduplication |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20181015 |
| | AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | AX | Request for extension of the european patent | Extension state: BA ME |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| | 17Q | First examination report despatched | Effective date: 20200217 |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: GRANT OF PATENT IS INTENDED |
| | INTG | Intention to grant announced | Effective date: 20210422 |
| | GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| | GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
| | AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D |
| | REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
| | REG | Reference to a national code | Ref country code: DE Ref legal event code: R096 Ref document number: 602017046357 Country of ref document: DE |
| | REG | Reference to a national code | Ref country code: CH Ref legal event code: EP Ref country code: AT Ref legal event code: REF Ref document number: 1432862 Country of ref document: AT Kind code of ref document: T Effective date: 20211015 |
| | REG | Reference to a national code | Ref country code: LT Ref legal event code: MG9D |
| | REG | Reference to a national code | Ref country code: NL Ref legal event code: MP Effective date: 20210922 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | REG | Reference to a national code | Ref country code: AT Ref legal event code: MK05 Ref document number: 1432862 Country of ref document: AT Kind code of ref document: T Effective date: 20210922 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211223 |
| | REG | Reference to a national code | Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20220210 AND 20220216 |
| | REG | Reference to a national code | Ref country code: ES Ref legal event code: FG2A Ref document number: 2899933 Country of ref document: ES Kind code of ref document: T3 Effective date: 20220315 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220122 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220124 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | REG | Reference to a national code | Ref country code: DE Ref legal event code: R097 Ref document number: 602017046357 Country of ref document: DE |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| | 26N | No opposition filed | Effective date: 20220623 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | REG | Reference to a national code | Ref country code: BE Ref legal event code: MM Effective date: 20220331 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20220315 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20220315 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20220331 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20170315 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: MK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: ES Payment date: 20240429 Year of fee payment: 8 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210922 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE Payment date: 20250930 Year of fee payment: 9 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB Payment date: 20250930 Year of fee payment: 9 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR Payment date: 20250930 Year of fee payment: 9 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: CH Payment date: 20250930 Year of fee payment: 9 |