US20170060924A1 - B-Tree Based Data Model for File Systems - Google Patents
B-Tree Based Data Model for File Systems Download PDFInfo
- Publication number
- US20170060924A1 US20170060924A1 US15/084,401 US201615084401A US2017060924A1 US 20170060924 A1 US20170060924 A1 US 20170060924A1 US 201615084401 A US201615084401 A US 201615084401A US 2017060924 A1 US2017060924 A1 US 2017060924A1
- Authority
- US
- United States
- Prior art keywords
- file
- objects
- data
- tree
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
Images
Classifications
-
- G06F17/30327—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/289—Object oriented databases
-
- G06F17/30115—
-
- G06F17/30607—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Definitions
- This disclosure relates generally to data processing and, more specifically, to methods and systems for providing a B-tree based data model for organizing file systems.
- a file system may be viewed as a directed graph of objects, wherein nodes and leaves of the directed graph represent files and directories. Directories may further include subdirectories and files.
- each file or directory is assigned attributes that regulate user permissions for viewing, editing, and creation of files and directories. Attributes of directories and files are kept in the directed graph as objects. Large files in a file system may be split into a chain of blocks. Therefore, during a lifetime of a file system, the directed graph may be developed to include very long paths or chains of referring objects to form the root of the directed graph to leaves.
- any modifications to the file system such as a modification or creation of a new file or directory, require traveling the path through nodes to find a place for adding, creating, or modifying a new node. Having long and unbalanced paths from the root to leaves may require excessive input/output operations that may be time consuming and lead to unnecessarily redundant storage consumption.
- An example embodiment can provide a B-tree based data model constructed on top of an object store.
- the object store may include immutable, content-addressable, distributed objects representing fragments of user content.
- a method includes providing an object store to store objects.
- the objects can represent fragments of the data.
- Each object can be associated with an address.
- the method can further include associating a B-tree with the object store.
- the B-tree can include nodes, with each of the nodes including keys.
- Each key can be associated with at least one object from the object store.
- the method allows generating, based at least partially on objects from the object store, values for each key.
- the address of an object in the object store is based on content of the fragment of the data. In some embodiments, the method determines whether a size of an object from the object store is less than a pre-determined size. If the result of the determination is positive, a value of the object is stored in a particular node of the B-tree. The particular node includes a particular key associated with the object. If the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
- each object includes one of a metadata object or a data object.
- Metadata objects store at least a number of references to further objects from the data objects and an identification number associated with a file or a directory.
- the data object represents a continuous fragment of the file or a directory entry.
- the key associated with the metadata object includes at least three fields.
- the first field includes an indication of the metadata object.
- the second field includes the identification number.
- the third field includes a metadata index representing a distinct type of the metadata object.
- the type of the metadata object includes one of: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory.
- the key associated with the fragment of the file includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes an offset of the continuous fragment of the file from the beginning of the file.
- the key associated with the directory entry includes at least three fields.
- the first field includes an indication of the data object.
- the second field includes the identification number.
- the third field includes a hash calculated based on a literal name of the directory entry.
- calculating the hash includes applying a hash function to the literal name to obtain a preliminary hash.
- the hash function can include, for example, crc32c, SipHash, or SipHash-order3.
- the preliminary hash can be shifted by a pre-determined base number to obtain the hash. If the hash matches an existing hash, then the hash is incremented by 1.
- the steps of the method for organizing data are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
- FIG. 1 illustrates an example system for organizing computer data.
- FIG. 2 illustrates an example of two collections sharing mutual data objects.
- FIG. 3 illustrates an example B-tree.
- FIG. 4 displays a table containing types of variables.
- FIG. 5A illustrates attributes describing a superblock entry in a data model.
- FIG. 5B illustrates attributes of a key used in a B-tree based data model.
- FIG. 6 displays attributes of a metadata, attributes of a file entry, and additional attributes of a directory entry in a B-tree based data model.
- FIG. 7 illustrates internal and external forms for extended attributes used in a B-tree data model.
- FIG. 8 illustrates a key, internal form, and external form for values for a file chunk.
- FIG. 9 illustrates a key and forms for values for a directory entry.
- FIG. 10 is a process flow diagram showing a method for organizing data.
- FIG. 11 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
- the technology described herein allows organizing data using a B-tree built on top of an object store.
- the object store includes immutable content-addressable distributed objects.
- the objects may represent fragments of data (for example, fragments of files and directories and metadata including attributes of files and directories).
- the method for organizing data includes providing an object store to store objects.
- the objects represent fragments of the data.
- Each object is associated with an address.
- the method can further include associating a B-tree with the object store.
- the B-tree includes nodes, with each node including keys.
- Each key can be associated with at least one object from the object store.
- the method allows generating, based at least partially on objects from the object store, values for each of the keys.
- FIG. 1 illustrates an example system 100 for organizing data, according to various example embodiments.
- the illustrated system 100 includes applications 110 , a directed graph 120 , and an object store 130 .
- applications may include operational systems, user applications, graphical user interface, and other software for creating, managing, and presenting the data.
- the object store 130 includes key-value entries (objects).
- Each of the key-value entries includes an identifier (a key) and a payload (also referred to as a value or an object content) representing a chunk of the data (for example, a chunk of user content.)
- objects are designated as either “data” or “metadata.” Payloads of “data” objects include only uninterpreted bytes. Payloads of “metadata” objects have an internal structure and may refer to other objects. In some embodiments, the payloads of “metadata” objects include keys of the objects to which the “metadata” objects refers.
- objects in the object store 130 are organized by a graph structure 120 (also referred as a directed graph or a collection).
- the graph structure 120 is a directed graph in which each node is an immutable content-addressable object.
- the collection includes a specific object designated as a root object.
- objects are at least partially content-addressable. This implies that some portions of the identifiers of objects are functions of object contents.
- a key for the object representing a chunk of data can be calculated using a smart hash (Smash) which is a function of bytes in the chunk.
- Smash smart hash
- each version of data corresponds to a new graph structure. Two graph structures representing two different snapshots can share mutual objects.
- FIG. 2 is a block diagram showing two collections 100 and 200 as having mutual objects.
- Collection 100 includes root objects 110 and objects 122 , 124 , 126 , 128 , 130 , and 230 .
- the collection 200 includes root object 210 , objects 222 , 224 , 226 , 228 , 230 , 126 , 130 , and 232 .
- the collections 200 and 100 share at least objects 126 , 130 , and 230 .
- the collection 200 is an increment of graph 100 (shown in FIG. 2 ).
- Each of the root objects is the root of a specific unique immutable graph.
- a new graph is constructed that includes the required changes. If the new graph is an incremental change from a previous graph, then the new graph is likely to share a large number of its objects with the previous graph. The only change required by the new objects is a new root.
- the objects in object store 130 include entities such as identifier node (“mode”), extended attributes (“xattr”), a symbolic link (“symlink”), and chunks of files and directories.
- each file or directory is represented by a chain of at least three objects: “mode” object, “xattr” object, and a “data” object.
- An “mode” object refers the “xattr” object and the “data” object.
- An “mode” object includes an identification number (“inum”) associated with the file or directory, number of links, file type and permission, creation and modification of file or directory, and other attributes.
- An “xattr” object includes extended file attributes depending on a type of a file system, such as information concerning an author of a text, a checksum, encoding, and so forth.
- An extended attribute may include a name of the attribute and a value associated with the attribute's name.
- the “data” object is an object containing a chunk of data (for example, a chunk of a file).
- an address of an object in object store 130 is a Smash, which is a hash function of the content of the object.
- the graph structure 120 includes a B-tree (also referred to as a B-tree model).
- a B-tree model for referencing metadata objects and data objects may allow accessing files by “mode” number in order to support persistent file handles.
- the B-tree may provide tools for managing efficient large directories having up to 1,000,000 entries, hard links, and a large amount of small files.
- the B-tree model allows flexible data layouts for non-streaming workloads.
- B-tree collections tend to have fewer, larger objects with many smaller file system entities such as “modes,” “symlinks,” “xattrs,” and small amounts of data packed together.
- the downside includes an increased conceptual complexity resulting in a structure that is unlike the user-visible file system namespace, and in some cases, results in increased input/output (I/O) resource consumption due to additional read-modify-write operations to repack updates within an otherwise unchanged object.
- I/O input/output
- All leaves in a B-tree have the same distance from the root with wide fanouts.
- the left to root path length is bounded to a small number, typically no more than 4-5 levels, even for bigger file systems. This property results in efficient access patterns to look data up and minimal I/O amplification when writing.
- a “libebtree” library is used for the B-tree implementation.
- keys in B-tree nodes have a fixed size.
- values in B-tree nodes have variable sizes.
- much of the overall structure of a file system is placed into the design of the key space.
- values associated with keys are then used to store small items efficiently packed into larger storage objects while larger objects are directly stored in object store 130 .
- Packing small items such as “mode” attributes, extended attributes, “symlinks,” and small file content into B-tree nodes can significantly reduce the number of objects required.
- a small file with a “xattr” can be represented by 3 objects (“mode,” “xattr,” and “data”) in a collection described in FIG. 2 .
- the small file can be packed into objects along with many other “inodes/xattrs/data” objects, consuming less than one object for the entire file.
- data storage is not limited by a fixed block size.
- Block sizes can be chosen to match the workload.
- Non-sequential I/O patterns can be matched to small blocks in order to avoid excess I/O amplification, whereas larger streaming writes are better served by large blocks.
- the B-tree model design allows using file blocks from 1 kilobyte to 1 Megabyte.
- the example B-tree 300 includes at least a root 302 , nodes 304 , 306 , and 308 , and a leaf 310 .
- the root 302 includes at least a key 308 (k 1 ).
- the node 304 includes at least keys 312 (k 2 ), 314 (k 3 ), and 316 (k 4 ).
- the node 306 includes at least keys 318 (k 5 ) and 320 (k 6 ).
- the node 308 includes at least keys 322 (k 7 ) and 324 (k 8 ).
- the example leaf 310 includes at least key 326 (k 9 ).
- the nodes 304 and 306 are referred by the root 302 and refer further nodes (for example, node 308 ).
- the key k 1 of root 302 divides all further nodes into two subtrees.
- the first subtree starts with the node 304 .
- Keys k 2 , k 3 , and k 4 are all less than key k 1 .
- the second subtree starts with the node 306 as a subtree. Keys k 5 and k 6 are greater than key k 1 .
- Node 304 may further divide B-tree 300 and refer at least four further subtrees, while node 306 may further divide B-tree into at least three further subtree.
- the example node 304 refers at least node 308 .
- Keys k 7 and k 8 in node 308 are greater than key k 3 and less than key k 4 of the node 304 .
- the leaf 312 does not refer any further nodes.
- Each node and leaf of B-tree may also include value fields (also referred to as payloads) for each key in the node or the leaf to store data inline, i.e. inside the B-tree.
- FIG. 4 is a table 400 of types of variables that can be used for generating keys of B-trees, according to various example embodiments.
- FIG. 5A is a table showing superblock 510 containing attributes associated with an example file system modeled via a B-tree.
- the superblock 510 can be implemented as a single, specific object in object store 130 .
- the superblock 510 includes a magic number that identifies the superblock in the object store 130 , unique identifier for the file system, update sequence of the file system, key of the root object in a B-tree, maximum object size in a B-tree, maximum object size of data in object store 130 , and maximum size of data that can be included in a B-tree.
- the identifier of file system and the update sequence can be used as references for debugging, as forensics to identify root objects for a given collection, and to determine the chronology of the root objects.
- FIG. 5B shows fields of a key 520 located at a node in a B-tree, according to various example embodiments.
- the key 520 includes three fields: a “data” field 522 , an “inum” field 524 , and an “index/offset” field 526 .
- the “data” field identifies type of object: data or metadata.
- “data” field 522 is equal to 0 for metadata and 1 for data.
- “Inum” field 524 indicates which file, directory, or metadata entry the key 520 presents.
- “Index/offset” field 526 relates to a particular chunk of data or an index of metadata.
- the field 526 is an index if key 520 associated with metadata (“data” field 522 is 0) and offset of a particular file chunk if key 520 relates to a file (“data” field 522 is 1).
- each distinct type of metadata has its own metadata index.
- a list of metadata indexes is shown in table 610 in FIG. 6 .
- the metadata index 612 in table 610 is the value placed in “index/offset” field 526 of key 520 if the key 520 relates to a metadata (that is when “data” field 522 is 1).
- metadata index 610 (and corresponding “index/offset” field 526 ) is 1 if the metadata are POSIX file attributes and 2 if metadata is a “symlink” target path.
- the target data for a “symlink” is the literal bytes of the target.
- the literal bytes of the target are stored in a value associated with a key 510 inside a node of a B-tree. If key 520 relates to a metadata containing an extended attribute of a file, index 612 is a hash of a name and value of the extended attribute.
- FIG. 6 also illustrates table 620 containing attributes 614 of a file or a directory and table 630 containing additional attributes 616 for a directory.
- the file attributes 614 include regular POSIX attributes.
- Additional attributes 616 for directories include a hash function (“hashnf”) and a key for the hash function (“hashkey”).
- “hashnf” is 0 if the hash function is cyclic redundancy check “crh32c,” 1 if the hash function is SipHash, and 2 if the hash function is SipHash with 3 characters of ordering.
- root directory has a constant “Inode” number of 1000.
- the root directory has no parent.
- the root includes an “. . . ” entry pointing back to the root directory. All new directories, including the root directory, start with attribute “nlink” equal to 2.
- “xattrs” are indexed by name using a “namehash” scheme which is further described in FIG. 9 and “crc32c” as the hash function.
- the value of a key associated with “xattrs” can be in one of the two forms: internal or external.
- FIG. 7 illustrates internal form 710 and external form 720 for a value of a key associated with an extended attribute.
- Both forms include an “inline” field 712 and second field 714 .
- the “inline” field 712 is 1 and the second field 714 includes a literal name (and a value) of an extended attribute.
- the “inline” field 712 is equal to 0 and the second field 714 includes a smart hash of an object in an object store, wherein the object includes the extended attribute. If the size of the extended attribute is larger than “maxinlinesize” (described in FIG. 5A ), then the external form 720 is stored in a B-tree.
- the value for the extended attribute is stored inline in a B-tree using form 710 .
- the size of the value for extended attribute may be as large as “maxdataobjsize” (described in FIG. 5A ) so that the extended attribute values are never become more than one object in size.
- FIG. 8 shows a key 810 for a chunk of file, an internal form 820 for value of the file chunk, and an external form 830 for value for the file chunk, according to some example embodiments.
- Key 810 includes a “data” field 522 set to 1, an “Inum” field 524 , and “offset” field 526 .
- “Inum” field is the metadata identifier of the file.
- “Offset” field is the offset of the file chunk from the beginning of file.
- Chunk also referred to as a fragment or a continuous fragment
- Two chunks of the same file are not allowed to overlap. Gaps between chunks and gaps between the end of the last chunk and file size are implicitly zero-filled.
- value of the file chunk can be stored inline in a B-tree and externally in object store 130 .
- Internal form 820 is used when the chunk is stored directly in a B-tree.
- Internal form 820 including a “BTC_DATA_INLINE” field 822 is set to 1 and a second field 824 containing literal byte data represents the chunk.
- the file chunk is stored inline in a node of a B-tree if the size of the chunk is less than “maxinlinesize” (described in FIG. 5A ).
- External form 830 is used when the file chunk is stored externally in object store 130 .
- External form 830 includes a “BTC_DATA_EXTERN” field 832 set to 2, “size” field 834 representing size of the file chunk, and Smash of an object in object store 130 , and the object corresponding to the file chunk.
- the external form is stored in a node of a B-tree if size of the chunk is larger than “maxinlinesize”(described in FIG. 5A ).
- dividing files in chunks (also referred herein as chunking) is performed in accordance with one or more policy. While introducing a policy for chunking one may consider following facts:
- a “block-aligned chunking” policy is applied.
- the block-aligned policy causes a chunk structure of a file to reflect write patterns to the file. If the file is written with streaming sequential writes then the chunks end up being as large as possible, which minimizes the per-object overhead and metadata. If the file is written non-sequentially, the chunks reflect the IO sizes of the writes to minimize read-modify-write overhead of updating a portion of a chunk's object.
- the data when data is written to a file, the data is accumulated in a memory waiting for a subsequent write. Adjacent writes are merged in an object up to the maxdataobjectsize.
- existing data for non-sequential writes is removed and replaced with the new data.
- the block size logically subdivides the file into a sequence of block sized and aligned segments.
- a chunk can span multiple blocks, but not more than one chunk can be located within a block. Therefore, if a write is not block-aligned, any existing chunk is split at the block boundaries, and a new chunk is inserted.
- the new chunk contains a combination of the old and new data. If the overwrite is already block aligned, then the new data is simply written. If the overwrite aligns with an existing chunk then the chunk is simply replaced without affecting the surrounding chunks.
- the chunks are merged to maintain the invariant of no more than one chunk per block.
- an additional constraint is introduced that forces chunks to be always split at “maxdataobjsize” boundaries in the file. This constrain may allow two files which are substantially similar to deduplicate against each other by allowing chunking to resynchronize at “maxdataobjsize” boundaries.
- a “fingerprint chunking” policy can be applied.
- the “fingerprint chunking” policy is intended to maximize the opportunities for deduplication by making the chunk structure a function of the file content rather than the write patterns.
- the same byte sequence may result in the same chunk structure, and, therefore, the same object IDs.
- Rabin fingerprint algorithm can be used to select content-dependent chunk division points.
- the “fingerprint chunking” policy may work best for streaming sequential writes. Non-sequential overwrites are especially expensive, as the new writes need to be merged with existing data, and the new data re-chunked according to the fingerprinting.
- a compression can be applied while chunking the file data.
- the user data (as large as possible) are fed into a compression algorithm at write time in order to create a specific output size.
- the compression can reduce not only the data size in bytes, but also in objects.
- compression is a content-dependent transformation, and is easiest to apply to streaming writes. Using compression for non-sequential writes, and, especially, overwrites, is expensive because the read-modify-write cycle also requires decompression and recompression.
- an “inlining” policy is applied.
- the inlining may allow small data to be directly embedded within a B-tree, rather than requiring small external objects in the object store.
- the inlining precludes deduplication for small files and results in a minimum amount of space savings from deduping small files.
- a B-tree collection possesses a global “maxinlinesize” setting. Maxinlinesize is defined at creation time of the B-tree collection to set the upper bound on the largest chunk that may be inlined. Typically, maxinlinesize is about 4k. As mentioned above, each file is also associated with an “inlinesize” which defines the chunk size that is inlined. The inlining is useful for small files or large sparse files with lots of small spans. The default “inlinesize” is 1k.
- FIG. 9 shows a key 910 for a directory entry and value 920 for the directory entry.
- the directories are indexed by name using a “namehash” scheme similar to extended attributes “xattr.”
- a region of the B-tree key space is reserved starting from a “base” for a certain number of slots. Slots are populated with entries as the entries are added. In some embodiments, it is assumed that the number of slots is very large with respect to expected number of entries, so the likelihood of single collisions is low.
- B-tree key 910 includes “data” filed field equal to 1, “Inum” field 524 representing a metadata identifier of the directory, and “offset” field 526 which includes a “diroffset” value.
- “Diroffset” is a hash function of a literal name of a directory entry (also referred to as a directory chunk).
- the value of directory entry 522 includes a “BTC_DIRENT” field 922 set to 0, a name field 924 holding the name of the directory entry, a “direnttype” field 926 holding the type of entry, and “inum” field 928 holding a metadata identifier of the directory entry.
- Each entry in slots has a form 930 as shown in FIG. 9 .
- the “tag” field is used by directories for the chunk type field.
- An entry with a zero-length name (the name field is a single 0x00 byte) is a tombstone 940 .
- a tombstone 940 is a deleted entry, which is required so that a lookup algorithm can find later names with a colliding hash.
- the name of the entry is hashed with a hash function to find a corresponding key, which is a slot in reserved slots. If the located slot is occupied by another name due to a hash collision, the key is incremented until it finds an available slot, which can be either a completely unused slot or occupied by a tombstone.
- the name is hashed to determine the corresponding slot. If the determined slot is unused, the entry does not exist. If the determined slot is occupied and the name does not match, the key is incremented until the name is found or an unused slot is found. Any tombstones encountered are then ignored and skipped over.
- the lookup algorithm is used to find the slot for the name in order to form B-tree. If the slot exists and the next slot is unused, then the entry is deleted. If the next slot is occupied, then it is assumed that a hash collision occurs and the entry is replaced by a tombstone (a zero-length name, and no payload). If there is a series of tombstones followed by an empty entry, the tombstones can be deleted.
- three types of hash function can be used:
- crc32c It is a resource inexpensive 32-bit hash. crc32c is neither strong nor collision resistant. Resulting names will have no apparent order and can be attacked to cause collisions, resulting in a denial-of-service attack. crc32c is used for “xattrs” since “xattr” is not generally writable from untrusted sources, and the number of “xattr” is not large.
- SipHash It is an efficient 64-bit keyed hash. Each directory has its own randomly generated key which is not exposed outside of the filesystem. Any attacker would need to know the key to be able to cause collisions at a higher rate than a random chance.
- SipHash-order3 SipHash-order3. It is a variant of SipHash that truncates the hash to 4 bytes and prepends 3 bytes of the name to the start of the hash. With a “normal” mix of names in a directory, this may result in a directory that is nearly lexically sorted, with entries having a common prefix being mixed randomly. This may help to accommodate applications which tend to operate on directories in an alphabetical order.
- a case-insensitive variant of a hash function can be used to support case-insensitive names of directories.
- the hash function may be marked as “CICP” (that is “case-insensitive, case-preserving”).
- CICP case-insensitive, case-preserving
- CICP case-insensitive, case-preserving
- CICP case-insensitive, case-preserving
- CICP case-insensitive, case-preserving
- Applying a case-insensitive hash function allows performing filesystem operations on large directories to be efficient. For example, without case-insensitive lookups, Samba package is needed to scan the whole directory to perform case-insensitive matching. Names of directories are required to be properly formed with UTF-8 coding.
- an “orphan directory” is provided.
- FIG. 10 is a process flow diagram showing a method 1000 for organizing data, according to an example embodiment.
- the method 1000 can be implemented using a computer system.
- the example computer system is shown in FIG. 11 .
- the method 1000 may commence with providing an object store to store objects in block 1010 .
- the object represents fragments of the data and can be assigned addresses.
- method 1000 can proceed with associating a B-tree with the object store.
- the B-tree includes nodes.
- Each of the nodes includes keys.
- Each of the keys is associated with at least one object from the object store.
- method 1000 generates, based at least partially on objects from the object store, values for the each of the keys.
- method 1000 allows determining that a size of an object from the object store is less than a pre-determined size.
- a value of the object is stored in a particular node of the B-tree, wherein the particular node includes a particular key associated with the object.
- the address of the object is stored in the particular node of the B-tree.
- FIG. 11 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1100 , within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
- the machine operates as a standalone device or can be connected (e.g., networked) to other machines.
- the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- the machine can be a server, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as an Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- a portable music player e.g., a portable hard drive audio device, such as an Moving Picture Experts Group Audio Layer 3 (MP3) player
- MP3 Moving Picture Experts Group Audio Layer 3
- a web appliance e.g., a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- MP3 Moving Picture Experts Group Audio Layer 3
- the example computer system 1100 includes a processor or multiple processors 1102 , a hard disk drive 1104 , a main memory 1106 , and a static memory 1108 , which communicate with each other via a bus 1110 .
- the computer system 1100 may also include a network interface device 1112 .
- the hard disk drive 1104 may include a computer-readable medium 1120 , which stores one or more sets of instructions 1122 embodying or utilized by any one or more of the methodologies or functions described herein.
- the instructions 1122 can also reside, completely or at least partially, within the main memory 1106 and/or within the processors 1102 during execution thereof by the computer system 1100 .
- the main memory 1106 and the processors 1102 also constitute machine-readable media.
- While the computer-readable medium 1120 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions.
- computer-readable medium shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like.
- the exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
- the computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
- computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other compilers, assemblers, interpreters or other computer languages or platforms.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- The present application claims benefit of U.S. provisional application No. 62/210,385 filed on Aug. 26, 2015. The disclosure of the aforementioned application is incorporated herein by reference for all purposes.
- This disclosure relates generally to data processing and, more specifically, to methods and systems for providing a B-tree based data model for organizing file systems.
- The approaches described in this section could be pursued but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
- In computer systems, user data are typically organized as file systems. In general, a file system may be viewed as a directed graph of objects, wherein nodes and leaves of the directed graph represent files and directories. Directories may further include subdirectories and files. In a multi-user computer system, each file or directory is assigned attributes that regulate user permissions for viewing, editing, and creation of files and directories. Attributes of directories and files are kept in the directed graph as objects. Large files in a file system may be split into a chain of blocks. Therefore, during a lifetime of a file system, the directed graph may be developed to include very long paths or chains of referring objects to form the root of the directed graph to leaves. Therefore, any modifications to the file system, such as a modification or creation of a new file or directory, require traveling the path through nodes to find a place for adding, creating, or modifying a new node. Having long and unbalanced paths from the root to leaves may require excessive input/output operations that may be time consuming and lead to unnecessarily redundant storage consumption.
- This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- The technology described herein includes methods for organizing data. An example embodiment can provide a B-tree based data model constructed on top of an object store. The object store may include immutable, content-addressable, distributed objects representing fragments of user content.
- According to the example embodiment, a method includes providing an object store to store objects. The objects can represent fragments of the data. Each object can be associated with an address. The method can further include associating a B-tree with the object store. The B-tree can include nodes, with each of the nodes including keys. Each key can be associated with at least one object from the object store. The method allows generating, based at least partially on objects from the object store, values for each key.
- In some embodiments, the address of an object in the object store is based on content of the fragment of the data. In some embodiments, the method determines whether a size of an object from the object store is less than a pre-determined size. If the result of the determination is positive, a value of the object is stored in a particular node of the B-tree. The particular node includes a particular key associated with the object. If the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
- In some embodiments, each object includes one of a metadata object or a data object. Metadata objects store at least a number of references to further objects from the data objects and an identification number associated with a file or a directory. The data object represents a continuous fragment of the file or a directory entry.
- In some embodiments, the key associated with the metadata object includes at least three fields. The first field includes an indication of the metadata object. The second field includes the identification number. The third field includes a metadata index representing a distinct type of the metadata object.
- In some embodiments, the type of the metadata object includes one of: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory. In some embodiments, the key associated with the fragment of the file includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes an offset of the continuous fragment of the file from the beginning of the file.
- In some embodiments, the key associated with the directory entry includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes a hash calculated based on a literal name of the directory entry.
- In some embodiments, calculating the hash includes applying a hash function to the literal name to obtain a preliminary hash. The hash function can include, for example, crc32c, SipHash, or SipHash-order3. The preliminary hash can be shifted by a pre-determined base number to obtain the hash. If the hash matches an existing hash, then the hash is incremented by 1.
- According to another example embodiment of the present disclosure, the steps of the method for organizing data are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
- Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
-
FIG. 1 illustrates an example system for organizing computer data. -
FIG. 2 illustrates an example of two collections sharing mutual data objects. -
FIG. 3 illustrates an example B-tree. -
FIG. 4 displays a table containing types of variables. -
FIG. 5A illustrates attributes describing a superblock entry in a data model. -
FIG. 5B illustrates attributes of a key used in a B-tree based data model. -
FIG. 6 displays attributes of a metadata, attributes of a file entry, and additional attributes of a directory entry in a B-tree based data model. -
FIG. 7 illustrates internal and external forms for extended attributes used in a B-tree data model. -
FIG. 8 illustrates a key, internal form, and external form for values for a file chunk. -
FIG. 9 illustrates a key and forms for values for a directory entry. -
FIG. 10 is a process flow diagram showing a method for organizing data. -
FIG. 11 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. - The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings illustrate exemplary embodiments. These exemplary embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
- The technology described herein allows organizing data using a B-tree built on top of an object store. In some embodiments, the object store includes immutable content-addressable distributed objects. The objects may represent fragments of data (for example, fragments of files and directories and metadata including attributes of files and directories).
- According to an example embodiment, the method for organizing data includes providing an object store to store objects. The objects represent fragments of the data. Each object is associated with an address. The method can further include associating a B-tree with the object store. The B-tree includes nodes, with each node including keys. Each key can be associated with at least one object from the object store. The method allows generating, based at least partially on objects from the object store, values for each of the keys.
-
FIG. 1 illustrates anexample system 100 for organizing data, according to various example embodiments. The illustratedsystem 100 includesapplications 110, a directedgraph 120, and anobject store 130. In various embodiments, applications may include operational systems, user applications, graphical user interface, and other software for creating, managing, and presenting the data. - In various embodiments, the
object store 130 includes key-value entries (objects). Each of the key-value entries includes an identifier (a key) and a payload (also referred to as a value or an object content) representing a chunk of the data (for example, a chunk of user content.) In various embodiments, objects are designated as either “data” or “metadata.” Payloads of “data” objects include only uninterpreted bytes. Payloads of “metadata” objects have an internal structure and may refer to other objects. In some embodiments, the payloads of “metadata” objects include keys of the objects to which the “metadata” objects refers. - In some embodiments, objects in the
object store 130 are organized by a graph structure 120 (also referred as a directed graph or a collection). Thegraph structure 120 is a directed graph in which each node is an immutable content-addressable object. The collection includes a specific object designated as a root object. - In various embodiments, objects are at least partially content-addressable. This implies that some portions of the identifiers of objects are functions of object contents. A key for the object representing a chunk of data can be calculated using a smart hash (Smash) which is a function of bytes in the chunk. In some embodiments, each version of data (a snapshot) corresponds to a new graph structure. Two graph structures representing two different snapshots can share mutual objects.
-
FIG. 2 is a block diagram showing two 100 and 200 as having mutual objects.collections Collection 100 includes root objects 110 and 122, 124, 126, 128, 130, and 230. Theobjects collection 200 includesroot object 210, objects 222, 224, 226, 228, 230, 126, 130, and 232. The 200 and 100 share atcollections 126, 130, and 230. In the example ofleast objects FIG. 2 , thecollection 200 is an increment of graph 100 (shown inFIG. 2 ). Each of the root objects is the root of a specific unique immutable graph. In order to get the effect of a modification or mutation, a new graph is constructed that includes the required changes. If the new graph is an incremental change from a previous graph, then the new graph is likely to share a large number of its objects with the previous graph. The only change required by the new objects is a new root. - In some embodiments, the objects in
object store 130 include entities such as identifier node (“mode”), extended attributes (“xattr”), a symbolic link (“symlink”), and chunks of files and directories. In a collection (for example,collection 100 or 200), each file or directory is represented by a chain of at least three objects: “mode” object, “xattr” object, and a “data” object. An “mode” object refers the “xattr” object and the “data” object. An “mode” object includes an identification number (“inum”) associated with the file or directory, number of links, file type and permission, creation and modification of file or directory, and other attributes. An “xattr” object includes extended file attributes depending on a type of a file system, such as information concerning an author of a text, a checksum, encoding, and so forth. An extended attribute may include a name of the attribute and a value associated with the attribute's name. The “data” object is an object containing a chunk of data (for example, a chunk of a file). In some embodiments, an address of an object inobject store 130 is a Smash, which is a hash function of the content of the object. - In some embodiments, the
graph structure 120 includes a B-tree (also referred to as a B-tree model). In various embodiments, using a B-tree model for referencing metadata objects and data objects may allow accessing files by “mode” number in order to support persistent file handles. The B-tree may provide tools for managing efficient large directories having up to 1,000,000 entries, hard links, and a large amount of small files. The B-tree model allows flexible data layouts for non-streaming workloads. - The result is that B-tree collections tend to have fewer, larger objects with many smaller file system entities such as “modes,” “symlinks,” “xattrs,” and small amounts of data packed together. The downside includes an increased conceptual complexity resulting in a structure that is unlike the user-visible file system namespace, and in some cases, results in increased input/output (I/O) resource consumption due to additional read-modify-write operations to repack updates within an otherwise unchanged object.
- All leaves in a B-tree have the same distance from the root with wide fanouts. The left to root path length is bounded to a small number, typically no more than 4-5 levels, even for bigger file systems. This property results in efficient access patterns to look data up and minimal I/O amplification when writing.
- In some embodiments, a “libebtree” library is used for the B-tree implementation. In some embodiments, keys in B-tree nodes have a fixed size. In other embodiments, values in B-tree nodes have variable sizes. In some embodiments, much of the overall structure of a file system is placed into the design of the key space. In various embodiments, values associated with keys are then used to store small items efficiently packed into larger storage objects while larger objects are directly stored in
object store 130. - Packing small items such as “mode” attributes, extended attributes, “symlinks,” and small file content into B-tree nodes can significantly reduce the number of objects required. For example, a small file with a “xattr” can be represented by 3 objects (“mode,” “xattr,” and “data”) in a collection described in
FIG. 2 . In the B-tree model, the small file can be packed into objects along with many other “inodes/xattrs/data” objects, consuming less than one object for the entire file. - In some embodiments, data storage is not limited by a fixed block size. Block sizes can be chosen to match the workload. Non-sequential I/O patterns can be matched to small blocks in order to avoid excess I/O amplification, whereas larger streaming writes are better served by large blocks. The B-tree model design allows using file blocks from 1 kilobyte to 1 Megabyte.
- An example B-
tree 300 is shown inFIG. 3 . The example B-tree 300 includes at least aroot 302, 304, 306, and 308, and anodes leaf 310. Theroot 302 includes at least a key 308 (k1). Thenode 304 includes at least keys 312 (k2), 314 (k3), and 316 (k4). Thenode 306 includes at least keys 318 (k5) and 320 (k6). Thenode 308 includes at least keys 322 (k7) and 324 (k8). Theexample leaf 310 includes at least key 326 (k9). The 304 and 306 are referred by thenodes root 302 and refer further nodes (for example, node 308). The key k1 ofroot 302 divides all further nodes into two subtrees. The first subtree starts with thenode 304. Keys k2, k3, and k4 are all less than key k1. The second subtree starts with thenode 306 as a subtree. Keys k5 and k6 are greater than key k1.Node 304 may further divide B-tree 300 and refer at least four further subtrees, whilenode 306 may further divide B-tree into at least three further subtree. Theexample node 304 refers atleast node 308. Keys k7 and k8 innode 308 are greater than key k3 and less than key k4 of thenode 304. The leaf 312 does not refer any further nodes. Each node and leaf of B-tree may also include value fields (also referred to as payloads) for each key in the node or the leaf to store data inline, i.e. inside the B-tree. -
FIG. 4 is a table 400 of types of variables that can be used for generating keys of B-trees, according to various example embodiments.FIG. 5A is atable showing superblock 510 containing attributes associated with an example file system modeled via a B-tree. In some embodiments, thesuperblock 510 can be implemented as a single, specific object inobject store 130. Thesuperblock 510 includes a magic number that identifies the superblock in theobject store 130, unique identifier for the file system, update sequence of the file system, key of the root object in a B-tree, maximum object size in a B-tree, maximum object size of data inobject store 130, and maximum size of data that can be included in a B-tree. In some embodiments, the identifier of file system and the update sequence can be used as references for debugging, as forensics to identify root objects for a given collection, and to determine the chronology of the root objects. -
FIG. 5B shows fields of a key 520 located at a node in a B-tree, according to various example embodiments. The key 520 includes three fields: a “data”field 522, an “inum”field 524, and an “index/offset”field 526. The “data” field identifies type of object: data or metadata. In some embodiments, “data”field 522 is equal to 0 for metadata and 1 for data. “Inum”field 524 indicates which file, directory, or metadata entry the key 520 presents. “Index/offset”field 526 relates to a particular chunk of data or an index of metadata. Thefield 526 is an index ifkey 520 associated with metadata (“data”field 522 is 0) and offset of a particular file chunk ifkey 520 relates to a file (“data”field 522 is 1). - In some embodiments, each distinct type of metadata has its own metadata index. A list of metadata indexes is shown in table 610 in
FIG. 6 . Themetadata index 612 in table 610 is the value placed in “index/offset”field 526 ofkey 520 if the key 520 relates to a metadata (that is when “data”field 522 is 1). In some embodiments, metadata index 610 (and corresponding “index/offset” field 526) is 1 if the metadata are POSIX file attributes and 2 if metadata is a “symlink” target path. The target data for a “symlink” is the literal bytes of the target. In some embodiments, the literal bytes of the target are stored in a value associated with a key 510 inside a node of a B-tree. Ifkey 520 relates to a metadata containing an extended attribute of a file,index 612 is a hash of a name and value of the extended attribute. -
FIG. 6 also illustrates table 620 containingattributes 614 of a file or a directory and table 630 containingadditional attributes 616 for a directory. In various embodiments, the file attributes 614 include regular POSIX attributes.Additional attributes 616 for directories include a hash function (“hashnf”) and a key for the hash function (“hashkey”). In some embodiments, “hashnf” is 0 if the hash function is cyclic redundancy check “crh32c,” 1 if the hash function is SipHash, and 2 if the hash function is SipHash with 3 characters of ordering. - In some embodiments, root directory has a constant “Inode” number of 1000. The root directory has no parent. The root includes an “. . . ” entry pointing back to the root directory. All new directories, including the root directory, start with attribute “nlink” equal to 2.
- Referring back to table 610, in some embodiments, “xattrs” are indexed by name using a “namehash” scheme which is further described in
FIG. 9 and “crc32c” as the hash function. The value of a key associated with “xattrs” can be in one of the two forms: internal or external. -
FIG. 7 illustratesinternal form 710 andexternal form 720 for a value of a key associated with an extended attribute. Both forms include an “inline”field 712 andsecond field 714. Ininternal form 710, the “inline”field 712 is 1 and thesecond field 714 includes a literal name (and a value) of an extended attribute. Inexternal form 720, the “inline”field 712 is equal to 0 and thesecond field 714 includes a smart hash of an object in an object store, wherein the object includes the extended attribute. If the size of the extended attribute is larger than “maxinlinesize” (described inFIG. 5A ), then theexternal form 720 is stored in a B-tree. If the extended attribute does not exceed “maxinlinesize,” then the value for the extended attribute is stored inline in a B-tree using form 710. The size of the value for extended attribute may be as large as “maxdataobjsize” (described inFIG. 5A ) so that the extended attribute values are never become more than one object in size. -
FIG. 8 shows a key 810 for a chunk of file, aninternal form 820 for value of the file chunk, and anexternal form 830 for value for the file chunk, according to some example embodiments.Key 810 includes a “data”field 522 set to 1, an “Inum”field 524, and “offset”field 526. “Inum” field is the metadata identifier of the file. “Offset” field is the offset of the file chunk from the beginning of file. Chunk (also referred to as a fragment or a continuous fragment) represents a byte extent of file content. Two chunks of the same file are not allowed to overlap. Gaps between chunks and gaps between the end of the last chunk and file size are implicitly zero-filled. - In some embodiments, value of the file chunk can be stored inline in a B-tree and externally in
object store 130.Internal form 820 is used when the chunk is stored directly in a B-tree.Internal form 820 including a “BTC_DATA_INLINE”field 822 is set to 1 and asecond field 824 containing literal byte data represents the chunk. The file chunk is stored inline in a node of a B-tree if the size of the chunk is less than “maxinlinesize” (described inFIG. 5A ). -
External form 830 is used when the file chunk is stored externally inobject store 130.External form 830 includes a “BTC_DATA_EXTERN”field 832 set to 2, “size”field 834 representing size of the file chunk, and Smash of an object inobject store 130, and the object corresponding to the file chunk. The external form is stored in a node of a B-tree if size of the chunk is larger than “maxinlinesize”(described inFIG. 5A ). - In some embodiments, dividing files in chunks (also referred herein as chunking) is performed in accordance with one or more policy. While introducing a policy for chunking one may consider following facts:
-
- Large size chunks may amortize per chunk or per object overhead at cost of increased IO amplification if the whole object is not needed.
- The object store may perform deduplication of objects based on object ID. Boundaries of an object determine the object ID. Therefore, if two identical byte sequences are chunked differently, then these two byte sequences may not deduplicate against each other.
- Compression of data in a file may cause a wide variance between a logical file content and resulting objects if the file data is highly compressible. For example, for common patterns like “all zeros” the difference between the logical file content and resulting objects can be in hundredfold.
- In some embodiments, a “block-aligned chunking” policy is applied. The block-aligned policy causes a chunk structure of a file to reflect write patterns to the file. If the file is written with streaming sequential writes then the chunks end up being as large as possible, which minimizes the per-object overhead and metadata. If the file is written non-sequentially, the chunks reflect the IO sizes of the writes to minimize read-modify-write overhead of updating a portion of a chunk's object.
- The following two parameters are relevant for “block-aligned chunking” policy.
-
- maxdataobjsize parameter defined for a collection. The maxdataobjsize parameter constrains the maximum size of the object used to store data; and
- block size parameter defined per a file. A block size parameter controls how chunks are split and aggregated. The block size parameter includes a power of 2.
- In some embodiments, when data is written to a file, the data is accumulated in a memory waiting for a subsequent write. Adjacent writes are merged in an object up to the maxdataobjectsize.
- In some embodiments, existing data for non-sequential writes, if present, is removed and replaced with the new data. The block size logically subdivides the file into a sequence of block sized and aligned segments. A chunk can span multiple blocks, but not more than one chunk can be located within a block. Therefore, if a write is not block-aligned, any existing chunk is split at the block boundaries, and a new chunk is inserted. The new chunk contains a combination of the old and new data. If the overwrite is already block aligned, then the new data is simply written. If the overwrite aligns with an existing chunk then the chunk is simply replaced without affecting the surrounding chunks.
- In some embodiments, if it is determined that a write results in more than one chunk within a single block region, the chunks are merged to maintain the invariant of no more than one chunk per block.
- In some embodiments, an additional constraint is introduced that forces chunks to be always split at “maxdataobjsize” boundaries in the file. This constrain may allow two files which are substantially similar to deduplicate against each other by allowing chunking to resynchronize at “maxdataobjsize” boundaries.
- In some embodiments, a “fingerprint chunking” policy can be applied. The “fingerprint chunking” policy is intended to maximize the opportunities for deduplication by making the chunk structure a function of the file content rather than the write patterns. When “fingerprint chunking” is applied, the same byte sequence may result in the same chunk structure, and, therefore, the same object IDs. In some embodiments, Rabin fingerprint algorithm can be used to select content-dependent chunk division points.
- The “fingerprint chunking” policy may work best for streaming sequential writes. Non-sequential overwrites are especially expensive, as the new writes need to be merged with existing data, and the new data re-chunked according to the fingerprinting.
- In some embodiments, a compression can be applied while chunking the file data. In some embodiments, the user data (as large as possible) are fed into a compression algorithm at write time in order to create a specific output size. The compression can reduce not only the data size in bytes, but also in objects. Like “fingerprinting”, compression is a content-dependent transformation, and is easiest to apply to streaming writes. Using compression for non-sequential writes, and, especially, overwrites, is expensive because the read-modify-write cycle also requires decompression and recompression.
- In some embodiments, an “inlining” policy is applied. The inlining may allow small data to be directly embedded within a B-tree, rather than requiring small external objects in the object store. The inlining precludes deduplication for small files and results in a minimum amount of space savings from deduping small files.
- A B-tree collection possesses a global “maxinlinesize” setting. Maxinlinesize is defined at creation time of the B-tree collection to set the upper bound on the largest chunk that may be inlined. Typically, maxinlinesize is about 4k. As mentioned above, each file is also associated with an “inlinesize” which defines the chunk size that is inlined. The inlining is useful for small files or large sparse files with lots of small spans. The default “inlinesize” is 1k.
-
FIG. 9 shows a key 910 for a directory entry andvalue 920 for the directory entry. The directories are indexed by name using a “namehash” scheme similar to extended attributes “xattr.” In some embodiments, a region of the B-tree key space is reserved starting from a “base” for a certain number of slots. Slots are populated with entries as the entries are added. In some embodiments, it is assumed that the number of slots is very large with respect to expected number of entries, so the likelihood of single collisions is low. B-tree key 910 includes “data” filed field equal to 1, “Inum”field 524 representing a metadata identifier of the directory, and “offset”field 526 which includes a “diroffset” value. “Diroffset” is a hash function of a literal name of a directory entry (also referred to as a directory chunk). The value ofdirectory entry 522 includes a “BTC_DIRENT”field 922 set to 0, aname field 924 holding the name of the directory entry, a “direnttype”field 926 holding the type of entry, and “inum”field 928 holding a metadata identifier of the directory entry. - Each entry in slots has a
form 930 as shown inFIG. 9 . The “tag” field is used by directories for the chunk type field. An entry with a zero-length name (the name field is a single 0x00 byte) is atombstone 940. Atombstone 940 is a deleted entry, which is required so that a lookup algorithm can find later names with a colliding hash. - In some embodiments, when a new directory of extended attribute entry is inserted in the B-tree, the name of the entry is hashed with a hash function to find a corresponding key, which is a slot in reserved slots. If the located slot is occupied by another name due to a hash collision, the key is incremented until it finds an available slot, which can be either a completely unused slot or occupied by a tombstone.
- In some embodiments, when an entry is looked up in the B-tree, the name is hashed to determine the corresponding slot. If the determined slot is unused, the entry does not exist. If the determined slot is occupied and the name does not match, the key is incremented until the name is found or an unused slot is found. Any tombstones encountered are then ignored and skipped over.
- In some embodiments, when an entry is deleted, the lookup algorithm is used to find the slot for the name in order to form B-tree. If the slot exists and the next slot is unused, then the entry is deleted. If the next slot is occupied, then it is assumed that a hash collision occurs and the entry is replaced by a tombstone (a zero-length name, and no payload). If there is a series of tombstones followed by an empty entry, the tombstones can be deleted.
- In some embodiments, three types of hash function can be used:
- 1) crc32c. It is a resource inexpensive 32-bit hash. crc32c is neither strong nor collision resistant. Resulting names will have no apparent order and can be attacked to cause collisions, resulting in a denial-of-service attack. crc32c is used for “xattrs” since “xattr” is not generally writable from untrusted sources, and the number of “xattr” is not large.
- 2) SipHash. It is an efficient 64-bit keyed hash. Each directory has its own randomly generated key which is not exposed outside of the filesystem. Any attacker would need to know the key to be able to cause collisions at a higher rate than a random chance.
- 3) SipHash-order3. It is a variant of SipHash that truncates the hash to 4 bytes and prepends 3 bytes of the name to the start of the hash. With a “normal” mix of names in a directory, this may result in a directory that is nearly lexically sorted, with entries having a common prefix being mixed randomly. This may help to accommodate applications which tend to operate on directories in an alphabetical order.
- In some embodiments, a case-insensitive variant of a hash function can be used to support case-insensitive names of directories. For example, the hash function may be marked as “CICP” (that is “case-insensitive, case-preserving”). Using a case-insensitive hash function allows looking up names without regard to case of characters of the names while keeping original capitalization of characters of the names. Applying a case-insensitive hash function allows performing filesystem operations on large directories to be efficient. For example, without case-insensitive lookups, Samba package is needed to scan the whole directory to perform case-insensitive matching. Names of directories are required to be properly formed with UTF-8 coding.
- In some embodiments, when an Inode loses the last name (that is nlink=0 or the Inode is “deleted”), the Inode may still be in use. While the file is still open, it functions normally. This means that the bulk of the work related to deleting the file is deferred until the file is closed.
- If there is a crash occurring before the close happens, then the Inode state could be left as stray garbage, which is Inodes with no names. Such an Inode is effectively inaccessible. At the same time the Inode can be still present within the B-tree, so a garbage collection stores Inode data alive.
- In some embodiments, to solve the issue of having an Inode without a last name, an “orphan directory” is provided. The “orphan directory” is a nameless directory with Inum=1. When a file is unlinked and the file “nlink” count goes to zero, an entry is added to the orphan directory and the file's nlink remains 0. When mounted, it traverses the entries in the orphan directory and releases all associated data.
-
FIG. 10 is a process flow diagram showing amethod 1000 for organizing data, according to an example embodiment. Themethod 1000 can be implemented using a computer system. The example computer system is shown inFIG. 11 . - The
method 1000 may commence with providing an object store to store objects inblock 1010. The object represents fragments of the data and can be assigned addresses. Inblock 1020,method 1000 can proceed with associating a B-tree with the object store. The B-tree includes nodes. Each of the nodes includes keys. Each of the keys is associated with at least one object from the object store. Inblock 1030,method 1000 generates, based at least partially on objects from the object store, values for the each of the keys. Inblock 1040,method 1000 allows determining that a size of an object from the object store is less than a pre-determined size. Inblock 1050, if the result of the determination is positive, a value of the object is stored in a particular node of the B-tree, wherein the particular node includes a particular key associated with the object. Inblock 1060, if the result of the determination is negative, the address of the object is stored in the particular node of the B-tree. -
FIG. 11 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of acomputer system 1100, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In various exemplary embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a server, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as an Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. - The
example computer system 1100 includes a processor ormultiple processors 1102, ahard disk drive 1104, amain memory 1106, and astatic memory 1108, which communicate with each other via abus 1110. Thecomputer system 1100 may also include anetwork interface device 1112. Thehard disk drive 1104 may include a computer-readable medium 1120, which stores one or more sets ofinstructions 1122 embodying or utilized by any one or more of the methodologies or functions described herein. Theinstructions 1122 can also reside, completely or at least partially, within themain memory 1106 and/or within theprocessors 1102 during execution thereof by thecomputer system 1100. Themain memory 1106 and theprocessors 1102 also constitute machine-readable media. - While the computer-
readable medium 1120 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like. - The exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other compilers, assemblers, interpreters or other computer languages or platforms.
- Thus, systems and methods for methods for organizing data are disclosed. Although embodiments have been described with reference to specific example embodiments, it may be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/084,401 US20170060924A1 (en) | 2015-08-26 | 2016-03-29 | B-Tree Based Data Model for File Systems |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201562210385P | 2015-08-26 | 2015-08-26 | |
| US15/084,401 US20170060924A1 (en) | 2015-08-26 | 2016-03-29 | B-Tree Based Data Model for File Systems |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20170060924A1 true US20170060924A1 (en) | 2017-03-02 |
Family
ID=58095689
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/084,401 Abandoned US20170060924A1 (en) | 2015-08-26 | 2016-03-29 | B-Tree Based Data Model for File Systems |
| US15/084,399 Active 2037-07-24 US10474654B2 (en) | 2015-08-26 | 2016-03-29 | Structural data transfer over a network |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/084,399 Active 2037-07-24 US10474654B2 (en) | 2015-08-26 | 2016-03-29 | Structural data transfer over a network |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US20170060924A1 (en) |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180373727A1 (en) * | 2017-06-26 | 2018-12-27 | Vmware, Inc. | Management of b-tree leaf nodes with variable size values |
| US10491387B2 (en) * | 2016-11-15 | 2019-11-26 | International Business Machines Corporation | End-to-end encryption of a block storage device with protected key |
| US20190392062A1 (en) * | 2018-06-22 | 2019-12-26 | EMC IP Holding Company LLC | Multi-level data deduplication for elastic cloud storage devices |
| US10628407B1 (en) * | 2017-02-27 | 2020-04-21 | Amazon Technologies, Inc. | Efficient multithreaded data structure processing |
| US20200185057A1 (en) * | 2018-08-03 | 2020-06-11 | Catalog Technologies, Inc. | Systems and methods for storing and reading nucleic acid-based data with error protection |
| US11062084B2 (en) * | 2018-06-27 | 2021-07-13 | Microsoft Technology Licensing, Llc | Generating diverse smart replies using synonym hierarchy |
| US11188194B2 (en) | 2018-06-27 | 2021-11-30 | Microsoft Technology Licensing, Llc | Personalization and synonym hierarchy for smart replies |
| US20220188339A1 (en) * | 2020-12-16 | 2022-06-16 | Electronics And Telecommunications Research Institute | Network environment synchronization apparatus and method |
| US11556897B2 (en) | 2018-05-31 | 2023-01-17 | Microsoft Technology Licensing, Llc | Job-post budget recommendation based on performance |
| US11636220B2 (en) * | 2019-02-01 | 2023-04-25 | Intertrust Technologies Corporation | Data management systems and methods |
| US11658926B2 (en) | 2018-06-27 | 2023-05-23 | Microsoft Technology Licensing, Llc | Generating smart replies involving image files |
| US11671492B2 (en) * | 2020-09-10 | 2023-06-06 | EMC IP Holding Company LLC | Multipart upload for distributed file systems |
| US11763169B2 (en) | 2016-11-16 | 2023-09-19 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
| US20230334093A1 (en) * | 2022-04-13 | 2023-10-19 | Oracle International Corporation | Graph-organized file system |
| US20230334014A1 (en) * | 2022-04-13 | 2023-10-19 | Oracle International Corporation | Implementing graph search with in-structure metadata of a graph-organized file system |
| US12002547B2 (en) | 2019-05-09 | 2024-06-04 | Catalog Technologies, Inc. | Data structures and operations for searching, computing, and indexing in DNA-based data storage |
| US12006497B2 (en) | 2018-03-16 | 2024-06-11 | Catalog Technologies, Inc. | Chemical methods for nucleic acid-based data storage |
| US20240333794A1 (en) * | 2017-10-12 | 2024-10-03 | Stanley Bruce Kinsey | Cloud-based index and centralized cloud-based hub interface for cloud-stored media |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111400249A (en) * | 2020-03-06 | 2020-07-10 | 深圳市瑞驰信息技术有限公司 | File storage system and method easy for counting file number |
| US20230177069A1 (en) * | 2021-12-08 | 2023-06-08 | Vmware, Inc. | Efficient journal log record for copy-on-write b+ tree operation |
| CN120255819B (en) * | 2025-05-23 | 2025-09-30 | 安徽海马云科技股份有限公司 | File processing method and device |
Citations (40)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5418947A (en) * | 1992-12-23 | 1995-05-23 | At&T Corp. | Locating information in an unsorted database utilizing a B-tree |
| US5752243A (en) * | 1993-10-20 | 1998-05-12 | Microsoft Corporation | Computer method and storage structure for storing and accessing multidimensional data |
| US6052688A (en) * | 1995-01-26 | 2000-04-18 | Hans Verner Thorsen | Computer-implemented control of access to atomic data items |
| US6411957B1 (en) * | 1999-06-30 | 2002-06-25 | Arm Limited | System and method of organizing nodes within a tree structure |
| US20020161781A1 (en) * | 2001-04-27 | 2002-10-31 | Sun Microsystems, Inc. | Method, system, program, and computer readable medium for indexing object oriented objects in an object oriented database |
| US20020194184A1 (en) * | 2001-06-04 | 2002-12-19 | Baskins Douglas L. | System for and method of efficient, expandable storage and retrieval of small datasets |
| US6675173B1 (en) * | 1998-01-22 | 2004-01-06 | Ori Software Development Ltd. | Database apparatus |
| US20040133590A1 (en) * | 2002-08-08 | 2004-07-08 | Henderson Alex E. | Tree data structure with range-specifying keys and associated methods and apparatuses |
| US20050063083A1 (en) * | 2003-08-21 | 2005-03-24 | Dart Scott E. | Systems and methods for the implementation of a digital images schema for organizing units of information manageable by a hardware/software interface system |
| US20050257083A1 (en) * | 2004-05-13 | 2005-11-17 | Cousins Robert E | Transaction-based storage system and method that uses variable sized objects to store data |
| US7213040B1 (en) * | 2002-10-29 | 2007-05-01 | Novell, Inc. | Apparatus for policy based storage of file data and meta-data changes over time |
| US20070185902A1 (en) * | 2006-01-26 | 2007-08-09 | Seagate Technology Llc | Object-based data storage device |
| US20070294319A1 (en) * | 2006-06-08 | 2007-12-20 | Emc Corporation | Method and apparatus for processing a database replica |
| US20070299867A1 (en) * | 2006-06-23 | 2007-12-27 | Timothy John Baldwin | Method and System for Defining a Heirarchical Structure |
| US7346734B2 (en) * | 2005-05-25 | 2008-03-18 | Microsoft Corporation | Cluster storage collection based data management |
| US20090037500A1 (en) * | 2007-07-31 | 2009-02-05 | Kirshenbaum Evan R | Storing nodes representing respective chunks of files in a data store |
| US20090182837A1 (en) * | 2008-01-11 | 2009-07-16 | Rogers J Andrew | Spatial Sieve Tree |
| US20090234821A1 (en) * | 2004-09-15 | 2009-09-17 | International Business Machines Corporation | Systems and Methods for Efficient Data Searching, Storage and Reduction |
| US20100042703A1 (en) * | 2006-11-17 | 2010-02-18 | Koninklijke Philips Electronics N.V. | Method and apparatus for assigning addresses to nodes of a communication network tree structure |
| US20100146003A1 (en) * | 2008-12-10 | 2010-06-10 | Unisys Corporation | Method and system for building a B-tree |
| US7853591B1 (en) * | 2006-06-30 | 2010-12-14 | Juniper Networks, Inc. | Protection of database operations |
| US20110022573A1 (en) * | 2009-07-27 | 2011-01-27 | International Business Machines Corporation | Preventing transfer and duplication of redundantly referenced objects across nodes of an application system |
| US20110225429A1 (en) * | 2008-08-29 | 2011-09-15 | Charalampos Papamanthou | Cryptographic accumulators for authenticated hash tables |
| US20110314246A1 (en) * | 2010-06-16 | 2011-12-22 | Microsoft Corporation | Hierarchical allocation for file system storage device |
| US20120030256A1 (en) * | 2010-07-30 | 2012-02-02 | Wolfgang Pfeifer | Common Modeling of Data Access and Provisioning for Search, Query, Reporting and/or Analytics |
| US20120117067A1 (en) * | 2010-10-29 | 2012-05-10 | Navteq North America, Llc | Method and apparatus for providing a range ordered tree structure |
| US20120331249A1 (en) * | 2011-06-23 | 2012-12-27 | CohortFS, LLC | Dynamic data placement for distributed storage |
| US20130007007A1 (en) * | 2011-06-29 | 2013-01-03 | Nokia Corporation | Method and apparatus for providing a list-based interface to key-value stores |
| US20130086303A1 (en) * | 2011-09-30 | 2013-04-04 | Fusion-Io, Inc. | Apparatus, system, and method for a persistent object store |
| US20130346725A1 (en) * | 2012-06-20 | 2013-12-26 | Microsoft Corporation | Structuring storage based on latch-free b-trees |
| US20140164284A1 (en) * | 2012-12-12 | 2014-06-12 | Michael Hartel | Business Object Data Structures with Node Keys |
| US8868926B2 (en) * | 2012-04-06 | 2014-10-21 | Exablox Corporation | Cryptographic hash database |
| US20140317065A1 (en) * | 2013-04-23 | 2014-10-23 | Exablox Corporation | Reference counter integrity checking |
| US20150220578A1 (en) * | 2014-02-04 | 2015-08-06 | Exablox Corporation | Content based organization of file systems |
| US20150278271A1 (en) * | 2014-03-31 | 2015-10-01 | Sandisk Enterprise Ip Llc | Compaction of Information in Tiered Data Structure |
| US20150370502A1 (en) * | 2014-06-19 | 2015-12-24 | Cohesity, Inc. | Making more active use of a secondary storage system |
| US20160063008A1 (en) * | 2014-09-02 | 2016-03-03 | Netapp, Inc. | File system for efficient object fragment access |
| US20160210080A1 (en) * | 2015-01-20 | 2016-07-21 | Ultrata Llc | Object memory data flow instruction execution |
| US20160306817A1 (en) * | 2015-04-14 | 2016-10-20 | Et International, Inc. | Systems and methods for key-value stores |
| US9864773B1 (en) * | 2013-04-29 | 2018-01-09 | Seagate Technology Llc | Object-based commands with data integrity identifiers |
Family Cites Families (177)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| NL8303167A (en) | 1983-09-14 | 1985-04-01 | Philips Nv | ONLY SERVICE DEVICE WITH MICRO-COMPUTER PROTECTED FROM FAULTS. |
| US4660130A (en) | 1984-07-24 | 1987-04-21 | Texas Instruments Incorporated | Method for managing virtual memory to separate active and stable memory blocks |
| US4953080A (en) * | 1988-04-25 | 1990-08-28 | Hewlett-Packard Company | Object management facility for maintaining data in a computer system |
| CA2285089C (en) | 1991-11-12 | 2000-05-09 | Ibm Canada Limited-Ibm Canada Limitee | Logical mapping of data objects using data spaces |
| GB2265734A (en) | 1992-03-27 | 1993-10-06 | Ibm | Free memory cell management system |
| US5832529A (en) | 1996-10-11 | 1998-11-03 | Sun Microsystems, Inc. | Methods, apparatus, and product for distributed garbage collection |
| US6237009B1 (en) | 1996-10-11 | 2001-05-22 | Sun Microsystems, Inc. | Lease renewal service |
| US6167437A (en) | 1997-09-02 | 2000-12-26 | Silicon Graphics, Inc. | Method, system, and computer program product for page replication in a non-uniform memory access system |
| US6098079A (en) | 1998-04-02 | 2000-08-01 | Mitsubishi Electric Information Technology Center America, Inc. (Ita) | File version reconciliation using hash codes |
| US6154747A (en) | 1998-08-26 | 2000-11-28 | Hunt; Rolf G. | Hash table implementation of an object repository |
| GB9825102D0 (en) | 1998-11-16 | 1999-01-13 | Insignia Solutions Plc | Computer system |
| JP2000207266A (en) | 1999-01-13 | 2000-07-28 | Mitsubishi Electric Corp | Replica system and replica method |
| US6480950B1 (en) | 2000-01-24 | 2002-11-12 | Oracle International Corporation | Software paging system |
| US7266555B1 (en) | 2000-03-03 | 2007-09-04 | Intel Corporation | Methods and apparatus for accessing remote storage through use of a local device |
| AU2001263033A1 (en) | 2000-05-09 | 2001-11-20 | Sun Microsystems, Inc. | Method and apparatus for proximity discovery of services |
| US8219662B2 (en) | 2000-12-06 | 2012-07-10 | International Business Machines Corporation | Redirecting data generated by network devices |
| US7216136B2 (en) | 2000-12-11 | 2007-05-08 | International Business Machines Corporation | Concurrent collection of cyclic garbage in reference counting systems |
| SE0004736D0 (en) | 2000-12-20 | 2000-12-20 | Ericsson Telefon Ab L M | Mapping system and method |
| US20030028514A1 (en) | 2001-06-05 | 2003-02-06 | Lord Stephen Philip | Extended attribute caching in clustered filesystem |
| US7222187B2 (en) | 2001-07-31 | 2007-05-22 | Sun Microsystems, Inc. | Distributed trust mechanism for decentralized networks |
| US7134041B2 (en) | 2001-09-20 | 2006-11-07 | Evault, Inc. | Systems and methods for data backup over a network |
| US6973049B2 (en) | 2001-10-16 | 2005-12-06 | Corrigent Systems Ltd. | Auto-configuration of network interfaces in a bidirectional ring network |
| US7117505B2 (en) | 2001-11-29 | 2006-10-03 | Veritas Operating Corporation | Methods, systems, and apparatus to interface with storage objects |
| US7177980B2 (en) | 2001-12-18 | 2007-02-13 | Storage Technology Corporation | Cache storage system and method |
| US7392421B1 (en) | 2002-03-18 | 2008-06-24 | Symantec Operating Corporation | Framework for managing clustering and replication |
| US6968439B2 (en) | 2002-08-29 | 2005-11-22 | Micron Technology, Inc. | Single segment data object management |
| EP1540510B1 (en) | 2002-09-10 | 2009-08-12 | Exagrid Systems, Inc. | Method and apparatus for managing data integrity of backup and disaster recovery data |
| GB0301726D0 (en) | 2003-01-24 | 2003-02-26 | Ecebs Ltd | Improved smartcard |
| US7043494B1 (en) | 2003-01-28 | 2006-05-09 | Pmc-Sierra, Inc. | Fast, deterministic exact match look-ups in large tables |
| US7177886B2 (en) | 2003-02-07 | 2007-02-13 | International Business Machines Corporation | Apparatus and method for coordinating logical data replication with highly available data replication |
| JP4068473B2 (en) | 2003-02-19 | 2008-03-26 | 株式会社東芝 | Storage device, assignment range determination method and program |
| US7478096B2 (en) | 2003-02-26 | 2009-01-13 | Burnside Acquisition, Llc | History preservation in a computer storage system |
| US7403961B1 (en) | 2003-03-14 | 2008-07-22 | Xilinx, Inc. | Dangling reference detection and garbage collection during hardware simulation |
| US6988180B2 (en) | 2003-09-29 | 2006-01-17 | Microsoft Corporation | Method and apparatus for lock-free, non-blocking hash table |
| US7496690B2 (en) | 2003-10-09 | 2009-02-24 | Intel Corporation | Method, system, and program for managing memory for data transmission through a network |
| US7596704B2 (en) | 2003-10-10 | 2009-09-29 | Jing-Jang Hwang | Partition and recovery of a verifiable digital secret |
| US8103772B2 (en) | 2003-12-24 | 2012-01-24 | Sap Aktiengesellschaft | Cluster extension in distributed systems using tree method |
| EP1870814B1 (en) | 2006-06-19 | 2014-08-13 | Texas Instruments France | Method and apparatus for secure demand paging for processor devices |
| US8185663B2 (en) | 2004-05-11 | 2012-05-22 | Hewlett-Packard Development Company, L.P. | Mirroring storage interface |
| US7065611B2 (en) | 2004-06-29 | 2006-06-20 | Hitachi, Ltd. | Method for controlling storage policy according to volume activity |
| US7715396B2 (en) | 2004-08-19 | 2010-05-11 | Microsoft Corporation | Network routing |
| US7613703B2 (en) | 2004-09-30 | 2009-11-03 | Microsoft Corporation | Organizing resources into collections to facilitate more efficient and reliable resource access |
| CN1761219B (en) | 2004-10-12 | 2010-04-28 | 华为技术有限公司 | The Method of Realizing Topology Automatic Discovery in MPLS Ring Network |
| US20060083247A1 (en) | 2004-10-14 | 2006-04-20 | Sun Microsystems, Inc. | Prefix lookup using address-directed hash tables |
| US7694167B2 (en) * | 2004-10-22 | 2010-04-06 | Microsoft Corporation | Maintaining routing consistency within a rendezvous federation |
| US20120310892A1 (en) * | 2004-12-21 | 2012-12-06 | Dam Tru Q | System and method for virtual cluster file server |
| WO2006098723A1 (en) | 2005-03-10 | 2006-09-21 | Thomson Licensing | Hybrid mesh routing protocol |
| WO2006094366A1 (en) | 2005-03-11 | 2006-09-14 | Rocksoft Limited | Method for indexing in a reduced-redundancy storage system |
| US8356021B2 (en) | 2005-03-11 | 2013-01-15 | Ross Neil Williams | Method and apparatus for indexing in a reduced-redundancy storage system |
| US8442996B2 (en) | 2005-04-12 | 2013-05-14 | Enrico Maim | Methods for granting access to resources modifiable by users in a computer environment, and resources structured therefore |
| US7539836B1 (en) | 2005-04-18 | 2009-05-26 | Netapp, Inc. | Method and system for configuring a data storage object |
| US20070005746A1 (en) | 2005-06-30 | 2007-01-04 | Roe Bryan Y | Enhanced network discovery service |
| US8296550B2 (en) | 2005-08-29 | 2012-10-23 | The Invention Science Fund I, Llc | Hierarchical register file with operand capture ports |
| US20070130232A1 (en) * | 2005-11-22 | 2007-06-07 | Therrien David G | Method and apparatus for efficiently storing and managing historical versions and replicas of computer data files |
| US7653650B2 (en) * | 2005-12-13 | 2010-01-26 | International Business Machines Corporation | Apparatus, system, and method for synchronizing change histories in enterprise applications |
| US8554758B1 (en) | 2005-12-29 | 2013-10-08 | Amazon Technologies, Inc. | Method and apparatus for monitoring and maintaining health in a searchable data service |
| US7454592B1 (en) | 2006-02-16 | 2008-11-18 | Symantec Operating Corporation | Block-level and hash-based single-instance storage |
| CN101046755B (en) | 2006-03-28 | 2011-06-15 | 郭明南 | System and method of computer automatic memory management |
| US7801128B2 (en) | 2006-03-31 | 2010-09-21 | Amazon Technologies, Inc. | Managing communications between computing nodes |
| US20070233828A1 (en) | 2006-03-31 | 2007-10-04 | Jeremy Gilbert | Methods and systems for providing data storage and retrieval |
| US9195665B2 (en) | 2006-04-28 | 2015-11-24 | Hewlett-Packard Development Company, L.P. | Method and system for data retention |
| US20070271303A1 (en) | 2006-05-18 | 2007-11-22 | Manuel Emilio Menendez | Personal file version archival management and retrieval |
| US8255420B2 (en) | 2006-05-23 | 2012-08-28 | Noryan Holding Corporation | Distributed storage |
| US8099605B1 (en) | 2006-06-05 | 2012-01-17 | InventSec AB | Intelligent storage device for backup system |
| US8990270B2 (en) | 2006-08-03 | 2015-03-24 | Hewlett-Packard Development Company, L. P. | Protocol virtualization for a network file system |
| WO2008024971A1 (en) | 2006-08-25 | 2008-02-28 | University Of Florida Research Foundation Inc. | Recursively partioned static ip router tables |
| US7539783B2 (en) | 2006-09-22 | 2009-05-26 | Commvault Systems, Inc. | Systems and methods of media management, such as management of media to and from a media storage library, including removable media |
| WO2008043197A1 (en) | 2006-09-26 | 2008-04-17 | Intel Corporation | Dynamically changing a garbage collector in a managed runtime system |
| US7827218B1 (en) | 2006-11-18 | 2010-11-02 | X-Engines, Inc. | Deterministic lookup using hashed key in a multi-stride compressed trie structure |
| US8161353B2 (en) | 2007-12-06 | 2012-04-17 | Fusion-Io, Inc. | Apparatus, system, and method for validating that a correct data segment is read from a data storage device |
| US7987278B2 (en) | 2006-12-18 | 2011-07-26 | Ricoh Company, Ltd. | Web services device profile on a multi-service device: dynamic addition of services |
| US7840537B2 (en) | 2006-12-22 | 2010-11-23 | Commvault Systems, Inc. | System and method for storing redundant information |
| US8935206B2 (en) | 2007-01-31 | 2015-01-13 | Hewlett-Packard Development Company, L.P. | Snapshots in distributed storage systems |
| JP5020673B2 (en) | 2007-03-27 | 2012-09-05 | 株式会社日立製作所 | A computer system that prevents the storage of duplicate files |
| WO2008125336A2 (en) | 2007-04-15 | 2008-10-23 | Phoenix Contact Gmbh & Co. Kg | Method and control device for controlling an automation system |
| EP2015587B1 (en) | 2007-05-14 | 2012-01-25 | Apple Inc. | Method of storing a multimedia object in memory, associated data structure and terminal |
| US7725437B2 (en) | 2007-07-31 | 2010-05-25 | Hewlett-Packard Development Company, L.P. | Providing an index for a data store |
| JP4386932B2 (en) | 2007-08-17 | 2009-12-16 | 富士通株式会社 | Storage management program, storage management device, and storage management method |
| US8069317B2 (en) | 2007-10-12 | 2011-11-29 | International Business Machines Corporation | Providing and utilizing high performance block storage metadata |
| US9413825B2 (en) * | 2007-10-31 | 2016-08-09 | Emc Corporation | Managing file objects in a data storage system |
| US8447733B2 (en) | 2007-12-03 | 2013-05-21 | Apple Inc. | Techniques for versioning file systems |
| US8140599B1 (en) | 2007-12-07 | 2012-03-20 | Emc Corporation | Garbage collection for merged collections |
| US8244846B2 (en) | 2007-12-26 | 2012-08-14 | Symantec Corporation | Balanced consistent hashing for distributed resource management |
| US8171244B2 (en) | 2008-02-01 | 2012-05-01 | Imation Corp. | Methods for implementation of worm mode on a removable disk drive storage system |
| JP5091704B2 (en) | 2008-02-06 | 2012-12-05 | 株式会社日立製作所 | Storage configuration recovery method and storage management system |
| US8019882B2 (en) | 2008-06-27 | 2011-09-13 | Microsoft Corporation | Content identification for peer-to-peer content retrieval |
| JP5146174B2 (en) | 2008-07-28 | 2013-02-20 | 富士通株式会社 | Virtual machine monitor device and program, and virtual machine memory sharing management method |
| US8078646B2 (en) | 2008-08-08 | 2011-12-13 | Oracle International Corporation | Representing and manipulating RDF data in a relational database management system |
| WO2010061260A1 (en) | 2008-11-03 | 2010-06-03 | Elvin Slavik | Method, system, and product for managing spatial data in a database |
| US8353018B2 (en) | 2008-11-13 | 2013-01-08 | Yahoo! Inc. | Automatic local listing owner authentication system |
| JP5396848B2 (en) | 2008-12-16 | 2014-01-22 | 富士通株式会社 | Data processing program, server device, and data processing method |
| US8886714B2 (en) | 2011-08-08 | 2014-11-11 | Ctera Networks Ltd. | Remote access service for cloud-enabled network devices |
| US9344438B2 (en) | 2008-12-22 | 2016-05-17 | Qualcomm Incorporated | Secure node identifier assignment in a distributed hash table for peer-to-peer networks |
| US8132168B2 (en) | 2008-12-23 | 2012-03-06 | Citrix Systems, Inc. | Systems and methods for optimizing a process of determining a location of data identified by a virtual hard drive address |
| US8094500B2 (en) | 2009-01-05 | 2012-01-10 | Sandisk Technologies Inc. | Non-volatile memory and method with write cache partitioning |
| US8566362B2 (en) | 2009-01-23 | 2013-10-22 | Nasuni Corporation | Method and system for versioned file system using structured data representations |
| US8397051B2 (en) | 2009-02-23 | 2013-03-12 | Autonomy, Inc. | Hybrid hash tables |
| US8301654B2 (en) | 2009-02-24 | 2012-10-30 | Hitachi, Ltd. | Geographical distributed storage system based on hierarchical peer to peer architecture |
| US8135748B2 (en) | 2009-04-10 | 2012-03-13 | PHD Virtual Technologies | Virtual machine data replication |
| US8886206B2 (en) | 2009-05-01 | 2014-11-11 | Digimarc Corporation | Methods and systems for content processing |
| US8407190B2 (en) | 2009-06-30 | 2013-03-26 | Commvault Systems, Inc. | Performing data storage operations with a cloud environment, including containerized deduplication, data pruning, and data transfer |
| US8706980B2 (en) | 2009-07-30 | 2014-04-22 | Cleversafe, Inc. | Method and apparatus for slice partial rebuilding in a dispersed storage network |
| US8320282B2 (en) | 2009-07-30 | 2012-11-27 | Calix, Inc. | Automatic control node selection in ring networks |
| US8285925B1 (en) | 2009-07-31 | 2012-10-09 | Amazon Technologies, Inc. | Management of object mapping information corresponding to a distributed storage system |
| US8843762B2 (en) | 2009-09-04 | 2014-09-23 | Gradiant, Centro Tecnolóxico de Telecomunicacións de Galicia | Cryptographic system for performing secure iterative computations and signal processing directly on encrypted data in untrusted environments |
| BR112012007376B1 (en) | 2009-10-01 | 2023-02-14 | Telefonaktiebolaget Lm Ericsson (Publ) | SCOPE OF APPLICATION PLATFORM, USER DEVICE FOR A WEB APPLICATION, AND METHOD FOR ENABLING COMMUNICATION BETWEEN A SCOPE OF APPLICATION PLATFORM AND A WEB APPLICATION |
| JP2011095976A (en) | 2009-10-29 | 2011-05-12 | Nippon Telegr & Teleph Corp <Ntt> | Device, method and program for distributed data management |
| US8868510B2 (en) | 2009-12-03 | 2014-10-21 | Sybase, Inc. | Managing data storage as an in-memory database in a database management system |
| US8452739B2 (en) | 2010-03-16 | 2013-05-28 | Copiun, Inc. | Highly scalable and distributed data de-duplication |
| US8458131B2 (en) | 2010-02-26 | 2013-06-04 | Microsoft Corporation | Opportunistic asynchronous de-duplication in block level backups |
| US8458299B2 (en) | 2010-03-17 | 2013-06-04 | Hitachi, Ltd. | Metadata management method for NAS global namespace design |
| FR2959089B1 (en) | 2010-04-16 | 2012-08-03 | Inst Nat Rech Inf Automat | COMPUTER RESOURCE AND INFRASTRUCTURE MANAGEMENT TOOL AND NETWORKS |
| US20110264880A1 (en) | 2010-04-23 | 2011-10-27 | Tatu Ylonen Oy Ltd | Object copying with re-copying concurrently written objects |
| US9047218B2 (en) | 2010-04-26 | 2015-06-02 | Cleversafe, Inc. | Dispersed storage network slice name verification |
| US8504718B2 (en) | 2010-04-28 | 2013-08-06 | Futurewei Technologies, Inc. | System and method for a context layer switch |
| EP2569904B1 (en) * | 2010-05-10 | 2020-12-02 | Telefonaktiebolaget LM Ericsson (publ) | A ring node, an ethernet ring and methods for loop protection in an ethernet ring |
| US8984241B2 (en) | 2010-07-07 | 2015-03-17 | Nexenta Systems, Inc. | Heterogeneous redundant storage array |
| JP5982366B2 (en) | 2010-07-09 | 2016-08-31 | ステイト・ストリート・コーポレーション | Systems and methods for private cloud computing |
| WO2012016089A2 (en) | 2010-07-28 | 2012-02-02 | Fusion-Io, Inc. | Apparatus, system, and method for conditional and atomic storage operations |
| US20120030260A1 (en) | 2010-07-30 | 2012-02-02 | Maohua Lu | Scalable and parallel garbage collection method and system for incremental backups with data de-duplication |
| US8407438B1 (en) | 2010-08-16 | 2013-03-26 | Symantec Corporation | Systems and methods for managing virtual storage disk data |
| US8495093B2 (en) | 2010-08-18 | 2013-07-23 | International Business Machines Corporation | Multiway trie data structure that dynamically adjusts node sizes in a manner that reduces memory footprint and improves access speed |
| JP5514041B2 (en) | 2010-08-25 | 2014-06-04 | 日本電信電話株式会社 | Identifier assignment method and program |
| US8473778B2 (en) | 2010-09-08 | 2013-06-25 | Microsoft Corporation | Erasure coding immutable data |
| US8972366B2 (en) | 2010-09-29 | 2015-03-03 | Red Hat, Inc. | Cloud-based directory system based on hashed values of parent and child storage locations |
| WO2012051600A2 (en) | 2010-10-15 | 2012-04-19 | Kyquang Son | File system-aware solid-state storage management system |
| US8239584B1 (en) | 2010-12-16 | 2012-08-07 | Emc Corporation | Techniques for automated storage management |
| US8484408B2 (en) | 2010-12-29 | 2013-07-09 | International Business Machines Corporation | Storage system cache with flash memory in a raid configuration that commits writes as full stripes |
| WO2012092553A1 (en) | 2010-12-31 | 2012-07-05 | Desktone, Inc. | Providing virtual desktops using resources accessed on public computer networks |
| US9311135B2 (en) | 2011-01-18 | 2016-04-12 | Scality, S.A. | Method for generating universal objects identifiers in distributed multi-purpose storage systems |
| US9251087B2 (en) | 2011-02-11 | 2016-02-02 | SanDisk Technologies, Inc. | Apparatus, system, and method for virtual memory management |
| US8510267B2 (en) | 2011-03-08 | 2013-08-13 | Rackspace Us, Inc. | Synchronization of structured information repositories |
| US8966191B2 (en) | 2011-03-18 | 2015-02-24 | Fusion-Io, Inc. | Logical interface for contextual storage |
| US8341312B2 (en) | 2011-04-29 | 2012-12-25 | International Business Machines Corporation | System, method and program product to manage transfer of data to resolve overload of a storage system |
| US8572290B1 (en) | 2011-05-02 | 2013-10-29 | Board Of Supervisors Of Louisiana State University And Agricultural And Mechanical College | System and architecture for robust management of resources in a wide-area network |
| US8433681B2 (en) | 2011-05-12 | 2013-04-30 | Dell Products L.P. | System and method for managing replication in an object storage system |
| US8346810B2 (en) | 2011-05-13 | 2013-01-01 | Simplivity Corporation | Reference count propagation |
| US8949182B2 (en) | 2011-06-17 | 2015-02-03 | International Business Machines Corporation | Continuous and asynchronous replication of a consistent dataset |
| US8843998B2 (en) | 2011-06-27 | 2014-09-23 | Cliqr Technologies, Inc. | Apparatus, systems and methods for secure and selective access to services in hybrid public-private infrastructures |
| US8880837B2 (en) | 2011-08-24 | 2014-11-04 | International Business Machines Corporation | Preemptively allocating extents to a data set |
| WO2013039535A1 (en) | 2011-09-12 | 2013-03-21 | Microsoft Corporation | Querying and repairing data |
| US9355119B2 (en) * | 2011-09-22 | 2016-05-31 | Netapp, Inc. | Allocation of absent data within filesystems |
| EP2575379A1 (en) | 2011-09-29 | 2013-04-03 | Alcatel Lucent | Apparatuses and computer program products for discovering and accessing local services via WiFi hotspots |
| US9372827B2 (en) | 2011-09-30 | 2016-06-21 | Commvault Systems, Inc. | Migration of an existing computing system to new hardware |
| US9455996B2 (en) | 2011-10-03 | 2016-09-27 | New York University | Generating progressively a perfect hash data structure, such as a multi-dimensional perfect hash data structure, and using the generated data structure for high-speed string matching |
| US9035568B2 (en) | 2011-12-05 | 2015-05-19 | Qualcomm Incorporated | Telehealth wireless communication hub device and service platform system |
| US8762627B2 (en) | 2011-12-21 | 2014-06-24 | Sandisk Technologies Inc. | Memory logical defragmentation during garbage collection |
| AU2011384790A1 (en) | 2011-12-30 | 2014-07-24 | Schneider Electric It Corporation | Systems and methods of remote communication |
| US8782344B2 (en) | 2012-01-12 | 2014-07-15 | Fusion-Io, Inc. | Systems and methods for managing cache admission |
| US9489827B2 (en) | 2012-03-12 | 2016-11-08 | Cisco Technology, Inc. | System and method for distributing content in a video surveillance network |
| US9213581B2 (en) | 2012-03-14 | 2015-12-15 | Sap Se | Method and system for a cloud frame architecture |
| US9043567B1 (en) | 2012-03-28 | 2015-05-26 | Netapp, Inc. | Methods and systems for replicating an expandable storage volume |
| US20130263151A1 (en) | 2012-04-03 | 2013-10-03 | Microsoft Corporation | Consistent Hashing Table for Workload Distribution |
| US9628438B2 (en) | 2012-04-06 | 2017-04-18 | Exablox | Consistent ring namespaces facilitating data storage and organization in network infrastructures |
| TWI631871B (en) | 2012-04-27 | 2018-08-01 | 內數位專利控股公司 | Method and apparatus for supporting proximity discovery procedures |
| US20150081964A1 (en) | 2012-05-01 | 2015-03-19 | Hitachi, Ltd. | Management apparatus and management method of computing system |
| US9323589B2 (en) | 2012-05-15 | 2016-04-26 | Oracle International Corporation | Self registration of event—consumers/producers and auto discovery |
| US20130346839A1 (en) | 2012-06-20 | 2013-12-26 | Francis Dinha | Private tunnel network portal |
| US20130346591A1 (en) | 2012-06-21 | 2013-12-26 | Alcatel-Lucent Usa Inc. | Clientless Cloud Computing |
| US9003477B2 (en) | 2012-06-27 | 2015-04-07 | Microsoft Technology Licensing, Llc | Model for managing hosted resources using logical scopes |
| US9348652B2 (en) | 2012-07-02 | 2016-05-24 | Vmware, Inc. | Multi-tenant-cloud-aggregation and application-support system |
| US8966343B2 (en) | 2012-08-21 | 2015-02-24 | Western Digital Technologies, Inc. | Solid-state drive retention monitor using reference blocks |
| US9002792B2 (en) | 2012-11-19 | 2015-04-07 | Compellent Technologies | Confirming data consistency in a data storage environment |
| US9715507B2 (en) | 2013-03-28 | 2017-07-25 | Ctera Networks, Ltd. | Techniques for reconciling metadata and data in a cloud storage system without service interruption |
| US9786197B2 (en) | 2013-05-09 | 2017-10-10 | Rockwell Automation Technologies, Inc. | Using cloud-based data to facilitate enhancing performance in connection with an industrial automation system |
| JP2016527741A (en) | 2013-05-21 | 2016-09-08 | エグザブロックス・コーポレーション | Automatic data ring discovery and configuration |
| US9389651B2 (en) | 2013-05-22 | 2016-07-12 | Exablox Corporation | Modular electronics chassis |
| WO2014201270A1 (en) | 2013-06-12 | 2014-12-18 | Exablox Corporation | Hybrid garbage collection |
| WO2014205286A1 (en) | 2013-06-19 | 2014-12-24 | Exablox Corporation | Data scrubbing in cluster-based storage systems |
| US8991950B2 (en) | 2013-07-10 | 2015-03-31 | Exablox Corporation | Modular electronics chassis |
| US9934242B2 (en) | 2013-07-10 | 2018-04-03 | Exablox Corporation | Replication of data between mirrored data sites |
| US20150066524A1 (en) | 2013-09-05 | 2015-03-05 | Dorsata, Inc. | Methods and systems for the implementation of web based collaborative clinical pathways |
| WO2015054664A1 (en) | 2013-10-11 | 2015-04-16 | Exablox Corporation | Hierarchical data archiving |
| US10248556B2 (en) | 2013-10-16 | 2019-04-02 | Exablox Corporation | Forward-only paged data storage management where virtual cursor moves in only one direction from header of a session to data field of the session |
| US9985829B2 (en) | 2013-12-12 | 2018-05-29 | Exablox Corporation | Management and provisioning of cloud connected devices |
| US9774582B2 (en) | 2014-02-03 | 2017-09-26 | Exablox Corporation | Private cloud connected device cluster architecture |
| US11023453B2 (en) * | 2015-01-29 | 2021-06-01 | Hewlett Packard Enterprise Development Lp | Hash index |
-
2016
- 2016-03-29 US US15/084,401 patent/US20170060924A1/en not_active Abandoned
- 2016-03-29 US US15/084,399 patent/US10474654B2/en active Active
Patent Citations (40)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5418947A (en) * | 1992-12-23 | 1995-05-23 | At&T Corp. | Locating information in an unsorted database utilizing a B-tree |
| US5752243A (en) * | 1993-10-20 | 1998-05-12 | Microsoft Corporation | Computer method and storage structure for storing and accessing multidimensional data |
| US6052688A (en) * | 1995-01-26 | 2000-04-18 | Hans Verner Thorsen | Computer-implemented control of access to atomic data items |
| US6675173B1 (en) * | 1998-01-22 | 2004-01-06 | Ori Software Development Ltd. | Database apparatus |
| US6411957B1 (en) * | 1999-06-30 | 2002-06-25 | Arm Limited | System and method of organizing nodes within a tree structure |
| US20020161781A1 (en) * | 2001-04-27 | 2002-10-31 | Sun Microsystems, Inc. | Method, system, program, and computer readable medium for indexing object oriented objects in an object oriented database |
| US20020194184A1 (en) * | 2001-06-04 | 2002-12-19 | Baskins Douglas L. | System for and method of efficient, expandable storage and retrieval of small datasets |
| US20040133590A1 (en) * | 2002-08-08 | 2004-07-08 | Henderson Alex E. | Tree data structure with range-specifying keys and associated methods and apparatuses |
| US7213040B1 (en) * | 2002-10-29 | 2007-05-01 | Novell, Inc. | Apparatus for policy based storage of file data and meta-data changes over time |
| US20050063083A1 (en) * | 2003-08-21 | 2005-03-24 | Dart Scott E. | Systems and methods for the implementation of a digital images schema for organizing units of information manageable by a hardware/software interface system |
| US20050257083A1 (en) * | 2004-05-13 | 2005-11-17 | Cousins Robert E | Transaction-based storage system and method that uses variable sized objects to store data |
| US20090234821A1 (en) * | 2004-09-15 | 2009-09-17 | International Business Machines Corporation | Systems and Methods for Efficient Data Searching, Storage and Reduction |
| US7346734B2 (en) * | 2005-05-25 | 2008-03-18 | Microsoft Corporation | Cluster storage collection based data management |
| US20070185902A1 (en) * | 2006-01-26 | 2007-08-09 | Seagate Technology Llc | Object-based data storage device |
| US20070294319A1 (en) * | 2006-06-08 | 2007-12-20 | Emc Corporation | Method and apparatus for processing a database replica |
| US20070299867A1 (en) * | 2006-06-23 | 2007-12-27 | Timothy John Baldwin | Method and System for Defining a Heirarchical Structure |
| US7853591B1 (en) * | 2006-06-30 | 2010-12-14 | Juniper Networks, Inc. | Protection of database operations |
| US20100042703A1 (en) * | 2006-11-17 | 2010-02-18 | Koninklijke Philips Electronics N.V. | Method and apparatus for assigning addresses to nodes of a communication network tree structure |
| US20090037500A1 (en) * | 2007-07-31 | 2009-02-05 | Kirshenbaum Evan R | Storing nodes representing respective chunks of files in a data store |
| US20090182837A1 (en) * | 2008-01-11 | 2009-07-16 | Rogers J Andrew | Spatial Sieve Tree |
| US20110225429A1 (en) * | 2008-08-29 | 2011-09-15 | Charalampos Papamanthou | Cryptographic accumulators for authenticated hash tables |
| US20100146003A1 (en) * | 2008-12-10 | 2010-06-10 | Unisys Corporation | Method and system for building a B-tree |
| US20110022573A1 (en) * | 2009-07-27 | 2011-01-27 | International Business Machines Corporation | Preventing transfer and duplication of redundantly referenced objects across nodes of an application system |
| US20110314246A1 (en) * | 2010-06-16 | 2011-12-22 | Microsoft Corporation | Hierarchical allocation for file system storage device |
| US20120030256A1 (en) * | 2010-07-30 | 2012-02-02 | Wolfgang Pfeifer | Common Modeling of Data Access and Provisioning for Search, Query, Reporting and/or Analytics |
| US20120117067A1 (en) * | 2010-10-29 | 2012-05-10 | Navteq North America, Llc | Method and apparatus for providing a range ordered tree structure |
| US20120331249A1 (en) * | 2011-06-23 | 2012-12-27 | CohortFS, LLC | Dynamic data placement for distributed storage |
| US20130007007A1 (en) * | 2011-06-29 | 2013-01-03 | Nokia Corporation | Method and apparatus for providing a list-based interface to key-value stores |
| US20130086303A1 (en) * | 2011-09-30 | 2013-04-04 | Fusion-Io, Inc. | Apparatus, system, and method for a persistent object store |
| US8868926B2 (en) * | 2012-04-06 | 2014-10-21 | Exablox Corporation | Cryptographic hash database |
| US20130346725A1 (en) * | 2012-06-20 | 2013-12-26 | Microsoft Corporation | Structuring storage based on latch-free b-trees |
| US20140164284A1 (en) * | 2012-12-12 | 2014-06-12 | Michael Hartel | Business Object Data Structures with Node Keys |
| US20140317065A1 (en) * | 2013-04-23 | 2014-10-23 | Exablox Corporation | Reference counter integrity checking |
| US9864773B1 (en) * | 2013-04-29 | 2018-01-09 | Seagate Technology Llc | Object-based commands with data integrity identifiers |
| US20150220578A1 (en) * | 2014-02-04 | 2015-08-06 | Exablox Corporation | Content based organization of file systems |
| US20150278271A1 (en) * | 2014-03-31 | 2015-10-01 | Sandisk Enterprise Ip Llc | Compaction of Information in Tiered Data Structure |
| US20150370502A1 (en) * | 2014-06-19 | 2015-12-24 | Cohesity, Inc. | Making more active use of a secondary storage system |
| US20160063008A1 (en) * | 2014-09-02 | 2016-03-03 | Netapp, Inc. | File system for efficient object fragment access |
| US20160210080A1 (en) * | 2015-01-20 | 2016-07-21 | Ultrata Llc | Object memory data flow instruction execution |
| US20160306817A1 (en) * | 2015-04-14 | 2016-10-20 | Et International, Inc. | Systems and methods for key-value stores |
Cited By (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10491387B2 (en) * | 2016-11-15 | 2019-11-26 | International Business Machines Corporation | End-to-end encryption of a block storage device with protected key |
| US11763169B2 (en) | 2016-11-16 | 2023-09-19 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
| US12001962B2 (en) | 2016-11-16 | 2024-06-04 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
| US12236354B2 (en) | 2016-11-16 | 2025-02-25 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
| US10628407B1 (en) * | 2017-02-27 | 2020-04-21 | Amazon Technologies, Inc. | Efficient multithreaded data structure processing |
| US20180373727A1 (en) * | 2017-06-26 | 2018-12-27 | Vmware, Inc. | Management of b-tree leaf nodes with variable size values |
| US10698865B2 (en) * | 2017-06-26 | 2020-06-30 | Vmware, Inc. | Management of B-tree leaf nodes with variable size values |
| US12323491B2 (en) * | 2017-10-12 | 2025-06-03 | Stanley Bruce Kinsey | Cloud-based index and centralized cloud-based hub interface for cloud-stored media |
| US20240333794A1 (en) * | 2017-10-12 | 2024-10-03 | Stanley Bruce Kinsey | Cloud-based index and centralized cloud-based hub interface for cloud-stored media |
| US12006497B2 (en) | 2018-03-16 | 2024-06-11 | Catalog Technologies, Inc. | Chemical methods for nucleic acid-based data storage |
| US11556897B2 (en) | 2018-05-31 | 2023-01-17 | Microsoft Technology Licensing, Llc | Job-post budget recommendation based on performance |
| US10929385B2 (en) * | 2018-06-22 | 2021-02-23 | EMC IP Holding Company LLC | Multi-level data deduplication for elastic cloud storage devices |
| US20190392062A1 (en) * | 2018-06-22 | 2019-12-26 | EMC IP Holding Company LLC | Multi-level data deduplication for elastic cloud storage devices |
| US11658926B2 (en) | 2018-06-27 | 2023-05-23 | Microsoft Technology Licensing, Llc | Generating smart replies involving image files |
| US11188194B2 (en) | 2018-06-27 | 2021-11-30 | Microsoft Technology Licensing, Llc | Personalization and synonym hierarchy for smart replies |
| US11062084B2 (en) * | 2018-06-27 | 2021-07-13 | Microsoft Technology Licensing, Llc | Generating diverse smart replies using synonym hierarchy |
| US12437841B2 (en) * | 2018-08-03 | 2025-10-07 | Catalog Technologies, Inc. | Systems and methods for storing and reading nucleic acid-based data with error protection |
| US20200185057A1 (en) * | 2018-08-03 | 2020-06-11 | Catalog Technologies, Inc. | Systems and methods for storing and reading nucleic acid-based data with error protection |
| US11636220B2 (en) * | 2019-02-01 | 2023-04-25 | Intertrust Technologies Corporation | Data management systems and methods |
| US12002547B2 (en) | 2019-05-09 | 2024-06-04 | Catalog Technologies, Inc. | Data structures and operations for searching, computing, and indexing in DNA-based data storage |
| US11671492B2 (en) * | 2020-09-10 | 2023-06-06 | EMC IP Holding Company LLC | Multipart upload for distributed file systems |
| US20220188339A1 (en) * | 2020-12-16 | 2022-06-16 | Electronics And Telecommunications Research Institute | Network environment synchronization apparatus and method |
| US12026181B2 (en) * | 2020-12-16 | 2024-07-02 | Electronics And Telecommunications Research Institute | Network environment synchronization apparatus and method |
| US12204494B2 (en) * | 2022-04-13 | 2025-01-21 | Oracle International Corporation | Implementing graph search with in-structure metadata of a graph-organized file system |
| US12001481B2 (en) * | 2022-04-13 | 2024-06-04 | Oracle International Corporation | Graph-organized file system |
| US20230334014A1 (en) * | 2022-04-13 | 2023-10-19 | Oracle International Corporation | Implementing graph search with in-structure metadata of a graph-organized file system |
| US20230334093A1 (en) * | 2022-04-13 | 2023-10-19 | Oracle International Corporation | Graph-organized file system |
Also Published As
| Publication number | Publication date |
|---|---|
| US20170063990A1 (en) | 2017-03-02 |
| US10474654B2 (en) | 2019-11-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20170060924A1 (en) | B-Tree Based Data Model for File Systems | |
| US12164386B2 (en) | Large content file optimization | |
| US9542409B2 (en) | Deduplicated file system | |
| US9830324B2 (en) | Content based organization of file systems | |
| US10365974B2 (en) | Acquisition of object names for portion index objects | |
| US20150006495A1 (en) | Methods and apparatuses to optimize updates in a file system based on birth time | |
| US20100161608A1 (en) | Methods and apparatus for content-aware data de-duplication | |
| GB2520361A (en) | Method and system for a safe archiving of data | |
| US11221777B2 (en) | Storage system indexed using persistent metadata structures | |
| US20170242882A1 (en) | An overlay stream of objects | |
| US12164799B2 (en) | Reducing memory usage in storing metadata | |
| CN118861100A (en) | Large object reading method, storage medium and device for database | |
| US12014070B2 (en) | Method, device, and computer program product for storage management | |
| US11016933B2 (en) | Handling weakening of hash functions by using epochs | |
| Savić et al. | The analysis and implication of data deduplication in digital forensics | |
| US20180367313A1 (en) | Secure memory and hierarchical structure and system therefor | |
| US20260037389A1 (en) | Opportunistic setting of relationships when cloning files across namespaces in a deduplication filesystem | |
| CN118861064A (en) | Database large object deletion method, storage medium and device | |
| CN118885485A (en) | Storage method and related products of temporary large objects in database | |
| CN118861066A (en) | Large object updating method, storage medium and device for database | |
| CN118861068A (en) | Methods for modifying large objects in databases and related products | |
| CN118861063A (en) | Database large object rewriting method, storage medium and device | |
| CN118861040A (en) | Large object deduplication processing method, storage medium and device for database | |
| CN118861042A (en) | Management method and related products for data table with large object column | |
| CN118861039A (en) | Large object storage method, storage medium and device for database |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: EXABLOX CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FITZHARDINGE, JEREMY;REEL/FRAME:038635/0450 Effective date: 20160516 |
|
| AS | Assignment |
Owner name: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT, CALI Free format text: SECURITY INTEREST;ASSIGNORS:EXABLOX CORPORATION;STORAGECRAFT INTERMEDIATE HOLDINGS, INC.;STORAGECRAFT ACQUISITION CORPORATION;AND OTHERS;REEL/FRAME:041748/0849 Effective date: 20170324 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
| AS | Assignment |
Owner name: STORAGECRAFT TECHNOLOGY CORPORATION, MINNESOTA Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852 Effective date: 20210316 Owner name: EXABLOX CORPORATION, MINNESOTA Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852 Effective date: 20210316 Owner name: STORAGECRAFT ACQUISITION CORPORATION, MINNESOTA Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852 Effective date: 20210316 Owner name: STORAGECRAFT INTERMEDIATE HOLDINGS, INC., MINNESOTA Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852 Effective date: 20210316 |