[go: up one dir, main page]

US20170060924A1 - B-Tree Based Data Model for File Systems - Google Patents

B-Tree Based Data Model for File Systems Download PDF

Info

Publication number
US20170060924A1
US20170060924A1 US15/084,401 US201615084401A US2017060924A1 US 20170060924 A1 US20170060924 A1 US 20170060924A1 US 201615084401 A US201615084401 A US 201615084401A US 2017060924 A1 US2017060924 A1 US 2017060924A1
Authority
US
United States
Prior art keywords
file
objects
data
tree
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/084,401
Inventor
Jeremy Fitzhardinge
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Exablox Corp
Original Assignee
Exablox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Exablox Corp filed Critical Exablox Corp
Priority to US15/084,401 priority Critical patent/US20170060924A1/en
Assigned to EXABLOX CORPORATION reassignment EXABLOX CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FITZHARDINGE, JEREMY
Publication of US20170060924A1 publication Critical patent/US20170060924A1/en
Assigned to SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT reassignment SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EXABLOX CORPORATION, STORAGECRAFT ACQUISITION CORPORATION, STORAGECRAFT INTERMEDIATE HOLDINGS, INC., Storagecraft Technology Corporation
Assigned to EXABLOX CORPORATION, STORAGECRAFT INTERMEDIATE HOLDINGS, INC., Storagecraft Technology Corporation, STORAGECRAFT ACQUISITION CORPORATION reassignment EXABLOX CORPORATION TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT Assignors: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT
Abandoned legal-status Critical Current

Links

Images

Classifications

    • G06F17/30327
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/185Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/289Object oriented databases
    • G06F17/30115
    • G06F17/30607
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • This disclosure relates generally to data processing and, more specifically, to methods and systems for providing a B-tree based data model for organizing file systems.
  • a file system may be viewed as a directed graph of objects, wherein nodes and leaves of the directed graph represent files and directories. Directories may further include subdirectories and files.
  • each file or directory is assigned attributes that regulate user permissions for viewing, editing, and creation of files and directories. Attributes of directories and files are kept in the directed graph as objects. Large files in a file system may be split into a chain of blocks. Therefore, during a lifetime of a file system, the directed graph may be developed to include very long paths or chains of referring objects to form the root of the directed graph to leaves.
  • any modifications to the file system such as a modification or creation of a new file or directory, require traveling the path through nodes to find a place for adding, creating, or modifying a new node. Having long and unbalanced paths from the root to leaves may require excessive input/output operations that may be time consuming and lead to unnecessarily redundant storage consumption.
  • An example embodiment can provide a B-tree based data model constructed on top of an object store.
  • the object store may include immutable, content-addressable, distributed objects representing fragments of user content.
  • a method includes providing an object store to store objects.
  • the objects can represent fragments of the data.
  • Each object can be associated with an address.
  • the method can further include associating a B-tree with the object store.
  • the B-tree can include nodes, with each of the nodes including keys.
  • Each key can be associated with at least one object from the object store.
  • the method allows generating, based at least partially on objects from the object store, values for each key.
  • the address of an object in the object store is based on content of the fragment of the data. In some embodiments, the method determines whether a size of an object from the object store is less than a pre-determined size. If the result of the determination is positive, a value of the object is stored in a particular node of the B-tree. The particular node includes a particular key associated with the object. If the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
  • each object includes one of a metadata object or a data object.
  • Metadata objects store at least a number of references to further objects from the data objects and an identification number associated with a file or a directory.
  • the data object represents a continuous fragment of the file or a directory entry.
  • the key associated with the metadata object includes at least three fields.
  • the first field includes an indication of the metadata object.
  • the second field includes the identification number.
  • the third field includes a metadata index representing a distinct type of the metadata object.
  • the type of the metadata object includes one of: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory.
  • the key associated with the fragment of the file includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes an offset of the continuous fragment of the file from the beginning of the file.
  • the key associated with the directory entry includes at least three fields.
  • the first field includes an indication of the data object.
  • the second field includes the identification number.
  • the third field includes a hash calculated based on a literal name of the directory entry.
  • calculating the hash includes applying a hash function to the literal name to obtain a preliminary hash.
  • the hash function can include, for example, crc32c, SipHash, or SipHash-order3.
  • the preliminary hash can be shifted by a pre-determined base number to obtain the hash. If the hash matches an existing hash, then the hash is incremented by 1.
  • the steps of the method for organizing data are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
  • FIG. 1 illustrates an example system for organizing computer data.
  • FIG. 2 illustrates an example of two collections sharing mutual data objects.
  • FIG. 3 illustrates an example B-tree.
  • FIG. 4 displays a table containing types of variables.
  • FIG. 5A illustrates attributes describing a superblock entry in a data model.
  • FIG. 5B illustrates attributes of a key used in a B-tree based data model.
  • FIG. 6 displays attributes of a metadata, attributes of a file entry, and additional attributes of a directory entry in a B-tree based data model.
  • FIG. 7 illustrates internal and external forms for extended attributes used in a B-tree data model.
  • FIG. 8 illustrates a key, internal form, and external form for values for a file chunk.
  • FIG. 9 illustrates a key and forms for values for a directory entry.
  • FIG. 10 is a process flow diagram showing a method for organizing data.
  • FIG. 11 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • the technology described herein allows organizing data using a B-tree built on top of an object store.
  • the object store includes immutable content-addressable distributed objects.
  • the objects may represent fragments of data (for example, fragments of files and directories and metadata including attributes of files and directories).
  • the method for organizing data includes providing an object store to store objects.
  • the objects represent fragments of the data.
  • Each object is associated with an address.
  • the method can further include associating a B-tree with the object store.
  • the B-tree includes nodes, with each node including keys.
  • Each key can be associated with at least one object from the object store.
  • the method allows generating, based at least partially on objects from the object store, values for each of the keys.
  • FIG. 1 illustrates an example system 100 for organizing data, according to various example embodiments.
  • the illustrated system 100 includes applications 110 , a directed graph 120 , and an object store 130 .
  • applications may include operational systems, user applications, graphical user interface, and other software for creating, managing, and presenting the data.
  • the object store 130 includes key-value entries (objects).
  • Each of the key-value entries includes an identifier (a key) and a payload (also referred to as a value or an object content) representing a chunk of the data (for example, a chunk of user content.)
  • objects are designated as either “data” or “metadata.” Payloads of “data” objects include only uninterpreted bytes. Payloads of “metadata” objects have an internal structure and may refer to other objects. In some embodiments, the payloads of “metadata” objects include keys of the objects to which the “metadata” objects refers.
  • objects in the object store 130 are organized by a graph structure 120 (also referred as a directed graph or a collection).
  • the graph structure 120 is a directed graph in which each node is an immutable content-addressable object.
  • the collection includes a specific object designated as a root object.
  • objects are at least partially content-addressable. This implies that some portions of the identifiers of objects are functions of object contents.
  • a key for the object representing a chunk of data can be calculated using a smart hash (Smash) which is a function of bytes in the chunk.
  • Smash smart hash
  • each version of data corresponds to a new graph structure. Two graph structures representing two different snapshots can share mutual objects.
  • FIG. 2 is a block diagram showing two collections 100 and 200 as having mutual objects.
  • Collection 100 includes root objects 110 and objects 122 , 124 , 126 , 128 , 130 , and 230 .
  • the collection 200 includes root object 210 , objects 222 , 224 , 226 , 228 , 230 , 126 , 130 , and 232 .
  • the collections 200 and 100 share at least objects 126 , 130 , and 230 .
  • the collection 200 is an increment of graph 100 (shown in FIG. 2 ).
  • Each of the root objects is the root of a specific unique immutable graph.
  • a new graph is constructed that includes the required changes. If the new graph is an incremental change from a previous graph, then the new graph is likely to share a large number of its objects with the previous graph. The only change required by the new objects is a new root.
  • the objects in object store 130 include entities such as identifier node (“mode”), extended attributes (“xattr”), a symbolic link (“symlink”), and chunks of files and directories.
  • each file or directory is represented by a chain of at least three objects: “mode” object, “xattr” object, and a “data” object.
  • An “mode” object refers the “xattr” object and the “data” object.
  • An “mode” object includes an identification number (“inum”) associated with the file or directory, number of links, file type and permission, creation and modification of file or directory, and other attributes.
  • An “xattr” object includes extended file attributes depending on a type of a file system, such as information concerning an author of a text, a checksum, encoding, and so forth.
  • An extended attribute may include a name of the attribute and a value associated with the attribute's name.
  • the “data” object is an object containing a chunk of data (for example, a chunk of a file).
  • an address of an object in object store 130 is a Smash, which is a hash function of the content of the object.
  • the graph structure 120 includes a B-tree (also referred to as a B-tree model).
  • a B-tree model for referencing metadata objects and data objects may allow accessing files by “mode” number in order to support persistent file handles.
  • the B-tree may provide tools for managing efficient large directories having up to 1,000,000 entries, hard links, and a large amount of small files.
  • the B-tree model allows flexible data layouts for non-streaming workloads.
  • B-tree collections tend to have fewer, larger objects with many smaller file system entities such as “modes,” “symlinks,” “xattrs,” and small amounts of data packed together.
  • the downside includes an increased conceptual complexity resulting in a structure that is unlike the user-visible file system namespace, and in some cases, results in increased input/output (I/O) resource consumption due to additional read-modify-write operations to repack updates within an otherwise unchanged object.
  • I/O input/output
  • All leaves in a B-tree have the same distance from the root with wide fanouts.
  • the left to root path length is bounded to a small number, typically no more than 4-5 levels, even for bigger file systems. This property results in efficient access patterns to look data up and minimal I/O amplification when writing.
  • a “libebtree” library is used for the B-tree implementation.
  • keys in B-tree nodes have a fixed size.
  • values in B-tree nodes have variable sizes.
  • much of the overall structure of a file system is placed into the design of the key space.
  • values associated with keys are then used to store small items efficiently packed into larger storage objects while larger objects are directly stored in object store 130 .
  • Packing small items such as “mode” attributes, extended attributes, “symlinks,” and small file content into B-tree nodes can significantly reduce the number of objects required.
  • a small file with a “xattr” can be represented by 3 objects (“mode,” “xattr,” and “data”) in a collection described in FIG. 2 .
  • the small file can be packed into objects along with many other “inodes/xattrs/data” objects, consuming less than one object for the entire file.
  • data storage is not limited by a fixed block size.
  • Block sizes can be chosen to match the workload.
  • Non-sequential I/O patterns can be matched to small blocks in order to avoid excess I/O amplification, whereas larger streaming writes are better served by large blocks.
  • the B-tree model design allows using file blocks from 1 kilobyte to 1 Megabyte.
  • the example B-tree 300 includes at least a root 302 , nodes 304 , 306 , and 308 , and a leaf 310 .
  • the root 302 includes at least a key 308 (k 1 ).
  • the node 304 includes at least keys 312 (k 2 ), 314 (k 3 ), and 316 (k 4 ).
  • the node 306 includes at least keys 318 (k 5 ) and 320 (k 6 ).
  • the node 308 includes at least keys 322 (k 7 ) and 324 (k 8 ).
  • the example leaf 310 includes at least key 326 (k 9 ).
  • the nodes 304 and 306 are referred by the root 302 and refer further nodes (for example, node 308 ).
  • the key k 1 of root 302 divides all further nodes into two subtrees.
  • the first subtree starts with the node 304 .
  • Keys k 2 , k 3 , and k 4 are all less than key k 1 .
  • the second subtree starts with the node 306 as a subtree. Keys k 5 and k 6 are greater than key k 1 .
  • Node 304 may further divide B-tree 300 and refer at least four further subtrees, while node 306 may further divide B-tree into at least three further subtree.
  • the example node 304 refers at least node 308 .
  • Keys k 7 and k 8 in node 308 are greater than key k 3 and less than key k 4 of the node 304 .
  • the leaf 312 does not refer any further nodes.
  • Each node and leaf of B-tree may also include value fields (also referred to as payloads) for each key in the node or the leaf to store data inline, i.e. inside the B-tree.
  • FIG. 4 is a table 400 of types of variables that can be used for generating keys of B-trees, according to various example embodiments.
  • FIG. 5A is a table showing superblock 510 containing attributes associated with an example file system modeled via a B-tree.
  • the superblock 510 can be implemented as a single, specific object in object store 130 .
  • the superblock 510 includes a magic number that identifies the superblock in the object store 130 , unique identifier for the file system, update sequence of the file system, key of the root object in a B-tree, maximum object size in a B-tree, maximum object size of data in object store 130 , and maximum size of data that can be included in a B-tree.
  • the identifier of file system and the update sequence can be used as references for debugging, as forensics to identify root objects for a given collection, and to determine the chronology of the root objects.
  • FIG. 5B shows fields of a key 520 located at a node in a B-tree, according to various example embodiments.
  • the key 520 includes three fields: a “data” field 522 , an “inum” field 524 , and an “index/offset” field 526 .
  • the “data” field identifies type of object: data or metadata.
  • “data” field 522 is equal to 0 for metadata and 1 for data.
  • “Inum” field 524 indicates which file, directory, or metadata entry the key 520 presents.
  • “Index/offset” field 526 relates to a particular chunk of data or an index of metadata.
  • the field 526 is an index if key 520 associated with metadata (“data” field 522 is 0) and offset of a particular file chunk if key 520 relates to a file (“data” field 522 is 1).
  • each distinct type of metadata has its own metadata index.
  • a list of metadata indexes is shown in table 610 in FIG. 6 .
  • the metadata index 612 in table 610 is the value placed in “index/offset” field 526 of key 520 if the key 520 relates to a metadata (that is when “data” field 522 is 1).
  • metadata index 610 (and corresponding “index/offset” field 526 ) is 1 if the metadata are POSIX file attributes and 2 if metadata is a “symlink” target path.
  • the target data for a “symlink” is the literal bytes of the target.
  • the literal bytes of the target are stored in a value associated with a key 510 inside a node of a B-tree. If key 520 relates to a metadata containing an extended attribute of a file, index 612 is a hash of a name and value of the extended attribute.
  • FIG. 6 also illustrates table 620 containing attributes 614 of a file or a directory and table 630 containing additional attributes 616 for a directory.
  • the file attributes 614 include regular POSIX attributes.
  • Additional attributes 616 for directories include a hash function (“hashnf”) and a key for the hash function (“hashkey”).
  • “hashnf” is 0 if the hash function is cyclic redundancy check “crh32c,” 1 if the hash function is SipHash, and 2 if the hash function is SipHash with 3 characters of ordering.
  • root directory has a constant “Inode” number of 1000.
  • the root directory has no parent.
  • the root includes an “. . . ” entry pointing back to the root directory. All new directories, including the root directory, start with attribute “nlink” equal to 2.
  • “xattrs” are indexed by name using a “namehash” scheme which is further described in FIG. 9 and “crc32c” as the hash function.
  • the value of a key associated with “xattrs” can be in one of the two forms: internal or external.
  • FIG. 7 illustrates internal form 710 and external form 720 for a value of a key associated with an extended attribute.
  • Both forms include an “inline” field 712 and second field 714 .
  • the “inline” field 712 is 1 and the second field 714 includes a literal name (and a value) of an extended attribute.
  • the “inline” field 712 is equal to 0 and the second field 714 includes a smart hash of an object in an object store, wherein the object includes the extended attribute. If the size of the extended attribute is larger than “maxinlinesize” (described in FIG. 5A ), then the external form 720 is stored in a B-tree.
  • the value for the extended attribute is stored inline in a B-tree using form 710 .
  • the size of the value for extended attribute may be as large as “maxdataobjsize” (described in FIG. 5A ) so that the extended attribute values are never become more than one object in size.
  • FIG. 8 shows a key 810 for a chunk of file, an internal form 820 for value of the file chunk, and an external form 830 for value for the file chunk, according to some example embodiments.
  • Key 810 includes a “data” field 522 set to 1, an “Inum” field 524 , and “offset” field 526 .
  • “Inum” field is the metadata identifier of the file.
  • “Offset” field is the offset of the file chunk from the beginning of file.
  • Chunk also referred to as a fragment or a continuous fragment
  • Two chunks of the same file are not allowed to overlap. Gaps between chunks and gaps between the end of the last chunk and file size are implicitly zero-filled.
  • value of the file chunk can be stored inline in a B-tree and externally in object store 130 .
  • Internal form 820 is used when the chunk is stored directly in a B-tree.
  • Internal form 820 including a “BTC_DATA_INLINE” field 822 is set to 1 and a second field 824 containing literal byte data represents the chunk.
  • the file chunk is stored inline in a node of a B-tree if the size of the chunk is less than “maxinlinesize” (described in FIG. 5A ).
  • External form 830 is used when the file chunk is stored externally in object store 130 .
  • External form 830 includes a “BTC_DATA_EXTERN” field 832 set to 2, “size” field 834 representing size of the file chunk, and Smash of an object in object store 130 , and the object corresponding to the file chunk.
  • the external form is stored in a node of a B-tree if size of the chunk is larger than “maxinlinesize”(described in FIG. 5A ).
  • dividing files in chunks (also referred herein as chunking) is performed in accordance with one or more policy. While introducing a policy for chunking one may consider following facts:
  • a “block-aligned chunking” policy is applied.
  • the block-aligned policy causes a chunk structure of a file to reflect write patterns to the file. If the file is written with streaming sequential writes then the chunks end up being as large as possible, which minimizes the per-object overhead and metadata. If the file is written non-sequentially, the chunks reflect the IO sizes of the writes to minimize read-modify-write overhead of updating a portion of a chunk's object.
  • the data when data is written to a file, the data is accumulated in a memory waiting for a subsequent write. Adjacent writes are merged in an object up to the maxdataobjectsize.
  • existing data for non-sequential writes is removed and replaced with the new data.
  • the block size logically subdivides the file into a sequence of block sized and aligned segments.
  • a chunk can span multiple blocks, but not more than one chunk can be located within a block. Therefore, if a write is not block-aligned, any existing chunk is split at the block boundaries, and a new chunk is inserted.
  • the new chunk contains a combination of the old and new data. If the overwrite is already block aligned, then the new data is simply written. If the overwrite aligns with an existing chunk then the chunk is simply replaced without affecting the surrounding chunks.
  • the chunks are merged to maintain the invariant of no more than one chunk per block.
  • an additional constraint is introduced that forces chunks to be always split at “maxdataobjsize” boundaries in the file. This constrain may allow two files which are substantially similar to deduplicate against each other by allowing chunking to resynchronize at “maxdataobjsize” boundaries.
  • a “fingerprint chunking” policy can be applied.
  • the “fingerprint chunking” policy is intended to maximize the opportunities for deduplication by making the chunk structure a function of the file content rather than the write patterns.
  • the same byte sequence may result in the same chunk structure, and, therefore, the same object IDs.
  • Rabin fingerprint algorithm can be used to select content-dependent chunk division points.
  • the “fingerprint chunking” policy may work best for streaming sequential writes. Non-sequential overwrites are especially expensive, as the new writes need to be merged with existing data, and the new data re-chunked according to the fingerprinting.
  • a compression can be applied while chunking the file data.
  • the user data (as large as possible) are fed into a compression algorithm at write time in order to create a specific output size.
  • the compression can reduce not only the data size in bytes, but also in objects.
  • compression is a content-dependent transformation, and is easiest to apply to streaming writes. Using compression for non-sequential writes, and, especially, overwrites, is expensive because the read-modify-write cycle also requires decompression and recompression.
  • an “inlining” policy is applied.
  • the inlining may allow small data to be directly embedded within a B-tree, rather than requiring small external objects in the object store.
  • the inlining precludes deduplication for small files and results in a minimum amount of space savings from deduping small files.
  • a B-tree collection possesses a global “maxinlinesize” setting. Maxinlinesize is defined at creation time of the B-tree collection to set the upper bound on the largest chunk that may be inlined. Typically, maxinlinesize is about 4k. As mentioned above, each file is also associated with an “inlinesize” which defines the chunk size that is inlined. The inlining is useful for small files or large sparse files with lots of small spans. The default “inlinesize” is 1k.
  • FIG. 9 shows a key 910 for a directory entry and value 920 for the directory entry.
  • the directories are indexed by name using a “namehash” scheme similar to extended attributes “xattr.”
  • a region of the B-tree key space is reserved starting from a “base” for a certain number of slots. Slots are populated with entries as the entries are added. In some embodiments, it is assumed that the number of slots is very large with respect to expected number of entries, so the likelihood of single collisions is low.
  • B-tree key 910 includes “data” filed field equal to 1, “Inum” field 524 representing a metadata identifier of the directory, and “offset” field 526 which includes a “diroffset” value.
  • “Diroffset” is a hash function of a literal name of a directory entry (also referred to as a directory chunk).
  • the value of directory entry 522 includes a “BTC_DIRENT” field 922 set to 0, a name field 924 holding the name of the directory entry, a “direnttype” field 926 holding the type of entry, and “inum” field 928 holding a metadata identifier of the directory entry.
  • Each entry in slots has a form 930 as shown in FIG. 9 .
  • the “tag” field is used by directories for the chunk type field.
  • An entry with a zero-length name (the name field is a single 0x00 byte) is a tombstone 940 .
  • a tombstone 940 is a deleted entry, which is required so that a lookup algorithm can find later names with a colliding hash.
  • the name of the entry is hashed with a hash function to find a corresponding key, which is a slot in reserved slots. If the located slot is occupied by another name due to a hash collision, the key is incremented until it finds an available slot, which can be either a completely unused slot or occupied by a tombstone.
  • the name is hashed to determine the corresponding slot. If the determined slot is unused, the entry does not exist. If the determined slot is occupied and the name does not match, the key is incremented until the name is found or an unused slot is found. Any tombstones encountered are then ignored and skipped over.
  • the lookup algorithm is used to find the slot for the name in order to form B-tree. If the slot exists and the next slot is unused, then the entry is deleted. If the next slot is occupied, then it is assumed that a hash collision occurs and the entry is replaced by a tombstone (a zero-length name, and no payload). If there is a series of tombstones followed by an empty entry, the tombstones can be deleted.
  • three types of hash function can be used:
  • crc32c It is a resource inexpensive 32-bit hash. crc32c is neither strong nor collision resistant. Resulting names will have no apparent order and can be attacked to cause collisions, resulting in a denial-of-service attack. crc32c is used for “xattrs” since “xattr” is not generally writable from untrusted sources, and the number of “xattr” is not large.
  • SipHash It is an efficient 64-bit keyed hash. Each directory has its own randomly generated key which is not exposed outside of the filesystem. Any attacker would need to know the key to be able to cause collisions at a higher rate than a random chance.
  • SipHash-order3 SipHash-order3. It is a variant of SipHash that truncates the hash to 4 bytes and prepends 3 bytes of the name to the start of the hash. With a “normal” mix of names in a directory, this may result in a directory that is nearly lexically sorted, with entries having a common prefix being mixed randomly. This may help to accommodate applications which tend to operate on directories in an alphabetical order.
  • a case-insensitive variant of a hash function can be used to support case-insensitive names of directories.
  • the hash function may be marked as “CICP” (that is “case-insensitive, case-preserving”).
  • CICP case-insensitive, case-preserving
  • CICP case-insensitive, case-preserving
  • CICP case-insensitive, case-preserving
  • CICP case-insensitive, case-preserving
  • Applying a case-insensitive hash function allows performing filesystem operations on large directories to be efficient. For example, without case-insensitive lookups, Samba package is needed to scan the whole directory to perform case-insensitive matching. Names of directories are required to be properly formed with UTF-8 coding.
  • an “orphan directory” is provided.
  • FIG. 10 is a process flow diagram showing a method 1000 for organizing data, according to an example embodiment.
  • the method 1000 can be implemented using a computer system.
  • the example computer system is shown in FIG. 11 .
  • the method 1000 may commence with providing an object store to store objects in block 1010 .
  • the object represents fragments of the data and can be assigned addresses.
  • method 1000 can proceed with associating a B-tree with the object store.
  • the B-tree includes nodes.
  • Each of the nodes includes keys.
  • Each of the keys is associated with at least one object from the object store.
  • method 1000 generates, based at least partially on objects from the object store, values for the each of the keys.
  • method 1000 allows determining that a size of an object from the object store is less than a pre-determined size.
  • a value of the object is stored in a particular node of the B-tree, wherein the particular node includes a particular key associated with the object.
  • the address of the object is stored in the particular node of the B-tree.
  • FIG. 11 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1100 , within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • the machine operates as a standalone device or can be connected (e.g., networked) to other machines.
  • the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine can be a server, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as an Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • a portable music player e.g., a portable hard drive audio device, such as an Moving Picture Experts Group Audio Layer 3 (MP3) player
  • MP3 Moving Picture Experts Group Audio Layer 3
  • a web appliance e.g., a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • MP3 Moving Picture Experts Group Audio Layer 3
  • the example computer system 1100 includes a processor or multiple processors 1102 , a hard disk drive 1104 , a main memory 1106 , and a static memory 1108 , which communicate with each other via a bus 1110 .
  • the computer system 1100 may also include a network interface device 1112 .
  • the hard disk drive 1104 may include a computer-readable medium 1120 , which stores one or more sets of instructions 1122 embodying or utilized by any one or more of the methodologies or functions described herein.
  • the instructions 1122 can also reside, completely or at least partially, within the main memory 1106 and/or within the processors 1102 during execution thereof by the computer system 1100 .
  • the main memory 1106 and the processors 1102 also constitute machine-readable media.
  • While the computer-readable medium 1120 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions.
  • computer-readable medium shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like.
  • the exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
  • the computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
  • computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other compilers, assemblers, interpreters or other computer languages or platforms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Methods and systems for organizing data are provided. An example method includes providing an object store to store objects. Each of the objects represents fragments of the data is are associated with an address. The method further allows associating a B-tree with the object store. The B-tree includes nodes, wherein each of the nodes includes keys, and wherein each of the keys is associated with at least one object from the object store. Values for each of the keys are generated based at least partially on objects from the object store. If the size of an object from the object store is less than a pre-determined size, a value of the object is stored in a particular node of the B-tree, with the particular nodes including a particular key associated with the object. Otherwise, the method includes storing the address associated with the object in the particular node of the B-tree.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • The present application claims benefit of U.S. provisional application No. 62/210,385 filed on Aug. 26, 2015. The disclosure of the aforementioned application is incorporated herein by reference for all purposes.
  • TECHNICAL FIELD
  • This disclosure relates generally to data processing and, more specifically, to methods and systems for providing a B-tree based data model for organizing file systems.
  • BACKGROUND
  • The approaches described in this section could be pursued but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
  • In computer systems, user data are typically organized as file systems. In general, a file system may be viewed as a directed graph of objects, wherein nodes and leaves of the directed graph represent files and directories. Directories may further include subdirectories and files. In a multi-user computer system, each file or directory is assigned attributes that regulate user permissions for viewing, editing, and creation of files and directories. Attributes of directories and files are kept in the directed graph as objects. Large files in a file system may be split into a chain of blocks. Therefore, during a lifetime of a file system, the directed graph may be developed to include very long paths or chains of referring objects to form the root of the directed graph to leaves. Therefore, any modifications to the file system, such as a modification or creation of a new file or directory, require traveling the path through nodes to find a place for adding, creating, or modifying a new node. Having long and unbalanced paths from the root to leaves may require excessive input/output operations that may be time consuming and lead to unnecessarily redundant storage consumption.
  • SUMMARY
  • This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
  • The technology described herein includes methods for organizing data. An example embodiment can provide a B-tree based data model constructed on top of an object store. The object store may include immutable, content-addressable, distributed objects representing fragments of user content.
  • According to the example embodiment, a method includes providing an object store to store objects. The objects can represent fragments of the data. Each object can be associated with an address. The method can further include associating a B-tree with the object store. The B-tree can include nodes, with each of the nodes including keys. Each key can be associated with at least one object from the object store. The method allows generating, based at least partially on objects from the object store, values for each key.
  • In some embodiments, the address of an object in the object store is based on content of the fragment of the data. In some embodiments, the method determines whether a size of an object from the object store is less than a pre-determined size. If the result of the determination is positive, a value of the object is stored in a particular node of the B-tree. The particular node includes a particular key associated with the object. If the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
  • In some embodiments, each object includes one of a metadata object or a data object. Metadata objects store at least a number of references to further objects from the data objects and an identification number associated with a file or a directory. The data object represents a continuous fragment of the file or a directory entry.
  • In some embodiments, the key associated with the metadata object includes at least three fields. The first field includes an indication of the metadata object. The second field includes the identification number. The third field includes a metadata index representing a distinct type of the metadata object.
  • In some embodiments, the type of the metadata object includes one of: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory. In some embodiments, the key associated with the fragment of the file includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes an offset of the continuous fragment of the file from the beginning of the file.
  • In some embodiments, the key associated with the directory entry includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes a hash calculated based on a literal name of the directory entry.
  • In some embodiments, calculating the hash includes applying a hash function to the literal name to obtain a preliminary hash. The hash function can include, for example, crc32c, SipHash, or SipHash-order3. The preliminary hash can be shifted by a pre-determined base number to obtain the hash. If the hash matches an existing hash, then the hash is incremented by 1.
  • According to another example embodiment of the present disclosure, the steps of the method for organizing data are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
  • FIG. 1 illustrates an example system for organizing computer data.
  • FIG. 2 illustrates an example of two collections sharing mutual data objects.
  • FIG. 3 illustrates an example B-tree.
  • FIG. 4 displays a table containing types of variables.
  • FIG. 5A illustrates attributes describing a superblock entry in a data model.
  • FIG. 5B illustrates attributes of a key used in a B-tree based data model.
  • FIG. 6 displays attributes of a metadata, attributes of a file entry, and additional attributes of a directory entry in a B-tree based data model.
  • FIG. 7 illustrates internal and external forms for extended attributes used in a B-tree data model.
  • FIG. 8 illustrates a key, internal form, and external form for values for a file chunk.
  • FIG. 9 illustrates a key and forms for values for a directory entry.
  • FIG. 10 is a process flow diagram showing a method for organizing data.
  • FIG. 11 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • DETAILED DESCRIPTION
  • The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings illustrate exemplary embodiments. These exemplary embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
  • The technology described herein allows organizing data using a B-tree built on top of an object store. In some embodiments, the object store includes immutable content-addressable distributed objects. The objects may represent fragments of data (for example, fragments of files and directories and metadata including attributes of files and directories).
  • According to an example embodiment, the method for organizing data includes providing an object store to store objects. The objects represent fragments of the data. Each object is associated with an address. The method can further include associating a B-tree with the object store. The B-tree includes nodes, with each node including keys. Each key can be associated with at least one object from the object store. The method allows generating, based at least partially on objects from the object store, values for each of the keys.
  • FIG. 1 illustrates an example system 100 for organizing data, according to various example embodiments. The illustrated system 100 includes applications 110, a directed graph 120, and an object store 130. In various embodiments, applications may include operational systems, user applications, graphical user interface, and other software for creating, managing, and presenting the data.
  • In various embodiments, the object store 130 includes key-value entries (objects). Each of the key-value entries includes an identifier (a key) and a payload (also referred to as a value or an object content) representing a chunk of the data (for example, a chunk of user content.) In various embodiments, objects are designated as either “data” or “metadata.” Payloads of “data” objects include only uninterpreted bytes. Payloads of “metadata” objects have an internal structure and may refer to other objects. In some embodiments, the payloads of “metadata” objects include keys of the objects to which the “metadata” objects refers.
  • In some embodiments, objects in the object store 130 are organized by a graph structure 120 (also referred as a directed graph or a collection). The graph structure 120 is a directed graph in which each node is an immutable content-addressable object. The collection includes a specific object designated as a root object.
  • In various embodiments, objects are at least partially content-addressable. This implies that some portions of the identifiers of objects are functions of object contents. A key for the object representing a chunk of data can be calculated using a smart hash (Smash) which is a function of bytes in the chunk. In some embodiments, each version of data (a snapshot) corresponds to a new graph structure. Two graph structures representing two different snapshots can share mutual objects.
  • FIG. 2 is a block diagram showing two collections 100 and 200 as having mutual objects. Collection 100 includes root objects 110 and objects 122, 124, 126, 128, 130, and 230. The collection 200 includes root object 210, objects 222, 224, 226, 228, 230, 126, 130, and 232. The collections 200 and 100 share at least objects 126, 130, and 230. In the example of FIG. 2, the collection 200 is an increment of graph 100 (shown in FIG. 2). Each of the root objects is the root of a specific unique immutable graph. In order to get the effect of a modification or mutation, a new graph is constructed that includes the required changes. If the new graph is an incremental change from a previous graph, then the new graph is likely to share a large number of its objects with the previous graph. The only change required by the new objects is a new root.
  • In some embodiments, the objects in object store 130 include entities such as identifier node (“mode”), extended attributes (“xattr”), a symbolic link (“symlink”), and chunks of files and directories. In a collection (for example, collection 100 or 200), each file or directory is represented by a chain of at least three objects: “mode” object, “xattr” object, and a “data” object. An “mode” object refers the “xattr” object and the “data” object. An “mode” object includes an identification number (“inum”) associated with the file or directory, number of links, file type and permission, creation and modification of file or directory, and other attributes. An “xattr” object includes extended file attributes depending on a type of a file system, such as information concerning an author of a text, a checksum, encoding, and so forth. An extended attribute may include a name of the attribute and a value associated with the attribute's name. The “data” object is an object containing a chunk of data (for example, a chunk of a file). In some embodiments, an address of an object in object store 130 is a Smash, which is a hash function of the content of the object.
  • In some embodiments, the graph structure 120 includes a B-tree (also referred to as a B-tree model). In various embodiments, using a B-tree model for referencing metadata objects and data objects may allow accessing files by “mode” number in order to support persistent file handles. The B-tree may provide tools for managing efficient large directories having up to 1,000,000 entries, hard links, and a large amount of small files. The B-tree model allows flexible data layouts for non-streaming workloads.
  • The result is that B-tree collections tend to have fewer, larger objects with many smaller file system entities such as “modes,” “symlinks,” “xattrs,” and small amounts of data packed together. The downside includes an increased conceptual complexity resulting in a structure that is unlike the user-visible file system namespace, and in some cases, results in increased input/output (I/O) resource consumption due to additional read-modify-write operations to repack updates within an otherwise unchanged object.
  • All leaves in a B-tree have the same distance from the root with wide fanouts. The left to root path length is bounded to a small number, typically no more than 4-5 levels, even for bigger file systems. This property results in efficient access patterns to look data up and minimal I/O amplification when writing.
  • In some embodiments, a “libebtree” library is used for the B-tree implementation. In some embodiments, keys in B-tree nodes have a fixed size. In other embodiments, values in B-tree nodes have variable sizes. In some embodiments, much of the overall structure of a file system is placed into the design of the key space. In various embodiments, values associated with keys are then used to store small items efficiently packed into larger storage objects while larger objects are directly stored in object store 130.
  • Packing small items such as “mode” attributes, extended attributes, “symlinks,” and small file content into B-tree nodes can significantly reduce the number of objects required. For example, a small file with a “xattr” can be represented by 3 objects (“mode,” “xattr,” and “data”) in a collection described in FIG. 2. In the B-tree model, the small file can be packed into objects along with many other “inodes/xattrs/data” objects, consuming less than one object for the entire file.
  • In some embodiments, data storage is not limited by a fixed block size. Block sizes can be chosen to match the workload. Non-sequential I/O patterns can be matched to small blocks in order to avoid excess I/O amplification, whereas larger streaming writes are better served by large blocks. The B-tree model design allows using file blocks from 1 kilobyte to 1 Megabyte.
  • An example B-tree 300 is shown in FIG. 3. The example B-tree 300 includes at least a root 302, nodes 304, 306, and 308, and a leaf 310. The root 302 includes at least a key 308 (k1). The node 304 includes at least keys 312 (k2), 314 (k3), and 316 (k4). The node 306 includes at least keys 318 (k5) and 320 (k6). The node 308 includes at least keys 322 (k7) and 324 (k8). The example leaf 310 includes at least key 326 (k9). The nodes 304 and 306 are referred by the root 302 and refer further nodes (for example, node 308). The key k1 of root 302 divides all further nodes into two subtrees. The first subtree starts with the node 304. Keys k2, k3, and k4 are all less than key k1. The second subtree starts with the node 306 as a subtree. Keys k5 and k6 are greater than key k1. Node 304 may further divide B-tree 300 and refer at least four further subtrees, while node 306 may further divide B-tree into at least three further subtree. The example node 304 refers at least node 308. Keys k7 and k8 in node 308 are greater than key k3 and less than key k4 of the node 304. The leaf 312 does not refer any further nodes. Each node and leaf of B-tree may also include value fields (also referred to as payloads) for each key in the node or the leaf to store data inline, i.e. inside the B-tree.
  • FIG. 4 is a table 400 of types of variables that can be used for generating keys of B-trees, according to various example embodiments. FIG. 5A is a table showing superblock 510 containing attributes associated with an example file system modeled via a B-tree. In some embodiments, the superblock 510 can be implemented as a single, specific object in object store 130. The superblock 510 includes a magic number that identifies the superblock in the object store 130, unique identifier for the file system, update sequence of the file system, key of the root object in a B-tree, maximum object size in a B-tree, maximum object size of data in object store 130, and maximum size of data that can be included in a B-tree. In some embodiments, the identifier of file system and the update sequence can be used as references for debugging, as forensics to identify root objects for a given collection, and to determine the chronology of the root objects.
  • FIG. 5B shows fields of a key 520 located at a node in a B-tree, according to various example embodiments. The key 520 includes three fields: a “data” field 522, an “inum” field 524, and an “index/offset” field 526. The “data” field identifies type of object: data or metadata. In some embodiments, “data” field 522 is equal to 0 for metadata and 1 for data. “Inum” field 524 indicates which file, directory, or metadata entry the key 520 presents. “Index/offset” field 526 relates to a particular chunk of data or an index of metadata. The field 526 is an index if key 520 associated with metadata (“data” field 522 is 0) and offset of a particular file chunk if key 520 relates to a file (“data” field 522 is 1).
  • In some embodiments, each distinct type of metadata has its own metadata index. A list of metadata indexes is shown in table 610 in FIG. 6. The metadata index 612 in table 610 is the value placed in “index/offset” field 526 of key 520 if the key 520 relates to a metadata (that is when “data” field 522 is 1). In some embodiments, metadata index 610 (and corresponding “index/offset” field 526) is 1 if the metadata are POSIX file attributes and 2 if metadata is a “symlink” target path. The target data for a “symlink” is the literal bytes of the target. In some embodiments, the literal bytes of the target are stored in a value associated with a key 510 inside a node of a B-tree. If key 520 relates to a metadata containing an extended attribute of a file, index 612 is a hash of a name and value of the extended attribute.
  • FIG. 6 also illustrates table 620 containing attributes 614 of a file or a directory and table 630 containing additional attributes 616 for a directory. In various embodiments, the file attributes 614 include regular POSIX attributes. Additional attributes 616 for directories include a hash function (“hashnf”) and a key for the hash function (“hashkey”). In some embodiments, “hashnf” is 0 if the hash function is cyclic redundancy check “crh32c,” 1 if the hash function is SipHash, and 2 if the hash function is SipHash with 3 characters of ordering.
  • In some embodiments, root directory has a constant “Inode” number of 1000. The root directory has no parent. The root includes an “. . . ” entry pointing back to the root directory. All new directories, including the root directory, start with attribute “nlink” equal to 2.
  • Referring back to table 610, in some embodiments, “xattrs” are indexed by name using a “namehash” scheme which is further described in FIG. 9 and “crc32c” as the hash function. The value of a key associated with “xattrs” can be in one of the two forms: internal or external.
  • FIG. 7 illustrates internal form 710 and external form 720 for a value of a key associated with an extended attribute. Both forms include an “inline” field 712 and second field 714. In internal form 710, the “inline” field 712 is 1 and the second field 714 includes a literal name (and a value) of an extended attribute. In external form 720, the “inline” field 712 is equal to 0 and the second field 714 includes a smart hash of an object in an object store, wherein the object includes the extended attribute. If the size of the extended attribute is larger than “maxinlinesize” (described in FIG. 5A), then the external form 720 is stored in a B-tree. If the extended attribute does not exceed “maxinlinesize,” then the value for the extended attribute is stored inline in a B-tree using form 710. The size of the value for extended attribute may be as large as “maxdataobjsize” (described in FIG. 5A) so that the extended attribute values are never become more than one object in size.
  • FIG. 8 shows a key 810 for a chunk of file, an internal form 820 for value of the file chunk, and an external form 830 for value for the file chunk, according to some example embodiments. Key 810 includes a “data” field 522 set to 1, an “Inum” field 524, and “offset” field 526. “Inum” field is the metadata identifier of the file. “Offset” field is the offset of the file chunk from the beginning of file. Chunk (also referred to as a fragment or a continuous fragment) represents a byte extent of file content. Two chunks of the same file are not allowed to overlap. Gaps between chunks and gaps between the end of the last chunk and file size are implicitly zero-filled.
  • In some embodiments, value of the file chunk can be stored inline in a B-tree and externally in object store 130. Internal form 820 is used when the chunk is stored directly in a B-tree. Internal form 820 including a “BTC_DATA_INLINE” field 822 is set to 1 and a second field 824 containing literal byte data represents the chunk. The file chunk is stored inline in a node of a B-tree if the size of the chunk is less than “maxinlinesize” (described in FIG. 5A).
  • External form 830 is used when the file chunk is stored externally in object store 130. External form 830 includes a “BTC_DATA_EXTERN” field 832 set to 2, “size” field 834 representing size of the file chunk, and Smash of an object in object store 130, and the object corresponding to the file chunk. The external form is stored in a node of a B-tree if size of the chunk is larger than “maxinlinesize”(described in FIG. 5A).
  • In some embodiments, dividing files in chunks (also referred herein as chunking) is performed in accordance with one or more policy. While introducing a policy for chunking one may consider following facts:
      • Large size chunks may amortize per chunk or per object overhead at cost of increased IO amplification if the whole object is not needed.
      • The object store may perform deduplication of objects based on object ID. Boundaries of an object determine the object ID. Therefore, if two identical byte sequences are chunked differently, then these two byte sequences may not deduplicate against each other.
      • Compression of data in a file may cause a wide variance between a logical file content and resulting objects if the file data is highly compressible. For example, for common patterns like “all zeros” the difference between the logical file content and resulting objects can be in hundredfold.
  • In some embodiments, a “block-aligned chunking” policy is applied. The block-aligned policy causes a chunk structure of a file to reflect write patterns to the file. If the file is written with streaming sequential writes then the chunks end up being as large as possible, which minimizes the per-object overhead and metadata. If the file is written non-sequentially, the chunks reflect the IO sizes of the writes to minimize read-modify-write overhead of updating a portion of a chunk's object.
  • The following two parameters are relevant for “block-aligned chunking” policy.
      • maxdataobjsize parameter defined for a collection. The maxdataobjsize parameter constrains the maximum size of the object used to store data; and
      • block size parameter defined per a file. A block size parameter controls how chunks are split and aggregated. The block size parameter includes a power of 2.
  • In some embodiments, when data is written to a file, the data is accumulated in a memory waiting for a subsequent write. Adjacent writes are merged in an object up to the maxdataobjectsize.
  • In some embodiments, existing data for non-sequential writes, if present, is removed and replaced with the new data. The block size logically subdivides the file into a sequence of block sized and aligned segments. A chunk can span multiple blocks, but not more than one chunk can be located within a block. Therefore, if a write is not block-aligned, any existing chunk is split at the block boundaries, and a new chunk is inserted. The new chunk contains a combination of the old and new data. If the overwrite is already block aligned, then the new data is simply written. If the overwrite aligns with an existing chunk then the chunk is simply replaced without affecting the surrounding chunks.
  • In some embodiments, if it is determined that a write results in more than one chunk within a single block region, the chunks are merged to maintain the invariant of no more than one chunk per block.
  • In some embodiments, an additional constraint is introduced that forces chunks to be always split at “maxdataobjsize” boundaries in the file. This constrain may allow two files which are substantially similar to deduplicate against each other by allowing chunking to resynchronize at “maxdataobjsize” boundaries.
  • In some embodiments, a “fingerprint chunking” policy can be applied. The “fingerprint chunking” policy is intended to maximize the opportunities for deduplication by making the chunk structure a function of the file content rather than the write patterns. When “fingerprint chunking” is applied, the same byte sequence may result in the same chunk structure, and, therefore, the same object IDs. In some embodiments, Rabin fingerprint algorithm can be used to select content-dependent chunk division points.
  • The “fingerprint chunking” policy may work best for streaming sequential writes. Non-sequential overwrites are especially expensive, as the new writes need to be merged with existing data, and the new data re-chunked according to the fingerprinting.
  • In some embodiments, a compression can be applied while chunking the file data. In some embodiments, the user data (as large as possible) are fed into a compression algorithm at write time in order to create a specific output size. The compression can reduce not only the data size in bytes, but also in objects. Like “fingerprinting”, compression is a content-dependent transformation, and is easiest to apply to streaming writes. Using compression for non-sequential writes, and, especially, overwrites, is expensive because the read-modify-write cycle also requires decompression and recompression.
  • In some embodiments, an “inlining” policy is applied. The inlining may allow small data to be directly embedded within a B-tree, rather than requiring small external objects in the object store. The inlining precludes deduplication for small files and results in a minimum amount of space savings from deduping small files.
  • A B-tree collection possesses a global “maxinlinesize” setting. Maxinlinesize is defined at creation time of the B-tree collection to set the upper bound on the largest chunk that may be inlined. Typically, maxinlinesize is about 4k. As mentioned above, each file is also associated with an “inlinesize” which defines the chunk size that is inlined. The inlining is useful for small files or large sparse files with lots of small spans. The default “inlinesize” is 1k.
  • FIG. 9 shows a key 910 for a directory entry and value 920 for the directory entry. The directories are indexed by name using a “namehash” scheme similar to extended attributes “xattr.” In some embodiments, a region of the B-tree key space is reserved starting from a “base” for a certain number of slots. Slots are populated with entries as the entries are added. In some embodiments, it is assumed that the number of slots is very large with respect to expected number of entries, so the likelihood of single collisions is low. B-tree key 910 includes “data” filed field equal to 1, “Inum” field 524 representing a metadata identifier of the directory, and “offset” field 526 which includes a “diroffset” value. “Diroffset” is a hash function of a literal name of a directory entry (also referred to as a directory chunk). The value of directory entry 522 includes a “BTC_DIRENT” field 922 set to 0, a name field 924 holding the name of the directory entry, a “direnttype” field 926 holding the type of entry, and “inum” field 928 holding a metadata identifier of the directory entry.
  • Each entry in slots has a form 930 as shown in FIG. 9. The “tag” field is used by directories for the chunk type field. An entry with a zero-length name (the name field is a single 0x00 byte) is a tombstone 940. A tombstone 940 is a deleted entry, which is required so that a lookup algorithm can find later names with a colliding hash.
  • In some embodiments, when a new directory of extended attribute entry is inserted in the B-tree, the name of the entry is hashed with a hash function to find a corresponding key, which is a slot in reserved slots. If the located slot is occupied by another name due to a hash collision, the key is incremented until it finds an available slot, which can be either a completely unused slot or occupied by a tombstone.
  • In some embodiments, when an entry is looked up in the B-tree, the name is hashed to determine the corresponding slot. If the determined slot is unused, the entry does not exist. If the determined slot is occupied and the name does not match, the key is incremented until the name is found or an unused slot is found. Any tombstones encountered are then ignored and skipped over.
  • In some embodiments, when an entry is deleted, the lookup algorithm is used to find the slot for the name in order to form B-tree. If the slot exists and the next slot is unused, then the entry is deleted. If the next slot is occupied, then it is assumed that a hash collision occurs and the entry is replaced by a tombstone (a zero-length name, and no payload). If there is a series of tombstones followed by an empty entry, the tombstones can be deleted.
  • In some embodiments, three types of hash function can be used:
  • 1) crc32c. It is a resource inexpensive 32-bit hash. crc32c is neither strong nor collision resistant. Resulting names will have no apparent order and can be attacked to cause collisions, resulting in a denial-of-service attack. crc32c is used for “xattrs” since “xattr” is not generally writable from untrusted sources, and the number of “xattr” is not large.
  • 2) SipHash. It is an efficient 64-bit keyed hash. Each directory has its own randomly generated key which is not exposed outside of the filesystem. Any attacker would need to know the key to be able to cause collisions at a higher rate than a random chance.
  • 3) SipHash-order3. It is a variant of SipHash that truncates the hash to 4 bytes and prepends 3 bytes of the name to the start of the hash. With a “normal” mix of names in a directory, this may result in a directory that is nearly lexically sorted, with entries having a common prefix being mixed randomly. This may help to accommodate applications which tend to operate on directories in an alphabetical order.
  • In some embodiments, a case-insensitive variant of a hash function can be used to support case-insensitive names of directories. For example, the hash function may be marked as “CICP” (that is “case-insensitive, case-preserving”). Using a case-insensitive hash function allows looking up names without regard to case of characters of the names while keeping original capitalization of characters of the names. Applying a case-insensitive hash function allows performing filesystem operations on large directories to be efficient. For example, without case-insensitive lookups, Samba package is needed to scan the whole directory to perform case-insensitive matching. Names of directories are required to be properly formed with UTF-8 coding.
  • In some embodiments, when an Inode loses the last name (that is nlink=0 or the Inode is “deleted”), the Inode may still be in use. While the file is still open, it functions normally. This means that the bulk of the work related to deleting the file is deferred until the file is closed.
  • If there is a crash occurring before the close happens, then the Inode state could be left as stray garbage, which is Inodes with no names. Such an Inode is effectively inaccessible. At the same time the Inode can be still present within the B-tree, so a garbage collection stores Inode data alive.
  • In some embodiments, to solve the issue of having an Inode without a last name, an “orphan directory” is provided. The “orphan directory” is a nameless directory with Inum=1. When a file is unlinked and the file “nlink” count goes to zero, an entry is added to the orphan directory and the file's nlink remains 0. When mounted, it traverses the entries in the orphan directory and releases all associated data.
  • FIG. 10 is a process flow diagram showing a method 1000 for organizing data, according to an example embodiment. The method 1000 can be implemented using a computer system. The example computer system is shown in FIG. 11.
  • The method 1000 may commence with providing an object store to store objects in block 1010. The object represents fragments of the data and can be assigned addresses. In block 1020, method 1000 can proceed with associating a B-tree with the object store. The B-tree includes nodes. Each of the nodes includes keys. Each of the keys is associated with at least one object from the object store. In block 1030, method 1000 generates, based at least partially on objects from the object store, values for the each of the keys. In block 1040, method 1000 allows determining that a size of an object from the object store is less than a pre-determined size. In block 1050, if the result of the determination is positive, a value of the object is stored in a particular node of the B-tree, wherein the particular node includes a particular key associated with the object. In block 1060, if the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
  • FIG. 11 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 1100, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In various exemplary embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a server, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as an Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • The example computer system 1100 includes a processor or multiple processors 1102, a hard disk drive 1104, a main memory 1106, and a static memory 1108, which communicate with each other via a bus 1110. The computer system 1100 may also include a network interface device 1112. The hard disk drive 1104 may include a computer-readable medium 1120, which stores one or more sets of instructions 1122 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1122 can also reside, completely or at least partially, within the main memory 1106 and/or within the processors 1102 during execution thereof by the computer system 1100. The main memory 1106 and the processors 1102 also constitute machine-readable media.
  • While the computer-readable medium 1120 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like.
  • The exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other compilers, assemblers, interpreters or other computer languages or platforms.
  • Thus, systems and methods for methods for organizing data are disclosed. Although embodiments have been described with reference to specific example embodiments, it may be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (20)

What is claimed is:
1. A computer-implemented method for organizing data, the method comprising:
providing an object store to store objects, each of the objects representing a fragment of the data and being associated with an address;
associating a B-tree with the object store, the B-tree including nodes, wherein each of the nodes includes keys, wherein each of the keys is associated with at least one object from the object store; and
generating, based at least partially on objects from the object store, values for each of the keys.
2. The method of claim 1, wherein the address of an object in the object store is based on content of the fragment of the data.
3. The method of claim 1, further comprising:
determining that a size of an object from the object store is less than a pre-determined size;
if a result of the determination is positive, storing a value of the object in a particular node of the B-tree, the particular node including a particular key associated with the object; and
if a the result of the determination is negative, storing the address of the object in the particular node of the B-tree.
4. The method of claim 1, wherein each of the objects includes at least one of the following:
a metadata object storing at least a number of references to further objects from the data objects and an identification number associated with a file or a directory; and
a data object representing at least one of the following: a continuous fragment of the file or a directory entry.
5. The method of claim 4, wherein the key associated with the metadata object includes at least a first field including an indication of the metadata object, a second field including the identification number, and a third field including a metadata index representing a distinct type of the metadata object.
6. The method of claim 5, wherein the type of the metadata object includes at least one of the following: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory.
7. The method of claim 4, wherein the key associated with the fragment of the file includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including an offset of the continuous fragment of the file from a beginning of the file.
8. The method of claim 4, wherein the key associated with the directory entry includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including a hash calculated based on a literal name of the directory entry.
9. The method of claim 8, wherein calculating the hash includes:
applying a hash function to the literal name to obtain a preliminary hash, the hash function including at least one of the following: crc32c, SipHash, and SipHash-order3; and
shifting the preliminary hash by a pre-determined base number to obtain the hash.
10. The method of claim 9, further comprising:
determining that the hash matches an existing hash; and
based on the determination, incrementing the hash by 1.
11. A system for organizing data, the system comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, the memory storing instructions, which, when executed by the at least one processor, perform a method comprising:
providing an object store to store objects, each of the objects representing a fragment of the data and associated with an address;
associating a B-tree with the object store, the B-tree including nodes, wherein each of the nodes includes keys, wherein each of the keys is associated with at least one object from the object store; and
generating, based at least partially on objects from the objects, values for each of the keys.
12. The system of claim 11, wherein the address of an object in the object store is based on content of the fragment of the data.
13. The system of claim 11, wherein the method further comprising:
determining that a size of an object from the object store is less than a pre-determined size;
if a result of the determination is positive, storing a value of the object in a particular node of the B-tree, the particular node including a particular key associated with the object; and
if a the result of the determination is negative, storing the address of the object in the particular node of the B-tree.
14. The system of claim 11, wherein each of the objects includes at least one of the following:
a metadata object storing at least a number of references to further objects from the data objects and an identification number associated with a file or a directory; and
a data object representing at least one of the following: a continuous fragment of the file or a directory entry.
15. The system of claim 14, wherein the key associated with the metadata object includes at least a first field including an indication of the metadata object, a second field including the identification number, and a third field including a metadata index representing a distinct type of the metadata object.
16. The system of claim 15, wherein the type of the metadata object includes at least one of the following: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory.
17. The system of claim 14, wherein the key associated with the fragment of the file includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including an offset of the continuous fragment of the file from beginning of the file.
18. The system of claim 14, wherein the key associated with the directory entry includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including a hash calculated based on a literal name of the directory entry.
19. The system of claim 18, wherein calculating the hash includes:
applying a hash function to the literal name to obtain a preliminary hash, the hash function including at least one of the following: crc32c, SipHash, and SipHash-order3;
shifting the preliminary hash by a pre-determined base number to obtain the hash;
determining that the hash matches an existing hash; and
based on the determination, incrementing the hash by 1.
20. A non-transitory computer-readable storage medium having embodied thereon instructions, which, when executed by one or more processors, perform a method for organizing data, the method comprising:
providing an object store to store objects, each of the objects representing a fragment of the data and associated with an address;
associating a B-tree with the object store, the B-tree including nodes, wherein each of the nodes includes keys, wherein each of the keys is associated with at least one object from the object store;
generating, based at least partially on objects from the objects, values for the each of the keys;
determining that a size of an object from the object store is less than a pre-determined size;
if a result of the determination is positive, storing a value of the object in a particular node of the B-tree, the particular node including a particular key associated with the object; and
if a result of the determination is negative, storing the address of the object in the particular node of the B-tree.
US15/084,401 2015-08-26 2016-03-29 B-Tree Based Data Model for File Systems Abandoned US20170060924A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/084,401 US20170060924A1 (en) 2015-08-26 2016-03-29 B-Tree Based Data Model for File Systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562210385P 2015-08-26 2015-08-26
US15/084,401 US20170060924A1 (en) 2015-08-26 2016-03-29 B-Tree Based Data Model for File Systems

Publications (1)

Publication Number Publication Date
US20170060924A1 true US20170060924A1 (en) 2017-03-02

Family

ID=58095689

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/084,401 Abandoned US20170060924A1 (en) 2015-08-26 2016-03-29 B-Tree Based Data Model for File Systems
US15/084,399 Active 2037-07-24 US10474654B2 (en) 2015-08-26 2016-03-29 Structural data transfer over a network

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/084,399 Active 2037-07-24 US10474654B2 (en) 2015-08-26 2016-03-29 Structural data transfer over a network

Country Status (1)

Country Link
US (2) US20170060924A1 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180373727A1 (en) * 2017-06-26 2018-12-27 Vmware, Inc. Management of b-tree leaf nodes with variable size values
US10491387B2 (en) * 2016-11-15 2019-11-26 International Business Machines Corporation End-to-end encryption of a block storage device with protected key
US20190392062A1 (en) * 2018-06-22 2019-12-26 EMC IP Holding Company LLC Multi-level data deduplication for elastic cloud storage devices
US10628407B1 (en) * 2017-02-27 2020-04-21 Amazon Technologies, Inc. Efficient multithreaded data structure processing
US20200185057A1 (en) * 2018-08-03 2020-06-11 Catalog Technologies, Inc. Systems and methods for storing and reading nucleic acid-based data with error protection
US11062084B2 (en) * 2018-06-27 2021-07-13 Microsoft Technology Licensing, Llc Generating diverse smart replies using synonym hierarchy
US11188194B2 (en) 2018-06-27 2021-11-30 Microsoft Technology Licensing, Llc Personalization and synonym hierarchy for smart replies
US20220188339A1 (en) * 2020-12-16 2022-06-16 Electronics And Telecommunications Research Institute Network environment synchronization apparatus and method
US11556897B2 (en) 2018-05-31 2023-01-17 Microsoft Technology Licensing, Llc Job-post budget recommendation based on performance
US11636220B2 (en) * 2019-02-01 2023-04-25 Intertrust Technologies Corporation Data management systems and methods
US11658926B2 (en) 2018-06-27 2023-05-23 Microsoft Technology Licensing, Llc Generating smart replies involving image files
US11671492B2 (en) * 2020-09-10 2023-06-06 EMC IP Holding Company LLC Multipart upload for distributed file systems
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US20230334093A1 (en) * 2022-04-13 2023-10-19 Oracle International Corporation Graph-organized file system
US20230334014A1 (en) * 2022-04-13 2023-10-19 Oracle International Corporation Implementing graph search with in-structure metadata of a graph-organized file system
US12002547B2 (en) 2019-05-09 2024-06-04 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US12006497B2 (en) 2018-03-16 2024-06-11 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US20240333794A1 (en) * 2017-10-12 2024-10-03 Stanley Bruce Kinsey Cloud-based index and centralized cloud-based hub interface for cloud-stored media

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400249A (en) * 2020-03-06 2020-07-10 深圳市瑞驰信息技术有限公司 File storage system and method easy for counting file number
US20230177069A1 (en) * 2021-12-08 2023-06-08 Vmware, Inc. Efficient journal log record for copy-on-write b+ tree operation
CN120255819B (en) * 2025-05-23 2025-09-30 安徽海马云科技股份有限公司 File processing method and device

Citations (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5418947A (en) * 1992-12-23 1995-05-23 At&T Corp. Locating information in an unsorted database utilizing a B-tree
US5752243A (en) * 1993-10-20 1998-05-12 Microsoft Corporation Computer method and storage structure for storing and accessing multidimensional data
US6052688A (en) * 1995-01-26 2000-04-18 Hans Verner Thorsen Computer-implemented control of access to atomic data items
US6411957B1 (en) * 1999-06-30 2002-06-25 Arm Limited System and method of organizing nodes within a tree structure
US20020161781A1 (en) * 2001-04-27 2002-10-31 Sun Microsystems, Inc. Method, system, program, and computer readable medium for indexing object oriented objects in an object oriented database
US20020194184A1 (en) * 2001-06-04 2002-12-19 Baskins Douglas L. System for and method of efficient, expandable storage and retrieval of small datasets
US6675173B1 (en) * 1998-01-22 2004-01-06 Ori Software Development Ltd. Database apparatus
US20040133590A1 (en) * 2002-08-08 2004-07-08 Henderson Alex E. Tree data structure with range-specifying keys and associated methods and apparatuses
US20050063083A1 (en) * 2003-08-21 2005-03-24 Dart Scott E. Systems and methods for the implementation of a digital images schema for organizing units of information manageable by a hardware/software interface system
US20050257083A1 (en) * 2004-05-13 2005-11-17 Cousins Robert E Transaction-based storage system and method that uses variable sized objects to store data
US7213040B1 (en) * 2002-10-29 2007-05-01 Novell, Inc. Apparatus for policy based storage of file data and meta-data changes over time
US20070185902A1 (en) * 2006-01-26 2007-08-09 Seagate Technology Llc Object-based data storage device
US20070294319A1 (en) * 2006-06-08 2007-12-20 Emc Corporation Method and apparatus for processing a database replica
US20070299867A1 (en) * 2006-06-23 2007-12-27 Timothy John Baldwin Method and System for Defining a Heirarchical Structure
US7346734B2 (en) * 2005-05-25 2008-03-18 Microsoft Corporation Cluster storage collection based data management
US20090037500A1 (en) * 2007-07-31 2009-02-05 Kirshenbaum Evan R Storing nodes representing respective chunks of files in a data store
US20090182837A1 (en) * 2008-01-11 2009-07-16 Rogers J Andrew Spatial Sieve Tree
US20090234821A1 (en) * 2004-09-15 2009-09-17 International Business Machines Corporation Systems and Methods for Efficient Data Searching, Storage and Reduction
US20100042703A1 (en) * 2006-11-17 2010-02-18 Koninklijke Philips Electronics N.V. Method and apparatus for assigning addresses to nodes of a communication network tree structure
US20100146003A1 (en) * 2008-12-10 2010-06-10 Unisys Corporation Method and system for building a B-tree
US7853591B1 (en) * 2006-06-30 2010-12-14 Juniper Networks, Inc. Protection of database operations
US20110022573A1 (en) * 2009-07-27 2011-01-27 International Business Machines Corporation Preventing transfer and duplication of redundantly referenced objects across nodes of an application system
US20110225429A1 (en) * 2008-08-29 2011-09-15 Charalampos Papamanthou Cryptographic accumulators for authenticated hash tables
US20110314246A1 (en) * 2010-06-16 2011-12-22 Microsoft Corporation Hierarchical allocation for file system storage device
US20120030256A1 (en) * 2010-07-30 2012-02-02 Wolfgang Pfeifer Common Modeling of Data Access and Provisioning for Search, Query, Reporting and/or Analytics
US20120117067A1 (en) * 2010-10-29 2012-05-10 Navteq North America, Llc Method and apparatus for providing a range ordered tree structure
US20120331249A1 (en) * 2011-06-23 2012-12-27 CohortFS, LLC Dynamic data placement for distributed storage
US20130007007A1 (en) * 2011-06-29 2013-01-03 Nokia Corporation Method and apparatus for providing a list-based interface to key-value stores
US20130086303A1 (en) * 2011-09-30 2013-04-04 Fusion-Io, Inc. Apparatus, system, and method for a persistent object store
US20130346725A1 (en) * 2012-06-20 2013-12-26 Microsoft Corporation Structuring storage based on latch-free b-trees
US20140164284A1 (en) * 2012-12-12 2014-06-12 Michael Hartel Business Object Data Structures with Node Keys
US8868926B2 (en) * 2012-04-06 2014-10-21 Exablox Corporation Cryptographic hash database
US20140317065A1 (en) * 2013-04-23 2014-10-23 Exablox Corporation Reference counter integrity checking
US20150220578A1 (en) * 2014-02-04 2015-08-06 Exablox Corporation Content based organization of file systems
US20150278271A1 (en) * 2014-03-31 2015-10-01 Sandisk Enterprise Ip Llc Compaction of Information in Tiered Data Structure
US20150370502A1 (en) * 2014-06-19 2015-12-24 Cohesity, Inc. Making more active use of a secondary storage system
US20160063008A1 (en) * 2014-09-02 2016-03-03 Netapp, Inc. File system for efficient object fragment access
US20160210080A1 (en) * 2015-01-20 2016-07-21 Ultrata Llc Object memory data flow instruction execution
US20160306817A1 (en) * 2015-04-14 2016-10-20 Et International, Inc. Systems and methods for key-value stores
US9864773B1 (en) * 2013-04-29 2018-01-09 Seagate Technology Llc Object-based commands with data integrity identifiers

Family Cites Families (177)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8303167A (en) 1983-09-14 1985-04-01 Philips Nv ONLY SERVICE DEVICE WITH MICRO-COMPUTER PROTECTED FROM FAULTS.
US4660130A (en) 1984-07-24 1987-04-21 Texas Instruments Incorporated Method for managing virtual memory to separate active and stable memory blocks
US4953080A (en) * 1988-04-25 1990-08-28 Hewlett-Packard Company Object management facility for maintaining data in a computer system
CA2285089C (en) 1991-11-12 2000-05-09 Ibm Canada Limited-Ibm Canada Limitee Logical mapping of data objects using data spaces
GB2265734A (en) 1992-03-27 1993-10-06 Ibm Free memory cell management system
US5832529A (en) 1996-10-11 1998-11-03 Sun Microsystems, Inc. Methods, apparatus, and product for distributed garbage collection
US6237009B1 (en) 1996-10-11 2001-05-22 Sun Microsystems, Inc. Lease renewal service
US6167437A (en) 1997-09-02 2000-12-26 Silicon Graphics, Inc. Method, system, and computer program product for page replication in a non-uniform memory access system
US6098079A (en) 1998-04-02 2000-08-01 Mitsubishi Electric Information Technology Center America, Inc. (Ita) File version reconciliation using hash codes
US6154747A (en) 1998-08-26 2000-11-28 Hunt; Rolf G. Hash table implementation of an object repository
GB9825102D0 (en) 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
JP2000207266A (en) 1999-01-13 2000-07-28 Mitsubishi Electric Corp Replica system and replica method
US6480950B1 (en) 2000-01-24 2002-11-12 Oracle International Corporation Software paging system
US7266555B1 (en) 2000-03-03 2007-09-04 Intel Corporation Methods and apparatus for accessing remote storage through use of a local device
AU2001263033A1 (en) 2000-05-09 2001-11-20 Sun Microsystems, Inc. Method and apparatus for proximity discovery of services
US8219662B2 (en) 2000-12-06 2012-07-10 International Business Machines Corporation Redirecting data generated by network devices
US7216136B2 (en) 2000-12-11 2007-05-08 International Business Machines Corporation Concurrent collection of cyclic garbage in reference counting systems
SE0004736D0 (en) 2000-12-20 2000-12-20 Ericsson Telefon Ab L M Mapping system and method
US20030028514A1 (en) 2001-06-05 2003-02-06 Lord Stephen Philip Extended attribute caching in clustered filesystem
US7222187B2 (en) 2001-07-31 2007-05-22 Sun Microsystems, Inc. Distributed trust mechanism for decentralized networks
US7134041B2 (en) 2001-09-20 2006-11-07 Evault, Inc. Systems and methods for data backup over a network
US6973049B2 (en) 2001-10-16 2005-12-06 Corrigent Systems Ltd. Auto-configuration of network interfaces in a bidirectional ring network
US7117505B2 (en) 2001-11-29 2006-10-03 Veritas Operating Corporation Methods, systems, and apparatus to interface with storage objects
US7177980B2 (en) 2001-12-18 2007-02-13 Storage Technology Corporation Cache storage system and method
US7392421B1 (en) 2002-03-18 2008-06-24 Symantec Operating Corporation Framework for managing clustering and replication
US6968439B2 (en) 2002-08-29 2005-11-22 Micron Technology, Inc. Single segment data object management
EP1540510B1 (en) 2002-09-10 2009-08-12 Exagrid Systems, Inc. Method and apparatus for managing data integrity of backup and disaster recovery data
GB0301726D0 (en) 2003-01-24 2003-02-26 Ecebs Ltd Improved smartcard
US7043494B1 (en) 2003-01-28 2006-05-09 Pmc-Sierra, Inc. Fast, deterministic exact match look-ups in large tables
US7177886B2 (en) 2003-02-07 2007-02-13 International Business Machines Corporation Apparatus and method for coordinating logical data replication with highly available data replication
JP4068473B2 (en) 2003-02-19 2008-03-26 株式会社東芝 Storage device, assignment range determination method and program
US7478096B2 (en) 2003-02-26 2009-01-13 Burnside Acquisition, Llc History preservation in a computer storage system
US7403961B1 (en) 2003-03-14 2008-07-22 Xilinx, Inc. Dangling reference detection and garbage collection during hardware simulation
US6988180B2 (en) 2003-09-29 2006-01-17 Microsoft Corporation Method and apparatus for lock-free, non-blocking hash table
US7496690B2 (en) 2003-10-09 2009-02-24 Intel Corporation Method, system, and program for managing memory for data transmission through a network
US7596704B2 (en) 2003-10-10 2009-09-29 Jing-Jang Hwang Partition and recovery of a verifiable digital secret
US8103772B2 (en) 2003-12-24 2012-01-24 Sap Aktiengesellschaft Cluster extension in distributed systems using tree method
EP1870814B1 (en) 2006-06-19 2014-08-13 Texas Instruments France Method and apparatus for secure demand paging for processor devices
US8185663B2 (en) 2004-05-11 2012-05-22 Hewlett-Packard Development Company, L.P. Mirroring storage interface
US7065611B2 (en) 2004-06-29 2006-06-20 Hitachi, Ltd. Method for controlling storage policy according to volume activity
US7715396B2 (en) 2004-08-19 2010-05-11 Microsoft Corporation Network routing
US7613703B2 (en) 2004-09-30 2009-11-03 Microsoft Corporation Organizing resources into collections to facilitate more efficient and reliable resource access
CN1761219B (en) 2004-10-12 2010-04-28 华为技术有限公司 The Method of Realizing Topology Automatic Discovery in MPLS Ring Network
US20060083247A1 (en) 2004-10-14 2006-04-20 Sun Microsystems, Inc. Prefix lookup using address-directed hash tables
US7694167B2 (en) * 2004-10-22 2010-04-06 Microsoft Corporation Maintaining routing consistency within a rendezvous federation
US20120310892A1 (en) * 2004-12-21 2012-12-06 Dam Tru Q System and method for virtual cluster file server
WO2006098723A1 (en) 2005-03-10 2006-09-21 Thomson Licensing Hybrid mesh routing protocol
WO2006094366A1 (en) 2005-03-11 2006-09-14 Rocksoft Limited Method for indexing in a reduced-redundancy storage system
US8356021B2 (en) 2005-03-11 2013-01-15 Ross Neil Williams Method and apparatus for indexing in a reduced-redundancy storage system
US8442996B2 (en) 2005-04-12 2013-05-14 Enrico Maim Methods for granting access to resources modifiable by users in a computer environment, and resources structured therefore
US7539836B1 (en) 2005-04-18 2009-05-26 Netapp, Inc. Method and system for configuring a data storage object
US20070005746A1 (en) 2005-06-30 2007-01-04 Roe Bryan Y Enhanced network discovery service
US8296550B2 (en) 2005-08-29 2012-10-23 The Invention Science Fund I, Llc Hierarchical register file with operand capture ports
US20070130232A1 (en) * 2005-11-22 2007-06-07 Therrien David G Method and apparatus for efficiently storing and managing historical versions and replicas of computer data files
US7653650B2 (en) * 2005-12-13 2010-01-26 International Business Machines Corporation Apparatus, system, and method for synchronizing change histories in enterprise applications
US8554758B1 (en) 2005-12-29 2013-10-08 Amazon Technologies, Inc. Method and apparatus for monitoring and maintaining health in a searchable data service
US7454592B1 (en) 2006-02-16 2008-11-18 Symantec Operating Corporation Block-level and hash-based single-instance storage
CN101046755B (en) 2006-03-28 2011-06-15 郭明南 System and method of computer automatic memory management
US7801128B2 (en) 2006-03-31 2010-09-21 Amazon Technologies, Inc. Managing communications between computing nodes
US20070233828A1 (en) 2006-03-31 2007-10-04 Jeremy Gilbert Methods and systems for providing data storage and retrieval
US9195665B2 (en) 2006-04-28 2015-11-24 Hewlett-Packard Development Company, L.P. Method and system for data retention
US20070271303A1 (en) 2006-05-18 2007-11-22 Manuel Emilio Menendez Personal file version archival management and retrieval
US8255420B2 (en) 2006-05-23 2012-08-28 Noryan Holding Corporation Distributed storage
US8099605B1 (en) 2006-06-05 2012-01-17 InventSec AB Intelligent storage device for backup system
US8990270B2 (en) 2006-08-03 2015-03-24 Hewlett-Packard Development Company, L. P. Protocol virtualization for a network file system
WO2008024971A1 (en) 2006-08-25 2008-02-28 University Of Florida Research Foundation Inc. Recursively partioned static ip router tables
US7539783B2 (en) 2006-09-22 2009-05-26 Commvault Systems, Inc. Systems and methods of media management, such as management of media to and from a media storage library, including removable media
WO2008043197A1 (en) 2006-09-26 2008-04-17 Intel Corporation Dynamically changing a garbage collector in a managed runtime system
US7827218B1 (en) 2006-11-18 2010-11-02 X-Engines, Inc. Deterministic lookup using hashed key in a multi-stride compressed trie structure
US8161353B2 (en) 2007-12-06 2012-04-17 Fusion-Io, Inc. Apparatus, system, and method for validating that a correct data segment is read from a data storage device
US7987278B2 (en) 2006-12-18 2011-07-26 Ricoh Company, Ltd. Web services device profile on a multi-service device: dynamic addition of services
US7840537B2 (en) 2006-12-22 2010-11-23 Commvault Systems, Inc. System and method for storing redundant information
US8935206B2 (en) 2007-01-31 2015-01-13 Hewlett-Packard Development Company, L.P. Snapshots in distributed storage systems
JP5020673B2 (en) 2007-03-27 2012-09-05 株式会社日立製作所 A computer system that prevents the storage of duplicate files
WO2008125336A2 (en) 2007-04-15 2008-10-23 Phoenix Contact Gmbh & Co. Kg Method and control device for controlling an automation system
EP2015587B1 (en) 2007-05-14 2012-01-25 Apple Inc. Method of storing a multimedia object in memory, associated data structure and terminal
US7725437B2 (en) 2007-07-31 2010-05-25 Hewlett-Packard Development Company, L.P. Providing an index for a data store
JP4386932B2 (en) 2007-08-17 2009-12-16 富士通株式会社 Storage management program, storage management device, and storage management method
US8069317B2 (en) 2007-10-12 2011-11-29 International Business Machines Corporation Providing and utilizing high performance block storage metadata
US9413825B2 (en) * 2007-10-31 2016-08-09 Emc Corporation Managing file objects in a data storage system
US8447733B2 (en) 2007-12-03 2013-05-21 Apple Inc. Techniques for versioning file systems
US8140599B1 (en) 2007-12-07 2012-03-20 Emc Corporation Garbage collection for merged collections
US8244846B2 (en) 2007-12-26 2012-08-14 Symantec Corporation Balanced consistent hashing for distributed resource management
US8171244B2 (en) 2008-02-01 2012-05-01 Imation Corp. Methods for implementation of worm mode on a removable disk drive storage system
JP5091704B2 (en) 2008-02-06 2012-12-05 株式会社日立製作所 Storage configuration recovery method and storage management system
US8019882B2 (en) 2008-06-27 2011-09-13 Microsoft Corporation Content identification for peer-to-peer content retrieval
JP5146174B2 (en) 2008-07-28 2013-02-20 富士通株式会社 Virtual machine monitor device and program, and virtual machine memory sharing management method
US8078646B2 (en) 2008-08-08 2011-12-13 Oracle International Corporation Representing and manipulating RDF data in a relational database management system
WO2010061260A1 (en) 2008-11-03 2010-06-03 Elvin Slavik Method, system, and product for managing spatial data in a database
US8353018B2 (en) 2008-11-13 2013-01-08 Yahoo! Inc. Automatic local listing owner authentication system
JP5396848B2 (en) 2008-12-16 2014-01-22 富士通株式会社 Data processing program, server device, and data processing method
US8886714B2 (en) 2011-08-08 2014-11-11 Ctera Networks Ltd. Remote access service for cloud-enabled network devices
US9344438B2 (en) 2008-12-22 2016-05-17 Qualcomm Incorporated Secure node identifier assignment in a distributed hash table for peer-to-peer networks
US8132168B2 (en) 2008-12-23 2012-03-06 Citrix Systems, Inc. Systems and methods for optimizing a process of determining a location of data identified by a virtual hard drive address
US8094500B2 (en) 2009-01-05 2012-01-10 Sandisk Technologies Inc. Non-volatile memory and method with write cache partitioning
US8566362B2 (en) 2009-01-23 2013-10-22 Nasuni Corporation Method and system for versioned file system using structured data representations
US8397051B2 (en) 2009-02-23 2013-03-12 Autonomy, Inc. Hybrid hash tables
US8301654B2 (en) 2009-02-24 2012-10-30 Hitachi, Ltd. Geographical distributed storage system based on hierarchical peer to peer architecture
US8135748B2 (en) 2009-04-10 2012-03-13 PHD Virtual Technologies Virtual machine data replication
US8886206B2 (en) 2009-05-01 2014-11-11 Digimarc Corporation Methods and systems for content processing
US8407190B2 (en) 2009-06-30 2013-03-26 Commvault Systems, Inc. Performing data storage operations with a cloud environment, including containerized deduplication, data pruning, and data transfer
US8706980B2 (en) 2009-07-30 2014-04-22 Cleversafe, Inc. Method and apparatus for slice partial rebuilding in a dispersed storage network
US8320282B2 (en) 2009-07-30 2012-11-27 Calix, Inc. Automatic control node selection in ring networks
US8285925B1 (en) 2009-07-31 2012-10-09 Amazon Technologies, Inc. Management of object mapping information corresponding to a distributed storage system
US8843762B2 (en) 2009-09-04 2014-09-23 Gradiant, Centro Tecnolóxico de Telecomunicacións de Galicia Cryptographic system for performing secure iterative computations and signal processing directly on encrypted data in untrusted environments
BR112012007376B1 (en) 2009-10-01 2023-02-14 Telefonaktiebolaget Lm Ericsson (Publ) SCOPE OF APPLICATION PLATFORM, USER DEVICE FOR A WEB APPLICATION, AND METHOD FOR ENABLING COMMUNICATION BETWEEN A SCOPE OF APPLICATION PLATFORM AND A WEB APPLICATION
JP2011095976A (en) 2009-10-29 2011-05-12 Nippon Telegr & Teleph Corp <Ntt> Device, method and program for distributed data management
US8868510B2 (en) 2009-12-03 2014-10-21 Sybase, Inc. Managing data storage as an in-memory database in a database management system
US8452739B2 (en) 2010-03-16 2013-05-28 Copiun, Inc. Highly scalable and distributed data de-duplication
US8458131B2 (en) 2010-02-26 2013-06-04 Microsoft Corporation Opportunistic asynchronous de-duplication in block level backups
US8458299B2 (en) 2010-03-17 2013-06-04 Hitachi, Ltd. Metadata management method for NAS global namespace design
FR2959089B1 (en) 2010-04-16 2012-08-03 Inst Nat Rech Inf Automat COMPUTER RESOURCE AND INFRASTRUCTURE MANAGEMENT TOOL AND NETWORKS
US20110264880A1 (en) 2010-04-23 2011-10-27 Tatu Ylonen Oy Ltd Object copying with re-copying concurrently written objects
US9047218B2 (en) 2010-04-26 2015-06-02 Cleversafe, Inc. Dispersed storage network slice name verification
US8504718B2 (en) 2010-04-28 2013-08-06 Futurewei Technologies, Inc. System and method for a context layer switch
EP2569904B1 (en) * 2010-05-10 2020-12-02 Telefonaktiebolaget LM Ericsson (publ) A ring node, an ethernet ring and methods for loop protection in an ethernet ring
US8984241B2 (en) 2010-07-07 2015-03-17 Nexenta Systems, Inc. Heterogeneous redundant storage array
JP5982366B2 (en) 2010-07-09 2016-08-31 ステイト・ストリート・コーポレーション Systems and methods for private cloud computing
WO2012016089A2 (en) 2010-07-28 2012-02-02 Fusion-Io, Inc. Apparatus, system, and method for conditional and atomic storage operations
US20120030260A1 (en) 2010-07-30 2012-02-02 Maohua Lu Scalable and parallel garbage collection method and system for incremental backups with data de-duplication
US8407438B1 (en) 2010-08-16 2013-03-26 Symantec Corporation Systems and methods for managing virtual storage disk data
US8495093B2 (en) 2010-08-18 2013-07-23 International Business Machines Corporation Multiway trie data structure that dynamically adjusts node sizes in a manner that reduces memory footprint and improves access speed
JP5514041B2 (en) 2010-08-25 2014-06-04 日本電信電話株式会社 Identifier assignment method and program
US8473778B2 (en) 2010-09-08 2013-06-25 Microsoft Corporation Erasure coding immutable data
US8972366B2 (en) 2010-09-29 2015-03-03 Red Hat, Inc. Cloud-based directory system based on hashed values of parent and child storage locations
WO2012051600A2 (en) 2010-10-15 2012-04-19 Kyquang Son File system-aware solid-state storage management system
US8239584B1 (en) 2010-12-16 2012-08-07 Emc Corporation Techniques for automated storage management
US8484408B2 (en) 2010-12-29 2013-07-09 International Business Machines Corporation Storage system cache with flash memory in a raid configuration that commits writes as full stripes
WO2012092553A1 (en) 2010-12-31 2012-07-05 Desktone, Inc. Providing virtual desktops using resources accessed on public computer networks
US9311135B2 (en) 2011-01-18 2016-04-12 Scality, S.A. Method for generating universal objects identifiers in distributed multi-purpose storage systems
US9251087B2 (en) 2011-02-11 2016-02-02 SanDisk Technologies, Inc. Apparatus, system, and method for virtual memory management
US8510267B2 (en) 2011-03-08 2013-08-13 Rackspace Us, Inc. Synchronization of structured information repositories
US8966191B2 (en) 2011-03-18 2015-02-24 Fusion-Io, Inc. Logical interface for contextual storage
US8341312B2 (en) 2011-04-29 2012-12-25 International Business Machines Corporation System, method and program product to manage transfer of data to resolve overload of a storage system
US8572290B1 (en) 2011-05-02 2013-10-29 Board Of Supervisors Of Louisiana State University And Agricultural And Mechanical College System and architecture for robust management of resources in a wide-area network
US8433681B2 (en) 2011-05-12 2013-04-30 Dell Products L.P. System and method for managing replication in an object storage system
US8346810B2 (en) 2011-05-13 2013-01-01 Simplivity Corporation Reference count propagation
US8949182B2 (en) 2011-06-17 2015-02-03 International Business Machines Corporation Continuous and asynchronous replication of a consistent dataset
US8843998B2 (en) 2011-06-27 2014-09-23 Cliqr Technologies, Inc. Apparatus, systems and methods for secure and selective access to services in hybrid public-private infrastructures
US8880837B2 (en) 2011-08-24 2014-11-04 International Business Machines Corporation Preemptively allocating extents to a data set
WO2013039535A1 (en) 2011-09-12 2013-03-21 Microsoft Corporation Querying and repairing data
US9355119B2 (en) * 2011-09-22 2016-05-31 Netapp, Inc. Allocation of absent data within filesystems
EP2575379A1 (en) 2011-09-29 2013-04-03 Alcatel Lucent Apparatuses and computer program products for discovering and accessing local services via WiFi hotspots
US9372827B2 (en) 2011-09-30 2016-06-21 Commvault Systems, Inc. Migration of an existing computing system to new hardware
US9455996B2 (en) 2011-10-03 2016-09-27 New York University Generating progressively a perfect hash data structure, such as a multi-dimensional perfect hash data structure, and using the generated data structure for high-speed string matching
US9035568B2 (en) 2011-12-05 2015-05-19 Qualcomm Incorporated Telehealth wireless communication hub device and service platform system
US8762627B2 (en) 2011-12-21 2014-06-24 Sandisk Technologies Inc. Memory logical defragmentation during garbage collection
AU2011384790A1 (en) 2011-12-30 2014-07-24 Schneider Electric It Corporation Systems and methods of remote communication
US8782344B2 (en) 2012-01-12 2014-07-15 Fusion-Io, Inc. Systems and methods for managing cache admission
US9489827B2 (en) 2012-03-12 2016-11-08 Cisco Technology, Inc. System and method for distributing content in a video surveillance network
US9213581B2 (en) 2012-03-14 2015-12-15 Sap Se Method and system for a cloud frame architecture
US9043567B1 (en) 2012-03-28 2015-05-26 Netapp, Inc. Methods and systems for replicating an expandable storage volume
US20130263151A1 (en) 2012-04-03 2013-10-03 Microsoft Corporation Consistent Hashing Table for Workload Distribution
US9628438B2 (en) 2012-04-06 2017-04-18 Exablox Consistent ring namespaces facilitating data storage and organization in network infrastructures
TWI631871B (en) 2012-04-27 2018-08-01 內數位專利控股公司 Method and apparatus for supporting proximity discovery procedures
US20150081964A1 (en) 2012-05-01 2015-03-19 Hitachi, Ltd. Management apparatus and management method of computing system
US9323589B2 (en) 2012-05-15 2016-04-26 Oracle International Corporation Self registration of event—consumers/producers and auto discovery
US20130346839A1 (en) 2012-06-20 2013-12-26 Francis Dinha Private tunnel network portal
US20130346591A1 (en) 2012-06-21 2013-12-26 Alcatel-Lucent Usa Inc. Clientless Cloud Computing
US9003477B2 (en) 2012-06-27 2015-04-07 Microsoft Technology Licensing, Llc Model for managing hosted resources using logical scopes
US9348652B2 (en) 2012-07-02 2016-05-24 Vmware, Inc. Multi-tenant-cloud-aggregation and application-support system
US8966343B2 (en) 2012-08-21 2015-02-24 Western Digital Technologies, Inc. Solid-state drive retention monitor using reference blocks
US9002792B2 (en) 2012-11-19 2015-04-07 Compellent Technologies Confirming data consistency in a data storage environment
US9715507B2 (en) 2013-03-28 2017-07-25 Ctera Networks, Ltd. Techniques for reconciling metadata and data in a cloud storage system without service interruption
US9786197B2 (en) 2013-05-09 2017-10-10 Rockwell Automation Technologies, Inc. Using cloud-based data to facilitate enhancing performance in connection with an industrial automation system
JP2016527741A (en) 2013-05-21 2016-09-08 エグザブロックス・コーポレーション Automatic data ring discovery and configuration
US9389651B2 (en) 2013-05-22 2016-07-12 Exablox Corporation Modular electronics chassis
WO2014201270A1 (en) 2013-06-12 2014-12-18 Exablox Corporation Hybrid garbage collection
WO2014205286A1 (en) 2013-06-19 2014-12-24 Exablox Corporation Data scrubbing in cluster-based storage systems
US8991950B2 (en) 2013-07-10 2015-03-31 Exablox Corporation Modular electronics chassis
US9934242B2 (en) 2013-07-10 2018-04-03 Exablox Corporation Replication of data between mirrored data sites
US20150066524A1 (en) 2013-09-05 2015-03-05 Dorsata, Inc. Methods and systems for the implementation of web based collaborative clinical pathways
WO2015054664A1 (en) 2013-10-11 2015-04-16 Exablox Corporation Hierarchical data archiving
US10248556B2 (en) 2013-10-16 2019-04-02 Exablox Corporation Forward-only paged data storage management where virtual cursor moves in only one direction from header of a session to data field of the session
US9985829B2 (en) 2013-12-12 2018-05-29 Exablox Corporation Management and provisioning of cloud connected devices
US9774582B2 (en) 2014-02-03 2017-09-26 Exablox Corporation Private cloud connected device cluster architecture
US11023453B2 (en) * 2015-01-29 2021-06-01 Hewlett Packard Enterprise Development Lp Hash index

Patent Citations (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5418947A (en) * 1992-12-23 1995-05-23 At&T Corp. Locating information in an unsorted database utilizing a B-tree
US5752243A (en) * 1993-10-20 1998-05-12 Microsoft Corporation Computer method and storage structure for storing and accessing multidimensional data
US6052688A (en) * 1995-01-26 2000-04-18 Hans Verner Thorsen Computer-implemented control of access to atomic data items
US6675173B1 (en) * 1998-01-22 2004-01-06 Ori Software Development Ltd. Database apparatus
US6411957B1 (en) * 1999-06-30 2002-06-25 Arm Limited System and method of organizing nodes within a tree structure
US20020161781A1 (en) * 2001-04-27 2002-10-31 Sun Microsystems, Inc. Method, system, program, and computer readable medium for indexing object oriented objects in an object oriented database
US20020194184A1 (en) * 2001-06-04 2002-12-19 Baskins Douglas L. System for and method of efficient, expandable storage and retrieval of small datasets
US20040133590A1 (en) * 2002-08-08 2004-07-08 Henderson Alex E. Tree data structure with range-specifying keys and associated methods and apparatuses
US7213040B1 (en) * 2002-10-29 2007-05-01 Novell, Inc. Apparatus for policy based storage of file data and meta-data changes over time
US20050063083A1 (en) * 2003-08-21 2005-03-24 Dart Scott E. Systems and methods for the implementation of a digital images schema for organizing units of information manageable by a hardware/software interface system
US20050257083A1 (en) * 2004-05-13 2005-11-17 Cousins Robert E Transaction-based storage system and method that uses variable sized objects to store data
US20090234821A1 (en) * 2004-09-15 2009-09-17 International Business Machines Corporation Systems and Methods for Efficient Data Searching, Storage and Reduction
US7346734B2 (en) * 2005-05-25 2008-03-18 Microsoft Corporation Cluster storage collection based data management
US20070185902A1 (en) * 2006-01-26 2007-08-09 Seagate Technology Llc Object-based data storage device
US20070294319A1 (en) * 2006-06-08 2007-12-20 Emc Corporation Method and apparatus for processing a database replica
US20070299867A1 (en) * 2006-06-23 2007-12-27 Timothy John Baldwin Method and System for Defining a Heirarchical Structure
US7853591B1 (en) * 2006-06-30 2010-12-14 Juniper Networks, Inc. Protection of database operations
US20100042703A1 (en) * 2006-11-17 2010-02-18 Koninklijke Philips Electronics N.V. Method and apparatus for assigning addresses to nodes of a communication network tree structure
US20090037500A1 (en) * 2007-07-31 2009-02-05 Kirshenbaum Evan R Storing nodes representing respective chunks of files in a data store
US20090182837A1 (en) * 2008-01-11 2009-07-16 Rogers J Andrew Spatial Sieve Tree
US20110225429A1 (en) * 2008-08-29 2011-09-15 Charalampos Papamanthou Cryptographic accumulators for authenticated hash tables
US20100146003A1 (en) * 2008-12-10 2010-06-10 Unisys Corporation Method and system for building a B-tree
US20110022573A1 (en) * 2009-07-27 2011-01-27 International Business Machines Corporation Preventing transfer and duplication of redundantly referenced objects across nodes of an application system
US20110314246A1 (en) * 2010-06-16 2011-12-22 Microsoft Corporation Hierarchical allocation for file system storage device
US20120030256A1 (en) * 2010-07-30 2012-02-02 Wolfgang Pfeifer Common Modeling of Data Access and Provisioning for Search, Query, Reporting and/or Analytics
US20120117067A1 (en) * 2010-10-29 2012-05-10 Navteq North America, Llc Method and apparatus for providing a range ordered tree structure
US20120331249A1 (en) * 2011-06-23 2012-12-27 CohortFS, LLC Dynamic data placement for distributed storage
US20130007007A1 (en) * 2011-06-29 2013-01-03 Nokia Corporation Method and apparatus for providing a list-based interface to key-value stores
US20130086303A1 (en) * 2011-09-30 2013-04-04 Fusion-Io, Inc. Apparatus, system, and method for a persistent object store
US8868926B2 (en) * 2012-04-06 2014-10-21 Exablox Corporation Cryptographic hash database
US20130346725A1 (en) * 2012-06-20 2013-12-26 Microsoft Corporation Structuring storage based on latch-free b-trees
US20140164284A1 (en) * 2012-12-12 2014-06-12 Michael Hartel Business Object Data Structures with Node Keys
US20140317065A1 (en) * 2013-04-23 2014-10-23 Exablox Corporation Reference counter integrity checking
US9864773B1 (en) * 2013-04-29 2018-01-09 Seagate Technology Llc Object-based commands with data integrity identifiers
US20150220578A1 (en) * 2014-02-04 2015-08-06 Exablox Corporation Content based organization of file systems
US20150278271A1 (en) * 2014-03-31 2015-10-01 Sandisk Enterprise Ip Llc Compaction of Information in Tiered Data Structure
US20150370502A1 (en) * 2014-06-19 2015-12-24 Cohesity, Inc. Making more active use of a secondary storage system
US20160063008A1 (en) * 2014-09-02 2016-03-03 Netapp, Inc. File system for efficient object fragment access
US20160210080A1 (en) * 2015-01-20 2016-07-21 Ultrata Llc Object memory data flow instruction execution
US20160306817A1 (en) * 2015-04-14 2016-10-20 Et International, Inc. Systems and methods for key-value stores

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10491387B2 (en) * 2016-11-15 2019-11-26 International Business Machines Corporation End-to-end encryption of a block storage device with protected key
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US12001962B2 (en) 2016-11-16 2024-06-04 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US12236354B2 (en) 2016-11-16 2025-02-25 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US10628407B1 (en) * 2017-02-27 2020-04-21 Amazon Technologies, Inc. Efficient multithreaded data structure processing
US20180373727A1 (en) * 2017-06-26 2018-12-27 Vmware, Inc. Management of b-tree leaf nodes with variable size values
US10698865B2 (en) * 2017-06-26 2020-06-30 Vmware, Inc. Management of B-tree leaf nodes with variable size values
US12323491B2 (en) * 2017-10-12 2025-06-03 Stanley Bruce Kinsey Cloud-based index and centralized cloud-based hub interface for cloud-stored media
US20240333794A1 (en) * 2017-10-12 2024-10-03 Stanley Bruce Kinsey Cloud-based index and centralized cloud-based hub interface for cloud-stored media
US12006497B2 (en) 2018-03-16 2024-06-11 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US11556897B2 (en) 2018-05-31 2023-01-17 Microsoft Technology Licensing, Llc Job-post budget recommendation based on performance
US10929385B2 (en) * 2018-06-22 2021-02-23 EMC IP Holding Company LLC Multi-level data deduplication for elastic cloud storage devices
US20190392062A1 (en) * 2018-06-22 2019-12-26 EMC IP Holding Company LLC Multi-level data deduplication for elastic cloud storage devices
US11658926B2 (en) 2018-06-27 2023-05-23 Microsoft Technology Licensing, Llc Generating smart replies involving image files
US11188194B2 (en) 2018-06-27 2021-11-30 Microsoft Technology Licensing, Llc Personalization and synonym hierarchy for smart replies
US11062084B2 (en) * 2018-06-27 2021-07-13 Microsoft Technology Licensing, Llc Generating diverse smart replies using synonym hierarchy
US12437841B2 (en) * 2018-08-03 2025-10-07 Catalog Technologies, Inc. Systems and methods for storing and reading nucleic acid-based data with error protection
US20200185057A1 (en) * 2018-08-03 2020-06-11 Catalog Technologies, Inc. Systems and methods for storing and reading nucleic acid-based data with error protection
US11636220B2 (en) * 2019-02-01 2023-04-25 Intertrust Technologies Corporation Data management systems and methods
US12002547B2 (en) 2019-05-09 2024-06-04 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US11671492B2 (en) * 2020-09-10 2023-06-06 EMC IP Holding Company LLC Multipart upload for distributed file systems
US20220188339A1 (en) * 2020-12-16 2022-06-16 Electronics And Telecommunications Research Institute Network environment synchronization apparatus and method
US12026181B2 (en) * 2020-12-16 2024-07-02 Electronics And Telecommunications Research Institute Network environment synchronization apparatus and method
US12204494B2 (en) * 2022-04-13 2025-01-21 Oracle International Corporation Implementing graph search with in-structure metadata of a graph-organized file system
US12001481B2 (en) * 2022-04-13 2024-06-04 Oracle International Corporation Graph-organized file system
US20230334014A1 (en) * 2022-04-13 2023-10-19 Oracle International Corporation Implementing graph search with in-structure metadata of a graph-organized file system
US20230334093A1 (en) * 2022-04-13 2023-10-19 Oracle International Corporation Graph-organized file system

Also Published As

Publication number Publication date
US20170063990A1 (en) 2017-03-02
US10474654B2 (en) 2019-11-12

Similar Documents

Publication Publication Date Title
US20170060924A1 (en) B-Tree Based Data Model for File Systems
US12164386B2 (en) Large content file optimization
US9542409B2 (en) Deduplicated file system
US9830324B2 (en) Content based organization of file systems
US10365974B2 (en) Acquisition of object names for portion index objects
US20150006495A1 (en) Methods and apparatuses to optimize updates in a file system based on birth time
US20100161608A1 (en) Methods and apparatus for content-aware data de-duplication
GB2520361A (en) Method and system for a safe archiving of data
US11221777B2 (en) Storage system indexed using persistent metadata structures
US20170242882A1 (en) An overlay stream of objects
US12164799B2 (en) Reducing memory usage in storing metadata
CN118861100A (en) Large object reading method, storage medium and device for database
US12014070B2 (en) Method, device, and computer program product for storage management
US11016933B2 (en) Handling weakening of hash functions by using epochs
Savić et al. The analysis and implication of data deduplication in digital forensics
US20180367313A1 (en) Secure memory and hierarchical structure and system therefor
US20260037389A1 (en) Opportunistic setting of relationships when cloning files across namespaces in a deduplication filesystem
CN118861064A (en) Database large object deletion method, storage medium and device
CN118885485A (en) Storage method and related products of temporary large objects in database
CN118861066A (en) Large object updating method, storage medium and device for database
CN118861068A (en) Methods for modifying large objects in databases and related products
CN118861063A (en) Database large object rewriting method, storage medium and device
CN118861040A (en) Large object deduplication processing method, storage medium and device for database
CN118861042A (en) Management method and related products for data table with large object column
CN118861039A (en) Large object storage method, storage medium and device for database

Legal Events

Date Code Title Description
AS Assignment

Owner name: EXABLOX CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FITZHARDINGE, JEREMY;REEL/FRAME:038635/0450

Effective date: 20160516

AS Assignment

Owner name: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT, CALI

Free format text: SECURITY INTEREST;ASSIGNORS:EXABLOX CORPORATION;STORAGECRAFT INTERMEDIATE HOLDINGS, INC.;STORAGECRAFT ACQUISITION CORPORATION;AND OTHERS;REEL/FRAME:041748/0849

Effective date: 20170324

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: STORAGECRAFT TECHNOLOGY CORPORATION, MINNESOTA

Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852

Effective date: 20210316

Owner name: EXABLOX CORPORATION, MINNESOTA

Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852

Effective date: 20210316

Owner name: STORAGECRAFT ACQUISITION CORPORATION, MINNESOTA

Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852

Effective date: 20210316

Owner name: STORAGECRAFT INTERMEDIATE HOLDINGS, INC., MINNESOTA

Free format text: TERMINATION AND RELEASE OF PATENT SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT;REEL/FRAME:055614/0852

Effective date: 20210316