WO2022177566A1 - Data access processing on network-attached storage devices - Google Patents
- Publication number
- WO2022177566A1 (PCT/US2021/018552)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- identifier
- data object
- storage devices
- processing operation
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/256—Integrating or interfacing systems involving database management systems in federated or virtual databases
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
Definitions
- The present disclosure relates to the processing of ordered data accesses in computing devices.
- Storage systems are often associated with processing inefficiencies for streaming partial or unaligned updates. For example, performing partial updates to a storage device, such as updates that are smaller than the device mapping unit, includes reading data that is larger than the partial updates, modifying the data, and storing back the modified data.
- data volatility (e.g., volatility once the data is read from the storage device); and
- acknowledgment delays (e.g., data update acknowledgment is communicated after the updated data is written back into storage).
- a system for processing data in a distributed storage network includes a plurality of storage devices and processing circuitry coupled to the plurality of storage devices, each of the plurality of storage devices storing a copy of a data object.
- the processing circuitry is configured to perform operations including generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing the data object, the plurality of sequence identifiers generated in sequential order.
- a subset of storage devices of the plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries.
- a data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices.
- Each data entry of the plurality of data entries includes a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers.
- the plurality of data entries are arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- a data processing operation is executed for each data entry of the plurality of data entries using the data object.
- the data processing operation corresponds to the data processing operation identifier, and the data processing operation is executed according to an execution order based on the sequence identifier.
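The flow described above (sequence identifiers generated in sequential order, data entries streamed to a storage device, arranged in a buffer per data object, and executed in order) can be sketched in Python. All names and data layouts here are illustrative, not taken from the claims.

```python
import itertools

# Sequencer: monotonically increasing sequence identifiers.
_seq = itertools.count()

def make_entries(object_id, op_id, payloads):
    """Tag each payload with a data object identifier, a data processing
    operation identifier, and a sequence identifier."""
    return [{"object_id": object_id, "op_id": op_id,
             "seq": next(_seq), "payload": p} for p in payloads]

def execute_stream(entries, data_object, ops):
    """Arrange entries by (data object identifier, sequence identifier)
    and apply each operation to the data object in that order."""
    for e in sorted(entries, key=lambda e: (e["object_id"], e["seq"])):
        data_object = ops[e["op_id"]](data_object, e["payload"])
    return data_object

ops = {"append": lambda obj, seg: obj + seg}   # e.g., an Append operation
entries = make_entries("obj-1", "append", [b"ab", b"cd"])
print(execute_stream(entries, b"", ops))       # prints b'abcd'
```

Even if the entries arrive out of order, sorting on the sequence identifier restores the intended execution order before the operations are applied.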
- the selecting of the subset of storage devices of the plurality of storage devices is based on the sharding instance identifier and a redundancy group associated with the plurality of data entries.
- the outcome of the executing, or post-processing of the outcome based on redundancy requirements, is stored into a mapping unit (such as a sector) of the storage device.
- each data entry of the plurality of data entries further includes availability information identifying one of a plurality of versions of the data object stored at the storage device.
- the processing circuitry is further configured to perform operations including executing the data processing operation using the one of a plurality of versions of the data object identified by the availability information.
- the plurality of versions of the data object identified by the availability information includes one of a plurality of redundancy versions associated with a corresponding plurality of copies of the data object, or a plurality of erasure-coded versions associated with data information or parity information of the data object.
- the processing circuitry is further configured to perform operations including selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying the subset of storage devices.
- the data processing operation is associated with a plurality of redundancy versions of the data object.
- the processing circuitry is further configured to perform operations including applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify the storage device of the subset of storage devices.
- the data stream with the plurality of data entries is communicated to the storage device of the subset of storage devices, each data entry of the plurality of data entries further including a redundancy version of the plurality of redundancy versions of the data object and a data segment for appending to the redundancy version of the data object during execution of the data processing operation.
- the processing circuitry is further configured to perform operations including arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- the data processing operation is executed for each data entry of the plurality of data entries using the redundancy version of the data object and according to the execution order based on the sequence identifier.
- the data processing operation is associated with a plurality of erasure-coded versions of the data object.
- the processing circuitry is further configured to perform operations including selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier.
- the selected sharding instance identifies a plurality of redundancy groups of storage devices of the plurality of storage devices.
- the processing circuitry is further configured to perform operations including applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify a redundancy group of the plurality of redundancy groups, the identified redundancy group including the subset of storage devices.
- the processing circuitry is further configured to perform operations including communicating the data stream with the plurality of data entries to each storage device of the subset of storage devices.
- Each data entry of the plurality of data entries further includes a corresponding erasure-coded version of the plurality of erasure-coded versions of the data object and a data segment for appending to the corresponding erasure-coded version of the data object during execution of the data processing operation at each storage device of the subset of storage devices.
- the processing circuitry is further configured to perform operations including, for each storage device of the subset of storage devices, arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- the data processing operation for each data entry of the plurality of data entries is executed using the corresponding erasure-coded version of the data object and according to the execution order based on the sequence identifier.
- the processing circuitry is further configured to perform operations including, for each storage device of the subset of storage devices, arranging the plurality of data entries in a buffer based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- the data processing operation for each data entry of the plurality of data entries is executed using an erasure-coded version of the plurality of erasure-coded versions of the data object and according to the execution order based on the sequence identifier, to generate a corresponding erasure-coded result.
- the corresponding erasure-coded result is communicated to each storage device of the subset of storage devices for storage.
- a computer-implemented method for processing data in a distributed storage network includes generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order.
- a subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries, the plurality of storage devices storing a copy of the data object.
- a data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers.
- the plurality of data entries is arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- a data processing operation is executed for each data entry of the plurality of data entries using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
- each data entry of the plurality of data entries further includes availability information identifying one of a plurality of versions of the data object stored at the storage device.
- the method further includes executing the data processing operation using the one of a plurality of versions of the data object identified by the availability information.
- the plurality of versions of the data object identified by the availability information includes one of a plurality of redundancy versions associated with a corresponding plurality of copies of the data object, or a plurality of erasure- coded versions associated with data information or parity information of the data object.
- selecting the subset of storage devices further includes selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying the subset of storage devices.
- the data processing operation is associated with a plurality of redundancy versions of the data object.
- the method further includes applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify the storage device of the subset of storage devices.
- the data stream with the plurality of data entries is communicated to the storage device of the subset of storage devices.
- Each data entry of the plurality of data entries further includes a redundancy version of the plurality of redundancy versions of the data object and a data segment for appending to the redundancy version of the data object during the execution of the data processing operation.
- the method further includes arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- the data processing operation is executed for each data entry of the plurality of data entries using the redundancy version of the data object and according to the execution order based on the sequence identifier.
- a non-transitory computer-readable medium storing instructions for processing data in a distributed storage network.
- When executed by one or more processors of a network node within the distributed storage network, the instructions cause the one or more processors to perform operations including generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order.
- a subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries, the plurality of storage devices storing a copy of the data object.
- a data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers.
- the plurality of data entries is arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- a data processing operation is executed for each data entry of the plurality of data entries using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
- the instructions cause the one or more processors to perform operations including selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying a plurality of redundancy groups of storage devices of the plurality of storage devices.
- the data processing operation is associated with a plurality of erasure-coded versions of the data object.
- Executing the instructions causes the one or more processors to perform operations including applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify a redundancy group of the plurality of redundancy groups, the identified redundancy group comprising the subset of storage devices.
- executing the instructions causes the one or more processors to perform operations including, for each storage device of the subset of storage devices, arranging the plurality of data entries in a buffer based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- the data processing operation for each data entry of the plurality of data entries is executed using an erasure-coded version of the plurality of erasure-coded versions of the data object and according to the execution order based on the sequence identifier, to generate a corresponding erasure-coded result.
- the corresponding erasure-coded result is communicated to each storage device of the subset of storage devices for storage.
- an apparatus in a distributed storage network includes a plurality of storage devices, each of the plurality of storage devices storing a copy of a data object.
- the apparatus further includes a sequencer module for generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order.
- the apparatus further includes a switching module (e.g., a network switch) for selecting a subset of storage devices of a plurality of storage devices storing a copy of the data object, the selecting based on a sharding instance identifier associated with the plurality of data entries.
- the switching module is further for communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers.
- the apparatus further includes a storage controller module (e.g., a storage controller) for arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers.
- the storage controller module is further for executing, for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
- FIG. 1 is a block diagram illustrating a network architecture including a sequencer generating sequence identifiers, a naming server generating data object naming sequences, and multiple computing devices configured to access multiple network-attached storage devices based on the data object naming sequences and the sequence identifiers, according to some embodiments.
- FIG. 2 is a block diagram illustrating example data object naming sequences, according to some embodiments.
- FIG. 3A is a block diagram illustrating processing a plurality of data entries associated with the same data object identifier, according to example embodiments.
- FIG. 3B is a block diagram illustrating processing a plurality of data entries associated with multiple data object identifiers, according to example embodiments.
- FIG. 4 is a block diagram illustrating a storage controller configured to perform erasure coding operations on streaming data within a storage device, according to example embodiments.
- FIG. 5 is a communication flow diagram of example communications associated with the execution of an Append data processing operation for data object replicas in a distributed storage network using data object naming sequences and shard mapping to multiple storage devices in a redundancy group, according to some embodiments.
- FIG. 6 is a block diagram illustrating the network architecture of FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group, according to some embodiments.
- FIG. 7 is a communication flow diagram of example communications associated with the execution of the Append data processing operation of FIG. 6, according to some embodiments.
- FIG. 8 is a block diagram illustrating the network architecture of FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group, where pre-processing of the Append operation is performed at the network switch so that the amount of data transmitted between the switch and the storage device can be reduced, according to some embodiments.
- FIG. 9 is a flowchart of another method for processing a stream of data accesses within a storage device, according to example embodiments.
- FIG. 10 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various device hardware described herein, according to example embodiments.
- FIG. 11 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to example embodiments.
- “data accesses” or “ordered data accesses” refer to a stream of data entries used for accessing data objects at network-attached storage devices.
- An example data entry may include a data object identifier (e.g., an identifier, or ID, of at least one data object that the data entry is associated with), a data processing operation identifier (e.g., an identifier representing a data processing operation to be performed on the data object), and a sequence identifier for ordering data processing operations associated with the same data object identifier.
- the stream of data entries is also referred to as ordered data accesses.
- the term “data processing operation” includes at least one of a Create data processing operation (to generate a data object at a storage device), a Read data processing operation (for reading a data object), a Modify data processing operation such as an Append data processing operation (for appending data to an existing data object), and a Delete data processing operation (for deleting a data object).
- “client,” “client computing device,” “computing device,” and “computer device” are used interchangeably and indicate network devices configured to access network-attached storage devices within a data processing network.
- “network-attached storage device” and “storage device” are interchangeable.
- “redundancy group” indicates a grouping of storage devices that can be used for storing related versions of a data object (e.g., data object replicas or erasure-coded versions of a data object including data portions and parity bit portions).
- Storage systems, most of which have inherent mapping units, are often associated with processing inefficiencies for streaming partial or unaligned data object updates.
- any partial updates to the storage device (e.g., updates smaller than a mapping unit such as a sector) are carried out using read-modify-write operations.
- in a read-modify-write operation, an entire sector is read out from the storage device into a temporary buffer, the temporary data is modified with the partial update, and then the whole sector (including the updated portion) is written back to the storage device (an example read-modify-write operation is discussed in connection with FIG. 1).
- the temporary data is held outside of the storage device and is thus volatile while stored in the temporary buffer (e.g., a memory buffer); in the event of a power loss, all data stored in the temporary buffer may be lost.
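The read-modify-write cycle described above can be sketched as follows; the sector size and function names are illustrative assumptions, with the device modeled as a flat byte array.

```python
SECTOR = 512  # assumed mapping-unit (sector) size

def read_modify_write(device, offset, update):
    """Apply a partial update via read-modify-write: read the whole sector
    containing the update, patch it in a temporary buffer, and write the
    entire sector back, even though only a few bytes changed."""
    start = (offset // SECTOR) * SECTOR
    sector = bytearray(device[start:start + SECTOR])  # read full sector
    rel = offset - start
    sector[rel:rel + len(update)] = update            # modify in temporary buffer
    device[start:start + SECTOR] = sector             # write whole sector back
```

Note that the temporary `sector` buffer lives in volatile memory, which is exactly the exposure window the passage above describes.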
- Storage systems may also be associated with processing inefficiencies when there are multiple partial updates to the same sector or adjacent sectors, and some of the updates are not sector aligned.
- a solution to this issue is to employ a coalescing buffer outside the storage device for temporarily holding the updates, coalescing the partial updates into one full sector, then flushing the full sector back to the device. Even though such processing removes the long read-modify-write cycle, additional inefficiencies are introduced. For example, some of the inefficiencies are associated with the coalescing buffer. First, the coalescing buffer is volatile, and the stored data can be lost in the case of power loss or a software error. Second, since the write operation cannot be acknowledged until the buffer is fully flushed to the storage device, a potential indefinite delay in acknowledging the update may be introduced. Third, for a misaligned update spanning across multiple sectors, extra logic is needed to ensure the atomicity of the update.
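The coalescing-buffer approach described above, together with its first two drawbacks (volatility and delayed acknowledgment), can be sketched as follows; the class and field names are illustrative.

```python
SECTOR = 512  # assumed mapping-unit (sector) size

class CoalescingBuffer:
    """Hold partial updates for one sector outside the storage device and
    flush only once the full sector is covered. Until then the data is
    volatile and the writes cannot be acknowledged."""
    def __init__(self, sector_index):
        self.start = sector_index * SECTOR
        self.data = bytearray(SECTOR)
        self.covered = set()          # byte offsets written so far

    def update(self, offset, payload):
        rel = offset - self.start
        self.data[rel:rel + len(payload)] = payload
        self.covered.update(range(rel, rel + len(payload)))

    def flush_if_full(self, device):
        if len(self.covered) == SECTOR:                 # one full sector coalesced
            device[self.start:self.start + SECTOR] = self.data
            return True                                 # updates may now be acknowledged
        return False                                    # ack still pending
```

The `flush_if_full` return value makes the acknowledgment delay visible: a partial update can wait indefinitely for the rest of the sector to arrive.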
- Naming schemes that are local to a storage device are not scalable and do not provide data object naming that is unique and usable for routing by network devices within a distributed storage network. Most distributed systems employ a naming service to map a system-wide unique name to a name local to a storage device.
- Techniques disclosed herein can be used to perform processing of ordered data accesses that is agnostic to mapping unit size and may be used for efficiently processing partial or unaligned data object updates.
- the disclosed techniques also eliminate (or reduce) the read-modify-write operations for partial updates, reduce data movement and bus bandwidth utilization, and reduce latency for individual update operations.
- the disclosed techniques use a storage device (e.g., a hard disk drive or a solid-state drive) controller to process ordered data accesses including data entries for performing data object updates (e.g., partial and/or unaligned data object updates).
- each of the data entries received as part of a stream of ordered data accesses may include a data object identifier (e.g., an ID of at least one data object that the data entry is associated with), a data processing operation identifier (e.g., an identifier representing a data processing operation to be performed on the data object such as a Create, Read, Write, Append, Verify, Delete, or another data modification operation), and a sequence identifier (e.g., sequence identifier SeqNum which can be used for ordering data processing operations associated with the same stream identifier).
- a data entry received by the storage device controller is tagged with a unique ID for a particular data object and a sequence identifier (or number) (e.g., SeqNum) that can be used to order all the outstanding operations with regard to that data object.
- the storage device controller is configured to manage an array of buffers (e.g., coalescing buffers or other types of power-protected buffers) to temporarily store the updates.
- the storage device controller can reconstruct the correct order of data processing operations and apply the operations to the data object in the correct order.
- any operation that is a successor to a previously acknowledged operation can be acknowledged after its execution.
- the buffer can be flushed out to make room for more operations.
- additional data processing such as erasure coding, can be applied to the buffer before the flushing operation.
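The controller-side behavior sketched in the last few bullets (buffer out-of-order entries, reconstruct the correct order using SeqNum, acknowledge each operation once its predecessors have executed) can be illustrated as follows; the class is a simplified sketch that omits flushing and erasure coding, and all names are illustrative.

```python
class StorageControllerBuffer:
    """Apply data entries in sequence-identifier order even when they
    arrive out of order; an entry is held in the buffer until all of
    its predecessors have been executed and acknowledged."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}    # sequence identifier -> (operation, payload)
        self.acked = []      # acknowledged sequence identifiers, in order

    def receive(self, seq, op, payload, data_object):
        self.pending[seq] = (op, payload)
        # Drain the buffer while the next expected sequence ID is present.
        while self.next_seq in self.pending:
            op_fn, p = self.pending.pop(self.next_seq)
            data_object = op_fn(data_object, p)
            self.acked.append(self.next_seq)  # ack only after predecessors ran
            self.next_seq += 1
        return data_object
```

A usage sketch: if entry 1 arrives before entry 0, it is buffered without acknowledgment; once entry 0 arrives, both execute and are acknowledged in order.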
- a naming server is used to provide a flexible naming service using data object naming sequences for addressing data objects stored at different locations in a distributed storage network. More specifically, the data object naming sequences are long enough to provide a unique reference globally within the distributed storage network, allowing for use of the naming sequence within individual storage devices. Additionally, the data object naming sequences include partition information on how the data object is partitioned within the distributed system (e.g., sharding information).
- the data object naming sequences include availability information identifying multiple versions of the data object (e.g., the availability information may indicate the existence of replicas, erasure-coded versions, etc.).
- a data object naming sequence includes a data object name, sharding information (e.g., information associated with shard mapping used for selecting a sharding instance with one or more storage devices storing the data object), and availability information.
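A data object naming sequence with the three parts just listed might look like the following; every field name and value here is a hypothetical layout for illustration, not a format defined by the disclosure.

```python
# Hypothetical layout of a data object naming sequence.
naming_sequence = {
    "object_name": "obj-7f3a",               # globally unique within the network
    "sharding": {
        "instance_id": 2,                    # selects a sharding instance
        "hash_input": "obj-7f3a",            # input to the instance's hash function
    },
    "availability": {
        "kind": "replica",                   # replicas vs. erasure-coded versions
        "version": 0,                        # which version this access targets
    },
}
```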
- the disclosed techniques are used for performing data accesses (e.g., an Append data processing operation) from multiple clients in connection with data objects stored at multiple storage devices (e.g., storage devices within a redundancy group).
- a sequencer module is used to generate a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object at a network-attached storage device.
- the sequencer module may be configured as a centralized sequencer module generating multiple sequence identifiers used by multiple computer devices to access data objects (e.g., different copies of the data object stored at different storage devices).
- the sequencer module may also be configured as a distributed sequencer module associated with a specific computer device and generating sequence identifiers used by the specific computer device to access data objects at storage devices.
- An overview of a distributed storage network using the disclosed naming scheme and a sequencer module is provided in connection with FIG. 1.
- A more detailed view of data object naming sequences is illustrated in FIG. 2.
- A more detailed description of processing data entries associated with a data object identifier is provided in connection with FIG. 3A-FIG. 4.
- the discussed techniques for processing data accesses can be implemented, for example, using firmware that is executed by one or more processors on a computer device (e.g., a dedicated network node), or via hardware (e.g., multiple processors associated with multiple computer devices) configured to perform the disclosed functions and facilitate the processing of the data accesses.
- the network architecture 100 (which also can be referred to as distributed storage network 100) includes a plurality of computer devices such as computer devices 104, 106, ..., 108 configured to access storage devices 128, 130, 132, ..., 134 via network routers 112, 114, network switches 116, 118, and network interconnect 160 associated with the communication network 110.
- storage devices 128, 130, 132, ..., 134 are communicatively coupled to computer devices 104, 106, ..., 108 via network routers 112, 114 and network interconnect 160 associated with the communication network 110.
- the network interconnect 160 is part of the communication network 110, which includes the network routers 112, 114 and network switches 116, 118.
- the network interconnect 160 includes a Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) interface, a Peripheral Component Interconnect express (PCIe) interface, or another type of interconnect.
- the network architecture 100 includes a sequencer module 162 which can be used for generating sequence identifiers for other computer devices (e.g., computer devices 104, 106, ..., 108) in connection with ordered access to one or more data objects stored at storage devices 128 — 134. More specifically, the sequencer module 162 is configured to generate individual sequence identifiers or a range of sequence identifiers for accessing a data object based on a request from one or more of the computer devices 104 - 108.
- the network architecture 100 further includes a naming server 102.
- the naming server 102 comprises suitable circuitry, logic, interfaces, and/or code and is configured to perform data object naming functionalities discussed herein. More specifically, the naming server 102 is configured to generate data object naming sequences 148 upon creation of a data object (e.g., using a Create data processing operation 150). Additionally, the naming server 102 is further configured to update the data object naming sequences 148 based on the execution of other data processing operations, such as a Read data processing operation 154, an Append data processing operation 156, or a Delete data processing operation 152. In an example embodiment, the naming server 102 uses a shard mapping 144 to determine the sharing information associated with the data object naming sequences for a data object.
- shard mapping indicates a mapping of a sharding instance identifier to a set of storage devices and a hashing function (collectively referred to as a sharding instance).
- sharding information such as a sharding instance identifier can be used as an input into the shard mapping to obtain the set of storage devices which can be used for performing one or more data processing operations.
- the sharding information is a tuple including the sharding instance identifier together with an input value such as hashing input information (also referred to as a shard information tuple, or SIT), which is used as input into the shard mapping to obtain the set of storage devices (e.g., using the sharding instance identifier to select a sharding instance) as well as select a specific storage device from the set (e.g., by applying the hashing function to the hashing input information).
- a sharding instance includes a list of storage devices and the input value is used for selecting one of the storage devices to receive a corresponding data processing operation.
- the sharding instance includes a list of redundancy groups of storage devices (e.g., each redundancy group includes two or more storage devices) and the input value is used for selecting one of the redundancy groups so that storage devices within the selected redundancy group receive the corresponding data processing operation.
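- The shard mapping lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the device identifiers, the use of SHA-256 (a hashing function mentioned later in connection with FIG. 5), and the modulo selection of a redundancy group are assumptions for the example.

```python
import hashlib

# Hypothetical shard mapping: each sharding instance pairs a list of
# redundancy groups (each a tuple of storage device ids) with a hash function.
SHARD_MAPPING = {
    "SHARD_1": {
        "groups": [("A", "B", "C"), ("B", "C"), ("C", "N")],  # RG1, RG2, RG3
        "hash": lambda s: int.from_bytes(hashlib.sha256(s.encode()).digest(), "big"),
    },
}

def resolve_sit(instance_id: str, hash_input: str) -> tuple:
    """Resolve a shard information tuple (SIT) to a redundancy group.

    The sharding instance identifier selects a sharding instance, and the
    instance's hashing function, applied to the hashing input information,
    selects one redundancy group from the instance's list.
    """
    instance = SHARD_MAPPING[instance_id]
    index = instance["hash"](hash_input) % len(instance["groups"])
    return instance["groups"][index]
```

Because the hash is deterministic, every device holding the same shard mapping resolves the same SIT to the same redundancy group, which is what lets the network switch route requests without coordinating with the naming server on each access.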
- Example communication flows associated with data processing operations using data object naming sequences and shard mappings are illustrated in connection with FIG. 5 - FIG. 8.
- the network switch 116 is configured with a shard mapping 146 which can be used to determine sharding information when performing an Append data processing operation 156 or a Read data processing operation 154.
- the Create data processing operation 150 and the Delete data processing operation 152 are performed by the naming server 102 using shard mapping 144. Since shard mapping 144 and shard mapping 146 may be updated during the execution of data processing operations, shard mapping 144 and shard mapping 146 are synchronized with each other via communication link 158.
- Even though FIG. 1 illustrates the use of shard mappings and data object naming sequences by the naming server 102 and the network switch 116, the disclosure is not limited in this regard and such functionalities can be consolidated within a single computer device using a data entry processing module performing the disclosed functionalities of the naming server 102, the network switch 116, the sequencer module 162, as well as data object processing functions of a storage controller (e.g., the processing functions performed by one or more of storage controllers 136 - 142 of storage devices 128 - 134).
- An example software architecture and a computer device using such a data object naming module are illustrated in FIG. 10 and FIG. 11, respectively.
- Computer devices 104-108 as well as the naming server 102 can be mobile devices or another type of computing device such as a compute node or a storage node in the network architecture 100.
- the storage devices 128 - 134 store data objects accessed by the naming server 102 or by computer devices 104, ..., 108 via the network switch 116 and the network interconnect 160.
- different copies of data objects are stored at different storage devices for redundancy or error coding purposes as indicated by the availability information in the corresponding data object naming sequence associated with a data object.
- three erasure-coded (EC) versions of data object D1 are stored at corresponding storage devices 128, 130, and 132, and are accessed via the network switch 116.
- the availability information for data object D1 includes 2+1 EC information reflected by the suffix E1/2 (e.g., data portion 1 of 2), suffix E2/2 (e.g., data portion 2 of 2), and suffix EP (e.g., parity bits portion).
- Two replica versions of data object D2 are stored at corresponding storage devices 130 and 132 and are accessed via the network switch 116.
- the availability information for data object D2 includes replica (or copy) information reflected by the suffix R1 (e.g., the first replica of object D2) and suffix R2 (e.g., the second replica of object D2).
- two replica versions of data object D3 are stored at corresponding storage devices 132 and 134 and are accessed via the network switch 116.
- the availability information for data object D3 includes replica (or copy) information reflected by the suffix R1 (e.g., the first replica of object D3) and suffix R2 (e.g., the second replica of object D3).
- Each of the storage controllers 136 - 142 comprises suitable circuitry, logic, interfaces, and/or code and is configured to manage access to the storage devices 128 - 134, including configuring and managing operations for processing data entries in connection with ordered access to data objects as well as managing operations associated with the distributed naming scheme discussed herein (e.g., as described in connection with FIG. 3A, FIG. 3B, and FIG. 4).
- the storage controllers 136 - 142 are configured to perform operations for processing ordered data accesses using one or more power-protected buffers (e.g., storage areas 172 and 178 in FIG. 1).
- power-protected buffer indicates a buffer implemented as part of a storage device or non-volatile memory, which buffer persists any stored information upon power failure.
- the power-protected buffers include multiple storage areas that can be sized to match a mapping unit of the corresponding storage device (e.g., sector 176 of the storage device 130 and sector 182 of the storage device 132).
- FIG. 1 further illustrates an example communication flow (which is also discussed in greater detail in connection with FIG. 5) of a stream of data entries for performing the append data processing operation to multiple replicas of the same data object from two client devices (e.g., computer devices 104 and 106). More specifically, computer device 104 communicates a stream 164 of data entries to switch 116 via router 112, and computer device 106 communicates a stream 166 of data entries to switch 116 via router 112.
- Each of the data entries in streams 164 and 166 can include a data processing operation identifier to identify the Append data processing operation 156 (or another data processing operation) to be performed in connection with replica versions R1 and R2 for data object D2 stored at storage devices 130 and 132.
- the network switch 116 uses shard mapping 146 to determine a redundancy group of storage devices associated with the requested data accesses of the data entries within streams 164 and 166. For example, the network switch 116 uses sharding information within the data entries of streams 164 and 166 to select a sharding instance of redundancy groups and select one of the redundancy groups (e.g., a redundancy group including storage devices 130 and 132) for communication of the data entries of streams 164 and 166.
- Data entries of streams 164 and 166 are multiplexed by the switch 116 and communicated as streams 168 and 170 of data entries to corresponding storage devices 130 and 132.
- the data entries in streams 164 and 166 include corresponding data object naming sequences (e.g., such as data object naming sequences 148 illustrated in FIG. 2).
- storage controller 138 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 168 to rearrange the data entries as stream 174 (e.g., sequentially, based on the sequence identifier) using a buffer including the storage area 172.
- When the storage area 172 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D2.R1 associated with the data entries) and the result is flushed to a mapping unit of the storage device 130 such as sector 176.
- storage controller 140 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier) and a sequence identifier within each of the data entries of stream 170 to rearrange the data entries as stream 180 (e.g., sequentially, based on the sequence identifier) in a buffer including the sector-sized storage area 178.
- When the sector-sized storage area 178 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D2.R2 associated with the data entries) and the result is flushed to a mapping unit of the storage device 132 such as sector 182.
- FIG. 2 is a block diagram illustrating example data object naming sequences 148, according to some embodiments.
- the data object naming sequences 148 include data object naming sequence (DONS) 200 and DONS 201.
- the DONS 200 includes a data object identifier 202 (e.g., a data object name), sharding information 204, and availability information 206.
- the availability information 206 may include one of the following: erasure coding information 208, replica information 210, reliability information 212, and service level agreement (SLA) information 214.
- the DONS 201 includes a data object name 216, sharding information 218, and availability information 220.
- the availability information 220 may include one of the following: erasure coding information 222, replica information 224, reliability information 226, and SLA information 228.
- the sharding information 204 may include a sharding instance identifier which is used as an input into the shard mapping to obtain a sharding instance with the set of storage devices that can be used for performing one or more data processing operations.
- the sharding information 204 is a shard information tuple (or SIT) including the sharding instance identifier and an input value such as hashing input information.
- the input value is used as input into the shard mapping to obtain the set of storage devices (e.g., using the sharding instance identifier to select a sharding instance) as well as select a specific storage device from the set (e.g., by applying the hashing function to the hashing input information).
- the availability information 206 used as part of the DONS is used to describe the relationship between different versions of a data object residing on multiple storage devices, which versions belong to the same logical data object identified by the data object identifier 202.
- the availability information portion of the DONS can be used to provide availability in the case of device failures and to facilitate data recovery.
- the erasure coding information 208 is described in connection with FIG. 1 and may include different suffixes (e.g., EC1/2, EC2/2, ECP/2, ECQ/2) indicating whether the specific erasure-coded data object version is a data portion or a parity bits portion based on the EC type that was used when the data object was created.
- the replica information 210 is described in connection with FIG. 1 and may include suffixes (e.g., R1/3, R2/3, R3/3) indicating the total number of replicas (or copies) and the specific replica number out of the total number of replicas (e.g., suffix R1/3 indicates the data object associated with the DONS having this availability information suffix is the first replica out of a total of three replicas).
- the reliability information 212 indicates the reliability of the storage device for accessing the particular data object.
- a hard disk drive may be associated with a reliability of 0.99999s (or 5_9s) and a data object stored at such HDD may have availability information represented as suffix 5_9s.
- a solid-state drive may be associated with a reliability of 0.9999999s (or 7_9s) and a data object stored at such SSD may have availability information represented as suffix 7_9s.
- the SLA information 214 indicates the latency associated with accessing a specific data object associated with a DONS using the SLA information.
- the SLA information may include the following SLA suffixes: .10us, .100us, .1ms, and .10ms, which indicate the access latency associated with a particular data object.
- an example DONS can be D1.Shard1_X.R1/3, which can be used for determining the following information: (a) the data object name is D1; (b) the sharding information includes a SIT consisting of a sharding instance identifier Shard1 and hashing input information X; and (c) availability information is R1/3 indicating the particular data object associated with this DONS is a first replica (or copy) out of a total of three replicas of the data object.
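- A DONS string of this shape can be split into its three fields with a small helper. This is an illustrative sketch that assumes the dot-separated layout and the "instance_hashinput" SIT encoding of the example above; the patent does not prescribe a wire format.

```python
def parse_dons(dons: str):
    """Split a DONS string into (object name, SIT, availability suffix).

    Assumes the three dot-separated fields of the example in the text,
    with the SIT encoded as '<instance id>_<hashing input>'.
    """
    name, sit_field, availability = dons.split(".")
    instance_id, _, hash_input = sit_field.partition("_")
    return name, (instance_id, hash_input), availability
```

For instance, `parse_dons("D1.Shard1_X.R1/3")` yields the object name `D1`, the SIT `("Shard1", "X")`, and the availability suffix `R1/3`.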
- Even though FIG. 2 illustrates four different types of availability information 206, the disclosure is not limited in this regard and other types of availability information may be used as well.
- FIG. 3A is a block diagram 300A illustrating processing the plurality of data entries 312-318 associated with stream 174 of data accesses, according to example embodiments.
- Each of the data entries 312 - 318 is configured to include a data object identifier of a data object the data entry is associated with, a data processing operation identifier (e.g., an identifier representing a data processing operation to be performed on the data object), a data packet (e.g., data for use with an Append data processing operation or another data modification operation), and a sequence identifier.
- each of the data entries 312 - 318 includes the data object identifier (ID) 302 which identifies a data object (e.g., D2) the particular data entries are associated with.
- FIG. 3A illustrates the data object ID 302 as a separate element outside of the data entries, such illustration is for simplicity, and each data entry 312 - 318 includes the data object ID 302.
- Data entries 312, 314, 316, and 318 also include corresponding sequence identifiers (or sequence numbers) (referenced interchangeably as SEQ or SeqNum) 304, 306, 308, and 310, corresponding data packets 1, 2, 3, and 4, as well as corresponding data processing operation identifiers (also referred to as data processing commands or CMD) 305, 307, 309, and 311 to be performed using the corresponding data packets 1 - 4.
- the sequence identifiers 304 - 310 indicate the order in which the corresponding data processing operations 305 - 311 (for data entries within stream 174 that are associated with data object ID 302) should be executed.
- the data processing operation is a Create data processing operation, a Read data processing operation, a Write data processing operation, an Append data processing operation, a Verify data processing operation, a Delete data processing operation, or another data modification operation to be performed on the identified data object.
- each of the data entries 312 - 318 of the stream 174 further includes a sequence validator function (or SEQ VLDTR) 303.
- the sequence validator function 303 is used to ensure that there are no duplicate data entries within a stream of ordered data accesses.
- the storage controller 138 can use the sequence validator function 303 to detect that one of the data entries 312 - 318 of the stream 174 is older than a corresponding flushed data entry stored within storage device 130. Such detection would indicate that the incoming entry is a duplicate and the storage controller 138 can discard the incoming entry.
- the storage controller 138 stores the corresponding data packets 1 - 4 in one or more storage areas (e.g., storage area 172 which may be sector-sized) of the power-protected buffer 313.
- data packets 1 - 4 are stored in the storage area 172 of the power-protected buffer 313 sequentially, based on the corresponding sequence identifiers 304 - 310.
- the data packets 1 - 4 are stored in the correct order inside the storage area 172, and the storage controller 138 can apply corresponding data processing operations 305 - 311 in the order the data entries are arranged using the sequence identifiers 304 - 310. More specifically, the storage controller 138 executes the data processing operation for each of the data entries 312-318 according to an execution order based on the corresponding sequence identifier. As data processing operations are executed in order according to the sequence identifiers associated with the same data object ID, an acknowledgment can be communicated upon completion of the execution of each data processing operation. Example processing of data entries associated with multiple data streams of ordered data accesses is discussed in connection with FIG. 3B.
- the storage area 172 is a sector-sized storage area, matching the mapping unit (namely, a sector such as a sector 176) of the storage device 130. Additionally, as data packets associated with incoming data entries are stored in storage area 172, the last stored data packet may be partial. For example, as illustrated in FIG. 3A and FIG. 3B, data packet 3 may be only partially stored inside the storage area 172, and a remaining portion of data packet 3 may be stored in another storage area of the power-protected buffer 313 managed by the storage controller 138 (or within the same storage area after the storage controller 138 flushes the stored data into sector 176 of the storage device 130).
- storage area 172 contains corresponding data packets associated with acknowledged updates.
- the storage controller 138 is further configured to detect when a size of stored data packets in the storage area 172 is equal to the size of the mapping unit of the storage device 130 (namely, when the size of the storage area 172 equals the size of a sector of the storage device 130), and flush the contents of the storage area 172 from the power-protected buffer 313 into the sector 176 of the storage device 130 based on the detecting.
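- The buffering behavior described in connection with FIG. 3A (reordering data entries by sequence identifier, discarding duplicates via the sequence validator, and flushing once a sector-sized storage area is full) can be sketched as follows. Class and parameter names are hypothetical, and a real power-protected buffer would persist its state in non-volatile memory rather than in ordinary Python objects.

```python
class PowerProtectedBuffer:
    """Sketch of one sector-sized storage area: data packets arrive out of
    order, are placed by sequence identifier, executed (here: appended) in
    order, and flushed once the accumulated data reaches the sector size."""

    def __init__(self, sector_size: int, flush):
        self.sector_size = sector_size
        self.flush = flush              # callable that persists one full sector
        self.next_seq = 1               # next sequence id expected to execute
        self.pending = {}               # seq -> data packet, awaiting its turn
        self.area = b""                 # ordered, not-yet-flushed packets

    def submit(self, seq: int, packet: bytes) -> None:
        if seq < self.next_seq:
            return                      # sequence validator: duplicate, discard
        self.pending[seq] = packet
        # execute every operation that is now in order
        while self.next_seq in self.pending:
            self.area += self.pending.pop(self.next_seq)
            self.next_seq += 1
        # flush whenever a full mapping unit has accumulated
        while len(self.area) >= self.sector_size:
            self.flush(self.area[: self.sector_size])
            self.area = self.area[self.sector_size :]
```

Submitting packet 2 before packet 1 leaves it pending; once packet 1 arrives, both execute in order and, if the sector is full, the contents are flushed to the storage device in a single write.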
- FIG. 3B is a block diagram 300B illustrating processing a plurality of data entries associated with multiple data object identifiers, according to example embodiments.
- data entries 322, 324, and 326 are associated with stream 344, along with a data object ID (ID_1) 320
- data entries 330, 332, 334, and 336 are associated with stream 346, along with a data object ID (ID_2) 328.
- the storage controller 138 is configured to store data packets associated with the same data object ID in the same storage area inside the power-protected buffer 313. For example, data packets 1, 2, and 3 of data entries 322, 324, and 326 respectively are all stored in storage area 172 associated with data object ID 320 of stream 344. Similarly, data packets 4, 5, 6, and 7 of data entries 330, 332, 334, 336 respectively are all stored in storage area 173 associated with data object ID 328 of stream 346.
- different data entries associated with different data object identifiers can be received at storage device 130 (e.g., data entries of streams 344 and 346 are received out of order and at different times).
- the storage controller 138 is configured to detect the data object identifier of each received data entry and store the corresponding data packet in a corresponding storage area associated with the data object identifier of the data entry based on the sequence identifier of the data entry. As data packets are arranged sequentially inside the corresponding storage areas, the data processing operation indicated by each data entry can be executed and an acknowledgment can be communicated.
- storage controller 138 is configured to process a subset of a set of received data entries, where the subset is associated with the same data object identifier. Additionally, the data packets stored in storage areas 172 and 173, after the data processing operations have been executed, are flushed into sectors 338 and 340 of the storage device 130, based on storage areas 172 and 173 reaching a sector size with stored data.
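- The per-object demultiplexing of FIG. 3B can be illustrated with a short sketch that groups interleaved data entries by data object identifier and orders each group by sequence identifier, mimicking one storage area per object inside the power-protected buffer. The tuple layout of a data entry is an assumption for the example.

```python
def route_entries(entries):
    """Group incoming data entries by data object identifier and arrange
    each group sequentially by sequence identifier.

    `entries` is an iterable of (object_id, seq, packet) tuples that may
    arrive interleaved and out of order, as with streams 344 and 346.
    """
    areas = {}
    for object_id, seq, packet in entries:
        # one storage area per data object identifier
        areas.setdefault(object_id, []).append((seq, packet))
    # arrange each storage area sequentially by sequence identifier
    return {oid: [p for _, p in sorted(buf)] for oid, buf in areas.items()}
```

Once each group is in sequence order, the corresponding data processing operations can be executed and acknowledged per object, independently of the arrival interleaving.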
- FIG. 4 is a block diagram illustrating a storage controller 138 configured to perform erasure coding operations 402 and 404 on streaming data within a storage device 130 to generate different parity values, according to example embodiments.
- the plurality of data entries 424 - 430 are associated with received stream 401 of data entries (which can be the same as stream 168 received at storage device 130).
- Each of the data entries 424 - 430 includes a data object identifier (IDx) of a data object the data entry is associated with, a data processing operation identifier (CMD which can identify an Append data processing operation), a data packet (e.g., 1, 2, 3, or 4), and a sequence identifier (SEQ).
- each of the data entries 424 - 430 includes the data object identifier (IDx such as IDx 422) which identifies a data object the particular data entries of stream 401 are associated with.
- each of the data entries can include a data object naming sequence (as discussed in connection with FIG. 1 and FIG. 2) which can be used together with the data packet and the sequence identifier to execute/perform the data processing operation associated with the data processing operation identifier.
- data packets from corresponding data entries are stored in the storage area 406 of a power-protected buffer (e.g., buffer 313), and are managed by the storage controller 138.
- the storage controller 138 is configured to generate data chunks (or data slices) 408, 410, and 412 using data stored in storage area 406 (the stored data being also referred to in FIG. 4 as “trunk”).
- In the example erasure coding operation 402, the storage controller 138 generates a parity value P 418 using data stored in storage area 406. In the example erasure coding operation 404, the storage controller 138 generates a parity value Q 419 using data stored in storage area 406. In an example embodiment, parity values P and Q may be generated using the same data in storage area 406, by applying parity value generation techniques associated with erasure coding techniques. As the parity values P 418 and Q 419 are generated, they may be stored in sector 420 of the storage device 130.
- the storage controller 138 detects that a size of data entries stored in the storage area 406 of the power-protected buffer is equal to a predefined size associated with an erasure coding operation (e.g., 12KB for a 3+2 erasure coding scheme where the mapping unit, such as a sector of the storage device 130, is 4KB).
- the storage controller 138 applies the erasure coding operation (e.g., a logical operation, a byte-shifting operation, or another type of erasure coding operation) to data entries stored in the storage area 406 to generate data slices.
- the storage controller 138 generates a parity value (e.g., parity value P 418 or Q 419) based on the data slices, and stores the data slices or the parity value in the mapping unit (e.g., sector 420) of the storage device 130.
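- The generation of two distinct parity values P and Q from the same data slices can be illustrated with RAID-6-style arithmetic: P as the byte-wise XOR of the slices, and Q as a weighted sum in GF(2^8). The specific field polynomial (0x11d) and the power-of-two coefficients are assumptions for this sketch; the patent does not prescribe a particular erasure code.

```python
def xor_parity(slices):
    """P parity: byte-wise XOR across equally sized data slices."""
    p = bytearray(len(slices[0]))
    for s in slices:
        for i, b in enumerate(s):
            p[i] ^= b
    return bytes(p)

def gf_mul(a, b):
    """Multiply in GF(2^8) with the reducing polynomial 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def q_parity(slices):
    """Q parity: GF(2^8) sum with coefficient 2**i for slice index i."""
    q = bytearray(len(slices[0]))
    coeff = 1
    for s in slices:
        for i, b in enumerate(s):
            q[i] ^= gf_mul(coeff, b)
        coeff = gf_mul(coeff, 2)
    return bytes(q)
```

Because P is a plain XOR, any single lost slice can be recovered by XOR-ing P with the surviving slices; Q provides the independent equation needed to tolerate a second loss.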
- FIG. 5 is a communication flow diagram 500 of example communications associated with the execution of an Append data processing operation for data object replicas in the distributed storage network of FIG. 1 using data object naming sequences and shard mapping to multiple storage devices in a redundancy group, according to some embodiments.
- the communication flow takes place between computer device 104, computer device 106, the network switch 116, and a plurality of redundancy groups 501, 503, 505 of storage devices.
- the first redundancy group 501 (or RG1) includes storage devices 128 (or A), 130 (or B), and 132 (or C).
- the second redundancy group 503 (or RG2) includes storage devices B and C.
- the third redundancy group 505 (or RG3) includes storage devices C and 134 (or N).
- computer device 104 and computer device 106 communicate a request to the sequencer module 162 for sequence identifiers for accessing a data object associated with the data object identifier NAME1.
- sequencer module 162 provides sequence identifiers SEQ1 502 and SEQ2 504 to computer devices 104 and 106 respectively.
- Computer device 104 communicates a request 506 to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA1) for appending the data object, and sequence identifier SEQ1 obtained from the sequencer module 162.
- request 506 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
- the network switch 116 includes a shard mapping 146 with sharding instances.
- Each of the sharding instances of the shard mapping 146 corresponds to a redundancy group (RG) (or a subset) of storage devices of a set of available storage devices (e.g., set of available storage devices 128 - 134) and a hashing function.
- a first sharding instance of the shard mapping 146 can refer to RG1
- a second sharding instance of the shard mapping 146 can refer to RG2
- a third sharding instance of the shard mapping 146 can refer to RG3.
- the network switch 116 performs a shard mapping based on the sharding instance identifier and the hashing input information received with the request 506. More specifically, the sharding instance identifier is mapped to the first sharding instance and the hashing function (e.g., SHA256) of the first sharding instance is applied to the hashing input information to obtain RG2 (e.g., storage devices 130 and 132) as the receiving RG for the Append data processing operation of request 506.
- the network switch 116 generates a modified (or updated) request 510 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage device 130.
- the original request 506 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage device 130 as well as the corresponding availability information.
- the original request 506 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA1) for appending, and sequence identifier SEQ1.
- the network switch 116 uses shard mapping 146 and selects a sharding instance for RG2 based on the sharding instance identifier SHARD_1.
- the network switch 116 generates the modified request 510 for communication to the storage device 130 and modified request 514 for communication to the storage device 132.
- the modified request 510 for the Append data processing operation for the storage device 130 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), availability information (e.g., suffix R1/2 identifying a first replica version of the data object), data for appending (e.g., DATA1), and sequence identifier (e.g., SEQ1).
- the modified request 514 for the Append data processing operation for the storage device 132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), availability information (e.g., suffix R2/2 identifying a second replica version of the data object), data for appending (e.g., DATA1), and sequence identifier (e.g., SEQ1).
- the modified requests 510 and 514 for the Append data processing operation are communicated to storage devices 130 and 132 respectively for execution.
- the modified requests 510 and 514 are executed at storage devices 130 and 132, and corresponding notifications 512 and 516 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116.
- the network switch 116 forwards the received notifications 512 and 516 as notification 518 to the computer device 104 where the original request 506 for an Append data processing operation originated.
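- The switch-side rewrite of request 506 into per-replica modified requests can be sketched as follows; the dictionary layout and the DONS string format are assumptions for illustration, not the patented wire format.

```python
def fan_out_append(request, redundancy_group):
    """Sketch of the switch-side rewrite: one Append request is expanded
    into per-replica modified requests, each carrying a DONS built from
    the object name, the SIT, and an Rn/m availability suffix."""
    name = request["name"]
    instance_id, hash_input = request["sit"]
    total = len(redundancy_group)
    modified = []
    for n, device in enumerate(redundancy_group, start=1):
        modified.append({
            "device": device,
            "dons": f"{name}.{instance_id}_{hash_input}.R{n}/{total}",
            "data": request["data"],
            "seq": request["seq"],
        })
    return modified
```

Fanning out a request with object name NAME1 and SIT {SHARD_1, X} over a two-device redundancy group produces one modified request per replica, with availability suffixes R1/2 and R2/2 and the original data and sequence identifier carried through unchanged.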
- Similar processing is performed by the network switch 116 with regard to data entries associated with stream 166 originating from computer device 106. More specifically, computer device 106 communicates a request 520 to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA2) for appending the data object, and sequence identifier SEQ2 obtained from the sequencer module 162.
- request 520 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
- the network switch 116 performs a shard mapping based on the sharding instance identifier and the hashing input information received with the request 520. More specifically, the sharding instance identifier is mapped to the first sharding instance and the hashing function (e.g., SHA256) of the first sharding instance is applied to the hashing input information to obtain RG2 (e.g., storage devices 130 and 132) as the receiving RG for the Append data processing operation of request 520.
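The shard-mapping step can be sketched as follows, assuming (as in the example) a SHA256 hashing function per sharding instance. The `SHARD_MAPPING` table, the device names, and the modulo reduction of the digest to a redundancy-group index are illustrative assumptions, not details from the description.

```python
import hashlib

# Hypothetical shard mapping: each sharding instance pairs a hashing
# function with redundancy groups (subsets of the available devices).
SHARD_MAPPING = {
    "SHARD_1": {
        "hash": lambda data: hashlib.sha256(data).digest(),
        "redundancy_groups": [
            ["dev128", "dev130", "dev132"],  # RG1
            ["dev130", "dev132"],            # RG2
            ["dev132", "dev134"],            # RG3
        ],
    },
}

def select_redundancy_group(sharding_instance_id, hashing_input):
    """Map the SIT {sharding_instance_id, hashing_input} to a receiving RG."""
    instance = SHARD_MAPPING[sharding_instance_id]
    digest = instance["hash"](hashing_input.encode())
    groups = instance["redundancy_groups"]
    # Reduce the digest to an index into this instance's redundancy groups.
    return groups[int.from_bytes(digest, "big") % len(groups)]

rg = select_redundancy_group("SHARD_1", "X")
```

Because the mapping is a pure function of the SIT, every request carrying the same {SHARD_1, X} pair lands on the same redundancy group.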
- the network switch 116 generates a modified (or updated) request 524 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage device 130.
- the original request 520 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage device 130 as well as the corresponding availability information.
- the original request 520 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA2) for appending, and sequence identifier SEQ2.
- the network switch 116 uses shard mapping 146 and selects a sharding instance for RG2 based on the sharding instance identifier SHARD_1.
- the network switch 116 generates the modified request 524 for communication to the storage device 130 and modified request 528 for communication to the storage device 132.
- the modified request 524 for the Append data processing operation for the storage device 130 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), availability information (e.g., suffix R1/2 identifying a first replica version of the data object), data for appending (e.g., DATA2), and sequence identifier (e.g., SEQ2).
- the modified request 528 for the Append data processing operation for the storage device 132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), availability information (e.g., suffix R2/2 identifying a second replica version of the data object), data for appending (e.g., DATA2), and sequence identifier (e.g., SEQ2).
- the modified requests 524 and 528 for the Append data processing operation are communicated to storage devices 130 and 132 respectively for execution.
- the modified requests 524 and 528 are executed at storage devices 130 and 132, and corresponding notifications 526 and 530 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116.
- the network switch 116 forwards the received notifications 526 and 530 as notification 532 to the computer device 106 where the original request 520 for an Append data processing operation originated.
- FIG. 6 is a block diagram 600 illustrating the network architecture of FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group, according to some embodiments. More specifically, FIG. 6 illustrates an example communication flow (which is also discussed in greater detail in connection with FIG. 7) of a stream of data entries for performing the Append data processing operation to erasure-coded versions of the same data object from two client devices (e.g., computer devices 104 and 106). Computer device 104 communicates a stream 602 of data entries to switch 116 via router 112, and computer device 106 communicates a stream 604 of data entries to switch 116 via router 112.
- Each of the data entries in streams 602 and 604 can include a data processing operation identifier to identify the Append data processing operation 156 (or another data processing operation) to be performed in connection with erasure-coded versions EC1/2, EC2/2, and EC P for data object D1 stored at storage devices 128, 130, and 132 (identified in the shard mapping 146 of the network switch 116 as RG1).
- the network switch 116 uses shard mapping 146 to determine a redundancy group of storage devices associated with the requested data accesses of the data entries within streams 602 and 604.
- the network switch 116 uses sharding information within the data entries of streams 602 and 604 to select a sharding instance of redundancy groups and select one of the redundancy groups (e.g., RG1 including storage devices 128, 130, and 132) for communication of the data entries of streams 602 and 604.
- the data entries in streams 606, 612, and 618 include corresponding data object naming sequences (e.g., such as data object naming sequences 148 illustrated in FIG. 2).
- storage controller 136 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 606 to rearrange the data entries (e.g., sequentially, based on the sequence identifier) using a buffer including the storage area 608.
- When the storage area 608 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D1.E1/2 associated with the data entries) and the result (e.g., data portion A) is flushed to a mapping unit of the storage device 128 such as sector 610.
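The buffer-and-flush behavior of the storage controller described above can be sketched as below: out-of-order entries accumulate, are sorted by sequence identifier, and are flushed once the buffered payload reaches the sector size. The class name `SectorBuffer`, the byte-count fullness test, and the flush callback are illustrative assumptions.

```python
class SectorBuffer:
    """Buffers Append entries for one data object, reorders them by
    sequence identifier, and flushes them to a sector once the buffered
    payload reaches the mapping-unit (sector) size."""

    def __init__(self, sector_size, flush):
        self.sector_size = sector_size  # mapping-unit size in bytes
        self.flush = flush              # callback that writes one sector
        self.entries = []               # (sequence_id, payload) pairs

    def add(self, seq_id, payload):
        self.entries.append((seq_id, payload))
        buffered = sum(len(p) for _, p in self.entries)
        if buffered >= self.sector_size:
            # Execute the Append operations in sequence-identifier order.
            self.entries.sort(key=lambda e: e[0])
            self.flush(b"".join(p for _, p in self.entries))
            self.entries = []

flushed = []
buf = SectorBuffer(8, flushed.append)
buf.add(2, b"WORLD")  # arrives out of order; still below sector size
buf.add(1, b"HELO")   # 9 bytes buffered: sort by sequence id, then flush
```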
- storage controller 138 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 612 to rearrange the data entries (e.g., sequentially, based on the sequence identifier) using a buffer including the storage area 614.
- When the storage area 614 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D1.E2/2 associated with the data entries) and the result (e.g., data portion B) is flushed to a mapping unit of the storage device 130 such as sector 616.
- storage controller 140 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier) and a sequence identifier within each of the data entries of stream 618 to rearrange the data entries (e.g., sequentially, based on the sequence identifier) in a buffer including the sector-sized storage area 620.
- When the storage area 620 is full, the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D1.EP associated with the data entries) and the result (e.g., parity bits portion P) is flushed to a mapping unit of the storage device 132 such as sector 622.
- FIG. 7 is a communication flow diagram 700 of example communications associated with the execution of the Append data processing operation of FIG. 6, according to some embodiments. Referring to FIG. 7, the communication flow takes place between computer device 104, computer device 106, the network switch 116, and storage devices 128, 130, and 132.
- computer device 104 and computer device 106 communicate a request to the sequencer module 162 for sequence identifiers for accessing a data object associated with the data object identifier NAME1.
- sequencer module 162 provides sequence identifiers SEQ1 704 and SEQ2 706 to computer devices 104 and 106 respectively.
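The sequencer module's role can be sketched as a per-object monotonic counter. This single-process Python sketch is an assumption about how such identifiers might be issued; the description only requires that sequence identifiers for a data object be generated in sequential order.

```python
import itertools
from collections import defaultdict

class Sequencer:
    """Hands out sequence identifiers in strictly sequential order,
    keeping one counter per data object identifier."""

    def __init__(self):
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_id(self, object_name):
        return next(self._counters[object_name])

seq = Sequencer()
seq1 = seq.next_id("NAME1")  # e.g., for computer device 104
seq2 = seq.next_id("NAME1")  # e.g., for computer device 106
```

Because both client devices draw from the same counter for NAME1, the storage controllers can later restore a total order over their interleaved Append requests.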
- Computer device 104 communicates a request 708 (which can include a data entry) to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA1) for appending the data object, and sequence identifier SEQ1 obtained from the sequencer module 162.
- request 708 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
- the network switch 116 includes a shard mapping 146 with sharding instances.
- Each of the sharding instances of the shard mapping 702 corresponds to a redundancy group (RG) (or a subset) of storage devices of a set of available storage devices (e.g., set of available storage devices 128 - 134) and a hashing function.
- For example, a first sharding instance of the shard mapping 702 can refer to RG1, a second sharding instance can refer to RG2, and a third sharding instance can refer to RG3.
- the network switch 116 performs a shard mapping based on the sharding instance identifier and the hashing input information received with the request 708. More specifically, the sharding instance identifier is mapped to a first sharding instance and the hashing function (e.g., SHA256) of the first sharding instance is applied to the hashing input information to obtain RG1 (e.g., storage devices 128, 130, and 132) as the receiving RG for the Append data processing operation of request 708.
- the network switch 116 generates a modified (or updated) request 712 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage devices 128, 130, and 132. More specifically, the original request 708 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage devices 128-132 as well as the corresponding availability information.
- the original request 708 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA1) for appending, and sequence identifier SEQ1.
- the network switch 116 uses shard mapping 702 and selects a sharding instance associated with RG1 based on the sharding instance identifier SHARD_1.
- the network switch 116 generates the modified request 712 for communication to the storage devices 128-132.
- the modified request 712 for the Append data processing operation for storage devices 128-132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), data for appending (e.g., DATA1), and sequence identifier (e.g., SEQ1).
- the modified request 712 for the Append data processing operation is communicated to storage devices 128-132 for execution.
- the modified request 712 is executed at storage devices 128-132, and corresponding notifications 713 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116.
- the network switch 116 forwards the received notifications 713 as notification 714 to the computer device 104 where the original request 708 for an Append data processing operation originated.
- Similar processing is performed by the network switch 116 with regard to data entries associated with stream 166 originating from computer device 106. More specifically, computer device 106 communicates a request 716 to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA2) for appending the data object, and sequence identifier SEQ2 obtained from the sequencer module 162.
- request 716 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
- the network switch 116 performs a shard mapping (similar to the shard mapping at operation 710) based on the sharding instance identifier and the hashing input information received with the request 716. More specifically, the sharding instance identifier is used to obtain RG1 (e.g., storage devices 128-132) as the receiving RG for the Append data processing operation of request 716.
- the network switch 116 generates a modified (or updated) request 718 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage devices 128, 130, and 132. More specifically, the original request 716 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage devices 128-132 as well as the corresponding availability information.
- the original request 716 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA2) for appending, and sequence identifier SEQ2.
- the network switch 116 uses shard mapping 702 and selects a sharding instance associated with RG1 based on the sharding instance identifier SHARD_1.
- the network switch 116 generates the modified request 718 for communication to the storage devices 128-132.
- the modified request 718 for the Append data processing operation for storage devices 128-132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), data for appending (e.g., DATA2), and sequence identifier (e.g., SEQ2).
- the modified request 718 for the Append data processing operation is communicated to storage devices 128-132 for execution.
- the modified request 718 is executed at storage devices 128-132, and corresponding notifications 719 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116.
- the network switch 116 forwards the received notifications 719 as notification 720 to the computer device 106 where the original request 716 for an Append data processing operation originated.
- FIG. 8 is a block diagram 800 illustrating the network architecture of FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group where pre-processing of the Append operation is performed at the network switch so that the amount of data transmitted between the switch and storage device can be reduced, according to some embodiments. More specifically, FIG. 8 illustrates an example communication flow of a stream of data entries for performing the Append data processing operation to erasure-coded versions of the same data object from two client devices (e.g., computer devices 104 and 106), where partial data processing is performed at the network switch 116.
- switch 116 can be configured as a software-defined networking (SDN) switch implementing partial processing of data processing operations within a received stream of data entries.
- Computer device 104 communicates a stream 802 of data entries to switch 116 via the router 112, and computer device 106 communicates a stream 804 of data entries to switch 116 via router 112.
- Each of the data entries in streams 802 and 804 includes a data processing operation identifier to identify the Append data processing operation 156 (or another data processing operation) to be performed in connection with erasure-coded versions EC1/2, EC2/2, and EC P for data object D1 stored at storage devices 128, 130, and 132 (identified in the shard mapping 146 of the network switch 116 as RG1).
- the network switch 116 uses shard mapping 146 to determine a redundancy group of storage devices associated with the requested data accesses of the data entries within the streams 802 and 804. For example, the network switch 116 uses sharding information within the data entries of streams 802 and 804 to select a sharding instance of redundancy groups and select one of the redundancy groups (e.g., RG1 including storage devices 128, 130, and 132) for communication of the data entries of streams 802 and 804.
- the network switch 116 may retrieve the erasure-coded versions of data object D1 and, at operation 806, may perform the Append data processing operations associated with data entries received with streams 802 and 804. More specifically, the network switch 116 performs the Append data processing operations associated with data entries received with streams 802 and 804 on erasure-coded versions D1.E1/2 and D1.E2/2 (stored at storage devices 128 and 130) to obtain data portions A and B respectively. The network switch 116 further performs the Append data processing operations associated with data entries received with streams 802 and 804 on erasure-coded version D1.EP (stored at storage device 132) to obtain parity bits portion P.
- the obtained A, B, and P results of the Append data processing operations are stored at the local storage (or other remote storage) associated with the network switch 116.
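For a 2+1 layout such as EC1/2, EC2/2, and EC P above, the switch-side computation of data portions A and B and parity portion P can be sketched with simple XOR parity. Production systems typically use Reed-Solomon codes; the even split, zero-padding, and XOR parity below are illustrative assumptions.

```python
def encode_2_plus_1(data):
    """Split data into two halves A and B and compute parity P = A XOR B.
    Either half can then be rebuilt from the other half plus the parity."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # zero-pad B to match A's length
    p = bytes(x ^ y for x, y in zip(a, b))
    return a, b, p

a, b, p = encode_2_plus_1(b"DATA1DATA2")
# Recover A from B and P (e.g., if storage device 128 is lost):
recovered_a = bytes(x ^ y for x, y in zip(b, p))
```

The A, B, and P portions would then be sent as three separate streams, one per storage device, which is what allows the switch to reduce per-device traffic relative to replicating the full appended data.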
- processing efficiency is increased (e.g., shortened processing time and reduced use of network-attached storage devices).
- the obtained A, B, and P results of the Append data processing operations are communicated as streams 808, 810, and 812 to corresponding storage devices 128, 130, and 132.
- the storage devices 128, 130, and 132 store the received A, B, and P results of the Append data processing operations into temporary storage locations (e.g., sector-sized storage locations) and then to sectors 814, 816, and 818 (e.g., when the sector-sized storage locations are full up to a sector size).
- FIG. 9 is a flowchart of another method 900 for processing a stream of data accesses within a storage device, according to example embodiments.
- Method 900 includes operations 902, 904, 906, 908, and 910.
- method 900 is described as being performed by one or more processors of the network architecture 100 (e.g., a data entry processing module performing the functionalities of sequencer module 162, naming server 102, the network switch 116, and storage controllers 136-142 of FIG. 1 or other firmware or hardware discussed herein), which can be configured to execute within a computer device such as device 1100 of FIG. 11.
- a plurality of sequence identifiers is generated for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order. For example, as illustrated in FIG. 5 and FIG. 7, sequential sequence identifiers 502 and 504 are generated based on requests from computer devices 104 and 106 in connection with data processing operations for a data object with a data object identifier of NAME1.
- a subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries. For example, shard mapping 146 of switch 116 is used to map a sharding instance identifier and determine a redundancy group (e.g., RG2) of storage devices as the subset.
- a data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices.
- Each data entry of the plurality of data entries includes a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers.
- switch 116 communicates requests 510 and 514 associated with data entries to the storage devices of RG2.
- the plurality of data entries is arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. For example, the data entries received in streams 168 and 170 at storage devices 130 and 132 from the network switch 116 are arranged in storage areas 172 and 178 respectively, based on the data object identifiers and the sequential order of the sequence identifiers.
- a data processing operation is executed for each data entry of the plurality of data entries using the data object.
- the data processing operation corresponds to the data processing operation identifier, and the data processing operation is executed according to an execution order based on the sequence identifier. For example, after the data entries of stream 168 are arranged in sequential order as stream 174, corresponding data processing operations are executed based on the sequential order and the result is flushed to sector 176 when the storage area 172 is full with results of the execution of the data processing operation.
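Operations 902-910 can be tied together in a single end-to-end sketch, with one Python process standing in for the sequencer, the network switch, and a storage controller; the table of redundancy groups and all names below are illustrative assumptions.

```python
import hashlib

# Hypothetical mapping from sharding instance to redundancy groups.
REDUNDANCY_GROUPS = {"SHARD_1": [["dev130", "dev132"]]}

def shard(instance_id, hashing_input):
    """Operation 904: select the subset (RG) of storage devices."""
    digest = hashlib.sha256(hashing_input.encode()).digest()
    groups = REDUNDANCY_GROUPS[instance_id]
    return groups[int.from_bytes(digest, "big") % len(groups)]

def run_appends(entries, sit=("SHARD_1", "X")):
    """Operation 902: entries already carry sequential identifiers.
    Operation 906: the stream is sent to the selected storage devices.
    Operation 908: entries are arranged by object id and sequence id.
    Operation 910: the Append operations execute in that order."""
    devices = shard(*sit)
    objects = {}
    for name, seq_id, data in sorted(entries, key=lambda e: (e[0], e[1])):
        objects[name] = objects.get(name, b"") + data  # ordered Append
    return devices, objects

devices, objects = run_appends([
    ("NAME1", 2, b"DATA2"),  # e.g., from computer device 106
    ("NAME1", 1, b"DATA1"),  # e.g., from computer device 104
])
```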
- FIG. 10 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various device hardware described herein, according to example embodiments.
- FIG. 10 is merely a non-limiting example of a software architecture 1002, and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein.
- the software architecture 1002 executes on hardware, such as any of the computer devices in FIG. 1 which can be the same as device 1100 of FIG. 11 that includes, among other things, processor 1105, memory 1110, storage 1115 and/or 1120, and I/O interfaces 1125 and 1130.
- a representative hardware layer 1004 is illustrated and can represent, for example, the device 1100 of FIG. 11.
- the representative hardware layer 1004 comprises one or more processing units 1006 having associated executable instructions 1008.
- Executable instructions 1008 represent the executable instructions of the software architecture 1002, including implementation of the methods, modules, and so forth of FIGS. 1-9.
- Hardware layer 1004 also includes memory or storage modules 1010, which also have executable instructions 1008.
- Hardware layer 1004 may also comprise other hardware 1012, which represents any other hardware of the hardware layer 1004, such as the other hardware illustrated as part of device 1100.
- the memory or storage modules 1010 can include storage devices (e.g., any of storage devices 128-134) with firmware implementing the data entry processing module 1060.
- the data entry processing module 1060 comprises suitable circuitry, logic, interfaces, or code and can be configured to perform the functionalities of sequencer module 162, naming server 102, the network switch 116, and storage controllers 136 - 142 of FIG. 1 or other firmware or hardware discussed herein in connection with FIGS. 1-8.
- the software architecture 1002 may be conceptualized as a stack of layers where each layer provides particular functionality.
- the software architecture 1002 may include layers such as an operating system 1014, libraries 1016, frameworks/middleware 1018, applications 1020, and presentation layer 1044.
- the applications 1020 or other components within the layers may invoke application programming interface (API) calls 1024 through the software stack and receive a response, returned values, and so forth, illustrated as messages 1026 in response to the API calls 1024.
- the layers illustrated in FIG. 10 are representative in nature and not all software architectures 1002 have all layers. For example, some mobile or special purpose operating systems may not provide frameworks/middleware 1018, while others may provide such a layer. Other software architectures may include additional or different layers.
- the operating system 1014 may manage hardware resources and provide common services.
- the operating system 1014 may include, for example, a kernel 1028, services 1030, and drivers 1032.
- the kernel 1028 may act as an abstraction layer between the hardware and the other software layers. For example, kernel 1028 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on.
- the services 1030 may provide other common services for the other software layers.
- Drivers 1032 may be responsible for controlling or interfacing with the underlying hardware.
- the drivers 1032 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth, depending on the hardware configuration.
- Libraries 1016 may provide a common infrastructure that may be utilized by the applications 1020 or other components or layers. Libraries 1016 typically provide functionality that allows other software modules to perform tasks more easily than by interfacing directly with the underlying operating system 1014 functionality (e.g., kernel 1028, services 1030, or drivers 1032). Libraries 1016 may include system libraries 1034 (e.g., C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematic functions, and the like.
- libraries 1016 may include API libraries 1036 such as media libraries (e.g., libraries to support presentation and manipulation of various media format such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D in a graphic content on a display), database libraries (e.g., SQLite that may provide various relational database functions), web libraries (e.g., WebKit that may provide web browsing functionality), and the like.
- libraries 1016 may also include a wide variety of other libraries 1038 to provide many other APIs to the applications 1020 and other software components/modules.
- the frameworks/middleware 1018 may provide a higher-level common infrastructure that may be utilized by the applications 1020 or other software components/modules.
- the frameworks/middleware 1018 may provide various graphical user interface (GUI) functions, high-level resource management, high-level location services, and so forth.
- the frameworks/middleware 1018 may provide a broad spectrum of other APIs that may be utilized by the applications 1020 or other software components/modules, some of which may be specific to a particular operating system 1014 or platform.
- Applications 1020 include built-in applications 1040 and third-party applications 1042.
- built-in applications 1040 may include but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, or a game application.
- Third-party applications 1042 may include any of the built-in applications 1040 as well as a broad assortment of other applications.
- The third-party application 1042 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or other mobile operating systems.
- the third-party application 1042 may invoke the API calls 1024 provided by the mobile operating system such as operating system 1014 to facilitate functionality described herein.
- the applications 1020 may utilize built-in operating system functions (e.g., kernel 1028, services 1030, and drivers 1032), libraries (e.g., system libraries 1034, API libraries 1036, and other libraries 1038), and frameworks/middleware 1018 to create user interfaces to interact with users of the system.
- interactions with a user may occur through a presentation layer, such as presentation layer 1044.
- the application/module "logic" can be separated from the aspects of the application/module that interact with a user.
- Some software architectures utilize virtual machines. In the example of FIG. 10, this is illustrated by virtual machine 1048.
- a virtual machine creates a software environment where applications/modules can execute as if they were executing on a hardware machine (such as the device 1100 of FIG. 11, for example).
- a virtual machine 1048 is hosted by a host operating system (e.g., operating system 1014) and typically, although not always, has a virtual machine monitor 1046, which manages the operation of the virtual machine 1048 as well as the interface with the host operating system (i.e., operating system 1014).
- a software architecture executes within the virtual machine 1048 and may include an operating system 1050, libraries 1052, frameworks/middleware 1054, applications 1056, or a presentation layer 1058. These layers of software architecture executing within the virtual machine 1048 can be the same as the corresponding layers previously described or may be different.
- FIG. 11 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to example embodiments. All components need not be used in various embodiments. For example, clients, servers, and cloud-based network devices may each use a different set of components, or in the case of servers, larger storage devices.
- computing device 1100 may include a processor 1105, memory 1110, removable storage 1115, non-removable storage 1120, input interface 1125, output interface 1130, and communication interface 1135, all connected by a bus 1140.
- the memory 1110 may include volatile memory 1145 and non-volatile memory 1150 and may store a program 1155.
- the computing device 1100 may include - or have access to a computing environment that includes - a variety of computer-readable media, such as the volatile memory 1145, the non-volatile memory 1150, the removable storage 1115, and the non-removable storage 1120.
- Computer storage includes random-access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
- Computer-readable instructions stored on a computer-readable medium are executable by the processor 1105 of the computing device 1100.
- a hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device.
- the terms “computer-readable medium” and “storage device” do not include carrier waves to the extent that carrier waves are deemed too transitory.
- “Computer-readable non-transitory media” includes all types of computer-readable media, including magnetic storage media, optical storage media, flash media, and solid-state storage media. It should be understood that software can be installed in and sold with a computer.
- the software can be obtained and loaded into the computer, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator.
- the software can be stored on a server for distribution over the Internet, for example.
- the terms “computer-readable medium” and “machine-readable medium” are interchangeable.
- Program 1155 may utilize a data entry processing module 1160 which may be the same as or similar to the data entry processing module 1060 of FIG. 10.
- the data entry processing module 1160 comprises suitable circuitry, logic, interfaces, or code and can be configured to perform the functionalities of sequencer module 162, naming server 102, the network switch 116, and storage controllers 136-142 of FIG. 1 or other firmware or hardware discussed herein in connection with FIGS. 1-8.
- Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine, an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or any suitable combination thereof). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
- software including one or more computer-executable instructions that facilitate processing and operations as described above regarding any one or all of the steps of the disclosure can be installed in and sold with one or more computing devices consistent with the disclosure.
- the software can be obtained and loaded into one or more computing devices, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator.
- the software can be stored on a server for distribution over the Internet, for example.
- the components of the illustrative devices, systems, and methods employed by the illustrated embodiments can be implemented, at least in part, in digital electronic circuitry, analog electronic circuitry, or computer hardware, firmware, software, or in combinations of them. These components can be implemented, for example, as a computer program product such as a computer program, program code, or computer instructions tangibly embodied in an information carrier, or a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers.
- a computer program product such as a computer program, program code, or computer instructions tangibly embodied in an information carrier, or a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers.
- a computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other units suitable for use in a computing environment.
- a computer program can be deployed to be executed on one computer or multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
- functional programs, codes, and code segments for accomplishing the techniques described herein can be easily construed as within the scope of the claims by programmers skilled in the art to which the techniques described herein pertain.
- Method steps associated with the illustrative embodiments can be performed by one or more programmable processors executing a computer program, code, or instructions to perform functions (e.g., by operating on input data or generating an output). Method steps can also be performed by, and apparatus for performing the methods can be implemented as, special purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), for example.
- a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine.
- a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read-only memory or a random-access memory or both.
- the required elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example, semiconductor memory devices, e.g., electrically programmable read-only memory or ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory devices, or data storage disks (e.g., magnetic disks, internal hard disks, or removable disks, magneto-optical disks, or CD-ROM and DVD-ROM disks).
- machine-readable medium comprises a device able to store instructions and data temporarily or permanently and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., electrically erasable programmable read-only memory (EEPROM)), or any suitable combination thereof.
- machine-readable medium shall also be taken to include any medium or a combination of multiple media, that is capable of storing instructions for execution by one or more processors, such that the instructions, when executed by one or more processors, cause the one or more processors to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” as used herein excludes signals per se.
- the computing device 1100 includes a sequencer module for generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order; a switching module for: selecting a subset of storage devices of a plurality of storage devices storing a copy of the data object, the selecting based on a sharding instance identifier associated with the plurality of data entries; and communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers; and a storage controller module for: arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing, for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
Abstract
In some implementations, a method for processing data includes generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the sequence identifiers generated in sequential order. A subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier. A stream with the data entries is communicated to a storage device of the subset. Each of the data entries includes a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers. The data entries are arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. A data processing operation corresponding to the data processing operation identifier is executed according to an execution order based on the sequence identifier.
Description
DATA ACCESS PROCESSING ON NETWORK-ATTACHED STORAGE DEVICES
TECHNICAL FIELD
[0001] The present disclosure is related to the processing of ordered data accesses in computing devices.
BACKGROUND
[0002] Storage systems are often associated with processing inefficiencies for streaming partial or unaligned updates. For example, performing partial updates to a storage device, such as updates that are smaller than the device mapping unit, includes reading data that is larger than the partial updates, modifying the data, and storing back the modified data. There are drawbacks to this approach, including data volatility (e.g., volatility once the data is read from the storage device) as well as acknowledgment delays (e.g., the data update acknowledgment is communicated only after the updated data is written back into storage). Storing those updates on persistent media in a redundant fashion, such as replication or an erasure-coded format, requires additional complexity and overhead.
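Purely by way of illustration, the read-modify-write cycle described above can be sketched as follows. The device interface and all names are hypothetical, and a deliberately small mapping unit is used; real devices typically operate on 512-byte or 4-KiB sectors.

```python
MAPPING_UNIT = 16  # illustrative unit size; real sectors are 512 B or 4 KiB


class RamDevice:
    """Minimal in-memory stand-in for a block device (hypothetical)."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, off, length):
        return bytes(self.data[off:off + length])

    def write(self, off, payload):
        self.data[off:off + len(payload)] = payload


def partial_update(device, offset, payload):
    """Apply an update smaller than the device mapping unit."""
    unit_start = (offset // MAPPING_UNIT) * MAPPING_UNIT
    # 1. Read the full mapping unit (larger than the partial update).
    unit = bytearray(device.read(unit_start, MAPPING_UNIT))
    # 2. Modify the unit in memory (the data is volatile during this window).
    rel = offset - unit_start
    unit[rel:rel + len(payload)] = payload
    # 3. Write the whole unit back; only then can the update be acknowledged.
    device.write(unit_start, bytes(unit))
```

The sketch makes the two drawbacks visible: the data exists only in volatile memory between steps 1 and 3, and acknowledgment must wait for step 3 to complete.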
SUMMARY
[0003] Various examples are now described to introduce a selection of concepts in a simplified form, which are further described below in the detailed description. The Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[0004] According to a first aspect of the present disclosure, there is provided a system for processing data in a distributed storage network. The system includes a plurality of storage devices and processing circuitry coupled to the plurality of storage devices. Each of the plurality of storage devices stores a copy of a data object. The processing circuitry is configured to perform operations including generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing the data object, the plurality of sequence identifiers generated in sequential order. A subset of storage
devices of the plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries. A data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices. Each data entry of the plurality of data entries includes a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers. The plurality of data entries are arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. A data processing operation is executed for each data entry of the plurality of data entries using the data object. The data processing operation corresponds to the data processing operation identifier, and the data processing operation is executed according to an execution order based on the sequence identifier. In some embodiments, the selecting of the subset of storage devices of the plurality of storage devices is based on the sharding instance identifier and a redundancy group associated with the plurality of data entries. In another embodiment, the outcome of the executing, or post-processing of the outcome based on redundancy requirements, is stored into a mapping unit (such as a sector) of the storage device.
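The sequence-identifier generation, shard selection, buffering, and in-order execution described above can be sketched, purely for illustration, in the following simplified form. All class and function names are hypothetical, only an "append" operation is modeled, and the shard mapping is reduced to a dictionary lookup.

```python
import itertools
from dataclasses import dataclass


@dataclass
class DataEntry:
    object_id: str   # data object identifier
    op_id: str       # data processing operation identifier
    seq_id: int      # sequence identifier
    payload: bytes


class Sequencer:
    """Issues sequence identifiers in sequential order (hypothetical)."""

    def __init__(self):
        self._counter = itertools.count()

    def next_id(self):
        return next(self._counter)


def select_subset(shard_mapping, sharding_instance_id):
    # The sharding instance identifier names the subset of storage
    # devices holding copies of the data object.
    return shard_mapping[sharding_instance_id]


def execute_in_order(buffered_entries, data_objects):
    # Arrange buffered entries by data object identifier and sequence
    # identifier, then execute each entry's operation in that order.
    for entry in sorted(buffered_entries, key=lambda e: (e.object_id, e.seq_id)):
        if entry.op_id == "append":
            data_objects[entry.object_id] = (
                data_objects.get(entry.object_id, b"") + entry.payload)
    return data_objects
```

Because execution is keyed on the sequence identifiers, entries that arrive at the storage device out of order are still applied to the data object in the order the sequencer issued them.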
[0005] In a first implementation form of the system according to the first aspect as such, each data entry of the plurality of data entries further includes availability information identifying one of a plurality of versions of the data object stored at the storage device. The processing circuitry is further configured to perform operations including executing the data processing operation using the one of a plurality of versions of the data object identified by the availability information.
[0006] In a second implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the plurality of versions of the data object identified by the availability information includes one of a plurality of redundancy versions associated with a corresponding plurality of copies of the data object, or a plurality of erasure-coded versions associated with data information or parity information of the data object.
[0007] In a third implementation form of the system according to the first aspect as such or any implementation form of the first aspect, to select the
subset of storage devices, the processing circuitry is further configured to perform operations including selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying the subset of storage devices.
[0008] In a fourth implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the data processing operation is associated with a plurality of redundancy versions of the data object. The processing circuitry is further configured to perform operations including applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify the storage device of the subset of storage devices. The data stream with the plurality of data entries is communicated to the storage device of the subset of storage devices, each data entry of the plurality of data entries further including a redundancy version of the plurality of redundancy versions of the data object and a data segment for appending to the redundancy version of the data object during execution of the data processing operation.
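The hashing step of this implementation form can be illustrated with a short sketch. The disclosure does not prescribe a particular hash function, so SHA-256 and the modulo mapping below are assumptions made only for the example; the function name is hypothetical.

```python
import hashlib


def pick_storage_device(subset, hashing_input):
    # Apply the sharding instance's hashing function to the hashing input
    # information, then map the digest onto one device of the subset.
    digest = hashlib.sha256(hashing_input.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(subset)
    return subset[index]
```

The mapping is deterministic: the same hashing input information always resolves to the same storage device of the subset, which is what allows the switching module to route all entries for a given data object consistently.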
[0009] In a fifth implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the processing circuitry is further configured to perform operations including arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. The data processing operation is executed for each data entry of the plurality of data entries using the redundancy version of the data object and according to the execution order based on the sequence identifier.
[0010] In a sixth implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the data processing operation is associated with a plurality of erasure-coded versions of the data object. To select the subset of storage devices, the processing circuitry is further configured to perform operations including selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier. The selected sharding instance
identifies a plurality of redundancy groups of storage devices of the plurality of storage devices.
[0011] In a seventh implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the processing circuitry is further configured to perform operations including applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify a redundancy group of the plurality of redundancy groups, the identified redundancy group including the subset of storage devices.
[0012] In an eighth implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the processing circuitry is further configured to perform operations including communicating the data stream with the plurality of data entries to each storage device of the subset of storage devices. Each data entry of the plurality of data entries further includes a corresponding erasure-coded version of the plurality of erasure-coded versions of the data object and a data segment for appending to the corresponding erasure-coded version of the data object during execution of the data processing operation at each storage device of the subset of storage devices.
[0013] In a ninth implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the processing circuitry is further configured to perform operations including, for each storage device of the subset of storage devices, arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. The data processing operation for each data entry of the plurality of data entries is executed using the corresponding erasure-coded version of the data object and according to the execution order based on the sequence identifier.
[0014] In a tenth implementation form of the system according to the first aspect as such or any implementation form of the first aspect, the processing circuitry is further configured to perform operations including, for each storage device of the subset of storage devices, arranging the plurality of data entries in a buffer based on the data object identifier and the sequential order of the plurality of sequence identifiers. The data processing operation for each data entry of the
plurality of data entries is executed using an erasure-coded version of the plurality of erasure-coded versions of the data object and according to the execution order based on the sequence identifier, to generate a corresponding erasure-coded result. The corresponding erasure-coded result is communicated to each storage device of the subset of storage devices for storage.
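The erasure-coded versions referred to in these implementation forms can be illustrated with a minimal single-parity code. Deployed systems typically use stronger codes such as Reed-Solomon, so the XOR scheme and function names below are assumptions made only to keep the sketch short: each fragment stands in for one erasure-coded version held by one storage device of the redundancy group.

```python
from functools import reduce


def erasure_code(segment, k):
    # Split a data segment into k equal-size data fragments plus one
    # XOR parity fragment (one erasure-coded version per storage device).
    step = -(-len(segment) // k)  # ceiling division
    frags = [segment[i * step:(i + 1) * step].ljust(step, b"\x00")
             for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags + [parity]


def recover_fragment(frags, lost_index):
    # Any single lost fragment (data or parity) is the XOR of the rest.
    survivors = [f for i, f in enumerate(frags) if i != lost_index]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
```

With such a code, each storage device of the subset applies appended data segments to its own fragment independently, and the loss of any one device's fragment remains recoverable from the surviving ones.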
[0015] According to a second aspect of the present disclosure, there is provided a computer-implemented method for processing data in a distributed storage network. The method includes generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order. A subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries, the plurality of storage devices storing a copy of the data object. A data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers. The plurality of data entries is arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. A data processing operation is executed for each data entry of the plurality of data entries using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
[0016] In a first implementation form of the method according to the second aspect as such, each data entry of the plurality of data entries further includes availability information identifying one of a plurality of versions of the data object stored at the storage device. The method further includes executing the data processing operation using the one of a plurality of versions of the data object identified by the availability information.
[0017] In a second implementation form of the method according to the second aspect as such or any implementation form of the second aspect, the plurality of versions of the data object identified by the availability information includes one of a plurality of redundancy versions associated with a
corresponding plurality of copies of the data object, or a plurality of erasure-coded versions associated with data information or parity information of the data object.
[0018] In a third implementation form of the method according to the second aspect as such or any implementation form of the second aspect, selecting the subset of storage devices further includes selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying the subset of storage devices.
[0019] In a fourth implementation form of the method according to the second aspect as such or any implementation form of the second aspect, the data processing operation is associated with a plurality of redundancy versions of the data object. The method further includes applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify the storage device of the subset of storage devices. The data stream with the plurality of data entries is communicated to the storage device of the subset of storage devices. Each data entry of the plurality of data entries further includes a redundancy version of the plurality of redundancy versions of the data object and a data segment for appending to the redundancy version of the data object during the execution of the data processing operation.
[0020] In a fifth implementation form of the method according to the second aspect as such or any implementation form of the second aspect, the method further includes arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. The data processing operation is executed for each data entry of the plurality of data entries using the redundancy version of the data object and according to the execution order based on the sequence identifier.
[0021] According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable medium storing instructions for processing data in a distributed storage network. When executed by one or more processors of a network node within the distributed storage network, the
instructions cause the one or more processors to perform operations including generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order. A subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries, the plurality of storage devices storing a copy of the data object. A data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers. The plurality of data entries is arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. A data processing operation is executed for each data entry of the plurality of data entries using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
[0022] In a first implementation form of the computer-readable medium according to the third aspect as such, to select the subset of storage devices, the instructions cause the one or more processors to perform operations including selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying a plurality of redundancy groups of storage devices of the plurality of storage devices.
[0023] In a second implementation form of the computer-readable medium according to the third aspect as such or any implementation form of the third aspect, the data processing operation is associated with a plurality of erasure-coded versions of the data object. Executing the instructions causes the one or more processors to perform operations including applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify a redundancy group of the plurality of redundancy groups, the identified redundancy group comprising the subset of storage devices.
[0024] In a third implementation form of the computer-readable medium according to the third aspect as such or any implementation form of the third aspect, executing the instructions causes the one or more processors to perform operations including, for each storage device of the subset of storage devices, arranging the plurality of data entries in a buffer based on the data object identifier and the sequential order of the plurality of sequence identifiers. The data processing operation for each data entry of the plurality of data entries is executed using an erasure-coded version of the plurality of erasure-coded versions of the data object and according to the execution order based on the sequence identifier, to generate a corresponding erasure-coded result. The corresponding erasure-coded result is communicated to each storage device of the subset of storage devices for storage.
[0025] According to a fourth aspect of the present disclosure, there is provided an apparatus in a distributed storage network. The apparatus includes a plurality of storage devices, each of the plurality of storage devices storing a copy of a data object. The apparatus further includes a sequencer module for generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order. The apparatus further includes a switching module (e.g., a network switch) for selecting a subset of storage devices of a plurality of storage devices storing a copy of the data object, the selecting based on a sharding instance identifier associated with the plurality of data entries.
The switching module is further for communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers. The apparatus further includes a storage controller module (e.g., a storage controller) for arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. The storage controller module is further for executing, for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
[0026] Any of the foregoing examples may be combined with any one or more of the other foregoing examples to create a new embodiment within the scope of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] In the drawings, like numerals may describe similar components in different views. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
[0028] FIG. 1 is a block diagram illustrating a network architecture including a sequencer generating sequence identifiers, a naming server generating data object naming sequences, and multiple computing devices configured to access multiple network-attached storage devices based on the data object naming sequences and the sequence identifiers, according to some embodiments.
[0029] FIG. 2 is a block diagram illustrating example data object naming sequences, according to some embodiments.
[0030] FIG. 3A is a block diagram illustrating processing a plurality of data entries associated with the same data object identifier, according to example embodiments.
[0031] FIG. 3B is a block diagram illustrating processing a plurality of data entries associated with multiple data object identifiers, according to example embodiments.
[0032] FIG. 4 is a block diagram illustrating a storage controller configured to perform erasure coding operations on streaming data within a storage device, according to example embodiments.
[0033] FIG. 5 is a communication flow diagram of example communications associated with the execution of an Append data processing operation for data object replicas in a distributed storage network using data object naming sequences and shard mapping to multiple storage devices in a redundancy group, according to some embodiments.
[0034] FIG. 6 is a block diagram illustrating the network architecture of
FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group, according to some embodiments.
[0035] FIG. 7 is a communication flow diagram of example communications associated with the execution of the Append data processing operation of FIG. 6, according to some embodiments.
[0036] FIG. 8 is a block diagram illustrating the network architecture of
FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group where pre-processing of the Append operation is performed at the network switch so that the amount of data transmitted between the switch and storage device can be reduced, according to some embodiments.
[0037] FIG. 9 is a flowchart of another method for processing a stream of data accesses within a storage device, according to example embodiments.
[0038] FIG. 10 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various device hardware described herein, according to example embodiments.
[0039] FIG. 11 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to example embodiments.
DETAILED DESCRIPTION
[0040] It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and methods described with respect to FIGS. 1-8 may be implemented using any number of techniques, whether currently known or not yet in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
[0041] In the following description, reference is made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments that may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the inventive subject matter, and it is to be understood that other embodiments may be utilized, and that structural, logical, and electrical changes may be made without departing from the scope of the present disclosure. The following description of example embodiments is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.
[0042] As used herein, the terms “data accesses” or “ordered data accesses” refer to a stream of data entries used for accessing data objects at network-attached storage devices. An example data entry may include a data object identifier (e.g., an identifier, or ID, of at least one data object that the data entry is associated with), a data processing operation identifier (e.g., an identifier representing a data processing operation to be performed on the data object), and a sequence identifier for ordering data processing operations associated with the same data object identifier. Since the data entries (in the stream of data entries) may be ordered according to, e.g., the data object identifier, the stream of data entries is also referred to as ordered data accesses. As used herein, the term “data processing operation” includes at least one of a Create data processing operation (to generate a data object at a storage device), a Read data processing operation (for reading a data object), a Modify data processing operation such as an Append data processing operation (for appending data to an existing data object), and a Delete data processing operation (for deleting a data object).
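As a non-limiting illustration, the data entry layout described in the preceding paragraph may be sketched as a simple record; the field and operation names below are illustrative choices, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataEntry:
    """One entry in a stream of ordered data accesses (illustrative field names)."""
    object_id: str        # data object identifier
    operation: str        # data processing operation identifier, e.g., "CREATE", "READ", "APPEND", "DELETE"
    seq_num: int          # sequence identifier, orders operations on the same data object
    payload: bytes = b""  # data carried by the operation, if any

# Entries targeting the same data object identifier are ordered by their
# sequence identifiers, which is why the stream constitutes ordered data accesses.
entries = [
    DataEntry("D2", "APPEND", 2, b"world"),
    DataEntry("D2", "APPEND", 1, b"hello "),
]
ordered = sorted(entries, key=lambda e: e.seq_num)
```

In this sketch, sorting on `seq_num` recovers the intended execution order even when entries arrive out of order.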
[0043] As used herein, the terms “host,” “host computing device,”
“client,” “client computing device,” “computing device,” and “computer device” are used interchangeably and indicate network devices configured to access network-attached storage devices within a data processing network. As used herein, the terms “network-attached storage device” and “storage device” are interchangeable. As used herein, the term “redundancy group” indicates a grouping of storage devices that can be used for storing related versions of a data object (e.g., data object replicas or erasure-coded versions of a data object including data portions and parity bit portions).
[0044] Storage systems, most of which have inherent mapping units, are often associated with processing inefficiencies for streaming partial or unaligned data object updates. For example, any partial (e.g., less than a mapping unit such as a sector) updates to the storage device will be performed using read-modify-write operations to carry out the updates. During read-modify-write operations, an entire sector is read out from the storage device into a temporary buffer, the temporary data is modified with the partial update, and then the whole sector (including the updated portion) is written back to the storage device (an example read-modify-write operation is discussed in connection with FIG. 1). There are several drawbacks to this approach. First, the temporary data is outside of the storage device and thus may be volatile when stored in the temporary buffer (e.g., a memory buffer), and the new data may be susceptible to power loss, resulting in a data loss for all data stored in the temporary buffer. Second, if an acknowledgment for the data update is communicated upon writing the updated data back into the storage device, the acknowledgment delay can be significant since an acknowledgment may not be sent until the full sector is written back into the storage device.
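As a non-limiting illustration, the read-modify-write cycle described above may be sketched as follows, with the backing media modeled as a byte array and the sector size chosen arbitrarily:

```python
SECTOR_SIZE = 512  # illustrative mapping unit

def read_modify_write(device: bytearray, offset: int, update: bytes) -> None:
    """Apply a partial (sub-sector) update using a read-modify-write cycle."""
    sector = offset // SECTOR_SIZE
    start = sector * SECTOR_SIZE
    # 1. Read the entire sector into a temporary (volatile) buffer.
    temp = bytearray(device[start:start + SECTOR_SIZE])
    # 2. Modify only the updated portion within the buffer.
    within = offset - start
    temp[within:within + len(update)] = update
    # 3. Write the whole sector back, including the bytes that never changed.
    device[start:start + SECTOR_SIZE] = temp

device = bytearray(2 * SECTOR_SIZE)
read_modify_write(device, 5, b"abc")
```

Note that a 3-byte update causes a full 512-byte read and a full 512-byte write, which is the inefficiency the paragraph identifies.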
[0045] Storage systems may also be associated with processing inefficiencies when there are multiple partial updates to the same sector or adjacent sectors, and some of the updates are not sector aligned. A solution to this issue is to employ a coalescing buffer outside the storage device for temporarily holding the updates, coalescing the partial updates into one full sector, then flushing the full sector back to the device. Even though such processing removes the long read-modify-write cycle, additional inefficiencies are introduced. For example, some of the inefficiencies are associated with the coalescing buffer. First, the coalescing buffer is volatile, and the stored data can be lost in the case of power loss or a software error. Second, since the write operation cannot be acknowledged until the buffer is fully flushed to the storage device, a potential indefinite delay in acknowledging the update may be introduced. Third, for a misaligned update spanning across multiple sectors, extra logic is needed to ensure the atomicity of the update.
[0046] Processing inefficiencies also result from using a conventional naming scheme where each storage device namespace is identified by a namespace identifier (NSID), which is an integer selected from the set {0, 1, 2, ...} and assigned sequentially as the NSID for accessing the namespace. This naming scheme, however, is not scalable and does not provide data object naming that is unique and can be used for routing by network devices within a distributed storage network. Most distributed systems employ a naming service to map a system-wide unique name to a name local to a storage device.
Normally, this approach means an extra hop to the naming service for each access. Some systems use client-side caching to remember recently used mapping information and eliminate the extra hop to the naming service, but keeping such information coherent across the naming service and the clients incurs extra communication and protocol overhead.
[0047] Techniques disclosed herein can be used to perform processing of ordered data accesses that is agnostic to mapping unit size and may be used for efficiently processing partial or unaligned data object updates. The disclosed techniques also eliminate (or reduce) the read-modify-write operations for partial updates, reduce data movement and bus bandwidth utilization, and reduce latency for individual update operations. The disclosed techniques use a storage device (e.g., a hard disk drive or a solid-state drive) controller to process ordered data accesses including data entries for performing data object updates (e.g., partial and/or unaligned data object updates). More specifically, and as explained above, each of the data entries received as part of a stream of ordered data accesses may include a data object identifier (e.g., an ID of at least one data object that the data entry is associated with), a data processing operation identifier (e.g., an identifier representing a data processing operation to be performed on the data object, such as a Create, Read, Write, Append, Verify, Delete, or another data modification operation), and a sequence identifier (e.g., sequence identifier SeqNum, which can be used for ordering data processing operations associated with the same stream identifier). In this regard, a data entry received by the storage device controller is tagged with a unique ID for a particular data object and a sequence identifier (or number) (e.g., SeqNum) that can be used to order all the outstanding operations with regard to that data object. Additionally, the storage device controller is configured to manage an array of buffers (e.g., coalescing buffers or other types of power-protected buffers) to temporarily store the updates. With the help of the <ID, SeqNum> pair of each data entry with a data processing operation, the storage device controller can reconstruct the correct order of data processing operations and apply the operations to the data object in the correct order. As data processing operations are arranged and executed in the correct order, any operation that is a successor to a previously acknowledged operation can be acknowledged after its execution. When the power-protected buffer containing acknowledged updates exceeds the inherent mapping unit of the storage device (e.g., a sector), the buffer can be flushed out to make room for more operations. When flushing out the buffer, additional data processing, such as erasure coding, can be applied to the buffer before the flushing operation.
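As a non-limiting illustration, the controller behavior described above (reordering by sequence identifier and flushing once a full mapping unit of acknowledged updates has accumulated) may be sketched as follows; the class, the small sector size, and the in-memory stand-in for the power-protected buffer are all illustrative assumptions:

```python
import heapq

SECTOR_SIZE = 32  # deliberately small illustrative mapping unit

class StorageController:
    """Sketch of a controller that reorders out-of-order appends by their
    sequence identifiers, applies them in order, and flushes full sectors."""

    def __init__(self):
        self.pending = []          # min-heap keyed by sequence identifier
        self.next_seq = 1          # next sequence identifier expected
        self.buffer = bytearray()  # stands in for a power-protected buffer
        self.flushed = []          # sectors written out to persistent media

    def submit(self, seq_num: int, payload: bytes) -> None:
        heapq.heappush(self.pending, (seq_num, payload))
        # Apply every operation whose predecessors have all been applied.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, data = heapq.heappop(self.pending)
            self.buffer.extend(data)
            self.next_seq += 1
            # Flush whenever the buffer exceeds the inherent mapping unit.
            while len(self.buffer) >= SECTOR_SIZE:
                self.flushed.append(bytes(self.buffer[:SECTOR_SIZE]))
                del self.buffer[:SECTOR_SIZE]

ctrl = StorageController()
ctrl.submit(2, b"B" * 20)  # arrives early; held until its predecessor applies
ctrl.submit(1, b"A" * 20)  # unblocks both operations; one full sector flushes
```

In this sketch, the entry with sequence identifier 2 is buffered until entry 1 arrives, after which both are applied in order and the first full sector is flushed, leaving the remainder for later coalescing.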
[0048] The techniques disclosed herein can also be used in a distributed naming scheme for accessing network-attached storage devices (e.g., storage devices forming multiple redundancy groups) and performing data processing operations associated with data objects stored at such storage devices. In some embodiments, a naming server is used to provide a flexible naming service using data object naming sequences for addressing data objects stored at different locations in a distributed storage network. More specifically, the data object naming sequences are long enough to provide a unique reference globally within the distributed storage network, allowing for use of the naming sequence within individual storage devices. Additionally, the data object naming sequences include partition information on how the data object is partitioned within the distributed system (e.g., the hashing algorithm used and the logical partition/shard-to-physical-device mapping). Also, the data object naming sequences include availability information identifying multiple versions of the data object (e.g., the availability information may indicate the existence of replicas, erasure-coded versions, etc.). In an example embodiment, a data object naming sequence includes a data object name, sharding information (e.g., information associated with shard mapping used for selecting a sharding instance with one or more storage devices storing the data object), and availability information. In another embodiment, the disclosed techniques are used for performing data accesses (e.g., an Append data processing operation) from multiple clients in connection with data objects stored at multiple storage devices (e.g., storage devices within a redundancy group).
[0049] In some embodiments, a sequencer module is used to generate a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object at a network-attached storage device. The sequencer module may be configured as a centralized sequencer module generating multiple sequence identifiers used by multiple computer devices to access data objects (e.g., different copies of the data object stored at different storage devices). The sequencer module may also be configured as a distributed sequencer module associated with a specific computer device and generating sequence identifiers used by the specific computer device to access data objects at storage devices.
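As a non-limiting illustration, a sequencer module that hands out individual sequence identifiers or contiguous ranges of identifiers may be sketched as follows; the class and method names are illustrative assumptions:

```python
import threading

class Sequencer:
    """Sketch of a sequencer module generating monotonically increasing
    sequence identifiers, either singly or as contiguous ranges."""

    def __init__(self, start: int = 1):
        self._lock = threading.Lock()  # serialize concurrent client requests
        self._next = start

    def next_id(self) -> int:
        """Issue a single sequence identifier."""
        with self._lock:
            n = self._next
            self._next += 1
            return n

    def next_range(self, count: int) -> range:
        """Reserve `count` consecutive identifiers for one requesting device."""
        with self._lock:
            first = self._next
            self._next += count
            return range(first, first + count)
```

Because identifiers are issued under a lock, a centralized sequencer can serve multiple computer devices while guaranteeing that no two data entries for the same object receive the same sequence identifier.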
[0050] An overview of a distributed storage network using the disclosed naming scheme and a sequencer module is provided in connection with FIG. 1.
A more detailed view of data object naming sequences is illustrated in FIG. 2. A more detailed description of processing data entries associated with a data object identifier is provided in connection with FIG. 3A-FIG. 4. A more detailed description of processing data accesses to redundancy groups of storage devices using the disclosed techniques is provided in connection with FIG. 5 - FIG. 9.
[0051] The discussed techniques for processing data accesses can be implemented, for example, using firmware that is executed by one or more processors on a computer device (e.g., a dedicated network node), or via hardware (e.g., multiple processors associated with multiple computer devices) configured to perform the disclosed functions and facilitate the processing of the data accesses.
[0052] Existing prior art techniques for processing ordered data accesses do not use the disclosed data entry format (including a data object identifier, a data processing operation identifier, and a sequence identifier) to process partial/unaligned data accesses to persistent media. Existing prior art techniques also do not perform the disclosed techniques for processing data accesses to redundancy groups (e.g., pre-processing of erasure-coding operations outside of the storage devices in connection with Append data processing operations or additional erasure-coding operations inside the buffer of the storage controller), such as illustrated in FIG. 1 and FIGS. 4-8.
[0053] FIG. 1 is a block diagram illustrating a network architecture 100 including a sequencer generating sequence identifiers, a naming server generating data object naming sequences, and multiple computing devices configured to access multiple network-attached storage devices based on the data object naming sequences and the sequence identifiers, according to some embodiments. Referring to FIG. 1, the network architecture 100 (which also can be referred to as distributed storage network 100) includes a plurality of computer devices such as computer devices 104, 106, ..., 108 configured to access storage devices 128, 130, 132, ..., 134 via network routers 112, 114, network switches 116, 118, and network interconnect 160 associated with the communication network 110. In some aspects, storage devices 128, 130, 132,
..., 134 are communicatively coupled to computer devices 104, 106, ..., 108 via network routers 112, 114 and network interconnect 160 associated with the communication network 110. The network interconnect 160 is part of the communication network 110, which includes the network routers 112, 114 and network switches 116, 118. The network interconnect 160 includes a Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) interface, a Peripheral Component Interconnect express (PCIe) interface, or another type of interconnect.
[0054] In some embodiments, the network architecture 100 includes a sequencer module 162 which can be used for generating sequence identifiers for other computer devices (e.g., computer devices 104, 106, ..., 108) in connection with ordered access to one or more data objects stored at storage devices 128 - 134. More specifically, the sequencer module 162 is configured to generate individual sequence identifiers or a range of sequence identifiers for accessing a data object based on a request from one or more of the computer devices 104 - 108.
[0055] The network architecture 100 further includes a naming server
102 communicatively coupled to computer devices 104, ..., 108, storage devices 128, ..., 134, and network switches 116, 118. The naming server 102 comprises suitable circuitry, logic, interfaces, and/or code and is configured to perform data object naming functionalities discussed herein. More specifically, the naming server 102 is configured to generate data object naming sequences 148 upon
creation of a data object (e.g., using a Create data processing operation 150). Additionally, the naming server 102 is further configured to update the data object naming sequences 148 based on the execution of other data processing operations, such as a Read data processing operation 154, an Append data processing operation 156, or a Delete data processing operation 152. In an example embodiment, the naming server 102 uses a shard mapping 144 to determine the sharding information associated with the data object naming sequences for a data object.
[0056] As used herein, the term “shard mapping” indicates a mapping of a sharding instance identifier to a set of storage devices and a hashing function (collectively referred to as a sharding instance). In this regard, sharding information such as a sharding instance identifier can be used as an input into the shard mapping to obtain the set of storage devices which can be used for performing one or more data processing operations. In other aspects, the sharding information is a tuple including the sharding instance identifier together with an input value such as hashing input information (also referred to as a shard information tuple, or SIT), which is used as input into the shard mapping to obtain the set of storage devices (e.g., using the sharding instance identifier to select a sharding instance) as well as select a specific storage device from the set (e.g., by applying the hashing function to the hashing input information). In an example embodiment, a sharding instance includes a list of storage devices and the input value is used for selecting one of the storage devices to receive a corresponding data processing operation. In other aspects, the sharding instance includes a list of redundancy groups of storage devices (e.g., each redundancy group includes two or more storage devices) and the input value is used for selecting one of the redundancy groups so that storage devices within the selected redundancy group receive the corresponding data processing operation. Example communication flows associated with data processing operations using data object naming sequences and shard mappings are illustrated in connection with FIG. 5 - FIG. 8.
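As a non-limiting illustration, resolving a shard information tuple against a shard mapping may be sketched as follows; the mapping contents, the device names, and the choice of hashing function are illustrative assumptions (a stable hash of any kind could serve):

```python
import hashlib

# Hypothetical shard mapping: sharding instance identifier -> sharding instance,
# where each instance holds a list of redundancy groups of storage devices.
SHARD_MAPPING = {
    "shard-A": {"groups": [["dev128", "dev130"], ["dev132", "dev134"]]},
}

def pick_bucket(data: bytes, buckets: int) -> int:
    """Stable stand-in hashing function used to select one group."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % buckets

def resolve(instance_id: str, hash_input: bytes) -> list:
    """Resolve a shard information tuple (sharding instance identifier plus
    hashing input) to the redundancy group that receives the operation."""
    groups = SHARD_MAPPING[instance_id]["groups"]
    return groups[pick_bucket(hash_input, len(groups))]

devices = resolve("shard-A", b"D2")
```

In this sketch, the sharding instance identifier selects the sharding instance, and the hashing input deterministically selects one redundancy group within it, so every device holding a version of the data object sees the same routing decision.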
[0057] In other embodiments, the network switch 116 is configured with a shard mapping 146 which can be used to determine sharding information when performing an Append data processing operation 156 or a Read data processing
operation 154. In aspects when the network switch 116 is configured with the shard mapping 146, the Create data processing operation 150 and the Delete data processing operation 152 are performed by the naming server 102 using shard mapping 144. Since shard mapping 144 and shard mapping 146 may be updated during the execution of data processing operations, shard mapping 144 and shard mapping 146 are synchronized with each other via communication link 158.
[0058] Even though FIG. 1 illustrates the use of shard mappings and data object naming sequences by the naming server 102 and the network switch 116, the disclosure is not limited in this regard and such functionalities can be consolidated within a single computer device using a data entry processing module performing the disclosed functionalities of the naming server 102, the network switch 116, the sequencer module 162, as well as data object processing functions of a storage controller (e.g., the processing functions performed by one or more of storage controllers 136 - 142 of storage devices 128 - 134). Example software architecture and a computer device using such a data entry processing module are illustrated in FIG. 10 and FIG. 11, respectively.
[0059] Computer devices 104-108 as well as the naming server 102 can be mobile devices or another type of computing device such as a compute node or a storage node in the network architecture 100. Storage devices 128, 130,
132, ..., 134 include solid-state drives (SSDs), hard drives (HDs), or another type of storage device, and are configured to store data objects accessed by the naming server 102 or by computer devices 104, ..., 108 via the network switch 116 and the network interconnect 160. For example, different copies of data objects are stored at different storage devices for redundancy or error coding purposes as indicated by the availability information in the corresponding data object naming sequence associated with a data object.
[0060] As illustrated in FIG. 1, three erasure-coded (EC) versions of data object D1 (e.g., data portions D1.E1/2, D1.E2/2, and parity bits D1.EP) are stored at corresponding storage devices 128, 130, and 132, and are accessed via the network switch 116. In this regard, the availability information for data object D1 includes 2+1 EC information reflected by the suffix E1/2 (e.g., data portion 1 of 2), suffix E2/2 (e.g., data portion 2 of 2), and suffix EP (e.g., parity bits portion).
[0061] Two replica versions of data object D2 (e.g., D2.R1 and D2.R2) are stored at corresponding storage devices 130 and 132 and are accessed via the network switch 116. In this regard, the availability information for data object D2 includes replica (or copy) information reflected by the suffix R1 (e.g., the first replica of object D2) and suffix R2 (e.g., the second replica of object D2).
[0062] Additionally, two replica versions of data object D3 (e.g., D3.R1 and D3.R2) are stored at corresponding storage devices 132 and 134 and are accessed via the network switch 116. In this regard, the availability information for data object D3 includes replica (or copy) information reflected by the suffix R1 (e.g., the first replica of object D3) and suffix R2 (e.g., the second replica of object D3).
[0063] In an example embodiment, storage devices 128, 130, 132, ...,
134 include corresponding storage controllers 136, 138, 140, ..., 142. Each of the storage controllers 136 - 142 comprises suitable circuitry, logic, interfaces, and/or code and is configured to manage access to the storage devices 128 - 134, including configuring and managing operations for processing data entries in connection with ordered access to data objects as well as managing operations associated with the distributed naming scheme discussed herein (e.g., as described in connection with FIG. 3A, FIG. 3B, and FIG. 4). For example, the storage controllers 136-142 are configured to perform operations for processing ordered data accesses using one or more power-protected buffers (e.g., storage areas 172 and 178 in FIG. 1 associated with power-protected buffer 313), such as coalescing buffers or other types of power-protected buffers. As used herein, the term “power-protected buffer” indicates a buffer implemented as part of a storage device or non-volatile memory, which buffer persists any stored information upon power failure. The power-protected buffers include multiple storage areas that can be sized to match a mapping unit of the corresponding storage device (e.g., sector 176 of the storage device 130 and sector 182 of the storage device 132). For example, the power-protected buffers in storage devices 130 and 132 include corresponding storage areas 172 and 178 (e.g., sector-sized storage areas) which match the size of the mapping unit (namely, sectors 176 and 182) of the storage devices 130 and 132 respectively.
[0064] FIG. 1 further illustrates an example communication flow (which is also discussed in greater detail in connection with FIG. 5) of a stream of data entries for performing the Append data processing operation to multiple replicas of the same data object from two client devices (e.g., computer devices 104 and 106). More specifically, computer device 104 communicates a stream 164 of data entries to switch 116 via router 112, and computer device 106 communicates a stream 166 of data entries to switch 116 via router 112. Each of the data entries in streams 164 and 166 can include a data processing operation identifier to identify the Append data processing operation 156 (or another data processing operation) to be performed in connection with replica versions R1 and R2 for data object D2 stored at storage devices 130 and 132. The network switch 116 uses shard mapping 146 to determine a redundancy group of storage devices associated with the requested data accesses of the data entries within streams 164 and 166. For example, the network switch 116 uses sharding information within the data entries of streams 164 and 166 to select a sharding instance of redundancy groups and select one of the redundancy groups (e.g., a redundancy group including storage devices 130 and 132) for communication of the data entries of streams 164 and 166. Data entries of streams 164 and 166 are multiplexed by the switch 116 and communicated as streams 168 and 170 of data entries to corresponding storage devices 130 and 132. The data entries in streams 164 and 166 include corresponding data object naming sequences (e.g., such as data object naming sequences 148 illustrated in FIG. 2).
At storage device 130, storage controller 138 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 168 to rearrange the data entries as stream 174 (e.g., sequentially, based on the sequence identifier) using a buffer including the storage area 172. When the storage area 172 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D2.R1 associated with the data
entries) and the result is flushed to a mapping unit of the storage device 130, such as sector 176.
[0065] Similarly, at storage device 132, storage controller 140 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 170 to rearrange the data entries as stream 180 (e.g., sequentially, based on the sequence identifier) in a buffer including the sector-sized storage area 178. When the sector-sized storage area 178 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D2.R2 associated with the data entries) and the result is flushed to a mapping unit of the storage device 132, such as sector 182.
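As a non-limiting illustration, the switch-side fan-out described in the two preceding paragraphs may be sketched as follows. For brevity the sketch merges the client streams by sequence identifier at the switch before forwarding, whereas in the disclosed embodiments the per-device reordering is performed by the storage controller; the function and device names are illustrative assumptions:

```python
def multiplex_and_fanout(streams, redundancy_group):
    """Merge per-client data entry streams by sequence identifier and forward
    the merged stream to every storage device in the redundancy group."""
    merged = sorted(
        (entry for stream in streams for entry in stream),
        key=lambda e: e["seq"],
    )
    # Every device holding a replica of the data object receives the same
    # ordered stream, so all replicas apply the appends in the same order.
    return {device: list(merged) for device in redundancy_group}

# Two clients each contribute one Append entry for the same data object D2.
out = multiplex_and_fanout(
    [[{"obj": "D2", "op": "APPEND", "seq": 2, "data": b"world"}],
     [{"obj": "D2", "op": "APPEND", "seq": 1, "data": b"hello "}]],
    ["dev130", "dev132"],
)
```

Because each device in the redundancy group receives an identical ordered stream, the replicas remain consistent after the appends are applied.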
[0066] FIG. 2 is a block diagram illustrating example data object naming sequences 148, according to some embodiments. Referring to FIG. 2, the data object naming sequences 148 include data object naming sequence (DONS) 200,
..., DONS 201. The DONS 200 includes a data object identifier 202 (e.g., a data object name), sharding information 204, and availability information 206. The availability information 206 may include one of the following: erasure coding information 208, replica information 210, reliability information 212, and service level agreement (SLA) information 214. Similarly, the DONS 201 includes a data object name 216, sharding information 218, and availability information 220. The availability information 220 may include one of the following: erasure coding information 222, replica information 224, reliability information 226, and SLA information 228.
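As a non-limiting illustration, a data object naming sequence with the three constituent parts described above may be sketched as a record; the field names and example values are illustrative assumptions, not the encoding used in the disclosure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataObjectNamingSequence:
    """Sketch of a DONS: globally unique name, sharding information, and
    availability information (illustrative field names)."""
    object_name: str             # data object identifier, e.g., "D1"
    shard_instance_id: str       # selects a sharding instance in the shard mapping
    hash_input: Optional[bytes]  # with the instance ID, forms a shard information tuple
    availability: str            # e.g., "E1/2", "EP", or "R1/3"

dons = DataObjectNamingSequence("D1", "shard-A", b"D1", "E1/2")
```

Such a record carries everything a network device needs to route an access: the name identifies the object, the sharding fields drive the shard mapping lookup, and the availability field identifies which version of the object the target device holds.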
[0067] As explained above in connection with FIG. 1 , the sharding information 204 may include a sharding instance identifier which is used as an input into the shard mapping to obtain a sharding instance with the set of storage devices that can be used for performing one or more data processing operations. In other aspects, the sharding information 204 is a shard information tuple (or
SIT) including the sharding instance identifier and an input value such as hashing input information. The input value is used as input into the shard mapping to obtain the set of storage devices (e.g., using the sharding instance identifier to select a sharding instance) as well as select a specific storage device from the set (e.g., by applying the hashing function to the hashing input information).
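The two-level lookup described above (sharding instance identifier selects a set of storage devices, the hashing input selects one device from that set) can be sketched in Python as follows. The mapping contents, device names, and function name are illustrative assumptions for this sketch, not part of the disclosed embodiments:

```python
import hashlib

# Hypothetical shard mapping: each sharding instance identifier maps to a
# subset of the available storage devices plus a hashing function.
SHARD_MAPPING = {
    "SHARD_1": {"devices": ["dev_128", "dev_130", "dev_132"],
                "hash_fn": hashlib.sha256},
    "SHARD_2": {"devices": ["dev_130", "dev_132"],
                "hash_fn": hashlib.sha256},
}

def resolve_shard(shard_instance_id: str, hashing_input: str) -> str:
    """Resolve a shard information tuple (SIT) to one storage device.

    The sharding instance identifier selects the sharding instance (and its
    set of storage devices); the hashing input is hashed to pick a specific
    device from that set.
    """
    instance = SHARD_MAPPING[shard_instance_id]
    digest = instance["hash_fn"](hashing_input.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(instance["devices"])
    return instance["devices"][index]
```

For a fixed SIT such as ("SHARD_1", "X"), the lookup is deterministic, so every client resolving the same tuple reaches the same storage device.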
[0068] The availability information 206 used as part of the DONS is used to describe the relationship between different versions of a data object residing on multiple storage devices, which versions belong to the same logical data object identified by the data object identifier 202. In some embodiments, the availability information portion of the DONS can be used to provide availability in the case of device failures and to facilitate data recovery. The erasure coding information 208 is described in connection with FIG. 1 and may include different suffixes (e.g., EC1/2, EC2/2, ECP/2, ECQ/2) indicating whether the specific erasure-coded data object version is a data portion or a parity bits portion based on the EC type that was used when the data object was created. The replica information 210 is described in connection with FIG. 1 and may include suffixes (e.g., R1/3, R2/3, R3/3) indicating the total number of replicas (or copies) and the specific replica number out of the total number of replicas (e.g., suffix R1/3 indicates the data object associated with the DONS having this availability information suffix is the first replica out of a total of three replicas).
[0069] The reliability information 212 indicates the reliability of the storage device for accessing the particular data object. For example, a hard disk drive (HDD) may be associated with a reliability of 0.99999s (or 5_9s) and a data object stored at such HDD may have availability information represented as suffix 5_9s. A solid-state drive (SSD) may be associated with a reliability of 0.9999999s (or 7_9s) and a data object stored at such SSD may have availability information represented as suffix 7_9s.
[0070] The SLA information 214 indicates the latency associated with accessing a specific data object associated with a DONS using the SLA information. For example, the SLA information may include the following SLA
suffixes: .10us, .100us, .1ms, and .10ms, which indicate the access latency associated with a particular data object.
[0071] The following is an example to illustrate the use of a DONS in the disclosed distributed naming scheme. An example DONS can be D1.Shard1_X.R1/3, which can be used for determining the following information: (a) the data object name is D1; (b) the sharding information includes SIT consisting of a sharding instance identifier Shard1 and hashing input information X; and (c) availability information is R1/3 indicating the particular data object associated with this DONS is a first replica (or copy) out of a total of three replicas of the data object.
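The decomposition of the example DONS above can be sketched as a small parsing helper. The function name, dictionary keys, and the assumption of a single "_" separator in the sharding component are illustrative, not the patent's API:

```python
def parse_dons(dons: str) -> dict:
    """Split a DONS of the form <name>.<shard_id>_<hash_input>.<availability>
    into its three components.

    Example: "D1.Shard1_X.R1/3" -> data object name D1, SIT {Shard1, X},
    availability R1/3 (first replica out of three).
    """
    name, sharding, availability = dons.split(".")
    # Assumes the SIT component uses a single "_" between the sharding
    # instance identifier and the hashing input information.
    shard_id, hash_input = sharding.split("_", 1)
    return {
        "data_object_name": name,
        "sharding_instance_id": shard_id,
        "hashing_input": hash_input,
        "availability": availability,
    }
```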
[0072] Even though FIG. 2 illustrates four different types of availability information 206, the disclosure is not limited in this regard and other types of availability information may be used as well.
[0073] FIG. 3A is a block diagram 300A illustrating processing the plurality of data entries 312-318 associated with stream 174 of data accesses, according to example embodiments. Each of the data entries 312 - 318 is configured to include a data object identifier of a data object the data entry is associated with, a data processing operation identifier (e.g., an identifier representing a data processing operation to be performed on the data object), a data packet (e.g., data for use with an Append data processing operation or another data modification operation), and a sequence identifier. For example, each of the data entries 312 - 318 includes the data object identifier (ID) 302 which identifies a data object (e.g., D2) the particular data entries are associated with. Even though FIG. 3A illustrates the data object ID 302 as a separate element outside of the data entries, such illustration is for simplicity, and each data entry 312 - 318 includes the data object ID 302. Data entries 312, 314, 316, and 318 also include corresponding sequence identifiers (or sequence numbers) (referenced interchangeably as SEQ or SeqNum) 304, 306, 308, and 310, corresponding data packets 1, 2, 3, and 4, as well as corresponding data processing operation identifiers (also referred to as data processing commands or CMD) 305, 307, 309, and 311 to be performed using the corresponding data packets 1 - 4. The sequence identifiers 304 - 310 indicate the order in which the corresponding data processing operations 305 - 311 (for data entries within
stream 174 that are associated with data object ID 302) should be executed. In an example embodiment, the data processing operation is a Create data processing operation, a Read data processing operation, a Write data processing operation, an Append data processing operation, a Verify data processing operation, a Delete data processing operation, or another data modification operation to be performed on the identified data object.
[0074] In an example embodiment, each of the data entries 312 - 318 of the stream 174 further includes a sequence validator function (or SEQ VLDTR) 303. The sequence validator function 303 is used to ensure that there are no duplicate data entries within a stream of ordered data accesses. For example, the storage controller 138 can use the sequence validator function 303 to detect that one of the data entries 312 - 318 of the stream 174 is older than a corresponding flushed data entry stored within storage device 130. Such detection would indicate that the incoming entry is a duplicate and the storage controller 138 can discard the incoming entry.
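The duplicate check performed with the sequence validator function can be sketched as follows. The class and method names are assumptions for illustration; the key idea from the paragraph above is that an incoming entry whose sequence identifier is not newer than the corresponding flushed entry for the same data object is a duplicate and can be discarded:

```python
class SequenceValidator:
    """Minimal sketch of a sequence-validator check: track, per data object
    identifier, the newest sequence identifier already flushed, and flag
    incoming entries that are not newer as duplicates."""

    def __init__(self):
        # data object ID -> newest flushed sequence identifier
        self._last_flushed = {}

    def is_duplicate(self, obj_id: str, seq: int) -> bool:
        """An entry older than (or equal to) the last flushed entry for the
        same data object is a duplicate."""
        return seq <= self._last_flushed.get(obj_id, -1)

    def record_flush(self, obj_id: str, seq: int) -> None:
        """Remember the newest sequence identifier flushed for obj_id."""
        self._last_flushed[obj_id] = max(
            seq, self._last_flushed.get(obj_id, -1))
```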
[0075] As data entries 312 - 318 of the stream 174 are detected by the storage controller 138 as associated with the same data object ID 302, the storage controller 138 stores the corresponding data packets 1 - 4 in one or more storage areas (e.g., storage area 172 which may be sector-sized) of the power-protected buffer 313. In an example embodiment, data packets 1 - 4 are stored in the storage area 172 of the power-protected buffer 313 sequentially, based on the corresponding sequence identifiers 304 - 310. In this way, the data packets 1 - 4 are stored in the correct order inside the storage area 172, and the storage controller 138 can apply corresponding data processing operations 305 - 311 in the order the data entries are arranged using the sequence identifiers 304 - 310. More specifically, the storage controller 138 executes the data processing operation for each of the data entries 312-318 according to an execution order based on the corresponding sequence identifier. As data processing operations are executed in order according to the sequence identifiers associated with the same data object ID, an acknowledgment can be communicated upon completion of the execution of each data processing operation. Example processing of data entries associated with multiple data streams of ordered data accesses is discussed in connection with FIG. 3B.
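The reordering-then-execution behavior described above can be sketched in Python. The entry structure and the helper name are assumptions; the sketch only covers the Append case, where applying the operations in sequence order means the packets land in the buffer in sequence order regardless of arrival order:

```python
from dataclasses import dataclass

@dataclass
class DataEntry:
    obj_id: str    # data object identifier (e.g., "D2")
    seq: int       # sequence identifier
    cmd: str       # data processing operation identifier, e.g. "APPEND"
    packet: bytes  # data packet

def execute_in_order(entries, buffer: bytearray) -> bytearray:
    """Arrange data entries by sequence identifier, then apply each Append
    operation, so packets are placed in the storage area in the order
    dictated by the sequence identifiers rather than arrival order."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.cmd == "APPEND":
            buffer.extend(entry.packet)
    return buffer
```

For example, entries arriving as (seq 2, packet "b") then (seq 1, packet "a") still yield the buffer contents "ab".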
[0076] The storage area 172 is a sector-sized storage area, matching the mapping unit (namely, a sector such as sector 176) of the storage device 130. Additionally, as data packets associated with incoming data entries are stored in storage area 172, the last stored data packet may be partial. For example, as illustrated in FIG. 3A and FIG. 3B, data packet 3 may be only partially stored inside the storage area 172, and a remaining portion of data packet 3 may be stored in another storage area of the power-protected buffer 313 managed by the storage controller 138 (or within the same storage area after the storage controller 138 flushes the stored data into sector 176 of the storage device 130).
[0077] After the data processing operations associated with data entries
312 - 318 are executed, storage area 172 contains corresponding data packets associated with acknowledged updates. The storage controller 138 is further configured to detect when a size of stored data packets in the storage area 172 is equal to the size of the mapping unit of the storage device 130 (namely, when the size of the storage area 172 equals the size of a sector of the storage device 130), and flush the contents of the storage area 172 from the power-protected buffer 313 into the sector 176 of the storage device 130 based on the detecting.
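The flush-on-sector-full behavior above can be sketched as follows, assuming a 4KB mapping unit. The class name and the list standing in for device sectors are illustrative assumptions; note that a partial remainder stays in the buffer, matching the partial-packet case described in paragraph [0076]:

```python
SECTOR_SIZE = 4096  # assumed mapping-unit (sector) size of the storage device

class PowerProtectedBuffer:
    """Sketch of a power-protected buffer: data packets accumulate in a
    storage area; once the stored bytes reach the mapping-unit size, one
    full sector is flushed to the device and any partial remainder stays
    in the buffer."""

    def __init__(self, device_sectors: list):
        self.area = bytearray()
        self.device_sectors = device_sectors  # stand-in for device sectors

    def append(self, packet: bytes) -> None:
        self.area.extend(packet)
        # Flush whole sectors; keep any partial remainder buffered.
        while len(self.area) >= SECTOR_SIZE:
            self.device_sectors.append(bytes(self.area[:SECTOR_SIZE]))
            del self.area[:SECTOR_SIZE]
```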
[0078] FIG. 3B is a block diagram 300B illustrating processing a plurality of data entries associated with multiple data object identifiers, according to example embodiments. Referring to FIG. 3B, there are illustrated data entries associated with multiple data streams received at the storage device 128. For example, data entries 322, 324, and 326 are associated with stream 344, along with a data object ID (ID_1) 320, while data entries 330, 332, 334, and 336 are associated with stream 346, along with a data object ID (ID_2) 328.
[0079] In an example embodiment when streams of ordered data accesses associated with different data object identifiers are received at storage device 128, the storage controller 138 is configured to store data packets associated with the same data object ID in the same storage area inside the power-protected buffer 313. For example, data packets 1, 2, and 3 of data entries 322, 324, and 326 respectively are all stored in storage area 172 associated with data object ID 320 of stream 344. Similarly, data packets 4, 5, 6, and 7 of data entries 330, 332, 334, 336 respectively are all stored in storage area 173 associated with data object ID 328 of stream 346.
[0080] In an example embodiment, different data entries associated with different data object identifiers can be received sequentially at storage device 128 (e.g., data entries of streams 344 and 346 are received out of order and at different times). The storage controller 138 is configured to detect the data object identifier of each received data entry and store the corresponding data packet in a corresponding storage area associated with the data object identifier of the data entry based on the sequence identifier of the data entry. As data packets are arranged sequentially inside the corresponding storage areas, the data processing operation indicated by each data entry can be executed and an acknowledgment can be communicated. In this regard, storage controller 138 is configured to process a subset of a set of received data entries, where the subset is associated with the same data object identifier. Additionally, the data packets stored in storage areas 172 and 173, after the data processing operations have been executed, are flushed into sectors 338 and 340 of the storage device 130, based on storage areas 172 and 173 reaching a sector size of stored data.
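The per-object routing described above can be sketched as follows: interleaved entries from multiple streams are sorted into separate storage areas keyed by data object identifier, and within each area ordered by sequence identifier. The tuple layout and function name are assumptions for this sketch:

```python
from collections import defaultdict

def route_packets(entries):
    """Route data entries (obj_id, seq, packet) from interleaved streams
    into per-data-object storage areas, each arranged by sequence
    identifier."""
    areas = defaultdict(list)
    for obj_id, seq, packet in entries:
        areas[obj_id].append((seq, packet))
    # Arrange each area sequentially by sequence identifier.
    return {obj_id: b"".join(p for _, p in sorted(pkts))
            for obj_id, pkts in areas.items()}
```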
[0081] FIG. 4 is a block diagram illustrating a storage controller 138 configured to perform erasure coding operations 402 and 404 on streaming data within a storage device 130 to generate different parity values, according to example embodiments. Referring to FIG. 4, the plurality of data entries 424 - 430 are associated with received stream 401 of data entries (which can be the same as stream 168 received at storage device 130). Each of the data entries 424 - 430 includes a data object identifier (IDx) of a data object the data entry is associated with, a data processing operation identifier (CMD which can identify an Append data processing operation), a data packet (e.g., 1, 2, 3, or 4), and a sequence identifier (SEQ). For example, each of the data entries 424 - 430 includes the data object identifier (IDx such as IDx 422) which identifies a data object the particular data entries of stream 401 are associated with. In an example embodiment, each of the data entries can include a data object naming sequence (as discussed in connection with FIG. 1 and FIG. 2) which can be used together with the data packet and the sequence identifier to execute/perform the data processing operation associated with the data processing operation identifier.
[0082] As illustrated in FIG. 4, data packets from corresponding data entries are stored in the storage area 406 of a power-protected buffer (e.g., buffer 313), and are managed by the storage controller 138. In an example embodiment, the storage controller 138 is configured to generate data chunks (or data slices) 408, 410, and 412 using data stored in storage area 406 (the stored data being also referred to in FIG. 4 as “trunk”).
[0083] Erasure coding may be performed on data that is larger than a mapping unit of a storage device. For example, a 3+2 erasure coding on a 4KB mapping unit device will use 3*4K=12KB data to work on, then generate five pieces of data (or data chunks), 4KB each, which can be stored on five storage devices for redundancy. As used herein, “3+2” indicates three pieces of data and two pieces of error correction coded data (or a parity value). Similarly, “x+y” erasure coding indicates x pieces of data and y pieces of error correction coded data (or a parity value).
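The size arithmetic of the x+y scheme above can be captured in a small helper; the function name and dictionary keys are illustrative:

```python
def erasure_coding_layout(x: int, y: int, mapping_unit: int) -> dict:
    """Arithmetic for an x+y erasure coding scheme: x data pieces and y
    error-correction pieces, each of mapping-unit size, are produced from
    x * mapping_unit bytes of input data."""
    return {
        "input_bytes": x * mapping_unit,   # e.g., 3 * 4KB = 12KB for 3+2
        "total_pieces": x + y,             # e.g., 5 pieces for 3+2
        "piece_bytes": mapping_unit,       # each piece matches one sector
    }
```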
[0084] In the example erasure coding operation 402, the storage controller 138 generates a parity value P 418 using data stored in storage area 406. In the example erasure coding operation 404, the storage controller 138 generates a parity value Q 419 using data stored in storage area 406. In an example embodiment, parity values P and Q may be generated using the same data in storage area 406, by applying parity value generation techniques associated with erasure coding techniques. As the parity values P 418 and Q 419 are generated, they may be stored in sector 420 of the storage device 130.
[0085] In an example embodiment, the storage controller 138 detects that a size of data entries stored in the storage area 406 of the power-protected buffer is equal to a predefined size associated with an erasure coding operation (e.g., the data consumed by such an operation is larger than the size of a mapping unit, such as a sector of the storage device 130; for example, 12KB in the 3+2 erasure coding scheme when the mapping unit is 4KB). In some aspects, the storage controller 138 applies the erasure coding operation (e.g., a logical operation, a byte-shifting operation, or another type of erasure coding operation) to data entries stored in the storage area 406 to generate data slices. The storage controller 138 generates a parity value (e.g., parity value P 418 or Q 419) based
on the data slices, and stores the data slices or the parity value in the mapping unit (e.g., sector 420) of the storage device 130.
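One common way a P parity value can be generated from equal-sized data slices is a byte-wise XOR (a logical operation in the sense of the paragraph above). This is a sketch, not the patent's specified method; a Q parity value would additionally require Galois-field coefficients, which are omitted here:

```python
def xor_parity(slices: list) -> bytes:
    """Compute a simple P-style parity: byte-wise XOR across equal-sized
    data slices. Losing any one slice allows it to be rebuilt by XOR-ing
    the parity with the surviving slices."""
    parity = bytearray(len(slices[0]))
    for s in slices:
        for i, byte in enumerate(s):
            parity[i] ^= byte
    return bytes(parity)
```

Because XOR is its own inverse, `xor_parity([p, b, c])` recovers slice `a` when `p = xor_parity([a, b, c])`.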
[0086] FIG. 5 is a communication flow diagram 500 of example communications associated with the execution of an Append data processing operation for data object replicas in the distributed storage network of FIG. 1 using data object naming sequences and shard mapping to multiple storage devices in a redundancy group, according to some embodiments. Referring to FIG. 5, the communication flow takes place between computer device 104, computer device 106, the network switch 116, and a plurality of redundancy groups 501, 503, 505 of storage devices. The first redundancy group 501 (or RG1) includes storage devices 128 (or A), 130 (or B), and 132 (or C). The second redundancy group 503 (or RG2) includes storage devices B and C. The third redundancy group 505 (or RG3) includes storage devices C and 134 (or N).
[0087] Initially, computer device 104 and computer device 106 communicate a request to the sequencer module 162 for sequence identifiers for accessing a data object associated with the data object identifier NAME1. In response, the sequencer module 162 provides sequence identifiers SEQ1 502 and SEQ2 504 to computer devices 104 and 106 respectively. Computer device 104 communicates a request 506 to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA1) for appending the data object, and sequence identifier SEQ1 obtained from the sequencer module 162. In some aspects, request 506 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
[0088] The network switch 116 includes a shard mapping 146 with sharding instances. Each of the sharding instances of the shard mapping 146 corresponds to a redundancy group (RG) (or a subset) of storage devices of a set of available storage devices (e.g., set of available storage devices 128 - 134) and a hashing function. For example, a first sharding instance of the shard mapping 146 can refer to RG1, a second sharding instance of the shard mapping 146 can
refer to RG2, and a third sharding instance of the shard mapping 146 can refer to RG3. At operation 508, the network switch 116 performs a shard mapping based on the sharding instance identifier and the hashing input information received with the request 506. More specifically, the sharding instance identifier is mapped to the first sharding instance and the hashing function (e.g., SHA256) of the first sharding instance is applied to the hashing input information to obtain RG2 (e.g., storage devices 130 and 132) as the receiving RG for the Append data processing operation of request 506.
[0089] The network switch 116 generates a modified (or updated) request 510 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage device 130. More specifically, the original request 506 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage device 130 as well as the corresponding availability information. For example, the original request 506 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA1) for appending, and sequence identifier SEQ1. The network switch 116 uses shard mapping 146 and selects a sharding instance for RG2 based on the sharding instance identifier SHARD_1. The network switch 116 generates the modified request 510 for communication to the storage device 130 and modified request 514 for communication to the storage device 132. The modified request 510 for the Append data processing operation for the storage device 130 will include DONS with the data object identifier NAME1, the SIT (e.g.,
{SHARD_1, X}), availability information (e.g., suffix R1/2 identifying a first replica version of the data object), data for appending (e.g., DATA1), and sequence identifier (e.g., SEQ1). The modified request 514 for the Append data processing operation for the storage device 132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), availability information (e.g., suffix R2/2 identifying a second replica version of the data object), data for appending (e.g., DATA1), and sequence identifier (e.g., SEQ1).
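The switch's expansion of one original request into per-replica modified requests, each carrying a DONS with an Rk/n availability suffix, can be sketched as follows. The function name and dictionary keys are assumptions, not the disclosed message format:

```python
def fan_out_request(request: dict, replica_count: int) -> list:
    """Expand an original Append request into one modified request per
    replica, each including a DONS of the form
    <name>.<shard_id>_<hash_input>.Rk/n (k-th replica out of n)."""
    modified = []
    for k in range(1, replica_count + 1):
        req = dict(request)  # copy the original request fields
        req["dons"] = (f'{request["name"]}.'
                       f'{request["shard_id"]}_{request["hash_input"]}.'
                       f'R{k}/{replica_count}')
        modified.append(req)
    return modified
```

For a request naming NAME1 with SIT {SHARD_1, X} and two replicas, this yields two modified requests whose DONS values end in R1/2 and R2/2, one per storage device in the redundancy group.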
[0090] The modified requests 510 and 514 for the Append data processing operation are communicated to storage devices 130 and 132
respectively for execution. The modified requests 510 and 514 are executed at storage devices 130 and 132, and corresponding notifications 512 and 516 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116. The network switch 116 forwards the received notifications 512 and 516 as notification 518 to the computer device 104 where the original request 506 for an Append data processing operation originated.
[0091] Similar processing is performed by the network switch 116 with regard to data entries associated with stream 166 originating from computer device 106. More specifically, computer device 106 communicates a request 520 to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA2) for appending the data object, and sequence identifier SEQ2 obtained from the sequencer module 162. In some aspects, request 520 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
[0092] At operation 522, the network switch 116 performs a shard mapping based on the sharding instance identifier and the hashing input information received with the request 520. More specifically, the sharding instance identifier is mapped to the first sharding instance and the hashing function (e.g., SHA256) of the first sharding instance is applied to the hashing input information to obtain RG2 (e.g., storage devices 130 and 132) as the receiving RG for the Append data processing operation of request 520.
[0093] The network switch 116 generates a modified (or updated) request 524 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage device 130. More specifically, the original request 520 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage device 130 as well as the corresponding availability information. For example, the original request 520 for the Append data processing operation includes a data
object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA2) for appending, and sequence identifier SEQ2. The network switch 116 uses shard mapping 146 and selects a sharding instance for RG2 based on the sharding instance identifier SHARD_1. The network switch 116 generates the modified request 524 for communication to the storage device 130 and modified request 528 for communication to the storage device 132. The modified request 524 for the Append data processing operation for the storage device 130 will include DONS with the data object identifier NAME1, the SIT (e.g.,
{SHARD_1, X}), availability information (e.g., suffix R1/2 identifying a first replica version of the data object), data for appending (e.g., DATA2), and sequence identifier (e.g., SEQ2). The modified request 528 for the Append data processing operation for the storage device 132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), availability information (e.g., suffix R2/2 identifying a second replica version of the data object), data for appending (e.g., DATA2), and sequence identifier (e.g., SEQ2).
[0094] The modified requests 524 and 528 for the Append data processing operation are communicated to storage devices 130 and 132 respectively for execution. The modified requests 524 and 528 are executed at storage devices 130 and 132, and corresponding notifications 526 and 530 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116. The network switch 116 forwards the received notifications 526 and 530 as notification 532 to the computer device 106 where the original request 520 for an Append data processing operation originated.
[0095] FIG. 6 is a block diagram 600 illustrating the network architecture of FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group, according to some embodiments. More specifically, FIG. 6 illustrates an example communication flow (which is also discussed in greater detail in connection with FIG. 7) of a stream of data entries for performing the Append data processing operation to erasure-coded versions of the same data object from two client devices (e.g., computer devices 104 and 106).
[0096] Computer device 104 communicates a stream 602 of data entries to switch 116 via router 112, and computer device 106 communicates a stream 604 of data entries to switch 116 via router 112. Each of the data entries in streams 602 and 604 can include a data processing operation identifier to identify the Append data processing operation 156 (or another data processing operation) to be performed in connection with erasure-coded versions EC1/2, EC2/2, and ECP for data object D1 stored at storage devices 128, 130, and 132 (identified in the shard mapping 146 of the network switch 116 as RG1). The network switch 116 uses shard mapping 146 to determine a redundancy group of storage devices associated with the requested data accesses of the data entries within streams 602 and 604. For example, the network switch 116 uses sharding information within the data entries of streams 602 and 604 to select a sharding instance of redundancy groups and select one of the redundancy groups (e.g., RG1 including storage devices 128, 130, and 132) for communication of the data entries of streams 602 and 604.
[0097] Data entries of streams 602 and 604 are multiplexed by the switch
116 and communicated as streams 606, 612, and 618 of data entries to corresponding storage devices 128, 130, and 132. The data entries in streams 606, 612, and 618 include corresponding data object naming sequences (e.g., such as data object naming sequences 148 illustrated in FIG. 2).
[0098] At storage device 128, storage controller 136 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 606 to rearrange the data entries (e.g., sequentially, based on the sequence identifier) using a buffer including the storage area 608. When the storage area 608 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D1.EC1/2 associated with the data entries) and the result (e.g., data portion A) is flushed to a mapping unit of the storage device 128 such as sector 610.
[0099] At storage device 130, storage controller 138 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 612 to rearrange the data entries (e.g., sequentially, based on the sequence identifier) using a buffer including the storage area 614. When the storage area 614 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D1.EC2/2 associated with the data entries) and the result (e.g., data portion B) is flushed to a mapping unit of the storage device 130 such as sector 616.
[0100] Similarly, at storage device 132, storage controller 140 uses a data object naming sequence of the data object (e.g., a data object identifier within the data object naming sequence), a data processing operation identifier (which identifies the Append data processing operation that should be performed on the data object identified by the data object identifier), and a sequence identifier within each of the data entries of stream 618 to rearrange the data entries (e.g., sequentially, based on the sequence identifier) in a buffer including the sector-sized storage area 620. When the sector-sized storage area 620 is full (e.g., with data packets associated with corresponding Append data processing operations of the data entries within the stream), the corresponding data processing operations of the data entries within the sector-sized storage area are executed (e.g., the Append data processing operations for the data object D1.ECP associated with the data entries) and the result (e.g., parity bits portion P) is flushed to a mapping unit of the storage device 132 such as sector 622.
[0101] FIG. 7 is a communication flow diagram 700 of example communications associated with the execution of the Append data processing operation of FIG. 6, according to some embodiments. Referring to FIG. 7, the
communication flow takes place between computer device 104, computer device 106, the network switch 116, and storage devices 128, 130, and 132.
[0102] Initially, computer device 104 and computer device 106 communicate a request to the sequencer module 162 for sequence identifiers for accessing a data object associated with the data object identifier NAME1. In response, the sequencer module 162 provides sequence identifiers SEQ1 704 and SEQ2 706 to computer devices 104 and 106 respectively. Computer device 104 communicates a request 708 (which can include a data entry) to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA1) for appending the data object, and sequence identifier SEQ1 obtained from the sequencer module 162. In some aspects, request 708 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
[0103] The network switch 116 includes a shard mapping 702 with sharding instances. Each of the sharding instances of the shard mapping 702 corresponds to a redundancy group (RG) (or a subset) of storage devices of a set of available storage devices (e.g., set of available storage devices 128 - 134) and a hashing function. For example, a first sharding instance of the shard mapping 702 can refer to RG1, a second sharding instance of the shard mapping 702 can refer to RG2, and a third sharding instance of the shard mapping 702 can refer to RG3. At operation 710, the network switch 116 performs a shard mapping based on the sharding instance identifier and the hashing input information received with the request 708. More specifically, the sharding instance identifier is mapped to a first sharding instance and the hashing function (e.g., SHA256) of the first sharding instance is applied to the hashing input information to obtain RG1 (e.g., storage devices 128, 130, and 132) as the receiving RG for the Append data processing operation of request 708.
[0104] The network switch 116 generates a modified (or updated) request 712 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage devices 128, 130, and
132. More specifically, the original request 708 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage devices 128-132 as well as the corresponding availability information. For example, the original request 708 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA1) for appending, and sequence identifier SEQ1. The network switch 116 uses shard mapping 702 and selects a sharding instance associated with RG1 based on the sharding instance identifier SHARD_1. The network switch 116 generates the modified request 712 for communication to the storage devices 128-132. The modified request 712 for the Append data processing operation for storage devices 128-132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), data for appending (e.g., DATA1), and sequence identifier (e.g., SEQ1). The modified request 712 for the Append data processing operation is communicated to storage devices 128-132 for execution. The modified request 712 is executed at storage devices 128-132, and corresponding notifications 713 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116. The network switch 116 forwards the received notifications 713 as notification 714 to the computer device 104 where the original request 708 for an Append data processing operation originated.
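The rewrite performed by the switch can be sketched as a simple request transformation. All field names below are illustrative assumptions (the specification does not define a wire format), and "DONS" is modeled here only as a per-object naming/availability record attached for the receiving redundancy group.

```python
def build_modified_request(original: dict, redundancy_group: list) -> dict:
    """Sketch of the switch-side rewrite: keep the original Append fields
    and attach a hypothetical DONS record plus the target device list
    for the receiving redundancy group."""
    return {
        "operation": "Append",
        "dons": {  # hypothetical data object naming structure
            "object_id": original["object_id"],
            "availability": original.get("availability"),
        },
        "sit": original["sit"],                # e.g., {SHARD_1, X}
        "data": original["data"],              # data for appending
        "sequence_id": original["sequence_id"],
        "targets": list(redundancy_group),     # devices of the receiving RG
    }

# Analogue of request 708 from computer device 104 (values illustrative):
request_708 = {
    "object_id": "NAME1",
    "sit": {"shard_instance": "SHARD_1", "hash_input": "X"},
    "data": b"DATA1",
    "sequence_id": 1,
}
request_712 = build_modified_request(request_708, ["dev128", "dev130", "dev132"])
```

Note that the sequence identifier passes through unchanged, so the ordering established by the sequencer survives the rewrite and is visible to every storage device in the redundancy group.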
[0105] Similar processing is performed by the network switch 116 with regard to data entries associated with stream 166 originating from computer device 106. More specifically, computer device 106 communicates a request 716 to the switch 116 for a data processing operation such as an Append data processing operation, which includes a data object identifier of a data object (e.g., a data object name such as NAME1), an SIT (including a sharding instance identifier of SHARD_1 and hashing input information such as X), data (e.g., DATA2) for appending to the data object, and sequence identifier SEQ2 obtained from the sequencer module 162. In some aspects, request 716 may further include availability information (e.g., identifying a version of a plurality of versions of the data object used in connection with the identified data processing operation).
[0106] The network switch 116 performs a shard mapping (similar to the shard mapping at operation 710) based on the sharding instance identifier and the hashing input information received with the request 716. More specifically, the sharding instance identifier is used to obtain RG1 (e.g., storage devices 128-132) as the receiving RG for the Append data processing operation of request 716.
[0107] The network switch 116 generates a modified (or updated) request 718 for the data processing operation (e.g., a modified Append data processing operation) which is communicated to storage devices 128, 130, and 132. More specifically, the original request 716 for an Append data processing operation is modified to include DONS of the data object that will be appended at storage devices 128-132 as well as the corresponding availability information. For example, the original request 716 for the Append data processing operation includes a data object identifier (e.g., a data object name) NAME1, SIT including a sharding instance identifier of SHARD_1 and hashing input information X, data (e.g., DATA2) for appending, and sequence identifier SEQ2. The network switch 116 uses shard mapping 702 and selects a sharding instance associated with RG1 based on the sharding instance identifier SHARD_1. The network switch 116 generates the modified request 718 for communication to the storage devices 128-132. The modified request 718 for the Append data processing operation for storage devices 128-132 will include DONS with the data object identifier NAME1, the SIT (e.g., {SHARD_1, X}), data for appending (e.g., DATA2), and sequence identifier (e.g., SEQ2). The modified request 718 for the Append data processing operation is communicated to storage devices 128-132 for execution. The modified request 718 is executed at storage devices 128-132, and corresponding notifications 719 of the outcome of the execution (e.g., a notification of successful execution) are communicated back to the network switch 116. The network switch 116 forwards the received notifications 719 as notification 720 to the computer device 106 where the original request 716 for an Append data processing operation originated.
[0108] FIG. 8 is a block diagram 800 illustrating the network architecture of FIG. 1 performing an Append data processing operation for erasure-coded versions of a data object stored at multiple storage devices in a redundancy group where pre-processing of the Append operation is performed at the network switch so that the amount of data transmitted between the switch and the storage devices can be reduced, according to some embodiments. More specifically, FIG. 8 illustrates an example communication flow of a stream of data entries for performing the Append data processing operation to erasure-coded versions of the same data object from two client devices (e.g., computer devices 104 and 106), where partial data processing is performed at the network switch 116. In this regard, switch 116 can be configured as a software-defined networking (SDN) switch implementing partial processing of data processing operations within a received stream of data entries.
[0109] Computer device 104 communicates a stream 802 of data entries to switch 116 via the router 112, and computer device 106 communicates a stream 804 of data entries to switch 116 via router 112. Each of the data entries in streams 802 and 804 includes a data processing operation identifier to identify the Append data processing operation 156 (or another data processing operation) to be performed in connection with erasure-coded versions EC1/2, EC2/2, and ECP for data object D1 stored at storage devices 128, 130, and 132 (identified in the shard mapping 146 of the network switch 116 as RG1). The network switch 116 uses shard mapping 146 to determine a redundancy group of storage devices associated with the requested data accesses of the data entries within the streams 802 and 804. For example, the network switch 116 uses sharding information within the data entries of streams 802 and 804 to select a sharding instance of redundancy groups and select one of the redundancy groups (e.g., RG1 including storage devices 128, 130, and 132) for communication of the data entries of streams 802 and 804.
[0110] In an example embodiment, the network switch 116 may retrieve the erasure-coded versions of data object D1 and, at operation 806, may perform the Append data processing operations associated with data entries received with streams 802 and 804. More specifically, the network switch 116 performs the Append data processing operations associated with data entries received with streams 802 and 804 on erasure-coded versions D1.E1/2 and D1.E2/2 (stored at storage devices 128 and 130) to obtain data portions A and B respectively. The network switch 116 further performs the Append data processing operations associated with data entries received with streams 802 and 804 on erasure-coded version D1.EP (stored at storage device 132) to obtain parity bits portion P. The obtained A, B, and P results of the Append data processing operations are stored at the local storage (or other remote storage) associated with the network switch 116. In this regard, by partially offloading the processing of the Append data processing operations associated with data entries received with streams 802 and 804, processing efficiency is increased (e.g., shortened processing time and reduced use of network-attached storage devices).
[0111] The obtained A, B, and P results of the Append data processing operations are communicated as streams 808, 810, and 812 to corresponding storage devices 128, 130, and 132. The storage devices 128, 130, and 132 store the received A, B, and P results of the Append data processing operations into temporary storage locations (e.g., sector-sized storage locations) and then to sectors 814, 816, and 818 (e.g., when the sector-sized storage locations are filled to a full sector size).
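The switch-side pre-processing of a 2+1 erasure-coded append can be sketched as follows. The specification does not fix the erasure code; this example assumes a simple split of the appended payload into two data portions (A, B) with an XOR parity portion (P), which is one common possibility. The padding choice is likewise an assumption for illustration.

```python
def precompute_append_shards(payload: bytes):
    """Sketch of switch-side pre-processing for a 2+1 erasure-coded Append:
    split the appended payload into data portions A and B and compute a
    parity portion P as their byte-wise XOR."""
    if len(payload) % 2:
        payload += b"\x00"  # pad to an even length so the halves align
    half = len(payload) // 2
    a, b = payload[:half], payload[half:]
    p = bytes(x ^ y for x, y in zip(a, b))
    return a, b, p  # e.g., streamed to storage devices 128, 130, and 132

a, b, p = precompute_append_shards(b"DATA1DATA2")
# Either data portion can be rebuilt from the other portion and the parity:
recovered_a = bytes(x ^ y for x, y in zip(b, p))
```

Because A, B, and P together carry no more than the payload plus parity, only the device-specific portion travels to each storage device, which is the transmission reduction the pre-processing is intended to achieve.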
[0112] FIG. 9 is a flowchart of another method 900 for processing a stream of data accesses within a storage device, according to example embodiments. Method 900 includes operations 902, 904, 906, 908, and 910. By way of example and not limitation, method 900 is described as being performed by one or more processors of the network architecture 100 (e.g., a data entry processing module performing the functionalities of sequencer module 162, naming server 102, the network switch 116, and storage controllers 136 - 142 of FIG. 1 or other firmware or hardware discussed herein), which can be configured to execute within a computer device such as device 1100 of FIG. 11.
[0113] At operation 902, a plurality of sequence identifiers is generated for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order. For example, as illustrated in FIG. 5 and FIG. 7, sequential sequence identifiers 502 and 504 are generated based on requests from computer devices 104 and 106 in connection with data processing operations for a data object with a data object identifier of NAME1. At operation 904, a subset of storage devices of a plurality of storage devices is selected based on a sharding instance identifier associated with the plurality of data entries. For example, shard mapping 146 of
switch 116 is used to map a sharding instance identifier and determine a redundancy group (e.g., RG2) of storage devices as the subset.
[0114] At operation 906, a data stream with the plurality of data entries is communicated to a storage device of the subset of storage devices. Each data entry of the plurality of data entries includes a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers. For example, switch 116 communicates requests 510 and 514 associated with data entries to the storage devices of RG2. At operation 908, the plurality of data entries is arranged in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers. For example, as illustrated in FIG. 1, the data entries received in streams 168 and 170 at storage devices 130 and 132 from the network switch 116 are arranged in storage areas 172 and 178 respectively, based on the data object identifiers and the sequential order of the sequence identifiers. At operation 910, a data processing operation is executed for each data entry of the plurality of data entries using the data object. The data processing operation corresponds to the data processing operation identifier, and the data processing operation is executed according to an execution order based on the sequence identifier. For example, after the data entries of stream 168 are arranged in sequential order as stream 174, corresponding data processing operations are executed based on the sequential order and the result is flushed to sector 176 when the storage area 172 is full with results of the execution of the data processing operation.
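Operations 908 and 910 above can be sketched with a small model of the storage-controller side: incoming data entries are buffered per data object, arranged by sequence identifier, executed in sequential order, and the results are flushed to a sector when a sector-sized staging area fills. The class shape, sector size, and flushing policy are illustrative assumptions, not the specification's implementation.

```python
class StorageController:
    """Sketch of operations 908-910: arrange data entries per data object
    by sequence identifier, execute the Append operations in order, and
    flush to a sector when the staging area reaches a full sector."""

    def __init__(self, sector_size: int = 16):
        self.buffers = {}   # object_id -> list of (sequence_id, data)
        self.staging = {}   # object_id -> bytearray awaiting a flush
        self.sectors = {}   # object_id -> list of flushed sectors
        self.sector_size = sector_size

    def receive(self, object_id: str, sequence_id: int, data: bytes) -> None:
        # Operation 908: buffer entries keyed by the data object identifier.
        self.buffers.setdefault(object_id, []).append((sequence_id, data))

    def execute_in_order(self, object_id: str) -> None:
        # Operation 910: execute in sequence-identifier order.
        staging = self.staging.setdefault(object_id, bytearray())
        for _sequence_id, data in sorted(self.buffers.pop(object_id, [])):
            staging += data  # the Append data processing operation
            while len(staging) >= self.sector_size:
                self.sectors.setdefault(object_id, []).append(
                    bytes(staging[: self.sector_size]))
                del staging[: self.sector_size]

ctrl = StorageController(sector_size=10)
ctrl.receive("NAME1", 2, b"DATA2")  # entries may arrive out of order
ctrl.receive("NAME1", 1, b"DATA1")
ctrl.execute_in_order("NAME1")
```

Even though the entry with sequence identifier 2 arrives first, the flushed sector holds the appends in sequencer order, which is the property operations 908-910 are meant to guarantee.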
[0115] FIG. 10 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various device hardware described herein, according to example embodiments. FIG. 10 is merely a non-limiting example of a software architecture 1002 and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 1002 executes on hardware, such as any of the computer devices in FIG. 1 which can be the same as device 1100 of FIG. 11 that includes, among other things, processor 1105, memory 1110, storage 1115 and/or 1120, and I/O interfaces 1125 and 1130.
[0116] A representative hardware layer 1004 is illustrated and can represent, for example, the device 1100 of FIG. 11. The representative hardware layer 1004 comprises one or more processing units 1006 having associated executable instructions 1008. Executable instructions 1008 represent the executable instructions of the software architecture 1002, including implementation of the methods, modules, and so forth of FIGS. 1-9. Hardware layer 1004 also includes memory or storage modules 1010, which also have executable instructions 1008. Hardware layer 1004 may also comprise other hardware 1012, which represents any other hardware of the hardware layer 1004, such as the other hardware illustrated as part of device 1100. The memory or storage modules 1010 can include storage devices (e.g., any of storage devices 128-134) with firmware implementing the data entry processing module 1060.
In some aspects, the data entry processing module 1060 comprises suitable circuitry, logic, interfaces, or code and can be configured to perform the functionalities of sequencer module 162, naming server 102, the network switch 116, and storage controllers 136 - 142 of FIG. 1 or other firmware or hardware discussed herein in connection with FIGS. 1-8.
[0117] In the example architecture of FIG. 10, the software architecture
1002 may be conceptualized as a stack of layers where each layer provides particular functionality. For example, the software architecture 1002 may include layers such as an operating system 1014, libraries 1016, frameworks/middleware 1018, applications 1020, and presentation layer 1044. Operationally, the applications 1020 or other components within the layers may invoke application programming interface (API) calls 1024 through the software stack and receive a response, returned values, and so forth, illustrated as messages 1026 in response to the API calls 1024. The layers illustrated in FIG. 10 are representative in nature and not all software architectures 1002 have all layers. For example, some mobile or special purpose operating systems may not provide frameworks/middleware 1018, while others may provide such a layer. Other software architectures may include additional or different layers.
[0118] The operating system 1014 may manage hardware resources and provide common services. The operating system 1014 may include, for example, a kernel 1028, services 1030, and drivers 1032. The kernel 1028 may
act as an abstraction layer between the hardware and the other software layers. For example, kernel 1028 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on. The services 1030 may provide other common services for the other software layers. Drivers 1032 may be responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 1032 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth, depending on the hardware configuration.
[0119] Libraries 1016 may provide a common infrastructure that may be utilized by the applications 1020 or other components or layers. Libraries 1016 typically provide functionality that allows other software modules to perform tasks more easily than interfacing directly with the underlying operating system 1014 functionality (e.g., kernel 1028, services 1030, or drivers 1032). Libraries 1016 may include system libraries 1034 (e.g., C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematic functions, and the like. In addition, libraries 1016 may include API libraries 1036 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphic content on a display), database libraries (e.g., SQLite that may provide various relational database functions), web libraries (e.g., WebKit that may provide web browsing functionality), and the like. Libraries 1016 may also include a wide variety of other libraries 1038 to provide many other APIs to the applications 1020 and other software components/modules.
[0120] The frameworks/middleware 1018 (also sometimes referred to as middleware) may provide a higher-level common infrastructure that may be utilized by the applications 1020 or other software components/modules. For example, the frameworks/middleware 1018 may provide various graphical user interface (GUI) functions, high-level resource management, high-level location services, and so forth. The frameworks/middleware 1018 may provide a broad
spectrum of other APIs that may be utilized by the applications 1020 or other software components/modules, some of which may be specific to a particular operating system 1014 or platform.
[0121] Applications 1020 include built-in applications 1040, and third- party applications 1042. Examples of representative built-in applications 1040 may include but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, or a game application. Third-party applications 1042 may include any of the built-in applications 1040 as well as a broad assortment of other applications. In a specific example, the third-party application 1042 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or other mobile operating systems. In this example, the third-party application 1042 may invoke the API calls 1024 provided by the mobile operating system such as operating system 1014 to facilitate functionality described herein.
[0122] The applications 1020 may utilize built-in operating system functions (e.g., kernel 1028, services 1030, and drivers 1032), libraries (e.g., system libraries 1034, API libraries 1036, and other libraries 1038), and frameworks/middleware 1018 to create user interfaces to interact with users of the system. Alternatively, or additionally, in some systems, interactions with a user may occur through a presentation layer, such as presentation layer 1044. In these systems, the application/module "logic" can be separated from the aspects of the application/module that interact with a user.
[0123] Some software architectures utilize virtual machines. In the example of FIG. 10, this is illustrated by virtual machine 1048. A virtual machine creates a software environment where applications/modules can execute as if they were executing on a hardware machine (such as the device 1100 of FIG. 11, for example). A virtual machine 1048 is hosted by a host operating system (e.g., operating system 1014) and typically, although not always, has a virtual machine monitor 1046, which manages the operation of the virtual machine 1048 as well as the interface with the host operating system (i.e.,
operating system 1014). A software architecture executes within the virtual machine 1048 and includes an operating system 1050, libraries 1052, frameworks/middleware 1054, applications 1056, and a presentation layer 1058. These layers of software architecture executing within the virtual machine 1048 can be the same as the corresponding layers previously described or may be different.
[0124] FIG. 11 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to example embodiments. All components need not be used in various embodiments. For example, clients, servers, and cloud-based network devices may each use a different set of components, or in the case of servers, larger storage devices.
[0125] One example computing device in the form of a computer 1100
(also referred to as computing device 1100, computer system 1100, or computer 1100) may include a processor 1105, memory 1110, removable storage 1115, non-removable storage 1120, an input interface 1125, an output interface 1130, and a communication interface 1135, all connected by a bus 1140. Although the example computing device is illustrated and described as the computer 1100, the computing device may be in different forms in different embodiments.
[0126] The memory 1110 may include volatile memory 1145 and non-volatile memory 1150 and may store a program 1155. The computing device 1100 may include - or have access to a computing environment that includes - a variety of computer-readable media, such as the volatile memory 1145, the non-volatile memory 1150, the removable storage 1115, and the non-removable storage 1120. Computer storage includes random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
[0127] Computer-readable instructions stored on a computer-readable medium (e.g., the program 1155 stored in the memory 1110) are executable by the processor 1105 of the computing device 1100. A hard drive, CD-ROM, and
RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. The terms “computer-readable medium” and “storage device” do not include carrier waves to the extent that carrier waves are deemed too transitory. “Computer-readable non-transitory media” includes all types of computer-readable media, including magnetic storage media, optical storage media, flash media, and solid-state storage media. It should be understood that software can be installed in and sold with a computer. Alternatively, the software can be obtained and loaded into the computer, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator. The software can be stored on a server for distribution over the Internet, for example. As used herein, the terms “computer-readable medium” and “machine-readable medium” are interchangeable.
[0128] Program 1155 may utilize a data entry processing module 1160 which may be the same as or similar to the data entry processing module 1060 of FIG. 10. In some aspects, the data entry processing module 1160 comprises suitable circuitry, logic, interfaces, or code and can be configured to perform the functionalities of sequencer module 162, naming server 102, the network switch 116, and storage controllers 136 - 142 of FIG. 1 or other firmware or hardware discussed herein in connection with FIGS. 1-8.
[0129] Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine, an application- specific integrated circuit (ASIC), field-programmable gate array (FPGA), or any suitable combination thereof). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
[0130] Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to
achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.
[0131] It should be further understood that software including one or more computer-executable instructions that facilitate processing and operations as described above regarding any one or all of the steps of the disclosure can be installed in and sold with one or more computing devices consistent with the disclosure. Alternatively, the software can be obtained and loaded into one or more computing devices, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator. The software can be stored on a server for distribution over the Internet, for example.
[0132] Also, it will be understood by one skilled in the art that this disclosure is not limited in its application to the details of construction and the arrangement of components outlined in the description or illustrated in the drawings. The embodiments herein are capable of other embodiments and capable of being practiced or carried out in various ways. Also, it will be understood that the phraseology and terminology used herein is for description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms “connected,” “coupled,” and “mounted,” and variations thereof herein are used broadly and encompass direct and indirect connections, couplings, and mountings. In addition, the terms “connected” and “coupled,” and variations thereof, are not restricted to physical or mechanical connections or couplings. Further, terms such as up, down, bottom, and top are relative, and are employed to aid illustration, but are not limiting.
[0133] The components of the illustrative devices, systems, and methods employed by the illustrated embodiments can be implemented, at least in part, in digital electronic circuitry, analog electronic circuitry, or computer hardware, firmware, software, or in combinations of them. These components can be
implemented, for example, as a computer program product such as a computer program, program code, or computer instructions tangibly embodied in an information carrier, or a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers.
[0134] A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other units suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or multiple computers at one site or distributed across multiple sites and interconnected by a communication network. Also, functional programs, codes, and code segments for accomplishing the techniques described herein can be easily construed as within the scope of the claims by programmers skilled in the art to which the techniques described herein pertain. Method steps associated with the illustrative embodiments can be performed by one or more programmable processors executing a computer program, code, or instructions to perform functions (e.g., by operating on input data or generating an output). Method steps can also be performed by, and apparatus for performing the methods can be implemented as, special purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), for example.
[0135] The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, discrete gate, or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
[0136] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. The required elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example, semiconductor memory devices, e.g., electrically programmable read-only memory or ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory devices, or data storage disks (e.g., magnetic disks, internal hard disks, or removable disks, magneto-optical disks, or CD-ROM and DVD-ROM disks). The processor and the memory can be supplemented by or incorporated in special purpose logic circuitry.
[0137] Those of skill in the art understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
[0138] As used herein, “machine-readable medium” (or “computer-readable medium”) comprises a device able to store instructions and data temporarily or permanently and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., electrically erasable programmable read-only memory (EEPROM)), or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store processor instructions. The term “machine-readable medium” shall also be taken to include any medium
or a combination of multiple media, that is capable of storing instructions for execution by one or more processors, such that the instructions, when executed by one or more processors, cause the one or more processors to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” as used herein excludes signals per se.
[0139] In an example embodiment, the computing device 1100 includes a sequencer module for generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order; a switching module for: selecting a subset of storage devices of a plurality of storage devices storing a copy of the data object, the selecting based on a sharding instance identifier associated with the plurality of data entries; and communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers; and a storage controller module for: arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
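The interplay of the sequencer, switching, and storage controller modules of paragraph [0139] can be sketched in ordinary code. The following Python sketch is illustrative only: the class and function names, the SHA-256 routing hash, and the append-only operation are assumptions for the example, not the disclosed implementation.

```python
import hashlib
from collections import defaultdict
from itertools import count


class Sequencer:
    """Issues sequence identifiers in strictly increasing (sequential) order."""
    def __init__(self):
        self._counter = count(1)

    def next_id(self):
        return next(self._counter)


class Switch:
    """Selects a subset of storage devices for a sharding instance and
    routes the data-entry stream to one device of that subset."""
    def __init__(self, shard_map):
        self.shard_map = shard_map  # sharding instance id -> list of device ids

    def select_subset(self, instance_id):
        return self.shard_map[instance_id]

    def route(self, instance_id, object_id):
        subset = self.select_subset(instance_id)
        digest = hashlib.sha256(f"{instance_id}:{object_id}".encode()).digest()
        return subset[digest[0] % len(subset)]


class StorageController:
    """Buffers entries per data object and executes operations in sequence order."""
    def __init__(self):
        self.buffer = defaultdict(list)   # object id -> list of data entries
        self.objects = defaultdict(bytes)  # object id -> object contents

    def enqueue(self, entry):
        self.buffer[entry["object_id"]].append(entry)

    def flush(self, object_id):
        # Execution order is derived from the sequence identifier, not
        # from arrival order.
        for entry in sorted(self.buffer.pop(object_id, []), key=lambda e: e["seq"]):
            if entry["op"] == "append":
                self.objects[object_id] += entry["payload"]


# Client side: tag each entry with a data object identifier, a data
# processing operation identifier, and a sequence identifier.
seq = Sequencer()
switch = Switch({"shard-A": ["dev0", "dev1", "dev2"]})
entries = [
    {"object_id": "obj1", "op": "append", "payload": b"hello ", "seq": seq.next_id()},
    {"object_id": "obj1", "op": "append", "payload": b"world", "seq": seq.next_id()},
]
device = switch.route("shard-A", "obj1")

# Device side: buffer, reorder by sequence identifier, then execute.
controller = StorageController()
for entry in reversed(entries):  # deliberately out-of-order arrival
    controller.enqueue(entry)
controller.flush("obj1")
print(controller.objects["obj1"])  # b'hello world'
```

Because each entry carries its sequence identifier, the storage controller restores the intended execution order even when the stream arrives out of order, which is the ordering property the embodiment relies on.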
[0140] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other
examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the scope disclosed herein.
[0141] Although the present disclosure has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations can be made thereto without departing from the scope of the disclosure. For example, other components may be added to, or removed from, the described systems. The specification and drawings are, accordingly, to be regarded simply as an illustration of the disclosure as defined by the appended claims, and are contemplated to cover any modifications, variations, combinations, or equivalents that fall within the scope of the present disclosure. Other aspects may be within the scope of the following claims. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.
Claims
1. A system for processing data in a distributed storage network, the system comprising: a plurality of storage devices, each of the plurality of storage devices storing a copy of a data object; and processing circuitry coupled to the plurality of storage devices, the processing circuitry configured to perform operations comprising: generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing the data object, the plurality of sequence identifiers generated in sequential order; selecting a subset of storage devices of the plurality of storage devices based on a sharding instance identifier associated with the plurality of data entries; communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers; arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
2. The system of claim 1, wherein each data entry of the plurality of data entries further comprises availability information identifying one of a plurality of versions of the data object stored at the storage device, and wherein the processing circuitry is further configured to perform operations comprising:
executing the data processing operation using the one of a plurality of versions of the data object identified by the availability information.
3. The system of claim 2, wherein the plurality of versions of the data object identified by the availability information includes one of: a plurality of redundancy versions associated with a corresponding plurality of copies of the data object; or a plurality of erasure-coded versions associated with data information or parity information of the data object.
4. The system of any of claims 1-3, wherein to select the subset of storage devices, the processing circuitry is further configured to perform operations comprising: selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying the subset of storage devices.
5. The system of any of claims 1-4, wherein the data processing operation is associated with a plurality of redundancy versions of the data object, and wherein the processing circuitry is further configured to perform operations comprising: applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify the storage device of the subset of storage devices; and communicating the data stream with the plurality of data entries to the storage device of the subset of storage devices, each data entry of the plurality of data entries further including a redundancy version of the plurality of redundancy versions of the data object and a data segment for appending to the redundancy version of the data object during execution of the data processing operation.
6. The system of claim 5, wherein the processing circuitry is further configured to perform operations comprising:
arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing the data processing operation for each data entry of the plurality of data entries using the redundancy version of the data object and according to the execution order based on the sequence identifier.
7. The system of any of claims 1-6, wherein the data processing operation is associated with a plurality of erasure-coded versions of the data object, and wherein to select the subset of storage devices, the processing circuitry is further configured to perform operations comprising: selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying a plurality of redundancy groups of storage devices of the plurality of storage devices.
8. The system of claim 7, wherein the processing circuitry is further configured to perform operations comprising: applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify a redundancy group of the plurality of redundancy groups, the identified redundancy group comprising the subset of storage devices.
9. The system of any one of claims 7-8, wherein the processing circuitry is further configured to perform operations comprising: communicating the data stream with the plurality of data entries to each storage device of the subset of storage devices, each data entry of the plurality of data entries further including a corresponding erasure-coded version of the plurality of erasure-coded versions of the data object and a data segment for appending to the corresponding erasure-coded version of the data object during execution of the data processing operation at each storage device of the subset of storage devices.
10. The system of any one of claims 7-9, wherein the processing circuitry is further configured to perform operations comprising: for each storage device of the subset of storage devices: arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing the data processing operation for each data entry of the plurality of data entries using the corresponding erasure-coded version of the data object and according to the execution order based on the sequence identifier.
11. The system of any one of claims 7-10, wherein the processing circuitry is further configured to perform operations comprising: for each storage device of the subset of storage devices: arranging the plurality of data entries in a buffer based on the data object identifier and the sequential order of the plurality of sequence identifiers; executing the data processing operation for each data entry of the plurality of data entries using an erasure-coded version of the plurality of erasure-coded versions of the data object and according to the execution order based on the sequence identifier, to generate a corresponding erasure-coded result; and communicating the corresponding erasure-coded result to each storage device of the subset of storage devices for storage.
12. A computer-implemented method for processing data in a distributed storage network, the method comprising: generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order; selecting a subset of storage devices of a plurality of storage devices based on a sharding instance identifier associated with the plurality of data entries, the plurality of storage devices storing a copy of the data object;
communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers; arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
13. The computer-implemented method of claim 12, wherein each data entry of the plurality of data entries further comprises availability information identifying one of a plurality of versions of the data object stored at the storage device, and the method further comprises: executing the data processing operation using the one of a plurality of versions of the data object identified by the availability information.
14. The computer-implemented method of claim 13, wherein the plurality of versions of the data object identified by the availability information includes one of: a plurality of redundancy versions associated with a corresponding plurality of copies of the data object; or a plurality of erasure-coded versions associated with data information or parity information of the data object.
15. The computer-implemented method of any of claims 12-14, wherein selecting the subset of storage devices further comprises: selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying the subset of storage devices.
16. The computer-implemented method of any of claims 12-15, wherein the data processing operation is associated with a plurality of redundancy versions of the data object, and wherein the method further comprises: applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify the storage device of the subset of storage devices; and communicating the data stream with the plurality of data entries to the storage device of the subset of storage devices, each data entry of the plurality of data entries further including a redundancy version of the plurality of redundancy versions of the data object and a data segment for appending to the redundancy version of the data object during execution of the data processing operation.
17. The computer-implemented method of claim 16, further comprising: arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing the data processing operation for each data entry of the plurality of data entries using the redundancy version of the data object and according to the execution order based on the sequence identifier.
18. A non-transitory computer-readable medium storing computer instructions for processing data in a distributed storage network, wherein the instructions when executed by one or more processors of a network node within the distributed storage network, cause the one or more processors to perform operations comprising: generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order; selecting a subset of storage devices of a plurality of storage devices of the distributed storage network based on a sharding instance identifier associated
with the plurality of data entries, the plurality of storage devices storing a copy of the data object; communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers; arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
19. The non-transitory computer-readable medium of claim 18, wherein to select the subset of storage devices, the instructions cause the one or more processors to perform operations comprising: selecting a sharding instance from a plurality of available sharding instances associated with a shard mapping using the sharding instance identifier, the selected sharding instance identifying a plurality of redundancy groups of storage devices of the plurality of storage devices.
20. The non-transitory computer-readable medium of any of claims 18-19, wherein the data processing operation is associated with a plurality of erasure-coded versions of the data object, and wherein executing the instructions causes the one or more processors to perform operations comprising: applying a hashing function of the sharding instance corresponding to the sharding instance identifier to hashing input information associated with the sharding instance identifier to identify a redundancy group of the plurality of redundancy groups, the identified redundancy group comprising the subset of storage devices.
21. The non-transitory computer-readable medium of claim 20, wherein executing the instructions causes the one or more processors to perform operations comprising: for each storage device of the subset of storage devices: arranging the plurality of data entries in a buffer based on the data object identifier and the sequential order of the plurality of sequence identifiers; executing the data processing operation for each data entry of the plurality of data entries using an erasure-coded version of the plurality of erasure-coded versions of the data object and according to the execution order based on the sequence identifier, to generate a corresponding erasure-coded result; and communicating the corresponding erasure-coded result to each storage device of the subset of storage devices for storage.
22. An apparatus in a distributed storage network, the apparatus comprising: a plurality of storage devices, each of the plurality of storage devices storing a copy of a data object; and a sequencer module for generating a plurality of sequence identifiers for a corresponding plurality of data entries for accessing a data object, the plurality of sequence identifiers generated in sequential order; a switching module for: selecting a subset of storage devices of a plurality of storage devices storing a copy of the data object, the selecting based on a sharding instance identifier associated with the plurality of data entries; and communicating a data stream with the plurality of data entries to a storage device of the subset of storage devices, each data entry of the plurality of data entries comprising a data object identifier of the data object, a data processing operation identifier, and a sequence identifier of the plurality of sequence identifiers; and a storage controller module for:
arranging the plurality of data entries in a buffer of the storage device based on the data object identifier and the sequential order of the plurality of sequence identifiers; and executing for each data entry of the plurality of data entries, a data processing operation using the data object, the data processing operation corresponding to the data processing operation identifier, and the data processing operation executed according to an execution order based on the sequence identifier.
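Claims 7 and 8 recite selecting a sharding instance that identifies redundancy groups of storage devices, then applying the instance's hashing function to hashing input information to pick one group as the subset that receives the data stream. A minimal Python sketch of that selection follows; the shard mapping, the SHA-256 hash choice, and all names are hypothetical, and the claim language, not this sketch, defines the scope.

```python
import hashlib

# Hypothetical shard mapping: each sharding instance identifies
# redundancy groups of storage devices (in the style of claims 7-8).
SHARD_MAP = {
    "instance-0": {
        "groups": [
            ["dev0", "dev1", "dev2"],  # e.g., data devices plus a parity device
            ["dev3", "dev4", "dev5"],
        ],
    },
}


def select_redundancy_group(instance_id, hashing_input):
    """Apply the sharding instance's hashing function to the hashing
    input information to identify one redundancy group; that group is
    the subset of storage devices that receives the data stream."""
    groups = SHARD_MAP[instance_id]["groups"]
    digest = hashlib.sha256(f"{instance_id}:{hashing_input}".encode()).digest()
    return groups[int.from_bytes(digest[:4], "big") % len(groups)]


group = select_redundancy_group("instance-0", "obj1")
# The same inputs always select the same group, so every entry for a
# given object lands on a stable subset of devices.
assert group == select_redundancy_group("instance-0", "obj1")
```

A deterministic hash keeps the mapping stateless: any node holding the same shard map resolves the same object to the same redundancy group without coordination.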
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2021/018552 WO2022177566A1 (en) | 2021-02-18 | 2021-02-18 | Data access processing on network-attached storage devices |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022177566A1 true WO2022177566A1 (en) | 2022-08-25 |
Family
ID=74870893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/018552 WO2022177566A1 (en) | 2021-02-18 | 2021-02-18 | Data access processing on network-attached storage devices |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022177566A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150161184A1 (en) * | 2012-08-08 | 2015-06-11 | Amazon Technologies, Inc. | Data storage inventory indexing |
US9262469B1 (en) * | 2012-04-23 | 2016-02-16 | Monsanto Technology Llc | Intelligent data integration system |
US10824612B2 (en) * | 2017-08-21 | 2020-11-03 | Western Digital Technologies, Inc. | Key ticketing system with lock-free concurrency and versioning |
US20200401562A1 (en) * | 2019-06-24 | 2020-12-24 | Western Digital Technologies, Inc. | Parallel processing of filtered transaction logs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21711419; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21711419; Country of ref document: EP; Kind code of ref document: A1 |