US20200052954A1 - Network-Assisted Raft Consensus Protocol - Google Patents
- Publication number
- US20200052954A1 (application US16/101,751)
- Authority
- US
- United States
- Prior art keywords
- server
- raft
- switch
- network
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/35—Switches specially adapted for specific applications
- H04L49/355—Application aware switches, e.g. for HTTP
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
- H04L67/1051—Group master selection mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1087—Peer-to-peer [P2P] networks using cross-functional networking aspects
- H04L67/1093—Some peer nodes performing special functions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/64—Routing or path finding of packets in data switching networks using an overlay routing layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Definitions
- Consensus algorithms e.g., Paxos, ZAB, and Raft
- These consensus mechanisms tend to incur high overheads in terms of latency since they involve multiple rounds of communication. This is especially true when strong consistency guarantees are desired.
- consensus requires at least the round-trip time between servers running consensus algorithms.
- Raft is a consensus algorithm designed as an alternative to Paxos. Raft was designed to be more understandable than Paxos, and Raft is formally proven safe. Raft also provides a better foundation for building practical systems. To enhance understandability, Raft separates the main consensus components into the following sub-problems: 1) Leader election: a new leader is elected when the current leader fails; 2) Log replication: the leader accepts log entries from clients and replicates them, forcing other logs to be consistent with its own log; and 3) Log commitment: a few restrictions are enforced to ensure safe log commitment; that is, if any member has applied a particular command to its state machine, then no other member may apply a different command for the same entry.
- Raft starts by electing a strong leader, and then gives the leader full responsibility for managing the replicated log.
- the leader accepts log entries from clients (i.e., end devices making requests), and replicates the log entries to other servers. When it is safe to apply log entries to the state machines, the leader notifies the servers to apply the log entries to their local state machines.
- P4 is a language to program data-plane behavior of network devices. P4 can be used to support customized functionality (e.g., the evolving OpenFlow standard), specific datacenter packet processing logic, etc.
- the P4 language composes an abstract forwarding model that uses a chain of tables for packet processing. The tables match pre-defined packet fields, and perform a sequence of actions. Then, a P4 compiler maps the abstract forwarding model to a concrete implementation on a particular target platform (e.g., software switches, field-programmable gate arrays (“FPGAs”), and the like).
- a system can include a plurality of servers operating in a server cluster, and a plurality of P4 switches corresponding to the plurality of servers.
- Each server of the plurality of servers can include a back-end that executes a complete Raft algorithm to perform leader election, log replication, and log commitment of a Raft consensus algorithm.
- Each P4 switch of the plurality of P4 switches can include a front-end that executes a partial Raft algorithm to perform the log replication and the log commitment of the Raft consensus algorithm.
- the back-end can maintain a complete state for responding to requests that cannot be fulfilled by the front-end.
- the requests can include read requests and/or write requests.
- a first server operating in a server cluster can receive, from a client, a read request message.
- the first server in this case is not recognized as a leader in the server cluster.
- a first P4 switch in communication with the first server can receive, from the client, the read request message.
- the first P4 switch can forward the read request message to a second server that is recognized as the leader in the server cluster.
- a second P4 switch in communication with the second server can receive the read request message and respond to it immediately without involving the second server.
- a first server operating in a server cluster can receive, from a client, a write request message.
- the first server in this case is recognized as a leader in the server cluster.
- a first P4 switch in communication with the first server can receive, from the client, the write request message.
- the first P4 switch can handle the write request message.
- the first P4 switch can notify, without involving the first server, a second server and a third server of the cluster of write request results resulting from the first P4 switch handling the write request message.
- FIG. 1 is a block diagram illustrating an overview of a legacy Raft consensus algorithm.
- FIG. 2 is a block diagram illustrating an overview of Raft terms.
- FIG. 3 is a block diagram illustrating aspects of a NetRaft system architecture, according to an illustrative embodiment.
- FIG. 4A is a block diagram illustrating aspects of a legacy Raft read operation.
- FIG. 4B is a block diagram illustrating aspects of a NetRaft read operation.
- FIG. 5A is a block diagram illustrating aspects of a legacy Raft write operation.
- FIG. 5B is a block diagram illustrating aspects of a NetRaft write operation.
- FIG. 6 is a flow diagram illustrating aspects of a method for operating a Raft-aware P4 switch, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 7 is a flow diagram illustrating aspects of a method for server selection, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 8 is a flow diagram illustrating aspects of a method for executing a NetRaft read operation, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 9 is a flow diagram illustrating aspects of a method for executing a NetRaft write operation, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 10 is a block diagram illustrating an example experimental setup for a NetRaft implementation, according to an illustrative embodiment.
- FIG. 11 is a block diagram illustrating a software-defined networking (“SDN”) network capable of implementing aspects of the embodiments disclosed herein.
- FIG. 12 is a block diagram illustrating an example mobile device capable of implementing aspects of the embodiments disclosed herein.
- FIG. 13 is a block diagram illustrating an example computer system capable of implementing aspects of the embodiments presented herein.
- FIG. 14 is a diagram illustrating a network, according to an illustrative embodiment.
- FIG. 15 is a block diagram illustrating aspects of an illustrative cloud environment capable of implementing aspects of the embodiments presented herein.
- program modules include routines, programs, components, data structures, computer-executable instructions, and/or other types of structures that perform particular tasks or implement particular abstract data types.
- the subject matter described herein may be practiced with other computer systems, including hand-held devices, mobile devices, wireless devices, multiprocessor systems, distributed computing systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, routers, switches, other computing devices described herein, and the like.
- offloading application level implementation of a consensus algorithm to the network offers the potential to reduce the consensus latency.
- NetPaxos proposes implementing Paxos in the network by utilizing OpenFlow switches. NetPaxos can also be implemented using P4, a domain-specific language that allows programming of the packet-forwarding data plane. Other efforts have been made to implement the entire ZAB consensus algorithm on FPGA devices using a low-level language. This hardware-based solution, however, might not be scalable, as it requires the storage of potentially large amounts of consensus state, logic, and even application data.
- the concepts and technologies disclosed herein propose a network-assisted Raft consensus algorithm that takes advantage of programmable P4 switches and offloads certain Raft functionality to the network.
- the proposed algorithm is referred to herein as “NetRaft.”
- the concepts and technologies disclosed herein focus on Raft since it has been formally proven to be safe and is more understandable than Paxos.
- Raft has been used in the implementation of popular software-defined networking (“SDN”) controllers, such as OpenDayLight.
- NetRaft effectively reduces consensus latency, is failure-aware, and does not sacrifice correctness or scalability.
- NetRaft uses P4-based programmable switches and offloads partial Raft functionality to the switch.
- the concepts and technologies disclosed herein demonstrate the efficacy of this approach and the performance improvements NetRaft offers via a prototype implementation.
- Raft is a consensus algorithm designed as an alternative to Paxos. The designers of Raft intended the algorithm to be more understandable than Paxos, and Raft has been formally proven safe. Raft also provides a better foundation for building practical systems. To enhance understandability, Raft separates the main consensus components into the following sub-problems: 1) Leader election: a new leader is elected when the current leader fails; 2) Log replication: the leader accepts log entries from clients and replicates the log entries, forcing other logs to be consistent with the leader's log; and 3) Log commitment: a few restrictions are enforced to ensure safe log commitment; that is, if any member has applied a particular command to its state machine, then no other member may apply a different command for the same entry.
- Raft starts by electing a strong leader, and then gives the leader full responsibility for managing the replicated log.
- the leader accepts log entries from clients (e.g., end devices making requests), and replicates the log entries to other servers. When it is safe to apply log entries to the state machines, the leader notifies the other servers to apply the log entries to their respective local state machines.
- Raft server clusters typically contain an odd number of members (e.g., five servers, which can tolerate two failures). Each server in a Raft server cluster can be in one of three states: a follower state 102 (“follower(s) 102 ”), a candidate state 104 (“candidate(s) 104 ”), or a leader state 106 (“leader 106 ”). Typically, a Raft server cluster has one server operating as the leader 106 and the other servers operating as the followers 102 . The followers 102 can passively receive remote procedure calls (“RPCs”) from the leader 106 or the candidate(s) 104 .
- the candidate(s) 104 can initiate an election, and a candidate 104 becomes the leader 106 after receiving votes from a majority of the servers in the Raft server cluster.
- the leader 106 responds to requests received from clients and replicates corresponding log entries to the follower(s) 102 . If a client sends a request to one of the followers 102 instead of the leader 106 , the follower 102 can redirect the request to the leader 106 .
- the Raft terms overview 200 illustrates time 202 that is divided into four terms 204 , 206 , 208 , 210 of arbitrary length.
- the terms 204 , 206 , 208 , 210 are numbered with monotonically increasing integers, where each term 204 , 206 , 208 , 210 begins with an election. If a given candidate 104 wins an election (i.e., a successful election 212 , 212 ′, 212 ′′), the candidate 104 will serve as the leader 106 for the rest of the corresponding term.
- terms 204 (“T 1 ”), 208 (“T 3 ”), 210 (“T 4 ”) each begin with a successful election 212 , 212 ′, 212 ′′, and continue thereafter with normal Raft operations 214 , 214 ′, 214 ′′.
- Term 206 (“T 2 ”) leads to split votes 216 , resulting in no successful election 212 and no normal Raft operations 214 .
- Terms 204 , 206 , 208 , 210 allow Raft servers to detect obsolete information, such as information stored by stale leaders. Current terms are exchanged whenever servers communicate using RPCs. When a leader 106 or a candidate 104 learns that its current term is out of date (i.e., there exists a higher term number among the server cluster), the leader 106 or the candidate 104 immediately reverts to the follower state 102 . If a server receives a request with a stale term number (e.g., a vote request from a candidate 104 or a request from the leader 106 to replicate a log entry), the server will reject the request.
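- The term rules above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the `Server` class and the `observe_term` and `accept_request` methods are hypothetical names introduced here.

```python
from dataclasses import dataclass

FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"


@dataclass
class Server:
    """Hypothetical model of one Raft server; names are illustrative only."""
    current_term: int = 0
    state: str = FOLLOWER

    def observe_term(self, remote_term: int) -> None:
        # A higher term elsewhere in the cluster means local information is
        # stale: adopt the newer term and revert to the follower state.
        if remote_term > self.current_term:
            self.current_term = remote_term
            self.state = FOLLOWER

    def accept_request(self, request_term: int) -> bool:
        # Requests that carry a stale term number are rejected.
        if request_term < self.current_term:
            return False
        self.observe_term(request_term)
        return True
```

- In this sketch a leader at term 3 rejects a request stamped with term 2 and stays leader, but steps down as soon as it sees term 5.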
- the leader 106 of a Raft server cluster periodically sends heartbeats to the followers 102 . All other servers remain in the follower state 102 as long as they are receiving heartbeats from the (current) leader 106 . If a given follower 102 does not receive a heartbeat message during a predefined period of time (referred to herein as an “election timeout”), the follower 102 assumes that there is no leader 106 and starts a new election ( 108 ). To start a new election, the follower 102 that encountered the election timeout increments its current term, votes for itself, and transitions to the candidate state 104 . The newly transitioned candidate 104 then sends RequestVote RPCs to all other servers in the server cluster ( 110 ).
- the candidate 104 wins the election if it receives votes from a majority of the other servers for its term ( 112 ). Then, the candidate 104 transitions to the leader state 106 and sends heartbeats to all other servers in the server cluster to prevent new elections and to establish its authority for its term.
- While waiting for votes, the candidate 104 might receive a heartbeat message from another server claiming to be the leader 106 . If the received term number is at least as large as the candidate's 104 current term, then the candidate 104 will abandon its candidacy and transition back to the follower state 102 ( 114 ).
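- The election steps described above can be sketched as three small functions. The function names are hypothetical, and the majority is counted here over the whole cluster including the candidate's own vote, which is an assumption of this sketch rather than a statement from the text.

```python
from typing import Tuple


def start_election(current_term: int) -> Tuple[int, str, int]:
    """On election timeout: increment the term, become a candidate, vote for self."""
    # Returns (new term, new state, votes collected so far).
    return current_term + 1, "candidate", 1


def wins_election(votes: int, cluster_size: int) -> bool:
    # Assumption: a strict majority of the full cluster, own vote included.
    return votes > cluster_size // 2


def on_heartbeat(candidate_term: int, heartbeat_term: int) -> str:
    # A candidate yields to a claimed leader whose term is at least as large
    # as its own; otherwise it stays a candidate.
    return "follower" if heartbeat_term >= candidate_term else "candidate"
```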
- Raft enforces restrictions on elected leaders to guarantee that all committed entries from previous terms are present on the new leader.
- a candidate must receive a majority vote from the server cluster.
- a server in the cluster will vote for a candidate only if the candidate has a higher term and a log that is at least as up to date as the server's own log. Otherwise, the server rejects the vote request. Therefore, receiving a majority vote means that the log of the new leader contains all committed entries.
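- A minimal sketch of the voting restriction follows. The text does not define "up to date"; the comparison used here (last log term first, then last log index) is the rule from the original Raft design and is an assumption in this context. The function names are hypothetical.

```python
from typing import Tuple

LogTail = Tuple[int, int]  # (term of last log entry, index of last log entry)


def log_is_up_to_date(candidate: LogTail, voter: LogTail) -> bool:
    """True if the candidate's log is at least as up to date as the voter's."""
    cand_term, cand_index = candidate
    my_term, my_index = voter
    if cand_term != my_term:
        return cand_term > my_term
    return cand_index >= my_index


def grant_vote(candidate_term: int, voter_term: int,
               candidate_log: LogTail, voter_log: LogTail) -> bool:
    # Follows the text: the candidate needs a higher term AND a log that is
    # at least as up to date as the voter's log.
    return candidate_term > voter_term and log_is_up_to_date(candidate_log, voter_log)
```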
- the NetRaft system architecture 300 includes a network 302 with which a plurality of P4 switches 304 corresponding to a plurality of servers 306 are in communication.
- P4 SWITCH 1 304 A is shown in communication with SERVER 1 306 A
- P4 SWITCH 2 304 B is shown in communication with SERVER 2 306 B
- P4 SWITCH 3 304 C is shown in communication with SERVER 3 306 C.
- the NetRaft system architecture 300 also shows a client 308 operating in communication with the P4 SWITCH 2 304 B.
- the network 302 can be or can include any packet network capable of exchanging data packets (e.g., among the client 308 , the P4 switches 304 , and the servers 306 ). Additional details regarding the network 302 are provided herein with reference to FIG. 14 .
- the P4 switches 304 utilize the P4 language to control data-plane behavior.
- the P4 language can be used to support customized functionality (e.g., the evolving OpenFlow standard), specific data-center packet processing logic, and the like.
- the P4 language composes an abstract forwarding model that uses a chain of tables for packet processing. The tables match pre-defined packet fields, and perform a sequence of actions.
- a P4 compiler then takes charge of applying the abstract forwarding model to create a concrete implementation on a particular target platform (e.g., the P4 switches 304 ).
- control blocks that specify a way of composing tables
- tables that specify packet processing logic, a high-level behavior representation about field matching and corresponding actions
- customized packet header fields that are a collection of packet bytes
- packet header parser that describes a way of transforming incoming packets to field matching instances
- actions that forward or drop packets, modify fields, perform stateful memory operations, and encapsulate or decapsulate headers.
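- The abstract forwarding model described above (a chain of tables that match header fields and run actions) can be imitated in a few lines of Python. This is only a conceptual sketch: it is not P4 code, and `Table`, `control`, `set_port`, `drop`, and the field names are hypothetical.

```python
from typing import Callable, Dict, List

Packet = Dict[str, int]
Action = Callable[[Packet], Packet]


def no_op(pkt: Packet) -> Packet:
    return pkt


def drop(pkt: Packet) -> Packet:
    return {**pkt, "dropped": 1}


def set_port(port: int) -> Action:
    def act(pkt: Packet) -> Packet:
        return {**pkt, "egress_port": port}
    return act


class Table:
    """One match-action table with exact match on a single header field."""

    def __init__(self, field: str, default: Action = no_op):
        self.field = field
        self.default = default
        self.entries: Dict[int, Action] = {}

    def apply(self, pkt: Packet) -> Packet:
        # Look up the field value; fall back to the default action on a miss.
        action = self.entries.get(pkt.get(self.field), self.default)
        return action(pkt)


def control(tables: List[Table], pkt: Packet) -> Packet:
    """Control block: apply the tables in order, stopping once a packet is dropped."""
    for table in tables:
        if pkt.get("dropped"):
            break
        pkt = table.apply(pkt)
    return pkt
```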
- NetRaft uses registers to keep track of Raft states like logs and state machines. Registers provide persistent state that can be organized as an array of cells. NetRaft specifies the size of each cell and the number of cells in the array of cells, when declaring a register for Raft state.
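- A register as described above (an array of cells with a declared cell size and cell count) can be modeled as follows. The `Register` class is a hypothetical Python stand-in, and truncating written values to the cell width is an assumption of this model.

```python
class Register:
    """Fixed-size array of fixed-width cells, loosely modeled on a P4 register."""

    def __init__(self, cell_bits: int, num_cells: int):
        self.mask = (1 << cell_bits) - 1   # largest value one cell can hold
        self.cells = [0] * num_cells       # state persists across packets

    def write(self, index: int, value: int) -> None:
        # Assumption: values wider than the cell are truncated to the cell width.
        self.cells[index] = value & self.mask

    def read(self, index: int) -> int:
        return self.cells[index]
```

- For example, a Raft log could be declared as `Register(cell_bits=32, num_cells=1024)`, with the log index used as the cell index; those sizes are illustrative only.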
- A unique feature of NetRaft is the ability to duplicate only the necessary logic to the P4 switches 304 , which act as a cache to reduce consensus latency. Thus, NetRaft minimizes the storage of replicated log entries and state machine state in the P4 switches 304 . As will be described in further detail below, in NetRaft, the entire Raft algorithm still runs on the servers 306 . This partial offloading architecture helps improve the performance of Raft, especially the consensus latency, without sacrificing scalability.
- Raft has three roles: the leader 106 who maintains consensus in a centralized way, the follower(s) 102 who passively respond(s) to Raft RPCs, and the candidate(s) 104 who is/are converted/transitioned from a follower 102 during leader election when the original leader fails.
- the basic version of legacy Raft has only two RPCs: RequestVote issued by a candidate during election and AppendEntries issued by the leader to send heartbeats or log entries.
- legacy Raft has only four message types (two RPCs and two responses) compared to ten types in ZAB.
- NetRaft offloads the processing of AppendEntries messages and the responses of RequestVote to the P4 switches 304 .
- the illustrated NetRaft system architecture 300 includes two components: a front-end 310 implemented in the P4 switch 304 (in the illustrated embodiment, the P4 SWITCH 3 304 C) executing a partial Raft algorithm 312 to perform log replication 314 and log commitment 316 elements for NetRaft, and a back-end 318 in the server 306 (in the illustrated embodiment, the SERVER 3 306 C) running a complete Raft algorithm 320 .
- Log replication 314 , 314 ′ and log commitment 316 , 316 ′ elements are duplicated at the front-end 310 and the back-end 318 to improve performance and scalability.
- the front-end 310 enhances Raft in two aspects.
- the front-end 310 is able to perform Raft-aware forwarding, and can quickly respond to Raft requests by rewriting the incoming packets.
- a job of the back-end 318 is to execute the complete Raft algorithm 320 to perform leader election 322 and to maintain complete states 324 on the server 306 for responding to certain requests that might not be fulfilled by the front-end 310 .
- the P4 switch via the front-end 310 , parses Raft request messages and caches Raft states using P4's primitive actions.
- the front-end 310 parses the request message and rewrites the request message to construct a corresponding response message.
- the front-end 310 also forwards the original packet to the back-end 318 for a liveness check.
- the back-end 318 sends a response message to the P4 switch 304 , but the front-end 310 does not forward the response message, and instead, only extracts any necessary flow control information therefrom.
- the front-end 310 might not be able to generate a response due to the limited information available on the P4 switch 304 ; such a request will be served normally by the back-end 318 .
- the new server attempts to fetch all the logs. These logs might not all be available at the front-end 310 , and in such instances, the back-end 318 would serve the request.
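- The fallback behavior described above can be sketched as a cache lookup: the front-end answers only when it holds every requested log entry, and otherwise the back-end serves the request. The function and the dictionary-based logs are hypothetical.

```python
from typing import Dict, List, Tuple


def serve_log_fetch(front_end_log: Dict[int, str],
                    back_end_log: Dict[int, str],
                    indices: List[int]) -> Tuple[str, List[str]]:
    """Return (which component served the request, the requested entries)."""
    # The front-end holds only a partial log, so it can answer only if every
    # requested index is cached on the switch.
    if all(i in front_end_log for i in indices):
        return "front-end", [front_end_log[i] for i in indices]
    # The back-end keeps the complete state and serves everything else.
    return "back-end", [back_end_log[i] for i in indices]
```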
- the front-end 310 can forward certain Raft messages without involving the back-end 318 .
- Raft requests from the client 308 can only be handled by the leader 106 .
- the client 308 randomly picks a server in the cluster with which to communicate. If the selected server is not the leader 106 , the selected server notifies the client 308 of the leader's IP address (if known). The client 308 then issues a new request to the leader 106 .
- In NetRaft, since the front-end 310 is aware of Raft (via implementation of the partial Raft algorithm 312 ), the front-end 310 of the selected server can forward the request to the leader 106 directly and reduce the communication overhead, as will be described in greater detail below with reference to FIGS. 4A-4B and 5A-5B .
- the front-end 310 can discard obsolete information because the back-end 318 always keeps the necessary information.
- the mechanism for discarding state machine is different from discarding obsolete log entries because the front-end 310 needs to know whether a requested item is already in the front-end 310 state machine or the back-end state machine.
- the front-end 310 ensures the back-end 318 is in sync before deleting state information.
- Turning now to FIGS. 4A-4B , block diagrams illustrating a comparison between a read operation as performed by legacy Raft ( 400 ) and a read operation as performed by NetRaft ( 406 ) will be described.
- the client 308 sends ( 402 ) a read request to the SERVER 2 306 B, which is not the leader 106 .
- the SERVER 2 306 B will notify the client 308 that the SERVER 1 306 A is the leader 106 and then the client 308 re-sends ( 404 ) the read request to the SERVER 1 306 A.
- In NetRaft, a Raft-aware switch, SWITCH 2 304 B, connected to the SERVER 2 306 B receives the read request and then forwards it directly to the leader 106 (SERVER 1 306 A) ( 408 ). Since SWITCH 1 304 A is Raft-aware (via implementation of the partial Raft algorithm 312 ) and is connected to the SERVER 1 306 A operating in the leader state 106 and having the latest information, the SWITCH 1 304 A can reply to the client's request immediately without involving the SERVER 1 306 A.
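- The difference between the two read paths can be illustrated by listing the messages each one exchanges. This is a simplified trace under stated assumptions: the function names are hypothetical, and the replies are modeled as single messages sent straight back to the client.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, description)


def legacy_read(leader: str, contacted: str) -> List[Message]:
    """Legacy Raft: a non-leader redirects the client, which then retries."""
    trace = [("client", contacted, "read")]
    if contacted != leader:
        trace.append((contacted, "client", "redirect to " + leader))
        trace.append(("client", leader, "read"))
    trace.append((leader, "client", "reply"))
    return trace


def netraft_read(leader_switch: str, contacted_switch: str) -> List[Message]:
    """NetRaft: the contacted switch forwards directly; the leader's switch replies."""
    trace = [("client", contacted_switch, "read")]
    if contacted_switch != leader_switch:
        trace.append((contacted_switch, leader_switch, "read"))
    trace.append((leader_switch, "client", "reply"))
    return trace
```

- With the client contacting SERVER 2 or SWITCH 2 and the leader at SERVER 1, this model yields four messages for legacy Raft and three for NetRaft, none of which reaches a server in the NetRaft case.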
- Turning now to FIGS. 5A-5B , block diagrams illustrating a comparison between a write operation as performed by legacy Raft ( 500 ) and a write operation as performed by NetRaft ( 508 ) will be described.
- the client 308 sends a write request ( 502 ) to the leader 106 (SERVER 1 306 A).
- the SERVER 1 306 A will then notify the SERVER 2 306 B and the SERVER 3 306 C of the write results ( 504 , 506 , respectively).
- In NetRaft, after a Raft-aware switch, SWITCH 1 304 A, connected to the SERVER 1 306 A receives a write request ( 510 ), the SWITCH 1 304 A can handle the write request and notify the SERVER 2 306 B and the SERVER 3 306 C of the results directly without involving the SERVER 1 306 A ( 512 , 514 , respectively). Moreover, when the SWITCH 2 304 B and the SWITCH 3 304 C receive the results, these P4 switches can respond immediately.
- Turning now to FIG. 6 , aspects of a method 600 for operating a Raft-aware P4 switch, such as one of the P4 switches 304 , will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein. It should be understood that the operations of the methods disclosed herein are not necessarily presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the concepts and technologies disclosed herein.
- the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system.
- the implementation is a matter of choice dependent on the performance and other requirements of the computing system.
- the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These states, operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof.
- the phrase “cause a processor to perform operations” and variants thereof is used to refer to causing a processor or other processing component(s) disclosed herein to perform operations. It should be understood that the performance of one or more operations may include operations executed by one or more virtual processors at the instructions of one or more of the aforementioned hardware processors.
- the method 600 begins and proceeds to operation 602 , where the P4 switch 304 receives a Raft request message. From operation 602 , the method 600 proceeds to operation 604 , where the P4 switch 304 parses the Raft request message. From operation 604 , the method 600 proceeds to operation 606 , where the P4 switch 304 rewrites the Raft request message to construct a corresponding front-end-generated Raft response message. From operation 606 , the method 600 proceeds to operation 608 , where the P4 switch 304 forwards the Raft request message to the back-end 318 for a liveness check.
- From operation 608 , the method 600 proceeds to operation 610 , where the P4 switch 304 receives, from the back-end 318 , a back-end-generated Raft response message. From operation 610 , the method 600 proceeds to operation 612 , where the P4 switch 304 does not forward the back-end-generated Raft response message, and instead, extracts flow control information therefrom. From operation 612 , the method 600 proceeds to operation 614 , where the method 600 ends.
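- The operations of the method 600 can be illustrated with a minimal Python sketch. It is not the P4 program of the disclosure: the `RaftMessage` fields, the `FrontEnd` class, and the `window` flow-control value are hypothetical names chosen only to make the control flow of operations 602 - 612 concrete.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RaftMessage:
    """Hypothetical, simplified message; a real Raft RPC carries more fields."""
    kind: str                # "request" or "response"
    term: int
    sender: str
    receiver: str
    origin: str = "server"   # component that generated the message
    window: int = 0          # assumed flow-control value


@dataclass
class FrontEnd:
    """Sketch of the switch front-end behavior in operations 602-612."""
    name: str
    flow_window: int = 0
    to_backend: list = field(default_factory=list)

    def on_request(self, request):
        # Operations 602-604: receive and parse the Raft request message.
        if request.kind != "request":
            raise ValueError("expected a Raft request message")
        # Operation 606: rewrite the request into a front-end-generated response.
        response = replace(
            request,
            kind="response",
            sender=request.receiver,
            receiver=request.sender,
            origin="front-end",
        )
        # Operation 608: still pass the request to the back-end for a liveness check.
        self.to_backend.append(request)
        return response

    def on_backend_response(self, response):
        # Operations 610-612: do not forward the back-end-generated response;
        # keep only its flow-control information.
        self.flow_window = response.window
        return None


switch = FrontEnd("SWITCH 2")
request = RaftMessage("request", term=3, sender="SERVER 1", receiver="SERVER 2")
reply = switch.on_request(request)
forwarded = switch.on_backend_response(
    RaftMessage("response", 3, "SERVER 2", "SERVER 1", origin="back-end", window=8)
)
```

- In this sketch the reply returned by `on_request` is produced before the back-end is consulted, while `on_backend_response` returns nothing, which mirrors the back-end-generated response being absorbed rather than forwarded.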
- the method 700 begins and proceeds to operation 702 , where the client 308 generates a request message. From operation 702 , the method 700 proceeds to operation 704 , where, during a bootstrap phase, the client 308 randomly selects one of the servers 306 in a server cluster with which to communicate. From operation 704 , the method 700 proceeds to operation 706 , where the client 308 issues a new request to the selected server. From operation 706 , the method 700 proceeds to operation 708 , where it is determined if the selected server is operating in the leader state 106 .
- If the selected server is operating in the leader state 106 , the method 700 proceeds to operation 710 , where the method 700 ends. If, however, the selected server is not operating in the leader state 106 , the method 700 proceeds to operation 712 , where the front-end 310 of the P4 switch 304 in communication with the selected server forwards the new request to the leader 106 , thereby reducing the communication overhead generated in legacy Raft. From operation 712 , the method 700 proceeds to operation 710 , where the method 700 ends.
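- The request-routing decision of the method 700 can be sketched as follows. The function name `route_request` and the server labels are assumptions made for illustration; the sketch only records which hops a request takes and does not model the Raft state machine.

```python
import random


def route_request(servers, leader, rng=random):
    """Return the hops taken by a new client request (operations 704-712).

    `servers` and `leader` are hypothetical server names.
    """
    # Operation 704: during bootstrap the client picks a server at random.
    selected = rng.choice(servers)
    # Operation 706: the client issues the request to the selected server.
    hops = [("client", selected)]
    # Operation 708: check whether the selected server is the leader.
    if selected != leader:
        # Operation 712: the switch front-end forwards the request to the
        # leader itself, so the client does not need to retry.
        hops.append(("front-end of " + selected, leader))
    return hops


to_follower = route_request(["SERVER 2"], leader="SERVER 1")
to_leader = route_request(["SERVER 1"], leader="SERVER 1")
```

- Passing a single-element server list, as in the two calls above, makes the random choice deterministic so that both branches can be inspected.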
- FIG. 8 a flow diagram illustrating aspects of a method 800 for executing a NetRaft read operation will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- the method 800 will be described with reference to FIG. 8 and additional reference to FIG. 4B .
- the method 800 begins and proceeds to operation 802 , where the client 308 sends a read request message to the SERVER 2 306 B, which is not the current leader 106 in the server cluster. From operation 802 , the method 800 proceeds to operation 804 , where the P4 SWITCH 2 304 B receives the read request message from the SERVER 2 306 B.
- From operation 804 , the method 800 proceeds to operation 806 , where the P4 SWITCH 2 304 B forwards the read request message to the current leader (the SERVER 1 306 A in FIG. 4B ). From operation 806 , the method 800 proceeds to operation 808 , where the P4 SWITCH 1 304 A, which is connected to the SERVER 1 306 A, has the latest information, and replies to the read request message immediately without involving the SERVER 1 306 A. From operation 808 , the method 800 proceeds to operation 810 , where the method 800 ends.
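- A minimal sketch of the read path of the method 800 follows. It assumes, purely for illustration, that the "latest information" held by the leader-side switch is a key-value dictionary; the class `ReadSwitch` and its attributes are hypothetical.

```python
class ReadSwitch:
    """Hypothetical switch front-end for the read path (operations 804-808)."""

    def __init__(self, name, is_leader_side, state=None):
        self.name = name
        self.is_leader_side = is_leader_side
        # Assumed key-value view of the latest committed state.
        self.state = state if state is not None else {}
        self.leader_switch = None

    def handle_read(self, key):
        if self.is_leader_side:
            # Operation 808: answer from the switch, without involving the server.
            return {"value": self.state.get(key), "answered_by": self.name}
        # Operation 806: a follower-side switch forwards the read to the leader side.
        return self.leader_switch.handle_read(key)


switch_1 = ReadSwitch("SWITCH 1", is_leader_side=True, state={"x": 7})
switch_2 = ReadSwitch("SWITCH 2", is_leader_side=False)
switch_2.leader_switch = switch_1
result = switch_2.handle_read("x")
```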
- FIG. 9 a flow diagram illustrating aspects of a method 900 for executing a NetRaft write operation will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- the method 900 begins and proceeds to operation 902 , where the client 308 sends a write request message to the SERVER 1 306 A, which is the current leader 106 in the server cluster. From operation 902 , the method 900 proceeds to operation 904 , where the P4 SWITCH 1 304 A receives a write request message.
- From operation 904 , the method 900 proceeds to operation 906 , where the P4 SWITCH 1 304 A handles the write request message. From operation 906 , the method 900 proceeds to operation 908 , where the P4 SWITCH 1 304 A notifies the SERVER 2 306 B and the SERVER 3 306 C of the write request results without involving the SERVER 1 306 A. From operation 908 , the method 900 proceeds to operation 910 , where the method 900 ends.
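- The write path of the method 900 can be sketched in the same style. The `WriteSwitch` class, its in-memory `log`, and the `"ack"` reply are assumptions used only to show the leader-side switch handling the write and notifying the follower side directly.

```python
class WriteSwitch:
    """Hypothetical switch front-end for the write path (operations 904-908)."""

    def __init__(self, name):
        self.name = name
        self.log = []      # assumed in-switch copy of the replicated entries
        self.peers = []    # follower-side switches to notify

    def handle_write(self, entry):
        # Operation 906: the leader-side switch handles the write itself.
        self.log.append(entry)
        # Operation 908: notify the follower side directly; the leader
        # server is not on this path.
        return [peer.on_notify(entry) for peer in self.peers]

    def on_notify(self, entry):
        # A follower-side switch records the result and responds immediately.
        self.log.append(entry)
        return (self.name, "ack")


switch_1 = WriteSwitch("SWITCH 1")
switch_2 = WriteSwitch("SWITCH 2")
switch_3 = WriteSwitch("SWITCH 3")
switch_1.peers = [switch_2, switch_3]
acks = switch_1.handle_write("set x=1")
```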
- FIG. 10 is a block diagram illustrating an example experimental setup 1000 for a NetRaft implementation, according to an illustrative embodiment.
- the experimental setup 1000 shows the SERVER 1 306 A operating as the leader 106 , and the SERVER 2 306 B and the SERVER 3 306 C operating as followers 102 , 102 ′.
- the P4 switches 304 A- 304 D are also shown.
- the client 308 is in communication with the P4 SWITCH 4 304 D.
- the interval of RPC calls can be measured (e.g., using LogCabin or similar software) and the timestamps for each RPC call can be recorded by network interface controllers (“NICs”) 1002 of each P4 switch 304 .
- the latencies (in μs) between the leader 106 and the followers 102 , 102 ′ for a heartbeat message and for the client's write request are shown in a table 1004 .
- the latency is decomposed into several fine-grained segments. Latency savings from NetRaft over Raft can be observed for both heartbeat messages and write request messages.
- the experimental setup 1000 demonstrates that NetRaft does not add significant memory usage for P4 switches 304 compared to P4 switches 304 performing regular forwarding. It should be noted that the results shown in the table 1004 are from a simulation of one P4 switch 304 . Those skilled in the art will appreciate that better performance can be expected when the front-end 310 disclosed herein for NetRaft runs on a real hardware P4 switch.
- the table 1004 shows the decomposed latency between a leader 106 and a follower 102 .
- Column a shows RPC latency at the leader side and the bidirectional latency between SERVER 1 306 A and P4 SWITCH 4 304 D.
- Column b shows the bidirectional latency in P4 SWITCH 1 304 A.
- Column c shows the bidirectional latency between P4 SWITCH 1 304 A and P4 SWITCH 2 304 B.
- Column d shows the bidirectional latency in P4 SWITCH 2 304 B.
- Column e shows the bidirectional latency between P4 SWITCH 2 304 B and SERVER 2 306 B and the latency of the follower 102 .
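- Because the columns a through e partition the leader-to-follower path into consecutive segments, one way to read the table 1004 is to sum the segments for each protocol and compare the totals. The sketch below assumes that simple additive model; the function names are hypothetical and the numbers are placeholders, not values from the table 1004 .

```python
SEGMENTS = ("a", "b", "c", "d", "e")


def end_to_end_latency_us(row):
    """Sum the per-segment latencies (in microseconds) of one table row."""
    missing = [name for name in SEGMENTS if name not in row]
    if missing:
        raise KeyError("missing segments: " + ", ".join(missing))
    return sum(row[name] for name in SEGMENTS)


def saving_us(raft_row, netraft_row):
    """Latency saved by NetRaft relative to Raft under the additive model."""
    return end_to_end_latency_us(raft_row) - end_to_end_latency_us(netraft_row)


# Placeholder numbers only; they are NOT measurements from the table 1004.
raft_row = {"a": 5, "b": 1, "c": 2, "d": 1, "e": 5}
netraft_row = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
saved = saving_us(raft_row, netraft_row)
```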
- the illustrated SDN network 1100 includes a SDN network data plane 1102 , a SDN network control plane 1104 , and a SDN network application plane 1106 .
- the SDN network data plane 1102 is a network plane responsible for bearing data traffic.
- the illustrated SDN network data plane 1102 includes SDN elements 1108 - 1108 K.
- the SDN elements 1108 - 1108 K can be or can include SDN-enabled network elements such as switches, routers, gateways, the like, or any combination thereof.
- the SDN elements 1108 - 1108 K can include the P4 switches 304 .
- the SDN network control plane 1104 is a network plane responsible for controlling elements of the SDN network data plane 1102 .
- the illustrated SDN network control plane 1104 includes SDN controllers 1110 - 1110 M.
- the SDN controllers 1110 - 1110 M are logically centralized network entities that perform operations, including translating an intent of one or more SDN applications 1112 - 1112 N operating within the SDN network application plane 1106 to rules and action sets that are useable by the SDN elements 1108 - 1108 K operating within the SDN network data plane 1102 .
- the rules can include criteria such as, for example, switch port, VLAN ID, VLAN PCP, MAC source address, MAC destination address, Ethernet type, IP source address, IP destination address, IP ToS, IP Protocol, L4 Source Port, and L4 Destination Port.
- the rules can be matched to one or more actions such as, for example, an action to forward traffic to one or more ports, an action to drop one or more packets, an action to encapsulate one or more packets and forward to a controller, an action to send one or more packets to a normal processing pipeline, and an action to modify one or more fields of one or more packets.
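- The pairing of match criteria with actions described above can be sketched as a small match-action lookup. The field names (`vlan_id`, `l4_dst_port`, `ip_src`), the first-match ordering, and the table-miss default of sending the packet to a controller are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: match criteria paired with one action."""
    match: dict              # criterion name -> required value
    action: str              # e.g. "forward", "drop", "send_to_controller"
    argument: object = None  # e.g. an output port for "forward"


def lookup(rules, packet):
    """Return the first rule whose criteria all equal the packet's fields."""
    for rule in rules:
        if all(packet.get(name) == value for name, value in rule.match.items()):
            return rule
    # Assumed table-miss behavior: hand the packet to a controller.
    return Rule({}, "send_to_controller")


rules = [
    Rule({"vlan_id": 10, "l4_dst_port": 80}, "forward", 3),
    Rule({"ip_src": "10.0.0.9"}, "drop"),
]
hit = lookup(rules, {"vlan_id": 10, "l4_dst_port": 80, "ip_src": "10.0.0.1"})
dropped = lookup(rules, {"ip_src": "10.0.0.9"})
miss = lookup(rules, {"ip_src": "10.0.0.2"})
```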
- the illustrated SDN network application plane 1106 is a network plane responsible for providing the SDN applications 1112 - 1112 N.
- the SDN applications 1112 - 1112 N are programs that can explicitly, directly, and programmatically communicate network requirements/intents and desired network behavior to the SDN controllers 1110 - 1110 M.
- FIG. 12 an illustrative mobile device 1200 and components thereof will be described.
- the client 308 is configured the same as or similar to the mobile device 1200 . While connections are not shown between the various components illustrated in FIG. 12 , it should be understood that some, none, or all of the components illustrated in FIG. 12 can be configured to interact with one another to carry out various device functions. In some embodiments, the components are arranged so as to communicate via one or more busses (not shown). Thus, it should be understood that FIG. 12 and the following description are intended to provide a general understanding of a suitable environment in which various aspects of embodiments can be implemented, and should not be construed as being limiting in any way.
- the mobile device 1200 can include a display 1202 for displaying data.
- the display 1202 can be configured to display various GUI elements, text, images, video, virtual keypads and/or keyboards, messaging data, notification messages, metadata, internet content, device status, time, date, calendar data, device preferences, map and location data, combinations thereof, and/or the like.
- the mobile device 1200 also can include a processor 1204 and a memory or other data storage device (“memory”) 1206 .
- the processor 1204 can be configured to process data and/or can execute computer-executable instructions stored in the memory 1206 .
- the computer-executable instructions executed by the processor 1204 can include, for example, an operating system 1208 , one or more applications 1210 , other computer-executable instructions stored in a memory 1206 , or the like.
- the applications 1210 also can include a user interface (“UI”) application (not illustrated in FIG. 12 ).
- the UI application can interface with the operating system 1208 to facilitate user interaction with functionality and/or data stored at the mobile device 1200 and/or stored elsewhere.
- the operating system 1208 can include a member of the SYMBIAN OS family of operating systems from SYMBIAN LIMITED, a member of the WINDOWS MOBILE OS and/or WINDOWS PHONE OS families of operating systems from MICROSOFT CORPORATION, a member of the PALM WEBOS family of operating systems from HEWLETT PACKARD CORPORATION, a member of the BLACKBERRY OS family of operating systems from RESEARCH IN MOTION LIMITED, a member of the IOS family of operating systems from APPLE INC., a member of the ANDROID OS family of operating systems from GOOGLE INC., and/or other operating systems.
- These operating systems are merely illustrative of some contemplated operating systems that may be used in accordance with various embodiments of the concepts and technologies described herein and therefore should not be construed as being limiting in any way.
- the UI application can be executed by the processor 1204 to aid a user in dialing telephone numbers, entering content, viewing account information, answering/initiating calls, entering/deleting data, entering and setting user IDs and passwords for device access, configuring settings, manipulating address book content and/or settings, multimode interaction, interacting with other applications 1210 , and otherwise facilitating user interaction with the operating system 1208 , the applications 1210 , and/or other types or instances of data 1212 that can be stored at the mobile device 1200 .
- the data 1212 can include, for example, telephone dialer applications, presence applications, visual voice mail applications, messaging applications, text-to-speech and speech-to-text applications, add-ons, plug-ins, email applications, music applications, video applications, camera applications, location-based service applications, power conservation applications, game applications, productivity applications, entertainment applications, enterprise applications, combinations thereof, and the like.
- the applications 1210 , the data 1212 , and/or portions thereof can be stored in the memory 1206 and/or in a firmware 1214 , and can be executed by the processor 1204 .
- the firmware 1214 also can store code for execution during device power up and power down operations. It can be appreciated that the firmware 1214 can be stored in a volatile or non-volatile data storage device including, but not limited to, the memory 1206 and/or a portion thereof.
- the mobile device 1200 also can include an input/output (“I/O”) interface 1216 .
- the I/O interface 1216 can be configured to support the input/output of data such as location information, user information, organization information, presence status information, user IDs, passwords, and application initiation (start-up) requests.
- the I/O interface 1216 can include a hardwire connection such as USB port, a mini-USB port, a micro-USB port, an audio jack, a PS2 port, an IEEE 1394 (“FIREWIRE”) port, a serial port, a parallel port, an Ethernet (RJ45) port, an RJ10 port, a proprietary port, combinations thereof, or the like.
- the mobile device 1200 can be configured to synchronize with another device to transfer content to and/or from the mobile device 1200 . In some embodiments, the mobile device 1200 can be configured to receive updates to one or more of the applications 1210 via the I/O interface 1216 , though this is not necessarily the case.
- the I/O interface 1216 accepts I/O devices such as keyboards, keypads, mice, interface tethers, printers, plotters, external storage, touch/multi-touch screens, touch pads, trackballs, joysticks, microphones, remote control devices, displays, projectors, medical equipment (e.g., stethoscopes, heart monitors, and other health metric monitors), modems, routers, external power sources, docking stations, combinations thereof, and the like. It should be appreciated that the I/O interface 1216 may be used for communications between the mobile device 1200 and a network device or local device.
- the mobile device 1200 also can include a communications component 1218 .
- the communications component 1218 can be configured to interface with the processor 1204 to facilitate wired and/or wireless communications with one or more networks such as one or more IP access networks and/or one or more circuit access networks.
- other networks include networks that utilize non-cellular wireless technologies such as WI-FI or WIMAX.
- the communications component 1218 includes a multimode communications subsystem for facilitating communications via the cellular network and one or more other networks.
- the communications component 1218 includes one or more transceivers.
- the one or more transceivers can be configured to communicate over the same and/or different wireless technology standards with respect to one another.
- one or more of the transceivers of the communications component 1218 may be configured to communicate using GSM, CDMA ONE, CDMA2000, LTE, and various other 2G, 2.5G, 3G, 4G, 5G, and greater generation technology standards.
- the communications component 1218 may facilitate communications over various channel access methods (which may or may not be used by the aforementioned standards) including, but not limited to, TDMA, FDMA, W-CDMA, OFDM, SDMA, and the like.
- the communications component 1218 may facilitate data communications using GPRS, EDGE, HSPA protocol family including HSDPA, EUL or otherwise termed HSUPA, HSPA+, and various other current and future wireless data access standards.
- the communications component 1218 can include a first transceiver (“TxRx”) 1220 A that can operate in a first communications mode (e.g., GSM).
- the communications component 1218 also can include an N th transceiver (“TxRx”) 1220 N that can operate in a second communications mode relative to the first transceiver 1220 A (e.g., UMTS).
- While the transceivers 1220 A- 1220 N are shown in FIG. 12 , it should be appreciated that fewer than two, two, and/or more than two transceivers 1220 can be included in the communications component 1218 .
- the communications component 1218 also can include an alternative transceiver (“Alt TxRx”) 1222 for supporting other types and/or standards of communications.
- the alternative transceiver 1222 can communicate using various communications technologies such as, for example, WI-FI, WIMAX, BLUETOOTH, infrared, infrared data association (“IRDA”), near-field communications (“NFC”), other radio frequency (“RF”) technologies, combinations thereof, and the like.
- the communications component 1218 also can facilitate reception from terrestrial radio networks, digital satellite radio networks, internet-based radio service networks, combinations thereof, and the like.
- the communications component 1218 can process data from a network such as the Internet, an intranet, a broadband network, a WI-FI hotspot, an Internet service provider (“ISP”), a digital subscriber line (“DSL”) provider, a broadband provider, combinations thereof, or the like.
- the mobile device 1200 also can include one or more sensors 1224 .
- the sensors 1224 can include temperature sensors, light sensors, air quality sensors, movement sensors, orientation sensors, noise sensors, proximity sensors, or the like. As such, it should be understood that the sensors 1224 can include, but are not limited to, accelerometers, magnetometers, gyroscopes, infrared sensors, noise sensors, microphones, combinations thereof, or the like. Additionally, audio capabilities for the mobile device 1200 may be provided by an audio I/O component 1226 .
- the audio I/O component 1226 of the mobile device 1200 can include one or more speakers for the output of audio signals, one or more microphones for the collection and/or input of audio signals, and/or other audio input and/or output devices.
- the illustrated mobile device 1200 also can include a subscriber identity module (“SIM”) system 1228 .
- the SIM system 1228 can include a universal SIM (“USIM”), a universal integrated circuit card (“UICC”) and/or other identity devices.
- the SIM system 1228 can include and/or can be connected to or inserted into an interface such as a slot interface 1230 .
- the slot interface 1230 can be configured to accept insertion of other identity cards or modules for accessing various types of networks. Additionally, or alternatively, the slot interface 1230 can be configured to accept multiple subscriber identity cards. Because other devices and/or modules for identifying users and/or the mobile device 1200 are contemplated, it should be understood that these embodiments are illustrative, and should not be construed as being limiting in any way.
- the mobile device 1200 also can include an image capture and processing system 1232 (“image system”).
- image system 1232 can be configured to capture or otherwise obtain photos, videos, and/or other visual information.
- the image system 1232 can include cameras, lenses, charge-coupled devices (“CCDs”), combinations thereof, or the like.
- the mobile device 1200 may also include a video system 1234 .
- the video system 1234 can be configured to capture, process, record, modify, and/or store video content. Photos and videos obtained using the image system 1232 and the video system 1234 , respectively, may be added as message content to a multimedia message service (“MMS”) message or an email message and sent to another mobile device.
- the video and/or photo content also can be shared with other devices via various types of data transfers via wired and/or wireless communication devices as described herein.
- the mobile device 1200 also can include one or more location components 1236 .
- the location components 1236 can be configured to send and/or receive signals to determine a geographic location of the mobile device 1200 .
- the location components 1236 can send and/or receive signals from GPS devices, assisted GPS (“A-GPS”) devices, WI-FI/WIMAX and/or cellular network triangulation data, combinations thereof, and the like.
- the location component 1236 also can be configured to communicate with the communications component 1218 to retrieve triangulation data for determining a location of the mobile device 1200 .
- the location component 1236 can interface with cellular network nodes, telephone lines, satellites, location transmitters and/or beacons, wireless network transmitters and receivers, combinations thereof, and the like.
- the location component 1236 can include and/or can communicate with one or more of the sensors 1224 such as a compass, an accelerometer, and/or a gyroscope to determine the orientation of the mobile device 1200 .
- the mobile device 1200 can generate and/or receive data to identify its geographic location, or to transmit data used by other devices to determine the location of the mobile device 1200 .
- the location component 1236 may include multiple components for determining the location and/or orientation of the mobile device 1200 .
- the illustrated mobile device 1200 also can include a power source 1238 .
- the power source 1238 can include one or more batteries, power supplies, power cells, and/or other power subsystems including alternating current (“AC”) and/or direct current (“DC”) power devices.
- the power source 1238 also can interface with an external power system or charging equipment via a power I/O component 1240 . Because the mobile device 1200 can include additional and/or alternative components, the above embodiment should be understood as being illustrative of one possible operating environment for various embodiments of the concepts and technologies described herein. The described embodiment of the mobile device 1200 is illustrative, and should not be construed as being limiting in any way.
- FIG. 13 is a block diagram illustrating a computer system 1300 configured to provide the functionality in accordance with various embodiments of the concepts and technologies disclosed herein.
- the P4 switches 304 , the servers 306 , and/or the client 308 can be configured, at least in part, like the architecture of the computer system 1300 . It should be understood, however, that modification to the architecture may be made to facilitate certain interactions among elements described herein.
- the computer system 1300 includes a processing unit 1302 , a memory 1304 , one or more user interface devices 1306 , one or more input/output (“I/O”) devices 1308 , and one or more network devices 1310 , each of which is operatively connected to a system bus 1312 .
- the bus 1312 enables bi-directional communication between the processing unit 1302 , the memory 1304 , the user interface devices 1306 , the I/O devices 1308 , and the network devices 1310 .
- the processing unit 1302 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or other type of processor known to those skilled in the art and suitable for controlling the operation of the server computer. Processing units are generally known, and therefore are not described in further detail herein.
- the memory 1304 communicates with the processing unit 1302 via the system bus 1312 .
- the memory 1304 is operatively connected to a memory controller (not shown) that enables communication with the processing unit 1302 via the system bus 1312 .
- the illustrated memory 1304 includes an operating system 1314 and one or more program modules 1316 .
- the operating system 1314 can include, but is not limited to, members of the WINDOWS, WINDOWS CE, and/or WINDOWS MOBILE families of operating systems from MICROSOFT CORPORATION, the LINUX family of operating systems, the SYMBIAN family of operating systems from SYMBIAN LIMITED, the BREW family of operating systems from QUALCOMM CORPORATION, the MAC OS, OS X, and/or iOS families of operating systems from APPLE CORPORATION, the FREEBSD family of operating systems, the SOLARIS family of operating systems from ORACLE CORPORATION, other operating systems, and the like.
- the program modules 1316 may include various software and/or program modules to perform the various operations described herein.
- the program modules 1316 and/or other programs can be embodied in computer-readable media containing instructions that, when executed by the processing unit 1302 , perform various operations such as those described herein.
- the program modules 1316 may be embodied in hardware, software, firmware, or any combination thereof.
- the program modules 1316 include a NetRaft algorithm 1320 , which can be implemented as the partial Raft algorithm 312 in the front-end 310 of the P4 switch 304 , or as the complete Raft algorithm 320 in the back-end 318 of the server 306 .
- Computer-readable media may include any available computer storage media or communication media that can be accessed by the computer system 1300 .
- Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media.
- modulated data signal means a signal that has one or more of its characteristics changed or set in a manner as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
- Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system 1300 .
- the phrase “computer storage medium” and variations thereof does not include waves or signals per se and/or communication media.
- the user interface devices 1306 may include one or more devices with which a user accesses the computer system 1300 .
- the user interface devices 1306 may include, but are not limited to, computers, servers, PDAs, cellular phones, or any suitable computing devices.
- the I/O devices 1308 enable a user to interface with the program modules 1316 .
- the I/O devices 1308 are operatively connected to an I/O controller (not shown) that enables communication with the processing unit 1302 via the system bus 1312 .
- the I/O devices 1308 may include one or more input devices, such as, but not limited to, a keyboard, a mouse, or an electronic stylus.
- the I/O devices 1308 may include one or more output devices, such as, but not limited to, a display screen or a printer.
- the I/O devices 1308 can be used for manual controls for operations to exercise under certain emergency situations.
- the network devices 1310 enable the computer system 1300 to communicate with other networks or remote systems via a network 1318 , which can be or can include the network 302 .
- Examples of the network devices 1310 include, but are not limited to, a modem, a radio frequency (“RF”) or infrared (“IR”) transceiver, a telephonic interface, a bridge, a router, or a network card.
- the network 1318 may include a wireless network such as, but not limited to, a Wireless Local Area Network (“WLAN”), a Wireless Wide Area Network (“WWAN”), a Wireless Personal Area Network (“WPAN”) such as provided via BLUETOOTH technology, or a Wireless Metropolitan Area Network (“WMAN”) such as a WiMAX network or metropolitan cellular network.
- the network 1318 may be a wired network such as, but not limited to, a Wide Area Network (“WAN”), a wired Personal Area Network (“PAN”), or a wired Metropolitan Area Network (“MAN”).
- the network 1318 may be any other network described herein.
- the network 302 can be or can include at least a portion of the network 1400 .
- the network 1400 includes a cellular network 1402 , a packet data network 1404 , for example, the Internet, and a circuit switched network 1406 , for example, a PSTN.
- the cellular network 1402 includes various components such as, but not limited to, base transceiver stations (“BTSs”), Node-B's or e-Node-B's, base station controllers (“BSCs”), radio network controllers (“RNCs”), mobile switching centers (“MSCs”), mobile management entities (“MMEs”), short message service centers (“SMSCs”), multimedia messaging service centers (“MMSCs”), home location registers (“HLRs”), home subscriber servers (“HSSs”), visitor location registers (“VLRs”), charging platforms, billing platforms, voicemail platforms, GPRS core network components, location service nodes, an IP Multimedia Subsystem (“IMS”), and the like.
- the cellular network 1402 also includes radios and nodes for receiving and transmitting voice, data, and combinations thereof to and from radio transceivers, networks, the packet data network 1404 , and the circuit switched network 1406 .
- a mobile communications device 1408 such as, for example, the client 308 , a cellular telephone, a user equipment, a mobile terminal, a PDA, a laptop computer, a handheld computer, and combinations thereof, can be operatively connected to the cellular network 1402 .
- the cellular network 1402 can be configured as a 2G GSM network and can provide data communications via GPRS and/or EDGE. Additionally, or alternatively, the cellular network 1402 can be configured as a 3G UMTS network and can provide data communications via the HSPA protocol family, for example, HSDPA, EUL (also referred to as HSUPA), and HSPA+.
- the cellular network 1402 also is compatible with 4G mobile communications standards such as LTE, or the like, as well as evolved and future mobile standards.
- the packet data network 1404 includes various devices, for example, servers, computers, databases, and other devices in communication with one another, as is generally known.
- the packet data network 1404 devices are accessible via one or more network links.
- the servers often store various files that are provided to a requesting device such as, for example, a computer, a terminal, a smartphone, or the like.
- the requesting device includes software (a “browser”) for executing a web page in a format readable by the browser or other software.
- Other files and/or data may be accessible via “links” in the retrieved files, as is generally known.
- the packet data network 1404 includes or is in communication with the Internet.
- the circuit switched network 1406 includes various hardware and software for providing circuit switched communications.
- the circuit switched network 1406 may include, or may be, what is often referred to as a POTS.
- the functionality of the circuit switched network 1406 or other circuit-switched networks is generally known and will not be described herein in detail.
- the illustrated cellular network 1402 is shown in communication with the packet data network 1404 and a circuit switched network 1406 , though it should be appreciated that this is not necessarily the case.
- One or more Internet-capable devices 1410 can communicate with one or more cellular networks 1402 , and devices connected thereto, through the packet data network 1404 . It also should be appreciated that the Internet-capable device 1410 can communicate with the packet data network 1404 through the circuit switched network 1406 , the cellular network 1402 , and/or via other networks (not illustrated).
- a communications device 1412 such as, for example, a telephone, facsimile machine, modem, computer, or the like, can be in communication with the circuit switched network 1406, and therethrough to the packet data network 1404 and/or the cellular network 1402.
- the communications device 1412 can be an Internet-capable device, and can be substantially similar to the Internet-capable device 1410 .
- the network 302 is used to refer broadly to any combination of the networks 1402, 1404, 1406 shown in FIG. 14.
- the cloud environment 1500 includes a physical environment 1502, a virtualization layer 1504, and a virtual environment 1506. While no connections are shown in FIG. 15, it should be understood that some, none, or all of the components illustrated in FIG. 15 can be configured to interact with one another to carry out various functions described herein. In some embodiments, the components are arranged so as to communicate via one or more networks, such as the network 302. Thus, it should be understood that FIG. 15 and the remaining description are intended to provide a general understanding of a suitable environment in which various aspects of the embodiments described herein can be implemented, and should not be construed as being limiting in any way.
- the physical environment 1502 provides hardware resources, which, in the illustrated embodiment, include one or more physical compute resources 1508 , one or more physical memory resources 1510 , and one or more other physical resources 1512 .
- the physical compute resource(s) 1508 can include one or more hardware components that perform computations to process data and/or to execute computer-executable instructions of one or more application programs, one or more operating systems, and/or other software.
- the physical compute resources 1508 can include one or more central processing units (“CPUs”) configured with one or more processing cores.
- the physical compute resources 1508 can include one or more graphics processing units (“GPUs”) configured to accelerate operations performed by one or more CPUs, and/or to perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, one or more operating systems, and/or other software that may or may not include instructions particular to graphics computations.
- the physical compute resources 1508 can include one or more discrete GPUs.
- the physical compute resources 1508 can include CPU and GPU components that are configured in accordance with a co-processing CPU/GPU computing model, wherein the sequential part of an application executes on the CPU and the computationally-intensive part is accelerated by the GPU processing capabilities.
- the physical compute resources 1508 can include one or more system-on-chip (“SoC”) components along with one or more other components, including, for example, one or more of the physical memory resources 1510 , and/or one or more of the other physical resources 1512 .
- the physical compute resources 1508 can be or can include one or more SNAPDRAGON SoCs, available from QUALCOMM of San Diego, Calif.; one or more TEGRA SoCs, available from NVIDIA of Santa Clara, Calif.; one or more HUMMINGBIRD SoCs, available from SAMSUNG of Seoul, South Korea; one or more Open Multimedia Application Platform (“OMAP”) SoCs, available from TEXAS INSTRUMENTS of Dallas, Tex.; one or more customized versions of any of the above SoCs; and/or one or more proprietary SoCs.
- the physical compute resources 1508 can be or can include one or more hardware components architected in accordance with an ARM architecture, available for license from ARM HOLDINGS of Cambridge, United Kingdom.
- the physical compute resources 1508 can be or can include one or more hardware components architected in accordance with an x86 architecture, such as an architecture available from INTEL CORPORATION of Santa Clara, Calif., and others. Those skilled in the art will appreciate that the implementation of the physical compute resources 1508 can utilize various computation architectures, and as such, the physical compute resources 1508 should not be construed as being limited to any particular computation architecture or combination of computation architectures, including those explicitly disclosed herein.
- the physical memory resource(s) 1510 can include one or more hardware components that perform storage/memory operations, including temporary or permanent storage operations.
- the physical memory resource(s) 1510 include volatile and/or non-volatile memory implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data disclosed herein.
- Computer storage media includes, but is not limited to, random access memory (“RAM”), read-only memory (“ROM”), Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store data and which can be accessed by the physical compute resources 1508 .
- the other physical resource(s) 1512 can include any other hardware resources that can be utilized by the physical compute resource(s) 1508 and/or the physical memory resource(s) 1510 to perform operations described herein.
- the other physical resource(s) 1512 can include one or more input and/or output processors (e.g., network interface controller or wireless radio), one or more modems, one or more codec chipsets, one or more pipeline processors, one or more fast Fourier transform (“FFT”) processors, one or more digital signal processors (“DSPs”), one or more speech synthesizers, and/or the like.
- the physical resources operating within the physical environment 1502 can be virtualized by one or more virtual machine monitors (not shown; also known as “hypervisors”) operating within the virtualization/control layer 1504 to create virtual resources 1514 that reside in the virtual environment 1506 .
- the virtual machine monitors can be or can include software, firmware, and/or hardware that alone or in combination with other software, firmware, and/or hardware, creates and manages virtual resources operating within the virtual environment 1506 .
- the virtual resources 1514 operating within the virtual environment 1506 can include abstractions of at least a portion of the physical compute resources 1508 , the physical memory resources 1510 , and/or the other physical resources 1512 , or any combination thereof.
- the abstractions can include one or more virtual machines upon which one or more applications can be executed.
Description
- Distributed systems often require participants to agree on some data value that is needed during computation. Consensus algorithms (e.g., Paxos, ZAB, and Raft) help the participants reach consensus, even in the face of failures. These consensus mechanisms tend to incur high overheads in terms of latency since they involve multiple rounds of communication. This is especially true when strong consistency guarantees are desired. Even without failure, consensus requires at least the round-trip time between servers running consensus algorithms.
- Raft is a consensus algorithm designed as an alternative to Paxos. Raft was designed to be more understandable than Paxos, and Raft has been formally proven safe. Raft also provides a better foundation for building practical systems. To enhance understandability, Raft separates the main consensus components into the following sub-problems: 1) Leader election: a new leader is elected when the current leader fails; 2) Log replication: the leader accepts log entries from clients and replicates them, forcing other logs to be consistent with its own log; and 3) Log commitment: a few restrictions are enforced to ensure safe log commitment; that is, if any member has applied a particular command to its state machine, then no other member may apply a different command for the same entry. Raft starts by electing a strong leader, and then gives the leader full responsibility for managing the replicated log. The leader accepts log entries from clients (i.e., end devices making requests), and replicates the log entries to other servers. When it is safe to apply log entries to the state machines, the leader notifies the servers to apply the log entries to their local state machines.
- P4 is a language to program data-plane behavior of network devices. P4 can be used to support customized functionality (e.g., the evolving OpenFlow standard), specific datacenter packet processing logic, etc. The P4 language composes an abstract forwarding model that uses a chain of tables for packet processing. The tables match pre-defined packet fields, and perform a sequence of actions. Then, a P4 compiler takes charge of mapping the abstract forwarding model to a concrete implementation on a particular target platform (e.g., software switches, field-programmable gate arrays (“FPGAs”), and the like).
- Concepts and technologies disclosed herein are directed to a network-assisted Raft consensus protocol, referred to herein as “NetRaft.” According to one aspect of the concepts and technologies disclosed herein, a system can include a plurality of servers operating in a server cluster, and a plurality of P4 switches corresponding to the plurality of servers. Each server of the plurality of servers can include a back-end that executes a complete Raft algorithm to perform leader election, log replication, and log commitment of a Raft consensus algorithm. Each P4 switch of the plurality of P4 switches can include a front-end that executes a partial Raft algorithm to perform the log replication and the log commitment of the Raft consensus algorithm. The back-end can maintain a complete state for responding to requests that cannot be fulfilled by the front-end. The requests can include read requests and/or write requests.
- According to another aspect of the concepts and technologies disclosed herein, a first server operating in a server cluster can receive, from a client, a read request message. The first server in this case is not recognized as a leader in the server cluster. A first P4 switch in communication with the first server can receive, from the client, the read request message. The first P4 switch can forward the read request message to a second server that is recognized as the leader in the server cluster. A second P4 switch in communication with the second server can receive the read request message and reply to it immediately without involving the second server.
- According to another aspect of the concepts and technologies disclosed herein, a first server operating in a server cluster can receive, from a client, a write request message. The first server in this case is recognized as a leader in the server cluster. A first P4 switch in communication with the first server can receive, from the client, the write request message. The first P4 switch can handle the write request message. The first P4 switch can notify, without involving the first server, a second server and a third server of the cluster of write request results resulting from the first P4 switch handling the write request message.
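The read path described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the disclosed implementation: the class `FrontEnd`, its fields, and the dictionary used as a cache are hypothetical names introduced here, and a real P4 switch would express the same decision with match-action tables rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FrontEnd:
    """Hypothetical model of a Raft-aware switch front-end (illustrative names)."""
    name: str
    is_leader_switch: bool = False
    leader: Optional["FrontEnd"] = None  # front-end attached to the current leader
    cache: Dict[str, str] = field(default_factory=dict)  # cached state-machine values

    def handle_read(self, key: str, hops: Optional[List[str]] = None) -> Optional[str]:
        hops = [] if hops is None else hops
        hops.append(self.name)  # record which switches touched the request
        if self.is_leader_switch:
            # Reply from the switch itself, without involving the leader server.
            # (A miss returns None here; a real system would ask the back-end.)
            return self.cache.get(key)
        if self.leader is None:
            raise RuntimeError("leader unknown; the request would fall back to the server")
        # Raft-aware forwarding: no redirect round trip through the client.
        return self.leader.handle_read(key, hops)


switch1 = FrontEnd("SWITCH 1", is_leader_switch=True, cache={"x": "42"})
switch2 = FrontEnd("SWITCH 2", leader=switch1)
path: List[str] = []
value = switch2.handle_read("x", path)
print(value, path)  # prints: 42 ['SWITCH 2', 'SWITCH 1']
```

The point of the sketch is the hop count: the client sends one request, and the redirect that legacy Raft would return to the client is replaced by a switch-to-switch forward.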
- It should be appreciated that the above-described subject matter may be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable storage medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
- FIG. 1 is a block diagram illustrating a legacy Raft algorithm overview of a legacy Raft consensus algorithm.
- FIG. 2 is a block diagram illustrating an overview of Raft terms.
- FIG. 3 is a block diagram illustrating aspects of a NetRaft system architecture, according to an illustrative embodiment.
- FIG. 4A is a block diagram illustrating aspects of a legacy Raft read operation.
- FIG. 4B is a block diagram illustrating aspects of a NetRaft read operation.
- FIG. 5A is a block diagram illustrating aspects of a legacy Raft write operation.
- FIG. 5B is a block diagram illustrating aspects of a NetRaft write operation.
- FIG. 6 is a flow diagram illustrating aspects of a method for operating a Raft-aware P4 switch, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 7 is a flow diagram illustrating aspects of a method for server selection, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 8 is a flow diagram illustrating aspects of a method for executing a NetRaft read operation, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 9 is a flow diagram illustrating aspects of a method for executing a NetRaft write operation, according to an illustrative embodiment of the concepts and technologies disclosed herein.
- FIG. 10 is a block diagram illustrating an example experimental setup for a NetRaft implementation, according to an illustrative embodiment.
- FIG. 11 is a block diagram illustrating a software-defined networking (“SDN”) network capable of implementing aspects of the embodiments disclosed herein.
- FIG. 12 is a block diagram illustrating an example mobile device capable of implementing aspects of the embodiments disclosed herein.
- FIG. 13 is a block diagram illustrating an example computer system capable of implementing aspects of the embodiments presented herein.
- FIG. 14 is a diagram illustrating a network, according to an illustrative embodiment.
- FIG. 15 is a block diagram illustrating aspects of an illustrative cloud environment capable of implementing aspects of the embodiments presented herein.
- While the subject matter described herein may be presented, at times, in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, computer-executable instructions, and/or other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer systems, including hand-held devices, mobile devices, wireless devices, multiprocessor systems, distributed computing systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, routers, switches, other computing devices described herein, and the like.
- Distributed systems often require participants to agree on some data value that is needed during computation. Consensus algorithms (e.g., Paxos, ZAB, and Raft) help the participants reach consensus, even in the face of failures. These consensus mechanisms tend to incur high overheads in terms of latency since they involve multiple rounds of communication. This is especially true when strong consistency guarantees are desired. Even without failure, consensus requires at least the round-trip time between servers running consensus algorithms. Thus, offloading the application-level implementation of a consensus algorithm to the network offers the potential to reduce the consensus latency.
- Several recent projects investigate the offloading of consensus algorithms to the network. NetPaxos proposes implementing Paxos in the network by utilizing OpenFlow switches. NetPaxos can also be implemented using P4, a domain-specific language for programming the packet-forwarding data plane. Other efforts have been made to implement the entire ZAB consensus algorithm on FPGA devices using a low-level language. This hardware-based solution, however, might not be scalable as it requires the storage of potentially large amounts of consensus states, logic, and even the application data.
- In contrast, the concepts and technologies disclosed herein propose a network-assisted Raft consensus algorithm that takes advantage of programmable P4 switches and offloads certain Raft functionality to the network. The proposed algorithm is referred to herein as “NetRaft.” The concepts and technologies disclosed herein focus on Raft since it has been formally proven to be safe and is more understandable than Paxos. Moreover, Raft has been used in the implementation of popular software-defined networking (“SDN”) controllers, such as OpenDaylight.
- NetRaft effectively reduces consensus latency, is failure-aware, and does not sacrifice correctness or scalability. To enable Raft-aware forwarding and quick response, NetRaft uses P4-based programmable switches and offloads partial Raft functionality to the switch. The concepts and technologies disclosed herein demonstrate the efficacy of this approach and the performance improvements NetRaft offers via a prototype implementation.
- Raft is a consensus algorithm designed as an alternative to Paxos. The designers of Raft intended the algorithm to be more understandable than Paxos, and Raft has been formally proven safe. Raft also provides a better foundation for building practical systems. To enhance understandability, Raft separates the main consensus components into the following sub-problems: 1) Leader election: a new leader is elected when the current leader fails; 2) Log replication: the leader accepts log entries from clients and replicates the log entries, forcing other logs to be consistent with the leader's log; and 3) Log commitment: a few restrictions are enforced to ensure safe log commitment; that is, if any member has applied a particular command to its state machine, then no other member may apply a different command for the same entry. Raft starts by electing a strong leader, and then gives the leader full responsibility for managing the replicated log. The leader accepts log entries from clients (e.g., end devices making requests), and replicates the log entries to other servers. When it is safe to apply log entries to the state machines, the leader notifies the other servers to apply the log entries to their respective local state machines.
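As a rough illustration of the log replication and log commitment sub-problems, the following Python sketch has a leader append a client command and treat it as committed once a majority of the cluster stores it. The majority condition is the standard Raft commit rule rather than something spelled out in this passage, and the names `Leader`, `replicate`, and `ack` are hypothetical. Networking, term checks, and retries are deliberately omitted.

```python
from typing import Dict, List


class Leader:
    """Toy leader that tracks how much of its log each follower stores."""

    def __init__(self, followers: List[str]):
        self.log: List[str] = []
        # Number of log entries each follower is known to store.
        self.match_index: Dict[str, int] = {f: 0 for f in followers}
        self.commit_index = 0  # number of entries that are safe to apply

    def replicate(self, command: str) -> int:
        """Accept a client command and return its 1-based log index."""
        self.log.append(command)
        return len(self.log)

    def ack(self, follower: str, index: int) -> None:
        """Record that `follower` stores the log up to `index`, then try to commit."""
        self.match_index[follower] = max(self.match_index[follower], index)
        cluster_size = len(self.match_index) + 1  # followers plus the leader
        for n in range(len(self.log), self.commit_index, -1):
            stored = 1 + sum(1 for m in self.match_index.values() if m >= n)
            if stored * 2 > cluster_size:  # strict majority stores entry n
                self.commit_index = n
                break


leader = Leader(["s2", "s3", "s4", "s5"])
idx = leader.replicate("set x=1")
leader.ack("s2", idx)          # 2 of 5 servers: not yet committed
after_one_ack = leader.commit_index
leader.ack("s3", idx)          # 3 of 5 servers: committed
after_two_acks = leader.commit_index
```

In a five-server cluster the entry becomes safe to apply after two follower acknowledgments, which is why such a cluster keeps making progress with up to two failed servers.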
- Turning now to FIG. 1, a block diagram illustrating a legacy Raft algorithm overview 100 of a legacy Raft consensus algorithm will be described. Raft server clusters typically contain an odd number of members (e.g., five servers, which can tolerate two failures). Each server in a Raft server cluster can be in one of three states: a follower state 102 (“follower(s) 102”), a candidate state 104 (“candidate(s) 104”), or a leader state 106 (“leader 106”). Typically, a Raft server cluster has one server operating as the leader 106 and the other servers operating as the followers 102. The followers 102 can passively receive remote procedure calls (“RPCs”) from the leader 106 or the candidate(s) 104. The candidate(s) 104 can initiate an election and become the leader 106 after receiving votes from a majority of the servers in the Raft server cluster. The leader 106 responds to requests received from clients and replicates corresponding log entries to the follower(s) 102. If a client sends a request to one of the followers 102 instead of the leader 106, the follower 102 can redirect the request to the leader 106.
- Turning now to FIG. 2, a block diagram illustrating a Raft terms overview 200 will be described. The Raft terms overview 200 illustrates time 202 that is divided into four terms 204, 206, 208, 210 of arbitrary length. The terms 204, 206, 208, 210 are monotonically increasing integers, where each term 204, 206, 208, 210 begins with an election. If a given candidate 104 wins an election (i.e., a “successful election 212, 212′, 212″”), the candidate 104 will serve as the leader 106 for the rest of the corresponding term. For example, terms 204 (“T1”), 208 (“T3”), 210 (“T4”) each begin with a successful election 212, 212′, 212″, and continue thereafter with normal Raft operations 214, 214′, 214″. Term 206 (“T2”), however, leads to split votes 216, resulting in no successful election 212 and no normal Raft operations 214.
- Terms 204, 206, 208, 210 allow Raft servers to detect obsolete information, such as information stored by stale leaders. Current terms are exchanged whenever servers communicate using RPCs. When a leader 106 or a candidate 104 learns that its current term is out of date (i.e., there exists a higher term number among the server cluster), the leader 106 or the candidate 104 immediately reverts to the follower state 102. If a server receives a request with a stale term number (e.g., a vote request from a candidate 104 or a request from the leader 106 to replicate a log entry), the server will reject the request.
- The concept of electing the leader 106 will now be described with reference to FIGS. 1 and 2. The leader 106 of a Raft server cluster periodically sends heartbeats to the followers 102. All other servers remain in the follower state 102 as long as they are receiving heartbeats from the (current) leader 106. If a given follower 102 does not receive a heartbeat message during a predefined period of time (referred to herein as an “election timeout”), the follower 102 assumes that there is no leader 106 and starts a new election (108). To start a new election, the follower 102 that encountered the election timeout increments its current term, votes for itself, and transitions to the candidate state 104. The newly transitioned candidate 104 then sends RequestVote RPCs to all other servers in the server cluster (110).
- The candidate 104 wins the election if it receives votes from a majority of the servers in the server cluster for its term (112). Then, the candidate 104 transitions to the leader state 106 and sends heartbeats to all other servers in the server cluster to prevent new elections and to establish its authority for its term.
- While waiting for votes, the candidate 104 might receive a heartbeat message from another server claiming to be the leader 106. If the received term number is at least as large as the current term of the candidate 104, then the candidate 104 will surrender its candidacy and transition back to the follower state 102 (114).
- If none of the candidates 104 receives a majority vote, one of the candidates 104 will time out due to not receiving heartbeat messages from any leader 106. That particular candidate 104 then will start a new election (110). Raft uses randomized timeouts to ensure that split votes 216 (e.g., as shown at T2 206 in FIG. 2) are a rare event. If one of the candidates 104 discovers the current leader 106 or a new term, that candidate reverts back to the follower state 102 (118).
- To ensure safe log commitment, Raft enforces restrictions on elected leaders to guarantee that all committed entries from previous terms are present on the new leader. During the election process, a candidate must receive a majority vote from the server cluster. A server in the cluster will vote for a candidate only if the candidate has a higher term and a log that is at least as up-to-date as the server's own log. Otherwise, the server rejects the vote request. Therefore, receiving a majority vote means that the log of the new leader contains all committed entries.
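The election rules above (term increment, self-vote, rejection of stale terms, and the log up-to-date check) can be sketched in Python as follows. This is a simplified, single-process illustration: the `Server` class and `run_election` helper are hypothetical names, timeouts and RPC transport are omitted, and the vote rule follows the standard Raft convention of one vote per term with the last log term compared first and the log length second.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    server_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    last_log_term: int = 0
    last_log_index: int = 0
    state: str = "follower"

    def observe_term(self, term: int) -> None:
        """Adopt a newer term and revert to follower when one is seen."""
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.state = "follower"

    def start_election(self) -> None:
        """Election timeout fired: increment term, vote for self, become candidate."""
        self.current_term += 1
        self.voted_for = self.server_id
        self.state = "candidate"

    def handle_request_vote(self, term: int, candidate_id: int,
                            last_log_term: int, last_log_index: int) -> bool:
        if term < self.current_term:
            return False  # stale term: reject the request
        self.observe_term(term)
        # Candidate's log must be at least as up-to-date as this server's log.
        log_ok = (last_log_term, last_log_index) >= (self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Server, others: List[Server]) -> bool:
    """Return True if `candidate` collects votes from a majority of the cluster."""
    candidate.start_election()
    votes = 1  # the candidate's own vote
    for s in others:
        if s.handle_request_vote(candidate.current_term, candidate.server_id,
                                 candidate.last_log_term, candidate.last_log_index):
            votes += 1
    if votes * 2 > len(others) + 1:
        candidate.state = "leader"
    return candidate.state == "leader"


won = run_election(Server(1), [Server(2), Server(3)])
behind = Server(1, last_log_term=1)
lost = run_election(behind, [Server(2, last_log_term=2, last_log_index=5),
                             Server(3, last_log_term=2, last_log_index=5)])
```

In the second call the candidate's log is older than both peers' logs, so both reject the vote request and the candidate stays in the candidate state, which illustrates why a majority vote implies the new leader holds all committed entries.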
- Turning now to FIG. 3, a block diagram illustrating aspects of a NetRaft system architecture 300 will be described, according to an illustrative embodiment. The NetRaft system architecture 300 includes a network 302 with which a plurality of P4 switches 304 corresponding to a plurality of servers 306 are in communication. In particular, P4 SWITCH 1 304A is shown in communication with SERVER 1 306A; P4 SWITCH 2 304B is shown in communication with SERVER 2 306B; and P4 SWITCH 3 304C is shown in communication with SERVER 3 306C. Although only three P4 switches 304 and three servers 306 are shown, those skilled in the art will appreciate that implementations of different complexity (i.e., a greater or lesser number of P4 switches 304 and/or servers 306) are possible. The NetRaft system architecture 300 also shows a client 308 operating in communication with the P4 SWITCH 2 304B.
- The network 302 can be or can include any packet network capable of exchanging data packets (e.g., among the client 308, the P4 switches 304, and the servers 306). Additional details regarding the network 302 are provided herein with reference to FIG. 14.
- The P4 switches 304 utilize the P4 language to control data-plane behavior. The P4 language can be used to support customized functionality (e.g., the evolving OpenFlow standard), specific data-center packet processing logic, and the like. The P4 language composes an abstract forwarding model that uses a chain of tables for packet processing. The tables match pre-defined packet fields, and perform a sequence of actions. A P4 compiler then takes charge of applying the abstract forwarding model to create a concrete implementation on a particular target platform (e.g., the P4 switches 304).
- There are five major components in a P4 program: 1) control blocks that specify a way of composing tables; 2) tables that specify packet processing logic, a high-level behavior representation about field matching and corresponding actions; 3) customized packet header fields that are a collection of packet bytes; 4) a packet header parser that describes a way of transforming incoming packets to field matching instances; and 5) actions that forward or drop packets, modify fields, perform stateful memory operations, and encapsulate or decapsulate headers.
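To make the relationship among these components concrete, the following Python sketch models a parser, a match table, actions, and a control block. It is an analogy written in Python rather than P4 code, and everything specific in it is an assumption made for illustration: the five-byte header layout, the message type constants, and the choice of which type goes to which action.

```python
import struct
from typing import Callable, Dict

# Hypothetical header layout (not from the source): 1-byte message type, 4-byte term.
HEADER = struct.Struct("!BI")
APPEND_ENTRIES = 1   # illustrative type code
OTHER_RAFT_MSG = 2   # illustrative type code for a message left to the server


def parse(packet: bytes) -> Dict[str, int]:
    """Parser: turn raw packet bytes into named header fields."""
    msg_type, term = HEADER.unpack_from(packet)
    return {"msg_type": msg_type, "term": term}


# Actions: what to do with a packet once a table entry matches.
def handle_in_switch(fields: Dict[str, int]) -> str:
    return f"handled in switch (term {fields['term']})"


def forward_to_server(fields: Dict[str, int]) -> str:
    return "forwarded to server"


def drop(fields: Dict[str, int]) -> str:
    return "dropped"


# Table: match on a header field and select an action.
TABLE: Dict[int, Callable[[Dict[str, int]], str]] = {
    APPEND_ENTRIES: handle_in_switch,
    OTHER_RAFT_MSG: forward_to_server,
}


def control(packet: bytes) -> str:
    """Control block: parse, look up the table, run the matched action (default: drop)."""
    fields = parse(packet)
    return TABLE.get(fields["msg_type"], drop)(fields)


result_append = control(HEADER.pack(APPEND_ENTRIES, 7))
result_unknown = control(HEADER.pack(9, 0))
```

A real P4 program would declare the header type, parser states, tables, and actions in P4 syntax and leave table population to the control plane; the sketch only mirrors the division of responsibilities.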
- NetRaft uses registers to keep track of Raft state, such as logs and state machines. Registers provide persistent state that can be organized as an array of cells. When declaring a register for Raft state, NetRaft specifies the size of each cell and the number of cells in the array.
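A register of this kind can be pictured as a fixed-length array of fixed-width cells. The Python class below is a minimal stand-in under that assumption; the name `Register`, the truncation-on-write behavior, and the example sizes are illustrative choices, not details taken from the disclosure or from any particular P4 target.

```python
class Register:
    """Fixed-size array of fixed-width cells, loosely modeled on a P4 register."""

    def __init__(self, cell_bits: int, num_cells: int):
        self.mask = (1 << cell_bits) - 1  # largest value a cell can hold
        self.cells = [0] * num_cells      # all cells start at zero

    def write(self, index: int, value: int) -> None:
        # Values wider than the declared cell size are truncated.
        self.cells[index] = value & self.mask

    def read(self, index: int) -> int:
        return self.cells[index]


# Hypothetical declarations for Raft state: one cell per cached log entry term,
# plus a single-cell register for the commit index.
log_terms = Register(cell_bits=32, num_cells=1024)
commit_index = Register(cell_bits=32, num_cells=1)
log_terms.write(0, 3)
commit_index.write(0, 1)

small = Register(cell_bits=8, num_cells=4)
small.write(0, 300)  # 300 does not fit in 8 bits; 300 & 0xFF == 44
```

Because both dimensions are fixed at declaration time, the amount of Raft state a switch can hold is bounded, which is one reason only part of the log and state machine is kept there.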
- A unique feature of NetRaft is that it duplicates only the necessary logic on the P4 switches 304, which act as a cache to reduce consensus latency. Thus, NetRaft minimizes the storage of replicated log entries and state machine data in the P4 switches 304. As will be described in further detail below, in NetRaft, the entire Raft algorithm is still running on the servers 306. This partial offloading architecture helps improve the performance of Raft, especially the consensus latency, without sacrificing scalability.
- The concepts and technologies disclosed herein aim to improve the performance of Raft without sacrificing correctness and scalability by the introduction of NetRaft. As described above with reference to FIG. 1, Raft has three roles: the leader 106, which maintains consensus in a centralized way; the follower(s) 102, which passively respond(s) to Raft RPCs; and the candidate(s) 104, which is/are transitioned from a follower 102 during leader election when the original leader fails. The basic version of legacy Raft has only two RPCs: RequestVote, issued by a candidate during an election, and AppendEntries, issued by the leader to send heartbeats or log entries. Thus, legacy Raft has only four message types (two RPCs and two responses) compared to ten types in ZAB. NetRaft offloads the processing of AppendEntries messages and the responses of RequestVote to the P4 switches 304.
- There are three fundamental requirements for a successful implementation of NetRaft. First, the implementation should guarantee the correctness of the Raft algorithm when offloading its processing logic to the P4 switches 304. Second, the Raft logic on the P4 switches 304 should be able to respond to most requests directly for improved performance. Third, the Raft logic on the P4 switches 304 should safely discard obsolete log entries and state machine data for scalability. As mentioned above, in the basic Raft consensus algorithm, there are three major elements: leader election, log replication, and log commitment.
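The second and third requirements can be pictured as a small cache in front of a complete store. The Python sketch below is one possible reading under stated assumptions: `BackEnd` and `FrontEndCache` are hypothetical names, the capacity limit and oldest-first trimming are illustrative policies, and the rule that an item is discarded only after the back-end holds the same value is a simplification of the synchronization check this disclosure describes for the front-end.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple


class BackEnd:
    """Stands in for the server running complete Raft; holds the full state."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.state[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.state.get(key)


class FrontEndCache:
    """Bounded cache that answers directly when it can and defers otherwise."""

    def __init__(self, back_end: BackEnd, capacity: int) -> None:
        self.back_end = back_end
        self.capacity = capacity
        self.cache: "OrderedDict[str, str]" = OrderedDict()

    def write(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)  # most recently written goes last
        self.trim()

    def trim(self) -> None:
        # Discard oldest items first, but only items the back-end already holds.
        for key in list(self.cache):
            if len(self.cache) <= self.capacity:
                break
            if self.back_end.read(key) == self.cache[key]:
                del self.cache[key]

    def read(self, key: str) -> Tuple[Optional[str], str]:
        if key in self.cache:
            return self.cache[key], "front-end"
        return self.back_end.read(key), "back-end"


be = BackEnd()
fe = FrontEndCache(be, capacity=1)
fe.write("a", "1")
fe.write("b", "2")            # over capacity, but nothing is synced yet
size_before_sync = len(fe.cache)
be.apply("a", "1")            # back-end catches up on "a"
fe.trim()                     # now "a" can be discarded safely
```

After the trim, a read of "a" is served by the back-end and a read of "b" is served by the front-end, so no request loses access to state even though the switch holds only part of it.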
- To satisfy the above requirements, the illustrated
NetRaft system architecture 300 includes two components: a front-end 310 implemented in the P4 switch 304 (in the illustrated embodiment, the P4 SWITCH 3 304C) executing a partial Raft algorithm 312 to perform log replication 314 and log commitment 316 elements for NetRaft, and a back-end 318 in the server 306 (in the illustrated embodiment, the SERVER 3 306C) running a complete Raft algorithm 320. Log replication 314, 314′ and log commitment 316, 316′ elements are duplicated at the front-end 310 and the back-end 318 to improve performance and scalability. The front-end 310 enhances Raft in two aspects. In particular, the front-end 310 is able to perform Raft-aware forwarding, and can quickly respond to Raft requests by rewriting the incoming packets. A job of the back-end 318 is to execute the complete Raft algorithm 320 to perform leader election 322 and to maintain complete states 324 on the server 306 for responding to certain requests that might not be fulfilled by the front-end 310.
- The P4 switch, via the front-end 310, parses Raft request messages and caches Raft states using P4's primitive actions. Upon receiving a request, the front-end 310 parses the request message and rewrites the request message to construct a corresponding response message. The front-end 310 also forwards the original packet to the back-end 318 for a liveness check. The back-end 318 sends a response message to the P4 switch 304, but the front-end 310 does not forward the response message, and instead, only extracts any necessary flow control information therefrom. For certain requests, the front-end 310 might not be able to generate a response due to the limited information available on the P4 switch 304; such a request will be served normally by the back-end 318. For example, when a new server joins the server cluster, the new server attempts to fetch all the logs. These logs might not all be available at the front-end 310, and in such instances, the back-end 318 would serve the request.
- The front-end 310 can forward certain Raft messages without involving the back-end 318. In Raft, requests from the client 308 can only be handled by the leader 106. In the bootstrap phase, the client 308 randomly picks a server in the cluster with which to communicate. If the selected server is not the leader 106, the selected server notifies the client 308 of the leader's IP address (if known). The client 308 then issues a new request to the leader 106. In NetRaft, since the front-end 310 is aware of Raft (via implementation of the partial Raft algorithm 312), the front-end 310 of the selected server can forward the request to the leader 106 directly and reduce the communication overhead, as will be described in greater detail below with reference to FIGS. 4A-4B and 5A-5B.
- The front-end 310 can discard obsolete information because the back-end 318 always keeps the necessary information. However, the mechanism for discarding a state machine is different from that for discarding obsolete log entries because the front-end 310 needs to know whether a requested item is already in the front-end 310 state machine or the back-end state machine. Thus, before discarding a state machine cached in the front-end 310, the front-end 310 ensures that the back-end 318 is in sync.
- Turning now to
FIGS. 4A-4B, block diagrams illustrating a comparison between a read operation as performed by legacy Raft (400) and a read operation as performed by NetRaft (406) will be described. In both cases, the client 308 sends (402) a read request to the SERVER 2 306B, which is not the leader 106. In legacy Raft (400), the SERVER 2 306B will notify the client 308 that the SERVER 1 306A is the leader 106 and then the client 308 re-sends (404) the read request to the SERVER 1 306A. In NetRaft (406), a Raft-aware switch, SWITCH 2 304B, connected to the SERVER 2 306B receives the read request and then forwards it to the leader 106 (SERVER 1 306A) directly (408). Since SWITCH 1 304A is Raft-aware (via implementation of the partial Raft algorithm 312) and is connected to the SERVER 1 306A operating in the leader state 106 and having the latest information, the SWITCH 1 304A can reply to the client's request immediately without involving the SERVER 1 306A.
- Turning now to
FIGS. 5A-5B, block diagrams illustrating a comparison between a write operation as performed by legacy Raft (500) and a write operation as performed by NetRaft (508) will be described. In both cases, the client 308 sends a write request (502) to the leader 106 (SERVER 1 306A). In legacy Raft (500), the SERVER 1 306A will then notify the SERVER 2 306B and the SERVER 3 306C of the write results (504, 506, respectively). In NetRaft, after a Raft-aware switch, SWITCH 1 304A, connected to the SERVER 1 306A receives a write request (510), the SWITCH 1 304A can handle the write request and notify the SERVER 2 306B and the SERVER 3 306C of the results directly without involving the SERVER 1 306A (512, 514, respectively). Moreover, when the SWITCH 2 304B and the SWITCH 3 304C receive the results, these P4 switches can respond immediately.
- Turning now to
FIG. 6, aspects of a method 600 for operating a Raft-aware P4 switch, such as one of the P4 switches 304, will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein. It should be understood that the operations of the methods disclosed herein are not necessarily presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the concepts and technologies disclosed herein.
- It also should be understood that the methods disclosed herein can be ended at any time and need not be performed in their entirety. Some or all operations of the methods, and/or substantially equivalent operations, can be performed by execution of computer-readable instructions included on computer storage media, as defined herein. The term “computer-readable instructions,” and variants thereof, as used herein, is used expansively to include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based, programmable consumer electronics, servers, routers, switches, combinations thereof, and the like.
- Thus, it should be appreciated that the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These states, operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. As used herein, the phrase “cause a processor to perform operations” and variants thereof is used to refer to causing a processor or other processing component(s) disclosed herein to perform operations. It should be understood that the performance of one or more operations may include operations executed by one or more virtual processors at the instructions of one or more of the aforementioned hardware processors.
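- As one illustration of such computer-implemented operations, the front-end behavior described above (parse a Raft request, rewrite it into a response, forward the original to the back-end for a liveness check, and consume rather than forward the back-end's reply) might be sketched as follows. This is a minimal Python model under stated assumptions, not P4 code and not the disclosed implementation: the class names, the message fields, and the `flow_control` field are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RaftRequest:
    kind: str                    # hypothetical message kinds: "read" or "write"
    key: str
    value: Optional[str] = None


@dataclass
class RaftResponse:
    ok: bool
    value: Optional[str] = None
    flow_control: int = 0        # hypothetical flow-control field


class BackEnd:
    """Stand-in for the server side, which keeps the complete state."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def handle(self, req: RaftRequest) -> RaftResponse:
        if req.kind == "write":
            self.state[req.key] = req.value
            return RaftResponse(True, req.value, len(self.state))
        return RaftResponse(req.key in self.state,
                            self.state.get(req.key), len(self.state))


class FrontEnd:
    """Stand-in for the switch side, which caches only part of the state."""

    def __init__(self, back_end: BackEnd) -> None:
        self.back_end = back_end
        self.cache: Dict[str, str] = {}
        self.window = 0          # last flow-control value seen from the back-end

    def on_request(self, req: RaftRequest) -> RaftResponse:
        # Parse the request and rewrite it into a front-end-generated response.
        reply: Optional[RaftResponse] = None
        if req.kind == "write":
            self.cache[req.key] = req.value
            reply = RaftResponse(True, req.value)
        elif req.key in self.cache:
            reply = RaftResponse(True, self.cache[req.key])

        # Forward the original request to the back-end (liveness check).
        back_reply = self.back_end.handle(req)

        # Do not forward the back-end reply; only extract flow-control data.
        self.window = back_reply.flow_control

        # If the front-end lacked the information, the back-end serves it.
        return reply if reply is not None else back_reply
```

In this sketch a read for a key that only the back-end holds falls through to the back-end's reply, mirroring the case in which the switch has too little information to answer on its own.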
- The
method 600 begins and proceeds to operation 602, where the P4 switch 304 receives a Raft request message. From operation 602, the method 600 proceeds to operation 604, where the P4 switch 304 parses the Raft request message. From operation 604, the method 600 proceeds to operation 606, where the P4 switch 304 rewrites the Raft request message to construct a corresponding front-end-generated Raft response message. From operation 606, the method 600 proceeds to operation 608, where the P4 switch 304 forwards the Raft request message to the back-end 318 for a liveness check. From operation 608, the method 600 proceeds to operation 610, where the P4 switch 304 receives, from the back-end 318, a back-end-generated Raft response message. From operation 610, the method 600 proceeds to operation 612, where the P4 switch 304 does not forward the back-end-generated Raft response message, and instead, extracts flow control information therefrom. From operation 612, the method 600 proceeds to operation 614, where the method 600 ends.
- Turning now to
FIG. 7, a flow diagram illustrating aspects of a method 700 for server selection will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein. The method 700 begins and proceeds to operation 702, where the client 308 generates a request message. From operation 702, the method 700 proceeds to operation 704, where, during a bootstrap phase, the client 308 randomly selects one of the servers 306 in a server cluster with which to communicate. From operation 704, the method 700 proceeds to operation 706, where the client 308 issues a new request to the selected server. From operation 706, the method 700 proceeds to operation 708, where it is determined if the selected server is operating in the leader state 106. If the selected server is operating in the leader state 106, the method 700 proceeds to operation 710, where the method 700 ends. If, however, the selected server is not operating in the leader state 106, the method 700 proceeds to operation 712, where the front-end 310 of the P4 switch 304 in communication with the selected server forwards a new request to the leader 106, thereby reducing the communication overhead generated in legacy Raft. From operation 712, the method 700 proceeds to operation 710, where the method 700 ends.
- Turning now to
FIG. 8, a flow diagram illustrating aspects of a method 800 for executing a NetRaft read operation will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein. The method 800 will be described with reference to FIG. 8 and additional reference to FIG. 4B. The method 800 begins and proceeds to operation 802, where the client 308 sends a read request message to the SERVER 2 306B, which is not the current leader 106 in the server cluster. From operation 802, the method 800 proceeds to operation 804, where the P4 SWITCH 2 304B receives the read request message from the SERVER 2 306B. From operation 804, the method 800 proceeds to operation 806, where the P4 SWITCH 2 304B forwards the read request message to the current leader (the SERVER 1 306A in FIG. 4B). From operation 806, the method 800 proceeds to operation 808, where the P4 SWITCH 1 304A, which is connected to the SERVER 1 306A, has the latest information, and replies to the read request message immediately without involving the SERVER 1 306A. From operation 808, the method 800 proceeds to operation 810, where the method 800 ends.
- Turning now to
FIG. 9, a flow diagram illustrating aspects of a method 900 for executing a NetRaft write operation will be described, according to an illustrative embodiment of the concepts and technologies disclosed herein. The method 900 will be described with reference to FIG. 9 and additional reference to FIG. 5B. The method 900 begins and proceeds to operation 902, where the client 308 sends a write request message to the SERVER 1 306A, which is the current leader 106 in the server cluster. From operation 902, the method 900 proceeds to operation 904, where the P4 SWITCH 1 304A receives a write request message. From operation 904, the method 900 proceeds to operation 906, where the P4 SWITCH 1 304A handles the write request message. From operation 906, the method 900 proceeds to operation 908, where the P4 SWITCH 1 304A notifies the SERVER 2 306B and the SERVER 3 306C of the write request results without involving the SERVER 1 306A. From operation 908, the method 900 proceeds to operation 910, where the method 900 ends.
- Turning now to
FIG. 10, a block diagram illustrating an example experimental setup 1000 for a NetRaft implementation will be described, according to an illustrative embodiment. The experimental setup 1000 shows the SERVER 1 306A operating as the leader 106, and the SERVER 2 306B and the SERVER 3 306C operating as followers 102, 102′. The P4 switches 304A-304D are also shown. The client 308 is in communication with the P4 SWITCH 4 304D.
- In the
experimental setup 1000, the interval of RPC calls can be measured (e.g., using LogCabin or similar software) and the timestamps for each RPC call can be recorded by network interface controllers (“NICs”) 1002 of each P4 switch 304. The latencies (in μs) between the leader 106 and the followers 102, 102′ for a heartbeat message and the client's write request are shown in a table 1004. The latency is decomposed into several fine-grained segments. Latency savings from NetRaft over Raft can be observed for both heartbeat messages and write request messages. Moreover, the experimental setup 1000 demonstrates that NetRaft does not add significant memory usage for P4 switches 304 compared to P4 switches 304 performing regular forwarding. It should be noted that the results shown in the table 1004 are from a simulation of one P4 switch 304. Those skilled in the art will appreciate that better performance can be expected when running the front-end 310 disclosed herein for NetRaft on a real hardware P4 switch.
- The table 1004 shows the decomposed latency between a
leader 106 and a follower 102. Column a shows RPC latency at the leader side and the bidirectional latency between SERVER 1 306A and P4 SWITCH 4 304D. Column b shows the bidirectional latency in P4 SWITCH 1 304A. Column c shows the bidirectional latency between P4 SWITCH 1 304A and P4 SWITCH 2 304B. Column d shows the bidirectional latency in P4 SWITCH 2 304B. Column e shows the bidirectional latency between P4 SWITCH 2 304B and SERVER 2 306B and the latency of the follower 102.
- Turning now to
FIG. 11, a block diagram illustrating aspects of an SDN network 1100 for implementing various aspects of the concepts and technologies disclosed herein will be described. The illustrated SDN network 1100 includes an SDN network data plane 1102, an SDN network control plane 1104, and an SDN network application plane 1106.
- The SDN
network data plane 1102 is a network plane responsible for bearing data traffic. The illustrated SDN network data plane 1102 includes SDN elements 1108-1108K. The SDN elements 1108-1108K can be or can include SDN-enabled network elements such as switches, routers, gateways, the like, or any combination thereof. In accordance with the concepts and technologies disclosed herein, the SDN elements 1108-1108K can include the P4 switches 304.
- The SDN
network control plane 1104 is a network plane responsible for controlling elements of the SDN network data plane 1102. The illustrated SDN network control plane 1104 includes SDN controllers 1110-1110M. The SDN controllers 1110-1110M are logically centralized network entities that perform operations, including translating an intent of one or more SDN applications 1112-1112N operating within the SDN network application plane 1106 to rules and action sets that are useable by the SDN elements 1108-1108K operating within the SDN network data plane 1102.
- The rules can include criteria such as, for example, switch port, VLAN ID, VLAN PCP, MAC source address, MAC destination address, Ethernet type, IP source address, IP destination address, IP ToS, IP Protocol, L4 Source Port, and L4 Destination Port. The rules can be matched to one or more actions such as, for example, an action to forward traffic to one or more ports, an action to drop one or more packets, an action to encapsulate one or more packets and forward to a controller, an action to send one or more packets to a normal processing pipeline, and an action to modify one or more fields of one or more packets. Those skilled in the art will appreciate the breadth of possible rule and action sets utilized in a particular implementation to achieve desired results. As such, the aforementioned examples should not be construed as being limiting in any way.
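- The pairing of rules with actions described above can be sketched as a small match-action table. The following Python model is illustrative only: the field names (`ip_dst`, `l4_dst_port`, `vlan_id`), the action labels, the first-match policy, and the example values are assumptions made for this sketch and are not taken from any particular switch or controller API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Packet = Dict[str, object]


@dataclass
class Rule:
    # Header field -> required value; fields not listed act as wildcards.
    match: Dict[str, object]
    # One of the hypothetical labels: "forward", "drop", "to_controller", "modify".
    action: str
    params: Dict[str, object] = field(default_factory=dict)


def matches(rule: Rule, pkt: Packet) -> bool:
    """A packet matches when every listed criterion equals the packet's field."""
    return all(pkt.get(name) == value for name, value in rule.match.items())


def apply_rules(rules: List[Rule], pkt: Packet) -> Tuple[str, Optional[object]]:
    """Return (action, result) for the first matching rule (assumed policy)."""
    for rule in rules:
        if not matches(rule, pkt):
            continue
        if rule.action == "forward":
            return "forward", rule.params["port"]
        if rule.action == "drop":
            return "drop", None
        if rule.action == "modify":
            # Return a copy of the packet with the listed fields rewritten.
            return "modify", {**pkt, **rule.params["set"]}
        if rule.action == "to_controller":
            return "to_controller", pkt
    # No rule matched: hand the packet to the normal processing pipeline.
    return "normal", None


# Example table with made-up values.
EXAMPLE_RULES = [
    Rule({"ip_dst": "10.0.0.1", "l4_dst_port": 9000}, "forward", {"port": 3}),
    Rule({"vlan_id": 7}, "drop"),
    Rule({"ip_dst": "10.0.0.2"}, "modify", {"set": {"ip_dst": "10.0.0.1"}}),
]
```

A controller in this model would populate `EXAMPLE_RULES`, and a data-plane element would call `apply_rules` per packet; real devices typically add priorities and multiple tables, which this sketch omits.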
- The illustrated SDN
network application plane 1106 is a network plane responsible for providing the SDN applications 1112-1112N. The SDN applications 1112-1112N are programs that can explicitly, directly, and programmatically communicate network requirements/intents and desired network behavior to the SDN controllers 1110-1110M.
- Turning now to
FIG. 12, an illustrative mobile device 1200 and components thereof will be described. In some embodiments, the client 308 is configured the same as or similar to the mobile device 1200. While connections are not shown between the various components illustrated in FIG. 12, it should be understood that some, none, or all of the components illustrated in FIG. 12 can be configured to interact with one another to carry out various device functions. In some embodiments, the components are arranged so as to communicate via one or more busses (not shown). Thus, it should be understood that FIG. 12 and the following description are intended to provide a general understanding of a suitable environment in which various aspects of embodiments can be implemented, and should not be construed as being limiting in any way.
- As illustrated in
FIG. 12, the mobile device 1200 can include a display 1202 for displaying data. According to various embodiments, the display 1202 can be configured to display various GUI elements, text, images, video, virtual keypads and/or keyboards, messaging data, notification messages, metadata, internet content, device status, time, date, calendar data, device preferences, map and location data, combinations thereof, and/or the like. The mobile device 1200 also can include a processor 1204 and a memory or other data storage device (“memory”) 1206. The processor 1204 can be configured to process data and/or can execute computer-executable instructions stored in the memory 1206. The computer-executable instructions executed by the processor 1204 can include, for example, an operating system 1208, one or more applications 1210, other computer-executable instructions stored in the memory 1206, or the like. In some embodiments, the applications 1210 also can include a user interface (“UI”) application (not illustrated in FIG. 12).
- The UI application can interface with the
operating system 1208 to facilitate user interaction with functionality and/or data stored at the mobile device 1200 and/or stored elsewhere. In some embodiments, the operating system 1208 can include a member of the SYMBIAN OS family of operating systems from SYMBIAN LIMITED, a member of the WINDOWS MOBILE OS and/or WINDOWS PHONE OS families of operating systems from MICROSOFT CORPORATION, a member of the PALM WEBOS family of operating systems from HEWLETT PACKARD CORPORATION, a member of the BLACKBERRY OS family of operating systems from RESEARCH IN MOTION LIMITED, a member of the IOS family of operating systems from APPLE INC., a member of the ANDROID OS family of operating systems from GOOGLE INC., and/or other operating systems. These operating systems are merely illustrative of some contemplated operating systems that may be used in accordance with various embodiments of the concepts and technologies described herein and therefore should not be construed as being limiting in any way.
- The UI application can be executed by the
processor 1204 to aid a user in dialing telephone numbers, entering content, viewing account information, answering/initiating calls, entering/deleting data, entering and setting user IDs and passwords for device access, configuring settings, manipulating address book content and/or settings, multimode interaction, interacting with other applications 1210, and otherwise facilitating user interaction with the operating system 1208, the applications 1210, and/or other types or instances of data 1212 that can be stored at the mobile device 1200. According to various embodiments, the data 1212 can include, for example, telephone dialer applications, presence applications, visual voice mail applications, messaging applications, text-to-speech and speech-to-text applications, add-ons, plug-ins, email applications, music applications, video applications, camera applications, location-based service applications, power conservation applications, game applications, productivity applications, entertainment applications, enterprise applications, combinations thereof, and the like. The applications 1210, the data 1212, and/or portions thereof can be stored in the memory 1206 and/or in a firmware 1214, and can be executed by the processor 1204. The firmware 1214 also can store code for execution during device power up and power down operations. It can be appreciated that the firmware 1214 can be stored in a volatile or non-volatile data storage device including, but not limited to, the memory 1206 and/or a portion thereof.
- The
mobile device 1200 also can include an input/output (“I/O”) interface 1216. The I/O interface 1216 can be configured to support the input/output of data such as location information, user information, organization information, presence status information, user IDs, passwords, and application initiation (start-up) requests. In some embodiments, the I/O interface 1216 can include a hardwire connection such as a USB port, a mini-USB port, a micro-USB port, an audio jack, a PS2 port, an IEEE 1394 (“FIREWIRE”) port, a serial port, a parallel port, an Ethernet (RJ45) port, an RJ10 port, a proprietary port, combinations thereof, or the like. In some embodiments, the mobile device 1200 can be configured to synchronize with another device to transfer content to and/or from the mobile device 1200. In some embodiments, the mobile device 1200 can be configured to receive updates to one or more of the applications 1210 via the I/O interface 1216, though this is not necessarily the case. In some embodiments, the I/O interface 1216 accepts I/O devices such as keyboards, keypads, mice, interface tethers, printers, plotters, external storage, touch/multi-touch screens, touch pads, trackballs, joysticks, microphones, remote control devices, displays, projectors, medical equipment (e.g., stethoscopes, heart monitors, and other health metric monitors), modems, routers, external power sources, docking stations, combinations thereof, and the like. It should be appreciated that the I/O interface 1216 may be used for communications between the mobile device 1200 and a network device or local device.
- The
mobile device 1200 also can include a communications component 1218. The communications component 1218 can be configured to interface with the processor 1204 to facilitate wired and/or wireless communications with one or more networks such as one or more IP access networks and/or one or more circuit access networks. In some embodiments, other networks include networks that utilize non-cellular wireless technologies such as WI-FI or WIMAX. In some embodiments, the communications component 1218 includes a multimode communications subsystem for facilitating communications via the cellular network and one or more other networks.
- The
communications component 1218, in some embodiments, includes one or more transceivers. The one or more transceivers, if included, can be configured to communicate over the same and/or different wireless technology standards with respect to one another. For example, in some embodiments one or more of the transceivers of the communications component 1218 may be configured to communicate using GSM, CDMA ONE, CDMA2000, LTE, and various other 2G, 2.5G, 3G, 4G, 5G, and greater generation technology standards. Moreover, the communications component 1218 may facilitate communications over various channel access methods (which may or may not be used by the aforementioned standards) including, but not limited to, TDMA, FDMA, W-CDMA, OFDM, SDMA, and the like.
- In addition, the
communications component 1218 may facilitate data communications using GPRS, EDGE, the HSPA protocol family including HSDPA, EUL (otherwise termed HSUPA), and HSPA+, and various other current and future wireless data access standards. In the illustrated embodiment, the communications component 1218 can include a first transceiver (“TxRx”) 1220A that can operate in a first communications mode (e.g., GSM). The communications component 1218 also can include an Nth transceiver (“TxRx”) 1220N that can operate in a second communications mode relative to the first transceiver 1220A (e.g., UMTS). While two transceivers 1220A-1220N (hereinafter collectively and/or generically referred to as “transceivers 1220”) are shown in FIG. 12, it should be appreciated that fewer than two, two, and/or more than two transceivers 1220 can be included in the communications component 1218.
- The
communications component 1218 also can include an alternative transceiver (“Alt TxRx”) 1222 for supporting other types and/or standards of communications. According to various contemplated embodiments, the alternative transceiver 1222 can communicate using various communications technologies such as, for example, WI-FI, WIMAX, BLUETOOTH, infrared, infrared data association (“IRDA”), near-field communications (“NFC”), other radio frequency (“RF”) technologies, combinations thereof, and the like.
- In some embodiments, the
communications component 1218 also can facilitate reception from terrestrial radio networks, digital satellite radio networks, internet-based radio service networks, combinations thereof, and the like. The communications component 1218 can process data from a network such as the Internet, an intranet, a broadband network, a WI-FI hotspot, an Internet service provider (“ISP”), a digital subscriber line (“DSL”) provider, a broadband provider, combinations thereof, or the like.
- The
mobile device 1200 also can include one or more sensors 1224. The sensors 1224 can include temperature sensors, light sensors, air quality sensors, movement sensors, orientation sensors, noise sensors, proximity sensors, or the like. As such, it should be understood that the sensors 1224 can include, but are not limited to, accelerometers, magnetometers, gyroscopes, infrared sensors, noise sensors, microphones, combinations thereof, or the like. Additionally, audio capabilities for the mobile device 1200 may be provided by an audio I/O component 1226. The audio I/O component 1226 of the mobile device 1200 can include one or more speakers for the output of audio signals, one or more microphones for the collection and/or input of audio signals, and/or other audio input and/or output devices.
- The illustrated
mobile device 1200 also can include a subscriber identity module (“SIM”) system 1228. The SIM system 1228 can include a universal SIM (“USIM”), a universal integrated circuit card (“UICC”) and/or other identity devices. The SIM system 1228 can include and/or can be connected to or inserted into an interface such as a slot interface 1230. In some embodiments, the slot interface 1230 can be configured to accept insertion of other identity cards or modules for accessing various types of networks. Additionally, or alternatively, the slot interface 1230 can be configured to accept multiple subscriber identity cards. Because other devices and/or modules for identifying users and/or the mobile device 1200 are contemplated, it should be understood that these embodiments are illustrative, and should not be construed as being limiting in any way.
- The
mobile device 1200 also can include an image capture and processing system 1232 (“image system”). The image system 1232 can be configured to capture or otherwise obtain photos, videos, and/or other visual information. As such, the image system 1232 can include cameras, lenses, charge-coupled devices (“CCDs”), combinations thereof, or the like. The mobile device 1200 may also include a video system 1234. The video system 1234 can be configured to capture, process, record, modify, and/or store video content. Photos and videos obtained using the image system 1232 and the video system 1234, respectively, may be added as message content to a multimedia message service (“MMS”) message or an email message, and sent to another mobile device. The video and/or photo content also can be shared with other devices via various types of data transfers via wired and/or wireless communication devices as described herein.
- The
mobile device 1200 also can include one or more location components 1236. The location components 1236 can be configured to send and/or receive signals to determine a geographic location of the mobile device 1200. According to various embodiments, the location components 1236 can send and/or receive signals from GPS devices, assisted GPS (“A-GPS”) devices, WI-FI/WIMAX and/or cellular network triangulation data, combinations thereof, and the like. The location component 1236 also can be configured to communicate with the communications component 1218 to retrieve triangulation data for determining a location of the mobile device 1200. In some embodiments, the location component 1236 can interface with cellular network nodes, telephone lines, satellites, location transmitters and/or beacons, wireless network transmitters and receivers, combinations thereof, and the like. In some embodiments, the location component 1236 can include and/or can communicate with one or more of the sensors 1224 such as a compass, an accelerometer, and/or a gyroscope to determine the orientation of the mobile device 1200. Using the location component 1236, the mobile device 1200 can generate and/or receive data to identify its geographic location, or to transmit data used by other devices to determine the location of the mobile device 1200. The location component 1236 may include multiple components for determining the location and/or orientation of the mobile device 1200.
- The illustrated
mobile device 1200 also can include a power source 1238. The power source 1238 can include one or more batteries, power supplies, power cells, and/or other power subsystems including alternating current (“AC”) and/or direct current (“DC”) power devices. The power source 1238 also can interface with an external power system or charging equipment via a power I/O component 1240. Because the mobile device 1200 can include additional and/or alternative components, the above embodiment should be understood as being illustrative of one possible operating environment for various embodiments of the concepts and technologies described herein. The described embodiment of the mobile device 1200 is illustrative, and should not be construed as being limiting in any way.
-
FIG. 13 is a block diagram illustrating a computer system 1300 configured to provide the functionality in accordance with various embodiments of the concepts and technologies disclosed herein. In some embodiments, the P4 switches 304, the servers 306, and/or the client 308 can be configured, at least in part, like the architecture of the computer system 1300. It should be understood, however, that modification to the architecture may be made to facilitate certain interactions among elements described herein.
- The
computer system 1300 includes a processing unit 1302, a memory 1304, one or more user interface devices 1306, one or more input/output (“I/O”) devices 1308, and one or more network devices 1310, each of which is operatively connected to a system bus 1312. The bus 1312 enables bi-directional communication between the processing unit 1302, the memory 1304, the user interface devices 1306, the I/O devices 1308, and the network devices 1310.
- The
processing unit 1302 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or other type of processor known to those skilled in the art and suitable for controlling the operation of the server computer. Processing units are generally known, and therefore are not described in further detail herein.
- The
memory 1304 communicates with the processing unit 1302 via the system bus 1312. In some embodiments, the memory 1304 is operatively connected to a memory controller (not shown) that enables communication with the processing unit 1302 via the system bus 1312. The illustrated memory 1304 includes an operating system 1314 and one or more program modules 1316. The operating system 1314 can include, but is not limited to, members of the WINDOWS, WINDOWS CE, and/or WINDOWS MOBILE families of operating systems from MICROSOFT CORPORATION, the LINUX family of operating systems, the SYMBIAN family of operating systems from SYMBIAN LIMITED, the BREW family of operating systems from QUALCOMM CORPORATION, the MAC OS, OS X, and/or iOS families of operating systems from APPLE CORPORATION, the FREEBSD family of operating systems, the SOLARIS family of operating systems from ORACLE CORPORATION, other operating systems, and the like.
- The
program modules 1316 may include various software and/or program modules to perform the various operations described herein. The program modules 1316 and/or other programs can be embodied in computer-readable media containing instructions that, when executed by the processing unit 1302, perform various operations such as those described herein. According to embodiments, the program modules 1316 may be embodied in hardware, software, firmware, or any combination thereof. In the illustrated example, the program modules 1316 include a NetRaft algorithm 1320, which can be implemented as the partial Raft algorithm 312 in the front-end 310 of the P4 switch 304, or as the complete Raft algorithm 320 in the back-end 318 of the server 306.
- By way of example, and not limitation, computer-readable media may include any available computer storage media or communication media that can be accessed by the
computer system 1300. Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media. - Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the
computer system 1300. In the claims, the phrase “computer storage medium” and variations thereof does not include waves or signals per se and/or communication media. - The user interface devices 1306 may include one or more devices with which a user accesses the
computer system 1300. The user interface devices 1306 may include, but are not limited to, computers, servers, PDAs, cellular phones, or any suitable computing devices. The I/O devices 1308 enable a user to interface with the program modules 1316. In one embodiment, the I/O devices 1308 are operatively connected to an I/O controller (not shown) that enables communication with the processing unit 1302 via the system bus 1312. The I/O devices 1308 may include one or more input devices, such as, but not limited to, a keyboard, a mouse, or an electronic stylus. Further, the I/O devices 1308 may include one or more output devices, such as, but not limited to, a display screen or a printer. In some embodiments, the I/O devices 1308 can be used as manual controls for operations to be exercised in certain emergency situations. - The
network devices 1310 enable the computer system 1300 to communicate with other networks or remote systems via a network 1318, which can be or can include the network 302. Examples of the network devices 1310 include, but are not limited to, a modem, a radio frequency (“RF”) or infrared (“IR”) transceiver, a telephonic interface, a bridge, a router, or a network card. The network 1318 may include a wireless network such as, but not limited to, a Wireless Local Area Network (“WLAN”), a Wireless Wide Area Network (“WWAN”), a Wireless Personal Area Network (“WPAN”) such as provided via BLUETOOTH technology, or a Wireless Metropolitan Area Network (“WMAN”) such as a WiMAX network or metropolitan cellular network. Alternatively, the network 1318 may be a wired network such as, but not limited to, a Wide Area Network (“WAN”), a wired Personal Area Network (“PAN”), or a wired Metropolitan Area Network (“MAN”). The network 1318 may be any other network described herein. - Turning now to
FIG. 14, details of a network 1400 are illustrated, according to an illustrative embodiment. The network 302 can be or can include at least a portion of the network 1400. The network 1400 includes a cellular network 1402, a packet data network 1404, for example, the Internet, and a circuit switched network 1406, for example, a PSTN. The cellular network 1402 includes various components such as, but not limited to, base transceiver stations (“BTSs”), Node-B's or e-Node-B's, base station controllers (“BSCs”), radio network controllers (“RNCs”), mobile switching centers (“MSCs”), mobile management entities (“MMEs”), short message service centers (“SMSCs”), multimedia messaging service centers (“MMSCs”), home location registers (“HLRs”), home subscriber servers (“HSSs”), visitor location registers (“VLRs”), charging platforms, billing platforms, voicemail platforms, GPRS core network components, location service nodes, an IP Multimedia Subsystem (“IMS”), and the like. The cellular network 1402 also includes radios and nodes for receiving and transmitting voice, data, and combinations thereof to and from radio transceivers, networks, the packet data network 1404, and the circuit switched network 1406. - A
mobile communications device 1408, such as, for example, the client 308, a cellular telephone, a user equipment, a mobile terminal, a PDA, a laptop computer, a handheld computer, and combinations thereof, can be operatively connected to the cellular network 1402. The cellular network 1402 can be configured as a 2G GSM network and can provide data communications via GPRS and/or EDGE. Additionally, or alternatively, the cellular network 1402 can be configured as a 3G UMTS network and can provide data communications via the HSPA protocol family, for example, HSDPA, EUL (also referred to as HSUPA), and HSPA+. The cellular network 1402 also is compatible with 4G mobile communications standards such as LTE, or the like, as well as evolved and future mobile standards. - The
packet data network 1404 includes various devices, for example, servers, computers, databases, and other devices in communication with one another, as is generally known. The packet data network 1404 devices are accessible via one or more network links. The servers often store various files that are provided to a requesting device such as, for example, a computer, a terminal, a smartphone, or the like. Typically, the requesting device includes software (a “browser”) for rendering a web page in a format readable by the browser or other software. Other files and/or data may be accessible via “links” in the retrieved files, as is generally known. In some embodiments, the packet data network 1404 includes or is in communication with the Internet. The circuit switched network 1406 includes various hardware and software for providing circuit switched communications. The circuit switched network 1406 may include, or may be, what is often referred to as a POTS. The functionality of the circuit switched network 1406 or other circuit-switched networks is generally known and will not be described herein in detail. - The illustrated
cellular network 1402 is shown in communication with the packet data network 1404 and a circuit switched network 1406, though it should be appreciated that this is not necessarily the case. One or more Internet-capable devices 1410, for example, the client 308, a PC, a laptop, a portable device, or another suitable device, can communicate with one or more cellular networks 1402, and devices connected thereto, through the packet data network 1404. It also should be appreciated that the Internet-capable device 1410 can communicate with the packet data network 1404 through the circuit switched network 1406, the cellular network 1402, and/or via other networks (not illustrated). - As illustrated, a
communications device 1412, for example, a telephone, facsimile machine, modem, computer, or the like, can be in communication with the circuit switched network 1406, and therethrough to the packet data network 1404 and/or the cellular network 1402. It should be appreciated that the communications device 1412 can be an Internet-capable device, and can be substantially similar to the Internet-capable device 1410. In the specification, the network 302 is used to refer broadly to any combination of the networks 1402, 1404, 1406 shown in FIG. 14. - Turning now to
FIG. 15, an illustrative cloud environment 1500 will be described, according to an illustrative embodiment. In some embodiments, the client 308, the servers 306, and/or the P4 switches 304 can be implemented, at least in part, in the cloud environment 1500. The cloud environment 1500 includes a physical environment 1502, a virtualization layer 1504, and a virtual environment 1506. While no connections are shown in FIG. 15, it should be understood that some, none, or all of the components illustrated in FIG. 15 can be configured to interact with one another to carry out various functions described herein. In some embodiments, the components are arranged so as to communicate via one or more networks, such as the network 302. Thus, it should be understood that FIG. 15 and the remaining description are intended to provide a general understanding of a suitable environment in which various aspects of the embodiments described herein can be implemented, and should not be construed as being limiting in any way. - The
physical environment 1502 provides hardware resources, which, in the illustrated embodiment, include one or more physical compute resources 1508, one or more physical memory resources 1510, and one or more other physical resources 1512. The physical compute resource(s) 1508 can include one or more hardware components that perform computations to process data and/or to execute computer-executable instructions of one or more application programs, one or more operating systems, and/or other software. - The
physical compute resources 1508 can include one or more central processing units (“CPUs”) configured with one or more processing cores. The physical compute resources 1508 can include one or more graphics processing units (“GPUs”) configured to accelerate operations performed by one or more CPUs, and/or to perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, one or more operating systems, and/or other software that may or may not include instructions particular to graphics computations. In some embodiments, the physical compute resources 1508 can include one or more discrete GPUs. In some other embodiments, the physical compute resources 1508 can include CPU and GPU components that are configured in accordance with a co-processing CPU/GPU computing model, wherein the sequential part of an application executes on the CPU and the computationally-intensive part is accelerated by the GPU processing capabilities. The physical compute resources 1508 can include one or more system-on-chip (“SoC”) components along with one or more other components, including, for example, one or more of the physical memory resources 1510, and/or one or more of the other physical resources 1512. In some embodiments, the physical compute resources 1508 can be or can include one or more SNAPDRAGON SoCs, available from QUALCOMM of San Diego, Calif.; one or more TEGRA SoCs, available from NVIDIA of Santa Clara, Calif.; one or more HUMMINGBIRD SoCs, available from SAMSUNG of Seoul, South Korea; one or more Open Multimedia Application Platform (“OMAP”) SoCs, available from TEXAS INSTRUMENTS of Dallas, Tex.; one or more customized versions of any of the above SoCs; and/or one or more proprietary SoCs. The physical compute resources 1508 can be or can include one or more hardware components architected in accordance with an ARM architecture, available for license from ARM HOLDINGS of Cambridge, United Kingdom.
Alternatively, the physical compute resources 1508 can be or can include one or more hardware components architected in accordance with an x86 architecture, such as an architecture available from INTEL CORPORATION of Mountain View, Calif., and others. Those skilled in the art will appreciate that the implementation of the physical compute resources 1508 can utilize various computation architectures, and as such, the physical compute resources 1508 should not be construed as being limited to any particular computation architecture or combination of computation architectures, including those explicitly disclosed herein. - The physical memory resource(s) 1510 can include one or more hardware components that perform storage/memory operations, including temporary or permanent storage operations. In some embodiments, the physical memory resource(s) 1510 include volatile and/or non-volatile memory implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data disclosed herein. Computer storage media includes, but is not limited to, random access memory (“RAM”), read-only memory (“ROM”), Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store data and which can be accessed by the
physical compute resources 1508. - The other physical resource(s) 1512 can include any other hardware resources that can be utilized by the physical compute resource(s) 1508 and/or the physical memory resource(s) 1510 to perform operations described herein. The other physical resource(s) 1512 can include one or more input and/or output processors (e.g., network interface controller or wireless radio), one or more modems, one or more codec chipsets, one or more pipeline processors, one or more fast Fourier transform (“FFT”) processors, one or more digital signal processors (“DSPs”), one or more speech synthesizers, and/or the like.
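As a rough illustration only, the three-part cloud environment 1500 introduced with FIG. 15 (a physical environment, a virtualization layer, and a virtual environment) might be modeled as in the following Python sketch. The class names, fields, and the no-overcommit allocation rule are assumptions made for this example and are not taken from the specification; an actual virtual machine monitor may behave quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalEnvironment:
    """Hypothetical inventory standing in for a physical environment."""
    cpu_cores: int      # stands in for physical compute resources
    memory_gib: int     # stands in for physical memory resources
    other: tuple = ()   # stands in for other physical resources, e.g. ("nic",)


@dataclass(frozen=True)
class VirtualResource:
    """Hypothetical abstraction of a slice of the physical resources."""
    name: str
    cpu_cores: int
    memory_gib: int


@dataclass
class VirtualizationLayer:
    """Hypothetical layer that carves virtual resources from physical ones."""
    physical: PhysicalEnvironment
    allocated: list = field(default_factory=list)

    def free(self):
        """Return (free cores, free GiB) not yet backing a virtual resource."""
        used_cores = sum(v.cpu_cores for v in self.allocated)
        used_mem = sum(v.memory_gib for v in self.allocated)
        return (self.physical.cpu_cores - used_cores,
                self.physical.memory_gib - used_mem)

    def create(self, name, cpu_cores, memory_gib):
        """Create a virtual resource; assumption: no overcommit is allowed."""
        free_cores, free_mem = self.free()
        if cpu_cores > free_cores or memory_gib > free_mem:
            raise ValueError(f"insufficient physical resources for {name}")
        resource = VirtualResource(name, cpu_cores, memory_gib)
        self.allocated.append(resource)
        return resource
```

For instance, a `VirtualizationLayer` built over `PhysicalEnvironment(cpu_cores=8, memory_gib=32)` could create a 4-core, 16 GiB virtual resource to host a server, after which `free()` would report the remaining 4 cores and 16 GiB.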
- The physical resources operating within the
physical environment 1502 can be virtualized by one or more virtual machine monitors (not shown; also known as “hypervisors”) operating within the virtualization/control layer 1504 to create virtual resources 1514 that reside in the virtual environment 1506. The virtual machine monitors can be or can include software, firmware, and/or hardware that alone or in combination with other software, firmware, and/or hardware, creates and manages virtual resources operating within the virtual environment 1506. - The
virtual resources 1514 operating within the virtual environment 1506 can include abstractions of at least a portion of the physical compute resources 1508, the physical memory resources 1510, and/or the other physical resources 1512, or any combination thereof. In some embodiments, the abstractions can include one or more virtual machines upon which one or more applications can be executed. - Based on the foregoing, it should be appreciated that concepts and technologies directed to a network-assisted Raft consensus algorithm have been disclosed herein. Although the subject matter presented herein has been described in language specific to computer structural features, methodological and transformative acts, specific computing machinery, and computer-readable media, it is to be understood that the concepts and technologies disclosed herein are not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts and mediums are disclosed as example forms of implementing the concepts and technologies disclosed herein.
- The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the embodiments of the concepts and technologies disclosed herein.
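By way of illustration only, the division described above between a partial Raft algorithm in a switch front-end and a complete Raft algorithm in a server back-end can be pictured as a dispatch decision. The Python sketch below is not the disclosed implementation: the names `RaftMessage`, `Handler`, `dispatch`, and `FAST_PATH_KINDS` are hypothetical, and the choice of which message kinds the front-end handles is purely an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Handler(Enum):
    FRONT_END = auto()  # partial Raft algorithm on the switch
    BACK_END = auto()   # complete Raft algorithm on the server


# Assumption for illustration: message kinds the partial algorithm can
# process by itself. The actual division of labor is not specified here.
FAST_PATH_KINDS = frozenset({"append_entries", "heartbeat"})


@dataclass(frozen=True)
class RaftMessage:
    kind: str  # e.g. "append_entries", "request_vote"
    term: int  # Raft term number carried by the message


def dispatch(msg, front_end_kinds=FAST_PATH_KINDS):
    """Choose where a message is processed.

    Kinds the front-end supports stay on the fast path; everything else
    falls back to the complete algorithm in the back-end.
    """
    if msg.kind in front_end_kinds:
        return Handler.FRONT_END
    return Handler.BACK_END
```

Under these assumptions, `dispatch(RaftMessage("heartbeat", 3))` selects the front-end, while `dispatch(RaftMessage("request_vote", 3))` falls back to the back-end. Keeping the fallback as the default means any message kind the front-end does not recognize is still handled by the complete algorithm.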
Claims (20)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/101,751 US10848375B2 (en) | 2018-08-13 | 2018-08-13 | Network-assisted raft consensus protocol |
| US17/101,280 US11533220B2 (en) | 2018-08-13 | 2020-11-23 | Network-assisted consensus protocol |
| US18/083,939 US20230118489A1 (en) | 2018-08-13 | 2022-12-19 | Network-Assisted Consensus Protocol |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/101,751 US10848375B2 (en) | 2018-08-13 | 2018-08-13 | Network-assisted raft consensus protocol |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/101,280 Continuation US11533220B2 (en) | 2018-08-13 | 2020-11-23 | Network-assisted consensus protocol |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20200052954A1 (en) | 2020-02-13 |
| US10848375B2 (en) | 2020-11-24 |
Family
ID=69407278
Family Applications (3)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/101,751 Expired - Fee Related US10848375B2 (en) | 2018-08-13 | 2018-08-13 | Network-assisted raft consensus protocol |
| US17/101,280 Active US11533220B2 (en) | 2018-08-13 | 2020-11-23 | Network-assisted consensus protocol |
| US18/083,939 Abandoned US20230118489A1 (en) | 2018-08-13 | 2022-12-19 | Network-Assisted Consensus Protocol |
Country Status (1)
| Country | Link |
|---|---|
| US (3) | US10848375B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10848375B2 (en) * | 2018-08-13 | 2020-11-24 | At&T Intellectual Property I, L.P. | Network-assisted raft consensus protocol |
| US11671488B1 (en) | 2022-02-24 | 2023-06-06 | Bank Of America Corporation | Domain-based Raft consensus selection of leader nodes in distributed data services |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100082728A1 (en) * | 2008-09-30 | 2010-04-01 | Yahoo! Inc. | Consensus-based reliable messaging |
| US20180295546A1 (en) * | 2017-04-07 | 2018-10-11 | Vapor IO Inc. | Distributed handoff-related processing for wireless networks |
| US20190146884A1 (en) * | 2017-11-15 | 2019-05-16 | Zscaler, Inc. | Systems and methods for service replication, validation, and recovery in cloud-based systems |
| US20200028776A1 (en) * | 2018-07-20 | 2020-01-23 | Netsia, Inc. | SYSTEM AND METHOD FOR A TRANSLATOR SUPPORTING MULTIPLE SOFTWARE DEFINED NETWORK (SDN) APPLICATION PROGRAMMING INTERFACES (APIs) |
Family Cites Families (30)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2000060825A1 (en) | 1999-04-02 | 2000-10-12 | Infolibria, Inc. | Connection pass-through to optimize server performance |
| US6687847B1 (en) | 1999-04-21 | 2004-02-03 | Cornell Research Foundation, Inc. | Failure detector with consensus protocol |
| US6651242B1 (en) | 1999-12-14 | 2003-11-18 | Novell, Inc. | High performance computing system for distributed applications over a computer |
| US6826601B2 (en) | 2001-09-06 | 2004-11-30 | Bea Systems, Inc. | Exactly one cache framework |
| US7113980B2 (en) | 2001-09-06 | 2006-09-26 | Bea Systems, Inc. | Exactly once JMS communication |
| US7185236B1 (en) | 2002-08-30 | 2007-02-27 | Eternal Systems, Inc. | Consistent group membership for semi-active and passive replication |
| US7408945B2 (en) | 2003-10-14 | 2008-08-05 | International Business Machines Corporation | Use of hardware to manage dependencies between groups of network data packets |
| US7636372B2 (en) | 2003-12-19 | 2009-12-22 | Broadcom Corporation | Method and system for providing smart offload and upload |
| US7814064B2 (en) | 2004-05-12 | 2010-10-12 | Oracle International Corporation | Dynamic distributed consensus algorithm |
| US20050289152A1 (en) * | 2004-06-10 | 2005-12-29 | Earl William J | Method and apparatus for implementing a file system |
| US7539995B2 (en) | 2004-12-30 | 2009-05-26 | Intel Corporation | Method and apparatus for managing an event processing system |
| US7886083B2 (en) | 2005-08-31 | 2011-02-08 | Microsoft Corporation | Offloaded neighbor cache entry synchronization |
| US7797457B2 (en) | 2006-03-10 | 2010-09-14 | Microsoft Corporation | Leaderless byzantine consensus |
| US7937482B1 (en) | 2008-03-27 | 2011-05-03 | Amazon Technologies, Inc. | Scalable consensus protocol |
| US8856593B2 (en) | 2010-04-12 | 2014-10-07 | Sandisk Enterprise Ip Llc | Failure recovery using consensus replication in a distributed flash memory system |
| US10614098B2 (en) | 2010-12-23 | 2020-04-07 | Mongodb, Inc. | System and method for determining consensus within a distributed database |
| US9588924B2 (en) | 2011-05-26 | 2017-03-07 | International Business Machines Corporation | Hybrid request/response and polling messaging model |
| US9172670B1 (en) | 2012-01-31 | 2015-10-27 | Google Inc. | Disaster-proof event data processing |
| US10311014B2 (en) | 2012-12-28 | 2019-06-04 | Iii Holdings 2, Llc | System, method and computer readable medium for offloaded computation of distributed application protocols within a cluster of data processing nodes |
| US9419854B1 (en) | 2013-06-27 | 2016-08-16 | The Boeing Company | Methods and systems for shared awareness through local observations and global state consistency in distributed and decentralized systems |
| US10235404B2 (en) | 2014-06-25 | 2019-03-19 | Cohesity, Inc. | Distributed key-value store |
| US9690675B2 (en) | 2014-07-17 | 2017-06-27 | Cohesity, Inc. | Dynamically changing members of a consensus group in a distributed self-healing coordination service |
| US9923768B2 (en) | 2015-04-14 | 2018-03-20 | International Business Machines Corporation | Replicating configuration between multiple geographically distributed servers using the rest layer, requiring minimal changes to existing service architecture |
| US10430240B2 (en) | 2015-10-13 | 2019-10-01 | Palantir Technologies Inc. | Fault-tolerant and highly-available configuration of distributed services |
| WO2017179059A1 (en) | 2016-04-14 | 2017-10-19 | B. G. Negev Technologies And Applications Ltd., At Ben-Gurion University | Self-stabilizing secure and heterogeneous systems |
| US10180812B2 (en) | 2016-06-16 | 2019-01-15 | Sap Se | Consensus protocol enhancements for supporting flexible durability options |
| US20180063238A1 (en) | 2016-08-25 | 2018-03-01 | Jiangang Zhang | Massively Scalable, Low Latency, High Concurrency and High Throughput Decentralized Consensus Algorithm |
| WO2018065411A1 (en) | 2016-10-05 | 2018-04-12 | Calastone Limited | Computer system |
| CN106789095B (en) | 2017-03-30 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Distributed system and message processing method |
| US10848375B2 (en) * | 2018-08-13 | 2020-11-24 | At&T Intellectual Property I, L.P. | Network-assisted raft consensus protocol |
Cited By (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210021412A1 (en) * | 2018-09-07 | 2021-01-21 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for electing representative node device, computer device, and storage medium |
| US12052344B2 (en) * | 2018-09-07 | 2024-07-30 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for electing representative node device, computer device, and storage medium |
| US20210320977A1 (en) * | 2018-12-24 | 2021-10-14 | Huawei Technologies Co., Ltd. | Method and apparatus for implementing data consistency, server, and terminal |
| US11108637B1 (en) * | 2019-11-08 | 2021-08-31 | Sprint Communications Company L.P. | Wireless relay consensus for mesh network architectures |
| US11641313B2 (en) * | 2019-11-21 | 2023-05-02 | Vmware, Inc. | Asynchronous liveness boosting of state machine replication |
| US20210160152A1 (en) * | 2019-11-21 | 2021-05-27 | Vmware, Inc. | Asynchronous Boosting Of State Machine Replication |
| CN111371877A (en) * | 2020-02-28 | 2020-07-03 | 桂林电子科技大学 | A Consensus Method for Heterogeneous Consortium Chains |
| WO2021250652A1 (en) * | 2020-06-08 | 2021-12-16 | Drivenets Ltd. | Highly-available cluster leader election in a distributed routing system |
| US12368668B2 (en) | 2020-06-08 | 2025-07-22 | Drivenets Ltd. | Highly-available cluster leader election in a distributed routing system |
| CN112395640A (en) * | 2020-11-16 | 2021-02-23 | 国网河北省电力有限公司信息通信分公司 | Industry Internet of things data lightweight credible sharing technology based on block chain |
| US20220358118A1 (en) * | 2021-05-10 | 2022-11-10 | International Business Machines Corporation | Data synchronization in edge computing networks |
| CN115617257A (en) * | 2021-07-15 | 2023-01-17 | 腾讯科技(深圳)有限公司 | Data reading method, server and client |
| CN114490125A (en) * | 2022-01-19 | 2022-05-13 | 山东浪潮科学研究院有限公司 | Optimization method for preselection process in Raft consensus algorithm |
| CN114189421A (en) * | 2022-02-17 | 2022-03-15 | 江西农业大学 | Leader node election method, system, storage medium and equipment |
| CN114448900A (en) * | 2022-04-02 | 2022-05-06 | 南京邮电大学 | SDN controller interaction method and system based on extended raft algorithm |
| CN115134161A (en) * | 2022-07-11 | 2022-09-30 | 西安理工大学 | Defense method for resisting tenure forgery based on Raft consensus algorithm |
| CN116055563A (en) * | 2022-11-22 | 2023-05-02 | 北京明朝万达科技股份有限公司 | Task scheduling method, system, electronic equipment and medium based on Raft protocol |
| CN115955504A (en) * | 2022-12-21 | 2023-04-11 | 杭州溪塔科技有限公司 | Method and device for implementing state synchronization middleware for stateful services |
| US20250133131A1 (en) * | 2023-10-20 | 2025-04-24 | Oracle International Corporation | Raft consensus vice leader optimization |
| CN119892843A (en) * | 2025-01-13 | 2025-04-25 | 西南交通大学 | Breakdown fault tolerance consensus method and system based on relay thought and lease mechanism |
Also Published As
| Publication number | Publication date |
|---|---|
| US10848375B2 (en) | 2020-11-24 |
| US20210105177A1 (en) | 2021-04-08 |
| US11533220B2 (en) | 2022-12-20 |
| US20230118489A1 (en) | 2023-04-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11533220B2 (en) | Network-assisted consensus protocol | |
| US11349810B2 (en) | Single packet authorization in a cloud computing environment | |
| US10291689B2 (en) | Service centric virtual network function architecture for development and deployment of open systems interconnection communication model layer 4 through layer 7 services in a cloud computing system | |
| US10826812B2 (en) | Multiple quorum witness | |
| US20200374268A1 (en) | Cloud-Native Firewall | |
| US9749242B2 (en) | Network platform as a service layer for open systems interconnection communication model layer 4 through layer 7 services | |
| US20170163479A1 (en) | Method, Device and System of Renewing Terminal Configuration In a Memcached System | |
| US10091113B2 (en) | Network functions virtualization leveraging unified traffic management and real-world event planning | |
| WO2021184992A1 (en) | Mirror image file uploading method, related device and computer storage medium | |
| WO2017177621A1 (en) | Data synchronization method in local area network, and apparatus and user terminal therefor | |
| US20160127172A1 (en) | Device Operational Profiles | |
| WO2014090088A1 (en) | Method, server, and system for data sharing in social networking service | |
| US20220029920A1 (en) | Extending Distributed Hash Table-Based Software Network Functions to Switching Hardware | |
| JP2018506794A (en) | System-on-chip with I/O steering engine | |
| US20210329087A1 (en) | Distributed flow processing and flow cache | |
| US8725856B2 (en) | Discovery of network services | |
| CN110633046A (en) | A storage method, device, storage device and storage medium for a distributed system | |
| US10284392B2 (en) | Virtual private network resiliency over multiple transports | |
| US20230208856A1 (en) | Encrypted Applications Verification | |
| CN105376307A (en) | Asynchronous backup method among data centers | |
| US10862964B2 (en) | Peer packet transport | |
| US20240364697A1 (en) | System and method for managing metaverse instances | |
| WO2015196586A1 (en) | Virtual desktop configuration and acquisition method and apparatus | |
| CN106357764A (en) | Data synchronization method and server for mobile terminal | |
| CN114422538A (en) | Multi-cloud storage system, multi-cloud data reading and writing method and electronic equipment | |

Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: AT&T INTELLECTUAL PROPERTY I, L.P., GEORGIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HAN, BO;GOPALAKRISHNAN, VIJAY;PLATANIA, MARCO;AND OTHERS;SIGNING DATES FROM 20180719 TO 20180806;REEL/FRAME:046626/0008 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: NATIONAL SCIENCE FOUNDATION, VIRGINIA. Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF MINNESOTA;REEL/FRAME:047207/0233. Effective date: 20181004 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | AS | Assignment | Owner name: REGENTS OF THE UNIVERSITY OF MINNESOTA, MINNESOTA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZHANG, YANG;ZHANG, ZHI-LI;SIGNING DATES FROM 20200608 TO 20200610;REEL/FRAME:053045/0239 |
| | ZAAA | Notice of allowance and fees due | Free format text: ORIGINAL CODE: NOA |
| | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | AS | Assignment | Owner name: REGENTS OF THE UNIVERSITY OF MINNESOTA, MINNESOTA. Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY DATA AND ASSIGNMENT DOCUMENT PREVIOUSLY RECORDED ON REEL 046626 FRAME 0008. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:ZHANG, YANG;REEL/FRAME:053978/0888. Effective date: 20200608 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20241124 |