US20160080247A1 - Optimal forwarding in a network implementing a plurality of logical networking schemes - Google Patents
- Publication number
- US20160080247A1 (application US 14/947,134)
- Authority
- US
- United States
- Prior art keywords
- network
- rbridge
- gateways
- vxlan
- logical network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
- H04L45/124—Shortest path evaluation using a combination of metrics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4633—Interconnection of networks using encapsulation techniques, e.g. tunneling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4641—Virtual LANs, VLANs, e.g. virtual private networks [VPN]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/66—Layer 2 routing, e.g. in Ethernet based MAN's
Definitions
- TRILL Transparent Interconnect of Lots of Links
- TRILL provides an architecture of Layer 2 control and forwarding that offers benefits such as pair-wise optimal forwarding, loop mitigation, multipathing and provisioning-free operation.
- the TRILL protocol is described in detail in Perlman et al., “RBridges: Base Protocol Specification,” available at http://tools.ietf.org/html/draft-ietf-trill-rbridge-protocol-16.
- the TRILL base protocol supports approximately four-thousand customer (or tenant) identifications through the use of inner virtual local area network (“VLAN”) tags. The number of tenant identifications provided by the TRILL base protocol is insufficient for large multi-tenant data center deployments.
- VLAN virtual local area network
- FGL fine-grained labeling
- VxLAN Virtual extensible local area network
- VxLAN is a networking scheme that provides a Layer 2 overlay on top of Layer 3 network infrastructure. Similar to FGL, VxLAN supports approximately sixteen million tenant identifications. Specifically, according to VxLAN, customer frames are encapsulated with a VxLAN header containing a VxLAN segment ID/VxLAN network identifier (“VNI”), which is a 24-bit field to identify virtual Layer 2 networks for different tenants.
- VNI VxLAN segment ID/VxLAN network identifier
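The tenant-identifier counts quoted above follow directly from the field widths. The sketch below is illustrative only: the 24-bit VNI width is stated in the text, while the 12-bit VLAN ID width is the standard IEEE 802.1Q value and is an assumption not stated here.

```python
# Field widths in bits. VNI_BITS comes from the text above; VLAN_ID_BITS is
# the usual IEEE 802.1Q VLAN ID width and is an assumption, not from the text.
VLAN_ID_BITS = 12
VNI_BITS = 24


def id_space(bits):
    """Return how many distinct values an unsigned field of `bits` bits holds."""
    return 1 << bits


vlan_ids = id_space(VLAN_ID_BITS)  # "approximately four thousand"
vni_ids = id_space(VNI_BITS)       # "approximately sixteen million"
print(vlan_ids, vni_ids)
```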
- the VxLAN networking scheme is discussed in detail in Mahalingam et al., “VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks,” available at http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-01.
- TRILL FGL and VxLAN can co-exist in a multi-tenant data center.
- VxLAN origination and termination capabilities can be built into application-specific integrated circuits (“ASICs”) already supporting TRILL.
- ASICs application-specific integrated circuits
- packet-switching devices can be built with VxLAN gateway functionality.
- a VxLAN gateway can be configured to push FGL frames into VxLAN tunnels, as well as decapsulate frames from VxLAN tunnels for further forwarding as FGL frames. Accordingly, traffic can flow over the same physical network either natively in FGL or overlay in VxLAN.
- FIG. 1 is a block diagram illustrating an example physical network
- FIG. 2 is a block diagram illustrating forwarding paths in two logical networks over the network shown in FIG. 1 ;
- FIGS. 3A-3B are block diagrams illustrating example frame formats according to networking schemes discussed herein;
- FIG. 4 is a flow diagram illustrating example operations for determining an optimal forwarding path across the network shown in FIG. 1 ;
- FIG. 5 is a block diagram of an example computing device.
- Methods, systems and devices for determining an optimal forwarding path across a network that implements two different logical networking schemes are provided herein.
- the methods, systems and devices can compute the total path costs for traffic flowing via a plurality of forwarding paths, while accounting for the differences in the encapsulation overhead associated with the logical networking schemes.
- the path costs over the logical network with the greater encapsulation overhead can be weighted accordingly.
- the optimal path among the plurality of forwarding paths can be determined and optionally used when the traffic is forwarded over the network.
- the network 10 can be a multi-tenant data center deployment where FGL and VxLAN networking schemes are implemented for network virtualization.
- the network 10 can include RBridges RB 11 , RB 12 , RB 13 , RB 21 , RB 22 and RB 23 , physical server pm 1 and VxLAN servers 1 and 2 .
- Virtual machines vm 1 and vm 2 run on VxLAN servers 1 and 2 , respectively.
- the RBridges and servers discussed above can be communicatively connected through one or more communication links. This disclosure contemplates that the communication links can be any suitable communication links.
- a communication link may be implemented by any medium that facilitates data exchange between the network elements including, but not limited to, wired, wireless and optical links.
- the network 10 shown in FIG. 1 is provided only as an example. A person of ordinary skill in the art may provide the functionalities described herein in a network having more or fewer elements than shown in FIG. 1 .
- RBridges are packet-forwarding devices (e.g., switches, bridges, etc.) that are configured to implement the TRILL protocol.
- the TRILL protocol is well-known in the art and is therefore not discussed in further detail herein.
- TRILL links 12 between the RBridges are shown as solid lines in FIG. 1 .
- each of RBridges RB 11 , RB 12 , RB 13 , RB 21 , RB 22 and RB 23 can be configured to support the FGL networking scheme.
- two inner VLAN tags are used to increase the number of available tenant identifications as compared to the number of tenant identifications available using the TRILL base protocol.
- RBridges RB 12 , RB 21 and RB 22 can be configured to support the VxLAN networking scheme in addition to the FGL networking scheme. Similar to the FGL networking scheme, the VxLAN networking scheme increases the number of available tenant identifications. The FGL and VxLAN networking schemes are optionally implemented in large multi-tenant data centers due to the large number of available tenant identifications.
- RBridges RB 12 , RB 21 and RB 22 are also referred to as “VxLAN gateways” below because RBridges RB 12 , RB 21 and RB 22 can interface with both the FGL and VxLAN logical networks.
- three servers are communicatively connected to the network 10 through edge RBridges RB 21 , RB 22 and RB 23 .
- the servers are connected to the network 10 through classic Ethernet links 14 shown as dotted-dashed lines in FIG. 1 .
- physical server pm 1 is connected to RBridge RB 21 . It should be understood that physical server pm 1 is neither configured for nor capable of performing VxLAN encapsulation/decapsulation.
- VxLAN servers 1 and 2 are connected to RB 22 and RB 23 , respectively. It should be understood that VxLAN servers 1 and 2 are configured or capable of performing VxLAN encapsulation/decapsulation.
- VxLAN servers 1 and 2 have respective VTEPs vtep 1 and vtep 2 to originate and terminate VxLAN tunnels for their respective virtual machines vm 1 and vm 2 .
- traffic (e.g., a packet, frame, etc.) can be transported in two formats: natively in FGL and overlay in VxLAN.
- the traffic traverses two logical networks (e.g., the FGL and VxLAN networks) on top of the same physical network 10 .
- FIG. 2 is a block diagram illustrating the forwarding paths in the two logical networks over the network 10 of FIG. 1 .
- a plurality of (or multiple) forwarding paths exist between physical server pm 1 and VxLAN server 1 due to the fact that there are multiple VxLAN gateways (i.e., RB 12 , RB 21 and RB 22 ) in the network 10 .
- the traffic flowing from physical server pm 1 can reach VxLAN server 1 via RBridges RB 12 , RB 21 or RB 22 (i.e., the VxLAN gateways).
- In the examples, it is assumed that all links in the network 10 are the same (e.g., 10 G links) and that all links have the same default metric value of 10.
- Although all links in the network 10 are assumed to be equal for the purpose of the examples, this disclosure contemplates that the links in the network 10 may not all be equal.
- the path costs of the multiple forwarding paths can be different due to the link metric values and/or the hop count.
- differences in path costs can exist, for example, due to the differences in the encapsulation overhead incurred by the networking schemes.
- example techniques for determining an optimal forwarding path are provided with reference to the two-tier fat tree network topology shown in FIG. 1 .
- This disclosure contemplates that the example techniques are also applicable to arbitrary network topologies.
- In FIG. 2, the multiple forwarding paths for traffic flowing from physical server pm 1 to VxLAN server 1 , e.g., via each of RBridges RB 12 , RB 21 and RB 22 (e.g., the VxLAN gateways), are illustrated.
- the FGL paths 22 are shown by dotted-dashed lines and the VxLAN paths 24 are shown by solid lines in FIG. 2 .
- the FGL path costs between physical server pm 1 and each of RBridges RB 12 , RB 21 and RB 22 are 10, 0 and 20, respectively.
- there is one hop e.g., from RBridge RB 21 to RBridge RB 12
- the first hops e.g., between physical server pm 1 and RBridge RB 21 and between VxLAN server 1 and RBridge RB 22
- the VxLAN path costs between VxLAN server 1 and each of RBridges RB 12 , RB 21 and RB 22 are 10, 20 and 0, respectively.
- there are two hops (e.g., from RBridge RB 21 to RBridge RB 12 to RBridge RB 22 ) between RBridge RB 21 and VxLAN server 1 .
- each of the multiple forwarding paths between physical server pm 1 and VxLAN server 1 appears to have the same total path cost (e.g., 20) on the surface.
- the forwarding path with the fewest hops over the VxLAN e.g., when RBridge RB 22 is the VxLAN gateway
- Referring to FIGS. 3A-3B, block diagrams illustrating example frame formats according to networking schemes discussed herein are shown.
- In FIGS. 3A-3B, the original customer frame (e.g., inner source and destination MAC addresses and packet payload) is shaded.
- FIG. 3A illustrates an example FGL frame, which adds 32 bytes to the original customer frame.
- FIG. 3B illustrates an example VxLAN frame, which adds 76 bytes to the original customer frame.
- the VxLAN tunnel over the network 10 therefore introduces an additional 44-byte encapsulation overhead per frame (76 bytes minus 32 bytes) as compared to using FGL.
- the optimal forwarding path is via RBridge RB 22 (e.g., RBridge RB 22 acts as the VxLAN gateway).
- the total path cost over the two logical networks can be computed. Additionally, differences between the frame formats of the two logical networks (e.g., the FGL and VxLAN networks) can be taken into consideration when computing the total path cost. Further, gateway devices (e.g., RBridges RB 12 , RB 21 and RB 22 or the VxLAN gateways) can be configured to carry out the total path cost computation because the gateways connect the logical networks.
- the VxLAN gateways are RBridges and therefore are configured to implement the TRILL protocol.
- the VxLAN gateways can learn the network topology by exchanging link state information using the TRILL IS-IS link state protocol.
- the VxLAN gateways can optionally use other standard or proprietary protocols for exchanging link state information.
- the VxLAN gateways can compute their own path costs to/from any of the RBridges in the network 10 .
- RBridge RB 12 (e.g., one of the VxLAN gateways) can compute its path cost to each of RBridges RB 21 , RB 22 and RB 23 as 10 using the link state information.
- the VxLAN gateways can compute the path costs of the other VxLAN gateways to/from any of the RBridges in the network 10 if the VxLAN gateways know the RBridge nicknames of the other VxLAN gateways.
- For example, if RBridge RB 12 (e.g., one of the VxLAN gateways) knows the RBridge nickname associated with RBridge RB 21 (e.g., one of the other VxLAN gateways), it can compute the path cost between RBridge RB 21 and each of RBridges RB 22 and RB 23 as 20 using the link state information.
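The path costs quoted above can be reproduced with an ordinary shortest-path computation over the link state database. The following is a minimal sketch, not the disclosed implementation: it assumes that FIG. 1 is a two-tier fat tree in which each of RB 21 , RB 22 and RB 23 has one link to each of RB 11 , RB 12 and RB 13 (the figure is not reproduced in this text), and it uses Dijkstra's algorithm as a stand-in for whatever computation an RBridge actually runs. The function names are hypothetical.

```python
import heapq

LINK_METRIC = 10                   # default metric stated in the text
SPINES = ["RB11", "RB12", "RB13"]  # assumed upper tier of FIG. 1
LEAVES = ["RB21", "RB22", "RB23"]  # assumed edge tier of FIG. 1


def build_topology():
    """Build an adjacency map with every edge RBridge linked to every spine."""
    graph = {node: {} for node in SPINES + LEAVES}
    for spine in SPINES:
        for leaf in LEAVES:
            graph[spine][leaf] = LINK_METRIC
            graph[leaf][spine] = LINK_METRIC
    return graph


def path_costs(graph, source):
    """Return the lowest path cost from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


graph = build_topology()
from_rb12 = path_costs(graph, "RB12")  # RB 12's own path costs
from_rb21 = path_costs(graph, "RB21")  # computed on behalf of gateway RB 21
print(from_rb12["RB21"], from_rb21["RB22"], from_rb21["RB23"])
```

Because every RBridge holds the same link state information, RB 12 can run the same computation rooted at RB 21 once it knows that RB 21 is a VxLAN gateway.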
- for a source node (e.g., physical server pm 1 ) and a destination node (e.g., VxLAN server 1 ), the VxLAN gateways can determine which RBridges the source and destination nodes are connected to, respectively, and then compute the total path costs between the source node and each of the VxLAN gateways and the total path costs between each of the VxLAN gateways and the destination node.
- the traffic will traverse the FGL network between the source node (e.g., physical server pm 1 ) and the VxLAN gateway and traverse the VxLAN between the VxLAN gateway and the destination node (e.g., VxLAN server 1 ).
- each of the VxLAN gateways can be configured to advertise its respective RBridge nickname and, optionally, the IP address used for VxLAN encapsulation. It should be understood that each of the VxLAN gateways can be associated with a unique identifier (e.g., the RBridge nickname) according to the TRILL protocol.
- the RBridge nickname can be included in a Type Length Value (TLV) in the link state protocol used for disseminating the link state information. This is also referred to as the VxLAN Gateway Information TLV herein.
- the link state protocol can be the TRILL IS-IS link state protocol.
- the VxLAN Gateway Information TLV can optionally include the IP address used for VxLAN encapsulation, as well as the RBridge nickname. For example, if RBridge RB 21 (e.g., one of the VxLAN gateways) announces its RBridge nickname using the VxLAN Gateway Information TLV, then RBridge RB 12 (e.g., one of the VxLAN gateways) can compute the total path cost for RBridge RB 21 to/from the other RBridges in the network 10 in addition to its own path cost to/from the other RBridges in the network 10 .
- a VxLAN gateway can compute path costs between each of the other VxLAN gateways and each of the RBridges in the network 10 provided it knows the RBridge nicknames for the other VxLAN gateways. Additionally, as discussed in detail below, a VxLAN gateway can optionally use the IP address for VxLAN encapsulation when notifying the other RBridges in the network of the optimal VxLAN gateway.
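To make the VxLAN Gateway Information TLV concrete, the sketch below encodes and decodes one possible layout. Everything about the layout is an assumption for illustration: the text does not give a type code or field order. The sketch uses a placeholder type value, a one-byte length as in IS-IS TLVs, a 16-bit RBridge nickname, and an optional IPv4 address; the function names are hypothetical.

```python
import ipaddress
import struct

# Placeholder type code: the text does not assign one, so this value is
# purely hypothetical.
HYPOTHETICAL_TLV_TYPE = 250


def encode_gateway_info(nickname, vxlan_ip=None):
    """Encode a nickname and optional IPv4 address as type/length/value."""
    value = struct.pack("!H", nickname)  # 16-bit RBridge nickname
    if vxlan_ip is not None:
        value += ipaddress.IPv4Address(vxlan_ip).packed  # 4 bytes
    return struct.pack("!BB", HYPOTHETICAL_TLV_TYPE, len(value)) + value


def decode_gateway_info(data):
    """Decode the assumed layout back into (nickname, ip_or_None)."""
    tlv_type, length = struct.unpack_from("!BB", data, 0)
    value = data[2:2 + length]
    if tlv_type != HYPOTHETICAL_TLV_TYPE or len(value) != length:
        raise ValueError("not a well-formed gateway information TLV")
    if length not in (2, 6):
        raise ValueError("unexpected value length")
    (nickname,) = struct.unpack_from("!H", value, 0)
    ip = str(ipaddress.IPv4Address(value[2:6])) if length == 6 else None
    return nickname, ip


# 0x0021 and 192.0.2.21 are made-up values standing in for RB 21.
tlv = encode_gateway_info(0x0021, "192.0.2.21")
print(tlv.hex(), decode_gateway_info(tlv))
```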
- the VxLAN gateways can determine the RBridges to which the source and destination nodes, respectively, are connected. The determination is different depending on whether the source or destination node is a physical server (e.g., physical server pm 1 ) or a VxLAN server (e.g., VxLAN server 1 or 2 ).
- a VxLAN gateway processing traffic from a physical server can determine which RBridge the physical server is connected to via MAC learning. In other words, the VxLAN gateway can determine the RBridge to which the physical server is connected from the physical server's MAC address and RBridge nickname binding using its MAC address table.
- the VxLAN gateway learns the binding between physical server pm 1's MAC address and the RBridge nickname associated with ingress RBridge RB 21 , e.g., the RBridge to which physical server pm 1 is connected. Then, using the link state information exchanged through the link state protocol, the VxLAN gateway can compute its own path cost from/to RBridge RB 21 as 10.
- the VxLAN gateway can also compute path costs of RBridges RB 21 and RB 22 from/to RBridge RB 21 as 0 and 20, respectively.
- the process for determining the RBridge to which a VxLAN server is connected is discussed below.
- the IP addresses used by the VxLAN gateways e.g., RBridges RB 12 , RB 21 and RB 22
- the VTEPs e.g., VTEPs vtep 1 and vtep 2
- SVIs switch virtual interfaces
- VTEPs vtep 1 and vtep 2 can be configured to transmit VxLAN encapsulated frames in VLAN “X” and the SVIs for VLAN “X” can be configured on RBridges RB 12 , RB 21 and RB 22 .
- the VxLAN gateways can then determine the RBridge to which a VxLAN server is connected through the following bindings: (1) the binding between the MAC address associated with a VxLAN server and the IP address associated with the VTEP (e.g., VxLAN learning), (2) the binding between the IP address associated with the VTEP and the MAC address associated with the VTEP (e.g., ARP), and (3) the binding between the MAC address associated with the VTEP and the RBridge nickname of the ingress RBridge (e.g., MAC learning).
- RBridge RB 12 can determine that VxLAN server 1 is connected to RBridge RB 22 through the following three bindings. First, through VxLAN learning, RBridge RB 12 can find the binding of the MAC address associated with the VxLAN server 1 and the IP address associated with VTEP vtep 1 using its VxLAN table. Next, because RBridge RB 12 is in the same subnet as VTEP vtep 1 , RBridge RB 12 can find the MAC address associated with VTEP vtep 1 using its ARP table.
- Finally, RBridge RB 12 can find which RBridge VxLAN server 1 is connected to based on the binding between VTEP vtep 1's MAC address and RBridge RB 22's RBridge nickname using its MAC address table.
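The three bindings amount to a chained table lookup. The sketch below illustrates the chain with plain dictionaries. The table contents (MAC and IP addresses) are invented placeholders from documentation ranges, and the function name `attached_rbridge` is hypothetical; only the order of the lookups is taken from the text.

```python
# All addresses below are made-up placeholders, not values from the text.
SERVER1_MAC = "00:00:5e:00:53:01"
VTEP1_IP = "192.0.2.101"
VTEP1_MAC = "00:00:5e:00:53:a1"

vxlan_table = {SERVER1_MAC: VTEP1_IP}  # (1) VxLAN learning: server MAC -> VTEP IP
arp_table = {VTEP1_IP: VTEP1_MAC}      # (2) ARP: VTEP IP -> VTEP MAC
mac_table = {VTEP1_MAC: "RB22"}        # (3) MAC learning: VTEP MAC -> ingress nickname


def attached_rbridge(server_mac):
    """Follow the three bindings; return None if any binding is missing."""
    vtep_ip = vxlan_table.get(server_mac)
    if vtep_ip is None:
        return None
    vtep_mac = arp_table.get(vtep_ip)
    if vtep_mac is None:
        return None
    return mac_table.get(vtep_mac)


print(attached_rbridge(SERVER1_MAC))
```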
- the VxLAN gateway (e.g., RBridge RB 12 ) can compute its own path cost from/to RBridge RB 22 as 10.
- the VxLAN gateway can also compute the path costs of RBridges RB 21 and RB 22 from/to RBridge RB 22 as 20 and 0, respectively.
- the VxLAN gateway (e.g., RBridge RB 12 , RB 21 or RB 22 ) can determine the optimal forwarding path and the optimal VxLAN gateway. It should be understood that traffic flows from the source node to the VxLAN gateway over the FGL network and from the VxLAN gateway to the destination node over the VxLAN. Alternatively or additionally, it should be understood that traffic flows from the source node to the VxLAN gateway over the VxLAN and from the VxLAN gateway to the destination node over the FGL network. This is shown in FIG. 2 .
- the optimal forwarding path is the forwarding path having the fewest hops in the logical network having the greater encapsulation overhead.
- the optimal forwarding path is chosen such that traffic makes fewer hops in the logical network associated with the larger encapsulation overhead (e.g., the VxLAN) and more hops in the logical network associated with smaller encapsulation overhead (e.g., the FGL network).
- VxLAN encapsulation overhead exceeds FGL encapsulation overhead.
- the optimal VxLAN gateway is RBridge RB 22 and the optimal forwarding path is through RBridge RB 22 .
- the VxLAN gateways can be configured to calculate an encapsulation overhead metric.
- the encapsulation overhead metric (“E O/H ”) can optionally be defined as:
- the encapsulation overhead metric provided in Eqn. (1) is provided only as an example and that the encapsulation overhead metric can be defined in other ways.
- the per frame encapsulation overhead of VxLAN encapsulation exceeds FGL encapsulation by 44 bytes.
- the encapsulation overhead metric calculated using Eqn. (1) is 1.1, assuming an average packet size of 440 bytes. This disclosure contemplates that the average packet size can optionally be more or less than 440 bytes, which is provided only as an example.
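Eqn. (1) itself is not reproduced in this text. One form that is consistent with the worked value above (a 44-byte additional overhead and a 440-byte average packet size giving 1.1) is sketched below. This is an assumption inferred from those numbers, and the symbols $O_{\mathrm{VxLAN}}$, $O_{\mathrm{FGL}}$ and $\bar{S}$ (per-frame overheads and average packet size) are notation introduced here for illustration.

```latex
E_{O/H} \;=\; 1 + \frac{O_{\mathrm{VxLAN}} - O_{\mathrm{FGL}}}{\bar{S}}
        \;=\; 1 + \frac{76 - 32}{440}
        \;=\; 1.1
```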
- Table 1 shows the total path costs computed for the multiple forwarding paths between physical server pm 1 and VxLAN server 1 of FIG. 2 , assuming an encapsulation overhead of 44 bytes per frame and an average packet size of 440 bytes.
- TABLE 1
  Forwarding Path | FGL Path Cost | VxLAN Path Cost | E O/H | Total Path Cost
  Via RB12        | 10            | 10              | 1.1   | 21
  Via RB21        | 0             | 20              | 1.1   | 22
  Via RB22        | 20            | 0               | 1.1   | 20
- As shown above in Table 1, the optimal forwarding path is via RBridge RB 22 .
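The totals in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the total path cost is the FGL path cost plus the VxLAN path cost multiplied by the encapsulation overhead metric, and that the metric is one plus the ratio of additional overhead to average packet size. Both forms are inferred from the table values rather than quoted from the text. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

OVERHEAD_DIFF_BYTES = 76 - 32  # VxLAN overhead minus FGL overhead, from the text
AVG_PACKET_BYTES = 440         # example average packet size from the text

# Assumed form of the metric: 1 + extra overhead / average packet size.
E_OH = 1 + Fraction(OVERHEAD_DIFF_BYTES, AVG_PACKET_BYTES)

# (FGL path cost, VxLAN path cost) per candidate gateway, from Table 1.
PATHS = {"RB12": (10, 10), "RB21": (0, 20), "RB22": (20, 0)}


def total_cost(fgl_cost, vxlan_cost, e_oh=E_OH):
    """Weight only the VxLAN leg by the encapsulation overhead metric."""
    return fgl_cost + vxlan_cost * e_oh


totals = {gw: total_cost(fgl, vxlan) for gw, (fgl, vxlan) in PATHS.items()}
best_gateway = min(totals, key=totals.get)

for gw, cost in totals.items():
    print(f"via {gw}: {float(cost):g}")
print("optimal gateway:", best_gateway)
```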
- the VxLAN gateways can be configured to notify the RBridges and VTEPs in the network 10 of which RBridge is the optimal VxLAN gateway.
- RBridge RB 12 performs VxLAN encapsulation and transmits the encapsulated frame to the VxLAN IP multicast address.
- the distribution tree 16 rooted at RBridge RB 12 is shown as a dashed line in FIG. 1 .
- RBridge RB 12 learns the binding between physical server pm 1's MAC address and RBridge RB 21's RBridge nickname through MAC address learning, and therefore, RBridge RB 12 can compute the FGL path costs between physical server pm 1 and all of the VxLAN gateways (e.g., RBridges RB 12 , RB 21 and RB 22 ).
- VxLAN server 1 responds with a unicast frame to physical server pm 1 .
- VTEP vtep 1 encapsulates the frame, using learned RBridge RB 12 's IP address as the destination IP address.
- RBridge RB 12 can learn the binding between VxLAN server 1 's MAC address and VTEP vtep 1 's IP address, for example, through the three bindings discussed above.
- RBridge RB 12 can then compute the VxLAN path costs between VxLAN server 1 and all of the VxLAN gateways (e.g., RBridges RB 12 , RB 21 and RB 22 ).
- The ability of RBridge RB 12 to compute the total path costs for the other VxLAN gateways assumes that RBridge RB 12 has learned the RBridge nicknames of the other VxLAN gateways, for example, by exchanging messages using the link state protocol including the VxLAN Gateway Information TLV. Additionally, RBridge RB 12 can weight the path costs over the VxLAN because VxLAN encapsulation has a higher encapsulation overhead as compared to FGL encapsulation.
- Upon computing the total path costs, for example, as shown in Table 1 above, RBridge RB 12 realizes that it is not in the optimal forwarding path.
- RBridge RB 12 can optionally be configured to notify one or more RBridges in the network 10 to use the optimal forwarding path, e.g., via RBridge RB 22 , instead of the forwarding path via RBridge RB 12 .
- RBridge RB 12 can be configured to notify the RBridge to which physical server pm 1 is connected (e.g., RBridge RB 21 ) and VxLAN server 1 's VTEP (e.g., VTEP vtep 1 ) to use the optimal path via RBridge RB 22 .
- an implicit approach is provided below that can be used by a VxLAN gateway to notify an RBridge or a VTEP of the optimal forwarding path. It should be understood that the implicit approach does not require any protocol changes.
- a VxLAN gateway can encapsulate FGL frames using the desired optimal VxLAN gateway's RBridge nickname as the ingress RBridge nickname.
- the RBridge to which the physical server is connected can learn the binding between the desired MAC address and RBridge nickname and redirect traffic to the optimal VxLAN gateway.
- RBridge RB 12 (e.g., a non-optimal VxLAN gateway) can decapsulate VxLAN frames from VxLAN server 1 and can encapsulate the frames with FGL headers using RBridge RB 22 's (e.g., an optimal VxLAN gateway) RBridge nickname, instead of its own, as the ingress RBridge nickname. Then, the RBridge to which physical server pm 1 is connected (e.g., RBridge RB 21 ) can learn the desired binding between VxLAN server 1 's MAC address and RBridge RB 22 's RBridge nickname and redirect traffic to RBridge RB 22 .
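The implicit approach relies only on ordinary MAC learning at the edge RBridge. The toy model below illustrates the effect: the edge RBridge binds an inner source MAC address to whatever ingress nickname the frame carries, so a gateway that stamps the optimal gateway's nickname steers return traffic there. The class and function names and the MAC address are hypothetical, and real frames, headers and forwarding are not modeled.

```python
SERVER1_MAC = "00:00:5e:00:53:01"  # made-up placeholder for VxLAN server 1


class EdgeRBridge:
    """Toy edge RBridge that learns inner source MAC -> ingress nickname."""

    def __init__(self, nickname):
        self.nickname = nickname
        self.mac_table = {}

    def receive_frame(self, ingress_nickname, inner_src_mac):
        # Standard MAC learning: remember which ingress RBridge the MAC
        # address was seen behind.
        self.mac_table[inner_src_mac] = ingress_nickname

    def egress_for(self, dst_mac):
        return self.mac_table.get(dst_mac)


def reencapsulate(own_nickname, inner_src_mac, optimal_nickname=None):
    """Return the (ingress nickname, inner source MAC) the gateway emits.

    With the implicit approach, the gateway writes the optimal gateway's
    nickname instead of its own.
    """
    return (optimal_nickname or own_nickname, inner_src_mac)


rb21 = EdgeRBridge("RB21")

# Without redirection, RB 21 learns that VxLAN server 1 sits behind RB 12.
rb21.receive_frame(*reencapsulate("RB12", SERVER1_MAC))
before = rb21.egress_for(SERVER1_MAC)

# With the implicit approach, RB 12 stamps RB 22's nickname instead.
rb21.receive_frame(*reencapsulate("RB12", SERVER1_MAC, optimal_nickname="RB22"))
after = rb21.egress_for(SERVER1_MAC)
print(before, "->", after)
```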
- a VxLAN gateway can encapsulate VxLAN frames using the desired optimal VxLAN gateway's IP address as the source IP address.
- the VTEP can learn the desired binding between the MAC address and IP address and redirect the traffic to the optimal VxLAN gateway.
- RBridge RB 12 (e.g., a non-optimal VxLAN gateway) can decapsulate FGL frames from physical server pm 1 and can encapsulate the frames with VxLAN headers using RBridge RB 22's (e.g., an optimal VxLAN gateway) IP address, instead of its own, as the source IP address.
- VTEP vtep 1 can learn the desired binding between physical server pm 1's MAC address and RBridge RB 22's IP address and redirect the traffic to RBridge RB 22 .
- an explicit approach is provided below that can be used by a VxLAN gateway to notify an RBridge or a VTEP of the optimal forwarding path.
- Although the explicit approach requires a protocol change, it provides the benefit of fast rerouting when a VxLAN gateway in the optimal forwarding path fails.
- a VxLAN gateway can use the TRILL ESADI to notify an RBridge to which the physical server is connected of the plurality of bindings between the VxLAN server's MAC address and RBridge nicknames of the VxLAN gateways and the associated VxLAN path costs.
- a VxLAN gateway can notify the RBridge to which the physical server is connected of a plurality of bindings with associated VxLAN path costs so that the RBridge can switch to the next-best VxLAN gateway if the optimal VxLAN gateway is detected as unreachable by the link state protocol.
- the VxLAN can be configured to use a modified MAC Reachability TLV, i.e., a VxLAN MAC Reachability TLV.
- the VxLAN MAC Reachability TLV can include a list of tuples, including but not limited to, one or more VxLAN server MAC addresses and associated VxLAN gateway RBridge nicknames and VxLAN path costs.
- When the RBridge receives the VxLAN MAC Reachability TLV, it can compute the total path costs based on its FGL path costs to VxLAN gateways and the advertised VxLAN path costs. For example, RBridge RB 12 can use the VxLAN MAC Reachability TLV to announce the bindings of VxLAN server 1's MAC address and the RBridge nicknames of RBridges RB 12 , RB 21 and RB 22 (e.g., the VxLAN gateways) with respective VxLAN path costs of 10, 20 and 0.
- When the RBridge to which physical server pm 1 is connected (e.g., RBridge RB 21 ) receives the VxLAN MAC Reachability TLV, it can compute total path costs via RBridges RB 12 , RB 21 and RB 22 as 21, 22 and 20, respectively, based on its FGL path costs to RBridges RB 12 , RB 21 and RB 22 of 10, 0 and 20, respectively, and the advertised VxLAN path costs of 10, 20 and 0. RBridge RB 21 can then redirect the traffic to RBridge RB 22 because it is the VxLAN gateway associated with the lowest total path cost.
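The explicit approach lets the receiving RBridge rank all advertised gateways and fall back to the next-best one when the optimal gateway becomes unreachable. The sketch below illustrates that ranking using the numbers above. It assumes that the receiver applies the 1.1 encapsulation overhead weighting to the advertised VxLAN path costs, which is inferred from the totals of 21, 22 and 20; the tuple layout and the function name `rank_gateways` are hypothetical.

```python
from fractions import Fraction

E_OH = Fraction(11, 10)  # encapsulation overhead metric of 1.1 from the example

# Tuples advertised for VxLAN server 1's MAC address:
# (VxLAN gateway nickname, VxLAN path cost).
advertised = [("RB12", 10), ("RB21", 20), ("RB22", 0)]

# RB 21's own FGL path costs to each VxLAN gateway, from the text.
fgl_costs_from_rb21 = {"RB12": 10, "RB21": 0, "RB22": 20}


def rank_gateways(advertised, fgl_costs, reachable):
    """Return reachable gateways as (nickname, total cost), lowest cost first."""
    ranked = sorted(
        (fgl_costs[gw] + vxlan_cost * E_OH, gw)
        for gw, vxlan_cost in advertised
        if gw in reachable
    )
    return [(gw, cost) for cost, gw in ranked]


all_up = rank_gateways(advertised, fgl_costs_from_rb21, {"RB12", "RB21", "RB22"})
rb22_down = rank_gateways(advertised, fgl_costs_from_rb21, {"RB12", "RB21"})
print("preferred:", all_up[0][0], "| fallback if RB22 fails:", rb22_down[0][0])
```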
- a VxLAN gateway can use a control protocol (e.g., VxLAN Gateway Address Distribution Information (VGADI)) to notify a VTEP of the plurality of bindings between a physical server's MAC address and the IP addresses of the VxLAN gateways, together with the associated total path costs.
- VGADI VxLAN Gateway Address Distribution Information
- a VxLAN gateway can unicast its protocol data units (“PDUs”) to the IP address of the intended VTEP.
- PDUs protocol data units
- Each PDU can carry a VxLAN Gateway Reachability TLV, which includes a list of tuples, including but not limited to, one or more physical server MAC addresses and associated VxLAN gateway IP addresses and total path costs.
- RBridge RB 12 can use the VxLAN Gateway Reachability TLV to inform VTEP vtep 1 of the bindings between physical server pm 1's MAC address and the IP addresses of RBridges RB 12 , RB 21 and RB 22 (e.g., the VxLAN gateways) with respective total path costs of 21, 22 and 20.
- VTEP vtep 1 can then redirect the traffic to RBridge RB 22 because it is the optimal VxLAN gateway associated with the lowest total path cost.
- the logical operations described herein with respect to the various figures may be implemented (1) as a sequence of computer implemented acts or program modules (i.e., software) running on a computing device, (2) as interconnected machine logic circuits or circuit modules (i.e., hardware) within the computing device and/or (3) as a combination of software and hardware of the computing device.
- the logical operations discussed herein are not limited to any specific combination of hardware and software. The implementation is a matter of choice dependent on the performance and other requirements of the computing device. Accordingly, the logical operations described herein are referred to variously as operations, structural devices, acts, or modules. These operations, structural devices, acts and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. It should also be appreciated that more or fewer operations may be performed than shown in the figures and described herein. These operations may also be performed in a different order than those described herein.
- the network can be a network including RBridges configured to implement both FGL and VxLAN networking schemes, e.g., the network 10 shown in FIG. 1 .
- RBridges configured to implement both FGL and VxLAN networking schemes are VxLAN gateways.
- the example operations 400 can be carried out by a VxLAN gateway.
- one or more RBridge nicknames can be learned. As discussed above, each RBridge nickname is uniquely associated with one of the VxLAN gateways in the network.
- a path cost over the FGL network between each of the VxLAN gateways and a source node is determined.
- a path cost over the VxLAN between each of the VxLAN gateways and a destination node is determined.
- an encapsulation overhead metric associated with switching packets over the VxLAN can be determined.
- one of the VxLAN gateways can be selected as an optimal VxLAN gateway. The selection can be based on the path cost over the FGL network between each of the VxLAN gateways and the source node, the path cost over the VxLAN between each of the VxLAN gateways and the destination node and the encapsulation overhead metric.
- one or more RBridges in the network can be notified of the selection. This facilitates the ability of the RBridges to re-direct traffic via the optimal VxLAN gateway.
- the process may execute on any type of computing architecture or platform.
- Referring to FIG. 5, an example computing device upon which embodiments of the invention may be implemented is illustrated.
- the RBridges and servers discussed above may be a computing device, such as computing device 500 shown in FIG. 5 .
- the computing device 500 may include a bus or other communication mechanism for communicating information among various components of the computing device 500 .
- computing device 500 typically includes at least one processing unit 506 and system memory 504 .
- system memory 504 may be volatile (such as random access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.), or some combination of the two.
- the processing unit 506 may be a standard programmable processor that performs arithmetic and logic operations necessary for operation of the computing device 500 .
- the processing unit 506 can be an ASIC.
- Computing device 500 may have additional features/functionality.
- computing device 500 may include additional storage such as removable storage 508 and non-removable storage 510 including, but not limited to, magnetic or optical disks or tapes.
- Computing device 500 may also contain network connection(s) 516 that allow the device to communicate with other devices.
- Computing device 500 may also have input device(s) 514 such as a keyboard, mouse, touch screen, etc.
- Output device(s) 512 such as a display, speakers, printer, etc. may also be included.
- the additional devices may be connected to the bus in order to facilitate communication of data among the components of the computing device 500 . All these devices are well known in the art and need not be discussed at length here.
- the processing unit 506 may be configured to execute program code encoded in tangible, computer-readable media.
- Computer-readable media refers to any media that is capable of providing data that causes the computing device 500 (i.e., a machine) to operate in a particular fashion.
- Various computer-readable media may be utilized to provide instructions to the processing unit 506 for execution.
- Common forms of computer-readable media include, for example, magnetic media, optical media, physical media, memory chips or cartridges, a carrier wave, or any other medium from which a computer can read.
- Example computer-readable media may include, but is not limited to, volatile media, non-volatile media and transmission media.
- Volatile and non-volatile media may be implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data and common forms are discussed in detail below.
- Transmission media may include coaxial cables, copper wires and/or fiber optic cables, as well as acoustic or light waves, such as those generated during radio-wave and infra-red data communication.
- Example tangible, computer-readable recording media include, but are not limited to, an integrated circuit (e.g., field-programmable gate array or application-specific IC), a hard disk, an optical disk, a magneto-optical disk, a floppy disk, a magnetic tape, a holographic storage medium, a solid-state device, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
- the processing unit 506 may execute program code stored in the system memory 504 .
- the bus may carry data to the system memory 504 , from which the processing unit 506 receives and executes instructions.
- the data received by the system memory 504 may optionally be stored on the removable storage 508 or the non-removable storage 510 before or after execution by the processing unit 506 .
- Computing device 500 typically includes a variety of computer-readable media.
- Computer-readable media can be any available media that can be accessed by device 500 and includes both volatile and non-volatile media, removable and non-removable media.
- Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- System memory 504 , removable storage 508 , and non-removable storage 510 are all examples of computer storage media.
- Computer storage media include, but are not limited to, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 500 . Any such computer storage media may be part of computing device 500 .
- the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination thereof.
- the methods and apparatuses of the presently disclosed subject matter, or certain aspects or portions thereof may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computing device, the machine becomes an apparatus for practicing the presently disclosed subject matter.
- In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
- One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like.
- Such programs may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system.
- the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language and it may be combined with hardware implementations.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
An example method for determining an optimal forwarding path across a network having gateways configured to implement a plurality of logical networking protocols can include determining a path cost over a first logical network between each of the gateways and a source node and a path cost over a second logical network between each of the gateways and a destination node. Additionally, the method can include determining an encapsulation cost difference between switching packets over the first and second logical networks. The method can also include determining an encapsulation overhead metric associated with one of the first or second logical networks, and weighting one of the first or second path costs by the encapsulation overhead metric. Further, the method can include selecting one of the gateways as an optimal gateway. The selection can be based on the computed path costs.
Description
- This application is a continuation of U.S. patent application Ser. No. 13/898,572, filed on May 21, 2013, entitled “OPTIMAL FORWARDING FOR TRILL FINE-GRAINED LABELING AND VXLAN INTERWORKING,” the disclosure of which is expressly incorporated herein by reference in its entirety.
- IETF Transparent Interconnect of Lots of Links (“TRILL”) provides a Layer 2 control and forwarding architecture that offers benefits such as pair-wise optimal forwarding, loop mitigation, multipathing and provisioning-free operation. The TRILL protocol is described in detail in Perlman et al., “RBridges: Base Protocol Specification,” available at http://tools.ietf.org/html/draft-ietf-trill-rbridge-protocol-16. The TRILL base protocol supports approximately four thousand customer (or tenant) identifications through the use of inner virtual local area network (“VLAN”) tags. The number of tenant identifications provided by the TRILL base protocol is insufficient for large multi-tenant data center deployments. Thus, a fine-grained labeling (“FGL”) networking scheme has been proposed to increase the number of tenant identifications to approximately sixteen million through the use of two inner VLAN tags. The FGL networking scheme is described in detail in Eastlake et al., “TRILL: Fine-Grained Labeling,” available at http://tools.ietf.org/html/draft-ietf-trill-fine-labeling-01.
- Virtual extensible local area network (“VxLAN”) is a networking scheme that provides a Layer 2 overlay on top of Layer 3 network infrastructure. Similar to FGL, VxLAN supports approximately sixteen million tenant identifications. Specifically, according to VxLAN, customer frames are encapsulated with a VxLAN header containing a VxLAN segment ID/VxLAN network identifier (“VNI”), which is a 24-bit field used to identify virtual Layer 2 networks for different tenants. The VxLAN networking scheme is discussed in detail in Mahalingam et al., “VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks,” available at http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-01.
- As two complementary network virtualization schemes, TRILL FGL and VxLAN can co-exist in a multi-tenant data center. To facilitate their interworking, VxLAN origination and termination capabilities can be built into application-specific integrated circuits (“ASICs”) already supporting TRILL. In other words, packet-switching devices can be built with VxLAN gateway functionality. A VxLAN gateway can be configured to push FGL frames into VxLAN tunnels, as well as decapsulate frames from VxLAN tunnels for further forwarding as FGL frames. Accordingly, traffic can flow over the same physical network either natively in FGL or as an overlay in VxLAN.
- The components in the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding parts throughout the several views.
- FIG. 1 is a block diagram illustrating an example physical network;
- FIG. 2 is a block diagram illustrating forwarding paths in two logical networks over the network shown in FIG. 1;
- FIGS. 3A-3B are block diagrams illustrating example frame formats according to networking schemes discussed herein;
- FIG. 4 is a flow diagram illustrating example operations for determining an optimal forwarding path across the network shown in FIG. 1; and
- FIG. 5 is a block diagram of an example computing device.
- Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure. As used in the specification, and in the appended claims, the singular forms “a,” “an,” “the” include plural referents unless the context clearly dictates otherwise. The term “comprising” and variations thereof as used herein are used synonymously with the term “including” and variations thereof and are open, non-limiting terms. While implementations will be described for determining an optimal forwarding path across a physical network where FGL and VxLAN networking schemes are implemented, it will become evident to those skilled in the art that the implementations are not limited thereto, but are applicable for determining an optimal forwarding path across a network that implements two different logical networking schemes.
- Methods, systems and devices for determining an optimal forwarding path across a network that implements two different logical networking schemes are provided herein. The methods, systems and devices can compute the total path costs for traffic flowing via a plurality of forwarding paths, while accounting for the differences in the encapsulation overhead associated with the logical networking schemes. Optionally, the path costs over the logical network with the greater encapsulation overhead can be weighted accordingly. After computing the total path costs, the optimal path among the plurality of forwarding paths can be determined and optionally used when the traffic is forwarded over the network.
- Referring now to FIG. 1, a block diagram illustrating an example physical network 10 is shown. For example, the network 10 can be a multi-tenant data center deployment where FGL and VxLAN networking schemes are implemented for network virtualization. The network 10 can include RBridges RB11, RB12, RB13, RB21, RB22 and RB23, physical server pm1 and VxLAN servers 1 and 2. The network 10 shown in FIG. 1 is provided only as an example. A person of ordinary skill in the art may provide the functionalities described herein in a network having more or fewer elements than shown in FIG. 1.
- RBridges are packet-forwarding devices (e.g., switches, bridges, etc.) that are configured to implement the TRILL protocol. The TRILL protocol is well-known in the art and is therefore not discussed in further detail herein.
TRILL links 12 between the RBridges are shown as solid lines in FIG. 1. In addition, each of RBridges RB11, RB12, RB13, RB21, RB22 and RB23 can be configured to support the FGL networking scheme. As discussed above, according to the FGL networking scheme, two inner VLAN tags are used to increase the number of available tenant identifications as compared to the number of tenant identifications available using the TRILL base protocol. RBridges RB12, RB21 and RB22 (e.g., the shaded RBridges in FIG. 1) can be configured to support the VxLAN networking scheme in addition to the FGL networking scheme. Similar to the FGL networking scheme, the VxLAN networking scheme increases the number of available tenant identifications. The FGL and VxLAN networking schemes are optionally implemented in large multi-tenant data centers due to the large number of available tenant identifications. RBridges RB12, RB21 and RB22 are also referred to as “VxLAN gateways” below because RBridges RB12, RB21 and RB22 can interface with both the FGL and VxLAN logical networks.
- As shown in
FIG. 1, three servers are communicatively connected to the network 10 through edge RBridges RB21, RB22 and RB23. Optionally, the servers are connected to the network 10 through classic Ethernet links 14 shown as dotted-dashed lines in FIG. 1. In particular, physical server pm1 is connected to RBridge RB21. It should be understood that physical server pm1 is not configured or capable of performing VxLAN encapsulation/decapsulation. Additionally, VxLAN servers 1 and 2 are connected to the network 10; as discussed below, VxLAN server 1 is connected to RBridge RB22 and its frames are encapsulated by VTEP vtep1.
- When traffic (e.g., a packet, frame, etc.) is forwarded from one server to another (e.g., from physical server pm1 to VxLAN server 1), the traffic can be transported in two formats: natively in FGL and as an overlay in VxLAN. Conceptually, the traffic traverses two logical networks (e.g., the FGL and VxLAN networks) on top of the same
physical network 10. This is shown in FIG. 2, which is a block diagram illustrating the forwarding paths in the two logical networks over the network 10 of FIG. 1. It should be understood that a plurality of (or multiple) forwarding paths exist between physical server pm1 and VxLAN server 1 due to the fact that there are multiple VxLAN gateways (i.e., RB12, RB21 and RB22) in the network 10. Thus, the traffic flowing from physical server pm1 can reach VxLAN server 1 via RBridges RB12, RB21 or RB22 (i.e., the VxLAN gateways). When there are multiple forwarding paths available, it is desirable to configure the RBridges to perform optimal forwarding across the network 10. In other words, it is desirable to configure the RBridges to identify and use the optimal VxLAN gateway when forwarding traffic.
- In the example implementations described below for determining an optimal forwarding path in the
network 10, it is assumed that all links in the network 10 are the same (e.g., 10 G links) and that all links have the same default metric value of 10. Although all links in the network 10 are assumed to be equal for the purpose of the examples, this disclosure contemplates that all of the links in the network 10 may not be equal. It should be understood that in an arbitrary network topology the path costs of the multiple forwarding paths can be different due to the link metric values and/or the hop count. Further, even in a two-tier fat tree network topology with equal link metric values (e.g., the network topology shown in FIG. 1), differences in path costs can exist, for example, due to the differences in the encapsulation overhead incurred by the networking schemes.
- As discussed in further detail below, example techniques for determining an optimal forwarding path are provided with reference to the two-tier fat tree network topology shown in
FIG. 1. This disclosure contemplates that the example techniques are also applicable to arbitrary network topologies. With reference to FIG. 2, the multiple forwarding paths for traffic flowing from physical server pm1 to VxLAN server 1, e.g., via each of RBridges RB12, RB21 and RB22 (e.g., the VxLAN gateways), are illustrated. The FGL paths 22 are shown by dotted-dashed lines and the VxLAN paths 24 are shown by solid lines in FIG. 2. Further, the FGL path costs between physical server pm1 and each of RBridges RB12, RB21 and RB22 are 10, 0 and 20, respectively. For example, there is one hop (e.g., from RBridge RB21 to RBridge RB12) between physical server pm1 and RBridge RB12. It should be understood that in the examples described herein the first hops (e.g., between physical server pm1 and RBridge RB21 and between VxLAN server 1 and RBridge RB22) are ignored because these hops will be the same regardless of the chosen forwarding path. The VxLAN path costs between VxLAN server 1 and each of RBridges RB12, RB21 and RB22 are 10, 20 and 0, respectively. For example, there are two hops (e.g., from RBridge RB21 to RBridge RB12 to RBridge RB22) between RBridge RB21 and VxLAN server 1.
- Considering the two-tier fat tree network topology of
FIG. 1, each of the multiple forwarding paths between physical server pm1 and VxLAN server 1 appears to have the same total path cost (e.g., 20) on the surface. However, because VxLAN encapsulation introduces more overhead than FGL encapsulation, the forwarding path with the fewest hops over the VxLAN (e.g., when RBridge RB22 is the VxLAN gateway) is actually the optimal path. For example, referring now to FIGS. 3A-3B, block diagrams illustrating example frame formats according to networking schemes discussed herein are shown. In FIGS. 3A-3B, the original customer frame (e.g., inner source and destination MAC addresses and packet payload) is shaded. FIG. 3A illustrates an example FGL frame, which adds 32 bytes to the original customer frame. FIG. 3B illustrates an example VxLAN frame, which adds 76 bytes to the original customer frame. The VxLAN tunnel over the network 10, therefore, introduces an additional 44-byte encapsulation overhead per frame as compared to using FGL. Thus, the optimal forwarding path is via RBridge RB22 (e.g., RBridge RB22 acts as the VxLAN gateway). It should be understood that the fields/sizes shown in the example frames of FIGS. 3A-3B are provided only as examples and that the FGL frame and/or the VxLAN frame may have more or fewer fields, or different field sizes, than those shown.
- Path Cost Computation
- To facilitate optimal forwarding across the network 10, the total path cost over the two logical networks (e.g., the FGL and the VxLAN networks) can be computed. Additionally, differences between the frame formats of the two logical networks (e.g., the FGL and VxLAN networks) can be taken into consideration when computing the total path cost. Further, gateway devices (e.g., RBridges RB12, RB21 and RB22 or the VxLAN gateways) can be configured to carry out the total path cost computation because the gateways connect the logical networks.
- As discussed above, the VxLAN gateways such as RBridges RB12, RB21 and RB22, for example, can be configured to carry out the total path cost computation. Further, as discussed above, the VxLAN gateways are RBridges and therefore are configured to implement the TRILL protocol. As such, the VxLAN gateways can learn the network topology by exchanging link state information using the TRILL IS-IS link state protocol. This disclosure contemplates that the VxLAN gateways can optionally use other standard or proprietary protocols for exchanging link state information. Using the link state information, the VxLAN gateways can compute their own path costs to/from any of the RBridges in the
network 10. For example, RBridge RB12 (e.g., one of the VxLAN gateways) can compute its path cost to each of RBridges RB21, RB22 and RB23 as 10 using the link state information. In addition, the VxLAN gateways can compute the path costs of the other VxLAN gateways to/from any of the RBridges in the network 10 if the VxLAN gateways know the RBridge nicknames of the other VxLAN gateways. For example, provided RBridge RB12 (e.g., one of the VxLAN gateways) knows the RBridge nickname associated with RBridge RB21 (e.g., one of the other VxLAN gateways), it can compute the path cost between RBridge RB21 and each of RBridges RB22 and RB23 as 20 using the link state information. Thus, to calculate the total path costs across the two logical networks (e.g., the FGL and VxLAN networks) for traffic flowing between a source node (e.g., physical server pm1) and a destination node (e.g., VxLAN server 1), the VxLAN gateways can determine which RBridges the source and destination nodes are connected to, respectively, and then compute the path costs between the source node and each of the VxLAN gateways and the path costs between each of the VxLAN gateways and the destination node. In FIG. 2, it should be understood that the traffic will traverse the FGL network between the source node (e.g., physical server pm1) and the VxLAN gateway and traverse the VxLAN between the VxLAN gateway and the destination node (e.g., VxLAN server 1).
- To facilitate the VxLAN gateways learning the RBridge nicknames of the other VxLAN gateways in the
network 10, each of the VxLAN gateways can be configured to advertise its respective RBridge nickname and, optionally, the IP address used for VxLAN encapsulation. It should be understood that each of the VxLAN gateways can be associated with a unique identifier (e.g., the RBridge nickname) according to the TRILL protocol. Optionally, the RBridge nickname can be included in a Type Length Value (TLV) in the link state protocol used for disseminating the link state information. This TLV is also referred to as the VxLAN Gateway Information TLV herein. Optionally, the link state protocol can be the TRILL IS-IS link state protocol. The VxLAN Gateway Information TLV can optionally include the IP address used for VxLAN encapsulation, as well as the RBridge nickname. For example, if RBridge RB21 (e.g., one of the VxLAN gateways) announces its RBridge nickname using the VxLAN Gateway Information TLV, then RBridge RB12 (e.g., one of the VxLAN gateways) can compute the path cost for RBridge RB21 to/from the other RBridges in the network 10 in addition to its own path cost to/from the other RBridges in the network 10. Accordingly, a VxLAN gateway can compute path costs between each of the other VxLAN gateways and each of the RBridges in the network 10 provided it knows the RBridge nicknames for the other VxLAN gateways. Additionally, as discussed in detail below, a VxLAN gateway can optionally use the IP address for VxLAN encapsulation when notifying the other RBridges in the network of the optimal VxLAN gateway.
- In addition, to compute the total path cost for each of multiple forwarding paths between the source and destination nodes, the VxLAN gateways can determine the RBridges to which the source and destination nodes, respectively, are connected. The determination is different depending on whether the source or destination node is a physical server (e.g., physical server pm1) or a VxLAN server (e.g.,
VxLAN server 1 or 2). A VxLAN gateway processing traffic from a physical server can determine which RBridge the physical server is connected to via MAC learning. In other words, the VxLAN gateway can determine the RBridge to which the physical server is connected from the physical server's MAC address and RBridge nickname binding using its MAC address table. For example, when the traffic flows from physical server pm1 to VxLAN server 1 through RBridge RB12, the VxLAN gateway (e.g., RBridge RB12) learns the binding between physical server pm1's MAC address and the RBridge nickname associated with ingress RBridge RB21, e.g., the RBridge to which physical server pm1 is connected. Then, using the link state information exchanged through the link state protocol, the VxLAN gateway can compute its own path cost from/to RBridge RB21 as 10. In addition, provided that the VxLAN gateway has obtained the RBridge nicknames of the other VxLAN gateways in the network 10 (e.g., RBridges RB21 and RB22), the VxLAN gateway can also compute path costs of RBridges RB21 and RB22 from/to RBridge RB21 as 0 and 20, respectively.
- The process for determining the RBridge to which a VxLAN server is connected is discussed below. The IP addresses used by the VxLAN gateways (e.g., RBridges RB12, RB21 and RB22) and the VTEPs (e.g., VTEPs vtep1 and vtep2) as the source IP addresses for VxLAN encapsulation are in the same IP subnet. This can be achieved by: (1) putting all VTEPs in the same VLAN and (2) configuring the switch virtual interfaces (“SVIs”) of the VLAN in the VxLAN gateways. For example, VTEPs vtep1 and vtep2 can be configured to transmit VxLAN encapsulated frames in VLAN “X” and the SVIs for VLAN “X” can be configured on RBridges RB12, RB21 and RB22.
The VxLAN gateways can then determine the RBridge to which a VxLAN server is connected through the following bindings: (1) the binding between the MAC address associated with a VxLAN server and the IP address associated with the VTEP (e.g., VxLAN learning), (2) the binding between the IP address associated with the VTEP and the MAC address associated with the VTEP (e.g., ARP), and (3) the binding between the MAC address associated with the VTEP and the RBridge nickname of the ingress RBridge (e.g., MAC learning).
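The chain of three bindings can be sketched as a sequence of table lookups. This is a minimal illustration only: the table names (`vxlan_table`, `arp_table`, `mac_table`), the function `attached_rbridge` and the placeholder address strings are hypothetical and do not come from the disclosure, and real tables would be populated by VxLAN learning, ARP and MAC learning on the gateway.

```python
from typing import Optional

# Hypothetical table contents for illustration only.
vxlan_table = {"mac-vserver1": "ip-vtep1"}             # server MAC -> VTEP IP (VxLAN learning)
arp_table = {"ip-vtep1": "mac-vtep1"}                  # VTEP IP -> VTEP MAC (ARP)
mac_table = {"mac-vtep1": "RB22", "mac-pm1": "RB21"}   # MAC -> ingress RBridge nickname (MAC learning)


def attached_rbridge(server_mac: str) -> Optional[str]:
    """Return the nickname of the RBridge a server is connected to, or None."""
    vtep_ip = vxlan_table.get(server_mac)
    if vtep_ip is None:
        # No VTEP binding: treat the node as a physical server, which is
        # resolved by MAC learning alone.
        return mac_table.get(server_mac)
    vtep_mac = arp_table.get(vtep_ip)
    if vtep_mac is None:
        return None
    return mac_table.get(vtep_mac)


print(attached_rbridge("mac-vserver1"))  # VxLAN server 1, resolved through its VTEP
print(attached_rbridge("mac-pm1"))       # physical server pm1, resolved directly
```

The fallback for a server with no VTEP binding mirrors the physical-server case described earlier, where MAC learning alone identifies the ingress RBridge.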
- For example, when the traffic flows from VxLAN server 1 to physical server pm1 through RBridge RB12, RBridge RB12 can determine that VxLAN server 1 is connected to RBridge RB22 through the following three bindings. First, through VxLAN learning, RBridge RB12 can find the binding of the MAC address associated with the VxLAN server 1 and the IP address associated with VTEP vtep1 using its VxLAN table. Next, because RBridge RB12 is in the same subnet as VTEP vtep1, RBridge RB12 can find the MAC address associated with VTEP vtep1 using its ARP table. Then, through MAC learning, RBridge RB12 can find which RBridge VxLAN server 1 is connected to based on VTEP vtep1's MAC address and RBridge RB22's RBridge nickname using its MAC address table. Using the link state information exchanged through the link state protocol, the VxLAN gateway can compute its own path cost from/to RBridge RB22 as 10. In addition, provided that the VxLAN gateway (e.g., RBridge RB12) has obtained the RBridge nicknames of the other VxLAN gateways in the network 10 (e.g., RBridges RB21 and RB22), the VxLAN gateway can also compute the path costs of RBridges RB21 and RB22 from/to RBridge RB22 as 20 and 0, respectively.
- After computing the total path costs of the multiple forwarding paths between the source and destination nodes (e.g., physical server pm1 and VxLAN server 1), the VxLAN gateway (e.g., RBridge RB12, RB21 or RB22) can determine the optimal forwarding path and the optimal VxLAN gateway. It should be understood that traffic flows from the source node to the VxLAN gateway over the FGL network and from the VxLAN gateway to the destination node over the VxLAN. Alternatively or additionally, it should be understood that traffic flows from the source node to the VxLAN gateway over the VxLAN and from the VxLAN gateway to the destination node over the FGL network. This is shown in
FIG. 2. The optimal forwarding path is the forwarding path having the fewest hops in the logical network having the greater encapsulation overhead. In other words, the optimal forwarding path is chosen such that traffic makes fewer hops in the logical network associated with the larger encapsulation overhead (e.g., the VxLAN) and more hops in the logical network associated with smaller encapsulation overhead (e.g., the FGL network). In the example implementations discussed herein, VxLAN encapsulation overhead exceeds FGL encapsulation overhead. Thus, the optimal VxLAN gateway is RBridge RB22 and the optimal forwarding path is through RBridge RB22.
- Optionally, the VxLAN gateways can be configured to calculate an encapsulation overhead metric. The encapsulation overhead metric (“EO/H”) can optionally be defined as:
$$\text{EO/H} = 1 + \frac{\text{Encapsulation Overhead}}{\text{Average Packet Size}} \qquad (1)$$
- It should be understood that the encapsulation overhead metric provided in Eqn. (1) is provided only as an example and that the encapsulation overhead metric can be defined in other ways. In the examples provided above, the per-frame encapsulation overhead of VxLAN encapsulation exceeds that of FGL encapsulation by 44 bytes. The encapsulation overhead metric calculated using Eqn. (1) is 1.1, assuming an average packet size of 440 bytes. This disclosure contemplates that the average packet size can optionally be more or less than 440 bytes, which is provided only as an example. Then, the total path costs for the multiple forwarding paths can optionally be computed by weighting the path costs (e.g., Weighted Path Cost=EO/H×Path Cost) between each of the VxLAN gateways and the destination node by the encapsulation overhead metric. Table 1 below shows the total path costs computed for the multiple forwarding paths between physical server pm1 and VxLAN server 1 of FIG. 2, assuming an encapsulation overhead of 44 bytes per frame and an average packet size of 440 bytes.
TABLE 1
Forwarding Path | FGL Path Cost | VxLAN Path Cost | EO/H | Total Path Cost |
---|---|---|---|---|
Via RB12 | 10 | 10 | 1.1 | 21 |
Via RB21 | 0 | 20 | 1.1 | 22 |
Via RB22 | 20 | 0 | 1.1 | 20 |
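The totals in Table 1 can be reproduced with a short sketch. It assumes the overhead metric takes the form 1 + 44/440, which matches the stated value of 1.1, and that the total cost is the FGL path cost plus the weighted VxLAN path cost. The names `GatewayPath`, `encapsulation_overhead_metric`, `total_path_cost` and `select_optimal_gateway` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayPath:
    """Path costs for one candidate VxLAN gateway (values from Table 1)."""
    gateway: str
    fgl_cost: float    # source node <-> gateway, over the FGL network
    vxlan_cost: float  # gateway <-> destination node, over the VxLAN


def encapsulation_overhead_metric(extra_bytes_per_frame: float,
                                  avg_packet_size: float) -> float:
    """Assumed metric: 1 + (extra bytes per frame / average packet size)."""
    return 1 + extra_bytes_per_frame / avg_packet_size


def total_path_cost(path: GatewayPath, eoh: float) -> float:
    """FGL cost plus the VxLAN cost weighted by the overhead metric."""
    return path.fgl_cost + eoh * path.vxlan_cost


def select_optimal_gateway(paths, eoh: float) -> str:
    """Return the name of the gateway with the lowest total path cost."""
    return min(paths, key=lambda p: total_path_cost(p, eoh)).gateway


# Example values from the text: 44 extra bytes per frame, 440-byte packets.
EOH = encapsulation_overhead_metric(44, 440)
PATHS = [
    GatewayPath("RB12", fgl_cost=10, vxlan_cost=10),
    GatewayPath("RB21", fgl_cost=0, vxlan_cost=20),
    GatewayPath("RB22", fgl_cost=20, vxlan_cost=0),
]

if __name__ == "__main__":
    for p in PATHS:
        print(p.gateway, round(total_path_cost(p, EOH), 2))
    print("optimal gateway:", select_optimal_gateway(PATHS, EOH))
```

Keeping the metric as a separate function makes it easy to try other average packet sizes, which the text notes can be more or less than 440 bytes.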
As shown above in Table 1, the optimal forwarding path is via RBridge RB22.
Optimal Forwarding Notification
Optionally, upon determining the optimal forwarding path and optimal VxLAN gateway, the VxLAN gateways can be configured to notify the RBridges and VTEPs in the network 10 of which RBridge is the optimal VxLAN gateway. Consider the following initial traffic flow between physical server pm1 and VxLAN server 1 in FIG. 2. First, physical server pm1 sends a unicast frame to VxLAN server 1. Since lookup fails in RBridge RB21, the frame is sent along the distribution tree to all other RBridges in the network 10, including RBridges RB12 and RB22 (e.g., VxLAN gateways). Optionally, for multi-destination frame handling, only distribution tree root RBridge RB12 performs VxLAN encapsulation and transmits the encapsulated frame to the VxLAN IP multicast address. The distribution tree 16 rooted at RBridge RB12 is shown as a dashed line in FIG. 1. Additionally, RBridge RB12 learns the binding between physical server pm1's MAC address and RBridge RB21's RBridge nickname through MAC address learning, and therefore, RBridge RB12 can compute the FGL path costs between physical server pm1 and all of the VxLAN gateways (e.g., RBridges RB12, RB21 and RB22). In addition, VxLAN server 1 responds with a unicast frame to physical server pm1. As discussed above, VTEP vtep1 encapsulates the frame, using RBridge RB12's learned IP address as the destination IP address. After RBridge RB12 receives the frame, RBridge RB12 can learn the binding between VxLAN server 1's MAC address and VTEP vtep1's IP address, for example, through the three bindings discussed above. RBridge RB12 can then compute the VxLAN path costs between VxLAN server 1 and all of the VxLAN gateways (e.g., RBridges RB12, RB21 and RB22). The ability of RBridge RB12 to compute the total path costs for the other VxLAN gateways assumes that RBridge RB12 has learned the RBridge nicknames of the other VxLAN gateways, for example, by exchanging messages using the link state protocol including the VxLAN Gateway Information TLV. Additionally, RBridge RB12 can weight the path costs over the VxLAN because VxLAN encapsulation has a higher encapsulation overhead as compared to FGL encapsulation.
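The three-binding lookup that RBridge RB12 uses to locate VxLAN server 1, followed by the VxLAN path cost computation, can be sketched as below. This is a minimal illustration, not the disclosed implementation: the MAC and IP addresses are made-up documentation values, the tables are plain dictionaries, and the function names are hypothetical. Only the path costs 10, 20 and 0 to RBridge RB22 come from the text.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical learned state on RBridge RB12 (addresses are illustrative only).
VXLAN_SERVER_1_MAC = "00:00:5e:00:53:01"
VTEP1_IP = "192.0.2.10"
VTEP1_MAC = "00:00:5e:00:53:a1"

vxlan_table = {VXLAN_SERVER_1_MAC: VTEP1_IP}  # server MAC -> VTEP IP (VxLAN learning)
arp_table = {VTEP1_IP: VTEP1_MAC}             # VTEP IP -> VTEP MAC (ARP)
mac_table = {VTEP1_MAC: "RB22"}               # VTEP MAC -> RBridge nickname (MAC learning)

# Link-state path costs from each VxLAN gateway to RB22, as given in the text.
link_state_costs: Dict[Tuple[str, str], int] = {
    ("RB12", "RB22"): 10,
    ("RB21", "RB22"): 20,
    ("RB22", "RB22"): 0,
}


def resolve_attachment_rbridge(server_mac: str,
                               vxlan_tbl: Dict[str, str],
                               arp_tbl: Dict[str, str],
                               mac_tbl: Dict[str, str]) -> Optional[str]:
    """Chain the three bindings; return None if any binding is not yet learned."""
    vtep_ip = vxlan_tbl.get(server_mac)
    if vtep_ip is None:
        return None
    vtep_mac = arp_tbl.get(vtep_ip)
    if vtep_mac is None:
        return None
    return mac_tbl.get(vtep_mac)


def vxlan_path_costs(attachment: str,
                     costs: Dict[Tuple[str, str], int],
                     gateways: Iterable[str]) -> Dict[str, int]:
    """Path cost over the VxLAN from each gateway to the attachment RBridge."""
    return {gw: costs[(gw, attachment)] for gw in gateways}


attachment = resolve_attachment_rbridge(
    VXLAN_SERVER_1_MAC, vxlan_table, arp_table, mac_table)
costs_to_server = vxlan_path_costs(
    attachment, link_state_costs, ["RB12", "RB21", "RB22"])
```

Returning `None` when a binding is missing reflects that the gateway can only compute these costs after the corresponding learning step has happened.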
Upon computing the total path costs, for example, as shown in Table 1 above, RBridge RB12 determines that it is not in the optimal forwarding path. RBridge RB12 can optionally be configured to notify one or more RBridges in the network 10 to use the optimal forwarding path, e.g., via RBridge RB22, instead of the forwarding path via RBridge RB12. For example, RBridge RB12 can be configured to notify the RBridge to which physical server pm1 is connected (e.g., RBridge RB21) and VxLAN server 1's VTEP (e.g., VTEP vtep1) to use the optimal path via RBridge RB22.
Optionally, an implicit approach is provided below that can be used by a VxLAN gateway to notify an RBridge or a VTEP of the optimal forwarding path. It should be understood that the implicit approach does not require any protocol changes. A VxLAN gateway can encapsulate FGL frames using the desired optimal VxLAN gateway's RBridge nickname as the ingress RBridge nickname. Thus, the RBridge to which the physical server is connected can learn the binding between the desired MAC address and RBridge nickname and redirect traffic to the optimal VxLAN gateway. For example, RBridge RB12 (e.g., a non-optimal VxLAN gateway) can decapsulate VxLAN frames from VxLAN server 1 and can encapsulate the frames with FGL headers using the RBridge nickname of RBridge RB22 (e.g., an optimal VxLAN gateway), instead of its own, as the ingress RBridge nickname. Then, the RBridge to which physical server pm1 is connected (e.g., RBridge RB21) can learn the desired binding between VxLAN server 1's MAC address and RBridge RB22's RBridge nickname and redirect traffic to RBridge RB22. Additionally, a VxLAN gateway can encapsulate VxLAN frames using the desired optimal VxLAN gateway's IP address as the source IP address. The VTEP can learn the desired binding between the MAC address and IP address and redirect the traffic to the optimal VxLAN gateway. For example, RBridge RB12 (e.g., a non-optimal VxLAN gateway) can decapsulate FGL frames from physical server pm1 and can encapsulate the frames with VxLAN headers using the IP address of RBridge RB22 (e.g., an optimal VxLAN gateway), instead of its own, as the source IP address. Then, VTEP vtep1 can learn the desired binding between physical server pm1's MAC address and RBridge RB22's IP address and redirect the traffic to RBridge RB22.
Optionally, an explicit approach is provided below that can be used by a VxLAN gateway to notify an RBridge or a VTEP of the optimal forwarding path. Although the explicit approach requires a protocol change, it provides the benefit of fast rerouting when a VxLAN gateway in the optimal forwarding path fails. A VxLAN gateway can use the TRILL ESADI to notify an RBridge to which the physical server is connected of the plurality of bindings between the VxLAN server's MAC address and RBridge nicknames of the VxLAN gateways and the associated VxLAN path costs. In other words, using ESADI, a VxLAN gateway can notify the RBridge to which the physical server is connected of a plurality of bindings with associated VxLAN path costs so that the RBridge can switch to the next-best VxLAN gateway if the optimal VxLAN gateway is detected as unreachable by the link state protocol.
The VxLAN gateway can be configured to use a modified MAC Reachability TLV, i.e., a VxLAN MAC Reachability TLV. The VxLAN MAC Reachability TLV can include a list of tuples, including but not limited to, one or more VxLAN server MAC addresses and associated VxLAN gateway RBridge nicknames and VxLAN path costs. When the RBridge receives the VxLAN MAC Reachability TLV, it can compute the total path costs based on its FGL path costs to VxLAN gateways and the advertised VxLAN path costs. For example, RBridge RB12 can use the VxLAN MAC Reachability TLV to announce the bindings of VxLAN server 1's MAC address and the RBridge nicknames of RBridges RB12, RB21 and RB22 (e.g., the VxLAN gateways) with respective VxLAN path costs of 10, 20 and 0. When the RBridge to which physical server pm1 is connected (e.g., RBridge RB21) receives the VxLAN MAC Reachability TLV, it can compute total path costs via RBridges RB12, RB21 and RB22 as 21, 22 and 20, respectively, based on its FGL path costs to RBridges RB12, RB21 and RB22 of 10, 0 and 20, respectively, and the advertised VxLAN path costs of 10, 20 and 0. RBridge RB21 can then redirect the traffic to RBridge RB22 because it is the VxLAN gateway associated with the lowest total path cost.
Additionally, a VxLAN gateway can use a control protocol (e.g., VxLAN Gateway Address Distribution Information (VGADI)) to notify a VTEP of the plurality of bindings between a physical server's MAC address and VxLAN gateway IP addresses of the VxLAN gateways and the associated total path costs. For example, according to VGADI, a VxLAN gateway can unicast its protocol data units (“PDUs”) to the IP address of the intended VTEP. Each PDU can carry a VxLAN Gateway Reachability TLV, which includes a list of tuples, including but not limited to, one or more physical server MAC addresses and associated VxLAN gateway IP addresses and total path costs. For example, RBridge RB12 can use the VxLAN Gateway Reachability TLV to inform VTEP vtep1 of the bindings between physical server pm1's MAC address and the IP addresses of RBridges RB12, RB21 and RB22 (e.g., the VxLAN gateways) with respective total path costs of 21, 22 and 20. VTEP vtep1 can then redirect the traffic to RBridge RB22 because it is the optimal VxLAN gateway associated with the lowest total path cost.
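The explicit approach on the RBridge side can be sketched as follows: the receiver combines its own FGL path costs with the advertised VxLAN path costs, ranks the gateways, and falls back to the next-best gateway when the optimal one becomes unreachable. The tuple layout, the function names and the MAC address are hypothetical; the text does not define a wire format. The sketch also assumes the receiver applies the example weight of 1.1 to the advertised VxLAN costs, since that is what yields the stated totals of 21, 22 and 20.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class ReachabilityTuple(NamedTuple):
    """One hypothetical entry of a VxLAN MAC Reachability TLV."""
    server_mac: str
    gateway_nickname: str
    vxlan_path_cost: float


def rank_gateways(tuples: Iterable[ReachabilityTuple],
                  fgl_costs: Dict[str, float],
                  eoh: float,
                  server_mac: str) -> List[str]:
    """Order gateways for one server MAC by total path cost, lowest first."""
    totals = []
    for t in tuples:
        if t.server_mac != server_mac or t.gateway_nickname not in fgl_costs:
            continue
        total = fgl_costs[t.gateway_nickname] + eoh * t.vxlan_path_cost
        totals.append((total, t.gateway_nickname))
    return [nickname for _, nickname in sorted(totals)]


def pick_gateway(ranked: List[str], reachable: Set[str]) -> Optional[str]:
    """Return the best-ranked gateway that the link state protocol still reaches."""
    for nickname in ranked:
        if nickname in reachable:
            return nickname
    return None


SERVER_MAC = "00:00:5e:00:53:01"  # illustrative address for VxLAN server 1
ADVERTISED = [
    ReachabilityTuple(SERVER_MAC, "RB12", 10),
    ReachabilityTuple(SERVER_MAC, "RB21", 20),
    ReachabilityTuple(SERVER_MAC, "RB22", 0),
]
FGL_COSTS_FROM_RB21 = {"RB12": 10, "RB21": 0, "RB22": 20}

RANKED = rank_gateways(ADVERTISED, FGL_COSTS_FROM_RB21, 1.1, SERVER_MAC)
```

Because the full ranking is kept rather than only the winner, the switch to the next-best gateway needs no new advertisement, which is the fast-rerouting benefit the text attributes to the explicit approach.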
It should be appreciated that the logical operations described herein with respect to the various figures may be implemented (1) as a sequence of computer implemented acts or program modules (i.e., software) running on a computing device, (2) as interconnected machine logic circuits or circuit modules (i.e., hardware) within the computing device and/or (3) a combination of software and hardware of the computing device. Thus, the logical operations discussed herein are not limited to any specific combination of hardware and software. The implementation is a matter of choice dependent on the performance and other requirements of the computing device. Accordingly, the logical operations described herein are referred to variously as operations, structural devices, acts, or modules. These operations, structural devices, acts and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. It should also be appreciated that more or fewer operations may be performed than shown in the figures and described herein. These operations may also be performed in a different order than those described herein.
Referring now to FIG. 4, a flow diagram illustrating example operations 400 for determining an optimal forwarding path across a network is shown. The network can be a network including RBridges configured to implement both FGL and VxLAN networking schemes, e.g., the network 10 shown in FIG. 1. RBridges configured to implement both FGL and VxLAN networking schemes are VxLAN gateways. As discussed above, the example operations 400 can be carried out by a VxLAN gateway. At 402, one or more RBridge nicknames can be learned. As discussed above, each RBridge nickname is uniquely associated with one of the VxLAN gateways in the network. At 404, a path cost over the FGL network between each of the VxLAN gateways and a source node is determined. Additionally, at 406, a path cost over the VxLAN between each of the VxLAN gateways and a destination node is determined. At 408, an encapsulation overhead metric associated with switching packets over the VxLAN can be determined. Then, at 410, one of the VxLAN gateways can be selected as an optimal VxLAN gateway. The selection can be based on the path cost over the FGL network between each of the VxLAN gateways and the source node, the path cost over the VxLAN between each of the VxLAN gateways and the destination node and the encapsulation overhead metric. Optionally, after selecting an optimal VxLAN gateway, one or more RBridges in the network can be notified of the selection. This allows the RBridges to redirect traffic via the optimal VxLAN gateway.
When the logical operations described herein are implemented in software, the process may execute on any type of computing architecture or platform. For example, referring to FIG. 5, an example computing device upon which embodiments of the invention may be implemented is illustrated. In particular, the RBridges and servers discussed above may be a computing device, such as computing device 500 shown in FIG. 5. The computing device 500 may include a bus or other communication mechanism for communicating information among various components of the computing device 500. In its most basic configuration, computing device 500 typically includes at least one processing unit 506 and system memory 504. Depending on the exact configuration and type of computing device, system memory 504 may be volatile (such as random access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in FIG. 5 by dashed line 502. The processing unit 506 may be a standard programmable processor that performs arithmetic and logic operations necessary for operation of the computing device 500. Alternatively or additionally, the processing unit 506 can be an ASIC.
Computing device 500 may have additional features/functionality. For example, computing device 500 may include additional storage such as removable storage 508 and non-removable storage 510 including, but not limited to, magnetic or optical disks or tapes. Computing device 500 may also contain network connection(s) 516 that allow the device to communicate with other devices. Computing device 500 may also have input device(s) 514 such as a keyboard, mouse, touch screen, etc. Output device(s) 512 such as a display, speakers, printer, etc. may also be included. The additional devices may be connected to the bus in order to facilitate communication of data among the components of the computing device 500. All these devices are well known in the art and need not be discussed at length here.
The processing unit 506 may be configured to execute program code encoded in tangible, computer-readable media. Computer-readable media refers to any media that is capable of providing data that causes the computing device 500 (i.e., a machine) to operate in a particular fashion. Various computer-readable media may be utilized to provide instructions to the processing unit 506 for execution. Common forms of computer-readable media include, for example, magnetic media, optical media, physical media, memory chips or cartridges, a carrier wave, or any other medium from which a computer can read. Example computer-readable media may include, but are not limited to, volatile media, non-volatile media and transmission media. Volatile and non-volatile media may be implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data and common forms are discussed in detail below. Transmission media may include coaxial cables, copper wires and/or fiber optic cables, as well as acoustic or light waves, such as those generated during radio-wave and infra-red data communication. Example tangible, computer-readable recording media include, but are not limited to, an integrated circuit (e.g., field-programmable gate array or application-specific IC), a hard disk, an optical disk, a magneto-optical disk, a floppy disk, a magnetic tape, a holographic storage medium, a solid-state device, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
In an example implementation, the processing unit 506 may execute program code stored in the system memory 504. For example, the bus may carry data to the system memory 504, from which the processing unit 506 receives and executes instructions. The data received by the system memory 504 may optionally be stored on the removable storage 508 or the non-removable storage 510 before or after execution by the processing unit 506.
Computing device 500 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by device 500 and includes both volatile and non-volatile media, removable and non-removable media. Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. System memory 504, removable storage 508, and non-removable storage 510 are all examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 500. Any such computer storage media may be part of computing device 500.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination thereof. Thus, the methods and apparatuses of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computing device, the machine becomes an apparatus for practicing the presently disclosed subject matter. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like. Such programs may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language and it may be combined with hardware implementations.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (20)
1. A method for determining an optimal forwarding path across a network, the network including a plurality of gateways configured to implement respective networking protocols for switching packets over a first logical network and a second logical network, the method comprising:
determining a path cost over the first logical network between each of the gateways and a source node, wherein the first logical network is a Transparent Interconnect of Lots of Links (“TRILL”) fine-grained labeling (“FGL”) network;
determining a path cost over the second logical network between each of the gateways and a destination node;
determining an encapsulation cost difference between switching packets over the second logical network and switching packets over the TRILL FGL network;
determining an encapsulation overhead metric associated with switching packets over the second logical network, wherein the encapsulation overhead metric is proportional to the encapsulation cost difference;
weighting the path cost over the second logical network between each of the gateways and the destination node by the encapsulation overhead metric; and
selecting one of the gateways as an optimal gateway, wherein the selection is based on the path cost over the TRILL FGL network between each of the gateways and the source node and the weighted path cost over the second logical network between each of the gateways and the destination node.
2. The method of claim 1, further comprising learning one or more RBridge nicknames, each RBridge nickname being uniquely associated with one of the gateways in the network, wherein learning one or more RBridge nicknames further comprises transmitting or receiving a message using a link state protocol, the message comprising at least one of an RBridge nickname and an IP address associated with one of the gateways in the network.
3. The method of claim 1, wherein the source node comprises a physical server, and the method further comprises determining an RBridge to which the physical server is connected using a media access control (“MAC”) address table, wherein the path cost over the TRILL FGL network between each of the gateways and the source node is determined as a path cost over the TRILL FGL network between each of the gateways and the RBridge to which the physical server is connected.
4. The method of claim 1, further comprising notifying at least one of an RBridge to which the source node is connected and an RBridge to which the destination node is connected of the optimal gateway.
5. The method of claim 4, wherein notifying at least one of an RBridge to which the source node is connected and an RBridge to which the destination node is connected of the optimal gateway further comprises:
encapsulating a frame with at least one of an RBridge nickname or an IP address associated with the optimal gateway; and
transmitting the encapsulated frame.
6. The method of claim 4, wherein notifying at least one of an RBridge to which the source node is connected of the optimal gateway further comprises advertising a plurality of bindings between a MAC address associated with the destination node and RBridge nicknames and path costs associated with the gateways in the network.
7. The method of claim 4, wherein notifying at least one of an RBridge to which the destination node is connected of the optimal gateway further comprises advertising a plurality of bindings between a MAC address associated with the source node and IP addresses and path costs associated with the gateways in the network.
8. The method of claim 1, wherein the second logical network is a VxLAN.
9. A non-transitory computer-readable recording medium having computer-executable instructions stored thereon for determining an optimal forwarding path across a network, the network including a plurality of gateways configured to implement respective networking protocols for switching packets over a first logical network and a second logical network, that, when executed by a gateway, cause the gateway to:
determine a path cost over the first logical network between each of the gateways and a source node, wherein the first logical network is a Transparent Interconnect of Lots of Links (“TRILL”) fine-grained labeling (“FGL”) network;
determine a path cost over the second logical network between each of the gateways and a destination node;
determine an encapsulation cost difference between switching packets over the second logical network and switching packets over the TRILL FGL network;
determine an encapsulation overhead metric associated with switching packets over the second logical network, wherein the encapsulation overhead metric is proportional to the encapsulation cost difference;
weight the path cost over the second logical network between each of the gateways and the destination node by the encapsulation overhead metric; and
select one of the gateways as an optimal gateway, wherein the selection is based on the path cost over the TRILL FGL network between each of the gateways and the source node and the weighted path cost over the second logical network between each of the gateways and the destination node.
10. The non-transitory computer-readable recording medium of claim 9, having further computer-executable instructions stored thereon that, when executed by the gateway, cause the gateway to learn one or more RBridge nicknames, each RBridge nickname being uniquely associated with one of the gateways in the network, wherein learning one or more RBridge nicknames further comprises transmitting or receiving a message using a link state protocol, the message comprising at least one of an RBridge nickname and an IP address associated with one of the gateways in the network.
11. The non-transitory computer-readable recording medium of claim 9, wherein the source node comprises a physical server, and the non-transitory computer-readable recording medium having further computer-executable instructions stored thereon that, when executed by the gateway, cause the gateway to determine an RBridge to which the physical server is connected using a media access control (“MAC”) address table, wherein the path cost over the TRILL FGL network between each of the gateways and the source node is determined as a path cost over the TRILL FGL network between each of the gateways and the RBridge to which the physical server is connected.
12. The non-transitory computer-readable recording medium of claim 9, having further computer-executable instructions stored thereon that, when executed by the gateway, cause the gateway to notify at least one of an RBridge to which the source node is connected and an RBridge to which the destination node is connected of the optimal gateway.
13. The non-transitory computer-readable recording medium of claim 12, wherein notifying at least one of an RBridge to which the source node is connected and an RBridge to which the destination node is connected of the optimal gateway further comprises:
encapsulating a frame with at least one of an RBridge nickname or an IP address associated with the optimal gateway; and
transmitting the encapsulated frame.
14. The non-transitory computer-readable recording medium of claim 12, wherein notifying at least one of an RBridge to which the source node is connected of the optimal gateway further comprises advertising a plurality of bindings between a MAC address associated with the destination node and RBridge nicknames and path costs associated with the gateways in the network.
15. The non-transitory computer-readable recording medium of claim 12, wherein notifying at least one of an RBridge to which the destination node is connected of the optimal gateway further comprises advertising a plurality of bindings between a MAC address associated with the source node and IP addresses and path costs associated with the gateways in the network.
16. The non-transitory computer-readable recording medium of claim 9, wherein the second logical network is a VxLAN.
17. A method for determining an optimal forwarding path across a network, the network including a plurality of gateways configured to implement respective networking protocols for switching packets over a first logical network and a second logical network, the method comprising:
determining a path cost over the first logical network between each of the gateways and a source node;
determining a path cost over the second logical network between each of the gateways and a destination node;
determining an encapsulation cost difference between switching packets over the second logical network and switching packets over the first logical network;
determining an encapsulation overhead metric associated with switching packets over the second logical network, wherein the encapsulation overhead metric is proportional to the encapsulation cost difference;
weighting the path cost over the second logical network between each of the gateways and the destination node by the encapsulation overhead metric; and
selecting one of the gateways as an optimal gateway, wherein the selection is based on the path cost over the first logical network between each of the gateways and the source node and the weighted path cost over the second logical network between each of the gateways and the destination node.
18. The method of claim 17, wherein the first logical network is a Transparent Interconnect of Lots of Links (“TRILL”) fine-grained labeling (“FGL”) network.
19. The method of claim 18, further comprising learning one or more RBridge nicknames, each RBridge nickname being uniquely associated with one of the gateways in the network.
20. The method of claim 17, wherein the second logical network is a VxLAN.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/947,134 US20160080247A1 (en) | 2013-05-21 | 2015-11-20 | Optimal forwarding in a network implementing a plurality of logical networking schemes |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/898,572 US9203738B2 (en) | 2013-05-21 | 2013-05-21 | Optimal forwarding for trill fine-grained labeling and VXLAN interworking |
US14/947,134 US20160080247A1 (en) | 2013-05-21 | 2015-11-20 | Optimal forwarding in a network implementing a plurality of logical networking schemes |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/898,572 Continuation US9203738B2 (en) | 2013-05-21 | 2013-05-21 | Optimal forwarding for trill fine-grained labeling and VXLAN interworking |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160080247A1 true US20160080247A1 (en) | 2016-03-17 |
Family
ID=51935348
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/898,572 Active 2033-12-14 US9203738B2 (en) | 2013-05-21 | 2013-05-21 | Optimal forwarding for trill fine-grained labeling and VXLAN interworking |
US14/947,134 Abandoned US20160080247A1 (en) | 2013-05-21 | 2015-11-20 | Optimal forwarding in a network implementing a plurality of logical networking schemes |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/898,572 Active 2033-12-14 US9203738B2 (en) | 2013-05-21 | 2013-05-21 | Optimal forwarding for trill fine-grained labeling and VXLAN interworking |
Country Status (1)
Country | Link |
---|---|
US (2) | US9203738B2 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10412047B2 (en) * | 2017-08-17 | 2019-09-10 | Arista Networks, Inc. | Method and system for network traffic steering towards a service device |
US10721651B2 (en) | 2017-09-29 | 2020-07-21 | Arista Networks, Inc. | Method and system for steering bidirectional network traffic to a same service device |
US10749789B2 (en) | 2018-12-04 | 2020-08-18 | Arista Networks, Inc. | Method and system for inspecting broadcast network traffic between end points residing within a same zone |
US10764234B2 (en) | 2017-10-31 | 2020-09-01 | Arista Networks, Inc. | Method and system for host discovery and tracking in a network using associations between hosts and tunnel end points |
US10848457B2 (en) | 2018-12-04 | 2020-11-24 | Arista Networks, Inc. | Method and system for cross-zone network traffic between different zones using virtual network identifiers and virtual layer-2 broadcast domains |
US10855733B2 (en) | 2018-12-04 | 2020-12-01 | Arista Networks, Inc. | Method and system for inspecting unicast network traffic between end points residing within a same zone |
US10917342B2 (en) | 2018-09-26 | 2021-02-09 | Arista Networks, Inc. | Method and system for propagating network traffic flows between end points based on service and priority policies |
Families Citing this family (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9374323B2 (en) * | 2013-07-08 | 2016-06-21 | Futurewei Technologies, Inc. | Communication between endpoints in different VXLAN networks |
US9910686B2 (en) | 2013-10-13 | 2018-03-06 | Nicira, Inc. | Bridging between network segments with a logical router |
US9647883B2 (en) | 2014-03-21 | 2017-05-09 | Nicria, Inc. | Multiple levels of logical routers |
US9893988B2 (en) | 2014-03-27 | 2018-02-13 | Nicira, Inc. | Address resolution using multiple designated instances of a logical router |
US9509603B2 (en) * | 2014-03-31 | 2016-11-29 | Arista Networks, Inc. | System and method for route health injection using virtual tunnel endpoints |
CN105306613A (en) * | 2014-07-24 | 2016-02-03 | 中兴通讯股份有限公司 | MAC address notification method and device and acquisition device for ESADI |
CN105515999B (en) * | 2014-09-24 | 2020-05-19 | 中兴通讯股份有限公司 | Quick convergence method and device for end system address distribution information protocol |
CN104243318B (en) * | 2014-09-29 | 2018-10-09 | 新华三技术有限公司 | MAC address learning method and device in VXLAN networks |
US10511458B2 (en) | 2014-09-30 | 2019-12-17 | Nicira, Inc. | Virtual distributed bridging |
US10250443B2 (en) | 2014-09-30 | 2019-04-02 | Nicira, Inc. | Using physical location to modify behavior of a distributed virtual network element |
US9853873B2 (en) | 2015-01-10 | 2017-12-26 | Cisco Technology, Inc. | Diagnosis and throughput measurement of fibre channel ports in a storage area network environment |
US9787605B2 (en) | 2015-01-30 | 2017-10-10 | Nicira, Inc. | Logical router with multiple routing components |
US9900250B2 (en) | 2015-03-26 | 2018-02-20 | Cisco Technology, Inc. | Scalable handling of BGP route information in VXLAN with EVPN control plane |
US10222986B2 (en) | 2015-05-15 | 2019-03-05 | Cisco Technology, Inc. | Tenant-level sharding of disks with tenant-specific storage modules to enable policies per tenant in a distributed storage system |
US10063467B2 (en) | 2015-05-18 | 2018-08-28 | Cisco Technology, Inc. | Virtual extensible local area network performance routing |
US11588783B2 (en) | 2015-06-10 | 2023-02-21 | Cisco Technology, Inc. | Techniques for implementing IPV6-based distributed storage space |
US10348625B2 (en) | 2015-06-30 | 2019-07-09 | Nicira, Inc. | Sharing common L2 segment in a virtual distributed router environment |
US10778765B2 (en) | 2015-07-15 | 2020-09-15 | Cisco Technology, Inc. | Bid/ask protocol in scale-out NVMe storage |
US10129142B2 (en) | 2015-08-11 | 2018-11-13 | Nicira, Inc. | Route configuration for logical router |
US10057157B2 (en) | 2015-08-31 | 2018-08-21 | Nicira, Inc. | Automatically advertising NAT routes between logical routers |
CN106559325B (en) * | 2015-09-25 | 2020-06-09 | 华为技术有限公司 | Path detection method and device |
US10095535B2 (en) | 2015-10-31 | 2018-10-09 | Nicira, Inc. | Static route types for logical routers |
US9892075B2 (en) | 2015-12-10 | 2018-02-13 | Cisco Technology, Inc. | Policy driven storage in a microserver computing environment |
US10536297B2 (en) * | 2016-03-29 | 2020-01-14 | Arista Networks, Inc. | Indirect VXLAN bridging |
CN107332812B (en) * | 2016-04-29 | 2020-07-07 | 新华三技术有限公司 | Method and device for realizing network access control |
US10140172B2 (en) | 2016-05-18 | 2018-11-27 | Cisco Technology, Inc. | Network-aware storage repairs |
CN106101008B (en) * | 2016-05-31 | 2019-08-06 | 新华三技术有限公司 | A kind of transmission method and device of message |
US20170351639A1 (en) | 2016-06-06 | 2017-12-07 | Cisco Technology, Inc. | Remote memory access using memory mapped addressing among multiple compute nodes |
US10664169B2 (en) | 2016-06-24 | 2020-05-26 | Cisco Technology, Inc. | Performance of object storage system by reconfiguring storage devices based on latency that includes identifying a number of fragments that has a particular storage device as its primary storage device and another number of fragments that has said particular storage device as its replica storage device |
US10153973B2 (en) | 2016-06-29 | 2018-12-11 | Nicira, Inc. | Installation of routing tables for logical router in route server mode |
US11563695B2 (en) | 2016-08-29 | 2023-01-24 | Cisco Technology, Inc. | Queue protection using a shared global memory reserve |
US10454758B2 (en) * | 2016-08-31 | 2019-10-22 | Nicira, Inc. | Edge node cluster network redundancy and fast convergence using an underlay anycast VTEP IP |
CN106302258B (en) * | 2016-09-08 | 2019-06-04 | 杭州迪普科技股份有限公司 | A kind of message forwarding method and device |
WO2018058104A1 (en) | 2016-09-26 | 2018-03-29 | Nant Holdings Ip, Llc | Virtual circuits in cloud networks |
US10545914B2 (en) | 2017-01-17 | 2020-01-28 | Cisco Technology, Inc. | Distributed object storage |
US10243823B1 (en) | 2017-02-24 | 2019-03-26 | Cisco Technology, Inc. | Techniques for using frame deep loopback capabilities for extended link diagnostics in fibre channel storage area networks |
US10713203B2 (en) | 2017-02-28 | 2020-07-14 | Cisco Technology, Inc. | Dynamic partition of PCIe disk arrays based on software configuration / policy distribution |
US10254991B2 (en) | 2017-03-06 | 2019-04-09 | Cisco Technology, Inc. | Storage area network based extended I/O metrics computation for deep insight into application performance |
US10303534B2 (en) | 2017-07-20 | 2019-05-28 | Cisco Technology, Inc. | System and method for self-healing of application centric infrastructure fabric memory |
US10686734B2 (en) | 2017-09-26 | 2020-06-16 | Hewlett Packard Enterprise Development Lp | Network switch with interconnected member nodes |
US10404596B2 (en) | 2017-10-03 | 2019-09-03 | Cisco Technology, Inc. | Dynamic route profile storage in a hardware trie routing table |
US10942666B2 (en) | 2017-10-13 | 2021-03-09 | Cisco Technology, Inc. | Using network device replication in distributed storage clusters |
US10374827B2 (en) | 2017-11-14 | 2019-08-06 | Nicira, Inc. | Identifier that maps to different networks at different datacenters |
US10511459B2 (en) | 2017-11-14 | 2019-12-17 | Nicira, Inc. | Selection of managed forwarding element for bridge spanning multiple datacenters |
CN112702251B (en) * | 2019-10-22 | 2022-09-23 | 华为技术有限公司 | Message detection method, connectivity negotiation relationship establishment method and related equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100172249A1 (en) * | 2005-11-02 | 2010-07-08 | Hang Liu | Method for Determining a Route in a Wireless Mesh Network Using a Metric Based On Radio and Traffic Load |
US8102781B2 (en) * | 2008-07-31 | 2012-01-24 | Cisco Technology, Inc. | Dynamic distribution of virtual machines in a communication network |
US20120076150A1 (en) * | 2010-09-23 | 2012-03-29 | Radia Perlman | Controlled interconnection of networks using virtual nodes |
US20130259050A1 (en) * | 2010-11-30 | 2013-10-03 | Donald E. Eastlake, III | Systems and methods for multi-level switching of data frames |
US20130332602A1 (en) * | 2012-06-06 | 2013-12-12 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
US20140029437A1 (en) * | 2012-07-24 | 2014-01-30 | Fujitsu Limited | Information processing system, information processing method, and relay apparatus |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4816957B2 (en) * | 2007-03-07 | 2011-11-16 | 日本電気株式会社 | Relay device, route selection system, route selection method, and program |
US9054999B2 (en) * | 2012-05-09 | 2015-06-09 | International Business Machines Corporation | Static TRILL routing |
US9380132B2 (en) * | 2011-06-27 | 2016-06-28 | Marvell Israel (M.I.S.L.) Ltd. | FCoE over trill |
WO2013117166A1 (en) * | 2012-02-08 | 2013-08-15 | Hangzhou H3C Technologies Co., Ltd. | Implement equal cost multiple path of trill network |
US9614759B2 (en) * | 2012-07-27 | 2017-04-04 | Dell Products L.P. | Systems and methods for providing anycast MAC addressing in an information handling system |
US9401862B2 (en) * | 2013-02-07 | 2016-07-26 | Dell Products L.P. | Optimized internet small computer system interface path |
JP6217138B2 (en) * | 2013-05-22 | 2017-10-25 | 富士通株式会社 | Packet transfer apparatus and packet transfer method |
US9203749B2 (en) * | 2013-05-29 | 2015-12-01 | Cisco Technology, Inc. | System, devices and methods for facilitating coexistence of VLAN labeling and fine-grained labeling RBridges |
US9565105B2 (en) * | 2013-09-04 | 2017-02-07 | Cisco Technology, Inc. | Implementation of virtual extensible local area network (VXLAN) in top-of-rack switches in a network environment |
- 2013-05-21: US application US13/898,572, published as US9203738B2 (status: Active)
- 2015-11-20: US application US14/947,134, published as US20160080247A1 (status: Abandoned)
Non-Patent Citations (1)
Title |
---|
Michael Barbehenn; A Note on the Complexity of Dijkstra’s Algorithm for Graphs with Weighted Vertices; IEEE TRANSACTIONS ON COMPUTERS, VOL. 47, NO. 2, FEBRUARY 1998 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10412047B2 (en) * | 2017-08-17 | 2019-09-10 | Arista Networks, Inc. | Method and system for network traffic steering towards a service device |
US20190356632A1 (en) * | 2017-08-17 | 2019-11-21 | Arista Networks, Inc. | Method and system for network traffic steering towards a service device |
US11012412B2 (en) | 2017-08-17 | 2021-05-18 | Arista Networks, Inc. | Method and system for network traffic steering towards a service device |
US10721651B2 (en) | 2017-09-29 | 2020-07-21 | Arista Networks, Inc. | Method and system for steering bidirectional network traffic to a same service device |
US11277770B2 (en) | 2017-09-29 | 2022-03-15 | Arista Networks, Inc. | Method and system for steering bidirectional network traffic to a same service device |
US10764234B2 (en) | 2017-10-31 | 2020-09-01 | Arista Networks, Inc. | Method and system for host discovery and tracking in a network using associations between hosts and tunnel end points |
US10917342B2 (en) | 2018-09-26 | 2021-02-09 | Arista Networks, Inc. | Method and system for propagating network traffic flows between end points based on service and priority policies |
US11463357B2 (en) | 2018-09-26 | 2022-10-04 | Arista Networks, Inc. | Method and system for propagating network traffic flows between end points based on service and priority policies |
US10749789B2 (en) | 2018-12-04 | 2020-08-18 | Arista Networks, Inc. | Method and system for inspecting broadcast network traffic between end points residing within a same zone |
US10848457B2 (en) | 2018-12-04 | 2020-11-24 | Arista Networks, Inc. | Method and system for cross-zone network traffic between different zones using virtual network identifiers and virtual layer-2 broadcast domains |
US10855733B2 (en) | 2018-12-04 | 2020-12-01 | Arista Networks, Inc. | Method and system for inspecting unicast network traffic between end points residing within a same zone |
Also Published As
Publication number | Publication date |
---|---|
US9203738B2 (en) | 2015-12-01 |
US20140348166A1 (en) | 2014-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9203738B2 (en) | Optimal forwarding for TRILL fine-grained labeling and VXLAN interworking | |
US9680751B2 (en) | Methods and devices for providing service insertion in a TRILL network | |
EP3497893B1 (en) | Segment routing based on maximum segment identifier depth | |
ES2588739T3 (en) | Method, equipment and system for mapping a service instance | |
US8830998B2 (en) | Separation of edge and routing/control information for multicast over shortest path bridging | |
US9167501B2 (en) | Implementing a 3G packet core in a cloud computer with openflow data and control planes | |
US7408941B2 (en) | Method for auto-routing of multi-hop pseudowires | |
US8345697B2 (en) | System and method for carrying path information | |
EP3197107B1 (en) | Message transmission method and apparatus | |
WO2016165492A1 (en) | Method and apparatus for implementing service function chain | |
US11128489B2 (en) | Maintaining data-plane connectivity between hosts | |
CN104396197B (en) | Selecting between equal-cost shortest paths in 802.1aq networks using separate tie-breakers | |
CN109314666A (en) | Virtual tunnel endpoints for congestion-aware load balancing | |
US20130100858A1 (en) | Distributed switch systems in a trill network | |
CN112868214B (en) | Coordinated load transfer OAM records within packets | |
WO2020173198A1 (en) | Message processing method, message forwarding apparatus, and message processing apparatus | |
EP3528441B1 (en) | Message forwarding | |
CN106170952A (en) | Method and system for deploying a maximally redundant tree in a data network | |
CN107872389B (en) | Method, apparatus, and computer-readable storage medium for service load balancing | |
US11362954B2 (en) | Tunneling inter-domain stateless internet protocol multicast packets | |
CN111740907A (en) | Message transmission method, device, equipment and machine readable storage medium | |
US20250047590A1 (en) | Packet Sending Method, Network Device, and Communication System | |
US20230164070A1 (en) | Packet sending method, device, and system | |
US20130279513A1 (en) | Systems and methods for pseudo-link creation | |
US10164795B1 (en) | Forming a multi-device layer 2 switched fabric using internet protocol (IP)-router / switched networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: YANG, YIBIN; TSAI, CHIAJEN; DONG, LIQIN; AND OTHERS; REEL/FRAME: 037100/0629. Effective date: 20130520 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |