US20170187616A1 - Multi-planed unified switching topologies - Google Patents
- Publication number
- US20170187616A1, US14/982,547
- Authority
- US
- United States
- Prior art keywords
- network
- plane
- connections
- switch
- switches
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/58—Association of routers
- H04L45/583—Stackable routers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/02—Topology update or discovery
- H04L45/04—Interdomain routing, e.g. hierarchical routing
Definitions
- aspects of the present invention generally relate to an apparatus and method for extending the scalability and improving the partitionability of baseline networks for transporting packet traffic from a source endpoint to a destination endpoint.
- aspects of the invention generally relate to apparatus and method to build a large-scale partitionable network by stacking multiple copies of a baseline network.
- aspects of the invention relate to global switches in multiple planes of all-to-all-based networks being stacked and connected via global switches with minimal cost overhead and number of hops.
- aspects of the invention are an apparatus and method for increasing scalability of a network for transporting packet traffic from a source endpoint to a destination endpoint with low per-endpoint (per-server) cost and a small number of hops.
- Embodiments of the invention primarily concern an all-to-all wiring in the baseline topology decomposed into smaller all-to-all components in which each small all-to-all connection is replaced with star topology via a global switch.
- An exemplary method for building a multiple plane unified stacking topology network comprises providing a baseline network comprising endpoints, edge switches, and links, and containing more than one disjoint all-to-all connections that are not contained in a larger all-to-all connection; duplicating the baseline network to form a multiple plane switching topology; providing global switches connecting multiple planes; replacing the links in all or a subset of the all-to-all connections in each plane with a set of star connections, where each of the target all-to-all connections is decomposed into smaller all-to-all connections and replacing each of the smaller all-to-all connections with a star connection of the same size in each plane and where a global switch acts as the center switch of the star connection and each global switch acts as the center switches of star connections in multiple planes; and connecting each global switch directly to edge switches in multiple planes.
- An exemplary multiple plane unified stacking topology network comprises baseline network comprising endpoints, edge switches, and links, and containing more than one disjoint all-to-all connections that are not contained in a larger all-to-all connection; multiple baseline networks forming a multiple plane switching topology; global switches connecting multiple planes; the links in all or a subset of the all-to-all connections in each plane are replaced with a set of star connections, where each of the target all-to-all connections is decomposed into smaller all-to-all connections and replacing each of the smaller all-to-all connections with a star connection of the same size in each plane, and where a global switch acts as the center switch of the star connection and each global switch acts as the center switches of star connections in multiple planes; and each global switch being directly connected to edge switches in multiple planes.
- An exemplary multiple plane grouped unified stacked all-to-all topology network comprises a flat all-to-all baseline network comprising endpoints, edge switches, and links; multiple baseline networks forming a multiple plane switching topology; global switches connecting multiple planes; the links in the all-to-all connection in each plane are replaced with a set of star connections, where the all-to-all connection is decomposed into smaller all-to-all connections with size 3 or larger and replacing each of the smaller all-to-all connections with a star connection of the same size in each plane, and where a global switch acts as the center switch of the star connection and each global switch acts as the center switches of star connections in multiple planes; and each global switch being directly connected to edge switches in multiple planes.
- FIG. 1 a shows an embodiment of an all-to-all network topology.
- FIG. 1 b shows an embodiment of an all-to-all network topology including global switches.
- FIG. 1 c shows an embodiment of a stack of copies of an all-to-all network topology including global switches created by point-to-point unified stacking.
- FIG. 2 shows an aspect of a network topology illustrating a direct routing method.
- FIG. 3 shows an aspect of a network topology illustrating an indirect routing method.
- FIG. 4 shows oversubscribed stacking of all-to-all network topology.
- FIG. 5 shows an aspect of the invention referred to as group unified switching stacking, where the baseline network is a flat all-to-all.
- FIG. 6 a shows an embodiment of a 2D HyperX topology network.
- FIG. 6 b shows an embodiment of a 2D HyperX topology network with global switches on the S links for point-to-point unified stacking.
- FIG. 6 c shows an embodiment of a stack of 2D HyperX topology networks with global switches on the S links, created by point-to-point unified stacking.
- FIG. 7 a shows an embodiment of a 2D HyperX topology network.
- FIG. 7 b shows an embodiment of a 2D HyperX topology network with global switches for grouped unified stacking.
- FIG. 8 shows an embodiment of a stacked 2D HyperX topology network created by grouped unified stacking.
- FIG. 9 is a schematic block diagram of a computer system for practicing various embodiments of the invention.
- Embodiments of the invention include a method to build an apparatus which is a large-scale partitionable network by stacking multiple copies of a baseline network for transporting packet traffic from a source endpoint to a destination endpoint.
- aspects of the invention cover two variations of methods to build a large scale, low diameter, and partitionable network from a baseline network, as well as network topologies that can be built using the methods.
- The first variation is the point-to-point unified stacking method (2-way stacking).
- The second variation is the grouped unified stacking method (3-or-more-way stacking).
- Both of the methods can be applied to a baseline network that contains one or more all-to-all connections, such as flat all-to-all, HyperX, or Dragonfly.
- Embodiments of the invention cover the following cases: the point-to-point unified stacking method applied to a baseline topology other than a flat all-to-all, and the grouped unified stacking method applied to any baseline topology.
- aspects of the invention extend an all-to-all based network topology by creating multiple copies of the topology and stacking the copies using global switches.
- Embodiments of the invention are also useful for increasing the scale of the baseline topology to support more endpoints with a small cost overhead (i.e., number of switch ports and links).
- the point-to-point unified stacking method can build a large scale network by duplicating a baseline network topology and stacking them via global switches, exploiting all-to-all connections in the baseline network.
- The increase in hardware (number of switches and links) and diameter (link hops) caused by this modification is minimal, resulting in good cost and latency.
- The resulting multiple plane network has features that the baseline networks (flat all-to-all, HyperX, Dragonfly) do not typically have: (1) each copy of the baseline network, or plane, can act as an independent partition when the whole network needs to be divided for multiple user tasks, and (2) if there are spare ports on the global switches, new planes could be installed afterward to extend the system scale, without making any changes to the existing links.
- the grouped unified stacking method is similar to the point-to-point method but replaces all-to-all connections with more sophisticated star topologies, increasing the scale (number of end points) of each plane.
- the scalability of the network increases by up to twice, practically 33 percent to 50 percent, without increasing per-endpoint (per-server) cost and number of hops.
- An all-to-all connection in the baseline topology is decomposed into smaller all-to-all connections where each smaller all-to-all connection is replaced with star topology via a global switch.
- the grouped method also has a benefit of system partitionability and extendibility, similar to the point-to-point method.
- However, the grouped method creates fewer planes, yielding less flexibility in partitioning.
- Resulting topologies created using these methods include, but are not limited to, stacked all-to-all, stacked 2D HyperX, and double stacked 2D HyperX.
- the following description will mainly focus on stacked all-to-all and 2D HyperX topologies, although not limited thereto, since they are simple yet important examples.
- An all-to-all connection of size $K$ ($K$ a natural number, $K \ge 2$) is a set of $K(K-1)/2$ links $L_{ij}$ ($i, j$ natural numbers, $0 \le i < j < K$) that connect $K$ switches $S_l$ ($l$ a natural number, $0 \le l < K$) in an all-to-all manner, where the link $L_{ij}$ connects switches $S_i$ and $S_j$.
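The definition above can be made concrete with a short sketch. The function name `all_to_all_links` and the use of integer switch indices are illustrative assumptions, not part of the patent text; the sketch only enumerates the index pairs (i, j) with 0 <= i < j < K and confirms the K(K-1)/2 link count.

```python
from itertools import combinations


def all_to_all_links(k):
    """Return the links L_ij (0 <= i < j < k) of an all-to-all connection of size k."""
    if k < 2:
        raise ValueError("an all-to-all connection needs at least 2 switches")
    # Each link is represented by the index pair (i, j) of the switches it connects.
    return list(combinations(range(k), 2))


links = all_to_all_links(4)
print(links)       # the (i, j) pairs for K = 4
print(len(links))  # K(K-1)/2 = 6 for K = 4
```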
- The center switch can act as the center switch in more than one star connection.
- Example 1 Stacked all-to-all Network with Point-to-Point Unified Stacking
- Every switch in the baseline all-to-all network has N ports.
- First an all-to-all network is built using edge switches 102 .
- Each switch 102, called an edge switch, serves up to N/2 end points 104 and is wired to up to N/2 other edge switches in an all-to-all manner.
- Each edge switch 102 has N/2 end points 104. (Only one set of end points 104 is shown, with only one edge switch 102, for simplicity's sake.)
- Each edge switch 102 is wired to each other edge switch 102 in the all-to-all network 106 .
- In FIG. 1b, a set of switches 110 is inserted on the links that connect the edge switches. These switches 110 are called global switches. Similar to an edge switch, a global switch has N ports, although only two of them are used at this point. Finally, the whole network is duplicated, or stacked 112, to create up to N/2 copies, or planes.
- FIG. 1 c shows three such duplicated networks or planes. The global switches in the same position in these planes are consolidated into a single switch. For example, in FIG. 1 c the three global switches labeled with “A” are really one switch, which has six ports connecting to six edge switches spread over three planes. Similarly, the switches that have the same label (“B”, “C”, “D”, “E”, and “F”) are really the same respective switches, each having six ports.
- the number or quantity of planes (copies) is limited to N/2 because each global switch consumes two ports per plane, thus all N ports are used with N/2 planes.
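The port accounting described above can be checked with a small sketch. It builds the links of a stacked all-to-all network and counts the ports used on each switch. The function name `build_stacked_all_to_all` and the tuple encoding of switches are illustrative assumptions. The chosen parameters (N = 6, three planes) are intended to mirror the six global switches with six ports each described for FIG. 1c.

```python
from collections import Counter
from itertools import combinations


def build_stacked_all_to_all(n_ports, n_planes):
    """Sketch of point-to-point unified stacking on a flat all-to-all baseline.

    Returns a list of (global_switch, (plane, edge_switch)) links.
    The encoding is an assumption made for illustration only.
    """
    if n_ports % 2 or not 1 <= n_planes <= n_ports // 2:
        raise ValueError("need an even n_ports and 1 <= n_planes <= n_ports / 2")
    edges_per_plane = n_ports // 2 + 1
    links = []
    # One global switch per edge-switch pair (i, j); it is shared by all planes.
    for i, j in combinations(range(edges_per_plane), 2):
        for plane in range(n_planes):
            links.append(((i, j), (plane, i)))
            links.append(((i, j), (plane, j)))
    return links


links = build_stacked_all_to_all(n_ports=6, n_planes=3)
global_ports = Counter(g for g, _ in links)
edge_uplinks = Counter(e for _, e in links)
print(len(global_ports))           # 6 global switches
print(set(global_ports.values()))  # {6}: two ports per plane times three planes
print(set(edge_uplinks.values()))  # {3}: N/2 uplinks per edge switch
```

With N/2 planes every global switch uses exactly N ports, which is why no further planes can be added.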
- A stacked all-to-all switching network can scale up to $N^2(N+2)/8 \approx N^3/8$ end points:
- Each plane has (N/2+1) edge switches. There can be up to N/2 such planes.
- N 36 port switches
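As a worked check of the scaling bound, the following sketch multiplies the three factors given in the text (edge switches per plane, end points per edge switch, and planes) and compares the product with the closed form. The function name is an illustrative assumption, and N = 36 is used only as a sample port count.

```python
def stacked_all_to_all_endpoints(n_ports):
    """Maximum end points of a stacked all-to-all network built from n_ports-port switches."""
    if n_ports < 2 or n_ports % 2:
        raise ValueError("n_ports must be an even number >= 2")
    edge_switches_per_plane = n_ports // 2 + 1
    endpoints_per_edge_switch = n_ports // 2
    planes = n_ports // 2
    return edge_switches_per_plane * endpoints_per_edge_switch * planes


n = 36
total = stacked_all_to_all_endpoints(n)
print(total)                          # 6156
print(total == n * n * (n + 2) // 8)  # True: matches N^2 (N + 2) / 8
```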
- a system size (number of end points) could be increased by adding planes. Initially a system can be built with less than N/2 planes. More planes can be added afterward to increase the system size until the number of planes reaches the upper limit of N/2, without affecting the existing wiring.
- a stacked all-to-all network can be partitioned in units of planes without interference among partitions.
- Network traffic within each plane, or a group of planes does not interfere with any other plane because the planes are decoupled by the global switches.
- Various combinations of partition sizes are possible. For example, if there are 4 planes, possible partitioning examples include 2 partitions with 1 plane and 3 planes, 3 partitions with 1 plane × 2 and 2 planes × 1, and 4 partitions of 1 plane each.
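The possible partition sizes can be enumerated mechanically. The sketch below lists every way to split a number of planes into partitions, under the simplifying assumption (not stated in the source) that planes are interchangeable, so only the partition sizes matter. The function name `plane_partitions` is illustrative.

```python
def plane_partitions(planes, largest=None):
    """Yield every split of `planes` planes as a non-increasing tuple of partition sizes."""
    if largest is None:
        largest = planes
    if planes == 0:
        yield ()
        return
    for size in range(min(planes, largest), 0, -1):
        for rest in plane_partitions(planes - size, size):
            yield (size,) + rest


for sizes in plane_partitions(4):
    print(sizes)
# (4,)
# (3, 1)
# (2, 2)
# (2, 1, 1)
# (1, 1, 1, 1)
```

The three examples named in the text correspond to (3, 1), (2, 1, 1), and (1, 1, 1, 1).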
- Deadlock free direct and indirect routing methods are available on a stacked all-to-all network.
- The direct routing path shown in FIG. 2 consists of 4 link hops: Injection, S_up, S_down, and Reception.
- The Injection hop (1) traverses from the source endpoint 204 to the start edge switch 212.
- The second hop, S_up (2), travels over the link from the start edge switch 212 to a global switch 210, labeled E in the figure.
- The next hop, S_down, travels from the global switch 210 to the destination edge switch 206.
- The final hop, Reception, goes from the edge switch 206 to the destination endpoint 208.
- the 3 “E” labeled switches in FIG. 2 are actually a single switch connected to edge switches in each plane.
- The indirect routing path shown in FIG. 3 consists of 6 link hops: Injection, S_up, S_down, S_up, S_down, and Reception.
- An intermediate edge switch 314 is selected.
- The first three link hops, Injection, S_up, and S_down, reach this intermediate edge switch 314 from the source endpoint 304.
- The remaining three link hops, S_up, S_down, and Reception, carry the packet to the final destination endpoint 308.
- the global switches with the same letter label (A, B, C, D, E, and F) are the same switch. This applies to both A labeled switches and F labeled switches in FIG. 3 .
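The two routing methods can be sketched as hop lists. This is a minimal illustration, not the patent's routing implementation: the tuple labels, the function names, and the rule that a global switch is identified by the pair of edge-switch positions it bridges are assumptions. The sketch only covers source and destination edge switches at different positions, which is the case the 4-hop and 6-hop paths describe.

```python
def global_switch(a, b):
    """Hypothetical label for the global switch bridging edge-switch positions a and b."""
    if a == b:
        raise ValueError("this sketch only routes between different edge-switch positions")
    return ("global", min(a, b), max(a, b))


def direct_route(src, dst):
    """Return the 4 link hops; src and dst are (plane, edge_switch, endpoint) tuples."""
    (sp, se, _), (dp, de, dx) = src, dst
    return [
        ("Injection", ("edge", sp, se)),
        ("S_up", global_switch(se, de)),
        ("S_down", ("edge", dp, de)),
        ("Reception", ("endpoint", dp, de, dx)),
    ]


def indirect_route(src, dst, via):
    """Return the 6 link hops through the intermediate edge switch via = (plane, edge_switch)."""
    (sp, se, _), (dp, de, dx) = src, dst
    vp, ve = via
    return [
        ("Injection", ("edge", sp, se)),
        ("S_up", global_switch(se, ve)),
        ("S_down", ("edge", vp, ve)),
        ("S_up", global_switch(ve, de)),
        ("S_down", ("edge", dp, de)),
        ("Reception", ("endpoint", dp, de, dx)),
    ]


src, dst = (0, 0, 1), (2, 3, 0)
for hop in direct_route(src, dst):
    print(hop)
for hop in indirect_route(src, dst, via=(1, 2)):
    print(hop)
```

Because each global switch is wired to edge switches in several planes, an S_down hop may land in a different plane than the one the packet came from.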
- Example 2 Stacked 2D HyperX with Point-to-Point Unified Stacking
- the point-to-point unified stacking method could be applied to any topology that contains all-to-all connections.
- a 2D HyperX network is one such topology and can be stacked using this method as shown in FIG. 6 c described below.
- In FIG. 6a, there is shown a 2D HyperX topology consisting of nine edge switches 602 (each of the two dimensions consists of 3 edge switches).
- S links 606 are in the horizontal direction as viewed in the figures.
- L links 608 are in the vertical direction as viewed in the figures.
- Each switch belongs to two different groups of switches with all-to-all connections within the group: a group in the horizontal direction, and a group in the vertical direction. We can apply stacking to either, or both, of the dimensions. We illustrate this where it is applied to the horizontal direction.
- global switches 610 are inserted on one dimension (S links in the figure), and then multiple copies of the 2D HyperX networks are stacked.
- Each edge switch 602 has N ports.
- each HyperX dimension size is N/3+1.
- Each global switch uses 2 ports per plane.
- A direct route consists of five cable hops (Injection, L, S_up, S_down, and Reception). Indirect routing consists of up to eight cable hops since L, S_up, and S_down can each be repeated up to twice. Similar to the stacked all-to-all topology, 3 VCs are required for fully flexible indirect routing with unrestricted ordering, 2 VCs for indirect routing with restricted ordering, and 1 VC for direct routing. Similar to stacked all-to-all, the stacked HyperX network could be partitioned into multiple planes (or sets of planes) without the partitions interfering with each other. As for modular system growth, the system could initially have a small number of planes (< N/2), and additional planes could be added afterward.
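The sizing of a stacked 2D HyperX network with point-to-point unified stacking can be sketched as follows. The dimension size N/3 + 1 and the two global-switch ports per plane come from the text. The even three-way split of each edge switch's N ports (N/3 end points, N/3 L links, N/3 S uplinks) is an assumption of this sketch, as are the function and key names.

```python
def stacked_hyperx_p2p(n_ports):
    """Size a stacked 2D HyperX (point-to-point stacking on the S dimension)."""
    if n_ports <= 0 or n_ports % 6:
        raise ValueError("this sketch assumes n_ports is a positive multiple of 6")
    endpoints_per_switch = n_ports // 3  # assumption: one third of the ports
    dim = n_ports // 3 + 1               # size of each HyperX dimension
    planes = n_ports // 2                # each global switch uses 2 ports per plane
    switches_per_plane = dim * dim
    return {
        "dimension_size": dim,
        "edge_switches_per_plane": switches_per_plane,
        "planes": planes,
        "endpoints": endpoints_per_switch * switches_per_plane * planes,
    }


print(stacked_hyperx_p2p(6))
# {'dimension_size': 3, 'edge_switches_per_plane': 9, 'planes': 3, 'endpoints': 54}
```

For N = 6 the sketch reproduces the nine edge switches per plane of the 3 × 3 example.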
- Example 3 Stacked all-to-all with Grouped Unified Stacking
- This example covers a simple example of grouped unified stacking, where the baseline network is a flat all-to-all topology.
- the grouped method is an aspect of the invention different from Example 1 where the point-to-point method is applied to a flat all-to-all topology.
- a global switch bridges two existing edge switches in each plane.
- a global switch could bridge three or more edge switches in each plane, which we call “grouped unified stacking” or “multi-way stacking”.
- FIG. 5 shows an example of grouped unified stacking method.
- 3-way all-to-all components are replaced with 3-way star connections to global switches 510 .
- the global switches 510 act as the center switches in the star connections.
- global switch 510 labeled “A” serves three edge switches 502 in a plane 512 , replacing the 3-way all-to-all links among these three edge switches.
- Each plane 512 , 514 has seven edge switches.
- Each global switch 510 bridges three edge switches 502 (rather than two) in each plane. There are seven edge switches 502 on each plane 512, 514, and they are connected with each other via seven global switches 510. Any edge switch 502 can reach any other edge switch 502 via one hop through a global switch 510. Thus the required number of hops is the same as in the point-to-point method.
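One way to obtain such a 3-way grouping for seven edge switches and seven global switches is sketched below. The cyclic construction (global switch g serves edge switches g, g+1, and g+3 modulo 7) is an assumption chosen for illustration and is not claimed to match the labeling in FIG. 5. The check confirms that every pair of edge switches shares exactly one global switch, so any edge switch reaches any other through one global switch.

```python
from itertools import combinations


def grouped_star_groups(v=7, base=(0, 1, 3)):
    """Group g is the set of edge switches served by global switch g (cyclic shifts of base)."""
    return [frozenset((b + g) % v for b in base) for g in range(v)]


def shared_global_switches(groups, a, b):
    """Return the global switches that serve both edge switch a and edge switch b."""
    return [g for g, members in enumerate(groups) if a in members and b in members]


groups = grouped_star_groups()
for g, members in enumerate(groups):
    print(g, sorted(members))

# Every pair of the seven edge switches is bridged by exactly one global switch.
pairs_ok = all(
    len(shared_global_switches(groups, a, b)) == 1
    for a, b in combinations(range(7), 2)
)
print(pairs_ok)  # True

# Each edge switch needs three uplinks, one per global switch that serves it.
print({sum(e in m for m in groups) for e in range(7)})  # {3}
```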
- Each edge switch 502 has three end points 504 with six ports.
- an edge switch could reach two other edge switches on the same plane via one up link port to a global switch. Therefore, more edge switches could be placed in each plane.
- Multi-way stacking is a useful way to build a larger-scale network with a limited number of switch ports.
- the number of planes is reduced since each global switch needs more ports per plane. For this reason, the improvement in terms of scalability is limited.
- N: number of switch ports
- Each edge switch has N/2 uplink ports to N/2 global switches.
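The scalability trade-off of 3-way grouped stacking on a flat all-to-all baseline can be estimated with a short sketch. Several quantities here are inferred rather than quoted: N/2 end points per edge switch (consistent with the six-port example with three end points), 2 × (N/2) + 1 edge switches per plane (each uplink reaches two other edge switches), and N/3 planes (each global switch uses three ports per plane). Treat the results as upper bounds under those assumptions.

```python
def grouped_3way_all_to_all_endpoints(n_ports):
    """Estimated maximum end points with 3-way grouped unified stacking."""
    if n_ports <= 0 or n_ports % 6:
        raise ValueError("this sketch assumes n_ports is a positive multiple of 6")
    uplinks = n_ports // 2
    endpoints_per_edge_switch = n_ports // 2
    edge_switches_per_plane = 2 * uplinks + 1  # each uplink reaches two other edge switches
    planes = n_ports // 3                      # assumption: 3 global-switch ports per plane
    return edge_switches_per_plane * endpoints_per_edge_switch * planes


def point_to_point_all_to_all_endpoints(n_ports):
    """Maximum end points with point-to-point unified stacking, N^2 (N + 2) / 8."""
    return (n_ports // 2 + 1) * (n_ports // 2) * (n_ports // 2)


for n in (6, 36):
    print(n, point_to_point_all_to_all_endpoints(n), grouped_3way_all_to_all_endpoints(n))
# 6 36 42
# 36 6156 7992
```

Under these assumptions the grouped variant approaches N³/6 end points against N³/8 for point-to-point stacking, a ratio of 4/3, which is in line with the 33 percent figure quoted for the grouped method.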
- Example 4 Stacked 2D HyperX with Grouped Unified Stacking
- FIG. 7a shows the baseline 2D HyperX topology, which is an $(N_L+1) \times (N_S+1)$ array of edge switches 702, where $N_L$ is the number of L links and $N_S$ is the number of S links per edge switch. There are all-to-all L links along the vertical dimension, and all-to-all S links along the horizontal dimension.
- In FIG. 7b, not all end points 704 are shown.
- one dimension of the 2D HyperX wiring is replaced with 3-way star connections via global switches 710 , as shown in FIG. 7 b .
- The star connection links 706 along the S dimension replace the original all-to-all wiring along the S dimension. This is similar to the stacked all-to-all with the 3-way grouped method in Example 3.
- Each group of seven edge switches 702 along the S dimension is connected via seven global switches 710. There is a total of 49 global switches. Note that an edge switch now needs only three S links (as opposed to six in the original 2D HyperX).
- FIG. 8 now shows multiple planes in the stacked 2D HyperX topology with 3-way stacking.
- the original 2D HyperX network can be duplicated into up to N/3 planes.
- the global switches 810 in the same position in each plane 812 , 814 are really one switch.
- the “A” switches in each plane 812 and 814 , . . . are only one switch. It is the same for “B”, “C”, “D”, . . . switches.
- Up to N/3 endpoints 804 can be connected to an edge switch 802 , up to N/3+1 edge switches can be placed along L dimension, up to 2N/3+1 edge switches can be placed along S dimension, and up to N/3 planes can be created.
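Multiplying the four limits listed above gives the maximum size of the stacked 2D HyperX network with 3-way grouped stacking. The sketch below does exactly that; the function name is illustrative, and requiring N to be a multiple of 3 is an assumption made so that every factor is an integer.

```python
def grouped_stacked_hyperx_endpoints(n_ports):
    """Maximum end points of a stacked 2D HyperX with 3-way grouped unified stacking."""
    if n_ports <= 0 or n_ports % 3:
        raise ValueError("this sketch assumes n_ports is a positive multiple of 3")
    endpoints_per_edge_switch = n_ports // 3
    l_dimension = n_ports // 3 + 1
    s_dimension = 2 * n_ports // 3 + 1
    planes = n_ports // 3
    return endpoints_per_edge_switch * l_dimension * s_dimension * planes


print(grouped_stacked_hyperx_endpoints(9))   # 3 * 4 * 7 * 3 = 252
print(grouped_stacked_hyperx_endpoints(36))  # 12 * 13 * 25 * 12 = 46800
```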
- A direct routing path is five cable hops (Injection+L+S_up+S_down+Reception).
- An indirect routing path is a maximum of eight cable hops (additional L, S_up and S_down).
- the Stacked 2D HyperX topology could be further stacked using the L links.
- Another set of global switches is inserted on the L links 608 in FIG. 6c, and the whole Stacked 2D HyperX network is further replicated into N/2 copies, connected with the new global switches.
- This allows a very large network (scaling to approximately $N^5/108$ end points with the point-to-point unified stacking method) and many partitions ($N^2/4$), but requires additional cost for extra global switches and links.
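The double-stacked figures can be reproduced with a short calculation. The sketch assumes the same port split as the single-stacked case (N/3 end points, N/3 L uplinks, and N/3 S uplinks per edge switch) and N/2 copies in each of the two stacking directions; these assumptions and the function name are illustrative rather than quoted from the source.

```python
def double_stacked_hyperx(n_ports):
    """Return (end points, partitions) for a double stacked 2D HyperX network."""
    if n_ports <= 0 or n_ports % 6:
        raise ValueError("this sketch assumes n_ports is a positive multiple of 6")
    endpoints_per_edge_switch = n_ports // 3
    dim = n_ports // 3 + 1
    copies = (n_ports // 2) ** 2  # N/2 planes stacked on S, replicated N/2 times on L
    endpoints = endpoints_per_edge_switch * dim * dim * copies
    return endpoints, copies


print(double_stacked_hyperx(6))   # (162, 9)
print(double_stacked_hyperx(36))  # (657072, 324)
```

The leading term of the end point count is (N/3)³ × (N/2)² = N⁵/108, and the number of copies is N²/4, matching the figures in the text.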
- FIG. 4 shows an example of an oversubscribed stacked all-to-all topology. Unlike the original stacked all-to-all, which had six global switches, the oversubscribed network shown in FIG. 4 has only four global switches 410. The remaining links do not have global switches, and hence the edge switches on those links are directly wired within the plane. The "missing" global switches are shown in dotted outline. As a result, the numbers of links and switches are reduced, resulting in lower cost.
- FIG. 9 illustrates a schematic diagram of an example computer or processing system that may implement extending the scalability and improving the partitionability of baseline networks for transporting packet traffic from a source endpoint to a destination endpoint in one exemplary embodiment of the present disclosure.
- the computer system is only one example of a suitable processing system and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the methodology described herein.
- The processing system shown may be operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the processing system shown in FIG. 9 may include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
- the computer system may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system.
- program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types.
- the computer system may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote computer system storage media including memory storage devices.
- the components of computer system may include, but are not limited to, one or more processors or processing units 902 , a system memory 906 , and a bus 904 that couples various system components including system memory 906 to processor 902 .
- the processor 902 may include a module 900 that performs the methods described herein.
- the module 900 may be programmed into the integrated circuits of the processor 902 , or loaded from memory 906 , storage device 908 , or network 914 or combinations thereof.
- Bus 904 may represent one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus.
- Computer system may include a variety of computer system readable media. Such media may be any available media that is accessible by computer system, and it may include both volatile and non-volatile media, removable and non-removable media.
- System memory 906 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) and/or cache memory or others. Computer system may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- storage system 908 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (e.g., a “hard drive”).
- a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”).
- an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media.
- each can be connected to bus 904 by one or more data media interfaces.
- Computer system may also communicate with one or more external devices 916 such as a keyboard, a pointing device, a display 918 , etc.; one or more devices that enable a user to interact with computer system; and/or any devices (e.g., network card, modem, etc.) that enable computer system to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 910 .
- computer system can communicate with one or more networks 914 such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 912 .
- network adapter 912 communicates with the other components of computer system via bus 904 .
- It should be understood that, although not shown, other hardware and/or software components could be used in conjunction with computer system. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems.
- Embodiments of the present invention may be a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Description
- This invention was made with Government support under contract no. DE-AC02-05CH11231 awarded by the U.S. Department of Energy. The Government has certain rights in this invention.
- Aspects of the present invention generally relate to an apparatus and method for extending the scalability and improving the partitionability of baseline networks for transporting packet traffic from a source endpoint to a destination endpoint. Specifically, aspects of the invention generally relate to apparatus and method to build a large-scale partitionable network by stacking multiple copies of a baseline network. More specifically, aspects of the invention relate to global switches in multiple planes of all-to-all-based networks being stacked and connected via global switches with minimal cost overhead and number of hops.
- Aspects of the invention are an apparatus and method for increasing scalability of a network for transporting packet traffic from a source endpoint to a destination endpoint with low per-endpoint (per-server) cost and a small number of hops. Embodiments of the invention primarily concern an all-to-all wiring in the baseline topology decomposed into smaller all-to-all components in which each small all-to-all connection is replaced with star topology via a global switch.
- An exemplary method for building a multiple plane unified stacking topology network comprises providing a baseline network comprising endpoints, edge switches, and links, and containing more than one disjoint all-to-all connections that are not contained in a larger all-to-all connection; duplicating the baseline network to form a multiple plane switching topology; providing global switches connecting multiple planes; replacing the links in all or a subset of the all-to-all connections in each plane with a set of star connections, where each of the target all-to-all connections is decomposed into smaller all-to-all connections and replacing each of the smaller all-to-all connections with a star connection of the same size in each plane and where a global switch acts as the center switch of the star connection and each global switch acts as the center switches of star connections in multiple planes; and connecting each global switch directly to edge switches in multiple planes.
- An exemplary multiple plane unified stacking topology network comprises baseline network comprising endpoints, edge switches, and links, and containing more than one disjoint all-to-all connections that are not contained in a larger all-to-all connection; multiple baseline networks forming a multiple plane switching topology; global switches connecting multiple planes; the links in all or a subset of the all-to-all connections in each plane are replaced with a set of star connections, where each of the target all-to-all connections is decomposed into smaller all-to-all connections and replacing each of the smaller all-to-all connections with a star connection of the same size in each plane, and where a global switch acts as the center switch of the star connection and each global switch acts as the center switches of star connections in multiple planes; and each global switch being directly connected to edge switches in multiple planes.
- An exemplary multiple plane grouped unified stacked all-to-all topology network comprises a flat all-to-all baseline network comprising endpoints, edge switches, and links; multiple baseline networks forming a multiple plane switching topology; global switches connecting multiple planes; the links in the all-to-all connection in each plane are replaced with a set of star connections, where the all-to-all connection is decomposed into smaller all-to-all connections with size 3 or larger and replacing each of the smaller all-to-all connections with a star connection of the same size in each plane, and where a global switch acts as the center switch of the star connection and each global switch acts as the center switches of star connections in multiple planes; and each global switch being directly connected to edge switches in multiple planes.
- The objects, features, and advantages of the present disclosure will become more clearly apparent when the following description is taken in conjunction with the accompanying drawings.
- FIG. 1a shows an embodiment of an all-to-all network topology.
- FIG. 1b shows an embodiment of an all-to-all network topology including global switches.
- FIG. 1c shows an embodiment of a stack of copies of an all-to-all network topology including global switches created by point-to-point unified stacking.
- FIG. 2 shows an aspect of a network topology illustrating a direct routing method.
- FIG. 3 shows an aspect of a network topology illustrating an indirect routing method.
- FIG. 4 shows oversubscribed stacking of all-to-all network topology.
- FIG. 5 shows an aspect of the invention referred to as group unified switching stacking, where the baseline network is a flat all-to-all.
- FIG. 6a shows an embodiment of a 2D HyperX topology network.
- FIG. 6b shows an embodiment of a 2D HyperX topology network with global switches on the S links for point-to-point unified stacking.
- FIG. 6c shows an embodiment of a stack of 2D HyperX topology networks with global switches on the S links, created by point-to-point unified stacking.
- FIG. 7a shows an embodiment of a 2D HyperX topology network.
- FIG. 7b shows an embodiment of a 2D HyperX topology network with global switches for grouped unified stacking.
- FIG. 8 shows an embodiment of a stacked 2D HyperX topology network created by grouped unified stacking.
- FIG. 9 is a schematic block diagram of a computer system for practicing various embodiments of the invention.
- Embodiments of the invention include a method to build an apparatus which is a large-scale partitionable network by stacking multiple copies of a baseline network for transporting packet traffic from a source endpoint to a destination endpoint.
- Aspects of the invention cover two variations of methods to build a large-scale, low-diameter, partitionable network from a baseline network, as well as network topologies that can be built using the methods. The first variation, point-to-point unified stacking (2-way stacking), can be applied flexibly to various baseline network topologies and can create multiple partitions. The second variation, grouped unified stacking (3-or-more-way stacking), places restrictions on the baseline network topology and creates fewer partitions, but can build a larger-scale network (more endpoints) than the point-to-point method. Both methods can be applied to a baseline network that contains one or more all-to-all connections, such as flat all-to-all, HyperX, or Dragonfly.
- Embodiments of the invention cover the following cases: the point-to-point unified stacking method applied to a baseline topology other than a flat all-to-all, and the grouped unified stacking method applied to any baseline topology.
- Existing all-to-all, Dragonfly, and HyperX network topologies have low diameter and good all-to-all communication bandwidth. They exploit all-to-all interconnection or wiring to achieve these benefits with low cost.
- However, all-to-all connections in these topologies have undesirable characteristics. First, the components wired in all-to-all cannot be partitioned efficiently. In a high-performance computing (HPC) system, a large-scale system is often divided into multiple partitions used for different jobs. When an all-to-all network is divided into two equally sized partitions for different independent jobs, about half of the original all-to-all links become idle inter-partition links. As a result, about half of the network bandwidth is lost. It is still possible to use these inter-partition links for intra-partition communication by means of indirect routing, but that causes undesirable inter-job interference. Second, it is hard to add new nodes or switches to an all-to-all topology. To add a new component, it has to be wired to every existing component to maintain the all-to-all wiring.
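The link loss described above can be checked with a short calculation. The following is a minimal sketch, not part of the disclosure; the function names are illustrative assumptions. For an all-to-all connection of K switches split into two halves, the fraction of links that cross the split is K/(2(K−1)), which approaches one half as K grows.

```python
def all_to_all_links(k: int) -> int:
    """Number of links in an all-to-all connection of k switches."""
    return k * (k - 1) // 2


def inter_partition_fraction(k: int) -> float:
    """Fraction of all-to-all links that cross an equal two-way split.

    Assumes k is even so that both partitions hold k/2 switches.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    # Every switch in one half is wired to every switch in the other half.
    crossing = half * half
    return crossing / all_to_all_links(k)


if __name__ == "__main__":
    for k in (4, 16, 64):
        print(f"K={k}: {inter_partition_fraction(k):.3f} of links cross the split")
```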
- To overcome these limitations, aspects of the invention extend an all-to-all based network topology by creating multiple copies of the topology and stacking the copies using global switches. Embodiments of the invention are also useful for increasing the scale of the baseline topology to support more endpoints with small cost overhead (i.e., number of switch ports and links).
- The point-to-point unified stacking method can build a large-scale network by duplicating a baseline network topology and stacking the copies via global switches, exploiting the all-to-all connections in the baseline network. The increase in hardware (number of switches and links) and diameter (link hops) from this modification is minimal, resulting in good cost and latency. In addition, the resulting multiple plane network has features that the baseline networks (flat all-to-all, HyperX, Dragonfly) do not typically have: (1) each copy of the baseline network, or plane, can act as an independent partition when the whole network needs to be divided for multiple user tasks, and (2) if there are spare ports on the global switches, new planes could be installed afterward to extend the system scale, without making any changes to the existing links.
- The grouped unified stacking method is similar to the point-to-point method but replaces all-to-all connections with more sophisticated star topologies, increasing the scale (number of end points) of each plane.
- The scalability of the network increases by up to a factor of two (in practice, by 33 percent to 50 percent) without increasing the per-endpoint (per-server) cost or the number of hops.
- An all-to-all connection in the baseline topology is decomposed into smaller all-to-all connections, where each smaller all-to-all connection is replaced with a star topology via a global switch. The grouped method also has the benefits of system partitionability and extendibility, similar to the point-to-point method. However, there are restrictions on the baseline network due to the decomposition and replacement steps of all-to-all connections. In addition, the grouped method creates fewer planes, yielding less flexibility in partitioning.
- Resulting topologies created using these methods include, but are not limited to, stacked all-to-all, stacked 2D HyperX, and double stacked 2D HyperX. The following description will mainly focus on stacked all-to-all and 2D HyperX topologies, although not limited thereto, since they are simple yet important examples.
- An all-to-all connection of size $K$ ($K$ natural, $K \ge 2$) is a set of $K(K-1)/2$ links $L_{ij}$ ($i, j$ natural, $0 \le i < j < K$) that connects $K$ switches $S_l$ ($l$ natural, $0 \le l < K$) in an all-to-all manner, where the link $L_{ij}$ connects switches $S_i$ and $S_j$.
- A star connection of size $K$ is a set of $K$ links $L_i$ ($i = 0, 1, \ldots, K-1$) that connects $K$ switches $S_l$ ($l$ natural, $0 \le l < K$) and a switch called the “center switch”, where the link $L_i$ connects switch $S_i$ and the center switch. The center switch can act as the center switch in more than one star connection.
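The two definitions above can be written down directly as link lists. This is a minimal sketch under illustrative assumptions: switches are represented by their indices, and the function names and the `center` label are hypothetical, not taken from the disclosure.

```python
from itertools import combinations


def all_to_all_connection(k: int) -> list:
    """Links L_ij (0 <= i < j < k) joining k switches pairwise."""
    if k < 2:
        raise ValueError("k must be >= 2")
    return list(combinations(range(k), 2))


def star_connection(k: int, center: str = "center") -> list:
    """Links L_i joining each of k switches to a single center switch."""
    if k < 2:
        raise ValueError("k must be >= 2")
    return [(i, center) for i in range(k)]


if __name__ == "__main__":
    k = 4
    print(len(all_to_all_connection(k)), "all-to-all links")  # K(K-1)/2 links
    print(len(star_connection(k)), "star links")              # K links
```

A star connection of size K needs K links instead of K(K−1)/2, at the price of one extra switch and one extra hop between any two of the K switches.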
- In this example, a simple case of the point-to-point unified stacking method is presented, where the baseline network is a flat all-to-all topology. This is the simplest case of the point-to-point method and hence is explained here as an introductory example. This network is constructed based on an all-to-all network as shown in FIG. 1a.
- Every switch in the baseline all-to-all network has N ports. A particular example with N=6 is shown in FIG. 1a. First, an all-to-all network is built using edge switches 102. Each switch 102, called an edge switch, serves up to N/2 end points 104, and is wired to up to N/2 other edge switches in an all-to-all manner. In FIG. 1a, each edge switch 102 has N/2 end points 104. (Only one set of end points 104 is shown, with only one edge switch 102, for simplicity's sake.) Each edge switch 102 is wired to each other edge switch 102 in the all-to-all network 106. Then, in network 108 in FIG. 1b, a set of switches 110 is inserted on the links that connect the edge switches. These switches 110 are called global switches. Similar to an edge switch, a global switch has N ports, although only two of them are used at this time. Finally, the whole network is duplicated or stacked 112 to create up to N/2 copies, or planes. FIG. 1c shows three such duplicated networks or planes. The global switches in the same position in these planes are consolidated into a single switch. For example, in FIG. 1c the three global switches labeled “A” are really one switch, which has six ports connecting to six edge switches spread over three planes. Similarly, the switches that have the same label (“B”, “C”, “D”, “E”, and “F”) are really the same respective switches, each having six ports. The number of planes (copies) is limited to N/2 because each global switch consumes two ports per plane; thus all N ports are used with N/2 planes. In general, there are up to $N(N+2)/8$ global switches in total in a switched all-to-all network. (In the N=6 example, there are $6(6+2)/8 = 6$ global switches.)
- With this baseline configuration, a stacked all-to-all switching network can scale up to $N^2(N+2)/8 \approx N^3/8$ end points. Each plane has $(N/2+1)$ edge switches, there can be up to N/2 such planes, and each edge switch has N/2 end points. Therefore, the maximum number of end points is $(N/2+1)\times(N/2)\times(N/2) = N^2(N+2)/8$. For example, with 36-port switches (N=36), a stacked all-to-all network could scale up to $36^2\times(36+2)/8 = 6156$ end points. This is good scalability for the required number of switch ports, links, and hops. Since one hop over a global switch allows both intra-plane and inter-plane traversal, a stacked topology has good scalability and a small diameter (number of hops).
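The counts in this example follow from three factors: edge switches per plane, planes, and end points per edge switch. The sketch below reproduces them for a given port count; the function name and the returned dictionary keys are illustrative assumptions, and the values are upper bounds as in the text.

```python
def stacked_all_to_all_size(n_ports: int) -> dict:
    """Upper bounds for a point-to-point stacked all-to-all with N-port switches."""
    if n_ports < 2 or n_ports % 2:
        raise ValueError("n_ports must be an even integer >= 2")
    half = n_ports // 2
    edge_per_plane = half + 1  # N/2 peers per edge switch, plus the switch itself
    planes = half              # each global switch spends 2 ports per plane
    # One global switch per former all-to-all link: N(N+2)/8.
    global_switches = edge_per_plane * half // 2
    # N/2 end points on every edge switch in every plane: N^2(N+2)/8.
    endpoints = edge_per_plane * planes * half
    return {
        "edge_switches_per_plane": edge_per_plane,
        "planes": planes,
        "global_switches": global_switches,
        "endpoints": endpoints,
    }


if __name__ == "__main__":
    for n in (6, 36):
        print(n, stacked_all_to_all_size(n))
```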
- As for system growth, the system size (number of end points) can be increased by adding planes. Initially, a system can be built with fewer than N/2 planes. More planes can be added afterward, without affecting the existing wiring, until the number of planes reaches the upper limit of N/2.
- A stacked all-to-all network can be partitioned in units of planes without interference among partitions. Network traffic within each plane, or a group of planes, does not interfere with any other plane because the planes are decoupled by the global switches. Exploiting this property, various combinations of partition sizes are possible. For example, if there are 4 planes, possible partitionings include 2 partitions of 1 plane and 3 planes, 3 partitions of 1 plane, 1 plane, and 2 planes, and 4 partitions of 1 plane each.
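The possible combinations of partition sizes are the integer partitions of the number of planes. The generator below is a small sketch that lists them by size only; it does not say which physical planes go into which partition, and the function name is an illustrative assumption. For 4 planes it lists the cases mentioned above as well as the undivided system and the 2+2 split.

```python
def plane_partitions(planes: int, max_part: int = None):
    """Yield partition-size tuples (largest first) that use all planes."""
    if planes < 0:
        raise ValueError("planes must be >= 0")
    if max_part is None:
        max_part = planes
    if planes == 0:
        yield ()
        return
    # Choose the largest remaining partition, then split what is left
    # into partitions that are no larger than it.
    for first in range(min(planes, max_part), 0, -1):
        for rest in plane_partitions(planes - first, first):
            yield (first,) + rest


if __name__ == "__main__":
    for sizes in plane_partitions(4):
        print(sizes)
```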
- Deadlock-free direct and indirect routing methods are available on a stacked all-to-all network. The direct routing path shown in FIG. 2 consists of 4 link hops: Injection, S_up, S_down, and Reception. The Injection hop (1) traverses from the source endpoint 204 to the start edge switch 212. The second hop, S_up (2), travels over the link from the start edge switch 212 to a global switch 210, labeled E in the figure. The next hop, S_down, travels from the global switch 210 to the destination edge switch 206. The final hop, Reception, goes from the edge switch 206 to the destination endpoint 208. As described above, the 3 “E”-labeled switches in FIG. 2 are actually a single switch connected to edge switches in each plane.
- The indirect routing path shown in FIG. 3 consists of 6 link hops: Injection, S_up, S_down, S_up, S_down, and Reception. For indirect routing, an intermediate edge switch 314 is selected. The first 3 link hops, Injection, S_up, and S_down, reach this intermediate edge switch 314 from the source endpoint 304. Subsequently, the remaining three link hops, S_up, S_down, and Reception, carry the packet to the final destination endpoint 308. The global switches with the same letter label (A, B, C, D, E, and F) are the same switch. This applies to both the A-labeled switches and the F-labeled switches in FIG. 3.
- In a worst case, 3 VCs (virtual channels) will be required to support indirect routing with unrestricted ordering. With restricted ordering, 2 VCs will suffice for indirect routing. Direct routing requires only 1 VC.
- This is another example of the point-to-point unified stacking method, where the baseline network is a 2D HyperX topology.
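For reference, the direct and indirect routes on the stacked all-to-all network of FIG. 2 and FIG. 3 can be sketched at the switch level as follows. This is an illustration under stated assumptions, not the disclosed routing implementation: edge switches are addressed as (plane, position), a global switch is named by the pair of edge positions it bridges, and the choice of global switch when source and destination share a position in different planes is arbitrary. Virtual channel assignment is not modeled.

```python
from typing import List, Tuple

Node = Tuple  # ("edge", plane, position) or ("global", low_position, high_position)


def global_switch(i: int, j: int) -> Node:
    """Global switch on the former link between edge positions i and j.

    The same switch serves that pair of positions in every plane.
    """
    if i == j:
        raise ValueError("a global switch joins two distinct edge positions")
    return ("global", min(i, j), max(i, j))


def direct_route(src: Tuple[int, int], dst: Tuple[int, int],
                 edges_per_plane: int) -> List[Node]:
    """Switch-level direct route between edge switches given as (plane, position)."""
    if edges_per_plane < 2:
        raise ValueError("need at least two edge switches per plane")
    (p, i), (q, j) = src, dst
    if src == dst:
        return [("edge", p, i)]
    if i == j:
        # Same position, different plane: any global switch attached to that
        # position works; taking the next position is an arbitrary assumption.
        via = global_switch(i, (i + 1) % edges_per_plane)
    else:
        via = global_switch(i, j)
    return [("edge", p, i), via, ("edge", q, j)]


def indirect_route(src, mid, dst, edges_per_plane: int) -> List[Node]:
    """Route through an intermediate edge switch: two direct routes joined."""
    first = direct_route(src, mid, edges_per_plane)
    second = direct_route(mid, dst, edges_per_plane)
    return first + second[1:]


def link_hops(route: List[Node]) -> int:
    """Link hops including the Injection and Reception hops."""
    return (len(route) - 1) + 2


if __name__ == "__main__":
    print(direct_route((0, 0), (2, 3), 4))
    print(link_hops(indirect_route((0, 0), (1, 2), (2, 3), 4)))
```

With distinct source, intermediate, and destination edge switches, the direct route has 4 link hops and the indirect route has 6, matching the counts given for FIG. 2 and FIG. 3.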
- The point-to-point unified stacking method could be applied to any topology that contains all-to-all connections. A 2D HyperX network is one such topology and can be stacked using this method, as shown in FIG. 6c described below.
- In FIG. 6a there is shown a 2D HyperX topology consisting of nine edge switches 602 (each of the two dimensions consists of 3 edge switches). S links 606 are in the horizontal direction as viewed in the figures. L links 608 are in the vertical direction as viewed in the figures. Each switch belongs to two different groups of switches with all-to-all connections within the group: a group in the horizontal direction, and a group in the vertical direction. We can apply stacking to either, or both, of the dimensions. We illustrate the case where it is applied to the horizontal direction. In FIG. 6b, global switches 610 are inserted on one dimension (S links in the figure), and then multiple copies of the 2D HyperX network are stacked. Each edge switch 602 has N ports. N/3 ports are wired to end points 604, another N/3 to one HyperX dimension (L links), and the remaining N/3 to the other HyperX dimension (e.g., S links), which is now bridged using the global switches 610. Therefore, the network scales up to $\approx N^4/54$ end points (N = number of switch ports), as follows.
- Referring to FIG. 6c, there are N/3 end points 604 connected to each edge switch 602. Each HyperX dimension size is N/3+1. Thus there are $(N/3+1)^2$ edge switches in each plane. Each global switch uses 2 ports per plane. Thus there are N/2 planes in total. Therefore, the total number of end points 604 can be up to $N/3 \times (N/3+1)^2 \times N/2 = N^2(N+3)^2/54 \approx N^4/54$.
- A direct route consists of five cable hops (Injection, L, S_up, S_down, and Reception). Indirect routing consists of up to eight cable hops, since L, S_up, and S_down can each occur up to twice. Similar to the stacked all-to-all topology, 3 VCs are required for fully flexible indirect routing with unrestricted ordering, 2 VCs are required for indirect routing with restricted ordering, and 1 VC for direct routing. Similar to stacked all-to-all, the stacked HyperX network could be partitioned into multiple planes (or sets of planes) without interfering with each other. As for modular system growth, initially the system could have a small number of planes (<N/2), and additional planes could be added afterward.
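The sizing of the point-to-point stacked 2D HyperX network described above can be reproduced numerically. The sketch below assumes, for simplicity, that N is a multiple of 6 so that N/3 and N/2 are both integers; the function name and dictionary keys are illustrative.

```python
def stacked_hyperx_size(n_ports: int) -> dict:
    """Upper bounds for a 2D HyperX stacked point-to-point along one dimension."""
    if n_ports < 6 or n_ports % 6:
        raise ValueError("n_ports must be a positive multiple of 6")
    third = n_ports // 3             # ports for end points, L links, and S links
    dim = third + 1                  # edge switches along each HyperX dimension
    edge_per_plane = dim * dim       # (N/3 + 1)^2
    planes = n_ports // 2            # each global switch uses 2 ports per plane
    endpoints = third * edge_per_plane * planes  # N^2 (N+3)^2 / 54
    return {
        "edge_switches_per_plane": edge_per_plane,
        "planes": planes,
        "endpoints": endpoints,
    }


if __name__ == "__main__":
    for n in (6, 36):
        print(n, stacked_hyperx_size(n))
```

For N=6 this gives the nine edge switches per plane of FIG. 6a.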
- This example covers a simple example of grouped unified stacking, where the baseline network is a flat all-to-all topology. The grouped method is an aspect of the invention different from Example 1 where the point-to-point method is applied to a flat all-to-all topology.
- In the point-to-point method described above in Examples 1 and 2, a global switch bridges two existing edge switches in each plane. In general, a global switch could bridge three or more edge switches in each plane, which we call “grouped unified stacking” or “multi-way stacking”.
- FIG. 5 shows an example of the grouped unified stacking method. In FIG. 5, 3-way all-to-all components are replaced with 3-way star connections to global switches 510. The global switches 510 act as the center switches in the star connections. For example, the global switch 510 labeled “A” serves three edge switches 502 in a plane 512, replacing the 3-way all-to-all links among these three edge switches. Each plane 512, 514 has seven edge switches. Each global switch 510 bridges three edge switches 502. Any edge switch 502 can reach any other edge switch 502 via one hop through a global switch 510.
- In FIG. 5, each global switch 510 bridges three edge switches (rather than two) in each plane. There are seven edge switches 502 on each plane 512, 514. These edge switches are connected with each other via 7 global switches 510. Any edge switch 502 can reach any other edge switch 502 via one hop through a global switch 510. Thus the required number of hops is the same as in the point-to-point method. Each edge switch 502 has six ports and three end points 504.
- With this grouped unified stacking method, an edge switch could reach two other edge switches on the same plane via one uplink port to a global switch. Therefore, more edge switches could be placed in each plane. Thus, multi-way stacking is a useful way to build a larger-scale network with a limited number of switch ports. However, the number of planes is reduced, since each global switch needs more ports per plane. For this reason, the improvement in scalability is limited.
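The text does not spell out which three edge switches each global switch bridges. One assignment that satisfies the stated properties (seven edge switches, seven global switches, three edge switches per global switch, every pair of edge switches sharing a global switch) is the Fano plane, used below purely as an assumption for illustration; the labels and edge switch numbering are hypothetical and need not match FIG. 5.

```python
from itertools import combinations

# Hypothetical grouping (a Fano plane): global switch label -> the three
# edge switch positions it bridges within one plane.
GROUPS = {
    "A": (0, 1, 2),
    "B": (0, 3, 4),
    "C": (0, 5, 6),
    "D": (1, 3, 5),
    "E": (1, 4, 6),
    "F": (2, 3, 6),
    "G": (2, 4, 5),
}


def bridging_switches(groups: dict, a: int, b: int) -> list:
    """Global switches that connect edge switches a and b in one hop."""
    return [name for name, members in groups.items()
            if a in members and b in members]


def is_valid_grouping(groups: dict, edge_count: int) -> bool:
    """True if every pair of edge switches shares exactly one global switch."""
    return all(len(bridging_switches(groups, a, b)) == 1
               for a, b in combinations(range(edge_count), 2))


def uplinks_per_edge(groups: dict, edge: int) -> int:
    """Number of global switches an edge switch is wired to."""
    return sum(edge in members for members in groups.values())


if __name__ == "__main__":
    print(is_valid_grouping(GROUPS, 7))
    print([uplinks_per_edge(GROUPS, e) for e in range(7)])
```

Under this assumed grouping each edge switch uses three uplinks, leaving three of its six ports for end points, and each global switch uses three ports per plane.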
- When the grouped unified stacking method is applied to a flat all-to-all baseline network, the maximum network scale (number of end points) is $N^2(N+1)/6 \approx N^3/6$, which is better than the $N^3/8$ of the point-to-point method (N = number of switch ports). There are N/2 end points connected to each edge switch. Each edge switch has N/2 uplink ports to N/2 global switches. Each global switch allows the edge switch to reach two different edge switches. Therefore, in each plane there can be up to $N/2 \times 2 + 1 = N+1$ edge switches. Since each global switch uses three ports per plane, there can be up to N/3 planes. Therefore, there can be a total of $(N/2)\times(N+1)\times(N/3) = N^2(N+1)/6 \approx N^3/6$ end points.
- By a similar argument, a k-way stacked all-to-all scales up to $\approx ((k-1)/k)\times N^3/4$ end points. Thus the upper limit for large k is $N^3/4$, about 2 times larger than the $N^3/8$ of the point-to-point unified stacking method.
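The counting argument above generalizes directly to k-way stacking. The sketch below computes the resulting upper bounds; the function name is illustrative, taking the floor of N/k when k does not divide N is an assumption, and the result presumes that a grouping covering every pair of edge switches exists for the chosen N and k.

```python
def grouped_stacked_all_to_all_size(n_ports: int, k: int) -> dict:
    """Upper bounds for a k-way grouped stacked all-to-all with N-port switches."""
    if n_ports < 2 or n_ports % 2:
        raise ValueError("n_ports must be an even integer >= 2")
    if k < 2 or k > n_ports:
        raise ValueError("k must satisfy 2 <= k <= n_ports")
    half = n_ports // 2
    # Each of the N/2 uplinks reaches k-1 other edge switches in the plane.
    edge_per_plane = half * (k - 1) + 1
    # Each global switch spends k ports per plane (floor is an assumption).
    planes = n_ports // k
    endpoints = half * edge_per_plane * planes
    return {
        "edge_switches_per_plane": edge_per_plane,
        "planes": planes,
        "endpoints": endpoints,
    }


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, grouped_stacked_all_to_all_size(36, k))
```

With k=2 the function reduces to the point-to-point case, and with N=6, k=3 it gives seven edge switches per plane and two planes.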
- This is another example of grouped unified stacking method, where the baseline network is 2D HyperX.
- FIG. 7a shows the baseline 2D HyperX topology, which is an $(N_L+1)\times(N_S+1)$ array of edge switches 702, where $N_L$ is the number of L links and $N_S$ is the number of S links per edge switch. There are all-to-all L links along the vertical dimension, and all-to-all S links along the horizontal dimension. In FIG. 7b, not all end points 704 are shown. When each edge switch has N ports, the optimal port assignment to maximize the system scale is $N_L = N_S = N/3$. Thus there are $(N_L+1)(N_S+1) = (N/3+1)^2$ edge switches 702. When $N_L = 6$ and $N_S = 6$, there are $(6+1)\times(6+1) = 49$ edge switches.
- With the grouped unified stacking method, one dimension of the 2D HyperX wiring is replaced with 3-way star connections via global switches 710, as shown in FIG. 7b. For example, the star connection links 706 along the S dimension replace the original all-to-all wiring along the S dimension. This is similar to the stacked all-to-all with the 3-way grouped method in Example 3. In FIG. 7b, each group of 7 edge switches 702 along the S dimension is connected via seven global switches 710. There are a total of 49 global switches. Note that an edge switch now needs only three S links (as opposed to six in the original 2D HyperX).
- FIG. 8 shows multiple planes in the stacked 2D HyperX topology with 3-way stacking. The original 2D HyperX network can be duplicated into up to N/3 planes. In the figure, the global switches 810 in the same position in each plane 812, 814 are really one switch. For example, the “A” switches in planes 812, 814, . . . are only one switch. The same holds for the “B”, “C”, “D”, . . . switches. The maximum network size is up to $N/3 \times (N/3+1) \times (2N/3+1) \times N/3 = N^2(N+3)(2N+3)/81 \approx 2N^4/81$ endpoints 804: up to N/3 endpoints 804 can be connected to an edge switch 802, up to N/3+1 edge switches can be placed along the L dimension, up to 2N/3+1 edge switches can be placed along the S dimension, and up to N/3 planes can be created. A direct routing path is five cable hops (Injection + L + S_up + S_down + Reception). An indirect routing path is a maximum of eight cable hops (additional L, S_up, and S_down).
- In general, with the k-way grouped unified stacking method, a stacked HyperX network would scale to $\approx ((k-1)/k)\times N^4/27$ end points.
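The size of a grouped stacked 2D HyperX network is the product of the four factors listed above: end points per edge switch, edge switches along the L dimension, edge switches along the S dimension, and planes. The sketch below evaluates that product for k-way stacking along the S dimension. The function name is illustrative, N is assumed to be a multiple of 3, taking the floor of N/k for the plane count is an assumption, and the S-dimension size presumes a suitable grouping exists.

```python
def grouped_stacked_hyperx_size(n_ports: int, k: int = 3) -> dict:
    """Upper bounds for a 2D HyperX with k-way grouped stacking on the S dimension."""
    if n_ports < 3 or n_ports % 3:
        raise ValueError("n_ports must be a positive multiple of 3")
    if k < 2 or k > n_ports:
        raise ValueError("k must satisfy 2 <= k <= n_ports")
    third = n_ports // 3            # ports for end points, L links, and S uplinks
    l_dim = third + 1               # all-to-all along L is unchanged
    s_dim = third * (k - 1) + 1     # each S uplink reaches k-1 other edge switches
    planes = n_ports // k           # k global switch ports per plane (floor assumed)
    endpoints = third * l_dim * s_dim * planes
    return {
        "l_dimension": l_dim,
        "s_dimension": s_dim,
        "planes": planes,
        "endpoints": endpoints,
    }


if __name__ == "__main__":
    print(grouped_stacked_hyperx_size(18, 3))
    print(grouped_stacked_hyperx_size(36, 3))
```

With k=2 the same function reproduces the point-to-point stacked HyperX count, and for k=3 the result grows as roughly 2N⁴/81.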
- Additional Topologies
- Although the detail is omitted, the stacked 2D HyperX topology could be further stacked using the L links. For example, another set of global switches is inserted on the L links 608 in FIG. 6c, and the whole stacked 2D HyperX network is further replicated into N/2 copies, connected with the new global switches. This allows a very large network (scaling to $\approx N^5/108$ end points with the point-to-point unified stacking method) and many partitions ($N^2/4$), but requires additional cost for the extra global switches and links.
- These unified stacking methods can be applied to a wide range of baseline networks that contain all-to-all connections, such as Dragonfly, 3D HyperX, or M-dimensional HyperX for general cases where M>3. Here one could stack one, or more, of the dimensions.
- Oversubscribed Stacking
- To save costs, sometimes a network is designed to have less global bandwidth (i.e., bandwidth between long-distance endpoint pairs) than local bandwidth. Such networks are often called oversubscribed networks. The stacking method described supports such demands for cost-effective oversubscription by having global switches on only a fraction of the links. FIG. 4 shows an example of an oversubscribed stacked all-to-all topology. Unlike the original stacked all-to-all, which had six global switches, the oversubscribed network shown in FIG. 4 has only four global switches 410. The rest of the links do not have global switches, and hence the edge switches are directly wired within the plane. The “missing” global switches are shown in dotted outline. As a result, the number of links and switches is reduced, resulting in lower cost. However, there is degradation in the number of hops and global bandwidth. As shown in FIG. 4, in a worst case 5 hops are required (the first two hops to move from the source endpoint 404 to the destination global switch 410, and the remaining three hops to travel from the global switch 410 to the destination end point 416). With five hops, the oversubscribed stack is worse than the four hops of the original stacked all-to-all topology. As for the global bandwidth, the oversubscribed network in FIG. 4 has only ⅔ of that of the original stacked all-to-all network. The number of global switches could be adjusted to balance the cost and the global bandwidth for a given use case. The 3 “B”-labeled global switches in FIG. 4 are a single switch.
- FIG. 9 illustrates a schematic diagram of an example computer or processing system that may implement the extending of the scalability and the improving of the partitionability of baseline networks for transporting packet traffic from a source endpoint to a destination endpoint in one exemplary embodiment of the present disclosure. The computer system is only one example of a suitable processing system and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the methodology described herein. The processing system shown may be operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the processing system shown in FIG. 9 may include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
- The computer system may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. The computer system may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
- The components of computer system may include, but are not limited to, one or more processors or processing units 902, a system memory 906, and a bus 904 that couples various system components including system memory 906 to processor 902. The processor 902 may include a module 900 that performs the methods described herein. The module 900 may be programmed into the integrated circuits of the processor 902, or loaded from memory 906, storage device 908, or network 914 or combinations thereof.
- Bus 904 may represent one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus.
- Computer system may include a variety of computer system readable media. Such media may be any available media that is accessible by computer system, and it may include both volatile and non-volatile media, removable and non-removable media.
- System memory 906 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) and/or cache memory or others. Computer system may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 908 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (e.g., a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 904 by one or more data media interfaces.
- Computer system may also communicate with one or more external devices 916 such as a keyboard, a pointing device, a display 918, etc.; one or more devices that enable a user to interact with computer system; and/or any devices (e.g., network card, modem, etc.) that enable computer system to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 910.
- Still yet, computer system can communicate with one or more networks 914 such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 912. As depicted, network adapter 912 communicates with the other components of computer system via bus 904. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
- Embodiments of the present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
- The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
- The corresponding structures, materials, acts, and equivalents of all means or step plus function elements, if any, in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (21)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/982,547 US9699078B1 (en) | 2015-12-29 | 2015-12-29 | Multi-planed unified switching topologies |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20170187616A1 (en) | 2017-06-29 |
| US9699078B1 (en) | 2017-07-04 |
Family ID=59087320
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/982,547 Expired - Fee Related US9699078B1 (en) | 2015-12-29 | 2015-12-29 | Multi-planed unified switching topologies |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US9699078B1 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190007273A1 (en) * | 2017-06-30 | 2019-01-03 | Oracle International Corporation | High-performance data repartitioning for cloud-scale clusters |
| US11165686B2 (en) * | 2018-08-07 | 2021-11-02 | International Business Machines Corporation | Switch-connected Dragonfly network |
| US11855913B2 (en) * | 2018-10-31 | 2023-12-26 | Hewlett Packard Enterprise Development Lp | Hierarchical switching device with deadlockable storage and storage partitions |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10938751B2 (en) | 2018-04-18 | 2021-03-02 | Hewlett Packard Enterprise Development Lp | Hierarchical switching devices |
| US10757038B2 (en) | 2018-07-06 | 2020-08-25 | Hewlett Packard Enterprise Development Lp | Reservation-based switching devices |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020167954A1 (en) * | 2001-05-11 | 2002-11-14 | P-Com, Inc. | Point-to-multipoint access network integrated with a backbone network |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6192422B1 (en) * | 1997-04-16 | 2001-02-20 | Alcatel Internetworking, Inc. | Repeater with flow control device transmitting congestion indication data from output port buffer to associated network node upon port input buffer crossing threshold level |
| US7209659B2 (en) * | 2000-12-19 | 2007-04-24 | Nortel Networks Limited | Modular high capacity network |
| US7440448B1 (en) * | 2001-07-02 | 2008-10-21 | Haw-Minn Lu | Systems and methods for upgradeable scalable switching |
| US20030217141A1 (en) * | 2002-05-14 | 2003-11-20 | Shiro Suzuki | Loop compensation for a network topology |
| US7672227B2 (en) * | 2005-07-12 | 2010-03-02 | Alcatel Lucent | Loop prevention system and method in a stackable ethernet switch system |
| CN101141404B (en) | 2007-10-16 | 2011-03-16 | 中兴通讯股份有限公司 | Stack system topological management method and topological alteration notifying method |
| WO2012154751A1 (en) * | 2011-05-08 | 2012-11-15 | Infinetics Technologies, Inc. | Flexible radix switching network |
| US9148348B2 (en) * | 2011-10-31 | 2015-09-29 | Hewlett-Packard Development Company, L.P. | Generating network topologies |
| US8750288B2 (en) | 2012-06-06 | 2014-06-10 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
| US9767311B2 (en) | 2013-10-25 | 2017-09-19 | Netapp, Inc. | Stack isolation by a storage network switch |
| CN103795570B (en) | 2014-01-23 | 2018-05-08 | 新华三技术有限公司 | The unicast message restoration methods and device of the stacked switchboard system of ring topology |
| CN104539438B (en) | 2015-01-07 | 2018-04-17 | 烽火通信科技股份有限公司 | A kind of system and method for being used in PON system realize multicast service layering |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190007273A1 (en) * | 2017-06-30 | 2019-01-03 | Oracle International Corporation | High-performance data repartitioning for cloud-scale clusters |
| US10862755B2 (en) * | 2017-06-30 | 2020-12-08 | Oracle International Corporation | High-performance data repartitioning for cloud-scale clusters |
| US11165686B2 (en) * | 2018-08-07 | 2021-11-02 | International Business Machines Corporation | Switch-connected Dragonfly network |
| US11855913B2 (en) * | 2018-10-31 | 2023-12-26 | Hewlett Packard Enterprise Development Lp | Hierarchical switching device with deadlockable storage and storage partitions |
Also Published As
| Publication number | Publication date |
|---|---|
| US9699078B1 (en) | 2017-07-04 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US9699078B1 (en) | Multi-planed unified switching topologies | |
| US9893950B2 (en) | Switch-connected HyperX network | |
| US10659309B2 (en) | Scalable data center network topology on distributed switch | |
| EP2708000B1 (en) | Flexible radix switching network | |
| US8621111B2 (en) | Transpose box based network scaling | |
| US12067472B2 (en) | Defect resistant designs for location-sensitive neural network processor arrays | |
| US10317930B2 (en) | Optimizing core utilization in neurosynaptic systems | |
| US9374321B2 (en) | Data center switch | |
| US10394738B2 (en) | Technologies for scalable hierarchical interconnect topologies | |
| US20190303740A1 (en) | Block transfer of neuron output values through data memory for neurosynaptic processors | |
| US11270193B2 (en) | Scalable stream synaptic supercomputer for extreme throughput neural networks | |
| US20240414160A1 (en) | Access control and routing optimization at a cloud headend in a cloud-based secure access service environment | |
| US20190236444A1 (en) | Functional synthesis of networks of neurosynaptic cores on neuromorphic substrates | |
| US20190140944A1 (en) | Routing between software defined networks and physical networks | |
| Rau et al. | Destination tag routing techniques based on a state model for the IADM network | |
| CN105122744B (en) | It is extended by the MSDC of on-demand routing update | |
| US20130070761A1 (en) | Systems and methods for controlling a network switch | |
| WO2023158467A1 (en) | Pooling smart nics for network disaggregation | |
| Chen et al. | Multi-planed unified switching topologies | |
| Vesović et al. | Fast and scalable routing protocols for data center networks | |
| US9081744B2 (en) | Trellis ring network architecture | |
| Yan | High performance scalable data center and computer network architectures based on distributed fast optical switches | |
| Yang et al. | Adaptive wormhole routing in k-ary n-cubes | |
| US20200337114A1 (en) | Communication control method and information processing apparatus | |
| Huang et al. | A Novel Cost-Effective Interconnection Networks of Modular Datacenters for the Cloud Computing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CHEN, DONG; HEIDELBERGER, PHILIP; SUGAWARA, YUTAKA; REEL/FRAME: 037587/0820. Effective date: 20160104 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20210704 |