ACCL: FPGA-accelerated collectives over 100 Gbps TCP-IP
He et al., 2021
- Document ID
- 470474292633576189
- Author
- He Z
- Parravicini D
- Petrica L
- O’Brien K
- Alonso G
- Blott M
- Publication year
- 2021
- Publication venue
- 2021 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC)
Snippet
Collective operations such as scatter, gather, reduce, etc. are used broadly to implement distributed HPC applications and are the target of extensive optimization in all MPI implementations as well as in dedicated collective libraries from accelerator vendors (e.g., NCCL …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/10—Programme control for peripheral devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network-specific arrangements or communication protocols supporting networked applications
- H04L67/10—Network-specific arrangements or communication protocols supporting networked applications in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| He et al. | ACCL: FPGA-accelerated collectives over 100 Gbps TCP-IP | |
| Ibanez et al. | The nanoPU: A nanosecond network stack for datacenters | |
| He et al. | EasyNet: 100 Gbps network for HLS | |
| Wang et al. | FpgaNIC: An FPGA-based versatile 100Gb SmartNIC for GPUs | |
| Shashidhara et al. | FlexTOE: Flexible TCP offload with fine-grained parallelism | |
| Cerović et al. | Fast packet processing: A survey | |
| Hoefler et al. | sPIN: High-performance streaming Processing in the Network | |
| Sun et al. | Fast and flexible: Parallel packet processing with GPUs and click | |
| CN107534582B (en) | Method, system, and computer-readable medium for use in a data center | |
| US7788334B2 (en) | Multiple node remote messaging | |
| US8140704B2 (en) | Pacing network traffic among a plurality of compute nodes connected using a data communications network | |
| US7802025B2 (en) | DMA engine for repeating communication patterns | |
| US20170220499A1 (en) | Massively parallel computer, accelerated computing clusters, and two-dimensional router and interconnection network for field programmable gate arrays, and applications | |
| US9225545B2 (en) | Determining a path for network traffic between nodes in a parallel computer | |
| US20090031002A1 (en) | Self-Pacing Direct Memory Access Data Transfer Operations for Compute Nodes in a Parallel Computer | |
| Heinz et al. | On-chip and distributed dynamic parallelism for task-based hardware accelerators | |
| Chowdhury et al. | μNF: A Disaggregated Packet Processing Architecture | |
| Dang | Consensus protocols exploiting network programmability | |
| Zhang et al. | DFabric: Scaling out data parallel applications with CXL-Ethernet hybrid interconnects | |
| Si et al. | Collective communication for 100k+ GPUs | |
| Chen et al. | vFPIO: A Virtual I/O Abstraction for FPGA-accelerated I/O Devices | |
| Balle et al. | Inter-kernel links for direct inter-FPGA communication | |
| Khazraee et al. | Shire: Making FPGA-accelerated Middlebox Development More Pleasant | |
| Pertuz et al. | A flexible mixed-mesh FPGA cluster architecture for high speed computing | |
| Awan et al. | Towards Hardware Support for FPGA Resource Elasticity |