[go: up one dir, main page]

He et al., 2021 - Google Patents

Accl: Fpga-accelerated collectives over 100 gbps tcp-ip

He et al., 2021

View PDF
Document ID
470474292633576189
Author
He Z
Parravicini D
Petrica L
O’Brien K
Alonso G
Blott M
Publication year
Publication venue
2021 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC)

External Links

Snippet

Collective operations such as scatter, gather, reduce, etc are utilized broadly to implement distributed HPC applications and are the target of extensive optimization in all MPI implementations as well as dedicated collective libraries by accelerator vendors (eg NCCL …
Continue reading at www.research-collection.ethz.ch (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogramme communication; Intertask communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Programme initiating; Programme switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored programme computers
    • G06F15/78Architectures of general purpose stored programme computers comprising a single central processing unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30Arrangements for executing machine-instructions, e.g. instruction decode
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/44Arrangements for executing specific programmes
    • G06F9/455Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored programme computers
    • G06F15/80Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10Programme control for peripheral devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network-specific arrangements or communication protocols supporting networked applications
    • H04L67/10Network-specific arrangements or communication protocols supporting networked applications in which an application is distributed across nodes in the network
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks

Similar Documents

Publication Publication Date Title
He et al. Accl: Fpga-accelerated collectives over 100 gbps tcp-ip
Ibanez et al. The nanopu: A nanosecond network stack for datacenters
He et al. Easynet: 100 gbps network for hls
Wang et al. {FpgaNIC}: An {FPGA-based} versatile 100gb {SmartNIC} for {GPUs}
Shashidhara et al. {FlexTOE}: Flexible {TCP} offload with {Fine-Grained} parallelism
Cerović et al. Fast packet processing: A survey
Hoefler et al. sPIN: High-performance streaming Processing in the Network
Sun et al. Fast and flexible: Parallel packet processing with GPUs and click
CN107534582B (en) Method, system, and computer-readable medium for use in a data center
US7788334B2 (en) Multiple node remote messaging
US8140704B2 (en) Pacing network traffic among a plurality of compute nodes connected using a data communications network
US7802025B2 (en) DMA engine for repeating communication patterns
US20170220499A1 (en) Massively parallel computer, accelerated computing clusters, and two-dimensional router and interconnection network for field programmable gate arrays, and applications
US9225545B2 (en) Determining a path for network traffic between nodes in a parallel computer
US20090031002A1 (en) Self-Pacing Direct Memory Access Data Transfer Operations for Compute Nodes in a Parallel Computer
Heinz et al. On-chip and distributed dynamic parallelism for task-based hardware accelerators
Chowdhury et al. $\mu\mathrm {NF} $: A Disaggregated Packet Processing Architecture
Dang Consensus protocols exploiting network programmability
Zhang et al. DFabric: Scaling out data parallel applications with CXL-Ethernet hybrid interconnects
Si et al. Collective communication for 100k+ gpus
Chen et al. {vFPIO}: A Virtual {I/O} Abstraction for {FPGA-accelerated}{I/O} Devices
Balle et al. Inter-kernel links for direct inter-FPGA communication
Khazraee et al. Shire: Making FPGA-accelerated Middlebox Development More Pleasant
Pertuz et al. A flexible mixed-mesh FPGA cluster architecture for high speed computing
Awan et al. Towards Hardware Support for FPGA Resource Elasticity