Girondi et al., 2024 - Google Patents
Toward GPU-centric Networking on Commodity HardwareGirondi et al., 2024
View PDF- Document ID
- 15208686095084488382
- Author
- Girondi M
- Scazzariello M
- Maguire Jr G
- Kostić D
- Publication year
- Publication venue
- Proceedings of the 7th International Workshop on Edge Systems, Analytics and Networking
External Links
Snippet
GPUs are emerging as the most popular accelerator for many applications, powering the core of machine learning applications. In networked GPU-accelerated applications input & output data typically traverse the CPU and the OS network stack multiple times, getting …
- 230000006855 networking 0 title description 21
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network-specific arrangements or communication protocols supporting networked applications
- H04L67/10—Network-specific arrangements or communication protocols supporting networked applications in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Application independent communication protocol aspects or techniques in packet data networks
- H04L69/30—Definitions, standards or architectural aspects of layered protocol stacks
- H04L69/32—High level architectural aspects of 7-layer open systems interconnection [OSI] type protocol stacks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Application independent communication protocol aspects or techniques in packet data networks
- H04L69/12—Protocol engines, e.g. VLSIs or transputers
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11734179B2 (en) | Efficient work unit processing in a multicore system | |
US10841245B2 (en) | Work unit stack data structures in multiple core processor system for stream data processing | |
US11237880B1 (en) | Dataflow all-reduce for reconfigurable processor systems | |
US11392740B2 (en) | Dataflow function offload to reconfigurable processors | |
US10929175B2 (en) | Service chaining hardware accelerators within a data stream processing integrated circuit | |
CN110915173B (en) | Data processing unit for computing nodes and storage nodes | |
US8438578B2 (en) | Network on chip with an I/O accelerator | |
US8462369B2 (en) | Hybrid image processing system for a single field of view having a plurality of inspection threads | |
Daoud et al. | GPUrdma: GPU-side library for high performance networking from GPU kernels | |
US8675219B2 (en) | High bandwidth image processing with run time library function offload via task distribution to special purpose engines | |
US11956156B2 (en) | Dynamic offline end-to-end packet processing based on traffic class | |
US20220229677A1 (en) | Performance modeling of graph processing computing architectures | |
WO2020163327A1 (en) | System-based ai processing interface framework | |
US20230100935A1 (en) | Microservice deployments using accelerators | |
KR20120019711A (en) | Network architecture and method for processing packet data using the same | |
CN104820582A (en) | Realization method of multicore embedded DSP (Digital Signal Processor) parallel programming model based on Navigator | |
Girondi et al. | Toward GPU-centric Networking on Commodity Hardware | |
EP4607350A1 (en) | Unloading card provided with accelerator | |
EP4475001A1 (en) | Network device-based data processing method and network device | |
Cardellini et al. | Overlapping communication with computation in MPI applications | |
Zhang et al. | DFabric: Scaling out data parallel applications with CXL-Ethernet hybrid interconnects | |
CN119493661A (en) | Telemetry and load balancing in CXL systems | |
US20250315319A1 (en) | Asynchronous post-send | |
Nan et al. | Plug & Offload: Transparently Offloading TCP Stack onto Off-path SmartNIC with PnO-TCP | |
Brasilino et al. | In-network processing for edge computing with InLocus |