Hong et al., 2022 - Google Patents

DFX: A low-latency multi-FPGA appliance for accelerating transformer-based text generation

Document ID: 17813474168073272623
Author: Hong S; Moon S; Kim J; Lee S; Kim M; Lee D; Kim J
Publication year: 2022
Publication venue: 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)

Snippet

Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pretrained Transformer (GPT) has achieved remarkable performance in text generation, or …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
            - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
              - G06F9/5061—Partitioning or combining of resources
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
              - G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
                - G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation
        - G06F17/10—Complex mathematical operations
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
        - G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
          - G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
            - G06F7/52—Multiplying; Dividing
              - G06F7/523—Multiplying only
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
            - G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
        - G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
      - G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
        - G06F1/26—Power supply means, e.g. regulation thereof
          - G06F1/32—Means for saving power
            - G06F1/3203—Power Management, i.e. event-based initiation of power-saving mode
              - G06F1/3234—Action, measure or step performed to reduce power consumption
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models

Similar Documents

Publication	Publication Date	Title
Hong et al.	2022	DFX: A low-latency multi-FPGA appliance for accelerating transformer-based text generation
Jouppi et al.	2020	A domain-specific supercomputer for training deep neural networks
Norrie et al.	2021	The design process for Google's training chips: TPUv2 and TPUv3
Zhou et al.	2022	TransPIM: A memory-based acceleration via software-hardware co-design for transformer
Fowers et al.	2018	A configurable cloud-scale DNN processor for real-time AI
Zeng et al.	2024	FlightLLM: Efficient large language model inference with a complete mapping flow on FPGAs
Khan et al.	2021	NPE: An FPGA-based overlay processor for natural language processing
Ma et al.	2022	BaGuaLu: targeting brain scale pretrained models with over 37 million cores
US11275561B2 (en)	2022-03-15	Mixed precision floating-point multiply-add operation
He et al.	2018	A survey to predict the trend of AI-able server evolution in the cloud
Li et al.	2020	Efficient methods for mapping neural machine translator on FPGAs
US12380060B2 (en)	2025-08-05	Graph spatial split
US20230409882A1 (en)	2023-12-21	Efficient processing of transformer based models
Park et al.	2020	TrainBox: An extreme-scale neural network training server architecture by systematically balancing operations
Pati et al.	2024	T3: Transparent tracking & triggering for fine-grained overlap of compute & collectives
Sun et al.	2011	An I/O bandwidth-sensitive sparse matrix-vector multiplication engine on FPGAs
Wen-mei et al.	2017	Rebooting the data access hierarchy of computing systems
Que et al.	2020	A reconfigurable multithreaded accelerator for recurrent neural networks
Que et al.	2022	Remarn: A reconfigurable multi-threaded multi-core accelerator for recurrent neural networks
Moon et al.	2024	LPU: A latency-optimized and highly scalable processor for large language model inference
Pietras	2014	Hardware conversion of neural networks simulation models for neural processing accelerator implemented as FPGA-based SoC
Ono et al.	2019	FPGA-based acceleration of word2vec using OpenCL
Dhingra et al.	2025	Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
Chen et al.	2023	AI SoC design challenges in the foundation model era
Moon et al.	2025	Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window