Chen et al., 2022 - Google Patents
Multi-layer LSTM Parallel Optimization Based on Hardware and Software Cooperation
- Document ID
- 4797412632334001541
- Authors
- Chen Q
- Wu J
- Huang F
- Han Y
- Zhao Q
- Publication year
- 2022
- Publication venue
- International Conference on Knowledge Science, Engineering and Management
Snippet
LSTM's special gate structure and memory unit make it well suited to problems involving time series. It performs strongly in fields such as machine translation and reasoning. However, LSTM also has shortcomings, such as low parallelism, which …
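The snippet refers to the LSTM cell's gate structure, its memory (cell) state, and the low parallelism that motivates the paper. A minimal NumPy sketch of one LSTM step is given below to make those three points concrete; the weight names, shapes, and fused-gate layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    All four gates are computed with one fused projection, then split."""
    z = W @ x_t + U @ h_prev + b      # fused projection for all four gates
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])               # input gate
    f = sigmoid(z[H:2*H])             # forget gate
    g = np.tanh(z[2*H:3*H])           # candidate memory
    o = sigmoid(z[3*H:4*H])           # output gate
    c_t = f * c_prev + i * g          # memory-unit update
    h_t = o * np.tanh(c_t)            # hidden state
    return h_t, c_t

# Each step consumes h_{t-1} and c_{t-1}, so time steps must run serially;
# this recurrence is the "low parallelism" the snippet refers to.
rng = np.random.default_rng(0)
D, H = 8, 4
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h = c = np.zeros(H)
for x_t in rng.standard_normal((10, D)):  # 10 time steps, inherently sequential
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Note how the four gate projections share a single matrix multiply: fusing them, and overlapping computation across layers while the time dimension stays serial, is the kind of parallelism that hardware-software co-design approaches for multi-layer LSTMs commonly exploit.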
Classifications
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead, using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator, for complex operations, e.g. multidimensional or interleaved address generators, macros, with adaptable data path
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead, using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, single instruction multiple data [SIMD] multiprocessors
- G06F9/5061—Allocation of resources; partitioning or combining of resources
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F17/5045—Computer-aided circuit design
- G06N3/02—Computer systems based on biological models using neural network models
- G06F15/163—Interprocessor communication
- G06N99/005—Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
- G06F8/41—Compilation (transformations of program code)
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
| Publication | Title |
|---|---|
| Dave et al. | Hardware acceleration of sparse and irregular tensor computations of ML models: a survey and insights |
| Mittal | A survey of FPGA-based accelerators for convolutional neural networks |
| Kim et al. | Full stack optimization of transformer inference: a survey |
| Mohaidat et al. | A survey on neural network hardware accelerators |
| Nadalini et al. | PULP-TrainLib: enabling on-device training for RISC-V multi-core MCUs through performance-driven autotuning |
| Sun et al. | A high-performance accelerator for large-scale convolutional neural networks |
| Fang et al. | Efficient N:M sparse DNN training using algorithm, architecture, and dataflow co-design |
| Jahanshahi et al. | Inf4Edge: automatic resource-aware generation of energy-efficient CNN inference accelerator for edge embedded FPGAs |
| Mazouz et al. | Automated CNN back-propagation pipeline generation for FPGA online training |
| Lu et al. | THETA: a high-efficiency training accelerator for DNNs with triple-side sparsity exploration |
| Ahmed et al. | Answer fast: accelerating BERT on the Tensor Streaming Processor |
| George et al. | A unified programmable edge matrix processor for deep neural networks and matrix algebra |
| Manca et al. | ONNX-to-hardware design flow for adaptive neural-network inference on FPGAs |
| Singha et al. | LEAPER: fast and accurate FPGA-based system performance prediction via transfer learning |
| Chen et al. | Multi-layer LSTM Parallel Optimization Based on Hardware and Software Cooperation |
| Lu et al. | SparseNN: a performance-efficient accelerator for large-scale sparse neural networks |
| Misra et al. | HLS-based acceleration framework for deep convolutional neural networks |
| Zhang et al. | Compilation and optimizations for efficient machine learning on embedded systems |
| Gonçalves et al. | Exploring data size to run convolutional neural networks in low density FPGAs |
| Que | Reconfigurable acceleration of recurrent neural networks |
| Su et al. | Optimization and deployment of DNNs for RISC-V-based edge AI |
| Anderson et al. | Toward energy–quality scaling in deep neural networks |
| Stehle et al. | On the difficulty of designing processor arrays for deep neural networks |
| Manca et al. | ONNX-to-hardware design flow for the generation of adaptive neural-network accelerators on FPGAs |
| Li | Compression of deep neural network |