
CN113127407A - Chip architecture for AI calculation based on NVM - Google Patents

Chip architecture for AI calculation based on NVM

Info

Publication number
CN113127407A
Authority
CN
China
Prior art keywords
nvm
data
neural network
chip architecture
npu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110541351.3A
Other languages
Chinese (zh)
Other versions
CN113127407B (en)
Inventor
丛维
林小峰
金生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Youcun Technology Co ltd
Original Assignee
Nanjing Youcun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Youcun Technology Co., Ltd.
Priority to CN202110541351.3A
Publication of CN113127407A
Application granted
Publication of CN113127407B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/76 - Architectures of general purpose stored program computers
    • G06F15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839 - Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842 - Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • G06F15/7846 - On-chip cache and off-chip main memory
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Advance Control (AREA)

Abstract


The invention provides a chip architecture for AI computation based on NVM, comprising an NVM array, an external interface module, an NPU and an MCU connected through a communication bus. AI neural network computation is performed by the NPU and NVM in combination: the weight parameters of the neural network are stored digitally in the NVM array, the MCU receives external AI operation instructions to control the NPU and NVM array to carry out neural network calculations, and the MCU controls the NVM array to load its internally stored weight parameters, the running program and the neural network model to perform AI computation. Compared with existing in-memory computing schemes that use NVM for analog operation, this digital storage and computation mode has a flexible operation structure, good reliability, high precision and high read accuracy. The invention therefore breaks through the speed bottleneck of off-chip NVM storage and reduces external input power consumption, while offering a high degree of implementability, flexibility and reliability.


Description

Chip architecture for AI calculation based on NVM
Technical Field
The invention relates to the technical field of AI (Artificial Intelligence), in particular to a chip architecture for AI calculation based on NVM (non-volatile memory).
Background
The algorithms of AI draw inspiration from the structure of the human brain. The human brain is a complex network of a large number of interconnected neurons; each neuron receives information from many other neurons through its dendrites, with each connection point called a synapse. After external stimuli accumulate to a certain threshold, the neuron generates a signal that is transmitted out through its axon. The axon has many terminals, which connect through synapses to the dendrites of many other neurons. It is this network of functionally simple neurons that implements all human intelligent activity. Human memory and intelligence are generally believed to be stored in the different coupling strengths at each synapse.
Neural network algorithms, which emerged in the 1960s, mimic a neuron with a function. The function accepts multiple inputs from other neurons, each with a different weight; the output is the sum of each input multiplied by its corresponding connection weight. The function output is fed as input to neurons in the next layer, forming a neural network.
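As an illustration only (not part of the patent), the neuron function just described reduces to a weighted sum; a real neuron model usually also applies a nonlinear activation, which is omitted here:

```python
def neuron_output(inputs, weights):
    """Model neuron: multiply each input by its connection weight and sum.

    This sketch shows only the weighted sum described in the text; a
    nonlinear activation, typically applied afterwards, is omitted.
    """
    return sum(x * w for x, w in zip(inputs, weights))

# Example: three inputs feeding one neuron
print(neuron_output([1.0, 0.5, -1.0], [2.0, 4.0, 1.0]))  # 2.0 + 2.0 - 1.0 = 3.0
```

Training a network is then the process of adjusting these weights so the outputs match the desired results.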
Common AI chips optimize the parallel matrix computation at the heart of network algorithms, but because AI computation requires extremely high storage read bandwidth, the conventional architecture that separates processor, memory and storage hits a read-speed bottleneck and is further limited by the power consumed in off-chip storage access. The industry has therefore begun to study in-memory computing architectures extensively.
At present, schemes that use NVM for in-memory computation store the weights of a neural network in NVM in the form of analog signals and implement the network's multiply-accumulate operations on those analog signals; for a specific example, refer to Chinese patent application CN109086249A. Such schemes have produced many scientific results but have proven difficult to apply in practice. Practical neural networks have many layers and very complicated connection structures, and analog signals are very inconvenient for inter-layer transmission and the various kinds of signal processing required during computation; moreover, the analog computing array structure is rigid, which makes it hard to support flexible network topologies. In addition, the various noises and errors incurred in storing, reading, writing and computing with analog signals limit the reliability of the stored neural network model and the accuracy of the computation.
Disclosure of Invention
The invention aims to provide a chip architecture for AI calculation based on NVM (non-volatile memory) that overcomes the defects of existing in-memory computing schemes which store neural network weights in NVM as analog signals: inter-layer transmission and processing of analog signals during neural network calculation is very inconvenient; the rigid analog computing array structure cannot support a flexible neural network structure; and the various noises and errors in analog storage, reading, writing and calculation limit the reliability of the stored neural network model and the accuracy of the calculation.
In order to achieve the above object, the present invention provides a chip architecture for performing AI calculation based on NVM, which includes an NVM array, an external interface module, an NPU (embedded neural network processor) and an MCU (Microcontroller Unit) connected by bus communication;
the NVM array is used for storing, on-chip, the weight parameters of the digitized neural network, the program run by the MCU, and the neural network model;
the NPU is used for digital domain accelerated calculation of the neural network;
the external interface module is used for receiving external AI operation instructions, inputting data and outputting AI calculation results outwards;
the MCU is used for executing the program based on the AI operation instruction so as to control the NVM array and the NPU to carry out AI calculation on the input data to obtain the result of the AI calculation.
In this chip architecture, the NPU and NVM are combined to perform AI neural network calculation. The weight parameters of the neural network are stored digitally in the on-chip NVM array, and the neural network calculation itself is performed in the digital domain; the MCU controls the NPU and NVM array based on external AI operation instructions, directing the NVM array to load its internally stored weight parameters, the program run by the MCU and the neural network model to carry out AI calculation. Compared with the various existing in-memory schemes that use NVM for analog computation, this digital storage and computation approach has a flexible computing structure, and digitally stored NVM information offers better reliability, precision and read accuracy than multi-level analog storage. The scheme therefore breaks through the off-chip NVM storage speed bottleneck and reduces external input power consumption while retaining high implementability, flexibility and reliability.
Further, the chip architecture further includes a high-speed data read channel through which the NPU reads the weight parameters from the NVM array.
In addition to the on-chip bus, this scheme provides a dedicated high-speed data read channel between the NPU and the NVM array to support the bandwidth the NPU requires for high-speed reading of the neural network's weight parameters (the weight data) during digital-domain operation.
Further, the NVM array is provided with read channels, numbering N, where N is a positive integer; the read channels read N bits of data in one read cycle, and the NPU is configured to read the weight parameters from the NVM array through these read channels via the high-speed data read channel.
In this scheme the NVM array is provided with N read channels; preferably N is 128 to 512, so that N bits of data can be read in one read cycle (typically 30 to 40 nanoseconds). The NPU reads the weight parameters of the neural network from the NVM array through these read channels via the high-speed data read channel, at a bandwidth far higher than an off-chip NVM can support, which satisfies the parameter read speed required by conventional neural network inference.
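For a rough sense of the numbers (an illustrative calculation using example values from the ranges stated above, not figures given by the patent):

```python
# Illustrative on-chip read bandwidth estimate.
# N and the cycle time below are example values within the stated ranges.
N_BITS_PER_CYCLE = 256          # N read channels -> N bits per read cycle
CYCLE_NS = 35                   # one read cycle, roughly 30-40 ns per the text

bits_per_second = N_BITS_PER_CYCLE / (CYCLE_NS * 1e-9)
gigabytes_per_second = bits_per_second / 8 / 1e9
print(f"{gigabytes_per_second:.2f} GB/s")  # 0.91 GB/s
```

Even this mid-range example is well beyond what a serial off-chip NVM interface typically sustains, which is the point the paragraph above makes.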
Furthermore, the bit width of the high-speed data reading channel is m bits, and m is a positive integer; the chip architecture further comprises a data conversion unit, the data conversion unit comprises a cache module and a sequential reading module, the cache module is used for sequentially caching the weight parameters output by the reading channel according to cycles, the capacity of the cache module is N x k bits, and k represents the number of cycles; and the sequential reading module is used for converting the cache data in the cache module into m-bit wide and outputting the m-bit wide to the NPU through the high-speed data reading channel, wherein N x k is an integral multiple of m.
The architecture further comprises a data conversion unit for the case where the number of read channels does not match the bit width and/or frequency of the high-speed data read channel; it converts the read-channel data into words of the same bit width as the high-speed data read channel, typically a narrower width such as 32 bits. The NPU then reads data from the data conversion unit over the high-speed data read channel at its own clock frequency (which can exceed 1 GHz).
The data conversion unit of this scheme comprises a cache of N x k bits and a sequential reader that outputs m bits at a time, where N x k is an integral multiple of m. The read channel is connected to the NVM array and outputs N bits per cycle; the cache holds k cycles of data; the high-speed data read channel is m bits wide. The high-speed data read channel may include read/write command (CMD) and acknowledge (ACK) signals connected to the NVM array's read control circuitry. After a read operation completes, the ACK signal notifies the high-speed data read channel (and may simultaneously notify the on-chip bus), and the channel asynchronously transfers the cached data to the NPU in multiple m-bit reads via the sequential reader.
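The buffering and width conversion just described can be sketched behaviourally as follows (an illustration only; the CMD/ACK handshaking, clock domains and circuit details of the patent are omitted, and the class and method names are invented):

```python
class DataConversionUnit:
    """Behavioural sketch of the width-converting buffer described above.

    The NVM read channel delivers N bits per cycle; the cache holds k
    cycles (N*k bits); the sequential reader emits m-bit words, where
    N*k is an integral multiple of m.
    """

    def __init__(self, n_bits, k_cycles, m_bits):
        assert (n_bits * k_cycles) % m_bits == 0
        self.n, self.k, self.m = n_bits, k_cycles, m_bits
        self.cache = []           # bit list, capacity N*k

    def write_cycle(self, n_bit_word):
        """Cache one read-channel output (N bits) per read cycle."""
        assert len(n_bit_word) == self.n
        assert len(self.cache) + self.n <= self.n * self.k
        self.cache.extend(n_bit_word)

    def read_m_bits(self):
        """Sequential reader: pop the next m bits for the high-speed channel."""
        word, self.cache = self.cache[:self.m], self.cache[self.m:]
        return word

# Example: N=8, k=2, m=4 -> 16 cached bits drain as four 4-bit words
dcu = DataConversionUnit(8, 2, 4)
dcu.write_cycle([1, 0, 1, 0, 1, 0, 1, 0])
dcu.write_cycle([0, 0, 0, 0, 1, 1, 1, 1])
print(dcu.read_m_bits())  # [1, 0, 1, 0]
```

In hardware the two sides run on different clocks; the cache is what decouples the slow, wide NVM read side from the fast, narrow NPU side.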
Further, the chip architecture further includes a Static Random-Access Memory (SRAM), and the SRAM is communicatively connected to the NVM array, the external interface module, the NPU, and the MCU through the bus; the SRAM is used for caching data in the program execution process of the MCU, data in the NPU operation process and input and output data of the neural network model operation.
The chip architecture of this scheme includes an embedded SRAM that serves as the cache required by on-chip system operation and calculation, storing input and output data, intermediate results of computation, and so on. Specifically, it caches data while the MCU executes its program, holding the executable program, system configuration parameters and network structure configuration parameters at run time; it likewise buffers the NPU's working data and the input and output data of neural network model operation.
Further, a plurality of neural network models are stored in the NVM array, and the AI operation instruction includes an algorithm selection instruction, and the algorithm selection instruction is used for selecting one of the plurality of neural network models as an algorithm for AI calculation.
The neural network models in this scheme are stored digitally in the NVM array, and several models can be stored according to the number of application scenarios. Where different application scenarios require different neural network models, the MCU can flexibly select any one of the pre-stored models for AI calculation according to an externally input algorithm selection instruction. This overcomes the rigidity of existing in-memory computing schemes, whose analog array structures cannot support a flexible neural network structure.
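A minimal sketch of that model selection, assuming a hypothetical instruction layout in which the low nibble of the algorithm selection instruction indexes the stored models (the model names and encoding are invented for illustration):

```python
# Hypothetical table of neural network models stored in the NVM array,
# indexed by application scenario. Names are illustrative only.
STORED_MODELS = {
    0: "keyword-spotting-net",
    1: "face-detect-net",
    2: "anomaly-detect-net",
}

def select_model(algorithm_select_instruction):
    """Pick the stored model named by an algorithm selection instruction.

    Assumption for this sketch: the low nibble carries the model index.
    """
    model_id = algorithm_select_instruction & 0x0F
    if model_id not in STORED_MODELS:
        raise ValueError(f"no model stored at index {model_id}")
    return STORED_MODELS[model_id]

print(select_model(0x21))  # low nibble 1 -> face-detect-net
```

The MCU would then load the selected model's structure and weights from the NVM array before dispatching work to the NPU.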
Further, the NVM array may employ one of a flash memory process, an MRAM (Magnetoresistive Random-Access Memory) process, an RRAM (Resistive Random-Access Memory) process, an MTP (Multi-Time Programmable) process or an OTP (One-Time Programmable) process, and/or the interface standard of the external interface module is at least one of SPI (Serial Peripheral Interface), QPI (Quad SPI) and a parallel interface.
Further, the MCU is further configured to receive, through the external interface module, a data access instruction for operating the NVM array from the outside, and the MCU is further configured to complete logic control of basic operations of the NVM array based on the data access instruction.
Further, the NVM array employs one of a SONOS, Floating Gate or Split Gate flash memory process, and the interface standard of the external interface module is SPI and/or QPI;
the data access instruction is a standard flash memory operation instruction; the AI operation instruction and the data access instruction adopt the same instruction format and rule; the AI operation instruction comprises an operation code, and further comprises an address part and/or a data part, wherein the operation code of the AI operation instruction is different from the operation code of the standard flash memory operation instruction.
The chip architecture provided by the scheme is improved on the basis of the traditional flash memory chip architecture, specifically, the MCU and the NPU are embedded in the flash memory chip and are in communication connection through an on-chip Bus, and the on-chip Bus can be an Advanced High Performance Bus (AHB) or other communication Bus meeting the requirements, and is not limited herein. In the scheme, the NPU and the NVM are combined, namely calculation and storage are both in a chip, the weight parameters of the neural network are digitally stored in the NVM array, the neural network calculation is also digital domain calculation, and the NPU and the NVM array are controlled by the MCU based on an external AI operation instruction, so that the bottleneck of the off-chip NVM storage speed is broken through, the external input power consumption is reduced, and high implementability, flexibility and reliability are realized.
This scheme implements digital operation of the NVM array under MCU control, including basic flash memory operations such as read, write and erase; the external data access instructions and the external interface can follow the standard flash memory chip format, making the chip easy to apply flexibly and simply. The embedded MCU serves as the NVM's logic control unit, replacing the logic state machine of a standard flash memory, which simplifies the chip structure and saves chip area.
The NVM array in this scheme may further be configured to store externally input data beyond the neural network model, the weight parameters and the program run by the on-chip system: both other externally input data related to AI calculation and externally input data unrelated to it, the latter including system parameters, configurations and/or code of an external device or system. The basic operations include reading, writing and erasing the neural network model, the weight parameters and the program run by the internal system, as well as directly reading, writing and erasing the externally input data stored in the NVM array.
The instructions used for direct NVM operation and those used for AI calculation share the same instruction format and rules. Taking the SPI and QPI interfaces as an example, on the basis of the traditional SPI/QPI flash memory operation instructions, op_code values unused by flash operations are assigned to AI instructions; additional information is carried in the address part, and AI data transfer takes place during the data exchange period. AI calculation can thus be supported merely by extending the instruction decoder to multiplex the interface and adding a few status and configuration registers.
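The shared-format dispatch described above might look like the following sketch. The AI op_code values (0xA0-0xA2) and operation names are invented for illustration and are not defined by the patent; 0x03 and 0x02 are the conventional SPI flash read and page-program commands:

```python
# Illustrative decoder showing how otherwise-unused op_codes could mark
# AI instructions while standard flash op_codes keep their usual meaning.
FLASH_OPS = {0x03: "READ", 0x02: "PAGE_PROGRAM", 0x20: "SECTOR_ERASE"}
AI_OPS = {0xA0: "AI_LOAD_MODEL", 0xA1: "AI_RUN_INFERENCE", 0xA2: "AI_READ_RESULT"}

def decode(op_code):
    """Route an instruction: flash ops to NVM control logic, AI ops to the MCU."""
    if op_code in FLASH_OPS:
        return ("flash", FLASH_OPS[op_code])
    if op_code in AI_OPS:
        return ("ai", AI_OPS[op_code])
    raise ValueError(f"unknown op_code {op_code:#04x}")

print(decode(0x03))  # ('flash', 'READ')
print(decode(0xA1))  # ('ai', 'AI_RUN_INFERENCE')
```

Because both instruction classes share the op_code/address/data framing, an external host can drive AI calculation over the same SPI/QPI pins it already uses for flash access.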
Further, the chip architecture further includes a Direct Memory Access (DMA) channel, and the DMA channel is used for an external device to directly read and write the SRAM.
The positive progress effects of the invention are as follows:
the invention provides a chip architecture for AI calculation based on NVM, which combines NPU and NVM to calculate AI neural network, wherein the weight parameter of neural network is stored in NVM array in the chip in digital mode, the neural network calculation is also digital domain calculation, and is realized by MCU based on external AI calculation instruction to control NPU and NVM array, MCU controls NVM array to load weight parameter of neural network stored in it, program run by MCU and neural network model to calculate AI, compared with various existing storage schemes using NVM to perform analog calculation, the digital storage and calculation mode has flexible calculation structure, and the information stored in NVM has good reliability, high precision and high reading accuracy compared with analog signal multi-level storage, so that the invention breaks through the bottleneck of off-chip NVM storage speed and reduces external input power consumption, but also has high implementability, flexibility and reliability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram of neurons in a prior art AI algorithm;
FIG. 2 is a diagram of a prior art three-layer neural network;
FIG. 3 is a prior art convolutional neural network diagram;
FIG. 4 is a schematic diagram of a prior art AI calculation using additional circuitry within a standard NVM array;
FIG. 5 is a schematic diagram of a chip architecture for performing AI calculations based on NVM according to the present application;
FIG. 6 is a schematic diagram of a data conversion unit of the chip architecture of the present application;
FIG. 7 is a flowchart illustrating the operation of the chip architecture of the present application to invoke NVM read and write operations;
fig. 8 is a flowchart for executing an AI operation instruction based on the chip architecture of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
The basic architecture and relationship of neural networks and artificial intelligence, non-volatile storage and in-memory computation are explained first.
As previously mentioned: artificial Intelligence (AI) algorithms are generated by mimicking human brain structures, and connect to dendrites of a large number of other neurons through synapses between neurons to form a neuron network with simple functions, thereby realizing all human intelligence activities. Human memory and intelligence are generally believed to be stored in the different coupling strengths at each synapse.
Neural network algorithms, which emerged in the 1960s, mimic a neuron with a function. The function accepts multiple inputs, each with a different weight, and its output is the sum of each input multiplied by its weight, as shown in the exemplary AI-algorithm neuron diagram of FIG. 1. Learning, or training, is the process of adjusting the weights. The function's output feeds many other neurons, forming a network; this class of algorithms has achieved rich results and is widely applied. Practical neural networks have a layered structure: there is no communication among neurons within the same layer, and each neuron's input connects to the outputs of several or all neurons in the previous layer. The three-layer network of FIG. 2, for example, comprises an input layer, a hidden layer and an output layer, the input and hidden layers containing 784 and 15 neurons respectively. Different connection patterns exist between different layers; the network of FIG. 2 is fully connected.
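The layered, fully connected structure just described can be sketched as a minimal forward pass (an illustration only; the dimensions here are tiny stand-ins for the 784-input, 15-hidden-neuron example of FIG. 2, and activations are omitted):

```python
# Minimal fully connected forward pass: each neuron in a layer takes a
# weighted sum of all outputs of the previous layer.
def layer_forward(inputs, weight_rows):
    """One fully connected layer; one weight row per output neuron."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_rows]

inputs = [1.0, 2.0]                              # "input layer", 2 neurons
hidden_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hidden neurons
output_w = [[1.0, 1.0, 1.0]]                     # 1 output neuron
hidden = layer_forward(inputs, hidden_w)         # [1.0, 2.0, 3.0]
print(layer_forward(hidden, output_w))           # [6.0]
```

It is exactly these per-layer weight matrices that the chip architecture stores digitally in the NVM array and streams to the NPU.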
More common is the convolutional neural network shown in FIG. 3, in which both the input and the output have a two-dimensional structure (an image), with connections only between nearby points.
A practical neural network often has many layers, and its structure selectively includes one or more of convolutional layers, downsampling (pooling) layers and fully connected layers.
Non-volatile storage:
nonvolatile memory (NVM) is a semiconductor storage medium that can hold contents after power is turned off. Common NVMs include flash memory, EEPROM (Electrically Erasable Programmable read only memory), MRAM, RRAM, FeRAM (ferroelectric random access memory), MTP, OTP, and the like. The NVM which is most widely used at present is Flash Memory (NOR Flash Memory), in which a NOR Flash Memory structure has higher reliability and faster reading speed than a NAND Flash Memory structure, and is commonly used for storing system codes, parameters, algorithms, and the like. In particular applications, the system may employ an external stand-alone NVM or an embedded NVM embedded within the system. Embedded NVM is generally compatible with CMOS (Complementary Metal-Oxide-Semiconductor) Semiconductor processes, can be integrated with logic computing chips, and has a faster read speed in the system.
Compared with other NVMs, current flash memory has cost and capacity advantages, and many flash technologies on the market already store multiple bits per cell. Flash memory erases slowly (milliseconds) but reads much faster (nanoseconds), and the read speed of flash and other NVMs can support the high bandwidth required by neural network computation.
In-memory computing:
Since AI computation requires extremely high memory bandwidth, the architecture that separates the processor from memory/storage encounters a read-speed bottleneck, and the industry has begun to study in-memory computing architectures extensively. FIG. 4 shows a prior-art scheme that adds circuits within a standard NVM array to perform AI calculation: the non-volatile memory stores the weights required by the neural network, and analog circuits perform the vector multiplications, so that large-scale multiply-accumulate operations run in parallel, improving operation speed and saving power consumption.
In the prior art, NVM-based in-memory computation stores the weights of a neural network in non-volatile memory and realizes the computation by analog means. But because practical neural networks have many layers and very complicated connection structures, transmitting and processing analog signals between layers is very inconvenient and cannot support a flexible network structure, making implementation and application of a complete neural network model quite difficult; and the various noises and errors in analog storage, reading, writing and computation markedly limit the model's reliability and the accuracy of the calculation.
FIG. 5 is a diagram of a chip architecture for performing AI calculations based on NVM in accordance with the present invention. As shown in fig. 5, a chip architecture for NVM-based AI computation of the present invention includes NVM array 7, external interface module 2, SRAM5, NPU6, and MCU1 communicatively connected via bus 4. The MCU1 reads from and writes to the SRAM5 and internal NVM array 7 via the bus 4, and communicates with the NPU 6. The NVM array 7 is used to store the weight parameters of the digitized neural network, the program run by the MCU1, and the neural network model on-chip. The NPU6 is used for digital domain acceleration calculations for neural networks. The external interface module 2 is used for receiving external AI operation instructions, inputting data and outputting AI calculation results outwards. The MCU1 is used to execute the program stored in the NVM array 7 based on external AI operation instructions to control the NVM array 7 and NPU6 to perform AI calculations on the input data to obtain the AI calculation results.
The SRAM5 serves as the on-chip cache for system operation and calculation, storing input and output data, intermediate data generated during calculation, and the like. Specifically, it caches data while the MCU1 executes the program, holding the executable program, system configuration parameters, and computation-network structure configuration parameters at run time; it also serves as the working buffer of the NPU6, storing the input and output data of neural network model computation.
In the chip architecture provided by this embodiment, the NPU6 and the NVM are combined to perform AI neural network calculation. The weight parameters of the neural network are stored digitally in the on-chip NVM array 7, and the neural network calculation is likewise performed in the digital domain. Specifically, the MCU1 controls the NPU6 and the NVM array 7 based on an external AI operation instruction, loading the internally stored weight parameters, the program run by the MCU1, and the neural network model to carry out the AI calculation. Compared with the various existing schemes that use NVM for analog in-memory computation, this digital storage-and-computation approach offers a flexible calculation structure, good reliability of the stored information, high precision, and high read accuracy. The scheme provided by this embodiment therefore breaks through the speed bottleneck of using off-chip NVM and reduces external I/O power consumption, while also offering high implementability, flexibility, and reliability.
In one embodiment, the neural network models are stored digitally in the NVM array 7, and multiple neural network models may be stored there. The external AI operation instruction comprises an algorithm selection instruction, by which one of the neural network models is selected as the algorithm for AI calculation.
The neural network models in this embodiment are stored digitally in the NVM array 7, and there may be several of them, one per application scenario. When multiple application scenarios correspond to multiple neural network models, the MCU1 can flexibly select any one of the prestored models for AI calculation according to an externally input algorithm selection instruction, thereby overcoming the prior-art problem that an analog computation array integrating storage and calculation is rigid and unable to support flexible neural network structures.
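The algorithm-selection step can be sketched as a simple lookup from a selection code to a stored model. The model names and code values below are invented purely for illustration; the source does not specify an encoding.

```python
# Hypothetical mapping from algorithm-selection codes to models
# pre-stored in the NVM array. All codes and names are made up.
MODELS_IN_NVM = {
    0x01: "keyword-spotting CNN",
    0x02: "face-detection CNN",
    0x03: "anomaly-detection MLP",
}

def select_model(algorithm_select_code):
    """Return the stored model chosen by an algorithm selection instruction."""
    try:
        return MODELS_IN_NVM[algorithm_select_code]
    except KeyError:
        raise ValueError(f"no model stored for code {algorithm_select_code:#04x}")

print(select_model(0x02))  # face-detection CNN
```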
In one embodiment, the NVM array 7 employs one of (but is not limited to) a flash memory, MRAM, RRAM, MTP, or OTP process. The interface standard of the external interface module 2 is at least one of SPI, QPI, and a parallel interface.
In other embodiments, the NVM array 7 employs one of (but is not limited to) the SONOS, Floating Gate, and Split Gate flash memory processes. The interface standard of the external interface module 2 is SPI and/or QPI.
The chip architecture provided in this embodiment improves on the traditional flash memory chip architecture. Specifically, the MCU1 and the NPU6 are embedded in the flash memory chip and communicatively connected through the on-chip bus 4, where the bus 4 may be an AHB bus or any other communication bus meeting the requirements, without limitation here. In this scheme the NPU6 and the NVM are combined, so that both calculation and storage are on-chip: the weight parameters of the neural network are stored digitally in the NVM array 7, the neural network calculation is performed in the digital domain, and the MCU1 controls the NPU6 and the NVM array 7 based on external AI operation instructions. This breaks through the speed bottleneck of off-chip NVM storage, reduces external I/O power consumption, and achieves high implementability, flexibility, and reliability.
In one embodiment, in addition to communication over the on-chip bus 4, the chip architecture includes a high-speed data read channel set up between the NPU6 and the NVM array 7; the NPU6 uses this channel to read the weight parameters from the NVM array 7. The high-speed data read channel supports the bandwidth required for high-speed reading of the neural network weight parameters (i.e., the weight data) when the NPU6 performs digital-domain operations. Its bit width is m bits, where m is a positive integer.
In addition, the NVM array 7 is provided with N read channels, where N is a positive integer; together they read N bits of data in one read cycle, and the NPU6 reads the weight parameters from the NVM array 7 through these read channels via the high-speed data read channel. Preferably, N is 128-512, so in one read cycle (typically 30-40 ns) the NPU6 reads N bits of neural network weight parameters from the NVM array 7 over the m-bit-wide high-speed data read channel. This bandwidth is far higher than the read speed supportable by off-chip NVM in the prior art, and it satisfies the parameter-read speed required by common neural network inference.
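A back-of-the-envelope check of the bandwidth implied by these figures: N bits per read cycle of roughly 30-40 ns. The specific numbers plugged in below are illustrative, taken from the preferred range above.

```python
def read_bandwidth_gbps(n_bits, cycle_ns):
    """Sustained read bandwidth in gigabits per second.

    n_bits bits delivered per cycle of cycle_ns nanoseconds;
    bits per nanosecond is numerically equal to Gbit/s.
    """
    return n_bits / cycle_ns

# Upper end of the stated range: N = 512 bits per 40 ns cycle.
bw = read_bandwidth_gbps(512, 40)
print(f"{bw:.1f} Gbit/s ~= {bw / 8:.1f} GB/s")  # 12.8 Gbit/s ~= 1.6 GB/s
```

Even the lower end (128 bits per 32 ns, 4 Gbit/s) comfortably exceeds what a typical off-chip serial flash interface sustains, which is the point being made in the text.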
In one embodiment, the chip architecture further comprises a data conversion unit. For the case where the number of read channels does not match the bit width of the high-speed data read channel and/or the two operate at different (asynchronous) frequencies, the data conversion unit converts the data into groups whose width equals that of the high-speed data read channel, usually small-width words (for example, 32 bits). The NPU6 then reads data from the data conversion unit over the high-speed data read channel at its own clock frequency (which may exceed 1 GHz).
Fig. 6 is a schematic diagram of the data conversion unit of the chip architecture of the present application. As shown in fig. 6, the data conversion unit includes a buffer module and a sequential reading module. The buffer module buffers, cycle by cycle, the N bits of data output from the NVM array 7 via the read channels; its capacity is N*k bits, where k is the number of cycles. The sequential reading module converts the buffered data into m-bit-wide words and outputs them to the NPU6 through the high-speed data read channel, where N*k is an integer multiple of m.
In other words, the data conversion unit comprises a buffer of N*k bits and a sequential reader (the sequential reading module) that outputs m bits at a time, with N*k an integer multiple of m. The read channels are connected to the NVM array 7 and output N bits per cycle, the buffer can hold k cycles of data, and the high-speed data read channel is m bits wide. The high-speed data read channel may carry read/write command (CMD) and acknowledge (ACK) signals connected to the read control circuitry of the NVM array 7. After a read operation completes, the ACK signal notifies the high-speed data read channel (and may also notify the on-chip bus), and the sequential reading module then delivers the buffered data to the NPU6 asynchronously over multiple transfers.
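A minimal behavioral model of this width conversion: the buffer accumulates k cycles of N-bit reads, and the sequential reader regroups the same bits into m-bit words for the NPU. This is a functional sketch only; the real unit is hardware, and the MSB-first bit ordering below is an assumption.

```python
def convert(reads, n_bits, m_bits):
    """Regroup k words of n_bits each into words of m_bits, MSB-first.

    reads: list of k integers, each an N-bit word from the read channel.
    Models the buffer (capacity N*k bits) plus the sequential reader.
    """
    bits = "".join(format(r, f"0{n_bits}b") for r in reads)
    # The constraint stated in the text: N*k must be an integer multiple of m.
    assert len(bits) % m_bits == 0, "N*k must be an integer multiple of m"
    return [int(bits[i:i + m_bits], 2) for i in range(0, len(bits), m_bits)]

# Example: N = 8-bit reads over k = 2 cycles, regrouped into m = 4-bit words.
print(convert([0xAB, 0xCD], 8, 4))  # [10, 11, 12, 13]
```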
In one embodiment, the MCU1 also receives, via the external interface module 2, external data access instructions for operating the NVM array 7, and completes the logic control of the basic operations of the NVM array 7 based on those instructions, which are standard flash memory operation instructions. The AI operation instructions and the data access instructions adopt the same instruction format and rules: an AI operation instruction comprises an operation code and may further comprise an address part and/or a data part, and its operation code differs from the operation codes of the standard flash memory operation instructions.
Thus the instructions for direct NVM operation and the instructions for AI calculation use the same format and rules in this embodiment. Taking the SPI and QPI interfaces as examples, on top of the traditional SPI/QPI flash operation instructions, op_code values unused by flash operations are chosen to express AI instructions; additional information is carried in the address part, and AI data transfer takes place during the data exchange phase. AI calculation can be supported merely by extending the instruction decoder to multiplex the interface and adding a few status and configuration registers.
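This op_code multiplexing can be sketched as a decoder that leaves standard flash commands untouched and routes unused codes to the AI path. The flash op_codes shown (0x03 READ, 0x02 PAGE_PROGRAM, 0x20 SECTOR_ERASE) are common SPI NOR values, but the AI op_codes are invented for the example; the patent does not specify concrete values.

```python
# Standard SPI flash commands keep their usual op_codes...
STANDARD_FLASH_OPS = {0x03: "READ", 0x02: "PAGE_PROGRAM", 0x20: "SECTOR_ERASE"}
# ...while op_codes unused by flash operations express AI instructions
# (these two byte values are hypothetical).
AI_OPS = {0xA0: "AI_SELECT_ALGORITHM", 0xA1: "AI_RUN_INFERENCE"}

def decode(op_code):
    """Route an incoming op_code to the flash path or the AI path."""
    if op_code in STANDARD_FLASH_OPS:
        return ("flash", STANDARD_FLASH_OPS[op_code])
    if op_code in AI_OPS:
        return ("ai", AI_OPS[op_code])
    raise ValueError(f"unknown op_code {op_code:#04x}")

print(decode(0x03))  # ('flash', 'READ')
print(decode(0xA1))  # ('ai', 'AI_RUN_INFERENCE')
```

Because the two instruction families share one format, the external interface needs only this extended decoder plus a few registers, as the text notes.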
The MCU1 implements the digital operation of the NVM array 7, which may include the basic flash operations such as read, write, and erase; the external data access instructions and the external interface can adopt the standard flash memory chip format, making the chip easy to apply flexibly and simply. The MCU1 embedded in the chip serves as the logic control unit of the NVM, replacing the logic state machine of a standard flash memory, which simplifies the chip structure and saves chip area.
In this embodiment the NVM array 7 may store externally input data that is not limited to AI-related data: besides the neural network models, the weight parameters, and the program run by the on-chip system, it can store other externally input data related to AI calculation as well as externally input data unrelated to it, the latter including system parameters, configurations, and/or codes of an external device or system. The basic operations include reading, writing, and erasing the neural network models, the weight parameters, and the internally run program, as well as reading, writing, and erasing externally input data stored directly in the NVM array 7.
In the specific implementation, the MCU1 receives external commands for read and write operations on the NVM array 7 and completes the logic control of the basic NVM operations. These basic operations include storing and reading AI model algorithms and parameters, and can also be used to store and read system parameters, configurations, codes, and so on directly in the NVM array 7. The MCU1 also accepts external AI operation commands, controlling the internal AI operation logic and the input/output.
FIG. 7 is a flowchart of the execution of NVM read and write operation instructions in the chip architecture of the present application. As shown in fig. 7, the instruction execution flow is as follows:
step S101, the external device starts the chip where the NVM is located, and the MCU1 is powered on.
Step S102, without any external instruction, the MCU1 loads the required code and parameters from the NVM array 7 into the SRAM5 and runs them, leaving the chip in a standby state.
Step S103, the external device sends an NVM operation instruction, and the MCU1 receives and processes it; the format and handling of the NVM operation instruction are the same as for a conventional standard NVM.
In one embodiment, the chip architecture further includes a DMA channel 3, through which an external device can directly read from or write to the SRAM 5. The external interface module 2 multiplexes data and instructions, and the DMA channel 3 gives external devices direct read/write access to the on-chip SRAM5, improving data transfer efficiency. An external device can also use the SRAM5 as a system memory resource through the DMA channel 3, increasing the flexibility of chip applications.
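The DMA path can be modeled as direct reads/writes to the SRAM that bypass the MCU entirely. The SRAM size and addresses below are invented for the sketch; the source does not give a capacity.

```python
# Illustrative model of the DMA channel: the external device accesses
# the on-chip SRAM directly, with no MCU involvement.
SRAM_SIZE = 64 * 1024          # hypothetical 64 KiB on-chip SRAM
sram = bytearray(SRAM_SIZE)

def dma_write(addr, data):
    """External device -> SRAM, bypassing the MCU."""
    sram[addr:addr + len(data)] = data

def dma_read(addr, length):
    """SRAM -> external device, bypassing the MCU."""
    return bytes(sram[addr:addr + length])

dma_write(0x0200, b"\x01\x02\x03")
print(dma_read(0x0200, 3))  # b'\x01\x02\x03'
```

This is also how an external system could borrow the SRAM as general memory, as the text mentions: it simply addresses it over the DMA channel.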
Fig. 8 is a flowchart for executing an AI operation instruction based on the chip architecture of the present application. As shown in fig. 8, the AI operation instruction execution flow includes:
step S201, the external device starts the NVM chip, and the MCU1 is powered on.
Step S202, without any external command, the MCU1 loads the required code and parameters from the NVM array 7 into the SRAM5 and runs them, leaving the chip in a standby state.
Step S203, the external device sends an algorithm selection command to select a certain neural network model stored in the NVM array 7 of the chip.
Step S204, the MCU1 processes the instruction, and the corresponding internal storage module is powered on and addressed.
In step S205, the external device sends an AI operation command and input data, and the data is buffered in the SRAM 5.
In step S206, the MCU1 starts the NPU6, and recognizes the input data according to the AI operation command.
Step S207, NPU6 reads the weight parameter data corresponding to the neural network model from NVM array 7 for calculation.
In step S208, the external device reads the AI calculation result from the chip through the external interface module 2.
Steps S205 to S208 may be repeated to input, calculate and output data continuously.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A chip architecture for AI calculation based on NVM, characterized by comprising an NVM array, an external interface module, an NPU, and an MCU communicatively connected via a bus; the NVM array is used for on-chip storage of the digitized weight parameters of a neural network, the program run by the MCU, and neural network models; the NPU is used for digital-domain accelerated computation of the neural network; the external interface module is used to receive external AI operation instructions and input data, and to output the results of AI calculation; the MCU is used to execute the program based on the AI operation instructions, so as to control the NVM array and the NPU to perform AI calculation on the input data and obtain the results of the AI calculation.

2. The chip architecture for AI calculation based on NVM of claim 1, characterized in that the chip architecture further comprises a high-speed data read channel; the NPU is further used to read the weight parameters from the NVM array via the high-speed data read channel.

3. The chip architecture for AI calculation based on NVM of claim 2, characterized in that the NVM array is provided with read channels, the read channels being N in number, N being a positive integer; in one read cycle the read channels together read N bits of data, and the NPU is used to read the weight parameters from the NVM array through the read channels via the high-speed data read channel.

4. The chip architecture for AI calculation based on NVM of claim 3, characterized in that the bit width of the high-speed data read channel is m bits, m being a positive integer; the chip architecture further comprises a data conversion unit, the data conversion unit comprising a buffer module and a sequential reading module; the buffer module is used to buffer, cycle by cycle, the weight parameters output via the read channels, the capacity of the buffer module being N*k bits, where k denotes the number of cycles; the sequential reading module is used to convert the buffered data in the buffer module into m-bit width and output it to the NPU via the high-speed data read channel, where N*k is an integer multiple of m.

5. The chip architecture for AI calculation based on NVM of claim 1, characterized in that the chip architecture further comprises an SRAM communicatively connected via the bus with the NVM array, the external interface module, the NPU, and the MCU; the SRAM is used to buffer data while the MCU executes the program, data during NPU operation, and the input and output data of neural network model computation.

6. The chip architecture for AI calculation based on NVM of claim 1, characterized in that multiple neural network models are stored in the NVM array; the AI operation instruction comprises an algorithm selection instruction, used to select one of the multiple neural network models as the algorithm for AI calculation.

7. The chip architecture for AI calculation based on NVM of claim 1, characterized in that the NVM array adopts one of a flash memory process, an MRAM process, an RRAM process, an MTP process, and an OTP process, and/or the interface standard of the external interface module is at least one of SPI, QPI, and a parallel interface.

8. The chip architecture for AI calculation based on NVM of claim 7, characterized in that the MCU is further used to receive, via the external interface module, external data access instructions for operating the NVM array, and to complete the logic control of the basic operations of the NVM array based on the data access instructions.

9. The chip architecture for AI calculation based on NVM of claim 8, characterized in that the NVM array adopts one of the SONOS, Floating Gate, and Split Gate flash memory processes, and the interface standard of the external interface module is SPI and/or QPI; the data access instructions are standard flash memory operation instructions; the AI operation instructions adopt the same instruction format and rules as the data access instructions; an AI operation instruction comprises an operation code and further comprises an address part and/or a data part, the operation code of the AI operation instruction being different from the operation codes of the standard flash memory operation instructions.

10. The chip architecture for AI calculation based on NVM of claim 5, characterized in that the chip architecture further comprises a DMA channel, used by external devices to directly read from and write to the SRAM.
CN202110541351.3A 2021-05-18 2021-05-18 Chip architecture for AI computing based on NVM Active CN113127407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110541351.3A CN113127407B (en) 2021-05-18 2021-05-18 Chip architecture for AI computing based on NVM

Publications (2)

Publication Number Publication Date
CN113127407A true CN113127407A (en) 2021-07-16
CN113127407B CN113127407B (en) 2025-08-01

Family

ID=76782712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110541351.3A Active CN113127407B (en) 2021-05-18 2021-05-18 Chip architecture for AI computing based on NVM

Country Status (1)

Country Link
CN (1) CN113127407B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018049648A1 (en) * 2016-09-18 2018-03-22 深圳市大疆创新科技有限公司 Data conversion apparatus, chip, method and device, and image system
US20180157970A1 (en) * 2016-12-01 2018-06-07 Via Alliance Semiconductor Co., Ltd. Processor with memory array operable as either cache memory or neural network unit memory
CN108133269A (en) * 2016-12-01 2018-06-08 上海兆芯集成电路有限公司 With the processor of memory array that cache memory or neural network cell memory can be used as to operate
CN110334799A (en) * 2019-07-12 2019-10-15 电子科技大学 Neural Network Reasoning and Training Accelerator Based on Storage and Computing Integration and Its Operation Method
CN110516801A (en) * 2019-08-05 2019-11-29 西安交通大学 A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput
CN214846708U (en) * 2021-05-18 2021-11-23 南京优存科技有限公司 Chip architecture for AI calculation based on NVM

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023016030A1 (en) * 2021-08-11 2023-02-16 华为技术有限公司 Neural network parameter deployment method, ai integrated chip, and related apparatus thereof
CN114897150A (en) * 2022-04-01 2022-08-12 中国科学技术大学苏州高等研究院 Reliability design method of AI intelligent module
CN115238876A (en) * 2022-07-19 2022-10-25 北京苹芯科技有限公司 Memory neural network computing device and method based on heterogeneous storage
CN116775554A (en) * 2023-06-27 2023-09-19 无锡中微亿芯有限公司 A storage and computing architecture FPGA that supports instruction broadcasting
CN119761435A (en) * 2024-12-11 2025-04-04 北京邮电大学 Method and electronic device for deploying neural network in analog in-memory computing NPU
CN119761435B (en) * 2024-12-11 2025-12-02 北京邮电大学 Methods and electronic devices for deploying neural networks in analog in-memory computing NPUs
CN119884007A (en) * 2024-12-16 2025-04-25 联和存储科技(江苏)有限公司 Data processing method based on AI model, chip architecture and intelligent terminal

Also Published As

Publication number Publication date
CN113127407B (en) 2025-08-01

Similar Documents

Publication Publication Date Title
CN113127407A (en) Chip architecture for AI calculation based on NVM
CN110390385B (en) BNRP-based configurable parallel general convolutional neural network accelerator
JP7628112B2 (en) Memory chips connecting system-on-chips and accelerator chips
JP2019204492A (en) Neuromorphic accelerator multitasking
CN111860773B (en) Processing device and method for information processing
CN112633505A (en) RISC-V based artificial intelligence reasoning method and system
CN112988082B (en) Chip system for AI calculation based on NVM and operation method thereof
CN110738316A (en) Operation method, device and electronic device based on neural network
US20250111217A1 (en) Data layout conscious processing in memory architecture for executing neural network model
CN113157638B (en) Low-power-consumption in-memory calculation processor and processing operation method
US20220391128A1 (en) Techniques to repurpose static random access memory rows to store a look-up-table for processor-in-memory operations
CN115443467A (en) Integrated circuit device with deep learning accelerator and random access memory
KR20220052355A (en) Copying data from memory system with AI mode
CN116635936A (en) Memory configuration for supporting deep learning accelerator in integrated circuit device
CN117952165A (en) A method for implementing a neural network accelerator based on FPGA
CN214846708U (en) Chip architecture for AI calculation based on NVM
CN114968362A (en) Heterogeneous fused computing instruction set and method of use
CN110232441B (en) Stack type self-coding system and method based on unidirectional pulsation array
CN117234720A (en) Dynamically configurable storage and computing fusion data cache structure, processor and electronic equipment
CN116529818B (en) Dual-port, dual-function memory device
CN116050492A (en) an extension unit
CN119721149B (en) A general-purpose CNN accelerator based on ZYNQ and its usage.
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
CN118485111A (en) Convolutional resource scheduling device, method and equipment for pulse neural network
WO2020051918A1 (en) Neuronal circuit, chip, system and method therefor, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant