
TWI907830B - Input circuit for artificial neural network array and operating methods thereof

Info

Publication number: TWI907830B
Application number: TW112131057A
Authority: TW
Inventors: 曉萬陳; 順武; 史蒂芬鄭; 史丹利洪; 義樂; 賢范
Original assignee: 美商超捷公司
Priority date: 2022-09-22
Filing date: 2023-08-18
Publication date: 2025-12-11

Abstract

Numerous examples are disclosed of input circuitry and associated methods in an artificial neural network. In one example, a system comprises a plurality of address decoders to receive an address and output a plurality of row enabling signals in response to the address; a first plurality of registers to store, sequentially, activation data in response to the plurality of row enabling signals; and a second plurality of registers to store, in parallel, activation data received from the first plurality of registers.
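The register arrangement in the abstract can be pictured as a double-buffered input stage. The following Python sketch is a behavioral illustration only, under assumptions not stated in the source: the class and method names are hypothetical, each address is assumed to enable exactly one row register, and the parallel transfer is modeled as a single method call.

```python
class InputStage:
    """Behavioral model: sequential load into a first register bank,
    then a parallel copy into a second bank (hypothetical names)."""

    def __init__(self, num_rows):
        self.first = [0] * num_rows   # loaded one row at a time
        self.second = [0] * num_rows  # updated all at once

    def decode(self, address):
        # One-hot row-enable signals; assumes one row per address.
        return [i == address for i in range(len(self.first))]

    def load(self, address, activation):
        # Store activation data only in the row whose enable is asserted.
        for row, enabled in enumerate(self.decode(address)):
            if enabled:
                self.first[row] = activation

    def transfer(self):
        # Parallel transfer: the second bank takes a snapshot of the first.
        self.second = list(self.first)


stage = InputStage(num_rows=4)
for addr, value in enumerate([3, 1, 4, 1]):
    stage.load(addr, value)      # sequential loading
stage.transfer()                 # parallel hand-off
stage.load(0, 9)                 # next vector starts loading
print(stage.second)              # [3, 1, 4, 1]
```

In this model the second bank keeps holding the previous vector while the first bank is refilled, which is one way to picture pipelined operation.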

Description

Input circuit for artificial neural network array and operating methods thereof

This application claims priority to U.S. Patent Application No. 18/077,686, filed on December 8, 2022, and titled "Input Circuit for Artificial Neural Network Array," and to U.S. Provisional Application No. 63/409,140, filed on September 22, 2022, and titled "Input and Output Circuits for Parallel and Pipeline Operation in Artificial Neural Network Array."

Numerous examples of input circuitry and associated methods are disclosed for implementing parallel and pipelined operation in an artificial neural network.

Artificial neural networks mimic biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks generally include layers of interconnected "neurons" that exchange messages with each other.

Figure 1 illustrates an artificial neural network, where the circles represent the inputs or layers of neurons. The connections (called synapses) are represented by arrows and have numeric weights that can be tuned based on experience. This makes the neural network adaptive to inputs and capable of learning. Typically, a neural network includes a layer of multiple inputs. There are typically one or more intermediate layers of neurons, and an output layer of neurons that provides the output of the neural network. The neurons at each level individually or collectively make a decision based on the data received from the synapses.

One of the major challenges in the development of artificial neural networks for high-performance information processing is a lack of adequate hardware technology. Indeed, practical neural networks rely on a very large number of synapses, enabling high connectivity between neurons, i.e., a very high computational parallelism. In principle, such complexity can be achieved with digital supercomputers or specialized graphics processing unit clusters. However, in addition to high cost, these approaches also suffer from mediocre energy efficiency compared to biological networks, which consume much less energy primarily because they perform low-precision analog computation. CMOS analog circuits have been used for artificial neural networks, but most CMOS-implemented synapses have been too bulky given the large number of neurons and synapses.

Applicant previously disclosed an artificial (analog) neural network that utilizes one or more non-volatile memory arrays as the synapses in U.S. Patent Application Publication 2017/0337466A1, which is incorporated by reference. The non-volatile memory arrays operate as an analog neural memory and comprise non-volatile memory cells arranged in rows and columns. The neural network includes a first plurality of synapses configured to receive a first plurality of inputs and to generate therefrom a first plurality of outputs, and a first plurality of neurons configured to receive the first plurality of outputs. The first plurality of synapses includes a plurality of memory cells, wherein each of the memory cells includes: spaced-apart source and drain regions formed in a semiconductor substrate, with a channel region extending between them; a floating gate disposed over and insulated from a first portion of the channel region; and a non-floating gate disposed over and insulated from a second portion of the channel region. Each of the plurality of memory cells stores a weight value corresponding to the number of electrons on the floating gate. The plurality of memory cells multiply the first plurality of inputs by the stored weight values to generate the first plurality of outputs.

Non-volatile memory cells:

Non-volatile memories are well known. For example, U.S. Patent 5,029,130 ("the '130 patent"), which is incorporated herein by reference, discloses an array of split gate non-volatile memory cells, which are a type of flash memory cell. Such a memory cell 210 is shown in Figure 2. Each memory cell 210 includes source region 14 and drain region 16 formed in semiconductor substrate 12, with channel region 18 between them. Floating gate 20 is formed over and insulated from (and controls the conductivity of) a first portion of the channel region 18, and over a portion of the source region 14. Word line terminal 22 (which is typically coupled to a word line) has a first portion that is disposed over and insulated from (and controls the conductivity of) a second portion of the channel region 18, and a second portion that extends up and over the floating gate 20. The floating gate 20 and word line terminal 22 are insulated from the substrate 12 by a gate oxide. Bit line 24 is coupled to drain region 16.

Memory cell 210 is erased (where electrons are removed from the floating gate) by placing a high positive voltage on the word line terminal 22, which causes electrons on the floating gate 20 to tunnel through the intermediate insulation from the floating gate 20 to the word line terminal 22 via Fowler-Nordheim (FN) tunneling.

Memory cell 210 is programmed by source side injection (SSI) with hot electrons (where electrons are placed on the floating gate) by placing a positive voltage on the word line terminal 22 and a positive voltage on the source region 14. Electron current will flow from the drain region 16 toward the source region 14. The electrons will accelerate and become heated when they reach the gap between the word line terminal 22 and the floating gate 20. Some of the heated electrons will be injected through the gate oxide onto the floating gate 20 due to the attractive electrostatic force from the floating gate 20.

Memory cell 210 is read by placing positive read voltages on the drain region 16 and word line terminal 22 (which turns on the portion of the channel region 18 under the word line terminal). If the floating gate 20 is positively charged (i.e., erased of electrons), then the portion of the channel region 18 under the floating gate 20 is turned on as well, and current will flow across the channel region 18, which is sensed as the erased or "1" state. If the floating gate 20 is negatively charged (i.e., programmed with electrons), then the portion of the channel region under the floating gate 20 is mostly or entirely turned off, and current will not flow (or there will be little flow) across the channel region 18, which is sensed as the programmed or "0" state.

Table 1 depicts typical voltage and current ranges that can be applied to the terminals of memory cell 210 for performing read, erase, and program operations:

Other split gate memory cell configurations, which are other types of flash memory cells, are known. For example, Figure 3 depicts a four-gate memory cell 310 comprising source region 14, drain region 16, floating gate 20 over a first portion of channel region 18, a select gate 22 (typically coupled to a word line, WL) over a second portion of the channel region 18, a control gate 28 over the floating gate 20, and an erase gate 30 over the source region 14. This configuration is described in U.S. Patent 6,747,310, which is incorporated herein by reference for all purposes. Here, all gates are non-floating gates except floating gate 20, meaning that they are electrically connected or connectable to a voltage source. Programming is performed by heated electrons from the channel region 18 injecting themselves onto the floating gate 20. Erasing is performed by electrons tunneling from the floating gate 20 to the erase gate 30.

Table 2 depicts typical voltage and current ranges that can be applied to the terminals of memory cell 310 for performing read, erase, and program operations:

Figure 4 depicts a three-gate memory cell 410, which is another type of flash memory cell. Memory cell 410 is identical to the memory cell 310 of Figure 3 except that memory cell 410 does not have a separate control gate. The erase operation (whereby erasing occurs through use of the erase gate) and the read operation are similar to those of Figure 3, except that no control gate bias is applied. The programming operation is also done without the control gate bias, and as a result, a higher voltage is applied on the source line during a program operation to compensate for the lack of control gate bias.

Table 3 depicts typical voltage and current ranges that can be applied to the terminals of memory cell 410 for performing read, erase, and program operations:

Figure 5 depicts stacked gate memory cell 510, which is another type of flash memory cell. Memory cell 510 is similar to memory cell 210 of Figure 2, except that floating gate 20 extends over the entire channel region 18, and control gate 22 (which here will be coupled to a word line) extends over floating gate 20, separated by an insulating layer (not shown). Erasing is done by FN tunneling of electrons from the floating gate (FG) to the substrate. Programming is done by channel hot electron (CHE) injection at the region between the channel region 18 and the drain region 16, with electrons flowing from the source region 14 toward the drain region 16. The read operation is similar to that for memory cell 210, but with a higher control gate voltage.

Table 4 depicts typical voltage ranges that can be applied to the terminals of memory cell 510 and substrate 12 for performing read, erase, and program operations:

The methods and means described herein may apply to other non-volatile memory technologies such as, without limitation, FINFET split gate flash or stacked gate flash memory, NAND flash, SONOS (silicon-oxide-nitride-oxide-silicon, charge trap in nitride), MONOS (metal-oxide-nitride-oxide-silicon, metal charge trap in nitride), ReRAM (resistive RAM), PCM (phase change memory), MRAM (magnetic RAM), FeRAM (ferroelectric RAM), CT (charge trap) memory, CN (carbon-tube) memory, OTP (bi-level or multi-level one-time programmable), and CeRAM (correlated electron RAM).

In order to utilize memory arrays comprising one of the types of non-volatile memory cells described above in an artificial neural network, two modifications are made. First, the lines are configured so that each memory cell can be individually programmed, erased, and read without adversely affecting the memory state of other memory cells in the array, as further explained below. Second, continuous (analog) programming of the memory cells is provided.

Specifically, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be continuously changed from a fully erased state to a fully programmed state, and vice versa, independently and with minimal disturbance of other memory cells. This means the cell storage is effectively analog, or at the very least can store one of many discrete values (such as 16 or 64 different values), which allows for very precise and individual tuning of all the memory cells in the memory array, and which makes the memory array ideal for storing and making fine-tuning adjustments to the synaptic weights of the neural network.
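As a rough illustration of what storing one of 16 or 64 discrete values means for weight precision, the sketch below maps a normalized weight to the nearest available level. The function name, the normalization to the range 0 to 1, and the even spacing of levels are assumptions made for the example; the source does not specify how levels are spaced.

```python
def quantize_weight(weight, levels):
    """Map a weight normalized to [0, 1] onto the nearest of `levels`
    evenly spaced values (even spacing is an assumption)."""
    if levels < 2:
        raise ValueError("need at least two levels")
    clipped = min(max(weight, 0.0), 1.0)
    steps = levels - 1
    return round(clipped * steps) / steps


for levels in (16, 64):
    stored = quantize_weight(0.37, levels)
    # The worst-case error is half a step, so more levels give finer tuning.
    print(levels, stored, abs(stored - 0.37))
```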

Neural networks employing non-volatile memory cell arrays:

Figure 6 conceptually illustrates a non-limiting example of a neural network utilizing a non-volatile memory array of the present examples. This example uses the non-volatile memory array neural network for a facial recognition application, but any other appropriate application could be implemented using a non-volatile memory array based neural network.

S0 is the input layer, which for this example is a 32×32 pixel RGB image with 5-bit precision (i.e., three 32×32 pixel arrays, one for each color R, G, and B, each pixel being 5-bit precision). The synapses CB1 going from input layer S0 to layer C1 apply different sets of weights in some instances and shared weights in other instances, and scan the input image with 3×3 pixel overlapping filters (kernels), shifting the filter by 1 pixel (or more than 1 pixel, as dictated by the model). Specifically, the values of the 9 pixels in a 3×3 portion of the image (referred to as a filter or kernel) are provided to the synapses CB1, where these 9 input values are multiplied by the appropriate weights and, after summing the outputs of that multiplication, a single output value is determined and provided by a first synapse of CB1 for generating a pixel of one of the feature maps of layer C1. The 3×3 filter is then shifted one pixel to the right within input layer S0 (i.e., adding the column of three pixels on the right and dropping the column of three pixels on the left), whereby the 9 pixel values in this newly positioned filter are provided to the synapses CB1, where they are multiplied by the same weights and a second single output value is determined by the associated synapse. This process continues until the 3×3 filter has scanned the entire 32×32 pixel image of input layer S0, for all three colors and for all bits (precision values). The process is then repeated using different sets of weights to generate a different feature map of layer C1, until all the feature maps of layer C1 have been calculated.
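The scanning procedure described above can be sketched in software as follows. This is a minimal illustration for a single color plane, not the analog implementation; the function name, the test image, and the kernel weights are placeholders chosen for the example.

```python
def scan_filter(image, kernel, stride=1):
    """Slide a square `kernel` over a square `image` and return one
    feature map, visiting only positions where the kernel fits."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for r in range(0, out_size * stride, stride):
        row = []
        for c in range(0, out_size * stride, stride):
            # Multiply the k*k pixels by the shared weights and sum them.
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(k) for j in range(k))
            row.append(acc)
        feature_map.append(row)
    return feature_map


# 32x32 plane of 5-bit values (0..31) and placeholder 3x3 weights.
image = [[(r + c) % 32 for c in range(32)] for r in range(32)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
fmap = scan_filter(image, kernel)
print(len(fmap), len(fmap[0]))  # 30 30
```

Repeating the call with a different `kernel` corresponds to generating another feature map of layer C1 from a different weight set.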

In layer C1, in the present example, there are 16 feature maps, with 30×30 pixels each. Each pixel is a new feature pixel extracted from multiplying the inputs and the kernel, and therefore each feature map is a two-dimensional array, and thus in this example layer C1 constitutes 16 layers of two-dimensional arrays (keeping in mind that the layers and arrays referenced herein are logical relationships, not necessarily physical relationships; that is, the arrays are not necessarily oriented in physical two-dimensional arrays). Each of the 16 feature maps in layer C1 is generated by one of sixteen different sets of synapse weights applied to the filter scans. The C1 feature maps could all be directed to different aspects of the same image feature, such as boundary identification. For example, the first map (generated using a first weight set, shared for all scans used to generate this first map) could identify circular edges, the second map (generated using a second weight set different from the first weight set) could identify rectangular edges, or the aspect ratio of certain features, and so on.

An activation function P1 (pooling) is applied before going from layer C1 to layer S1, which pools values from consecutive, non-overlapping 2×2 regions in each feature map. The purpose of the pooling function P1 is to average out the nearby locations (or a max function can also be used), for example to reduce the dependence on edge location and to reduce the data size before going to the next stage. At layer S1, there are 16 15×15 feature maps (i.e., sixteen different arrays of 15×15 pixels each). The synapses CB2 going from layer S1 to layer C2 scan the maps in layer S1 with 4×4 filters, with a filter shift of 1 pixel. At layer C2, there are 22 12×12 feature maps. An activation function P2 (pooling) is applied before going from layer C2 to layer S2, which pools values from consecutive, non-overlapping 2×2 regions in each feature map. At layer S2, there are 22 6×6 feature maps. An activation function (pooling) is applied at the synapses CB3 going from layer S2 to layer C3, where every neuron in layer C3 connects to every map in layer S2 via a respective synapse of CB3. At layer C3, there are 64 neurons. The synapses CB4 going from layer C3 to the output layer S3 fully connect C3 to S3, i.e., every neuron in layer C3 is connected to every neuron in layer S3. The output at layer S3 includes 10 neurons, where the highest output neuron determines the class. This output could, for example, be indicative of an identification or classification of the contents of the original image.
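The feature map sizes quoted above follow from simple arithmetic, which the sketch below reproduces along with a 2×2 average pooling step. The helper names are illustrative and are not taken from the source.

```python
def conv_size(size, kernel, stride=1):
    """Side length of a feature map after a filter scan."""
    return (size - kernel) // stride + 1


def pool_size(size, window=2):
    """Side length after non-overlapping window x window pooling."""
    return size // window


def average_pool(fmap, window=2):
    """Average consecutive, non-overlapping window x window regions."""
    n = len(fmap) // window
    return [[sum(fmap[r * window + i][c * window + j]
                 for i in range(window) for j in range(window)) / window ** 2
             for c in range(n)] for r in range(n)]


s0 = 32
c1 = conv_size(s0, 3)   # 3x3 filter, shift of 1
s1 = pool_size(c1)      # 2x2 pooling
c2 = conv_size(s1, 4)   # 4x4 filter, shift of 1
s2 = pool_size(c2)      # 2x2 pooling
print(c1, s1, c2, s2)   # 30 15 12 6

print(average_pool([[1, 3, 5, 7],
                    [1, 3, 5, 7],
                    [2, 2, 4, 4],
                    [2, 2, 4, 4]]))  # [[2.0, 6.0], [2.0, 4.0]]
```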

Each layer of synapses is implemented using an array, or a portion of an array, of non-volatile memory cells.

Figure 7 is a block diagram of an array that can be used for that purpose. Vector-by-matrix multiplication (VMM) array 32 includes non-volatile memory cells and is utilized as the synapses (such as CB1, CB2, CB3, and CB4 in Figure 6) between one layer and the next layer. Specifically, VMM array 32 includes an array of non-volatile memory cells 33, erase gate and word line gate decoder 34, control gate decoder 35, bit line decoder 36, and source line decoder 37, which decode the respective inputs for the non-volatile memory cell array 33. Input to VMM array 32 can be from the erase gate and word line gate decoder 34 or from the control gate decoder 35. Source line decoder 37 in this example also decodes the output of the non-volatile memory cell array 33. Alternatively, bit line decoder 36 can decode the output of the non-volatile memory cell array 33.

Non-volatile memory cell array 33 serves two purposes. First, it stores the weights that will be used by the VMM array 32. Second, the non-volatile memory cell array 33 effectively multiplies the inputs by the weights stored in the array and adds them up per output line (source line or bit line) to produce the output, which will be the input to the next layer or the input to the final layer. By performing the multiplication and addition functions, the non-volatile memory cell array 33 removes the need for separate multiplication and addition logic circuits, and is also power efficient due to its in-situ memory computation.
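The multiply-and-add behavior of the array can be mirrored numerically. The sketch below is an idealized software analogy only: each entry of `weights` stands for one cell's stored weight, each column stands for one output line, and the names and values are illustrative.

```python
def vmm(inputs, weights):
    """Vector-by-matrix multiply: outputs[j] = sum_i inputs[i] * weights[i][j].
    Row i models the cells driven by input i; column j models one output line."""
    outputs = [0.0] * len(weights[0])
    for x, weight_row in zip(inputs, weights):
        for j, w in enumerate(weight_row):
            # Each cell contributes (input * stored weight) to its output line.
            outputs[j] += x * w
    return outputs


inputs = [1.0, 2.0, 3.0]
weights = [[0.5, 1.0],
           [0.25, 0.0],
           [1.0, 2.0]]
print(vmm(inputs, weights))  # [4.0, 7.0]
```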

The output of non-volatile memory cell array 33 is supplied to a differential summer (such as a summing op-amp or a summing current mirror) 38, which sums up the outputs of the non-volatile memory cell array 33 to create a single value for that convolution. The differential summer 38 is arranged to perform summation of both positive weights and negative weights.

The summed output values of differential summer 38 are then supplied to an activation function block 39, which rectifies the output. The activation function block 39 may provide a sigmoid, tanh, or ReLU function. The rectified output values of activation function block 39 become an element of a feature map of the next layer (e.g., C1 in Figure 6), and are then applied to the next synapse to produce the next feature map layer or final layer. Therefore, in this example, non-volatile memory cell array 33 constitutes a plurality of synapses (which receive their inputs from the prior layer of neurons or from an input layer such as an image database), and summing op-amp 38 and activation function block 39 constitute a plurality of neurons.
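A behavioral sketch of the summer-plus-activation stage follows. It assumes, purely for illustration, that positive and negative weights are carried on separate groups of output lines whose totals are subtracted; the source states only that the differential summer handles both positive and negative weights. All names are hypothetical.

```python
import math


def differential_sum(pos_currents, neg_currents):
    """Total of the positive-weight lines minus total of the negative-weight lines."""
    return sum(pos_currents) - sum(neg_currents)


# The three activation functions named in the text.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}


def neuron(pos_currents, neg_currents, activation="relu"):
    """Differential summation followed by the selected activation."""
    return ACTIVATIONS[activation](differential_sum(pos_currents, neg_currents))


print(neuron([0.5, 0.25], [1.0]))        # 0.0  (negative sum is clipped by ReLU)
print(neuron([1.0, 0.5], [0.25]))        # 1.25
print(neuron([0.5], [0.5], "sigmoid"))   # 0.5
```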

The input to VMM array 32 in Figure 7 (WLx, EGx, CGx, and optionally BLx and SLx) can be analog level, binary level, or digital bits (in which case a DAC is provided to convert the digital bits to the appropriate input analog level), and the output can be analog level, binary level, or digital bits (in which case an output ADC is provided to convert the output analog level into digital bits).

Figure 8 is a block diagram depicting the usage of numerous layers of VMM arrays 32, here labeled as VMM arrays 32a, 32b, 32c, 32d, and 32e. As shown in Figure 8, the input, denoted Inputx, is converted from digital to analog by a digital-to-analog converter 31 and provided to input VMM array 32a. The converted analog inputs could be voltage or current. The input D/A conversion for the first layer could be done by using a function or a lookup table (LUT) that maps the inputs Inputx to appropriate analog levels for the matrix multiplier of input VMM array 32a. The input conversion could also be done by an analog-to-analog (A/A) converter to convert an external analog input to a mapped analog input to the input VMM array 32a.

The output generated by input VMM array 32a is provided as an input to the next VMM array (hidden level 1) 32b, which in turn generates an output that is provided as an input to the next VMM array (hidden level 2) 32c, and so on. The various layers of VMM array 32 function as different layers of synapses and neurons of a convolutional neural network (CNN). Each VMM array 32a, 32b, 32c, 32d, and 32e can be a stand-alone, physical non-volatile memory array, or multiple VMM arrays could utilize different portions of the same physical non-volatile memory array, or multiple VMM arrays could utilize overlapping portions of the same physical non-volatile memory array. The example shown in Figure 8 contains five layers (32a, 32b, 32c, 32d, 32e): one input layer (32a), two hidden layers (32b, 32c), and two fully connected layers (32d, 32e). One of ordinary skill in the art will appreciate that this is merely an example and that a system could instead comprise more than two hidden layers and more than two fully connected layers.

Vector-by-matrix multiplication (VMM) arrays:

Figure 9 depicts neuron VMM array 900, which is particularly suited for memory cells 310 as shown in Figure 3, and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 900 comprises memory array 901 of non-volatile memory cells and reference array 902 (at the top of the array) of non-volatile reference memory cells. Alternatively, another reference array can be placed at the bottom.

In VMM array 900, control gate lines, such as control gate line 903, run in a vertical direction (hence reference array 902 in the row direction is orthogonal to control gate line 903), and erase gate lines, such as erase gate line 904, run in a horizontal direction. Here, the inputs to VMM array 900 are provided on the control gate lines (CG0, CG1, CG2, CG3), and the output of VMM array 900 emerges on the source lines (SL0, SL1). In one example, only even rows are used, and in another example, only odd rows are used. The current placed on each source line (SL0, SL1, respectively) performs a summing function of all the currents from the memory cells connected to that particular source line.

As described herein for neural networks, the non-volatile memory cells of VMM array 900, i.e., the memory cells 310 of VMM array 900, may be configured to operate in a sub-threshold region.

The non-volatile reference memory cells and the non-volatile memory cells described herein are biased in weak inversion (the sub-threshold region):

$$I_{ds} = I_o \, e^{(V_g - V_{th})/(n V_t)} = w \, I_o \, e^{V_g/(n V_t)}, \qquad w = e^{-V_{th}/(n V_t)}$$

where $I_{ds}$ is the drain-to-source current; $V_g$ is the gate voltage on the memory cell; $V_{th}$ is the threshold voltage of the memory cell; $V_t$ is the thermal voltage, $V_t = kT/q$, with $k$ the Boltzmann constant, $T$ the temperature in Kelvin, and $q$ the electronic charge; $n$ is a slope factor, $n = 1 + (C_{dep}/C_{ox})$, with $C_{dep}$ the capacitance of the depletion layer and $C_{ox}$ the capacitance of the gate oxide layer; and $I_o$ is the memory cell current at a gate voltage equal to the threshold voltage. $I_o$ is proportional to $(W_t/L) \cdot u \cdot C_{ox} \cdot (n-1) \cdot V_t^2$, where $u$ is the carrier mobility and $W_t$ and $L$ are the width and length, respectively, of the memory cell.

For an I-to-V log converter that uses a memory cell (such as a reference memory cell or a peripheral memory cell) or a transistor to convert an input current into an input voltage:

$$V_g = n \, V_t \, \log\left[\frac{I_{ds}}{w_p \, I_o}\right]$$

Here, $w_p$ is $w$ of a reference or peripheral memory cell.

For a memory array used as a vector-by-matrix multiplier (VMM) array with a current input, the output current is:

$$I_{out} = w_a \, I_o \, e^{V_g/(n V_t)}$$

that is,

$$I_{out} = \frac{w_a}{w_p} \, I_{in} = W \, I_{in}, \qquad W = e^{(V_{thp} - V_{tha})/(n V_t)}$$

Here, $w_a$ is $w$ of each memory cell in the memory array.
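The relationship Iout = (wa/wp) * Iin can be checked numerically. The sketch below assumes the usual exponential sub-threshold characteristic, Ids = w * Io * exp(Vg / (n * Vt)) with w = exp(-Vth / (n * Vt)), and treats the logarithm in the converter as a natural logarithm. The function names and all numeric values (slope factor, Io, and threshold voltages) are illustrative assumptions, not figures from the source.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin=300.0):
    """Vt = k*T/q."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def w_factor(vth, n, vt):
    """w = exp(-Vth / (n*Vt)) (assumed form)."""
    return math.exp(-vth / (n * vt))


def current_to_gate_voltage(i_in, wp, io, n, vt):
    """I-to-V log converter: Vg = n*Vt*ln(Iin / (wp*Io))."""
    return n * vt * math.log(i_in / (wp * io))


def cell_current(vg, wa, io, n, vt):
    """Array cell in sub-threshold: Iout = wa*Io*exp(Vg / (n*Vt))."""
    return wa * io * math.exp(vg / (n * vt))


vt = thermal_voltage()
n, io = 1.5, 1e-7                 # illustrative values only
wp = w_factor(0.8, n, vt)         # peripheral cell, Vthp = 0.8 V (assumed)
wa = w_factor(0.9, n, vt)         # array cell, Vtha = 0.9 V (assumed)
i_in = 1e-9
vg = current_to_gate_voltage(i_in, wp, io, n, vt)
i_out = cell_current(vg, wa, io, n, vt)
print(i_out / i_in, wa / wp)      # the two ratios agree up to rounding
```

In this model the effective weight depends only on the difference between the two threshold voltages, so Io cancels out of the ratio.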

$V_{thp}$ is the effective threshold voltage of the peripheral memory cell, and $V_{tha}$ is the effective threshold voltage of the main (data) memory cell. Note that the threshold voltage of a transistor is a function of the substrate body bias voltage, and that the substrate body bias voltage, denoted $V_{sb}$, can be modulated to compensate for various conditions, such as temperature. The threshold voltage $V_{th}$ can be expressed as:

$$V_{th} = V_{th0} + \gamma \left( \sqrt{|V_{sb} - 2\varphi_F|} - \sqrt{|2\varphi_F|} \right)$$

Where Vth0 is the threshold voltage with zero substrate bias, φF is the surface potential, and γ is the body effect parameter.
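The body-bias dependence of the threshold voltage can be evaluated directly. The sketch below implements the expression with the signs as written in the text, Vth = Vth0 + γ*(SQRT(|Vsb - 2*φF|) - SQRT(|2*φF|)). The function name and the numeric values are hypothetical.

```python
import math

def threshold_voltage(vth0, gamma, vsb, phi_f):
    """Threshold voltage versus substrate body bias, using the signs given in the text."""
    return vth0 + gamma * (math.sqrt(abs(vsb - 2 * phi_f)) - math.sqrt(abs(2 * phi_f)))

# Illustrative values only (assumptions, not taken from the source).
vth0, gamma, phi_f = 0.7, 0.4, 0.35

vth_zero_bias = threshold_voltage(vth0, gamma, 0.0, phi_f)   # reduces to Vth0
vth_biased = threshold_voltage(vth0, gamma, -1.0, phi_f)     # shifted by the body bias
```

With zero body bias the two square-root terms cancel and the result is Vth0, which matches the definition of Vth0.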

A word line or control gate can be used as the input of the memory cell for the input voltage.

Alternatively, the flash memory cells of the VMM arrays described herein can be configured to operate in the linear region: Ids = β*(Vgs - Vth)*Vds; β = u*Cox*Wt/L; W ∝ (Vgs - Vth)

此意謂線性區中之權重W與(Vgs-Vth)成比例。 This means that the weight W in the linear region is proportional to (Vgs - Vth).

A word line, control gate, bit line, or source line can be used as the input of a memory cell operating in the linear region. A bit line or source line can be used as the output of the memory cell.

對於I至V線性轉換器，記憶體胞元(如，參考記憶體胞元或周邊記憶體胞元)或在線性區中操作之電晶體可用來將輸入/輸出電流線性地轉換成輸入/輸出電壓。 For I-to-V linear converters, memory cells (e.g., reference memory cells or peripheral memory cells) or transistors operating in the linear region can be used to linearly convert input/output currents into input/output voltages.

Alternatively, the memory cells of the VMM arrays described herein can be configured to operate in the saturation region: Ids = ½*β*(Vgs - Vth)²; β = u*Cox*Wt/L

W ∝ (Vgs - Vth)², meaning that the weight W is proportional to (Vgs - Vth)².

A word line, control gate, or erase gate can be used as the input of a memory cell operating in the saturation region. A bit line or source line can be used as the output of the output neuron.

Alternatively, the memory cells of the VMM arrays described herein can be used in all regions or a combination thereof (sub-threshold, linear, or saturation) for each layer or for multiple layers of a neural network.

Other examples of VMM array 32 of Figure 7 are described in U.S. Patent No. 10,748,630, which is incorporated herein by reference. As described in that application, a source line or a bit line can be used as the neuron output (current summation output).

Figure 10 depicts neuron VMM array 1000, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses between an input layer and the next layer. VMM array 1000 comprises memory array 1003 of non-volatile memory cells, reference array 1001 of first non-volatile reference memory cells, and reference array 1002 of second non-volatile reference memory cells. Reference arrays 1001 and 1002, arranged in the column direction of the array, serve to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs WL0, WL1, WL2, and WL3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexers 1014 (only partially depicted), with the current inputs flowing into them. The reference cells are tuned (e.g., programmed) to target reference levels. The target reference levels are provided by a reference mini-array matrix (not shown).

Memory array 1003 serves two purposes. First, it stores the weights that will be used by VMM array 1000 on its respective memory cells. Second, memory array 1003 effectively multiplies the inputs (i.e., the current inputs provided at terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1001 and 1002 convert into input voltages supplied to word lines WL0, WL1, WL2, and WL3) by the weights stored in memory array 1003 and then adds all the results (memory cell currents) to produce the output on the respective bit lines (BL0-BLN), which will be the input to the next layer or the input to the final layer. By performing the multiplication and addition functions, memory array 1003 eliminates the need for separate multiplication and addition logic circuits and is also power efficient. Here, the voltage inputs are provided on word lines WL0, WL1, WL2, and WL3, and the output emerges on the respective bit lines BL0-BLN during a read (inference) operation. The current placed on each of the bit lines BL0-BLN performs a summing function of the currents from all non-volatile memory cells connected to that particular bit line.
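The multiply-and-sum behavior of the array can be modeled as a matrix-vector product. The sketch below is an idealized numerical model, not a circuit description: each cell contributes weight times input to its bit line, and each bit line sums the contributions of the cells connected to it. The function name and the example values are hypothetical.

```python
def vmm_read(inputs, weights):
    """Idealized read (inference) operation.

    inputs:  one input value per word line.
    weights: weights[row][col] is the weight stored in the cell at
             word line `row` and bit line `col`.
    Returns the summed output per bit line.
    """
    n_cols = len(weights[0])
    outputs = [0.0] * n_cols
    for x, row in zip(inputs, weights):
        for col, w in enumerate(row):
            outputs[col] += w * x   # cell current added onto its bit line
    return outputs

# Four word-line inputs and a 4 x 2 weight array (illustrative values).
inputs = [1.0, 2.0, 0.5, 0.0]
weights = [
    [0.5, 1.0],
    [0.25, 0.0],
    [1.0, 2.0],
    [4.0, 4.0],
]
bit_line_outputs = vmm_read(inputs, weights)   # [1.5, 2.0]
```

Each entry of `bit_line_outputs` is the sum over one bit line, so no separate multiplier or adder logic appears in the model beyond the accumulation itself.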

Table 5 depicts operating voltages and currents for VMM array 1000. The columns in the table indicate the voltages placed on each of the following: word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate the operations of read, erase, and program.

表5：圖10之VMM陣列1000之操作： Table 5: Operation of VMM array 1000 in Figure 10:

Figure 11 depicts neuron VMM array 1100, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 1100 comprises memory array 1103 of non-volatile memory cells, reference array 1101 of first non-volatile reference memory cells, and reference array 1102 of second non-volatile reference memory cells. Reference arrays 1101 and 1102 run in the row direction of VMM array 1100. VMM array 1100 is similar to VMM array 1000, except that in VMM array 1100 the word lines run in the vertical direction. Here, the inputs are provided on the word lines (WLA0, WLB0, WLA1, WLB1, WLA2, WLB2, WLA3, WLB3), and the output emerges on the source lines (SL0, SL1) during a read operation. The current placed on each source line performs a summing function of all the currents from the memory cells connected to that particular source line.

Table 6 depicts operating voltages and currents for VMM array 1100. The columns in the table indicate the voltages placed on each of the following: word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate the operations of read, erase, and program.

Figure 12 depicts neuron VMM array 1200, which is particularly suited for memory cells 310 as shown in Figure 3 and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 1200 comprises memory array 1203 of non-volatile memory cells, reference array 1201 of first non-volatile reference memory cells, and reference array 1202 of second non-volatile reference memory cells. Reference arrays 1201 and 1202 serve to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs CG0, CG1, CG2, and CG3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexers 1212 (only partially shown), with the current inputs flowing into them through BLR0, BLR1, BLR2, and BLR3. Multiplexers 1212 each include a respective multiplexer 1205 and a cascoding transistor 1204 to ensure a constant voltage on the bit line (such as BLR0) of each of the first and second non-volatile reference memory cells during a read operation. The reference cells are tuned to target reference levels.

Memory array 1203 serves two purposes. First, it stores the weights that will be used by VMM array 1200. Second, memory array 1203 effectively multiplies the inputs (the current inputs provided to terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1201 and 1202 convert into input voltages supplied to the control gates (CG0, CG1, CG2, and CG3)) by the weights stored in the memory array and then adds all the results (cell currents) to produce the output, which appears on BL0-BLN and will be the input to the next layer or the input to the final layer. By performing the multiplication and addition functions, the memory array eliminates the need for separate multiplication and addition logic circuits and is also power efficient. Here, the inputs are provided on the control gate lines (CG0, CG1, CG2, and CG3), and the output emerges on the bit lines (BL0-BLN) during a read operation. The current placed on each bit line performs a summing function of all the currents from the memory cells connected to that particular bit line.

VMM array 1200 implements uni-directional tuning for the non-volatile memory cells in memory array 1203. That is, each non-volatile memory cell is erased and then partially programmed until the desired charge on the floating gate is reached. If too much charge is placed on the floating gate (such that the wrong value is stored in the cell), the cell is erased and the sequence of partial programming operations starts over. As shown, two rows sharing the same erase gate (such as EG0 or EG1) are erased together (which is known as a page erase), and thereafter each cell is partially programmed until the desired charge on the floating gate is reached.
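The erase-then-partially-program sequence can be sketched as a simple loop. The following is a toy model under stated assumptions: the cell is a software stand-in whose charge is an abstract number, and halving the programming step after an overshoot is a choice made here so the retry can succeed; the source only says that the sequence starts over. All names are hypothetical.

```python
class SimulatedCell:
    """Toy floating-gate cell; charge is an abstract number, not a physical unit."""

    def __init__(self):
        self.charge = 0.0

    def erase(self):
        self.charge = 0.0

    def partial_program(self, step):
        self.charge += step


def tune_unidirectional(cell, target, tolerance, step, max_restarts=10):
    """Erase, then add charge in small steps; on overshoot, erase and start over."""
    for _ in range(max_restarts):
        cell.erase()
        while cell.charge < target - tolerance:
            cell.partial_program(step)
        if cell.charge <= target + tolerance:
            return True          # desired charge reached
        step /= 2                # overshoot: retry with a finer step (assumption)
    return False


cell = SimulatedCell()
tuned = tune_unidirectional(cell, target=1.0, tolerance=0.05, step=0.4)
```

In this run the first pass overshoots (0.4, 0.8, 1.2), so the cell is erased and the sequence restarts with a smaller step, ending within tolerance of the target.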

Table 7 depicts operating voltages and currents for VMM array 1200. The columns in the table indicate the voltages placed on each of the following: word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, control gates for selected cells, control gates for unselected cells in the same sector as the selected cells, control gates for unselected cells in a different sector than the selected cells, erase gates for selected cells, erase gates for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate the operations of read, erase, and program.

Figure 13 depicts neuron VMM array 1300, which is particularly suited for memory cells 310 as shown in Figure 3 and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 1300 comprises memory array 1303 of non-volatile memory cells, reference array 1301 of first non-volatile reference memory cells, and reference array 1302 of second non-volatile reference memory cells. EG lines EGR0, EG0, EG1, and EGR1 run vertically, while CG lines CG0, CG1, CG2, and CG3 and WL lines WL0, WL1, WL2, and WL3 run horizontally. VMM array 1300 is similar to VMM array 1200, except that VMM array 1300 implements bi-directional tuning, in which each individual cell can be completely erased, partially programmed, and partially erased as needed to reach the desired amount of charge on the floating gate, owing to the use of separate EG lines. As shown, reference arrays 1301 and 1302 convert the input currents at terminals BLR0, BLR1, BLR2, and BLR3 into control gate voltages CG0, CG1, CG2, and CG3 to be applied to the memory cells in the row direction (through the action of diode-connected reference cells through multiplexers 1314). The current outputs (neurons) are on bit lines BL0-BLN, where each bit line sums all the currents from the non-volatile memory cells connected to that particular bit line.
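Bi-directional tuning can be sketched as a loop that moves the charge up or down as needed instead of restarting from a full erase. This is a toy model under stated assumptions: the cell is a software stand-in with abstract charge units, and halving the step whenever the direction reverses is a convergence choice made here, not something stated in the source. All names are hypothetical.

```python
class SimulatedCell:
    """Toy floating-gate cell that supports partial program and partial erase."""

    def __init__(self):
        self.charge = 0.0

    def erase(self):
        self.charge = 0.0

    def partial_program(self, step):
        self.charge += step

    def partial_erase(self, step):
        self.charge = max(0.0, self.charge - step)


def tune_bidirectional(cell, target, tolerance, step, max_steps=100):
    """Fully erase once, then nudge the charge toward the target in either direction."""
    cell.erase()
    last_direction = 0
    for _ in range(max_steps):
        error = target - cell.charge
        if abs(error) <= tolerance:
            return True
        direction = 1 if error > 0 else -1
        if last_direction and direction != last_direction:
            step /= 2            # reversed direction: use a finer step (assumption)
        if direction > 0:
            cell.partial_program(step)
        else:
            cell.partial_erase(step)
        last_direction = direction
    return False


cell = SimulatedCell()
tuned = tune_bidirectional(cell, target=1.0, tolerance=0.05, step=0.4)
```

Here an overshoot is corrected by a partial erase of the same cell rather than by erasing and reprogramming from scratch, which is the practical difference from uni-directional tuning.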

Table 8 depicts operating voltages and currents for VMM array 1300. The columns in the table indicate the voltages placed on each of the following: word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, control gates for selected cells, control gates for unselected cells in the same sector as the selected cells, control gates for unselected cells in a different sector than the selected cells, erase gates for selected cells, erase gates for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate the operations of read, erase, and program.

Figure 22 depicts neuron VMM array 2200, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses and parts of neurons between an input layer and the next layer. In VMM array 2200, the inputs INPUT₀, ..., INPUT_N are received on bit lines BL₀, ..., BL_N, respectively, and the outputs OUTPUT₁, OUTPUT₂, OUTPUT₃, and OUTPUT₄ are generated on source lines SL₀, SL₁, SL₂, and SL₃, respectively.

Figure 23 depicts neuron VMM array 2300, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, INPUT₁, INPUT₂, and INPUT₃ are received on source lines SL₀, SL₁, SL₂, and SL₃, respectively, and the outputs OUTPUT₀, ..., OUTPUT_N are generated on bit lines BL₀, ..., BL_N.

Figure 24 depicts neuron VMM array 2400, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_M are received on word lines WL₀, ..., WL_M, respectively, and the outputs OUTPUT₀, ..., OUTPUT_N are generated on bit lines BL₀, ..., BL_N.

Figure 25 depicts neuron VMM array 2500, which is particularly suited for memory cells 310 as shown in Figure 3 and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_M are received on word lines WL₀, ..., WL_M, respectively, and the outputs OUTPUT₀, ..., OUTPUT_N are generated on bit lines BL₀, ..., BL_N.

Figure 26 depicts neuron VMM array 2600, which is particularly suited for memory cells 410 as shown in Figure 4 and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_N are received on vertical control gate lines CG₀, ..., CG_N, respectively, and the outputs OUTPUT₁ and OUTPUT₂ are generated on source lines SL₀ and SL₁.

Figure 27 depicts neuron VMM array 2700, which is particularly suited for memory cells 410 as shown in Figure 4 and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_N are received on the gates of bit line control gates 2701-1, 2701-2, ..., 2701-(N-1), and 2701-N, respectively, which are coupled to bit lines BL₀, ..., BL_N, respectively. Example outputs OUTPUT₁ and OUTPUT₂ are generated on source lines SL₀ and SL₁.

Figure 28 depicts neuron VMM array 2800, which is particularly suited for memory cells 310 as shown in Figure 3, memory cells 510 as shown in Figure 5, and memory cells 710 as shown in Figure 7, and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_M are received on word lines WL₀, ..., WL_M, and the outputs OUTPUT₀, ..., OUTPUT_N are generated on bit lines BL₀, ..., BL_N, respectively.

Figure 29 depicts neuron VMM array 2900, which is particularly suited for memory cells 310 as shown in Figure 3, memory cells 510 as shown in Figure 5, and memory cells 710 as shown in Figure 7, and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_M are received on control gate lines CG₀, ..., CG_M. The outputs OUTPUT₀, ..., OUTPUT_N are generated on vertical source lines SL₀, ..., SL_N, respectively, where each source line SL_i is coupled to the source lines of all memory cells in column i.

Figure 30 depicts neuron VMM array 3000, which is particularly suited for memory cells 310 as shown in Figure 3, memory cells 510 as shown in Figure 5, and memory cells 710 as shown in Figure 7, and is utilized as the synapses and parts of neurons between an input layer and the next layer. In this example, the inputs INPUT₀, ..., INPUT_M are received on control gate lines CG₀, ..., CG_M. The outputs OUTPUT₀, ..., OUTPUT_N are generated on vertical bit lines BL₀, ..., BL_N, respectively, where each bit line BL_i is coupled to the bit lines of all memory cells in column i.

長短期記憶體： Long Short-Term Memory (LSTM):

The prior art includes a concept known as long short-term memory (LSTM). LSTM units are often used in neural networks. LSTM allows a neural network to remember information over predetermined arbitrary time intervals and to use that information in subsequent operations. A conventional LSTM unit comprises a cell, an input gate, an output gate, and a forget gate. The three gates regulate the flow of information into and out of the cell and the time interval over which the information is remembered in the LSTM. VMMs are particularly useful in LSTM units.

Figure 14 depicts an example LSTM 1400. LSTM 1400 in this example comprises cells 1401, 1402, 1403, and 1404. Cell 1401 receives input vector x₀ and generates output vector h₀ and cell state vector c₀. Cell 1402 receives input vector x₁, the output vector (hidden state) h₀ from cell 1401, and the cell state c₀ from cell 1401, and generates output vector h₁ and cell state vector c₁. Cell 1403 receives input vector x₂, the output vector (hidden state) h₁ from cell 1402, and the cell state c₁ from cell 1402, and generates output vector h₂ and cell state vector c₂. Cell 1404 receives input vector x₃, the output vector (hidden state) h₂ from cell 1403, and the cell state c₂ from cell 1403, and generates output vector h₃. Additional cells can be used, and an LSTM with four cells is merely an example.

圖15描繪LSTM胞元1500之示例實施，其可用於圖14中之胞元1401、1402、1403及1404。LSTM胞元1500接收輸入向量x(t)、來自前一胞元之胞元狀態向量c(t-1)及來自前一胞元之輸出向量h(t-1)，且產生胞元狀態向量c(t)及輸出向量h(t)。 Figure 15 illustrates an example implementation of LSTM cell 1500, which can be used in cells 1401, 1402, 1403, and 1404 in Figure 14. LSTM cell 1500 receives an input vector x(t), a cell state vector c(t-1) from the previous cell, and an output vector h(t-1) from the previous cell, and generates the cell state vector c(t) and the output vector h(t).

LSTM cell 1500 comprises sigmoid function devices 1501, 1502, and 1503, each of which applies a number between 0 and 1 to control how much of each component in the input vector is allowed through to the output vector. LSTM cell 1500 also comprises hyperbolic tangent function devices 1504 and 1505 to apply a hyperbolic tangent function to an input vector, multiplier devices 1506, 1507, and 1508 to multiply two vectors together, and addition device 1509 to add two vectors together. The output vector h(t) can be provided to the next LSTM cell in the system, or it can be accessed for other purposes.
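The data flow through such a cell can be sketched in a few lines. The following uses the commonly used LSTM update equations (three sigmoid stages, two hyperbolic tangent stages, three element-wise multiplications, and one addition), which matches the device count above; the exact wiring of the figure is not reproduced, biases are omitted, and the gate names `f`, `i`, `u`, `o` and the weight layout are assumptions made for illustration.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product, the operation a VMM array would perform."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def lstm_cell(x, h_prev, c_prev, weights):
    """One LSTM step: returns (h(t), c(t)) from x(t), h(t-1), c(t-1)."""
    xh = x + h_prev                              # concatenated input vector
    f = sigmoid(matvec(weights["f"], xh))        # forget gate
    i = sigmoid(matvec(weights["i"], xh))        # input gate
    o = sigmoid(matvec(weights["o"], xh))        # output gate
    u = tanh(matvec(weights["u"], xh))           # candidate values
    c = [a * b + p * q for a, b, p, q in zip(f, c_prev, i, u)]   # two multiplies, one add
    h = [a * b for a, b in zip(o, tanh(c))]      # third multiply
    return h, c

# Zero weights make every gate 0.5 and the candidate 0, which is easy to check by hand.
zero = [[0.0, 0.0]]
weights = {"f": zero, "i": zero, "o": zero, "u": zero}
h, c = lstm_cell([1.0], [0.0], [2.0], weights)
```

With these zero weights the new cell state is half the previous one, and the output is half the hyperbolic tangent of the new cell state.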

Figure 16 depicts LSTM cell 1600, which is an example of an implementation of LSTM cell 1500. For the reader's convenience, the same numbering from LSTM cell 1500 is used in LSTM cell 1600. Sigmoid function devices 1501, 1502, and 1503 and hyperbolic tangent function device 1504 each comprise multiple VMM arrays 1601 and activation function blocks 1602. Thus, it can be seen that VMM arrays are particularly useful in LSTM cells used in certain neural network systems. Multiplier devices 1506, 1507, and 1508 and addition device 1509 are implemented in a digital manner or in an analog manner. Activation function blocks 1602 can be implemented in a digital manner or in an analog manner.

An alternative to LSTM cell 1600 (and another example of an implementation of LSTM cell 1500) is shown in Figure 17. In Figure 17, sigmoid function devices 1501, 1502, and 1503 and hyperbolic tangent function device 1504 share the same physical hardware (VMM array 1701 and activation function block 1702) in a time-multiplexed fashion. LSTM cell 1700 also comprises: multiplier device 1703 to multiply two vectors together; addition device 1708 to add two vectors together; hyperbolic tangent function device 1505 (which comprises activation function block 1702); register 1707 to store the value i(t) when i(t) is output from sigmoid function block 1702; register 1704 to store the value f(t)*c(t-1) when that value is output from multiplier device 1703 through multiplexer 1710; register 1705 to store the value i(t)*u(t) when that value is output from multiplier device 1703 through multiplexer 1710; and register 1706 to store the value o(t)*c~(t) when that value is output from multiplier device 1703 through multiplexer 1710 and multiplexer 1709.

Whereas LSTM cell 1600 contains multiple sets of VMM arrays 1601 and respective activation function blocks 1602, LSTM cell 1700 contains only one set of VMM array 1701 and activation function block 1702, which is used to represent the multiple layers in this example of LSTM cell 1700. LSTM cell 1700 will require less space than LSTM cell 1600, because LSTM cell 1700 will require 1/4 as much space for VMMs and activation function blocks as LSTM cell 1600.
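The time-multiplexed arrangement can be sketched by routing every gate computation through one shared block and counting the time slots used. This is an illustrative model only: the `SharedVmm` class, the slot order, and passing a different weight matrix per slot are assumptions, since how the single array selects weights for each slot is not described in this passage. The final hyperbolic tangent on the cell state is computed directly here for brevity.

```python
import math

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

class SharedVmm:
    """One VMM-plus-activation block reused in successive time slots."""

    def __init__(self):
        self.slots_used = 0

    def run(self, weights, vector, activation):
        self.slots_used += 1
        return activation(matvec(weights, vector))

def lstm_cell_time_multiplexed(vmm, x, h_prev, c_prev, weights):
    xh = x + h_prev
    f = vmm.run(weights["f"], xh, sigmoid)   # slot 1
    i = vmm.run(weights["i"], xh, sigmoid)   # slot 2
    u = vmm.run(weights["u"], xh, tanh)      # slot 3
    o = vmm.run(weights["o"], xh, sigmoid)   # slot 4
    # Intermediate products would be held in registers between slots.
    c = [a * b + p * q for a, b, p, q in zip(f, c_prev, i, u)]
    h = [a * math.tanh(b) for a, b in zip(o, c)]
    return h, c

vmm = SharedVmm()
zero = [[0.0, 0.0]]
weights = {"f": zero, "i": zero, "u": zero, "o": zero}
h, c = lstm_cell_time_multiplexed(vmm, [1.0], [0.0], [2.0], weights)
```

One block used in four slots replaces four separate blocks, which is where the 1/4 space figure comes from; the trade-off is that the four gate computations happen sequentially rather than in parallel.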

It can be further appreciated that LSTM cells will typically comprise multiple VMM arrays, each of which requires functionality provided by certain circuit blocks outside the VMM arrays, such as a summer and activation function block and a high-voltage generation block. Providing separate circuit blocks for each VMM array would require a significant amount of space within the semiconductor device and would be somewhat inefficient. The examples described below therefore reduce the circuitry required outside the VMM arrays themselves.

Gated Recurrent Unit:

An analog VMM implementation can be utilized for a GRU (gated recurrent unit) system. A GRU is a gating mechanism in recurrent neural networks. A GRU is similar to an LSTM, except that a GRU cell generally contains fewer components than an LSTM cell.

Figure 18 depicts an example GRU 1800. GRU 1800 in this example comprises cells 1801, 1802, 1803, and 1804. Cell 1801 receives input vector x₀ and generates output vector h₀. Cell 1802 receives input vector x₁ and the output vector h₀ from cell 1801, and generates output vector h₁. Cell 1803 receives input vector x₂ and the output vector (hidden state) h₁ from cell 1802, and generates output vector h₂. Cell 1804 receives input vector x₃ and the output vector (hidden state) h₂ from cell 1803, and generates output vector h₃. Additional cells can be used, and a GRU with four cells is merely an example.

圖19描繪GRU胞元1900之示例實施，其可用於圖18之胞元1801、1802、1803及1804。GRU胞元1900接收輸入向量x(t)及來自前一GRU胞元之輸出向量h(t-1)，且產生輸出向量h(t)。GRU胞元1900包含S型函數裝置1901及1902，其中之每個將0與1之間的數字應用至來自輸出向量h(t-1)及輸入向量x(t)之分量。GRU胞元1900亦包含用以將雙曲正切函數應用至輸入向量之雙曲正切函數裝置1903，用以將二個向量相乘在一起之複數個乘法器裝置1904、1905及1906，用以將二個向量相加在一起之加法裝置1907及用以將1減去輸入以產生輸出之互補裝置1908。 Figure 19 illustrates an example embodiment of GRU cell 1900, which can be used in cells 1801, 1802, 1803, and 1804 of Figure 18. GRU cell 1900 receives an input vector x(t) and an output vector h(t-1) from the previous GRU cell, and generates an output vector h(t). GRU cell 1900 includes sigmoid function devices 1901 and 1902, each of which applies a number between 0 and 1 to the components of the output vector h(t-1) and the input vector x(t). The GRU cell 1900 also includes a hyperbolic tangent function device 1903 for applying a hyperbolic tangent function to the input vector, multiple multiplier devices 1904, 1905, and 1906 for multiplying two vectors together, an adder device 1907 for adding two vectors together, and a complementary device 1908 for subtracting the input from 1 to produce the output.
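The data flow through such a cell can be sketched as follows. The sketch uses a commonly used form of the GRU update with two sigmoid stages, one hyperbolic tangent stage, three element-wise multiplications, one addition, and one complement (1 minus the input), which matches the device count above. The exact wiring of the figure is not reproduced, biases are omitted, and the gate names `r`, `z`, `h` and the weight layout are assumptions made for illustration.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product, the operation a VMM array would perform."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def gru_cell(x, h_prev, weights):
    """One GRU step: returns h(t) from x(t) and h(t-1)."""
    xh = x + h_prev                                   # concatenated input vector
    r = sigmoid(matvec(weights["r"], xh))             # reset gate
    z = sigmoid(matvec(weights["z"], xh))             # update gate
    gated = [a * b for a, b in zip(h_prev, r)]        # h(t-1) * r(t)
    h_hat = [math.tanh(a) for a in matvec(weights["h"], x + gated)]
    # h(t) = h(t-1)*z(t) + h_hat(t)*(1 - z(t)); (1 - z) is the complement stage.
    return [hp * zz + hh * (1.0 - zz) for hp, zz, hh in zip(h_prev, z, h_hat)]

# Zero weights make both gates 0.5 and the candidate 0, which is easy to check by hand.
zero = [[0.0, 0.0]]
h = gru_cell([1.0], [2.0], {"r": zero, "z": zero, "h": zero})
```

With these zero weights the output is half the previous hidden state. Unlike the LSTM, no separate cell state vector is carried between steps.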

Figure 20 depicts GRU cell 2000, which is an example of an implementation of GRU cell 1900. For the reader's convenience, the same numbering from GRU cell 1900 is used in GRU cell 2000. As can be seen in Figure 20, sigmoid function devices 1901 and 1902 and hyperbolic tangent function device 1903 each comprise multiple VMM arrays 2001 and activation function blocks 2002. Thus, it can be seen that VMM arrays are particularly useful in GRU cells used in certain neural network systems. Multiplier devices 1904, 1905, and 1906, addition device 1907, and complementary device 1908 are implemented in a digital manner or in an analog manner. Activation function blocks 2002 can be implemented in a digital manner or in an analog manner.

An alternative to GRU cell 2000 (and another example of an implementation of GRU cell 1900) is shown in Figure 21. In Figure 21, GRU cell 2100 utilizes VMM array 2101 and activation function block 2102, which, when configured as a sigmoid function, applies a number between 0 and 1 to control how much of each component in the input vector is allowed through to the output vector. In Figure 21, sigmoid function devices 1901 and 1902 and hyperbolic tangent function device 1903 share the same physical hardware (VMM array 2101 and activation function block 2102) in a time-multiplexed fashion. GRU cell 2100 also comprises: multiplier device 2103 to multiply two vectors together; addition device 2105 to add two vectors together; complementary device 2109 to subtract an input from 1 to generate an output; multiplexer 2104; register 2106 to hold the value h(t-1)*r(t) when that value is output from multiplier device 2103 through multiplexer 2104; register 2107 to hold the value h(t-1)*z(t) when that value is output from multiplier device 2103 through multiplexer 2104; and register 2108 to hold the value h^(t)*(1-z(t)) when that value is output from multiplier device 2103 through multiplexer 2104.

GRU胞元2000含有VMM陣列2001及激勵函數區塊2002之多個集合，而GRU胞元2100僅含有VMM陣列2101及激勵函數區塊2102的一個集合，其用於表示GRU胞元2100之示例中的多個層。GRU胞元2100相較於GRU胞元2000將需要較少的空間，因為GRU胞元2100相較於GRU胞元2000將需要1/3之空間用於VMM及激勵函數區塊。 GRU cell 2000 contains multiple sets of VMM arrays 2001 and activation function blocks 2002, whereas GRU cell 2100 contains only one set of VMM array 2101 and activation function block 2102, which is used to represent multiple layers in the example of GRU cell 2100. GRU cell 2100 will require less space than GRU cell 2000, because it needs only one-third as much space for VMMs and activation function blocks as GRU cell 2000.
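
The time-multiplexed arrangement of GRU cell 2100 can be illustrated with a minimal software sketch. It is only an illustration under stated assumptions, not the circuit itself: the function names (`vmm`, `shared_block`, `gru_step`), the weight arguments, and the ordering of the concatenated vector `[h(t-1), x(t)]` are hypothetical. What it follows from the text is that one VMM-plus-activation block is invoked three times, and that the three products named above are held in registers before the final addition.

```python
import math

def vmm(weights, vector):
    """Vector-by-matrix multiplication: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def shared_block(weights, vector, activation):
    """Stand-in for the single VMM array 2101 + activation function block 2102.
    The same block is reused on each pass with different weights/activation."""
    return [activation(s) for s in vmm(weights, vector)]

def gru_step(h_prev, x, w_z, w_r, w_h):
    # Passes 1 and 2: gates z(t) and r(t), block configured as sigmoid.
    # Concatenating [h(t-1), x(t)] is an assumption of this sketch.
    z = shared_block(w_z, h_prev + x, sigmoid)
    r = shared_block(w_r, h_prev + x, sigmoid)
    # Register 2106 holds h(t-1)*r(t); register 2107 holds h(t-1)*z(t).
    reg_2106 = [h * ri for h, ri in zip(h_prev, r)]
    reg_2107 = [h * zi for h, zi in zip(h_prev, z)]
    # Pass 3: candidate state h^(t), block configured as tanh.
    h_cand = shared_block(w_h, reg_2106 + x, math.tanh)
    # Register 2108 holds h^(t)*(1-z(t)); (1 - z) is the complementary device.
    reg_2108 = [hc * (1.0 - zi) for hc, zi in zip(h_cand, z)]
    # The addition device combines the two stored products into h(t).
    return [a + b for a, b in zip(reg_2107, reg_2108)]
```

Because the three passes run one after another on the same block, the sketch trades time for area, which is the trade-off the paragraph above describes.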

可進一步瞭解，GRU系統將通常包含多個VMM陣列，其中之每個需要由VMM陣列外部之某些電路區塊，如加總器及激勵函數區塊以及高電壓產生區塊所提供之功能性。向每個VMM陣列提供單獨電路區塊將需要半導體裝置內之大量空間且將略微低效。因此，下文所描述之示例減少在VMM陣列自身外部所需之電路系統。 It can be further appreciated that GRU systems will typically comprise multiple VMM arrays, each of which requires functionality provided by certain circuit blocks outside the VMM arrays, such as a summer and activation function block and high voltage generation blocks. Providing separate circuit blocks for each VMM array would require a significant amount of space within the semiconductor device and would be somewhat inefficient. The examples described below therefore reduce the circuitry required outside the VMM arrays themselves.

至VMM陣列之輸入可為類比位準、二進位位準、脈衝、時間調變脈衝或數位位元(在此情況下，需要DAC將數位位元轉換成適當的輸入類比位準)，且輸出可為類比位準、二進位位準、定時脈衝、脈衝或數位位元(在此情況下，需要輸出ADC將輸出類比位準轉換成數位位元)。 The inputs to the VMM array can be analog levels, binary levels, pulses, time-modulated pulses, or digital bits (in which case, a DAC is needed to convert the digital bits to the appropriate input analog level), and the outputs can be analog levels, binary levels, timing pulses, pulses, or digital bits (in which case, an output ADC is needed to convert the output analog level to digital bits).

通常，對於VMM陣列中之每個記憶體胞元，每個權重W可由單一記憶體胞元或差分胞元或二個混合記憶體胞元(2個胞元之平均值)實施。在差分胞元情況下，需要二個記憶體胞元將權重W實施為差分權重(W=W+-W-)。在二個混合記憶體胞元中，需要二個記憶體胞元將權重W實施為二個胞元之平均值。 Typically, for each memory cell in a VMM array, each weight W can be implemented by a single memory cell, a differential cell, or two mixed memory cells (the average of the two cells). In the case of differential cells, two memory cells are needed to implement the weight W as a differential weight (W = W+ - W-). In the case of two mixed memory cells, two memory cells are needed to implement the weight W as the average of the two cells.

圖31描述VMM系統3100。在一些示例中，儲存於VMM陣列中之權重W儲存為差分對W+(正權重)及W-(負權重)，其中W=(W+)-(W-)。在VMM系統3100中，一半位元線被指定為W+線，亦即，連接至將儲存正權重W+之記憶體胞元的位元線，且另一半位元線被指定為W-線，亦即，連接至實施負權重W-之記憶體胞元的位元線。W-線以交替方式穿插於W+線當中。減法運算藉由從W+線及W-線接收電流之加總電路執行，例如加總電路3101及3102。W+線之輸出及W-線之輸出組合在一起，從而對於所有對(W+,W-)線之每個對(W+,W-)胞元，有效地得出W=W+-W-。雖然上文已關於W-線以交替方式穿插在W+線當中進行描述，但在其他示例中，W+線及W-線可任意地位於陣列中之任何位置。 Figure 31 depicts VMM system 3100. In some examples, the weights W stored in a VMM array are stored as differential pairs, W+ (positive weight) and W- (negative weight), where W = (W+) - (W-). In VMM system 3100, half of the bit lines are designated as W+ lines, that is, bit lines connected to memory cells that will store positive weights W+, and the other half of the bit lines are designated as W- lines, that is, bit lines connected to memory cells implementing negative weights W-. The W- lines are interspersed among the W+ lines in an alternating fashion. The subtraction operation is performed by a summing circuit that receives current from a W+ line and a W- line, such as summing circuits 3101 and 3102. The output of a W+ line and the output of a W- line are combined to give effectively W = W+ - W- for each pair of (W+, W-) cells, for all pairs of (W+, W-) lines. Although the above describes W- lines interspersed among the W+ lines in an alternating fashion, in other examples the W+ lines and W- lines can be located anywhere in the array.
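
The differential scheme can be sketched numerically as follows. This is a simplified model rather than the circuit: bit-line current is modeled as a plain sum of input times stored weight, and the function names and the choice of even-indexed bit lines as W+ lines and odd-indexed bit lines as W- lines are assumptions made for illustration.

```python
def bitline_currents(inputs, cells):
    """Model each bit line as the sum of input * stored weight over all rows."""
    n_rows, n_cols = len(cells), len(cells[0])
    return [sum(inputs[r] * cells[r][c] for r in range(n_rows))
            for c in range(n_cols)]

def differential_outputs(inputs, cells):
    """Pair each W+ bit line (even index, assumed) with the adjacent W- bit line
    (odd index, assumed); each summing circuit outputs I(W+) - I(W-)."""
    currents = bitline_currents(inputs, cells)
    return [currents[c] - currents[c + 1] for c in range(0, len(currents), 2)]

# Two rows, two (W+, W-) bit-line pairs. Effective weights W = W+ - W-:
#   row 0: 3-1 = 2,  0-2 = -2
#   row 1: 1-4 = -3, 5-0 = 5
cells = [[3, 1, 0, 2],
         [1, 4, 5, 0]]
outputs = differential_outputs([1, 2], cells)
```

Subtracting the summed bit-line currents gives the same result as multiplying the inputs by the effective signed weights, which is why a pair of non-negative stored values can represent a negative weight.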

圖32描繪另一示例。在VMM系統3210中，正權重W+於第一陣列3211中實施且負權重W-於第二陣列3212中實施，第二陣列3212與第一陣列分開，且所得權重藉由加總電路3213適當地組合在一起。 Figure 32 illustrates another example. In the VMM system 3210, positive weights W+ are implemented in a first array 3211 and negative weights W- are implemented in a second array 3212, which is separate from the first array, and the resulting weights are appropriately combined by a summing circuit 3213.

圖33描繪VMM系統3300。儲存於VMM陣列中之權重W儲存為差分對W+(正權重)及W-(負權重)，其中W=(W+)-(W-)。VMM系統3300包含陣列3301及陣列3302。陣列3301及3302中之每個中的一半位元線被指定為W+線，亦即，連接至將儲存正權重W+之記憶體胞元的位元線，且陣列3301及3302中之每個中的另一半位元線被指定為W-線，亦即，連接至實施負權重W-之記憶體胞元的位元線。W-線以交替方式穿插於W+線當中。減法運算藉由從W+線及W-線接收電流之加總電路執行，如加總電路3303、3304、3305及3306。來自每個陣列3301、3302之W+線之輸出及W-線之輸出分別組合在一起，從而對於所有對(W+,W-)線之每個對(W+,W-)胞元，有效地得出W=W+-W-。另外，來自每個陣列3301及3302之W值可經由加總電路3307及3308進一步組合，使得每個W值為來自陣列3301之W值減去來自陣列3302之W值的結果，此意謂來自加總電路3307及3308之終端結果為二個差分值之差分值。 Figure 33 depicts VMM system 3300. The weights W stored in a VMM array are stored as differential pairs, W+ (positive weight) and W- (negative weight), where W = (W+) - (W-). VMM system 3300 comprises array 3301 and array 3302. Half of the bit lines in each of arrays 3301 and 3302 are designated as W+ lines, that is, bit lines connected to memory cells that will store positive weights W+, and the other half of the bit lines in each of arrays 3301 and 3302 are designated as W- lines, that is, bit lines connected to memory cells implementing negative weights W-. The W- lines are interspersed among the W+ lines in an alternating fashion. The subtraction operation is performed by summing circuits that receive current from a W+ line and a W- line, such as summing circuits 3303, 3304, 3305, and 3306. The output of a W+ line and the output of a W- line from each array 3301, 3302 are respectively combined to give effectively W = W+ - W- for each pair of (W+, W-) cells, for all pairs of (W+, W-) lines. In addition, the W values from each of arrays 3301 and 3302 can be further combined through summing circuits 3307 and 3308, such that each W value is the result of a W value from array 3301 minus a W value from array 3302, meaning that the end result from summing circuits 3307 and 3308 is a differential value of two differential values.

用於類比神經記憶體系統中之每個非揮發性記憶體胞元為待抹除及程式化，以在浮動閘極中保持極特定且精確的電荷量，亦即電子數目。舉例而言，每個浮動閘極應保存N個不同值之一者，其中N係可由每個胞元指示之不同權重的數目。N之示例包括16、32、64、128及256。 Each non-volatile memory cell used in the analog neural memory system is to be erased and programmed to hold a very specific and precise amount of charge, i.e., number of electrons, in the floating gate. For example, each floating gate should hold one of N different values, where N is the number of different weights that can be indicated by each cell. Examples of N include 16, 32, 64, 128, and 256.
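
As a small worked illustration of what N levels per cell means, the sketch below converts N into an equivalent number of bits and maps a target weight onto the nearest of N levels. The uniform spacing of levels between `w_min` and `w_max`, and the function names, are assumptions made only for this example; the text does not state how the N values are distributed.

```python
import math

def bits_per_cell(n_levels):
    """Equivalent information per cell, in bits, for N distinguishable levels."""
    return math.log2(n_levels)

def nearest_level(weight, n_levels, w_min=0.0, w_max=1.0):
    """Index (0 .. N-1) of the level closest to `weight`,
    assuming N uniformly spaced levels between w_min and w_max."""
    step = (w_max - w_min) / (n_levels - 1)
    index = round((weight - w_min) / step)
    return max(0, min(n_levels - 1, index))  # clamp out-of-range targets
```

For the values of N listed above, `bits_per_cell` gives 4, 5, 6, 7, and 8 bits respectively.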

現有技術的VMM系統需要大量的面積並且在輸入及輸出級處涉及大量的延遲。在輸入級，在程式化操作之前需要多個時脈週期以將激勵資料載入到列暫存器中。例如，對於8位元I/O，每列需要8位元激勵資料，通常數量為1024列或更多，這需要每行一個時脈週期，或者假如有1024列則需要1024個時脈週期，導致10ns到10μs之間的延遲。在輸出級，移動神經元輸出資料也涉及延遲。例如，對於128ADC，8位元輸出需要128個時脈。 Prior art VMM systems require a large area and involve substantial latency at the input and output stages. At the input stage, multiple clock cycles are needed to load activation data into the row registers before a programming operation. For example, for 8-bit I/O, 8 bits of activation data are needed per row, and there are typically 1024 rows or more; this requires one clock cycle per row, or 1024 clock cycles if there are 1024 rows, resulting in a latency of between 10 ns and 10 μs. At the output stage, moving the neuron output data also involves latency. For example, for 128 ADCs, 8-bit outputs require 128 clocks.
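
The input-stage figure above is simple arithmetic: the number of rows times the clocks needed per row times the clock period. The sketch below works one case. The 10 ns clock period is a hypothetical value chosen for illustration, since the text does not state a clock frequency, and the function name is likewise an assumption.

```python
def input_load_latency_s(n_rows, clock_period_s, clocks_per_row=1):
    """Time to load all row registers sequentially, in seconds."""
    return n_rows * clocks_per_row * clock_period_s

rows = 1024
cycles = rows * 1                                  # one clock cycle per row
latency = input_load_latency_s(rows, 10e-9)        # hypothetical 10 ns clock
latency_us = latency * 1e6                         # same value in microseconds
```

With these assumed numbers the sequential load takes roughly 10 μs, which sits at the upper end of the range quoted above.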

期望減少輸入級與輸出級處的延遲以提高人工神經網路的整體操作速度。 It is desirable to reduce latency at the input and output stages in order to improve the overall operating speed of the artificial neural network.

揭示輸入電路系統及輸出電路系統及相關方法的許多示例，以實施人工神經網路中的並行與管線操作。 Numerous examples of input circuitry, output circuitry, and associated methods are disclosed to implement parallel and pipelined operation in artificial neural networks.

12:半導體基板 12: Semiconductor substrate

14:源極區 14: Source region

16:汲極區 16: Drain region

18:通道區 18: Channel region

20:浮動閘極 20: Floating gate

22:字元線端子 22: Word line terminal

24:位元線 24: Bitline

28:控制閘極 28: Control Gate

30:抹除閘極 30: Erase gate

31:數位至類比轉換器 31: Digital-to-Analog Converter

32,32a,32b,32c,32d,32e:向量乘矩陣乘法陣列 32, 32a, 32b, 32c, 32d, 32e: Vector-matrix multiplication arrays

33:非揮發性記憶體胞元陣列 33: Non-volatile memory cell arrays

34:抹除閘極及字元線閘極解碼器 34: Erase gate and word line gate decoder

35:控制閘極解碼器 35: Control gate decoder

36:位元線解碼器 36: Bit line decoder

37:源極線解碼器 37: Source line decoder

38:差分加總器 38: Differential summer

39,1602,1702,2002,2102:激勵函數區塊 39, 1602, 1702, 2002, 2102: Activation function blocks

210,310,410,510,710:記憶體胞元 210, 310, 410, 510, 710: Memory cells

900,1000,1100,1200,1300,1601,1701,2001,2101,2200,2300,2400,2500,2600,2700,2800,2900,3000,3401:VMM陣列 900, 1000, 1100, 1200, 1300, 1601, 1701, 2001, 2101, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3401: VMM Array

901,1003,1103,1203,1303:記憶體陣列 901, 1003, 1103, 1203, 1303: Memory arrays

902,1001,1002,1101,1102,1201,1202,1301,1302:參考陣列 902, 1001, 1002, 1101, 1102, 1201, 1202, 1301, 1302: Reference Array

903:控制閘極線 903: Control gate line

904:抹除閘極線 904: Erase gate line

1014:二極體連接式貫穿多工器 1014: Diode-connected through multiplexer

1204:疊接電晶體 1204: Cascoding transistor

1205,1212:多工器 1205, 1212: Multiplexers

1314:二極體連接式參考胞元貫穿多工器 1314: Diode-connected reference cell through multiplexer

1400:LSTM 1400:LSTM

1401,1402,1403,1404,1801,1802,1803,1804:胞元 1401, 1402, 1403, 1404, 1801, 1802, 1803, 1804: Cells

1500,1600,1700:LSTM胞元 1500, 1600, 1700: LSTM cells

1501,1502,1503,1901,1902:S型函數裝置 1501, 1502, 1503, 1901, 1902: Sigmoid function devices

1504,1505,1903:雙曲正切函數裝置 1504, 1505, 1903: Hyperbolic tangent function devices

1506,1507,1508,1703,1904,1905,1906,2103:乘法器裝置 1506, 1507, 1508, 1703, 1904, 1905, 1906, 2103: Multiplier devices

1509,1708,1907,2105:加法裝置 1509, 1708, 1907, 2105: Addition devices

1704,1705,1706,1707,2106,2107,2108:暫存器 1704, 1705, 1706, 1707, 2106, 2107, 2108: Registers

1709,1710,2104:多工器 1709, 1710, 2104: Multiplexer

1800:GRU 1800:GRU

1900,2000,2100:GRU胞元 1900, 2000, 2100: GRU cells

1908,2109:互補裝置 1908, 2109: Complementary Device

2701-1,2701-2,．．．,2701-(N-1),2701-N:位元線控制閘極 2701-1, 2701-2, ... , 2701-(N-1), 2701-N: Bit line control gate

3100,3210,3300,3400:VMM系統 3100, 3210, 3300, 3400: VMM systems

3101,3102,3213,3303,3304,3305,3306,3307,3308:加總電路 3101, 3102, 3213, 3303, 3304, 3305, 3306, 3307, 3308: Summing circuits

3211:第一陣列 3211: First array

3212:第二陣列 3212: Second array

3301,3302:陣列 3301, 3302: Arrays

3402:列解碼器 3402: Row decoder

3403:高電壓解碼器 3403: High voltage decoder

3404:行解碼器 3404: Column decoder

3405:位元線驅動器 3405: Bitline Driver

3406:輸入電路 3406: Input Circuit

3407:輸出電路 3407: Output Circuit

3408:控制邏輯 3408: Control Logic

3409:偏壓產生器 3409: Bias Generator

3410:高電壓產生區塊 3410: High Voltage Generation Block

3411:電荷泵 3411: Charge pump

3412:電荷泵調節器 3412: Charge pump regulator

3413:高電壓位準產生器 3413: High voltage level generator

3414:演算法控制器 3414: Algorithm controller

3415:類比電路系統 3415: Analog circuitry

3416:控制引擎 3416: Control Engine

3417:測試控制邏輯 3417: Test Control Logic

3418:靜態隨機存取記憶體區塊 3418: Static random access memory (SRAM) block

3500,3550,3600,3800,3900,3920,3940,3960:輸入區塊 3500, 3550, 3600, 3800, 3900, 3920, 3940, 3960: Input blocks

3501,3551,3601,3601-0,3601-1:全域數位至類比轉換器 3501, 3551, 3601, 3601-0, 3601-1: Global Digital-to-Analog Converters

3502,3502-0,3502-1,．．．,3502-(n-1),3502-n:列暫存器 3502, 3502-0, 3502-1, ..., 3502-(n-1), 3502-n: Row registers

3503,3503-0,3503-1,．．．,3503-(n-1),3503-n:數位比較器區塊 3503, 3503-0, 3503-1, ..., 3503-(n-1), 3503-n: Digital comparator blocks

3504,3504-0,3504-1,．．．,3504-(n-1),3504-n:列取樣及保持緩衝器 3504, 3504-0, 3504-1, ..., 3504-(n-1), 3504-n: Row sample-and-hold buffers

3505,3505-0,3505-1,．．．,3505-(n-1),3505-n:輸出信號 3505, 3505-0, 3505-1, ..., 3505-(n-1), 3505-n: Output signals

3552,3552-0,3552-1,．．．,3552-(n-1),3552-n:列暫存器 3552, 3552-0, 3552-1, ..., 3552-(n-1), 3552-n: Row registers

3553,3553-0,3553-1,．．．,3553-(n-1),3553-n:數位多工器區塊 3553, 3553-0, 3553-1, ..., 3553-(n-1), 3553-n: Digital multiplexer blocks

3554,3554-0,3554-1,．．．,3554-(n-1),3554-n:列取樣及保持緩衝器 3554, 3554-0, 3554-1, ..., 3554-(n-1), 3554-n: Row sample-and-hold buffers

3555,3555-0,3555-1,．．．,3555-(n-1),3555-n:輸出信號 3555, 3555-0, 3555-1, ..., 3555-(n-1), 3555-n: Output signals

3602,3602-0,3602-1,．．．,3602-(n-1),3602-n:列暫存器 3602, 3602-0, 3602-1, ..., 3602-(n-1), 3602-n: Row registers

3603,3603-0,3603-1,．．．,3603-(n-1),3603-n:數位比較器區塊 3603, 3603-0, 3603-1, ..., 3603-(n-1), 3603-n: Digital comparator blocks

3604,3604-0,3604-1,．．．,3604-(n-1),3604-n:列取樣及保持緩衝器 3604, 3604-0, 3604-1, ..., 3604-(n-1), 3604-n: Row sample-and-hold buffers

3605,3605-0,3605-1,．．．,3605-(n-1),3605-n:輸出信號 3605, 3605-0, 3605-1, ..., 3605-(n-1), 3605-n: Output signals

3700,3720:波形 3700, 3720: Waveform

3701,3721:信號GDACsup 3701, 3721: Signal GDACsup

3702,3703,3722,3723:信號 3702, 3703, 3722, 3723: Signals

3704,3724:信號DAC_sampling_en 3704, 3724: Signal DAC_sampling_en

3705,3706,3707,3708,3725,3726,3727,3728:邊緣 3705, 3706, 3707, 3708, 3725, 3726, 3727, 3728: Edge

3801,3801-0,3801-1,．．．,3801-n:暫存器 3801, 3801-0, 3801-1, ... , 3801-n: Registers

3802,3802-0,3802-1,．．．,3802-n:暫存器 3802, 3802-0, 3802-1, ..., 3802-n: Registers

3803,3803-0,3803-1,．．．,3803-n:列取樣及保持緩衝器 3803, 3803-0, 3803-1, ..., 3803-n: Row sample-and-hold buffers

3804,3804-0,3804-1,．．．,3804-n:位址解碼器 3804, 3804-0, 3804-1, ..., 3804-n: Address decoders

3810,3910:子區塊 3810, 3910: Sub-blocks

3850:輸入方法 3850: Input Method

3902,3902-0,3902-1,．．．,3902-n:列暫存器 3902, 3902-0, 3902-1, ..., 3902-n: Row registers

3903,3903-0,3903-1,．．．,3903-n:列取樣及保持緩衝器 3903, 3903-0, 3903-1, ..., 3903-n: Row sample-and-hold buffers

3904,3904-0,3904-1,．．．,3904-n:位址解碼器 3904, 3904-0, 3904-1, ..., 3904-n: Address decoders

3905,3905-0,3905-1,．．．,3905-n:標籤位元 3905, 3905-0, 3905-1, ..., 3905-n: Tag bits

3906,3906-0,3906-1,．．．,3906-n:列暫存器 3906, 3906-0, 3906-1, ..., 3906-n: Row registers

3907,3907-0,3907-1,．．．,3907-n:標籤位元 3907, 3907-0, 3907-1, ..., 3907-n: Tag bits

3908,3908-0,3908-1,．．．,3908-n:列暫存器 3908, 3908-0, 3908-1, ..., 3908-n: Row registers

3909,3909-0,3909-1,．．．,3909-n:標籤位元 3909, 3909-0, 3909-1, ..., 3909-n: Tag bits

3911,3911-0,3911-1,．．．,3911-n:列取樣及保持緩衝器 3911, 3911-0, 3911-1, ..., 3911-n: Row sample-and-hold buffers

4000,4020:輸出區塊 4000, 4020: Output blocks

4001:電流至電壓轉換器 4001: Current to Voltage Converter

4002:類比至數位轉換器 4002: Analog to Digital Converter

4003,4004:輸出暫存器 4003, 4004: Output Registers

4005:行標籤位元 4005: Column tag bit

4021:累加器 4021: Accumulator

4042:移位器 4042: Shifter

4043:加法器 4043: Adder

4044:累加器暫存器 4044: Accumulator Register

4100,4200,4300,4400:波形 4100, 4200, 4300, 4400: Waveforms

4101:第一階段 4101: First Stage

4102:第二階段 4102: Second Stage

4201:隨機存取讀取操作 4201: Random access read operation

4301:突發讀取操作 4301: Burst read operation

4401:神經讀取操作 4401: Neural read operation

4500,4600,4700,4900:神經讀取操作 4500, 4600, 4700, 4900: Neural read operations

4800:讀出操作 4800: Read operation

C1:層 C1: Layer

C2:層 C2: Layer

C3:層 C3: Layer

CB1:突觸 CB1: Synapse

CB2:突觸 CB2: Synapse

CB3:突觸 CB3: Synapse

CB4:突觸 CB4: Synapse

P1:激勵函數 P1: Activation function

P2:激勵函數 P2: Activation function

S1:層 S1: Layer

S2:層 S2: Layer

S3:層 S3: Layer

圖1為說明人工神經網路之圖。 Figure 1 is a diagram illustrating an artificial neural network.

圖2描繪先前技術分離閘極快閃記憶體胞元。 Figure 2 depicts a prior art split-gate flash memory cell.

圖3描繪另一先前技術分離閘極快閃記憶體胞元。 Figure 3 depicts another prior art split-gate flash memory cell.

圖4描繪另一先前技術分離閘極快閃記憶體胞元。 Figure 4 depicts another prior art split-gate flash memory cell.

圖5描繪另一先前技術分離閘極快閃記憶體胞元。 Figure 5 depicts another prior art split-gate flash memory cell.

圖6為說明利用一或多個非揮發性記憶體陣列之例示性人工神經網路之不同位準的圖。 Figure 6 is a diagram illustrating different levels of an exemplary artificial neural network utilizing one or more non-volatile memory arrays.

圖7為說明VMM系統之方塊圖。 Figure 7 is a block diagram illustrating the VMM system.

圖8為說明利用一或多個VMM系統之示例性人工神經網路的方塊圖。 Figure 8 is a block diagram illustrating an exemplary artificial neural network utilizing one or more VMM systems.

圖9描繪VMM系統之另一示例。 Figure 9 illustrates another example of a VMM system.

圖10描繪VMM系統之另一示例。 Figure 10 depicts another example of a VMM system.

圖11描繪VMM系統之另一示例。 Figure 11 illustrates another example of a VMM system.

圖12描繪VMM系統之另一示例。 Figure 12 illustrates another example of a VMM system.

圖13描繪VMM系統之另一示例。 Figure 13 illustrates another example of a VMM system.

圖14描繪先前技術長短期記憶體系統。 Figure 14 illustrates a prior art Long Short-Term Memory (LSTM) system.

圖15描繪供用於長短期記憶體系統中之示例性胞元。 Figure 15 depicts an exemplary cell for use in a Long Short-Term Memory (LSTM) system.

圖16描繪圖15之胞元之示例性實施。 Figure 16 illustrates an exemplary implementation of the cell in Figure 15.

圖17描繪圖15之胞元之另一示例性實施。 Figure 17 illustrates another exemplary implementation of the cell of Figure 15.

圖18描繪先前技術閘控遞回單元系統。 Figure 18 depicts a prior art gated recurrent unit (GRU) system.

圖19描繪用於閘控遞回單元系統中之示例性胞元。 Figure 19 depicts an exemplary cell for use in a gated recurrent unit (GRU) system.

圖20描繪圖19之胞元的示例性實施。 Figure 20 illustrates an exemplary implementation of the cell in Figure 19.

圖21描繪圖19之胞元之另一示例性實施。 Figure 21 depicts another exemplary implementation of the cell of Figure 19.

圖22描繪VMM系統之另一示例。 Figure 22 illustrates another example of a VMM system.

圖23描繪VMM系統之另一示例。 Figure 23 illustrates another example of a VMM system.

圖24描繪VMM系統之另一示例。 Figure 24 illustrates another example of a VMM system.

圖25描繪VMM系統之另一示例。 Figure 25 illustrates another example of a VMM system.

圖26描繪VMM系統之另一示例。 Figure 26 illustrates another example of a VMM system.

圖27描繪VMM系統之另一示例。 Figure 27 illustrates another example of a VMM system.

圖28描繪VMM系統之另一示例。 Figure 28 illustrates another example of a VMM system.

圖29描繪VMM系統之另一示例。 Figure 29 illustrates another example of a VMM system.

圖30描繪VMM系統之另一示例。 Figure 30 depicts another example of a VMM system.

圖31描繪VMM系統之另一示例。 Figure 31 depicts another example of a VMM system.

圖32描繪VMM系統之另一示例。 Figure 32 illustrates another example of a VMM system.

圖33描繪VMM系統之另一示例。 Figure 33 illustrates another example of a VMM system.

圖34描繪VMM系統之另一示例。 Figure 34 illustrates another example of a VMM system.

圖35A及35B描繪VMM系統的輸入區塊。 Figures 35A and 35B depict the input blocks of the VMM system.

圖36描繪VMM系統的輸入區塊。 Figure 36 depicts the input block of the VMM system.

圖37A及37B描繪與VMM陣列的輸入操作相關聯的信號。 Figures 37A and 37B depict the signals associated with the input operations of the VMM array.

圖38A描繪VMM系統的輸入區塊。 Figure 38A depicts the input block of a VMM system.

圖38B描繪輸入方法。 Figure 38B depicts the input method.

圖39A、39B、39C及39D描繪VMM系統的輸入區塊。 Figures 39A, 39B, 39C, and 39D depict the input blocks of the VMM system.

圖40A描繪VMM系統的輸出區塊。 Figure 40A depicts the output block of the VMM system.

圖40B描繪VMM系統的輸出區塊。 Figure 40B depicts the output block of the VMM system.

圖40C描繪VMM系統的輸出區塊。 Figure 40C depicts the output block of the VMM system.

圖41描繪VMM系統的波形。 Figure 41 depicts the waveforms of the VMM system.

圖42描繪VMM系統的波形。 Figure 42 depicts the waveforms of the VMM system.

圖43描繪VMM系統的波形。 Figure 43 depicts the waveforms of the VMM system.

圖44描繪VMM系統的波形。 Figure 44 depicts the waveforms of the VMM system.

圖45描繪神經讀取操作方法。 Figure 45 depicts a neural read operation method.

圖46描繪神經讀取操作方法。 Figure 46 depicts a neural read operation method.

圖47描繪神經讀取操作方法。 Figure 47 depicts a neural read operation method.

圖48描繪神經讀取操作方法。 Figure 48 depicts a neural read operation method.

圖49描繪神經讀取操作方法。 Figure 49 depicts a neural read operation method.

VMM系統架構： VMM system architecture:

圖34描繪VMM系統3400之方塊圖。VMM系統3400包含VMM陣列3401、列解碼器3402、高電壓解碼器3403、行解碼器3404、位元線驅動器3405(例如用於程式化的位元線控制電路系統)、輸入電路3406、輸出電路3407、控制邏輯3408及偏壓產生器3409。VMM系統3400進一步包含高電壓產生區塊3410，該高電壓產生區塊包含電荷泵3411、電荷泵調節器3412及高電壓位準產生器3413。VMM系統3400進一步包含(程式化/抹除，或權重調諧)演算法控制器3414、類比電路系統3415、控制引擎3416(其可包括特殊函數，如算術函數、激勵函數、嵌入式微控制器邏輯，但不限於此)、測試控制邏輯3417以及用以儲存中間資料的靜態隨機存取記憶體(SRAM)區塊3418，例如用於輸入電路的中間資料(例如，激勵資料)或輸出電路的中間資料(神經元輸出資料、部分加總輸出神經元資料)或用於程式化的資料輸入(例如對於整列或多列的資料輸入)。 Figure 34 depicts a block diagram of VMM system 3400. VMM system 3400 comprises VMM array 3401, row decoder 3402, high voltage decoder 3403, column decoder 3404, bit line drivers 3405 (such as bit line control circuitry for programming), input circuit 3406, output circuit 3407, control logic 3408, and bias generator 3409. VMM system 3400 further comprises high voltage generation block 3410, which comprises charge pump 3411, charge pump regulator 3412, and high voltage level generator 3413. VMM system 3400 further comprises (program/erase, or weight tuning) algorithm controller 3414, analog circuitry 3415, control engine 3416 (which may include special functions such as arithmetic functions, activation functions, and embedded microcontroller logic, without limitation), test control logic 3417, and static random access memory (SRAM) block 3418 to store intermediate data, such as intermediate data for the input circuits (e.g., activation data) or for the output circuits (neuron output data, partial sum output neuron data), or data in for programming (such as data in for a whole row or for multiple rows).

輸入電路3406可以包括如DAC(數位至類比轉換器)、DPC(數位至脈衝轉換器，數位至時間調變脈衝轉換器)、AAC(類比至類比轉換器，例如，電流至電壓轉換器、對數轉換器)、PAC(脈衝至類比位準轉換器)或任何其它類型的轉換器之電路。輸入電路3406可以實施正規化、線性或非線性上/下縮放函數或算術函數的一個或多個。輸入電路3406可以實施輸入位準的溫度補償功能。輸入電路3406可以實施如ReLU或sigmoid激勵函數。輸入電路3406可以在程式化或讀取操作期間儲存將被應用為輸入信號或與輸入信號組合的數位激勵資料。數位激勵資料可以儲存在暫存器中。輸入電路3406可以包括用以驅動陣列端子(例如CG、WL、EG及SL線)的電路，其可以包括取樣及保持電路與緩衝器。DAC可用於將數位激勵資料轉換為類比輸入電壓以應用於陣列。 Input circuit 3406 may include circuits such as a DAC (digital-to-analog converter), DPC (digital-to-pulse converter, digital-to-time-modulated-pulse converter), AAC (analog-to-analog converter, such as a current-to-voltage converter or logarithmic converter), PAC (pulse-to-analog level converter), or any other type of converter. Input circuit 3406 may implement one or more of normalization, linear or non-linear up/down scaling functions, or arithmetic functions. Input circuit 3406 may implement a temperature compensation function for the input levels. Input circuit 3406 may implement an activation function such as ReLU or sigmoid. Input circuit 3406 may store digital activation data to be applied as, or combined with, an input signal during a program or read operation. The digital activation data can be stored in registers. Input circuit 3406 may include circuits to drive the array terminals, such as the CG, WL, EG, and SL lines, which may include sample-and-hold circuits and buffers. A DAC can be used to convert the digital activation data into an analog input voltage to be applied to the array.

輸出電路3407可以包括如ITV(電流至電壓電路)、ADC(類比至數位轉換器，用以將神經元類比輸出轉換成數位位元)、AAC(類比至類比轉換器，例如，電流至電壓轉換器、對數轉換器)、APC(類比至脈衝轉換器、類比至時間調變脈衝轉換器)或任何其它類型的轉換器之電路。輸出電路3407可以將陣列輸出轉換成激勵資料。輸出電路3407可以實施激勵函數，例如整流線性激勵函數(ReLU)或sigmoid激勵函數。輸出電路3407可以對神經元輸出實施統計正規化、正則化、上/下縮放/增益函數、統計修整及算術函數(例如，加、減、除、乘、移位、對數)的一個或多個。輸出電路3407可以對神經元輸出或陣列輸出(例如，位元線輸出)實施溫度補償函數，以便例如藉由保持IV斜率大致隨溫度相同來保持陣列的功率消耗近似恆定或提高陣列(神經元)輸出的精確度。輸出電路3407可以包括儲存輸出資料的暫存器。 Output circuit 3407 may include circuits such as an ITV (current-to-voltage circuit), ADC (analog-to-digital converter, to convert neuron analog output into digital bits), AAC (analog-to-analog converter, such as a current-to-voltage converter or logarithmic converter), APC (analog-to-pulse converter, analog-to-time-modulated-pulse converter), or any other type of converter. Output circuit 3407 may convert array outputs into activation data. Output circuit 3407 may implement an activation function such as the rectified linear activation function (ReLU) or sigmoid. Output circuit 3407 may implement one or more of statistical normalization, regularization, up/down scaling/gain functions, statistical rounding, or arithmetic functions (e.g., add, subtract, divide, multiply, shift, log) on neuron outputs. Output circuit 3407 may implement a temperature compensation function on neuron outputs or array outputs (such as bit line outputs) so as to keep the power consumption of the array approximately constant or to improve the precision of the array (neuron) outputs, for example by keeping the I-V slope approximately the same over temperature. Output circuit 3407 may include registers for storing output data.

圖35A描繪用於向VMM陣列3401提供輸入的輸入區塊3500。輸入區塊3500包括全域數位至類比轉換器(DAC)3501；列暫存器3502-0至3502-n，分別對應於陣列中編號為0至n的列之一者；數位比較器區塊3503-0至3503-n，分別對應於陣列中編號為0至n的列之一者；列取樣及保持緩衝器3504-0至3504-n，分別對應於陣列中編號0至n的列之一者；以及輸出信號3505-0至3505-n，分別對應於陣列中編號為0至n的列之一者，並分別表示為CGIN0、CGIN1．．．、CGINn-1及CGINn。信號GDACsup是由全域DAC 3501提供的全域DAC信號。信號CGIN0至CGINn耦合到陣列3401的各自的列輸入。CLKDAC是用於提供類比輸出值的GDAC的輸入時脈。在一個示例中，這些類比輸出值對應於CLKDAC時脈的計數。 Figure 35A depicts input block 3500, which provides inputs to VMM array 3401. Input block 3500 comprises global digital-to-analog converter (DAC) 3501; row registers 3502-0 through 3502-n, each corresponding to one of the rows numbered 0 through n in the array; digital comparator blocks 3503-0 through 3503-n, each corresponding to one of the rows numbered 0 through n in the array; row sample-and-hold (S/H) buffers 3504-0 through 3504-n, each corresponding to one of the rows numbered 0 through n in the array; and output signals 3505-0 through 3505-n, each corresponding to one of the rows numbered 0 through n in the array and denoted CGIN0, CGIN1, ..., CGINn-1, and CGINn, respectively. Signal GDACsup is the global DAC signal provided by global DAC 3501. Signals CGIN0 through CGINn are coupled to the respective row inputs of array 3401. CLKDAC is the input clock for the global DAC (GDAC), which provides analog output values. In one example, these analog output values correspond to the count of the CLKDAC clock.

數位比較器區塊3503將儲存在相關聯的列暫存器3502中的值與信號CLKCOUNTx進行比較。CLKCOUNTx是計數器的結果，該計數器在預定間隔期間對時脈信號進行計數；假如匹配，則對應的列S/H 3504藉由各自的數位比較器區塊3503致能以將來自全域DAC 3501的值取樣到各自的列S/H緩衝器中。該技術將稱為全域列DAC取樣。如上所述，VMM陣列3401中的每一列具有對應的列暫存器3502、數位比較器區塊3503及列S/H 3504。 Digital comparator block 3503 compares the value stored in the associated row register 3502 against signal CLKCOUNTx, which is the result of a counter that counts a clock signal during a predetermined interval. If the two match, the corresponding row S/H 3504 is enabled by its digital comparator block 3503 to sample the value from global DAC 3501 into the respective row S/H buffer. This technique will be referred to as global row DAC sampling. As described above, each row in VMM array 3401 has a corresponding row register 3502, digital comparator block 3503, and row S/H 3504.

在操作期間，列暫存器3502-0至3502-n加載有用於該特定列的數位輸入位元DINx(其中x是位元的數目，例如8或16位元)並且接收時脈信號CLK。CLK信號用於將來自數位輸入位元DINx的資料載入到各自的列暫存器3502-x中。全域DAC 3501由所有列共用，並且以時間多工的方式對儲存在特定列暫存器3502中的數位位元DINx執行數位至類比轉換。該轉換是藉由數位比較器區塊3503的每一者，藉由將特定列的數位輸入位元與作為數位計數值的信號CLKCOUNTx進行比較來完成的。當信號 CLKCOUNTx的數位計數值與各自的列暫存器3502的內容匹配時，該列的對應的列取樣及保持緩衝器3504對來自全域DAC 3501的類比輸出進行取樣並保持該值，然後將該值用作該特定列的輸出信號3505。輸出信號3505可以在該特定列中的程式化操作期間以關於上面其他圖式描述的方式應用到例如控制閘極線或字元線或抹除閘極。 During operation, row registers 3502-0 through 3502-n are loaded with digital input bits DINx for that particular row (where x is the number of bits, such as 8 or 16) and receive clock signal CLK. The CLK signal is used to load the data from digital input bits DINx into the respective row register 3502-x. Global DAC 3501 is shared by all rows and performs, in a time-multiplexed fashion, the digital-to-analog conversion of the digital bits DINx stored in a particular row register 3502. The conversion is done by each of the digital comparator blocks 3503 comparing the digital input bits of its row against signal CLKCOUNTx, which is a digital count value. When the digital count value of signal CLKCOUNTx matches the content of the respective row register 3502, the corresponding row sample-and-hold buffer 3504 for that row samples and holds the analog output from global DAC 3501, and that value is then used as output signal 3505 for that particular row. Output signal 3505 can be applied, for example, to a control gate line, a word line, or an erase gate during a programming operation in that particular row, in the manner described above with respect to other figures.
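
The global row DAC sampling described above can be modeled behaviorally as follows. This is a rough software sketch rather than the circuit: the function name, the `v_lsb` step size, and the assumption that the global DAC output equals the count times a fixed step are all hypothetical. What it takes from the text is that a single counter value is broadcast to every row, and each row's sample-and-hold captures the shared DAC output on the count that matches its register.

```python
def global_row_dac_sampling(row_registers, bits=8, v_lsb=0.01):
    """Behavioral model of input block 3500: one shared DAC, per-row comparators.

    row_registers: digital input value DINx held for each row.
    Returns the voltage held by each row's S/H buffer (the CGINx outputs).
    """
    held = [None] * len(row_registers)            # row S/H buffers, initially empty
    for clk_count in range(2 ** bits):            # CLKCOUNTx from the counter
        gdac_sup = clk_count * v_lsb              # shared DAC output (linear, assumed)
        for row, din in enumerate(row_registers):
            if din == clk_count:                  # digital comparator match
                held[row] = gdac_sup              # sample and hold for this row
    return held

cgin = global_row_dac_sampling([3, 0, 255, 3])
```

In this model one sweep of the counter serves every row, so the number of conversion steps depends on the bit width (2^bits counts) and not on the number of rows, and rows holding the same value are sampled on the same count.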

可選替地，列取樣及保持緩衝器3504可以通過對列取樣及保持緩衝器進行時間多工而在兩個或更多列之間共用。 Alternatively, a row sample-and-hold buffer 3504 can be shared between two or more rows by time-multiplexing the row sample-and-hold buffer.

圖35B描繪待用於向VMM陣列3401提供輸入的輸入區塊3550。輸入區塊3550包括全域數位至類比轉換器(DAC)3551；列暫存器3552-0至3552-n，分別對應於VMM陣列3401中編號為0至n的列之一者；數位多工器(mux)區塊3553-0至3553-n，分別對應於編號為0至n的列之一者；列取樣及保持(S/H)緩衝器3554-0到3554-n，分別對應於編號為0到n的列之一者；以及輸出信號3555-0至3555-n，分別表示為CGIN0、CGIN1．．．、CGINn-1及CGINn，分別對應於編號為0至n的列之一者。數位多工器區塊3553用於將各自列暫存器3552的資料多工到匯流排GDAC_DINx中，該資料被用作至全域DAC 3551的輸入。對應的列S/H緩衝器3554對來自全域DAC的值進行取樣進入本地S/H緩衝器3554。每列具有其自己各自的列暫存器3552、S/H緩衝器3554及輸出信號3555。 Figure 35B depicts input block 3550, which is to be used to provide inputs to VMM array 3401. Input block 3550 comprises global digital-to-analog converter (DAC) 3551; row registers 3552-0 through 3552-n, each corresponding to one of the rows numbered 0 through n in VMM array 3401; digital multiplexer (mux) blocks 3553-0 through 3553-n, each corresponding to one of the rows numbered 0 through n; row sample-and-hold (S/H) buffers 3554-0 through 3554-n, each corresponding to one of the rows numbered 0 through n; and output signals 3555-0 through 3555-n, denoted CGIN0, CGIN1, ..., CGINn-1, and CGINn, each corresponding to one of the rows numbered 0 through n. Digital multiplexer block 3553 is used to multiplex the data of the respective row register 3552 onto bus GDAC_DINx, where it is used as the input to global DAC 3551. The corresponding row S/H buffer 3554 samples the value from the global DAC into the local S/H buffer 3554. Each row has its own row register 3552, S/H buffer 3554, and output signal 3555.

在操作期間，列暫存器3552-0到3552-n加載用於該特定列的數位輸入位元DINx(其中x是位元的數目，如8或16位元)並且接收時脈信號CLK。CLK信號用於將來自數位輸入位元DINx 的資料載入到各自的列暫存器3552中。全域數位至類比轉換器3551由所有列共用，並且以時間多工的方式對對儲存在特定列暫存器3552中的數位位元DINx執行數位至類比轉換。該轉換藉由將列暫存器的資料時間多工成為全域DAC 3551的資料輸入(匯流排GDAC_DINx)來完成。列暫存器資料成為資料輸入匯流排GDAC_DINx的多工藉由每列各自的致能信號EN-x 3557-x來致能。對應的列取樣及保持緩衝器3554對來自全域DAC 3551的類比輸出進行取樣並保持該值，然後將該值用作該特定列的輸出信號3555。例如，在該特定列中的程式化操作期間，輸出信號3555可以用如上述關於其他圖式的方式應用到控制閘極線或字元線。 During operation, row registers 3552-0 through 3552-n are loaded with digital input bits DINx for that particular row (where x is the number of bits, such as 8 or 16) and receive clock signal CLK. The CLK signal is used to load the data from digital input bits DINx into the respective row register 3552. Global DAC 3551 is shared by all rows and performs, in a time-multiplexed fashion, the digital-to-analog conversion of the digital bits DINx stored in a particular row register 3552. The conversion is done by time-multiplexing the data of the row registers onto the data input of global DAC 3551 (bus GDAC_DINx). The multiplexing of row register data onto data input bus GDAC_DINx is enabled by each row's respective enable signal EN-x 3557-x. The corresponding row sample-and-hold buffer 3554 samples and holds the analog output from global DAC 3551, and that value is then used as output signal 3555 for that particular row. For example, output signal 3555 can be applied to a control gate line or a word line during a programming operation in that particular row, in the manner described above with respect to other figures.
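
For comparison with the counter-and-comparator approach, the multiplexed approach of input block 3550 can be sketched as a loop that enables one row at a time. As before, this is a behavioral illustration only: the function name, the `v_lsb` step, the linear DAC transfer, and the strictly sequential order of the enable signals are assumptions.

```python
def mux_row_dac_sampling(row_registers, v_lsb=0.01):
    """Behavioral model of input block 3550: row data is multiplexed onto the
    shared DAC input bus, one row per conversion.

    Returns (held voltages per row, number of DAC conversions performed).
    """
    held = [None] * len(row_registers)        # row S/H buffers
    conversions = 0
    for row in range(len(row_registers)):     # enable signal EN-x asserted for this row
        gdac_din = row_registers[row]         # mux drives this row's data onto GDAC_DINx
        dac_out = gdac_din * v_lsb            # shared DAC converts it (linear, assumed)
        held[row] = dac_out                   # this row's S/H samples and holds
        conversions += 1
    return held, conversions

cgin, n_conversions = mux_row_dac_sampling([3, 0, 255])
```

In this model the number of conversions equals the number of rows, whereas the counter-based model of Figure 35A needs one sweep of the count range regardless of how many rows there are.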

可選替地，列取樣及保持緩衝器3554可以通過對列取樣及保持緩衝器進行時間多工而在多個列之間共用。 Alternatively, a row sample-and-hold buffer 3554 can be shared among multiple rows by time-multiplexing the row sample-and-hold buffer.

圖36描繪待用於向VMM陣列3401提供輸入的輸入區塊3600。輸入區塊3600包括兩個全域DAC 3601-0及3601-1；列暫存器3602-0至3602-n，分別對應於VMM陣列3401中編號為0至n的列之一者；數位比較器區塊3603至3603-n分別對應於VMM陣列3401中編號為0至n的列之一者；列取樣及保持緩衝器3604-0到3604-n，分別對應於VMM陣列3401中編號為0到n的列之一者；以及輸出信號3605-0至3605-n，分別表示為CGIN0、CGIN1．．．、CGINn-1及CGINn，分別對應於編號為0至n的列之一者。數位比較器區塊3603將儲存在各自列暫存器3602中的值與信號CLKCOUNTx進行比較，該信號CLKCOUNTx是計數器的結果，該計數器在預定間隔期間對時脈信號進行計數。當信號CLKCOUNTx的數位計數值與各自的列暫存器3602的內容匹配時，各自的列S/H緩衝器3604被致能以將來自全域DAC 3601的值取樣到各自的S/H緩衝器3604中。每列具有其自身的列暫存器3602、數位比較器區塊3603及列S/H緩衝器3604。 Figure 36 depicts input block 3600, which is to be used to provide inputs to VMM array 3401. Input block 3600 comprises two global DACs 3601-0 and 3601-1; row registers 3602-0 through 3602-n, each corresponding to one of the rows numbered 0 through n in VMM array 3401; digital comparator blocks 3603-0 through 3603-n, each corresponding to one of the rows numbered 0 through n in VMM array 3401; row sample-and-hold buffers 3604-0 through 3604-n, each corresponding to one of the rows numbered 0 through n in VMM array 3401; and output signals 3605-0 through 3605-n, denoted CGIN0, CGIN1, ..., CGINn-1, and CGINn, each corresponding to one of the rows numbered 0 through n. Digital comparator block 3603 compares the value stored in the respective row register 3602 against signal CLKCOUNTx, which is the result of a counter that counts a clock signal during a predetermined interval. When the digital count value of signal CLKCOUNTx matches the content of the respective row register 3602, the respective row S/H buffer 3604 is enabled to sample the value from global DAC 3601 into the respective S/H buffer 3604. Each row has its own row register 3602, digital comparator block 3603, and row S/H buffer 3604.

在操作期間，列暫存器3602-0至3602-n加載用於相關聯的列的數位輸入位元DINx(其中x是位元的數目，如8或16位元)並且接收時脈信號CLK。CLK信號用於將來自數位輸入位元DINx的資料載入到列暫存器3602-x中。全域DAC 3601(其由多個全域DAC組成，例如3601-0及3601-1)由所有列共用。在一個示例中，全域DAC 3601-0在偶數列上操作並且全域DAC 3601-1在奇數列上操作。全域DAC 3601接收時脈信號CLKDAC並輸出與CLKDAC時脈的計數對應的類比值。全域DAC 3601對儲存在相關列暫存器3602中的數位位元DINx執行數位類比轉換(通過GDAC_DINx匯流排)。該列的對應的列取樣及保持緩衝器3604對來自全域數位至類比轉換器3601的類比輸出進行取樣並保持該值，該值然後被用作為該特定列的輸出信號3605。例如，在該特定列或多列中的程式化操作期間，可以用上述關於其他圖式的方式將輸出信號3605應用到控制閘極線或字元線。 During operation, column registers 3602-0 to 3602-n load the digital input bits DINx (where x is the number of bits, such as 8 or 16 bits) for the associated column and receive the clock signal CLK. The CLK signal is used to load data from the digital input bits DINx into column register 3602-x. The global DAC 3601 (which consists of multiple global DACs, such as 3601-0 and 3601-1) is shared by all columns. In one example, global DAC 3601-0 operates on even-numbered columns and global DAC 3601-1 operates on odd-numbered columns. The global DAC 3601 receives the clock signal CLKDAC and outputs an analog value corresponding to the count of the CLKDAC clock. The global DAC 3601 performs a digital-to-analog conversion on the digital bits DINx stored in the relevant column register 3602 (via the GDAC_DINx bus). The corresponding column sample-and-hold buffer 3604 samples and holds the analog output from the global digital-to-analog converter 3601, which is then used as the output signal 3605 for that specific column. For example, during programmed operation in that specific column or multiple columns, the output signal 3605 can be applied to control gate lines or word lines in the manner described above with respect to other diagrams.
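
The count-and-compare sampling of Figure 36 can be illustrated with a small behavioral sketch. This is only a model under stated assumptions, not the disclosed circuit: the function name `ramp_compare_sample`, the linear DAC curve, and the full-scale voltage are hypothetical, and the split between DACs 3601-0 and 3601-1 is not modeled.

```python
def ramp_compare_sample(register_codes, bits=8, v_full_scale=1.0):
    """Return the voltage held by each column S/H buffer after one count sweep.

    register_codes: digital value stored in each column register (3602-x).
    Assumption: the global DAC output is linear in the count value.
    """
    levels = (1 << bits) - 1
    held = [None] * len(register_codes)
    for count in range(levels + 1):               # CLKCOUNTx sweeps 0 .. 2^bits - 1
        dac_out = v_full_scale * count / levels   # global DAC output for this count
        for col, code in enumerate(register_codes):
            if code == count:                     # digital comparator match
                held[col] = dac_out               # S/H buffer samples the shared DAC
    return held


held = ramp_compare_sample([0, 255, 51])
```

Because every column compares against the same count, columns holding the same code sample the shared DAC output at the same moment, so a single sweep serves all columns.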

圖37A描繪波形3700，其說明回應GDACsup 3701而藉由圖35A中的列取樣及保持緩衝器3504、圖35B中的列取樣及保持緩衝器3554或圖36中之列取樣及保持緩衝器3604進行各自取樣及保持動作後輸出CGIN0及CGIN1的示例電壓位準。信號GDACsup 3701是來自全域DAC(例如圖35A中的全域DAC 3501、圖35B中的全域DAC 3551以及圖36中的全域DAC 3601-0及3601-1)的供應電壓。GDACsup 3701是線性DAC電壓曲線，即全域DAC輸出表示數位輸入值至類比值的線性平移。這種線性平移適合於在線性區域中操作的記憶體胞元。作為示例，信號3702顯示列0(CGIN0)上的取樣電壓值(位準)，並且信號3703顯示列1(CGIN1)上的取樣電壓值(位準)。信號DAC_sampling_en 3704是致能取樣及保持操作的控制信號。取樣的四個示例顯示在對應於被取樣的不同電壓的邊緣3705、3706、3707及3708。 Figure 37A depicts waveform 3700, illustrating example voltage levels CGIN0 and CGIN1 output after sampling and holding operations by the column sample and hold buffer 3504 in Figure 35A, the column sample and hold buffer 3554 in Figure 35B, or the column sample and hold buffer 3604 in Figure 36 in response to GDACsup 3701. The signal GDACsup 3701 is the supply voltage from the global DAC (e.g., global DAC 3501 in Figure 35A, global DAC 3551 in Figure 35B, and global DACs 3601-0 and 3601-1 in Figure 36). GDACsup 3701 is a linear DAC voltage curve, representing a linear shift of the digital input value to the analog value at the global DAC output. This linear translation is suitable for memory cells operating in linear regions. As an example, signal 3702 displays the sampled voltage value (level) on column 0 (CGIN0), and signal 3703 displays the sampled voltage value (level) on column 1 (CGIN1). Signal DAC_sampling_en 3704 is the control signal enabling the sampling and holding operation. Four sampling examples are shown at the edges 3705, 3706, 3707, and 3708 corresponding to the different voltages being sampled.

圖37B描繪波形3720，其顯示，響應於GDACsup 3721，在圖35A中的列取樣及保持緩衝器3504的對應取樣和保持動作之後、在圖35B中的列取樣及保持緩衝器3554的對應取樣和保持動作之後，或在圖36中的列取樣及保持緩衝器3604之後，輸出CGIN0和CGIN1的示例對數電壓位準。數位輸入值到類比值的對數轉換的使用適合於在亞閾值區域中操作的儲存器單元。或者，它可以用於在飽和區操作的儲存器單元。信號GDACsup 3721是從全域DAC(例如從圖35A中的全域DAC 3501、圖35B中的全域DAC 3551以及圖36中的全域DAC 3601-0和3601-1)提供的電壓。GDACsup 3721是對數DAC曲線。作為示例，信號3722顯示了列0(CGIN0)上的取樣電壓值(位準)，並且信號3723顯示了列1(CGIN1)上的取樣電壓值(位準)。信號DAC_sampling_en 3724是致能取樣和保持操作的控制信號。在對應於被取樣的不同電壓的邊緣3725、3726、3727和3728處顯示了取樣的四個示例。 Figure 37B depicts waveform 3720, showing that, in response to GDACsup 3721, after the corresponding sampling and holding action of column sample and hold buffer 3504 in Figure 35A, after the corresponding sampling and holding action of column sample and hold buffer 3554 in Figure 35B, or after column sample and hold buffer 3604 in Figure 36, the example logarithmic voltage levels of CGIN0 and CGIN1 are output. The use of logarithmic conversion from digital input values to analog values is suitable for memory units operating in the subthreshold region. Alternatively, it can be used for memory units operating in the saturation region. Signal GDACsup 3721 is the voltage supplied from a global DAC (e.g., global DAC 3501 in Figure 35A, global DAC 3551 in Figure 35B, and global DACs 3601-0 and 3601-1 in Figure 36). GDACsup 3721 is a logarithmic DAC curve. As an example, signal 3722 shows the sampled voltage value (level) at column 0 (CGIN0), and signal 3723 shows the sampled voltage value (level) at column 1 (CGIN1). Signal DAC_sampling_en 3724 is the control signal enabling sampling and holding operations. Four examples of sampling are shown at the edges 3725, 3726, 3727, and 3728 corresponding to the different voltages being sampled.

智慧DAC取樣方法如下。如圖37A與37B所示，取樣致能僅在用於特定輸入操作的列暫存器上完成，這意味著取樣在列暫存器的第一個最小值處致能，並在列暫存器的最大值處結束。這是為了僅根據列暫存器上的輸入之值(即激勵輸入的值)的範圍來減少將如所需的取樣時間。 The smart DAC sampling method is as follows. As shown in Figures 37A and 37B, sampling is enabled only over the values held in the column registers for a particular input operation: sampling is enabled at the smallest value held in the column registers and ends at the largest value. This shortens the required sampling time, because only the range spanned by the input values in the column registers (that is, the activation input values) has to be covered.
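
As a rough illustration of the saving, the sketch below computes the sampling window from the values held in the column registers. The function name `sampling_window` and the assumption of one DAC step per code are hypothetical and used only for illustration.

```python
def sampling_window(register_codes, bits=8):
    """Return (first_code, last_code, steps_in_window, steps_in_full_sweep).

    Assumption: the DAC advances one code per step, so the number of steps
    is proportional to the number of codes that must be covered.
    """
    first_code = min(register_codes)   # sampling is enabled here
    last_code = max(register_codes)    # sampling ends here
    steps_in_window = last_code - first_code + 1
    steps_in_full_sweep = 1 << bits
    return first_code, last_code, steps_in_window, steps_in_full_sweep


window = sampling_window([40, 45, 100])
```

In this example only the codes between the smallest and largest register values need to be swept, rather than the full 8-bit range.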

此外，針對一次的取樣，假如致能最大數量的列，例如致能最大128列，那麼對於相同的輸入值假如致能180列，則取樣將發生兩次取樣，第一次對於128列的取樣及第二次對於52列的取樣，可選替地，90列的第一次取樣及90列的第二次取樣。這是為了減少取樣電路上的負載，以防大負載可能導致不期望的設置時間。 In addition, a maximum number of columns, for example 128, may be enabled for a single sampling. If 180 columns are enabled for the same input value, sampling then takes place twice: a first sampling for 128 columns and a second sampling for 52 columns or, alternatively, a first sampling for 90 columns and a second sampling for 90 columns. This reduces the load on the sampling circuitry, since a large load could cause an undesirable settling time.
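
The two ways of splitting the enabled columns can be sketched as follows. The function name `split_into_batches` and the `balanced` flag are hypothetical; the only input taken from the text is the cap on the number of columns per sampling.

```python
def split_into_batches(column_ids, max_per_sample=128, balanced=False):
    """Split enabled columns into sampling batches no larger than the cap.

    balanced=False fills each batch up to the cap before starting the next;
    balanced=True spreads the columns as evenly as possible.
    """
    total = len(column_ids)
    if total == 0:
        return []
    num_batches = -(-total // max_per_sample)      # ceiling division
    if balanced:
        base, extra = divmod(total, num_batches)
        sizes = [base + (1 if i < extra else 0) for i in range(num_batches)]
    else:
        sizes = [min(max_per_sample, total - i * max_per_sample)
                 for i in range(num_batches)]
    batches, start = [], 0
    for size in sizes:
        batches.append(column_ids[start:start + size])
        start += size
    return batches


greedy = split_into_batches(list(range(180)))
even = split_into_batches(list(range(180)), balanced=True)
```

Either split keeps every batch at or below the cap; the balanced split additionally equalizes the load seen by the sampling circuitry across the two samplings.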

圖38描繪與VMM陣列3401一起使用的輸入區塊3800。輸入區塊3800包括子區塊3810、SRAM 3418、暫存器3801-0、3801-1、．．．、3801-n及位址解碼器3804-0、3804-1、．．．、3804-n。子區塊3810可選擇性地包括分別來自圖35A、35B及36的輸入區塊3500、3550及3600之一者。子區塊3810包括暫存器3802-0、3802-1、．．．、3802-n及列取樣及保持緩衝器3803-0、3803-1、．．．、3803-n以及根據圖35A、35B及36的其間的電路系統，視具體情況而定。在子區塊3810包括輸入區塊3500的情況下，則暫存器3802包括列暫存器3502並且列取樣及保持緩衝器3803包括列取樣及保持緩衝器3504。在子區塊3810包括輸入區塊3550的情況下，則暫存器3802包括列暫存器3552並且列取樣及保持緩衝器3803包括列取樣及保持緩衝器3554。在子區塊3810包括輸入區塊3600的情況下，則暫存器3802包括列暫存器3602並且列取樣及保持緩衝器3803包括列取樣及保持緩衝器3604。 Figure 38 depicts input block 3800, which is used with VMM array 3401. Input block 3800 includes sub-block 3810, SRAM 3418, registers 3801-0, 3801-1, ..., 3801-n, and address decoders 3804-0, 3804-1, ..., 3804-n. Sub-block 3810 may optionally include one of the input blocks 3500, 3550, and 3600 from Figures 35A, 35B, and 36, respectively. Sub-block 3810 includes registers 3802-0, 3802-1, ..., 3802-n and column sample and hold buffers 3803-0, 3803-1, ..., 3803-n, as well as the circuitry between them according to Figure 35A, 35B, or 36, as the case may be. Where sub-block 3810 includes input block 3500, registers 3802 include column registers 3502 and column sample and hold buffers 3803 include column sample and hold buffers 3504. Where sub-block 3810 includes input block 3550, registers 3802 include column registers 3552 and column sample and hold buffers 3803 include column sample and hold buffers 3554. Where sub-block 3810 includes input block 3600, registers 3802 include column registers 3602 and column sample and hold buffers 3803 include column sample and hold buffers 3604.

位址解碼器3804接收用於資料輸入加載操作的位址，以將資料載入暫存器3802或暫存器3801中。資料例如激勵資料或輸入資料，例如是在神經網路應用中將要分類或辨識的物件或影像。它輸出致能暫存器3801或暫存器3802的信號，該信號指示哪些暫存器在資料輸入加載操作中被確立(asserted)。資料輸入(未顯示)通常在8-256位元之間變化。 Address decoder 3804 receives an address for a data input loading operation to load data into register 3802 or register 3801. The data may be, for example, activation data or input data, such as objects or images to be classified or identified in a neural network application. It outputs a signal enabling register 3801 or register 3802, indicating which registers are asserted during the data input loading operation. Data input (not shown) typically varies between 8 and 256 bits.

位址解碼器3804還接收用於讀取驗證或程式化操作的位址，並向暫存器3801或暫存器3802輸出信號，該信號指示哪一列或哪些列被確立為讀取驗證或程式化操作。讀取驗證是用於權重調整的讀取操作，其中將胞元程式化為代表神經網路中目標權重的目標電流，然後驗證胞元電流以確保其在權重調整演算法期間接近目標電流。 Address decoder 3804 also receives an address for a read verify or programming operation and outputs to registers 3801 or registers 3802 a signal indicating which column or columns are asserted for the read verify or programming operation. Read verify is a read operation used for weight tuning: a cell is programmed to a target current that represents a target weight in the neural network, and the cell current is then verified during the weight tuning algorithm to ensure that it is close to the target current.

暫存器3802使用儲存在每個這樣的暫存器中的激勵資料來致能列取樣及保持緩衝器3803。在示例實施方式中，暫存器3802可能有1024列及1024個實例，其中8位元激勵資料儲存在每個暫存器3802中。 Registers 3802 use the activation data stored in each such register to enable column sample and hold buffers 3803. In an example implementation, there may be 1024 columns and 1024 instances of register 3802, with 8-bit activation data stored in each register 3802.

為了加載資料給暫存器3802，所需的時脈週期的數目R是R=列數x8(對於8位元激勵資料)並除以資料輸入寬度，例如16位元資料輸入(例如，R=1024*8/16=512)。 To load data into registers 3802, the required number of clock cycles R is R = (number of columns × 8) / (data input width), where 8 is the width of the 8-bit activation data. For example, with a 16-bit data input, R = 1024 × 8 / 16 = 512.
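
The cycle count above can be checked with a short helper. The function name `load_cycles` is hypothetical, and rounding up when the total bit count is not a multiple of the data input width is an assumption that the text does not address.

```python
def load_cycles(num_columns, activation_bits=8, data_input_width=16):
    """Clock cycles R needed to load every register sequentially.

    R = num_columns * activation_bits / data_input_width, rounded up
    (the rounding is an assumption for widths that do not divide evenly).
    """
    total_bits = num_columns * activation_bits
    return -(-total_bits // data_input_width)   # ceiling division


r = load_cycles(1024, activation_bits=8, data_input_width=16)
```

With the values from the example (1024 columns, 8-bit activations, 16-bit data input), the helper reproduces R = 512.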

暫存器3801包括耦合到每個暫存器3802並與其相關聯的暫存器。每個暫存器3801加載與其相關聯暫存器3802的激勵資料，這可以在R個時脈週期內順序執行。此後，在第一時脈週期期間，來自每個暫存器3801的資料平行地載入到其相關聯的暫存器3802中。因此，暫存器3802在單個時間週期內平行地，而不是在R個時脈週期內串列地，從各自的暫存器3801加載。這極大地加速了資料輸入加載操作的時序。 Registers 3801 include a register coupled to, and associated with, each register 3802. Each register 3801 is loaded with the activation data destined for its associated register 3802, which can be done sequentially over R clock cycles. Thereafter, during a first clock cycle, the data from each register 3801 is loaded in parallel into its associated register 3802. Registers 3802 are thus loaded from their respective registers 3801 in parallel within a single time period rather than serially over R clock cycles, which greatly speeds up the data input loading operation.

可選地，SRAM 3418可以用於在R時脈週期期間順序地加載所有暫存器3802作為後台操作。 Optionally, SRAM 3418 can be used to load all registers 3802 sequentially over the R clock cycles as a background operation.

可選地，SRAM 3418用於順序地向暫存器3801加載其資料。圖38B描繪可以使用圖38A的輸入區塊3800來執行的輸入方法3850。第一操作是回應位址通過複數個位址解碼器，輸出複數個列致能信號(3851)。下一個操作是回應複數個列致能信號藉由第一複數個暫存器，順序地儲存激勵資料(3852)。該儲存順序地可選地包括藉由第一複數個暫存器從靜態隨機存取記憶體接收激勵資料。下一個操作是藉由第二複數個暫存器，平行地儲存從第一複數個暫存器接收的激勵資料(3853)。下一個操作是回應從第二複數個暫存器接收到的激勵資料在讀取神經元操作期間藉由複數個列取樣及保持緩衝器，驅動非揮發性記憶體胞元陣列的多列(3854)。 Optionally, SRAM 3418 is used to load registers 3801 with their data sequentially. Figure 38B depicts an input method 3850 that can be performed using input block 3800 of Figure 38A. The first operation is outputting, by a plurality of address decoders, a plurality of column enable signals in response to an address (3851). The next operation is storing, sequentially, activation data by a first plurality of registers in response to the plurality of column enable signals (3852). The sequential storing optionally includes receiving, by the first plurality of registers, the activation data from a static random access memory. The next operation is storing, in parallel, by a second plurality of registers, the activation data received from the first plurality of registers (3853). The next operation is driving, by a plurality of column sample and hold buffers during a read neuron operation, multiple columns of a non-volatile memory cell array in response to the activation data received from the second plurality of registers (3854).
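
The four operations of input method 3850 can be mimicked with a behavioral sketch. Everything here is a simplification under stated assumptions: the class name `InputBlockSketch`, its method names, the `words_per_cycle` parameter, and the linear code-to-voltage mapping are all hypothetical, and analog behavior is not modeled.

```python
class InputBlockSketch:
    """Toy model of staging registers (3801) and active registers (3802)."""

    def __init__(self, num_columns):
        self.staging = [0] * num_columns   # first plurality of registers
        self.active = [0] * num_columns    # second plurality of registers

    def load_sequential(self, activations, words_per_cycle=2):
        """Operations 3851-3852: fill the staging registers a few per clock."""
        cycles = 0
        for start in range(0, len(activations), words_per_cycle):
            stop = min(start + words_per_cycle, len(self.staging))
            for col in range(start, stop):      # enables asserted by the decoders
                self.staging[col] = activations[col]
            cycles += 1
        return cycles

    def transfer_parallel(self):
        """Operation 3853: copy every staging register at once (one cycle)."""
        self.active = list(self.staging)
        return 1

    def drive_columns(self, full_scale=1.0, bits=8):
        """Operation 3854: map each active code to a drive level (assumed linear)."""
        levels = (1 << bits) - 1
        return [full_scale * code / levels for code in self.active]


block = InputBlockSketch(1024)
data = [i % 256 for i in range(1024)]
sequential_cycles = block.load_sequential(data, words_per_cycle=2)
parallel_cycles = block.transfer_parallel()
outputs = block.drive_columns()
```

The active registers keep driving the array with their previous contents while the staging registers are being filled, and only the one-cycle parallel transfer interrupts them.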

圖39A描繪輸入區塊3900。輸入區塊3900包括子區塊3910、VMM陣列3401及位址解碼器3904-0、3904-1、．．．、3904-n。子區塊3910可選擇地包括分別來自圖35A、35B及36的輸入區塊3500、3550及3600之一者。子區塊3910包括列暫存器3902-0、3902-1、．．．、3902-n以及列取樣及保持緩衝器3903-0、3903-1、．．．、3903-n以及根據圖35A、圖35B及36其間的電路系統，視情況而定。在子區塊3910包括輸入區塊3500的情況下，則列暫存器3902包括列暫存器3502並且列取樣及保持緩衝器3903包括列取樣及保持緩衝器3504。在子區塊3910包括輸入區塊3550的情況下，則列暫存器3902包括列暫存器3552，並且列取樣及保持緩衝器3903包括列取樣及保持緩衝器3554。在子區塊3910包括輸入區塊3600的情況下，則列暫存器3902包括列暫存器3602，並且列取樣及保持緩衝器3903包括列取樣及保持緩衝器3604。 Figure 39A depicts input block 3900. Input block 3900 includes sub-block 3910, VMM array 3401, and address decoders 3904-0, 3904-1, ..., 3904-n. Sub-block 3910 may optionally include one of the input blocks 3500, 3550, and 3600 from Figures 35A, 35B, and 36, respectively. Sub-block 3910 includes column registers 3902-0, 3902-1, ..., 3902-n and column sample and hold buffers 3903-0, 3903-1, ..., 3903-n, as well as the circuitry between them according to Figure 35A, 35B, or 36, as the case may be. Where sub-block 3910 includes input block 3500, column registers 3902 include column registers 3502 and column sample and hold buffers 3903 include column sample and hold buffers 3504. Where sub-block 3910 includes input block 3550, column registers 3902 include column registers 3552 and column sample and hold buffers 3903 include column sample and hold buffers 3554. Where sub-block 3910 includes input block 3600, column registers 3902 include column registers 3602 and column sample and hold buffers 3903 include column sample and hold buffers 3604.

位址解碼器3904接收用於資料輸入加載操作的位址，以將資料(未顯示)載入到列暫存器3902中。資料例如激勵資料或輸入資料，例如是在神經網路應用中將要分類或辨識的物件或影像。它輸出致能列暫存器3902的信號，該信號指示哪些暫存器在資料輸入加載操作中被確立。資料輸入(未顯示)通常在8-256位元之間變化。 Address decoder 3904 receives an address for a data input loading operation in order to load data (not shown) into column registers 3902. The data may be, for example, activation data or input data, such as an object or image to be classified or recognized in a neural network application. Address decoder 3904 outputs a signal that enables column registers 3902 and indicates which registers are asserted for the data input loading operation. The data input (not shown) is typically between 8 and 256 bits wide.

位址解碼器3904還可以接收用於讀取驗證或程式化操作的位址，並向列暫存器3902輸出信號，該信號指示哪一列或哪些列被確立為讀取驗證或程式化操作。在此示例中，每個列暫存器儲存激勵資料(例如，8位元激勵資料)以及一個標籤位元或多個標籤位元，例如一個用於列致能而另一個用於列DAC取樣。舉例來說，列暫存器3902-0包括標籤位元3905-0，列暫存器3902-1包括標籤位元3905-1，列暫存器3902-n包括標籤位元3905-n等等。標籤位元(列致能標籤位元)3905用於列致能以禁能儲存在列暫存器中的激勵輸入資料，無論該列是否被位址解碼器3904選擇。例如，假如列0的標籤位元3905-0具有特定值(例如「1」值)時，則列暫存器3902-0中的激勵資料被輸出。假如標籤位元3905-0具有不同的值(例如，「0」值)，則列暫存器3902-0中的激勵資料將不會被輸出，並且列S/H緩衝器3903-0將從列暫存器3902-0接收Z狀態。另一個標籤位元(列S/H標籤位元)用於列DAC取樣以致能或禁能將全域DAC值取樣到本地列S/H緩衝器3903中。 Address decoder 3904 can also receive an address for a read verify or programming operation and output to column registers 3902 a signal indicating which column or columns are asserted for the read verify or programming operation. In this example, each column register stores activation data (e.g., 8-bit activation data) and one or more tag bits, for example one for column enabling and another for column DAC sampling. For example, column register 3902-0 includes tag bit 3905-0, column register 3902-1 includes tag bit 3905-1, column register 3902-n includes tag bit 3905-n, and so on. Tag bit 3905 (the column enable tag bit) is used for column enabling, that is, to enable or disable output of the activation input data stored in the column register regardless of whether the column is selected by address decoder 3904. For example, if tag bit 3905-0 of column 0 has a particular value (e.g., a "1" value), the activation data in column register 3902-0 is output. If tag bit 3905-0 has a different value (e.g., a "0" value), the activation data in column register 3902-0 is not output, and column S/H buffer 3903-0 receives a Z state from column register 3902-0. The other tag bit (the column S/H tag bit) is used for column DAC sampling, to enable or disable sampling of the global DAC value into the local column S/H buffer 3903.
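
The effect of the two tag bits can be sketched in a few lines. The names `ColumnRegister`, `register_output`, and `update_sample_hold` are hypothetical, `None` stands in for the Z state, and the polarity ("1" enables) follows the example in the text.

```python
from dataclasses import dataclass


@dataclass
class ColumnRegister:
    activation: int        # e.g. 8-bit activation data
    enable_tag: int = 1    # column enable tag bit (such as 3905-x)
    sample_tag: int = 1    # column S/H tag bit for DAC sampling


def register_output(reg):
    """Return the activation if the enable tag is 1, else None (Z state)."""
    return reg.activation if reg.enable_tag == 1 else None


def update_sample_hold(held_value, reg, global_dac_value):
    """Sample the global DAC value only when the S/H tag bit allows it."""
    return global_dac_value if reg.sample_tag == 1 else held_value


enabled = ColumnRegister(activation=200)
disabled = ColumnRegister(activation=200, enable_tag=0, sample_tag=0)
outputs = [register_output(enabled), register_output(disabled)]
```

Keeping the tag bits in the same register as the activation data means a column can be masked without changing the address decoder selection.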

圖39B描繪輸入區塊3920。輸入區塊3920類似於輸入區塊3900，除了它包含第二組列暫存器(影子暫存器)3906-0、3906-1、．．．、3906-n，每個列暫存器包含一個各自的標籤位元3907-0、3907-1、．．．、3907-n。每列可以在列暫存器3902及列暫存器3906之間切換。例如，在一個操作期間，位址解碼器3904向列暫存器3902提供輸出，並且在另一操作期間，位址解碼器3904向列暫存器3906提供輸出。例如，在一個操作期間，假如標籤位元3905被致能，則列暫存器3902-0輸出資料，並且在另一操作期間，假如標籤位元3907被致能，則列暫存器3906輸出資料。這種切換可以通過多工器(未顯示)或其他控制邏輯來實現。以此方式，一組列暫存器3902或3906可加載有激勵資料，而另一組用於根據來自位址解碼器3904的信號主動輸出其激勵資料。 Figure 39B depicts input block 3920. Input block 3920 is similar to input block 3900, except that it contains a second set of column registers (shadow registers) 3906-0, 3906-1, ..., 3906-n, each column register containing a separate tag bit 3907-0, 3907-1, ..., 3907-n. Each column can be switched between column registers 3902 and 3906. For example, during one operation, address decoder 3904 provides output to column register 3902, and during another operation, address decoder 3904 provides output to column register 3906. For example, during one operation, if tag bit 3905 is enabled, column register 3902-0 outputs data, and during another operation, if tag bit 3907 is enabled, column register 3906 outputs data. This switching can be implemented using a multiplexer (not shown) or other control logic. In this way, one set of column registers 3902 or 3906 can be loaded with excitation data, while another set is used to actively output its excitation data based on signals from address decoder 3904.

圖39C描繪輸入區塊3940。輸入區塊3940類似於輸入區塊3900，除了它包含第二組列暫存器3908-0、3908-1、．．．、3908-n，每個列暫存器包含各自的標籤位元3909-0、3909-1、．．．、3909-n。每列可以在列暫存器3902及列暫存器3908之間切換。例如，在一個操作期間，位址解碼器3904向列暫存器3902提供信號，並且在另一操作期間，位址解碼器3904向列暫存器3908提供輸出。例如，在一個操作期間，假如標籤位元3905被致能，則列暫存器3902輸出資料，並且在另一操作期間，假如標籤位元3909被致能，則列暫存器3908輸出資料。這種切換可以通過多工器(未顯示)或其他控制邏輯來實現。以此方式，一組列暫存器3902或3908可以加載有激勵資料，而另一組列暫存器用於根據來自位址解碼器3904的信號主動地輸出其激勵資料。 Figure 39C depicts input block 3940. Input block 3940 is similar to input block 3900, except that it contains a second set of column registers 3908-0, 3908-1, ..., 3908-n, each column register containing its own tag bits 3909-0, 3909-1, ..., 3909-n. Each column can be switched between column registers 3902 and 3908. For example, during one operation, address decoder 3904 provides a signal to column register 3902, and during another operation, address decoder 3904 provides an output to column register 3908. For example, during one operation, if tag bit 3905 is enabled, column register 3902 outputs data, and during another operation, if tag bit 3909 is enabled, column register 3908 outputs data. This switching can be implemented using a multiplexer (not shown) or other control logic. In this way, one set of column registers 3902 or 3908 can be loaded with excitation data, while another set of column registers is used to actively output its excitation data based on signals from address decoder 3904.

圖39D描繪輸入區塊3960。輸入區塊3960類似於輸入區塊3940，除了它包含用於相同陣列輸入(例如CGINx)的第二組列取樣及保持緩衝器3911-0、3911-1、．．．、3911-n。每列可以在列暫存器3902及列暫存器3908之間切換。例如，在一個操作期間，位址解碼器3904向列暫存器3902提供信號，並且在另一操作期間，位址解碼器3904向列暫存器3908提供輸出。類似地，在一個操作期間，列暫存器3902根據標籤位元3905輸出資料，並且在另一操作期間，列暫存器3908根據標籤位元3909輸出資料。每列可以在列S/H緩衝器3903及列S/H緩衝器3911之間切換(使用控制信號，未顯示)。這種切換可以通過多工器(未顯示)或其他控制邏輯來實現。以這種方式，一組列暫存器3902或3908可以加載有激勵資料，而另一組用於根據來自位址解碼器3904的信號主動地輸出其激勵資料。 Figure 39D depicts input block 3960. Input block 3960 is similar to input block 3940, except that it includes a second set of column sample and hold buffers 3911-0, 3911-1, ..., 3911-n for the same array inputs (e.g., CGINx). Each column can be switched between column registers 3902 and 3908. For example, during one operation, address decoder 3904 provides a signal to column register 3902, and during another operation, address decoder 3904 provides an output to column register 3908. Similarly, during one operation, column register 3902 outputs data according to tag bit 3905, and during another operation, column register 3908 outputs data according to tag bit 3909. Each column can be switched between column S/H buffer 3903 and column S/H buffer 3911 (using control signals, not shown). This switching can be implemented via a multiplexer (not shown) or other control logic. In this way, one set of column registers 3902 or 3908 can be loaded with excitation data, while another set is used to actively output its excitation data according to signals from address decoder 3904.

在一個示例中，第一激勵資料及第一標籤位元被載入到列暫存器3902中，並且第二激勵資料及第二標籤位元被載入到列暫存器3908中。第一激勵資料及第二激勵資料可以相同或不同，並且第一標籤位元及第二標籤位元可以相同或不同。 In one example, the first stimulus data and the first tag bit are loaded into column register 3902, and the second stimulus data and the second tag bit are loaded into column register 3908. The first stimulus data and the second stimulus data may be the same or different, and the first tag bit and the second tag bit may be the same or different.

圖40A描繪輸出區塊4000。輸出區塊4000通常從VMM陣列3401的位元線或源極線接收來自VMM陣列3401(未顯示)的輸出電流。輸出區塊4000包括電流至電壓轉換器4001、類比至數位轉換器4002、輸出暫存器4003以及輸出暫存器4004。電流至電壓轉換器4001將從VMM陣列3401接收的電流轉換成各自的電壓，其值反映從VMM陣列3401接收的電流的值。類比至數位轉換器4002將那些各自的電壓轉換成位元，該位元表示從各自的電流至電壓轉換器4001接收的電壓的值，因此其反映從VMM陣列3401接收的電流值。然後將這些位元儲存在輸出暫存器4003中或輸出暫存器4004。輸出操作可以在輸出暫存器4003及輸出暫存器4004之間切換。例如，在第一時間週期(例如，一個或多個時脈週期)期間的第一操作期間，輸出資料被載入到輸出暫存器4003中。在第一時間週期之後的第二時間週期(例如，一個或多個時脈週期)期間的第二操作期間，資料由系統中的另一裝置從輸出暫存器4003讀出，同時將新輸出資料載入到輸出暫存器4004中。在第二時間週期之後的第三時間週期(例如，一個或多個時脈週期)期間的第三操作期間，新輸出資料由系統中的另一裝置從輸出暫存器4004讀出，並且可選地，輸出資料可以被載入到輸出暫存器4003並且重複該序列。這減少了與輸出操作相關聯的延遲量，因為資料可以由外部裝置從第一輸出暫存器讀取，同時其他資料被載入到其它輸出暫存器中。可選地，輸出區塊4000可選地包括行標籤位元4005，用以致能電流至電壓轉換器4001及類比至數位轉換器4002。行標籤位元4005可以包括用於VMM陣列3401中的每一行的行標籤位元。行標籤位元4005加載類似於上面關於圖39A至39D討論的列標籤位元加載。行標籤位元的功能類似於上面關於圖39A至39D討論的列標籤位元的功能。例如，電流至電壓轉換器4001及類比至數位轉換器4002可以配置為當那行的行標籤位元4005具有第一值時輸出該行的資料，並且當行標籤位元具有第二值時不輸出資料。 Figure 40A depicts output block 4000. Output block 4000 receives output currents from VMM array 3401 (not shown), typically from the bit lines or source lines of VMM array 3401. Output block 4000 includes current-to-voltage converter 4001, analog-to-digital converter 4002, output register 4003, and output register 4004. Current-to-voltage converter 4001 converts the currents received from VMM array 3401 into respective voltages whose values reflect the values of those currents. Analog-to-digital converter 4002 converts those respective voltages into bits that represent the values of the voltages received from the respective current-to-voltage converter 4001 and therefore reflect the current values received from VMM array 3401. These bits are then stored in output register 4003 or output register 4004. Output operations can alternate between output register 4003 and output register 4004. For example, in a first operation during a first time period (e.g., one or more clock cycles), output data is loaded into output register 4003. 
During a second operation period following the first time period (e.g., one or more clock cycles), data is read from output register 4003 by another device in the system, while new output data is loaded into output register 4004. During a third operation period following the second time period (e.g., one or more clock cycles), new output data is read from output register 4004 by another device in the system, and optionally, output data can be loaded into output register 4003 and the sequence can be repeated. This reduces latency associated with output operations because data can be read from the first output register by an external device while other data is loaded into other output registers. Optionally, output block 4000 may include row label bits 4005 to enable current to voltage converter 4001 and analog-to-digital converter 4002. Row label bits 4005 may include row label bits for each row in the VMM array 3401. Loading of row label bits 4005 is similar to the loading of column label bits discussed above with respect to Figures 39A to 39D. The function of the row label bits is similar to the function of the column label bits discussed above with respect to Figures 39A to 39D. For example, the current-to-voltage converter 4001 and the analog-to-digital converter 4002 can be configured to output data for a row when the row label bit 4005 has a first value, and not output data when the row label bit has a second value.
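
The alternating use of the two output registers can be sketched as a simple schedule. The function name `ping_pong_schedule` and the assumption that loading and reading each take exactly one time period are hypothetical simplifications.

```python
def ping_pong_schedule(results):
    """Alternate two output registers so reading overlaps with loading.

    Returns (values_read_in_order, number_of_time_periods). Index 0 stands
    for output register 4003 and index 1 for output register 4004.
    """
    registers = [None, None]
    read_out = []
    for period, value in enumerate(results):
        write_idx = period % 2
        read_idx = 1 - write_idx
        if registers[read_idx] is not None:
            read_out.append(registers[read_idx])   # external device reads
        registers[write_idx] = value               # new output data is loaded
    if results:
        read_out.append(registers[(len(results) - 1) % 2])   # final read
    periods = len(results) + 1 if results else 0
    return read_out, periods


values, periods = ping_pong_schedule([10, 20, 30])
```

Under these assumptions, k results take k + 1 time periods with two registers, compared with 2k periods if each result had to be loaded and then read out of a single register before the next load.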

圖40B描繪輸出區塊4020。輸出區塊4020與輸出區塊4000相同，只是添加累加器4021。累加器4021可以將在一定時間段內從電流至電壓轉換器4001、類比至數位轉換器4002、輸出暫存器4003及輸出暫存器4004接收的值進行加總。例如，假如以時間多工方式對VMM陣列3401執行神經讀取操作，例如通過在第一時間週期期間讀取一半行並且在第二時間週期期間讀取另一半行，則這可能是有用的。來自第一時間週期的輸出可以由輸出暫存器4003接收，並且來自第二時間週期的輸出可以由輸出暫存器4004接收，並且累加器可以將從輸出暫存器4003及輸出暫存器4004接收的值加總。 Figure 40B depicts output block 4020. Output block 4020 is the same as output block 4000 except that accumulator 4021 is added. Accumulator 4021 can sum the values received over a period of time from current-to-voltage converter 4001, analog-to-digital converter 4002, output register 4003, and output register 4004. This can be useful, for example, when a neural read operation is performed on VMM array 3401 in a time-multiplexed manner, such as by reading half of the rows during a first time period and the other half of the rows during a second time period. The output from the first time period can be received by output register 4003, the output from the second time period can be received by output register 4004, and the accumulator can sum the values received from output register 4003 and output register 4004.

輸出區塊4000可選地包括行標籤位元4005以致能電流至電壓轉換器4001及類比至數位轉換器4002。行標籤位元4005可以包括用於VMM陣列3401中的每一行的行標籤位元。行標籤位元4005加載類似於上述關於圖39A至39D的列標籤位元加載。行標籤位元的功能類似於上述關於圖39A至39D的列標籤位元的功能。例如，電流至電壓轉換器4001及類比至數位轉換器4002可以配置為當該行的行標籤位元4005具有第一值時輸出該行的資料，並且當該行的行標籤位元具有第二值時不輸出資料。 Output block 4000 may optionally include row label bits 4005 to enable current-to-voltage converter 4001 and analog-to-digital converter 4002. Row label bits 4005 may include row label bits for each row in the VMM array 3401. Loading of row label bits 4005 is similar to the loading of column label bits described above with respect to Figures 39A-39D. The function of the row label bits is similar to the function of the column label bits described above with respect to Figures 39A-39D. For example, current-to-voltage converter 4001 and analog-to-digital converter 4002 may be configured to output data for that row when the row label bit 4005 of that row has a first value, and not output data when the row label bit of that row has a second value.

圖40C提供了輸出累加器4021的示例電路。輸出累加器從電流至電壓轉換器4001、類比至數位轉換器4002、輸出暫存器4003及輸出暫存器4004接收資料。資料由移位器4042接收，其回應EN_SHIFT執行移位功能。移位器4042的輸出D1提供給加法器4043，其將D1與D2相加，其接收EN_ADD並由EN_ADD致能。加法器4043的輸出提供給累加器暫存器4044，其儲存加法器4043的輸出並將其作為D2提供回加法器4043以用於下一個加法操作。以這種方式，可以在一時間週期內將電流至電壓轉換器 4001、類比至數位轉換器4002、輸出暫存器4003及輸出暫存器4004的輸出相加。 Figure 40C provides an example circuit for output accumulator 4021. The output accumulator receives data from current-to-voltage converter 4001, analog-to-digital converter 4002, output register 4003, and output register 4004. The data is received by shifter 4042, which performs a shift function in response to EN_SHIFT. The output D1 of shifter 4042 is provided to adder 4043, which receives EN_ADD, is enabled by EN_ADD, and adds D1 to D2. The output of adder 4043 is provided to accumulator register 4044, which stores it and feeds it back to adder 4043 as D2 for the next addition. In this way, the outputs of current-to-voltage converter 4001, analog-to-digital converter 4002, output register 4003, and output register 4004 can be summed over a period of time.

例如，移位器4042在串列輸入(DAC)模式期間使用，其中一次讀取激勵輸入的一位元，並且其中輸出位元的移位量取決於輸入位元的二進位位置。例如，輸入位元的LSB(最低有效位元)導致輸出不移位，(LSB+1)輸入位元導致1位元左移，(LSB+2)輸入位元導致輸出2位元左移等等，並且對於8位元激勵輸入，讀取操作執行8次。來自累加器暫存器4044的最終輸出是整個8-b激勵輸入的結果。 For example, shifter 4042 is used during serial input (DAC) mode, where one bit of the excitation input is read at a time, and the shift amount of the output bit depends on the binary position of the input bit. For example, the LSB (least significant bit) of the input bit results in no output shift, (LSB+1) of the input bit results in a 1-bit left shift, (LSB+2) of the input bit results in a 2-bit left shift, and so on. For an 8-bit excitation input, the read operation is performed 8 times. The final output from accumulator register 4044 is the result of the entire 8-bit excitation input.
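
The shift-and-add accumulation for serial input mode can be sketched as follows. The function name `bit_serial_accumulate` is hypothetical, and the per-bit array read is replaced by an ideal integer sum of weights, which stands in for the digitized output and ignores analog non-idealities.

```python
def bit_serial_accumulate(activations, weights, bits=8):
    """Accumulate one bit plane of the activations per read, LSB first.

    Assumption: each read returns the exact integer sum of the weights whose
    activation bit is 1 (an idealized stand-in for the ADC output).
    """
    accumulator = 0                                  # accumulator register 4044
    for position in range(bits):                     # one read per input bit
        bit_plane = [(a >> position) & 1 for a in activations]
        read_value = sum(b * w for b, w in zip(bit_plane, weights))
        d1 = read_value << position                  # shifter 4042
        accumulator = accumulator + d1               # adder 4043 (D2 = old sum)
    return accumulator


result = bit_serial_accumulate([3, 200], [2, 5])
```

Since each read is weighted by its bit position before being added, the final accumulator value equals the dot product of the full multi-bit activations and the weights, here 3 × 2 + 200 × 5.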

圖41描繪第一階段4101及第二階段4102的波形4100，其中第一階段4101列暫存器加載激勵資料，第二階段4102使用激勵資料來執行神經讀取操作。 Figure 41 depicts waveforms 4100 for a first stage 4101 and a second stage 4102. In the first stage 4101, the column registers are loaded with activation data; in the second stage 4102, the activation data is used to perform a neural read operation.

圖42描繪隨機存取讀取操作4201的波形4200。 Figure 42 depicts the waveform 4200 of the random access read operation 4201.

圖43描繪突發讀取操作4301的波形4300。 Figure 43 depicts the waveform 4300 of the burst read operation 4301.

圖44描繪神經讀取操作4401的波形4400。 Figure 44 depicts the waveform 4400 of the neural readout operation 4401.

圖45描繪神經讀取操作4500。神經讀取操作4500開始(4501)。激勵資料載入到列暫存器(4502)中。接著致能一組N列(4503)。接下來，輸入行位址(4504)。執行讀取操作(4505)，其涉及如圖37A及37B中所示的DAC取樣(其利用圖35A-B、36、38及39A-39D中所示的電路)以及如圖34所示的(位元線)輸出電路3407(其使用ITV將電流轉換為電壓，並使用ADC將電壓轉換為數位輸出)，其中輸出資料是來自ADC的數位輸出。資料被載入到輸出暫存器(4506)中。系統判定是否需要讀取另一個行位址。假如是，則返回到操作4504。假如否，則系統判定是否需要讀取另一組N列(4508)。假如是，則返回到操作4503。假如否，則完成神經讀取操作(4509)，在此時資料輸出可選地被移出(4510)或者神經讀取操作結束。每個神經讀取操作，對於具有行切換的一組列，神經讀取時間為1列DAC延遲+具有N行多工的N的ITV+ADC延遲。例如，假如DAC延遲為2μs，ITV+ADC延遲為1μs，則讀取整列的時間為1x(DAC延遲)+16x(ITV+ADC延遲)=18μs。基本上，對於下一行神經讀取，DAC延遲不會貢獻任何額外時間。 Figure 45 depicts neural read operation 4500. Neural read operation 4500 begins (4501). Activation data is loaded into the column registers (4502). A group of N columns is then enabled (4503). Next, a row address is input (4504). A read operation is performed (4505), which involves DAC sampling as shown in Figures 37A and 37B (using the circuits shown in Figures 35A-B, 36, 38, and 39A-39D) and the (bit line) output circuit 3407 shown in Figure 34 (which uses an ITV to convert current into voltage and an ADC to convert the voltage into a digital output), where the output data is the digital output of the ADC. The data is loaded into the output register (4506). The system determines whether another row address needs to be read. If so, it returns to operation 4504. If not, the system determines whether another group of N columns needs to be read (4508). If so, it returns to operation 4503. If not, the neural read operation is complete (4509), at which point the output data is optionally shifted out (4510) or the neural read operation ends. For each neural read operation on a group of columns with row switching, the neural read time is 1 × (column DAC delay) + N × (ITV + ADC delay) for N-way row multiplexing. For example, with a DAC delay of 2 μs, an ITV + ADC delay of 1 μs, and N = 16, the time to read the entire column is 1 × (DAC delay) + 16 × (ITV + ADC delay) = 18 μs. In effect, the DAC delay contributes no additional time to the neural read of each subsequent row.
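
The read-time estimate can be reproduced with a short calculation. The function names are hypothetical, and the comparison case, in which the DAC delay is paid again on every multiplexed read, is an assumed baseline used only to show what the single DAC sampling saves.

```python
def neural_read_time_us(dac_delay_us, itv_adc_delay_us, mux_steps):
    """One DAC sampling, then one ITV + ADC conversion per multiplexed read."""
    return dac_delay_us + mux_steps * itv_adc_delay_us


def resampled_read_time_us(dac_delay_us, itv_adc_delay_us, mux_steps):
    """Assumed baseline: the DAC delay is paid again on every read."""
    return mux_steps * (dac_delay_us + itv_adc_delay_us)


shared_dac = neural_read_time_us(2, 1, 16)
resampled = resampled_read_time_us(2, 1, 16)
```

With a 2 μs DAC delay, a 1 μs ITV + ADC delay, and 16 multiplexed reads, the first function gives the 18 μs figure from the example.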

圖46描繪了神經讀取操作4600。神經讀取操作4600開始(4601)。激勵資料被載入到第一組列暫存器中(4603)。然後激勵資料被載入到第二組列暫存器中(4602)。與該事件同時，行位址或列組改變(4604)。執行讀取操作(4605)。輸出資料載入到輸出暫存器(4606)中。假如讀取操作未完成(4607)，則過程返回到操作4604。假如讀取操作完成，則系統(使用邏輯控制器，未顯示)判定是否需要將資料從第二組列暫存器載入到第一組列暫存器(4608)。假如否，則完成神經讀取操作(4611)。假如是，則將資料從第二組列暫存器載入到第一組列暫存器中(4609)。然後系統判定神經讀取操作是否完成(4610)。假如是，則完成神經讀取操作(4611)。假如不是，則返回到操作4604。 Figure 46 illustrates neural read operation 4600. Neural read operation 4600 begins (4601). Activation data is loaded into the first set of column registers (4603). Then, activation data is loaded into the second set of column registers (4602). Simultaneously, the row address or column group changes (4604). The read operation is performed (4605). Output data is loaded into the output register (4606). If the read operation is not completed (4607), the process returns to operation 4604. If the read operation is completed, the system (using a logic controller, not shown) determines whether data needs to be loaded from the second set of column registers into the first set of column registers (4608). If not, the neural read operation is completed (4611). If yes, the data is loaded from the second set of column registers into the first set of column registers (4609). Then the system determines whether the neural read operation is complete (4610). If yes, the neural read operation is completed (4611). If not, it returns to operation 4604.

Figure 47 depicts neural read operation 4700. First, activation data is loaded into the first set of row registers (4701). Activation data is then loaded into the second set of row registers (4702). Concurrently with that event, the column address or row group changes (4703). A read operation is performed (4704). The data is loaded into the output register (4705). If the read operation is complete (4706), the system proceeds to operation 4707. If not, the system returns to operation 4703. In operation 4707, the system determines whether the second set of row registers and its corresponding row sample-and-hold (S/H) buffers have been enabled. If so, the operation is complete (4708). If not, the second set of row registers and its corresponding row S/H buffers are enabled, and the system returns to operation 4703 to continue the neural read operation.
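Unlike a scheme that copies data between register sets, operation 4700 switches which register set (and its row S/H buffers) is enabled. The sketch below models that selection with an index into two banks. The names `neural_read_4700`, `read_fn`, and `COLUMN_WEIGHTS` are illustrative assumptions, and the analog read is replaced by a plain dot product.

```python
from typing import Callable, List, Sequence


def neural_read_4700(
    first_bank: Sequence[int],
    second_bank: Sequence[int],
    num_column_addresses: int,
    read_fn: Callable[[Sequence[int], int], int],
) -> List[int]:
    """Behavioral model of operation 4700: read with bank 1, then enable bank 2."""
    banks = [first_bank, second_bank]      # 4701, 4702: both banks preloaded
    active = 0                             # first bank and its S/H buffers drive the rows
    second_enabled = False
    outputs: List[int] = []
    while True:
        for column in range(num_column_addresses):         # 4703: change column address
            outputs.append(read_fn(banks[active], column)) # 4704, 4705
        if second_enabled:                 # 4707: second bank already enabled?
            return outputs                 # 4708: operation complete
        second_enabled = True              # enable second bank and its S/H buffers
        active = 1                         # return to 4703 with the second bank driving


# Hypothetical stand-in for the array: COLUMN_WEIGHTS[c] holds the weights on column c.
COLUMN_WEIGHTS = [[1, 0], [0, 1]]


def column_read(activations: Sequence[int], column: int) -> int:
    return sum(a * w for a, w in zip(activations, COLUMN_WEIGHTS[column]))


print(neural_read_4700([1, 2], [3, 4], 2, column_read))  # [1, 2, 3, 4]
```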

Figure 48 depicts readout operation 4800. Digital output data has previously been loaded into output register 1 or output register 2. The output data is then shifted out of output register 1 or output register 2 (4801).

Figure 49 depicts neural read operation 4900. First, activation data is loaded into the row registers (4901). Data is then shifted out of output register 1 or output register 2 (4902). Concurrently with that event, the column address or row group changes (4903). A neural read operation is performed (4904). The data is loaded into output register 1 or output register 2 (4905). If the neural read operation is complete (4906), the operation is complete (4907). If not, the system returns to operation 4903 to continue the neural read operation.
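Operation 4900 overlaps shifting out one output register with filling the other. A minimal ping-pong sketch is given below. It assumes the two output registers strictly alternate, which the text does not state explicitly, and it uses a hypothetical `read_fn(i)` to stand for the result of the i-th neural read.

```python
from typing import Callable, List, Optional


def neural_read_4900(num_reads: int, read_fn: Callable[[int], int]) -> List[int]:
    """Behavioral model of operation 4900 with two alternating output registers."""
    output_regs: List[Optional[int]] = [None, None]   # output register 1 and 2
    shifted_out: List[int] = []
    for i in range(num_reads):
        fill = i % 2            # register receiving the new result
        drain = 1 - fill        # register being shifted out in the meantime
        if output_regs[drain] is not None:            # 4902: shift out previous result
            shifted_out.append(output_regs[drain])
            output_regs[drain] = None
        output_regs[fill] = read_fn(i)                # 4903 to 4905: read and store
    # After the last read, one result is still held; shift it out (as in operation 4800).
    for value in output_regs:
        if value is not None:
            shifted_out.append(value)
    return shifted_out


print(neural_read_4900(4, lambda i: i * 10))  # [0, 10, 20, 30]
```

In hardware the shift-out and the next read proceed at the same time; the sequential loop above only preserves their ordering, not their overlap in time.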

It should be noted that, as used herein, the terms "over" and "on" both inclusively include "directly on" (with no intermediate materials, elements, or space disposed therebetween) and "indirectly on" (with intermediate materials, elements, or space disposed therebetween). Likewise, the term "adjacent" includes "directly adjacent" (with no intermediate materials, elements, or space disposed therebetween) and "indirectly adjacent" (with intermediate materials, elements, or space disposed therebetween); "mounted to" includes "directly mounted to" (with no intermediate materials, elements, or space disposed therebetween) and "indirectly mounted to" (with intermediate materials, elements, or space disposed therebetween); and "electrically coupled" includes "directly electrically coupled to" (with no intermediate materials or elements therebetween that electrically connect the elements together) and "indirectly electrically coupled to" (with intermediate materials or elements therebetween that electrically connect the elements together). For example, forming an element "over a substrate" can include forming the element directly on the substrate with no intermediate materials/elements therebetween, as well as forming the element indirectly on the substrate with one or more intermediate materials/elements therebetween.

C1: layer; C2: layer; C3: layer; CB1: synapse; CB2: synapse; CB3: synapse; CB4: synapse; P1: activation function; P2: activation function; S1: layer; S2: layer

Claims

An input circuit for an artificial neural network array, comprising: a plurality of address decoders to receive an address and output a plurality of row enable signals in response to the address; a first plurality of registers, electrically connected to the plurality of address decoders, to store activation data sequentially in response to the plurality of row enable signals; and a second plurality of registers, electrically connected to the first plurality of registers, to store in parallel the activation data received from the first plurality of registers.

The input circuit of claim 1, comprising: a plurality of row sample-and-hold buffers to drive respective rows of a non-volatile memory cell array during a neural read operation in response to the activation data received from the second plurality of registers.

The input circuit of claim 1, comprising: a static random access memory to provide the activation data to be stored in the first plurality of registers.

A method of operating an input circuit, comprising: outputting, by a plurality of address decoders, a plurality of row enable signals in response to an address; sequentially storing, by a first plurality of registers, activation data in response to the plurality of row enable signals; and storing in parallel, by a second plurality of registers, the activation data received from the first plurality of registers.

The method of claim 4, comprising: driving, by a plurality of row sample-and-hold buffers, rows of a non-volatile memory cell array during a neural read operation in response to the activation data received from the second plurality of registers.

The method of claim 4, wherein the sequentially storing comprises receiving, by the first plurality of registers, the activation data from a static random access memory.

An input circuit for an artificial neural network array, comprising: a plurality of address decoders to receive an address and output a plurality of row enable signals in response to the address; and a first plurality of registers, electrically connected to the plurality of address decoders, to store first activation data and at least one first tag bit in response to the plurality of row enable signals; wherein, when the at least one first tag bit has a first value, the first plurality of registers respectively output the first activation data, and when the at least one first tag bit has a second value, the first plurality of registers do not output the first activation data.

The input circuit of claim 7, comprising: a plurality of row sample-and-hold buffers to drive rows of a non-volatile memory cell array in response to the first activation data output by the first plurality of registers.

The input circuit of claim 7, comprising: a second plurality of registers, electrically connected to the plurality of address decoders, to store second activation data and at least one second tag bit in response to the plurality of row enable signals; wherein, when the at least one second tag bit has a first value, the second plurality of registers respectively output the second activation data, and when the at least one second tag bit has a second value, the second plurality of registers do not output the second activation data.

The input circuit of claim 9, comprising: a plurality of row sample-and-hold buffers to drive rows of a non-volatile memory cell array in response to the first activation data from the first plurality of registers or the second activation data from the second plurality of registers.

A method of operating an input circuit, comprising: outputting, by a plurality of address decoders, a plurality of row enable signals in response to an address; storing, by a first plurality of registers, activation data and a first set of tag bits in response to the plurality of row enable signals; outputting, by the first plurality of registers, the activation data when the first set of tag bits respectively contain a first value; and not outputting, by the first plurality of registers, the activation data when the first set of tag bits respectively contain a second value.

The method of claim 11, comprising: driving, by a plurality of row sample-and-hold buffers, rows of a non-volatile memory cell array in response to the activation data output from the first plurality of registers.

The method of claim 11, comprising: storing second activation data and a second set of tag bits in response to the plurality of row enable signals using a first plurality of registers; outputting, by a second plurality of registers, the second activation data when the second set of tag bits contain a first value; and not outputting, by the second plurality of registers, the second activation data when the second set of tag bits contain a second value.

The method of claim 13, comprising: driving, by a plurality of row sample-and-hold buffers, rows of a non-volatile memory cell array in response to the second activation data output from the second plurality of registers.

An input circuit for an artificial neural network array, comprising: a plurality of address decoders to receive an address and output a plurality of row enable signals in response to the address; a first plurality of registers, electrically connected to the plurality of address decoders, to store first activation data and a first tag bit in response to the plurality of row enable signals, wherein when a stored first tag bit has a first value, the first plurality of registers respectively output the first activation data, and when the stored first tag bit has a second value, the first plurality of registers do not output the first activation data; and a second plurality of registers, electrically connected to the plurality of address decoders, to store second activation data and a second tag bit in response to the plurality of row enable signals, wherein when a stored second tag bit has a first value, the second plurality of registers respectively output the second activation data, and when the stored second tag bit has a second value, the second plurality of registers do not output the second activation data.

The input circuit of claim 15, comprising: a first plurality of row sample-and-hold buffers to drive rows of a non-volatile memory cell array in response to the first activation data output from the first plurality of registers; and a second plurality of row sample-and-hold buffers to drive rows of the non-volatile memory cell array in response to the second activation data output from the second plurality of registers.

A method of operating an input circuit, comprising: receiving an address and outputting, by a plurality of address decoders, a plurality of row enable signals in response to the address; storing, by a first plurality of registers, first activation data and a first set of tag bits in response to the plurality of row enable signals; storing second activation data and a second set of tag bits in response to the plurality of row enable signals using the first plurality of registers; outputting, by the first plurality of registers, the first activation data in response to the first set of tag bits; and outputting, by the second plurality of registers, the second activation data in response to the second set of tag bits.

The method of claim 17, comprising: driving, by a first plurality of row sample-and-hold buffers, rows of a non-volatile memory cell array in response to the first activation data from the first plurality of registers; and driving, by a second plurality of row sample-and-hold buffers, rows of the non-volatile memory cell array in response to the second activation data from the second plurality of registers.