TWI812391B - Computing-in-memory circuitry - Google Patents
- Publication number
- TWI812391B (application number TW111129843A)
- Authority
- TW
- Taiwan
- Prior art keywords
- output capacitor
- bit line
- pairs
- output
- digital
Landscapes
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Information Transfer Systems (AREA)
- Static Random-Access Memory (AREA)
- Analogue/Digital Conversion (AREA)
Description
The present invention relates to an in-memory computing circuit, and in particular to an in-memory computing circuit that uses static random access memory (SRAM).
Artificial intelligence (AI) and machine learning (ML) technologies continue to advance, and neural-network-based machine learning architectures now achieve excellent accuracy in applications such as speech and image recognition. Compared with traditional cloud computing, edge computing offers lower computing latency and better performance. Because data do not need to be uploaded to the cloud, third parties cannot steal them in transit. This improves data security and reduces the device's dependence on the network.
However, edge computing is constrained by the energy and computing resources of terminal devices, which makes it extremely challenging to run machine learning architectures on them. Computing-in-memory (CIM) memory architectures have emerged in response to edge AI applications. They compute directly inside the memory and so avoid moving large amounts of data. As a result, this architecture breaks through the memory bottleneck of the traditional von Neumann architecture and enables multiplication and addition to run in parallel, which greatly improves overall computing performance. However, CIM memory requires additional data conversion interfaces, including digital-to-analog and analog-to-digital converters. The performance of these analog components affects the throughput, energy consumption, and area efficiency of the whole circuit. This limits the performance of CIM memory and, in turn, the range of applications for this architecture.
It should be noted that the "Prior Art" section is provided to aid understanding of the present invention. Some (or all) of the content disclosed in the "Prior Art" section may not be conventional technology known to those of ordinary skill in the art. Its disclosure there does not mean that the content was known to those of ordinary skill in the art before the filing of the present application.
The invention provides an in-memory computing circuit that includes a plurality of digital-to-analog converters, a plurality of computing arrays, and a plurality of charge processing networks.

- Each digital-to-analog converter converts external data into input data and is connected in series with a corresponding plurality of output capacitor pairs.
- The computing arrays receive the input data at both ends (terminals) and perform operations to output a first operation value.
- The charge processing networks receive and accumulate the first operation value within a predetermined time interval through switch pairs connected in series with the output capacitor pairs.
- The charge processing networks distribute the charge of the first operation value evenly onto the selected output capacitor pairs. They then compare the voltage differences across the output capacitor pairs to output a second operation value.
The present invention provides a computing-in-memory static random access memory (CIM SRAM) circuit architecture for edge AI devices, with high throughput and high energy and area efficiency. The improved data processing and conversion circuits overcome the current performance limits of CIM SRAM. They also mitigate its extra energy consumption and limited computational linearity, which raises the computing speed, energy efficiency, and linearity of the memory as a whole. In addition, the invention provides a unified charge processing network (UCPN) that performs both signal processing and data conversion. The UCPN improves energy efficiency and circuit performance, as well as chip-area efficiency in the physical design.
To make the above features and advantages of the present invention clearer, embodiments are described in detail below with reference to the accompanying drawings.
Features of the inventive concept, and methods of implementing them, may be more readily understood by referring to the following detailed description of embodiments and the accompanying drawings. The embodiments are described below with reference to the drawings, in which like reference numerals refer to like elements throughout. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided as examples so that this disclosure will be thorough and complete and will fully convey the aspects and features of the invention to those skilled in the art. Processes, elements, and techniques that a person of ordinary skill in the art does not need in order to fully understand the invention may therefore not be described. Unless otherwise indicated, the same reference numerals denote the same elements throughout the drawings and the written description, and their descriptions are not repeated. In the drawings, the relative sizes of elements, layers, and regions may be exaggerated for clarity.
In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the various embodiments. It will be apparent, however, that the various embodiments may be practiced without these specific details or with one or more equivalent arrangements. In other instances, well-known structures and devices are shown in block diagram form to avoid unnecessarily obscuring the various embodiments.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises/comprising", "has/having", and "includes/including", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components. They do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As used herein, the terms "substantially", "about", "approximately", and similar terms are terms of approximation, not terms of degree. They are intended to account for the inherent deviations in measured or calculated values that one of ordinary skill in the art would recognize. "About" or "approximately" includes the stated value and means within an acceptable range of deviation from that value, as determined by one of ordinary skill in the art. That determination takes into account the measurement in question and the errors associated with measuring the particular quantity (i.e., the limitations of the measurement system). For example, "about" may mean within one or more standard deviations, or within ±30%, 20%, 10%, or 5% of the stated value. In addition, "may" when used to describe embodiments of the invention refers to "one or more embodiments of the invention".
When an embodiment can be implemented in different ways, a specific processing order may differ from the order described. For example, two consecutively described circuits or elements may operate substantially simultaneously or in the reverse of the described order.
Various embodiments are described herein with reference to cross-sectional illustrations that are schematic illustrations of embodiments and/or intermediate structures. Variations from the illustrated shapes, for example as a result of manufacturing techniques and/or tolerances, are therefore to be expected. Furthermore, the specific structural or functional descriptions disclosed herein are merely illustrative, for the purpose of describing embodiments according to the concepts of this disclosure. The embodiments disclosed herein should therefore not be construed as limited to the particular illustrated shapes of regions, but should include deviations in shape that result, for example, from manufacturing.
The electronic or electric devices according to the embodiments of the invention described herein, and/or any other relevant devices or components, may be implemented using any suitable hardware, firmware (e.g., an application-specific integrated circuit), software, or a combination of software, firmware, and hardware. For example, the various components of these devices may be formed on one integrated circuit (IC) chip or on separate IC chips. They may also be implemented on a flexible printed circuit film, a tape carrier package (TCP), or a printed circuit board (PCB), or formed on one substrate. Furthermore, the various components of these devices may be processes or threads running on one or more processors in one or more computing devices. Such processes or threads execute computer program instructions and interact with other system components to perform the various functions described herein. The computer program instructions are stored in a memory that can be implemented in a computing device using a standard memory device such as random access memory (RAM). The instructions may also be stored in other non-transitory computer-readable media, such as a CD-ROM or a flash drive. Those skilled in the art will also recognize that the functionality of various computing devices may be combined or integrated into a single computing device. Likewise, the functionality of a particular computing device may be distributed across one or more other computing devices without departing from the spirit and scope of the exemplary embodiments of the present invention.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the relevant art and/or this specification. They should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Computing-in-memory (CIM) architectures have been proposed to improve the computational efficiency of machine learning tasks for emerging AI applications. Because a CIM architecture computes directly in memory, it avoids moving large amounts of data. Its energy efficiency is therefore far higher than that of the traditional von Neumann architecture, whose energy consumption is dominated by data movement. CIM SRAM has a particular advantage: it processes information taken directly from the bit lines (BL), which minimizes the energy spent moving data. It can also process in parallel to perform efficient multiply-and-accumulate (MAC) operations. Although CIM SRAM achieves higher energy efficiency, its operating frequency and throughput are lower than those of existing ML accelerators, which limits where it can be applied. The throughput and execution efficiency of the CIM SRAM circuit architecture must therefore be improved further, so that its computing speed and throughput match high-speed, high-performance ML accelerators for high-performance AI applications.
The CIM SRAM of the present invention uses MAC processing and data conversion circuits to support ML computing tasks. First, the invention combines bit-line control with a charge redistribution architecture to perform MAC computation efficiently. This architecture improves the classification performance of the CIM SRAM. It also keeps the result free of non-ideal effects introduced by the SRAM access transistors, such as transistor mismatch and data-dependent current characteristics.
In addition, the invention adopts a dynamic binary-weighted current-steering digital-to-analog converter (DAC) architecture to improve energy efficiency and linearity under high-speed operation. It also uses a unified charge processing network (UCPN) that provides efficient signal processing and data conversion at the same time. The UCPN improves throughput and energy-area efficiency. It also removes the CIM SRAM performance bottlenecks that separate data processing and conversion circuits would otherwise cause.
The invention also uses a successive-approximation analog-to-digital converter (SAR ADC) architecture with bottom-plate sampling to improve data conversion performance. This architecture reduces energy loss in the CIM SRAM circuit. It also improves output resolution and conversion time and achieves a fast conversion speed, which yields a high-throughput, high-resolution, high-performance CIM SRAM.
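The SAR ADC resolves one bit per comparator decision, starting from the most significant bit, which is a binary search over the output codes. The sketch below is a minimal behavioral model under stated assumptions. The function name `sar_convert` is hypothetical. The input is single-ended in the range [0, v_ref), although the patent compares a voltage difference. The internal DAC is ideal. Bottom-plate sampling and capacitor mismatch are not modeled. The default of 7 bits is chosen only to match the 7-bit ADC_OUT described for FIG. 3.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int = 7) -> int:
    """Convert v_in (0 <= v_in < v_ref) to an n_bits code by successive approximation."""
    code = 0
    for bit in reversed(range(n_bits)):          # decide the MSB first
        trial = code | (1 << bit)                # tentatively set this bit
        v_dac = trial * v_ref / (1 << n_bits)    # ideal internal DAC level for the trial code
        if v_in >= v_dac:                        # comparator decision
            code = trial                         # keep the bit; otherwise it stays 0
    return code


# Half of full scale maps to the mid-scale code.
print(sar_convert(0.5, 1.0))  # 64
```

An n-bit conversion therefore takes n comparator cycles, which is why conversion time scales with resolution in this kind of converter.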
Please refer to FIG. 1, which is a system block (circuit block) diagram of an in-memory computing circuit according to an embodiment of the present invention.
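In this embodiment, the in-memory computing circuit 100 includes a plurality of charge processing networks 110, a plurality of digital-to-analog converters (DACs) 120, a plurality of computing arrays (101, 1021, 102N), at least one driving circuit (and/or decoding circuit) 130, and an input/output circuit 140 for reading and writing data. The computing arrays include a plurality of computing banks (1021, 102N).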
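In this embodiment, the decoding circuit 130 receives an input enable signal (not shown) and decodes the encoded external data to obtain the corresponding decoded address. Each address corresponds to a specific computing (or memory) unit in the computing arrays (101, 1021, 102N). The driving circuit 130 drives at least one of the computing arrays (101, 1021, 102N) corresponding to that address through a plurality of word lines (WL, WL1, WLN). It is also electrically connected to an analog-to-digital converter (ADC) through the word lines (WL, WL1, WLN). In this embodiment, the ADC is a SAR ADC. In other embodiments, the ADC may be a flash ADC, a pipeline ADC, or a pipelined SAR ADC (pipeline-SAR ADC); this embodiment is not limited thereto.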
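In this embodiment, the DACs 120 are configured to convert the digital external data supplied by the input/output circuit 140 into analog input data DIN, DIN1, DINM. Each of the DACs 120 is connected in series with a corresponding plurality of output capacitor pairs (see below).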
In this embodiment, the computing arrays (101, 1021, 102N) receive the input data (DIN, DIN1, DINM) from the bit lines (BL, BL1, BLM) at their corresponding two terminals (ports). When a word line (WL, WL1, WLN) is turned on at a specific address, it outputs an operation enable signal (EN, EN1, ENN). The input data (DIN, DIN1, DINM) on the bit lines (BL, BL1, BLM) are then operated on by the computing (or memory) unit 101. For example, the computing (or memory) unit 101 may consist of two left-right symmetric access transistors (pass transistors) (2T) (not shown) and a latch formed by four transistors (4T) (not shown). For example, one of the transistors on the left side is connected to the word line, and the other transistor on the left side is connected to the latch. In this embodiment, the computing unit 101 is thus a static random access memory (SRAM) cell composed of 10 transistors; the present invention is not limited thereto.
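The latch stores two complementary logic voltage levels at its two ends as a weight; for example, each end of the latch holds logic 1 or logic 0. The operation then proceeds as follows:

- Suppose the word line WL turns on while the latch end connected to the pass transistor holds a weight of logic 1. After multiplication with the weight, the input data (DIN, DIN1, DINM) originally on the bit line BL discharge through the pass transistor, so the operation value (output voltage or output current) on BL is logic 0.
- Suppose instead that the word line WL turns on while that latch end holds a weight of logic 0. After multiplication with the weight, the operation value (output voltage or output current) on BL is logic 1.

After multiplication with the weight, the bit-line terminals connected to the two ends of the symmetric SRAM cell therefore carry complementary logic levels. In other words, the two bit lines connected to the computing (or memory) unit 101 are complementary. If one of them is at, for example, a logic-1 voltage level, the other (the inverted bit line) is at the complementary, for example logic-0, level.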
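The read behavior described above can be written as a small behavioral model. This is a simplified sketch with several assumptions. The function name `multiply_phase` is hypothetical. Both lines of the pair are assumed to be precharged to the DAC voltage `v_in`, and a line that "discharges" is assumed to fall fully to 0 V; finite discharge time and access-transistor non-idealities are ignored.

```python
def multiply_phase(v_in: float, q: int) -> tuple[float, float]:
    """Return (V_BL, V_BLB) after the word line opens, for a pair precharged to v_in.

    q is the logic level on the latch node reached through the BL-side access
    transistor; the BLB side sees the complementary node (1 - q).
    """
    if q not in (0, 1):
        raise ValueError("q must be 0 or 1")
    v_bl = 0.0 if q == 1 else v_in    # BL discharges when its node holds logic 1
    v_blb = v_in if q == 1 else 0.0   # BLB sees the complementary node, so it behaves oppositely
    return v_bl, v_blb


v_bl, v_blb = multiply_phase(0.6, 0)
print(v_bl - v_blb)  # 0.6: the stored bit sets the sign of the differential
```

Under this model the pair always ends at complementary levels. The differential V_BL - V_BLB equals +v_in or -v_in depending on the stored bit.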
In an embodiment, the latch may be formed by an SR flip-flop; this is not limiting. The latch can serve as clock gating: it stores the comparison state of the current clock period, so the signal state does not need to be set again in the next period. This reduces clock-signal switching and therefore the dynamic power consumption of the circuit. It also allows the clock tree structure to be optimized, which reduces setup timing and improves voltage conversion efficiency.
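The charge processing network 110 receives and accumulates the first operation value (the value after multiplication with the weight) within a predetermined time interval. It does so through switch pairs connected in series with the output capacitor pairs (see below). The network then performs two actions at the same time:

- It transfers the charge stored on the output capacitor pairs to the global bit lines (see below) for accumulation.
- It distributes the charge of the first operation value evenly, through the bit lines BL, onto the selected output capacitors of the output capacitor pairs, meaning those holding a value (a non-zero logic level). The products are accumulated first and then averaged once.

Finally, the voltage difference across the two ends, after accumulation and averaging, is fed to a comparator, and the second operation value is output based on the comparison result. In other words, the voltage difference on the global bit lines (see below) is passed to the SAR ADC. The resulting output value is exactly the average of the products of the input data DIN, DIN1, DINM and the weights.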
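The averaging step relies on charge conservation. When capacitors holding voltages V_i are shorted together, the common voltage is ΣC_iV_i / ΣC_i, and with equal capacitors that is simply the mean. The sketch below models this under stated assumptions. The switches are ideal, the capacitors are equal, the path is single-ended rather than differential, and each capacitor is assumed to hold x_i·w_i after the multiply phase. The names `share_charge`, `averaged_mac`, and the unit capacitance `c_unit` are hypothetical.

```python
def share_charge(voltages: list[float], caps: list[float]) -> float:
    """Common voltage after ideally shorting capacitors together (charge conserved)."""
    total_q = sum(c * v for c, v in zip(caps, voltages))  # Q_i = C_i * V_i
    return total_q / sum(caps)


def averaged_mac(inputs: list[float], weights: list[int], c_unit: float = 1e-15) -> float:
    """Each equal capacitor holds x_i * w_i; sharing their charge yields the mean product."""
    products = [x * w for x, w in zip(inputs, weights)]
    return share_charge(products, [c_unit] * len(products))


print(averaged_mac([0.2, 0.4, 0.6, 0.8], [1, 0, 1, 1]))  # approximately 0.4
```

The result, (0.2 + 0 + 0.6 + 0.8) / 4 = 0.4, is the "multiply, accumulate, then average once" value that the comparator and SAR ADC then digitize.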
Please refer to FIG. 2, which is a partial system block diagram of an in-memory computing circuit according to an embodiment of the present invention.
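FIG. 2 shows an exemplary architecture of the proposed CIM SRAM circuit system 200. In this embodiment, the CIM SRAM circuit system 200 consists of 64 7-bit DACs (120, 1201, 120M), 16 computing banks (102, 1021, 102N'), CIM SRAM macros 150A and 150B, and the input/output circuit 140. The convolution output 160 combines two sets of outputs: the outputs (O9, …, O16) of CIM SRAM macro 150B (8 outputs, for example) and the outputs (O1, …, O8) of the computing banks (102, 1021, 102N') of CIM SRAM macro 150A (8 outputs, for example). In this embodiment, the CIM SRAM circuit system 200 provides data conversion capability equivalent to 16 ADCs with 7-bit resolution. The maximum filter depth is 64. Each computing bank (102, 1021, 102N') includes a plurality of local computing units (LCUs) 103. In an embodiment, CIM SRAM macro 150 includes 8 computing banks (not shown). Together with the computing banks (102, 1021, 102N'), these form a CIM SRAM circuit system 200 with 16 computing banks.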
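In this embodiment, the DACs (120, 1201, 120M) convert the digital external signal IN into corresponding analog input signals (voltages). The LCU 103 receives these signals to precharge the computing banks (102, 1021, 102N'). Multiplication is then performed in the computing banks (102, 1021, 102N'). Finally, the UCPN accumulates the multiplication results and outputs the corresponding digital signal.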
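The dataflow of FIG. 2 (64 shared inputs, 16 banks, one 7-bit output per bank) can be modeled at the behavioral level as follows. This is an idealized sketch, not the patent's circuit. The helper names `bank_output` and `macro_output` and the reference voltage `v_ref` are hypothetical. Inputs are unsigned (the sign bit switch is ignored), weights are binary, and every stage is linear and noise-free. Only the sizes (7-bit DACs and ADC outputs, filter depth 64, 16 banks) come from the text.

```python
DAC_BITS = 7       # 7-bit input DACs
ADC_BITS = 7       # 7-bit output code per bank
FILTER_DEPTH = 64  # maximum filter depth
NUM_BANKS = 16     # computing banks (8 in each macro)


def bank_output(codes: list[int], weights: list[int], v_ref: float = 1.0) -> int:
    """One bank: DAC precharge, binary-weight multiply, charge-sharing average, quantize."""
    volts = [c * v_ref / (1 << DAC_BITS) for c in codes]    # 1. DAC precharge levels
    products = [v * w for v, w in zip(volts, weights)]      # 2. multiply by stored weights
    mean = sum(products) / len(products)                    # 3. average over equal capacitors
    code = int(mean / v_ref * (1 << ADC_BITS))              # 4. ideal 7-bit quantization
    return min(code, (1 << ADC_BITS) - 1)


def macro_output(codes: list[int], bank_weights: list[list[int]]) -> list[int]:
    """All banks see the same 64 inputs; each bank produces one output (O1 to O16)."""
    if len(codes) != FILTER_DEPTH or len(bank_weights) != NUM_BANKS:
        raise ValueError("expected 64 input codes and 16 weight vectors")
    return [bank_output(codes, w) for w in bank_weights]


outputs = macro_output([64] * 64, [[1] * 64] * 8 + [[1, 0] * 32] * 8)
print(outputs)  # eight 64s followed by eight 32s
```

With every input at mid-scale, a bank whose weights are all 1 returns code 64, and a bank with half its weights set returns code 32.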
Please refer to FIG. 3, which is a block diagram of a computing bank according to an embodiment of FIG. 2 of the present invention.
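In this embodiment, the computing bank 102 includes:

- a plurality of LCUs 103;
- a plurality of computing units;
- a plurality of signals (or switches) (EN_RBL0, EN_RBLB0, EN_RBLN, EN_RBLBN) that control whether the input data are passed to the computing units through the corresponding complementary bit lines (RBL_0, RBLB_0, RBL_N, RBLB_N);
- sign bit switches SSW; and
- a charge processing network 170.

After the charge processing network 170 completes its operation, an ADC outputs an operation value ADC_OUT. ADC_OUT is a 7-bit output signal.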
In this embodiment, the computing unit includes a plurality of weight units ($W_{0,0}, \ldots, W_{15,63}$). In an embodiment, the number of weight units in the computing unit is not limited. The weight units ($W_{0,0}, \ldots, W_{15,0}$) and ($W_{0,63}, \ldots, W_{15,63}$) are connected in parallel with each other. The two ends of each of these weight units are electrically connected to at least one of the corresponding bit line pairs. For example, the two ends of the weight units ($W_{0,0}, \ldots, W_{15,0}$) are electrically connected to the bit line pair RBL_0, RBLB_0. Likewise, the two ends of the weight units ($W_{0,63}, \ldots, W_{15,63}$) are electrically connected to the bit line pair RBL_N, RBLB_N. Each weight unit ($W_{0,0}, \ldots, W_{15,63}$) stores a corresponding weight (logic 0 or logic 1). In an embodiment, the weight stored in each weight unit may be predetermined according to the computing requirements; this embodiment is not limited thereto.
In this embodiment, the operation is performed with the weights of the weight units ($W_{0,0}, \ldots, W_{15,0}$) and ($W_{0,63}, \ldots, W_{15,63}$) held fixed. It uses only the selected weight unit (for example, $W_{15,0}$ or $W_{15,63}$). The remaining, unselected weight units (for example, $W_{0,0}, \ldots, W_{14,0}$ or $W_{0,63}, \ldots, W_{14,63}$) stay idle.
In this embodiment, the two ends of each sign bit switch SSW are electrically connected to at least one of the corresponding bit line pairs (RBL_0, RBLB_0) and (RBL_N, RBLB_N). Before the precharge period, each sign bit switch is set according to whether the corresponding input data is positive or negative. The sign bit switch SSW thus controls the bit line pairs (RBL_0, RBLB_0) and (RBL_N, RBLB_N) so that the input (input signal) takes a positive or negative value (positive or negative voltage).
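One plausible way a switch on a differential pair can make the input positive or negative is to cross-connect the two lines, which negates the differential voltage. The source states only that SSW controls the pair to produce positive or negative values. The swap shown here is an assumption, and the names `sign_switch` and `signed_differential` are hypothetical.

```python
def sign_switch(v_rbl: float, v_rblb: float, negative: bool) -> tuple[float, float]:
    """Hypothetical SSW model: cross-connect the pair when the input is negative."""
    return (v_rblb, v_rbl) if negative else (v_rbl, v_rblb)


def signed_differential(v_rbl: float, v_rblb: float, negative: bool) -> float:
    """Differential voltage seen downstream after the sign switch."""
    a, b = sign_switch(v_rbl, v_rblb, negative)
    return a - b


print(signed_differential(0.6, 0.0, False))  # 0.6
print(signed_differential(0.6, 0.0, True))   # -0.6
```

Because the sign is applied by routing rather than by a separate DAC code, the DAC itself only needs to produce the input magnitude in this model.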
Please refer to FIG. 4, which is a circuit diagram of a digital-to-analog converter (DAC) according to an embodiment of the present invention.
Refer to FIG. 3 and FIG. 4 together. In this embodiment, the DACs (120, 1201, 120M) are dynamic binary-weighted current-steering DACs. Each DAC includes a dynamic binary-weighted current source array CS and a plurality of bit line pairs with complementary logic voltage levels, connected in series with a plurality of output capacitor pairs C1. The current source array CS is formed by a plurality of current source switches SW0, SW1, SWM. These switches select or switch the required current sources in the array CS according to the input data, so that the array supplies the corresponding current. In one embodiment, the input data from the DACs (120, 1201, 120M) are current-steered data. In another embodiment, they are voltage-steered data.
In one embodiment, the DAC (120, 1201, 120M) has multiple current source groups connected in parallel, and each group includes at least one current source CS. In an alternative embodiment, each current source group includes a plurality of current sources CS connected in parallel with each other. In a further alternative embodiment, the numbers of parallel current sources CS in the successive groups follow a power-of-two sequence. For example, the first group has one current source CS, the second group has two current sources CS connected in parallel, ..., and the fifth group has thirty-two current sources CS connected in parallel; in other words, N in FIG. 4 is 32, and so on. In one embodiment, each current source CS includes at least one switch (SW0, SW1, ..., SWM). For example, the first group has one current source CS, three gate bias voltages BIAS, and one current source switch SW0. The second group has two current sources CS, each with three gate bias voltages BIAS and one current source switch SW1; that is, the second group has two current source switches SW1. Likewise, the fifth group has N current sources CS connected in parallel, each with three gate bias voltages BIAS and one current source switch SWM; that is, the fifth group has N current source switches SWM. The number of parallel current sources in each group is not limited and can be chosen as the design requires.
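The power-of-two grouping can be sketched as follows. This is a minimal model assuming group k (counting from zero) holds 2^k unit current sources and that bit k of the input code closes every switch in group k. Under that assumption, a six-group array ends with a 32-source group. The mapping and the helper names are illustrative assumptions, not the patent's control logic.

```python
def group_sizes(n_groups: int) -> list[int]:
    """Unit current sources per group in a binary-weighted array: 1, 2, 4, ..."""
    return [1 << k for k in range(n_groups)]


def closed_switches(code: int, n_groups: int) -> list[int]:
    """Number of closed current-source switches in each group for an input code.

    Hypothetical mapping: bit k of the code closes every switch in group k,
    so group k contributes either 2**k unit currents or none.
    """
    return [size if (code >> k) & 1 else 0
            for k, size in enumerate(group_sizes(n_groups))]


sizes = group_sizes(6)             # [1, 2, 4, 8, 16, 32]
on = closed_switches(0b101101, 6)  # [1, 0, 4, 8, 0, 32]
print(sum(on))                     # 45 unit currents, i.e. the code value
```

Because each group is either fully on or fully off, the number of enabled unit sources always equals the binary input code.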
In this embodiment, the output capacitance C1 of the DAC (120, 1201, 120M) is formed by the bit lines RBL, RBLB and the bit-line capacitances C_RBL, C_RBLB (see FIG. 5). With this DAC (120, 1201, 120M) architecture, the energy efficiency of the DAC improves by a factor of 2.01, and its linearity improves by 0.65 ENOB. ENOB (effective number of bits) measures, in bits, the conversion quality of a data converter for an input signal across the Nyquist bandwidth.
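To make the ENOB figure concrete, the sketch below uses the standard ideal-quantizer relation SINAD = 6.02·N + 1.76 dB for a full-scale sine input. Under that relation, a 0.65-ENOB improvement corresponds to roughly 3.9 dB more SINAD. The conversion is generic data-converter arithmetic and is not a measurement method stated in the patent.

```python
def enob_from_sinad(sinad_db: float) -> float:
    """Effective number of bits from measured SINAD (dB), using the standard
    ideal-quantizer relation SINAD = 6.02 * N + 1.76 dB for a full-scale sine."""
    return (sinad_db - 1.76) / 6.02


def sinad_gain_for_enob(delta_enob: float) -> float:
    """SINAD improvement (dB) equivalent to a given ENOB improvement."""
    return 6.02 * delta_enob


print(round(enob_from_sinad(43.90), 2))     # 7.0
print(round(sinad_gain_for_enob(0.65), 2))  # 3.91
```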
Please refer to FIG. 5, which is a schematic circuit diagram of a local computing unit (LCU) according to an embodiment of the present invention.
In this embodiment, the LCU 103 includes two DAC input switches 410, sixteen memory cells 420, four sign-bit switches SSW, and the charge processing network 170.
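In an exemplary embodiment, each row of the computing array has sixty-four LCUs 103, and each LCU 103 contains sixteen 10T SRAM cells that store the filter (weight) data for the computation. Only one of the sixteen cells is activated in each operation while the other fifteen remain idle. During operation, the active SRAM cell can be switched at any time to suit different requirements or algorithms.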
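The row organization described above (64 LCUs, 16 weight cells per LCU, one active cell per operation) can be modeled behaviorally as a dot product over the selected cell index. This is a purely digital sketch. The function name and the sample data are hypothetical, and in the circuit each cell stores a single bit rather than an arbitrary integer.

```python
N_LCU = 64      # LCUs on one row of the computing array
N_CELLS = 16    # 10T SRAM cells per LCU; one is active per operation


def row_mac(inputs: list[int], weights: list[list[int]], sel: int) -> int:
    """Behavioral sketch of one row: every LCU multiplies its input by the
    weight held in SRAM cell `sel`; the other 15 cells in each LCU are idle."""
    if len(inputs) != N_LCU or len(weights) != N_LCU:
        raise ValueError("expected one input and one weight bank per LCU")
    if not 0 <= sel < N_CELLS:
        raise ValueError("sel must select one of the 16 cells")
    return sum(x * bank[sel] for x, bank in zip(inputs, weights))


# Hypothetical data: cell k of every LCU holds weight k % 2.
weights = [[k % 2 for k in range(N_CELLS)] for _ in range(N_LCU)]
inputs = list(range(N_LCU))
print(row_mac(inputs, weights, sel=3))  # 2016: odd cell, weight 1 everywhere
print(row_mac(inputs, weights, sel=2))  # 0: even cell, weight 0 everywhere
```

Switching `sel` changes which stored filter participates without rewriting any weights, which mirrors the idle-cell switching described in the text.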
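In this embodiment, the DAC input switches 410 are controlled by the bit-line enable signals (switches) EN_RBL and EN_RBLB, which determine whether input information from the DAC 120 is accepted. The DAC 120 itself is enabled or disabled by the DAC enable signal EN_DAC.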
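In this embodiment, the memory cell 420 includes the complementary bit lines RBL and RBLB, and a read word line RWL that, during a read, simultaneously controls whether multiple pass transistors conduct. It also includes two symmetric access-control transistors (2T) and a latch formed by four transistors (4T) with complementary outputs Q and QB. In other words, the memory cells 420 consist of sixteen 10T SRAM cells.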
In this embodiment, the four sign-bit switches SSW are controlled by the DAC input control signals DAC_6 and DAC_6B, as follows:
- DAC_6 controls whether the bit line RBL can pass the input information (signal) to the global bit line GRBL_P in the charge processing network 170.
- DAC_6 also controls whether the bit line RBLB can pass the inverted input information (signal) to the global bit line GRBL_N.
- DAC_6B controls whether the bit line RBL can pass the inverted input information (signal) to the global bit line GRBL_P.
- DAC_6B also controls whether the bit line RBLB can pass the inverted input information (signal) to the global bit line GRBL_N.
In this embodiment, the four sign-bit switches adopt a cross-coupled configuration, so that the input information (signal) can take either a positive or a negative voltage form.
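One way to picture how cross-coupled sign switches give the input a positive or negative form is to treat them as a swap of the complementary pair before it reaches the global bit lines. The text does not spell out the exact routing, so the mapping below is an assumption. For sign 0, RBL drives GRBL_P and RBLB drives GRBL_N. For sign 1, the pair is crossed over, which flips the sign of the differential voltage.

```python
def sign_route(v_rbl: float, v_rblb: float, sign: int) -> tuple[float, float]:
    """Hypothetical model of the four cross-coupled sign-bit switches SSW.

    Assumption (not stated in the text): for sign == 0 the DAC_6 switches
    connect RBL -> GRBL_P and RBLB -> GRBL_N; for sign == 1 the DAC_6B
    switches cross the pair over. Returns (V_GRBL_P, V_GRBL_N).
    """
    if sign == 0:
        return v_rbl, v_rblb
    if sign == 1:
        return v_rblb, v_rbl
    raise ValueError("sign must be 0 or 1")


def differential(v_p: float, v_n: float) -> float:
    """Differential voltage seen between GRBL_P and GRBL_N."""
    return v_p - v_n


print(differential(*sign_route(0.6, 0.0, sign=0)))  # 0.6
print(differential(*sign_route(0.6, 0.0, sign=1)))  # -0.6
```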
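In this embodiment, the charge processing network 170 has two ADC switch nodes that receive the ADC output information (signal). The output information passes through an inverter (not shown), one terminal of which receives an ADC reference voltage, to produce the corresponding ADC input control signals ADC_NB and ADCB_NB.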
In one embodiment, the inverter can be used to invert the input information (signal) bitwise, which gives its one's complement; adding 1 to the result gives the value's two's complement. In a two's-complement system, a negative number is represented by the two's complement of its positive counterpart. As a result, addition and subtraction need no separate handling for the signs of the operands, and a single adder circuit handles signed addition. Similarly, subtraction is performed by adding the two's complement of the subtrahend. An adder together with a two's-complement circuit is therefore sufficient for all signed addition and subtraction.
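The invert-then-add-one rule and subtraction-by-addition can be checked with a short software model. The helper names are illustrative. In the circuit the inversion is done by the inverter; here it is an XOR with an all-ones mask.

```python
def encode(value: int, n_bits: int) -> int:
    """n-bit two's-complement bit pattern of a signed integer."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError("value does not fit in n_bits")
    return value & ((1 << n_bits) - 1)


def decode(code: int, n_bits: int) -> int:
    """Signed value of an n-bit two's-complement bit pattern."""
    return code - (1 << n_bits) if (code >> (n_bits - 1)) & 1 else code


def negate(code: int, n_bits: int) -> int:
    """Two's complement: invert every bit (one's complement), then add 1."""
    mask = (1 << n_bits) - 1
    return ((code ^ mask) + 1) & mask


def subtract(a: int, b: int, n_bits: int) -> int:
    """a - b using only an adder: add a to the two's complement of b."""
    mask = (1 << n_bits) - 1
    return (a + negate(b, n_bits)) & mask


a, b = encode(5, 8), encode(7, 8)
print(format(negate(b, 8), "08b"))   # 11111001  (-7)
print(decode(subtract(a, b, 8), 8))  # -2
```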
In this embodiment, the charge processing network 170 also has two switches SEL, which switch between the bit line RBL and the ADC input control signals ADC_NB and ADCB_NB.
Please refer to FIG. 6A to FIG. 6D, which show the operation steps of the LCU in its different phases and the corresponding timing diagram according to an embodiment of the present invention.
In this embodiment, the multiplication performed by the LCU consists of three phases: the DAC precharge period T1, the multiplication operation period T2, and the data transmission period T3.
In this embodiment, before the DAC precharge period T1, the DAC sign-bit switches are first set according to the sign bit to be applied. The bottom plates of the output capacitors C_RBL and C_RBLB are then connected to the bit lines RBL and RBLB accordingly. As shown in operation 610, the bit-line enable signal (switch) EN_RBL and the DAC enable signal EN_DAC are asserted during the precharge period T1. The two global read bit lines GRBL_P and GRBL_N connected to the output capacitors C_RBL and C_RBLB are initially grounded, which discharges them to ground level (e.g., logic 0). This ensures that the voltage difference between the two output capacitors C_RBL and C_RBLB equals the DAC precharge voltage.
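In this embodiment, as shown in operation 620, the precharge path is opened during the multiplication operation period T2 (i.e., the bit-line enable switch EN_RBL is turned off). The output capacitors C_RBL and C_RBLB therefore hold the DAC precharge voltage established during T1. The read word line RWL is then enabled to perform the multiplication: depending on the weight data stored in the SRAM cell, the voltage on either bit line RBL or bit line RBLB is discharged to zero. Once the discharge completes, RWL is disabled, which concludes the multiplication.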
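The T1/T2 sequence can be summarized as a small state model. It rests on two assumptions that the text does not state explicitly. First, both bit lines are precharged to the same DAC voltage. Second, a stored Q = 1 discharges RBLB, while Q = 0 discharges RBL. Under these assumptions the differential voltage left on the capacitor pair is ±V_DAC, i.e. the input magnitude times a ±1 weight. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LCU:
    """Behavioral sketch of one LCU's complementary bit lines (hypothetical)."""
    v_rbl: float = 0.0
    v_rblb: float = 0.0

    def precharge(self, v_dac: float) -> None:
        # T1: EN_RBL and EN_DAC asserted; C_RBL and C_RBLB charge to the
        # DAC voltage (assumed equal on both lines).
        self.v_rbl = self.v_rblb = v_dac

    def multiply(self, q: int) -> None:
        # T2: precharge path opened, then RWL pulsed. The stored weight
        # discharges exactly one line to 0 V; which line goes with Q = 1
        # is an assumption of this sketch.
        if q:
            self.v_rblb = 0.0
        else:
            self.v_rbl = 0.0

    def product(self) -> float:
        # Differential result held on the capacitor pair: +v_dac or -v_dac.
        return self.v_rbl - self.v_rblb


lcu = LCU()
lcu.precharge(0.4)
lcu.multiply(q=0)
print(lcu.product())  # -0.4
```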
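In this embodiment, as shown in operation 630, the switches SEL connect the bottom plates of the output capacitors C_RBL and C_RBLB to the ADC switch nodes in the UCPN 170 during the data transmission period T3. The transferred data (charge) accumulates on the global bit lines GRBL_P and GRBL_N. As shown in FIG. 6D, the voltage swing on GRBL_P and GRBL_N decreases gradually over the course of T3. In one embodiment, the swing on GRBL_N is larger than that on GRBL_P at the start of T3 and converges toward it late in T3. The ADC output waveform shows how the ADC output signal ADC_OUT evolves. During T3, ADC_OUT is a 7-bit output signal, which is further processed in the UCPN 170 and converted into a digital output. In this embodiment, the output capacitor pairs C_RBL, C_RBLB have substantially the same capacitance. In one embodiment, all output capacitors C_RBL and C_RBLB have the same capacitance and thus form a unitary capacitor array, and this capacitor array 720 (see FIG. 7) is reused within the UCPN 170 during T3. In one embodiment, the number of output capacitor pairs C_RBL, C_RBLB that are switched changes in powers of two from one SAR ADC iteration to the next, which implements dynamic binary-weighted capacitor switching.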
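The successive-approximation search behind the 7-bit ADC_OUT can be sketched as an idealized binary search. In a charge-redistribution SAR built on a unit capacitor array, testing bit k corresponds to switching 2^k unit capacitors, so the switched count halves at each step. The model below ignores comparator offset, parasitics, and the differential GRBL_P/GRBL_N structure, and it is not the patent's exact switching sequence.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int = 7) -> int:
    """Idealized successive-approximation conversion of v_in in [0, v_ref).

    Each iteration tests one bit, MSB first. In a charge-redistribution SAR
    with a unit capacitor array, testing bit k corresponds to switching
    2**k unit capacitors, so the switched count halves every step.
    """
    code = 0
    for k in reversed(range(n_bits)):
        trial = code | (1 << k)
        v_dac = trial * v_ref / (1 << n_bits)  # comparator threshold
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code


print(sar_convert(0.30, 1.0))  # 38, i.e. floor(0.30 * 128)
```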
Referring to FIG. 6D, the waveform relationships among the DAC precharge period T1, the multiplication operation period T2, and the data transmission period T3 follow from FIG. 6A to FIG. 6C and the description above. They are readily understood by those skilled in the art and are not repeated here. In this embodiment, one complete period of the clock signal CLK comprises the precharge period T1, the multiplication operation period T2, and the data transmission period T3.
Please refer to FIG. 7, which is a schematic circuit diagram of a unified charge processing network (UCPN) according to an embodiment of the present invention.
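In this embodiment, the unified UCPN 170 provides signal processing, data transfer, and conversion at the same time, and it also improves overall energy efficiency and area efficiency. A single capacitor array 720 serves all of the complementary bit lines RBL_0, RBLB_0, ..., RBL_63, RBLB_63. The capacitor array 720 includes the output capacitors C_RBL0, ..., C_RBL63 and C_RBLB0, ..., C_RBLB63, which store the MAC results during computation. In architecture 720, the capacitor array 720 also serves as the switched-capacitor array of the ADC, which converts the computation result into digital information. As a result, the present invention improves on-chip area efficiency by a factor of 1.15.

The unified UCPN 170 also shortens the signal propagation path. It substantially reduces the drop in output voltage swing on the global bit lines GRBL_P and GRBL_N caused by voltage division across the multiple output capacitors C_RBL0, C_RBLB0, ..., C_RBL63, C_RBLB63. No additional bootstrap circuit is therefore needed to raise the voltage level sent to the comparator for better conversion accuracy, which avoids extra area. In architecture 710, the unified UCPN 170 includes a comparator, and the SAR ADC converts the comparator output OUT into a 7-bit output. In particular, the unified UCPN 170 avoids the loss of output voltage swing caused by charge redistribution across multiple output capacitors, and because it shares its capacitors with the SAR ADC, it saves on-chip capacitor overhead.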
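In this embodiment, the unified UCPN 170 includes the switch pairs connected in series with the output capacitor pairs C_RBL0, C_RBLB0, ..., C_RBL63, C_RBLB63. For the bit-line pairs RBL_0/RBLB_0, ..., RBL_63/RBLB_63, the switch pairs receive, through the corresponding inverters, the analog-to-digital conversion control signals (ADC_0, ADCB_0), ..., (ADC_63, ADCB_63), which have complementary logic voltage levels.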
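In one embodiment, the computing array 420 includes a plurality of bit lines RBL_0, RBLB_0, ..., RBL_63, RBLB_63 and a plurality of word lines. The bit lines form bit-line pairs with complementary logic voltage levels. During the precharge period, the DACs charge the bit lines to a predetermined voltage so that charge accumulates on the output capacitor pairs C_RBL0, C_RBLB0, ..., C_RBL63, C_RBLB63. The bottom plates of these capacitors are connected to the corresponding bit-line pairs RBL_0/RBLB_0, ..., RBL_63/RBLB_63.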
In one embodiment, after the precharge period ends, the word lines determine, based on a word-line enable signal, whether the external data is operated on with the corresponding computing arrays. In one embodiment, during the precharge period each DAC determines whether to input the external data into its computing array. This decision is based on a digital-to-analog conversion enable signal, a bit-line enable signal, and an inverted (complementary) bit-line enable signal.
In one embodiment, the switch pair includes a first switch and a second switch. In response to the clock signal, the first switch switches within a predetermined time interval to connect either the first bit line or a first inverter, and the first inverter receives a first analog-to-digital conversion control signal. In the same way, the second switch switches within the predetermined time interval to connect either the second bit line or a second inverter, and the second inverter receives a second analog-to-digital conversion control signal. During the operation phase, the first bit line and the second bit line have complementary logic voltage levels.
In one embodiment, the output capacitor pair includes a first output capacitor and a second output capacitor. Through the first switch, one end of the first output capacitor is electrically connected to the first bit line within the predetermined time interval, and the first output capacitor is precharged to a first voltage level.
In one embodiment, through the second switch, one end of the second output capacitor is electrically connected to the second bit line within the predetermined time interval, and the second output capacitor is precharged to a second voltage level. The other ends of the first and second output capacitors are each electrically connected to a comparator, which compares the difference between the first voltage level and the second voltage level.
In one embodiment, the charge processing network includes a first global bit line and a second global bit line. The first global bit line is coupled to the other end of the first output capacitor and to the positive input of the comparator. The second global bit line is coupled to the other end of the second output capacitor and to the negative input of the comparator. Before the precharge period, the first and second global bit lines are grounded so that their voltage levels are zero.
In one embodiment, during the data transmission period, the first switch switches one end of the first output capacitor to the first inverter in response to the clock signal. Likewise, the second switch switches one end of the second output capacitor to the second inverter in response to the clock signal. The input data is then accumulated on the first and second global bit lines.
In one embodiment, during the precharge period, the voltage difference between the first output capacitor and the second output capacitor equals the voltage to which the DAC precharges them.
In one embodiment, each charge processing network is electrically connected to a plurality of bit lines, and these bit lines share one output capacitor array composed of the output capacitor pairs.
In one embodiment, the charge processing networks perform bottom-plate sampling on the shared output capacitor array followed by a single charge redistribution, which produces the corresponding output voltage on the corresponding global bit lines.
In one embodiment, while the SAR ADC performs its conversion, the shared output capacitor array of the charge processing networks also holds the results of the multiply-accumulate (MAC) operation.
In one embodiment, the output voltage is the average of the accumulated input data. The SAR ADC converts this output voltage into the average of the products between the input data and the corresponding weight cells in the computing arrays.
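The averaging follows from charge conservation: when capacitors charged to voltages V_i share charge, the result is Σ C_i V_i / Σ C_i. With equal unit capacitors, this is simply the mean. The sketch below treats each LCU's differential product as a single voltage, which simplifies the GRBL_P/GRBL_N pair. The sample values are hypothetical.

```python
from typing import Optional, Sequence


def charge_share(voltages: Sequence[float],
                 caps: Optional[Sequence[float]] = None) -> float:
    """Voltage after shorting capacitors that were charged to `voltages`.

    Charge conservation: V = sum(C_i * V_i) / sum(C_i). With equal unit
    capacitors (a unitary array) this reduces to the plain mean.
    """
    if caps is None:
        caps = [1.0] * len(voltages)
    if len(caps) != len(voltages):
        raise ValueError("one capacitance per voltage")
    return sum(c * v for c, v in zip(caps, voltages)) / sum(caps)


# Hypothetical row: 64 LCU products of +0.4 V or -0.4 V, 40 of them positive.
products = [0.4] * 40 + [-0.4] * 24
print(round(charge_share(products), 4))  # 0.1 = 0.4 * (40 - 24) / 64
```

Because the result is divided by the total capacitance, the swing shrinks as more capacitors share charge. This is the voltage-division effect that the UCPN description above says the unified design mitigates.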
In summary, the computing-in-memory circuits described in the embodiments of the present invention may be based on static random access memory (SRAM). The present invention also provides a computing-in-memory SRAM (CIM SRAM) circuit architecture for edge AI devices that offers high throughput together with high energy efficiency and area efficiency.
By improving the data processing and conversion circuits, the invention overcomes current performance limitations of CIM SRAM circuits. It reduces extra energy consumption and relieves limited computational linearity, thereby improving the overall computing speed, energy efficiency, and linearity of the memory. In addition, the present invention provides a UCPN that combines signal processing and data conversion to improve energy efficiency. The UCPN also improves circuit performance and the utilization of chip area in physical design.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Those of ordinary skill in the art may make modifications and refinements without departing from the spirit and scope of the present invention. The scope of protection is therefore defined by the appended claims.
100: In-memory computing circuit
101: Computing unit
102, 1021, 102N': Computing bank
103: Local computing unit
110, 170: Charge processing network
120, 1201, 120M: Digital-to-analog converter
130: Driver circuit, decoder
140: Input/output circuit
150A, 150B: CIM SRAM macro
200: CIM SRAM circuit system
410: DAC input switch
420: Memory cell
610, 620, 630: Operation
700, 710, 720: Architecture
BIAS: Bias voltage
BL, BL1, BLM, RBL_0, RBL_63, RBLB_0, RBLB_63, RBL_N, RBLB_N: Bit line
C1, C_RBL, C_RBL0, C_RBL63, C_RBLB, C_RBLB0, C_RBLB63: Output capacitor
CLK: Clock signal
CS: Current source
DIN, DIN1, DINM: Input data
DAC_6, DAC_6B: DAC input control signal
ADC_0, ADC_63, ADC_NB, ADCB_0, ADCB_63, ADCB_NB: ADC input control signal
EN, EN1, ENN: Word-line enable signal
EN_DAC: DAC enable signal
EN_RBL, EN_RBL0, EN_RBLB, EN_RBLB0, EN_RBLN, EN_RBLBN: Bit-line enable signal (switch)
GRBL_P, GRBL_N: Global bit line
IN: External data
O1, O8, O9, O16, OUT, 160, ADC_OUT: Output signal
Q, Q_0, Q_63, QB, QB_0, QB_63: Output terminal
SEL, SW0, SW1, SWM: Switch
SSW: Sign-bit switch
T1: Precharge period
T2: Multiplication operation period
T3: Data transmission period
WL, WL1, WLN, RWL: Word line
W0,0, W1,0, W2,0, W13,0, W14,0, W15,0, W0,63, W1,63, W2,63, W3,63, W13,63, W14,63, W15,63: Weight cell
FIG. 1 is a circuit block diagram of an in-memory computing circuit according to an embodiment of the present invention.
FIG. 2 is a partial block diagram of an in-memory computing circuit according to an embodiment of the present invention.
FIG. 3 is a block diagram of a computing bank according to the embodiment of FIG. 2.
FIG. 4 is a schematic circuit diagram of a digital-to-analog converter (DAC) according to an embodiment of the present invention.
FIG. 5 is a schematic circuit diagram of a local computing unit (LCU) according to an embodiment of the present invention.
FIG. 6A to FIG. 6D are schematic diagrams of the operation steps of the LCU in different phases and the corresponding timing diagram according to an embodiment of the present invention.
FIG. 7 is a schematic circuit diagram of a unified charge processing network (UCPN) according to an embodiment of the present invention.
100: In-memory computing circuit
101: Computing unit
102, 1021, 102N': Computing bank
110: Charge processing network
120: Digital-to-analog converter
130: Driver circuit, decoding circuit
140: Input/output circuit
BL, BL1, BLM: Bit line
DIN, DIN1, DINM: Input data
EN, EN1, ENN: Word-line enable signal
WL, WL1, WLN: Word line
Claims (19)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW111129843A TWI812391B (en) | 2022-08-09 | 2022-08-09 | Computing-in-memory circuitry |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW111129843A TWI812391B (en) | 2022-08-09 | 2022-08-09 | Computing-in-memory circuitry |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI812391B true TWI812391B (en) | 2023-08-11 |
| TW202407580A TW202407580A (en) | 2024-02-16 |
Family ID: 88585856
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW111129843A TWI812391B (en) | 2022-08-09 | 2022-08-09 | Computing-in-memory circuitry |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI812391B (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW202013264A (en) * | 2018-05-29 | 2020-04-01 | 英屬開曼群島商意騰科技股份有限公司 | Architecture of in-memory computing memory device for use in artificial neuron |
| US20210241820A1 (en) * | 2020-01-30 | 2021-08-05 | Texas Instruments Incorporated | Computation in-memory architecture for analog-to-digital conversion |
| CN113946310A (en) * | 2021-10-08 | 2022-01-18 | 上海科技大学 | Memory computing eDRAM accelerator for convolutional neural network |
- 2022-08-09: Application TW111129843A filed in TW (patent TWI812391B, status: active)
Also Published As
| Publication number | Publication date |
|---|---|
| TW202407580A (en) | 2024-02-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12009054B2 (en) | Computing-in-memory circuitry | |
| CN115048075B (en) | SRAM in-memory computing chip based on capacitive coupling | |
| US12007890B2 (en) | Configurable in memory computing engine, platform, bit cells and layouts therefore | |
| US11538509B2 (en) | Compute-in-memory with ternary activation | |
| KR102207909B1 (en) | Computation in memory apparatus based on bitline charge sharing and operating method thereof | |
| CN113255904A (en) | Voltage margin enhanced capacitive coupling storage integrated unit, subarray and device | |
| CN117130978A (en) | Charge domain in-memory calculation circuit and calculation method based on sparse tracking ADC | |
| Jeong et al. | A ternary neural network computing-in-memory processor with 16T1C bitcell architecture | |
| US20210150328A1 (en) | Hierarchical Hybrid Network on Chip Architecture for Compute-in-memory Probabilistic Machine Learning Accelerator | |
| CN109979503A (en) | A kind of Static RAM circuit structure realizing Hamming distance in memory and calculating | |
| Tsai et al. | RePIM: Joint exploitation of activation and weight repetitions for in-ReRAM DNN acceleration | |
| CN115910152A (en) | Charge domain memory calculation circuit and calculation circuit with positive and negative number operation function | |
| CN115080501A (en) | SRAM (static random Access memory) storage integrated chip based on local capacitance charge sharing | |
| CN116821048A (en) | Integrated memory chip and operation method thereof | |
| CN118072788A (en) | Storage and computing integrated circuits, chips and electronic devices | |
| CN118034644B (en) | A high-density and high-reliability in-memory computing circuit based on eDRAM | |
| US12249395B2 (en) | Memory device supporting in-memory MAC operation between ternary input data and binary weight using charge sharing method and operation method thereof | |
| TWI812391B (en) | Computing-in-memory circuitry | |
| CN117271436A (en) | SRAM-based current mirror complementary in-memory calculation macro circuit and chip | |
| Kim et al. | A charge-domain 10T SRAM based in-memory-computing macro for low energy and highly accurate DNN inference | |
| CN119356640B (en) | Randomly calculated CIM circuit and MAC operation circuit suitable for machine learning training | |
| CN118298872B (en) | In-memory computing circuit with configurable input weight bit and chip thereof | |
| US20240330178A1 (en) | Configurable in memory computing engine, platform, bit cells and layouts therefore | |
| CN115525250A (en) | memory computing circuit | |
| Xuan et al. | AiDAC: A Low-Cost In-Memory Computing Architecture with All-Analog Multi-Bit Compute and Interconnect |