TWI879351B - Floating-point computing-in-memory device

Info

Publication number: TWI879351B
Application number: TW112151044A
Authority: TW
Inventors: 蘇建維; 梅芃翌; 林志昇; 李思翰; 許世玄
Original assignee: 財團法人工業技術研究院
Priority date: 2023-12-27
Filing date: 2023-12-27
Publication date: 2025-04-01
Also published as: TW202526615A

Abstract

A floating-point computing-in-memory device is provided. The floating-point computing-in-memory device includes an exponent computing memory module and a mantissa computing memory module. The exponent computing memory module includes a plurality of weighting exponent memory circuits, a plurality of exponent computing circuits and a comparison circuit. The exponent computing circuits are used to obtain a plurality of exponent products. The mantissa computing memory module includes a bit shifting circuit, a plurality of weighting mantissa memory circuits, a plurality of mantissa computing circuits, a shift-and-addition circuit, a plurality of weighting sign memory circuits, a plurality of sign computing circuits and an addition circuit. The mantissa computing circuits and the shift-and-addition circuit are used to obtain a plurality of mantissa products. The sign computing circuits are used to obtain a plurality of sign products.

Description

Floating-point computing-in-memory device

The present disclosure relates to a floating-point computing-in-memory device.

Computing in memory (CIM) is regarded as one of the effective technologies for overcoming the memory wall. By computing inside the memory, it reduces the number of data transfers and can raise computing speed to hundreds or even thousands of times that of traditional architectures. In today's large AI networks (such as DNNs), a large share of the energy is consumed by data movement. CIM can greatly reduce this wasted energy, which makes it a promising future AI technology that both increases computing capability and lowers power consumption.

The potential of CIM has led many manufacturers and research institutions to invest in it and publish many novel techniques. However, these techniques can only perform integer operations, and the analog sensing they adopt may cause problems such as noise or process variation. The CIM designs proposed so far cannot support floating-point operations. Researchers are therefore working to develop a CIM architecture that supports floating-point numbers.

The present disclosure relates to a floating-point computing-in-memory device that integrates floating-point computing circuits into the memory and thereby avoids the input and output of data. It therefore has the advantage of fast computation, reduces power consumption, and improves energy efficiency.

According to one aspect of the present disclosure, a floating-point computing-in-memory device is provided. The floating-point computing-in-memory device includes an exponent storage operation module and a mantissa storage operation module. The exponent storage operation module includes a plurality of weight exponent storage circuits, a plurality of exponent operation circuits and a comparison circuit. The weight exponent storage circuits are used to store the exponent parts of a plurality of weight data. The exponent operation circuits are used to perform an addition operation on the exponent parts of a plurality of input data and the exponent parts of the weight data to obtain a plurality of exponent product data. The comparison circuit is used to compare the exponent product data to obtain a maximum exponent product data. The mantissa storage operation module includes a bit shift circuit, a plurality of weight mantissa storage circuits, a plurality of mantissa operation circuits, a shift and addition circuit, a plurality of weight sign storage circuits, a plurality of sign operation circuits and a summing circuit. The bit shift circuit is used to shift the mantissa parts of the input data according to the maximum exponent product data. The weight mantissa storage circuits are used to store the mantissa parts of the weight data. The mantissa operation circuits are used to perform a multiplication operation on the mantissa parts of the input data and the mantissa parts of the weight data to obtain a plurality of mantissa product intermediate data. The shift and addition circuit is used to shift the mantissa product intermediate data and then sum them to obtain a plurality of mantissa product data. The weight sign storage circuits are used to store the sign parts of the weight data. The sign operation circuits are used to perform an exclusive-OR operation on the sign parts of the input data and the sign parts of the weight data to obtain a plurality of sign product data. The summing circuit is used to integrate the sign product data, the maximum exponent product data and the mantissa product data to obtain an input and weight product sum data.

According to another aspect of the present disclosure, an exponent storage operation module is provided. The exponent storage operation module includes a plurality of weight exponent storage circuits, a plurality of exponent operation circuits and a comparison circuit. The weight exponent storage circuits are used to store the exponent parts of a plurality of weight data. The exponent operation circuits are used to perform an addition operation on the exponent parts of a plurality of input data and the exponent parts of the weight data to obtain a plurality of exponent product data. The comparison circuit is used to compare the exponent product data to obtain a maximum exponent product data.

According to yet another aspect of the present disclosure, a mantissa storage operation module is provided. The mantissa storage operation module includes a plurality of weight mantissa storage circuits, a plurality of mantissa operation circuits and a shift and addition circuit. The weight mantissa storage circuits are used to store the mantissa parts of a plurality of weight data. The mantissa operation circuits are used to perform a multiplication operation on the mantissa parts of a plurality of input data and the mantissa parts of the weight data to obtain a plurality of mantissa product intermediate data. The shift and addition circuit is used to shift the mantissa product intermediate data and then sum them to obtain a plurality of mantissa product data.

For a better understanding of the above and other aspects of the present disclosure, embodiments are described in detail below with reference to the accompanying drawings:

The technical terms in this specification follow the customary terms of this technical field. Where this specification explains or defines a term, that term is to be interpreted according to the explanation or definition given here. Each embodiment of the present disclosure has one or more technical features. Where implementation is possible, a person having ordinary skill in the art may selectively implement some or all of the technical features of any embodiment, or selectively combine some or all of the technical features of these embodiments.

Please refer to FIG. 1, which illustrates the multiplication of floating-point data according to an embodiment of the present disclosure. Floating-point data consists of a sign part S, an exponent part E, and a mantissa part M. Taking the 16-bit FP16 operation architecture as an example, the sign part S occupies 1 bit, the exponent part E occupies 8 bits, and the mantissa part M occupies 7 bits. The 7 bits of the mantissa part M, taken in order, together with S and E determine the value of the floating-point data; the symbolic expressions for the mantissa bits and for the resulting value are not reproduced in this text.

When the sign part S is 0, the value is positive; when the sign part S is 1, the value is negative. The expression for the range that the exponent part E can represent is not reproduced in this text. The mantissa part M can represent the range 1.0 to 1.9921875.
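The 1/8/7 field split above can be illustrated with a small decoder. This is only a sketch under stated assumptions: the function name `decode_s8e7m`, the S-E-M bit order within the 16-bit word, and the exponent bias of 127 (the value used by the bfloat16 format, which shares this field split) are not given in the text, and special values such as zero, infinities and NaN are ignored. The upper end of the stated mantissa range, 1.9921875, equals 1 + 127/128, which is what an implicit leading 1 plus 7 fraction bits yields.

```python
def decode_s8e7m(bits: int, bias: int = 127) -> float:
    """Decode a 16-bit word with 1 sign bit, 8 exponent bits, 7 mantissa bits.

    Assumptions (not stated in the source text): fields are ordered
    S, E, M from the most significant bit, and the exponent bias is 127.
    Special encodings (zero, subnormal, infinity, NaN) are not handled.
    """
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 7) & 0xFF
    mantissa = bits & 0x7F
    # Implicit leading 1 plus 7 fraction bits: 1.0 .. 1.9921875
    significand = 1.0 + mantissa / 128.0
    return (-1.0) ** sign * significand * 2.0 ** (exponent - bias)


if __name__ == "__main__":
    print(decode_s8e7m(0x3F80))  # exponent field 127, mantissa 0
    print(decode_s8e7m(0x3FFF))  # exponent field 127, mantissa all ones
```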

As shown in FIG. 1, both the input data IN and the weight data WT can use the FP16 operation architecture. Multiplying the input data IN by the weight data WT yields the product data ML, which also uses the FP16 operation architecture. When the input data IN and the weight data WT are multiplied, an addition operation is performed on the exponent parts E, a multiplication operation is performed on the mantissa parts M, and an exclusive-OR operation is performed on the sign parts S.
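The field-wise rule just described (add the exponents, multiply the mantissas, exclusive-OR the signs) can be sketched as follows. The helper names `multiply_fields` and `fields_to_float` are hypothetical. The sketch assumes unbiased exponents and 8-bit integer mantissas that already carry the implicit leading 1 (128 to 255, in units of 1/128). It does not renormalize or round the product back into a 7-bit mantissa.

```python
def multiply_fields(in_s, in_e, in_m, wt_s, wt_e, wt_m):
    """Multiply two values given as (sign, exponent, mantissa) fields.

    Assumptions: exponents are unbiased integers; mantissas are 8-bit
    integers including the implicit leading 1 (units of 1/128).
    """
    ml_s = in_s ^ wt_s   # sign part: exclusive-OR
    ml_e = in_e + wt_e   # exponent part: addition
    ml_m = in_m * wt_m   # mantissa part: multiplication (units of 1/128**2)
    return ml_s, ml_e, ml_m


def fields_to_float(s, e, m, frac_bits):
    """Convert fields to a Python float; frac_bits is the mantissa scale."""
    return (-1) ** s * m / (1 << frac_bits) * 2.0 ** e


if __name__ == "__main__":
    # 3.0 = +1.5 * 2**1 (mantissa 192), -5.0 = -1.25 * 2**2 (mantissa 160)
    s, e, m = multiply_fields(0, 1, 192, 1, 2, 160)
    print(fields_to_float(s, e, m, 14))  # product has 7 + 7 fraction bits
```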

Please refer to FIG. 2, which illustrates how floating-point data is stored according to an embodiment of the present disclosure. In one embodiment, the exponent part E, the sign part S and the mantissa part M of the floating-point data can be arranged in that order and stored in the memory.

Please refer to FIG. 3, which shows an architecture diagram of a floating-point computing-in-memory device 100 according to an embodiment of the present disclosure. The floating-point computing-in-memory device 100 includes an exponent storage operation module EP and a mantissa storage operation module MT. The exponent storage operation module EP stores and operates on the exponent part E (shown in FIG. 1) of the floating-point data; the mantissa storage operation module MT stores and operates on the mantissa part M (shown in FIG. 1) of the floating-point data.

The exponent storage operation module EP includes a plurality of weight exponent storage circuits SRE, a plurality of exponent operation circuits LCCE and a comparison circuit COMP. The mantissa storage operation module MT includes a bit shift circuit SHT, a plurality of weight sign storage circuits SRS, a plurality of sign operation circuits LCCS, a plurality of weight mantissa storage circuits SRM, a plurality of mantissa operation circuits LCCM, a shift and addition circuit SHTA and a summing circuit MSA.

The floating-point computing-in-memory device 100 integrates storage units (such as the weight exponent storage circuits SRE, the weight sign storage circuits SRS and the weight mantissa storage circuits SRM) with operation units (such as the exponent operation circuits LCCE, the comparison circuit COMP, the bit shift circuit SHT, the sign operation circuits LCCS, the mantissa operation circuits LCCM, the shift and addition circuit SHTA and the summing circuit MSA). Frequent input and output of data can therefore be avoided during floating-point operations, which gives the device the advantage of fast computation, reduces power consumption, and improves energy efficiency.

Please refer to FIG. 3 and FIG. 4 together. FIG. 4 shows the data flow of a floating-point operation performed by the floating-point computing-in-memory device 100 according to an embodiment of the present disclosure. The data flow includes the alignment AL of the exponent part E (shown in FIG. 1), the multiplication MLP of the mantissa part M (shown in FIG. 1), and the accumulation AC of the product data ML (shown in FIG. 1). The alignment AL of the exponent part E is performed by the exponent operation circuit LCCE and the comparison circuit COMP of the exponent storage operation module EP together with the bit shift circuit SHT of the mantissa storage operation module MT. The multiplication MLP of the mantissa part M is performed by the mantissa operation circuit LCCM and the shift and addition circuit SHTA of the mantissa storage operation module MT. The accumulation AC of the product results is performed by the summing circuit MSA of the mantissa storage operation module MT.

When floating-point data are multiplied, an addition operation is performed on the exponent parts E. As shown in FIG. 4, the exponent operation circuits LCCE respectively add the exponent parts IN_E of a plurality of input data IN to the exponent parts WT_E of a plurality of weight data WT to obtain a plurality of exponent product data ML_E. The exponent parts WT_E of the weight data WT are stored in the weight exponent storage circuits SRE of FIG. 3.

The comparison circuit COMP is connected to the exponent operation circuits LCCE. The comparison circuit COMP compares the exponent product data ML_E to obtain a maximum exponent product data ML_E_max.

The bit shift circuit SHT is connected to the exponent operation circuits LCCE and the comparison circuit COMP. According to the maximum exponent product data ML_E_max, the bit shift circuit SHT shifts the mantissa parts IN_M of the input data IN to obtain the shifted mantissa parts IN_M'. The mantissa parts WT_M of the weight data WT are stored in the weight mantissa storage circuits SRM of FIG. 3.

The mantissa operation circuit LCCM is connected to the bit shift circuit SHT. The mantissa operation circuit LCCM performs a multiplication operation on the shifted mantissa part IN_M' of the input data IN and the mantissa part WT_M of the weight data WT to obtain a plurality of mantissa product intermediate data ML_M_im. The mantissa product intermediate data ML_M_im are the point-wise products of each bit of the mantissa part WT_M with the mantissa part IN_M' in the multiplication operation.

The shift and addition circuit SHTA shifts the mantissa product intermediate data ML_M_im and then sums them to obtain the mantissa product data ML_M.

The sign operation circuit LCCS performs an exclusive-OR operation on the sign part IN_S of the input data IN and the sign part WT_S of the weight data WT to obtain the sign product data ML_S. The sign part WT_S of the weight data WT is stored in the weight sign storage circuit SRS of FIG. 3.

The summing circuit MSA integrates the sign product data ML_S, the maximum exponent product data ML_E_max and the mantissa product data ML_M to obtain an input and weight product sum data MAC.
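The alignment, multiplication and accumulation flow described above can be modeled behaviorally as below. This is a sketch, not the circuit itself: the function name `fp_mac`, the use of unbiased exponents, 8-bit mantissas with the implicit leading 1, and a truncating right shift for alignment are all assumptions; the text does not specify rounding or result normalization.

```python
def fp_mac(inputs, weights):
    """Behavioral model of alignment AL, multiplication MLP, accumulation AC.

    inputs, weights: lists of (sign, exponent, mantissa) tuples.
    Assumptions: unbiased exponents, 8-bit mantissas including the
    implicit leading 1 (units of 1/128), truncation when shifting.
    """
    # Exponent operation circuits: ML_E = IN_E + WT_E for each pair
    ml_e = [ie + we for (_, ie, _), (_, we, _) in zip(inputs, weights)]
    # Comparison circuit: largest exponent product
    ml_e_max = max(ml_e)

    acc = 0
    for (in_s, _, in_m), (wt_s, _, wt_m), e in zip(inputs, weights, ml_e):
        in_m_shifted = in_m >> (ml_e_max - e)  # bit shift circuit: IN_M -> IN_M'
        ml_m = in_m_shifted * wt_m             # mantissa product
        ml_s = in_s ^ wt_s                     # sign product
        acc += -ml_m if ml_s else ml_m         # signed accumulation

    # The product mantissa carries 7 + 7 = 14 fraction bits.
    return acc * 2.0 ** ml_e_max / (1 << 14)


if __name__ == "__main__":
    ins = [(0, 1, 192), (0, 0, 128)]   # 3.0 and 1.0
    wts = [(1, 2, 160), (0, 1, 128)]   # -5.0 and 2.0
    print(fp_mac(ins, wts))            # 3.0 * -5.0 + 1.0 * 2.0
```

Because every product is expressed relative to the single exponent ML_E_max, the accumulation reduces to integer addition; the cost is that low-order bits of inputs with small exponent products are dropped by the alignment shift.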

The detailed structure and operation of each component are described further below.

Please refer to FIG. 5, which shows a schematic diagram of the weight exponent storage circuit SRE and the exponent operation circuit LCCE according to an embodiment of the present disclosure. The weight exponent storage circuit SRE includes a plurality of static random-access memory (SRAM) cells SR. Each SRAM cell SR includes six transistors (i.e., 6T-SRAM). The weight exponent storage circuit SRE has, for example, global bit lines GBL<0>~GBL<7>, GBLB<0>~GBLB<7> and local bit lines LBL<0>~LBL<7>, LBLB<0>~LBLB<7>. One row of SRAM cells SR stores the exponent part WT_E of one set of weight data WT. When a row of SRAM cells SR is turned on, the exponent part WT_E of one set of weight data WT can be input to the exponent operation circuit LCCE via the local bit lines LBL<0>~LBL<7>.

The exponent operation circuit LCCE includes a plurality of switching and pre-charging circuits SAP and an adder AD. The switching and pre-charging circuits SAP are connected to the weight exponent storage circuit SRE and receive the exponent part WT_E of the weight data WT. The adder AD is connected to the switching and pre-charging circuits SAP to receive the exponent part WT_E of the weight data WT. The adder AD adds the exponent part IN_E of the input data IN to the exponent part WT_E of the weight data WT to obtain the exponent product data ML_E.

Please refer to FIG. 6, which shows a schematic diagram of the comparison circuit COMP according to an embodiment of the present disclosure. The comparison circuit COMP includes a plurality of comparators CP. Each comparator CP compares two of the exponent product data ML_E. Through hierarchical pairwise comparison, the maximum exponent product data ML_E_max is obtained.
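The hierarchical pairwise comparison can be pictured as a tournament in which each level halves the number of candidates. The following is a minimal sketch; the function name `hierarchical_max` and the handling of an odd number of candidates (the unpaired value advances unchanged) are assumptions, since the text does not say how many comparators or levels FIG. 6 uses.

```python
def hierarchical_max(values):
    """Find the maximum by repeated pairwise comparison (tournament tree)."""
    level = list(values)
    if not level:
        raise ValueError("at least one value is required")
    while len(level) > 1:
        # Each pair models one two-input comparator CP.
        nxt = [max(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # unpaired value advances to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    print(hierarchical_max([3, 1, 7, 5, 2]))
```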

Please refer to FIG. 7, which shows a schematic diagram of the comparator CP according to an embodiment of the present disclosure. The comparator CP of this embodiment includes a first judgment circuit CP1, a second judgment circuit CP2 and a third judgment circuit CP3. The first judgment circuit CP1 compares the front-segment bits A<0>~A<2> and B<0>~B<2> of the exponent product data ML_E. When the front-segment bit A<2> is compared with the front-segment bit B<2>, the exclusive-OR judging unit is activated by the enable signal EN and outputs the exclusive-OR result C<2>. From the exclusive-OR results C<2>~C<0>, the judging unit outputs either a judgment result AWIN or a judgment result BWIN. The judgment result AWIN indicates that the front-segment bits A<0>~A<2> are greater than the front-segment bits B<0>~B<2>. If the first judgment circuit CP1 can already decide which value is larger, the subsequent second judgment circuit CP2 and third judgment circuit CP3 need not be activated.

The second judgment circuit CP2 is connected to the first judgment circuit CP1. The second judgment circuit CP2 compares the middle-segment bits of the exponent product data ML_E. If the second judgment circuit CP2 can decide which value is larger, the subsequent third judgment circuit CP3 need not be activated.

The third judgment circuit CP3 is connected to the second judgment circuit CP2. The third judgment circuit CP3 compares the rear-segment bits of the exponent product data ML_E.

With the three-stage judgment circuit design of the comparator CP, many comparisons of exponent product data ML_E can skip activating the second judgment circuit CP2 and the third judgment circuit CP3, or skip activating the third judgment circuit CP3. Power consumption can therefore be greatly reduced and the comparison accelerated.
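The early-termination behavior of the three-stage comparator can be sketched as follows. The function name `staged_compare`, the segment widths of 3, 3 and 2 bits, an 8-bit operand width, and the assumption that the front segment holds the most significant bits are illustrative choices; the text gives only the 3-bit front segment explicitly.

```python
def staged_compare(a, b, widths=(3, 3, 2)):
    """Compare a and b segment by segment, starting from the front segment.

    Returns (winner, stages_used) where winner is "A", "B" or "EQ".
    Assumption: segments are taken from the most significant bits down,
    with hypothetical widths of 3, 3 and 2 bits.
    """
    shift = sum(widths)
    for stage, width in enumerate(widths, start=1):
        shift -= width
        mask = (1 << width) - 1
        seg_a = (a >> shift) & mask
        seg_b = (b >> shift) & mask
        if seg_a ^ seg_b:  # non-zero XOR: the segments differ, decide now
            return ("A" if seg_a > seg_b else "B"), stage
    return "EQ", len(widths)


if __name__ == "__main__":
    print(staged_compare(0b10100000, 0b01111111))  # decided in stage 1
    print(staged_compare(0b10100100, 0b10101000))  # decided in stage 2
```

The second element of the result shows how many stages had to be activated, which is the quantity the power saving depends on.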

Please refer to FIG. 8, which shows a schematic diagram of the bit shift circuit SHT according to an embodiment of the present disclosure. The bit shift circuit SHT includes a plurality of subtractors SB and a plurality of shifters SH. The subtractors SB are connected to the comparison circuit COMP. Each subtractor SB subtracts the exponent product data ML_E from the maximum exponent product data ML_E_max to obtain the shift amount data OF.

The shifters SH are connected to the subtractors SB. Each shifter SH shifts the mantissa part IN_M of the input data IN according to the shift amount data OF to obtain the shifted mantissa part IN_M'.
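The subtract-then-shift step can be written out directly. In this sketch the function name `align_mantissas` is hypothetical, and a truncating right shift is assumed because the text does not state the shift direction or any rounding; a right shift is the natural choice when aligning to the largest exponent.

```python
def align_mantissas(in_m, ml_e, ml_e_max):
    """Shift each input mantissa by OF = ML_E_max - ML_E.

    in_m: list of integer mantissas IN_M.
    ml_e: list of exponent products ML_E, one per mantissa.
    Assumption: right shift with truncation (no rounding).
    """
    offsets = [ml_e_max - e for e in ml_e]          # subtractors SB
    return [m >> of for m, of in zip(in_m, offsets)]  # shifters SH


if __name__ == "__main__":
    print(align_mantissas([192, 128, 255], [3, 1, 0], 3))
```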

Please refer to FIG. 9, which shows a schematic diagram of the mantissa operation circuit LCCM according to an embodiment of the present disclosure. The weight mantissa storage circuit SRM includes a plurality of static random-access memory (SRAM) cells SR. Each SRAM cell SR includes six transistors (i.e., 6T-SRAM). The weight mantissa storage circuit SRM has, for example, global bit lines GBL<0>~GBL<7>, GBLB<0>~GBLB<7> and local bit lines LBL<0>~LBL<7>, LBLB<0>~LBLB<7>. One row of SRAM cells SR stores the mantissa part WT_M of one set of weight data WT. When a row of SRAM cells SR is turned on, the mantissa part WT_M of one set of weight data WT can be input to the mantissa operation circuit LCCM via the local bit lines LBL<0>~LBL<7>.

The mantissa operation circuit LCCM includes a plurality of switching and pre-charging circuits SAP and a point-wise multiplier PWM. The switching and pre-charging circuits SAP are connected to the weight mantissa storage circuit SRM and receive the mantissa part WT_M of the weight data WT. The point-wise multiplier PWM is connected to the switching and pre-charging circuits SAP to receive the mantissa part WT_M of the weight data WT. The point-wise multiplier PWM performs a multiplication operation on the mantissa part IN_M of the input data IN and the mantissa part WT_M of the weight data WT to obtain the mantissa product data ML_M.

Please refer to FIG. 10, which shows a schematic diagram of the point-wise multiplier PWM according to an embodiment of the present disclosure. The point-wise multiplier PWM is composed of a plurality of transistors TR and TRB. The SRAM cells SR store the bit values of the mantissa part WT_M of the weight data WT. The leftmost SRAM cell SR corresponds, for example, to the most significant bit MSB[7], and the rightmost SRAM cell SR corresponds, for example, to the least significant bit LSB[0].

The bit lines BL0 to BL7 of the SRAM cells SR that store the mantissa part WT_M of the weight data WT are connected to the series-connected transistors TR. The bit lines BLB0 to BLB7 of the SRAM cells SR are connected to the series-connected transistors TRB. The two ends of each transistor TR are connected to the input terminals IN[0] to IN[7] and the output terminals OUT0[0] to OUT0[7]; the two ends of each transistor TRB are connected to the ground terminal GD and the output terminals OUT0[0] to OUT0[7]. The mantissa part IN_M of the input data IN is input through the input terminals IN[0] to IN[7].

According to the circuit structure of the point-wise multiplier PWM, when the mantissa part WT_M of the weight data WT supplies a 1 on the bit line BL7 and the mantissa part IN_M of the input data IN supplies a 1 at the input terminal IN[7], the output terminal OUT7[7] outputs 1. When WT_M supplies a 1 on the bit line BL7 and IN_M supplies a 0 at the input terminal IN[6], the output terminal OUT7[6] outputs 0. When WT_M supplies a 0 on the bit line BL0 and IN_M supplies a 0 at the input terminal IN[7], the output terminal OUT0[7] outputs 0. When WT_M supplies a 0 on the bit line BL0 and IN_M supplies a 1 at the input terminal IN[6], the output terminal OUT0[6] outputs 0.

With the circuit structure of the point-wise multiplier PWM described above, the point-wise product results of the mantissa part WT_M of the weight data WT and the mantissa part IN_M of the input data IN are obtained. These product results are the mantissa product intermediate data ML_M_im mentioned above.
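The four cases listed above match a logical AND of one weight bit and one input bit: the output is 1 only when both are 1. A behavioral sketch follows; the function name `pointwise_products`, the representation of each output row as an integer, and the mapping of row j to weight bit j are illustrative assumptions.

```python
def pointwise_products(in_m, wt_m, width=8):
    """Return one partial-product row per weight bit.

    Row j models the outputs OUTj[0..width-1]: bit i of row j is
    IN[i] AND WT[j]. Rows are returned unshifted, as integers.
    """
    rows = []
    for j in range(width):
        w_bit = (wt_m >> j) & 1
        # If the weight bit is 1 the input passes through (transistor TR);
        # otherwise the outputs are pulled to ground (transistor TRB).
        rows.append(in_m if w_bit else 0)
    return rows


if __name__ == "__main__":
    print(pointwise_products(0b11000000, 0b10100000))
```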

Please refer to FIG. 11, which illustrates the operation of the shift and addition circuit SHTA according to an embodiment of the present disclosure. The shift and addition circuit SHTA shifts the mantissa product intermediate data ML_M_im and then sums them to obtain the mantissa product data ML_M.
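Shifting each partial-product row by the position of its weight bit and summing the rows is ordinary binary long multiplication. A minimal sketch, assuming (hypothetically) that row j corresponds to weight bit j and is therefore shifted left by j positions; the function name `shift_and_add` is not from the source.

```python
def shift_and_add(rows):
    """Combine partial-product rows into a full product.

    rows[j] is the unshifted partial product for weight bit j
    (assumed ordering); it is weighted by 2**j before summation.
    """
    return sum(row << j for j, row in enumerate(rows))


if __name__ == "__main__":
    # Partial products of 192 with weight bits 5 and 7 set (weight = 160)
    print(shift_and_add([0, 0, 0, 0, 0, 192, 0, 192]))
```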

Please refer to FIG. 12, which shows a schematic diagram of the sign operation circuit LCCS according to an embodiment of the present disclosure. The weight sign storage circuit SRS includes a plurality of static random-access memory (SRAM) cells SR. Each SRAM cell SR includes six transistors (i.e., 6T-SRAM). The weight sign storage circuit SRS has, for example, global bit lines GBL<7>, GBLB<7> and local bit lines LBL<7>, LBLB<7>. One SRAM cell SR stores the sign part WT_S of one set of weight data WT. When an SRAM cell SR is turned on, the sign part WT_S of one set of weight data WT can be input to the sign operation circuit LCCS via the local bit line LBL<7>.

The sign operation circuit LCCS includes a switching and pre-charging circuit SAP and an exclusive-OR operator XOR.

The switching and pre-charging circuit SAP is connected to the weight sign storage circuit SRS and receives the sign part WT_S of the weight data WT. The exclusive-OR operator XOR is connected to the switching and pre-charging circuit SAP to receive the sign part WT_S of the weight data WT. The exclusive-OR operator XOR performs an exclusive-OR operation on the sign part IN_S of the input data IN and the sign part WT_S of the weight data WT to obtain the sign product data ML_S.

According to the above description, the floating-point computing-in-memory device 100 supports the FP16 operation architecture. In other embodiments, the floating-point computing-in-memory device 100 also supports the INT8 operation architecture. Please refer to FIG. 13, which illustrates the multiplication of integer data according to an embodiment of the present disclosure. The sign part S and the mantissa part M of the floating-point data together form the integer part INT of the integer data, and the exponent part E is not used. The integer part INT occupies 8 bits. The input data IN can represent the range 0 to 255. The weight data WT can represent the range -128 to 127. Multiplying the input data IN by the weight data WT yields the product data ML.
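The INT8 mode can be sketched as a multiply-accumulate over unsigned inputs and signed weights. The function name `int8_mac` is hypothetical, and the sketch assumes the weight's 8 bits are interpreted as two's complement, which is consistent with the stated range of -128 to 127 but is not spelled out in the text.

```python
def int8_mac(inputs, weight_bits):
    """Sum of products of unsigned 8-bit inputs and signed 8-bit weights.

    inputs: integers in 0..255.
    weight_bits: raw 8-bit patterns (0..255), assumed two's complement,
    so the top bit (the former sign part S) carries weight -128.
    """
    acc = 0
    for x, w_raw in zip(inputs, weight_bits):
        if not 0 <= x <= 255 or not 0 <= w_raw <= 255:
            raise ValueError("operands must fit in 8 bits")
        w = w_raw - 256 if w_raw & 0x80 else w_raw  # two's complement decode
        acc += x * w
    return acc


if __name__ == "__main__":
    print(int8_mac([255, 10], [0x80, 0x7F]))  # 255 * -128 + 10 * 127
```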

Please refer to FIG. 14, which illustrates the data flow when the floating-point computing-in-memory device 100 performs integer operations according to an embodiment of the present disclosure. This data flow includes a product step MLP and an accumulation step AC. The product MLP is performed by the mantissa operation circuit LCCM and the shift-and-addition circuit SHTA of the mantissa storage operation module MT. The accumulation AC of the product data ML is performed by the summing circuit MSA of the mantissa storage operation module MT. In this way, the floating-point computing-in-memory device 100 can also support the INT8 computing architecture.
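The product-then-accumulate flow can be sketched as follows. This is a behavioral illustration under stated assumptions, not the circuit: it assumes a bit-serial scheme in which each bit of the unsigned input gates the signed weight to form a partial product, which is then shifted by the bit position and added. The text does not state which operand is serialized, and the function names are hypothetical.

```python
from typing import Sequence


def shift_and_add_product(in_u8: int, wt_s8: int) -> int:
    """Multiply an unsigned 8-bit input by a signed weight via shift-and-add."""
    if not 0 <= in_u8 <= 255:
        raise ValueError("input data must be in 0..255")
    if not -128 <= wt_s8 <= 127:
        raise ValueError("weight data must be in -128..127")
    total = 0
    for bit in range(8):
        # Partial product: the weight if this input bit is 1, otherwise 0.
        partial = wt_s8 if (in_u8 >> bit) & 1 else 0
        # Shift by the bit position, then add.
        total += partial << bit
    return total


def int8_mac(inputs: Sequence[int], weights: Sequence[int]) -> int:
    """Accumulate the products of paired inputs and weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(shift_and_add_product(i, w) for i, w in zip(inputs, weights))
```

In this sketch `shift_and_add_product` plays the role attributed to LCCM and SHTA, and the final `sum` plays the role attributed to MSA.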

The above disclosure provides different features for implementing some embodiments or examples of the present disclosure. The specific examples of components and configurations described above (such as the values or names mentioned) serve to simplify and illustrate some embodiments of the present disclosure. These components and configurations are merely examples and are not intended to be limiting. In addition, some embodiments of the present disclosure may repeat reference symbols and/or letters across examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the embodiments and/or configurations discussed.

According to the above embodiments, the floating-point computing-in-memory device 100 integrates storage units (such as the weight exponent storage circuit SRE, the weight sign storage circuit SRS, and the weight mantissa storage circuit SRM) with operation units (such as the exponent operation circuit LCCE, the comparison circuit COMP, the bit shift circuit SHT, the sign operation circuit LCCS, the mantissa operation circuit LCCM, the shift-and-addition circuit SHTA, and the summing circuit MSA). Floating-point operations can therefore be performed without frequently moving data in and out, which speeds up computation, reduces power consumption, and improves energy efficiency.
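How these units cooperate can be sketched behaviorally in Python. This is a simplified model under several assumptions that are not taken from the source: each operand is a (sign, exponent, mantissa) triple whose value is (-1)^sign × mantissa × 2^exponent, exponent bias and the hidden bit are ignored, and alignment truncates the shifted-out bits. The name `fp_mac` and the type alias `Fp` are hypothetical.

```python
from typing import List, Tuple

# (sign, exponent, mantissa); value = (-1)**sign * mantissa * 2**exponent
Fp = Tuple[int, int, int]


def fp_mac(inputs: List[Fp], weights: List[Fp]) -> Tuple[int, int]:
    """Return (signed mantissa sum, shared exponent) for paired operands.

    The represented result is mantissa_sum * 2**shared_exponent.
    """
    if not inputs or len(inputs) != len(weights):
        raise ValueError("inputs and weights must be non-empty and equal in length")
    # Exponent path: add exponents (LCCE), then take the maximum (COMP).
    ml_e = [i[1] + w[1] for i, w in zip(inputs, weights)]
    ml_e_max = max(ml_e)
    acc = 0
    for (in_s, _, in_m), (wt_s, _, wt_m), e in zip(inputs, weights, ml_e):
        offset = ml_e_max - e            # shift amount (subtractor in SHT)
        in_m_shifted = in_m >> offset    # align input mantissa (shifter in SHT)
        ml_m = in_m_shifted * wt_m       # mantissa product (LCCM and SHTA)
        ml_s = in_s ^ wt_s               # sign product (LCCS)
        acc += -ml_m if ml_s else ml_m   # signed accumulation (MSA)
    return acc, ml_e_max
```

For example, with inputs `[(0, 2, 8), (1, 0, 12)]` and weights `[(0, 1, 3), (0, 1, 5)]`, the two products are 192 and -120, and the sketch returns `(9, 3)`, i.e. 9 × 2^3 = 72. When the alignment shift discards non-zero bits, the result is an approximation rather than the exact sum.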

The exponent storage operation module EP and/or the mantissa storage operation module MT proposed in the present disclosure both fall within the scope that the present disclosure seeks to protect. If the exponent storage operation module EP is implemented alone while the remaining parts use other circuit designs, this still does not depart from the spirit and scope of the present disclosure. Likewise, if the mantissa storage operation module MT is implemented alone while the remaining parts use other circuit designs, this still does not depart from the spirit and scope of the present disclosure.

In summary, although the present disclosure has been described above by way of embodiments, these embodiments are not intended to limit it. A person having ordinary skill in the art to which the present disclosure belongs may make various changes and modifications without departing from its spirit and scope. Therefore, the scope of protection of the present disclosure shall be as defined by the appended claims.

100: floating-point computing-in-memory device

AC: accumulation

AD: adder

AL: alignment

AWIN, BWIN: judgment results

A＜0＞, A＜1＞, A＜2＞, B＜0＞, B＜1＞, B＜2＞: leading bits

BL0, BLB0, BL7, BLB7: bit lines

COMP: comparison circuit

CP: comparator

CP1: first judgment circuit

CP2: second judgment circuit

CP3: third judgment circuit

C＜0＞, C＜1＞, C＜2＞: exclusive-OR results

E: exponent part

EN: enable signal

EP: exponent storage operation module

GBL＜0＞, GBL＜7＞, GBLB＜0＞, GBLB＜7＞: global bit lines

GD: ground terminal

IN: input data

INT: integer part

IN[0], IN[6], IN[7]: input terminals

IN_E: exponent part of the input data

IN_M: mantissa part of the input data

IN_M’: shifted mantissa part

IN_S: sign part of the input data

LBL＜0＞, LBL＜7＞, LBLB＜0＞, LBLB＜7＞: local bit lines

LCCE: exponent operation circuit

LCCM: mantissa operation circuit

LCCS: sign operation circuit

LSB[0]: least significant bit

M: mantissa part

: value

MAC: input-weight product-sum data

ML: product data

MLP: product

ML_E: exponent product data

ML_E_max: maximum exponent product data

ML_M: mantissa product data

ML_M_im: mantissa product intermediate data

ML_S: sign product data

MSA: summing circuit

MSB[7]: most significant bit

MT: mantissa storage operation module

OF: shift amount data

OUT0[0], OUT0[6], OUT0[7], OUT7[0], OUT7[6], OUT7[7]: output terminals

PWM: point-wise multiplier

S: sign part

SAP: switching and pre-charging circuit

SB: subtractor

SH: shifter

SHT: bit shift circuit

SHTA: shift-and-addition circuit

SR: static random-access memory

SRE: weight exponent storage circuit

SRM: weight mantissa storage circuit

SRS: weight sign storage circuit

TR, TRB: transistors

WT: weight data

WT_E: exponent part of the weight data

WT_M: mantissa part of the weight data

WT_S: sign part of the weight data

XOR: exclusive-OR operator

FIG. 1 illustrates the multiplication of floating-point data according to an embodiment of the present disclosure.

FIG. 2 illustrates how floating-point data is stored according to an embodiment of the present disclosure.

FIG. 3 is an architecture diagram of a floating-point computing-in-memory device according to an embodiment of the present disclosure.

FIG. 4 illustrates the data flow when the floating-point computing-in-memory device performs floating-point operations according to an embodiment of the present disclosure.

FIG. 5 is a schematic diagram of a weight exponent storage circuit and an exponent operation circuit according to an embodiment of the present disclosure.

FIG. 6 is a schematic diagram of a comparison circuit according to an embodiment of the present disclosure.

FIG. 7 is a schematic diagram of a comparator according to an embodiment of the present disclosure.

FIG. 8 is a schematic diagram of a bit shift circuit according to an embodiment of the present disclosure.

FIG. 9 is a schematic diagram of a mantissa operation circuit according to an embodiment of the present disclosure.

FIG. 10 is a schematic diagram of a point-wise multiplier according to an embodiment of the present disclosure.

FIG. 11 illustrates the operation of a shift-and-addition circuit according to an embodiment of the present disclosure.

FIG. 12 is a schematic diagram of a sign operation circuit according to an embodiment of the present disclosure.

FIG. 13 illustrates the multiplication of integer data according to an embodiment of the present disclosure.

FIG. 14 illustrates the data flow when the floating-point computing-in-memory device performs integer operations according to an embodiment of the present disclosure.

100: floating-point computing-in-memory device

COMP: comparison circuit

EP: exponent storage operation module

LCCE: exponent operation circuit

LCCM: mantissa operation circuit

LCCS: sign operation circuit

MSA: summing circuit

MT: mantissa storage operation module

SHT: bit shift circuit

SHTA: shift-and-addition circuit

SRE: weight exponent storage circuit

SRM: weight mantissa storage circuit

SRS: weight sign storage circuit

Claims

A floating-point computing-in-memory device, comprising: an exponent storage operation module, comprising: a plurality of weight exponent storage circuits for storing the exponent parts of a plurality of weight data; a plurality of exponent operation circuits for performing an addition operation on the exponent parts of a plurality of input data and the exponent parts of the weight data to obtain a plurality of exponent product data; and a comparison circuit for comparing the exponent product data to obtain a maximum exponent product data; and a mantissa storage operation module, comprising: a bit shift circuit for shifting the mantissa parts of the input data according to the maximum exponent product data; a plurality of weight mantissa storage circuits for storing the mantissa parts of the weight data; a plurality of mantissa operation circuits for performing a multiplication operation on the mantissa parts of the input data and the mantissa parts of the weight data to obtain a plurality of mantissa product intermediate data; a shift-and-addition circuit for shifting and then adding the mantissa product intermediate data to obtain a plurality of mantissa product data; a plurality of weight sign storage circuits for storing the sign parts of the weight data; a plurality of sign operation circuits for performing an exclusive-OR operation on the sign parts of the input data and the sign parts of the weight data to obtain a plurality of sign product data; and a summing circuit for integrating the sign product data, the maximum exponent product data and the mantissa product data to obtain an input-weight product-sum data.

The floating-point computing-in-memory device as described in claim 1, wherein each of the weight exponent storage circuits includes a plurality of static random-access memories (SRAMs).

The floating-point computing-in-memory device as described in claim 2, wherein each of the static random-access memories includes six transistors.

The floating-point computing-in-memory device as described in claim 1, wherein each of the exponent operation circuits comprises: a plurality of switching and pre-charging circuits connected to the weight exponent storage circuits and used to receive the exponent parts of the weight data; and an adder connected to the switching and pre-charging circuits and used to receive the exponent parts of the weight data and to perform the addition operation on the exponent parts of the input data and the exponent parts of the weight data to obtain the exponent product data.

The floating-point computing-in-memory device as described in claim 1, wherein the comparison circuit is connected to the exponent operation circuits, and the comparison circuit includes: a plurality of comparators, each for comparing two of the exponent product data.

The floating-point computing-in-memory device as described in claim 5, wherein each of the comparators comprises: a first judgment circuit for comparing the leading bits of the exponent product data; a second judgment circuit connected to the first judgment circuit and used to compare the middle bits of the exponent product data; and a third judgment circuit connected to the second judgment circuit and used to compare the trailing bits of the exponent product data.

The floating-point computing-in-memory device as described in claim 1, wherein the bit shift circuit includes: a plurality of subtractors connected to the comparison circuit and used to perform subtraction operations on the maximum exponent product data and the exponent product data to obtain a plurality of shift amount data; and a plurality of shifters connected to the subtractors and used to shift the mantissa parts of the input data according to the shift amount data.

The floating-point computing-in-memory device as described in claim 1, wherein each of the weight mantissa storage circuits includes a plurality of static random-access memories (SRAMs).

The floating-point computing-in-memory device as described in claim 1, wherein each of the mantissa operation circuits comprises: a plurality of switching and pre-charging circuits connected to the weight mantissa storage circuits and used to receive the mantissa parts of the weight data; and a point-wise multiplier connected to the switching and pre-charging circuits and used to receive the mantissa parts of the weight data and to perform the multiplication operation on the mantissa parts of the input data and the mantissa parts of the weight data to obtain the mantissa product data.

The floating-point computing-in-memory device as described in claim 1, wherein each of the weight sign storage circuits includes a plurality of static random-access memories (SRAMs).

The floating-point computing-in-memory device as described in claim 10, wherein each of the static random-access memories includes six transistors.

The floating-point computing-in-memory device as described in claim 1, wherein each of the sign operation circuits comprises: a switching and pre-charging circuit connected to the weight sign storage circuits and used to receive the sign parts of the weight data; and an exclusive-OR operator connected to the switching and pre-charging circuit and used to receive the sign parts of the weight data and to perform the exclusive-OR operation on the sign parts of the input data and the sign parts of the weight data to obtain the sign product data.