
CN114168107A - Vector matrix multiplication method with adjustable memory precision and arithmetic unit - Google Patents

Info

Publication number
CN114168107A
Authority
CN
China
Prior art keywords
vector
binary
matrix
data
order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111331694.3A
Other languages
Chinese (zh)
Other versions
CN114168107B (en
Inventor
缪向水
李健聪
李祎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202111331694.3A priority Critical patent/CN114168107B/en
Publication of CN114168107A publication Critical patent/CN114168107A/en
Application granted granted Critical
Publication of CN114168107B publication Critical patent/CN114168107B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52Multiplying; Dividing
    • G06F7/523Multiplying only
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491Computations with decimal numbers radix 12 or 20.
    • G06F7/498Computations with decimal numbers radix 12 or 20. using counter-type accumulators
    • G06F7/4981Adding; Subtracting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a vector-matrix multiplication method and arithmetic unit with adjustable in-memory precision, comprising: based on a splitting and conversion rule for operation data, splitting the storage matrix into 1 sign bit matrix, p_1 − 1 high-order matrices and k_1 low-order matrices, and storing them, in order from high to low, in the corresponding nonvolatile memory arrays, of which p_1 + k_1 are used for the operation; based on the same rule, splitting the input vector into 1 sign bit vector, p_2 − 1 high-order vectors and k_2 low-order vectors, and inputting them in turn into the p_1 + k_1 nonvolatile memory arrays, so that the sign bit vector, the high-order vectors and the low-order vectors are multiplied in turn by the p_1 + k_1 matrices; and shifting and accumulating the multiplication results according to binary arithmetic rules to obtain the vector-matrix multiplication result of the input vector and the storage matrix. The numbers of high-order and low-order digits are adjusted according to the required calculation precision, thereby realizing variable-precision vector-matrix operation.

Figure 202111331694

Description

Vector matrix multiplication method with adjustable memory precision and arithmetic unit
Technical Field
The invention belongs to the field of analog circuits, and particularly relates to a vector matrix multiplication method with adjustable memory precision and an arithmetic unit.
Background
With the explosive growth of data volume in the era of artificial intelligence, traditional von Neumann computers can no longer meet the pressing demands of the information age for computing power and energy efficiency. In-memory computing is gradually showing its value as a computing paradigm with high energy efficiency, low latency and low time complexity. In particular, in-memory vector-matrix operation based on nonvolatile memory has shown large advantages in energy efficiency and computing power over traditional digital computers in data-intensive tasks such as artificial intelligence and scientific computing.
However, as an analog computing method, in-memory vector-matrix operation based on nonvolatile memory has very limited calculation precision because of non-ideal device effects. A vector-matrix operation on a single memory array can only provide low-precision results, while expanding the precision with multiple arrays tends to reduce the energy efficiency exponentially. Therefore, to make in-memory computing more widely applicable, a matrix operation core with adjustable in-memory precision and high energy efficiency is urgently needed.
Disclosure of Invention
Aiming at the defects or the improvement requirements of the prior art, the invention provides a vector matrix multiplication method with adjustable memory precision and an arithmetic unit, which are used for realizing variable-precision vector-matrix operation and solving the technical problem that the conventional memristive vector-matrix operation unit cannot realize precision-adjustable calculation aiming at different applications.
In order to achieve the above object, in a first aspect, the present invention provides a vector matrix multiplication method with adjustable memory precision, including the following steps:
S1, converting each operation data in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting and conversion rule of the operation data, to obtain the converted storage matrix; grouping the mixed-radix data in the converted storage matrix digit by digit to obtain 1 sign bit matrix, p_1 − 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix, and the number of nonvolatile memory arrays used for the operation is p_1 + k_1; p_1 and k_1 are adjustable according to the required calculation precision;
S2, converting each operation data in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting and conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector digit by digit to obtain 1 sign bit vector, p_2 − 1 high-order vectors and k_2 low-order vectors;
S3, inputting the sign bit vector, the high-order vectors and the low-order vectors in turn into the p_1 + k_1 nonvolatile memory arrays, thereby realizing in turn the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p_1 + k_1 matrices;
s4, shifting and accumulating the obtained multiplication result based on the binary operation rule to obtain a vector matrix multiplication result of the input vector and the storage matrix;
the splitting and conversion rule of the operation data is as follows: taking the two's complement of the operation data to obtain an m-bit two's complement representation; splitting the first p bits of the representation bit by bit to obtain 1 sign bit data and p − 1 high-order data; splitting the remaining m − p bits, in order, into k binary segments of lengths m_1, m_2, …, m_k, and converting each segment into the corresponding decimal number to obtain k low-order data; the 1 sign bit data, p − 1 high-order data and k low-order data, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein,
$\sum_{i=1}^{k} m_i = m - p$
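As an illustration of the splitting and conversion rule, the following Python sketch converts one signed integer into its digits. The function name `split_operand` and its argument layout are hypothetical and not part of the disclosure; the sketch assumes the operand is an integer that fits in m bits and that every low-order segment has a positive length.

```python
def split_operand(value, m, p, seg_lengths):
    """Return [sign, high_1..high_{p-1}, low_1..low_k] for one operand.

    Hypothetical helper: value is a signed integer representable in m-bit
    two's complement, and seg_lengths = [m_1, ..., m_k] must sum to m - p.
    """
    if sum(seg_lengths) != m - p or any(n <= 0 for n in seg_lengths):
        raise ValueError("segment lengths must be positive and sum to m - p")
    if not -(1 << (m - 1)) <= value < (1 << (m - 1)):
        raise ValueError("value does not fit in m bits")

    code = value & ((1 << m) - 1)  # m-bit two's complement code

    # First p bits (sign bit, then p-1 high-order bits), one binary digit each.
    digits = [(code >> (m - 1 - i)) & 1 for i in range(p)]

    # Remaining m-p bits, cut into k segments and read as unsigned integers.
    remaining = m - p
    for length in seg_lengths:
        remaining -= length
        digits.append((code >> remaining) & ((1 << length) - 1))
    return digits


print(split_operand(77, 8, 4, [1, 3]))  # 77 = 0b01001101
print(split_operand(-3, 8, 4, [1, 3]))  # -3 = 0b11111101 in 8-bit two's complement
```

With m = 8, p = 4 and segment lengths [1, 3], each operand becomes six digits: four binary digits followed by one 1-bit and one 3-bit digit.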
further preferably, the nonvolatile memory array is in a cross structure, and the nonvolatile memory devices are located at the cross points; the nonvolatile memory device is used for carrying out binary storage or multi-value storage; recording a nonvolatile memory array of the nonvolatile memory device based on binary storage as a binary nonvolatile memory array; recording a nonvolatile storage array of the nonvolatile storage device based on multi-value storage as a multi-value nonvolatile storage array;
the nonvolatile memory array comprises a binary nonvolatile memory array and a multivalued nonvolatile memory array;
the binary nonvolatile memory array is used for storing a sign bit matrix and a high bit matrix;
the multivalued nonvolatile memory array is used for storing a low-order matrix.
In a second aspect, the present invention provides a vector matrix multiplication operator with adjustable memory precision, including: the device comprises an external input module, a memory computing module and a shift accumulation module;
the in-memory computing module includes p_1 + k_1 nonvolatile memory arrays, wherein p_1 and k_1 are adjustable according to the required calculation precision;
the external input module is used for converting each operation data in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting and conversion rule of the operation data, to obtain the converted storage matrix; grouping the mixed-radix data in the converted storage matrix digit by digit to obtain 1 sign bit matrix, p_1 − 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix;
the external input module is also used for converting each operation data in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting and conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector digit by digit to obtain 1 sign bit vector, p_2 − 1 high-order vectors and k_2 low-order vectors, which are input in turn into the p_1 + k_1 nonvolatile memory arrays;
the in-memory computing module is used for realizing in turn, based on the nonvolatile memory arrays, the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p_1 + k_1 matrices;
the shift accumulation module is used for shifting and accumulating the obtained multiplication operation result based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix;
the splitting and conversion rule of the operation data is as follows: taking the two's complement of the operation data to obtain an m-bit two's complement representation; splitting the first p bits of the representation bit by bit to obtain 1 sign bit data and p − 1 high-order data; splitting the remaining m − p bits, in order, into k binary segments of lengths m_1, m_2, …, m_k, and converting each segment into the corresponding decimal number to obtain k low-order data; the 1 sign bit data, p − 1 high-order data and k low-order data, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein,
$\sum_{i=1}^{k} m_i = m - p$
further preferably, the nonvolatile memory array is in a cross structure, and the nonvolatile memory devices are located at the cross points; the nonvolatile memory device is used for carrying out binary storage or multi-value storage; recording a nonvolatile memory array of the nonvolatile memory device based on binary storage as a binary nonvolatile memory array; recording a nonvolatile storage array of the nonvolatile storage device based on multi-value storage as a multi-value nonvolatile storage array;
the nonvolatile memory array comprises a binary nonvolatile memory array and a multivalued nonvolatile memory array;
the binary nonvolatile memory array is used for storing a sign bit matrix and a high bit matrix;
the multivalued nonvolatile memory array is used for storing a low-order matrix.
Further preferably, the nonvolatile memory device includes: resistive random access memory, phase change memory, NOR-FLASH, spin transfer torque magnetic memory or ferroelectric field effect transistor.
Further preferably, the memory computing module includes a binary operation unit and a multivalued operation unit;
the binary operation unit comprises p1A binary operator; the binary arithmetic unit comprises a binary nonvolatile memory array and a first peripheral circuit; the first peripheral circuit comprises a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output end of the first digital-to-analog conversion module is connected with the input end of the binary nonvolatile storage array; the output end of the binary nonvolatile storage array is connected with the input end of the first analog-to-digital conversion module; the first digital-to-analog conversion module is used for inputting the symbol bit vector, the high bit vector or the low bit vector into the binary nonvolatile memory array in a voltage mode; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on a multiplication operation result which is output by the binary nonvolatile memory array and is characterized by current;
the multivalued operation unit includes k1A plurality of multi-value operators; the multi-value arithmetic unit comprises a multi-value nonvolatile memory array and a second peripheral circuit; the second peripheral circuit comprises a second digital-to-analog conversion module, a difference module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected with the input end of the multi-value nonvolatile storage array; the output end of the multi-value nonvolatile memory array is connected with the input end of the differential module, and the output end of the differential module is connected with the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for inputting the symbol bit vector, the high bit vector or the low bit vector into the multi-value nonvolatile memory array in a voltage mode; the output ends of the multi-value nonvolatile memory array are divided into a group two by two, and the differential module is used for carrying out differential operation on each group of output of the multi-value nonvolatile memory array; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the difference operation result input by the difference module.
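The signal path of the two kinds of operators can be sketched numerically. The snippet below is a minimal idealized model and not the disclosed circuit: it assumes an ideal crossbar in which each column current is the sum of row voltage times device conductance, uses integer values in arbitrary units, and omits the transimpedance amplifiers and converters. Pairing adjacent columns (0 with 1, 2 with 3, and so on) and subtracting the second from the first are assumptions, since the text only states that the outputs are grouped two by two. The function names are hypothetical.

```python
def crossbar_currents(voltages, conductances):
    """Ideal crossbar: column current I_j = sum_i V_i * G_ij (arbitrary units)."""
    cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(cols)
    ]


def differential_pairs(currents):
    """Subtract adjacent column outputs: I_0 - I_1, I_2 - I_3, ... (assumed pairing)."""
    if len(currents) % 2:
        raise ValueError("an even number of columns is required")
    return [currents[j] - currents[j + 1] for j in range(0, len(currents), 2)]


# Row voltages encode one input slice; conductances encode one stored matrix.
voltages = [1, 0, 1]
conductances = [
    [5, 1, 2, 0],
    [3, 3, 3, 3],
    [0, 4, 7, 2],
]

currents = crossbar_currents(voltages, conductances)
print(currents)                      # column outputs of the array
print(differential_pairs(currents))  # outputs after the differential step
```

A binary operator would stop at the column currents, while a multi-value operator would pass them through the differential step before conversion.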
Further preferably, the first digital-to-analog conversion module comprises a plurality of digital-to-analog converters whose output ends are connected one-to-one with the input ends of the rows participating in the operation in the binary nonvolatile memory array; the first analog-to-digital conversion module comprises a plurality of groups of cascaded transimpedance amplifiers and analog-to-digital converters, whose input ends are connected one-to-one with the output ends of the columns participating in the operation in the binary nonvolatile memory array.
Further preferably, the second digital-to-analog conversion module comprises a plurality of digital-to-analog converters whose output ends are connected one-to-one with the input ends of the rows participating in the operation in the multi-valued nonvolatile memory array; the differential module comprises a plurality of differential units, whose input ends are connected one-to-one with the output groups of the multi-valued nonvolatile memory array; each differential unit comprises two transimpedance amplifiers and a voltage subtracter, so that the two outputs of each group are amplified by the transimpedance amplifiers and then subtracted by the voltage subtracter; the second analog-to-digital conversion module comprises a plurality of analog-to-digital converters connected one-to-one with the output ends of the differential units.
Further preferably, the shift accumulation module comprises a first shift accumulation unit, a second shift accumulation unit and a third shift accumulation unit;
the input end of the first shift accumulation unit is connected with the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected with the output end of each multi-value operator in the multi-value operation unit; the input end of the third shift accumulation unit is respectively connected with the output ends of the first shift accumulation unit and the second shift accumulation unit;
the first shift accumulation unit is used for carrying out shift and accumulation operation on multiplication operation results output by each binary operator based on a binary operation rule;
the second shift accumulation unit is used for carrying out shift and accumulation operation on the multiplication operation result output by each multi-value operator based on the binary operation rule;
the third shift accumulation unit is used for shifting and accumulating the outputs of the first shift accumulation unit and the second shift accumulation unit based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix.
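The three-unit arrangement can be illustrated with plain integer arithmetic. The sketch below is an assumption-laden example rather than the disclosed circuit: it handles a single input slice, hard-codes the operator outputs for an 8-bit matrix split with p_1 = 4 and low-order segment lengths [1, 3], and treats the sign bit plane as carrying a negative weight, as two's complement requires. The weighting of different input slices by their own significance is omitted. The name `shift_accumulate` is hypothetical.

```python
def shift_accumulate(partials, shifts, negate_first=False):
    """Left-shift each partial result vector by its bit offset and sum them.

    negate_first subtracts the first vector, which models the negative
    weight of the sign bit plane in two's complement.
    """
    total = [0] * len(partials[0])
    for idx, (vec, shift) in enumerate(zip(partials, shifts)):
        sign = -1 if (negate_first and idx == 0) else 1
        for j, v in enumerate(vec):
            total[j] += sign * (v << shift)
    return total


# Outputs of the 4 binary operators (sign plane first) for input slice [1, 1]
# and the example matrix [[77, -3], [5, -128]].
binary_out = [[0, 2], [1, 1], [0, 1], [0, 1]]
# Outputs of the 2 multi-value operators (1-bit and 3-bit digit planes).
multi_out = [[1, 1], [10, 5]]

stage1 = shift_accumulate(binary_out, [7, 6, 5, 4], negate_first=True)
stage2 = shift_accumulate(multi_out, [3, 0])
stage3 = [a + b for a, b in zip(stage1, stage2)]

print(stage1, stage2, stage3)
```

The final vector equals the direct product of [1, 1] with the example matrix, which is the check one would expect the digital accumulation stage to pass.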
Further preferably, the vector matrix multiplication operator with adjustable memory precision further includes: a control module;
the control module is respectively connected with the external input module, the memory calculation module and the shift accumulation module and is used for controlling the working time sequence of the external input module, the memory calculation module and the shift accumulation module in the operation process.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. the invention provides a memory precision adjustable vector matrix multiplication method and an arithmetic unit, which are characterized in that after an input vector participating in operation and operation data in a storage matrix are split and converted into sign bit data, high bit data and low bit data based on a splitting and converting rule of the operation data, vector matrix multiplication operation is carried out based on a nonvolatile storage array, wherein the bit number of the high bit data and the low bit data is adjusted according to required calculation precision, variable-precision vector-matrix operation is realized, and the technical problem that the existing memristive vector-matrix operation unit cannot realize precision adjustable calculation aiming at different applications is solved.
2. The vector matrix multiplication method and arithmetic unit with adjustable memory precision provided by the invention perform the vector-matrix multiplication on binary nonvolatile memory arrays and multi-valued nonvolatile memory arrays: the binary arrays store the sign bit matrix and the high-order matrices, and the multi-valued arrays store the low-order matrices. For in-memory computing based on nonvolatile memory, binary computation is markedly more accurate than multi-valued computation, while multi-valued computation is markedly more energy-efficient than binary computation. The invention therefore has the data bits that strongly affect the result computed by the binary units, to preserve calculation precision, and the data bits that have little effect on the calculation result computed by the multi-valued units, which effectively reduces the influence of non-ideal device effects on the result while preserving energy efficiency, realizing precision-adjustable in-memory computing with high robustness and high energy efficiency.
3. The vector matrix multiplication method and the arithmetic unit with adjustable memory precision can enable more high-order data to be operated in a binary nonvolatile memory array when the calculation precision needs to be ensured; when the calculation energy efficiency needs to be ensured, more low-bit data can be operated in the multi-valued nonvolatile memory array; the calculation precision requirement and the calculation energy efficiency requirement can be well balanced.
4. The vector matrix multiplication arithmetic unit with adjustable memory precision provided by the invention carries out shift and accumulation operation on multiplication operation results output by the nonvolatile memory array in a grading way, and because the digital operation unit has very high operation precision and operation robustness, the digital operation unit is used for carrying out shift accumulation operation, so that extra calculation errors are not introduced in the shift accumulation process, and the accuracy of the shift accumulation results is ensured.
5. The arithmetic unit built on the vector matrix multiplication method with adjustable memory precision has good generality: owing to its adjustable precision, it can support the vector-matrix multiplication in workloads such as machine learning and scientific computing, and it has significant advantages in computing power and power consumption over conventional computing systems.
Drawings
Fig. 1 is a flowchart of a memory-precision-adjustable vector matrix multiplication method according to embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of weight slicing of the storage matrix according to embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of weight slicing of the input vector according to embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the stage-by-stage shift and accumulation performed on the multiplication results according to embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of a nonvolatile memory array according to embodiment 2 of the present invention;
Fig. 6 is a schematic structural diagram of an in-memory computing module according to embodiment 2 of the present invention;
Fig. 7 is a schematic structural diagram of a binary operator according to embodiment 2 of the present invention;
Fig. 8 is a schematic structural diagram of a multi-value operator according to embodiment 2 of the present invention;
Fig. 9 is a schematic structural diagram of a memory-precision-adjustable vector matrix multiplication operator according to embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
A vector matrix multiplication method with adjustable memory precision, as shown in fig. 1, includes the following steps:
S1, converting each operation data in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting and conversion rule of the operation data, to obtain the converted storage matrix; grouping the mixed-radix data in the converted storage matrix digit by digit to obtain 1 sign bit matrix, p_1 − 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix, and the number of nonvolatile memory arrays used for the operation is p_1 + k_1; p_1 and k_1 are integers adjustable according to the required calculation precision; for example, when executing a machine learning algorithm, the required calculation precision is 8-bit and one can take p_1 = k_1 = […]; when performing a scientific computing task, the required calculation precision is 16-bit and one can take p_1 = k_1 = 4.
The splitting and conversion rule of the operation data is as follows: taking the two's complement of the operation data to obtain an m-bit two's complement representation; splitting the first p bits of the representation (the first bit being the sign bit) bit by bit to obtain 1 sign bit data and p − 1 high-order data; splitting the remaining m − p bits, in order, into k binary segments of lengths m_1, m_2, …, m_k, and converting each segment into the corresponding decimal number to obtain k low-order data; the 1 sign bit data, p − 1 high-order data and k low-order data, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein,
$\sum_{i=1}^{k} m_i = m - p$
Here, p and k are integers adjustable according to the required calculation precision. It should be noted that, in the splitting process, the sign bit and bits 2 to p, which significantly affect the calculation result, are all split into individual binary digits, while the remaining m − p bits are split in order into k binary segments of lengths m_1, m_2, …, m_k. In general, the nearer a segment is to the high-order end, the shorter it is; in this embodiment, m_1, m_2, …, m_k increase from left to right. The values of p and k can be adjusted according to the required calculation precision.
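To make the effect of p and the segment lengths concrete, the following sketch lists how many distinct values each digit position can take, which is the number of states a device would need if each stored digit were mapped onto a single device. That one-digit-per-device mapping is an assumption made for illustration (it ignores any differential encoding), and `storage_plan` is a hypothetical name. The counts follow directly from the rule: a binary digit has 2 values and a segment of length m_j has 2^(m_j) values.

```python
def storage_plan(m, p, seg_lengths):
    """List (kind, number_of_values) for every digit position, high to low."""
    if sum(seg_lengths) != m - p:
        raise ValueError("segment lengths must sum to m - p")
    plan = [("sign", 2)] + [("high", 2)] * (p - 1)
    plan += [("low", 1 << n) for n in seg_lengths]  # m_j bits -> 2**m_j values
    return plan


# 8-bit data, 4 binary digits, low-order segments of 1 and 3 bits.
for kind, values in storage_plan(8, 4, [1, 3]):
    print(kind, values)
```

Raising p moves more bits into two-state digits at the cost of more arrays, while longer low-order segments need fewer arrays but more states per digit.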
Specifically, as shown in Fig. 2, the mixed-radix data of the storage matrix comprises 1 sign bit data, p_1 − 1 high-order data and k_1 low-order data, arranged from high to low; the sign bit data and the high-order data are binary data obtained by splitting the first p_1 bits of the two's complement representation of the operation data bit by bit; the low-order data are decimal data obtained by splitting the last m − p_1 bits of the two's complement representation, in order, into k_1 binary segments of lengths m_1, m_2, …, m_{k_1} and converting each segment to decimal; wherein,
$\sum_{i=1}^{k_1} m_i = m - p_1$
and m is the length of the two's complement representation.
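The slicing of a storage matrix into p_1 + k_1 digit planes can be sketched as follows. This is a minimal illustration under stated assumptions: matrix entries are signed integers that fit in m bits, and the helper names `split_operand` and `slice_matrix` are hypothetical. Each plane is what one nonvolatile memory array would hold.

```python
def split_operand(value, m, p, seg_lengths):
    """Digits of one operand: p binary digits, then one digit per segment."""
    code = value & ((1 << m) - 1)  # m-bit two's complement code
    digits = [(code >> (m - 1 - i)) & 1 for i in range(p)]
    remaining = m - p
    for length in seg_lengths:
        remaining -= length
        digits.append((code >> remaining) & ((1 << length) - 1))
    return digits


def slice_matrix(matrix, m, p, seg_lengths):
    """Return p + k digit planes, ordered from the sign plane to the lowest plane."""
    n_planes = p + len(seg_lengths)
    planes = [[[0] * len(row) for row in matrix] for _ in range(n_planes)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            for t, digit in enumerate(split_operand(value, m, p, seg_lengths)):
                planes[t][i][j] = digit
    return planes


W = [[77, -3], [5, -128]]
planes = slice_matrix(W, m=8, p=4, seg_lengths=[1, 3])
for t, plane in enumerate(planes):
    print(t, plane)
```

Plane 0 is the sign bit matrix, planes 1 to 3 are the high-order matrices, and planes 4 and 5 are the low-order matrices for this choice of parameters.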
S2, converting each operation data in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting and conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector digit by digit to obtain 1 sign bit vector, p_2 − 1 high-order vectors and k_2 low-order vectors; wherein p_2 and k_2 are integers adjustable according to the required calculation precision; likewise, for example, when executing a machine learning algorithm, the required calculation precision is 8-bit and one can take p_2 = k_2 = […]; when performing a scientific computing task, the required calculation precision is 16-bit and one can take p_2 = k_2 = 4.
Specifically, as shown in Fig. 3, the operation data in the input vector are converted by the same conversion method as the operation data in the storage matrix to obtain mixed-radix data of length p_2 + k_2, giving the converted input vector.
It should be noted that the input vector and the storage matrix may or may not have the same precision.
S3, inputting the sign bit vector, the high-order vectors and the low-order vectors in turn into the p_1 + k_1 nonvolatile memory arrays, thereby realizing in turn the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p_1 + k_1 matrices;
specifically, in this embodiment, the 1 sign bit vector, p_2 − 1 high-order vectors and k_2 low-order vectors corresponding to the input vector are arranged from high to low in their original splitting order and applied in that order. The sign bit vector is input simultaneously into the p_1 + k_1 nonvolatile memory arrays storing the 1 sign bit matrix, p_1 − 1 high-order matrices and k_1 low-order matrices, realizing the multiplication of the sign bit vector with the p_1 + k_1 matrices; the a-th (a = 1, 2, …, p_2 − 1) high-order vector is input simultaneously into the same p_1 + k_1 nonvolatile memory arrays, realizing the multiplication of that high-order vector with the p_1 + k_1 matrices; and the b-th (b = 1, 2, …, k_2) low-order vector is input simultaneously into the same p_1 + k_1 nonvolatile memory arrays, realizing the multiplication of that low-order vector with the p_1 + k_1 matrices.
S4, shifting and accumulating the obtained multiplication result based on the binary operation rule to obtain a vector matrix multiplication result of the input vector and the storage matrix;
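Steps S1 to S4 can be checked numerically with a small software model. The sketch below is an assumption-laden illustration, not the patented circuit: every name (`digit_weights`, `split_digits`, `vmm_by_planes`) is hypothetical, each array is modelled as an exact integer dot product, and the "shift" is modelled as multiplication by a signed power of two (the sign-bit plane carries the negative weight of two's complement).

```python
def digit_weights(m, p, chunk_lengths):
    """Positional weight of each mixed-radix digit (sign bit is negative)."""
    weights = [-(1 << (m - 1))]
    weights += [1 << (m - 1 - i) for i in range(1, p)]
    pos = m - p
    for length in chunk_lengths:
        pos -= length
        weights.append(1 << pos)
    return weights


def split_digits(x, m, p, chunk_lengths):
    """Mixed-radix digits of x: p leading bits, then one digit per group."""
    code = x & ((1 << m) - 1)                      # m-bit two's complement
    digits = [(code >> (m - 1 - i)) & 1 for i in range(p)]
    pos = m - p
    for length in chunk_lengths:
        pos -= length
        digits.append((code >> pos) & ((1 << length) - 1))
    return digits


def vmm_by_planes(vec, mat, vec_cfg, mat_cfg):
    """Multiply every input plane with every matrix plane, then shift and add."""
    vec_digits = [split_digits(x, *vec_cfg) for x in vec]                  # [row][a]
    mat_digits = [[split_digits(w, *mat_cfg) for w in row] for row in mat]  # [row][col][b]
    n_cols = len(mat[0])
    result = [0] * n_cols
    for a, weight_a in enumerate(digit_weights(*vec_cfg)):       # S3: planes in order
        for b, weight_b in enumerate(digit_weights(*mat_cfg)):   # one array per b
            for j in range(n_cols):
                partial = sum(vec_digits[i][a] * mat_digits[i][j][b]
                              for i in range(len(vec)))
                result[j] += weight_a * weight_b * partial       # S4: shift, accumulate
    return result


vec = [3, -5, 7]
mat = [[1, -2], [4, 0], [-8, 6]]
# Input and matrix may use different splits: (m, p, group lengths).
print(vmm_by_planes(vec, mat, (8, 2, [3, 3]), (8, 4, [2, 2])))  # [-73, 36]
```

The result matches the direct product of the vector and the matrix, which illustrates why the input vector and the memory matrix are free to use different values of p and k.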
furthermore, the nonvolatile memory array has a crossbar (cross-point) structure, with a nonvolatile memory device located at each cross point; the nonvolatile memory device is used for binary storage or multi-valued storage; a nonvolatile memory array whose devices use binary storage is referred to as a binary nonvolatile memory array; a nonvolatile memory array whose devices use multi-valued storage is referred to as a multi-valued nonvolatile memory array;
in an optional embodiment, the nonvolatile memory arrays include binary nonvolatile memory arrays and multi-valued nonvolatile memory arrays; there are p1 binary nonvolatile memory arrays, which store the sign-bit matrix and the high-order matrices; there are k1 multi-valued nonvolatile memory arrays, which store the low-order matrices.
Further, in an optional embodiment, as shown in fig. 4, the present invention performs shift and accumulation operations on the multiplication result in stages, and specifically, the shift and accumulation operations may be performed on the multiplication result output by the binary nonvolatile memory array and the multiplication result output by the multivalued nonvolatile memory array, respectively, and then the shift and accumulation operations may be further performed on the result obtained by the shift and accumulation operations.
It should be noted that the vector matrix multiplication method with adjustable in-memory precision can balance the requirement for calculation precision against the requirement for calculation energy efficiency: when calculation precision must be ensured, more high-order data are operated on in the binary nonvolatile memory arrays; when calculation energy efficiency must be ensured, more low-order data are operated on in the multi-valued nonvolatile memory arrays. Therefore, according to the user's precision requirement, performing the operation in binary nonvolatile memory arrays ensures the operation precision, and performing it in multi-valued nonvolatile memory arrays ensures the operation energy efficiency; this can be achieved by adjusting the values of p1 and k1.
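To make the trade-off concrete, the sketch below tabulates, for a given split, how many arrays of each kind are needed and how many distinct levels each multi-valued device would have to hold. It assumes that a low-order group of n bits requires 2^n distinguishable levels; that assumption, and the name `array_budget`, are introduced here for illustration only.

```python
def array_budget(m, p1, chunk_lengths):
    """Array counts and per-device level counts for one choice of p1 and k1."""
    if p1 < 1 or sum(chunk_lengths) != m - p1:
        raise ValueError("need p1 >= 1 and sum(chunk_lengths) == m - p1")
    return {
        "binary_arrays": p1,                        # sign bit + high-order bits
        "multi_valued_arrays": len(chunk_lengths),  # k1 low-order groups
        "levels_per_device": [1 << n for n in chunk_lengths],
    }


# Three ways to store 8-bit data, from precision-oriented to efficiency-oriented.
for p1, groups in [(8, []), (2, [3, 3]), (1, [7])]:
    print(p1, groups, array_budget(8, p1, groups))
```

With p1 = 8 every bit sits in its own binary array (8 arrays, 2 levels each); with p1 = 1 and a single 7-bit group only 2 arrays are used, but the multi-valued devices would need 128 levels.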
Embodiment 2
A vector matrix multiplication operator with adjustable in-memory precision, which can be used to implement the vector matrix multiplication method with adjustable in-memory precision provided by embodiment 1 of the present invention, includes: an external input module, an in-memory computing module and a shift accumulation module;
the in-memory computing module includes p1+k1 nonvolatile memory arrays, wherein p1 and k1 are adjustable according to the required calculation precision;
the external input module is used for converting each operation data in the storage matrix into mixed-radix data of length p1+k1 based on the splitting conversion rule of the operation data, to obtain the converted storage matrix; integrating the mixed-radix data in the converted storage matrix bit by bit to obtain 1 sign-bit matrix, p1-1 high-order matrices and k1 low-order matrices, and storing them in the corresponding nonvolatile memory arrays in high-to-low order; each nonvolatile memory array stores one matrix;
the external input module is also used for converting each operation data in the input vector into mixed-radix data of length p2+k2 based on the splitting conversion rule of the operation data, to obtain the converted input vector; integrating the mixed-radix data in the converted input vector bit by bit to obtain 1 sign-bit vector, p2-1 high-order vectors and k2 low-order vectors, and inputting them in sequence into the p1+k1 nonvolatile memory arrays; it should be noted that the precision of the input vector and the precision of the storage matrix may be the same or different.
The in-memory computing module is used for realizing in sequence, based on the nonvolatile memory arrays, the multiplication of the sign-bit vector, the high-order vectors and the low-order vectors with the p1+k1 matrices;
the shift accumulation module is used for shifting and accumulating the obtained multiplication operation result based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix;
the splitting conversion rule of the operation data is as follows: the operation data are split based on the operation rule of binary signed numbers. Specifically, a complement operation is performed on the operation data to obtain an m-bit two's complement expression; the first p bits of the complement expression are split bit by bit to obtain 1 sign-bit datum and p-1 high-order data; the last m-p bits of the complement expression are sequentially split into k binary data of lengths m1, m2, …, mk, which are converted into the corresponding decimal numbers to obtain k low-order data; the 1 sign-bit datum, p-1 high-order data and k low-order data, arranged in order from high bit to low bit, are recorded as mixed-radix data of length p+k; wherein $m_1 + m_2 + \dots + m_k = m - p$.
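The rule above can be restated as a single identity that shows the positional weight carried by each mixed-radix digit. The symbols b_i (the sign bit b_0 and the high-order bits), d_j (the low-order decimal digits) and s_j (the shift of group j) are notation introduced here for illustration; they do not appear in the patent.

```latex
x = -\,b_{0}\,2^{m-1} \;+\; \sum_{i=1}^{p-1} b_{i}\,2^{m-1-i} \;+\; \sum_{j=1}^{k} d_{j}\,2^{s_j},
\qquad s_j = m - p - \sum_{t=1}^{j} m_t,
\qquad 0 \le d_j \le 2^{m_j} - 1 .

% Worked example: m = 8, p = 2, (m_1, m_2) = (3, 3), x = -77 with pattern 10110011,
% so (b_0, b_1, d_1, d_2) = (1, 0, 6, 3), s_1 = 3 and s_2 = 0:
-77 = -1\cdot 2^{7} + 0\cdot 2^{6} + 6\cdot 2^{3} + 3\cdot 2^{0}
```

These weights are what the shift and accumulation step has to apply to the partial products.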
specifically, as shown in fig. 5, the nonvolatile memory array includes input ports and output ports, wherein the input ports are formed by the row lines of the array and the output ports are formed by the column lines of the array. The nonvolatile memory array has a crossbar (cross-point) structure, with a nonvolatile memory device located at each cross point; the nonvolatile memory device may be a resistive random access memory (RRAM), a phase change memory (PCM), NOR-FLASH, a spin transfer torque magnetic memory (STT-MRAM), a ferroelectric field effect transistor (FeFET), etc. The nonvolatile memory device is used for binary storage or multi-valued storage; a nonvolatile memory array whose devices use binary storage is referred to as a binary nonvolatile memory array; a nonvolatile memory array whose devices use multi-valued storage is referred to as a multi-valued nonvolatile memory array.
In an optional implementation, the nonvolatile memory arrays include binary nonvolatile memory arrays and multi-valued nonvolatile memory arrays; there are p1 binary nonvolatile memory arrays, which store the sign-bit matrix and the high-order matrices input by the external input module; there are k1 multi-valued nonvolatile memory arrays, which store the low-order matrices input by the external input module.
Specifically, as shown in fig. 6, the in-memory computing module includes a binary operation unit and a multi-valued operation unit;
a binary operation unit:
the binary operation unit comprises p1 binary operators; as shown in fig. 7, each binary operator includes a binary nonvolatile memory array and a first peripheral circuit;
the first peripheral circuit comprises a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output end of the first digital-to-analog conversion module is connected with the input end of the binary nonvolatile storage array; the output end of the binary nonvolatile storage array is connected with the input end of the first analog-to-digital conversion module;
the first digital-to-analog conversion module is used for inputting a sign bit vector, a high bit vector or a low bit vector which is input by the external input module into the binary nonvolatile storage array in a voltage mode; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on a multiplication operation result which is output by the binary nonvolatile memory array and is characterized by current.
In an optional implementation manner, the first digital-to-analog conversion module includes a plurality of digital-to-analog converters (DACs), whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the binary nonvolatile memory array; the first analog-to-digital conversion module comprises a plurality of groups of cascaded transimpedance amplifiers and analog-to-digital converters (ADCs), wherein the input end of each group is connected in one-to-one correspondence with the output end of a column participating in the operation in the binary nonvolatile memory array; within each group, the output end of the transimpedance amplifier is connected with the input end of the analog-to-digital converter.
Further, in an alternative embodiment, the array size for the memory matrix in the binary nonvolatile memory array is M × N; the first digital-to-analog conversion module is composed of M digital-to-analog converters (DAC), and the first analog-to-digital conversion module is composed of N trans-impedance amplifiers and N analog-to-digital converters (ADC).
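The signal path of one binary operator (DAC, M × N crossbar, transimpedance amplifier, ADC) can be modelled ideally as follows. This is a sketch under stated assumptions: the device values `v_unit`, `g_on`, `g_off` and `r_f` are invented for illustration, and wire resistance, device variation and ADC range limits are ignored.

```python
def binary_operator(inputs, bit_matrix, v_unit=0.1, g_on=1e-4, g_off=0.0, r_f=1e4):
    """Ideal model of one binary operator; all device values are hypothetical."""
    n_cols = len(bit_matrix[0])
    # M DACs: each digital input digit becomes a row voltage.
    voltages = [d * v_unit for d in inputs]
    # Binary storage: a stored 1 is the high-conductance state, a 0 the low one.
    conductances = [[g_on if b else g_off for b in row] for row in bit_matrix]
    # Ohm's law per device, Kirchhoff's current law per column: I_j = sum_i V_i * G_ij.
    currents = [sum(v * row[j] for v, row in zip(voltages, conductances))
                for j in range(n_cols)]
    # N transimpedance amplifiers turn the column currents into voltages.
    tia_out = [i * r_f for i in currents]
    # N ADCs quantise in units of one (input digit 1) x (stored bit 1) product.
    lsb = v_unit * g_on * r_f
    return [round(v / lsb) for v in tia_out]


# M = 3 rows, N = 2 columns; the input may be a bit vector or a low-order digit vector.
print(binary_operator([1, 0, 6], [[1, 0], [1, 1], [0, 1]]))  # [1, 6]
```

With an ideal off-state (`g_off = 0`) the quantised output equals the integer dot product of the input digits with each stored bit column.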
A multivalued arithmetic unit:
the multi-valued operation unit includes k1 multi-valued operators; as shown in fig. 8, each multi-valued operator includes a multi-valued nonvolatile memory array and a second peripheral circuit;
the second peripheral circuit comprises a second digital-to-analog conversion module, a differential module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected with the input end of the multi-valued nonvolatile memory array; the output end of the multi-valued nonvolatile memory array is connected with the input end of the differential module, and the output end of the differential module is connected with the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for inputting the sign-bit vector, a high-order vector or a low-order vector into the multi-valued nonvolatile memory array in the form of voltages; the output ends of the multi-valued nonvolatile memory array are grouped in pairs, and the differential module is used for performing a differential operation on each pair of outputs of the multi-valued nonvolatile memory array; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the differential operation result output by the differential module.
In an optional implementation manner, the second digital-to-analog conversion module includes a plurality of digital-to-analog converters, whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the multi-valued nonvolatile memory array; the differential module comprises a plurality of differential units, whose input ends are connected in one-to-one correspondence with the pairs of outputs of the multi-valued nonvolatile memory array; each differential unit comprises two transimpedance amplifiers and a voltage subtractor; each pair of outputs of the multi-valued nonvolatile memory array is amplified by the transimpedance amplifiers and then subtracted by the voltage subtractor; the second analog-to-digital conversion module comprises a plurality of analog-to-digital converters, which are connected in one-to-one correspondence with the output ends of the differential units.
Further, in an alternative embodiment, the array region used for the memory matrix in the multi-valued nonvolatile memory array has size K × 2L. For example, let the size of the multi-valued nonvolatile memory array be X × Y, where X is a positive integer and Y is an even number; when a matrix of size K × L is stored in the first K rows and first 2L columns of the array, the element in row k and column l of the matrix is the difference between the conductance value at row k, column 2l-1 and the conductance value at row k, column 2l of the array, where k = 1, 2, …, K, l = 1, 2, …, L, K ≤ X and 2L ≤ Y. The second digital-to-analog conversion module consists of K digital-to-analog converters (DACs). The differential module consists of 2L transimpedance amplifiers and L voltage subtractors: the transimpedance amplifiers are connected to the output ends of the multi-valued nonvolatile memory array, and every two adjacent columns of the array form a differential pair connected to one voltage subtractor. The second analog-to-digital conversion module consists of L analog-to-digital converters (ADCs), whose input ends are connected to the output ends of the voltage subtractors.
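The differential mapping can be illustrated with a short model in which each matrix element is encoded as the difference of two adjacent column conductances. The encoding choice below (positive part in the odd column, negative part in the even column, in 1-based numbering), the unit values `g_unit` and `v_unit`, and the function names are assumptions made for this sketch; the patent only fixes that the element equals the difference of the two conductances.

```python
def to_differential_conductances(matrix, g_unit=1e-6):
    """Map a K x L matrix to K x 2L non-negative conductances (hypothetical encoding)."""
    rows = []
    for row in matrix:
        g_row = []
        for w in row:
            # 1-based column 2l-1 holds the positive part, column 2l the negative part.
            g_row += [max(w, 0) * g_unit, max(-w, 0) * g_unit]
        rows.append(g_row)
    return rows


def differential_operator(inputs, conductances, v_unit=0.1, g_unit=1e-6):
    """K DACs, 2L column currents, L subtractions, L ADC readings (ideal model)."""
    n_cols = len(conductances[0])
    currents = [sum(d * v_unit * row[c] for d, row in zip(inputs, conductances))
                for c in range(n_cols)]
    lsb = v_unit * g_unit
    # Each adjacent pair of columns forms one differential pair.
    return [round((currents[2 * l] - currents[2 * l + 1]) / lsb)
            for l in range(n_cols // 2)]


g = to_differential_conductances([[3, -2], [0, 5]])   # K = 2, L = 2, so 2 x 4 devices
print(differential_operator([1, 2], g))               # [3, 8]
```

A differential pair lets a device with only non-negative conductance represent a signed value, at the cost of doubling the number of columns.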
Further, in an optional implementation manner, the shift accumulation module includes a first shift accumulation unit, a second shift accumulation unit, and a third shift accumulation unit; the shift accumulation output unit in the embodiment comprises an Arithmetic Logic Unit (ALU) and a corresponding cache structure;
the input end of the first shift accumulation unit is connected with the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected with the output end of each multi-value operator in the multi-value operation unit; the input end of the third shift accumulation unit is respectively connected with the output ends of the first shift accumulation unit and the second shift accumulation unit;
the first shift accumulation unit is used for carrying out shift and accumulation operation on multiplication operation results output by each binary operator based on a binary operation rule;
the second shift accumulation unit is used for carrying out shift and accumulation operation on the multiplication operation result output by each multi-value operator based on the binary operation rule;
the third shift accumulation unit is used for shifting and accumulating the outputs of the first shift accumulation unit and the second shift accumulation unit based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix.
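The staged shift and accumulation performed by the three units can be sketched as below for the outputs produced by one input plane. The function names and the choice of weighting each stage relative to its own lowest bit are assumptions of this sketch; the patent specifies only that each unit shifts and accumulates according to the binary operation rule.

```python
def accumulate_binary(outputs, p):
    """First unit: combine the p binary operator outputs (sign plane first)."""
    n = len(outputs[0])
    acc = [-outputs[0][j] * (1 << (p - 1)) for j in range(n)]  # sign plane is negative
    for i in range(1, p):
        for j in range(n):
            acc[j] += outputs[i][j] * (1 << (p - 1 - i))
    return acc


def accumulate_multi(outputs, chunk_lengths):
    """Second unit: combine the k multi-valued operator outputs."""
    n = len(outputs[0])
    acc = [0] * n
    pos = sum(chunk_lengths)
    for out, length in zip(outputs, chunk_lengths):
        pos -= length                       # bit position of this group's lowest bit
        for j in range(n):
            acc[j] += out[j] * (1 << pos)
    return acc


def accumulate_final(binary_acc, multi_acc, low_bits):
    """Third unit: shift the binary part above the low-order bits and add."""
    return [b * (1 << low_bits) + v for b, v in zip(binary_acc, multi_acc)]


# One input row with digit 1 against the stored row (-77, 77); m = 8, p1 = 2, groups (3, 3).
binary_out = [[1, 0], [0, 1]]     # sign-bit array, high-order array
multi_out = [[6, 1], [3, 5]]      # two low-order arrays
first = accumulate_binary(binary_out, 2)
second = accumulate_multi(multi_out, [3, 3])
print(accumulate_final(first, second, 6))  # [-77, 77]
```

In a full operation the same three stages would be repeated for every input plane, and the per-plane results combined with the weight of that plane.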
In an optional implementation manner, the vector matrix multiplication operator with adjustable in-memory precision further includes: a control module; specifically, the control module comprises an Arithmetic Logic Unit (ALU) and a corresponding cache structure;
the control module is respectively connected with the external input module, the memory calculation module and the shift accumulation module and is used for integrally controlling the working time sequence of the external input module, the memory calculation module and the shift accumulation module in the operation process.
In an alternative embodiment, a schematic structural diagram of the vector matrix multiplication operator with adjustable in-memory precision is shown in fig. 9; its hardware structure includes an external input module, an in-memory computing module and a shift accumulation module. The in-memory computing module comprises a binary operation unit and a multi-valued operation unit, each of which contains a plurality of operators. Each operator is composed of a nonvolatile memory array and its corresponding peripheral circuit. During operation, the matrix to be operated on is split according to the binary operation rule and stored in the nonvolatile memory arrays of the operators. The input vector is split by the external input module according to the binary operation rule and then input into the in-memory computing module. After the in-memory computing module finishes computing, the result is input into the shift accumulation module, which completes the shift and accumulation operation according to the binary operation rule and outputs the calculation result. The invention adopts a hybrid operation architecture in which the binary operation unit processes high-order data and the multi-valued operation unit processes low-order data; this effectively reduces the influence of device non-idealities on the calculation result while maintaining calculation energy efficiency, so that in-memory computing with high robustness, high energy efficiency and adjustable precision can be realized. The related technical features are the same as above and are not repeated here.
It should be noted that, for different operation requirements, the number of nonvolatile memory arrays called may be adjusted to perform operations of different precision, and the storage precision of the devices in the multi-valued operation unit may also be adjusted, so that precision-adjustable operation is fully realized. Further, when calculation precision must be ensured, more high-order data can be operated on in the binary operation unit; when calculation energy efficiency must be ensured, more low-order data can be operated on in the multi-valued operation unit. Thus, performing the entire operation in the binary operation unit ensures the operation precision, while performing the entire operation in the multi-valued operation unit ensures the operation energy efficiency.
Further, where the usage scenario allows (for example, solving equations with a mixed-precision architecture, or neural network inference), the binary operation unit can be used to fulfil both the binary operation requirement and the multi-valued operation requirement, which reduces circuit area and ensures higher operation energy efficiency.
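One possible reading of adjusting the number of called arrays is that the lowest-order arrays are simply left out, which truncates the stored value. The sketch below illustrates the resulting error under that assumption; the function names are hypothetical, and the patent does not state that precision is reduced in exactly this way.

```python
def digit_weights(m, p, chunk_lengths):
    """Positional weight of each mixed-radix digit (sign bit is negative)."""
    weights = [-(1 << (m - 1))] + [1 << (m - 1 - i) for i in range(1, p)]
    pos = m - p
    for length in chunk_lengths:
        pos -= length
        weights.append(1 << pos)
    return weights


def value_from_first_arrays(digits, m, p, chunk_lengths, arrays_used):
    """Value represented when only the highest-order arrays are called."""
    weights = digit_weights(m, p, chunk_lengths)[:arrays_used]
    return sum(w * d for w, d in zip(weights, digits))


digits = [1, 0, 6, 3]   # -77 with m = 8, p1 = 2, groups (3, 3)
for used in (4, 3, 2):
    approx = value_from_first_arrays(digits, 8, 2, [3, 3], used)
    print(used, approx, abs(-77 - approx))
```

Dropping the last array (3 bits) changes the value by at most 7, and dropping both low-order arrays (6 bits) by at most 63, so each omitted array trades a bounded loss of precision for fewer array operations.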
The related technical features are the same as those of embodiment 1, and are not described herein.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A vector matrix multiplication method with adjustable in-memory precision, characterized by comprising the following steps:
S1, based on the splitting conversion rule of the operation data, converting each operation data in the storage matrix into mixed-radix data of length p1+k1 to obtain the converted storage matrix; integrating the mixed-radix data in the converted storage matrix bit by bit to obtain 1 sign-bit matrix, p1-1 high-order matrices and k1 low-order matrices, and storing them in the corresponding nonvolatile memory arrays in high-to-low order; one nonvolatile memory array stores one matrix, the number of nonvolatile memory arrays used for the operation is p1+k1, and p1 and k1 are adjustable according to the required calculation precision;
S2, based on the splitting conversion rule of the operation data, converting each operation data in the input vector into mixed-radix data of length p2+k2 to obtain the converted input vector; integrating the mixed-radix data in the converted input vector bit by bit to obtain 1 sign-bit vector, p2-1 high-order vectors and k2 low-order vectors;
S3, inputting the sign-bit vector, the high-order vectors and the low-order vectors in sequence into the p1+k1 nonvolatile memory arrays, realizing in sequence the multiplication of the sign-bit vector, the high-order vectors and the low-order vectors with the p1+k1 matrices;
S4, shifting and accumulating the obtained multiplication results based on the binary operation rule to obtain the vector matrix multiplication result of the input vector and the storage matrix;
the splitting conversion rule of the operation data is: performing a complement operation on the operation data to obtain an m-bit two's complement expression; splitting the first p bits of the complement expression bit by bit to obtain 1 sign-bit datum and p-1 high-order data; sequentially splitting the last m-p bits of the complement expression into k binary data of lengths m1, m2, …, mk and converting them into the corresponding decimal numbers to obtain k low-order data; thereby obtaining 1 sign-bit datum, p-1 high-order data and k low-order data arranged in order from high bit to low bit, and hence mixed-radix data of length p+k; wherein $m_1 + m_2 + \dots + m_k = m - p$.
2. The vector matrix multiplication method according to claim 1, characterized in that the nonvolatile memory array has a crossbar structure, with nonvolatile memory devices located at the cross points; the nonvolatile memory devices are used for binary storage or multi-valued storage; a nonvolatile memory array whose devices use binary storage is referred to as a binary nonvolatile memory array; a nonvolatile memory array whose devices use multi-valued storage is referred to as a multi-valued nonvolatile memory array;
the nonvolatile memory arrays include the binary nonvolatile memory array and the multi-valued nonvolatile memory array;
the binary nonvolatile memory array is used for storing the sign-bit matrix and the high-order matrices;
the multi-valued nonvolatile memory array is used for storing the low-order matrices.
3. A vector matrix multiplication operator with adjustable in-memory precision, characterized by comprising: an external input module, an in-memory computing module and a shift accumulation module;
the in-memory computing module includes p1+k1 nonvolatile memory arrays, wherein p1 and k1 are adjustable according to the required calculation precision;
the external input module is used for converting, based on the splitting conversion rule of the operation data, each operation data in the storage matrix into mixed-radix data of length p1+k1 to obtain the converted storage matrix; integrating the mixed-radix data in the converted storage matrix bit by bit to obtain 1 sign-bit matrix, p1-1 high-order matrices and k1 low-order matrices, and storing them in the corresponding nonvolatile memory arrays in high-to-low order; one nonvolatile memory array stores one matrix;
the external input module is also used for converting, based on the splitting conversion rule of the operation data, each operation data in the input vector into mixed-radix data of length p2+k2 to obtain the converted input vector; integrating the mixed-radix data in the converted input vector bit by bit to obtain 1 sign-bit vector, p2-1 high-order vectors and k2 low-order vectors, and inputting them in sequence into the p1+k1 nonvolatile memory arrays;
the in-memory computing module is used for realizing in sequence, based on the nonvolatile memory arrays, the multiplication of the sign-bit vector, the high-order vectors and the low-order vectors with the p1+k1 matrices;
the shift accumulation module is used for shifting and accumulating the obtained multiplication results based on the binary operation rule to obtain the vector matrix multiplication result of the input vector and the storage matrix;
the splitting conversion rule of the operation data is: performing a complement operation on the operation data to obtain an m-bit two's complement expression; splitting the first p bits of the complement expression bit by bit to obtain 1 sign-bit datum and p-1 high-order data; sequentially splitting the last m-p bits of the complement expression into k binary data of lengths m1, m2, …, mk and converting them into the corresponding decimal numbers to obtain k low-order data; thereby obtaining 1 sign-bit datum, p-1 high-order data and k low-order data arranged in order from high bit to low bit, recorded as mixed-radix data of length p+k; wherein $m_1 + m_2 + \dots + m_k = m - p$.
4. The vector matrix multiplication operator according to claim 3, characterized in that the nonvolatile memory array has a crossbar structure, with nonvolatile memory devices located at the cross points; the nonvolatile memory devices are used for binary storage or multi-valued storage; a nonvolatile memory array whose devices use binary storage is referred to as a binary nonvolatile memory array; a nonvolatile memory array whose devices use multi-valued storage is referred to as a multi-valued nonvolatile memory array;
the nonvolatile memory arrays include a binary nonvolatile memory array and a multi-valued nonvolatile memory array;
the binary nonvolatile memory array is used for storing the sign-bit matrix and the high-order matrices;
the multi-valued nonvolatile memory array is used for storing the low-order matrices.
5. The vector matrix multiplication operator according to claim 4, characterized in that the nonvolatile memory device comprises: a resistive random access memory, a phase change memory, NOR-FLASH, a spin transfer torque magnetic memory or a ferroelectric field effect transistor.
6. The vector matrix multiplication operator according to claim 4, characterized in that the in-memory computing module comprises a binary operation unit and a multi-valued operation unit;
the binary operation unit includes p1 binary operators; each binary operator includes the binary nonvolatile memory array and a first peripheral circuit; the first peripheral circuit includes a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output end of the first digital-to-analog conversion module is connected with the input end of the binary nonvolatile memory array; the output end of the binary nonvolatile memory array is connected with the input end of the first analog-to-digital conversion module; the first digital-to-analog conversion module is used for inputting the sign-bit vector, the high-order vector or the low-order vector into the binary nonvolatile memory array in the form of voltages; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on the multiplication result, represented as current, output by the binary nonvolatile memory array;
the multi-valued operation unit includes k1 multi-valued operators; each multi-valued operator includes the multi-valued nonvolatile memory array and a second peripheral circuit; the second peripheral circuit includes a second digital-to-analog conversion module, a differential module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected with the input end of the multi-valued nonvolatile memory array; the output end of the multi-valued nonvolatile memory array is connected with the input end of the differential module, and the output end of the differential module is connected with the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for inputting the sign-bit vector, the high-order vector or the low-order vector into the multi-valued nonvolatile memory array in the form of voltages; the output ends of the multi-valued nonvolatile memory array are grouped in pairs, and the differential module is used for performing a differential operation on each pair of outputs of the multi-valued nonvolatile memory array; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the differential operation result input from the differential module.
7. The vector matrix multiplication operator according to claim 6, characterized in that the first digital-to-analog conversion module comprises a plurality of digital-to-analog converters, whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the binary nonvolatile memory array;
the first analog-to-digital conversion module comprises a plurality of groups of cascaded transimpedance amplifiers and analog-to-digital converters, whose input ends are connected in one-to-one correspondence with the output ends of the columns participating in the operation in the multi-valued nonvolatile memory array.
8.根据权利要求6所述的矢量矩阵乘法运算器,其特征在于,所述第二数模转换模块包括多个数模转换器,其输出端与所述多值非易失存储阵列中每个参与运算的行的输入端一一对应相连;8 . The vector-matrix multiplier according to claim 6 , wherein the second digital-to-analog conversion module comprises a plurality of digital-to-analog converters, the output terminals of which are connected to each of the multi-value non-volatile storage arrays. The input terminals of each row participating in the operation are connected one-to-one; 所述差分模块包括多个差分单元,其输入端与所述多值非易失存储阵列的每一组输出一一对应相连;其中,所述差分单元包括两个跨阻放大器和一个电压减法器;所述多值非易失存储阵列的每一组输出分别经所述跨阻放大器放大后,通过所述电压减法器进行差分运算;The differential module includes a plurality of differential units, the input ends of which are connected to each group of outputs of the multi-value non-volatile memory array in a one-to-one correspondence; wherein, the differential unit includes two transimpedance amplifiers and a voltage subtractor ; After each group of outputs of the multi-valued non-volatile storage array is amplified by the transimpedance amplifier respectively, the differential operation is performed by the voltage subtractor; 所述第一模数转换模块包括多个模数转换器,与各所述差分单元的输出端一一对应相连。The first analog-to-digital conversion module includes a plurality of analog-to-digital converters, which are connected to the output ends of the differential units in a one-to-one correspondence. 9.根据权利要求6所述的矢量矩阵乘法运算器,其特征在于,所述移位累加模块包括第一移位累加单元、第二移位累加单元和第三移位累加单元;9. The vector-matrix multiplier according to claim 6, wherein the shift accumulation module comprises a first shift accumulation unit, a second shift accumulation unit and a third shift accumulation unit; 所述第一移位累加单元的输入端与所述二值运算单元中的各二值运算器的输出端相连;所述第二移位累加单元的输入端与所述多值运算单元中的各多值运算器的输出端相连;所述第三移位累加单元的输入端分别与所述第一移位累加单元和所述第二移位累加单元的输出端相连;The input end of the first shift accumulation unit is connected to the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected with the output end of the multi-value operation unit. 
The output ends of each multi-value operator are connected; the input ends of the third shift accumulation unit are respectively connected with the output ends of the first shift accumulation unit and the second shift accumulation unit; 所述第一移位累加单元用于基于二进制运算规则对各二值运算器输出的乘法运算结果进行移位与累加操作;The first shift-accumulation unit is used to perform shift and accumulation operations on the multiplication result output by each binary operator based on the binary operation rule; 所述第二移位累加单元用于基于二进制运算规则对各多值运算器输出的乘法运算结果进行移位与累加操作;The second shift-accumulation unit is used to perform shift and accumulation operations on the multiplication result output by each multi-value operator based on the binary operation rule; 第三移位累加单元用于基于二进制运算规则对所述第一移位累加单元和所述第二移位累加单元的输出进行移位与累加操作,得到所述输入矢量与所述存储矩阵的矢量矩阵乘法运算结果。The third shift-accumulation unit is configured to perform shift and accumulation operations on the outputs of the first shift-and-accumulate unit and the second shift-and-accumulate unit based on binary operation rules to obtain the difference between the input vector and the storage matrix. The result of a vector-matrix multiplication operation. 10.根据权利要求3-9任意一项所述的矢量矩阵乘法运算器,其特征在于,还包括:控制模块;10. The vector-matrix multiplier according to any one of claims 3-9, further comprising: a control module; 所述控制模块分别与所述外部输入模块、所述存内计算模块和所述移位累加模块相连,用于在运算过程中,对所述外部输入模块、所述存内计算模块和所述移位累加模块的工作时序进行控制。The control module is respectively connected with the external input module, the in-memory calculation module and the shift-accumulation module, and is used for performing the operation on the external input module, the in-memory calculation module and the The working sequence of the shift-accumulate module is controlled.
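The data path recited in claims 6 to 9 (array read, pairwise differencing of outputs, then shift and accumulate) can be illustrated with a small idealized model. The following is a minimal sketch only, not the patented implementation: it assumes an unsigned input vector fed one bit plane at a time, signed weights stored as a positive and a negative cell in adjacent columns, and ideal converters with no noise or quantization. All function names (`column_sums`, `differential`, `encode_differential`, `shift_accumulate`, `vmm`) are hypothetical, and the claimed split into sign bit, high-order, and low-order vectors is simplified here to plain bit planes.

```python
def column_sums(inputs, array):
    """Ideal array read: column j accumulates sum_i inputs[i] * array[i][j]."""
    return [sum(x * row[j] for x, row in zip(inputs, array))
            for j in range(len(array[0]))]


def differential(outputs):
    """Subtract each pair of adjacent outputs: (0, 1), (2, 3), ..."""
    return [outputs[j] - outputs[j + 1] for j in range(0, len(outputs), 2)]


def encode_differential(weights):
    """Assumed mapping: a signed weight w becomes a (positive, negative) cell pair."""
    return [[cell for w in row for cell in (max(w, 0), max(-w, 0))]
            for row in weights]


def shift_accumulate(partials):
    """Weight partial result k by 2**k and add element-wise."""
    return [sum(p[j] * (1 << k) for k, p in enumerate(partials))
            for j in range(len(partials[0]))]


def vmm(x, weights, n_bits):
    """Bit-serial vector-matrix product of unsigned x with signed weights."""
    array = encode_differential(weights)
    partials = []
    for k in range(n_bits):
        bit_plane = [(v >> k) & 1 for v in x]  # k-th bit of every input element
        partials.append(differential(column_sums(bit_plane, array)))
    return shift_accumulate(partials)


print(vmm([5, 6], [[3, -2], [-1, 4]], n_bits=3))  # [9, 14]
```

In this sketch the printed result matches the direct product: 5·3 + 6·(−1) = 9 and 5·(−2) + 6·4 = 14. Storing each signed weight as a column pair is one common way to realize the pairwise differential read described in claim 8; the source does not state that this exact encoding is used.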
CN202111331694.3A 2021-11-11 2021-11-11 A vector-matrix multiplication operation method and operator with adjustable in-memory precision Active CN114168107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111331694.3A CN114168107B (en) 2021-11-11 2021-11-11 A vector-matrix multiplication operation method and operator with adjustable in-memory precision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111331694.3A CN114168107B (en) 2021-11-11 2021-11-11 A vector-matrix multiplication operation method and operator with adjustable in-memory precision

Publications (2)

Publication Number Publication Date
CN114168107A true CN114168107A (en) 2022-03-11
CN114168107B CN114168107B (en) 2024-10-18

Family

ID=80478779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111331694.3A Active CN114168107B (en) 2021-11-11 2021-11-11 A vector-matrix multiplication operation method and operator with adjustable in-memory precision

Country Status (1)

Country Link
CN (1) CN114168107B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863936A (en) * 2023-09-04 2023-10-10 之江实验室 Voice recognition method based on FeFET (field effect transistor) memory integrated array

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200356620A1 (en) * 2019-05-09 2020-11-12 Applied Materials, Inc. Bit-Ordered Binary-Weighted Multiplier-Accumulator
CN111984921A (en) * 2020-08-27 2020-11-24 华中科技大学 In-memory numerical computing accelerator and in-memory numerical computing method
US20210208879A1 (en) * 2020-01-07 2021-07-08 SK Hynix Inc. Multiplication and accumulation(mac) operator and processing-in-memory (pim) device including the mac operator

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200356620A1 (en) * 2019-05-09 2020-11-12 Applied Materials, Inc. Bit-Ordered Binary-Weighted Multiplier-Accumulator
US20210208879A1 (en) * 2020-01-07 2021-07-08 SK Hynix Inc. Multiplication and accumulation(mac) operator and processing-in-memory (pim) device including the mac operator
CN111984921A (en) * 2020-08-27 2020-11-24 华中科技大学 In-memory numerical computing accelerator and in-memory numerical computing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
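Fang Dongbo (方东博); Shen Haibin (沈海斌): "Design of a GMM vector multiplier based on a two-dimensional systolic array" (基于二维脉动阵列的GMM矢量乘法器设计), 电子技术 (Electronic Technology), no. 03, 25 March 2011 (2011-03-25) *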

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863936A (en) * 2023-09-04 2023-10-10 之江实验室 Voice recognition method based on FeFET (field effect transistor) memory integrated array
CN116863936B (en) * 2023-09-04 2023-12-19 之江实验室 Voice recognition method based on FeFET (field effect transistor) memory integrated array

Also Published As

Publication number Publication date
CN114168107B (en) 2024-10-18

Similar Documents

Publication Publication Date Title
CN110209375B (en) A multiply-accumulate circuit based on radix-4 encoding and differential weight storage
CN107636640B (en) Dot product engine, memristor dot product engine and method for calculating dot product
CN108780492B (en) analog coprocessor
CN110442323B (en) Device and method for performing floating point number or fixed point number multiply-add operation
CN112955863B (en) Method and device for implementing matrix operations
CN112181895B (en) Reconfigurable Architectures, Accelerators, Circuit Deployment, and Computational Dataflow Methods
US9933998B2 (en) Methods and apparatuses for performing multiplication
CN114168107A (en) Vector matrix multiplication method with adjustable memory precision and arithmetic unit
CN111988031B (en) Memristor memory vector matrix operator and operation method
TWI886426B (en) Hybrid method of using iterative product accumulation matrix multiplier and matrix multiplication
CN114816335B (en) Memristor array sign number multiplication implementation method, device and equipment
JP7279293B2 (en) Memory device and method of operation
CN114168888B (en) An in-memory simulation linear equation solver, solving system and solving method
CN115510791A (en) Semiconductor integrated circuits and computing systems
CN110262771A (en) A Basic Operational Circuit Based on MOS Transistor and Its Extended Circuit
CN115658013B (en) ROM in-memory computing device of vector multiply adder and electronic equipment
CN115658012B (en) SRAM analog memory computing device of vector multiply adder and electronic equipment
CN116127257B (en) Circuit device and calculation method for vector multiplication based on 1T1R
CN116594588A (en) A general in-memory matrix-tensor processor and its operation method
US20240220742A1 (en) Multiply-accumulate successive approximation devices and methods
KR20250129040A (en) Multiplicative-Accumulative Successive Approximation Device and Method
KR20240096766A (en) Hybrid Matrix Multiplier
CN117251213A (en) Processing circuits, methods and electronic devices for implementing fast Fourier transforms
CN120179603A (en) Storage and computing integrated device, computing method and electronic equipment
CN119479726A (en) A method for storing and calculating signed data based on a storage cell array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant