CN114168107A

CN114168107A - Vector matrix multiplication method with adjustable memory precision and arithmetic unit

Info

Publication number: CN114168107A
Application number: CN202111331694.3A
Authority: CN
Inventors: 缪向水; 李健聪; 李祎
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2022-03-11
Anticipated expiration: 2041-11-11
Also published as: CN114168107B

Abstract

The invention discloses a vector-matrix multiplication method and an arithmetic unit with adjustable in-memory computing precision. The method comprises: splitting a storage matrix, based on a splitting conversion rule for operation data, into 1 sign bit matrix, p₁-1 high-order matrices and k₁ low-order matrices, and storing them in the corresponding non-volatile memory arrays in order from high to low bits, so that p₁+k₁ non-volatile memory arrays are used for the operation; splitting the input vector, based on the same rule, into 1 sign bit vector, p₂-1 high-order vectors and k₂ low-order vectors, and inputting them in turn into the p₁+k₁ non-volatile memory arrays, thereby performing in turn the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p₁+k₁ matrices; and shifting and accumulating the resulting products according to binary arithmetic rules to obtain the vector-matrix multiplication result of the input vector and the storage matrix. The numbers of bits of the high-order data and the low-order data are adjusted according to the required calculation precision, which realizes variable-precision vector-matrix operations.

Description

Vector matrix multiplication method with adjustable memory precision and arithmetic unit

Technical Field

The invention belongs to the field of analog circuits, and particularly relates to a vector-matrix multiplication method and an arithmetic unit with adjustable in-memory computing precision.

Background

With the explosive growth of data volume in the era of artificial intelligence, traditional von Neumann computers can no longer meet the urgent demands of the information age for computing power and energy efficiency. In-memory computing is gradually showing its value as a mode of operation with high energy efficiency, low latency and low time complexity. In particular, in-memory vector-matrix operation based on non-volatile memory has shown large advantages in energy efficiency and computing power over traditional digital computers in data-intensive tasks such as running artificial intelligence workloads and scientific computing.

However, as an analog mode of operation, in-memory vector-matrix operation based on non-volatile memory has very limited calculation accuracy because of the non-ideal effects of the devices. Vector-matrix operations based on a single memory array can only provide low-precision results, and when the precision is extended by using multiple arrays, the energy efficiency of the calculation tends to decrease exponentially. Therefore, to make in-memory computing more generally applicable, a matrix operation core with adjustable in-memory computing precision and high energy efficiency is urgently needed.

Disclosure of Invention

In view of the defects of the prior art and the need to improve it, the invention provides a vector-matrix multiplication method and an arithmetic unit with adjustable in-memory computing precision, which realize variable-precision vector-matrix operation and solve the technical problem that existing memristive vector-matrix operation units cannot adjust their calculation precision for different applications.

In order to achieve the above object, in a first aspect, the present invention provides a vector matrix multiplication method with adjustable memory precision, including the following steps:

S1, converting each operation data in the storage matrix into mixed-radix data of length p₁+k₁ based on the splitting conversion rule of the operation data, to obtain the converted storage matrix; grouping the mixed-radix data in the converted storage matrix by digit position to obtain 1 sign bit matrix, p₁-1 high-order matrices and k₁ low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix, and the number of nonvolatile memory arrays used for the operation is p₁+k₁; p₁ and k₁ are adjustable according to the required calculation precision;

S2, converting each operation data in the input vector into mixed-radix data of length p₂+k₂ based on the splitting conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector by digit position to obtain 1 sign bit vector, p₂-1 high-order vectors and k₂ low-order vectors;

S3, inputting the sign bit vector, the high-order vectors and the low-order vectors in sequence into the p₁+k₁ nonvolatile memory arrays, thereby performing in sequence the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p₁+k₁ matrices;

S4, shifting and accumulating the obtained multiplication results based on the binary operation rule to obtain the vector-matrix multiplication result of the input vector and the storage matrix;

the splitting conversion rule of the operation data is as follows: converting the operation data into an m-bit two's complement representation; splitting the first p bits of the two's complement representation bit by bit to obtain 1 sign bit data and p-1 high-order data; splitting the remaining m-p bits of the two's complement representation in sequence into k binary numbers of lengths m₁, m₂, …, m_k, and converting each into the corresponding decimal number to obtain k low-order data; the 1 sign bit data, p-1 high-order data and k low-order data, arranged in order from high to low, are recorded as mixed-radix data of length p+k; wherein m₁ + m₂ + … + m_k = m - p;

further preferably, the nonvolatile memory array has a crossbar structure, with the nonvolatile memory devices located at the cross points; each nonvolatile memory device performs either binary storage or multi-value storage; a nonvolatile memory array whose devices perform binary storage is referred to as a binary nonvolatile memory array; a nonvolatile memory array whose devices perform multi-value storage is referred to as a multi-value nonvolatile memory array;

the nonvolatile memory array comprises a binary nonvolatile memory array and a multivalued nonvolatile memory array;

the binary nonvolatile memory array is used for storing a sign bit matrix and a high bit matrix;

the multivalued nonvolatile memory array is used for storing a low-order matrix.

In a second aspect, the present invention provides a vector-matrix multiplication arithmetic unit with adjustable in-memory computing precision, including: an external input module, an in-memory computing module and a shift accumulation module;

the in-memory computing module includes p₁+k₁ nonvolatile memory arrays, wherein p₁ and k₁ are adjustable according to the required calculation precision;

the external input module is used for converting each operation data in the storage matrix into mixed-radix data of length p₁+k₁ based on the splitting conversion rule of the operation data, to obtain the converted storage matrix; grouping the mixed-radix data in the converted storage matrix by digit position to obtain 1 sign bit matrix, p₁-1 high-order matrices and k₁ low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix;

the external input module is also used for converting each operation data in the input vector into mixed-radix data of length p₂+k₂ based on the splitting conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector by digit position to obtain 1 sign bit vector, p₂-1 high-order vectors and k₂ low-order vectors, and inputting them in sequence into the p₁+k₁ nonvolatile memory arrays;

the in-memory computing module is used for performing in sequence, on the nonvolatile memory arrays, the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p₁+k₁ matrices;

the shift accumulation module is used for shifting and accumulating the obtained multiplication results based on the binary operation rule to obtain the vector-matrix multiplication result of the input vector and the storage matrix.

Further preferably, the nonvolatile memory device includes: resistive random access memory, phase change memory, NOR-FLASH, spin transfer torque magnetic memory or ferroelectric field effect transistor.

Further preferably, the in-memory computing module includes a binary operation unit and a multi-value operation unit;

the binary operation unit comprises p₁ binary operators; each binary operator comprises a binary nonvolatile memory array and a first peripheral circuit; the first peripheral circuit comprises a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output end of the first digital-to-analog conversion module is connected with the input end of the binary nonvolatile memory array; the output end of the binary nonvolatile memory array is connected with the input end of the first analog-to-digital conversion module; the first digital-to-analog conversion module is used for inputting the sign bit vector, a high-order vector or a low-order vector into the binary nonvolatile memory array in the form of voltages; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on the multiplication result output by the binary nonvolatile memory array, which is represented as currents;

the multi-value operation unit comprises k₁ multi-value operators; each multi-value operator comprises a multi-value nonvolatile memory array and a second peripheral circuit; the second peripheral circuit comprises a second digital-to-analog conversion module, a differential module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected with the input end of the multi-value nonvolatile memory array; the output end of the multi-value nonvolatile memory array is connected with the input end of the differential module, and the output end of the differential module is connected with the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for inputting the sign bit vector, a high-order vector or a low-order vector into the multi-value nonvolatile memory array in the form of voltages; the output ends of the multi-value nonvolatile memory array are grouped in pairs, and the differential module is used for performing a differential operation on each pair of outputs of the multi-value nonvolatile memory array; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the differential operation results supplied by the differential module.

Further preferably, the first digital-to-analog conversion module comprises a plurality of digital-to-analog converters, whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the binary nonvolatile memory array; the first analog-to-digital conversion module comprises a plurality of groups each consisting of a cascaded transimpedance amplifier and analog-to-digital converter, and the input ends of the groups are connected in one-to-one correspondence with the output ends of the columns participating in the operation in the binary nonvolatile memory array.

Further preferably, the second digital-to-analog conversion module comprises a plurality of digital-to-analog converters, whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the multi-value nonvolatile memory array; the differential module comprises a plurality of differential units, whose input ends are connected in one-to-one correspondence with the pairs of outputs of the multi-value nonvolatile memory array; each differential unit comprises two transimpedance amplifiers and a voltage subtractor; each pair of outputs of the multi-value nonvolatile memory array is amplified by the transimpedance amplifiers and then subtracted by the voltage subtractor; the second analog-to-digital conversion module comprises a plurality of analog-to-digital converters, which are connected in one-to-one correspondence with the output ends of the differential units.

Further preferably, the shift accumulation module comprises a first shift accumulation unit, a second shift accumulation unit and a third shift accumulation unit;

the input end of the first shift accumulation unit is connected with the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected with the output end of each multi-value operator in the multi-value operation unit; the input end of the third shift accumulation unit is respectively connected with the output ends of the first shift accumulation unit and the second shift accumulation unit;

the first shift accumulation unit is used for carrying out shift and accumulation operation on multiplication operation results output by each binary operator based on a binary operation rule;

the second shift accumulation unit is used for carrying out shift and accumulation operation on the multiplication operation result output by each multi-value operator based on the binary operation rule;

the third shift accumulation unit is used for shifting and accumulating the outputs of the first shift accumulation unit and the second shift accumulation unit based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix.

Further preferably, the vector matrix multiplication operator with adjustable memory precision further includes: a control module;

the control module is respectively connected with the external input module, the memory calculation module and the shift accumulation module and is used for controlling the working time sequence of the external input module, the memory calculation module and the shift accumulation module in the operation process.

Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:

1. The invention provides a vector-matrix multiplication method and an arithmetic unit with adjustable in-memory computing precision. The input vector and the operation data in the storage matrix are split and converted into sign bit data, high-order data and low-order data based on the splitting conversion rule of the operation data, and the vector-matrix multiplication is then carried out on nonvolatile memory arrays. Because the numbers of bits of the high-order data and the low-order data are adjusted according to the required calculation precision, variable-precision vector-matrix operation is realized, which solves the technical problem that existing memristive vector-matrix operation units cannot adjust their calculation precision for different applications.

2. The method and the arithmetic unit provided by the invention carry out the vector-matrix multiplication on binary nonvolatile memory arrays and multi-value nonvolatile memory arrays: the binary nonvolatile memory arrays store the sign bit matrix and the high-order matrices, and the multi-value nonvolatile memory arrays store the low-order matrices. For in-memory computing based on nonvolatile memory, binary calculation is significantly more accurate than multi-value calculation, while multi-value calculation is significantly more energy-efficient than binary calculation. The invention therefore has the data bits that strongly affect the result calculated by the binary units, to guarantee calculation precision, and the data bits that only weakly affect the result calculated by the multi-value units. This effectively reduces the influence of device non-ideal effects on the calculation result while preserving energy efficiency, and realizes precision-adjustable in-memory computing with high robustness and high energy efficiency.

3. The vector matrix multiplication method and the arithmetic unit with adjustable memory precision can enable more high-order data to be operated in a binary nonvolatile memory array when the calculation precision needs to be ensured; when the calculation energy efficiency needs to be ensured, more low-bit data can be operated in the multi-valued nonvolatile memory array; the calculation precision requirement and the calculation energy efficiency requirement can be well balanced.

4. The vector matrix multiplication arithmetic unit with adjustable memory precision provided by the invention carries out shift and accumulation operation on multiplication operation results output by the nonvolatile memory array in a grading way, and because the digital operation unit has very high operation precision and operation robustness, the digital operation unit is used for carrying out shift accumulation operation, so that extra calculation errors are not introduced in the shift accumulation process, and the accuracy of the shift accumulation results is ensured.

5. The arithmetic unit built on the vector-matrix multiplication method with adjustable in-memory computing precision has good universality. Owing to its adjustable precision, it can support the vector-matrix multiplication in tasks such as machine learning and scientific computing, and it has significant advantages in computing power and power consumption over conventional computing systems.

Drawings

Fig. 1 is a flowchart of the vector-matrix multiplication method with adjustable in-memory computing precision according to embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of weight slicing of the storage matrix according to embodiment 1 of the present invention;

Fig. 3 is a schematic diagram of weight slicing of the input vector according to embodiment 1 of the present invention;

Fig. 4 is a schematic diagram of the stage-by-stage shift and accumulation operation performed on the multiplication results according to embodiment 1 of the present invention;

Fig. 5 is a schematic diagram of the nonvolatile memory array according to embodiment 2 of the present invention;

Fig. 6 is a schematic structural diagram of the in-memory computing module according to embodiment 2 of the present invention;

Fig. 7 is a schematic structural diagram of the binary operator according to embodiment 2 of the present invention;

Fig. 8 is a schematic structural diagram of the multi-value operator according to embodiment 2 of the present invention;

Fig. 9 is a schematic structural diagram of the vector-matrix multiplication arithmetic unit with adjustable in-memory computing precision according to embodiment 2 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Embodiment 1

A vector matrix multiplication method with adjustable memory precision, as shown in fig. 1, includes the following steps:

S1, converting each operation data in the storage matrix into mixed-radix data of length p₁+k₁ based on the splitting conversion rule of the operation data, to obtain the converted storage matrix; grouping the mixed-radix data in the converted storage matrix by digit position to obtain 1 sign bit matrix, p₁-1 high-order matrices and k₁ low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix, and the number of nonvolatile memory arrays used for the operation is p₁+k₁; p₁ and k₁ are both integers that can be adjusted according to the required calculation precision; for example, when executing a machine learning algorithm, where the required calculation accuracy is 8-bit, one may take p₁ = k₁; when performing a scientific computing task, where the required calculation accuracy is 16-bit, one may take p₁ = k₁ = 4.

The splitting conversion rule of the operation data is as follows: converting the operation data into an m-bit two's complement representation; splitting the first p bits of the two's complement representation (the first bit being the sign bit) bit by bit to obtain 1 sign bit data and p-1 high-order data; splitting the remaining m-p bits of the two's complement representation in sequence into k binary numbers of lengths m₁, m₂, …, m_k, and converting each into the corresponding decimal number to obtain k low-order data; the 1 sign bit data, p-1 high-order data and k low-order data, arranged in order from high to low, are recorded as mixed-radix data of length p+k; wherein m₁ + m₂ + … + m_k = m - p,

and p and k are integers that are adjustable according to the required calculation precision. It should be noted that, in the splitting process, the sign bit and bits 2 to p, which can significantly affect the calculation result, are each split out as binary data, while the remaining m-p bits are split in sequence into k binary numbers of lengths m₁, m₂, …, m_k. Generally, among the results of splitting the m-p bits, the nearer a binary number is to the front, the shorter it is; in this embodiment, m₁, m₂, …, m_k increase from left to right. The values of p and k can be tuned according to the required calculation accuracy.
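Read as a positional number system, the rule above gives every digit of the mixed-radix data a fixed weight. The following is a sketch of that reading rather than a formula quoted from the patent; the symbols s, h_i, d_j and e_j are notation assumed here for the sign bit data, the high-order data, the low-order data and the bit offset of each low-order data.

```latex
x = -s\,2^{m-1} \;+\; \sum_{i=1}^{p-1} h_i\,2^{\,m-1-i} \;+\; \sum_{j=1}^{k} d_j\,2^{\,e_j},
\qquad e_j = \sum_{l=j+1}^{k} m_l,
\qquad \sum_{j=1}^{k} m_j = m - p,
\qquad 0 \le d_j \le 2^{m_j} - 1 .
```

As a hypothetical illustration, with m = 8, p = 2, k = 2 and (m₁, m₂) = (2, 4), the value 90 = 01011010 in binary gives s = 0, h₁ = 1, d₁ = 01 = 1 and d₂ = 1010 = 10, and indeed 1·64 + 1·16 + 10·1 = 90.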

Specifically, as shown in Fig. 2, the mixed-radix data of the storage matrix comprises 1 sign bit data, p₁-1 high-order data and k₁ low-order data, arranged in order from high to low. The sign bit data and the high-order data are binary data, obtained by splitting the first p₁ bits of the two's complement representation of the operation data bit by bit. The low-order data are decimal data, obtained by splitting the last m-p₁ bits of the two's complement representation of the operation data in sequence into k₁ binary numbers of lengths m₁, m₂, …, m_k₁ and converting each to decimal; wherein m₁ + m₂ + … + m_k₁ = m - p₁, and m is the length of the two's complement representation.
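The splitting of a single operation data can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `split_value` and `digit_weights`, the parameter `chunk_lengths` (the lengths of the low-order segments) and the use of plain Python lists are all hypothetical choices made here.

```python
def split_value(x, m, p, chunk_lengths):
    """Split signed integer x into [sign bit, p-1 high bits, k low-order digits]."""
    assert p >= 1 and all(n > 0 for n in chunk_lengths)
    assert sum(chunk_lengths) == m - p           # low-order segments cover the m-p bits
    assert -(1 << (m - 1)) <= x < (1 << (m - 1)) # x must fit in m-bit two's complement
    u = x & ((1 << m) - 1)                       # m-bit two's complement bit pattern
    bits = [(u >> (m - 1 - i)) & 1 for i in range(m)]  # most significant bit first
    digits = bits[:p]                            # sign bit and high bits stay binary
    pos = p
    for length in chunk_lengths:                 # each segment becomes one multi-level digit
        segment = bits[pos:pos + length]
        digits.append(int("".join(map(str, segment)), 2))
        pos += length
    return digits


def digit_weights(m, p, chunk_lengths):
    """Positional weight of each digit returned by split_value (sign weight is negative)."""
    weights = [-(1 << (m - 1))]
    weights += [1 << (m - 1 - i) for i in range(1, p)]
    remaining = m - p
    for length in chunk_lengths:
        remaining -= length                      # number of bits below this segment
        weights.append(1 << remaining)
    return weights


digits = split_value(-37, m=8, p=3, chunk_lengths=(2, 3))
weights = digit_weights(8, 3, (2, 3))
value = sum(d * w for d, w in zip(digits, weights))  # reconstructs the original value
```

Applying `split_value` to every element of the storage matrix and collecting the digits at the same position would give the sign bit matrix, the high-order matrices and the low-order matrices described above.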

S2, converting each operation data in the input vector into mixed-radix data of length p₂+k₂ based on the splitting conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector by digit position to obtain 1 sign bit vector, p₂-1 high-order vectors and k₂ low-order vectors; wherein p₂ and k₂ are both integers that are adjustable according to the required calculation precision; likewise, for example, when executing a machine learning algorithm, where the required calculation accuracy is 8-bit, one may take p₂ = k₂; when performing a scientific computing task, where the required calculation accuracy is 16-bit, one may take p₂ = k₂ = 4.

Specifically, as shown in Fig. 3, the operation data in the input vector are converted by the same method as the operation data in the storage matrix, yielding mixed-radix data of length p₂+k₂ and thus the converted input vector.

It should be noted that the precision of the input vector and the precision of the storage matrix may be the same or different.

Specifically, in this embodiment, the 1 sign bit vector, p₂-1 high-order vectors and k₂ low-order vectors corresponding to the input vector are arranged from high to low in their original splitting order and are input in that order. The sign bit vector is input simultaneously into the p₁+k₁ nonvolatile memory arrays that store the 1 sign bit matrix, p₁-1 high-order matrices and k₁ low-order matrices, realizing the multiplication of the sign bit vector with the p₁+k₁ matrices; the i-th (i = 1, 2, …, p₂-1) high-order vector is input simultaneously into the same p₁+k₁ nonvolatile memory arrays, realizing the multiplication of that high-order vector with the p₁+k₁ matrices; the j-th (j = 1, 2, …, k₂) low-order vector is input simultaneously into the same p₁+k₁ nonvolatile memory arrays, realizing the multiplication of that low-order vector with the p₁+k₁ matrices.
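Because both operands are weighted sums of their slices, the full product is a weighted sum of the (p₂+k₂)(p₁+k₁) slice products computed by the arrays. The following sketch makes this explicit; the notation is assumed here and does not appear in the original text: x^(a) is the a-th slice vector, W^(b) is the b-th slice matrix, and u_a, v_b are their positional weights (negative for the sign slices, as in two's complement).

```latex
\mathbf{x} = \sum_{a=1}^{p_2+k_2} u_a\,\mathbf{x}^{(a)}, \qquad
\mathbf{W} = \sum_{b=1}^{p_1+k_1} v_b\,\mathbf{W}^{(b)}
\quad\Longrightarrow\quad
\mathbf{x}\mathbf{W} = \sum_{a=1}^{p_2+k_2}\;\sum_{b=1}^{p_1+k_1} u_a\,v_b\,\bigl(\mathbf{x}^{(a)}\mathbf{W}^{(b)}\bigr)
```

Each u_a and v_b is a power of two up to sign, so applying them amounts to a bit shift, and the double sum is the accumulation.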

furthermore, the nonvolatile memory array is of a cross structure, and the nonvolatile memory devices are positioned on the cross points; the nonvolatile memory device is used for carrying out binary storage or multi-value storage; recording a nonvolatile memory array of the nonvolatile memory device based on binary storage as a binary nonvolatile memory array; recording a nonvolatile storage array of the nonvolatile storage device based on multi-value storage as a multi-value nonvolatile storage array;

In an optional embodiment, the nonvolatile memory arrays include binary nonvolatile memory arrays and multi-value nonvolatile memory arrays; there are p₁ binary nonvolatile memory arrays, which store the sign bit matrix and the high-order matrices; there are k₁ multi-value nonvolatile memory arrays, which store the low-order matrices.

Further, in an optional embodiment, as shown in fig. 4, the present invention performs shift and accumulation operations on the multiplication result in stages, and specifically, the shift and accumulation operations may be performed on the multiplication result output by the binary nonvolatile memory array and the multiplication result output by the multivalued nonvolatile memory array, respectively, and then the shift and accumulation operations may be further performed on the result obtained by the shift and accumulation operations.
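The staged shift and accumulation can be checked numerically with a short NumPy sketch. It assumes ideal, noise-free arrays, so each array read is modelled as an exact integer product; the names `slice_array` and `sliced_vmm`, the configuration tuples `(m, p, chunk_lengths)`, and the choice to apply the input-slice weight in the third stage are illustrative assumptions, not details taken from the patent.

```python
import numpy as np


def slice_array(a, m, p, chunk_lengths):
    """Return (slices, weights); slices[d] holds digit d of every element of a."""
    assert sum(chunk_lengths) == m - p
    a = np.asarray(a, dtype=np.int64)
    u = a & ((1 << m) - 1)                        # m-bit two's complement patterns
    slices, weights = [], []
    for i in range(p):                            # sign bit, then p-1 high bits
        shift = m - 1 - i
        slices.append((u >> shift) & 1)
        weights.append(-(1 << shift) if i == 0 else (1 << shift))
    remaining = m - p
    for length in chunk_lengths:                  # k multi-level low-order digits
        remaining -= length
        slices.append((u >> remaining) & ((1 << length) - 1))
        weights.append(1 << remaining)
    return np.stack(slices), np.array(weights, dtype=np.int64)


def sliced_vmm(x, W, cfg_x, cfg_w):
    """Vector-matrix product assembled from slice products in three stages."""
    W = np.asarray(W, dtype=np.int64)
    xs, ux = slice_array(x, *cfg_x)               # p2+k2 input slices
    Ws, vw = slice_array(W, *cfg_w)               # p1+k1 stored slice matrices
    p1 = cfg_w[1]
    total = np.zeros(W.shape[1], dtype=np.int64)
    for xa, ua in zip(xs, ux):                    # input slices are applied in turn
        partial = np.stack([xa @ Wb for Wb in Ws])  # one ideal read per stored slice
        # stage 1: shift-accumulate the outputs of the p1 binary arrays
        binary_sum = (vw[:p1, None] * partial[:p1]).sum(axis=0)
        # stage 2: shift-accumulate the outputs of the k1 multi-value arrays
        multi_sum = (vw[p1:, None] * partial[p1:]).sum(axis=0)
        # stage 3: combine both and apply the weight of the current input slice
        total += ua * (binary_sum + multi_sum)
    return total


rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=6)
W = rng.integers(-128, 128, size=(6, 4))
y = sliced_vmm(x, W, cfg_x=(8, 2, (2, 4)), cfg_w=(8, 4, (1, 3)))
```

With exact arithmetic, `y` equals the direct product `x @ W` for any valid configuration; in a physical array the slice products would carry device and converter errors, which is why the split between binary and multi-value arrays matters.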

It should be noted that the vector-matrix multiplication method with adjustable in-memory computing precision can balance the requirement for calculation precision against the requirement for calculation energy efficiency: when calculation precision must be guaranteed, more high-order data are operated on in the binary nonvolatile memory arrays; when calculation energy efficiency must be guaranteed, more low-order data are operated on in the multi-value nonvolatile memory arrays. Therefore, depending on the user's accuracy requirement for the operation, precision can be ensured by operating on the binary nonvolatile memory arrays, and energy efficiency can be ensured by operating on the multi-value nonvolatile memory arrays; this is achieved by adjusting the values of p₁ and k₁.

Embodiment 2

A vector-matrix multiplication arithmetic unit with adjustable in-memory computing precision, which can be used to implement the vector-matrix multiplication method with adjustable in-memory computing precision provided by embodiment 1 of the present invention, includes: an external input module, an in-memory computing module and a shift accumulation module;

the external input module is also used for converting each operation data in the input vector into mixed-radix data of length p₂+k₂ based on the splitting conversion rule of the operation data, to obtain the converted input vector; grouping the mixed-radix data in the converted input vector by digit position to obtain 1 sign bit vector, p₂-1 high-order vectors and k₂ low-order vectors, and inputting them in sequence into the p₁+k₁ nonvolatile memory arrays; it should be noted that the precision of the input vector and the precision of the storage matrix may be the same or different.

The in-memory computing module is used for performing in sequence, on the nonvolatile memory arrays, the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p₁+k₁ matrices;

the splitting conversion rule of the operation data is as follows: the operation data are split according to the arithmetic rules of signed binary numbers; specifically, the operation data are converted into an m-bit two's complement representation; the first p bits of the two's complement representation are split bit by bit to obtain 1 sign bit data and p-1 high-order data; the remaining m-p bits of the two's complement representation are split in sequence into k binary numbers of lengths m₁, m₂, …, m_k, and each is converted into the corresponding decimal number to obtain k low-order data; the 1 sign bit data, p-1 high-order data and k low-order data, arranged in order from high to low, are recorded as mixed-radix data of length p+k; wherein m₁ + m₂ + … + m_k = m - p;

specifically, as shown in FIG. 5, the non-volatile memory array includes an input port and an output port, wherein the input port is formed by row lines of the array and the output port is formed by column lines of the array. The nonvolatile memory array is of a cross structure, and the nonvolatile memory devices are positioned on the cross points; the selected nonvolatile memory device can be Resistive Random Access Memory (RRAM), Phase Change Memory (PCM), NOR-FLASH, spin transfer torque magnetic memory (STT-MRAM), ferroelectric field effect transistor (FeFET), etc. The nonvolatile memory device is used for carrying out binary storage or multi-value storage; recording a nonvolatile memory array of the nonvolatile memory device based on binary storage as a binary nonvolatile memory array; a nonvolatile memory array of a nonvolatile memory device based on multi-value storage is recorded as a multi-value nonvolatile memory array.

In an optional implementation, the nonvolatile memory arrays include binary nonvolatile memory arrays and multi-value nonvolatile memory arrays; there are p₁ binary nonvolatile memory arrays, which store the sign bit matrix and the high-order matrices input by the external input module; there are k₁ multi-value nonvolatile memory arrays, which store the low-order matrices input by the external input module.

Specifically, as shown in Fig. 6, the in-memory computing module includes a binary operation unit and a multi-value operation unit;

a binary operation unit:

the binary operation unit comprises p₁ binary operators; as shown in Fig. 7, each binary operator includes a binary nonvolatile memory array and a first peripheral circuit;

the first peripheral circuit comprises a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output end of the first digital-to-analog conversion module is connected with the input end of the binary nonvolatile storage array; the output end of the binary nonvolatile storage array is connected with the input end of the first analog-to-digital conversion module;

the first digital-to-analog conversion module is used for inputting a sign bit vector, a high bit vector or a low bit vector which is input by the external input module into the binary nonvolatile storage array in a voltage mode; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on a multiplication operation result which is output by the binary nonvolatile memory array and is characterized by current.

In an optional implementation manner, the first digital-to-analog conversion module includes a plurality of digital-to-analog converters (DACs), whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the binary nonvolatile memory array; the first analog-to-digital conversion module comprises a plurality of groups of cascaded transimpedance amplifiers and analog-to-digital converters (ADCs), wherein the input end of each group is connected in one-to-one correspondence with the output end of a column participating in the operation in the binary nonvolatile memory array; within each group, the output end of the transimpedance amplifier is connected with the input end of the analog-to-digital converter.

Further, in an alternative embodiment, the array size for the memory matrix in the binary nonvolatile memory array is M × N; the first digital-to-analog conversion module is composed of M digital-to-analog converters (DAC), and the first analog-to-digital conversion module is composed of N trans-impedance amplifiers and N analog-to-digital converters (ADC).
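The signal path of one binary operator (DACs on the M rows, an M × N binary array, and a transimpedance amplifier plus ADC on each of the N columns) can be illustrated with an idealized Python model. This is a sketch under stated assumptions, not the circuit of the disclosure: the function name `binary_crossbar_vmm`, the read voltage `v_read` and the on-state conductance `g_on` are hypothetical, the off-state conductance is taken as zero, and the DAC and ADC are treated as ideal and linear.

```python
def binary_crossbar_vmm(digits_in, bit_matrix, v_read=0.2, g_on=1e-4):
    """Idealized model of one binary operator.

    digits_in  : M input digits, one per row (0/1 for sign or high-order vectors)
    bit_matrix : M x N matrix of stored bits (0 or 1)
    Returns the N column results as integers.
    """
    n_rows, n_cols = len(bit_matrix), len(bit_matrix[0])
    if len(digits_in) != n_rows:
        raise ValueError("one input digit is required per row")

    # DAC stage: each digit becomes a row voltage (assumed linear).
    voltages = [d * v_read for d in digits_in]

    # Array stage: each column current is the sum of V_i * G_ij over the rows.
    currents = [
        sum(voltages[i] * bit_matrix[i][j] * g_on for i in range(n_rows))
        for j in range(n_cols)
    ]

    # TIA + ADC stage: the current is read back in units of v_read * g_on.
    unit = v_read * g_on
    return [round(current / unit) for current in currents]


# Hypothetical 3 x 2 example.
print(binary_crossbar_vmm([1, 0, 1], [[1, 0], [1, 1], [0, 1]]))  # [1, 1]
```

In a real array the off-state conductance, wire resistance and converter quantization would all perturb these values; the model only shows how the multiply-accumulate maps onto the rows and columns.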

Multi-value operation unit:

The multi-value operation unit includes k₁ multi-value operators; as shown in FIG. 8, each multi-value operator includes a multi-value nonvolatile memory array and a second peripheral circuit;

The second peripheral circuit comprises a second digital-to-analog conversion module, a differential module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected with the input end of the multi-value nonvolatile memory array; the output end of the multi-value nonvolatile memory array is connected with the input end of the differential module, and the output end of the differential module is connected with the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for inputting the sign bit vector, the high-order vector or the low-order vector into the multi-value nonvolatile memory array in the form of voltage; the output ends of the multi-value nonvolatile memory array are grouped in pairs, and the differential module is used for performing a differential operation on each pair of outputs of the multi-value nonvolatile memory array; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the differential operation result input by the differential module.

In an optional implementation manner, the second digital-to-analog conversion module includes a plurality of digital-to-analog converters, whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the multi-value nonvolatile memory array; the differential module comprises a plurality of differential units, whose input ends are connected in one-to-one correspondence with the pairs of outputs of the multi-value nonvolatile memory array; each differential unit comprises two transimpedance amplifiers and a voltage subtractor; after the two outputs of each pair are amplified by the respective transimpedance amplifiers, the differential operation is performed by the voltage subtractor; the second analog-to-digital conversion module comprises a plurality of analog-to-digital converters, which are connected in one-to-one correspondence with the output ends of the differential units.

Further, in an alternative embodiment, the array size used for storing a matrix in the multi-value nonvolatile memory array is K × 2L. For example, the size of the multi-value nonvolatile memory array is X × Y, wherein X is a positive integer and Y is an even number; when a matrix of size K × L is stored in the first K rows and the first 2L columns of the multi-value nonvolatile memory array, the element in row k and column l of the matrix is the difference between the conductance value at row k, column 2l-1 and the conductance value at row k, column 2l of the multi-value nonvolatile memory array, wherein k = 1, 2, …, K; l = 1, 2, …, L; K ≤ X; and 2L ≤ Y. The second digital-to-analog conversion module is composed of K digital-to-analog converters (DACs), and the readout path following the array comprises 2L transimpedance amplifiers and L analog-to-digital converters (ADCs). Specifically, the differential module is composed of 2L transimpedance amplifiers and L voltage subtractors, wherein the transimpedance amplifiers are connected to the output ends of the multi-value nonvolatile memory array, and every two adjacent columns of the array form a differential pair connected to one voltage subtractor; the second analog-to-digital conversion module is composed of L analog-to-digital converters (ADCs), whose input ends are connected with the output ends of the voltage subtractors.
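The differential storage scheme can be illustrated with a short Python sketch. The function names `to_differential` and `differential_vmm` are hypothetical. The text only requires that each matrix element equal the difference of two adjacent conductances; placing the positive part in the odd column and the negative part in the even column is one possible choice, assumed here, and conductances are expressed in arbitrary integer units.

```python
def to_differential(matrix):
    """Map a K x L matrix onto a K x 2L array of non-negative conductances.

    Element (k, l) equals G[k][2l-1] - G[k][2l] in the 1-based column numbering
    of the text (columns 2l and 2l+1 in 0-based Python indexing).
    """
    conductances = []
    for row in matrix:
        g_row = []
        for w in row:
            g_row.append(max(w, 0))    # first column of the pair
            g_row.append(max(-w, 0))   # second column of the pair
        conductances.append(g_row)
    return conductances


def differential_vmm(inputs, conductances):
    """Column currents followed by pairwise subtraction (the differential units)."""
    n_cols = len(conductances[0])
    currents = [
        sum(inputs[k] * conductances[k][c] for k in range(len(inputs)))
        for c in range(n_cols)
    ]
    return [currents[2 * l] - currents[2 * l + 1] for l in range(n_cols // 2)]


# Hypothetical 2 x 2 example (K = 2, L = 2, so the array is 2 x 4).
W = [[3, -1], [0, 2]]
G = to_differential(W)
print(G)                            # [[3, 0, 0, 1], [0, 0, 2, 0]]
print(differential_vmm([1, 2], G))  # [3, 3]
```

The subtraction performed after the transimpedance amplifiers recovers the same result as multiplying the input vector by the original K × L matrix.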

Further, in an optional implementation manner, the shift accumulation module includes a first shift accumulation unit, a second shift accumulation unit, and a third shift accumulation unit; in this embodiment, the shift accumulation module comprises an arithmetic logic unit (ALU) and a corresponding cache structure;

In an optional implementation manner, the vector matrix multiplication operator with adjustable in-memory precision further includes: a control module; specifically, the control module comprises an Arithmetic Logic Unit (ALU) and a corresponding cache structure;

The control module is respectively connected with the external input module, the in-memory computing module and the shift accumulation module, and is used for controlling the overall working timing of the external input module, the in-memory computing module and the shift accumulation module during the operation.

In an alternative embodiment, a schematic structural diagram of the vector matrix multiplication operator with adjustable in-memory precision is shown in FIG. 9, and its hardware structure includes: an external input module, an in-memory computing module and a shift accumulation module. The in-memory computing module comprises a binary operation unit and a multi-value operation unit, and each operation unit internally comprises a plurality of operators. Each operator is composed of a nonvolatile memory array and its corresponding peripheral circuits. During operation, the matrix to be operated on is split according to the binary operation rule and stored in the nonvolatile memory arrays of the operators. The input vector is split by the external input module according to the binary operation rule and then input into the in-memory computing module. After the in-memory computing module finishes computing, the results are input into the shift accumulation module, which completes the shift accumulation operation according to the binary operation rule and outputs the calculation result. The invention adopts a hybrid operation architecture: the binary operation unit processes the high-order data and the multi-value operation unit processes the low-order data, which effectively reduces the influence of device non-idealities on the calculation result while maintaining the energy efficiency of the calculation, so that in-memory computing with high robustness, high energy efficiency and adjustable precision can be realized. The related technical features are the same as above, and are not described herein.
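The overall data flow (split the matrix, split the input vector, multiply every vector slice with every matrix slice, then shift and accumulate) can be checked numerically with the following Python sketch. It models only the arithmetic, not the analog circuits. The names `split_mixed` and `vmm_mixed`, the common word length m for matrix and vector, and all example values are assumptions made for illustration.

```python
def split_mixed(x, m, p, seg_lens):
    """Return (digits, weights) of x: p single bits followed by k low-order digits."""
    u = x & ((1 << m) - 1)                 # m-bit two's complement pattern
    digits, weights, pos = [], [], m
    for width in [1] * p + list(seg_lens):
        pos -= width
        digits.append((u >> pos) & ((1 << width) - 1))
        weights.append(1 << pos)
    weights[0] = -weights[0]               # sign bit has negative weight
    return digits, weights


def vmm_mixed(x, W, m, p1, segs1, p2, segs2):
    """Vector-matrix product computed slice by slice with shift-and-accumulate."""
    rows, cols = len(W), len(W[0])
    # p1 + k1 matrix slices: w_digits[i][j][s] is digit s of element W[i][j].
    w_digits = [[split_mixed(W[i][j], m, p1, segs1)[0] for j in range(cols)]
                for i in range(rows)]
    w_weights = split_mixed(0, m, p1, segs1)[1]
    # p2 + k2 vector slices: x_digits[i][t] is digit t of element x[i].
    x_digits = [split_mixed(xi, m, p2, segs2)[0] for xi in x]
    x_weights = split_mixed(0, m, p2, segs2)[1]

    result = [0] * cols
    for t, wx in enumerate(x_weights):          # vector slices, fed in sequence
        for s, ww in enumerate(w_weights):      # one storage array per matrix slice
            partial = [sum(x_digits[i][t] * w_digits[i][j][s] for i in range(rows))
                       for j in range(cols)]
            # Each weight is a signed power of two, so this is a shift plus add/subtract.
            for j in range(cols):
                result[j] += wx * ww * partial[j]
    return result


# Hypothetical example with 8-bit data; matrix split 4 + 2, vector split 2 + 2.
x = [3, -5, 7]
W = [[1, -2], [4, 5], [-6, 7]]
print(vmm_mixed(x, W, m=8, p1=4, segs1=[2, 2], p2=2, segs2=[3, 3]))  # [-59, 18]
```

Because every weight is a signed power of two, the accumulation needs only shifts, additions and subtractions, which matches the role of the shift accumulation module.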

It should be noted that, for different operation requirements, the number of nonvolatile memory arrays called may be adjusted to perform operations with different precisions, and the storage precision of the devices in the multi-value operation unit may also be adjusted, so that operations with adjustable precision are fully realized. Further, when calculation accuracy is the priority, more high-order data can be operated on in the binary operation unit; when calculation energy efficiency is the priority, more low-order data can be operated on in the multi-value operation unit. In the limiting cases, performing the whole operation in the binary operation unit ensures the operation precision, and performing the whole operation in the multi-value operation unit ensures the operation energy efficiency.
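The trade-off described above can be made concrete by counting, for a given split, how many arrays of each kind are called and how many distinct values each low-order digit can take. The following Python helper is a hypothetical illustration: the name `array_budget` and the returned dictionary keys are not part of the disclosure, and the count of 2^(mᵢ) digit values per low-order segment follows directly from converting an mᵢ-bit segment to a decimal number.

```python
def array_budget(m, p, seg_lens):
    """Summarize the arrays needed for one m-bit split (p single bits, k segments)."""
    if p < 1 or sum(seg_lens) != m - p:
        raise ValueError("need p >= 1 and segment lengths summing to m - p")
    return {
        "binary_arrays": p,                           # sign bit + p-1 high-order bits
        "multi_value_arrays": len(seg_lens),          # one per low-order segment
        "digit_levels": [1 << w for w in seg_lens],   # values per low-order digit
    }


# Hypothetical 8-bit configurations, from precision-oriented to efficiency-oriented.
print(array_budget(8, 8, []))      # all bits handled by binary arrays
print(array_budget(8, 4, [2, 2]))  # 4 binary arrays, 2 multi-value arrays with 4 levels
print(array_budget(8, 2, [3, 3]))  # 2 binary arrays, 2 multi-value arrays with 8 levels
```

Moving bits from the binary part into longer low-order segments reduces the number of arrays but requires each multi-value digit to distinguish more levels.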

Further, under the condition that the usage scenario allows (such as hybrid precision architecture solution equation and inference application of a neural network), the binary operation unit can be used for simultaneously completing the binary operation requirement and the multi-value operation requirement, so that the circuit area is reduced, and higher operation energy efficiency is ensured.

The related technical features are the same as those of embodiment 1, and are not described herein.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A vector-matrix multiplication method with adjustable in-memory precision, characterized by comprising the following steps:

S1. Based on the split conversion rule of the operation data, convert each operation datum in the storage matrix into mixed binary data with a length of p₁ + k₁ to obtain a converted storage matrix; integrate the mixed binary data in the converted storage matrix bit by bit to obtain 1 sign bit matrix, p₁ - 1 high-order matrices and k₁ low-order matrices, and store them in the corresponding non-volatile storage arrays in order from high order to low order; each non-volatile storage array stores one matrix, the number of non-volatile storage arrays used for the operation is p₁ + k₁, and p₁ and k₁ are adjustable according to the required calculation precision;

S2. Based on the split conversion rule of the operation data, convert each operation datum in the input vector into mixed binary data with a length of p₂ + k₂ to obtain a converted input vector; integrate the mixed binary data in the converted input vector bit by bit to obtain 1 sign bit vector, p₂ - 1 high-order vectors and k₂ low-order vectors;

S3. Input the sign bit vector, the high-order vectors and the low-order vectors into the p₁ + k₁ non-volatile storage arrays in sequence, so as to sequentially realize the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p₁ + k₁ matrices;

S4. Based on the binary operation rule, shift and accumulate the obtained multiplication results to obtain the vector-matrix multiplication result of the input vector and the storage matrix;

The split conversion rule of the operation data is: perform a complement operation on the operation data to obtain an m-bit two's complement expression; split the first p bits of the complement expression bit by bit to obtain 1 sign bit datum and p-1 high-order data; sequentially split the remaining m-p bits of the complement expression into k binary segments with lengths m₁, m₂, …, mₖ respectively, and convert each segment into the corresponding decimal number to obtain k low-order data; the 1 sign bit datum, p-1 high-order data and k low-order data, arranged in sequence from high order to low order, form the mixed binary data of length p + k; wherein m₁ + m₂ + … + mₖ = m - p.

2. The vector-matrix multiplication method according to claim 1, wherein the non-volatile storage array is of a crossbar structure, and the non-volatile storage devices are located at the cross points; the non-volatile storage device is used for binary storage or multi-value storage; a non-volatile storage array whose non-volatile storage devices perform binary storage is referred to as a binary non-volatile storage array; a non-volatile storage array whose non-volatile storage devices perform multi-value storage is referred to as a multi-value non-volatile storage array;

The non-volatile storage array includes the binary non-volatile storage array and the multi-value non-volatile storage array;

The binary nonvolatile storage array is used to store the sign bit matrix and the high order matrix;

The multi-value non-volatile storage array is used to store the low-order matrices.

3. A vector-matrix multiplier with adjustable in-memory precision, comprising: an external input module, an in-memory calculation module and a shift-accumulate module;

The in-memory computing module includes p₁ + k₁ non-volatile storage arrays, wherein p₁ and k₁ are adjustable according to the required computing precision;

The external input module is used to convert each operation datum in the storage matrix into mixed binary data with a length of p₁ + k₁ based on the split conversion rule of the operation data, to obtain the converted storage matrix; the mixed binary data in the converted storage matrix is integrated bit by bit to obtain 1 sign bit matrix, p₁ - 1 high-order matrices and k₁ low-order matrices, which are stored in the corresponding non-volatile storage arrays in order from high order to low order; each non-volatile storage array stores one matrix;

The external input module is also used to convert each operation datum in the input vector into mixed binary data with a length of p₂ + k₂ based on the split conversion rule of the operation data, to obtain the converted input vector; the mixed binary data in the converted input vector is integrated bit by bit to obtain 1 sign bit vector, p₂ - 1 high-order vectors and k₂ low-order vectors, which are input in sequence into the p₁ + k₁ non-volatile storage arrays;

The in-memory computing module is configured to sequentially implement, based on the non-volatile storage arrays, the multiplication of the sign bit vector, the high-order vectors and the low-order vectors with the p₁ + k₁ matrices;

The shift-accumulation module is used for shifting and accumulating the obtained multiplication result based on the binary operation rule, to obtain the vector-matrix multiplication result of the input vector and the storage matrix;

The split conversion rule of the operation data is: perform a complement operation on the operation data to obtain an m-bit two's complement expression; split the first p bits of the complement expression bit by bit to obtain 1 sign bit datum and p-1 high-order data; sequentially split the remaining m-p bits of the complement expression into k binary segments with lengths m₁, m₂, …, mₖ respectively, and convert each segment into the corresponding decimal number to obtain k low-order data; the 1 sign bit datum, p-1 high-order data and k low-order data, arranged in sequence from high order to low order, are recorded as the mixed binary data of length p + k; wherein m₁ + m₂ + … + mₖ = m - p.

4. The vector-matrix multiplier according to claim 3, wherein the non-volatile storage array is of a crossbar structure, and the non-volatile storage devices are located at the cross points; the non-volatile storage device is used for binary storage or multi-value storage; a non-volatile storage array whose non-volatile storage devices perform binary storage is referred to as a binary non-volatile storage array; a non-volatile storage array whose non-volatile storage devices perform multi-value storage is referred to as a multi-value non-volatile storage array;

The non-volatile storage array includes a binary non-volatile storage array and a multi-value non-volatile storage array;

5. The vector-matrix multiplier according to claim 4, wherein the non-volatile memory device comprises: resistive memory, phase change memory, NOR-FLASH, spin transfer torque magnetic memory or ferroelectric field effect transistor.

6. The vector-matrix multiplier according to claim 4, wherein the in-memory computing module comprises a binary arithmetic unit and a multi-valued arithmetic unit;

The binary operation unit includes p₁ binary operators; each binary operator includes the binary non-volatile storage array and a first peripheral circuit; the first peripheral circuit includes a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output end of the first digital-to-analog conversion module is connected to the input end of the binary non-volatile storage array; the output end of the binary non-volatile storage array is connected to the input end of the first analog-to-digital conversion module; the first digital-to-analog conversion module is used to input the sign bit vector, the high-order vector or the low-order vector into the binary non-volatile storage array in the form of voltage; the first analog-to-digital conversion module is configured to perform analog-to-digital conversion on the multiplication result that is output by the binary non-volatile storage array and characterized as current;

The multi-value operation unit includes k₁ multi-value operators; each multi-value operator includes the multi-value non-volatile storage array and a second peripheral circuit; the second peripheral circuit includes a second digital-to-analog conversion module, a differential module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected to the input end of the multi-value non-volatile storage array; the output end of the multi-value non-volatile storage array is connected to the input end of the differential module, and the output end of the differential module is connected to the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used to input the sign bit vector, the high-order vector or the low-order vector into the multi-value non-volatile storage array in the form of voltage; the output ends of the multi-value non-volatile storage array are grouped in pairs, and the differential module is used to perform a differential operation on each pair of outputs of the multi-value non-volatile storage array; the second analog-to-digital conversion module is configured to perform analog-to-digital conversion on the differential operation result input by the differential module.

7. The vector-matrix multiplier according to claim 6, wherein the first digital-to-analog conversion module comprises a plurality of digital-to-analog converters, the output terminals of which are connected in one-to-one correspondence to the input terminals of the rows participating in the operation in the binary non-volatile storage array;

The first analog-to-digital conversion module includes multiple groups of cascaded transimpedance amplifiers and analog-to-digital converters, the input terminals of which are connected in one-to-one correspondence to the output terminals of the columns participating in the operation in the binary non-volatile storage array.

8. The vector-matrix multiplier according to claim 6, wherein the second digital-to-analog conversion module comprises a plurality of digital-to-analog converters, the output terminals of which are connected in one-to-one correspondence to the input terminals of the rows participating in the operation in the multi-value non-volatile storage array;

The differential module includes a plurality of differential units, the input ends of which are connected in one-to-one correspondence to the pairs of outputs of the multi-value non-volatile storage array; wherein each differential unit includes two transimpedance amplifiers and a voltage subtractor; after the two outputs of each pair of the multi-value non-volatile storage array are amplified by the respective transimpedance amplifiers, the differential operation is performed by the voltage subtractor;

The second analog-to-digital conversion module includes a plurality of analog-to-digital converters, which are connected to the output ends of the differential units in a one-to-one correspondence.

9. The vector-matrix multiplier according to claim 6, wherein the shift accumulation module comprises a first shift accumulation unit, a second shift accumulation unit and a third shift accumulation unit;

The input end of the first shift accumulation unit is connected to the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected to the output end of each multi-value operator in the multi-value operation unit; the input ends of the third shift accumulation unit are respectively connected to the output ends of the first shift accumulation unit and the second shift accumulation unit;

The first shift-accumulation unit is used to perform shift and accumulation operations on the multiplication result output by each binary operator based on the binary operation rule;

The second shift-accumulation unit is used to perform shift and accumulation operations on the multiplication result output by each multi-value operator based on the binary operation rule;

The third shift-accumulation unit is configured to perform shift and accumulation operations on the outputs of the first shift-accumulation unit and the second shift-accumulation unit based on the binary operation rule, to obtain the vector-matrix multiplication result of the input vector and the storage matrix.

10. The vector-matrix multiplier according to any one of claims 3-9, further comprising: a control module;

The control module is respectively connected with the external input module, the in-memory calculation module and the shift-accumulation module, and is used for controlling the working timing of the external input module, the in-memory calculation module and the shift-accumulation module during the operation.