Disclosure of Invention
In view of the defects of, or the need to improve upon, the prior art, the invention provides an in-memory vector-matrix multiplication method with adjustable precision, and a corresponding arithmetic unit, for realizing variable-precision vector-matrix operations. It addresses the technical problem that existing memristive vector-matrix operation units cannot adjust their calculation precision to suit different applications.
In order to achieve the above object, in a first aspect, the present invention provides an in-memory vector-matrix multiplication method with adjustable precision, including the following steps:
S1, converting each operation datum in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting conversion rule for operation data, to obtain the converted storage matrix; collecting the digits of the mixed-radix data in the converted storage matrix position by position to obtain 1 sign-bit matrix, p_1 - 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix, and the number of nonvolatile memory arrays used for the operation is p_1 + k_1; p_1 and k_1 are adjustable according to the required calculation precision;
S2, converting each operation datum in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting conversion rule for operation data, to obtain the converted input vector; collecting the digits of the mixed-radix data in the converted input vector position by position to obtain 1 sign-bit vector, p_2 - 1 high-order vectors and k_2 low-order vectors;
S3, inputting the sign-bit vector, the high-order vectors and the low-order vectors in sequence into the p_1 + k_1 nonvolatile memory arrays, thereby performing, in turn, the multiplication of the sign-bit vector, each high-order vector and each low-order vector with the p_1 + k_1 matrices;
S4, shifting and accumulating the resulting products according to the rules of binary arithmetic to obtain the vector-matrix product of the input vector and the storage matrix;
the splitting conversion rule for operation data is as follows: taking the two's complement of the operation datum to obtain an m-bit two's-complement representation; splitting the first p bits of the representation bit by bit to obtain 1 sign bit and p - 1 high-order bits; splitting the last m - p bits of the representation, in order, into k binary segments of lengths m_1, m_2, …, m_k, and converting each segment into the corresponding decimal number to obtain k low-order digits; the 1 sign bit, p - 1 high-order bits and k low-order digits, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein m_1 + m_2 + … + m_k = m - p;
further preferably, the nonvolatile memory array has a crossbar structure, with a nonvolatile memory device located at each cross point; each nonvolatile memory device performs either binary storage or multi-value storage; an array of devices using binary storage is referred to as a binary nonvolatile memory array, and an array of devices using multi-value storage is referred to as a multi-value nonvolatile memory array;
the nonvolatile memory arrays comprise binary nonvolatile memory arrays and multi-value nonvolatile memory arrays;
the binary nonvolatile memory arrays are used for storing the sign-bit matrix and the high-order matrices;
the multi-value nonvolatile memory arrays are used for storing the low-order matrices.
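As a worked check on the splitting conversion rule above, the following identity shows how the original value is recovered from its mixed-radix digits. It is a sketch derived from the standard value formula for two's complement; the symbols s (sign bit), h_i (high-order bits), d_j (low-order digits) and w_j (shift amounts) are names introduced here for illustration and do not appear in the original text.

```latex
x \;=\; -\,s\,2^{m-1} \;+\; \sum_{i=1}^{p-1} h_i\,2^{\,m-1-i} \;+\; \sum_{j=1}^{k} d_j\,2^{\,w_j},
\qquad w_j = \sum_{l=j+1}^{k} m_l,
\qquad 0 \le d_j \le 2^{m_j}-1 .
```

For instance, with m = 8, p = 2 and (m_1, m_2) = (2, 4), the value -3 has the two's-complement representation 11111101, so s = 1, h_1 = 1, d_1 = 3 (from 11) and d_2 = 13 (from 1101), and -128 + 64 + 3·16 + 13 = -3. Every weight is a signed power of two, which is why the final combination can be carried out by shifting and accumulating.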
In a second aspect, the present invention provides an in-memory vector-matrix multiplication arithmetic unit with adjustable precision, including: an external input module, an in-memory computing module and a shift accumulation module;
the in-memory computing module includes p_1 + k_1 nonvolatile memory arrays, where p_1 and k_1 are adjustable according to the required calculation precision;
the external input module is used for converting each operation datum in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting conversion rule for operation data, to obtain the converted storage matrix; and for collecting the digits of the mixed-radix data in the converted storage matrix position by position to obtain 1 sign-bit matrix, p_1 - 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix;
the external input module is also used for converting each operation datum in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting conversion rule for operation data, to obtain the converted input vector; and for collecting the digits of the mixed-radix data in the converted input vector position by position to obtain 1 sign-bit vector, p_2 - 1 high-order vectors and k_2 low-order vectors, which are input in sequence into the p_1 + k_1 nonvolatile memory arrays;
the in-memory computing module is used for performing, in turn, the multiplication of the sign-bit vector, each high-order vector and each low-order vector with the p_1 + k_1 matrices on the nonvolatile memory arrays;
the shift accumulation module is used for shifting and accumulating the resulting products according to the rules of binary arithmetic to obtain the vector-matrix product of the input vector and the storage matrix;
the splitting conversion rule for operation data is as follows: taking the two's complement of the operation datum to obtain an m-bit two's-complement representation; splitting the first p bits of the representation bit by bit to obtain 1 sign bit and p - 1 high-order bits; splitting the last m - p bits of the representation, in order, into k binary segments of lengths m_1, m_2, …, m_k, and converting each segment into the corresponding decimal number to obtain k low-order digits; the 1 sign bit, p - 1 high-order bits and k low-order digits, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein m_1 + m_2 + … + m_k = m - p;
further preferably, the nonvolatile memory array has a crossbar structure, with a nonvolatile memory device located at each cross point; each nonvolatile memory device performs either binary storage or multi-value storage; an array of devices using binary storage is referred to as a binary nonvolatile memory array, and an array of devices using multi-value storage is referred to as a multi-value nonvolatile memory array;
the nonvolatile memory arrays comprise binary nonvolatile memory arrays and multi-value nonvolatile memory arrays;
the binary nonvolatile memory arrays are used for storing the sign-bit matrix and the high-order matrices;
the multi-value nonvolatile memory arrays are used for storing the low-order matrices.
Further preferably, the nonvolatile memory device includes: resistive random access memory, phase change memory, NOR-FLASH, spin transfer torque magnetic memory or ferroelectric field effect transistor.
Further preferably, the in-memory computing module includes a binary operation unit and a multi-value operation unit;
the binary operation unit comprises p_1 binary operators; each binary operator comprises a binary nonvolatile memory array and a first peripheral circuit; the first peripheral circuit comprises a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output of the first digital-to-analog conversion module is connected to the input of the binary nonvolatile memory array; the output of the binary nonvolatile memory array is connected to the input of the first analog-to-digital conversion module; the first digital-to-analog conversion module is used for applying the sign-bit vector, a high-order vector or a low-order vector to the binary nonvolatile memory array as voltages; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on the multiplication result, represented as currents, that is output by the binary nonvolatile memory array;
the multi-value operation unit includes k_1 multi-value operators; each multi-value operator comprises a multi-value nonvolatile memory array and a second peripheral circuit; the second peripheral circuit comprises a second digital-to-analog conversion module, a differential module and a second analog-to-digital conversion module; the output of the second digital-to-analog conversion module is connected to the input of the multi-value nonvolatile memory array; the output of the multi-value nonvolatile memory array is connected to the input of the differential module, and the output of the differential module is connected to the input of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for applying the sign-bit vector, a high-order vector or a low-order vector to the multi-value nonvolatile memory array as voltages; the outputs of the multi-value nonvolatile memory array are grouped in pairs, and the differential module is used for performing a differential operation on each pair of outputs; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the differential result output by the differential module.
Further preferably, the first digital-to-analog conversion module comprises a plurality of digital-to-analog converters whose outputs are connected one-to-one to the inputs of the rows of the binary nonvolatile memory array that participate in the operation; the first analog-to-digital conversion module comprises a plurality of groups, each consisting of a cascaded transimpedance amplifier and analog-to-digital converter, whose inputs are connected one-to-one to the outputs of the columns of the binary nonvolatile memory array that participate in the operation.
Further preferably, the second digital-to-analog conversion module comprises a plurality of digital-to-analog converters whose outputs are connected one-to-one to the inputs of the rows of the multi-value nonvolatile memory array that participate in the operation; the differential module comprises a plurality of differential units whose inputs are connected one-to-one to the output pairs of the multi-value nonvolatile memory array; each differential unit comprises two transimpedance amplifiers and a voltage subtractor; the two outputs of each pair are amplified by the transimpedance amplifiers and then subtracted by the voltage subtractor; the second analog-to-digital conversion module comprises a plurality of analog-to-digital converters connected one-to-one to the outputs of the differential units.
Further preferably, the shift accumulation module comprises a first shift accumulation unit, a second shift accumulation unit and a third shift accumulation unit;
the input end of the first shift accumulation unit is connected with the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected with the output end of each multi-value operator in the multi-value operation unit; the input end of the third shift accumulation unit is respectively connected with the output ends of the first shift accumulation unit and the second shift accumulation unit;
the first shift accumulation unit is used for carrying out shift and accumulation operation on multiplication operation results output by each binary operator based on a binary operation rule;
the second shift accumulation unit is used for carrying out shift and accumulation operation on the multiplication operation result output by each multi-value operator based on the binary operation rule;
the third shift accumulation unit is used for shifting and accumulating the outputs of the first shift accumulation unit and the second shift accumulation unit based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix.
Further preferably, the in-memory vector-matrix multiplication arithmetic unit with adjustable precision further includes: a control module;
the control module is connected to the external input module, the in-memory computing module and the shift accumulation module respectively, and is used for controlling their working timing during the operation.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. The invention provides an in-memory vector-matrix multiplication method with adjustable precision and a corresponding arithmetic unit. The operation data in the input vector and in the storage matrix are split and converted into sign-bit data, high-order data and low-order data according to the splitting conversion rule for operation data, and the vector-matrix multiplication is then performed on nonvolatile memory arrays. Because the number of high-order and low-order digits is adjusted according to the required calculation precision, variable-precision vector-matrix operation is achieved, which solves the technical problem that existing memristive vector-matrix operation units cannot adjust their precision for different applications.
2. The method and arithmetic unit provided by the invention perform vector-matrix multiplication on binary and multi-value nonvolatile memory arrays, storing the sign-bit matrix and the high-order matrices in binary nonvolatile memory arrays and the low-order matrices in multi-value nonvolatile memory arrays. For in-memory computing based on nonvolatile memory, binary computation is markedly more accurate than multi-value computation, while multi-value computation is markedly more energy-efficient than binary computation. The invention therefore has the data bits that strongly affect the result computed by binary units, which secures precision, and the data bits that affect the result only slightly computed by multi-value units, which effectively reduces the influence of device non-idealities on the result while preserving energy efficiency. This yields robust, energy-efficient in-memory computing with adjustable precision.
3. With the method and arithmetic unit provided by the invention, more high-order data can be processed in binary nonvolatile memory arrays when calculation precision must be ensured, and more low-order data can be processed in multi-value nonvolatile memory arrays when energy efficiency must be ensured, so that the two requirements can be well balanced.
4. The arithmetic unit provided by the invention shifts and accumulates the multiplication results output by the nonvolatile memory arrays in stages. Because digital operation units offer very high precision and robustness, performing the shift accumulation digitally introduces no additional calculation error, which ensures the accuracy of the shift-accumulation result.
5. An arithmetic unit built on the in-memory vector-matrix multiplication method with adjustable precision is highly general: owing to its adjustable precision, it can support the vector-matrix multiplications found in workloads such as machine learning and scientific computing, and it offers clear advantages in computing power and power-consumption overhead over conventional computing systems.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
An in-memory vector-matrix multiplication method with adjustable precision, as shown in fig. 1, includes the following steps:
S1, converting each operation datum in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting conversion rule for operation data, to obtain the converted storage matrix; collecting the digits of the mixed-radix data in the converted storage matrix position by position to obtain 1 sign-bit matrix, p_1 - 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix, and the number of nonvolatile memory arrays used for the operation is p_1 + k_1; p_1 and k_1 are integers that can be adjusted according to the required calculation precision; for example, when executing a machine learning algorithm that requires 8-bit precision, one may take p_1 = k_1 = […]; when performing a scientific computing task that requires 16-bit precision, one may take p_1 = k_1 = 4.
The splitting conversion rule for operation data is as follows: take the two's complement of the operation datum to obtain an m-bit two's-complement representation; split the first p bits of the representation (the first bit being the sign bit) bit by bit to obtain 1 sign bit and p - 1 high-order bits; split the last m - p bits of the representation, in order, into k binary segments of lengths m_1, m_2, …, m_k, and convert each segment into the corresponding decimal number to obtain k low-order digits; the 1 sign bit, p - 1 high-order bits and k low-order digits, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein m_1 + m_2 + … + m_k = m - p, and p and k are integers that are adjustable according to the required calculation precision. It should be noted that, in the splitting process, the sign bit and bits 2 to p, which significantly affect the calculation result, are all kept as individual binary bits, whereas the last m - p bits are split, in order, into k binary segments of lengths m_1, m_2, …, m_k. In general, the nearer a segment lies to the high-order end, the shorter it is; in this embodiment, m_1, m_2, …, m_k increase from left to right. The values of p and k can be tuned according to the required calculation precision.
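The splitting conversion rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `split_operand` and `digit_weights`, and the example configuration m = 8, p = 2, (m_1, m_2) = (2, 4), are hypothetical choices made here.

```python
def split_operand(x, m, p, chunk_lengths):
    """Return [sign bit, p-1 high-order bits..., k low-order digits...] for x.

    x is interpreted as an m-bit two's-complement integer; chunk_lengths is
    the list (m_1, ..., m_k), which must satisfy p + sum(chunk_lengths) == m.
    """
    if p < 1 or any(length < 1 for length in chunk_lengths):
        raise ValueError("p and every segment length must be at least 1")
    if p + sum(chunk_lengths) != m:
        raise ValueError("need p + m_1 + ... + m_k == m")
    if not -(1 << (m - 1)) <= x < (1 << (m - 1)):
        raise ValueError("x does not fit in m-bit two's complement")
    bits = format(x & ((1 << m) - 1), f"0{m}b")    # two's-complement bit string
    digits = [int(b) for b in bits[:p]]            # first p bits stay binary
    pos = p
    for length in chunk_lengths:                   # last m-p bits, in order
        digits.append(int(bits[pos:pos + length], 2))  # segment -> decimal digit
        pos += length
    return digits


def digit_weights(m, p, chunk_lengths):
    """Signed power-of-two weight of each digit position, high to low."""
    weights = [-(1 << (m - 1))]                    # sign bit carries -2^(m-1)
    weights += [1 << (m - 1 - i) for i in range(1, p)]
    remaining = m - p
    for length in chunk_lengths:
        remaining -= length                        # bits to the right of this segment
        weights.append(1 << remaining)
    return weights


digits = split_operand(-3, 8, 2, [2, 4])
weights = digit_weights(8, 2, [2, 4])
print(digits, weights, sum(d * w for d, w in zip(digits, weights)))
```

Summing each digit times its weight recovers the original value, which is the property that the later shift-and-accumulate step relies on.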
Specifically, as shown in fig. 2, the mixed-radix data of the storage matrix comprises 1 sign bit, p_1 - 1 high-order bits and k_1 low-order digits, arranged in order from high to low. The sign bit and the high-order bits are binary data, obtained by splitting the first p_1 bits of the two's-complement representation of the operation datum bit by bit. The low-order digits are decimal data, obtained by splitting the last m - p_1 bits of the two's-complement representation, in order, into k_1 binary segments of lengths m_1, m_2, …, m_{k_1} and converting each segment to decimal; wherein m_1 + m_2 + … + m_{k_1} = m - p_1, and m is the length of the two's-complement representation.
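To illustrate how the digits of every matrix element are collected position by position into p_1 + k_1 separate matrices (one per nonvolatile memory array), here is a small self-contained sketch. The helper names `to_digits` and `matrix_to_planes`, the 2 × 2 example matrix and the configuration m = 8, p_1 = 2, segment lengths (2, 4) are assumptions made for illustration only.

```python
def to_digits(x, m, p, chunk_lengths):
    """Mixed-radix digits [sign, high bits..., low digits...] of an m-bit value."""
    bits = format(x & ((1 << m) - 1), f"0{m}b")
    digits = [int(b) for b in bits[:p]]
    pos = p
    for length in chunk_lengths:
        digits.append(int(bits[pos:pos + length], 2))
        pos += length
    return digits


def matrix_to_planes(matrix, m, p, chunk_lengths):
    """Split a matrix into p + k digit matrices, ordered from high to low.

    planes[0] is the sign-bit matrix, planes[1:p] are the high-order matrices
    and planes[p:] are the low-order matrices.
    """
    n_planes = p + len(chunk_lengths)
    planes = [[[0] * len(row) for row in matrix] for _ in range(n_planes)]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            for d, digit in enumerate(to_digits(value, m, p, chunk_lengths)):
                planes[d][r][c] = digit
    return planes


storage = [[5, -3], [0, -128]]
planes = matrix_to_planes(storage, m=8, p=2, chunk_lengths=[2, 4])
for name, plane in zip(["sign", "high 1", "low 1", "low 2"], planes):
    print(name, plane)
```

In this sketch each entry of `planes` plays the role of the content written into one nonvolatile memory array.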
S2, converting each operation datum in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting conversion rule for operation data, to obtain the converted input vector; collecting the digits of the mixed-radix data in the converted input vector position by position to obtain 1 sign-bit vector, p_2 - 1 high-order vectors and k_2 low-order vectors; wherein p_2 and k_2 are integers that are adjustable according to the required calculation precision; likewise, for example, when executing a machine learning algorithm that requires 8-bit precision, one may take p_2 = k_2 = […]; when performing a scientific computing task that requires 16-bit precision, one may take p_2 = k_2 = 4.
Specifically, as shown in fig. 3, each operation datum in the input vector is converted in the same way as the operation data in the storage matrix, yielding mixed-radix data of length p_2 + k_2 and hence the converted input vector.
It should be noted that the precision of the input vector and that of the storage matrix may or may not be the same.
S3, inputting the sign-bit vector, the high-order vectors and the low-order vectors in sequence into the p_1 + k_1 nonvolatile memory arrays, thereby performing, in turn, the multiplication of the sign-bit vector, each high-order vector and each low-order vector with the p_1 + k_1 matrices;
Specifically, in this embodiment, the 1 sign-bit vector, p_2 - 1 high-order vectors and k_2 low-order vectors corresponding to the input vector are arranged from high to low in their original splitting order and are input in that order. The sign-bit vector is input simultaneously into the p_1 + k_1 nonvolatile memory arrays that store the 1 sign-bit matrix, p_1 - 1 high-order matrices and k_1 low-order matrices, performing the multiplication of the sign-bit vector with the p_1 + k_1 matrices; the a-th (a = 1, 2, …, p_2 - 1) high-order vector is input simultaneously into the same p_1 + k_1 nonvolatile memory arrays, performing the multiplication of that high-order vector with the p_1 + k_1 matrices; and the b-th (b = 1, 2, …, k_2) low-order vector is input simultaneously into the same p_1 + k_1 nonvolatile memory arrays, performing the multiplication of that low-order vector with the p_1 + k_1 matrices.
S4, shifting and accumulating the resulting products according to the rules of binary arithmetic to obtain the vector-matrix product of the input vector and the storage matrix.
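Steps S1 to S4 can be checked end to end with a short software model. The sketch below is an idealized, purely digital emulation written for illustration: it assumes error-free arrays, and the names `to_digits`, `digit_weights` and `mixed_radix_vmm` as well as the example configurations are hypothetical. Multiplying a partial product by a signed power-of-two weight stands in for the shift operation.

```python
def to_digits(x, m, p, chunk_lengths):
    """Mixed-radix digits [sign, high bits..., low digits...] of an m-bit value."""
    bits = format(x & ((1 << m) - 1), f"0{m}b")
    digits = [int(b) for b in bits[:p]]
    pos = p
    for length in chunk_lengths:
        digits.append(int(bits[pos:pos + length], 2))
        pos += length
    return digits


def digit_weights(m, p, chunk_lengths):
    """Signed power-of-two weight of each digit position, high to low."""
    weights = [-(1 << (m - 1))] + [1 << (m - 1 - i) for i in range(1, p)]
    remaining = m - p
    for length in chunk_lengths:
        remaining -= length
        weights.append(1 << remaining)
    return weights


def mixed_radix_vmm(vector, matrix, vec_cfg, mat_cfg):
    """Vector-matrix product computed slice by slice, then shift-accumulated.

    vec_cfg and mat_cfg are (m, p, chunk_lengths) tuples; they may differ,
    since the input vector and storage matrix need not share one precision.
    """
    m_v, p2, vec_chunks = vec_cfg
    m_w, p1, mat_chunks = mat_cfg
    vec_digits = [to_digits(x, m_v, p2, vec_chunks) for x in vector]          # S2
    mat_digits = [[to_digits(w, m_w, p1, mat_chunks) for w in row]            # S1
                  for row in matrix]
    vec_weights = digit_weights(m_v, p2, vec_chunks)
    mat_weights = digit_weights(m_w, p1, mat_chunks)
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [0] * n_cols
    for a, w_a in enumerate(vec_weights):        # S3: one input slice at a time
        for b, w_b in enumerate(mat_weights):    # every array receives that slice
            for j in range(n_cols):
                partial = sum(vec_digits[i][a] * mat_digits[i][j][b]
                              for i in range(n_rows))
                result[j] += w_a * w_b * partial  # S4: shift and accumulate
    return result


vector = [3, -7, 100]
matrix = [[5, -3], [0, -128], [127, 1]]
result = mixed_radix_vmm(vector, matrix, (8, 2, [2, 4]), (8, 4, [2, 2]))
direct = [sum(vector[i] * matrix[i][j] for i in range(3)) for j in range(2)]
print(result, direct)
```

Because every slice is exact in this model, the shift-accumulated result equals the directly computed product; in hardware, the choice of p_1 and k_1 determines which slices are exposed to the less accurate multi-value arrays.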
Furthermore, the nonvolatile memory array has a crossbar structure, with a nonvolatile memory device located at each cross point; each nonvolatile memory device performs either binary storage or multi-value storage; an array of devices using binary storage is referred to as a binary nonvolatile memory array, and an array of devices using multi-value storage is referred to as a multi-value nonvolatile memory array.
In an optional embodiment, the nonvolatile memory arrays comprise binary nonvolatile memory arrays and multi-value nonvolatile memory arrays; there are p_1 binary nonvolatile memory arrays, which store the sign-bit matrix and the high-order matrices, and k_1 multi-value nonvolatile memory arrays, which store the low-order matrices.
Further, in an optional embodiment, as shown in fig. 4, the shift and accumulation of the multiplication results is performed in stages: the multiplication results output by the binary nonvolatile memory arrays and those output by the multi-value nonvolatile memory arrays are first shifted and accumulated separately, and the two intermediate results are then shifted and accumulated again to give the final result.
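The staged shift accumulation can be sketched as follows. The text does not specify how the shifts are divided between the stages, so this sketch makes an assumption: the first stage accumulates the binary-array outputs with weights taken relative to the lowest binary bit, the second stage accumulates the multi-value-array outputs with their full weights, and the third stage shifts the binary partial sum by m - p_1 bits before adding. The function name `staged_shift_accumulate` and the scalar example are hypothetical.

```python
def staged_shift_accumulate(partials, vec_weights, mat_weights, m, p1):
    """Combine per-array outputs in three stages.

    partials[a][b] is the digital output vector of array b when input
    slice a is applied; arrays 0..p1-1 are binary, the rest multi-value.
    """
    n_cols = len(partials[0][0])
    shift = m - p1                     # position of the lowest binary bit
    binary_sum = [0] * n_cols          # stage 1: binary arrays only
    multi_sum = [0] * n_cols           # stage 2: multi-value arrays only
    for a, w_a in enumerate(vec_weights):
        for b, w_b in enumerate(mat_weights):
            for j in range(n_cols):
                if b < p1:
                    # binary weights are exact multiples of 2**shift
                    binary_sum[j] += w_a * (w_b // (1 << shift)) * partials[a][b][j]
                else:
                    multi_sum[j] += w_a * w_b * partials[a][b][j]
    # stage 3: align the binary partial sum, then add the multi-value one
    total = [(hi << shift) + lo for hi, lo in zip(binary_sum, multi_sum)]
    return binary_sum, multi_sum, total


# Scalar example: input -3 and stored value -3, both with m = 8, p = 2 and
# segment lengths (2, 4), so both have digits [1, 1, 3, 13].
x_digits = [1, 1, 3, 13]
w_digits = [1, 1, 3, 13]
weights = [-128, 64, 16, 1]
partials = [[[xd * wd] for wd in w_digits] for xd in x_digits]
binary_sum, multi_sum, total = staged_shift_accumulate(partials, weights, weights, m=8, p1=2)
print(binary_sum, multi_sum, total)
```

Keeping the two partial sums separate until the last stage mirrors the three-unit structure described for the shift accumulation module, and all three stages are ordinary integer operations.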
It should be noted that the in-memory vector-matrix multiplication method with adjustable precision can balance the requirement for calculation precision against the requirement for calculation energy efficiency: when calculation precision must be ensured, more high-order data are processed in the binary nonvolatile memory arrays; when energy efficiency must be ensured, more low-order data are processed in the multi-value nonvolatile memory arrays. Therefore, depending on the user's precision requirements, operating in the binary nonvolatile memory arrays secures calculation precision, while operating in the multi-value nonvolatile memory arrays secures energy efficiency; this is achieved by adjusting the values of p_1 and k_1.
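The trade-off described above can be made concrete by counting what a given choice of p_1 and segment lengths demands. The sketch below is illustrative only: `array_budget` is a hypothetical helper, the two 8-bit configurations are examples chosen here, and the number of distinct digit values per segment (2 to the power m_j) is simply the range of an m_j-bit number, not a statement about any particular device.

```python
def array_budget(m, p, chunk_lengths):
    """Summarize the arrays needed for one (m, p, m_1..m_k) configuration."""
    if p + sum(chunk_lengths) != m:
        raise ValueError("need p + m_1 + ... + m_k == m")
    return {
        "binary_arrays": p,                         # sign bit + (p - 1) high bits
        "multi_value_arrays": len(chunk_lengths),   # one per low-order digit
        "total_arrays": p + len(chunk_lengths),
        # an m_j-bit segment takes values 0 .. 2**m_j - 1
        "digit_values_per_array": [1 << length for length in chunk_lengths],
    }


precision_leaning = array_budget(8, 4, [2, 2])    # more bits handled in binary
efficiency_leaning = array_budget(8, 2, [3, 3])   # more bits handled multi-valued
print(precision_leaning)
print(efficiency_leaning)
```

Raising p_1 moves bits into binary arrays at the cost of more arrays overall, while longer segments reduce the array count but require each multi-value array to distinguish more digit values.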
Example 2
An in-memory vector-matrix multiplication arithmetic unit with adjustable precision, which can be used to implement the in-memory vector-matrix multiplication method with adjustable precision provided in Example 1 of the present invention, includes: an external input module, an in-memory computing module and a shift accumulation module;
the in-memory computing module includes p_1 + k_1 nonvolatile memory arrays, where p_1 and k_1 are adjustable according to the required calculation precision;
the external input module is used for converting each operation datum in the storage matrix into mixed-radix data of length p_1 + k_1 based on the splitting conversion rule for operation data, to obtain the converted storage matrix; and for collecting the digits of the mixed-radix data in the converted storage matrix position by position to obtain 1 sign-bit matrix, p_1 - 1 high-order matrices and k_1 low-order matrices, which are stored in the corresponding nonvolatile memory arrays in order from high to low; each nonvolatile memory array stores one matrix;
the external input module is also used for converting each operation datum in the input vector into mixed-radix data of length p_2 + k_2 based on the splitting conversion rule for operation data, to obtain the converted input vector; and for collecting the digits of the mixed-radix data in the converted input vector position by position to obtain 1 sign-bit vector, p_2 - 1 high-order vectors and k_2 low-order vectors, which are input in sequence into the p_1 + k_1 nonvolatile memory arrays; it should be noted that the precision of the input vector and that of the storage matrix may or may not be the same;
the in-memory computing module is used for performing, in turn, the multiplication of the sign-bit vector, each high-order vector and each low-order vector with the p_1 + k_1 matrices on the nonvolatile memory arrays;
the shift accumulation module is used for shifting and accumulating the resulting products according to the rules of binary arithmetic to obtain the vector-matrix product of the input vector and the storage matrix;
the splitting conversion rule for operation data is as follows: the operation datum is split according to the arithmetic rules for signed binary numbers; specifically, the two's complement of the operation datum is taken to obtain an m-bit two's-complement representation; the first p bits of the representation are split bit by bit to obtain 1 sign bit and p - 1 high-order bits; the last m - p bits of the representation are split, in order, into k binary segments of lengths m_1, m_2, …, m_k, and each segment is converted into the corresponding decimal number to obtain k low-order digits; the 1 sign bit, p - 1 high-order bits and k low-order digits, arranged from high to low, are recorded as mixed-radix data of length p + k; wherein m_1 + m_2 + … + m_k = m - p.
Specifically, as shown in fig. 5, the nonvolatile memory array includes input ports and output ports, the input ports being formed by the row lines of the array and the output ports by its column lines. The nonvolatile memory array has a crossbar structure, with a nonvolatile memory device located at each cross point; the nonvolatile memory device may be a resistive random access memory (RRAM), a phase change memory (PCM), NOR-FLASH, a spin transfer torque magnetic memory (STT-MRAM), a ferroelectric field effect transistor (FeFET), or the like. Each nonvolatile memory device performs either binary storage or multi-value storage; an array of devices using binary storage is referred to as a binary nonvolatile memory array, and an array of devices using multi-value storage is referred to as a multi-value nonvolatile memory array.
In an optional implementation, the nonvolatile memory arrays comprise binary nonvolatile memory arrays and multi-value nonvolatile memory arrays; there are p_1 binary nonvolatile memory arrays, which store the sign-bit matrix and the high-order matrices input by the external input module, and k_1 multi-value nonvolatile memory arrays, which store the low-order matrices input by the external input module.
Specifically, as shown in fig. 6, the in-memory computing module includes a binary operation unit and a multi-value operation unit;
Binary operation unit:
the binary operation unit comprises p_1 binary operators; as shown in fig. 7, each binary operator includes a binary nonvolatile memory array and a first peripheral circuit;
the first peripheral circuit comprises a first digital-to-analog conversion module and a first analog-to-digital conversion module; the output of the first digital-to-analog conversion module is connected to the input of the binary nonvolatile memory array; the output of the binary nonvolatile memory array is connected to the input of the first analog-to-digital conversion module;
the first digital-to-analog conversion module is used for applying the sign-bit vector, a high-order vector or a low-order vector supplied by the external input module to the binary nonvolatile memory array as voltages; the first analog-to-digital conversion module is used for performing analog-to-digital conversion on the multiplication result, represented as currents, that is output by the binary nonvolatile memory array.
In an optional implementation, the first digital-to-analog conversion module includes a plurality of digital-to-analog converters (DACs) whose outputs are connected one-to-one to the inputs of the rows of the binary nonvolatile memory array that participate in the operation; the first analog-to-digital conversion module comprises a plurality of groups, each consisting of a cascaded transimpedance amplifier and analog-to-digital converter (ADC), the input of each group being connected one-to-one to the output of a column of the binary nonvolatile memory array that participates in the operation; within each group, the output of the transimpedance amplifier is connected to the input of the analog-to-digital converter.
Further, in an alternative embodiment, the array size for the memory matrix in the binary nonvolatile memory array is M × N; the first digital-to-analog conversion module is composed of M digital-to-analog converters (DAC), and the first analog-to-digital conversion module is composed of N trans-impedance amplifiers and N analog-to-digital converters (ADC).
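As a rough illustration of the data path described above, the following Python sketch models one binary operator as an idealized M × N crossbar: M DAC outputs drive the rows, the column currents add up, and each of the N columns is read through a transimpedance amplifier followed by an ADC. The names `G_ON`, `V_READ`, `R_F` and `binary_operator`, and all numeric values, are assumptions chosen for illustration and do not come from the disclosure; the off-state conductance and all device non-idealities are ignored.

```python
G_ON = 1e-4    # assumed on-state cell conductance in siemens (hypothetical)
V_READ = 0.2   # assumed DAC read voltage in volts (hypothetical)
R_F = 5e4      # assumed transimpedance feedback resistance in ohms (hypothetical)


def binary_operator(bits_in, bit_matrix):
    """Idealized M x N binary crossbar; returns N integer column sums."""
    m = len(bit_matrix)
    n = len(bit_matrix[0])
    if len(bits_in) != m:
        raise ValueError("input length must equal the number of rows M")

    # DAC stage: one read voltage per row, 0 V for a 0 bit.
    voltages = [V_READ * b for b in bits_in]

    # ADC step size: the output voltage produced by a single active cell.
    lsb = V_READ * G_ON * R_F

    results = []
    for j in range(n):
        # Column current is the sum of V_i * G_ij over all rows.
        current = sum(voltages[i] * G_ON * bit_matrix[i][j] for i in range(m))
        # Transimpedance amplifier converts the current to a voltage.
        v_out = current * R_F
        # ADC quantizes the voltage back to an integer count.
        results.append(round(v_out / lsb))
    return results
```

With a 3 × 2 array, `binary_operator([1, 0, 1], [[1, 0], [1, 1], [1, 1]])` counts the cells that are both selected and in the on state in each column.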
A multivalued arithmetic unit:
the multi-value operation unit includes k1 multi-value operators; as shown in fig. 8, each multi-value operator includes a multi-value nonvolatile memory array and a second peripheral circuit;
the second peripheral circuit comprises a second digital-to-analog conversion module, a difference module and a second analog-to-digital conversion module; the output end of the second digital-to-analog conversion module is connected with the input end of the multi-value nonvolatile memory array; the output end of the multi-value nonvolatile memory array is connected with the input end of the difference module, and the output end of the difference module is connected with the input end of the second analog-to-digital conversion module; the second digital-to-analog conversion module is used for inputting the sign bit vector, the high bit vector or the low bit vector into the multi-value nonvolatile memory array in a voltage mode; the output ends of the multi-value nonvolatile memory array are grouped in pairs, and the difference module is used for carrying out a differential operation on each pair of outputs of the multi-value nonvolatile memory array; the second analog-to-digital conversion module is used for performing analog-to-digital conversion on the differential operation result output by the difference module.
In an optional implementation manner, the second digital-to-analog conversion module includes a plurality of digital-to-analog converters, whose output ends are connected in one-to-one correspondence with the input ends of the rows participating in the operation in the multi-value nonvolatile memory array; the difference module comprises a plurality of differential units, whose input ends are connected in one-to-one correspondence with the output pairs of the multi-value nonvolatile memory array; each differential unit comprises two transimpedance amplifiers and a voltage subtracter; the two outputs of each pair are each amplified by a transimpedance amplifier, and the differential operation is then carried out by the voltage subtracter; the second analog-to-digital conversion module comprises a plurality of analog-to-digital converters, which are connected in one-to-one correspondence with the output ends of the differential units.
Further, in an alternative embodiment, the array region used for the memory matrix in the multi-value nonvolatile memory array has size K × 2L. For example, let the size of the multi-value nonvolatile memory array be X × Y, where X is a positive integer and Y is an even number. When a matrix of size K × L is stored in the first K rows and the first 2L columns of the multi-value nonvolatile memory array, the element in row k, column l of the matrix is the difference between the conductance value at row k, column 2l-1 and the conductance value at row k, column 2l of the array, where k = 1, 2, …, K, l = 1, 2, …, L, K ≤ X, and 2L ≤ Y. The second digital-to-analog conversion module is composed of K digital-to-analog converters (DACs). The difference module is composed of 2L transimpedance amplifiers and L voltage subtractors: the transimpedance amplifiers are connected to the output ends of the multi-value nonvolatile memory array, and every two adjacent columns form one differential pair whose amplified outputs are connected to one voltage subtractor. The second analog-to-digital conversion module is composed of L analog-to-digital converters (ADCs), whose input ends are connected with the output ends of the voltage subtractors.
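The column pairing described above can be sketched in Python as follows. The example maps a K × L integer matrix onto a K × 2L conductance array and then reads it back through an idealized differential stage. This section does not specify how a value is divided between the two columns of a pair, so the split used here (positive part on the first column, negative part on the second, both offset by a minimum conductance) is an assumption, as are the names `G_MIN`, `G_STEP`, `to_differential` and `differential_read`. Input voltages are given in arbitrary integer units, and the amplifier gain and ADC scaling are folded into a single division.

```python
G_MIN = 1e-6   # assumed minimum programmable conductance in siemens (hypothetical)
G_STEP = 1e-6  # assumed conductance change per unit of stored value (hypothetical)


def to_differential(matrix):
    """Map a K x L integer matrix to a K x 2L conductance array.

    With 0-based indices, element (k, l) equals
    (G[k][2l] - G[k][2l + 1]) / G_STEP, which corresponds to
    columns 2l-1 and 2l in the 1-based notation of the text.
    """
    g_array = []
    for row in matrix:
        g_row = []
        for value in row:
            g_first = G_MIN + G_STEP * max(value, 0)
            g_second = G_MIN + G_STEP * max(-value, 0)
            g_row.extend([g_first, g_second])
        g_array.append(g_row)
    return g_array


def differential_read(voltages, g_array):
    """Idealized read-out: column currents, pairwise subtraction, quantization."""
    k = len(g_array)
    two_l = len(g_array[0])
    # One current per physical column (2L in total).
    currents = [
        sum(voltages[i] * g_array[i][c] for i in range(k)) for c in range(two_l)
    ]
    # Each differential unit subtracts the two columns of one pair; the common
    # G_MIN offset cancels in the subtraction.
    return [
        round((currents[2 * l] - currents[2 * l + 1]) / G_STEP)
        for l in range(two_l // 2)
    ]
```

For example, `differential_read([1, 2], to_differential([[1, -2], [3, 0]]))` reproduces the product of the vector (1, 2) with the stored 2 × 2 matrix, using four physical columns for the two logical ones.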
Further, in an optional implementation manner, the shift accumulation module includes a first shift accumulation unit, a second shift accumulation unit, and a third shift accumulation unit; the shift accumulation output unit in the embodiment comprises an Arithmetic Logic Unit (ALU) and a corresponding cache structure;
the input end of the first shift accumulation unit is connected with the output end of each binary operator in the binary operation unit; the input end of the second shift accumulation unit is connected with the output end of each multi-value operator in the multi-value operation unit; the input end of the third shift accumulation unit is respectively connected with the output ends of the first shift accumulation unit and the second shift accumulation unit;
the first shift accumulation unit is used for carrying out shift and accumulation operation on multiplication operation results output by each binary operator based on a binary operation rule;
the second shift accumulation unit is used for carrying out shift and accumulation operation on the multiplication operation result output by each multi-value operator based on the binary operation rule;
the third shift accumulation unit is used for shifting and accumulating the outputs of the first shift accumulation unit and the second shift accumulation unit based on the binary operation rule to obtain a vector matrix multiplication operation result of the input vector and the storage matrix.
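The three-stage shift accumulation described above can be illustrated with a small integer model. The sketch below is not the splitting conversion rule of the disclosure: it assumes nonnegative matrix elements, omits the sign bit, leaves the input vector unsplit, and assumes that each multi-value cell holds one base-2^b digit. The names `split_matrix`, `shift_accumulate`, `hybrid_vecmat`, `n_high`, `n_low` and `b` are hypothetical.

```python
def vecmat(x, m):
    """Plain integer vector-matrix product, standing in for one operator."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def split_matrix(w, n_high, n_low, b):
    """Split each element into n_low base-2**b digits and n_high single bits.

    Both lists are ordered from least to most significant plane.
    """
    mask = (1 << b) - 1
    low = [[[(e >> (j * b)) & mask for e in row] for row in w] for j in range(n_low)]
    high = [
        [[(e >> (n_low * b + i)) & 1 for e in row] for row in w]
        for i in range(n_high)
    ]
    return high, low


def shift_accumulate(partials, step):
    """Shift the s-th partial result left by s * step bits and sum them."""
    n = len(partials[0])
    return [sum(p[j] << (s * step) for s, p in enumerate(partials)) for j in range(n)]


def hybrid_vecmat(x, w, n_high, n_low, b):
    high, low = split_matrix(w, n_high, n_low, b)
    # First unit: combine the outputs of the binary operators (1 bit per plane).
    first = shift_accumulate([vecmat(x, h) for h in high], 1)
    # Second unit: combine the outputs of the multi-value operators (b bits per plane).
    second = shift_accumulate([vecmat(x, l) for l in low], b)
    # Third unit: align the high part above the low part and add.
    return [(a << (n_low * b)) + c for a, c in zip(first, second)]
```

With `n_high=4`, `n_low=2` and `b=2`, elements up to 255 are represented exactly, so `hybrid_vecmat` returns the same result as a direct product while using four binary planes and two multi-value planes.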
In an optional implementation manner, the vector matrix multiplication operator with adjustable in-memory precision further includes: a control module; specifically, the control module comprises an Arithmetic Logic Unit (ALU) and a corresponding cache structure;
the control module is respectively connected with the external input module, the memory calculation module and the shift accumulation module and is used for integrally controlling the working time sequence of the external input module, the memory calculation module and the shift accumulation module in the operation process.
In an alternative embodiment, a schematic structural diagram of the vector matrix multiplication operator with adjustable in-memory precision is shown in fig. 9, and its hardware structure includes: an external input module, a memory computing module and a shift accumulation module. The memory computing module comprises a binary operation unit and a multi-value operation unit, and each operation unit internally comprises a plurality of operators. Each operator is composed of a nonvolatile memory array and its corresponding peripheral circuit. During operation, the matrix to be operated on is split according to the binary operation rule and stored in the nonvolatile memory arrays of the operators. The input vector is split by the external input module according to the binary operation rule and then input into the memory computing module. After the memory computing module finishes computing, the results are input into the shift accumulation module, which completes the shift accumulation operation according to the binary operation rule and outputs the calculation result. The invention adopts a hybrid operation architecture in which the binary operation unit processes the high-order data and the multi-value operation unit processes the low-order data; this reduces the influence of device non-idealities on the calculation result while preserving the energy efficiency of the calculation, so that precision-adjustable in-memory computing with high robustness and high energy efficiency can be realized. The related technical features are the same as above, and are not described herein.
It should be noted that, for different operation requirements, the number of nonvolatile memory arrays called may be adjusted to perform operations with different accuracies, and the storage precision of the devices in the multi-value operation unit may also be adjusted, so that precision-adjustable operation is fully realized. Further, when calculation accuracy needs to be ensured, more high-order data can be operated on in the binary operation unit. When calculation energy efficiency needs to be ensured, more low-order data can be operated on in the multi-value operation unit. In the limiting cases, performing the entire operation in the binary operation unit favors operation precision, and performing the entire operation in the multi-value operation unit favors operation energy efficiency.
Further, where the usage scenario allows (such as solving equations with a mixed-precision architecture, or neural network inference), the binary operation unit can be used to complete both the binary operation requirement and the multi-value operation requirement, so that the circuit area is reduced and higher operation energy efficiency is ensured.
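One way to picture the effect of calling fewer arrays is the following Python sketch, which computes a vector-matrix product from only the most significant bit planes of the stored matrix. It is a simplified assumption rather than the method of the disclosure: every plane is treated as a single bit, values are nonnegative integers, and the names `truncated_vecmat`, `total_bits` and `used_bits` are hypothetical.

```python
def truncated_vecmat(x, w, total_bits, used_bits):
    """Vector-matrix product using only the top `used_bits` bit planes of w."""
    if not 0 <= used_bits <= total_bits:
        raise ValueError("used_bits must lie between 0 and total_bits")
    dropped = total_bits - used_bits
    rows, cols = len(w), len(w[0])
    result = [0] * cols
    for s in range(dropped, total_bits):
        # One bit plane corresponds to one array that is actually called.
        plane = [[(e >> s) & 1 for e in row] for row in w]
        partial = [sum(x[i] * plane[i][j] for i in range(rows)) for j in range(cols)]
        # Shift the partial result to its binary weight and accumulate.
        result = [r + (p << s) for r, p in zip(result, partial)]
    return result
```

In this model, with nonnegative inputs, each dropped plane removes one array from the computation, and the result falls short of the exact product by at most `sum(x) * (2**dropped - 1)`, which gives a simple handle on the trade-off between precision and the number of arrays used.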
The related technical features are the same as those of embodiment 1, and are not described herein.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.