CN119848407A

CN119848407A - Arithmetic circuit and computing device

Info

Publication number: CN119848407A
Application number: CN202510346804.5A
Authority: CN
Inventors: 王丹阳; 范志军; 杨作兴
Original assignee: Shenzhen MicroBT Electronics Technology Co Ltd
Current assignee: Shenzhen MicroBT Electronics Technology Co Ltd
Priority date: 2025-03-24
Filing date: 2025-03-24
Publication date: 2025-04-18
Anticipated expiration: 2045-03-24
Also published as: CN119848407B

Abstract

The present disclosure relates to an arithmetic circuit and a computing device. The arithmetic circuit comprises: a multiple operation module configured to calculate a preset multiple of a first operand to generate a corresponding second operand; and an inner product operation module communicatively connected to the multiple operation module and configured to calculate a dot product of a first vector and a second vector based at least on the second operand from the multiple operation module and a second coefficient multiplied by the second operand.

Description

Arithmetic circuit and computing device

Technical Field

The present disclosure relates to the field of electronic circuit technology, and more particularly, to an arithmetic circuit and a computing device.

Background

Dot product operations between vectors are frequently involved in data processing. However, existing circuits for performing vector dot product operations suffer from complex structure, large circuit area, long critical paths, difficult timing closure, high power consumption, and similar problems, and therefore need to be improved.

Disclosure of Invention

It is an object of the present disclosure to provide an arithmetic circuit and a computing device.

According to a first aspect of the present disclosure, there is provided an arithmetic circuit for calculating at least a dot product of a first vector and a second vector, wherein the first vector is a coefficient vector known in advance, the arithmetic circuit comprising:

A multiple operation module configured to calculate a preset multiple of the first operand generated from at least one target element in the second vector to generate a corresponding second operand, wherein in the dot product calculation, an absolute value of a first coefficient multiplied by the target element in the first vector is greater than or equal to an absolute value of the preset multiple and the preset multiple is greater than 1 time, and

An inner product operation module communicatively connected to the multiple operation module and configured to calculate a dot product of the first vector and the second vector based at least on the second operand from the multiple operation module and the second coefficient multiplied by the second operand, wherein, for each first operand and the corresponding second operand, the product of the second coefficient and the preset multiple is less than or equal to the first coefficient.

In some embodiments, the preset multiple is determined based on the first vector, or,

The preset multiple includes at least one of 3 times, 5 times, and 7 times.

In some embodiments, the multiple operation module includes one or more multiple calculation units including:

A first shifter configured to shift a fourth operand to the left by m bits to generate a fifth operand, wherein the fifth operand is 2^m times the fourth operand, the fourth operand is a positive integer multiple of the first operand, and m is a positive integer,

And a first addition unit configured to add a sixth operand and a seventh operand to generate an eighth operand, wherein the sixth operand is a positive integer multiple of the first operand and the seventh operand is a positive integer multiple of the first operand.

In some embodiments, the multiple operation module further comprises:

A first register communicatively connected to the at least one multiple computing unit, and configured to register a fifth operand or an eighth operand from the multiple computing unit.

In some embodiments, the inner product operation module is further configured to calculate the dot product of the first vector and the second vector based on the first operand corresponding to the second operand and a third coefficient multiplied by the first operand, wherein, for each first operand and the corresponding second operand, the product of the second coefficient and the preset multiple, plus the third coefficient, is equal to the first coefficient, and/or,

The inner product operation module is further configured to calculate a dot product of the first vector and the second vector based on a third operand and a first coefficient in the first vector multiplied by the third operand, wherein the third operand is generated based on at least one element in the second vector other than the element used to generate the first operand.

In some embodiments, the inner product operation module comprises at least one of:

A second shifter configured to shift a ninth operand to the left by n bits to generate a tenth operand, wherein the tenth operand is 2^n times the ninth operand, the ninth operand is the second operand, the first operand, or the third operand, and n is a positive integer, and,

A second addition unit configured to calculate a sum of at least some of the following products: the product of the second operand and the second coefficient, the product of the first operand and the third coefficient, and the product of the third operand and the first coefficient.

In some embodiments, the inner product operation module further comprises:

A second register communicatively connected to the second shifter or the second addition unit, and configured to register the tenth operand from the second shifter or the sum of products from the second addition unit.

In some embodiments, the arithmetic circuit further comprises a bitwise inverting module configured to, if a first coefficient in the first vector is negative, bitwise invert the first operand or the third operand that is multiplied by that first coefficient, or the second operand generated from that first operand;

The second addition unit is configured to calculate a sum of a preset constant and at least some of the following products: the product of the second operand and the second coefficient, the product of the first operand and the third coefficient, and the product of the third operand and the first coefficient, wherein the preset constant is determined according to at least some of the negative first coefficients in the first vector.

In some embodiments, the inner product operation module includes:

a third addition unit configured to add at least a product of the first operand and a third coefficient and a product of the third operand and the first coefficient to generate an eleventh operand and a twelfth operand, wherein the third addition unit operates within the same clock cycle as the multiple operation module, and,

A fourth addition unit communicatively coupled to the third addition unit and configured to add at least the eleventh operand, the twelfth operand, and the product of the second operand and the second coefficient from the third addition unit to produce a thirteenth operand and a fourteenth operand.

In some embodiments, the inner product operation module further comprises:

A third register communicatively connected to the third addition unit, and configured to register an eleventh operand from the third addition unit;

A fourth register communicatively connected to the third addition unit, and configured to register a twelfth operand from the third addition unit;

A fifth register communicatively coupled to the fourth addition unit and configured to register a thirteenth operand from the fourth addition unit, and,

A sixth register communicatively connected to the fourth addition unit, and configured to register a fourteenth operand from the fourth addition unit.

In some embodiments, the inner product operation module further comprises:

a fifth addition unit communicatively coupled to the fourth addition unit and configured to add a thirteenth operand and a fourteenth operand to produce a dot product of the first vector and the second vector.

In some embodiments, the inner product operation module further comprises:

A seventh register communicatively connected to the fifth addition unit, and configured to register a dot product of a first vector and a second vector from the fifth addition unit.

In some embodiments, the arithmetic circuitry is configured to perform image interpolation operations, and the first vector is a coefficient vector for image interpolation.

According to a second aspect of the present disclosure there is provided a computing device comprising one or more arithmetic circuits as described above.

In some embodiments, where the computing device includes a plurality of arithmetic circuits, at least two of the arithmetic circuits operate in parallel.

Other features of the present disclosure and its advantages will become more apparent from the following detailed description of exemplary embodiments of the disclosure, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 illustrates a schematic diagram of a computing device performing image interpolation;

FIG. 2 shows a schematic diagram of an arithmetic circuit in the computing device of FIG. 1;

FIG. 3 illustrates a block diagram of a computing device according to an exemplary embodiment of the present disclosure;

FIG. 4 illustrates a block diagram of an arithmetic circuit according to an exemplary embodiment of the present disclosure;

FIG. 5 illustrates a block diagram of a multiple operation module in an arithmetic circuit according to a specific embodiment of the present disclosure;

FIG. 6 illustrates a block diagram of a multiple operation module in an arithmetic circuit according to another specific embodiment of the present disclosure;

FIG. 7 illustrates a block diagram of a multiple operation module in an arithmetic circuit according to yet another specific embodiment of the present disclosure;

FIG. 8 illustrates a block diagram of an inner product operation module in an arithmetic circuit according to a specific embodiment of the present disclosure;

FIG. 9 shows a block diagram of an arithmetic circuit according to another exemplary embodiment of the present disclosure;

FIG. 10 illustrates a schematic diagram of a computing device performing image interpolation according to a specific embodiment of the present disclosure;

FIG. 11 shows a schematic diagram of an arithmetic circuit in the computing device of FIG. 10;

FIG. 12 shows a schematic configuration diagram of an arithmetic circuit according to a specific embodiment of the present disclosure.

Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same parts or parts having the same functions, and a repetitive description thereof may be omitted. In this specification, like reference numerals and letters are used to designate like items, and thus once an item is defined in one drawing, no further discussion thereof is necessary in subsequent drawings.

For ease of understanding, the positions, dimensions, ranges, etc. of the respective structures shown in the drawings and the like may not represent actual positions, dimensions, ranges, etc. Accordingly, the disclosed invention is not limited to the disclosed positions, dimensions, ranges, etc. as illustrated in the drawings. Moreover, the figures are not necessarily to scale, some features may be exaggerated to show details of particular components.

Detailed Description

Various exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It should be noted that the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. That is, the structures and methods herein are shown by way of example to illustrate different embodiments of the structures and methods in this disclosure. However, those skilled in the art will appreciate that they are merely illustrative of the exemplary ways in which the disclosure may be practiced, and not exhaustive.

In addition, techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be considered part of the specification where appropriate.

In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.

In various data processing tasks, the vector dot product is a common form of operation. For example, the dot product P_d between a first vector A and a second vector B, each of dimension I, can be expressed as:

$$P_d = \sum_{i=0}^{I-1} a_i b_i \tag{1}$$

where a_i denotes the i-th component of the first vector A, b_i denotes the i-th component of the second vector B, and i is an integer with 0 ≤ i ≤ I-1. (As will be appreciated by those skilled in the art, the index i may equally start from 1, which does not affect the nature of the operation; the description herein takes indexing from zero as the example.)

In some examples, the vector dot product may be used in image interpolation. Specifically, image interpolation is the process of deducing the values of new pixels from the values of known pixels; the new pixels may not lie on integer coordinates of the original image, and their values may be obtained by an image interpolation algorithm. Image interpolation may be required when the size, shape or position of an image or video frame changes, or in order to fill in missing frames or generate smooth slow-motion effects during video processing, and so on.

Depending on the image interpolation algorithm employed, a corresponding coefficient vector may be determined, which may be taken as the first vector in the vector dot product operation. In a specific example, in Advanced Video Coding (AVC, H.264), pixel values for half-pixel positions may be generated based on the coefficient vector (0, 1, -5, 20, 20, -5, 1, 0). In another specific example, in High Efficiency Video Coding (HEVC, H.265), pixel values for half-pixel positions may be generated based on the coefficient vector (-1, 4, -11, 40, 40, -11, 4, -1), pixel values for quarter-pixel positions based on the coefficient vector (-1, 4, -10, 58, 17, -5, 1, 0), and pixel values for three-quarter-pixel positions based on the coefficient vector (0, 1, -5, 17, 58, -10, 4, -1). In some embodiments below, the technical solutions of the present disclosure will be described in detail by taking the H.265 coefficient vector (-1, 4, -11, 40, 40, -11, 4, -1) as the first vector and a vector formed by the pixel values of eight pixels to be processed in an image as the second vector. It will be appreciated, however, that the arithmetic circuit and computing device of the present disclosure may also be used to calculate dot products between other types of vectors, or in application scenarios other than image interpolation, without limitation.
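As a minimal sketch of the dot product that such a coefficient vector implies, the following Python snippet evaluates the H.265 half-pixel vector on eight pixel values. The function name `dot_product` and the sample pixel values are assumptions made for illustration, and the rounding and normalisation that a real codec applies afterwards are deliberately left out.

```python
# Coefficient vectors quoted in the text above.
AVC_HALF = (0, 1, -5, 20, 20, -5, 1, 0)
HEVC_HALF = (-1, 4, -11, 40, 40, -11, 4, -1)
HEVC_QUARTER = (-1, 4, -10, 58, 17, -5, 1, 0)


def dot_product(a, b):
    """Return the sum of a[i] * b[i] over two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical second vector: eight neighbouring pixels of a flat region.
flat = [100] * 8
print(dot_product(HEVC_HALF, flat))  # 6400, since the coefficients sum to 64
```

Because the first vector is fixed in advance, every multiplication in this sum is a multiplication by a known constant, which is what makes a shift-and-add hardware implementation possible.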

As shown in FIG. 1 and FIG. 2, a computing device 200' may include a plurality of arithmetic circuits 100', and the plurality of arithmetic circuits 100' may operate in parallel to process, in parallel, the image interpolation operations associated with groups of pixel points 910 in an image 900. In a specific example, for the image 900 shown in FIG. 1, in the current clock cycle, the pixel values (p₀, p₁, …, p_{n+7}) of the pixel points 910 in the rightmost column among the pixel points that have not yet participated in the interpolation operation may be input to the computing device 200' to perform the image interpolation operation related to that column; in the next clock cycle, the pixel values of the pixel points 910 in the column immediately to its left may be input to the computing device 200' to continue the image interpolation operation, and so on until the interpolation of the entire image 900 is completed. For each column of pixel points 910, each arithmetic circuit 100' may be configured to perform the interpolation associated with one group of pixel points 910; for example, the arithmetic circuits 100' shown in FIG. 1 may be configured to perform the interpolation operations associated with the group of pixel points p₀ to p₇, the group of pixel points p₁ to p₈, and so on up to the group of pixel points p_n to p_{n+7}, respectively. In other specific examples, the pixel values of the corresponding pixel points may be input to the corresponding arithmetic circuits 100' in the computing device 200' by pixel row, by diagonal of the image, or the like, to implement the desired image interpolation operation, which is not limited herein. For example, each arithmetic circuit 100' may be responsible for the image interpolation operations associated with pixel points 910 that lie in one row or one column of the image 900, and so forth.

Each arithmetic circuit 100' may include eight inputs (d0, d1, d2, d3, d4, d5, d6, d7), each receiving the pixel value of a corresponding pixel point, and an output (dout) that outputs the dot product of the first vector and the second vector, i.e., the calculation result of the image interpolation. As equation (1) above shows, vector dot product and image interpolation operations typically involve a large number of multiplications and additions. To simplify the operation, inside each arithmetic circuit 100' the multiplications in the dot product calculation may be converted into shift and addition operations according to the first vector. Taking the first vector A = (-1, 4, -11, 40, 40, -11, 4, -1) and the second vector B = (p₀, p₁, p₂, p₃, p₄, p₅, p₆, p₇) as an example, each element of the first vector may be rewritten as a signed sum of integer powers of 2, i.e., A = (-1, 4, -8-2-1, 32+8, 32+8, -8-2-1, 4, -1), so that the dot product of the first vector A and the second vector B may be expressed as:

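$$P_d = -p_0 + 4p_1 - 8p_2 - 2p_2 - p_2 + 32p_3 + 8p_3 + 32p_4 + 8p_4 - 8p_5 - 2p_5 - p_5 + 4p_6 - p_7 \tag{2}$$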

Thus, by providing shifters in the arithmetic circuit 100', p₁ and p₆ can each be shifted left by 2 bits to obtain 4p₁ and 4p₆; p₂ (or ~p₂, i.e., the bitwise inversion of p₂), p₃, p₄ and p₅ (or ~p₅) can each be shifted left by 3 bits to obtain 8p₂ (or 8·(~p₂)), 8p₃, 8p₄ and 8p₅ (or 8·(~p₅)); p₂ (or ~p₂) and p₅ (or ~p₅) can each be shifted left by 1 bit to obtain 2p₂ (or 2·(~p₂)) and 2p₅ (or 2·(~p₅)); and p₃ and p₄ can each be shifted left by 5 bits to obtain 32p₃ and 32p₄. The dot product of the first vector A and the second vector B is then obtained by providing an addition unit (or, if necessary, a subtraction unit) in the arithmetic circuit 100'. However, as can be seen from equation (2) above, this approach introduces a large number of addition (or subtraction) operations, which leaves the arithmetic circuit 100' with a complex structure, large circuit area, long critical path, difficult timing closure, and high power consumption.
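The shift-and-add evaluation of equation (2) can be sketched in Python as follows. This is an illustrative software model rather than a description of the actual circuit 100'; the function name `dot_shift_add` and the sample pixel values are assumptions. The second return value counts the two-input additions or subtractions needed to combine the shifted terms.

```python
def dot_shift_add(p):
    """Evaluate equation (2) using only shifts, additions and subtractions."""
    p0, p1, p2, p3, p4, p5, p6, p7 = p
    terms = [
        -p0,
        p1 << 2,                        # 4*p1
        -(p2 << 3), -(p2 << 1), -p2,    # -11*p2 = -(8 + 2 + 1)*p2
        p3 << 5, p3 << 3,               # 40*p3 = (32 + 8)*p3
        p4 << 5, p4 << 3,               # 40*p4
        -(p5 << 3), -(p5 << 1), -p5,    # -11*p5
        p6 << 2,                        # 4*p6
        -p7,
    ]
    # Combining k terms takes k - 1 two-input additions or subtractions.
    return sum(terms), len(terms) - 1


pixels = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical pixel values
result, n_adds = dot_shift_add(pixels)
print(result, n_adds)                        # 2880 13
```

Fourteen shifted terms therefore cost thirteen additions or subtractions per dot product, which is the overhead the paragraph above identifies.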

In order to solve the above problems, the present disclosure proposes an arithmetic circuit and a computing device including the same. In an exemplary embodiment of the present disclosure, as shown in FIG. 3, the computing device 200 may include one or more arithmetic circuits 100, wherein each arithmetic circuit 100 may be used to calculate a dot product of a first vector and a second vector. In embodiments of the present disclosure, the first vector may be a coefficient vector known in advance, so that the specific arrangement of the components in the arithmetic circuit 100 can be determined from the known first coefficients in the first vector. Compared with the case where both the first vector and the second vector are unknown, the arithmetic circuit 100 of the present disclosure only needs to handle the unknown second vector, which helps to simplify the circuit structures of the arithmetic circuit 100 and the computing device 200. Further, where the computing device 200 includes a plurality of arithmetic circuits 100, at least two of the arithmetic circuits 100 may operate in parallel to improve computational efficiency.

In some embodiments, the arithmetic circuit 100 and the computing device 200 containing it may be configured to perform image interpolation operations. Similar to the description of FIG. 1 above, in a specific example, for the image 900 shown in FIG. 10, in the current clock cycle, the pixel values (p₀, p₁, …, p_{n+7}) of the pixel points 910 in the rightmost column among the pixel points that have not yet participated in the interpolation operation, together with the preset multiples of those pixel values (5p₀, 5p₁, …, 5p_{n+7}), may be input into the computing device 200 to perform the image interpolation operation related to that column (the operation process will be described in detail later); in the next clock cycle, the pixel values of the pixel points 910 in the column immediately to its left, together with their preset multiples, may be input into the computing device 200 to continue the image interpolation operation, and so on until the interpolation of the entire image 900 is completed. For each column of pixel points 910, each arithmetic circuit 100 may be configured to perform the interpolation associated with one group of pixel points 910; for example, the arithmetic circuits 100 shown in FIG. 10 may be configured to perform the interpolation operations associated with the group of pixel points p₀ to p₇, the group of pixel points p₁ to p₈, and so on up to the group of pixel points p_n to p_{n+7}, respectively. In other specific examples, the pixel values of the corresponding pixel points may be input to the corresponding arithmetic circuits 100 in the computing device 200 by pixel row, by diagonal of the image, or the like, to achieve the desired image interpolation operation, which is not limited herein. For example, each arithmetic circuit 100 may be responsible for the image interpolation operations associated with pixel points that lie in one row or one column of the image.

Accordingly, the first vector may be a coefficient vector for image interpolation. It will be appreciated that in other embodiments, the computing device 200 and the arithmetic circuit 100 of the present disclosure may also be used for vector dot product operations involved in other application scenarios, which is not limited herein.
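The column-wise scheme described above can be modelled in software as a set of overlapping eight-pixel windows that share one precomputed list of 5p values. The sketch below is an assumption-laden illustration: the function name `interpolate_column` is hypothetical, the loop runs sequentially where the hardware circuits would run in parallel, and the rewrites 11p = 2·(5p) + p and 40p = 8·(5p) are one plausible way of using a preset multiple of 5 with the H.265 half-pixel vector.

```python
A = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_column(column):
    """Return one dot product with A per 8-pixel window of `column`."""
    # Preset multiple 5p, computed once per pixel and shared by all windows.
    five = [(p << 2) + p for p in column]
    results = []
    for k in range(len(column) - 7):      # window k models one arithmetic circuit
        p = column[k:k + 8]
        f = five[k:k + 8]
        # 11p = 2*(5p) + p and 40p = 8*(5p); 4p is a plain shift.
        acc = -p[0] + (p[1] << 2) - ((f[2] << 1) + p[2]) + (f[3] << 3)
        acc += (f[4] << 3) - ((f[5] << 1) + p[5]) + (p[6] << 2) - p[7]
        results.append(acc)
    return results


column = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # hypothetical pixel values
print(interpolate_column(column))                # [2880, 3520]
```

Each pixel's 5p value is reused by every window that contains the pixel, so the cost of computing it is paid once per pixel rather than once per window.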

In this disclosure, examples of the computing device 200 may include, but are not limited to, video encoders, consumer electronics, components of consumer electronics, electronic test equipment, image processing equipment, cellular communication infrastructure such as base stations, and the like. Examples of the computing device 200 may also include, but are not limited to, a mobile phone such as a smart phone, a wearable computing device such as a smart watch or headset, a phone, a television, a computer monitor, a computer, a modem, a handheld computer, a laptop computer, a tablet computer, a Personal Digital Assistant (PDA), an in-vehicle electronic system such as an automotive electronic system, a stereo system, a DVD player, a camcorder, a camera such as a digital camera, a portable memory chip, a peripheral device, a clock, and the like. Furthermore, the computing device 200 may include unfinished products.

In an exemplary embodiment of the present disclosure, as shown in FIG. 4, the arithmetic circuit 100 may include a multiple operation module 110 and an inner product operation module 120, wherein the inner product operation module 120 is communicatively connected with the multiple operation module 110; for example, an input of the inner product operation module 120 may be communicatively connected with an output of the multiple operation module 110 to obtain data from the multiple operation module 110 for further computation. It is noted that some first vectors may contain negative first coefficients, which may be handled by subtraction, or the subtraction may be converted into addition by bitwise inversion of the corresponding operand, and so on. In the following, where negative coefficients are involved, they are written in the form a (or -a); depending on how the negative coefficient is handled, either the coefficient itself or its absolute value may be used in a particular calculation.

The multiple operation module 110 may be configured to calculate a preset multiple of the first operand to generate a corresponding second operand; that is, the second operand is equal to the product of the first operand and the preset multiple. The first operand may be generated from at least one target element in the second vector. A target element in the second vector is an element for which, in the dot product calculation, the absolute value of the first coefficient in the first vector that is multiplied by the element is greater than or equal to the absolute value of the preset multiple, where the preset multiple is greater than 1. In this way, by calculating in advance the preset multiple of at least a portion of the target elements in the second vector, the number of shifts and/or additions and subtractions that need to be performed during the dot product operation can be reduced, thereby simplifying the structure of the arithmetic circuit 100.

In some specific examples, the first operand may be generated from a single target element in the second vector. For example, the first operand may be the target element itself. Alternatively, the first operand may be equal to the bitwise inversion of a target element in the second vector; in particular, where the first coefficient in the first vector corresponding to this target element is negative, the bitwise inversion allows the subtraction in the dot product operation to be converted into an addition, thereby further simplifying the structure of the arithmetic circuit 100.
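The conversion of a subtraction into an addition by bitwise inversion rests on the two's-complement identity -b = ~b + 1 within a fixed word width. The following Python sketch illustrates that identity; the 16-bit width and the helper names `invert`, `to_signed` and `sub_via_invert` are assumptions chosen for the example, not details taken from the disclosure.

```python
WIDTH = 16                      # assumed word width in bits
MASK = (1 << WIDTH) - 1


def invert(x):
    """Bitwise inversion of x within WIDTH bits."""
    return ~x & MASK


def to_signed(x):
    """Interpret a WIDTH-bit word as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << WIDTH) if x >> (WIDTH - 1) else x


def sub_via_invert(a, b):
    """Compute a - b as a + ~b + 1, using additions only."""
    return to_signed(a + invert(b) + 1)


print(sub_via_invert(40, 11))   # 29
print(sub_via_invert(4, 11))    # -7
```

When several terms are negated in one sum, the individual "+ 1" corrections can be gathered into a single constant added once, which is consistent with the preset constant mentioned in the summary for the second addition unit.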

In other specific examples, the first operand may be generated from at least two target elements in the second vector. For example, if, in the dot product operation, the first coefficients in the first vector that are multiplied by two or more target elements in the second vector have equal absolute values, the sum or difference of those target elements may be calculated first, and the product of the resulting sum or difference and the coefficient may be calculated afterwards. For example, where the first vector is A = (-1, 4, -11, 40, 40, -11, 4, -1) and the second vector is B = (p₀, p₁, p₂, p₃, p₄, p₅, p₆, p₇), the values of p₀+p₇, p₁+p₆, p₂+p₅ or p₃+p₄ may be calculated first, and their products with the corresponding coefficients calculated afterwards. That is, the first operand may be the sum or difference of two or more target elements, or the bitwise inversion of that sum or difference, as required. It will be appreciated that the first operand may also be calculated from one or more target elements in the second vector in other ways as required, which is not limited herein.
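A short Python sketch of this pre-addition, using the symmetric H.265 half-pixel vector, is given below. The function name `dot_folded` and the sample values are assumptions for illustration; the multiplications are written with `*` for readability, whereas a circuit would realise them with shifts and additions.

```python
A = (-1, 4, -11, 40, 40, -11, 4, -1)


def dot_folded(p):
    """Dot product with A, pre-adding the pixel pairs that share a coefficient."""
    s0 = p[0] + p[7]    # both multiplied by -1
    s1 = p[1] + p[6]    # both multiplied by 4
    s2 = p[2] + p[5]    # both multiplied by -11
    s3 = p[3] + p[4]    # both multiplied by 40
    return -s0 + 4 * s1 - 11 * s2 + 40 * s3


pixels = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical pixel values
print(dot_folded(pixels))            # 100
```

Pre-adding halves the number of constant multiplications from eight to four, at the cost of four additions on the inputs.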

Further, it should be noted that the second vector may contain multiple target elements that meet the above condition, but in some embodiments of the present disclosure not every target element needs to be converted into a second operand. For example, if a certain first coefficient in the first vector is 6 and the preset multiple is 5, then for the element p in the second vector that is multiplied by this first coefficient, the multiple operation module 110 may be used to calculate 5p so that 6p is expressed as 5p+p for further calculation; alternatively, 5p need not be calculated, and 6p may be expressed as 4p+2p for further calculation. Both approaches require the same number of addition operations. In general, generating the corresponding second operand from only a portion of the target elements can already reduce the number of shift and/or addition calculations required, thereby simplifying the arithmetic circuit 100 to some extent.
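The two ways of forming 6p mentioned above can be compared directly. In this minimal sketch the function names are hypothetical, and `five_p` stands for a value assumed to have been produced earlier by a multiple operation module.

```python
def six_times_with_preset(p, five_p):
    """6p = 5p + p, reusing a precomputed 5p: one addition."""
    return five_p + p


def six_times_with_shifts(p):
    """6p = 4p + 2p, using two shifts: one addition."""
    return (p << 2) + (p << 1)


p = 37                                   # hypothetical element value
five_p = (p << 2) + p                    # what the multiple operation module would supply
print(six_times_with_preset(p, five_p))  # 222
print(six_times_with_shifts(p))          # 222
```

Each variant ends with a single addition, so for a coefficient of 6 the precomputed 5p brings no saving by itself, in line with the observation in the text.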

In some embodiments, the arithmetic circuit 100 may include one or more multiple operation modules 110 to convert the respective first operands into second operands. Where there are a plurality of multiple operation modules 110, they may be used to calculate the products of their respective first operands and the same preset multiple. Alternatively, if necessary, a plurality of preset multiples may be set, in which case the plurality of multiple operation modules 110 may instead be used to calculate the products of the same first operand and the respective preset multiples, which is not limited herein.

In some embodiments, the preset multiple may be determined from the first vector. In general, the larger the preset multiple, the more likely it is that the multiplications involving first coefficients of larger absolute value can be converted into fewer shift and/or addition calculations; at the same time, however, first coefficients of smaller absolute value generally cannot be rewritten in terms of the preset multiple and must still be handled by the original shift-and-add approach. The specific value of the preset multiple may therefore be chosen by weighing these factors for the particular first vector. To simplify the operation, the preset multiple may be an integer. Further, in some embodiments, since a multiple that is an integer power of 2 can be implemented by a shift operation alone, the preset multiple may be set to an odd number or a prime number to simplify the operation as much as possible. For example, the preset multiple may include at least one of 3, 5, and 7. In some specific examples below, the technical solutions of the present disclosure will be described in detail by taking a preset multiple of 5 as an example.
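One rough way to weigh candidate preset multiples is to count how many shifted terms each coefficient needs. The Python sketch below does this for the H.265 half-pixel vector. It is a heuristic of our own, not the selection procedure of the disclosure: it assumes each coefficient c is split as c = second × k + third by integer division, uses the number of 1 bits as a proxy for the number of shifted terms, and ignores the additions spent computing the k-multiple itself.

```python
def popcount(n):
    """Number of 1 bits in a non-negative integer."""
    return bin(n).count("1")


def split(coeff, k):
    """Return (second, third) with second * k + third == abs(coeff)."""
    return divmod(abs(coeff), k)


def shifted_terms(coeff, k):
    """Fewest shifted terms for abs(coeff), with or without a precomputed k-multiple."""
    c = abs(coeff)
    plain = popcount(c)
    if c < k:
        return plain                      # too small to use the preset multiple
    second, third = split(c, k)
    return min(plain, popcount(second) + popcount(third))


A = (-1, 4, -11, 40, 40, -11, 4, -1)
for k in (3, 5, 7):
    print(k, sum(shifted_terms(c, k) for c in A))
```

Under these assumptions a preset multiple of 5 gives the lowest count for this vector, because 11 = 2·5 + 1 and 40 = 8·5 each collapse to fewer terms than their plain binary forms.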

In some embodiments, as shown in fig. 5 to 7, the multiple operation module 110 may include one or more multiple calculation units, wherein the multiple calculation unit may include a first shifter 111 or a first addition unit 112.

In the specific example shown in fig. 5, the multiple calculation unit may include a first shifter 111, and the first shifter 111 may be configured to shift the fourth operand to the left by m bits to generate a fifth operand, that is, a fourth operand of which the fifth operand is 2 ^m times, where the fourth operand may be a first operand of a positive integer multiple, and m is a positive integer. For example, the value of m may be 1, and the first shifter 111 may be configured to generate various even-numbered times of the first operand depending on the specific value of the fourth operand.

In the specific example shown in fig. 6, the multiple calculation unit may include a first addition unit 112, and the first addition unit 112 may be configured to add a sixth operand and a seventh operand to generate an eighth operand, that is, the eighth operand is the sum of the sixth operand and the seventh operand, where the sixth operand may be a positive integer multiple of the first operand, and the seventh operand may also be a positive integer multiple of the first operand. It follows that the first addition unit 112 may generate various integer multiples of the first operand, depending on the specific values of the sixth and seventh operands. In some specific examples, the first addition unit 112 may include an adder to perform the addition of the two operands.

In addition, the various multiple calculation units may be used in combination, which is not limited herein. In the specific example shown in fig. 7, the first addition unit 112 may be connected to the output of the first shifter 111 to conveniently generate the required second operand.
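The combination of a shifter and an addition unit can be illustrated with a short software model. The following Python sketch is not part of the disclosed circuit; the function names `shift_left`, `add`, `times3`, `times5`, and `times7` are hypothetical and merely mimic how a first shifter 111 feeding a first addition unit 112 could produce odd multiples of the first operand.

```python
def shift_left(x: int, m: int) -> int:
    """Software stand-in for the first shifter 111: returns 2**m times x."""
    return x << m


def add(a: int, b: int) -> int:
    """Software stand-in for the first addition unit 112."""
    return a + b


def times3(x: int) -> int:
    # 3x = 2x + x: shifter output (2x) added to the first operand itself.
    return add(shift_left(x, 1), x)


def times5(x: int) -> int:
    # 5x = 4x + x: shifter output (4x) added to the first operand itself.
    return add(shift_left(x, 2), x)


def times7(x: int) -> int:
    # 7x = 2*(3x) + x: here the shifted operand is itself a multiple (3x)
    # of the first operand, and the adder adds the first operand to it.
    return add(shift_left(times3(x), 1), x)
```

In each case the operands entering the shifter and the adder are positive integer multiples of the first operand, which matches the roles of the fourth, sixth, and seventh operands described above.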

Further, in some embodiments, as shown in fig. 5 to 7, the multiple operation module 110 may further include a first register 113, the first register 113 may be communicatively connected with at least one multiple calculation unit, and the first register 113 may be configured to register a fifth operand or an eighth operand from the multiple calculation unit. In the specific examples shown in fig. 5 to 7, the first register 113 may be connected to the output terminal of the last-stage multiple calculation unit in the multiple operation module 110, so as to reduce glitches in the signal, help control the timing in the arithmetic circuit 100, and ensure correct operation.

Returning to fig. 4, as described above, the inner product operation module 120 may be communicatively coupled to the multiple operation module 110, and the inner product operation module 120 may be configured to calculate a dot product of the first vector and the second vector based at least on the second operand from the multiple operation module 110 and the second coefficient multiplied by the second operand. Here, for each first operand and corresponding second operand, the product of the second coefficient and the preset multiple is less than or equal to the first coefficient. For example, if the first vector a = (5, 15, -10, 5, 5, -10, 15, 5), the second vector b = (p₀, p₁, p₂, p₃, p₄, p₅, p₆, p₇), and the preset multiple is 5, the inner product operation module 120 may be configured to calculate:

a·b = 1×(5p₀) + 3×(5p₁) - 2×(5p₂) + 1×(5p₃) + 1×(5p₄) - 2×(5p₅) + 3×(5p₆) + 1×(5p₇)    (3)

The terms 5p₀ to 5p₇ in the above formula (3) may be pre-calculated by the multiple operation module 110, and the other multiplication, addition, and subtraction operations involved may be implemented by the inner product operation module 120, where the second coefficients corresponding to the second operands 5p₀ to 5p₇ are 1, 3, 2 (or -2), 1, 1, 2 (or -2), 3, 1, respectively.

In some embodiments, depending on the particular value of the first coefficient in the first vector, the term associated with a certain target element in the second vector may be converted into the sum of a product involving the second operand and an integer-power-of-2 multiple of that target element. In this case, in order to correctly calculate the dot product of the first vector and the second vector, the inner product operation module 120 may be further configured to calculate the dot product of the first vector and the second vector from the first operand corresponding to the second operand and the third coefficient multiplied by the first operand, wherein, for each first operand and corresponding second operand, the product of the second coefficient and the preset multiple plus the third coefficient is equal to the first coefficient. For example, if the first vector a = (6, 15, -10, 5, 5, -10, 15, 5), the second vector b = (p₀, p₁, p₂, p₃, p₄, p₅, p₆, p₇), and the preset multiple is 5, the inner product operation module 120 may be configured to calculate:

a·b = 1×(5p₀) + 1×p₀ + 3×(5p₁) - 2×(5p₂) + 1×(5p₃) + 1×(5p₄) - 2×(5p₅) + 3×(5p₆) + 1×(5p₇)    (4)

The terms 5p₀ to 5p₇ in the above formula (4) may be pre-calculated by the multiple operation module 110. For the term related to the first operand p₀, the second coefficient is 1 and the third coefficient is 1, that is, 6p₀ = 1×5p₀ + 1×p₀, and the related multiplication, addition, and subtraction operations may be implemented by the inner product operation module 120.
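The relation "second coefficient × preset multiple + third coefficient = first coefficient" can be checked with a small software model. The Python sketch below is illustrative only: the helper names `decompose` and `dot_with_preset_multiple` are hypothetical, the split uses truncating division (one possible choice, not necessarily the one used in any particular embodiment), and the second operand is modeled as an ordinary multiplication rather than as a shift-and-add circuit.

```python
PRESET_MULTIPLE = 5  # preset multiple used in the examples of this description


def decompose(first_coeff: int, preset: int = PRESET_MULTIPLE):
    """Split a first coefficient into (second_coeff, third_coeff) such that
    second_coeff * preset + third_coeff == first_coeff.

    Assumption: the split truncates toward zero so that both parts carry
    the sign of the first coefficient.
    """
    sign = -1 if first_coeff < 0 else 1
    second, third = divmod(abs(first_coeff), preset)
    return sign * second, sign * third


def dot_with_preset_multiple(a, b, preset: int = PRESET_MULTIPLE) -> int:
    """Dot product of a (known coefficients) and b using pre-computed multiples."""
    total = 0
    for first_coeff, p in zip(a, b):
        second_coeff, third_coeff = decompose(first_coeff, preset)
        second_operand = preset * p  # stands in for the multiple operation module
        total += second_coeff * second_operand + third_coeff * p
    return total
```

For the first vector of formula (4), `decompose(6)` yields a second coefficient of 1 and a third coefficient of 1, while coefficients such as 15 and -10 yield a third coefficient of 0, so only the second operand contributes for those elements.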

In some embodiments, depending on the particular value of the first coefficient in the first vector, an element in the second vector may not be convertible to a form containing the second operand. In this case, in order to correctly calculate the dot product of the first vector and the second vector, the inner product operation module 120 may be further configured to calculate the dot product of the first vector and the second vector according to a third operand and the first coefficient of the first vector multiplied by the third operand, wherein the third operand is generated from at least one element of the second vector other than the elements used to generate the first operand. In some specific examples, the third operand may be generated from a single element in the second vector. For example, the third operand may be the element of the second vector itself. Alternatively, the third operand may be equal to the bit-wise negation of an element in the second vector; especially in the case where the first coefficient in the first vector corresponding to this element is negative, the subtraction in the dot product operation may be converted into addition by the bit-wise negation operation, thereby further simplifying the structure of the arithmetic circuit 100. In other specific examples, the third operand may be generated from at least two elements in the second vector. For example, if, in the dot product operation, the first coefficients in the first vector multiplied by two or more elements in the second vector have equal absolute values, the sum or difference of these two or more elements may be calculated first, and then the product of the resulting sum or difference and the coefficient may be calculated, similar to what is explained above with respect to the first operand. The third operand may be the sum or difference of two or more elements in the second vector, or may be equal to the bit-wise negation of that sum or difference, as desired. It will be appreciated that the third operand may also be calculated in other ways from one or more elements in the second vector as desired, which is not limited herein. For example, if the first vector a = (-1, 4, -11, 40, 40, -11, 4, -1) and the second vector b = (p₀, p₁, p₂, p₃, p₄, p₅, p₆, p₇), the inner product operation module 120 may be configured to calculate:

a·b = -p₀ + 4p₁ - 2×(5p₂) - p₂ + 8×(5p₃) + 8×(5p₄) - 2×(5p₅) - p₅ + 4p₆ - p₇    (5)

The terms 5p₂ to 5p₅ in the above formula (5) may be pre-calculated by the multiple operation module 110, and the second coefficients of these terms are 2 (or -2), 8, 8, and 2 (or -2), respectively, while for the terms related to the first operands p₂ and p₅, the third coefficients are each 1 (or -1). In addition, for the terms related to the third operands p₀, p₁, p₆, and p₇, the first coefficients are 1 (or -1), 4, 4, and 1 (or -1), respectively. As described above, the relevant multiplication, addition, and subtraction calculations may be implemented by the inner product operation module 120.
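To make the example of the first vector a = (-1, 4, -11, 40, 40, -11, 4, -1) concrete, the following Python sketch evaluates the dot product using only shifts, additions, and subtractions. It is a behavioral model under stated assumptions, not the disclosed hardware: the function name `dot_formula_5` is hypothetical, and the subtractions are written directly rather than being converted into additions by bit-wise inversion.

```python
def dot_formula_5(p) -> int:
    """Evaluate a.b for a = (-1, 4, -11, 40, 40, -11, 4, -1) with shifts/adds only."""
    # Multiple operation module: second operands 5*p2 .. 5*p5, each as (x << 2) + x.
    five = [(x << 2) + x for x in p[2:6]]

    # Inner product operation module: shifts supply the factors 2, 4 and 8.
    return (
        -p[0] + (p[1] << 2)        # third operands p0, p1 (first coefficients -1, 4)
        - (five[0] << 1) - p[2]    # -11*p2 = -2*(5*p2) - 1*p2
        + (five[1] << 3)           #  40*p3 =  8*(5*p3)
        + (five[2] << 3)           #  40*p4 =  8*(5*p4)
        - (five[3] << 1) - p[5]    # -11*p5 = -2*(5*p5) - 1*p5
        + (p[6] << 2) - p[7]       # third operands p6, p7 (first coefficients 4, -1)
    )
```

Each coefficient 40 is obtained from a single shift of a pre-computed second operand, whereas a plain shift-and-add expansion of 40 = 32 + 8 would need two shifted copies of the element.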

In some embodiments, as shown in fig. 8, the inner product operation module 120 may include at least one of a second shifter 121a and a second addition unit 122a.

The second shifter 121a may be configured to shift a ninth operand, which may be the second operand, the first operand, or the third operand (depending on the specific dot product operation), to the left by n bits to generate a tenth operand, i.e., the tenth operand is 2ⁿ times the ninth operand, where n is a positive integer. In a special case, the inner product operation module 120 may comprise only the second shifter 121a if the dot product of the first vector a and the second vector b can be written as the product of a single operand and an integer power of 2.

The second addition unit 122a may be configured to calculate a sum of at least a part of the product of the second operand and the second coefficient, the product of the first operand and the third coefficient, and the product of the third operand and the first coefficient (depending on the specific dot product operation). In some specific examples, the second addition unit 122a may be implemented by a compression tree. A compression tree can efficiently compress a plurality of data into the form of a sum of two or three data. In addition, the shift operation on the related data can be implemented inside the compression tree as needed, so that the shift and addition operations involved in the dot product operation are both realized there, and the plurality of numbers are shifted and compressed into the sum of two or three numbers. Common compression trees with two outputs include 4:2 compression trees and 3:2 compression trees, and common compression trees with three outputs include 5:3 compression trees, 6:3 compression trees, and 7:3 compression trees. The compression tree may be a compression tree having any number of inputs and outputs now existing or later developed, or may be a compression tree module having any number of inputs and outputs implemented by a combination of a plurality of existing compression trees. In other specific examples, the second addition unit 122a may alternatively be implemented as a full adder, a combination of a full adder and a half adder, or the like. In this case, it may be necessary to use the second shifter 121a and the second addition unit 122a in combination to realize the shift and addition calculations in the dot product operation. In still other specific examples, the second addition unit 122a may include a compression tree together with a full adder or a combination of a full adder and a half adder. For example, for simplicity, a compression tree having two outputs may be employed in embodiments of the present disclosure, and the two output data of the compression tree may be added by an adder to obtain the final dot product operation result, as will be described in more detail below.
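The idea of compressing several numbers into two numbers with the same total can be sketched in software with carry-save (3:2) steps. The Python model below is an assumption-laden illustration rather than the structure of any compression tree in the figures: the names `compress_3_to_2` and `compress_to_two` are hypothetical, and the steps are chained one after another instead of being arranged as a balanced tree.

```python
def compress_3_to_2(x: int, y: int, z: int):
    """One 3:2 carry-save step.

    Returns (sum_word, carry_word) with x + y + z == sum_word + carry_word.
    Each bit position acts as an independent full adder, so no carry
    ripples across the word inside this step.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def compress_to_two(values):
    """Reduce a list of integers to two integers with the same total."""
    vals = list(values)
    while len(vals) > 2:
        s, c = compress_3_to_2(vals[0], vals[1], vals[2])
        vals = vals[3:] + [s, c]   # three inputs replaced by two outputs
    while len(vals) < 2:
        vals.append(0)
    return vals[0], vals[1]
```

Only the final addition of the two outputs requires a carry-propagating adder, which corresponds to the adder that follows a two-output compression tree in the description above.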

In some embodiments, as shown in fig. 8, the inner product operation module 120 may further include a second register 123a, the second register 123a may be communicatively connected with the second shifter 121a or the second addition unit 122a, and the second register 123a may be configured to register a tenth operand from the second shifter 121a or a sum of products from the second addition unit 122a, thereby reducing glitches in the signal, helping to control the timing in the arithmetic circuit 100, and ensuring correct operation.

It will be appreciated that the second shifter 121a, the second addition unit 122a or the second register 123a may be provided in a variety of ways, and their respective numbers may be one or more as needed, and may have various arrangements, as long as shifting and addition calculations involved in the dot product operation can be implemented, which are not limited herein.

As mentioned above, in the case where the first vector includes a negative coefficient, the subtraction in the dot product operation may be converted into addition by a bit-wise inversion operation in order to simplify the operation. For example, when negative numbers are represented in two's complement, (-1)×b = ~b + 1, where ~b is the bit-wise inversion of b. It follows that the product of a negative coefficient and an element in the second vector can be expressed as the sum of two parts: the product of the absolute value of the coefficient and the bit-wise inversion of the element, and a constant term. The constant term is determined by the specific value of the coefficient and can be separated out for subsequent processing. Accordingly, in some embodiments, as shown in fig. 9, the arithmetic circuit 100 may further include a bit-wise inverting module 130. In the case where a first coefficient in the first vector is negative, the bit-wise inverting module 130 may be configured to invert bit-wise the first operand or the third operand multiplied by that first coefficient, or to invert bit-wise the second operand generated from the first operand multiplied by that first coefficient. The bit-wise inverting module 130 may be connected to the input of the multiple operation module 110, that is, each element of the second vector whose corresponding first coefficient is negative is inverted bit-wise in advance of the subsequent calculation. The constant term separated out in this process may be compensated in the inner product operation module 120 or at another suitable position, which is not limited herein. Alternatively, the bit-wise inverting module 130 may be connected between the multiple operation module 110 and the inner product operation module 120, that is, before the inner product operation module 120, to convert the subtraction possibly involved in the dot product operation into addition. Similarly, the constant term separated out in this process may be compensated in the inner product operation module 120 or at another suitable position, without limitation. In some embodiments, the bit-wise inverting module 130 may be formed of a plurality of inverters, or may be formed in other ways, without limitation. In some embodiments, particularly in the case where the constant term is compensated in the inner product operation module 120, the second addition unit 122a may be configured to calculate a sum of a preset constant and at least a part of the product of the second operand and the second coefficient, the product of the first operand and the third coefficient, and the product of the third operand and the first coefficient, wherein the preset constant may be determined according to at least some of the negative first coefficients in the first vector, i.e., the preset constant is closely related to the constant term.
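The conversion of subtraction into addition plus a preset constant can be checked numerically. The Python sketch below is a minimal model, assuming that each element multiplied by a negative coefficient is inverted bit-wise directly (one of the placements described above); the function name `dot_with_inversion` is hypothetical. Python integers are unbounded, but `~b == -b - 1` holds for them just as it does in fixed-width two's complement arithmetic.

```python
def dot_with_inversion(coeffs, elems) -> int:
    """Dot product in which negative coefficients are handled by bit-wise
    inversion of the element plus a single preset constant.

    For c < 0:  c * b = |c| * (-b) = |c| * (~b + 1) = |c| * (~b) + |c|,
    so the constant term contributed by that coefficient is |c|.
    """
    # Preset constant: sum of the constant terms of all negative coefficients.
    preset_constant = sum(-c for c in coeffs if c < 0)

    total = preset_constant
    for c, b in zip(coeffs, elems):
        if c < 0:
            total += (-c) * (~b)   # addition only; the inverter supplies ~b
        else:
            total += c * b
    return total
```

Because the coefficients are known in advance, the preset constant in this model is fixed for a given first vector and does not depend on the elements of the second vector.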

In an embodiment of the present disclosure, in order to simplify the structure of the arithmetic circuit, reduce the delay of the critical path, and allow the timing to converge properly, the inner product operation module 120 may be designed in a three-stage pipeline form. As shown in fig. 10 to 12, the inner product operation module 120 may include a third addition unit 122b and a fourth addition unit 122c communicatively connected to the third addition unit 122b. The third addition unit 122b may be configured to add at least a product of the first operand and the third coefficient and a product of the third operand and the first coefficient to generate an eleventh operand and a twelfth operand. Here, the third addition unit 122b may process only the calculations related to the first operand and the third operand, so that the propagation of glitches can be effectively reduced. The third addition unit 122b and the multiple operation module 110 may operate in the same clock cycle. For example, the first addition unit 112 in the multiple operation module 110 and the third addition unit 122b in the inner product operation module 120 may operate in the same clock cycle to generate, respectively, the second operand and the sum of the product of the first operand and the third coefficient and the product of the third operand and the first coefficient. The operation results may then be provided to the fourth addition unit 122c in the inner product operation module 120, and the fourth addition unit 122c may be configured to add at least the eleventh operand and the twelfth operand from the third addition unit 122b and the product of the second operand and the second coefficient to generate a thirteenth operand and a fourteenth operand. Here, the fourth addition unit 122c may operate in a clock cycle subsequent to the operating clock cycle of the third addition unit 122b, so that the fourth addition unit 122c can obtain all the data required for its calculation. In some specific examples, the first addition unit 112 may be in the form of a compression tree and/or an adder, and the third addition unit 122b and the fourth addition unit 122c may each be in the form of a compression tree.

Further, as shown in fig. 12, the inner product operation module 120 may further include a fifth addition unit 122d, the fifth addition unit 122d may be communicatively connected with the fourth addition unit 122c, and the fifth addition unit 122d may be configured to add a thirteenth operand and a fourteenth operand to generate a dot product of the first vector and the second vector. In some specific examples, the fifth addition unit 122d may be in the form of an adder.

Further, in some embodiments, to reduce glitches in the signals and help achieve correct timing in the arithmetic circuit 100, as shown in fig. 12, the inner product operation module 120 may further include a third register 123b, a fourth register 123c, a fifth register 123d, and a sixth register 123e. The third register 123b and the fourth register 123c may each be communicatively coupled to the third addition unit 122b, the third register 123b may be configured to register the eleventh operand from the third addition unit 122b, and the fourth register 123c may be configured to register the twelfth operand from the third addition unit 122b. The fifth register 123d and the sixth register 123e may each be communicatively coupled to the fourth addition unit 122c, the fifth register 123d may be configured to register the thirteenth operand from the fourth addition unit 122c, and the sixth register 123e may be configured to register the fourteenth operand from the fourth addition unit 122c. Under the action of the first register 113, the third register 123b, and the fourth register 123c, the fourth addition unit 122c may synchronously or substantially synchronously acquire the data it requires to calculate the dot product of the first vector and the second vector.

Further, as shown in fig. 12, the inner product operation module 120 may further include a seventh register 123f, the seventh register 123f may be communicatively connected with the fifth addition unit 122d, and the seventh register 123f may be configured to register the dot product of the first vector and the second vector from the fifth addition unit 122d.

As shown in fig. 10 to 12, taking the first vector a = (-1, 4, -11, 40, 40, -11, 4, -1) and the second vector b = (d0, d1, d2, d3, d4, d5, d6, d7) as an example, the first addition unit 112 may be used to calculate the values of 5d2 (or its bit-wise inverted form), 5d3, 5d4, and 5d5 (or its bit-wise inverted form), and the first register 113 may be used to register these values. Simultaneously with the calculation by the first addition unit 112, the third addition unit 122b may be used to calculate (~d0) + 4d1 + (~d2) + (~d5) + 4d6 + (~d7), where ~ marks a bit-wise inverted operand, and the two resulting operands reg_s1 and reg_c1 may be registered in the third register 123b and the fourth register 123c, respectively. The fourth addition unit 122c may then be used to calculate 2×(~5d2) + 8×(5d3) + 8×(5d4) + 2×(~5d5) + reg_s1 + reg_c1 + K, where the preset constant K arises from the processing of the negative coefficients in the manner described above, and the two operands generated by the fourth addition unit 122c may be registered in the fifth register 123d and the sixth register 123e, respectively. Finally, the fifth addition unit 122d may add the operands registered in the fifth register 123d and the sixth register 123e to obtain the final dot product operation result, which may be registered in the seventh register 123f. As can be seen by comparing fig. 11 with fig. 2, the technical scheme of the present disclosure reduces the addition of 14 numbers involved in the dot product operation to the addition of 10 numbers.
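The three-stage flow described above can be modeled behaviorally in software. The Python sketch below is an illustration under explicit assumptions, not the circuit of fig. 10 to 12: all function names are hypothetical, the bit-wise inversion is assumed to be applied to the second operands after multiplication by 5, the value K = 8 follows only from that assumption (four inverted elements contribute 1 each and two doubled inverted second operands contribute 2 each), and the compression trees are modeled as chained 3:2 carry-save steps.

```python
A = (-1, 4, -11, 40, 40, -11, 4, -1)  # first vector from the example
K = 8  # preset constant under the inversion placement assumed in this sketch


def compress(values):
    """Reduce integers to two integers with the same total (carry-save steps)."""
    vals = list(values)
    while len(vals) > 2:
        x, y, z = vals[:3]
        s = x ^ y ^ z
        c = ((x & y) | (x & z) | (y & z)) << 1
        vals = vals[3:] + [s, c]
    return vals[0], vals[1]


def times5(x: int) -> int:
    return (x << 2) + x  # 5x = 4x + x


def stage1(d):
    """Cycle 1: multiple operation and third addition unit work in parallel."""
    reg_113 = (~times5(d[2]), times5(d[3]), times5(d[4]), ~times5(d[5]))
    reg_s1, reg_c1 = compress(
        [~d[0], d[1] << 2, ~d[2], ~d[5], d[6] << 2, ~d[7]]
    )
    return reg_113, reg_s1, reg_c1


def stage2(reg_113, reg_s1, reg_c1):
    """Cycle 2: fourth addition unit combines everything plus the constant K."""
    inv5d2, five_d3, five_d4, inv5d5 = reg_113
    return compress(
        [inv5d2 << 1, five_d3 << 3, five_d4 << 3, inv5d5 << 1, reg_s1, reg_c1, K]
    )


def stage3(reg_s2, reg_c2) -> int:
    """Cycle 3: fifth addition unit performs the only carry-propagating add."""
    return reg_s2 + reg_c2


def dot_pipeline(d) -> int:
    return stage3(*stage2(*stage1(d)))
```

In this model the values returned by each stage play the role of the registered operands, so each stage only depends on data produced in the previous cycle.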

In the technical scheme of the present disclosure, pre-calculating the preset multiple of certain vector elements involved in the vector dot product calculation effectively reduces the number of shift and/or addition calculations needed in the subsequent calculation. The number of shifters and/or addition units needed in the arithmetic circuit can therefore be reduced, so that the structure of the arithmetic circuit is simpler, the circuit area is reduced, the critical path is shortened, the timing converges more easily, the delay in the circuit is reduced, and the circuit power consumption is reduced, thereby improving the performance of the arithmetic circuit and of a computing device comprising the arithmetic circuit. In some examples, the arithmetic circuit and the computing device of the present disclosure may be used in the field of image interpolation, where they can effectively improve the efficiency and effect of image processing.

The words "left", "right", "front", "rear", "top", "bottom", "upper", "lower", "high", "low", and the like in the description and in the claims, if present, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein. For example, when the device in the figures is inverted, features that were originally described as "above" other features may be described as "below" the other features. The device may also be otherwise oriented (rotated 90 degrees or at other orientations) and the relative spatial relationship will be explained accordingly.

In the description and claims, when an element is referred to as being "on," "attached to," "connected to," "coupled to," or "contacting" another element, and the like, the element may be directly on, attached to, connected to, coupled to, or contacting the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly on," "directly attached to," "directly connected to," "directly coupled to," or "directly contacting" another element, there are no intervening elements present. In the description and claims, a feature being disposed "adjacent" to another feature may refer to a feature having a portion that overlaps with, or is located above or below, the adjacent feature.

As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" to be replicated accurately. Any implementation described herein by way of example is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, this disclosure is not limited by any expressed or implied theory presented in the technical field, background, brief summary or the detailed description.

As used herein, the term "substantially" is intended to encompass any minor variation due to design or manufacturing imperfections, tolerances of the device or element, environmental effects and/or other factors. The word "substantially" also allows for differences from perfect or ideal situations due to parasitics, noise, and other practical considerations that may be present in a practical implementation.

In addition, for reference purposes only, the terms "first," "second," and the like may also be used herein, and are thus not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.

It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components, and/or groups thereof.

In addition, as used in this application, the words "herein," "above," "below," and words of similar import shall refer to this application as a whole and not to any particular portions of this application. Furthermore, unless explicitly stated otherwise or otherwise understood in the context of use, conditional language such as "may," "might," "for example," "such as," etc., as used herein is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or states. Thus, such conditional language is not generally intended to imply that one or more embodiments require, or include, in any way, features, elements and/or states or that such features, elements and/or states are to be performed in any particular embodiment.

In this disclosure, the term "providing" is used in a broad sense to cover all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" an object, etc. Furthermore, in this disclosure, the terms "circuit," "unit," and "module" may be used interchangeably.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

Those skilled in the art will recognize that the boundaries between the above described operations are merely illustrative. The operations may be combined into a single operation, the single operation may be distributed among additional operations, and the operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in other various embodiments. Other modifications, variations, and alternatives are also possible. Aspects and elements of all of the embodiments disclosed above may be combined in any manner and/or in combination with aspects or elements of other embodiments to provide a number of additional embodiments. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Indeed, the novel apparatus, methods, and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure. For example, while blocks are presented in a given arrangement, alternative embodiments may perform similar functions with different components and/or circuit topologies, and some blocks may be deleted, moved, added, subdivided, combined, and/or modified. Each of these blocks may be implemented in a variety of different ways.

Various embodiments of the disclosure may be described in an incremental manner: identical and similar parts of the embodiments may be understood by reference to one another, and the description of each embodiment focuses on its differences from the other embodiments. In the present disclosure, descriptions of the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In the present disclosure, the schematic representations of the above terms are not necessarily for the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined in any desired manner without departing from the spirit and scope of the present disclosure. Those skilled in the art will also appreciate that various modifications might be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. An operation circuit, the operation circuit being used at least to calculate a dot product of a first vector and a second vector, wherein the first vector is a coefficient vector known in advance, the operation circuit comprising:

a multiple operation module, the multiple operation module being configured to calculate a preset multiple of a first operand to generate a corresponding second operand, wherein the first operand is generated according to at least one target element in the second vector, and in the dot product calculation, an absolute value of a first coefficient in the first vector multiplied by the target element is greater than or equal to an absolute value of the preset multiple, and the preset multiple is greater than 1; and

an inner product operation module, the inner product operation module being communicatively connected to the multiple operation module, and the inner product operation module being configured to calculate the dot product of the first vector and the second vector based on at least a second operand from the multiple operation module and a second coefficient multiplied by the second operand, wherein for each first operand and corresponding second operand, the product of the second coefficient and the preset multiple is less than or equal to the first coefficient.

2. The operation circuit according to claim 1, wherein the preset multiple is determined according to the first vector; or

the preset multiple includes at least one of 3 times, 5 times and 7 times.

3. The operation circuit according to claim 1, wherein the multiple operation module comprises one or more multiple calculation units, and the multiple calculation unit comprises:

a first shifter configured to shift a fourth operand to the left by m bits to generate a fifth operand, wherein the fifth operand is 2^m times the fourth operand, the fourth operand is a positive integer multiple of the first operand, and m is a positive integer; or

a first adding unit configured to add a sixth operand and a seventh operand to generate an eighth operand, wherein the sixth operand is a positive integer multiple of the first operand, and the seventh operand is a positive integer multiple of the first operand.

4. The operation circuit according to claim 3, wherein the multiple operation module further comprises:

a first register communicatively connected to at least one multiple calculation unit, the first register being configured to register a fifth operand or an eighth operand from the multiple calculation unit.

5. The operation circuit according to claim 1, wherein the inner product operation module is further configured to calculate the dot product of the first vector and the second vector according to the first operand corresponding to the second operand and the third coefficient multiplied by the first operand, wherein for each first operand and the corresponding second operand, the sum of the product of the second coefficient and the preset multiple plus the third coefficient is equal to the first coefficient; and/or

the inner product operation module is further configured to calculate the dot product of the first vector and the second vector based on a third operand and a first coefficient in the first vector multiplied by the third operand, wherein the third operand is generated based on at least one element in the second vector other than the element used to generate the first operand.

6. The operation circuit according to claim 5, wherein the inner product operation module comprises at least one of the following:

a second shifter configured to shift the ninth operand to the left by n bits to generate a tenth operand, wherein the tenth operand is 2^n times the ninth operand, the ninth operand is the second operand, the first operand or the third operand, and n is a positive integer; and

a second adding unit configured to calculate a sum of at least some of the following: the product of the second operand and the second coefficient, the product of the first operand and the third coefficient, and the product of the third operand and the first coefficient.

7. The operation circuit according to claim 6, wherein the inner product operation module further comprises:

a second register communicatively connected to the second shifter or the second adding unit and configured to register the tenth operand from the second shifter or the sum of products from the second adding unit.

8. The operation circuit according to claim 6, further comprising a bitwise inversion module, wherein the bitwise inversion module is configured to, when the first coefficient in the first vector is a negative number, bitwise invert the first operand or the third operand multiplied by the first coefficient, or the second operand generated according to the first operand multiplied by the first coefficient;

the second adding unit is configured to calculate a sum of a preset constant and at least some of the following: the product of the second operand and the second coefficient, the product of the first operand and the third coefficient, and the product of the third operand and the first coefficient, wherein the preset constant is determined based on the first coefficients in the first vector, at least some of which are negative.
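
One way a bitwise inversion and a preset constant can together handle negative coefficients follows from the two's complement identity -x = ~x + 1. The sketch below is an assumption about how such a constant might be derived, not a statement of the claimed circuit; the function name `dot_with_inversion` is hypothetical.

```python
def dot_with_inversion(coeffs, xs):
    """Dot product in which negative coefficients use inverted operands plus a constant."""
    # The constant depends only on the coefficients, so it could be fixed in advance.
    preset_constant = sum(-c for c in coeffs if c < 0)
    total = preset_constant
    for c, x in zip(coeffs, xs):
        if c < 0:
            total += (-c) * (~x)   # |c| * (-x - 1); the missing |c| is in the constant
        else:
            total += c * x
    return total
```

Because the coefficient vector is known in advance, the per-term "+1" corrections collapse into a single constant that is added once in the final sum.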

9. The operation circuit according to claim 5, wherein the inner product operation module comprises:

a third adding unit configured to add at least a product of the first operand and the third coefficient and a product of the third operand and the first coefficient to generate an eleventh operand and a twelfth operand, wherein the third adding unit and the multiple operation module operate in the same clock cycle; and

a fourth adding unit communicatively connected to the third adding unit and configured to add at least the eleventh operand and the twelfth operand from the third adding unit and the product of the second operand and the second coefficient to generate a thirteenth operand and a fourteenth operand.

10. The operation circuit according to claim 9, wherein the inner product operation module further comprises:

a third register, the third register being communicatively connected to the third adding unit and the third register being configured to register an eleventh operand from the third adding unit;

a fourth register, the fourth register being communicatively connected to the third adding unit and the fourth register being configured to register a twelfth operand from the third adding unit;

a fifth register, the fifth register being communicatively connected to the fourth adding unit and the fifth register being configured to register a thirteenth operand from the fourth adding unit; and

a sixth register, the sixth register being communicatively connected to the fourth adding unit and the sixth register being configured to register a fourteenth operand from the fourth adding unit.

11. The operation circuit according to claim 9, wherein the inner product operation module further comprises:

a fifth adding unit communicatively connected to the fourth adding unit and configured to add the thirteenth operand and the fourteenth operand to generate the dot product of the first vector and the second vector.
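
The claims do not name the technique, but adding units that each output two operands, followed by a final adder that combines them, are consistent with carry-save compression. The following sketch models that reading in software. The helper names `csa`, `compress` and `staged_dot`, and the mapping of stages to the third, fourth and fifth adding units, are assumptions for illustration.

```python
def csa(a, b, c):
    """3:2 carry-save compressor: returns (s, carry) with s + carry == a + b + c."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries, moved up one bit
    return s, carry


def compress(terms):
    """Reduce a list of integers to two integers with the same total."""
    terms = list(terms)
    while len(terms) > 2:
        s, carry = csa(terms.pop(), terms.pop(), terms.pop())
        terms += [s, carry]
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


def staged_dot(direct_terms, multiple_terms):
    """Three-stage sum: two compression stages, then one ordinary addition."""
    e11, e12 = compress(direct_terms)                  # role of the third adding unit
    e13, e14 = compress([e11, e12, *multiple_terms])   # role of the fourth adding unit
    return e13 + e14                                   # role of the fifth adding unit
```

Under this reading, the first stage needs no output from the multiple operation module, so the two can work in the same clock cycle, and only the last stage performs a full carry-propagating addition.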

12. The operation circuit according to claim 11, wherein the inner product operation module further comprises:

a seventh register communicatively connected to the fifth adding unit and configured to register the dot product of the first vector and the second vector from the fifth adding unit.

13. The operation circuit according to claim 1, wherein the operation circuit is configured to perform an image interpolation operation, and the first vector is a coefficient vector for image interpolation.

14. A computing device comprising one or more operation circuits according to any one of claims 1 to 13.

15. The computing device according to claim 14, wherein, in the case where the computing device comprises a plurality of operation circuits, at least two of the operation circuits operate in parallel.