TWI898206B - Operation conversion method for neural network, method of performing matrix multiplication operation based on convolution operation, and intelligence processing unit - Google Patents
- Publication number: TWI898206B
- Application number: TW112114445A
- Authority
- TW
- Taiwan
- Prior art keywords
- operand
- data
- matrix
- convolution
- matrix multiplication
Landscapes
- Complex Calculations (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Description
The present invention relates to matrix multiplication operations, and more particularly to a method and apparatus for implementing matrix multiplication operations based on convolution operations.
Refer to Figure 1, which shows a schematic diagram of matrix multiplication. The matrix multiplication operator multiplies matrix ML by matrix MR to obtain the matrix product MO (i.e., ML.MR = MO). Matrix ML and matrix MR are the two operands of the matrix multiplication operator. The number of rows rr of matrix ML is row_L (=5), and its number of columns cc is col_L (=1). The number of rows of matrix MR is row_R (=1), and its number of columns is col_R (=5). The matrix product MO is therefore a row_L×col_R (=5×5) matrix, whose entries MO(rr,cc) are MO(1,1)=a1b1, MO(1,2)=a1b2, ..., MO(2,1)=a2b1, ..., and so on.
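The Figure 1 shapes can be sketched in plain Python; this is a hypothetical illustration in which the function name `matmul` and the concrete element values standing in for a1..a5 and b1..b5 are the author's assumptions, not taken from the patent:

```python
# Triple-loop matrix multiplication, illustrating the Figure 1 shapes:
# ML is row_L x col_L = 5 x 1, MR is row_R x col_R = 1 x 5,
# so MO = ML . MR is a 5 x 5 matrix with MO[r][c] = a_{r+1} * b_{c+1}.
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[r][k] * B[k][c] for k in range(inner))
             for c in range(cols)]
            for r in range(rows)]

ML = [[1], [2], [3], [4], [5]]    # a1..a5, one column (row_L = 5, col_L = 1)
MR = [[6, 7, 8, 9, 10]]           # b1..b5, one row (row_R = 1, col_R = 5)
MO = matmul(ML, MR)               # MO[r][c] == ML[r][0] * MR[0][c]
```

Every entry of MO is a single product, matching the outer-product pattern shown in Figure 1.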
Transform neural networks contain a large number of matrix multiplication operators. However, for an electronic device that does not implement a matrix multiplication acceleration circuit, this large number of matrix multiplications is a burden that degrades the device's performance and the user experience.
In view of the shortcomings of the prior art, one purpose of the present invention is to provide an operation conversion method for a neural network, a method of performing a matrix multiplication operation based on a convolution operation, and an intelligence processing unit, so as to improve upon the prior art.
One embodiment of the present invention provides an operation conversion method for a neural network, used to convert a matrix multiplication operation into a convolution operation. The method comprises: (A) obtaining a first operand and a second operand of the matrix multiplication operation from a storage device; (B) determining third dimension information of a third operand according to first dimension information of the first operand; (C) determining fourth dimension information of a fourth operand according to second dimension information of the second operand; (D) setting a bias parameter and a scale parameter; (E) generating a convolution operator according to the third dimension information, the fourth dimension information, the bias parameter, and the scale parameter; and (F) storing the convolution operator in the storage device. The convolution operator performs the convolution operation on the third operand and the fourth operand, and the result of the convolution operation on the third operand and the fourth operand is substantially equal to the result of the matrix multiplication operation on the first operand and the second operand.
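Steps (B) through (E) above can be sketched as a small routine that derives the convolution operator's parameters from the two matrix shapes. This is a hypothetical sketch: the dictionary layout and all field names (`kernel_dims`, `input_dims`, `bias`, `scale`) are illustrative assumptions, not the patent's actual instruction encoding:

```python
def convert_matmul_to_conv(first_operand_dims, second_operand_dims):
    """Derive convolution-operator parameters from matmul operand shapes.

    first_operand_dims:  (row_R, col_R) of the right-hand matrix MR
    second_operand_dims: (row_L, col_L) of the left-hand matrix ML
    """
    row_R, col_R = first_operand_dims
    row_L, col_L = second_operand_dims
    return {
        # (B) dimension information of the third operand (kernel), from MR
        "kernel_dims": {"w": 1, "h": 1, "n": col_R, "c": row_R},
        # (C) dimension information of the fourth operand (input), from ML
        "input_dims": {"w": row_L, "h": 1, "n": 1, "c": col_L},
        # (D) bias and scale parameters
        "bias": 0,
        "scale": 1,
    }

conv_op = convert_matmul_to_conv((1, 5), (5, 1))  # the Figure 1 shapes
```

For the Figure 1 shapes this yields a kernel of 5 batches with 1 channel each and an input of width 5 with 1 channel, matching the layout described for input data KB and IB below.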
Another embodiment of the present invention provides a method of performing a matrix multiplication operation based on a convolution operation, comprising: (A) reading first data from a first storage device and storing the first data in a second storage device; (B) reading second data from the first storage device and storing the second data in the second storage device; (C) performing the convolution operation on the first data and the second data to obtain a first result; and (D) storing the first result in the first storage device. The first result is equal to a second result obtained by performing the matrix multiplication operation on the first data and the second data.

Another embodiment of the present invention provides an intelligence processing unit. The intelligence processing unit is coupled to a first storage device and includes a second storage device, a direct memory access circuit, and a computing circuit. The direct memory access circuit is coupled to the second storage device and is configured to perform the following steps: (A) read first data from the first storage device and store the first data in the second storage device; and (B) read second data from the first storage device and store the second data in the second storage device. The computing circuit is coupled to the second storage device and is configured to perform the following step: (C) perform a convolution operation on the first data and the second data to obtain a first result. The direct memory access circuit further stores the first result in the first storage device, and the first result is equal to a second result obtained by performing a matrix multiplication operation on the first data and the second data.

The technical means embodied in the embodiments of the present invention can remedy at least one of the shortcomings of the prior art; compared with the prior art, the present invention can therefore improve the performance of an electronic device.

The features, implementation, and effects of the present invention are described in detail below through embodiments with reference to the accompanying drawings.
cc: number of columns
ML, MR, MR': matrices
MO: matrix product
rr: number of rows
IB, KB, KB': input data
OB: result of the convolution operation
OBV: table
n: batch number
h: height
w: width
c: channel
400: operation conversion device
410, 710: processor
420, 702: memory
430: storage device
700: electronic device
701: chip
720: intelligence processing unit
722: direct memory access circuit
724: cache memory
726: computing circuit
727: convolution core
728: vector core
S305, S310, S320, S330, S340, S350, S360, S370, S380, S390, S610, S810, S820, S830, S840, S850, S860: steps
Figure 1 is a schematic diagram of matrix multiplication; Figure 2 is a schematic diagram of implementing matrix multiplication with a convolution operation according to the present invention; Figure 3 is a flowchart of an embodiment of the operation conversion method for a neural network of the present invention; Figure 4 is a functional block diagram of an embodiment of the operation conversion device of the present invention; Figure 5 is a schematic diagram of another embodiment of the data rearrangement of the present invention; Figure 6 is a flowchart of another embodiment of the operation conversion method for a neural network of the present invention; Figure 7 is a functional block diagram of an embodiment of the electronic device of the present invention; and Figure 8 is a flowchart of an embodiment of the method of performing a matrix multiplication operation based on a convolution operation of the present invention.
The technical terms in the following description follow the customary usage of this technical field; where this specification explains or defines a term, the explanation or definition in this specification prevails.

The disclosure of the present invention includes an operation conversion method for a neural network, a method of performing a matrix multiplication operation based on a convolution operation, and an intelligence processing unit. Because some elements included in the intelligence processing unit of the present invention may, taken individually, be known elements, the following description omits details of known elements without affecting the full disclosure and enablement of the apparatus invention. In addition, part or all of the flow of the method of performing a matrix multiplication operation based on a convolution operation of the present invention may take the form of software and/or firmware and may be executed by the intelligence processing unit of the present invention or its equivalent; without affecting the full disclosure and enablement of the method invention, the following description of the method invention focuses on the steps rather than on hardware.

Since the convolutional neural network is one of the most common neural networks today, the present invention proposes an electronic device and method that implement matrix multiplication operations based on convolution operations.
Refer to Figure 2, which shows a schematic diagram of implementing matrix multiplication with a convolution operation according to the present invention. Input data IB and input data KB are the two operands of a 2-dimensional (2D) convolution operator. For example, input data IB may be a tensor or a part of a tensor (e.g., a tile), and input data KB may be the weight data (convolution kernel) corresponding to input data IB.

The dimension information of input data IB is as follows: the batch number (n) is 1, the height (h) is 1, the width (w) is row_L (=5), and the channel (c) is col_L (=1).

The dimension information of input data KB is as follows: the batch number (n) is col_R (=5) (i.e., input data KB contains 5 blocks, each corresponding to one convolution kernel), the height (h) is 1, the width (w) is 1, and the channel (c) is row_R.

The dimension information of the result OB of the convolution operation is as follows: the batch number (n) is 1, the depth (d) is 1, the height (h) is 1, the width (w) is row_L, and the channel (c) is col_R.

Table OBV shows the value of each entry OB(w,c) of the convolution result OB. More specifically, OB(1,1)=a1b1, OB(1,2)=a1b2, ..., OB(2,1)=a2b1, ..., and so on.

Refer to Figures 1 and 2 together. Because OB(x,y) = MO(x,y) (x and y being positive integers), the convolution operation can be used to perform matrix multiplication. In some embodiments, the convolution operation sets the bias parameter to 0 and the scale parameter to 1.
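The equivalence OB(x,y) = MO(x,y) can be checked with a minimal pointwise (1×1) convolution over the layout described above. This is a sketch for illustration only, not the patent's hardware implementation; the function names and example values are assumptions:

```python
def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def conv_1x1(IB, KB, bias=0, scale=1):
    """Pointwise convolution: IB holds w rows of c channels, KB holds n
    kernels of c channels each; OB[w][n] = scale * <IB[w], KB[n]> + bias."""
    return [[scale * sum(iv * kv for iv, kv in zip(row, kernel)) + bias
             for kernel in KB] for row in IB]

ML = [[1], [2], [3], [4], [5]]     # 5 x 1 left matrix
MR = [[6, 7, 8, 9, 10]]            # 1 x 5 right matrix
IB = ML                            # w = row_L = 5, c = col_L = 1
KB = [[b] for b in MR[0]]          # n = col_R = 5 kernels, c = row_R = 1
OB = conv_1x1(IB, KB)              # bias = 0, scale = 1
MO = matmul(ML, MR)
```

With bias 0 and scale 1, each output entry OB[w][n] is the single product a(w+1)·b(n+1), which is exactly the corresponding entry of the matrix product MO.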
Refer to Figure 3, a flowchart of an embodiment of the operation conversion method for a neural network of the present invention. The operation conversion method converts a matrix multiplication operator into a convolution operator. Refer also to Figure 4, a functional block diagram of an embodiment of the operation conversion device of the present invention. The operation conversion device 400 includes a processor 410 (e.g., a central processing unit, a microprocessor, or a micro processing unit), a memory 420 (which may be a volatile memory, such as a Dynamic Random Access Memory (DRAM)), and a storage device 430 (which may be a volatile or non-volatile memory, such as a DRAM, a flash memory, or a hard disk). The memory 420 stores a plurality of program codes. The storage device 430 stores a plurality of instructions of a neural network, a part of which relates to matrix multiplication. By executing the program codes in the memory 420, the processor 410 carries out the flow of Figure 3 to convert matrix multiplication (a matrix multiplication operator) into a convolution operation (a convolution operator). The operation conversion device 400 may be a general-purpose computer or a computer dedicated to implementing the operation conversion method. Figure 3 includes the following steps.
Step S305: The processor 410 reads an instruction of the neural network from the storage device 430.

Step S310: The processor 410 determines whether the instruction is a matrix multiplication operation. If so, the processor 410 proceeds to step S320; otherwise, the processor 410 returns to step S305 to read the next instruction.

Step S320: The processor 410 obtains the first operand and the second operand of the matrix multiplication operation. Taking Figure 1 as an example, in this step the processor 410 obtains matrix MR (the first operand) and matrix ML (the second operand) from the storage device 430.

Step S330: The processor 410 generates a data rearrangement instruction for rearranging the data of the first operand. Because of the nature of the convolution operation, the data (i.e., the entries) of the right matrix (i.e., matrix MR) of the original matrix multiplication (ML.MR) must be rearranged. For example, referring to Figures 1 and 2 together, no data rearrangement takes place between matrix ML and input data IB, but input data KB is the result of rearranging the data of matrix MR. More specifically, the first column of matrix MR corresponds to the block of input data KB whose batch number (n) equals 1, the second column of matrix MR corresponds to the block of input data KB whose batch number (n) equals 2, and so on.

For a matrix, rearranging its data is equal or equivalent to performing a transpose operation on the matrix. The details of the matrix transpose operation are well known to those of ordinary skill in the art and are not elaborated here.
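The rearrangement can therefore be written as an ordinary transpose; a minimal sketch (the function name `rearrange` and the sample values are illustrative assumptions):

```python
def rearrange(M):
    """Data rearrangement of the right-hand matrix; for a 2-D matrix this
    is exactly the transpose: entry (r, c) moves to (c, r)."""
    return [list(col) for col in zip(*M)]

MR = [[6, 7, 8, 9, 10]]    # 1 x 5 right-hand matrix as in Figure 1
KB = rearrange(MR)         # 5 x 1: one block per column of MR
```

Each column of MR becomes one batch (one kernel) of KB, matching the column-to-block correspondence described above.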
Refer to Figure 5, a schematic diagram of another embodiment of the data rearrangement of the present invention. Matrix MR' becomes input data KB' after data rearrangement. The batch number (n) of input data KB' is col_R (=5), and the channel (c) of input data KB' is row_R (=3). For comparison, without data rearrangement, the input data corresponding to matrix MR' would have a batch number (n) of 1, a width (w) of row_R (=3), and a channel (c) of col_R (=5).

Refer again to Figure 3.
Step S340: The processor 410 determines the dimension information of input data KB (the third operand, i.e., one of the two operands of the convolution operation) according to the dimension information of matrix MR (the first operand). This step is equal or equivalent to a reshape operation of the convolution operation. For example, the dimension information of matrix MR in Figure 1 is row_R×col_R = 1×5, whereas the dimension information of input data KB in Figure 2 is (w,h,n,c) = (1,1,col_R,row_R) = (1,1,5,1). The convolution operation processes input data KB according to the dimension information of input data KB (the third operand); in other words, the dimension information of input data KB determines how the convolution operation views or processes input data KB.

Step S350: The processor 410 determines the dimension information of input data IB (the fourth operand, i.e., the other operand of the convolution operation) according to the dimension information of matrix ML (the second operand). Similarly, this step is equal or equivalent to a reshape operation of the convolution operation. For example, the dimension information of matrix ML in Figure 1 is row_L×col_L = 5×1, whereas the dimension information of input data IB in Figure 2 is (w,h,n,c) = (row_L,1,1,col_L) = (5,1,1,1). The convolution operation processes input data IB according to the dimension information of input data IB (the fourth operand); in other words, the dimension information of input data IB determines how the convolution operation views or processes input data IB.
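The two reshape steps S340 and S350 are pure shape bookkeeping and can be sketched as follows; this is a hypothetical illustration of the mapping only, and the function names are assumptions:

```python
def kernel_dim_info(row_R, col_R):
    # Step S340: right matrix of row_R x col_R
    # maps to (w, h, n, c) = (1, 1, col_R, row_R).
    return (1, 1, col_R, row_R)

def input_dim_info(row_L, col_L):
    # Step S350: left matrix of row_L x col_L
    # maps to (w, h, n, c) = (row_L, 1, 1, col_L).
    return (row_L, 1, 1, col_L)
```

Applied to the Figure 1 shapes, these reproduce the (1,1,5,1) and (5,1,1,1) dimension information quoted in the two steps above.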
The result of the convolution operation depends on the dimension information of the input data. When the convolution operation instruction contains the dimension information arranged as above, the result of the convolution operation is, as expected, equal or substantially equal to the result of the matrix multiplication.

Step S360: The processor 410 sets a bias parameter and a scale parameter. More specifically, the processor 410 sets the bias parameter to 0 and the scale parameter to 1.

Step S370: The processor 410 generates the convolution operator according to the dimension information of input data KB (the third operand), the dimension information of input data IB (the fourth operand), the bias parameter, and the scale parameter. The convolution operator replaces the matrix multiplication operation above. More specifically, input data IB and input data KB are the operands of the convolution operator, and the result OB of the convolution operation is equal or substantially equal to the result of the matrix multiplication operation (the matrix product MO).

Step S380: The processor 410 stores the convolution operator in the storage device 430. In practice, the convolution operator may be stored in the storage device 430 in the form of a convolution operation instruction.

Step S390: The processor 410 determines the dimension information of the result OB of the convolution operation according to the dimension information of the result of the matrix multiplication operation (i.e., the matrix product MO). This step is equal or equivalent to a reshape operation of the convolution operation, so that subsequent instructions or operations treat the result OB of the convolution operation as a matrix (2-dimensional data).

In some embodiments, after step S390 ends, the processor 410 returns to step S305. That is, the processor 410 scans the storage device 430 to inspect every instruction.

Note that because the convolution operation satisfies the commutative property (i.e., IB*KB = KB*IB), in other embodiments input data KB may correspond to matrix ML while input data IB corresponds to matrix MR (i.e., input data IB is the data of matrix MR after data rearrangement).

Refer to Figure 6, a flowchart of another embodiment of the operation conversion method for a neural network of the present invention. Figure 6 is similar to Figure 3, except that in the embodiment of Figure 6 the processor 410 determines, before step S330, whether the data of matrix MR has already been rearranged (step S610). More specifically, in an overall computation that contains the matrix multiplication operation (i.e., ML.MR = MO), if the data of matrix MR has already been rearranged before the matrix multiplication operation, step S330 can be omitted.
Refer to Figure 7, a functional block diagram of an embodiment of the electronic device of the present invention. The electronic device 700 can perform the convolution operation discussed above. The electronic device 700 includes a chip 701 and a memory 702. The memory 702 is a kind of storage device (it may be a volatile memory, e.g., a DRAM). The chip 701 may be a chip with dedicated functions (e.g., an image processing chip) and includes a processor 710 and an intelligence processing unit (IPU) 720. The processor 710 may be a circuit or electronic component capable of executing programs, such as a central processing unit, a microprocessor, a micro processing unit, a digital signal processor, an Application Specific Integrated Circuit (ASIC), or an equivalent circuit. In some cases, the processor 710 cooperates with the intelligence processing unit 720 to realize the functions of the chip 701; more specifically, the processor 710 sends instructions (e.g., instructions related to convolution operations or vector operations) to the intelligence processing unit 720, and the intelligence processing unit 720 executes those instructions.

The intelligence processing unit 720 includes a Direct Memory Access (DMA) circuit 722, a cache memory 724 (a cache, a kind of storage device), and a computing circuit 726. The computing circuit 726 includes a convolution core 727 and a vector core 728. The convolution core 727 performs convolution operations, and the vector core 728 performs vector operations.
Refer to Figure 8, a flowchart of an embodiment of the method of performing a matrix multiplication operation based on a convolution operation of the present invention. The flow of Figure 8 may be carried out by the electronic device 700 of Figure 7 (more specifically, the chip 701) and includes the following steps. Here it is assumed that the convolution operation below corresponds to the matrix multiplication ML.MR (refer to Figure 1). Note that, unlike the convolution operation, the matrix multiplication operation does not satisfy the commutative property (i.e., ML.MR ≠ MR.ML).

Step S810: The processor 710 determines whether there is a data rearrangement instruction. If there is (i.e., step S330 of Figure 3 or Figure 6 was executed by the operation conversion device 400), the processor 710 controls the intelligence processing unit 720 to perform step S820; otherwise, the processor 710 controls the intelligence processing unit 720 to perform steps S830 through S860. The absence of a data rearrangement instruction indicates that the data of the matrix multiplication operation has already been rearranged in advance. In a specific embodiment, when the data of the matrix multiplication operation, such as matrix MR, is a constant value, that data is rearranged in advance, and the rearranged data is stored in the storage device.

Step S820: The direct memory access circuit 722 reads first original data (e.g., matrix MR) from the first storage device (e.g., the memory 702), performs the data rearrangement operation on the first original data to convert it into first data (e.g., input data KB, i.e., the weight data of the subsequent convolution operation), and then stores the first data in the first storage device. More specifically, the direct memory access circuit 722 performs the data rearrangement operation on the first original data by writing it into and reading it out of the cache memory 724. The data rearrangement operation is equivalent to the transpose operation in matrix arithmetic. The direct memory access circuit 722 then writes the first data back to the first storage device.

Step S830: The direct memory access circuit 722 reads the first data (e.g., input data KB) from the first storage device and stores it in the second storage device (e.g., the cache memory 724). Note that when there is no data rearrangement instruction (i.e., the result of step S810 is negative), the layout of the first original data is already the format of the input data of the convolution operation (e.g., the format of input data KB). Note that step S830 does not rearrange the first data.

Step S840: The direct memory access circuit 722 reads the second data (e.g., input data IB, substantially identical to matrix ML) from the first storage device and stores it in the second storage device. Note that step S840 does not rearrange the second data.

Step S850: The computing circuit 726 (more specifically, the convolution core 727) performs the convolution operation on the first data and the second data to obtain a result (e.g., the result OB of the convolution operation) and stores the result in the second storage device. In some embodiments, the bias parameter and the scale parameter of the convolution operation are 0 and 1, respectively. In some embodiments, the bias parameter and the scale parameter are stored in the memory 702 in advance, and the intelligence processing unit 720 reads these parameters from the memory 702 and configures them accordingly.

Step S860: The direct memory access circuit 722 stores the result in the first storage device for subsequent operations of the chip 701.

As discussed above, because the instructions executed by the chip 701 have been processed in advance, the result of the convolution operation performed by the chip 701 is identical to the result of the matrix multiplication (e.g., referring to Figures 1 and 2, the result OB of the convolution operation is equal or substantially equal to the matrix product MO).
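The whole flow described above, rearranging the right matrix, running a pointwise convolution with bias 0 and scale 1, and reading the result as a matrix, can be checked end to end in software, including a case where neither operand is a vector. This is a self-contained sketch with made-up values; it models the arithmetic only, not the DMA or cache behavior of the chip:

```python
def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def rearrange(M):                        # the data-rearrangement (transpose) step
    return [list(col) for col in zip(*M)]

def conv_1x1(IB, KB, bias=0, scale=1):   # pointwise convolution: w x c input, n kernels
    return [[scale * sum(iv * kv for iv, kv in zip(row, kernel)) + bias
             for kernel in KB] for row in IB]

ML = [[1, 2, 3], [4, 5, 6]]                        # row_L x col_L = 2 x 3
MR = [[1, 0, 2, 1], [0, 1, 1, 2], [1, 1, 0, 0]]    # row_R x col_R = 3 x 4
IB = ML                  # (w, h, n, c) = (2, 1, 1, 3)
KB = rearrange(MR)       # (w, h, n, c) = (1, 1, 4, 3): col_R kernels of row_R channels
OB = conv_1x1(IB, KB)    # equals ML . MR entry by entry
```

Reading OB as a row_L × col_R matrix (the step S390 reshape) gives exactly the matrix product ML.MR.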
In summary, even if an electronic device does not implement a matrix multiplication acceleration circuit, the present invention can still be used to accelerate matrix multiplication operations, thereby improving performance and the user experience.

Although the embodiments of the present invention are described above, these embodiments are not intended to limit the present invention. Those of ordinary skill in the art may apply variations to the technical features of the present invention according to its explicit or implicit contents, and all such variations may fall within the scope of patent protection sought by the present invention. In other words, the scope of patent protection of the present invention is defined by the claims of this specification.
S305, S310, S320, S330, S340, S350, S360, S370, S380, S390: steps
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112114445A TWI898206B (en) | 2023-04-18 | 2023-04-18 | Operation conversion method for neural network, method of performing matrix multiplication operation based on convolution operation, and intelligence processing unit |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202443440A (en) | 2024-11-01 |
| TWI898206B (en) | 2025-09-21 |
Family
ID=94377762
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112114445A TWI898206B (en) | 2023-04-18 | 2023-04-18 | Operation conversion method for neural network, method of performing matrix multiplication operation based on convolution operation, and intelligence processing unit |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI898206B (en) |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109214511A (en) * | 2018-08-15 | 2019-01-15 | 算丰科技(北京)有限公司 | Data processing method, data processing equipment and electronic equipment |
| US20200285892A1 (en) * | 2017-04-04 | 2020-09-10 | Hailo Technologies Ltd. | Structured Weight Based Sparsity In An Artificial Neural Network |
| TW202119292A (en) * | 2019-11-07 | 2021-05-16 | 財團法人工業技術研究院 | Dynamic multi-mode cnn accelerator and operating methods |
| TW202139077A (en) * | 2020-04-13 | 2021-10-16 | 日商利普麥德股份有限公司 | Neural network circuit, edge device and neural network operation process |
| US20220179658A1 (en) * | 2019-08-29 | 2022-06-09 | Arm Limited | Refactoring Mac Operations |
| CN114968182A (en) * | 2022-06-29 | 2022-08-30 | 北京知存科技有限公司 | Operator splitting method, control method and device for storage and computation integrated chip |
| CN115131467A (en) * | 2021-03-26 | 2022-09-30 | 北京小米移动软件有限公司 | Image processing method and device, electronic device and storage medium |
| US20220405556A1 (en) * | 2021-06-17 | 2022-12-22 | International Business Machines Corporation | Single function to perform combined matrix multiplication and bias add operations |
- 2023-04-18: TW application TW112114445A filed; patent TWI898206B, status active
Also Published As
| Publication number | Publication date |
|---|---|
| TW202443440A (en) | 2024-11-01 |
Similar Documents
| Publication | Title |
|---|---|
| US20250278605A1 | Buffer Addressing for a Convolutional Neural Network |
| US12265915B2 | Composable neural network kernels |
| CN109919311B | Method for generating instruction sequence, method and device for executing neural network operation |
| US11468301B2 | Method and apparatus for performing operation of convolutional layer in convolutional neural network |
| CN111247527B | Method and apparatus for determining feature images in a convolutional neural network model |
| JP7353475B2 | Methods, devices, media and equipment for computers to realize calculation of tensor data |
| CN108573305B | A data processing method, equipment and device |
| JP2022538759A | Configurable neural network kernel |
| CN109313663B | Artificial intelligence calculation auxiliary processing device, method, storage medium and terminal |
| US20180181406A1 | Arithmetic processing device and control method of the arithmetic processing device |
| CN118643253B | Data processing method, device, equipment and storage medium |
| TWI898206B | Operation conversion method for neural network, method of performing matrix multiplication operation based on convolution operation, and intelligence processing unit |
| CN114880985A | Parasitic capacitance extraction method and device based on Gaussian surface uniform sampling |
| WO2019136751A1 | Artificial intelligence parallel processing method and apparatus, computer readable storage medium, and terminal |
| CN111953999B | Inverse transformation method and device |
| JP5979966B2 | Circuit design support apparatus, circuit design support method, and program |
| US20230385609A1 | Intelligence processing unit and 3-dimensional pooling operation |
| US20240346109A1 | Operation conversion method for neural network, method of performing matrix multiplication operation based on convolution operation, and intelligence processing unit |
| CN111831328A | Data processing method and device |
| CN107590240B | Method and device for adjusting size of characters for page rendering |
| CN114741405A | Excel data table exporting method, Excel data table exporting device, Excel data table exporting equipment and storage medium |
| CN118761899A | A tensor folding method, device, storage medium and program product |
| WO2022221982A1 | Image reconstruction method and apparatus, terminal device, and storage medium |
| JP4048752B2 | Pattern data correction method, pattern data correction apparatus and program thereof |
| US11830162B2 | Image processing apparatus configured to perform edge preserving smoothing and image processing method thereof |