JP2020060967A

JP2020060967A - Neural network processing device, and neural network processing method

Info

Publication number: JP2020060967A
Application number: JP2018192019A
Authority: JP
Inventors: Takato YAMADA; Tomas Nevado Vilchez Antonio
Original assignee: Leap Mind Inc
Current assignee: Leap Mind Inc
Priority date: 2018-10-10
Filing date: 2018-10-10
Publication date: 2020-04-16
Anticipated expiration: 2038-10-10
Also published as: JP7040771B2

Abstract

PROBLEM TO BE SOLVED: To provide a neural network processing device and a neural network processing method capable of suppressing a decrease in the processing speed of a neural network even when embedded hardware is used.
SOLUTION: A CNN processing device 1 includes an input buffer 10 that stores an input signal A given to a CNN, a weight buffer 11 that stores a weight U of the CNN, and a processor. The processor performs a first process and a second process. The first process is a convolution operation of the CNN that includes a sum-of-products operation of the input signal A and the weight U. The second process is quantization that reduces the bit precision of at least part of the data used in the first process.
[Selection diagram] Fig. 1

Description

The present invention relates to a neural network processing device and a neural network processing method.

In recent years, the convolutional neural network (CNN) has attracted attention as a deep neural network for classifying images into a plurality of categories. A CNN is a deep neural network characterized by having convolutional layers. A convolutional layer applies a filter to the input data. More specifically, the filter window is slid over the input with a fixed stride. At each position, each filter element is multiplied by the corresponding element of the input data, and the products are summed. This is called a sum-of-products operation.
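The following pure-Python function is a minimal sketch of the sliding-window sum-of-products just described. It computes a "valid" 2-D convolution (no padding) with a configurable stride. The function name `conv2d`, the list-of-lists representation, and the absence of padding are illustrative assumptions, not details taken from this publication. Like most CNN frameworks, the function does not flip the kernel.

```python
def conv2d(inp, kernel, stride=1):
    """Valid (no padding) 2-D convolution as used in CNNs.

    The kernel is not flipped (cross-correlation), which is the usual CNN convention.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(inp), len(inp[0])
    out = []
    # Slide the filter window over the input with the given stride.
    for r in range(0, ih - kh + 1, stride):
        row = []
        for c in range(0, iw - kw + 1, stride):
            # Multiply each filter element by the input element under it, then sum.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += inp[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 1]]
print(conv2d(x, k))            # [[6, 8], [12, 14]]
print(conv2d(x, k, stride=2))  # [[6]]
```

With this diagonal 2×2 kernel, each output element is the sum of the two diagonal input values under the window.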

FIG. 11 is a diagram showing the signal processing flow of a typical CNN. A CNN has an input layer, an intermediate layer, and an output layer (see, for example, Non-Patent Document 1). In the intermediate layer, a convolution operation multiplies the input signal by weights. As shown in FIG. 11, an activation function such as ReLU is applied to the result of the convolution operation as needed, and normalization and pooling may also be performed. The features of the input signal extracted through the convolution operation are then fed to a classifier made of fully connected layers, and the classification result is output from the output layer. One characteristic of neural networks such as CNNs is therefore that sum-of-products operations are performed repeatedly.

The input values and weights used in a CNN may contain fractional parts. The sum-of-products operations of a conventional neural network such as a CNN preserve the full number of digits of the operation results. This is shown by the values of the "input signal", "weight", and "convolution operation" in FIG. 11.

Non-Patent Document 1: Hideki Aso et al., "Deep Learning", Kindai Kagaku Sha, November 2015.

However, a conventional neural network such as a CNN may be implemented on embedded hardware such as an FPGA (Field Programmable Gate Array) or a microcontroller. In that case, the sum-of-products operations on values with many digits reduce processing speed, and this has been a problem.

The present invention has been made to solve the above problem. Its object is to provide a neural network processing device and a neural network processing method that can suppress a decrease in the processing speed of a neural network even when embedded hardware is used.

To solve the above problem, a neural network processing device according to the present invention includes a first memory, a second memory, and a processor. The first memory stores an input signal given to a neural network. The second memory stores the weights of the neural network. The processor performs a first process and a second process. The first process is a convolution operation of the neural network that includes a sum-of-products operation of the input signal and the weights. The second process is quantization that reduces the bit precision of at least part of the data used in the first process.

In the neural network processing device according to the present invention, the second process may include quantization that reduces the bit precision of at least one of the following: the input signal, the weights, and the result of the sum-of-products operation in the first process.

In the neural network processing device according to the present invention, the second memory may store quantized weights obtained by quantizing the weights in advance.

In the neural network processing device according to the present invention, the processor may include a quantization convolution operation unit. Based on the weights and the input signal, this unit performs the first process with the second process incorporated into it.

The neural network processing device according to the present invention may further include a third memory and a fourth memory. The third memory stores a first function for quantizing the weights. The fourth memory stores a second function that implements the first process with the second process incorporated. The processor may read the first function to quantize the weights, and read the second function to perform the first process with the second process incorporated.

In the neural network processing device according to the present invention, the neural network may be a multilayer neural network having at least one intermediate layer.

To solve the above problem, a neural network processing method according to the present invention includes three steps. The first step stores an input signal given to a neural network in a first memory. The second step stores the weights of the neural network in a second memory. In the third step, a processor performs a first process and a second process. The first process is a convolution operation of the neural network that includes a sum-of-products operation of the input signal and the weights. The second process is quantization that reduces the bit precision of at least part of the data used in the first process.

According to the present invention, quantization reduces the bit precision of at least part of the data used in the first process. The first process is the convolution operation, which includes the sum-of-products operation of the input signal given to the neural network and the weights. As a result, a decrease in the processing speed of the neural network can be suppressed even when embedded hardware is used.

FIG. 1 is a block diagram illustrating the functions of the CNN processing device according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the CNN processing device according to the first embodiment.
FIG. 3 is a diagram for explaining the flow of the CNN processing method according to the first embodiment.
FIG. 4 is a diagram for explaining the CNN processing method according to the first embodiment.
FIG. 5 is a block diagram illustrating the functions of the CNN processing device according to the second embodiment.
FIG. 6 is a diagram for explaining the flow of the CNN processing method according to the second embodiment.
FIG. 7 is a diagram for explaining the CNN processing method according to the second embodiment.
FIG. 8 is a block diagram illustrating the functions of the CNN processing device according to the third embodiment.
FIG. 9 is a diagram for explaining the flow of the CNN processing method according to the third embodiment.
FIG. 10 is a diagram for explaining the CNN processing method according to the third embodiment.
FIG. 11 is a diagram for explaining the arithmetic processing of a conventional CNN.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to FIGS. 1 to 10. The description uses a CNN as an example of the "neural network".
[First Embodiment]
First, the configuration of the CNN processing device (neural network processing device) 1 according to the first embodiment of the present invention will be described. FIG. 1 is a block diagram showing the functional configuration of the CNN processing device 1.

The CNN processing device 1 according to this embodiment is an arithmetic processing device. It performs a sum-of-products operation of an input signal given to the CNN and the weights of the CNN, and outputs the operation result. This processing includes the sum-of-products operations of the convolutional layers that make up the intermediate layer of the CNN (hereinafter sometimes called the "convolution operation"). The result of the convolution operation is passed through an activation function such as ReLU to obtain the output of one convolutional layer. For simplicity, the following description assumes that the result of a convolutional layer's sum-of-products operation (that is, the convolution result) is used directly as the input signal of the next convolutional layer. The CNN processing device 1 performs the sum-of-products operation of the input signal and the weights repeatedly, once for each convolutional layer in the preset CNN model.
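The paragraph above assumes that each layer's convolution result is fed directly to the next layer. Under that simplification, the repeated sum-of-products can be sketched as a loop that runs once per convolutional layer. The names `conv2d` and `run_conv_layers` are hypothetical, and so is the single-channel, stride-1 setting.

```python
def conv2d(x, k):
    """Single-channel valid convolution with stride 1."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def run_conv_layers(x, kernels):
    """Apply one convolution per layer; each result is the next layer's input."""
    for k in kernels:      # one sum-of-products pass per convolutional layer
        x = conv2d(x, k)
    return x


ones = [[1] * 4 for _ in range(4)]
box = [[1, 1], [1, 1]]
print(run_conv_layers(ones, [box, box]))  # [[16, 16], [16, 16]]
```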

[Functional Blocks of the CNN Processing Device]
The CNN processing device 1 includes an input buffer (first memory) 10, a weight buffer (second memory) 11, a convolution operation unit 12, an operation result buffer 13, a quantization processing unit 14, an output buffer 15, and a storage unit 16.

The input buffer 10 stores the input signal given to the CNN. More specifically, the input buffer 10 stores an input signal such as image data supplied from outside, and outputs it to the quantization processing unit 14. The input signal may be image data that has already been preprocessed, for example by monochrome conversion, contrast adjustment, or brightness adjustment. The input signal may also be reduced to a bit depth set according to the CNN model preset in the CNN processing device 1. The values of the input signal contain fractional parts and are expressed, for example, as arrays of 32-bit or 16-bit floating-point numbers.
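The preprocessing and bit-depth reduction mentioned above might look like the following sketch. The publication specifies neither method. Using the ITU-R BT.601 luma coefficients for monochrome conversion is an assumption, and so is reducing bit depth by keeping only the most significant bits of each pixel.

```python
def to_monochrome(rgb_image):
    """Convert rows of 8-bit (R, G, B) pixels to 8-bit gray levels.

    Assumption: ITU-R BT.601 luma weights 0.299, 0.587, 0.114.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def reduce_bit_depth(gray_image, bits, src_bits=8):
    """Keep only the top `bits` bits of each `src_bits`-bit pixel (assumed method)."""
    shift = src_bits - bits
    return [[p >> shift for p in row] for row in gray_image]


gray = to_monochrome([[(255, 255, 255), (0, 0, 0)]])
print(gray)                           # [[255, 0]]
print(reduce_bit_depth(gray, bits=2)) # [[3, 0]]
```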

The weight buffer 11 stores the weights of the CNN. More specifically, CNN weight parameters are loaded into the weight buffer 11 from where they were stored in advance: a server (not shown) outside the CNN processing device 1, or the storage unit 16. In this embodiment, the weight values contain fractional parts and are expressed as arrays of 32-bit or 16-bit floating-point numbers. The weight buffer 11 outputs the buffered weights to the quantization processing unit 14 described later.

The convolution operation unit 12 performs the CNN convolution operation (first process). This operation includes the sum-of-products operation of the input signal stored in the input buffer 10 and the weights stored in the weight buffer 11. More specifically, the convolution operation unit 12 performs the sum-of-products operations of the convolutional layers in the CNN model preset in the CNN processing device 1. In this embodiment, the convolution operation unit 12 works on input signal and weight values that have been quantized by the quantization processing unit 14 described later. The operation result output by the convolution operation unit 12 is supplied to the operation result buffer 13.

The operation result buffer 13 buffers the result of the convolution operation performed by the convolution operation unit 12 and supplies it to the quantization processing unit 14.

The quantization processing unit 14 performs quantization (second process) that reduces the bit precision of at least part of the data used in the convolution operation (first process). More specifically, the quantization processing unit 14 quantizes at least part of the following data: the input signal, the weights, and the result of the convolution operation. In this embodiment, it quantizes all of these data.

Specifically, the quantization processing unit 14 quantizes three sets of data: the input signal read from the input buffer 10, the weights read from the weight buffer 11, and the convolution result read from the operation result buffer 13. It then overwrites the contents of the input buffer 10, the weight buffer 11, and the operation result buffer 13 with the quantized input signal, weights, and operation result, respectively.
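The following is a minimal sketch of the quantization processing unit 14 overwriting buffer contents with their quantized values. The buffers are modeled as hypothetical Python lists, and the sample values are invented. The rounding method is assumed to be rounding half up (halves away from zero).

```python
import math


def round_half_up(x):
    """Round to the nearest integer; halves go away from zero (2.5 -> 3, -1.5 -> -2)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_in_place(buffer):
    """Replace every element of a 2-D buffer with its quantized value."""
    for row in buffer:
        for j, value in enumerate(row):
            row[j] = round_half_up(value)


# Hypothetical contents of input buffer 10 and weight buffer 11.
input_buffer = [[0.4, 1.6], [2.5, 3.2]]
weight_buffer = [[0.9, -1.5], [0.2, 1.1]]
for buf in (input_buffer, weight_buffer):
    quantize_in_place(buf)
print(input_buffer, weight_buffer)  # [[0, 2], [3, 3]] [[1, -2], [0, 1]]
```

Updating the buffers in place means the convolution step can read its operands from the same locations as before, now holding integers.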

The quantization performed by the quantization processing unit 14 includes well-known rounding methods such as rounding half up, rounding up, rounding down, and rounding to nearest. It restricts the values of the input signal, the weights, and the convolution result, for example by converting values with fractional parts into integers. Faster processing through quantization comes at the cost of accuracy in the processing result. Both can be balanced by reducing each value to a bit width that the processor 102 of the CNN processing device 1 can handle at once, such as 8 or 16 bits. On an FPGA, for example, each value may be reduced to 1 bit. The CNN handled by the CNN processing device 1 consists of multiple cascaded convolutional layers, and the quantization processing unit 14 may be applied to only some of them.
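The following sketch shows the rounding methods listed above together with reduction to a fixed bit width. The publication does not fix the number format, so three choices here are assumptions: "rounding to nearest" is mapped to Python's built-in `round()` (ties to even), values are clamped to a signed n-bit range, and the 1-bit case binarizes to -1/+1.

```python
import math

ROUNDING = {
    "half_up": lambda x: int(math.copysign(math.floor(abs(x) + 0.5), x)),  # halves away from zero
    "up": math.ceil,     # toward +infinity
    "down": math.floor,  # toward -infinity
    "nearest": round,    # built-in round(): ties go to the even integer
}


def quantize(x, bits, mode="half_up"):
    """Round x with the chosen method, then clamp to the signed `bits`-bit range."""
    q = ROUNDING[mode](x)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))


def binarize(x):
    """Hypothetical 1-bit case: keep only the sign, encoded as +1 or -1."""
    return 1 if x >= 0 else -1


print(quantize(2.5, 8), quantize(2.5, 8, "nearest"))  # 3 2
print(quantize(300.4, 8))                             # 127
```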

The output buffer 15 temporarily stores the convolution result quantized by the quantization processing unit 14.

The storage unit 16 stores the quantized convolution result held temporarily in the output buffer 15. In this embodiment, this result serves both as the output value of a CNN convolutional layer and as the input value of the next convolutional layer. The storage unit 16 also stores, in advance, information indicating which rounding method the quantization processing unit 14 uses to quantize the input signal, the weights, and the convolution result.

[Hardware Configuration of the CNN Processing Device]
Next, an example of the hardware configuration of the CNN processing device 1 having the above functions will be described with reference to the block diagram of FIG. 2.

As shown in FIG. 2, the CNN processing device 1 can be implemented by a computer and a program that controls its hardware resources. The computer includes, for example, a processor 102, a main storage device 103, a communication interface 104, an auxiliary storage device 105, and an input/output device 106, all connected via a bus 101.

The main storage device 103 stores in advance the programs that the processor 102 uses for various control and arithmetic operations. Together, the processor 102 and the main storage device 103 implement each function of the CNN processing device 1, including the convolution operation unit 12 and the quantization processing unit 14 shown in FIG. 1.

The main storage device 103 implements the input buffer 10, the weight buffer 11, the operation result buffer 13, and the output buffer 15 described with reference to FIG. 1.

The communication interface 104 is an interface circuit for communicating with various external electronic devices via the communication network NW. The CNN processing device 1 may receive input signals such as image data, as well as the weights, from an external server or the like via the communication interface 104.

The auxiliary storage device 105 consists of a readable/writable storage medium and a drive device for reading and writing programs, data, and other information on that medium. The storage medium may be a hard disk or a semiconductor memory such as flash memory.

The auxiliary storage device 105 has a storage area and a program storage area. The storage area holds input data and weights obtained from outside. The program storage area holds the programs the CNN processing device 1 uses to perform CNN arithmetic processing such as the convolution operation. The auxiliary storage device 105 implements the storage unit 16 described with reference to FIG. 1. It may also have, for example, a backup area for backing up the above data and programs.

The input/output device 106 consists of I/O terminals that receive signals from and send signals to external devices. A display device (not shown) or the like may be connected through the input/output device 106 to display the operation results output by the CNN processing device 1.

The program stored in the program storage area of the auxiliary storage device 105 may process steps sequentially in the order of the CNN processing method described in this specification. It may instead process them in parallel, or at required timings such as when it is called. The program may run on a single computer or be distributed across multiple computers.

[CNN Processing Method]
Next, the operation of the CNN processing device 1 configured as above will be described with reference to FIGS. 3 and 4. First, the input buffer 10 temporarily stores the input signal A and the weight buffer 11 temporarily stores the weight U (steps S1 and S3). Both are supplied from a server or the like outside the CNN processing device 1.

As shown in FIG. 4, the input signal A is stored in the input buffer 10. The input signal A is vectorized input image data with vertical and horizontal dimensions, and its values contain fractional parts. The bold rectangle in FIG. 4 represents a filter. The weight buffer 11 stores the weight U. The weight U consists of the elements of a kernel represented as a matrix. It is a parameter that is adjusted and updated through CNN training until its final values are determined. The weight U also has vertical and horizontal dimensions, and each element is a value with a fractional part.

Next, the quantization processing unit 14 reads the input signal A from the input buffer 10 and quantizes the value of each element (step S2). It also reads the weight U from the weight buffer 11 and quantizes the value of each element (step S4).

More specifically, as shown in FIG. 4, the quantization processing unit 14 rounds the fractional values of the input signal A to integers, for example by rounding half up. It then updates the input signal A in the input buffer 10 with the values of the quantized input signal A'.

Likewise, as shown in FIG. 4, the quantization processing unit 14 rounds the fractional values of the weight U to integers, for example by rounding half up. It then updates the weight U in the weight buffer 11 with the values of the quantized weight U'.

Returning to FIG. 3, the convolution operation unit 12 reads the quantized input signal A' from the input buffer 10 and the quantized weight U' from the weight buffer 11, and performs the convolution operation (step S5). More specifically, it multiplies the vector of the quantized input signal A' by the matrix of the quantized weight U'.

Specifically, as shown in FIG. 4, the window of the CNN filter is slid with a predetermined stride. In the example of FIG. 4, the window is the 2×2 bold rectangle. At each filter position, the convolution operation unit 12 multiplies the elements of the quantized weight U' by the corresponding elements of the quantized input signal A' and sums the products.

The convolution operation unit 12 stores the operation result B of this sum-of-products convolution at the corresponding location in the operation result buffer 13 (step S6). As shown in FIG. 4, the stored sum-of-products result is "37", which has no fractional part.

The quantization processing unit 14 then reads the convolution result obtained in step S5 from the operation result buffer 13 and quantizes it (step S7). As shown in FIG. 4, the convolution result "37" is quantized to "40", for example by rounding half up to the nearest ten. The quantization processing unit 14 updates the data in the operation result buffer 13 with the quantized operation result B'.
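Quantizing "37" to "40" corresponds to rounding half up to a multiple of ten. The sketch below implements that rounding. The step size of 10 is inferred from this single example, and the function name `quantize_to_step` is hypothetical.

```python
import math


def quantize_to_step(x, step=10):
    """Round x half up (halves away from zero) to the nearest multiple of `step`."""
    return int(math.copysign(math.floor(abs(x) / step + 0.5), x)) * step


print(quantize_to_step(37))  # 40
print(quantize_to_step(34))  # 30
```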

Next, the processor 102 of the CNN processing device 1 reads the quantized convolution result B' from the operation result buffer 13 and applies an activation function such as ReLU (step S8). With ReLU, the processor 102 outputs 0 when the operation result B' is negative and passes a positive operation result B' through unchanged.
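Applied element-wise to a quantized feature map, the ReLU in step S8 reduces to the short function below. The function name and the list-of-lists representation are illustrative.

```python
def relu(feature_map):
    """Element-wise ReLU: negative values become 0, other values pass through."""
    return [[v if v > 0 else 0 for v in row] for row in feature_map]


print(relu([[40, -20], [0, 10]]))  # [[40, 0], [0, 10]]
```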

Next, the processor 102 applies a well-known pooling process to the values output in step S8, compressing the convolution result (step S9). The pooling in step S9 is optional. The processor 102 may also normalize the output of the activation function (ReLU) obtained in step S8 (see Non-Patent Document 1).
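The pooling method is not specified in the text. The sketch below assumes 2×2 max pooling with stride 2, a common choice, to show how step S9 compresses the feature map.

```python
def max_pool(fm, size=2, stride=2):
    """Max pooling (assumed variant): keep the largest value in each window."""
    h, w = len(fm), len(fm[0])
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, w - size + 1, stride)]
            for r in range(0, h - size + 1, stride)]


fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(max_pool(fm))  # [[6, 8], [14, 16]]
```

A 4×4 map becomes 2×2, so each pooling stage cuts the data to a quarter of its size.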

The pooled, quantized operation result B' is stored in the output buffer 15, then read out and output by the processor 102 (step S10). This output is the output of the CNN's feature extraction part. It is fed to the fully connected layers of a subsequent classifier (not shown), which classifies the image data of the input signal A.
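The sketch below combines steps S2 through S9 for a single layer of the first embodiment. It makes several assumptions: rounding half up for the input and weights; rounding the convolution result to a multiple of ten, as in the "37" to "40" example; a single channel with stride 1; and 2×2 max pooling. The sample values are invented and do not come from FIG. 4.

```python
import math


def round_half_up(x, step=1):
    """Round half up (halves away from zero) to a multiple of `step`."""
    return int(math.copysign(math.floor(abs(x) / step + 0.5), x)) * step


def conv2d(x, k):
    kh, kw = len(k), len(k[0])
    return [[sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def relu(fm):
    return [[max(v, 0) for v in row] for row in fm]


def max_pool(fm, size=2):
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]


def cnn_layer(a, u, result_step=10, pool=True):
    a_q = [[round_half_up(v) for v in row] for row in a]                # S2: quantize A
    u_q = [[round_half_up(v) for v in row] for row in u]                # S4: quantize U
    b = conv2d(a_q, u_q)                                                # S5, S6: convolve
    b_q = [[round_half_up(v, result_step) for v in row] for row in b]   # S7: quantize B
    out = relu(b_q)                                                     # S8: ReLU
    return max_pool(out) if pool else out                               # S9: optional pooling


a = [[1.2, 2.7, 0.4], [3.5, -0.6, 1.9], [2.2, 0.8, 4.4]]  # invented input
u = [[1.4, -0.7], [2.6, -2.2]]                            # invented weights
print(cnn_layer(a, u, pool=False))  # [[10, 0], [10, 0]]
print(cnn_layer(a, u))              # [[10]]
```

Here the raw integer convolution gives [[12, -4], [9, -8]]. Step S7 turns this into [[10, 0], [10, -10]], and ReLU then clears the negative entry.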

As described above, the CNN processing device 1 according to the first embodiment quantizes the CNN input signal, the weights, and the convolution result. This suppresses a decrease in CNN processing speed even when embedded hardware is used.

In particular, because the CNN processing device 1 also quantizes the convolution result, it reduces the computational load of the entire multilayer CNN and speeds up signal processing.

For simplicity, the embodiment described above omits bias addition from the sum-of-products operation. In practice, a bias is added after the input signal vector is multiplied by the weight matrix, and that bias value may be quantized in the same way.
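For a single output element, quantizing the bias along with the input signal and the weights could look like the sketch below. Rounding half up is assumed, and `dot_plus_bias` is a hypothetical name.

```python
import math


def round_half_up(x):
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def dot_plus_bias(inputs, weights, bias, quantize=True):
    """One output element: sum of input*weight products plus a bias."""
    if quantize:
        inputs = [round_half_up(v) for v in inputs]
        weights = [round_half_up(v) for v in weights]
        bias = round_half_up(bias)  # the bias is quantized the same way
    return sum(x * w for x, w in zip(inputs, weights)) + bias


print(dot_plus_bias([1.2, 2.6], [0.9, -1.4], 0.7))  # -1  (1*1 + 3*(-1) + 1)
```

With `quantize=False`, the same call returns roughly -1.86. This shows how far the integer result can drift from the full-precision one.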

The embodiment above also describes the case in which the quantization processing unit 14 quantizes all of the data: the input signal, the weights, and the result of the convolution operation. However, the quantization processing unit 14 may instead be configured to quantize the data of only one or two of these.
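The variant in which only one or two of the three kinds of data are quantized can be sketched as a convolution with independent switches. This hypothetical Python example uses a one-dimensional convolution. Its quantizer takes the floor of each value, the same rounding rule as the operation C = floor(conv(A, U')) described for the second embodiment. To keep the code short, quantized values are kept as Python floats with a zero fractional part.

```python
import math
from typing import List

Vector = List[float]


def floor_quantize(values: Vector) -> Vector:
    """Hypothetical quantizer: discard the fractional part (round toward -inf)."""
    return [float(math.floor(v)) for v in values]


def conv1d(a: Vector, u: Vector) -> Vector:
    """Valid 1-D product-sum of the filter u slid over a with stride 1."""
    k = len(u)
    return [sum(a[i + j] * u[j] for j in range(k)) for i in range(len(a) - k + 1)]


def run(a: Vector, u: Vector, quant_input: bool = True,
        quant_weight: bool = True, quant_result: bool = True) -> Vector:
    """Quantize only the data selected by the three switches."""
    if quant_input:
        a = floor_quantize(a)
    if quant_weight:
        u = floor_quantize(u)
    b = conv1d(a, u)
    return floor_quantize(b) if quant_result else b


a = [1.5, 2.25, -0.5, 3.75]
u = [2.5, -1.0]
print(run(a, u))                                         # [0.0, 5.0, -5.0]
print(run(a, u, quant_input=False, quant_result=False))  # [0.75, 5.0, -4.75]
```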

[Second Embodiment]
Next, a second embodiment of the present invention will be described. In the following description, components identical to those of the first embodiment described above are denoted by the same reference numerals, and their description is omitted.

The first embodiment describes the case in which the quantization processing unit 14 separately quantizes the input signal A, the weight U, and the result B of the convolution operation in the CNN. In contrast, the CNN processing device 1A according to the second embodiment uses a weight U' that has been quantized in advance. The CNN processing device 1A also includes a quantization convolution operation unit 12A, which performs a convolution operation with the quantization processing incorporated.

[Functional Blocks of the CNN Processing Device]
FIG. 5 is a block diagram showing the functional configuration of the CNN processing device 1A according to the second embodiment. The CNN processing device 1A includes an input buffer 10, a quantization weight buffer (second memory) 11A, a quantization convolution operation unit 12A, an operation result buffer 13, an output buffer 15, and a storage unit 16. The following description focuses on the components that differ from those of the first embodiment.

The quantization weight buffer 11A stores weights that have been quantized in advance (quantized weights). For example, it temporarily stores pre-quantized weights obtained from a server (not shown) installed outside the CNN processing device 1A. The CNN processing device 1A may, for instance, acquire the weights of a trained CNN in advance from an external server or the like via the communication network NW and store them in the storage unit 16. In this case, the quantization weight buffer 11A reads the quantized weights from the storage unit 16 and stores them temporarily. As in the first embodiment, the quantized weights have their bit precision reduced to match the processing capacity of the CNN processing device 1A. For example, they include weights whose values containing a decimal point have been converted into integers.
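The specification gives converting weights that contain a decimal point into integers as one example of pre-quantization. The minimal Python sketch below shows how such weights might be prepared offline, for example on the external server, before they are stored in the storage unit 16. It assumes a signed fixed-point format with a chosen number of fractional bits and saturation to an 8-bit range. Both parameters and the name quantize_weights are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_weights(u: Matrix, frac_bits: int = 4,
                     bits: int = 8) -> Tuple[IntMatrix, int]:
    """Convert real-valued weights to signed fixed-point integers.

    Each weight is scaled by 2**frac_bits, rounded to the nearest integer,
    and saturated to the range of a `bits`-bit signed integer. The number
    of fractional bits is returned so the device can interpret the values.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 1 << frac_bits
    q = [[max(lo, min(hi, round(w * scale))) for w in row] for row in u]
    return q, frac_bits


u = [[0.3, -0.75], [1.1, 9.0]]
u_q, f = quantize_weights(u)
print(u_q)  # [[5, -12], [18, 127]]  (9.0 * 16 = 144 saturates to 127)
```

On the device, an integer q stands for the real value q / 2**f. With f = 4, for example, the value 5 stands for 0.3125.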

The quantization convolution operation unit 12A reads the input signal from the input buffer 10 and the quantized weights from the quantization weight buffer 11A, and performs the convolution operation. More specifically, the quantization convolution operation unit 12A also quantizes the signals while it performs the convolution operation. For example, while executing the convolution operation, it limits the number of digits of the convolution result to an integer in advance. Alternatively, when the number of digits of the result is limited, it performs a bit shift before the calculation. The result computed by the quantization convolution operation unit 12A is temporarily stored in the operation result buffer 13.
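The Python sketch below shows one way to read the two techniques named above: a bit shift applied before the calculation, and a limit on the number of digits of the result. It is a hypothetical integer implementation. Each input element is arithmetically right-shifted by in_shift bits before the product-sum (Python's >> on negative integers rounds toward negative infinity), and every output is saturated to a signed out_bits-bit range. The shift amount, the bit width, and the names are assumptions.

```python
from typing import List

IntMatrix = List[List[int]]


def quantized_conv2d(a: IntMatrix, u: IntMatrix, in_shift: int,
                     out_bits: int = 8) -> IntMatrix:
    """Valid 2-D convolution (stride 1) with quantization built in."""
    kh, kw = len(u), len(u[0])
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    # Bit shift before the calculation: drop the low-order bits of each input.
    a_s = [[v >> in_shift for v in row] for row in a]
    out = []
    for i in range(len(a_s) - kh + 1):
        row = []
        for j in range(len(a_s[0]) - kw + 1):
            acc = sum(a_s[i + m][j + n] * u[m][n]
                      for m in range(kh) for n in range(kw))
            # Limit the result to the digits available in out_bits.
            row.append(max(lo, min(hi, acc)))
        out.append(row)
    return out


a = [[40, 80, -20], [60, 10, 30], [0, 50, 70]]
u = [[2, -1], [1, 3]]
print(quantized_conv2d(a, u, in_shift=2))              # [[21, 68], [64, 60]]
print(quantized_conv2d(a, u, in_shift=2, out_bits=6))  # [[21, 31], [31, 31]]
```

Shifting the operands first keeps the products small, and the saturation guarantees that each result fits the output width. This trades a small loss of precision for a fixed, narrow data path.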

[CNN Processing Method]
Next, the operation of the CNN processing device 1A having the above configuration will be described with reference to FIGS. 6 and 7. It is assumed that a weight U' quantized by an external server or the like has been acquired by the CNN processing device 1A via the communication network NW and stored in the storage unit 16.

First, the input buffer 10 temporarily stores the input signal A acquired from a server or the like installed outside the CNN processing device 1A (step S20). The quantization weight buffer 11A reads the weight U' quantized in advance from the storage unit 16 and stores it temporarily (step S21).

As shown in FIG. 7, the input signal A is stored in the input buffer 10. The input signal A is vectorized input image data with vertical and horizontal dimensions, and its values include decimal points. The rectangle drawn with a thick line in FIG. 7 represents a filter. The quantization weight buffer 11A stores the quantized weight U'. The weight U consists of the elements of a kernel represented as a matrix. It is a parameter that is adjusted and updated through CNN training until its final values are determined. The quantized weight U' represents the weight U, whose elements include decimal values, with fewer bits, for example by converting each element into an integer. Like the weight U, the quantized weight U' has vertical and horizontal dimensions.

Next, the quantization convolution operation unit 12A reads the input signal A from the input buffer 10 and the quantized weight U' from the quantization weight buffer 11A. It then performs a convolution operation with quantization incorporated (step S22). For example, the quantization convolution operation unit 12A limits the number of digits of the convolution result to an integer. As another example, before performing the convolution operation, it bit-shifts the input signal A and the quantized weight U' in advance to adjust the number of digits of the data.

More specifically, as in the example shown in FIG. 7, the quantization convolution operation unit 12A discards the fractional part during the convolution operation, that is, C = floor(conv(A, U')). At each position of the filter, it multiplies each element of the quantized weight U' by the corresponding element of the input signal A and sums the products. The result C of this quantized convolution operation is stored in the operation result buffer 13 (step S23).
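The operation C = floor(conv(A, U')) can be sketched in Python as follows. The example assumes a real-valued input signal A, an integer weight U', and a configurable stride. The convolution is two-dimensional and valid (no padding) and is computed as a sliding product-sum, as is usual for CNN layers. The function name floor_conv2d is hypothetical. Note that floor rounds toward negative infinity, so a window sum of -6.5 becomes -7. The last lines apply ReLU as in step S24.

```python
import math
from typing import List


def floor_conv2d(a: List[List[float]], u_q: List[List[int]],
                 stride: int = 1) -> List[List[int]]:
    """C = floor(conv(A, U')): slide U' over A, sum the element-wise
    products at each filter position, and discard the fractional part."""
    kh, kw = len(u_q), len(u_q[0])
    out = []
    for i in range(0, len(a) - kh + 1, stride):
        row = []
        for j in range(0, len(a[0]) - kw + 1, stride):
            s = sum(a[i + m][j + n] * u_q[m][n]
                    for m in range(kh) for n in range(kw))
            row.append(math.floor(s))
        out.append(row)
    return out


a = [[0.5, 1.25, -0.75],
     [2.0, -1.5, 0.25],
     [1.0, 0.5, 3.5]]
u_q = [[1, -2],
       [0, 3]]
c = floor_conv2d(a, u_q)
print(c)  # [[-7, 3], [6, 8]]

# Step S24: ReLU on the quantized result.
relu_c = [[max(0, v) for v in row] for row in c]
print(relu_c)  # [[0, 3], [6, 8]]
```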

The processor 102 of the CNN processing device 1A then reads the result C of the quantized convolution operation from the operation result buffer 13 and applies an activation function such as ReLU (step S24). Specifically, the ReLU function outputs 0 when the result C is negative and outputs C unchanged when it is positive.

Next, the processor 102 applies well-known pooling to the values output in step S24, compressing the result of the convolution operation (step S25). The pooling in step S25 may be performed only as needed. The processor 102 may also normalize the output of the activation function (ReLU) obtained in step S24 (see Non-Patent Document 1).

The pooled result C of the quantized convolution operation is stored in the output buffer 15, and is then read out by the processor 102 and output externally (step S26). This output serves as the output of the feature extraction part of the CNN. It is input to the fully connected layers that constitute a subsequent classifier (not shown), which classifies the image data of the input signal A.

As described above, the CNN processing device 1A according to the second embodiment uses weights quantized in advance, which reduces the computation required for the convolution operation of the CNN. Because the CNN processing device 1A also incorporates quantization into the convolution operation itself, it reduces the computation associated with quantizing the input signal and the weights.

[Third Embodiment]
Next, a third embodiment of the present invention will be described. In the following description, components identical to those of the first and second embodiments described above are denoted by the same reference numerals, and their description is omitted.

The second embodiment describes the case in which the quantization convolution operation unit 12A uses weights quantized in advance to perform a convolution operation with quantization incorporated. In contrast, the third embodiment performs weight quantization and the quantized convolution operation by reading out functions stored in advance in the storage unit 16B.

[Functional Blocks of the CNN Processing Device]
FIG. 8 is a block diagram showing the functional configuration of the CNN processing device 1B according to the third embodiment. The CNN processing device 1B includes an input buffer 10, a quantization weight buffer 11A, a convolution operation unit 12, an operation result buffer 13, a quantization processing unit 14, an output buffer 15, and a storage unit (third memory, fourth memory) 16B.

The storage unit 16B stores a predefined weight quantization function (first function) 160 and a convolution operation quantization function (second function) 161.
The weight quantization function 160 is a function that implements weight quantization. More specifically, with this function the quantization processing unit 14 applies preset rounding to the weight U, for example converting a weight U that contains a decimal point into an integer, and thereby performs quantization that reduces the amount of data.

The convolution operation quantization function 161 is a function that implements a convolution operation with quantization incorporated. More specifically, when the convolution operation unit 12 performs the convolution operation, this function carries out quantization together with that operation. Examples of such quantization are limiting the number of digits of the result and bit-shifting values such as the input signal in advance.

The quantization processing unit 14 calls the weight quantization function 160 from the storage unit 16B and quantizes the weight U. The values of the weight U to be quantized are stored in the storage unit 16B in advance. The weight U' quantized by the quantization processing unit 14 with the weight quantization function 160 is temporarily stored in the quantization weight buffer 11A.

The convolution operation unit 12 calls the convolution operation quantization function 161 stored in the storage unit 16B and executes the quantized convolution operation. More specifically, the convolution operation unit 12 reads the quantized weight U' from the quantization weight buffer 11A and the input signal A from the input buffer 10. Based on the input signal A and the quantized weight U', it then uses the called convolution operation quantization function 161 to perform a convolution operation with quantization incorporated. The result computed with this function is stored in the operation result buffer 13.
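The arrangement described above can be modeled in software with lookup tables of Python functions. In that arrangement, the weight quantization function 160 and the convolution operation quantization function 161 are stored in advance and called when needed. The sketch below is a hypothetical analogue, not the patented implementation. The table keys, the two sample weight quantizers (floor, and round with halves to even), and the one-dimensional convolution are assumptions made for brevity.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]

# Stand-in for storage unit 16B: predefined functions looked up by name.
WEIGHT_QUANT_FUNCS: Dict[str, Callable[[float], int]] = {
    "weight_quant_1": math.floor,  # discard the fractional part
    "weight_quant_2": round,       # nearest integer, halves to even
}


def conv_quant_1(a: Vector, u_q: List[int]) -> List[int]:
    """Valid 1-D convolution with floor applied to each product-sum."""
    k = len(u_q)
    return [math.floor(sum(a[i + j] * u_q[j] for j in range(k)))
            for i in range(len(a) - k + 1)]


CONV_QUANT_FUNCS: Dict[str, Callable[[Vector, List[int]], List[int]]] = {
    "conv_quant_1": conv_quant_1,
}


def run_layer(a: Vector, u: Vector, weight_fn: str, conv_fn: str) -> List[int]:
    quantize = WEIGHT_QUANT_FUNCS[weight_fn]  # read the first function
    u_q = [quantize(w) for w in u]            # quantize the weight U into U'
    return CONV_QUANT_FUNCS[conv_fn](a, u_q)  # read and apply the second function


a = [0.5, 1.5, -2.0, 1.0]
u = [1.7, -0.4]
print(run_layer(a, u, "weight_quant_1", "conv_quant_1"))  # [-1, 3, -3]
print(run_layer(a, u, "weight_quant_2", "conv_quant_1"))  # [1, 3, -4]
```

Changing only the function name changes the quantization behavior. The calling code stays the same.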

[CNN Processing Method]
Next, the operation of the CNN processing device 1B having the above configuration will be described with reference to FIGS. 9 and 10.

First, the quantization processing unit 14 calls the weight quantization function 160 stored in advance in the storage unit 16B and quantizes the weight U (step S30). The quantized weight U' is temporarily stored in the quantization weight buffer 11A (step S31). For example, as shown in FIG. 10, the quantization processing unit 14 calls from the storage unit 16B the weight quantization function 160 corresponding to the quantized weight U'. In the example of FIG. 10, the function labeled "weight quantization function 1" is assigned to each element as the weight quantization function 160.

Next, the input buffer 10 temporarily stores the input signal A acquired from a server or the like installed outside the CNN processing device 1B (step S32).

Next, the convolution operation unit 12 calls the convolution operation quantization function 161 from the storage unit 16B and performs a convolution operation with quantization incorporated (step S33). More specifically, the convolution operation unit 12 reads the input signal A from the input buffer 10 and the quantized weight U' from the quantization weight buffer 11A. Based on these, it uses the convolution operation quantization function 161 to perform the convolution operation with quantization incorporated. For example, following the called function, the convolution operation unit 12 limits the number of digits of the convolution result to an integer. As another example, before performing the convolution operation, the convolution operation unit 12 bit-shifts the input signal A and the quantized weight U' in advance to adjust the number of digits of the data.

More specifically, as in the example shown in FIG. 9, the convolution operation unit 12 follows function 1 of the convolution operation quantization function 161. It performs quantization that discards the fractional part during the convolution operation (floor(conv)). Following the convolution operation quantization function 161, the convolution operation unit 12 multiplies, at each position of the filter, each element of the quantized weight U' by the corresponding element of the input signal A and sums the products. The result X of this quantized convolution operation is temporarily stored in the operation result buffer 13 (step S34).

The processor 102 of the CNN processing device 1B then reads the result X of the quantized convolution operation from the operation result buffer 13 and applies an activation function such as ReLU (step S35). Specifically, the ReLU function outputs 0 when the result X is negative and outputs X unchanged when it is positive.

Next, the processor 102 applies well-known pooling to the values output in step S35, compressing the result of the convolution operation (step S36). The pooling in step S36 may be performed only as needed. The processor 102 may also normalize the output of the activation function (ReLU) obtained in step S35 (see Non-Patent Document 1).

The pooled result X of the quantized convolution operation is temporarily stored in the output buffer 15, and is then read out by the processor 102 and output externally (step S37). This output serves as the output of the feature extraction part of the CNN. It is input to the fully connected layers that constitute a subsequent classifier (not shown), which classifies the image data of the input signal A.

As described above, the CNN processing device 1B according to the third embodiment uses the predefined weight quantization function 160 and convolution operation quantization function 161 to quantize the weights and to perform the quantized convolution operation. Consequently, for convolutional layers connected in multiple stages, every convolution operation can be defined simply by swapping functions. This keeps the amount of program code small and suppresses a decrease in CNN processing speed.
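The hypothetical Python sketch below illustrates the benefit described above: each stage of a multi-stage network is defined by swapping functions rather than by writing new code per layer. Each layer is described only by its weights and the names of the functions it uses. The two convolution variants, the ReLU between layers, and all names are assumptions, not details from the specification.

```python
import math
from typing import Callable, Dict, List, Tuple


def conv_floor(a: List[float], u_q: List[int]) -> List[int]:
    """Valid 1-D convolution; floor is applied to each product-sum."""
    k = len(u_q)
    return [math.floor(sum(a[i + j] * u_q[j] for j in range(k)))
            for i in range(len(a) - k + 1)]


def conv_shift1(a: List[float], u_q: List[int]) -> List[int]:
    """Variant: floor each input first, then halve each sum with >> 1."""
    k = len(u_q)
    a_i = [math.floor(v) for v in a]
    return [sum(a_i[i + j] * u_q[j] for j in range(k)) >> 1
            for i in range(len(a_i) - k + 1)]


WEIGHT_FNS: Dict[str, Callable[[float], int]] = {"floor": math.floor, "round": round}
CONV_FNS: Dict[str, Callable[[List[float], List[int]], List[int]]] = {
    "floor": conv_floor,
    "shift1": conv_shift1,
}

# A layer = (real-valued weights, weight-function name, conv-function name).
Layer = Tuple[List[float], str, str]


def run_network(a: List[float], layers: List[Layer]) -> List[float]:
    """Run stacked layers; each layer only names the functions it uses."""
    for weights, weight_fn, conv_fn in layers:
        u_q = [WEIGHT_FNS[weight_fn](w) for w in weights]
        a = [max(0, v) for v in CONV_FNS[conv_fn](a, u_q)]  # ReLU
    return a


x = [1.5, -0.5, 2.5, 3.0, -1.0]
layer1 = ([1.2, 0.9], "floor", "floor")
layer2 = ([2.6, -1.4], "round", "shift1")
print(run_network(x, [layer1, layer2]))  # [1, 0, 1]
```

Replacing "shift1" with "floor" in the second layer changes that stage's quantized convolution without touching run_network. This is the kind of reuse the passage attributes to swapping predefined functions.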

The weight quantization function 160 and the convolution operation quantization function 161 according to the embodiment described above may be programs that can be incorporated into hardware, such as FPGA source code or microcontroller firmware. In this case, they can be provided separately in a form such as hardware IP.

The weight quantization function 160 and the convolution operation quantization function 161, when configured as programs that can be incorporated into hardware, may reside in memory within the same hardware or may be stored in other network equipment or the like. By swapping these programs according to the form of the neural network, hardware that flexibly implements a neural network with the desired processing functions can be realized.

Embodiments of the neural network processing device and the neural network processing method of the present invention have been described above. However, the present invention is not limited to the described embodiments, and those skilled in the art can make various modifications within the scope of the invention described in the claims.

For example, although the described embodiments use a CNN as an example of the neural network, the neural network adopted by the neural network processing device is not limited to a CNN.

The various functional blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented with any of the following, or any combination of them designed to implement the functions described above:
- a general-purpose processor or a GPU;
- a digital signal processor (DSP);
- an application-specific integrated circuit (ASIC);
- an FPGA or other programmable logic device;
- discrete gate or transistor logic;
- discrete hardware components.

A microprocessor may be used as the general-purpose processor. Alternatively, a conventional processor, controller, microcontroller, or state machine may be used. The processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors connected to a DSP core, or any other such configuration.

Description of Reference Signs: 1 ... CNN processing device, 10 ... input buffer, 11 ... weight buffer, 12 ... convolution operation unit, 13 ... operation result buffer, 14 ... quantization processing unit, 15 ... output buffer, 16 ... storage unit, 101 ... bus, 102 ... processor, 103 ... main storage device, 104 ... communication interface, 105 ... auxiliary storage device, 106 ... input/output device, NW ... communication network, U ... weight, U' ... quantized weight, A ... input signal, A' ... quantized input signal.

Claims

A neural network processing device comprising:
a first memory that stores an input signal given to a neural network;
a second memory that stores weights of the neural network; and
a processor that performs a first process and a second process, the first process performing a convolution operation of the neural network that includes a product-sum operation of the input signal and the weights, and the second process performing quantization that reduces the bit precision of at least a part of the data used in the first process.

The neural network processing device according to claim 1,
wherein the second process includes quantization that reduces the bit precision of at least one of the input signal, the weights, and the result of the product-sum operation in the first process.

The neural network processing device according to claim 1,
wherein the second memory stores quantized weights obtained by quantizing the weights in advance.

The neural network processing device according to any one of claims 1 to 3,
wherein the processor includes a quantization convolution operation unit that performs, based on the weights and the input signal, the first process in which the second process is incorporated.

The neural network processing device according to claim 1, further comprising:
a third memory that stores a first function for quantizing the weights; and
a fourth memory that stores a second function that implements the first process in which the second process is incorporated,
wherein the processor reads the first function to quantize the weights, and reads the second function to perform the first process in which the second process is incorporated.

The neural network processing device according to any one of claims 1 to 5,
wherein the neural network is a multilayer neural network having at least one intermediate layer.

A neural network processing method comprising:
a first step of storing an input signal given to a neural network in a first memory;
a second step of storing weights of the neural network in a second memory; and
a third step in which a processor performs a first process and a second process, the first process performing a convolution operation of the neural network that includes a product-sum operation of the input signal and the weights, and the second process performing quantization that reduces the bit precision of at least a part of the data used in the first process.