KR102833795B1

KR102833795B1 - Method and apparatus for processing convolution neural network

Info

Publication number: KR102833795B1
Application number: KR1020180158379A
Authority: KR
Inventors: Mostafa Mahmoud; Andreas Moshovos
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2018-08-23
Filing date: 2018-12-10
Publication date: 2025-07-15
Anticipated expiration: 2038-12-10
Also published as: KR20200023154A

Abstract

A method and device for processing a convolution operation in a neural network generate differential windows from a plurality of raw windows, and obtain elements of an output feature map by using the result of a convolution operation between a reference raw window and a kernel together with the results of convolution operations between the differential windows and the kernel.

Description

Method and apparatus for processing convolution neural network

The present disclosure relates to a method and device for processing a convolution operation between an input feature map and a kernel in a convolutional neural network.

A neural network refers to a computational architecture that models the biological brain. With recent advances in neural network technology, research on using neural networks in various types of electronic systems to analyze input data and extract valid information is being actively conducted. Devices that process neural networks must perform a large number of computations on complex input data. Therefore, to analyze a large amount of input data in real time using a neural network and extract the desired information, a technology that can efficiently process neural network computations is required.

An object of the present disclosure is to provide a method and device for processing a convolutional neural network. The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

According to one aspect, a method for processing a convolutional neural network in a neural network processing device includes: grouping a plurality of raw windows of an input feature map into a plurality of differential groups for a differential operation; generating differential windows through a differential operation between the raw windows belonging to each of the differential groups; obtaining a reference element of an output feature map, corresponding to a reference raw window among the raw windows, by performing a convolution operation between the reference raw window and a kernel; and obtaining the remaining elements of the output feature map by summing the results of the convolution operations between each of the differential windows and the kernel with the reference element.

According to another aspect, a neural network processing device includes a memory in which an input feature map is stored, and a neural network processor for processing a convolutional neural network using the input feature map stored in the memory. The neural network processor groups a plurality of raw windows of the input feature map into a plurality of differential groups for a differential operation; generates differential windows through a differential operation between the raw windows belonging to each of the differential groups; obtains a reference element of an output feature map, corresponding to a reference raw window among the raw windows, by performing a convolution operation between the reference raw window and a kernel; and obtains the remaining elements of the output feature map by summing the results of the convolution operations between each of the differential windows and the kernel with the reference element.
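The two aspects above work because the multiply-and-sum between a window and a kernel is linear in the window. The following is a sketch of that identity for a single differential group. The symbols $W_{\mathrm{ref}}$ (differential reference window), $W_{\mathrm{sub}}$ (differential subject window), $\Delta$ (differential window), $K$ (kernel), and $O$ (output element) are notation assumed here for illustration and do not appear in the source.

```latex
\begin{aligned}
\langle A, K \rangle &:= \sum_{r,s} A_{r,s}\, K_{r,s}, \\
\Delta &= W_{\mathrm{sub}} - W_{\mathrm{ref}}, \\
O_{\mathrm{sub}} = \langle W_{\mathrm{sub}}, K \rangle
  &= \langle W_{\mathrm{ref}} + \Delta,\; K \rangle
   = \langle W_{\mathrm{ref}}, K \rangle + \langle \Delta, K \rangle
   = O_{\mathrm{ref}} + \langle \Delta, K \rangle .
\end{aligned}
```

When neighboring windows hold similar values, the entries of $\Delta$ tend to be small or zero, which is presumably where the savings come from.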

FIG. 1 is a diagram illustrating the architecture of a neural network according to one embodiment.
FIGS. 2a and 2b are diagrams for explaining the convolution operation of a neural network.
FIG. 3 is a block diagram illustrating a hardware configuration of a neural network processing device according to one embodiment.
FIG. 4 is a flowchart of a method for processing a convolutional neural network in a neural network processing device according to one embodiment.
FIG. 5 is a diagram illustrating an input feature map according to one embodiment.
FIG. 6 is a diagram for explaining a differential group and a differential window according to one embodiment.
FIG. 7 is a diagram for explaining a cascading summation operation according to one embodiment.
FIG. 8 is a diagram for explaining a convolution operation using a differential window according to one embodiment.
FIG. 9 is a diagram illustrating an implementation example of a neural network processing device according to one embodiment.
FIG. 10 is a diagram for explaining a differential window output unit according to one embodiment.
FIG. 11 is a graph for explaining the improvement in operation processing speed of the operation processing method according to the present disclosure.
FIG. 12 is a graph for explaining the improvement in frame rate of the operation processing method according to the present disclosure.

The terms used in these embodiments were selected, as far as possible, from general terms that are currently in wide use, taking into account their functions in these embodiments; however, they may vary depending on the intention of those working in the relevant technical field, precedents, the emergence of new technologies, and so on. In certain cases a term has been chosen arbitrarily, and its meaning is then described in detail in the description of the relevant embodiment. Therefore, the terms used in these embodiments should be defined based on their meanings and on the overall content of these embodiments, rather than simply on their names.

In the descriptions of the embodiments, when a part is said to be connected to another part, this includes not only the case where they are directly connected but also the case where they are electrically connected with another component in between. Also, when a part is said to include a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Terms such as "consists of" or "includes" used in these embodiments should not be construed as necessarily including all of the components or steps described in the specification; some of the components or steps may not be included, or additional components or steps may be further included.

The following description of the embodiments should not be construed as limiting the scope of rights, and what a person skilled in the art could easily infer should be construed as falling within the scope of rights of the embodiments. Hereinafter, embodiments provided for illustrative purposes only are described in detail with reference to the attached drawings.

FIG. 1 is a diagram illustrating the architecture of a neural network according to one embodiment.

Referring to FIG. 1, the neural network (1) may have the architecture of a deep neural network (DNN) or an n-layer neural network. The DNN or n-layer neural network may correspond to a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network, a restricted Boltzmann machine, and so on. For example, the neural network (1) may be implemented as a convolutional neural network (CNN), but is not limited thereto. Although FIG. 1 shows only some of the convolutional layers of a convolutional neural network corresponding to an example of the neural network (1), the convolutional neural network may further include a pooling layer, a fully connected layer, and the like in addition to the illustrated convolutional layers.

The neural network (1) may be implemented as an architecture having multiple layers including an input image, feature maps, and an output. In the neural network (1), a convolution operation is performed between the input image and a filter called a kernel, and feature maps are output as a result. The output feature maps generated in this way then serve as input feature maps for a further convolution operation with a kernel, and new feature maps are output. As a result of repeating such convolution operations, a recognition result for the features of the input image may finally be output through the neural network (1).

For example, when an image of 24x24 pixels is input to the neural network (1) of FIG. 1, the input image may be output as 4-channel feature maps of 20x20 pixels through a convolution operation with a kernel. The 20x20 feature maps are then reduced in size through repeated convolution operations with kernels, and features of 1x1 pixel size may finally be output. By repeatedly performing convolution operations and sub-sampling (or pooling) operations in multiple layers, the neural network (1) filters and outputs robust features that can represent the entire image, and derives a recognition result for the input image from the final output features.

FIGS. 2a and 2b are diagrams for explaining the convolution operation of a neural network.

In the example of FIG. 2a, the input feature map (210) for the input image is assumed to be 6x6 pixels, the original kernel (220) 3x3 pixels, and the output feature map (230) 4x4 pixels; however, the neural network is not limited thereto and may be implemented with feature maps and kernels of various sizes. In addition, the values defined in the input feature map (210), the original kernel (220), and the output feature map (230) are all merely exemplary, and the present embodiments are not limited thereto.
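The sizes quoted above follow from counting the positions a sliding kernel can occupy along each axis. The helper below is a minimal sketch assuming a stride of 1 and no padding for the FIG. 2a example. The function name `conv_output_size` and its `stride` and `padding` parameters are illustrative and not taken from the source. The 5x5 kernel used for the 24x24 to 20x20 case of FIG. 1 is likewise an assumption, since the source does not state that kernel size.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Number of window positions along one axis of a sliding-window convolution."""
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


# FIG. 2a example: 6x6 input, 3x3 kernel, giving a 4x4 output (16 windows).
side = conv_output_size(6, 3)
print(side, side * side)

# FIG. 1 example, assuming a 5x5 kernel (not stated in the source).
print(conv_output_size(24, 5))
```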

The original kernel (220) performs a convolution operation while sliding over the input feature map (210) in units of 3x3-pixel windows (in a sliding window fashion). The convolution operation obtains each pixel value of the output feature map (230) by multiplying each pixel value of a window of the input feature map (210) by the weight of the element at the corresponding position in the original kernel (220) and summing all of the resulting values. Specifically, the original kernel (220) first performs a convolution operation with the first window (211) of the input feature map (210). That is, the pixel values 1, 2, 3, 4, 5, 6, 7, 8, 9 of the first window (211) are multiplied by the weights -1, -3, +4, +7, -2, -1, -5, +3, +1 of the elements of the original kernel (220), respectively, yielding -1, -6, 12, 28, -10, -6, -35, 24, 9. These values are then added together to give 15, and the pixel value (231) in the 1st row and 1st column of the output feature map (230) is determined to be 15. Here, the pixel value (231) in the 1st row and 1st column of the output feature map (230) corresponds to the first window (211). In the same manner, a convolution operation between the second window (212) of the input feature map (210) and the original kernel (220) determines the pixel value (232) in the 1st row and 2nd column of the output feature map (230) to be 4. Finally, a convolution operation between the 16th window (213), which is the last window of the input feature map (210), and the original kernel (220) determines the pixel value (233) in the 4th row and 4th column of the output feature map (230) to be 11.
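The arithmetic for the first window can be checked with a short sketch. The source gives the nine pixel values and nine weights only as flat lists, so laying them out row by row is an assumption (the sum does not depend on it). The 6x6 map `demo_map` is hypothetical: only its top-left 3x3 corner comes from the source, and the zeros elsewhere are placeholders, so only the first output element is meaningful.

```python
def window_dot(window, kernel):
    """Multiply each window value by the kernel weight at the same position and sum."""
    return sum(
        w * k
        for window_row, kernel_row in zip(window, kernel)
        for w, k in zip(window_row, kernel_row)
    )


def conv2d(feature_map, kernel, stride=1):
    """Slide the kernel over the feature map (no padding) and collect the sums."""
    rows, cols = len(kernel), len(kernel[0])
    output = []
    for top in range(0, len(feature_map) - rows + 1, stride):
        output_row = []
        for left in range(0, len(feature_map[0]) - cols + 1, stride):
            window = [r[left:left + cols] for r in feature_map[top:top + rows]]
            output_row.append(window_dot(window, kernel))
        output.append(output_row)
    return output


first_window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # values of window (211)
kernel = [[-1, -3, 4], [7, -2, -1], [-5, 3, 1]]    # weights of kernel (220)
print(window_dot(first_window, kernel))            # 15

# Hypothetical 6x6 map: only the top-left corner is taken from the source.
demo_map = [row + [0, 0, 0] for row in first_window] + [[0] * 6 for _ in range(3)]
result = conv2d(demo_map, kernel)
print(len(result), len(result[0]), result[0][0])   # 4 4 15
```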

That is, a convolution operation between one input feature map (210) and one original kernel (220) can be processed by repeatedly multiplying the values of corresponding elements in the input feature map (210) and the original kernel (220) and summing the products, and the output feature map (230) is generated as the result of the convolution operation.

Although FIG. 2a describes a two-dimensional convolution operation, the convolution operation may also be a three-dimensional convolution operation involving input feature maps, kernels, and output feature maps with multiple channels. This is described with reference to FIG. 2b.

Referring to FIG. 2b, the input feature maps (201) have X channels, and the input feature map of each channel may have a size of H rows and W columns (X, W, and H are natural numbers). Each of the kernels (202) has a size of R rows and S columns, and the kernels (202) may have a number of channels corresponding to the number of channels (X) of the input feature maps (201) and the number of channels (Y) of the output feature maps (203) (R, S, and Y are natural numbers). The output feature maps (203) are generated through a three-dimensional convolution operation between the input feature maps (201) and the kernels (202), and may have Y channels as a result of the convolution operation.

The process of generating an output feature map through a convolution operation between one input feature map and one kernel is as described above with reference to FIG. 2a. By repeating the two-dimensional convolution operation of FIG. 2a between the input feature maps (201) of all channels and the kernels (202) of all channels, the output feature maps (203) of all channels can be generated.
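The channel bookkeeping of FIG. 2b can be sketched as nested loops. This is a minimal illustration, assuming a stride of 1, no padding, and the common convention that the per-channel 2D results are summed over the X input channels to form each of the Y output channels; the source says only that the 2D operation is repeated across channels. The function name `conv_multichannel` and the all-ones demo data are hypothetical.

```python
def conv_multichannel(inputs, kernels):
    """inputs: X x H x W, kernels: Y x X x R x S, returns Y x (H-R+1) x (W-S+1)."""
    X, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    Y, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    outputs = []
    for y in range(Y):                       # one output feature map per kernel set
        fmap = [[0] * (W - S + 1) for _ in range(H - R + 1)]
        for x in range(X):                   # accumulate over input channels
            for i in range(H - R + 1):
                for j in range(W - S + 1):
                    fmap[i][j] += sum(
                        inputs[x][i + r][j + s] * kernels[y][x][r][s]
                        for r in range(R)
                        for s in range(S)
                    )
        outputs.append(fmap)
    return outputs


# Hypothetical sizes: X=2 channels of 3x3 input, Y=3 output channels, 2x2 kernels.
ones_in = [[[1] * 3 for _ in range(3)] for _ in range(2)]
ones_k = [[[[1] * 2 for _ in range(2)] for _ in range(2)] for _ in range(3)]
out = conv_multichannel(ones_in, ones_k)
print(len(out), len(out[0]), len(out[0][0]), out[0][0][0])   # 3 2 2 8
```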

FIG. 3 is a block diagram illustrating a hardware configuration of a neural network processing device (300) according to one embodiment.

Referring to FIG. 3, the neural network processing device (300) may include a neural network processor (310) and a memory (320).

FIG. 3 shows only those components of the neural network processing device (300) that are related to the present embodiments. It is therefore obvious to those skilled in the art that the neural network processing device (300) may further include other general-purpose components in addition to the components shown in FIG. 3.

The neural network processor (310) may be implemented as a CPU (central processing unit), a GPU (graphics processing unit), an AP (application processor), or the like, but is not limited thereto.

In addition, the memory (320) may include RAM (random access memory) such as DRAM (dynamic random access memory) or SRAM (static random access memory), ROM (read-only memory), EEPROM (electrically erasable programmable read-only memory), CD-ROM, Blu-ray or other optical disk storage, an HDD (hard disk drive), an SSD (solid state drive), or flash memory.

The neural network processing device (300) is included in a neural network device and controls the overall functions for driving the neural network. For example, the neural network processing device (300) may control the operation processing by which an output feature map is extracted from an input feature map of the neural network device.

Here, the neural network device may be implemented as various types of devices such as a PC (personal computer), a server device, a mobile device, or an embedded device. Specific examples include a smartphone, a tablet device, an AR (augmented reality) device, an IoT (Internet of Things) device, an autonomous vehicle, a robotics device, or a medical device that performs voice recognition, image recognition, image classification, and the like using a neural network, but the device is not limited thereto. Furthermore, the neural network processing device (300) may correspond to a dedicated hardware accelerator mounted on such a device, and the neural network device may be a hardware accelerator such as an NPU (neural processing unit), a TPU (Tensor Processing Unit), or a Neural Engine, which are dedicated modules for driving a neural network, but is not limited thereto.

The neural network device may include the neural network processing device (300) and an external memory. The neural network processing device (300), which includes the neural network processor (310) and the memory (320), may be implemented as a single chip; in this case, the memory (320) included in the neural network processing device (300) is an on-chip memory, and the external memory is an off-chip memory.

To reduce the chip size and the like, the memory (320) included in the neural network processing device (300) may have a relatively small capacity compared to the external memory.

The off-chip memory, which has a relatively large capacity, can store all of the input feature maps, the kernel weight values, the output feature maps, and so on. The neural network processing device (300) may access the external memory to obtain the data required for an operation and store the obtained data in the memory (320), which is the on-chip memory. In addition, the neural network processing device (300) may store intermediate operation results for generating an output feature map, and part of the output feature map, in the on-chip memory (320).

The smaller the on-chip memory (320), the smaller the chip can be, but traffic may increase as the off-chip memory is accessed more frequently. Therefore, in view of the capacity of the on-chip memory (320), it is necessary to reduce the data size of the intermediate operation results, and to reduce the frequency of off-chip memory accesses in order to cut the traffic generated during operation processing.

The neural network processor (310) processes the convolution operation between an input feature map and a kernel using the elements of the input feature maps, the weights of the kernels, and the like stored (or buffered) in the memory (320). Here, the input feature map may relate to image data, and an element of the input feature map may represent a pixel, but the embodiments are not limited thereto.

One or more neural network processors (310) and one or more memories (320) may be provided in the neural network processing device (300), and each may be used to process convolution operations in parallel and independently, so that the convolution operations can be processed efficiently.

A logic circuit implementing a convolution operator for the convolution operation may be provided in the neural network processor (310). The convolution operator is implemented as a combination of a shifter or a multiplier, an adder, an accumulator, and the like. Within the convolution operator, each of the shifter, the multiplier, and the adder may be implemented as a combination of a plurality of sub-shifters, sub-multipliers, and sub-adders, respectively.

The neural network processor (310) may group a plurality of raw windows of an input feature map into a plurality of differential groups for a differential operation.

A window is a sub-feature map, a unit smaller than the feature map. For example, each of the plurality of windows of the input feature map may consist of some of the elements that make up the input feature map.

The windows may share some of the elements of the input feature map. For example, the input feature map may be block data whose elements are arranged in a multidimensional space, and two windows that are adjacent in that space may both contain some of the same elements of the input feature map. As another example, the windows may not overlap one another.

The multidimensional space of the input feature map may be determined so that spatially adjacent elements are highly correlated. For example, adjacent elements in the multidimensional space of the input feature map may have similar values. The multidimensional space of the input feature map may therefore be determined differently depending on the type of the input feature map. For example, if the input feature map relates to image data, the multidimensional space may represent the pixel space of the image data.

Raw windows, unlike differential windows, are windows on which no differential operation between windows has been performed.

The neural network processor (310) may determine the plurality of raw windows in a sliding window fashion. In the sliding window fashion, a sliding window having a predetermined size and form, that is, a predetermined shape, slides over the input feature map at a constant sliding interval to determine the plurality of windows. For example, each of the windows may be determined by scanning the elements of the input feature map delimited by the shape of the sliding window at each of a plurality of sliding positions. Here, the sliding direction indicates the spatial ordering of the windows in the multidimensional space of the input feature map. It does not, however, indicate the temporal order in which the windows are determined or acquired. For example, a window at a preceding position and a window at a succeeding position in the sliding direction may be determined or acquired simultaneously.
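The window extraction described above can be sketched as follows. This is a minimal 2D illustration in which the sliding direction is taken to be left to right and then top to bottom, which is an assumption, since the source does not fix a scan order. The function name `sliding_windows`, its parameters, and the 4x4 demo map are hypothetical. A stride smaller than the window size yields overlapping windows, and a stride equal to the window size yields non-overlapping ones.

```python
def sliding_windows(feature_map, height, width, stride=1):
    """Return ((top, left), window) pairs ordered along the sliding direction."""
    windows = []
    for top in range(0, len(feature_map) - height + 1, stride):
        for left in range(0, len(feature_map[0]) - width + 1, stride):
            window = [row[left:left + width]
                      for row in feature_map[top:top + height]]
            windows.append(((top, left), window))
    return windows


# Hypothetical 4x4 feature map holding the values 0..15.
demo = [[4 * r + c for c in range(4)] for r in range(4)]

overlapping = sliding_windows(demo, 2, 2, stride=1)   # neighbors share elements
disjoint = sliding_windows(demo, 2, 2, stride=2)      # neighbors share nothing

print(len(overlapping), overlapping[0])
print([position for position, _ in disjoint])
```

The list order reflects only spatial position; nothing in the sketch requires the windows to be produced one after another in time.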

The shape of the plurality of windows of the input feature map, that is, their size and form, may be determined by the shape of the sliding window. For example, the input feature map may be a three-dimensional block arranged in a three-dimensional space defined by mutually orthogonal axes in the row (or length) direction, the column (or width) direction, and the depth direction. When the sliding window has a rectangular shape of a predetermined size in the column and row directions of the input feature map, the windows may be sub-blocks that have the same size as the sliding window in the column and row directions and a rectangular shape. In this case, the shape of the sliding window does not determine the size of the windows in the depth direction. The size of the windows in the depth direction may be the same as the size of the input feature map in the depth direction.

The neural network processor (310) may group the plurality of windows of the input feature map into a plurality of differential groups, each being a unit on which a differential operation is performed. For example, the neural network processor (310) may group the plurality of windows into differential groups each consisting of two adjacent windows. Here, a differential group may consist of a differential subject window and a differential reference window that is subtracted from the differential subject window. The differential subject window may be the window that follows in the sliding direction, and the differential reference window may be the window that precedes the differential subject window in the sliding direction.

A differential subject window belonging to one of two different differential groups may be the same window as the differential reference window belonging to the other differential group. For example, two differential groups may be defined for three windows arranged adjacently in the sliding direction: a preceding window, a middle window, and a succeeding window. Here, the middle window, which is the differential subject window in the differential group consisting of the preceding window and the middle window, may be the differential reference window of the next differential group consisting of the middle window and the succeeding window.

The neural network processor (310) may generate differential windows through differential operations between the raw windows belonging to each of the plurality of differential groups.

The neural network processor (310) may generate each of the differential windows through an element-wise differential operation between the differential reference window and the differential subject window belonging to each of the plurality of differential groups. For example, the neural network processor (310) may obtain a differential window by performing a differential operation between the two elements at each pair of corresponding positions in a differential subject window and a differential reference window that have the same shape. Accordingly, the shape of the differential window is the same as the shape of the differential subject window and the differential reference window. The neural network processor (310) may generate a differential window for each of the differential subject windows of the differential groups.
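
The grouping and element-wise subtraction can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `differential_window` and `differential_windows` and the 2x2 sample values are hypothetical, and the windows are assumed to be listed in sliding order so that each window is paired with its predecessor.

```python
def differential_window(subject, reference):
    """Element-wise (subject - reference) for two equally shaped 2-D windows."""
    same_shape = len(subject) == len(reference) and all(
        len(s_row) == len(r_row) for s_row, r_row in zip(subject, reference)
    )
    if not same_shape:
        raise ValueError("windows must have the same shape")
    return [
        [s - r for s, r in zip(s_row, r_row)]
        for s_row, r_row in zip(subject, reference)
    ]


def differential_windows(raw_windows):
    """Form groups (w0, w1), (w1, w2), ... and return one differential
    window per differential subject window (every window but the first)."""
    return [
        differential_window(raw_windows[i], raw_windows[i - 1])
        for i in range(1, len(raw_windows))
    ]


# Arbitrary sample data: three adjacent 2x2 raw windows.
raw_windows = [[[1, 2], [3, 4]], [[2, 2], [5, 3]], [[2, 5], [5, 1]]]
diffs = differential_windows(raw_windows)
```

Note that the middle window serves as the subject of the first group and as the reference of the second, so three windows yield two differential windows.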

The neural network processor (310) may obtain a reference element of the output feature map corresponding to a reference raw window by performing a convolution operation between the reference raw window, among the raw windows, and the kernel.

The neural network processor (310) may determine at least one reference raw window among the plurality of windows. For example, the neural network processor (310) may group the plurality of windows into a plurality of cascading groups. Each window in a cascading group may be adjacent to at least one other window in the same cascading group.

The neural network processor (310) may determine a reference window in each of the plurality of cascading groups. The neural network processor (310) may perform a convolution operation between the reference raw window and the kernel to obtain the reference element of the output feature map corresponding to the reference raw window. The reference element of the output feature map is obtained directly from the result of the convolution operation between the reference raw window and the kernel, independently of any differential window.

The neural network processor (310) may obtain the remaining elements of the output feature map by summing the results of the convolution operations between each of the differential windows and the kernel with the reference element of the output feature map.

The neural network processor (310) may perform a convolution operation between the kernel and each of the plurality of differential windows corresponding to the plurality of differential groups. The neural network processor (310) may then obtain the remaining elements of the output feature map by adding the reference element to the cascaded sums of the results of the convolution operations between each of the differential windows and the kernel. Therefore, to obtain the remaining elements of the output feature map corresponding to the differential subject windows of the differential groups, the neural network processor (310) does not need to perform a convolution operation between each of those raw windows, that is, the differential subject windows, and the kernel.
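
The cascaded summation can be sketched as below. This is a minimal sketch, not the processor's actual data path: the names `dot` and `outputs_from_differentials` and the sample data are hypothetical, a single cascading group whose first window is the reference raw window is assumed, and the per-window "convolution" is modelled as a sum of element-wise products.

```python
def dot(window, kernel):
    """Sum of element-wise products of a 2-D window and a 2-D kernel."""
    return sum(
        w * k
        for w_row, k_row in zip(window, kernel)
        for w, k in zip(w_row, k_row)
    )


def outputs_from_differentials(raw_windows, kernel):
    """Only the first (reference) window is convolved directly; every later
    output is the previous output plus a differential-window result."""
    outputs = [dot(raw_windows[0], kernel)]  # reference element
    for reference, subject in zip(raw_windows, raw_windows[1:]):
        diff = [
            [s - r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(subject, reference)
        ]
        outputs.append(outputs[-1] + dot(diff, kernel))
    return outputs


# Arbitrary sample data for one cascading group.
raw_windows = [[[3, 1], [4, 1]], [[5, 9], [2, 6]], [[5, 3], [5, 8]]]
kernel = [[1, 2], [3, 4]]
cascaded = outputs_from_differentials(raw_windows, kernel)
direct = [dot(w, kernel) for w in raw_windows]
```

In this sketch `cascaded` and `direct` agree because the sum of products is linear in the window elements.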

Owing to the similarity between elements at corresponding positions of the differential subject window and the differential reference window, the elements of a differential window have relatively small values compared with the elements of a raw window, so the differential window can be stored using a relatively small amount of memory.
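
To make the storage argument concrete, the sketch below compares the two's-complement width needed for raw values and for their differences. The helper `signed_bits` is a hypothetical name, the sample values are borrowed from the FIG. 8 example, and the saving is realized only if the storage format actually uses a narrower width for small values, which is an assumption of this illustration.

```python
def signed_bits(value):
    """Smallest two's-complement width (in bits) that can hold value."""
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    return magnitude_bits + 1  # one extra bit for the sign


subject = [47, 47, 49, 50]    # raw differential subject window
reference = [45, 47, 46, 49]  # raw differential reference window
diff = [s - r for s, r in zip(subject, reference)]

raw_width = max(signed_bits(v) for v in subject)
diff_width = max(signed_bits(v) for v in diff)
```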

In addition, the neural network processor (310) may convert the data format of the differential windows in order to speed up the convolution operations between each of the differential windows and the kernel. For example, when each of the differential windows is in a bit data format, the neural network processor (310) may convert each differential window into a data format that includes information on the positions of its effective bits, that is, the bits whose value is 1. As preprocessing for the format conversion, the neural network processor (310) may also preprocess the differential windows according to the Booth algorithm in order to reduce the number of effective bits. The neural network processor (310) may perform bit-shift operations based on the information on the effective-bit positions of each converted differential window, and compute from them the result of the convolution operation between the differential window and the kernel. In computing the convolution result, converting to a data format that carries relatively little information and using shift operations, which impose a light processing load, can reduce the required memory capacity and increase the processing speed.
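
The shift-based multiplication can be illustrated as follows. This is a sketch under explicit assumptions: the function names are hypothetical, and because the source does not specify its Booth variant, the recoding shown is the non-adjacent form (NAF), a related signed-digit recoding used here as a stand-in rather than the recoding of the described device.

```python
def one_bit_positions(value):
    """Positions (LSB = 0) of the 1 bits of a non-negative integer."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def signed_digit_terms(value):
    """Non-adjacent form: (digit, position) pairs with digit in {+1, -1}
    such that value == sum(digit << position). Stand-in for Booth recoding."""
    terms, pos = [], 0
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            terms.append((digit, pos))
            value -= digit
        value >>= 1
        pos += 1
    return terms


def shift_add_multiply(weight, value):
    """weight * value using only shifts, additions, and subtractions."""
    return sum(digit * (weight << pos) for digit, pos in signed_digit_terms(value))


plain_terms = one_bit_positions(7)      # three effective bits: 1 + 2 + 4
recoded_terms = signed_digit_terms(7)   # two terms: 8 - 1
product = shift_add_multiply(3, 7)
```

Each term costs one shift of the weight, so fewer effective bits in a differential-window element means fewer shift-and-add steps for that element.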

FIG. 4 is a flowchart of a method of processing a convolutional neural network in a neural network processing device according to an embodiment.

The method of processing a convolutional neural network illustrated in FIG. 4 may be performed by the neural network processing device (300 of FIG. 3) described with reference to the preceding drawings. Therefore, even where omitted below, the descriptions given above with reference to those drawings also apply to the method of FIG. 4.

In step 410, the neural network processing device (300) may group a plurality of raw windows of the input feature map into a plurality of differential groups for differential operations.

The neural network processing device (300) may determine the plurality of raw windows in a sliding window fashion.

The neural network processing device (300) may group the plurality of windows of the input feature map into a plurality of differential groups, each being a unit on which a differential operation is performed. For example, the neural network processing device (300) may group the plurality of windows into differential groups each consisting of two adjacent windows.

In step 420, the neural network processing device (300) may generate differential windows through differential operations between the raw windows belonging to each of the plurality of differential groups.

The neural network processing device (300) may generate each of the differential windows through an element-wise differential operation between the differential reference window and the differential subject window belonging to each of the plurality of differential groups. For example, the neural network processing device (300) may obtain a differential window by performing a differential operation between the two elements at each pair of corresponding positions in a differential subject window and a differential reference window that have the same shape. Accordingly, for each of the differential groups, the neural network processing device (300) may generate the differential window corresponding to the differential subject window of that group.

In step 430, the neural network processing device (300) may obtain a reference element of the output feature map corresponding to a reference raw window by performing a convolution operation between the reference raw window, among the raw windows, and the kernel.

The neural network processing device (300) may determine at least one reference raw window among the plurality of windows. For example, the neural network processing device (300) may group the plurality of windows into a plurality of cascading groups. Each window in a cascading group may be adjacent to at least one other window in the same cascading group. The neural network processing device (300) may also determine a reference window in each of the plurality of cascading groups.

The neural network processing device (300) may obtain the reference element of the output feature map corresponding to the reference raw window by performing a convolution operation between the elements of the reference raw window and the elements of the kernel at the corresponding positions.

In step 440, the neural network processing device (300) may obtain the remaining elements of the output feature map by summing the results of the convolution operations between each of the differential windows and the kernel with the reference element.

The neural network processing device (300) may perform a convolution operation between the kernel and each of the plurality of differential windows corresponding to the plurality of differential groups. The neural network processing device (300) may then obtain the remaining elements of the output feature map by adding the reference element to the cascaded sums of the results of the convolution operations between each of the differential windows and the kernel.

In addition, the neural network processing device (300) may convert the data format of the differential windows. For example, when each of the differential windows is in a bit data format, the neural network processing device (300) may convert each differential window into a data format that includes information on the positions of its effective bits, that is, the bits whose value is 1. As preprocessing for the format conversion, the neural network processing device (300) may also preprocess the differential windows according to the Booth algorithm in order to reduce the number of effective bits. The neural network processing device (300) may compute the result of the convolution operation between a differential window and the kernel from the results of bit-shift operations performed on the basis of the information on the effective-bit positions of each converted differential window.

FIG. 5 is a diagram illustrating an input feature map (500) according to an embodiment.

Although FIG. 5 illustrates an embodiment in which the input feature map (500) is two-dimensional array data, the input feature map (500) may be three-dimensional block data or data of various other forms and is not limited to this embodiment.

Referring to FIG. 5, the input feature map (500) is two-dimensional array data having a size of 7x6 elements in the row and column directions. The input feature map (500) consists of 7x6 = 42 elements. When the input feature map (500) relates to image data, each of the elements may correspond to a pixel.

In FIG. 5, each element of the input feature map (500) is denoted by a combination of an index indicating its row and an index indicating its column. For example, the element in row 3, column 2 of the input feature map (500) is denoted X₃₂.

FIG. 5 also shows a sliding window (510) for determining the raw windows of the input feature map (500). In the sliding window fashion, the sliding window (510) slides over the input feature map (500) at an interval of a predetermined number of elements and extracts the raw windows. In the present embodiments, unless a window is specifically referred to as a differential window, "window" denotes a raw window as distinguished from a differential window.

In FIG. 5, the sliding window (510) has a size of 3x3 elements in the row and column directions. Accordingly, each of the windows of the input feature map (500) is two-dimensional array data of 3x3 elements, the same size as the sliding window.

The embodiments of FIGS. 6 and 7 below relate to the windows determined from the input feature map (500) and the sliding window (510) of FIG. 5. The sliding window (510) moves across the input feature map in the column direction at an interval of one element and determines the windows belonging to the same row. The sliding window (510) also moves in the row direction at an interval of one element and determines the windows belonging to the next row. In this manner, 5x4 = 20 windows are determined from the input feature map.
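
The count of 20 windows follows from the usual sliding-position formula, sketched below. The function name `window_positions` is hypothetical; the sizes (7 rows, 6 columns, a 3x3 window, an interval of one element) are those stated for FIG. 5.

```python
def window_positions(size, window, stride=1):
    """Number of positions a window of the given extent can take along one axis."""
    return (size - window) // stride + 1


positions_down = window_positions(7, 3)    # row direction of the 7x6 map
positions_across = window_positions(6, 3)  # column direction of the 7x6 map
total_windows = positions_down * positions_across
```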

FIG. 6 is a diagram for explaining differential groups and differential windows according to an embodiment.

FIG. 6 shows four windows belonging to the same row (window 11, window 12, window 13, and window 14) and three differential groups (differential group 11, differential group 12, and differential group 13), each consisting of two adjacent windows among the four windows.

Along the column direction in which the sliding window illustrated in FIG. 5 slides, window 11 and window 12, window 12 and window 13, and window 13 and window 14 are pairs of mutually adjacent windows, and each pair forms a different differential group.

Within a differential group, the window that precedes in the differencing direction is the differential reference window and the window that follows is the differential subject window. The differential window is generated by subtracting the differential reference window from the differential subject window.

For example, in differential group 12, differential window 13 is generated by subtracting the preceding window 12 from the succeeding window 13.

Here, the differential operation between windows is performed element by element. For example, differential window 13 is generated through differential operations between the elements at corresponding positions of window 13 and window 12. For example, the element in row 2, column 2 of differential window 13 is X₂₄ - X₂₃, obtained by subtracting the element X₂₃ at the corresponding position of window 12 from the element X₂₄ in row 2, column 2 of window 13.
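
The index arithmetic behind X₂₄ - X₂₃ can be checked with a short sketch. The helper `feature_map_index` is a hypothetical name, and 1-based indices with a sliding interval of one element are assumed, matching the FIG. 5 and FIG. 6 description.

```python
def feature_map_index(win_row, win_col, i, j):
    """1-based feature-map (row, col) of element (i, j) of window (win_row, win_col),
    assuming the window slides one element at a time."""
    return (win_row + i - 1, win_col + j - 1)


subject_index = feature_map_index(1, 3, 2, 2)    # element (2, 2) of window 13
reference_index = feature_map_index(1, 2, 2, 2)  # element (2, 2) of window 12
```

The two indices differ only in the column, by exactly the sliding interval, which is why neighbouring windows subtract neighbouring feature-map elements.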

Regarding the correspondence among raw windows, differential groups, and differential windows, each succeeding raw window, which is the differential subject window of a differential group, corresponds to one differential window. For example, raw window 12, the differential subject window of differential group 11, corresponds to differential window 12, and raw window 13 of differential group 12 corresponds to differential window 13. In addition, the convolution operations between each of the differential windows and the kernel produce different elements of the output feature map. Thus, a differential subject window (a raw window), its differential window, and an element of the output feature map correspond to one another.

No differential window corresponding to window 11, which is a raw window, is shown in FIG. 6. This is because, as described with reference to FIG. 7, window 11 is the reference window: the corresponding reference element of the output feature map is obtained by performing a convolution operation between window 11 itself and the kernel, without using a separate differential window. Accordingly, when forming differential groups, no differential group that includes the reference window 11 as its differential subject window is required.

FIG. 7 is a diagram for explaining a cascading summation operation according to an embodiment.

In FIG. 7, the kernel is two-dimensional array data of weights having a size of 3x3 elements. The kernel is convolved with each of window 11, differential window 12, differential window 13, and differential window 14.

O₁₁, O₁₂, O₁₃, and O₁₄ denote different elements of the output feature map and correspond to window 11, differential window 12, differential window 13, and differential window 14, respectively. As described above with reference to FIG. 6, each differential subject window, which is a raw window, corresponds to a differential window, so the elements O₁₁, O₁₂, O₁₃, and O₁₄ of the output feature map correspond to window 11, window 12, window 13, and window 14, respectively.

Referring to FIG. 7, the reference element O₁₁ of the output feature map, which corresponds to window 11, is computed directly from the result of the convolution between window 11 and the kernel. In contrast, the remaining elements O₁₂, O₁₃, and O₁₄ of the output feature map are computed by adding, in a cascading manner, the results of the convolutions between the differential windows and the kernel to the reference element O₁₁. For example, the element O₁₃ of the output feature map, which corresponds to differential window 13, is computed by adding dO₁₂, the result of the convolution operation between differential window 12 and the kernel, to the reference element O₁₁, and then adding dO₁₃, the result of the convolution operation between differential window 13 and the kernel, to that sum in a cascading manner.
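
The cascade can be written out as a short derivation. The symbols W₁ₖ for raw window 1k, K for the kernel, and ∗ for the window-kernel convolution are notation assumed here for illustration; O₁ₖ and dO₁ₖ are as in FIG. 7. The first line relies only on the linearity of the convolution in the window elements.

```latex
\begin{aligned}
dO_{1k} &= (W_{1k} - W_{1,k-1}) * K = O_{1k} - O_{1,k-1}, \qquad k = 2, 3, 4, \\
O_{12} &= O_{11} + dO_{12}, \\
O_{13} &= O_{11} + dO_{12} + dO_{13}, \\
O_{14} &= O_{11} + dO_{12} + dO_{13} + dO_{14}.
\end{aligned}
```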

FIG. 8 is a diagram for explaining a convolution operation using differential windows according to an embodiment.

Window 0, window 1, and window 2 are raw windows. Differential window 1 is the differential window of window 1, obtained by subtracting window 0 from window 1, and differential window 2 is the differential window of window 2, obtained by subtracting window 1 from window 2.

In FIG. 8, the raw windows (window 0, window 1, and window 2), the differential windows (differential window 1 and differential window 2), and the kernel are all two-dimensional array data having a size of 2x2 elements.

In ordinary convolution processing, each element of the output feature map is computed by performing a convolution operation between each of the raw windows and the kernel. For example, the element 388 of the output feature map corresponding to window 1 is computed by summing the products 47x2, 47x1, 49x3, and 50x2 of the elements 47, 47, 49, and 50 of window 1 and the elements 2, 1, 3, and 2 at the corresponding positions of the kernel.

In contrast, in the convolution scheme using differential windows, only window 0, which is the reference window among the raw windows, is convolved directly with the kernel. For example, the element of the output feature map corresponding to window 1 is computed by adding the result of the convolution operation between differential window 1 and the kernel to 373 (903), the result of the convolution operation between window 0 and the kernel.

First, the elements 2, 0, 3, and 1 of differential window 1 are computed by subtracting the elements 45, 47, 46, and 49 of the adjacent window 0 from the elements 47, 47, 49, and 50 of window 1, respectively. Next, the result 15 (904) of the convolution operation between differential window 1 and the kernel is computed by summing the products 2x2, 0x1, 3x3, and 1x2 of the elements 2, 0, 3, and 1 of differential window 1 and the elements 2, 1, 3, and 2 at the corresponding positions of the kernel. In the same manner, the elements 0, -1, 1, and -2 of differential window 2 are computed by subtracting the elements at the corresponding positions of the adjacent window 1 from the elements of window 2, and the result of the convolution operation between differential window 2 and the kernel is computed as -2.

The element of the output feature map corresponding to window 1 is computed as 388 (905) by adding the result 15 (904) of the convolution operation between differential window 1 and the kernel to the element 373 (903) of the output feature map corresponding to window 0, which is the reference window. This is identical to the element 388 (902) of the output feature map corresponding to window 1 computed using the ordinary convolution operation.

Likewise, the element of the output feature map corresponding to window 2 is computed as 386 by cascaded summation: the element 373 (903) of the output feature map corresponding to the reference window 0 and the result 15 (904) of the convolution operation between differential window 1 and the kernel are added, and then -2, the result of the convolution operation between differential window 2 and the kernel, is added in turn to that sum.
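
The FIG. 8 arithmetic can be reproduced with a few lines of Python. The variable names are chosen for this sketch, and each 2x2 array is flattened in the order in which the text lists its elements. The raw values of window 2 are not stated in the text; the sketch derives them from window 1 and differential window 2 purely as a cross-check.

```python
def dot(window, kernel):
    """Sum of products of elements at corresponding positions."""
    return sum(w * k for w, k in zip(window, kernel))


kernel = [2, 1, 3, 2]
window0 = [45, 47, 46, 49]   # reference window
window1 = [47, 47, 49, 50]

diff1 = [b - a for a, b in zip(window0, window1)]
diff2 = [0, -1, 1, -2]       # differential window 2 as given in the text

out0 = dot(window0, kernel)          # direct convolution of the reference window
out1 = out0 + dot(diff1, kernel)     # reference element + differential result
out2 = out1 + dot(diff2, kernel)     # cascaded sum for window 2

# Cross-check: window 2 reconstructed from window 1 and differential window 2.
window2 = [w + d for w, d in zip(window1, diff2)]
```

Only `window0` is multiplied at full magnitude; the other two outputs come from products with the small differential values.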

FIG. 9 is a diagram illustrating an implementation example of a neural network processing device according to an embodiment.

In FIG. 9, the neural network processing device may include a plurality of input feature map memories (1200, 1201, 1215) in which input feature maps are stored, a weight memory (1300) in which the weights of the kernel are stored, a plurality of convolution units (CUs) (1100, 1115) that perform convolution operations, and differential reconstruction units (DUs) that perform cascading summation operations to compute elements of the output feature map from the results of convolution operations using differential windows.

Different raw windows or differential windows may be stored in each of the plurality of input feature map memories (1200, 1201, 1215).

In addition, a predetermined number of CUs (1100, 1115) may be grouped into one column. Each of the different Columns may correspond to a different column of the output feature map. For example, when the size of the output feature map in the column direction is 16, Column0 through Column15 may correspond to the 16 columns of the output feature map, respectively. However, Column0 through Column15 may be grouped in various ways for receiving the input data ABin from IM0 (1200), IM1 (1201), through IM15 (1215), respectively, and processing the data in parallel, and the grouping is not limited to that of the present embodiment. In FIG. 9, to indicate the correspondence, the index of each IM, the index of each Column, and the second index in the parenthesized index of each CU are the same.

Each Column includes 16 CUs. The different CUs belonging to the same Column may each correspond to one of the plurality of channels of the input feature map and the kernel. For example, CU(0,0) (1100) may process the 1st channel of the window input from IM0 (1200), and CU(15,0) (1115) may process the 16th channel. Accordingly, 16 channels can be processed in parallel by the 16 CUs belonging to the same Column.

Each Column may include buffers. For example, the output feature map ABout computed from the window input to each Column may be stored in a buffer. An element of the output feature map computed from the currently input raw window or differential window is stored in the corresponding buffer (Curr). When an element of the output feature map is computed from the next raw window or differential window, the element held in the buffer (Curr) is moved to the buffer (Prev), and the newly computed element is stored in the buffer (Curr). The element stored in the buffer (Prev) may be summed, in a cascading manner, with the result of the convolution operation between the differential window of another Column and the kernel. For example, the element stored in the buffer (Prev) of Column15 may be added to the result of the convolution operation between the differential window of Column0 and the kernel to compute the element of the output feature map corresponding to Column0.
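The cascaded summation described above relies on the linearity of convolution: the output for a raw window equals the output of the preceding raw window plus the convolution of their element-wise difference with the kernel. The following is a minimal Python sketch of that idea, not the patented hardware. The function names `dot` and `reconstruct_outputs`, the flattening of each window into a list, and the sample values are assumptions made for illustration.

```python
def dot(window, kernel):
    """Inner product of a flattened window with a flattened kernel."""
    return sum(w * k for w, k in zip(window, kernel))


def reconstruct_outputs(raw_windows, kernel):
    """Rebuild all output elements from one raw window plus differential windows."""
    # Differential windows: element-wise difference of adjacent raw windows.
    deltas = [
        [cur - prev for prev, cur in zip(prev_w, cur_w)]
        for prev_w, cur_w in zip(raw_windows, raw_windows[1:])
    ]
    curr = dot(raw_windows[0], kernel)  # reference element from the raw window
    outputs = [curr]
    for delta in deltas:
        prev = curr                       # the Curr value moves to Prev
        curr = prev + dot(delta, kernel)  # cascaded sum gives the next element
        outputs.append(curr)
    return outputs


# Hypothetical data: three overlapping windows and one kernel, all flattened.
raw_windows = [[1, 2, 3, 4], [2, 3, 4, 6], [3, 4, 6, 5]]
kernel = [1, 0, -1, 2]
direct = [dot(w, kernel) for w in raw_windows]
assert reconstruct_outputs(raw_windows, kernel) == direct
```

The final assertion compares the reconstructed elements with a direct convolution of every raw window, which is the property the Curr and Prev buffers exploit.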

The convolution result computed by each of the plurality of CUs may be input to the corresponding DU. When the convolution result computed by a CU is the result of a convolution operation between a raw window and the kernel, the convolution result is output through the multiplexer as is. When the convolution result computed by a CU is the result of a convolution operation between a differential window and the kernel, the result of adding it to the element of the output feature map corresponding to another Column is output through the multiplexer. For example, when the ABin input from IM1 (1201) is a raw window, the result (1003) input to the DU through CU(0,1) is output unchanged through the multiplexer (1005). When the ABin input from IM1 (1201) is a differential window, the result (1004) of summing, in a cascading manner, the convolution result (1003) input from CU(0,1) and the element (1002) of the output feature map stored in the buffer (Curr) of Column0 is output through the multiplexer (1005). In addition, the elements of the output feature map computed in each of the plurality of Columns may be stored in the IMs as the input feature map of the layer following the current layer.
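The selection performed by a DU can be sketched in Python as follows. This is an illustrative software model under stated assumptions, not the circuit of FIG. 9: the names `du_select` and `run_columns`, the per-column flag marking a window as differential, and the simplification that each Column takes its neighbor element from the Column immediately before it are all hypothetical.

```python
def du_select(cu_result, neighbor_element, is_differential):
    """Model of one DU: adder followed by a two-input multiplexer."""
    summed = cu_result + neighbor_element  # adder output, cf. result (1004)
    # Multiplexer, cf. (1005): raw results pass through, differential ones are summed.
    return summed if is_differential else cu_result


def run_columns(cu_results, is_differential_flags):
    """Chain DUs across Columns; each one sees the previous Column's output."""
    outputs = []
    neighbor = 0  # ignored by the first Column, which is assumed to hold a raw window
    for cu_result, is_diff in zip(cu_results, is_differential_flags):
        out = du_select(cu_result, neighbor, is_diff)
        outputs.append(out)
        neighbor = out
    return outputs


# Hypothetical CU results: raw, delta, delta, raw, delta.
results = run_columns([6, 4, -3, 9, 1], [False, True, True, False, True])
assert results == [6, 10, 7, 9, 10]
```

A raw window restarts the chain, so an error or precision limit in one differential group does not propagate past the next raw window.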

FIG. 10 is a diagram for describing a differential window output unit (1130) according to an embodiment. FIG. 10 describes the differential window output unit (1130) that is added to the implementation example described with reference to FIG. 9, and the embodiment described above with reference to FIG. 9 may be applied to the embodiment of FIG. 10.

The differential window output unit may include a multiplexer (1150). For example, the multiplexer (1150) outputs the element of the output feature map corresponding to the Column designated by a column select (CS) signal, which designates one Column among Column0 to Column16. The element output from the multiplexer may optionally be converted into an activation value by an activation converter (1110). The element of the output feature map output from the multiplexer may be stored in a buffer (1120). The element of the output feature map stored in the buffer and the element of the output feature map of the currently selected Column are then input to a differentiator (1140). The differentiator (1140) performs a difference operation between the input elements and stores the result in the IM. Accordingly, the differential windows used in the next layer are generated in the current layer rather than separately in the next layer, which increases the efficiency of the computation.
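The data flow through the differential window output unit can be modeled with a short Python sketch. Several details here are assumptions that the text does not state: ReLU is used as the activation, the first selected element is stored as a raw value to serve as the reference, and the names `relu` and `emit_differential_outputs` are hypothetical.

```python
def relu(x):
    """Assumed activation function; the text only says 'activation converter'."""
    return x if x > 0 else 0


def emit_differential_outputs(column_outputs, activation=relu):
    """Walk the Columns in column-select order and emit values to store in the IM."""
    stored = []
    buffered = None  # models buffer (1120): the previously selected element
    for element in column_outputs:       # multiplexer (1150) selects one Column
        value = activation(element)      # optional activation converter (1110)
        if buffered is None:
            stored.append(value)         # assumption: first element kept as raw
        else:
            stored.append(value - buffered)  # differentiator (1140)
        buffered = value
    return stored


stored = emit_differential_outputs([6, 10, 7, -2, 5])
assert stored == [6, 4, -3, -7, 5]

# A running sum over the stored values recovers the activated outputs.
recovered, total = [], 0
for s in stored:
    total += s
    recovered.append(total)
assert recovered == [6, 10, 7, 0, 5]
```

Because the next layer consumes differences directly, writing them out here avoids a separate pass that would read raw activations back and subtract them.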

FIG. 11 is a graph for describing the improvement in processing speed achieved by the processing scheme according to the present disclosure.

The horizontal axis of FIG. 11 shows the neural network models to which the conventional neural network processing scheme PRA and the processing scheme according to the present embodiments, Diffy, are applied. The vertical axis shows the speedup relative to the conventional scheme VAA, which serves as the baseline for comparison. According to the graph, Diffy is about 6.1 times faster than VAA and about 1.16 times faster than PRA. In addition, in every neural network model, Diffy, the scheme implemented according to the present disclosure, processes operations faster than the conventional PRA scheme.

FIG. 12 is a graph for describing the improvement in frame rate achieved by the processing scheme according to the present disclosure.

FIG. 12 is a graph comparing the frame rate, in HD frames per second (FPS), of the conventional processing schemes VAA and PRA and of Diffy, an implementation example of the present disclosure, for each of the neural network models DnCNN, FFDNet, IRCNN, JointNet, VDSR, and Geom. As the graph shows, Diffy significantly increases FPS compared with the conventional schemes. In the JointNet model, Diffy achieves performance close to 30 FPS. This indicates that Diffy is better suited than the comparative examples to image-related applications running on terminals such as smartphones.

Diffy is a DC-based architecture that improves the performance and energy efficiency of computational imaging deep neural networks (CI-DNNs) and other convolutional neural networks (CNNs). By using differential values, Diffy can reduce the required storage capacity of on-chip and off-chip memory and reduce traffic. In addition, when applied to state-of-the-art CI-DNNs, Diffy can perform 1K 16 × 16b multiply-accumulate operations per cycle, which improves performance by 7.1 times and 1.41 times compared with VAA and PRA, respectively. Depending on the target application, Diffy can process HD frames at 3.9 to 28.5 FPS. This is a significant improvement over the 0.7 to 3.9 FPS of VAA and the 2.6 to 18.9 FPS of PRA. Compared with a scheme that dynamically determines per-group precision for raw values, Diffy can reduce on-chip memory storage by 32% and off-chip traffic by 1.43x.

Meanwhile, the embodiments of the present invention described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program using a computer-readable recording medium. In addition, the structure of the data used in the embodiments described above can be recorded on a computer-readable recording medium by various means. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk) and optical reading media (e.g., CD-ROM, DVD).

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered from an illustrative rather than a restrictive perspective. The scope of the present invention is indicated by the claims, not by the foregoing description, and all differences within the equivalent scope should be interpreted as being included in the present invention.

Claims

A method of processing a convolutional neural network in a neural network processing device, the method comprising:
grouping a plurality of raw windows of an input feature map into a plurality of differential groups for a differential operation;
generating differential windows by performing the differential operation between the raw windows belonging to each of the plurality of differential groups;
obtaining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between the reference raw window and a kernel; and
obtaining remaining elements of the output feature map by performing a summation operation with the reference element on results of convolution operations between each of the differential windows and the kernel,
wherein the plurality of raw windows are sub-feature maps of the input feature map, and
wherein the differential operation is an element-wise difference operation between two raw windows belonging to each of the plurality of differential groups.

The method of claim 1,
wherein the raw windows are determined from the input feature map in a sliding window fashion, and
wherein each of the plurality of differential groups is grouped to include two raw windows that are adjacent in the sliding direction of the sliding window fashion.

The method of claim 2,
wherein the generating of the differential windows comprises
generating the differential windows by performing an element-wise difference operation between the two adjacent raw windows belonging to each of the plurality of differential groups.

The method of claim 2,
wherein the obtaining of the remaining elements comprises
obtaining the remaining elements of the output feature map by performing a summation operation between the reference element and a cascaded summation result of the results of the convolution operations between each of the differential windows and the kernel.

The method of claim 4,
wherein the cascaded summation result is
a result of summing the result of the convolution operation between a current differential window and the kernel with the results of the convolution operations corresponding to one or more previous differential windows that precede the current differential window along the sliding direction.

The method of claim 1,
further comprising, when each of the differential windows is in a bit data format, converting each of the differential windows into a data format that includes information on the digit positions of the significant bits representing a bit value of 1,
wherein the results of the convolution operations between each of the differential windows and the kernel are
results of convolution operations between each of the differential windows converted into the data format and the kernel.

The method of claim 6,
wherein the results of the convolution operations between each of the differential windows converted into the data format and the kernel are
results obtained from bit-shift operations based on the information on the digit positions of the significant bits.

The method of claim 7,
further comprising preprocessing the differential windows according to a Booth algorithm that reduces the number of significant bits, as preprocessing for the conversion into the data format.

A neural network processing device comprising:
a memory in which an input feature map is stored; and
a neural network processor configured to process a convolutional neural network using the input feature map stored in the memory,
wherein the neural network processor is configured to:
group a plurality of raw windows of the input feature map into a plurality of differential groups for a differential operation;
generate differential windows by performing the differential operation between the raw windows belonging to each of the plurality of differential groups;
obtain a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between the reference raw window and a kernel; and
obtain remaining elements of the output feature map by performing a summation operation with the reference element on results of convolution operations between each of the differential windows and the kernel,
wherein the plurality of raw windows are sub-feature maps of the input feature map, and
wherein the differential operation is an element-wise difference operation between two raw windows belonging to each of the plurality of differential groups.

The device of claim 9,
wherein the raw windows are determined from the input feature map in a sliding window fashion, and
wherein each of the plurality of differential groups is grouped to include two raw windows that are adjacent in the sliding direction of the sliding window fashion.

The device of claim 10,
wherein the neural network processor is configured to
generate the differential windows by performing an element-wise difference operation between the two adjacent raw windows belonging to each of the plurality of differential groups.

The device of claim 10,
wherein the neural network processor is configured to
obtain the remaining elements of the output feature map by performing a summation operation between the reference element and a cascaded summation result of the results of the convolution operations between each of the differential windows and the kernel.

The device of claim 12,
wherein the cascaded summation result is
a result of summing the result of the convolution operation between a current differential window and the kernel with the results of the convolution operations corresponding to one or more previous differential windows that precede the current differential window along the sliding direction.

The device of claim 9,
wherein the neural network processor is configured to,
when each of the differential windows is in a bit data format, convert each of the differential windows into a data format that includes information on the digit positions of the significant bits representing a bit value of 1, and
wherein the results of the convolution operations between each of the differential windows and the kernel are
results of convolution operations between each of the differential windows converted into the data format and the kernel.

The device of claim 14,
wherein the results of the convolution operations between each of the differential windows converted into the data format and the kernel are
results obtained from bit-shift operations based on the information on the digit positions of the significant bits.

The device of claim 15,
wherein the neural network processor is configured to
preprocess the differential windows according to a Booth algorithm that reduces the number of significant bits, as preprocessing for the conversion into the data format.
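
As an illustration of the data format and bit-shift operation recited in the claims above, the following Python sketch represents a differential value by the positions of its bits that equal 1 and multiplies it by a weight using only shifts and additions. It is a simplified software model under stated assumptions: the sign-plus-magnitude handling of negative differences and the names `significant_bit_positions` and `shift_multiply` are hypothetical, and the Booth preprocessing step is not modeled.

```python
def significant_bit_positions(value):
    """Return (sign, positions): positions of the 1 bits in |value|, lowest first."""
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    positions = [i for i in range(magnitude.bit_length()) if (magnitude >> i) & 1]
    return sign, positions


def shift_multiply(value, weight):
    """Multiply value by weight using one shift-and-add per significant bit."""
    sign, positions = significant_bit_positions(value)
    return sign * sum(weight << p for p in positions)


assert significant_bit_positions(5) == (1, [0, 2])
assert shift_multiply(5, 7) == 35
assert shift_multiply(-6, 3) == -18
```

Differential values tend to be small, so they have few bits equal to 1, and the number of shift-and-add steps per multiplication shrinks accordingly.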