KR102710479B1

KR102710479B1 - Apparatus and method for accelerating neural network inference based on efficient address translation

Info

Publication number: KR102710479B1
Application number: KR1020220023476A
Authority: KR
Inventors: 김태환; 이수정; 최경찬
Original assignee: 한국항공대학교산학협력단 (Korea Aerospace University Industry-Academic Cooperation Foundation)
Priority date: 2022-02-23
Filing date: 2022-02-23
Publication date: 2024-09-25
Anticipated expiration: 2042-02-23
Also published as: KR20230126388A

Abstract

Disclosed are a neural network inference acceleration device and method having an efficient address translation function. According to one embodiment, the method may include: (a) obtaining location information and dimension parameters associated with an operation performed in a given layer of a neural network; and (b) generating address information for the features and weights of the layer based on the location information and the dimension parameters.

Description

{APPARATUS AND METHOD FOR ACCELERATING NEURAL NETWORK INFERENCE BASED ON EFFICIENT ADDRESS TRANSLATION}

The present invention relates to a neural network inference acceleration device and method having an efficient address translation function; for example, to a low-complexity convolutional neural network inference acceleration circuit with a reconfigurable address translation function, and a method of driving it.

This research was supported by a grant from the Gyeonggi Regional Research Center (GRRC) program for the project “3D Spatial Data Processing and Application Technology Research” (research period: 2020-07-01 to 2021-06-30; research management agency: Gyeonggi Province; host organization: Korea Aerospace University Industry-Academic Cooperation Foundation; project number: GRRC-2017-B02).

A convolutional neural network (CNN) is a deep learning technique used for tasks such as image analysis. Its network structure consists of an input layer, an output layer, and multiple hidden layers between them. Each layer is composed of convolution, pooling, and activation functions, and repeating these layers extracts and classifies the features of an image. The convolution operation slides the filter window (the convolution weights) across the input data at a fixed interval and computes the sum of products between the filter and the input values it covers at each position.
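For readers less familiar with the operation, the sliding-window computation can be sketched in Python as follows. This is a minimal single-channel illustration, assuming no padding and a configurable stride (the "fixed interval"); the function name `conv2d` and the list-of-lists data layout are illustrative choices, not taken from the patent. As in most CNN frameworks, the kernel is applied without flipping.

```python
def conv2d(image, kernel, stride=1):
    """Single-channel 2-D convolution (no padding, no kernel flip) over lists of lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh = (ih - kh) // stride + 1  # number of window positions vertically
    ow = (iw - kw) // stride + 1  # number of window positions horizontally
    out = [[0] * ow for _ in range(oh)]
    for oy in range(oh):
        for ox in range(ow):
            acc = 0
            # Sum of products between the kernel and the window it currently covers.
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
            out[oy][ox] = acc
    return out


image = [[4 * y + x for x in range(4)] for y in range(4)]
print(conv2d(image, [[1, 0], [0, 1]], stride=2))  # [[5, 9], [21, 25]]
```

With a stride of 2 the 2 x 2 window visits four non-overlapping positions of the 4 x 4 input, so the output is 2 x 2.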

Convolution is performed in every layer. Because it is repeated so many times, it accounts for the largest share of the computation time in image inference, and in hardware implementations it tends to lower the achievable operating frequency.

A CNN inference acceleration circuit performs address generation, memory reads, the inference computation itself, and memory writes, processing image inference one layer at a time. Address generation, which produces the addresses for loading input features and weights from memory and the addresses for storing the inferred output features, runs before the convolution of every layer. Consequently, if address generation is implemented in a complex way, it significantly slows inference each time a convolution is performed.

The background art of the present application is disclosed in Korean Registered Patent No. 10-2107077.

The present invention aims to solve the above problems of the prior art by providing a neural network inference acceleration device and method with an efficient address translation function that compensates for the inference slowdown caused by address generation, one of the operations performed by a neural network inference acceleration circuit (accelerator).

However, the technical problems addressed by the embodiments of the present invention are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above objective, a neural network inference acceleration method with an efficient address translation function according to one embodiment of the present invention may include: (a) obtaining location information and dimension parameters associated with an operation performed in a given layer of a neural network; and (b) generating address information for the features and weights of the layer based on the location information and the dimension parameters.

Step (b) may include: (b1) determining a layout of the address information based on the dimension parameters; and (b2) performing a predetermined reference operation on each of a plurality of coordinates included in the location information to determine an address value corresponding to the layout.

The reference operations may include shift, AND, and OR operations.

Step (b) may generate first address information for the input features of the layer, second address information for the weights, and third address information for the output features of the layer.

The first and second address information may be the addresses used to read the input features and the weights, respectively, from a memory that stores data associated with the neural network.

The third address information may be the address used to store the result of the neural network operation in the memory.

The neural network may be a convolutional neural network (CNN).

In step (a), the dimension parameter for the features may correspond to three dimensions, and the dimension parameter for the weights may correspond to four dimensions.

Meanwhile, a neural network inference acceleration method with an efficient address translation function according to one embodiment of the present invention may include: (a) generating address information for the input features, weights, and output features of a given layer of a neural network, based on location information and dimension parameters associated with an operation performed in that layer; (b) reading the input features and the weights, based on the address information, from a memory that stores data associated with the neural network; (c) performing the operation using the input features and the weights; and (d) storing the result of the operation in the memory based on the address information.
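The four steps can be pictured with the following Python sketch of one convolution layer. It is an illustration under stated assumptions, not the patented circuit: separate Python lists stand in for regions of the memory, tensors are ordered outermost dimension first (C, H, W for features; output channel, input channel, kernel row, kernel column for weights), each dimension gets just enough address bits for its size, and the helper names `pack` and `run_conv_layer` are hypothetical. For brevity the sketch interleaves address generation with the reads and writes instead of generating all addresses up front.

```python
def pack(coords, dims):
    """Pack coordinates (outermost dimension first) into one address with shift/AND/OR.

    Each dimension gets just enough bits for its size, so sizes are effectively
    rounded up to a power of two (hypothetical layout policy).
    """
    addr, shamt = 0, 0
    for c, size in zip(reversed(coords), reversed(dims)):
        width = (size - 1).bit_length()
        mask = ((1 << width) - 1) << shamt
        addr |= (c << shamt) & mask
        shamt += width
    return addr


def run_conv_layer(mem_i, mem_f, mem_o, i_dims, f_dims, o_dims, stride=1):
    """Steps (a)-(d) for one convolution layer, one output element at a time."""
    _, oh, ow = o_dims
    oc_n, ic_n, kh, kw = f_dims
    for oc in range(oc_n):
        for oy in range(oh):
            for ox in range(ow):
                acc = 0
                for ic in range(ic_n):
                    for ky in range(kh):
                        for kx in range(kw):
                            # (a) generate the input-feature and weight addresses
                            i_addr = pack((ic, oy * stride + ky, ox * stride + kx), i_dims)
                            f_addr = pack((oc, ic, ky, kx), f_dims)
                            # (b) read operands from memory, (c) multiply-accumulate
                            acc += mem_i[i_addr] * mem_f[f_addr]
                # (a) output address, then (d) write the result back to memory
                mem_o[pack((oc, oy, ox), o_dims)] = acc


mem_i = list(range(16))  # 1 x 4 x 4 input feature map
mem_f = [1, 0, 0, 1]     # 1 x 1 x 2 x 2 kernel
mem_o = [0] * 4          # 1 x 2 x 2 output feature map
run_conv_layer(mem_i, mem_f, mem_o, (1, 4, 4), (1, 1, 2, 2), (1, 2, 2), stride=2)
print(mem_o)  # [5, 9, 21, 25]
```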

Meanwhile, a neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention may include a parameter setting unit that obtains location information and dimension parameters associated with an operation performed in a given layer of a neural network, and an address translation unit that generates address information for the features and weights of the layer based on the location information and the dimension parameters.

The address translation unit may include a layout generation unit that determines a layout of the address information based on the dimension parameters, and a translation execution unit that performs a predetermined reference operation on each of the coordinates included in the location information to determine the address values corresponding to the layout.

The address translation unit may generate first address information for the input features of the layer, second address information for the weights, and third address information for the output features of the layer.

Meanwhile, a neural network accelerator with an efficient address translation function according to one embodiment of the present invention may include: an address generation module that generates address information for the input features, weights, and output features of a given layer of a neural network, based on location information and dimension parameters associated with an operation performed in that layer; a search module that reads the input features and the weights, based on the address information, from a memory storing data associated with the neural network; a calculation module that performs the operation using the input features and the weights; and a recording module that stores the result of the operation in the memory based on the address information.

The means for solving the problem described above are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, further embodiments may exist in the drawings and the detailed description.

According to the above-described means, a neural network inference acceleration device and method can be provided with an efficient address translation function that compensates for the inference slowdown caused by address generation, one of the operations performed by a neural network inference acceleration circuit (accelerator).

According to the above-described means, because the size of the filter window can vary with characteristics of the input data such as resolution, address translation is designed to remain efficient for multidimensional data as well.

According to the above-described means, address generation is built from relatively simple operations such as shift and AND operations, which reduces its complexity and effectively shortens the inference time of the neural network.

According to the above-described means, addresses are generated flexibly from the location and dimension parameters of a feature, so address translation can be performed easily regardless of the dimensionality of the data.

However, the effects obtainable from the present invention are not limited to those described above, and other effects may exist.

FIG. 1 is a schematic diagram of a neural-network-based inference system including a neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.
FIG. 2 is a schematic diagram of a neural network accelerator including the neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.
FIG. 3 is a conceptual diagram illustrating how address information is translated by the neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.
FIG. 4 is a schematic diagram of the neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.
FIG. 5 is an operational flowchart of a neural network inference acceleration method with an efficient address translation function according to the first embodiment of the present invention.
FIG. 6 is an operational flowchart of a neural network inference acceleration method with an efficient address translation function according to the second embodiment of the present invention.
FIG. 7 is a detailed operational flowchart of the neural network inference acceleration method with an efficient address translation function according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art can easily practice them. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on," "over," "at the top of," "under," "beneath," or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a schematic diagram of a neural-network-based inference system including a neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.

Referring to FIG. 1, the neural-network-based inference system (1000) disclosed herein may include a neural network accelerator (10) according to one embodiment of the present invention and a memory (20) that stores data for neural network operations, output data resulting from neural network operations (inference), and the like.

In the description of the embodiments, the neural network may refer to a convolutional neural network (CNN) that performs inference on inputs such as image data, but it is not limited thereto. Depending on the implementation, it may broadly include various types of artificial neural networks that are already known or may be developed in the future, such as deep neural networks (DNNs) and recurrent neural networks (RNNs).

FIG. 2 is a schematic diagram of a neural network accelerator including the neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.

Referring to FIG. 2, the neural network accelerator (10) according to one embodiment of the present invention may include an address generation module (100), a search module (200), a calculation module (300), and a recording module (400). For reference, the neural network inference acceleration device (100) with an efficient address translation function according to one embodiment (hereinafter, the 'inference acceleration device (100)') may also be called the address generation module (100), because it is responsible for generating the address information associated with the input and output data of the neural network's inference process. That is, in the description of the embodiments, reference numeral 100 is used interchangeably for the 'inference acceleration device (100)' and the 'address generation module (100)'.

More specifically, as described in detail below, the inference acceleration device (100), or address generation module (100), can generate address information for the input features, weights, and output features of a given layer of the neural network ('I addr.', 'F addr.', and 'O addr.' in FIG. 1) based on location information and dimension parameters associated with the operation performed in that layer.

Referring again to FIG. 1, the search module (200) can read the input features ('I' in FIG. 1) and the weights ('F' in FIG. 1) from the memory (20), which stores data associated with the neural network, based on the generated address information.

The calculation module (300) can perform the layer's operation using the input features and weights read from the memory (20). Referring to FIG. 1, the operations performed by the calculation module (300) may include, but are not limited to, convolution, activation (an activation function), and pooling (e.g., max pooling).
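The operations that follow convolution can be illustrated with the sketch below. The patent does not name a specific activation function; ReLU is assumed here purely for illustration, and the 2 x 2, stride-2 max-pooling window is likewise an assumed configuration. The function names are hypothetical.

```python
def relu(fmap):
    """Element-wise ReLU activation (assumed activation function)."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    oh = (len(fmap) - size) // stride + 1
    ow = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[oy * stride + dy][ox * stride + dx]
                for dy in range(size) for dx in range(size))
            for ox in range(ow)
        ]
        for oy in range(oh)
    ]


fmap = [[1, -2, 3, 4], [5, 6, -7, 8], [9, 10, 11, -12], [13, 14, 15, 16]]
print(max_pool2d(relu(fmap)))  # [[6, 8], [14, 16]]
```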

Hereinafter, the specific functions and operations of the inference acceleration device (100), including its address information generation process, are described in detail with reference to FIG. 3.

FIG. 3 is a conceptual diagram illustrating how address information is translated by the neural network inference acceleration device with an efficient address translation function according to one embodiment of the present invention.

Referring to FIG. 3, the inference acceleration device (100) can obtain location information and dimension parameters associated with the operation performed in a given layer of the neural network.

Specifically, in the description of the embodiments, the location information may include the coordinates, on the input data (e.g., an input image), of the input features on which the layer's operation is performed, as well as the coordinates of the weights. Note that the weights may also be referred to as filters or kernels.

The dimension parameters may refer to the tensor dimensions of the features (both input and output features) and of the weights used to convolve the input features. For example, the dimension parameter for the features may correspond to three dimensions and that for the weights to four dimensions, but the invention is not limited thereto.

Referring to the 'Address' portion of FIG. 3, when a tensor is four-dimensional, the address layout is generated to fit the sizes of the four dimensions, and the address information for each dimension is stored at the position assigned to that dimension. When a tensor is three-dimensional, the layout is generated to fit the sizes of the three dimensions, which corresponds to the 'Address' portion of FIG. 3 with the dotted-line part removed, and the address information for each dimension is likewise stored at its assigned position.
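One way to realize this layout step is sketched below: each dimension receives just enough bits to index its size, fields are stacked from the innermost dimension upward, and the resulting shift amounts and masks are the constants used later in the translation. The outermost-first dimension order (C, H, W for features; output channel, input channel, kernel row, kernel column for weights) and the function name `make_layout` are assumptions; the patent does not specify them. Sizes that are not powers of two are rounded up, which leaves some addresses unused.

```python
def make_layout(dims):
    """Return one (shamt, mask) pair per dimension, outermost dimension first."""
    fields, shamt = [], 0
    for size in reversed(dims):          # innermost dimension gets the low bits
        width = (size - 1).bit_length()  # bits needed to index 0..size-1
        fields.append((shamt, ((1 << width) - 1) << shamt))
        shamt += width
    return list(reversed(fields))


feature_layout = make_layout((16, 32, 32))  # 3-D feature: C, H, W
weight_layout = make_layout((8, 16, 3, 3))  # 4-D weight: OC, IC, KH, KW
print(feature_layout)  # [(10, 15360), (5, 992), (0, 31)]
print(weight_layout)   # [(8, 1792), (4, 240), (2, 12), (0, 3)]
```

In this sketch, dropping the outermost field of a 4-D layout gives exactly the 3-D layout of its inner three dimensions, which is one way to read the 3-D versus 4-D relationship described for FIG. 3.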

In this regard, a CNN must apply filters of different resolutions depending on the size and characteristics of the input data used for inference. Because the inference acceleration device (100) disclosed herein generates address information flexibly from the feature location and the dimension parameters, it scales to multidimensional data. It can therefore generate address information efficiently even for multidimensional data, which can significantly improve the overall inference speed of the neural network.

Based on the acquired (configured) location information and dimension parameters, the inference acceleration device (100) can generate address information for the features and weights of a given layer of the neural network. More specifically, it can generate first address information for the layer's input features ('I addr.' in FIG. 1), second address information for the weights ('F addr.' in FIG. 1), and third address information for the layer's output features ('O addr.' in FIG. 1).

The first and second address information are used to read the input features ('I' in FIG. 1) and the weights ('F' in FIG. 1), respectively, from the memory (20) storing data associated with the neural network, and the third address information may be used to store the operation result ('O' in FIG. 1) in the memory (20).

According to one embodiment, the inference acceleration device (100) determines the layout of the address information to be generated based on the dimension parameters, as described above, and performs a predetermined reference operation on each of the coordinates included in the location information to determine the address value corresponding to that layout.

According to one embodiment, the reference operations applied to the coordinates in the location information to determine the address value may include shift, AND, and OR operations.

Conventional CNN inference acceleration circuits performed address generation mainly with adders and multipliers. As a result, address generation lay on the critical path of the whole circuit, limiting the operating frequency, and addresses were hard to adapt to different feature dimensions, which made it difficult to handle diverse images. In contrast, the inference acceleration device (100) disclosed herein implements address generation with shift, AND, and OR operations, greatly reducing its complexity.
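Why shifts and ORs can stand in for adders and multipliers can be checked directly, assuming each dimension size is a power of two (or is padded to one): multiplying by 2^k is a left shift by k, and because the shifted fields occupy disjoint bits, adding them never produces a carry, so addition reduces to OR. The row-major formula and the function names below are illustrative assumptions, not the specific conventional circuit the patent refers to.

```python
def addr_mul_add(c, y, x, height, width):
    """Row-major address computed with multipliers and adders."""
    return (c * height + y) * width + x


def addr_shift_or(c, y, x, h_bits, w_bits):
    """Same address when height == 2**h_bits and width == 2**w_bits."""
    return (c << (h_bits + w_bits)) | (y << w_bits) | x


H_BITS, W_BITS = 3, 4  # height 8, width 16
assert all(
    addr_mul_add(c, y, x, 1 << H_BITS, 1 << W_BITS) == addr_shift_or(c, y, x, H_BITS, W_BITS)
    for c in range(4) for y in range(1 << H_BITS) for x in range(1 << W_BITS)
)
print(addr_shift_or(2, 5, 9, H_BITS, W_BITS))  # 345
```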

Specifically, the inference acceleration device (100) determines (defines) a shift value and a mask value for each coordinate in the location information based on the dimension parameters, shifts each coordinate by its shift value, and uses that coordinate's mask value to place the result in preset bits of the address. In this way, the location of a feature or weight is converted into per-coordinate address values that together form its address.

For example, the address translation process for an input feature whose dimension parameter corresponds to three dimensions can be described by Equation 1 below.

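[Formula 1]

$$
\begin{aligned}
\mathit{x\_trans\_addr} &= (x \ll \mathit{x\_shamt}) \mathbin{\&} \mathit{x\_mask} \\
\mathit{y\_trans\_addr} &= (y \ll \mathit{y\_shamt}) \mathbin{\&} \mathit{y\_mask} \\
\mathit{z\_trans\_addr} &= (z \ll \mathit{z\_shamt}) \mathbin{\&} \mathit{z\_mask} \\
\mathit{I\_addr} &= \mathit{x\_trans\_addr} \mathbin{|} \mathit{y\_trans\_addr} \mathbin{|} \mathit{z\_trans\_addr}
\end{aligned}
$$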

Here, << denotes the shift operator, & the AND operator, and | the OR operator. x, y, and z are the coordinate values of the input-feature location information; x_shamt, y_shamt, and z_shamt are the shift values by which each coordinate is shifted; and x_mask, y_mask, and z_mask are mask values, i.e., constants that confine each coordinate to its assigned bits. Using the shift and AND operators, the given coordinates are converted into the per-coordinate address values x_trans_addr, y_trans_addr, and z_trans_addr, which are then combined with the OR operator into the first address information, i.e., the address information of the input feature.
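
Formula 1 can be sketched in Python as follows. The bit positions are a hypothetical example layout (x in bits 0-3, y in bits 4-6, z in bits 7-8, i.e., a 16 x 8 x 4 input feature map); the patent gives no concrete values.

```python
# Hypothetical example layout, not taken from the patent.
X_SHAMT, Y_SHAMT, Z_SHAMT = 0, 4, 7
X_MASK, Y_MASK, Z_MASK = 0x00F, 0x070, 0x180


def translate_input_feature(x: int, y: int, z: int) -> int:
    """Formula 1: shift each coordinate into place, AND it with its mask,
    then OR the per-coordinate fields into the first address information."""
    x_trans_addr = (x << X_SHAMT) & X_MASK
    y_trans_addr = (y << Y_SHAMT) & Y_MASK
    z_trans_addr = (z << Z_SHAMT) & Z_MASK
    return x_trans_addr | y_trans_addr | z_trans_addr


print(translate_input_feature(5, 2, 1))     # 165 (0b01_010_0101)
print(translate_input_feature(0x1F, 0, 0))  # 15: the mask drops bits outside x's field
```

The second call shows a side effect of this construction: an out-of-range coordinate is silently truncated by its mask and aliases to another address instead of raising an error.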

As another example, the address translation process for a weight whose dimension parameter corresponds to four dimensions can be described by Formula 2 below.

[Formula 2]

Referring to Formula 2, the shift values and mask values for the weight are likewise given as constants, in the same way as for the input feature, and are used to convert the location information of the weight into the second address information. The difference is that the input feature described with Formula 1 is three-dimensional, whereas the dimension parameter of the weight is four-dimensional. Increasing the number of dimensions changes only the layout of the address information; the address information is still generated with the shift, AND, and OR operators, exactly as in the three-dimensional case.
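
Because only the layout changes with the number of dimensions, a single translation routine can serve both the 3-D input feature and the 4-D weight. In the sketch below, the weight coordinate names (kx, ky, c_in, c_out) and all bit widths are hypothetical; the patent does not name the four weight coordinates.

```python
from typing import Sequence


def translate(coords: Sequence[int], shamts: Sequence[int], masks: Sequence[int]) -> int:
    """Shift each coordinate, AND it with its mask, and OR all fields together."""
    if not (len(coords) == len(shamts) == len(masks)):
        raise ValueError("need exactly one shift value and one mask per coordinate")
    addr = 0
    for coord, shamt, mask in zip(coords, shamts, masks):
        addr |= (coord << shamt) & mask
    return addr


# 3-D input feature (x, y, z) with a 4 + 3 + 2-bit layout.
print(translate((5, 2, 1), (0, 4, 7), (0x00F, 0x070, 0x180)))  # 165
# 4-D weight (kx, ky, c_in, c_out) with a 2 + 2 + 3 + 3-bit layout: one more field.
print(translate((1, 2, 3, 4), (0, 2, 4, 7), (0x003, 0x00C, 0x070, 0x380)))  # 569
```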

This is reflected in the address layouts shown in FIG. 3 described above: compared with a three-dimensional feature, the address layout for a four-dimensional one is larger by the field of the added dimension.

Meanwhile, once the inference acceleration device (100) has converted the locations of the features and weights into addresses as described in detail above, the neural network accelerator (10) uses the generated address information to load the input features and weights from the memory (20). With the loaded data (i.e., the input features and weights), it computes the convolution (multiply-accumulate) and performs further operations such as the activation function and pooling, which completes the inference computation for the layer. The resulting output feature values are then stored in the memory (20) using the third address information generated by the inference acceleration device (100).

In summary, the neural network accelerator (10) including the inference acceleration device (100) disclosed herein first defines the locations of the tensors and then generates the addresses of the input features, output features, and weights from the defined location information and dimension parameter values. Specifically, the address generation module (100) builds the address layout from the dimension parameters, and the location information is converted into addresses that fit this layout. The neural network accelerator (10) then uses the generated address information to load the input features and weights from the memory (20), convolves them, and applies the activation function and pooling to the convolution result. Once the output feature of the layer has been produced in this way, its values are written to the memory (20) at the address information generated for the output feature at the start (i.e., the third address information), and the output feature is in turn used as the input feature for inference in the next layer. The process above summarizes inference for a single layer; by repeating it over the layers, the accelerator derives the inference result for the input data (e.g., image data).
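
The per-layer flow summarized above (generate addresses, load, convolve, activate, pool, store) can be sketched for a single-channel 2-D layer as below. Everything here is a hypothetical simplification: memory regions are Python dicts keyed by address, the convolution is a "valid" k x k convolution, the activation is ReLU, and pooling is 2 x 2 max pooling.

```python
from typing import Dict

Memory = Dict[int, float]  # hypothetical stand-in for a region of memory (20)


def addr2d(x: int, y: int, x_bits: int) -> int:
    """Shift/AND/OR translation for a 2-D map: x in the low x_bits bits, y above."""
    return (y << x_bits) | (x & ((1 << x_bits) - 1))


def run_layer(feat: Memory, wts: Memory, out: Memory,
              size: int, k: int, x_bits: int, k_bits: int) -> int:
    """One layer: k x k convolution, ReLU, 2 x 2 max pooling.
    Returns the width/height of the pooled output map."""
    conv_size = size - k + 1
    conv = [[0.0] * conv_size for _ in range(conv_size)]
    for oy in range(conv_size):
        for ox in range(conv_size):
            acc = 0.0
            for ky in range(k):
                for kx in range(k):
                    i_addr = addr2d(ox + kx, oy + ky, x_bits)  # first address information
                    f_addr = addr2d(kx, ky, k_bits)            # second address information
                    acc += feat[i_addr] * wts[f_addr]          # load and multiply-accumulate
            conv[oy][ox] = max(acc, 0.0)                       # ReLU activation
    pooled = conv_size // 2
    for py in range(pooled):
        for px in range(pooled):
            v = max(conv[2 * py + dy][2 * px + dx] for dy in (0, 1) for dx in (0, 1))
            out[addr2d(px, py, x_bits)] = v                    # store via third address information
    return pooled


size, k = 5, 2
feat = {addr2d(x, y, 3): float(x + y) for y in range(size) for x in range(size)}
wts = {addr2d(x, y, 1): 1.0 for y in range(k) for x in range(k)}
out: Memory = {}
n = run_layer(feat, wts, out, size, k, x_bits=3, k_bits=1)
print(n, [out[addr2d(x, y, 3)] for y in range(n) for x in range(n)])  # 2 [12.0, 20.0, 20.0, 28.0]
```

Because the output is written with the same address layout used to read the input (`x_bits=3`), `out` could be passed back as `feat` for the next layer, mirroring the layer-by-layer repetition described above.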

FIG. 4 is a schematic block diagram of a neural network inference acceleration device with an efficient address translation function according to an embodiment of the present invention.

Referring to FIG. 4, the inference acceleration device (100), also referred to as the address generation module (100), may include a parameter setting unit (110) and an address conversion unit (120). As also shown in FIG. 4, the address conversion unit (120) may include a layout generation unit (121) and a conversion performing unit (122).

The parameter setting unit (110) may obtain location information and dimension parameters associated with the operation performed in a given layer of the neural network.

For example, the parameter setting unit (110) may set (determine) the dimension parameter for a feature (e.g., an input feature) to correspond to three dimensions and the dimension parameter for a weight to correspond to four dimensions, although it is not limited thereto.

The address conversion unit (120) may generate address information for the features and weights of the layer based on the location information and dimension parameters described above.

More specifically, the address conversion unit (120) may generate first address information (I addr.) for the input features of a given layer of the neural network, second address information (F addr.) for the weights, and third address information (O addr.) for the output features of that layer.
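
To illustrate how the three address streams relate inside a convolution loop, the sketch below computes I addr., F addr., and O addr. for one multiply-accumulate step. The coordinate order, bit widths, and power-of-two stride handling are hypothetical choices; only the shift/AND/OR packing follows the description.

```python
from typing import Sequence, Tuple


def pack(coords: Sequence[int], widths: Sequence[int]) -> int:
    """Shift/AND/OR packing; coordinate 0 takes the least-significant field."""
    addr, shamt = 0, 0
    for coord, bits in zip(coords, widths):
        mask = ((1 << bits) - 1) << shamt
        addr |= (coord << shamt) & mask
        shamt += bits
    return addr


def conv_addresses(ox: int, oy: int, k: int, kx: int, ky: int, c: int,
                   stride_log2: int = 0) -> Tuple[int, int, int]:
    """Addresses for output pixel (ox, oy) of output channel k,
    kernel tap (kx, ky), and input channel c."""
    in_widths = (4, 4, 2)      # input 16 x 16 x 4: (x, y, c)
    f_widths = (2, 2, 2, 3)    # weight 4 x 4 x 4 x 8: (kx, ky, c, k); a 3 x 3 kernel fits
    out_widths = (4, 4, 3)     # output 16 x 16 x 8: (x, y, k)
    ix = (ox << stride_log2) + kx  # power-of-two stride applied as a shift (assumption)
    iy = (oy << stride_log2) + ky
    i_addr = pack((ix, iy, c), in_widths)    # first address information
    f_addr = pack((kx, ky, c, k), f_widths)  # second address information
    o_addr = pack((ox, oy, k), out_widths)   # third address information
    return i_addr, f_addr, o_addr


print(conv_addresses(ox=2, oy=3, k=5, kx=1, ky=2, c=3))  # (851, 377, 1330)
```

The sketch performs no bounds or padding checks; a real design would have to guarantee that every coordinate fits its field.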

First, the layout generation unit (121) may determine the layout of the address information to be generated based on the dimension parameters. The conversion performing unit (122) may then perform a predetermined reference operation on each of the coordinates included in the location information to determine the address values corresponding to the determined layout.

In this regard, the conversion performing unit (122) may derive each address value to be included in the address information by performing a reference operation comprising a shift operation, an AND operation, and an OR operation.

The operation flow of the present invention is briefly described below, based on the details explained above.

FIG. 5 is a flowchart of a neural network inference acceleration method with an efficient address translation function according to a first embodiment of the present invention.

The neural network inference acceleration method with an efficient address translation function shown in FIG. 5 may be performed by the inference acceleration device (100) described above. Accordingly, the description given for the inference acceleration device (100) applies equally to this method, even where it is omitted below.

Referring to FIG. 5, in step S11, the parameter setting unit (110) may (a) obtain location information and dimension parameters associated with the operation performed in a given layer of the neural network.

According to an embodiment of the present invention, in step S11 the parameter setting unit (110) may set (determine) the dimension parameter for a feature (e.g., an input feature) to correspond to three dimensions and the dimension parameter for a weight to correspond to four dimensions.

Next, in step S12, the address conversion unit (120) may (b) generate address information for the features and weights of the layer based on the obtained location information and dimension parameters.

In this regard, in step S12 the address conversion unit (120) may generate first address information (I addr.) for the input features of a given layer of the neural network, second address information (F addr.) for the weights, and third address information (O addr.) for the output features of that layer.

Specifically, in step S12 the layout generation unit (121) may (b1) determine the layout of the address information to be generated based on the obtained dimension parameters.

In step S12, the conversion performing unit (122) may also (b2) perform a predetermined reference operation on each of the coordinates included in the location information to determine the address values corresponding to the determined layout.

According to an embodiment of the present invention, in step S12 the conversion performing unit (122) may derive each address value to be included in the address information by performing a reference operation comprising a shift operation, an AND operation, and an OR operation.

In the above description, steps S11 and S12 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention.

FIG. 6 is a flowchart of a neural network inference acceleration method with an efficient address translation function according to a second embodiment of the present invention.

The neural network inference acceleration method with an efficient address translation function shown in FIG. 6 may be performed by the neural network accelerator (10) described above. Accordingly, the description given for the neural network accelerator (10) applies equally to FIG. 6, even where it is omitted below.

Referring to FIG. 6, in step S21, the address generation module (100) may (a) generate address information for the input features, weights, and output features of a given layer of the neural network, based on location information and dimension parameters associated with the operation performed in that layer.

Next, in step S22, the search module (200) may (b) obtain the input features and weights, based on the generated address information, from the memory (20) that stores data associated with the neural network.

Next, in step S23, the operation module (300) may (c) perform operations including a convolution operation, an activation operation, and a pooling operation based on the obtained input features and weights.

Next, in step S24, the recording module (400) may (d) store the result of the operations performed in step S23 in the memory (20) based on the address information generated in step S21.

In the above description, steps S21 to S24 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as needed, and the order of the steps may be changed.

FIG. 7 is a detailed flowchart of the neural network inference acceleration method with an efficient address translation function according to the second embodiment of the present invention.

The detailed process of the neural network inference acceleration method with an efficient address translation function shown in FIG. 7 may be performed by the neural network accelerator (10) described above. Accordingly, the description given for the neural network accelerator (10) applies equally to FIG. 7, even where it is omitted below.

Referring to FIG. 7, in step S211, the parameter setting unit (110) may obtain location information and dimension parameters associated with the operation performed in a given layer of the neural network. In other words, in step S211 the parameter setting unit (110) may define the locations and dimensions of the features and weights for which address information is to be generated.

Next, in step S212, the conversion performing unit (122) may generate first address information, i.e., the address information for the input feature associated with the layer, based on the location information and dimension parameters. The first address information may refer to the address information used to load the input feature of the layer from the memory (20) that stores data associated with the neural network.

Next, in step S213, the conversion performing unit (122) may generate second address information, i.e., the address information for the weight associated with the layer, based on the location information and dimension parameters. The second address information may refer to the address information used to load the weight of the layer from the memory (20) that stores data associated with the neural network.

Next, in step S214, the conversion performing unit (122) may generate third address information, i.e., the address information for the output feature associated with the layer, based on the location information and dimension parameters. The third address information may refer to the address information used to store the result of the layer's operation in the memory (20).

Next, in step S221, the search module (200) may obtain the input feature from the memory (20) based on the first address information among the generated address information.

Next, in step S222, the search module (200) may obtain the weight from the memory (20) based on the second address information among the generated address information.

Next, in step S231, the operation module (300) may perform a convolution operation based on the obtained input features and weights.

Next, in step S232, the operation module (300) may perform an activation operation (i.e., apply an activation function) on the convolution result obtained from the input features and weights.

Next, in step S233, the operation module (300) may perform a pooling operation (e.g., max pooling) on the result derived from the obtained input features and weights.

Next, in step S24, the recording module (400) may store the results of the operations performed in steps S231 to S233 in the memory (20) based on the third address information.

As also shown in FIG. 7, steps S211 to S24 described above may be repeated sequentially for each of the plurality of layers of the neural network. In other words, once steps S211 to S24 have been performed up to the last of these layers, inference on the input data (e.g., an input image) using the neural network may be complete.

In the above description, steps S211 to S24 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as needed, and the order of the steps may be changed.

The neural network inference acceleration method with an efficient address translation function according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The neural network inference acceleration method with an efficient address translation function described above may also be implemented as a computer program or application that is stored on a recording medium and executed by a computer.

The above description of the present invention is for illustrative purposes only, and those skilled in the art will understand that it can easily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present application.

1000: Neural network-based inference system
10: Neural network accelerator
100: Neural network inference acceleration device with efficient address translation function; address generation module
110: Parameter setting unit
120: Address conversion unit
121: Layout generation unit
122: Conversion performing unit
200: Search module
300: Operation module
400: Recording module
20: Memory

Claims

A neural network inference acceleration method with an efficient address translation function, performed by a neural network inference acceleration device, the method comprising:
(a) obtaining location information and dimension parameters associated with an operation performed in any one of a plurality of layers forming a neural network, the plurality of layers including an input layer, an output layer, and layers between the input layer and the output layer; and
(b) generating address information for features and weights of the layer based on the location information and the dimension parameters;
wherein step (b) comprises:
(b1) determining a layout of the address information based on the dimension parameters; and
(b2) determining an address value corresponding to the layout by performing, on each of a plurality of coordinates included in the location information, a reference operation including a shift operation;
and wherein step (b2) comprises determining, based on the dimension parameters, a shift value and a mask value for each coordinate included in the location information, shifting each coordinate by its shift value, and storing the converted address value in preset bits using the mask value, thereby converting the location of the feature or the weight into an address for each coordinate.

(Deleted)

The inference acceleration method of claim 1, wherein the reference operation further includes an AND operation and an OR operation.

The inference acceleration method of claim 1, wherein step (b) generates first address information for input features of the layer, second address information for the weights, and third address information for output features of the layer.

The inference acceleration method of claim 4, wherein the first address information and the second address information are address information for loading the input features and the weights, respectively, from a memory storing data associated with the neural network, and the third address information is address information for storing the result of an operation using the neural network in the memory.

The inference acceleration method of claim 1, wherein the neural network is a convolutional neural network (CNN).

The inference acceleration method of claim 6, wherein, in step (a), the dimension parameter for the features corresponds to three dimensions and the dimension parameter for the weights corresponds to four dimensions.

A neural network inference acceleration method with an efficient address translation function, performed by a neural network accelerator, the method comprising:
(a) generating address information for input features, weights, and output features of a layer, based on location information and dimension parameters associated with an operation performed in any one of a plurality of layers forming a neural network, the plurality of layers including an input layer, an output layer, and layers between the input layer and the output layer;
(b) obtaining the input features and the weights, based on the address information, from a memory storing data associated with the neural network;
(c) performing the operation based on the input features and the weights; and
(d) storing the result of the operation in the memory based on the address information;
wherein step (a) comprises:
determining a layout of the address information based on the dimension parameters; and
determining an address value corresponding to the layout by performing, on each of a plurality of coordinates included in the location information, a reference operation including a shift operation;
and wherein determining the address value comprises determining, based on the dimension parameters, a shift value and a mask value for each coordinate included in the location information, shifting each coordinate by its shift value, and storing the converted address value in preset bits using the mask value, thereby converting the location of the feature or the weight into an address for each coordinate.

The inference acceleration method of claim 8, wherein the neural network is a convolutional neural network (CNN), and the operation includes a convolution operation, an activation operation, and a pooling operation.

A neural network inference acceleration device with an efficient address translation function, comprising:
a parameter setting unit that obtains location information and dimension parameters associated with an operation performed in any one of a plurality of layers forming a neural network, the plurality of layers including an input layer, an output layer, and layers between the input layer and the output layer; and
an address conversion unit that generates address information for features and weights of the layer based on the location information and the dimension parameters;
wherein the address conversion unit comprises:
a layout generation unit that determines a layout of the address information based on the dimension parameters; and
a conversion performing unit that determines an address value corresponding to the layout by performing, on each of a plurality of coordinates included in the location information, a reference operation including a shift operation;
and wherein the conversion performing unit determines, based on the dimension parameters, a shift value and a mask value for each coordinate included in the location information, shifts each coordinate by its shift value, and stores the converted address value in preset bits using the mask value, thereby converting the location of the feature or the weight into an address for each coordinate.

(Deleted)

The inference acceleration device of claim 10, wherein the reference operation further includes an AND operation and an OR operation.

The inference acceleration device of claim 10, wherein the address conversion unit generates first address information for input features of the layer, second address information for the weights, and third address information for output features of the layer.

The inference acceleration device of claim 10, wherein the neural network is a convolutional neural network (CNN).

A neural network accelerator with an efficient address translation function, comprising:
an address generation module that generates address information for input features, weights, and output features of a layer, based on location information and dimension parameters associated with an operation performed in any one of a plurality of layers forming a neural network, the plurality of layers including an input layer, an output layer, and layers between the input layer and the output layer;
a search module that obtains the input features and the weights, based on the address information, from a memory storing data associated with the neural network;
an operation module that performs the operation based on the input features and the weights; and
a recording module that stores the result of the operation in the memory based on the address information;
wherein the address generation module determines a layout of the address information based on the dimension parameters and determines an address value corresponding to the layout by performing, on each of a plurality of coordinates included in the location information, a reference operation including a shift operation; and
wherein the address generation module determines, based on the dimension parameters, a shift value and a mask value for each coordinate included in the location information, shifts each coordinate by its shift value, and stores the converted address value in preset bits using the mask value, thereby converting the location of the feature or the weight into an address for each coordinate.