
CN110717575B - Frame buffer free convolutional neural network system and method - Google Patents


Info

Publication number: CN110717575B
Authority: CN (China)
Prior art keywords: neural network, convolutional neural, interest, region, image frame
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201810767312.3A
Other languages: Chinese (zh)
Other versions: CN110717575A (en)
Inventor: 杨得炜
Current assignee: Himax Technologies Ltd
Original assignee: Himax Technologies Ltd
Application filed by Himax Technologies Ltd
Priority to CN201810767312.3A
Publication of CN110717575A
Application granted
Publication of CN110717575B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a framebufferless convolutional neural network system and method. The framebufferless convolutional neural network system includes: a region-of-interest unit that extracts features to generate a region of interest of an input image frame; a convolutional neural network unit that processes the region of interest of the input image frame to detect objects; and a tracking unit that compares features extracted at different times so that the convolutional neural network unit selectively processes the input image frame accordingly.

Description

Framebufferless Convolutional Neural Network System and Method

Technical Field

The present invention relates to convolutional neural networks (CNNs), and in particular to a framebufferless convolutional neural network system.

Background

A convolutional neural network (CNN) is a kind of artificial neural network used in machine learning. Convolutional neural networks can be applied to signal processing tasks such as image processing and computer vision.

FIG. 1 shows a block diagram of a conventional convolutional neural network 900, disclosed in "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things" by Li Du et al., IEEE Transactions on Circuits and Systems I: Regular Papers, August 2017, the contents of which are considered part of this specification. The convolutional neural network 900 includes a buffer bank 91 composed of single-port static random access memory (SRAM), which stores intermediate data and exchanges data with a frame buffer 92. The frame buffer 92 is composed of dynamic random access memory (DRAM), such as double data rate synchronous DRAM (DDR DRAM), and stores the entire image frame for the convolutional neural network operations. The buffer bank 91 is divided into two parts: an input layer and an output layer. The convolutional neural network 900 includes a column buffer 93 that remaps the output of the buffer bank 91 to a convolution unit (CU) engine array 94. The convolution unit engine array 94 contains a plurality of convolution units that perform highly parallel convolution operations. The convolution unit engine array 94 includes a pre-fetch controller 941 that periodically fetches parameters from a direct memory access (DMA) controller (not shown) and updates the weights and bias values of the convolution unit engine array 94. The convolutional neural network 900 also includes an accumulation buffer 95 with scratchpad memory that stores partial convolution results from the convolution unit engine array 94. The accumulation buffer 95 contains a max-pool unit 951 that pools the output-layer data. The convolutional neural network 900 further includes a command decoder 96 that decodes commands pre-stored in the frame buffer 92.

In the conventional convolutional neural network system of FIG. 1, the frame buffer is composed of dynamic random access memory (DRAM), such as double data rate synchronous DRAM (DDR DRAM), and stores the entire image frame for the convolutional neural network operations. For example, an image frame with a resolution of 320x240 occupies a frame buffer of 320x240x8 bits. However, DDR DRAM is not suitable for low-power applications such as wearable or Internet of Things (IoT) devices. A novel convolutional neural network system suited to low-power applications is therefore needed.
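
The memory figure above can be checked with quick arithmetic. This is an illustrative sketch only; the byte conversion is ours, not from the specification:

```python
# Frame-buffer cost of the conventional system: the whole frame stays resident.
width, height, bits_per_pixel = 320, 240, 8  # figures from the specification

frame_buffer_bits = width * height * bits_per_pixel
frame_buffer_bytes = frame_buffer_bits // 8

print(frame_buffer_bits)   # 614400 bits, i.e. 320x240x8
print(frame_buffer_bytes)  # 76800 bytes of DRAM for a single frame
```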

Summary of the Invention

In view of the above, one objective of the embodiments of the present invention is to provide a framebufferless convolutional neural network system. The embodiment uses a simple system architecture to perform convolutional neural network operations on high-resolution image frames.

According to one embodiment of the present invention, a framebufferless convolutional neural network system includes a region-of-interest unit, a convolutional neural network unit, and a tracking unit. The region-of-interest unit extracts features to generate a region of interest of an input image frame. The convolutional neural network unit processes the region of interest of the input image frame to detect objects. The tracking unit compares features extracted at different times so that the convolutional neural network unit selectively processes the input image frame accordingly, wherein the region-of-interest unit generates block-based features to determine whether each image block undergoes a convolutional neural network operation.

According to another embodiment of the present invention, a method for a framebufferless convolutional neural network includes: extracting features to generate a region of interest of an input image frame; performing convolutional neural network operations on the region of interest of the input image frame to detect objects; and comparing features extracted at different times to selectively process the input image frame accordingly, wherein generating the region of interest includes generating block-based features to determine whether each image block undergoes a convolutional neural network operation.

Brief Description of the Drawings

FIG. 1 shows a block diagram of a conventional convolutional neural network.

FIG. 2A shows a block diagram of a framebufferless convolutional neural network system according to an embodiment of the present invention.

FIG. 2B shows a flowchart of a method for a framebufferless convolutional neural network according to an embodiment of the present invention.

FIG. 3 shows a detailed block diagram of the region-of-interest unit of FIG. 2A.

FIG. 4A illustrates a decision map containing 4x6 blocks.

FIG. 4B illustrates another decision map, updated after FIG. 4A.

FIG. 5 shows a detailed block diagram of the buffer of FIG. 2A.

FIG. 6 shows a detailed block diagram of the convolutional neural network unit of FIG. 2A.

Detailed Description

FIG. 2A shows a block diagram of a framebufferless convolutional neural network (CNN) system 100 according to an embodiment of the present invention, and FIG. 2B shows a flowchart of a framebufferless convolutional neural network (CNN) method 200 according to an embodiment of the present invention.

In this embodiment, the framebufferless convolutional neural network system (hereinafter the system) 100 includes a region-of-interest (ROI) unit 11 that generates a region of interest in an input image frame (step 21). Since the system 100 of this embodiment contains no frame buffer, the ROI unit 11 adopts a scanline-based technique with a block-based mechanism to find the region of interest in the input image frame. The input image frame is divided into a plurality of image blocks arranged in matrix form, for example 4x6 image blocks.
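
The block partition can be sketched as follows. The helper name and the scanline ordering of the returned list are illustrative assumptions; only the 320x240 frame and the 4x6 grid come from the text:

```python
def block_grid(frame_w, frame_h, rows, cols):
    """Partition a frame into a rows x cols matrix of non-overlapping blocks.

    Returns (x, y, w, h) for each block in scanline order, so a system
    without a frame buffer can visit blocks as the image lines stream in.
    """
    bw, bh = frame_w // cols, frame_h // rows
    return [(c * bw, r * bh, bw, bh) for r in range(rows) for c in range(cols)]

blocks = block_grid(320, 240, rows=4, cols=6)
print(len(blocks))  # 24 blocks in the 4x6 matrix
print(blocks[0])    # (0, 0, 53, 60): integer division leaves a small margin
```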

In this embodiment, the ROI unit 11 generates block-based features to determine whether each image block undergoes a convolutional neural network (CNN) operation. FIG. 3 shows a detailed block diagram of the ROI unit 11 of FIG. 2A. In this embodiment, the ROI unit 11 includes a feature extractor 111, for example to extract shallow features from the input image frame. In one example, the feature extractor 111 generates the (shallow) features of a block from a block-based histogram. In another example, the feature extractor 111 generates the (shallow) features of a block from frequency analysis.
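
A block-based histogram as a shallow feature can be sketched minimally. The bin count and the use of plain intensity histograms are assumptions for illustration; the patent only names "block-based histogram" and "frequency analysis" as options:

```python
def block_histogram(pixels, bins=16):
    """Shallow feature for one image block: a coarse intensity histogram.

    `pixels` is a flat list of 8-bit grayscale values (0..255).
    """
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1  # map 0..255 onto `bins` buckets
    return hist

# A uniformly dark block and a uniformly bright block give clearly
# distinguishable features, which is all the downstream classifier needs.
dark = block_histogram([10] * 100)
bright = block_histogram([250] * 100)
print(dark[0], bright[-1])  # 100 100
```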

The ROI unit 11 further includes a classifier 112, such as a support vector machine (SVM), that determines whether each block of the input image frame undergoes a convolutional neural network operation. A decision map 12 is thereby generated, containing a plurality of blocks (which may be arranged in matrix form) representing the input image frame. FIG. 4A illustrates a decision map 12 containing 4x6 blocks, where X indicates that the associated block need not undergo a convolutional neural network operation, C indicates that the associated block should undergo a convolutional neural network operation, and D indicates that an object (for example, a dog) has been detected in the associated block. The region of interest can thereby be determined and the convolutional neural network operations performed.
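
The decision map can be sketched as a grid of per-block labels. The patent uses an SVM as classifier 112; the simple score threshold below is a stand-in so the example stays self-contained:

```python
def decision_map(scores, rows=4, cols=6, threshold=0.5):
    """Label each block: 'X' = skip the CNN, 'C' = run the CNN.

    `scores` holds one classifier score per block (a stand-in for the SVM
    output); a 'D' label would later mark blocks where the CNN detects
    an object.
    """
    labels = ['C' if s > threshold else 'X' for s in scores]
    return [labels[r * cols:(r + 1) * cols] for r in range(rows)]

# Pretend the classifier fired on four central blocks of the 4x6 grid.
scores = [0.9 if i in (8, 9, 14, 15) else 0.1 for i in range(24)]
dmap = decision_map(scores)
print(dmap[1])  # ['X', 'X', 'C', 'C', 'X', 'X']
```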

Referring to FIG. 2B, the system 100 includes a buffer 13, such as static random access memory (SRAM), that stores the (shallow) features generated by the feature extractor 111 (of the ROI unit 11) (step 22). FIG. 5 shows a detailed block diagram of the buffer 13 of FIG. 2A. In this embodiment, the buffer 13 includes two feature maps: a first feature map 131A storing the features of the previous image frame (at the previous time t-1), and a second feature map 131B storing the features of the current image frame (at the current time t). The buffer 13 may also include a sliding window 132, for example of size 40x40x8 bits, that stores a block of the input image frame.

Referring to FIG. 2A, the system 100 of this embodiment includes a convolutional neural network (CNN) unit 14, which receives and processes the region of interest of the input image frame (generated by the ROI unit 11) to detect objects (step 23). The CNN unit 14 of this embodiment operates only on the region of interest, rather than on the entire input image frame as in a conventional system with a frame buffer.

FIG. 6 shows a detailed block diagram of the CNN unit 14 of FIG. 2A. The CNN unit 14 includes a convolution unit 141 containing a plurality of convolution engines that perform the convolution operations. The CNN unit 14 includes an activation unit 142 that performs an activation function when a predefined feature is detected. The CNN unit 14 further includes a pooling unit 143 that performs down-sampling, or pooling, on the input image frame.
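
The three stages of the CNN unit can be shown in miniature. The 3x3 kernel, the ReLU activation, and the 2x2 max pooling are illustrative choices; the patent does not fix any of them:

```python
def conv2d(img, k):
    """Convolution unit: valid 2-D convolution with a 3x3 kernel
    (cross-correlation, as is conventional in CNN hardware)."""
    n, m = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * k[a][b] for a in range(3) for b in range(3))
             for j in range(m - 2)] for i in range(n - 2)]

def relu(x):
    """Activation unit: rectified linear unit."""
    return max(x, 0.0)

def max_pool2x2(img):
    """Pooling unit: 2x2 max pooling, i.e. down-sampling by two."""
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]) - 1, 2)]
            for i in range(0, len(img) - 1, 2)]

# A 6x6 intensity ramp through an identity-like kernel, then activation
# and pooling: 6x6 -> 4x4 feature map -> 2x2 pooled output.
img = [[float(i + j) for j in range(6)] for i in range(6)]
kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
feat = [[relu(v) for v in row] for row in conv2d(img, kernel)]
print(max_pool2x2(feat))  # [[4.0, 6.0], [6.0, 8.0]]
```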

The system 100 of this embodiment includes a tracking unit 15 that compares the first feature map 131A (of the previous image frame) with the second feature map 131B (of the current image frame) and then updates the decision map 12 (step 24). The tracking unit 15 analyzes the content change between the first feature map 131A and the second feature map 131B. FIG. 4B illustrates another decision map 12, updated after FIG. 4A. In this example, at the previous time an object was detected in the blocks at columns 5-6, row 3 (marked D in FIG. 4A), but at the current time the object has disappeared (marked X in FIG. 4B). Accordingly, the CNN unit 14 need not perform convolutional neural network operations on blocks without feature changes. In other words, the CNN unit 14 selectively performs convolutional neural network operations on blocks with feature changes. The system 100 can therefore operate substantially faster.
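
The tracking step amounts to diffing the two buffered feature maps and re-labelling only the blocks whose features changed. The scalar per-block feature and the tolerance below are simplifications for illustration; the real feature is a histogram or spectrum per block:

```python
def track_update(feats_prev, feats_curr, tol=1e-6):
    """Tracking unit: rebuild the decision map from frame-to-frame change.

    A block gets label 'X' (skip the CNN) when its shallow feature is
    unchanged between time t-1 and time t, and 'C' (run the CNN) otherwise.
    """
    return [['C' if abs(c - p) > tol else 'X' for p, c in zip(prow, crow)]
            for prow, crow in zip(feats_prev, feats_curr)]

prev = [[0.0, 0.0], [0.5, 0.0]]  # features at t-1 (object in lower-left block)
curr = [[0.0, 0.9], [0.5, 0.0]]  # features at t (content appeared upper-right)
print(track_update(prev, curr))  # [['X', 'C'], ['X', 'X']]
```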

Compared with a conventional convolutional neural network system, the convolutional neural network operations of the above embodiment are greatly reduced (and accelerated). Moreover, since the embodiment requires no frame buffer, it is well suited to low-power applications such as wearable or Internet of Things (IoT) devices. For an image frame with a resolution of 320x240 and a (non-overlapping) sliding window of size 40x40, a conventional system with a frame buffer needs 8x6 sliding windows to perform the convolutional neural network operations. In contrast, the system 100 of this embodiment needs only a few (fewer than 10) sliding windows.
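
The window-count comparison follows directly from the geometry; a quick check with the figures from the text (the example ROI size is our assumption):

```python
frame_w, frame_h, win = 320, 240, 40

# Conventional system: every non-overlapping 40x40 window of the full frame.
full_windows = (frame_w // win) * (frame_h // win)
print(full_windows)  # 48, i.e. the 8x6 grid of sliding windows

# Framebufferless system: only the windows covering the region of interest,
# e.g. a 2x2-block ROI from the decision map (illustrative figure).
roi_windows = 4
print(full_windows // roi_windows)  # 12x fewer CNN invocations
```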

The above description covers only preferred embodiments of the present invention and is not intended to limit the scope of patent protection of the invention; all equivalent changes or modifications that do not depart from the spirit disclosed by the invention shall fall within the scope of the following claims.

[Description of Reference Numerals]

100 Framebufferless convolutional neural network system
11 Region-of-interest unit
111 Feature extractor
112 Classifier
12 Decision map
13 Buffer
131A First feature map
131B Second feature map
132 Sliding window
14 Convolutional neural network unit
141 Convolution unit
142 Activation unit
143 Pooling unit
15 Tracking unit
200 Method for a framebufferless convolutional neural network
21 Generate a region of interest in the input image frame
22 Store features in feature maps
23 Process the region of interest to detect objects
24 Compare features and perform convolutional neural network operations on blocks with feature changes
900 Convolutional neural network
91 Buffer bank
92 Frame buffer
93 Column buffer
94 Convolution unit engine array
941 Pre-fetch controller
95 Accumulation buffer
951 Max pooling
96 Command decoder

Claims (18)

1. A framebufferless convolutional neural network system, comprising:
a region-of-interest unit for extracting features to generate a region of interest of an input image frame;
a convolutional neural network unit for processing the region of interest of the input image frame to detect objects; and
a tracking unit for comparing features extracted at different times so that the convolutional neural network unit selectively processes the input image frame accordingly,
wherein the region-of-interest unit generates block-based features to determine whether each image block undergoes a convolutional neural network operation.

2. The framebufferless convolutional neural network system of claim 1, wherein the region-of-interest unit adopts a scanline-based technique with a block-based mechanism to find the region of interest in the input image frame, the input image frame being divided into a plurality of image blocks.

3. The framebufferless convolutional neural network system of claim 2, wherein the region-of-interest unit comprises:
a feature extractor for extracting the features from the input image frame; and
a classifier for determining whether each image block undergoes a convolutional neural network operation, thereby generating a decision map for determining the region of interest.

4. The framebufferless convolutional neural network system of claim 3, wherein the feature extractor generates shallow features of the image blocks according to a block-based histogram or frequency analysis.

5. The framebufferless convolutional neural network system of claim 3, further comprising a buffer for storing the features.

6. The framebufferless convolutional neural network system of claim 5, wherein the buffer comprises: a first feature map for storing features of a previous image frame; and a second feature map for storing features of a current image frame.

7. The framebufferless convolutional neural network system of claim 5, wherein the buffer comprises a sliding window for storing a block of the input image frame.

8. The framebufferless convolutional neural network system of claim 6, wherein the tracking unit compares the first feature map with the second feature map to update the decision map accordingly.

9. The framebufferless convolutional neural network system of claim 1, wherein the convolutional neural network unit comprises:
a convolution unit comprising a plurality of convolution engines for performing convolution operations on the region of interest;
an activation unit for performing an activation function when a predefined feature is detected; and
a pooling unit for performing down-sampling on the input image frame.

10. A method for a framebufferless convolutional neural network, comprising:
extracting features to generate a region of interest of an input image frame;
performing convolutional neural network operations on the region of interest of the input image frame to detect objects; and
comparing features extracted at different times to selectively process the input image frame accordingly,
wherein generating the region of interest comprises generating block-based features to determine whether each image block undergoes a convolutional neural network operation.

11. The method of claim 10, wherein the region of interest is generated using a scanline-based technique with a block-based mechanism, the input image frame being divided into a plurality of image blocks.

12. The method of claim 11, wherein generating the region of interest comprises:
extracting the features from the input image frame; and
determining by classification whether each image block undergoes a convolutional neural network operation, thereby generating a decision map for determining the region of interest.

13. The method of claim 12, wherein extracting the features comprises generating shallow features of the image blocks according to a block-based histogram or frequency analysis.

14. The method of claim 12, further comprising temporarily storing the features.

15. The method of claim 14, wherein temporarily storing the features comprises: generating a first feature map for storing features of a previous image frame; and generating a second feature map for storing features of a current image frame.

16. The method of claim 14, wherein temporarily storing the features comprises generating a sliding window for storing a block of the input image frame.

17. The method of claim 15, wherein comparing the features comprises comparing the first feature map with the second feature map to update the decision map accordingly.

18. The method of claim 10, wherein performing the convolutional neural network operations comprises:
using a plurality of convolution engines to perform convolution operations on the region of interest;
performing an activation function when a predefined feature is detected; and
performing down-sampling on the input image frame.
CN201810767312.3A 2018-07-13 2018-07-13 Frame buffer free convolutional neural network system and method Active CN110717575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810767312.3A CN110717575B (en) 2018-07-13 2018-07-13 Frame buffer free convolutional neural network system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810767312.3A CN110717575B (en) 2018-07-13 2018-07-13 Frame buffer free convolutional neural network system and method

Publications (2)

Publication Number Publication Date
CN110717575A CN110717575A (en) 2020-01-21
CN110717575B true CN110717575B (en) 2022-07-26

Family

ID=69208451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810767312.3A Active CN110717575B (en) 2018-07-13 2018-07-13 Frame buffer free convolutional neural network system and method

Country Status (1)

Country Link
CN (1) CN110717575B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271514A (en) * 2007-03-21 2008-09-24 株式会社理光 Image detection method and device for fast object detection and objective output
CN103914702A (en) * 2013-01-02 2014-07-09 国际商业机器公司 System and method for boosting object detection performance in videos
CN104268900A (en) * 2014-09-26 2015-01-07 中安消技术有限公司 Motion object detection method and device
CN104298976A (en) * 2014-10-16 2015-01-21 电子科技大学 License plate detection method based on convolutional neural network
CN104504362A (en) * 2014-11-19 2015-04-08 南京艾柯勒斯网络科技有限公司 Face detection method based on convolutional neural network
WO2015095733A1 (en) * 2013-12-19 2015-06-25 Objectvideo, Inc. System and method for identifying faces in unconstrained media
CN105512640A (en) * 2015-12-30 2016-04-20 重庆邮电大学 Method for acquiring people flow on the basis of video sequence
CN105718868A (en) * 2016-01-18 2016-06-29 中国科学院计算技术研究所 Face detection system and method for multi-pose faces
CN106096561A (en) * 2016-06-16 2016-11-09 重庆邮电大学 Infrared pedestrian detection method based on image block degree of depth learning characteristic
CN106878674A (en) * 2017-01-10 2017-06-20 哈尔滨工业大学深圳研究生院 A parking detection method and device based on surveillance video
CN107016409A (en) * 2017-03-20 2017-08-04 华中科技大学 A kind of image classification method and system based on salient region of image
CN107492115A (en) * 2017-08-30 2017-12-19 北京小米移动软件有限公司 The detection method and device of destination object
CN107704797A (en) * 2017-08-08 2018-02-16 深圳市安软慧视科技有限公司 Real-time detection method and system and equipment based on pedestrian in security protection video and vehicle
CN107832683A (en) * 2017-10-24 2018-03-23 亮风台(上海)信息科技有限公司 A kind of method for tracking target and system
CN108229319A (en) * 2017-11-29 2018-06-29 南京大学 The ship video detecting method merged based on frame difference with convolutional neural networks
CN108229523A (en) * 2017-04-13 2018-06-29 深圳市商汤科技有限公司 Image detection, neural network training method, device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8195598B2 (en) * 2007-11-16 2012-06-05 Agilence, Inc. Method of and system for hierarchical human/crowd behavior detection
US10354290B2 (en) * 2015-06-16 2019-07-16 Adobe, Inc. Generating a shoppable video
US10410096B2 (en) * 2015-07-09 2019-09-10 Qualcomm Incorporated Context-based priors for object detection in images

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271514A (en) * 2007-03-21 2008-09-24 株式会社理光 Image detection method and device for fast object detection and objective output
CN103914702A (en) * 2013-01-02 2014-07-09 国际商业机器公司 System and method for boosting object detection performance in videos
WO2015095733A1 (en) * 2013-12-19 2015-06-25 Objectvideo, Inc. System and method for identifying faces in unconstrained media
CN104268900A (en) * 2014-09-26 2015-01-07 中安消技术有限公司 Motion object detection method and device
CN104298976A (en) * 2014-10-16 2015-01-21 电子科技大学 License plate detection method based on convolutional neural network
CN104504362A (en) * 2014-11-19 2015-04-08 南京艾柯勒斯网络科技有限公司 Face detection method based on convolutional neural network
CN105512640A (en) * 2015-12-30 2016-04-20 重庆邮电大学 Method for acquiring people flow on the basis of video sequence
CN105718868A (en) * 2016-01-18 2016-06-29 中国科学院计算技术研究所 Face detection system and method for multi-pose faces
CN106096561A (en) * 2016-06-16 2016-11-09 重庆邮电大学 Infrared pedestrian detection method based on deep learning features of image blocks
CN106878674A (en) * 2017-01-10 2017-06-20 哈尔滨工业大学深圳研究生院 A parking detection method and device based on surveillance video
CN107016409A (en) * 2017-03-20 2017-08-04 华中科技大学 Image classification method and system based on salient image regions
CN108229523A (en) * 2017-04-13 2018-06-29 深圳市商汤科技有限公司 Image detection and neural network training method, device and electronic equipment
CN107704797A (en) * 2017-08-08 2018-02-16 深圳市安软慧视科技有限公司 Real-time detection method, system and device for pedestrians and vehicles in security surveillance video
CN107492115A (en) * 2017-08-30 2017-12-19 北京小米移动软件有限公司 Detection method and device for target object
CN107832683A (en) * 2017-10-24 2018-03-23 亮风台(上海)信息科技有限公司 Target tracking method and system
CN108229319A (en) * 2017-11-29 2018-06-29 南京大学 Ship video detection method based on fusion of frame difference and convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
The Detection of Typical Targets under the Background of Land War; Xiaowen Wu et al.; 2017 29th Chinese Control And Decision Conference (CCDC); 2017-07-17; pp. 4313-4318 *
Pedestrian flow statistics based on convolutional neural network; Zhang Yajun et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); 2017-04-15; Vol. 29, No. 2, pp. 265-271 *
Aircraft target detection method for high-resolution SAR images based on convolutional neural network; Wang Siyu et al.; Journal of Radars; 2017-05-03; Vol. 6, No. 2, pp. 195-203 *
Research on object recognition technology for extravehicular activity inspired by visual perception; Zhang Juli et al.; Manned Spaceflight; 2018-01-17; Vol. 24, No. 1, pp. 41-47 *

Also Published As

Publication number Publication date
CN110717575A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
US10769485B2 (en) Framebuffer-less system and method of convolutional neural network
CN112951213B (en) End-to-end online speech detection and recognition method, system and device
CN111222562B (en) Target detection method based on space self-attention mechanism
CN110991311A (en) A target detection method based on densely connected deep network
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN107784288A Iterative localization face detection method based on deep neural network
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN107194414A Fast SVM incremental learning algorithm based on locality-sensitive hashing
CN117152438A (en) A lightweight street view image semantic segmentation method based on improved DeepLabV3+ network
CN117934893A (en) Semi-supervised target detection method, device, computer equipment and storage medium
CN114943729B (en) A cell counting method and system for high-resolution cell images
CN111179212A (en) Method for realizing micro target detection chip integrating distillation strategy and deconvolution
CN110717575B (en) Frame buffer free convolutional neural network system and method
CN113095471B (en) Method for improving efficiency of detection model
CN110490312B (en) Pooling calculation method and circuit
Wang et al. Research on vehicle object detection based on deep learning
CN113343979A (en) Method, apparatus, device, medium and program product for training a model
TWI696127B (en) Framebuffer-less system and method of convolutional neural network
CN112818833A (en) Face multitask detection method, system, device and medium based on deep learning
CN111914808A (en) Gesture recognition system realized based on FPGA and recognition method thereof
CN110110589A (en) Face classification method based on FPGA parallel computation
CN112052935B (en) Convolutional neural network system
US11544523B2 (en) Convolutional neural network method and system
CN115049896A (en) Soft threshold attention mechanism applied to computer vision
CN113111995B (en) Method for shortening model reasoning and model post-processing running time

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant