WO2020029181A1

WO2020029181A1 - Three-dimensional convolutional neural network-based computation device and related product

Info

Publication number: WO2020029181A1
Application number: PCT/CN2018/099658
Authority: WO
Inventors: 范鸿翔
Original assignee: Shenzhen Corerain Technologies Co Ltd
Current assignee: Shenzhen Corerain Technologies Co Ltd
Priority date: 2018-08-09
Filing date: 2018-08-09
Publication date: 2020-02-13
Anticipated expiration: 2021-02-09
Also published as: CN111542837B; CN111542837A

Abstract

A three-dimensional convolutional neural network-based computation device and a related product. The computation device comprises: a fraction processing block (301), an exponent kernel (302), a three-dimensional convolution kernel (303), a frame superposition block (304), a channel superposition block (305), and an output buffer (306). The computation device achieves a high computation speed and favorable accuracy.

Description

Three-dimensional convolutional neural network computing device and related products

Technical field

The present application relates to the field of computers and artificial intelligence technologies, and in particular to a three-dimensional convolutional neural network computing device and related products.

Background

With the rapid development of machine learning, machine learning algorithms have achieved great success in many applications. Among them, deep learning algorithms, especially three-dimensional convolutional neural network algorithms, have been widely used in applications such as behavior detection, medical image analysis, and terrain analysis. Unlike two-dimensional convolution, three-dimensional convolution not only extracts features in the two dimensions of the image but also extracts and analyzes features in a third dimension, such as time or a third spatial dimension. Because it processes information in all three dimensions, a three-dimensional convolutional neural network is more accurate than a traditional two-dimensional convolutional neural network on three-dimensional data, for example when analyzing video or three-dimensional topographic maps. Applications such as behavior detection and terrain and geological analysis require real-time processing speed when deployed on mobile devices. How to implement three-dimensional convolutional neural networks quickly and efficiently and apply them in real scenarios is therefore one of the current research hotspots.

Existing three-dimensional convolutional neural networks compute slowly and cannot achieve real-time processing speed.

Summary of the application

The embodiments of the present application provide a three-dimensional convolutional neural network computing device and related products, which use floating-point blocks to process a three-dimensional convolutional neural network quickly and thereby improve calculation speed.

In a first aspect, an embodiment of the present application provides a three-dimensional convolutional neural network computing device, wherein the computing device includes: a fraction processing block, an exponent kernel, a three-dimensional convolution kernel, a frame superposition block, a channel superposition block, and an output buffer;

the fraction processing block is configured to receive the fraction data Lm of a three-dimensional picture data block, split the fraction data Lm into three pieces of two-dimensional fraction data, and input the three pieces of two-dimensional fraction data into the three-dimensional convolution kernel;

the exponent kernel is configured to receive the shared exponent data Le of the three-dimensional picture data block and input the exponent data Le to the three-dimensional convolution kernel;

the three-dimensional convolution kernel is configured to perform a two-dimensional convolution operation on each of the three pieces of two-dimensional fraction data to obtain three fraction convolution results, split the exponent data Le into three pieces of two-dimensional exponent data and perform two-dimensional convolution operations on them to obtain three exponent convolution results, and input the three exponent convolution results and the three fraction convolution results to the frame superposition block;

the frame superposition block is configured to perform frame superposition processing on the three exponent convolution results and the three fraction convolution results to obtain superposed data, and input the superposed data to the channel superposition block;

the channel superposition block is configured to perform channel superposition processing on the superposed data to obtain a preliminary convolution result, and input the preliminary convolution result to the output buffer.

In a second aspect, a method for performing a three-dimensional convolution operation using the device of the first aspect is provided. The method includes:

receiving the fraction data Lm of a three-dimensional picture data block, splitting the fraction data Lm into three pieces of two-dimensional fraction data, and inputting the three pieces of two-dimensional fraction data into the three-dimensional convolution kernel;

receiving the shared exponent data Le of the three-dimensional picture data block, and inputting the exponent data Le to the three-dimensional convolution kernel;

performing a two-dimensional convolution operation on each of the three pieces of two-dimensional fraction data to obtain three fraction convolution results, and splitting the exponent data Le into three pieces of two-dimensional exponent data and performing two-dimensional convolution operations on them to obtain three exponent convolution results, yielding three exponent convolution results and three fraction convolution results;

performing frame superposition processing on the three exponent convolution results and the three fraction convolution results to obtain superposed data;

performing channel superposition processing on the superposed data to obtain a preliminary convolution result, and inputting the preliminary convolution result to the output buffer.

In a third aspect, a computer-readable storage medium is provided that stores a computer program for electronic data interchange, wherein the computer program causes a computer to execute the method provided in the second aspect.

In a fourth aspect, a computer program product is provided. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided in the second aspect.

Implementing the embodiments of the present application has the following beneficial effects:

The technical solution provided by this application exploits the fast computation of an FPGA and uses floating-point blocks to overcome the drawback of the FPGA's small cache, thereby achieving fast processing of three-dimensional convolutional neural networks with the advantage of high calculation speed.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1A is a schematic diagram of a two-dimensional convolution.

FIG. 1B is a schematic diagram of a three-dimensional convolution.

FIG. 2A is a schematic diagram of the data representation of a floating-point block.

FIG. 2B is a schematic diagram of the data representation of another floating-point block.

FIG. 3 is a schematic structural diagram of a three-dimensional convolutional neural network computing device provided by the present application.

FIG. 4 is a flowchart of a method for implementing a three-dimensional convolution operation provided by the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.

The terms "including" and "having" and any variations thereof in the specification, the claims, and the drawings of this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.

Reference to "an embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to independent or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The electronic device in this application may include: a server, a smart camera device, a smartphone (such as an Android phone, an iOS phone, or a Windows Phone phone), a tablet computer, a handheld computer, a laptop computer, a mobile Internet device (MID), a wearable device, and the like. These electronic devices are examples only and the list is not exhaustive. For convenience of description, the electronic device is referred to in the following embodiments as user equipment (UE), a terminal, or an electronic device. In practical applications, the user equipment is not limited to the forms above and may also include, for example, a smart vehicle-mounted terminal, a computer device, and the like.

In the device provided by the first aspect, the three-dimensional convolution kernel includes three two-dimensional fraction convolution kernels and three two-dimensional exponent convolution kernels, wherein:

each two-dimensional fraction convolution kernel is configured to perform a two-dimensional convolution operation on one piece of two-dimensional fraction data;

each two-dimensional exponent convolution kernel is configured to perform a two-dimensional convolution operation on one piece of two-dimensional exponent data.

In the device provided by the first aspect, the device further includes: a pooling processing module and an output module;

the pooling processing module is configured to perform pooling processing on the preliminary convolution result to obtain a final convolution result, and output the final convolution result through the output module.

In the device provided by the first aspect, the fraction processing block includes: a buffer, a first frame buffer, a second frame buffer, a first fraction buffer, a second fraction buffer, and a third fraction buffer; wherein

the buffer is connected to the first frame buffer and to the first fraction buffer, the first frame buffer is connected to the second frame buffer and also to the second fraction buffer, the second frame buffer is connected to the third fraction buffer, and the first, second, and third fraction buffers each output two-dimensional fraction data.

In the device provided by the first aspect, the exponent kernel includes a buffer and an exponent buffer;

the buffer is configured to receive three-dimensional fraction data;

the exponent buffer is configured to buffer the three-dimensional fraction data and split it into three pieces of two-dimensional fraction data for output.

In the method provided by the second aspect, the method further includes:

performing pooling processing on the preliminary convolution result to obtain a final convolution result, and outputting the final convolution result through the output module.

In the method provided by the second aspect, splitting the fraction data Lm into three pieces of two-dimensional fraction data and inputting the three pieces of two-dimensional fraction data into the three-dimensional convolution kernel specifically includes:

storing the fraction data Lm in three frame buffers, each of which inputs its two-dimensional fraction data to the three-dimensional convolution kernel.

Referring to FIG. 1A, FIG. 1A is a schematic diagram of a two-dimensional convolution. In the two-dimensional convolution, the picture of each frame has two channels of data: the transparent box labeled frame 1 represents the data of channel 1 of frame 1, and the gray box labeled frame 1 represents the data of channel 2 of frame 1. In the computed data, the data of the two channels of frame 1 are merged into one data box. FIG. 1B is a schematic diagram of a three-dimensional convolution. It differs from the two-dimensional convolution in that different frames are also merged: as shown in FIG. 1B, the merged data box contains the data of the two channels of frame 1 and the data of the two channels of frame 2.

Referring to FIG. 2A, FIG. 2A shows a representation of floating-point data, namely the data of one frame in one channel. As shown in FIG. 2A, each row of data represents one pixel, where N denotes the data type indicator bit, Le denotes the exponent bits of the floating-point data, and Lm denotes the fraction bits of the floating-point data.

Referring to FIG. 2B, FIG. 2B shows a floating-point data representation of the present application. Comparing FIG. 2B with FIG. 2A shows that in FIG. 2B the data of one channel of one frame share a single Le, whereas in FIG. 2A every pixel has its own non-shared Le. The floating-point block representation of FIG. 2B greatly reduces the space needed to store the data, so the data can fit the FPGA cache, which avoids the problem that a small cache cannot effectively store the computation data. When operations are performed between different blocks, the fraction-bit operations require fewer bits, which greatly reduces the use of on-chip computing resources.
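The shared-exponent representation of FIG. 2B can be illustrated with a minimal Python sketch. The function names, the 8-bit fraction width, and the rounding and clipping rules are assumptions made for illustration only; the application does not specify them.

```python
import numpy as np

def to_block_floating_point(block, fraction_bits=8):
    """Quantize a float array into signed integer fractions (Lm) that share
    one exponent (Le). fraction_bits is an assumed width, not from the source."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros(block.shape, dtype=np.int32), 0
    # Pick the shared exponent so the largest magnitude fits the fraction range.
    shared_exponent = int(np.ceil(np.log2(max_abs))) - (fraction_bits - 1)
    fractions = np.round(block / 2.0 ** shared_exponent).astype(np.int32)
    limit = 2 ** (fraction_bits - 1) - 1
    return np.clip(fractions, -limit, limit), shared_exponent

def from_block_floating_point(fractions, shared_exponent):
    """Rebuild approximate float values from fractions and the shared exponent."""
    return fractions.astype(np.float64) * 2.0 ** shared_exponent

# One channel of one frame: every pixel stores only a fraction, Le is stored once.
block = np.array([[0.5, -0.25], [0.125, 0.75]])
fractions, shared_exponent = to_block_floating_point(block)
restored = from_block_floating_point(fractions, shared_exponent)
```

With per-pixel exponents, a block of P pixels stores P exponents; with a shared exponent it stores one, which is the storage saving the paragraph above describes.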

Referring to FIG. 3, FIG. 3 provides a three-dimensional convolutional neural network computing device. As shown in FIG. 3, the computing device includes: a fraction processing block 301, an exponent kernel 302, a three-dimensional convolution kernel 303, a frame superposition block 304, a channel superposition block 305, and an output buffer 306;

the fraction processing block 301 is configured to receive the fraction data Lm of a three-dimensional picture data block, split the fraction data Lm into three pieces of two-dimensional fraction data, and input the three pieces of two-dimensional fraction data into the three-dimensional convolution kernel 303;

the exponent kernel 302 is configured to receive the shared exponent data Le of the three-dimensional picture data block and input the exponent data Le to the three-dimensional convolution kernel 303;

the three-dimensional convolution kernel 303 is configured to perform a two-dimensional convolution operation on each of the three pieces of two-dimensional fraction data to obtain three fraction convolution results, split the exponent data Le into three pieces of two-dimensional exponent data and perform two-dimensional convolution operations on them to obtain three exponent convolution results, and input the three exponent convolution results and the three fraction convolution results to the frame superposition block 304;

the frame superposition block 304 is configured to perform frame superposition processing on the three exponent convolution results and the three fraction convolution results to obtain superposed data, and input the superposed data to the channel superposition block 305;

the channel superposition block 305 is configured to perform channel superposition processing on the superposed data to obtain a preliminary convolution result, and input the preliminary convolution result to the output buffer 306.

The technical solution provided by this application processes only floating-point block data, and it requires the exponent data of each floating-point block to be shared exponent data. This allows the three-dimensional convolution operation to be split into three two-dimensional convolution operations, after which frame superposition and channel superposition are performed to complete the three-dimensional convolution. Because the floating-point blocks hold a small amount of data, they occupy little memory, so the small cache of an FPGA can accommodate the above structure. In addition, the three two-dimensional convolution operations run in parallel, which saves computation time, so the solution has the advantage of a short computation time.
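The decomposition described above can be checked numerically. The following Python sketch assumes a kernel depth of three frames, a (channels, frames, height, width) layout, and plain floating-point values in place of block floating-point data; the function names are hypothetical. It compares a direct three-dimensional convolution with three two-dimensional convolutions followed by frame superposition and then channel superposition.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2D 'valid' cross-correlation, the usual CNN convolution."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def conv3d_by_frames(volume, kernel):
    """volume: (channels, frames, H, W); kernel: (channels, depth, kh, kw)."""
    channels, frames = volume.shape[:2]
    depth = kernel.shape[1]
    outputs = []
    for t in range(frames - depth + 1):
        channel_sum = 0.0
        for c in range(channels):
            # One independent 2D convolution per frame in the window.
            per_frame = [conv2d_valid(volume[c, t + d], kernel[c, d])
                         for d in range(depth)]
            frame_sum = sum(per_frame)             # frame superposition
            channel_sum = channel_sum + frame_sum  # channel superposition
        outputs.append(channel_sum)
    return np.stack(outputs)

def conv3d_direct(volume, kernel):
    """Reference: direct 3D 'valid' cross-correlation summed over channels."""
    _, depth, kh, kw = kernel.shape
    _, frames, height, width = volume.shape
    out = np.zeros((frames - depth + 1, height - kh + 1, width - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                window = volume[:, t:t + depth, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(window * kernel)
    return out

rng = np.random.default_rng(0)
volume = rng.normal(size=(2, 5, 6, 6))   # 2 channels, 5 frames
kernel = rng.normal(size=(2, 3, 3, 3))   # depth of 3 frames
same = np.allclose(conv3d_by_frames(volume, kernel), conv3d_direct(volume, kernel))
```

The three per-frame convolutions inside the window do not depend on one another, which is why they can run in parallel in hardware.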

Optionally, the three-dimensional convolution kernel includes three two-dimensional fraction convolution kernels and three two-dimensional exponent convolution kernels. Each two-dimensional fraction convolution kernel is configured to perform a two-dimensional convolution operation on one piece of two-dimensional fraction data, and each two-dimensional exponent convolution kernel is configured to perform a two-dimensional convolution operation on one piece of two-dimensional exponent data.

Optionally, the computing device may further include a pooling processing module and an output module. The pooling processing module is configured to perform pooling processing on the preliminary convolution result to obtain a final convolution result, and output the final convolution result through the output module.

Optionally, the fraction processing block 301 may include: a buffer 3011, a first frame buffer 3012, a second frame buffer 3013, a first fraction buffer 3014, a second fraction buffer 3015, and a third fraction buffer 3016. The buffer 3011 is connected to the first frame buffer 3012 and to the first fraction buffer 3014; the first frame buffer 3012 is connected to the second frame buffer 3013 and also to the second fraction buffer 3015; the second frame buffer 3013 is connected to the third fraction buffer 3016; and the first fraction buffer 3014, the second fraction buffer 3015, and the third fraction buffer 3016 each output two-dimensional fraction data.
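One plausible reading of this buffer topology is a two-stage frame delay chain that presents three consecutive frames at once. The sketch below is written under that assumption: the application gives only the connections, so the timing behavior, the class name, and the method name are hypothetical.

```python
class FractionFrameChain:
    """Software model of a two-stage frame delay chain (assumed behavior)."""

    def __init__(self):
        self.first_frame_buffer = None    # plays the role of 3012: frame t-1
        self.second_frame_buffer = None   # plays the role of 3013: frame t-2

    def push(self, frame):
        """Accept frame t and return (frame t, frame t-1, frame t-2), the
        contents of the three fraction buffers, or None while filling."""
        window = (frame, self.first_frame_buffer, self.second_frame_buffer)
        # Shift the chain: 3012 -> 3013, incoming frame -> 3012.
        self.second_frame_buffer = self.first_frame_buffer
        self.first_frame_buffer = frame
        if window[1] is None or window[2] is None:
            return None
        return window

chain = FractionFrameChain()
windows = [chain.push(name) for name in ["f0", "f1", "f2", "f3"]]
```

Under this reading each incoming frame is written once and reused for three consecutive windows, so only two extra frames need to be held on chip.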

Optionally, the exponent kernel 302 may include a buffer 3021 and an exponent buffer 3022. The buffer 3021 is configured to receive three-dimensional fraction data, and the exponent buffer 3022 is configured to buffer the three-dimensional fraction data and split it into three pieces of two-dimensional fraction data for output.

The technical solution provided by this application is based on block floating-point arithmetic, which reduces bit width and computational complexity and thereby reduces the use of on-chip memory resources and hardware computing resources. Based on the new three-dimensional convolutional neural network hardware architecture, it offers faster processing speed and higher accuracy and meets the requirements of real-time processing. Compared with existing CPU and GPU implementations, it can make behavior judgments at lower power consumption while still producing correct results.

Referring to FIG. 4, FIG. 4 provides a method for a three-dimensional convolution operation. The method includes:

Step S401: receive the fraction data Lm of a three-dimensional picture data block, split the fraction data Lm into three pieces of two-dimensional fraction data, and input the three pieces of two-dimensional fraction data into the three-dimensional convolution kernel;

Step S402: receive the shared exponent data Le of the three-dimensional picture data block, and input the exponent data Le to the three-dimensional convolution kernel;

Step S403: perform a two-dimensional convolution operation on each of the three pieces of two-dimensional fraction data to obtain three fraction convolution results, and split the exponent data Le into three pieces of two-dimensional exponent data and perform two-dimensional convolution operations on them to obtain three exponent convolution results, yielding three exponent convolution results and three fraction convolution results;

Step S404: perform frame superposition processing on the three exponent convolution results and the three fraction convolution results to obtain superposed data;

Step S405: perform channel superposition processing on the superposed data to obtain a preliminary convolution result, and input the preliminary convolution result to the output buffer.
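Steps S403 to S405 can be sketched in Python under strong assumptions that the application does not state: each frame and each kernel slice carries one shared integer exponent, the exponent of a fraction convolution result is the sum of the frame exponent and the kernel exponent, and superposition aligns results to the smallest exponent by exact left shifts. All function names are hypothetical, and a hardware design might align and truncate differently.

```python
import numpy as np

def conv2d_int(fractions, kernel_fractions):
    """2D 'valid' convolution on integer fractions only (no exponents)."""
    kh, kw = kernel_fractions.shape
    out = np.zeros((fractions.shape[0] - kh + 1,
                    fractions.shape[1] - kw + 1), dtype=np.int64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = fractions[i:i + kh, j:j + kw].astype(np.int64)
            out[i, j] = np.sum(window * kernel_fractions)
    return out

def superpose(results):
    """Add (fraction array, exponent) pairs after aligning them to the
    smallest exponent; the left shifts are exact for integers (assumption)."""
    common = min(exponent for _, exponent in results)
    total = sum(fr * (1 << (exponent - common)) for fr, exponent in results)
    return total, common

def bfp_conv3d_window(frames, frame_exps, kernels, kernel_exps):
    # S403: three 2D fraction convolutions; exponents of products add.
    results = [(conv2d_int(f, k), fe + ke)
               for f, fe, k, ke in zip(frames, frame_exps, kernels, kernel_exps)]
    # S404: frame superposition.
    return superpose(results)

frames = [np.array([[1, 2], [3, 4]]), np.array([[0, 1], [1, 0]]),
          np.array([[2, 2], [2, 2]])]
kernels = [np.array([[1, 0], [0, 1]]), np.array([[1, 1], [1, 1]]),
           np.array([[1, -1], [1, -1]])]
channel_result = bfp_conv3d_window(frames, [-2, -3, -2], kernels, [-1, -1, -2])
# S405: channel superposition over per-channel results (same channel reused here).
preliminary = superpose([channel_result, channel_result])
```

The value represented by a result is its integer fraction multiplied by two raised to the returned exponent, so all arithmetic before that final scaling stays in integers.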

An embodiment of the present application further provides a computer storage medium that stores a computer program for electronic data interchange, and the computer program causes a computer to execute some or all of the steps of any three-dimensional convolution operation method described in the foregoing method embodiments.

An embodiment of the present application further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any three-dimensional convolution operation method described in the foregoing method embodiments.

It should be noted that, for simplicity of description, the foregoing method embodiments are each described as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by this application.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative.

In addition, the processors and chips in the embodiments of the present application may be integrated in one processing unit, may exist separately as physical units, or two or more pieces of hardware may be integrated in one unit. The computer-readable storage medium or computer-readable program may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

A person of ordinary skill in the art will understand that all or some of the steps in the methods of the foregoing embodiments may be completed by a program instructing the related hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.

The embodiments of the present application have been described in detail above, and specific examples have been used to explain the principles and implementations of the present application. The description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims

A three-dimensional convolutional neural network computing device, characterized in that the computing device includes: a fraction processing block, an exponential kernel, a three-dimensional convolution kernel, a frame superposition block, a channel superposition block, and an output buffer;

The score processing block is used to receive the score data Lm of the three-dimensional picture data block, divide the score data Lm into three two-dimensional score data, and input the three two-dimensional score data into the three-dimensional convolution kernel;

The exponential kernel is used to receive the shared exponential data Le of the three-dimensional image data block, and input the exponential data Le to the three-dimensional convolution kernel;

The three-dimensional convolution kernel is used to perform three-dimensional convolution operations on the three two-dimensional score data to obtain three fractional convolution operation results, divide the index data Le into three two-dimensional index data and perform three-dimensional convolution operations on them to obtain three exponential convolution operation results, and input the three exponential convolution operation results and the three fractional convolution operation results into the frame superposition block;

The frame superposition block is used to perform frame superposition processing on the three exponential convolution operation results and the three fractional convolution operation results to obtain superimposed data, and input the superimposed data to the channel superposition block;

The channel superposition block is used to perform channel superposition processing on the superimposed data to obtain a preliminary convolution result, and input the preliminary convolution result to the output buffer.
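The data flow recited in this claim (three two-dimensional convolutions, then frame superposition, then channel superposition) can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: it works on ordinary floating-point values and does not model the separate score (Lm) and index (Le) data paths, the array layout `(channels, 3, H, W)` is assumed, and the function names `conv2d_valid`, `conv3d_by_frames`, and `conv3d_direct` are hypothetical and do not appear in the application.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2D 'valid' cross-correlation, the operation CNNs usually call convolution."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out

def conv3d_by_frames(block, weights):
    """Assumed layout: block is (channels, 3, H, W), weights is (channels, 3, kh, kw)."""
    channels, frames = block.shape[:2]
    per_channel = []
    for c in range(channels):
        # Three 2D convolutions, one per frame of the data block.
        frame_results = [conv2d_valid(block[c, f], weights[c, f])
                         for f in range(frames)]
        # Frame superposition: add the per-frame results.
        per_channel.append(sum(frame_results))
    # Channel superposition: add the per-channel results.
    return sum(per_channel)

def conv3d_direct(block, weights):
    """Reference: direct 3D convolution with a kernel as deep as the block."""
    _, _, h, w = block.shape
    _, _, kh, kw = weights.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(block[:, :, y:y + kh, x:x + kw] * weights)
    return out
```

When the kernel depth equals the number of frames, the frame-wise decomposition and the direct three-dimensional convolution agree up to floating-point rounding, which is why the sum over frames followed by the sum over channels reproduces a full three-dimensional convolution result.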

The device according to claim 1, wherein the three-dimensional convolution kernel comprises: three two-dimensional fractional convolution kernels and three two-dimensional exponential convolution kernels, wherein:

The two-dimensional fractional convolution kernel is used for a two-dimensional convolution operation of two-dimensional fraction data,

The two-dimensional exponential convolution kernel is used to perform a two-dimensional convolution operation of two-dimensional exponential data.

The device according to claim 1 or 2, further comprising: a pooling processing module and an output module;

The pooling processing module is configured to perform pooling processing on the preliminary convolution result to obtain a final convolution result, and output the final convolution result through an output module.
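The claim does not specify the kind of pooling. As a hedged illustration only, the sketch below assumes non-overlapping 2x2 max pooling applied to a single two-dimensional preliminary convolution result; the function name `max_pool2d` and the `size` parameter are hypothetical.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling (an assumed pooling type, not stated in the claim)."""
    h, w = feature_map.shape
    # Drop trailing rows and columns that do not fill a complete window.
    h_trim, w_trim = h - h % size, w - w % size
    trimmed = feature_map[:h_trim, :w_trim]
    # Split each axis into (number of windows, window size) and reduce per window.
    windows = trimmed.reshape(h_trim // size, size, w_trim // size, size)
    return windows.max(axis=(1, 3))
```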

The device according to claim 1 or 2, wherein the score processing block comprises: a buffer, a frame first buffer, a frame second buffer, a first score buffer, a second score buffer, and a third score buffer; wherein,

The buffer is connected to the frame first buffer and the first score buffer, the frame first buffer is connected to the frame second buffer, the frame first buffer is also connected to the second score buffer, and the frame second buffer is connected to the third score buffer. The first score buffer, the second score buffer, and the third score buffer each output two-dimensional score data.
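The connections recited here resemble a two-stage frame delay line. The following sketch models that reading under explicit assumptions: one frame enters per step, each frame buffer delays its input by exactly one step, and nothing is output until both frame buffers are filled. The claim states only the connections, so this timing is an assumption, and the class name `ScoreFrameBuffers` and the method `push` are hypothetical.

```python
class ScoreFrameBuffers:
    """Toy model of the buffer chain; the timing behaviour is assumed."""

    def __init__(self):
        self.frame_first = None   # frame received one step ago
        self.frame_second = None  # frame received two steps ago

    def push(self, frame):
        """Feed one 2D frame and return the three score buffer outputs.

        Returns (first, second, third) score buffer contents, or None
        while the frame buffers are still filling.
        """
        # The first score buffer sees the incoming frame, the second sees the
        # frame first buffer, and the third sees the frame second buffer.
        outputs = (frame, self.frame_first, self.frame_second)
        # Shift the delay line by one frame.
        self.frame_second = self.frame_first
        self.frame_first = frame
        if outputs[1] is None or outputs[2] is None:
            return None
        return outputs
```

Under these assumptions, every step after the first two yields three consecutive frames at once, which matches the three two-dimensional score data that are input to the three-dimensional convolution kernel.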

The device according to claim 1 or 2, wherein the exponential kernel comprises: a buffer and an index buffer;

The buffer is used to receive three-dimensional score data;

The index buffer is used to buffer the three-dimensional score data and to divide the three-dimensional score data into three two-dimensional score data for output.

A method for performing a three-dimensional convolution operation by using the device according to any one of claims 1-5, wherein the method includes:

Receive the score data Lm of the three-dimensional picture data block, divide the score data Lm into three two-dimensional score data, and input the three two-dimensional score data into the three-dimensional convolution kernel;

Receive the shared index data Le of the three-dimensional picture data block, and input the index data Le to the three-dimensional convolution kernel;

Perform a two-dimensional convolution operation on each of the three two-dimensional score data to obtain three fractional convolution operation results, and divide the index data Le into three two-dimensional index data and perform a two-dimensional convolution operation on each of them to obtain three exponential convolution operation results;

Perform frame superposition processing on three exponential convolution operation results and three fractional convolution operation results to obtain superimposed data;

Perform channel superposition processing on the superimposed data to obtain a preliminary convolution result, and input the preliminary convolution result to the output buffer.

The method according to claim 6, further comprising:

The preliminary convolution result is pooled to obtain the final convolution result, and the final convolution result is output through the output module.

The method according to claim 6 or 7, wherein dividing the score data Lm into three two-dimensional score data and inputting the three two-dimensional score data into the three-dimensional convolution kernel specifically includes:

The score data Lm is stored in three frame buffers, and the three frame buffers each input two-dimensional score data to the three-dimensional convolution kernel.

A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-4.

A computer program product, characterized in that the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method according to any one of claims 1-4.