
WO2019136758A1 - Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal - Google Patents

Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal

Info

Publication number
WO2019136758A1
Authority
WO
WIPO (PCT)
Prior art keywords
hardware
artificial intelligence
processing device
intelligence processing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/072672
Other languages
English (en)
French (fr)
Inventor
肖梦秋 (XIAO, Mengqiu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Priority to PCT/CN2018/072672 priority Critical patent/WO2019136758A1/zh
Priority to CN201880002759.XA priority patent/CN109496319A/zh
Publication of WO2019136758A1 publication Critical patent/WO2019136758A1/zh
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Definitions

  • the present invention relates to the technical field of software processing, and in particular, to a hardware optimization method, system, storage medium, and terminal for an artificial intelligence processing device.
  • Deep learning stems from the study of artificial neural networks.
  • a multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representation attribute categories or features to discover distributed feature representations of data.
  • Deep learning is a method in machine learning based on representation learning of data. An observation (e.g., an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of particular shapes, and so on. Using certain specific representations makes it easier to learn tasks from examples (e.g., face recognition or facial expression recognition).
  • The benefit of deep learning is that efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction replace manually engineered features.
  • As with other machine learning methods, deep learning models fall into supervised and unsupervised variants, and the models built under different learning frameworks differ considerably: convolutional neural networks (CNNs) are a supervised deep learning model, while deep belief networks (DBNs) are an unsupervised one.
  • At present, the CNN has become one of the research hotspots in many scientific fields, especially pattern classification: because the network avoids complicated image pre-processing and can take the raw image directly as input, it has been widely applied.
  • the basic structure of a CNN includes two layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship to other features is also fixed. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on that plane share equal weights.
  • the feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, which gives the feature maps shift invariance. In addition, because the neurons on one mapping plane share weights, the number of free network parameters is reduced.
  • Each convolutional layer in the convolutional neural network is followed by a computational layer for local averaging and secondary extraction. This distinctive two-stage feature extraction structure reduces the feature resolution.
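The two-stage structure just described (convolution for feature extraction, then local averaging for secondary extraction) can be illustrated with a minimal NumPy sketch; the image, kernel, and sizes below are invented for illustration and are not taken from the patent:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Feature extraction stage: slide one shared-weight kernel over the image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool2(x):
    """Secondary extraction stage: 2x2 local averaging halves the resolution."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                    # one shared 3x3 averaging kernel
fmap = conv2d_valid(image, kernel)                # 4x4 feature map
pooled = avg_pool2(fmap)                          # 2x2 map after local averaging
print(fmap.shape, pooled.shape)                   # (4, 4) (2, 2)
```

Note how weight sharing appears here: a single 3x3 kernel is reused at every position of the plane, which is exactly why the number of free parameters stays small.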
  • the CNN is mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature detection layer of a CNN learns from training data, explicit feature extraction is avoided when the CNN is used, and features are learned implicitly from the training data; moreover, because the neurons on the same feature mapping plane share weights, the network can learn in parallel, which is a major advantage of convolutional networks over fully connected networks.
  • with its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing. Its layout is closer to an actual biological neural network, and weight sharing reduces the complexity of the network; in particular, the fact that a multidimensional input image can be fed directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
  • an object of the present invention is to provide a hardware optimization method and system for an artificial intelligence processing device, a storage medium, and a terminal, which perform hardware optimization on the deep learning data flow graph of a deep learning algorithm so that it can be implemented efficiently and in an orderly manner on the hardware.
  • the present invention provides a hardware optimization method for an artificial intelligence processing device, including the following steps: performing a design space search of the hardware, based on a deep learning data flow graph of a deep learning network model, to obtain hardware requirement information; mapping the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information; and generating, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device.
  • the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input to the FPGA.
  • the FPGA includes a convolution module, a deconvolution module, and a shared cache module; the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
  • the deep learning network model is a model trained with TensorFlow.
  • the present invention provides a hardware optimization system for an artificial intelligence processing device, including a search module, a mapping module, and a generation module;
  • the searching module is configured to perform a design space search of the hardware to obtain hardware requirement information based on the deep learning data flow graph of the deep learning network model;
  • the mapping module is configured to map the hardware requirement information on the artificial intelligence processing device to obtain hardware allocation information
  • the generating module is configured to generate a hardware bit stream input to the artificial intelligence processing device based on the hardware allocation information.
  • the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input to the FPGA.
  • the FPGA includes a convolution module, a deconvolution module, and a shared cache module; the mapping module implements the hardware requirement information based on the convolution module, the deconvolution module, and the shared cache module.
  • the deep learning network model is a model trained with TensorFlow.
  • the present invention provides a storage medium having stored thereon a computer program that, when executed by a processor, implements the above-described artificial intelligence processing device hardware optimization method.
  • the present invention provides a terminal, including: a processor and a memory;
  • the memory is for storing a computer program
  • the processor is configured to execute the computer program stored in the memory to cause the terminal to execute the artificial intelligence processing device hardware optimization method.
  • the hardware optimization method and system, storage medium, and terminal of the present invention have the following beneficial effects: (1) hardware optimization of the deep learning data flow graph of the deep learning algorithm allows it to be implemented efficiently and in an orderly manner on hardware; (2) hardware resources are fully utilized, improving the efficiency with which the artificial intelligence processing device is used; (3) the invention is highly practical.
  • FIG. 1 is a flow chart of an embodiment of the hardware optimization method for an artificial intelligence processing device of the present invention;
  • FIG. 2 is a schematic structural diagram of an embodiment of the hardware optimization system for an artificial intelligence processing device of the present invention;
  • FIG. 3 is a schematic structural diagram of an embodiment of the terminal of the present invention.
  • the hardware optimization method and system, storage medium, and terminal of the present invention perform hardware optimization on the deep learning data flow graph of the deep learning algorithm, so that it can be implemented efficiently and in an orderly manner on hardware, thereby fully utilizing hardware resources and improving the efficiency with which the artificial intelligence processing device is used.
  • the hardware optimization method of the artificial intelligence processing device of the present invention includes the following steps:
  • Step S1: based on the deep learning data flow graph of the deep learning network model, perform a design space search of the hardware to obtain hardware requirement information.
  • the deep learning network model adopts a Tensorflow training model.
  • TensorFlow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle: Tensor means an N-dimensional array, and Flow means computation based on a data flow graph, with tensors flowing from one end of the graph to the other.
  • TensorFlow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
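The flow of values through a data flow graph can be sketched without any framework; the graph, node names, and operations below are invented purely to illustrate the idea:

```python
# A data flow graph as an adjacency structure: each node names its inputs.
# Evaluation walks the graph so values "flow" from the inputs to the output.
graph = {
    "x":   {"op": "input"},
    "w":   {"op": "input"},
    "mul": {"op": "mul", "inputs": ["x", "w"]},
    "add": {"op": "add", "inputs": ["mul", "x"]},
}

def evaluate(graph, node, feed):
    """Recursively evaluate one node, pulling values from its input edges."""
    spec = graph[node]
    if spec["op"] == "input":
        return feed[node]
    a, b = (evaluate(graph, n, feed) for n in spec["inputs"])
    return a * b if spec["op"] == "mul" else a + b

print(evaluate(graph, "add", {"x": 3.0, "w": 2.0}))  # 3*2 + 3 = 9.0
```

A framework like TensorFlow does the same walk over a much richer graph, with tensors rather than scalars on the edges.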
  • design space exploration (DSE) is used to find a processor architecture that satisfies the design constraints; through this design space search, the hardware requirement information of the deep learning network model is obtained.
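The text does not specify the search algorithm here, so the following is only a hedged sketch of what a hardware design space search could look like: an exhaustive scan over an invented two-parameter space (convolution-engine parallelism and buffer depth) under a toy resource budget, keeping the feasible point with the lowest modeled latency. The cost models are placeholders, not the patent's.

```python
import itertools

# Hypothetical design space: parallelism of the convolution engine and
# on-chip buffer depth. The resource and latency models are invented;
# a real DSE would derive them from the deep-learning data flow graph.
PE_COUNTS = [1, 2, 4, 8, 16]
BUFFER_KB = [32, 64, 128, 256]
LUT_BUDGET = 100_000

def resources(pes, buf_kb):
    return 5_000 * pes + 100 * buf_kb        # toy LUT cost model

def latency(pes, buf_kb):
    return 1_000.0 / pes + 50.0 / buf_kb     # toy cycle-count model

def design_space_search():
    best = None
    for pes, buf in itertools.product(PE_COUNTS, BUFFER_KB):
        if resources(pes, buf) > LUT_BUDGET:  # prune infeasible points
            continue
        cand = (latency(pes, buf), pes, buf)
        if best is None or cand < best:
            best = cand
    return best

lat, pes, buf = design_space_search()
print(pes, buf)  # fastest configuration that fits the budget
```

Realistic explorers replace the exhaustive loop with heuristics (the original description mentions a multi-objective evolutionary core), but the feasibility-check-then-minimize shape stays the same.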
  • Step S2: map the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information.
  • that is, the hardware requirement information is satisfied by the artificial intelligence processing device, thereby obtaining the hardware allocation information of the artificial intelligence processing device.
  • the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input to the FPGA.
  • the FPGA includes a convolution module, a deconvolution module, and a shared cache module; when performing mapping, the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
  • Step S3: generate, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device.
  • a hardware bit stream of the artificial intelligence processing device is generated according to the hardware allocation information.
  • the hardware bit stream is input to the artificial intelligence processing device, and the artificial intelligence processing device can be used in a pipeline manner to achieve maximum utilization of the artificial intelligence processing device.
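Taken together, steps S1–S3 form a pipeline from data flow graph to bitstream. Every function name and data structure in this sketch is hypothetical; a real implementation would depend on the FPGA vendor toolchain:

```python
# End-to-end shape of the optimization flow (steps S1-S3). Every function
# body is a placeholder standing in for toolchain-specific machinery.

def search_design_space(dataflow_graph):            # step S1
    """Derive hardware requirements from the deep-learning data flow graph."""
    return {"conv_units": sum(1 for n in dataflow_graph if n == "conv"),
            "buffer_kb": 64}

def map_to_device(requirements, device_resources):  # step S2
    """Map requirements onto the device's resources to get an allocation."""
    assert requirements["conv_units"] <= device_resources["conv_units"]
    return {"allocated": requirements}

def generate_bitstream(allocation):                 # step S3
    """Emit the configuration stream that is loaded into the FPGA."""
    return b"BITSTREAM:" + repr(allocation["allocated"]).encode()

graph = ["conv", "conv", "deconv"]                  # toy data flow graph
req = search_design_space(graph)
alloc = map_to_device(req, {"conv_units": 8, "buffer_kb": 512})
bits = generate_bitstream(alloc)
print(req["conv_units"], len(bits) > 0)
```

The pipelining the patent mentions comes from the fact that, once configured, the device can process successive inputs through these allocated modules back to back.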
  • the artificial intelligence processing device hardware optimization system of the present invention includes a search module 21, a mapping module 22, and a generating module 23 that are sequentially connected.
  • the search module 21 is configured to perform a design space search of the hardware to obtain hardware requirement information based on the deep learning data flow graph of the deep learning network model.
  • the deep learning network model adopts a Tensorflow training model.
  • TensorFlow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle: Tensor means an N-dimensional array, and Flow means computation based on a data flow graph, with tensors flowing from one end of the graph to the other.
  • TensorFlow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
  • design space exploration (DSE) is used to find a processor architecture that satisfies the design constraints; through this design space search, the hardware requirement information of the deep learning network model is obtained.
  • the mapping module 22 is configured to map the hardware requirement information on the artificial intelligence processing device to obtain hardware allocation information.
  • that is, the hardware requirement information is satisfied by the artificial intelligence processing device, thereby obtaining the hardware allocation information of the artificial intelligence processing device.
  • the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input to the FPGA.
  • the FPGA includes a convolution module, a deconvolution module, and a shared cache module; when performing mapping, the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
  • the generating module 23 is configured to generate a hardware bit stream input to the artificial intelligence processing device based on the hardware allocation information.
  • a hardware bit stream of the artificial intelligence processing device is generated according to the hardware allocation information.
  • the hardware bit stream is input to the artificial intelligence processing device, and the artificial intelligence processing device can be used in a pipeline manner to achieve maximum utilization of the artificial intelligence processing device.
  • each module of the above system is only a division of logical functions, and the actual implementation may be integrated into one physical entity in whole or in part, or may be physically separated.
  • these modules may all be implemented in the form of software invoked by a processing element; they may all be implemented in hardware; or some modules may be implemented as software invoked by a processing element while other modules are implemented in hardware.
  • the x module may be a separately arranged processing element, or it may be integrated into a chip of the above device; it may also be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking and executing the functions of the x module.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated or implemented independently.
  • the processing elements described herein can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or an instruction in a form of software.
  • the above modules may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
  • the processing component may be a general-purpose processor, such as a central processing unit (CPU) or other processor that can call the program code.
  • these modules can be integrated together and implemented in the form of a system-on-a-chip (SoC).
  • the storage medium of the present invention stores a computer program, and when the program is executed by the processor, the above-described artificial intelligence processing device hardware optimization method is implemented.
  • the storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the terminal of the present invention includes a processor 31 and a memory 32.
  • the memory 32 is used to store a computer program.
  • the memory 32 includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the processor 31 is connected to the memory 32 and is configured to execute the computer program stored in the memory 32, so that the terminal performs the artificial intelligence processing device hardware optimization method.
  • the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the hardware optimization method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention perform hardware optimization on the deep learning data flow graph of the deep learning algorithm so that it can be implemented efficiently and in an orderly manner on hardware; hardware resources are fully utilized, improving the efficiency with which the artificial intelligence processing device is used; and the invention is highly practical. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a hardware optimization method and system for an artificial intelligence processing device, a storage medium, and a terminal, including the following steps: based on a deep learning data flow graph of a deep learning network model, performing a design space search of the hardware to obtain hardware requirement information; mapping the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information; and generating, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device. By performing hardware optimization on the deep learning data flow graph of the deep learning algorithm, the method, system, storage medium, and terminal of the present invention enable it to be implemented efficiently and in an orderly manner on hardware.

Description

Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal
Technical Field
The present invention relates to the technical field of software processing, and in particular to a hardware optimization method and system for an artificial intelligence processing device, a storage medium, and a terminal.
Background Art
The concept of deep learning stems from research on artificial neural networks. A multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data.
Deep learning is a method in machine learning based on representation learning of data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of particular shapes, and so on. Using certain specific representations makes it easier to learn tasks from examples (for example, face recognition or facial expression recognition). The benefit of deep learning is that efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction replace manual feature acquisition.
Like other machine learning methods, deep machine learning methods are divided into supervised and unsupervised learning, and the learning models built under different learning frameworks are quite different. For example, convolutional neural networks (CNNs) are a deep machine learning model under supervised learning, while deep belief networks (DBNs) are a machine learning model under unsupervised learning.
At present, CNNs have become one of the research hotspots in many scientific fields, especially pattern classification: because the network avoids complicated image pre-processing and can directly take the raw image as input, it has been widely applied. In general, the basic structure of a CNN includes two layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature has been extracted, its positional relationship to other features is also fixed. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on the plane share equal weights. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, giving the feature maps shift invariance. In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer in a convolutional neural network is followed by a computational layer for local averaging and secondary extraction; this distinctive two-stage feature extraction structure reduces the feature resolution.
CNNs are mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature detection layer of a CNN learns from training data, explicit feature extraction is avoided when a CNN is used, and features are learned implicitly from the training data; furthermore, because the neurons on the same feature mapping plane share weights, the network can learn in parallel, which is a major advantage of convolutional networks over fully connected networks. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to an actual biological neural network, weight sharing reduces the complexity of the network, and in particular the fact that a multidimensional input image can be fed directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
Therefore, how to optimize deep learning algorithms for hardware so that they can be implemented rapidly and in an orderly manner on hardware has become one of the current hot research topics.
Summary of the Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a hardware optimization method and system for an artificial intelligence processing device, a storage medium, and a terminal, which perform hardware optimization on the deep learning data flow graph of a deep learning algorithm so that it can be implemented efficiently and in an orderly manner on hardware.
To achieve the above and other related objects, the present invention provides a hardware optimization method for an artificial intelligence processing device, including the following steps: based on a deep learning data flow graph of a deep learning network model, performing a design space search of the hardware to obtain hardware requirement information; mapping the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information; and generating, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input into the FPGA.
In an embodiment of the present invention, the FPGA includes a convolution module, a deconvolution module, and a shared cache module; the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
In an embodiment of the present invention, the deep learning network model is a model trained with TensorFlow.
Correspondingly, the present invention provides a hardware optimization system for an artificial intelligence processing device, including a search module, a mapping module, and a generation module;
the search module is configured to perform a design space search of the hardware, based on the deep learning data flow graph of the deep learning network model, to obtain hardware requirement information;
the mapping module is configured to map the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information;
the generation module is configured to generate, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input into the FPGA.
In an embodiment of the present invention, the FPGA includes a convolution module, a deconvolution module, and a shared cache module; the mapping module implements the hardware requirement information based on the convolution module, the deconvolution module, and the shared cache module.
In an embodiment of the present invention, the deep learning network model is a model trained with TensorFlow.
The present invention provides a storage medium on which a computer program is stored; when the program is executed by a processor, it implements the above hardware optimization method for an artificial intelligence processing device.
Finally, the present invention provides a terminal, including a processor and a memory;
the memory is configured to store a computer program;
the processor is configured to execute the computer program stored in the memory, so that the terminal performs the above hardware optimization method for an artificial intelligence processing device.
As described above, the hardware optimization method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention have the following beneficial effects:
(1) hardware optimization of the deep learning data flow graph of the deep learning algorithm allows it to be implemented efficiently and in an orderly manner on hardware;
(2) hardware resources are fully utilized, improving the efficiency with which the artificial intelligence processing device is used;
(3) the invention is highly practical.
Brief Description of the Drawings
FIG. 1 is a flow chart of an embodiment of the hardware optimization method for an artificial intelligence processing device of the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the hardware optimization system for an artificial intelligence processing device of the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of the terminal of the present invention.
Description of Reference Numerals
21    Search module
22    Mapping module
23    Generation module
31    Processor
32    Memory
Detailed Description of the Embodiments
The embodiments of the present invention are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present invention.
It should be noted that the figures provided in this embodiment merely illustrate the basic concept of the present invention in a schematic way; the figures therefore show only the components related to the present invention, rather than being drawn according to the number, shape, and size of the components in an actual implementation. In an actual implementation, the form, quantity, and proportion of each component can be changed at will, and the component layout may also be more complex.
The hardware optimization method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention perform hardware optimization on the deep learning data flow graph of a deep learning algorithm so that it can be implemented efficiently and in an orderly manner on hardware, thereby fully utilizing hardware resources and improving the efficiency with which the artificial intelligence processing device is used.
As shown in FIG. 1, in one embodiment, the hardware optimization method for an artificial intelligence processing device of the present invention includes the following steps:
Step S1: based on the deep learning data flow graph of a deep learning network model, perform a design space search of the hardware to obtain hardware requirement information.
In an embodiment of the present invention, the deep learning network model is a model trained with TensorFlow. TensorFlow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle. Tensor means an N-dimensional array, and Flow means computation based on a data flow graph: in TensorFlow, tensors flow from one end of the flow graph to the other. TensorFlow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
Design space exploration (DSE) is used to find a processor architecture that satisfies the design constraints. Specifically, the DSE takes a multi-objective evolutionary algorithm as its core, sharply reduces the design space based on the notion of parameter dependency, and uses a spatial threshold technique to increase the adaptability of the strategy. Experimental comparison with a sensitivity-analysis search strategy shows that this strategy obtains better configurations while significantly shortening the search time.
Specifically, the hardware requirement information of the deep learning network model is obtained through the design space search.
Step S2: map the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information.
That is, the hardware requirement information is satisfied by the artificial intelligence processing device, thereby obtaining the hardware allocation information of the artificial intelligence processing device.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input into the FPGA. The FPGA includes a convolution module, a deconvolution module, and a shared cache module; during mapping, the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
Step S3: based on the hardware allocation information, generate a hardware bit stream that is input into the artificial intelligence processing device.
Specifically, the hardware bit stream of the artificial intelligence processing device is generated according to the hardware allocation information. Inputting the hardware bit stream into the artificial intelligence processing device makes it possible to use the artificial intelligence processing device in a pipelined manner, maximizing the utilization of the artificial intelligence processing device.
As shown in FIG. 2, in one embodiment, the hardware optimization system for an artificial intelligence processing device of the present invention includes a search module 21, a mapping module 22, and a generation module 23 connected in sequence.
The search module 21 is configured to perform a design space search of the hardware, based on the deep learning data flow graph of a deep learning network model, to obtain hardware requirement information.
In an embodiment of the present invention, the deep learning network model is a model trained with TensorFlow. TensorFlow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle. Tensor means an N-dimensional array, and Flow means computation based on a data flow graph: in TensorFlow, tensors flow from one end of the flow graph to the other. TensorFlow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
Design space exploration (DSE) is used to find a processor architecture that satisfies the design constraints. Specifically, the DSE takes a multi-objective evolutionary algorithm as its core, sharply reduces the design space based on the notion of parameter dependency, and uses a spatial threshold technique to increase the adaptability of the strategy. Experimental comparison with a sensitivity-analysis search strategy shows that this strategy obtains better configurations while significantly shortening the search time.
Specifically, the hardware requirement information of the deep learning network model is obtained through the design space search.
The mapping module 22 is configured to map the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information.
That is, the hardware requirement information is satisfied by the artificial intelligence processing device, thereby obtaining the hardware allocation information of the artificial intelligence processing device.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input into the FPGA. The FPGA includes a convolution module, a deconvolution module, and a shared cache module; during mapping, the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
The generation module 23 is configured to generate, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device.
Specifically, the hardware bit stream of the artificial intelligence processing device is generated according to the hardware allocation information. Inputting the hardware bit stream into the artificial intelligence processing device makes it possible to use the artificial intelligence processing device in a pipelined manner, maximizing the utilization of the artificial intelligence processing device.
It should be noted that the division of the modules of the above system is merely a division of logical functions; in an actual implementation they may be wholly or partly integrated into one physical entity, or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element; they may all be implemented in hardware; or some modules may be implemented as software invoked by a processing element while other modules are implemented in hardware. For example, the x module may be a separately arranged processing element, or it may be integrated into a chip of the above device; it may also be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking and executing the functions of the x module. The implementation of the other modules is similar. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In the implementation process, each step of the above method, or each of the above modules, can be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The storage medium of the present invention stores a computer program which, when executed by a processor, implements the above hardware optimization method for an artificial intelligence processing device. Preferably, the storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
As shown in FIG. 3, in one embodiment, the terminal of the present invention includes a processor 31 and a memory 32.
The memory 32 is configured to store a computer program.
Preferably, the memory 32 includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
The processor 31 is connected to the memory 32 and is configured to execute the computer program stored in the memory 32, so that the terminal performs the above hardware optimization method for an artificial intelligence processing device.
Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the hardware optimization method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention perform hardware optimization on the deep learning data flow graph of a deep learning algorithm so that it can be implemented efficiently and in an orderly manner on hardware; hardware resources are fully utilized, improving the efficiency with which the artificial intelligence processing device is used; and the invention is highly practical. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes completed by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (10)

  1. A hardware optimization method for an artificial intelligence processing device, characterized by including the following steps:
    based on a deep learning data flow graph of a deep learning network model, performing a design space search of the hardware to obtain hardware requirement information;
    mapping the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information;
    based on the hardware allocation information, generating a hardware bit stream that is input into the artificial intelligence processing device.
  2. The hardware optimization method for an artificial intelligence processing device according to claim 1, characterized in that the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input into the FPGA.
  3. The hardware optimization method for an artificial intelligence processing device according to claim 2, characterized in that the FPGA includes a convolution module, a deconvolution module, and a shared cache module; the hardware requirement information is implemented based on the convolution module, the deconvolution module, and the shared cache module.
  4. The hardware optimization method for an artificial intelligence processing device according to claim 1, characterized in that the deep learning network model is a model trained with TensorFlow.
  5. A hardware optimization system for an artificial intelligence processing device, characterized by including a search module, a mapping module, and a generation module;
    the search module is configured to perform a design space search of the hardware, based on the deep learning data flow graph of the deep learning network model, to obtain hardware requirement information;
    the mapping module is configured to map the hardware requirement information onto the artificial intelligence processing device to obtain hardware allocation information;
    the generation module is configured to generate, based on the hardware allocation information, a hardware bit stream that is input into the artificial intelligence processing device.
  6. The hardware optimization system for an artificial intelligence processing device according to claim 5, characterized in that the artificial intelligence processing device includes an FPGA, and the hardware bit stream is input into the FPGA.
  7. The hardware optimization system for an artificial intelligence processing device according to claim 6, characterized in that the FPGA includes a convolution module, a deconvolution module, and a shared cache module; the mapping module implements the hardware requirement information based on the convolution module, the deconvolution module, and the shared cache module.
  8. The hardware optimization system for an artificial intelligence processing device according to claim 5, characterized in that the deep learning network model is a model trained with TensorFlow.
  9. A storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the hardware optimization method for an artificial intelligence processing device according to any one of claims 1 to 4.
  10. A terminal, characterized by including a processor and a memory;
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program stored in the memory, so that the terminal performs the hardware optimization method for an artificial intelligence processing device according to any one of claims 1 to 4.
PCT/CN2018/072672 2018-01-15 2018-01-15 Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal Ceased WO2019136758A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/072672 WO2019136758A1 (zh) 2018-01-15 2018-01-15 Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal
CN201880002759.XA CN109496319A (zh) 2018-01-15 2018-01-15 Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072672 WO2019136758A1 (zh) 2018-01-15 2018-01-15 Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal

Publications (1)

Publication Number Publication Date
WO2019136758A1 true WO2019136758A1 (zh) 2019-07-18

Family

ID=65713859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072672 Ceased WO2019136758A1 (zh) 2018-01-15 2018-01-15 Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal

Country Status (2)

Country Link
CN (1) CN109496319A (zh)
WO (1) WO2019136758A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313239A (zh) * 2021-06-25 2021-08-27 Spreadtrum Communications (Shanghai) Co., Ltd. Artificial intelligence model design optimization method and apparatus

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070182B (zh) * 2019-04-01 2021-08-24 Jingwei Qili (Beijing) Technology Co., Ltd. Platform chip suitable for artificial intelligence and manufacturing and design method thereof
US11934940B2 (en) 2019-04-18 2024-03-19 Cambricon Technologies Corporation Limited AI processor simulation
CN111832739B (zh) * 2019-04-18 2024-01-09 Cambricon Technologies Corporation Limited Data processing method and related products
WO2021068253A1 (zh) * 2019-10-12 2021-04-15 Shenzhen Corerain Technologies Co., Ltd. Customized data flow hardware simulation method, apparatus, device, and storage medium
CN110750312A (zh) * 2019-10-17 2020-02-04 Cambricon Technologies Corporation Limited Hardware resource configuration method and apparatus, cloud-side device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178749A (zh) * 2006-11-09 2008-05-14 Matsushita Electric Industrial Co., Ltd. Program conversion device
CN102298344A (zh) * 2011-05-05 2011-12-28 Hangzhou Dianzi University Local hotspot mitigation system based on FPGA dynamic partial reconfiguration technology
CN105511866A (zh) * 2015-12-01 2016-04-20 East China Normal University Scheduling optimization method under resource constraints based on parallel-structure-aware technology
CN106383695A (zh) * 2016-09-14 2017-02-08 Suzhou Institute for Advanced Study, University of Science and Technology of China FPGA-based acceleration system for clustering algorithms and design method thereof
CN107402745A (zh) * 2017-07-04 2017-11-28 Tsinghua University Data flow graph mapping method and apparatus

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10572824B2 (en) * 2003-05-23 2020-02-25 Ip Reservoir, Llc System and method for low latency multi-functional pipeline with correlation logic and selectively activated/deactivated pipelined data processing engines
US10789544B2 (en) * 2016-04-05 2020-09-29 Google Llc Batching inputs to a machine learning model
CN106228238B (zh) * 2016-07-27 2019-03-22 Suzhou Institute for Advanced Study, University of Science and Technology of China Method and system for accelerating deep learning algorithms on a field-programmable gate array platform
CN107016175B (zh) * 2017-03-23 2018-08-31 Institute of Computing Technology, Chinese Academy of Sciences Automated design method and apparatus for neural network processors, and optimization method
CN107423817B (zh) * 2017-04-17 2020-09-01 Transwarp Information Technology (Shanghai) Co., Ltd. Deep learning implementation method and device
CN107256354A (zh) * 2017-06-08 2017-10-17 Beijing Shenmou Technology Co., Ltd. Hardware architecture verification method and apparatus
CN107392308B (zh) * 2017-06-20 2020-04-03 Institute of Computing Technology, Chinese Academy of Sciences Convolutional neural network acceleration method and system based on a programmable device
CN107480789B (zh) * 2017-08-07 2020-12-29 Vimicro Corporation Efficient conversion method and apparatus for deep learning models



Also Published As

Publication number Publication date
CN109496319A (zh) 2019-03-19

Similar Documents

Publication Publication Date Title
WO2019136754A1 (zh) Compilation method and system for artificial intelligence processing device, storage medium, and terminal
Deng et al. Vector neurons: A general framework for so (3)-equivariant networks
Zheng et al. A full stage data augmentation method in deep convolutional neural network for natural image classification
WO2019136758A1 (zh) Hardware optimization method and system for artificial intelligence processing device, storage medium, and terminal
Liu et al. Implementation of training convolutional neural networks
CN111242208A (zh) 一种点云分类方法、分割方法及相关设备
WO2019136756A1 (zh) Method and system for establishing a design model for an artificial intelligence processing device, storage medium, and terminal
WO2018010434A1 (zh) Image classification method and apparatus
Wang et al. A novel GCN-based point cloud classification model robust to pose variances
CN109033107A (zh) 图像检索方法和装置、计算机设备和存储介质
Yang A CNN-based broad learning system
Zhao et al. PCUNet: A context-aware deep network for coarse-to-fine point cloud completion
Jiang et al. Sdf-3dgan: A 3d object generative method based on implicit signed distance function
Li et al. Multiscale receptive fields graph attention network for point cloud classification
CN116977265A (zh) 缺陷检测模型的训练方法、装置、计算机设备和存储介质
Zhao et al. Point-voxel dual stream transformer for 3d point cloud learning
Liu et al. Detection guided deconvolutional network for hierarchical feature learning
Yuan et al. Research on image classification of lightweight convolutional neural network
US12039740B2 (en) Vectorized bilinear shift for replacing grid sampling in optical flow estimation
Liu et al. An anisotropic Chebyshev descriptor and its optimization for deformable shape correspondence
CN117831677A (zh) 基于点云神经网络的非均匀点阵结构逆向设计方法及系统
CN116486030A (zh) 基于地表图像的三维地质体模型的建模方法和相关装置
Wei et al. Improved Few‐Shot Object Detection Method Based on Faster R‐CNN
Xu et al. ARShape-net: Single-view image oriented 3D shape reconstruction with an adversarial refiner
Qian et al. Hybrid neural network model for large-scale heterogeneous classification tasks in few-shot learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900267

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18900267

Country of ref document: EP

Kind code of ref document: A1