JP2021043495A

JP2021043495A - Information processing program and information processing method

Info

Publication number: JP2021043495A
Application number: JP2019162722A
Authority: JP
Inventors: 久典飯島; Hisanori Iijima
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-09-06
Filing date: 2019-09-06
Publication date: 2021-03-18
Anticipated expiration: 2039-09-06
Also published as: JP7251416B2

Abstract

PROBLEM TO BE SOLVED: To execute training of a neural network with an arbitrary batch size specified. SOLUTION: An information processing program causes an information processing device to execute processing that: selects, from among a plurality of processing instructions that process different batch sizes, a processing instruction to be executed by a processor, based on the size of the training data used to train the target layer of the neural network and on an input batch size; calculates the memory size used to train the target layer, based on the size of the training data and the batch size specified by the selected processing instruction; when the memory size of a first memory area of the memory holding the training data does not match the calculated memory size, secures in the memory a second memory area having the calculated memory size and transfers the training data to it; and causes the processor to execute the selected processing instruction using the training data transferred to the second memory area. [Selected drawing] FIG. 9

Description

The present invention relates to an information processing program and an information processing method.

In deep learning, hundreds of thousands to several million or more training data items are used to adjust the parameters of each layer of a deep neural network to appropriate values. Because deep learning computation makes heavy use of multiply-accumulate operations, improving the execution efficiency of those operations shortens the computation time.

For this reason, GPUs (Graphics Processing Units), which execute multiply-accumulate operations more efficiently than CPUs (Central Processing Units), are widely used for deep learning. In addition, as industrial use of deep learning spreads, dedicated deep learning processors specialized for deep learning operations are being developed.

For example, a processor used for deep learning has a plurality of arithmetic cores each including arithmetic units, a private memory used by each arithmetic core, a shared memory shared by a predetermined number of arithmetic cores, and a control unit that controls execution of operations by the arithmetic cores, and is connected to a main memory. Under control of the control unit, each arithmetic core performs operations on data transferred from the main memory to the private memory or the shared memory.

For example, a data block processed by a processor is given an allocation attribute relating to how the data block is divided, a margin attribute indicating peripheral data to be transferred together with the sub-blocks obtained by dividing the data block, and a dependency attribute indicating dependencies between sub-blocks. When the allocation attributes of a first kernel and a second kernel, which are programs executed sequentially by the arithmetic cores, match, the control unit improves processing efficiency by letting the second kernel use the execution result of the first kernel without transferring it from local memory to global memory. In doing so, the control unit ORs the margin attribute and the dependency attribute set for the data block used by the second kernel into the margin attribute and the dependency attribute, respectively, of the data block used by the first kernel (see, for example, Patent Document 1).

In addition, by providing the memory control circuits associated with the private memories with a function for transferring data between shared memories, image data used in common by the arithmetic cores can be read without accessing the main memory, which improves processing efficiency (see, for example, Patent Document 2).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2015-149038
Patent Document 2: International Publication No. WO 2008/139265

For example, the layers of a deep neural network include convolutional layers, pooling layers, fully connected layers, and output layers; the types of layers are limited, and so is the processing performed in each layer. For this reason, processing instructions that combine the arithmetic instructions executed by the arithmetic units may be prepared according to the processing content of each layer.

In deep learning, the training data contained in a batch is often divided into multiple mini-batches, and training is executed one mini-batch at a time. Preparing a plurality of processing instructions, each fixed to a specific batch size (the number of times training data is given to the deep neural network), therefore improves the computational efficiency of deep learning.
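As a minimal illustration of dividing a batch into mini-batches, the following Python sketch may help. The function name `split_into_minibatches` and the sample values are hypothetical and do not come from the specification.

```python
from typing import List, Sequence


def split_into_minibatches(samples: Sequence, batch_size: int) -> List[list]:
    """Split samples into consecutive mini-batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        list(samples[start:start + batch_size])
        for start in range(0, len(samples), batch_size)
    ]


# Ten training samples divided with a batch size of 4; the last
# mini-batch is smaller because 10 is not a multiple of 4.
minibatches = split_into_minibatches(list(range(10)), 4)
print(minibatches)
```

Training then iterates over `minibatches`, executing the processing for each one in turn.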

In this case, however, if the user specifies a batch size other than one specified by a processing instruction, an error occurs and the processing instruction is not executed. Disclosing the batch sizes specified by the processing instructions to the user prevents such errors, but the user can no longer choose an arbitrary batch size, which narrows the freedom of training. On the other hand, it is not realistic to prepare a large number of processing instructions covering every batch size that a user could specify.
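The failure mode described here can be pictured with a small Python sketch. The batch sizes 8, 16, and 32 and the function `run_fixed_batch_instruction` are hypothetical stand-ins chosen for illustration; the specification does not list concrete values at this point.

```python
# Hypothetical fixed batch sizes for which processing instructions exist.
INSTRUCTION_BATCH_SIZES = (8, 16, 32)


def run_fixed_batch_instruction(user_batch_size: int) -> str:
    """Mimic a runtime that accepts only the batch sizes built into its instructions."""
    if user_batch_size not in INSTRUCTION_BATCH_SIZES:
        raise ValueError(
            f"no processing instruction for batch size {user_batch_size}"
        )
    return f"executed instruction for batch size {user_batch_size}"


print(run_fixed_batch_instruction(16))
try:
    run_fixed_batch_instruction(20)  # an arbitrary user-chosen batch size
except ValueError as err:
    print("error:", err)
```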

In one aspect, an object of the present invention is to execute training of a neural network with an arbitrary batch size specified.

According to one aspect, an information processing program controls training of a neural network that includes a plurality of layers and is executed by a processor having one or more operation units, each including an arithmetic unit and a memory that holds data used by the arithmetic unit. The program causes an information processing device to execute processing that: selects, from among a plurality of processing instructions that process different batch sizes, a processing instruction to be executed by the processor, based on the size of the training data used to train the target layer and on an input batch size; calculates the memory size used to train the target layer, based on the size of the training data and the batch size specified by the selected processing instruction; when the memory size of a first memory area of the memory that holds the training data does not match the calculated memory size, secures in the memory a second memory area having the calculated memory size and transfers the training data to it; and causes the processor to execute the selected processing instruction using the training data transferred to the second memory area.

In one aspect, the present invention makes it possible to execute training of a neural network with an arbitrary batch size specified.

FIG. 1 is a block diagram showing an example of an information processing device according to one embodiment.
FIG. 2 is a flow diagram showing an example of the operation when the information processing device of FIG. 1 executes training of a neural network.
FIG. 3 is a block diagram showing an example of an information processing device according to another embodiment.
FIG. 4 is an explanatory diagram showing an outline of a deep neural network.
FIG. 5 is an explanatory diagram showing an example of specification information of the processing instructions executable by the arithmetic core of FIG. 3.
FIG. 6 is an explanatory diagram showing an example of the data structure of the input/output data used to train each layer of the neural network and of the data structure related to the processing instructions, in the information processing device of FIG. 3.
FIG. 7 is an explanatory diagram showing an example of the arrangement patterns of FIG. 6.
FIG. 8 is a flow diagram showing an example of the operation when the information processing device of FIG. 3 executes training of a neural network.
FIG. 9 is an explanatory diagram showing an example of the operation during training of a target layer of the neural network in the information processing device of FIG. 3.
FIG. 10 is an explanatory diagram showing another example of the operation during training of a target layer of the neural network in the information processing device of FIG. 3.
FIG. 11 is a block diagram showing an example of an information processing device according to another embodiment.
FIG. 12 is an explanatory diagram showing an example of the functions realized when the host CPU of FIG. 11 executes the host program.
FIG. 13 is an explanatory diagram showing the relationship between the memory size and arrangement pattern of a memory area secured in the memory and the memory size and arrangement pattern corresponding to the processing instruction used to train the target layer.
FIG. 14 is a flow diagram showing an example of the operation when the information processing device of FIG. 11 executes training of a neural network.
FIG. 15 is an explanatory diagram showing an example of executing a processing instruction in an information processing device according to another embodiment.
FIG. 16 is an explanatory diagram showing an example of executing a processing instruction in an information processing device according to yet another embodiment.

Embodiments are described below with reference to the drawings.

FIG. 1 shows an example of an information processing device according to one embodiment. The information processing device 100 shown in FIG. 1 is, for example, a server, and has a host CPU 200, a host memory 300, an I/O (Input/Output) controller 220, and a processor 500 for deep learning (DL). The host CPU 200 is connected to the host memory 300 via a bus and to the processor 500 via the I/O controller 220. For example, the host CPU 200, the host memory 300, the I/O controller 220, and the processor 500 are mounted on a system board, and the host memory 300 is a memory module mounted on the system board.

By executing various programs stored in the host memory 300, the host CPU 200 controls the overall operation of the information processing device 100, controls the operation of the processor 500, and executes training of the neural network using the training data. The various programs include an operating system.

The information processing device 100 may also have an input device (such as a keyboard or mouse), an output device (such as a monitor or printer), and an external storage device (such as an HDD (Hard Disk Drive) or SSD (Solid State Drive)), none of which are shown, connected to the system board. The information processing device 100A may be connected to a network such as an intranet or the Internet, and a plurality of information processing devices 100A may be connected to one another via the network.

The host memory 300 stores a user-defined file 310 and a host program 400 for causing a neural network (NN) to execute operations. The user-defined file 310 includes a file 311 containing configuration information that defines the neural network and a file 312 of training data. The files 311 and 312 are prepared by the user. Hereinafter, the neural network configuration information contained in the file 311 is also referred to as the neural network 311, and the training data contained in the file 312 is also referred to as the training data 312. Unless otherwise noted, the training data 312 denotes the training data used to train the target layer among the plurality of layers included in the neural network.

When executed by the host CPU 200, the host program 400 functions as a selection unit 401, a calculation unit 402, a determination unit 403, a transfer unit 404, and a processing instruction execution unit 405. The processing instruction file 406 included in the host program 400 contains a plurality of processing instructions to be executed by the processor 500; it is transferred to the processor 500 when training of the neural network 311 is executed, and the instructions are executed by the processor 500. The processing instructions include, for example, a processing instruction for the convolution operation of a convolutional layer in the neural network, a processing instruction for the pooling operation of a pooling layer, an addition instruction, and a matrix-product instruction.

The processor 500 has a plurality of arithmetic cores 510, each including a plurality of arithmetic units 520 and a memory 530. Each memory 530 is used only by the arithmetic units 520 mounted in the same arithmetic core 510. The arithmetic core 510 is an example of an operation unit. The arithmetic cores 510 execute processing instructions in parallel based on instructions from the host CPU 200. Each memory 530 may instead be provided outside its arithmetic core 510 as a memory dedicated to that core. The arithmetic units 520 are multiply-accumulate units, adders, matrix-product units, and the like.

When the host CPU 200 executes the host program 400 to train the neural network, the selection unit 401, the calculation unit 402, the determination unit 403, the transfer unit 404, and the processing instruction execution unit 405 function as follows. The description below covers training of a single layer; training of the whole neural network is carried out by training each layer in turn. Forward propagation is executed by training the layers in order from the input side of the neural network, and backpropagation is executed by training the layers in order from the output side.

The selection unit 401 obtains, from the user-defined file 310, the size of the training data used to train the target layer of the neural network. Based on the obtained training data size and the batch size input to the information processing device 100, the selection unit 401 selects, from among the plurality of processing instructions in the file 406, the processing instruction to be executed by the processor 500. As described with reference to FIG. 5, a plurality of processing instructions with different batch sizes are prepared for each type of operation (convolution, pooling, and so on) and stored in the file 406 in advance. The batch size input to the information processing device 100 is, for example, an arbitrary batch size specified by the user.
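The selection step can be sketched in Python as follows. This section does not state the selection rule, so the sketch assumes one: among the instructions for the requested operation type, pick the smallest fixed batch size that covers the user's batch size, falling back to the largest available. It also ignores the training data size for brevity. The class `ProcessingInstruction`, the function `select_instruction`, and the table entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessingInstruction:
    name: str        # hypothetical identifier
    op: str          # operation type, e.g. "conv" or "pool"
    batch_size: int  # fixed batch size this instruction processes


def select_instruction(
    instructions: List[ProcessingInstruction], op: str, user_batch_size: int
) -> ProcessingInstruction:
    """Assumed rule: smallest fixed batch size >= the user's, else the largest."""
    candidates = [i for i in instructions if i.op == op]
    if not candidates:
        raise LookupError(f"no processing instruction for operation {op!r}")
    covering = [i for i in candidates if i.batch_size >= user_batch_size]
    if covering:
        return min(covering, key=lambda i: i.batch_size)
    return max(candidates, key=lambda i: i.batch_size)


# Hypothetical instruction table standing in for the contents of file 406.
table = [
    ProcessingInstruction("conv_b8", "conv", 8),
    ProcessingInstruction("conv_b16", "conv", 16),
    ProcessingInstruction("conv_b32", "conv", 32),
    ProcessingInstruction("pool_b16", "pool", 16),
]
print(select_instruction(table, "conv", 20).name)
```

Other rules (for example, splitting the user's batch across several executions of a smaller instruction) would fit the same interface.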

The calculation unit 402 calculates the memory size of the memory 530 used to train the target layer, based on the training data size of the target layer obtained by the selection unit 401 and the batch size specified by the processing instruction selected by the selection unit 401. The batch size specified by a processing instruction is fixed for that instruction, so the user-specified batch size input to the information processing device 100 does not necessarily match it.
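A minimal sketch of the memory size calculation, assuming the size is simply the per-sample data size multiplied by the batch size (this section does not give the exact formula). The function name, the sample shape 3 x 32 x 32, and the 4-byte element width are hypothetical.

```python
def required_memory_bytes(
    elements_per_sample: int, bytes_per_element: int, batch_size: int
) -> int:
    """Assumed formula: per-sample size in bytes multiplied by the batch size."""
    return elements_per_sample * bytes_per_element * batch_size


elements_per_sample = 3 * 32 * 32  # hypothetical sample shape (channels x height x width)
bytes_per_element = 4              # hypothetical 32-bit elements

# Size of the area allocated for the user-specified batch size (20) versus
# the size needed by the selected instruction's fixed batch size (32).
first_area_bytes = required_memory_bytes(elements_per_sample, bytes_per_element, 20)
needed_bytes = required_memory_bytes(elements_per_sample, bytes_per_element, 32)
print(first_area_bytes, needed_bytes, first_area_bytes == needed_bytes)
```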

Before training of the target layer starts, a memory area (first memory area) for storing the input data to the layer and the output data from the layer is allocated in the memory 530 of the arithmetic core 510. The input data used for training (the training data) is stored in the memory area for input data. For the second and subsequent layers, the output data produced by the operation of the preceding layer is held in the memory area (first memory area) as the input data of the target layer.

The determination unit 403 determines whether the memory size of the first memory area of the memory 530 that holds the training data matches the memory size calculated by the calculation unit 402. That is, the determination unit 403 determines whether the memory size derived from the training data size and the user-specified batch size matches the memory size derived from the training data size and the batch size that depends on the processing instruction.

If the memory sizes match, training can be executed using the training data (input data) held in the first memory area; if they do not match, training cannot be executed because an error occurs when the operation is executed. The determination unit 403 determines whether the memory sizes match for each tensor data item input to or output from each layer of the neural network.
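The per-tensor match/mismatch determination might look like the following Python sketch. The tensor names follow the "input 1, input 2, output 1" naming used for FIG. 5, but the byte sizes are hypothetical, and the assumption that the weight tensor (`input2`) does not scale with the batch size is made only for this illustration.

```python
from typing import Dict, List


def find_mismatched_tensors(
    allocated: Dict[str, int], required: Dict[str, int]
) -> List[str]:
    """Return the names of tensors whose allocated size differs from the required size."""
    return [
        name for name, size in required.items() if allocated.get(name) != size
    ]


# Sizes in bytes of the first memory areas (user batch size) and the sizes
# required by the selected processing instruction (hypothetical values).
allocated = {"input1": 245760, "input2": 4096, "output1": 163840}
required = {"input1": 393216, "input2": 4096, "output1": 262144}
print(find_mismatched_tensors(allocated, required))
```

Only the tensors reported as mismatched would then need a newly secured memory area.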

When the determination unit 403 determines that the memory sizes do not match, the transfer unit 404 newly secures in the memory 530 a memory area (second memory area) of the memory size calculated by the calculation unit 402 for the tensor data found to be mismatched. The transfer unit 404 then transfers the training data (input data) to be used in the target layer to the newly secured second memory area. Once the training data has been transferred to the second memory area, the training data held in the first memory area is prohibited from use, invalidated, or discarded.
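The allocation and transfer can be sketched in Python with `bytearray` standing in for a region of the memory 530. The function name is hypothetical, and zero-filling the part of the new area that the training data does not cover is an assumption of this sketch, not something this section specifies.

```python
def transfer_to_second_area(first_area: bytearray, required_size: int) -> bytearray:
    """Allocate a region of required_size bytes and copy the training data into it."""
    second_area = bytearray(required_size)  # zero-initialised (assumed padding)
    count = min(len(first_area), required_size)
    second_area[:count] = first_area[:count]
    return second_area


first_area = bytearray(b"\x01\x02\x03")          # training data in the first memory area
second_area = transfer_to_second_area(first_area, 5)
first_area = None                                # the first area is no longer used
print(bytes(second_area))
```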

The processing instruction execution unit 405 then trains the neural network by causing the processor 500 to execute the processing instruction selected by the selection unit 401, using the training data held in the valid memory area (the first memory area or the second memory area).

In this embodiment, even when the batch size input to the information processing device 100 differs from the batch size specified by the processing instruction that executes the training, a memory area can be secured to match the memory size used to execute the processing instruction. As a result, training can be executed without causing an error even when an arbitrary batch size is input to the information processing device 100.

FIG. 2 shows an example of the operation when the information processing device 100 of FIG. 1 executes training of a neural network. The operation shown in FIG. 2 is realized by the host CPU 200 executing the host program 400. That is, FIG. 2 shows an example of an information processing program and an information processing method.

For example, the operation shown in FIG. 2 starts when the information processing device 100 receives an instruction to execute training of the neural network. The operation shown in FIG. 2 is executed for each of the plurality of layers included in the neural network, in both forward propagation and backpropagation.

First, in step S1, the selection unit 401 selects, from among the plurality of processing instructions, the processing instruction to be executed by the processor 500, based on the training data size and the batch size input to the information processing device 100.

Next, in step S2, the calculation unit 402 calculates the memory size of the memory 530 used for training, based on the training data size and the batch size specified by the processing instruction selected by the selection unit 401.

Next, in step S3, the determination unit 403 determines whether the memory size of the first memory area of the memory 530 holding the training data matches the memory size calculated by the calculation unit 402. If the memory sizes match, the process proceeds to step S5; if they do not match, the process proceeds to step S4.

In step S4, the transfer unit 404 newly secures in the memory 530 a memory area (second memory area) of the memory size calculated by the calculation unit 402, transfers the training data to the secured second memory area, and the process proceeds to step S5.

In step S5, the processing instruction execution unit 405 causes the processor 500 to execute the processing instruction selected by the selection unit 401, using the training data held in the valid memory area (the first memory area or the second memory area). The processor 500 then executes the operation specified by the processing instruction, whereby the target layer of the neural network is trained.
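Steps S1 to S5 can be put together in one short Python sketch. It is a simplified model under stated assumptions: the selection rule (smallest fixed batch size covering the user's batch size), the size formula (per-sample bytes multiplied by batch size), and the zero padding are assumptions, and step S5 is represented only by a returned summary rather than a real call to the processor 500. The function `train_layer` is hypothetical.

```python
from typing import List


def train_layer(
    instruction_batch_sizes: List[int],
    sample_bytes: int,
    user_batch_size: int,
    first_area: bytearray,
) -> dict:
    # S1: select a processing instruction (assumed rule: smallest fixed
    # batch size that covers the user's batch size, else the largest).
    covering = [b for b in instruction_batch_sizes if b >= user_batch_size]
    selected = min(covering) if covering else max(instruction_batch_sizes)

    # S2: calculate the memory size needed by the selected instruction.
    required = sample_bytes * selected

    # S3: compare with the size of the first memory area.
    if len(first_area) == required:
        active, reallocated = first_area, False
    else:
        # S4: secure a second memory area and transfer the training data.
        active = bytearray(required)
        count = min(len(first_area), required)
        active[:count] = first_area[:count]
        reallocated = True

    # S5: execute the selected instruction on the valid area (stand-in).
    return {
        "batch_size": selected,
        "memory_bytes": len(active),
        "reallocated": reallocated,
    }


# User batch size 20 with 4-byte samples: the first area holds 80 bytes,
# but the selected instruction (batch size 32) needs 128 bytes.
print(train_layer([8, 16, 32], 4, 20, bytearray(80)))
```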

As described above, in the embodiment shown in FIGS. 1 and 2, even when the batch size input to the information processing device 100 differs from the batch size specified by the processing instruction that executes the training, a memory area can be secured to match the memory size used to execute the processing instruction. As a result, training can be executed without causing an error even when an arbitrary batch size is specified.

FIG. 3 shows an example of an information processing device according to another embodiment. Elements that are the same as in FIG. 1 are given the same reference numerals, and their detailed description is omitted. The information processing device 100A shown in FIG. 3 is, for example, a server and has the same configuration as the information processing device 100 of FIG. 1.

The host memory 300 stores a user-defined file 310 and a host program 400A for causing a neural network (NN) to execute operations. As in FIG. 1, the user-defined file 310 includes a file 311 containing configuration information that defines the neural network and a file 312 of training data.

When executed by the host CPU 200, the host program 400A functions as an operation type determination unit 411, a data size change unit 412, a conversion/transfer instruction unit 414, an information management unit 421, and a processing instruction execution unit 432. The functions of these units are described with the operation flow shown in FIG. 8.

For example, the processing instruction file 423 included in the host program 400A contains a plurality of processing instructions to be executed by the processor 500; it is transferred to the processor 500 when training of the neural network 311 is executed, and the instructions are executed by the processor 500. Each processing instruction contains a sequence of arithmetic instructions executed by the arithmetic units 520 of the processor 500. The types of processing instructions are described with reference to FIG. 5.

FIG. 4 shows an outline of a deep neural network. The deep neural network shown in FIG. 4 includes multiple pairs of convolutional and pooling layers and a fully connected layer as hidden layers, but may include other layers.

The information processing device 100A executes forward propagation, in which, for example, each of the plurality of training data items (input data) in a mini-batch is input to the input layer and the operations of the convolutional layers, pooling layers, and so on are executed in sequence, so that the information obtained by the operations is passed from the input side to the output side. In a convolutional layer, for example, the data (output data) from the preceding layer is convolved with weight data prepared in advance as training data, and the resulting output data is output as the input data of the next layer.
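For readers unfamiliar with the convolution operation, the following pure-Python sketch shows a single-channel 2D convolution without padding or stride. As is common in deep learning, the kernel is applied without flipping (strictly speaking, a cross-correlation). The function name and the sample values are hypothetical, and a real processing instruction would run this as multiply-accumulate operations on the arithmetic units rather than in Python loops.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image and accumulate element-wise products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]  # multiply-accumulate
            row.append(acc)
        output.append(row)
    return output


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # data from the preceding layer
kernel = [[1, 0], [0, 1]]                   # weight data
print(conv2d_valid(image, kernel))
```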

ミニバッチによる順伝播処理の実行後、出力データと正解データとの差分（例えば、誤差の二乗和）を小さくするための逆伝播処理が実行される。そして、逆伝播処理の実行に基づいて重み等のパラメータが更新される。複数のミニバッチにおいて、順伝播処理、逆伝播処理およびパラメータの更新処理を実行することで、徐々に正解率が上がっていき、ディープニューラルネットワークが最適化される。順伝播処理および逆伝播処理における各層の演算は、演算コア５１０が処理命令を実行することで行われる。 After executing the forward propagation process by the mini-batch, the back propagation process for reducing the difference between the output data and the correct answer data (for example, the sum of squares of the errors) is executed. Then, parameters such as weights are updated based on the execution of the back propagation process. By executing forward propagation processing, back propagation processing, and parameter update processing in a plurality of mini-batch, the correct answer rate is gradually increased, and the deep neural network is optimized. The operations of each layer in the forward propagation process and the back propagation process are performed by the arithmetic core 510 executing a processing instruction.

図５は、図３の演算コア５１０が実行可能な処理命令の仕様情報の一例を示す。この実施形態では、演算の種類毎に１つまたは複数の処理命令が予め準備され、処理命令としてホストメモリ３００のファイル４２３に保持される。なお、図５では、入力データは２つあり（入力１と入力２）、出力データは１つあるとする（出力１）が、入力データと出力データの数は、図５に示す例に限定されない。また、プロセッサ５００が実行可能な処理の種類と処理命令の数は、図５に限定されない。 FIG. 5 shows an example of specification information of the processing instructions that can be executed by the arithmetic core 510 of FIG. 3. In this embodiment, one or more processing instructions are prepared in advance for each type of operation and are held as processing instructions in the file 423 of the host memory 300. In FIG. 5, it is assumed that there are two input data (input 1 and input 2) and one output data (output 1), but the numbers of input data and output data are not limited to the example shown in FIG. 5. Likewise, the types of processing that the processor 500 can execute and the number of processing instructions are not limited to those in FIG. 5.

順伝播処理では、各層において、入力層側から入力データが与えられ、出力層側から演算結果が出力される。逆伝播処理では、出力層側（順伝播処理での出力側）から入力が与えられ、入力層側（順伝播処理での入力側）から演算結果が出力される。 In the forward propagation process, input data is given from the input layer side in each layer, and the calculation result is output from the output layer side. In the back propagation process, an input is given from the output layer side (output side in the forward propagation process), and the calculation result is output from the input layer side (input side in the forward propagation process).

例えば、畳み込み演算では、４種類の処理命令Ｃｏｎｖ１、Ｃｏｎｖ２、Ｃｏｎｖ３、Ｃｏｎｖ４が使用可能である。処理命令Ｃｏｎｖ１−Ｃｏｎｖ４は、対応バッチ数と、入力１、入力２、出力１の配置パターン（分散型またはコピー型）とにより使い分けられる。 For example, in the convolution operation, four types of processing instructions Conv1, Conv2, Conv3, and Conv4 can be used. Which of the processing instructions Conv1 to Conv4 is used depends on the batch count that the instruction supports and on the arrangement pattern (distributed type or copy type) of input 1, input 2, and output 1.

要素毎の加算では、１種類の処理命令Ａｄｄ１が使用可能である。処理命令Ａｄｄ１は、任意のバッチ数で使用可能であり、入力１、入力２、出力１のデータ配置は、全て分散型である。 One type of processing instruction Add1 can be used for element-by-element addition. The processing instruction Add1 can be used in any number of batches, and the data arrangements of the input 1, the input 2, and the output 1 are all distributed.

行列積では、２種類の処理命令Ｇｅｍｍ１、Ｇｅｍｍ２が使用可能である。処理命令Ｇｅｍｍ１、Ｇｅｍｍ２は、任意のバッチ数で使用可能であり、入力１／入力２の一方が分散型、入力１／入力２の他方がコピー型であり、出力１は分散型である。 In matrix multiplication, two types of processing instructions Gemm1 and Gemm2 can be used. The processing instructions Gemm1 and Gemm2 can be used in any number of batches, one of the inputs 1 / input 2 is a distributed type, the other of the inputs 1 / input 2 is a copy type, and the output 1 is a distributed type.

図６は、図３の情報処理装置において、ニューラルネットワークの各層の学習に使用される入出力データのデータ構造と、処理命令に関係するデータ構造との一例を示す説明図である。図６は、入力テンソルデータが２つで、出力テンソルデータが１つの層の入出力データのデータ構造を示す。 FIG. 6 is an explanatory diagram showing an example of a data structure of input / output data used for learning each layer of the neural network and a data structure related to a processing instruction in the information processing apparatus of FIG. FIG. 6 shows a data structure of input / output data in a layer having two input tensor data and one output tensor data.

入出力データのデータ構造は、ニューラルネットワークの各層に対して入力または出力されるテンソルデータ毎に定められる。各テンソルデータは、各層に対して入力または出力される入出力データ以外に、各次元の要素数と、メモリサイズと、配置パターンとの情報を有する。例えば、画像認識用のニューラルネットワークの学習において、入力画像データの次元は、バッチサイズＮ、色Ｃ、横の画素数Ｗ、縦の画素数Ｈの４つであり、次元毎に要素数が設定される。一例として、各次元の要素数が、バッチサイズＮ＝３２、色Ｃ＝３（カラー画像）、画素数Ｗ＝１０、画素数Ｈ＝１０であり、データが単精度（４バイト）の浮動小数点数であるとする。 The data structure of the input / output data is determined for each tensor data input to or output from each layer of the neural network. In addition to the input / output data itself, each tensor data holds the number of elements in each dimension, the memory size, and the arrangement pattern. For example, in the learning of a neural network for image recognition, the input image data has four dimensions: batch size N, color C, number of horizontal pixels W, and number of vertical pixels H, and the number of elements is set for each dimension. As an example, suppose the number of elements in each dimension is batch size N = 32, color C = 3 (color image), number of pixels W = 10, and number of pixels H = 10, and the data is a single-precision (4-byte) floating-point number.

この場合、テンソルデータサイズは、３８４００バイトとなり、テンソルデータを格納するメモリ５３０のメモリ領域のメモリサイズは３８４００バイト以上に設定される。各テンソルデータサイズは、各次元の要素数の積と１データ当たりのバイト数とを乗じることで算出される。配置パターンは、各テンソルデータを複数のメモリ５３０に分散して格納する分散型と、各テンソルデータを複数のメモリ５３０にそれぞれ格納するコピー型とのいずれかである。配置パターンの例は、図７で説明する。 In this case, the tensor data size is 38,400 bytes, and the memory size of the memory area of the memory 530 that stores the tensor data is set to 38,400 bytes or more. Each tensor data size is calculated by multiplying the product of the number of elements in each dimension by the number of bytes per data element. The arrangement pattern is either a distributed type, in which each tensor data is split and stored across the plurality of memories 530, or a copy type, in which the whole of each tensor data is stored in each of the plurality of memories 530. An example of the arrangement patterns will be described with reference to FIG. 7.
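
The size calculation above can be checked with a short Python sketch. The function name `tensor_bytes` is a hypothetical helper introduced only for illustration; the dimension values are the ones given in the example (N = 32, C = 3, W = 10, H = 10, 4 bytes per element).

```python
from math import prod


def tensor_bytes(dims, bytes_per_element):
    """Product of the per-dimension element counts times the element width."""
    return prod(dims) * bytes_per_element


# Batch size N=32, color C=3, width W=10, height H=10, single precision (4 bytes).
size = tensor_bytes((32, 3, 10, 10), 4)
print(size)  # 38400
```

The memory area allocated for this tensor would then have to be at least `size` bytes.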

入力テンソルデータの数および出力テンソルデータの数は、演算の種類によって決まる。処理命令に関するデータ構造は、処理命令で使用するメモリサイズと、処理命令で使用する配置パターンとを有する。処理命令で使用するメモリサイズは、処理命令で指定されるバッチサイズと、学習対象の層の学習に使用するデータのサイズとに基づいて設定される。なお、処理命令による処理では、１つ以上のテンソルデータが入力されると、１つ以上のテンソルデータが出力される。 The number of input tensor data and the number of output tensor data are determined by the type of operation. The data structure related to the processing instruction has a memory size used in the processing instruction and an arrangement pattern used in the processing instruction. The memory size used in the processing instruction is set based on the batch size specified in the processing instruction and the size of the data used for learning the layer to be learned. In the processing by the processing instruction, when one or more tensor data is input, one or more tensor data is output.

図７は、図６の配置パターンの一例を示す。説明を分かりやすくするため、図７に示す例では、テンソルデータがデータＤ１、Ｄ２、Ｄ３、Ｄ４を含むものとする。配置パターンが分散型の場合、データＤ１−Ｄ４は、各演算コア５１０のメモリ５３０に分散して配置される。一方、配置パターンがコピー型の場合、データＤ１−Ｄ４が各演算コア５１０のメモリ５３０にそれぞれ配置される。コピー型は、重複型の一例である。 FIG. 7 shows an example of the arrangement pattern of FIG. In the example shown in FIG. 7, it is assumed that the tensor data includes the data D1, D2, D3, and D4 for the sake of clarity. When the arrangement pattern is a distributed type, the data D1-D4 are distributed and arranged in the memory 530 of each calculation core 510. On the other hand, when the arrangement pattern is a copy type, the data D1-D4 are arranged in the memory 530 of each calculation core 510. The copy type is an example of a duplicate type.
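
The two arrangement patterns can be illustrated with a minimal Python sketch. The helper names `place_distributed` and `place_copy` are hypothetical, and the use of four memories is an assumption made for the example; the text only states that data D1 to D4 are either split across the memories 530 or placed in full in each of them.

```python
def place_distributed(data, num_memories):
    """Split the data into contiguous chunks, one chunk per memory."""
    chunk = -(-len(data) // num_memories)  # ceiling division
    return [data[i * chunk:(i + 1) * chunk] for i in range(num_memories)]


def place_copy(data, num_memories):
    """Place a full copy of the data in every memory."""
    return [list(data) for _ in range(num_memories)]


tensor = ["D1", "D2", "D3", "D4"]
print(place_distributed(tensor, 4))  # [['D1'], ['D2'], ['D3'], ['D4']]
print(place_copy(tensor, 4))         # four lists, each ['D1', 'D2', 'D3', 'D4']
```

With the distributed type each memory holds a quarter of the tensor, while with the copy type each memory holds all of it, so the per-memory footprint differs by a factor equal to the number of memories.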

図８は、図３の情報処理装置１００Ａがニューラルネットワークの学習を実行する場合の動作の一例を示す。図８に示す動作は、図３のホストＣＰＵ２００がホストプログラム４００Ａを実行することにより実現される。すなわち、図８は、情報処理プログラムおよび情報処理方法の一例を示す。 FIG. 8 shows an example of the operation when the information processing apparatus 100A of FIG. 3 executes the learning of the neural network. The operation shown in FIG. 8 is realized by the host CPU 200 of FIG. 3 executing the host program 400A. That is, FIG. 8 shows an example of an information processing program and an information processing method.

例えば、図８に示す動作は、ニューラルネットワークの学習を実行する指示を情報処理装置１００Ａが受けたことに基づいて開始される。なお、図８に示す動作は、順伝播処理および逆伝播処理において、ニューラルネットワークに含まれる複数の層の各々に対応して実行される。 For example, the operation shown in FIG. 8 is started based on the fact that the information processing apparatus 100A receives an instruction to execute the learning of the neural network. The operation shown in FIG. 8 is executed in the forward propagation process and the back propagation process corresponding to each of the plurality of layers included in the neural network.

まず、ステップＳ１０において、図３の演算種決定部４１１は、ユーザ定義ファイル３１０中のニューラルネットワークにおける学習対象の層の構成情報等の仕様に基づいて、演算種を決定する。例えば、学習対象の層が畳み込み層の場合、"畳み込み"が演算種に決定される。演算種は、学習対象の層の学習に使用する演算方式の一例である。 First, in step S10, the calculation type determination unit 411 of FIG. 3 determines the calculation type based on the specifications such as the configuration information of the layer to be learned in the neural network in the user-defined file 310. For example, when the layer to be learned is a convolution layer, "convolution" is determined as the operation type. The calculation type is an example of a calculation method used for learning the layer to be learned.

次に、ステップＳ１２において、変換／転送指示部４１４は、ユーザにより指定されたバッチサイズと、ニューラルネットワークにおける学習対象の層の仕様と、学習データとに基づいて、各メモリ５３０において、演算に使用するメモリ領域を確保する。ここで、学習対象の層の仕様と学習データとして、入出力データの配置パターン（分散型／コピー型）と、演算に使用するデータサイズとが使用され、メモリ領域（第１のメモリ領域）が確保される。そして、変換／転送指示部４１４は、配置パターンにしたがって、各メモリ５３０に確保したメモリ領域に、演算に使用する学習データ（ユーザにより指定されたバッチサイズ分のデータ）を転送する。 Next, in step S12, the conversion / transfer instruction unit 414 secures, in each memory 530, a memory area to be used for the calculation, based on the batch size specified by the user, the specifications of the layer to be learned in the neural network, and the learning data. Here, the input / output data arrangement pattern (distributed type / copy type) and the data size used for the calculation serve as the layer specifications and the learning data, and the memory area (first memory area) is secured accordingly. Then, the conversion / transfer instruction unit 414 transfers the learning data used for the calculation (data for the batch size specified by the user) to the memory area secured in each memory 530 according to the arrangement pattern.

なお、ニューラルネットワークの学習（順伝播処理または逆伝播処理のいずれか）において、２層目以降の処理では、前の層の演算により得られた出力データが入力データとして使用される。また、前の層の演算により得られた出力データの配置パターンが、演算対象の層の入力データの配置パターンとして使用される。このため、２層目以降の各層の学習では、ステップＳ１２によるメモリ領域の確保とデータ転送は実行されなくてもよい。 In the learning of the neural network (either forward propagation processing or back propagation processing), in the processing of the second and subsequent layers, the output data obtained by the calculation of the previous layer is used as the input data. Further, the arrangement pattern of the output data obtained by the calculation of the previous layer is used as the arrangement pattern of the input data of the layer to be calculated. Therefore, in the learning of each layer after the second layer, it is not necessary to secure the memory area and transfer the data in step S12.

次に、ステップＳ１４において、情報管理部４２１は、ステップＳ１０で決定した演算種と、ユーザにより指定されたバッチサイズと、演算に使用する学習データのサイズとに基づいて、演算対象の層で実行するための処理命令を選択する。ここで、２層目以降の各層の処理では、前の層の演算により得られた出力データのサイズを含めたメモリサイズに基づいて、処理命令が選択される。 Next, in step S14, the information management unit 421 selects the processing instruction to be executed for the target layer, based on the calculation type determined in step S10, the batch size specified by the user, and the size of the learning data used for the calculation. In the processing of the second and subsequent layers, the processing instruction is selected based on the memory size including the size of the output data obtained by the calculation of the previous layer.

次に、ステップＳ１６において、情報管理部４２１は、例えば、図５に示した情報に基づいて、ステップＳ１４で選択した処理命令に対応するデータの配置パターンを求める。また、情報管理部４２１は、ステップＳ１４で選択した処理命令で指定される処理命令固有のバッチサイズおよび配置パターンと、学習データのサイズとに基づいて、選択した処理命令で使用するメモリサイズとを算出する。 Next, in step S16, the information management unit 421 obtains the data arrangement pattern corresponding to the processing instruction selected in step S14, for example based on the information shown in FIG. 5. The information management unit 421 also calculates the memory size used by the selected processing instruction, based on the instruction-specific batch size and arrangement pattern specified by the processing instruction selected in step S14 and on the size of the learning data.
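
Steps S14 and S16 can be sketched in Python as a lookup over a specification table followed by a size calculation. Everything specific here is an assumption made for illustration: the `InstructionSpec` type, the batch counts assigned to Conv1 and Conv2, the per-input placements, and the selection rule (smallest supported batch count that is not below the requested one). The text does not state the actual rule or the contents of FIG. 5 beyond Add1 and the Gemm instructions accepting any batch count.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional


@dataclass(frozen=True)
class InstructionSpec:
    name: str
    op: str
    batch: Optional[int]  # None means any batch count is accepted
    placement: tuple      # (input 1, input 2, output 1)


# Hypothetical table; only the "any batch count" entries follow the text.
SPECS = [
    InstructionSpec("Conv1", "conv", 16, ("distributed", "copy", "distributed")),
    InstructionSpec("Conv2", "conv", 32, ("distributed", "copy", "distributed")),
    InstructionSpec("Add1", "add", None, ("distributed",) * 3),
    InstructionSpec("Gemm1", "gemm", None, ("distributed", "copy", "distributed")),
]


def select_instruction(op, requested_batch):
    """S14 (assumed rule): smallest supported batch count >= the requested one."""
    candidates = [s for s in SPECS
                  if s.op == op and (s.batch is None or s.batch >= requested_batch)]
    if not candidates:
        raise ValueError(f"no {op} instruction supports batch {requested_batch}")
    return min(candidates,
               key=lambda s: requested_batch if s.batch is None else s.batch)


def instruction_bytes(spec, requested_batch, per_sample_dims, bytes_per_element=4):
    """S16: memory size implied by the instruction's own batch count."""
    batch = requested_batch if spec.batch is None else spec.batch
    return batch * prod(per_sample_dims) * bytes_per_element


spec = select_instruction("conv", 28)
print(spec.name, instruction_bytes(spec, 28, (3, 10, 10)))  # Conv2 38400
```

Under these assumptions a requested batch of 28 selects a 32-batch instruction, so the instruction expects 38400 bytes while the user data occupies only 28 × 300 × 4 = 33600 bytes. A gap of this kind is what the size comparison in step S20 detects.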

次に、ステップＳ２０において、データサイズ変更部４１２は、ステップＳ１６で算出したメモリサイズ（選択した処理命令で指定されるバッチサイズに依存する）と、ステップＳ１２で確保したメモリ領域のメモリサイズとの一致／不一致を判定する。なお、２層目以降の処理では、前の層の演算により得られた出力データのサイズを含めたメモリサイズが、ステップＳ１６で算出したメモリサイズと比較される。メモリサイズが一致する場合、処理はステップＳ２４に移行し、メモリサイズが不一致の場合、処理はステップＳ２２に移行する。 Next, in step S20, the data size changing unit 412 determines whether the memory size calculated in step S16 (which depends on the batch size specified by the selected processing instruction) matches the memory size of the memory area secured in step S12. In the processing of the second and subsequent layers, the memory size including the size of the output data obtained by the calculation of the previous layer is compared with the memory size calculated in step S16. If the memory sizes match, the process proceeds to step S24; if they do not match, the process proceeds to step S22.

ステップＳ２２において、データサイズ変更部４１２は、ステップＳ１２で確保したメモリ領域（第１のメモリ領域）に代えて、ステップＳ１６で算出したメモリサイズに対応するメモリ領域（第２のメモリ領域）を各メモリ５３０に新たに確保する。すなわち、データサイズ変更部４１２は、ステップＳ１２で確保した入出力データのメモリサイズを、処理命令の仕様であるメモリサイズに変更する。データサイズ変更部４１２は、新たに確保したメモリ領域に、演算に使用する学習データ（ユーザにより指定されたバッチサイズ分のデータ）を転送し、処理をステップＳ２４に移行する。 In step S22, the data size changing unit 412 newly secures, in each memory 530, a memory area (second memory area) corresponding to the memory size calculated in step S16, in place of the memory area (first memory area) secured in step S12. That is, the data size changing unit 412 changes the memory size of the input / output data secured in step S12 to the memory size required by the specification of the processing instruction. The data size changing unit 412 transfers the learning data used for the calculation (data for the batch size specified by the user) to the newly secured memory area, and the process proceeds to step S24.

なお、新たに確保したメモリ領域のサイズが、演算に使用する学習データのサイズより大きい場合、データサイズ変更部４１２は、空きのメモリ領域に処理命令による演算の実行に影響を与えないダミーデータを格納してもよい。なお、ステップＳ１２で確保したメモリ領域に保持されているデータは、使用禁止または無効にされ、あるいは破棄される。 If the size of the newly secured memory area is larger than the size of the learning data used for the calculation, the data size changing unit 412 may store, in the unused part of the memory area, dummy data that does not affect the execution of the calculation by the processing instruction. The data held in the memory area secured in step S12 is prohibited from use, invalidated, or discarded.
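
The reallocation in step S22, including the optional dummy data, can be sketched as follows. The function `resize_region` is a hypothetical helper that models one memory area as a Python `bytes` object. Using zero as the dummy value is an assumption; whether a given value leaves the result unaffected depends on the operation the processing instruction performs. The text does not describe the case where the existing data is larger than the required size, so this sketch simply rejects it.

```python
def resize_region(region: bytes, required_size: int, fill: int = 0) -> bytes:
    """Return a region of exactly required_size bytes holding the old data.

    If the sizes already match (the S20 "match" branch), the original
    region is reused. Otherwise a new region is built and the unused
    tail is padded with dummy bytes.
    """
    if len(region) == required_size:
        return region
    if len(region) > required_size:
        raise ValueError("existing data does not fit in the required size")
    return region + bytes([fill]) * (required_size - len(region))


old = bytes([1]) * 8400            # first memory area, 8400 bytes
new = resize_region(old, 9600)     # second memory area, 9600 bytes
print(len(new), new[8400:] == bytes(1200))  # 9600 True
```

After the transfer, the old region would be invalidated or discarded, as the text describes.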

ステップＳ２４において、変換／転送指示部４１４は、ステップＳ１２で使用したデータの配置パターンと、ステップＳ１６で求めたデータの配置パターンとの一致／不一致を判定する。ここで、配置パターンの一致／不一致は、データの種類毎（図５の入力１、入力２、出力１）に判定される。すなわち、ステップＳ２４、Ｓ２６は、データの種類毎に判定される。配置パターンが一致する場合、処理はステップＳ２８に移行し、配置パターンが不一致の場合、処理はステップＳ２６に移行する。なお、２層目以降の処理では、前の層の演算により得られた出力データの配置パターンを含めた配置パターンが、ステップＳ１６で求めたデータの配置パターンと比較される。 In step S24, the conversion / transfer instruction unit 414 determines a match / mismatch between the data arrangement pattern used in step S12 and the data arrangement pattern obtained in step S16. Here, the match / mismatch of the arrangement pattern is determined for each type of data (input 1, input 2, output 1 in FIG. 5). That is, steps S24 and S26 are determined for each type of data. If the arrangement patterns match, the process proceeds to step S28, and if the arrangement patterns do not match, the process proceeds to step S26. In the processing of the second and subsequent layers, the arrangement pattern including the arrangement pattern of the output data obtained by the calculation of the previous layer is compared with the arrangement pattern of the data obtained in step S16.

ステップＳ２６において、変換／転送指示部４１４は、ステップＳ１２で確保したメモリ領域（第１のメモリ領域）に代えて、ステップＳ１６で算出したメモリサイズに対応するメモリ領域（第３のメモリ領域）を各メモリ５３０に新たに確保する。すなわち、変換／転送指示部４１４は、ステップＳ１２で確保したメモリ領域に対応する配置パターンを処理命令の仕様である配置パターンに変更する。変換／転送指示部４１４は、新たに確保したメモリ領域に、演算に使用する学習データ（ユーザにより指定されたバッチサイズ分のデータ）を、ステップＳ１６で求めたデータの配置パターンにしたがって転送し、処理をステップＳ２８に移行する。 In step S26, the conversion / transfer instruction unit 414 newly secures, in each memory 530, a memory area (third memory area) corresponding to the memory size calculated in step S16, in place of the memory area (first memory area) secured in step S12. That is, the conversion / transfer instruction unit 414 changes the arrangement pattern of the memory area secured in step S12 to the arrangement pattern required by the specification of the processing instruction. The conversion / transfer instruction unit 414 transfers the learning data used for the calculation (data for the batch size specified by the user) to the newly secured memory area according to the data arrangement pattern obtained in step S16, and the process proceeds to step S28.

なお、ステップＳ２２、Ｓ２６の処理は、同時に実行されてもよい。すなわち、メモリサイズが不一致で、配置パターンも不一致の場合、メモリ領域の確保とデータの転送とは、一度に実行される。 The processes of steps S22 and S26 may be executed at the same time. That is, when the memory sizes do not match and the arrangement patterns do not match, the memory area is secured and the data is transferred at the same time.

ステップＳ２８において、処理命令実行部４３２は、ステップＳ１４で選択された処理命令を各演算コア５１０に転送し、各演算コア５１０に処理命令を実行させ、図８に示す処理を終了する。これにより、ニューラルネットワークの１つの層の学習処理が実行される。 In step S28, the processing instruction execution unit 432 transfers the processing instruction selected in step S14 to each arithmetic core 510, causes each arithmetic core 510 to execute the processing instruction, and ends the processing shown in FIG. 8. As a result, the learning process of one layer of the neural network is executed.

図９は、図３の情報処理装置において、ニューラルネットワークの学習対象の層における学習時の動作の一例を示す。例えば、図９は、画像を認識するニューラルネットワークの学習時の動作の一例を示し、各次元の要素数が、バッチサイズＮ＝３２、色Ｃ＝３、画素数Ｗ＝１０、画素数Ｈ＝１０、データが単精度浮動小数点数（４バイト）であるとする。学習対象の層に入力される入力データ（入力テンソルデータ）のサイズは３８４００バイトであるとする。また、学習対象の層と、学習対象の層の１つ前の層とにおいて、データの配置パターンは、分散型であるとする。以下では、学習対象の層を対象層とも称し、学習対象の層の１つ前の層を前層とも称する。 FIG. 9 shows an example of the operation during learning in the layer to be learned of the neural network in the information processing apparatus of FIG. For example, FIG. 9 shows an example of an operation during learning of a neural network that recognizes an image, and the number of elements in each dimension is batch size N = 32, color C = 3, number of pixels W = 10, number of pixels H =. 10. Assume that the data is a single precision floating point number (4 bytes). It is assumed that the size of the input data (input tensor data) input to the layer to be learned is 38,400 bytes. Further, it is assumed that the data arrangement pattern is distributed in the layer to be learned and the layer immediately before the layer to be learned. Hereinafter, the layer to be learned is also referred to as a target layer, and the layer immediately preceding the layer to be learned is also referred to as a front layer.

図９（ａ）では、前の層の出力データ（出力テンソルデータ）は、３８４００バイト（９６００×４）であるため、プロセッサ５００は、演算結果である出力データを、９６００バイトずつ４つのメモリ５３０に分散して格納する。 In FIG. 9(a), since the output data (output tensor data) of the previous layer is 38400 bytes (9600 × 4), the processor 500 stores the output data, which is the calculation result, distributed across the four memories 530 in 9600-byte portions.

対象層において、図８のステップＳ１６で算出されたメモリサイズは３８４００バイトであり、配置パターンは分散型である。このため、図８のステップＳ２０、Ｓ２４の判定において、メモリサイズ、配置パターンとも一致し、メモリサイズおよび配置パターンの変更は行われない。 In the target layer, the memory size calculated in step S16 of FIG. 8 is 38,400 bytes, and the arrangement pattern is distributed. Therefore, in the determination in steps S20 and S24 of FIG. 8, the memory size and the arrangement pattern also match, and the memory size and the arrangement pattern are not changed.

そして、プロセッサ５００は、メモリ５３０に格納された前層の演算結果を入力データとして、対象層の学習を実行する。このように、対象層の学習に使用するデータを保持するメモリ領域のメモリサイズと、対象層の学習に使用する処理命令で指定されるバッチサイズに基づき算出されたメモリサイズとが一致する場合、メモリ領域をそのまま使用して処理命令を実行することができる。前層の演算後、対象層の演算を実行するまでにメモリ領域を確保するなどの前処理がないため、学習効率を低下させずに、対象層の学習において処理命令の演算を実行することができる。 Then, the processor 500 executes the learning of the target layer using the calculation result of the previous layer stored in the memory 530 as input data. In this way, when the memory size of the memory area that holds the data used for learning the target layer matches the memory size calculated based on the batch size specified by the processing instruction used for learning the target layer, the processing instruction can be executed using the memory area as it is. Since no preprocessing such as securing a memory area is needed between the calculation of the previous layer and the calculation of the target layer, the calculation of the processing instruction can be executed in the learning of the target layer without lowering the learning efficiency.

図９（ｂ）では、前の層の出力データ（出力テンソルデータ）用のメモリ領域のメモリサイズは、メモリ５３０毎に８４００バイトであり、プロセッサ５００は、演算結果である出力データを、８４００バイトずつ４つのメモリ５３０に分散して格納する。一方、対象層において、図８のステップＳ１６で算出されたメモリサイズは３８４００バイト（９６００×４）であり、配置パターンは分散型である。対象層の演算を実行する処理命令で使用するメモリ５３０毎のメモリサイズは９６００バイトとなり、図８のステップＳ２０、Ｓ２４の判定において、配置パターンは一致するが、メモリサイズは不一致となる。 In FIG. 9(b), the memory size of the memory area for the output data (output tensor data) of the previous layer is 8400 bytes per memory 530, and the processor 500 stores the output data, which is the calculation result, distributed across the four memories 530 in 8400-byte portions. In the target layer, on the other hand, the memory size calculated in step S16 of FIG. 8 is 38400 bytes (9600 × 4), and the arrangement pattern is the distributed type. The memory size per memory 530 used by the processing instruction that executes the calculation of the target layer is therefore 9600 bytes, so in the determinations in steps S20 and S24 of FIG. 8 the arrangement patterns match but the memory sizes do not.

したがって、図８のステップＳ２２において、データサイズ変更部４１２は、前層の出力データを保持している第１のメモリ領域に代えて、処理命令の仕様であるメモリサイズに対応する第２のメモリ領域を各メモリ５３０に新たに確保する。そして、データサイズ変更部４１２は、新たに確保した第２のメモリ領域に、前層の出力データを転送する。 Therefore, in step S22 of FIG. 8, the data size changing unit 412 replaces the first memory area holding the output data of the previous layer with the second memory corresponding to the memory size specified in the processing instruction. An area is newly secured in each memory 530. Then, the data size changing unit 412 transfers the output data of the previous layer to the newly secured second memory area.

これにより、対象層での学習において、入力データが格納されているメモリ領域のメモリサイズと、処理命令に対応するメモリサイズとを一致させることができ、プロセッサ５００による対象層の学習を正常に実行することができる。 As a result, in the learning in the target layer, the memory size of the memory area in which the input data is stored can be matched with the memory size corresponding to the processing instruction, and the learning of the target layer by the processor 500 is normally executed. can do.

なお、データの配置パターンがコピー型であり、前層の演算結果である出力データを格納する各メモリ５３０のメモリ領域のメモリサイズが３８４００バイトより少ない場合にも、図９（ｂ）と同様の処理が行われる。すなわち、データサイズ変更部４１２は、前層の出力データを保持しているメモリ領域に代えて、処理命令の仕様であるメモリサイズ（３８４００バイト）に対応するメモリ領域を各メモリ５３０に新たに確保する。そして、データサイズ変更部４１２は、新たに確保したメモリ領域に、前層の出力データ（コピー型）を転送する。 The same processing as in FIG. 9(b) is also performed when the data arrangement pattern is the copy type and the memory size of the memory area of each memory 530 that stores the output data, which is the calculation result of the previous layer, is smaller than 38400 bytes. That is, the data size changing unit 412 newly secures, in each memory 530, a memory area corresponding to the memory size required by the specification of the processing instruction (38400 bytes), in place of the memory area holding the output data of the previous layer. Then, the data size changing unit 412 transfers the output data (copy type) of the previous layer to the newly secured memory area.

これにより、対象層での学習において、入力データが格納されているメモリ領域のメモリサイズと、処理命令の仕様であるメモリサイズとが一致させることができ、プロセッサ５００による対象層の学習を正常に実行することができる。 As a result, in the learning in the target layer, the memory size of the memory area in which the input data is stored can be matched with the memory size which is the specification of the processing instruction, and the learning of the target layer by the processor 500 can be performed normally. Can be executed.
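
The per-memory comparison in the FIG. 9 examples can be written as a small Python sketch. The helper names `per_memory_bytes` and `needs_new_region` are hypothetical; the figures of 38400 bytes in total, four memories, and existing areas of 9600 or 8400 bytes come from the text.

```python
def per_memory_bytes(total_bytes, num_memories, placement):
    """Bytes each memory must hold for the given arrangement pattern."""
    if placement == "copy":
        return total_bytes                 # every memory holds the whole tensor
    if placement == "distributed":
        if total_bytes % num_memories:
            raise ValueError("tensor does not divide evenly across memories")
        return total_bytes // num_memories
    raise ValueError(f"unknown placement: {placement}")


def needs_new_region(existing_bytes, total_bytes, num_memories, placement):
    """True when the existing area does not match the instruction's size."""
    return existing_bytes != per_memory_bytes(total_bytes, num_memories, placement)


print(per_memory_bytes(38400, 4, "distributed"))        # 9600
print(needs_new_region(9600, 38400, 4, "distributed"))  # False, as in FIG. 9(a)
print(needs_new_region(8400, 38400, 4, "distributed"))  # True, as in FIG. 9(b)
```

For the copy type the required per-memory size is the full 38400 bytes, so any smaller existing area also triggers a new allocation.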

なお、入力データを保持しているメモリ領域のメモリサイズと、対象層の演算の実行する処理命令の仕様であるメモリサイズとが一致しない場合で、新たなメモリ領域を確保しない場合、対象層の演算の実行時にエラーが発生する。このため、ニューラルネットワークの学習の実行が困難になる。 If the memory size of the memory area that holds the input data does not match the memory size that is the specification of the processing instruction that executes the operation of the target layer, and if a new memory area is not secured, the target layer An error occurs when executing an operation. Therefore, it becomes difficult to execute the learning of the neural network.

図１０は、図３の情報処理装置において、ニューラルネットワークの学習対象の層における学習時の動作の別の例を示す。図９と同様の動作については、詳細な説明は省略する。図１０は、図９と同様に、画像を認識するニューラルネットワークの学習時の動作の一例を示し、各次元の要素数が、バッチサイズＮ＝３２、色Ｃ＝３、画素数Ｗ＝１０、画素数Ｈ＝１０、データが単精度浮動小数点数であるとする。 FIG. 10 shows another example of the operation during learning in the layer to be learned of the neural network in the information processing apparatus of FIG. A detailed description of the same operation as in FIG. 9 will be omitted. FIG. 10 shows an example of the operation during learning of the neural network that recognizes the image as in FIG. 9, and the number of elements in each dimension is batch size N = 32, color C = 3, number of pixels W = 10, and so on. It is assumed that the number of pixels H = 10 and the data is a single precision floating point number.

対象層のデータの配置パターンは、コピー型であり、対象層に入力される入力データ（テンソルデータ）のサイズは３８４００バイト（９６００×４）であるとする。また、図１０（ａ）では、前層のデータの配置パターンは、コピー型であり、図１０（ｂ）では、前層のデータの配置パターンは分散型であるとする。 It is assumed that the data arrangement pattern of the target layer is a copy type, and the size of the input data (tensor data) input to the target layer is 38400 bytes (9600 × 4). Further, in FIG. 10A, it is assumed that the data arrangement pattern of the front layer is a copy type, and in FIG. 10B, the data arrangement pattern of the front layer is a distributed type.

図１０（ａ）では、前層の出力データ（テンソルデータ）は、３８４００バイト（９６００×４）であるため、プロセッサ５００は、演算結果である３８４００バイトの出力データを４つのメモリ５３０のそれぞれに格納する。対象層において、図８のステップＳ１６で算出されたメモリサイズは３８４００バイトであり、配置パターンはコピー型である。このため、図８のステップＳ２０、Ｓ２４の判定において、メモリサイズ、配置パターンとも一致し、メモリサイズおよび配置パターンの変更は行われない。 In FIG. 10A, since the output data (tensor data) of the front layer is 38400 bytes (9600 × 4), the processor 500 transfers the 38400 bytes of output data, which is the calculation result, to each of the four memories 530. Store. In the target layer, the memory size calculated in step S16 of FIG. 8 is 38,400 bytes, and the arrangement pattern is a copy type. Therefore, in the determination in steps S20 and S24 of FIG. 8, the memory size and the arrangement pattern also match, and the memory size and the arrangement pattern are not changed.

そして、プロセッサ５００は、メモリ５３０に格納された前層の演算結果を入力データとして、対象層の学習を実行する。このように、対象層の学習に使用するデータを配置パターンと、対象層の学習に使用する処理命令で指定される配置パターンとが一致し、メモリサイズが一致する場合、メモリ領域をそのまま使用して処理命令を実行することができる。図９（ａ）と同様に、前層の演算後、対象層の演算を実行するまでにメモリ領域を確保するなどの前処理がないため、学習効率を低下させずに、対象層の学習において処理命令の演算を実行することができる。 Then, the processor 500 executes learning of the target layer using the calculation result of the previous layer stored in the memory 530 as input data. In this way, if the arrangement pattern of the data used for learning the target layer matches the arrangement pattern specified by the processing instruction used for learning the target layer, and the memory size matches, the memory area is used as it is. Can execute processing instructions. Similar to FIG. 9A, since there is no preprocessing such as securing a memory area after the calculation of the previous layer and before the calculation of the target layer is executed, the learning of the target layer is performed without lowering the learning efficiency. It is possible to execute the operation of the processing instruction.

図１０（ｂ）では、前の層の出力データ（テンソルデータ）用のメモリ領域は分散型として確保されており、メモリサイズは、メモリ５３０毎に９６００バイトである。プロセッサ５００は、前の層の出力データ（９６００バイト）を、各メモリ５３０に分散して格納する。このため、図８のステップＳ２０、Ｓ２４の判定において、メモリサイズおよび配置パターンとも一致しない。 In FIG. 10B, the memory area for the output data (tensor data) of the previous layer is secured as a distributed type, and the memory size is 9600 bytes for each memory 530. The processor 500 distributes and stores the output data (9,600 bytes) of the previous layer in each memory 530. Therefore, in the determination in steps S20 and S24 of FIG. 8, the memory size and the arrangement pattern do not match.

このため、図８のステップＳ２２、Ｓ２６において、処理命令の仕様であるメモリサイズと配置パターン（コピー型）に対応するメモリ領域（第３のメモリ領域）が各メモリ５３０に新たに確保される。そして、新たに確保した第３のメモリ領域に、前層の出力データがコピー型で転送される。 Therefore, in steps S22 and S26 of FIG. 8, a memory area (third memory area) corresponding to the memory size and the arrangement pattern (copy type), which are the specifications of the processing instruction, is newly secured in each memory 530. Then, the output data of the previous layer is transferred to the newly secured third memory area in a copy type.

これにより、対象層での学習で、入力データが格納されているメモリ領域のメモリサイズ／配置パターンと、処理命令に対応するメモリサイズ／配置パターンとをそれぞれ一致させることができ、プロセッサ５００による対象層の学習を正常に実行することができる。なお、前層の出力データがコピー型であり、処理命令の仕様が分散型の場合にも、分散型に対応するメモリサイズの第３のメモリ領域を新たに確保し、データを分散型として転送することで、図１０（ｂ）と同様の処理を実施することができる。 As a result, the memory size / allocation pattern of the memory area in which the input data is stored can be matched with the memory size / allocation pattern corresponding to the processing instruction by learning in the target layer, and the target by the processor 500 can be matched. Layer learning can be performed normally. Even when the output data of the previous layer is a copy type and the processing instruction specifications are distributed type, a third memory area having a memory size corresponding to the distributed type is newly secured and the data is transferred as the distributed type. By doing so, the same processing as in FIG. 10B can be performed.
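
The change of arrangement pattern in FIG. 10(b), and the reverse case mentioned above, can be sketched in Python by modelling each memory 530 as a `bytes` object. The function names are hypothetical, and the sketch assumes the distributed shards are contiguous, equal-sized slices of the tensor, which the text does not state explicitly.

```python
def distributed_to_copy(shards):
    """Gather all shards and place the full tensor in every memory."""
    full = b"".join(shards)
    return [full for _ in shards]


def copy_to_distributed(copies):
    """Keep one equal-sized slice of the full tensor in each memory."""
    full, n = copies[0], len(copies)
    if len(full) % n:
        raise ValueError("tensor does not divide evenly across memories")
    chunk = len(full) // n
    return [full[i * chunk:(i + 1) * chunk] for i in range(n)]


# Four memories holding 9600 bytes each (distributed), as in FIG. 10(b).
shards = [bytes([i]) * 9600 for i in range(4)]
copies = distributed_to_copy(shards)
print(len(copies), len(copies[0]))  # 4 38400
```

Converting to the copy type raises the per-memory footprint from 9600 to 38400 bytes, which is why both the size check and the pattern check fail in that example.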

なお、例えば、前層の出力データの配置パターンが分散型であり、対象層の入力データの配置パターンがコピー型である場合、各演算コア５１０での演算に使用するデータが不足するため、対象層の演算の実行時にエラーが発生する。このため、ニューラルネットワークの学習の実行が困難になる。 For example, when the arrangement pattern of the output data of the previous layer is a distributed type and the arrangement pattern of the input data of the target layer is a copy type, the data used for the calculation in each calculation core 510 is insufficient, so that the target An error occurs when performing a layer operation. Therefore, it becomes difficult to execute the learning of the neural network.

以上、図３から図１０に示す実施形態においても、図１および図２に示す実施形態と同様の効果を得ることができる。例えば、学習データを保持するメモリ領域のメモリサイズと、処理命令で指定されるバッチサイズに基づき算出されるメモリサイズとが不一致の場合、処理命令に対応するメモリサイズの新たなメモリ領域を確保して学習データを転送する。これにより、入力データが格納されているメモリ領域のメモリサイズと、処理命令に対応するメモリサイズとを一致させることができ、任意のバッチサイズが指定された場合にも、エラーを発生させることなく学習を実行することができる。 As described above, even in the embodiments shown in FIGS. 3 to 10, the same effects as those in the embodiments shown in FIGS. 1 and 2 can be obtained. For example, if the memory size of the memory area that holds the training data and the memory size calculated based on the batch size specified by the processing instruction do not match, a new memory area of the memory size corresponding to the processing instruction is secured. And transfer the training data. As a result, the memory size of the memory area where the input data is stored can be matched with the memory size corresponding to the processing instruction, and even if an arbitrary batch size is specified, an error does not occur. You can perform learning.

さらに、図３から図１０に示す実施形態では、学習データと処理命令との配置パターンが不一致の場合に、新たなメモリ領域を確保して学習データを処理命令で指定される配置パターンに合わせて新たなメモリ領域に転送する。これにより、入力データが格納されているメモリ領域のメモリサイズ／配置パターンと、処理命令の仕様であるメモリサイズ／配置パターンとをそれぞれ一致させることができ、プロセッサ５００による対象層の学習を正常に実行することができる。 Further, in the embodiment shown in FIGS. 3 to 10, when the arrangement patterns of the learning data and the processing instruction do not match, a new memory area is secured and the learning data is matched with the arrangement pattern specified by the processing instruction. Transfer to a new memory area. As a result, the memory size / allocation pattern of the memory area in which the input data is stored can be matched with the memory size / allocation pattern which is the specification of the processing instruction, and the learning of the target layer by the processor 500 can be performed normally. Can be executed.

対象層の学習に使用するデータを保持するメモリ領域のメモリサイズと、対象層の学習に使用する処理命令で指定されるバッチサイズに基づいて算出されたメモリサイズとが一致する場合、メモリ領域をそのまま使用して処理命令を実行することができる。また、対象層の学習に使用するデータを配置パターンと、対象層の学習に使用する処理命令で指定される配置パターンとが一致し、メモリサイズが一致する場合、メモリ領域をそのまま使用して処理命令を実行することができる。これらの場合、前層の演算後、対象層の演算を実行するまでにメモリ領域を確保するなどの前処理がないため、学習効率を低下させずに、対象層の学習において処理命令の演算を実行することができる。 If the memory size of the memory area that holds the data used for learning the target layer matches the memory size calculated based on the batch size specified by the processing instruction used for learning the target layer, the processing instruction can be executed using the memory area as it is. Likewise, if the arrangement pattern of the data used for learning the target layer matches the arrangement pattern specified by the processing instruction used for learning the target layer, and the memory sizes also match, the processing instruction can be executed using the memory area as it is. In these cases, no preprocessing such as securing a memory area is needed between the operation of the previous layer and the operation of the target layer, so the processing instruction can be executed in the learning of the target layer without lowering the learning efficiency.

この結果、任意のバッチサイズが指定された場合にも、エラーを発生させることなく学習を実行することができる。 As a result, even if an arbitrary batch size is specified, learning can be executed without causing an error.

図１１は、別の実施形態における情報処理装置の一例を示す。図１および図３と同様の要素については、詳細な説明は省略する。図１１に示す情報処理装置１００Ｂは、例えば、サーバであり、図１と同様に、ホストＣＰＵ２００、ホストメモリ３００、Ｉ／Ｏコントローラ２２０およびプロセッサ５００を有する。ホストＣＰＵ２００は、ホストメモリ３００に格納された各種プログラムを実行することで、情報処理装置１００Ｂの全体の動作を制御するとともに、プロセッサ５００の動作を制御し、学習データを用いてニューラルネットワークの学習を実行する。なお、図１１のホストメモリ３００には、図３のホストプログラム４００Ａの代わりにホストプログラム４００Ｂが格納される。 FIG. 11 shows an example of an information processing apparatus according to another embodiment. Detailed description of the same elements as those in FIGS. 1 and 3 will be omitted. The information processing device 100B shown in FIG. 11 is, for example, a server, and has a host CPU 200, a host memory 300, an I/O controller 220, and a processor 500, as in FIG. 1. By executing various programs stored in the host memory 300, the host CPU 200 controls the overall operation of the information processing device 100B, controls the operation of the processor 500, and executes the learning of the neural network using the learning data. The host memory 300 of FIG. 11 stores the host program 400B instead of the host program 400A of FIG. 3.

ニューラルネットワークに演算を実行させるためのホストプログラム４００Ｂは、ディープラーニング（ＤＬ）用のフレームワーク４１０、ディープニューラルネットワーク（ＤＮＮ）のライブラリ４２０およびランタイムライブラリ４３０を有する。 The host program 400B for causing the neural network to execute an operation has a framework 410 for deep learning (DL), a library 420 for a deep neural network (DNN), and a runtime library 430.

ＤＬフレームワーク４１０は、演算種決定部４１１、データサイズ変更部４１２、問い合わせ部４１３、変換／転送指示部４１４および実行指示部４１５を有する。演算種決定部４１１、データサイズ変更部４１２および変換／転送指示部４１４の機能は、図３に示した演算種決定部４１１、データサイズ変更部４１２および変換／転送指示部４１４の機能とそれぞれ同様である。問い合わせ部４１３および実行指示部４１５の機能は、図１２で説明する。 The DL framework 410 includes a calculation type determination unit 411, a data size change unit 412, an inquiry unit 413, a conversion/transfer instruction unit 414, and an execution instruction unit 415. The functions of the calculation type determination unit 411, the data size change unit 412, and the conversion/transfer instruction unit 414 are the same as those of the calculation type determination unit 411, the data size change unit 412, and the conversion/transfer instruction unit 414 shown in FIG. 3, respectively. The functions of the inquiry unit 413 and the execution instruction unit 415 will be described with reference to FIG. 12.

ＤＮＮライブラリ４２０は、情報管理部４２１、演算処理予測部４２２および処理命令が格納されたファイル４２３を有する。情報管理部４２１の機能は、図３に示した情報管理部４２１の機能と同様である。演算処理予測部４２２の機能は、図１２および図１３で説明する。 The DNN library 420 has an information management unit 421, an arithmetic processing prediction unit 422, and a file 423 in which processing instructions are stored. The function of the information management unit 421 is the same as that of the information management unit 421 shown in FIG. 3. The functions of the arithmetic processing prediction unit 422 will be described with reference to FIGS. 12 and 13.

ランタイムライブラリ４３０は、変換／転送実行部４３１および処理命令実行部４３２を有する。ランタイムライブラリ４３０は、プロセッサ５００に演算を実行させるためのインタフェースであり、プロセッサ５００の仕様に合わせて設計され、例えば、複数の演算コア５１０に演算を並列に実行させる制御を実行する。変換／転送実行部４３１および処理命令実行部４３２の機能は、図１２で説明する。 The runtime library 430 has a conversion/transfer execution unit 431 and a processing instruction execution unit 432. The runtime library 430 is an interface for causing the processor 500 to execute operations and is designed according to the specifications of the processor 500. For example, the runtime library 430 performs control that causes a plurality of arithmetic cores 510 to execute operations in parallel. The functions of the conversion/transfer execution unit 431 and the processing instruction execution unit 432 will be described with reference to FIG. 12.

図１２は、図１１のホストＣＰＵ２００がホストプログラム４００Ｂを実行することで実現される機能の一例を示す。以下では、複数の層を有するディープニューラルネットワークにおいて、学習対象の層である対象層の学習を実行する場合の動作について説明する。なお、上述した実施形態で説明した機能および動作と同様の機能および動作については、詳細な説明は省略する。 FIG. 12 shows an example of the function realized by the host CPU 200 of FIG. 11 executing the host program 400B. In the following, in a deep neural network having a plurality of layers, the operation when learning the target layer, which is the layer to be learned, is described. A detailed description of the same functions and operations as those described in the above-described embodiment will be omitted.

ＤＬフレームワーク４１０は、対象層の学習を実行する指示を受けた場合、演算種決定部４１１に演算種を決定させる。この後、変換／転送指示部４１４は、ユーザにより指定されたバッチサイズと、入出力データのサイズおよび配置パターン等に基づいて、演算に使用するメモリ領域を各メモリ５３０に確保する指示をランタイムライブラリ４３０に発行する。入出力データのサイズおよび配置パターン等は、ユーザ定義ファイル３１０に格納されたニューラルネットワークの仕様と学習データとに基づいて求められる。 When the DL framework 410 receives an instruction to execute learning of the target layer, it causes the calculation type determination unit 411 to determine the calculation type. After that, the conversion/transfer instruction unit 414 issues, to the runtime library 430, an instruction to secure a memory area used for the calculation in each memory 530, based on the batch size specified by the user and on the size, arrangement pattern, and the like of the input/output data. The size, arrangement pattern, and the like of the input/output data are obtained based on the specifications of the neural network and the learning data stored in the user-defined file 310.

また、変換／転送指示部４１４は、各メモリ５３０に確保したメモリ領域に、演算に使用する学習データを転送する指示をランタイムライブラリ４３０に発行する。なお、上述した実施形態と同様に、対象層の前の層である前層の出力データが対象層の入力データとしてメモリ５３０に保持されている場合、変換／転送指示部４１４は、メモリ領域の確保と学習データの転送指示を発行せず、データの転送指示を発行しない。 Further, the conversion/transfer instruction unit 414 issues, to the runtime library 430, an instruction to transfer the learning data used for the calculation to the memory area secured in each memory 530. As in the above-described embodiments, when the output data of the previous layer, which is the layer before the target layer, is held in the memory 530 as the input data of the target layer, the conversion/transfer instruction unit 414 issues neither the instruction to secure the memory area nor the instruction to transfer the learning data.

ランタイムライブラリ４３０の変換／転送実行部４３１は、変換／転送指示部４１４からのメモリ領域の確保の指示に基づいて、メモリ５３０のメモリ領域を確保し、完了通知をＤＬフレームワーク４１０に発行する。また、変換／転送実行部４３１は、変換／転送指示部４１４からのデータ転送指示に基づいて、確保したメモリ領域に学習データを転送し、完了通知をＤＬフレームワーク４１０に発行する。 The conversion / transfer execution unit 431 of the runtime library 430 allocates the memory area of the memory 530 based on the instruction of the conversion / transfer instruction unit 414 to secure the memory area, and issues a completion notification to the DL framework 410. Further, the conversion / transfer execution unit 431 transfers the learning data to the reserved memory area based on the data transfer instruction from the conversion / transfer instruction unit 414, and issues a completion notification to the DL framework 410.

問い合わせ部４１３は、ＤＮＮライブラリ４２０の情報管理部４２１に、演算種とユーザにより指定されたバッチサイズとを通知し、対象層の学習に使用する処理命令を問い合わせる。また、問い合わせ部４１３は、情報管理部４２１に各次元の要素数を通知し、対象層の学習に使用する処理命令で使用するメモリサイズを問い合わせる。さらに、問い合わせ部４１３は、ランタイムライブラリ４３０に確保させたメモリ領域のメモリサイズと配置パターンとを演算処理予測部４２２に通知し、演算をより高速に実行できる演算方法を問い合わせる。 The inquiry unit 413 notifies the information management unit 421 of the DNN library 420 of the calculation type and the batch size specified by the user, and inquires about the processing instruction used for learning the target layer. Further, the inquiry unit 413 notifies the information management unit 421 of the number of elements in each dimension, and inquires about the memory size used in the processing instruction used for learning the target layer. Further, the inquiry unit 413 notifies the arithmetic processing prediction unit 422 of the memory size and the arrangement pattern of the memory area secured in the runtime library 430, and inquires about the arithmetic method capable of executing the arithmetic at a higher speed.

情報管理部４２１は、図５に示した各処理命令の仕様情報を保持しており、対象層の学習に適した処理命令を検索して問い合わせ部４１３に応答する。この際、情報管理部４２１は、検索で見つけた処理命令の対応バッチ数とテンソルデータ毎のデータの配置パターンを問い合わせ部４１３に通知する。また、情報管理部４２１は、問い合わせ部４１３から通知されたバッチサイズと各次元の要素数とに基づいて、検索で見つけた処理命令による対象層の学習（演算）の実行に使用するメモリサイズを算出する。そして、情報管理部４２１は、算出したメモリサイズを問い合わせ部４１３に通知する。 The information management unit 421 holds the specification information of each processing instruction shown in FIG. 5, searches for a processing instruction suitable for learning of the target layer, and responds to the inquiry unit 413. At this time, the information management unit 421 notifies the inquiry unit 413 of the batch size supported by the processing instruction found by the search and of the data arrangement pattern for each tensor data. Further, based on the batch size and the number of elements in each dimension notified from the inquiry unit 413, the information management unit 421 calculates the memory size used to execute the learning (operation) of the target layer with the processing instruction found by the search. Then, the information management unit 421 notifies the inquiry unit 413 of the calculated memory size.
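The lookup and size calculation described above could be sketched roughly as follows. This is a minimal illustration only: the selection rule (smallest supported batch size that covers the request, otherwise the largest available), the size formula (batch size times elements per sample times bytes per element), and every name in the code (`InstructionSpec`, `select_instruction`, `required_memory_bytes`, the instruction names) are assumptions made for the example and are not taken from the specification information of FIG. 5.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class InstructionSpec:
    """Hypothetical record of one processing instruction's specification."""
    name: str        # illustrative identifier, not from the source
    batch_size: int  # batch size the instruction processes per execution
    layout: str      # "distributed" or "copy" arrangement pattern


def select_instruction(specs, requested_batch):
    """Assumed rule: pick the smallest supported batch size that covers the
    requested batch size; if none covers it, fall back to the largest one."""
    covering = [s for s in specs if s.batch_size >= requested_batch]
    if covering:
        return min(covering, key=lambda s: s.batch_size)
    return max(specs, key=lambda s: s.batch_size)


def required_memory_bytes(spec, sample_dims, bytes_per_element=4):
    """Assumed formula: instruction batch size x elements per sample x
    bytes per element (4 bytes, i.e. 32-bit values, by default)."""
    return spec.batch_size * prod(sample_dims) * bytes_per_element


specs = [
    InstructionSpec("conv_b8", 8, "distributed"),
    InstructionSpec("conv_b16", 16, "distributed"),
    InstructionSpec("conv_b32", 32, "copy"),
]
chosen = select_instruction(specs, requested_batch=10)
size = required_memory_bytes(chosen, sample_dims=(3, 32, 32))
```

With a user-specified batch size of 10, this sketch selects the 16-sample instruction, so the memory size is computed for 16 samples rather than 10; that difference is the kind of mismatch the inquiry unit 413 then has to detect.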

演算処理予測部４２２は、問い合わせ部４１３からの問い合わせに基づいて、演算をより高速に実行できる演算方法を問い合わせ部４１３に通知する。演算処理予測部４２２は、メモリサイズと配置パターンとを変更せずに処理命令を実行する場合と、メモリサイズおよび配置パターンの一方または両方を変更して処理命令を実行する場合のいずれが、演算効率が高いかを問い合わせ部４１３に通知する。演算処理予測部４２２の機能については、図１３で説明する。 Based on the inquiry from the inquiry unit 413, the arithmetic processing prediction unit 422 notifies the inquiry unit 413 of the calculation method that can execute the operation faster. Specifically, the arithmetic processing prediction unit 422 notifies the inquiry unit 413 which has the higher calculation efficiency: executing the processing instruction without changing the memory size and the arrangement pattern, or executing the processing instruction after changing one or both of the memory size and the arrangement pattern. The function of the arithmetic processing prediction unit 422 will be described with reference to FIG. 13.

問い合わせ部４１３は、情報管理部４２１から通知された処理命令を保持する。また、問い合わせ部４１３は、メモリサイズまたは配置パターンを変更したほうが、演算効率が高いことを演算処理予測部４２２から通知された場合、メモリサイズまたは配置パターンの変更を実行する。すなわち、処理命令に対応するメモリサイズとメモリ領域のメモリサイズとの一致／不一致を判定し、処理命令で指定される配置パターンとメモリ領域に保持されたデータの配置パターンとの一致／不一致とを判定し、判定結果に応じた処理を実行する。 The inquiry unit 413 holds the processing instruction notified from the information management unit 421. Further, when the calculation processing prediction unit 422 notifies that the calculation efficiency is higher when the memory size or the arrangement pattern is changed, the inquiry unit 413 changes the memory size or the arrangement pattern. That is, the match / mismatch between the memory size corresponding to the processing instruction and the memory size of the memory area is determined, and the match / mismatch between the arrangement pattern specified by the processing instruction and the arrangement pattern of the data held in the memory area is determined. Judgment is made and processing is executed according to the judgment result.

一方、問い合わせ部４１３は、メモリサイズまたは配置パターンを変更しないほうが、演算効率が高いことを演算処理予測部４２２から通知された場合、メモリサイズの一致／不一致の判定および配置パターンの一致／不一致の判定を実行しない。この場合、ＤＬフレームワーク４１０は、現在のメモリ領域に保持されたデータを使用して、演算コア５１０に処理命令の演算を実行させる。 On the other hand, when the arithmetic processing prediction unit 422 notifies the inquiry unit 413 that the calculation efficiency is higher if the memory size or the arrangement pattern is not changed, the inquiry unit 413 performs neither the memory size match/mismatch determination nor the arrangement pattern match/mismatch determination. In this case, the DL framework 410 causes the arithmetic core 510 to execute the operation of the processing instruction using the data held in the current memory area.

問い合わせ部４１３は、情報管理部４２１から通知されたメモリサイズが、各メモリ５３０に確保されているメモリ領域のメモリサイズと一致するか否かを、配置パターンを考慮して判定する。問い合わせ部４１３は、メモリサイズが不一致の場合、メモリサイズの変更をデータサイズ変更部４１２に指示する。 The inquiry unit 413 determines whether or not the memory size notified from the information management unit 421 matches the memory size of the memory area reserved for each memory 530 in consideration of the arrangement pattern. When the memory sizes do not match, the inquiry unit 413 instructs the data size change unit 412 to change the memory size.

また、問い合わせ部４１３は、情報管理部４２１から通知された配置パターンが、メモリ領域への学習データの転送時に使用した配置パターンと一致するか否かを判定する。問い合わせ部４１３は、配置パターンが不一致の場合、情報管理部４２１から通知された配置パターンでメモリ領域を新たに確保する指示をデータサイズ変更部４１２に発行する。さらに、問い合わせ部４１３は、メモリサイズが不一致の場合、または、配置パターンが不一致の場合、新たに確保したメモリ領域に対象層の学習で使用する学習データを転送する指示を変換／転送指示部４１４に発行する。 Further, the inquiry unit 413 determines whether or not the arrangement pattern notified from the information management unit 421 matches the arrangement pattern used when the learning data was transferred to the memory area. When the arrangement patterns do not match, the inquiry unit 413 issues an instruction to the data size change unit 412 to newly secure a memory area with the arrangement pattern notified from the information management unit 421. Further, when the memory sizes do not match or the arrangement patterns do not match, the inquiry unit 413 issues, to the conversion/transfer instruction unit 414, an instruction to transfer the learning data used for learning the target layer to the newly secured memory area.

データサイズ変更部４１２は、問い合わせ部４１３からの指示に基づいて、情報管理部４２１から通知されたメモリサイズのメモリ領域を各メモリ５３０に確保する指示をランタイムライブラリ４３０に発行する。ランタイムライブラリ４３０は、データサイズ変更部４１２からの指示に基づいて、各メモリ５３０に新たなメモリ領域を確保し、完了通知をデータサイズ変更部４１２に発行する。 Based on the instruction from the inquiry unit 413, the data size change unit 412 issues an instruction to the runtime library 430 to secure a memory area of the memory size notified from the information management unit 421 in each memory 530. The runtime library 430 allocates a new memory area in each memory 530 based on the instruction from the data size change unit 412, and issues a completion notification to the data size change unit 412.

変換／転送指示部４１４は、問い合わせ部４１３からの指示に基づいて、対象層の学習で使用する学習データを新たに確保したメモリ領域に転送する指示をランタイムライブラリ４３０の変換／転送実行部４３１に発行する。変換／転送実行部４３１は、変換／転送指示部４１４からの指示に基づいて、ユーザ定義ファイル３１０から新たに確保したメモリ領域に学習データを転送し、完了通知を変換／転送指示部４１４に発行する。 Based on the instruction from the inquiry unit 413, the conversion/transfer instruction unit 414 issues, to the conversion/transfer execution unit 431 of the runtime library 430, an instruction to transfer the learning data used for learning the target layer to the newly secured memory area. Based on the instruction from the conversion/transfer instruction unit 414, the conversion/transfer execution unit 431 transfers the learning data from the user-defined file 310 to the newly secured memory area and issues a completion notification to the conversion/transfer instruction unit 414.
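The mismatch handling by the inquiry unit 413 described in the preceding paragraphs can be summarized as a small decision function. The sketch below is illustrative only: the function name `plan_actions` and the action labels are hypothetical, and the real units exchange instructions and completion notifications rather than returning a list.

```python
def plan_actions(current_size, current_layout, notified_size, notified_layout):
    """Return the (hypothetically named) instructions the inquiry unit would
    issue, given the state of the secured memory area and the memory size and
    arrangement pattern notified by the information management unit."""
    actions = []
    if current_size != notified_size:
        # Memory size mismatch: ask the data size change unit to resize.
        actions.append("change_memory_size")
    if current_layout != notified_layout:
        # Arrangement pattern mismatch: secure a new area with that pattern.
        actions.append("allocate_with_notified_layout")
    if actions:
        # Either mismatch requires re-transferring the learning data.
        actions.append("transfer_training_data")
    return actions


same = plan_actions(1024, "copy", 1024, "copy")
size_mismatch = plan_actions(512, "copy", 1024, "copy")
layout_mismatch = plan_actions(1024, "distributed", 1024, "copy")
```

When both the size and the arrangement pattern already match, the list is empty and the processing instruction can be executed on the existing memory area.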

対象層の学習に使用する学習データが配置パターンにしたがってメモリ５３０に保持されている場合、ＤＬフレームワーク４１０の実行指示部４１５は、処理命令を指定して、処理命令の実行をＤＮＮライブラリ４２０に指示する。ＤＮＮライブラリ４２０は、実行指示部４１５からの指示に基づいて、ランタイムライブラリ４３０の処理命令実行部４３２に処理命令の実行を指示する。そして、処理命令実行部４３２は、対象層の学習用の演算を実行させる演算コア５１０のメモリ５３０に処理命令を転送し、演算コア５１０に処理命令を実行させる。これにより、対象層の学習が実行される。 When the learning data used for learning the target layer is held in the memory 530 according to the arrangement pattern, the execution instruction unit 415 of the DL framework 410 specifies a processing instruction and instructs the DNN library 420 to execute it. Based on the instruction from the execution instruction unit 415, the DNN library 420 instructs the processing instruction execution unit 432 of the runtime library 430 to execute the processing instruction. Then, the processing instruction execution unit 432 transfers the processing instruction to the memory 530 of the arithmetic core 510 that is to execute the learning operation of the target layer, and causes the arithmetic core 510 to execute the processing instruction. As a result, learning of the target layer is executed.

図１３は、メモリ５３０に確保したメモリ領域のメモリサイズおよび配置パターンと、学習対象の層の学習で使用する処理命令に対応するメモリサイズおよび配置パターンとの関係を示す。図１１および図１２の演算処理予測部４２２は、図１３に示す関係に基づいて、メモリサイズおよび配置パターンを変更せずに処理命令を実行可能か判断し、演算効率がより高い演算方法を決定する。 FIG. 13 shows the relationship between the memory size and arrangement pattern of the memory area secured in the memory 530 and the memory size and arrangement pattern corresponding to the processing instruction used in the learning of the layer to be learned. Based on the relationship shown in FIG. 13, the arithmetic processing prediction unit 422 of FIGS. 11 and 12 determines whether the processing instruction can be executed without changing the memory size and the arrangement pattern, and decides on the calculation method with the higher calculation efficiency.

対象層の学習で使用する処理命令は、ＤＬフレームワーク４１０からの問い合わせに基づいて、ＤＮＮライブラリ４２０の情報管理部４２１が選択した処理命令である。メモリ５３０に確保したメモリ領域は、ユーザ定義ファイル３１０に格納されたニューラルネットワークの仕様と学習データとに基づいて、配置パターンを考慮して算出されたメモリサイズを有する。なお、対象層が２層目以降の場合、メモリ領域のメモリサイズと配置パターンとは、前層の学習により得られた出力データを保持するメモリ領域のメモリサイズと配置パターンでもよい。 The processing instruction used in the learning of the target layer is a processing instruction selected by the information management unit 421 of the DNN library 420 based on the inquiry from the DL framework 410. The memory area secured in the memory 530 has a memory size calculated in consideration of the arrangement pattern based on the specifications of the neural network and the learning data stored in the user-defined file 310. When the target layer is the second layer or later, the memory size and arrangement pattern of the memory area may be the memory size and arrangement pattern of the memory area that holds the output data obtained by learning the previous layer.

メモリサイズおよび配置パターンの両方が、既に確保されているメモリ領域と、処理命令に対応するメモリサイズおよび配置パターンとで一致する場合、メモリサイズおよび配置パターンを変更することなく演算が可能である（演算可能なパターン１、２）。 When both the memory size and the arrangement pattern of the already secured memory area match the memory size and the arrangement pattern corresponding to the processing instruction, the operation can be performed without changing the memory size or the arrangement pattern (computable patterns 1 and 2).

メモリ領域のメモリサイズが処理命令に対応するメモリサイズより大きい場合、処理命令を複数回実行することで、確保したメモリ領域のメモリサイズを変更することなく演算の実行が可能である（演算可能なパターン４、５、６）。但し、処理命令を複数回実行するため、演算時間は、処理命令の実行回数に依存して増加する。この際、メモリ領域のうち、学習に使用する有効なデータを保持していない領域のデータは、意味のないデータであるため、演算結果である出力データは次の学習に使用されない。 When the memory size of the memory area is larger than the memory size corresponding to the processing instruction, the operation can be executed without changing the memory size of the secured memory area by executing the processing instruction multiple times (calculation is possible). Patterns 4, 5, 6). However, since the processing instruction is executed a plurality of times, the calculation time increases depending on the number of times the processing instruction is executed. At this time, since the data in the memory area that does not hold the valid data used for learning is meaningless data, the output data that is the calculation result is not used for the next learning.

なお、メモリ領域のメモリサイズが処理命令に対応するメモリサイズより大きい場合、メモリ領域のメモリサイズを処理命令に対応するメモリサイズに合わせて再確保し、再確保したメモリ領域にデータを転送した後、演算が実行されてもよい。但し、この場合、メモリ領域を再確保する時間と、再確保したメモリ領域にデータを転送する時間が、処理命令の実行時間とは別に掛かる。 If the memory size of the memory area is larger than the memory size corresponding to the processing instruction, the memory area may instead be reallocated to match the memory size corresponding to the processing instruction, and the operation may be executed after the data is transferred to the reallocated memory area. In this case, however, the time to reallocate the memory area and the time to transfer the data to the reallocated memory area are incurred in addition to the execution time of the processing instruction.

メモリ領域の配置パターンがコピー型で、処理命令で指定される配置パターンが分散型の場合、分散型での処理命令の演算を複数回実行することで、配置パターンを変更することなく演算の実行が可能である（演算可能なパターン３、６）。例えば、図７に示したように、４つの演算コア５１０で４つのデータＤ１−Ｄ４を演算する場合、コピー型で配置したデータを分散型の処理命令で実行する場合、演算時間は４倍掛かる。 When the arrangement pattern of the memory area is the copy type and the arrangement pattern specified by the processing instruction is the distributed type, the operation can be executed without changing the arrangement pattern by executing the distributed-type processing instruction multiple times (computable patterns 3 and 6). For example, as shown in FIG. 7, when four data D1-D4 are processed by four calculation cores 510, executing a distributed-type processing instruction on data arranged in the copy type takes four times the calculation time.

既に確保されているメモリ領域と、処理命令で指定される配置パターンとが互いに異なる場合、処理命令で指定される配置パターンに合わせたメモリサイズのメモリ領域を再確保し、再確保したメモリ領域にデータを転送した後、演算が実行されてもよい。但し、この場合、メモリ領域を再確保する時間と、再確保したメモリ領域にデータを転送する時間が、処理命令の実行時間とは別に掛かる。 If the arrangement pattern of the already secured memory area differs from the arrangement pattern specified by the processing instruction, a memory area whose memory size matches the arrangement pattern specified by the processing instruction may be reallocated, and the operation may be executed after the data is transferred to the reallocated memory area. In this case, however, the time to reallocate the memory area and the time to transfer the data to the reallocated memory area are incurred in addition to the execution time of the processing instruction.

なお、既に確保されているメモリ領域のメモリサイズが処理命令に対応するメモリサイズより小さい場合、処理命令の実行によりエラーが発生するため、処理命令に対応するメモリサイズに合わせたメモリ領域が再確保される。また、既に確保されたメモリ領域が保持する学習データの配置パターンが分散型で、処理命令で指定される配置パターンがコピー型の場合、演算が正しく実行されないため、処理命令で指定される配置パターンに合わせたメモリサイズのメモリ領域が再確保される。 If the memory size of the already secured memory area is smaller than the memory size corresponding to the processing instruction, executing the processing instruction causes an error, so a memory area matching the memory size corresponding to the processing instruction is reallocated. Also, if the arrangement pattern of the learning data held in the already secured memory area is the distributed type and the arrangement pattern specified by the processing instruction is the copy type, the operation is not executed correctly, so a memory area whose memory size matches the arrangement pattern specified by the processing instruction is reallocated.
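The cases described for FIG. 13 can be condensed into a single classification function. The following is a sketch under stated assumptions: the function name `classify` and the three result labels are hypothetical, and the function only encodes the rules stated in the text (exact match, larger area, copy-type area with a distributed-type instruction, smaller area, distributed-type area with a copy-type instruction), not the exact pattern numbering of the figure.

```python
def classify(area_size, area_layout, instr_size, instr_layout):
    """Classify how a processing instruction can be run on a secured area.

    Returns one of three hypothetical labels:
      "run_as_is"      - size and arrangement pattern both match
      "run_repeatedly" - executable without changes by running the
                         instruction multiple times
      "reallocate"     - the area must be reallocated before execution
    """
    if area_size < instr_size:
        # A smaller area would cause an error when the instruction runs.
        return "reallocate"
    if area_layout == "distributed" and instr_layout == "copy":
        # Each core lacks data, so the operation would not run correctly.
        return "reallocate"
    if area_size == instr_size and area_layout == instr_layout:
        return "run_as_is"
    # Larger area, or copy-type data with a distributed-type instruction.
    return "run_repeatedly"


exact = classify(1024, "copy", 1024, "copy")
larger = classify(2048, "distributed", 1024, "distributed")
copy_to_distributed = classify(1024, "copy", 1024, "distributed")
too_small = classify(512, "copy", 1024, "copy")
distributed_to_copy = classify(2048, "distributed", 1024, "copy")
```

Only the "run_repeatedly" outcome leaves a real choice between keeping the area and reallocating it, which is where the efficiency comparison comes in.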

以上より、演算効率を向上するため、演算可能なパターン３−６については、処理命令を複数回実行する場合の演算時間と、メモリ領域を再確保してデータを転送した後、処理命令を実行する場合の演算時間とのどちらが早いかが判定される。例えば、判定には式（１）に示す条件が使用される。 From the above, in order to improve the calculation efficiency, for computable patterns 3 to 6 it is determined which is shorter: the calculation time when the processing instruction is executed multiple times, or the calculation time when the processing instruction is executed after the memory area is reallocated and the data is transferred. For example, the condition shown in equation (1) is used for the determination.
（メモリサイズを変更しない場合の演算時間）＜（メモリサイズの変更とデータの転送とに掛かる時間）＋（演算時間） ‥（１） (Calculation time when the memory size is not changed) < (time required to change the memory size and transfer the data) + (calculation time) ... (1)
例えば、演算処理予測部４２２は、式（１）に示す条件を満足する場合、メモリサイズおよび配置パターンを変更せずに演算を実行するほうが、メモリサイズおよび配置パターンを変更して演算を実行するよりも演算効率が高いと判断する。また、演算処理予測部４２２は、式（１）に示す条件を満足しない場合、メモリサイズおよび配置パターンを変更して演算を実行するほうが、メモリサイズおよび配置パターンを変更せずに演算を実行するよりも演算効率が高いと判断する。 For example, when the condition shown in equation (1) is satisfied, the arithmetic processing prediction unit 422 determines that executing the operation without changing the memory size and the arrangement pattern has higher calculation efficiency than executing it after changing them. When the condition shown in equation (1) is not satisfied, the arithmetic processing prediction unit 422 determines that executing the operation after changing the memory size and the arrangement pattern has higher calculation efficiency than executing it without changing them.
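As a worked illustration of equation (1), the comparison can be written as a short function. The time values are hypothetical estimates in arbitrary units, the function name `keep_current_area` is made up for this example, and the sketch assumes that the instruction runs exactly once after reallocation while the unchanged area needs `repeats` executions.

```python
def keep_current_area(t_exec_once, repeats, t_realloc, t_transfer):
    """Evaluate the condition of equation (1).

    Left side : time to run the instruction `repeats` times on the
                unchanged memory area.
    Right side: time to reallocate the area and transfer the data, plus
                one execution of the instruction (an assumption here).
    Returns True when keeping the current area is judged more efficient.
    """
    t_keep = t_exec_once * repeats
    t_change = t_realloc + t_transfer + t_exec_once
    return t_keep < t_change


# Copy-type data run with a distributed-type instruction on four cores
# takes four executions: 4.0 < 0.5 + 1.0 + 1.0 does not hold.
four_runs = keep_current_area(t_exec_once=1.0, repeats=4,
                              t_realloc=0.5, t_transfer=1.0)

# With only two executions needed: 2.0 < 2.5 holds.
two_runs = keep_current_area(t_exec_once=1.0, repeats=2,
                             t_realloc=0.5, t_transfer=1.0)
```

In the first case the sketch favors reallocation and transfer; in the second it favors executing the instruction repeatedly on the existing area.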

図１４は、図１１の情報処理装置１００Ｂがニューラルネットワークの学習を実行する場合の動作の一例を示す。図１４に示す動作は、ホストＣＰＵ２００がホストプログラム４００Ｂを実行することにより実現される。すなわち、図１４は、情報処理プログラムおよび情報処理方法の一例を示す。図８と同様の動作および図１２で説明した動作については、詳細な説明は省略する。 FIG. 14 shows an example of the operation when the information processing apparatus 100B of FIG. 11 executes the learning of the neural network. The operation shown in FIG. 14 is realized by the host CPU 200 executing the host program 400B. That is, FIG. 14 shows an example of an information processing program and an information processing method. Detailed description of the same operation as in FIG. 8 and the operation described in FIG. 12 will be omitted.

図１４では、図８のステップＳ１６とステップＳ２０の間にステップＳ１７およびステップＳ１８が挿入される。ステップＳ１０、Ｓ１２、Ｓ１４、Ｓ１６、Ｓ２０、Ｓ２２、Ｓ２４、Ｓ２６、Ｓ２８の動作は、図８と同様である。 In FIG. 14, steps S17 and S18 are inserted between steps S16 and S20 of FIG. The operations of steps S10, S12, S14, S16, S20, S22, S24, S26, and S28 are the same as those in FIG.

ステップＳ１７において、演算処理予測部４２２は、ステップＳ１６で選択された処理命令に対応するメモリサイズおよび配置パターンと、ステップＳ１２で確保されたメモリサイズおよび配置パターンとを比較する。そして、演算処理予測部４２２は、メモリサイズおよび配置パターンが図１３に示した演算可能なパターンであるか否かを判定する。メモリサイズおよび配置パターンを変更せずに演算可能な場合、処理はステップＳ１８に移行される。メモリサイズおよび配置パターンを変更せずに演算可能でない場合、演算処理予測部４２２は、メモリ領域を再確保して再確保したメモリ領域に学習データを転送するほうが、演算効率が高い旨を問い合わせ部４１３に通知する。そして、処理はステップＳ２０に移行される。 In step S17, the arithmetic processing prediction unit 422 compares the memory size and arrangement pattern corresponding to the processing instruction selected in step S16 with the memory size and arrangement pattern secured in step S12. Then, the arithmetic processing prediction unit 422 determines whether or not the memory size and the arrangement pattern correspond to one of the computable patterns shown in FIG. 13. If the operation can be performed without changing the memory size and the arrangement pattern, the process proceeds to step S18. If it cannot, the arithmetic processing prediction unit 422 notifies the inquiry unit 413 that reallocating the memory area and transferring the learning data to the reallocated memory area gives higher calculation efficiency. Then, the process proceeds to step S20.

ステップＳ１８において、演算処理予測部４２２は、メモリ領域のメモリサイズと配置パターンを変更する場合と変更しない場合との演算効率を、例えば、式（１）の条件を用いて比較する。演算処理予測部４２２は、式（１）の条件を満足する場合、メモリサイズ（配置パターン）を変更しないほうが、演算効率が高いと判断し、処理をステップＳ２８に移行する。演算処理予測部４２２は、式（１）の条件を満足しない場合、メモリサイズ（配置パターン）を変更するほうが、演算効率が高いと判断し、処理をステップＳ２０に移行する。 In step S18, the arithmetic processing prediction unit 422 compares the calculation efficiency with and without changing the memory size and arrangement pattern of the memory area, for example, using the condition of equation (1). When the condition of equation (1) is satisfied, the arithmetic processing prediction unit 422 determines that the calculation efficiency is higher if the memory size (arrangement pattern) is not changed, and moves the process to step S28. When the condition of equation (1) is not satisfied, the arithmetic processing prediction unit 422 determines that the calculation efficiency is higher if the memory size (arrangement pattern) is changed, and moves the process to step S20.
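The branching of steps S17 and S18 can be sketched as one function that returns the label of the next step. The step labels S20 and S28 are taken from the description above; the function name `next_step` and its parameters (a precomputed feasibility flag and two estimated times) are assumptions made for this illustration.

```python
def next_step(computable_without_change, t_keep, t_change):
    """Return the step that follows steps S17 and S18.

    computable_without_change: result of the step S17 feasibility check.
    t_keep  : estimated time when the memory area is left unchanged.
    t_change: estimated time for reallocation, transfer and execution.
    """
    # Step S17: if the instruction cannot run on the current area,
    # go to step S20 without comparing efficiency.
    if not computable_without_change:
        return "S20"
    # Step S18: apply the condition of equation (1).
    return "S28" if t_keep < t_change else "S20"


not_computable = next_step(False, t_keep=1.0, t_change=5.0)
keep_is_faster = next_step(True, t_keep=2.0, t_change=2.5)
change_is_faster = next_step(True, t_keep=4.0, t_change=2.5)
```

Note that a favorable time estimate for the unchanged area is ignored when step S17 finds that the instruction cannot run on it.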

このように、メモリサイズまたは配置パターンが一致しない場合にも、メモリサイズを変更せずに演算を実行でき、かつ、メモリサイズを変更するよりも演算効率が高い場合、メモリサイズを変更せずに演算を実行することで、学習時間を短縮することができる。また、式（１）を用いて、演算効率が高い演算方法を容易に判定することができる。 In this way, even if the memory size or arrangement pattern does not match, if the calculation can be executed without changing the memory size and the calculation efficiency is higher than changing the memory size, the memory size is not changed. By executing the calculation, the learning time can be shortened. Moreover, the calculation method having high calculation efficiency can be easily determined by using the equation (1).

以上、図１１から図１４に示す実施形態においても、図１から図１０に示す実施形態と同様の効果を得ることができる。例えば、学習データを保持するメモリ領域のメモリサイズと、処理命令に対応して算出されるメモリサイズとが不一致の場合、処理命令に合わせて新たなメモリ領域を確保することで、プロセッサ５００による対象層の学習を正常に実行することができる。また、学習データと処理命令との配置パターンが不一致の場合、処理命令の配置パターンに合わせて新たなメモリ領域を確保することで、プロセッサ５００による対象層の学習を正常に実行することができる。これにより、任意のバッチサイズが指定された場合にも、エラーを発生させることなく学習を実行することができる。 As described above, the embodiments shown in FIGS. 11 to 14 can also obtain the same effects as the embodiments shown in FIGS. 1 to 10. For example, when the memory size of the memory area that holds the training data does not match the memory size calculated for the processing instruction, securing a new memory area in accordance with the processing instruction allows the processor 500 to execute the learning of the target layer normally. Further, when the arrangement patterns of the learning data and the processing instruction do not match, securing a new memory area in accordance with the arrangement pattern of the processing instruction allows the processor 500 to execute the learning of the target layer normally. As a result, even if an arbitrary batch size is specified, learning can be executed without causing an error.

さらに、図１１から図１４に示す実施形態では、予め確保されたメモリ領域のメモリサイズと、処理命令に対応するメモリサイズとが異なる場合、メモリサイズを変更せずに演算可能を判定する。そして、メモリサイズを変更せずに演算可能な場合、演算効率が高い演算方法を判定し、判定結果に基づいて演算を実行する。また、予め確保されたメモリ領域が保持する学習データの配置パターンと、処理命令で指定される配置パターンとが異なる場合、メモリサイズおよび配置パターンを変更せずに演算可能を判定する。そして、メモリサイズおよび配置パターンを変更せずに演算可能な場合、演算効率が高い演算方法を判定し、判定結果に基づいて演算を実行する。これにより、演算効率が高い演算方法で学習を実行することができ、ニューラルネットワークの学習時間を短縮することができる。 Further, in the embodiments shown in FIGS. 11 to 14, when the memory size of the memory area secured in advance differs from the memory size corresponding to the processing instruction, it is determined whether the operation can be performed without changing the memory size. If it can, the calculation method with the higher calculation efficiency is determined, and the operation is executed based on the determination result. Likewise, when the arrangement pattern of the learning data held in the memory area secured in advance differs from the arrangement pattern specified by the processing instruction, it is determined whether the operation can be performed without changing the memory size and the arrangement pattern. If it can, the calculation method with the higher calculation efficiency is determined, and the operation is executed based on the determination result. As a result, learning can be executed by a calculation method with high calculation efficiency, and the learning time of the neural network can be shortened.

図１５は、別の実施形態における情報処理装置において、処理命令を実行する一例を示す。図１５に示す情報処理装置１００Ｃの構成は、図１１に示す情報処理装置１００Ｂの構成と同様である。 FIG. 15 shows an example of executing a processing instruction in the information processing apparatus according to another embodiment. The configuration of the information processing device 100C shown in FIG. 15 is the same as the configuration of the information processing device 100B shown in FIG.

図１５では、ユーザ定義ファイル３１０に格納されるニューラルネットワークの規模と学習データ量とが、図１１に示す情報処理装置１００Ｂにより学習するニューラルネットワークの規模と学習データ量よりもいずれも小さい。このため、１つの演算コア５１０によりニューラルネットワークの学習が実行可能である。この場合、複数の演算コア５１０による並列演算を実行しないことで、複数の演算コア５１０を動作させる場合に比べて、プロセッサ５００の消費電力を削減することができる。 In FIG. 15, the scale of the neural network and the amount of training data stored in the user-defined file 310 are both smaller than the scale of the neural network and the amount of training data learned by the information processing apparatus 100B shown in FIG. Therefore, the learning of the neural network can be executed by one arithmetic core 510. In this case, by not executing the parallel calculation by the plurality of calculation cores 510, the power consumption of the processor 500 can be reduced as compared with the case where the plurality of calculation cores 510 are operated.

Training of the neural network by the information processing apparatus 100C is executed in the same manner as in FIG. 14. However, when training is executed by a single arithmetic core 510, all the data used for training is stored in one memory 530, so the arrangement pattern need not be considered. Therefore, in the information processing apparatus 100C, the processes of steps S24 and S26 in FIG. 14 are omitted, and in step S17 in FIG. 14, the calculation is determined to be possible when the memory size of the memory area is equal to or larger than the memory size corresponding to the processing instruction.
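
As a rough sketch of the single-core case, the check reduces to a size comparison because the arrangement pattern is not considered. The function name and the returned labels are hypothetical. The mapping of the "smaller" case to reallocation is inferred from the rule that a mismatching memory area is replaced by a newly secured one.

```python
def plan_single_core(area_bytes: int, required_bytes: int) -> str:
    """Classify the existing memory area for a single arithmetic core.

    Labels are illustrative assumptions, not terms from the source.
    """
    if area_bytes == required_bytes:
        # Sizes match: the existing area can be used as-is.
        return "use_existing"
    if area_bytes > required_bytes:
        # Larger than required: calculation is possible without resizing,
        # so the efficiency comparison decides which method to use.
        return "computable_in_place"
    # Smaller than required: a new area of the required size is needed.
    return "reallocate"
```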

Note that the information processing apparatus 100 shown in FIG. 1 or the information processing apparatus 100A shown in FIG. 3 may also train the neural network with a single arithmetic core 510, as in the information processing apparatus 100C. In this case, the information processing apparatuses 100 and 100A execute the operation shown in FIG. 2 or the operation shown in FIG. 8 with the single arithmetic core 510 as the calculation target. However, when training is executed by a single arithmetic core 510, the arrangement pattern need not be considered, so the processes of steps S24 and S26 in FIG. 8 are omitted.

As described above, this embodiment can also obtain the same effects as the embodiments shown in FIGS. 1 to 14.

FIG. 16 shows an example of executing a processing instruction in an information processing apparatus according to still another embodiment. The configuration of the information processing apparatus 100D shown in FIG. 16 is the same as that of the information processing apparatus 100B shown in FIG. 11.

In FIG. 16, training of a plurality of neural networks whose configuration information is stored in a plurality of user-defined files 310 (1, 2, 3) is executed in parallel. That is, the scale of the neural network and the amount of training data stored in each user-defined file 310 are both smaller than those of the neural network trained by the information processing apparatus 100B shown in FIG. 11.

The neural network of user-defined file 1 is trained by two arithmetic cores 510 in the same manner as in FIG. 14. Each of the neural networks of user-defined files 2 and 3 is trained by one arithmetic core 510 in the same manner as in the embodiment shown in FIG. 15. For example, the host program 400B (FIG. 11), which includes the DL framework 410, the DNN library 420, and the runtime library 430, is provided for each user.
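
One way to picture the parallel execution is as a partition of arithmetic cores among the user-defined files. The sketch below is only an illustration under assumptions: the function name, the use of disjoint core index ranges, and the total of four cores in the example are not stated in the source.

```python
def assign_cores(requests: dict[str, int], total_cores: int) -> dict[str, list[int]]:
    """Give each job a disjoint, contiguous range of core indices.

    `requests` maps a job name to the number of cores it uses.
    """
    if sum(requests.values()) > total_cores:
        raise ValueError("not enough arithmetic cores for all jobs")
    assignment: dict[str, list[int]] = {}
    next_core = 0
    for job, count in requests.items():
        assignment[job] = list(range(next_core, next_core + count))
        next_core += count
    return assignment


# Two cores for user-defined file 1, one core each for files 2 and 3.
plan = assign_cores({"file1": 2, "file2": 1, "file3": 1}, total_cores=4)
```

A job assigned a single core would then follow the single-core procedure, while a job with two cores would also take the arrangement pattern into account.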

Note that the information processing apparatus 100 shown in FIG. 1 or the information processing apparatus 100A shown in FIG. 3 may also train the neural networks of a plurality of users in parallel, as in the information processing apparatus 100D. In this case, the information processing apparatuses 100 and 100A execute the operation shown in FIG. 2 or the operation shown in FIG. 8 for each user. However, when training is executed by a single arithmetic core 510, the arrangement pattern need not be considered, so the processes of steps S24 and S26 in FIG. 8 are omitted.

As described above, this embodiment can also obtain the same effects as the embodiments shown in FIGS. 1 to 14.

The following additional notes are further disclosed with respect to the embodiments shown in FIGS. 1 to 16 described above.
(Appendix 1)
An information processing program that controls training of a neural network including a plurality of layers, the training being executed by a processor having one or more calculation units each including an arithmetic unit and a memory that holds data used by the arithmetic unit, wherein:
Based on the size of the training data used for training the layer to be trained and the input batch size, a processing instruction to be executed by the processor is selected from among a plurality of processing instructions that process different batch sizes.
Based on the size of the training data and the batch size specified by the selected processing instruction, the memory size used for training the layer to be trained is calculated.
When the memory size of a first memory area of the memory that holds the training data does not match the calculated memory size, a second memory area having the calculated memory size is secured in the memory, the training data is transferred to the second memory area, and the processor is caused to execute the selected processing instruction using the training data transferred to the second memory area. The information processing program causes an information processing apparatus to execute this processing.
(Appendix 2)
The information processing program according to Appendix 1, wherein the arrangement patterns for arranging the training data in the memory include a distributed type, in which the training data is distributed across the memories of a plurality of the calculation units, and a duplicate type, in which the training data is duplicated in the memories of the plurality of calculation units, and
when the arrangement pattern of the training data held in the first memory area does not match the arrangement pattern used by the selected processing instruction, a third memory area is secured in the memory, the training data is transferred to the third memory area in the arrangement pattern used by the selected processing instruction, and the processor is caused to execute the selected processing instruction using the training data transferred to the third memory area.
(Appendix 3)
The information processing program according to Appendix 1 or Appendix 2, wherein, when the memory size of the first memory area matches the calculated memory size, the processor is caused to execute the selected processing instruction using the training data held in the first memory area.
(Appendix 4)
The information processing program according to Appendix 1 or Appendix 2, wherein, when the arrangement pattern of the training data held in the first memory area matches the arrangement pattern used by the selected processing instruction and the memory size of the first memory area matches the calculated memory size, the processor is caused to execute the selected processing instruction using the training data held in the first memory area.
(Appendix 5)
The information processing program according to any one of Appendix 1 to Appendix 4, wherein, when the memory size of the first memory area is larger than the calculated memory size, the calculation efficiency is calculated both for the case where the selected processing instruction is executed using the training data held in the first memory area and for the case where the second memory area is secured, the training data is transferred, and the selected processing instruction is executed, and
the processor is caused to execute the selected processing instruction in whichever case has the higher calculation efficiency.
(Appendix 6)
The information processing program according to Appendix 2, wherein, when the training data is held in the first memory area in the duplicate type and the data arrangement pattern of the selected processing instruction is the distributed type, the calculation efficiency is calculated both for the case where the selected processing instruction is executed using the training data held in the first memory area and for the case where the third memory area is secured, the training data is transferred in the distributed type, and the selected processing instruction is executed, and
the processor is caused to execute the selected processing instruction in whichever case has the higher calculation efficiency.
(Appendix 7)
The information processing program according to Appendix 5 or Appendix 6, wherein the calculation efficiency is indicated by the calculation time when the selected processing instruction is executed using the training data held in the first memory area, or by the sum of the time taken to secure a new memory area and transfer the training data and the calculation time when the selected processing instruction is executed using the training data held in the newly secured memory area.
(Appendix 8)
The information processing program according to any one of Appendix 1 to Appendix 7, wherein a match or mismatch of the memory size is determined for each of the input tensor data input to the layer to be trained and the output tensor data output from the layer to be trained.
(Appendix 9)
The information processing program according to any one of Appendix 2 to Appendix 8, wherein a match or mismatch of the arrangement pattern is determined for each of the input tensor data input to the layer to be trained and the output tensor data output from the layer to be trained.
(Appendix 10)
The information processing program according to any one of Appendix 1 to Appendix 9, wherein a calculation method used for training the layer to be trained is selected based on the configuration information of the neural network, and
the processing instruction to be executed by the processor is selected from among the plurality of processing instructions corresponding to the selected calculation method.
(Appendix 11)
An information processing method that controls training of a neural network including a plurality of layers, the training being executed by a processor having one or more calculation units each including an arithmetic unit and a memory that holds data used by the arithmetic unit, wherein:
Based on the size of the training data used for training the layer to be trained and the input batch size, a processing instruction to be executed by the processor is selected from among a plurality of processing instructions that process different batch sizes.
Based on the size of the training data and the batch size specified by the selected processing instruction, the memory size used for training the layer to be trained is calculated.
When the memory size of a first memory area of the memory that holds the training data does not match the calculated memory size, a second memory area having the calculated memory size is secured in the memory, the training data is transferred to the second memory area, and the processor is caused to execute the selected processing instruction using the training data transferred to the second memory area. An information processing apparatus executes this processing.
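
The flow shared by Appendix 1 and Appendix 11 (select an instruction, calculate the memory size, compare it with the existing area) can be sketched as below. This is an illustration under stated assumptions rather than the disclosed implementation: the selection rule (smallest instruction batch size that covers the requested batch size), the memory formula (bytes per sample multiplied by the instruction's batch size), and all names are hypothetical. The source also takes the training data size into account when selecting the instruction, which this simplified rule omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingInstruction:
    name: str
    batch_size: int  # batch size this instruction processes


def select_instruction(instructions, requested_batch):
    """Assumed rule: pick the smallest batch size that covers the request."""
    candidates = [i for i in instructions if i.batch_size >= requested_batch]
    if not candidates:
        raise ValueError("no processing instruction covers the requested batch size")
    return min(candidates, key=lambda i: i.batch_size)


def required_memory(sample_bytes, instruction):
    """Assumed formula: bytes per sample times the instruction's batch size."""
    return sample_bytes * instruction.batch_size


def prepare(instructions, sample_bytes, requested_batch, first_area_bytes):
    """Return (instruction, action, size in bytes of the area to use)."""
    instr = select_instruction(instructions, requested_batch)
    needed = required_memory(sample_bytes, instr)
    if first_area_bytes == needed:
        # Sizes match: run on the first memory area as-is.
        return instr, "use_first_area", first_area_bytes
    # Sizes differ: secure a second area of the calculated size and transfer.
    return instr, "secure_second_area", needed


instructions = [
    ProcessingInstruction("batch8", 8),
    ProcessingInstruction("batch16", 16),
    ProcessingInstruction("batch32", 32),
]
```

With these hypothetical instructions, a requested batch size of 10 and 100 bytes per sample select `batch16` and call for a 1600-byte area, so a first memory area of 1000 bytes would lead to securing a second area.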

The above detailed description will make the features and advantages of the embodiments clear. The claims are intended to extend to the features and advantages of the embodiments described above without departing from their spirit and scope. In addition, a person with ordinary knowledge in the art should be able to readily conceive of any improvements and changes. Therefore, there is no intention to limit the scope of the inventive embodiments to those described above, and suitable improvements and equivalents included in the scope disclosed in the embodiments may also be relied upon.

100, 100A, 100B, 100C, 100D Information processing apparatus
200 Host CPU
220 I/O controller
300 Host memory
310 User-defined file
311 Neural network
312 Training data
400, 400A, 400B Host program
410 DL framework
411 Calculation type determination unit
412 Data size change unit
413 Inquiry unit
414 Conversion/transfer instruction unit
415 Execution instruction unit
420 DNN library
421 Information management unit
422 Calculation processing prediction unit
423 Processing instruction
430 Runtime library
431 Conversion/transfer execution unit
432 Processing instruction execution unit
500 Processor
510 Arithmetic core
520 Arithmetic unit
530 Memory

Claims

An information processing program that controls training of a neural network including a plurality of layers, the training being executed by a processor having one or more calculation units each including an arithmetic unit and a memory that holds data used by the arithmetic unit, wherein:
Based on the size of the training data used for training the layer to be trained and the input batch size, a processing instruction to be executed by the processor is selected from among a plurality of processing instructions that process different batch sizes.
Based on the size of the training data and the batch size specified by the selected processing instruction, the memory size used for training the layer to be trained is calculated.
When the memory size of a first memory area of the memory that holds the training data does not match the calculated memory size, a second memory area having the calculated memory size is secured in the memory, the training data is transferred to the second memory area, and the processor is caused to execute the selected processing instruction using the training data transferred to the second memory area. The information processing program causes an information processing apparatus to execute this processing.

The information processing program according to claim 1, wherein the arrangement patterns for arranging the training data in the memory include a distributed type, in which the training data is distributed across the memories of a plurality of the calculation units, and a duplicate type, in which the training data is duplicated in the memories of the plurality of calculation units, and
when the arrangement pattern of the training data held in the first memory area does not match the arrangement pattern used by the selected processing instruction, a third memory area is secured in the memory, the training data is transferred to the third memory area in the arrangement pattern used by the selected processing instruction, and the processor is caused to execute the selected processing instruction using the training data transferred to the third memory area.

The information processing program according to claim 1 or 2, wherein, when the memory size of the first memory area matches the calculated memory size, the processor is caused to execute the selected processing instruction using the training data held in the first memory area.

The information processing program according to claim 1 or 2, wherein, when the arrangement pattern of the training data held in the first memory area matches the arrangement pattern used by the selected processing instruction and the memory size of the first memory area matches the calculated memory size, the processor is caused to execute the selected processing instruction using the training data held in the first memory area.

The information processing program according to any one of claims 1 to 4, wherein, when the memory size of the first memory area is larger than the calculated memory size, the calculation efficiency is calculated both for the case where the selected processing instruction is executed using the training data held in the first memory area and for the case where the second memory area is secured, the training data is transferred, and the selected processing instruction is executed, and
the processor is caused to execute the selected processing instruction in whichever case has the higher calculation efficiency.

The information processing program according to claim 2, wherein, when the training data is held in the first memory area in the duplicate type and the data arrangement pattern of the selected processing instruction is the distributed type, the calculation efficiency is calculated both for the case where the selected processing instruction is executed using the training data held in the first memory area and for the case where the third memory area is secured, the training data is transferred in the distributed type, and the selected processing instruction is executed, and
the processor is caused to execute the selected processing instruction in whichever case has the higher calculation efficiency.

The information processing program according to claim 5 or 6, wherein the calculation efficiency is indicated by the calculation time when the selected processing instruction is executed using the training data held in the first memory area, or by the sum of the time taken to secure a new memory area and transfer the training data and the calculation time when the selected processing instruction is executed using the training data held in the newly secured memory area.

An information processing method that controls training of a neural network including a plurality of layers, the training being executed by a processor having one or more calculation units each including an arithmetic unit and a memory that holds data used by the arithmetic unit, wherein:
Based on the size of the training data used for training the layer to be trained and the input batch size, a processing instruction to be executed by the processor is selected from among a plurality of processing instructions that process different batch sizes.
Based on the size of the training data and the batch size specified by the selected processing instruction, the memory size used for training the layer to be trained is calculated.
When the memory size of a first memory area of the memory that holds the training data does not match the calculated memory size, a second memory area having the calculated memory size is secured in the memory, the training data is transferred to the second memory area, and the processor is caused to execute the selected processing instruction using the training data transferred to the second memory area. An information processing apparatus executes this processing.