JP2016066329A

JP2016066329A - Information processing device, information processing method, and computer program

Info

Publication number: JP2016066329A
Application number: JP2014196174A
Authority: JP
Inventors: 隆盛緒方; Takamori Ogata
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-09-26
Filing date: 2014-09-26
Publication date: 2016-04-28
Anticipated expiration: 2034-09-26
Also published as: JP5954385B2

Abstract

PROBLEM TO BE SOLVED: To efficiently arrange, in a storage device, the elements contained in a matrix having one or more blocks. SOLUTION: An information processing device 2000 includes a storage unit 2001 capable of storing at least data, and an arrangement unit 2002 that, for second matrices each of which is one of at least one block contained in a first matrix, arranges data representing the elements located at the same position in each second matrix in a continuous region of the storage unit 2001. SELECTED DRAWING: Figure 20

Description

The present invention relates to an information processing apparatus, an information processing method, and the like. In particular, the present invention relates to an information processing apparatus and the like capable of efficiently arranging, in a storage device, the data represented by the elements of a matrix.

In recent years, methods of arranging (storing) the element information of a large-scale matrix (the data represented by the elements of the matrix) in a storage device as an array have been used in fields such as numerical computation. Hereinafter, data representing a specific matrix (the data represented by all elements of that matrix) may be referred to simply as “matrix data”.

Known techniques of this kind include, for example, the JDS (Jagged Diagonal Storage) method (see Patent Document 1 below) and the CRS (Compressed Row Storage) method.

The JDS storage method is described here with reference to FIG. 1. FIG. 1 illustrates a procedure for storing the matrix data of a sparse matrix (reference numeral 100a in FIG. 1) in a storage device using the JDS storage method. In FIG. 1, row numbers (1 to 8) are assigned along the vertical direction (row direction) of the sparse matrix 100a, and column numbers (1 to 8) are assigned along the horizontal direction (column direction).

In the sparse matrix 100a illustrated in FIG. 1(a), a nonzero element is located at each position marked with “*”, and a zero element is located at each blank position.

First, as illustrated in FIG. 1(b), the JDS storage method packs only the nonzero elements of the sparse matrix 100a to the left (toward the left end in the column direction). The resulting left-packed sparse matrix is denoted 100b.

Next, as illustrated in FIG. 1(c), the JDS storage method reorders the rows of the left-packed sparse matrix 100b in descending order of the number of nonzero elements each row contains. The reordered sparse matrix is denoted 100c.

As illustrated in FIG. 1(d), the JDS storage method stores the column number of each nonzero element of the original sparse matrix 100a in a separate (two-dimensional) array 100d. By referring to the column numbers stored in the array 100d, it is therefore possible to determine in which column of the original sparse matrix 100a each element of a row with a given row number was located.

As a specific example, consider the row with row number “3” in the sparse matrix 100c. In the original sparse matrix 100a, the row with row number “3” has nonzero elements in the first, second, third, fifth, seventh, and eighth columns. Accordingly, the row of the array 100d corresponding to row number “3” holds “1, 2, 3, 5, 7, 8” as the data representing those column numbers.

The JDS storage method stores the matrix data of the matrix 100c and the array 100d one-dimensionally in the storage device, each traversed along the direction of the arrow DA1 illustrated in FIG. 1(e) (from top to bottom in the row direction), column by column starting from the first column (that is, in order from the first column to the eighth column).

When the product of a vector and a matrix stored by the JDS storage method is calculated, the matrix data of the matrix 100c and the array 100d storing the column numbers are referenced in the order of the arrow DA1 shown in FIG. 1(e) and used in the calculation of the product.
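The JDS steps described above (left-packing, reordering rows by nonzero count, and column-by-column one-dimensional storage) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of Patent Document 1: the function names `jds_build` and `jds_matvec`, the helper array `diag_start`, and the use of 0-based indices (the text uses 1-based numbers) are all hypothetical choices made for this sketch.

```python
def jds_build(dense):
    """Build a JDS representation of a dense matrix given as a list of rows.

    Returns (values, cols, diag_start, perm); all indices are 0-based
    (an assumption of this sketch, the patent text numbers from 1).
    """
    n = len(dense)
    # FIG. 1(b): left-pack the nonzeros of each row, remembering each
    # element's original column number (the role of array 100d).
    packed = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    # FIG. 1(c): order the rows by descending nonzero count
    # (sorted() is stable, so ties keep their original order).
    perm = sorted(range(n), key=lambda i: -len(packed[i]))
    max_len = len(packed[perm[0]]) if n else 0
    values, cols, diag_start = [], [], [0]
    # FIG. 1(e): store column by column, top to bottom within a column.
    for d in range(max_len):
        for i in perm:
            if d < len(packed[i]):
                j, v = packed[i][d]
                values.append(v)
                cols.append(j)
        diag_start.append(len(values))
    return values, cols, diag_start, perm


def jds_matvec(values, cols, diag_start, perm, x):
    """Compute y = A @ x for a matrix stored by jds_build."""
    y = [0.0] * len(perm)
    for d in range(len(diag_start) - 1):
        start, end = diag_start[d], diag_start[d + 1]
        # Rows are sorted by length, so the rows present in packed
        # column d are exactly the first (end - start) entries of perm.
        for k in range(start, end):
            y[perm[k - start]] += values[k] * x[cols[k]]
    return y
```

For example, `jds_build([[1, 0, 2], [0, 0, 3], [4, 5, 6]])` orders the rows as 2, 0, 1 and stores the values as `[4, 1, 3, 5, 2, 6]`. The inner loop of `jds_matvec` walks `values` and `cols` with unit stride, which is the access order indicated by the arrow DA1.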

The following patent documents disclose techniques related to such matrix operations.

As noted above, Patent Document 1 discloses a method for storing a large-scale irregular sparse matrix in a storage device of a computer using the JDS storage method.

Patent Document 2 (Japanese Unexamined Patent Application Publication No. 2008-070928) discloses a calculation method that, when the solution of a system of simultaneous linear equations is obtained numerically, performs control so that the area storing the coefficient matrix is kept in main memory without being paged out. In the triangular decomposition of the coefficient matrix based on the modified Cholesky decomposition method, the technique of Patent Document 2 keeps part of the decomposed coefficient values in main memory, within a range that does not exceed the free capacity of main memory, and keeps the remainder in secondary storage. To reduce the memory area needed to store the coefficient matrix, the technique compresses the coefficient matrix using the skyline method.

Patent Document 3 (Japanese Unexamined Patent Application Publication No. 2002-157237) discloses a calculation method for solving a system of simultaneous linear equations using an iterative method. When an approximate solution is calculated iteratively, the technique of Patent Document 3 applies preprocessing based on multilevel incomplete block decomposition to a coefficient matrix that is not an M-matrix, so as to accelerate convergence. The technique stores the coefficient matrix of the simultaneous linear equations compressed in the Ellpack format.

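Patent Document 1: Japanese Examined Patent Publication No. 06-066069
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2008-070928
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2002-157237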

In numerical computation, the matrix being operated on can sometimes be partitioned into one or more blocks. Hereinafter, a “matrix that can be partitioned into one or more blocks” may be referred to as a “matrix having one or more blocks”, and such a block may be referred to as a “submatrix”.

When a sparse matrix having one or more blocks (hereinafter sometimes called a “block sparse matrix”) is the subject of a computation, its matrix data must be arranged (stored) efficiently in a storage device. More specifically, the elements of the block sparse matrix (the data those elements represent) must be arranged (stored) efficiently in the storage device.

Furthermore, when operations are executed repeatedly block by block on a block sparse matrix, the storage device is accessed (referenced) block by block. For example, in numerical computations whose cost is dominated by the multiplication of a large-scale sparse matrix by a vector, operations may be executed repeatedly in units of the blocks contained in the block sparse matrix.

To improve the efficiency (processing speed and so on) of such computations, it is desirable that the matrix data of the block sparse matrix stored in the storage device can be accessed efficiently. In other words, a storage technique for block sparse matrices is needed that allows efficient access to the stored matrix data.

A storage technique for block sparse matrices is therefore required to achieve both high capacity efficiency and high computational efficiency.

The techniques disclosed in Patent Documents 2 and 3 are each specialized for solving a particular kind of simultaneous equations and are not directly related to making operations on block sparse matrices more efficient (faster). The same applies to the coefficient-matrix storage methods disclosed in those documents.

The present invention has been made in view of the above circumstances.

A main object of the present invention is to provide an information processing apparatus and the like capable of efficiently arranging (storing), in a storage device, the elements of a matrix having one or more blocks. More specifically, a main object of the present invention is to provide an information processing apparatus and the like capable of arranging such elements in a storage device using a matrix storage method with good capacity efficiency and computational efficiency.

To achieve the above object, an information processing apparatus according to one aspect of the present invention has the following configuration. Namely, the information processing apparatus includes a storage unit capable of storing at least data, and an arrangement unit that, for second matrices each of which is one of at least one block contained in a first matrix, arranges data representing the elements located at the same position in each of the second matrices in a continuous region of the storage unit.

An information processing method according to one aspect of the present invention has the following configuration. Namely, in the information processing method, an information processing apparatus including a storage unit capable of storing data arranges, for second matrices each of which is one of at least one block contained in a first matrix, data representing the elements located at the same position in each of the second matrices in a continuous region of the storage unit.

The same object is also achieved by a computer program that causes a computer to realize the information processing apparatus having the above configuration and the corresponding information processing method, and by a computer-readable storage medium or the like in which that computer program is stored.

According to the present invention, the elements of a matrix having one or more blocks can be arranged (stored) efficiently in a storage device. More specifically, the present invention makes it possible to provide an information processing apparatus and the like capable of arranging such elements in a storage device using a matrix storage method with good capacity efficiency and computational efficiency.

FIG. 1 is a diagram illustrating a procedure for storing matrix data using the JDS storage method.
FIG. 2 is a diagram showing a specific example of matrix data having one or more blocks.
FIG. 3 is a diagram (1/3) illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the JDS storage method.
FIG. 4 is a diagram (2/3) illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the JDS storage method.
FIG. 5 is a diagram (3/3) illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the JDS storage method.
FIG. 6 is a diagram showing a specific example of source code implementing the multiplication of an arbitrary vector by a matrix stored using the JDS storage method.
FIG. 7 is a diagram illustrating the functional configuration of the information processing apparatus according to the first embodiment of the present invention.
FIG. 8 is a diagram showing a specific example of matrix data having one or more blocks.
FIG. 9 is a diagram (1/3) illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the matrix storage method according to the first embodiment of the present invention.
FIG. 10 is a diagram (2/3) illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the matrix storage method according to the first embodiment of the present invention.
FIG. 11 is a diagram (3/3) illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the matrix storage method according to the first embodiment of the present invention.
FIG. 12 is a flowchart illustrating the operation of the information processing apparatus according to the first embodiment of the present invention.
FIG. 13 is a diagram showing a specific example of source code implementing the multiplication of an arbitrary vector by a matrix stored using the matrix storage method according to the first embodiment of the present invention.
FIG. 14 is a flowchart illustrating the process of multiplying an arbitrary vector by a matrix stored using the matrix storage method according to the first embodiment of the present invention.
FIG. 15 is a flowchart illustrating the process of multiplying an arbitrary vector by a matrix stored using the matrix storage method according to the first embodiment of the present invention, when the process is optimized for a specific computer architecture.
FIG. 16 is a diagram illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the CRS storage method.
FIG. 17 is a diagram illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the matrix storage method according to the first embodiment of the present invention.
FIG. 18 is a diagram illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the ELL (Ellpack) storage method.
FIG. 19 is a diagram illustrating a procedure for storing the matrix data of a matrix having one or more blocks using the matrix storage method according to the first embodiment of the present invention.
FIG. 20 is a diagram illustrating the functional configuration of the information processing apparatus according to the first embodiment of the present invention.
FIG. 21 is a block diagram illustrating a hardware configuration capable of realizing the information processing apparatus according to each embodiment of the present invention.

Next, embodiments of the present invention are described in detail with reference to the drawings. The configurations described in the following embodiments are merely examples, and the technical scope of the present invention is not limited to them.

The information processing apparatus described in each embodiment may be configured as a system in which one or more of its components are realized using a plurality of physically or logically separate devices (physical computers, virtual computers, and so on).

Alternatively, the information processing apparatus described in each embodiment may be configured as a system in which all of its components are realized using a single device (a physical computer, a virtual computer, or the like).

Before the embodiments are described, the technical background of the present invention is explained.

First, a configuration in which the JDS storage method is extended in a straightforward way to handle block sparse matrices is described with reference to FIGS. 2 to 4. Rather than holding the row and column numbers of every nonzero element of the sparse matrix as a reference table, the extended JDS storage method holds only the row numbers or column numbers of the block positions in the sparse matrix.

FIG. 2 shows a sparse matrix in which each nonzero element of the JDS storage method (each “*” mark in FIG. 1) is a block (submatrix). That is, the matrix 200 illustrated in FIG. 2 is a sparse matrix having (2 rows × 2 columns) blocks. As illustrated in FIG. 2, the matrix 200 is a (16 rows × 16 columns) square matrix.

In FIG. 2, row numbers are assigned block by block (block row numbers) along the vertical direction (row direction) of the sparse matrix 200, and column numbers are assigned block by block (block column numbers) along the horizontal direction (column direction) of the sparse matrix 200. In the sparse matrix 200 illustrated in FIG. 2, block row numbers 1 to 8 and block column numbers 1 to 8 are assigned.

The elements of a (2 rows × 2 columns) block are denoted by the symbols “*1”, “*2”, “*3”, and “*4”. These symbols merely indicate the position of an element within the block and do not represent specific data (values); any data (value) may be set in each element of a block. As in FIG. 1, the blank elements of the sparse matrix 200 are zero elements.

As in FIG. 1(b), packing each block of the sparse matrix 200 to the left (toward the left end in the column direction) yields the matrix 300 illustrated in FIG. 3. The matrix 300 may be regarded as a transformed version of the matrix 200.

As in FIG. 1(c), reordering the block rows of the matrix 300 illustrated in FIG. 3 in descending order of the number of nonzero blocks each block row contains yields the matrix 400 illustrated in FIG. 4. The matrix 400 may be regarded as a transformed version of the matrix 300 (or of the matrix 200). A nonzero block is a submatrix that is, at the least, not a zero matrix.

When the JDS storage method illustrated in FIG. 1 is extended in this straightforward way, the elements of each block in the matrix 400 are stored in the storage device in the arrangement illustrated in FIG. 5. That is, the elements within each block are stored in a continuous address region of the storage device.

As a specific example, the elements of the block at (block row number: 3, block column number: 1) are stored in a continuous address region of the storage device in the order “*1”, “*2”, “*3”, “*4”. In the address region immediately following the one holding that block, the elements of the block at (block row number: 1, block column number: 1) are stored in a continuous address region in the same manner.
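The block-contiguous arrangement described above can be pictured with a short Python sketch. It is only an illustration under assumptions: each block is given as a flat sequence of its elements in the order “*1” to “*4”, the blocks are supplied already in storage order, and the names `flatten_block_contiguous` and `element_offset` are hypothetical.

```python
def flatten_block_contiguous(blocks):
    """Concatenate blocks so that each block's elements are adjacent.

    `blocks` is a list of blocks in storage order; each block is a flat
    sequence of its elements in the order *1, *2, *3, *4.
    """
    flat = []
    for block in blocks:
        flat.extend(block)
    return flat


def element_offset(block_index, position, block_elems):
    """0-based offset of relative position `position` of block `block_index`."""
    return block_index * block_elems + position


# Two 2 x 2 blocks; "a" and "b" stand for two consecutive blocks.
blocks = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]
flat = flatten_block_contiguous(blocks)
```

Here `flat` is `["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]`: the four elements of one block sit next to each other, and the same relative position of the next block is found four elements later.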

For example, calculating the product of a block sparse matrix and a vector requires executing an operation loop over the blocks of the block sparse matrix. With the straightforward extension of the JDS storage method, moving from the operation on one block to the operation on the next references elements separated by the size of a block; that is, elements not located in a continuous region of the storage device are referenced. Compared with accessing a continuous region of the storage device, this causes a loss of calculation speed (computational efficiency). This is explained below using the concrete implementation example illustrated in FIG. 6.

FIG. 6 illustrates part of the source code of a Fortran program (computer program) that calculates the product of an arbitrary vector and a sparse matrix stored by the straightforward extension of the JDS storage method described above, for the case where the block size is 2 (that is, (2 rows × 2 columns) blocks). Fortran is a well-known programming language.

In the source code illustrated in FIG. 6, the variables and arrays represent the following.
・The variable “N” represents the number of blocks in the row or column direction of the original matrix (for example, the matrix 200 in FIG. 2).

・The variable “NZ” represents the total number of nonzero blocks contained in the original matrix.

・The array “A” stores the values of the nonzero elements of the original matrix. The subscript of the array “A” is the number (index) obtained when, for example, the original matrix is transformed into the matrix 500 illustrated in FIG. 5 and its elements are numbered one-dimensionally, column by column starting from the first column and from top to bottom in the row direction.

・The array “X” represents the vector (X) to be multiplied.

・The array “Y” stores the result of the product of the array “A” and the array “X”.

・The array “IA” stores the starting index of each block column when the original matrix A is transformed into the matrix 500 illustrated in FIG. 5.

・The array “JA” stores the block column numbers of the nonzero blocks contained in the original matrix. The block column number of the K-th block is JA(K).

・The variable “MJAD” represents the maximum number of nonzero blocks contained in any row of the original matrix.

・The array “W” is a work array that temporarily stores the result of the product of the array “A” and the array “X”.

・The array “IORD” stores the block row numbers obtained when the original matrix is reordered, for example, as illustrated in FIG. 4. That is, when the block row number in the original matrix is “I”, the block row number after reordering is IORD(I).

In the code illustrated in FIG. 6, each time the index K of DO loop 20, the innermost operation loop, increases by 1, the index KI of the array A storing the nonzero elements of the original matrix increases by 4 (= 2 × 2, where “×” denotes multiplication; the same applies below). Thus, when the block size is M (that is, (M rows × M columns) blocks), the operation loop processed block by block repeatedly references array addresses that are M × M elements apart. Because non-contiguous addresses in the storage device are referenced (accessed), calculation speed (computational efficiency) is lost compared with accessing contiguous addresses.
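The stride arithmetic in the preceding paragraph can be checked with a few lines of Python. This sketch does not reproduce the Fortran code of FIG. 6; the function names `same_position_indices` and `strides` and the 0-based indices are assumptions made for illustration.

```python
def same_position_indices(num_blocks, m, position):
    """Indices into a block-contiguous array that hold relative position
    `position` of each of `num_blocks` consecutive (m x m) blocks."""
    return [k * m * m + position for k in range(num_blocks)]


def strides(indices):
    """Differences between consecutive indices."""
    return [b - a for a, b in zip(indices, indices[1:])]


# Block size M = 2: consecutive blocks are 2 * 2 = 4 elements apart.
idx_m2 = same_position_indices(4, 2, 0)
# Block size M = 3: consecutive blocks are 3 * 3 = 9 elements apart.
idx_m3 = same_position_indices(3, 3, 2)
```

With M = 2 the indices are `[0, 4, 8, 12]`, so each step of the per-block loop jumps 4 elements; with M = 3 the jump grows to 9. The stride grows quadratically with the block size, which is why the loss becomes more pronounced for larger blocks.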

The information processing apparatus in each embodiment described below stores elements that occupy the same position within their respective blocks of the matrix being operated on in a continuous region (addresses) of the storage device. As a result, when the product of a vector and a sparse matrix having blocks is calculated, the data referenced in the operation loop executed block by block (the array data storing the nonzero elements of the sparse matrix) is located in a continuous region (addresses) of the storage device.
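One way to picture this idea is the following Python sketch, which regroups a block-contiguous array so that elements with the same relative position are adjacent and then multiplies by a vector with a unit-stride inner loop. It is a simplified illustration and not the layout of the embodiment's figures: it assumes one global group per relative position, row-major positions inside a block, 0-based indices, and explicit per-block row and column lists (`block_rows`, `block_cols`) instead of the JDS reordering. All names are hypothetical.

```python
def to_position_major(flat, num_blocks, block_elems):
    """Regroup a block-contiguous array so that the elements sharing a
    relative position within their blocks become adjacent."""
    out = [None] * (num_blocks * block_elems)
    for k in range(num_blocks):
        for p in range(block_elems):
            out[p * num_blocks + k] = flat[k * block_elems + p]
    return out


def block_matvec_position_major(a, block_rows, block_cols, m, n_block_rows, x):
    """y = A @ x for (m x m) blocks stored position-major in `a`.

    block_rows[k] and block_cols[k] give the 0-based block row and block
    column of the k-th nonzero block.
    """
    nz = len(block_rows)
    y = [0.0] * (n_block_rows * m)
    for r in range(m):          # row inside a block
        for c in range(m):      # column inside a block
            base = (r * m + c) * nz
            for k in range(nz):  # per-block loop: a[base + k] is unit stride
                y[block_rows[k] * m + r] += a[base + k] * x[block_cols[k] * m + c]
    return y


# Three 2 x 2 blocks at block positions (0, 0), (1, 0), and (0, 1).
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
a = to_position_major(flat, num_blocks=3, block_elems=4)
y = block_matvec_position_major(a, [0, 1, 0], [0, 0, 1], 2, 2, [1, 1, 1, 1])
```

In this sketch `a` becomes `[1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]`, so the loop over blocks reads consecutive addresses of `a` rather than addresses M × M elements apart.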

Hereinafter, elements that occupy the same position within the blocks of the matrix being operated on may be referred to as “elements having the same relative position within a block”.

Each embodiment of the present invention is described below.

<First Embodiment>
A first embodiment of the present invention is described. The following describes a matrix storage method for storing, in an array, an irregular sparse matrix having one or more blocks (submatrices). It also describes, among other things, a processing procedure for calculating the product of an arbitrary vector and a matrix stored using that matrix storage method.

First, the configuration of the information processing apparatus 700 in this embodiment is described with reference to FIG. 7. FIG. 7 is a block diagram illustrating the functional configuration of the information processing apparatus 700 in this embodiment.

The information processing apparatus 700 in this embodiment includes a storage unit 701 and an arrangement unit 702. The information processing apparatus 700 may also include an arithmetic processing unit 703 and an input unit 704. These components may be communicably connected to one another by any communication means. The information processing apparatus 700 may be configured as a physical computer or the like, or as a virtual computer or the like provided using well-known virtualization technology.

The storage unit 701 provides a storage area capable of storing various information (data) and programs (computer programs). Any part of this storage area may be accessible by using identification information that specifies a position in it (for example, an address or an index).
The storage unit 701 may be a storage device such as a physical semiconductor memory (for example, SDRAM (Synchronous Dynamic Random Access Memory) provided in the form of a DIMM (Dual Inline Memory Module)). The storage unit 701 may also be a virtual storage device provided in a virtual environment.

The storage unit 701 holds (stores) the matrix data of the matrix placed by the arrangement unit 702 described later. The storage unit 701 also provides the stored matrix data to the arithmetic processing unit 703 described later.

The arrangement unit 702 stores the matrix data of the matrix to be calculated (hereinafter sometimes referred to as the “calculation target matrix”) in an appropriate data structure and places (stores) it in the storage unit 701. The arrangement unit 702 may be communicably connected to the input unit 704 described later and may receive the calculation target matrix via the input unit 704.

The arithmetic processing unit 703 executes various arithmetic processes using the matrix data stored in the storage unit 701. In the specific example described later, the arithmetic processing unit 703 multiplies the calculation target matrix by an arbitrary vector. The arithmetic processing unit 703 may receive data necessary for these calculations (for example, data representing the vector to be multiplied) from the input unit 704 described later.

入力部７０４は、情報処理装置７００に対する各種データの入力を受け付ける。入力部７０４は、例えば、各種入出力装置（キーボード、マウス、ディスプレイ）等を介して、上記各種データの入力を受け付けてもよい。また、入力部７０４は、任意の通信ネットワークを介して、上記各種データの入力を受け付けてもよい。 The input unit 704 receives various data input to the information processing apparatus 700. The input unit 704 may accept input of the various data via various input / output devices (keyboard, mouse, display) and the like, for example. Further, the input unit 704 may accept input of the various data via an arbitrary communication network.

次に、上記のように構成された情報処理装置７００の動作について説明する。 Next, the operation of the information processing apparatus 700 configured as described above will be described.

まず、情報処理装置７００における、記憶部７０１に対する演算対象行列の格納手順について、図８乃至図１２を参照して説明する。以下に説明する具体例においては、演算対象行列は、１以上のブロックを有する疎行列であることを仮定する。なお、係る演算対象行列は、不規則疎行列であってもよい。 First, the procedure for storing the calculation target matrix in the storage unit 701 in the information processing apparatus 700 will be described with reference to FIGS. 8 to 12. In the specific example described below, it is assumed that the calculation target matrix is a sparse matrix having one or more blocks. The calculation target matrix may be an irregular sparse matrix.

図８乃至図１１は、本実施形態において説明する行列格納法を用いて、演算対象行列の行列データを、記憶部７０１に格納するデータに変更する過程について例示する図である。また、図１２は、情報処理装置７００の具体的な動作を例示するフローチャートである。 8 to 11 are diagrams illustrating a process of changing the matrix data of the calculation target matrix to data stored in the storage unit 701 using the matrix storage method described in the present embodiment. FIG. 12 is a flowchart illustrating a specific operation of the information processing apparatus 700.

まず図８に例示する１以上のブロックを有する演算対象行列８００が、演算処理の対象として情報処理装置に入力される。この場合、入力部７０４が、演算対象行列８００の各要素を表すデータの入力を受け付けてもよい（ステップＳ１２０１）。 First, a calculation target matrix 800 having one or more blocks illustrated in FIG. 8 is input to the information processing apparatus as a calculation processing target. In this case, the input unit 704 may accept input of data representing each element of the calculation target matrix 800 (step S1201).

In this specific example, the blocks included in the calculation target matrix are (2 rows × 2 columns) submatrices. As in FIG. 2, the elements of each block are denoted by the symbols “*1”, “*2”, “*3”, and “*4”. These symbols merely indicate the position of an element within a (2 rows × 2 columns) block and do not represent specific data (values); arbitrary data (values) can be set in each element of a block. As in FIG. 2, blank elements in the matrix 800 are zero elements. Although the blocks in this example are (2 rows × 2 columns) submatrices, the present embodiment is not limited to this: the block size may be arbitrary, and so may the size of the calculation target matrix.

入力部７０４は、入力された演算対象行列８００の各要素を表すデータを、配置部７０２に受け渡す。 The input unit 704 passes the data representing each element of the input calculation target matrix 800 to the arrangement unit 702.

Upon receiving this data, the arrangement unit 702 packs (aggregates) the blocks of the calculation target matrix 800 toward the left end in the column direction, as illustrated in FIG. 9 (step S1202).

As illustrated in FIG. 10, the arrangement unit 702 then reorders the block row numbers of the matrix packed in step S1202 in descending order of the number of non-zero blocks contained in each block row (step S1203).

上記ステップＳ１２０２乃至Ｓ１２０３の処理は、上記説明したＪＤＳ法と同様の方法を用いて実現してもよい。 The processes in steps S1202 to S1203 may be realized using a method similar to the JDS method described above.
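Steps S1202 and S1203 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the 3 × 3 grid of blocks, its values, the use of `None` for zero blocks, and the function names `pack_left` and `sort_block_rows` are all hypothetical.

```python
# Hypothetical 3x3 grid of 2x2 blocks; None marks an all-zero block.
# Each block is a tuple (*1, *2, *3, *4), i.e. row-major within the block.
blocks = [
    [(1, 2, 3, 4), None, None],
    [None, (5, 6, 7, 8), (9, 10, 11, 12)],
    [(13, 14, 15, 16), (17, 18, 19, 20), (21, 22, 23, 24)],
]


def pack_left(block_rows):
    """Step S1202: drop zero blocks, remembering each block's original column."""
    return [
        [(col, blk) for col, blk in enumerate(row) if blk is not None]
        for row in block_rows
    ]


def sort_block_rows(packed):
    """Step S1203: reorder block rows by non-zero block count, descending.

    Returns (order, rows) where order[i] is the original block-row index of
    the i-th row after sorting. sorted() is stable, so ties keep their order.
    """
    order = sorted(range(len(packed)), key=lambda r: -len(packed[r]))
    return order, [packed[r] for r in order]


packed = pack_left(blocks)
order, sorted_rows = sort_block_rows(packed)
```

Keeping the original block column next to each block is what later allows a separate array of block column numbers to be built.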

Next, for the blocks included in the calculation target matrix, the arrangement unit 702 selects (extracts) the elements located at the same position within each block (elements having the same relative position), starting from the first block column and proceeding downward in the row direction through the matrix reordered in step S1203. The arrangement unit 702 then places (stores) the selected elements (more specifically, the data representing them) in order in a contiguous area (storage area) of the storage unit 701 (step S1204). Here, the contiguous area may be a logically contiguous area (addresses) of the storage area provided by the storage unit 701. It may also be, for example, a contiguous area (addresses) of a physical storage device constituting the storage unit 701.

More specifically, as illustrated in FIGS. 11(a) to 11(d), the arrangement unit 702 selects, in order from the first block column, the elements having the same block column number in the direction of arrow DA2 illustrated in FIG. 11 (from the top toward the bottom in the row direction). The arrangement unit 702 then places the selected elements in a contiguous area of the storage unit 701. As a result, elements at the same position in each block are placed in a contiguous area of the storage unit 701.

Specifically, as illustrated in FIG. 11(a), for example, the arrangement unit 702 selects from each block the element whose position in the block is “*1” (position (1, 1)) and places the selected elements in order in a contiguous area of the storage unit 701. Likewise, the elements at positions “*2” (position (1, 2)), “*3” (position (2, 1)), and “*4” (position (2, 2)) of each block are each placed in a contiguous area of the storage unit 701.

In other words, for each row of the submatrix representing each block, the arrangement unit 702 selects the data representing the elements in each column, proceeding from the leftmost column toward the rightmost column, and places the data selected from the same column in a contiguous area of the storage unit 701.

This is explained using the specific example illustrated in FIGS. 8 to 10. For example, the arrangement unit 702 selects, from each block, the elements in the first row of the block, proceeding from the leftmost column (first column) toward the rightmost column (second column), and places the data representing elements in the same column in a contiguous area of the storage unit 701. The same applies to the elements in the second row of each block.

In this case, as illustrated in FIG. 11, for example, the arrangement unit 702 may use a separate array for each set of elements having the same relative position in the block (for example, separate arrays may be provided to store 1101, 1102, 1103, and 1104 illustrated in FIG. 11).
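The arrangement performed in step S1204 can be sketched as follows. This is an illustrative sketch only: the input rows are assumed to be already packed and sorted, the values are hypothetical, and the names `split_by_relative_position`, `planes`, and `block_cols` do not appear in the specification.

```python
# Block rows already packed left and sorted by block count (steps S1202-S1203).
# Each entry is (original block column, (*1, *2, *3, *4)); values are made up.
sorted_rows = [
    [(0, (13, 14, 15, 16)), (1, (17, 18, 19, 20)), (2, (21, 22, 23, 24))],
    [(1, (5, 6, 7, 8)), (2, (9, 10, 11, 12))],
    [(0, (1, 2, 3, 4))],
]


def split_by_relative_position(rows, block_size=2):
    """Step S1204: build one contiguous list per in-block position."""
    n_pos = block_size * block_size
    planes = [[] for _ in range(n_pos)]   # planes[0] holds every *1, and so on
    block_cols = []                       # original block column of each block
    max_len = max(len(r) for r in rows)
    for j in range(max_len):              # packed block column, left to right
        for row in rows:                  # downward within that block column
            if j < len(row):
                col, blk = row[j]
                block_cols.append(col)
                for p in range(n_pos):
                    planes[p].append(blk[p])
    return planes, block_cols


planes, block_cols = split_by_relative_position(sorted_rows)
```

Each list in `planes` plays the role of one of the arrays 1101 to 1104: consecutive entries belong to consecutive blocks in the traversal order, so a loop over blocks reads each list with unit stride.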

Note that the area of the storage unit 701 holding the data for elements at one position in each block and the area holding the data for elements at another position in each block may be contiguous with each other. Alternatively, arbitrary data may be placed between these two areas.

As a result of step S1204, the data referenced in the computation loop that calculates the product of the calculation target matrix having blocks and an arbitrary vector (the array data storing the elements of each block) is placed in a contiguous area (consecutive addresses) of the storage unit 701. A specific example of this computation is described later.

The array storing the column numbers of the calculation target matrix (reference numeral 800 in FIG. 8) may be the same as in the JDS method described above. That is, the arrangement unit 702 separately provides an array that stores the block column numbers of the matrix 800, similar to the array illustrated in FIG. 1(d) described above. The arrangement unit 702 may place this array in the storage unit 701.

なお、上記ステップＳ１２０３において、配置部７０２は、演算対象行列に含まれるブロックの要素を記憶部７０１に直接配置してもよく、又は、記憶部７０１に配置可能な配置データを生成してもよい。 In step S1203, the arrangement unit 702 may directly arrange the elements of the blocks included in the calculation target matrix in the storage unit 701, or may generate arrangement data that can be arranged in the storage unit 701. .

Next, taking the multiplication of the calculation target matrix placed in the storage unit 701 as described above by an arbitrary vector as a specific example, the following explains how the matrix storage method of this embodiment makes the computation more efficient.

図１３は、上記説明した行列格納方法を用いて格納した、（２行２列）型のブロックを有する演算対象行列と、任意のベクトルとの積を計算するＦｏｒｔｒａｎコードの一部である。各変数と配列は以下の意味を表す。 FIG. 13 is a part of a Fortran code for calculating a product of an arithmetic target matrix having a (2 × 2) type block stored using the matrix storage method described above and an arbitrary vector. Each variable and array has the following meaning.

・変数「Ｎ」は、元の行列（例えば、図８における演算対象行列８００））の行又は列方向のブロックの個数を表す。 The variable “N” represents the number of blocks in the row or column direction of the original matrix (for example, the calculation target matrix 800 in FIG. 8).

・The array “A” is a two-dimensional array that stores the element values of each block in the original matrix. The first subscript is the block number assigned when the blocks of the matrix illustrated in FIG. 11 are linearized in the direction of arrow DA2, starting from the first column. The second subscript is a number identifying the relative position within each block.

・配列「Ｘ」は、演算対象となる入力ベクトル（Ｘ）を表す。 The array “X” represents an input vector (X) to be calculated.

・The array “IA” stores, for each block column, the number of its first block when the blocks of the original matrix A are linearized and numbered in the order illustrated in FIG. 11. That is, the blocks in block column I start at number IA(I), and the number of blocks in block column I is IA(I+1) - IA(I).

・The array “JA” stores the block column numbers of the non-zero blocks in the original matrix. The block column number of the K-th block is JA(K).

・配列ＩＯＲＤは、元の行列を、例えば、図１０に例示するように並べ替えた場合のブロック行番号を格納した配列を表す。即ち、元の配列におけるブロック行番号が「Ｉ」のとき、並び替え後のブロック行番号はＩＯＲＤ（Ｉ）となる。 The array IORD represents an array that stores the block row numbers when the original matrix is rearranged as exemplified in FIG. That is, when the block row number in the original array is “I”, the rearranged block row number is IORD (I).

図１３に例示するソースコードにおいては、最内の演算ループである２０番のＤＯループの添字Ｋが１増加すると、演算対象行列の非零要素を格納した配列Ａの添字が１増加する。 In the source code illustrated in FIG. 13, when the subscript K of the 20th DO loop which is the innermost operation loop is increased by 1, the subscript of the array A storing the non-zero elements of the operation target matrix is increased by 1.
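The roles of A, IA, JA, and IORD in the matrix-vector product can be sketched in Python as follows. This is not the Fortran code of FIG. 13, which is not reproduced here; it is a minimal sketch assuming 0-based indices, (2 rows × 2 columns) blocks, and one list per in-block position. The function name `block_jds_matvec` and the sample data are hypothetical.

```python
def block_jds_matvec(a, ia, ja, iord, x):
    """Return y = M @ x for a matrix of 2x2 blocks stored position by position.

    a[p][k] : value at in-block position p (0..3 for *1..*4) of the k-th block
    ia[j]   : number of the first block in packed block column j (len = cols + 1)
    ja[k]   : original block column of the k-th block
    iord[i] : block-row number after sorting for original block row i
    """
    n = len(iord)
    ys = [0.0] * (2 * n)                   # result in sorted block-row order
    for j in range(len(ia) - 1):
        for k in range(ia[j], ia[j + 1]):  # innermost loop: k advances by 1
            r = k - ia[j]                  # sorted block row of block k
            c = ja[k]
            x0, x1 = x[2 * c], x[2 * c + 1]
            ys[2 * r] += a[0][k] * x0 + a[1][k] * x1
            ys[2 * r + 1] += a[2][k] * x0 + a[3][k] * x1
    y = [0.0] * (2 * n)                    # undo the block-row permutation
    for i in range(n):
        y[2 * i] = ys[2 * iord[i]]
        y[2 * i + 1] = ys[2 * iord[i] + 1]
    return y


# Hypothetical data: 6 non-zero blocks of a 3x3 block matrix.
a = [
    [13, 5, 1, 17, 9, 21],    # every *1
    [14, 6, 2, 18, 10, 22],   # every *2
    [15, 7, 3, 19, 11, 23],   # every *3
    [16, 8, 4, 20, 12, 24],   # every *4
]
ia = [0, 3, 5, 6]
ja = [0, 1, 0, 1, 2, 2]
iord = [2, 1, 0]
y = block_jds_matvec(a, ia, ja, iord, [1, 2, 3, 4, 5, 6])
```

Because the block rows are sorted by descending block count and packed left, the k-th block of packed column j lies in sorted block row `k - ia[j]`, which is why no per-block row index is needed in this sketch.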

図１４は、図１３に例示するＦｏｒｔｒａｎコードにおける、２０番のＤＯループ処理について、配列Ａに着目した処理をフローチャートとして表した図である。図１４に例示するフローチャートに係る処理は、演算処理部７０３により実行されてもよい。 FIG. 14 is a diagram illustrating, as a flowchart, processing focusing on the array A for the 20th DO loop processing in the Fortran code illustrated in FIG. 13. The processing according to the flowchart illustrated in FIG. 14 may be executed by the arithmetic processing unit 703.

The DO loop numbered 20 in FIG. 13 corresponds to the iterative processing executed between the branch in step S1402 and the increment of K in step S1411 in FIG. 14.

In this iterative processing, reading an element of array A placed in the storage unit 701 and computing its product with an element of array X alternate between steps S1403 and S1410. When the blocks in the calculation target matrix are (M rows × M columns) submatrices, this is repeated M × M times.

しかしながら、例えば、情報処理装置７００の実行環境や、図１３に例示するＦｏｒｔｒａｎコードを翻訳するコンパイラの最適化によって、実際の処理順序が変更される場合がある。 However, the actual processing order may be changed depending on, for example, the execution environment of the information processing apparatus 700 or the optimization of the compiler that translates the Fortran code illustrated in FIG.

In particular, a compiler optimized for a computer architecture with SIMD (Single Instruction, Multiple Data) instructions, which process multiple data items with a single instruction, generates execution code optimized so that multiple reads from the storage device and multiple arithmetic operations are processed together.

図１５は、図１３に例示するソースコードがコンパイルされた実行コードが、ＳＩＭＤ命令を持つ計算機アーキテクチャ向けに最適化された場合の処理順序を例示するフローチャートである。なお、図１５に例示するフローチャートに係る処理は、演算処理部７０３により実行されてもよい。 FIG. 15 is a flowchart illustrating the processing sequence when the execution code obtained by compiling the source code illustrated in FIG. 13 is optimized for a computer architecture having SIMD instructions. Note that the processing according to the flowchart illustrated in FIG. 15 may be executed by the arithmetic processing unit 703.

ステップＳ１５０３における記号”Ｌ”は、ＳＩＭＤ命令で同時に処理する命令の個数を表す。ステップＳ１５０４乃至ステップＳ１５０６においては、上記配列Ａの値の読み込み処理が繰り返される。 The symbol “L” in step S1503 represents the number of instructions simultaneously processed by the SIMD instruction. In steps S1504 to S1506, the reading process of the value of the array A is repeated.

また、ステップＳ１５０８乃至ステップＳ１５１０においては、上記ステップＳ１５０４乃至ステップＳ１５０６において読み込んだ配列Ａの値と、配列Ｘとの積の計算が繰り返される。 In steps S1508 to S1510, the calculation of the product of the array A value read in steps S1504 to S1506 and the array X is repeated.

ステップＳ１５０４乃至ステップＳ１５０６、及び、ステップＳ１５０８乃至ステップＳ１５１０は、いずれも長さＬ以下の繰り返し処理になっている。ＳＩＭＤ命令を持つ計算機アーキテクチャ向けに最適化された場合、ステップＳ１５０４乃至ステップＳ１５０６、及び、ステップＳ１５０８乃至ステップＳ１５１０における繰り返しは、それぞれ１つの命令で処理される。 Steps S1504 to S1506 and steps S1508 to S1510 are all repeated processes of length L or less. When optimized for a computer architecture having SIMD instructions, the iterations in steps S1504 to S1506 and S1508 to S1510 are each processed with one instruction.

In the reading of array A in steps S1504 to S1506, the subscript K of array A increases by 1 on each iteration. Consequently, this processing reads data placed in a contiguous area (consecutive addresses) of the storage device (for example, the storage unit 701).
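The load-then-multiply ordering of FIG. 15 can be imitated in plain Python, purely as an illustration of the access pattern. The sketch below is not SIMD code and makes several assumptions: `lanes` stands in for L, only the “*1” elements and the first vector component of each block column are handled, and the function name and data are hypothetical.

```python
def strip_mined_partial_products(a_plane, ja, x, lanes=4):
    """Compute a_plane[k] * x[2 * ja[k]] in strips of at most `lanes` items."""
    out = []
    for start in range(0, len(a_plane), lanes):
        stop = min(start + lanes, len(a_plane))
        # Corresponds to S1504-S1506: one slice of consecutive elements of A.
        loaded = a_plane[start:stop]
        # The vector operand is gathered through the block column numbers.
        gathered = [x[2 * ja[k]] for k in range(start, stop)]
        # Corresponds to S1508-S1510: multiply the whole strip.
        out.extend(av * xv for av, xv in zip(loaded, gathered))
    return out


a1 = [13, 5, 1, 17, 9, 21]   # hypothetical "*1" elements of six blocks
ja = [0, 1, 0, 1, 2, 2]      # block column number of each block
x = [1, 2, 3, 4, 5, 6]
products = strip_mined_partial_products(a1, ja, x)
```

The point of the sketch is that `loaded` is always a slice of adjacent entries, whereas the vector operand still has to be gathered indirectly.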

When the JDS storage method described above is used to store data representing a calculation target matrix having blocks in a storage device (for example, the storage unit 701), the elements of the same block are stored in a contiguous area (consecutive addresses) of the storage device. Elements at the same relative position in different blocks are therefore separated by the other elements of each block, so when the processing that reads the data representing the calculation target matrix is repeated block by block, non-contiguous areas (addresses) of the storage device are accessed.

一方、上記説明した本実施形態における行列格納法によれば、ブロック単位で繰り返される処理を実行する際、記憶装置において連続した領域（アドレス）がアクセスされる。よって、本実施形態における行列格納法によれば、上記ＪＤＳ格納法に比して、記憶装置に対するアクセス性能が向上することが期待される。これに伴い、演算処理の性能も向上することが期待される。 On the other hand, according to the matrix storage method in the present embodiment described above, continuous areas (addresses) are accessed in the storage device when the process repeated in units of blocks is executed. Therefore, according to the matrix storage method in the present embodiment, it is expected that the access performance to the storage device is improved as compared with the JDS storage method. Along with this, it is expected that the performance of the arithmetic processing is also improved.
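The difference in access pattern can be made concrete with a small index calculation. The sketch below assumes (2 rows × 2 columns) blocks and, for the layout of this embodiment, that the areas for the four positions are laid out back to back (the specification also allows gaps between them). The function names are hypothetical.

```python
def interleaved_index(k, p, block_elems=4):
    """Flat index when the elements of each block are stored together."""
    return k * block_elems + p


def split_index(k, p, n_blocks):
    """Flat index when each in-block position has its own contiguous run."""
    return p * n_blocks + k


n_blocks = 6
# Indices touched when reading position *1 (p = 0) of consecutive blocks.
interleaved = [interleaved_index(k, 0) for k in range(n_blocks)]
split = [split_index(k, 0, n_blocks) for k in range(n_blocks)]
```

With the elements of a block stored together, successive reads of the same position are M × M elements apart; with one run per position they are adjacent.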

As described above, in the information processing apparatus 700 of this embodiment, the arrangement unit 702 can efficiently place the elements of a calculation target matrix having one or more non-zero blocks in the storage unit 701. This is because, as with the JDS storage method, the arrangement unit 702 can compress the zero elements of the calculation target matrix and thereby improve capacity efficiency.

In addition, the arrangement unit 702 stores elements having the same relative position within each non-zero block in a contiguous area of the storage unit 701. Because contiguous areas of the storage unit 701 are then accessed in computations repeated block by block, the arrangement unit 702 can improve computational efficiency. That is, according to this embodiment, the elements of a matrix having one or more blocks can be placed in the storage unit 701 in a manner that is efficient in both capacity and computation.

<Modification of First Embodiment>
The matrix storage method described in the first embodiment can be applied to well-known matrix storage methods such as the CRS storage method or the ELL (Ellpack) storage method in order to extend them. The following describes how these storage methods are extended using the matrix storage method of the first embodiment. The information processing apparatus 700 in this modification may be the same as the information processing apparatus 700 in the first embodiment, so its description is omitted. Because the CRS and ELL storage methods are well-known techniques, detailed descriptions of the storage methods themselves are also omitted.

まず、上記第１の実施形態において説明した行列格納法を用いた、ＣＲＳ格納法の拡張について説明する。 First, an extension of the CRS storage method using the matrix storage method described in the first embodiment will be described.

FIG. 16 shows a matrix (reference numeral 1600 in FIG. 16) obtained from the matrix in which the calculation target matrix illustrated in FIG. 8 is packed toward the left end in the column direction (FIG. 9), by arranging the elements of each block in order toward the right in the column direction.

The well-known CRS storage method stores the elements of the matrix 1600 (the data they represent) consecutively in a storage device (for example, the storage unit 701), row by row from the first row of the matrix 1600 illustrated in FIG. 16, in the direction of arrow DA3 in FIG. 16 (toward the right in the column direction). The elements within each block (“*1” to “*4”) are therefore placed in a contiguous area (consecutive addresses) of the storage device. In this case, as with the well-known JDS storage method described above, non-contiguous areas (addresses) of the storage device are accessed when the processing that reads the data representing the calculation target matrix (for example, the matrix 1600) is repeated block by block.

In contrast, the arrangement unit 702 in this modification places elements having the same relative position in each block in a contiguous area (consecutive addresses) of the storage unit 701. More specifically, in order from the first block column, the arrangement unit 702 selects the elements included in the same block row in the direction of arrow DA4 illustrated in FIG. 17 (toward the right in the column direction). The arrangement unit 702 then places the selected elements in a contiguous area of the storage unit 701. As a result, elements at the same position in the (2 rows × 2 columns) blocks are placed (stored) in a contiguous area (consecutive addresses) of the storage unit 701.

この場合、配置部７０２は、図１７に例示するように、各ブロックにおける相対位置が同じ要素ごとに、当該要素を一次元的に格納する配列を分けてもよい（例えば、図１７における１７０１、１７０２、１７０３、１７０４）。 In this case, as illustrated in FIG. 17, the arrangement unit 702 may divide an array for storing the elements one-dimensionally for each element having the same relative position in each block (for example, 1701 in FIG. 17, 1702, 1703, 1704).
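The CRS extension can be sketched as follows, as an illustration under stated assumptions rather than the layout of FIG. 17 itself. The block values, the use of `None` for zero blocks, and the names `to_split_block_crs`, `col_idx`, and `row_ptr` are hypothetical; `row_ptr` follows the usual CRS convention of marking where each (block) row starts.

```python
# Hypothetical 3x3 grid of 2x2 blocks; None marks an all-zero block.
blocks = [
    [(1, 2, 3, 4), None, None],
    [None, (5, 6, 7, 8), (9, 10, 11, 12)],
    [(13, 14, 15, 16), (17, 18, 19, 20), (21, 22, 23, 24)],
]


def to_split_block_crs(block_rows, block_size=2):
    """Walk block rows left to right, one contiguous list per in-block position."""
    n_pos = block_size * block_size
    planes = [[] for _ in range(n_pos)]
    col_idx, row_ptr = [], [0]
    for row in block_rows:
        for col, blk in enumerate(row):
            if blk is not None:
                col_idx.append(col)
                for p in range(n_pos):
                    planes[p].append(blk[p])
        row_ptr.append(len(col_idx))      # where the next block row starts
    return planes, col_idx, row_ptr


planes, col_idx, row_ptr = to_split_block_crs(blocks)
```

Compared with storing each block's four elements side by side, only the value storage changes; the index arrays are those of an ordinary CRS layout applied to blocks.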

次に、上記第１の実施形態において説明した行列格納法を用いた、ＥＬＬ格納法の拡張について説明する。 Next, an extension of the ELL storage method using the matrix storage method described in the first embodiment will be described.

FIG. 18 shows a matrix (reference numeral 1800 in FIG. 18) obtained from the matrix in which the calculation target matrix illustrated in FIG. 8 is packed toward the left end in the column direction (FIG. 9), by arranging the elements of each block in order downward in the row direction. In FIG. 18, “*0” indicates that a zero element is stored.

The well-known ELL storage method stores the elements of the matrix 1800 (the data they represent) consecutively in a storage device (for example, the storage unit 701), column by column from the first column of the matrix 1800 illustrated in FIG. 18, in the direction of arrow DA5 in FIG. 18 (downward in the row direction). The elements within each block (“*1” to “*4”) are therefore placed in a contiguous area (consecutive addresses) of the storage device. In this case, as with the well-known JDS storage method described above, non-contiguous areas (addresses) of the storage device are accessed when the processing that reads the data representing the calculation target matrix (for example, the matrix 1800) is repeated block by block.

In contrast, the arrangement unit 702 in this modification places elements having the same relative position in each block in a contiguous area (consecutive addresses) of the storage unit 701. More specifically, in order from the first block row, the arrangement unit 702 places (stores) the elements having the same position in the (2 rows × 2 columns) blocks in a contiguous area (consecutive addresses) of the storage unit 701, proceeding in the direction of arrow DA6 illustrated in FIG. 19 (downward in the row direction).

この場合、配置部７０２は、図１９に例示するように、各ブロックにおける相対位置が同じ要素ごとに、当該要素を一次元的に格納する配列を分けてもよい（例えば、図１９における１９０１、１９０２、１９０３、１９０４）。 In this case, as illustrated in FIG. 19, the arrangement unit 702 may divide an array for storing the elements one-dimensionally for each element having the same relative position in each block (for example, 1901 in FIG. 19, 1902, 1903, 1904).
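The ELL extension can be sketched in the same way. This is an illustrative sketch only: the block values are hypothetical, padding blocks (the “*0” entries) are given block column 0 by assumption, and the names `to_split_block_ell` and `col_idx` do not appear in the specification.

```python
# Hypothetical 3x3 grid of 2x2 blocks; None marks an all-zero block.
blocks = [
    [(1, 2, 3, 4), None, None],
    [None, (5, 6, 7, 8), (9, 10, 11, 12)],
    [(13, 14, 15, 16), (17, 18, 19, 20), (21, 22, 23, 24)],
]


def to_split_block_ell(block_rows, block_size=2):
    """Pad every block row to the same width, then walk the columns downward."""
    n_pos = block_size * block_size
    packed = [
        [(col, blk) for col, blk in enumerate(row) if blk is not None]
        for row in block_rows
    ]
    width = max(len(row) for row in packed)
    planes = [[] for _ in range(n_pos)]
    col_idx = []
    for j in range(width):                # packed block column
        for row in packed:                # downward through the block rows
            if j < len(row):
                col, blk = row[j]
            else:
                col, blk = 0, (0,) * n_pos   # zero padding block (assumed col 0)
            col_idx.append(col)
            for p in range(n_pos):
                planes[p].append(blk[p])
    return planes, col_idx, width


planes, col_idx, width = to_split_block_ell(blocks)
```

Every list in `planes` has the same length (number of block rows times `width`), which keeps the indexing regular at the cost of storing the padding zeros.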

As described above, according to the present embodiment, extending the CRS storage method or the ELL storage method makes it possible to provide an information processing apparatus 700 that can place the elements of a matrix having one or more blocks in a storage device in a manner that is efficient in both capacity and computation.

<Second Embodiment>
A second embodiment of the present invention is described below with reference to FIG. 20. FIG. 20 is a block diagram illustrating a functional configuration of the information processing apparatus 2000 according to this embodiment.

図２０に例示するように本実施形態における情報処理装置２０００は、記憶部２００１と、配置部２００２とを有する。情報処理装置２０００を構成するこれらの構成要素の間は、任意の通信手段により通信可能に接続されていてもよい。また、情報処理装置２０００は、物理的なコンピュータ等により構成されてもよく、周知の仮想化技術を用いて提供された仮想的なコンピュータ等により構成されてもよい。 As illustrated in FIG. 20, the information processing apparatus 2000 according to the present embodiment includes a storage unit 2001 and an arrangement unit 2002. These components constituting the information processing apparatus 2000 may be communicably connected by any communication means. The information processing apparatus 2000 may be configured with a physical computer or the like, or may be configured with a virtual computer or the like provided using a well-known virtualization technology.

記憶部２００１は、少なくともデータを記憶可能である。係る記憶部２００１は、例えば、物理的な半導体メモリ等の記憶デバイスであってもよい。また、記憶部２００１は、仮想環境において提供される仮想的な記憶デバイスであってもよい。なお、記憶部２００１は、上記第１の実施形態における記憶部７０１と同様としてもよい。 The storage unit 2001 can store at least data. For example, the storage unit 2001 may be a storage device such as a physical semiconductor memory. The storage unit 2001 may be a virtual storage device provided in a virtual environment. Note that the storage unit 2001 may be the same as the storage unit 701 in the first embodiment.

For the second matrices, which are the blocks of which the first matrix contains at least one, the arrangement unit 2002 places data representing the elements located at the same position in each second matrix in a contiguous area of the storage unit 2001. The arrangement unit 2002 may be the same as the arrangement unit 702 in the first embodiment.

According to the information processing apparatus 2000 of this embodiment configured as described above, the arrangement unit 2002 can efficiently place the elements of a first matrix having one or more blocks (second matrices) in the storage unit 2001.

The arrangement unit 2002 also stores elements having the same relative position within the second matrices in a continuous area of the storage unit 2001. Because arithmetic processing that is repeated block by block then accesses continuous areas of the storage unit, the arrangement unit 2002 can improve computational efficiency. That is, according to the present embodiment, the elements included in a first matrix having one or more second matrices can be arranged in the storage unit 2001 in a manner that is efficient in both capacity and computation.
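The arrangement described above can be illustrated with a minimal sketch. It assumes that all blocks have the same size, that positions inside a block are visited in row-major order, and that blocks are taken in the order given; the function name `arrange_blocks` and the sample data are hypothetical and do not come from the embodiments.

```python
def arrange_blocks(blocks):
    """Lay out equally sized blocks so that elements sharing the same
    in-block position (i, j) occupy one contiguous run of the output."""
    if not blocks:
        return []
    rows, cols = len(blocks[0]), len(blocks[0][0])
    flat = []
    for i in range(rows):
        for j in range(cols):
            # One contiguous run per in-block position, one entry per block.
            flat.extend(block[i][j] for block in blocks)
    return flat


# Three hypothetical 2x2 blocks (second matrices) taken from a first matrix.
blocks = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
layout = arrange_blocks(blocks)
# The upper-left elements of all blocks (1, 5, 9) come first, then the
# upper-right elements (2, 6, 10), and so on.
```

In this layout, a loop that processes the same in-block position for every block reads consecutive addresses, which is the access pattern the paragraph above relies on.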

<Configuration of hardware and software program (computer program)>
A hardware configuration capable of realizing each of the embodiments described above is described below.

In the following description, the information processing apparatuses (700, 2000) described in the above embodiments may be collectively referred to simply as the "information processing apparatus". Likewise, the components of the information processing apparatus (for example, the storage units (701, 2001), the arrangement units (702, 2002), the arithmetic processing unit 703, and the input unit 704) may be collectively referred to simply as the "components of the information processing apparatus".

The information processing apparatus described in each of the above embodiments may be configured as a dedicated hardware device. In that case, the components shown in the above drawings may be realized as hardware in which some or all of them are integrated (for example, an integrated circuit implementing the processing logic).

For example, when the components are realized in hardware, each component may be implemented as an integrated circuit capable of providing its function, for example as an SoC (System-on-a-Chip). In this case, the data held by each component may be stored, for example, in a RAM area or a flash memory area integrated into the SoC.

Also in this case, a well-known communication bus may be used as the communication line connecting the components. The communication line is not limited to a bus connection; the components may instead be connected peer-to-peer.

The information processing apparatus described above, or its components, may also be configured by hardware as illustrated in FIG. 21 and various software programs (computer programs) executed by that hardware.

The arithmetic device 2101 in FIG. 21 is an arithmetic processing device such as a general-purpose CPU or a microprocessor. The arithmetic device 2101 may, for example, read various software programs stored in the nonvolatile storage device 2103 described later into the storage device 2102 and execute processing according to those software programs. The arithmetic processing unit 703 in the first embodiment may execute its arithmetic processing using the arithmetic device 2101.

The storage device 2102 is a memory device, such as a RAM (Random Access Memory), that can be referenced from the arithmetic device 2101, and stores software programs, various data, and the like. The storage device 2102 may be a volatile memory device. The storage units (701, 2001) in the above embodiments may be realized using the storage device 2102.

The nonvolatile storage device 2103 is a nonvolatile storage device such as a magnetic disk drive or a flash-memory-based semiconductor storage device. The nonvolatile storage device 2103 can store various software programs, data, and the like.

The network interface 2106 is an interface device for connecting to a communication network; for example, an interface device for wired or wireless LAN (Local Area Network) connection may be used. The input unit 704 in the first embodiment may accept various inputs via the network interface 2106.

The drive device 2104 is, for example, a device that handles reading data from and writing data to the external storage medium 2105 described later.

The external storage medium 2105 is any storage medium capable of recording data, such as an optical disk, a magneto-optical disk, or a semiconductor flash memory.

The input/output interface 2107 is a device that controls input and output to and from external devices. For example, a user or administrator of the information processing apparatus may enter instructions for various operations using input/output devices (for example, a keyboard, mouse, display device, or printer) connected via the input/output interface. The input unit 704 in the first embodiment may accept various inputs via the input/output interface 2107.

The present invention, described above using the embodiments as examples, may be realized, for example, by configuring the information processing apparatus with the hardware device illustrated in FIG. 21 and supplying that hardware device with a software program capable of realizing the functions described in the embodiments. In this case, the present invention may be realized by the arithmetic device 2101 executing the software program supplied to the device.

In each of the embodiments described above, each unit shown in the drawings (for example, FIG. 7 and FIG. 20) can be realized as a software module, that is, a functional (processing) unit of a software program executed by the hardware described above. However, the division into software modules shown in these drawings is made for convenience of explanation, and various configurations are conceivable in an actual implementation.

For example, when the units illustrated in FIG. 7 and FIG. 20 are realized as software modules, these software modules may be stored in the nonvolatile storage device 2103, and the arithmetic device 2101 may read them into the storage device 2102 when executing the respective processing.

These software modules may also be configured to pass various data to one another by an appropriate method such as shared memory or inter-process communication. With such a configuration, the software modules can be communicably connected to one another.

Furthermore, the software programs may be recorded on the external storage medium 2105 and stored in the nonvolatile storage device 2103 through the drive device 2104 as appropriate, for example at the shipping stage or the operation stage of the information processing apparatus.

In the above case, the software programs may be supplied to the information processing apparatus by installing them in the apparatus using an appropriate jig at the manufacturing stage before shipment, the maintenance stage after shipment, or the like. The software programs may also be supplied by a procedure that is now common, such as downloading them from outside via a communication line such as the Internet.

In such a case, the present invention can be regarded as being constituted by the code making up the software program, or by a computer-readable storage medium on which that code is recorded.

The information processing apparatus described above, or its components, may also be configured by a virtualized environment in which the hardware device illustrated in FIG. 21 is virtualized and various software programs (computer programs) executed in that virtualized environment. In this case, the components of the hardware device illustrated in FIG. 21 are provided as virtual devices in the virtualized environment. In this case as well, the present invention can be realized with the same configuration as when the hardware device illustrated in FIG. 21 is configured as a physical device.

The present invention has been described above using the exemplary embodiments as examples. However, the technical scope of the present invention is not limited to the scope described in those embodiments. It is apparent to those skilled in the art that various changes or improvements can be made to the embodiments. New embodiments incorporating such changes or improvements can also fall within the technical scope of the present invention, as is clear from the matters recited in the claims.

The present invention is applicable, for example, to numerical analysis by the finite element method in which a matrix equation whose coefficient matrix is a large-scale sparse matrix is solved using an iterative method (for example, the CG (Conjugate Gradient) method). The present invention is particularly applicable to numerical computation in which the product of a large-scale sparse matrix and a vector accounts for most of the computational cost.
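As an illustration of the sparse matrix-vector product mentioned above, the following is a minimal sketch that multiplies a block-sparse matrix by a vector when the block values are stored with same-position elements contiguous. Everything in it is an assumption made for illustration: square blocks of size `bs`, a position-major value array, per-block coordinate lists `block_rows` and `block_cols`, and the function name `block_spmv`. It is not the procedure of the embodiments.

```python
def block_spmv(values, block_rows, block_cols, bs, n_block_rows, x):
    """Compute y = A @ x for a block-sparse matrix A.

    values holds the nonzero bs-by-bs blocks with same-position elements
    contiguous: values[(i * bs + j) * nb + b] is element (i, j) of block b.
    block_rows[b] and block_cols[b] give the block coordinates of block b.
    """
    nb = len(block_rows)
    y = [0.0] * (n_block_rows * bs)
    for i in range(bs):
        for j in range(bs):
            base = (i * bs + j) * nb
            # The inner loop walks one contiguous run of values.
            for b in range(nb):
                row = block_rows[b] * bs + i
                col = block_cols[b] * bs + j
                y[row] += values[base + b] * x[col]
    return y


# Hypothetical 4x4 matrix with two 2x2 diagonal blocks:
# block 0 = [[1, 2], [3, 4]] at (0, 0), block 1 = [[5, 6], [7, 8]] at (1, 1).
values = [1, 5, 2, 6, 3, 7, 4, 8]
y = block_spmv(values, [0, 1], [0, 1], 2, 2, [1, 1, 1, 1])
```

The innermost loop reads `values` sequentially, so it benefits from the contiguous placement; accesses to `x` and `y` remain indirect in this simple form.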

700 Information processing apparatus
701 Storage unit
702 Arrangement unit
703 Arithmetic processing unit
704 Input unit
2000 Information processing apparatus
2001 Storage unit
2002 Arrangement unit
2101 Arithmetic device
2102 Storage device
2103 Nonvolatile storage device
2104 Drive device
2105 External storage medium
2106 Network interface
2107 Input/output interface

Claims

An information processing apparatus comprising:
storage means capable of storing at least data; and
arrangement means for arranging, for second matrices each being a block of which at least one is included in a first matrix, data representing elements arranged at the same position in each of the second matrices in a continuous area in the storage means.

The information processing apparatus according to claim 1, wherein the arrangement means:
aggregates the second matrices into a specific region in the first matrix;
in each row of the aggregated second matrices, sequentially selects data representing elements arranged in a specific column, from the column on the left end side in the column direction toward the right end side in the column direction; and
arranges the data selected from the same specific column in each row of each of the second matrices in a continuous area in the storage means.

The information processing apparatus according to claim 1, wherein the arrangement means:
groups the columns included in the first matrix into block columns, each containing as many columns as the block has;
groups the rows included in the first matrix into block rows, each containing as many rows as the block has;
for each block column, sequentially selects the second matrices included in the same block column, from the block row arranged at the upper end in the row direction in the first matrix toward the block row arranged at the lower end in the row direction in the first matrix; and
arranges data representing elements arranged at the same position in the selected second matrices in a continuous area in the storage means.

The information processing apparatus according to claim 1, wherein the arrangement means:
groups the columns included in the first matrix into block columns, each containing as many columns as the block has;
groups the rows included in the first matrix into block rows, each containing as many rows as the block has;
for each block row, sequentially selects the second matrices included in the same block row, from the block column arranged at the left end in the column direction in the first matrix toward the block column arranged at the right end in the column direction in the first matrix; and
arranges data representing elements arranged at the same position in the selected second matrices in a continuous area in the storage means.

The information processing apparatus according to claim 1, wherein the arrangement means:
groups the columns included in the first matrix into block columns, each containing as many columns as the block has;
groups the rows included in the first matrix into block rows, each containing as many rows as the block has;
for each block column, sequentially selects the second matrices included in that block column in descending order of the number of second matrices included in the block row; and
arranges data representing elements arranged at the same position in the selected second matrices in a continuous area in the storage means.

The information processing apparatus according to claim 3, wherein the arrangement means:
rearranges the block rows in the first matrix in descending order of the number of second matrices included in each block row;
for each block column, sequentially selects the second matrices included in that block column, from the block row arranged at the upper end in the row direction in the rearranged first matrix toward the block row arranged at the lower end in the row direction; and
arranges data representing elements arranged at the same position in the selected second matrices in a continuous area in the storage means.

The information processing apparatus according to claim 1, wherein the second matrix is a square matrix.

An information processing apparatus that selects, from second matrices each being a block of which at least one is included in a first matrix, data representing elements arranged at the same position in each of the second matrices, and generates arrangement data in which the selected data representing the elements is arranged so as to be placed in a continuous area in a storage unit capable of storing data.

An information processing method in which an information processing apparatus having storage means capable of storing data arranges, for second matrices each being a block of which at least one is included in a first matrix, data representing elements arranged at the same position in each of the second matrices in a continuous area in the storage means.

A computer program that causes a computer having storage means capable of storing data to execute processing of arranging, for second matrices each being a block of which at least one is included in a first matrix, data representing elements arranged at the same position in each of the second matrices in a continuous area in the storage means.