JP2006228152A

JP2006228152A - Data analyzer

Info

Publication number: JP2006228152A
Application number: JP2005044725A
Authority: JP
Inventors: Hiroshi Okamoto; 洋岡本; Yukihiro Tsuboshita; 幸寛坪下
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-02-21
Filing date: 2005-02-21
Publication date: 2006-08-31

Abstract

PROBLEM TO BE SOLVED: To provide a data analyzer that, for a network constituted by the group of data elements to be analyzed, gives a solution that depends continuously on the state of the initial pattern, so that it can be used for a variety of purposes.

SOLUTION: In this data analyzer, each of a plurality of data elements is stored in association with a data value at one of N stages (N is an integer of 2 or more), and relation weighting information between the data elements is stored. One of the data elements is selected at random as the data of interest, and an input stimulus value for the data of interest is computed from the relation weighting information between the data of interest and the other data elements and from the data values of those other data elements. The computed input stimulus value is compared with a threshold that is determined by the current data value of the data of interest and differs for each data value, to determine whether the data value is to be changed; if it is, the data value is updated.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a data analysis apparatus using a neural network.

Several data analysis methods using neural networks are conventionally known. In the method proposed by J. J. Hopfield, coupling strengths are defined between a plurality of nodes, and the method determines which pattern in a predetermined group of patterns is closest to a pattern formed by those nodes.

Here, the coupling strengths between the nodes are set to be symmetric, that is, the coupling strength from node i to node j equals the coupling strength from node j to node i. It is known that activity propagation in a network of such nodes (a symmetrically coupled network) settles into a specific stable state (a fixed point that is an attractor of the dynamical system defined by the activity propagation). In other words, Hopfield's method computes which fixed-point pattern (predetermined pattern) the initial pattern of the network is closest to, according to which stable state (fixed point) the network finally settles into (Non-Patent Document 1).

Non-Patent Document 1: Hopfield, J. J. (1982), Proc. Natl. Acad. Sci. USA, 79, 2554
Non-Patent Document 2: Koulakov, A. A., Raghavachari, S., Kepecs, A. & Lisman, J. E. (2002), Nat. Neurosci. 8, 775
Non-Patent Document 3: Goldman, M. S., Levine, J. H., Major, G., Tank, D. W. & Seung, H. S. (2003), "Robust persistent neural activity in a model integrator with multiple hysteretic dendrites per neuron", Cereb. Cortex 13, 1185-1195

However, in the conventional Hopfield method described above, the result of analyzing the network is always one of the predetermined fixed-point patterns. These fixed-point patterns exist discretely in the solution space, so a solution that changes continuously with changes in the initial pattern cannot be obtained.

Attempts have therefore been made to realize a dynamical system with a continuous attractor by building the network from hysteresis units (units whose value changes with a hysteresis characteristic) (Non-Patent Document 2). However, this method of Koulakov et al. does not ultimately achieve continuity of the pattern; it achieves continuity only for a scalar quantity defined over the whole network (called the activity).

That is, no method has been found that, for a network built from the group of data elements to be analyzed, gives a solution that depends continuously on the state of the initial pattern. For this reason, each of the conventional methods described above can be used for data analysis only in limited applications.

The present invention has been made in view of the above circumstances. One of its objects is to provide a data analysis apparatus that, for a network built from the group of data elements to be analyzed, gives a solution that depends continuously on the state of the initial pattern and that can be used in a wide range of applications.

Another object of the present invention is to provide a data analysis apparatus with improved network convergence.

To solve the problems of the conventional examples described above, the present invention is a data analysis apparatus comprising: storage means for storing a data value at one of N stages (N is an integer of 2 or more) in association with each of a plurality of data elements, and for storing relation weighting information between the data elements; means for selecting one of the data elements as the data of interest based on a predetermined rule, and for calculating an input stimulus value for the data of interest based on the relation weighting information between the data of interest and the other data elements and on the data values of the other data elements; means for updating the data value of the data of interest based on the calculated input stimulus value, which compares the input stimulus value with a threshold that is determined according to the current data value of the data of interest and differs for each data value, decides whether to change the data value, and updates the data value when it decides to change it; and means for scaling the data values so that the sum of the data values of all the data elements becomes a predetermined value. The calculation of the input stimulus value and the update of the data value are repeated until a predetermined condition is satisfied, after which the data value of at least one of the data elements is supplied to predetermined processing.

To solve the problems of the conventional examples, the present invention is also a data analysis apparatus comprising: storage means for storing, in association with each of a plurality of data elements, a data value that is a continuous value between Ymin and Ymax (Ymin < Ymax), and for storing relation weighting information between the data elements; means for selecting one of the data elements as the data of interest based on a predetermined rule, and for calculating an input stimulus value for the data of interest based on the relation weighting information between the data of interest and the other data elements and on the data values of the other data elements; means for updating the data value of the data of interest based on the calculated input stimulus value, which compares the input stimulus value with a threshold that is determined according to the current data value of the data of interest and differs for each data value, decides whether to change the data value, and updates the data value when it decides to change it; and means for scaling the data values so that the sum of the data values of all the data elements becomes a predetermined value. The calculation of the input stimulus value and the update of the data value are repeated until a predetermined condition is satisfied, after which the data value of at least one of the data elements is supplied to predetermined processing.

Here, the means for updating the data value may use a first monotonically increasing function f1, which equals Ymin for input stimulus values from 0 to X1min, increases continuously and monotonically from X1min to X1max, and equals Ymax at and above X1max, and a second monotonically increasing function f2, which increases continuously and monotonically for input stimulus values from X2min to X2max and equals Ymax at and above X2max. When the input stimulus value I for the data of interest lies between X1min and X2max, the means refers to the value I2 satisfying Y = f2(I2) for the data value Y of the data of interest before the update and, if I exceeds I2, updates the data value to f2(I); it likewise refers to the value I1 satisfying Y = f1(I1) and, if I is less than I1, updates the data value to f1(I).

Another aspect of the present invention is a data analysis method using a computer provided with storage means that stores a data value, taking discrete values in multiple stages or continuous values, in association with each of a plurality of data elements, and that stores relation weighting information between the data elements. The method causes the computer to repeat the following steps until a predetermined condition is satisfied: a step of selecting one of the data elements as the data of interest based on a predetermined rule and calculating an input stimulus value for the data of interest based on the relation weighting information between the data of interest and the other data elements and on the data values of the other data elements; a step of updating the data value of the data of interest based on the calculated input stimulus value, by comparing the input stimulus value with a threshold that is determined according to the current data value of the data of interest and differs for each data value, deciding whether to change the data value, and updating the data value when it is decided to change it; and a step of scaling the data values so that the sum of the data values of all the data elements becomes a predetermined value. After the repetition, the data value of at least one of the data elements is supplied to predetermined processing.

A program according to yet another aspect of the present invention causes a computer, provided with storage means that stores a data value, taking discrete values in multiple stages or continuous values, in association with each of a plurality of data elements and that stores relation weighting information between the data elements, to repeat the following procedures until a predetermined condition is satisfied: a procedure of selecting one of the data elements as the data of interest based on a predetermined rule and calculating an input stimulus value for the data of interest based on the relation weighting information between the data of interest and the other data elements and on the data values of the other data elements; a procedure of updating the data value of the data of interest based on the calculated input stimulus value, by comparing the input stimulus value with a threshold that is determined according to the current data value of the data of interest and differs for each data value, deciding whether to change the data value, and updating the data value when it is decided to change it; and a procedure of scaling the data values so that the sum of the data values of all the data elements becomes a predetermined value.

An embodiment of the present invention will be described with reference to the drawings. As shown in FIG. 1, the data analysis apparatus according to the embodiment includes a control unit 11, a storage unit 12, a data input unit 13, and a result output unit 14.

The control unit 11 can be implemented with a CPU or the like, and executes data analysis processing according to a program stored in the storage unit 12. The specific content of this processing is described in detail later. The storage unit 12 can be implemented with a memory element or a disk device. It stores the program executed by the control unit 11, and also operates as a work memory that holds the various data used in the data analysis processing of the control unit 11.

The data input unit 13 and the result output unit 14 are input/output interfaces that, for example, acquire data from a data storage holding the data to be analyzed and output the analysis results to that data storage, a display, or the like.

Next, the data analysis processing performed by the control unit 11 of this embodiment will be described. In this embodiment, an interconnection-network-type data structure is defined, and each of its nodes is associated with a data value storage unit that has multi-stage hysteresis input/output characteristics. The data to be analyzed includes a plurality of data elements. For example, if the data to be analyzed is a text, each data element can be defined as an individual word.

The multi-stage hysteresis input/output characteristics of each node are as follows. First, in a simple hysteresis characteristic (single hysteresis characteristic), as shown in FIG. 2(a), the data value Y takes one of two values, Y = 0 or Y = Ymax, and a lower threshold Qmin and an upper threshold Qmax are defined. Assuming that the data value Y is initialized to "0", it stays at "0" until the input I exceeds the upper threshold Qmax. When the input I exceeds Qmax, the data value Y becomes Ymax. After that, the value Ymax is maintained as long as the input I exceeds the lower threshold Qmin. When the input I falls to or below Qmin, the data value Y becomes "0".
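The single hysteresis characteristic described above can be sketched as a small state-update function. This is a minimal illustration, not the patent's implementation; the function name `hysteresis_step` and the numeric thresholds are assumptions chosen for the example.

```python
def hysteresis_step(y, i, q_min, q_max, y_max):
    """Return the next output of a two-state hysteresis unit.

    y is the current output (0 or y_max) and i is the current input.
    """
    if y == 0 and i > q_max:
        return y_max   # switch on only when the input exceeds the upper threshold
    if y == y_max and i <= q_min:
        return 0       # switch off only at or below the lower threshold
    return y           # otherwise hold the current state


# Sweep the input up and then down; the output depends on the input history.
y = 0
trace = []
for i in [0.2, 0.6, 0.9, 0.6, 0.4, 0.2]:
    y = hysteresis_step(y, i, q_min=0.3, q_max=0.8, y_max=1)
    trace.append(y)
```

The same input (0.6) yields output 0 on the way up and output 1 on the way down, which is the history dependence the text describes.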

Multi-stage hysteresis is obtained by stacking this single hysteresis characteristic in multiple stages. In the multi-stage hysteresis characteristic, each node takes one of the N-stage values from Y = 0 to Y = Ymax. Specifically, as shown in FIG. 2(b), a lower threshold Qimin and an upper threshold Qimax are defined for each stage i.

With the data value reset to "0", when the input exceeds the first-stage upper threshold Q1max but not the second-stage upper threshold Q2max (that is, when Q1max < I < Q2max), the output becomes the first-stage output Y = Ymax/N. If the input I then decreases until it falls below the first-stage lower threshold Q1min, the data value Y becomes "0".

Likewise, with the data value at Y = Ymax/N, when the input I exceeds the second-stage upper threshold Q2max but not the third-stage upper threshold Q3max (that is, when Q2max < I < Q3max), the data value Y becomes the second-stage value, Y = 2 × Ymax/N.

As shown in FIG. 3, the control unit 11 generates an element database that associates, for each data element, information identifying the data element (for example, the character string of a word) with its data value, and stores it in the storage unit 12. As shown in FIG. 4, the control unit 11 also generates a relation weighting database in which a relation weight value is set for each pair of data elements, and stores it in the storage unit 12. In the example of FIG. 4, the lower triangular part is omitted because it is set to the values of the upper triangular part reflected across the diagonal (that is, Tij = Tji).
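Because Tij = Tji, only the upper triangle needs to be stored. The following is a minimal sketch of such a lookup, assuming a dictionary keyed by index pairs; the names `weight` and `upper` are hypothetical, and returning 0.0 for pairs that are not stored (including the diagonal) is an assumption, not something the text specifies.

```python
def weight(upper, i, j):
    """Look up T_ij from a table that stores only the upper triangle (i <= j)."""
    key = (i, j) if i <= j else (j, i)   # mirror lower-triangle requests
    return upper.get(key, 0.0)           # assumption: unstored pairs have weight 0


# Upper-triangular relation weights for three data elements.
upper = {(0, 1): 0.5, (0, 2): 0.1, (1, 2): 0.3}
```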

Furthermore, in this embodiment, the storage unit 12 stores the lower and upper threshold values of each stage, which define the multi-stage hysteresis characteristic.

The control unit 11 performs data analysis processing based on the information stored in the storage unit 12. In this embodiment, such a hysteresis unit is associated with each node of the interconnection network, and activity propagation is performed by the "dynamical system" defined by the steps below. In a network made up of the data value storage units of this embodiment, the attractor (fixed-point state) reached by activity propagation depends continuously on the initial state.

The activity propagation processing in the control unit 11 is described below with reference to FIG. 5. From the plurality (M) of data elements included in the data being processed, one data element is selected as the data of interest based on a predetermined rule (S1). This rule (selection rule) is, for example, random selection. Here, it is assumed that the i-th data element is selected. The input stimulus value used to update the data value associated with this data of interest is then calculated by the following equation (1) (S2).

Here, Tij is the relation weight between the i-th data element and the j-th data element, and is read from the relation weighting database. Oj is the data value stored in the element database in association with the j-th data element.
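Equation (1) itself is not reproduced in this text. Going by the definitions of Tij and Oj, a plausible reading is the weighted sum I_i = Σ_j T_ij O_j over the other data elements; the sketch below assumes that form. The function name `input_stimulus`, the exclusion of the self term j = i, and the sample numbers are all assumptions for illustration.

```python
import random


def input_stimulus(i, weights, values):
    """Assumed form of equation (1): I_i = sum over j != i of T_ij * O_j."""
    return sum(weights[i][j] * values[j]
               for j in range(len(values)) if j != i)


# Symmetric relation weights T and current data values O for three elements.
weights = [[0.0, 0.5, 0.1],
           [0.5, 0.0, 0.3],
           [0.1, 0.3, 0.0]]
values = [1.0, 0.0, 2.0]

rng = random.Random(0)
i = rng.randrange(len(values))                   # S1: pick the data of interest at random
stimulus = input_stimulus(i, weights, values)    # S2: compute its input stimulus value
```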

Next, the data value associated with the i-th data element, which is the data of interest, is determined (S3), and the element database is updated. In this embodiment, the data value is determined as follows.

First, it is determined which stage n, among the multiple stages, the data value associated with the data of interest is at. Since the data value here changes by ΔY = Ymax/N per stage, the stage n is obtained by dividing the data value by ΔY (that is, since Y = n × ΔY, dividing Y by ΔY gives n). Next, the lower threshold θmin and the upper threshold θmax corresponding to that stage are acquired.

The input stimulus value I calculated in step S2 is then compared with the upper threshold θmax. If I > θmax, the data value Y is updated to (n + 1) × ΔY, and step S3 is executed again recursively. At the top stage, Y = Ymax, the upper threshold is set larger than the maximum input stimulus value so that the data value cannot exceed the maximum. Likewise, the lower threshold θmin is compared with the input stimulus value I. If I ≤ θmin, the data value Y is updated to (n - 1) × ΔY, and step S3 is executed again recursively. Here too, at the bottom stage, Y = Ymin, the lower threshold is set smaller than the minimum input stimulus value so that the data value cannot fall below the minimum.

If θmin < I ≤ θmax, the process proceeds to step S4.
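Step S3 can be sketched as a stage-index update. The sketch below uses loops instead of the recursion mentioned in the text, which gives the same result; the name `update_level`, the per-stage threshold lists, and the sample values are assumptions. Infinite thresholds at the top and bottom stages stand in for the text's "larger than the maximum" and "smaller than the minimum" input stimulus values.

```python
import math


def update_level(n, stimulus, lower, upper):
    """Move stage index n until lower[n] < stimulus <= upper[n] holds."""
    if stimulus > upper[n]:
        while stimulus > upper[n]:    # climb one stage at a time
            n += 1
    else:
        while stimulus <= lower[n]:   # descend one stage at a time
            n -= 1
    return n


# Stages 0..3 with delta_y = 1.0; each stage's lower threshold lies below
# the upper threshold of the stage beneath it, which creates the hysteresis.
upper = [1.0, 2.0, 3.0, math.inf]
lower = [-math.inf, 0.5, 1.5, 2.5]
delta_y = 1.0

n = update_level(0, 2.4, lower, upper)   # rises two stages in one update
y = n * delta_y
```

With these thresholds, an input of 1.8 leaves a unit at stage 2 unchanged, while a unit at stage 0 would only rise to stage 1, so the resulting stage depends on the starting stage.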

The control unit 11 scales the data values so that the sum of the data values of all the data elements becomes a predetermined value C (S4). That is, with Yi denoting the i-th data value, it computes

$$Y'_i = C \, \frac{Y_i}{\sum_{k=1}^{M} Y_k}$$

and overwrites the data value Yi in the storage unit 12 with this data value Y′i. This scaling improves the convergence of the network.

After the data value associated with the data of interest has been updated and scaled in this way, the vector defined by the set of data values of the data elements,

$$\mathbf{Y}(t) = (Y'_1, Y'_2, \ldots, Y'_M)$$

is generated (S5). If a previously generated vector is stored in the storage unit 12, the difference between the previous vector and the current vector is calculated, and it is checked whether this difference is the zero vector (the vector whose elements are all "0"), that is, whether

$$\mathbf{Y}(t) - \mathbf{Y}(t-1) = \mathbf{0}$$

holds (S6). If it is not the zero vector (No), the process returns to step S1 and is repeated. In other words, after the data value is updated, it is checked whether the data values of the data elements have stopped changing, and the process returns to step S1 as long as they are still changing.

If the calculated difference vector is the zero vector in step S6 (Yes), the processing ends. The set of information identifying the data elements and their data values stored in the element database at this point is supplied, as the analysis result, to subsequent predetermined processing. The state stored in the element database represents a state in which the input and output of each data element are balanced (in other words, an attractor of the dynamical system). This state corresponds to what is called a "fixed point".
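Steps S4 to S6 can be sketched as two small helpers. This is an illustration under assumptions: the names `scale_to_sum` and `has_converged` are hypothetical, the handling of an all-zero vector is not addressed in the text, and the optional tolerance `tol` is an addition for floating-point use (the text itself tests for an exactly zero difference vector, which corresponds to `tol=0.0`).

```python
def scale_to_sum(values, total):
    """S4: rescale the data values so that they sum to `total` (the value C)."""
    s = sum(values)
    if s == 0:
        return list(values)   # assumption: leave an all-zero vector unchanged
    return [total * v / s for v in values]


def has_converged(current, previous, tol=0.0):
    """S6: True when the difference between successive vectors is (near) zero."""
    return all(abs(c - p) <= tol for c, p in zip(current, previous))


previous = [0.5, 1.5]
current = scale_to_sum([1.0, 3.0], total=2.0)   # S4 and S5: the new state vector
done = has_converged(current, previous)          # S6: stop iterating when True
```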

The computation in step S3 described here must be executed recursively, because the data value may change by several stages at once. When it changes over several stages, the control unit 11 repeats the comparison between the threshold and the input stimulus value, so increasing the number of stages N slows the computation. In this embodiment, therefore, instead of using N discrete stages, the data value is treated as a continuous value, taking the limit as N goes to infinity. Such a value is still represented discretely inside the computer; it is called a continuous value in the sense that, in theory, it does not take stepwise values.

When continuous values are used, the data value is updated, as shown in FIG. 6, using a first monotonically increasing function f1, which equals Ymin for input stimulus values from 0 to X1min, increases continuously and monotonically from X1min to X1max, and equals Ymax at and above X1max, and a second monotonically increasing function f2, which increases continuously and monotonically for input stimulus values from X2min to X2max and equals Ymax at and above X2max. That is:
(1) When the input stimulus value I for the data of interest exceeds X2max, that is, when X2max < I:
The data value is set to Y = Ymax.
(2) When the input stimulus value I for the data of interest is X1min or less, that is, when I ≤ X1min:
The data value is set to Y = Ymin.
(3) When the input stimulus value I for the data of interest is between X1min and X2max, that is, when X1min < I ≤ X2max:
For the data value Y of the data of interest before the update, the value I2 satisfying Y = f2(I2) is referred to, and if the input stimulus value I exceeds I2, the data value is updated to f2(I);
for the data value Y of the data of interest before the update, the value I1 satisfying Y = f1(I1) is referred to, and if the input stimulus value I is less than I1, the data value is updated to f1(I).
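The three cases above can be condensed into a single piecewise update rule. The following restatement rests on two assumptions that the text leaves implicit: the data value stays unchanged when neither condition in case (3) holds, and $f_1^{-1}$ and $f_2^{-1}$ denote inverses taken on the strictly increasing segments of $f_1$ and $f_2$.

```latex
Y_{\mathrm{new}} =
\begin{cases}
Y_{\max} & \text{if } I > X_{2\max},\\
Y_{\min} & \text{if } I \le X_{1\min},\\
f_2(I) & \text{if } X_{1\min} < I \le X_{2\max} \text{ and } I > I_2 = f_2^{-1}(Y),\\
f_1(I) & \text{if } X_{1\min} < I \le X_{2\max} \text{ and } I < I_1 = f_1^{-1}(Y),\\
Y & \text{otherwise.}
\end{cases}
```

Since $f_1$ lies to the left of $f_2$ in FIG. 6, $I_1 \le I_2$, so the third and fourth cases cannot hold at the same time.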

Here, the first and second monotonically increasing functions f1 and f2 can be defined, on their increasing segments, as

$$f_1(I) = \alpha (I - X_{1min}), \qquad f_2(I) = \alpha (I - X_{2min})$$

In this case, solving Y = f2(I2) and Y = f1(I1) for I2 and I1, respectively, gives

$$I_2 = \frac{Y}{\alpha} + X_{2min}, \qquad I_1 = \frac{Y}{\alpha} + X_{1min}$$

Accordingly, the data value update process becomes as follows:
(1) When the input stimulus value I for the data of interest exceeds X2max, that is, when X2max < I:
The data value is set to Y = Ymax.
(2) When the input stimulus value I for the data of interest is X1min or less, that is, when I ≤ X1min:
The data value is set to Y = Ymin.
(3) When the input stimulus value I for the data of interest is between X1min and X2max, that is, when X1min < I ≤ X2max:
For the data value Y of the data of interest before the update, if the input stimulus value I exceeds Y/α + X2min, the data value is updated to f2(I) = α(I - X2min). If the input stimulus value I is less than Y/α + X1min, the data value is updated to f1(I) = α(I - X1min).
With this method, the number of comparison steps is bounded and the amount of computation is reduced.
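The linear-ramp update can be sketched as a single function with a fixed number of comparisons. This is an illustration under assumptions: the name `update_continuous` and the sample parameters are hypothetical, Ymin is taken to be 0 (which the linear form α(I - X1min) implies at I = X1min), and X2max is derived as X2min + Ymax/α.

```python
def update_continuous(y, stimulus, alpha, x1_min, x2_min, y_max):
    """One continuous-valued hysteresis update with linear ramps.

    f1(I) = alpha * (I - x1_min) is the descending branch and
    f2(I) = alpha * (I - x2_min) is the ascending branch (x1_min < x2_min).
    """
    x2_max = x2_min + y_max / alpha        # where f2 reaches y_max
    if stimulus > x2_max:                  # case (1)
        return y_max
    if stimulus <= x1_min:                 # case (2), with Ymin assumed to be 0
        return 0.0
    i2 = y / alpha + x2_min                # solves y = f2(i2)
    i1 = y / alpha + x1_min                # solves y = f1(i1)
    if stimulus > i2:                      # case (3), rising along f2
        return alpha * (stimulus - x2_min)
    if stimulus < i1:                      # case (3), falling along f1
        return alpha * (stimulus - x1_min)
    return y                               # inside the hysteresis band: hold


y = 0.0
trace = []
for stimulus in [2.0, 1.5, 0.5, 5.0, -1.0]:
    y = update_continuous(y, stimulus, alpha=1.0, x1_min=0.0, x2_min=1.0, y_max=2.0)
    trace.append(y)
```

Each call performs at most four comparisons regardless of how far the data value moves, in contrast to the stage-by-stage recursion of the discrete version.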

Next, the operation of the data analysis apparatus of this embodiment will be described. The example given here covers two kinds of processing: normal document search, which retrieves documents containing desired keywords from a document group, and similar document search, which retrieves documents similar to a key document from the document group.

Before these searches are performed, the following preprocessing is applied to the document group to be searched. The control unit 11 retrieves the data of the document group stored in a storage (not shown) via the data input unit 13 and, for the P documents D1, D2, ..., DP, extracts Q search terms w1, w2, ..., wQ. The search terms are extracted by removing predetermined general-purpose words (so-called stop words) from the words contained in each document and then applying stemming (a process that strips inflections and word endings to extract the stem). Stemming means, for example, treating words such as child, childhood, and children as the same "child", taking ending changes and plural forms into account.
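The preprocessing described above can be sketched with a toy stop-word list and stem table. Everything concrete here is hypothetical: `STOP_WORDS`, `STEMS`, and `extract_terms` are illustrative names, and a real system would use a full stop-word list and a proper stemming algorithm rather than a lookup table.

```python
STOP_WORDS = {"the", "a", "of", "and", "in"}            # hypothetical stop-word list
STEMS = {"children": "child", "childhood": "child"}     # hypothetical stem table


def extract_terms(text):
    """Lowercase, strip punctuation, drop stop words, and map words to stems."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [STEMS.get(w, w) for w in words if w and w not in STOP_WORDS]


terms = extract_terms("The children of the village.")
```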

Next, the importance dij of each index word wi in each document Dj is defined by the TF-IDF (term frequency-inverse document frequency) method. The definition uses the following quantities: aij is the number of occurrences of the word wi in the document Dj, bi is the number of documents in the entire document group that contain the word wi, and P is the total number of documents in the document group.
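The TF-IDF importance can be illustrated with a short sketch. The exact formula is not reproduced in this text, so the sketch assumes the common form dij = aij · log(P / bi); the function name and the list-of-lists layout (one row per document) are also assumptions made for the example.

```python
import math


def tfidf(counts):
    """Compute TF-IDF importance values.

    counts[j][i] is a_ij, the number of occurrences of word w_i in
    document D_j. Returns d with d[j][i] = a_ij * log(P / b_i).

    Assumption: this is the common textbook form of TF-IDF, which may
    differ from the exact formula used in the publication.
    """
    p = len(counts)        # P: total number of documents
    q = len(counts[0])     # Q: number of index words
    # b_i: number of documents that contain word w_i at least once
    b = [sum(1 for doc in counts if doc[i] > 0) for i in range(q)]
    return [
        [doc[i] * math.log(p / b[i]) if b[i] else 0.0 for i in range(q)]
        for doc in counts
    ]
```

Under this form, a word that appears in every document receives importance 0 because log(P / bi) = log 1 = 0, while a word concentrated in few documents is weighted more heavily.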

Next, with the index words as data elements, a storage area (element database) for storing information identifying each index word together with its data value is secured in the storage unit 12. An area (relation weighting database) for storing the relation weights between index words is also secured in the storage unit 12. The relation weights are determined based on a covariance learning rule. That is, words that co-occur in the same document are related to each other, and the connection weight Tij between the words wi and wj is defined according to this rule.
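A covariance-style weight between index words might be computed as in the sketch below. The defining equation for Tij is not reproduced in this text, so everything here is an assumption: the sketch takes Tij to be the covariance of the importance values of wi and wj across documents, sets the diagonal to zero (no self-connection), and uses a hypothetical function name.

```python
def covariance_weights(d):
    """Build a Q x Q matrix of connection weights from importance values.

    d[k][i] is the importance of index word w_i in document k.

    Assumption: T_ij = (1/P) * sum_k (d_ki - mean_i) * (d_kj - mean_j),
    with T_ii = 0. The publication's exact definition may differ.
    """
    p = len(d)
    q = len(d[0])
    mean = [sum(doc[i] for doc in d) / p for i in range(q)]
    t = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            if i != j:
                t[i][j] = sum(
                    (doc[i] - mean[i]) * (doc[j] - mean[j]) for doc in d
                ) / p
    return t
```

With this choice, words that tend to be important in the same documents receive a positive weight, and words that tend to appear in different documents receive a negative weight.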

In the normal document search process, for the sentence serving as the search key (the query sentence), a vector q of the importance of each index word wi in the query sentence is defined by the TF-IDF method, based on the number of occurrences of each index word in the query sentence and the frequency of the index word in the document group.

The importance value of each index word wi in the query sentence is then stored in the element database of the storage unit 12 as the initial data value associated with that index word. The control unit 11 performs the activity propagation process shown in FIG. 5. When the data values stored in the element database are no longer updated, the control unit 11 reads out this set of data values as the output (output vector). In what follows, the vector dj is the vector of the importance values of the words for the j-th document, and the vector q is the output vector after the activity propagation process.
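The overall flow, from initial data values to an output vector, can be sketched as a simple loop. This is only an illustration under several assumptions: the input stimulus of element i is taken to be the weighted sum of the other elements' data values, elements are visited in random order, scaling is applied after each update when a constant C is given, and the loop stops when a full sweep changes no value by more than a tolerance. The function and parameter names are hypothetical, and the update rule is passed in as a callable.

```python
import random


def propagate(y0, t, update, c=None, max_sweeps=100, tol=1e-9, seed=0):
    """Run activity propagation and return the final data values.

    y0     : initial data values (e.g. query importance values)
    t      : matrix of relation weights, t[i][j] between elements i and j
    update : callable (current_value, stimulus) -> new_value
    c      : if given, rescale so that the data values sum to c
    Assumptions: stimulus = sum_j t[i][j] * y[j] over j != i; random
    visiting order; sweep-based stopping test.
    """
    rng = random.Random(seed)
    y = list(y0)
    n = len(y)
    for _ in range(max_sweeps):
        before = list(y)
        for i in rng.sample(range(n), n):   # visit elements in random order
            stimulus = sum(t[i][j] * y[j] for j in range(n) if j != i)
            y[i] = update(y[i], stimulus)
            if c is not None:
                total = sum(y)
                if total > 0:
                    y = [c * v / total for v in y]
        # Stop once a whole sweep leaves every data value unchanged.
        if max(abs(a - b) for a, b in zip(before, y)) <= tol:
            break
    return y
```

Passing the update rule as a callable keeps the loop independent of the particular threshold functions; passing c=None corresponds to running without scaling.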

The control unit 11 then computes, for each document, a cosine value C(j) based on the inner product of this output vector and the set of index-word importance values for that document. The documents are then arranged in descending order of the cosine value (C(j) for the j-th document) and presented to the user as the search result via the result output unit 14.
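The ranking step can be sketched as follows, assuming that C(j) is the ordinary cosine similarity, that is, the inner product of dj and q divided by the product of their magnitudes. The function names are illustrative, and returning 0 for a zero vector is an assumption made to avoid division by zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def rank_documents(doc_vectors, q):
    """Return document indices sorted by descending cosine value C(j)."""
    scores = [(cosine(dj, q), j) for j, dj in enumerate(doc_vectors)]
    return [j for _, j in sorted(scores, key=lambda s: -s[0])]
```

Because the cosine is normalized by vector length, a long document is not favored merely for containing more words.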

In the similar document search process, the vector q of the importance of the index words wi in the key document is used in place of the vector q for the query sentence, and the same processing as above is performed.

By preprocessing the search key, such as a keyword or a key document, with the process shown in FIG. 5 in this way, documents that contain related words not appearing in the search key itself can also be retrieved.

Because the data analysis apparatus of this embodiment uses such activity propagation, it gives, for a network constructed from the data element group to be analyzed, a solution that is continuous with respect to the state of the initial pattern. It can therefore be used for a wider range of applications than a network that converges to one of a set of discrete, predetermined solutions.

Furthermore, in this embodiment, the data values are scaled during the activity propagation process (normalized by dividing each data value by the sum of the data values and multiplying by C, so that the sum becomes the constant C), which improves convergence without damaging the network structure. If, depending on the initial data values, convergence is expected to be better without scaling, the user may be allowed to choose whether scaling is performed. When the user chooses not to perform scaling, the control unit 11 skips process S4 in the process of FIG. 5 and proceeds from process S3 directly to process S5.
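The scaling step described above amounts to a single normalization. In the sketch below, the function name is illustrative, and leaving the values unchanged when their sum is zero is an assumption made to avoid division by zero.

```python
def scale_values(values, c=1.0):
    """Rescale data values so that their sum equals the constant c.

    Each value is divided by the current sum and multiplied by c.
    Assumption: if the sum is zero, the values are returned unchanged.
    """
    total = sum(values)
    if total == 0:
        return list(values)
    return [c * v / total for v in values]
```

For example, scaling the values 1 and 3 with C = 2 gives 0.5 and 1.5: the ratio between the values is preserved and only their total changes.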

FIG. 1 is a configuration block diagram of a data analysis apparatus according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing an example of a method of updating the data value of each data element.

FIG. 3 is an explanatory diagram showing an example of the contents of the element database.

FIG. 4 is an explanatory diagram showing an example of the contents of the relation weight database.

FIG. 5 is a flowchart showing an example of the activity propagation process.

FIG. 6 is an explanatory diagram showing an example of a method of updating a data value.

Explanation of symbols

11 control unit, 12 storage unit, 13 data input unit, 14 result output unit.

Claims

A data analysis device comprising:
storage means for storing, in association with each of a plurality of data elements, a data value that is a continuous value from Ymin to Ymax (Ymin < Ymax), and for storing relation weighting information between the data elements;
means for selecting one of the data elements as attention data based on a predetermined rule and calculating an input stimulus value for the attention data based on the relation weighting information between the attention data and the other data elements and on the data values of the other data elements;
means for updating the data value of the attention data based on the calculated input stimulus value, the means comparing the input stimulus value with a threshold that is determined according to the current data value of the attention data and differs for each data value, thereby determining whether to change the data value, and updating the data value when it is determined that the data value is to be changed; and
means for scaling the data values such that the sum of the data values of the data elements is a predetermined value,
wherein the calculation of the input stimulus value and the update of the data value are repeatedly executed until a predetermined condition is satisfied, and the data value of at least one of the data elements after the repeated execution is provided for a predetermined process.

The data analysis device according to claim 1,
wherein the means for updating the data value
uses a first monotonically increasing function f1 that takes the value Ymin when the input stimulus value is from 0 to X1min, increases continuously and monotonically from X1min to X1max, and takes the value Ymax when the input stimulus value is X1max or more, and a second monotonically increasing function f2 that increases continuously and monotonically while the input stimulus value is from X2min to X2max and takes the value Ymax at X2max or more, and,
when the input stimulus value for the attention data is between X1min and X2max,
with I2 denoting the value that satisfies Y = f2(I2) for the data value Y of the attention data before the update, updates the data value to f2(I) if the input stimulus value I exceeds I2, and,
with I1 denoting the value that satisfies Y = f1(I1) for the data value Y of the attention data before the update, updates the data value to f1(I) if the input stimulus value I is less than I1.

Using a computer comprising storage means for storing, in association with each of a plurality of data elements, a data value that is a continuous value, and for storing relation weighting information between the data elements, the steps of:
selecting one of the data elements as attention data based on a predetermined rule and calculating an input stimulus value for the attention data based on the relation weighting information between the attention data and the other data elements and on the data values of the other data elements;
updating the data value of the attention data based on the calculated input stimulus value, by comparing the input stimulus value with a threshold that is determined according to the current data value of the attention data and differs for each data value, thereby determining whether to change the data value, and updating the data value when it is determined that the data value is to be changed; and
scaling the data values such that the sum of the data values of the data elements is a predetermined value,
are repeatedly executed until a predetermined condition is satisfied, and the data value of at least one of the data elements after the repeated execution is subjected to a predetermined process.

In a computer having storage means for storing, in association with each of a plurality of data elements, a data value that takes one of a plurality of discrete values or a continuous value, and for storing relation weighting information between the data elements, a set of procedures comprising:
a procedure for selecting one of the data elements as attention data based on a predetermined rule and calculating an input stimulus value for the attention data based on the relation weighting information between the attention data and the other data elements and on the data values of the other data elements;
a procedure for updating the data value of the attention data based on the calculated input stimulus value, by comparing the input stimulus value with a threshold that is determined according to the current data value of the attention data and differs for each data value, thereby determining whether to change the data value, and updating the data value when it is determined that the data value is to be changed; and
a procedure for scaling the data values such that the sum of the data values of the data elements is a predetermined value,
is repeatedly executed until a predetermined condition is satisfied.